Making a Syntax Highlighter for Ace

Creating a new syntax highlighter for Ace is straightforward. You'll need to define two pieces of code: a new mode, and a new set of highlighting rules.

Where to Start

We recommend using the Ace Mode Creator when defining your highlighter. It lets you inspect your code's tokens and provides a live preview of the syntax highlighter in action.

Defining a Mode

Every language needs a mode. A mode contains the paths to a language's syntax highlighting rules, indentation rules, and code folding rules. Without a mode, Ace won't know anything about the finer aspects of your language.

Here is the starter template we'll use to create a new mode:

define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
// defines the parent mode
var TextMode = require("./text").Mode;
var Tokenizer = require("../tokenizer").Tokenizer;
var MatchingBraceOutdent = require("./matching_brace_outdent").MatchingBraceOutdent;
// needed by createWorker below
var WorkerClient = require("../worker/worker_client").WorkerClient;

// defines the language specific highlighters and folding rules
var MyNewHighlightRules = require("./mynew_highlight_rules").MyNewHighlightRules;
var MyNewFoldMode = require("./folding/mynew").MyNewFoldMode;

var Mode = function() {
    // set everything up
    this.HighlightRules = MyNewHighlightRules;
    this.$outdent = new MatchingBraceOutdent();
    this.foldingRules = new MyNewFoldMode();
};
oop.inherits(Mode, TextMode);

(function() {
    // configure comment start/end characters
    this.lineCommentStart = "//";
    this.blockComment = {start: "/*", end: "*/"};
    
    // special logic for indent/outdent. 
    // By default ace keeps indentation of previous line
    this.getNextLineIndent = function(state, line, tab) {
        var indent = this.$getIndent(line);
        return indent;
    };

    this.checkOutdent = function(state, line, input) {
        return this.$outdent.checkOutdent(line, input);
    };

    this.autoOutdent = function(state, doc, row) {
        this.$outdent.autoOutdent(doc, row);
    };
    
    // create worker for live syntax checking
    this.createWorker = function(session) {
        var worker = new WorkerClient(["ace"], "ace/mode/mynew_worker", "NewWorker");
        worker.attachToDocument(session.getDocument());
        worker.on("errors", function(e) {
            session.setAnnotations(e.data);
        });
        return worker;
    };
    
}).call(Mode.prototype);

exports.Mode = Mode;
});

What's going on here? First, you're defining the path to TextMode (more on this later). Then you're pointing the mode to your definitions for the highlighting rules, as well as your rules for code folding. Finally, you're setting everything up to find those rules, and exporting the Mode so that it can be consumed. That's it!

Regarding TextMode, you'll notice that it's only being used once: oop.inherits(Mode, TextMode);. If your new language depends on the rules of another language, you can choose to inherit the same rules, while expanding on them with your language's own requirements. For example, PHP inherits from HTML, since it can be embedded directly inside .html pages. You can either inherit from TextMode, or any other existing mode, if it already relates to your language.
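To make the inheritance step concrete, here is a minimal standalone sketch that runs without Ace. The inherits helper is a simplified stand-in written for this example (it is not copied from Ace's lib/oop), and ParentMode, ChildMode, and the values they hold are hypothetical.

```javascript
// Simplified stand-in for oop.inherits (an assumption for this sketch):
// link the child prototype to the parent prototype.
function inherits(ctor, superCtor) {
    ctor.super_ = superCtor;
    ctor.prototype = Object.create(superCtor.prototype, {
        constructor: { value: ctor, enumerable: false, writable: true, configurable: true }
    });
}

// Hypothetical parent mode with a default comment marker and indent logic.
function ParentMode() {}
ParentMode.prototype.lineCommentStart = "";
ParentMode.prototype.getNextLineIndent = function(state, line, tab) {
    // keep the leading whitespace of the previous line
    return line.match(/^\s*/)[0];
};

// Hypothetical child mode: inherits everything, overrides only what differs.
// inherits() replaces the prototype, so it must run before any overrides.
function ChildMode() {}
inherits(ChildMode, ParentMode);
ChildMode.prototype.lineCommentStart = "//";

var mode = new ChildMode();
console.log(mode.lineCommentStart); // //
// getNextLineIndent was never defined on ChildMode; it comes from ParentMode.
console.log(mode.getNextLineIndent("start", "    foo();", "    ").length); // 4
```

The ordering mirrors the template above, where oop.inherits(Mode, TextMode) is called before the methods are attached to Mode.prototype.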

All Ace modes can be found in the lib/ace/mode folder.

Defining Syntax Highlighting Rules

The Ace highlighter can be considered to be a state machine. Regular expressions define the tokens for the current state, as well as the transitions into another state. Let's define mynew_highlight_rules.js, which our mode above uses.

All syntax highlighters start off looking something like this:

define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

var MyNewHighlightRules = function() {

    // regexp must not have capturing parentheses. Use (?:) instead.
    // regexps are ordered -> the first match is used
    // (token, regex and next below are placeholders to fill in)
    this.$rules = {
        "start" : [
            {
                token: token, // String, Array, or Function: the CSS token to apply
                regex: regex, // String or RegExp: the regexp to match
                next:  next   // [Optional] String: next state to enter
            }
        ]
    };
};

oop.inherits(MyNewHighlightRules, TextHighlightRules);

exports.MyNewHighlightRules = MyNewHighlightRules;

});

The token state machine operates on whatever is defined in this.$rules. The highlighter always begins at the start state, and progresses down the list, looking for a matching regex. When one is found, the resulting text is wrapped within a <span class="ace_<token>"> tag, where <token> is defined as the token property. Note that all tokens are preceded by the ace_ prefix when they're rendered on the page.


Once again, we're inheriting from TextHighlightRules here. We could choose to make this any other language set we want, if our new language requires previously defined syntaxes. For more information on extending languages, see "Extending Highlighters" below.

Defining Tokens

The Ace highlighting system is heavily inspired by the TextMate language grammar. Most tokens will follow the conventions of TextMate when naming grammars. A thorough (albeit incomplete) list of tokens can be found on the Ace Wiki.

For the complete list of tokens, see tool/tmtheme.js. It is possible to add new token names, but that is outside the scope of this document.

Multiple tokens can be applied to the same text by adding dots in the token, e.g. token: support.function wraps the text in a <span class="ace_support ace_function"> tag.
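The mapping from a dotted token name to CSS classes can be sketched in a few lines. The tokenToClassName helper below is hypothetical and only illustrates the naming rule described above; it is not a function exposed by Ace.

```javascript
// Hypothetical helper: turn a dotted token name into the CSS classes
// described above, one "ace_"-prefixed class per dot-separated part.
function tokenToClassName(token) {
    return token.split(".").map(function(part) {
        return "ace_" + part;
    }).join(" ");
}

console.log(tokenToClassName("support.function")); // ace_support ace_function
console.log(tokenToClassName("keyword"));          // ace_keyword
```

A theme can therefore style .ace_support broadly and refine it with .ace_support.ace_function.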

Defining Regular Expressions

Regular expressions can be either a RegExp or a String definition.

If you're using a regular expression, remember to start and end it with the / character, like this:

{
    token : "constant.language.escape",
    regex : /\$[\w\d]+/
}

A caveat of using stringed regular expressions is that any \ character must be escaped. That means that even an innocuous regular expression like this:

regex: "function\s*\(\w+\)"

Must actually be written like this:

regex: "function\\s*\\(\\w+\\)"
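The effect of the missing escapes is easy to check in plain JavaScript. This sketch uses only the built-in RegExp constructor; the sample input string is made up for the demonstration.

```javascript
// In a string literal, "\s" is not a recognized escape, so the backslash is
// dropped and the pattern silently changes meaning.
var unescaped = "function\s*\(\w+\)";
var escaped = "function\\s*\\(\\w+\\)";

console.log(unescaped); // functions*(w+)
console.log(escaped);   // function\s*\(\w+\)

// Made-up sample input for the comparison.
var sample = "function (name)";
console.log(new RegExp(escaped).test(sample));   // true
console.log(new RegExp(unescaped).test(sample)); // false
```

No error is raised in the unescaped case, which is why the mistake tends to show up only as missing highlighting.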

Groupings

You can also include flat regexps, such as (var), or have matching groups, such as ((a+)(b+)). There is a strict requirement whereby matching groups must cover the entire matched string; thus, (hel)lo is invalid. If you want to create a non-matching group, simply start the group with the ?: predicate; thus, (hel)(?:lo) is okay. You can, of course, create longer non-matching groups. For example:

{
    token : "constant.language.boolean",
    regex : /(?:true|false)\b/
},

For flat regular expression matches, token can be a String, or a Function that takes a single argument (the match) and returns a string token. For example, using a function might look like this:

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);

var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);

...

{
    // value is the matched text; return the token name to apply to it
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "-?[a-zA-Z_][a-zA-Z0-9_-]*"
}

If token is a function and the regular expression has groups, it should take the same number of arguments as there are groups, and return an array of tokens.

For grouped regular expressions, token can be a String, in which case all matched groups are given that same token, like this:

{
    token: "identifier",
    regex: "(\\w+\\s*:)(\\w*)"
}

More commonly, though, token is an Array (of the same length as the number of groups), whereby matches are given the token at the same position as in the match. For a complicated regular expression, like defining a function, that might look something like this:

{
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
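As a rough illustration of how a token array lines up with capture groups, the sketch below pairs them by position. tokenizeGroups is a hypothetical helper written for this example and does not reflect how Ace's tokenizer is implemented internally.

```javascript
// Hypothetical helper, not part of Ace: pair each capture group of a match
// with the token at the same position in the rule's token array.
function tokenizeGroups(rule, text) {
    var match = new RegExp(rule.regex).exec(text);
    if (!match) return [];
    return rule.token.map(function(type, i) {
        // match[0] is the whole match, so group i lives at index i + 1
        return { type: type, value: match[i + 1] };
    });
}

var functionRule = {
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
};

var groupTokens = tokenizeGroups(functionRule, "function hello");
groupTokens.forEach(function(t) {
    console.log(t.type + " -> " + JSON.stringify(t.value));
});
```

Because every character of the match belongs to exactly one group, the three values concatenate back to the matched text, which is the reason for the requirement that groups cover the entire match.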

Defining States

The syntax highlighting state machine stays in the start state until you define a next state for it to advance to. At that point, the tokenizer stays in that new state until it advances to another state. Afterwards, you should return to the original start state.

Here's an example:

this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],

    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};

In this extremely short sample, we're defining some highlighting rules for when Ace detects a <![CDATA[ tag. When one is encountered, the tokenizer moves from start into the cdata state. It remains there, applying the text token to any string it encounters. Finally, when it hits a closing ]]> symbol, it returns to the start state and continues to tokenize anything else.
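The state transitions can be traced with a toy tokenizer. This is a simplified sketch for illustration only: getLineTokensSketch is a hypothetical function, Ace's real Tokenizer works differently, and the token names "keyword" and "string" replace "text" here purely so that the transitions are visible in the output.

```javascript
// Same shape as the rules above, with more visible token names (an assumption).
var cdataRules = {
    "start" : [ { token : "keyword", regex : "<\\!\\[CDATA\\[", next : "cdata" } ],
    "cdata" : [ { token : "keyword", regex : "\\]\\]>", next : "start" },
                { defaultToken : "string" } ]
};

// Toy tokenizer: walk the line, trying the current state's rules in order.
function getLineTokensSketch(rules, line, startState) {
    var state = startState || "start";
    var tokens = [];
    var pos = 0;

    // append to the previous token when the type is unchanged
    function push(type, value) {
        var last = tokens[tokens.length - 1];
        if (last && last.type === type) last.value += value;
        else tokens.push({ type: type, value: value });
    }

    while (pos < line.length) {
        var stateRules = rules[state];
        var fallback = "text";
        var matched = false;
        for (var i = 0; i < stateRules.length; i++) {
            var rule = stateRules[i];
            if (rule.defaultToken) { fallback = rule.defaultToken; continue; }
            // the sticky flag anchors the match at the current position
            var re = new RegExp(rule.regex, "y");
            re.lastIndex = pos;
            var m = re.exec(line);
            if (m && m[0].length > 0) {
                push(rule.token, m[0]);
                pos += m[0].length;
                if (rule.next) state = rule.next;
                matched = true;
                break; // rules are ordered: the first match wins
            }
        }
        // no rule matched here: consume one character with the fallback token
        if (!matched) push(fallback, line.charAt(pos++));
    }
    return { tokens: tokens, state: state };
}

// The end state of one line is the start state of the next.
var firstLine = getLineTokensSketch(cdataRules, "a <![CDATA[ b", "start");
var secondLine = getLineTokensSketch(cdataRules, "c ]]> d", firstLine.state);
console.log(firstLine.state);  // cdata
console.log(secondLine.state); // start
```

Carrying the state from one line to the next is what lets a multi-line construct such as a CDATA section stay highlighted consistently.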


Using the TMLanguage Tool

There is a tool that will take an existing tmlanguage file and do its best to convert it into JavaScript for Ace to consume. Here's what you need to get started:

  1. In the Ace repository, navigate to the tool folder.
  2. Run npm install to install required dependencies.
  3. Run node tmlanguage.js <path_to_tmlanguage_file>; for example, node tmlanguage.js /Users/Elrond/elven.tmLanguage

Two files are created and placed in lib/ace/mode: one for the language mode, and one for the set of highlight rules. You will still need to add the mode to ace/ext/modelist.js, and add a sample file for testing.

A Note on Accuracy

Your .tmlanguage file will then be converted to the best of the converter's ability. It is an understatement to say that the tool is imperfect. Language mode creation will probably never be fully automatable. There is a list of things the tool cannot determine; for example:

  • The use of regular expression lookbehinds
    JavaScript did not support lookbehinds until ES2018, so they need to be faked
  • Deciding which state to transition to
    While the tool does create new states correctly, it labels them with generic terms like state_2, state_10, etc.
  • Extending modes
    Many modes say something like include source.c, to mean, "add all the rules in C highlighting." That syntax does not make sense to Ace or this tool (though of course you can extend existing highlighters).
  • Rule preference order
  • Gathering keywords
    Most likely, you'll need to take keywords from your language file and run them through createKeywordMapper()

However, the tool is an excellent way to get a quick start, if you already possess a tmlanguage file for your language.
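For the keyword-gathering step, the idea behind a keyword mapper can be sketched as follows. createKeywordMapperSketch is a simplified stand-in written for this example; in a real mode you would call createKeywordMapper() from your highlight rules, and its exact options are not covered here. The keyword lists are hypothetical.

```javascript
// Simplified stand-in (an assumption, not Ace's implementation): build a
// lookup from "a|b|c" keyword lists to the token name each list belongs to.
function createKeywordMapperSketch(map, defaultToken) {
    var keywords = Object.create(null);
    Object.keys(map).forEach(function(tokenName) {
        map[tokenName].split("|").forEach(function(word) {
            keywords[word] = tokenName;
        });
    });
    // the returned function is used as a rule's token function
    return function(value) {
        return keywords[value] || defaultToken;
    };
}

// Hypothetical keyword lists for an imaginary language.
var keywordMapper = createKeywordMapperSketch({
    "keyword": "if|else|while|return",
    "constant.language": "true|false|null"
}, "identifier");

var keywordRule = {
    token : keywordMapper,
    regex : "[a-zA-Z_][a-zA-Z0-9_]*\\b"
};

console.log(keywordRule.token("while")); // keyword
console.log(keywordRule.token("null"));  // constant.language
console.log(keywordRule.token("elven")); // identifier
```

One rule with a mapper keeps the rule list short compared with writing a separate regular expression for every keyword group.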

Extending Highlighters

Suppose you're working on a LuaPage, PHP embedded in HTML, or a Django template. You'll need to create a syntax highlighter that takes all the rules from the original language (Lua, PHP, or Python) and extends it with some additional identifiers (<?lua, <?php, {%, for example). Ace allows you to easily extend a highlighter using a few helper functions.

Getting Existing Rules

To get the existing syntax highlighting rules for a particular language, use the getRules() function. For example:

var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;

this.$rules = new HtmlHighlightRules().getRules();

/*
    this.$rules == Same this.$rules as HTML highlighting
*/

Extending a Highlighter

The addRules method does one thing, and it does one thing well: it adds new rules to an existing rule set, and prefixes any state with a given tag. For example, let's say you've got two sets of rules, defined like this:

this.$rules = {
    "start": [ /* ... */ ]
};

var newRules = {
    "start": [ /* ... */ ]
}

If you want to incorporate newRules into this.$rules, you'd do something like this:

this.addRules(newRules, "new-");

/*
    this.$rules = {
        "start": [ ... ],
        "new-start": [ ... ]
    };
*/
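The prefixing behavior can be imitated outside Ace to see what ends up in the rule set. addRulesSketch below is a simplified stand-in, not Ace's source; it assumes that string next targets inside the added rules are prefixed as well, so that transitions stay within the added states. The rule contents are hypothetical.

```javascript
// Simplified stand-in for the behavior described above (an assumption):
// copy each state under a prefixed name and prefix any string `next` target.
function addRulesSketch(target, newRules, prefix) {
    Object.keys(newRules).forEach(function(stateName) {
        target[prefix + stateName] = newRules[stateName].map(function(rule) {
            var copy = Object.assign({}, rule);
            if (typeof copy.next === "string" && copy.next.indexOf(prefix) !== 0) {
                copy.next = prefix + copy.next;
            }
            return copy;
        });
    });
    return target;
}

// Hypothetical rule sets for the demonstration.
var baseRules = { "start": [ { token: "text", regex: "\\w+" } ] };
var extraRules = {
    "start":   [ { token: "string", regex: '"', next: "qstring" } ],
    "qstring": [ { token: "string", regex: '"', next: "start" },
                 { defaultToken: "string" } ]
};

addRulesSketch(baseRules, extraRules, "new-");
console.log(Object.keys(baseRules).join(", ")); // start, new-start, new-qstring
console.log(baseRules["new-start"][0].next);    // new-qstring
console.log(baseRules["new-qstring"][0].next);  // new-start
```

The original "start" state is untouched, so nothing enters the "new-" states until an existing rule names one of them as its next state.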

Extending Two Highlighters

The last function available to you combines both of these concepts, and it's called embedRules. It takes three parameters:

  1. An existing rule set to embed with
  2. A prefix to apply for each state in the existing rule set
  3. A set of new states to add

Like addRules, embedRules adds on to the existing this.$rules object.

To explain this visually, let's take a look at the syntax highlighter for Lua pages, which combines all of these concepts:

var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    this.$rules = new HtmlHighlightRules().getRules();

    // every HTML state can switch into Lua
    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<%=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua=?",
            next: "lua-start"
        });
    }
    // the extra rules let the Lua states switch back to HTML
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};

Here, this.$rules starts off as a set of HTML highlighting rules. To this set, we add two new checks for <%= and <?lua=. We also specify that if one of these rules is matched, we should move onto the lua-start state. Next, embedRules takes the already existing set of LuaHighlightRules and applies the lua- prefix to each state there. Finally, it adds two new checks for %> and ?>, allowing the state machine to return to start.

Code Folding

Adding new folding rules to your mode can be a little tricky. First, insert the following lines of code into your mode definition:

var MyFoldMode = require("./folding/newrules").FoldMode;

...
var MyMode = operate() {

    ...

    this.foldingRules = new MyFoldMode();
};

You'll be defining your code folding rules in the lib/ace/mode/folding folder. Here's a template that you can use to get started:

define(function(require, exports, module) {
"use strict";

var oop = require("../../lib/oop");
var Range = require("../../range").Range;
var BaseFoldMode = require("./fold_mode").FoldMode;

var FoldMode = exports.FoldMode = function() {};
oop.inherits(FoldMode, BaseFoldMode);

(function() {

    // regular expressions that identify starting and stopping points
    this.foldingStartMarker; 
    this.foldingStopMarker;

    this.getFoldWidgetRange = function(session, foldStyle, row) {
        var line = session.getLine(row);

        // test each line, and return a range of segments to collapse
    };

}).call(FoldMode.prototype);

});

Just like with TextMode for syntax highlighting, BaseFoldMode contains the starting point for code folding logic. foldingStartMarker defines your opening folding point, while foldingStopMarker defines the stopping point. For example, for a C-style folding system, these values might look like this:

this.foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
this.foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

These regular expressions identify various symbols ({, [, /*) to pay attention to. getFoldWidgetRange matches on these regular expressions, and when found, returns the range of relevant folding points. For more information on the Range object, see the Ace API documentation.
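To see which lines such markers pick out, the sketch below runs a pair of C-style marker expressions over a few sample lines. The two regular expressions are assumed to correspond to the C-style markers discussed above, classifyFoldLine is a hypothetical helper, and the sample lines are made up.

```javascript
// Assumed C-style markers: an opening { or [ left unclosed on the line,
// or a line that begins a block comment; and the matching closers.
var foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
var foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

// Hypothetical helper: report whether a line opens a fold, closes one, or neither.
function classifyFoldLine(line) {
    if (foldingStartMarker.test(line)) return "start";
    if (foldingStopMarker.test(line)) return "stop";
    return "none";
}

var sampleLines = [
    "hello_world() {",
    "    /* a block comment",
    "     */",
    "    return [1, 2];",
    "}"
];

var foldKinds = sampleLines.map(classifyFoldLine);
console.log(foldKinds.join(", ")); // start, start, stop, none, stop

// For a bracket match, group 1 holds the bracket and index is its column.
var braceMatch = sampleLines[0].match(foldingStartMarker);
console.log(braceMatch[1], braceMatch.index); // { 14
```

The fourth line is neither a start nor a stop because its [ is closed again on the same line, so there is nothing to fold.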


Again, for a C-style folding mechanism, a range to return for the starting fold might look like this:

var line = session.getLine(row);
var match = line.match(this.foldingStartMarker);
if (match) {
    var i = match.index;

    if (match[1])
        return this.openingBracketBlock(session, match[1], row, i);

    var range = session.getCommentFoldRange(row, i + match[0].length);
    range.end.column -= 2;
    return range;
}

Let's say we stumble across the code block hello_world() {. Our range object here becomes:

{
  startRow: 0,
  endRow: 0,
  startColumn: 0,
  endColumn: 13
}

Testing Your Highlighter

The best way to test your tokenizer is to see it live, right? To do that, you'll want to modify the live Ace demo to preview your changes. You can find this file in the root Ace directory with the name kitchen-sink.html.

  1. Add an entry to supportedModes in ace/ext/modelist.js
  2. Add a sample file to demo/kitchen-sink/docs/ with the same name as the mode file

Once you set this up, you should be able to witness a live demonstration of your new highlighter.

Adding Automated Tests

Automated tests for a highlighter are optional, but they are easy to add and can help during development.

In lib/ace/mode/_test create a file named

text_<modeName>.txt

with some example code. (You can skip this if the document you have added in demo/docs both looks good and covers various edge cases in your language syntax).

Run node highlight_rules_test.js -gen to preserve the current output of your tokenizer in tokens_<modeName>.json

After this, running highlight_rules_test.js optionalLanguageName will compare the output of your tokenizer with the correct output you've created.

Any files ending with the _test.js suffix are automatically run by Ace's Travis CI server.
