Highlighting Rules
The highlighting rules specify to Ace (Cloud9's editor) how to color the syntax of the language of your mode.
Defining Syntax Highlighting Rules
The Ace highlighter can be considered to be a state machine. Regular expressions define the tokens for the current state, as well as the transitions into another state. Let's define mynew_highlight_rules.js, which our mode uses.
All syntax highlighters start off looking something like this:
define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
var TextHighlightRules = require("ace/mode/text_highlight_rules").TextHighlightRules;

var MyNewHighlightRules = function() {
    // regexp must not have capturing parentheses. Use (?:) instead.
    // regexps are ordered -> the first match is used
    this.$rules = {
        "start" : [
            {
                token: <token>, // String, Array, or Function: the CSS token to apply
                regex: <regex>, // String or RegExp: the regexp to match
                next:  <next>   // [Optional] String: next state to enter
            }
        ]
    };
};

oop.inherits(MyNewHighlightRules, TextHighlightRules);

exports.MyNewHighlightRules = MyNewHighlightRules;
});
The token state machine operates on whatever is defined in this.$rules. The highlighter always begins at the start state, and progresses down the list, looking for a matching regex. When one is found, the resulting text is wrapped within a <span class="ace_<token>"> tag, where <token> is defined as the token property. Note that all tokens are preceded by the ace_ prefix when they're rendered on the page.
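To make the "first matching rule wins" behaviour concrete, here is a minimal sketch of a single-state tokenizer in plain JavaScript. It is a toy model and not Ace's actual tokenizer: the names toyRules and tokenizeLine are made up for this illustration, and anchoring each regex with ^ is an artifact of the sketch.

```javascript
// Toy model only: NOT Ace's tokenizer. toyRules and tokenizeLine are
// hypothetical names used for illustration.
var toyRules = {
    start: [
        { token: "keyword", regex: /^(?:var|function)\b/ },
        { token: "constant.numeric", regex: /^\d+/ },
        { token: "text", regex: /^\s+/ },
        { token: "identifier", regex: /^\w+/ }
    ]
};

// Walk through the line; at each position try the rules in order
// and take the first one that matches.
function tokenizeLine(line, stateRules) {
    var tokens = [];
    while (line.length > 0) {
        var matched = false;
        for (var i = 0; i < stateRules.length; i++) {
            var m = stateRules[i].regex.exec(line);
            if (m && m[0].length > 0) {
                tokens.push({ type: stateRules[i].token, value: m[0] });
                line = line.slice(m[0].length);
                matched = true;
                break; // first match wins, later rules are not tried
            }
        }
        if (!matched) {
            // No rule matched: emit one character as plain text and move on.
            tokens.push({ type: "text", value: line.charAt(0) });
            line = line.slice(1);
        }
    }
    return tokens;
}

var toyTokens = tokenizeLine("var x 42", toyRules.start);
```

Because the constant.numeric rule is listed before the identifier rule, "42" is reported as a number even though \w+ would also match it. Swapping the two rules would change the result, which is why rule order matters.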
Once again, we're inheriting from TextHighlightRules here. We could choose to make this any other language set we want, if our new language requires previously defined syntaxes. For more information on extending languages, see Extending Highlighters below.
Defining Tokens
The Ace highlighting system is heavily inspired by the TextMate language grammar. Most tokens follow the TextMate naming conventions. A thorough (albeit incomplete) list of tokens can be found on the Ace Wiki.
For the complete list of tokens, see tool/tmtheme.js. It is possible to add new token names, but the scope of that knowledge is outside of this document.
Multiple tokens can be applied to the same text by adding dots in the token, e.g. token: support.function wraps the text in a <span class="ace_support ace_function"> tag.
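The mapping from a dotted token name to CSS classes can be sketched in a few lines. This is an illustration of the naming rule described above, not Ace's rendering code; tokenToClassName and wrapToken are hypothetical helper names, and the sketch does not HTML-escape the text.

```javascript
// Hypothetical helpers illustrating the token -> CSS class naming rule.
// Each dot-separated part of the token becomes its own ace_-prefixed class.
function tokenToClassName(token) {
    return token.split(".").map(function(part) {
        return "ace_" + part;
    }).join(" ");
}

// Wrap a piece of text in a span carrying those classes.
// Note: the text is not HTML-escaped here, to keep the sketch short.
function wrapToken(token, text) {
    return '<span class="' + tokenToClassName(token) + '">' + text + "</span>";
}

var exampleSpan = wrapToken("support.function", "print");
```

A theme can then style .ace_support and .ace_function independently, or target both with a combined selector.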
Defining Regular Expressions
Regular expressions can either be a RegExp or String definition.
If you're using a RegExp literal, remember to start and end the expression with the / character, like this:
{
token : "constant.language.escape",
regex : /\$[\w\d]+/
}
A caveat of using stringed regular expressions is that every \ character must be escaped with a second \. That means that even an innocuous regular expression like this:
regex: "function\s*\(\w+\)"
Must actually be written like this:
regex: "function\\s*\\(\\w+\\)"
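The escaping rule is easy to verify in plain JavaScript, independent of Ace. The sketch below compares a RegExp literal with two string-built versions; the variable names are chosen only for this illustration.

```javascript
// The same pattern written three ways.
var fromLiteral = /function\s*\(\w+\)/;

// Every backslash doubled: the string holds function\s*\(\w+\)
var fromString = new RegExp("function\\s*\\(\\w+\\)");

// Backslashes NOT doubled: JavaScript drops them while parsing the string,
// so the pattern silently becomes functions*(w+)
var unescaped = new RegExp("function\s*\(\w+\)");

var sample = "function (x)";
var literalMatches = fromLiteral.test(sample);
var stringMatches = fromString.test(sample);
var unescapedMatches = unescaped.test(sample);
```

Here literalMatches and stringMatches are both true, while unescapedMatches is false. The unescaped version does not throw, which makes this mistake easy to miss.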
Groupings
The regular expression matches the part of the code that should be styled by the token. You can use flat regexps (var) or capturing groups ((a+)(b+)). There is a strict requirement whereby capturing groups must cover the entire matched string; thus, (hel)lo is invalid. If you want to create a non-capturing group, simply start the group with ?:; thus, (hel)(?:lo) is okay. You can create longer non-capturing groups. For example:
{
token : "constant.language.boolean",
regex : /(?:true|false)\b/
},
For flat regular expression matches, token can be a String, or a Function that takes a single argument (the match) and returns a string token. For example, using a function might look like this:
// lang provides arrayToMap; load it at the top of the file
var lang = require("../lib/lang");

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);
var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);
...
{
    // value is the matched text; the returned string is used as the token
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
}
If token is a function, it should take the same number of arguments as there are groups, and return an array of tokens.
For grouped regular expressions, token can be a String, in which case all matched groups are given that same token, like this:
{
token: "identifier",
regex: "(\\w+\\s*:)(\\w*)"
}
More commonly, though, token is an Array (of the same length as the number of groups), whereby each group is given the token at the same position in the array. For a complicated regular expression, like defining a function, that might look something like this:
{
token : ["storage.type", "text", "entity.name.function"],
regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
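The pairing of groups with tokens can be sketched outside of Ace as follows. This is an illustrative model under the assumption that group i simply receives token i; applyGroupedRule is a hypothetical helper, not an Ace function.

```javascript
// The rule from the example above.
var functionRule = {
    token: ["storage.type", "text", "entity.name.function"],
    regex: "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
};

// Hypothetical helper: match the rule at the start of the input and pair
// each capturing group with the token at the same position.
function applyGroupedRule(rule, input) {
    // Wrapping in (?:...) keeps the capturing group numbers unchanged.
    var match = new RegExp("^(?:" + rule.regex + ")").exec(input);
    if (!match) return null;
    var tokens = [];
    for (var i = 1; i < match.length; i++) {
        if (!match[i]) continue; // skip groups that matched nothing
        var type = typeof rule.token === "string" ? rule.token : rule.token[i - 1];
        tokens.push({ type: type, value: match[i] });
    }
    return tokens;
}

var groupedTokens = applyGroupedRule(functionRule, "function foo() {}");
```

With this input the three groups yield "function" as storage.type, the single space as text, and "foo" as entity.name.function. This also shows why the groups have to cover the whole match: any character outside a group would have no token assigned to it.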
Defining States
The syntax highlighting state machine stays in the start state until you define a next state for it to advance to. At that point, the tokenizer stays in that new state until it advances to another state. Afterwards, you should return to the original start state.
Here's an example:
this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],
    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};
In this extremely short sample, we're defining some highlighting rules for when Ace detects a <![CDATA[ tag. When one is encountered, the tokenizer moves from start into the cdata state. It remains there, applying the text token to any string it encounters. Finally, when it hits a closing ]]> symbol, it returns to the start state and continues to tokenize anything else.
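The way a state carries over from one line to the next can be sketched with a toy state tracker. It is not Ace's tokenizer: cdataRules and stateAfterLine are hypothetical names, and the sketch only follows the next transitions without producing tokens.

```javascript
// Toy model of the CDATA example; names are hypothetical.
var cdataRules = {
    start: [
        { token: "text", regex: /<!\[CDATA\[/, next: "cdata" }
    ],
    cdata: [
        { token: "text", regex: /\]\]>/, next: "start" },
        { defaultToken: "text" }
    ]
};

// Scan one line starting in `state` and return the state at the end of it.
function stateAfterLine(line, state) {
    var pos = 0;
    while (pos < line.length) {
        var stateRules = cdataRules[state];
        var advanced = false;
        for (var i = 0; i < stateRules.length; i++) {
            var rule = stateRules[i];
            if (!rule.regex) continue; // defaultToken entries have no regex
            var m = rule.regex.exec(line.slice(pos));
            if (m && m.index === 0 && m[0].length > 0) {
                pos += m[0].length;
                if (rule.next) state = rule.next; // follow the transition
                advanced = true;
                break;
            }
        }
        if (!advanced) pos += 1; // nothing matched here, skip one character
    }
    return state;
}

// Feed the end state of each line into the next line.
var lineStates = [];
var currentState = "start";
["<p><![CDATA[ first", "still inside", "done ]]> after"].forEach(function(line) {
    currentState = stateAfterLine(line, currentState);
    lineStates.push(currentState);
});
```

The three lines end in the states cdata, cdata and start. The second line contains no delimiter at all, yet it is still treated as CDATA content because the state is remembered between lines.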
Extending Highlighters
Suppose you're working on a LuaPage, PHP embedded in HTML, or a Django template. You'll need to create a syntax highlighter that takes all the rules from the original language (Lua, PHP, or Python) and extends it with some additional identifiers (<?lua, <?php, {%, for example). Ace allows you to easily extend a highlighter using a few helper functions.
Getting Existing Rules
To get the existing syntax highlighting rules for a particular language, use the getRules() function. For example:
var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
this.$rules = new HtmlHighlightRules().getRules();
/*
this.$rules == Same this.$rules as HTML highlighting
*/
Extending a Highlighter
The addRules() method does one thing, and it does one thing well: it adds new rules to an existing rule set, and prefixes the name of every added state with a given string. For example, let's say you've got two sets of rules, defined like this:
this.$rules = {
"start": [ /* ... */ ]
};
var newRules = {
"start": [ /* ... */ ]
};
If you want to incorporate newRules into this.$rules, you'd do something like this:
this.addRules(newRules, "new-");
/*
this.$rules = {
"start": [ ... ],
"new-start": [ ... ]
};
*/
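The effect of prefixing can be modelled with a standalone function. This is a sketch of the behaviour described above, not Ace's implementation; addRulesWithPrefix is a hypothetical name. As an assumption of this sketch, string next targets are prefixed as well, so that transitions inside the added rules keep pointing at the added states.

```javascript
// Hypothetical stand-in for the prefixing behaviour; not Ace's own code.
function addRulesWithPrefix(target, newRules, prefix) {
    Object.keys(newRules).forEach(function(stateName) {
        target[prefix + stateName] = newRules[stateName].map(function(rule) {
            var copy = Object.assign({}, rule); // leave the input untouched
            // Assumption: keep transitions inside the prefixed rule set.
            if (typeof copy.next === "string") copy.next = prefix + copy.next;
            return copy;
        });
    });
    return target;
}

var baseRules = {
    start: [{ token: "text", regex: "\\s+" }]
};
var extraRules = {
    start: [{ token: "string", regex: '"', next: "qstring" }],
    qstring: [{ token: "string", regex: '"', next: "start" }]
};

var mergedRules = addRulesWithPrefix(baseRules, extraRules, "new-");
```

After the call, mergedRules has the states start, new-start and new-qstring. The original start state is untouched, so nothing reaches the new states until some rule names new-start as its next state.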
Extending Two Highlighters
The last function available to you combines both of these concepts, and it's called embedRules. It takes three parameters:
- An existing rule set to embed
- A prefix to apply to each state in that rule set
- A set of rules to add to each of the embedded states (in the example below, the rules that lead back to start)
Like addRules, embedRules adds on to the existing this.$rules object.
To explain this visually, let's take a look at the syntax highlighter for Lua pages, which combines all of these concepts:
var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    this.$rules = new HtmlHighlightRules().getRules();

    // add the Lua opening tags to every HTML state
    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<\\%\\=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua\\=?",
            next: "lua-start"
        });
    }

    // embed the Lua rules under the "lua-" prefix, with rules to leave Lua
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "\\%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};
Here, this.$rules starts off as a set of HTML highlighting rules. To every state in this set, we add two new checks for <% and <?lua (each optionally followed by =). If one of these rules matches, the tokenizer moves to the lua-start state. Next, embedRules takes the already existing set of LuaHighlightRules and applies the lua- prefix to each state there. Finally, it adds two new checks for %> and ?>, allowing the state machine to return to start.
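The combined effect can be modelled with small stand-in rule sets. The sketch below is an approximation written for this explanation, not Ace's embedRules: embedRulesSketch, htmlLike and luaLike are hypothetical names, and placing the extra rules ahead of the embedded ones is an assumption of the sketch.

```javascript
// Hypothetical model of embedding one rule set into another.
function embedRulesSketch(target, embedded, prefix, escapeRules) {
    Object.keys(embedded).forEach(function(name) {
        // Copy the embedded rules and keep their transitions prefixed.
        var copied = embedded[name].map(function(rule) {
            var copy = Object.assign({}, rule);
            if (typeof copy.next === "string") copy.next = prefix + copy.next;
            return copy;
        });
        // Assumption: the escape rules go first in every embedded state,
        // and their next targets are left unprefixed so they lead back out.
        var escapes = escapeRules.map(function(rule) {
            return Object.assign({}, rule);
        });
        target[prefix + name] = escapes.concat(copied);
    });
    return target;
}

// Tiny stand-ins for the HTML and Lua rule sets.
var htmlLike = {
    start: [{ token: "keyword", regex: "<\\%\\=?", next: "lua-start" }]
};
var luaLike = {
    start: [{ token: "string", regex: '"', next: "qstring" }],
    qstring: [{ token: "string", regex: '"', next: "start" }]
};

var combined = embedRulesSketch(htmlLike, luaLike, "lua-", [
    { token: "keyword", regex: "\\%>", next: "start" }
]);
```

In this model, combined ends up with the states start, lua-start and lua-qstring. Every lua- state begins with the %> rule, so the closing tag can return to start from anywhere inside the embedded language.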
Testing Your Highlighter
The best way to test your tokenizer is to see it live, right? To do that you'll want to create a new Cloud9 bundle and add your highlighter to it. See this guide for more information.
Adding Automated Tests
Automated tests for a highlighter are not required, but they are easy to add and can help during development.
In lib/ace/mode/_test, create a file named text_<modeName>.txt with some example code. (You can skip this if the document you have added in demo/docs both looks good and covers various edge cases in your language syntax.)
Run node highlight_rules_test.js -gen to preserve the current output of your tokenizer in tokens_<modeName>.json.
After this, running highlight_rules_test.js optionalLanguageName will compare the output of your tokenizer with the correct output you've created.