
Highlighting Rules


Highlighting rules tell Ace (Cloud9's editor) how to color the syntax of the language defined by [your mode](doc:language-mode).

# Defining Syntax Highlighting Rules

The Ace highlighter can be thought of as a state machine. Regular expressions define the tokens for the current state, as well as the transitions into other states.

Let's define `mynew_highlight_rules.js`, which our [mode](doc:language-mode) uses.

All syntax highlighters start off looking something like this:

```javascript
define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

var MyNewHighlightRules = function() {
    // Rules are ordered: the first regex that matches is used.
    // Prefer non-capturing groups (?:...) unless `token` is an array
    // with one entry per capturing group (see "Groupings" below).
    this.$rules = {
        "start" : [
            {
                token: "keyword",                 // String, Array, or Function: the token (CSS class) to apply (example value)
                regex: "\\b(?:if|else|while)\\b", // String or RegExp: the regexp to match (example value)
                next:  "start"                    // [Optional] String: the next state to enter
            }
        ]
    };
};

oop.inherits(MyNewHighlightRules, TextHighlightRules);
exports.MyNewHighlightRules = MyNewHighlightRules;
});
```

The tokenizer operates on whatever is defined in `this.$rules`. The highlighter always begins in the `start` state and, at each position, tries that state's rules from top to bottom, looking for a matching `regex`. When one is found, the matched text is wrapped in a `<span class="ace_<token>">` tag, where `<token>` is the rule's `token` property. Every token gets the `ace_` prefix when it is rendered on the page.

Once again, we inherit from `TextHighlightRules` here. If your new language builds on a previously defined syntax, you can inherit from that language's rules instead. For more information on extending languages, see [Extending Highlighters](#extending-highlighters) below.
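To make the matching step concrete, here is a small sketch in plain JavaScript that runs outside Ace. It is a hypothetical simplification, not Ace's actual tokenizer, and the names `firstMatch` and `renderToken` are illustrative only: `firstMatch` tries one state's rules in order and returns the token from the first rule that matches at a given position, and `renderToken` builds the `ace_`-prefixed `<span>` (splitting dotted token names into several classes, as described under Defining Tokens).

```javascript
// Hypothetical sketch, NOT Ace's real tokenizer: returns the token produced
// by the first rule (in list order) whose regex matches exactly at `pos`.
function firstMatch(stateRules, line, pos) {
    for (var i = 0; i < stateRules.length; i++) {
        var rule = stateRules[i];
        // new RegExp(x, "y") accepts a String or a RegExp; the sticky "y"
        // flag makes the match start exactly at lastIndex.
        var re = new RegExp(rule.regex, "y");
        re.lastIndex = pos;
        var m = re.exec(line);
        if (m && m[0].length > 0) {
            return { type: rule.token, value: m[0], next: rule.next };
        }
    }
    return null; // no rule matched at this position
}

// Renders one token as a <span>; each part of a dotted name gets the
// "ace_" prefix, so "constant.numeric" -> class="ace_constant ace_numeric".
// (HTML escaping is omitted for brevity.)
function renderToken(token) {
    var classes = token.type.split(".").map(function(part) {
        return "ace_" + part;
    });
    return '<span class="' + classes.join(" ") + '">' + token.value + "</span>";
}

// Example rules: keywords are listed before the general identifier rule.
var startRules = [
    { token: "keyword", regex: "\\b(?:if|else|while)\\b" },
    { token: "identifier", regex: "[a-zA-Z_]\\w*" }
];

console.log(renderToken(firstMatch(startRules, "while (x)", 0)));
// -> <span class="ace_keyword">while</span>
```

Swapping the two rules makes `while` come out as an `identifier`, which is why more specific rules such as keywords should come before general ones.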
## Defining Tokens

The Ace highlighting system is heavily inspired by the [TextMate language grammar](http://manual.macromates.com/en/language_grammars), and most token names follow TextMate's naming conventions. A thorough (albeit incomplete) list of tokens can be found [on the Ace Wiki](https://github.com/ajaxorg/ace/wiki/Creating-or-Extending-an-Edit-Mode#wiki-commonTokens).

For the complete list of tokens, see [tool/tmtheme.js](https://github.com/ajaxorg/ace/blob/master/tool/tmtheme.js). You can also add new token names, but doing so is outside the scope of this document.

Multiple tokens can be applied to the same text by separating them with dots: for example, `token: "support.function"` wraps the text in a `<span class="ace_support ace_function">` tag.

## Defining Regular Expressions

A `regex` can be either a RegExp or a String.

If you use a RegExp literal, delimit it with `/` characters, like this:

```javascript
{
    token : "constant.language.escape",
    regex : /\$[\w\d]+/
}
```

The caveat of string regular expressions is that every `\` must itself be escaped, because the JavaScript string literal consumes one level of backslashes. That means that even an innocuous regular expression like this:

```javascript
regex: "function\s*\(\w+\)"
```

must actually be written like this:

```javascript
regex: "function\\s*\\(\\w+\\)"
```

### Groupings

The regular expression matches the part of the code that should be styled by the token. You can use flat regexps such as `(var)` or several matching groups such as `(a+)(b+)`. When `token` assigns a separate token to each group (see below), there is a strict requirement that the matching groups together cover the entire matched string: `(hel)(lo)` is valid, but `(hel)lo` is not, because `lo` belongs to no group. If you need parentheses only for grouping, make the group non-capturing by starting it with `?:`. For example:

```javascript
{
    token : "constant.language.boolean",
    regex : /(?:true|false)\b/
},
```

For flat regular expression matches, `token` can be a String, or a Function that takes a single argument (the match) and returns a token string. For example, using a function might look like this:

```javascript
var lang = require("../lib/lang");

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);

var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);

// ... inside a state's list of rules:
{
    // Pick the token based on the matched word (case-insensitively).
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
}
```

If `token` is a function and the regex has groups, the function should take one argument per group and return an array of tokens.

For grouped regular expressions, `token` can be a String, in which case all matched groups are given that same token, like this:

```javascript
{
    token: "identifier",
    regex: "(\\w+\\s*:)(\\w*)"
}
```

More commonly, though, `token` is an Array with one entry per group, and each group receives the token at the same position in the array. For a more complex regular expression, such as one matching a function definition, that might look something like this:

```javascript
{
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
```

## Defining States

The tokenizer stays in the `start` state until a matching rule specifies a `next` state to advance to. From then on, it stays in that new state until another rule's `next` moves it elsewhere. Eventually, some rule should lead back to the original `start` state.
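Because the tokenizer's state carries over from one line to the next, states are also how multi-line constructs such as block comments get highlighted. The following hypothetical sketch (plain JavaScript, not Ace's implementation; `tokenizeLines` and `defaultTokenOf` are illustrative names) follows each matching rule's `next` and starts every line in the state the previous line ended in. Characters that no rule matches receive the state's `defaultToken`, or `text` if there is none. The `/* ... */` rule set is only an example.

```javascript
// Hypothetical, simplified sketch; NOT Ace's real tokenizer.
// Returns the defaultToken of a state, or "text" if it has none.
function defaultTokenOf(stateRules) {
    for (var i = 0; i < stateRules.length; i++) {
        if (stateRules[i].defaultToken) return stateRules[i].defaultToken;
    }
    return "text";
}

// Tokenizes several lines; each line starts in the state the previous one
// ended in, and a matching rule's `next` switches the current state.
function tokenizeLines(rules, lines) {
    var state = "start";
    return lines.map(function(line) {
        var startState = state;
        var tokens = [];
        var pos = 0;
        while (pos < line.length) {
            var stateRules = rules[state];
            var token = null;
            for (var i = 0; i < stateRules.length && !token; i++) {
                var rule = stateRules[i];
                if (rule.regex === undefined) continue; // e.g. { defaultToken: ... }
                var re = new RegExp(rule.regex, "y");   // sticky: match at `pos` only
                re.lastIndex = pos;
                var m = re.exec(line);
                if (m && m[0].length > 0) {
                    token = { type: rule.token, value: m[0] };
                    if (rule.next) state = rule.next;   // state transition
                }
            }
            if (!token) {
                // Nothing matched: this character gets the state's default token.
                token = { type: defaultTokenOf(stateRules), value: line.charAt(pos) };
            }
            // Merge adjacent tokens of the same type for readability.
            var last = tokens[tokens.length - 1];
            if (last && last.type === token.type) last.value += token.value;
            else tokens.push(token);
            pos += token.value.length;
        }
        return { startState: startState, tokens: tokens, endState: state };
    });
}

// Illustrative rules for /* ... */ block comments that may span lines.
var rules = {
    "start": [
        { token: "comment", regex: "\\/\\*", next: "comment" },
        { token: "keyword", regex: "\\bvar\\b" }
    ],
    "comment": [
        { token: "comment", regex: "\\*\\/", next: "start" },
        { defaultToken: "comment" }
    ]
};

var result = tokenizeLines(rules, ["var a; /* one", "two */ var b;"]);
result.forEach(function(r) {
    console.log(r.startState + " -> " + r.endState + ": " + JSON.stringify(r.tokens));
});
```

The first line ends in the `comment` state, so the second line starts there: `two */` is still styled as a comment, and only the `var` after `*/` is a keyword again.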
Here's an example:

```javascript
this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],

    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};
```

In this extremely short sample, we define highlighting rules for when Ace detects a `<![CDATA[` section. When one is encountered, the tokenizer moves from `start` into the `cdata` state. It remains there, applying the `defaultToken` (`text`) to anything no other rule matches. Finally, when it hits the closing `]]>` sequence, it returns to the `start` state and continues tokenizing the rest of the document.

# Extending Highlighters

Suppose you're working on Lua Pages, PHP embedded in HTML, or a Django template. You'll need a syntax highlighter that combines the rules of the host language (HTML) with those of the embedded language (Lua, PHP, or Python), switching between them at delimiters such as `<?lua`, `<?php`, or `{%`. Ace makes this easy with a few helper functions.

## Getting Existing Rules

To get the existing syntax highlighting rules for a particular language, create an instance of its highlight rules and call `getRules()` on it. For example:

```javascript
var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;

this.$rules = new HtmlHighlightRules().getRules();

/*
    this.$rules now holds the same rules as the HTML highlighter
*/
```

## Extending a Highlighter

The `addRules()` method does one thing, and it does it well: it adds new rules to an existing rule set, prefixing each of the new states with a given string. Any string `next` values inside the added rules receive the same prefix, so the added states keep pointing at each other. For example, say you have two sets of rules, defined like this:

```javascript
this.$rules = {
    "start": [ /* ... */ ]
};

var newRules = {
    "start": [ /* ... */ ]
};
```

If you want to incorporate `newRules` into `this.$rules`, you'd do something like this:

```javascript
this.addRules(newRules, "new-");

/*
    this.$rules = {
        "start": [ ... ],
        "new-start": [ ... ]
    };
*/
```

## Extending Two Highlighters

The last function available to you combines both of these concepts, and it's called `embedRules`. It takes three parameters:

1. The highlight rules to embed (in the example below, the `LuaHighlightRules` constructor)
1. A prefix to apply to each state of the embedded rule set
1. A list of rules to add to each of those prefixed states, typically the rules that leave the embedded language

Like `addRules`, `embedRules` adds on to the existing `this.$rules` object.

To see this in practice, let's look at the syntax highlighter for Lua pages, which combines all of these concepts:

```javascript
var oop = require("../lib/oop");
var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    // Start from the HTML rules.
    this.$rules = new HtmlHighlightRules().getRules();

    // In every HTML state, switch to Lua on "<%", "<%=", "<?lua" or "<?lua=".
    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<\\%\\=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua\\=?",
            next: "lua-start"
        });
    }
    // Add Lua's states with a "lua-" prefix; each also gets rules
    // that return to HTML on "%>" or "?>".
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "\\%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};

// Inheriting provides getRules(), addRules() and embedRules().
oop.inherits(LuaPageHighlightRules, HtmlHighlightRules);
exports.LuaPageHighlightRules = LuaPageHighlightRules;
```

Here, `this.$rules` starts off as a set of HTML highlighting rules. To every state in this set, we prepend two new checks: one for `<%` (or `<%=`) and one for `<?lua` (or `<?lua=`). If either of these rules matches, the tokenizer moves on to the `lua-start` state. Next, `embedRules` takes the existing set of `LuaHighlightRules` and applies the `lua-` prefix to each of its states. Finally, it adds two new checks for `%>` and `?>` to those states, allowing the state machine to return to `start`.

# Testing Your Highlighter

The best way to test your tokenizer is to see it live. To do that, create a new Cloud9 bundle and add your highlighter to it. See [this guide](doc:bundle-highlighter) for more information.

# Adding Automated Tests

Adding automated tests for a highlighter is optional, but it is easy to do and can help during development.
In `lib/ace/mode/_test`, create a file named

```
text_<modeName>.txt
```

containing some example code. (You can skip this step if the document you have added in `demo/docs` both looks good and covers various edge cases of your language's syntax.)

Run `node highlight_rules_test.js -gen` to save the current output of your tokenizer in `tokens_<modeName>.json`.

After that, running `node highlight_rules_test.js optionalLanguageName` compares the output of your tokenizer with the reference output you've saved.
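The two commands follow a golden-file pattern: `-gen` records the tokenizer's current output, and later runs compare against that record. The sketch below illustrates only the comparison idea in plain JavaScript. It is not the code of `highlight_rules_test.js`; the function name `firstMismatch` and the token format (one array of `{ type, value }` objects per source line) are assumptions.

```javascript
// Hypothetical sketch of a golden-file check; NOT highlight_rules_test.js.
// `expected` and `actual` hold one entry per source line, each entry being
// that line's array of { type, value } tokens.
function firstMismatch(expected, actual) {
    var count = Math.max(expected.length, actual.length);
    for (var i = 0; i < count; i++) {
        // JSON comparison assumes tokens are built with the same key order.
        var want = JSON.stringify(expected[i]);
        var got = JSON.stringify(actual[i]);
        if (want !== got) {
            return { line: i + 1, expected: want, actual: got };
        }
    }
    return null; // outputs are identical
}

// Example: the saved output says "var" is a keyword, the new output does not.
var saved = [[{ type: "keyword", value: "var" }, { type: "text", value: " a;" }]];
var current = [[{ type: "identifier", value: "var" }, { type: "text", value: " a;" }]];

var diff = firstMismatch(saved, current);
if (diff) {
    console.log("Line " + diff.line + " differs:\n  expected " + diff.expected +
                "\n  actual   " + diff.actual);
}
```

Reporting the first differing line, rather than just pass or fail, makes it quicker to see which rule change altered the output.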