For instance, the example of this article (`type` as a keyword vs. `type` as a function) would probably have worked with font-lock-mode as well because you could distinguish the two cases from whether or not a left parenthesis follows the token. But, of course, without proper parsing, there's always the possibility of edge cases that you cannot resolve correctly.
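To make the lookahead idea concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is not font-lock code: the helper name `classify_type_tokens` and the regex are assumptions made up for illustration. It classifies each `type` token only by whether a left parenthesis follows, which is roughly what a regexp-based rule could do.

```python
import re

# Hypothetical rule: the word `type`, optionally followed by whitespace
# and a left parenthesis. The optional group tells us which case we hit.
TYPE_TOKEN = re.compile(r"\btype\b(?P<call>\s*\()?")


def classify_type_tokens(line: str) -> list[tuple[int, str]]:
    """Return (column, role) pairs for every `type` token in `line`.

    The role is "function" when a left parenthesis follows the token,
    otherwise "keyword". No real parsing happens here.
    """
    roles = []
    for match in TYPE_TOKEN.finditer(line):
        role = "function" if match.group("call") else "keyword"
        roles.append((match.start(), role))
    return roles
```

The heuristic handles `type Point = tuple[int, int]` and `print(type(x))` as you would hope, but it also reports a function call for the `type` inside the string literal in `s = "type (of thing)"`. That is the kind of edge case a regexp cannot resolve without knowing it is inside a string.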
The interesting cases arise anyway when whatever you have in your buffer does not adhere to the grammar, i.e. you have a syntax error: how does then your syntax highlighter cope with that?
The rules for indenting are actually implemented differently, even though they also involve some kind of parse. And it's not unusual to have to cache context information about the current line, for performance, so that you don't have to scan back through preceding lines every time until you're satisfied you have enough context to indent the current line. Functions that indent multiple lines at once can of course carry this context along from line to line without having to annotate the buffer.
> you have a syntax error: how does then your syntax highlighter cope with that?
I wrote (but didn't release) an all-new language-specific incremental fast parser for Emacs that recovered from some syntax errors. My general approach was to pick a region of text that included the obvious syntax error, visually highlight it in red, annotate it so that a mouseover would hover an explanation bubble of what's wrong with it, and then continue the parse assuming some reasonable context. You can see screenshots at:
https://www.neilvandyke.org/quack/#meow
For example, for an unterminated string literal, it would error-highlight the opening quote and subsequent characters up to the first whitespace. For another example, a string literal with an invalid escape sequence would error-highlight the entire string literal up through the closing quote. Another example shown is detecting a character that can't occur in that context (a close-paren immediately after a comment-the-following-s-expression).
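As a rough illustration of the first two recovery rules (this is not the unreleased parser, and it is Python rather than Emacs Lisp), the sketch below scans for double-quoted strings and returns error spans. The names `ErrorSpan` and `scan_string_errors` and the set of valid escapes are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Assumed set of valid escape characters, for illustration only.
VALID_ESCAPES = set('nt\\"')


@dataclass
class ErrorSpan:
    start: int    # inclusive offset of the region to highlight
    end: int      # exclusive offset
    message: str  # text for the explanation bubble


def _closing_quote(text: str, start: int) -> int:
    """Index of the quote closing the string opened at `start`, or -1."""
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
        elif text[i] == '"':
            return i
        else:
            i += 1
    return -1


def _has_invalid_escape(body: str) -> bool:
    """True if `body` contains a backslash followed by an unknown character."""
    j = 0
    while j < len(body):
        if body[j] == "\\":
            if j + 1 >= len(body) or body[j + 1] not in VALID_ESCAPES:
                return True
            j += 2
        else:
            j += 1
    return False


def scan_string_errors(text: str) -> list[ErrorSpan]:
    errors = []
    i = 0
    while i < len(text):
        if text[i] != '"':
            i += 1
            continue
        close = _closing_quote(text, i)
        if close == -1:
            # Unterminated: flag the quote and what follows, up to the
            # first whitespace, then resume scanning from there.
            end = i + 1
            while end < len(text) and not text[end].isspace():
                end += 1
            errors.append(ErrorSpan(i, end, "unterminated string literal"))
            i = end
            continue
        if _has_invalid_escape(text[i + 1:close]):
            # Terminated but malformed: flag the whole literal.
            errors.append(ErrorSpan(i, close + 1, "invalid escape sequence"))
        i = close + 1
    return errors
```

The point of the design is that each error produces a bounded region plus a message, and the scan resumes right after that region instead of giving up on the rest of the buffer.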
I just updated my page to acknowledge that there's a different project with that name, and I will rename my unreleased project.
(I'd mentioned Meow online several times, years ago, but it's understandable that they wouldn't have been aware of it, and I have no claim to the name, anyway. Not only was my project never released, but the community where I mostly mentioned it had/has a problem with many posts from our Google Group no longer showing up in Google search hits.)
> I like your naming scheme of using animal sounds,
It originally wasn't. :) The developers of the Scheme implementation family that's now called Racket developed a bespoke IDE for students, called DrScheme (as in doctor), which did some fancy things. For my much less fancy Emacs kludges, I named it "Quack", as in a fake doctor. The animal sounds only came when I needed a name for the successor to Quack.
Only half joking
Unfortunately, it wasn't as straightforward as I hoped. You need to create a custom major mode for your language and manually integrate the tree-sitter highlighting.
What I'd really like to see one day is an Emacs mode that allows you to automatically plug in any tree-sitter grammar with just a couple of lines of configuration in your .emacs, and instantly get syntax highlighting. Is that too much to ask?
Then when the file goes into tree-sitter-mode, you can check the filetype again, and map that into the language to load into tree-sitter. Keep a buffer-local variable to remember that current language, so that you can use it for any additional language-specific customization that you want as well.
Keep in mind that there's nothing about a major mode in Emacs that has to be specific to a programming language. It's totally cool to have a major mode that works for multiple programming languages!
*Update:* I found the code. Instead of using font-lock, we simply draw our own overlays over the text and color them, which gets the same visual result as font-lock (font-lock itself applies `face` text properties rather than overlays). Font-lock IIRC is designed mainly around regexps for matching the text. We don't need that. So throw font-lock away. Tree-Sitter itself knows how the parse tree maps to text regions. Just use that information to draw the overlays directly. It's way simpler that way.
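To be clear, the following is not the code mentioned above. It is only a small Python sketch of the general idea, with a made-up `Node` class standing in for a parse tree node and hypothetical face names. It walks the tree and emits one `(start, end, face)` span per node kind we care about; an editor would then turn each span into an overlay.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a parse tree node with a text range."""
    kind: str
    start: int  # inclusive offset
    end: int    # exclusive offset
    children: list["Node"] = field(default_factory=list)


# Hypothetical mapping from node kind to a face (color) name.
FACE_FOR_KIND = {
    "keyword": "keyword-face",
    "string": "string-face",
    "comment": "comment-face",
}


def highlight_spans(root: Node) -> list[tuple[int, int, str]]:
    """Collect (start, end, face) spans in document order."""
    spans = []
    stack = [root]
    while stack:
        node = stack.pop()
        face = FACE_FOR_KIND.get(node.kind)
        if face is not None:
            spans.append((node.start, node.end, face))
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return spans
```

No regexps are involved: the ranges come straight from the tree, so the only language-specific part is the mapping from node kinds to faces.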
Lemme know if anyone wants the code.
My elisp skills are basic, so I could help more with testing, coordination, publishing packages, or documentation.
Shall I create the repo?
(A bit reductionist of his many accomplishments in between, I know; it's just a thing that hit me in the moment.)