One of the items in etc/TODO is:

** Beefed-up syntax-tables.

*** recognize multi-character syntactic entities like `begin' and `end'.

Lately I've been using languages where this would be quite useful, and I would be interested in adding support. Before I dive in, are there any strong opinions about how this should be implemented?

The approach I was thinking of taking is to define a new syntax class (let's say, *) that inherits its class from the previous character (recursively, if the previous character is also *). The important distinction is that a * character would not count as a new instance of the inherited class, so point movement by syntax class and paren matching would still work (e.g. begin would be (****, and would add only 1 level of paren nesting).
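To make the intended semantics concrete, here is a rough Python model of the idea. It is not Emacs code, and the names (resolve_classes, nesting_depths) are made up for illustration. It assumes a string of syntax-class characters in which * means "continue the previous character's class".

```python
OPEN, CLOSE, CONT = "(", ")", "*"  # CONT is the hypothetical new class


def resolve_classes(classes):
    """Return the class each character inherits, resolving * recursively."""
    resolved = []
    for c in classes:
        if c == CONT and resolved:
            # Inherit from the previous character, which is already resolved,
            # so a run of * characters all resolve to the same class.
            resolved.append(resolved[-1])
        else:
            resolved.append(c)
    return "".join(resolved)


def nesting_depths(classes):
    """Return the paren depth after each character.

    Only a real ( or ) changes the depth; a * continuation never does,
    even though it inherits the paren class for movement purposes.
    """
    depth = 0
    depths = []
    for c in classes:
        if c == OPEN:
            depth += 1
        elif c == CLOSE:
            depth -= 1
        depths.append(depth)
    return depths
```

In this model, resolve_classes("(****") gives "(((((", so movement by syntax class would skip the whole keyword, while nesting_depths("(**** www )**") never rises above 1.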

A mode would use a syntax-propertize-function to tag keywords with the appropriate syntax-table text properties. So a Ruby fragment like:

class Foo
  def Bar
    if condition
      ...
    end

would have syntax classes like:

(**** www
  (** www
    (* wwwwwwwww
      ...
    )**
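
The tagging step could be modelled roughly as follows. Again, this is a Python sketch rather than an actual syntax-propertize-function. The keyword table and the tag_line helper are hypothetical, and it makes two simplifying assumptions: every other identifier is word syntax (w), and strings, comments and modifier forms such as "foo if bar" are ignored.

```python
import re

# Hypothetical keyword table: keyword -> class of its first character.
KEYWORD_CLASSES = {"class": "(", "def": "(", "if": "(", "end": ")"}

TOKEN = re.compile(r"[A-Za-z_]\w*")


def tag_line(line):
    """Return the syntax-class string for one line of source text.

    Keywords get their paren class followed by * continuations; other
    identifiers become runs of w. Whitespace and punctuation are copied
    through unchanged (a simplification).
    """
    out = list(line)
    for m in TOKEN.finditer(line):
        word = m.group()
        if word in KEYWORD_CLASSES:
            tagged = KEYWORD_CLASSES[word] + "*" * (len(word) - 1)
        else:
            tagged = "w" * len(word)
        # Same length as the matched word, so later offsets stay valid.
        out[m.start():m.end()] = tagged
    return "".join(out)
```

With this table, tag_line("class Foo") returns "(**** www" and tag_line("end") returns ")**", matching the classes listed above.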