One of the items in etc/TODO is:

** Beefed-up syntax-tables.
*** recognize multi-character syntactic entities like `begin' and `end'.

Lately I've been using languages where this would be quite useful, and I would be interested in adding support. Before I dive in, are there any strong opinions about how this should be implemented?

The approach I was thinking of taking is to define a new syntax class (let's say, *) which inherits the class of the previous character (recursively, if the previous character is also *). The important distinction is that a * character would not be treated as a new instance of the inherited class, so point movement by syntax class and paren matching would still work (e.g. `begin' would be (****, and would add only 1 level of paren nesting).
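
To make the intended semantics concrete, here is a rough model in Ruby. It is only a sketch of the idea, not Emacs code: the string-of-descriptors representation and the names resolve_classes and max_nesting are made up for illustration, and a leading * with nothing to inherit from is arbitrarily treated as whitespace.

```ruby
# Hypothetical model of the proposed '*' class. Each character of
# `classes` is a syntax descriptor: '(' open, ')' close, 'w' word,
# ' ' whitespace, '*' "inherit from the previous character".
CONTINUATION = '*'

# Returns one [effective_class, new_instance] pair per character.
# Remembering the last non-'*' class handles runs of '*' (the
# "recursive" case) without actual recursion.
def resolve_classes(classes)
  previous = nil
  classes.each_char.map do |c|
    if c == CONTINUATION
      # Assumption: a '*' with nothing before it acts as whitespace.
      [previous || ' ', false]
    else
      previous = c
      [c, true]
    end
  end
end

# Deepest paren nesting reached, counting only new instances, so a
# keyword tagged "(****" opens one level rather than five.
def max_nesting(classes)
  depth = 0
  max = 0
  resolve_classes(classes).each do |klass, new_instance|
    next unless new_instance
    depth += 1 if klass == '('
    depth -= 1 if klass == ')'
    max = depth if depth > max
  end
  max
end
```

With this model, max_nesting("(**** www") is 1, while every character of the keyword still resolves to the open-paren class.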

A mode would use a syntax-propertize-function to tag keywords with the appropriate text properties. So something like this Ruby code:

class Foo
  def Bar
    if condition
      ...
    end
  end
end

would have syntax classes like:

(**** www
  (** www
    (* wwwwwwwww
      ...
    )**
  )**
)**
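
The mapping above can be reproduced mechanically. The following Ruby sketch is only an illustration of the tagging step, not what a real syntax-propertize-function would look like: tag_line, OPENERS and CLOSERS are hypothetical names, and the keyword lists cover just the keywords in the example.

```ruby
# Hypothetical keyword lists, limited to the example above.
OPENERS = %w[class def if].freeze
CLOSERS = %w[end].freeze

# Replace each identifier in `line` with its syntax descriptors:
# block keywords become a paren followed by '*' continuation
# characters, any other identifier becomes word constituents.
# Whitespace and punctuation are left untouched.
def tag_line(line)
  line.gsub(/[A-Za-z_]\w*/) do |word|
    if OPENERS.include?(word)
      '(' + '*' * (word.length - 1)
    elsif CLOSERS.include?(word)
      ')' + '*' * (word.length - 1)
    else
      'w' * word.length
    end
  end
end
```

For example, tag_line("class Foo") gives "(**** www" and tag_line("  end") gives "  )**". A real mode would need more context than this, since a trailing modifier such as `x if y' does not open a block.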