multi-character syntactic entities in syntax tables

unofficial mirror of emacs-devel@gnu.org 

* multi-character syntactic entities in syntax tables
@ 2013-04-26 17:28 Erik Charlebois
  2013-04-26 18:53 ` Dmitry Gutov
  2013-04-26 19:26 ` Stefan Monnier
  0 siblings, 2 replies; 7+ messages in thread
From: Erik Charlebois @ 2013-04-26 17:28 UTC
  To: emacs-devel


One of the items in etc/TODO is:

** Beefed-up syntax-tables.
*** recognize multi-character syntactic entities like `begin' and `end'.

Lately I've been using languages where this would be quite useful, and I
would be interested in adding support. Before I dive in, are there any
strong opinions about how this should be implemented?

The approach I was thinking of taking is defining a new syntax character
class (let's say, *) which inherits from the previous character
(recursively if the previous character is *). The important distinction is
that they would not be treated as a new instance of that syntax class, so
point movement by syntax class or paren matching would work (e.g. begin
would be (****, and would only add 1 level of paren nesting).

A mode would use a syntax-propertize-function to tag keywords with
appropriate text properties. So something like Ruby:

class Foo
  def Bar
    if condition
      ...
    end
  end
end

would have syntax classes like:

(**** www
  (** www
    (* wwwwwwwww
       ...
    )**
  )**
)**
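
To make the labelling above concrete, here is a small Python model of it
(not Emacs code). The keyword sets, the function names and the "*" class
itself are assumptions made for illustration; a real mode would also have
to deal with strings, comments and modifier forms such as a trailing `if'.

```python
import re

# Hypothetical keyword sets for a Ruby-like language.
OPENERS = {"class", "def", "if", "begin"}
CLOSERS = {"end"}


def _gap(text):
    """Classify non-identifier characters: ' ' for whitespace, '.' otherwise."""
    return "".join(" " if ch.isspace() else "." for ch in text)


def label(line):
    """Return one syntax-class character per character of LINE.

    '(' or ')' marks the first character of a block keyword, '*' marks a
    character that inherits the class of the previous one, and 'w' marks
    a word constituent.
    """
    out = []
    pos = 0
    for m in re.finditer(r"[A-Za-z_]+", line):
        out.append(_gap(line[pos:m.start()]))
        word = m.group()
        if word in OPENERS:
            out.append("(" + "*" * (len(word) - 1))
        elif word in CLOSERS:
            out.append(")" + "*" * (len(word) - 1))
        else:
            out.append("w" * len(word))
        pos = m.end()
    out.append(_gap(line[pos:]))
    return "".join(out)


def depth_after(classes, depth=0):
    """Count nesting; '*' never opens or closes a level by itself."""
    for c in classes:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
    return depth
```

With these definitions, label("class Foo") yields "(**** www", and the
seven-line example nests three levels deep and returns to depth zero.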



* Re: multi-character syntactic entities in syntax tables
  2013-04-26 17:28 multi-character syntactic entities in syntax tables Erik Charlebois
@ 2013-04-26 18:53 ` Dmitry Gutov
  2013-04-26 19:22   ` Erik Charlebois
  2013-04-26 19:26 ` Stefan Monnier
  1 sibling, 1 reply; 7+ messages in thread
From: Dmitry Gutov @ 2013-04-26 18:53 UTC
  To: Erik Charlebois; +Cc: emacs-devel

Erik Charlebois <erikcharlebois@gmail.com> writes:

> One of the items in etc/TODO is:
>
> ** Beefed-up syntax-tables.
> *** recognize multi-character syntactic entities like `begin' and
> `end'.
>
> Lately I'm using languages where this would be quite useful and would
> be interested in adding support. Before I dive in, are there any
> strong opinions about how this should be implemented?
>
> The approach I was thinking of taking is defining a new syntax
> character class (let's say, *) which inherits from the previous
> character (recursively if the previous character is *). The important
> distinction is that they would not be treated as a new instance of
> that syntax class, so point movement by syntax class or paren matching
> would work (e.g. begin would be (****, and would only add 1 level of
> paren nesting).
>
> A mode would use a syntax-propertize-function to tag keywords with
> appropriate text properties. So something like Ruby:
>
> class Foo
>   def Bar
>     if condition
>       ...
>     end
>   end
> end

ruby-mode code could definitely benefit from something like this.

> would have syntax classes like:
>
> (**** www
>   (** www
>     (* wwwwwwwww
>        ...
>     )**
>   )**
> )**

I don't think using syntax-propertize-function is what the person who
wrote that TODO entry had in mind. But if we use it for that purpose,
then at least in ruby-mode it should suffice to implement something like
a "generic parenthesis" class (which would work similarly to the generic
string and generic comment delimiters), since all non-curly blocks in
Ruby end the same way.

So, what's the rationale for your more complex proposal? In what
context would treating e, g, i and n in "begin" as parenthesis openers
be useful?
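
One possible reading of the "generic parenthesis" idea, modelled in
Python rather than Emacs Lisp: every block keyword is a generic opener,
`end' is the one generic closer, and an opener pairs with the nearest
unmatched closer regardless of which keyword it was. The token sets and
the function name are assumptions for this sketch, and it assumes the
class distinguishes opening from closing delimiters.

```python
# Hypothetical classification: all block keywords open, only `end` closes.
GENERIC_OPEN = {"class", "def", "if", "begin", "while"}
GENERIC_CLOSE = {"end"}


def match_blocks(tokens):
    """Return (opener_index, closer_index) pairs for TOKENS.

    Raises ValueError on an unmatched opener or closer.
    """
    stack = []
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in GENERIC_OPEN:
            stack.append(i)
        elif tok in GENERIC_CLOSE:
            if not stack:
                raise ValueError(f"unmatched closer at token {i}")
            # Any opener matches any closer: no keyword comparison needed.
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched opener at token {stack[-1]}")
    return pairs
```

Because only one closer exists, the matcher never has to compare the
text of the opener with the text of the closer, which is what makes a
single generic class plausible for Ruby.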




* Re: multi-character syntactic entities in syntax tables
  2013-04-26 18:53 ` Dmitry Gutov
@ 2013-04-26 19:22   ` Erik Charlebois
  2013-04-26 20:57     ` Dmitry Gutov
  0 siblings, 1 reply; 7+ messages in thread
From: Erik Charlebois @ 2013-04-26 19:22 UTC
  To: Dmitry Gutov; +Cc: emacs-devel


Off the top of my head, point motion (e.g. forward-word should skip the
entire word, not stop where the syntax class changes from "(" to "w") and
font locking (show-paren-mode should highlight the entire matching words).

Since the matching keyword lengths can be different (begin vs end), my
understanding is I can't just turn them into ((((( and ))) because it
doesn't balance.
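
A quick Python check of that arithmetic (a model only; the "*" class is
the hypothetical one from my proposal):

```python
def net_depth(classes):
    """Openers minus closers; every other class is ignored."""
    return classes.count("(") - classes.count(")")


# Every character of the keyword tagged as a paren: "begin" vs "end".
naive = "(" * len("begin") + " " + ")" * len("end")

# Only the first character tagged, the rest inheriting via "*".
inherit = "(" + "*" * (len("begin") - 1) + " " + ")" + "*" * (len("end") - 1)
```

Here net_depth(naive) is 2 (five openers against three closers), while
net_depth(inherit) is 0.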

Currently I have some hacks for Ruby mode that make the first character
of each block keyword have "(" or ")" syntax class. It works fine, aside
from point motion and font locking.






* Re: multi-character syntactic entities in syntax tables
  2013-04-26 17:28 multi-character syntactic entities in syntax tables Erik Charlebois
  2013-04-26 18:53 ` Dmitry Gutov
@ 2013-04-26 19:26 ` Stefan Monnier
  2013-04-26 21:37   ` Erik Charlebois
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Monnier @ 2013-04-26 19:26 UTC
  To: Erik Charlebois; +Cc: emacs-devel

> One of the items in etc/TODO is:
> ** Beefed-up syntax-tables.
> *** recognize multi-character syntactic entities like `begin' and `end'.

> Lately I'm using languages where this would be quite useful and would be
> interested in adding support. Before I dive in, are there any strong
> opinions about how this should be implemented?

> The approach I was thinking of taking is defining a new syntax character
> class (let's say, *) which inherits from the previous character
> (recursively if the previous character is *). The important distinction is
> that they would not be treated as a new instance of that syntax class, so
> point movement by syntax class or paren matching would work (e.g. begin
> would be (****, and would only add 1 level of paren nesting).

I see.  So you'd rely on syntax-propertize-function to recognize those
multi-char entities and label them with one of the current syntaxes for
the first char and "*" for the other ones, thus labelling the symbol as
forming a single entity.

That's interesting.  The main drawback I see with it is the heavy
reliance on syntax-propertize, which can imply a significant performance
cost when jumping to the end of a largish buffer (forcing the whole
buffer to be lexed).
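
The cost can be pictured with a toy Python model of lazy propertizing:
properties are only guaranteed up to a high-water mark, and asking about
a later position forces everything in between to be processed. The class
and attribute names are invented for this sketch; it is not how Emacs
implements it.

```python
class LazyPropertizer:
    """Tracks how much of a buffer has been lexed so far."""

    def __init__(self, text):
        self.text = text
        self.done = 0   # positions below this already carry properties
        self.work = 0   # total number of characters lexed so far

    def ensure(self, pos):
        """Make sure properties are applied up to POS; return new work done."""
        pos = min(pos, len(self.text))
        if pos <= self.done:
            return 0            # already covered, nothing to do
        added = pos - self.done
        self.work += added      # stands in for running the lexer
        self.done = pos
        return added
```

For a 100000-character buffer that has only been displayed at the top,
a jump to the end makes ensure() process nearly all 100000 characters in
one go, whereas later calls for earlier positions cost nothing.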

But it sounds like an attractive "easy" way to extend syntax tables to
support multi-char entities.

BTW: have you tried to set forward-sexp-function to something like
ruby-forward-sexp?

        Stefan


* Re: multi-character syntactic entities in syntax tables
  2013-04-26 19:22   ` Erik Charlebois
@ 2013-04-26 20:57     ` Dmitry Gutov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2013-04-26 20:57 UTC
  To: Erik Charlebois; +Cc: emacs-devel

On 26.04.2013 23:22, Erik Charlebois wrote:
> Off the top of my head, point motion (e.g. forward-word should skip the
> entire word, not stop where the syntax class changes from "(" to "w")
> and font locking (show-paren-mode should highlight the entire matching
> words).

But `forward-word' will jump to the end of the next word then, since 
"begin" as a whole would have non-word syntax.

> Since the matching keyword lengths can be different (begin vs end), my
> understanding is I can't just turn them into ((((( and ))) because it
> doesn't balance.

Yes, that's not what my suggestion was.

> Currently I have some hacks for Ruby mode that makes the first
> characters of the block keywords have "(" or ")" syntax class. It works
> fine, aside from point motion and font locking.

I would've put ")" at the end of "end", but otherwise, that's it. I
didn't expect it to work with the current open/close paren syntax
classes, though: "string quote" syntax, in similar circumstances, doesn't.

The problem with font-lock is important. Maybe we need a syntax class
that would mark text as a symbol (or word) constituent (or delegate to
the buffer syntax table), and at the same time mark it as a
parenthesizing construct.
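
A rough Python model of such a two-part class, to show that word motion
and paren counting could then look at different fields. All names here
are hypothetical, and counting one paren per run of equally tagged
characters is just one possible rule.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Syntax:
    constituent: str             # 'w' for word, ' ' for whitespace
    paren: Optional[str] = None  # 'open', 'close', or None


def build(text, openers=("begin",), closers=("end",)):
    """Tag each character of TEXT; keywords stay words but gain a paren role."""
    props = []
    for token in re.split(r"(\s+)", text):
        if not token:
            continue
        if token.isspace():
            syn = Syntax(" ")
        elif token in openers:
            syn = Syntax("w", "open")
        elif token in closers:
            syn = Syntax("w", "close")
        else:
            syn = Syntax("w")
        props.extend([syn] * len(token))
    return props


def forward_word(props, pos):
    """Return the position after the next run of word constituents."""
    n = len(props)
    while pos < n and props[pos].constituent != "w":
        pos += 1
    while pos < n and props[pos].constituent == "w":
        pos += 1
    return pos


def paren_depth(props):
    """Count one paren per maximal run of characters with the same role."""
    depth, prev = 0, None
    for p in props:
        if p.paren != prev:
            if p.paren == "open":
                depth += 1
            elif p.paren == "close":
                depth -= 1
        prev = p.paren
    return depth
```

For "begin x end", forward_word from position 0 stops at 5 (the end of
"begin"), and paren_depth of the whole text is 0.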


* Re: multi-character syntactic entities in syntax tables
  2013-04-26 19:26 ` Stefan Monnier
@ 2013-04-26 21:37   ` Erik Charlebois
  2013-04-27  4:15     ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Erik Charlebois @ 2013-04-26 21:37 UTC
  To: Stefan Monnier; +Cc: emacs-devel


ruby-forward-sexp works fine. What I'm after is more things like % (under
vim emulation; I'm not sure what the Emacs equivalent is), which jumps to
the matching brace, and show-paren-mode.





* Re: multi-character syntactic entities in syntax tables
  2013-04-26 21:37   ` Erik Charlebois
@ 2013-04-27  4:15     ` Stefan Monnier
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2013-04-27  4:15 UTC
  To: Erik Charlebois; +Cc: emacs-devel

> ruby-forward-sexp works fine. It's more for things like % (under vim
> emulation, I'm not sure what the Emacs-equivalent is) to jump to the
> matching brace and show-paren-mode.

For modes that use SMIE, the paren-blinking works on those keywords as
well (using a post-self-insert-hook).

BTW, I think Ruby's syntax should work well with SMIE.


        Stefan


