font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?

unofficial mirror of emacs-devel@gnu.org 

* font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
@ 2012-09-21 20:48 immerrr again...
  2012-09-25  1:03 ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: immerrr again... @ 2012-09-21 20:48 UTC
  To: emacs-devel

Hi all

I'm hacking on lua-mode in my spare time, and one thing that has bothered
me a lot is Lua's long-bracket constructs. For those who don't know what
I'm talking about, here's a short recap:

. A _long bracket of level N_ consists of two square brackets with N
   equals signs between them, N >= 0.

. An _opening long bracket_ has two _opening square brackets_, e.g. "[[",
   "[=[", "[===[".

. A _closing long bracket_ has two _closing square brackets_, e.g. "]]",
   "]=]", "]===]".

. A _long string_ starts with an opening long bracket of any level and
   ends at the first closing long bracket of the same level.

. A comment starts with a double hyphen "--" anywhere outside a string.
   If the text immediately after "--" is not an opening long bracket, the
   comment is a _short comment_, which runs until the end of the line.
   Otherwise, it is a _long comment_, which runs until the corresponding
   closing long bracket.

Here are some characteristic situations for the rules above:

. code [[ string ]] code

. code -- comment till EOL

. code --[[ comment ]] code

. code -- [[ comment till EOL ]]
   because '--' is followed by space

. code ---[[ comment till EOL ]]
   because '--' is followed by '-'

. code [===[ string   ]=] string --[=[ string ]===] code

. code [===[ string ]===]  code  --[=[ comment ]=] code
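
To make the same-level rule concrete, here is a minimal Emacs Lisp sketch
that works on plain strings rather than buffers. The my-lua-* names are
hypothetical helpers made up for this illustration; they are not part of
lua-mode.

```elisp
;; Illustration only: the my-lua-* names are hypothetical, not lua-mode API.

(defun my-lua-closing-long-bracket (level)
  "Return the closing long bracket of LEVEL, e.g. \"]==]\" for 2."
  (concat "]" (make-string level ?=) "]"))

(defun my-lua-long-string-end (text)
  "Return the index just past the long string that starts TEXT.
Return nil if TEXT does not start with an opening long bracket, or if
the closing long bracket of the same level is missing."
  (when (string-match "\\`\\[\\(=*\\)\\[" text)
    (let ((level (length (match-string 1 text)))
          (body-start (match-end 0)))
      ;; Only a closer with exactly LEVEL equals signs ends the string.
      (when (string-match (regexp-quote (my-lua-closing-long-bracket level))
                          text body-start)
        (match-end 0)))))
```

Given "[==[ a ]=] b ]==] c", the level-1 closer in the middle is skipped
and the string ends right after "]==]".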

Obviously, Emacs character syntax flags are not enough to describe that.
Currently, I'm trying to use `font-lock-syntactic-keywords' with the
following rule to capture both long comments and long strings:

1.    `(,(rx
2.        (or (seq (or line-start (not (any "-")))
3.                 (group-n 1 "-") "-[" (group-n 5 (0+ "=")))
4.            (seq (group-n 3 "[")      (group-n 6 (0+ "="))))

5.        "[" (minimal-match (0+ anything)) "]"

6.        (or (seq (backref 5) (group-n 2 "]"))
7.            (seq (backref 6) (group-n 4 "]"))))

8.     (1 "!" nil t) (2 "!" nil t)
9.    (3 "|" nil t) (4 "|" nil t))

The construct is probably not obvious, so I'll elaborate a little bit.

Lines 2-4 match the opening bracket, with an optional double dash and
optional equals signs. Line 2 makes sure that runs of three or more
dashes are left unmatched and interpreted as short comments. Note also
that the equals signs are captured in separate groups for comments and
for strings; we'll get to them later.

Line 5 matches inner square brackets and the content of the literal.

Lines 6 and 7 match the closing equals signs and square bracket. The
matching alternative is deduced from the fact that if an optional group
doesn't match, its backref won't match either. So, if it's a long
comment, the regexp will match groups 1 and 5 for the opening bracket
(even if 5 is empty) and 2 for the closing one, since backref 6 won't
match. If it's a long string, it's the other way round: it will match
groups 3, 6 and 4.
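
The backref behaviour described above can be checked in isolation. The
following is a toy regexp, not the lua-mode rule: "c" and "s" stand in
for the comment and string openers, "!" and "|" for their closers, and
the my-backref-demo-* names are made up for this sketch.

```elisp
;; Toy example: each opener alternative captures its own run of "=".
(defconst my-backref-demo-regexp
  (concat "\\(?:c\\(=*\\)\\|s\\(=*\\)\\)"  ; sets group 1 or group 2
          "\\."                            ; stands in for the body
          "\\(?:\\1!\\|\\2|\\)")           ; backref to an unset group fails
  "Hypothetical regexp showing that only the set group's backref matches.")

(defun my-backref-demo-kind (text)
  "Return `comment', `string' or nil depending on how TEXT matches."
  (when (string-match my-backref-demo-regexp text)
    (if (match-beginning 1) 'comment 'string)))
```

An empty but matched group still counts as set, so "s.|" is recognized as
a string, while "c==.==|" is rejected because group 2 never matched.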

Kudos to those who made it to this point, here go the questions:

. Is it actually worthwhile to optimize propertizing this way, or would
   two separate rules perform just as well?

. The obvious simplification would be to match both brackets with single
   captures and choose the proper syntax flag programmatically, depending
   on whether there's a leading double dash. The documentation says
   something about the SYNTAX component of MATCH-HIGHLIGHT being "an
   expression whose value is such a form"; can I leverage that here?

--
Cheers, immerrr

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-21 20:48 font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers? immerrr again...
@ 2012-09-25  1:03 ` Stefan Monnier
  2012-09-25 11:31   ` immerrr again...
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2012-09-25  1:03 UTC
  To: immerrr again...; +Cc: emacs-devel

> 1.    `(,(rx
> 2.        (or (seq (or line-start (not (any "-")))
> 3.                 (group-n 1 "-") "-[" (group-n 5 (0+ "=")))
> 4.            (seq (group-n 3 "[")      (group-n 6 (0+ "="))))

> 5.        "[" (minimal-match (0+ anything)) "]"

> 6.        (or (seq (backref 5) (group-n 2 "]"))
> 7.            (seq (backref 6) (group-n 4 "]"))))

> 8.     (1 "!" nil t) (2 "!" nil t)
> 9.    (3 "|" nil t) (4 "|" nil t))

Here's your problem: the comments/strings you want to match may span
several lines, yet the patterns on font-lock-syntactic-keywords cannot
reliably match more than a single line (because when a line is modified,
font-lock only looks for that pattern in that line, for example).

So you need to do something different.  For syntax-propertize (which is
Emacs-24's successor to font-lock-syntactic-keywords), I'd use something
like:

(defun lua-syntax-propertize (start end)
  (goto-char start)
  (lua-syntax-propertize-string-or-comment-end end)
  (funcall
   (syntax-propertize-rules
    ("\\(?:\\(?:^\\|[^-]\\)\\(-\\)-\\)?\\(\\[\\)=*\\["
     (1 "< b") ;; Only applied if sub-group1 exists.
     (2 (prog1 (unless (match-beginning 1) (string-to-syntax "|"))
          (lua-syntax-propertize-string-or-comment-end end)))))
   start end))

and then in lua-syntax-propertize-string-or-comment-end I'd use syntax-ppss
to check the parser state (i.e. determine if I'm in a type-b comment or
delimited-string corresponding to a long-bracket construct as opposed to
some type-a comment or standard string, or plain old code), and if I'm
in one of those long-bracket-constructs, use (nth 8 ppss) to find the
beginning, count the number of = used there, then search for the
matching ]==] pattern and place the matching "> b" or "|" syntax on the
second closing bracket.

This should reliably work even for long-brackets that span many many lines.
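
A minimal sketch of how the helper described above might look, written
out as a plain loop instead of syntax-propertize-rules so that it is
self-contained. The my-lua-* names and the tiny syntax table are
assumptions made for illustration, not lua-mode's actual definitions,
and corner cases (e.g. "--[[" directly after a closing bracket, or
backslashes inside long strings) are not handled.

```elisp
;; Sketch only; the my-lua-* names are hypothetical.

(defvar my-lua-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- ". 12" table) ; "--" starts a style-a comment
    (modify-syntax-entry ?\n ">" table)   ; newline ends a style-a comment
    table)
  "Minimal stand-in for a Lua syntax table.")

(defun my-lua-propertize-long-bracket-end (end)
  "Mark the closer of the long-bracket construct around point.
Do nothing unless point is inside a generic-fence string or a style-b
comment, and never search past END."
  (let* ((ppss (syntax-ppss))
         (in-string (eq t (nth 3 ppss)))
         (in-comment (and (nth 4 ppss) (eq 1 (nth 7 ppss))))
         (start (nth 8 ppss)))
    (when (or in-string in-comment)
      (let ((closer (save-excursion
                      (goto-char start)
                      ;; Count the equals signs used by the opener.
                      (and (looking-at "\\(?:--\\)?\\[\\(=*\\)\\[")
                           (concat "]" (match-string 1) "]")))))
        ;; If the closer lies beyond END, a later call will find it.
        (when (and closer (search-forward closer end t))
          (put-text-property (1- (point)) (point) 'syntax-table
                             (string-to-syntax
                              (if in-string "|" "> b"))))))))

(defun my-lua-syntax-propertize (start end)
  "Apply syntax properties to long brackets between START and END."
  (goto-char start)
  ;; A construct opened before START may be closed in this chunk.
  (my-lua-propertize-long-bracket-end end)
  (while (re-search-forward
          "\\(?:\\(?:^\\|[^-]\\)\\(-\\)-\\)?\\(\\[\\)=*\\[" end t)
    (let ((open (or (match-beginning 1) (match-beginning 2)))
          (commentp (match-beginning 1)))
      ;; Ignore openers that sit inside a string or a comment.
      (unless (nth 8 (save-excursion (syntax-ppss open)))
        (put-text-property open (1+ open) 'syntax-table
                           (string-to-syntax (if commentp "< b" "|")))
        (my-lua-propertize-long-bracket-end end)))))
```

A mode would presumably install this with
(setq-local syntax-propertize-function #'my-lua-syntax-propertize).
Checking the parser state at the opener before placing any property
keeps "[[" inside short comments and ordinary strings from being marked.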

        Stefan

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-25  1:03 ` Stefan Monnier
@ 2012-09-25 11:31   ` immerrr again...
  2012-09-25 13:20     ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: immerrr again... @ 2012-09-25 11:31 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 09/25/2012 05:03 AM, Stefan Monnier wrote:
 > Here's your problem: the comments/strings you want to match may span
 > several lines, yet the patterns on font-lock-syntactic-keywords cannot
 > reliably match more than a single line (because when a line is modified,
 > font-lock only looked for that pattern in that line, for example).
 >

If I follow the documentation correctly, this can be worked around by using
`font-lock-extend-region-functions', can't it?

 >
 > (defun lua-syntax-propertize (start end)
 >   (goto-char start)
 >   (lua-syntax-propertize-string-or-comment-end end)
 >   (funcall
 >    (syntax-propertize-rules
 >     ("\\(?:\\(?:^\\|[^-]\\)\\(-\\)-\\)?\\(\\[\\)=*\\["
 >      (1 "< b") ;; Only applied if sub-group1 exists.
 >      (2 (prog1 (unless (match-beginning 1) (string-to-syntax "|"))
 >           (lua-syntax-propertize-string-or-comment-end end)))))
 >    start end))
 >

The snippet is great. I, for one, have somehow missed the "<"/">" syntax
descriptors. But I have a concern: what if the closing long bracket,
i.e. "]==]", is after `end'? Doesn't it need an extra matcher rule for
that?

As for the 23-vs-24 question, I really look forward to the moment when I
can safely drop backward compatibility for lua-mode, but in the meantime,
I guess, I'll need to look into the font-lock code to see how to do the
same in Emacs 23.

-- 
Cheers,
immerrr

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-25 11:31   ` immerrr again...
@ 2012-09-25 13:20     ` Stefan Monnier
  2012-09-28  8:19       ` immerrr again
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2012-09-25 13:20 UTC
  To: immerrr again...; +Cc: emacs-devel

>> Here's your problem: the comments/strings you want to match may span
>> several lines, yet the patterns on font-lock-syntactic-keywords cannot
>> reliably match more than a single line (because when a line is modified,
>> font-lock only looked for that pattern in that line, for example).
> If I follow the documentation correctly, this can be worked around by using
> `font-lock-extend-region-functions', isn't it?

Yes, although writing an appropriate function can be difficult, and even
if you manage to do it, it will still mean that whenever you modify
a line inside a 1000-line string, the whole 1000-line string will
be refontified which can lead to performance problems.

> The snippet is great. I, for one, have somehow missed the "<"/">" syntax
> descriptors. But I have a concern: what if closing long bracket,
> i.e. "]==]",
> is after `end', doesn't it need an extra matcher rule for that?

Well, you'll want to make sure you don't go past `end' and if the ]==]
is not found before `end' you just don't put any "closing syntax
property".  That's not a problem, because that ]==] will be found in
some later invocation of the syntax-propertize function where `end' will
be after the ]==].
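
The "don't go past `end'" behaviour can be sketched with a bounded
search. This is an illustration under simplifying assumptions: the
function name my-mark-closer-before is made up, the closer is passed in
directly instead of being derived from the opener, and it always uses
the string-fence syntax.

```elisp
(defun my-mark-closer-before (closer end)
  "Search for CLOSER between point and END and mark it as a string fence.
Return t when a closing property was placed.  When CLOSER is not found
before END, return nil and leave the buffer untouched."
  (when (search-forward closer end t)
    (put-text-property (1- (point)) (point)
                       'syntax-table (string-to-syntax "|"))
    t))

;; Two invocations over the same text with different bounds.
(with-temp-buffer
  (insert "[==[ abc ]==] tail")
  (goto-char 5)                                      ; just after the opener
  (list (my-mark-closer-before "]==]" 8)             ; chunk ends too early
        (my-mark-closer-before "]==]" (point-max)))) ; later chunk reaches it
```

The final form evaluates to (nil t): the first call places nothing
because the closer lies beyond its bound, and the second call, with a
larger bound, finds it.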

> As for 23-vs-24 question, I really look forward to the moment when
> I can safely drop backward compatibility for lua-mode, but in the
> meanwhile, I guess, I'll need to look into the font-lock code to see
> how to do the same in Emacs23.

Yes, you can do the same with font-lock-syntactic-keywords, of course.

        Stefan

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-25 13:20     ` Stefan Monnier
@ 2012-09-28  8:19       ` immerrr again
  2012-09-28 12:28         ` Stefan Monnier
  0 siblings, 1 reply; 8+ messages in thread
From: immerrr again @ 2012-09-28  8:19 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On Tue, Sep 25, 2012 at 5:20 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > The snippet is great. I, for one, have somehow missed the "<"/">" syntax
> > descriptors. But I have a concern: what if closing long bracket,
> > i.e. "]==]",
> > is after `end', doesn't it need an extra matcher rule for that?
>
> Well, you'll want to make sure you don't go past `end' and if the ]==]
> is not found before `end' you just don't put any "closing syntax
> property".  That's not a problem, because that ]==] will be found in
> some later invocation of the syntax-propertize function where `end' will
> be after the ]==].
>

Hold on, do I understand the procedure correctly in the following
situation? Suppose we're propertizing an arbitrary buffer region, big
enough not to be matched in a single attempt. Suppose the first match
region, from `begin1' to `end1', contains exactly one opening long
bracket, its closing long bracket of the same level is located beyond
`end1', and there's no other opening long bracket in between. Am I right
that the next match invocation will operate on the region starting from
`end1 + 1'? If yes, then unless there's a rule that can match closing
long brackets, the closing long bracket won't be found and thus won't be
propertized, or will it?

--
Cheers,
immerrr

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-28  8:19       ` immerrr again
@ 2012-09-28 12:28         ` Stefan Monnier
  2012-09-29  6:50           ` immerrr again
  2013-03-26 13:48           ` immerrr again
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Monnier @ 2012-09-28 12:28 UTC
  To: immerrr again; +Cc: emacs-devel

> Hold on, do I understand procedure correctly in the following
> situation: suppose, we're propertizing an arbitrary buffer region, big
> enough not to be matched in single attempt; suppose, first match region -
> from `begin1' to `end1' - contains exactly one opening long bracket and its
> long bracket of the same level is located beyond `end1' point, and there's
> no other opening long bracket in between. Am I right that the next match
> invocation will operate on region starting from `end1 + 1'?

Yes.

> If yes, then unless there's a rule that can match closing long
> brackets, the closing long bracket won't be found and thus won't be
> propertized, or will it?

There is such a rule: the definition of lua-syntax-propertize begins by
calling lua-syntax-propertize-string-or-comment-end which:

   and then in lua-syntax-propertize-string-or-comment-end I'd use syntax-ppss
   to check the parser state (i.e. determine if I'm in a type-b comment or
   delimited-string corresponding to a long-bracket construct as opposed to
   some type-a comment or standard string, or plain old code), and if I'm
   in one of those long-bracket-constructs, use (nth 8 ppss) to find the
   beginning, count the number of = used there, then search for the
   matching ]==] pattern and place the matching "> b" or "|" syntax on the
   second closing bracket.

so while the second call is for "end1+1 ... end2", your code will go
back to the beginning of the long bracket to figure out its end and then
look for that end again (which might show up before end2 this time, tho
maybe not).

-- Stefan

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-28 12:28         ` Stefan Monnier
@ 2012-09-29  6:50           ` immerrr again
  2013-03-26 13:48           ` immerrr again
  1 sibling, 0 replies; 8+ messages in thread
From: immerrr again @ 2012-09-29  6:50 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On Fri, Sep 28, 2012 at 4:28 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>
> There is such a rule: the definition of lua-syntax-propertize begins by
> calling lua-syntax-propertize-string-or-comment-end which:
>
>    and then in lua-syntax-propertize-string-or-comment-end I'd use syntax-ppss
>    to check the parser state (i.e. determine if I'm in a type-b comment or
>    delimited-string corresponding to a long-bracket construct as opposed to
>    some type-a comment or standard string, or plain old code), and if I'm
>    in one of those long-bracket-constructs, use (nth 8 ppss) to find the
>    beginning, count the number of = used there, then search for the
>    matching ]==] pattern and place the matching "> b" or "|" syntax on the
>    second closing bracket.
>
> so while the second call is for "end1+1 ... end2", your code will go
> back to the beginning of the long bracket to figure out its end and then
> look for that end again (which might show up before end2 this time, tho
> maybe not).
>
>
Oh, my bad. Thanks a lot for your help!

>
--
Cheers,
immerrr

* Re: font-lock-syntactic-keywords: evaluating arbitrary elisp inside matchers?
  2012-09-28 12:28         ` Stefan Monnier
  2012-09-29  6:50           ` immerrr again
@ 2013-03-26 13:48           ` immerrr again
  1 sibling, 0 replies; 8+ messages in thread
From: immerrr again @ 2013-03-26 13:48 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On Fri, Sep 28, 2012 at 4:28 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>
> <snip>
>
> -- Stefan

Hi Stefan

Wanted to say thanks once again for the help.

I've finally found the time to try out your suggestion and it worked
like a charm [1]. The similarity is not immediately obvious, though,
because I couldn't resist running (cl-prettyexpand ...) on the
syntax-propertize-rules macro and tailoring the expansion to my liking.
The only drawback was that it only worked for Emacs 24, so I had to
backport basically the same functionality via
font-lock-syntactic-keywords [2]. And this implementation, admittedly
slower than the syntax-propertize-based one, is probably the answer to my
original question.

Cheers,
immerrr

1. https://github.com/immerrr/lua-mode/commit/dbedb1e3e19e81e9031eff08d2d30b19b89d8654
2. https://github.com/immerrr/lua-mode/commit/2a0314b2bda879f5e97d851c43e840671f5ace8d
