unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
@ 2020-09-22  9:35 Alan Mackenzie
  2020-09-22 14:09 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-22  9:35 UTC
  To: 43558, monnier

Hello, Emacs and Stefan.

In the following C comment:

1   /*
2     \*/
3   /**/

, with point at BOL 1, do M-: (forward-comment 1).  This leaves point
wrongly at EOL 2.  It should end up at EOL 3, since the apparent comment
ender on L2 is actually escaped.

The following patch fixes this.  Are there any objections to me
installing it?


diff --git a/src/syntax.c b/src/syntax.c
index e6af8a377b..066972e6d8 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2354,6 +2354,13 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	/* We have encountered a nested comment of the same style
 	   as the comment sequence which began this comment section.  */
 	nesting++;
+      if (comment_end_can_be_escaped
+          && (code == Sescape || code == Scharquote))
+        {
+          inc_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_FORWARD (from);
+          if (from == stop) continue; /* Failure */
+        }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-22  9:35 bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped Alan Mackenzie
@ 2020-09-22 14:09 ` Stefan Monnier
  2020-09-22 19:41   ` Alan Mackenzie
       [not found] ` <handler.43558.B.160076736116422.ack@debbugs.gnu.org>
  2020-09-23  9:01 ` Mattias Engdegård
  2 siblings, 1 reply; 36+ messages in thread
From: Stefan Monnier @ 2020-09-22 14:09 UTC
  To: Alan Mackenzie; +Cc: 43558

Hi Alan,

> Hello, Emacs and Stefan.
>
> In the following C comment:
>
> 1   /*
> 2     \*/
> 3   /**/
>
> , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> wrongly at EOL 2.

That seems to be correct w.r.t the highlighting I see, OTOH.
IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

> It should end up at EOL 3, since the apparent comment
> ender on L2 is actually escaped.
>
> The following patch fixes this.

Does it fix it for `parse-partial-sexp` as well?

> Are there any objections to me installing it?

None from me, no.


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-22 14:09 ` Stefan Monnier
@ 2020-09-22 19:41   ` Alan Mackenzie
  0 siblings, 0 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-22 19:41 UTC
  To: Stefan Monnier; +Cc: 43558

Hello, Stefan.

On Tue, Sep 22, 2020 at 10:09:43 -0400, Stefan Monnier wrote:
> Hi Alan,

> > Hello, Emacs and Stefan.

> > In the following C comment:

> > 1   /*
> > 2     \*/
> > 3   /**/

> > , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> > wrongly at EOL 2.

> That seems to be correct w.r.t the highlighting I see, OTOH.
> IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

Yes.

> > It should end up at EOL 3, since the apparent comment
> > ender on L2 is actually escaped.

> > The following patch fixes this.

> Does it fix it for `parse-partial-sexp` as well?

It does, yes.  The patch is in forw_comment, which is called by
Fforward_comment, scan_lists, and scan_sexps_forward.

> > Are there any objections to me installing it?

> None from me, no.

Thanks!

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
       [not found] ` <handler.43558.B.160076736116422.ack@debbugs.gnu.org>
@ 2020-09-23  8:57   ` Alan Mackenzie
  0 siblings, 0 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-23  8:57 UTC
  To: 43558-done

Bug fixed in master.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-22  9:35 bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped Alan Mackenzie
  2020-09-22 14:09 ` Stefan Monnier
       [not found] ` <handler.43558.B.160076736116422.ack@debbugs.gnu.org>
@ 2020-09-23  9:01 ` Mattias Engdegård
  2020-09-23 14:48   ` Alan Mackenzie
  2 siblings, 1 reply; 36+ messages in thread
From: Mattias Engdegård @ 2020-09-23  9:01 UTC
  To: Alan Mackenzie; +Cc: 43558, Stefan Monnier

Sorry if I misunderstood, but since when do backslashes escape */ in C?







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23  9:01 ` Mattias Engdegård
@ 2020-09-23 14:48   ` Alan Mackenzie
  2020-09-23 18:44     ` Stefan Monnier
  2020-09-24 18:52     ` Michael Welsh Duggan
  0 siblings, 2 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-23 14:48 UTC
  To: Mattias Engdegård; +Cc: 43558, Stefan Monnier

Hello, Mattias.

On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> Sorry if I misunderstood, but since when do backslashes escape */ in C?

Since forever, but only in the CC Mode test suite.  :-(

I just tried it out with gcc, and it seems that \*/ does indeed end a
block comment.  But an escaped newline doesn't end a line comment,
instead continuing it to the next line.  So I got confused.  Thanks for
pointing out the mistake.

It seems that as well as the existing variable
comment-end-can-be-escaped, we need a new one, say
line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
be nil and t respectively.
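
The behaviour described above can be reproduced with a small test file.  This is only a sketch: the function names are made up for illustration, and it assumes a compiler which, like gcc, joins backslash-newline pairs before it recognises comments (gcc may warn about the multi-line comment).

```c
#include <assert.h>

/* Hypothetical demo function: the line comment below ends in a
   backslash, so the following line is still part of the comment.  */
int line_comment_continues(void)
{
    int x = 1;
    // the backslash at the end of this line continues the comment \
    x = 2;
    return x;   /* 1, because the assignment was swallowed */
}

/* Hypothetical demo function: a backslash before the block comment
   ender does not escape it, so the comment ends where it appears to.  */
int block_comment_ends(void)
{
    int y = 1;
    /* the backslash before the ender changes nothing \*/
    y = 2;
    return y;   /* 2, because the assignment is real code */
}

/* Checks both observations at once.  */
void check_comment_enders(void)
{
    assert(line_comment_continues() == 1);
    assert(block_comment_ends() == 2);
}
```

Calling check_comment_enders() from a test driver should pass silently if the compiler behaves as gcc did here.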

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23 14:48   ` Alan Mackenzie
@ 2020-09-23 18:44     ` Stefan Monnier
  2020-09-23 19:44       ` Alan Mackenzie
  2020-09-24 10:20       ` Alan Mackenzie
  2020-09-24 18:52     ` Michael Welsh Duggan
  1 sibling, 2 replies; 36+ messages in thread
From: Stefan Monnier @ 2020-09-23 18:44 UTC
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.

syntax.c doesn't like to think of it as "line-comment" but rather as
comment style a, b, c, or nested and non-nested.

> In C and C++ modes, these would
> be nil and t respectively.

In sm-c-mode, I'd handle those corner cases in
`syntax-propertize-function` (tho I think I don't bother with this one
currently).

So, I guess in CC-mode, you could handle those by placing `syntax-table`
properties from ... wherever you place them ;-)


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23 18:44     ` Stefan Monnier
@ 2020-09-23 19:44       ` Alan Mackenzie
  2020-09-23 20:02         ` Stefan Monnier
  2020-09-24 10:20       ` Alan Mackenzie
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-23 19:44 UTC
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment stay [ ?? style ?? ] a, b, c, or nested and non-nested.

Hmm.  It could be quite troublesome to decide on an interface for major
modes specifying "comment style b can have its ender escaped, but
comment styles a and c cannot".

> > In C and C++ modes, these would
> > be nil and t respectively.

> In sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

Thanks, that's an idea - either putting a neutral s-t prop on the \ of
\*/, or something on the \n of \\n in a line comment.  I think the first
of these is a better idea than the second.

But on the other hand, it feels like a workaround for the lack of a
full-featured comment-end-can-be-escaped.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23 19:44       ` Alan Mackenzie
@ 2020-09-23 20:02         ` Stefan Monnier
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Monnier @ 2020-09-23 20:02 UTC
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> But on the other hand, it feels like a workaround for the lack of a

Yes, that's the definition of `syntax-propertize-function` ;-)


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23 18:44     ` Stefan Monnier
  2020-09-23 19:44       ` Alan Mackenzie
@ 2020-09-24 10:20       ` Alan Mackenzie
  2020-09-24 16:56         ` Stefan Monnier
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-24 10:20 UTC
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment style a, b, c, or nested and non-nested.

> > In C and C++ modes, these would be nil and t respectively.

> In sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

As already said, this is a(n ugly) workaround.  syntax.c should handle
comments in all their generality.  With a bit of consideration, the
method to do this is clear:

Introduce a new syntax flag `e' which takes effect in comment delimiters.
It means "escape characters are active in this type of comment".  In a
two character delimiter it would, like `b', only take effect on the inner
of the two characters.

So the syntaxes of the C++ comment characters would be amended to look
like
    /  ". 124be"
    *  ". 23"  (unchanged)
    \n "> be"

This would be an easy change to make, and (unlike using syntax-table text
properties) would cost negligible run time.
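
To make the intended semantics concrete, here is a toy model of a forward scan over a line comment.  It is not the syntax.c code: skip_line_comment and ender_escapable are hypothetical names, the flag stands in for the proposed `e' on the comment ender, and backslash is simply hard-wired as the escape character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only.  Scan forward from START, which is assumed to lie
   inside a line comment, and return the index just after the newline
   that ends the comment (or the index of the terminating NUL if there
   is no such newline).  If ENDER_ESCAPABLE is true, a backslash
   protects the character after it, so an escaped newline does not
   end the comment.  */
size_t skip_line_comment(const char *text, size_t start, bool ender_escapable)
{
    size_t i = start;

    while (text[i] != '\0') {
        if (ender_escapable && text[i] == '\\' && text[i + 1] != '\0') {
            i += 2;             /* skip the backslash and what it escapes */
            continue;
        }
        if (text[i] == '\n')
            return i + 1;       /* the comment ends after the newline */
        i++;
    }
    return i;
}
```

With the buffer text "// a \\\nb\nc" (written as a C string literal), the scan stops at index 9 when the flag is set and at index 7 when it is not.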

What do you think?

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 10:20       ` Alan Mackenzie
@ 2020-09-24 16:56         ` Stefan Monnier
  2020-09-24 18:50           ` Alan Mackenzie
  2020-11-19 21:18           ` Alan Mackenzie
  0 siblings, 2 replies; 36+ messages in thread
From: Stefan Monnier @ 2020-09-24 16:56 UTC
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> As already said, this is a(n ugly) workaround.  syntax.c should handle
> comments in all their generality.  With a bit of consideration, the
> method to do this is clear:

In my world, it's quite normal for a specific language's lexical rules
not to line up 100% with syntax tables (whether for strings, comments,
younameit).  I don't see anything very special here.

A `syntax-propertize` rule for "\*/" should be very easy to implement
and fairly cheap since the regexp is simple and will almost never match.

So, yeah, you can add yet-another-hack on top of the other syntax.c
hacks if you want, but there's a good chance it will only ever be used
by CC-mode.  It will take a lot more code changes in syntax.c than
a quick tweak to your Elisp code to search for "\*/".

I do think it would be good to handle this without `syntax-table`
text-property hacks, but I think that should come with an overhaul of
syntax.c based on a major-mode provided DFA (or something like that) so
it can accommodate all the various oddball cases without even the need
to introduce the notion of escaping comment markers.


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 16:56         ` Stefan Monnier
@ 2020-09-24 18:50           ` Alan Mackenzie
  2020-09-24 22:43             ` Stefan Monnier
  2020-11-19 21:18           ` Alan Mackenzie
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-24 18:50 UTC
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is
syntactically ambiguous.  There's nothing syntax.c can do about this.

In the current situation, this isn't the case: syntax.c is unable to
handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be
quite expensive (compared with a syntax.c solution) since every comment
near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
It is surely better, where possible, to fix bugs at their point of
causation rather than by workarounds elsewhere.  As you note, CC Mode
modes will be the only known users at the moment.

Just as an aside, the project where I was working ~four years ago banned
a proprietary editor after a mammoth search for a bug caused by an
unintentional escaped NL on a line comment.  The banned editor didn't
fontify the continuation line in comment face.  I was able to
demonstrate to the project manager that Emacs fontified that comment
correctly.

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I
think, that the syntax of language expressions would be defined using
something a bit like (but more powerful than) regular expressions.  And
with that, the need for syntactic analysis in Lisp would be much
reduced.

We would need to make sure that this wouldn't run more slowly than the
current syntax.c/Lisp combination.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-23 14:48   ` Alan Mackenzie
  2020-09-23 18:44     ` Stefan Monnier
@ 2020-09-24 18:52     ` Michael Welsh Duggan
  2020-09-24 19:57       ` Alan Mackenzie
  1 sibling, 1 reply; 36+ messages in thread
From: Michael Welsh Duggan @ 2020-09-24 18:52 UTC
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Alan Mackenzie <acm@muc.de> writes:

> Hello, Mattias.
>
> On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
> Since forever, but only in the CC Mode test suite.  :-(
>
> I just tried it out with gcc, and it seems that \*/ does indeed end a
> block comment.  But an escaped newline doesn't end a line comment,
> instead continuing it to the next line.  So I got confused.  Thanks for
> pointing out the mistake.
>
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> be nil and t respectively.

But where does it say that backslashes escape */ in C++?  The C++ 14
standard (and it hasn't changed through C++ 20) says:

    2.7 Comments [lex.comment]
    
    The characters /* start a comment, which terminates with the
    characters */. These comments do not nest.  The characters // start
    a comment, which terminates immediately before the next new-line
    character. If there is a form-feed or a vertical-tab character in
    such a comment, only white-space characters shall appear between it
    and the new-line that terminates the comment; no diagnostic is
    required. [ Note: The comment characters //, /*, and */ have no
    special meaning within a // comment and are treated just like other
    characters. Similarly, the comment characters // and /* have no
    special meaning within a /* comment.  — end note ]

-- 
Michael Welsh Duggan
(md5i@md5i.com)






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 18:52     ` Michael Welsh Duggan
@ 2020-09-24 19:57       ` Alan Mackenzie
  2020-09-24 20:27         ` Michael Welsh Duggan
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-09-24 19:57 UTC
  To: Michael Welsh Duggan; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Hello, Michael.

On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?

> > Since forever, but only in the CC Mode test suite.  :-(

> > I just tried it out with gcc, and it seems that \*/ does indeed end a
> > block comment.  But an escaped newline doesn't end a line comment,
> > instead continuing it to the next line.  So I got confused.  Thanks for
> > pointing out the mistake.

> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
> > be nil and t respectively.

> But where does it say that backslashes escape */ in C++?

Nowhere.  :-(

There has been a test in the CC Mode test suite for many years which
assumed this (but was disabled for existing (X)Emacs versions, waiting
for a new Emacs version to be "fixed").

> The C++ 14 standard (and it hasn't changed through C++ 20) says:

>     2.7 Comments [lex.comment]
    
>     The characters /* start a comment, which terminates with the
>     characters */. These comments do not nest.  The characters // start
>     a comment, which terminates immediately before the next new-line
>     character.

For all the difference it makes, Emacs assumes the comment ends _after_
the NL.

>     If there is a form-feed or a vertical-tab character in such a
>     comment, only white-space characters shall appear between it and
>     the new-line that terminates the comment; no diagnostic is
>     required.

I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
probably isn't worth bothering about.

>     [ Note: The comment characters //, /*, and */ have no special
>     meaning within a // comment and are treated just like other
>     characters. Similarly, the comment characters // and /* have no
>     special meaning within a /* comment.  — end note ]

Additionally, an escaped newline continues a comment onto the next line.
This happens, notionally, at a very early stage of compilation where a
backslash followed by NL anywhere gets replaced by a space.  I think that
even two backslashes followed by NL would get replaced by backslash,
space.

> -- 
> Michael Welsh Duggan
> (md5i@md5i.com)

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 19:57       ` Alan Mackenzie
@ 2020-09-24 20:27         ` Michael Welsh Duggan
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Welsh Duggan @ 2020-09-24 20:27 UTC
  To: Alan Mackenzie
  Cc: Michael Welsh Duggan, Mattias Engdegård,
	43558@debbugs.gnu.org, Stefan Monnier

Alan Mackenzie <acm@muc.de> writes:

> Hello, Michael.
>
> On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
>> Alan Mackenzie <acm@muc.de> writes:
>
>> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
>> > Since forever, but only in the CC Mode test suite.  :-(
>
>> > I just tried it out with gcc, and it seems that \*/ does indeed end a
>> > block comment.  But an escaped newline doesn't end a line comment,
>> > instead continuing it to the next line.  So I got confused.  Thanks for
>> > pointing out the mistake.
>
>> > It seems that as well as the existing variable
>> > comment-end-can-be-escaped, we need a new one, say
>> > line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
>> > be nil and t respectively.
>
>> But where does it say that backslashes escape */ in C++?
>
> Nowhere.  :-(
>
> There has been a test in the CC Mode test suite for many years which
> assumed this (but was disabled for existing (X)Emacs versions, waiting
> for a new Emacs version to be "fixed").
>
>> The C++ 14 standard (and it hasn't changed through C++ 20) says:
>
>>     2.7 Comments [lex.comment]
>     
>>     The characters /* start a comment, which terminates with the
>>     characters */. These comments do not nest.  The characters // start
>>     a comment, which terminates immediately before the next new-line
>>     character.
>
> For all the difference it makes, Emacs assumes the comment ends _after_
> the NL.
>
>>     If there is a form-feed or a vertical-tab character in such a
>>     comment, only white-space characters shall appear between it and
>>     the new-line that terminates the comment; no diagnostic is
>>     required.
>
> I didn't know that.  Emacs/CC Mode doesn't code up this subtlety.  It
> probably isn't worth bothering about.
>
>>     [ Note: The comment characters //, /*, and */ have no special
>>     meaning within a // comment and are treated just like other
>>     characters. Similarly, the comment characters // and /* have no
>>     special meaning within a /* comment.  — end note ]
>
> Additionally, an escaped newline continues a comment onto the next line.
> This happens, notionally, at a very early stage of compilation where a
> backslash followed by NL anywhere gets replaced by a space.  I think that
> even two backslashes followed by NL would get replaced by backslash,
> space.

Almost.  A backslash followed by a newline is elided completely, joining
the lines.  (Not replaced by a space.)  Otherwise, I concur.
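
A rough sketch of that splicing step, for illustration only: splice_lines is a hypothetical helper, not code from any compiler, and it handles nothing else from the early translation phases (trigraphs, for instance, are ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Delete every backslash that is immediately followed by a newline,
   together with that newline, joining the two lines.  Nothing is put
   in their place.  Works in place on the NUL-terminated BUF and
   returns the new length.  */
size_t splice_lines(char *buf)
{
    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index */

    while (buf[r] != '\0') {
        if (buf[r] == '\\' && buf[r + 1] == '\n') {
            r += 2;             /* drop both characters */
            continue;
        }
        buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}
```

Under this model two backslashes followed by a newline leave a single backslash behind, and a line comment ending in a backslash absorbs the next line because the newline that would have ended it is gone.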

-- 
Michael Welsh Duggan
(mwd@cert.org)






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 18:50           ` Alan Mackenzie
@ 2020-09-24 22:43             ` Stefan Monnier
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Monnier @ 2020-09-24 22:43 UTC
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

>> > As already said, this is a(n ugly) workaround.  syntax.c should handle
>> > comments in all their generality.  With a bit of consideration, the
>> > method to do this is clear:
>> In my world, it's quite normal for a specific language's lexical rules
>> not to line up 100% with syntax tables (whether for strings, comments,
>> younameit).  I don't see anything very special here.
> Normally when there's a mismatch, it's because a character is
> syntactically ambiguous.  There's nothing syntax.c can do about this.

Oh, no, there are many more situations than just "a character is
syntactically ambiguous" (or alternatively you could argue that all
cases are "a character is syntactically ambiguous", including your cases
of escaped newline and escaped */).

>> A `syntax-propertize` rule for "\*/" should be very easy to implement
>> and fairly cheap since the regexp is simple and will almost never match.
> Well, the rule would actually be for escaped newlines,

It doesn't have to be if you set `comment-end-can-be-escaped` to non-nil,
in which case you only need to tweak the \*/ case, AFAICT.

> but this would be quite expensive (compared with a syntax.c solution)
> since every comment near a change region would need scanning at
> each change.

I don't know what you mean by scanning, but yes you'd need to search for
all "\\\\\n" or "\\\\\\*/" (depending on how you set
`comment-end-can-be-escaped`) and mark the second char accordingly.
Seems pretty cheap in either case.

> I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
> It is surely better, where possible, to fix bugs at their point of
> causation rather than by workarounds elsewhere.

I don't think it's a bug in `syntax.c`.  `syntax.c` is not defined to
support the syntax of C, it's only defined to handle a particular set of
comment and string styles, which correspond to a common subset of what
is in use in most languages, but IME most languages need some extra
tweaks handled via the `syntax-table` text property.  It's only
a question of time until we add a `syntax-propertize-function` for Elisp
mode to properly handle some corner cases, for example.

>> I do think it would be good to handle this without `syntax-table`
>> text-property hacks, but I think that should come with an overhaul of
>> syntax.c based on a major-mode provided DFA (or something like that) so
>> it can accommodate all the various oddball cases without even the need
>> to introduce the notion of escaping comment markers.
> That sounds almost more like a rewrite than an overhaul.

Tomato tomahto.

> We would need to make sure that this wouldn't run more slowly than the
> current syntax.c/Lisp combination.

I don't think that would be required, as long as it runs fast enough.
In any case, the resulting performance is probably not the main worry
(I suspect it will/would be easy to make it fast enough).


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-09-24 16:56         ` Stefan Monnier
  2020-09-24 18:50           ` Alan Mackenzie
@ 2020-11-19 21:18           ` Alan Mackenzie
  2020-11-19 22:47             ` Stefan Monnier
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-19 21:18 UTC
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård, acm

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

OK, here's the patch.  As a matter of interest, it's been heavily tested
by the .../test/src/syntax-tests.el unit tests, further enhancements to
which are part of the patch.

Just as a reminder, the motivation is to be able to have syntax.c
correctly parse C/C++ line comments which look like:

    foo(); // comment \\
    second line of comment.

by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:

    (modify-syntax-entry ?\n "> be")

>         Stefan



diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
 {
   return (flags >> 22) & 1;
 }
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+  return (flags >> 24) & 1;
+}
 
 /* FLAGS should be the flags of the main char of the comment marker, e.g.
    the second for comstart and the first for comend.  */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
   return val;
 }
 
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+  int c;
+  int next_syntax;
+  if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+    return true;
+  if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+    {
+      dec_both (&from, &from_byte);
+      UPDATE_SYNTAX_TABLE_BACKWARD (from);
+      c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      next_syntax = SYNTAX_WITH_FLAGS (c);
+      UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+      if (next_syntax == Sescape || next_syntax == Scharquote)
+        return true;
+    }
+  return false;
+}
+
 /* Check whether charpos FROM is at the end of a comment.
    FROM_BYTE is the bytepos corresponding to FROM.
    Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 		 && SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
       comstart = (com2start || code == Scomment);
 
+      /* Check for any current delimiter being escaped.  */
+      if (from > stop
+          && (((com2end || code == Sendcomment)
+               && comment_ender_quoted (from, from_byte, syntax))
+              || (code == Scomment
+                  && comment_end_can_be_escaped
+                  && char_quoted (from, from_byte))))
+        {
+          dec_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_BACKWARD (from);
+          com2end = comstart = com2start = 0;
+          syntax = Smax;
+        }
+
       /* Nasty cases with overlapping 2-char comment markers:
 	 - snmp-mode: -- c -- foo -- c --
 	              --- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property.  */)
       case 'c':
 	val |= 1 << 23;
 	break;
+
+      case 'e':
+        val |= 1 << 24;
+        break;
       }
 
   if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   (Lisp_Object syntax)
 {
   int code, syntax_code;
-  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+    comescapes;
   char str[2];
   Lisp_Object first, match_lisp, value = syntax;
 
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
   comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
   comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+  comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
 
   if (Smax <= code)
     {
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert ("c", 1);
   if (comnested)
     insert ("n", 1);
+  if (comescapes)
+    insert ("e", 1);
 
   insert_string ("\twhich means: ");
 
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert_string (" (comment style c)");
   if (comnested)
     insert_string (" (nestable)");
+  if (comescapes)
+    insert_string (" (can be escaped)");
 
   if (prefix)
     {
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
 	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
 	      (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !comment_ender_quoted (from, from_byte, syntax))
 	/* We have encountered a comment end of the same style
 	   as the comment sequence which began this comment
 	   section.  */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	/* We have encountered a nested comment of the same style
 	   as the comment sequence which began this comment section.  */
 	nesting++;
-      if (comment_end_can_be_escaped
-          && (code == Sescape || code == Scharquote))
+      if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+          && comment_ender_quoted (from, from_byte, syntax))
         {
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
-          if (from == stop) continue; /* Failure */
+          continue;
         }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil.  */)
       /* We're at the start of a comment.  */
       found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
 			    &out_charpos, &out_bytepos, &dummy, &dummy2);
-      from = out_charpos; from_byte = out_bytepos;
-      if (!found)
+      from = out_charpos; from_byte = out_bytepos; 
+     if (!found)
 	{
 	  SET_PT_BOTH (from, from_byte);
 	  return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil.  */)
 	  if (code == Sendcomment)
 	    comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
 	  if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
-	      && prev_char_comend_first (from, from_byte)
-	      && !char_quoted (from - 1, dec_bytepos (from_byte)))
+	      && prev_char_comend_first (from, from_byte))
 	    {
 	      int other_syntax;
-	      /* We must record the comment style encountered so that
+              /* We must record the comment style encountered so that
 		 later, we can match only the proper comment begin
 		 sequence of the same style.  */
 	      dec_both (&from, &from_byte);
-	      code = Sendcomment;
-	      /* Calling char_quoted, above, set up global syntax position
-		 at the new value of FROM.  */
 	      c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
 	      other_syntax = SYNTAX_WITH_FLAGS (c1);
-	      comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
-	      comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              if (!comment_ender_quoted (from, from_byte, other_syntax))
+                {
+                  code = Sendcomment;
+                  comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+                  comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+                  syntax = other_syntax;
+                }
+              else
+                {
+                  inc_both (&from, &from_byte);
+                  UPDATE_SYNTAX_TABLE_FORWARD (from);
+                }
 	    }
 
 	  if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil.  */)
 	    }
 	  else if (code == Sendcomment)
 	    {
-              found = (!quoted || !comment_end_can_be_escaped)
+              found =
+                !comment_ender_quoted (from, from_byte, syntax)
                 && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
 	      if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
 	      other_syntax = SYNTAX_WITH_FLAGS (c2);
 	      comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
 	      comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              syntax = other_syntax;
 	    }
 
 	  /* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
 	    case Sendcomment:
 	      if (!parse_sexp_ignore_comments)
 		break;
-	      found = back_comment (from, from_byte, stop, comnested, comstyle,
+	      found =
+                (from == stop
+                 || !comment_ender_quoted (from, from_byte, syntax))
+                && back_comment (from, from_byte, stop, comnested, comstyle,
 				    &out_charpos, &out_bytepos);
 	      /* FIXME:  if !found, it really wasn't a comment-end.
 		 For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
 54{ //74 \
 }54
 55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
 57*/77
 58}58
 60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
 110
 111#| ; |#111
 
+/* Comments and purported comments containing string delimiters. */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' "  ' " ' */128
+129/*   ' "  ' " ' */129
+" ' */125
+
 Local Variables:
 mode: fundamental
 eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun {-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?{ "<")
   (modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \;-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?\n ">")
   (modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \#|-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (modify-syntax-entry ?# ". 14")
   (modify-syntax-entry ?| ". 23n")
   (modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun /*-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped t)
   (modify-syntax-entry ?/ ". 124b")
   (modify-syntax-entry ?* ". 23")
-  (modify-syntax-entry ?\n "> b"))
+  (modify-syntax-entry ?\n "> b")
+  (modify-syntax-entry ?\' "\""))
 (defun /*-out ()
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?/ ".")
   (modify-syntax-entry ?* ".")
-  (modify-syntax-entry ?\n " "))
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
 (eval-and-compile
   (setq syntax-comments-section "c"))
 
@@ -489,4 +495,142 @@ /*-out
 (syntax-pps-comments /* 56 76 77 58)
 (syntax-pps-comments /* 60 78 79)
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+  (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
+  (modify-syntax-entry ?/ ". 124be")
+  (modify-syntax-entry ?* ". 23")
+  (modify-syntax-entry ?\n "> be")
+  (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+  (modify-syntax-entry ?/ ".")
+  (modify-syntax-entry ?* ".")
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
+(eval-and-compile
+  (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
 ;;; syntax-tests.el ends here
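
As a reading aid for the flag plumbing in the patch above, here is a minimal standalone sketch of how the new flag is stored and tested.  It is not the Emacs source: the `sketch_' names are hypothetical, and only the bit positions (bit 23 for 'c', bit 24 for 'e') are taken from the patch.

```c
#include <stdbool.h>

/* Standalone sketch, not the Emacs source.  Bit positions follow the
   patch: 'c' is bit 23, the new 'e' flag is bit 24.  */
enum
{
  SKETCH_FLAG_STYLEC  = 1 << 23,
  SKETCH_FLAG_ESCAPES = 1 << 24
};

/* Hypothetical helper: fold one flag character into VAL, in the way
   the switch in the patch does for 'c' and 'e'.  Other characters
   are ignored here.  */
static int
sketch_add_flag (int val, char flag)
{
  switch (flag)
    {
    case 'c':
      val |= SKETCH_FLAG_STYLEC;
      break;
    case 'e':
      val |= SKETCH_FLAG_ESCAPES;
      break;
    }
  return val;
}

/* Same test as SYNTAX_FLAGS_COMMENT_ESCAPES in the patch.  */
static bool
sketch_comment_escapes (int flags)
{
  return (flags >> 24) & 1;
}
```

With these definitions, sketch_comment_escapes (sketch_add_flag (0, 'e')) is true, while the same call with 'c' is false.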


-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-19 21:18           ` Alan Mackenzie
@ 2020-11-19 22:47             ` Stefan Monnier
  2020-11-22 13:12               ` Alan Mackenzie
  0 siblings, 1 reply; 36+ messages in thread
From: Stefan Monnier @ 2020-11-19 22:47 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

>> So, yeah, you can add yet-another-hack on top of the other syntax.c
>> hacks if you want, but there's a good chance it will only ever be used
>> by CC-mode.  It will take a lot more code changes in syntax.c than
>> a quick tweak to your Elisp code to search for "\*/".
[...]
> OK, here's the patch.

I think the patch agrees with my assessment above (even though it's
still missing an etc/NEWS entry and adjustments to the docstring of
modify-syntax-entry and to the .texi manual).

I really can't understand why you resist so much the use of
a `syntax-table` property on those rare \\\n sequences.


        Stefan


PS: Also, I just noticed that `gcc -Wall` warns about the use of such
multiline comments, so it doesn't seem to be a very popular feature.

PPS: For reference, I just tried to add support for it in sm-c-mode
and this is the resulting code:


@@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"#  define\"."
                                'syntax-table (string-to-syntax "|"))
             (put-text-property (match-beginning 2) (match-end 2)
                                'syntax-table (string-to-syntax "|")))
-          (sm-c--cpp-syntax-propertize end)))))
+          (sm-c--cpp-syntax-propertize end))))
+    ("\\\\\\(\n\\)"
+     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
+          (when (and (nth 4 ppss)        ;Within a comment
+                     (null (nth 7 ppss)) ;Within a // comment
+                     (save-excursion     ;The \ is not itself escaped
+                       (goto-char (match-beginning 0))
+                       (zerop (mod (skip-chars-backward "\\\\") 2))))
+            (string-to-syntax "."))))))
    (point) end))
 
 (defun sm-c-syntactic-face-function (ppss)







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-19 22:47             ` Stefan Monnier
@ 2020-11-22 13:12               ` Alan Mackenzie
  2020-11-22 15:20                 ` Stefan Monnier
  2020-11-22 15:35                 ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 13:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård, acm

Hello, Stefan.

On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode.  It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]
> > OK, here's the patch.

> I think the patch agrees with my assessment above (even though it's
> still missing an etc/NEWS entry and adjustments to the docstring of
> modify-syntax-entry and to the .texi manual).

Here are these things:



diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
 @cindex syntax flags
 
   In addition to the classes, entries for characters in a syntax table
-can specify flags.  There are eight possible flags, represented by the
+can specify flags.  There are nine possible flags, represented by the
 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.
 
   All the flags except @samp{p} are used to describe comment
 delimiters.  The digit flags are used for comment delimiters made up
 of 2 characters.  They indicate that a character can @emph{also} be
 part of a comment sequence, in addition to the syntactic properties
 associated with its character class.  The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
 first character of an end-of-comment sequence (@samp{*/}).  The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.
 
   Here is a table of the possible flags for a character @var{c},
 and what they mean:
@@ -332,6 +332,13 @@ Syntax Flags
 alternative ``c'' comment style.  For a two-character comment
 delimiter, @samp{c} on either character makes it of style ``c''.
 
+@item
+@samp{e} means that when @var{c}, a comment ender or first character
+of a two character ender, is directly proceded by one or more escape
+characters, @var{c} does not act as a comment ender.  Contrast this
+with the effect of variable @code{comment-end-can-be-escaped}
+(@pxref{Control Parsing}).
+
 @item
 @samp{n} on a comment delimiter character specifies that this kind of
 comment can be nested.  Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
 @item @samp{*}
 @samp{23b}
 @item newline
-@samp{>}
+@samp{> e}
 @end table
 
 This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags
 
 @item newline
 This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag.  It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
 @end table
 
 @item
@@ -962,9 +971,14 @@ Control Parsing
 @defvar comment-end-can-be-escaped
 If this buffer local variable is non-@code{nil}, a single character
 which usually terminates a comment doesn't do so when that character
-is escaped.  This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped.  This used to be used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where two consecutive escape characters escape the comment
+ender.  @code{comment-end-can-be-escaped} should not be used together
+with the @samp{e} syntax flag.
 @end defvar
 
 You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
 @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
 @item
 @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab@tab @samp{e} @tab @code{(ash 1 24)}
 @end multitable
 
 @defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
 \f
 * Lisp Changes in Emacs 28.1
 
++++
+** New syntax flag 'e'.
+This indicates that one or two (or more) escape characters escape a
+comment ender with this flag, causing the comment to be continued past
+that comment ender (typically onto the next line).
+
 +++
 ** 'set-window-configuration' now takes an optional 'dont-set-frame'
 parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
 The second character of NEWENTRY is the matching parenthesis,
  used only if the first character is `(' or `)'.
 Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
  1 means CHAR is the start of a two-char comment start sequence.
  2 means CHAR is the second character of such a sequence.
  3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
  c means CHAR is part of comment sequence c.
  n means CHAR is part of a nestable comment sequence.
 
+ e means CHAR, when a comment ender or first char of a two character
+   comment ender, can be escaped by (any number of consecutive)
+   characters with escape syntax.  C and C++ use this facility.
+   Compare and contrast with the variable `comment-end-can-be-escaped'.
+
  p means CHAR is a prefix character for `backward-prefix-chars';
    such characters are treated as whitespace when they occur
    between expressions.



> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.

Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.

There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.

>         Stefan


> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.

It is more of a mistake that people occasionally might make than a
feature.  In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards.  Anybody might "end" a line
comment accidentally with "\" or "\\".
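
To make the trap concrete, here is a small sketch of my own (the function name is made up for illustration).  Both line comments end in a backslash directly before the newline, so each one swallows the assignment on the following line; this is the construct that, as Stefan notes, `gcc -Wall` warns about.

```c
/* Each line comment below ends in a backslash directly followed by a
   newline, so the comment continues onto the next line and the
   assignment there is never compiled.  */
static int
value_after_continued_comments (void)
{
  int x = 1;
  // a single trailing backslash continues the comment \
  x = 2;
  // and so does a doubled one \\
  x = 3;
  return x;   /* Still 1: neither assignment survived.  */
}
```

The doubled backslash behaves the same way as the single one, because only the last backslash before the newline matters to the compiler.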

> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:

Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense.  In particular two "\"s followed by NL still continue the
comment.
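
The difference between the two rules can be sketched as a pair of predicates.  This is only an illustration under my own assumptions, working on a plain C string with hypothetical function names; it is not the code in syntax.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the backslashes immediately before TEXT[POS].  */
static size_t
backslashes_before (const char *text, size_t pos)
{
  size_t n = 0;
  while (n < pos && text[pos - 1 - n] == '\\')
    n++;
  return n;
}

/* Ordinary escaping: TEXT[POS] is quoted only when the run of
   backslashes before it has odd length.  */
static bool
quoted_by_parity (const char *text, size_t pos)
{
  return backslashes_before (text, pos) % 2 == 1;
}

/* C line-comment rule: a backslash directly before the newline at
   TEXT[POS] continues the comment, however many precede it.  */
static bool
continued_by_any_backslash (const char *text, size_t pos)
{
  return backslashes_before (text, pos) >= 1;
}
```

For a line ending in two backslashes, quoted_by_parity says the newline is not quoted, whereas continued_by_any_backslash says the comment continues.  The second answer is the one the C and C++ standards require.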

> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"#  define\"."
>                                 'syntax-table (string-to-syntax "|"))
>              (put-text-property (match-beginning 2) (match-end 2)
>                                 'syntax-table (string-to-syntax "|")))
> -          (sm-c--cpp-syntax-propertize end)))))
> +          (sm-c--cpp-syntax-propertize end))))
> +    ("\\\\\\(\n\\)"
> +     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> +          (when (and (nth 4 ppss)        ;Within a comment
> +                     (null (nth 7 ppss)) ;Within a // comment
> +                     (save-excursion     ;The \ is not itself escaped
> +                       (goto-char (match-beginning 0))
> +                       (zerop (mod (skip-chars-backward "\\\\") 2))))
> +            (string-to-syntax "."))))))
>     (point) end))
>  
>  (defun sm-c-syntactic-face-function (ppss)

Yes, something like this would be possible.  But all these syntax-ppss
calls would be at least somewhat slow, as discussed above.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 13:12               ` Alan Mackenzie
@ 2020-11-22 15:20                 ` Stefan Monnier
  2020-11-22 17:08                   ` Alan Mackenzie
  2020-11-22 15:35                 ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Stefan Monnier @ 2020-11-22 15:20 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> Because syntax-table text properties are already used for so many
> different things in CC Mode (I think the count is five in C++ Mode).
> Adding another one would mean having to scan for this rare construct at
> every buffer change, and this would slow things down, possibly a lot.

The fact that you already have 5 other such uses implies that the
slowdown from this one cannot possibly be larger than 20% (since the
scan for it is very simple, I doubt any of the other 5 is simpler).

Most major modes have such things and we live just fine with them.
This is a non-issue.


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 13:12               ` Alan Mackenzie
  2020-11-22 15:20                 ` Stefan Monnier
@ 2020-11-22 15:35                 ` Eli Zaretskii
  2020-11-22 17:03                   ` Alan Mackenzie
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2020-11-22 15:35 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, mattiase, monnier

> Date: Sun, 22 Nov 2020 13:12:31 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: 43558@debbugs.gnu.org,
>  Mattias Engdegård <mattiase@acm.org>, acm@muc.de
> 
> +@samp{e} means that when @var{c}, a comment ender or first character
> +of a two character ender, is directly proceded by one or more escape
                                         ^^^^^^^^
"preceded", I guess?






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 15:35                 ` Eli Zaretskii
@ 2020-11-22 17:03                   ` Alan Mackenzie
  0 siblings, 0 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 17:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43558, mattiase, monnier

Hello, Eli.

On Sun, Nov 22, 2020 at 17:35:24 +0200, Eli Zaretskii wrote:
> > Date: Sun, 22 Nov 2020 13:12:31 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: 43558@debbugs.gnu.org,
> >  Mattias Engdegård <mattiase@acm.org>, acm@muc.de

> > +@samp{e} means that when @var{c}, a comment ender or first character
> > +of a two character ender, is directly proceded by one or more escape
>                                          ^^^^^^^^
> "preceded", I guess?

Er, yes.  Thanks!  I've corrected it.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 15:20                 ` Stefan Monnier
@ 2020-11-22 17:08                   ` Alan Mackenzie
  2020-11-22 17:46                     ` Dmitry Gutov
  2020-11-22 23:10                     ` Stefan Monnier
  0 siblings, 2 replies; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 17:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Sun, Nov 22, 2020 at 10:20:32 -0500, Stefan Monnier wrote:
> > Because syntax-table text properties are already used for so many
> > different things in CC Mode (I think the count is five in C++ Mode).
> > Adding another one would mean having to scan for this rare construct at
> > every buffer change, and this would slow things down, possibly a lot.

> The fact that you already have 5 other such uses implies that the
> slowdown from this one cannot possibly be larger than 20% (since the
> scan for it is very simple, I doubt any of the other 5 is simpler).

The fact remains that an implementation at the C level is objectively
better than one at the Lisp level.

> Most major modes have such things and we live just fine with them.
> This is a non-issue.

Really?  Are there any other programming language modes whose comments
syntax.c cannot handle without syntax-table text properties?

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 17:08                   ` Alan Mackenzie
@ 2020-11-22 17:46                     ` Dmitry Gutov
  2020-11-22 18:19                       ` Alan Mackenzie
  2020-11-22 23:10                     ` Stefan Monnier
  1 sibling, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2020-11-22 17:46 UTC (permalink / raw)
  To: Alan Mackenzie, Stefan Monnier; +Cc: 43558, Mattias Engdegård

On 22.11.2020 19:08, Alan Mackenzie wrote:
> Really?  Are there any other programming language modes whose comments
> syntax.c cannot handle without syntax-table text properties?

Ruby is just one example.






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 17:46                     ` Dmitry Gutov
@ 2020-11-22 18:19                       ` Alan Mackenzie
  2020-11-22 20:39                         ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 18:19 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Hello, Dmitry.

On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
> On 22.11.2020 19:08, Alan Mackenzie wrote:
> > Really?  Are there any other programming language modes whose comments
> > syntax.c cannot handle without syntax-table text properties?

> Ruby is just one example.

Thanks.

I've just searched the web for that.  Ruby has block comment delimiters
=begin and =end.

It would be possible to handle these in syntax.c, but somewhat clumsy
and awkward.

Presumably ruby-mode handles these with syntax-table text properties
applied to the = sign and the terminating d, which is a little clumsy,
but not too bad, at the Lisp level.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 18:19                       ` Alan Mackenzie
@ 2020-11-22 20:39                         ` Dmitry Gutov
  2020-11-22 21:13                           ` Alan Mackenzie
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2020-11-22 20:39 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Stefan Monnier

On 22.11.2020 20:19, Alan Mackenzie wrote:
> Hello, Dmitry.
> 
> On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
>> On 22.11.2020 19:08, Alan Mackenzie wrote:
>>> Really?  Are there any other programming language modes whose comments
>>> syntax.c cannot handle without syntax-table text properties?
> 
>> Ruby is just one example.
> 
> Thanks.
> 
> I've just searched the web for that.  Ruby has block comment delimiters
> =begin and =end.
> 
> It would be possible to handle these in syntax.c, but somewhat clumsy
> and awkward.

Just like the C comments syntax discussed here.

> Presumably ruby-mode handles these with syntax-table text properties
> applied to the = sign and the terminating d, which is a little clumsy,
> but not too bad, at the Lisp level.

This is just two more regexps to search for (and propertize). I don't 
expect that the slowdown from them is in any way perceptible.

And the general point is that the Emacs syntax table structure doesn't 
necessarily have to mirror the syntax of the C language.






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 20:39                         ` Dmitry Gutov
@ 2020-11-22 21:13                           ` Alan Mackenzie
  2020-11-22 21:34                             ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 21:13 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Hello, Dmitry.

On Sun, Nov 22, 2020 at 22:39:08 +0200, Dmitry Gutov wrote:
> On 22.11.2020 20:19, Alan Mackenzie wrote:
> > On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote:
> >> On 22.11.2020 19:08, Alan Mackenzie wrote:
> >>> Really?  Are there any other programming language modes whose comments
> >>> syntax.c cannot handle without syntax-table text properties?

> >> Ruby is just one example.

> > Thanks.

> > I've just searched the web for that.  Ruby has block comment delimiters
> > =begin and =end.

> > It would be possible to handle these in syntax.c, but somewhat clumsy
> > and awkward.

> Just like the C comments syntax discussed here.

Not at all.  The amendment we're talking about is to handle escaped
newlines inside line comments.  Which takes precedence, the comment to
EOL, or the escape?  It's rather arbitrary, and should be configurable.

Coding up the Ruby block comments in syntax.c would involve string
comparisons, for example, and would be an entirely new flavour inside
that file.  It would involve examining individual letters rather than
just their syntax.
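
To make the point about string comparisons concrete, here is a minimal
sketch in C.  It is not code from syntax.c; the function name
ruby_block_comment_starts_at and its arguments are hypothetical.  It
assumes the opener is the literal text =begin at the beginning of a
line, followed by whitespace or the end of the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of syntax.c: return true if a Ruby
   block comment opener ("=begin" at the beginning of a line, followed
   by whitespace or the end of the buffer) starts at offset POS in the
   LEN bytes at BUF.  */
static bool
ruby_block_comment_starts_at (const char *buf, size_t len, size_t pos)
{
  static const char opener[] = "=begin";
  size_t n = sizeof opener - 1;

  /* Not enough room left for the delimiter.  */
  if (pos > len || len - pos < n)
    return false;
  /* The delimiter is only recognised at the beginning of a line.  */
  if (pos > 0 && buf[pos - 1] != '\n')
    return false;
  /* Individual letters are compared here, not syntax classes.  */
  if (memcmp (buf + pos, opener, n) != 0)
    return false;
  /* "=beginning" must not match: require whitespace or end of buffer.  */
  if (pos + n == len)
    return true;
  char next = buf[pos + n];
  return next == ' ' || next == '\t' || next == '\n' || next == '\r';
}
```

The memcmp call is the part with no counterpart in a scanner that
dispatches purely on the syntax class of each character.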

By contrast, coding up the escaped NL in syntax.c was straightforward and
natural.

Have you looked at the patch?

> > Presumably ruby-mode handles these with syntax-table text properties
> > applied to the = sign and the terminating d, which is a little clumsy,
> > but not too bad, at the Lisp level.

> This is just two more regexps to search for (and propertize). I don't 
> expect that the slowdown from them is in any way perceptible.

> And the general point is that the Emacs syntax table structure doesn't 
> necessarily have to mirror the syntax of the C language.

Maybe not, but the point remains, that for this fix, a fix at the C level
is objectively better than a fix at the Lisp level.  Furthermore, the C
level change is already implemented and has been well tested.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 21:13                           ` Alan Mackenzie
@ 2020-11-22 21:34                             ` Dmitry Gutov
  2020-11-22 22:01                               ` Alan Mackenzie
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2020-11-22 21:34 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Hi Alan,

On 22.11.2020 23:13, Alan Mackenzie wrote:

> Coding up the Ruby block comments in syntax.c would involve string
> comparisons, for example, and would be an entirely new flavour inside
> that file.  It would involve examining individual letters rather than
> just their syntax.

It could be made to support a new syntax using a finite state machine, 
something like that. And the strings could be converted to such by the 
major mode. But you're right, it would be more difficult.

> By contrast, coding up the escaped NL in syntax.c was straightforward and
> natural.
> 
> Have you looked at the patch?

Yup.

It's not terrible, but it's still a bunch of new if/elses that one would 
need to grasp to maintain that code.

>>> Presumably ruby-mode handles these with syntax-table text properties
>>> applied to the = sign and the terminating d, which is a little clumsy,
>>> but not too bad, at the Lisp level.
> 
>> This is just two more regexps to search for (and propertize). I don't
>> expect that the slowdown from them is in any way perceptible.
> 
>> And the general point is that the Emacs syntax table structure doesn't
>> necessarily have to mirror the syntax of the C language.
> 
> Maybe not, but the point remains, that for this fix, a fix at the C level
> is objectively better than a fix at the Lisp level.  Furthermore, the C
> level change is already implemented and has been well tested.

Why is it objectively better?

With user experience (speed, latencies, etc) being equal or within the 
margin of error, I think it's more logical to go with simpler data 
structures and low level APIs.

Finally, as I recall you feel strongly about supporting older Emacs 
versions, a significant number of them. Doing that fix in Lisp would 
allow you to fix the bug for those versions too. Not just Emacs 28+.






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 21:34                             ` Dmitry Gutov
@ 2020-11-22 22:01                               ` Alan Mackenzie
  2020-11-22 23:00                                 ` Stefan Monnier
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Mackenzie @ 2020-11-22 22:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier

Hello, Dmitry.

On Sun, Nov 22, 2020 at 23:34:18 +0200, Dmitry Gutov wrote:
> Hi Alan,

> On 22.11.2020 23:13, Alan Mackenzie wrote:

> > Coding up the Ruby block comments in syntax.c would involve string
> > comparisons, for example, and would be an entirely new flavour
> > inside that file.  It would involve examining individual letters
> > rather than just their syntax.

> It could be made to support a new syntax using a finite state machine,
> something like that. And the strings could be converted to such by the
> major mode. But you're right, it would be more difficult.

> > By contrast, coding up the escaped NL in syntax.c was
> > straightforward and natural.

> > Have you looked at the patch?

> Yup.

> It's not terrible, but it's still a bunch of new if/elses that one
> would need to grasp to maintain that code.

Its character, the general use of ifs/elses, and so on, is unchanged.
Only somebody with a very detailed memory of exact statements would be
inconvenienced, and that only slightly.

> >>> Presumably ruby-mode handles these with syntax-table text
> >>> properties applied to the = sign and the terminating d, which is a
> >>> little clumsy, but not too bad, at the Lisp level.

> >> This is just two more regexps to search for (and propertize). I
> >> don't expect that the slowdown from them is in any way perceptible.

> >> And the general point is that the Emacs syntax table structure
> >> doesn't necessarily have to mirror the syntax of the C language.

> > Maybe not, but the point remains, that for this fix, a fix at the C
> > level is objectively better than a fix at the Lisp level.
> > Furthermore, the C level change is already implemented and has been
> > well tested.

> Why is it objectively better?

It's faster, and it avoids fragmenting the handling of CC Mode comments
between C and Lisp the way that of strings, for example, has been.  It
provides a mechanism which might be useful to other major modes in the
future.

> With user experience (speed, latencies, etc) being equal or within the 
> margin of error, I think it's more logical to go with simpler data 
> structures and low level APIs.

Fixing things in syntax.c was simpler than a Lisp solution using
syntax-table text properties would have been.

> Finally, as I recall you feel strongly about supporting older Emacs 
> versions, a significant number of them. Doing that fix in Lisp would 
> allow you to fix the bug for those versions too. Not just Emacs 28+.

Yes.  That appears to be the sole drawback of the fix being in syntax.c.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 22:01                               ` Alan Mackenzie
@ 2020-11-22 23:00                                 ` Stefan Monnier
  2021-05-13 10:38                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 36+ messages in thread
From: Stefan Monnier @ 2020-11-22 23:00 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Dmitry Gutov

> It provides a mechanism which might be useful to other major modes in
> the future.

FWIW, I doubt it.
In my experience these details never quite match.

>> With user experience (speed, latencies, etc) being equal or within the 
>> margin of error, I think it's more logical to go with simpler data 
>> structures and low level APIs.
> Fixing things in syntax.c was simpler than a Lisp solution using
> syntax-table text properties would have been.

I find it hard to take this statement seriously after having seen my
sm-c-mode patch.

>> Finally, as I recall you feel strongly about supporting older Emacs
>> versions, a significant number of them. Doing that fix in Lisp would
>> allow you to fix the bug for those versions too. Not just Emacs 28+.
> Yes.  That appears to be the sole drawback of the fix being in syntax.c.

I don't really find this to be a drawback since I don't care about
CC-mode's support for multiline `//` comments in older Emacsen.

But the added complexity in the C code and in the API documentation for the
sole benefit of C (and C++?) mode is a drawback, yes.

I do want to clarify my position, tho: while I don't like your patch
very much because I think a Lisp-level solution (like the one I used in
sm-c-mode) is preferable I find your patch acceptable.


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 17:08                   ` Alan Mackenzie
  2020-11-22 17:46                     ` Dmitry Gutov
@ 2020-11-22 23:10                     ` Stefan Monnier
  1 sibling, 0 replies; 36+ messages in thread
From: Stefan Monnier @ 2020-11-22 23:10 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> Really?  Are there any other programming language modes whose comments
> syntax.c cannot handle without syntax-table text properties?

Yes, a fair number.  Beside the ones that use "long" delimiters, there
are some that put restrictions about where a comment can start (e.g. the
# in `sh` is a normal character when it appears within a word), there's
Scheme's #;(...) "sexp comment" (which we still don't actually handle
correctly), there are cases where there are too many different syntaxes
(e.g. in Pascal you can have (*...*), /*...*/, //...\n and then Emacs
concludes that `(/` is also a valid comment starter), ...
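
As an illustration of the `sh` case only, here is a minimal sketch in C
of a check that a plain syntax-table entry for # cannot express.  The
function name sh_hash_starts_comment is hypothetical, and the rule is
deliberately simplified: quoting and escapes are ignored, and only
blanks and a few operator characters count as word breaks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether the '#' at LINE[I] starts a
   comment in sh.  Simplified: quoting and escapes are ignored, and only
   blanks and a few operator characters are treated as word breaks.  */
static bool
sh_hash_starts_comment (const char *line, size_t i)
{
  if (line[i] != '#')
    return false;
  if (i == 0)
    return true;		/* '#' at the very start of the text.  */
  switch (line[i - 1])
    {
    case ' ': case '\t': case '\n':
    case ';': case '|': case '&':
    case '(': case ')': case '<': case '>':
      return true;		/* '#' begins a new word.  */
    default:
      return false;		/* '#' inside a word, e.g. "a#b".  */
    }
}
```

The decision depends on the preceding character, which is the kind of
context a mode would normally supply with syntax-table text properties.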


        Stefan







* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2020-11-22 23:00                                 ` Stefan Monnier
@ 2021-05-13 10:38                                   ` Lars Ingebrigtsen
  2021-05-13 14:51                                     ` Alan Mackenzie
  0 siblings, 1 reply; 36+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-13 10:38 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: 43558, Alan Mackenzie, Mattias Engdegård, Dmitry Gutov

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I do want to clarify my position, tho: while I don't like your patch
> very much because I think a Lisp-level solution (like the one I used in
> sm-c-mode) is preferable I find your patch acceptable.

This was the patch that adds the "e" syntax modifier?  Apparently this
thread stalled here, and as far as I can see, the patch was never applied?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2021-05-13 10:38                                   ` Lars Ingebrigtsen
@ 2021-05-13 14:51                                     ` Alan Mackenzie
  2021-05-16 13:53                                       ` Lars Ingebrigtsen
  2022-04-28 11:17                                       ` Lars Ingebrigtsen
  0 siblings, 2 replies; 36+ messages in thread
From: Alan Mackenzie @ 2021-05-13 14:51 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Hello, Lars.

On Thu, May 13, 2021 at 12:38:01 +0200, Lars Ingebrigtsen wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > I do want to clarify my position, tho: while I don't like your patch
> > very much because I think a Lisp-level solution (like the one I used in
> > sm-c-mode) is preferable I find your patch acceptable.

> This was the patch that adds the "e" syntax modifier?  Apparently this
> thread stalled here, and as far as I can see, the patch was never applied?

That is correct, yes.  The patch was working and ready, but I took
Stefan's criticisms on board, and I think he was right.  It was a fairly
large change to syntax.c with a minor gain.  Also, it would have meant
having two alternative comment mechanisms in CC Mode for a few releases.
So I think on balance, it would be better to regard the patch as an
experiment only, which should go no further.

I think the bug should stay open, though.  CC Mode still doesn't handle
multi-line comments absolutely correctly, and these could surely be
fixed, like Stefan said, by using syntax-table text properties somehow.

I also created a plethora of syntax test cases for the patch, some of
which are applicable to syntax.c as is.  I'll get around to sorting
these out and committing them sometime.

> -- 
> (domestic pets only, the antidote for overdose, milk.)
>    bloggy blog: http://lars.ingebrigtsen.no

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2021-05-13 14:51                                     ` Alan Mackenzie
@ 2021-05-16 13:53                                       ` Lars Ingebrigtsen
  2022-04-28 11:17                                       ` Lars Ingebrigtsen
  1 sibling, 0 replies; 36+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-16 13:53 UTC (permalink / raw)
  To: Alan Mackenzie
  Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Alan Mackenzie <acm@muc.de> writes:

> I think the bug should stay open, though.  CC Mode still doesn't handle
> multi-line comments absolutely correctly, and these could surely be
> fixed, like Stefan said, by using syntax-table text properties somehow.

OK; makes sense to me.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2021-05-13 14:51                                     ` Alan Mackenzie
  2021-05-16 13:53                                       ` Lars Ingebrigtsen
@ 2022-04-28 11:17                                       ` Lars Ingebrigtsen
  2022-04-28 18:52                                         ` Alan Mackenzie
  1 sibling, 1 reply; 36+ messages in thread
From: Lars Ingebrigtsen @ 2022-04-28 11:17 UTC (permalink / raw)
  To: Alan Mackenzie
  Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Alan Mackenzie <acm@muc.de> writes:

> I think the bug should stay open, though.  CC Mode still doesn't handle
> multi-line comments absolutely correctly, and these could surely be
> fixed, like Stefan said, by using syntax-table text properties somehow.

Just to add a test case for this bug -- put this into a C file:

/*
  \*/
a */

cc-mode will highlight that as one comment (i.e., the "a" will be
coloured with the comment face), but the file won't compile, since the
comment ended at line two.  (This is in Emacs 29.)
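
For reference, a minimal sketch in C of the compiler's view of that
input.  The function name c_block_comment_end is hypothetical, and the
sketch ignores backslash-newline line splicing; it only shows that a
backslash before the closer does not escape it.

```c
#include <stddef.h>

/* Hypothetical helper: given text S that starts with the two characters
   of a block comment opener, return the offset just past the first
   comment closer, or 0 if the comment is unterminated.  A backslash gets
   no special treatment, matching how a C compiler reads a backslash
   that is not immediately followed by a newline.  */
static size_t
c_block_comment_end (const char *s)
{
  for (size_t i = 2; s[i] != '\0'; i++)
    if (s[i] == '*' && s[i + 1] == '/')
      return i + 2;
  return 0;
}
```

On the three-line test case this returns the offset of the end of line
two, so the "a" on line three is outside the comment.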

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
  2022-04-28 11:17                                       ` Lars Ingebrigtsen
@ 2022-04-28 18:52                                         ` Alan Mackenzie
  0 siblings, 0 replies; 36+ messages in thread
From: Alan Mackenzie @ 2022-04-28 18:52 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Hello, Lars.

On Thu, Apr 28, 2022 at 13:17:12 +0200, Lars Ingebrigtsen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > I think the bug should stay open, though.  CC Mode still doesn't handle
> > multi-line comments absolutely correctly, and these could surely be
> > fixed, like Stefan said, by using syntax-table text properties somehow.

> Just to add a test case for this bug -- put this into a C file:

> /*
>   \*/
> a */

> cc-mode will highlight that as one comment (i.e., the "a" will be
> coloured with the comment face), but the file won't compile, since the
> comment ended at line two.  (This is in Emacs 29.)

I'll look at this in the coming days, and hopefully fix it.

> -- 
> (domestic pets only, the antidote for overdose, milk.)
>    bloggy blog: http://lars.ingebrigtsen.no

-- 
Alan Mackenzie (Nuremberg, Germany).






end of thread, other threads:[~2022-04-28 18:52 UTC | newest]

Thread overview: 36+ messages
2020-09-22  9:35 bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped Alan Mackenzie
2020-09-22 14:09 ` Stefan Monnier
2020-09-22 19:41   ` Alan Mackenzie
     [not found] ` <handler.43558.B.160076736116422.ack@debbugs.gnu.org>
2020-09-23  8:57   ` Alan Mackenzie
2020-09-23  9:01 ` Mattias Engdegård
2020-09-23 14:48   ` Alan Mackenzie
2020-09-23 18:44     ` Stefan Monnier
2020-09-23 19:44       ` Alan Mackenzie
2020-09-23 20:02         ` Stefan Monnier
2020-09-24 10:20       ` Alan Mackenzie
2020-09-24 16:56         ` Stefan Monnier
2020-09-24 18:50           ` Alan Mackenzie
2020-09-24 22:43             ` Stefan Monnier
2020-11-19 21:18           ` Alan Mackenzie
2020-11-19 22:47             ` Stefan Monnier
2020-11-22 13:12               ` Alan Mackenzie
2020-11-22 15:20                 ` Stefan Monnier
2020-11-22 17:08                   ` Alan Mackenzie
2020-11-22 17:46                     ` Dmitry Gutov
2020-11-22 18:19                       ` Alan Mackenzie
2020-11-22 20:39                         ` Dmitry Gutov
2020-11-22 21:13                           ` Alan Mackenzie
2020-11-22 21:34                             ` Dmitry Gutov
2020-11-22 22:01                               ` Alan Mackenzie
2020-11-22 23:00                                 ` Stefan Monnier
2021-05-13 10:38                                   ` Lars Ingebrigtsen
2021-05-13 14:51                                     ` Alan Mackenzie
2021-05-16 13:53                                       ` Lars Ingebrigtsen
2022-04-28 11:17                                       ` Lars Ingebrigtsen
2022-04-28 18:52                                         ` Alan Mackenzie
2020-11-22 23:10                     ` Stefan Monnier
2020-11-22 15:35                 ` Eli Zaretskii
2020-11-22 17:03                   ` Alan Mackenzie
2020-09-24 18:52     ` Michael Welsh Duggan
2020-09-24 19:57       ` Alan Mackenzie
2020-09-24 20:27         ` Michael Welsh Duggan
