unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
@ 2023-05-16 10:57 Alan Mackenzie
  2023-05-16 14:03 ` Alan Mackenzie
  2023-05-16 15:43 ` Eli Zaretskii
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 10:57 UTC (permalink / raw)
  To: 63535

Hello, Emacs.

In the master branch:

Consider the following C++ Mode buffer:

    // comment \
    comment line 2
    line_3();

..  The backslash at the end of line 1 extends the comment into line 2.

Put point at // on L1, and do:

    M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))

..  s gets the parse state of the inside of the comment.

Now put point at EOL 1, between the backslash and the LF.  Do

    M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)

..  This ought to leave point at BOL 2, since the syntax before the LF at
EOL 1 is that of a C++ comment, otherwise neutral.  Instead, it leaves
point wrongly at BOL 3.

#########################################################################

The reason for this bug is at L+42 of forw_comment (in syntax.c).  There
we have

   && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

..  Checking char_quoted is wrong.  Instead the function should check the
current parse state.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
@ 2023-05-16 14:03 ` Alan Mackenzie
  2023-05-17 22:01   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-05-16 15:43 ` Eli Zaretskii
  1 sibling, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 14:03 UTC (permalink / raw)
  To: 63535; +Cc: acm

On Tue, May 16, 2023 at 10:57:40 +0000, Alan Mackenzie wrote:
> Hello, Emacs.

> In the master branch:

> Consider the following C++ Mode buffer:

>     // comment \
>     comment line 2
>     line_3();

> ..  The backslash at the end of line 1 extends the comment into line 2.

> Put point at // on L1, and do:

>     M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))

> ..  s gets the parse state of the inside of the comment.

> Now put point at EOL 1, between the backslash and the LF.  Do

>     M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)

> ..  This ought to leave point at BOL 2, since the syntax before the LF at
> EOL 1 is that of a C++ comment, otherwise neutral.  Instead, it leaves
> point wrongly at BOL 3.

> #########################################################################

> The reason for this bug is at L+42 of forw_comment (in syntax.c).  There
> we have

>    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> ..  Checking char_quoted is wrong.  Instead the function should check the
> current parse state.

And here is a patch which fixes it.  I will apply this patch to master
soon if I don't hear any objection.



diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..76d9f16e4ed 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
 	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
 	      (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !(comment_end_can_be_escaped &&
+	       (((prev_syntax & 0xff) == Sescape)
+		|| ((prev_syntax & 0xff) == Scharquote))))
 	/* We have encountered a comment end of the same style
 	   as the comment sequence which began this comment
 	   section.  */


-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
  2023-05-16 14:03 ` Alan Mackenzie
@ 2023-05-16 15:43 ` Eli Zaretskii
  2023-05-16 16:15   ` Alan Mackenzie
  1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 15:43 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 63535

> Date: Tue, 16 May 2023 10:57:40 +0000
> From: Alan Mackenzie <acm@muc.de>
> 
> Hello, Emacs.
> 
> In the master branch:

Is it different on emacs-29?

>    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> 
> ..  Checking char_quoted is wrong.  Instead the function should check the
> current parse state.

Why not both?  IOW, please explain why char_quoted is not TRT here.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 15:43 ` Eli Zaretskii
@ 2023-05-16 16:15   ` Alan Mackenzie
  2023-05-16 16:29     ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 16:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63535

Hello, Eli.

On Tue, May 16, 2023 at 18:43:59 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 10:57:40 +0000
> > From: Alan Mackenzie <acm@muc.de>

> > Hello, Emacs.

> > In the master branch:

> Is it different on emacs-29?

No, the bug has been there since ?2016, having been coded, almost
certainly, by me.  ;-(  The context in 2016 was making an escaped NL in
a C++ line comment continue the comment's fontification onto the next
line.  The (then) new variable comment-end-can-be-escaped configured the
effect of the backslash at EOL.

I have been assuming that it is too unimportant a bug to go into
emacs-29 at this late stage.

> >    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> > ..  Checking char_quoted is wrong.  Instead the function should check the
> > current parse state.

> Why not both?  IOW, please explain why char_quoted is not TRT here.

Because parse-partial-sexp is not scanning the backslash.  The scan
starts one character after the backslash, and the syntactic effect of
that backslash is not in the OLDSTATE argument to parse-partial-sexp.

-- 
Alan Mackenzie (Nuremberg, Germany)





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 16:15   ` Alan Mackenzie
@ 2023-05-16 16:29     ` Eli Zaretskii
  2023-05-16 16:58       ` Alan Mackenzie
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 16:29 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 63535

> Date: Tue, 16 May 2023 16:15:24 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > >    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> 
> > > ..  Checking char_quoted is wrong.  Instead the function should check the
> > > current parse state.
> 
> > Why not both?  IOW, please explain why char_quoted is not TRT here.
> 
> Because parse-partial-sexp is not scanning the backslash.  The scan
> starts one character after the backslash, and the syntactic effect of
> that backslash is not in the OLDSTATE argument to parse-partial-sexp.

Sorry, I still don't follow: char_quoted doesn't call
parse-partial-sexp, AFAICT.  So why does it matter what
parse-partial-sexp does when we are discussing why char_quoted is not
TRT?





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 16:29     ` Eli Zaretskii
@ 2023-05-16 16:58       ` Alan Mackenzie
  2023-05-16 17:50         ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 63535

Hello, Eli.

On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 16:15:24 +0000
> > Cc: 63535@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > >    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> > > > ..  Checking char_quoted is wrong.  Instead the function should check the
> > > > current parse state.

> > > Why not both?  IOW, please explain why char_quoted is not TRT here.

> > Because parse-partial-sexp is not scanning the backslash.  The scan
> > starts one character after the backslash, and the syntactic effect of
> > that backslash is not in the OLDSTATE argument to parse-partial-sexp.

> Sorry, I still don't follow: char_quoted doesn't call
> parse-partial-sexp, AFAICT.

parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.

> So why does it matter what parse-partial-sexp does when we are
> discussing why char_quoted is not TRT?

parse-partial-sexp is the context in which the bug becomes evident.  If,
in the C++ line comment with escaped NL, you start parse-partial-sexp at
the NL, it behaves as though the scan started at the backslash.  This is
the bug.

The cause of the bug is the use of char_quoted at line 42 of
forw_comment.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 16:58       ` Alan Mackenzie
@ 2023-05-16 17:50         ` Eli Zaretskii
  0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 17:50 UTC (permalink / raw)
  To: Alan Mackenzie, Stefan Monnier; +Cc: 63535

> Date: Tue, 16 May 2023 16:58:55 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > > Date: Tue, 16 May 2023 16:15:24 +0000
> > > Cc: 63535@debbugs.gnu.org
> > > From: Alan Mackenzie <acm@muc.de>
> 
> > > > >    && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> 
> > > > > ..  Checking char_quoted is wrong.  Instead the function should check the
> > > > > current parse state.
> 
> > > > Why not both?  IOW, please explain why char_quoted is not TRT here.
> 
> > > Because parse-partial-sexp is not scanning the backslash.  The scan
> > > starts one character after the backslash, and the syntactic effect of
> > > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
> 
> > Sorry, I still don't follow: char_quoted doesn't call
> > parse-partial-sexp, AFAICT.
> 
> parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
> 
> > So why does it matter what parse-partial-sexp does when we are
> > discussing why char_quoted is not TRT?
> 
> parse-partial-sexp is the context in which the bug becomes evident.  If,
> in the C++ line comment with escaped NL, you start parse-partial-sexp at
> the NL, it behaves as though the scan started at the backslash.  This is
> the bug.
> 
> The cause of the bug is the use of char_quoted at line 42 of
> forw_comment.

Thanks, let's see what Stefan has to say about this.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-16 14:03 ` Alan Mackenzie
@ 2023-05-17 22:01   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-05-22 14:59     ` Alan Mackenzie
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-17 22:01 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 63535

Hi Alan,

> diff --git a/src/syntax.c b/src/syntax.c
> index e9e04e2d638..76d9f16e4ed 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
>  	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
>  	      (nesting > 0 && --nesting == 0) : nesting < 0)
> -          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
> +          && !(comment_end_can_be_escaped &&
> +	       (((prev_syntax & 0xff) == Sescape)
> +		|| ((prev_syntax & 0xff) == Scharquote))))
>  	/* We have encountered a comment end of the same style
>  	   as the comment sequence which began this comment
>  	   section.  */

AFAIK this is your code, so you should know better, but AFAICT
`prev_syntax` is not updated in the loop, so it only reflects the syntax
before the beginning of the scanned text, rather than anything near `from`.
Are you sure this is right?


        Stefan






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-17 22:01   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-05-22 14:59     ` Alan Mackenzie
  2023-05-22 15:16       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-22 14:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 63535

Hello, Stefan.

On Wed, May 17, 2023 at 18:01:32 -0400, Stefan Monnier wrote:
> Hi Alan,

[ .... ]

> AFAIK this is your code, so you should know better, but AFAICT
> `prev_syntax` is not updated in the loop, so it only reflects the syntax
> before the beginning of the scanned text, rather than anything near `from`.
> Are you sure this is right?

Thanks, you are correct, the patch was not good.

It turned out to be quite tricky to get working.  As well as
forw_comment, I had to amend scan_sexps_forward to make it return a
quoted state to its caller when this happens at the limit of the scan.

I think the following patch is better.  Would you please have a look at
it, in the hope I haven't made any other silly mistakes.  Thanks!



diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..94b2ac2b591 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2338,13 +2338,16 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  return 0;
 	}
       c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      prev_syntax = syntax;
       syntax = SYNTAX_WITH_FLAGS (c);
       code = syntax & 0xff;
       if (code == Sendcomment
 	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
 	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
 	      (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !(comment_end_can_be_escaped
+	       && ((prev_syntax & 0xff) == Sescape
+		   || (prev_syntax & 0xff) == Scharquote)))
 	/* We have encountered a comment end of the same style
 	   as the comment sequence which began this comment
 	   section.  */
@@ -2368,7 +2371,11 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
           if (from == stop) continue; /* Failure */
-        }
+	  c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+	  prev_syntax = syntax;
+	  syntax = Smax;
+	  code = syntax;
+	}
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 
@@ -3349,7 +3356,14 @@ do { prev_from = from;				\
 	     are invalid now.  Luckily, the `done' doesn't use them
 	     and the INC_FROM sets them to a sane value without
 	     looking at them. */
-	  if (!found) goto done;
+	  if (!found)
+	    {
+	      if ((prev_from_syntax & 0xff) == Sescape
+		  || (prev_from_syntax & 0xff) == Scharquote)
+		goto endquoted;
+	      else
+		goto done;
+	    }
 	  INC_FROM;
 	  state->incomment = 0;
 	  state->comstyle = 0;	/* reset the comment style */


>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-22 14:59     ` Alan Mackenzie
@ 2023-05-22 15:16       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-05-22 16:16         ` Alan Mackenzie
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-22 15:16 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 63535

> I think the following patch is better.  Would you please have a look at
> it, in the hope I haven't made any other silly mistakes.  Thanks!

I don't see any silly mistake there, sorry.


        Stefan


PS: It does remind me that we really should do ourselves a favor and get rid
of the distinction between `Sescape` and `Scharquote`.
IIRC there's a risk of backward incompatibility, so it has to be done
progressively, but we should start the process.  E.g. first declare one of the
two as obsolete, then emit a warning when we see it being used, ...






^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
  2023-05-22 15:16       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-05-22 16:16         ` Alan Mackenzie
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-22 16:16 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, 63535-done

Hello, Stefan.

On Mon, May 22, 2023 at 11:16:40 -0400, Stefan Monnier wrote:
> > I think the following patch is better.  Would you please have a look at
> > it, in the hope I haven't made any other silly mistakes.  Thanks!

> I don't see any silly mistake there, sorry.

Thanks!  I've committed the patch, and I'm closing the bug.

>         Stefan


> PS: It does remind me that we really should do ourselves a favor and get rid
> of the distinction between `Sescape` and `Scharquote`.
> IIRC there's a risk of backward incompatibility, so it has to be done
> progressively, but we should start the process.  E.g. first declare one of the
> two as obsolete, then emit a warning when we see it being used, ...

Yes.  I think Sescape should be the survivor.  I don't know if
Scharquote is used at all in Emacs code.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-22 16:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
2023-05-16 14:03 ` Alan Mackenzie
2023-05-17 22:01   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 14:59     ` Alan Mackenzie
2023-05-22 15:16       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 16:16         ` Alan Mackenzie
2023-05-16 15:43 ` Eli Zaretskii
2023-05-16 16:15   ` Alan Mackenzie
2023-05-16 16:29     ` Eli Zaretskii
2023-05-16 16:58       ` Alan Mackenzie
2023-05-16 17:50         ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).