* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs @ 2023-05-16 10:57 Alan Mackenzie 2023-05-16 14:03 ` Alan Mackenzie 2023-05-16 15:43 ` Eli Zaretskii 0 siblings, 2 replies; 11+ messages in thread From: Alan Mackenzie @ 2023-05-16 10:57 UTC (permalink / raw) To: 63535 Hello, Emacs. In the master branch: Consider the following C++ Mode buffer: // comment \ comment line 2 line_3(); .. The backslash at the end of line 1 extends the comment into line 2. Put point at // on L1, and do: M-: (setq s (parse-partial-sexp (point) (+ (point) 9))) .. s gets the parse state of the inside of the comment. Now put point at EOL 1, between the backslash and the LF. Do M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table) .. This ought to leave point at BOL 2, since the syntax before the LF at EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves point wrongly at BOL 3. ######################################################################### The reason for this bug is at L+42 of forw_comment (in syntax.c). There we have && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) .. Checking char_quoted is wrong. Instead the function should check the current parse state. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie @ 2023-05-16 14:03 ` Alan Mackenzie 2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-05-16 15:43 ` Eli Zaretskii 1 sibling, 1 reply; 11+ messages in thread From: Alan Mackenzie @ 2023-05-16 14:03 UTC (permalink / raw) To: 63535; +Cc: acm On Tue, May 16, 2023 at 10:57:40 +0000, Alan Mackenzie wrote: > Hello, Emacs. > In the master branch: > Consider the following C++ Mode buffer: > // comment \ > comment line 2 > line_3(); > .. The backslash at the end of line 1 extends the comment into line 2. > Put point at // on L1, and do: > M-: (setq s (parse-partial-sexp (point) (+ (point) 9))) > .. s gets the parse state of the inside of the comment. > Now put point at EOL 1, between the backslash and the LF. Do > M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table) > .. This ought to leave point at BOL 2, since the syntax before the LF at > EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves > point wrongly at BOL 3. > ######################################################################### > The reason for this bug is at L+42 of forw_comment (in syntax.c). There > we have > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > .. Checking char_quoted is wrong. Instead the function should check the > current parse state. And here is a patch which fixes it. I will apply this patch to master soon if I don't hear any objection. diff --git a/src/syntax.c b/src/syntax.c index e9e04e2d638..76d9f16e4ed 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ? (nesting > 0 && --nesting == 0) : nesting < 0) - && !(comment_end_can_be_escaped && char_quoted (from, from_byte))) + && !(comment_end_can_be_escaped && + (((prev_syntax & 0xff) == Sescape) + || ((prev_syntax & 0xff) == Scharquote)))) /* We have encountered a comment end of the same style as the comment sequence which began this comment section. */ -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 14:03 ` Alan Mackenzie @ 2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-05-22 14:59 ` Alan Mackenzie 0 siblings, 1 reply; 11+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-17 22:01 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 63535 Hi Alan, > diff --git a/src/syntax.c b/src/syntax.c > index e9e04e2d638..76d9f16e4ed 100644 > --- a/src/syntax.c > +++ b/src/syntax.c > @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, > && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style > && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ? > (nesting > 0 && --nesting == 0) : nesting < 0) > - && !(comment_end_can_be_escaped && char_quoted (from, from_byte))) > + && !(comment_end_can_be_escaped && > + (((prev_syntax & 0xff) == Sescape) > + || ((prev_syntax & 0xff) == Scharquote)))) > /* We have encountered a comment end of the same style > as the comment sequence which began this comment > section. */ AFAIK this is your code, so you should know better, but AFAICT `prev_syntax` is not updated in the loop, so it only reflects the syntax before the beginning of the scanned text, rather than anything near `from`. Are you sure this is right? Stefan ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-22 14:59 ` Alan Mackenzie 2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 11+ messages in thread From: Alan Mackenzie @ 2023-05-22 14:59 UTC (permalink / raw) To: Stefan Monnier; +Cc: 63535 Hello, Stefan. On Wed, May 17, 2023 at 18:01:32 -0400, Stefan Monnier wrote: > Hi Alan, [ .... ] > AFAIK this is your code, so you should know better, but AFAICT > `prev_syntax` is not updated in the loop, so it only reflects the syntax > before the beginning of the scanned text, rather than anything near `from`. > Are you sure this is right? Thanks, you are correct, the patch was not good. It turned out to be quite tricky to get working. As well as forw_comment, I had to amend scan_sexps_forward to make it return a quoted state to its caller when this happens at the limit of the scan. I think the following patch is better. Would you please have a look at it, in the hope I haven't made any other silly mistakes. Thanks! diff --git a/src/syntax.c b/src/syntax.c index e9e04e2d638..94b2ac2b591 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -2338,13 +2338,16 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, return 0; } c = FETCH_CHAR_AS_MULTIBYTE (from_byte); + prev_syntax = syntax; syntax = SYNTAX_WITH_FLAGS (c); code = syntax & 0xff; if (code == Sendcomment && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ? (nesting > 0 && --nesting == 0) : nesting < 0) - && !(comment_end_can_be_escaped && char_quoted (from, from_byte))) + && !(comment_end_can_be_escaped + && ((prev_syntax & 0xff) == Sescape + || (prev_syntax & 0xff) == Scharquote))) /* We have encountered a comment end of the same style as the comment sequence which began this comment section. */ @@ -2368,7 +2371,11 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, inc_both (&from, &from_byte); UPDATE_SYNTAX_TABLE_FORWARD (from); if (from == stop) continue; /* Failure */ - } + c = FETCH_CHAR_AS_MULTIBYTE (from_byte); + prev_syntax = syntax; + syntax = Smax; + code = syntax; + } inc_both (&from, &from_byte); UPDATE_SYNTAX_TABLE_FORWARD (from); @@ -3349,7 +3356,14 @@ do { prev_from = from; \ are invalid now. Luckily, the `done' doesn't use them and the INC_FROM sets them to a sane value without looking at them. */ - if (!found) goto done; + if (!found) + { + if ((prev_from_syntax & 0xff) == Sescape + || (prev_from_syntax & 0xff) == Scharquote) + goto endquoted; + else + goto done; + } INC_FROM; state->incomment = 0; state->comstyle = 0; /* reset the comment style */ > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-22 14:59 ` Alan Mackenzie @ 2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-05-22 16:16 ` Alan Mackenzie 0 siblings, 1 reply; 11+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-22 15:16 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 63535 > I think the following patch is better. Would you please have a look at > it, in the hope I haven't made any other silly mistakes. Thanks! I don't see any silly mistake there, sorry. Stefan PS: It does remind me that we really should do ourselves a favor and get rid of the distinction between `Sescape` and `Scharquote`. IIRC there's a risk of backward incompatibility, so it has to be done progressively, but we should start the process. E.g. first declare one of the two as obsolete, then emit a warning when we see it being used, ... ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-22 16:16 ` Alan Mackenzie 0 siblings, 0 replies; 11+ messages in thread From: Alan Mackenzie @ 2023-05-22 16:16 UTC (permalink / raw) To: Stefan Monnier; +Cc: acm, 63535-done Hello, Stefan. On Mon, May 22, 2023 at 11:16:40 -0400, Stefan Monnier wrote: > > I think the following patch is better. Would you please have a look at > > it, in the hope I haven't made any other silly mistakes. Thanks! > I don't see any silly mistake there, sorry. Thanks! I've committed the patch, and I'm closing the bug. > Stefan > PS: It does remind me that we really should do ourselves a favor and get rid > of the distinction between `Sescape` and `Scharquote`. > IIRC there's a risk of backward incompatibility, so it has to be done > progressively, but we should start the process. E.g. first declare one of the > two as obsolete, then emit a warning when we see it being used, ... Yes. I think Sescape should be the survivor. I don't know if Scharquote is used at all in Emacs code. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie 2023-05-16 14:03 ` Alan Mackenzie @ 2023-05-16 15:43 ` Eli Zaretskii 2023-05-16 16:15 ` Alan Mackenzie 1 sibling, 1 reply; 11+ messages in thread From: Eli Zaretskii @ 2023-05-16 15:43 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 63535 > Date: Tue, 16 May 2023 10:57:40 +0000 > From: Alan Mackenzie <acm@muc.de> > > Hello, Emacs. > > In the master branch: Is it different on emacs-29? > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > > .. Checking char_quoted is wrong. Instead the function should check the > current parse state. Why not both? IOW, please explain why char_quoted is not TRT here. ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 15:43 ` Eli Zaretskii @ 2023-05-16 16:15 ` Alan Mackenzie 2023-05-16 16:29 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: Alan Mackenzie @ 2023-05-16 16:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63535 Hello, Eli. On Tue, May 16, 2023 at 18:43:59 +0300, Eli Zaretskii wrote: > > Date: Tue, 16 May 2023 10:57:40 +0000 > > From: Alan Mackenzie <acm@muc.de> > > Hello, Emacs. > > In the master branch: > Is it different on emacs-29? No, the bug has been there since ?2016, having been coded, almost certainly, by me. ;-( The context in 2016 was making an escaped NL in a C++ line comment continue the comment's fontification onto the next line. The (then) new variable comment-end-can-be-escaped configured the effect of the backslash at EOL. I have been assuming that it is too unimportant a bug to go into emacs-29 at this late stage. > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > > .. Checking char_quoted is wrong. Instead the function should check the > > current parse state. > Why not both? IOW, please explain why char_quoted is not TRT here. Because parse-partial-sexp is not scanning the backslash. The scan starts one character after the backslash, and the syntactic effect of that backslash is not in the OLDSTATE argument to parse-partial-sexp. -- Alan Mackenzie (Nuremberg, Germany) ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 16:15 ` Alan Mackenzie @ 2023-05-16 16:29 ` Eli Zaretskii 2023-05-16 16:58 ` Alan Mackenzie 0 siblings, 1 reply; 11+ messages in thread From: Eli Zaretskii @ 2023-05-16 16:29 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 63535 > Date: Tue, 16 May 2023 16:15:24 +0000 > Cc: 63535@debbugs.gnu.org > From: Alan Mackenzie <acm@muc.de> > > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > > > > .. Checking char_quoted is wrong. Instead the function should check the > > > current parse state. > > > Why not both? IOW, please explain why char_quoted is not TRT here. > > Because parse-partial-sexp is not scanning the backslash. The scan > starts one character after the backslash, and the syntactic effect of > that backslash is not in the OLDSTATE argument to parse-partial-sexp. Sorry, I still don't follow: char_quoted doesn't call parse-partial-sexp, AFAICT. So why does it matter what parse-partial-sexp does when we are discussing why char_quoted is not TRT? ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 16:29 ` Eli Zaretskii @ 2023-05-16 16:58 ` Alan Mackenzie 2023-05-16 17:50 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: Alan Mackenzie @ 2023-05-16 16:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63535 Hello, Eli. On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote: > > Date: Tue, 16 May 2023 16:15:24 +0000 > > Cc: 63535@debbugs.gnu.org > > From: Alan Mackenzie <acm@muc.de> > > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > > > > .. Checking char_quoted is wrong. Instead the function should check the > > > > current parse state. > > > Why not both? IOW, please explain why char_quoted is not TRT here. > > Because parse-partial-sexp is not scanning the backslash. The scan > > starts one character after the backslash, and the syntactic effect of > > that backslash is not in the OLDSTATE argument to parse-partial-sexp. > Sorry, I still don't follow: char_quoted doesn't call > parse-partial-sexp, AFAICT. parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted. > So why does it matter what parse-partial-sexp does when we are > discussing why char_quoted is not TRT? parse-partial-sexp is the context in which the bug becomes evident. If, in the C++ line comment with escaped NL, you start parse-partial-sexp at the NL, it behaves as though the scan started at the backslash. This is the bug. The cause of the bug is the use of char_quoted at line 42 of forw_comment. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs 2023-05-16 16:58 ` Alan Mackenzie @ 2023-05-16 17:50 ` Eli Zaretskii 0 siblings, 0 replies; 11+ messages in thread From: Eli Zaretskii @ 2023-05-16 17:50 UTC (permalink / raw) To: Alan Mackenzie, Stefan Monnier; +Cc: 63535 > Date: Tue, 16 May 2023 16:58:55 +0000 > Cc: 63535@debbugs.gnu.org > From: Alan Mackenzie <acm@muc.de> > > On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote: > > > Date: Tue, 16 May 2023 16:15:24 +0000 > > > Cc: 63535@debbugs.gnu.org > > > From: Alan Mackenzie <acm@muc.de> > > > > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte)) > > > > > > .. Checking char_quoted is wrong. Instead the function should check the > > > > > current parse state. > > > > > Why not both? IOW, please explain why char_quoted is not TRT here. > > > > Because parse-partial-sexp is not scanning the backslash. The scan > > > starts one character after the backslash, and the syntactic effect of > > > that backslash is not in the OLDSTATE argument to parse-partial-sexp. > > > Sorry, I still don't follow: char_quoted doesn't call > > parse-partial-sexp, AFAICT. > > parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted. > > > So why does it matter what parse-partial-sexp does when we are > > discussing why char_quoted is not TRT? > > parse-partial-sexp is the context in which the bug becomes evident. If, > in the C++ line comment with escaped NL, you start parse-partial-sexp at > the NL, it behaves as though the scan started at the backslash. This is > the bug. > > The cause of the bug is the use of char_quoted at line 42 of > forw_comment. Thanks, let's see what Stefan has to say about this. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-05-22 16:16 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie 2023-05-16 14:03 ` Alan Mackenzie 2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-05-22 14:59 ` Alan Mackenzie 2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-05-22 16:16 ` Alan Mackenzie 2023-05-16 15:43 ` Eli Zaretskii 2023-05-16 16:15 ` Alan Mackenzie 2023-05-16 16:29 ` Eli Zaretskii 2023-05-16 16:58 ` Alan Mackenzie 2023-05-16 17:50 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).