* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
@ 2023-05-16 10:57 Alan Mackenzie
2023-05-16 14:03 ` Alan Mackenzie
2023-05-16 15:43 ` Eli Zaretskii
0 siblings, 2 replies; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 10:57 UTC (permalink / raw)
To: 63535
Hello, Emacs.
In the master branch:
Consider the following C++ Mode buffer:
// comment \
comment line 2
line_3();
.. The backslash at the end of line 1 extends the comment into line 2.
Put point at // on L1, and do:
M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))
.. s gets the parse state of the inside of the comment.
Now put point at EOL 1, between the backslash and the LF. Do
M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)
.. This ought to leave point at BOL 2, since the syntax before the LF at
EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves
point wrongly at BOL 3.
#########################################################################
The reason for this bug is at L+42 of forw_comment (in syntax.c). There
we have
&& !(comment_end_can_be_escaped && char_quoted (from, from_byte))
.. Checking char_quoted is wrong. Instead the function should check the
current parse state.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
@ 2023-05-16 14:03 ` Alan Mackenzie
2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-16 15:43 ` Eli Zaretskii
1 sibling, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 14:03 UTC (permalink / raw)
To: 63535; +Cc: acm
On Tue, May 16, 2023 at 10:57:40 +0000, Alan Mackenzie wrote:
> Hello, Emacs.
> In the master branch:
> Consider the following C++ Mode buffer:
> // comment \
> comment line 2
> line_3();
> .. The backslash at the end of line 1 extends the comment into line 2.
> Put point at // on L1, and do:
> M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))
> .. s gets the parse state of the inside of the comment.
> Now put point at EOL 1, between the backslash and the LF. Do
> M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)
> .. This ought to leave point at BOL 2, since the syntax before the LF at
> EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves
> point wrongly at BOL 3.
> #########################################################################
> The reason for this bug is at L+42 of forw_comment (in syntax.c). There
> we have
> && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> .. Checking char_quoted is wrong. Instead the function should check the
> current parse state.
And here is a patch which fixes it. I will apply this patch to master
soon if I don't hear any objection.
diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..76d9f16e4ed 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
&& SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
&& (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
(nesting > 0 && --nesting == 0) : nesting < 0)
- && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+ && !(comment_end_can_be_escaped &&
+ (((prev_syntax & 0xff) == Sescape)
+ || ((prev_syntax & 0xff) == Scharquote))))
/* We have encountered a comment end of the same style
as the comment sequence which began this comment
section. */
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
2023-05-16 14:03 ` Alan Mackenzie
@ 2023-05-16 15:43 ` Eli Zaretskii
2023-05-16 16:15 ` Alan Mackenzie
1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 15:43 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: 63535
> Date: Tue, 16 May 2023 10:57:40 +0000
> From: Alan Mackenzie <acm@muc.de>
>
> Hello, Emacs.
>
> In the master branch:
Is it different on emacs-29?
> && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> .. Checking char_quoted is wrong. Instead the function should check the
> current parse state.
Why not both? IOW, please explain why char_quoted is not TRT here.
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 15:43 ` Eli Zaretskii
@ 2023-05-16 16:15 ` Alan Mackenzie
2023-05-16 16:29 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 16:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 63535
Hello, Eli.
On Tue, May 16, 2023 at 18:43:59 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 10:57:40 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Hello, Emacs.
> > In the master branch:
> Is it different on emacs-29?
No, the bug has been there since ?2016, having been coded, almost
certainly, by me. ;-( The context in 2016 was making an escaped NL in
a C++ line comment continue the comment's fontification onto the next
line. The (then) new variable comment-end-can-be-escaped configured the
effect of the backslash at EOL.
I have been assuming that it is too unimportant a bug to go into
emacs-29 at this late stage.
> > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> > .. Checking char_quoted is wrong. Instead the function should check the
> > current parse state.
> Why not both? IOW, please explain why char_quoted is not TRT here.
Because parse-partial-sexp is not scanning the backslash. The scan
starts one character after the backslash, and the syntactic effect of
that backslash is not in the OLDSTATE argument to parse-partial-sexp.
--
Alan Mackenzie (Nuremberg, Germany)
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 16:15 ` Alan Mackenzie
@ 2023-05-16 16:29 ` Eli Zaretskii
2023-05-16 16:58 ` Alan Mackenzie
0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 16:29 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: 63535
> Date: Tue, 16 May 2023 16:15:24 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > .. Checking char_quoted is wrong. Instead the function should check the
> > > current parse state.
>
> > Why not both? IOW, please explain why char_quoted is not TRT here.
>
> Because parse-partial-sexp is not scanning the backslash. The scan
> starts one character after the backslash, and the syntactic effect of
> that backslash is not in the OLDSTATE argument to parse-partial-sexp.
Sorry, I still don't follow: char_quoted doesn't call
parse-partial-sexp, AFAICT. So why does it matter what
parse-partial-sexp does when we are discussing why char_quoted is not
TRT?
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 16:29 ` Eli Zaretskii
@ 2023-05-16 16:58 ` Alan Mackenzie
2023-05-16 17:50 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-16 16:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 63535
Hello, Eli.
On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 16:15:24 +0000
> > Cc: 63535@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> > > > .. Checking char_quoted is wrong. Instead the function should check the
> > > > current parse state.
> > > Why not both? IOW, please explain why char_quoted is not TRT here.
> > Because parse-partial-sexp is not scanning the backslash. The scan
> > starts one character after the backslash, and the syntactic effect of
> > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
> Sorry, I still don't follow: char_quoted doesn't call
> parse-partial-sexp, AFAICT.
parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
> So why does it matter what parse-partial-sexp does when we are
> discussing why char_quoted is not TRT?
parse-partial-sexp is the context in which the bug becomes evident. If,
in the C++ line comment with escaped NL, you start parse-partial-sexp at
the NL, it behaves as though the scan started at the backslash. This is
the bug.
The cause of the bug is the use of char_quoted at line 42 of
forw_comment.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 16:58 ` Alan Mackenzie
@ 2023-05-16 17:50 ` Eli Zaretskii
0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-05-16 17:50 UTC (permalink / raw)
To: Alan Mackenzie, Stefan Monnier; +Cc: 63535
> Date: Tue, 16 May 2023 16:58:55 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > > Date: Tue, 16 May 2023 16:15:24 +0000
> > > Cc: 63535@debbugs.gnu.org
> > > From: Alan Mackenzie <acm@muc.de>
>
> > > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > > > .. Checking char_quoted is wrong. Instead the function should check the
> > > > > current parse state.
>
> > > > Why not both? IOW, please explain why char_quoted is not TRT here.
>
> > > Because parse-partial-sexp is not scanning the backslash. The scan
> > > starts one character after the backslash, and the syntactic effect of
> > > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
>
> > Sorry, I still don't follow: char_quoted doesn't call
> > parse-partial-sexp, AFAICT.
>
> parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
>
> > So why does it matter what parse-partial-sexp does when we are
> > discussing why char_quoted is not TRT?
>
> parse-partial-sexp is the context in which the bug becomes evident. If,
> in the C++ line comment with escaped NL, you start parse-partial-sexp at
> the NL, it behaves as though the scan started at the backslash. This is
> the bug.
>
> The cause of the bug is the use of char_quoted at line 42 of
> forw_comment.
Thanks, let's see what Stefan has to say about this.
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-16 14:03 ` Alan Mackenzie
@ 2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 14:59 ` Alan Mackenzie
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-17 22:01 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: 63535
Hi Alan,
> diff --git a/src/syntax.c b/src/syntax.c
> index e9e04e2d638..76d9f16e4ed 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
> && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
> (nesting > 0 && --nesting == 0) : nesting < 0)
> - && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
> + && !(comment_end_can_be_escaped &&
> + (((prev_syntax & 0xff) == Sescape)
> + || ((prev_syntax & 0xff) == Scharquote))))
> /* We have encountered a comment end of the same style
> as the comment sequence which began this comment
> section. */
AFAIK this is your code, so you should know better, but AFAICT
`prev_syntax` is not updated in the loop, so it only reflects the syntax
before the beginning of the scanned text, rather than anything near `from`.
Are you sure this is right?
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-05-22 14:59 ` Alan Mackenzie
2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-22 14:59 UTC (permalink / raw)
To: Stefan Monnier; +Cc: 63535
Hello, Stefan.
On Wed, May 17, 2023 at 18:01:32 -0400, Stefan Monnier wrote:
> Hi Alan,
[ .... ]
> AFAIK this is your code, so you should know better, but AFAICT
> `prev_syntax` is not updated in the loop, so it only reflects the syntax
> before the beginning of the scanned text, rather than anything near `from`.
> Are you sure this is right?
Thanks, you are correct, the patch was not good.
It turned out to be quite tricky to get working. As well as
forw_comment, I had to amend scan_sexps_forward to make it return a
quoted state to its caller when this happens at the limit of the scan.
I think the following patch is better. Would you please have a look at
it, in the hope I haven't made any other silly mistakes. Thanks!
diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..94b2ac2b591 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2338,13 +2338,16 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
return 0;
}
c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+ prev_syntax = syntax;
syntax = SYNTAX_WITH_FLAGS (c);
code = syntax & 0xff;
if (code == Sendcomment
&& SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
&& (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
(nesting > 0 && --nesting == 0) : nesting < 0)
- && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+ && !(comment_end_can_be_escaped
+ && ((prev_syntax & 0xff) == Sescape
+ || (prev_syntax & 0xff) == Scharquote)))
/* We have encountered a comment end of the same style
as the comment sequence which began this comment
section. */
@@ -2368,7 +2371,11 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
if (from == stop) continue; /* Failure */
- }
+ c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+ prev_syntax = syntax;
+ syntax = Smax;
+ code = syntax;
+ }
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -3349,7 +3356,14 @@ do { prev_from = from; \
are invalid now. Luckily, the `done' doesn't use them
and the INC_FROM sets them to a sane value without
looking at them. */
- if (!found) goto done;
+ if (!found)
+ {
+ if ((prev_from_syntax & 0xff) == Sescape
+ || (prev_from_syntax & 0xff) == Scharquote)
+ goto endquoted;
+ else
+ goto done;
+ }
INC_FROM;
state->incomment = 0;
state->comstyle = 0; /* reset the comment style */
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply related [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-22 14:59 ` Alan Mackenzie
@ 2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 16:16 ` Alan Mackenzie
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-05-22 15:16 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: 63535
> I think the following patch is better. Would you please have a look at
> it, in the hope I haven't made any other silly mistakes. Thanks!
I don't see any silly mistake there, sorry.
Stefan
PS: It does remind me that we really should do ourselves a favor and get rid
of the distinction between `Sescape` and `Scharquote`.
IIRC there's a risk of backward incompatibility, so it has to be done
progressively, but we should start the process. E.g. first declare one of the
two as obsolete, then emit a warning when we see it being used, ...
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-05-22 16:16 ` Alan Mackenzie
0 siblings, 0 replies; 11+ messages in thread
From: Alan Mackenzie @ 2023-05-22 16:16 UTC (permalink / raw)
To: Stefan Monnier; +Cc: acm, 63535-done
Hello, Stefan.
On Mon, May 22, 2023 at 11:16:40 -0400, Stefan Monnier wrote:
> > I think the following patch is better. Would you please have a look at
> > it, in the hope I haven't made any other silly mistakes. Thanks!
> I don't see any silly mistake there, sorry.
Thanks! I've committed the patch, and I'm closing the bug.
> Stefan
> PS: It does remind me that we really should do ourselves a favor and get rid
> of the distinction between `Sescape` and `Scharquote`.
> IIRC there's a risk of backward incompatibility, so it has to be done
> progressively, but we should start the process. E.g. first declare one of the
> two as obsolete, then emit a warning when we see it being used, ...
Yes. I think Sescape should be the survivor. I don't know if
Scharquote is used at all in Emacs code.
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-05-22 16:16 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-16 10:57 bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs Alan Mackenzie
2023-05-16 14:03 ` Alan Mackenzie
2023-05-17 22:01 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 14:59 ` Alan Mackenzie
2023-05-22 15:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 16:16 ` Alan Mackenzie
2023-05-16 15:43 ` Eli Zaretskii
2023-05-16 16:15 ` Alan Mackenzie
2023-05-16 16:29 ` Eli Zaretskii
2023-05-16 16:58 ` Alan Mackenzie
2023-05-16 17:50 ` Eli Zaretskii
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.