From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 43558@debbugs.gnu.org, "Mattias Engdegård" <mattiase@acm.org>,
acm@muc.de
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Sun, 22 Nov 2020 13:12:31 +0000
Message-ID: <20201122131231.GB5912@ACM>
In-Reply-To: <jwvzh3d3rry.fsf-monnier+emacs@gnu.org>
Hello, Stefan.
On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode. It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]
> > OK, here's the patch.
> I think the patch agrees with my assessment above (even though it's
> still missing a etc/NEWS entry, adjustment to the docstring of
> modify-syntax-entry and to the .texi manual).
Here are these things:
diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
@cindex syntax flags
In addition to the classes, entries for characters in a syntax table
-can specify flags. There are eight possible flags, represented by the
+can specify flags. There are nine possible flags, represented by the
characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.
All the flags except @samp{p} are used to describe comment
delimiters. The digit flags are used for comment delimiters made up
of 2 characters. They indicate that a character can @emph{also} be
part of a comment sequence, in addition to the syntactic properties
associated with its character class. The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
character of a start-of-comment sequence (@samp{/*}), @emph{and} the
first character of an end-of-comment sequence (@samp{*/}). The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.
Here is a table of the possible flags for a character @var{c},
and what they mean:
@@ -332,6 +332,13 @@ Syntax Flags
alternative ``c'' comment style. For a two-character comment
delimiter, @samp{c} on either character makes it of style ``c''.
+@item
+@samp{e} means that when @var{c}, a comment ender or the first
+character of a two-character ender, is directly preceded by one or
+more escape characters, @var{c} does not act as a comment ender.
+Contrast this with the effect of the variable
+@code{comment-end-can-be-escaped} (@pxref{Control Parsing}).
+
@item
@samp{n} on a comment delimiter character specifies that this kind of
comment can be nested. Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
@item @samp{*}
@samp{23b}
@item newline
-@samp{>}
+@samp{> e}
@end table
This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags
@item newline
This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag. It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
@end table
@item
@@ -962,9 +971,14 @@ Control Parsing
@defvar comment-end-can-be-escaped
If this buffer local variable is non-@code{nil}, a single character
which usually terminates a comment doesn't do so when that character
-is escaped. This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped. This was formerly used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where any number of consecutive escape characters, including
+two, escape the comment ender. @code{comment-end-can-be-escaped}
+should not be used together with the @samp{e} syntax flag.
@end defvar
You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
@samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
@item
@samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab@tab @samp{e} @tab @code{(ash 1 24)}
@end multitable
@defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
\f
* Lisp Changes in Emacs 28.1
++++
+** New syntax flag 'e'.
+A comment ender with this flag is escaped by one or more consecutive
+escape characters, so that the comment continues past that comment
+ender (typically onto the next line).
+
+++
** 'set-window-configuration' now takes an optional 'dont-set-frame'
parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
The second character of NEWENTRY is the matching parenthesis,
used only if the first character is `(' or `)'.
Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
1 means CHAR is the start of a two-char comment start sequence.
2 means CHAR is the second character of such a sequence.
3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
c means CHAR is part of comment sequence c.
n means CHAR is part of a nestable comment sequence.
+ e means CHAR, when a comment ender or the first char of a two-char
+ comment ender, can be escaped by any number of consecutive
+ characters with escape syntax. C and C++ use this facility.
+ Compare and contrast with the variable `comment-end-can-be-escaped'.
+
p means CHAR is a prefix character for `backward-prefix-chars';
such characters are treated as whitespace when they occur
between expressions.
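A minimal C sketch of the two escaping policies the patch documents, for illustration only: it is not the syntax.c implementation, and the helper names (escapes_before, escaped_with_e_flag, escaped_with_variable) are hypothetical. It assumes '\' is the only character with escape syntax, and it models comment-end-can-be-escaped the way the documentation above contrasts it with the 'e' flag, i.e. escapes pair off, so only an odd-length run escapes the ender.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the escape characters (assumed here to be
   just '\\') that directly precede position POS in BUF.  A real syntax
   table would consult the escape syntax class instead. */
static size_t escapes_before(const char *buf, size_t pos)
{
    size_t n = 0;
    assert(buf != NULL);
    while (pos > n && buf[pos - n - 1] == '\\')
        n++;
    return n;
}

/* Model of the proposed 'e' flag: any run of one or more escape
   characters stops the ender at POS from ending the comment. */
static bool escaped_with_e_flag(const char *buf, size_t pos)
{
    return escapes_before(buf, pos) > 0;
}

/* Model of comment-end-can-be-escaped as contrasted above: escapes
   pair off, so only an odd-length run escapes the ender at POS. */
static bool escaped_with_variable(const char *buf, size_t pos)
{
    return escapes_before(buf, pos) % 2 == 1;
}
```

For the text `// a \\` followed by a newline (the newline is at index 7), the run of escapes has length 2, so the 'e' model keeps the comment going while the parity model ends it at that newline. With a single trailing backslash both models continue the comment.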
> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.
Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.
There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.
> Stefan
> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.
It is less a feature than a mistake that people might occasionally
make. In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards. Anybody might accidentally "end"
a line comment with "\" or "\\".
> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:
Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense. In particular two "\"s followed by NL still continue the
comment.
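This behaviour follows from the C translation phases: each backslash immediately followed by a newline is deleted (splicing the lines) in phase 2, before comments are recognized in phase 3, so only the last "\" on the line pairs with the newline. A small sketch a C compiler can check directly; the function names are hypothetical, and gcc -Wall reports -Wcomment warnings for both comments, as Stefan noted.

```c
#include <assert.h>

/* Hypothetical demo: the "x = 2;" line is spliced onto the comment in
   translation phase 2, so the assignment is never compiled. */
static int one_backslash(void)
{
    int x = 1;
    // This line comment ends with one backslash \
    x = 2;
    return x;
}

/* Hypothetical demo: only the final backslash pairs with the newline,
   so two backslashes still continue the comment. */
static int two_backslashes(void)
{
    int x = 1;
    // This line comment ends with two backslashes \\
    x = 2;
    return x;
}

/* Both functions return 1, because neither assignment is compiled. */
static void check_comment_splicing(void)
{
    assert(one_backslash() == 1);
    assert(two_backslashes() == 1);
}
```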
> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"# define\"."
> 'syntax-table (string-to-syntax "|"))
> (put-text-property (match-beginning 2) (match-end 2)
> 'syntax-table (string-to-syntax "|")))
> - (sm-c--cpp-syntax-propertize end)))))
> + (sm-c--cpp-syntax-propertize end))))
> + ("\\\\\\(\n\\)"
> + (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> + (when (and (nth 4 ppss) ;Within a comment
> + (null (nth 7 ppss)) ;Within a // comment
> + (save-excursion ;The \ is not itself escaped
> + (goto-char (match-beginning 0))
> + (zerop (mod (skip-chars-backward "\\\\") 2))))
> + (string-to-syntax "."))))))
> (point) end))
>
> (defun sm-c-syntactic-face-function (ppss)
Yes, something like this would be possible. But all these calls to
syntax-ppss would be slow, at least somewhat, as discussed above.
--
Alan Mackenzie (Nuremberg, Germany).