unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 43558@debbugs.gnu.org, "Mattias Engdegård" <mattiase@acm.org>,
	acm@muc.de
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Sun, 22 Nov 2020 13:12:31 +0000
Message-ID: <20201122131231.GB5912@ACM>
In-Reply-To: <jwvzh3d3rry.fsf-monnier+emacs@gnu.org>

Hello, Stefan.

On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode.  It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]
> > OK, here's the patch.

> I think the patch agrees with my assessment above (even though it's
> still missing a etc/NEWS entry, adjustment to the docstring of
> modify-syntax-entry and to the .texi manual).

Here are these things:



diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
 @cindex syntax flags
 
   In addition to the classes, entries for characters in a syntax table
-can specify flags.  There are eight possible flags, represented by the
+can specify flags.  There are nine possible flags, represented by the
 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.
 
   All the flags except @samp{p} are used to describe comment
 delimiters.  The digit flags are used for comment delimiters made up
 of 2 characters.  They indicate that a character can @emph{also} be
 part of a comment sequence, in addition to the syntactic properties
 associated with its character class.  The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
 first character of an end-of-comment sequence (@samp{*/}).  The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.
 
   Here is a table of the possible flags for a character @var{c},
 and what they mean:
@@ -332,6 +332,13 @@ Syntax Flags
 alternative ``c'' comment style.  For a two-character comment
 delimiter, @samp{c} on either character makes it of style ``c''.
 
+@item
+@samp{e} means that when @var{c}, a comment ender or first character
+of a two-character ender, is directly preceded by one or more escape
+characters, @var{c} does not act as a comment ender.  Contrast this
+with the effect of variable @code{comment-end-can-be-escaped}
+(@pxref{Control Parsing}).
+
 @item
 @samp{n} on a comment delimiter character specifies that this kind of
 comment can be nested.  Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
 @item @samp{*}
 @samp{23b}
 @item newline
-@samp{>}
+@samp{> e}
 @end table
 
 This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags
 
 @item newline
 This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag.  It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
 @end table
 
 @item
@@ -962,9 +971,14 @@ Control Parsing
 @defvar comment-end-can-be-escaped
 If this buffer local variable is non-@code{nil}, a single character
 which usually terminates a comment doesn't do so when that character
-is escaped.  This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped.  This was formerly used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where two consecutive escape characters escape the comment
+ender.  @code{comment-end-can-be-escaped} should not be used together
+with the @samp{e} syntax flag.
 @end defvar
 
 You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
 @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
 @item
 @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab@tab @samp{e} @tab @code{(ash 1 24)}
 @end multitable
 
 @defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
 \f
 * Lisp Changes in Emacs 28.1
 
++++
+** New syntax flag 'e'.
+This indicates that one or two (or more) escape characters escape a
+comment ender with this flag, causing the comment to be continued past
+that comment ender (typically onto the next line).
+
 +++
 ** 'set-window-configuration' now takes an optional 'dont-set-frame'
 parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
 The second character of NEWENTRY is the matching parenthesis,
  used only if the first character is `(' or `)'.
 Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
  1 means CHAR is the start of a two-char comment start sequence.
  2 means CHAR is the second character of such a sequence.
  3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
  c means CHAR is part of comment sequence c.
  n means CHAR is part of a nestable comment sequence.
 
+ e means CHAR, when a comment ender or first char of a two-character
+   comment ender, can be escaped by (any number of consecutive)
+   characters with escape syntax.  C and C++ use this facility.
+   Compare and contrast with the variable `comment-end-can-be-escaped'.
+
  p means CHAR is a prefix character for `backward-prefix-chars';
    such characters are treated as whitespace when they occur
    between expressions.



> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.

Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.

There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.

>         Stefan


> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.

It is more of a mistake that people occasionally might make than a
feature.  In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards.  Anybody might "end" a line
comment accidentally with "\" or "\\".

> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:

Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense.  In particular two "\"s followed by NL still continue the
comment.
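To make the difference concrete, here is a minimal standalone sketch in
C.  It is not code from syntax.c; the function names are hypothetical,
and it assumes that "ordinary" escaping means an odd number of
backslashes directly before the newline, whereas the rule described
above needs only one or more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the backslashes directly before BUF[POS].  */
static size_t
count_backslashes_before (const char *buf, size_t pos)
{
  assert (buf != NULL);
  size_t n = 0;
  while (pos > n && buf[pos - n - 1] == '\\')
    n++;
  return n;
}

/* Ordinary escaping (assumed here): the newline at NL_POS is escaped
   only when an odd number of backslashes precedes it.  */
static bool
newline_escaped_ordinary (const char *buf, size_t nl_pos)
{
  return count_backslashes_before (buf, nl_pos) % 2 == 1;
}

/* The rule for a line comment: any nonzero number of backslashes
   before the newline at NL_POS continues the comment.  */
static bool
newline_continues_comment (const char *buf, size_t nl_pos)
{
  return count_backslashes_before (buf, nl_pos) >= 1;
}
```

The two predicates agree for no backslash and for one backslash before
the newline.  They differ for two: newline_escaped_ordinary is false,
but newline_continues_comment is true.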

> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"#  define\"."
>                                 'syntax-table (string-to-syntax "|"))
>              (put-text-property (match-beginning 2) (match-end 2)
>                                 'syntax-table (string-to-syntax "|")))
> -          (sm-c--cpp-syntax-propertize end)))))
> +          (sm-c--cpp-syntax-propertize end))))
> +    ("\\\\\\(\n\\)"
> +     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> +          (when (and (nth 4 ppss)        ;Within a comment
> +                     (null (nth 7 ppss)) ;Within a // comment
> +                     (save-excursion     ;The \ is not itself escaped
> +                       (goto-char (match-beginning 0))
> +                       (zerop (mod (skip-chars-backward "\\\\") 2))))
> +            (string-to-syntax "."))))))
>     (point) end))
>  
>  (defun sm-c-syntactic-face-function (ppss)

Yes, something like this would be possible.  But all these calls to
syntax-ppss would be slow, at least somewhat, as discussed above.

-- 
Alan Mackenzie (Nuremberg, Germany).




