From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Sun, 22 Nov 2020 13:12:31 +0000
Message-ID: <20201122131231.GB5912@ACM>
References: <20200923144824.GD6178@ACM> <20200924102022.GA4714@ACM> <20201119211822.GE6259@ACM>
To: Stefan Monnier
Cc: 43558@debbugs.gnu.org, Mattias Engdegård, acm@muc.de

Hello, Stefan.

On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode. It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]
> > OK, here's the patch.

> I think the patch agrees with my assessment above (even though it's
> still missing an etc/NEWS entry, adjustment to the docstring of
> modify-syntax-entry and to the .texi manual).

Here are these things:

diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
 @cindex syntax flags

 In addition to the classes, entries for characters in a syntax table
-can specify flags. There are eight possible flags, represented by the
+can specify flags. There are nine possible flags, represented by the
 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.

 All the flags except @samp{p} are used to describe comment
 delimiters. The digit flags are used for comment delimiters made up
 of 2 characters. They indicate that a character can @emph{also} be
 part of a comment sequence, in addition to the syntactic properties
 associated with its character class. The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
 first character of an end-of-comment sequence (@samp{*/}). The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.

 Here is a table of the possible flags for a character @var{c},
 and what they mean:

@@ -332,6 +332,13 @@ Syntax Flags
 alternative ``c'' comment style. For a two-character comment
 delimiter, @samp{c} on either character makes it of style ``c''.

+@item
+@samp{e} means that when @var{c}, a comment ender or first character
+of a two character ender, is directly preceded by one or more escape
+characters, @var{c} does not act as a comment ender. Contrast this
+with the effect of variable @code{comment-end-can-be-escaped}
+(@pxref{Control Parsing}).
+
 @item
 @samp{n} on a comment delimiter character specifies that this kind of
 comment can be nested. Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
 @item @samp{*}
 @samp{23b}
 @item newline
-@samp{>}
+@samp{> e}
 @end table

 This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags

 @item newline
 This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag. It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
 @end table

 @item
@@ -962,9 +971,14 @@ Control Parsing
 @defvar comment-end-can-be-escaped
 If this buffer local variable is non-@code{nil}, a single character
 which usually terminates a comment doesn't do so when that character
-is escaped. This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped. This used to be used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where two consecutive escape characters escape the comment
+ender. @code{comment-end-can-be-escaped} should not be used together
+with the @samp{e} syntax flag.
 @end defvar

 You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
 @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
 @item
 @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab@tab @samp{e} @tab @code{(ash 1 24)}
 @end multitable

 @defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.

 * Lisp Changes in Emacs 28.1

++++
+** New syntax flag 'e'.
+This indicates that one or two (or more) escape characters escape a
+comment ender with this flag, causing the comment to be continued past
+that comment ender (typically onto the next line).
+
 +++
 ** 'set-window-configuration' now takes an optional 'dont-set-frame'
 parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
 The second character of NEWENTRY is the matching parenthesis,
  used only if the first character is `(' or `)'.
 Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
  1 means CHAR is the start of a two-char comment start sequence.
  2 means CHAR is the second character of such a sequence.
  3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
  c means CHAR is part of comment sequence c.
  n means CHAR is part of a nestable comment sequence.

+ e means CHAR, when a comment ender or first char of a two character
+  comment ender, can be escaped by (any number of consecutive)
+  characters with escape syntax. C and C++ use this facility.
+  Compare and contrast with the variable `comment-end-can-be-escaped'.
+
  p means CHAR is a prefix character for `backward-prefix-chars';
  such characters are treated as whitespace when they occur
  between expressions.

> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.

Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.
There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.

> Stefan

> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.

It is more of a mistake that people occasionally might make than a
feature. In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards. Anybody might "end" a line
comment accidentally with "\" or "\\".

> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:

Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense. In particular two "\"s followed by NL still continue the
comment.

> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"# define\"."
>                                 'syntax-table (string-to-syntax "|"))
>              (put-text-property (match-beginning 2) (match-end 2)
>                                 'syntax-table (string-to-syntax "|")))
> -           (sm-c--cpp-syntax-propertize end)))))
> +           (sm-c--cpp-syntax-propertize end))))
> +      ("\\\\\\(\n\\)"
> +       (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> +            (when (and (nth 4 ppss)        ;Within a comment
> +                       (null (nth 7 ppss)) ;Within a // comment
> +                       (save-excursion     ;The \ is not itself escaped
> +                         (goto-char (match-beginning 0))
> +                         (zerop (mod (skip-chars-backward "\\\\") 2))))
> +              (string-to-syntax "."))))))
>      (point) end))
>
>  (defun sm-c-syntactic-face-function (ppss)

Yes, something like this would be possible. But all these calls to
syntax-ppss would be slow, at least somewhat, as discussed above.

--
Alan Mackenzie (Nuremberg, Germany).
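
The difference between the two escaping rules discussed in this thread
can be sketched in a few lines of C. This is only an illustration under
stated assumptions: the function names are hypothetical and nothing here
is taken from syntax.c or CC Mode. The "parity" rule is one reading of
the quoted sm-c-mode snippet (only an odd run of backslashes escapes the
newline); the "any run" rule follows the description of the proposed 'e'
flag (one or more backslashes directly before the newline continue the
comment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not code from syntax.c. */

/* Count the consecutive backslashes that directly precede buf[pos]. */
size_t backslashes_before(const char *buf, size_t pos)
{
    size_t n = 0;
    while (n < pos && buf[pos - 1 - n] == '\\')
        n++;
    return n;
}

/* Parity rule (assumed reading of the sm-c-mode snippet): the newline
   at buf[nl] is escaped only if the run of backslashes before it has
   odd length, so that the last backslash is not itself escaped. */
bool continues_by_parity(const char *buf, size_t nl)
{
    assert(buf[nl] == '\n');
    return backslashes_before(buf, nl) % 2 == 1;
}

/* Any-run rule (as described for the proposed 'e' flag): the line
   comment continues whenever at least one backslash directly precedes
   the newline at buf[nl]. */
bool continues_by_any_run(const char *buf, size_t nl)
{
    assert(buf[nl] == '\n');
    return backslashes_before(buf, nl) >= 1;
}
```

For a line comment ending in two backslashes followed by a newline, the
parity rule reports that the comment ends while the any-run rule reports
that it continues, which is the case raised above. With a single
backslash both rules agree that the comment continues.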