From: Alan Mackenzie <acm@muc.de>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 43558@debbugs.gnu.org, "Mattias Engdegård" <mattiase@acm.org>,
acm@muc.de
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Thu, 19 Nov 2020 21:18:22 +0000
Message-ID: <20201119211822.GE6259@ACM>
In-Reply-To: <jwv7dsjp1i7.fsf-monnier+emacs@gnu.org>
Hello, Stefan.
On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround. syntax.c should handle
> > comments in all their generality. With a bit of consideration, the
> > method to do this is clear:
> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit). I don't see anything very special here.
> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.
> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode. It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".
> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.
OK, here's the patch. It has been heavily tested with the unit tests in
.../test/src/syntax-tests.el; further enhancements to those tests are
part of the patch.
Just as a reminder, the motivation is to be able to have syntax.c
correctly parse C/C++ line comments which look like:
    foo(); // comment \\
    second line of comment.
by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:
    (modify-syntax-entry ?\n "> be")
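For illustration only (this is not part of the patch), here is a minimal,
self-contained C sketch of why a parity-based quoting test does not cover
the example above. It works on a plain char buffer rather than on Emacs
buffers and syntax tables, and the names quoted_by_parity,
quoted_by_prev_char and line_comment_end are hypothetical; none of them
exists in syntax.c. The first predicate roughly mirrors how char_quoted
counts the escape characters before a position; the second mirrors what
the "e" flag is meant to do, as far as the comment_ender_quoted function
in the diff below goes: only the single character before the comment
ender is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parity rule (roughly what char_quoted does): the character at POS is
   quoted when an odd number of backslashes immediately precedes it.  */
bool
quoted_by_parity (const char *buf, size_t pos)
{
  size_t n = 0;
  while (pos > n && buf[pos - n - 1] == '\\')
    n++;
  return n % 2 == 1;
}

/* "e" flag rule, as assumed in this sketch: the comment ender at POS is
   escaped whenever the single preceding character is a backslash.  */
bool
quoted_by_prev_char (const char *buf, size_t pos)
{
  return pos > 0 && buf[pos - 1] == '\\';
}

/* Hypothetical helper: starting at START (inside a line comment), return
   the index of the first newline which QUOTED does not report as
   escaped, or the index of the terminating NUL if there is none.  */
size_t
line_comment_end (const char *buf, size_t start,
                  bool (*quoted) (const char *, size_t))
{
  size_t i = start;
  while (buf[i] != '\0')
    {
      if (buf[i] == '\n' && !quoted (buf, i))
        return i;
      i++;
    }
  return i;
}

/* Apply both rules to the example from this message: a line comment
   ending in two backslashes, followed by a second line.  */
void
check_examples (void)
{
  const char *buf = "foo(); // comment \\\\\nsecond line of comment.\nbar();";
  size_t first_nl = (size_t) (strchr (buf, '\n') - buf);
  size_t second_nl = (size_t) (strchr (buf + first_nl + 1, '\n') - buf);

  /* Two backslashes: parity says the newline is NOT quoted.  */
  assert (line_comment_end (buf, 9, quoted_by_parity) == first_nl);
  /* Previous-character rule: the comment continues onto the next line.  */
  assert (line_comment_end (buf, 9, quoted_by_prev_char) == second_nl);
}
```

Calling check_examples () exercises both rules on the same text. With the
parity rule the comment ends at the first newline; with the
previous-character rule it extends over "second line of comment.", which
is the behaviour the "e" flag is intended to give for C/C++ line
comments.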
> Stefan
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
{
return (flags >> 22) & 1;
}
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+ return (flags >> 24) & 1;
+}
/* FLAGS should be the flags of the main char of the comment marker, e.g.
the second for comstart and the first for comend. */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
return val;
}
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+ int c;
+ int next_syntax;
+ if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+ return true;
+ if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+ {
+ dec_both (&from, &from_byte);
+ UPDATE_SYNTAX_TABLE_BACKWARD (from);
+ c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+ next_syntax = SYNTAX_WITH_FLAGS (c);
+ UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+ if (next_syntax == Sescape || next_syntax == Scharquote)
+ return true;
+ }
+ return false;
+}
+
/* Check whether charpos FROM is at the end of a comment.
FROM_BYTE is the bytepos corresponding to FROM.
Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
&& SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
comstart = (com2start || code == Scomment);
+ /* Check for any current delimiter being escaped. */
+ if (from > stop
+ && (((com2end || code == Sendcomment)
+ && comment_ender_quoted (from, from_byte, syntax))
+ || (code == Scomment
+ && comment_end_can_be_escaped
+ && char_quoted (from, from_byte))))
+ {
+ dec_both (&from, &from_byte);
+ UPDATE_SYNTAX_TABLE_BACKWARD (from);
+ com2end = comstart = com2start = 0;
+ syntax = Smax;
+ }
+
/* Nasty cases with overlapping 2-char comment markers:
- snmp-mode: -- c -- foo -- c --
--- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property. */)
case 'c':
val |= 1 << 23;
break;
+
+ case 'e':
+ val |= 1 << 24;
+ break;
}
if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
(Lisp_Object syntax)
{
int code, syntax_code;
- bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+ bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+ comescapes;
char str[2];
Lisp_Object first, match_lisp, value = syntax;
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+ comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
if (Smax <= code)
{
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
insert ("c", 1);
if (comnested)
insert ("n", 1);
+ if (comescapes)
+ insert ("e", 1);
insert_string ("\twhich means: ");
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
insert_string (" (comment style c)");
if (comnested)
insert_string (" (nestable)");
+ if (comescapes)
+ insert_string (" (can be escaped)");
if (prefix)
{
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
&& SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
&& (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
(nesting > 0 && --nesting == 0) : nesting < 0)
- && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+ && !comment_ender_quoted (from, from_byte, syntax))
/* We have encountered a comment end of the same style
as the comment sequence which began this comment
section. */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
/* We have encountered a nested comment of the same style
as the comment sequence which began this comment section. */
nesting++;
- if (comment_end_can_be_escaped
- && (code == Sescape || code == Scharquote))
+ if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+ && comment_ender_quoted (from, from_byte, syntax))
{
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
- if (from == stop) continue; /* Failure */
+ continue;
}
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil. */)
/* We're at the start of a comment. */
found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
&out_charpos, &out_bytepos, &dummy, &dummy2);
- from = out_charpos; from_byte = out_bytepos;
- if (!found)
+ from = out_charpos; from_byte = out_bytepos;
+ if (!found)
{
SET_PT_BOTH (from, from_byte);
return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil. */)
if (code == Sendcomment)
comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
- && prev_char_comend_first (from, from_byte)
- && !char_quoted (from - 1, dec_bytepos (from_byte)))
+ && prev_char_comend_first (from, from_byte))
{
int other_syntax;
- /* We must record the comment style encountered so that
+ /* We must record the comment style encountered so that
later, we can match only the proper comment begin
sequence of the same style. */
dec_both (&from, &from_byte);
- code = Sendcomment;
- /* Calling char_quoted, above, set up global syntax position
- at the new value of FROM. */
c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
other_syntax = SYNTAX_WITH_FLAGS (c1);
- comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
- comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+ if (!comment_ender_quoted (from, from_byte, other_syntax))
+ {
+ code = Sendcomment;
+ comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+ comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+ syntax = other_syntax;
+ }
+ else
+ {
+ inc_both (&from, &from_byte);
+ UPDATE_SYNTAX_TABLE_FORWARD (from);
+ }
}
if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil. */)
}
else if (code == Sendcomment)
{
- found = (!quoted || !comment_end_can_be_escaped)
+ found =
+ !comment_ender_quoted (from, from_byte, syntax)
&& back_comment (from, from_byte, stop, comnested, comstyle,
&out_charpos, &out_bytepos);
if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
other_syntax = SYNTAX_WITH_FLAGS (c2);
comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+ syntax = other_syntax;
}
/* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
case Sendcomment:
if (!parse_sexp_ignore_comments)
break;
- found = back_comment (from, from_byte, stop, comnested, comstyle,
+ found =
+ (from == stop
+ || !comment_ender_quoted (from, from_byte, syntax))
+ && back_comment (from, from_byte, stop, comnested, comstyle,
&out_charpos, &out_bytepos);
/* FIXME: if !found, it really wasn't a comment-end.
For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
54{ //74 \
}54
55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
57*/77
58}58
60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
110
111#| ; |#111
+/* Comments and purported comments containing string delimiters. */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' " ' " ' */128
+129/* ' " ' " ' */129
+" ' */125
+
Local Variables:
mode: fundamental
eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun {-in ()
(setq parse-sexp-ignore-comments t)
+ (setq comment-use-syntax-ppss nil)
(setq comment-end-can-be-escaped nil)
(modify-syntax-entry ?{ "<")
(modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun \;-in ()
(setq parse-sexp-ignore-comments t)
+ (setq comment-use-syntax-ppss nil)
(setq comment-end-can-be-escaped nil)
(modify-syntax-entry ?\n ">")
(modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun \#|-in ()
(setq parse-sexp-ignore-comments t)
+ (setq comment-use-syntax-ppss nil)
(modify-syntax-entry ?# ". 14")
(modify-syntax-entry ?| ". 23n")
(modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun /*-in ()
(setq parse-sexp-ignore-comments t)
+ (setq comment-use-syntax-ppss nil)
(setq comment-end-can-be-escaped t)
(modify-syntax-entry ?/ ". 124b")
(modify-syntax-entry ?* ". 23")
- (modify-syntax-entry ?\n "> b"))
+ (modify-syntax-entry ?\n "> b")
+ (modify-syntax-entry ?\' "\""))
(defun /*-out ()
(setq comment-end-can-be-escaped nil)
(modify-syntax-entry ?/ ".")
(modify-syntax-entry ?* ".")
- (modify-syntax-entry ?\n " "))
+ (modify-syntax-entry ?\n " ")
+ (modify-syntax-entry ?\' "."))
(eval-and-compile
(setq syntax-comments-section "c"))
@@ -489,4 +495,142 @@ /*-out
(syntax-pps-comments /* 56 76 77 58)
(syntax-pps-comments /* 60 78 79)
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+ (setq parse-sexp-ignore-comments t)
+ (setq comment-use-syntax-ppss nil)
+ (modify-syntax-entry ?/ ". 124be")
+ (modify-syntax-entry ?* ". 23")
+ (modify-syntax-entry ?\n "> be")
+ (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+ (modify-syntax-entry ?/ ".")
+ (modify-syntax-entry ?* ".")
+ (modify-syntax-entry ?\n " ")
+ (modify-syntax-entry ?\' "."))
+(eval-and-compile
+ (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+ (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+ (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
;;; syntax-tests.el ends here
--
Alan Mackenzie (Nuremberg, Germany).