* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-22 9:35 UTC
To: 43558, monnier

Hello, Emacs and Stefan.

In the following C comment:

    1 /*
    2 \*/
    3 /**/

, with point at BOL 1, do M-: (forward-comment 1).  This leaves point
wrongly at EOL 2.  It should end up at EOL 3, since the apparent comment
ender on L2 is actually escaped.

The following patch fixes this.

Are there any objections to me installing it?

diff --git a/src/syntax.c b/src/syntax.c
index e6af8a377b..066972e6d8 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2354,6 +2354,13 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
         /* We have encountered a nested comment of the same style
            as the comment sequence which began this comment section.  */
         nesting++;
+      if (comment_end_can_be_escaped
+          && (code == Sescape || code == Scharquote))
+        {
+          inc_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_FORWARD (from);
+          if (from == stop) continue; /* Failure */
+        }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 

-- 
Alan Mackenzie (Nuremberg, Germany).
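To make the report concrete, here is a hypothetical, self-contained C toy (not Emacs's forw_comment) that scans a block comment either ignoring or honoring backslash escapes. On Alan's three-line example, ignoring escapes stops just after the `*/` on line 2, while honoring them stops just after the `*/` on line 3, which is the difference the patch targets. Note that the escape-honoring mode models the behavior the CC Mode test suite assumed; whether real C treats `\*/` that way is exactly what Mattias questions later in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy scanner, not Emacs code.  TEXT must begin with the
   two-character block-comment opener (slash, star).  Returns the index
   just past the closing star-slash, or (size_t) -1 if the comment never
   closes.  When END_CAN_BE_ESCAPED is true, a backslash makes the next
   character inert, loosely mirroring `comment-end-can-be-escaped'.  */
size_t
toy_forward_block_comment (const char *text, bool end_can_be_escaped)
{
  size_t len = strlen (text);
  size_t i = 2;                 /* Skip the opener.  */

  while (i < len)
    {
      if (end_can_be_escaped && text[i] == '\\')
        {
          i += 2;               /* Skip the backslash and the char it quotes.  */
          continue;
        }
      if (text[i] == '*' && i + 1 < len && text[i + 1] == '/')
        return i + 2;
      i++;
    }
  return (size_t) -1;
}
```

With the input `"/*\n\\*/\n/**/"`, the call with `false` returns 6 (the newline ending line 2) and the call with `true` returns 11 (the end of line 3).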
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Stefan Monnier @ 2020-09-22 14:09 UTC
To: Alan Mackenzie; +Cc: 43558

Hi Alan,

> Hello, Emacs and Stefan.
>
> In the following C comment:
>
>     1 /*
>     2 \*/
>     3 /**/
>
> , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> wrongly at EOL 2.

That seems to be correct w.r.t the highlighting I see, OTOH.
IOW the bug seems to affect both forward-comment and parse-partial-sexp,
right?

> It should end up at EOL 3, since the apparent comment
> ender on L2 is actually escaped.
>
> The following patch fixes this.

Does it fix it for `parse-partial-sexp` as well?

> Are there any objections to me installing it?

None from me, no.


        Stefan
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-22 19:41 UTC
To: Stefan Monnier; +Cc: 43558

Hello, Stefan.

On Tue, Sep 22, 2020 at 10:09:43 -0400, Stefan Monnier wrote:
> Hi Alan,

> > Hello, Emacs and Stefan.

> > In the following C comment:

> >     1 /*
> >     2 \*/
> >     3 /**/

> > , with point at BOL 1, do M-: (forward-comment 1).  This leaves point
> > wrongly at EOL 2.

> That seems to be correct w.r.t the highlighting I see, OTOH.
> IOW the bug seems to affect both forward-comment and parse-partial-sexp, right?

Yes.

> > It should end up at EOL 3, since the apparent comment
> > ender on L2 is actually escaped.

> > The following patch fixes this.

> Does it fix it for `parse-partial-sexp` as well?

It does, yes.  The patch is in forw_comment, which is called by
Fforward_comment, scan_lists, and scan_sexps_forward.

> > Are there any objections to me installing it?

> None from me, no.

Thanks!

> Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-23 8:57 UTC
To: 43558-done

Bug fixed in master.

-- 
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Mattias Engdegård @ 2020-09-23 9:01 UTC
To: Alan Mackenzie; +Cc: 43558, Stefan Monnier

Sorry if I misunderstood, but since when do backslashes escape */ in C?
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-23 14:48 UTC
To: Mattias Engdegård; +Cc: 43558, Stefan Monnier

Hello, Mattias.

On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> Sorry if I misunderstood, but since when do backslashes escape */ in C?

Since forever, but only in the CC Mode test suite.  :-(

I just tried it out with gcc, and it seems that \*/ does indeed end a
block comment.  But an escaped newline doesn't end a line comment,
instead continuing it to the next line.  So I got confused.

Thanks for pointing out the mistake.

It seems that as well as the existing variable
comment-end-can-be-escaped, we need a new one, say
line-comment-end-can-be-escaped, too.  In C and C++ modes, these would
be nil and t respectively.

-- 
Alan Mackenzie (Nuremberg, Germany).
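Alan's gcc observation follows from C's translation phases: backslash-newline splicing (phase 2) happens before comments are replaced (phase 3), so a trailing backslash extends a `//` comment onto the next line, while a backslash in front of `*/` has no special meaning inside a block comment. The small function below, whose name is purely illustrative, exercises both rules. Compilers such as gcc typically emit a `-Wcomment` warning about the multi-line `//` comment, which is the point being demonstrated.

```c
/* Hypothetical demonstration function.  On a C99-or-later compiler it
   returns 2: the block comment ends at the backslash-star-slash, and the
   spliced line after the line comment is swallowed by that comment.  */
int
comment_escape_demo (void)
{
  int x = 1;
  /* A backslash does not protect the block-comment ender: \*/
  x += 1;                       /* This statement is compiled.  */
  // A trailing backslash splices the next line into this comment: \
  x += 100;
  return x;
}
```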
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Stefan Monnier @ 2020-09-23 18:44 UTC
To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too.

syntax.c doesn't like to think of it as "line-comment" but rather as
comment stay a, b, c, or nested and non-nested.

> In C and C++ modes, these would
> be nil and t respectively.

I sm-c-mode, I'd handle those corner cases in
`syntax-propertize-function` (tho I think I don't bother with this one
currently).

So, I guess in CC-mode, you could handle those by placing `syntax-table`
properties from ... wherever you place them ;-)


        Stefan
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-23 19:44 UTC
To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment stay [ ?? style ?? ] a, b, c, or nested and non-nested.

Hmm.  It could be quite troublesome to decide on an interface for major
modes specifying "comment style b can have its ender escaped, but
comment styles a and c cannot".

> > In C and C++ modes, these would
> > be nil and t respectively.

> I sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

Thanks, that's an idea - either putting a neutral s-t prop on the \ of
\*/, or something on the \n of \\n in a line comment.  I think the first
of these is a better idea than the second.

But on the other hand, it feels like a workaround for the lack of a
full-featured comment-end-can-be-escaped.

> Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Stefan Monnier @ 2020-09-23 20:02 UTC
To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> But on the other hand, it feels like a workaround for the lack of a

Yes, that's the definition of `syntax-propertize-function` ;-)


        Stefan
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-24 10:20 UTC
To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Wed, Sep 23, 2020 at 14:44:54 -0400, Stefan Monnier wrote:
> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too.

> syntax.c doesn't like to think of it as "line-comment" but rather as
> comment stay a, b, c, or nested and non-nested.

> > In C and C++ modes, these would be nil and t respectively.

> I sm-c-mode, I'd handle those corner cases in
> `syntax-propertize-function` (tho I think I don't bother with this one
> currently).

> So, I guess in CC-mode, you could handle those by placing `syntax-table`
> properties from ... wherever you place them ;-)

As already said, this is a(n ugly) workaround.  syntax.c should handle
comments in all their generality.  With a bit of consideration, the
method to do this is clear:

Introduce a new syntax flag `e' which takes effect in comment
delimiters.  It means "escape characters are active in this type of
comment".  In a two character delimiter it would, like `b', only take
effect on the inner of the two characters.

So the syntaxes of the C++ comment characters would be amended to look
like

    /     ". 124be"
    *     ". 23"       (unchanged)
    \n    "> be"

This would be an easy change to make, and (unlike using syntax-table
text properties) would cost negligible run time.

What do you think?

> Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
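Alan's proposal amounts to one more bit in each syntax entry. The hypothetical C sketch below (the names are illustrative, not syntax.c's) decodes the flag letters of a descriptor string into bits. Bits 16 to 23 are intended to match the ones syntax.c uses for `1` to `4`, `p`, `b`, `n` and `c`; bit 24 for `e` is an assumption taken from the patch posted later in this thread, not from any released Emacs.

```c
#include <stdbool.h>

/* Hypothetical names; the bit numbers are intended to mirror syntax.c,
   with bit 24 for the proposed `e' flag.  */
enum
{
  TOY_FLAG_COMSTART_FIRST  = 1 << 16,   /* '1' */
  TOY_FLAG_COMSTART_SECOND = 1 << 17,   /* '2' */
  TOY_FLAG_COMEND_FIRST    = 1 << 18,   /* '3' */
  TOY_FLAG_COMEND_SECOND   = 1 << 19,   /* '4' */
  TOY_FLAG_PREFIX          = 1 << 20,   /* 'p' */
  TOY_FLAG_STYLE_B         = 1 << 21,   /* 'b' */
  TOY_FLAG_NESTED          = 1 << 22,   /* 'n' */
  TOY_FLAG_STYLE_C         = 1 << 23,   /* 'c' */
  TOY_FLAG_ESCAPES         = 1 << 24    /* 'e' (proposed) */
};

/* Decode the flag letters of a descriptor such as "> be": the first
   character is the syntax class, the second the matching-character slot,
   and the remainder are flags.  Unknown characters are ignored.  */
int
toy_parse_comment_flags (const char *desc)
{
  int val = 0;

  if (desc[0] == '\0' || desc[1] == '\0')
    return 0;
  for (const char *p = desc + 2; *p != '\0'; p++)
    switch (*p)
      {
      case '1': val |= TOY_FLAG_COMSTART_FIRST; break;
      case '2': val |= TOY_FLAG_COMSTART_SECOND; break;
      case '3': val |= TOY_FLAG_COMEND_FIRST; break;
      case '4': val |= TOY_FLAG_COMEND_SECOND; break;
      case 'p': val |= TOY_FLAG_PREFIX; break;
      case 'b': val |= TOY_FLAG_STYLE_B; break;
      case 'n': val |= TOY_FLAG_NESTED; break;
      case 'c': val |= TOY_FLAG_STYLE_C; break;
      case 'e': val |= TOY_FLAG_ESCAPES; break;
      default: break;
      }
  return val;
}

/* The same bit test the patch performs in SYNTAX_FLAGS_COMMENT_ESCAPES.  */
bool
toy_comment_escapes (int flags)
{
  return (flags >> 24) & 1;
}
```

Under these assumptions, `"> be"` decodes to the style-b and escapes bits, while `". 23"` (the unchanged `*` entry) carries no escapes bit.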
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Stefan Monnier @ 2020-09-24 16:56 UTC
To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

> As already said, this is a(n ugly) workaround.  syntax.c should handle
> comments in all their generality.  With a bit of consideration, the
> method to do this is clear:

In my world, it's quite normal for a specific language's lexical rules
not to line up 100% with syntax tables (whether for strings, comments,
younameit).  I don't see anything very special here.

A `syntax-propertize` rule for "\*/" should be very easy to implement
and fairly cheap since the regexp is simple and will almost never match.

So, yeah, you can add yet-another-hack on top of the other syntax.c
hacks if you want, but there's a good chance it will only ever be used
by CC-mode.  It will take a lot more code changes in syntax.c than
a quick tweak to your Elisp code to search for "\*/".

I do think it would be good to handle this without `syntax-table`
text-property hacks, but I think that should come with an overhaul of
syntax.c based on a major-mode provided DFA (or something like that) so
it can accommodate all the various oddball cases without even the need
to introduce the notion of escaping comment markers.


        Stefan
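Stefan's DFA remark can be illustrated with a hypothetical, hand-written C state machine. This is not an Emacs API, and nobody in the thread specified such a design; it only shows how both oddball cases become ordinary transitions: a backslash before a newline keeps a `//` comment open, and `*/` closes a block comment regardless of what precedes it, so no separate notion of "escaping a comment marker" is needed. String and character literals are deliberately ignored to keep it short.

```c
#include <stddef.h>

/* Hypothetical states of a toy DFA for C comments.  */
enum toy_state
{
  IN_CODE, AFTER_SLASH, IN_LINE_COMMENT, LINE_COMMENT_BACKSLASH,
  IN_BLOCK_COMMENT, BLOCK_COMMENT_STAR, COMMENT_DONE
};

/* One transition of the automaton.  */
static enum toy_state
toy_step (enum toy_state s, char c)
{
  switch (s)
    {
    case IN_CODE:
      return c == '/' ? AFTER_SLASH : IN_CODE;
    case AFTER_SLASH:
      if (c == '/') return IN_LINE_COMMENT;
      if (c == '*') return IN_BLOCK_COMMENT;
      return IN_CODE;
    case IN_LINE_COMMENT:
      if (c == '\n') return COMMENT_DONE;
      return c == '\\' ? LINE_COMMENT_BACKSLASH : IN_LINE_COMMENT;
    case LINE_COMMENT_BACKSLASH:
      /* Backslash-newline continues the comment: just one more state,
         no "escape" concept needed.  */
      return c == '\\' ? LINE_COMMENT_BACKSLASH : IN_LINE_COMMENT;
    case IN_BLOCK_COMMENT:
      return c == '*' ? BLOCK_COMMENT_STAR : IN_BLOCK_COMMENT;
    case BLOCK_COMMENT_STAR:
      if (c == '/') return COMMENT_DONE;
      return c == '*' ? BLOCK_COMMENT_STAR : IN_BLOCK_COMMENT;
    default:
      return COMMENT_DONE;
    }
}

/* Return the index just past the end of the first comment in TEXT (past
   the newline for a line comment), or -1 if no comment is completed.  */
long
toy_first_comment_end (const char *text)
{
  enum toy_state s = IN_CODE;

  for (size_t i = 0; text[i] != '\0'; i++)
    {
      s = toy_step (s, text[i]);
      if (s == COMMENT_DONE)
        return (long) i + 1;
    }
  return -1;
}
```

For Alan's motivating case, `"foo(); // comment \\\nsecond line\nbar();"`, the comment ends only after the second newline (index 32), while `"/*\n\\*/\n/**/"` ends after the first `*/` (index 6), matching what gcc does.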
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-09-24 18:50 UTC
To: Stefan Monnier; +Cc: 43558, Mattias Engdegård

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is
syntactically ambiguous.  There's nothing syntax.c can do about this.
In the current situation, this isn't the case: syntax.c is unable to
handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be
quite expensive (compared with a syntax.c solution) since every comment
near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.

It is surely better, where possible, to fix bugs at their point of
causation rather than by workarounds elsewhere.  As you note, CC Mode
modes will be the only known users at the moment.

Just as an aside, the project where I was working ~four years ago banned
a proprietary editor after a mammoth search for a bug caused by an
unintentional escaped NL on a line comment.  The banned editor didn't
fontify the continuation line in comment face.  I was able to
demonstrate to the project manager that Emacs fontified that comment
correctly.

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I
think, that the syntax of language expressions would be defined using
something a bit like (but more powerful than) regular expressions.  And
with that, the need for syntactic analysis in Lisp would be much
reduced.  We would need to make sure that this wouldn't run more slowly
than the current syntax.c/Lisp combination.

> Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Stefan Monnier @ 2020-09-24 22:43 UTC
To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård

>> > As already said, this is a(n ugly) workaround.  syntax.c should handle
>> > comments in all their generality.  With a bit of consideration, the
>> > method to do this is clear:
>> In my world, it's quite normal for a specific language's lexical rules
>> not to line up 100% with syntax tables (whether for strings, comments,
>> younameit).  I don't see anything very special here.
> Normally when there's a mismatch, it's because a character is
> syntactically ambiguous.  There's nothing syntax.c can do about this.

Oh, no, there are many more situations than just "a character is
syntactically ambiguous" (or alternatively you could argue that all
cases are "a character is syntactically ambiguous", including your cases
of escaped newline and escaped */).

>> A `syntax-propertize` rule for "\*/" should be very easy to implement
>> and fairly cheap since the regexp is simple and will almost never match.
> Well, the rule would actually be for escaped newlines,

It doesn't have to be if you set `comment-end-can-be-escaped` to
non-nil, in which case you only need to tweak the \*/ case, AFAICT.

> but this would be quite expensive (compared with a syntax.c solution)
> since every comment near a change region would need scanning at
> each change.

I don't know what you mean by scanning, but yes you'd need to search for
all "\\\\\n" or "\\\\\\*/" (depending on how you set
`comment-end-can-be-escaped`) and mark the second char accordingly.
Seems pretty cheap in either case.

> I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
> It is surely better, where possible, to fix bugs at their point of
> causation rather than by workarounds elsewhere.

I don't think it's a bug in `syntax.c`.  `syntax.c` is not defined to
support the syntax of C, it's only defined to handle a particular set of
comment and string styles, which correspond to a common subset of what
is in use in most languages, but IME most languages need some extra
tweaks handled via the `syntax-table` text property.

It's only a question of time until we add a `syntax-propertize-function`
for Elisp mode to properly handle some corner cases, for example.

>> I do think it would be good to handle this without `syntax-table`
>> text-property hacks, but I think that should come with an overhaul of
>> syntax.c based on a major-mode provided DFA (or something like that) so
>> it can accommodate all the various oddball cases without even the need
>> to introduce the notion of escaping comment markers.
> That sounds almost more like a rewrite than an overhaul.

Tomato tomahto.

> We would need to make sure that this wouldn't run more slowly than the
> current syntax.c/Lisp combination.

I don't think that would be required, as long as it runs fast enough.
In any case, the resulting performance is probably not the main worry
(I suspect it will/would be easy to make it fast enough).


        Stefan
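The search Stefan describes for the `comment-end-can-be-escaped` non-nil case is cheap: one literal pattern, rarely matched. As a rough, hypothetical C analogue of what a `syntax-propertize` rule would locate (the real rule would be Emacs Lisp, and this helper's name is invented), the function below records the position of every backslash that directly precedes `*/`. Following Alan's earlier suggestion, those backslashes are the characters that would be given a neutral syntax so they no longer escape the block-comment ender. Runs of backslashes are not analyzed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  Find every backslash in TEXT that directly
   precedes a star-slash pair.  The first MAX positions are stored in
   POSITIONS (which may be NULL when MAX is 0); the return value is the
   total number found.  */
size_t
toy_find_escaped_block_enders (const char *text, size_t *positions,
                               size_t max)
{
  size_t found = 0;
  const char *p = text;

  while ((p = strstr (p, "\\*/")) != NULL)
    {
      if (found < max)
        positions[found] = (size_t) (p - text);
      found++;
      p += 3;                   /* Continue after this match.  */
    }
  return found;
}
```

On `"/* a \\*/ b /* c */ d \\*/"` it reports two backslashes, at indices 5 and 21, and ignores the ordinary `*/` in between.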
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
From: Alan Mackenzie @ 2020-11-19 21:18 UTC
To: Stefan Monnier; +Cc: 43558, Mattias Engdegård, acm

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

OK, here's the patch.  As a matter of interest, it's been heavily tested
by the .../test/src/syntax-tests.el unit tests, further enhancements to
which are part of the patch.

Just as a reminder, the motivation is to be able to have syntax.c
correctly parse C/C++ line comments which look like:

    foo(); // comment \\
    second line of comment.

by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:

    (modify-syntax-entry ?\n "> be")

> Stefan

diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
 {
   return (flags >> 22) & 1;
 }
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+  return (flags >> 24) & 1;
+}
 
 /* FLAGS should be the flags of the main char of the comment marker, e.g.
    the second for comstart and the first for comend.  */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
   return val;
 }
 
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+  int c;
+  int next_syntax;
+  if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+    return true;
+  if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+    {
+      dec_both (&from, &from_byte);
+      UPDATE_SYNTAX_TABLE_BACKWARD (from);
+      c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      next_syntax = SYNTAX_WITH_FLAGS (c);
+      UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+      if (next_syntax == Sescape || next_syntax == Scharquote)
+        return true;
+    }
+  return false;
+}
+
 /* Check whether charpos FROM is at the end of a comment.
    FROM_BYTE is the bytepos corresponding to FROM.
    Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
                     && SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
       comstart = (com2start || code == Scomment);
 
+      /* Check for any current delimiter being escaped.  */
+      if (from > stop
+          && (((com2end || code == Sendcomment)
+               && comment_ender_quoted (from, from_byte, syntax))
+              || (code == Scomment
+                  && comment_end_can_be_escaped
+                  && char_quoted (from, from_byte))))
+        {
+          dec_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_BACKWARD (from);
+          com2end = comstart = com2start = 0;
+          syntax = Smax;
+        }
+
       /* Nasty cases with overlapping 2-char comment markers:
          - snmp-mode: -- c -- foo -- c --
                       --- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property.  */)
         case 'c':
           val |= 1 << 23;
           break;
+
+        case 'e':
+          val |= 1 << 24;
+          break;
       }
 
   if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   (Lisp_Object syntax)
 {
   int code, syntax_code;
-  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+    comescapes;
   char str[2];
   Lisp_Object first, match_lisp, value = syntax;
 
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
   comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
   comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+  comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
 
   if (Smax <= code)
     {
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert ("c", 1);
   if (comnested)
     insert ("n", 1);
+  if (comescapes)
+    insert ("e", 1);
 
   insert_string ("\twhich means: ");
 
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert_string (" (comment style c)");
   if (comnested)
     insert_string (" (nestable)");
+  if (comescapes)
+    insert_string (" (can be escaped)");
 
   if (prefix)
     {
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
           && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
           && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
               (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !comment_ender_quoted (from, from_byte, syntax))
         /* We have encountered a comment end of the same style
            as the comment sequence which began this comment
            section.  */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
         /* We have encountered a nested comment of the same style
            as the comment sequence which began this comment section.  */
         nesting++;
-      if (comment_end_can_be_escaped
-          && (code == Sescape || code == Scharquote))
+      if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+          && comment_ender_quoted (from, from_byte, syntax))
         {
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
-          if (from == stop) continue; /* Failure */
+          continue;
         }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil.  */)
           /* We're at the start of a comment.  */
           found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
                                 &out_charpos, &out_bytepos, &dummy, &dummy2);
-          from = out_charpos; from_byte = out_bytepos;
-          if (!found)
+          from = out_charpos; from_byte = out_bytepos;
+          if (!found)
             {
               SET_PT_BOTH (from, from_byte);
               return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil.  */)
           if (code == Sendcomment)
             comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
           if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
-              && prev_char_comend_first (from, from_byte)
-              && !char_quoted (from - 1, dec_bytepos (from_byte)))
+              && prev_char_comend_first (from, from_byte))
             {
               int other_syntax;
-              /* We must record the comment style encountered so that
+              /* We must record the comment style encountered so that
                  later, we can match only the proper comment begin
                  sequence of the same style.  */
               dec_both (&from, &from_byte);
-              code = Sendcomment;
-              /* Calling char_quoted, above, set up global syntax position
-                 at the new value of FROM.  */
               c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
               other_syntax = SYNTAX_WITH_FLAGS (c1);
-              comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
-              comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              if (!comment_ender_quoted (from, from_byte, other_syntax))
+                {
+                  code = Sendcomment;
+                  comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+                  comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+                  syntax = other_syntax;
+                }
+              else
+                {
+                  inc_both (&from, &from_byte);
+                  UPDATE_SYNTAX_TABLE_FORWARD (from);
+                }
             }
 
           if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil.  */)
             }
           else if (code == Sendcomment)
             {
-              found = (!quoted || !comment_end_can_be_escaped)
+              found =
+                !comment_ender_quoted (from, from_byte, syntax)
                 && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
               if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
               other_syntax = SYNTAX_WITH_FLAGS (c2);
               comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
               comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              syntax = other_syntax;
             }
 
           /* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
             case Sendcomment:
               if (!parse_sexp_ignore_comments)
                 break;
-              found = back_comment (from, from_byte, stop, comnested, comstyle,
+              found =
+                (from == stop
+                 || !comment_ender_quoted (from, from_byte, syntax))
+                && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
               /* FIXME: if !found, it really wasn't a comment-end.
                  For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
 54{ //74 \
 }54
 55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
 57*/77
 58}58
 60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
 110
 111#| ; |#111
 
+/* Comments and purported comments containing string delimiters.  */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' " ' " ' */128
+129/* ' " ' " ' */129
+" ' */125
+
 Local Variables:
 mode: fundamental
 eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun {-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?{ "<")
   (modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \;-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?\n ">")
   (modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \#|-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (modify-syntax-entry ?# ". 14")
   (modify-syntax-entry ?| ". 23n")
   (modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun /*-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped t)
   (modify-syntax-entry ?/ ". 124b")
   (modify-syntax-entry ?* ". 23")
-  (modify-syntax-entry ?\n "> b"))
+  (modify-syntax-entry ?\n "> b")
+  (modify-syntax-entry ?\' "\""))
 (defun /*-out ()
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?/ ".")
   (modify-syntax-entry ?* ".")
-  (modify-syntax-entry ?\n " "))
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
 (eval-and-compile
   (setq syntax-comments-section "c"))
 
@@ -489,4 +495,142 @@ /*-out
 (syntax-pps-comments /* 56 76 77 58)
 (syntax-pps-comments /* 60 78 79)
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+  (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
+  (modify-syntax-entry ?/ ". 124be")
+  (modify-syntax-entry ?* ". 23")
+  (modify-syntax-entry ?\n "> be")
+  (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+  (modify-syntax-entry ?/ ".")
+  (modify-syntax-entry ?* ".")
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
+(eval-and-compile
+  (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
 ;;; syntax-tests.el ends here

-- 
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-19 21:18 ` Alan Mackenzie @ 2020-11-19 22:47 ` Stefan Monnier 2020-11-22 13:12 ` Alan Mackenzie 0 siblings, 1 reply; 36+ messages in thread From: Stefan Monnier @ 2020-11-19 22:47 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård >> So, yeah, you can add yet-another-hack on top of the other syntax.c >> hacks if you want, but there's a good chance it will only ever be used >> by CC-mode. It will take a lot more code changes in syntax.c than >> a quick tweak to your Elisp code to search for "\*/". [...] > OK, here's the patch. I think the patch agrees with my assessment above (even though it's still missing a etc/NEWS entry, adjustment to the docstring of modify-syntax-entry and to the .texi manual). I really can't understand why you resist so much the use of a `syntax-table` property on those rare \\\n sequences. Stefan PS: Also, I just noticed that `gcc -Wall` warns about the use of such multiline comments, so it doesn't seem to be a very popular feature. PPS: For reference, I just tried to add support for it in sm-c-mode and this is the resulting code: @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"# define\"." 'syntax-table (string-to-syntax "|")) (put-text-property (match-beginning 2) (match-end 2) 'syntax-table (string-to-syntax "|"))) - (sm-c--cpp-syntax-propertize end))))) + (sm-c--cpp-syntax-propertize end)))) + ("\\\\\\(\n\\)" + (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0))))) + (when (and (nth 4 ppss) ;Within a comment + (null (nth 7 ppss)) ;Within a // comment + (save-excursion ;The \ is not itself escaped + (goto-char (match-beginning 0)) + (zerop (mod (skip-chars-backward "\\\\") 2)))) + (string-to-syntax ".")))))) (point) end)) (defun sm-c-syntactic-face-function (ppss) ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-19 22:47 ` Stefan Monnier @ 2020-11-22 13:12 ` Alan Mackenzie 2020-11-22 15:20 ` Stefan Monnier 2020-11-22 15:35 ` Eli Zaretskii 0 siblings, 2 replies; 36+ messages in thread From: Alan Mackenzie @ 2020-11-22 13:12 UTC (permalink / raw) To: Stefan Monnier; +Cc: 43558, Mattias Engdegård, acm Hello, Stefan. On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote: > >> So, yeah, you can add yet-another-hack on top of the other syntax.c > >> hacks if you want, but there's a good chance it will only ever be used > >> by CC-mode. It will take a lot more code changes in syntax.c than > >> a quick tweak to your Elisp code to search for "\*/". > [...] > > OK, here's the patch. > I think the patch agrees with my assessment above (even though it's > still missing a etc/NEWS entry, adjustment to the docstring of > modify-syntax-entry and to the .texi manual). Here are these things: diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi index b99b5de0b3..4e9e9207c3 100644 --- a/doc/lispref/syntax.texi +++ b/doc/lispref/syntax.texi @@ -287,21 +287,21 @@ Syntax Flags @cindex syntax flags In addition to the classes, entries for characters in a syntax table -can specify flags. There are eight possible flags, represented by the +can specify flags. There are nine possible flags, represented by the characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c}, -@samp{n}, and @samp{p}. +@samp{e}, @samp{n}, and @samp{p}. All the flags except @samp{p} are used to describe comment delimiters. The digit flags are used for comment delimiters made up of 2 characters. They indicate that a character can @emph{also} be part of a comment sequence, in addition to the syntactic properties associated with its character class. 
The flags are independent of the -class and each other for the sake of characters such as @samp{*} in -C mode, which is a punctuation character, @emph{and} the second +class and each other for the sake of characters such as @samp{*} in C +mode, which is a punctuation character, @emph{and} the second character of a start-of-comment sequence (@samp{/*}), @emph{and} the first character of an end-of-comment sequence (@samp{*/}). The flags -@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding -comment delimiter. +@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the +corresponding comment delimiter. Here is a table of the possible flags for a character @var{c}, and what they mean: @@ -332,6 +332,13 @@ Syntax Flags alternative ``c'' comment style. For a two-character comment delimiter, @samp{c} on either character makes it of style ``c''. +@item +@samp{e} means that when @var{c}, a comment ender or first character +of a two character ender, is directly preceded by one or more escape +characters, @var{c} does not act as a comment ender. Contrast this +with the effect of variable @code{comment-end-can-be-escaped} +(@pxref{Control Parsing}). + @item @samp{n} on a comment delimiter character specifies that this kind of comment can be nested. Inside such a comment, only comments of the @@ -357,7 +364,7 @@ Syntax Flags @item @samp{*} @samp{23b} @item newline -@samp{>} +@samp{> e} @end table This defines four comment-delimiting sequences: @@ -377,7 +384,9 @@ Syntax Flags @item newline This is a comment-end sequence for ``a'' style, because the newline -character does not have the @samp{b} flag. +character does not have the @samp{b} flag. It can be escaped by one +or more @samp{\} characters, so that an ``a'' style comment can +continue onto the next line.
@end table @item @@ -962,9 +971,14 @@ Control Parsing @defvar comment-end-can-be-escaped If this buffer local variable is non-@code{nil}, a single character which usually terminates a comment doesn't do so when that character -is escaped. This is used in C and C++ Modes, where line comments -starting with @samp{//} can be continued onto the next line by -escaping the newline with @samp{\}. +is escaped. This used to be used in C and C++ Modes, where line +comments starting with @samp{//} can be continued onto the next line +by escaping the newline with @samp{\}. + +Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax +Flags}), where two consecutive escape characters escape the comment +ender. @code{comment-end-can-be-escaped} should not be used together +with the @samp{e} syntax flag. @end defvar You can use @code{forward-comment} to move forward or backward over @@ -1037,6 +1051,8 @@ Syntax Table Internals @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)} @item @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)} +@item +@tab@tab @samp{e} @tab @code{(ash 1 24)} @end multitable @defun string-to-syntax desc diff --git a/etc/NEWS b/etc/NEWS index a0e72bc673..3b292e8f41 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el. \f * Lisp Changes in Emacs 28.1 ++++ +** New syntax flag 'e'. +This indicates that one or two (or more) escape characters escape a +comment ender with this flag, causing the comment to be continued past +that comment ender (typically onto the next line). + +++ ** 'set-window-configuration' now takes an optional 'dont-set-frame' parameter which, when non-nil, instructs the function not to select diff --git a/src/syntax.c b/src/syntax.c index df07809aaa..7bdbd114ba 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below. 
The second character of NEWENTRY is the matching parenthesis, used only if the first character is `(' or `)'. Any additional characters are flags. -Defined flags are the characters 1, 2, 3, 4, b, p, and n. +Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p. 1 means CHAR is the start of a two-char comment start sequence. 2 means CHAR is the second character of such a sequence. 3 means CHAR is the start of a two-char comment end sequence. @@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag: c means CHAR is part of comment sequence c. n means CHAR is part of a nestable comment sequence. + e means CHAR, when a comment ender or first char of a two character + comment ender, can be escaped by (any number of consecutive) + characters with escape syntax. C and C++ use this facility. + Compare and contrast with the variable `comment-end-can-be-escaped'. + p means CHAR is a prefix character for `backward-prefix-chars'; such characters are treated as whitespace when they occur between expressions. > I really can't understand why you resist so much the use of > a `syntax-table` property on those rare \\\n sequences. Because syntax-table text properties are already used for so many different things in CC Mode (I think the count is five in C++ Mode). Adding another one would mean having to scan for this rare construct at every buffer change, and this would slow things down, possibly a lot. There is no slowdown (beyond a possible microscopic one) in the modification to syntax.c and, as a bonus, I have written around 200 test cases for syntax.c's comment features. > Stefan > PS: Also, I just noticed that `gcc -Wall` warns about the use of such > multiline comments, so it doesn't seem to be a very popular feature. It is more of a mistake that people occasionally might make than a feature. In my opinion, having escaped newlines inside line comments is a bug in the C/C++ language standards. Anybody might "end" a line comment accidentally with "\" or "\\". 
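The flag bits discussed above can be made concrete with a small C sketch. The helper `encode_syntax_descriptor` is hypothetical and is not code from syntax.c. It packs a descriptor string, such as the `". 124be"` used in the new tests, into the integer that `string-to-syntax` places in the car of its result. The sketch assumes the class numbering and the flag bits 16 to 23 given in the "Syntax Table Internals" node of the Emacs Lisp manual. Bit 24 for `e` exists only in the proposed patch.

```c
#include <assert.h>
#include <string.h>

// Hypothetical sketch, not Emacs source: pack a syntax descriptor such as
// ". 124b" into the integer that `string-to-syntax' returns in the car of
// its result.  Class numbers and flag bits follow the "Syntax Table
// Internals" node of the Emacs Lisp manual; bit 24 for `e' comes from the
// proposed patch only.  Returns -1 for an empty or unknown class.
static long encode_syntax_descriptor(const char *desc)
{
    // The index of a designator in this string is its numeric syntax class.
    static const char classes[] = " .w_()'\"$\\/<>@!|";
    const char *p;
    long code;

    assert(desc != NULL);
    if (desc[0] == '\0')
        return -1;
    if (desc[0] == '-') {
        code = 0;              // '-' is an alternative whitespace designator
    } else {
        p = strchr(classes, desc[0]);
        if (p == NULL)
            return -1;         // unknown syntax class
        code = (long)(p - classes);
    }

    // desc[1], if present, is the matching character; flags start at desc[2].
    if (desc[1] == '\0')
        return code;
    for (p = desc + 2; *p != '\0'; p++) {
        switch (*p) {
        case '1': code |= 1L << 16; break;
        case '2': code |= 1L << 17; break;
        case '3': code |= 1L << 18; break;
        case '4': code |= 1L << 19; break;
        case 'p': code |= 1L << 20; break;
        case 'b': code |= 1L << 21; break;
        case 'n': code |= 1L << 22; break;
        case 'c': code |= 1L << 23; break;
        case 'e': code |= 1L << 24; break;  // flag proposed in this patch
        default: break;                     // this sketch ignores other chars
        }
    }
    return code;
}
```

Under these assumptions, `". 124b"` encodes to 1 + 2^16 + 2^17 + 2^19 + 2^21 = 2818049. Adding `e` sets bit 24 on top of that value.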
> PPS: For reference, I just tried to add support for it in sm-c-mode > and this is the resulting code: Just to emphasize Stefan Kangas's point, it is a newline preceded by a "\" which continues the comment, not an escaped NL in the ordinary sense. In particular two "\"s followed by NL still continue the comment. > @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"# define\"." > 'syntax-table (string-to-syntax "|")) > (put-text-property (match-beginning 2) (match-end 2) > 'syntax-table (string-to-syntax "|"))) > - (sm-c--cpp-syntax-propertize end))))) > + (sm-c--cpp-syntax-propertize end)))) > + ("\\\\\\(\n\\)" > + (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0))))) > + (when (and (nth 4 ppss) ;Within a comment > + (null (nth 7 ppss)) ;Within a // comment > + (save-excursion ;The \ is not itself escaped > + (goto-char (match-beginning 0)) > + (zerop (mod (skip-chars-backward "\\\\") 2)))) > + (string-to-syntax ".")))))) > (point) end)) > > (defun sm-c-syntactic-face-function (ppss) Yes, something like this would be possible. But all these syntax-ppsss would be slow, at least somewhat, as discussed above. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply related [flat|nested] 36+ messages in thread
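Two predicates pin down the disagreement about backslashes. The names below are hypothetical, and a plain `\` stands in for any character with escape syntax.

- `escaped_by_parity` models the parity rule. The patch's documentation attributes this rule to `comment-end-can-be-escaped`, and the sm-c-mode snippet checks it with `skip-chars-backward`. Under it, an even run of backslashes leaves the ender unescaped.
- `escaped_by_e_flag` models the proposed `e` flag. Under it, any non-empty run escapes the ender, which agrees with C's splicing of a backslash followed by a newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helpers, not syntax.c code.  Each asks whether the comment
// ender at buf[pos] (for a // comment, the newline) is escaped.  A plain
// backslash stands in for any character with escape syntax.

// Length of the run of backslashes immediately before buf[pos].
static size_t backslashes_before(const char *buf, size_t pos)
{
    size_t n = 0;
    assert(buf != NULL);
    while (n < pos && buf[pos - n - 1] == '\\')
        n++;
    return n;
}

// Parity rule: an odd run escapes the ender; in an even run the last
// backslash is itself escaped, so the ender still ends the comment.
static bool escaped_by_parity(const char *buf, size_t pos)
{
    return backslashes_before(buf, pos) % 2 == 1;
}

// Rule of the proposed `e' flag: any non-empty run escapes the ender,
// which agrees with C's splicing of backslash-newline.
static bool escaped_by_e_flag(const char *buf, size_t pos)
{
    return backslashes_before(buf, pos) > 0;
}
```

The two predicates disagree on a `//` comment that ends in two backslashes. The parity rule ends the comment at that newline. The `e` rule, like a C compiler, continues the comment onto the next line.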
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 13:12 ` Alan Mackenzie @ 2020-11-22 15:20 ` Stefan Monnier 2020-11-22 17:08 ` Alan Mackenzie 2020-11-22 15:35 ` Eli Zaretskii 1 sibling, 1 reply; 36+ messages in thread From: Stefan Monnier @ 2020-11-22 15:20 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård > Because syntax-table text properties are already used for so many > different things in CC Mode (I think the count is five in C++ Mode). > Adding another one would mean having to scan for this rare construct at > every buffer change, and this would slow things down, possibly a lot. The fact that you already have 5 other such uses implies that the slow down from this one cannot possibly be larger than 20% (since the scan for it is very simple, I doubt any of the other 5 is simpler). Most major modes have such things and we live just fine with them. This is a non-issue. Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 15:20 ` Stefan Monnier @ 2020-11-22 17:08 ` Alan Mackenzie 2020-11-22 17:46 ` Dmitry Gutov 2020-11-22 23:10 ` Stefan Monnier 0 siblings, 2 replies; 36+ messages in thread From: Alan Mackenzie @ 2020-11-22 17:08 UTC (permalink / raw) To: Stefan Monnier; +Cc: 43558, Mattias Engdegård Hello, Stefan. On Sun, Nov 22, 2020 at 10:20:32 -0500, Stefan Monnier wrote: > > Because syntax-table text properties are already used for so many > > different things in CC Mode (I think the count is five in C++ Mode). > > Adding another one would mean having to scan for this rare construct at > > every buffer change, and this would slow things down, possibly a lot. > The fact that you already have 5 other such uses implies that the slow > down from this one cannot possibly be larger than 20% (since the scan > for it is very simple, I doubt any of the other 5 is simpler). The fact remains that an implementation at the C level is objectively better than one at the Lisp level. > Most major modes have such things and we live just fine with them. > This is a non-issue. Really? Are there any other programming language modes whose comments syntax.c cannot handle without syntax-table text properties? > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 17:08 ` Alan Mackenzie @ 2020-11-22 17:46 ` Dmitry Gutov 2020-11-22 18:19 ` Alan Mackenzie 2020-11-22 23:10 ` Stefan Monnier 1 sibling, 1 reply; 36+ messages in thread From: Dmitry Gutov @ 2020-11-22 17:46 UTC (permalink / raw) To: Alan Mackenzie, Stefan Monnier; +Cc: 43558, Mattias Engdegård On 22.11.2020 19:08, Alan Mackenzie wrote: > Really? Are there any other programming language modes whose comments > syntax.c cannot handle without syntax-table text properties? Ruby is just one example. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 17:46 ` Dmitry Gutov @ 2020-11-22 18:19 ` Alan Mackenzie 2020-11-22 20:39 ` Dmitry Gutov 0 siblings, 1 reply; 36+ messages in thread From: Alan Mackenzie @ 2020-11-22 18:19 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier Hello, Dmitry. On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote: > On 22.11.2020 19:08, Alan Mackenzie wrote: > > Really? Are there any other programming language modes whose comments > > syntax.c cannot handle without syntax-table text properties? > Ruby is just one example. Thanks. I've just searched the web for that. Ruby has block comment delimiters =begin and =end. It would be possible to handle these in syntax.c, but somewhat clumsy and awkward. Presumably ruby-mode handles these with syntax-table text properties applied to the = sign and the terminating d, which is a little clumsy, but not too bad, at the Lisp level. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 18:19 ` Alan Mackenzie @ 2020-11-22 20:39 ` Dmitry Gutov 2020-11-22 21:13 ` Alan Mackenzie 0 siblings, 1 reply; 36+ messages in thread From: Dmitry Gutov @ 2020-11-22 20:39 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Stefan Monnier On 22.11.2020 20:19, Alan Mackenzie wrote: > Hello, Dmitry. > > On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote: >> On 22.11.2020 19:08, Alan Mackenzie wrote: >>> Really? Are there any other programming language modes whose comments >>> syntax.c cannot handle without syntax-table text properties? > >> Ruby is just one example. > > Thanks. > > I've just searched the web for that. Ruby has block comment delimiters > =begin and =end. > > It would be possible to handle these in syntax.c, but somewhat clumsy > and awkward. Just like the C comments syntax discussed here. > Presumably ruby-mode handles these with syntax-table text properties > applied to the = sign and the terminating d, which is a little clumsy, > but not too bad, at the Lisp level. This is just two more regexps to search for (and propertize). I don't expect that the slowdown from them is in any way perceptible. And the general point is that the Emacs syntax table structure doesn't necessarily have to mirror the syntax of the C language. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 20:39 ` Dmitry Gutov @ 2020-11-22 21:13 ` Alan Mackenzie 2020-11-22 21:34 ` Dmitry Gutov 0 siblings, 1 reply; 36+ messages in thread From: Alan Mackenzie @ 2020-11-22 21:13 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier Hello, Dmitry. On Sun, Nov 22, 2020 at 22:39:08 +0200, Dmitry Gutov wrote: > On 22.11.2020 20:19, Alan Mackenzie wrote: > > On Sun, Nov 22, 2020 at 19:46:24 +0200, Dmitry Gutov wrote: > >> On 22.11.2020 19:08, Alan Mackenzie wrote: > >>> Really? Are there any other programming language modes whose comments > >>> syntax.c cannot handle without syntax-table text properties? > >> Ruby is just one example. > > Thanks. > > I've just searched the web for that. Ruby has block comment delimiters > > =begin and =end. > > It would be possible to handle these in syntax.c, but somewhat clumsy > > and awkward. > Just like the C comments syntax discussed here. Not at all. The amendment we're talking about is to handle escaped newlines inside line comments. Which takes precedence, the comment to EOL, or the escape? It's rather arbitrary, and should be configurable. Coding up the Ruby block comments in syntax.c would involve string comparisons, for example, and would be an entirely new flavour inside that file. It would involve examining individual letters rather than just their syntax. By contrast, coding up the escaped NL in syntax.c was straightforward and natural. Have you looked at the patch? > > Presumably ruby-mode handles these with syntax-table text properties > > applied to the = sign and the terminating d, which is a little clumsy, > > but not too bad, at the Lisp level. > This is just two more regexps to search for (and propertize). I don't > expect that the slowdown from them is in any way perceptible. 
> And the general point is that the Emacs syntax table structure doesn't > necessarily have to mirror the syntax of the C language. Maybe not, but the point remains, that for this fix, a fix at the C level is objectively better than a fix at the Lisp level. Furthermore, the C level change is already implemented and has been well tested. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 21:13 ` Alan Mackenzie @ 2020-11-22 21:34 ` Dmitry Gutov 2020-11-22 22:01 ` Alan Mackenzie 0 siblings, 1 reply; 36+ messages in thread From: Dmitry Gutov @ 2020-11-22 21:34 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Stefan Monnier Hi Alan, On 22.11.2020 23:13, Alan Mackenzie wrote: > Coding up the Ruby block comments in syntax.c would involve string > comparisons, for example, and would be an entirely new flavour inside > that file. It would involve examining individual letters rather than > just their syntax. It could be made to support a new syntax using a finite state machine, something like that. And the strings could be converted to such by the major mode. But you're right, it would be more difficult. > By contrast, coding up the escaped NL in syntax.c was straightforward and > natural. > > Have you looked at the patch? Yup. It's not terrible, but it's still a bunch of new if/elses that one would need to grasp to maintain that code. >>> Presumably ruby-mode handles these with syntax-table text properties >>> applied to the = sign and the terminating d, which is a little clumsy, >>> but not too bad, at the Lisp level. > >> This is just two more regexps to search for (and propertize). I don't >> expect that the slowdown from them is in any way perceptible. > >> And the general point is that the Emacs syntax table structure doesn't >> necessarily have to mirror the syntax of the C language. > > Maybe not, but the point remains, that for this fix, a fix at the C level > is objectively better than a fix at the Lisp level. Furthermore, the C > level change is already implemented and has been well tested. Why is it objectively better? With user experience (speed, latencies, etc) being equal or within the margin of error, I think it's more logical to go with simpler data structures and low level APIs. 
Finally, as I recall you feel strongly about supporting older Emacs versions, a significant number of them. Doing that fix in Lisp would allow you to fix the bug for those versions too. Not just Emacs 28+. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 21:34 ` Dmitry Gutov @ 2020-11-22 22:01 ` Alan Mackenzie 2020-11-22 23:00 ` Stefan Monnier 0 siblings, 1 reply; 36+ messages in thread From: Alan Mackenzie @ 2020-11-22 22:01 UTC (permalink / raw) To: Dmitry Gutov; +Cc: 43558, Mattias Engdegård, Stefan Monnier Hello, Dmitry. On Sun, Nov 22, 2020 at 23:34:18 +0200, Dmitry Gutov wrote: > Hi Alan, > On 22.11.2020 23:13, Alan Mackenzie wrote: > > Coding up the Ruby block comments in syntax.c would involve string > > comparisons, for example, and would be an entirely new flavour > > inside that file. It would involve examining individual letters > > rather than just their syntax. > It could be made to support a new syntax using a finite state machine, > something like that. And the strings could be converted to such by the > major mode. But you're right, it would be more difficult. > > By contrast, coding up the escaped NL in syntax.c was > > straightforward and natural. > > Have you looked at the patch? > Yup. > It's not terrible, but it's still a bunch of new if/elses that one > would need to grasp to maintain that code. Its character, the general use of ifs/elses and so on, is unchanged. Only somebody with a very detailed memory of exact statements would be inconvenienced, and that only slightly. > >>> Presumably ruby-mode handles these with syntax-table text > >>> properties applied to the = sign and the terminating d, which is a > >>> little clumsy, but not too bad, at the Lisp level. > >> This is just two more regexps to search for (and propertize). I > >> don't expect that the slowdown from them is in any way perceptible. > >> And the general point is that the Emacs syntax table structure > >> doesn't necessarily have to mirror the syntax of the C language. > > Maybe not, but the point remains, that for this fix, a fix at the C > > level is objectively better than a fix at the Lisp level.
> > Furthermore, the C level change is already implemented and has been > > well tested. > Why is it objectively better? It's faster, and it avoids fragmenting the handling of CC Mode comments between C and Lisp the way that of strings, for example, has been. It provides a mechanism which might be useful to other major modes in the future. > With user experience (speed, latencies, etc) being equal or within the > margin of error, I think it's more logical to go with simpler data > structures and low level APIs. Fixing things in syntax.c was simpler than a Lisp solution using syntax-table text properties would have been. > Finally, as I recall you feel strongly about supporting older Emacs > versions, a significant number of them. Doing that fix in Lisp would > allow you to fix the bug for those versions too. Not just Emacs 28+. Yes. That appears to be the sole drawback of the fix being in syntax.c. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 22:01 ` Alan Mackenzie @ 2020-11-22 23:00 ` Stefan Monnier 2021-05-13 10:38 ` Lars Ingebrigtsen 0 siblings, 1 reply; 36+ messages in thread From: Stefan Monnier @ 2020-11-22 23:00 UTC (permalink / raw) To: Alan Mackenzie; +Cc: 43558, Mattias Engdegård, Dmitry Gutov > It provides a mechanism which might be useful to other major modes in > the future. FWIW, I doubt it. In my experience these details never quite match. >> With user experience (speed, latencies, etc) being equal or within the >> margin of error, I think it's more logical to go with simpler data >> structures and low level APIs. > Fixing things in syntax.c was simpler than a Lisp solution using > syntax-table text properties would have been. I find it hard to take this statement seriously after having seen my sm-c-mode patch. >> Finally, as I recall you feel strongly about supporting older Emacs >> versions, a significant number of them. Doing that fix in Lisp would >> allow you to fix the bug for those versions too. Not just Emacs 28+. > Yes. That appears to be the sole drawback of the fix being in syntax.c. I don't really find this to be a drawback since I don't care about CC-mode's support for multiline `//` comments in older Emacsen. But the added complexity in the C code and in the API documentation for the sole benefit of C (and C++?) mode is a drawback, yes. I do want to clarify my position, tho: while I don't like your patch very much because I think a Lisp-level solution (like the one I used in sm-c-mode) is preferable I find your patch acceptable. Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2020-11-22 23:00 ` Stefan Monnier @ 2021-05-13 10:38 ` Lars Ingebrigtsen 2021-05-13 14:51 ` Alan Mackenzie 0 siblings, 1 reply; 36+ messages in thread From: Lars Ingebrigtsen @ 2021-05-13 10:38 UTC (permalink / raw) To: Stefan Monnier Cc: 43558, Alan Mackenzie, Mattias Engdegård, Dmitry Gutov Stefan Monnier <monnier@iro.umontreal.ca> writes: > I do want to clarify my position, tho: while I don't like your patch > very much because I think a Lisp-level solution (like the one I used in > sm-c-mode) is preferable I find your patch acceptable. This was the patch that adds the "e" syntax modifier? Apparently this thread stalled here, and as far as I can see, the patch was never applied? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2021-05-13 10:38 ` Lars Ingebrigtsen @ 2021-05-13 14:51 ` Alan Mackenzie 2021-05-16 13:53 ` Lars Ingebrigtsen 2022-04-28 11:17 ` Lars Ingebrigtsen 0 siblings, 2 replies; 36+ messages in thread From: Alan Mackenzie @ 2021-05-13 14:51 UTC (permalink / raw) To: Lars Ingebrigtsen Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov Hello, Lars. On Thu, May 13, 2021 at 12:38:01 +0200, Lars Ingebrigtsen wrote: > Stefan Monnier <monnier@iro.umontreal.ca> writes: > > I do want to clarify my position, tho: while I don't like your patch > > very much because I think a Lisp-level solution (like the one I used in > > sm-c-mode) is preferable I find your patch acceptable. > This was the patch that adds the "e" syntax modifier? Apparently this > thread stalled here, and as far as I can see, the patch was never applied? That is correct, yes. The patch was working and ready, but I took Stefan's criticisms on board, and I think he was right. It was a fairly large change to syntax.c with a minor gain. Also, it would have meant having two alternative comment mechanisms in CC Mode for a few releases. So I think on balance, it would be better to regard the patch as an experiment only, which should go no further. I think the bug should stay open, though. CC Mode still doesn't handle multi-line comments absolutely correctly, and these could surely be fixed, like Stefan said, by using syntax-table text properties somehow. I also created a plethora of syntax test cases for the patch, some of which are applicable to syntax.c as is. I'll get around to sorting these out and committing them sometime. > -- > (domestic pets only, the antidote for overdose, milk.) > bloggy blog: http://lars.ingebrigtsen.no -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped. 2021-05-13 14:51 ` Alan Mackenzie @ 2021-05-16 13:53 ` Lars Ingebrigtsen 2022-04-28 11:17 ` Lars Ingebrigtsen 1 sibling, 0 replies; 36+ messages in thread From: Lars Ingebrigtsen @ 2021-05-16 13:53 UTC (permalink / raw) To: Alan Mackenzie Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov Alan Mackenzie <acm@muc.de> writes: > I think the bug should stay open, though. CC Mode still doesn't handle > multi-line comments absolutely correctly, and these could surely be > fixed, like Stefan said, by using syntax-table text properties somehow. OK; makes sense to me. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Lars Ingebrigtsen @ 2022-04-28 11:17 UTC
To: Alan Mackenzie
Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Alan Mackenzie <acm@muc.de> writes:

> I think the bug should stay open, though. CC Mode still doesn't handle
> multi-line comments absolutely correctly, and these could surely be
> fixed, like Stefan said, by using syntax-table text properties somehow.

Just to add a test case for this bug -- put this into a C file:

    /*
    \*/
    a */

cc-mode will highlight that as one comment (i.e., the "a" will be
coloured with the comment face), but the file won't compile, since the
comment ended at line two.

(This is in Emacs 29.)

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
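Lars's test case can also be checked against a C compiler directly. The function below is a minimal sketch (the name `escaped_closer_demo` is made up for illustration and is not part of Emacs or CC Mode): if a backslash could escape the comment closer, the assignment `a = 2;` would be swallowed by the comment and the function would return 1. A conforming C compiler ends the comment on the `\*/` line, so it returns 2.

```c
// Hypothetical demo function, not part of Emacs or CC Mode.
// The block comment below opens on one line and closes on the next;
// the backslash in front of the closer is just an ordinary comment
// character, so the assignment after it is real code.
int escaped_closer_demo(void)
{
    int a = 1;
    /*
    \*/
    a = 2;
    return a;   // 2 with a conforming compiler, 1 if the closer were escaped
}
```

Calling it from a `main` and printing the result is enough to confirm the behaviour with gcc or clang.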
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Alan Mackenzie @ 2022-04-28 18:52 UTC
To: Lars Ingebrigtsen
Cc: 43558, Mattias Engdegård, Stefan Monnier, Dmitry Gutov

Hello, Lars.

On Thu, Apr 28, 2022 at 13:17:12 +0200, Lars Ingebrigtsen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > I think the bug should stay open, though. CC Mode still doesn't handle
> > multi-line comments absolutely correctly, and these could surely be
> > fixed, like Stefan said, by using syntax-table text properties somehow.

> Just to add a test case for this bug -- put this into a C file:

>     /*
>     \*/
>     a */

> cc-mode will highlight that as one comment (i.e., the "a" will be
> coloured with the comment face), but the file won't compile, since the
> comment ended at line two.

> (This is in Emacs 29.)

I'll look at this in the coming days, and hopefully fix it.

> --
> (domestic pets only, the antidote for overdose, milk.)
>    bloggy blog: http://lars.ingebrigtsen.no

--
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Stefan Monnier @ 2020-11-22 23:10 UTC
To: Alan Mackenzie
Cc: 43558, Mattias Engdegård

> Really? Are there any other programming language modes whose comments
> syntax.c cannot handle without syntax-table text properties?

Yes, a fair number. Besides the ones that use "long" delimiters, there
are some that put restrictions about where a comment can start (e.g. the
# in `sh` is a normal character when it appears within a word), there's
Scheme's #;(...) "sexp comment" (which we still don't actually handle
correctly), there are cases where there are too many different syntaxes
(e.g. in Pascal you can have (*...*), /*...*/, //...\n and then Emacs
concludes that `(/` is also a valid comment starter), ...


        Stefan
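Stefan's Pascal example follows from how two-character comment starters are declared in an Emacs syntax table: a character gets syntax flag `1` if it can begin such a starter and flag `2` if it can end one, and any flag-1 character followed by any flag-2 character is then accepted. The C sketch below models only that pairing rule; the function name `starts_comment` is hypothetical, and comment styles (the `b` and `n` flags) are ignored. Giving `(` and `/` flag 1 and `*` and `/` flag 2, as `(*`, `/*` and `//` require, makes `(/` a starter as well.

```c
#include <stdbool.h>
#include <string.h>

// Simplified, hypothetical model of Emacs's two-character comment starters.
// `flag1` lists the characters carrying syntax flag 1 (may begin a starter),
// `flag2` those carrying flag 2 (may end one).  Every pairing is accepted,
// which is how unintended starters such as "(/" arise.
bool starts_comment(const char *flag1, const char *flag2, char first, char second)
{
    if (first == '\0' || second == '\0')
        return false;   // strchr would otherwise match the terminating NUL
    return strchr(flag1, first) != NULL && strchr(flag2, second) != NULL;
}
```

With Pascal's flags, `starts_comment("(/", "*/", '(', '/')` returns true even though Pascal has no `(/` comment; per-character flags simply cannot express "these three pairs and no others".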
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Eli Zaretskii @ 2020-11-22 15:35 UTC
To: Alan Mackenzie
Cc: 43558, mattiase, monnier

> Date: Sun, 22 Nov 2020 13:12:31 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: 43558@debbugs.gnu.org,
>  Mattias Engdegård <mattiase@acm.org>, acm@muc.de
>
> +@samp{e} means that when @var{c}, a comment ender or first character
> +of a two character ender, is directly proceded by one or more escape
                                         ^^^^^^^^
"preceded", I guess?
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Alan Mackenzie @ 2020-11-22 17:03 UTC
To: Eli Zaretskii
Cc: 43558, mattiase, monnier

Hello, Eli.

On Sun, Nov 22, 2020 at 17:35:24 +0200, Eli Zaretskii wrote:
> > Date: Sun, 22 Nov 2020 13:12:31 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: 43558@debbugs.gnu.org,
> >  Mattias Engdegård <mattiase@acm.org>, acm@muc.de

> > +@samp{e} means that when @var{c}, a comment ender or first character
> > +of a two character ender, is directly proceded by one or more escape
>                                          ^^^^^^^^
> "preceded", I guess?

Er, yes. Thanks! I've corrected it.

--
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Michael Welsh Duggan @ 2020-09-24 18:52 UTC
To: Alan Mackenzie
Cc: 43558, Mattias Engdegård, Stefan Monnier

Alan Mackenzie <acm@muc.de> writes:

> Hello, Mattias.
>
> On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
> Since forever, but only in the CC Mode test suite. :-(
>
> I just tried it out with gcc, and it seems that \*/ does indeed end a
> block comment. But an escaped newline doesn't end a line comment,
> instead continuing it to the next line. So I got confused. Thanks for
> pointing out the mistake.
>
> It seems that as well as the existing variable
> comment-end-can-be-escaped, we need a new one, say
> line-comment-end-can-be-escaped, too. In C and C++ modes, these would
> be nil and t respectively.

But where does it say that backslashes escape */ in C++?

The C++ 14 standard (and it hasn't changed through C++ 20) says:

  2.7 Comments [lex.comment]

  The characters /* start a comment, which terminates with the
  characters */. These comments do not nest. The characters // start
  a comment, which terminates immediately before the next new-line
  character.

  If there is a form-feed or a vertical-tab character in such a
  comment, only white-space characters shall appear between it and
  the new-line that terminates the comment; no diagnostic is
  required.

  [ Note: The comment characters //, /*, and */ have no special
  meaning within a // comment and are treated just like other
  characters. Similarly, the comment characters // and /* have no
  special meaning within a /* comment. — end note ]

--
Michael Welsh Duggan
(md5i@md5i.com)
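The rules in the quoted standard text fit in a small scanner. The C sketch below uses a hypothetical helper, `strip_comments`, that replaces each comment with one space, as C and C++ do in translation phase 3. It assumes backslash-newline splicing has already been done and, for brevity, does not skip string or character literals, which a real preprocessor must. Nothing in it treats a backslash specially: a block comment ends at the first star-slash, `/*` inside a comment does not nest, and a line comment stops before the newline.

```c
#include <stddef.h>

// Hypothetical helper (not part of Emacs or any library): copies `in` to
// `out`, replacing each comment with a single space, per the quoted rules.
// Assumes backslash-newline splicing has already been done and, for brevity,
// does not special-case string or character literals.
// `out` needs room for strlen(in) + 1 bytes.
void strip_comments(const char *in, char *out)
{
    size_t i = 0, o = 0;
    while (in[i] != '\0') {
        if (in[i] == '/' && in[i + 1] == '*') {
            // Block comment: an inner slash-star means nothing (no nesting),
            // and a backslash before the closing star-slash does not escape it.
            i += 2;
            while (in[i] != '\0' && !(in[i] == '*' && in[i + 1] == '/'))
                i++;
            if (in[i] != '\0')
                i += 2;              // skip the closing star-slash
            out[o++] = ' ';
        } else if (in[i] == '/' && in[i + 1] == '/') {
            // Line comment: ends just before the next newline, which is kept.
            while (in[i] != '\0' && in[i] != '\n')
                i++;
            out[o++] = ' ';
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

Fed Lars's case (`a`, then `/*`, a newline, `\*/` and `b`), it produces `a b`: the comment ends at the backslashed closer, exactly as the compiler sees it.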
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Alan Mackenzie @ 2020-09-24 19:57 UTC
To: Michael Welsh Duggan
Cc: 43558, Mattias Engdegård, Stefan Monnier

Hello, Michael.

On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?

> > Since forever, but only in the CC Mode test suite. :-(

> > I just tried it out with gcc, and it seems that \*/ does indeed end a
> > block comment. But an escaped newline doesn't end a line comment,
> > instead continuing it to the next line. So I got confused. Thanks for
> > pointing out the mistake.

> > It seems that as well as the existing variable
> > comment-end-can-be-escaped, we need a new one, say
> > line-comment-end-can-be-escaped, too. In C and C++ modes, these would
> > be nil and t respectively.

> But where does it say that backslashes escape */ in C++?

Nowhere. :-(

There has been a test in the CC Mode test suite for many years which
assumed this (but was disabled for existing (X)Emacs versions, waiting
for a new Emacs version to be "fixed").

> The C++ 14 standard (and it hasn't changed through C++ 20) says:

>   2.7 Comments [lex.comment]

>   The characters /* start a comment, which terminates with the
>   characters */. These comments do not nest. The characters // start
>   a comment, which terminates immediately before the next new-line
>   character.

For all the difference it makes, Emacs assumes the comment ends _after_
the NL.

>   If there is a form-feed or a vertical-tab character in such a
>   comment, only white-space characters shall appear between it and
>   the new-line that terminates the comment; no diagnostic is
>   required.

I didn't know that. Emacs/CC Mode doesn't code up this subtlety. It
probably isn't worth bothering about.

>   [ Note: The comment characters //, /*, and */ have no special
>   meaning within a // comment and are treated just like other
>   characters. Similarly, the comment characters // and /* have no
>   special meaning within a /* comment. — end note ]

Additionally, an escaped newline continues a comment onto the next line.
This happens, notionally, at a very early stage of compilation where a
backslash followed by NL anywhere gets replaced by a space. I think that
even two backslashes followed by NL would get replaced by backslash,
space.

> --
> Michael Welsh Duggan
> (md5i@md5i.com)

--
Alan Mackenzie (Nuremberg, Germany).
* bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.

From: Michael Welsh Duggan @ 2020-09-24 20:27 UTC
To: Alan Mackenzie
Cc: Michael Welsh Duggan, Mattias Engdegård, 43558@debbugs.gnu.org, Stefan Monnier

Alan Mackenzie <acm@muc.de> writes:

> Hello, Michael.
>
> On Thu, Sep 24, 2020 at 14:52:16 -0400, Michael Welsh Duggan wrote:
>> Alan Mackenzie <acm@muc.de> writes:
>
>> > On Wed, Sep 23, 2020 at 11:01:59 +0200, Mattias Engdegård wrote:
>> >> Sorry if I misunderstood, but since when do backslashes escape */ in C?
>
>> > Since forever, but only in the CC Mode test suite. :-(
>
>> > I just tried it out with gcc, and it seems that \*/ does indeed end a
>> > block comment. But an escaped newline doesn't end a line comment,
>> > instead continuing it to the next line. So I got confused. Thanks for
>> > pointing out the mistake.
>
>> > It seems that as well as the existing variable
>> > comment-end-can-be-escaped, we need a new one, say
>> > line-comment-end-can-be-escaped, too. In C and C++ modes, these would
>> > be nil and t respectively.
>
>> But where does it say that backslashes escape */ in C++?
>
> Nowhere. :-(
>
> There has been a test in the CC Mode test suite for many years which
> assumed this (but was disabled for existing (X)Emacs versions, waiting
> for a new Emacs version to be "fixed").
>
>> The C++ 14 standard (and it hasn't changed through C++ 20) says:
>
>>   2.7 Comments [lex.comment]
>
>>   The characters /* start a comment, which terminates with the
>>   characters */. These comments do not nest. The characters // start
>>   a comment, which terminates immediately before the next new-line
>>   character.
>
> For all the difference it makes, Emacs assumes the comment ends _after_
> the NL.
>
>>   If there is a form-feed or a vertical-tab character in such a
>>   comment, only white-space characters shall appear between it and
>>   the new-line that terminates the comment; no diagnostic is
>>   required.
>
> I didn't know that. Emacs/CC Mode doesn't code up this subtlety. It
> probably isn't worth bothering about.
>
>>   [ Note: The comment characters //, /*, and */ have no special
>>   meaning within a // comment and are treated just like other
>>   characters. Similarly, the comment characters // and /* have no
>>   special meaning within a /* comment. — end note ]
>
> Additionally, an escaped newline continues a comment onto the next line.
> This happens, notionally, at a very early stage of compilation where a
> backslash followed by NL anywhere gets replaced by a space. I think that
> even two backslashes followed by NL would get replaced by backslash,
> space.

Almost. A backslash followed by a newline is elided completely, joining
the lines. (Not replaced by a space.) Otherwise, I concur.

--
Michael Welsh Duggan
(mwd@cert.org)
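Michael's correction describes translation phase 2 of C and C++: each backslash immediately followed by a newline is deleted, and the two physical lines become one, with no space put in their place. A minimal sketch of that step, using a hypothetical `splice_lines` helper and assuming plain `\n` line endings (a real implementation also has to cope with other line-ending conventions):

```c
#include <stddef.h>

// Hypothetical helper sketching translation phase 2: every backslash that is
// immediately followed by a newline is deleted together with that newline,
// joining the two physical lines.  Nothing is inserted in their place.
// Assumes '\n' line endings; `out` needs room for strlen(in) + 1 bytes.
void splice_lines(const char *in, char *out)
{
    size_t i = 0, o = 0;
    while (in[i] != '\0') {
        if (in[i] == '\\' && in[i + 1] == '\n')
            i += 2;                  // drop the backslash and the newline
        else
            out[o++] = in[i++];
    }
    out[o] = '\0';
}
```

Applied to `// note` followed by a backslash, a newline and `x = 1;`, the result is the single line `// notex = 1;`, so `x = 1;` becomes part of the line comment. With two backslashes before the newline, the first is kept as an ordinary character and the second is spliced away together with the newline, so the lines are still joined, with one backslash and no space between them.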
Thread overview: 36+ messages

2020-09-22 9:35 bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped Alan Mackenzie
2020-09-22 14:09 Stefan Monnier
2020-09-22 19:41 Alan Mackenzie
[not found] <handler.43558.B.160076736116422.ack@debbugs.gnu.org>
2020-09-23 8:57 Alan Mackenzie
2020-09-23 9:01 Mattias Engdegård
2020-09-23 14:48 Alan Mackenzie
2020-09-23 18:44 Stefan Monnier
2020-09-23 19:44 Alan Mackenzie
2020-09-23 20:02 Stefan Monnier
2020-09-24 10:20 Alan Mackenzie
2020-09-24 16:56 Stefan Monnier
2020-09-24 18:50 Alan Mackenzie
2020-09-24 22:43 Stefan Monnier
2020-11-19 21:18 Alan Mackenzie
2020-11-19 22:47 Stefan Monnier
2020-11-22 13:12 Alan Mackenzie
2020-11-22 15:20 Stefan Monnier
2020-11-22 17:08 Alan Mackenzie
2020-11-22 17:46 Dmitry Gutov
2020-11-22 18:19 Alan Mackenzie
2020-11-22 20:39 Dmitry Gutov
2020-11-22 21:13 Alan Mackenzie
2020-11-22 21:34 Dmitry Gutov
2020-11-22 22:01 Alan Mackenzie
2020-11-22 23:00 Stefan Monnier
2021-05-13 10:38 Lars Ingebrigtsen
2021-05-13 14:51 Alan Mackenzie
2021-05-16 13:53 Lars Ingebrigtsen
2022-04-28 11:17 Lars Ingebrigtsen
2022-04-28 18:52 Alan Mackenzie
2020-11-22 23:10 Stefan Monnier
2020-11-22 15:35 Eli Zaretskii
2020-11-22 17:03 Alan Mackenzie
2020-09-24 18:52 Michael Welsh Duggan
2020-09-24 19:57 Alan Mackenzie
2020-09-24 20:27 Michael Welsh Duggan
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).