* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE @ 2023-04-19 16:40 Eli Zaretskii 2023-04-21 20:37 ` Yuan Fu 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2023-04-19 16:40 UTC (permalink / raw) To: 62951 To reproduce: emacs -Q C-x C-f src/fns.c RET C-u 3365 M-g g Observe that "if" and "STRINGP" in the body of FOR_EACH_TAIL_SAFE are fontified in font-lock-function-name-face. This is because the FOR_EACH_TAIL_SAFE macro is misparsed by tree-sitter as a declaration. Can we teach c-ts-mode to recognize FOR_EACH_TAIL_SAFE and FOR_EACH_TAIL for what they are, perhaps conditioned on c-ts-mode-emacs-sources-support being non-nil? In GNU Emacs 29.0.90 (build 25, i686-pc-mingw32) of 2023-04-17 built on HOME-C4E4A596F7 Repository revision: 2f59595f5f4e86a791c425a6f9e2c1594d6813e4 Repository branch: emacs-29 Windowing system distributor 'Microsoft Corp.', version 5.1.2600 System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600) Configured using: 'configure -C --prefix=/d/usr --with-wide-int --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3'' Configured features: ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB Important settings: value of $LANG: ENU locale-coding-system: cp1255 Major mode: C/* Minor modes in effect: tooltip-mode: t global-eldoc-mode: t show-paren-mode: t electric-indent-mode: t mouse-wheel-mode: t tool-bar-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t line-number-mode: t indent-tabs-mode: t transient-mark-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t Load-path shadows: None found. Features: (shadow sort mail-extr emacsbug message mailcap yank-media puny dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068 epg-config gnus-util text-property-search mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils time-date subr-x pp wid-edit descr-text help-mode c-ts-mode c-ts-common treesit cl-seq vc-git diff-mode easy-mmode vc vc-dispatcher bug-reference byte-opt gv bytecomp byte-compile cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs cl-loaddefs cl-lib rmc iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic indonesian philippine cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty make-network-process emacs) Memory information: ((conses 16 100876 13128) (symbols 48 10122 0) (strings 16 31511 2068) (string-bytes 1 995836) (vectors 16 17200) (vector-slots 8 233602 14782) (floats 8 43 173) (intervals 40 2885 151) (buffers 888 16)) ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-19 16:40 bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE Eli Zaretskii @ 2023-04-21 20:37 ` Yuan Fu 2023-04-22 7:17 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Yuan Fu @ 2023-04-21 20:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 62951 Eli Zaretskii <eliz@gnu.org> writes: > To reproduce: > > emacs -Q > C-x C-f src/fns.c RET > C-u 3365 M-g g > > Observe that "if" and "STRINGP" in the body of FOR_EACH_TAIL_SAFE are > fontified in font-lock-function-name-face. This is because the > FOR_EACH_TAIL_SAFE macro is misparsed by tree-sitter as a declaration. > > Can we teach c-ts-mode to recognize FOR_EACH_TAIL_SAFE and > FOR_EACH_TAIL for what they are, perhaps conditioned on > c-ts-mode-emacs-sources-support being non-nil? I’m aware of this issue, but the truth is there isn’t a good solution to it. We need to recognize FOR_EACH_TAIL_SAFE (not hard) and fix arbitrary code after it (hard). In this case it’s a if statement, with macro calls and AND operation in it’s condition, it’s already three things we need to recognize and somehow handle. It can also be a for loop, a switch case, a function call, a while loop. If we want to fix FOR_EACH_TAIL we would need to handle every possible thing, at that point we might as well have wrote a parser :-) We can probably fix this very particular case, but it’s still work and overhead, and doesn’t mean much. Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-21 20:37 ` Yuan Fu @ 2023-04-22 7:17 ` Eli Zaretskii 2023-04-23 0:28 ` Yuan Fu 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2023-04-22 7:17 UTC (permalink / raw) To: Yuan Fu; +Cc: 62951 > From: Yuan Fu <casouri@gmail.com> > Date: Fri, 21 Apr 2023 13:37:08 -0700 > Cc: 62951@debbugs.gnu.org > > > Eli Zaretskii <eliz@gnu.org> writes: > > > To reproduce: > > > > emacs -Q > > C-x C-f src/fns.c RET > > C-u 3365 M-g g > > > > Observe that "if" and "STRINGP" in the body of FOR_EACH_TAIL_SAFE are > > fontified in font-lock-function-name-face. This is because the > > FOR_EACH_TAIL_SAFE macro is misparsed by tree-sitter as a declaration. > > > > Can we teach c-ts-mode to recognize FOR_EACH_TAIL_SAFE and > > FOR_EACH_TAIL for what they are, perhaps conditioned on > > c-ts-mode-emacs-sources-support being non-nil? > > I’m aware of this issue, but the truth is there isn’t a good solution to > it. We need to recognize FOR_EACH_TAIL_SAFE (not hard) and fix arbitrary > code after it (hard). In this case it’s a if statement, with macro calls > and AND operation in it’s condition, it’s already three things we need > to recognize and somehow handle. It can also be a for loop, a switch > case, a function call, a while loop. If we want to fix FOR_EACH_TAIL we > would need to handle every possible thing, at that point we might as > well have wrote a parser :-) Sorry, I don't understand the difficulties. The body of FOR_EACH_TAIL and a few similar macros we use could be on of the following: . a single simple statement . an 'if' clause . a 'while' loop . a 'do' loop . a 'for' loop . a brace-delimited block (this one already works, AFAICS, so we perhaps need not anything about it) (In practice, only the first 2 and the last one are used, AFAICS.) Doesn't tree-sitter tell us enough to figure out which of the above is in the body? If so, why would we need to write a full parser? > We can probably fix this very particular case, but it’s still work and > overhead, and doesn’t mean much. Please understand: good support for editing Emacs C sources is from my POV imperative for c-ts-mode to gain traction. One of my gripes about CC Mode was insufficient support for our macro system and for various GCC attributes; that improved recently to some extend, but not enough, and at a price of introducing ugly lists of strings that cause trouble when used in file-local variables. We must do better in c-ts-mode! So please make an effort of providing reasonable built-in solutions for these idiosyncrasies of the Emacs C sources, conditioned on the new variable c-ts-mode-emacs-sources-support, at least for those of them that are used widely enough. It is very important. TIA ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-22 7:17 ` Eli Zaretskii @ 2023-04-23 0:28 ` Yuan Fu 2023-04-23 6:25 ` Eli Zaretskii 2023-04-23 21:04 ` Dmitry Gutov 0 siblings, 2 replies; 15+ messages in thread From: Yuan Fu @ 2023-04-23 0:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 62951 > On Apr 22, 2023, at 12:17 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Fri, 21 Apr 2023 13:37:08 -0700 >> Cc: 62951@debbugs.gnu.org >> >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >>> To reproduce: >>> >>> emacs -Q >>> C-x C-f src/fns.c RET >>> C-u 3365 M-g g >>> >>> Observe that "if" and "STRINGP" in the body of FOR_EACH_TAIL_SAFE are >>> fontified in font-lock-function-name-face. This is because the >>> FOR_EACH_TAIL_SAFE macro is misparsed by tree-sitter as a declaration. >>> >>> Can we teach c-ts-mode to recognize FOR_EACH_TAIL_SAFE and >>> FOR_EACH_TAIL for what they are, perhaps conditioned on >>> c-ts-mode-emacs-sources-support being non-nil? >> >> I’m aware of this issue, but the truth is there isn’t a good solution to >> it. We need to recognize FOR_EACH_TAIL_SAFE (not hard) and fix arbitrary >> code after it (hard). In this case it’s a if statement, with macro calls >> and AND operation in it’s condition, it’s already three things we need >> to recognize and somehow handle. It can also be a for loop, a switch >> case, a function call, a while loop. If we want to fix FOR_EACH_TAIL we >> would need to handle every possible thing, at that point we might as >> well have wrote a parser :-) > > Sorry, I don't understand the difficulties. The body of FOR_EACH_TAIL > and a few similar macros we use could be on of the following: > > . a single simple statement > . an 'if' clause > . a 'while' loop > . a 'do' loop > . a 'for' loop > . a brace-delimited block (this one already works, AFAICS, so we > perhaps need not anything about it) > > (In practice, only the first 2 and the last one are used, AFAICS.) > > Doesn't tree-sitter tell us enough to figure out which of the above is > in the body? If so, why would we need to write a full parser? Ok, the full parser part is a bit exaggeration :-) But my point is it’s a lot of dirty code for a subpar result. Take if (NILP (XCDR (tail)) && STRINGP (XCAR (tail))) for example, it’s parsed as a function definition and all the tokens in the condition are parsed as weird things like macro declarator, error, function declarator, type, etc. And if the condition changes to something else, say it has another layer of function call, it’ll parse differently. > >> We can probably fix this very particular case, but it’s still work and >> overhead, and doesn’t mean much. > > Please understand: good support for editing Emacs C sources is from my > POV imperative for c-ts-mode to gain traction. One of my gripes about > CC Mode was insufficient support for our macro system and for various > GCC attributes; that improved recently to some extend, but not enough, > and at a price of introducing ugly lists of strings that cause trouble > when used in file-local variables. We must do better in c-ts-mode! > > So please make an effort of providing reasonable built-in solutions > for these idiosyncrasies of the Emacs C sources, conditioned on the > new variable c-ts-mode-emacs-sources-support, at least for those of > them that are used widely enough. It is very important. What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) If we can fix the parser, we don’t need to retrofit hacks onto font-lock, indent, etc, separately, and it truly fixes the problem. The downside is compiling from grammar source to grammar.c needs rust and node tools. But I guess depending on the grammar maintained by tree-sitter’s author isn’t too much different from depending on the grammar maintained by another individual (ie, me)? Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-23 0:28 ` Yuan Fu @ 2023-04-23 6:25 ` Eli Zaretskii 2023-04-24 7:02 ` Yuan Fu 2023-04-23 21:04 ` Dmitry Gutov 1 sibling, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2023-04-23 6:25 UTC (permalink / raw) To: Yuan Fu; +Cc: 62951 > From: Yuan Fu <casouri@gmail.com> > Date: Sat, 22 Apr 2023 17:28:25 -0700 > Cc: 62951@debbugs.gnu.org > > >> I’m aware of this issue, but the truth is there isn’t a good solution to > >> it. We need to recognize FOR_EACH_TAIL_SAFE (not hard) and fix arbitrary > >> code after it (hard). In this case it’s a if statement, with macro calls > >> and AND operation in it’s condition, it’s already three things we need > >> to recognize and somehow handle. It can also be a for loop, a switch > >> case, a function call, a while loop. If we want to fix FOR_EACH_TAIL we > >> would need to handle every possible thing, at that point we might as > >> well have wrote a parser :-) > > > > Sorry, I don't understand the difficulties. The body of FOR_EACH_TAIL > > and a few similar macros we use could be on of the following: > > > > . a single simple statement > > . an 'if' clause > > . a 'while' loop > > . a 'do' loop > > . a 'for' loop > > . a brace-delimited block (this one already works, AFAICS, so we > > perhaps need not anything about it) > > > > (In practice, only the first 2 and the last one are used, AFAICS.) > > > > Doesn't tree-sitter tell us enough to figure out which of the above is > > in the body? If so, why would we need to write a full parser? > > Ok, the full parser part is a bit exaggeration :-) But my point is it’s a lot of dirty code for a subpar result. Take > > if (NILP (XCDR (tail)) && STRINGP (XCAR (tail))) > > for example, it’s parsed as a function definition and all the tokens in the condition are parsed as weird things like macro declarator, error, function declarator, type, etc. And if the condition changes to something else, say it has another layer of function call, it’ll parse differently. But the top-level construct is 'if', no? Isn't that enough? Also, can we detect the FOR_EACH_TAIL etc. macros themselves, and then treat their body specially? > > Please understand: good support for editing Emacs C sources is from my > > POV imperative for c-ts-mode to gain traction. One of my gripes about > > CC Mode was insufficient support for our macro system and for various > > GCC attributes; that improved recently to some extend, but not enough, > > and at a price of introducing ugly lists of strings that cause trouble > > when used in file-local variables. We must do better in c-ts-mode! > > > > So please make an effort of providing reasonable built-in solutions > > for these idiosyncrasies of the Emacs C sources, conditioned on the > > new variable c-ts-mode-emacs-sources-support, at least for those of > > them that are used widely enough. It is very important. > > What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) This goes against the purpose of using tree-sitter and its grammars. I don't think we should maintain and develop language grammars, especially since the production of the C sources from them needs non-trivial system resources and additional tools. Maybe coming up with a way of extending the C grammar in some more general way, and then submitting the changes to the tree-sitter-c developers for inclusion would be OK. But I very much hope that we could support these macros at a lower cost, by some tailored Lisp. Please give it a try. Even if the result works only for the cases we actually use in our sources, it might be "good enough" for us. ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-23 6:25 ` Eli Zaretskii @ 2023-04-24 7:02 ` Yuan Fu 0 siblings, 0 replies; 15+ messages in thread From: Yuan Fu @ 2023-04-24 7:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 62951 > On Apr 22, 2023, at 11:25 PM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Sat, 22 Apr 2023 17:28:25 -0700 >> Cc: 62951@debbugs.gnu.org >> >>>> I’m aware of this issue, but the truth is there isn’t a good solution to >>>> it. We need to recognize FOR_EACH_TAIL_SAFE (not hard) and fix arbitrary >>>> code after it (hard). In this case it’s a if statement, with macro calls >>>> and AND operation in it’s condition, it’s already three things we need >>>> to recognize and somehow handle. It can also be a for loop, a switch >>>> case, a function call, a while loop. If we want to fix FOR_EACH_TAIL we >>>> would need to handle every possible thing, at that point we might as >>>> well have wrote a parser :-) >>> >>> Sorry, I don't understand the difficulties. The body of FOR_EACH_TAIL >>> and a few similar macros we use could be on of the following: >>> >>> . a single simple statement >>> . an 'if' clause >>> . a 'while' loop >>> . a 'do' loop >>> . a 'for' loop >>> . a brace-delimited block (this one already works, AFAICS, so we >>> perhaps need not anything about it) >>> >>> (In practice, only the first 2 and the last one are used, AFAICS.) >>> >>> Doesn't tree-sitter tell us enough to figure out which of the above is >>> in the body? If so, why would we need to write a full parser? >> >> Ok, the full parser part is a bit exaggeration :-) But my point is it’s a lot of dirty code for a subpar result. Take >> >> if (NILP (XCDR (tail)) && STRINGP (XCAR (tail))) >> >> for example, it’s parsed as a function definition and all the tokens in the condition are parsed as weird things like macro declarator, error, function declarator, type, etc. And if the condition changes to something else, say it has another layer of function call, it’ll parse differently. > > But the top-level construct is 'if', no? Isn't that enough? > > Also, can we detect the FOR_EACH_TAIL etc. macros themselves, and then > treat their body specially? > >>> Please understand: good support for editing Emacs C sources is from my >>> POV imperative for c-ts-mode to gain traction. One of my gripes about >>> CC Mode was insufficient support for our macro system and for various >>> GCC attributes; that improved recently to some extend, but not enough, >>> and at a price of introducing ugly lists of strings that cause trouble >>> when used in file-local variables. We must do better in c-ts-mode! >>> >>> So please make an effort of providing reasonable built-in solutions >>> for these idiosyncrasies of the Emacs C sources, conditioned on the >>> new variable c-ts-mode-emacs-sources-support, at least for those of >>> them that are used widely enough. It is very important. >> >> What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) > > This goes against the purpose of using tree-sitter and its grammars. > I don't think we should maintain and develop language grammars, > especially since the production of the C sources from them needs > non-trivial system resources and additional tools. > > Maybe coming up with a way of extending the C grammar in some more > general way, and then submitting the changes to the tree-sitter-c > developers for inclusion would be OK. > > But I very much hope that we could support these macros at a lower > cost, by some tailored Lisp. Please give it a try. Even if the > result works only for the cases we actually use in our sources, it > might be "good enough" for us. Fair enough, I’ll try :-) Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-23 0:28 ` Yuan Fu 2023-04-23 6:25 ` Eli Zaretskii @ 2023-04-23 21:04 ` Dmitry Gutov 2023-04-26 22:19 ` Yuan Fu 1 sibling, 1 reply; 15+ messages in thread From: Dmitry Gutov @ 2023-04-23 21:04 UTC (permalink / raw) To: Yuan Fu, Eli Zaretskii; +Cc: 62951 On 23/04/2023 03:28, Yuan Fu wrote: > What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) If we can fix the parser, we don’t need to retrofit hacks onto font-lock, indent, etc, separately, and it truly fixes the problem. The downside is compiling from grammar source to grammar.c needs rust and node tools. But I guess depending on the grammar maintained by tree-sitter’s author isn’t too much different from depending on the grammar maintained by another individual (ie, me)? We had also talked at some point about replacing the actual text that the parser sees with something else. If this can be done in a straightforward way (with tracking the subsequent correspondence of "real" text back to the buffer for syntax highlighting), that might be the perfect solution: we'd have a defcustom which would hold a list of macros used in the current codebase in the form of templates, and we'd set a bunch of them in emacs/.dir-locals.el. I'm not sure how difficult this is to implement and maintain, but it's probably going to be less work to maintain than a fork of the grammar. ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-23 21:04 ` Dmitry Gutov @ 2023-04-26 22:19 ` Yuan Fu 2023-04-27 3:14 ` Yuan Fu 2023-04-27 8:57 ` Dmitry Gutov 0 siblings, 2 replies; 15+ messages in thread From: Yuan Fu @ 2023-04-26 22:19 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, 62951 > On Apr 23, 2023, at 2:04 PM, Dmitry Gutov <dmitry@gutov.dev> wrote: > > On 23/04/2023 03:28, Yuan Fu wrote: >> What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) If we can fix the parser, we don’t need to retrofit hacks onto font-lock, indent, etc, separately, and it truly fixes the problem. The downside is compiling from grammar source to grammar.c needs rust and node tools. But I guess depending on the grammar maintained by tree-sitter’s author isn’t too much different from depending on the grammar maintained by another individual (ie, me)? > > We had also talked at some point about replacing the actual text that the parser sees with something else. > > If this can be done in a straightforward way (with tracking the subsequent correspondence of "real" text back to the buffer for syntax highlighting), that might be the perfect solution: we'd have a defcustom which would hold a list of macros used in the current codebase in the form of templates, and we'd set a bunch of them in emacs/.dir-locals.el. > > I'm not sure how difficult this is to implement and maintain, but it's probably going to be less work to maintain than a fork of the grammar. Sounds to me a bit difficult to write. Eg, translating between tree-sitter position and buffer position efficiently isn’t too easy. Now plus narrowing, and what if the narrowing boundary is in the middle of a replace region? My idea right now is to use the range feature in tree-sitter. Since the “body” of FOR_EACH_TAIL is valid C, I can either set the ranges for the parser so it ignores FOR_EACH_TAIL, or I can add another parser that only parses the body of FOR_EACH_TAIL. Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-26 22:19 ` Yuan Fu @ 2023-04-27 3:14 ` Yuan Fu 2023-04-27 15:03 ` Eli Zaretskii 2023-04-27 8:57 ` Dmitry Gutov 1 sibling, 1 reply; 15+ messages in thread From: Yuan Fu @ 2023-04-27 3:14 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, 62951 [-- Attachment #1: Type: text/plain, Size: 444 bytes --] > > My idea right now is to use the range feature in tree-sitter. Since the “body” of FOR_EACH_TAIL is valid C, I can either set the ranges for the parser so it ignores FOR_EACH_TAIL I end up going this route. > or I can add another parser that only parses the body of FOR_EACH_TAIL. Ok, here’s the patch. Eli, would you give it a try? This is a sizable patch so I’m not sure if you want it on emacs-29 or master. Yuan [-- Attachment #2: for-each-tail-fix.patch --] [-- Type: application/octet-stream, Size: 8003 bytes --] From 6d9a9a4ce88f996d87fd121aaf6bfa21581fac30 Mon Sep 17 00:00:00 2001 From: Yuan Fu <casouri@gmail.com> Date: Wed, 26 Apr 2023 20:09:42 -0700 Subject: [PATCH] Fix FOR_EACH_TAIL in c-ts-mode (bug#62951) * lisp/progmodes/c-ts-mode.el (c-ts-mode--indent-styles): New indent rule. * lisp/progmodes/c-ts-mode.el (c-ts-mode--for-each-tail-body-matcher): (c-ts-mode--emacs-c-range-query) (c-ts-mode--for-each-tail-ranges) (c-ts-mode--reverse-ranges) (c-ts-mode--emacs-set-ranges): New functions and variables. (c-ts-mode): Create a emacs-c parser. More setup for Emacs source support. * lisp/treesit.el (treesit-query-range): Ignore underscore-prefixed capture names. --- lisp/progmodes/c-ts-mode.el | 92 ++++++++++++++++++++++++++++++++++++- lisp/treesit.el | 17 +++++-- 2 files changed, 103 insertions(+), 6 deletions(-) diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el index 6100f00e3ba..c4e1ba7a23a 100644 --- a/lisp/progmodes/c-ts-mode.el +++ b/lisp/progmodes/c-ts-mode.el @@ -357,7 +357,9 @@ c-ts-mode--indent-styles "Indent rules supported by `c-ts-mode'. MODE is either `c' or `cpp'." (let ((common - `(((parent-is "translation_unit") column-0 0) + `((c-ts-mode--for-each-tail-body-matcher prev-line c-ts-mode-indent-offset) + + ((parent-is "translation_unit") column-0 0) ((query "(ERROR (ERROR)) @indent") column-0 0) ((node-is ")") parent 1) ((node-is "]") parent-bol 0) @@ -969,6 +971,75 @@ c-ts-mode--emacs-current-defun-name (or (treesit-add-log-current-defun) (c-ts-mode--defun-name (c-ts-mode--emacs-defun-at-point)))) +;;; FOR_EACH_TAIL fix +;; +;; FOR_EACH_TAIL (and FOR_EACH_TAIL_SAFE) followed by a unbracketed +;; body will mess up the parser, which parses the thing as a function +;; declaration. We "fix" it by adding a shadow parser, emacs-c (which +;; is just c but under a different name). We use emacs-c to find each +;; FOR_EACH_TAIL with a unbracketed body, and set the ranges of the C +;; parser so that it skips those FOR_EACH_TAIL's. Note that we only +;; ignore FOR_EACH_TAIL's with a unbracketed body. Those with a +;; bracketed body parses more or less fine. + +(defun c-ts-mode--for-each-tail-body-matcher (_n _p bol &rest _) + "A matcher that matches the first line after a FOR_EACH_TAIL. +For BOL see `treesit-simple-indent-rules'." + (when c-ts-mode-emacs-sources-support + (save-excursion + (goto-char bol) + (forward-line -1) + (skip-chars-forward " \t") + (looking-at (rx "FOR_EACH_TAIL" (? (or "_SAFE" "_ALIST_VALUE"))))))) + +(defvar c-ts-mode--emacs-c-range-query + (treesit-query-compile + 'emacs-c `(((declaration + type: (macro_type_specifier + name: (identifier) @_name) + @for-each-tail) + (:match ,(rx "FOR_EACH_TAIL" + (? (or "_SAFE" "_ALIST_VALUE"))) + @_name)))) + "Query that finds the FOR_EACH_TAIL with a unbracketed body.") + +(defvar-local c-ts-mode--for-each-tail-ranges nil + "Ranges covering all the FOR_EACH_TAIL's in the buffer.") + +(defun c-ts-mode--reverse-ranges (ranges beg end) + "Reverse RANGES and return the new ranges between BEG and END. +Positions that were included RANGES are not in the returned +ranges, and vice versa. + +Return nil if RANGES is nil. This way, passing the returned +ranges to `treesit-parser-set-included-ranges' will make the +parser parse the whole buffer." + (if (null ranges) + nil + (let ((new-ranges nil) + (prev-end beg)) + (dolist (range ranges) + (push (cons prev-end (car range)) new-ranges) + (setq prev-end (cdr range))) + (push (cons prev-end end) new-ranges) + (nreverse new-ranges)))) + +(defun c-ts-mode--emacs-set-ranges (beg end) + "Set ranges for the C parser to skip some FOR_EACH_TAIL's. +BEG and END are described in `treesit-range-rules'." + (let* ((c-parser (treesit-parser-create 'c)) + (old-ranges c-ts-mode--for-each-tail-ranges) + (new-ranges (treesit-query-range + 'emacs-c c-ts-mode--emacs-c-range-query beg end)) + (set-ranges (treesit--clip-ranges + (treesit--merge-ranges + old-ranges new-ranges beg end) + (point-min) (point-max))) + (reversed-ranges (c-ts-mode--reverse-ranges + set-ranges (point-min) (point-max)))) + (setq-local c-ts-mode--for-each-tail-ranges set-ranges) + (treesit-parser-set-included-ranges c-parser reversed-ranges))) + ;;; Modes (defvar-keymap c-ts-base-mode-map @@ -1072,6 +1143,13 @@ c-ts-mode :after-hook (c-ts-mode-set-modeline) (when (treesit-ready-p 'c) + ;; If Emacs source support is enabled, make sure emacs-c parser is + ;; after c parser in the parser list. This way various tree-sitter + ;; functions will automatically use the c parser rather than the + ;; emacs-c parser. + (when c-ts-mode-emacs-sources-support + (treesit-parser-create 'emacs-c)) + (treesit-parser-create 'c) ;; Comments. (setq-local comment-start "/* ") @@ -1087,7 +1165,17 @@ c-ts-mode (when c-ts-mode-emacs-sources-support (setq-local add-log-current-defun-function - #'c-ts-mode--emacs-current-defun-name)))) + #'c-ts-mode--emacs-current-defun-name) + + (setq-local treesit-range-settings + (treesit-range-rules 'c-ts-mode--emacs-set-ranges)) + + (setq-local treesit-language-at-point-function (lambda (_pos) 'c)) + + ;; Add a fake "emacs-c" language which is just C. Used for + ;; skipping FOR_EACH_TAIL, see `c-ts-mode--emacs-set-ranges'. + (setf (alist-get 'emacs-c treesit-load-name-override-list) + '("libtree-sitter-c" "tree_sitter_c"))))) ;;;###autoload (define-derived-mode c++-ts-mode c-ts-base-mode "C++" diff --git a/lisp/treesit.el b/lisp/treesit.el index e718ea1a23a..1d4749c8cd2 100644 --- a/lisp/treesit.el +++ b/lisp/treesit.el @@ -378,13 +378,16 @@ treesit-query-string (defun treesit-query-range (node query &optional beg end) "Query the current buffer and return ranges of captured nodes. -QUERY, NODE, BEG, END are the same as in -`treesit-query-capture'. This function returns a list -of (START . END), where START and END specifics the range of each -captured node. Capture names don't matter." +QUERY, NODE, BEG, END are the same as in `treesit-query-capture'. +This function returns a list of (START . END), where START and +END specifics the range of each captured node. Capture names +generally don't matter, but names that starts with an underscore +are ignored." (cl-loop for capture in (treesit-query-capture node query beg end) + for name = (car capture) for node = (cdr capture) + if (not (string-prefix-p "_" (symbol-name name))) collect (cons (treesit-node-start node) (treesit-node-end node)))) @@ -399,6 +402,9 @@ treesit-range-settings range to the range spanned by captured nodes. QUERY must be a compiled query. +Capture names generally don't matter, but names that starts with +an underscore are ignored. + QUERY can also be a function, in which case it is called with 2 arguments, START and END. It should ensure parsers' ranges are correct in the region between START and END. @@ -418,6 +424,9 @@ treesit-range-rules Each QUERY is a tree-sitter query in either the string, s-expression or compiled form. +Capture names generally don't matter, but names that starts with +an underscore are ignored. + For each QUERY, :KEYWORD and VALUE pairs add meta information to it. For example, -- 2.33.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-27 3:14 ` Yuan Fu @ 2023-04-27 15:03 ` Eli Zaretskii 2023-04-27 19:56 ` Yuan Fu 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2023-04-27 15:03 UTC (permalink / raw) To: Yuan Fu; +Cc: dmitry, 62951 > From: Yuan Fu <casouri@gmail.com> > Date: Wed, 26 Apr 2023 20:14:45 -0700 > Cc: Eli Zaretskii <eliz@gnu.org>, > 62951@debbugs.gnu.org > > Ok, here’s the patch. Eli, would you give it a try? It signals an error when I enable c-ts-mode: c-ts-mode: Cannot load language definition: not-found, ("libtree-sitter-emacs-c" "libtree-sitter-emacs-c.dll"), "No such file or directory" It looks like your "fake emacs-c language" trick somehow misfires? The value of treesit-load-name-override-list is nil, which is not what you intended, AFAICT? The only way I can make this work is by manually customizing treesit-load-name-override-list before loading c-ts-mode. Otherwise, looks quite good; here are some other problems I found: . some uses of FOR_EACH_TAIL are not fontified at all; examples: comp.c, line 2079, fns.c, line 189 . FOR_EACH_LIVE_BUFFER (data.c, line 1430) is not recognized? . FOR_EACH_FRAME (keyboard.c, line 1256) is not recognized? This is much better than before already, so I think you should install this on the emacs-29 branch, once you fix the above problems (assuming they are easily fixable). Thanks! ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-27 15:03 ` Eli Zaretskii @ 2023-04-27 19:56 ` Yuan Fu 2023-04-28 5:41 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Yuan Fu @ 2023-04-27 19:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: dmitry, 62951 > On Apr 27, 2023, at 8:03 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Wed, 26 Apr 2023 20:14:45 -0700 >> Cc: Eli Zaretskii <eliz@gnu.org>, >> 62951@debbugs.gnu.org >> >> Ok, here’s the patch. Eli, would you give it a try? > > It signals an error when I enable c-ts-mode: > > c-ts-mode: Cannot load language definition: not-found, ("libtree-sitter-emacs-c" "libtree-sitter-emacs-c.dll"), "No such file or directory" > > It looks like your "fake emacs-c language" trick somehow misfires? > The value of treesit-load-name-override-list is nil, which is not what > you intended, AFAICT? The only way I can make this work is by > manually customizing treesit-load-name-override-list before loading > c-ts-mode. Duh, sorry, dumb mistake. Fixed. > > Otherwise, looks quite good; here are some other problems I found: > > . some uses of FOR_EACH_TAIL are not fontified at all; examples: > comp.c, line 2079, fns.c, line 189 You mean the FOE_EACH_TAIL part isn’t fontified, or the body isn’t fontified? Because the body are always fontified here. FOR_EACH_TAIL itself shouldn’t be fontified since it’s just a macro call and a variable. > . FOR_EACH_LIVE_BUFFER (data.c, line 1430) is not recognized? > . FOR_EACH_FRAME (keyboard.c, line 1256) is not recognized? Didn’t know that they exited :-) Now I have FOR_EACH_TAIL, FOR_EACH_TAIL_SAFE, FOR_EACH_ALIST_VALUE, FOR_EACH_LIVE_BUFFER, FOR_EACH_FRAME. > > This is much better than before already, so I think you should install > this on the emacs-29 branch, once you fix the above problems (assuming > they are easily fixable). Cool, I pushed the change. Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-27 19:56 ` Yuan Fu @ 2023-04-28 5:41 ` Eli Zaretskii 2023-04-29 22:55 ` Yuan Fu 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2023-04-28 5:41 UTC (permalink / raw) To: Yuan Fu; +Cc: dmitry, 62951 > From: Yuan Fu <casouri@gmail.com> > Date: Thu, 27 Apr 2023 12:56:07 -0700 > Cc: dmitry@gutov.dev, > 62951@debbugs.gnu.org > > > c-ts-mode: Cannot load language definition: not-found, ("libtree-sitter-emacs-c" "libtree-sitter-emacs-c.dll"), "No such file or directory" > > > > It looks like your "fake emacs-c language" trick somehow misfires? > > The value of treesit-load-name-override-list is nil, which is not what > > you intended, AFAICT? The only way I can make this work is by > > manually customizing treesit-load-name-override-list before loading > > c-ts-mode. > > Duh, sorry, dumb mistake. Fixed. Thanks. > > Otherwise, looks quite good; here are some other problems I found: > > > > . some uses of FOR_EACH_TAIL are not fontified at all; examples: > > comp.c, line 2079, fns.c, line 189 > > You mean the FOE_EACH_TAIL part isn’t fontified, or the body isn’t fontified? Because the body are always fontified here. FOR_EACH_TAIL itself shouldn’t be fontified since it’s just a macro call and a variable. I mean the macro itself, FOR_EACH_TAIL. If it isn't supposed to be fontified, then why is it fontified at line 856 of comp.c? It's inconsistent. (However, this is a very minor problem, so if fixing it is difficult, we can leave this alone for now.) > > . FOR_EACH_LIVE_BUFFER (data.c, line 1430) is not recognized? > > . FOR_EACH_FRAME (keyboard.c, line 1256) is not recognized? > > Didn’t know that they exited :-) Now I have FOR_EACH_TAIL, FOR_EACH_TAIL_SAFE, FOR_EACH_ALIST_VALUE, FOR_EACH_LIVE_BUFFER, FOR_EACH_FRAME. Great, thanks. > > This is much better than before already, so I think you should install > > this on the emacs-29 branch, once you fix the above problems (assuming > > they are easily fixable). > > Cool, I pushed the change. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-28 5:41 ` Eli Zaretskii @ 2023-04-29 22:55 ` Yuan Fu 2023-04-30 5:24 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Yuan Fu @ 2023-04-29 22:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Dmitry Gutov, 62951 > On Apr 27, 2023, at 10:41 PM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Yuan Fu <casouri@gmail.com> >> Date: Thu, 27 Apr 2023 12:56:07 -0700 >> Cc: dmitry@gutov.dev, >> 62951@debbugs.gnu.org >> >>> c-ts-mode: Cannot load language definition: not-found, ("libtree-sitter-emacs-c" "libtree-sitter-emacs-c.dll"), "No such file or directory" >>> >>> It looks like your "fake emacs-c language" trick somehow misfires? >>> The value of treesit-load-name-override-list is nil, which is not what >>> you intended, AFAICT? The only way I can make this work is by >>> manually customizing treesit-load-name-override-list before loading >>> c-ts-mode. >> >> Duh, sorry, dumb mistake. Fixed. > > Thanks. > >>> Otherwise, looks quite good; here are some other problems I found: >>> >>> . some uses of FOR_EACH_TAIL are not fontified at all; examples: >>> comp.c, line 2079, fns.c, line 189 >> >> You mean the FOE_EACH_TAIL part isn’t fontified, or the body isn’t fontified? Because the body are always fontified here. FOR_EACH_TAIL itself shouldn’t be fontified since it’s just a macro call and a variable. > > I mean the macro itself, FOR_EACH_TAIL. If it isn't supposed to be > fontified, then why is it fontified at line 856 of comp.c? It's > inconsistent. (However, this is a very minor problem, so if fixing > it is difficult, we can leave this alone for now.) Ah, I see. FOR_EACH_TAIL’s that has a bracketed body are not skipped and still have fontification on them. I pushed a change so that no one has fontification now. Also thanks for the doc fix :-) Yuan ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-29 22:55 ` Yuan Fu @ 2023-04-30 5:24 ` Eli Zaretskii 0 siblings, 0 replies; 15+ messages in thread From: Eli Zaretskii @ 2023-04-30 5:24 UTC (permalink / raw) To: Yuan Fu; +Cc: dmitry, 62951 > From: Yuan Fu <casouri@gmail.com> > Date: Sat, 29 Apr 2023 15:55:26 -0700 > Cc: Dmitry Gutov <dmitry@gutov.dev>, > 62951@debbugs.gnu.org > > > I mean the macro itself, FOR_EACH_TAIL. If it isn't supposed to be > > fontified, then why is it fontified at line 856 of comp.c? It's > > inconsistent. (However, this is a very minor problem, so if fixing > > it is difficult, we can leave this alone for now.) > > Ah, I see. FOR_EACH_TAIL’s that has a bracketed body are not skipped and still have fontification on them. I pushed a change so that no one has fontification now. Thanks, looks good now. ^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE 2023-04-26 22:19 ` Yuan Fu 2023-04-27 3:14 ` Yuan Fu @ 2023-04-27 8:57 ` Dmitry Gutov 1 sibling, 0 replies; 15+ messages in thread From: Dmitry Gutov @ 2023-04-27 8:57 UTC (permalink / raw) To: Yuan Fu; +Cc: Eli Zaretskii, 62951 On 27/04/2023 01:19, Yuan Fu wrote: > > >> On Apr 23, 2023, at 2:04 PM, Dmitry Gutov <dmitry@gutov.dev> wrote: >> >> On 23/04/2023 03:28, Yuan Fu wrote: >>> What do you think of extending the parser to support these macros instead? (So we fork tree-sitter-c.) If we can fix the parser, we don’t need to retrofit hacks onto font-lock, indent, etc, separately, and it truly fixes the problem. The downside is compiling from grammar source to grammar.c needs rust and node tools. But I guess depending on the grammar maintained by tree-sitter’s author isn’t too much different from depending on the grammar maintained by another individual (ie, me)? >> >> We had also talked at some point about replacing the actual text that the parser sees with something else. >> >> If this can be done in a straightforward way (with tracking the subsequent correspondence of "real" text back to the buffer for syntax highlighting), that might be the perfect solution: we'd have a defcustom which would hold a list of macros used in the current codebase in the form of templates, and we'd set a bunch of them in emacs/.dir-locals.el. >> >> I'm not sure how difficult this is to implement and maintain, but it's probably going to be less work to maintain than a fork of the grammar. > > Sounds to me a bit difficult to write. Eg, translating between tree-sitter position and buffer position efficiently isn’t too easy. Now plus narrowing, and what if the narrowing boundary is in the middle of a replace region? When the match isn't full, the replacement wouldn't be performed. Same as with macro name that isn't fully typed out yet. Yeah, it does seem like a lot of work, but the result might be that everybody's macros could be supported. I'm definitely not volunteering, though, so please take this as just a suggestion. > My idea right now is to use the range feature in tree-sitter. Since the “body” of FOR_EACH_TAIL is valid C, I can either set the ranges for the parser so it ignores FOR_EACH_TAIL, or I can add another parser that only parses the body of FOR_EACH_TAIL. Sounds good. Especially for Emacs 29 (maybe). ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-04-30 5:24 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-19 16:40 bug#62951: 29.0.90; c-ts-mode: Incorrect fontification due to FOR_EACH_TAIL_SAFE Eli Zaretskii 2023-04-21 20:37 ` Yuan Fu 2023-04-22 7:17 ` Eli Zaretskii 2023-04-23 0:28 ` Yuan Fu 2023-04-23 6:25 ` Eli Zaretskii 2023-04-24 7:02 ` Yuan Fu 2023-04-23 21:04 ` Dmitry Gutov 2023-04-26 22:19 ` Yuan Fu 2023-04-27 3:14 ` Yuan Fu 2023-04-27 15:03 ` Eli Zaretskii 2023-04-27 19:56 ` Yuan Fu 2023-04-28 5:41 ` Eli Zaretskii 2023-04-29 22:55 ` Yuan Fu 2023-04-30 5:24 ` Eli Zaretskii 2023-04-27 8:57 ` Dmitry Gutov
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.