* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
@ 2022-11-27 10:12 Eli Zaretskii
From: Eli Zaretskii @ 2022-11-27 10:12 UTC
To: 59628
To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
or c++-ts-mode, go to the middle of some function, and type

  M-: (treesit-beginning-of-defun) RET

or

  M-: (treesit-end-of-defun) RET

This will move point to very strange places, which generally are neither the
beginning nor the end of the function. In very simple functions, like this
one:

  void
  __executable_start (void)
  {
    emacs_abort ();
  }

the result is correct. But once the function is even slightly more
complicated, for example, like this:

  static int
  margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
  {
    if (margin > 0)
      {
        int width = w->total_cols;
        double d = max (0, margin);
        d = min (width / 2 - 1, d);
        /* Since MARGIN is positive, we cannot possibly have less than
           one glyph for the marginal area. */
        return max (1, (int) ((double) total_glyphs / width * d));
      }
    return 0;
  }

the results are very far off the mark.

These two functions are the only ones to move by defuns in treesit-based
modes, right? So they should be improved, IMO.
In GNU Emacs 29.0.50 (build 2273, i686-pc-mingw32) of 2022-11-27 built
on HOME-C4E4A596F7
Repository revision: 80dcd78ff1fce3241043edf1951289eef0bf50c9
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)
Configured using:
'configure -C --prefix=/d/usr --with-wide-int
--enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB
Important settings:
value of $LANG: ENU
locale-coding-system: cp1255
Major mode: C
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
misearch multi-isearch vc-git diff-mode easy-mmode vc-dispatcher
c-ts-mode rx treesit cl-seq cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads w32notify w32 lcms2 multi-tty make-network-process emacs)
Memory information:
((conses 16 60472 7440)
(symbols 48 7296 0)
(strings 16 20076 2163)
(string-bytes 1 498360)
(vectors 16 11107)
(vector-slots 8 164960 11733)
(floats 8 29 319)
(intervals 40 2962 92)
(buffers 896 14))
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-28 10:56 UTC
To: Eli Zaretskii; +Cc: 59628
Eli Zaretskii <eliz@gnu.org> writes:
> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
> or c++-ts-mode, go to the middle of some function, and type
>
> M-: (treesit-beginning-of-defun) RET
> or
> M-: (treesit-end-of-defun) RET
>
> This will move point to very strange places, which generally are neither the
> beginning nor the end of the function. In very simple functions, like this
> one:
>
>   void
>   __executable_start (void)
>   {
>     emacs_abort ();
>   }
>
> the result is correct. But once the function is even slightly more
> complicated, for example, like this:
>
>   static int
>   margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
>   {
>     if (margin > 0)
>       {
>         int width = w->total_cols;
>         double d = max (0, margin);
>         d = min (width / 2 - 1, d);
>         /* Since MARGIN is positive, we cannot possibly have less than
>            one glyph for the marginal area. */
>         return max (1, (int) ((double) total_glyphs / width * d));
>       }
>     return 0;
>   }
>
> the results are very far off the mark.
>
> These two functions are the only ones to move by defuns in treesit-based
> modes, right? So they should be improved, IMO.
>
If I type

  M-: (setq treesit-defun-type-regexp "function_definition") RET

treesit-beginning-of-defun and treesit-end-of-defun do the right thing.

That begs the question: Is it really necessary to have a Tree-sitter
regexp variable to match defun nodes? If yes, should it already have a
sensible default value so things work out of the box in most major
modes?
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
From: Yuan Fu @ 2022-11-28 22:08 UTC
To: Daniel Martín; +Cc: Eli Zaretskii, 59628
Daniel Martín <mardani29@yahoo.es> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
>> or c++-ts-mode, go to the middle of some function, and type
>>
>> M-: (treesit-beginning-of-defun) RET
>> or
>> M-: (treesit-end-of-defun) RET
>>
>> This will move point to very strange places, which generally are neither the
>> beginning nor the end of the function. In very simple functions, like this
>> one:
>>
>>   void
>>   __executable_start (void)
>>   {
>>     emacs_abort ();
>>   }
>>
>> the result is correct. But once the function is even slightly more
>> complicated, for example, like this:
>>
>>   static int
>>   margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
>>   {
>>     if (margin > 0)
>>       {
>>         int width = w->total_cols;
>>         double d = max (0, margin);
>>         d = min (width / 2 - 1, d);
>>         /* Since MARGIN is positive, we cannot possibly have less than
>>            one glyph for the marginal area. */
>>         return max (1, (int) ((double) total_glyphs / width * d));
>>       }
>>     return 0;
>>   }
>>
>> the results are very far off the mark.
>>
>> These two functions are the only ones to move by defuns in treesit-based
>> modes, right? So they should be improved, IMO.
Yeah, I’ll need to look at the C grammar and fix treesit-defun-type-regexp.
>
> If I type
>
> M-: (setq treesit-defun-type-regexp "function_definition") RET
>
> treesit-beginning-of-defun and treesit-end-of-defun do the right thing.
> That begs the question: Is it really necessary to have a Tree-sitter
> regexp variable to match defun nodes? If yes, should it already have a
> sensible default value so things work out of the box in most major
> modes?
Different languages have different grammars that give different names to
function definitions and class definitions. So it is necessary to have a
regexp variable. Finding such a regexp isn’t too hard, so I don’t think
we need a default value. If we did have a default, it would often be wrong,
given the differences between language grammars.
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-29 0:12 UTC
To: Yuan Fu; +Cc: Eli Zaretskii, 59628
Yuan Fu <casouri@gmail.com> writes:
>
> Different languages have different grammars that give different names to
> function definitions and class definitions. So it is necessary to have a
> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> we need a default value. If we did have a default, it would often be wrong,
> given the differences between language grammars.
I see that each major mode sets the value of that buffer-local variable.
c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
correct? In C code, treesit-explore-mode shows function definition
nodes as "function_definition", so I think the regexp is matching more
nodes than expected, causing C-M-a and C-M-e to move to weird places in
the buffer.
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
From: Yuan Fu @ 2022-11-30 23:07 UTC
To: Daniel Martín; +Cc: eliz, 59628
Daniel Martín <mardani29@yahoo.es> writes:
> Yuan Fu <casouri@gmail.com> writes:
>
>>
>> Different languages have different grammars that give different names to
>> function definitions and class definitions. So it is necessary to have a
>> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
>> we need a default value. If we did have a default, it would often be wrong,
>> given the differences between language grammars.
>
> I see that each major mode sets the value of that buffer-local variable.
> c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
> correct? In C code, treesit-explore-mode shows function definition
> nodes as "function_definition", so I think the regexp is matching more
> nodes than expected, causing C-M-a and C-M-e to move to weird places in
> the buffer.
Right, I’ve fixed the value in 599369bf3a3.
Yuan
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
From: Eli Zaretskii @ 2022-12-01 8:08 UTC
To: Yuan Fu; +Cc: 59628-done, mardani29
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 30 Nov 2022 15:07:45 -0800
> Cc: eliz@gnu.org,
> 59628@debbugs.gnu.org
>
>
> Daniel Martín <mardani29@yahoo.es> writes:
>
> > Yuan Fu <casouri@gmail.com> writes:
> >
> >>
> >> Different languages have different grammars that give different names to
> >> function definitions and class definitions. So it is necessary to have a
> >> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> >> we need a default value. If we did have a default, it would often be wrong,
> >> given the differences between language grammars.
> >
> > I see that each major mode sets the value of that buffer-local variable.
> > c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
> > correct? In C code, treesit-explore-mode shows function definition
> > nodes as "function_definition", so I think the regexp is matching more
> > nodes than expected, causing C-M-a and C-M-e to move to weird places in
> > the buffer.
>
> Right, I’ve fixed the value in 599369bf3a3.
Thanks, this seems to work now as expected. So I'm closing the bug.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git