unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
@ 2022-11-27 10:12 Eli Zaretskii
  2022-11-28 10:56 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eli Zaretskii @ 2022-11-27 10:12 UTC (permalink / raw)
  To: 59628

To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
or c++-ts-mode, go to the middle of some function, and type

   M-: (treesit-beginning-of-defun) RET
or
   M-: (treesit-end-of-defun) RET

This will move point to very strange places, which generally are neither the
beginning nor the end of the function.  In very simple functions, like this
one:

  void
  __executable_start (void)
  {
    emacs_abort ();
  }

the result is correct.  But once the function is even slightly more
complicated, for example, like this:

  static int
  margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
  {
    if (margin > 0)
      {
	int width = w->total_cols;
	double d = max (0, margin);
	d = min (width / 2 - 1, d);
	/* Since MARGIN is positive, we cannot possibly have less than
	   one glyph for the marginal area.  */
	return max (1, (int) ((double) total_glyphs / width * d));
      }
    return 0;
  }

the results are very far off the mark.

These two functions are the only ones to move by defuns in treesit-based
modes, right?  So they should be improved, IMO.

In GNU Emacs 29.0.50 (build 2273, i686-pc-mingw32) of 2022-11-27 built
 on HOME-C4E4A596F7
Repository revision: 80dcd78ff1fce3241043edf1951289eef0bf50c9
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: C

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
misearch multi-isearch vc-git diff-mode easy-mmode vc-dispatcher
c-ts-mode rx treesit cl-seq cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads w32notify w32 lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 60472 7440)
 (symbols 48 7296 0)
 (strings 16 20076 2163)
 (string-bytes 1 498360)
 (vectors 16 11107)
 (vector-slots 8 164960 11733)
 (floats 8 29 319)
 (intervals 40 2962 92)
 (buffers 896 14))






* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
  2022-11-27 10:12 bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++ Eli Zaretskii
@ 2022-11-28 10:56 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-28 22:08 ` Yuan Fu
  2022-11-30 23:07 ` Yuan Fu
  2 siblings, 0 replies; 6+ messages in thread
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-28 10:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 59628

Eli Zaretskii <eliz@gnu.org> writes:

> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
> or c++-ts-mode, go to the middle of some function, and type
>
>    M-: (treesit-beginning-of-defun) RET
> or
>    M-: (treesit-end-of-defun) RET
>
> This will move point to very strange places, which generally are neither the
> beginning nor the end of the function.  In very simple functions, like this
> one:
>
>   void
>   __executable_start (void)
>   {
>     emacs_abort ();
>   }
>
> the result is correct.  But once the function is even slightly more
> complicated, for example, like this:
>
>   static int
>   margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
>   {
>     if (margin > 0)
>       {
> 	int width = w->total_cols;
> 	double d = max (0, margin);
> 	d = min (width / 2 - 1, d);
> 	/* Since MARGIN is positive, we cannot possibly have less than
> 	   one glyph for the marginal area.  */
> 	return max (1, (int) ((double) total_glyphs / width * d));
>       }
>     return 0;
>   }
>
> the results are very far off the mark.
>
> These two functions are the only ones to move by defuns in treesit-based
> modes, right?  So they should be improved, IMO.
>

If I type

M-: (setq treesit-defun-type-regexp "function_definition") RET

treesit-beginning-of-defun and treesit-end-of-defun do the right thing.
That raises the question: Is it really necessary to have a Tree-sitter
regexp variable to match defun nodes?  If yes, should it already have a
sensible default value so things work out of the box in most major
modes?
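
For anyone on an affected build, the one-off `setq' above could be made
persistent with a mode hook.  This is only a workaround sketch, not the
eventual fix: the function name `my/c-ts-narrow-defun-regexp' is
hypothetical, and narrowing the regexp to "function_definition" means
that struct or class definitions are no longer treated as defuns.

```elisp
;; Workaround sketch: restrict defun movement to function definitions.
;; `my/c-ts-narrow-defun-regexp' is a hypothetical helper name.
(defun my/c-ts-narrow-defun-regexp ()
  "Make tree-sitter defun movement consider only function definitions."
  ;; The mode hook runs after the major mode body, so this overrides
  ;; whatever value the mode itself installed in the buffer.
  (setq-local treesit-defun-type-regexp "function_definition"))

(add-hook 'c-ts-mode-hook #'my/c-ts-narrow-defun-regexp)
(add-hook 'c++-ts-mode-hook #'my/c-ts-narrow-defun-regexp)
```

The value is set buffer-locally, so other tree-sitter modes are not
affected.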






* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
  2022-11-27 10:12 bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++ Eli Zaretskii
  2022-11-28 10:56 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-11-28 22:08 ` Yuan Fu
  2022-11-29  0:12   ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-30 23:07 ` Yuan Fu
  2 siblings, 1 reply; 6+ messages in thread
From: Yuan Fu @ 2022-11-28 22:08 UTC (permalink / raw)
  To: Daniel Martín; +Cc: Eli Zaretskii, 59628


Daniel Martín <mardani29@yahoo.es> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> To reproduce, visit any C source file in the Emacs tree, turn on c-ts-mode
>> or c++-ts-mode, go to the middle of some function, and type
>>
>>    M-: (treesit-beginning-of-defun) RET
>> or
>>    M-: (treesit-end-of-defun) RET
>>
>> This will move point to very strange places, which generally are neither the
>> beginning nor the end of the function.  In very simple functions, like this
>> one:
>>
>>   void
>>   __executable_start (void)
>>   {
>>     emacs_abort ();
>>   }
>>
>> the result is correct.  But once the function is even slightly more
>> complicated, for example, like this:
>>
>>   static int
>>   margin_glyphs_to_reserve (struct window *w, int total_glyphs, int margin)
>>   {
>>     if (margin > 0)
>>       {
>> 	int width = w->total_cols;
>> 	double d = max (0, margin);
>> 	d = min (width / 2 - 1, d);
>> 	/* Since MARGIN is positive, we cannot possibly have less than
>> 	   one glyph for the marginal area.  */
>> 	return max (1, (int) ((double) total_glyphs / width * d));
>>       }
>>     return 0;
>>   }
>>
>> the results are very far off the mark.
>>
>> These two functions are the only ones to move by defuns in treesit-based
>> modes, right?  So they should be improved, IMO.

Yeah, I’ll need to look at the C grammar and fix treesit-defun-type-regexp.

>
> If I type
>
> M-: (setq treesit-defun-type-regexp "function_definition") RET
>
> treesit-beginning-of-defun and treesit-end-of-defun do the right thing.
> That raises the question: Is it really necessary to have a Tree-sitter
> regexp variable to match defun nodes?  If yes, should it already have a
> sensible default value so things work out of the box in most major
> modes?

Different languages have different grammars that give different names to
function definitions and class definitions. So it is necessary to have a
regexp variable. Finding such a regexp isn’t too hard, so I don’t think
we need a default value. If we do have a default, it would often be wrong,
given differences between language grammars.
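
To illustrate the point about differing names, here is a small sketch.
The table and the helper `my/defun-node-p' are hypothetical and are not
part of treesit.el; the node type names are assumptions based on the
commonly used tree-sitter grammars for each language and should be
checked against the grammar actually installed.

```elisp
;; Hypothetical table: the node types that count as a "defun" differ
;; from grammar to grammar, so no single regexp fits every language.
(defvar my/defun-type-regexps
  '((c      . "function_definition")
    (python . "\\(?:function\\|class\\)_definition")
    (go     . "\\(?:function\\|method\\)_declaration")
    (rust   . "function_item"))
  "Alist mapping a language symbol to a regexp matching its defun nodes.")

(defun my/defun-node-p (language node-type)
  "Return t if NODE-TYPE names a defun node in LANGUAGE, else nil."
  (let ((regexp (alist-get language my/defun-type-regexps)))
    (and regexp (string-match-p regexp node-type) t)))

;; A regexp that is right for C finds nothing useful in Rust:
(my/defun-node-p 'c "function_definition")     ; non-nil
(my/defun-node-p 'rust "function_definition")  ; nil
(my/defun-node-p 'rust "function_item")        ; non-nil
```

A default such as "function_definition" would work for the C and Python
entries above but would silently match nothing in the Go or Rust ones.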






* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
  2022-11-28 22:08 ` Yuan Fu
@ 2022-11-29  0:12   ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-11-29  0:12 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, 59628

Yuan Fu <casouri@gmail.com> writes:

>
> Different languages have different grammars that give different names to
> function definitions and class definitions. So it is necessary to have a
> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> we need a default value. If we do have a default, it would often be wrong,
> given differences between language grammars.

I see that each major mode sets the value of that buffer-local variable.
c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
correct?  In C code, treesit-explore-mode shows function definition
nodes as "function_definition", so I think the regexp is matching more
nodes than expected, causing C-M-a C-M-e to move to weird places in the
buffer.
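
The over-matching can be seen without tree-sitter at all.  The sketch
below assumes the regexp is applied unanchored to node type names (as
`string-match-p' does) and uses a hand-written sample of node types that
the C grammar is assumed to produce for things like `static',
`struct window' and `int'.  The names `my/sample-node-types' and
`my/matching-node-types' are hypothetical.

```elisp
(require 'seq)

(defvar my/sample-node-types
  '("function_definition"      ; the whole function
    "storage_class_specifier"  ; e.g. `static'
    "struct_specifier"         ; e.g. `struct window'
    "primitive_type"           ; e.g. `int', `void'
    "compound_statement")      ; a `{ ... }' block
  "Hand-written sample of node types found inside a C function.")

(defun my/matching-node-types (regexp)
  "Return the members of `my/sample-node-types' that REGEXP matches."
  (seq-filter (lambda (type) (string-match-p regexp type))
              my/sample-node-types))

;; The value quoted above also matches the two *_specifier node types:
(my/matching-node-types "\\(?:definition\\|specifier\\)")

;; The narrower regexp matches only the function node:
(my/matching-node-types "function_definition")
```

If these assumptions hold, they would also be consistent with the
original report: the simple function has neither `static' nor a
`struct' parameter, so it contains no *_specifier nodes to stop at,
while margin_glyphs_to_reserve has both.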






* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
  2022-11-27 10:12 bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++ Eli Zaretskii
  2022-11-28 10:56 ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-11-28 22:08 ` Yuan Fu
@ 2022-11-30 23:07 ` Yuan Fu
  2022-12-01  8:08   ` Eli Zaretskii
  2 siblings, 1 reply; 6+ messages in thread
From: Yuan Fu @ 2022-11-30 23:07 UTC (permalink / raw)
  To: Daniel Martín; +Cc: eliz, 59628


Daniel Martín <mardani29@yahoo.es> writes:

> Yuan Fu <casouri@gmail.com> writes:
>
>>
>> Different languages have different grammars that give different names to
>> function definitions and class definitions. So it is necessary to have a
>> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
>> we need a default value. If we do have a default, it would often be wrong,
>> given differences between language grammars.
>
> I see that each major mode sets the value of that buffer-local variable.
> c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
> correct?  In C code, treesit-explore-mode shows function definition
> nodes as "function_definition", so I think the regexp is matching more
> nodes than expected, causing C-M-a C-M-e to move to weird places in the
> buffer.

Right, I’ve fixed the value in 599369bf3a3.

Yuan






* bug#59628: 29.0.50; treesit-beginning/end-of-defun problems in C/C++
  2022-11-30 23:07 ` Yuan Fu
@ 2022-12-01  8:08   ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2022-12-01  8:08 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 59628-done, mardani29

> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 30 Nov 2022 15:07:45 -0800
> Cc: eliz@gnu.org,
>  59628@debbugs.gnu.org
> 
> 
> Daniel Martín <mardani29@yahoo.es> writes:
> 
> > Yuan Fu <casouri@gmail.com> writes:
> >
> >>
> >> Different languages have different grammars that give different names to
> >> function definitions and class definitions. So it is necessary to have a
> >> regexp variable. Finding such a regexp isn’t too hard, so I don’t think
> >> we need a default value. If we do have a default, it would often be wrong,
> >> given differences between language grammars.
> >
> > I see that each major mode sets the value of that buffer-local variable.
> > c-ts-mode sets it to "\\(?:definition\\|specifier\\)", but is that
> > correct?  In C code, treesit-explore-mode shows function definition
> > nodes as "function_definition", so I think the regexp is matching more
> > nodes than expected, causing C-M-a C-M-e to move to weird places in the
> > buffer.
> 
> Right, I’ve fixed the value in 599369bf3a3.

Thanks, this seems to work now as expected.  So I'm closing the bug.





