unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer
From: Eli Zaretskii @ 2022-11-25 15:04 UTC
  To: 59574; +Cc: Yuan Fu

To reproduce:

  emacs -Q
  C-x C-f foo.c RET
  M-x c-ts-mode RET
  Type "in"

Make sure foo.c doesn't exist, so you start from an empty buffer.  As soon
as you type the second character of "in", there's an assertion violation:

treesit.c:1383: Emacs fatal error: assertion failed: end_byte <= BUF_ZV_BYTE (buffer)

  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=22, backtrace_limit=2147483647) at emacs.c:427
  427       signal (sig, SIG_DFL);
  (gdb) up
  #1  0x01230802 in die (
      msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)", file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
      at alloc.c:7697
  7697      terminate_due_to_signal (SIGABRT, INT_MAX);
  (gdb)
  #2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
      buffer=0x7fe94b0) at treesit.c:1383
  1383          eassert (end_byte <= BUF_ZV_BYTE (buffer));
  (gdb) p end_byte
  $1 = 4
  (gdb) p BUF_ZV_BYTE(buffer)
  $2 = 3

Interestingly, this only happens once, when the buffer includes exactly 1
byte and an additional character is inserted.  If you get past this
assertion, further characters can be inserted without any problems, and
end_byte always equals BUF_ZV_BYTE.
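
As a quick sanity check on the two gdb values above, here is a minimal
sketch (hypothetical helpers, not code from treesit.c).  It relies on Emacs
byte positions being 1-based, so an unnarrowed buffer holding the two bytes
of "in" has BUF_ZV_BYTE equal to 3, and an end_byte of 4 points one byte
past the end.  The names zv_byte_for and range_in_buffer_p are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Emacs byte positions are 1-based, so an unnarrowed buffer holding
   NBYTES bytes has BEGV_BYTE == 1 and ZV_BYTE == NBYTES + 1.  */
static ptrdiff_t
zv_byte_for (ptrdiff_t nbytes)
{
  return 1 + nbytes;
}

/* Hypothetical helper: true if [BEG_BYTE, END_BYTE] lies inside the
   accessible portion [BEGV_BYTE, ZV_BYTE] of a buffer.  The last
   comparison mirrors the condition in the failing assertion.  */
static bool
range_in_buffer_p (ptrdiff_t begv_byte, ptrdiff_t zv_byte,
                   ptrdiff_t beg_byte, ptrdiff_t end_byte)
{
  return (begv_byte <= beg_byte
          && beg_byte <= end_byte
          && end_byte <= zv_byte);
}
```

With these, range_in_buffer_p (1, zv_byte_for (2), 1, 4) is false, which
corresponds to the state at the crash, while an end_byte of 3 would pass.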

The backtrace is below, if it is interesting.

I couldn't figure out where tree-sitter gets the range it returns to us.
Yuan, can you describe how the parser gets the range it needs to
consider?  If I put a breakpoint in treesit-parser-set-included-ranges, it
is never hit, so this doesn't seem to be how the range is set in this
scenario.

There's also something strange in treesit_record_change: when it is called
for the first time in a buffer which was empty and you insert one character,
we bypass the updating of visible_beg and visible_end fields of the Lisp
parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
to me that we should still update these two fields regardless, no?  Only the
call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
of update explains the assertion, but even if I move the condition to guard
only treesit_tree_edit_1, the assertion still happens, so I guess my
hypothesis eats dust.)
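
The two variants discussed in this paragraph can be sketched with a toy
model.  This is not the code in treesit.c: struct toy_parser, the tree_edits
counter, and both function names are invented for illustration, and the
model assumes a plain insertion inside the visible region, so the real
update logic is reduced to a single addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the Lisp parser object; only the fields discussed
   here are modeled.  */
struct toy_parser
{
  bool has_tree;          /* Stands in for `tree != NULL'.  */
  int tree_edits;         /* Counts calls to the tree-edit step.  */
  ptrdiff_t visible_beg;
  ptrdiff_t visible_end;
};

/* Behavior as described: with no tree, nothing at all is updated.  */
static void
toy_record_insert_guard_all (struct toy_parser *p, ptrdiff_t nbytes)
{
  if (!p->has_tree)
    return;
  p->tree_edits++;
  p->visible_end += nbytes;
}

/* The experiment: only the tree edit is guarded, and the visible
   range is always extended.  */
static void
toy_record_insert_guard_edit_only (struct toy_parser *p, ptrdiff_t nbytes)
{
  if (p->has_tree)
    p->tree_edits++;
  p->visible_end += nbytes;
}
```

Starting from an empty buffer (visible_beg == visible_end == 1) with no
tree, the first variant leaves visible_end at 1 after a one-byte insertion,
while the second moves it to 2.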

Here's the backtrace I promised:

(gdb) bt
#0  terminate_due_to_signal (sig=22, backtrace_limit=2147483647)
    at emacs.c:427
#1  0x01230802 in die (
    msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)",
 file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
    at alloc.c:7697
#2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
    buffer=0x7fe94b0) at treesit.c:1383
#3  0x01353c7e in treesit_call_after_change_functions (old_tree=0x84d9fe0,
    new_tree=0x856a5d0, parser=XIL(0xa00000000853e4e8)) at treesit.c:859
#4  0x01353fff in treesit_ensure_parsed (parser=XIL(0xa00000000853e4e8))
    at treesit.c:906
#5  0x01354ff8 in Ftreesit_parser_root_node (parser=XIL(0xa00000000853e4e8))
    at treesit.c:1328
#6  0x012773d2 in funcall_subr (subr=0x1883640 <Streesit_parser_root_node>,
    numargs=1, args=0x6c10470) at eval.c:3034
#7  0x012e9b92 in exec_byte_code (fun=XIL(0xa00000000850edc8),
    args_template=256, nargs=1, args=0x6c10390) at bytecode.c:809
#8  0x0127799a in fetch_and_exec_byte_code (fun=XIL(0xa0000000084b0d20),
    args_template=257, nargs=1, args=0x6c101c8) at eval.c:3081
#9  0x01277ef9 in funcall_lambda (fun=XIL(0xa0000000084b0d20), nargs=1,
    arg_vector=0x6c101c8) at eval.c:3153
#10 0x01276e66 in funcall_general (fun=XIL(0xa0000000084b0d20), numargs=1,
    args=0x6c101c8) at eval.c:2945
#11 0x012771eb in Ffuncall (nargs=2, args=0x6c101c0) at eval.c:2995
#12 0x012762ae in run_hook_wrapped_funcall (nargs=2, args=0x6c101c0)
    at eval.c:2773
#13 0x01276765 in run_hook_with_args (nargs=2, args=0x6c101c0,
    funcall=0x1276266 <run_hook_wrapped_funcall>) at eval.c:2854
#14 0x012762fd in Frun_hook_wrapped (nargs=2, args=0x6c101c0) at eval.c:2788
#15 0x0127784b in funcall_subr (subr=0x187cf00 <Srun_hook_wrapped>,
    numargs=2, args=0x6c101c0) at eval.c:3059
#16 0x012e9b92 in exec_byte_code (fun=XIL(0xa0000000061302c4),
    args_template=514, nargs=2, args=0x6c100f8) at bytecode.c:809
#17 0x0127799a in fetch_and_exec_byte_code (fun=XIL(0xa00000000612fd94),
    args_template=257, nargs=1, args=0x82ac88) at eval.c:3081
#18 0x01277ef9 in funcall_lambda (fun=XIL(0xa00000000612fd94), nargs=1,
    arg_vector=0x82ac88) at eval.c:3153
#19 0x01276e66 in funcall_general (fun=XIL(0xa00000000612fd94), numargs=1,
    args=0x82ac88) at eval.c:2945
#20 0x012771eb in Ffuncall (nargs=2, args=0x82ac80) at eval.c:2995
#21 0x012712a1 in internal_condition_case_n (bfun=0x127709f <Ffuncall>,
    nargs=2, args=0x82ac80, handlers=XIL(0x30),
    hfun=0x104286e <safe_eval_handler>) at eval.c:1558
#22 0x01042aa1 in safe__call (inhibit_quit=false, nargs=2,
    func=XIL(0x47648c4), ap=0x82ad44 "") at xdisp.c:3024
#23 0x01042b1a in safe_call (nargs=2, func=XIL(0x47648c4)) at xdisp.c:3039
#24 0x01042b6e in safe_call1 (fn=XIL(0x47648c4), arg=make_fixnum(1))
    at xdisp.c:3050
#25 0x010469d4 in handle_fontified_prop (it=0x82afd0) at xdisp.c:4416
#26 0x010453c7 in handle_stop (it=0x82afd0) at xdisp.c:3951
#27 0x01051ebf in reseat (it=0x82afd0, pos=..., force_p=true) at xdisp.c:7469
#28 0x01044495 in init_iterator (it=0x82afd0, w=0x7958be0, charpos=1,
    bytepos=1, row=0x7a214a0, base_face_id=DEFAULT_FACE_ID) at xdisp.c:3488
#29 0x010446c3 in start_display (it=0x82afd0, w=0x7958be0, pos=...)
    at xdisp.c:3568
#30 0x0107c99e in try_window (window=XIL(0xa000000007958be0), pos=...,
    flags=1) at xdisp.c:20511
#31 0x01079579 in redisplay_window (window=XIL(0xa000000007958be0),
    just_this_one_p=true) at xdisp.c:19903
#32 0x010706c6 in redisplay_window_1 (window=XIL(0xa000000007958be0))
    at xdisp.c:17405
#33 0x0127108e in internal_condition_case_1 (
    bfun=0x107066e <redisplay_window_1>, arg=XIL(0xa000000007958be0),
    handlers=XIL(0xc000000006462abc), hfun=0x10702c6 <redisplay_window_error>)
    at eval.c:1498
#34 0x0106f10a in redisplay_internal () at xdisp.c:16944
#35 0x0106c163 in redisplay () at xdisp.c:16006
#36 0x01174cf8 in read_char (commandflag=1, map=XIL(0xc000000008096220),
    prev_event=XIL(0), used_mouse_menu=0x82f41f, end_time=0x0)
    at keyboard.c:2623
#37 0x0118ec5e in read_key_sequence (keybuf=0x82f6f8, prompt=XIL(0),
    dont_downcase_last=false, can_return_switch_frame=true,
    fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:10070
#38 0x0117033d in command_loop_1 () at keyboard.c:1376
#39 0x01270fa4 in internal_condition_case (bfun=0x116fcdc <command_loop_1>,
    handlers=XIL(0x90), hfun=0x116ecaa <cmd_error>) at eval.c:1474
#40 0x0116f749 in command_loop_2 (handlers=XIL(0x90)) at keyboard.c:1125
#41 0x0126fe2b in internal_catch (tag=XIL(0x10290),
    func=0x116f712 <command_loop_2>, arg=XIL(0x90)) at eval.c:1197
#42 0x0116f6b4 in command_loop () at keyboard.c:1103
#43 0x0116e70a in recursive_edit_1 () at keyboard.c:712
#44 0x0116e9a8 in Frecursive_edit () at keyboard.c:795
#45 0x0116975d in main (argc=2, argv=0xa428e0) at emacs.c:2523

Lisp Backtrace:
"treesit-parser-root-node" (0x6c10470)
"treesit-buffer-root-node" (0x6c10388)
"treesit-font-lock-fontify-region" (0x6c10300)
"font-lock-default-fontify-region" (0x6c10298)
"font-lock-fontify-region" (0x6c10230)
0x84b0d20 PVEC_COMPILED
"run-hook-wrapped" (0x6c101c0)
"jit-lock--run-functions" (0x6c100e8)
"jit-lock-fontify-now" (0x6c10058)
"jit-lock-function" (0x82ac88)
"redisplay_internal (C function)" (0x0)
(gdb)


In GNU Emacs 29.0.50 (build 2261, i686-pc-mingw32) of 2022-11-25 built
 on HOME-C4E4A596F7
Repository revision: af545234314601ba3dcd8bf32e0d9b46e1917f79
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --enable-checking=yes,glyphs 'CFLAGS=-O0 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NOTIFY
W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel dos-w32
ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs theme-loaddefs faces cus-face macroexp files window
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget keymap hashtable-print-readable backquote threads
w32notify w32 lcms2 multi-tty make-network-process emacs)

Memory information:
((conses 16 42624 11101)
 (symbols 48 6278 0)
 (strings 16 16553 2914)
 (string-bytes 1 398654)
 (vectors 16 9312)
 (vector-slots 8 146415 13640)
 (floats 8 23 27)
 (intervals 40 274 97)
 (buffers 896 10))






* bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer
From: Yuan Fu @ 2022-11-26  3:18 UTC
  To: Eli Zaretskii; +Cc: 59574



> On Nov 25, 2022, at 7:04 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> To reproduce:
> 
>  emacs -Q
>  C-x C-f foo.c RET
>  M-x c-ts-mode RET
>  Type "in"

Thanks for finding this out! 

> 
> Make sure foo.c doesn't exist, so you start from an empty buffer.  As soon
> as you type the second character of "in", there's an assertion violation:
> 
> treesit.c:1383: Emacs fatal error: assertion failed: end_byte <= BUF_ZV_BYTE (bu
> ffer)
> 
>  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=22, backtrace_limit=2147483647) at emacs.c:427
>  427       signal (sig, SIG_DFL);
>  (gdb) up
>  #1  0x01230802 in die (
>      msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)", file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
>      at alloc.c:7697
>  7697      terminate_due_to_signal (SIGABRT, INT_MAX);
>  (gdb)
>  #2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
>      buffer=0x7fe94b0) at treesit.c:1383
>  1383          eassert (end_byte <= BUF_ZV_BYTE (buffer));
>  (gdb) p end_byte
>  $1 = 4
>  (gdb) p BUF_ZV_BYTE(buffer)
>  $2 = 3
> 
> Interestingly, this only happens once, when the buffer includes exactly 1
> byte and an additional character is inserted.  If you get past this
> assertion, further characters can be inserted without any problems, and
> end_byte always equals BUF_ZV_BYTE.
> 
> The backtrace is below, if it is interesting.
> 
> I couldn't figure out where did tree-sitter take the range it returns to us.
> Yuan, can you describe how does the parser get the range it needs to
> consider?  If I put a breakpoint in treesit-parser-set-included-ranges, the
> breakpoint never breaks, so this doesn't seem to be how the range is set in
> this scenario.

After we parse the buffer (in treesit_ensure_parsed), we compute the ranges that have changed since the last parse by calling ts_tree_get_changed_ranges, and pass those ranges to the notifier functions (those added by treesit-parser-add-notifier). These ranges are different from the range within which a parser operates. That range is set by treesit-parser-set-included-ranges, and is not involved with the parsing, treesit_record_change, or visible_beg/end stuff.

Both features happen to use treesit_make_ranges as a helper function, but the similarity ends there.

> There's also something strange in treesit_record_change: when it is called
> for the first time in a buffer which was empty and you insert one character,
> we bypass the updating of visible_beg and visible_end fields of the Lisp
> parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
> to me that we should still update these two fields regardless, no?  Only the
> call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
> of update explains the assertion, but even if I move the condition to guard
> only treesit_tree_edit_1, the assertion still happens, so I guess my
> hypothesis eats dust.)

We don’t need to update visible_beg/end in treesit_record_change if tree is NULL, because visible_beg/end represent the range of the buffer that the tree sees, so if there is no tree, visible_beg/end can be considered uninitialized. However, you are right that visible_beg/end need updating, just in a different place, treesit_ensure_position_synced (which I renamed to treesit_sync_visible_region): that’s where we ensure visible_beg/end equal BUF_BEGV_BYTE and friends.

The problem is we don’t update visible_beg/end for the very first parse, when tree is NULL.
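
The idea can be sketched with a toy model; this is not the actual change to
treesit.c.  The struct names and the toy_ functions are invented for
illustration, the parse itself is reduced to setting a flag, and the real
treesit_sync_visible_region may well do more than copy the two bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; only the fields discussed here are modeled.  */
struct toy_buffer
{
  ptrdiff_t begv_byte;    /* Like BUF_BEGV_BYTE.  */
  ptrdiff_t zv_byte;      /* Like BUF_ZV_BYTE.  */
};

struct toy_parser
{
  bool has_tree;          /* Stands in for `tree != NULL'.  */
  ptrdiff_t visible_beg;
  ptrdiff_t visible_end;
};

/* Make the parser's visible region match the buffer's accessible
   portion.  There is deliberately no early return when P has no
   tree: the very first parse needs correct bounds too.  */
static void
toy_sync_visible_region (struct toy_parser *p, const struct toy_buffer *b)
{
  p->visible_beg = b->begv_byte;
  p->visible_end = b->zv_byte;
}

/* Sync first, then parse.  */
static void
toy_ensure_parsed (struct toy_parser *p, const struct toy_buffer *b)
{
  toy_sync_visible_region (p, b);
  p->has_tree = true;     /* Stand-in for the actual parse.  */
}
```

For a parser created in an empty buffer (both bounds at 1) and first parsed
after one byte was inserted (zv_byte of 2), the model ends up with
visible_end at 2 rather than the stale 1.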

I also added some comments; hopefully they explain everything sufficiently.

Yuan







* bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer
From: Eli Zaretskii @ 2022-11-26 14:31 UTC
  To: Yuan Fu; +Cc: 59574-done

> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 25 Nov 2022 19:18:09 -0800
> Cc: 59574@debbugs.gnu.org
> 
> > There's also something strange in treesit_record_change: when it is called
> > for the first time in a buffer which was empty and you insert one character,
> > we bypass the updating of visible_beg and visible_end fields of the Lisp
> > parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
> > to me that we should still update these two fields regardless, no?  Only the
> > call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
> > of update explains the assertion, but even if I move the condition to guard
> > only treesit_tree_edit_1, the assertion still happens, so I guess my
> > hypothesis eats dust.)
> 
> We don’t need to update visible_beg/end in treesit_record_change if tree is NULL, because visible_beg/end represents the range of buffer that the tree sees, so if there is no tree, visible_beg/end can be considered uninitialized. However you are right about needing to update visible_beg/end, but in treesit_ensure_position_synced (I renamed it to treesit_sync_visible_region): that’s where we ensure visible_beg/end equals to BUF_BEGV_BYTE/friends. 
> 
> The problem is we don’t update visible_beg/end for the very first parse, when tree is NULL.
> 
> I also added some comments, hopefully they sufficiently explain everything.

Thanks, the problem is gone, so I'm closing the bug.




