bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer


From: Yuan Fu <casouri@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 59574@debbugs.gnu.org
Subject: bug#59574: 29.0.50; Emacs crashes when using tree-sitter-based mode in an empty buffer
Date: Fri, 25 Nov 2022 19:18:09 -0800
Message-ID: <6350D0DE-63CD-410A-AA48-56D924ED67EA@gmail.com>
In-Reply-To: <837czjulc4.fsf@gnu.org>

> On Nov 25, 2022, at 7:04 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> To reproduce:
> 
>  emacs -Q
>  C-x C-f foo.c RET
>  M-x c-ts-mode RET
>  Type "in"

Thanks for finding this out! 

> 
> Make sure foo.c doesn't exist, so you start from an empty buffer.  As soon
> as you type the second character of "in", there's an assertion violation:
> 
> treesit.c:1383: Emacs fatal error: assertion failed: end_byte <= BUF_ZV_BYTE (buffer)
> 
>  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=22, backtrace_limit=2147483647) at emacs.c:427
>  427       signal (sig, SIG_DFL);
>  (gdb) up
>  #1  0x01230802 in die (
>      msg=0x18e6778 <DEFAULT_REHASH_SIZE+3288> "end_byte <= BUF_ZV_BYTE (buffer)", file=0x18e5fcc <DEFAULT_REHASH_SIZE+1324> "treesit.c", line=1383)
>      at alloc.c:7697
>  7697      terminate_due_to_signal (SIGABRT, INT_MAX);
>  (gdb)
>  #2  0x01355636 in treesit_make_ranges (ranges=0x856a778, len=1,
>      buffer=0x7fe94b0) at treesit.c:1383
>  1383          eassert (end_byte <= BUF_ZV_BYTE (buffer));
>  (gdb) p end_byte
>  $1 = 4
>  (gdb) p BUF_ZV_BYTE(buffer)
>  $2 = 3
> 
> Interestingly, this only happens once, when the buffer includes exactly 1
> byte and an additional character is inserted.  If you get past this
> assertion, further characters can be inserted without any problems, and
> end_byte always equals BUF_ZV_BYTE.
> 
> The backtrace is below, if it is interesting.
> 
> I couldn't figure out where did tree-sitter take the range it returns to us.
> Yuan, can you describe how does the parser get the range it needs to
> consider?  If I put a breakpoint in treesit-parser-set-included-ranges, the
> breakpoint never breaks, so this doesn't seem to be how the range is set in
> this scenario.

After we parse the buffer (in treesit_ensure_parsed), we compute the ranges that have changed since the last parse by calling ts_tree_get_changed_ranges, and pass those ranges to the notifier functions (those added by treesit-parser-add-notifier). These ranges are different from the range within which a parser operates. That range is set by treesit-parser-set-included-ranges, and is not involved in the parsing, treesit_record_change, or visible_beg/end machinery.

Both features happen to use treesit_make_ranges as a helper function, but the similarity ends there.
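
To make the distinction concrete, here is a minimal, self-contained sketch. It is not the code in treesit.c: every toy_* name is hypothetical, and only the start_byte/end_byte field names are borrowed from tree-sitter's TSRange. The included range is an input stored on the parser, while the changed ranges are an output handed to the notifiers after a reparse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte range; field names mirror tree-sitter's TSRange.  */
struct toy_range { uint32_t start_byte, end_byte; };

/* Hypothetical notifier signature: receives the ranges that changed.  */
typedef void (*toy_notifier) (const struct toy_range *ranges, size_t len,
                              void *data);

enum { TOY_MAX_NOTIFIERS = 4 };

struct toy_parser
{
  /* Input: where the parser is allowed to operate.  */
  struct toy_range included;
  /* Consumers of the output: called after each reparse.  */
  toy_notifier notifiers[TOY_MAX_NOTIFIERS];
  void *notifier_data[TOY_MAX_NOTIFIERS];
  size_t n_notifiers;
};

/* Register a notifier; return 0 on success, -1 if the table is full.  */
static int
toy_add_notifier (struct toy_parser *p, toy_notifier fn, void *data)
{
  if (p->n_notifiers >= TOY_MAX_NOTIFIERS)
    return -1;
  p->notifiers[p->n_notifiers] = fn;
  p->notifier_data[p->n_notifiers] = data;
  p->n_notifiers++;
  return 0;
}

/* After a reparse, hand the changed ranges to every notifier.  The
   included range is neither consulted nor modified here.  */
static void
toy_notify_changed (const struct toy_parser *p,
                    const struct toy_range *changed, size_t len)
{
  assert (changed != NULL || len == 0);
  for (size_t i = 0; i < p->n_notifiers; i++)
    p->notifiers[i] (changed, len, p->notifier_data[i]);
}

/* Example notifier: add the number of changed bytes to *DATA.  */
static void
toy_sum_changed_bytes (const struct toy_range *ranges, size_t len, void *data)
{
  size_t *total = data;
  for (size_t i = 0; i < len; i++)
    *total += ranges[i].end_byte - ranges[i].start_byte;
}
```

Registering toy_sum_changed_bytes and then calling toy_notify_changed with a couple of ranges updates the counter and leaves the included range untouched, which is the sense in which the two features are independent.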

> There's also something strange in treesit_record_change: when it is called
> for the first time in a buffer which was empty and you insert one character,
> we bypass the updating of visible_beg and visible_end fields of the Lisp
> parser object, because XTS_PARSER (lisp_parser)->tree is NULL.  But it looks
> to me that we should still update these two fields regardless, no?  Only the
> call to treesit_tree_edit_1 needs the tree.  (I thought that maybe this lack
> of update explains the assertion, but even if I move the condition to guard
> only treesit_tree_edit_1, the assertion still happens, so I guess my
> hypothesis eats dust.)

We don’t need to update visible_beg/end in treesit_record_change if tree is NULL, because visible_beg/end represent the range of the buffer that the tree sees, so if there is no tree, visible_beg/end can be considered uninitialized. However, you are right that visible_beg/end need to be updated, just in a different place: treesit_ensure_position_synced (which I renamed to treesit_sync_visible_region). That’s where we ensure visible_beg/end equal BUF_BEGV_BYTE and friends.

The problem is that we don’t update visible_beg/end for the very first parse, when tree is NULL.
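
As a rough illustration of that failure mode, here is a toy model in C. It is a sketch under stated assumptions, not the actual patch: the vis_* names, the struct layout, and the initial values of 0 are all hypothetical. It only shows why skipping the sync while there is no tree leaves the first parse with stale bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer's accessible byte range.  */
struct vis_buffer { ptrdiff_t begv_byte, zv_byte; };

/* Hypothetical stand-in for the Lisp parser object.  */
struct vis_parser
{
  bool has_tree;
  ptrdiff_t visible_beg, visible_end;
};

/* Variant with the problem described above: nothing is updated while
   there is no tree, so the first parse runs with stale bounds.  */
static void
vis_sync_skipping_first_parse (struct vis_parser *p,
                               const struct vis_buffer *b)
{
  if (!p->has_tree)
    return;
  p->visible_beg = b->begv_byte;
  p->visible_end = b->zv_byte;
}

/* Variant that also covers the very first parse.  */
static void
vis_sync_always (struct vis_parser *p, const struct vis_buffer *b)
{
  assert (b->begv_byte <= b->zv_byte);
  p->visible_beg = b->begv_byte;
  p->visible_end = b->zv_byte;
}

/* Pretend to parse: from now on the parser owns a tree.  */
static void
vis_parse (struct vis_parser *p)
{
  p->has_tree = true;
}

/* True if the parser's idea of the visible region matches the buffer.  */
static bool
vis_in_sync (const struct vis_parser *p, const struct vis_buffer *b)
{
  return p->visible_beg == b->begv_byte && p->visible_end == b->zv_byte;
}
```

With a one-byte buffer and a fresh parser, syncing with the first variant and then parsing leaves vis_in_sync false, whereas the second variant leaves it true. Any later offset computed relative to the stale bounds can then fall outside the buffer.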

I also added some comments; hopefully they explain everything sufficiently.

Yuan

