From: Yuan Fu
Newsgroups: gmane.emacs.bugs
Subject: bug#61369: Problem with keeping tree-sitter parse tree up-to-date
Date: Mon, 13 Feb 2023 15:59:02 -0800
Message-ID: <1AC63591-F4EF-411F-B554-7CD38B4B4888@gmail.com>
To: Dmitry Gutov
Cc: theo@thornhill.no, 61369@debbugs.gnu.org

Yuan Fu writes:

> Dmitry Gutov writes:
>
>> On 10/02/2023 03:22, Yuan Fu wrote:
>>>> I just want to confirm that I can reproduce this, and that if you skip
>>>> the trailing newline from the use-statement, I don't get this behavior.
>>>> So it seems like the newline is the crucial point, right?
>>>>
>>>> Yes, same.
>>>>
>>>> The trailing newline is necessary.
>>>>
>>>> The empty lines at the beginning of the buffer (being copied to) are
>>>> necessary to reproduce this as well.
>>> Hmmm, it might be related to how tree-sitter does incremental
>>> parsing? If the newline is necessary, then I guess it’s not because
>>> Emacs missed characters when reporting edits to tree-sitter.
>>
>> The newline is somewhat necessary: the scenario doesn't work, for
>> example, if the pasted text doesn't include the newline but the buffer
>> had an additional (third) one at the top.
>>
>> But the scenario also doesn't work if some other (any) character is
>> removed from the yanked line before pasting: it could be even one
>> after the comment instruction (//).
>>
>> OTOH, if I add an extra char to the yanked line, anywhere, I can skip
>> the newline. E.g. I can paste
>>
>>   use std::path::{self, Path, PathBuf};  // good: std is a crate namee
>>
>> without a newline and still see the exact same syntax error.
>>
>> So it looks more like an off-by-one error somewhere. Maybe in our
>> code, but maybe in tree-sitter somewhere.
>
> Some progress report: I added a function that reads the buffer like a
> parser would, like this:
>
> DEFUN ("treesit--parser-view",
>        Ftreesit__parser_view,
>        Streesit__parser_view, 1, 1, 0,
>        doc: /* Return the view of PARSER.
> Read buffer like PARSER would into a string and return it.  */)
>   (Lisp_Object parser)
> {
>   const ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
>   const ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
>   const ptrdiff_t view_len = visible_end - visible_beg;
>
>   char *str_buf = xzalloc (view_len + 1);
>   uint32_t read = 0;
>   TSPoint pos = { 0 };
>   for (int idx = 0; idx < view_len; idx++)
>     {
>       const char *ch = treesit_read_buffer (XTS_PARSER (parser),
>                                             idx, pos, &read);
>       if (read == 0)
>         {
>           xfree (str_buf);
>           xsignal1 (Qtreesit_error, make_fixnum (idx));
>         }
>       else
>         str_buf[idx] = *ch;
>     }
>   Lisp_Object ret_str = make_string (str_buf, view_len);
>   xfree (str_buf);
>   return ret_str;
> }
>
> After I follow the steps and get the error node, I run this function on
> the parser, and the returned string looks good.
>
> Next I’ll try to log every character actually read by the parser and see
> if anything seems fishy.

I don’t know if it’s good news or bad news, but it doesn’t seem like an
off-by-one. Here is what I did:

1. I applied the attached patch (patch.diff) so that treesit_read_buffer,
   the function the tree-sitter parser uses to read buffer contents,
   prints each position it reads and the character it gets to stdout.

2. I open test.rs, which contains

   "

   let date = DateTime::<chrono::Utc>::from_utc(date, chrono::Utc);
   "

   as in the recipe. I have rust-ts-mode enabled, so Emacs prints the
   characters read by the parser to stdout. I type return several times
   to separate this first batch of output from the next, which is what
   I’m interested in.

3. I paste

   "use std::Path::{self, Path, PathBuf};  // good: std is a crate name
   "

   at the beginning of the buffer. Now the parse tree contains that error
   node. I go to the terminal and copy the output, which looks like:

   0 117
   1 115
   2 101
   3 32
   0 117
   1 115
   2 101
   ...
   133 59
   134 10
   134 10
   134 10
   134 10

4. I paste this output (output.txt) into a buffer, and reconstruct the
   text read by the parser with (setq str (reconstruct)), where
   reconstruct is:

   (defun reconstruct ()
     (goto-char (point-min))
     (let ((result ""))
       (while (< (point) (point-max))
         (let* ((str (buffer-substring (point) (line-end-position)))
                (nums (string-split str))
                (pos (string-to-number (car nums)))
                (char (string-to-number (cadr nums))))
           (when (not (< pos (length result)))
             (setq result
                   (concat result
                           (make-string (- (1+ pos) (length result)) ?0))))
           (setf (aref result pos) char))
         (forward-line 1))
       result))

5. I insert str into a new buffer, and (to my disappointment) the
   content is identical to the buffer text.

There are two surprises here: 1) there isn’t an off-by-one bug, and 2) the
parser actually read the whole buffer, rather than reading only the new
content. That leaves it even less reason to create that error node.

In addition, I inserted a newline in the Rust source buffer (test.rs),
which fixes the error node. Here is what the parser read after that
insertion:

"0000000000000000000000000000000000000000000000000000000000000000000 let 0000 = 000000000000000000000000000000000000000000000000000);"

A 0 means it didn’t read that position. We can see that the parser read
all the newlines, "let ", " = ", and ");". I can’t discern anything
interesting from that, though.
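The reconstruction in step 4 does not depend on Emacs, so here is a rough C equivalent for anyone who prefers to post-process the log outside Emacs. It is only an illustration, not part of the patch: `reconstruct_view` is a hypothetical name, and the sketch assumes the log holds only the "POS BYTE" records from the batch of interest, with no blank separator lines. As in the Lisp helper, unread positions come out as the character '0', and a later record for the same position overwrites an earlier one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rebuild the bytes a parser read from a log of
   "POS BYTE" records (whitespace separated, one record per line in
   output.txt).  Positions never mentioned in the log are filled with
   '0', mirroring the Lisp reconstruct helper.  Returns a malloc'd,
   NUL-terminated string that the caller frees, or NULL on allocation
   failure.  */
char *
reconstruct_view (const char *log)
{
  assert (log != NULL);
  size_t cap = 64, len = 0;
  char *result = malloc (cap + 1);
  if (result == NULL)
    return NULL;

  const char *p = log;
  for (;;)
    {
      char *end;
      /* strtoul skips leading whitespace, including newlines.  */
      unsigned long pos = strtoul (p, &end, 10);
      if (end == p)
        break;                  /* no more records */
      p = end;
      unsigned long byte = strtoul (p, &end, 10);
      if (end == p)
        break;                  /* truncated record */
      p = end;

      /* Grow the buffer until index POS fits (one extra byte for NUL).  */
      if (pos >= cap)
        {
          while (pos >= cap)
            cap *= 2;
          char *grown = realloc (result, cap + 1);
          if (grown == NULL)
            {
              free (result);
              return NULL;
            }
          result = grown;
        }
      /* Pad any positions the parser skipped with '0'.  */
      if (pos >= len)
        {
          memset (result + len, '0', pos + 1 - len);
          len = pos + 1;
        }
      result[pos] = (char) byte;
    }
  result[len] = '\0';
  return result;
}
```

For example, `reconstruct_view ("0 117\n1 115\n2 101\n4 10\n")` returns "use0\n": position 3 never appears in the log, so it is padded with '0'. Run over output.txt it should give the same string as the Lisp helper. Like that helper, it cannot distinguish an unread position from a real '0' in the buffer.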
Yuan

[Attachment: output.txt]

0 117
1 115
2 101
3 32
0 117
1 115
2 101
3 32
4 115
3 32
4 115
5 116
6 100
7 58
4 115
5 116
6 100
7 58
8 58
9 80
10 97
11 116
12 104
13 58
9 80
13 58
14 58
15 123
16 115
17 101
18 108
19 102
20 44
16 115
17 101
18 108
19 102
20 44
21 32
22 80
21 32
22 80
23 97
24 116
25 104
26 44
22 80
26 44
27 32
28 80
27 32
28 80
29 97
30 116
31 104
32 66
33 117
34 102
35 125
28 80
35 125
36 59
37 32
38 32
39 47
40 47
37 32
38 32
39 47
40 47
41 32
42 103
43 111
44 111
45 100
46 58
47 32
48 115
49 116
50 100
51 32
52 105
53 115
54 32
55 97
56 32
57 99
58 114
59 97
60 116
61 101
62 32
63 110
64 97
65 109
66 101
67 10
68 10
69 10
70 108
67 10
68 10
69 10
70 108
71 101
72 116
73 32
70 108
71 101
72 116
73 32
74 100
73 32
74 100
75 97
76 116
77 101
78 32
74 100
75 97
78 32
79 61
78 32
79 61
80 32
81 68
80 32
81 68
82 97
83 116
84 101
85 84
86 105
87 109
88 101
89 58
81 68
89 58
90 58
91 60
92 99
93 104
94 114
95 111
96 110
97 111
98 58
92 99
93 104
94 114
98 58
99 58
100 85
101 116
102 99
103 62
100 85
103 62
104 58
105 58
106 102
107 114
108 111
109 109
110 95
111 117
112 116
113 99
114 40
106 102
107 114
114 40
115 100
116 97
117 116
118 101
119 44
115 100
116 97
119 44
120 32
121 99
120 32
121 99
122 104
123 114
124 111
125 110
126 111
127 58
121 99
122 104
123 114
127 58
128 58
129 85
130 116
131 99
132 41
129 85
133 59
134 10
132 41
133 59
134 10
134 10
134 10
134 10

[Attachment: patch.diff]

diff --git a/src/treesit.c b/src/treesit.c
index cab2f0d5354..ad87a6ae759 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -1101,6 +1101,13 @@ treesit_read_buffer (void *parser, uint32_t byte_index,
      assertion should never hit.  */
   eassert (len < UINT32_MAX);
   *bytes_read = (uint32_t) len;
+
+  if (*bytes_read > 0)
+    {
+      printf ("%d %d\n", byte_index, *beg);
+      fflush (stdout);
+    }
+
   return beg;
 }
 
@@ -3432,6 +3439,37 @@ DEFUN ("treesit-subtree-stat",
     }
 }
 
+DEFUN ("treesit--parser-view",
+       Ftreesit__parser_view,
+       Streesit__parser_view, 1, 1, 0,
+       doc: /* Return the view of PARSER.
+Read buffer like PARSER would into a string and return it.  */)
+  (Lisp_Object parser)
+{
+  const ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
+  const ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+  const ptrdiff_t view_len = visible_end - visible_beg;
+
+  char *str_buf = xzalloc (view_len + 1);
+  uint32_t read = 0;
+  TSPoint pos = { 0 };
+  for (int idx = 0; idx < view_len; idx++)
+    {
+      const char *ch = treesit_read_buffer (XTS_PARSER (parser),
+                                            idx, pos, &read);
+      if (read == 0)
+        {
+          xfree (str_buf);
+          xsignal1 (Qtreesit_error, make_fixnum (idx));
+        }
+      else
+        str_buf[idx] = *ch;
+    }
+  Lisp_Object ret_str = make_string (str_buf, view_len);
+  xfree (str_buf);
+  return ret_str;
+}
+
 #endif /* HAVE_TREE_SITTER */
 
 DEFUN ("treesit-available-p", Ftreesit_available_p,
@@ -3633,6 +3671,8 @@ syms_of_treesit (void)
   defsubr (&Streesit_search_forward);
   defsubr (&Streesit_induce_sparse_tree);
   defsubr (&Streesit_subtree_stat);
+
+  defsubr (&Streesit__parser_view);
 #endif /* HAVE_TREE_SITTER */
   defsubr (&Streesit_available_p);
 }
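The instrumentation in patch.diff amounts to logging from inside the read callback. The hunk header and the call in treesit--parser-view show treesit_read_buffer taking (parser, byte_index, position, &bytes_read), which is the shape of tree-sitter's TSInput.read. The sketch below shows the same idea over a plain in-memory string, so it compiles without Emacs or tree-sitter. `point_t` is a local stand-in for TSPoint, and `struct text_source` and `logging_read` are hypothetical names, not Emacs or tree-sitter API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for tree-sitter's TSPoint (row and column), so the
   sketch compiles without tree_sitter/api.h.  */
typedef struct
{
  uint32_t row;
  uint32_t column;
} point_t;

/* Hypothetical payload: the text being parsed and where to log reads.  */
struct text_source
{
  const char *text;
  uint32_t len;   /* length of TEXT in bytes */
  FILE *log;      /* NULL disables logging */
};

/* Read callback in the shape of TSInput.read: return a pointer to the
   bytes starting at BYTE_INDEX and store how many bytes are available
   there in *BYTES_READ, with 0 meaning end of input.  Each non-empty
   read is logged as "BYTE_INDEX BYTE", like the printf in patch.diff.  */
const char *
logging_read (void *payload, uint32_t byte_index, point_t position,
              uint32_t *bytes_read)
{
  struct text_source *src = payload;
  (void) position;  /* this reader only needs the byte offset */
  assert (src != NULL && bytes_read != NULL);

  if (byte_index >= src->len)
    {
      *bytes_read = 0;  /* past the end: no more input */
      return "";
    }

  const char *beg = src->text + byte_index;
  *bytes_read = src->len - byte_index;
  if (src->log != NULL)
    {
      /* Print the byte as unsigned so UTF-8 bytes >= 0x80 are not
         shown as negative numbers.  */
      fprintf (src->log, "%u %u\n", (unsigned) byte_index,
               (unsigned) (unsigned char) *beg);
      fflush (src->log);
    }
  return beg;
}
```

Setting `log` to `stdout` mimics the patch, which prints to the terminal. One small difference from the patch: printing `*beg` with `%d` shows bytes of non-ASCII UTF-8 sequences as negative numbers wherever plain `char` is signed, which the unsigned cast avoids. The log in output.txt is pure ASCII, so this makes no difference there.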