From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Theodor Thornhill Newsgroups: gmane.emacs.devel Subject: Re: Tree-sitter indentation for js-mode & cc-mode Date: Thu, 27 Oct 2022 11:11:01 +0200 Message-ID: <87k04lljh6.fsf@thornhill.no> References: <9AF8BFDC-C9A2-4AE5-A8D2-E6AA05DA3C91@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="23396"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Stefan Monnier To: Yuan Fu , emacs-devel Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Oct 27 11:12:57 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1onywa-0005op-9z for ged-emacs-devel@m.gmane-mx.org; Thu, 27 Oct 2022 11:12:56 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1onyvC-0000pf-Ud; Thu, 27 Oct 2022 05:11:31 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1onyvB-0000ZO-JZ for emacs-devel@gnu.org; Thu, 27 Oct 2022 05:11:29 -0400 Original-Received: from out2.migadu.com ([2001:41d0:2:aacc::]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1onyuu-0004Ph-V0 for emacs-devel@gnu.org; Thu, 27 Oct 2022 05:11:29 -0400 X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=thornhill.no; s=key1; t=1666861863; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=3MeBK0sn7Z9O2cziWqEuPJthJPfHEG6rAwED57C5ROg=; b=X8PMPnzqiUyAwTPPFzIFz3SxQS+rzl1YIuozBqCLI9S8ZsWds++Tz7MzpecEcSOCMljorb u+oAho1sJcFdKAVF0j4OEmqU/uAeocilb67AiPtMfQWuqLCPVu86zZZ6lWrCJyevDvIaol uD7o6tBHf4ItkVxwJBiYUUmRLy4+BG+n2MG95y/Z+tx3X4bxW/jPMOXpifkexP22HD7Nap vdqrtyKWo01AiSDXliwtp+Kf0HNovoP0lO2iG3L2/Mf8HG6iwJqD27hoYIjO1HNnxlOviB lPr0jCMuigVsa449Xms8Gv20GFKCNfGRNkwyzV8cVlU4+cwEI7+s99CqQ7xR9w== In-Reply-To: <9AF8BFDC-C9A2-4AE5-A8D2-E6AA05DA3C91@gmail.com> X-Migadu-Flow: FLOW_OUT Received-SPF: pass client-ip=2001:41d0:2:aacc::; envelope-from=theo@thornhill.no; helo=out2.migadu.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: "Emacs-devel" Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:298608 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi Yuan! > I did some work to allow tree-sitter indentation engine to plug in to > c-offset-alist. Currently in a tree-sitter indent rule, we have > > (MATCHER ANCHOR OFFSET) > > OFFSET is normally an integer, but now it can also be a syntax symbol > recognized by cc-mode=E2=80=99s indentation engine. In that case, tree-si= tter > indent calculates the indent using c-calc-offset, passing the syntax > symbol and anchor position to it, and c-calc-offset will give us the > integer offset based on c-offset-alist. > This is cool, but do we really want/need this? I mean, now we're really binding these implementations together and allowing all the legacy of CC mode to blend in. We also need knowledge of how CC mode names their syntactic definitions. IMO one of the big selling points of tree sitter is that you can look at other editors implementation and get inspired immediately. Now we need deep knowledge of cc mode, don't we? Also, why would we want cc mode to calculate this for us? I see what you're trying to do, but _I_ think this is a step in the wrong direction. > I=E2=80=99ve written indent rules for js-mode, they are in > js-treesit-cc-indent-rules. Overall it works pretty well. Theo, could > you give it a try? From my testing it is already an improvement from > the original rules. I didn=E2=80=99t finish the JSX part and just copied = your > original rules for JSX. In the future I can probably port that to > cc-style too. > I don't think this is better, for example: The old variant renders this snippet correctly: ``` const fooClient =3D new Foo({ bucket: process.env.foo, region: process.env.foo, }); ``` But the new one renders it like this: ``` const fooClient =3D new Foo({ bucket: process.env.foo, region: process.env.foo, }); ``` I know this is a matter of tweaking, but it immediately makes me question the reasoning to blend them. In addition I profiled indenting a 50k lines js file with messed up indentation, and received some surprising results. The pure cc mode variant is slow, but MUCH faster than tree-sitter. It seems we are looking up way to much the root of the tree, but you know the internals here better than me. Is this something we can optimize away? See the attached report at the bottom. > I also added imenu support for js-mode and ts-mode, and navigation for > python-mode. > Very cool - it seems to work pretty nicely! Anyways - can we please revisit the idea that we init and/or use cc mode in tandem with tree-sitter? I know we want "feature parity", but I think we lose too much of the simplicity gained by adding in the old complexity. My prediction for the future is that this will result in numerous bug reports where it's hard to know whether this is a fix for cc mode or treesitter. And in the end people _will_ just skip these modes altogether and put simpler ones in (m)elpa that only uses treesit, to avoid this. The cc mode won't go away at all, for the people that considers that superior. We can still use the treesit-settings as a centralized variable to get most, if not all of the "auto-enabling" benefits by just lifting it up: ``` ;;.... (cond ;; Tree-sitter. ((treesit-ready-p 'js-mode 'javascript) ;; init all treesitter relevant stuff - can add in _some_ other ;; non-cc-mode settinigs, such as comment-start, etc above this. ;; We don't need the cache, detection of js-jsx or any of the ;; before-change-functions =20=20 (treesit-major-mode-setup)) ;; Elisp. (t ;; enable in normal cc mode stuff ))) ``` This way other hypothetical tree-sitter-v2 in the future is just a simple cond, and no need to worry. If I'm missing something important here, please let me know, but I _really_ don't understand the reason for merging these implementations. Anyway, thanks for your continued hard work! Theo --=-=-= Content-Type: text/plain Content-Disposition: attachment; filename=indent-report.txt 528085 99% - command-execute 527969 99% - funcall-interactively 527969 99% - execute-extended-command 527969 99% - command-execute 527969 99% - funcall-interactively 527969 99% - foo2 527969 99% - indent-for-tab-command 527969 99% - indent-region 527965 99% - indent-region-line-by-line 527901 99% - indent-according-to-mode 527893 99% - treesit-indent 527825 99% - let* 482649 91% - cond 482636 91% - treesit-node-at 482512 91% - let* 482496 91% - if 478981 90% - treesit-buffer-root-node 478969 90% - let* 478957 90% if 8 0% - and 8 0% - or 8 0% - if 8 0% - or 4 0% car 3503 0% - progn 3495 0% - while 3483 0% setq 4 0% if 4 0% null 44354 8% - let* 39861 7% - funcall 39845 7% - treesit-simple-indent 39845 7% - if 39837 7% - let* 39753 7% - let* 39741 7% - while 39729 7% - and 39701 7% - progn 39633 7% - if 37372 7% - progn 37360 7% - progn 37348 7% - setq 37344 7% - let 37332 7% - treesit--simple-indent-eval 37320 7% - cond 37320 7% - apply 36812 6% - # 36808 6% - save-excursion 36688 6% - backward-up-list 36680 6% - up-list 36 0% syntax-ppss 112 0% - goto-char 112 0% - treesit-node-start 108 0% - treesit-node-at 108 0% - let* 108 0% - if 84 0% - progn 84 0% - while 84 0% setq 24 0% - treesit-buffer-root-node 24 0% - let* 24 0% if 416 0% - # 132 0% - save-excursion 4 0% goto-char 52 0% - mapcar 36 0% - treesit--simple-indent-eval 24 0% cond 4 0% - treesit--simple-indent-eval 4 0% cond 4 0% cons 2229 0% - treesit--simple-indent-eval 2173 0% - cond 2129 0% - apply 1320 0% - treesit--simple-indent-eval 1320 0% - cond 1292 0% - apply 1172 0% - # 824 0% - ` 592 0% - # 548 0% - backquote-process 424 0% - backquote-process 188 0% - backquote-process 64 0% - backquote-process 4 0% backquote-listify 8 0% backquote-listify 184 0% - list 80 0% - backquote-list* 24 0% cons 48 0% - treesit--simple-indent-eval 44 0% cond 44 0% - mapcar 28 0% - treesit--simple-indent-eval 28 0% - cond 4 0% and 8 0% and 204 0% - mapcar 120 0% - treesit--simple-indent-eval 84 0% - cond 4 0% and 5 0% - # 5 0% - string-match-p 5 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% # 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% # 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% # 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% string-match-p 4 0% - # 4 0% - string-match-p 4 0% or 4 0% - # 4 0% - string-match-p 4 0% or 20 0% - and 4 0% not 28 0% setq 4 0% treesit-node-language 4073 0% - progn 4053 0% - let* 4049 0% - let 4041 0% - if 4033 0% - let 3721 0% - if 1611 0% - indent-line-to 620 0% jit-lock-after-change 38 0% - # 38 0% - filter-buffer-substring 38 0% - buffer-substring--filter 38 0% - # 38 0% apply 16 0% syntax-ppss-flush-cache 8 0% - js--flush-caches 4 0% setq 4 0% undo-auto--undoable-change 300 0% - + 132 0% save-excursion 404 0% - cond 116 0% - treesit-node-at 116 0% - let* 116 0% - if 84 0% - progn 84 0% - while 80 0% setq 582 0% - treesit-parent-while 564 0% - let 564 0% - while 528 0% - progn 520 0% setq 32 0% - and 28 0% - funcall 4 0% # 40 0% save-excursion 20 0% - treesit-update-ranges 4 0% let 116 0% - byte-code 116 0% - read-extended-command 116 0% - read-extended-command-1 116 0% - completing-read-default 116 0% - apply 116 0% - vertico--advice 92 0% - # 58 0% - vertico--exhibit 40 0% - vertico--update 40 0% - vertico--recompute 32 0% - vertico--all-completions 32 0% - completion-all-completions 32 0% - completion--nth-completion 32 0% - completion--some 32 0% - # 32 0% - completion-basic-all-completions 32 0% - completion-pcm--all-completions 32 0% - # 32 0% - complete-with-action 20 0% - all-completions 8 0% - # 4 0% # 4 0% vertico-sort-history-length-alpha 18 0% - vertico--arrange-candidates 15 0% - vertico--affixate 15 0% read-extended-command--affixation 3 0% - # 3 0% - completion-hilit-commonality 3 0% - # 3 0% font-lock-prepend-text-property 5 0% - timer-event-handler 2 0% - apply 2 0% # 1189 0% + ... 95 0% + timer-event-handler --=-=-=--