From: Yoav Marco
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Thu, 12 May 2022 17:16:41 +0300
Message-ID: <87sfpekf6t.fsf@gmail.com>
In-reply-to: <838rr7qqhw.fsf@gnu.org>
References: <87y1zabmbt.fsf@gmail.com>
 <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com>
 <875ymdwf76.fsf@gmail.com>
 <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com>
 <8735hhw75p.fsf@gmail.com> <83czgks4ss.fsf@gnu.org>
 <87wnesuw63.fsf@gmail.com> <83pmkkqhft.fsf@gnu.org>
 <87tu9wukbt.fsf@gmail.com> <83ee10qbk7.fsf@gnu.org>
 <8F6A43D1-D1EA-4602-A245-627DB7960FC2@gmail.com>
 <838rr7qqhw.fsf@gnu.org>
To: Eli Zaretskii
Cc: Yuan Fu, emacs-devel@gnu.org
User-Agent: mu4e 1.6.3; emacs 29.0.50
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Eli Zaretskii writes:

>> From: Yuan Fu
>> Date: Wed, 11 May 2022 13:14:33 -0700
>> Cc: Yoav Marco,
>>  emacs-devel@gnu.org
>>
>> I redid the benchmark, but without his reuse patch, just to see how
>> much time is spent on creating query objects. So fontifying 40 lines
>> for 463 times takes 6.92s (according to Emacs, 7.30s according to the
>> profiler). That counts to 0.0158s per call to font-lock-region, of
>> which 0.0104s is spent on creating the query object. That seems to
>> tell me if we optimize away the query object creation we can make
>> font-locking very very fast?

This is a little confusing: which profiler are we talking about? Is the
difference between Emacs's 6.92s and the profiler's 7.30s because Emacs
benchmarks only the loop, while the profiler measures the entire
execution?
Query compilation doesn't improve startup time, so the conclusion that
only about 10ms per call is spent on query compilation might be wrong.
And it probably is: in my benchmark, reusing the compiled query improved
performance by much more than 16/6 = 266%: it went from 6.06s to 0.01s.

> According to your benchmarks, it is already very fast: 16 msec is a
> negligible time interval. Of course, 40 is a somewhat arbitrary
> number, but to get a less arbitrary one, we should determine it from
> some concrete scenarios, such as the 512-character chunk JIT font-lock
> uses during redisplay, or the number of lines on a typical window
> that's important when one scrolls with C-v/M-v, etc.

It's easy enough to convert the benchmarks to 512-char chunks rather
than 40 lines. See the table a few paragraphs below.

>> font-lock: 88.28s -> 0.1997285067873303 / loop
>
> So we already have an order-of-magnitude speed-up with tree-sitter: we
> go from 200 msec down to 16 msec. Also, 200 msec is above the
> threshold of human perception of a response delay, whereas 16 msec is
> way below that threshold. With such significantly faster font-lock, I
> wouldn't bother caching anything, at least not yet, not unless someone
> comes up with a practical use case where the query-compilation part
> really makes a significant practical difference in terms of absolute
> response times.

> Bottom line: I think the 6-msec speedup (from 16 to 10) in the
> scenario that was used in these benchmarks doesn't justify the
> complexities of caching the queries, given the overall excellent
> performance we get with tree-sitter. Caching is an optimization, and
> in this case it sounds like doing that now would be a premature
> optimization.

As I said, I think the 16→10 msec figure is the wrong conclusion.

>> If we expose "compiled query" we don’t need to cache them either.
>
> Then the Lisp program will have to do that, which is even worse,
> because the problems I described will now have to be solved by Lisp
> application programmers, each time anew.

Will they? They'd just need to compile their queries once, when defining
them or when setting treesit-font-lock-defaults.

Right now the most convenient way to represent queries is as sexps.
Although treesit accepts queries as lists, major modes are encouraged to
stringify them, since the tree-sitter API works with string queries.
This exact discussion occurred when Theodor asked for feedback on the
go-mode.el:

> From: Yuan Fu
> Date: Mon, 2022-05-09 21:10 UTC
> To: Eli Zaretskii
>
> I have some comments below, I haven’t tested the patch yet.
>>
>> +(defvar js-treesit-font-lock-settings-1
>> +  '((javascript
>> +     (
>> +      ((identifier) @font-lock-constant-face
>> +       (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
>
> I would use treesit-expand-query to “expand” the sexp query to string,
> so Emacs don’t need to re-expand it every time treesit-query-capture is
> called. I don’t know how much it speed things up, but hey its free.

Why don't we check how much it speeds things up? Times are in seconds;
"TS sexp" and "TS" are tree-sitter with the query given as a list and as
a string, respectively.

|   |                     | font-lock | TS sexp |     TS | TS query reuse |
|---+---------------------+-----------+---------+--------+----------------|
| 1 | xdisp.c all at once |    12.886 |   0.031 |  0.016 |          0.017 |
| 2 | 20 × 512c           |     0.273 |   0.214 |  0.209 |          0.000 |
| 3 | 512c to end         |       4m+ |  24.177 | 23.474 |          0.037 |

So the time to stringify is negligible compared to query compilation.
Also, I don't know why font-lock took that much time in the last
benchmark.

> or the number of lines on a typical window that's important when one
> scrolls with C-v/M-v, etc.

The following calculation sounds a little silly to me, but here it is
anyway.

xdisp.c has 32.3 chars per line on average, so each 512-char
fontification covers 15.8 lines. My Emacs window can fit 50 lines, so
when jumping to an unfontified buffer location I'll need 4 calls for
fontification.
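
As a sanity check on the arithmetic above, here is a minimal Emacs Lisp
sketch. The function name my/chunks-per-window is made up for
illustration and is not part of treesit or font-lock; the inputs are the
figures quoted in this message (32.3 chars per line, 512-char chunks, a
50-line window).

```elisp
;; Hypothetical helper for illustration only.
(defun my/chunks-per-window (window-lines chunk-size chars-per-line)
  "Return how many CHUNK-SIZE fontification calls cover WINDOW-LINES.
CHARS-PER-LINE is the average line length of the buffer."
  (let ((lines-per-chunk (/ chunk-size (float chars-per-line))))
    ;; A partially covered window still needs a whole extra call.
    (ceiling (/ window-lines lines-per-chunk))))

;; 512 / 32.3 is about 15.8 lines per chunk, and 50 / 15.8 is about 3.2,
;; so a 50-line window needs 4 calls.
(my/chunks-per-window 50 512 32.3)
```

The rounding up is the only subtle step: three chunks cover roughly 47.6
lines, which is just short of the window.
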
That would take, depending on the engine (in seconds):

| font-lock | TS sexp |    TS | TS query reuse |
|-----------+---------+-------+----------------|
|     0.054 |   0.042 | 0.041 |           0.00 |

(The 20 × 512c row, divided by 5 to represent 4 × 512c.)

Improving fontification by 41ms is worth it in my opinion, as long as
it's not complicated. It shouldn't be if we let users compile their
queries before use, though I don't know the downsides of exposing
another type to Lisp. (Currently tree-sitter adds two new types,
treesit-node and treesit-parser.)

- Yoav

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment; filename=tree-sitter-benchmark.el

;;; tree-sitter-benchmark.el -*- lexical-binding: t; -*-

;; run benchmark with
;; emacs -Q --script tree-sitter-benchmark.el [-1] [-2] [-3] [-regexp]

(require 'treesit)
(require 'cc-mode)

(defvar query-type 'list
  "How to save the query. Either `string' or `list'.")

(defcustom fontifying-mode 'treesit
  "Benchmark mode."
  :type '(choice (const treesit) (const regexp)))

;; Overrides the default above: this copy benchmarks regexp font-lock.
(setq fontifying-mode 'regexp)

(defvar c-font-lock-settings-1
  `((c
     ,(with-temp-buffer
        (insert-file-contents-literally "./highlights.scm")
        ;; make capture names map to a face, any face
        (goto-char (point-min))
        (while (re-search-forward "@[a-z.]+" nil t)
          (replace-match "@font-lock-string-face" t))
        (pcase query-type
          ('string
           (buffer-substring (point-min) (point-max)))
          ('list
           ;; Wrap the whole file in parens so `read' returns one list.
           (goto-char (point-min))
           (insert "(")
           (goto-char (point-max))
           (insert ")")
           (goto-char (point-min))
           (read (current-buffer)))
          (_ (user-error "`query-type' must be 'string or 'list")))))))

(defun setup-fontification ()
  (pcase fontifying-mode
    ('treesit
     (treesit-get-parser-create 'c)
     ;; This needs to be non-nil, because reasons
     (unless font-lock-defaults
       (setq font-lock-defaults '(nil t)))
     (setq-local treesit-font-lock-defaults
                 '((c-font-lock-settings-1)))
     (treesit-font-lock-enable)
     (advice-add #'font-lock-default-fontify-region :override #'ignore))
    ('regexp
     (c-mode))))

(defun fontify (beg end)
  (pcase fontifying-mode
    ('treesit
     (font-lock-fontify-region beg end))
    ('regexp
     (font-lock-fontify-region beg end nil))))

(defun buffer-middle ()
  (/ (+ (point-min) (point-max)) 2))

(with-temp-buffer
  (message "Fontification method: %s %s" fontifying-mode query-type)
  (setup-fontification)
  (insert-file-contents "xdisp.c")

  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED), which `apply'
  ;; spreads over the three format directives.
  (apply #'message "Benchmark 1: fontify xdisp.c all at once.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (fontify (point-min) (point-max))))

  (set-text-properties (point-min) (point-max) nil)
  ;; fontify xdisp.c from the middle, since it starts with a comment header of
  ;; 22k chars
  (goto-char (buffer-middle))
  (apply #'message "Benchmark 2: fontify part of xdisp.c, 20 batches of 512 chars.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (dotimes (_ 20)
             (fontify (point) (min (+ 512 (point)) (point-max)))
             (forward-char 512))))
  ;; Copy 10 chunks' worth of text, starting at the middle, to the kill ring.
  (kill-new (buffer-substring (buffer-middle) (+ (* 512 10) (buffer-middle))))

  (set-text-properties (point-min) (point-max) nil)
  (goto-char (point-min))
  (apply #'message "Benchmark 3: fontify all of xdisp.c, 512 chars at a time.\
 took %2.3f, with %d gc runs (meaning %2.3f)"
         (benchmark-run 1
           (while (/= (point-max) (point))
             (fontify (point) (min (+ 512 (point)) (point-max)))
             (goto-char (min (+ 512 (point)) (point-max))))))
  (advice-remove #'font-lock-default-fontify-region #'ignore))
--=-=-=--