From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Tue, 10 May 2022 10:54:53 -0700
Message-ID: <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com>
References: <87y1zabmbt.fsf@gmail.com> <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com> <875ymdwf76.fsf@gmail.com>
To: Yoav Marco
Cc: emacs-devel@gnu.org

> On May 10, 2022, at 8:43 AM, Yoav Marco wrote:
>
> I benchmarked query compilation reuse:
>
> |   |                                      | no reuse (now) | reuse |
> | 1 | Fontify xdisp.c all at once          | 0.01s          | 0.01s |
> | 2 | Fontify 60 next lines of xdisp.c ×10 | 0.10s          | 0.00s |
> | 3 | Fontify 60 next lines till the end   | 6.06s          | 0.01s |
>
>
> The patch to reuse the query is pretty dumb: if the char* for the query
> string didn't change from last time, it reuses the TSQuery object from
> last time instead of calling ts_new_query again. The patch is attached.
>
> The elisp code for the benchmarks is also attached, but I'll give a
> summary here:
>
> The queries are tree-sitter-langs' highlights.scm for C.
>
> Benchmark 1 runs treesit-font-lock-fontify-region once on the entire
> buffer, meaning the query is compiled only once in both cases
>
> Benchmark 2 runs treesit-font-lock-fontify-region on blocks of 60 lines,
> meaning the no reuse version has to compile the query 10 times even
> though nothing changes in the buffer or query.
>
> Benchmark 3 is just 2 done all the way. xdisp.c has 36k lines, so the
> 6.06s is consistent
> (600 lines = 0.10s, multiply by 60 ⇒ 36k lines ~= 6.00s).

I had a look and it’s a pretty sensible benchmark, and it makes sense
that creating the query object takes a lot of time. But could you maybe
run the benchmark under gprof and see what you get? Just curious.

> So, is caching worth it? I don't know. It definetily is if it's possible
> to do it internally without introducing a new object type. But I don't
> think that's possible without making a hash map or a complicated cache
> like the one for compiled regexps that compile_pattern uses in
> search.c.

Yeah, using a single cache would probably result in a lot of misses,
since Emacs doesn’t fontify the whole buffer at once. We don’t
necessarily need to use a hash map. I had a look at search.c and IIUC it
uses an Emacs-wide array of 20 regex caches and links them into a linked
list sorted by most recently used, which doesn’t seem too bad? I think I
can do something similar to that. Tho we might also want to allow users
to pin some “persistent” caches, for example major mode font-locking and
indent queries, as they are guaranteed to be reused a lot and are
generally large (i.e., slow to create). Maybe that’s unnecessary tho.
And I wonder if there is a cheap & easy way to do caching
buffer-locally…

Or maybe add an argument to query-capture that allows the user to
specify whether they want the query to be cached, or assume the user
wants the query to be cached if the query is in string form rather than
in sexp form.

Yuan
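
For illustration, here is a minimal sketch of the search.c-style scheme described above: a fixed array of 20 entries chained into a list ordered by most recent use, keyed by the query string. Every name in it (query_cache_get, struct compiled_query, compile_count, and so on) is hypothetical, and compile_query is only a stand-in for the real tree-sitter query compilation, so this is not the code on feature/tree-sitter and not a copy of compile_pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QUERY_CACHE_SIZE 20	/* same entry count as the regex cache above */

/* Hypothetical stand-in for a compiled TSQuery object.  */
struct compiled_query
{
  size_t source_length;
};

struct query_cache_entry
{
  struct query_cache_entry *next;	/* next less recently used entry */
  char *source;				/* owned copy of the query, NULL if unused */
  struct compiled_query *query;
};

static struct query_cache_entry cache_slots[QUERY_CACHE_SIZE];
static struct query_cache_entry *cache_head;	/* most recently used entry */
static int compile_count;			/* number of (slow) compilations */

static char *
copy_string (const char *s)
{
  size_t size = strlen (s) + 1;
  char *copy = malloc (size);
  assert (copy != NULL);
  memcpy (copy, s, size);
  return copy;
}

/* Stand-in for the expensive step (query compilation in the real thing).  */
static struct compiled_query *
compile_query (const char *source)
{
  struct compiled_query *q = malloc (sizeof *q);
  assert (q != NULL);
  q->source_length = strlen (source);
  compile_count++;
  return q;
}

/* Chain the slots into one list.  Call once, before any lookup.  */
static void
query_cache_init (void)
{
  for (int i = 0; i < QUERY_CACHE_SIZE; i++)
    {
      cache_slots[i].next
	= i + 1 < QUERY_CACHE_SIZE ? &cache_slots[i + 1] : NULL;
      cache_slots[i].source = NULL;
      cache_slots[i].query = NULL;
    }
  cache_head = &cache_slots[0];
  compile_count = 0;
}

/* Return the compiled query for SOURCE, compiling only on a miss.  The
   result stays valid only until its entry is evicted.  */
static struct compiled_query *
query_cache_get (const char *source)
{
  assert (source != NULL && cache_head != NULL);
  struct query_cache_entry **link = &cache_head;
  struct query_cache_entry *entry;
  for (;;)
    {
      entry = *link;
      if (entry->source != NULL && strcmp (entry->source, source) == 0)
	break;			/* hit */
      if (entry->source == NULL || entry->next == NULL)
	{
	  /* Miss: fill the first unused slot, or evict the tail, which is
	     the least recently used entry.  */
	  free (entry->source);
	  free (entry->query);
	  entry->source = copy_string (source);
	  entry->query = compile_query (source);
	  break;
	}
      link = &entry->next;
    }
  /* Move the entry to the front so the list stays sorted by recency.  */
  *link = entry->next;
  entry->next = cache_head;
  cache_head = entry;
  return entry->query;
}
```

With something like this, fontifying in 60-line chunks would hit the cache on every call after the first, as long as fewer than 20 other distinct queries run in between. Pinned “persistent” entries would need an extra flag that the eviction step skips, which this sketch does not attempt.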