From: Yoav Marco
To: Eli Zaretskii
Cc: casouri@gmail.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Thu, 12 May 2022 20:22:30 +0300
Message-ID: <87mtfmk4oi.fsf@gmail.com>
In-reply-to: <83wneqoej5.fsf@gnu.org>
User-Agent: mu4e 1.6.3; emacs 29.0.50

Eli Zaretskii writes:

>> From: Yoav Marco
>> Cc: casouri@gmail.com, emacs-devel@gnu.org
>> Date: Thu, 12 May 2022 19:26:50 +0300
>>
>> How I understand it, if it takes 23.474s to fontify 2332 times without
>> query caching and 0.037s with, then 99.7% of the time is spent in
>> recompiling the same query, or (23.474 - 0.037)/2332 = 10ms per
>> fontification.
>
> Yes, and 10 ms is negligibly short.  So, while the relative speedup is
> very significant, I still don't see any reason for caching the
> queries.
>
> But maybe we should make this discussion more concrete.  Can you show
> the queries and explain how they are produced from the font-lock rules
> (or whatever else they are produced from)?
> How many different queries
> do we expect to have in a garden-variety major mode for a PL, and what
> do they depend on?

So first of all, "query" is kind of an aggregate term, since one query
string/sexp can contain many "query patterns".  I expect most major
modes to have one big query string/sexp, and maybe a handful more that
are optional to users.  treesit allows you to set as many query
strings/sexps as you want for syntax highlighting.

Outside of that, queries are also how packages like
evil-textobj-tree-sitter work, using elisp-tree-sitter, which is built
on a dynamic module, as the backend.

Queries are specific to the parse tree and therefore to the parser.
Most parsers have a queries/highlights.scm file in their repo, and
tree-sitter-langs contains a bunch of these:

> https://github.com/emacs-tree-sitter/tree-sitter-langs/#readme
>
> Highlighting query patterns for a language are in the file
> queries//highlights.scm.  Most of them are intentionally
> different from those from upstream repositories, which are more geared
> towards GitHub’s use cases.  We try to be more consistent with Emacs’s
> existing conventions.  (For some languages, this is WIP, so their
> patterns may look similar to upstream’s.)

The query I used in the benchmarks is tree-sitter-langs's
queries/c/highlights.scm, which is a rather big file.

One thing I only thought of now, and still need to check, is how long
it takes when treesit has to compile and run multiple queries.
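To make the compile-once idea concrete, here is a minimal sketch,
written in Python rather than Emacs Lisp purely for illustration.
compile_query, get_query and fontify_chunks are hypothetical names, not
treesit functions, and the one-pattern query string is a placeholder,
not the real highlights.scm.  It only counts how often the expensive
step would run over 2332 fontifications, and rechecks the
per-fontification figure from the timings quoted above.

```python
# Counter for how many times the (pretend) expensive compilation runs.
compile_calls = 0


def compile_query(language, source):
    """Hypothetical stand-in for compiling a query; counts its invocations."""
    global compile_calls
    compile_calls += 1
    return ("compiled", language, source)


# Cache of compiled queries, keyed by (language, query source).
_query_cache = {}


def get_query(language, source):
    """Return a compiled query, compiling only on the first request."""
    key = (language, source)
    if key not in _query_cache:
        _query_cache[key] = compile_query(language, source)
    return _query_cache[key]


def fontify_chunks(n_chunks, language, source, use_cache):
    """Simulate fontifying n_chunks regions, each needing the compiled query."""
    for _ in range(n_chunks):
        if use_cache:
            get_query(language, source)
        else:
            compile_query(language, source)


HIGHLIGHTS = "(comment) @comment"  # placeholder pattern, not the real file

fontify_chunks(2332, "c", HIGHLIGHTS, use_cache=False)
calls_without_cache = compile_calls

compile_calls = 0
fontify_chunks(2332, "c", HIGHLIGHTS, use_cache=True)
calls_with_cache = compile_calls

# Per-fontification overhead implied by the quoted timings, in milliseconds.
overhead_ms = (23.474 - 0.037) / 2332 * 1000
```

With several query strings per mode, the cache would simply hold one
entry per distinct (language, source) pair, so the compile count would
grow with the number of queries rather than with the number of
fontifications.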
>> Explanation for the whole table:
>>
>> |   |                     | font-lock | TS sexp | TS     | TS query reuse |
>> | 1 | xdisp.c all at once | 12.886    | 0.031   | 0.016  | 0.017          |
>> | 2 | 20 × 512c           | 0.273     | 0.214   | 0.209  | 0.000          |
>> | 3 | 512c to end         | 4m+       | 24.177  | 23.474 | 0.037          |
>>
>> Rows:
>> - Benchmark 1 xdisp.c all at once: run font-lock-fontify-region
>>   on the entire buffer once
>> - Benchmark 2 20 × 512c: fontify the next 512 characters 20 times
>> - Benchmark 3 512c to end: fontify the next 512 characters until the
>>   buffer ends
>
> Thanks.  I think these benchmarks are not very useful.  Representative
> benchmarks I can think of are:
>
>  . the time it takes to visit xdisp.c and display the first window-full
>  . visit xdisp.c, then immediately go to its end
>  . C-v in xdisp.c (repeat many times to see how much a single C-v
>    takes)

Okay, we can try that.  What's the proper way to trigger a "natural
fontification", as would occur in the GUI, without opening an
interactive session?  I'd rather use the groundwork that's actually
used by users, and not get stuff like the JIT chunk size wrong.

In general I'm not too familiar with that part of Emacs; the benchmarks
up to now used with-temp-buffer.  Would that suffice for these new
benchmarks?

>> I thought garbage collection could take care of that.  Is that
>> problematic?
>
> GC can take care of queries that the Lisp program no longer needs, but
> the Lisp program should first decide that it no longer needs them.
> Like stop referencing them in any data structure.

Is that a problem?  If anyone's generating queries and putting them in
lists, that would be a problem whether they're strings or compiled
objects.

- Yoav