From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter api
Date: Wed, 15 Sep 2021 08:56:18 -0700
Message-ID: <1F752923-F357-4A18-B6E2-0120F1B9BD37@gmail.com>
In-Reply-To: <83zgsebc0r.fsf@gnu.org>
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill, Clément Pit-Claudel,
 Emacs developers, Stefan Monnier, stephen_leake@stephe-leake.org
Xref: news.gmane.io gmane.emacs.devel:274748

> On Sep 14, 2021, at 11:15 PM, Eli Zaretskii wrote:
>
>> From: Yuan Fu
>> Date: Tue, 14 Sep 2021 17:50:48 -0700
>> Cc: Tuấn-Anh Nguyễn,
>>  Theodor Thornhill,
>>  Clément Pit-Claudel,
>>  Emacs developers,
>>  Stefan Monnier,
>>  stephen_leake@stephe-leake.org
>>
>>> Almost: there's the (minor) problem of obtaining the "" part by
>>> the major-mode.
>>> I think it would be good to have a utility function
>>> to do that so that major modes won't need to reinvent the wheel, do
>>> the research, etc.
>>
>> That’s where I don’t understand: the major mode is written by major mode writers, who certainly know the correct “” name: they need to read the source of the language definition to use language’s tree-sitter features. You seem to agree on that because you said that this function can be extended by major mode writers.
>
> I don't understand what you are saying here.  Why would major mode
> programmers need to know the correct name?  The TS facilities
> we will have in Emacs will be language-agnostic, right?  For example,
> to correctly indent a line of code, the major mode will call some
> hypothetical tree-sitter-get-indentation function, and that function
> will work in any major mode, provided that the major mode told TS to
> load the support for the programming language of the buffer.  Right?

Now I see why there is confusion. Tree-sitter only provides a “primitive” feature: the concrete syntax tree, and it is not language-agnostic. You don’t get indentation for free, unfortunately. Indenting the program using the information from the syntax tree is our problem. Tree-sitter doesn’t have anything like a tree-sitter-get-indentation function, and there is no mechanical way to provide one: a human needs to read the source of the tree-sitter language definition and figure out how to do it. See below.

> So when the major mode initializes for working with TS, it should tell
> TS which language to load, and why would we request the major mode
> programmer to know the correct name which corresponds to the
> major mode's programming language?  Why would they need to "read the
> source of the language definition to use language’s tree-sitter
> features"?
> The specifics of the TS implementation of, say,
> indentation calculations won't be exposed on the level of the
> indentation facilities provided by TS integration in Emacs, right?

Tree-sitter has no indentation calculation feature. Major mode writers genuinely need to read the source of the tree-sitter language definition. The source tells us what will be in the syntax tree parsed by tree-sitter, and the node names differ from one language to another. For example, if I want to fontify type identifiers in C with font-lock-type-face, I need to know how a type is represented in the syntax tree. I look up the source[1], and find

    _type_specifier: $ => choice(
      $.struct_specifier,
      $.union_specifier,
      $.enum_specifier,
      $.macro_type_specifier,
      $.sized_type_specifier,
      $.primitive_type,
      $._type_identifier
    ),

This roughly translates to

    _type_specifier := <struct_specifier>
                     | <union_specifier>
                     | <enum_specifier>
                     | <macro_type_specifier>
                     | <sized_type_specifier>
                     | <primitive_type>
                     | <_type_identifier>

in BNF.

From this (and some other hints) I know I need to grab all the _type_specifier nodes in the syntax tree, find their corresponding text in the buffer, and apply font-lock-type-face. And type identifiers in another language will be named differently; tree-sitter doesn’t provide an abstraction for semantic names in the syntax tree.

>
> There's some misunderstanding here, and I cannot for the life of me
> figure out where is it.

I was very confused, too, for the past several days, but I think we know the source of it now.

[1] The source of tree-sitter-c is at https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js

Yuan
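
To make the fontification steps above a bit more concrete, here is a rough sketch in JavaScript (the language grammar.js is written in). It is only a toy: the tree is built by hand from plain objects and is not the tree-sitter API, the node shape (type, start, end, children) is made up for illustration, and the set of node names is my assumption about which of the alternatives in the choice(...) above would label nodes in a real tree. The walk collects the text range of every matching node and tags it with font-lock-type-face.

```javascript
// Toy illustration only: plain objects stand in for a parsed syntax tree.
// This is NOT the tree-sitter API; node shapes and names are assumptions.
const source = "int add(size_t n) { return n; }";

// Hypothetical hand-built tree for `source`; offsets are 0-based, end-exclusive.
const tree = {
  type: "function_definition", start: 0, end: 31,
  children: [
    { type: "primitive_type", start: 0, end: 3, children: [] },
    { type: "function_declarator", start: 4, end: 17, children: [
      { type: "identifier", start: 4, end: 7, children: [] },
      { type: "parameter_declaration", start: 8, end: 16, children: [
        { type: "type_identifier", start: 8, end: 14, children: [] },
        { type: "identifier", start: 15, end: 16, children: [] },
      ] },
    ] },
    { type: "compound_statement", start: 18, end: 31, children: [] },
  ],
};

// Node names a C major mode author might read off grammar.js.
// The exact set is an assumption of this sketch and is specific to C.
const C_TYPE_NODES = new Set([
  "primitive_type",
  "type_identifier",
  "sized_type_specifier",
]);

// Depth-first walk: record the range of every node whose type is in
// `wanted`, tagged with `face`. Matched nodes are not descended into.
function collectFaces(node, wanted, face, out = []) {
  if (wanted.has(node.type)) {
    out.push({ start: node.start, end: node.end, face });
    return out;
  }
  for (const child of node.children) {
    collectFaces(child, wanted, face, out);
  }
  return out;
}

const faces = collectFaces(tree, C_TYPE_NODES, "font-lock-type-face");
for (const { start, end, face } of faces) {
  console.log(`${source.slice(start, end)} -> ${face}`);
}
```

Nothing in collectFaces knows about C. All of the language knowledge sits in C_TYPE_NODES, and that set has to be worked out again for every grammar, which is why the major mode writer ends up reading the language definition.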