From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter api
Date: Fri, 17 Sep 2021 13:30:05 -0700
Message-ID: <1BD3BF1C-C9F6-41CF-8558-4FA3E351346C@gmail.com>
In-Reply-To: <83ilyz8xdl.fsf@gnu.org>
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill, Clément Pit-Claudel,
 Emacs developers, Stefan Monnier, stephen_leake@stephe-leake.org

>
> Why do we need to document the language definitions?  When a Lisp
> programmer defines font-lock and indentation for a programming
> language in the current Emacs, do they necessarily need to consult the
> language grammar?
> […]
> This stuff should be known to TS; the Lisp programmer only needs to be
> aware of the results of lexical and syntactical analysis, in terms of
> their Lisp expressions (Lisp data structures with appropriate symbols
> and fields).

I demonstrated the reason why one needs to consult the source a few
messages back:

> Tree-sitter has no indentation calculation feature. Major mode
> writers genuinely need to read the source of the tree-sitter language
> definition. The source tells us what will be in the syntax tree
> parsed by tree-sitter, and the node names differ from one language to
> another. For example, if I want to fontify type identifiers in C with
> font-lock-type-face, I need to know how is type represented in the
> syntax tree. I look up the source[1], and find
>
>     _type_specifier: $ => choice(
>       $.struct_specifier,
>       $.union_specifier,
>       $.enum_specifier,
>       $.macro_type_specifier,
>       $.sized_type_specifier,
>       $.primitive_type,
>       $._type_identifier
>     ),
>
> This roughly translates to
>
>     _type_specifier := <struct_specifier>
>                      | <union_specifier>
>                      | <enum_specifier>
>                      | <macro_type_specifier>
>                      | <sized_type_specifier>
>                      | <primitive_type>
>                      | <_type_identifier>
>
> in BNF
>
> From this (and some other hint) I know I need to grab all the
> _type_specifier nodes in the syntax tree, find their corresponding
> text in the buffer, and apply font-lock-type-face. And type
> identifiers in another language will be named differently,
> tree-sitter doesn’t provide an abstraction for semantic names in the
> syntax tree.

>> And I want to also point out that as Emacs core developers, we
>> can’t possibly provide a good translation from convention language
>> names to their tree-sitter name (C# -> c-sharp). Maybe we can do a
>> half-decent job, but 1) that won’t cover all available languages,
>> and 2) if there is a new language, we need to wait for the next
>> release to update our translation. It is better for the major mode
>> writers to provide the information on how to translate names.
>
> The database used by the conversion should definitely be extensible.
> But that doesn't mean it should be empty.
>
> Anyway, we've spent enough time on this issue.  If you are still
> unconvinced, feel free to do it your way, and let the chips fall as
> they may.

I’ll do it the way I see fit. You can always comment in the final
review (or something).

Thanks.

Yuan
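
A minimal sketch of the fontification step described in the quoted
message above: walk a syntax tree, pick out the nodes whose type is one
of the alternatives of _type_specifier, and map them back to text
ranges. This is not Emacs or tree-sitter code. The tree below is a
hand-written stand-in, the node shape (type, start, end, children) and
the names tree, C_TYPE_NODES, and collectRanges are assumptions made
for illustration, and the node types that really appear in a parsed
tree have to be checked against the grammar source, as the message
argues.

```javascript
// Hypothetical stand-in for a parsed tree of the C text below.
// A real tree-sitter tree has more node kinds and fields.
const source = "int x; struct S y;";
const tree = {
  type: "translation_unit", start: 0, end: 18,
  children: [
    { type: "declaration", start: 0, end: 6, children: [
      { type: "primitive_type", start: 0, end: 3, children: [] },
      { type: "identifier", start: 4, end: 5, children: [] },
    ] },
    { type: "declaration", start: 7, end: 18, children: [
      { type: "struct_specifier", start: 7, end: 15, children: [] },
      { type: "identifier", start: 16, end: 17, children: [] },
    ] },
  ],
};

// Alternatives copied from the _type_specifier rule quoted in the
// message. Whether each name shows up verbatim in a real tree is an
// assumption that must be verified against the grammar.
const C_TYPE_NODES = new Set([
  "struct_specifier",
  "union_specifier",
  "enum_specifier",
  "macro_type_specifier",
  "sized_type_specifier",
  "primitive_type",
  "_type_identifier",
]);

// Depth-first walk that collects [start, end) ranges of wanted nodes.
function collectRanges(node, wanted, out = []) {
  if (wanted.has(node.type)) {
    out.push([node.start, node.end]);
  }
  for (const child of node.children) {
    collectRanges(child, wanted, out);
  }
  return out;
}

// These are the ranges a major mode would give font-lock-type-face.
const typeRanges = collectRanges(tree, C_TYPE_NODES);
console.log(typeRanges.map(([s, e]) => source.slice(s, e)));
```

The set of node names is the only language-specific part of the sketch,
which is the point made in the message: a mode for another language
would need a different set, taken from that language's grammar.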