From: Lynn Winebarger
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about tree-sitter
Date: Wed, 6 Sep 2023 12:11:24 -0400
To: Yuan Fu
Cc: Augustin Chéneau (BTuin), emacs-devel@gnu.org

On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu wrote:
>
> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) wrote:
> > I have a few questions about tree-sitter.
> >
> > I'm currently developing a grammar for GNU Bison alongside a tree-sitter
> > major mode, it's a work in progress. The grammar is here:
> > , still incomplete but so
> > far able to parse simple files, and the major mode prototype is
> > attached to this message.
> >
> > So, the questions:
> >
> > 1. Is there a way to reload a grammar?
> >
> > Emacs is pretty nice as a playground for testing grammars, but once a
> > grammar is loaded, it won't be loaded again until Emacs restarts (as far
> > as I know).
> > Is it possible to reload a grammar after modifying it?
>
> No, and it’s probably not easy to implement either, since unloading the
> grammar would require Emacs to purge/invalidate all the nodes/queries/parsers
> using that grammar.

Reviewing some generated "parser.c" files and some of the available documentation, it appears that a parser.c file basically defines two things: a lexing function that follows a fixed protocol for consuming and updating a standard lexer-state data structure, and an LR(1) parse table suitable for GLR parsing (i.e., one that allows multiple, conflicting actions per table entry). These, together with definitions of the tokens and grammar symbols, are bundled into a language structure that is passed to the tree-sitter library.
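As a rough illustration of that division of labour, here is a minimal Python sketch, assuming a toy grammar `E -> E '+' n | n`. The names `LexState`, `Language`, `TOY` and `parse` are hypothetical and do not mirror the actual layout of tree-sitter's `TSLexer` or `TSLanguage`; the point is only that the lexer function and the parse tables are plain data, bundled together and handed to a generic driver. The driver is deterministic LR, which suffices because this toy table has no conflicts; a GLR runtime such as tree-sitter's would instead fork the parse stack on conflicting entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (grammar symbol, matched text)


@dataclass
class LexState:
    """Hypothetical lexer state: the input text and the current offset."""
    text: str
    pos: int = 0


def lex(state: LexState) -> Token:
    """Return the next token and advance state.pos past it ("$" = end of input)."""
    text = state.text
    while state.pos < len(text) and text[state.pos].isspace():
        state.pos += 1
    if state.pos == len(text):
        return ("$", "")
    start = state.pos
    if text[start].isdigit():
        while state.pos < len(text) and text[state.pos].isdigit():
            state.pos += 1
        return ("n", text[start:state.pos])
    if text[start] == "+":
        state.pos += 1
        return ("+", "+")
    raise SyntaxError(f"unexpected character {text[start]!r} at offset {start}")


@dataclass
class Language:
    """Hypothetical bundle of a lexer function, grammar rules and LR tables."""
    lex: Callable[[LexState], Token]
    rules: List[Tuple[str, int]]                    # rule number -> (lhs, rhs length)
    action: Dict[Tuple[int, str], Tuple[str, int]]  # (state, lookahead) -> action
    goto: Dict[Tuple[int, str], int]                # (state, nonterminal) -> state


# Toy grammar: rule 0 S' -> E, rule 1 E -> E '+' n, rule 2 E -> n
TOY = Language(
    lex=lex,
    rules=[("S'", 1), ("E", 3), ("E", 1)],
    action={
        (0, "n"): ("shift", 2),
        (1, "+"): ("shift", 3), (1, "$"): ("accept", 0),
        (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
        (3, "n"): ("shift", 4),
        (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
    },
    goto={(0, "E"): 1},
)


def parse(lang: Language, text: str):
    """Deterministic LR driver; returns a nested (symbol, children) tree."""
    lexer = LexState(text)
    states = [0]
    values = []
    sym, tok = lang.lex(lexer)
    while True:
        act = lang.action.get((states[-1], sym))
        if act is None:
            raise SyntaxError(f"unexpected {sym!r} in state {states[-1]}")
        kind, arg = act
        if kind == "shift":
            states.append(arg)
            values.append(tok)
            sym, tok = lang.lex(lexer)
        elif kind == "reduce":
            lhs, n = lang.rules[arg]
            children = values[-n:]
            del states[-n:]
            del values[-n:]
            states.append(lang.goto[(states[-1], lhs)])
            values.append((lhs, children))
        else:  # "accept"
            return values[-1]
```

For example, `parse(TOY, "1 + 2")` returns `("E", [("E", ["1"]), "+", "2"])`. Nothing in `parse` is specific to the toy grammar; swapping in a different `Language` value changes the language parsed.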
LALR(1) tables are essentially compressed LR(1) tables (LR(1) states with identical LR(0) cores are merged), and Emacs already has code to calculate such tables directly in Elisp (the Wisent parser generator used by CEDET/Semantic). Therefore, given functionality to translate Elisp data into the raw C structures, we should be able to dynamically create language data structures and pass them to the tree-sitter library to create a parser. To avoid going through a C compiler entirely, we would also need a table-driven lexer framework in place of the generated lexer in the C file. The other novel features of tree-sitter parsers appear to be implemented in the parser runtime, not in the table calculation.

I've implemented LALR(1) parser generators two or three times in the last couple of decades; this might be a fun project for me while I am unambiguously able to contribute to GNU Emacs.

Regards,
Lynn
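To make the "compressed LR(1)" remark concrete, the following Python sketch builds the canonical LR(1) item sets for the textbook grammar `S -> C C`, `C -> c C | d` and then merges states with identical LR(0) cores, taking the union of their lookaheads; this is the classic way of deriving LALR(1) states from LR(1) ones. It assumes a grammar without empty productions to keep the FIRST-set computation short, and all function names are illustrative. Practical generators (Bison, for instance, by default) compute LALR(1) lookaheads without first building the full LR(1) collection, so this shows how the two kinds of table relate, not an efficient implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, str]  # (lhs, rhs, dot position, lookahead)

# Textbook grammar; "S'" is the augmented start symbol, "$" marks end of input.
GRAMMAR: Grammar = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    """FIRST sets of the nonterminals; assumes no empty productions."""
    first: Dict[str, Set[str]] = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, prods in g.items():
            for rhs in prods:
                x = rhs[0]
                new = first[x] if x in g else {x}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def closure(g: Grammar, first: Dict[str, Set[str]], items) -> FrozenSet[Item]:
    """LR(1) closure of a set of items."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in g:
            rest = rhs[dot + 1:]
            # FIRST(rest la): without empty productions, FIRST(rest[0]) or {la}.
            if not rest:
                lookaheads = {la}
            elif rest[0] in g:
                lookaheads = first[rest[0]]
            else:
                lookaheads = {rest[0]}
            for prod in g[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def lr1_states(g: Grammar, start: str = "S'"):
    """Canonical LR(1) collection and its goto edges."""
    first = first_sets(g)
    symbols = sorted({s for prods in g.values() for rhs in prods for s in rhs})
    states = [closure(g, first, {(start, g[start][0], 0, "$")})]
    index = {states[0]: 0}
    edges: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(states):
        for x in symbols:
            moved = {(lhs, rhs, d + 1, la) for (lhs, rhs, d, la) in states[i]
                     if d < len(rhs) and rhs[d] == x}
            if not moved:
                continue
            target = closure(g, first, moved)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            edges[(i, x)] = index[target]
        i += 1
    return states, edges


def core(state: FrozenSet[Item]) -> FrozenSet[Tuple[str, Tuple[str, ...], int]]:
    """The LR(0) core of a state: its items with lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)


def lalr_merge(states, edges):
    """Merge LR(1) states with equal cores, unioning their lookaheads."""
    slot: Dict[FrozenSet, int] = {}
    mapping: List[int] = []  # canonical state index -> merged state index
    merged: List[Set[Item]] = []
    for s in states:
        c = core(s)
        if c not in slot:
            slot[c] = len(merged)
            merged.append(set())
        mapping.append(slot[c])
        merged[slot[c]] |= s
    # Goto targets of equal-core states also have equal cores,
    # so remapping the edges cannot produce conflicting targets.
    new_edges = {(mapping[i], x): mapping[j] for (i, x), j in edges.items()}
    return [frozenset(m) for m in merged], new_edges
```

For this grammar the canonical collection has 10 states and the merged one has 7, matching the usual textbook result; the merged state for `C -> d .` carries the lookaheads `c`, `d` and `$`, which in the canonical collection were split across two states.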