From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about tree-sitter
Date: Thu, 7 Sep 2023 16:42:44 -0700
Message-ID: <581816B0-2F41-42C9-B49A-70F7DD800212@gmail.com>
References: <12fe5895-7d34-4f3e-b1cf-aa133b718c24@mailo.com>
To: Lynn Winebarger
Cc: "Augustin Chéneau (BTuin)", emacs-devel@gnu.org

> On Sep 6, 2023, at 9:11 AM, Lynn Winebarger wrote:
>
> On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu wrote:
>>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) wrote:
>>> I have a few questions about tree-sitter.
>>>
>>> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
>>> major mode, it's a work in progress. The grammar is here:
>>> , still incomplete but so
>>> far able to parse simple files, and the major mode prototype is
>>> attached to this message.
>>>
>>> So, the questions:
>>>
>>> 1. Is there a way to reload a grammar?
>>>
>>> Emacs is pretty nice as a playground for testing grammars, but once a
>>> grammar is loaded, it won't be loaded again until Emacs restarts (as far
>>> as I know).
>>> Is it possible to reload a grammar after modifying it?
>>
>> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalidate all the node/query/parsers using that grammar.
>
> Reviewing some generated "parser.c" files, and some of the available
> documentation, it appears the parser.c file basically creates a lexing
> function that adheres to a certain protocol in terms of
> producing/consuming a standard lexer state data structure, and an
> LR(1) parser table suitable for GLR parsing (i.e. allows ambiguous
> actions). These and definitions of the tokens and grammar symbols are
> bundled up in a language structure passed to the tree-sitter library.
> LALR(1) tables are essentially simplified/compressed LR(1) tables, and
> emacs has code to calculate such tables directly in elisp.
> Therefore, given functionality to translate elisp data into the raw C
> structures, we should be able to dynamically create language data
> structures to pass to the tree-sitter library to create a library.
> We would also need a table driven lexer framework in place of the
> generated lexer in the C file to completely avoid going through a C
> compiler.
> The other novel features of tree-sitter parsers appear to be
> implemented in the parser runtime, not in the table calculation.
>
> I've implemented LALR(1) parser generators two or three times in the
> last couple of decades, this might be a fun project for me while I am
> unambiguously able to contribute to GNU Emacs.

That’ll be great. But note that the parser structure has escape hatches: certain things can be implemented by arbitrary C functions. Also, tree-sitter allows grammars to use custom scanners [1].

[1] https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners

Yuan
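
To illustrate the escape-hatch point above, here is a minimal sketch in C. It is not tree-sitter's real language structure: DemoLanguage, demo_lex, scan_prologue and the token names are all hypothetical, and the choice to try the hook only after the tables fail is an assumption of the sketch. It shows a lexer whose transition and accept tables are plain data (the part that could plausibly be built from Lisp tables) next to a function pointer that has to be compiled code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a generated language structure.
   None of these names come from tree-sitter's headers. */
enum { STATE_COUNT = 3, CLASS_COUNT = 3 };
enum { CLASS_DIGIT, CLASS_ALPHA, CLASS_OTHER };
enum { TOK_NONE, TOK_NUMBER, TOK_IDENT, TOK_EXTERNAL };

typedef struct {
    /* Pure data: a DFA that could in principle be filled in at run time. */
    signed char transitions[STATE_COUNT][CLASS_COUNT]; /* -1 = no move */
    int accept[STATE_COUNT];                           /* token per state */
    /* Escape hatch: arbitrary compiled code, not expressible as a table. */
    bool (*external_scan)(const char *input, size_t *length);
} DemoLanguage;

static int char_class(char c) {
    if (c >= '0' && c <= '9') return CLASS_DIGIT;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return CLASS_ALPHA;
    return CLASS_OTHER;
}

/* Hand-written hook: recognizes a Bison-style "%{ ... %}" block.
   Sets *length only when it succeeds. */
static bool scan_prologue(const char *input, size_t *length) {
    if (input[0] != '%' || input[1] != '{') return false;
    for (size_t i = 2; input[i] != '\0'; i++) {
        if (input[i] == '%' && input[i + 1] == '}') {
            *length = i + 2;
            return true;
        }
    }
    return false;
}

/* Longest-match scan driven only by the tables; if the tables find
   nothing, the external hook (when present) gets a chance. */
int demo_lex(const DemoLanguage *lang, const char *input, size_t *length) {
    int state = 0, token = TOK_NONE;
    size_t i = 0;
    *length = 0;
    while (input[i] != '\0') {
        int next = lang->transitions[state][char_class(input[i])];
        if (next < 0) break;
        state = next;
        i++;
        if (lang->accept[state] != TOK_NONE) {
            token = lang->accept[state];
            *length = i;
        }
    }
    if (token == TOK_NONE && lang->external_scan != NULL
        && lang->external_scan(input, length))
        token = TOK_EXTERNAL;
    return token;
}

/* State 0 = start, 1 = inside a number, 2 = inside an identifier. */
static const DemoLanguage demo_language = {
    .transitions = { { 1, 2, -1 }, { 1, -1, -1 }, { 2, 2, -1 } },
    .accept = { TOK_NONE, TOK_NUMBER, TOK_IDENT },
    .external_scan = scan_prologue,
};
```

Calling demo_lex(&demo_language, "123abc", &n) uses only the tables, while an input starting with "%{" falls through to the hook. The tables could be generated without a C compiler; the hook could not, which is the limitation for grammars that rely on such functions.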