From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Thu, 22 Jul 2021 09:49:27 -0400
To: Daniel Colascione
Cc: emacs-devel@gnu.org, Stephen Leake, Stefan Monnier, "Perry E. Metzger"
In-Reply-To: <2e5ead63-624e-57bf-feaa-996f078fc782@dancol.org>

> On Jul 21, 2021, at 9:16 PM, Daniel Colascione <dancol@dancol.org> wrote:
>
> On 7/21/21 12:15 PM, Perry E. Metzger wrote:
>> On 7/21/21 12:21, Daniel Colascione wrote:
>>> On 7/21/21 7:43 AM, Perry E. Metzger wrote:
>>>> Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work.
>>>
>>> There is indeed a big literature! I wish there were a bigger literature on *composable* incremental parsers though. IMHO, what we need is an incremental GLR system (yes, GLR is bad worst-case, but it's not a practical concern) that spits out a parse *forest* which we then pare down to a parse tree with ad-hoc syntactic consistency rules. Something like this naturally supports multi-language modes and incorporation of out-of-band semantic information.
>>>
>> Tree sitter handles GLR.
>>
>
> Cool. How does it prune the parse forest?

I’m not an expert, but the author talked about using grammar definition to reject branches in his talk.

Yuan
