From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Lynn Winebarger Newsgroups: gmane.emacs.devel Subject: Re: Using the wisent parser-generator, as it creates faster parsers Date: Mon, 26 Dec 2022 08:54:33 -0500 Message-ID: References: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000004a423a05f0bb7a5e" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="26804"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Eric Ludlam , emacs-devel To: ambulajan@gmail.com Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon Dec 26 14:55:41 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1p9nx7-0006qZ-G4 for ged-emacs-devel@m.gmane-mx.org; Mon, 26 Dec 2022 14:55:41 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1p9nwP-0004wu-4y; Mon, 26 Dec 2022 08:54:57 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p9nwI-0004wk-7u for emacs-devel@gnu.org; Mon, 26 Dec 2022 08:54:50 -0500 Original-Received: from mail-pj1-x102c.google.com ([2607:f8b0:4864:20::102c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1p9nwF-0005rc-2v for emacs-devel@gnu.org; Mon, 26 Dec 2022 08:54:48 -0500 Original-Received: by mail-pj1-x102c.google.com with SMTP id gt4so10728196pjb.1 for ; Mon, 26 Dec 2022 05:54:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=E0HQKP+znc/NquZMd3G8xiRuHz6LSAmQaxXWAb2UOgk=; b=Sod1KeAVJKYSO5FxNTkr8ePcUwtea2bYmtGK59evW3FGHCz/B8o1nXolWeOwYINSRA QvgRdGSjdXEyYT/rhrQ9fUT1zrOtS3VYlkKMLcE8kUufnORUQhrnWzSkvePRZDqhGdhu fnnkp1X+JQaKeWSkU496PLc5PbE0RbenJUqZ24nCldK4pKzBDN5FK6Lg1W9XBTve0tE0 yjVU58pn8R+tcJgw+6qXzk+jVp18TOozCryZz3O4J2xeOJX2THtacKkE96934ELSh7qB uMCqDcZrczb5dXpgkKOPVc0HcL2yOjDfBbXD/FGScWKop+KxlR4Re9Ao6PGKmf03f984 SSYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=E0HQKP+znc/NquZMd3G8xiRuHz6LSAmQaxXWAb2UOgk=; b=MbkIS/To+b6523FsnoxMv/R6sP7fAz6o121DWz+2iZofgnqWTkStqu2eFKrATrzX/r x6exQfqng8PY/gNYwzgDd1qAnVyDB/wquL2PfKtPWCba2h3X9DW9cPD590gB13FbIWfj JAzqrrg6KfX51wSCJJUCU3rwWguqGD93L6hE9KT4Egt7Vd3kEN9IIO8mi79zZM41pty9 FpKpf0Lp1tr/leEPII2t2Qm1RZUtx+6qxPgyHJsTGrlJ19Mrc6MhQEDZUjhZX4flO6g1 EsjyZUdsNOCCqq/IjRNooeG3iAPOL317HX/VYxeAedmIGz9LhZIV7hH2bR7dV1KTdto2 2e+w== X-Gm-Message-State: AFqh2kqxNb41jF56My0Fi8rF+n7Lnsjmq4x10rRQ8hdjYNqRik+BoD3e sv2/q6LqYsOaMUSNhtxn8DRVwhujMxS2GiGKrZE= X-Google-Smtp-Source: AMrXdXtvJQLKpta2SmdmhYJ3ODAaXrTwaO58eMKRwNxJTOemOOnXfEupLiYTTFCxCMkIbVOFLBy1rKLlGtxB9aR4kOE= X-Received: by 2002:a17:902:c40b:b0:190:f66c:13aa with SMTP id k11-20020a170902c40b00b00190f66c13aamr1129444plk.121.1672062884788; Mon, 26 Dec 2022 05:54:44 -0800 (PST) In-Reply-To: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com> Received-SPF: pass client-ip=2607:f8b0:4864:20::102c; envelope-from=owinebar@gmail.com; helo=mail-pj1-x102c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:301931 Archived-At: --0000000000004a423a05f0bb7a5e Content-Type: text/plain; charset="UTF-8" On Sun, Dec 25, 2022, 11:03 PM wrote: > A follow-up note after reading earlier discussion with a subject "Re: > Why tree-sitter instead of Semantic?". > > > If you want to build a parser that sits on the lexer, there is more > > to it, as I recommend using the wisent parser-generator, as it > > creates faster parsers. In the wisent .wy files, you define %tokens > > using a bison-like syntax, and that in turns builds analyzers that > > you include in your lexer. > > I doubt that the problem of Semantic parsers was ever in Elisp being > slow for that purpose. > For me it was writing a LALR parser. Everything else was logical - > lexers, SemanticDB, etc. But a grammar in development that stalls at > every step with shift/reduce and reduce/reduce conflicts is like > pushing against a wall. LALR algorithm never meant to be an interface > for a developer, rather a workaround for slow CPUs with small memory > systems of the 1980s. > I don't agree - conflicts detected by the parser generator indicate that distinct ASTs map to the same text, at least according to the reduction method being used. I like using LR derivations (including ones produced by LALR) because of the bottom-up recognition of grammar symbols. I've written an Earley parser, and so far it looks in the same > performance category as LALR(wisent) written in Elisp. > Earley parser works with any grammar you throw at it. No conflicts. > Each token gets full context of rules that are in effect at that point. > Seems like there's no need to build parse trees, a list of states- > tokens can be thought of as a flattened parse tree. > Though there's a lot of testing for this concept ahead. > The above is not to discourage adding Earley parsing to the toolkit. However, just defers the to resolution of ambiguities to your code rather than by refining your grammar. For specifying a programming language, this seems like inviting difficult to debug cases to me. Lynn --0000000000004a423a05f0bb7a5e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sun, Dec 25, 2022, 11:03 PM <ambulajan@gmail.com> wrote:
A follow-up note after reading earlier discussion with a subj= ect "Re:
Why tree-sitter instead of Semantic?".

> If you want to build a parser that sits on the lexer, there is more > to it, as I recommend using the wisent parser-generator, as it
> creates faster parsers. In the wisent .wy files, you define %tokens > using a bison-like syntax, and that in turns builds analyzers that
> you include in your lexer.

I doubt that the problem of Semantic parsers was ever in Elisp being
slow for that purpose.
For me it was writing a LALR parser. Everything else was logical -
lexers, SemanticDB, etc. But a grammar in development that stalls at
every step with shift/reduce and reduce/reduce conflicts is like
pushing against a wall. LALR algorithm never meant to be an interface
for a developer, rather a workaround for slow CPUs with small memory
systems of the 1980s.

I don't agree - conflicts detected by the parser g= enerator indicate that distinct ASTs map to the same text, at least accordi= ng to the reduction method being used.=C2=A0 I like using LR derivations (i= ncluding ones produced by LALR) because of the bottom-up recognition of gra= mmar symbols.=C2=A0


=
I've written an Earley parser, and so far it looks in the same
performance category as LALR(wisent) written in Elisp.
Earley parser works with any grammar you throw at it. No conflicts.
Each token gets full context of rules that are in effect at that point.
Seems like there's no need to build parse trees, a list of states-
tokens can be thought of as a flattened parse tree.
Though there's a lot of testing for this concept ahead.


The above is not to discourage adding Earley parsing to the too= lkit.=C2=A0 However, just defers the to resolution of ambiguities to your c= ode rather than by refining your grammar. For specifying a programming lan= guage, this seems like inviting difficult to debug cases to me.=C2=A0=C2=A0=

Lynn

--0000000000004a423a05f0bb7a5e--