From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Lynn Winebarger <owinebar@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Using the wisent parser-generator, as it creates faster parsers
Date: Mon, 26 Dec 2022 08:54:33 -0500
Message-ID: <CAM=F=bAw9rJb36kkxAweJ=1ieeWCNMav73y_zT+Zf4auAtyr4g@mail.gmail.com>
References: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000004a423a05f0bb7a5e"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="26804"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Eric Ludlam <ericludlam@gmail.com>, emacs-devel <emacs-devel@gnu.org>
To: ambulajan@gmail.com
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Mon Dec 26 14:55:41 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1p9nx7-0006qZ-G4
	for ged-emacs-devel@m.gmane-mx.org; Mon, 26 Dec 2022 14:55:41 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces@gnu.org>)
	id 1p9nwP-0004wu-4y; Mon, 26 Dec 2022 08:54:57 -0500
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1p9nwI-0004wk-7u
 for emacs-devel@gnu.org; Mon, 26 Dec 2022 08:54:50 -0500
Original-Received: from mail-pj1-x102c.google.com ([2607:f8b0:4864:20::102c])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <owinebar@gmail.com>)
 id 1p9nwF-0005rc-2v
 for emacs-devel@gnu.org; Mon, 26 Dec 2022 08:54:48 -0500
Original-Received: by mail-pj1-x102c.google.com with SMTP id gt4so10728196pjb.1
 for <emacs-devel@gnu.org>; Mon, 26 Dec 2022 05:54:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=E0HQKP+znc/NquZMd3G8xiRuHz6LSAmQaxXWAb2UOgk=;
 b=Sod1KeAVJKYSO5FxNTkr8ePcUwtea2bYmtGK59evW3FGHCz/B8o1nXolWeOwYINSRA
 QvgRdGSjdXEyYT/rhrQ9fUT1zrOtS3VYlkKMLcE8kUufnORUQhrnWzSkvePRZDqhGdhu
 fnnkp1X+JQaKeWSkU496PLc5PbE0RbenJUqZ24nCldK4pKzBDN5FK6Lg1W9XBTve0tE0
 yjVU58pn8R+tcJgw+6qXzk+jVp18TOozCryZz3O4J2xeOJX2THtacKkE96934ELSh7qB
 uMCqDcZrczb5dXpgkKOPVc0HcL2yOjDfBbXD/FGScWKop+KxlR4Re9Ao6PGKmf03f984
 SSYA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=E0HQKP+znc/NquZMd3G8xiRuHz6LSAmQaxXWAb2UOgk=;
 b=MbkIS/To+b6523FsnoxMv/R6sP7fAz6o121DWz+2iZofgnqWTkStqu2eFKrATrzX/r
 x6exQfqng8PY/gNYwzgDd1qAnVyDB/wquL2PfKtPWCba2h3X9DW9cPD590gB13FbIWfj
 JAzqrrg6KfX51wSCJJUCU3rwWguqGD93L6hE9KT4Egt7Vd3kEN9IIO8mi79zZM41pty9
 FpKpf0Lp1tr/leEPII2t2Qm1RZUtx+6qxPgyHJsTGrlJ19Mrc6MhQEDZUjhZX4flO6g1
 EsjyZUdsNOCCqq/IjRNooeG3iAPOL317HX/VYxeAedmIGz9LhZIV7hH2bR7dV1KTdto2
 2e+w==
X-Gm-Message-State: AFqh2kqxNb41jF56My0Fi8rF+n7Lnsjmq4x10rRQ8hdjYNqRik+BoD3e
 sv2/q6LqYsOaMUSNhtxn8DRVwhujMxS2GiGKrZE=
X-Google-Smtp-Source: AMrXdXtvJQLKpta2SmdmhYJ3ODAaXrTwaO58eMKRwNxJTOemOOnXfEupLiYTTFCxCMkIbVOFLBy1rKLlGtxB9aR4kOE=
X-Received: by 2002:a17:902:c40b:b0:190:f66c:13aa with SMTP id
 k11-20020a170902c40b00b00190f66c13aamr1129444plk.121.1672062884788; Mon, 26
 Dec 2022 05:54:44 -0800 (PST)
In-Reply-To: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com>
Received-SPF: pass client-ip=2607:f8b0:4864:20::102c;
 envelope-from=owinebar@gmail.com; helo=mail-pj1-x102c.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.devel:301931
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/301931>

--0000000000004a423a05f0bb7a5e
Content-Type: text/plain; charset="UTF-8"

On Sun, Dec 25, 2022, 11:03 PM <ambulajan@gmail.com> wrote:

> A follow-up note after reading earlier discussion with a subject "Re:
> Why tree-sitter instead of Semantic?".
>
> > If you want to build a parser that sits on the lexer, there is more
> > to it, as I recommend using the wisent parser-generator, as it
> > creates faster parsers. In the wisent .wy files, you define %tokens
> > using a bison-like syntax, and that in turns builds analyzers that
> > you include in your lexer.
>
> I doubt that the problem of Semantic parsers was ever in Elisp being
> slow for that purpose.
> For me it was writing a LALR parser. Everything else was logical -
> lexers, SemanticDB, etc. But a grammar in development that stalls at
> every step with shift/reduce and reduce/reduce conflicts is like
> pushing against a wall. LALR algorithm never meant to be an interface
> for a developer, rather a workaround for slow CPUs with small memory
> systems of the 1980s.
>

I don't agree - conflicts detected by the parser generator indicate that
distinct ASTs map to the same text, at least according to the reduction
method being used.  I like using LR derivations (including ones produced by
LALR) because of the bottom-up recognition of grammar symbols.


I've written an Earley parser, and so far it looks in the same
> performance category as LALR(wisent) written in Elisp.
> Earley parser works with any grammar you throw at it. No conflicts.
> Each token gets full context of rules that are in effect at that point.
> Seems like there's no need to build parse trees, a list of states-
> tokens can be thought of as a flattened parse tree.
> Though there's a lot of testing for this concept ahead.
>


The above is not to discourage adding Earley parsing to the toolkit.
However, just defers the to resolution of ambiguities to your code rather
than by refining your grammar. For specifying a programming language, this
seems like inviting difficult to debug cases to me.

Lynn

--0000000000004a423a05f0bb7a5e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D=
"gmail_attr">On Sun, Dec 25, 2022, 11:03 PM  &lt;<a href=3D"mailto:ambulaja=
n@gmail.com">ambulajan@gmail.com</a>&gt; wrote:<br></div><blockquote class=
=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padd=
ing-left:1ex">A follow-up note after reading earlier discussion with a subj=
ect &quot;Re:<br>
Why tree-sitter instead of Semantic?&quot;.<br>
<br>
&gt; If you want to build a parser that sits on the lexer, there is more<br=
>
&gt; to it, as I recommend using the wisent parser-generator, as it<br>
&gt; creates faster parsers. In the wisent .wy files, you define %tokens<br=
>
&gt; using a bison-like syntax, and that in turns builds analyzers that<br>
&gt; you include in your lexer.<br>
<br>
I doubt that the problem of Semantic parsers was ever in Elisp being<br>
slow for that purpose.<br>
For me it was writing a LALR parser. Everything else was logical -<br>
lexers, SemanticDB, etc. But a grammar in development that stalls at<br>
every step with shift/reduce and reduce/reduce conflicts is like<br>
pushing against a wall. LALR algorithm never meant to be an interface<br>
for a developer, rather a workaround for slow CPUs with small memory<br>
systems of the 1980s.<br></blockquote></div></div><div dir=3D"auto"><br></d=
iv><div dir=3D"auto">I don&#39;t agree - conflicts detected by the parser g=
enerator indicate that distinct ASTs map to the same text, at least accordi=
ng to the reduction method being used.=C2=A0 I like using LR derivations (i=
ncluding ones produced by LALR) because of the bottom-up recognition of gra=
mmar symbols.=C2=A0</div><div dir=3D"auto"><br></div><div dir=3D"auto"><br>=
</div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gma=
il_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-lef=
t:1ex">
I&#39;ve written an Earley parser, and so far it looks in the same<br>
performance category as LALR(wisent) written in Elisp.<br>
Earley parser works with any grammar you throw at it. No conflicts.<br>
Each token gets full context of rules that are in effect at that point.<br>
Seems like there&#39;s no need to build parse trees, a list of states-<br>
tokens can be thought of as a flattened parse tree.<br>
Though there&#39;s a lot of testing for this concept ahead.<br></blockquote=
></div></div><div dir=3D"auto"><br></div><div dir=3D"auto"><br></div><div d=
ir=3D"auto">The above is not to discourage adding Earley parsing to the too=
lkit.=C2=A0 However, just defers the to resolution of ambiguities to your c=
ode rather than by refining your grammar.  For specifying a programming lan=
guage, this seems like inviting difficult to debug cases to me.=C2=A0=C2=A0=
</div><div dir=3D"auto"><br></div><div dir=3D"auto">Lynn</div><div dir=3D"a=
uto"><br></div></div>

--0000000000004a423a05f0bb7a5e--