From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Lynn Winebarger Newsgroups: gmane.emacs.devel Subject: Re: Using the wisent parser-generator, as it creates faster parsers Date: Tue, 27 Dec 2022 22:22:37 -0500 Message-ID: References: <8075de284038bb4970568bb856656cbc88ded050.camel@gmail.com> <27990365a2b32aaa3f76fe8f0922a8481d953ef6.camel@gmail.com> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="000000000000020b7505f0dae202" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="35631"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Eric Ludlam , emacs-devel To: Alexandr Karbivnichyi Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Dec 28 04:23:23 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pAN2J-00099K-DT for ged-emacs-devel@m.gmane-mx.org; Wed, 28 Dec 2022 04:23:23 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pAN1r-0002mw-Og; Tue, 27 Dec 2022 22:22:55 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAN1p-0002md-Lt for emacs-devel@gnu.org; Tue, 27 Dec 2022 22:22:53 -0500 Original-Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pAN1n-0007A5-Ai for emacs-devel@gnu.org; Tue, 27 Dec 2022 22:22:52 -0500 Original-Received: by mail-pl1-x634.google.com with SMTP id s7so14919607plk.5 for ; Tue, 27 Dec 2022 19:22:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=MdMm+ihPyqHk7NG5sxFdCPZX9DGaDfZzSC1do/Ft7ag=; b=IoPOpiHiDPapjA3Ou2qm9NTEQb8dIlVqsE0dk9r3UjUM+VV7K8ID55pAQFyzqqp/Ko LE0sMf8dfFaaDu3tkIVMG9n9+tD1dTW7aKrUvJAbNyNw476jMICgCN6+fvHsiShon0Oj 2ZZl/3HR/0xzmJgNewGCAWravoo6fhfMmwpugOoJduqk727soB/P/0uRrhyblHseSlZP 7plvWTVb4jzseG1OLjefVuLrSJGYHi2riewGIDrmf+bdeQafPuNMVngbG48vK+6UKhSP kxhighQTsE5a+wWn214AAnZmT7WBJGVEwQe+Etkcf4ArWawS7TCUbVNnzdzHvlc7w0G2 cOaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=MdMm+ihPyqHk7NG5sxFdCPZX9DGaDfZzSC1do/Ft7ag=; b=7qN6iv5uWCTpQ3gW3SKzfb9m2nVgLBytguY5s73k1UoW9sNoOfGfbg7AreAMMmu2LO B0VpWzob3atfkmBdZNyzOvb16Nj55kUkAW5/+4deeu8LtnBSXCdMha4nlig3JCUwBkIu ExcvCoQ1prweJF4UfueTtPuinM7IMdCFTAQwl4Jd+VxR1E5FGREuexon+HLmYFS1xySK vTpuqFq42FbTSyfcRhDhVXurb+hTeGb17rSGc9uZJs/7tLrpMGXGx4vFksZtaEfPAJqk L/63I/x6H0XC8XK4e42vGVuOmd4lLaX0TKw65v3+kQWDzI8z8kRFGPJqzvm3Pr6wjY7k zlBw== X-Gm-Message-State: AFqh2ko45Q3+JKIG7nYPdeLUP6tZn5sZ0cih7nnbLzhOtxs3G8JuQHLz YxoqT7Kkyl5lRFUjiq8ehLB7XGn2/k8XOqgPVI0= X-Google-Smtp-Source: AMrXdXvGcO3pwwDoY1fEazn+KTIotoGVGOA+W2sbOKaZ+lfSlLbS5/LkXQ3oPkK51/xgGm7FhkLzJF2kHf75zYKcQOc= X-Received: by 2002:a17:902:c40b:b0:190:f66c:13aa with SMTP id k11-20020a170902c40b00b00190f66c13aamr1476065plk.121.1672197768873; Tue, 27 Dec 2022 19:22:48 -0800 (PST) In-Reply-To: <27990365a2b32aaa3f76fe8f0922a8481d953ef6.camel@gmail.com> Received-SPF: pass client-ip=2607:f8b0:4864:20::634; envelope-from=owinebar@gmail.com; helo=mail-pl1-x634.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:301996 Archived-At: --000000000000020b7505f0dae202 Content-Type: text/plain; charset="UTF-8" On Mon, Dec 26, 2022, 1:13 PM Alexandr Karbivnichyi wrote: > On Mon, 2022-12-26 at 08:54 -0500, Lynn Winebarger wrote: > > On Sun, Dec 25, 2022, 11:03 PM wrote: > > > > I don't agree - conflicts detected by the parser generator indicate > > that distinct ASTs map to the same text, at least according to the > > reduction method being used. I like using LR derivations (including > > ones produced by LALR) because of the bottom-up recognition of > > grammar symbols. > > True for LALR parsers. Tried-and-tested system, for industrial > compilers built on a formal standard with simple grammar that doesn't > change often. > I suppose it depends on the tools you are using. I tend to roll my own implementation of the LALR(1) construction of DeRemer and Penello, then use the relations they define to determine how strings that will be ambiguously parsed by the grammar. Then that ambiguity can be resolved. It's not that different from what you describe doing at run-time with GLR, just performing the analysis ahead of time (on LR(0) states, aka item sets) and encoding the disambiguation logic into the grammar itself. The DeRemer-Penello construction is just a control-flow analysis. I'm not familiar with any work that uses it to automatically construct test cases that illustrate ambiguities in grammars, but it's surprising if there isn't given the tools for test case generation for programs in general purpose programming languages. There's then the second question of how to map productions of the disambiguated grammar back to an AST data structure that is much "flatter". Usually the grammar for that AST data structure is the origin of ambiguities in the concrete syntax. It's a problem in noncommutative algebra. The ambiguous grammar presumably specifies the language (strings of terminal sbols or tokens) being recognized correctly. A CFG is a system of equations, and a parse is a particular solution to those equations for a given element in that language. It's just a matter of finding an equivalent system of equations such that every solution is unique. At least, that formulation helps me. Lynn --000000000000020b7505f0dae202 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Dec 26, 2022, 1:13 PM Alexandr Karbivnichyi <ambulajan@gmail.com> wrote:
On Mon, 2022-12-26 at 08:54 -0500, Lynn Wi= nebarger wrote:
> On Sun, Dec 25, 2022, 11:03 PM <ambulajan@gmail.com> wrote:=
>
> I don't agree - conflicts detected by the parser generator indicat= e
> that distinct ASTs map to the same text, at least according to the
> reduction method being used.=C2=A0 I like using LR derivations (includ= ing
> ones produced by LALR) because of the bottom-up recognition of
> grammar symbols.=C2=A0

True for LALR parsers. Tried-and-tested system, for industrial
compilers built on a formal standard with simple grammar that doesn't change often.
I suppose it de= pends on the tools you are using.=C2=A0 I tend to roll my own implementatio= n of the LALR(1) construction of DeRemer and Penello, then use the relation= s they define to determine how strings that will be ambiguously parsed by t= he grammar. Then that ambiguity can be resolved. It's not that differen= t from what you describe doing at run-time with GLR, just performing the an= alysis ahead of time (on LR(0) states, aka item sets) and encoding the disa= mbiguation logic into the grammar itself.=C2=A0 The DeRemer-Penello constru= ction is just a control-flow analysis.=C2=A0 I'm not familiar with any = work that uses it to automatically construct test cases that illustrate amb= iguities in grammars, but it's surprising if there isn't given the = tools for test case generation for programs in general purpose programming = languages.
There's then the second question of h= ow to map productions of the disambiguated grammar back to an AST data stru= cture that is much "flatter".=C2=A0 Usually the grammar for that = AST data structure is the origin of ambiguities in the concrete syntax.

It's a problem in nonco= mmutative algebra.=C2=A0 The ambiguous grammar presumably specifies the lan= guage (strings of terminal sbols or tokens) being recognized correctly.=C2= =A0 A CFG is a system of equations, and a parse is a particular solution to= those equations for a given element in that language.=C2=A0 It's just = a matter of finding an equivalent system of equations such that every solut= ion is unique.=C2=A0 At least, that formulation helps me.

Lynn


--000000000000020b7505f0dae202--