From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Robin Templeton Newsgroups: gmane.lisp.guile.user Subject: Re: Questions about PEG parsing Date: Mon, 01 Nov 2021 20:44:01 -0400 Message-ID: <877ddro08e.fsf@terpri.org> References: <64ff1f637d30dcf9cf9df1c20284698ea33e9c1d.camel@planete-kraus.eu> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="7385"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) To: guile-user@gnu.org Cancel-Lock: sha1:wmiJVcdF22t8OVVK5HQKrrliH+4= Original-X-From: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Tue Nov 02 01:44:39 2021 Return-path: Envelope-to: guile-user@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mhhup-0001jj-Mh for guile-user@m.gmane-mx.org; Tue, 02 Nov 2021 01:44:39 +0100 Original-Received: from localhost ([::1]:34254 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mhhuo-0007bs-DF for guile-user@m.gmane-mx.org; Mon, 01 Nov 2021 20:44:38 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:58650) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mhhuU-0007bc-AH for guile-user@gnu.org; Mon, 01 Nov 2021 20:44:18 -0400 Original-Received: from ciao.gmane.io ([116.202.254.214]:48182) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mhhuR-00028E-L9 for guile-user@gnu.org; Mon, 01 Nov 2021 20:44:18 -0400 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1mhhuO-0001GT-6m for guile-user@gnu.org; Tue, 02 Nov 2021 01:44:12 +0100 X-Injected-Via-Gmane: http://gmane.org/ Received-SPF: pass client-ip=116.202.254.214; envelope-from=guile-user@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.249, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Original-Sender: "guile-user" Xref: news.gmane.io gmane.lisp.guile.user:17821 Archived-At: Vivien Kraus via General Guile related discussions writes: > There are two ways to make a parser with guile: the undocumented > LALR(1) parser generator, for which I don’t know how to reuse grammar > elements, and the PEG parser. The lalr module isn't documented in the Guile manual itself, but has some upstream documentation and examples (linked from the info node): https://github.com/schemeway/lalr-scm/ > The manual shows an example for a PEG parser. According to wikipedia, > which is cited in the manual itself, PEG can have exponential worst- > case complexity. However, I can’t figure out an example that would > produce this behavior. Could you show me one? Does Guile do anything to > help this, like using memoization as in the packrat parser? A completely naïve implementation would probably have exponential time complexity due to the possibility of unlimited backtracking. Guile's PEG parser has a "cache" submodule, so I assume it implements some form of memoization based on Bryan Ford's packrat parsing technique. (Though I haven't actually read the code in question) > Another thing is that the parser does not seem to accept production > rules, and instead returns a compressed tree with some nodes lumped > together or ignored depending on the grammar extension <--, <- or <. It > seems impossible to ignore part of a rule, that’s why the tutorial uses > a dedicated NL token: ISTR finding an apparent bug in either Guile's PEG parser generator, or in my code attempting to use it; perhaps this is the same or a related problem. I'll see if I can dig up the grammar I was using. Regards, Robin P.S. I wrote a Scheme packrat parser shortly after reading Ford's packrat parsing papers, which I'd be happy to share if you'd like to try an alternative to Guile's native support. It's not technically linear-time because it uses alists for memoization (which had better performance characteristics for my grammars and inputs than, e.g., hash tables), but that could be changed easily if needed. Its grammars look like this (the syntax is hopefully obvious, except perhaps that ':' is a sequence with the final argument being a Scheme expression producing the result): --8<---------------cut here---------------start------------->8--- (define (digit->number d) (case d ((#\0) 0) ((#\1) 1) ((#\2) 2) ((#\3) 3) ((#\4) 4) ((#\5) 5) ((#\6) 6) ((#\7) 7) ((#\8) 8) ((#\9) 9))) (define-grammar arith additive (rule additive (: (bind left multiplicative) (bind right (? (: #\+ (bind foo additive) foo))) (+ left (or right 0)))) (rule multiplicative (: (bind left primary) (bind right (? (: #\* (bind bar multiplicative) bar))) (* left (or right 1)))) (rule primary (/ number parenthesized)) (rule number (: (bind ds (+ digit)) (let loop ((ds ds) (n 0)) (if (null? ds) n (loop (cdr ds) (+ (* n 10) (car ds))))))) (rule digit (: (bind d (/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)) (digit->number d))) (rule parenthesized (: #\( (bind value additive) #\) value))) --8<---------------cut here---------------end--------------->8--- -- Inteligenta persono lernas la lingvon Esperanton rapide kaj facile. Esperanto estas moderna, kultura lingvo por la mondo. Simpla, fleksebla, belsona, Esperanto estas la praktika solvo de la problemo de universala interkompreno. Lernu la interlingvon Esperanton!