From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Vivien Kraus via General Guile related discussions Newsgroups: gmane.lisp.guile.user Subject: Questions about PEG parsing Date: Sun, 24 Oct 2021 01:25:44 +0200 Message-ID: <64ff1f637d30dcf9cf9df1c20284698ea33e9c1d.camel@planete-kraus.eu> Reply-To: Vivien Kraus Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="14074"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Evolution 3.34.2 To: Guile User Original-X-From: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Sun Oct 24 01:26:26 2021 Return-path: Envelope-to: guile-user@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1meQPC-0003Rs-Ne for guile-user@m.gmane-mx.org; Sun, 24 Oct 2021 01:26:26 +0200 Original-Received: from localhost ([::1]:46816 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1meQPA-0000sq-Ve for guile-user@m.gmane-mx.org; Sat, 23 Oct 2021 19:26:24 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:51986) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1meQOo-0000qr-6U for guile-user@gnu.org; Sat, 23 Oct 2021 19:26:03 -0400 Original-Received: from planete-kraus.eu ([2a00:5881:4008:2810::309]:49708) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_CHACHA20_POLY1305:256) (Exim 4.90_1) (envelope-from ) id 1meQOk-0005zD-UO for guile-user@gnu.org; Sat, 23 Oct 2021 19:26:01 -0400 Original-Received: from planete-kraus.eu (localhost.lan [127.0.0.1]) by planete-kraus.eu (OpenSMTPD) with ESMTP id f98be0a3 for ; Sat, 23 Oct 2021 23:25:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=planete-kraus.eu; h= message-id:subject:from:to:date:content-type:mime-version :content-transfer-encoding; s=*; bh=RGD7VVeeFMiGurorITahi7i7oe4=; b= CR2j7+Pe0DVoQWpBZNk80/HGdMc51H9VOqQMOP+A7Qx2I37chEfU+iEcyiGz2CCD qiMQk8j0Mh22tQPut25khjPYw8Aejw/miKjEA/BFAE/NJ+8aXGDpUcgn+GDDFS2/ We5vwq0VWvI/ufcRfKPc6xIL9mN48U8k39QdE3g+75E= Original-Received: by planete-kraus.eu (OpenSMTPD) with ESMTPSA id 02f5a82d (TLSv1.3:AEAD-CHACHA20-POLY1305-SHA256:256:NO) for ; Sat, 23 Oct 2021 23:25:46 +0000 (UTC) Received-SPF: pass client-ip=2a00:5881:4008:2810::309; envelope-from=vivien@planete-kraus.eu; helo=planete-kraus.eu X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Original-Sender: "guile-user" Xref: news.gmane.io gmane.lisp.guile.user:17812 Archived-At: Dear guile users, There are two ways to make a parser with guile: the undocumented LALR(1) parser generator, for which I don’t know how to reuse grammar elements, and the PEG parser. The manual shows an example for a PEG parser. According to wikipedia, which is cited in the manual itself, PEG can have exponential worst- case complexity. However, I can’t figure out an example that would produce this behavior. Could you show me one? Does Guile do anything to help this, like using memoization as in the packrat parser? Another thing is that the parser does not seem to accept production rules, and instead returns a compressed tree with some nodes lumped together or ignored depending on the grammar extension <--, <- or <. It seems impossible to ignore part of a rule, that’s why the tutorial uses a dedicated NL token: passwd <- entry* !. entry <-- (! NL .)* NL* NL < '\n' So, I have to create a new terminal token for each delimiter I want to match, but discard. Is it not possible to do something like this instead, to ignore only a part of the rule? passwd <- entry* !. entry <-- (! '\n' .)* ( < ('\n'*)) Finally, the result of a match is a compressed tree, where nodes are rule names and leaves are matched text. This is not very useful, I would prefer to be able to specify a production rule. It seems that I can inject a production, for instance to parse a natural number: (use-modules (ice-9 peg) (ice-9 match)) (define-peg-string-patterns "DIGIT- <- [0-9]") (define (DIGIT . args) (match (apply DIGIT- args) (#f #f) ((length value) `(,length ,(- (char->integer (string-ref value 0)) (char->integer #\0)))))) That way, I can reuse the DIGIT function in another parser: (define-peg-string-patterns "NATURAL- <- DIGIT+") (define (NATURAL . args) (match (apply NATURAL- args) (#f #f) ((length (digits ...)) (let produce ((sum 0) (digits digits)) (match digits (() `(,length ,sum)) ((significant digits ...) (produce (+ (* sum 10) significant) digits))))))) (peg:tree (match-pattern NATURAL "256")) Are there downsides? Maybe the production should be pure, if the result gets memoized? Best regards, Vivien