unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Vivien Kraus via General Guile related discussions <guile-user@gnu.org>
To: Guile User <guile-user@gnu.org>
Subject: Questions about PEG parsing
Date: Sun, 24 Oct 2021 01:25:44 +0200	[thread overview]
Message-ID: <64ff1f637d30dcf9cf9df1c20284698ea33e9c1d.camel@planete-kraus.eu> (raw)

Dear guile users,

There are two ways to make a parser with guile: the undocumented
LALR(1) parser generator, for which I don’t know how to reuse grammar
elements, and the PEG parser.

The manual shows an example for a PEG parser. According to wikipedia,
which is cited in the manual itself, PEG can have exponential worst-
case complexity. However, I can’t figure out an example that would
produce this behavior. Could you show me one? Does Guile do anything to
help this, like using memoization as in the packrat parser?

Another thing is that the parser does not seem to accept production
rules, and instead returns a compressed tree with some nodes lumped
together or ignored depending on the grammar extension <--, <- or <. It
seems impossible to ignore part of a rule, that’s why the tutorial uses
a dedicated NL token:

	passwd <- entry* !.
	entry <-- (! NL .)* NL*
	NL < '\n'

So, I have to create a new terminal token for each delimiter I want to
match, but discard. Is it not possible to do something like this
instead, to ignore only a part of the rule?

	passwd <- entry* !.
	entry <-- (! '\n' .)* ( < ('\n'*))

Finally, the result of a match is a compressed tree, where nodes are
rule names and leaves are matched text. This is not very useful, I
would prefer to be able to specify a production rule. It seems that I
can inject a production, for instance to parse a natural number:

(use-modules (ice-9 peg) (ice-9 match))

(define-peg-string-patterns "DIGIT- <- [0-9]")
(define (DIGIT . args)
  (match (apply DIGIT- args)
    (#f #f)
    ((length value)
     `(,length ,(- (char->integer (string-ref value 0))
                   (char->integer #\0))))))

That way, I can reuse the DIGIT function in another parser:

(define-peg-string-patterns "NATURAL- <- DIGIT+")
(define (NATURAL . args)
  (match (apply NATURAL- args)
    (#f #f)
    ((length (digits ...))
     (let produce ((sum 0) (digits digits))
       (match digits
         (() `(,length ,sum))
         ((significant digits ...)
          (produce (+ (* sum 10) significant) digits)))))))

(peg:tree (match-pattern NATURAL "256"))

Are there downsides? Maybe the production should be pure, if the result
gets memoized?

Best regards,

Vivien




             reply	other threads:[~2021-10-23 23:25 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-23 23:25 Vivien Kraus via General Guile related discussions [this message]
2021-11-02  0:44 ` Questions about PEG parsing Robin Templeton
2021-11-02 12:44 ` Matt Wette

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64ff1f637d30dcf9cf9df1c20284698ea33e9c1d.camel@planete-kraus.eu \
    --to=guile-user@gnu.org \
    --cc=vivien@planete-kraus.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).