unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Questions about PEG parsing
@ 2021-10-23 23:25 Vivien Kraus via General Guile related discussions
  2021-11-02  0:44 ` Robin Templeton
  2021-11-02 12:44 ` Matt Wette
  0 siblings, 2 replies; 3+ messages in thread
From: Vivien Kraus via General Guile related discussions @ 2021-10-23 23:25 UTC (permalink / raw)
  To: Guile User

Dear guile users,

There are two ways to make a parser with guile: the undocumented
LALR(1) parser generator, for which I don’t know how to reuse grammar
elements, and the PEG parser.

The manual shows an example for a PEG parser. According to wikipedia,
which is cited in the manual itself, PEG can have exponential worst-
case complexity. However, I can’t figure out an example that would
produce this behavior. Could you show me one? Does Guile do anything to
help this, like using memoization as in the packrat parser?

Another thing is that the parser does not seem to accept production
rules, and instead returns a compressed tree with some nodes lumped
together or ignored depending on the grammar extension <--, <- or <. It
seems impossible to ignore part of a rule, that’s why the tutorial uses
a dedicated NL token:

	passwd <- entry* !.
	entry <-- (! NL .)* NL*
	NL < '\n'

So, I have to create a new terminal token for each delimiter I want to
match, but discard. Is it not possible to do something like this
instead, to ignore only a part of the rule?

	passwd <- entry* !.
	entry <-- (! '\n' .)* ( < ('\n'*))

Finally, the result of a match is a compressed tree, where nodes are
rule names and leaves are matched text. This is not very useful, I
would prefer to be able to specify a production rule. It seems that I
can inject a production, for instance to parse a natural number:

(use-modules (ice-9 peg) (ice-9 match))

(define-peg-string-patterns "DIGIT- <- [0-9]")
(define (DIGIT . args)
  (match (apply DIGIT- args)
    (#f #f)
    ((length value)
     `(,length ,(- (char->integer (string-ref value 0))
                   (char->integer #\0))))))

That way, I can reuse the DIGIT function in another parser:

(define-peg-string-patterns "NATURAL- <- DIGIT+")
(define (NATURAL . args)
  (match (apply NATURAL- args)
    (#f #f)
    ((length (digits ...))
     (let produce ((sum 0) (digits digits))
       (match digits
         (() `(,length ,sum))
         ((significant digits ...)
          (produce (+ (* sum 10) significant) digits)))))))

(peg:tree (match-pattern NATURAL "256"))

Are there downsides? Maybe the production should be pure, if the result
gets memoized?

Best regards,

Vivien




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-11-02 12:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-23 23:25 Questions about PEG parsing Vivien Kraus via General Guile related discussions
2021-11-02  0:44 ` Robin Templeton
2021-11-02 12:44 ` Matt Wette

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).