Questions about PEG parsing

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

* Questions about PEG parsing
@ 2021-10-23 23:25 Vivien Kraus via General Guile related discussions
  2021-11-02  0:44 ` Robin Templeton
  2021-11-02 12:44 ` Matt Wette
  0 siblings, 2 replies; 3+ messages in thread
From: Vivien Kraus via General Guile related discussions @ 2021-10-23 23:25 UTC (permalink / raw)
  To: Guile User

Dear guile users,

There are two ways to make a parser with guile: the undocumented
LALR(1) parser generator, for which I don’t know how to reuse grammar
elements, and the PEG parser.

The manual shows an example for a PEG parser. According to wikipedia,
which is cited in the manual itself, PEG can have exponential worst-
case complexity. However, I can’t figure out an example that would
produce this behavior. Could you show me one? Does Guile do anything to
help this, like using memoization as in the packrat parser?

Another thing is that the parser does not seem to accept production
rules, and instead returns a compressed tree with some nodes lumped
together or ignored depending on the grammar extension <--, <- or <. It
seems impossible to ignore part of a rule, that’s why the tutorial uses
a dedicated NL token:

	passwd <- entry* !.
	entry <-- (! NL .)* NL*
	NL < '\n'

So, I have to create a new terminal token for each delimiter I want to
match, but discard. Is it not possible to do something like this
instead, to ignore only a part of the rule?

	passwd <- entry* !.
	entry <-- (! '\n' .)* ( < ('\n'*))

Finally, the result of a match is a compressed tree, where nodes are
rule names and leaves are matched text. This is not very useful, I
would prefer to be able to specify a production rule. It seems that I
can inject a production, for instance to parse a natural number:

(use-modules (ice-9 peg) (ice-9 match))

(define-peg-string-patterns "DIGIT- <- [0-9]")
(define (DIGIT . args)
  (match (apply DIGIT- args)
    (#f #f)
    ((length value)
     `(,length ,(- (char->integer (string-ref value 0))
                   (char->integer #\0))))))

That way, I can reuse the DIGIT function in another parser:

(define-peg-string-patterns "NATURAL- <- DIGIT+")
(define (NATURAL . args)
  (match (apply NATURAL- args)
    (#f #f)
    ((length (digits ...))
     (let produce ((sum 0) (digits digits))
       (match digits
         (() `(,length ,sum))
         ((significant digits ...)
          (produce (+ (* sum 10) significant) digits)))))))

(peg:tree (match-pattern NATURAL "256"))

Are there downsides? Maybe the production should be pure, if the result
gets memoized?

Best regards,

Vivien

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Questions about PEG parsing
  2021-10-23 23:25 Questions about PEG parsing Vivien Kraus via General Guile related discussions
@ 2021-11-02  0:44 ` Robin Templeton
  2021-11-02 12:44 ` Matt Wette
  1 sibling, 0 replies; 3+ messages in thread
From: Robin Templeton @ 2021-11-02  0:44 UTC (permalink / raw)
  To: guile-user

Vivien Kraus via General Guile related discussions <guile-user@gnu.org>
writes:

> There are two ways to make a parser with guile: the undocumented
> LALR(1) parser generator, for which I don’t know how to reuse grammar
> elements, and the PEG parser.

The lalr module isn't documented in the Guile manual itself, but has
some upstream documentation and examples (linked from the info node):

https://github.com/schemeway/lalr-scm/

> The manual shows an example for a PEG parser. According to wikipedia,
> which is cited in the manual itself, PEG can have exponential worst-
> case complexity. However, I can’t figure out an example that would
> produce this behavior. Could you show me one? Does Guile do anything to
> help this, like using memoization as in the packrat parser?

A completely naïve implementation would probably have exponential time
complexity due to the possibility of unlimited backtracking. Guile's PEG
parser has a "cache" submodule, so I assume it implements some form of
memoization based on Bryan Ford's packrat parsing technique. (Though I
haven't actually read the code in question)

> Another thing is that the parser does not seem to accept production
> rules, and instead returns a compressed tree with some nodes lumped
> together or ignored depending on the grammar extension <--, <- or <. It
> seems impossible to ignore part of a rule, that’s why the tutorial uses
> a dedicated NL token:

ISTR finding an apparent bug in either Guile's PEG parser generator, or
in my code attempting to use it; perhaps this is the same or a related
problem. I'll see if I can dig up the grammar I was using.

Regards,
Robin

P.S. I wrote a Scheme packrat parser shortly after reading Ford's
packrat parsing papers, which I'd be happy to share if you'd like to try
an alternative to Guile's native support. It's not technically
linear-time because it uses alists for memoization (which had better
performance characteristics for my grammars and inputs than, e.g., hash
tables), but that could be changed easily if needed. Its grammars look
like this (the syntax is hopefully obvious, except perhaps that ':' is a
sequence with the final argument being a Scheme expression producing the
result):

--8<---------------cut here---------------start------------->8---
(define (digit->number d)
  (case d
    ((#\0) 0) ((#\1) 1) ((#\2) 2) ((#\3) 3) ((#\4) 4)
    ((#\5) 5) ((#\6) 6) ((#\7) 7) ((#\8) 8) ((#\9) 9)))

(define-grammar arith additive
  (rule additive 
    (: (bind left multiplicative)
       (bind right (? (: #\+ (bind foo additive) foo)))
       (+ left (or right 0))))
  (rule multiplicative
    (: (bind left primary)
       (bind right (? (: #\* (bind bar multiplicative) bar)))
       (* left (or right 1))))
  (rule primary
    (/ number parenthesized))
  (rule number
    (: (bind ds (+ digit))
       (let loop ((ds ds) (n 0))
         (if (null? ds)
             n
             (loop (cdr ds) (+ (* n 10) (car ds)))))))
  (rule digit
    (: (bind d (/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
       (digit->number d)))
  (rule parenthesized
    (: #\(
       (bind value additive)
       #\)
       value)))
--8<---------------cut here---------------end--------------->8---

-- 
Inteligenta persono lernas la lingvon Esperanton rapide kaj facile.
Esperanto estas moderna, kultura lingvo por la mondo. Simpla, fleksebla,
belsona, Esperanto estas la praktika solvo de la problemo de universala
interkompreno. Lernu la interlingvon Esperanton!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Questions about PEG parsing
  2021-10-23 23:25 Questions about PEG parsing Vivien Kraus via General Guile related discussions
  2021-11-02  0:44 ` Robin Templeton
@ 2021-11-02 12:44 ` Matt Wette
  1 sibling, 0 replies; 3+ messages in thread
From: Matt Wette @ 2021-11-02 12:44 UTC (permalink / raw)
  To: guile-user

On 10/23/21 4:25 PM, Vivien Kraus via General Guile related discussions 
wrote:
> Dear guile users,
>
> There are two ways to make a parser with guile: the undocumented
> LALR(1) parser generator, for which I don’t know how to reuse grammar
> elements, and the PEG parser.
>
I'm not sure what your needs are but in my case being not keen on the
lalr parser in Guile I ended up writing my own (for Guile).

      https://savannah.nongnu.org/projects/nyacc

If you go to the "Project Homepage" there are some manuals.  And the
distribution includes several examples.

Matt




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-11-02 12:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-23 23:25 Questions about PEG parsing Vivien Kraus via General Guile related discussions
2021-11-02  0:44 ` Robin Templeton
2021-11-02 12:44 ` Matt Wette

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).