* [PATCH] ice-9/peg: Extend PEG for production use.
@ 2020-05-11 13:41 Rutger van Beusekom
0 siblings, 0 replies; only message in thread
From: Rutger van Beusekom @ 2020-05-11 13:41 UTC (permalink / raw)
To: guile-devel
[-- Attachment #1: Type: text/plain, Size: 332 bytes --]
Hi,
First of all, I am aware I am submitting a sizeable patch here, and I am
happy to break it up if necessary.
Currently I am using these changes to the Parser Expression Grammar
library in GNU Guile for a production parser.
Hopefully you will find this interesting and useful. If so I would
appreciate your feedback.
Rutger.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-ice-9-peg-Extend-PEG-for-production-use.patch --]
[-- Type: text/x-diff, Size: 34651 bytes --]
From 091915cc0cfbe73333b2c486b254628f37d52bb1 Mon Sep 17 00:00:00 2001
From: Rutger van Beusekom <rutger.van.beusekom@verum.com>
Date: Mon, 11 May 2020 14:30:49 +0200
Subject: [PATCH] ice-9/peg: Extend PEG for production use.
* doc/ref/api-peg.texi: Document debug tracing, locations, skip parsing,
expect parsing and fallback parsing.
* module/ice-9/peg.scm (ice-9): Re-export added parameters and macro.
* module/ice-9/peg/codegen.scm (%peg:debug?): Add exported parameter to
toggle parser debug tracing.
(%peg:locations?): Add exported parameter to toggle adding locations to
the parse tree.
(%peg:skip?): Add exported parameter to pass a skip parser to the main
grammar.
(%peg:fallback?): Add exported parameter to toggle fallback parsing.
(%peg:error): Add exported parameter to handle fallback recovery.
(%continuation): Add parameter to maintain the parser productions
following the current production.
(format-error): New function which wraps (%peg:error).
(fallback-skip): New function which implements parser recovery.
(partial-match): New function to allow continuations to match only
partially.
(cg-and-int): Rewrite to add fallback parsing.
(cg-*): Rewrite to add fallback parsing.
(cg-+): Rewrite to add fallback parsing.
(cg-expect-int): New function implementing expect parsing interfals.
(cg-expect): New function implementing expect parsing.
(expect): Add expect parsing to PEG
(trace?): New function determining debug tracing verbosity.
(indent): New variable maintaining debug trace indentation level.
(wrap-parser-for-users): Add debug tracing, locations and skip parsing.
(define-skip-parser): New macro allows definging skip parsers.
* module/ice-9/peg/string-peg.scm (peg-as-peg): Add #
(peg-secondary->defn): Add # for expect parsing.
* test-suite/tests/peg.test: Test it.
---
doc/ref/api-peg.texi | 307 +++++++++++++++++++++++++++++---
module/ice-9/peg.scm | 9 +-
module/ice-9/peg/codegen.scm | 193 +++++++++++++++-----
module/ice-9/peg/string-peg.scm | 10 +-
test-suite/tests/peg.test | 120 ++++++++++++-
5 files changed, 562 insertions(+), 77 deletions(-)
diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 82e2758b4..811c4d216 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -159,6 +159,22 @@ Would be:
(+ "e"))
@end lisp
+@deftp {PEG Pattern} {expect} a
+Expect to parse @var{a}. Upon failure, throw a @code{'syntax-error}
+including its offset position and its symbol.
+
+@code{"a#"}
+
+@code{(expect a)}
+@end deftp
+
+Example:
+
+@example
+"a b# (c / d)#"
+@end example
+
+
@subsubheading Extended Syntax
There is some extra syntax for S-expressions.
@@ -221,11 +237,11 @@ character (in normal PEGs nonterminals can only be alphabetic).
For example, if we:
@lisp
-(define-peg-string-patterns
+(define-peg-string-patterns
"as <- 'a'+
bs <- 'b'+
as-or-bs <- as/bs")
-(define-peg-string-patterns
+(define-peg-string-patterns
"as-tag <-- 'a'+
bs-tag <-- 'b'+
as-or-bs-tag <-- as-tag/bs-tag")
@@ -271,9 +287,9 @@ For Example, if we:
@end lisp
Then:
@lisp
-(match-pattern as-or-bs "aabbcc") @result{}
+(match-pattern as-or-bs "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: aa>
-(match-pattern as-or-bs-tag "aabbcc") @result{}
+(match-pattern as-or-bs-tag "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: (as-or-bs-tag (as-tag aa))>
@end lisp
@@ -325,7 +341,7 @@ find a valid substring starting at index 0 and @code{search-for-pattern} keeps
looking. They are both equally capable of ``parsing'' and ``matching''
given those constraints.
-@deffn {Scheme Procedure} match-pattern nonterm string
+@deffn {Scheme Procedure} match-pattern nonterm string
Parses @var{string} using the PEG stored in @var{nonterm}. If no match
was found, @code{match-pattern} returns false. If a match was found, a PEG
match record is returned.
@@ -344,19 +360,19 @@ nothing
@lisp
(define-peg-pattern as all (+ "a"))
-(match-pattern as "aabbcc") @result{}
+(match-pattern as "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
(define-peg-pattern as body (+ "a"))
-(match-pattern as "aabbcc") @result{}
+(match-pattern as "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: aa>
(define-peg-pattern as none (+ "a"))
-(match-pattern as "aabbcc") @result{}
+(match-pattern as "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: ()>
(define-peg-pattern bs body (+ "b"))
-(match-pattern bs "aabbcc") @result{}
+(match-pattern bs "aabbcc") @result{}
#f
@end lisp
@end deffn
@@ -371,31 +387,31 @@ was found, a PEG match record is returned.
@lisp
(define-peg-pattern as body (+ "a"))
-(search-for-pattern as "aabbcc") @result{}
+(search-for-pattern as "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern (+ "a") "aabbcc") @result{}
+(search-for-pattern (+ "a") "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern "'a'+" "aabbcc") @result{}
+(search-for-pattern "'a'+" "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: aa>
(define-peg-pattern as all (+ "a"))
-(search-for-pattern as "aabbcc") @result{}
+(search-for-pattern as "aabbcc") @result{}
#<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
(define-peg-pattern bs body (+ "b"))
-(search-for-pattern bs "aabbcc") @result{}
+(search-for-pattern bs "aabbcc") @result{}
#<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern (+ "b") "aabbcc") @result{}
+(search-for-pattern (+ "b") "aabbcc") @result{}
#<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern "'b'+" "aabbcc") @result{}
+(search-for-pattern "'b'+" "aabbcc") @result{}
#<peg start: 2 end: 4 string: aabbcc tree: bb>
(define-peg-pattern zs body (+ "z"))
-(search-for-pattern zs "aabbcc") @result{}
+(search-for-pattern zs "aabbcc") @result{}
#f
-(search-for-pattern (+ "z") "aabbcc") @result{}
+(search-for-pattern (+ "z") "aabbcc") @result{}
#f
-(search-for-pattern "'z'+" "aabbcc") @result{}
+(search-for-pattern "'z'+" "aabbcc") @result{}
#f
@end lisp
@end deffn
@@ -468,9 +484,9 @@ itself satisfies @var{tst}, @code{(list lst)} is returned (this is a
flat list whose only element satisfies @var{tst}).
@lisp
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{}
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{}
(2 2 (1 1 (2 2)) 2 2 (1 1))
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{}
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{}
((1 1 (1 1 (2 2)) (2 2 (1 1))))
@end lisp
@@ -624,7 +640,7 @@ symbol for strings)..
(entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
(entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
(peg:tree (match-pattern tag-passwd "one entry"))
-(tag-passwd
+(tag-passwd
(entry "one entry"))
@end lisp
@@ -841,7 +857,7 @@ number <-- [0-9]+")
(apply next-func (car first))
(apply next-func (car rest)))
;; form (sum (((product ...) "+") ((product ...) "+")) (product ...))
- (car
+ (car
(reduce ;; walk through the list and build a left-associative tree
(lambda (l r)
(list (list (cadr r) (car r) (apply next-func (car l)))
@@ -1035,3 +1051,248 @@ symbol function)}, where @code{symbol} is the symbol that will indicate
a form of this type and @code{function} is the code generating function
described above. The function @code{add-peg-compiler!} is exported from
the @code{(ice-9 peg codegen)} module.
+
+@subsubheading Debug tracing
+
+Due to the backtracking nature of PEG, the parser result is @code{#f}
+when it cannot match the input text. It proves to be a big pain
+determining whether the problem is actually in the input or in the
+grammar, especially when changing the grammar itself. Setting the
+parameter @code{%peg:debug?} to @code{#t} enables debug tracing of all
+non-terminals, alternatively parameter @code{%peg:debug?} can be set to
+a list of non-terminal sysmbols. Debug traring will make the PEG parser
+print for each selected non-terminal: its name, the input remaining, as
+well as the parse result.
+
+@lisp
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+ ;; missing: / term
+ "expression <-- term (PLUS / MINUS) expression
+ term <-- factor (DIV / MULT) term / factor
+ factor <-- LPAR expression RPAR / number
+ number <- MINUS? [0-9]+
+ PLUS <- '+'
+ MINUS <- '-'
+ DIV <- '/'
+ MULT <- '*'
+ LPAR < '('
+ RPAR < ')'")
+
+(parameterize ((%peg:debug? #t))
+ (pretty-print (match-pattern expression "1+")))
+@result{}
+expression
+ term
+ factor
+ LPAR
+ number
+ MINUS
+ number := "1" next: "+"
+ factor := "1" next: "+"
+ DIV
+ MULT
+ term := "1" next: "+"
+ PLUS
+ PLUS := "+" next: ""
+ expression
+ term
+ factor
+ LPAR
+ number
+ MINUS
+#f
+
+@end lisp
+
+@subsubheading Locations
+
+Having the corresponding location of non-terminal matches in the parse
+tree proves to be useful when providing the user with warning or error
+messages. Location generation in the parse tree is enabled by setting
+the parameter @code{%peg:locations?} to @code{#t}.
+
+Example:
+
+@lisp
+(parameterize ((%peg:locations? #t))
+ (and=> (match-pattern expression "1+2/3*4") peg:tree))
+@result{}
+(expression
+ (term (factor "1" (location 0 1)) (location 0 1))
+ "+"
+ (expression
+ (term (factor "2" (location 2 3))
+ "*"
+ (term (factor "3" (location 4 5))
+ "/"
+ (term (factor "4" (location 6 7)) (location 6 7))
+ (location 4 7))
+ (location 2 7))
+ (location 2 7))
+ (location 0 7))
+@end lisp
+
+@subsubheading Skip parsing
+
+To write a PEG parser for a whitespace invariant language or a language
+which includes line and block comments requires littering the grammar
+with whitespace and comment parser expressions, which not only violates
+the DRY principle, but is hard to get right by hand.
+
+To this end, the @code{%peg:skip?} parameter is available. When setting
+this parameter to an @emph{unwrapped parser} using
+@code{define-skip-parser}, this parser will be interleaved
+with the @emph{main parser}.
+
+Example:
+
+@lisp
+(use-modules (ice-9 match))
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+ "expression <-- term (PLUS / MINUS) expression / term
+ term <-- factor (DIV / MULT) term / factor
+ factor <-- LPAR expression RPAR / number
+ number <- MINUS? [0-9]+
+ PLUS <- '+'
+ MINUS <- '-'
+ DIV <- '/'
+ MULT <- '*'
+ LPAR < '('
+ RPAR < ')'")
+
+(define (transform e)
+ (match
+ e
+ (((? symbol?) lhs op rhs) `(,(transform op) ,(transform lhs) ,(transform rhs)))
+ (((? symbol?) lhs) (transform lhs))
+ ("+" '+)
+ ("-" '-)
+ ("*" '*)
+ ("/" '/)
+ ((? string?) (string->number e))))
+
+(pretty-print (transform (peg:tree (match-pattern expression "1/(2+3)*4"))))
+@result{}
+(/ 1 (* (+ 2 3) 4))
+
+(define-skip-parser peg-ws none (* (or " " "\t" "\n")))
+
+(parameterize ((%peg:skip? peg-ws))
+ (pretty-print (transform (peg:tree (match-pattern expression "1 / (2 + 3) * 4")))))
+@result{}
+(/ 1 (* (+ 2 3) 4))
+@end lisp
+
+@subsubheading Expect parsing
+
+The best thing about PEG is its backtracking nature giving it
+@emph{infinite} look ahead capability. At the same time it severely
+limits the debugability of the grammar, as mentioned in debug tracing.
+The ability to short circuit backtracking is achieved by introducing the
+@code{#} operator to the PEG language. Putting a @code{#} behind a
+terminal or non-terminal indicates that its parsing must succeed,
+otherwise an exception is thrown containing the current parser state
+providing a hook to produce informative parse errors.
+
+Example:
+
+@lisp
+(use-modules (ice-9 peg))
+
+(define-peg-string-patterns
+ "expression <-- term (PLUS / MINUS) expression# / term
+ term <-- factor (DIV / MULT) term# / factor
+ factor <-- LPAR expression# RPAR# / number#
+ number <- MINUS? [0-9]+
+ PLUS <- '+'
+ MINUS <- '-'
+ DIV <- '/'
+ MULT <- '*'
+ LPAR < '('
+ RPAR < ')'")
+
+(let ((input "1/(2+var)*4"))
+ (catch 'syntax-error
+ (lambda _ (peg:tree (match-pattern expression input)))
+ (lambda (key args) (format #t "at position ~a, expected a ~a, not: ~s\n"
+ (car args)
+ (cadr args)
+ (substring input (car args))))))
+@result{}
+at position 5, expected a number, not: "var)*4"
+@end lisp
+
+@subsubheading Fallback parsing
+
+A natural extension to expect parsing is fallback parsing. Which
+is implemented by catching the exception thrown by the expect operator.
+The parser will catch the exception in the peg @code{and}, @code{+} and
+@code{*} operator to attempt to recover its state by eating away at the
+input until it runs out of input or until one of the grammar
+continuations matches and it will continue parsing from there.
+
+By setting parameter @code{%peg:fallback?} to @code{#t} fallback parsing
+will be enabled when it encouters a mismatch in the input on an expect
+production. Parameter @code{%peg:error} is used to handle the error and
+it receives the position, the input and a list representing the error as
+pair. When the @code{car} is @code{'skipped}, the @code{cdr} is the
+substring which is ignored by the parser. When the @code{car} is
+@code{'missing} the @code{cdr} is the parser snippet that was expected,
+but is missing in the input.
+
+Example:
+
+@lisp
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+ "expression <-- term (PLUS / MINUS) expression# / term
+ term <-- factor (DIV / MULT) term# / factor
+ factor <-- LPAR expression RPAR# / number
+ number <- MINUS? [0-9]+
+ PLUS <- '+'
+ MINUS <- '-'
+ DIV <- '/'
+ MULT <- '*'
+ LPAR < '('
+ RPAR < ')'")
+
+(define (handle-error pos str error)
+ (format #t "at postion ~a, ~a: \"~a\"\n"
+ pos (car error) (cdr error)))
+
+(let ((input "1/(2+var)*4"))
+ (parameterize ((%peg:fallback? #t)
+ (%peg:error handle-error))
+ (pretty-print (peg:tree (match-pattern expression input)))))
+@result{}
+at postion 5, skipped: "var)"
+(expression
+ (term (factor "1")
+ "/"
+ (term (factor
+ (expression (term (factor "2")) "+" expression))
+ "*"
+ (term (factor "4")))))
+
+(let ((input "1/(2+)*4"))
+ (parameterize ((%peg:fallback? #t)
+ (%peg:error handle-error))
+ (pretty-print (peg:tree (match-pattern expression input)))))
+@result{}
+at postion 5, missing: "expression"
+(expression
+ (term (factor "1")
+ "/"
+ (term (factor
+ (expression (term (factor "2")) "+" expression))
+ "*"
+ (term (factor "4")))))
+@end lisp
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 4e03131cd..ca7d091bf 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -38,5 +38,10 @@
peg:string
peg:tree
peg:substring
- peg-record?))
-
+ peg-record?
+ %peg:debug?
+ %peg:fallback?
+ %peg:locations?
+ %peg:skip?
+ %peg:error
+ define-skip-parser))
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index d80c3e849..fcfb71422 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -18,10 +18,27 @@
;;;;
(define-module (ice-9 peg codegen)
- #:export (compile-peg-pattern wrap-parser-for-users add-peg-compiler!)
+ #:export (compile-peg-pattern
+ wrap-parser-for-users
+ add-peg-compiler!
+ define-skip-parser
+ %peg:debug?
+ %peg:locations?
+ %peg:skip?
+ %peg:fallback?
+ %peg:error)
+ #:use-module (srfi srfi-1)
#:use-module (ice-9 pretty-print)
#:use-module (system base pmatch))
+(define %peg:debug? (make-parameter #f))
+(define %peg:locations? (make-parameter #f))
+(define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
+(define %peg:fallback? (make-parameter #f))
+(define %peg:error (make-parameter (lambda (pos str error) #f)))
+
+(define %continuation (make-parameter (lambda (str streln at) #f)))
+
(define-syntax single?
(syntax-rules ()
"Return #t if X is a list of one element."
@@ -162,6 +179,38 @@ return EXP."
((eq? accum 'none) 'none)))
(define baf builtin-accum-filter)
+(define (format-error missing str)
+ (lambda (from to)
+ (unless (and (< from to)
+ (string-every char-set:whitespace str from (1+ to)))
+ ((%peg:error) from str
+ (if (< from to) (cons 'skipped (substring str from (1+ to)))
+ (cons 'missing missing))))))
+
+(define* (fallback-skip kernel #:optional sequence?)
+ (if (not (%peg:fallback?)) kernel
+ (lambda (str strlen start)
+ (catch 'syntax-error
+ (lambda _
+ (cond ((or #t (< start strlen)) (kernel str strlen start))
+ ((not sequence?) `(,strlen ()))
+ (else #f)))
+ (lambda (key . args)
+ (let* ((expected (cadar args))
+ (format-error (format-error expected str)))
+ (let loop ((at start))
+ (cond ((or (= at strlen) ((%continuation) str strlen at)) (format-error start at) (if sequence? `(,at ()) `(,at (,expected))))
+ (else (or (let ((res (false-if-exception (kernel str strlen (1+ at))))) (when res(format-error start at)) res)
+ ;;if kernel matches, we have skipped over: (substring str start (1+ at)))
+ (loop (1+ at))))))))))))
+
+
+(define (partial-match kernel)
+ (lambda (str strlen at)
+ (catch #t
+ (lambda _ (kernel str strlen at))
+ (lambda (key . args) (and (< at (caar args)) (car args))))))
+
;; Top-level function builder for AND. Reduces to a call to CG-AND-INT.
(define (cg-and clauses accum)
#`(lambda (str len pos)
@@ -174,8 +223,14 @@ return EXP."
(()
(cggr accum 'cg-and #`(reverse #,body) at))
((first rest ...)
- #`(let ((res (#,(compile-peg-pattern #'first accum) #,str #,strlen #,at)))
- (and res
+ #`(let* ((next #,(cg-or #'(rest ...) 'body))
+ (kernel #,(compile-peg-pattern #'first accum))
+ (res (parameterize ((%continuation (let ((after-that (%continuation)))
+ (lambda (str strlen at)
+ (or ((partial-match next) str strlen at)
+ ((partial-match after-that) str strlen at))))))
+ ((fallback-skip kernel) #,str #,strlen #,at))))
+ (and res
;; update AT and BODY then recurse
(let ((newat (car res))
(newbody (cadr res)))
@@ -200,42 +255,40 @@ return EXP."
(define (cg-* args accum)
(syntax-case args ()
((pat)
- #`(lambda (str strlen at)
- (let ((body '()))
- (let lp ((end at) (count 0))
- (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
- str strlen end))
- (new-end (if match (car match) end))
- (count (if (> new-end end) (1+ count) count)))
- (if (> new-end end)
- (push-not-null! body (single-filter (cadr match))))
- (if (and (> new-end end)
- #,#t)
- (lp new-end count)
- (let ((success #,#t))
- #,#`(and success
- #,(cggr (baf accum) 'cg-body
- #'(reverse body) #'new-end)))))))))))
+ #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+ (kleene (lambda (str strlen at)
+ (let ((body '()))
+ (let lp ((end at) (count 0))
+ (let* ((match ((fallback-skip kernel #t) str strlen end))
+ (new-end (if match (car match) end))
+ (count (if (> new-end end) (1+ count) count)))
+ (when (> new-end end)
+ (push-not-null! body (single-filter (cadr match))))
+ (if (and (> new-end end) #,#t) (lp new-end count)
+ (let ((success #,#t))
+ #,#`(and success
+ #,(cggr (baf accum) 'cg-body
+ #'(reverse body) #'new-end))))))))))
+ kleene))))
(define (cg-+ args accum)
(syntax-case args ()
((pat)
- #`(lambda (str strlen at)
- (let ((body '()))
- (let lp ((end at) (count 0))
- (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
- str strlen end))
- (new-end (if match (car match) end))
- (count (if (> new-end end) (1+ count) count)))
- (if (> new-end end)
- (push-not-null! body (single-filter (cadr match))))
- (if (and (> new-end end)
- #,#t)
- (lp new-end count)
- (let ((success #,#'(>= count 1)))
- #,#`(and success
- #,(cggr (baf accum) 'cg-body
- #'(reverse body) #'new-end)))))))))))
+ #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+ (multiple (lambda (str strlen at)
+ (let ((body '()))
+ (let lp ((end at) (count 0))
+ (let* ((match ((fallback-skip kernel #t) str strlen end))
+ (new-end (if match (car match) end))
+ (count (if (> new-end end) (1+ count) count)))
+ (when (> new-end end)
+ (push-not-null! body (single-filter (cadr match))))
+ (if (and (> new-end end) #,#t) (lp new-end count)
+ (let ((success #,#'(>= count 1)))
+ #,#`(and success
+ #,(cggr (baf accum) 'cg-body
+ #'(reverse body) #'new-end))))))))))
+ multiple))))
(define (cg-? args accum)
(syntax-case args ()
@@ -296,6 +349,16 @@ return EXP."
#f
#,(cggr (baf accum) 'cg-body #''() #'at)))))))))))
+(define (cg-expect-int clauses accum str strlen at)
+ (syntax-case clauses ()
+ ((pat)
+ #`(or (#,(compile-peg-pattern #'pat accum) #,str #,strlen #,at)
+ (throw 'syntax-error (list #,at (syntax->datum #'pat)))))))
+
+(define (cg-expect clauses accum)
+ #`(lambda (str len pos)
+ #,(cg-expect-int clauses (baf accum) #'str #'len #'pos)))
+
;; Association list of functions to handle different expressions as PEGs
(define peg-compiler-alist '())
@@ -313,6 +376,7 @@ return EXP."
(add-peg-compiler! '? cg-?)
(add-peg-compiler! 'followed-by cg-followed-by)
(add-peg-compiler! 'not-followed-by cg-not-followed-by)
+(add-peg-compiler! 'expect cg-expect)
;; Takes an arbitrary expressions and accumulation variable, then parses it.
;; E.g.: (compile-peg-pattern syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
@@ -332,28 +396,65 @@ return EXP."
"Not one of" (map car peg-compiler-alist)))))))
;; Packages the results of a parser
+
+(define (trace? symbol)
+ (cond ((pair? (%peg:debug?)) (memq symbol (%peg:debug?)))
+ ((or (null? (%peg:debug?)) (%peg:debug?)) #t)
+ (else #f)))
+
+(define indent 0)
+
(define (wrap-parser-for-users for-syntax parser accumsym s-syn)
- #`(lambda (str strlen at)
- (let ((res (#,parser str strlen at)))
+ #`(lambda (str strlen at)
+ (when (trace? '#,s-syn)
+ (format (current-error-port) "~a~a\n"
+ (make-string indent #\space)
+ '#,s-syn))
+ (set! indent (+ indent 4))
+ (let* ((comment-res ((%peg:skip?) str strlen at))
+ (comment-loc (and (%peg:locations?) comment-res `(location ,at ,(car comment-res))))
+ (at (or (and comment-res (car comment-res)) at))
+ (res (#,parser str strlen at)))
+ (set! indent (- indent 4))
+ (let ((pos (or (and res (car res)) 0)))
+ (when (and (trace? '#,s-syn) (< at pos))
+ (format (current-error-port) "~a~a := ~s\tnext: ~s\n"
+ (make-string indent #\space)
+ '#,s-syn
+ (substring str at pos)
+ (substring str pos (min strlen (+ pos 10))))))
;; Try to match the nonterminal.
(if res
;; If we matched, do some post-processing to figure out
;; what data to propagate upward.
- (let ((at (car res))
- (body (cadr res)))
+ (let* ((body (cadr res))
+ (loc `(location ,at ,(car res)))
+ (annotate (if (not (%peg:locations?)) '()
+ (if (null? (cadr comment-res)) `(,loc)
+ `((comment ,(cdr comment-res) ,comment-loc) ,loc))))
+ (at (car res)))
#,(cond
((eq? accumsym 'name)
- #`(list at '#,s-syn))
+ #`(list at '#,s-syn ,@annotate))
((eq? accumsym 'all)
#`(list (car res)
(cond
((not (list? body))
- (list '#,s-syn body))
- ((null? body) '#,s-syn)
+ `(,'#,s-syn ,body ,@annotate))
+ ((null? body)
+ `(,'#,s-syn ,@annotate))
((symbol? (car body))
- (list '#,s-syn body))
- (else (cons '#,s-syn body)))))
- ((eq? accumsym 'none) #`(list (car res) '()))
- (else #`(begin res))))
+ `(,'#,s-syn ,body ,@annotate))
+ (else
+ (cons '#,s-syn (append body annotate))))))
+ ((eq? accumsym 'none) #``(,at () ,@annotate))
+ (else #``(,at ,body ,@annotate))))
;; If we didn't match, just return false.
#f))))
+
+(define-syntax define-skip-parser
+ (lambda (x)
+ (syntax-case x ()
+ ((_ sym accum pat)
+ (let* ((matchf (compile-peg-pattern #'pat (syntax->datum #'accum))))
+ #`(define sym #,matchf))))))
diff --git a/module/ice-9/peg/string-peg.scm b/module/ice-9/peg/string-peg.scm
index 45ed14bb1..e8af6bb3d 100644
--- a/module/ice-9/peg/string-peg.scm
+++ b/module/ice-9/peg/string-peg.scm
@@ -42,7 +42,8 @@
pattern <-- alternative (SLASH sp alternative)*
alternative <-- ([!&]? sp suffix)+
suffix <-- primary ([*+?] sp)*
-primary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
+primary <-- secondary ([#] sp)?
+secondary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
literal <-- ['] (!['] .)* ['] sp
charclass <-- LB (!']' (CCrange / CCsingle))* RB sp
CCrange <-- . '-' .
@@ -73,6 +74,8 @@ RB < ']'
(define-sexp-parser peg-suffix all
(and peg-primary (* (and (or "*" "+" "?") peg-sp))))
(define-sexp-parser peg-primary all
+ (and peg-secondary (? (and "#" peg-sp))))
+(define-sexp-parser peg-secondary all
(or (and "(" peg-sp peg-pattern ")" peg-sp)
(and "." peg-sp)
peg-literal
@@ -188,6 +191,11 @@ RB < ']'
;; Parse a primary.
(define (peg-primary->defn lst for-syntax)
+ (let ((inner-defn (peg-secondary->defn (cadr lst) for-syntax)))
+ (if (and (pair? (cddr lst)) (equal? (caddr lst) "#")) #`(expect #,inner-defn)
+ inner-defn)))
+
+(define (peg-secondary->defn lst for-syntax)
(let ((el (cadr lst)))
(cond
((list? el)
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index f516571e8..3f2ea70e1 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -1,5 +1,5 @@
+;;;;; PEG test suite. -*- scheme -*-
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; PEG test suite.
;; Tests the parsing capabilities of (ice-9 peg). Could use more
;; tests for edge cases.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -33,6 +33,7 @@
(alternative peg-alternative)
(suffix peg-suffix)
(primary peg-primary)
+ (secondary peg-secondary)
(literal peg-literal)
(charclass peg-charclass)
(CCrange charclass-range)
@@ -63,13 +64,12 @@
(pass-if
"defining PEGs with PEG"
(and (eeval `(define-peg-string-patterns ,(@@ (ice-9 peg) peg-as-peg))) #t))
- (pass-if
+ (pass-if-equal
"equivalence of definitions"
- (equal?
- (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
+ (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
(tree-map
grammar-transform
- (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
+ (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg))))))
;; A grammar for pascal-style comments from Wikipedia.
(define comment-grammar
@@ -276,3 +276,113 @@ number <-- [0-9]+")
(equal? (eq-parse "1+1/2*3+(1+1)/2")
'(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)))))
+(define-peg-string-patterns
+ "trace-grammar <-- foo bar* baz
+foo <-- 'foo'
+bar <-- 'bar'
+baz <-- 'baz'")
+
+(use-modules (ice-9 peg codegen)) ; %peg:debug?
+(with-test-prefix "Parse tracing"
+ (pass-if-equal
+ "trace"
+"trace-grammar
+ foo
+ foo := \"foo\" next: \"baz\"
+ bar
+ baz
+ baz := \"baz\" next: \"\"
+trace-grammar := \"foobaz\" next: \"\"
+"
+ (parameterize ((%peg:debug? #t))
+ (with-error-to-string
+ (lambda _ (and=> (match-pattern trace-grammar "foobaz")
+ peg:tree))))))
+
+(define-peg-string-patterns
+ "expect-grammar <-- one# two# three#
+one <-- 'one'
+two <-- 'two'
+three <-- 'three'"
+)
+
+(with-test-prefix "Parsing expect"
+ (pass-if-equal "expect okay"
+ '(expect-grammar (one "one") (two "two") (three "three"))
+ (and=> (match-pattern expect-grammar "onetwothree")
+ peg:tree))
+ (pass-if-equal "expect one"
+ '(syntax-error (0 one))
+ (catch 'syntax-error
+ (lambda _
+ (and=> (match-pattern expect-grammar "twothree")
+ peg:tree))
+ (lambda args args)))
+ (pass-if-equal "expect two"
+ '(syntax-error (3 two))
+ (catch 'syntax-error
+ (lambda _
+ (and=> (match-pattern expect-grammar "onethree")
+ peg:tree))
+ (lambda args args))))
+
+(define program-text "
+/*
+ CopyLeft (L) Acme
+*/
+foo // the first
+bar
+ bar
+baz
+")
+
+(define-skip-parser peg-ws all (or " " "\t"))
+(define-skip-parser peg-eol all (or "\r" "\n"))
+(define-skip-parser peg-line all (and "//" (* (and (not-followed-by peg-eol) peg-any))))
+(define-skip-parser peg-block all (and "/*" (* (or peg-block (and (not-followed-by "*/") peg-any))) (expect "*/")))
+(define-skip-parser peg-skip all (* (or peg-ws peg-eol peg-line peg-block)))
+
+(with-test-prefix "Skip parser"
+ (pass-if-equal "skip comments and whitespace"
+ '(trace-grammar (foo "foo") ((bar "bar") (bar "bar")) (baz "baz"))
+ (and=> (parameterize ((%peg:skip? peg-skip))
+ (match-pattern trace-grammar program-text))
+ peg:tree))
+ (pass-if-equal "preserve comments and whitespace"
+ '(trace-grammar (foo "foo" (location 26 29))
+ ((bar "bar"
+ (comment " // the first\n" (location 29 43))
+ (location 43 46))
+ (bar "bar"
+ (comment "\n\t" (location 46 48))
+ (location 48 51)))
+ (baz "baz"
+ (comment "\n" (location 51 52))
+ (location 52 55))
+ (comment "\n/*\n CopyLeft (L) Acme\n*/\n" (location 0 26))
+ (location 26 55))
+ (and=> (parameterize ((%peg:skip? peg-skip)
+ (%peg:locations? #t))
+ (match-pattern trace-grammar program-text))
+ peg:tree)))
+
+(with-test-prefix "Fallback parser"
+ (pass-if-equal "only one"
+ '(expect-grammar (one "one") two three)
+ (and=> (parameterize ((%peg:skip? peg-skip)
+ (%peg:fallback? #t))
+ (match-pattern expect-grammar "one"))
+ peg:tree))
+ (pass-if-equal "no two"
+ '(expect-grammar (one "one") two (three "three"))
+ (and=> (parameterize ((%peg:skip? peg-skip)
+ (%peg:fallback? #t))
+ (match-pattern expect-grammar "one three"))
+ peg:tree))
+ (pass-if-equal "missing one (skipping zero)"
+ '(expect-grammar (one (two "two") (three "three")))
+ (and=> (parameterize ((%peg:skip? peg-skip)
+ (%peg:fallback? #t))
+ (match-pattern expect-grammar "zero two three"))
+ peg:tree)))
+
--
2.26.0
^ permalink raw reply related [flat|nested] only message in thread
only message in thread, other threads:[~2020-05-11 13:41 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-11 13:41 [PATCH] ice-9/peg: Extend PEG for production use Rutger van Beusekom
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).