unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] ice-9/peg: Extend PEG for production use.
@ 2020-05-11 13:41 Rutger van Beusekom
  2024-10-14  7:26 ` [PATCH v2 0/5] " Janneke Nieuwenhuizen
  0 siblings, 1 reply; 11+ messages in thread
From: Rutger van Beusekom @ 2020-05-11 13:41 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 332 bytes --]

Hi,

First of all, I am aware I am submitting a sizeable patch here, and I am
happy to break it up if necessary.

Currently I am using these changes to the Parser Expression Grammar
library in GNU Guile for a production parser.

Hopefully you will find this interesting and useful. If so I would
appreciate your feedback.

Rutger.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-ice-9-peg-Extend-PEG-for-production-use.patch --]
[-- Type: text/x-diff, Size: 34651 bytes --]

From 091915cc0cfbe73333b2c486b254628f37d52bb1 Mon Sep 17 00:00:00 2001
From: Rutger van Beusekom <rutger.van.beusekom@verum.com>
Date: Mon, 11 May 2020 14:30:49 +0200
Subject: [PATCH] ice-9/peg: Extend PEG for production use.

* doc/ref/api-peg.texi: Document debug tracing, locations, skip parsing,
  expect parsing and fallback parsing.
* module/ice-9/peg.scm (ice-9): Re-export added parameters and macro.
* module/ice-9/peg/codegen.scm (%peg:debug?): Add exported parameter to
  toggle parser debug tracing.
(%peg:locations?): Add exported parameter to toggle adding locations to
the parse tree.
(%peg:skip?): Add exported parameter to pass a skip parser to the main
grammar.
(%peg:fallback?): Add exported parameter to toggle fallback parsing.
(%peg:error): Add exported parameter to handle fallback recovery.
(%continuation): Add parameter to maintain the parser productions
following the current production.
(format-error): New function which wraps (%peg:error).
(fallback-skip): New function which implements parser recovery.
(partial-match): New function to allow continuations to match only
partially.
(cg-and-int): Rewrite to add fallback parsing.
(cg-*): Rewrite to add fallback parsing.
(cg-+): Rewrite to add fallback parsing.
(cg-expect-int): New function implementing expect parsing interfals.
(cg-expect): New function implementing expect parsing.
(expect): Add expect parsing to PEG
(trace?): New function determining debug tracing verbosity.
(indent): New variable maintaining debug trace indentation level.
(wrap-parser-for-users): Add debug tracing, locations and skip parsing.
(define-skip-parser): New macro allows definging skip parsers.
* module/ice-9/peg/string-peg.scm (peg-as-peg): Add #
(peg-secondary->defn): Add # for expect parsing.
* test-suite/tests/peg.test: Test it.
---
 doc/ref/api-peg.texi            | 307 +++++++++++++++++++++++++++++---
 module/ice-9/peg.scm            |   9 +-
 module/ice-9/peg/codegen.scm    | 193 +++++++++++++++-----
 module/ice-9/peg/string-peg.scm |  10 +-
 test-suite/tests/peg.test       | 120 ++++++++++++-
 5 files changed, 562 insertions(+), 77 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 82e2758b4..811c4d216 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -159,6 +159,22 @@ Would be:
  (+ "e"))
 @end lisp
 
+@deftp {PEG Pattern} {expect} a
+Expect to parse @var{a}.  Upon failure, throw a @code{'syntax-error}
+including its offset position and its symbol.
+
+@code{"a#"}
+
+@code{(expect a)}
+@end deftp
+
+Example:
+
+@example
+"a b# (c / d)#"
+@end example
+
+
 @subsubheading Extended Syntax
 
 There is some extra syntax for S-expressions.
@@ -221,11 +237,11 @@ character (in normal PEGs nonterminals can only be alphabetic).
 
 For example, if we:
 @lisp
-(define-peg-string-patterns 
+(define-peg-string-patterns
   "as <- 'a'+
 bs <- 'b'+
 as-or-bs <- as/bs")
-(define-peg-string-patterns 
+(define-peg-string-patterns
   "as-tag <-- 'a'+
 bs-tag <-- 'b'+
 as-or-bs-tag <-- as-tag/bs-tag")
@@ -271,9 +287,9 @@ For Example, if we:
 @end lisp
 Then:
 @lisp
-(match-pattern as-or-bs "aabbcc") @result{} 
+(match-pattern as-or-bs "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(match-pattern as-or-bs-tag "aabbcc") @result{} 
+(match-pattern as-or-bs-tag "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as-or-bs-tag (as-tag aa))>
 @end lisp
 
@@ -325,7 +341,7 @@ find a valid substring starting at index 0 and @code{search-for-pattern} keeps
 looking.  They are both equally capable of ``parsing'' and ``matching''
 given those constraints.
 
-@deffn {Scheme Procedure} match-pattern nonterm string 
+@deffn {Scheme Procedure} match-pattern nonterm string
 Parses @var{string} using the PEG stored in @var{nonterm}.  If no match
 was found, @code{match-pattern} returns false.  If a match was found, a PEG
 match record is returned.
@@ -344,19 +360,19 @@ nothing
 
 @lisp
 (define-peg-pattern as all (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
 
 (define-peg-pattern as body (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
 
 (define-peg-pattern as none (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: ()>
 
 (define-peg-pattern bs body (+ "b"))
-(match-pattern bs "aabbcc") @result{} 
+(match-pattern bs "aabbcc") @result{}
 #f
 @end lisp
 @end deffn
@@ -371,31 +387,31 @@ was found, a PEG match record is returned.
 
 @lisp
 (define-peg-pattern as body (+ "a"))
-(search-for-pattern as "aabbcc") @result{} 
+(search-for-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern (+ "a") "aabbcc") @result{} 
+(search-for-pattern (+ "a") "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern "'a'+" "aabbcc") @result{} 
+(search-for-pattern "'a'+" "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
 
 (define-peg-pattern as all (+ "a"))
-(search-for-pattern as "aabbcc") @result{} 
+(search-for-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
 
 (define-peg-pattern bs body (+ "b"))
-(search-for-pattern bs "aabbcc") @result{} 
+(search-for-pattern bs "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern (+ "b") "aabbcc") @result{} 
+(search-for-pattern (+ "b") "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern "'b'+" "aabbcc") @result{} 
+(search-for-pattern "'b'+" "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
 
 (define-peg-pattern zs body (+ "z"))
-(search-for-pattern zs "aabbcc") @result{} 
+(search-for-pattern zs "aabbcc") @result{}
 #f
-(search-for-pattern (+ "z") "aabbcc") @result{} 
+(search-for-pattern (+ "z") "aabbcc") @result{}
 #f
-(search-for-pattern "'z'+" "aabbcc") @result{} 
+(search-for-pattern "'z'+" "aabbcc") @result{}
 #f
 @end lisp
 @end deffn
@@ -468,9 +484,9 @@ itself satisfies @var{tst}, @code{(list lst)} is returned (this is a
 flat list whose only element satisfies @var{tst}).
 
 @lisp
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{} 
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{}
 (2 2 (1 1 (2 2)) 2 2 (1 1))
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{} 
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{}
 ((1 1 (1 1 (2 2)) (2 2 (1 1))))
 @end lisp
 
@@ -624,7 +640,7 @@ symbol for strings)..
   (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
   (entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
 (peg:tree (match-pattern tag-passwd "one entry"))
-(tag-passwd 
+(tag-passwd
   (entry "one entry"))
 @end lisp
 
@@ -841,7 +857,7 @@ number <-- [0-9]+")
 		(apply next-func (car first))
 		(apply next-func (car rest)))
           ;; form (sum (((product ...) "+") ((product ...) "+")) (product ...))
-	  (car 
+	  (car
 	   (reduce ;; walk through the list and build a left-associative tree
 	    (lambda (l r)
 	      (list (list (cadr r) (car r) (apply next-func (car l)))
@@ -1035,3 +1051,248 @@ symbol function)}, where @code{symbol} is the symbol that will indicate
 a form of this type and @code{function} is the code generating function
 described above. The function @code{add-peg-compiler!} is exported from
 the @code{(ice-9 peg codegen)} module.
+
+@subsubheading Debug tracing
+
+Due to the backtracking nature of PEG, the parser result is @code{#f}
+when it cannot match the input text.  It proves to be a big pain
+determining whether the problem is actually in the input or in the
+grammar, especially when changing the grammar itself.  Setting the
+parameter @code{%peg:debug?} to @code{#t} enables debug tracing of all
+non-terminals, alternatively parameter @code{%peg:debug?} can be set to
+a list of non-terminal sysmbols.  Debug traring will make the PEG parser
+print for each selected non-terminal: its name, the input remaining, as
+well as the parse result.
+
+@lisp
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+                                     ;; missing: / term
+  "expression <-- term (PLUS / MINUS) expression
+   term       <-- factor (DIV / MULT) term / factor
+   factor     <-- LPAR expression RPAR / number
+   number     <-  MINUS? [0-9]+
+   PLUS       <-  '+'
+   MINUS      <-  '-'
+   DIV        <-  '/'
+   MULT       <-  '*'
+   LPAR       <  '('
+   RPAR       <  ')'")
+
+(parameterize ((%peg:debug? #t))
+  (pretty-print (match-pattern expression "1+")))
+@result{}
+expression
+    term
+        factor
+            LPAR
+            number
+                MINUS
+            number := "1"	next: "+"
+        factor := "1"	next: "+"
+        DIV
+        MULT
+    term := "1"	next: "+"
+    PLUS
+    PLUS := "+"	next: ""
+    expression
+        term
+            factor
+                LPAR
+                number
+                    MINUS
+#f
+
+@end lisp
+
+@subsubheading Locations
+
+Having the corresponding location of non-terminal matches in the parse
+tree proves to be useful when providing the user with warning or error
+messages. Location generation in the parse tree is enabled by setting
+the parameter @code{%peg:locations?} to @code{#t}.
+
+Example:
+
+@lisp
+(parameterize ((%peg:locations? #t))
+  (and=> (match-pattern expression "1+2/3*4") peg:tree))
+@result{}
+(expression
+  (term (factor "1" (location 0 1)) (location 0 1))
+  "+"
+  (expression
+    (term (factor "2" (location 2 3))
+          "*"
+          (term (factor "3" (location 4 5))
+                "/"
+                (term (factor "4" (location 6 7)) (location 6 7))
+                (location 4 7))
+          (location 2 7))
+    (location 2 7))
+  (location 0 7))
+@end lisp
+
+@subsubheading Skip parsing
+
+To write a PEG parser for a whitespace invariant language or a language
+which includes line and block comments requires littering the grammar
+with whitespace and comment parser expressions, which not only violates
+the DRY principle, but is hard to get right by hand.
+
+To this end, the @code{%peg:skip?} parameter is available.  When setting
+this parameter to an @emph{unwrapped parser} using
+@code{define-skip-parser}, this parser will be interleaved
+with the @emph{main parser}.
+
+Example:
+
+@lisp
+(use-modules (ice-9 match))
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+  "expression <-- term (PLUS / MINUS) expression / term
+   term       <-- factor (DIV / MULT) term / factor
+   factor     <-- LPAR expression RPAR / number
+   number     <-  MINUS? [0-9]+
+   PLUS       <-  '+'
+   MINUS      <-  '-'
+   DIV        <-  '/'
+   MULT       <-  '*'
+   LPAR       <  '('
+   RPAR       <  ')'")
+
+(define (transform e)
+  (match
+   e
+   (((? symbol?) lhs op rhs) `(,(transform op) ,(transform lhs) ,(transform rhs)))
+   (((? symbol?) lhs) (transform lhs))
+   ("+" '+)
+   ("-" '-)
+   ("*" '*)
+   ("/" '/)
+   ((? string?) (string->number e))))
+
+(pretty-print (transform (peg:tree (match-pattern expression "1/(2+3)*4"))))
+@result{}
+(/ 1 (* (+ 2 3) 4))
+
+(define-skip-parser peg-ws none (* (or " " "\t" "\n")))
+
+(parameterize ((%peg:skip? peg-ws))
+  (pretty-print (transform (peg:tree (match-pattern expression "1 / (2 + 3) * 4")))))
+@result{}
+(/ 1 (* (+ 2 3) 4))
+@end lisp
+
+@subsubheading Expect parsing
+
+The best thing about PEG is its backtracking nature giving it
+@emph{infinite} look ahead capability.  At the same time it severely
+limits the debugability of the grammar, as mentioned in debug tracing.
+The ability to short circuit backtracking is achieved by introducing the
+@code{#} operator to the PEG language.  Putting a @code{#} behind a
+terminal or non-terminal indicates that its parsing must succeed,
+otherwise an exception is thrown containing the current parser state
+providing a hook to produce informative parse errors.
+
+Example:
+
+@lisp
+(use-modules (ice-9 peg))
+
+(define-peg-string-patterns
+  "expression <-- term (PLUS / MINUS) expression# / term
+   term       <-- factor (DIV / MULT) term# / factor
+   factor     <-- LPAR expression# RPAR# / number#
+   number     <-  MINUS? [0-9]+
+   PLUS       <-  '+'
+   MINUS      <-  '-'
+   DIV        <-  '/'
+   MULT       <-  '*'
+   LPAR       <  '('
+   RPAR       <  ')'")
+
+(let ((input "1/(2+var)*4"))
+  (catch 'syntax-error
+        (lambda _ (peg:tree (match-pattern expression input)))
+        (lambda (key args) (format #t "at position ~a, expected a ~a, not: ~s\n"
+                                   (car args)
+                                   (cadr args)
+                                   (substring input (car args))))))
+@result{}
+at position 5, expected a number, not: "var)*4"
+@end lisp
+
+@subsubheading Fallback parsing
+
+A natural extension to expect parsing is fallback parsing. Which
+is implemented by catching the exception thrown by the expect operator.
+The parser will catch the exception in the peg @code{and}, @code{+} and
+@code{*} operator to attempt to recover its state by eating away at the
+input until it runs out of input or until one of the grammar
+continuations matches and it will continue parsing from there.
+
+By setting parameter @code{%peg:fallback?} to @code{#t} fallback parsing
+will be enabled when it encouters a mismatch in the input on an expect
+production. Parameter @code{%peg:error} is used to handle the error and
+it receives the position, the input and a list representing the error as
+pair. When the @code{car} is @code{'skipped}, the @code{cdr} is the
+substring which is ignored by the parser. When the @code{car} is
+@code{'missing} the @code{cdr} is the parser snippet that was expected,
+but is missing in the input.
+
+Example:
+
+@lisp
+(use-modules (ice-9 peg))
+(use-modules (ice-9 pretty-print))
+
+(define-peg-string-patterns
+  "expression <-- term (PLUS / MINUS) expression# / term
+   term       <-- factor (DIV / MULT) term# / factor
+   factor     <-- LPAR expression RPAR# / number
+   number     <-  MINUS? [0-9]+
+   PLUS       <-  '+'
+   MINUS      <-  '-'
+   DIV        <-  '/'
+   MULT       <-  '*'
+   LPAR       <   '('
+   RPAR       <   ')'")
+
+(define (handle-error pos str error)
+  (format #t "at postion ~a, ~a: \"~a\"\n"
+          pos (car error) (cdr error)))
+
+(let ((input "1/(2+var)*4"))
+  (parameterize ((%peg:fallback? #t)
+                 (%peg:error handle-error))
+    (pretty-print (peg:tree (match-pattern expression input)))))
+@result{}
+at postion 5, skipped: "var)"
+(expression
+  (term (factor "1")
+        "/"
+        (term (factor
+                (expression (term (factor "2")) "+" expression))
+              "*"
+              (term (factor "4")))))
+
+(let ((input "1/(2+)*4"))
+  (parameterize ((%peg:fallback? #t)
+                 (%peg:error handle-error))
+    (pretty-print (peg:tree (match-pattern expression input)))))
+@result{}
+at postion 5, missing: "expression"
+(expression
+  (term (factor "1")
+        "/"
+        (term (factor
+                (expression (term (factor "2")) "+" expression))
+              "*"
+              (term (factor "4")))))
+@end lisp
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 4e03131cd..ca7d091bf 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -38,5 +38,10 @@
                peg:string
                peg:tree
                peg:substring
-               peg-record?))
-
+               peg-record?
+               %peg:debug?
+               %peg:fallback?
+               %peg:locations?
+               %peg:skip?
+               %peg:error
+               define-skip-parser))
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index d80c3e849..fcfb71422 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -18,10 +18,27 @@
 ;;;;
 
 (define-module (ice-9 peg codegen)
-  #:export (compile-peg-pattern wrap-parser-for-users add-peg-compiler!)
+  #:export (compile-peg-pattern
+            wrap-parser-for-users
+            add-peg-compiler!
+            define-skip-parser
+            %peg:debug?
+            %peg:locations?
+            %peg:skip?
+            %peg:fallback?
+            %peg:error)
+  #:use-module (srfi srfi-1)
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
+(define %peg:debug? (make-parameter #f))
+(define %peg:locations? (make-parameter #f))
+(define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
+(define %peg:fallback? (make-parameter #f))
+(define %peg:error (make-parameter (lambda (pos str error) #f)))
+
+(define %continuation (make-parameter (lambda (str streln at) #f)))
+
 (define-syntax single?
   (syntax-rules ()
     "Return #t if X is a list of one element."
@@ -162,6 +179,38 @@ return EXP."
    ((eq? accum 'none) 'none)))
 (define baf builtin-accum-filter)
 
+(define (format-error missing str)
+  (lambda (from to)
+    (unless (and (< from to)
+                 (string-every char-set:whitespace str from (1+ to)))
+      ((%peg:error) from str
+       (if (< from to) (cons 'skipped (substring str from (1+ to)))
+           (cons 'missing missing))))))
+
+(define* (fallback-skip kernel #:optional sequence?)
+  (if (not (%peg:fallback?)) kernel
+      (lambda (str strlen start)
+        (catch 'syntax-error
+               (lambda _
+                 (cond ((or #t (< start strlen)) (kernel str strlen start))
+                       ((not sequence?) `(,strlen ()))
+                       (else #f)))
+               (lambda (key . args)
+                 (let* ((expected (cadar args))
+                        (format-error (format-error expected str)))
+                   (let loop ((at start))
+                     (cond ((or (= at strlen) ((%continuation) str strlen at)) (format-error start at) (if sequence? `(,at ()) `(,at (,expected))))
+                           (else (or (let ((res (false-if-exception (kernel str strlen (1+ at))))) (when res(format-error start at)) res)
+                                     ;;if kernel matches, we have skipped over: (substring str start (1+ at)))
+                                     (loop (1+ at))))))))))))
+
+
+(define (partial-match kernel)
+  (lambda (str strlen at)
+    (catch #t
+           (lambda _ (kernel str strlen at))
+           (lambda (key . args) (and (< at (caar args)) (car args))))))
+
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and clauses accum)
   #`(lambda (str len pos)
@@ -174,8 +223,14 @@ return EXP."
     (()
      (cggr accum 'cg-and #`(reverse #,body) at))
     ((first rest ...)
-     #`(let ((res (#,(compile-peg-pattern #'first accum) #,str #,strlen #,at)))
-         (and res 
+     #`(let* ((next #,(cg-or #'(rest ...) 'body))
+              (kernel #,(compile-peg-pattern #'first accum))
+              (res (parameterize ((%continuation (let ((after-that (%continuation)))
+                                                   (lambda (str strlen at)
+                                                     (or ((partial-match next) str strlen at)
+                                                         ((partial-match after-that) str strlen at))))))
+                     ((fallback-skip kernel) #,str #,strlen #,at))))
+         (and res
               ;; update AT and BODY then recurse
               (let ((newat (car res))
                     (newbody (cadr res)))
@@ -200,42 +255,40 @@ return EXP."
 (define (cg-* args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#t))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (kleene (lambda (str strlen at)
+                        (let ((body '()))
+                          (let lp ((end at) (count 0))
+                            (let* ((match ((fallback-skip kernel #t) str strlen end))
+                                   (new-end (if match (car match) end))
+                                   (count (if (> new-end end) (1+ count) count)))
+                              (when (> new-end end)
+                                (push-not-null! body (single-filter (cadr match))))
+                              (if (and (> new-end end) #,#t) (lp new-end count)
+                                  (let ((success #,#t))
+                                    #,#`(and success
+                                             #,(cggr (baf accum) 'cg-body
+                                                     #'(reverse body) #'new-end))))))))))
+         kleene))))
 
 (define (cg-+ args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#'(>= count 1)))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (multiple (lambda (str strlen at)
+                          (let ((body '()))
+                            (let lp ((end at) (count 0))
+                              (let* ((match ((fallback-skip kernel #t) str strlen end))
+                                     (new-end (if match (car match) end))
+                                     (count (if (> new-end end) (1+ count) count)))
+                                (when (> new-end end)
+                                  (push-not-null! body (single-filter (cadr match))))
+                                (if (and (> new-end end) #,#t) (lp new-end count)
+                                    (let ((success #,#'(>= count 1)))
+                                      #,#`(and success
+                                               #,(cggr (baf accum) 'cg-body
+                                                       #'(reverse body) #'new-end))))))))))
+         multiple))))
 
 (define (cg-? args accum)
   (syntax-case args ()
@@ -296,6 +349,16 @@ return EXP."
                                 #f
                                 #,(cggr (baf accum) 'cg-body #''() #'at)))))))))))
 
+(define (cg-expect-int clauses accum str strlen at)
+  (syntax-case clauses ()
+    ((pat)
+     #`(or (#,(compile-peg-pattern #'pat accum) #,str #,strlen #,at)
+           (throw 'syntax-error (list #,at (syntax->datum #'pat)))))))
+
+(define (cg-expect clauses accum)
+  #`(lambda (str len pos)
+      #,(cg-expect-int clauses (baf accum) #'str #'len #'pos)))
+
 ;; Association list of functions to handle different expressions as PEGs
 (define peg-compiler-alist '())
 
@@ -313,6 +376,7 @@ return EXP."
 (add-peg-compiler! '? cg-?)
 (add-peg-compiler! 'followed-by cg-followed-by)
 (add-peg-compiler! 'not-followed-by cg-not-followed-by)
+(add-peg-compiler! 'expect cg-expect)
 
 ;; Takes an arbitrary expressions and accumulation variable, then parses it.
 ;; E.g.: (compile-peg-pattern syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
@@ -332,28 +396,65 @@ return EXP."
                                 "Not one of" (map car peg-compiler-alist)))))))
 
 ;; Packages the results of a parser
+
+(define (trace? symbol)
+  (cond ((pair? (%peg:debug?)) (memq symbol (%peg:debug?)))
+        ((or (null? (%peg:debug?)) (%peg:debug?)) #t)
+        (else #f)))
+
+(define indent 0)
+
 (define (wrap-parser-for-users for-syntax parser accumsym s-syn)
-   #`(lambda (str strlen at)
-      (let ((res (#,parser str strlen at)))
+  #`(lambda (str strlen at)
+      (when (trace? '#,s-syn)
+        (format (current-error-port) "~a~a\n"
+                (make-string indent #\space)
+                '#,s-syn))
+      (set! indent (+ indent 4))
+      (let* ((comment-res ((%peg:skip?) str strlen at))
+             (comment-loc (and (%peg:locations?) comment-res `(location ,at ,(car comment-res))))
+             (at (or (and comment-res (car comment-res)) at))
+             (res (#,parser str strlen at)))
+        (set! indent (- indent 4))
+        (let ((pos (or (and res (car res)) 0)))
+          (when (and (trace? '#,s-syn) (< at pos))
+            (format (current-error-port) "~a~a := ~s\tnext: ~s\n"
+                    (make-string indent #\space)
+                    '#,s-syn
+                    (substring str at pos)
+                    (substring str pos (min strlen (+ pos 10))))))
         ;; Try to match the nonterminal.
         (if res
             ;; If we matched, do some post-processing to figure out
             ;; what data to propagate upward.
-            (let ((at (car res))
-                  (body (cadr res)))
+            (let* ((body (cadr res))
+                   (loc `(location ,at ,(car res)))
+                   (annotate (if (not (%peg:locations?)) '()
+                                 (if (null? (cadr comment-res)) `(,loc)
+                                     `((comment ,(cdr comment-res) ,comment-loc) ,loc))))
+                   (at (car res)))
               #,(cond
                  ((eq? accumsym 'name)
-                  #`(list at '#,s-syn))
+                  #`(list at '#,s-syn ,@annotate))
                  ((eq? accumsym 'all)
                   #`(list (car res)
                           (cond
                            ((not (list? body))
-                            (list '#,s-syn body))
-                           ((null? body) '#,s-syn)
+                            `(,'#,s-syn ,body ,@annotate))
+                           ((null? body)
+                            `(,'#,s-syn ,@annotate))
                            ((symbol? (car body))
-                            (list '#,s-syn body))
-                           (else (cons '#,s-syn body)))))
-                 ((eq? accumsym 'none) #`(list (car res) '()))
-                 (else #`(begin res))))
+                            `(,'#,s-syn ,body ,@annotate))
+                           (else
+                            (cons '#,s-syn (append body annotate))))))
+                 ((eq? accumsym 'none) #``(,at () ,@annotate))
+                 (else #``(,at ,body ,@annotate))))
             ;; If we didn't match, just return false.
             #f))))
+
+(define-syntax define-skip-parser
+  (lambda (x)
+    (syntax-case x ()
+      ((_ sym accum pat)
+       (let* ((matchf (compile-peg-pattern #'pat (syntax->datum #'accum))))
+         #`(define sym #,matchf))))))
diff --git a/module/ice-9/peg/string-peg.scm b/module/ice-9/peg/string-peg.scm
index 45ed14bb1..e8af6bb3d 100644
--- a/module/ice-9/peg/string-peg.scm
+++ b/module/ice-9/peg/string-peg.scm
@@ -42,7 +42,8 @@
 pattern <-- alternative (SLASH sp alternative)*
 alternative <-- ([!&]? sp suffix)+
 suffix <-- primary ([*+?] sp)*
-primary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
+primary <-- secondary ([#] sp)?
+secondary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
 literal <-- ['] (!['] .)* ['] sp
 charclass <-- LB (!']' (CCrange / CCsingle))* RB sp
 CCrange <-- . '-' .
@@ -73,6 +74,8 @@ RB < ']'
 (define-sexp-parser peg-suffix all
   (and peg-primary (* (and (or "*" "+" "?") peg-sp))))
 (define-sexp-parser peg-primary all
+  (and peg-secondary (? (and "#" peg-sp))))
+(define-sexp-parser peg-secondary all
   (or (and "(" peg-sp peg-pattern ")" peg-sp)
       (and "." peg-sp)
       peg-literal
@@ -188,6 +191,11 @@ RB < ']'
 
 ;; Parse a primary.
 (define (peg-primary->defn lst for-syntax)
+  (let ((inner-defn (peg-secondary->defn (cadr lst) for-syntax)))
+    (if (and (pair? (cddr lst)) (equal? (caddr lst) "#")) #`(expect #,inner-defn)
+        inner-defn)))
+
+(define (peg-secondary->defn lst for-syntax)
   (let ((el (cadr lst)))
   (cond
    ((list? el)
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index f516571e8..3f2ea70e1 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -1,5 +1,5 @@
+;;;;; PEG test suite. -*- scheme -*-
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; PEG test suite.
 ;; Tests the parsing capabilities of (ice-9 peg).  Could use more
 ;; tests for edge cases.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -33,6 +33,7 @@
     (alternative peg-alternative)
     (suffix peg-suffix)
     (primary peg-primary)
+    (secondary peg-secondary)
     (literal peg-literal)
     (charclass peg-charclass)
     (CCrange charclass-range)
@@ -63,13 +64,12 @@
   (pass-if
    "defining PEGs with PEG"
    (and (eeval `(define-peg-string-patterns ,(@@ (ice-9 peg) peg-as-peg))) #t))
-  (pass-if
+  (pass-if-equal
    "equivalence of definitions"
-   (equal?
-    (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
+      (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
     (tree-map
      grammar-transform
-     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
+     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg))))))
 
 ;; A grammar for pascal-style comments from Wikipedia.
 (define comment-grammar
@@ -276,3 +276,113 @@ number <-- [0-9]+")
    (equal? (eq-parse "1+1/2*3+(1+1)/2")
 	   '(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)))))
 
+(define-peg-string-patterns
+  "trace-grammar <-- foo bar* baz
+foo <-- 'foo'
+bar <-- 'bar'
+baz <-- 'baz'")
+
+(use-modules (ice-9 peg codegen)) ; %peg:debug?
+(with-test-prefix "Parse tracing"
+  (pass-if-equal
+   "trace"
+"trace-grammar
+    foo
+    foo := \"foo\"	next: \"baz\"
+    bar
+    baz
+    baz := \"baz\"	next: \"\"
+trace-grammar := \"foobaz\"	next: \"\"
+"
+   (parameterize ((%peg:debug? #t))
+      (with-error-to-string
+        (lambda _ (and=> (match-pattern trace-grammar "foobaz")
+                         peg:tree))))))
+
+(define-peg-string-patterns
+  "expect-grammar <-- one# two# three#
+one <-- 'one'
+two <-- 'two'
+three <-- 'three'"
+)
+
+(with-test-prefix "Parsing expect"
+  (pass-if-equal "expect okay"
+      '(expect-grammar (one "one") (two "two") (three "three"))
+      (and=> (match-pattern expect-grammar "onetwothree")
+             peg:tree))
+  (pass-if-equal "expect one"
+      '(syntax-error (0 one))
+    (catch 'syntax-error
+      (lambda _
+        (and=> (match-pattern expect-grammar "twothree")
+               peg:tree))
+      (lambda args args)))
+  (pass-if-equal "expect two"
+      '(syntax-error (3 two))
+    (catch 'syntax-error
+      (lambda _
+        (and=> (match-pattern expect-grammar "onethree")
+               peg:tree))
+      (lambda args args))))
+
+(define program-text "
+/*
+ CopyLeft (L) Acme
+*/
+foo // the first
+bar
+	bar
+baz
+")
+
+(define-skip-parser peg-ws all (or " " "\t"))
+(define-skip-parser peg-eol all (or "\r" "\n"))
+(define-skip-parser peg-line all (and "//" (* (and (not-followed-by peg-eol) peg-any))))
+(define-skip-parser peg-block all (and "/*" (* (or peg-block (and (not-followed-by "*/") peg-any))) (expect "*/")))
+(define-skip-parser peg-skip all (* (or peg-ws peg-eol peg-line peg-block)))
+
+(with-test-prefix "Skip parser"
+  (pass-if-equal "skip comments and whitespace"
+      '(trace-grammar (foo "foo") ((bar "bar") (bar "bar")) (baz "baz"))
+    (and=> (parameterize ((%peg:skip? peg-skip))
+             (match-pattern trace-grammar program-text))
+           peg:tree))
+  (pass-if-equal "preserve comments and whitespace"
+      '(trace-grammar (foo "foo" (location 26 29))
+                      ((bar "bar"
+                            (comment " // the first\n" (location 29 43))
+                            (location 43 46))
+                       (bar "bar"
+                            (comment "\n\t" (location 46 48))
+                            (location 48 51)))
+                      (baz "baz"
+                           (comment "\n" (location 51 52))
+                           (location 52 55))
+                      (comment "\n/*\n CopyLeft (L) Acme\n*/\n" (location 0 26))
+                      (location 26 55))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:locations? #t))
+             (match-pattern trace-grammar program-text))
+           peg:tree)))
+
+(with-test-prefix "Fallback parser"
+  (pass-if-equal "only one"
+      '(expect-grammar (one "one") two three)
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fallback? #t))
+             (match-pattern expect-grammar "one"))
+           peg:tree))
+  (pass-if-equal "no two"
+      '(expect-grammar (one "one") two (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fallback? #t))
+             (match-pattern expect-grammar "one three"))
+           peg:tree))
+  (pass-if-equal "missing one (skipping zero)"
+      '(expect-grammar (one (two "two") (three "three")))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fallback? #t))
+             (match-pattern expect-grammar "zero two three"))
+           peg:tree)))
+
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 0/5] Extend PEG for production use.
  2020-05-11 13:41 [PATCH] ice-9/peg: Extend PEG for production use Rutger van Beusekom
@ 2024-10-14  7:26 ` Janneke Nieuwenhuizen
  2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
  0 siblings, 1 reply; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:26 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga

Rutger van Beusekom writes:

> First of all, I am aware I am submitting a sizeable patch here, and I am
> happy to break it up if necessary.

As Ekaitz has been doing some great PEG work and it seems that there's
some renewed interest in PEG, please find a newer version of this patch
as a broken-up patch set and rebased on latest main.  It features:

  * debug tracing, for debugging of grammar
  * expectation parser, for (syntax) error reporting
  * parameterize'able whitespace and comment skip parsing
  * preserving of location information
  * fall-back parsing, for parsing incomplete texts

We've been using these extensions in Dezyne (https://dezyne.org) for
some years now.

For your pulling convenience, also here
    https://gitlab.com/janneke/guile/-/commits/wip-peg

Greetings,
Janneke

Rutger van Beusekom (5):
  Remove trailing whitespace in PEG texinfo.
  peg: Add debug tracing.
  peg: Add expect.
  peg: Add whitespace and comment skip parsers.
  peg: Add fall-back parsing.

 doc/ref/api-peg.texi            | 143 ++++++++++++++++----
 module/ice-9/peg.scm            |  11 +-
 module/ice-9/peg/codegen.scm    | 231 +++++++++++++++++++++++++-------
 module/ice-9/peg/string-peg.scm |  12 +-
 test-suite/tests/peg.test       | 141 +++++++++++++++++--
 5 files changed, 453 insertions(+), 85 deletions(-)

-- 
Janneke Nieuwenhuizen <janneke@gnu.org>  | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo.
  2024-10-14  7:26 ` [PATCH v2 0/5] " Janneke Nieuwenhuizen
@ 2024-10-14  7:31   ` Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 2/5] peg: Add debug tracing Janneke Nieuwenhuizen
                       ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:31 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

From: Rutger van Beusekom <rutger@dezyne.org>

* doc/ref/api-peg.texi (PEG API Reference): Remove trailing whitespace.
---
 doc/ref/api-peg.texi | 46 ++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index d34ddc64c..0214f7ff1 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -221,11 +221,11 @@ character (in normal PEGs nonterminals can only be alphabetic).
 
 For example, if we:
 @lisp
-(define-peg-string-patterns 
+(define-peg-string-patterns
   "as <- 'a'+
 bs <- 'b'+
 as-or-bs <- as/bs")
-(define-peg-string-patterns 
+(define-peg-string-patterns
   "as-tag <-- 'a'+
 bs-tag <-- 'b'+
 as-or-bs-tag <-- as-tag/bs-tag")
@@ -271,9 +271,9 @@ For Example, if we:
 @end lisp
 Then:
 @lisp
-(match-pattern as-or-bs "aabbcc") @result{} 
+(match-pattern as-or-bs "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(match-pattern as-or-bs-tag "aabbcc") @result{} 
+(match-pattern as-or-bs-tag "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as-or-bs-tag (as-tag aa))>
 @end lisp
 
@@ -325,7 +325,7 @@ find a valid substring starting at index 0 and @code{search-for-pattern} keeps
 looking.  They are both equally capable of ``parsing'' and ``matching''
 given those constraints.
 
-@deffn {Scheme Procedure} match-pattern nonterm string 
+@deffn {Scheme Procedure} match-pattern nonterm string
 Parses @var{string} using the PEG stored in @var{nonterm}.  If no match
 was found, @code{match-pattern} returns false.  If a match was found, a PEG
 match record is returned.
@@ -344,19 +344,19 @@ nothing
 
 @lisp
 (define-peg-pattern as all (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
 
 (define-peg-pattern as body (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
 
 (define-peg-pattern as none (+ "a"))
-(match-pattern as "aabbcc") @result{} 
+(match-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: ()>
 
 (define-peg-pattern bs body (+ "b"))
-(match-pattern bs "aabbcc") @result{} 
+(match-pattern bs "aabbcc") @result{}
 #f
 @end lisp
 @end deffn
@@ -371,31 +371,31 @@ was found, a PEG match record is returned.
 
 @lisp
 (define-peg-pattern as body (+ "a"))
-(search-for-pattern as "aabbcc") @result{} 
+(search-for-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern (+ "a") "aabbcc") @result{} 
+(search-for-pattern (+ "a") "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
-(search-for-pattern "'a'+" "aabbcc") @result{} 
+(search-for-pattern "'a'+" "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: aa>
 
 (define-peg-pattern as all (+ "a"))
-(search-for-pattern as "aabbcc") @result{} 
+(search-for-pattern as "aabbcc") @result{}
 #<peg start: 0 end: 2 string: aabbcc tree: (as aa)>
 
 (define-peg-pattern bs body (+ "b"))
-(search-for-pattern bs "aabbcc") @result{} 
+(search-for-pattern bs "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern (+ "b") "aabbcc") @result{} 
+(search-for-pattern (+ "b") "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
-(search-for-pattern "'b'+" "aabbcc") @result{} 
+(search-for-pattern "'b'+" "aabbcc") @result{}
 #<peg start: 2 end: 4 string: aabbcc tree: bb>
 
 (define-peg-pattern zs body (+ "z"))
-(search-for-pattern zs "aabbcc") @result{} 
+(search-for-pattern zs "aabbcc") @result{}
 #f
-(search-for-pattern (+ "z") "aabbcc") @result{} 
+(search-for-pattern (+ "z") "aabbcc") @result{}
 #f
-(search-for-pattern "'z'+" "aabbcc") @result{} 
+(search-for-pattern "'z'+" "aabbcc") @result{}
 #f
 @end lisp
 @end deffn
@@ -468,9 +468,9 @@ itself satisfies @var{tst}, @code{(list lst)} is returned (this is a
 flat list whose only element satisfies @var{tst}).
 
 @lisp
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{} 
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(2 2 (1 1 (2 2)) (2 2 (1 1)))) @result{}
 (2 2 (1 1 (2 2)) 2 2 (1 1))
-(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{} 
+(context-flatten (lambda (x) (and (number? (car x)) (= (car x) 1))) '(1 1 (1 1 (2 2)) (2 2 (1 1)))) @result{}
 ((1 1 (1 1 (2 2)) (2 2 (1 1))))
 @end lisp
 
@@ -624,7 +624,7 @@ symbol for strings)..
   (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
   (entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
 (peg:tree (match-pattern tag-passwd "one entry"))
-(tag-passwd 
+(tag-passwd
   (entry "one entry"))
 @end lisp
 
@@ -841,7 +841,7 @@ number <-- [0-9]+")
 		(apply next-func (car first))
 		(apply next-func (car rest)))
           ;; form (sum (((product ...) "+") ((product ...) "+")) (product ...))
-	  (car 
+	  (car
 	   (reduce ;; walk through the list and build a left-associative tree
 	    (lambda (l r)
 	      (list (list (cadr r) (car r) (apply next-func (car l)))
-- 
2.46.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/5] peg: Add debug tracing.
  2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
@ 2024-10-14  7:31     ` Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 3/5] peg: Add expect Janneke Nieuwenhuizen
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:31 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

From: Rutger van Beusekom <rutger@dezyne.org>

* module/ice-9/peg/codegen.scm (trace?): New function.
(indent): New variable.
(%peg:debug?): New exported parameter.
(wrap-parser-for-users): Use them to provide debug tracing.
* test-suite/tests/peg.test ("Parse tracing"): Test it.
* doc/ref/api-peg.texi (Debug tracing): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
---
 doc/ref/api-peg.texi         | 18 ++++++++-
 module/ice-9/peg.scm         |  3 +-
 module/ice-9/peg/codegen.scm | 74 ++++++++++++++++++++++++------------
 test-suite/tests/peg.test    | 39 ++++++++++++++++---
 4 files changed, 102 insertions(+), 32 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 0214f7ff1..dfa806832 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 2006, 2010, 2011
+@c Copyright (C) 2006, 2010, 2011, 2024
 @c   Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
@@ -1035,3 +1035,19 @@ symbol function)}, where @code{symbol} is the symbol that will indicate
 a form of this type and @code{function} is the code generating function
 described above.  The function @code{add-peg-compiler!} is exported from
 the @code{(ice-9 peg codegen)} module.
+
+@subsubheading Debug tracing
+
+Due to the backtracking nature of PEG, the parser result is @code{#f}
+when it cannot match the input text.  It proves to be a big pain
+determining whether the problem is actually in the input or in the
+grammar, especially when changing the grammar itself.  Setting the
+parameter @var{%peg:debug?} to @code{#t} enables debug tracing, which
+will make the PEG parser print for each production rule: its name, the
+current state of the input, as well as the parse result.
+
+@lisp
+(define-peg-string-patterns "grammar @dots{}")
+(parameterize ((%peg:debug? #t))
+  (and=> (match-pattern grammar input-text) peg:tree))
+@end lisp
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 4e03131cd..499c3820c 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -1,6 +1,6 @@
 ;;;; peg.scm --- Parsing Expression Grammar (PEG) parser generator
 ;;;;
-;;;; 	Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2010, 2011, 2024 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -28,6 +28,7 @@
   #:use-module (ice-9 peg cache)
   #:re-export (define-peg-pattern
                define-peg-string-patterns
+               %peg:debug?
                match-pattern
                search-for-pattern
                compile-peg-pattern
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index d80c3e849..c450be440 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -1,6 +1,6 @@
 ;;;; codegen.scm --- code generation for composable parsers
 ;;;;
-;;;; 	Copyright (C) 2011 Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2011, 2024 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -18,7 +18,11 @@
 ;;;;
 
 (define-module (ice-9 peg codegen)
-  #:export (compile-peg-pattern wrap-parser-for-users add-peg-compiler!)
+  #:export (compile-peg-pattern
+            wrap-parser-for-users
+            add-peg-compiler!
+            %peg:debug?)
+
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
@@ -332,28 +336,48 @@ return EXP."
                                 "Not one of" (map car peg-compiler-alist)))))))
 
 ;; Packages the results of a parser
+
+(define %peg:debug? (make-parameter #f))
+(define (trace? symbol)
+  (%peg:debug?))
+
+(define indent 0)
+
 (define (wrap-parser-for-users for-syntax parser accumsym s-syn)
-   #`(lambda (str strlen at)
+  #`(lambda (str strlen at)
+      (when (trace? '#,s-syn)
+        (format (current-error-port) "~a~a\n"
+                (make-string indent #\space)
+                '#,s-syn))
+      (set! indent (+ indent 4))
       (let ((res (#,parser str strlen at)))
-        ;; Try to match the nonterminal.
-        (if res
-            ;; If we matched, do some post-processing to figure out
-            ;; what data to propagate upward.
-            (let ((at (car res))
-                  (body (cadr res)))
-              #,(cond
-                 ((eq? accumsym 'name)
-                  #`(list at '#,s-syn))
-                 ((eq? accumsym 'all)
-                  #`(list (car res)
-                          (cond
-                           ((not (list? body))
-                            (list '#,s-syn body))
-                           ((null? body) '#,s-syn)
-                           ((symbol? (car body))
-                            (list '#,s-syn body))
-                           (else (cons '#,s-syn body)))))
-                 ((eq? accumsym 'none) #`(list (car res) '()))
-                 (else #`(begin res))))
-            ;; If we didn't match, just return false.
-            #f))))
+        (set! indent (- indent 4))
+        (let ((pos (or (and res (car res)) 0)))
+          (when (and (trace? '#,s-syn) (< at pos))
+            (format (current-error-port) "~a~a := ~s\tnext: ~s\n"
+                    (make-string indent #\space)
+                    '#,s-syn
+                    (substring str at pos)
+                    (substring str pos (min strlen (+ pos 10)))))
+          ;; Try to match the nonterminal.
+          (if res
+              ;; If we matched, do some post-processing to figure out
+              ;; what data to propagate upward.
+              (let ((at (car res))
+                    (body (cadr res)))
+                #,(cond
+                   ((eq? accumsym 'name)
+                    #`(list at '#,s-syn))
+                   ((eq? accumsym 'all)
+                    #`(list (car res)
+                            (cond
+                             ((not (list? body))
+                              (list '#,s-syn body))
+                             ((null? body) '#,s-syn)
+                             ((symbol? (car body))
+                              (list '#,s-syn body))
+                             (else (cons '#,s-syn body)))))
+                   ((eq? accumsym 'none) #`(list (car res) '()))
+                   (else #`(begin res))))
+              ;; If we didn't match, just return false.
+              #f)))))
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index f516571e8..6a8709794 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -1,14 +1,14 @@
+;;;;; PEG test suite. -*- scheme -*-
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; PEG test suite.
 ;; Tests the parsing capabilities of (ice-9 peg).  Could use more
 ;; tests for edge cases.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define-module (test-suite test-peg)
-  :use-module (test-suite lib)
-  :use-module (ice-9 peg)
-  :use-module (ice-9 pretty-print)
-  :use-module (srfi srfi-1))
+  #:use-module (srfi srfi-1)
+  #:use-module (test-suite lib)
+  #:use-module (ice-9 peg)
+  #:use-module (ice-9 pretty-print))
 
 ;; Doubled up for pasting into REPL.
 (use-modules (test-suite lib))  
@@ -276,3 +276,32 @@ number <-- [0-9]+")
    (equal? (eq-parse "1+1/2*3+(1+1)/2")
 	   '(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)))))
 
+(define-peg-string-patterns
+  "trace-grammar <-- foo bar* baz
+foo <-- 'foo'
+bar <-- bla+
+bla <-- 'bar'
+baz <-- 'baz'")
+
+(with-test-prefix "Parse tracing"
+  (pass-if-equal
+   "trace"
+"trace-grammar
+    foo
+    foo := \"foo\"	next: \"barbarbaz\"
+    bar
+        bla
+        bla := \"bar\"	next: \"barbaz\"
+        bla
+        bla := \"bar\"	next: \"baz\"
+        bla
+    bar := \"barbar\"	next: \"baz\"
+    bar
+    baz
+    baz := \"baz\"	next: \"\"
+trace-grammar := \"foobarbarbaz\"	next: \"\"
+"
+   (parameterize ((%peg:debug? #t))
+      (with-error-to-string
+        (lambda _ (and=> (match-pattern trace-grammar "foobarbarbaz")
+                         peg:tree))))))
-- 
2.46.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/5] peg: Add expect.
  2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 2/5] peg: Add debug tracing Janneke Nieuwenhuizen
@ 2024-10-14  7:31     ` Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 4/5] peg: Add whitespace and comment skip parsers Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 5/5] peg: Add fall-back parsing Janneke Nieuwenhuizen
  3 siblings, 0 replies; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:31 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

From: Rutger van Beusekom <rutger@dezyne.org>

This adds an expectation parser (expect a, 'a#') to the PEG parser
grammar.

Rationale: PEG will return #f for invalid input.  Adding expect (#)
to the grammar is a way to report syntax errors.

* module/ice-9/peg/string-peg.scm (peg-secondary->defn): Rename from
peg-primary->defn.
(cg-expect-int, gc-expect): New function.
(expect): Add them in new parser.
(peg-primary->defn): Use it in new function.
(peg-as-peg "secondary"): Rename from "primary".
(peg-as-peg "primary"): Use it, inserting `#' as expect.
* test-suite/tests/peg.test (grammar-mapping): Update for secondary.
("PEG Grammar"): Use pass-if-equal for friendlier failure resolving.
("Parsing expect"): Test it.
* doc/ref/api-peg.texi (PEG Syntax Reference): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
---
 doc/ref/api-peg.texi            | 21 ++++++++++++++++++++
 module/ice-9/peg/codegen.scm    | 11 +++++++++++
 module/ice-9/peg/string-peg.scm | 12 +++++++++--
 test-suite/tests/peg.test       | 35 +++++++++++++++++++++++++++++----
 4 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index dfa806832..df2e74d05 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -159,6 +159,15 @@ Would be:
  (+ "e"))
 @end lisp
 
+@deftp {PEG Pattern} expect a
+Expect to parse @var{a}.  If this succeeds, continues.  If this fails,
+throw a @code{syntax-error} with location and failed expectation.
+
+@code{"a#"}
+
+@code{(expect a)}
+@end deftp
+
 @subsubheading Extended Syntax
 
 There is some extra syntax for S-expressions.
@@ -1037,6 +1046,7 @@ described above.  The function @code{add-peg-compiler!} is exported from
 the @code{(ice-9 peg codegen)} module.
 
 @subsubheading Debug tracing
+@anchor{Debug tracing}
 
 Due to the backtracking nature of PEG, the parser result is @code{#f}
 when it cannot match the input text.  It proves to be a big pain
@@ -1051,3 +1061,14 @@ current state of the input, as well as the parse result.
 (parameterize ((%peg:debug? #t))
   (and=> (match-pattern grammar input-text) peg:tree))
 @end lisp
+
+@subsubheading Expect parsing
+
+The best thing about PEG is its backtracking nature making it
+LL(infinite).  At the same time it severely limits the debugability of
+the grammar, as mentioned in @xref{Debug tracing}.  The ability to stop
+backtracking is achieved by introducing the @code{#} operator to the PEG
+language.  Putting a @code{#} behind a terminal or non-terminal
+indicates that its parsing must succeed, otherwise an exception is
+thrown containing the current parser state providing a hook to produce
+informative parse errors.
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index c450be440..dd24bdac0 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -300,6 +300,16 @@ return EXP."
                                 #f
                                 #,(cggr (baf accum) 'cg-body #''() #'at)))))))))))
 
+(define (cg-expect-int clauses accum str strlen at)
+  (syntax-case clauses ()
+    ((pat)
+     #`(or (#,(compile-peg-pattern #'pat accum) #,str #,strlen #,at)
+           (throw 'syntax-error (list #,at (syntax->datum #'pat)))))))
+
+(define (cg-expect clauses accum)
+  #`(lambda (str len pos)
+      #,(cg-expect-int clauses (baf accum) #'str #'len #'pos)))
+
 ;; Association list of functions to handle different expressions as PEGs
 (define peg-compiler-alist '())
 
@@ -317,6 +327,7 @@ return EXP."
 (add-peg-compiler! '? cg-?)
 (add-peg-compiler! 'followed-by cg-followed-by)
 (add-peg-compiler! 'not-followed-by cg-not-followed-by)
+(add-peg-compiler! 'expect cg-expect)
 
 ;; Takes an arbitrary expressions and accumulation variable, then parses it.
 ;; E.g.: (compile-peg-pattern syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
diff --git a/module/ice-9/peg/string-peg.scm b/module/ice-9/peg/string-peg.scm
index 45ed14bb1..da98a0da6 100644
--- a/module/ice-9/peg/string-peg.scm
+++ b/module/ice-9/peg/string-peg.scm
@@ -1,6 +1,6 @@
 ;;;; string-peg.scm --- representing PEG grammars as strings
 ;;;;
-;;;; 	Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2010, 2011, 2024 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -42,7 +42,8 @@
 pattern <-- alternative (SLASH sp alternative)*
 alternative <-- ([!&]? sp suffix)+
 suffix <-- primary ([*+?] sp)*
-primary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
+primary <-- secondary ([#] sp)?
+secondary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<'
 literal <-- ['] (!['] .)* ['] sp
 charclass <-- LB (!']' (CCrange / CCsingle))* RB sp
 CCrange <-- . '-' .
@@ -73,6 +74,8 @@ RB < ']'
 (define-sexp-parser peg-suffix all
   (and peg-primary (* (and (or "*" "+" "?") peg-sp))))
 (define-sexp-parser peg-primary all
+  (and peg-secondary (? (and "#" peg-sp))))
+(define-sexp-parser peg-secondary all
   (or (and "(" peg-sp peg-pattern ")" peg-sp)
       (and "." peg-sp)
       peg-literal
@@ -188,6 +191,11 @@ RB < ']'
 
 ;; Parse a primary.
 (define (peg-primary->defn lst for-syntax)
+  (let ((inner-defn (peg-secondary->defn (cadr lst) for-syntax)))
+    (if (and (pair? (cddr lst)) (equal? (caddr lst) "#")) #`(expect #,inner-defn)
+        inner-defn)))
+
+(define (peg-secondary->defn lst for-syntax)
   (let ((el (cadr lst)))
   (cond
    ((list? el)
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index 6a8709794..b3586c891 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -33,6 +33,7 @@
     (alternative peg-alternative)
     (suffix peg-suffix)
     (primary peg-primary)
+    (secondary peg-secondary)
     (literal peg-literal)
     (charclass peg-charclass)
     (CCrange charclass-range)
@@ -63,13 +64,12 @@
   (pass-if
    "defining PEGs with PEG"
    (and (eeval `(define-peg-string-patterns ,(@@ (ice-9 peg) peg-as-peg))) #t))
-  (pass-if
+  (pass-if-equal
    "equivalence of definitions"
-   (equal?
-    (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
+      (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
     (tree-map
      grammar-transform
-     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
+     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg))))))
 
 ;; A grammar for pascal-style comments from Wikipedia.
 (define comment-grammar
@@ -305,3 +305,30 @@ trace-grammar := \"foobarbarbaz\"	next: \"\"
       (with-error-to-string
         (lambda _ (and=> (match-pattern trace-grammar "foobarbarbaz")
                          peg:tree))))))
+
+(define-peg-string-patterns
+  "expect-grammar <-- one two three / .*
+one <-- 'one'#
+two <-- 'two'#
+three <-- 'three'"
+)
+
+(with-test-prefix "Parsing expect"
+  (pass-if-equal "expect okay"
+      '(expect-grammar (one "one") (two "two") (three "three"))
+      (and=> (match-pattern expect-grammar "onetwothree")
+             peg:tree))
+  (pass-if-equal "expect one"
+      '(syntax-error (0 "one"))
+    (catch 'syntax-error
+      (lambda _
+        (and=> (match-pattern expect-grammar "twothree")
+               peg:tree))
+      (lambda args args)))
+  (pass-if-equal "expect two"
+      '(syntax-error (3 "two"))
+    (catch 'syntax-error
+      (lambda _
+        (and=> (match-pattern expect-grammar "onethree")
+               peg:tree))
+      (lambda args args))))
-- 
2.46.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/5] peg: Add whitespace and comment skip parsers.
  2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 2/5] peg: Add debug tracing Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 3/5] peg: Add expect Janneke Nieuwenhuizen
@ 2024-10-14  7:31     ` Janneke Nieuwenhuizen
  2024-10-14  7:31     ` [PATCH v2 5/5] peg: Add fall-back parsing Janneke Nieuwenhuizen
  3 siblings, 0 replies; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:31 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

From: Rutger van Beusekom <rutger@dezyne.org>

* module/ice-9/peg/codegen.scm: (%peg:locations?, %peg:skip?): New
exported parameters.
(wrap-parser-for-users): Use them to enable skip parsing and switch
having locations on comments and whitespace.
* test-suite/tests/peg.test ("Skip parser"): Test it.
* doc/ref/api-peg.texi (Whitespace and comments): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
---
 doc/ref/api-peg.texi         | 44 +++++++++++++++++++++
 module/ice-9/peg.scm         |  3 ++
 module/ice-9/peg/codegen.scm | 74 ++++++++++++++++++++++++------------
 test-suite/tests/peg.test    | 47 +++++++++++++++++++++++
 4 files changed, 143 insertions(+), 25 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index df2e74d05..733cb1c6d 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -1062,6 +1062,50 @@ current state of the input, as well as the parse result.
   (and=> (match-pattern grammar input-text) peg:tree))
 @end lisp
 
+@subsubheading Whitespace and comments
+
+To write a PEG parser for a whitespace invariant language or a language
+which includes line and block comments requires littering the grammar
+with whitespace or comment parser expressions, which not only violates
+the DRY principle, but is hard to get right.
+
+For example, to parse a C-like language one would define these
+whitespace and comment parsers
+
+@lisp
+(define-skip-parser peg-eof none (not-followed-by peg-any))
+(define-skip-parser peg-eol none (or "\f" "\n" "\r" "\v"))
+(define-skip-parser peg-ws none (or " " "\t"))
+(define-skip-parser peg-line all
+  (and "//" (* (and (not-followed-by peg-eol) peg-any))
+       (expect (or "\n" "\r\n" peg-eof))))
+(define-skip-parser peg-block-strict all
+  (and "/*"
+       (* (or peg-block (and (not-followed-by "*/") peg-any)))
+       (expect "*/")))
+(define-skip-parser peg-skip all
+  (* (or peg-ws peg-eol peg-line peg-block-strict)))
+(define-skip-parser peg-block all
+  (and "/*" (* (or peg-block (and (not-followed-by "*/") peg-any)))
+       (or "*/" peg-eof)))
+@end lisp
+
+When setting @var{%peg:skip?} to @code{peg-skip}, whitespace and
+comments are silently skipped.
+
+@lisp
+(parameterize ((%peg:skip? peg-skip))
+  (and=> (match-pattern grammar input-text) peg:tree))
+@end lisp
+
+If you want to preserve locations and comments, set
+@var{%peg:locations?} to @code{#t}.
+@lisp
+(parameterize ((%peg:skip? peg-skip)
+               (%peg:locations? #t))
+  (and=> (match-pattern grammar input-text) peg:tree))
+@end lisp
+
 @subsubheading Expect parsing
 
 The best thing about PEG is its backtracking nature making it
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 499c3820c..fd9dce54c 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -28,7 +28,10 @@
   #:use-module (ice-9 peg cache)
   #:re-export (define-peg-pattern
                define-peg-string-patterns
+               define-skip-parser
                %peg:debug?
+               %peg:locations?
+               %peg:skip?
                match-pattern
                search-for-pattern
                compile-peg-pattern
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index dd24bdac0..458a7e3ab 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -21,7 +21,10 @@
   #:export (compile-peg-pattern
             wrap-parser-for-users
             add-peg-compiler!
-            %peg:debug?)
+            define-skip-parser
+            %peg:debug?
+            %peg:locations?
+            %peg:skip?)
 
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
@@ -349,6 +352,9 @@ return EXP."
 ;; Packages the results of a parser
 
 (define %peg:debug? (make-parameter #f))
+(define %peg:locations? (make-parameter #f))
+(define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
+
 (define (trace? symbol)
   (%peg:debug?))
 
@@ -361,7 +367,11 @@ return EXP."
                 (make-string indent #\space)
                 '#,s-syn))
       (set! indent (+ indent 4))
-      (let ((res (#,parser str strlen at)))
+      (let* ((comment-res ((%peg:skip?) str strlen at))
+             (comment-loc (and (%peg:locations?) comment-res
+                               `(location ,at ,(car comment-res))))
+             (at (or (and comment-res (car comment-res)) at))
+             (res (#,parser str strlen at)))
         (set! indent (- indent 4))
         (let ((pos (or (and res (car res)) 0)))
           (when (and (trace? '#,s-syn) (< at pos))
@@ -369,26 +379,40 @@ return EXP."
                     (make-string indent #\space)
                     '#,s-syn
                     (substring str at pos)
-                    (substring str pos (min strlen (+ pos 10)))))
-          ;; Try to match the nonterminal.
-          (if res
-              ;; If we matched, do some post-processing to figure out
-              ;; what data to propagate upward.
-              (let ((at (car res))
-                    (body (cadr res)))
-                #,(cond
-                   ((eq? accumsym 'name)
-                    #`(list at '#,s-syn))
-                   ((eq? accumsym 'all)
-                    #`(list (car res)
-                            (cond
-                             ((not (list? body))
-                              (list '#,s-syn body))
-                             ((null? body) '#,s-syn)
-                             ((symbol? (car body))
-                              (list '#,s-syn body))
-                             (else (cons '#,s-syn body)))))
-                   ((eq? accumsym 'none) #`(list (car res) '()))
-                   (else #`(begin res))))
-              ;; If we didn't match, just return false.
-              #f)))))
+                    (substring str pos (min strlen (+ pos 10))))))
+        ;; Try to match the nonterminal.
+        (if res
+            ;; If we matched, do some post-processing to figure out
+            ;; what data to propagate upward.
+            (let* ((body (cadr res))
+                   (loc `(location ,at ,(car res)))
+                   (annotate (if (not (%peg:locations?)) '()
+                                 (if (null? (cadr comment-res)) `(,loc)
+                                     `((comment ,(cdr comment-res) ,comment-loc)
+                                       ,loc))))
+                   (at (car res)))
+              #,(cond
+                 ((eq? accumsym 'name)
+                  #`(list at '#,s-syn ,@annotate))
+                 ((eq? accumsym 'all)
+                  #`(list (car res)
+                          (cond
+                           ((not (list? body))
+                            `(,'#,s-syn ,body ,@annotate))
+                           ((null? body)
+                            `(,'#,s-syn ,@annotate))
+                           ((symbol? (car body))
+                            `(,'#,s-syn ,body ,@annotate))
+                           (else
+                            (cons '#,s-syn (append body annotate))))))
+                 ((eq? accumsym 'none) #``(,at () ,@annotate))
+                 (else #``(,at ,body ,@annotate))))
+            ;; If we didn't match, just return false.
+            #f))))
+
+(define-syntax define-skip-parser
+  (lambda (x)
+    (syntax-case x ()
+      ((_ sym accum pat)
+       (let* ((matchf (compile-peg-pattern #'pat (syntax->datum #'accum))))
+         #`(define sym #,matchf))))))
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index b3586c891..4f267f561 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -332,3 +332,50 @@ three <-- 'three'"
         (and=> (match-pattern expect-grammar "onethree")
                peg:tree))
       (lambda args args))))
+
+(define program-text "
+/*
+ CopyLeft (L) Acme
+*/
+foo // the first
+bar
+	bar
+baz
+")
+
+(define-skip-parser peg-eof none (not-followed-by peg-any))
+(define-skip-parser peg-eol none (or "\f" "\n" "\r" "\v"))
+(define-skip-parser peg-ws none (or " " "\t"))
+(define-skip-parser peg-line all
+  (and "//" (* (and (not-followed-by peg-eol) peg-any))
+       (expect (or "\n" "\r\n" peg-eof))))
+(define-skip-parser peg-block all
+  (and "/*" (* (or peg-block (and (not-followed-by "*/") peg-any)))
+       (or "*/" peg-eof)))
+(define-skip-parser peg-block-strict all
+  (and "/*"
+       (* (or peg-block (and (not-followed-by "*/") peg-any)))
+       (expect "*/")))
+(define-skip-parser peg-skip all
+  (* (or peg-ws peg-eol peg-line peg-block-strict)))
+
+(with-test-prefix "Skip parser"
+  (pass-if-equal "skip comments and whitespace"
+      '(trace-grammar (foo "foo") (bar (bla "bar") (bla "bar")) (baz "baz"))
+    (and=> (parameterize ((%peg:skip? peg-skip))
+             (match-pattern trace-grammar program-text))
+           peg:tree))
+  (pass-if-equal "preserve comments and whitespace"
+      '(trace-grammar (foo "foo" (location 26 29))
+                      (bar (bla "bar" (location 43 46))
+                           (bla "bar" (location 48 51))
+                           (comment "// the first\n" (location 29 43))
+                           (location 43 51))
+                      (baz "baz" (location 52 55))
+                      (comment "/*\n CopyLeft (L) Acme\n*/"
+                               (location 0 26))
+                      (location 26 55))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:locations? #t))
+             (match-pattern trace-grammar program-text))
+           peg:tree)))
-- 
2.46.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 5/5] peg: Add fall-back parsing.
  2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
                       ` (2 preceding siblings ...)
  2024-10-14  7:31     ` [PATCH v2 4/5] peg: Add whitespace and comment skip parsers Janneke Nieuwenhuizen
@ 2024-10-14  7:31     ` Janneke Nieuwenhuizen
  2024-10-14  7:36       ` Janneke Nieuwenhuizen
  3 siblings, 1 reply; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:31 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

From: Rutger van Beusekom <rutger@dezyne.org>

This allows production of incomplete parse trees, without errors, e.g.,
for code completion.

* module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
parameter.
(%enable-expect, %continuation, %final-continuation): New parameter.
(final-continuation): New function.
(cg-or-rest): New function.
(cg-and-int): Recover from expectation failures, fall-back by skipping
forward or escalating upward.
(cg-*): Prepare fall-back %continuation.
* test-suite/tests/peg.test ("Fall-back parser"): Test it.
* doc/ref/api-peg.texi (PEG Internals): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>

fall-back

fall-back

fallback
---
 doc/ref/api-peg.texi         |  14 ++++
 module/ice-9/peg.scm         |   5 +-
 module/ice-9/peg/codegen.scm | 146 +++++++++++++++++++++++++++--------
 test-suite/tests/peg.test    |  24 +++++-
 4 files changed, 151 insertions(+), 38 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 733cb1c6d..4c96b2acf 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -1116,3 +1116,17 @@ language.  Putting a @code{#} behind a terminal or non-terminal
 indicates that its parsing must succeed, otherwise an exception is
 thrown containing the current parser state providing a hook to produce
 informative parse errors.
+
+@subsubheading Fallback parsing
+
+A natural extension to expect parsing is fallback parsing.  It is
+enabled by setting parameter @var{%peg:fall-back?} to @code{#t}.
+Fallback parsing is implemented by catching the exception thrown by the
+expect operator.  At this point the parser attempts to recover its state
+by eating away at the input until the input runs out or until one of the
+grammar continuations matches and parsing continues regularly.
+
+When error occurs, @var{%peg:error} is invoked.
+
+@deffn {Scheme Procedure} %peg:error str line-number column-number error-type error
+@end deffn
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index fd9dce54c..aa7ddc743 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -25,13 +25,15 @@
   ;; peg-sexp-compile.
   #:use-module (ice-9 peg simplify-tree)
   #:use-module (ice-9 peg using-parsers)
-  #:use-module (ice-9 peg cache)
+
   #:re-export (define-peg-pattern
                define-peg-string-patterns
                define-skip-parser
                %peg:debug?
+               %peg:fall-back?
                %peg:locations?
                %peg:skip?
+               %peg:error
                match-pattern
                search-for-pattern
                compile-peg-pattern
@@ -43,4 +45,3 @@
                peg:tree
                peg:substring
                peg-record?))
-
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index 458a7e3ab..642f31c63 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -23,9 +23,12 @@
             add-peg-compiler!
             define-skip-parser
             %peg:debug?
+            %peg:error
+            %peg:fall-back?
             %peg:locations?
             %peg:skip?)
 
+  #:use-module (srfi srfi-1)
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
@@ -60,6 +63,8 @@ return EXP."
      (set! lst (cons obj lst)))))
 
 
+(define %peg:fall-back? (make-parameter #f)) ;; public interface, enable fall-back parsing
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; CODE GENERATORS
 ;; These functions generate scheme code for parsing PEGs.
@@ -169,6 +174,71 @@ return EXP."
    ((eq? accum 'none) 'none)))
 (define baf builtin-accum-filter)
 
+(define (final-continuation str strlen at) #f)
+
+(define %continuation (make-parameter final-continuation))
+
+(define %fall-back-skip-at (make-parameter #f))
+
+;;Fallback parsing is triggered by a syntax-error exception
+;;the 'at' parameter is then pointing to "incomplete or erroneous" input
+;;and moves ahead in the input until one of the continuations
+;;of the production rules in the current callstack matches the input at that point.
+;;At this point parsing continues regularly, but with an incomplete or erroneous parse tree.
+;;If none of the continuations match then parsing fails without a result.
+;;The operators involved for determining a continuation are: '(+ * and)
+;;operator / is naturally not combined with the use of #
+;;operators '(! &) may be considered later, since they may prove useful as asserts
+
+(define (format-error error str)
+  "Return procedure with two parameters (FROM TO) that formats parser
+exception ERROR (offset . error) according using the source text in STR
+and collects it using procedure (%peg:error)."
+  (define (get-error-type from to)
+    (if (< from to)
+        'expected
+        'error))
+  (lambda (from to)
+    (let* ((error-type (get-error-type from to))
+           (error-pos (caar error))
+           (line-number (1+ (string-count str #\newline 0 error-pos)))
+           (col-number (- error-pos
+                          (or (string-rindex str #\newline 0 error-pos) -1))))
+      ((%peg:error) str line-number col-number error-type error))))
+
+(define* (fall-back-skip kernel #:optional sequence?)
+  (if (not (%peg:fall-back?)) kernel
+      (lambda (str strlen start)
+        (catch 'syntax-error
+          (lambda _
+            (kernel str strlen start))
+          (lambda (key . args)
+            (let* ((expected (cadar args))
+                   (format-error (format-error args str)))
+              (let loop ((at start))
+                (cond ((or (= at strlen)
+                           ;; TODO: decide what to do; inspecting at might not be enough?!!
+                           (unless (and (%fall-back-skip-at)
+                                        (eq? (%fall-back-skip-at) at))
+                             (parameterize ((%fall-back-skip-at at))
+                               ((%continuation) str strlen at))))
+                       (format-error start at)
+                       (if sequence? `(,at ()) `(,at (,expected))))
+                      (else
+                       (let ((res (false-if-exception (kernel str strlen (1+ at)))))
+                         (if res
+                             (begin
+                               (format-error (or (string-index str (char-set-complement char-set:whitespace) start at) start) at)
+                               res)
+                             (loop (1+ at)))))))))))))
+
+
+(define (partial-match kernel sym)
+  (lambda (str strlen at)
+    (catch 'syntax-error
+      (lambda _ (kernel str strlen at))
+      (lambda (key . args) (and (< at (caar args)) (car args))))))
+
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and clauses accum)
   #`(lambda (str len pos)
@@ -181,8 +251,17 @@ return EXP."
     (()
      (cggr accum 'cg-and #`(reverse #,body) at))
     ((first rest ...)
-     #`(let ((res (#,(compile-peg-pattern #'first accum) #,str #,strlen #,at)))
-         (and res 
+     #`(let* ((next #,(cg-or #'(rest ...) 'body))
+              (kernel #,(compile-peg-pattern #'first accum))
+              (res (parameterize
+                       ((%continuation
+                         (let ((after-that (%continuation)))
+                           (lambda (str strlen at)
+                             (or ((partial-match next 'next) str strlen at)
+                                 ((partial-match after-that 'after-that)
+                                  str strlen at))))))
+                     ((fall-back-skip kernel) #,str #,strlen #,at))))
+         (and res
               ;; update AT and BODY then recurse
               (let ((newat (car res))
                     (newbody (cadr res)))
@@ -207,42 +286,40 @@ return EXP."
 (define (cg-* args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#t))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (kleene (lambda (str strlen at)
+                        (let ((body '()))
+                          (let lp ((end at) (count 0))
+                            (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                   (new-end (if match (car match) end))
+                                   (count (if (> new-end end) (1+ count) count)))
+                              (when (> new-end end)
+                                (push-not-null! body (single-filter (cadr match))))
+                              (if (and (> new-end end) #,#t) (lp new-end count)
+                                  (let ((success #,#t))
+                                    #,#`(and success
+                                             #,(cggr (baf accum) 'cg-body
+                                                     #'(reverse body) #'new-end))))))))))
+         kleene))))
 
 (define (cg-+ args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#'(>= count 1)))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (multiple (lambda (str strlen at)
+                          (let ((body '()))
+                            (let lp ((end at) (count 0))
+                              (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                     (new-end (if match (car match) end))
+                                     (count (if (> new-end end) (1+ count) count)))
+                                (when (> new-end end)
+                                  (push-not-null! body (single-filter (cadr match))))
+                                (if (and (> new-end end) #,#t) (lp new-end count)
+                                    (let ((success #,#'(>= count 1)))
+                                      #,#`(and success
+                                               #,(cggr (baf accum) 'cg-body
+                                                       #'(reverse body) #'new-end))))))))))
+         multiple))))
 
 (define (cg-? args accum)
   (syntax-case args ()
@@ -351,6 +428,7 @@ return EXP."
 
 ;; Packages the results of a parser
 
+(define %peg:error (make-parameter (const #f)))
 (define %peg:debug? (make-parameter #f))
 (define %peg:locations? (make-parameter #f))
 (define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index 4f267f561..8a20cda41 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -6,6 +6,7 @@
 
 (define-module (test-suite test-peg)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-26)
   #:use-module (test-suite lib)
   #:use-module (ice-9 peg)
   #:use-module (ice-9 pretty-print))
@@ -310,8 +311,7 @@ trace-grammar := \"foobarbarbaz\"	next: \"\"
   "expect-grammar <-- one two three / .*
 one <-- 'one'#
 two <-- 'two'#
-three <-- 'three'"
-)
+three <-- 'three'")
 
 (with-test-prefix "Parsing expect"
   (pass-if-equal "expect okay"
@@ -379,3 +379,23 @@ baz
                           (%peg:locations? #t))
              (match-pattern trace-grammar program-text))
            peg:tree)))
+
+(with-test-prefix "Fall-back parser"
+  (pass-if-equal "only one"
+      '(expect-grammar "one")
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one"))
+           peg:tree))
+  (pass-if-equal "no two"
+      '(expect-grammar (one "one") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one three"))
+           (compose (cute remove string? <>) peg:tree)))
+  (pass-if-equal "missing one"
+      '(expect-grammar (two "two") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "two three"))
+           (compose (cute remove string? <>) peg:tree))))
-- 
2.46.0




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] peg: Add fall-back parsing.
  2024-10-14  7:31     ` [PATCH v2 5/5] peg: Add fall-back parsing Janneke Nieuwenhuizen
@ 2024-10-14  7:36       ` Janneke Nieuwenhuizen
  2024-10-14 15:58         ` Ekaitz Zarraga
  0 siblings, 1 reply; 11+ messages in thread
From: Janneke Nieuwenhuizen @ 2024-10-14  7:36 UTC (permalink / raw)
  To: guile-devel; +Cc: Ekaitz Zarraga, Rutger van Beusekom

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Janneke Nieuwenhuizen writes:

> From: Rutger van Beusekom <rutger@dezyne.org>
>
> This allows production of incomplete parse trees, without errors, e.g.,
> for code completion.
>
> * module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
> parameter.
> (%enable-expect, %continuation, %final-continuation): New parameter.
> (final-continuation): New function.
> (cg-or-rest): New function.
> (cg-and-int): Recover from expectation failures, fall-back by skipping
> forward or escalating upward.
> (cg-*): Prepare fall-back %continuation.
> * test-suite/tests/peg.test ("Fall-back parser"): Test it.
> * doc/ref/api-peg.texi (PEG Internals): Document it.
>
> Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
>
> fall-back
>
> fall-back
>
> fallback

Oops, find cleaned-up version attached.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v2-0005-peg-Add-fall-back-parsing.patch --]
[-- Type: text/x-patch, Size: 13650 bytes --]

From b3a3b48c0b76a2baed4d4b11f1d38ec0f772717c Mon Sep 17 00:00:00 2001
From: Rutger van Beusekom <rutger@dezyne.org>
Date: Tue, 7 Jan 2020 13:33:15 +0100
Subject: [PATCH v2 5/5] peg: Add fall-back parsing.

This allows production of incomplete parse trees, without errors, e.g.,
for code completion.

* module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
parameter.
(%enable-expect, %continuation, %final-continuation): New parameter.
(final-continuation): New function.
(cg-or-rest): New function.
(cg-and-int): Recover from expectation failures, fall-back by skipping
forward or escalating upward.
(cg-*): Prepare fall-back %continuation.
* test-suite/tests/peg.test ("Fall-back parser"): Test it.
* doc/ref/api-peg.texi (PEG Internals): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
---
 doc/ref/api-peg.texi         |  14 ++++
 module/ice-9/peg.scm         |   5 +-
 module/ice-9/peg/codegen.scm | 146 +++++++++++++++++++++++++++--------
 test-suite/tests/peg.test    |  24 +++++-
 4 files changed, 151 insertions(+), 38 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 733cb1c6d..4c96b2acf 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -1116,3 +1116,17 @@ language.  Putting a @code{#} behind a terminal or non-terminal
 indicates that its parsing must succeed, otherwise an exception is
 thrown containing the current parser state providing a hook to produce
 informative parse errors.
+
+@subsubheading Fallback parsing
+
+A natural extension to expect parsing is fallback parsing.  It is
+enabled by setting parameter @var{%peg:fall-back?} to @code{#t}.
+Fallback parsing is implemented by catching the exception thrown by the
+expect operator.  At this point the parser attempts to recover its state
+by eating away at the input until the input runs out or until one of the
+grammar continuations matches and parsing continues regularly.
+
+When error occurs, @var{%peg:error} is invoked.
+
+@deffn {Scheme Procedure} %peg:error str line-number column-number error-type error
+@end deffn
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index fd9dce54c..aa7ddc743 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -25,13 +25,15 @@
   ;; peg-sexp-compile.
   #:use-module (ice-9 peg simplify-tree)
   #:use-module (ice-9 peg using-parsers)
-  #:use-module (ice-9 peg cache)
+
   #:re-export (define-peg-pattern
                define-peg-string-patterns
                define-skip-parser
                %peg:debug?
+               %peg:fall-back?
                %peg:locations?
                %peg:skip?
+               %peg:error
                match-pattern
                search-for-pattern
                compile-peg-pattern
@@ -43,4 +45,3 @@
                peg:tree
                peg:substring
                peg-record?))
-
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index 458a7e3ab..642f31c63 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -23,9 +23,12 @@
             add-peg-compiler!
             define-skip-parser
             %peg:debug?
+            %peg:error
+            %peg:fall-back?
             %peg:locations?
             %peg:skip?)
 
+  #:use-module (srfi srfi-1)
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
@@ -60,6 +63,8 @@ return EXP."
      (set! lst (cons obj lst)))))
 
 
+(define %peg:fall-back? (make-parameter #f)) ;; public interface, enable fall-back parsing
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; CODE GENERATORS
 ;; These functions generate scheme code for parsing PEGs.
@@ -169,6 +174,71 @@ return EXP."
    ((eq? accum 'none) 'none)))
 (define baf builtin-accum-filter)
 
+(define (final-continuation str strlen at) #f)
+
+(define %continuation (make-parameter final-continuation))
+
+(define %fall-back-skip-at (make-parameter #f))
+
+;;Fallback parsing is triggered by a syntax-error exception
+;;the 'at' parameter is then pointing to "incomplete or erroneous" input
+;;and moves ahead in the input until one of the continuations
+;;of the production rules in the current callstack matches the input at that point.
+;;At this point parsing continues regularly, but with an incomplete or erroneous parse tree.
+;;If none of the continuations match then parsing fails without a result.
+;;The operators involved for determining a continuation are: '(+ * and)
+;;operator / is naturally not combined with the use of #
+;;operators '(! &) may be considered later, since they may prove useful as asserts
+
+(define (format-error error str)
+  "Return procedure with two parameters (FROM TO) that formats parser
+exception ERROR (offset . error) according using the source text in STR
+and collects it using procedure (%peg:error)."
+  (define (get-error-type from to)
+    (if (< from to)
+        'expected
+        'error))
+  (lambda (from to)
+    (let* ((error-type (get-error-type from to))
+           (error-pos (caar error))
+           (line-number (1+ (string-count str #\newline 0 error-pos)))
+           (col-number (- error-pos
+                          (or (string-rindex str #\newline 0 error-pos) -1))))
+      ((%peg:error) str line-number col-number error-type error))))
+
+(define* (fall-back-skip kernel #:optional sequence?)
+  (if (not (%peg:fall-back?)) kernel
+      (lambda (str strlen start)
+        (catch 'syntax-error
+          (lambda _
+            (kernel str strlen start))
+          (lambda (key . args)
+            (let* ((expected (cadar args))
+                   (format-error (format-error args str)))
+              (let loop ((at start))
+                (cond ((or (= at strlen)
+                           ;; TODO: decide what to do; inspecting at might not be enough?!!
+                           (unless (and (%fall-back-skip-at)
+                                        (eq? (%fall-back-skip-at) at))
+                             (parameterize ((%fall-back-skip-at at))
+                               ((%continuation) str strlen at))))
+                       (format-error start at)
+                       (if sequence? `(,at ()) `(,at (,expected))))
+                      (else
+                       (let ((res (false-if-exception (kernel str strlen (1+ at)))))
+                         (if res
+                             (begin
+                               (format-error (or (string-index str (char-set-complement char-set:whitespace) start at) start) at)
+                               res)
+                             (loop (1+ at)))))))))))))
+
+
+(define (partial-match kernel sym)
+  (lambda (str strlen at)
+    (catch 'syntax-error
+      (lambda _ (kernel str strlen at))
+      (lambda (key . args) (and (< at (caar args)) (car args))))))
+
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and clauses accum)
   #`(lambda (str len pos)
@@ -181,8 +251,17 @@ return EXP."
     (()
      (cggr accum 'cg-and #`(reverse #,body) at))
     ((first rest ...)
-     #`(let ((res (#,(compile-peg-pattern #'first accum) #,str #,strlen #,at)))
-         (and res 
+     #`(let* ((next #,(cg-or #'(rest ...) 'body))
+              (kernel #,(compile-peg-pattern #'first accum))
+              (res (parameterize
+                       ((%continuation
+                         (let ((after-that (%continuation)))
+                           (lambda (str strlen at)
+                             (or ((partial-match next 'next) str strlen at)
+                                 ((partial-match after-that 'after-that)
+                                  str strlen at))))))
+                     ((fall-back-skip kernel) #,str #,strlen #,at))))
+         (and res
               ;; update AT and BODY then recurse
               (let ((newat (car res))
                     (newbody (cadr res)))
@@ -207,42 +286,40 @@ return EXP."
 (define (cg-* args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#t))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (kleene (lambda (str strlen at)
+                        (let ((body '()))
+                          (let lp ((end at) (count 0))
+                            (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                   (new-end (if match (car match) end))
+                                   (count (if (> new-end end) (1+ count) count)))
+                              (when (> new-end end)
+                                (push-not-null! body (single-filter (cadr match))))
+                              (if (and (> new-end end) #,#t) (lp new-end count)
+                                  (let ((success #,#t))
+                                    #,#`(and success
+                                             #,(cggr (baf accum) 'cg-body
+                                                     #'(reverse body) #'new-end))))))))))
+         kleene))))
 
 (define (cg-+ args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#'(>= count 1)))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (multiple (lambda (str strlen at)
+                          (let ((body '()))
+                            (let lp ((end at) (count 0))
+                              (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                     (new-end (if match (car match) end))
+                                     (count (if (> new-end end) (1+ count) count)))
+                                (when (> new-end end)
+                                  (push-not-null! body (single-filter (cadr match))))
+                                (if (and (> new-end end) #,#t) (lp new-end count)
+                                    (let ((success #,#'(>= count 1)))
+                                      #,#`(and success
+                                               #,(cggr (baf accum) 'cg-body
+                                                       #'(reverse body) #'new-end))))))))))
+         multiple))))
 
 (define (cg-? args accum)
   (syntax-case args ()
@@ -351,6 +428,7 @@ return EXP."
 
 ;; Packages the results of a parser
 
+(define %peg:error (make-parameter (const #f)))
 (define %peg:debug? (make-parameter #f))
 (define %peg:locations? (make-parameter #f))
 (define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index 4f267f561..8a20cda41 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -6,6 +6,7 @@
 
 (define-module (test-suite test-peg)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-26)
   #:use-module (test-suite lib)
   #:use-module (ice-9 peg)
   #:use-module (ice-9 pretty-print))
@@ -310,8 +311,7 @@ trace-grammar := \"foobarbarbaz\"	next: \"\"
   "expect-grammar <-- one two three / .*
 one <-- 'one'#
 two <-- 'two'#
-three <-- 'three'"
-)
+three <-- 'three'")
 
 (with-test-prefix "Parsing expect"
   (pass-if-equal "expect okay"
@@ -379,3 +379,23 @@ baz
                           (%peg:locations? #t))
              (match-pattern trace-grammar program-text))
            peg:tree)))
+
+(with-test-prefix "Fall-back parser"
+  (pass-if-equal "only one"
+      '(expect-grammar "one")
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one"))
+           peg:tree))
+  (pass-if-equal "no two"
+      '(expect-grammar (one "one") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one three"))
+           (compose (cute remove string? <>) peg:tree)))
+  (pass-if-equal "missing one"
+      '(expect-grammar (two "two") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "two three"))
+           (compose (cute remove string? <>) peg:tree))))
-- 
2.46.0


[-- Attachment #3: Type: text/plain, Size: 164 bytes --]


-- 
Janneke Nieuwenhuizen <janneke@gnu.org>  | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] peg: Add fall-back parsing.
  2024-10-14  7:36       ` Janneke Nieuwenhuizen
@ 2024-10-14 15:58         ` Ekaitz Zarraga
  2024-10-16  6:44           ` Rutger van Beusekom
  0 siblings, 1 reply; 11+ messages in thread
From: Ekaitz Zarraga @ 2024-10-14 15:58 UTC (permalink / raw)
  To: Janneke Nieuwenhuizen, guile-devel; +Cc: Rutger van Beusekom

On 2024-10-14 09:36, Janneke Nieuwenhuizen wrote:
> Janneke Nieuwenhuizen writes:
> 
>> From: Rutger van Beusekom <rutger@dezyne.org>
>>
>> This allows production of incomplete parse trees, without errors, e.g.,
>> for code completion.
>>
>> * module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
>> parameter.
>> (%enable-expect, %continuation, %final-continuation): New parameter.
>> (final-continuation): New function.
>> (cg-or-rest): New function.
>> (cg-and-int): Recover from expectation failures, fall-back by skipping
>> forward or escalating upward.
>> (cg-*): Prepare fall-back %continuation.
>> * test-suite/tests/peg.test ("Fall-back parser"): Test it.
>> * doc/ref/api-peg.texi (PEG Internals): Document it.
>>
>> Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
>>
>> fall-back
>>
>> fall-back
>>
>> fallback
> 
> Oops, find cleaned-up version attached.
> 
> 
> 

Hi all,

I think this change is not compatible with the PEG syntax. In PEG # is a 
line comment:

https://bford.info/pub/lang/peg.pdf

I think we should take a look to what PEG/LEG does to see if they 
implement the same idea and copy the syntax they use. If they do.

https://github.com/gpakosz/peg/blob/upstream/src/peg.1

I'll take a deeper look to it, the rest of the commits seem good, but 
I'll also review in detail and propose something compatible with what 
I'm trying to push for in #73188.

Thanks for sharing all these commits, they are very interesting and helpful.

Ekaitz



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] peg: Add fall-back parsing.
  2024-10-14 15:58         ` Ekaitz Zarraga
@ 2024-10-16  6:44           ` Rutger van Beusekom
  2024-10-18  9:58             ` Ekaitz Zarraga
  0 siblings, 1 reply; 11+ messages in thread
From: Rutger van Beusekom @ 2024-10-16  6:44 UTC (permalink / raw)
  To: Ekaitz Zarraga, Janneke Nieuwenhuizen, guile-devel@gnu.org
  Cc: Rutger van Beusekom

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]



________________________________
From: Ekaitz Zarraga <ekaitz@elenq.tech>
Sent: Monday, 14 October 2024 17:58
To: Janneke Nieuwenhuizen <janneke@gnu.org>; guile-devel@gnu.org <guile-devel@gnu.org>
Cc: Rutger van Beusekom <rutger@dezyne.org>
Subject: Re: [PATCH v2 5/5] peg: Add fall-back parsing.

Hi Ekaitz,

> I think this change is not compatible with the PEG syntax. In PEG # is a
> line comment:

I am not attached to # as the PEG expect operator.  At the same time the
expect operator is not part of https://bford.info/pub/lang/peg.pdf. I think
we should consider choosing another name for the expect operator, do you
have a suggestion Ekaitz?

I was considering "reusing" the she-bang, i.e. #!, since the expect operator
throws an exception, but Janneke pointed out that ! as a modifier on the #
comment is not very intuitive and I tend to agree.

Regtur

[-- Attachment #2: Type: text/html, Size: 3817 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] peg: Add fall-back parsing.
  2024-10-16  6:44           ` Rutger van Beusekom
@ 2024-10-18  9:58             ` Ekaitz Zarraga
  0 siblings, 0 replies; 11+ messages in thread
From: Ekaitz Zarraga @ 2024-10-18  9:58 UTC (permalink / raw)
  To: Rutger van Beusekom, Janneke Nieuwenhuizen, guile-devel@gnu.org
  Cc: Rutger van Beusekom

Hi

On 2024-10-16 08:44, Rutger van Beusekom wrote:

> *Subject:* Re: [PATCH v2 5/5] peg: Add fall-back parsing.
> Hi Ekaitz,
> 
>  > I think this change is not compatible with the PEG syntax. In PEG # is a
>  > line comment:
> 
> I am not attached to # as the PEG expect operator.  At the same time the
> expect operator is not part of https://bford.info/pub/lang/peg.pdf 
> <https://bford.info/pub/lang/peg.pdf>. I think
> we should consider choosing another name for the expect operator, do you
> have a suggestion Ekaitz?
> 
> I was considering "reusing" the she-bang, i.e. #!, since the expect operator
> throws an exception, but Janneke pointed out that ! as a modifier on the #
> comment is not very intuitive and I tend to agree.

My suggestion is to try to be closer to what PEG/LEG does simply because 
it's a very famous library:

https://www.piumarta.com/software/peg/peg.1.html

In any case, we should and probably must diverge from it a little bit in 
order to be compatible with our constructs. For example, that one 
supports (patter), {action} and <acc> that we might not be able to 
replicate. My idea is to continue its line as long as it's useful for 
us, but when it's not we can just diverge from the parts that we don't like.

I'm wondering if this fallback mechanism could fit in one of the 
constructs of PEG that the library supports. If it doesn't, we could 
just go with @ or % signs or anything like that. If it does, we could 
just follow what the library does.

WDYT?






^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-10-18  9:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-11 13:41 [PATCH] ice-9/peg: Extend PEG for production use Rutger van Beusekom
2024-10-14  7:26 ` [PATCH v2 0/5] " Janneke Nieuwenhuizen
2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 2/5] peg: Add debug tracing Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 3/5] peg: Add expect Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 4/5] peg: Add whitespace and comment skip parsers Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 5/5] peg: Add fall-back parsing Janneke Nieuwenhuizen
2024-10-14  7:36       ` Janneke Nieuwenhuizen
2024-10-14 15:58         ` Ekaitz Zarraga
2024-10-16  6:44           ` Rutger van Beusekom
2024-10-18  9:58             ` Ekaitz Zarraga

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).