unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Janneke Nieuwenhuizen <janneke@gnu.org>
To: guile-devel@gnu.org
Cc: Ekaitz Zarraga <ekaitz@elenq.tech>,
	 Rutger van Beusekom <rutger@dezyne.org>
Subject: Re: [PATCH v2 5/5] peg: Add fall-back parsing.
Date: Mon, 14 Oct 2024 09:36:36 +0200	[thread overview]
Message-ID: <87iktvdtkr.fsf@gnu.org> (raw)
In-Reply-To: <20241014073109.19774-5-janneke@gnu.org> (Janneke Nieuwenhuizen's message of "Mon, 14 Oct 2024 09:31:09 +0200")

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Janneke Nieuwenhuizen writes:

> From: Rutger van Beusekom <rutger@dezyne.org>
>
> This allows production of incomplete parse trees, without errors, e.g.,
> for code completion.
>
> * module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
> parameter.
> (%enable-expect, %continuation, %final-continuation): New parameter.
> (final-continuation): New function.
> (cg-or-rest): New function.
> (cg-and-int): Recover from expectation failures, fall-back by skipping
> forward or escalating upward.
> (cg-*): Prepare fall-back %continuation.
> * test-suite/tests/peg.test ("Fall-back parser"): Test it.
> * doc/ref/api-peg.texi (PEG Internals): Document it.
>
> Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
>
> fall-back
>
> fall-back
>
> fallback

Oops, find cleaned-up version attached.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v2-0005-peg-Add-fall-back-parsing.patch --]
[-- Type: text/x-patch, Size: 13650 bytes --]

From b3a3b48c0b76a2baed4d4b11f1d38ec0f772717c Mon Sep 17 00:00:00 2001
From: Rutger van Beusekom <rutger@dezyne.org>
Date: Tue, 7 Jan 2020 13:33:15 +0100
Subject: [PATCH v2 5/5] peg: Add fall-back parsing.

This allows production of incomplete parse trees, without errors, e.g.,
for code completion.

* module/ice-9/peg/codegen.scm (%peg:fall-back?): New exported
parameter.
(%enable-expect, %continuation, %final-continuation): New parameter.
(final-continuation): New function.
(cg-or-rest): New function.
(cg-and-int): Recover from expectation failures, fall-back by skipping
forward or escalating upward.
(cg-*): Prepare fall-back %continuation.
* test-suite/tests/peg.test ("Fall-back parser"): Test it.
* doc/ref/api-peg.texi (PEG Internals): Document it.

Co-authored-by: Janneke Nieuwenhuizen <janneke@gnu.org>
---
 doc/ref/api-peg.texi         |  14 ++++
 module/ice-9/peg.scm         |   5 +-
 module/ice-9/peg/codegen.scm | 146 +++++++++++++++++++++++++++--------
 test-suite/tests/peg.test    |  24 +++++-
 4 files changed, 151 insertions(+), 38 deletions(-)

diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi
index 733cb1c6d..4c96b2acf 100644
--- a/doc/ref/api-peg.texi
+++ b/doc/ref/api-peg.texi
@@ -1116,3 +1116,17 @@ language.  Putting a @code{#} behind a terminal or non-terminal
 indicates that its parsing must succeed, otherwise an exception is
 thrown containing the current parser state providing a hook to produce
 informative parse errors.
+
+@subsubheading Fallback parsing
+
+A natural extension to expect parsing is fallback parsing.  It is
+enabled by setting parameter @var{%peg:fall-back?} to @code{#t}.
+Fallback parsing is implemented by catching the exception thrown by the
+expect operator.  At this point the parser attempts to recover its state
+by eating away at the input until the input runs out or until one of the
+grammar continuations matches and parsing continues regularly.
+
+When error occurs, @var{%peg:error} is invoked.
+
+@deffn {Scheme Procedure} %peg:error str line-number column-number error-type error
+@end deffn
diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index fd9dce54c..aa7ddc743 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -25,13 +25,15 @@
   ;; peg-sexp-compile.
   #:use-module (ice-9 peg simplify-tree)
   #:use-module (ice-9 peg using-parsers)
-  #:use-module (ice-9 peg cache)
+
   #:re-export (define-peg-pattern
                define-peg-string-patterns
                define-skip-parser
                %peg:debug?
+               %peg:fall-back?
                %peg:locations?
                %peg:skip?
+               %peg:error
                match-pattern
                search-for-pattern
                compile-peg-pattern
@@ -43,4 +45,3 @@
                peg:tree
                peg:substring
                peg-record?))
-
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
index 458a7e3ab..642f31c63 100644
--- a/module/ice-9/peg/codegen.scm
+++ b/module/ice-9/peg/codegen.scm
@@ -23,9 +23,12 @@
             add-peg-compiler!
             define-skip-parser
             %peg:debug?
+            %peg:error
+            %peg:fall-back?
             %peg:locations?
             %peg:skip?)
 
+  #:use-module (srfi srfi-1)
   #:use-module (ice-9 pretty-print)
   #:use-module (system base pmatch))
 
@@ -60,6 +63,8 @@ return EXP."
      (set! lst (cons obj lst)))))
 
 
+(define %peg:fall-back? (make-parameter #f)) ;; public interface, enable fall-back parsing
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; CODE GENERATORS
 ;; These functions generate scheme code for parsing PEGs.
@@ -169,6 +174,71 @@ return EXP."
    ((eq? accum 'none) 'none)))
 (define baf builtin-accum-filter)
 
+(define (final-continuation str strlen at) #f)
+
+(define %continuation (make-parameter final-continuation))
+
+(define %fall-back-skip-at (make-parameter #f))
+
+;;Fallback parsing is triggered by a syntax-error exception
+;;the 'at' parameter is then pointing to "incomplete or erroneous" input
+;;and moves ahead in the input until one of the continuations
+;;of the production rules in the current callstack matches the input at that point.
+;;At this point parsing continues regularly, but with an incomplete or erroneous parse tree.
+;;If none of the continuations match then parsing fails without a result.
+;;The operators involved for determining a continuation are: '(+ * and)
+;;operator / is naturally not combined with the use of #
+;;operators '(! &) may be considered later, since they may prove useful as asserts
+
+(define (format-error error str)
+  "Return procedure with two parameters (FROM TO) that formats parser
+exception ERROR (offset . error) according using the source text in STR
+and collects it using procedure (%peg:error)."
+  (define (get-error-type from to)
+    (if (< from to)
+        'expected
+        'error))
+  (lambda (from to)
+    (let* ((error-type (get-error-type from to))
+           (error-pos (caar error))
+           (line-number (1+ (string-count str #\newline 0 error-pos)))
+           (col-number (- error-pos
+                          (or (string-rindex str #\newline 0 error-pos) -1))))
+      ((%peg:error) str line-number col-number error-type error))))
+
+(define* (fall-back-skip kernel #:optional sequence?)
+  (if (not (%peg:fall-back?)) kernel
+      (lambda (str strlen start)
+        (catch 'syntax-error
+          (lambda _
+            (kernel str strlen start))
+          (lambda (key . args)
+            (let* ((expected (cadar args))
+                   (format-error (format-error args str)))
+              (let loop ((at start))
+                (cond ((or (= at strlen)
+                           ;; TODO: decide what to do; inspecting at might not be enough?!!
+                           (unless (and (%fall-back-skip-at)
+                                        (eq? (%fall-back-skip-at) at))
+                             (parameterize ((%fall-back-skip-at at))
+                               ((%continuation) str strlen at))))
+                       (format-error start at)
+                       (if sequence? `(,at ()) `(,at (,expected))))
+                      (else
+                       (let ((res (false-if-exception (kernel str strlen (1+ at)))))
+                         (if res
+                             (begin
+                               (format-error (or (string-index str (char-set-complement char-set:whitespace) start at) start) at)
+                               res)
+                             (loop (1+ at)))))))))))))
+
+
+(define (partial-match kernel sym)
+  (lambda (str strlen at)
+    (catch 'syntax-error
+      (lambda _ (kernel str strlen at))
+      (lambda (key . args) (and (< at (caar args)) (car args))))))
+
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and clauses accum)
   #`(lambda (str len pos)
@@ -181,8 +251,17 @@ return EXP."
     (()
      (cggr accum 'cg-and #`(reverse #,body) at))
     ((first rest ...)
-     #`(let ((res (#,(compile-peg-pattern #'first accum) #,str #,strlen #,at)))
-         (and res 
+     #`(let* ((next #,(cg-or #'(rest ...) 'body))
+              (kernel #,(compile-peg-pattern #'first accum))
+              (res (parameterize
+                       ((%continuation
+                         (let ((after-that (%continuation)))
+                           (lambda (str strlen at)
+                             (or ((partial-match next 'next) str strlen at)
+                                 ((partial-match after-that 'after-that)
+                                  str strlen at))))))
+                     ((fall-back-skip kernel) #,str #,strlen #,at))))
+         (and res
               ;; update AT and BODY then recurse
               (let ((newat (car res))
                     (newbody (cadr res)))
@@ -207,42 +286,40 @@ return EXP."
 (define (cg-* args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#t))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (kleene (lambda (str strlen at)
+                        (let ((body '()))
+                          (let lp ((end at) (count 0))
+                            (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                   (new-end (if match (car match) end))
+                                   (count (if (> new-end end) (1+ count) count)))
+                              (when (> new-end end)
+                                (push-not-null! body (single-filter (cadr match))))
+                              (if (and (> new-end end) #,#t) (lp new-end count)
+                                  (let ((success #,#t))
+                                    #,#`(and success
+                                             #,(cggr (baf accum) 'cg-body
+                                                     #'(reverse body) #'new-end))))))))))
+         kleene))))
 
 (define (cg-+ args accum)
   (syntax-case args ()
     ((pat)
-     #`(lambda (str strlen at)
-         (let ((body '()))
-           (let lp ((end at) (count 0))
-             (let* ((match (#,(compile-peg-pattern #'pat (baf accum))
-                            str strlen end))
-                    (new-end (if match (car match) end))
-                    (count (if (> new-end end) (1+ count) count)))
-               (if (> new-end end)
-                   (push-not-null! body (single-filter (cadr match))))
-               (if (and (> new-end end)
-                        #,#t)
-                   (lp new-end count)
-                   (let ((success #,#'(>= count 1)))
-                     #,#`(and success
-                                 #,(cggr (baf accum) 'cg-body
-                                         #'(reverse body) #'new-end)))))))))))
+     #`(let* ((kernel #,(compile-peg-pattern #'pat (baf accum)))
+              (multiple (lambda (str strlen at)
+                          (let ((body '()))
+                            (let lp ((end at) (count 0))
+                              (let* ((match ((fall-back-skip kernel #t) str strlen end))
+                                     (new-end (if match (car match) end))
+                                     (count (if (> new-end end) (1+ count) count)))
+                                (when (> new-end end)
+                                  (push-not-null! body (single-filter (cadr match))))
+                                (if (and (> new-end end) #,#t) (lp new-end count)
+                                    (let ((success #,#'(>= count 1)))
+                                      #,#`(and success
+                                               #,(cggr (baf accum) 'cg-body
+                                                       #'(reverse body) #'new-end))))))))))
+         multiple))))
 
 (define (cg-? args accum)
   (syntax-case args ()
@@ -351,6 +428,7 @@ return EXP."
 
 ;; Packages the results of a parser
 
+(define %peg:error (make-parameter (const #f)))
 (define %peg:debug? (make-parameter #f))
 (define %peg:locations? (make-parameter #f))
 (define %peg:skip? (make-parameter (lambda (str strlen at) `(,at ()))))
diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index 4f267f561..8a20cda41 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -6,6 +6,7 @@
 
 (define-module (test-suite test-peg)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-26)
   #:use-module (test-suite lib)
   #:use-module (ice-9 peg)
   #:use-module (ice-9 pretty-print))
@@ -310,8 +311,7 @@ trace-grammar := \"foobarbarbaz\"	next: \"\"
   "expect-grammar <-- one two three / .*
 one <-- 'one'#
 two <-- 'two'#
-three <-- 'three'"
-)
+three <-- 'three'")
 
 (with-test-prefix "Parsing expect"
   (pass-if-equal "expect okay"
@@ -379,3 +379,23 @@ baz
                           (%peg:locations? #t))
              (match-pattern trace-grammar program-text))
            peg:tree)))
+
+(with-test-prefix "Fall-back parser"
+  (pass-if-equal "only one"
+      '(expect-grammar "one")
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one"))
+           peg:tree))
+  (pass-if-equal "no two"
+      '(expect-grammar (one "one") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "one three"))
+           (compose (cute remove string? <>) peg:tree)))
+  (pass-if-equal "missing one"
+      '(expect-grammar (two "two") (three "three"))
+    (and=> (parameterize ((%peg:skip? peg-skip)
+                          (%peg:fall-back? #t))
+             (match-pattern expect-grammar "two three"))
+           (compose (cute remove string? <>) peg:tree))))
-- 
2.46.0


[-- Attachment #3: Type: text/plain, Size: 164 bytes --]


-- 
Janneke Nieuwenhuizen <janneke@gnu.org>  | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com

  reply	other threads:[~2024-10-14  7:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-11 13:41 [PATCH] ice-9/peg: Extend PEG for production use Rutger van Beusekom
2024-10-14  7:26 ` [PATCH v2 0/5] " Janneke Nieuwenhuizen
2024-10-14  7:31   ` [PATCH v2 1/5] Remove trailing whitespace in PEG texinfo Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 2/5] peg: Add debug tracing Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 3/5] peg: Add expect Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 4/5] peg: Add whitespace and comment skip parsers Janneke Nieuwenhuizen
2024-10-14  7:31     ` [PATCH v2 5/5] peg: Add fall-back parsing Janneke Nieuwenhuizen
2024-10-14  7:36       ` Janneke Nieuwenhuizen [this message]
2024-10-14 15:58         ` Ekaitz Zarraga
2024-10-16  6:44           ` Rutger van Beusekom
2024-10-18  9:58             ` Ekaitz Zarraga

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87iktvdtkr.fsf@gnu.org \
    --to=janneke@gnu.org \
    --cc=ekaitz@elenq.tech \
    --cc=guile-devel@gnu.org \
    --cc=rutger@dezyne.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).