From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Janneke Nieuwenhuizen Newsgroups: gmane.lisp.guile.devel Subject: [PATCH v2 3/5] peg: Add expect. Date: Mon, 14 Oct 2024 09:31:07 +0200 Message-ID: <20241014073109.19774-3-janneke@gnu.org> References: <87seszdu1d.fsf@gnu.org> <20241014073109.19774-1-janneke@gnu.org> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="20872"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Ekaitz Zarraga , Rutger van Beusekom To: guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Mon Oct 14 09:31:33 2024 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1t0FYC-0005DZ-Bc for guile-devel@m.gmane-mx.org; Mon, 14 Oct 2024 09:31:32 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1t0FY3-0008Rk-5V; Mon, 14 Oct 2024 03:31:23 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1t0FY1-0008RH-64 for guile-devel@gnu.org; Mon, 14 Oct 2024 03:31:21 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1t0FXz-0007TF-HD; Mon, 14 Oct 2024 03:31:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:References:In-Reply-To:Date:Subject:To: From; bh=pq88YhsNZDCDpuJ0h4HxuvWSYJQgbW/rVHlFqoHOA68=; b=AybzlAXO6atCc62sB40r SooZhz/SdqNzgdKfS1dwGKrO5n3rEb9bJ1+JEz6ok6tVYXwAcARMkYZpNqK2RCGpIs0Zlo+U4e8wj Iyz78vCZz5rqJBQCfiQIrA31LN9g2r03P5dch0Dr4rs6SemkF27Iil+3OWQ0Dhc27N8jXcLNn8pI8 zhWFo9yzzNN5cf8lCTi7kI0N/TRG50KbHeOiKh+ocSX+urkK00aYTyPU5L6kreVNvCwf3kMxiMJGE 5+atuqI1IhfntKiJFNEVmrgqJFAWQI7lPXy3tiqOzCbOgqXz67l3mRbIxY1pQMeaGv/6RNbUrvwHX mi6XMCxWMlF9WA==; X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241014073109.19774-1-janneke@gnu.org> X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.lisp.guile.devel:22732 Archived-At: From: Rutger van Beusekom This adds an expectation parser (expect a, 'a#') to the PEG parser grammar. Rationale: PEG will return #f for invalid input. Adding expect (#) to the grammar is a way to report syntax errors. * module/ice-9/peg/string-peg.scm (peg-secondary->defn): Rename from peg-primary->defn. (cg-expect-int, gc-expect): New function. (expect): Add them in new parser. (peg-primary->defn): Use it in new function. (peg-as-peg "secondary"): Rename from "primary". (peg-as-peg "primary"): Use it, inserting `#' as expect. * test-suite/tests/peg.test (grammar-mapping): Update for secondary. ("PEG Grammar"): Use pass-if-equal for friendlier failure resolving. ("Parsing expect"): Test it. * doc/ref/api-peg.texi (PEG Syntax Reference): Document it. Co-authored-by: Janneke Nieuwenhuizen --- doc/ref/api-peg.texi | 21 ++++++++++++++++++++ module/ice-9/peg/codegen.scm | 11 +++++++++++ module/ice-9/peg/string-peg.scm | 12 +++++++++-- test-suite/tests/peg.test | 35 +++++++++++++++++++++++++++++---- 4 files changed, 73 insertions(+), 6 deletions(-) diff --git a/doc/ref/api-peg.texi b/doc/ref/api-peg.texi index dfa806832..df2e74d05 100644 --- a/doc/ref/api-peg.texi +++ b/doc/ref/api-peg.texi @@ -159,6 +159,15 @@ Would be: (+ "e")) @end lisp +@deftp {PEG Pattern} expect a +Expect to parse @var{a}. If this succeeds, continues. If this fails, +throw a @code{syntax-error} with location and failed expectation. + +@code{"a#"} + +@code{(expect a)} +@end deftp + @subsubheading Extended Syntax There is some extra syntax for S-expressions. @@ -1037,6 +1046,7 @@ described above. The function @code{add-peg-compiler!} is exported from the @code{(ice-9 peg codegen)} module. @subsubheading Debug tracing +@anchor{Debug tracing} Due to the backtracking nature of PEG, the parser result is @code{#f} when it cannot match the input text. It proves to be a big pain @@ -1051,3 +1061,14 @@ current state of the input, as well as the parse result. (parameterize ((%peg:debug? #t)) (and=> (match-pattern grammar input-text) peg:tree)) @end lisp + +@subsubheading Expect parsing + +The best thing about PEG is its backtracking nature making it +LL(infinite). At the same time it severely limits the debugability of +the grammar, as mentioned in @xref{Debug tracing}. The ability to stop +backtracking is achieved by introducing the @code{#} operator to the PEG +language. Putting a @code{#} behind a terminal or non-terminal +indicates that its parsing must succeed, otherwise an exception is +thrown containing the current parser state providing a hook to produce +informative parse errors. diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm index c450be440..dd24bdac0 100644 --- a/module/ice-9/peg/codegen.scm +++ b/module/ice-9/peg/codegen.scm @@ -300,6 +300,16 @@ return EXP." #f #,(cggr (baf accum) 'cg-body #''() #'at))))))))))) +(define (cg-expect-int clauses accum str strlen at) + (syntax-case clauses () + ((pat) + #`(or (#,(compile-peg-pattern #'pat accum) #,str #,strlen #,at) + (throw 'syntax-error (list #,at (syntax->datum #'pat))))))) + +(define (cg-expect clauses accum) + #`(lambda (str len pos) + #,(cg-expect-int clauses (baf accum) #'str #'len #'pos))) + ;; Association list of functions to handle different expressions as PEGs (define peg-compiler-alist '()) @@ -317,6 +327,7 @@ return EXP." (add-peg-compiler! '? cg-?) (add-peg-compiler! 'followed-by cg-followed-by) (add-peg-compiler! 'not-followed-by cg-not-followed-by) +(add-peg-compiler! 'expect cg-expect) ;; Takes an arbitrary expressions and accumulation variable, then parses it. ;; E.g.: (compile-peg-pattern syntax '(and "abc" (or "-" (range #\a #\z))) 'all) diff --git a/module/ice-9/peg/string-peg.scm b/module/ice-9/peg/string-peg.scm index 45ed14bb1..da98a0da6 100644 --- a/module/ice-9/peg/string-peg.scm +++ b/module/ice-9/peg/string-peg.scm @@ -1,6 +1,6 @@ ;;;; string-peg.scm --- representing PEG grammars as strings ;;;; -;;;; Copyright (C) 2010, 2011 Free Software Foundation, Inc. +;;;; Copyright (C) 2010, 2011, 2024 Free Software Foundation, Inc. ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -42,7 +42,8 @@ pattern <-- alternative (SLASH sp alternative)* alternative <-- ([!&]? sp suffix)+ suffix <-- primary ([*+?] sp)* -primary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<' +primary <-- secondary ([#] sp)? +secondary <-- '(' sp pattern ')' sp / '.' sp / literal / charclass / nonterminal !'<' literal <-- ['] (!['] .)* ['] sp charclass <-- LB (!']' (CCrange / CCsingle))* RB sp CCrange <-- . '-' . @@ -73,6 +74,8 @@ RB < ']' (define-sexp-parser peg-suffix all (and peg-primary (* (and (or "*" "+" "?") peg-sp)))) (define-sexp-parser peg-primary all + (and peg-secondary (? (and "#" peg-sp)))) +(define-sexp-parser peg-secondary all (or (and "(" peg-sp peg-pattern ")" peg-sp) (and "." peg-sp) peg-literal @@ -188,6 +191,11 @@ RB < ']' ;; Parse a primary. (define (peg-primary->defn lst for-syntax) + (let ((inner-defn (peg-secondary->defn (cadr lst) for-syntax))) + (if (and (pair? (cddr lst)) (equal? (caddr lst) "#")) #`(expect #,inner-defn) + inner-defn))) + +(define (peg-secondary->defn lst for-syntax) (let ((el (cadr lst))) (cond ((list? el) diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test index 6a8709794..b3586c891 100644 --- a/test-suite/tests/peg.test +++ b/test-suite/tests/peg.test @@ -33,6 +33,7 @@ (alternative peg-alternative) (suffix peg-suffix) (primary peg-primary) + (secondary peg-secondary) (literal peg-literal) (charclass peg-charclass) (CCrange charclass-range) @@ -63,13 +64,12 @@ (pass-if "defining PEGs with PEG" (and (eeval `(define-peg-string-patterns ,(@@ (ice-9 peg) peg-as-peg))) #t)) - (pass-if + (pass-if-equal "equivalence of definitions" - (equal? - (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg))) + (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg))) (tree-map grammar-transform - (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg))))))) + (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))) ;; A grammar for pascal-style comments from Wikipedia. (define comment-grammar @@ -305,3 +305,30 @@ trace-grammar := \"foobarbarbaz\" next: \"\" (with-error-to-string (lambda _ (and=> (match-pattern trace-grammar "foobarbarbaz") peg:tree)))))) + +(define-peg-string-patterns + "expect-grammar <-- one two three / .* +one <-- 'one'# +two <-- 'two'# +three <-- 'three'" +) + +(with-test-prefix "Parsing expect" + (pass-if-equal "expect okay" + '(expect-grammar (one "one") (two "two") (three "three")) + (and=> (match-pattern expect-grammar "onetwothree") + peg:tree)) + (pass-if-equal "expect one" + '(syntax-error (0 "one")) + (catch 'syntax-error + (lambda _ + (and=> (match-pattern expect-grammar "twothree") + peg:tree)) + (lambda args args))) + (pass-if-equal "expect two" + '(syntax-error (3 "two")) + (catch 'syntax-error + (lambda _ + (and=> (match-pattern expect-grammar "onethree") + peg:tree)) + (lambda args args)))) -- 2.46.0