From: Ekaitz Zarraga
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#73188: [PATCH 3/3] PEG: add large string-peg patch
Date: Sun, 22 Dec 2024 21:01:08 +0100
Message-ID: <20241222200128.13782-3-ekaitz@elenq.tech>
References: <20241222200128.13782-1-ekaitz@elenq.tech>
In-Reply-To: <20241222200128.13782-1-ekaitz@elenq.tech>
To: 73188@debbugs.gnu.org
Cc: ludo@gnu.org, Ekaitz Zarraga
List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language"

---
 test-suite/tests/peg.test | 117 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 113 insertions(+), 4 deletions(-)

diff --git a/test-suite/tests/peg.test b/test-suite/tests/peg.test
index d9e3e1b22..d8d047288 100644
--- a/test-suite/tests/peg.test
+++ b/test-suite/tests/peg.test
@@ -86,7 +86,7 @@
 End <-- '*)'
 C <- Begin N* End
 N <- C / (!Begin !End Z)
-Z <- [^X-Z]") ;; Forbid some characters to test not-in-range
+Z <- .")
 
 ;; A short /etc/passwd file.
 (define *etc-passwd*
@@ -126,9 +126,6 @@ SLASH < '/'")
      (match-pattern C "(*blah*)")
      (make-prec 0 8 "(*blah*)"
                 '((Begin "(*") "blah" (End "*)")))))
-  (pass-if
-   "simple comment with forbidden char"
-   (not (match-pattern C "(*blYh*)")))
   (pass-if
    "simple comment padded"
    (equal?
@@ -288,3 +285,115 @@ number <-- [0-9]+")
    (equal? (eq-parse "1+1/2*3+(1+1)/2")
            '(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)))))
 
+
+(define html-grammar
+"
+# Based on code from https://github.com/Fantom-Factory/afHtmlParser
+# 2014-2023 Steve Eynon. This code was originally released under the following
+# terms:
+#
+#      Permission to use, copy, modify, and/or distribute this software for any
+#      purpose with or without fee is hereby granted, provided that the above
+#      copyright notice and this permission notice appear in all copies.
+#
+#      THE SOFTWARE IS PROVIDED \"AS IS\" AND THE AUTHOR DISCLAIMS ALL
+#      WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
+#      OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE
+#      FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY
+#      DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
+#      IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+#      OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+# PEG Rules for parsing well formed HTML 5 documents
+# https://html.spec.whatwg.org/multipage/syntax.html
+
+html <-- bom? blurb* doctype? blurb* xmlProlog? blurb* elem blurb*
+bom <-- \"\\uFEFF\"
+xmlProlog <-- \"\" .)+ \"?>\"
+
+# ---- Doctype ----
+
+doctype <-- \"\"
+doctypePublicId <-- [ \\t\\n\\f\\r]+ \"PUBLIC\" [ \\t\\n\\f\\r]+ ((\"\\\"\" [^\"]* \"\\\"\") / (\"'\" [^']* \"'\"))
+doctypeSystemId <-- [ \\t\\n\\f\\r]+ (\"SYSTEM\" [ \\t\\n\\f\\r]+)? ((\"\\\"\" [^\"]* \"\\\"\") / (\"'\" [^']* \"'\"))
+
+# ---- Elems ----
+
+elem <-- voidElem / rawTextElem / escRawTextElem / selfClosingElem / normalElem
+voidElem <-- \"<\" voidElemName attributes \">\"
+rawTextElem <-- \"<\" rawTextElemName attributes \">\" rawTextContent endElem
+escRawTextElem <-- \"<\" escRawTextElemName attributes \">\" escRawTextContent endElem
+selfClosingElem <-- \"<\" elemName attributes \"/>\"
+normalElem <-- \"<\" elemName attributes \">\" normalContent? endElem?
+endElem <-- \"\"
+
+elemName <-- [a-zA-Z] [^\\t\\n\\f />]*
+voidElemName <-- \"area\" / \"base\" / \"br\" / \"col\" / \"embed\" /
+    \"hr\" / \"img\" / \"input\" / \"keygen\" / \"link\" /
+    \"meta\" / \"param\" / \"source\" / \"track\" / \"wbr\"
+rawTextElemName <-- \"script\" / \"style\"
+escRawTextElemName <-- \"textarea\" / \"title\"
+
+rawTextContent <-- (!(\"\" / \"\") .)+
+escRawTextContent <-- ((!(\"\" / \"\" / \"&\") .)+ / charRef)*
+normalContent <-- !\"] ([ \\t]+ / doubleQuoteAttr / singleQuoteAttr / unquotedAttr / emptyAttr))*
+attrName <-- [^ \\t\\n\\r\\f\"'>/=]+
+emptyAttr <-- attrName+
+unquotedAttr <-- attrName [ \\t]* \"=\" [ \\t]* (charRef / [^ \\t\\n\\r\\f\"'=<>`&]+)+
+singleQuoteAttr <-- attrName [ \\t]* \"=\" [ \\t]* \"'\" (charRef / [^'&]+)* \"'\"
+doubleQuoteAttr <-- attrName [ \\t]* \"=\" [ \\t]* \"\\\"\" (charRef / [^\"&]+)* \"\\\"\"
+
+# ---- Character References ----
+
+charRef <-- &\"&\" (decNumCharRef / hexNumCharRef / namedCharRef / borkedRef)
+namedCharRef <-- \"&\" [^;>]+ \";\"
+decNumCharRef <-- \"&#\" [0-9]+ \";\"
+hexNumCharRef <-- \"&#x\" [a-fA-F0-9]+ \";\"
+borkedRef <-- \"&\" &[ \\t]
+
+# ---- Misc ----
+
+cdata <-- \"\" .)+ \"]]>\"
+comment <-- \"\"
+blurb <-- [ \\t\\n\\f\\r]+ / comment")
+
+(define html-example "
+
+
+
+ Example Domain
+
+
+
+
+
+
+
+
+

Example Domain

+

This domain is for use in illustrative examples in documents. You may
+ use this domain in literature without prior coordination or asking for
+ permission.

More
+ information...

+
+
+
+")
+
+(with-test-prefix "Parsing with complex grammars"
+  (eeval `(define-peg-string-patterns ,html-grammar))
+  (pass-if
+    "HTML parsing"
+    (equal?
+      (peg:tree (match-pattern html html-example))
+      '(html (blurb "\n") (doctype "") (blurb "\n") (elem (normalElem "<" (elemName "html") attributes ">" (normalContent "\n" (elem (normalElem "<" (elemName "head") attributes ">" (normalContent "\n " (elem (escRawTextElem "<" (escRawTextElemName "title") attributes ">" (escRawTextContent "Example Domain") (endElem ""))) "\n " (elem (selfClosingElem "<" (elemName "meta") (attributes " " (doubleQuoteAttr (attrName "charset") "=\"utf-8\"") " ") "/>")) "\n " (elem (selfClosingElem "<" (elemName "meta") (attributes " " (doubleQuoteAttr (attrName "http-equiv") "=\"Content-type\"") " " (doubleQuoteAttr (attrName "content") "=\"text/html; charset=utf-8\"") " ") "/>")) "\n " (elem (selfClosingElem "<" (elemName "meta") (attributes " " (doubleQuoteAttr (attrName "name") "=\"viewport\"") " " (doubleQuoteAttr (attrName "content") "=\"width=device-width, initial-scale=1\"") " ") "/>")) "\n " (elem (rawTextElem "<" (rawTextElemName "style") (attributes " " (doubleQuoteAttr (attrName "type") "=\"text/css\"")) ">" (rawTextContent "\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n }\n ") (endElem ""))) "\n") (endElem ""))) "\n\n" (elem (normalElem "<" (elemName "body") attributes ">" (normalContent "\n" (elem (normalElem "<" (elemName "div") attributes ">" (normalContent "\n " (elem (normalElem "<" (elemName "h1") attributes ">" (normalContent "Example Domain") (endElem ""))) "\n " (elem (normalElem "<" (elemName "p") attributes ">" (normalContent "This domain is for use in illustrative examples in documents. You may\n use this domain in literature without prior coordination or asking for\n permission.") (endElem ""))) " " (elem (normalElem "<" (elemName "p") attributes ">" (normalContent (elem (normalElem "<" (elemName "a") (attributes " " (doubleQuoteAttr (attrName "href") "=\"https://www.iana.org/domains/example\"")) ">" (normalContent "More\n information...") (endElem "")))) (endElem ""))) "\n") (endElem ""))) "\n") (endElem ""))) "\n") (endElem ""))) (blurb "\n")))))
-- 
2.46.0