From: Zelphir Kaltstahl <zelphirkaltstahl@posteo.de>
To: Guile User <guile-user@gnu.org>
Subject: question about PEG parsing - mutually recursive definitions
Date: Fri, 4 Oct 2024 00:44:53 +0000
Message-ID: <52cc3469-bb04-4a9d-bbc6-0412d8303e1f@posteo.de>
Hello Guile Users,
I have questions regarding Guile's PEG parsing library. Is it possible to have
mutually recursive definitions of grammar rules? I am asking because I am
trying to write a grammar for org-mode files, and that requires handling nested
inline markup, like something bold which contains something emphasized, which
contains something ... and so on.
After some tinkering, I decided to first try it with only bold and emphasized
text:
https://codeberg.org/ZelphirKaltstahl/guile-examples/src/commit/1575441cf2fdf0f35535db1e6b1986606fa5b8b0/parsing/peg-parsing/nested-inline-elements.scm:
~~~~
(define-module (negation-grammar))

(use-modules
 (ice-9 peg)
 (ice-9 pretty-print)
 (peg-tree-utils))

(define-peg-pattern LITERAL-ASTERISK body "*")
(define-peg-pattern LITERAL-SLASH body "/")
(define-peg-pattern LITERALLY-SPACE-BODY body " ")
(define-peg-pattern LITERALLY-TAB-BODY body "\t")
(define-peg-pattern LITERALLY-ASTERISK-BODY body "*")
(define-peg-pattern LITERALLY-NEWLINE-BODY body "\n")

(define-peg-pattern WHITESPACE all
  (or LITERALLY-SPACE-BODY
      LITERALLY-TAB-BODY
      LITERALLY-NEWLINE-BODY))

(define-peg-pattern BOLD-DELIMITER all LITERAL-ASTERISK)
(define-peg-pattern EMPHASIS-DELIMITER body LITERAL-SLASH)

(define-peg-pattern NOT-ASTERISK body (and (not-followed-by LITERAL-ASTERISK) peg-any))
(define-peg-pattern NOT-SLASH body (and (not-followed-by LITERAL-SLASH) peg-any))

(define-peg-pattern BOLD-CONTENT all
  (or EMPHASIS
      (* (and (not-followed-by "*")
              peg-any))))

(define-peg-pattern BOLD all
  (and BOLD-DELIMITER
       BOLD-CONTENT
       BOLD-DELIMITER))

(define-peg-pattern EMPHASIS-CONTENT all
  (or BOLD
      (* (and (not-followed-by "*")
              peg-any))))

(define-peg-pattern EMPHASISIS all
  (and EMPHASIS-DELIMITER
       EMPHASIS-CONTENT
       EMPHASIS-DELIMITER))

(define-peg-pattern DOCUMENT all
  (+ BOLD))

(define input "*bold /emphasized/ bold*")
(simple-format #t "input: ~s\n" input)

(define peg-record (match-pattern DOCUMENT input))
(define parse-tree (peg:tree peg-record))

(simple-format
 #t "tree:\n~a"
 (call-with-output-string
  (λ (port)
    (print-tree parse-tree port))))
(simple-format #t "match?: ~a\n" (peg-record? peg-record))
(simple-format #t "matched substring: ~a\n" (matched-substring DOCUMENT input))
(simple-format #t "exhausting-match?: ~a\n" (exhausting-match? DOCUMENT input))
~~~~
The problem with this one is that Guile raises an error, because EMPHASIS is not
yet defined where it is first used, in the definition of BOLD-CONTENT. However, if
I move EMPHASIS above BOLD-CONTENT, then EMPHASIS-CONTENT is not defined before
EMPHASIS uses it. And if I move that up too, then BOLD is not yet defined, but it
is used in EMPHASIS-CONTENT. And so on and on. The same problem would of course
affect all the other inline markup definitions that are yet to come:
strikethrough, underline, verbatim, code, ...
So now I am wondering whether such a definition is impossible, or how to resolve
the problem. The pattern definitions do not seem to work like normal function
definitions, which can reference each other mutually recursively. Instead, they
seem to depend on the order of definition.
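Note that in the listing above the last rule is spelled EMPHASISIS, so the name
EMPHASIS used in BOLD-CONTENT is never defined anywhere, independent of ordering.
To separate that from the ordering question, here is a minimal test case: a
sketch with hypothetical rule names A-RULE and B-RULE, where the first pattern
refers to one that is defined after it. It assumes, based on how (ice-9 peg)
appears to expand define-peg-pattern, that a reference to another pattern is
compiled into a call inside the generated matcher procedure, and is therefore
only looked up when match-pattern actually runs.

```scheme
(use-modules (ice-9 peg))

;; Hypothetical rules for illustration only: an "a" may be followed by
;; a B-RULE, and a "b" may be followed by an A-RULE.
;; A-RULE refers to B-RULE before B-RULE has been defined.
(define-peg-pattern A-RULE all (and "a" (? B-RULE)))
(define-peg-pattern B-RULE all (and "b" (? A-RULE)))

;; Both names are bound by the time the matcher is called.
(define result (match-pattern A-RULE "abab"))

(simple-format #t "matched: ~s\n" (peg:substring result))
(simple-format #t "tree: ~s\n" (peg:tree result))
```

If the assumption holds, this prints the matched substring "abab" instead of
failing with an unbound-variable error, which would mean the order of the
define-peg-pattern forms does not matter as long as every name is defined
before matching starts.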
How can I parse nested markup? How can I get around this problem with mutually
recursive definitions? Can the PEG library do it, or do I perhaps need to switch
to some other parsing library?
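For the nested markup itself, a sketch along the following lines may be worth
trying, assuming that a pattern may refer to one defined later in the file. Two
differences from the listing above are deliberate. First, each content rule is a
repetition over a choice, (+ (or EMPHASIS TEXT)), so that plain text and nested
markup can alternate, instead of matching either a single nested element or a
plain run. Second, the plain-text rule stops at both delimiters, whereas
EMPHASIS-CONTENT above only checks for "*" and would therefore consume the
closing slash as well. The rule names TEXT, BOLD-DELIM and EMPH-DELIM are made
up for this example.

```scheme
(use-modules (ice-9 peg)
             (ice-9 pretty-print))

;; Delimiters must match, but with accumulation NONE they add nothing
;; to the parse tree.
(define-peg-pattern BOLD-DELIM none "*")
(define-peg-pattern EMPH-DELIM none "/")

;; Plain text: one or more characters that are not a delimiter.
(define-peg-pattern TEXT body
  (+ (and (not-followed-by (or "*" "/")) peg-any)))

;; BOLD refers to EMPHASIS, which is only defined further down.
;; Between the delimiters, text and nested emphasis may alternate.
(define-peg-pattern BOLD all
  (and BOLD-DELIM (+ (or EMPHASIS TEXT)) BOLD-DELIM))

;; EMPHASIS refers back to BOLD, closing the cycle.
(define-peg-pattern EMPHASIS all
  (and EMPH-DELIM (+ (or BOLD TEXT)) EMPH-DELIM))

;; A document is a sequence of markup and plain text.
(define-peg-pattern DOCUMENT all
  (+ (or BOLD EMPHASIS TEXT)))

(define input "*bold /emphasized *nested bold*/ bold*")
(define result (match-pattern DOCUMENT input))

;; match-pattern returns #f when DOCUMENT does not match at position 0.
(if result
    (begin
      (simple-format #t "consumed ~a of ~a characters\n"
                     (peg:end result) (string-length input))
      (pretty-print (peg:tree result)))
    (display "no match\n"))
```

With this layout, adding strikethrough, underline and so on would mostly mean
adding one more alternative to each choice and one more delimiter to the
lookahead in TEXT. The sketch does not attempt org-mode's actual rules for when a
delimiter counts as markup: an unmatched "*" or "/" simply ends the match early,
or makes match-pattern return #f if it comes first.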
Best regards,
Zelphir
--
repositories: https://notabug.org/ZelphirKaltstahl, https://codeberg.org/ZelphirKaltstahl