* question about PEG parsing - mutually recursive definitions
@ 2024-10-04 0:44 Zelphir Kaltstahl
From: Zelphir Kaltstahl @ 2024-10-04 0:44 UTC
To: Guile User
Hello Guile Users,
I have a question regarding Guile's PEG parsing library. Is it possible to have
mutually recursive definitions of grammar rules? I am asking because I am
trying to write a grammar for org-mode files, and that requires handling nested
inline markup, like something bold that contains something emphasized, which
contains something else, and so on.
After some tinkering, I decided to first try with only bold and emphasized:
https://codeberg.org/ZelphirKaltstahl/guile-examples/src/commit/1575441cf2fdf0f35535db1e6b1986606fa5b8b0/parsing/peg-parsing/nested-inline-elements.scm:
~~~~
(define-module (negation-grammar))
(use-modules
(ice-9 peg)
(ice-9 pretty-print)
(peg-tree-utils))
(define-peg-pattern LITERAL-ASTERISK body "*")
(define-peg-pattern LITERAL-SLASH body "/")
(define-peg-pattern LITERALLY-SPACE-BODY body " ")
(define-peg-pattern LITERALLY-TAB-BODY body "\t")
(define-peg-pattern LITERALLY-ASTERISK-BODY body "*")
(define-peg-pattern LITERALLY-NEWLINE-BODY body "\n")
(define-peg-pattern WHITESPACE all
(or LITERALLY-SPACE-BODY
LITERALLY-TAB-BODY
LITERALLY-NEWLINE-BODY))
(define-peg-pattern BOLD-DELIMITER all LITERAL-ASTERISK)
(define-peg-pattern EMPHASIS-DELIMITER body LITERAL-SLASH)
(define-peg-pattern NOT-ASTERISK body (and (not-followed-by LITERAL-ASTERISK) peg-any))
(define-peg-pattern NOT-SLASH body (and (not-followed-by LITERAL-SLASH) peg-any))
(define-peg-pattern BOLD-CONTENT all
(or EMPHASIS
(* (and (not-followed-by "*")
peg-any))))
(define-peg-pattern BOLD all
(and BOLD-DELIMITER
BOLD-CONTENT
BOLD-DELIMITER))
(define-peg-pattern EMPHASIS-CONTENT all
(or BOLD
(* (and (not-followed-by "*")
peg-any))))
(define-peg-pattern EMPHASISIS all
(and EMPHASIS-DELIMITER
EMPHASIS-CONTENT
EMPHASIS-DELIMITER))
(define-peg-pattern DOCUMENT all
(+ BOLD))
(define input "*bold /emphasized/ bold*")
(simple-format #t "input: ~s\n" input)
(define peg-record (match-pattern DOCUMENT input))
(define parse-tree (peg:tree peg-record))
(simple-format
#t "tree:\n~a"
(call-with-output-string
(λ (port)
(print-tree parse-tree port))))
(simple-format #t "match?: ~a\n" (peg-record? peg-record))
(simple-format #t "matched substring: ~a\n" (matched-substring DOCUMENT input))
(simple-format #t "exhausting-match?: ~a\n" (exhausting-match? DOCUMENT input))
~~~~
The problem with this one is that Guile errors, because EMPHASIS is not yet
defined where it is first used, in the definition of BOLD-CONTENT. However, if I
move EMPHASIS up above BOLD-CONTENT, then EMPHASIS-CONTENT is not going to be
defined before EMPHASIS uses it. And if I move that up too, then BOLD is not
going to be defined, but used in EMPHASIS-CONTENT. And so on. This problem
would of course happen with all other inline markup definitions that are yet to
come: strikethrough, underline, verbatim, code, ...
So now I am wondering whether such a definition is impossible, or how to
resolve the problem. The pattern definitions do not seem to work like normal
function definitions, which would be able to reference each other mutually
recursively. Instead, they seem to rely on the order of definition.
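For comparison, this is the kind of mutual recursion I mean with normal
function definitions. It is only a minimal sketch: my-even? and my-odd? are
made-up names for illustration and have nothing to do with the PEG library. At
the top level, my-even? can refer to my-odd? before my-odd? is defined, because
the reference is only looked up when the procedure is called:

```scheme
;; Hypothetical helper names, for illustration only.
;; my-even? refers to my-odd?, which is defined further down.
(define (my-even? n)
  (if (= n 0)
      #t
      (my-odd? (- n 1))))

;; my-odd? refers back to my-even?.
(define (my-odd? n)
  (if (= n 0)
      #f
      (my-even? (- n 1))))

;; Both procedures exist by the time this call runs.
(simple-format #t "10 even?: ~a\n" (my-even? 10))
```

This is the behavior I was hoping to get from the pattern definitions as well.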
How can I achieve parsing nested markup? How can I circumvent this problem of
mutually recursive definitions? Can the PEG library do it, or do I perhaps need
to switch to some other parsing library?
Best regards,
Zelphir
--
repositories: https://notabug.org/ZelphirKaltstahl, https://codeberg.org/ZelphirKaltstahl