unofficial mirror of guile-user@gnu.org 
* question about PEG parsing - mutually recursive definitions
@ 2024-10-04  0:44 Zelphir Kaltstahl
From: Zelphir Kaltstahl @ 2024-10-04  0:44 UTC
  To: Guile User

Hello Guile Users,

I have a question about Guile's PEG parsing library. Is it possible to have
mutually recursive definitions of grammar rules? I am asking because I am
trying to write a grammar for org-mode files, which requires handling nested
inline markup: something bold, which contains something emphasized, which
contains something else, and so on.

After some tinkering, I decided to start with only bold and emphasized text:
https://codeberg.org/ZelphirKaltstahl/guile-examples/src/commit/1575441cf2fdf0f35535db1e6b1986606fa5b8b0/parsing/peg-parsing/nested-inline-elements.scm

~~~~
(define-module (negation-grammar))


(use-modules
  (ice-9 peg)
  (ice-9 pretty-print)
  (peg-tree-utils))


(define-peg-pattern LITERAL-ASTERISK body "*")
(define-peg-pattern LITERAL-SLASH body "/")

(define-peg-pattern LITERALLY-SPACE-BODY body " ")
(define-peg-pattern LITERALLY-TAB-BODY body "\t")
(define-peg-pattern LITERALLY-ASTERISK-BODY body "*")
(define-peg-pattern LITERALLY-NEWLINE-BODY body "\n")

(define-peg-pattern WHITESPACE all
   (or LITERALLY-SPACE-BODY
       LITERALLY-TAB-BODY
       LITERALLY-NEWLINE-BODY))

(define-peg-pattern BOLD-DELIMITER all LITERAL-ASTERISK)
(define-peg-pattern EMPHASIS-DELIMITER body LITERAL-SLASH)

(define-peg-pattern NOT-ASTERISK body (and (not-followed-by LITERAL-ASTERISK) peg-any))
(define-peg-pattern NOT-SLASH body (and (not-followed-by LITERAL-SLASH) peg-any))

(define-peg-pattern BOLD-CONTENT all
   (or EMPHASIS
       (* (and (not-followed-by "*")
               peg-any))))

(define-peg-pattern BOLD all
   (and BOLD-DELIMITER
        BOLD-CONTENT
        BOLD-DELIMITER))

(define-peg-pattern EMPHASIS-CONTENT all
   (or BOLD
       (* (and (not-followed-by "/")
               peg-any))))

(define-peg-pattern EMPHASISIS all
   (and EMPHASIS-DELIMITER
        EMPHASIS-CONTENT
        EMPHASIS-DELIMITER))

(define-peg-pattern DOCUMENT all
   (+ BOLD))

(define input "*bold /emphasized/ bold*")

(simple-format #t "input: ~s\n" input)

(define peg-record (match-pattern DOCUMENT input))
(define parse-tree (peg:tree peg-record))

(simple-format
  #t "tree:\n~a"
  (call-with-output-string
    (λ (port)
      (print-tree parse-tree port))))

(simple-format #t "match?: ~a\n" (peg-record? peg-record))
(simple-format #t "matched substring: ~a\n" (matched-substring DOCUMENT input))
(simple-format #t "exhausting-match?: ~a\n" (exhausting-match? DOCUMENT input))
~~~~

The problem is that Guile raises an error, because EMPHASIS is not yet defined
where it is first used, in the definition of BOLD-CONTENT. However, if I move
EMPHASIS above BOLD-CONTENT, then EMPHASIS-CONTENT is not defined before
EMPHASIS uses it. If I move that up too, then BOLD is used in EMPHASIS-CONTENT
before it is defined, and so on. The same problem would of course affect all
other inline markup definitions that are yet to come: strikethrough, underline,
verbatim, code, ...

So now I am wondering whether such a definition is impossible, or how to
resolve the problem. The pattern definitions do not seem to work like normal
function definitions, which can reference each other mutually recursively.
Instead, they seem to depend on the order of definition.
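
One thing worth ruling out first: the listing above defines EMPHASISIS, while
BOLD-CONTENT refers to EMPHASIS, so EMPHASIS would be unbound no matter how the
definitions are ordered. The ordering hypothesis itself can be tested in
isolation with a reduced grammar like the sketch below. It is not the file from
the repository; the `(* (or ...))` shape of the content rules and the names
`input` and `record` are choices made for this illustration. It assumes that
define-peg-pattern expands to an ordinary top-level definition that looks up
the nonterminals it references only when a match is attempted, which is how the
macro appears to behave but is not confirmed here.

```scheme
(use-modules (ice-9 peg))

;; Hypothetical reduced grammar, not the author's file.
;; BOLD-CONTENT mentions EMPHASIS before EMPHASIS is defined.
(define-peg-pattern BOLD-CONTENT all
  (* (or EMPHASIS
         (and (not-followed-by "*") peg-any))))

(define-peg-pattern BOLD all
  (and "*" BOLD-CONTENT "*"))

;; EMPHASIS-CONTENT refers back to BOLD, closing the cycle.
(define-peg-pattern EMPHASIS-CONTENT all
  (* (or BOLD
         (and (not-followed-by "/") peg-any))))

(define-peg-pattern EMPHASIS all
  (and "/" EMPHASIS-CONTENT "/"))

(define input "*bold /emphasized/ bold*")
(define record (match-pattern BOLD input))

;; Print the parse tree if the match succeeded, #f otherwise.
(write (and record (peg:tree record)))
(newline)
```

If this prints a nested tree rather than signalling an unbound variable, then
definition order is probably not the cause of the error. Guile may still print
"possibly unbound variable" warnings while compiling such a file; those are
warnings, not errors.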

How can I achieve parsing nested markup? How can I circumvent this problem of 
mutually recursive definitions? Can the PEG library do it, or do I perhaps need 
to switch to some other parsing library?
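
For comparison, the Guile manual's arithmetic example uses
define-peg-string-patterns with rules that refer to later rules (expr refers to
sum, and value refers back to expr), which suggests the library supports
recursion between nonterminals. A minimal sketch of the bold/emphasis case in
that notation might look as follows. The rule names `bold` and `emphasis` and
the variables `nested-input` and `nested-record` are made up for this example,
and the grammar is only meant to cover the single test string.

```scheme
(use-modules (ice-9 peg))

;; Both rules are defined in one form and refer to each other.
;; '<--' keeps the rule name and the matched text in the tree.
(define-peg-string-patterns
  "bold <-- '*' (emphasis / (!'*' .))* '*'
emphasis <-- '/' (bold / (!'/' .))* '/'")

(define nested-input "*bold /emphasized/ bold*")
(define nested-record (match-pattern bold nested-input))

;; Print the parse tree if the match succeeded, #f otherwise.
(write (and nested-record (peg:tree nested-record)))
(newline)
```

Further inline markup such as strikethrough or underline would, under the same
assumption, be added as more alternatives inside the parenthesized groups.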

Best regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl, https://codeberg.org/ZelphirKaltstahl

