From: Zelphir Kaltstahl <zelphirkaltstahl@posteo.de>
To: Guile User <guile-user@gnu.org>
Subject: question about PEG parsing - mutually recursive definitions
Date: Fri,  4 Oct 2024 00:44:53 +0000
Message-ID: <52cc3469-bb04-4a9d-bbc6-0412d8303e1f@posteo.de>

Hello Guile Users,

I have a question about Guile's PEG parsing library: is it possible to have 
mutually recursive definitions of grammar rules? I am asking because I am 
trying to write a grammar for org-mode files, and that requires handling nested 
inline markup, like something bold, which contains something emphasized, which 
contains something ... and so on.

After some tinkering, I decided to try with only bold and emphasized first:
https://codeberg.org/ZelphirKaltstahl/guile-examples/src/commit/1575441cf2fdf0f35535db1e6b1986606fa5b8b0/parsing/peg-parsing/nested-inline-elements.scm

~~~~
(define-module (negation-grammar))


(use-modules
  (ice-9 peg)
  (ice-9 pretty-print)
  (peg-tree-utils))


(define-peg-pattern LITERAL-ASTERISK body "*")
(define-peg-pattern LITERAL-SLASH body "/")

(define-peg-pattern LITERALLY-SPACE-BODY body " ")
(define-peg-pattern LITERALLY-TAB-BODY body "\t")
(define-peg-pattern LITERALLY-ASTERISK-BODY body "*")
(define-peg-pattern LITERALLY-NEWLINE-BODY body "\n")

(define-peg-pattern WHITESPACE all
   (or LITERALLY-SPACE-BODY
       LITERALLY-TAB-BODY
       LITERALLY-NEWLINE-BODY))

(define-peg-pattern BOLD-DELIMITER all LITERAL-ASTERISK)
(define-peg-pattern EMPHASIS-DELIMITER body LITERAL-SLASH)

(define-peg-pattern NOT-ASTERISK body (and (not-followed-by LITERAL-ASTERISK) peg-any))
(define-peg-pattern NOT-SLASH body (and (not-followed-by LITERAL-SLASH) peg-any))

(define-peg-pattern BOLD-CONTENT all
   (or EMPHASIS
       (* (and (not-followed-by "*")
               peg-any))))

(define-peg-pattern BOLD all
   (and BOLD-DELIMITER
        BOLD-CONTENT
        BOLD-DELIMITER))

(define-peg-pattern EMPHASIS-CONTENT all
   (or BOLD
       (* (and (not-followed-by "/")
               peg-any))))

(define-peg-pattern EMPHASIS all
   (and EMPHASIS-DELIMITER
        EMPHASIS-CONTENT
        EMPHASIS-DELIMITER))

(define-peg-pattern DOCUMENT all
   (+ BOLD))

(define input "*bold /emphasized/ bold*")

(simple-format #t "input: ~s\n" input)

(define peg-record (match-pattern DOCUMENT input))
(define parse-tree (peg:tree peg-record))

(simple-format
  #t "tree:\n~a"
  (call-with-output-string
    (λ (port)
      (print-tree parse-tree port))))

(simple-format #t "match?: ~a\n" (peg-record? peg-record))
(simple-format #t "matched substring: ~a\n" (matched-substring DOCUMENT input))
(simple-format #t "exhausting-match?: ~a\n" (exhausting-match? DOCUMENT input))
~~~~

The problem with this one is that Guile errors, because EMPHASIS is not yet 
defined where it is first used, in the definition of BOLD-CONTENT. However, if I 
move EMPHASIS up above BOLD-CONTENT, then EMPHASIS-CONTENT is not going to be 
defined before EMPHASIS uses it. And if I move that up too, then BOLD is not 
going to be defined when EMPHASIS-CONTENT uses it. And so on and on. This 
problem would of course affect all other inline markup definitions that are 
yet to come: strikethrough, underline, verbatim, code, ...

So now I am wondering, whether it is impossible to have such a definition, or 
how to resolve the problem. The pattern definitions do not seem to work like 
normal function definitions, which would be able to reference each other 
mutually recursively. Instead they seem to rely on the order of definition.
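
One way to check that assumption in isolation is a reduced grammar in which the 
first pattern refers to one that is only defined later. The following is only a 
sketch: the pattern name PLAIN and the simplified rules are made up for this 
test and are not taken from the file linked above. If it loads and matches, then 
definition order alone is probably not the cause of the error.

```scheme
(use-modules (ice-9 peg))

;; BOLD refers to EMPHASIS, which is only defined further down.
(define-peg-pattern BOLD all
  (and "*" (* (or EMPHASIS PLAIN)) "*"))

;; EMPHASIS refers back to BOLD, closing the cycle.
(define-peg-pattern EMPHASIS all
  (and "/" (* (or BOLD PLAIN)) "/"))

;; PLAIN (hypothetical helper): one or more characters that are not a
;; delimiter. Using + instead of * means PLAIN never matches the empty
;; string, so the (* ...) loops above cannot spin without consuming input.
(define-peg-pattern PLAIN body
  (+ (and (not-followed-by (or "*" "/")) peg-any)))

(define test-input "*bold /emphasized/ bold*")
(define test-result (match-pattern BOLD test-input))

;; match-pattern returns #f on failure, otherwise a match record.
(if test-result
    (begin
      (simple-format #t "matched up to index ~a of ~a\n"
                     (peg:end test-result)
                     (string-length test-input))
      (simple-format #t "tree: ~s\n" (peg:tree test-result)))
    (simple-format #t "no match\n"))
```

If this reduced case works while the full grammar still fails, it may be worth 
comparing the unbound name in the error message against the pattern names that 
are actually defined.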

How can I achieve parsing nested markup? How can I circumvent this problem of 
mutually recursive definitions? Can the PEG library do it, or do I perhaps need 
to switch to some other parsing library?

Best regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl, https://codeberg.org/ZelphirKaltstahl

