Re: GNU Guile PEG-parser

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

From: Noah Lavine <noah.b.lavine@gmail.com>
To: Krister Svanlund <svanlund@student.chalmers.se>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: GNU Guile PEG-parser
Date: Thu, 9 Feb 2012 10:42:37 -0500	[thread overview]
Message-ID: <CA+U71=OcqfEu+5Qd9m=bU=f480xRLxsqXDeiC=FPbyodNHd_-g@mail.gmail.com> (raw)
In-Reply-To: <CAO_vGe8jkCyXHaX1_sVqO-i7rsNmuM--PhS8pnC2N9fY=TmC+g@mail.gmail.com>

Hello,

> I've actually found no PEG library that has a string syntax for the
> equivalent of ignore. I'm guessing most people are satisfied with just
> specifying another nonterminal and matching that one. Probably because it is
> seen as less ugly than extending on the formal definition of PEG but I
> really think we could get a cleaner PEG definition of our parser if we where
> able to ignore text that wasn't needed or gets in the way while using
> string-patterns.

That makes sense. I'm a bit surprised that you find string patterns
easier than S-expression patterns, because I find it the other way
around, but different people like different things. I think we could
add some string syntax for ignore if you wanted it, although other
people on the list should chime in.

> It's actually exactly Python I'm thinking about, we are currently doing a
> preprocessor that will put #{ and #} before and after each block but I was
> hoping that there exists a cleaner solution using the power of PEG instead
> of basic string manipulation. If you could help in any way shape or form
> that would be greatly appreciated, even just suggesting on what parts of PEG
> internals to look at would be really useful.

After thinking about it more, you have two choices.

The easiest thing would be to parse each line (or extended line, if it
ends with "\") with a PEG parser, and use your own logic for the
blocks. Your parser would have two steps for each line:

1. Get the indent from the beginning of a line
2. Parse the rest of the line with a PEG parser

Then you would take the lines returned by the PEG parser and combine
them into a data structure yourself, using the Python line-combining
rules. This is probably your best choice.

Your second choice depends on the fact that PEG parsers are just
functions that take certain arguments and return certain arguments.
You can write a function like that yourself and use it just like a PEG
nonterminal in your grammar. When I was working on PEG, I actually
thought that it would be nice to make this interface public so that
different parser generators could interoperate, but I never did it.
It's all documented in the PEG Internals section of the manual,
though. However, I'd recommend against this just because I think the
interface is not as good as it should be right now, so I'd probably
want to change it in the future, which would make your code stop
working. (Although if this is a one-time thing, then you don't need to
care about that.)

I suppose you also have a third choice, which is to change the
internal interface yourself, then let us make it public, then use it
that way. That's the most elegant solution, but it's more work for
you. I wouldn't recommend it unless the first option is hard and you
want this to last for a long time.

I hope this helps,
Noah

     prev parent reply	other threads:[~2012-02-09 15:42 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAO_vGe8tm2=gyhF4vKrYV=mU9gEpbrmGwsr0JnsnF9JqvfaMuA@mail.gmail.com>
2012-02-08  0:47 ` GNU Guile PEG-parser Noah Lavine
2012-02-08 18:29   ` Krister Svanlund
2012-02-09 15:42     ` Noah Lavine [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+U71=OcqfEu+5Qd9m=bU=f480xRLxsqXDeiC=FPbyodNHd_-g@mail.gmail.com' \
    --to=noah.b.lavine@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=svanlund@student.chalmers.se \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).