* Re: GNU Guile PEG-parser
From: Noah Lavine @ 2012-02-08 0:47 UTC
To: Krister Svanlund; +Cc: guile-devel
Hello,

Thanks for emailing! I suppose I am the one to talk to, since I was
the last one to work on it.

I didn't make the PEG parsing syntax, but I would guess the reason
there isn't a string syntax for ignore is that there's no conventional
way to write it, whereas there is for the other PEG elements. It would
be easy to add one if it were useful, but we'd want to make sure our
syntax agreed with other PEG libraries, so people wouldn't be confused
later.

For blank-space indented blocks, do you mean you want to group
together lines with the same indentation, like Python syntax? If you
know what the indentation will be at the beginning of each line, you
can do something like this:

(* (and "\t" <match-line> "\n")),

where you replace "\t" with whatever indentation you want.

However, what you probably want to do is look at the indentation in
the first line and then group it with every following line that has
the same indentation. I'm not sure that's possible in a PEG, and even
if it is, it would probably be ugly. If you tell me what you're trying
to do, though, I can help you write your own parser to handle it. You
can even write some of your parser yourself and use PEGs for the rest,
if you're willing to use PEG internals.

Can you tell me more about what you're trying to do? I am happy to
help now, but I will be more helpful if I know more.

I'm going to CC the guile-devel mailing list because of the issue with
the string syntax.

Noah

On Tue, Feb 7, 2012 at 10:03 AM, Krister Svanlund
<krister.svanlund@gmail.com> wrote:
> Hi,
> I'm currently involved in a project that plans on using the PEG module for
> Guile for parsing and I've understood that you are the one to talk to about
> it. I'm mostly just curious how come there isn't an equivalent to ignore in
> string-patterns and if this would be complex to add?
>
> I'm also curious if there is any way to deal with blank-space indented
> blocks in PEG.
>
> Yours
> Krister Svanlund
* Re: GNU Guile PEG-parser
From: Krister Svanlund @ 2012-02-08 18:29 UTC
To: Noah Lavine; +Cc: guile-devel
Hi, thanks for a quick response!

I've actually found no PEG library that has a string syntax for the
equivalent of ignore. I'm guessing most people are satisfied with just
specifying another nonterminal and matching that one, probably because
that is seen as less ugly than extending the formal definition of PEG.
Still, I really think we could get a cleaner PEG definition of our
parser if we were able to ignore text that isn't needed or gets in the
way while using string-patterns.

It's actually exactly Python I'm thinking about. We are currently
writing a preprocessor that will put #{ and #} before and after each
block, but I was hoping that there is a cleaner solution using the
power of PEG instead of basic string manipulation. If you could help
in any way, shape, or form, that would be greatly appreciated; even
just suggesting which parts of the PEG internals to look at would be
really useful.

I hope you or the guile-devel list can be of help.

Yours,
Krister Svanlund

On Wed, Feb 8, 2012 at 1:47 AM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello,
>
> Thanks for emailing! I suppose I am the one to talk to, since I was
> the last one to work on it.
>
> I didn't make the PEG parsing syntax, but I would guess the reason
> there isn't a string syntax for ignore is that there's no conventional
> way to write it, whereas there is for the other PEG elements. It would be
> easy to add one if it were useful, but we'd want to make sure our
> syntax agreed with other PEG libraries, so people wouldn't be confused
> later.
>
> For blank-space indented blocks, do you mean you want to group
> together lines with the same indentation, like Python syntax? If you
> know what the indentation will be at the beginning of each line, you
> can do something like this:
>
> (* (and "\t" <match-line> "\n")),
>
> where you replace "\t" with whatever indentation you want.
>
> However, what you probably want to do is look at the indentation in
> the first line and then group it with every following line that has
> the same indentation. I'm not sure that's possible in a PEG, and even
> if it is, it would probably be ugly. If you tell me what you're trying
> to do, though, I
> can help you write your own parser to handle it. You can even write
> some of your parser yourself and use PEGs for the rest, if you're
> willing to use PEG internals.
>
> Can you tell me more about what you're trying to do? I am happy to
> help now, but I will be more helpful if I know more.
>
> I'm going to CC the guile-devel mailing list because of the issue with
> the string syntax.
>
> Noah
>
> On Tue, Feb 7, 2012 at 10:03 AM, Krister Svanlund
> <krister.svanlund@gmail.com> wrote:
> > Hi,
> > I'm currently involved in a project that plans on using the PEG module for
> > Guile for parsing and I've understood that you are the one to talk to about
> > it. I'm mostly just curious how come there isn't an equivalent to ignore in
> > string-patterns and if this would be complex to add?
> >
> > I'm also curious if there is any way to deal with blank-space indented
> > blocks in PEG.
> >
> > Yours
> > Krister Svanlund
>
* Re: GNU Guile PEG-parser
From: Noah Lavine @ 2012-02-09 15:42 UTC
To: Krister Svanlund; +Cc: guile-devel
Hello,

> I've actually found no PEG library that has a string syntax for the
> equivalent of ignore. I'm guessing most people are satisfied with just
> specifying another nonterminal and matching that one, probably because
> that is seen as less ugly than extending the formal definition of PEG.
> Still, I really think we could get a cleaner PEG definition of our
> parser if we were able to ignore text that isn't needed or gets in the
> way while using string-patterns.

That makes sense. I'm a bit surprised that you find string patterns
easier than S-expression patterns, because I find it the other way
around, but different people like different things. I think we could
add some string syntax for ignore if you wanted it, although other
people on the list should chime in.

> It's actually exactly Python I'm thinking about. We are currently
> writing a preprocessor that will put #{ and #} before and after each
> block, but I was hoping that there is a cleaner solution using the
> power of PEG instead of basic string manipulation. If you could help
> in any way, shape, or form, that would be greatly appreciated; even
> just suggesting which parts of the PEG internals to look at would be
> really useful.

After thinking about it more, you have two choices.

The easiest thing would be to parse each line (or extended line, if it
ends with "\") with a PEG parser, and use your own logic for the
blocks. Your parser would have two steps for each line:

1. Get the indent from the beginning of a line
2. Parse the rest of the line with a PEG parser

Then you would take the lines returned by the PEG parser and combine
them into a data structure yourself, using the Python line-combining
rules. This is probably your best choice.
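
As a rough illustration of this first option, here is a minimal sketch
in Python (chosen only for brevity; it is not Guile code). The names
split_indent and parse_blocks are made up for this example, and
parse_line stands in for whatever PEG-based line parser you end up
with. The sketch ignores "\" line continuations and counts a tab as a
single column, which is a simplification of the real Python rules.

```python
from typing import Callable, Tuple


def split_indent(line: str) -> Tuple[int, str]:
    """Step 1: return (indent width, rest of the line)."""
    rest = line.lstrip(" \t")
    return len(line) - len(rest), rest


def parse_blocks(source: str,
                 parse_line: Callable[[str], object] = lambda s: s) -> list:
    """Group parsed lines into nested lists according to indentation.

    parse_line is a placeholder for step 2 (the per-line PEG parser);
    by default each line is kept as a plain string.
    """
    root: list = []
    stack = [(0, root)]  # (indent, block) pairs, innermost block last
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines neither open nor close a block
        indent, rest = split_indent(raw)
        if indent > stack[-1][0]:
            # Deeper indentation opens a new block nested in the current one.
            block: list = []
            stack[-1][1].append(block)
            stack.append((indent, block))
        else:
            # Shallower indentation closes blocks until the levels match.
            while indent < stack[-1][0]:
                stack.pop()
            if indent != stack[-1][0]:
                raise ValueError(f"inconsistent dedent: {raw!r}")
        stack[-1][1].append(parse_line(rest))
    return root
```

With the default parse_line, parse_blocks("if x:\n    a\n    b\nc\n")
gives ['if x:', ['a', 'b'], 'c'], so each indented block becomes a
nested list right after the line that introduces it.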

Your second choice depends on the fact that PEG parsers are just
functions that take certain arguments and return certain values.
You can write a function like that yourself and use it just like a PEG
nonterminal in your grammar. When I was working on PEG, I actually
thought that it would be nice to make this interface public so that
different parser generators could interoperate, but I never did it.
It's all documented in the PEG Internals section of the manual,
though. However, I'd recommend against this, because I think the
interface is not as good as it should be right now, so I'd probably
want to change it in the future, which would make your code stop
working. (Although if this is a one-time thing, then you don't need to
care about that.)
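
To make the "parsers are just functions" idea concrete, here is a small
Python sketch. It is not Guile's internal interface: the signature used
here (text and position in, new position and value out, or None on
failure) and the names literal, seq, star and rest_of_line are
assumptions made for illustration only. The point is that a
hand-written function with the same shape can sit next to generated
ones.

```python
from typing import Any, Callable, Optional, Tuple

# (position after the match, value), or None when the parser fails.
Result = Optional[Tuple[int, Any]]
Parser = Callable[[str, int], Result]


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Result:
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match every parser in order; fail if any one of them fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            pos, value = r
            values.append(value)
        return pos, values
    return parse


def star(p: Parser) -> Parser:
    """Match p zero or more times; never fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        while True:
            r = p(text, pos)
            if r is None or r[0] == pos:  # stop on failure or an empty match
                return pos, values
            pos, value = r
            values.append(value)
    return parse


def rest_of_line(text: str, pos: int) -> Result:
    """Hand-written parser: consume up to, not including, the next newline."""
    end = text.find("\n", pos)
    end = len(text) if end == -1 else end
    return end, text[pos:end]


# Same shape as the pattern (* (and "\t" <match-line> "\n")) from earlier
# in the thread, with the hand-written rest_of_line used as the line matcher.
indented_lines = star(seq(literal("\t"), rest_of_line, literal("\n")))
```

Calling indented_lines("\ta\n\tb\nc\n", 0) consumes the two
tab-indented lines and stops at position 6, in front of the unindented
"c".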

I suppose you also have a third choice, which is to change the
internal interface yourself, then let us make it public, then use it
that way. That's the most elegant solution, but it's more work for
you. I wouldn't recommend it unless the first option is hard and you
want this to last for a long time.

I hope this helps,
Noah