From: Michael Lucy <MichaelGLucy@Gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: "Ludovic Courtès" <ludo@gnu.org>, guile-devel@gnu.org
Subject: Re: GSOC PEG project
Date: Sun, 11 Jul 2010 02:48:39 -0500
Message-ID: <AANLkTinJu-o5FDbj1E9A1RNGD2tjf6jT3B_cEBtis5Mp@mail.gmail.com>
In-Reply-To: <m3sk3uq70t.fsf@unquote.localdomain>
On Thu, Jul 8, 2010 at 11:21 AM, Andy Wingo <wingo@pobox.com> wrote:
> Hi Michael,
>
> On Tue 06 Jul 2010 00:59, Michael Lucy <MichaelGLucy@Gmail.com> writes:
>
>> (use-modules (ice-9 peg))
>> (peg-find "'b'+" "aabbcc")
>> --> (2 4 "bb")
>
> Humm, another thing to think about: (ice-9 regex) returns "match
> structures", which are really just vectors; have a look at them, and if
> it makes sense to mimic that interface, re-exporting those accessors
> somehow, please do.
So, there are a few places where the interfaces don't quite match up:
1. match:substring
Problem: It's perfectly legal to pass peg-match a parsing nonterminal
and have it give you a parse tree rather than a substring.
Potential solutions:
1.a. Just have match:substring return either the substring or the
parse tree. The problem with this is that it may violate
expectations.
1.b. Have match:substring collapse the parse tree into a string, and
have another function match:parse-tree that would return the parse
tree. The problem with this is that the parse tree might have
discarded text, which would once again violate expectations.
2. match:count
Problem: the notion of a "parenthesized sub-expression" doesn't really
map cleanly onto PEGs. This information isn't tracked while parsing
and wouldn't be very meaningful.
Potential solutions:
2.a. Ignore it (not that bad a solution).
2.b. Track that information. I'd rather not do this because it would
slow down the parser and wouldn't be very useful. Which brings us
to...
3. submatch numbers
Problem: there's no notion of a "submatch" right now. People should
be getting this information by building a parsing nonterminal and then
traversing the resulting parse-tree. I'd rather not wire in a whole
separate system just to provide an alternative way of getting
information about what parts of an expression matched what (it would
also slow down parsing).
Potential solutions:
3.a. Ignore it (would violate expectations in a big way).
3.b. Wire it in (I'd really rather not do this).
So, there would be some gaps if I shimmed the match structure
interface onto PEGs.
The problem is that, while it would be useful to have a consistent
interface for matching both regexps and PEGs, they're different things
and naming the accessor functions the same things might lead people to
assume things that aren't true.
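For reference, here is a small sketch of how the existing (ice-9 regex)
accessors behave, including the submatch-number arguments and
match:count that have no obvious PEG counterpart. The pattern, the
input string, and the name rx-m are made up for illustration:

```scheme
(use-modules (ice-9 regex))

;; Two parenthesized sub-expressions: (a+) and (b+).
(define rx-m (string-match "(a+)(b+)" "aabbcc"))

(match:start rx-m)        ; 0, start of the whole match
(match:end rx-m)          ; 4, end of the whole match
(match:substring rx-m)    ; "aabb", the whole match
(match:substring rx-m 2)  ; "bb", the second submatch
(match:count rx-m)        ; 3, the whole match plus two submatches
```

Every accessor here optionally takes a submatch number, which is
exactly the part that does not carry over to PEGs.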
So, three potential paths from here:
1. Mimic the match structure interface as much as possible.
2. Have a similar but differently-named "peg-match structure"
interface that behaves mostly the same but has a few different
functions (I think naming them something slightly different would lead
to fewer people assuming they worked exactly the same as match
structures).
3. Just have a different interface.
I'm leaning toward (2); what do other people think? I'd probably:
1. Not have a peg-match:count function at all.
2. Not have the functions take submatch numbers.
3. Have peg-match:substring return the actual substring.
4. Have another function peg-match:parse-tree that returns the parse tree.
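A minimal sketch of what that could look like, assuming a plain SRFI-9
record. None of this is existing code: the record layout, the field
names, the collapse-tree helper, and the example tree are all
hypothetical, and the real parser output may be shaped differently.
collapse-tree is only there to show why option 1.b above can lose text:

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record; the layout is illustrative only.
(define-record-type <peg-match>
  (make-peg-match input start end tree)
  peg-match?
  (input peg-match:string)      ; the full string that was searched
  (start peg-match:start)       ; index where the match begins
  (end   peg-match:end)         ; index just past the match
  (tree  peg-match:parse-tree)) ; parse tree (or string) from the parser

;; Always the literal matched text, never a parse tree.
(define (peg-match:substring m)
  (substring (peg-match:string m)
             (peg-match:start m)
             (peg-match:end m)))

;; Option 1.b for comparison: flatten a tree of symbols and strings.
(define (collapse-tree tree)
  (cond ((string? tree) tree)
        ((pair? tree)
         (string-append (collapse-tree (car tree))
                        (collapse-tree (cdr tree))))
        (else "")))             ; symbols and '() contribute nothing

;; A made-up match over "a, b" where the grammar discarded the ", ".
(define pm
  (make-peg-match "a, b" 0 4 '(list (item "a") (item "b"))))

(peg-match:substring pm)                 ; "a, b"
(collapse-tree (peg-match:parse-tree pm)) ; "ab", separator is gone
```

Keeping the substring and the parse tree as two separate accessors
means neither one has to guess what the caller expected.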
>
>> And when I use it with --no-autocompile I don't get any errors:
>>
>> What does this mean?
>
> Probably some eval-when tomfoolery. See "Eval When" in the manual.
>
> Cheers,
>
> Andy
> --
> http://wingolog.org/
>