From: Michael Lucy <MichaelGLucy@Gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: guile-devel@gnu.org
Subject: Re: Some Questions
Date: Mon, 29 Mar 2010 23:08:18 -0600 [thread overview]
Message-ID: <52c42c3e1003292208v14b31651q9b8acd72e63bf890@mail.gmail.com> (raw)
In-Reply-To: <m3zl1rh3dk.fsf@pobox.com>
Hi,
On Mon, Mar 29, 2010 at 3:40 AM, Andy Wingo <wingo@pobox.com> wrote:
>
> Tree-IL is the right thing IMO, mostly because it allows you keep source
> location information, but it also allows you to express more precisely
> what you want to compile to. You don't want to run your compiler's
> output through the Scheme syntax expander, that's unnecessary.
>
> Also tree-il is the right place to hook into Guile's compiler
> infrastructure.
>
Ah, k.
>
> Ah, I should have responded to you before I responded to "No Itisnt"; if
> you're down for this, let's do it then. I suspect that it's a project
> that would take 2 months to do *well*:
>
> * Compile PEG grammars at syntax time using a macro
> - into state-machine-like lexically-bound closures
> - see LPEG VM for example
> http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
> - potentially augment Guile's VM with needed instructions
> * Procedural, LPEG-like interface?
> - run the compiler at runtime?
So, I'm assuming "syntax time" means the grammars will be compiled as
soon as the source file is loaded and macros are parsed, while
"runtime" means the grammars will be compiled when a function using
them is called. Do you mean you'd like to see both of these
supported?
As for the interface, the lisp tradition I'm most comfortable in calls
for transparent composing and decomposing of lists as the main way of
dealing with data, so I was thinking of representing uncompiled
grammars in a form similar to the example at the top of peg.scm and
then compiling this to something efficient (a choice between this and
more traditional string syntax is how CL-PPCRE approaches things:
http://weitz.de/cl-ppcre/#scan ). For example, if I have a pattern
stored in *var* and I want a new pattern that's one or more
repetitions of that, I'd write `(,*var* *), or (list *var* '*).
It looks like LPEG's approach is functions that return opaque data and
other functions to combine these opaque data. For example, if I have
a pattern stored in *var* and I want a new pattern that's one or more
repetitions of that, I'd write (one-or-more *var*). The problem with
this approach is that you have to choose between verbosity or
confusion (e.g. LPEG uses * to concatenate things in the lpeg module,
but when present in a string for the re module it means what it
normally would in a PEG grammar). I haven't finished reading the
paper, but I'm assuming the advantage to this approach is that the
opaque representation is already compiled, so doing several hundred
matches with one grammar won't require several hundred compilations.
It seems like we could get the best of both worlds by having
transparent list representations, functions to compose them (so
(one-or-more *var*) yields the same thing as `(,*var* *)), and a
compile-peg function that would take the list representation and
produce a compiled PEG parser. All the peg functions would take
either the uncompiled or the compiled form (they would check at
runtime what they were dealing with), so in situations where speed is
important people can store the compiled grammar and re-use it.
> * PEG grammars as text?
> - Guile PEG language, parsed by PEG itself?
Once a parser's written, parsing strings into grammars sounds useful
and not too difficult.
> * Benchmarking
> - LPEG benchmarks would be a good first start
> * Test suite
> * Docs
> * Stream parsing?
>
> If you finished the first tasks, figuring out an efficient way to do
> stream parsing could well take all the rest of the time. (The LPEG paper
> works on strings.)
>
> So if you're down with it, perhaps you can do the PEG stuff, and No
> Itisnt can work on Lua. There's definitely enough work for everyone :)
> Hopefully the GOOG comes through with funding.
Cool, PEG it is then.
>
>
> So, let me know what you think about PEG, if you think it's the right
> size for summer. I think it has great potential, and implementing it
> well (as a compiler and procedurally) will be of great use. Otherwise we
> can go ahead with Python, if that's to your liking.
How does this look for a timeline (unless my math is very horrible,
the "midterm" evaluations are 70% of the way through, so most of the
real work will be done by then):
The ~7 weeks before mid-term evaluations are due:
1 week defining the PEG syntax carefully and writing functions for
dealing with it
3 weeks making the PEG syntax compile to VM code
1 week writing tests / fixing bugs that tests show
1 week documenting things that weren't well enough documented on the way
1 week for things to go wrong in
The ~3 weeks after mid-term evaluations are in:
1 week writing useful add-ons (e.g. taking PEG grammar in strings)
1 week writing benchmarks and tuning the speed
1 week for cleaning things up and writing more documentation/examples
The 1 week after suggested pencils-down date:
More documentation/examples.
next prev parent reply other threads:[~2010-03-30 5:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <52c42c3e1003242253w40bd6447k48e5a42cc1d33a98@mail.gmail.com>
[not found] ` <m3zl1ws738.fsf@pobox.com>
[not found] ` <52c42c3e1003251242g1a39b967pfdf5437b953bf054@mail.gmail.com>
[not found] ` <m3hbo3s3dl.fsf@pobox.com>
2010-03-29 6:18 ` Some Questions Michael Lucy
2010-03-29 9:40 ` Andy Wingo
2010-03-30 5:08 ` Michael Lucy [this message]
2002-05-09 5:59 Some questions Lynn Winebarger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52c42c3e1003292208v14b31651q9b8acd72e63bf890@mail.gmail.com \
--to=michaelglucy@gmail.com \
--cc=guile-devel@gnu.org \
--cc=wingo@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).