unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* PEG Parser
@ 2010-05-27  5:19 Michael Lucy
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Lucy @ 2010-05-27  5:19 UTC (permalink / raw)
  To: guile-devel

Hi,

I'm another of the GSOC students--I figured I'd introduce myself since
I may be asking questions on this list.  I'm going to be writing a PEG
parser/matcher for Guile.  My paperwork is still working its way
through the system so I can't point to any repositories where my code
will be yet, but it should be pretty well-labeled once it's in there
if anybody wants to take a look at it.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* PEG Parser
@ 2011-01-13 21:29 Noah Lavine
  2011-01-17 21:32 ` Ludovic Courtès
  0 siblings, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-13 21:29 UTC (permalink / raw)
  To: guile-devel

Hello all,

I was just looking at the PEG parser code, which is currently in the
"mlucy" branch of the Guile repo. I was looking at it to see if it was
ready to be merged.

First of all, it passes its test suite. This is pretty good, because
its test suite includes a grammar for itself, a grammar for basic
mathematical equations, and a grammar for parsing /etc/passwd files.
These all seem like good examples of things you'd want to parse.

I also looked at the code. It's pretty clear. I like the way it does
code generation by defining lots of little functions that generate
bits of code and then calling those functions. I tried to compare its
style to the existing LALR parser generator, but they're too different
- the LALR generator computes tables in advance but doesn't seem to
actually generate code (it just uses a pre-written driver function),
whereas the PEG parser generator doesn't do much other than generating
code. Also, the way it handles parsing descriptions of grammar is
really elegant - it first defines its code generator using an
s-expression representation of grammars, then uses an s-expression
representation of PEG grammar descriptions to parse PEG grammars
written as strings.

The only thing that seemed really weird is the macro safe-bind, which
basically reimplements hygienic macros. It's used all over the place
in the function-generating code. I also saw a couple of lines that
were too long, but not many, and the whole thing could be split into a
few modules, but it's not that long the way it is, so neither of those
seem like big issues. It could also stand to have a bigger test suite
(as the test suite itself says).

In summary, I think the code is good and functional, and should be
merged into Guile. I think it would also be very useful to have a PEG
parser generator.

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-13 21:29 PEG Parser Noah Lavine
@ 2011-01-17 21:32 ` Ludovic Courtès
  2011-01-21 15:23   ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Ludovic Courtès @ 2011-01-17 21:32 UTC (permalink / raw)
  To: guile-devel

Hi!

Thanks for looking into this!  Perhaps Andy has some insight to share,
too?

Noah Lavine <noah.b.lavine@gmail.com> writes:

> First of all, it passes its test suite. This is pretty good, because
> its test suite includes a grammar for itself, a grammar for basic
> mathematical equations, and a grammar for parsing /etc/passwd files.
> These all seem like good examples of things you'd want to parse.

While these are nice, small unit tests targeting specific parts of PEG
would seem useful to me, as in the rest of Guile.  This may not be
blocking, though.

[...]

> The only thing that seemed really weird is the macro safe-bind, which
> basically reimplements hygienic macros. It's used all over the place
> in the function-generating code. I also saw a couple of lines that
> were too long, but not many, and the whole thing could be split into a
> few modules, but it's not that long the way it is, so neither of those
> seem like big issues. It could also stand to have a bigger test suite
> (as the test suite itself says).

“./check-guile --coverage peg.test” can be used to measure code
coverage, normally.

I agree that PEG could be useful.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-17 21:32 ` Ludovic Courtès
@ 2011-01-21 15:23   ` Noah Lavine
  2011-01-22 21:02     ` Ludovic Courtès
  0 siblings, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-21 15:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hello,

> “./check-guile --coverage peg.test” can be used to measure code
> coverage, normally.

I tried running that, but what all I got was this output:

Running peg.test

Totals for this test run:
passes:                 21
failures:               0
unexpected passes:      0
expected failures:      0
unresolved test cases:  0
untested test cases:    0
unsupported test cases: 0
errors:                 0

That seems incorrect, but I don't know anything about check-guile.

I can probably make smaller unit tests too if you'd like. I'll need a
little bit more time looking at it to understand what to test.

I also have a process question - how can I send patches for the peg
stuff that don't erase mlucy's contributions? If I just sent a diff
between the mainline and myself, it would seem to all come from me. (I
just started a peg branch on my machine.)

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-21 15:23   ` Noah Lavine
@ 2011-01-22 21:02     ` Ludovic Courtès
  2011-01-24  1:29       ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Ludovic Courtès @ 2011-01-22 21:02 UTC (permalink / raw)
  To: Noah Lavine; +Cc: guile-devel

Hello!

Noah Lavine <noah.b.lavine@gmail.com> writes:

>> “./check-guile --coverage peg.test” can be used to measure code
>> coverage, normally.
>
> I tried running that, but what all I got was this output:
>
> Running peg.test
>
> Totals for this test run:
> passes:                 21
> failures:               0
> unexpected passes:      0
> expected failures:      0
> unresolved test cases:  0
> untested test cases:    0
> unsupported test cases: 0
> errors:                 0
>
> That seems incorrect, but I don't know anything about check-guile.

It should have produced $top_builddir/guile.info, which can be used as
input to LCOV to generate an HTML code coverage report
(http://ltp.sourceforge.net/coverage/lcov.php).

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-22 21:02     ` Ludovic Courtès
@ 2011-01-24  1:29       ` Noah Lavine
  2011-01-24 20:55         ` Ludovic Courtès
  2011-01-27  1:40         ` Noah Lavine
  0 siblings, 2 replies; 27+ messages in thread
From: Noah Lavine @ 2011-01-24  1:29 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hello all,

> It should have produced $top_builddir/guile.info, which can be used as
> input to LCOV to generate an HTML code coverage report
> (http://ltp.sourceforge.net/coverage/lcov.php).

Oh, that worked. The current tests check 92.6% of the lines in
peg.scm, and 90.7% of the functions. I looked through lcov's HTML
guide, and it looks like what the tests miss is almost all
error-handling code. However, I must say that the HTML output looked a
bit suspicious - for instance, there were places where the first line
of a function was marked as hit, but the second line was not.

On another note, I looked at the PEG documentation, and it was quite good.

When I merged the 'mlucy' branch into Guile mainline, the merge went
almost cleanly - the only issues were a page of links in Guile's
documentation, which was a two-line issue, and ice-9/psyntax-pp.scm,
which I fixed by choosing the mainline's version and had no problems.

Given this, what are the issues blocking PEG being merged (if there
are any)? I'd like to work on them.

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-24  1:29       ` Noah Lavine
@ 2011-01-24 20:55         ` Ludovic Courtès
  2011-01-27  1:40         ` Noah Lavine
  1 sibling, 0 replies; 27+ messages in thread
From: Ludovic Courtès @ 2011-01-24 20:55 UTC (permalink / raw)
  To: guile-devel

Hi,

Noah Lavine <noah.b.lavine@gmail.com> writes:

>> It should have produced $top_builddir/guile.info, which can be used as
>> input to LCOV to generate an HTML code coverage report
>> (http://ltp.sourceforge.net/coverage/lcov.php).
>
> Oh, that worked. The current tests check 92.6% of the lines in
> peg.scm, and 90.7% of the functions. I looked through lcov's HTML
> guide, and it looks like what the tests miss is almost all
> error-handling code. However, I must say that the HTML output looked a
> bit suspicious - for instance, there were places where the first line
> of a function was marked as hit, but the second line was not.

Often, if a function is defined but unused, the ‘define’ line will be
marked as hit whereas the rest of the function will remain red–but
that’s OK since strictly speaking the ‘define’ line /is/ executed.

However, it could be that the code coverage code is buggy.  If you find
obvious errors and can reduce them, I’d be happy to look into it.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-24  1:29       ` Noah Lavine
  2011-01-24 20:55         ` Ludovic Courtès
@ 2011-01-27  1:40         ` Noah Lavine
  2011-01-27  2:23           ` Michael Lucy
  1 sibling, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-27  1:40 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2613 bytes --]

Hello again,

I've attached my coverage results. The html file expects the css file
to be in the same directory. If you look at the html file, you'll see
that almost all of peg.scm is hit by the tests.

As far as I can tell, the two functions that are not tested are
keyword-flatten (line 512) and peg-string-compile (line 713). I looked
at these, but I don't understand what either of them does well enough
to test them yet.

Other than that, some error code is not hit, and the ends of some cond
clauses. These should be tested more, but I need to understand the
code more to know what will test them. There are also a few lines that
I find suspicious, in particular lines 39, 134-136, 146, 157, 165,
506-508, and 649. (Ludovic - sorry I haven't isolated test cases. I'm
just pointing these out now to show that possibly the test suite tests
more than the coverage makes it appear. In the future I might be able
to isolate the issues.)

Lines 14-15 look to me like a function that was used for debugging and
now serves no purpose.

Let me give a few more overall thoughts on peg.scm, after working with
it for a few more days. It looks like good code, but the documentation
isn't great. It took me several read-throughs to figure out what some
of it did, and I'm still not sure about those two functions that don't
have tests. (Although they are a small part of the overall module.)
I'm not sure what this means about its fitness to merge.

Noah

On Sun, Jan 23, 2011 at 8:29 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello all,
>
>> It should have produced $top_builddir/guile.info, which can be used as
>> input to LCOV to generate an HTML code coverage report
>> (http://ltp.sourceforge.net/coverage/lcov.php).
>
> Oh, that worked. The current tests check 92.6% of the lines in
> peg.scm, and 90.7% of the functions. I looked through lcov's HTML
> guide, and it looks like what the tests miss is almost all
> error-handling code. However, I must say that the HTML output looked a
> bit suspicious - for instance, there were places where the first line
> of a function was marked as hit, but the second line was not.
>
> On another note, I looked at the PEG documentation, and it was quite good.
>
> When I merged the 'mlucy' branch into Guile mainline, the merge went
> almost cleanly - the only issues were a page of links in Guile's
> documentation, which was a two-line issue, and ice-9/psyntax-pp.scm,
> which I fixed by choosing the mainline's version and had no problems.
>
> Given this, what are the issues blocking PEG being merged (if there
> are any)? I'd like to work on them.
>
> Noah
>

[-- Attachment #2: peg.scm.gcov.html --]
[-- Type: text/html, Size: 93518 bytes --]

[-- Attachment #3: gcov.css --]
[-- Type: text/css, Size: 9893 bytes --]

/* All views: initial background and text color */
body
{
  color: #000000;
  background-color: #FFFFFF;
}

/* All views: standard link format*/
a:link
{
  color: #284FA8;
  text-decoration: underline;
}

/* All views: standard link - visited format */
a:visited
{
  color: #00CB40;
  text-decoration: underline;
}

/* All views: standard link - activated format */
a:active
{
  color: #FF0040;
  text-decoration: underline;
}

/* All views: main title format */
td.title
{
  text-align: center;
  padding-bottom: 10px;
  font-family: sans-serif;
  font-size: 20pt;
  font-style: italic;
  font-weight: bold;
}

/* All views: header item format */
td.headerItem
{
  text-align: right;
  padding-right: 6px;
  font-family: sans-serif;
  font-weight: bold;
  vertical-align: top;
  white-space: nowrap;
}

/* All views: header item value format */
td.headerValue
{
  text-align: left;
  color: #284FA8;
  font-family: sans-serif;
  font-weight: bold;
  white-space: nowrap;
}

/* All views: header item coverage table heading */
td.headerCovTableHead
{
  text-align: center;
  padding-right: 6px;
  padding-left: 6px;
  padding-bottom: 0px;
  font-family: sans-serif;
  font-size: 80%;
  white-space: nowrap;
}

/* All views: header item coverage table entry */
td.headerCovTableEntry
{
  text-align: right;
  color: #284FA8;
  font-family: sans-serif;
  font-weight: bold;
  white-space: nowrap;
  padding-left: 12px;
  padding-right: 4px;
  background-color: #DAE7FE;
}

/* All views: header item coverage table entry for high coverage rate */
td.headerCovTableEntryHi
{
  text-align: right;
  color: #000000;
  font-family: sans-serif;
  font-weight: bold;
  white-space: nowrap;
  padding-left: 12px;
  padding-right: 4px;
  background-color: #A7FC9D;
}

/* All views: header item coverage table entry for medium coverage rate */
td.headerCovTableEntryMed
{
  text-align: right;
  color: #000000;
  font-family: sans-serif;
  font-weight: bold;
  white-space: nowrap;
  padding-left: 12px;
  padding-right: 4px;
  background-color: #FFEA20;
}

/* All views: header item coverage table entry for ow coverage rate */
td.headerCovTableEntryLo
{
  text-align: right;
  color: #000000;
  font-family: sans-serif;
  font-weight: bold;
  white-space: nowrap;
  padding-left: 12px;
  padding-right: 4px;
  background-color: #FF0000;
}

/* All views: header legend value for legend entry */
td.headerValueLeg
{
  text-align: left;
  color: #000000;
  font-family: sans-serif;
  font-size: 80%;
  white-space: nowrap;
  padding-top: 4px;
}

/* All views: color of horizontal ruler */
td.ruler
{
  background-color: #6688D4;
}

/* All views: version string format */
td.versionInfo
{
  text-align: center;
  padding-top: 2px;
  font-family: sans-serif;
  font-style: italic;
}

/* Directory view/File view (all)/Test case descriptions:
   table headline format */
td.tableHead
{
  text-align: center;
  color: #FFFFFF;
  background-color: #6688D4;
  font-family: sans-serif;
  font-size: 120%;
  font-weight: bold;
  white-space: nowrap;
  padding-left: 4px;
  padding-right: 4px;
}

span.tableHeadSort
{
  padding-right: 4px;
}

/* Directory view/File view (all): filename entry format */
td.coverFile
{
  text-align: left;
  padding-left: 10px;
  padding-right: 20px; 
  color: #284FA8;
  background-color: #DAE7FE;
  font-family: monospace;
}

/* Directory view/File view (all): bar-graph entry format*/
td.coverBar
{
  padding-left: 10px;
  padding-right: 10px;
  background-color: #DAE7FE;
}

/* Directory view/File view (all): bar-graph outline color */
td.coverBarOutline
{
  background-color: #000000;
}

/* Directory view/File view (all): percentage entry for files with
   high coverage rate */
td.coverPerHi
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #A7FC9D;
  font-weight: bold;
  font-family: sans-serif;
}

/* Directory view/File view (all): line count entry for files with
   high coverage rate */
td.coverNumHi
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #A7FC9D;
  white-space: nowrap;
  font-family: sans-serif;
}

/* Directory view/File view (all): percentage entry for files with
   medium coverage rate */
td.coverPerMed
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #FFEA20;
  font-weight: bold;
  font-family: sans-serif;
}

/* Directory view/File view (all): line count entry for files with
   medium coverage rate */
td.coverNumMed
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #FFEA20;
  white-space: nowrap;
  font-family: sans-serif;
}

/* Directory view/File view (all): percentage entry for files with
   low coverage rate */
td.coverPerLo
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #FF0000;
  font-weight: bold;
  font-family: sans-serif;
}

/* Directory view/File view (all): line count entry for files with
   low coverage rate */
td.coverNumLo
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #FF0000;
  white-space: nowrap;
  font-family: sans-serif;
}

/* File view (all): "show/hide details" link format */
a.detail:link
{
  color: #B8D0FF;
  font-size:80%;
}

/* File view (all): "show/hide details" link - visited format */
a.detail:visited
{
  color: #B8D0FF;
  font-size:80%;
}

/* File view (all): "show/hide details" link - activated format */
a.detail:active
{
  color: #FFFFFF;
  font-size:80%;
}

/* File view (detail): test name entry */
td.testName
{
  text-align: right;
  padding-right: 10px;
  background-color: #DAE7FE;
  font-family: sans-serif;
}

/* File view (detail): test percentage entry */
td.testPer
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px; 
  background-color: #DAE7FE;
  font-family: sans-serif;
}

/* File view (detail): test lines count entry */
td.testNum
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px; 
  background-color: #DAE7FE;
  font-family: sans-serif;
}

/* Test case descriptions: test name format*/
dt
{
  font-family: sans-serif;
  font-weight: bold;
}

/* Test case descriptions: description table body */
td.testDescription
{
  padding-top: 10px;
  padding-left: 30px;
  padding-bottom: 10px;
  padding-right: 30px;
  background-color: #DAE7FE;
}

/* Source code view: function entry */
td.coverFn
{
  text-align: left;
  padding-left: 10px;
  padding-right: 20px; 
  color: #284FA8;
  background-color: #DAE7FE;
  font-family: monospace;
}

/* Source code view: function entry zero count*/
td.coverFnLo
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #FF0000;
  font-weight: bold;
  font-family: sans-serif;
}

/* Source code view: function entry nonzero count*/
td.coverFnHi
{
  text-align: right;
  padding-left: 10px;
  padding-right: 10px;
  background-color: #DAE7FE;
  font-weight: bold;
  font-family: sans-serif;
}

/* Source code view: source code format */
pre.source
{
  font-family: monospace;
  white-space: pre;
  margin-top: 2px;
}

/* Source code view: line number format */
span.lineNum
{
  background-color: #EFE383;
}

/* Source code view: format for lines which were executed */
td.lineCov,
span.lineCov
{
  background-color: #CAD7FE;
}

/* Source code view: format for Cov legend */
span.coverLegendCov
{
  padding-left: 10px;
  padding-right: 10px;
  padding-bottom: 2px;
  background-color: #CAD7FE;
}

/* Source code view: format for lines which were not executed */
td.lineNoCov,
span.lineNoCov
{
  background-color: #FF6230;
}

/* Source code view: format for NoCov legend */
span.coverLegendNoCov
{
  padding-left: 10px;
  padding-right: 10px;
  padding-bottom: 2px;
  background-color: #FF6230;
}

/* Source code view (function table): standard link - visited format */
td.lineNoCov > a:visited,
td.lineCov > a:visited
{  
  color: black;
  text-decoration: underline;
}  

/* Source code view: format for lines which were executed only in a
   previous version */
span.lineDiffCov
{
  background-color: #B5F7AF;
}

/* Source code view: format for branches which were executed
 * and taken */
span.branchCov
{
  background-color: #CAD7FE;
}

/* Source code view: format for branches which were executed
 * but not taken */
span.branchNoCov
{
  background-color: #FF6230;
}

/* Source code view: format for branches which were not executed */
span.branchNoExec
{
  background-color: #FF6230;
}

/* Source code view: format for the source code heading line */
pre.sourceHeading
{
  white-space: pre;
  font-family: monospace;
  font-weight: bold;
  margin: 0px;
}

/* All views: header legend value for low rate */
td.headerValueLegL
{
  font-family: sans-serif;
  text-align: center;
  white-space: nowrap;
  padding-left: 4px;
  padding-right: 2px;
  background-color: #FF0000;
  font-size: 80%;
}

/* All views: header legend value for med rate */
td.headerValueLegM
{
  font-family: sans-serif;
  text-align: center;
  white-space: nowrap;
  padding-left: 2px;
  padding-right: 2px;
  background-color: #FFEA20;
  font-size: 80%;
}

/* All views: header legend value for hi rate */
td.headerValueLegH
{
  font-family: sans-serif;
  text-align: center;
  white-space: nowrap;
  padding-left: 2px;
  padding-right: 4px;
  background-color: #A7FC9D;
  font-size: 80%;
}

/* All views except source code view: legend format for low coverage */
span.coverLegendCovLo
{
  padding-left: 10px;
  padding-right: 10px;
  padding-top: 2px;
  background-color: #FF0000;
}

/* All views except source code view: legend format for med coverage */
span.coverLegendCovMed
{
  padding-left: 10px;
  padding-right: 10px;
  padding-top: 2px;
  background-color: #FFEA20;
}

/* All views except source code view: legend format for hi coverage */
span.coverLegendCovHi
{
  padding-left: 10px;
  padding-right: 10px;
  padding-top: 2px;
  background-color: #A7FC9D;
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  1:40         ` Noah Lavine
@ 2011-01-27  2:23           ` Michael Lucy
  2011-01-27  2:38             ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Lucy @ 2011-01-27  2:23 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Ludovic Courtès, guile-devel

Hi,

I'm the guy that originally wrote this for GSOC, so I figured I'd jump
in.  I'd be happy to help with getting the PEG module merge-ready.

On Wed, Jan 26, 2011 at 7:40 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello again,
>
> I've attached my coverage results. The html file expects the css file
> to be in the same directory. If you look at the html file, you'll see
> that almost all of peg.scm is hit by the tests.
>
> As far as I can tell, the two functions that are not tested are
> keyword-flatten (line 512) and peg-string-compile (line 713). I looked
> at these, but I don't understand what either of them does well enough
> to test them yet.

keyword-flatten is described in api-peg.texi.  It's basically a
special case of context-flatten which collapses S-expressions
according to the symbol they start with.  From the documentation:
@deffn {Scheme Procedure} keyword-flatten terms lst
A less general form of @code{context-flatten}. Takes a list of
terminal atoms @code{terms} and flattens @var{lst} until all elements
are either atoms, or lists which have an atom from @code{terms} as
their first element.
@lisp
(keyword-flatten '(a b) '(c a b (a c) (b c) (c (b a) (c a)))) @result{}
(c a b (a c) (b c) c (b a) c a)
@end lisp

peg-string-compile is a function that will compile PEGs expressed as
strings into lambda expressions.  It does this by first parsing the
string using the PEG-parsing-PEG, then turning the output into an
S-expression representation of a PEG, compressing it, and passing the
compressed S-expression to peg-sexp-compile.  It's used if you e.g.
call peg-match with a string instead of an S-expression as the first
argument.

>
> Other than that, some error code is not hit, and the ends of some cond
> clauses. These should be tested more, but I need to understand the
> code more to know what will test them. There are also a few lines that
> I find suspicious, in particular lines 39, 134-136, 146, 157, 165,
> 506-508, and 649. (Ludovic - sorry I haven't isolated test cases. I'm
> just pointing these out now to show that possibly the test suite tests
> more than the coverage makes it appear. In the future I might be able
> to isolate the issues.)
>
> Lines 14-15 look to me like a function that was used for debugging and
> now serves no purpose.
>
> Let me give a few more overall thoughts on peg.scm, after working with
> it for a few more days. It looks like good code, but the documentation
> isn't great. It took me several read-throughs to figure out what some
> of it did, and I'm still not sure about those two functions that don't
> have tests. (Although they are a small part of the overall module.)
> I'm not sure what this means about its fitness to merge.

If you give me some more details on the parts that are weakest I'll
try and beef up the documentation.

>
> Noah
>
> On Sun, Jan 23, 2011 at 8:29 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
>> Hello all,
>>
>>> It should have produced $top_builddir/guile.info, which can be used as
>>> input to LCOV to generate an HTML code coverage report
>>> (http://ltp.sourceforge.net/coverage/lcov.php).
>>
>> Oh, that worked. The current tests check 92.6% of the lines in
>> peg.scm, and 90.7% of the functions. I looked through lcov's HTML
>> guide, and it looks like what the tests miss is almost all
>> error-handling code. However, I must say that the HTML output looked a
>> bit suspicious - for instance, there were places where the first line
>> of a function was marked as hit, but the second line was not.
>>
>> On another note, I looked at the PEG documentation, and it was quite good.
>>
>> When I merged the 'mlucy' branch into Guile mainline, the merge went
>> almost cleanly - the only issues were a page of links in Guile's
>> documentation, which was a two-line issue, and ice-9/psyntax-pp.scm,
>> which I fixed by choosing the mainline's version and had no problems.
>>
>> Given this, what are the issues blocking PEG being merged (if there
>> are any)? I'd like to work on them.
>>
>> Noah
>>
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  2:23           ` Michael Lucy
@ 2011-01-27  2:38             ` Noah Lavine
  2011-01-27  3:02               ` Michael Lucy
  0 siblings, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-27  2:38 UTC (permalink / raw)
  To: Michael Lucy; +Cc: Ludovic Courtès, guile-devel

Hello,

> I'm the guy that originally wrote this for GSOC, so I figured I'd jump
> in.  I'd be happy to help with getting the PEG module merge-ready.

Great!

> keyword-flatten is described in api-peg.texi.  It's basically a
> special case of context-flatten which collapses S-expressions
> according to the symbol they start with.  From the documentation:
> @deffn {Scheme Procedure} keyword-flatten terms lst
> A less general form of @code{context-flatten}. Takes a list of
> terminal atoms @code{terms} and flattens @var{lst} until all elements
> are either atoms, or lists which have an atom from @code{terms} as
> their first element.
> @lisp
> (keyword-flatten '(a b) '(c a b (a c) (b c) (c (b a) (c a)))) @result{}
> (c a b (a c) (b c) c (b a) c a)
> @end lisp

Okay. I was confused about the behavior of keyword-flatten when the
car of its argument is not in its list of keywords. I expected that to
change, but it didn't. Looking at it now, is it true that the car of
the argument never changes? So keyword-flatten flattens the sublists,
but has to leave the overall one alone?

> peg-string-compile is a function that will compile PEGs expressed as
> strings into lambda expressions.  It does this by first parsing the
> string using the PEG-parsing-PEG, then turning the output into an
> S-expression representation of a PEG, compressing it, and passing the
> compressed S-expression to peg-sexp-compile.  It's used if you e.g.
> call peg-match with a string instead of an S-expression as the first
> argument.

I tried running peg-string-compile and then passing the result to
eval, but I got an error because there was no symbol 'Begin. (capital
B intended) Do you know what's going on with that? It might be
something weird involving memoized symbols. Also, just to make sure I
get it - this function produces Scheme code, right? (As opposed to
Tree-IL or any of Guile's other languages.)

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  2:38             ` Noah Lavine
@ 2011-01-27  3:02               ` Michael Lucy
  2011-01-27  5:17                 ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Lucy @ 2011-01-27  3:02 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Ludovic Courtès, guile-devel

On Wed, Jan 26, 2011 at 8:38 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello,
>
>> I'm the guy that originally wrote this for GSOC, so I figured I'd jump
>> in.  I'd be happy to help with getting the PEG module merge-ready.
>
> Great!
>
>> keyword-flatten is described in api-peg.texi.  It's basically a
>> special case of context-flatten which collapses S-expressions
>> according to the symbol they start with.  From the documentation:
>> @deffn {Scheme Procedure} keyword-flatten terms lst
>> A less general form of @code{context-flatten}. Takes a list of
>> terminal atoms @code{terms} and flattens @var{lst} until all elements
>> are either atoms, or lists which have an atom from @code{terms} as
>> their first element.
>> @lisp
>> (keyword-flatten '(a b) '(c a b (a c) (b c) (c (b a) (c a)))) @result{}
>> (c a b (a c) (b c) c (b a) c a)
>> @end lisp
>
> Okay. I was confused about the behavior of keyword-flatten when the
> car of its argument is not in its list of keywords. I expected that to
> change, but it didn't. Looking at it now, is it true that the car of
> the argument never changes? So keyword-flatten flattens the sublists,
> but has to leave the overall one alone?

Yeah, the overall list is always there; only the sublists will ever be
collapsed down.  But the car can change.

It flattens until you have a list where each element either:
1. Is an atom.
or 2. Is a list whose first element is in the list of keywords.

So the car of the argument will change if it's a list that doesn't
start with the right keyword.  E.g.:
(keyword-flatten '(a) '((c (a b)) (a b) (b a))) ->
(c (a b) (a b) b a)

>
>> peg-string-compile is a function that will compile PEGs expressed as
>> strings into lambda expressions.  It does this by first parsing the
>> string using the PEG-parsing-PEG, then turning the output into an
>> S-expression representation of a PEG, compressing it, and passing the
>> compressed S-expression to peg-sexp-compile.  It's used if you e.g.
>> call peg-match with a string instead of an S-expression as the first
>> argument.
>
> I tried running peg-string-compile and then passing the result to
> eval, but I got an error because there was no symbol 'Begin. (capital
> B intended) Do you know what's going on with that? It might be
> something weird involving memoized symbols. Also, just to make sure I
> get it - this function produces Scheme code, right? (As opposed to
> Tree-IL or any of Guile's other languages.)

That's odd.  Could you paste in the lambda expression it generates?

(Yeah, it produces Scheme code.)

>
> Noah
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  3:02               ` Michael Lucy
@ 2011-01-27  5:17                 ` Noah Lavine
  2011-01-28  3:25                   ` Noah Lavine
  2011-01-28 15:48                   ` Andy Wingo
  0 siblings, 2 replies; 27+ messages in thread
From: Noah Lavine @ 2011-01-27  5:17 UTC (permalink / raw)
  To: Michael Lucy; +Cc: Ludovic Courtès, guile-devel

Hi,

> It flattens until you have a list where each element either:
> 1. Is an atom.
> or 2. Is a list whose first element is in the list of keywords.
>
> So the car of the argument will change if it's a list that doesn't
> start with the right keyword.  E.g.:
> (keyword-flatten '(a) '((c (a b)) (a b) (b a))) ->
> (c (a b) (a b) b a)

I see.

> That's odd.  Could you paste in the lambda expression it generates?

Wait, actually, I realized it was my own error. My test function was
using nonterminals, but I had only done (define grammar-string ....),
not (define-grammar grammar-string). I think the generated code tried
to call the nonterminals that should have been there, and threw an
error when it didn't find them.

The peg matcher is really awesome. I am glad to be able to use it soon.

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  5:17                 ` Noah Lavine
@ 2011-01-28  3:25                   ` Noah Lavine
  2011-01-28  5:13                     ` Michael Lucy
  2011-01-28 15:48                   ` Andy Wingo
  1 sibling, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-28  3:25 UTC (permalink / raw)
  To: Michael Lucy; +Cc: Ludovic Courtès, guile-devel

Hello again,

I've run into another issue with the parser that should be addressed
in the documentation (or the code?).

How do you match a character that is used in the PEG grammar, like
"["? Can you only define such matchers via s-expressions?

Noah

On Thu, Jan 27, 2011 at 12:17 AM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hi,
>
>> It flattens until you have a list where each element either:
>> 1. Is an atom.
>> or 2. Is a list whose first element is in the list of keywords.
>>
>> So the car of the argument will change if it's a list that doesn't
>> start with the right keyword.  E.g.:
>> (keyword-flatten '(a) '((c (a b)) (a b) (b a))) ->
>> (c (a b) (a b) b a)
>
> I see.
>
>> That's odd.  Could you paste in the lambda expression it generates?
>
> Wait, actually, I realized it was my own error. My test function was
> using nonterminals, but I had only done (define grammar-string ....),
> not (define-grammar grammar-string). I think the generated code tried
> to call the nonterminals that should have been there, and threw an
> error when it didn't find them.
>
> The peg matcher is really awesome. I am glad to be able to use it soon.
>
> Noah
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-28  3:25                   ` Noah Lavine
@ 2011-01-28  5:13                     ` Michael Lucy
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Lucy @ 2011-01-28  5:13 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Ludovic Courtès, guile-devel

Putting it in single quotes should work.  E.g. " '[' ".  Similarly,
you can add a literal single quote by making it part of a character
class.  E.g. " ['] ".

You're right, I should definitely add something about that to the documentation.

Any other weak parts you've noticed while perusing?

On Thu, Jan 27, 2011 at 9:25 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello again,
>
> I've run into another issue with the parser that should be addressed
> in the documentation (or the code?).
>
> How do you match a character that is used in the PEG grammar, like
> "["? Can you only define such matchers via s-expressions?
>
> Noah
>
> On Thu, Jan 27, 2011 at 12:17 AM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
>> Hi,
>>
>>> It flattens until you have a list where each element either:
>>> 1. Is an atom.
>>> or 2. Is a list whose first element is in the list of keywords.
>>>
>>> So the car of the argument will change if it's a list that doesn't
>>> start with the right keyword.  E.g.:
>>> (keyword-flatten '(a) '((c (a b)) (a b) (b a))) ->
>>> (c (a b) (a b) b a)
>>
>> I see.
>>
>>> That's odd.  Could you paste in the lambda expression it generates?
>>
>> Wait, actually, I realized it was my own error. My test function was
>> using nonterminals, but I had only done (define grammar-string ....),
>> not (define-grammar grammar-string). I think the generated code tried
>> to call the nonterminals that should have been there, and threw an
>> error when it didn't find them.
>>
>> The peg matcher is really awesome. I am glad to be able to use it soon.
>>
>> Noah
>>
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-27  5:17                 ` Noah Lavine
  2011-01-28  3:25                   ` Noah Lavine
@ 2011-01-28 15:48                   ` Andy Wingo
  2011-01-29  3:07                     ` Noah Lavine
  1 sibling, 1 reply; 27+ messages in thread
From: Andy Wingo @ 2011-01-28 15:48 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

Heya,

On Thu 27 Jan 2011 06:17, Noah Lavine <noah.b.lavine@gmail.com> writes:

> The peg matcher is really awesome. I am glad to be able to use it soon.

Indeed!  I'm looking forward to having it in Guile.

The reason I didn't merge it yet was twofold: (1) the commit logs were
not in the standard style, and (2) I wasn't comfortable with the macro
binding stuff.  (2) was sufficiently big that I didn't have time to fix
it up.  My apologies for stalling on this...  But maybe now is the time
to look at it again!

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-28 15:48                   ` Andy Wingo
@ 2011-01-29  3:07                     ` Noah Lavine
  2011-01-29  4:15                       ` Michael Lucy
  2011-01-29 11:33                       ` Andy Wingo
  0 siblings, 2 replies; 27+ messages in thread
From: Noah Lavine @ 2011-01-29  3:07 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

Hello,

> Indeed!  I'm looking forward to having it in Guile.

Great! I don't intend to be pressuring you to merge it, by the way. I
hope my message didn't come across that way.

> The reason I didn't merge it yet was twofold: (1) the commit logs were
> not in the standard style, and (2) I wasn't comfortable with the macro
> binding stuff.  (2) was sufficiently big that I didn't have time to fix
> it up.  My apologies for stalling on this...  But maybe now is the time
> to look at it again!

Yes, the binding stuff actually seemed very weird to me too, mostly
because it's very non-idiomatic. More specifically, the peg module
does most of its code generation in regular functions rather than
macros. safe-bind implements hygiene for these code-generating
functions. It's basically what you'd get with defmacro.

I eventually came to the conclusion, however, that it's really a
symptom of a larger issue.  Most likely it would be possible to take
all of the functions that use safe-bind and rewrite them as macros
themselves, using either syntax-case or syntax-rules. However, I am
afraid that the result might be incredibly ugly. In order to make the
code more readable, we use normal functions for code generation and
provide hygiene through safe-bind.

So the larger issue is that Scheme needs nicer-looking ways to define
macros. If it had those, we wouldn't be using hacks like safe-bind.
But we don't, and that's a huge issue to tackle.

Those are my thoughts, but I would be quite curious to know if you (or
anyone else) think the same way, or think something different is going
on.

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-29  3:07                     ` Noah Lavine
@ 2011-01-29  4:15                       ` Michael Lucy
  2011-01-29 11:34                         ` Andy Wingo
  2011-01-29 11:33                       ` Andy Wingo
  1 sibling, 1 reply; 27+ messages in thread
From: Michael Lucy @ 2011-01-29  4:15 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Andy Wingo, Ludovic Courtès, guile-devel

On Fri, Jan 28, 2011 at 9:07 PM, Noah Lavine <noah.b.lavine@gmail.com> wrote:
> Hello,
>
>> Indeed!  I'm looking forward to having it in Guile.
>
> Great! I don't intend to be pressuring you to merge it, by the way. I
> hope my message didn't come across that way.
>
>> The reason I didn't merge it yet was twofold: (1) the commit logs were
>> not in the standard style, and (2) I wasn't comfortable with the macro
>> binding stuff.  (2) was sufficiently big that I didn't have time to fix
>> it up.  My apologies for stalling on this...  But maybe now is the time
>> to look at it again!
>
> Yes, the binding stuff actually seemed very weird to me too, mostly
> because it's very non-idiomatic. More specifically, the peg module
> does most of its code generation in regular functions rather than
> macros. safe-bind implements hygiene for these code-generating
> functions. It's basically what you'd get with defmacro.
>
> I eventually came to the conclusion, however, that it's really a
> symptom of a larger issue.  Most likely it would be possible to take
> all of the functions that use safe-bind and rewrite them as macros
> themselves, using either syntax-case or syntax-rules. However, I am
> afraid that the result might be incredibly ugly. In order to make the
> code more readable, we use normal functions for code generation and
> provide hygiene through safe-bind.
>
> So the larger issue is that Scheme needs nicer-looking ways to define
> macros. If it had those, we wouldn't be using hacks like safe-bind.
> But we don't, and that's a huge issue to tackle.
>
> Those are my thoughts, but I would be quite curious to know if you (or
> anyone else) think the same way, or think something different is going
> on.

So, that's sort of a complicated story.  The safe-bind thing is a
standard idiom in the Common Lisp world (although it goes by many
names).  The most immediate reason it's there is that I was struggling
a bit with the hygienic macro system and fell back on what I knew,
then ran out of time trying to turn everything into a syntax-case
macro before the end of GSOC (which is why the data gets turned into
syntax at several points near the end of the process).

If we're talking about root causes, I think you're about right, at
least in my case.  It's pretty standard in the CL world to write large
macros by breaking the code out into functions, because it's
considered easier to reason about functions than macros.  I remember
Andy told me that functions can generate syntax, and I tried that, but
I had a remarkably hard time making it actually work correctly.  Part
of the problem was probably that I never really understood how syntax
worked, whereas I understand very well how lists work.

Also, macros are notoriously difficult to debug, especially when
they're generating several hundred lines of code that compiles fine
but mysteriously produces the wrong result after a seemingly trivial
change.  So porting them from list generation to syntax generation
turned out to be harder in practice than I had thought it would be.

>
> Noah
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-29  3:07                     ` Noah Lavine
  2011-01-29  4:15                       ` Michael Lucy
@ 2011-01-29 11:33                       ` Andy Wingo
  1 sibling, 0 replies; 27+ messages in thread
From: Andy Wingo @ 2011-01-29 11:33 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

On Sat 29 Jan 2011 04:07, Noah Lavine <noah.b.lavine@gmail.com> writes:

> Yes, the binding stuff actually seemed very weird to me too, mostly
> because it's very non-idiomatic. More specifically, the peg module
> does most of its code generation in regular functions rather than
> macros. safe-bind implements hygiene for these code-generating
> functions. It's basically what you'd get with defmacro.

It's perfectly fine for regular functions to use `syntax' or
`quasisyntax' to build up a result, in the hygienic context.

Syntax transformers are just functions that transform syntax to syntax.
See the manual for more.  I recommend reading the entire "Macros"
section.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-29  4:15                       ` Michael Lucy
@ 2011-01-29 11:34                         ` Andy Wingo
  2011-01-29 19:37                           ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Wingo @ 2011-01-29 11:34 UTC (permalink / raw)
  To: Michael Lucy; +Cc: Ludovic Courtès, guile-devel

On Sat 29 Jan 2011 05:15, Michael Lucy <MichaelGLucy@Gmail.com> writes:

> Also, macros are notoriously difficult to debug, especially when
> they're generating several hundred lines of code that compiles fine
> but mysteriously produces the wrong result after a seemingly trivial
> change.  So porting them from list generation to syntax generation
> turned out to be harder in practice than I had thought it would be.

Hey no worries, you got the thing working in the first place.  We'll see
about porting it to use hygienic expansion :)

Cheers,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-29 11:34                         ` Andy Wingo
@ 2011-01-29 19:37                           ` Noah Lavine
  2011-01-30 11:43                             ` Andy Wingo
  0 siblings, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-01-29 19:37 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

Hello all,

Sorry for the late notice, but I'm about halfway through porting
peg.scm to use hygienic macros, so if anyone else was thinking of
doing it, you might want to save your effort for something else. Or
email me and I'll send you what I have.

I hope no one else has already started.

Noah

On Sat, Jan 29, 2011 at 6:34 AM, Andy Wingo <wingo@pobox.com> wrote:
> On Sat 29 Jan 2011 05:15, Michael Lucy <MichaelGLucy@Gmail.com> writes:
>
>> Also, macros are notoriously difficult to debug, especially when
>> they're generating several hundred lines of code that compiles fine
>> but mysteriously produces the wrong result after a seemingly trivial
>> change.  So porting them from list generation to syntax generation
>> turned out to be harder in practice than I had thought it would be.
>
> Hey no worries, you got the thing working in the first place.  We'll see
> about porting it to use hygienic expansion :)
>
> Cheers,
>
> Andy
> --
> http://wingolog.org/
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-29 19:37                           ` Noah Lavine
@ 2011-01-30 11:43                             ` Andy Wingo
  2011-02-02  0:26                               ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Wingo @ 2011-01-30 11:43 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

On Sat 29 Jan 2011 20:37, Noah Lavine <noah.b.lavine@gmail.com> writes:

> I'm about halfway through porting peg.scm to use hygienic macros

Super!  I'll work on other bugs then :)

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-01-30 11:43                             ` Andy Wingo
@ 2011-02-02  0:26                               ` Noah Lavine
  2011-02-06 15:31                                 ` Noah Lavine
  2011-02-18 22:03                                 ` Andy Wingo
  0 siblings, 2 replies; 27+ messages in thread
From: Noah Lavine @ 2011-02-02  0:26 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 1215 bytes --]

Hello again,

Here it is! All of the unhygienic syntax is gone, is a series of only
20 commits. :-) The peg.test tests should all pass after each one of
these commits.

However, I have a suspicion something odd is going on and this does
not contain all of the work it should from guile's repo. I ended up
having to use git cherry-pick instead of git rebase because I had made
a mistake when I first made my local peg branch. I checked out
origin/wip-mlucy and then cherry-picked my commits on top of that.

It worked well, but in one of my earlier attempts before I decided
that git-rebase wasn't going to work, I saw that someone had renamed
peg-sexp-compile to cg-match-func and added some more test cases to
peg.test. Now I don't see those additions, and if they're not in
wip-mlucy then I'm not sure where to find them. Does anyone know where
I should look for that?

Thanks,
Noah

On Sun, Jan 30, 2011 at 6:43 AM, Andy Wingo <wingo@pobox.com> wrote:
> On Sat 29 Jan 2011 20:37, Noah Lavine <noah.b.lavine@gmail.com> writes:
>
>> I'm about halfway through porting peg.scm to use hygienic macros
>
> Super!  I'll work on other bugs then :)
>
> Andy
> --
> http://wingolog.org/
>

[-- Attachment #2: 0001-module-ice-9-peg.scm-split-define-nonterm-into-two-f.patch --]
[-- Type: application/octet-stream, Size: 3720 bytes --]

From 60c55351ed8488ff46cf43db8575bd2ec8b41558 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 12:40:37 -0500
Subject: [PATCH 01/20]  * module/ice-9/peg.scm: split define-nonterm into two functions for
      better readability.

---
 module/ice-9/peg.scm |   56 +++++++++++++++++++++++++------------------------
 1 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 2386001..e5b2a17 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -346,6 +346,34 @@
 ;; the point of diminishing returns on my box.
 (define *cache-size* 512)
 
+(define (code-for-non-cache-case matchf accumsym symsym)
+  (safe-bind
+   (str strlen at res body)
+   `(lambda (,str ,strlen ,at)
+      (let ((,res (,matchf ,str ,strlen ,at)))
+        ;; Try to match the nonterminal.
+        (if ,res
+            ;; If we matched, do some post-processing to figure out
+            ;; what data to propagate upward.
+            (let ((,at (car ,res))
+                  (,body (cadr ,res)))
+              ,(cond
+                ((eq? accumsym 'name)
+                 `(list ,at ',symsym))
+                ((eq? accumsym 'all)
+                 `(list (car ,res)
+                        (cond
+                         ((not (list? ,body))
+                          (list ',symsym ,body))
+                         ((null? ,body) ',symsym)
+                         ((symbol? (car ,body))
+                          (list ',symsym ,body))
+                         (#t (cons ',symsym ,body)))))
+                ((eq? accumsym 'none) `(list (car ,res) '()))
+                (#t (begin res))))
+            ;; If we didn't match, just return false.
+            #f)))))
+
 ;; Defines a new nonterminal symbol accumulating with ACCUM.
 (define-syntax define-nonterm
   (lambda (x)
@@ -357,33 +385,7 @@
              (accumsym (syntax->datum #'accum))
              (c (datum->syntax x (gensym))));; the cache
          ;; CODE is the code to parse the string if the result isn't cached.
-         (let ((code
-                (safe-bind
-                 (str strlen at res body)
-                `(lambda (,str ,strlen ,at)
-                   (let ((,res (,matchf ,str ,strlen ,at)))
-                     ;; Try to match the nonterminal.
-                     (if ,res
-                         ;; If we matched, do some post-processing to figure out
-                         ;; what data to propagate upward.
-                         (let ((,at (car ,res))
-                               (,body (cadr ,res)))
-                           ,(cond
-                             ((eq? accumsym 'name)
-                              `(list ,at ',symsym))
-                             ((eq? accumsym 'all)
-                              `(list (car ,res)
-                                     (cond
-                                      ((not (list? ,body))
-                                       (list ',symsym ,body))
-                                      ((null? ,body) ',symsym)
-                                      ((symbol? (car ,body))
-                                       (list ',symsym ,body))
-                                      (#t (cons ',symsym ,body)))))
-                             ((eq? accumsym 'none) `(list (car ,res) '()))
-                             (#t (begin res))))
-                         ;; If we didn't match, just return false.
-                         #f))))))
+         (let ((code (code-for-non-cache-case matchf accumsym symsym)))
            #`(begin
                (define #,c (make-vector *cache-size* #f));; the cache
                (define (sym str strlen at)
-- 
1.7.4


[-- Attachment #3: 0002-module-ice-9-peg.scm-function-now-returns-syntax-ins.patch --]
[-- Type: application/octet-stream, Size: 2546 bytes --]

From 7692e6b336293f226cf07043de14bc11831abb7b Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 12:55:43 -0500
Subject: [PATCH 02/20]  * module/ice-9/peg.scm: function now returns syntax instead of an
     s-expression.

---
 module/ice-9/peg.scm |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index e5b2a17..9bf5ace 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -346,8 +346,8 @@
 ;; the point of diminishing returns on my box.
 (define *cache-size* 512)
 
-(define (code-for-non-cache-case matchf accumsym symsym)
-  (safe-bind
+(define (syntax-for-non-cache-case for-syntax matchf accumsym symsym)
+  (datum->syntax for-syntax (safe-bind
    (str strlen at res body)
    `(lambda (,str ,strlen ,at)
       (let ((,res (,matchf ,str ,strlen ,at)))
@@ -372,7 +372,7 @@
                 ((eq? accumsym 'none) `(list (car ,res) '()))
                 (#t (begin res))))
             ;; If we didn't match, just return false.
-            #f)))))
+            #f))))))
 
 ;; Defines a new nonterminal symbol accumulating with ACCUM.
 (define-syntax define-nonterm
@@ -385,7 +385,7 @@
              (accumsym (syntax->datum #'accum))
              (c (datum->syntax x (gensym))));; the cache
          ;; CODE is the code to parse the string if the result isn't cached.
-         (let ((code (code-for-non-cache-case matchf accumsym symsym)))
+         (let ((syn (syntax-for-non-cache-case x matchf accumsym symsym)))
            #`(begin
                (define #,c (make-vector *cache-size* #f));; the cache
                (define (sym str strlen at)
@@ -394,14 +394,14 @@
                    (if (and vref (eq? (car vref) str) (= (cadr vref) at))
                        (caddr vref);; If it is return it.
                        (let ((fres ;; Else calculate it and cache it.
-                              (#,(datum->syntax x code) str strlen at)))
+                              (#,syn str strlen at)))
                          (vector-set! #,c (modulo at *cache-size*)
                                       (list str at fres))
                          fres))))
 
                ;; Store the code in case people want to debug.
                (set-symbol-property!
-                'sym 'code #,(datum->syntax x (list 'quote code)))
+                'sym 'code #,(datum->syntax x (list 'quote (syntax->datum syn))))
                sym)))))))
 
 ;; Gets the code corresponding to NONTERM
-- 
1.7.4


[-- Attachment #4: 0003-module-ice-9-peg.scm-remove-safe-bind-from-syntax-fo.patch --]
[-- Type: application/octet-stream, Size: 3112 bytes --]

From a95fba1e03dd79e7e47f1546963916fbac07821f Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 13:30:48 -0500
Subject: [PATCH 03/20]  * module/ice-9/peg.scm: remove safe-bind from syntax-for-non-cache-case.

---
 module/ice-9/peg.scm |   50 ++++++++++++++++++++++++++++----------------------
 1 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 9bf5ace..b4f1b1b 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -347,32 +347,38 @@
 (define *cache-size* 512)
 
 (define (syntax-for-non-cache-case for-syntax matchf accumsym symsym)
-  (datum->syntax for-syntax (safe-bind
-   (str strlen at res body)
-   `(lambda (,str ,strlen ,at)
-      (let ((,res (,matchf ,str ,strlen ,at)))
+  (let ((m-syn (datum->syntax for-syntax matchf))
+        (a-syn (datum->syntax for-syntax accumsym))
+        (s-syn (datum->syntax for-syntax symsym))
+        (str-syn (syntax str))
+        (strlen-syn (syntax strlen))
+        (at-syn (syntax at))
+        (res-syn (syntax res))
+        (body-syn (syntax body)))        
+   #`(lambda (#,str-syn #,strlen-syn #,at-syn)
+      (let ((#,res-syn (#,m-syn #,str-syn #,strlen-syn #,at-syn)))
         ;; Try to match the nonterminal.
-        (if ,res
+        (if #,res-syn
             ;; If we matched, do some post-processing to figure out
             ;; what data to propagate upward.
-            (let ((,at (car ,res))
-                  (,body (cadr ,res)))
-              ,(cond
-                ((eq? accumsym 'name)
-                 `(list ,at ',symsym))
-                ((eq? accumsym 'all)
-                 `(list (car ,res)
-                        (cond
-                         ((not (list? ,body))
-                          (list ',symsym ,body))
-                         ((null? ,body) ',symsym)
-                         ((symbol? (car ,body))
-                          (list ',symsym ,body))
-                         (#t (cons ',symsym ,body)))))
-                ((eq? accumsym 'none) `(list (car ,res) '()))
-                (#t (begin res))))
+            (let ((#,at-syn (car #,res-syn))
+                  (#,body-syn (cadr #,res-syn)))
+              #,(cond
+                 ((eq? accumsym 'name)
+                  #`(list #,at-syn '#,s-syn))
+                 ((eq? accumsym 'all)
+                  #`(list (car #,res-syn)
+                          (cond
+                           ((not (list? #,body-syn))
+                            (list '#,s-syn #,body-syn))
+                           ((null? #,body-syn) '#,s-syn)
+                           ((symbol? (car #,body-syn))
+                            (list '#,s-syn #,body-syn))
+                           (#t (cons '#,s-syn #,body-syn)))))
+                 ((eq? accumsym 'none) #`(list (car #,res-syn) '()))
+                 (#t #`(begin #,res-syn))))
             ;; If we didn't match, just return false.
-            #f))))))
+            #f)))))
 
 ;; Defines a new nonterminal symbol accumulating with ACCUM.
 (define-syntax define-nonterm
-- 
1.7.4


[-- Attachment #5: 0004-module-ice-9-peg.scm-clean-up-syntax-for-non-cache-c.patch --]
[-- Type: application/octet-stream, Size: 2796 bytes --]

From 0454a3628b48ebe87f2b6dcb71d9207fe7454b65 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 13:36:41 -0500
Subject: [PATCH 04/20]  * module/ice-9/peg.scm: clean up syntax-for-non-cache-case.

---
 module/ice-9/peg.scm |   37 ++++++++++++++++---------------------
 1 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index b4f1b1b..27cca65 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -349,34 +349,29 @@
 (define (syntax-for-non-cache-case for-syntax matchf accumsym symsym)
   (let ((m-syn (datum->syntax for-syntax matchf))
         (a-syn (datum->syntax for-syntax accumsym))
-        (s-syn (datum->syntax for-syntax symsym))
-        (str-syn (syntax str))
-        (strlen-syn (syntax strlen))
-        (at-syn (syntax at))
-        (res-syn (syntax res))
-        (body-syn (syntax body)))        
-   #`(lambda (#,str-syn #,strlen-syn #,at-syn)
-      (let ((#,res-syn (#,m-syn #,str-syn #,strlen-syn #,at-syn)))
+        (s-syn (datum->syntax for-syntax symsym)))
+   #`(lambda (str strlen at)
+      (let ((res (#,m-syn str strlen at)))
         ;; Try to match the nonterminal.
-        (if #,res-syn
+        (if res
             ;; If we matched, do some post-processing to figure out
             ;; what data to propagate upward.
-            (let ((#,at-syn (car #,res-syn))
-                  (#,body-syn (cadr #,res-syn)))
+            (let ((at (car res))
+                  (body (cadr res)))
               #,(cond
                  ((eq? accumsym 'name)
-                  #`(list #,at-syn '#,s-syn))
+                  #`(list at '#,s-syn))
                  ((eq? accumsym 'all)
-                  #`(list (car #,res-syn)
+                  #`(list (car res)
                           (cond
-                           ((not (list? #,body-syn))
-                            (list '#,s-syn #,body-syn))
-                           ((null? #,body-syn) '#,s-syn)
-                           ((symbol? (car #,body-syn))
-                            (list '#,s-syn #,body-syn))
-                           (#t (cons '#,s-syn #,body-syn)))))
-                 ((eq? accumsym 'none) #`(list (car #,res-syn) '()))
-                 (#t #`(begin #,res-syn))))
+                           ((not (list? body))
+                            (list '#,s-syn body))
+                           ((null? body) '#,s-syn)
+                           ((symbol? (car body))
+                            (list '#,s-syn body))
+                           (#t (cons '#,s-syn body)))))
+                 ((eq? accumsym 'none) #`(list (car res) '()))
+                 (#t #`(begin res))))
             ;; If we didn't match, just return false.
             #f)))))
 
-- 
1.7.4


[-- Attachment #6: 0005-module-ice-9-peg.scm-clean-up-syntax-for-non-cache-c.patch --]
[-- Type: application/octet-stream, Size: 1888 bytes --]

From c2fdd38aad8465bba58912fafeef6434cd85e3ff Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 13:42:32 -0500
Subject: [PATCH 05/20]  * module/ice-9/peg.scm: clean up syntax-for-non-cache-case a bit.

---
 module/ice-9/peg.scm |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 27cca65..5905651 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -346,12 +346,10 @@
 ;; the point of diminishing returns on my box.
 (define *cache-size* 512)
 
-(define (syntax-for-non-cache-case for-syntax matchf accumsym symsym)
-  (let ((m-syn (datum->syntax for-syntax matchf))
-        (a-syn (datum->syntax for-syntax accumsym))
-        (s-syn (datum->syntax for-syntax symsym)))
+(define (syntax-for-non-cache-case for-syntax matchf accumsym s-syn)
+  (let ((matchf-syn (datum->syntax for-syntax matchf)))
    #`(lambda (str strlen at)
-      (let ((res (#,m-syn str strlen at)))
+      (let ((res (#,matchf-syn str strlen at)))
         ;; Try to match the nonterminal.
         (if res
             ;; If we matched, do some post-processing to figure out
@@ -382,11 +380,10 @@
       ((_ sym accum match)
        (let ((matchf (peg-sexp-compile (syntax->datum #'match)
                                     (syntax->datum #'accum)))
-             (symsym (syntax->datum #'sym))
              (accumsym (syntax->datum #'accum))
              (c (datum->syntax x (gensym))));; the cache
          ;; CODE is the code to parse the string if the result isn't cached.
-         (let ((syn (syntax-for-non-cache-case x matchf accumsym symsym)))
+         (let ((syn (syntax-for-non-cache-case x matchf accumsym #'sym)))
            #`(begin
                (define #,c (make-vector *cache-size* #f));; the cache
                (define (sym str strlen at)
-- 
1.7.4


[-- Attachment #7: 0006-module-ice-9-peg.scm-pass-for-syntax-argument-to-all.patch --]
[-- Type: application/octet-stream, Size: 12555 bytes --]

From dfba45fcac47ecc318aff19ea26206c7b3b7e9a6 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 14:07:49 -0500
Subject: [PATCH 06/20]  * module/ice-9/peg.scm: pass for-syntax argument to all of the
     code-generating functions.

---
 module/ice-9/peg.scm |  116 ++++++++++++++++++++++++++------------------------
 1 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 5905651..f60d840 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -91,7 +91,7 @@
 
 ;; Code we generate will be defined in a function, and always has to test
 ;; whether it's beyond the bounds of the string before it executes.
-(define (cg-generic-lambda str strlen at code)
+(define (cg-generic-lambda for-syntax str strlen at code)
   `(lambda (,str ,strlen ,at)
      (if (>= ,at ,strlen)
          #f
@@ -105,7 +105,7 @@
 
 ;; Code we generate will have a certain return structure depending on how we're
 ;; accumulating (the ACCUM variable).
-(define (cg-generic-ret accum name body-uneval at)
+(define (cg-generic-ret for-syntax accum name body-uneval at)
   (safe-bind
    (body)
    `(let ((,body ,body-uneval))
@@ -138,36 +138,37 @@
 (define cggr cg-generic-ret)
 
 ;; Generates code that matches a particular string.
-;; E.g.: (cg-string "abc" 'body)
-(define (cg-string match accum)
+;; E.g.: (cg-string syntax "abc" 'body)
+(define (cg-string for-syntax match accum)
   (safe-bind
    (str strlen at)
    (let ((len (string-length match)))
-     (cggl str strlen at
+     (cggl for-syntax str strlen at
            `(if (string=? (substring ,str ,at (min (+ ,at ,len) ,strlen))
                           ,match)
-                ,(cggr accum 'cg-string match `(+ ,at ,len))
+                ,(cggr for-syntax accum 'cg-string match `(+ ,at ,len))
                 #f)))))
 
 ;; Generates code for matching any character.
-;; E.g.: (cg-peg-any 'body)
-(define (cg-peg-any accum)
+;; E.g.: (cg-peg-any syntax 'body)
+(define (cg-peg-any for-syntax accum)
   (safe-bind
    (str strlen at)
-   (cggl str strlen at
-         (cggr accum 'cg-peg-any `(substring ,str ,at (+ ,at 1)) `(+ ,at 1)))))
+   (cggl for-syntax str strlen at
+         (cggr for-syntax accum
+               'cg-peg-any `(substring ,str ,at (+ ,at 1)) `(+ ,at 1)))))
 
 ;; Generates code for matching a range of characters between start and end.
-;; E.g.: (cg-range #\a #\z 'body)
-(define (cg-range start end accum)
+;; E.g.: (cg-range syntax #\a #\z 'body)
+(define (cg-range for-syntax start end accum)
   (safe-bind
    (str strlen at c)
-   (cggl str strlen at
+   (cggl for-syntax str strlen at
          `(let ((,c (string-ref ,str ,at)))
             (if (and
                  (char>=? ,c ,start)
                  (char<=? ,c ,end))
-                ,(cggr accum 'cg-range `(string ,c) `(+ ,at 1))
+                ,(cggr for-syntax accum 'cg-range `(string ,c) `(+ ,at 1))
                 #f)))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
@@ -189,13 +190,13 @@
     val))
 
 ;; Takes an arbitrary expressions and accumulation variable, then parses it.
-;; E.g.: (peg-sexp-compile '(and "abc" (or "-" (range #\a #\z))) 'all)
-(define (peg-sexp-compile match accum)
+;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
+(define (peg-sexp-compile for-syntax match accum)
    (cond
-    ((string? match) (cg-string match (baf accum)))
+    ((string? match) (cg-string for-syntax match (baf accum)))
     ((symbol? match) ;; either peg-any or a nonterminal
      (cond
-      ((eq? match 'peg-any) (cg-peg-any (baf accum)))
+      ((eq? match 'peg-any) (cg-peg-any for-syntax (baf accum)))
       ;; if match is any other symbol it's a nonterminal, so just return it
       (#t match)))
     ((or (not (list? match)) (null? match))
@@ -203,19 +204,19 @@
      (error-val `(peg-sexp-compile-error-1 ,match ,accum)))
     
     ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
-     (cg-range (cadr match) (caddr match) (baf accum)))
+     (cg-range for-syntax (cadr match) (caddr match) (baf accum)))
     ((eq? (car match) 'ignore) ;; match but don't parse
-     (peg-sexp-compile (cadr match) 'none))
+     (peg-sexp-compile for-syntax (cadr match) 'none))
     ((eq? (car match) 'capture) ;; parse
-     (peg-sexp-compile (cadr match) 'body))
+     (peg-sexp-compile for-syntax (cadr match) 'body))
     ((eq? (car match) 'peg) ;; embedded PEG string
-     (peg-string-compile (cadr match) (baf accum)))
-    ((eq? (car match) 'and) (cg-and (cdr match) (baf accum)))
-    ((eq? (car match) 'or) (cg-or (cdr match) (baf accum)))
+     (peg-string-compile for-syntax (cadr match) (baf accum)))
+    ((eq? (car match) 'and) (cg-and for-syntax (cdr match) (baf accum)))
+    ((eq? (car match) 'or) (cg-or for-syntax (cdr match) (baf accum)))
     ((eq? (car match) 'body)
      (if (not (= (length match) 4))
          (error-val `(peg-sexp-compile-error-2 ,match ,accum))
-         (apply cg-body (cons (baf accum) (cdr match)))))
+         (apply cg-body for-syntax (cons (baf accum) (cdr match)))))
     (#t (error-val `(peg-sexp-compile-error-3 ,match ,accum)))))
 
 ;;;;; Convenience macros for making sure things come out in a readable form.
@@ -233,20 +234,20 @@
        #'(if (not (null? obj)) (push! lst obj))))))
 
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
-(define (cg-and arglst accum)
+(define (cg-and for-syntax arglst accum)
   (safe-bind
    (str strlen at body)
    `(lambda (,str ,strlen ,at)
       (let ((,body '()))
-        ,(cg-and-int arglst accum str strlen at body)))))
+        ,(cg-and-int for-syntax arglst accum str strlen at body)))))
 
 ;; Internal function builder for AND (calls itself).
-(define (cg-and-int arglst accum str strlen at body)
+(define (cg-and-int for-syntax arglst accum str strlen at body)
   (safe-bind
    (res newat newbody)
    (if (null? arglst)
-       (cggr accum 'cg-and `(reverse ,body) at) ;; base case
-       (let ((mf (peg-sexp-compile (car arglst) accum))) ;; match function
+       (cggr for-syntax accum 'cg-and `(reverse ,body) at) ;; base case
+       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
          `(let ((,res (,mf ,str ,strlen ,at)))
             (if (not ,res) 
                 #f ;; if the match failed, the and failed
@@ -255,33 +256,33 @@
                       (,newbody (cadr ,res)))
                   (set! ,at ,newat)
                   ((@@ (ice-9 peg) push-not-null!) ,body ((@@ (ice-9 peg) single-filter) ,newbody))
-                  ,(cg-and-int (cdr arglst) accum str strlen at body))))))))
+                  ,(cg-and-int for-syntax (cdr arglst) accum str strlen at body))))))))
 
 ;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
-(define (cg-or arglst accum)
+(define (cg-or for-syntax arglst accum)
   (safe-bind
    (str strlen at body)
    `(lambda (,str ,strlen ,at)
-      ,(cg-or-int arglst accum str strlen at body))))
+      ,(cg-or-int for-syntax arglst accum str strlen at body))))
 
 ;; Internal function builder for OR (calls itself).
-(define (cg-or-int arglst accum str strlen at body)
+(define (cg-or-int for-syntax arglst accum str strlen at body)
   (safe-bind
    (res)
    (if (null? arglst)
        #f ;; base case
-       (let ((mf (peg-sexp-compile (car arglst) accum)))
+       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
          `(let ((,res (,mf ,str ,strlen ,at)))
             (if ,res ;; if the match succeeds, we're done
-                ,(cggr accum 'cg-or `(cadr ,res) `(car ,res))
-                ,(cg-or-int (cdr arglst) accum str strlen at body)))))))
+                ,(cggr for-syntax accum 'cg-or `(cadr ,res) `(car ,res))
+                ,(cg-or-int for-syntax (cdr arglst) accum str strlen at body)))))))
 
 ;; Returns a block of code that tries to match MATCH, and on success updates AT
 ;; and BODY, return #f on failure and #t on success.
-(define (cg-body-test match accum str strlen at body)
+(define (cg-body-test for-syntax match accum str strlen at body)
   (safe-bind
    (at2-body2 at2 body2)
-   (let ((mf (peg-sexp-compile match accum)))
+   (let ((mf (peg-sexp-compile for-syntax match accum)))
      `(let ((,at2-body2 (,mf ,str ,strlen ,at)))
         (if (or (not ,at2-body2) (= ,at (car ,at2-body2)))
             #f
@@ -295,7 +296,7 @@
 
 ;; Returns a block of code that sees whether NUM wants us to try and match more
 ;; given that we've already matched COUNT.
-(define (cg-body-more num count)
+(define (cg-body-more for-syntax num count)
   (cond ((number? num) `(< ,count ,num))
         ((eq? num '+) #t)
         ((eq? num '*) #t)
@@ -304,20 +305,23 @@
 
 ;; Returns a function that takes a paramter indicating whether or not the match
 ;; was succesful and returns what the body expression should return.
-(define (cg-body-ret accum type name body at at2)
+(define (cg-body-ret for-syntax accum type name body at at2)
   (safe-bind
    (success)
    `(lambda (,success)
-      ,(cond ((eq? type '!) `(if ,success #f ,(cggr accum name ''() at)))
-             ((eq? type '&) `(if ,success ,(cggr accum name ''() at) #f))
+      ,(cond ((eq? type '!)
+              `(if ,success #f ,(cggr for-syntax accum name ''() at)))
+             ((eq? type '&)
+              `(if ,success ,(cggr for-syntax accum name ''() at) #f))
              ((eq? type 'lit)
-              `(if ,success ,(cggr accum name `(reverse ,body) at2) #f))
+              `(if ,success
+                   ,(cggr for-syntax accum name `(reverse ,body) at2) #f))
              (#t (error-val
                   `(cg-body-ret-error ,type ,accum ,name ,body ,at ,at2)))))))
 
 ;; Returns a block of code that sees whether COUNT satisfies the constraints of
 ;; NUM.
-(define (cg-body-success num count)
+(define (cg-body-success for-syntax num count)
   (cond ((number? num) `(= ,count ,num))
         ((eq? num '+) `(>= ,count 1))
         ((eq? num '*) #t)
@@ -325,16 +329,16 @@
         (#t `(cg-body-success-error ,num))))
 
 ;; Returns a function that parses a BODY element.
-(define (cg-body accum type match num)
+(define (cg-body for-syntax accum type match num)
   (safe-bind
    (str strlen at at2 count body)
    `(lambda (,str ,strlen ,at)
       (let ((,at2 ,at) (,count 0) (,body '()))
-        (while (and ,(cg-body-test match accum str strlen at2 body)
+        (while (and ,(cg-body-test for-syntax match accum str strlen at2 body)
                     (set! ,count (+ ,count 1))
-                    ,(cg-body-more num count)))
-        (,(cg-body-ret accum type 'cg-body body at at2)
-         ,(cg-body-success num count))))))
+                    ,(cg-body-more for-syntax num count)))
+        (,(cg-body-ret for-syntax accum type 'cg-body body at at2)
+         ,(cg-body-success for-syntax num count))))))
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; FOR DEFINING AND USING NONTERMINALS
@@ -378,8 +382,8 @@
   (lambda (x)
     (syntax-case x ()
       ((_ sym accum match)
-       (let ((matchf (peg-sexp-compile (syntax->datum #'match)
-                                    (syntax->datum #'accum)))
+       (let ((matchf (peg-sexp-compile x (syntax->datum #'match)
+                                       (syntax->datum #'accum)))
              (accumsym (syntax->datum #'accum))
              (c (datum->syntax x (gensym))));; the cache
          ;; CODE is the code to parse the string if the result isn't cached.
@@ -428,8 +432,8 @@
        (let ((pmsym (syntax->datum #'peg-matcher)))
          (let ((peg-sexp-compile
                 (if (string? pmsym)
-                    (peg-string-compile pmsym 'body)
-                    (peg-sexp-compile pmsym 'body))))
+                    (peg-string-compile x pmsym 'body)
+                    (peg-sexp-compile x pmsym 'body))))
            ;; We copy the string before using it because it might have been
            ;; modified in-place since the last time it was parsed, which would
            ;; invalidate the cache.  Guile uses copy-on-write for strings, so
@@ -710,8 +714,8 @@ RB < ']'
        (#t (map compressor lst)))))
 
 ;; Builds a lambda-expressions for the pattern STR using accum.
-(define (peg-string-compile str accum)
-  (peg-sexp-compile
+(define (peg-string-compile for-syntax str accum)
+  (peg-sexp-compile for-syntax
    (compressor (peg-parse-pattern (peg:tree (peg-parse peg-pattern str))))
    accum))
 
-- 
1.7.4


[-- Attachment #8: 0007-module-ice-9-peg.scm-peg-sexp-compile-and-peg-string.patch --]
[-- Type: application/octet-stream, Size: 4827 bytes --]

From f2c35a67b1b12fe84b543a7258215cbe9629a2e9 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sat, 29 Jan 2011 14:12:38 -0500
Subject: [PATCH 07/20]  * module/ice-9/peg.scm: peg-sexp-compile and peg-string-compile now
     return syntax instead of s-expressions.

---
 module/ice-9/peg.scm |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index f60d840..966ec20 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -192,6 +192,7 @@
 ;; Takes an arbitrary expressions and accumulation variable, then parses it.
 ;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
 (define (peg-sexp-compile for-syntax match accum)
+  (datum->syntax for-syntax
    (cond
     ((string? match) (cg-string for-syntax match (baf accum)))
     ((symbol? match) ;; either peg-any or a nonterminal
@@ -206,18 +207,18 @@
     ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
      (cg-range for-syntax (cadr match) (caddr match) (baf accum)))
     ((eq? (car match) 'ignore) ;; match but don't parse
-     (peg-sexp-compile for-syntax (cadr match) 'none))
+     (syntax->datum (peg-sexp-compile for-syntax (cadr match) 'none)))
     ((eq? (car match) 'capture) ;; parse
-     (peg-sexp-compile for-syntax (cadr match) 'body))
+     (syntax->datum (peg-sexp-compile for-syntax (cadr match) 'body)))
     ((eq? (car match) 'peg) ;; embedded PEG string
-     (peg-string-compile for-syntax (cadr match) (baf accum)))
+     (syntax->datum (peg-string-compile for-syntax (cadr match) (baf accum))))
     ((eq? (car match) 'and) (cg-and for-syntax (cdr match) (baf accum)))
     ((eq? (car match) 'or) (cg-or for-syntax (cdr match) (baf accum)))
     ((eq? (car match) 'body)
      (if (not (= (length match) 4))
          (error-val `(peg-sexp-compile-error-2 ,match ,accum))
          (apply cg-body for-syntax (cons (baf accum) (cdr match)))))
-    (#t (error-val `(peg-sexp-compile-error-3 ,match ,accum)))))
+    (#t (error-val `(peg-sexp-compile-error-3 ,match ,accum))))))
 
 ;;;;; Convenience macros for making sure things come out in a readable form.
 ;; If SYM is a list of one element, return (car SYM), else return SYM.
@@ -247,7 +248,8 @@
    (res newat newbody)
    (if (null? arglst)
        (cggr for-syntax accum 'cg-and `(reverse ,body) at) ;; base case
-       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
+       (let ((mf (syntax->datum
+                  (peg-sexp-compile for-syntax (car arglst) accum)))) ;; match function
          `(let ((,res (,mf ,str ,strlen ,at)))
             (if (not ,res) 
                 #f ;; if the match failed, the and failed
@@ -271,7 +273,8 @@
    (res)
    (if (null? arglst)
        #f ;; base case
-       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
+       (let ((mf (syntax->datum
+                  (peg-sexp-compile for-syntax (car arglst) accum))))
          `(let ((,res (,mf ,str ,strlen ,at)))
             (if ,res ;; if the match succeeds, we're done
                 ,(cggr for-syntax accum 'cg-or `(cadr ,res) `(car ,res))
@@ -282,7 +285,8 @@
 (define (cg-body-test for-syntax match accum str strlen at body)
   (safe-bind
    (at2-body2 at2 body2)
-   (let ((mf (peg-sexp-compile for-syntax match accum)))
+   (let ((mf (syntax->datum
+              (peg-sexp-compile for-syntax match accum))))
      `(let ((,at2-body2 (,mf ,str ,strlen ,at)))
         (if (or (not ,at2-body2) (= ,at (car ,at2-body2)))
             #f
@@ -350,8 +354,8 @@
 ;; the point of diminishing returns on my box.
 (define *cache-size* 512)
 
-(define (syntax-for-non-cache-case for-syntax matchf accumsym s-syn)
-  (let ((matchf-syn (datum->syntax for-syntax matchf)))
+(define (syntax-for-non-cache-case for-syntax matchf-syn accumsym s-syn)
+;  (let ((matchf-syn (datum->syntax for-syntax matchf)))
    #`(lambda (str strlen at)
       (let ((res (#,matchf-syn str strlen at)))
         ;; Try to match the nonterminal.
@@ -375,7 +379,7 @@
                  ((eq? accumsym 'none) #`(list (car res) '()))
                  (#t #`(begin res))))
             ;; If we didn't match, just return false.
-            #f)))))
+            #f))))
 
 ;; Defines a new nonterminal symbol accumulating with ACCUM.
 (define-syntax define-nonterm
@@ -443,7 +447,7 @@
                    (at 0))
                (let ((ret ((@@ (ice-9 peg) until-works)
                            (or (>= at strlen)
-                               (#,(datum->syntax x peg-sexp-compile)
+                               (#,peg-sexp-compile
                                 string strlen at))
                            (set! at (+ at 1)))))
                  (if (eq? ret #t) ;; (>= at strlen) succeeded
-- 
1.7.4


[-- Attachment #9: 0008-module-ice-9-peg.scm-push-datum-syntax-call-through-.patch --]
[-- Type: application/octet-stream, Size: 4355 bytes --]

From c7621730339bbaf1dfd7279ea69902ee384b106a Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sun, 30 Jan 2011 15:59:52 -0500
Subject: [PATCH 08/20]  * module/ice-9/peg.scm: push datum->syntax call through cond expression
     in peg-sexp-compile. this is a preliminary move so that I can convert
     the code-generating functions into syntax-generating functions one by
     one.

---
 module/ice-9/peg.scm |   64 +++++++++++++++++++++++++++++---------------------
 1 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 966ec20..8f6f54d 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -192,33 +192,43 @@
 ;; Takes an arbitrary expressions and accumulation variable, then parses it.
 ;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
 (define (peg-sexp-compile for-syntax match accum)
-  (datum->syntax for-syntax
-   (cond
-    ((string? match) (cg-string for-syntax match (baf accum)))
-    ((symbol? match) ;; either peg-any or a nonterminal
-     (cond
-      ((eq? match 'peg-any) (cg-peg-any for-syntax (baf accum)))
-      ;; if match is any other symbol it's a nonterminal, so just return it
-      (#t match)))
-    ((or (not (list? match)) (null? match))
-     ;; anything besides a string, symbol, or list is an error
-     (error-val `(peg-sexp-compile-error-1 ,match ,accum)))
-    
-    ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
-     (cg-range for-syntax (cadr match) (caddr match) (baf accum)))
-    ((eq? (car match) 'ignore) ;; match but don't parse
-     (syntax->datum (peg-sexp-compile for-syntax (cadr match) 'none)))
-    ((eq? (car match) 'capture) ;; parse
-     (syntax->datum (peg-sexp-compile for-syntax (cadr match) 'body)))
-    ((eq? (car match) 'peg) ;; embedded PEG string
-     (syntax->datum (peg-string-compile for-syntax (cadr match) (baf accum))))
-    ((eq? (car match) 'and) (cg-and for-syntax (cdr match) (baf accum)))
-    ((eq? (car match) 'or) (cg-or for-syntax (cdr match) (baf accum)))
-    ((eq? (car match) 'body)
-     (if (not (= (length match) 4))
-         (error-val `(peg-sexp-compile-error-2 ,match ,accum))
-         (apply cg-body for-syntax (cons (baf accum) (cdr match)))))
-    (#t (error-val `(peg-sexp-compile-error-3 ,match ,accum))))))
+  (cond
+   ((string? match) (datum->syntax for-syntax
+                                   (cg-string for-syntax match (baf accum))))
+   ((symbol? match) ;; either peg-any or a nonterminal
+    (cond
+     ((eq? match 'peg-any) (datum->syntax for-syntax
+                                          (cg-peg-any for-syntax (baf accum))))
+     ;; if match is any other symbol it's a nonterminal, so just return it
+     (#t (datum->syntax for-syntax match))))
+   ((or (not (list? match)) (null? match))
+    ;; anything besides a string, symbol, or list is an error
+    (datum->syntax for-syntax
+                   (error-val `(peg-sexp-compile-error-1 ,match ,accum))))
+   
+   ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
+    (datum->syntax for-syntax
+                   (cg-range for-syntax (cadr match) (caddr match) (baf accum))))
+   ((eq? (car match) 'ignore) ;; match but don't parse
+    (peg-sexp-compile for-syntax (cadr match) 'none))
+   ((eq? (car match) 'capture) ;; parse
+    (peg-sexp-compile for-syntax (cadr match) 'body))
+   ((eq? (car match) 'peg) ;; embedded PEG string
+    (peg-string-compile for-syntax (cadr match) (baf accum)))
+   ((eq? (car match) 'and)
+    (datum->syntax for-syntax
+                   (cg-and for-syntax (cdr match) (baf accum))))
+   ((eq? (car match) 'or)
+    (datum->syntax for-syntax
+                   (cg-or for-syntax (cdr match) (baf accum))))
+   ((eq? (car match) 'body)
+    (if (not (= (length match) 4))
+        (datum->syntax for-syntax
+                       (error-val `(peg-sexp-compile-error-2 ,match ,accum)))
+        (datum->syntax for-syntax
+                       (apply cg-body for-syntax (cons (baf accum) (cdr match))))))
+   (#t (datum->syntax for-syntax
+                      (error-val `(peg-sexp-compile-error-3 ,match ,accum))))))
 
 ;;;;; Convenience macros for making sure things come out in a readable form.
 ;; If SYM is a list of one element, return (car SYM), else return SYM.
-- 
1.7.4


[-- Attachment #10: 0009-module-ice-9-peg.scm-cg-string-now-returns-syntax-in.patch --]
[-- Type: application/octet-stream, Size: 1514 bytes --]

From 876a1a61d88b76f492d853875a5cd8467513d7ce Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sun, 30 Jan 2011 16:04:36 -0500
Subject: [PATCH 09/20]  * module/ice-9/peg.scm: cg-string now returns syntax instead of
      s-expressions.

---
 module/ice-9/peg.scm |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 8f6f54d..dc6e2d4 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -140,9 +140,11 @@
 ;; Generates code that matches a particular string.
 ;; E.g.: (cg-string syntax "abc" 'body)
 (define (cg-string for-syntax match accum)
-  (safe-bind
-   (str strlen at)
-   (let ((len (string-length match)))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at))
+        (len (string-length match)))
+    (datum->syntax for-syntax
      (cggl for-syntax str strlen at
            `(if (string=? (substring ,str ,at (min (+ ,at ,len) ,strlen))
                           ,match)
@@ -193,8 +195,7 @@
 ;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
 (define (peg-sexp-compile for-syntax match accum)
   (cond
-   ((string? match) (datum->syntax for-syntax
-                                   (cg-string for-syntax match (baf accum))))
+   ((string? match) (cg-string for-syntax match (baf accum)))
    ((symbol? match) ;; either peg-any or a nonterminal
     (cond
      ((eq? match 'peg-any) (datum->syntax for-syntax
-- 
1.7.4


[-- Attachment #11: 0010-module-ice-9-peg.scm-cg-peg-any-now-returns-syntax-i.patch --]
[-- Type: application/octet-stream, Size: 1744 bytes --]

From 7bd0ea7015b32440e74d732c08198c14b9f6c2d6 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sun, 30 Jan 2011 16:07:34 -0500
Subject: [PATCH 10/20]  * module/ice-9/peg.scm: cg-peg-any now returns syntax instead of
     s-expressions.

---
 module/ice-9/peg.scm |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index dc6e2d4..e101eec 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -154,11 +154,13 @@
 ;; Generates code for matching any character.
 ;; E.g.: (cg-peg-any syntax 'body)
 (define (cg-peg-any for-syntax accum)
-  (safe-bind
-   (str strlen at)
-   (cggl for-syntax str strlen at
-         (cggr for-syntax accum
-               'cg-peg-any `(substring ,str ,at (+ ,at 1)) `(+ ,at 1)))))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at)))
+    (datum->syntax for-syntax
+      (cggl for-syntax str strlen at
+            (cggr for-syntax accum
+                  'cg-peg-any `(substring ,str ,at (+ ,at 1)) `(+ ,at 1))))))
 
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
@@ -198,8 +200,7 @@
    ((string? match) (cg-string for-syntax match (baf accum)))
    ((symbol? match) ;; either peg-any or a nonterminal
     (cond
-     ((eq? match 'peg-any) (datum->syntax for-syntax
-                                          (cg-peg-any for-syntax (baf accum))))
+     ((eq? match 'peg-any) (cg-peg-any for-syntax (baf accum)))
      ;; if match is any other symbol it's a nonterminal, so just return it
      (#t (datum->syntax for-syntax match))))
    ((or (not (list? match)) (null? match))
-- 
1.7.4


[-- Attachment #12: 0011-module-ice-9-peg.scm-cg-range-now-returns-syntax-ins.patch --]
[-- Type: application/octet-stream, Size: 2127 bytes --]

From d96acddb00ca1b742bf19e750ab6c2c6ad2c68eb Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Sun, 30 Jan 2011 16:10:07 -0500
Subject: [PATCH 11/20]  * module/ice-9/peg.scm: cg-range now returns syntax instead of
     s-expressions.

---
 module/ice-9/peg.scm |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index e101eec..1d10f9a 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -165,15 +165,18 @@
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
 (define (cg-range for-syntax start end accum)
-  (safe-bind
-   (str strlen at c)
-   (cggl for-syntax str strlen at
-         `(let ((,c (string-ref ,str ,at)))
-            (if (and
-                 (char>=? ,c ,start)
-                 (char<=? ,c ,end))
-                ,(cggr for-syntax accum 'cg-range `(string ,c) `(+ ,at 1))
-                #f)))))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at))
+        (c (syntax c)))
+    (datum->syntax for-syntax
+      (cggl for-syntax str strlen at
+            `(let ((,c (string-ref ,str ,at)))
+               (if (and
+                    (char>=? ,c ,start)
+                    (char<=? ,c ,end))
+                   ,(cggr for-syntax accum 'cg-range `(string ,c) `(+ ,at 1))
+                   #f))))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
 ;; literals (since we don't want to tag them with their name if we're doing an
@@ -209,8 +212,7 @@
                    (error-val `(peg-sexp-compile-error-1 ,match ,accum))))
    
    ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
-    (datum->syntax for-syntax
-                   (cg-range for-syntax (cadr match) (caddr match) (baf accum))))
+    (cg-range for-syntax (cadr match) (caddr match) (baf accum)))
    ((eq? (car match) 'ignore) ;; match but don't parse
     (peg-sexp-compile for-syntax (cadr match) 'none))
    ((eq? (car match) 'capture) ;; parse
-- 
1.7.4


[-- Attachment #13: 0012-module-ice-9-peg.scm-add-cggl-syn-and-cggr-syn-funct.patch --]
[-- Type: application/octet-stream, Size: 3248 bytes --]

From 0d591206929a794f75b7cfa06a08c9b162a2667f Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Mon, 31 Jan 2011 14:45:32 -0500
Subject: [PATCH 12/20]  * module/ice-9/peg.scm: add cggl-syn and cggr-syn functions that are
     equivalent to the regular cggl and cggr functions except that they
     operate on syntax instead of s-expressions.

---
 module/ice-9/peg.scm |   48 ++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 1d10f9a..9c16987 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -99,6 +99,13 @@
 ;; The short name makes the formatting below much easier to read.
 (define cggl cg-generic-lambda)
 
+(define (cggl-syn for-syntax str strlen at code)
+  ;; all arguments are syntax
+  #`(lambda (#,str #,strlen #,at)
+     (if (>= #,at #,strlen)
+         #f
+         #,code)))
+
 ;; Optimizations for CG-GENERIC-RET below...
 (define *op-known-single-body* '(cg-string cg-peg-any cg-range))
 ;; ...done with optimizations (could use more of these).
@@ -137,6 +144,35 @@
 ;; The short name makes the formatting below much easier to read.
 (define cggr cg-generic-ret)
 
+(define (cggr-syn for-syntax accum name body-uneval at)
+  ;; name, body-uneval and at are syntax
+  #`(let ((body #,body-uneval))
+     #,(cond
+        ((and (eq? accum 'all) name)
+         #`(list #,at
+                 (cond
+                  ((not (list? body)) (list '#,name body))
+                  ((null? body) '#,name)
+                  ((symbol? (car body)) (list '#,name body))
+                  (#t (cons '#,name body)))))
+        ((eq? accum 'name)
+         #`(list #,at '#,name))
+        ((eq? accum 'body)
+         (cond
+          ((member (syntax->datum name) *op-known-single-body*)
+           #`(list #,at body))
+          (#t #`(list #,at
+                      (cond
+                       (((@@ (ice-9 peg) single?) body) (car body))
+                       (#t body))))))
+        ((eq? accum 'none)
+         #`(list #,at '()))
+        (#t
+         (begin
+           (pretty-print `(cg-generic-ret-error ,accum ,name ,body-uneval ,at))
+           (pretty-print "Defaulting to accum of none.\n")
+           #`(list #,at '()))))))
+
 ;; Generates code that matches a particular string.
 ;; E.g.: (cg-string syntax "abc" 'body)
 (define (cg-string for-syntax match accum)
@@ -144,12 +180,12 @@
         (strlen (syntax strlen))
         (at (syntax at))
         (len (string-length match)))
-    (datum->syntax for-syntax
-     (cggl for-syntax str strlen at
-           `(if (string=? (substring ,str ,at (min (+ ,at ,len) ,strlen))
-                          ,match)
-                ,(cggr for-syntax accum 'cg-string match `(+ ,at ,len))
-                #f)))))
+     (cggl-syn for-syntax str strlen at
+               #`(if (string=? (substring #,str #,at (min (+ #,at #,len) #,strlen))
+                              #,match)
+                    #,(cggr-syn for-syntax accum 'cg-string match
+                                #`(+ #,at #,len))
+                    #f))))
 
 ;; Generates code for matching any character.
 ;; E.g.: (cg-peg-any syntax 'body)
-- 
1.7.4


[-- Attachment #14: 0013-module-ice-9-peg.scm-update-cg-peg-any-to-use-cggl-s.patch --]
[-- Type: application/octet-stream, Size: 1094 bytes --]

From f7b2f2ca414fb9ebe011e64bcf25efc71d075771 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Mon, 31 Jan 2011 14:56:02 -0500
Subject: [PATCH 13/20]  * module/ice-9/peg.scm: update cg-peg-any to use cggl-syn and cggr-syn.

---
 module/ice-9/peg.scm |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 9c16987..8555a02 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -193,10 +193,10 @@
   (let ((str (syntax str))
         (strlen (syntax strlen))
         (at (syntax at)))
-    (datum->syntax for-syntax
-      (cggl for-syntax str strlen at
-            (cggr for-syntax accum
-                  'cg-peg-any `(substring ,str ,at (+ ,at 1)) `(+ ,at 1))))))
+    (cggl-syn for-syntax str strlen at
+          (cggr-syn for-syntax accum
+                    'cg-peg-any #`(substring #,str #,at (+ #,at 1))
+                    #`(+ #,at 1)))))
 
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
-- 
1.7.4


[-- Attachment #15: 0014-module-ice-9-peg.scm-update-cg-range-to-use-cggl-syn.patch --]
[-- Type: application/octet-stream, Size: 1432 bytes --]

From 87cc9247721a095dd2c3fb312678e2c28e746bfd Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Mon, 31 Jan 2011 14:58:15 -0500
Subject: [PATCH 14/20]  * module/ice-9/peg.scm: update cg-range to use cggl-syn and cggr-syn.

---
 module/ice-9/peg.scm |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 8555a02..60cb499 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -205,14 +205,14 @@
         (strlen (syntax strlen))
         (at (syntax at))
         (c (syntax c)))
-    (datum->syntax for-syntax
-      (cggl for-syntax str strlen at
-            `(let ((,c (string-ref ,str ,at)))
-               (if (and
-                    (char>=? ,c ,start)
-                    (char<=? ,c ,end))
-                   ,(cggr for-syntax accum 'cg-range `(string ,c) `(+ ,at 1))
-                   #f))))))
+    (cggl-syn for-syntax str strlen at
+          #`(let ((#,c (string-ref #,str #,at)))
+             (if (and
+                  (char>=? #,c #,start)
+                  (char<=? #,c #,end))
+                 #,(cggr-syn for-syntax accum 'cg-range
+                             #`(string #,c) #`(+ #,at 1))
+                 #f)))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
 ;; literals (since we don't want to tag them with their name if we're doing an
-- 
1.7.4


[-- Attachment #16: 0015-module-ice-9-peg.scm-convert-cg-and-and-cg-and-int-t.patch --]
[-- Type: application/octet-stream, Size: 3782 bytes --]

From 3922b5329a1b33d0ee0b3668cdf80d12f577090b Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Mon, 31 Jan 2011 15:04:59 -0500
Subject: [PATCH 15/20]  * module/ice-9/peg.scm: convert cg-and and cg-and-int to use cggr-syn
     instead of cggr. they also return syntax now instead of s-expressions.

---
 module/ice-9/peg.scm |   42 +++++++++++++++++++++---------------------
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 60cb499..153bba1 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -245,8 +245,7 @@
    ((or (not (list? match)) (null? match))
     ;; anything besides a string, symbol, or list is an error
     (datum->syntax for-syntax
-                   (error-val `(peg-sexp-compile-error-1 ,match ,accum))))
-   
+                   (error-val `(peg-sexp-compile-error-1 ,match ,accum))))   
    ((eq? (car match) 'range) ;; range of characters (e.g. [a-z])
     (cg-range for-syntax (cadr match) (caddr match) (baf accum)))
    ((eq? (car match) 'ignore) ;; match but don't parse
@@ -256,8 +255,7 @@
    ((eq? (car match) 'peg) ;; embedded PEG string
     (peg-string-compile for-syntax (cadr match) (baf accum)))
    ((eq? (car match) 'and)
-    (datum->syntax for-syntax
-                   (cg-and for-syntax (cdr match) (baf accum))))
+    (cg-and for-syntax (cdr match) (baf accum)))
    ((eq? (car match) 'or)
     (datum->syntax for-syntax
                    (cg-or for-syntax (cdr match) (baf accum))))
@@ -286,29 +284,31 @@
 
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and for-syntax arglst accum)
-  (safe-bind
-   (str strlen at body)
-   `(lambda (,str ,strlen ,at)
-      (let ((,body '()))
-        ,(cg-and-int for-syntax arglst accum str strlen at body)))))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at))
+        (body (syntax body)))
+   #`(lambda (#,str #,strlen #,at)
+      (let ((#,body '()))
+        #,(cg-and-int for-syntax arglst accum str strlen at body)))))
 
 ;; Internal function builder for AND (calls itself).
 (define (cg-and-int for-syntax arglst accum str strlen at body)
-  (safe-bind
-   (res newat newbody)
+  (let ((res (syntax res))
+        (newat (syntax newat))
+        (newbody (syntax newbody)))
    (if (null? arglst)
-       (cggr for-syntax accum 'cg-and `(reverse ,body) at) ;; base case
-       (let ((mf (syntax->datum
-                  (peg-sexp-compile for-syntax (car arglst) accum)))) ;; match function
-         `(let ((,res (,mf ,str ,strlen ,at)))
-            (if (not ,res) 
+       (cggr-syn for-syntax accum 'cg-and #`(reverse #,body) at) ;; base case
+       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
+         #`(let ((#,res (#,mf #,str #,strlen #,at)))
+            (if (not #,res) 
                 #f ;; if the match failed, the and failed
                 ;; otherwise update AT and BODY then recurse
-                (let ((,newat (car ,res))
-                      (,newbody (cadr ,res)))
-                  (set! ,at ,newat)
-                  ((@@ (ice-9 peg) push-not-null!) ,body ((@@ (ice-9 peg) single-filter) ,newbody))
-                  ,(cg-and-int for-syntax (cdr arglst) accum str strlen at body))))))))
+                (let ((#,newat (car #,res))
+                      (#,newbody (cadr #,res)))
+                  (set! #,at #,newat)
+                  ((@@ (ice-9 peg) push-not-null!) #,body ((@@ (ice-9 peg) single-filter) #,newbody))
+                  #,(cg-and-int for-syntax (cdr arglst) accum str strlen at body))))))))
 
 ;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
 (define (cg-or for-syntax arglst accum)
-- 
1.7.4


[-- Attachment #17: 0016-module-ice-9-peg.scm-cg-or-and-cg-or-int-now-return-.patch --]
[-- Type: application/octet-stream, Size: 2479 bytes --]

From e5582e64144958b98f7abf66c059737530a7fdd7 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Mon, 31 Jan 2011 15:08:32 -0500
Subject: [PATCH 16/20]  * module/ice-9/peg.scm: cg-or and cg-or-int now return syntax instead of
     s-expressions.

---
 module/ice-9/peg.scm |   27 +++++++++++++--------------
 1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 153bba1..5def0e0 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -257,8 +257,7 @@
    ((eq? (car match) 'and)
     (cg-and for-syntax (cdr match) (baf accum)))
    ((eq? (car match) 'or)
-    (datum->syntax for-syntax
-                   (cg-or for-syntax (cdr match) (baf accum))))
+    (cg-or for-syntax (cdr match) (baf accum)))
    ((eq? (car match) 'body)
     (if (not (= (length match) 4))
         (datum->syntax for-syntax
@@ -312,23 +311,23 @@
 
 ;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
 (define (cg-or for-syntax arglst accum)
-  (safe-bind
-   (str strlen at body)
-   `(lambda (,str ,strlen ,at)
-      ,(cg-or-int for-syntax arglst accum str strlen at body))))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at))
+        (body (syntax body)))
+   #`(lambda (#,str #,strlen #,at)
+      #,(cg-or-int for-syntax arglst accum str strlen at body))))
 
 ;; Internal function builder for OR (calls itself).
 (define (cg-or-int for-syntax arglst accum str strlen at body)
-  (safe-bind
-   (res)
+  (let ((res (syntax res)))
    (if (null? arglst)
        #f ;; base case
-       (let ((mf (syntax->datum
-                  (peg-sexp-compile for-syntax (car arglst) accum))))
-         `(let ((,res (,mf ,str ,strlen ,at)))
-            (if ,res ;; if the match succeeds, we're done
-                ,(cggr for-syntax accum 'cg-or `(cadr ,res) `(car ,res))
-                ,(cg-or-int for-syntax (cdr arglst) accum str strlen at body)))))))
+       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
+         #`(let ((#,res (#,mf #,str #,strlen #,at)))
+            (if #,res ;; if the match succeeds, we're done
+                #,(cggr-syn for-syntax accum 'cg-or #`(cadr #,res) #`(car #,res))
+                #,(cg-or-int for-syntax (cdr arglst) accum str strlen at body)))))))
 
 ;; Returns a block of code that tries to match MATCH, and on success updates AT
 ;; and BODY, return #f on failure and #t on success.
-- 
1.7.4


[-- Attachment #18: 0017-module-ice-9-peg.scm-cg-body-cg-body-success-cg-body.patch --]
[-- Type: application/octet-stream, Size: 5226 bytes --]

From e8d9a8050a9c2e5d783e43ee06d0a577393cd33f Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Tue, 1 Feb 2011 10:36:08 -0500
Subject: [PATCH 17/20]  * module/ice-9/peg.scm: cg-body, cg-body-success, cg-body-more, and
     cg-body-ret now return syntax instead of s-expressions.

---
 module/ice-9/peg.scm |   71 ++++++++++++++++++++++++++-----------------------
 1 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 5def0e0..ece798b 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -332,66 +332,71 @@
 ;; Returns a block of code that tries to match MATCH, and on success updates AT
 ;; and BODY, return #f on failure and #t on success.
 (define (cg-body-test for-syntax match accum str strlen at body)
-  (safe-bind
-   (at2-body2 at2 body2)
-   (let ((mf (syntax->datum
-              (peg-sexp-compile for-syntax match accum))))
-     `(let ((,at2-body2 (,mf ,str ,strlen ,at)))
-        (if (or (not ,at2-body2) (= ,at (car ,at2-body2)))
+  (let ((at2-body2 (syntax at2-body2))
+        (at2 (syntax at2))
+        (body2 (syntax body2)))
+   (let ((mf (peg-sexp-compile for-syntax match accum)))
+     #`(let ((#,at2-body2 (#,mf #,str #,strlen #,at)))
+        (if (or (not #,at2-body2) (= #,at (car #,at2-body2)))
             #f
-            (let ((,at2 (car ,at2-body2))
-                  (,body2 (cadr ,at2-body2)))
-              (set! ,at ,at2)
+            (let ((#,at2 (car #,at2-body2))
+                  (#,body2 (cadr #,at2-body2)))
+              (set! #,at #,at2)
               ((@@ (ice-9 peg) push-not-null!)
-               ,body
-               ((@@ (ice-9 peg) single-filter) ,body2))
+               #,body
+               ((@@ (ice-9 peg) single-filter) #,body2))
               #t))))))
 
 ;; Returns a block of code that sees whether NUM wants us to try and match more
 ;; given that we've already matched COUNT.
 (define (cg-body-more for-syntax num count)
-  (cond ((number? num) `(< ,count ,num))
+  (cond ((number? num) #`(< #,count #,(datum->syntax for-syntax num)))
         ((eq? num '+) #t)
         ((eq? num '*) #t)
-        ((eq? num '?) `(< ,count 1))
+        ((eq? num '?) #`(< #,count 1))
         (#t (error-val `(cg-body-more-error ,num ,count)))))
 
 ;; Returns a function that takes a paramter indicating whether or not the match
 ;; was succesful and returns what the body expression should return.
 (define (cg-body-ret for-syntax accum type name body at at2)
-  (safe-bind
-   (success)
-   `(lambda (,success)
-      ,(cond ((eq? type '!)
-              `(if ,success #f ,(cggr for-syntax accum name ''() at)))
+  (let ((success (syntax success)))
+   #`(lambda (#,success)
+      #,(cond ((eq? type '!)
+              #`(if #,success #f #,(cggr-syn for-syntax accum name ''() at)))
              ((eq? type '&)
-              `(if ,success ,(cggr for-syntax accum name ''() at) #f))
+              #`(if #,success #,(cggr-syn for-syntax accum name ''() at) #f))
              ((eq? type 'lit)
-              `(if ,success
-                   ,(cggr for-syntax accum name `(reverse ,body) at2) #f))
+              #`(if #,success
+                   #,(cggr-syn for-syntax accum name #`(reverse #,body) at2) #f))
              (#t (error-val
                   `(cg-body-ret-error ,type ,accum ,name ,body ,at ,at2)))))))
 
 ;; Returns a block of code that sees whether COUNT satisfies the constraints of
 ;; NUM.
 (define (cg-body-success for-syntax num count)
-  (cond ((number? num) `(= ,count ,num))
-        ((eq? num '+) `(>= ,count 1))
+  (cond ((number? num) #`(= #,count #,num))
+        ((eq? num '+) #`(>= #,count 1))
         ((eq? num '*) #t)
-        ((eq? num '?) `(<= ,count 1))
+        ((eq? num '?) #`(<= #,count 1))
         (#t `(cg-body-success-error ,num))))
 
 ;; Returns a function that parses a BODY element.
 (define (cg-body for-syntax accum type match num)
-  (safe-bind
-   (str strlen at at2 count body)
-   `(lambda (,str ,strlen ,at)
-      (let ((,at2 ,at) (,count 0) (,body '()))
-        (while (and ,(cg-body-test for-syntax match accum str strlen at2 body)
-                    (set! ,count (+ ,count 1))
-                    ,(cg-body-more for-syntax num count)))
-        (,(cg-body-ret for-syntax accum type 'cg-body body at at2)
-         ,(cg-body-success for-syntax num count))))))
+  (let ((str (syntax str))
+        (strlen (syntax strlen))
+        (at (syntax at))
+        ; this next one doesn't work with (syntax at2), and I'd really
+        ; like to know why.
+        (at2 (datum->syntax for-syntax (gensym)))
+        (count (syntax count))
+        (body (syntax body)))
+   #`(lambda (#,str #,strlen #,at)
+      (let ((#,at2 #,at) (#,count 0) (#,body '()))
+        (while (and #,(cg-body-test for-syntax match accum str strlen at2 body)
+                    (set! #,count (+ #,count 1))
+                    #,(cg-body-more for-syntax num count)))
+        (#,(cg-body-ret for-syntax accum type 'cg-body body at at2)
+         #,(cg-body-success for-syntax num count))))))
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; FOR DEFINING AND USING NONTERMINALS
-- 
1.7.4


[-- Attachment #19: 0018-module-ice-9-peg.scm-remove-the-unused-non-syntax-ve.patch --]
[-- Type: application/octet-stream, Size: 7295 bytes --]

From b0bf9c19b10e54e54d8354805ed06eb0a32f177d Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Tue, 1 Feb 2011 10:41:20 -0500
Subject: [PATCH 18/20]  * module/ice-9/peg.scm: remove the unused non-syntax versions of cggl and
     cggr, and rename the syntax versions to take their place.

---
 module/ice-9/peg.scm |   91 ++++++++++++++++----------------------------------
 1 files changed, 29 insertions(+), 62 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index ece798b..5675a3c 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -92,20 +92,16 @@
 ;; Code we generate will be defined in a function, and always has to test
 ;; whether it's beyond the bounds of the string before it executes.
 (define (cg-generic-lambda for-syntax str strlen at code)
-  `(lambda (,str ,strlen ,at)
-     (if (>= ,at ,strlen)
-         #f
-         ,code)))
-;; The short name makes the formatting below much easier to read.
-(define cggl cg-generic-lambda)
-
-(define (cggl-syn for-syntax str strlen at code)
   ;; all arguments are syntax
   #`(lambda (#,str #,strlen #,at)
      (if (>= #,at #,strlen)
          #f
          #,code)))
 
+;; The short name makes the formatting below much easier to read.
+(define cggl cg-generic-lambda)
+
+
 ;; Optimizations for CG-GENERIC-RET below...
 (define *op-known-single-body* '(cg-string cg-peg-any cg-range))
 ;; ...done with optimizations (could use more of these).
@@ -113,38 +109,6 @@
 ;; Code we generate will have a certain return structure depending on how we're
 ;; accumulating (the ACCUM variable).
 (define (cg-generic-ret for-syntax accum name body-uneval at)
-  (safe-bind
-   (body)
-   `(let ((,body ,body-uneval))
-      ,(cond
-        ((and (eq? accum 'all) name body)
-         `(list ,at
-                (cond
-                 ((not (list? ,body)) (list ',name ,body))
-                 ((null? ,body) ',name)
-                 ((symbol? (car ,body)) (list ',name ,body))
-                 (#t (cons ',name ,body)))))
-        ((and (eq? accum 'name) name)
-         `(list ,at ',name))
-        ((and (eq? accum 'body) body)
-         (cond
-          ((member name *op-known-single-body*)
-           `(list ,at ,body))
-          (#t `(list ,at
-                     (cond
-                      (((@@ (ice-9 peg) single?) ,body) (car ,body))
-                      (#t ,body))))))
-        ((eq? accum 'none)
-         `(list ,at '()))
-        (#t
-         (begin
-           (pretty-print `(cg-generic-ret-error ,accum ,name ,body-uneval ,at))
-           (pretty-print "Defaulting to accum of none.\n")
-           `(list ,at '())))))))
-;; The short name makes the formatting below much easier to read.
-(define cggr cg-generic-ret)
-
-(define (cggr-syn for-syntax accum name body-uneval at)
   ;; name, body-uneval and at are syntax
   #`(let ((body #,body-uneval))
      #,(cond
@@ -173,6 +137,9 @@
            (pretty-print "Defaulting to accum of none.\n")
            #`(list #,at '()))))))
 
+;; The short name makes the formatting below much easier to read.
+(define cggr cg-generic-ret)
+
 ;; Generates code that matches a particular string.
 ;; E.g.: (cg-string syntax "abc" 'body)
 (define (cg-string for-syntax match accum)
@@ -180,12 +147,12 @@
         (strlen (syntax strlen))
         (at (syntax at))
         (len (string-length match)))
-     (cggl-syn for-syntax str strlen at
-               #`(if (string=? (substring #,str #,at (min (+ #,at #,len) #,strlen))
-                              #,match)
-                    #,(cggr-syn for-syntax accum 'cg-string match
-                                #`(+ #,at #,len))
-                    #f))))
+     (cggl for-syntax str strlen at
+           #`(if (string=? (substring #,str #,at (min (+ #,at #,len) #,strlen))
+                           #,match)
+                 #,(cggr for-syntax accum 'cg-string match
+                         #`(+ #,at #,len))
+                 #f))))
 
 ;; Generates code for matching any character.
 ;; E.g.: (cg-peg-any syntax 'body)
@@ -193,10 +160,10 @@
   (let ((str (syntax str))
         (strlen (syntax strlen))
         (at (syntax at)))
-    (cggl-syn for-syntax str strlen at
-          (cggr-syn for-syntax accum
-                    'cg-peg-any #`(substring #,str #,at (+ #,at 1))
-                    #`(+ #,at 1)))))
+    (cggl for-syntax str strlen at
+          (cggr for-syntax accum
+                'cg-peg-any #`(substring #,str #,at (+ #,at 1))
+                #`(+ #,at 1)))))
 
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
@@ -205,14 +172,14 @@
         (strlen (syntax strlen))
         (at (syntax at))
         (c (syntax c)))
-    (cggl-syn for-syntax str strlen at
+    (cggl for-syntax str strlen at
           #`(let ((#,c (string-ref #,str #,at)))
-             (if (and
-                  (char>=? #,c #,start)
-                  (char<=? #,c #,end))
-                 #,(cggr-syn for-syntax accum 'cg-range
-                             #`(string #,c) #`(+ #,at 1))
-                 #f)))))
+              (if (and
+                   (char>=? #,c #,start)
+                   (char<=? #,c #,end))
+                  #,(cggr for-syntax accum 'cg-range
+                          #`(string #,c) #`(+ #,at 1))
+                  #f)))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
 ;; literals (since we don't want to tag them with their name if we're doing an
@@ -297,7 +264,7 @@
         (newat (syntax newat))
         (newbody (syntax newbody)))
    (if (null? arglst)
-       (cggr-syn for-syntax accum 'cg-and #`(reverse #,body) at) ;; base case
+       (cggr for-syntax accum 'cg-and #`(reverse #,body) at) ;; base case
        (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
          #`(let ((#,res (#,mf #,str #,strlen #,at)))
             (if (not #,res) 
@@ -326,7 +293,7 @@
        (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
          #`(let ((#,res (#,mf #,str #,strlen #,at)))
             (if #,res ;; if the match succeeds, we're done
-                #,(cggr-syn for-syntax accum 'cg-or #`(cadr #,res) #`(car #,res))
+                #,(cggr for-syntax accum 'cg-or #`(cadr #,res) #`(car #,res))
                 #,(cg-or-int for-syntax (cdr arglst) accum str strlen at body)))))))
 
 ;; Returns a block of code that tries to match MATCH, and on success updates AT
@@ -362,12 +329,12 @@
   (let ((success (syntax success)))
    #`(lambda (#,success)
       #,(cond ((eq? type '!)
-              #`(if #,success #f #,(cggr-syn for-syntax accum name ''() at)))
+              #`(if #,success #f #,(cggr for-syntax accum name ''() at)))
              ((eq? type '&)
-              #`(if #,success #,(cggr-syn for-syntax accum name ''() at) #f))
+              #`(if #,success #,(cggr for-syntax accum name ''() at) #f))
              ((eq? type 'lit)
               #`(if #,success
-                   #,(cggr-syn for-syntax accum name #`(reverse #,body) at2) #f))
+                   #,(cggr for-syntax accum name #`(reverse #,body) at2) #f))
              (#t (error-val
                   `(cg-body-ret-error ,type ,accum ,name ,body ,at ,at2)))))))
 
-- 
1.7.4


[-- Attachment #20: 0019-module-ice-9-peg.scm-remove-the-unused-safe-bind-and.patch --]
[-- Type: application/octet-stream, Size: 2259 bytes --]

From cdfd04a571ac0803f5c9ec7bfaf612866b957ba3 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Tue, 1 Feb 2011 10:42:50 -0500
Subject: [PATCH 19/20]  * module/ice-9/peg.scm: remove the unused safe-bind and safe-bind-f.

---
 module/ice-9/peg.scm |   40 ----------------------------------------
 1 files changed, 0 insertions(+), 40 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 5675a3c..105bef6 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -8,46 +8,6 @@
 (eval-when (compile load eval)
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; CONVENIENCE MACROS
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(define (eeval exp)
-  (eval exp (interaction-environment)))
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; MACRO BUILDERS
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-;; Safe-bind helps to bind macros safely.
-;; e.g.:
-;; (safe-bind
-;;  (a b)
-;;  `(,a ,b))
-;; gives:
-;; (#<uninterned-symbol a cc608d0> #<uninterned-symbol b cc608a0>)
-(define-syntax safe-bind
-  (lambda (x)
-    (syntax-case x ()
-      ((_ vals . actions)
-       (datum->syntax x (apply safe-bind-f
-                               (cons
-                                (syntax->datum #'vals)
-                                (syntax->datum #'actions))))))))
-;; (define-macro (safe-bind vals . actions)
-;;   (apply safe-bind-f (cons vals actions)))
-(define (safe-bind-f vals . actions)
-  `(let ,(map (lambda (val) `(,val (make-symbol ,(symbol->string val)))) vals)
-     ,@actions))
-
-;; Unsafe-bind is like safe-bind but uses symbols that are easier to read while
-;; debugging rather than safe ones.  Currently unused.
-;; (define-macro (unsafe-bind vals . actions)
-;;   (apply unsafe-bind-f (cons vals actions)))
-;; (define (unsafe-bind-f vals . actions)
-;;   `(let ,(map (lambda (val) `(,val ',val)) vals)
-;;      ,@actions))
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; LOOPING CONSTRUCTS
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-- 
1.7.4


[-- Attachment #21: 0020-module-ice-9-peg.scm-remove-some-unnecessary-lets.patch --]
[-- Type: application/octet-stream, Size: 10620 bytes --]

From 99fd23ec19ac298589fadc8e0a136d32e1ff43c2 Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Tue, 1 Feb 2011 15:15:54 -0500
Subject: [PATCH 20/20]  * module/ice-9/peg.scm: remove some unnecessary lets.

---
 module/ice-9/peg.scm |  159 ++++++++++++++++++++------------------------------
 1 files changed, 64 insertions(+), 95 deletions(-)

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index 105bef6..e87c5a1 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -61,7 +61,6 @@
 ;; The short name makes the formatting below much easier to read.
 (define cggl cg-generic-lambda)
 
-
 ;; Optimizations for CG-GENERIC-RET below...
 (define *op-known-single-body* '(cg-string cg-peg-any cg-range))
 ;; ...done with optimizations (could use more of these).
@@ -103,43 +102,33 @@
 ;; Generates code that matches a particular string.
 ;; E.g.: (cg-string syntax "abc" 'body)
 (define (cg-string for-syntax match accum)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at))
-        (len (string-length match)))
-     (cggl for-syntax str strlen at
-           #`(if (string=? (substring #,str #,at (min (+ #,at #,len) #,strlen))
-                           #,match)
-                 #,(cggr for-syntax accum 'cg-string match
-                         #`(+ #,at #,len))
-                 #f))))
+  (let ((len (string-length match)))
+    (cggl for-syntax #'str #'strlen #'at
+          #`(if (string=? (substring str at (min (+ at #,len) strlen))
+                          #,match)
+                #,(cggr for-syntax accum 'cg-string match
+                        #`(+ at #,len))
+                #f))))
 
 ;; Generates code for matching any character.
 ;; E.g.: (cg-peg-any syntax 'body)
 (define (cg-peg-any for-syntax accum)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at)))
-    (cggl for-syntax str strlen at
-          (cggr for-syntax accum
-                'cg-peg-any #`(substring #,str #,at (+ #,at 1))
-                #`(+ #,at 1)))))
+  (cggl for-syntax #'str #'strlen #'at
+        (cggr for-syntax accum
+              'cg-peg-any #`(substring str at (+ at 1))
+              #`(+ at 1))))
 
 ;; Generates code for matching a range of characters between start and end.
 ;; E.g.: (cg-range syntax #\a #\z 'body)
 (define (cg-range for-syntax start end accum)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at))
-        (c (syntax c)))
-    (cggl for-syntax str strlen at
-          #`(let ((#,c (string-ref #,str #,at)))
-              (if (and
-                   (char>=? #,c #,start)
-                   (char<=? #,c #,end))
-                  #,(cggr for-syntax accum 'cg-range
-                          #`(string #,c) #`(+ #,at 1))
-                  #f)))))
+  (cggl for-syntax #'str #'strlen #'at
+        #`(let ((c (string-ref str at)))
+            (if (and
+                 (char>=? c #,start)
+                 (char<=? c #,end))
+                #,(cggr for-syntax accum 'cg-range
+                        #`(string c) #`(+ at 1))
+                #f))))
 
 ;; Filters the accum argument to peg-sexp-compile for buildings like string
 ;; literals (since we don't want to tag them with their name if we're doing an
@@ -210,69 +199,54 @@
 
 ;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
 (define (cg-and for-syntax arglst accum)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at))
-        (body (syntax body)))
-   #`(lambda (#,str #,strlen #,at)
-      (let ((#,body '()))
-        #,(cg-and-int for-syntax arglst accum str strlen at body)))))
+   #`(lambda (str strlen at)
+      (let ((body '()))
+        #,(cg-and-int for-syntax arglst accum #'str #'strlen #'at #'body))))
 
 ;; Internal function builder for AND (calls itself).
 (define (cg-and-int for-syntax arglst accum str strlen at body)
-  (let ((res (syntax res))
-        (newat (syntax newat))
-        (newbody (syntax newbody)))
-   (if (null? arglst)
-       (cggr for-syntax accum 'cg-and #`(reverse #,body) at) ;; base case
-       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
-         #`(let ((#,res (#,mf #,str #,strlen #,at)))
-            (if (not #,res) 
+  (if (null? arglst)
+      (cggr for-syntax accum 'cg-and #`(reverse #,body) at) ;; base case
+      (let ((mf (peg-sexp-compile for-syntax (car arglst) accum))) ;; match function
+        #`(let ((res (#,mf #,str #,strlen #,at)))
+            (if (not res) 
                 #f ;; if the match failed, the and failed
                 ;; otherwise update AT and BODY then recurse
-                (let ((#,newat (car #,res))
-                      (#,newbody (cadr #,res)))
-                  (set! #,at #,newat)
-                  ((@@ (ice-9 peg) push-not-null!) #,body ((@@ (ice-9 peg) single-filter) #,newbody))
-                  #,(cg-and-int for-syntax (cdr arglst) accum str strlen at body))))))))
+                (let ((newat (car res))
+                      (newbody (cadr res)))
+                  (set! #,at newat)
+                  ((@@ (ice-9 peg) push-not-null!) #,body ((@@ (ice-9 peg) single-filter) newbody))
+                  #,(cg-and-int for-syntax (cdr arglst) accum str strlen at body)))))))
 
 ;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
 (define (cg-or for-syntax arglst accum)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at))
-        (body (syntax body)))
-   #`(lambda (#,str #,strlen #,at)
-      #,(cg-or-int for-syntax arglst accum str strlen at body))))
+  #`(lambda (str strlen at)
+      #,(cg-or-int for-syntax arglst accum #'str #'strlen #'at #'body)))
 
 ;; Internal function builder for OR (calls itself).
 (define (cg-or-int for-syntax arglst accum str strlen at body)
-  (let ((res (syntax res)))
-   (if (null? arglst)
-       #f ;; base case
-       (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
-         #`(let ((#,res (#,mf #,str #,strlen #,at)))
-            (if #,res ;; if the match succeeds, we're done
-                #,(cggr for-syntax accum 'cg-or #`(cadr #,res) #`(car #,res))
-                #,(cg-or-int for-syntax (cdr arglst) accum str strlen at body)))))))
+  (if (null? arglst)
+      #f ;; base case
+      (let ((mf (peg-sexp-compile for-syntax (car arglst) accum)))
+        #`(let ((res (#,mf #,str #,strlen #,at)))
+            (if res ;; if the match succeeds, we're done
+                #,(cggr for-syntax accum 'cg-or #`(cadr res) #`(car res))
+                #,(cg-or-int for-syntax (cdr arglst) accum str strlen at body))))))
 
 ;; Returns a block of code that tries to match MATCH, and on success updates AT
 ;; and BODY, return #f on failure and #t on success.
 (define (cg-body-test for-syntax match accum str strlen at body)
-  (let ((at2-body2 (syntax at2-body2))
-        (at2 (syntax at2))
-        (body2 (syntax body2)))
-   (let ((mf (peg-sexp-compile for-syntax match accum)))
-     #`(let ((#,at2-body2 (#,mf #,str #,strlen #,at)))
-        (if (or (not #,at2-body2) (= #,at (car #,at2-body2)))
+  (let ((mf (peg-sexp-compile for-syntax match accum)))
+    #`(let ((at2-body2 (#,mf #,str #,strlen #,at)))
+        (if (or (not at2-body2) (= #,at (car at2-body2)))
             #f
-            (let ((#,at2 (car #,at2-body2))
-                  (#,body2 (cadr #,at2-body2)))
-              (set! #,at #,at2)
+            (let ((at2 (car at2-body2))
+                  (body2 (cadr at2-body2)))
+              (set! #,at at2)
               ((@@ (ice-9 peg) push-not-null!)
                #,body
-               ((@@ (ice-9 peg) single-filter) #,body2))
-              #t))))))
+               ((@@ (ice-9 peg) single-filter) body2))
+              #t)))))
 
 ;; Returns a block of code that sees whether NUM wants us to try and match more
 ;; given that we've already matched COUNT.
@@ -286,17 +260,16 @@
 ;; Returns a function that takes a paramter indicating whether or not the match
 ;; was succesful and returns what the body expression should return.
 (define (cg-body-ret for-syntax accum type name body at at2)
-  (let ((success (syntax success)))
-   #`(lambda (#,success)
+   #`(lambda (success)
       #,(cond ((eq? type '!)
-              #`(if #,success #f #,(cggr for-syntax accum name ''() at)))
+              #`(if success #f #,(cggr for-syntax accum name ''() at)))
              ((eq? type '&)
-              #`(if #,success #,(cggr for-syntax accum name ''() at) #f))
+              #`(if success #,(cggr for-syntax accum name ''() at) #f))
              ((eq? type 'lit)
-              #`(if #,success
+              #`(if success
                    #,(cggr for-syntax accum name #`(reverse #,body) at2) #f))
              (#t (error-val
-                  `(cg-body-ret-error ,type ,accum ,name ,body ,at ,at2)))))))
+                  `(cg-body-ret-error ,type ,accum ,name ,body ,at ,at2))))))
 
 ;; Returns a block of code that sees whether COUNT satisfies the constraints of
 ;; NUM.
@@ -309,21 +282,17 @@
 
 ;; Returns a function that parses a BODY element.
 (define (cg-body for-syntax accum type match num)
-  (let ((str (syntax str))
-        (strlen (syntax strlen))
-        (at (syntax at))
-        ; this next one doesn't work with (syntax at2), and I'd really
+  (let (; this doesn't work with regular syntax, and I'd really
         ; like to know why.
-        (at2 (datum->syntax for-syntax (gensym)))
-        (count (syntax count))
-        (body (syntax body)))
-   #`(lambda (#,str #,strlen #,at)
-      (let ((#,at2 #,at) (#,count 0) (#,body '()))
-        (while (and #,(cg-body-test for-syntax match accum str strlen at2 body)
-                    (set! #,count (+ #,count 1))
-                    #,(cg-body-more for-syntax num count)))
-        (#,(cg-body-ret for-syntax accum type 'cg-body body at at2)
-         #,(cg-body-success for-syntax num count))))))
+        (at2 (datum->syntax for-syntax (gensym))))
+   #`(lambda (str strlen at)
+      (let ((#,at2 at) (count 0) (body '()))
+        (while (and #,(cg-body-test for-syntax match accum
+                                    #'str #'strlen at2 #'body)
+                    (set! count (+ count 1))
+                    #,(cg-body-more for-syntax num #'count)))
+        (#,(cg-body-ret for-syntax accum type 'cg-body #'body #'at at2)
+         #,(cg-body-success for-syntax num #'count))))))
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; FOR DEFINING AND USING NONTERMINALS
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-02-02  0:26                               ` Noah Lavine
@ 2011-02-06 15:31                                 ` Noah Lavine
  2011-02-18 22:03                                 ` Andy Wingo
  1 sibling, 0 replies; 27+ messages in thread
From: Noah Lavine @ 2011-02-06 15:31 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

Hi,

> However, I have a suspicion something odd is going on and this does
> not contain all of the work it should from guile's repo. I ended up
> having to use git cherry-pick instead of git rebase because I had made
> a mistake when I first made my local peg branch. I checked out
> origin/wip-mlucy and then cherry-picked my commits on top of that.
>
> It worked well, but in one of my earlier attempts before I decided
> that git-rebase wasn't going to work, I saw that someone had renamed
> peg-sexp-compile to cg-match-func and added some more test cases to
> peg.test. Now I don't see those additions, and if they're not in
> wip-mlucy then I'm not sure where to find them. Does anyone know where
> I should look for that?

Never mind that, sorry. There was a rename, but I was on the correct
side of it. I somehow looked at old code and got confused and thought
it was new code. The patches should apply cleanly to the head of
wip-mlucy, and everything should be fine.

Noah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-02-02  0:26                               ` Noah Lavine
  2011-02-06 15:31                                 ` Noah Lavine
@ 2011-02-18 22:03                                 ` Andy Wingo
  2011-02-23 15:10                                   ` Noah Lavine
  1 sibling, 1 reply; 27+ messages in thread
From: Andy Wingo @ 2011-02-18 22:03 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

On Wed 02 Feb 2011 01:26, Noah Lavine <noah.b.lavine@gmail.com> writes:

> Here it is! All of the unhygienic syntax is gone, is a series of only
> 20 commits. :-) The peg.test tests should all pass after each one of
> these commits.

Thanks!  You've probably seen that I've applied this to wip-mlucy, which
we should probably rename wip-peg.  I've also added on a number of
cleanups of my own, some of which I will push out shortly when my ISP
figures out the route to git.sv.gnu.org again (hah).

The branch still needs some work before it can go in.  I have a feeling
that it should probably be split into two modules -- one providing the
things that peg-sexp-compile needs (minus `peg' patterns perhaps?) and
another that uses the "base" library to define a PEG grammar.  Perhaps?
In any case we need to not have the entire thing in one big ol'
eval-when.

Also, the documentation needs some help, and perhaps the patterns need
some tweaking -- for example (& pat) makes more sense than (body & pat
1) or the like.

I think Michael's work was pretty great, especially considering the
scope of the problem.  It has the potential to have so wide an impact
that we should focus on making it have exactly the right interface
before we merge it in.

Onwards, upwards, etc.!

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-02-18 22:03                                 ` Andy Wingo
@ 2011-02-23 15:10                                   ` Noah Lavine
  2011-03-04 10:52                                     ` Andy Wingo
  0 siblings, 1 reply; 27+ messages in thread
From: Noah Lavine @ 2011-02-23 15:10 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

Hello,

I tried looking at it a bit. First of all, your cleanups are awesome. :)

Unfortunately, I need to ask for help in order to work on this more. I
thought I would first move the code-generating functions to their own
module. It seems like this should be a simple and obviously-correct
transformation, because I didn't change any code - just moved it to
its own module, and replaced its definition by a (use-modules ...)
clause. Yet I have somehow created an error, and I don't see why. If
you have time, could someone please explain why the attached patch
does not work? I am afraid there is some interaction between modules
and syntax generators that I don't understand.

The patch will apply to the wip-mlucy branch.

Thanks,
Noah

On Fri, Feb 18, 2011 at 5:03 PM, Andy Wingo <wingo@pobox.com> wrote:
> On Wed 02 Feb 2011 01:26, Noah Lavine <noah.b.lavine@gmail.com> writes:
>
>> Here it is! All of the unhygienic syntax is gone, is a series of only
>> 20 commits. :-) The peg.test tests should all pass after each one of
>> these commits.
>
> Thanks!  You've probably seen that I've applied this to wip-mlucy, which
> we should probably rename wip-peg.  I've also added on a number of
> cleanups of my own, some of which I will push out shortly when my ISP
> figures out the route to git.sv.gnu.org again (hah).
>
> The branch still needs some work before it can go in.  I have a feeling
> that it should probably be split into two modules -- one providing the
> things that peg-sexp-compile needs (minus `peg' patterns perhaps?) and
> another that uses the "base" library to define a PEG grammar.  Perhaps?
> In any case we need to not have the entire thing in one big ol'
> eval-when.
>
> Also, the documentation needs some help, and perhaps the patterns need
> some tweaking -- for example (& pat) makes more sense than (body & pat
> 1) or the like.
>
> I think Michael's work was pretty great, especially considering the
> scope of the problem.  It has the potential to have so wide an impact
> that we should focus on making it have exactly the right interface
> before we merge it in.
>
> Onwards, upwards, etc.!
>
> Andy
> --
> http://wingolog.org/
>

[-- Attachment #2: 0001-Move-PEG-code-generators-into-their-own-module.patch --]
[-- Type: application/octet-stream, Size: 19365 bytes --]

From 20692168cdb84c51bdcf3266572ed6619fd9c9cc Mon Sep 17 00:00:00 2001
From: Noah Lavine <nlavine@haverford.edu>
Date: Wed, 23 Feb 2011 10:03:45 -0500
Subject: [PATCH] Move PEG code generators into their own module

 * module/ice-9/peg.scm: take out the cg-* functions and peg-sexp-compile
 * module/ice-9/peg/codegen.scm: and put them in here.
---
 module/ice-9/peg.scm         |  214 +-------------------------------------
 module/ice-9/peg/codegen.scm |  237 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 242 insertions(+), 209 deletions(-)
 create mode 100644 module/ice-9/peg/codegen.scm

diff --git a/module/ice-9/peg.scm b/module/ice-9/peg.scm
index d91a74e..0d60e73 100644
--- a/module/ice-9/peg.scm
+++ b/module/ice-9/peg.scm
@@ -18,8 +18,7 @@
 ;;;;
 
 (define-module (ice-9 peg)
-  #:export (peg-sexp-compile
-            peg-string-compile
+  #:export (peg-string-compile
             context-flatten
             peg-parse
             define-nonterm
@@ -35,7 +34,9 @@
             peg-record?
             keyword-flatten)
   #:use-module (system base pmatch)
-  #:use-module (ice-9 pretty-print))
+  #:use-module (ice-9 pretty-print)
+  #:use-module (ice-9 peg codegen)
+  #:re-export (peg-sexp-compile))
 
 ;;;
 ;;; Helper Macros
@@ -64,214 +65,9 @@ execute the STMTs and try again."
     ((_ lst obj)
      (set! lst (cons obj lst)))))
 
-(define-syntax single-filter
-  (syntax-rules ()
-    "If EXP is a list of one element, return the element.  Otherwise
-return EXP."
-    ((_ exp)
-     (pmatch exp
-       ((,elt) elt)
-       (,elts elts)))))
-
-(define-syntax push-not-null!
-  (syntax-rules ()
-    "If OBJ is non-null, push it onto LST, otherwise do nothing."
-    ((_ lst obj)
-     (if (not (null? obj))
-         (push! lst obj)))))
-
 (eval-when (compile load eval)
 
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;;;; CODE GENERATORS
-;; These functions generate scheme code for parsing PEGs.
-;; Conventions:
-;;   accum: (all name body none)
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-;; Code we generate will have a certain return structure depending on how we're
-;; accumulating (the ACCUM variable).
-(define (cg-generic-ret accum name body-uneval at)
-  ;; name, body-uneval and at are syntax
-  #`(let ((body #,body-uneval))
-     #,(cond
-        ((and (eq? accum 'all) name)
-         #`(list #,at
-                 (cond
-                  ((not (list? body)) (list '#,name body))
-                  ((null? body) '#,name)
-                  ((symbol? (car body)) (list '#,name body))
-                  (else (cons '#,name body)))))
-        ((eq? accum 'name)
-         #`(list #,at '#,name))
-        ((eq? accum 'body)
-         #`(list #,at
-                 (cond
-                  ((single? body) (car body))
-                  (else body))))
-        ((eq? accum 'none)
-         #`(list #,at '()))
-        (else
-         (begin
-           (pretty-print `(cg-generic-ret-error ,accum ,name ,body-uneval ,at))
-           (pretty-print "Defaulting to accum of none.\n")
-           #`(list #,at '()))))))
-
-;; The short name makes the formatting below much easier to read.
-(define cggr cg-generic-ret)
-
-;; Generates code that matches a particular string.
-;; E.g.: (cg-string syntax "abc" 'body)
-(define (cg-string pat accum)
-  (let ((plen (string-length pat)))
-    #`(lambda (str len pos)
-        (let ((end (+ pos #,plen)))
-          (and (<= end len)
-               (string= str #,pat pos end)
-               #,(case accum
-                   ((all) #`(list end (list 'cg-string #,pat)))
-                   ((name) #`(list end 'cg-string))
-                   ((body) #`(list end #,pat))
-                   ((none) #`(list end '()))
-                   (else (error "bad accum" accum))))))))
-
-;; Generates code for matching any character.
-;; E.g.: (cg-peg-any syntax 'body)
-(define (cg-peg-any accum)
-  #`(lambda (str len pos)
-      (and (< pos len)
-           #,(case accum
-               ((all) #`(list (1+ pos)
-                              (list 'cg-peg-any (substring str pos (1+ pos)))))
-               ((name) #`(list (1+ pos) 'cg-peg-any))
-               ((body) #`(list (1+ pos) (substring str pos (1+ pos))))
-               ((none) #`(list (1+ pos) '()))
-               (else (error "bad accum" accum))))))
-
-;; Generates code for matching a range of characters between start and end.
-;; E.g.: (cg-range syntax #\a #\z 'body)
-(define (cg-range start end accum)
-  #`(lambda (str len pos)
-      (and (< pos len)
-           (let ((c (string-ref str pos)))
-             (and (char>=? c #,start)
-                  (char<=? c #,end)
-                  #,(case accum
-                      ((all) #`(list (1+ pos) (list 'cg-range (string c))))
-                      ((name) #`(list (1+ pos) 'cg-range))
-                      ((body) #`(list (1+ pos) (string c)))
-                      ((none) #`(list (1+ pos) '()))
-                      (else (error "bad accum" accum))))))))
-
-;; Filters the accum argument to peg-sexp-compile for buildings like string
-;; literals (since we don't want to tag them with their name if we're doing an
-;; "all" accum).
-(define (builtin-accum-filter accum)
-  (cond
-   ((eq? accum 'all) 'body)
-   ((eq? accum 'name) 'name)
-   ((eq? accum 'body) 'body)
-   ((eq? accum 'none) 'none)))
-(define baf builtin-accum-filter)
-
-;; Takes an arbitrary expressions and accumulation variable, then parses it.
-;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
-(define (peg-sexp-compile pat accum)
-  (syntax-case pat (peg-any range ignore capture peg and or body)
-    (peg-any
-     (cg-peg-any (baf accum)))
-    (sym (identifier? #'sym) ;; nonterminal
-     #'sym)
-    (str (string? (syntax->datum #'str)) ;; literal string
-     (cg-string (syntax->datum #'str) (baf accum)))
-    ((range start end) ;; range of characters (e.g. [a-z])
-     (and (char? (syntax->datum #'start)) (char? (syntax->datum #'end)))
-     (cg-range (syntax->datum #'start) (syntax->datum #'end) (baf accum)))
-    ((ignore pat) ;; match but don't parse
-     (peg-sexp-compile #'pat 'none))
-    ((capture pat) ;; parse
-     (peg-sexp-compile #'pat 'body))
-    ((peg pat)  ;; embedded PEG string
-     (string? (syntax->datum #'pat))
-     (peg-string-compile #'pat (baf accum)))
-    ((and pat ...)
-     (cg-and #'(pat ...) (baf accum)))
-    ((or pat ...)
-     (cg-or #'(pat ...) (baf accum)))
-    ((body type pat num)
-     (cg-body (baf accum) #'type #'pat #'num))))
-
-;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
-(define (cg-and clauses accum)
-  #`(lambda (str len pos)
-      (let ((body '()))
-        #,(cg-and-int clauses accum #'str #'len #'pos #'body))))
-
-;; Internal function builder for AND (calls itself).
-(define (cg-and-int clauses accum str strlen at body)
-  (syntax-case clauses ()
-    (()
-     (cggr accum 'cg-and #`(reverse #,body) at))
-    ((first rest ...)
-     #`(let ((res (#,(peg-sexp-compile #'first accum) #,str #,strlen #,at)))
-         (and res 
-              ;; update AT and BODY then recurse
-              (let ((newat (car res))
-                    (newbody (cadr res)))
-                (set! #,at newat)
-                (push-not-null! #,body (single-filter newbody))
-                #,(cg-and-int #'(rest ...) accum str strlen at body)))))))
-
-;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
-(define (cg-or clauses accum)
-  #`(lambda (str len pos)
-      #,(cg-or-int clauses accum #'str #'len #'pos)))
-
-;; Internal function builder for OR (calls itself).
-(define (cg-or-int clauses accum str strlen at)
-  (syntax-case clauses ()
-    (()
-     #f)
-    ((first rest ...)
-     #`(or (#,(peg-sexp-compile #'first accum) #,str #,strlen #,at)
-           #,(cg-or-int #'(rest ...) accum str strlen at)))))
-
-;; Returns a function that parses a BODY element.
-(define (cg-body accum type pat num)
-  #`(lambda (str strlen at)
-      (let ((body '()))
-        (let lp ((end at) (count 0))
-          (let* ((match (#,(peg-sexp-compile pat accum) str strlen end))
-                 (new-end (if match (car match) end))
-                 (count (if (> new-end end) (1+ count) count)))
-            (if (> new-end end)
-                (push-not-null! body (single-filter (cadr match))))
-            (if (and (> new-end end)
-                     #,(syntax-case num (+ * ?)
-                         (n (number? (syntax->datum #'n))
-                            #'(< count n))
-                         (+ #t)
-                         (* #t)
-                         (? #'(< count 1))))
-                (lp new-end count)
-                (let ((success #,(syntax-case num (+ * ?)
-                                   (n (number? (syntax->datum #'n))
-                                      #'(= count n))
-                                   (+ #'(>= count 1))
-                                   (* #t)
-                                   (? #t))))
-                  #,(syntax-case type (! & lit)
-                      (!
-                       #`(if success
-                             #f
-                             #,(cggr accum 'cg-body #''() #'at)))
-                      (&
-                       #`(and success
-                              #,(cggr accum 'cg-body #''() #'at)))
-                      (lit
-                       #`(and success
-                              #,(cggr accum 'cg-body #'(reverse body) #'new-end)))))))))))
-
+(use-modules (ice-9 peg codegen))
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;;; FOR DEFINING AND USING NONTERMINALS
diff --git a/module/ice-9/peg/codegen.scm b/module/ice-9/peg/codegen.scm
new file mode 100644
index 0000000..879e2cb
--- /dev/null
+++ b/module/ice-9/peg/codegen.scm
@@ -0,0 +1,237 @@
+;;;; codegen.scm --- code generation for PEG parsers
+;;;;
+;;;; 	Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+;;;;
+;;;; This library is free software; you can redistribute it and/or
+;;;; modify it under the terms of the GNU Lesser General Public
+;;;; License as published by the Free Software Foundation; either
+;;;; version 3 of the License, or (at your option) any later version.
+;;;; 
+;;;; This library is distributed in the hope that it will be useful,
+;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;;;; Lesser General Public License for more details.
+;;;; 
+;;;; You should have received a copy of the GNU Lesser General Public
+;;;; License along with this library; if not, write to the Free Software
+;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;;;;
+
+(define-module (ice-9 peg codegen)
+  #:export (peg-sexp-compile)
+  #:use-module (ice-9 peg)
+  #:use-module (ice-9 pretty-print))
+
+(define-syntax single?
+  (syntax-rules ()
+    "Return #t if X is a list of one element."
+    ((_ x)
+     (pmatch x
+       ((_) #t)
+       (else #f)))))
+
+(define-syntax single-filter
+  (syntax-rules ()
+    "If EXP is a list of one element, return the element.  Otherwise
+return EXP."
+    ((_ exp)
+     (pmatch exp
+       ((,elt) elt)
+       (,elts elts)))))
+
+(define-syntax push-not-null!
+  (syntax-rules ()
+    "If OBJ is non-null, push it onto LST, otherwise do nothing."
+    ((_ lst obj)
+     (if (not (null? obj))
+         (push! lst obj)))))
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;;;; CODE GENERATORS
+;; These functions generate scheme code for parsing PEGs.
+;; Conventions:
+;;   accum: (all name body none)
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; Code we generate will have a certain return structure depending on how we're
+;; accumulating (the ACCUM variable).
+(define (cg-generic-ret accum name body-uneval at)
+  ;; name, body-uneval and at are syntax
+  #`(let ((body #,body-uneval))
+     #,(cond
+        ((and (eq? accum 'all) name)
+         #`(list #,at
+                 (cond
+                  ((not (list? body)) (list '#,name body))
+                  ((null? body) '#,name)
+                  ((symbol? (car body)) (list '#,name body))
+                  (else (cons '#,name body)))))
+        ((eq? accum 'name)
+         #`(list #,at '#,name))
+        ((eq? accum 'body)
+         #`(list #,at
+                 (cond
+                  ((single? body) (car body))
+                  (else body))))
+        ((eq? accum 'none)
+         #`(list #,at '()))
+        (else
+         (begin
+           (pretty-print `(cg-generic-ret-error ,accum ,name ,body-uneval ,at))
+           (pretty-print "Defaulting to accum of none.\n")
+           #`(list #,at '()))))))
+
+;; The short name makes the formatting below much easier to read.
+(define cggr cg-generic-ret)
+
+;; Generates code that matches a particular string.
+;; E.g.: (cg-string syntax "abc" 'body)
+(define (cg-string pat accum)
+  (let ((plen (string-length pat)))
+    #`(lambda (str len pos)
+        (let ((end (+ pos #,plen)))
+          (and (<= end len)
+               (string= str #,pat pos end)
+               #,(case accum
+                   ((all) #`(list end (list 'cg-string #,pat)))
+                   ((name) #`(list end 'cg-string))
+                   ((body) #`(list end #,pat))
+                   ((none) #`(list end '()))
+                   (else (error "bad accum" accum))))))))
+
+;; Generates code for matching any character.
+;; E.g.: (cg-peg-any syntax 'body)
+(define (cg-peg-any accum)
+  #`(lambda (str len pos)
+      (and (< pos len)
+           #,(case accum
+               ((all) #`(list (1+ pos)
+                              (list 'cg-peg-any (substring str pos (1+ pos)))))
+               ((name) #`(list (1+ pos) 'cg-peg-any))
+               ((body) #`(list (1+ pos) (substring str pos (1+ pos))))
+               ((none) #`(list (1+ pos) '()))
+               (else (error "bad accum" accum))))))
+
+;; Generates code for matching a range of characters between start and end.
+;; E.g.: (cg-range syntax #\a #\z 'body)
+(define (cg-range start end accum)
+  #`(lambda (str len pos)
+      (and (< pos len)
+           (let ((c (string-ref str pos)))
+             (and (char>=? c #,start)
+                  (char<=? c #,end)
+                  #,(case accum
+                      ((all) #`(list (1+ pos) (list 'cg-range (string c))))
+                      ((name) #`(list (1+ pos) 'cg-range))
+                      ((body) #`(list (1+ pos) (string c)))
+                      ((none) #`(list (1+ pos) '()))
+                      (else (error "bad accum" accum))))))))
+
+;; Filters the accum argument to peg-sexp-compile for buildings like string
+;; literals (since we don't want to tag them with their name if we're doing an
+;; "all" accum).
+(define (builtin-accum-filter accum)
+  (cond
+   ((eq? accum 'all) 'body)
+   ((eq? accum 'name) 'name)
+   ((eq? accum 'body) 'body)
+   ((eq? accum 'none) 'none)))
+(define baf builtin-accum-filter)
+
+;; Takes an arbitrary expressions and accumulation variable, then parses it.
+;; E.g.: (peg-sexp-compile syntax '(and "abc" (or "-" (range #\a #\z))) 'all)
+(define (peg-sexp-compile pat accum)
+  (syntax-case pat (peg-any range ignore capture peg and or body)
+    (peg-any
+     (cg-peg-any (baf accum)))
+    (sym (identifier? #'sym) ;; nonterminal
+     #'sym)
+    (str (string? (syntax->datum #'str)) ;; literal string
+     (cg-string (syntax->datum #'str) (baf accum)))
+    ((range start end) ;; range of characters (e.g. [a-z])
+     (and (char? (syntax->datum #'start)) (char? (syntax->datum #'end)))
+     (cg-range (syntax->datum #'start) (syntax->datum #'end) (baf accum)))
+    ((ignore pat) ;; match but don't parse
+     (peg-sexp-compile #'pat 'none))
+    ((capture pat) ;; parse
+     (peg-sexp-compile #'pat 'body))
+    ((peg pat)  ;; embedded PEG string
+     (string? (syntax->datum #'pat))
+     (peg-string-compile #'pat (baf accum)))
+    ((and pat ...)
+     (cg-and #'(pat ...) (baf accum)))
+    ((or pat ...)
+     (cg-or #'(pat ...) (baf accum)))
+    ((body type pat num)
+     (cg-body (baf accum) #'type #'pat #'num))))
+
+;; Top-level function builder for AND.  Reduces to a call to CG-AND-INT.
+(define (cg-and clauses accum)
+  #`(lambda (str len pos)
+      (let ((body '()))
+        #,(cg-and-int clauses accum #'str #'len #'pos #'body))))
+
+;; Internal function builder for AND (calls itself).
+(define (cg-and-int clauses accum str strlen at body)
+  (syntax-case clauses ()
+    (()
+     (cggr accum 'cg-and #`(reverse #,body) at))
+    ((first rest ...)
+     #`(let ((res (#,(peg-sexp-compile #'first accum) #,str #,strlen #,at)))
+         (and res 
+              ;; update AT and BODY then recurse
+              (let ((newat (car res))
+                    (newbody (cadr res)))
+                (set! #,at newat)
+                (push-not-null! #,body (single-filter newbody))
+                #,(cg-and-int #'(rest ...) accum str strlen at body)))))))
+
+;; Top-level function builder for OR.  Reduces to a call to CG-OR-INT.
+(define (cg-or clauses accum)
+  #`(lambda (str len pos)
+      #,(cg-or-int clauses accum #'str #'len #'pos)))
+
+;; Internal function builder for OR (calls itself).
+(define (cg-or-int clauses accum str strlen at)
+  (syntax-case clauses ()
+    (()
+     #f)
+    ((first rest ...)
+     #`(or (#,(peg-sexp-compile #'first accum) #,str #,strlen #,at)
+           #,(cg-or-int #'(rest ...) accum str strlen at)))))
+
+;; Returns a function that parses a BODY element.
+(define (cg-body accum type pat num)
+  #`(lambda (str strlen at)
+      (let ((body '()))
+        (let lp ((end at) (count 0))
+          (let* ((match (#,(peg-sexp-compile pat accum) str strlen end))
+                 (new-end (if match (car match) end))
+                 (count (if (> new-end end) (1+ count) count)))
+            (if (> new-end end)
+                (push-not-null! body (single-filter (cadr match))))
+            (if (and (> new-end end)
+                     #,(syntax-case num (+ * ?)
+                         (n (number? (syntax->datum #'n))
+                            #'(< count n))
+                         (+ #t)
+                         (* #t)
+                         (? #'(< count 1))))
+                (lp new-end count)
+                (let ((success #,(syntax-case num (+ * ?)
+                                   (n (number? (syntax->datum #'n))
+                                      #'(= count n))
+                                   (+ #'(>= count 1))
+                                   (* #t)
+                                   (? #t))))
+                  #,(syntax-case type (! & lit)
+                      (!
+                       #`(if success
+                             #f
+                             #,(cggr accum 'cg-body #''() #'at)))
+                      (&
+                       #`(and success
+                              #,(cggr accum 'cg-body #''() #'at)))
+                      (lit
+                       #`(and success
+                              #,(cggr accum 'cg-body #'(reverse body) #'new-end)))))))))))
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-02-23 15:10                                   ` Noah Lavine
@ 2011-03-04 10:52                                     ` Andy Wingo
  2011-03-04 13:09                                       ` Noah Lavine
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Wingo @ 2011-03-04 10:52 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

On Wed 23 Feb 2011 16:10, Noah Lavine <noah.b.lavine@gmail.com> writes:

> I need to ask for help in order to work on this more. I thought I
> would first move the code-generating functions to their own module. It
> seems like this should be a simple and obviously-correct
> transformation, because I didn't change any code - just moved it to
> its own module, and replaced its definition by a (use-modules ...)
> clause. Yet I have somehow created an error, and I don't see why.

I haven't tried it yet, but there is a circularity: peg-sexp-compile
from your (ice-9 peg codegen) references peg-string-compile, but
peg-string-compile is implemented in (ice-9 peg).

I'm not sure exactly how to break that circle.  For now if you're
feeling hacky, you can ((module-ref (resolve-interface '(ice-9 peg))
'peg-string-compile) ...) in that (peg ...) case.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: PEG Parser
  2011-03-04 10:52                                     ` Andy Wingo
@ 2011-03-04 13:09                                       ` Noah Lavine
  0 siblings, 0 replies; 27+ messages in thread
From: Noah Lavine @ 2011-03-04 13:09 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Michael Lucy, Ludovic Courtès, guile-devel

Actually, I think I found a way to make it work. I had missed a macro
that needed to be copied from one module to another. I made each
module import the other, and that seemed to work all right.

As soon as I have time, I hope to turn this into a patch series that
splits the PEG module up into nicer chunks.

Noah

On Fri, Mar 4, 2011 at 5:52 AM, Andy Wingo <wingo@pobox.com> wrote:
> On Wed 23 Feb 2011 16:10, Noah Lavine <noah.b.lavine@gmail.com> writes:
>
>> I need to ask for help in order to work on this more. I thought I
>> would first move the code-generating functions to their own module. It
>> seems like this should be a simple and obviously-correct
>> transformation, because I didn't change any code - just moved it to
>> its own module, and replaced its definition by a (use-modules ...)
>> clause. Yet I have somehow created an error, and I don't see why.
>
> I haven't tried it yet, but there is a circularity: peg-sexp-compile
> from your (ice-9 peg codegen) references peg-string-compile, but
> peg-string-compile is implemented in (ice-9 peg).
>
> I'm not sure exactly how to break that circle.  For now if you're
> feeling hacky, you can ((module-ref (resolve-interface '(ice-9 peg))
> 'peg-string-compile) ...) in that (peg ...) case.
>
> Andy
> --
> http://wingolog.org/
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-03-04 13:09 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-13 21:29 PEG Parser Noah Lavine
2011-01-17 21:32 ` Ludovic Courtès
2011-01-21 15:23   ` Noah Lavine
2011-01-22 21:02     ` Ludovic Courtès
2011-01-24  1:29       ` Noah Lavine
2011-01-24 20:55         ` Ludovic Courtès
2011-01-27  1:40         ` Noah Lavine
2011-01-27  2:23           ` Michael Lucy
2011-01-27  2:38             ` Noah Lavine
2011-01-27  3:02               ` Michael Lucy
2011-01-27  5:17                 ` Noah Lavine
2011-01-28  3:25                   ` Noah Lavine
2011-01-28  5:13                     ` Michael Lucy
2011-01-28 15:48                   ` Andy Wingo
2011-01-29  3:07                     ` Noah Lavine
2011-01-29  4:15                       ` Michael Lucy
2011-01-29 11:34                         ` Andy Wingo
2011-01-29 19:37                           ` Noah Lavine
2011-01-30 11:43                             ` Andy Wingo
2011-02-02  0:26                               ` Noah Lavine
2011-02-06 15:31                                 ` Noah Lavine
2011-02-18 22:03                                 ` Andy Wingo
2011-02-23 15:10                                   ` Noah Lavine
2011-03-04 10:52                                     ` Andy Wingo
2011-03-04 13:09                                       ` Noah Lavine
2011-01-29 11:33                       ` Andy Wingo
  -- strict thread matches above, loose matches on Subject: below --
2010-05-27  5:19 Michael Lucy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).