From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Noah Lavine <noah.b.lavine@gmail.com>
Newsgroups: gmane.lisp.guile.devel
Subject: Re: GNU Guile PEG-parser
Date: Thu, 9 Feb 2012 10:42:37 -0500
Message-ID: <CA+U71=OcqfEu+5Qd9m=bU=f480xRLxsqXDeiC=FPbyodNHd_-g@mail.gmail.com>
References: <CAO_vGe8tm2=gyhF4vKrYV=mU9gEpbrmGwsr0JnsnF9JqvfaMuA@mail.gmail.com>
	<CA+U71=OCNcApUDLYGe6BuqacLnzvic-XtbD40Hp2Mag1GJ=_ug@mail.gmail.com>
	<CAO_vGe8jkCyXHaX1_sVqO-i7rsNmuM--PhS8pnC2N9fY=TmC+g@mail.gmail.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
X-Trace: dough.gmane.org 1328802169 14636 80.91.229.3 (9 Feb 2012 15:42:49 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 9 Feb 2012 15:42:49 +0000 (UTC)
Cc: guile-devel <guile-devel@gnu.org>
To: Krister Svanlund <svanlund@student.chalmers.se>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Thu Feb 09 16:42:48 2012
Return-path: <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Envelope-to: guile-devel@m.gmane.org
Original-Received: from lists.gnu.org ([140.186.70.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1RvW91-0000KN-P9
	for guile-devel@m.gmane.org; Thu, 09 Feb 2012 16:42:48 +0100
Original-Received: from localhost ([::1]:33988 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1RvW90-0003up-Jj
	for guile-devel@m.gmane.org; Thu, 09 Feb 2012 10:42:46 -0500
Original-Received: from eggs.gnu.org ([140.186.70.92]:40641)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <noah549@gmail.com>) id 1RvW8x-0003uj-Vw
	for guile-devel@gnu.org; Thu, 09 Feb 2012 10:42:44 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <noah549@gmail.com>) id 1RvW8s-00033r-3i
	for guile-devel@gnu.org; Thu, 09 Feb 2012 10:42:43 -0500
Original-Received: from mail-iy0-f169.google.com ([209.85.210.169]:34422)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <noah549@gmail.com>) id 1RvW8s-00033h-0e
	for guile-devel@gnu.org; Thu, 09 Feb 2012 10:42:38 -0500
Original-Received: by iagz16 with SMTP id z16so3274312iag.0
	for <guile-devel@gnu.org>; Thu, 09 Feb 2012 07:42:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=+RJye0e2BkzzT/dczENzwa9x1KZi8DJd/gL6O+7Fp8c=;
	b=STXCcalRx/6ux/9F+BxxhQBwaY/Knm9TgYRbbsz0rJUyQxpGfcBEOEyk+w7l5COG52
	bsDIsHj5ThBjlXkubkXZyumPIIQI7qZL9GeylAo41Gpyedl/uZjWqjy7ayN1Lmt7y1uE
	0jJEC8ZLOBQLj06sKNfM4Ji9+7DSjjQcBQv9w=
Original-Received: by 10.42.138.133 with SMTP id c5mr3187888icu.52.1328802157428; Thu,
	09 Feb 2012 07:42:37 -0800 (PST)
Original-Received: by 10.42.142.7 with HTTP; Thu, 9 Feb 2012 07:42:37 -0800 (PST)
In-Reply-To: <CAO_vGe8jkCyXHaX1_sVqO-i7rsNmuM--PhS8pnC2N9fY=TmC+g@mail.gmail.com>
X-Google-Sender-Auth: AS61pBUlRqPYBCJTOhwGW9zjc1I
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 209.85.210.169
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Developers list for Guile,
	the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.guile.devel:13834
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/13834>

Hello,

> I've actually found no PEG library that has a string syntax for the
> equivalent of ignore. I'm guessing most people are satisfied with just
> specifying another nonterminal and matching that one. Probably because it is
> seen as less ugly than extending on the formal definition of PEG but I
> really think we could get a cleaner PEG definition of our parser if we were
> able to ignore text that wasn't needed or gets in the way while using
> string-patterns.

That makes sense. I'm a bit surprised that you find string patterns
easier than S-expression patterns, because I find it the other way
around, but different people like different things. I think we could
add some string syntax for ignore if you wanted it, although other
people on the list should chime in.

> It's actually exactly Python I'm thinking about, we are currently doing a
> preprocessor that will put #{ and #} before and after each block but I was
> hoping that there exists a cleaner solution using the power of PEG instead
> of basic string manipulation. If you could help in any way shape or form
> that would be greatly appreciated, even just suggesting which parts of PEG
> internals to look at would be really useful.

After thinking about it more, I think you have two choices.

The easiest thing would be to parse each line (or extended line, if it
ends with "\") with a PEG parser, and use your own logic for the
blocks. Your parser would have two steps for each line:

1. Get the indent from the beginning of a line
2. Parse the rest of the line with a PEG parser

Then you would take the lines returned by the PEG parser and combine
them into a data structure yourself, using the Python line-combining
rules. This is probably your best choice.
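
As a rough sketch of that first approach (written in Python only because
that is the language being parsed; every name here is made up for
illustration, and parse_line is a placeholder for the real PEG parser),
the per-line steps and the block-combining logic could look like this.
It assumes indentation uses spaces only, and it handles neither "\"
continuation lines nor Python's check for inconsistent dedents:

```python
def split_indent(line):
    """Step 1: return (indent_width, rest_of_line). Assumes spaces, not tabs."""
    stripped = line.lstrip(" ")
    return len(line) - len(stripped), stripped


def parse_line(text):
    """Step 2 placeholder: stands in for running the PEG parser on one line.

    Hypothetical: a real parser would return a parse tree, not a string.
    """
    return text.rstrip().rstrip(":")


def build_blocks(source):
    """Combine parsed lines into nested [statement, children] nodes."""
    root = []
    # Each stack entry is (indent of the owning line, that line's child list).
    stack = [(-1, root)]
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        indent, rest = split_indent(line)
        # Close every block whose owning line is indented at least this much.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [parse_line(rest), []]
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root
```

Calling build_blocks on a few lines of source gives a nested list in
which each statement carries the statements of its indented block, so
the PEG grammar itself never has to know about indentation.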

Your second choice depends on the fact that PEG parsers are just
functions that take certain arguments and return certain values.
You can write a function like that yourself and use it just like a PEG
nonterminal in your grammar. When I was working on PEG, I actually
thought that it would be nice to make this interface public so that
different parser generators could interoperate, but I never did it.
It's all documented in the PEG Internals section of the manual,
though. However, I'd recommend against this just because I think the
interface is not as good as it should be right now, so I'd probably
want to change it in the future, which would make your code stop
working. (Although if this is a one-time thing, then you don't need to
care about that.)
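
To illustrate the general idea only, here is a small Python sketch of
parsers as plain functions, with one hand-written function mixed in
alongside the others. The convention used (a function takes the string,
its length, and a start position, and returns either None or a pair of
end position and data) is an assumption made for this example, not a
copy of Guile's internal interface; the PEG Internals section describes
the real one. All names are hypothetical:

```python
def literal(text):
    """Return a parser that matches `text` exactly at the given position."""
    def parse(s, length, pos):
        end = pos + len(text)
        if end <= length and s.startswith(text, pos):
            return end, text
        return None  # failure
    return parse


def sequence(*parsers):
    """Return a parser that runs each parser in turn, failing if any fails."""
    def parse(s, length, pos):
        results = []
        for p in parsers:
            r = p(s, length, pos)
            if r is None:
                return None
            pos, data = r
            results.append(data)
        return pos, results
    return parse


def indent_of(width):
    """Hand-written parser: match exactly `width` leading spaces."""
    def parse(s, length, pos):
        end = pos
        while end < length and s[end] == " ":
            end += 1
        if end - pos == width:
            return end, ("indent", width)
        return None
    return parse


# The hand-written parser is used exactly like a generated one.
indented_pass = sequence(indent_of(4), literal("pass"))
```

Because indent_of follows the same calling convention as the other
parsers, sequence cannot tell that it was written by hand, which is
the property the second choice relies on.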

I suppose you also have a third choice, which is to change the
internal interface yourself, then let us make it public, then use it
that way. That's the most elegant solution, but it's more work for
you. I wouldn't recommend it unless the first option is hard and you
want this to last for a long time.

I hope this helps,
Noah