From: Noah Lavine
Newsgroups: gmane.lisp.guile.devel
Subject: Re: GNU Guile PEG-parser
Date: Thu, 9 Feb 2012 10:42:37 -0500
To: Krister Svanlund
Cc: guile-devel

Hello,

> I've actually found no PEG library that has a string syntax for the
> equivalent of ignore. I'm guessing most people are satisfied with just
> specifying another nonterminal and matching that one. Probably because it is
> seen as less ugly than extending the formal definition of PEG, but I
> really think we could get a cleaner PEG definition of our parser if we were
> able to ignore text that wasn't needed or gets in the way while using
> string-patterns.

That makes sense. I'm a bit surprised that you find string patterns
easier than S-expression patterns, because I find it the other way
around, but different people like different things. I think we could
add some string syntax for ignore if you wanted it, although other
people on the list should chime in.

> It's actually exactly Python I'm thinking about. We are currently writing a
> preprocessor that will put #{ and #} before and after each block, but I was
> hoping that there exists a cleaner solution using the power of PEG instead
> of basic string manipulation. If you could help in any way, shape, or form
> that would be greatly appreciated; even just suggesting what parts of PEG
> internals to look at would be really useful.

After thinking about it more, you have two choices. The easiest thing
would be to parse each line (or extended line, if it ends with "\")
with a PEG parser, and use your own logic for the blocks. Your parser
would have two steps for each line:

1. Get the indent from the beginning of the line
2. Parse the rest of the line with a PEG parser

Then you would take the lines returned by the PEG parser and combine
them into a data structure yourself, using the Python line-combining
rules. This is probably your best choice.

Your second choice depends on the fact that PEG parsers are just
functions that take certain arguments and return certain values. You
can write a function like that yourself and use it just like a PEG
nonterminal in your grammar. When I was working on PEG, I actually
thought that it would be nice to make this interface public so that
different parser generators could interoperate, but I never did it.
It's all documented in the PEG Internals section of the manual,
though. However, I'd recommend against this, because I think the
interface is not as good as it should be right now, so I'd probably
want to change it in the future, which would make your code stop
working. (Although if this is a one-time thing, then you don't need to
care about that.)

I suppose you also have a third choice, which is to change the
internal interface yourself, then let us make it public, then use it
that way. That's the most elegant solution, but it's more work for you.
I wouldn't recommend it unless the first option is hard and you want
this to last for a long time.

I hope this helps,
Noah
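
P.S. To make the first option (per-line parsing plus your own block logic) a bit more concrete, here is a rough sketch of the idea. It is written in Python rather than Guile, and it is only an illustration under assumptions: the names `split_indent`, `parse_line`, `join_continuations`, and `parse_blocks` are made up for this example, and `parse_line` is a stand-in that just splits on whitespace where the real PEG parser would be called. It also assumes indentation uses spaces only and ignores line joining inside brackets.

```python
def split_indent(line):
    """Step 1: return (indent width, rest of line). Assumes space-only indents."""
    rest = line.lstrip(" ")
    return len(line) - len(rest), rest


def parse_line(text):
    """Step 2: hypothetical stand-in for the PEG parser; splits on whitespace."""
    return text.split()


def join_continuations(source):
    """Merge physical lines that end in a backslash into one extended line."""
    pending = ""
    for raw in source.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]      # drop the backslash, keep accumulating
            continue
        yield pending + raw
        pending = ""
    if pending:
        yield pending                # trailing continuation with no next line


def parse_blocks(source):
    """Combine the parsed lines into nested lists according to indentation."""
    root = []
    stack = [(0, root)]              # (indent width, block being filled)
    for line in join_continuations(source):
        if not line.strip():
            continue                 # blank lines do not affect block structure
        indent, rest = split_indent(line)
        if indent > stack[-1][0]:
            # Deeper indent: open a new block nested inside the current one.
            block = []
            stack[-1][1].append(block)
            stack.append((indent, block))
        else:
            # Same or shallower indent: close blocks until the widths match.
            while indent < stack[-1][0]:
                stack.pop()
            if indent != stack[-1][0]:
                raise ValueError("inconsistent dedent: %r" % line)
        stack[-1][1].append(parse_line(rest))
    return root
```

Each parsed line becomes one list entry, and an indented block becomes a nested list right after the line that introduces it. Swapping the body of `parse_line` for a call into the real per-line grammar should leave the block logic unchanged.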