From: Michael Lucy
Newsgroups: gmane.lisp.guile.devel
Subject: Re: GSOC PEG project
Date: Sun, 11 Jul 2010 02:48:39 -0500
References: <874ogdmu2m.fsf@gnu.org>
To: Andy Wingo
Cc: Ludovic Courtès, guile-devel@gnu.org

On Thu, Jul 8, 2010 at 11:21 AM, Andy Wingo wrote:
> Hi Michael,
>
> On Tue 06 Jul 2010 00:59, Michael Lucy writes:
>
>> (use-modules (ice-9 peg))
>> (peg-find "'b'+" "aabbcc")
>> --> (2 4 "bb")
>
> Humm, another thing to think about: (ice-9 regex) returns "match
> structures", which are really just vectors; have a look at them, and if
> it makes sense to mimic that interface, re-exporting those accessors
> somehow, please do.

So, there are a few places where the interfaces don't quite match up:

1. match:substring
Problem: It's perfectly legal to pass peg-match a parsing nonterminal
and have it give you a parse tree rather than a substring.
Potential solutions:
1.a. Just have match:substring return either the substring or the
parse tree. The problem with this is that it may violate
expectations.
1.b. Have match:substring collapse the parse tree into a string, and
have another function match:parse-tree that would return the parse
tree. The problem with this is that the parse tree might have
discarded text, which would once again violate expectations.

2. match:count
Problem: the notion of a "parenthesized sub-expression" doesn't really
map cleanly onto PEGs. This information isn't tracked while parsing
and wouldn't be very meaningful.
Potential solutions:
2.a. Ignore it (not that bad a solution).
2.b. Track that information. I'd rather not do this because it would
slow down the parser and wouldn't be very useful.

Which brings us to...

3. submatch numbers
Problem: there's no notion of a "submatch" right now. People should
be getting this information by building a parsing nonterminal and then
traversing the resulting parse tree. I'd rather not wire in a whole
separate system just to provide an alternative way of getting
information about what parts of an expression matched what (it would
also slow down parsing).
Potential solutions:
3.a. Ignore it (would violate expectations in a big way).
3.b. Wire it in (I'd really rather not do this).

So, there would be some gaps if I shimmed the match structure
interface onto PEGs. The problem is that, while it would be useful to
have a consistent interface for matching both regexps and PEGs,
they're different things, and giving the accessor functions the same
names might lead people to assume things that aren't true.

So, three potential paths from here:
1. Mimic the match structure interface as much as possible.
2. Have a similar but differently-named "peg-match structure"
interface that behaves mostly the same but has a few different
functions (I think naming them something slightly different would
lead to fewer people assuming they worked exactly the same as match
structures).
3. Just have a different interface.

I'm leaning toward (2); what do other people think? I'd probably:
1. Not have a peg-match:count function at all.
2. Not have the functions take submatch numbers.
3. Have peg-match:substring return the actual substring.
4. Have another function peg-match:parse-tree that returns the parse
tree.

>
>> And when I use it with --no-autocompile I don't get any errors:
>>
>> What does this mean?
>
> Probably some eval-when tomfoolery.  See "Eval When" in the manual.
>
> Cheers,
>
> Andy
> --
> http://wingolog.org/
>
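
A rough sketch of what path (2) above might look like, placed next to
the existing (ice-9 regex) accessors for comparison. Everything named
peg-match here (the <peg-match> record, make-peg-match, and the
peg-match:* accessors) is hypothetical and only illustrates the four
points listed above; it is not the (ice-9 peg) API. The record is built
by hand rather than by a parser, and the tree shape is made up.

```scheme
;; Hypothetical sketch only: none of the peg-match names below come from
;; (ice-9 peg). They illustrate the proposed accessor set, nothing more.
(use-modules (srfi srfi-9)    ; define-record-type
             (ice-9 regex))   ; string-match, match:substring, match:start, ...

(define-record-type <peg-match>
  (make-peg-match input start end tree)
  peg-match?
  (input peg-match:string)       ; the whole string that was searched
  (start peg-match:start)        ; index where the match begins
  (end   peg-match:end)          ; index just past the end of the match
  (tree  peg-match:parse-tree))  ; parse tree, kept separate from the text

;; Always the literal matched text, even if the tree discarded part of it.
;; No submatch-number argument and no peg-match:count, per the list above.
(define (peg-match:substring m)
  (substring (peg-match:string m) (peg-match:start m) (peg-match:end m)))

;; Hand-built record mirroring the (2 4 "bb") result quoted earlier.
;; The tree '(bs "bb") is an invented shape, for illustration only.
(define pm (make-peg-match "aabbcc" 2 4 '(bs "bb")))

;; The regex counterpart on the same input.
(define rm (string-match "b+" "aabbcc"))

(display (peg-match:substring pm)) (newline)   ; PEG side: matched text
(display (peg-match:parse-tree pm)) (newline)  ; PEG side: parse tree
(display (match:substring rm)) (newline)       ; regex side: matched text
```

Keeping the substring and the parse tree behind two separate accessors
is what lets peg-match:substring keep the meaning people already expect
from match:substring, while the different prefix signals that submatch
numbers and a count are not part of the deal.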