From: Michael Lucy
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Some Questions
Date: Mon, 29 Mar 2010 23:08:18 -0600
Message-ID: <52c42c3e1003292208v14b31651q9b8acd72e63bf890@mail.gmail.com>
To: Andy Wingo
Cc: guile-devel@gnu.org

Hi,

On Mon, Mar 29, 2010 at 3:40 AM, Andy Wingo wrote:
>
> Tree-IL is the right thing IMO, mostly because it allows you keep source
> location information, but it also allows you to express more precisely
> what
> you want to compile to. You don't want to run your compiler's
> output through the Scheme syntax expander, that's unnecessary.
>
> Also tree-il is the right place to hook into Guile's compiler
> infrastructure.
>

Ah, k.

>
> Ah, I should have responded to you before I responded to "No Itisnt"; if
> you're down for this, let's do it then. I suspect that it's a project
> that would take 2 months to do *well*:
>
>  * Compile PEG grammars at syntax time using a macro
>    - into state-machine-like lexically-bound closures
>    - see LPEG VM for example
>      http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
>    - potentially augment Guile's VM with needed instructions
>  * Procedural, LPEG-like interface?
>    - run the compiler at runtime?

So, I'm assuming "syntax time" means the grammars will be compiled as
soon as the source file is loaded and macros are expanded, while
"runtime" means the grammars will be compiled when a function using
them is called. Do you mean you'd like to see both of these supported?

As for the interface, the Lisp tradition I'm most comfortable in calls
for transparent composing and decomposing of lists as the main way of
dealing with data, so I was thinking of representing uncompiled
grammars in a form similar to the example at the top of peg.scm and
then compiling this to something efficient (a choice between this and
more traditional string syntax is how CL-PPCRE approaches things:
http://weitz.de/cl-ppcre/#scan ). For example, if I have a pattern
stored in *var* and I want a new pattern that's one or more
repetitions of that, I'd write `(,*var* *), or (list *var* '*).

It looks like LPEG's approach is functions that return opaque data and
other functions that combine those opaque values. For example, if I
have a pattern stored in *var* and I want a new pattern that's one or
more repetitions of that, I'd write (one-or-more *var*).
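
As a rough illustration of the two styles, here is a small sketch in
plain Scheme. The value stored in *var* and the one-or-more stand-in
are made up for the example; this is not LPEG's or peg.scm's actual
API, and the closure body is only a placeholder.

```scheme
;; Transparent style: the pattern is ordinary list data, so normal
;; list operations build it and take it apart.
(define *var* "ab")                       ; hypothetical stored pattern

(define with-quasiquote `(,*var* *))
(define with-list       (list *var* '*))

(equal? with-quasiquote with-list)        ; => #t, both are ("ab" *)
(car with-quasiquote)                     ; => "ab", plain list access

;; Opaque style: a stand-in combinator (hypothetical) that hides the
;; pattern inside a closure. It can be called but not inspected.
(define (one-or-more pattern)
  (lambda () pattern))                    ; placeholder; a real one would match

(define opaque (one-or-more *var*))
(procedure? opaque)                       ; => #t
(pair? opaque)                            ; => #f, car/cdr cannot take it apart
```
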
The problem with the LPEG approach is that you have to choose between
verbosity and confusion (e.g. LPEG uses * to concatenate things in the
lpeg module, but when present in a string for the re module it means
what it normally would in a PEG grammar). I haven't finished reading
the paper, but I'm assuming the advantage of that approach is that the
opaque representation is already compiled, so doing several hundred
matches with one grammar won't require several hundred compilations.

It seems like we could get the best of both worlds by having
transparent list representations, functions to compose them (so
(one-or-more *var*) yields the same thing as `(,*var* *)), and a
compile-peg function that would take the list representation and
produce a compiled PEG parser. All the PEG functions would take either
the uncompiled or the compiled form (they would check at runtime what
they were dealing with), so in situations where speed is important
people can store the compiled grammar and re-use it.

>  * PEG grammars as text?
>    - Guile PEG language, parsed by PEG itself?

Once a parser's written, parsing strings into grammars sounds useful
and not too difficult.

>  * Benchmarking
>    - LPEG benchmarks would be a good first start
>  * Test suite
>  * Docs
>  * Stream parsing?
>
> If you finished the first tasks, figuring out an efficient way to do
> stream parsing could well take all the rest of the time. (The LPEG paper
> works on strings.)
>
> So if you're down with it, perhaps you can do the PEG stuff, and No
> Itisnt can work on Lua. There's definitely enough work for everyone :)
> Hopefully the GOOG comes through with funding.

Cool, PEG it is then.

>
>
> So, let me know what you think about PEG, if you think it's the right
> size for summer. I think it has great potential, and implementing it
> well (as a compiler and procedurally) will be of great use. Otherwise we
> can go ahead with Python, if that's to your liking.

How does this look for a timeline (unless my math is very horrible,
the "midterm" evaluations are 70% of the way through, so most of the
real work will be done by then):

The ~7 weeks before mid-term evaluations are due:
  1 week defining the PEG syntax carefully and writing functions for
    dealing with it
  3 weeks making the PEG syntax compile to VM code
  1 week writing tests / fixing bugs that tests show
  1 week documenting things that weren't well enough documented on
    the way
  1 week for things to go wrong in

The ~3 weeks after mid-term evaluations are in:
  1 week writing useful add-ons (e.g. taking PEG grammar in strings)
  1 week writing benchmarks and tuning the speed
  1 week for cleaning things up and writing more
    documentation/examples

The 1 week after suggested pencils-down date:
  More documentation/examples.
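
Coming back to the compile-peg idea earlier in this message, here is a
minimal sketch of how functions could accept either form by checking
at runtime. Everything in it is an assumption for illustration: the
names literal-matcher, repeat-matcher, and peg-match, the choice of a
procedure as the compiled form, and the two-form pattern language
(a string literal and (pat *)). It compiles to closures rather than VM
code, so it only shows the interface, not the planned implementation.

```scheme
;; A compiled pattern is a procedure: (string start) -> end index or #f.

(define (literal-matcher s)
  (lambda (str pos)
    (let ((end (+ pos (string-length s))))
      (and (<= end (string-length str))
           (string=? s (substring str pos end))
           end))))

;; One or more repetitions, following this message's reading of (pat *).
(define (repeat-matcher m)
  (lambda (str pos)
    (let loop ((end (m str pos)) (last #f))
      (cond ((not end) last)           ; no further match: succeed iff one matched
            ((eqv? end last) end)      ; no progress: stop instead of looping
            (else (loop (m str end) end))))))

;; Composition helper that returns the transparent list form.
(define (one-or-more pat) (list pat '*))

;; Accepts the list form or an already compiled pattern.
(define (compile-peg pat)
  (cond ((procedure? pat) pat)         ; already compiled: return unchanged
        ((string? pat) (literal-matcher pat))
        ((and (list? pat) (= (length pat) 2) (eq? (cadr pat) '*))
         (repeat-matcher (compile-peg (car pat))))
        (else (error "unrecognized PEG form:" pat))))

(define (peg-match pat str)
  ((compile-peg pat) str 0))

(define compiled (compile-peg (one-or-more "ab")))

(peg-match (one-or-more "ab") "ababx")   ; => 4, compiled on every call
(peg-match compiled "ababx")             ; => 4, compiled once and reused
(peg-match compiled "x")                 ; => #f
```

Storing the result of compile-peg and passing it back in is what would
let speed-sensitive callers skip recompilation.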