From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Mike Mattie Newsgroups: gmane.emacs.help Subject: Re: how to use parsing expressing grammar Date: Wed, 4 Mar 2009 22:55:54 -0800 Message-ID: <20090304225554.07eb4dd3@gmail.com> References: <6b8a1070-1a89-48b0-9287-343b673b5758@a29g2000pra.googlegroups.com> <098ddc5b-7074-41fb-a921-caee3bddddff@t39g2000prh.googlegroups.com> <91fa65a0-a7e7-4374-8f96-5786c8908615@w24g2000prd.googlegroups.com> <067151a4-b892-4e9a-8600-e2dcd92cb0ad@v18g2000pro.googlegroups.com> <87fxhukvuu.fsf@gmail.com> <87d4cyqg5n.fsf@catnip.gol.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Trace: ger.gmane.org 1236245576 15852 80.91.229.12 (5 Mar 2009 09:32:56 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 5 Mar 2009 09:32:56 +0000 (UTC) To: help-gnu-emacs@gnu.org Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Thu Mar 05 10:34:11 2009 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1Lf9xy-0004Ih-8T for geh-help-gnu-emacs@m.gmane.org; Thu, 05 Mar 2009 10:34:10 +0100 Original-Received: from localhost ([127.0.0.1]:44738 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Lf9wc-0002uV-Ug for geh-help-gnu-emacs@m.gmane.org; Thu, 05 Mar 2009 04:32:46 -0500 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1Lf7Uw-0000Bi-A7 for help-gnu-emacs@gnu.org; Thu, 05 Mar 2009 01:56:02 -0500 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1Lf7Uu-00009v-50 for help-gnu-emacs@gnu.org; Thu, 05 Mar 2009 01:56:01 -0500 Original-Received: from [199.232.76.173] (port=56701 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Lf7Ut-00009p-UZ for help-gnu-emacs@gnu.org; Thu, 05 Mar 2009 01:55:59 -0500 Original-Received: from wa-out-1112.google.com ([209.85.146.177]:46959) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1Lf7Ut-0008AJ-Ah for help-gnu-emacs@gnu.org; Thu, 05 Mar 2009 01:55:59 -0500 Original-Received: by wa-out-1112.google.com with SMTP id k17so1884204waf.26 for ; Wed, 04 Mar 2009 22:55:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=CDyzybrQeWVOm0vO4ZVnA7hSVvTez14Zqci6rN3Vx8Y=; b=uaq94By90nPJ5jhMiXsGZh763Y3gfrDs8uAFLJ68eHs+jQQpWX8QiRmEMmwXD4yvVA DmmnJF4er9eWokK0OGsZAXO7Us+XQMZVF7fZ9q/t0pmHJKEtOskbRdPCquMNJFYl6cVP JAlGpiDg2ObhWn41EU7N/vaum9DYZcb+ogFoE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=S306o44dYqZRPt/AdO7BYBJQX7gxRC/QiGDSOeFbPMngc+izebnTM+o0HkHnqp3OoS PGAHJ3DUnnjc+3VMa5wSo79QZPYN+Xv+vb7tXKka068iMciChWF2HYfxWc/DJoao0hPc yahJ1eOdqPnXDrIEbADvdMFFQBwuaXjxzR+YU= Original-Received: by 10.115.73.20 with SMTP id a20mr490616wal.1.1236236158086; Wed, 04 Mar 2009 22:55:58 -0800 (PST) Original-Received: from localhost (c66-235-1-45.sea2.cablespeed.com [66.235.1.45]) by mx.google.com with ESMTPS id k35sm2118660waf.21.2009.03.04.22.55.57 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 04 Mar 2009 22:55:57 -0800 (PST) In-Reply-To: <87d4cyqg5n.fsf@catnip.gol.com> X-Mailer: Claws Mail 3.7.0 (GTK+ 2.12.11; i686-pc-linux-gnu) X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 2) X-Mailman-Approved-At: Thu, 05 Mar 2009 04:31:19 -0500 X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.help:62607 Archived-At: On Wed, 04 Mar 2009 09:35:00 +0900 Miles Bader wrote: > W Dan Meyer writes: > > Are we talking about memoising PEG (e.g Packrat) in elisp? ... > > Packrat parser is the best option out there currently because it > > recognises much complex grammar then LALR(1), and it has no > > ambiguities with expressing these grammars. >=20 > You also might be interested in Roberto Ierusalimschy's paper on the > implemenation of LPEG, which is a PEG implementation for Lua: >=20 > http://www.inf.puc-rio.br/~roberto/docs/peg.pdf >=20 >=20 > Note that LPEG does _not_ use the packrat algorithm, as apparently it > presents some serious practical problems for common uses of parsing > tools: >=20 > In 2002, Ford proposed Packrat [5], an adaptation of the > original algorithm that uses lazy evaluation to avoid that > inefficiency. >=20 > Even with this improvement, however, the space complexity of the > algorithm is still linear on the subject size (with a somewhat big > constant), even in the best case. As its author himself recognizes, > this makes the algorithm not befitting for parsing =E2=80=9Clarge amou= nts > of relatively flat=E2=80=9D data ([5], p. 57). However, unlike parsing to= ols, > regular-expression tools aim exactly at large amounts of relatively > flat data. >=20 > To avoid these difficulties, we did not use the Packrat > algorithm for LPEG. To implement LPEG we created a virtual parsing > machine, not unlike Knuth=E2=80=99s parsing machine [15], where each patt= ern > is represented as a program for the machine. The program is somewhat > similar to a recursive-descendent parser (with limited > backtracking) for the pattern, but it uses an explicit stack instead > of recursion. >=20 >=20 > The general LPEG page is here: >=20 > http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html >=20 >=20 > -Miles >=20 It seems there is a convergence :) I implemented the same thing in my parser compiler, with a different intent.The PEG operators decompose down to an internal instruction set translated by the core into elisp functions. The VM is a co-routine of two functions. A compiler which generates code from a valid instruction set, and a "union" function which determines the validity of a given instruction sequence. The union essentially attempts to pack as many instructions as possible into a matching function. When adding an instruction produces an invalid set it kicks the set to the compile function and starts a fresh matching=20 function. My goal was to allow the creation of custom operators for the ugly parts of practical grammars that can be mixed with standard operators in a consistent fashion. If the compiler is flexible enough then maybe the AST can move inside the parser so the user does not have to re-invent the wheel from the match hooks -> AST. It's probably getting a bit to parser specific for emacs-help, but if you want to compare notes feel free to e-mail me. Cheers, Mike Mattie