From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Xah Lee Newsgroups: gmane.emacs.devel Subject: Re: "Font-lock is limited to text matching" is a myth Date: Wed, 12 Aug 2009 04:28:51 -0700 Message-ID: References: <7b501d5c0908091634ndfba631vd9db6502db301097@mail.gmail.com> <87my67s8mr.fsf@randomsample.de> <1249942011.29022.15.camel@projectile.siege-engine.com> <1249955428.29022.186.camel@projectile.siege-engine.com> <9c768dc60908102347v57bdf38ara9fe2179f68c07e4@mail.gmail.com> <1250043413.6753.457.camel@projectile.siege-engine.com> Reply-To: xahlee@gmail.com NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=00163649a6dd94a73e0470f01f19 X-Trace: ger.gmane.org 1250076652 4563 80.91.229.12 (12 Aug 2009 11:30:52 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 12 Aug 2009 11:30:52 +0000 (UTC) Cc: emacs-devel@gnu.org To: Miles Bader Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Aug 12 13:30:45 2009 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1MbC2W-0006Ai-F8 for ged-emacs-devel@m.gmane.org; Wed, 12 Aug 2009 13:30:45 +0200 Original-Received: from localhost ([127.0.0.1]:34289 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MbC2U-0002Oq-Cx for ged-emacs-devel@m.gmane.org; Wed, 12 Aug 2009 07:30:42 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1MbC0u-0000qM-UV for emacs-devel@gnu.org; Wed, 12 Aug 2009 07:29:05 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1MbC0p-0000g0-Gt for emacs-devel@gnu.org; Wed, 12 Aug 2009 07:29:03 -0400 Original-Received: from [199.232.76.173] (port=51099 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MbC0o-0000ft-Qg for emacs-devel@gnu.org; Wed, 12 Aug 2009 07:28:59 -0400 Original-Received: from mail-px0-f193.google.com ([209.85.216.193]:43689) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1MbC0j-0006cv-Kl; Wed, 12 Aug 2009 07:28:54 -0400 Original-Received: by pxi31 with SMTP id 31so27533pxi.24 for ; Wed, 12 Aug 2009 04:28:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:reply-to:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=QBsWBeV0KrWOihxMg0s4gI+tDvmtCA1Aj3C0RpdIjPU=; b=gPRYmlNdTlj7xOTWZspLw14eUbKpgIpxeqouEqGEQrdr2Hmxv5XTlQtToRl159OUNC Lp9KwBmw5hIUIK2QMxytR0gHUw4AY1DTD0QTb7PfJMntWlI/qB0L+IGLBtbCMMERL/UO /XXZ2pLqpd2ZdcpsdsoVdUbTrOgAao8acoy3E= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; b=I1bp1m2bPnjuSFHIiaxaO5wNYPH/KXuuCo3CwyXjLS9XEDRrwFIBBDVm/qyyiDlo0C G63O1GuZCyeqs8FuEj9wwM7jft2kACYxeYOq+LgciU+ZW/k/eCukV73pH9YRafpkm+kU 5ad3s/Ojq6sI8KAcLOqUtKU18AO6JmJIn3U98= Original-Received: by 10.115.47.17 with SMTP id z17mr14248waj.109.1250076531139; Wed, 12 Aug 2009 04:28:51 -0700 (PDT) In-Reply-To: X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 2) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:114138 Archived-At: --00163649a6dd94a73e0470f01f19 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit i very much second this! PEG's the next level of regex, and i expect it to replace regex in some sense in the coming years for the whole field of text processing. there are currently 2 of them in elisp as far as i know: * http://www.emacswiki.org/cgi-bin/wiki/ParserCompiler (2008) by Mike Mattie. * http://www.emacswiki.org/emacs/ParsingExpressionGrammars (2008) by Helmut Eller. it'd be much better if PEG is integrated from the ground up in elisp, possibly implemented in C or from other libs for speed. I imagine functions that takes a regex can have a version with PEG. Xah On Tue, Aug 11, 2009 at 11:43 PM, Miles Bader wrote: > "Eric M. Ludlam" writes: > > As far as how to define tables for a parsing system written in C, an > > old-school solution is to just use the flex/bison engines under the > > Emacs Lisp API. There are a lot of new parser generator systems > > though, and I don't really know what the best one might be. > > > > One of the hairier parts of the CEDET parser is the lexical analyzer. > > Slightly off-topic, but I'm a huge fan of "LPeg" [1], which is a > pattern-matching library for Lua, based on Parsing Expression Grammars > (PEGs). > > I've always wished for something like LPeg in elisp, and since Lua is at > heart quite lisp-like (despite the very different syntax), I think it > could work very well. Maybe it wouldn't be too hard to adapt LPeg's > core to elisp (it's licensed under the BSD license). > > [There's a popular implementation technique for PEGs called "packrat > parsers", and many PEG libraries use that technique -- however > apparently packrat parsers have some serious problems in practice, so > LPeg uses a different technique. See [2] for a discussion of this, and > of the LPeg implementation in detail.] > > Some nice things about LPeg: > > (1) It's very fast. > > (2) It's very concise; for typical usage, it's essentially like > writing a parser in yacc or whatever. > > (3) It makes it trivial to insert code and hooks at any point in the > parse; not just "actions", but code that can determine how the > parsing happens. This give a _huge_ amount of flexibility. > > (4) It's very easy to "think about", despite the flexibility and > presence of arbitrary code driving parsing, because it works kind > of like a recursive descent parser, operating greedily (but > provides mechanisms to do automatic backtracking when necessary). > > (5) Because it's so fast and flexible, typical practice is to _not_ > have a separate lexical analyzer, but just do lexical analysis in > the parser. This easier and more convenient, and also makes it > easier to use parser information in lexical analysis (e.g., the > famous "typedef" vs. "id" issue in C parsers). > > (6) It's very small -- the entire implementation (core engine and Lua > interface) is only 2000 lines of C. > > [The standard way to use LPeg in Lua uses Lua's ability to easily > overload standard operators, giving them LPeg-specific meanings when > invoked on first-class "pattern" objects. That can't be done in elisp, > but I think a more lispy approach should be easy.] > > [1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html > > [2] http://www.inf.puc-rio.br/~roberto/docs/peg.pdf > > -Miles > > -- > Zeal, n. A certain nervous disorder afflicting the young and inexperienced. > > > --00163649a6dd94a73e0470f01f19 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
i very much second this! PEG's the next level of regex, and i= expect it to replace regex in some sense in the coming years for the whole= field of text processing.

there are currently 2 o= f them in elisp as far as i know:



it'd be much better if PEG is integrated from the g= round up in elisp, possibly implemented in C or from other libs for speed. = I imagine functions that takes a regex can have a version with PEG.

=C2=A0=C2=A0Xah

On Tue, Aug 11, 2009 at 11:43 PM, Miles Bader <= ;miles@gnu.org> wrote:
"Eric M. Ludlam" <eric@siege-engine.com> writes:
> As far as how to define tables for a parsing system written in C, an > old-school solution is to just use the flex/bison engines under the > Emacs Lisp API. =C2=A0There are a lot of new parser generator systems<= br> > though, and I don't really know what the best one might be.
>
> One of the hairier parts of the CEDET parser i= s the lexical analyzer.

Slightly off-topic, but I'm a huge fan of "LPeg" [1], w= hich is a
pattern-matching library for Lua, based on Parsing Expression Grammars
(PEGs).

I've always wished for something like LPeg in elisp, and since Lua is a= t
heart quite lisp-like (despite the very different syntax), I think it
could work very well. =C2=A0Maybe it wouldn't be too hard to adapt LPeg= 's
core to elisp (it's licensed under the BSD license).

[There's a popular implementation technique for PEGs called "packr= at
parsers", and many PEG libraries use that technique -- however
apparently packrat parsers have some serious problems in practice, so
LPeg uses a different technique. =C2=A0See [2] for a discussion of this, an= d
of the LPeg implementation in detail.]

Some nice things about LPeg:

=C2=A0(1) It's very fast.

=C2=A0(2) It's very concise; for typical usage, it's essentially l= ike
=C2=A0 =C2=A0 =C2=A0writing a parser in yacc or whatever.

=C2=A0(3) It makes it trivial to insert code and hooks at any point in the=
=C2=A0 =C2=A0 =C2=A0parse; not just "actions", but code that can= determine how the
=C2=A0 =C2=A0 =C2=A0parsing happens. =C2=A0This give a _huge_ amount of fl= exibility.

=C2=A0(4) It's very easy to "think about", despite the flexi= bility and
=C2=A0 =C2=A0 =C2=A0presence of arbitrary code driving parsing, because it= works kind
=C2=A0 =C2=A0 =C2=A0of like a recursive descent parser, operating greedily= (but
=C2=A0 =C2=A0 =C2=A0provides mechanisms to do automatic backtracking when = necessary).

=C2=A0(5) Because it's so fast and flexible, typical practice is to _n= ot_
=C2=A0 =C2=A0 =C2=A0have a separate lexical analyzer, but just do lexical = analysis in
=C2=A0 =C2=A0 =C2=A0the parser. =C2=A0This easier and more convenient, and= also makes it
=C2=A0 =C2=A0 =C2=A0easier to use parser information in lexical analysis (= e.g., the
=C2=A0 =C2=A0 =C2=A0famous "typedef" vs. "id" issue in= C parsers).

=C2=A0(6) It's very small -- the entire implementation (core engine an= d Lua
=C2=A0 =C2=A0 =C2=A0interface) is only 2000 lines of C.

[The standard way to use LPeg in Lua uses Lua's ability to easily
overload standard operators, giving them LPeg-specific meanings when
invoked on first-class "pattern" objects. =C2=A0That can't be= done in elisp,
but I think a more lispy approach should be easy.]

[1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

[2] http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

-Miles

--
Zeal, n. A certain nervous disorder afflicting the young and inexperienced.=




--00163649a6dd94a73e0470f01f19--