all messages for Emacs-related lists mirrored at yhetil.org
From: "Eric M. Ludlam" <eric@siege-engine.com>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Cc: Daniel Colascione <danc@merrillpress.com>,
	David Engster <deng@randomsample.de>,
	Daniel Colascione <danc@merrillprint.com>,
	emacs-devel@gnu.org, Lennart Borgman <lennart.borgman@gmail.com>,
	Deniz Dogan <deniz.a.m.dogan@gmail.com>,
	Steve Yegge <stevey@google.com>, Leo <sdl.web@gmail.com>,
	Miles Bader <miles@gnu.org>
Subject: Re: "Font-lock is limited to text matching" is a myth
Date: Tue, 11 Aug 2009 22:16:53 -0400	[thread overview]
Message-ID: <1250043413.6753.457.camel@projectile.siege-engine.com> (raw)
In-Reply-To: <jwvtz0e5qfk.fsf-monnier+emacs@gnu.org>

On Tue, 2009-08-11 at 12:04 -0400, Stefan Monnier wrote:
> > I. Asynchronous parsing
> 
> BTW, I'm interested in adding core-Emacs support for such parsing, so if
> you have any ideas about it, please share them.  The way I see it, there
> should be a table-based parsing engine written in C running in
> a separate thread, so the question are "what should the tables look
> like?", "what should the output look like?", "what should the engine
> look like?", "how should the asynchronous code synchronize with the rest
> of Emacs?".  Any help along this effort would be welcome.

Hi,

  I don't think you could define a way to parse some things, like C++
with preprocessor support, via table declarations alone.  There will
always be some decision that the table designer misses.  For example,
deciding whether to exclude a block of code inside #ifdef MYSYM ...
#endif.
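
To make the point concrete, here is a minimal Python sketch.  The
function name `active_lines` is hypothetical, and it handles only
#ifdef, #ifndef, #else and #endif (no #if expressions, no #define
tracking).  It shows that whether a line is kept depends on a set of
defined symbols supplied at parse time, which static grammar tables
cannot know in advance.

```python
def active_lines(lines, defined):
    """Return the lines that survive #ifdef/#ifndef/#else/#endif filtering.

    Hypothetical helper: `defined` is the set of macro names considered
    defined.  #if expressions and #define tracking are not handled.
    """
    stack = []  # one bool per open conditional: is the current branch active?
    out = []
    for line in lines:
        s = line.strip()
        if s.startswith("#ifdef "):
            stack.append(s.split()[1] in defined)
        elif s.startswith("#ifndef "):
            stack.append(s.split()[1] not in defined)
        elif s == "#else":
            stack[-1] = not stack[-1]   # outer inactive branches still win via all()
        elif s == "#endif":
            stack.pop()
        elif all(stack):                # keep only if every enclosing branch is active
            out.append(line)
    return out
```

The same source yields different token streams for different symbol
sets, so the decision has to be made by code, not by the tables.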

  I do think that if there were a set of Emacs Lisp functions considered
"thread safe", parser authors like myself would find a way to restrict
ourselves to that set in those situations where the special built-in
parser needs to make such decisions.  It might even be a special set of
new functions, like 'parser-cons' or 'parser-car', though writing that
out, it does feel icky.
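
As an illustration of "restricting ourselves to that set", here is a
hedged Python sketch of an action evaluator that refuses anything not on
a whitelist.  The primitive names mirror the hypothetical 'parser-cons'
and 'parser-car' above; the tuple-based action format and the name
`run_action` are assumptions for this example, not an existing API.

```python
# Hypothetical whitelist of "thread safe" primitives a parser action may call.
SAFE_PRIMITIVES = {
    "parser-cons": lambda a, b: (a, b),
    "parser-car": lambda pair: pair[0],
    "parser-cdr": lambda pair: pair[1],
}

def run_action(expr, env):
    """Evaluate a tiny s-expression-like action built from tuples.

    A str is looked up in env (e.g. "$1"); any other non-tuple atom is a
    literal; a tuple (name, arg, ...) calls a whitelisted primitive.
    """
    if isinstance(expr, str):
        return env[expr]
    if not isinstance(expr, tuple):
        return expr
    name, *args = expr
    if name not in SAFE_PRIMITIVES:
        raise PermissionError(f"{name} is not allowed in an async action")
    return SAFE_PRIMITIVES[name](*(run_action(a, env) for a in args))
```

Anything outside the whitelist fails loudly instead of silently touching
shared state from another thread.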

  The project I was going to pick up after the current CEDET
merge/release stuff is related to this, though I was going to build a
separate process and communicate via pipe/socket/whatever with it. This
is because I'm seeing situations where the tag database is so large that
the whole machine slows down.  In this case it is not related to parser
speed, just the data structure size. Imagine a Lisp data structure for
the entirety of the Linux kernel's symbol space.  Such a subprocess
would also enable background parsing of files not currently in buffers
of the active Emacs.
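
A minimal sketch of the separate-process idea, assuming a
line-delimited JSON protocol over the child's stdin/stdout pipes.  The
"add"/"find" operations and the helper names are hypothetical; a real
tag server would hold the large database in its own address space, which
is the point of moving it out of Emacs.

```python
import json
import subprocess
import sys

# Hypothetical tag-server child: one JSON request per line on stdin,
# one JSON reply per line on stdout.  It supports only "add" and "find".
SERVER = r'''
import json, sys
tags = {}
while True:
    line = sys.stdin.readline()
    if not line:
        break
    req = json.loads(line)
    if req["op"] == "add":
        tags.setdefault(req["file"], []).extend(req["tags"])
        reply = {"ok": True}
    elif req["op"] == "find":
        reply = {"files": sorted(f for f, names in tags.items() if req["name"] in names)}
    else:
        reply = {"error": "unknown op"}
    print(json.dumps(reply), flush=True)
'''

def start_server():
    return subprocess.Popen([sys.executable, "-c", SERVER],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)

def request(proc, msg):
    """Send one request and block until its one-line reply arrives."""
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Closing the child's stdin ends its loop; the parent then calls wait().
From Emacs the equivalent plumbing would be a process with a filter
function rather than blocking reads.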

  Let me summarize.  I think CEDET has managed to use timers to overcome
asynchronous parsing problems for the purpose of TAGGING files, and may
not be able to take advantage of an asynchronous table-driven parsing
system.  There are, however, other problems CEDET faces that could
benefit from asynchronous behavior, mostly in the way you can
asynchronously run "make TAGS" for etags.  As such, an asynchronous
table-driven parsing system could focus on a narrower, less rigorous set
of requirements suited to colorizing, and offer a synchronous form for
when other kinds of logic are needed.
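
The timer approach amounts to cooperative, time-sliced parsing: do a
bounded amount of work, then hand control back.  CEDET does this with
Emacs timers in Lisp; the Python generator below only mimics the idea.
The toy "tagger" (it just looks for lines starting with `def`) and the
names `incremental_tagger` and `run_on_idle_ticks` are hypothetical.

```python
def incremental_tagger(text, chunk=2):
    """Toy incremental tagger: records (name, line) for lines starting with
    'def', yielding None after every `chunk` lines so a timer-driven caller
    can interleave other work.  The final yield is the list of tags."""
    tags = []
    for i, line in enumerate(text.splitlines(), 1):
        words = line.split()
        if len(words) >= 2 and words[0] == "def":
            tags.append((words[1].split("(")[0], i))
        if i % chunk == 0:
            yield None          # pause: give control back to the "event loop"
    yield tags                  # final result

def run_on_idle_ticks(task):
    """Resume the task once per simulated idle-timer tick until it finishes."""
    ticks = 0
    for result in task:
        ticks += 1
        if result is not None:
            return result, ticks
```

Each resumption stands in for one timer callback, so no single callback
blocks the editor for long.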

  As far as how to define tables for a parsing system written in C, an
old-school solution is to just use the flex/bison engines under the
Emacs Lisp API.  There are a lot of new parser generator systems,
though, and I don't really know which one might be best.  A new parsing
system built on a more modern technique, with enough hooks for tag
generation, would be easy to integrate into CEDET and would obsolete
nothing.

  One of the hairier parts of the CEDET parser is the lexical analyzer.
I remember jumping through hoops trying to squeeze every stray
(if ..) or function call out of the core loop, while keeping things
declarative and flexible enough to handle all the various languages.
Even so, variable-length lexical tokens still ended up shuttling text
strings around in some situations where I would have preferred
references into the buffer.  Major performance was gained by treating
parenthetical groups as single lex tokens, expanding them only if
actually needed.  Tagging parsers can then skip parsing function bodies,
for example, which provides a nice speed boost but is not so good for
colorizing.
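
A hedged Python sketch of both ideas from the paragraph above: tokens
hold start/end offsets into the buffer text instead of copied strings,
and a balanced (...) or {...} group becomes one token that is re-lexed
only on demand.  This is not CEDET's lexer; `Token`, `lex` and `expand`
are hypothetical names, and mismatched bracket kinds are not checked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    kind: str   # "symbol", "punct", or "paren-group"
    start: int  # offsets into the buffer text, not substrings
    end: int

def lex(buf, start=0, end=None):
    """Lex buf[start:end]; a balanced (...) or {...} group becomes ONE token."""
    end = len(buf) if end is None else end
    pairs = {"(": ")", "{": "}"}
    toks, i = [], start
    while i < end:
        c = buf[i]
        if c.isspace():
            i += 1
        elif c in pairs:
            depth, j = 0, i
            while j < end:               # scan to the matching close bracket
                if buf[j] in pairs:
                    depth += 1
                elif buf[j] in pairs.values():
                    depth -= 1
                j += 1
                if depth == 0:
                    break
            toks.append(Token("paren-group", i, j))
            i = j
        elif c.isalnum() or c == "_":
            j = i
            while j < end and (buf[j].isalnum() or buf[j] == "_"):
                j += 1
            toks.append(Token("symbol", i, j))
            i = j
        else:
            toks.append(Token("punct", i, i + 1))
            i += 1
    return toks

def expand(buf, tok):
    """Re-lex only the inside of a group, and only when a caller asks."""
    return lex(buf, tok.start + 1, tok.end - 1)
```

A tagging parser can walk the four top-level tokens of
"int f(a, b) { return a; }" and never expand the body group, while a
colorizer would call expand() on it.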

  As far as making this asynchronous goes, however, there aren't many
languages that can get by with nothing but the core lexical types.

  Once you have a lexical token stream, David Ponce's wisent parser (a
port of bison) is top-notch and very effective.  Moving bits into C may
not provide a huge speed boost, and those bits would certainly not be
separable from Lisp code that needs to be synchronous as far as TAGGING
is concerned.  It might be possible to define simplified code blocks
such as (FONT $1 'function) and get away with making asynchronous
parsers.  I'm not sure if that is sufficient.
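
One way to read the (FONT $1 'function) suggestion: make rule actions
pure data, so the engine interprets them without running arbitrary Lisp.
The Python sketch below assumes actions are tuples like
("FONT", 1, "function") and that some rule matcher (not shown) supplies
the (start, end) spans for $1, $2, ...; `apply_action` and `fontify` are
hypothetical names.

```python
def apply_action(action, spans):
    """Interpret a data-only action such as ("FONT", 1, "function").

    spans holds the (start, end) buffer offsets matched by $1, $2, ...
    Returns a (start, end, face) triple for the highlighter to apply.
    """
    op, index, face = action
    if op != "FONT":
        raise ValueError(f"unsupported action {op!r}")
    start, end = spans[index - 1]      # $1 refers to the first matched element
    return (start, end, face)

def fontify(matches):
    """matches: list of (action, spans) pairs produced by some rule matcher."""
    return [apply_action(action, spans) for action, spans in matches]
```

Because the only possible effect is producing (start, end, face)
triples, such actions could run in a background parser; anything
needing real Lisp would stay on the synchronous path.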

  Error handling in bison is a bit confusing for the uninitiated, and
hard to get right.  CEDET solved this by making the parsers implicitly
iterative at the framework level.  That means when you enter the parser
engine, it returns either nil or a single tag.  If the framework gets
nil, it skips that lexical token and tries again on the next one.  This
makes it robust to "junk" between tags, between function arguments, or
between commands.
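
The iterative loop described above can be sketched in a few lines of
Python.  `parse_all` and the toy `parse_var` rule are hypothetical; the
only contract assumed is that one parser call returns either None (the
"nil" case) or a tag plus the index just past what it consumed.

```python
def parse_all(tokens, parse_one):
    """Call parse_one repeatedly; on failure skip one token and retry.

    parse_one(tokens, i) returns None or (tag, next_i) with next_i > i,
    so the loop always makes progress and junk never aborts the parse.
    """
    tags, i = [], 0
    while i < len(tokens):
        result = parse_one(tokens, i)
        if result is None:
            i += 1                  # skip the offending token, try the next
        else:
            tag, i = result
            tags.append(tag)
    return tags

def parse_var(tokens, i):
    """Toy one-tag parser: recognizes  int NAME ;  as a variable tag."""
    if tokens[i] == "int" and i + 2 < len(tokens) and tokens[i + 2] == ";":
        return ("variable", tokens[i + 1]), i + 3
    return None
```

The error handling lives in the driver, so each grammar rule only has
to recognize well-formed input.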

  After babbling for a while, I would guess that Stefan is probably
asking for help identifying something like a syntax table.  I think
lexical analysis is common to all the parser generator frameworks, and
has the potential to produce a data structure larger than the buffer it
was derived from.  Deriving a push/pop lexical analyzer structure that
shares data with the buffer text, or can even cache itself on top of the
buffer so it doesn't need to "analyze" the whole thing over and over,
would be a great first step for any parsing system.
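
A minimal Python sketch of a token cache layered on the buffer text,
assuming tokens are stored as (start, end) offsets (sharing data with
the text rather than copying it) and that an edit is a plain
"replace start..end with new" operation.  `TokenCache` and `lex_from`
are hypothetical.  For simplicity it reuses only the tokens before the
edit; a fuller version would also shift and reuse tokens after it once
re-lexing resynchronizes.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")   # words, or single punctuation chars

def lex_from(text, pos):
    """Lex text from offset pos; tokens are (start, end) offsets."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text, pos)]

class TokenCache:
    def __init__(self, text):
        self.text = text
        self.tokens = lex_from(text, 0)

    def edit(self, start, end, new):
        """Replace text[start:end] with new and re-lex only what may change."""
        self.text = self.text[:start] + new + self.text[end:]
        # Keep tokens ending strictly before the edit.  A token that touches
        # the edit could merge with inserted text, so it is re-lexed too.
        keep = [t for t in self.tokens if t[1] < start]
        resume = keep[-1][1] if keep else 0
        self.tokens = keep + lex_from(self.text, resume)

    def strings(self):
        return [self.text[s:e] for s, e in self.tokens]
```

After any edit the cached tokens match a full re-lex of the new text,
but the work done is proportional to the text after the edit point.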

Eric




