unofficial mirror of bug-gnu-emacs@gnu.org 
From: Thomas Lord <lord@emf.net>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Chong Yidong <cyd@stupidchicken.com>,
	192@emacsbugs.donarmstrong.com, emacs-devel@gnu.org,
	David Koppelman <koppel@ece.lsu.edu>,
	Bruno Haible <bruno@clisp.org>
Subject: bug#192: regexp does not work as documented
Date: Mon, 12 May 2008 08:55:14 -0700
Message-ID: <48286862.6040105__1582.94740035689$1210622412$gmane$org@emf.net>
In-Reply-To: <jwvfxsnpryp.fsf-monnier+emacsbugreports@gnu.org>

Stefan Monnier wrote:
> That's what I do in lex.el.

Sounds nice.  

Last bits of experience report, then:

If it isn't so already, it may be easy to arrange that the choice of
DFA, plus its "current state", can be represented as a Lisp object and
cheaply copied.  That gives the essence of "regular expression
continuations".
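
A minimal sketch of such a continuation object, not taken from lex.el:
the struct name `re-cont', its slots, and the toy transition table are
all assumptions made for illustration.  The point is only that copying
allocates a small record while the DFA table itself is shared.

```elisp
(require 'cl-lib)

;; Hypothetical representation: a DFA is a vector of states; each state
;; is an alist mapping a character to the index of the next state.
(cl-defstruct (re-cont (:constructor re-cont-create)
                       (:copier re-cont-copy))
  dfa     ; the shared transition table, never modified while matching
  state)  ; index of the current state, or nil after failure

;; A toy DFA for "ab*": state 0 --a--> 1, state 1 --b--> 1.
(defvar re-cont-example-dfa
  (vector '((?a . 1))
          '((?b . 1))))

(let* ((k1 (re-cont-create :dfa re-cont-example-dfa :state 0))
       (k2 (re-cont-copy k1)))
  ;; The copy is shallow: both share the table, only the state differs.
  (setf (re-cont-state k2) 1)
  (list (re-cont-state k1)
        (re-cont-state k2)
        (eq (re-cont-dfa k1) (re-cont-dfa k2))))
;; => (0 1 t)
```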

Handy features that shouldn't be difficult to add (if not present):

Let programmers specify "labels" for each NFA state.  Then, for each
DFA state, provide a list of all NFA labels that correspond to that
DFA state and/or a more general way to "combine" NFA state labels
into the DFA label.  Many NFA states can collapse into a single DFA
state, of course, so a "combine" function might be important.
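
To make the labeling idea concrete, here is a rough sketch.  It assumes
a DFA state is known by the list of NFA state numbers it was built from
(as in the usual subset construction); the `dfa-sketch-' names and the
label table are hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical label table: NFA state number -> label (nil = unlabeled).
(defvar dfa-sketch-nfa-labels
  '((0 . nil) (1 . keyword) (2 . identifier) (3 . nil)))

(defun dfa-sketch-label-list (nfa-states)
  "Return the non-nil labels of NFA-STATES, the set behind one DFA state."
  (delq nil (mapcar (lambda (s) (cdr (assq s dfa-sketch-nfa-labels)))
                    nfa-states)))

(defun dfa-sketch-label-combine (nfa-states combine)
  "Fold the labels of NFA-STATES with COMBINE; return nil if none is labeled."
  (let ((found (dfa-sketch-label-list nfa-states)))
    (and found (cl-reduce combine found))))

;; One possible COMBINE: a keyword wins over any other label.
(defun dfa-sketch-prefer-keyword (a b)
  (if (or (eq a 'keyword) (eq b 'keyword)) 'keyword a))

(dfa-sketch-label-list '(1 2 3))
;; => (keyword identifier)
(dfa-sketch-label-combine '(1 2 3) #'dfa-sketch-prefer-keyword)
;; => keyword
(dfa-sketch-label-combine '(0 3) #'dfa-sketch-prefer-keyword)
;; => nil
```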

Include scanning functions to:

~ advance the DFA at most N characters (or until failure)
~ advance the DFA to the next non-nil state label (or failure)

In both cases, give a way for lisp programs to get back not only
the label (or failure indication) but also the regular expression
continuation.
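
The two scanning functions might look roughly like this.  Everything
here is an assumption for the sake of a runnable sketch: a continuation
is a plain cons (DFA . STATE) with STATE nil after failure, each DFA
state is (LABEL . TRANSITIONS), and the `scan-sketch-' names are made
up.  Both scanners return (LABEL . CONTINUATION) and leave point after
the last character consumed.

```elisp
;; Toy DFA for "ab+"; state 2 carries the label `word'.
(defvar scan-sketch-dfa
  (vector '(nil  . ((?a . 1)))      ; 0: start
          '(nil  . ((?b . 2)))      ; 1: seen "a"
          '(word . ((?b . 2)))))    ; 2: seen "ab+"

(defun scan-sketch-step (cont char)
  "Return the continuation reached from CONT on CHAR (state nil on failure)."
  (let* ((dfa (car cont))
         (state (cdr cont)))
    (cons dfa (and state (cdr (assq char (cdr (aref dfa state))))))))

(defun scan-sketch-label (cont)
  "Return the label of CONT's current state, or nil."
  (and (cdr cont) (car (aref (car cont) (cdr cont)))))

(defun scan-sketch-advance-n (cont n)
  "Advance CONT over at most N characters after point, or until failure."
  (while (and (> n 0) (cdr cont) (not (eobp)))
    (setq cont (scan-sketch-step cont (char-after)))
    (forward-char 1)
    (setq n (1- n)))
  (cons (scan-sketch-label cont) cont))

(defun scan-sketch-advance-to-label (cont)
  "Advance CONT until it reaches a labeled state, fails, or hits end of buffer."
  (let (label)
    (while (and (not label) (cdr cont) (not (eobp)))
      (setq cont (scan-sketch-step cont (char-after)))
      (forward-char 1)
      (setq label (scan-sketch-label cont)))
    (cons label cont)))

(with-temp-buffer
  (insert "abbb")
  (goto-char (point-min))
  (let* ((start (cons scan-sketch-dfa 0))
         (r1 (scan-sketch-advance-to-label start))  ; stops after "ab"
         (r2 (scan-sketch-advance-n (cdr r1) 10)))  ; resumes, eats "bb"
    (list (car r1) (cdr (cdr r1)) (car r2) (point))))
;; => (word 2 word 5)
```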

Those features are handy so that (for example) lisp programs can
hang a suspended regexp continuation on a buffer character as
a property, doing incremental "re-lexing" in application-specific
ways.
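
As an illustration of hanging a continuation on a character, the sketch
below uses ordinary text properties.  The property name
`relex-sketch-cont', both function names, and the (toy-dfa . 2)
continuation value are hypothetical; `put-text-property',
`get-text-property' and `with-silent-modifications' are the standard
Emacs facilities.

```elisp
(defun relex-sketch-save (pos cont)
  "Attach continuation CONT to the character at POS."
  (with-silent-modifications
    (put-text-property pos (1+ pos) 'relex-sketch-cont cont)))

(defun relex-sketch-nearest (pos)
  "Return (POSITION . CONTINUATION) for the nearest save at or before POS.
Return nil if no continuation has been saved in that range."
  (let ((p pos) cont)
    (while (and (>= p (point-min))
                (not (setq cont (get-text-property p 'relex-sketch-cont))))
      (setq p (1- p)))
    (and cont (cons p cont))))

(with-temp-buffer
  (insert "abbb")
  ;; Pretend the lexer was in state 2 after consuming the "b" at position 2.
  (relex-sketch-save 2 '(toy-dfa . 2))
  (relex-sketch-nearest 4))
;; => (2 toy-dfa . 2)
```

After an edit at some position, an application could look up the
nearest saved continuation before the change and resume lexing from
there instead of from the start of the buffer.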

The "advance to non-nil label" feature is useful for writing lisp
programs that *do not* need back-referencing or sub-exp locations
per se.

It is a bit more speculative, but also consider a function to:

~ advance the state of a DFA based on characters provided
   in a function call rather than read from a buffer; that is, a
   buffer position should not have to be part of the state of a
   running DFA.
       (advance-dfa re-continuation chr) => re-continuation

Why that last one?  Because then you can probably use the same
DFA engine as the heart of a shift-reduce parser and (for languages
that admit such things) write an incremental parser.  (You'd be using
non-buffer-position DFAs to process token ids emitted by the lexer.)
You can also use such a feature for things like serial I/O protocols.
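
A buffer-free step function along those lines could be as small as the
following.  This is only a sketch under assumptions: the continuation
is a cons (DFA . STATE), the table format and the `sketch-' names are
invented, and the inputs are token ids (symbols) such as a lexer might
emit.  It shows the DFA being driven from a list, not a full
shift-reduce parser.

```elisp
(require 'cl-lib)

;; Hypothetical table: state index -> alist of (INPUT . NEXT-STATE).
;; It accepts token sequences like: id plus id plus id ...
(defvar sketch-token-dfa
  (vector '((id . 1))       ; 0: expect an identifier
          '((plus . 0))))   ; 1: after an identifier, expect `plus'

(defun sketch-advance-dfa (cont input)
  "Return a new continuation: CONT advanced by INPUT.
No buffer is consulted.  STATE becomes nil once the DFA has failed."
  (let* ((dfa (car cont))
         (state (cdr cont)))
    (cons dfa (and state (cdr (assq input (aref dfa state)))))))

(defun sketch-run-dfa (cont inputs)
  "Feed each element of the list INPUTS to CONT in turn."
  (cl-reduce #'sketch-advance-dfa inputs :initial-value cont))

(cdr (sketch-run-dfa (cons sketch-token-dfa 0) '(id plus id)))
;; => 1
(cdr (sketch-run-dfa (cons sketch-token-dfa 0) '(id id)))
;; => nil
```

Because each call returns a fresh continuation and leaves its argument
alone, a caller can keep earlier continuations around and back up to
them, which is what an incremental parser would want.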

Incremental parsers open the door to robust "syntax directed editing",
which I think could be an exciting direction for IDE features to take.
Years ago, Thomas Reps and Tim Teitelbaum worked on the "Synthesizer
Generator", which I recall had features along these lines (though their
parser guts were probably different from what I suggest).  As I now
vaguely recall, there is a book that talks about their Emacs-based
implementation.

Bye.  Thanks.  And good luck!
-t






