Re: regexp does not work as documented


From: Thomas Lord <lord@emf.net>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Chong Yidong <cyd@stupidchicken.com>,
	192@emacsbugs.donarmstrong.com, emacs-devel@gnu.org,
	martin rudalics <rudalics@gmx.at>,
	David Koppelman <koppel@ece.lsu.edu>,
	Bruno Haible <bruno@clisp.org>
Subject: Re: regexp does not work as documented
Date: Mon, 12 May 2008 08:55:14 -0700
Message-ID: <48286862.6040105@emf.net>
In-Reply-To: <jwvfxsnpryp.fsf-monnier+emacsbugreports@gnu.org>

Stefan Monnier wrote:
> That's what I do in lex.el.

Sounds nice.  

Last bits of experience report, then:

If it isn't so already, it may be easy to arrange that the choice of
which DFA is being used, together with its "current state", is
represented as lisp objects that can be copied cheaply.  That gives
you the essence of "regular expression continuations".
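
To make the idea concrete, here is a minimal sketch of a regexp continuation as an immutable (DFA, state) pair. It is written in Python rather than Emacs Lisp purely for illustration; the names `Dfa`, `ReContinuation`, `start` and `step` are hypothetical and are not taken from lex.el.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple


class Dfa(NamedTuple):
    # transitions[(state, char)] -> next state; a missing key means failure.
    transitions: Dict[Tuple[int, str], int]
    start: int
    accepting: FrozenSet[int]


class ReContinuation(NamedTuple):
    dfa: Dfa               # which DFA is running (shared, never copied)
    state: Optional[int]   # current state, or None once the match has failed


def start(dfa: Dfa) -> ReContinuation:
    return ReContinuation(dfa, dfa.start)


def step(k: ReContinuation, ch: str) -> ReContinuation:
    """Return a new continuation; the old one stays valid and can be resumed."""
    if k.state is None:
        return k
    return ReContinuation(k.dfa, k.dfa.transitions.get((k.state, ch)))


# Hypothetical DFA for the regexp  ab*
AB_STAR = Dfa({(0, "a"): 1, (1, "b"): 1}, start=0, accepting=frozenset({1}))

k0 = start(AB_STAR)
k1 = step(k0, "a")      # matched "a"
k2 = step(k1, "b")      # matched "ab"
k_bad = step(k1, "x")   # failure branch; k1 is untouched and still usable
```

Because a continuation here is just a reference to the shared DFA plus one small state value, "copying" it costs almost nothing, which is what makes it practical to keep many of them around.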

Handy features that shouldn't be difficult to add (if not present):

Let programmers specify "labels" for each NFA state.  Then, for each
DFA state, provide a list of all the NFA labels that correspond to
it, and/or a more general way to "combine" NFA state labels into the
DFA label.  Many NFA states can end up combined into a single DFA
state, of course, so a "combine" function might be important.
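
One way the label idea could look is sketched below: a textbook subset construction (without epsilon moves) that takes a caller-supplied `combine` function. This is Python for illustration only, and `determinize`, `label_after`, `prefer_keyword` and the toy NFA are all made-up names, not an existing API.

```python
def determinize(nfa, start, labels, combine):
    """Subset construction (no epsilon moves) that also builds DFA state labels."""
    start_set = frozenset([start])
    ids = {start_set: 0}    # NFA state set -> DFA state id; the start is always 0
    trans = {}
    dfa_labels = {}
    work = [start_set]
    while work:
        cur = work.pop()
        # Gather the labels of all NFA states merged into this DFA state.
        found = [labels[s] for s in sorted(cur) if s in labels]
        dfa_labels[ids[cur]] = combine(found) if found else None
        by_char = {}
        for (s, ch), targets in nfa.items():
            if s in cur:
                by_char.setdefault(ch, set()).update(targets)
        for ch, targets in by_char.items():
            nxt = frozenset(targets)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            trans[(ids[cur], ch)] = ids[nxt]
    return trans, dfa_labels


def label_after(trans, dfa_labels, text):
    """Label of the DFA state reached on text (None on failure or nil label)."""
    state = 0
    for ch in text:
        state = trans.get((state, ch))
        if state is None:
            return None
    return dfa_labels[state]


# Toy NFA over the alphabet {i, f}: the keyword "if" versus identifiers [if]+.
TOY_NFA = {(0, "i"): {1, 3}, (0, "f"): {3}, (1, "f"): {2},
           (3, "i"): {3}, (3, "f"): {3}}
TOY_LABELS = {2: "KEYWORD", 3: "IDENT"}


def prefer_keyword(found):
    return "KEYWORD" if "KEYWORD" in found else found[0]


all_trans, all_labels = determinize(TOY_NFA, 0, TOY_LABELS, combine=list)
kw_trans, kw_labels = determinize(TOY_NFA, 0, TOY_LABELS, combine=prefer_keyword)
```

With `combine=list` the DFA state reached on "if" carries both labels; with `prefer_keyword` the ambiguity is resolved when the DFA is built, so the scanner never has to look at NFA states again.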

Include scanning functions to:

~ advance the DFA at most N characters (or until failure)
~ advance the DFA to the next non-nil state label (or failure)

In both cases, give a way for lisp programs to get back not only
the label (or failure indication) but also the regular expression
continuation.
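
The two scanning functions might be sketched roughly as follows. Again this is Python for illustration, with hypothetical names (`Cont`, `advance_at_most`, `advance_to_label`); each call returns the label, the continuation, and the new position, and failure is signalled by a continuation whose state is `None`.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Dfa(NamedTuple):
    transitions: Dict[Tuple[int, str], int]
    labels: Dict[int, str]   # state -> label; states not listed have a nil label
    start: int


class Cont(NamedTuple):
    dfa: Dfa
    state: Optional[int]     # None means the DFA has failed


def label_of(k):
    return None if k.state is None else k.dfa.labels.get(k.state)


def _step(k, ch):
    return Cont(k.dfa, k.dfa.transitions.get((k.state, ch)))


def advance_at_most(k, text, pos, n):
    """Consume at most n characters, stopping early on failure or end of text."""
    end = min(len(text), pos + n)
    while pos < end and k.state is not None:
        k = _step(k, text[pos])
        pos += 1
    return label_of(k), k, pos


def advance_to_label(k, text, pos):
    """Consume characters until a state with a non-nil label, or failure."""
    while pos < len(text) and k.state is not None:
        k = _step(k, text[pos])
        pos += 1
        if label_of(k) is not None:
            break
    return label_of(k), k, pos


# Hypothetical DFA for  abc*  whose only labelled state is reached after "ab".
AB_DFA = Dfa({(0, "a"): 1, (1, "b"): 2, (2, "c"): 2}, {2: "AB"}, 0)

k0 = Cont(AB_DFA, AB_DFA.start)
label1, k1, pos1 = advance_at_most(k0, "abcc", 0, 1)    # one character, no label yet
label2, k2, pos2 = advance_to_label(k1, "abcc", pos1)   # resumes from k1
```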

Those features are handy so that (for example) lisp programs can
hang a suspended regexp continuation on a buffer character as
a property, doing incremental "re-lexing" in application-specific
ways.
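
As a rough illustration of the re-lexing idea, the sketch below keeps a table of saved states keyed by position, standing in for a text property on buffer characters, and after an edit resumes from the nearest saved state instead of from the start. It is Python for illustration only; `STRING_DFA`, `scan`, `rescan_after_edit` and the checkpoint spacing are all assumptions.

```python
# Hypothetical two-state DFA tracking whether we are inside a "..." string.
STRING_DFA = {
    ("code", '"'): "string",
    ("string", '"'): "code",
}


def step(state, ch):
    # Characters without an explicit transition leave the state unchanged.
    return STRING_DFA.get((state, ch), state)


def scan(text, pos, state, checkpoints, every=4):
    """Run the DFA from pos to the end, saving the state at every `every`-th position."""
    steps = 0
    while pos < len(text):
        if pos % every == 0:
            checkpoints[pos] = state   # state *before* consuming text[pos]
        state = step(state, text[pos])
        pos += 1
        steps += 1
    return state, steps


def rescan_after_edit(text, edit_pos, checkpoints, every=4):
    """Resume from the last checkpoint at or before the edit position."""
    # Checkpoints after the edit may be stale, so drop them.
    for p in [p for p in checkpoints if p > edit_pos]:
        del checkpoints[p]
    resume = max((p for p in checkpoints if p <= edit_pos), default=0)
    state = checkpoints.get(resume, "code")
    return scan(text, resume, state, checkpoints, every)


checkpoints = {}
final, steps = scan('ab "cd" ef', 0, "code", checkpoints)

# Delete the closing quote (position 6) and re-lex only from position 4 onward.
edited = 'ab "cd ef'
final2, steps2 = rescan_after_edit(edited, 6, checkpoints)
```

A real implementation would presumably also stop early once the resumed scan reaches a later checkpoint whose saved state is unchanged; that refinement is left out here.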

The "advance to non-nil label" feature is useful for writing lisp
programs that *do not* need back-references or subexpression
positions per se.

This is a bit more speculative, but also consider functions to:

~ advance the state of a DFA based on characters passed in a
   function call rather than read from a buffer.  That is, a
   buffer position should not have to be part of the state of a
   running DFA.
       (advance-dfa re-continuation chr) => re-continuation

Why that last one?  Because then you can probably use the same
DFA engine as the heart of a shift-reduce parser and (for languages
that admit such things) write an incremental parser.  (You'd be using
non-buffer-position DFAs to process token ids emitted by the lexer.)
You can also use such a feature for things like serial I/O protocols.
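
A buffer-independent step function along these lines could be sketched as below. This is Python for illustration; `advance_dfa` mirrors the signature suggested above, while `ReContinuation`, `accepted` and the toy token DFA are hypothetical. The symbols here are token ids rather than characters, and they can arrive in separate chunks.

```python
from functools import reduce
from typing import Dict, FrozenSet, Hashable, NamedTuple, Optional, Tuple


class ReContinuation(NamedTuple):
    transitions: Dict[Tuple[int, Hashable], int]
    accepting: FrozenSet[int]
    state: Optional[int]     # None means the DFA has failed


def advance_dfa(k, symbol):
    """Feed one symbol (a character, token id, or byte) and return a new continuation."""
    if k.state is None:
        return k
    return k._replace(state=k.transitions.get((k.state, symbol)))


def accepted(k):
    return k.state in k.accepting


# Hypothetical DFA over token ids for the token pattern  NUM (PLUS NUM)*
k0 = ReContinuation({(0, "NUM"): 1, (1, "PLUS"): 0}, frozenset({1}), 0)

# Whole token stream at once.
k_all = reduce(advance_dfa, ["NUM", "PLUS", "NUM"], k0)

# Same stream arriving in two chunks, as from a lexer or a serial line.
k_mid = reduce(advance_dfa, ["NUM", "PLUS"], k0)
k_end = advance_dfa(k_mid, "NUM")
```

Since no buffer position is involved, the caller decides where symbols come from, which is what would let the same engine sit behind a lexer, a parser driver, or a protocol reader.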

Incremental parsers open the door to robust "syntax directed editing"
which I think could be an exciting direction for IDE features to take.
(Years ago, Thomas Reps and Tim Teitelbaum worked on the "Synthesizer
Generator", which I recall had features along these lines, though
their parser guts were probably different from what I suggest.  As I
now vaguely recall, there is a book that talks about their Emacs-based
implementation.)

Bye.  Thanks.  And good luck!
-t
