From: Alan Mackenzie <acm@muc.de>
To: Pierre Neidhardt <ambrevar@gmail.com>
Cc: van@scratch.space, eliz@gnu.org, emacs-devel@gnu.org,
	rms@gnu.org, Noam Postavsky <npostavs@gmail.com>
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 15:51:26 +0000
Message-ID: <20180525155126.GA4096@ACM>
In-Reply-To: <87h8mw3yoc.fsf@gmail.com>

Hello, Pierre.

On Fri, May 25, 2018 at 10:52:03 +0200, Pierre Neidhardt wrote:

> rx.el is one of the best concepts I've discovered in a long time.
> It's another instance of "Don't come up with a new (mini)language when
> Lisp can do better": it's easier to learn, more flexible, easier to
> write, much easier to read and as a consequence much more maintainable.

Much easier than what?  Than the putative mini-language that doesn't get
written?

> > Some people, when confronted with a problem, think "I know, I'll use
> > regular expressions." Now they have two problems.
> > -- Jamie Zawinski

> It's also much more "programmable" thanks to its `eval' expression.
> (It's possible to count!)

> See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/
> for some nice examples.

> I think it's high time we moved away from traditional regexps and
> embraced the concept of rx.el.  I'm thinking of implementing it for
> Guile.

There's nothing stopping anybody from using rx.el.  However, people have
mostly _not_ used it.  The "I think it's high time ...." suggests in
some way forcing people to use it.  Before mandating something like
this, I think we should find out why it's not already in common use.

> At the moment the rx.el implementation is built on top of Emacs regexps
> which are implemented in C.  I believe this does not use the power of
> Lisp as much as it could.

But would any alternative use the power of regexps?

> The traditional regexps work in two steps: first build a blackbox
> automaton from the string expression, then test if the input matches.

> Building the automaton is costly.  In C, we build it once and save the
> result in a variable so that every regexp match does not rebuild the
> automaton each time.

Emacs has a (moderately large) cache of compiled regexps, so that
building the automata is done very rarely, possibly just once per
regexp in each session of Emacs.

> In high-level languages, automatons are automatically cached to save the
> cost of building them.

Emacs Lisp does this too.

> The rx.el library/concept could alleviate this issue altogether: because
> we express the automaton directly in Lisp, the parsing step is not
> needed and thus the building cost could be tremendously reduced.

> So the rx.el building steps

>   rx expression -> regexp string -> C regexp automaton

> could boil down to simply

>   rx automaton

I don't see what you're trying to save, here.  At some stage, the regexp
source, in whatever form, needs to be converted to an automaton.

Are you suggesting here building an interpreter in Lisp directly to
execute rx expressions?
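
To make the pipeline concrete, here is a minimal sketch in Python of the
two steps as they stand: a structured form is first translated to a
regexp string, and that string is then compiled into a matcher.  The
tuple notation and the `to_regexp' function are hypothetical stand-ins
for rx, supporting only four operators; this is not how rx.el itself is
implemented.

```python
import re


def to_regexp(form):
    """Translate a tiny, hypothetical rx-like form into a regexp string."""
    if isinstance(form, str):
        return re.escape(form)  # literal text
    op, *args = form
    if op == "seq":
        return "".join(to_regexp(a) for a in args)
    if op == "or":
        return "(?:" + "|".join(to_regexp(a) for a in args) + ")"
    if op == "one-or-more":
        return "(?:" + to_regexp(args[0]) + ")+"
    if op == "any":
        return "[" + args[0] + "]"  # character class, passed through unescaped
    raise ValueError(f"unknown operator: {op!r}")


form = ("seq", "foo", ("one-or-more", ("any", "0-9")))
source = to_regexp(form)        # step 1: structured form -> regexp string
automaton = re.compile(source)  # step 2: regexp string -> compiled matcher
```

Whichever notation the source is written in, step 2 (or something
equivalent to it) still has to happen before any matching can be done.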

> It would be interesting to compare the performance.  This also means
> that there would be no need for caching on behalf of the supporting
> language.

I predict that an rx interpreter built in Lisp will be two orders of
magnitude slower than the current regexp machine, in which both the
construction of the automaton and the byte-code interpreter which runs
it are written in C (and probably quite optimised C at that).

Regexp performance is critical to Emacs's performance in general.

> What do you think?

I think we will, in the main, carry on using conventional regular
expressions expressed as strings.  I can't get excited about rx syntax,
which I'm sure would be just as tedious as, and possibly more difficult
to read than, a standard regexp.  Analogously, as a musician, I read
standard musical notation (with sets of five lines and dots) far more
easily and fluently than I could any "simplified" system designed for
beginners, which would be bloated by comparison.

Regular expressions can be difficult.  I don't believe this difficulty
lies, in the main, in the compact notation used to express them.  Rather
it lies in the concepts and the semantics of the regexp elements, and
being able to express a "mental automaton" in regexp semantics.

> --
> Pierre Neidhardt

-- 
Alan Mackenzie (Nuremberg, Germany).


