From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 15:51:26 +0000
Message-ID: <20180525155126.GA4096@ACM>
References: <87h8mw3yoc.fsf@gmail.com>
In-Reply-To: <87h8mw3yoc.fsf@gmail.com>
To: Pierre Neidhardt
Cc: van@scratch.space, eliz@gnu.org, emacs-devel@gnu.org, rms@gnu.org, Noam Postavsky
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:225710

Hello, Pierre.

On Fri, May 25, 2018 at 10:52:03 +0200, Pierre Neidhardt wrote:

> rx.el is one of the best concepts I've discovered in a long time.
> It's another instance of "Don't come up with a new (mini)language when
> Lisp can do better": it's easier to learn, more flexible, easier to
> write, much easier to read and as a consequence much more maintainable.

Much easier than what? Than the putative mini-language that doesn't get
written?

> > Some people, when confronted with a problem, think "I know, I'll use
> > regular expressions." Now they have two problems.
> > -- Jamie Zawinski

> It's also much more "programmable" thanks to its `eval' expression.
> (It's possible to count!)
> See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/
> for some nice examples.

> I think it's high time we moved away from traditional regexps and
> embraced the concept of rx.el. I'm thinking of implementing it for
> Guile.

There's nothing stopping anybody from using rx.el. However, people have
mostly _not_ used it. The "I think it's high time ...." suggests in some
way forcing people to use it. Before mandating something like this, I
think we should find out why it's not already in common use.

> At the moment the rx.el implementation is built on top of Emacs regexps
> which are implemented in C. I believe this does not use the power of
> Lisp as much as it could.

But would any alternative use the power of regexps?

> The traditional regexps work in two steps: first build a blackbox
> automaton from the string expression, then test if the input matches.
> Building the automaton is costly. In C, we build it once and save the
> result in a variable so that every regexp match does not rebuild the
> automaton each time.

Emacs has a (moderately large) cache of regexps, so the automatons are
built very rarely, possibly just once each per Emacs session.

> In high-level languages, automatons are automatically cached to save the
> cost of building them.

Emacs Lisp does this too.

> The rx.el library/concept could alleviate this issue altogether: because
> we express the automaton directly in Lisp, the parsing step is not
> needed and thus the building cost could be tremendously reduced.
>
> So the rx.el building steps
>
> rx expression -> regexp string -> C regexp automaton
>
> could boil down to simply
>
> rx automaton

I don't see what you're trying to save here. At some stage, the regexp
source, in whatever form, needs to be converted to an automaton. Are you
suggesting building an interpreter in Lisp to execute rx expressions
directly?

> It would be interesting to compare the performance. This also means
> that there would be no need for caching on behalf of the supporting
> language.

I predict that an rx interpreter built in Lisp would be two orders of
magnitude slower than the current regexp machine, in which both the
construction of the automaton and the byte-code interpreter which runs it
are written in C (and probably quite optimised C at that). Regexp
performance is critical to Emacs's performance in general.

> What do you think?

I think we will, in the main, carry on using conventional regular
expressions expressed as strings.
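
As a rough illustration of the pipeline discussed in this message
(structured expression -> regexp string -> compiled matcher, with the
compiled form cached), here is a minimal sketch in Python rather than
Emacs Lisp. The tuple notation and the names `to_regexp` and `compiled`
are hypothetical, invented for this example; they are not rx.el's syntax
or Emacs's regexp cache, and Python's `re` module merely stands in for the
C regexp engine.

```python
import re
from functools import lru_cache

# Hypothetical toy notation (NOT rx.el syntax): a form is either a plain
# string, meaning literal text, or a tuple whose first item names an operator.
def to_regexp(form):
    """Translate a toy rx-like form into an ordinary regexp string."""
    if isinstance(form, str):
        return re.escape(form)
    op, *args = form
    if op == "seq":
        return "".join(to_regexp(a) for a in args)
    if op == "or":
        return "(?:" + "|".join(to_regexp(a) for a in args) + ")"
    if op == "one-or-more":
        return "(?:" + to_regexp(("seq", *args)) + ")+"
    if op == "digit":
        return "[0-9]"
    raise ValueError(f"unknown operator: {op!r}")

@lru_cache(maxsize=128)
def compiled(regexp):
    """Build the matcher once per distinct regexp string, then reuse it."""
    return re.compile(regexp)

form = ("seq", "v", ("one-or-more", ("digit",)), ".", ("or", "alpha", "beta"))
regexp = to_regexp(form)  # the structured form still becomes a string

for text in ("v12.alpha", "v7.beta", "v.alpha"):
    # Only the first call compiles; later calls are served from the cache.
    print(text, bool(compiled(regexp).fullmatch(text)))

print(compiled.cache_info())
```

Whatever the surface notation, the conversion to a matcher still has to
happen somewhere; the cache only ensures it happens once per distinct
regexp.
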
I can't get excited about rx syntax, which I'm sure would be just as
tedious as, and possibly more difficult to read than, a standard regexp.
Analogously, as a musician, I read standard musical notation (with sets of
five lines and dots) far more easily and fluently than I could any
"simplified" system designed for beginners, which would be bloated by
comparison.

Regular expressions can be difficult. I don't believe this difficulty
lies, in the main, in the compact notation used to express them. Rather
it lies in the concepts and the semantics of the regexp elements, and
being able to express a "mental automaton" in regexp semantics.

> --
> Pierre Neidhardt

--
Alan Mackenzie (Nuremberg, Germany).