From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: rx.el sexp regexp syntax
Date: Sun, 27 May 2018 20:16:29 +0000
Message-ID: <20180527201629.GC11447@ACM>
References: <87h8mw3yoc.fsf@gmail.com> <20180525155126.GA4096@ACM> <87a7slyr3v.fsf@tromey.com>
To: Tom Tromey
Cc: rms@gnu.org, Pierre Neidhardt, Noam Postavsky, emacs-devel@gnu.org, van@scratch.space, eliz@gnu.org

Hello, Tom.

On Sun, May 27, 2018 at 10:56:36 -0600, Tom Tromey wrote:
> >>>>> "Alan" == Alan Mackenzie writes:
>
> >> Building the automaton is costly. In C, we build it once and save the
> >> result in a variable so that every regexp match does not rebuild the
> >> automaton each time.
>
> Alan> Emacs has a (moderately large) cache of regexps, so that building the
> Alan> automatons is done very rarely. Possibly just once each for each
> Alan> session of Emacs.
>
> I wonder about both of these statements.
>
> On the one hand, AFAICT the regex cache is 20 items. From search.c:
>
> #define REGEXP_CACHE_SIZE 20
>
> That seems pretty small to me, given how prevalent regexps are in elisp.

Hmm. I must have misremembered. I thought the cache size was 60, for
some reason. Now that RAM is measured in gigabytes, we could probably
raise that limit of 20 (if there's any need).

> On the other hand, in the past when I have tried to profile Emacs, I
> haven't seen regexp compilation show up too much. IIRC I did see regexp
> matching and the GC. Maybe this just points out the efficacy of the
> cache -- maybe 20 items is plenty.

Maybe. I just don't know.
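To make the cache question concrete, here is a minimal sketch, assuming POSIX regcomp/regexec as a stand-in for Emacs's own regexp compiler: a fixed-size cache of 20 compiled patterns kept in most-recently-used order, where a hit is moved to the front and a miss recycles the least recently used entry. All names (rx_cache_lookup, rx_compile_count, and so on) are hypothetical; this is not the code in search.c, although search.c's compile_pattern does, as far as I can tell, also promote a hit to the front of its list.

```c
/* Hypothetical sketch: POSIX regcomp stands in for Emacs's regexp compiler. */
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define RX_CACHE_SIZE 20   /* same size as REGEXP_CACHE_SIZE in search.c */

struct rx_cache_entry {
    char *pattern;                 /* NULL while the slot is unused */
    regex_t compiled;              /* valid only when pattern != NULL */
    struct rx_cache_entry *next;   /* next entry in most-recently-used order */
};

static struct rx_cache_entry rx_slots[RX_CACHE_SIZE];
static struct rx_cache_entry *rx_head;  /* most recently used entry */
static int rx_compile_count;            /* number of real regcomp calls */

/* Chain all slots into one list; call once before any lookup. */
static void rx_cache_init(void)
{
    for (int i = 0; i < RX_CACHE_SIZE; i++)
        rx_slots[i].next = (i + 1 < RX_CACHE_SIZE) ? &rx_slots[i + 1] : NULL;
    rx_head = &rx_slots[0];
}

/* Return the compiled form of PATTERN, compiling it only on a miss.
   Returns NULL if PATTERN is not a valid POSIX extended regexp. */
static regex_t *rx_cache_lookup(const char *pattern)
{
    struct rx_cache_entry **link = &rx_head;
    struct rx_cache_entry *e;
    size_t len;

    for (;;) {
        e = *link;
        if (e->pattern != NULL && strcmp(e->pattern, pattern) == 0)
            goto found;                 /* hit: no compilation needed */
        if (e->next == NULL)
            break;                      /* e is the least recently used */
        link = &e->next;
    }

    /* Miss: discard the least recently used entry and reuse its slot. */
    if (e->pattern != NULL) {
        regfree(&e->compiled);
        free(e->pattern);
        e->pattern = NULL;
    }
    if (regcomp(&e->compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return NULL;
    len = strlen(pattern) + 1;
    e->pattern = malloc(len);
    if (e->pattern == NULL) {
        regfree(&e->compiled);
        return NULL;
    }
    memcpy(e->pattern, pattern, len);
    rx_compile_count++;

found:
    /* Move the entry to the front so recently used patterns survive longest. */
    *link = e->next;
    e->next = rx_head;
    rx_head = e;
    return &e->compiled;
}
```

One consequence of this LRU design: a loop that cycles through more than 20 distinct patterns misses on every lookup and recompiles each time, so that access pattern is where a bigger cache would pay off. Whether real Elisp code does that often is the kind of thing only a profile would show.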
> Perhaps the regexp matcher could use some micro-optimizations, like the
> token-threading the bytecode interpreter does.
>
> Alan> Are you suggesting here building an interpreter in Lisp directly to
> Alan> execute rx expressions?
>
> It's interesting, IMO, to consider compiling rx (or regexps generally)
> to lisp bytecode. Perhaps with the JIT, it would boost performance in
> some cases. (It may be slower, but it's worthwhile to do the
> experiment.)
>
> For other work in this area see Stefan's lex-parse-re package. I think
> it includes a regexp matcher in elisp.

I'll need to have a look at that.

> Tom

-- 
Alan Mackenzie (Nuremberg, Germany).
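The token-threading idea Tom mentions can be illustrated on a toy regexp bytecode. This is a minimal sketch, not Emacs's regex engine or its bytecode interpreter: a backtracking matcher for a hypothetical five-opcode program (OP_CHAR, OP_ANY, OP_SPLIT, OP_JMP, OP_MATCH). Each handler jumps straight to the next handler through a table of label addresses (the GCC/Clang "labels as values" extension) instead of returning to a central loop; other compilers fall back to a switch. The match is anchored at the start of the string, roughly like looking-at.

```c
/* Opcodes of a hypothetical toy regexp bytecode. */
enum rx_op { OP_CHAR, OP_ANY, OP_SPLIT, OP_JMP, OP_MATCH };

struct rx_insn {
    enum rx_op op;
    int arg;   /* OP_CHAR: the character to match */
    int x, y;  /* OP_SPLIT: try x first, then y; OP_JMP: jump to x */
};

/* Return 1 if PROG, started at instruction PC, matches a prefix of SP. */
static int rx_vm_run(const struct rx_insn *prog, int pc, const char *sp)
{
#if defined(__GNUC__)
    /* Token threading: every handler ends by jumping through this table,
       indexed by the next opcode, rather than looping back to a switch. */
    static void *const handlers[] = {
        &&do_char, &&do_any, &&do_split, &&do_jmp, &&do_match
    };
#define NEXT() goto *handlers[prog[pc].op]
#else
    /* Portable fallback: central switch dispatch. */
#define NEXT() goto dispatch
dispatch:
    switch (prog[pc].op) {
    case OP_CHAR:  goto do_char;
    case OP_ANY:   goto do_any;
    case OP_SPLIT: goto do_split;
    case OP_JMP:   goto do_jmp;
    default:       goto do_match;
    }
#endif

    NEXT();

do_char:
    if (*sp != prog[pc].arg)
        return 0;
    pc++, sp++;
    NEXT();
do_any:
    if (*sp == '\0')
        return 0;
    pc++, sp++;
    NEXT();
do_split:
    /* Backtracking: try the first branch recursively, else take the second. */
    if (rx_vm_run(prog, prog[pc].x, sp))
        return 1;
    pc = prog[pc].y;
    NEXT();
do_jmp:
    pc = prog[pc].x;
    NEXT();
do_match:
    return 1;
#undef NEXT
}

/* Hand-compiled program for the regexp "ab*c". */
static const struct rx_insn ab_star_c[] = {
    { OP_CHAR,  'a', 0, 0 },
    { OP_SPLIT, 0,   2, 4 },  /* either another 'b' ... */
    { OP_CHAR,  'b', 0, 0 },
    { OP_JMP,   0,   1, 0 },
    { OP_CHAR,  'c', 0, 0 },  /* ... or the final 'c' */
    { OP_MATCH, 0,   0, 0 },
};
```

Being a naive backtracker, this sketch can loop forever on patterns with nested empty repetitions, so it only demonstrates the dispatch technique, not a usable matcher. A compiler from rx forms to Lisp bytecode would face the same choice between backtracking and automaton-based matching.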