From: Tom Tromey
Newsgroups: gmane.emacs.devel
Subject: Re: rx.el sexp regexp syntax
Date: Sun, 27 May 2018 10:56:36 -0600
Message-ID: <87a7slyr3v.fsf@tromey.com>
References: <87h8mw3yoc.fsf@gmail.com> <20180525155126.GA4096@ACM>
In-Reply-To: <20180525155126.GA4096@ACM> (Alan Mackenzie's message of "Fri, 25 May 2018 15:51:26 +0000")
To: Alan Mackenzie
Cc: rms@gnu.org, Pierre Neidhardt, Noam Postavsky, emacs-devel@gnu.org, van@scratch.space, eliz@gnu.org

>>>>> "Alan" == Alan Mackenzie writes:

>> Building the automaton is costly.  In C, we build it once and save the
>> result in a variable so that every regexp match does not rebuild the
>> automaton each time.

Alan> Emacs has a (moderately large) cache of regexps, so that building the
Alan> automatons is done very rarely.  Possibly just once each for each
Alan> session of Emacs.

I wonder about both of these statements.

On the one hand, AFAICT the regex cache is 20 items.  From search.c:

    #define REGEXP_CACHE_SIZE 20

That seems pretty small to me, given how prevalent regexps are in elisp.

On the other hand, in the past when I have tried to profile Emacs, I
haven't seen regexp compilation show up too much.  IIRC I did see regexp
matching and the GC.  Maybe this just points out the efficacy of the
cache -- maybe 20 items is plenty.

Perhaps the regexp matcher could use some micro-optimizations, like the
token-threading the bytecode interpreter does.

Alan> Are you suggesting here building an interpreter in Lisp directly to
Alan> execute rx expressions?

It's interesting, IMO, to consider compiling rx (or regexps generally)
to lisp bytecode.  Perhaps with the JIT, it would boost performance in
some cases.  (It may be slower, but it's worthwhile to do the
experiment.)

For other work in this area see Stefan's lex-parse-re package.  I think
it includes a regexp matcher in elisp.

Tom
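
To make the cache question above concrete, here is a minimal sketch of a
20-entry compiled-regexp cache with move-to-front replacement.  It is an
illustration only, not code taken from search.c: it uses POSIX regcomp()
rather than the Emacs regexp engine, and CACHE_SIZE, MAX_PATTERN,
cache_init, cache_get and compile_count are all hypothetical names made
up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 20     /* same value as REGEXP_CACHE_SIZE quoted above */
#define MAX_PATTERN 256   /* hypothetical limit, only for this sketch */

struct cache_entry {
    char pattern[MAX_PATTERN];
    regex_t compiled;
    int in_use;                  /* nonzero once `compiled` is valid */
    struct cache_entry *next;
};

static struct cache_entry entries[CACHE_SIZE];
static struct cache_entry *head;
static unsigned long compile_count;  /* number of real compilations */

/* Chain all entries into one list and mark them unused. */
static void cache_init(void)
{
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (entries[i].in_use)
            regfree(&entries[i].compiled);
        entries[i].in_use = 0;
        entries[i].next = (i + 1 < CACHE_SIZE) ? &entries[i + 1] : NULL;
    }
    head = &entries[0];
    compile_count = 0;
}

/* Return a compiled regexp for PATTERN, or NULL if it is too long or
   does not compile.  A hit moves the entry to the front of the list;
   a miss recycles the entry at the tail, which is the least recently
   used one.  Call cache_init() once before the first lookup. */
static regex_t *cache_get(const char *pattern)
{
    struct cache_entry **link = &head;
    struct cache_entry *e;

    assert(head != NULL);
    if (strlen(pattern) >= MAX_PATTERN)
        return NULL;

    for (;;) {
        e = *link;
        if (e->in_use && strcmp(e->pattern, pattern) == 0)
            break;                          /* cache hit */
        if (e->next == NULL) {              /* cache miss: reuse the tail */
            if (e->in_use) {
                regfree(&e->compiled);
                e->in_use = 0;
            }
            if (regcomp(&e->compiled, pattern, REG_EXTENDED) != 0)
                return NULL;
            strcpy(e->pattern, pattern);
            e->in_use = 1;
            compile_count++;
            break;
        }
        link = &e->next;
    }

    /* Unlink the entry and put it at the front. */
    *link = e->next;
    e->next = head;
    head = e;
    return &e->compiled;
}
```

With a scheme like this, a loop that keeps matching the same handful of
patterns compiles each of them once, which would be consistent with
compilation rarely showing up in profiles; only code that cycles through
more than CACHE_SIZE distinct patterns pays for recompilation on every
lookup.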