From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c) Date: Wed, 03 May 2023 09:36:01 +0000 Message-ID: <87wn1psqny.fsf@localhost> References: <63882A45-BD02-40D5-92FA-70175267BA3B@acm.org> <874jou7lsf.fsf@localhost> <37EED5F9-F1FE-46B6-B4FA-0B268B945123@gmail.com> <87wn1qqvj0.fsf@localhost> <34F4849A-CB39-4C96-9CC1-11ED723706DA@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="13443"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 63225@debbugs.gnu.org To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed May 03 11:33:15 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pu8rK-0003BQ-Vw for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 03 May 2023 11:33:15 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pu8rB-0003J1-52; Wed, 03 May 2023 05:33:05 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pu8r8-0003DS-7K for bug-gnu-emacs@gnu.org; Wed, 03 May 2023 05:33:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pu8r7-0007b9-UD for bug-gnu-emacs@gnu.org; Wed, 03 May 2023 05:33:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 
1pu8r7-0002DX-Ji for bug-gnu-emacs@gnu.org; Wed, 03 May 2023 05:33:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 03 May 2023 09:33:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 63225 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 63225-submit@debbugs.gnu.org id=B63225.16831063818519 (code B ref 63225); Wed, 03 May 2023 09:33:01 +0000 Original-Received: (at 63225) by debbugs.gnu.org; 3 May 2023 09:33:01 +0000 Original-Received: from localhost ([127.0.0.1]:45574 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pu8r6-0002DI-GG for submit@debbugs.gnu.org; Wed, 03 May 2023 05:33:01 -0400 Original-Received: from mout02.posteo.de ([185.67.36.66]:40983) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pu8r3-0002D0-HN for 63225@debbugs.gnu.org; Wed, 03 May 2023 05:32:59 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id 7B94D240390 for <63225@debbugs.gnu.org>; Wed, 3 May 2023 11:32:51 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1683106371; bh=AWgE0KqQ2UHdr13zBUfVrgnRzCh3FUA3A4ZCgA1hWN0=; h=From:To:Cc:Subject:Date:From; b=dAlP8PF9OBbUSNAMxLqiwYiwDYTQRQ9u2m7JE/pqWRlH4XeHcllCVVIqEU97Qe2DN qnvaaacslgC4HJ2qah9q4iyvVlQA/N+BkNDO14i9hd02NIGygobwwY6SqYeSNfIxX0 Tn6X92PjCADBh9wLekMbGhOuCKmwczicp4wDLDpCFD+XdIgaN0C64aWHZe4xNkzIpA MNMMHUKOGovqmZ1bA75C0MczZa4+rHY4PXGDhPcP3qAXRHYERiieBpLdxUf38+0FIu VE2ipkyueddfIxQROpcNmA1BC/i0wgZth2O+oktJaxRf6IQsmlGcOThZOYkMZNIc3v gS8GWlGTbxHAw== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4QBBZB5dbtz6twT; Wed, 3 May 2023 11:32:50 +0200 (CEST) In-Reply-To: <34F4849A-CB39-4C96-9CC1-11ED723706DA@gmail.com> X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.18
Precedence: list
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.bugs:260970
Archived-At:

Mattias Engdegård writes:

>> I tried this, and it did not give any noticeable improvement.
>> Most likely, because the actual `cond' is
>>
>> (cond ((looking-at "foo) ()) ...)
>
> I see, so it doesn't run through all top-level cases very often then?
> I thought that would be the common path (plain text).

You are indeed right. Top-level cases are run very often.
So, what I said does not make much sense.

Yet, in my tests, I am unable to see any improvement when I consolidate
the regexps.

If I do

(progn
  (set-regexp-cache-size 50)
  (org-element-parse-buffer)
  nil)

Without consolidation, but using `looking-at-p' as much as possible:

Profiler top
;;        4160  21%  + org-element--current-element
;;        2100  10%  + org-element--parse-elements
;;        1894   9%  + org-element--parse-objects
;;        1422   7%    Automatic GC
;;         871   4%  + org-element--headline-deferred
;;         806   4%  + apply
;;         796   4%  + org-element-create
;;         638   3%  + org-element--list-struct

Perf top
;;    16.72%  emacs  emacs  [.] re_match_2_internal
;;     7.16%  emacs  emacs  [.] exec_byte_code
;;     4.08%  emacs  emacs  [.] funcall_subr
;;     4.06%  emacs  emacs  [.] re_search_2

With consolidation into a giant rx (or ...) with groups:

;;        4158  21%  + org-element--current-element
;;        2163  11%  + org-element--parse-objects
;;        1796   9%  + org-element--parse-elements
;;        1276   6%    Automatic GC
;;         921   4%  + org-element--headline-deferred
;;         833   4%  + apply
;;         793   4%  + org-element-create
;;         660   3%  + org-element--list-struct

;;    16.44%  emacs  emacs  [.] re_match_2_internal
;;     7.03%  emacs  emacs  [.] exec_byte_code
;;     6.78%  emacs  emacs  [.] process_mark_stack
;;     4.05%  emacs  emacs  [.] re_search_2
;;     4.02%  emacs  emacs  [.] funcall_subr

The version with the giant single rx form is actually slower overall (!),
and it makes no difference at all in `org-element--current-element'.

> Perhaps you just don't see much improvement until the working set of
> regexps fits in the cache.

As you see, I have now increased the cache size to 50. No improvement.
This matches my observations on current master.

> The regexp compiler doesn't do much optimisation in order to keep the
> translation fast. It doesn't even convert "[a]" to "a".

I guess that this is another thing that could be improved if we were to
have compiled regexp objects. Compilation time would not matter as much.

Ideally, the compiler should do something similar to what
https://www.colm.net/open-source/ragel/ does.

>> Or, alternatively, the parsed regexps can be attached to string objects
>> internally. Then, regexp cache lookup will degenerate to looking into a
>> string object slot.
>
> That would work too but we really don't want to make our strings any
> fancier, they are already much too big and slow.

Then, what about making a compiled regexp object similar to a string,
but with the plist slot replaced by a compiled regexp slot? Maybe with
some other slots removed (I am not very familiar with the specifics of
the internal string representation).

AFAIU, the read/write syntax of a compiled regexp can be uniquely
represented simply by a string. Something like #r"[a-z]+"
(maybe even with special handling for backslashes, as proposed in
https://yhetil.org/emacs-devel/4209edd83cfee7c84b2d75ebfcd38784fa21b23c.camel@crossproduct.net)

-- 
Ihor Radchenko // yantar92,
Org mode contributor
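
P.S. For readers who have not seen the two variants compared above, here is a minimal sketch of what "consolidation into a giant rx (or ...) with groups" means next to the separate `looking-at-p' calls. The patterns and the names `my-consolidated-re', `my-classify-separate' and `my-classify-consolidated' are hypothetical and only for illustration; this is not the actual Org parser code, whose regexps are much more involved.

```elisp
(require 'rx)

;; Hypothetical, heavily simplified element patterns (an assumption for
;; illustration; not the regexps used by org-element).
(defconst my-consolidated-re
  (rx (or (group "* ")      ; group 1: headline-like
          (group "#+")      ; group 2: keyword-like
          (group "- ")))    ; group 3: list-item-like
  "One alternation with a numbered group per element kind.")

(defun my-classify-separate ()
  "Classify text at point by trying one small regexp per `cond' branch."
  (cond ((looking-at-p "\\* ") 'headline)
        ((looking-at-p "#\\+") 'keyword)
        ((looking-at-p "- ") 'item)))

(defun my-classify-consolidated ()
  "Classify text at point with a single match, then dispatch on the group."
  (when (looking-at my-consolidated-re)
    ;; Only the group of the alternative that matched has match data.
    (cond ((match-beginning 1) 'headline)
          ((match-beginning 2) 'keyword)
          ((match-beginning 3) 'item))))
```

The consolidated variant needs a single compiled pattern but has to record match data in order to dispatch, while the separate variant touches several patterns but can use `looking-at-p' throughout. Which of the two is faster is exactly what the profiles above are trying to measure; the sketch itself makes no claim either way.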