From mboxrd@z Thu Jan 1 00:00:00 1970
From: Po Lu via "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Newsgroups: gmane.emacs.bugs
Subject: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Date: Wed, 03 May 2023 07:36:46 +0800
Message-ID: <87ildacnld.fsf@yahoo.com>
References: <87ttwvgp4s.fsf@localhost> <63882A45-BD02-40D5-92FA-70175267BA3B@acm.org>
In-Reply-To: <63882A45-BD02-40D5-92FA-70175267BA3B@acm.org> ("Mattias Engdegård"'s message of "Tue, 2 May 2023 16:33:58 +0200")
Reply-To: Po Lu
To: Mattias Engdegård
Cc: 63225@debbugs.gnu.org, Ihor Radchenko
User-Agent: Gnus/5.13 (Gnus v5.13)

Mattias Engdegård writes:

>> I was able to get rid of the regex compilation-related slowdown simply
>> by increasing REGEXP_CACHE_SIZE 10x (see the attached patch).
>
> Indeed it sounds like you are suffering from regexp cache thrashing.
> I'm attaching two patches: one to measure the cache miss rate, and one
> that allows the regexp cache size to be changed at run time.
>
> That should let you find the working set size for your application,
> and ideally come up with a way to reduce it. Perhaps you could give us
> an idea of what these regexps look like and how they are used?
>
>> Does anyone know if there are potential side effects of this increase if
>> applied across Emacs? Or, alternatively, may Emacs provide an ability to
>> store compiled regexp patterns from Elisp (similar to what
>> `treesit-query-compile' does)?
>
> I don't think it's necessarily a good idea to increase the size to 200
> right away because of the linear cache lookup mechanism. Allowing the
> size to be changed at run time is probably less controversial (but
> arguably just as much of a crutch).
>
> Introducing regexp objects that could store compiled regexps and be
> used instead of strings would be quite some work but probably
> worthwhile.

Thanks for curing this instance of C programmer's disease.

> From f1246af3cc558bd38527f320964bb0e0a1e74de0 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?=
> Date: Sat, 7 Nov 2020 17:00:53 +0100
> Subject: [PATCH 1/2] Add regexp cache hit/miss counters
>
> ---
>  src/search.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/src/search.c b/src/search.c
> index 0bb52c03eef..6f71f3d16c1 100644
> --- a/src/search.c
> +++ b/src/search.c
> @@ -220,7 +220,10 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
>                || EQ (cp->syntax_table, BVAR (current_buffer, syntax_table)))
>            && !NILP (Fequal (cp->f_whitespace_regexp, Vsearch_spaces_regexp))
>            && cp->buf.charset_unibyte == charset_unibyte)
> -        break;
> +        {
> +          regexp_cache_hit++;
> +          break;
> +        }
>
>        /* If we're at the end of the cache, compile into the last
>           (least recently used) non-busy cell in the cache.  */
> @@ -232,6 +235,7 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
>            cp = *cpp;
>          compile_it:
>            eassert (!cp->busy);
> +          regexp_cache_miss++;
>            compile_pattern_1 (cp, pattern, translate, posix);
>            break;
>          }
> @@ -3431,6 +3435,13 @@ syms_of_search (void)
>  is to bind it with `let' around a small expression.  */);
>    Vinhibit_changing_match_data = Qnil;
>
> +  DEFVAR_INT("regexp-cache-hit", regexp_cache_hit,
> +             doc: /* Regexp cache hit count.  Internal use only.  */);
> +  regexp_cache_hit = 0;
> +  DEFVAR_INT("regexp-cache-miss", regexp_cache_miss,
> +             doc: /* Regexp cache miss count.  Internal use only.  */);
> +  regexp_cache_miss = 0;

Please put a space between `DEFVAR_INT' and `('.
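
For illustration only, the thrashing and the linear-lookup cost discussed
above can be reproduced with a toy model. This is a sketch under stated
assumptions, not the code in search.c: the names `toy_cache',
`toy_cache_lookup' and `toy_cache_cyclic_misses' are hypothetical, integer
keys stand in for regexp strings, and "compiling" is reduced to bumping a
miss counter. It keeps entries in most-recently-used order, scans them
linearly, and overwrites the least recently used slot on a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a small pattern cache: a fixed array kept in
   most-recently-used order.  Integer keys stand in for regexp strings.  */
enum { MAX_SLOTS = 256 };

struct toy_cache
{
  int keys[MAX_SLOTS];
  size_t size;          /* slots in use, never more than capacity */
  size_t capacity;      /* plays the role of REGEXP_CACHE_SIZE */
  unsigned long hits;
  unsigned long misses;
};

void
toy_cache_init (struct toy_cache *c, size_t capacity)
{
  memset (c, 0, sizeof *c);
  if (capacity < 1)
    capacity = 1;
  c->capacity = capacity < MAX_SLOTS ? capacity : MAX_SLOTS;
}

/* Look KEY up with a linear scan.  On a hit, count it and move the entry
   to the front.  On a miss, count it and reuse either a free slot or the
   last (least recently used) slot, then put KEY at the front.  */
void
toy_cache_lookup (struct toy_cache *c, int key)
{
  size_t i;

  assert (c->size <= c->capacity);

  for (i = 0; i < c->size; i++)
    if (c->keys[i] == key)
      break;

  if (i < c->size)
    c->hits++;
  else
    {
      c->misses++;
      if (c->size < c->capacity)
        c->size++;
      i = c->size - 1;  /* new slot, or the LRU slot to be overwritten */
    }

  /* Shift entries [0, i) down by one and store KEY in front.  */
  memmove (&c->keys[1], &c->keys[0], i * sizeof c->keys[0]);
  c->keys[0] = key;
}

/* Cycle through WORKING_SET distinct keys ROUNDS times and return the
   number of misses observed.  */
unsigned long
toy_cache_cyclic_misses (size_t capacity, int working_set, int rounds)
{
  struct toy_cache c;

  toy_cache_init (&c, capacity);
  for (int r = 0; r < rounds; r++)
    for (int k = 0; k < working_set; k++)
      toy_cache_lookup (&c, k);
  return c.misses;
}
```

With 20 slots, cycling over 20 keys for 100 rounds misses only on the
first pass (20 misses), whereas cycling over 21 keys misses on every
lookup (2100 misses). Raising the capacity to 200 brings the 21-key case
back to 21 misses, but each miss then scans every occupied slot before
giving up, which is the trade-off raised against simply enlarging the
cache.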