From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Date: Wed, 3 May 2023 10:39:10 +0200
Message-ID: <34F4849A-CB39-4C96-9CC1-11ED723706DA@gmail.com>
References: <63882A45-BD02-40D5-92FA-70175267BA3B@acm.org> <874jou7lsf.fsf@localhost> <37EED5F9-F1FE-46B6-B4FA-0B268B945123@gmail.com> <87wn1qqvj0.fsf@localhost>
In-Reply-To: <87wn1qqvj0.fsf@localhost>
To: Ihor Radchenko
Cc: 63225@debbugs.gnu.org

On 2 May 2023, at 23:21, Ihor Radchenko wrote:

> I tried this, and it did not give any noticeable improvement.
> Most likely, because the actual `cond' is
>
> (cond ((looking-at "foo) ()) ...)

I see, so it doesn't run through all top-level cases very often then? I thought that would be the common path (plain text).

Would consolidating some of the secondary regexps help at all? What are the most frequent branches in the parser?

Perhaps you just don't see much improvement until the working set of regexps fits in the cache.

>> Otherwise it's very much a matter of optimisation of everything, including regexps. Minimise backtracking.
>> If you want to match five or more dashes, use "------*" instead of "-\\{5,\\}". And so on.
>
> This example sounds like something that regexp compilation should be
> able to optimize, no? I do not easily see why the latter should cause
> more CPU time compared to the former.

It's a trivial point and definitely not the source of your problems, sorry! (Counted repetitions are slightly less efficient because they need to maintain the counter, it's all done in a terrible way.)

The regexp compiler doesn't do much optimisation in order to keep the translation fast. It doesn't even convert "[a]" to "a".

> Or, alternatively, the parsed regexps can be attached to string objects
> internally. Then, regexp cache lookup will degenerate to looking into a
> string object slot.

That would work too but we really don't want to make our strings any fancier, they are already much too big and slow.
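
To make the working-set remark above concrete, here is a rough Python sketch. It is a toy model, not the code in search.c: it assumes the cache evicts the least recently used pattern, and the class name, cache size and pattern names are all made up for illustration.

```python
import re
from collections import OrderedDict


class RegexpCache:
    """Toy LRU cache of compiled patterns (hypothetical, not Emacs's code)."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()  # pattern string -> compiled regexp
        self.hits = 0
        self.misses = 0

    def compile(self, pattern):
        if pattern in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(pattern)
            self.hits += 1
        else:
            # Cache miss: evict the least recently used entry if full,
            # then compile and store the new pattern.
            self.misses += 1
            if len(self.entries) >= self.size:
                self.entries.popitem(last=False)
            self.entries[pattern] = re.compile(pattern)
        return self.entries[pattern]


def hit_rate(cache_size, n_patterns, rounds=100):
    """Cycle through n_patterns distinct regexps and return the hit rate."""
    cache = RegexpCache(cache_size)
    patterns = [f"foo{i}" for i in range(n_patterns)]
    for _ in range(rounds):
        for p in patterns:
            cache.compile(p)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    print(hit_rate(20, 21))  # one pattern too many for the cache
    print(hit_rate(20, 20))  # working set fits exactly
```

Under these assumptions, cycling through 21 patterns with 20 slots misses on every lookup, while 20 patterns miss only on the first round. A real parser does not access its regexps in a strict cycle, so the cliff would be less sharp, but it shows why consolidating a few regexps might do nothing visible until the whole set fits.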