From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: sbaugh@catern.com
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 20 Jul 2023 12:22:19 +0000 (UTC)
Message-ID: <87mszqixhh.fsf@catern.com>
References: <837cqv41ob.fsf@gnu.org>
In-Reply-To: <837cqv41ob.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 20 Jul 2023 08:00:52 +0300")
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: Spencer Baugh, 64735@debbugs.gnu.org
To: Eli Zaretskii
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:265591

Eli Zaretskii writes:

>> From: Spencer Baugh
>> Date: Wed, 19 Jul 2023 17:16:31 -0400
>>
>>
>> Several important commands and functions invoke find; for example rgrep
>> and project-find-regexp.
>>
>> Most of these add some set of ignores to the find command, pulling from
>> grep-find-ignored-files in the former case. So the find command looks
>> like:
>>
>> find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
>> -prune -o -type f -print0
>>
>> Alas, on my system, using GNU find, these ignores slow down find by
>> about 15x on a large directory tree, taking it from around .5 seconds to
>> 7.8 seconds.
>>
>> This is very noticeable overhead; removing the ignores makes rgrep and
>> other find-invoking commands substantially faster for me.
>
> grep-find-ignored-files is a customizable user option, so if this
> slowdown bothers you, just customize it to avoid that.

I think it is bad that the default behavior is very slow.

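For anyone who wants to try the comparison themselves, here is a minimal
sketch, assuming GNU find and a POSIX shell. The temporary tree and the
two-entry ignore list are made up for illustration; the real list built
from grep-find-ignored-files is much longer, and this tiny tree is far
too small to show any timing difference.

```shell
#!/bin/sh
# Sketch: run find with and without a pruning ignore list.
# The directory layout is hypothetical and exists only for this demo.
tree=$(mktemp -d)
mkdir -p "$tree/src" "$tree/SCCS" "$tree/RCS"
touch "$tree/src/a.c" "$tree/src/b.c" "$tree/SCCS/s.a.c" "$tree/RCS/a.c,v"

cd "$tree" || exit 1

# No ignores: every regular file in the tree is printed.
plain=$(find -H . -type f -print | wc -l | tr -d ' ')

# With ignores, in the same shape as the command quoted above:
# paths matching an ignore pattern are pruned, the rest are printed.
pruned=$(find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* \) -prune \
              -o -type f -print | wc -l | tr -d ' ')

echo "plain=$plain pruned=$pruned"

# Clean up the demo tree.
cd / && rm -rf "$tree"
```

On a real source tree, prefixing each find command with `time` should
show the difference in wall-clock cost; the absolute numbers will of
course depend on the tree and on how many ignores are in the list.
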
> And if there are patterns there that are no longer pertinent or rare,
> we could remove them from the default value.

Sure! So the thing to narrow down would be
completion-ignored-extensions, which is what populates
grep-find-ignored-files. Most things in that list are irrelevant to
most users, but all of them are relevant to some users. Most of these
are language-specific things - e.g. there's a bunch of Common Lisp
compiled object (or something) extensions.

Perhaps we could modularize this, so that individual packages add
things to completion-ignored-extensions at load time. Then
completion-ignored-extensions would only include things which are
relevant to a given user, as determined by what packages they load.

> I'm not sure we should bother more than these two simple measures.

Unfortunately those two simple measures help rgrep but they don't help
project-find-regexp (and other project.el commands using
project--files-in-directory, such as project-find-file), since those
project commands pull their ignores from the version control system
through vc (not grep-find-ignored-files), and then pass them to find.

>> The overhead is linear in the number of ignores - that is, each
>> additional ignore adds a small fixed cost. This suggests that find is
>> linearly scanning the list of ignores and checking each one, rather than
>> optimizing them to a single regexp and checking that regexp.
>
> If it uses fnmatch, it cannot do it any other way, I think