From mboxrd@z Thu Jan 1 00:00:00 1970
From: Spencer Baugh
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Wed, 19 Jul 2023 17:16:31 -0400
To: 64735@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain
X-GNU-PR-Message: report 64735
X-GNU-PR-Package: emacs
Xref: news.gmane.io gmane.emacs.bugs:265560
Archived-At:

Several important commands and functions invoke find; for example rgrep
and project-find-regexp.

Most of these add some set of ignores to the find command, pulling from
grep-find-ignored-files in the former case. So the find command looks
like:

  find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \) -prune -o -type f -print0

Alas, on my system, using GNU find, these ignores slow down find by
about 15x on a large directory tree, taking it from around .5 seconds to
7.8 seconds.

This is very noticeable overhead; removing the ignores makes rgrep and
other find-invoking commands substantially faster for me.

The overhead is linear in the number of ignores - that is, each
additional ignore adds a small fixed cost. This suggests that find is
linearly scanning the list of ignores and checking each one, rather than
optimizing them to a single regexp and checking that regexp.

Obviously, GNU find should be optimizing this. However they have
previously said they will not optimize this; I commented on this bug
https://savannah.gnu.org/bugs/index.php?58197 to request they rethink
that. Hopefully as a fellow GNU project they will be interested in
helping us...

In Emacs alone, there are a few things we could do:

- we could mitigate the find bug by optimizing the regexp before we pass
  it to find; this should basically remove all the overhead but makes
  the find command uglier and harder to edit
- we could remove rare and likely irrelevant things from
  completion-ignored-extensions and vc-ignore-dir-regexp (which are used
  to build these lists of ignores)
- we could use our own recursive directory-tree walking implementation
  (directory-files-recursively), if we found a nice way to pipe its
  output directly to grep etc without going through Lisp. (This could
  be nice for project-files, at least)

Incidentally, I tried a find alternative, "bfs",
https://github.com/tavianator/bfs and it doesn't optimize this either,
sadly, so it also has the 15x slowdown.


In GNU Emacs 29.0.92 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2023-07-10 built on
Repository revision: dd15432ffacbeff0291381c0109f5b1245060b1d
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.8 (Green Obsidian)

Configured using:
 'configure --config-cache --with-x-toolkit=lucid
 --with-gif=ifavailable'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM LUCID
ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Shell

Memory information:
((conses 16 1939322 193013)
 (symbols 48 76940 49)
 (strings 32 337371 45355)
 (string-bytes 1 12322013)
 (vectors 16 148305)
 (vector-slots 8 3180429 187121)
 (floats 8 889 751)
 (intervals 56 152845 1238)
 (buffers 976 235)
 (heap 1024 978725 465480))
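
To make the first mitigation listed above (collapsing the ignores into a
single regexp before handing them to find) a bit more concrete, here is a
minimal sketch. It is not what Emacs generates today. The function names
find_with_paths and find_with_regex are hypothetical, only the two
directories shown in the command above are used, and the regexp assumes
GNU find's default (Emacs-style) -regex syntax. Whether the collapsed
form actually recovers the lost time would still need to be measured on
a real tree.

```shell
#!/bin/sh
# Sketch only: two ways of pruning the same ignored directories.
# Assumes GNU find, whose default -regex syntax is Emacs-style
# (groups are \( \) and alternation is \|).

# Current shape: one -path test per ignore, joined with -o.
find_with_paths() {
    find -H "$1" \( -path '*/SCCS/*' -o -path '*/RCS/*' \) \
        -prune -o -type f -print0
}

# Collapsed shape: a single -regex test covering the same ignores.
# -regex must match the whole path, hence the leading and trailing ".*".
find_with_regex() {
    find -H "$1" -regex '.*/\(SCCS\|RCS\)/.*' \
        -prune -o -type f -print0
}
```

Both functions take the directory to search as their only argument and
print NUL-separated file names, so either can be piped to xargs -0 grep
in the usual way, e.g. find_with_regex . | xargs -0 grep -n foo. The
collapsed command is shorter here, but with the full list of ignores the
single regexp gets long, which is the "uglier and harder to edit"
trade-off mentioned above.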