From: Ihor Radchenko
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 08:20:55 +0000
Message-ID: <87lef17ok8.fsf@localhost>
In-Reply-To: <83v8e6lyi4.fsf@gnu.org>
To: Eli Zaretskii
Cc: luangruo@yahoo.com, Dmitry Gutov, 64735@debbugs.gnu.org, sbaugh@janestreet.com

Eli Zaretskii writes:

>> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
>> > calls + bypassing regexp matches when REGEXP is nil.
>>
>> Sounds good. I haven't examined the diff closely, but it sounds like an
>> improvement that can be applied irrespective of how this discussion ends.
>
> That change should be submitted as a separate issue and discussed in
> detail before we decide we can make it.

I will look into it. This was mostly a quick and dirty rewrite without
paying too much attention to file order in the result.

>> Skipping regexp matching entirely, though, will make this benchmark
>> farther removed from real-life usage: this thread started from being
>> able to handle multiple ignore entries when listing files (e.g. in a
>> project).
>
> Agreed. From my POV, that variant's purpose was only to show how much
> time is spent in matching file names against some include or exclude
> list.

Yes and no. It is not uncommon to query _all_ the files in a directory,
and something as simple as

  (when (and (not (member regexp '("" ".*"))) (string-match regexp file))...)

can give a considerable speedup. It might be worth adding such an
optimization.
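
For illustration only, here is a minimal sketch of both ideas: skipping `string-match' when the regexp trivially matches everything, and collecting results with `push' plus a final `nreverse' rather than repeated `nconc'. The helper names `my/regexp-matches-everything-p' and `my/filter-file-names' are hypothetical and not part of Emacs, and this is not the actual patch under discussion. It assumes the intent is to accept every file when REGEXP is nil, "" or ".*", so it combines the two checks with `or' rather than the `and' form quoted above.

```elisp
;; Hypothetical helpers for illustration; not part of Emacs.

(defun my/regexp-matches-everything-p (regexp)
  "Return non-nil if REGEXP is nil or trivially matches any file name."
  (or (null regexp)
      (member regexp '("" ".*"))))

(defun my/filter-file-names (files regexp)
  "Return the elements of FILES matching REGEXP, preserving order.
Skip regexp matching entirely when REGEXP matches everything."
  (if (my/regexp-matches-everything-p regexp)
      ;; Nothing to filter: return a fresh copy of the list.
      (copy-sequence files)
    (let (result)
      (dolist (file files)
        (when (string-match-p regexp file)
          ;; `push' is O(1); appending with `nconc' inside the loop
          ;; would rescan the accumulated list on every iteration.
          (push file result)))
      ;; Restore the original order with a single O(N) pass.
      (nreverse result))))
```

With these definitions, (my/filter-file-names '("a.el" "b.txt" "c.el") "\\.el\\'") returns ("a.el" "c.el"), while passing ".*" or nil returns all three names without any regexp matching.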
>> So any solution for that (whether we use it on all or just
>> some platforms) needs to be able to handle those. And it doesn't seem
>> like directory-files-recursively has any alternative solution for that
>> other than calling string-match on every found file.
>
> There's a possibility of pushing this filtering into
> file-name-all-completions, but I'm not sure that will be faster. We
> should try that and measure the results, I think.

Isn't `file-name-all-completions' more limited, in that it cannot accept
an arbitrary regexp?

>> We can't really forget about GC, though.
>
> But we could temporarily lift the threshold while this function runs,
> if that leads to significant savings.

Yup. Also, GC times and frequencies will vary across different Emacs
sessions, so we may not want to rely on them when comparing benchmarks
from different people.

>> But the above numbers make me hopeful about the async-parallel solution,
>> implying that the parallelization really can help (and offset whatever
>> latency we lose on pselect), as soon as we determine the source of extra
>> consing and decide what to do about it.
>
> Isn't it clear that additional consing comes from the fact that we
> first insert the Find's output into a buffer or produce a string from
> it, and then chop that into individual file names?

To add to it, I also tried to implement a version of
`directory-files-recursively' that first inserts all the file names into
a buffer and then filters them using `re-search-forward' instead of
calling `string-match' on every file name string. That ended up being
slower than the current `string-match' approach.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at . Support Org development at , or support my work at
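
As a rough sketch of the "temporarily lift the threshold" idea from the GC exchange above, one way to take GC mostly out of the picture when comparing timings is to bind `gc-cons-threshold' around the measured call. The wrapper name `my/benchmark-without-gc' is hypothetical; `benchmark-run', `gc-cons-threshold' and `most-positive-fixnum' are standard Emacs facilities. This only shows the idiom and says nothing about whether it yields significant savings.

```elisp
(require 'benchmark)

;; Hypothetical wrapper for illustration; not part of Emacs.
(defun my/benchmark-without-gc (fn)
  "Call FN with no arguments while GC is effectively disabled.
Return the list (ELAPSED GC-COUNT GC-ELAPSED) from `benchmark-run'."
  ;; The binding is dynamic, so it is in effect for everything FN calls
  ;; and is restored automatically when the `let' exits.
  (let ((gc-cons-threshold most-positive-fixnum))
    (benchmark-run 1 (funcall fn))))
```

It could be used as (my/benchmark-without-gc (lambda () (directory-files-recursively "~/src" ""))), where "~/src" is only a placeholder directory. Comparing the result with a plain `benchmark-run' of the same call gives an idea of how much of the elapsed time is GC.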