From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores Date: Thu, 27 Jul 2023 03:41:29 +0300 Message-ID: References: <1fd5e3ed-e1c3-5d6e-897f-1d5d55e379fa@gutov.dev> <87wmyupvlw.fsf@localhost> <5c4d9bea-3eb9-b262-138a-4ea0cb203436@gutov.dev> <87tttypp2e.fsf@localhost> <87r0p030w0.fsf@yahoo.com> <83sf9f6wm0.fsf@gnu.org> <83sf9eub9d.fsf@gnu.org> <2d844a34-857d-3d59-b897-73372baac480@gutov.dev> <83bkg2tsu6.fsf@gnu.org> <83bd4246-ac41-90ec-1df3-02d0bd59ca44@gutov.dev> <834jlttv1p.fsf@gnu.org> <937c3b8e-7742-91b7-c2cf-4cadd0782f0c@gutov.dev> <83a5vlsanw.fsf@gnu.org> <69a98e2a-5816-d36b-9d04-8609291333cd@gutov.dev> <87351cs8no.fsf@localhost> <35163e56-607d-9c5b-e3e8-5d5b548b3cb7@gutov.dev> <878rb3m43b.fsf@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="31398"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Cc: luangruo@yahoo.com, sbaugh@janestreet.com, Eli Zaretskii , 64735@debbugs.gnu.org To: Ihor Radchenko Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Thu Jul 27 02:47:08 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qOp9m-0007vK-Qu for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 27 Jul 2023 02:47:08 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qOp4u-0006vI-3l; Wed, 26 Jul 2023 20:42:04 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qOp4s-0006vA-Kn for bug-gnu-emacs@gnu.org; Wed, 26 Jul 2023 20:42:02 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qOp4s-0003zE-Cx for bug-gnu-emacs@gnu.org; Wed, 26 Jul 2023 20:42:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1qOp4s-0006An-0e for bug-gnu-emacs@gnu.org; Wed, 26 Jul 2023 20:42:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 27 Jul 2023 00:42:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 64735 X-GNU-PR-Package: emacs Original-Received: via spool by 64735-submit@debbugs.gnu.org id=B64735.169041850423705 (code B ref 64735); Thu, 27 Jul 2023 00:42:01 +0000 Original-Received: (at 64735) by debbugs.gnu.org; 27 Jul 2023 00:41:44 +0000 Original-Received: from localhost ([127.0.0.1]:40346 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qOp4a-0006AG-8l for submit@debbugs.gnu.org; Wed, 26 Jul 2023 20:41:44 -0400 Original-Received: from wout4-smtp.messagingengine.com ([64.147.123.20]:57969) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qOp4X-0006A3-Aw for 64735@debbugs.gnu.org; Wed, 26 Jul 2023 20:41:42 -0400 Original-Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id ADF0332007E8; Wed, 26 Jul 2023 20:41:34 -0400 (EDT) Original-Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 26 Jul 2023 20:41:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to; s=fm2; t= 1690418494; x=1690504894; bh=ySSA/l4LP/FOYqC9lH92WgOaNrzVFPoeleu zEfCWzU8=; b=V7xeKrzYYuyM+v0dzAsopQQoH5F8g2yxXDCMKPZxXRmysbGcdsq 503guhJVqtapQ2EpxYXj6DyYbi2ZAgFtMrhqKhOd0S/U7OyWqylfLLxzwl0xkfgh xhS9EFbqcVotOray3uLn1VF2ehqXuE2/wyoOHeBU1+6o24gxZFifeE5RckAoMGYT ZAZkKXDRc84khWtTKzipWSEgmry86ozmBH41mApAZXovt/tm3LtCaKNSx3oStUHz RigcoW7k42whJYat/S2q8ZAgYHL6dc66Q++mpFyftXEmEC3I9HQ1fwym5Oz7kzR0 itTonfrNssKTSUD3QS+mYHP4LbwkXSusHZw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1690418494; x=1690504894; bh=ySSA/l4LP/FOYqC9lH92WgOaNrzVFPoeleu zEfCWzU8=; b=OWDzB9Fj1vHq4elpHc/hlg7vMvGF56o9KnrXCFxpedxA+h/SYTW alO8zo/BaGnduo9NByaPOKY+HjkeyHn8ni6JVukGR8bITqtSk4td6nZYHIhQtcux pBMIDjKT3F8YXyOgDLoL5BXIWHYjCxnPY++c40LKdSARbvqMIaXL307y6/+80QKM rwZt0F41EySsBCJJI1ouQSG8DWHuDxckNYWHjkIrans4V6OCCqzibCi7T0yTmRQb 8AkmJx4dCyzYxiDcuDkwk0GZIC/MDbOB2Z6Yv+weG+Ojl6wqcZWXL3FO1yC6xO48 c5xzh2ZYBo0cfe9K6kh6tdaiGeHQ3L940SA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrieefgdefiecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefkffggfgfuvfevfhfhjggtgfesthejredttdefjeenucfhrhhomhepffhmihht rhihucfiuhhtohhvuceoughmihhtrhihsehguhhtohhvrdguvghvqeenucggtffrrghtth gvrhhnpeehleeflefhheeujeduteejveeggeejjefhleeljeettdfhjedthfejgfeivdeu leenucffohhmrghinhephihhvghtihhlrdhorhhgnecuvehluhhsthgvrhfuihiivgeptd enucfrrghrrghmpehmrghilhhfrhhomhepughmihhtrhihsehguhhtohhvrdguvghv X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 26 Jul 2023 20:41:32 -0400 (EDT) Content-Language: en-US In-Reply-To: <878rb3m43b.fsf@localhost> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:266143 Archived-At: On 26/07/2023 12:09, Ihor Radchenko wrote: >> It's also something to note that, GC-wise, numbers 1 and 2 are not the >> worst: the time must be spent somewhere else. > Indeed. I did more detailed analysis in > https://yhetil.org/emacs-devel/87cz0p2xlc.fsf@localhost/ > > Main contributors in the lisp versions are (in the order from most > significant to less significant) (1) file name handlers; (2) regexp > matching of the file names; (3) nconc calls in the current > `directory-files-recursively' implementation. > > I have modified `directory-files-recursively' to avoid O(N^2) `nconc' > calls + bypassing regexp matches when REGEXP is nil. Sounds good. I haven't examined the diff closely, but it sounds like an improvement that can be applied irrespective of how this discussion ends. Skipping regexp matching entirely, though, will make this benchmark farther removed from real-life usage: this thread started from being able to handle multiple ignore entries when listing files (e.g. in a project). So any solution for that (whether we use it on all or just some platforms) needs to be able to handle those. And it doesn't seem like directory-files-recursively has any alternative solution for that other than calling string-match on every found file. > Here are the results (using the attached modified version of your > benchmark file): > > (my-bench 10 "/usr/src/linux/" "") > (("built-in" . "Elapsed time: 7.285597s (3.853368s in 6 GCs)") > ("built-in no filename handler alist" . "Elapsed time: 5.855019s (3.760662s in 6 GCs)") > ("built-in non-recursive no filename handler alist" . "Elapsed time: 5.817639s (4.326945s in 7 GCs)") > ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 2.708306s (1.871665s in 3 GCs)") > ("with-find" . "Elapsed time: 6.082200s (4.262830s in 7 GCs)") > ("with-find-p" . "Elapsed time: 4.325503s (3.058647s in 5 GCs)") > ("with-find-sync" . "Elapsed time: 3.267648s (1.903655s in 3 GCs)")) Nice. > (let ((gc-cons-threshold most-positive-fixnum)) > (my-bench 10 "/usr/src/linux/" "")) > (("built-in" . "Elapsed time: 2.754473s") > ("built-in no filename handler alist" . "Elapsed time: 1.322443s") > ("built-in non-recursive no filename handler alist" . "Elapsed time: 1.235044s") > ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 0.750275s") > ("with-find" . "Elapsed time: 1.438510s") > ("with-find-p" . "Elapsed time: 1.200876s") > ("with-find-sync" . "Elapsed time: 1.349755s")) > > If we forget about GC, Elisp version can get fairly close to GNU find. > And if we do not perform regexp matching (which makes sense when the > REGEXP is ""), Elisp version is faster. We can't really forget about GC, though. But the above numbers make me hopeful about the async-parallel solution, implying that the parallelization really can help (and offset whatever latency we lose on pselect), as soon as we determine the source of extra consing and decide what to do about it.