From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 20:46:01 +0300
Message-ID: <83wmyr7sbq.fsf@gnu.org>
References: <87edl1scbw.fsf@gmx.de> <87fs5hmp6i.fsf@localhost>
 <87cz0lmoxy.fsf@localhost> <83v8edzb31.fsf@gnu.org> <87r0p1cta3.fsf@gmx.de>
 <87pm4ll7ox.fsf@localhost> <87a5vpcmc7.fsf@gmx.de> <878rb9l1f5.fsf@localhost>
 <87zg3pb6yt.fsf@gmx.de> <83zg3p9s39.fsf@gnu.org> <878rb944wi.fsf@localhost>
 <83tttx9q4v.fsf@gnu.org> <87pm4lb4fr.fsf@gmx.de> <83pm4l9n0o.fsf@gnu.org>
 <87jzutb14l.fsf@gmx.de> <83mszp9kl2.fsf@gnu.org> <83h6pwa52z.fsf@gnu.org>
 <87ilaci637.fsf@catern.com> <83sf9g88eh.fsf@gnu.org> <87cz0jj25g.fsf@catern.com>
In-Reply-To: <87cz0jj25g.fsf@catern.com> (sbaugh@catern.com)
Cc: sbaugh@janestreet.com, yantar92@posteo.net, rms@gnu.org, dmitry@gutov.dev,
 michael.albinus@gmx.de, 64735@debbugs.gnu.org
To: sbaugh@catern.com

> From: sbaugh@catern.com
> Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
> Cc: sbaugh@janestreet.com, yantar92@posteo.net, rms@gnu.org, dmitry@gutov.dev,
>  michael.albinus@gmx.de, 64735@debbugs.gnu.org
>
> First my results:
>
> (my-bench 100 "~/public_html" "")
> (("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
>  ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
>
> (my-bench 10 "~/.local/src/linux" "")
> (("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
>  ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
>
> (my-bench 100 "/ssh:catern.com:~/public_html" "")
> (("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
>  ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
>
> 2x speedup on local files, and almost a 10x speedup for remote files.

Thanks, that's impressive.  But you omitted some of the features of
directory-files-recursively, see below.

> And my implementation *isn't even using the fact that find can run in
> parallel with Emacs*.  If I did start using that, I expect even more
> speed gains from parallelism, which aren't achievable in Emacs itself.

I'm not sure I understand what you mean by "in parallel" and why it
would be faster.

> So can we add something like this (with the appropriate fallbacks to
> directory-files-recursively), since it has such a big speedup even
> without parallelism?

We can have an alternative implementation, yes.  But it should support
predicate, and it should sort the files in each directory like
directory-files-recursively does, so that it's a drop-in replacement.
Also, I believe that Find does return "." in each directory, and your
implementation doesn't filter them, whereas
directory-files-recursively does AFAIR.

And I see no need for any fallback: that's for the application to do
if it wants.

>     (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")

It should.

>                       (if follow-symlinks
>                           '("-L")
>                         '("!" "(" "-type" "l" "-xtype" "d" ")"))
>                       (unless (string-empty-p regexp)
>                         "-regex" (concat ".*" regexp ".*"))
>                       (unless include-directories
>                         '("!" "-type" "d"))
>                       '("-print0")

Some of these switches are specific to GNU Find.  Are we going to
support only GNU Find?

>                       ))
>            (remote (file-remote-p dir))
>            (proc
>             (if remote
>                 (let ((proc (apply #'start-file-process
>                                    "find" (current-buffer) command)))
>                   (set-process-sentinel proc (lambda (_proc _state)))
>                   (set-process-query-on-exit-flag proc nil)
>                   proc)
>               (make-process :name "find" :buffer (current-buffer)
>                             :connection-type 'pipe
>                             :noquery t
>                             :sentinel (lambda (_proc _state))
>                             :command command))))
>       (while (accept-process-output proc))

Why do you call accept-process-output here?  It could interfere with
reading output from async subprocesses running at the same time.  Come
to think of it, why use async subprocesses here and not call-process?
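
To make the last question concrete, here is a minimal sketch of the
synchronous variant.  It is an illustration only, not the patch under
discussion: the function name my-find-files-sync is hypothetical, it
uses process-file (the file-name-handler-aware counterpart of
call-process, so a remote DIR should also work), and it assumes a find
that accepts -print0, as GNU and BSD find do.  It sorts the whole
result with string<, which is not claimed to reproduce the
per-directory order of directory-files-recursively, and it supports
neither a regexp nor a predicate.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-find-files-sync (dir)
  "Return a sorted list of the regular files below DIR, using find(1).
The names are relative to DIR and start with \"./\".
find runs synchronously, so there is no process filter, no sentinel,
and no call to `accept-process-output'.
Assumes the find program supports -print0."
  (let ((root (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; `process-file' runs the program in `default-directory',
      ;; on the remote host if that directory is remote.
      (setq default-directory root)
      ;; BUFFER is (t nil): stdout goes to this buffer, stderr is discarded.
      (let ((status (process-file "find" nil '(t nil) nil
                                  "." "-type" "f" "-print0")))
        (unless (eq status 0)
          (error "find exited with status %s" status)))
      ;; Names are NUL-terminated; OMIT-NULLS drops the trailing empty string.
      (sort (split-string (buffer-string) "\0" t) #'string<))))
```

Since the call returns only after find has exited, nothing here waits
on or reads from other subprocesses that may be running at the same
time.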