From mboxrd@z Thu Jan 1 00:00:00 1970
From: Spencer Baugh
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 16:53:05 -0400
References: <87cz0lmoxy.fsf@localhost> <83v8edzb31.fsf@gnu.org>
 <87r0p1cta3.fsf@gmx.de> <87pm4ll7ox.fsf@localhost> <87a5vpcmc7.fsf@gmx.de>
 <878rb9l1f5.fsf@localhost> <87zg3pb6yt.fsf@gmx.de> <83zg3p9s39.fsf@gnu.org>
 <878rb944wi.fsf@localhost> <83tttx9q4v.fsf@gnu.org> <87pm4lb4fr.fsf@gmx.de>
 <83pm4l9n0o.fsf@gnu.org> <87jzutb14l.fsf@gmx.de> <83mszp9kl2.fsf@gnu.org>
 <83h6pwa52z.fsf@gnu.org> <87ilaci637.fsf@catern.com> <83sf9g88eh.fsf@gnu.org>
 <87cz0jj25g.fsf@catern.com> <83wmyr7sbq.fsf@gnu.org>
In-Reply-To: <83wmyr7sbq.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 22 Jul 2023 20:46:01 +0300")
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13)
To: Eli Zaretskii
Cc: yantar92@posteo.net, rms@gnu.org, sbaugh@catern.com, dmitry@gutov.dev,
 michael.albinus@gmx.de, 64735@debbugs.gnu.org
X-GNU-PR-Message: followup 64735
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:265843

Eli Zaretskii writes:

>> From: sbaugh@catern.com
>> Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
>> Cc:
>>   sbaugh@janestreet.com, yantar92@posteo.net, rms@gnu.org, dmitry@gutov.dev,
>>   michael.albinus@gmx.de, 64735@debbugs.gnu.org
>>
>> First my results:
>>
>> (my-bench 100 "~/public_html" "")
>> (("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
>>  ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
>>
>> (my-bench 10 "~/.local/src/linux" "")
>> (("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
>>  ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
>>
>> (my-bench 100 "/ssh:catern.com:~/public_html" "")
>> (("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
>>  ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
>>
>> 2x speedup on local files, and almost a 10x speedup for remote files.
>
> Thanks, that's impressive.  But you omitted some of the features of
> directory-files-recursively, see below.
>
>> And my implementation *isn't even using the fact that find can run in
>> parallel with Emacs*.  If I did start using that, I expect even more
>> speed gains from parallelism, which aren't achievable in Emacs itself.
>
> I'm not sure I understand what you mean by "in parallel" and why it
> would be faster.

I mean having Emacs read output from the process and turn them into
strings while find is still running and walking the directory tree.  So
the two parts are running in parallel.
This, specifically:

(defun find-directory-files-recursively (dir regexp &optional include-directories _predicate follow-symlinks)
  (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
  (cl-assert (not (file-remote-p dir)))
  (let* (buffered
         result
         (proc
          (make-process
           :name "find" :buffer nil
           :connection-type 'pipe
           :noquery t
           :sentinel (lambda (_proc _state))
           ;; Split the NUL-separated output as it arrives; `buffered'
           ;; holds a file name cut off at the end of the previous chunk.
           :filter (lambda (_proc data)
                     (let ((start 0))
                       (when-let (end (string-search "\0" data start))
                         (push (concat buffered (substring data start end)) result)
                         (setq buffered "")
                         (setq start (1+ end))
                         (while-let ((end (string-search "\0" data start)))
                           (push (substring data start end) result)
                           (setq start (1+ end))))
                       (setq buffered (concat buffered (substring data start)))))
           :command (append
                     ;; -L is an option, so it must precede the starting point.
                     (list "find")
                     (when follow-symlinks '("-L"))
                     (list (file-local-name dir))
                     (unless follow-symlinks
                       '("!" "(" "-type" "l" "-xtype" "d" ")"))
                     (unless (string-empty-p regexp)
                       ;; Must be a list: `append' would explode a bare string.
                       (list "-regex" (concat ".*" regexp ".*")))
                     (unless include-directories
                       '("!" "-type" "d"))
                     '("-print0")))))
    (while (accept-process-output proc))
    result))

Can you try this further change on your Windows (and GNU/Linux) box?  I
just tested on a different box and my original change gets:

(("built-in" . "Elapsed time: 4.506643s (2.276269s in 21 GCs)")
 ("with-find" . "Elapsed time: 4.114531s (2.848497s in 27 GCs)"))

while this parallel implementation gets

(("built-in" . "Elapsed time: 4.479185s (2.236561s in 21 GCs)")
 ("with-find" . "Elapsed time: 2.858452s (1.934647s in 19 GCs)"))

so it might have a favorable impact on Windows and your other GNU/Linux
box.

>> So can we add something like this (with the appropriate fallbacks to
>> directory-files-recursively), since it has such a big speedup even
>> without parallelism?
>
> We can have an alternative implementation, yes.  But it should support
> predicate, and it should sort the files in each directory like
> directory-files-recursively does, so that it's a drop-in replacement.
> Also, I believe that Find does return "." in each directory, and your
> implementation doesn't filter them, whereas
> directory-files-recursively does AFAIR.
>
> And I see no need for any fallback: that's for the application to do
> if it wants.
>
>> (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
>
> It should.

This is where I think a fallback would be useful - it's basically
impossible to support arbitrary predicates efficiently here, since it
requires us to put Lisp in control of whether find descends into a
directory.  So I'm thinking I would just fall back to running the old
directory-files-recursively whenever there's a predicate.  Or just not
supporting this at all...

>> (if follow-symlinks
>>     '("-L")
>>   '("!" "(" "-type" "l" "-xtype" "d" ")"))
>> (unless (string-empty-p regexp)
>>   "-regex" (concat ".*" regexp ".*"))
>> (unless include-directories
>>   '("!" "-type" "d"))
>> '("-print0")
>
> Some of these switches are specific to GNU Find.  Are we going to
> support only GNU Find?

POSIX find doesn't support -regex, so I think we have to.  We could
stick to just POSIX find if we only allowed globs in
find-directory-files-recursively, instead of full regexes.

>> ))
>> (remote (file-remote-p dir))
>> (proc
>>  (if remote
>>      (let ((proc (apply #'start-file-process
>>                         "find" (current-buffer) command)))
>>        (set-process-sentinel proc (lambda (_proc _state)))
>>        (set-process-query-on-exit-flag proc nil)
>>        proc)
>>    (make-process :name "find" :buffer (current-buffer)
>>                  :connection-type 'pipe
>>                  :noquery t
>>                  :sentinel (lambda (_proc _state))
>>                  :command command))))
>> (while (accept-process-output proc))
>
> Why do you call accept-process-output here? it could interfere with
> reading output from async subprocesses running at the same time.  To
> come think of this, why use async subprocesses here and not
> call-process?

See my new iteration which does use the async-ness.
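
For illustration, the fallback discussed above (use the built-in
whenever a predicate is given) could be sketched roughly as follows.
This is only a sketch under stated assumptions, not the patch itself:
the wrapper name my-directory-files-recursively is hypothetical, and it
assumes a find-directory-files-recursively like the one defined earlier
in this message.  If that function is not loaded, or no find executable
is found, it simply uses directory-files-recursively.

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical wrapper, not part of Emacs: choose between the built-in
;; walker and the find-based one described in this thread.
(defun my-directory-files-recursively
    (dir regexp &optional include-directories predicate follow-symlinks)
  "List files under DIR matching REGEXP, using find(1) when possible.
Fall back to `directory-files-recursively' when PREDICATE is non-nil,
when DIR is remote, or when the find-based function is unavailable."
  (if (or predicate                     ; Lisp must control descent
          (file-remote-p dir)           ; the sketch asserts a local DIR
          (not (executable-find "find"))
          ;; Assumed to be defined elsewhere (see the message above).
          (not (fboundp 'find-directory-files-recursively)))
      (directory-files-recursively
       dir regexp include-directories predicate follow-symlinks)
    (find-directory-files-recursively
     dir regexp include-directories nil follow-symlinks)))
```

Note that this only covers the predicate case.  It does nothing about
sorting, so the two branches may return the same files in a different
order.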