From: sbaugh@catern.com
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
Message-ID: <87cz0jj25g.fsf@catern.com>
To: Eli Zaretskii
Cc: sbaugh@janestreet.com, yantar92@posteo.net, rms@gnu.org, dmitry@gutov.dev,
 michael.albinus@gmx.de, 64735@debbugs.gnu.org
In-Reply-To: <83sf9g88eh.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 22 Jul 2023 14:58:46 +0300")

Eli Zaretskii writes:

>> From: sbaugh@catern.com
>> Date: Sat, 22 Jul 2023 10:38:37 +0000 (UTC)
>> Cc: Spencer Baugh , dmitry@gutov.dev,
>>  yantar92@posteo.net, michael.albinus@gmx.de, rms@gnu.org,
>>  64735@debbugs.gnu.org
>>
>> Eli Zaretskii writes:
>> > No, the first step is to use in Emacs what Find does today, because it
>> > will already be a significant speedup.
>>
>> Why bother? directory-files-recursively is a rarely used API, as you
>> have mentioned before in this thread.
>
> Because we could then use it much more (assuming the result will be
> performant enough -- this remains to be seen).
>
>> And there is a way to speed it up which will have a performance boost
>> which is unbeatable any other way: Use find instead of
>> directory-files-recursively, and operate on files as they find prints
>> them.
>
> Not every command can operate on the output sequentially: some need to
> see all of the output, others will need to be redesigned and
> reimplemented to support such sequential mode.
>
> Moreover, piping from Find incurs overhead: data is broken into blocks
> by the pipe or PTY, reading the data can be slowed down if Emacs is
> busy processing something, etc.

I went ahead and implemented it, and I get roughly a 2x speedup even
*without* running find in parallel with Emacs.

First, my results:

(my-bench 100 "~/public_html" "")
(("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
 ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))

(my-bench 10 "~/.local/src/linux" "")
(("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
 ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))

(my-bench 100 "/ssh:catern.com:~/public_html" "")
(("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
 ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))

That is roughly a 2x speedup on local files (1.6-1.8x in total elapsed
time, about 2x once GC time is excluded), and about an 8x speedup for
remote files. And my implementation *isn't even using the fact that
find can run in parallel with Emacs*. If I did start using that, I'd
expect further gains from parallelism, which aren't achievable within
Emacs itself.

So can we add something like this (with the appropriate fallbacks to
directory-files-recursively), given how large the speedup is even
without parallelism?
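The speedup factors can be checked against the elapsed times in the results. Below is a minimal sketch in Emacs Lisp. `my-speedup` is a hypothetical helper, not part of Emacs, and the input times are copied by hand from the benchmark output. It reports two ratios: one for total elapsed time, and one after subtracting the time spent in GC.

```elisp
(defun my-speedup (built-in with-find)
  "Return (TOTAL . NON-GC) speedups of WITH-FIND over BUILT-IN as strings.
Each argument is a list (ELAPSED GC-TIME) in seconds, as reported by
`benchmark'.  TOTAL compares elapsed times; NON-GC first subtracts the
time spent in garbage collection from each run."
  (let ((ratio (lambda (a b) (format "%.2fx" (/ a b)))))
    (cons (funcall ratio (car built-in) (car with-find))
          (funcall ratio (- (car built-in) (cadr built-in))
                   (- (car with-find) (cadr with-find))))))

;; Times copied from the benchmark results in this message.
(my-speedup '(1.140173 0.389344) '(0.643306 0.305130))  ; => ("1.77x" . "2.22x")
(my-speedup '(2.402341 0.937857) '(1.544024 0.827364))  ; => ("1.56x" . "2.04x")
(my-speedup '(36.494233 6.450840) '(4.619035 1.133656)) ; => ("7.90x" . "8.62x")
```

GC takes a large share of both implementations' time on the local runs. For that reason the local ratio is about 1.6-1.8x for total time but about 2x when GC is excluded.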
My implementation and benchmarking:

(require 'cl-lib)                       ; cl-assert
(require 'subr-x)                       ; string-empty-p
(require 'ert)                          ; should

(defun find-directory-files-recursively (dir regexp &optional include-directories
                                             _predicate follow-symlinks)
  (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
            (append
             ;; -L is an option, so it must come before the starting point.
             (if follow-symlinks '("find" "-L") '("find"))
             ;; -mindepth 1 omits DIR itself, as directory-files-recursively does.
             (list (file-local-name dir) "-mindepth" "1")
             ;; Without -L, skip symlinks to directories.
             (unless follow-symlinks
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             ;; This must be a list: `append' would splice a bare string in
             ;; as characters.  Note that find's -regex matches the whole
             ;; path, while directory-files-recursively matches REGEXP
             ;; against the file name only.
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))
           (remote (file-remote-p dir))
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find" :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command))))
      ;; Wait until find exits and all of its output has arrived.
      (while (accept-process-output proc))
      ;; Split the NUL-terminated output into file names.
      (let ((start (goto-char (point-min))) ret)
        (while (search-forward "\0" nil t)
          (push (concat remote (buffer-substring-no-properties start (1- (point))))
                ret)
          (setq start (point)))
        ret))))

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  ;; First check that both implementations return the same files.
  (let ((old (directory-files-recursively path regexp))
        (new (find-directory-files-recursively path regexp)))
    (dolist (file old)
      (should (member file new)))
    (dolist (file new)
      (should (member file old))))
  (list (cons "built-in" (benchmark count (list 'directory-files-recursively path regexp)))
        (cons "with-find" (benchmark count (list 'find-directory-files-recursively path regexp)))))
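The implementation above waits for find to exit before parsing anything. The idea of operating on files as find prints them, which would also let find run in parallel with Emacs, could instead use a process filter. Below is a minimal sketch under these assumptions:

- `my-find-files-streaming` is a hypothetical name.
- It handles only local directories and non-directory files.
- It relies only on find supporting `-print0`.

It is not a proposed API.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-find-files-streaming (dir callback &optional done)
  "Call CALLBACK on each non-directory file under DIR as find prints it.
When find exits, call DONE, if non-nil, with no arguments.
Return the find process.  Hypothetical helper; local DIR only."
  (let ((pending ""))                  ; unterminated tail of the last chunk
    (make-process
     :name "find-streaming"
     :command (list "find" (directory-file-name (expand-file-name dir))
                    "!" "-type" "d" "-print0")
     :connection-type 'pipe
     :noquery t
     ;; Keep find's error messages out of the NUL-separated output.
     :stderr (get-buffer-create " *find-streaming-errors*")
     ;; Output arrives in arbitrary chunks, so a chunk may end in the
     ;; middle of a file name.  Pass complete names to CALLBACK right
     ;; away and keep the remainder until the next chunk arrives.
     :filter (lambda (_proc chunk)
               (let ((parts (split-string (concat pending chunk) "\0")))
                 (setq pending (car (last parts)))
                 (mapc callback (butlast parts))))
     :sentinel (lambda (proc _event)
                 (when (and done (memq (process-status proc) '(exit signal)))
                   (funcall done))))))

;; Example: report each file as soon as find prints it.
;; (my-find-files-streaming "~/src" (lambda (f) (message "%s" f)))
```

The filter hands each complete name to CALLBACK as soon as its chunk arrives, so work can start while find is still walking the tree. As noted in the quoted discussion, callers that need the whole list at once gain nothing from this. The fact that the pipe splits output into arbitrary blocks is why the filter must carry the incomplete tail forward.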