From: Ihor Radchenko <yantar92@posteo.net>
To: Dmitry Gutov <dmitry@gutov.dev>
Cc: luangruo@yahoo.com, sbaugh@janestreet.com,
Eli Zaretskii <eliz@gnu.org>,
64735@debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Wed, 26 Jul 2023 09:09:28 +0000 [thread overview]
Message-ID: <878rb3m43b.fsf@localhost> (raw)
In-Reply-To: <35163e56-607d-9c5b-e3e8-5d5b548b3cb7@gutov.dev>
[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]
Dmitry Gutov <dmitry@gutov.dev> writes:
>> (my-bench 10 "/usr/src/linux/" "")
>>
>> (("built-in" . "Elapsed time: 7.034326s (3.598539s in 14 GCs)")
>> ("built-in no filename handler alist" . "Elapsed time: 5.907194s (3.698456s in 15 GCs)")
>> ("with-find" . "Elapsed time: 6.078056s (4.052791s in 16 GCs)")
>> ("with-find-p" . "Elapsed time: 4.496762s (2.739565s in 11 GCs)")
>> ("with-find-sync" . "Elapsed time: 3.702760s (1.715160s in 7 GCs)"))
>
> Thanks, for the extra data point in particular. Easy to see how it
> compares to the most efficient use of 'find', right (on GNU/Linix, at
> least)?
>
> It's also something to note that, GC-wise, numbers 1 and 2 are not the
> worst: the time must be spent somewhere else.
Indeed. I did more detailed analysis in
https://yhetil.org/emacs-devel/87cz0p2xlc.fsf@localhost/
Main contributors in the lisp versions are (in the order from most
significant to less significant) (1) file name handlers; (2) regexp
matching of the file names; (3) nconc calls in the current
`directory-files-recursively' implementation.
I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
calls + bypassing regexp matches when REGEXP is nil.
Here are the results (using the attached modified version of your
benchmark file):
(my-bench 10 "/usr/src/linux/" "")
(("built-in" . "Elapsed time: 7.285597s (3.853368s in 6 GCs)")
("built-in no filename handler alist" . "Elapsed time: 5.855019s (3.760662s in 6 GCs)")
("built-in non-recursive no filename handler alist" . "Elapsed time: 5.817639s (4.326945s in 7 GCs)")
("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 2.708306s (1.871665s in 3 GCs)")
("with-find" . "Elapsed time: 6.082200s (4.262830s in 7 GCs)")
("with-find-p" . "Elapsed time: 4.325503s (3.058647s in 5 GCs)")
("with-find-sync" . "Elapsed time: 3.267648s (1.903655s in 3 GCs)"))
(let ((gc-cons-threshold most-positive-fixnum))
(my-bench 10 "/usr/src/linux/" ""))
(("built-in" . "Elapsed time: 2.754473s")
("built-in no filename handler alist" . "Elapsed time: 1.322443s")
("built-in non-recursive no filename handler alist" . "Elapsed time: 1.235044s")
("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 0.750275s")
("with-find" . "Elapsed time: 1.438510s")
("with-find-p" . "Elapsed time: 1.200876s")
("with-find-sync" . "Elapsed time: 1.349755s"))
If we forget about GC, Elisp version can get fairly close to GNU find.
And if we do not perform regexp matching (which makes sense when the
REGEXP is ""), Elisp version is faster.
[-- Attachment #2: find-bench.el --]
[-- Type: text/plain, Size: 7254 bytes --]
;; -*- lexical-binding: t; -*-
(defun find-directory-files-recursively (dir regexp &optional include-directories _p follow-symlinks)
(cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
(with-temp-buffer
(setq case-fold-search nil)
(cd dir)
(let* ((command
(append
(list "find" (file-local-name dir))
(if follow-symlinks
'("-L")
'("!" "(" "-type" "l" "-xtype" "d" ")"))
(unless (string-empty-p regexp)
(list "-regex" (concat ".*" regexp ".*")))
(unless include-directories
'("!" "-type" "d"))
'("-print0")
))
(remote (file-remote-p dir))
(proc
(if remote
(let ((proc (apply #'start-file-process
"find" (current-buffer) command)))
(set-process-sentinel proc (lambda (_proc _state)))
(set-process-query-on-exit-flag proc nil)
proc)
(make-process :name "find" :buffer (current-buffer)
:connection-type 'pipe
:noquery t
:sentinel (lambda (_proc _state))
:command command))))
(while (accept-process-output proc))
(let ((start (goto-char (point-min))) ret)
(while (search-forward "\0" nil t)
(push (concat remote (buffer-substring-no-properties start (1- (point)))) ret)
(setq start (point)))
ret))))
(defun find-directory-files-recursively-2 (dir regexp &optional include-directories _p follow-symlinks)
(cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
(cl-assert (not (file-remote-p dir)))
(let* (buffered
result
(proc
(make-process
:name "find" :buffer nil
:connection-type 'pipe
:noquery t
:sentinel (lambda (_proc _state))
:filter (lambda (proc data)
(let ((start 0))
(when-let (end (string-search "\0" data start))
(push (concat buffered (substring data start end)) result)
(setq buffered "")
(setq start (1+ end))
(while-let ((end (string-search "\0" data start)))
(push (substring data start end) result)
(setq start (1+ end))))
(setq buffered (concat buffered (substring data start)))))
:command (append
(list "find" (file-local-name dir))
(if follow-symlinks
'("-L")
'("!" "(" "-type" "l" "-xtype" "d" ")"))
(unless (string-empty-p regexp)
(list "-regex" (concat ".*" regexp ".*")))
(unless include-directories
'("!" "-type" "d"))
'("-print0")
))))
(while (accept-process-output proc))
result))
(defun find-directory-files-recursively-3 (dir regexp &optional include-directories _p follow-symlinks)
(cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
(cl-assert (not (file-remote-p dir)))
(let ((args `(,(file-local-name dir)
,@(if follow-symlinks
'("-L")
'("!" "(" "-type" "l" "-xtype" "d" ")"))
,@(unless (string-empty-p regexp)
(list "-regex" (concat ".*" regexp ".*")))
,@(unless include-directories
'("!" "-type" "d"))
"-print0")))
(with-temp-buffer
(let ((status (apply #'process-file
"find"
nil
t
nil
args))
(pt (point-min))
res)
(unless (zerop status)
(error "Listing failed"))
(goto-char (point-min))
(while (search-forward "\0" nil t)
(push (buffer-substring-no-properties pt (1- (point)))
res)
(setq pt (point)))
res))))
(defun directory-files-recursively-strip-nconc
(dir regexp
&optional include-directories predicate
follow-symlinks)
"Return list of all files under directory DIR whose names match REGEXP.
This function works recursively. Files are returned in \"depth
first\" order, and files from each directory are sorted in
alphabetical order. Each file name appears in the returned list
in its absolute form.
By default, the returned list excludes directories, but if
optional argument INCLUDE-DIRECTORIES is non-nil, they are
included.
PREDICATE can be either nil (which means that all subdirectories
of DIR are descended into), t (which means that subdirectories that
can't be read are ignored), or a function (which is called with
the name of each subdirectory, and should return non-nil if the
subdirectory is to be descended into).
If FOLLOW-SYMLINKS is non-nil, symbolic links that point to
directories are followed. Note that this can lead to infinite
recursion."
(let* ((result nil)
(dirs (list dir))
(dir (directory-file-name dir))
;; When DIR is "/", remote file names like "/method:" could
;; also be offered. We shall suppress them.
(tramp-mode (and tramp-mode (file-remote-p (expand-file-name dir)))))
(while (setq dir (pop dirs))
(dolist (file (file-name-all-completions "" dir))
(unless (member file '("./" "../"))
(if (directory-name-p file)
(let* ((leaf (substring file 0 (1- (length file))))
(full-file (concat dir "/" leaf)))
;; Don't follow symlinks to other directories.
(when (and (or (not (file-symlink-p full-file))
follow-symlinks)
;; Allow filtering subdirectories.
(or (eq predicate nil)
(eq predicate t)
(funcall predicate full-file)))
(push full-file dirs))
(when (and include-directories
(string-match regexp leaf))
(setq result (nconc result (list full-file)))))
(when (and regexp (string-match regexp file))
(push (concat dir "/" file) result))))))
(sort result #'string<)))
(defun my-bench (count path regexp)
(setq path (expand-file-name path))
;; (let ((old (directory-files-recursively path regexp))
;; (new (find-directory-files-recursively-3 path regexp)))
;; (dolist (path old)
;; (unless (member path new) (error "! %s not in" path)))
;; (dolist (path new)
;; (unless (member path old) (error "!! %s not in" path))))
(list
(cons "built-in" (benchmark count (list 'directory-files-recursively path regexp)))
(cons "built-in no filename handler alist" (let (file-name-handler-alist) (benchmark count (list 'directory-files-recursively path regexp))))
(cons "built-in non-recursive no filename handler alist" (let (file-name-handler-alist) (benchmark count (list 'directory-files-recursively-strip-nconc path regexp))))
(cons "built-in non-recursive no filename handler alist + skip re-match" (let (file-name-handler-alist) (benchmark count (list 'directory-files-recursively-strip-nconc path nil))))
(cons "with-find" (benchmark count (list 'find-directory-files-recursively path regexp)))
(cons "with-find-p" (benchmark count (list 'find-directory-files-recursively-2 path regexp)))
(cons "with-find-sync" (benchmark count (list 'find-directory-files-recursively-3 path regexp)))))
(provide 'find-bench)
[-- Attachment #3: Type: text/plain, Size: 224 bytes --]
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
next prev parent reply other threads:[~2023-07-26 9:09 UTC|newest]
Thread overview: 213+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 21:16 bug#64735: 29.0.92; find invocations are ~15x slower because of ignores Spencer Baugh
2023-07-20 5:00 ` Eli Zaretskii
2023-07-20 12:22 ` sbaugh
2023-07-20 12:42 ` Dmitry Gutov
2023-07-20 13:43 ` Spencer Baugh
2023-07-20 18:54 ` Dmitry Gutov
2023-07-20 12:38 ` Dmitry Gutov
2023-07-20 13:20 ` Ihor Radchenko
2023-07-20 15:19 ` Dmitry Gutov
2023-07-20 15:42 ` Ihor Radchenko
2023-07-20 15:57 ` Dmitry Gutov
2023-07-20 16:03 ` Ihor Radchenko
2023-07-20 18:56 ` Dmitry Gutov
2023-07-21 9:14 ` Ihor Radchenko
2023-07-20 16:33 ` Eli Zaretskii
2023-07-20 16:36 ` Ihor Radchenko
2023-07-20 16:45 ` Eli Zaretskii
2023-07-20 17:23 ` Ihor Radchenko
2023-07-20 18:24 ` Eli Zaretskii
2023-07-20 18:29 ` Ihor Radchenko
2023-07-20 18:43 ` Eli Zaretskii
2023-07-20 18:57 ` Ihor Radchenko
2023-07-21 12:37 ` Dmitry Gutov
2023-07-21 12:58 ` Ihor Radchenko
2023-07-21 13:00 ` Dmitry Gutov
2023-07-21 13:34 ` Ihor Radchenko
2023-07-21 13:36 ` Dmitry Gutov
2023-07-21 13:46 ` Ihor Radchenko
2023-07-21 15:41 ` Dmitry Gutov
2023-07-21 15:48 ` Ihor Radchenko
2023-07-21 19:53 ` Dmitry Gutov
2023-07-23 5:40 ` Ihor Radchenko
2023-07-23 11:50 ` Michael Albinus
2023-07-24 7:35 ` Ihor Radchenko
2023-07-24 7:59 ` Michael Albinus
2023-07-24 8:22 ` Ihor Radchenko
2023-07-24 9:31 ` Michael Albinus
2023-07-21 7:45 ` Michael Albinus
2023-07-21 10:46 ` Eli Zaretskii
2023-07-21 11:32 ` Michael Albinus
2023-07-21 11:51 ` Ihor Radchenko
2023-07-21 12:01 ` Michael Albinus
2023-07-21 12:20 ` Ihor Radchenko
2023-07-21 12:25 ` Ihor Radchenko
2023-07-21 12:46 ` Eli Zaretskii
2023-07-21 13:01 ` Michael Albinus
2023-07-21 13:23 ` Ihor Radchenko
2023-07-21 15:31 ` Michael Albinus
2023-07-21 15:38 ` Ihor Radchenko
2023-07-21 15:49 ` Michael Albinus
2023-07-21 15:55 ` Eli Zaretskii
2023-07-21 16:08 ` Michael Albinus
2023-07-21 16:15 ` Ihor Radchenko
2023-07-21 16:38 ` Eli Zaretskii
2023-07-21 16:43 ` Ihor Radchenko
2023-07-21 16:43 ` Michael Albinus
2023-07-21 17:45 ` Eli Zaretskii
2023-07-21 17:55 ` Michael Albinus
2023-07-21 18:38 ` Eli Zaretskii
2023-07-21 19:33 ` Spencer Baugh
2023-07-22 5:27 ` Eli Zaretskii
2023-07-22 10:38 ` sbaugh
2023-07-22 11:58 ` Eli Zaretskii
2023-07-22 14:14 ` Ihor Radchenko
2023-07-22 14:32 ` Eli Zaretskii
2023-07-22 15:07 ` Ihor Radchenko
2023-07-22 15:29 ` Eli Zaretskii
2023-07-23 7:52 ` Ihor Radchenko
2023-07-23 8:01 ` Eli Zaretskii
2023-07-23 8:11 ` Ihor Radchenko
2023-07-23 9:11 ` Eli Zaretskii
2023-07-23 9:34 ` Ihor Radchenko
2023-07-23 9:39 ` Eli Zaretskii
2023-07-23 9:42 ` Ihor Radchenko
2023-07-23 10:20 ` Eli Zaretskii
2023-07-23 11:43 ` Ihor Radchenko
2023-07-23 12:49 ` Eli Zaretskii
2023-07-23 12:57 ` Ihor Radchenko
2023-07-23 13:32 ` Eli Zaretskii
2023-07-23 13:56 ` Ihor Radchenko
2023-07-23 14:32 ` Eli Zaretskii
2023-07-22 17:18 ` sbaugh
2023-07-22 17:26 ` Ihor Radchenko
2023-07-22 17:46 ` Eli Zaretskii
2023-07-22 18:31 ` Eli Zaretskii
2023-07-22 19:06 ` Eli Zaretskii
2023-07-22 20:53 ` Spencer Baugh
2023-07-23 6:15 ` Eli Zaretskii
2023-07-23 7:48 ` Ihor Radchenko
2023-07-23 8:06 ` Eli Zaretskii
2023-07-23 8:16 ` Ihor Radchenko
2023-07-23 9:13 ` Eli Zaretskii
2023-07-23 9:16 ` Ihor Radchenko
2023-07-23 11:44 ` Michael Albinus
2023-07-23 2:59 ` Richard Stallman
2023-07-23 5:28 ` Eli Zaretskii
2023-07-22 8:17 ` Michael Albinus
2023-07-21 13:17 ` Ihor Radchenko
2023-07-21 12:27 ` Michael Albinus
2023-07-21 12:30 ` Ihor Radchenko
2023-07-21 13:04 ` Michael Albinus
2023-07-21 13:24 ` Ihor Radchenko
2023-07-21 15:36 ` Michael Albinus
2023-07-21 15:44 ` Ihor Radchenko
2023-07-21 12:39 ` Eli Zaretskii
2023-07-21 13:09 ` Michael Albinus
2023-07-21 12:38 ` Dmitry Gutov
2023-07-20 17:08 ` Spencer Baugh
2023-07-20 17:24 ` Eli Zaretskii
2023-07-22 6:35 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-07-20 17:25 ` Ihor Radchenko
2023-07-21 19:31 ` Spencer Baugh
2023-07-21 19:37 ` Ihor Radchenko
2023-07-21 19:56 ` Dmitry Gutov
2023-07-21 20:11 ` Spencer Baugh
2023-07-22 6:39 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-07-22 21:01 ` Dmitry Gutov
2023-07-23 5:11 ` Eli Zaretskii
2023-07-23 10:46 ` Dmitry Gutov
2023-07-23 11:18 ` Eli Zaretskii
2023-07-23 17:46 ` Dmitry Gutov
2023-07-23 17:56 ` Eli Zaretskii
2023-07-23 17:58 ` Dmitry Gutov
2023-07-23 18:21 ` Eli Zaretskii
2023-07-23 19:07 ` Dmitry Gutov
2023-07-23 19:27 ` Eli Zaretskii
2023-07-23 19:44 ` Dmitry Gutov
2023-07-23 19:27 ` Dmitry Gutov
2023-07-24 11:20 ` Eli Zaretskii
2023-07-24 12:55 ` Dmitry Gutov
2023-07-24 13:26 ` Eli Zaretskii
2023-07-25 2:41 ` Dmitry Gutov
2023-07-25 8:22 ` Ihor Radchenko
2023-07-26 1:51 ` Dmitry Gutov
2023-07-26 9:09 ` Ihor Radchenko [this message]
2023-07-27 0:41 ` Dmitry Gutov
2023-07-27 5:22 ` Eli Zaretskii
2023-07-27 8:20 ` Ihor Radchenko
2023-07-27 8:47 ` Eli Zaretskii
2023-07-27 9:28 ` Ihor Radchenko
2023-07-27 13:30 ` Dmitry Gutov
2023-07-29 0:12 ` Dmitry Gutov
2023-07-29 6:15 ` Eli Zaretskii
2023-07-30 1:35 ` Dmitry Gutov
2023-07-31 11:38 ` Eli Zaretskii
2023-09-08 0:53 ` Dmitry Gutov
2023-09-08 6:35 ` Eli Zaretskii
2023-09-10 1:30 ` Dmitry Gutov
2023-09-10 5:33 ` Eli Zaretskii
2023-09-11 0:02 ` Dmitry Gutov
2023-09-11 11:57 ` Eli Zaretskii
2023-09-11 23:06 ` Dmitry Gutov
2023-09-12 11:39 ` Eli Zaretskii
2023-09-12 13:11 ` Dmitry Gutov
2023-09-12 14:23 ` Dmitry Gutov
2023-09-12 14:26 ` Dmitry Gutov
2023-09-12 16:32 ` Eli Zaretskii
2023-09-12 18:48 ` Dmitry Gutov
2023-09-12 19:35 ` Eli Zaretskii
2023-09-12 20:27 ` Dmitry Gutov
2023-09-13 11:38 ` Eli Zaretskii
2023-09-13 14:27 ` Dmitry Gutov
2023-09-13 15:07 ` Eli Zaretskii
2023-09-13 17:27 ` Dmitry Gutov
2023-09-13 19:32 ` Eli Zaretskii
2023-09-13 20:38 ` Dmitry Gutov
2023-09-14 5:41 ` Eli Zaretskii
2023-09-16 1:32 ` Dmitry Gutov
2023-09-16 5:37 ` Eli Zaretskii
2023-09-19 19:59 ` bug#66020: (bug#64735 spin-off): regarding the default for read-process-output-max Dmitry Gutov
2023-09-20 11:20 ` Eli Zaretskii
2023-09-21 0:57 ` Dmitry Gutov
2023-09-21 2:36 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
[not found] ` <58e9135f-915d-beb9-518a-e814ec2a0c5b@gutov.dev>
2023-09-21 13:16 ` Eli Zaretskii
2023-09-21 17:54 ` Dmitry Gutov
2023-09-21 7:42 ` Eli Zaretskii
2023-09-21 14:37 ` Dmitry Gutov
2023-09-21 14:59 ` Eli Zaretskii
2023-09-21 17:40 ` Dmitry Gutov
2023-09-21 18:39 ` Eli Zaretskii
2023-09-21 18:42 ` Dmitry Gutov
2023-09-21 18:49 ` Eli Zaretskii
2023-09-21 17:33 ` Dmitry Gutov
2023-09-23 21:51 ` Dmitry Gutov
2023-09-24 5:29 ` Eli Zaretskii
2024-05-26 15:20 ` Dmitry Gutov
2024-05-26 16:01 ` Eli Zaretskii
2024-05-26 23:27 ` Stefan Kangas
2024-06-08 12:11 ` Eli Zaretskii
2024-06-09 0:12 ` Dmitry Gutov
2024-06-11 3:12 ` Dmitry Gutov
2024-06-11 6:51 ` Eli Zaretskii
2024-06-11 11:41 ` Dmitry Gutov
2024-06-11 12:55 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-11 13:06 ` Eli Zaretskii
2024-06-11 17:15 ` Ihor Radchenko
2024-06-11 18:09 ` Dmitry Gutov
2024-06-11 19:33 ` Ihor Radchenko
2024-06-11 20:00 ` Dmitry Gutov
2023-09-21 8:07 ` Stefan Kangas
[not found] ` <b4f2135b-be9d-2423-02ac-9690de8b5a92@gutov.dev>
2023-09-21 13:17 ` Eli Zaretskii
2023-07-25 18:42 ` bug#64735: 29.0.92; find invocations are ~15x slower because of ignores Eli Zaretskii
2023-07-26 1:56 ` Dmitry Gutov
2023-07-26 2:28 ` Eli Zaretskii
2023-07-26 2:35 ` Dmitry Gutov
2023-07-25 19:16 ` sbaugh
2023-07-26 2:28 ` Dmitry Gutov
2023-07-21 2:42 ` Richard Stallman
2023-07-22 2:39 ` Richard Stallman
2023-07-22 5:49 ` Eli Zaretskii
2023-07-22 10:18 ` Ihor Radchenko
2023-07-22 10:42 ` sbaugh
2023-07-22 12:00 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878rb3m43b.fsf@localhost \
--to=yantar92@posteo.net \
--cc=64735@debbugs.gnu.org \
--cc=dmitry@gutov.dev \
--cc=eliz@gnu.org \
--cc=luangruo@yahoo.com \
--cc=sbaugh@janestreet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).