unofficial mirror of bug-gnu-emacs@gnu.org 
From: Dmitry Gutov <dmitry@gutov.dev>
To: Eli Zaretskii <eliz@gnu.org>
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
	64735@debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 25 Jul 2023 05:41:13 +0300
Message-ID: <69a98e2a-5816-d36b-9d04-8609291333cd@gutov.dev>
In-Reply-To: <83a5vlsanw.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3663 bytes --]

On 24/07/2023 16:26, Eli Zaretskii wrote:
>> Date: Mon, 24 Jul 2023 15:55:13 +0300
>> Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
>>   64735@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>>>> 1. 'find' itself is much slower there. There is room for improvement in
>>>> the port.
>>>
>>> I think it's the filesystem, not the port (which I did myself in this
>>> case).
>>
>> But directory-files-recursively goes through the same filesystem,
>> doesn't it?
> 
> It does (more or less; see below).  But I was not trying to explain
> why Find is slower than directory-files-recursively, I was trying to
> explain why Find on Windows is slower than Find on GNU/Linux.

Understood. But we probably don't need to worry about the differences
between platforms as much as about choosing the best option for each
platform (or at least not choosing the worst). So I'm more interested
in why the find-based solution is more than 4x slower than the
built-in one on MS Windows.

> If you are asking why directory-files-recursively is so much faster on
> Windows than Find, then the main factors I can think about are:
> 
>    . IPC, at least in how we implement it in Emacs on MS-Windows, via a
>      separate thread and OS-level events between them to signal that
>      stuff is available for reading, whereas
>      directory-files-recursively avoids this overhead completely;
>    . Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
>      Windows are emulated by wrappers around native APIs.  Moreover,
>      Find uses 'char *' for file names, so calling native APIs involves
>      transparent conversion to UTF-16 and back, which is what native
>      APIs accept and return.  By contrast, Emacs on Windows calls the
>      native APIs directly, and converts to UTF-16 from UTF-8, which is
>      faster.  (This last point also means that using Find on Windows
>      has another grave disadvantage: it cannot fully support non-ASCII
>      file names, only those that can be encoded by the current
>      single-byte system codepage.)

I seem to remember that Wine, which does a similar dance of
translating library and system calls, is often very close to native
performance for many programs. So this could be a problem, but not
necessarily a significant one.

That said, text encoding conversion seems like a prime suspect, if the
problem is indeed here.

>>>> 2. The process output handling is worse.
>>>
>>> Not sure what that means.
>>
>> Emacs's ability to process the output of a process on the particular
>> platform.
>>
>> You said:
>>
>>     Btw, the Find command with pipe to some other program, like wc,
>>     finishes much faster, like 2 to 4 times faster than when it is run
>>     from find-directory-files-recursively.  That's probably the slowdown
>>     due to communications with async subprocesses in action.
> 
> I see this slowdown on GNU/Linux as well.
> 
>> One thing to try is changing the -with-find implementation to use a
>> synchronous call, to compare (e.g. using 'process-file'). And repeat
>> these tests on GNU/Linux too.
> 
> This still uses pipes, albeit without the pselect stuff.

I'm attaching an extended benchmark, one that also includes a
"synchronous" implementation. Please give it a spin.

Here (GNU/Linux) the reported numbers look like this:

 > (my-bench 1 default-directory "")

(("built-in" . "Elapsed time: 1.601649s (0.709108s in 22 GCs)")
  ("with-find" . "Elapsed time: 1.792383s (1.135869s in 38 GCs)")
  ("with-find-p" . "Elapsed time: 1.248543s (0.682827s in 20 GCs)")
  ("with-find-sync" . "Elapsed time: 0.922291s (0.343497s in 10 GCs)"))
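
One way to read these numbers is to subtract the GC time from each
total, since the four variants trigger very different numbers of GCs.
Below is a minimal sketch of that arithmetic. The helper name
my-bench-non-gc-times is hypothetical (it is not part of the attached
find-bench.el), and it assumes the values are the strings returned by
'benchmark', as in the output above. For the figures reported here it
gives roughly 0.89s, 0.66s, 0.57s and 0.58s respectively.

```elisp
;; Hypothetical helper, not part of find-bench.el: post-process the
;; alist returned by `my-bench' to show elapsed time excluding GC.
(defun my-bench-non-gc-times (results)
  "Return an alist of (LABEL . SECONDS) with GC time subtracted.
RESULTS is an alist whose values are strings as returned by
`benchmark', e.g. \"Elapsed time: 1.6s (0.7s in 22 GCs)\".
Entries that cannot be parsed get a nil value."
  (mapcar
   (lambda (entry)
     (let ((str (cdr entry)))
       (if (string-match
            ;; Group 1: total time.  Group 2 (optional): GC time;
            ;; `benchmark' omits that part when no GC happened.
            "Elapsed time: \\([0-9.]+\\)s\\(?: (\\([0-9.]+\\)s in [0-9]+ GCs)\\)?"
            str)
           (let ((total (string-to-number (match-string 1 str)))
                 (gc (if (match-string 2 str)
                         (string-to-number (match-string 2 str))
                       0.0)))
             (cons (car entry) (- total gc)))
         (cons (car entry) nil))))
   results))

;; Usage, assuming find-bench.el is loaded:
;; (my-bench-non-gc-times (my-bench 1 default-directory ""))
```

Subtracting GC time is only a rough normalization: the amount of
garbage each variant creates is itself part of its cost, so both
columns are worth looking at.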

[-- Attachment #2: find-bench.el --]
[-- Type: text/x-emacs-lisp, Size: 4648 bytes --]

(defun find-directory-files-recursively (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
	    (append
	     (list "find" (file-local-name dir))
	     (if follow-symlinks
		 '("-L")
	       '("!" "(" "-type" "l" "-xtype" "d" ")"))
	     (unless (string-empty-p regexp)
	       (list "-regex" (concat ".*" regexp ".*")))
	     (unless include-directories
	       '("!" "-type" "d"))
	     '("-print0")
	     ))
	   (remote (file-remote-p dir))
	   (proc
	    (if remote
		(let ((proc (apply #'start-file-process
				   "find" (current-buffer) command)))
		  (set-process-sentinel proc (lambda (_proc _state)))
		  (set-process-query-on-exit-flag proc nil)
		  proc)
	      (make-process :name "find" :buffer (current-buffer)
			    :connection-type 'pipe
			    :noquery t
			    :sentinel (lambda (_proc _state))
			    :command command))))
      (while (accept-process-output proc))
      (let ((start (goto-char (point-min))) ret)
	(while (search-forward "\0" nil t)
	  (push (concat remote (buffer-substring-no-properties start (1- (point)))) ret)
	  (setq start (point)))
	ret))))

(defun find-directory-files-recursively-2 (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (cl-assert (not (file-remote-p dir)))
  (let* (buffered
         result
         (proc
	  (make-process
           :name "find" :buffer nil
	   :connection-type 'pipe
	   :noquery t
	   :sentinel (lambda (_proc _state))
           :filter (lambda (proc data)
                     (let ((start 0))
                       (when-let (end (string-search "\0" data start))
                         (push (concat buffered (substring data start end)) result)
                         (setq buffered "")
                         (setq start (1+ end))
                         (while-let ((end (string-search "\0" data start)))
                           (push (substring data start end) result)
                           (setq start (1+ end))))
                       (setq buffered (concat buffered (substring data start)))))
	   :command (append
	             (list "find" (file-local-name dir))
	             (if follow-symlinks
		         '("-L")
	               '("!" "(" "-type" "l" "-xtype" "d" ")"))
	             (unless (string-empty-p regexp)
	               (list "-regex" (concat ".*" regexp ".*")))
	             (unless include-directories
	               '("!" "-type" "d"))
	             '("-print0")
	             ))))
    (while (accept-process-output proc))
    result))

(defun find-directory-files-recursively-3 (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (cl-assert (not (file-remote-p dir)))
  (let ((args `(,(file-local-name dir)
	        ,@(if follow-symlinks
		      '("-L")
	            '("!" "(" "-type" "l" "-xtype" "d" ")"))
	        ,@(unless (string-empty-p regexp)
	            (list "-regex" (concat ".*" regexp ".*")))
	        ,@(unless include-directories
	            '("!" "-type" "d"))
	        "-print0")))
    (with-temp-buffer
      (let ((status (apply #'process-file
                           "find"
                           nil
                           t
                           nil
                           args))
            (pt (point-min))
            res)
        (unless (zerop status)
          (error "Listing failed"))
        (goto-char (point-min))
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties pt (1- (point)))
                res)
          (setq pt (point)))
        res))))

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  ;; (let ((old (directory-files-recursively path regexp))
  ;;       (new (find-directory-files-recursively-3 path regexp)))
  ;;   (dolist (path old)
  ;;     (unless (member path new) (error "! %s not in" path)))
  ;;   (dolist (path new)
  ;;     (unless (member path old) (error "!! %s not in" path))))
  (list
   (cons "built-in" (benchmark count (list 'directory-files-recursively path regexp)))
   (cons "with-find" (benchmark count (list 'find-directory-files-recursively path regexp)))
   (cons "with-find-p" (benchmark count (list 'find-directory-files-recursively-2 path regexp)))
   (cons "with-find-sync" (benchmark count (list 'find-directory-files-recursively-3 path regexp)))))

