unofficial mirror of bug-gnu-emacs@gnu.org 
From: Dmitry Gutov <dmitry@gutov.dev>
To: Eli Zaretskii <eliz@gnu.org>
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
	64735@debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 12 Sep 2023 17:23:53 +0300
Message-ID: <e6dd2d5c-9a88-6e18-f1e2-544a9cb1beaf@gutov.dev>
In-Reply-To: <83tts0rkh5.fsf@gnu.org>

On 11/09/2023 14:57, Eli Zaretskii wrote:
>> So there is also a second recording for
>> find-directory-files-recursively-2 with read-process-output-max=409600.
>> It does improve the performance significantly (and reduce the number of
>> GC pauses). I guess what I'm still not clear on, is whether the number
>> of GC pauses is fewer because of less consing (the only column that
>> looks significantly different is the 3rd: VECTOR-CELLS), or because the
>> process finishes faster due to larger buffers, which itself causes fewer
>> calls to maybe_gc.
> I think the latter.

It might be both.

To estimate how large the per-chunk overhead might be (CPU and GC
combined), I first implemented the same function in yet another way:
one that doesn't use :filter (so the default filter is used), but is
still asynchronous, with parsing happening concurrently with the
process:

(require 'cl-lib)  ; for `cl-assert'

(defun find-directory-files-recursively-5 (dir regexp
                                           &optional include-directories
                                           _p follow-symlinks)
  (cl-assert (null _p) t
             "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
            (append
             ;; GNU find only accepts -L before the starting point.
             (if follow-symlinks '("find" "-L") '("find"))
             (list (file-local-name dir))
             (unless follow-symlinks
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))
           (remote (file-remote-p dir))
           ;; No :filter is set, so the default filter inserts the
           ;; output into the current (temporary) buffer.
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find" :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command)))
           start ret)
      (setq start (point-min))
      ;; Parse NUL-terminated names as chunks arrive, while find is
      ;; still running.
      (while (accept-process-output proc)
        (goto-char start)
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties start (1- (point))) ret)
          (setq start (point))))
      ret)))
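
To illustrate why this loop copes with a file name that is split across
two reads, here is a minimal, self-contained sketch that replaces the
find process with hand-fed chunks. The name my/parse-nul-chunks and the
sample chunks are hypothetical and not part of any patch; only the inner
loop mirrors the function above. Unlike that function, the sketch
reverses the result so the names come back in arrival order.

```elisp
;; Hypothetical helper: feed CHUNKS into a temp buffer one at a time,
;; the way the default process filter would, and parse after each one.
(defun my/parse-nul-chunks (chunks)
  "Return the NUL-terminated names found in CHUNKS, in arrival order."
  (with-temp-buffer
    (let ((start (point-min))
          ret)
      (dolist (chunk chunks)
        ;; Stand-in for the default filter appending process output.
        (goto-char (point-max))
        (insert chunk)
        ;; Same inner loop as above: resume at the first unparsed
        ;; character and stop after the last complete name.
        (goto-char start)
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties start (1- (point))) ret)
          (setq start (point))))
      (nreverse ret))))

;; "b/c.el" is split across two chunks but is still returned whole.
(my/parse-nul-chunks '("a.el\0b/" "c.el\0d.el" "\0"))
;; => ("a.el" "b/c.el" "d.el")
```

Because `start' only advances past a complete name, an unterminated
tail simply waits in the buffer until the next chunk delivers its NUL.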

The -5 variant already improved performance somewhat (compared to
find-directory-files-recursively-2), but not by much. So I tried two
further steps:

- Dropping most of the setup in read_and_dispose_of_process_output
(which does some consing too) and calling
Finternal_default_process_filter directly (call_filter_directly.diff)
when it is the filter that would be used anyway.

- Bypassing that function entirely: skipping the creation of a Lisp
string (CHARS -> TEXT) and inserting into the buffer directly (only
when the filter is set to the default, of course). I copied and adapted
some code from 'call_process' for that
(read_and_insert_process_output.diff).

Neither is intended as a complete proposal, but here are some
comparisons. Note that these patches can only help the implementations
that don't set up a process filter (the naive first one, and the new
parallel number 5 above).

For testing, I used two different repo checkouts that are large enough
not to finish too quickly: gecko-dev and torvalds-linux. In the tables,
"rpom" stands for read-process-output-max.
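
For reference, timings of this kind could be collected along the
following lines. This is only a sketch under stated assumptions: the
helper name my/time-listing is hypothetical, the path in the usage
comment is a placeholder, and it is not necessarily how the numbers in
this message were obtained. It relies only on benchmark-run and on
let-binding read-process-output-max around the call.

```elisp
(require 'benchmark)

;; Hypothetical helper: time one call of FN on DIR, optionally with
;; `read-process-output-max' let-bound to RPOM for that call only.
(defun my/time-listing (fn dir &optional rpom)
  "Return the elapsed seconds of (funcall FN DIR \"\")."
  (let ((read-process-output-max (or rpom read-process-output-max)))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1 (funcall fn dir "")))))

;; Usage (the path is a placeholder; pass an absolute directory name):
;; (my/time-listing #'find-directory-files-recursively-5
;;                  "/path/to/gecko-dev/")
;; (my/time-listing #'find-directory-files-recursively-5
;;                  "/path/to/gecko-dev/" 409600)
```

The let-binding keeps the larger read size local to the measured call,
so the global default is left untouched.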

master

| Function                                         | gecko-dev | linux |
| find-directory-files-recursively                 |      1.69 |  0.41 |
| find-directory-files-recursively-2               |      1.16 |  0.28 |
| find-directory-files-recursively-3               |      0.92 |  0.23 |
| find-directory-files-recursively-5               |      1.07 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.35 |
| find-directory-files-recursively-2 (rpom 409600) |      0.90 |  0.25 |
| find-directory-files-recursively-5 (rpom 409600) |      0.89 |  0.24 |

call_filter_directly.diff (basically, not much difference)

| Function                                         | gecko-dev | linux |
| find-directory-files-recursively                 |      1.64 |  0.38 |
| find-directory-files-recursively-5               |      1.05 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.36 |
| find-directory-files-recursively-5 (rpom 409600) |      0.91 |  0.25 |

read_and_insert_process_output.diff (noticeable differences)

| Function                                         | gecko-dev | linux |
| find-directory-files-recursively                 |      1.30 |  0.34 |
| find-directory-files-recursively-5               |      1.03 |  0.25 |
| find-directory-files-recursively (rpom 409600)   |      1.20 |  0.35 |
| find-directory-files-recursively-5 (rpom 409600) | (!!) 0.72 |  0.21 |

So it seems we have at least two potential ways to implement an
asynchronous file listing routine that is as fast as or faster than the
synchronous one (if only thanks to starting the parsing in parallel).

Combining the last patch with a very large value of
read-process-output-max seems to yield the most benefit, but I'm not
sure it's appropriate to just raise that value in our code.

Thoughts?




