From: Dmitry Gutov <dmitry@gutov.dev>
To: Eli Zaretskii <eliz@gnu.org>
Cc: sbaugh@janestreet.com, 71094@debbugs.gnu.org, rgm@gnu.org
Subject: bug#71094: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Wed, 22 May 2024 17:50:42 +0300
Message-ID: <10f62497-dfb1-4c46-b18a-6d1100de4b6a@gutov.dev>
In-Reply-To: <861q5t7vrp.fsf@gnu.org>
On 22/05/2024 17:42, Eli Zaretskii wrote:
>>> That's true, but what is your mental model of how the pipe with xargs
>>> works in practice? How many invocations of grep will xargs do, and
>>> when will the first invocation happen?
>>
>> In my mental model xargs acts like an asynchronous queue with batch
>> processing. The first invocation will happen after the output reaches
>> the maximum command-line length or the maximum number of arguments
>> configured. Both are system-dependent by default.
>
> And can be rather small. But if it is large, then...
>
>> For example, on my system 'xargs --show-limits' says
>>
>> Size of command buffer we are actually using: 131072
>>
>> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
>> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
>> parallelism there we'll need to either lower the limit or test it in a
>> project at least twice as big.
>
> ...until xargs collects all those characters, it will not invoke grep,
> right? So, for directories whose file names total less than those
> 200K, xargs will still wait until find ends its job, right?
That's right. And that's why we're not seeing much of a difference in
projects of Emacs's size or smaller. No apparent regression either, though.
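To make the batching arithmetic above concrete, here is a minimal sketch of
the mental model, not of xargs's actual implementation. It assumes each file
name costs its byte length plus one separator byte and ignores the size of
the grep command itself and the environment. The function name
batch_by_size and the synthetic 31-character file names are hypothetical;
only the 131072 limit and the roughly 202928-byte total come from the
numbers quoted in this thread.

```python
from typing import Iterable, List


def batch_by_size(names: Iterable[str], limit: int) -> List[List[str]]:
    """Group names into batches whose total cost stays within `limit`.

    Assumption: each name costs len(name) + 1 bytes (the name plus one
    separator). Real xargs accounting is more involved.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name in names:
        cost = len(name.encode()) + 1
        # Flush the current batch before it would overflow the buffer.
        if current and used + cost > limit:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        # The last, partially filled batch runs only once the input ends.
        batches.append(current)
    return batches


# Synthetic input: 6341 names of 31 characters each, i.e. 32 bytes apiece,
# for a total of 202912 bytes (close to the 202928 reported above).
names = [f"{i:031d}" for i in range(6341)]
batches = batch_by_size(names, limit=131072)
print([len(b) for b in batches])  # -> [4096, 2245]
```

Under these assumptions the first grep cannot start until 4096 names have
arrived from find, and the second starts only when find finishes. With a
total below the limit there would be a single batch, and no overlap between
find and grep at all.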
>> So here is another example: a Linux kernel checkout (76K files). Also
>> about 30% improvement: 1.40s vs 2.00s.
>
> This is all highly system-dependent.
Naturally. So it'd be great to see some additional data points from
users on other systems, especially those where the default limit is
lower than it is on mine.
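For anyone who wants to attach their system's limit to a data point, here
is a small sketch that reads the kernel's ARG_MAX through os.sysconf. The
helper name system_arg_max is hypothetical. Note that this value is only an
upper bound: xargs may use a smaller command buffer (131072 on my system,
as quoted above), so the output of 'xargs --show-limits' remains the more
relevant number where that option is available.

```python
import os
from typing import Optional


def system_arg_max() -> Optional[int]:
    """Return the system's ARG_MAX in bytes, or None if it is unavailable."""
    sysconf = getattr(os, "sysconf", None)  # not present on every platform
    if sysconf is None:
        return None
    try:
        value = sysconf("SC_ARG_MAX")
    except (ValueError, OSError):
        # The name is unknown or the limit cannot be queried here.
        return None
    # A non-positive result means the limit is indeterminate.
    return value if value > 0 else None


print("ARG_MAX:", system_arg_max())
```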