all messages for Emacs-related lists mirrored at yhetil.org
From: Dmitry Gutov <dmitry@gutov.dev>
To: Eli Zaretskii <eliz@gnu.org>
Cc: sbaugh@janestreet.com, 71094@debbugs.gnu.org, rgm@gnu.org
Subject: bug#71094: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Wed, 22 May 2024 17:50:42 +0300
Message-ID: <10f62497-dfb1-4c46-b18a-6d1100de4b6a@gutov.dev>
In-Reply-To: <861q5t7vrp.fsf@gnu.org>

On 22/05/2024 17:42, Eli Zaretskii wrote:
>>> That's true, but what is your mental model of how the pipe with xargs
>>> works in practice?  How many invocations of grep will xargs do, and
>>> when will the first invocation happen?
>>
>> In my mental model xargs acts like an asynchronous queue with batch
>> processing. The first invocation will happen after the output reaches
>> the maximum line number or maximum number of arguments configured. They
>> are system-dependent by default.
> 
> And can be rather small.  But if it is large, then...
> 
>> For example, on my system 'xargs --show-limits' says
>>
>>     Size of command buffer we are actually using: 131072
>>
>> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
>> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
>> parallelism there we'll need to either lower the limit or test it in a
>> project at least twice as big.
> 
> ...until xargs collects all those characters, it will not invoke grep,
> right?  So, for directories whose file names total less than those
> 200K, xargs will still wait until find ends its job, right?

That's right. And it's why we're not seeing much of a difference in 
projects of Emacs's size or smaller. No apparent regression either, though.
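
To make the arithmetic above concrete, here is a rough Python sketch (not part of the patch; the function name is made up for illustration). It assumes xargs simply fills each command line up to the buffer size, and it ignores the environment size and per-argument overhead that a real xargs accounts for, so treat the result as an approximation.

```python
import math


def estimate_grep_invocations(total_bytes, buffer_bytes):
    """Rough count of the batches xargs would hand to grep.

    Hypothetical helper: assumes each batch is filled up to buffer_bytes
    and ignores environment size and per-argument overhead.
    """
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    return math.ceil(total_bytes / buffer_bytes)


# Numbers quoted in this thread.
emacs_total = 202928   # characters reported by "find ... -print0 | wc"
buffer_size = 131072   # command buffer size from 'xargs --show-limits'

ratio = emacs_total / buffer_size
batches = estimate_grep_invocations(emacs_total, buffer_size)
print(f"{ratio:.2f} buffers' worth of file names, {batches} grep invocations")
```

Under the same assumption, any tree whose file names total less than one buffer yields a single batch, which xargs can only dispatch once find has finished.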

>> So here is another example: a Linux kernel checkout (76K files). Also
>> about 30% improvement: 1.40s vs 2.00s.
> 
> This is all highly system-dependent.

Naturally. So it'd be great to see some additional data points from 
users on other systems.

Especially those where the default limit is lower than it is on mine.

