all messages for Emacs-related lists mirrored at yhetil.org
From: Dmitry Gutov <dgutov@yandex.ru>
To: Joel Reicher <joel.reicher@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Should project delegate project-find-regexp?
Date: Mon, 18 Apr 2022 06:01:37 +0300
Message-ID: <7c78dec7-d719-b3f0-2b6e-649c73eb5523@yandex.ru>
In-Reply-To: <86v8vkynle.fsf@gmail.com>

On 08.04.2022 11:40, Joel Reicher wrote:
> Dmitry Gutov <dgutov@yandex.ru> writes:
> 
>> On 07.04.2022 14:48, Joel Reicher wrote:
>>> It seems to me that, at least in the case of git, 'git grep' offers a superior implementation to anything offered by the generic implementation of project-find-regexp.
>>
>> Last I checked, there was no way to make 'git grep' search in
>> untracked files.
> 
> There's a --untracked option, at least now.

Thanks, that works. And we could try to support it. "ignore patterns" 
would require some code duplication, but that's doable. Not "error 
patterns", sorry, that was a typo.

But I've benchmarked searching through a large project (200000 files), 
and the results seem mixed.

--untracked does slow it down noticeably.

Examples:

$ time git grep -z -e symlinks >/dev/null

________________________________________________________
Executed in    1,11 secs    fish           external
    usr time    2,16 secs  720,00 micros    2,16 secs
    sys time    3,65 secs  192,00 micros    3,65 secs

$ time git grep -z --untracked -e symlinks >/dev/null

________________________________________________________
Executed in    1,81 secs    fish           external
    usr time    2,42 secs    0,00 micros    2,42 secs
    sys time    4,00 secs  938,00 micros    4,00 secs

At the same time, if I pipe the results of 'git ls-files' to ripgrep:

$ time git ls-files -z -c -o --exclude-standard | xargs -0 rg --null \
    --no-messages -g '!*/' -nH -e symlinks >/dev/null

________________________________________________________
Executed in    2,50 secs    fish           external
    usr time    2,91 secs    1,40 millis    2,90 secs
    sys time    3,02 secs    0,37 millis    3,02 secs

...it looks a little worse. But what if I add some forced parallelism?

$ time git ls-files -z -c -o --exclude-standard | xargs -0 -P8 rg --null \
    --no-messages -g '!*/' -nH -e symlinks >/dev/null

________________________________________________________
Executed in    1,08 secs    fish           external
    usr time    4,03 secs    1,50 millis    4,03 secs
    sys time    3,60 secs    0,42 millis    3,60 secs

...it shows better performance. Unfortunately, using the -P argument of 
xargs for grepping is unreliable because of synchronization problems, but 
I've written about this on ripgrep's issue tracker 
(https://github.com/BurntSushi/ripgrep/issues/273#issuecomment-1100792783), 
and we might get such a feature there natively someday.
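
One possible workaround for the synchronization problem, sketched here under assumptions and not taken from the message above: give each parallel worker its own output file and concatenate the files afterwards, so lines from different workers cannot interleave. The helper name search_parallel, the batch size, and the worker count are hypothetical, and plain grep stands in for rg so the sketch runs without ripgrep installed.

```shell
# search_parallel PATTERN [BATCH]
# Hypothetical helper: reads NUL-separated file names on stdin and greps
# them in parallel batches. Each batch writes to its own temporary file,
# so output from different workers is never mixed within a line.
search_parallel() {
    local pattern="$1"
    local batch="${2:-100}"
    local outdir
    outdir=$(mktemp -d) || return 1
    # -n limits the number of file names per batch, -P 4 runs up to four
    # batches at once. The inner script receives the output directory and
    # the pattern first, followed by the file names of its batch.
    xargs -0 -n "$batch" -P 4 sh -c '
        out=$(mktemp "$1/part.XXXXXX") || exit 1
        pat=$2
        shift 2
        # grep exits with 1 when nothing matched; treat that as success.
        grep -nH -e "$pat" -- "$@" >"$out" || [ $? -eq 1 ]
    ' sh "$outdir" "$pattern"
    # Emit the per-batch results one whole file at a time, then clean up.
    find "$outdir" -type f -exec cat {} +
    rm -rf "$outdir"
}

# Usage (the file-listing command is the one from the benchmarks above):
#   git ls-files -z -c -o --exclude-standard | search_parallel symlinks
```

The order of the batches in the output is not deterministic, so a caller that needs results grouped by file would still have to sort them.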

YMMV, but on this machine, at least, this seems to demonstrate that 'git 
grep' isn't always better. And its '--threads' argument doesn't seem to 
make any difference.

Now, the default searcher (grep) is a little slower than ripgrep, but at 
least we have a faster option present.

Now, when it comes to Emacs, we also lose a fair amount of time on 
parsing the list of files internally (the output of 'git ls-files') 
before sending it to 'xargs rg' or 'xargs grep'.

There are a few ways to deal with this. Maybe we'd have a 
generic function which constructs the file-listing shell command (which 
we'd simply concatenate when constructing the shell command for search). Or we'd 
have 'project-files' return some opaque value with a bunch of accessors 
which would allow parsing the list of files lazily, and simply reuse the 
output buffer as input without parsing it (this would save ~500ms in my 
measurements in this scenario). Or we'd cache the list of files, and cut 
the whole 1s with that.
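
The caching idea can be illustrated at the shell level. This is only a sketch under assumptions: the helper name list_files_cached and the cache location are hypothetical, no cache invalidation is attempted, and it says nothing about how 'project-files' would implement this inside Emacs. The cached bytes are replayed verbatim to the search command, which mirrors the "reuse the output without parsing it" approach.

```shell
# list_files_cached CACHE CMD [ARGS...]
# Hypothetical helper: run CMD once, store its NUL-separated output in
# CACHE, and replay CACHE unchanged on later calls.
# The cache is never invalidated here; a real implementation would need
# to notice added, deleted, or newly ignored files.
list_files_cached() {
    local cache="$1"
    shift
    if [ ! -s "$cache" ]; then
        # Write to a temporary name first so a failed listing does not
        # leave a truncated cache behind.
        "$@" > "$cache.tmp" && mv "$cache.tmp" "$cache" || return 1
    fi
    cat "$cache"
}

# Usage, with the commands from the benchmarks above:
#   list_files_cached /tmp/project.files \
#       git ls-files -z -c -o --exclude-standard |
#     xargs -0 rg --null --no-messages -g '!*/' -nH -e symlinks
```

On the second and later searches only the cat runs, so the cost of listing the files is paid once.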

We've discussed some of this before (like the caching thing) but so far 
it's up in the air.

But given the possibility of being able to choose a faster search 
program, I'm not sure about making the search a project method (which 
would lock such projects into one search implementation). I'd rather try 
to work on other inefficiencies first.

Do try installing ripgrep, though. The search program is configured 
through the xref-search-program defcustom.
