unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#71179: [PATCH] In rgrep, check matching files before excluding files
@ 2024-05-24 20:14 Spencer Baugh
  2024-05-24 20:45 ` Dmitry Gutov
  2024-05-25  6:36 ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Spencer Baugh @ 2024-05-24 20:14 UTC (permalink / raw)
  To: 71179

[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]

Tags: patch


In rgrep, check matching files before excluding files

There are a lot of excluding globs, and checking them all is expensive.
The files glob (i.e. the glob for files we actually want) is usually
just one or two entries, so it's quite fast to check.

If find checks the files glob first and then the excluding glob, it has
to do much less checking (since the files glob will substantially narrow
down the set of files on its own), and find performance is much better.

In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
from ~410ms to ~130ms.

When the files glob is "* .*", there's no benefit from this change,
since we still have to check every excluding glob anyway.  But there's
also no cost.

* lisp/progmodes/grep.el (rgrep-default-command): Move the
excluded files glob to part of the "files" argument.

In GNU Emacs 29.2.50 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-05-20 built on
 igm-qws-u22796a
Repository revision: 734740051bd377d24899d08d00ec8e1bb8e00e00
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)

Configured using:
 'configure -C --with-x-toolkit=lucid --with-gif=ifavailable
 --with-native-compilation=aot'


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-In-rgrep-check-matching-files-before-excluding-files.patch --]
[-- Type: text/patch, Size: 2851 bytes --]

From f94e7f15c77ff1d8b5736adc49bd1e9bd54c7270 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 24 May 2024 13:12:28 -0400
Subject: [PATCH] In rgrep, check matching files before excluding files

There are a lot of excluding globs, and checking them all is expensive.
The files glob (i.e. the glob for files we actually want) is usually
just one or two entries, so it's quite fast to check.

If find checks the files glob first and then the excluding glob, it has
to do much less checking (since the files glob will substantially narrow
down the set of files on its own), and find performance is much better.

In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
from ~410ms to ~130ms.

* lisp/progmodes/grep.el (rgrep-default-command): Move the
excluded files glob to part of the "files" argument.
---
 lisp/progmodes/grep.el | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index ce54c57aabc..84b3b352faa 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -1383,7 +1383,17 @@ rgrep-default-command
             (split-string files)
             (concat " -o " find-name-arg " "))
            " "
-           (shell-quote-argument ")" grep-quoting-style))
+           (shell-quote-argument ")" grep-quoting-style)
+           (when-let ((ignored-files (grep-find-ignored-files dir)))
+             (concat " " (shell-quote-argument "!" grep-quoting-style)
+                     " " (shell-quote-argument "(" grep-quoting-style)
+                     ;; we should use shell-quote-argument here
+                     " -name "
+                     (mapconcat
+                      (lambda (ignore) (shell-quote-argument ignore grep-quoting-style))
+                      ignored-files
+                      " -o -name ")
+                     " " (shell-quote-argument ")" grep-quoting-style))))
    dir
    (concat
     (when-let ((ignored-dirs (rgrep-find-ignored-directories dir)))
@@ -1398,18 +1408,6 @@ rgrep-default-command
                " -o -path ")
               " "
               (shell-quote-argument ")" grep-quoting-style)
-              " -prune -o "))
-    (when-let ((ignored-files (grep-find-ignored-files dir)))
-      (concat (shell-quote-argument "!" grep-quoting-style) " -type d "
-              (shell-quote-argument "(" grep-quoting-style)
-              ;; we should use shell-quote-argument here
-              " -name "
-              (mapconcat
-               (lambda (ignore) (shell-quote-argument ignore grep-quoting-style))
-               ignored-files
-               " -o -name ")
-              " "
-              (shell-quote-argument ")" grep-quoting-style)
               " -prune -o ")))))
 
 (defun grep-find-toggle-abbreviation ()
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-24 20:14 bug#71179: [PATCH] In rgrep, check matching files before excluding files Spencer Baugh
@ 2024-05-24 20:45 ` Dmitry Gutov
  2024-05-24 20:54   ` Spencer Baugh
  2024-05-25  6:36 ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-24 20:45 UTC (permalink / raw)
  To: Spencer Baugh, 71179

Hi Spencer,

On 24/05/2024 23:14, Spencer Baugh wrote:
> If find checks the files glob first and then the excluding glob, it has
> to do much less checking (since the files glob will substantially narrow
> down the set of files on its own), and find performance is much better.
> 
> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
> from ~410ms to ~130ms.

I can confirm improvement here (though not exactly 3x).

1.9s to 1.3s in a Linux checkout, for example. Nice.

Moving the files exclude instructions to the <F> placeholder is a slight 
incompatibility, but I wonder if there are any custom grep-find-template 
values which would be bothered by it (that's the only incompatibility I 
could think of). Perhaps those that currently don't include <X> at all?






^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-24 20:45 ` Dmitry Gutov
@ 2024-05-24 20:54   ` Spencer Baugh
  2024-05-25 12:31     ` Dmitry Gutov
  0 siblings, 1 reply; 23+ messages in thread
From: Spencer Baugh @ 2024-05-24 20:54 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 71179

Dmitry Gutov <dmitry@gutov.dev> writes:
> Hi Spencer,
>
> On 24/05/2024 23:14, Spencer Baugh wrote:
>> If find checks the files glob first and then the excluding glob, it has
>> to do much less checking (since the files glob will substantially narrow
>> down the set of files on its own), and find performance is much better.
>> In my benchmarking, this takes (rgrep "foo" "*.el"
>> "~/src/emacs/trunk/")
>> from ~410ms to ~130ms.
>
> I can confirm improvement here (though not exactly 3x).
>
> 1.9s to 1.3s in a Linux checkout, for example. Nice.
>
> Moving the files exclude instructions to the <F> placeholder is a
> slight incompatibility, but I wonder if there are any custom
> grep-find-template values which would be bothered by it (that's the
> only incompatibility I could think of). Perhaps those that currently
> don't include <X> at all?

A grep-find-template that doesn't include <X> will indeed start seeing
ignores based on grep-find-ignored-files in rgrep.  But, such a user can
just set grep-find-ignored-files to nil and then they'll stop seeing
ignores again.

Also, for what it's worth, note that the documentation for
grep-find-template says this:

  <X> - find options to restrict or expand the directory list
  <F> - find options to limit the files matched

So this change makes the documentation more accurate: <X> previously
also affected the files matched, but now it only affects the directory
list, as documented.  <F> continues to limit the files matched, as
before.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-24 20:14 bug#71179: [PATCH] In rgrep, check matching files before excluding files Spencer Baugh
  2024-05-24 20:45 ` Dmitry Gutov
@ 2024-05-25  6:36 ` Eli Zaretskii
  2024-05-25 12:26   ` Dmitry Gutov
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25  6:36 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: 71179

> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Fri, 24 May 2024 16:14:39 -0400
> 
> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
> from ~410ms to ~130ms.

Which is a minor improvement at best, possibly a negligible one.  In
my testing (on MS-Windows), I see a barely-tangible improvement: 0.7%.

> Date: Fri, 24 May 2024 23:45:00 +0300
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> > In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
> > from ~410ms to ~130ms.
> 
> I can confirm improvement here (though not exactly 3x).
> 
> 1.9s to 1.3s in a Linux checkout, for example. Nice.

Which is still quite minor.

> Moving the files exclude instructions to the <F> placeholder is a slight 
> incompatibility

Right, and for that reason, we cannot install this change as-is.  We
need either a different command or a user option controlling the order
(with a good explanation of the effect of the difference).

> Cc: 71179@debbugs.gnu.org
> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Fri, 24 May 2024 16:54:37 -0400
> 
> > Moving the files exclude instructions to the <F> placeholder is a
> > slight incompatibility, but I wonder if there are any custom
> > grep-find-template values which would be bothered by it (that's the
> > only incompatibility I could think of). Perhaps those that currently
> > don't include <X> at all?
> 
> A grep-find-template that doesn't include <X> will indeed start seeing
> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
> just set grep-find-ignored-files to nil and then they'll stop seeing
> ignores again.

That's not a valid argument for changing the default behavior.
Because I could counter-argue that if you don't care about the order
and want those few hundreds of millisecond at all costs, then _you_
can customize the template to your liking, leaving the default
behavior intact.

> Also, for what it's worth, note that the documentation for
> grep-find-template says this:
> 
>   <X> - find options to restrict or expand the directory list
>   <F> - find options to limit the files matched
> 
> So this change makes the documentation more accurate: <X> previously
> also affected the files matched, but now it only affects the directory
> list, as documented.  <F> continues to limit the files matched, as
> before.

Sorry, such incompatible changes are not acceptable, definitely when
the gain is so small.  Correctness trumps speed.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25  6:36 ` Eli Zaretskii
@ 2024-05-25 12:26   ` Dmitry Gutov
  2024-05-25 12:51     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 12:26 UTC (permalink / raw)
  To: Eli Zaretskii, Spencer Baugh; +Cc: 71179

On 25/05/2024 09:36, Eli Zaretskii wrote:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Fri, 24 May 2024 16:14:39 -0400
>>
>> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
>> from ~410ms to ~130ms.
> 
> Which is a minor improvement at best, possibly a negligible one.  In
> my testing (on MS-Windows), I see a barely-tangible improvement: 0.7%.

That's unfortunate, but I think we prioritize GNU systems when making 
such decisions. I suppose filesystem access has more overhead on MSW, or 
there are other problems with the port.

>> Date: Fri, 24 May 2024 23:45:00 +0300
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>>> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
>>> from ~410ms to ~130ms.
>>
>> I can confirm improvement here (though not exactly 3x).
>>
>> 1.9s to 1.3s in a Linux checkout, for example. Nice.
> 
> Which is still quite minor.

A 30% improvement is nothing to sneeze at, especially for a code change 
as simple as this one.

I've tried the "gecko-dev" checkout, and there the change is from 6s 
down to 1.9s when searching for .cpp and from 6s to 3.7s when searching 
for .js (the top #1 file type, 25% of files in that project are .js).

Naturally not all cases will see an improvement, but many will, and for 
example 'xref-find-references' also uses grep-find-template (by default) 
and specifies a list of file extensions - so it should also get faster.

>> Moving the files exclude instructions to the <F> placeholder is a slight
>> incompatibility
> 
> Right, and for that reason, we cannot install this change as-is.  We
> need either a different command or a user option controlling the order
> (with a good explanation of the effect of the difference).

A user option might work, but before we add one it would be great to 
understand who are the users that it is for.

>> A grep-find-template that doesn't include <X> will indeed start seeing
>> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
>> just set grep-find-ignored-files to nil and then they'll stop seeing
>> ignores again.
> 
> That's not a valid argument for changing the default behavior.
> Because I could counter-argue that if you don't care about the order
> and want those few hundreds of millisecond at all costs, then _you_
> can customize the template to your liking, leaving the default
> behavior intact.

I don't think you can get the same effect just by editing the template.

>> Also, for what it's worth, note that the documentation for
>> grep-find-template says this:
>>
>>    <X> - find options to restrict or expand the directory list
>>    <F> - find options to limit the files matched
>>
>> So this change makes the documentation more accurate: <X> previously
>> also affected the files matched, but now it only affects the directory
>> list, as documented.  <F> continues to limit the files matched, as
>> before.
> 
> Sorry, such incompatible changes are not acceptable, definitely when
> the gain is so small.  Correctness trumps speed.

Can you think of a specific problematic usage? The way I see it, 
grep-find-template is not really portable between different programs: 
supporting <X> in the format we're passing to it basically requires 
'find' to be used (there are no compatible alternatives). That would 
mean that passing the same arguments to <F> should work fine.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-24 20:54   ` Spencer Baugh
@ 2024-05-25 12:31     ` Dmitry Gutov
  2024-05-26  6:50       ` Juri Linkov
  0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 12:31 UTC (permalink / raw)
  To: Spencer Baugh, Juri Linkov; +Cc: 71179

On 24/05/2024 23:54, Spencer Baugh wrote:
> A grep-find-template that doesn't include <X> will indeed start seeing
> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
> just set grep-find-ignored-files to nil and then they'll stop seeing
> ignores again.

Yeah, that sounds like a good enough solution for such cases.

I recall that Juri experimented with customizing grep-find-template (to 
use ripgrep? and perhaps other things). I wonder what he thinks about 
this patch.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 12:26   ` Dmitry Gutov
@ 2024-05-25 12:51     ` Eli Zaretskii
  2024-05-25 13:03       ` Dmitry Gutov
  2024-05-25 13:36       ` Spencer Baugh
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25 12:51 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: sbaugh, 71179

> Date: Sat, 25 May 2024 15:26:56 +0300
> Cc: 71179@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 25/05/2024 09:36, Eli Zaretskii wrote:
> >> From: Spencer Baugh <sbaugh@janestreet.com>
> >> Date: Fri, 24 May 2024 16:14:39 -0400
> >>
> >> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
> >> from ~410ms to ~130ms.
> > 
> > Which is a minor improvement at best, possibly a negligible one.  In
> > my testing (on MS-Windows), I see a barely-tangible improvement: 0.7%.
> 
> That's unfortunate, but I think we prioritize GNU systems when making 
> such decisions.

We do.  I just added one more data point.

> >>> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
> >>> from ~410ms to ~130ms.
> >>
> >> I can confirm improvement here (though not exactly 3x).
> >>
> >> 1.9s to 1.3s in a Linux checkout, for example. Nice.
> > 
> > Which is still quite minor.
> 
> A 30% improvement is nothing to sneeze at, especially for a code change 
> as simple as this one.

They are 30%, but they are only 600 milliseconds.

> >> Moving the files exclude instructions to the <F> placeholder is a slight
> >> incompatibility
> > 
> > Right, and for that reason, we cannot install this change as-is.  We
> > need either a different command or a user option controlling the order
> > (with a good explanation of the effect of the difference).
> 
> A user option might work, but before we add one it would be great to 
> understand who are the users that it is for.

The ones for whom the proposed change will affect the results.

> > Sorry, such incompatible changes are not acceptable, definitely when
> > the gain is so small.  Correctness trumps speed.
> 
> Can you think of a specific problematic usage?

Why is that needed?  Isn't it clear that it can happen?





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 12:51     ` Eli Zaretskii
@ 2024-05-25 13:03       ` Dmitry Gutov
  2024-05-25 13:45         ` Eli Zaretskii
  2024-05-25 13:36       ` Spencer Baugh
  1 sibling, 1 reply; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 13:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: sbaugh, 71179

On 25/05/2024 15:51, Eli Zaretskii wrote:

>>>>> In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
>>>>> from ~410ms to ~130ms.
>>>>
>>>> I can confirm improvement here (though not exactly 3x).
>>>>
>>>> 1.9s to 1.3s in a Linux checkout, for example. Nice.
>>>
>>> Which is still quite minor.
>>
>> A 30% improvement is nothing to sneeze at, especially for a code change
>> as simple as this one.
> 
> They are 30%, but they are only 600 milliseconds.

On my top-of-the-line laptop, even if it's a few years old. Take an 
older or slower machine - and you might as well see a multi-second 
difference. Just like in my example with the bigger project anyway.

One of my personal aims is to make Emacs more viable even for those who 
work on large projects. That's why I routinely test certain operations 
with Mozilla's codebase. And AFAIK Spencer's codebase is even larger.

>>>> Moving the files exclude instructions to the <F> placeholder is a slight
>>>> incompatibility
>>>
>>> Right, and for that reason, we cannot install this change as-is.  We
>>> need either a different command or a user option controlling the order
>>> (with a good explanation of the effect of the difference).
>>
>> A user option might work, but before we add one it would be great to
>> understand who are the users that it is for.
> 
> The ones for whom the proposed change will affect the results.
> 
>>> Sorry, such incompatible changes are not acceptable, definitely when
>>> the gain is so small.  Correctness trumps speed.
>>
>> Can you think of a specific problematic usage?
> 
> Why is that needed?  Isn't it clear that it can happen?

Provided we do add an option, knowing the actual audience could help 
name it better and document it better.

But so far the audience I can think of is the very rare people who 
misused the template's customization. And those can use an existing option.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 12:51     ` Eli Zaretskii
  2024-05-25 13:03       ` Dmitry Gutov
@ 2024-05-25 13:36       ` Spencer Baugh
  2024-05-25 13:56         ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Spencer Baugh @ 2024-05-25 13:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, 71179

Eli Zaretskii <eliz@gnu.org> writes:
> The ones for whom the proposed change will affect the results.

To be clear, the proposed change will not affect the content of the
results returned by rgrep or any code in grep.el.  For them, this change
has no semantic effect at all, it only speeds things up.

The only user-observable difference besides the speedup is that it fixes
a bug: grep-find-ignored-files is documented to affect rgrep, and now
that variable affects rgrep even if grep-find-template is set to a
broken value (one which doesn't include <X>).

It's unlikely anyone is relying on this buggy behavior when <X> isn't
present; the much more obvious way and documented way to achieve this
behavior is to set grep-find-ignored-files to nil (which still works
fine).

I suppose we could preserve that bug by conditioning this optimization
on (string-search "<X>" grep-find-template), though.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 13:03       ` Dmitry Gutov
@ 2024-05-25 13:45         ` Eli Zaretskii
  2024-05-25 13:58           ` Dmitry Gutov
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25 13:45 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: sbaugh, 71179

> Date: Sat, 25 May 2024 16:03:00 +0300
> Cc: sbaugh@janestreet.com, 71179@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >> Can you think of a specific problematic usage?
> > 
> > Why is that needed?  Isn't it clear that it can happen?
> 
> Provided we do add an option, knowing the actual audience could help 
> name it better and document it better.
> 
> But so far the audience I can think of is the very rare people who 
> misused the template's customization. And those can use an existing option.

We cannot break user's existing setups, even if we consider them
"rare".





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 13:36       ` Spencer Baugh
@ 2024-05-25 13:56         ` Eli Zaretskii
  2024-05-25 14:02           ` Spencer Baugh
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25 13:56 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: dmitry, 71179

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: Dmitry Gutov <dmitry@gutov.dev>,  71179@debbugs.gnu.org
> Date: Sat, 25 May 2024 09:36:44 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> > The ones for whom the proposed change will affect the results.
> 
> The only user-observable difference besides the speedup is that it fixes
> a bug: grep-find-ignored-files is documented to affect rgrep, and now
> that variable affects rgrep even if grep-find-template is set to a
> broken value (one which doesn't include <X>).

You call that "broken", but I disagree.  There's nothing wrong with
removing <X> from the template.

> I suppose we could preserve that bug by conditioning this optimization
> on (string-search "<X>" grep-find-template), though.

Not good enough, sorry.  I'm quite sure other situations will emerge
where this change affects the results.  For example, you never
considered the cases where the files which match the pattern to search
also match the ignored-extensions, and many other possibilities.  IOW,
this proposal is based on a very narrow and very specific usage
pattern, whereas the real-life usage patterns are infinitely more
numerous.

My suggestion is to provide a user option to get back the old behavior
(and document all this in NEWS).





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 13:45         ` Eli Zaretskii
@ 2024-05-25 13:58           ` Dmitry Gutov
  0 siblings, 0 replies; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 13:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: sbaugh, 71179

On 25/05/2024 16:45, Eli Zaretskii wrote:
>> Date: Sat, 25 May 2024 16:03:00 +0300
>> Cc:sbaugh@janestreet.com,71179@debbugs.gnu.org
>> From: Dmitry Gutov<dmitry@gutov.dev>
>>
>>>> Can you think of a specific problematic usage?
>>> Why is that needed?  Isn't it clear that it can happen?
>> Provided we do add an option, knowing the actual audience could help
>> name it better and document it better.
>>
>> But so far the audience I can think of is the very rare people who
>> misused the template's customization. And those can use an existing option.
> We cannot break user's existing setups, even if we consider them
> "rare".

"rare" is an estimation - so far we don't have even one example of a 
user who does this.

And we do break existing setups from time to time, usually with good 
reasons (but those are in the eye of the beholder). We weigh the tradeoffs.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 13:56         ` Eli Zaretskii
@ 2024-05-25 14:02           ` Spencer Baugh
  2024-05-25 14:13             ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Spencer Baugh @ 2024-05-25 14:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, 71179

[-- Attachment #1: Type: text/plain, Size: 367 bytes --]

On Sat, May 25, 2024, 9:57 AM Eli Zaretskii <eliz@gnu.org> wrote:

> For example, you never
> considered the cases where the files which match the pattern to search
> also match the ignored-extensions, and many other possibilities.


In that case the file will not be searched, same as before this change.  No
behavior change.

What other possibilities?

[-- Attachment #2: Type: text/html, Size: 717 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 14:02           ` Spencer Baugh
@ 2024-05-25 14:13             ` Eli Zaretskii
  2024-05-25 14:47               ` Dmitry Gutov
  2024-05-25 14:51               ` Dmitry Gutov
  0 siblings, 2 replies; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25 14:13 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: dmitry, 71179

> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Sat, 25 May 2024 10:02:02 -0400
> Cc: Dmitry Gutov <dmitry@gutov.dev>, 71179@debbugs.gnu.org
> 
> On Sat, May 25, 2024, 9:57 AM Eli Zaretskii <eliz@gnu.org> wrote:
> 
>  For example, you never
>  considered the cases where the files which match the pattern to search
>  also match the ignored-extensions, and many other possibilities.  
> 
> In that case the file will not be searched, same as before this change.  No behavior change.

We are talking about performance here.

> What other possibilities?

Who knows?  Why risk any changes at all?

Anyway, I suggested an approach that should leave everyone happy.  Why
are we still arguing?





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 14:13             ` Eli Zaretskii
@ 2024-05-25 14:47               ` Dmitry Gutov
  2024-05-25 15:07                 ` Eli Zaretskii
  2024-05-25 14:51               ` Dmitry Gutov
  1 sibling, 1 reply; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 14:47 UTC (permalink / raw)
  To: Eli Zaretskii, Spencer Baugh; +Cc: 71179

On 25/05/2024 17:13, Eli Zaretskii wrote:
>> What other possibilities?
> Who knows?  Why risk any changes at all?
> 
> Anyway, I suggested an approach that should leave everyone happy.  Why
> are we still arguing?

New option has the usual problems:

* What to call it.
* General proliferation of new options makes it harder to find each one.
* Several days of arguing whether we can make the new behavior the 
default one.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 14:13             ` Eli Zaretskii
  2024-05-25 14:47               ` Dmitry Gutov
@ 2024-05-25 14:51               ` Dmitry Gutov
  1 sibling, 0 replies; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-25 14:51 UTC (permalink / raw)
  To: Eli Zaretskii, Spencer Baugh; +Cc: 71179

On 25/05/2024 17:13, Eli Zaretskii wrote:
>>   For example, you never
>>   considered the cases where the files which match the pattern to search
>>   also match the ignored-extensions, and many other possibilities.
>>
>> In that case the file will not be searched, same as before this change.  No behavior change.
> We are talking about performance here.

FWIW I don't see which performance problem you have in mind.

If you mean the possible regression in the case "user picked file 
extension that's already in ignores", then I don't think it's going to 
be any noticeably slower in vast majority of cases. And it's not a 
scenario to optimize for anyway.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 14:47               ` Dmitry Gutov
@ 2024-05-25 15:07                 ` Eli Zaretskii
  2024-05-26 13:42                   ` Spencer Baugh
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2024-05-25 15:07 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: sbaugh, 71179

> Date: Sat, 25 May 2024 17:47:49 +0300
> Cc: 71179@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> On 25/05/2024 17:13, Eli Zaretskii wrote:
> > Anyway, I suggested an approach that should leave everyone happy.  Why
> > are we still arguing?
> 
> New option has the usual problems:
> 
> * What to call it.
> * General proliferation of new options makes it harder to find each one.
> * Several days of arguing whether we can make the new behavior the 
> default one.

I already said that I'm okay with making the new behavior the default,
provided that we document the way of getting back old behavior.

The new option can be a defvar, from where I stand, if that makes
things easier.  Just don't make it an internal variable.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 12:31     ` Dmitry Gutov
@ 2024-05-26  6:50       ` Juri Linkov
  2024-05-26 12:48         ` Spencer Baugh
  2024-05-26 12:56         ` Dmitry Gutov
  0 siblings, 2 replies; 23+ messages in thread
From: Juri Linkov @ 2024-05-26  6:50 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Spencer Baugh, 71179

>> A grep-find-template that doesn't include <X> will indeed start seeing
>> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
>> just set grep-find-ignored-files to nil and then they'll stop seeing
>> ignores again.
>
> Yeah, that sounds like a good enough solution for such cases.
>
> I recall that Juri experimented with customizing grep-find-template (to use
> ripgrep? and perhaps other things). I wonder what he thinks about this
> patch.

I'm using such configuration for ripgrep that hopefully should continue working:

  (setq grep-find-template "find <D> <X> -type f <F> -print0 | sort -z | xargs -0 -e rg <C> -nH --no-heading --null -j8 --sort path -M 200 --max-columns-preview -e <R>")





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-26  6:50       ` Juri Linkov
@ 2024-05-26 12:48         ` Spencer Baugh
  2024-05-26 12:56         ` Dmitry Gutov
  1 sibling, 0 replies; 23+ messages in thread
From: Spencer Baugh @ 2024-05-26 12:48 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Dmitry Gutov, 71179

Juri Linkov <juri@linkov.net> writes:
>>> A grep-find-template that doesn't include <X> will indeed start seeing
>>> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
>>> just set grep-find-ignored-files to nil and then they'll stop seeing
>>> ignores again.
>>
>> Yeah, that sounds like a good enough solution for such cases.
>>
>> I recall that Juri experimented with customizing grep-find-template (to use
>> ripgrep? and perhaps other things). I wonder what he thinks about this
>> patch.
>
> I'm using such configuration for ripgrep that hopefully should continue working:
>
>   (setq grep-find-template "find <D> <X> -type f <F> -print0 | sort -z | xargs -0 -e rg <C> -nH --no-heading --null -j8 --sort path -M 200 --max-columns-preview -e <R>")

Yep, that would continue working - it has a totally normal "find"
invocation and that's all this touches.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-26  6:50       ` Juri Linkov
  2024-05-26 12:48         ` Spencer Baugh
@ 2024-05-26 12:56         ` Dmitry Gutov
  1 sibling, 0 replies; 23+ messages in thread
From: Dmitry Gutov @ 2024-05-26 12:56 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Spencer Baugh, 71179

On 26/05/2024 09:50, Juri Linkov wrote:
>>> A grep-find-template that doesn't include <X> will indeed start seeing
>>> ignores based on grep-find-ignored-files in rgrep.  But, such a user can
>>> just set grep-find-ignored-files to nil and then they'll stop seeing
>>> ignores again.
>> Yeah, that sounds like a good enough solution for such cases.
>>
>> I recall that Juri experimented with customizing grep-find-template (to use
>> ripgrep? and perhaps other things). I wonder what he thinks about this
>> patch.
> I'm using such configuration for ripgrep that hopefully should continue working:
> 
>    (setq grep-find-template "find <D> <X> -type f <F> -print0 | sort -z | xargs -0 -e rg <C> -nH --no-heading --null -j8 --sort path -M 200 --max-columns-preview -e <R>")

AFAICT it will continue to work fine.

I think you can drop the 'sort -z' step, though: the sorting at the end 
of xref-matches-in-files covers the results' order, and adding a 
synchronizing step like this slows down the overall process a little 
(see bug#71094).





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-25 15:07                 ` Eli Zaretskii
@ 2024-05-26 13:42                   ` Spencer Baugh
  2024-06-01 14:15                     ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Spencer Baugh @ 2024-05-26 13:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, 71179

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 25 May 2024 17:47:49 +0300
>> Cc: 71179@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>> 
>> On 25/05/2024 17:13, Eli Zaretskii wrote:
>> > Anyway, I suggested an approach that should leave everyone happy.  Why
>> > are we still arguing?
>> 
>> New option has the usual problems:
>> 
>> * What to call it.
>> * General proliferation of new options makes it harder to find each one.
>> * Several days of arguing whether we can make the new behavior the 
>> default one.
>
> I already said that I'm okay with making the new behavior the default,
> provided that we document the way of getting back old behavior.
>
> The new option can be a defvar, from where I stand, if that makes
> things easier.  Just don't make it an internal variable.

OK, here's the patch, now with an option.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-In-rgrep-check-matching-files-before-excluding-files.patch --]
[-- Type: text/x-patch, Size: 6138 bytes --]

From 231813e5825e1f3a63c24ad6063eb73de7b2b361 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Sun, 26 May 2024 09:26:09 -0400
Subject: [PATCH] In rgrep, check matching files before excluding files

There are a lot of excluding globs, and checking them all is expensive.
The files glob (i.e. the glob for files we actually want) is usually
just one or two entries, so it's quite fast to check.

If find checks the files glob first and then the excluding glob, it has
to do much less checking (since the files glob will substantially narrow
down the set of files on its own), and find performance is much better.

In my benchmarking, this takes (rgrep "foo" "*.el" "~/src/emacs/trunk/")
from ~410ms to ~130ms.

Further optimizations are possible now that the ignores and matched
files are in the same <F> argument which can be rearranged more easily
without compatibility issues; I'll do those optimizations in later
commits.

* lisp/progmodes/grep.el (rgrep-find-ignores-in-<f>): Add.
* lisp/progmodes/grep.el (rgrep-default-command): Check
rgrep-find-ignores-in-<f> and move the excluded files glob to part of
the "files" argument.
---
 lisp/progmodes/grep.el | 97 +++++++++++++++++++++++++-----------------
 1 file changed, 58 insertions(+), 39 deletions(-)

diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index ce54c57aabc..c0b48fb17e0 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -214,6 +214,21 @@ grep-find-template
   :set #'grep-apply-setting
   :version "22.1")
 
+(defvar rgrep-find-ignores-in-<f> t
+  "If nil, when `rgrep' expands `grep-find-template', file ignores go in <X>.
+
+By default, the <X> placeholder contains find options for affecting the
+directory list, and the <F> placeholder contains the find options which
+affect which files are matched, both `grep-find-ignored-files' and the
+FILES argument to `rgrep'.
+
+This separation allows the two sources of file matching in <F> to be
+optimized together into a set of options which are overall faster for
+\"find\" to evaluate.
+
+If nil, <X> contains ignores both for directories and files, and <F>
+contains only the FILES argument.  This is the old behavior.")
+
 (defvar grep-quoting-style nil
   "Whether to use POSIX-like shell argument quoting.")
 
@@ -1372,45 +1387,49 @@ rgrep-find-ignored-directories
 
 (defun rgrep-default-command (regexp files dir)
   "Compute the command for \\[rgrep] to use by default."
-  (require 'find-dired)      ; for `find-name-arg'
-  (grep-expand-template
-   grep-find-template
-   regexp
-   (concat (shell-quote-argument "(" grep-quoting-style)
-           " " find-name-arg " "
-           (mapconcat
-            (lambda (x) (shell-quote-argument x grep-quoting-style))
-            (split-string files)
-            (concat " -o " find-name-arg " "))
-           " "
-           (shell-quote-argument ")" grep-quoting-style))
-   dir
-   (concat
-    (when-let ((ignored-dirs (rgrep-find-ignored-directories dir)))
-      (concat "-type d "
-              (shell-quote-argument "(" grep-quoting-style)
-              ;; we should use shell-quote-argument here
-              " -path "
-              (mapconcat
-               (lambda (d)
-                 (shell-quote-argument (concat "*/" d) grep-quoting-style))
-               ignored-dirs
-               " -o -path ")
-              " "
-              (shell-quote-argument ")" grep-quoting-style)
-              " -prune -o "))
-    (when-let ((ignored-files (grep-find-ignored-files dir)))
-      (concat (shell-quote-argument "!" grep-quoting-style) " -type d "
-              (shell-quote-argument "(" grep-quoting-style)
-              ;; we should use shell-quote-argument here
-              " -name "
-              (mapconcat
-               (lambda (ignore) (shell-quote-argument ignore grep-quoting-style))
-               ignored-files
-               " -o -name ")
-              " "
-              (shell-quote-argument ")" grep-quoting-style)
-              " -prune -o ")))))
+  (require 'find-dired)                 ; for `find-name-arg'
+  (let ((ignored-files-arg
+         (when-let ((ignored-files (grep-find-ignored-files dir)))
+           (concat (shell-quote-argument "(" grep-quoting-style)
+                   ;; we should use shell-quote-argument here
+                   " -name "
+                   (mapconcat
+                    (lambda (ignore) (shell-quote-argument ignore grep-quoting-style))
+                    ignored-files
+                    " -o -name ")
+                   " " (shell-quote-argument ")" grep-quoting-style)))))
+    (grep-expand-template
+     grep-find-template
+     regexp
+     (concat (shell-quote-argument "(" grep-quoting-style)
+             " " find-name-arg " "
+             (mapconcat
+              (lambda (x) (shell-quote-argument x grep-quoting-style))
+              (split-string files)
+              (concat " -o " find-name-arg " "))
+             " "
+             (shell-quote-argument ")" grep-quoting-style)
+             (when (and rgrep-find-ignores-in-<f> ignored-files-arg)
+               (concat " " (shell-quote-argument "!" grep-quoting-style) " " ignored-files-arg)))
+     dir
+     (concat
+      (when-let ((ignored-dirs (rgrep-find-ignored-directories dir)))
+        (concat "-type d "
+                (shell-quote-argument "(" grep-quoting-style)
+                ;; we should use shell-quote-argument here
+                " -path "
+                (mapconcat
+                 (lambda (d)
+                   (shell-quote-argument (concat "*/" d) grep-quoting-style))
+                 ignored-dirs
+                 " -o -path ")
+                " "
+                (shell-quote-argument ")" grep-quoting-style)
+                " -prune -o "))
+      (when (and (not rgrep-find-ignores-in-<f>) ignored-files-arg)
+        (concat (shell-quote-argument "!" grep-quoting-style) " -type d "
+                ignored-files-arg
+                " -prune -o "))))))
 
 (defun grep-find-toggle-abbreviation ()
   "Toggle showing the hidden part of rgrep/lgrep/zrgrep command line."
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-05-26 13:42                   ` Spencer Baugh
@ 2024-06-01 14:15                     ` Eli Zaretskii
  2024-06-02 10:46                       ` Stefan Kangas
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2024-06-01 14:15 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: dmitry, 71179

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: Dmitry Gutov <dmitry@gutov.dev>,  71179@debbugs.gnu.org
> Date: Sun, 26 May 2024 09:42:10 -0400
> 
> > I already said that I'm okay with making the new behavior the default,
> > provided that we document the way of getting back old behavior.
> >
> > The new option can be a defvar, from where I stand, if that makes
> > things easier.  Just don't make it an internal variable.
> 
> OK, here's the patch, now with an option.

Thanks, this LGTM.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#71179: [PATCH] In rgrep, check matching files before excluding files
  2024-06-01 14:15                     ` Eli Zaretskii
@ 2024-06-02 10:46                       ` Stefan Kangas
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Kangas @ 2024-06-02 10:46 UTC (permalink / raw)
  To: Eli Zaretskii, Spencer Baugh; +Cc: dmitry, 71179-done

Version: 30.1

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks, this LGTM.

Same here, so pushed to master as commit b71fa27987d and closing the bug.

Thanks, Spencer.





^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-06-02 10:46 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-24 20:14 bug#71179: [PATCH] In rgrep, check matching files before excluding files Spencer Baugh
2024-05-24 20:45 ` Dmitry Gutov
2024-05-24 20:54   ` Spencer Baugh
2024-05-25 12:31     ` Dmitry Gutov
2024-05-26  6:50       ` Juri Linkov
2024-05-26 12:48         ` Spencer Baugh
2024-05-26 12:56         ` Dmitry Gutov
2024-05-25  6:36 ` Eli Zaretskii
2024-05-25 12:26   ` Dmitry Gutov
2024-05-25 12:51     ` Eli Zaretskii
2024-05-25 13:03       ` Dmitry Gutov
2024-05-25 13:45         ` Eli Zaretskii
2024-05-25 13:58           ` Dmitry Gutov
2024-05-25 13:36       ` Spencer Baugh
2024-05-25 13:56         ` Eli Zaretskii
2024-05-25 14:02           ` Spencer Baugh
2024-05-25 14:13             ` Eli Zaretskii
2024-05-25 14:47               ` Dmitry Gutov
2024-05-25 15:07                 ` Eli Zaretskii
2024-05-26 13:42                   ` Spencer Baugh
2024-06-01 14:15                     ` Eli Zaretskii
2024-06-02 10:46                       ` Stefan Kangas
2024-05-25 14:51               ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).