unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
@ 2023-10-29  5:36 Jim Porter
  2023-10-29  6:06 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jim Porter @ 2023-10-29  5:36 UTC (permalink / raw)
  To: 66806; +Cc: dmitry

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

X-Debbugs-Cc: dmitry@gutov.dev

To see this in action, you can do the following, starting from "emacs 
-Q" inside of a Git repo that contains submodules:

   M-x trace-function RET project-files RET
   C-x p g some-string RET

If you look at the trace, you'll see that the files in your submodules 
are returned from 'project-files', but so is the submodule directory. 
That's not really correct, since 'project-files' is supposed to return 
*files*, not directories. (There are already workarounds for this in 
'project-search' and 'project-query-replace-regexp'.)

By default, this is just a theoretical problem, but if you customize 
'xref-search-program-alist' and 'xref-search-program' to include some 
other program, this can cause real issues. For example, I tried to add 
"ag" to this list[1], and unfortunately, it just doesn't work in this 
case. The results for submodules get duplicated, and there's no way I 
can see with ag to search only the specified *files*, ignoring any 
directories. (Looking at the definition for ripgrep, I'm guessing the 
"-g '!*/'" is the trick for that program, but nothing similar works for ag.)

Attached are two possible patches for this: a minimal version that just 
fixes 'project-find-regexp', and a maximal version that fixes this in 
general, and should theoretically speed up 'project-search' and 
'project-query-replace-regexp' since they no longer need to call 
'file-regular-p' on every file.

Do either of these patches look ok?

[1] This is a long story, simplified for this message since the gory 
details aren't especially relevant.

[-- Attachment #2: minimal_0001-Exclude-Git-submodules-when-getting-list-of-files-fo.patch --]
[-- Type: text/plain, Size: 1093 bytes --]

From 9f9fc6a606bdd1610d440d9b130aaa30aa66597a Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sat, 28 Oct 2023 22:11:30 -0700
Subject: [PATCH] Exclude Git submodules when getting list of files for
 'project-find-regexp'

* lisp/progmodes/project.el (project-find-regexp): Filter out
non-regular files.
---
 lisp/progmodes/project.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index fda1081eb62..e245e794318 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -960,7 +960,8 @@ project-find-regexp
          (default-directory (project-root pr))
          (files
           (if (not current-prefix-arg)
-              (project-files pr)
+              ;; XXX: See the comment in project-query-replace-regexp.
+              (cl-delete-if-not #'file-regular-p (project-files pr))
             (let ((dir (read-directory-name "Base directory: "
                                             caller-dir nil t)))
               (project--files-in-directory dir
-- 
2.25.1


[-- Attachment #3: maximal_0001-Exclude-Git-submodules-from-project-files.patch --]
[-- Type: text/plain, Size: 2684 bytes --]

From fdb7a99dcd47e3a11057fb47221a5e6f40b2db6a Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sat, 28 Oct 2023 22:20:41 -0700
Subject: [PATCH] Exclude Git submodules from 'project-files'

* lisp/progmodes/project.el (project--vc-list-files): Exclude Git
submodules.
(project-search, project-query-replace-regexp): Remove now-unneeded
workaround.
---
 lisp/progmodes/project.el | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index fda1081eb62..92b3ed52600 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -647,6 +647,7 @@ project--vc-list-files
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked
                                 dir))
+            (submodules (project--git-submodules))
             files)
        (setq args (append args
                           '("-c" "--exclude-standard")
@@ -680,13 +681,14 @@ project--vc-list-files
        (setq files
              (mapcar
               (lambda (file) (concat default-directory file))
-              (split-string
-               (apply #'vc-git--run-command-string nil "ls-files" args)
-               "\0" t)))
+              (cl-set-difference
+               (split-string
+                (apply #'vc-git--run-command-string nil "ls-files" args)
+                "\0" t)
+               submodules :test #'string=)))
        (when (project--vc-merge-submodules-p default-directory)
          ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
-         (let* ((submodules (project--git-submodules))
-                (sub-files
+         (let ((sub-files
                  (mapcar
                   (lambda (module)
                     (when (file-directory-p module)
@@ -1326,8 +1328,7 @@ project-search
   (interactive "sSearch (regexp): ")
   (fileloop-initialize-search
    regexp
-   ;; XXX: See the comment in project-query-replace-regexp.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
@@ -1348,10 +1349,7 @@ project-query-replace-regexp
        (list from to))))
   (fileloop-initialize-replace
    from to
-   ;; XXX: Filter out Git submodules, which are not regular files.
-   ;; `project-files' can return those, which is arguably suboptimal,
-   ;; but removing them eagerly has performance cost.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29  5:36 bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program Jim Porter
@ 2023-10-29  6:06 ` Eli Zaretskii
  2023-10-29 17:54   ` Jim Porter
  2023-10-29 22:14   ` Dmitry Gutov
  2023-10-29 21:41 ` Dmitry Gutov
  2023-10-29 22:02 ` Dmitry Gutov
  2 siblings, 2 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-10-29  6:06 UTC (permalink / raw)
  To: Jim Porter; +Cc: dmitry, 66806

> Cc: dmitry@gutov.dev
> Date: Sat, 28 Oct 2023 22:36:07 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> 
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -960,7 +960,8 @@ project-find-regexp
>           (default-directory (project-root pr))
>           (files
>            (if (not current-prefix-arg)
> -              (project-files pr)
> +              ;; XXX: See the comment in project-query-replace-regexp.
> +              (cl-delete-if-not #'file-regular-p (project-files pr))
                  ^^^^^^^^^^^^^^^^
I think we want to prefer using seq.el functions, since seq.el is
nowadays preloaded.  Is there a good reason to use cl-delete-if-not
here?





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29  6:06 ` Eli Zaretskii
@ 2023-10-29 17:54   ` Jim Porter
  2023-10-29 19:22     ` Eli Zaretskii
  2023-10-29 22:14   ` Dmitry Gutov
  1 sibling, 1 reply; 11+ messages in thread
From: Jim Porter @ 2023-10-29 17:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dmitry, 66806

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

On 10/28/2023 11:06 PM, Eli Zaretskii wrote:
>> Cc: dmitry@gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>>            (default-directory (project-root pr))
>>            (files
>>             (if (not current-prefix-arg)
>> -              (project-files pr)
>> +              ;; XXX: See the comment in project-query-replace-regexp.
>> +              (cl-delete-if-not #'file-regular-p (project-files pr))
>                    ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> here?

Well, that's just copy-pasted from some other functions in project.el. 
If we want to go the minimal route, I could update all those workarounds.

Or we could go the maximal route and fix it at its source. Here's an 
updated patch for the maximal route that uses 'seq-difference'.

Assuming we're ok with the performance characteristics of the maximal 
patch, I think the maximal route is best: it fixes the issue at its 
source. For performance, it should be faster by default, but a bit 
slower when 'project-vc-merge-submodules' is nil (since we need an extra 
call to "git" to get the list of submodules). However, that slowness is 
compensated for by eliminating the need to call 'file-regular-p' on all 
the results for some functions that really do need 'project-files' to 
return only files. (If we're really concerned about *exact* perf 
numbers, I can try to collect some. Just let me know.)

[-- Attachment #2: maximal_0001-Exclude-Git-submodules-from-project-files.patch --]
[-- Type: text/plain, Size: 2665 bytes --]

From 2a17d5acfcf0b748aaad9d01ff341ed6da315a48 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sat, 28 Oct 2023 22:20:41 -0700
Subject: [PATCH] Exclude Git submodules from 'project-files'

* lisp/progmodes/project.el (project--vc-list-files): Exclude Git
submodules.
(project-search, project-query-replace-regexp): Remove now-unneeded
workaround.
---
 lisp/progmodes/project.el | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index fda1081eb62..a3fa1b320a8 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -647,6 +647,7 @@ project--vc-list-files
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked
                                 dir))
+            (submodules (project--git-submodules))
             files)
        (setq args (append args
                           '("-c" "--exclude-standard")
@@ -680,13 +681,14 @@ project--vc-list-files
        (setq files
              (mapcar
               (lambda (file) (concat default-directory file))
-              (split-string
-               (apply #'vc-git--run-command-string nil "ls-files" args)
-               "\0" t)))
+              (seq-difference
+               (split-string
+                (apply #'vc-git--run-command-string nil "ls-files" args)
+                "\0" t)
+               submodules)))
        (when (project--vc-merge-submodules-p default-directory)
          ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
-         (let* ((submodules (project--git-submodules))
-                (sub-files
+         (let ((sub-files
                  (mapcar
                   (lambda (module)
                     (when (file-directory-p module)
@@ -1326,8 +1328,7 @@ project-search
   (interactive "sSearch (regexp): ")
   (fileloop-initialize-search
    regexp
-   ;; XXX: See the comment in project-query-replace-regexp.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
@@ -1348,10 +1349,7 @@ project-query-replace-regexp
        (list from to))))
   (fileloop-initialize-replace
    from to
-   ;; XXX: Filter out Git submodules, which are not regular files.
-   ;; `project-files' can return those, which is arguably suboptimal,
-   ;; but removing them eagerly has performance cost.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29 17:54   ` Jim Porter
@ 2023-10-29 19:22     ` Eli Zaretskii
  0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-10-29 19:22 UTC (permalink / raw)
  To: Jim Porter; +Cc: dmitry, 66806

> Date: Sun, 29 Oct 2023 10:54:28 -0700
> Cc: dmitry@gutov.dev, 66806@debbugs.gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> >> -              (project-files pr)
> >> +              ;; XXX: See the comment in project-query-replace-regexp.
> >> +              (cl-delete-if-not #'file-regular-p (project-files pr))
> >                    ^^^^^^^^^^^^^^^^
> > I think we want to prefer using seq.el functions, since seq.el is
> > nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> > here?
> 
> Well, that's just copy-pasted from some other functions in project.el. 
> If we want to go the minimal route, I could update all those workarounds.
> 
> Or we could go the maximal route and fix it at its source. Here's an 
> updated patch for the maximal route that uses 'seq-difference'.

That's up to Dmitry; from my POV it is enough that new code prefers
seq.el.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29  5:36 bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program Jim Porter
  2023-10-29  6:06 ` Eli Zaretskii
@ 2023-10-29 21:41 ` Dmitry Gutov
  2023-10-30  0:58   ` Jim Porter
  2023-10-29 22:02 ` Dmitry Gutov
  2 siblings, 1 reply; 11+ messages in thread
From: Dmitry Gutov @ 2023-10-29 21:41 UTC (permalink / raw)
  To: Jim Porter, 66806

[-- Attachment #1: Type: text/plain, Size: 1712 bytes --]

Hi Jim,

On 29/10/2023 07:36, Jim Porter wrote:
> By default, this is just a theoretical problem, but if you customize 
> 'xref-search-program-alist' and 'xref-search-program' to include some 
> other program, this can cause real issues. For example, I tried to add 
> "ag" to this list[1], and unfortunately, it just doesn't work in this 
> case. The results for submodules get duplicated, and there's no way I 
> can see with ag to search only the specified *files*, ignoring any 
> directories. (Looking at the definition for ripgrep, I'm guessing the 
> "-g '!*/'" is the trick for that program, but nothing similar works for 
> ag.)
> 
> Attached are two possible patches for this: a minimal version that just 
> fixes 'project-find-regexp', and a maximal version that fixes this in 
> general, and should theoretically speed up 'project-search' and 
> 'project-query-replace-regexp' since they no longer need to call 
> 'file-regular-p' on every file.

I kept this unfortunate situation around because every obvious fix 
brought non-negligible performance regressions: the version with 
file-regular-p slowed one of my examples (Mozilla's repo) by 370%. Your 
cl-set-difference version slowed it down by 10-14% -- better, but still 
something that seemed worse on balance when I tried this before, so I 
preferred to work around it in command implementations: the "-s" or 
"--no-messages" flags xref-search-program-alist.

And that's not to mention usage over Tramp (which would be affected by 
the +1 process call that you mentioned as well, but that seems unavoidable).

Anyway, after recent experience micro-optimizing list operations, I came 
up with this version where the impact seems minimal.

WDYT?

[-- Attachment #2: project-no-submodules.diff --]
[-- Type: text/x-patch, Size: 1834 bytes --]

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index fda1081eb62..e3e7654e687 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -647,6 +647,7 @@ project--vc-list-files
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked
                                 dir))
+            (submodules (project--git-submodules))
             files)
        (setq args (append args
                           '("-c" "--exclude-standard")
@@ -680,13 +681,24 @@ project--vc-list-files
        (setq files
              (mapcar
               (lambda (file) (concat default-directory file))
-              (split-string
-               (apply #'vc-git--run-command-string nil "ls-files" args)
-               "\0" t)))
+              (let* ((files
+                      (split-string
+                       (apply #'vc-git--run-command-string nil "ls-files" args)
+                       "\0" t))
+                     ref)
+                (when submodules
+                  ;; Hand-optimized version of nset-difference.
+                  (while (member (car files) submodules)
+                    (setq files (cdr files)))
+                  (setq ref files)
+                  (while ref
+                    (if (member (cadr ref) submodules)
+                        (setcdr ref (cddr ref))
+                      (setq ref (cdr ref)))))
+                files)))
        (when (project--vc-merge-submodules-p default-directory)
          ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
-         (let* ((submodules (project--git-submodules))
-                (sub-files
+         (let ((sub-files
                  (mapcar
                   (lambda (module)
                     (when (file-directory-p module)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29  5:36 bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program Jim Porter
  2023-10-29  6:06 ` Eli Zaretskii
  2023-10-29 21:41 ` Dmitry Gutov
@ 2023-10-29 22:02 ` Dmitry Gutov
  2023-10-30  0:25   ` Jim Porter
  2 siblings, 1 reply; 11+ messages in thread
From: Dmitry Gutov @ 2023-10-29 22:02 UTC (permalink / raw)
  To: Jim Porter, 66806

On 29/10/2023 07:36, Jim Porter wrote:
> (Looking at the definition for ripgrep, I'm guessing the "-g '!*/'" is 
> the trick for that program, but nothing similar works for ag.)

Actually, it's a failed solution which I copied from the rg's issue 
tracker but I didn't test it enough or it broke in some later versions, 
and then I put off dealing with it when finally noticed. Maybe it never 
worked and only could filter out directories when those are passed with 
a trailing slash.

What should work, is adding the argument "--max-depth 0" (stopping the 
traversal of passed directories). For ag, the corresponding fix seems to 
be "--depth 0", but I haven't tested that in practice.

Anyway, see the other email, which should make it all unnecessary 
(unless you have a 3rd party package which has to work with older 
project.el too).





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29  6:06 ` Eli Zaretskii
  2023-10-29 17:54   ` Jim Porter
@ 2023-10-29 22:14   ` Dmitry Gutov
  1 sibling, 0 replies; 11+ messages in thread
From: Dmitry Gutov @ 2023-10-29 22:14 UTC (permalink / raw)
  To: Eli Zaretskii, Jim Porter; +Cc: 66806

On 29/10/2023 08:06, Eli Zaretskii wrote:
>> Cc:dmitry@gutov.dev
>> Date: Sat, 28 Oct 2023 22:36:07 -0700
>> From: Jim Porter<jporterbugs@gmail.com>
>>
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -960,7 +960,8 @@ project-find-regexp
>>            (default-directory (project-root pr))
>>            (files
>>             (if (not current-prefix-arg)
>> -              (project-files pr)
>> +              ;; XXX: See the comment in project-query-replace-regexp.
>> +              (cl-delete-if-not #'file-regular-p (project-files pr))
>                    ^^^^^^^^^^^^^^^^
> I think we want to prefer using seq.el functions, since seq.el is
> nowadays preloaded.  Is there a good reason to use cl-delete-if-not
> here?

I'm okay with using seq with other things equal, but in this case both 
cl- approaches are too slow, and seq-difference is no better because of 
consing and indirection overhead (about 10-20% slower than current). I'd 
say it's a shortcoming of seq.el: only having non-destructive versions.

I had some hope for cl-nset-difference, but looking at the 
implementation it just delegates to the non-destructive cousin (I guess 
providing the optimized implementation was left as TODO).





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29 22:02 ` Dmitry Gutov
@ 2023-10-30  0:25   ` Jim Porter
  0 siblings, 0 replies; 11+ messages in thread
From: Jim Porter @ 2023-10-30  0:25 UTC (permalink / raw)
  To: Dmitry Gutov, 66806

On 10/29/2023 3:02 PM, Dmitry Gutov wrote:
> What should work, is adding the argument "--max-depth 0" (stopping the 
> traversal of passed directories). For ag, the corresponding fix seems to 
> be "--depth 0", but I haven't tested that in practice.

I tried that, and sadly it didn't work. That could be a bug in ag though.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-29 21:41 ` Dmitry Gutov
@ 2023-10-30  0:58   ` Jim Porter
  2023-10-30  2:00     ` Dmitry Gutov
  0 siblings, 1 reply; 11+ messages in thread
From: Jim Porter @ 2023-10-30  0:58 UTC (permalink / raw)
  To: Dmitry Gutov, 66806

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
> And that's not to mention usage over Tramp (which would be affected by 
> the +1 process call that you mentioned as well, but that seems 
> unavoidable).

Yeah, I don't see a way around that, unless we constructed a complex 
command to run via Tramp that does it all in one go.

(I looked into using the "--stage" argument for "git ls-files", which 
gets most of the way to fixing this, but you could break that logic with 
evil file names not in the tree...)

> Anyway, after recent experience micro-optimizing list operations, I came 
> up with this version where the impact seems minimal.
> 
> WDYT?

Thanks, that helped form the basis for the attached patch. It moves the 
'member' call into the lambda for the original 'mapcar', and then we can 
just 'delq' all the nil elements. Benchmarking against the Emacs repo, 
this still seems to have a minimal perf impact.

There might also be a benefit to using "git ls-files 
--recurse-submodules" when we can (i.e. when not using "-o"), but maybe 
we can leave that optimization for another day.

[-- Attachment #2: 0001-Exclude-Git-submodules-from-project-files.patch --]
[-- Type: text/plain, Size: 3625 bytes --]

From b0e01806e36c8671279e6ffeb6ba2397655a3f5a Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sat, 28 Oct 2023 22:20:41 -0700
Subject: [PATCH] Exclude Git submodules from 'project-files'

* lisp/progmodes/project.el (project--vc-list-files): Exclude Git
submodules.
(project-search, project-query-replace-regexp): Remove now-unneeded
workaround.
---
 lisp/progmodes/project.el | 41 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index fda1081eb62..bb44cfefa54 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -647,6 +647,7 @@ project--vc-list-files
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked
                                 dir))
+            (submodules (project--git-submodules))
             files)
        (setq args (append args
                           '("-c" "--exclude-standard")
@@ -678,23 +679,25 @@ project--vc-list-files
                                         i)))
                                    extra-ignores)))))
        (setq files
-             (mapcar
-              (lambda (file) (concat default-directory file))
-              (split-string
-               (apply #'vc-git--run-command-string nil "ls-files" args)
-               "\0" t)))
+             (delq nil
+                   (mapcar
+                    (lambda (file)
+                      (unless (member file submodules)
+                        (concat default-directory file)))
+                    (split-string
+                     (apply #'vc-git--run-command-string nil "ls-files" args)
+                     "\0" t))))
        (when (project--vc-merge-submodules-p default-directory)
          ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
-         (let* ((submodules (project--git-submodules))
-                (sub-files
-                 (mapcar
-                  (lambda (module)
-                    (when (file-directory-p module)
-                      (project--vc-list-files
-                       (concat default-directory module)
-                       backend
-                       extra-ignores)))
-                  submodules)))
+         (let ((sub-files
+                (mapcar
+                 (lambda (module)
+                   (when (file-directory-p module)
+                     (project--vc-list-files
+                      (concat default-directory module)
+                      backend
+                      extra-ignores)))
+                 submodules)))
            (setq files
                  (apply #'nconc files sub-files))))
        ;; 'git ls-files' returns duplicate entries for merge conflicts.
@@ -1326,8 +1329,7 @@ project-search
   (interactive "sSearch (regexp): ")
   (fileloop-initialize-search
    regexp
-   ;; XXX: See the comment in project-query-replace-regexp.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
@@ -1348,10 +1350,7 @@ project-query-replace-regexp
        (list from to))))
   (fileloop-initialize-replace
    from to
-   ;; XXX: Filter out Git submodules, which are not regular files.
-   ;; `project-files' can return those, which is arguably suboptimal,
-   ;; but removing them eagerly has performance cost.
-   (cl-delete-if-not #'file-regular-p (project-files (project-current t)))
+   (project-files (project-current t))
    'default)
   (fileloop-continue))
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-30  0:58   ` Jim Porter
@ 2023-10-30  2:00     ` Dmitry Gutov
  2023-10-30  3:53       ` Jim Porter
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Gutov @ 2023-10-30  2:00 UTC (permalink / raw)
  To: Jim Porter, 66806-done

Version: 30.1

On 30/10/2023 02:58, Jim Porter wrote:
> On 10/29/2023 2:41 PM, Dmitry Gutov wrote:
>> And that's not to mention usage over Tramp (which would be affected by 
>> the +1 process call that you mentioned as well, but that seems 
>> unavoidable).
> 
> Yeah, I don't see a way around that, unless we constructed a complex 
> command to run via Tramp that does it all in one go.
> 
> (I looked into using the "--stage" argument for "git ls-files", which 
> gets most of the way to fixing this, but you could break that logic with 
> evil file names not in the tree...)

Filtering out gitlinks based on the mode bits? Clever.

Would probably slow down the parsing of the output, though. Not sure by 
how much.

>> Anyway, after recent experience micro-optimizing list operations, I 
>> came up with this version where the impact seems minimal.
>>
>> WDYT?
> 
> Thanks, that helped form the basis for the attached patch. It moves the 
> 'member' call into the lambda for the original 'mapcar', and then we can 
> just 'delq' all the nil elements. Benchmarking against the Emacs repo, 
> this still seems to have a minimal perf impact.

Ah, nice. delq is pretty handy to abbreviate the relinking.

We could combine both steps to eliminate an extra list altogether, but 
that only yields a further 2-3% improvement. Maybe later.

> There might also be a benefit to using "git ls-files 
> --recurse-submodules" when we can (i.e. when not using "-o"), but maybe 
> we can leave that optimization for another day.

As long as it doesn't result in too much code duplication.

Anyway, I've pushed your latest patch. Thanks! And closing.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program
  2023-10-30  2:00     ` Dmitry Gutov
@ 2023-10-30  3:53       ` Jim Porter
  0 siblings, 0 replies; 11+ messages in thread
From: Jim Porter @ 2023-10-30  3:53 UTC (permalink / raw)
  To: 66806, dmitry

On 10/29/2023 7:00 PM, Dmitry Gutov wrote:
> On 30/10/2023 02:58, Jim Porter wrote:
>> (I looked into using the "--stage" argument for "git ls-files", which 
>> gets most of the way to fixing this, but you could break that logic 
>> with evil file names not in the tree...)
> 
> Filtering out gitlinks based on the mode bits? Clever.

Well, it'd be even more clever if it weren't for the issue I mentioned. ;)

> Anyway, I've pushed your latest patch. Thanks! And closing.

Thanks for merging!





^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-10-30  3:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-29  5:36 bug#66806: 30.0.50; [PATCH] 'project-find-regexp' passes Git submodules to the search program Jim Porter
2023-10-29  6:06 ` Eli Zaretskii
2023-10-29 17:54   ` Jim Porter
2023-10-29 19:22     ` Eli Zaretskii
2023-10-29 22:14   ` Dmitry Gutov
2023-10-29 21:41 ` Dmitry Gutov
2023-10-30  0:58   ` Jim Porter
2023-10-30  2:00     ` Dmitry Gutov
2023-10-30  3:53       ` Jim Porter
2023-10-29 22:02 ` Dmitry Gutov
2023-10-30  0:25   ` Jim Porter

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).