unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
@ 2024-09-17 16:55 Sean Allred
  2024-09-17 22:54 ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Allred @ 2024-09-17 16:55 UTC (permalink / raw)
  To: 73320

[-- Attachment #1: Type: text/plain, Size: 1684 bytes --]

Tags: patch

Hello!

I noticed that `C-x p f` (M-x project-find-file) took an incredibly long
time to run on our monorepo -- even when using a sparse index -- and I
tracked the problem down to this function. Adding `--sparse` to the
`git-ls-files` invocation resolves the issue handily, albeit with the
quirk of still showing top-level directories that are excised from the
sparse index (which we may want to remove from the return value of this
function since it does say it returns /files/).

I'm expecting at least one more version of this patch before it's even
considered for merge. Given that this change removes many, many results
from the return value of `project--vc-list-files', I suspect this would
be a breaking change for some use case that I'm not considering. I'm
hoping the list can provide some feedback on the best way to make this
configurable. I've not really hacked in project.el before and am not
super familiar with its internals.

If you prefer, you may pull this patch from my fork on github at

    url: git@github.com:vermiculus/emacs.git
    branch: sa/sparse-index

Keep in mind I may be updating this branch with feedback from this list
as it comes in.

-Sean


In GNU Emacs 29.1 (build 1, aarch64-apple-darwin21.6.0, NS
 appkit-2113.60 Version 12.6.6 (Build 21G646)) of 2023-07-30 built on
 armbob.lan
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.6.1

Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp' --with-modules 'CFLAGS=-DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT' --with-x-toolkit=no'


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-project-vc-list-files-use-Git-s-sparse-index.patch --]
[-- Type: text/patch, Size: 1737 bytes --]

From e72c013afbafad3faf19f98acb8d0387fdf045ba Mon Sep 17 00:00:00 2001
From: Sean Allred <allred.sean@gmail.com>
Date: Tue, 17 Sep 2024 10:07:41 -0500
Subject: [PATCH] project--vc-list-files: use Git's sparse-index

When dealing with exceptionally large Git repositories, the performance
of `project-find-file` can suffer dramatically as the list of files is
collected for completion. This adds insult to injury when you consider
cases where the developer has configured the repository to use a sparse
checkout where the vast majority of these files are not even present on
disk and are not valid candidates for completion.

Unconditionally pass along the `--sparse` option to `git-ls-files` to
coerce `project--vc-list-files` only consider files present in the
sparse index (and presumably on disk).

No consideration is made to make this configurable as it is currently
unclear as to the best way to do this -- whether to follow the example
of the `include-untracked` symbol in this function or to adopt some
other mechanism.
---
 lisp/progmodes/project.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index b29d5ed5404..82cb79322dd 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -663,7 +663,7 @@ project--vc-list-files
   (pcase backend
     (`Git
      (let* ((default-directory (expand-file-name (file-name-as-directory dir)))
-            (args '("-z"))
+            (args '("-z", "--sparse"))
             (vc-git-use-literal-pathspecs nil)
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked

base-commit: 7d365a2d72d8e656262205827cc5fdf423c3a41f
-- 
2.46.0


[-- Attachment #3: Type: text/plain, Size: 17 bytes --]


-- 
Sean Allred

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-17 16:55 bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index Sean Allred
@ 2024-09-17 22:54 ` Dmitry Gutov
  2024-09-18  0:36   ` Sean Allred
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2024-09-17 22:54 UTC (permalink / raw)
  To: Sean Allred, 73320

Hi!

On 17/09/2024 19:55, Sean Allred wrote:

> I noticed that `C-x p f` (M-x project-find-file) took an incredibly long
> time to run on our monorepo -- even when using a sparse index -- and I
> tracked the problem down to this function.

That sounds like a good problem to try to solve.

> Adding `--sparse` to the
> `git-ls-files` invocation resolves the issue handily, albeit with the
> quirk of still showing top-level directories that are excised from the
> sparse index (which we may want to remove from the return value of this
> function since it does say it returns /files/).

The submitted patch has a comma inside which results in an error

   vc-delistify: Wrong type argument: characterp, \,

Removing it fixes the error.

But let's start from the beginning. Could you help me set up a sparse 
repo that would help test out the change?

I took a large-ish checkout with shallow history and set up the sparse 
config like this:

   git sparse-checkout init --cone
   git sparse-checkout set gfx media

With that, both 'git status' and 'ls' behave as expected - no extra 
directories around (just the top-level files and two subdirs).

But 'git ls-files' and 'git ls-files --sparse' continue to show the same 
large output. Any idea what could be wrong? My version of Git, perhaps? 
Which is 2.40.1.

> I'm expecting at least one more version of this patch before it's even
> considered for merge. Given that this change removes many, many results
> from the return value of `project--vc-list-files', I suspect this would
> be a breaking change for some use case that I'm not considering.

Yeah, I expect project-find-regexp, project-search, 
project-query-replace-regexp might start misbehaving without additional 
filtering -- either throwing up errors or, best case, continuing to 
search through the "hidden" directories.






^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-17 22:54 ` Dmitry Gutov
@ 2024-09-18  0:36   ` Sean Allred
  2024-09-18 22:27     ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Allred @ 2024-09-18  0:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 73320

Dmitry Gutov <dmitry@gutov.dev> writes:
> The submitted patch has a comma inside which results in an error

Well that's embarrassing. I've resolved this in my branch. I'm not sure
why I wasn't seeing an issue locally; perhaps I made this error somehow
after I eval'd the form for testing. Thanks for catching!

> But let's start from the beginning. Could you help me set up a sparse
> repo that would help test out the change?
>
> I took a large-ish checkout with shallow history and set up the sparse
> config like this:
>
>   git sparse-checkout init --cone
>   git sparse-checkout set gfx media
>
> With that, both 'git status' and 'ls' behave as expected - no extra
> directories around (just the top-level files and two subdirs).
>
> But 'git ls-files' and 'git ls-files --sparse' continue to show the
> same large output. Any idea what could be wrong? My version of Git,
> perhaps? Which is 2.40.1.

Sure! You're very close. The key is here in git-ls-files(1):

    --sparse
        If the index is sparse, show the sparse directories without
        expanding to the contained files. Sparse directories will be shown
        with a trailing slash, such as "x/" for a sparse directory "x".

combined with this in git-sparse-checkout(1):

    Use the --[no-]sparse-index option to use a sparse index (the
    default is to not use it).

I fell into this too when setting up my worktree for testing; normally
our internal tooling takes care of this for me. To resolve this issue in
an existing checkout, you can simply

    git sparse-checkout init --sparse-index

and that will rewrite your index file using sparse-index rules.

> Yeah, I expect project-find-regexp, project-search,
> project-query-replace-regexp might start misbehaving without
> additional filtering -- either throwing up errors or, best case,
> continuing to search through the "hidden" directories.

Not sure how best to track that we should come back to this, but yeah.
It seems like the right place to add some sort of switch would be in the
`project-files` defmethod. From here, it looks like all the functions
you mention could choose the behavior right for them. (Based on the
function names alone -- it seems they would /also/ be interested in
operating on only those files which exist on disk.)

Incidentally looking at the version check within `project-files`, it's
worthwhile to point out that `--sparse` is likely /not/ compatible with
ancient versions of Git. Does vc have any sort of policy on requiring
recent versions of these tools? If the answer is 'not really', I'll
additionally want to add some sort of protection against using
`--sparse` when the Git version won't understand it. This should be easy
enough to do within the implementation of `project--vc-list-files`.

-- 
Sean Allred





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-18  0:36   ` Sean Allred
@ 2024-09-18 22:27     ` Dmitry Gutov
  2024-09-19  4:25       ` Sean Allred
  2024-09-19  5:41       ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-09-18 22:27 UTC (permalink / raw)
  To: Sean Allred; +Cc: 73320

On 18/09/2024 03:36, Sean Allred wrote:
> Dmitry Gutov <dmitry@gutov.dev> writes:
>> The submitted patch has a comma inside which results in an error
> 
> Well that's embarrassing. I've resolved this in my branch. I'm not sure
> why I wasn't seeing an issue locally; perhaps I made this error somehow
> after I eval'd the form for testing. Thanks for catching!

No problem!

>> But let's start from the beginning. Could you help me set up a sparse
>> repo that would help test out the change?
>>
>> I took a large-ish checkout with shallow history and set up the sparse
>> config like this:
>>
>>    git sparse-checkout init --cone
>>    git sparse-checkout set gfx media
>>
>> With that, both 'git status' and 'ls' behave as expected - no extra
>> directories around (just the top-level files and two subdirs).
>>
>> But 'git ls-files' and 'git ls-files --sparse' continue to show the
>> same large output. Any idea what could be wrong? My version of Git,
>> perhaps? Which is 2.40.1.
> 
> Sure! You're very close. The key is here in git-ls-files(1):
> 
>      --sparse
>          If the index is sparse, show the sparse directories without
>          expanding to the contained files. Sparse directories will be shown
>          with a trailing slash, such as "x/" for a sparse directory "x".
> 
> combined with this in git-sparse-checkout(1):
> 
>      Use the --[no-]sparse-index option to use a sparse index (the
>      default is to not use it).
> 
> I fell into this too when setting up my worktree for testing; normally
> our internal tooling takes care of this for me. To resolve this issue in
> an existing checkout, you can simply
> 
>      git sparse-checkout init --sparse-index
> 
> and that will rewrite your index file using sparse-index rules.

Thanks! So the key here is using "sparse index", not just "sparse 
checkout", since these are two different features. Which have to be 
enabled separately for sort of "gradual rollout", I guess.

>> Yeah, I expect project-find-regexp, project-search,
>> project-query-replace-regexp might start misbehaving without
>> additional filtering -- either throwing up errors or, best case,
>> continuing to search through the "hidden" directories.
> 
> Not sure how best to track that we should come back to this, but yeah.
> It seems like the right place to add some sort of switch would be in the
> `project-files` defmethod. From here, it looks like all the functions
> you mention could choose the behavior right for them. (Based on the
> function names alone -- it seems they would /also/ be interested in
> operating on only those files which exist on disk.)

I think we can just remove the names ending with '/'. The built-in 
commands don't seem to error out on them right now - probably because 
there is some protection against nonexistent files - but those files are 
(were) still shown as completions for project-find-file. Try out this 
addition please. The performance here seems about the same even with a 
large list (something I was worried about):

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index b29d5ed5404..a2e3f3f52e6 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -663,7 +663,7 @@ project--vc-list-files
    (pcase backend
      (`Git
       (let* ((default-directory (expand-file-name 
(file-name-as-directory dir)))
-            (args '("-z"))
+            (args '("-z" "--sparse"))
              (vc-git-use-literal-pathspecs nil)
              (include-untracked (project--value-in-dir
                                  'project-vc-include-untracked
@@ -703,7 +703,8 @@ project--vc-list-files
               (delq nil
                     (mapcar
                      (lambda (file)
-                      (unless (member file submodules)
+                      (unless (or (member file submodules)
+                                  (eq ?/ (aref file (1- (length file)))))
                          (if project-files-relative-names
                              file
                            (concat default-directory file))))


> Incidentally looking at the version check within `project-files`, it's
> worthwhile to point out that `--sparse` is likely /not/ compatible with
> ancient versions of Git. Does vc have any sort of policy on requiring
> recent versions of these tools? If the answer is 'not really', I'll
> additionally want to add some sort of protection against using
> `--sparse` when the Git version won't understand it. This should be easy
> enough to do within the implementation of `project--vc-list-files`.

IIRC it was something like "should work on the CentOS stable", and maybe 
CentOS N-1 as well. But the release-based distro was discontinued since 
the last time this question came up ;-(

We can call vc-git--program-version the same way it's used in 
vc-git-state. Which version should we make the minimum?





^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-18 22:27     ` Dmitry Gutov
@ 2024-09-19  4:25       ` Sean Allred
  2024-09-19  9:44         ` Dmitry Gutov
  2024-10-03 23:57         ` Dmitry Gutov
  2024-09-19  5:41       ` Eli Zaretskii
  1 sibling, 2 replies; 17+ messages in thread
From: Sean Allred @ 2024-09-19  4:25 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 73320

Dmitry Gutov <dmitry@gutov.dev> writes:
>>> Yeah, I expect project-find-regexp, project-search,
>>> project-query-replace-regexp might start misbehaving without
>>> additional filtering -- either throwing up errors or, best case,
>>> continuing to search through the "hidden" directories.
>> Not sure how best to track that we should come back to this, but
>> yeah.
>> It seems like the right place to add some sort of switch would be in the
>> `project-files` defmethod. From here, it looks like all the functions
>> you mention could choose the behavior right for them. (Based on the
>> function names alone -- it seems they would /also/ be interested in
>> operating on only those files which exist on disk.)
>
> I think we can just remove the names ending with '/'. The built-in
> commands don't seem to error out on them right now - probably because
> there is some protection against nonexistent files - but those files
> are (were) still shown as completions for project-find-file. Try out
> this addition please. The performance here seems about the same even
> with a large list (something I was worried about):
>
> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
> index b29d5ed5404..a2e3f3f52e6 100644
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -663,7 +663,7 @@ project--vc-list-files
>    (pcase backend
>      (`Git
>       (let* ((default-directory (expand-file-name
>       (file-name-as-directory dir)))
> -            (args '("-z"))
> +            (args '("-z" "--sparse"))
>              (vc-git-use-literal-pathspecs nil)
>              (include-untracked (project--value-in-dir
>                                  'project-vc-include-untracked
> @@ -703,7 +703,8 @@ project--vc-list-files
>               (delq nil
>                     (mapcar
>                      (lambda (file)
> -                      (unless (member file submodules)
> +                      (unless (or (member file submodules)
> +                                  (eq ?/ (aref file (1- (length file)))))
>                          (if project-files-relative-names
>                              file
>                            (concat default-directory file))))

Works fine for me :-) Though I've added an additional version check
inlined below.

>> Incidentally looking at the version check within `project-files`, it's
>> worthwhile to point out that `--sparse` is likely /not/ compatible with
>> ancient versions of Git. [...]
>
> [...]
>
> We can call vc-git--program-version the same way it's used in
> vc-git-state. Which version should we make the minimum?

The `--sparse` option was introduced in 2.35. The following seems to
work well for me:

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index b29d5ed5404..873bc92729d 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -663,7 +663,8 @@ project--vc-list-files
   (pcase backend
     (`Git
      (let* ((default-directory (expand-file-name (file-name-as-directory dir)))
-            (args '("-z"))
+            (args `("-z" ,@(when (version<= "2.35" (vc-git--program-version))
+                             '("--sparse"))))
             (vc-git-use-literal-pathspecs nil)
             (include-untracked (project--value-in-dir
                                 'project-vc-include-untracked
@@ -703,7 +704,8 @@ project--vc-list-files
              (delq nil
                    (mapcar
                     (lambda (file)
-                      (unless (member file submodules)
+                      (unless (or (member file submodules)
+                                  (eq ?/ (aref file (1- (length file)))))
                         (if project-files-relative-names
                             file
                           (concat default-directory file))))

Since we're getting a bit busy with our conditions, though, it might be
better to start using `cond`:

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index 873bc92729d..b42415154e3 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -704,11 +704,11 @@ project--vc-list-files
              (delq nil
                    (mapcar
                     (lambda (file)
-                      (unless (or (member file submodules)
-                                  (eq ?/ (aref file (1- (length file)))))
-                        (if project-files-relative-names
-                            file
-                          (concat default-directory file))))
+                      (cond
+                       ((member file submodules) nil)
+                       ((eq ?/ (aref file (1- (length file)))) nil)
+                       (project-files-relative-names file)
+                       (t (concat default-directory file))))
                     (split-string
                      (with-output-to-string
                        (apply #'vc-git-command standard-output 0 nil "ls-files" args))

This seems to help readability -- at least to me. There's probably also
a nominal performance benefit since `cond` is a special form.

I've pushed this as branch `sa/sparse-index-2` to my repository. (This
is in addition to the `sa/sparse-index` branch, which contains the
`file-exists-p` check mentioned below plus what might be, I take it, an
ultimately unneeded opt-out parameter in `project-files`.)

It's worth noting that actually performing a `file-exists-p` check here
would have the added benefit of handling the awkward state between Git
2.25 (where sparse-checkout was introduced) and 2.35 (where git-ls-files
learned --sparse) where ls-files could still report things that _look_
like files but are not present. This would be fixed by just replacing
the (eq ..) form with (not (file-exists-p file)).

-- 
Sean Allred





^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-18 22:27     ` Dmitry Gutov
  2024-09-19  4:25       ` Sean Allred
@ 2024-09-19  5:41       ` Eli Zaretskii
  2024-09-19  9:34         ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2024-09-19  5:41 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: allred.sean, 73320

> Cc: 73320@debbugs.gnu.org
> Date: Thu, 19 Sep 2024 01:27:03 +0300
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> -                      (unless (member file submodules)
> +                      (unless (or (member file submodules)
> +                                  (eq ?/ (aref file (1- (length file)))))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why not use directory-name-p here?

> > Incidentally looking at the version check within `project-files`, it's
> > worthwhile to point out that `--sparse` is likely /not/ compatible with
> > ancient versions of Git. Does vc have any sort of policy on requiring
> > recent versions of these tools? If the answer is 'not really', I'll
> > additionally want to add some sort of protection against using
> > `--sparse` when the Git version won't understand it. This should be easy
> > enough to do within the implementation of `project--vc-list-files`.
> 
> IIRC it was something like "should work on the CentOS stable", and maybe 
> CentOS N-1 as well. But the release-based distro was discontinued since 
> the last time this question came up ;-(
> 
> We can call vc-git--program-version the same way it's used in 
> vc-git-state. Which version should we make the minimum?

In which version was --sparse support introduced, and what is
considered "ancient" for this purpose?  I would not like us to rely on
shining new features of Git if the price is to break people who have
older versions.  TRT in these cases is to probe the version and act
accordingly, using the new features only when they are supported.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-19  5:41       ` Eli Zaretskii
@ 2024-09-19  9:34         ` Dmitry Gutov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-09-19  9:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: allred.sean, 73320

On 19/09/2024 08:41, Eli Zaretskii wrote:
>> Cc: 73320@debbugs.gnu.org
>> Date: Thu, 19 Sep 2024 01:27:03 +0300
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> -                      (unless (member file submodules)
>> +                      (unless (or (member file submodules)
>> +                                  (eq ?/ (aref file (1- (length file)))))
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Why not use directory-name-p here?

Thanks for the suggestion, I forgot that it's available and fast.

We might want to likewise update completion-emacs22-try-completion and 
vc-git--file-list-is-rootdir.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-19  4:25       ` Sean Allred
@ 2024-09-19  9:44         ` Dmitry Gutov
  2024-09-29  1:19           ` Dmitry Gutov
  2024-10-03 23:57         ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2024-09-19  9:44 UTC (permalink / raw)
  To: Sean Allred; +Cc: 73320

On 19/09/2024 07:25, Sean Allred wrote:

>> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
>> index b29d5ed5404..a2e3f3f52e6 100644
>> --- a/lisp/progmodes/project.el
>> +++ b/lisp/progmodes/project.el
>> @@ -663,7 +663,7 @@ project--vc-list-files
>>     (pcase backend
>>       (`Git
>>        (let* ((default-directory (expand-file-name
>>        (file-name-as-directory dir)))
>> -            (args '("-z"))
>> +            (args '("-z" "--sparse"))
>>               (vc-git-use-literal-pathspecs nil)
>>               (include-untracked (project--value-in-dir
>>                                   'project-vc-include-untracked
>> @@ -703,7 +703,8 @@ project--vc-list-files
>>                (delq nil
>>                      (mapcar
>>                       (lambda (file)
>> -                      (unless (member file submodules)
>> +                      (unless (or (member file submodules)
>> +                                  (eq ?/ (aref file (1- (length file)))))
>>                           (if project-files-relative-names
>>                               file
>>                             (concat default-directory file))))
> 
> Works fine for me :-) Though I've added an additional version check
> inlined below.

Great, thanks!

>>> Incidentally looking at the version check within `project-files`, it's
>>> worthwhile to point out that `--sparse` is likely /not/ compatible with
>>> ancient versions of Git. [...]
>>
>> [...]
>>
>> We can call vc-git--program-version the same way it's used in
>> vc-git-state. Which version should we make the minimum?
> 
> The `--sparse` option was introduced in 2.35. The following seems to
> work well for me:
> 
> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
> index b29d5ed5404..873bc92729d 100644
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -663,7 +663,8 @@ project--vc-list-files
>     (pcase backend
>       (`Git
>        (let* ((default-directory (expand-file-name (file-name-as-directory dir)))
> -            (args '("-z"))
> +            (args `("-z" ,@(when (version<= "2.35" (vc-git--program-version))
> +                             '("--sparse"))))
>               (vc-git-use-literal-pathspecs nil)
>               (include-untracked (project--value-in-dir
>                                   'project-vc-include-untracked
> @@ -703,7 +704,8 @@ project--vc-list-files
>                (delq nil
>                      (mapcar
>                       (lambda (file)
> -                      (unless (member file submodules)
> +                      (unless (or (member file submodules)
> +                                  (eq ?/ (aref file (1- (length file)))))
>                           (if project-files-relative-names
>                               file
>                             (concat default-directory file))))

Like Eli suggested, we can use directory-name-p instead of the second 
condition.

> Since we're getting a bit busy with our conditions, though, it might be
> better to start using `cond`:
> 
> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
> index 873bc92729d..b42415154e3 100644
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -704,11 +704,11 @@ project--vc-list-files
>                (delq nil
>                      (mapcar
>                       (lambda (file)
> -                      (unless (or (member file submodules)
> -                                  (eq ?/ (aref file (1- (length file)))))
> -                        (if project-files-relative-names
> -                            file
> -                          (concat default-directory file))))
> +                      (cond
> +                       ((member file submodules) nil)
> +                       ((eq ?/ (aref file (1- (length file)))) nil)
> +                       (project-files-relative-names file)
> +                       (t (concat default-directory file))))
>                       (split-string
>                        (with-output-to-string
>                          (apply #'vc-git-command standard-output 0 nil "ls-files" args))
> 
> This seems to help readability -- at least to me. There's probably also
> a nominal performance benefit since `cond` is a special form.

I think I'd rather keep the 'unless' for now: it groups the first two 
cases a bit more clearly.

> I've pushed this as branch `sa/sparse-index-2` to my repository. (This
> is in addition to the `sa/sparse-index` branch, which contains the
> `file-exists-p` check mentioned below plus what might be, I take it, an
> ultimately unneeded opt-out parameter in `project-files`.)

Yeah, it doesn't seem to be useful to return files that don't exist in 
the current work directory - no idea what a caller would do with them.

> It's worth noting that actually performing a `file-exists-p` check here
> would have the added benefit of handling the awkward state between Git
> 2.25 (where sparse-checkout was introduced) and 2.35 (where git-ls-files
> learned --sparse) where ls-files could still report things that _look_
> like files but are not present. This would be fixed by just replacing
> the (eq ..) form with (not (file-exists-p file)).

'file-exists-p' does I/O, so it would make processing an order of a 
magnitude slower. Here's a benchmark you can try:

   (benchmark-run 1 (project-files (project-current)))





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-19  9:44         ` Dmitry Gutov
@ 2024-09-29  1:19           ` Dmitry Gutov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-09-29  1:19 UTC (permalink / raw)
  To: Sean Allred; +Cc: 73320-done

Hi Sean,

So I've pushed a slightly tweaked version of your last patch (without 
the 'cond') to master, commit 8d9a4647fbc, it seems to handle all of the 
requirements discussed.

Please test it out when you have the time, and of course any further 
suggestions and improvements are welcome.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-09-19  4:25       ` Sean Allred
  2024-09-19  9:44         ` Dmitry Gutov
@ 2024-10-03 23:57         ` Dmitry Gutov
  2024-10-04  7:48           ` Michael Albinus via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2024-10-03 23:57 UTC (permalink / raw)
  To: Sean Allred, Michael Albinus; +Cc: 73320

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

On 19/09/2024 07:25, Sean Allred wrote:
> awkward state between Git
> 2.25 (where sparse-checkout was introduced) and 2.35 (where git-ls-files
> learned --sparse)

Speaking of versions, I've triggered an error on a remote host (with 
Tramp) because it had older Git installed.

It was running a Trisquel 9.0, which is pretty old, but looking at 
https://repology.org/project/git/versions it appears even the latest 
Trisquel (11.0) only uses Git 2.34. Same for Ubuntu 22.04 which it's 
based on, both officially supported until 2027.

The attached makes detected version connection-local, that's step 1 (*).

It could use some review, though. There aren't many examples of doing 
that in Emacs code.

Michael, could you take a look? Does this look reasonable? Should 
:application be 'tramp' or 'vc-git'? Is the 
'connection-local-set-profiles' call needed here at all or can be skipped?

(*) Step 2 might be to limit the --sparse argument to Emacs 31 and 
newer. :-/

[-- Attachment #2: vc-git--program-version-connection-local.diff --]
[-- Type: text/x-patch, Size: 2141 bytes --]

diff --git a/lisp/vc/vc-git.el b/lisp/vc/vc-git.el
index 05400523048..2a40539adeb 100644
--- a/lisp/vc/vc-git.el
+++ b/lisp/vc/vc-git.el
@@ -315,21 +315,31 @@ vc-git--state-code
 
 (defvar vc-git--program-version nil)
 
+(connection-local-set-profile-variables
+ 'vc-git-connection-default-profile
+ '((vc-git--program-version . nil)))
+
+(connection-local-set-profiles
+ '(:application vc-git)
+ 'vc-git-connection-default-profile)
+
 (defun vc-git--program-version ()
-  (or vc-git--program-version
-      (let ((version-string
-             (vc-git--run-command-string nil "version")))
-        (setq vc-git--program-version
-              (if (and version-string
-                       ;; Some Git versions append additional strings
-                       ;; to the numerical version string. E.g., Git
-                       ;; for Windows appends ".windows.N", while Git
-                       ;; for Mac appends " (Apple Git-N)". Capture
-                       ;; numerical version and ignore the rest.
-                       (string-match "git version \\([0-9][0-9.]+\\)"
-                                     version-string))
-                  (string-trim-right (match-string 1 version-string) "\\.")
-                "0")))))
+  (with-connection-local-variables
+   (or vc-git--program-version
+       (let ((version-string
+              (vc-git--run-command-string nil "version")))
+         (setq-connection-local
+          vc-git--program-version
+          (if (and version-string
+                   ;; Some Git versions append additional strings
+                   ;; to the numerical version string. E.g., Git
+                   ;; for Windows appends ".windows.N", while Git
+                   ;; for Mac appends " (Apple Git-N)". Capture
+                   ;; numerical version and ignore the rest.
+                   (string-match "git version \\([0-9][0-9.]+\\)"
+                                 version-string))
+              (string-trim-right (match-string 1 version-string) "\\.")
+            "0"))))))
 
 (defun vc-git--git-path (&optional path)
   "Resolve .git/PATH for the current working tree.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-03 23:57         ` Dmitry Gutov
@ 2024-10-04  7:48           ` Michael Albinus via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-10-04 21:25             ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Albinus via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-10-04  7:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Sean Allred, 73320

Dmitry Gutov <dmitry@gutov.dev> writes:

Hi Dmitry,

> It could use some review, though. There aren't many examples of doing
> that in Emacs code.

In Eshell, Jim Porter makes extensive use of connection-local
variables. He has also added some functions which are useful (not
applied in Tramp yet).

> Michael, could you take a look? Does this look reasonable? Should
> :application be 'tramp' or 'vc-git'?

In general, it doesn't matter. But since your change is dedicated to
vc-git, using the application `vc-git' makes sense.

However, you use with-connection-local-variables, which uses the
application `tramp' by default. Either use
with-connection-local-application-variables instead, or let-bind
connection-local-default-application to `vc-git'.

I'm not sure how setq-connection-local behaves wrt to the application
(this function was written by Jim, I've never used it). But binding
connection-local-default-application should be good enough, I believe.

> Is the 'connection-local-set-profiles' call needed here at all or can
> be skipped?

It is needed. connection-local-set-profile-variables declares only the
variables, connection-local-set-profiles activates them.

Best regards, Michael.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-04  7:48           ` Michael Albinus via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-10-04 21:25             ` Dmitry Gutov
  2024-10-05  6:48               ` Eli Zaretskii
  2024-10-07  0:41               ` Jim Porter
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-10-04 21:25 UTC (permalink / raw)
  To: Michael Albinus, Eli Zaretskii; +Cc: Sean Allred, 73320

Hi Michael, Eli,

On 04/10/2024 10:48, Michael Albinus wrote:

>> It could use some review, though. There aren't many examples of doing
>> that in Emacs code.
> In Eshell, Jim Porter makes extensive use of connection-local
> variables. He has also added some functions which are useful (not
> applied in Tramp yet).

Yep, Eshell seems to be have been the only other client of 
connection-local, until now. I was wondering whether setting a unique 
application name is the recommended pattern, though.

> However, you use with-connection-local-variables, which uses the
> application `tramp' by default. Either use
> with-connection-local-application-variables instead, or let-bind
> connection-local-default-application to `vc-git'.
> 
> I'm not sure how setq-connection-local behaves wrt to the application
> (this function was written by Jim, I've never used it). But binding
> connection-local-default-application should be good enough, I believe.
> 
>> Is the 'connection-local-set-profiles' call needed here at all or can
>> be skipped?
> It is needed. connection-local-set-profile-variables declares only the
> variables, connection-local-set-profiles activates them.

Thanks, Michael. Binding connection-local-default-application seems like 
a good approach for Emacs 27-28, but vc-git is not distributed 
separately, so might as well use the newer macro.

Here's the updated patch.

Eli, any chance this can go in Emacs 30? Could be considered a bugfix, 
since we're fixing Git version detection on remote hosts. Not a 
regression, though.

diff --git a/lisp/vc/vc-git.el b/lisp/vc/vc-git.el
index 05400523048..5a7ffeffc9d 100644
--- a/lisp/vc/vc-git.el
+++ b/lisp/vc/vc-git.el
@@ -315,11 +315,21 @@ vc-git--state-code

  (defvar vc-git--program-version nil)

+(connection-local-set-profile-variables
+ 'vc-git-connection-default-profile
+ '((vc-git--program-version . nil)))
+
+(connection-local-set-profiles
+ '(:application vc-git)
+ 'vc-git-connection-default-profile)
+
  (defun vc-git--program-version ()
-  (or vc-git--program-version
-      (let ((version-string
-             (vc-git--run-command-string nil "version")))
-        (setq vc-git--program-version
+  (with-connection-local-application-variables 'vc-git
+   (or vc-git--program-version
+       (let ((version-string
+              (vc-git--run-command-string nil "version")))
+         (setq-connection-local
+              vc-git--program-version
                (if (and version-string
                         ;; Some Git versions append additional strings
                         ;; to the numerical version string. E.g., Git
@@ -329,7 +339,7 @@ vc-git--program-version
                         (string-match "git version \\([0-9][0-9.]+\\)"
                                       version-string))
                    (string-trim-right (match-string 1 version-string) 
"\\.")
-                "0")))))
+                "0"))))))

  (defun vc-git--git-path (&optional path)
    "Resolve .git/PATH for the current working tree.






^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-04 21:25             ` Dmitry Gutov
@ 2024-10-05  6:48               ` Eli Zaretskii
  2024-10-05 12:33                 ` Dmitry Gutov
  2024-10-07  0:41               ` Jim Porter
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2024-10-05  6:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 73320, allred.sean, michael.albinus

> Date: Sat, 5 Oct 2024 00:25:13 +0300
> Cc: Sean Allred <allred.sean@gmail.com>, 73320@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >> Is the 'connection-local-set-profiles' call needed here at all or can
> >> be skipped?
> > It is needed. connection-local-set-profile-variables declares only the
> > variables, connection-local-set-profiles activates them.
> 
> Thanks, Michael. Binding connection-local-default-application seems like 
> a good approach for Emacs 27-28, but vc-git is not distributed 
> separately, so might as well use the newer macro.
> 
> Here's the updated patch.
> 
> Eli, any chance this can go in Emacs 30? Could be considered a bugfix, 
> since we're fixing Git version detection on remote hosts. Not a 
> regression, though.

Given that we don't yet have a good understanding of this mechanism
and its implications, I'd prefer to install this on master.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-05  6:48               ` Eli Zaretskii
@ 2024-10-05 12:33                 ` Dmitry Gutov
  2024-10-07 23:55                   ` Dmitry Gutov
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Gutov @ 2024-10-05 12:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 73320, allred.sean, michael.albinus

On 05/10/2024 09:48, Eli Zaretskii wrote:
>> Date: Sat, 5 Oct 2024 00:25:13 +0300
>> Cc: Sean Allred<allred.sean@gmail.com>,73320@debbugs.gnu.org
>> From: Dmitry Gutov<dmitry@gutov.dev>
>>
>>>> Is the 'connection-local-set-profiles' call needed here at all or can
>>>> be skipped?
>>> It is needed. connection-local-set-profile-variables declares only the
>>> variables, connection-local-set-profiles activates them.
>> Thanks, Michael. Binding connection-local-default-application seems like
>> a good approach for Emacs 27-28, but vc-git is not distributed
>> separately, so might as well use the newer macro.
>>
>> Here's the updated patch.
>>
>> Eli, any chance this can go in Emacs 30? Could be considered a bugfix,
>> since we're fixing Git version detection on remote hosts. Not a
>> regression, though.
> Given that we don't yet have a good understanding of this mechanism
> and its implications, I'd prefer to install this on master.

Okay, then we might only support --sparse with Emacs 31. Otherwise, a 
user could upgrade project.el from ELPA and have that in Emacs 30 too.

with-connection-local-application-variables had been added in Emacs 
29.1, BTW.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-04 21:25             ` Dmitry Gutov
  2024-10-05  6:48               ` Eli Zaretskii
@ 2024-10-07  0:41               ` Jim Porter
  2024-10-07 23:38                 ` Dmitry Gutov
  1 sibling, 1 reply; 17+ messages in thread
From: Jim Porter @ 2024-10-07  0:41 UTC (permalink / raw)
  To: Dmitry Gutov, Michael Albinus, Eli Zaretskii; +Cc: Sean Allred, 73320

On 10/4/2024 2:25 PM, Dmitry Gutov wrote:
> Hi Michael, Eli,
> 
> On 04/10/2024 10:48, Michael Albinus wrote:
> 
>>> It could use some review, though. There aren't many examples of doing
>>> that in Emacs code.
>> In Eshell, Jim Porter makes extensive use of connection-local
>> variables. He has also added some functions which are useful (not
>> applied in Tramp yet).
> 
> Yep, Eshell seems to be have been the only other client of 
> connection-local, until now. I was wondering whether setting a unique 
> application name is the recommended pattern, though.

If I remember things correctly, I think using ":application eshell" was 
a suggestion from Michael Albinus. (I don't think I'd have come up with 
the idea on my own, since I didn't know much about connection-local 
variables at the time.)

However, I think can be a good move to use a unique application name 
when it makes sense because it reduces the chance of conflicts. For 
example, in a future version of Eshell, I might make 
'process-environment' connection-local for ":application eshell" (it's a 
long story). This would let users change remote env vars in Eshell 
freely without interfering with the ":application tramp" remote env 
vars, which would get used in other cases (e.g. when opening a remote 
file or using M-x shell remotely).





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-07  0:41               ` Jim Porter
@ 2024-10-07 23:38                 ` Dmitry Gutov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-10-07 23:38 UTC (permalink / raw)
  To: Jim Porter, Michael Albinus, Eli Zaretskii; +Cc: Sean Allred, 73320

On 07/10/2024 03:41, Jim Porter wrote:
> On 10/4/2024 2:25 PM, Dmitry Gutov wrote:
>> Hi Michael, Eli,
>>
>> On 04/10/2024 10:48, Michael Albinus wrote:
>>
>>>> It could use some review, though. There aren't many examples of doing
>>>> that in Emacs code.
>>> In Eshell, Jim Porter makes extensive use of connection-local
>>> variables. He has also added some functions which are useful (not
>>> applied in Tramp yet).
>>
>> Yep, Eshell seems to be have been the only other client of connection- 
>> local, until now. I was wondering whether setting a unique application 
>> name is the recommended pattern, though.
> 
> If I remember things correctly, I think using ":application eshell" was 
> a suggestion from Michael Albinus. (I don't think I'd have come up with 
> the idea on my own, since I didn't know much about connection-local 
> variables at the time.)
> 
> However, I think can be a good move to use a unique application name 
> when it makes sense because it reduces the chance of conflicts. For 
> example, in a future version of Eshell, I might make 'process- 
> environment' connection-local for ":application eshell" (it's a long 
> story). This would let users change remote env vars in Eshell freely 
> without interfering with the ":application tramp" remote env vars, which 
> would get used in other cases (e.g. when opening a remote file or using 
> M-x shell remotely).

Thanks, makes sense. Seems particularly useful for users who'd want to 
fine-tune the sets for one "application" independent from the other.

The var in vc-git is less of a user-facing thing (it still uses "--" in 
its name), so I was wondering what's the default solution that we 
recommend. Seems like it is to assign an application anyway.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index
  2024-10-05 12:33                 ` Dmitry Gutov
@ 2024-10-07 23:55                   ` Dmitry Gutov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Gutov @ 2024-10-07 23:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael.albinus, allred.sean, 73320

On 05/10/2024 15:33, Dmitry Gutov wrote:
> Okay, then we might only support --sparse with Emacs 31. Otherwise, a 
> user could upgrade project.el from ELPA and have that in Emacs 30 too.

Now pushed to master.





^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-10-07 23:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-17 16:55 bug#73320: [PATCH] project--vc-list-files: use Git's sparse-index Sean Allred
2024-09-17 22:54 ` Dmitry Gutov
2024-09-18  0:36   ` Sean Allred
2024-09-18 22:27     ` Dmitry Gutov
2024-09-19  4:25       ` Sean Allred
2024-09-19  9:44         ` Dmitry Gutov
2024-09-29  1:19           ` Dmitry Gutov
2024-10-03 23:57         ` Dmitry Gutov
2024-10-04  7:48           ` Michael Albinus via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-10-04 21:25             ` Dmitry Gutov
2024-10-05  6:48               ` Eli Zaretskii
2024-10-05 12:33                 ` Dmitry Gutov
2024-10-07 23:55                   ` Dmitry Gutov
2024-10-07  0:41               ` Jim Porter
2024-10-07 23:38                 ` Dmitry Gutov
2024-09-19  5:41       ` Eli Zaretskii
2024-09-19  9:34         ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).