From: Dmitry Gutov <dgutov@yandex.ru>
To: emacs-devel@gnu.org
Subject: Re: A project-files implementation for Git projects
Date: Thu, 3 Oct 2019 16:19:04 +0300
Message-ID: <1ffb046d-f2de-b1d5-d6a9-a6059616cc08@yandex.ru>
In-Reply-To: <87r23u9qjc.fsf@gnu.org>

On 03.10.2019 11:33, Tassilo Horn wrote:

>> +(cl-defmethod project-files ((project (head vc)) &optional dirs)
>> +  (cl-mapcan
>> +   (lambda (dir)
>> +     (let (backend)
>> +       (if (and (file-equal-p dir (cdr project))
>> +                (setq backend (vc-responsible-backend dir))
>> +                nil
>                     ^^^
> 
> So this disables the VC operation.  I've removed it, and the speed
> improvement is good here.  This is my test case (the Emacs repository):

Yes, sorry. Used this for comparative testing and forgot to take it out.

The Emacs repository is the one I've mostly tested on as well.

> --8<---------------cut here---------------start------------->8---
> (let* ((dir "~/Repos/el/emacs")
>         (p (project-current nil dir))
>         f1 f2)
>    (let ((t1 (benchmark-run 10
> 	      (setq f1 (project-files p))))
> 	(t2 (benchmark-run 10
> 	      (setq f2 (project--files-in-directory
> 			dir (project--dir-ignores p dir))))))
>      (message "Files: %d (VC) vs. %d (find)" (length f1) (length f2))
>      (message "VC) Elapsed time: %fs (%fs in %d GCs)"
> 	     (car t1) (nth 2 t1) (nth 1 t1))
>      (message "Find) Elapsed time: %fs (%fs in %d GCs)"
> 	     (car t2) (nth 2 t2) (nth 1 t2)))
>    (let ((d1 (cl-set-difference f1 f2 :test #'string=))
> 	(d2 (cl-set-difference f2 f1 :test #'string=)))
>      (message "Files found by VC but not by find:")
>      (dolist (f d1)
>        (message "  %s" f))
>      (message "Files found by find but not by VC:")
>      (dolist (f d2)
>        (message "  %s" f))))
> --8<---------------cut here---------------end--------------->8---
> 
> Here is the output:
> 
> --8<---------------cut here---------------start------------->8---
> VC) Elapsed time: 1.379560s (0.308720s in 6 GCs)
> Find) Elapsed time: 4.397054s (0.200695s in 4 GCs)
> Files found by VC but not by find:
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-1.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-2.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-2a.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-3.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-4.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-5.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/drawers.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-1.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-2.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-3.pdf
>    /home/horn/Repos/el/emacs/etc/refcards/Makefile
>    /home/horn/Repos/el/emacs/etc/refcards/gnus-logo.pdf
>    /home/horn/Repos/el/emacs/lib/_Noreturn.h
>    /home/horn/Repos/el/emacs/lib/stdalign.in.h
>    /home/horn/Repos/el/emacs/lib/stddef.in.h
>    /home/horn/Repos/el/emacs/lib/stdint.in.h
>    /home/horn/Repos/el/emacs/lib/stdio-impl.h
>    /home/horn/Repos/el/emacs/lib/stdio.in.h
>    /home/horn/Repos/el/emacs/lib/stdlib.in.h
>    /home/horn/Repos/el/emacs/m4/__inline.m4
>    /home/horn/Repos/el/emacs/test/data/xdg/mimeinfo.cache
>    /home/horn/Repos/el/emacs/test/lisp/progmodes/flymake-resources/Makefile
>    /home/horn/Repos/el/emacs/test/manual/etags/Makefile
>    /home/horn/Repos/el/emacs/test/manual/etags/make-src/Makefile
>    /home/horn/Repos/el/emacs/test/manual/indent/Makefile

The difference is that the 'find' based method does not support 
whitelist entries yet.

Once it does, its performance might get slightly worse, though probably 
not in the gtk or gnulib repos.
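
To illustrate what whitelist support could look like on the 'find' side,
here is a rough shell sketch. It assumes a single ignore glob together
with a single negated (whitelist) glob; the function name and the
example patterns are made up for illustration and are not taken from the
actual .gitignore or from project.el.

```shell
# Hypothetical sketch: list regular files under DIR, skipping those that
# match IGNORE_GLOB unless they also match WHITELIST_GLOB.
# Usage: list_with_whitelist DIR IGNORE_GLOB WHITELIST_GLOB
list_with_whitelist() {
  find "$1" -type f \( ! -name "$2" -o -name "$3" \) -print | sort
}
```

For example, list_with_whitelist doc '*.pdf' 'cons-*.pdf' would keep the
cons-*.pdf files while dropping every other PDF. Each whitelist entry
adds one more test that 'find' has to evaluate per file, which is where
the extra cost would come from.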

> Files found by find but not by VC:
>    /home/horn/Repos/el/emacs/aclocal.m4
>    /home/horn/Repos/el/emacs/config.status
>    /home/horn/Repos/el/emacs/configure
>    /home/horn/Repos/el/emacs/info/dir
> --8<---------------cut here---------------end--------------->8---
> 
> Then I did it on a clean checkout of the gtk repository and got this
> result:
> 
> --8<---------------cut here---------------start------------->8---
> Files: 4774 (VC) vs. 4774 (find)
> VC) Elapsed time: 1.721054s (0.461112s in 9 GCs)
> Find) Elapsed time: 0.634624s (0.152549s in 3 GCs)
> Files found by VC but not by find:
> Files found by find but not by VC:
> nil
> --8<---------------cut here---------------end--------------->8---
> 
> So here, Git has been much slower than find!

Interesting! I haven't seen that result before, but it sounds plausible. 
In my experience, it's the ignore rules that make 'find' slower, and Git 
optimizes that logic somehow. So on projects with few ignore rules, 
'find' could be faster.

I've also tried the gtk repo, and the performance ratio here is the 
same, although in my case 'git ls-files' is faster there than in 
Emacs's repo (and 'find' is twice as fast again).

> And again with gnulib:
> 
> --8<---------------cut here---------------start------------->8---
> Files: 9936 (VC) vs. 9936 (find)
> VC) Elapsed time: 3.444869s (0.902124s in 16 GCs)
> Find) Elapsed time: 1.380269s (0.285082s in 5 GCs)
> Files found by VC but not by find:
> Files found by find but not by VC:
> --8<---------------cut here---------------end--------------->8---
> 
> Again Git was slower.  What my gtk and gnulib repositories have in
> common is that they are clean, i.e., no build artifacts which would be
> matched by the exclude args passed to find...

gtk has only one .gitignore entry, gnulib has 8, but fairly simple ones.
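
For comparing repositories, a quick way to count such entries is
sketched below. It looks at one ignore file only, treats every line that
is neither blank nor a '#' comment as one entry, and the function name
is my own invention.

```shell
# Hypothetical helper: count the rules in one ignore file. Lines that
# start with '#' and lines containing only whitespace are not counted.
count_ignore_entries() {
  # grep -c exits non-zero when the count is 0, hence the '|| true'.
  grep -Ecv '^(#|[[:space:]]*$)' "$1" || true
}
```

Running count_ignore_entries .gitignore at the top of a checkout gives
the number for the root file; nested .gitignore files and
.git/info/exclude would have to be counted separately.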

So, what should we do here? Maybe:

1. Implement whitelist rules support for 'find'.

2. Add a defcustom project-vc-list-files-method, with a value 'auto' 
that would check the backend and the Git version (and maybe the presence 
of 'find' as well)? Other possible values would be 'find' and 'vc'.
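
A rough shell sketch of what the 'auto' value could check. The minimum
Git version is passed in as a parameter because no particular threshold
has been settled on here; the function names and the extra 'none'
result are hypothetical, and a real implementation would be written in
Lisp and would also consult the VC backend.

```shell
# Succeed when dotted version $1 is at least $2 (e.g. 2.20.1 vs 1.9).
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$2" "$1" |
             sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$lowest" = "$2" ]
}

# Print 'vc' when git is present and new enough, 'find' when only find
# is usable, and 'none' otherwise.
pick_list_method() {
  min_version=$1
  if command -v git >/dev/null 2>&1; then
    # Extract the first dotted number from e.g. "git version 2.20.1".
    git_version=$(git --version | sed 's/[^0-9]*\([0-9][0-9.]*\).*/\1/')
    if version_at_least "$git_version" "$min_version"; then
      echo vc
      return
    fi
  fi
  if command -v find >/dev/null 2>&1; then echo find; else echo none; fi
}
```

With the explicit values 'find' and 'vc', this check would simply be
skipped and the chosen method used directly.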

If you have time, could you compare the performance of 'find' and 'git 
ls-files' on the command line? When I simply redirect the output to a 
file, I see a different result:

$ bash -c "time git ls-files >test"

real	0m0,011s
user	0m0,005s
sys	0m0,006s

$ bash -c "time find . >test2"

real	0m0,026s
user	0m0,008s
sys	0m0,018s

That could indicate some inefficiency in processing the output in Emacs.
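
Note that a bare 'find .' also prints directories and everything under
.git, so the two commands above do not produce the same list. A rough
sketch of a closer comparison follows; the function name is made up, and
it assumes the repository metadata lives in a top-level .git directory.

```shell
# Hypothetical helper: print regular files under DIR relative to DIR,
# skipping the top-level .git directory entirely, so the listing is
# comparable to 'git ls-files' output in a clean checkout.
list_files_like_git() {
  (cd "$1" &&
     find . -path ./.git -prune -o -type f -print |
       sed 's|^\./||' | sort)
}
```

Timing 'list_files_like_git . >test2' in the same bash session gives a
number to put next to the 'git ls-files' one. It only lists regular
files, so symlinks tracked by Git would still show up as a difference.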



