From: Spencer Baugh <sbaugh@janestreet.com>
To: 69188@debbugs.gnu.org
Cc: Dmitry Gutov <dmitry@gutov.dev>
Subject: bug#69188: 30.0.50; project-files + project-find-file is slow in large repositories
Date: Thu, 15 Feb 2024 17:55:46 -0500
Message-ID: <iera5o11gnh.fsf@igm-qws-u22796a.mail-host-address-is-not-set>
(project-files (project-current)) takes around 1 second in the Linux
kernel repository (80k files) and 7 seconds in my larger (500k-file)
repository.
With this patch:
diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index c7c07c3d34c..037beaa835a 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -667,12 +667,15 @@
(setq i (concat i "**"))))
i)))
extra-ignores)))))
- (setq files
- (mapcar
- (lambda (file) (concat default-directory file))
- (split-string
- (apply #'vc-git--run-command-string nil "ls-files" args)
- "\0" t)))
+ (with-temp-buffer
+ (let ((ok (apply #'vc-git--out-ok "ls-files" args))
+ (pt (point-min)))
+ (unless ok
+ (error "File listing failed: %s" (buffer-string)))
+ (goto-char pt)
+ (while (search-forward "\0" nil t)
+ (push (concat default-directory (buffer-substring-no-properties pt (1- (point)))) files)
+ (setq pt (point)))))
(when (project--vc-merge-submodules-p default-directory)
;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
(let* ((submodules (project--git-submodules))
With this patch, project-files on the Linux repository takes around
0.75 seconds. If I further remove the (concat default-directory ...)
around each file, it speeds up to 0.5 seconds.
(For comparison, git ls-files itself takes only around 20 milliseconds.)
My large repository (which uses Mercurial) has a custom project-files
which is basically:
(with-temp-buffer
  ;; call-process takes its arguments directly; no apply is needed.
  (unless (zerop (call-process "rhg" nil t nil "files"))
    (error "File listing failed: %s" (buffer-string)))
  (goto-char (point-min))
  (let ((pt (point))
        res)
    ;; Each file name is terminated by a newline.
    (while (search-forward "\n" nil t)
      (push (file-name-concat default-directory
                              (buffer-substring-no-properties pt (1- (point))))
            res)
      (setq pt (point)))
    res))
Likewise, removing the (concat default-directory ...) speeds my
project-files up from 7 seconds to 4.5 seconds.
This is especially silly because project-find-file then strips this
default-directory prefix off every file again, which adds yet more
overhead.
My proposal: could we make the default-directory prefix unnecessary
for the files returned by project-files?
Perhaps project-files could be allowed to return file names relative
to the project root. Then, in the common case where all the files are
within the project root, project-find-file would be much faster, since
nothing would have to add the prefix to every file and then strip it
off again. Happy to implement this, if it makes sense.
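Here is a minimal Emacs Lisp sketch of what the relative-name variant could look like. It assumes NUL-separated output like that of `git ls-files -z`. The names `my-project--relative-files` and `my-project-read-file` are hypothetical and are not part of project.el. The listing loop is the one from the patch above without the `concat`, and only the file the user actually picks is turned into an absolute name with `expand-file-name`:

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: the `my-project-' functions are hypothetical, not project.el API.

(defun my-project--relative-files (buffer)
  "Return the NUL-terminated file names in BUFFER, in listing order.
Names are kept exactly as listed, i.e. relative to the project root,
so no per-file `concat' with `default-directory' is performed."
  (with-current-buffer buffer
    (goto-char (point-min))
    (let ((pt (point))
          files)
      (while (search-forward "\0" nil t)
        ;; Collect the text between the previous NUL and this one.
        (push (buffer-substring-no-properties pt (1- (point))) files)
        (setq pt (point)))
      (nreverse files))))

(defun my-project-read-file (root files)
  "Prompt for one of FILES (relative to ROOT) and return its absolute name.
Only the single chosen name is expanded."
  (expand-file-name (completing-read "Find file: " files nil t) root))
```

To try it, one could insert the output of `git ls-files -z` into a temporary buffer with `call-process` and pass `(my-project--relative-files (current-buffer))` to `my-project-read-file`. With relative names, each file costs a single `buffer-substring-no-properties`, and there is no second string allocated per file for the prefixed name.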
Another optimization I've considered: we could run the process
asynchronously, so that parsing in project-files runs in parallel with
the process. But the process itself is usually very fast and accounts
for little of the overhead, so that alone won't be a big win.
However, it would make it easy to make project-files as a whole
asynchronous. That would allow project-find-file to start the listing
in the background, and we could then write a completion table which
completes only over whatever files have already been read into Emacs.
I think this would be a lot nicer for most use cases, and again I'd be
happy to implement it.
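The asynchronous variant could be sketched roughly as follows. This is again Emacs Lisp, and the `my-project-` names are again hypothetical rather than anything in project.el.

- A process filter splits NUL-separated output (as from `git ls-files -z`) into a shared STATE. It carries over any name that is split across two output chunks.
- The completion table is a function that re-reads STATE on every call, so it completes over whatever names have arrived so far.
- The `:coding 'utf-8-unix` choice assumes UTF-8 file names.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: the `my-project-' names are hypothetical, not project.el API.

(defun my-project--accumulate (state chunk)
  "Add the complete NUL-terminated names in CHUNK to STATE.
STATE is a cons (FILES . PARTIAL), where PARTIAL is a name left
unterminated at the end of the previous chunk."
  (let ((parts (split-string (concat (cdr state) chunk) "\0")))
    ;; Every element except the last was followed by a NUL: it is complete.
    (while (cdr parts)
      (push (pop parts) (car state)))
    ;; The last element is "" or the start of a name split across chunks.
    (setcdr state (car parts))))

(defun my-project-start-listing (root command)
  "Run COMMAND (a list of strings) in ROOT without waiting for it.
Return (PROCESS . STATE); STATE fills in as output arrives."
  (let* ((state (cons nil ""))
         (default-directory root)
         (proc (make-process
                :name "my-project-files"
                :command command
                :connection-type 'pipe
                :coding 'utf-8-unix     ; assumption: file names are UTF-8
                :noquery t
                :filter (lambda (_proc chunk)
                          (my-project--accumulate state chunk)))))
    (cons proc state)))

(defun my-project-completion-table (state)
  "Return a completion table over the files collected in STATE so far."
  (lambda (string pred action)
    ;; STATE is re-read on every call, so names that arrived since the
    ;; previous call are included.
    (complete-with-action action (car state) string pred)))
```

For example, `(my-project-start-listing default-directory '("git" "ls-files" "-z"))` returns `(PROCESS . STATE)`. Calling `(completing-read "Find file: " (my-project-completion-table STATE))` can then prompt while the listing is still running. This assumes the completion UI re-queries the table rather than caching its first answer.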
Also happy to implement any other optimizations you think might make
sense.
In GNU Emacs 30.0.50 (build 37, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.15.12, Xaw scroll bars) of 2024-02-13 built on
igm-qws-u22796a
Repository revision: a24a2b1ceb12f11c9d345190fbf554f27c4ec186
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)
Configured using:
'configure -C --with-x-toolkit=lucid 'CFLAGS=-O0 -g3'
--without-native-compilation --without-gif'
Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XINPUT2 XPM LUCID ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
minibuffer-regexp-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd touch-screen tool-bar dnd fontset
image regexp-opt fringe tabulated-list replace newcomment text-mode
lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch
easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo x-toolkit
xinput2 x multi-tty move-toolbar make-network-process emacs)
Memory information:
((conses 16 65052 9318) (symbols 48 9539 0) (strings 32 22452 1449)
(string-bytes 1 659675) (vectors 16 9245)
(vector-slots 8 111110 9295) (floats 8 40 17) (intervals 56 262 0)
(buffers 976 10))