From: sbaugh@catern.com
To: Eli Zaretskii <eliz@gnu.org>
Cc: Spencer Baugh <sbaugh@janestreet.com>, 69775@debbugs.gnu.org
Subject: bug#69775: [PATCH] Use regexp-opt in dired-omit-regexp
Date: Sat, 16 Mar 2024 17:15:52 +0000 (UTC)
Message-ID: <8734sqjdyz.fsf@catern.com>
In-Reply-To: <86o7bh9iz3.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 14 Mar 2024 13:00:00 +0200")
[-- Attachment #1: Type: text/plain, Size: 4170 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Wed, 13 Mar 2024 11:01:05 -0400
>>
>> In my benchmarking, for large dired buffers, using regexp-opt provides
>> around a 3x speedup in omitting.
>
> Can you show a recipe for such benchmarking? I'd like to try that on
> my systems.
>
> Also, what is the slowdown in the (improbable, but possible) case
> where dired-omit-extensions change for each call of dired-omit-regexp?
Yes, run the following after applying the patch:
(require 'dired)
(require 'dired-x)
(require 'cl-lib)
;; Copy of the pre-patch `dired-omit-regexp', kept for comparison.
(defun dired-omit-regexp-old ()
  (concat (if dired-omit-files (concat "\\(" dired-omit-files "\\)") "")
          (if (and dired-omit-files dired-omit-extensions) "\\|" "")
          (if dired-omit-extensions
              (concat ".";; a non-extension part should exist
                      "\\("
                      (mapconcat 'regexp-quote dired-omit-extensions "\\|")
                      "\\)$")
            "")))
(defun my-do-omit (mode)
  (let ((regexp
         (cl-case mode
           (new (dired-omit-regexp))
           (old (dired-omit-regexp-old))
           (new-uncached (let ((dired-omit--extension-regexp-cache nil)) (dired-omit-regexp)))
           (t (error "Bad mode %s" mode)))))
    (dired-mark-if
     (let ((fn (dired-get-filename nil t)))
       (and fn (string-match-p regexp fn)))
     nil)))
(defun my-bench-omit (nfiles ntimes)
  (let ((default-directory (expand-file-name "test-dired-list")))
    (make-directory default-directory t)
    (dolist (file (directory-files "." t "test-file"))
      (delete-file file))
    (dotimes (i nfiles)
      (write-region "" nil (format "test-file%s" i) nil 'nomessage nil 'excl))
    (let ((dired-omit-mode nil))
      (with-current-buffer (let ((inhibit-message t)) (dired-noselect default-directory))
        (revert-buffer)
        (message "files %s, ntimes %s: new %s old %s new-uncached %s"
                 nfiles ntimes
                 (car (benchmark-call (lambda () (my-do-omit 'new)) ntimes))
                 (car (benchmark-call (lambda () (my-do-omit 'old)) ntimes))
                 (car (benchmark-call (lambda () (my-do-omit 'new-uncached)) ntimes)))
        ))))
(my-bench-omit 1 100)
(my-bench-omit 10 100)
(my-bench-omit 100 100)
(my-bench-omit 1000 100)
(my-bench-omit 10000 100)
On my machine, I get:
$ ./src/emacs -Q --batch -l ../emacs-29/bench-omit.elc
files 1, ntimes 100: new 0.008839979999999999 old 0.018162129 new-uncached 0.031399762
files 10, ntimes 100: new 0.012037615 old 0.040232355000000004 new-uncached 0.037990543
files 100, ntimes 100: new 0.07368538100000001 old 0.314905271 new-uncached 0.10006527300000001
files 1000, ntimes 100: new 0.669103498 old 3.076339984 new-uncached 0.693134644
files 10000, ntimes 100: new 6.336211434 old 30.926320486 new-uncached 6.442762152999999
So the performance improvement is quite substantial for large
directories: in this run the new code is roughly 4x-5x faster than the
old code at 100 files and above.
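As a rough cross-check (not part of the patch), the ratios can be
computed directly from the timings printed above. The names my-timings
and my-speedups are hypothetical and exist only for this sketch; the
numbers are copied verbatim from the benchmark output.

```elisp
;; Timings copied from the benchmark output above (seconds for 100 runs).
;; `my-timings' and `my-speedups' are hypothetical names used only here.
(defvar my-timings
  '((1     0.008839979999999999 0.018162129          0.031399762)
    (10    0.012037615          0.040232355000000004 0.037990543)
    (100   0.07368538100000001  0.314905271          0.10006527300000001)
    (1000  0.669103498          3.076339984          0.693134644)
    (10000 6.336211434          30.926320486         6.442762152999999))
  "Rows of (NFILES NEW OLD NEW-UNCACHED).")

(defun my-speedups (row)
  "Return (NFILES OLD/NEW OLD/NEW-UNCACHED) for ROW of `my-timings'."
  (let ((nfiles (nth 0 row))
        (new (nth 1 row))
        (old (nth 2 row))
        (uncached (nth 3 row)))
    (list nfiles (/ old new) (/ old uncached))))

;; Print one line per directory size; a ratio above 1 means the old
;; code was slower than the variant being compared.
(dolist (row my-timings)
  (apply #'message "files %5d: old/new %.2f, old/new-uncached %.2f"
         (my-speedups row)))
```

A ratio below 1 in the last column marks the sizes where recomputing
regexp-opt on every call costs more than it saves.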
new-uncached is the performance if dired-omit-extensions changes on each
call of dired-omit-regexp. For a directory of 1 file, the overhead of
recomputing regexp-opt every time makes it about 1.7x slower than the
old code (and about 3.5x slower than the cached version). At around 10
files the improvement from regexp-opt offsets that overhead, and above
that the uncached version still outperforms the old version
substantially.
If dired-omit-extensions doesn't change every time, the performance is
improved even for directories of 1 file.
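For illustration only, here is a minimal sketch of the two ways the
extension part of the regexp is built, checking that both select the
same names. The sample extensions and file names are made up, and the
my-* helpers are hypothetical; only regexp-opt, regexp-quote, mapconcat
and string-match-p are real Emacs functions.

```elisp
(require 'cl-lib)

;; Hypothetical sample data, chosen only for illustration.
(defvar my-sample-extensions '(".o" ".elc" "~" ".bin"))
(defvar my-sample-names
  '("foo.o" "foo.elc" "foo.el" "backup~" ".o" "notes.txt"))

(defun my-old-extension-regexp (extensions)
  "Extension part of the regexp as built before the patch."
  (concat "." "\\(" (mapconcat #'regexp-quote extensions "\\|") "\\)$"))

(defun my-new-extension-regexp (extensions)
  "Extension part of the regexp as built with `regexp-opt'."
  (concat "." (regexp-opt extensions) "$"))

(defun my-matching-names (regexp names)
  "Return the elements of NAMES that match REGEXP."
  (cl-remove-if-not (lambda (name) (string-match-p regexp name)) names))

;; Both calls should print the same list of names.
(message "old: %S"
         (my-matching-names
          (my-old-extension-regexp my-sample-extensions) my-sample-names))
(message "new: %S"
         (my-matching-names
          (my-new-extension-regexp my-sample-extensions) my-sample-names))
```

The leading "." requires at least one character before the extension,
so a name consisting of just ".o" is not omitted by either form.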
>> regexp-opt takes around 5 milliseconds, so to avoid slowing down
>> omitting in small dired buffers we cache the return value.
>>
>> Since omitting is now 3x faster, increase dired-omit-size-limit by 3x.
>>
>> * lisp/dired-x.el (dired-omit--extension-regexp-cache): Add.
>> (dired-omit-regexp): Use regexp-opt.
>> (dired-omit-size-limit): Increase, since omitting is now faster.
>
> I'm okay with these changes, but:
>
> . the change in the default value of dired-omit-size-limit should be
> called out in NEWS
> . please document this variable in the dired-x.texi manual, where we
> document all the other variables relevant to dired-omit mode.
> . the doc string of dired-omit-size-limit is embarrassingly
> unhelpful, so bonus points for fixing that as well
>
> Thanks.
Certainly, updated patch attached.
[-- Attachment #2: 0001-Use-regexp-opt-in-dired-omit-regexp.patch --]
[-- Type: text/x-patch, Size: 4598 bytes --]
From ca0ff9d40b85b281e25b998528c1e1a71b9e70c5 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@catern.com>
Date: Sat, 16 Mar 2024 17:11:24 +0000
Subject: [PATCH] Use regexp-opt in dired-omit-regexp
In my benchmarking, for large dired buffers, using regexp-opt provides
around a 3x speedup in omitting.
regexp-opt takes around 5 milliseconds, so to avoid slowing down
omitting in small dired buffers we cache the return value.
Since omitting is now 3x faster, increase dired-omit-size-limit by 3x.
Also, document dired-omit-size-limit better.
* doc/misc/dired-x.texi (Omitting Variables): Document
dired-omit-size-limit.
* etc/NEWS: Announce increase of dired-omit-size-limit.
* lisp/dired-x.el (dired-omit--extension-regexp-cache): Add.
(dired-omit-regexp): Use regexp-opt. (bug#69775)
(dired-omit-size-limit): Increase and improve docs.
---
doc/misc/dired-x.texi | 8 ++++++++
etc/NEWS | 6 ++++++
lisp/dired-x.el | 26 ++++++++++++++++++++------
3 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/doc/misc/dired-x.texi b/doc/misc/dired-x.texi
index 4cad016a0f6..66045c5f759 100644
--- a/doc/misc/dired-x.texi
+++ b/doc/misc/dired-x.texi
@@ -346,6 +346,14 @@ Omitting Variables
match the file name relative to the buffer's top-level directory.
@end defvar
+@defvar dired-omit-size-limit
+If non-@code{nil}, omitting will be skipped if the directory listing
+exceeds this size in bytes. Since omitting can be slow for very large
+directories, this avoids having to wait before seeing the directory.
+This variable is ignored when @code{dired-omit-mode} is called
+interactively, such as by @code{C-x M-o}, so you can still enable
+omitting in the directory after the initial display.
+
@cindex omitting additional files
@defvar dired-omit-marker-char
Temporary marker used by Dired to implement omitting. Should never be used
diff --git a/etc/NEWS b/etc/NEWS
index b4a1c887f2e..fdface7aa0c 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -668,6 +668,12 @@ marked or clicked on files according to the OS conventions. For
example, on systems supporting XDG, this runs 'xdg-open' on the
files.
+*** The default value of 'dired-omit-size-limit' has increased.
+After performance improvements to omitting in large directories, the new
+default value is 300k, up from 100k. This means 'dired-omit-mode' will
+omit files in directories whose directory listing is up to 300 kilobytes
+in size.
+
+++
*** 'dired-listing-switches' handles connection-local values if exist.
This allows to customize different switches for different remote machines.
diff --git a/lisp/dired-x.el b/lisp/dired-x.el
index 62fdd916e69..d7d15028489 100644
--- a/lisp/dired-x.el
+++ b/lisp/dired-x.el
@@ -77,12 +77,17 @@ dired-vm-read-only-folders
(other :tag "non-writable only" if-file-read-only))
:group 'dired-x)
-(defcustom dired-omit-size-limit 100000
- "Maximum size for the \"omitting\" feature.
+(defcustom dired-omit-size-limit 300000
+ "Maximum buffer size for `dired-omit-mode'.
+
+Omitting will be skipped if the directory listing exceeds this size in
+bytes. This variable is ignored when `dired-omit-mode' is called
+interactively.
+
If nil, there is no maximum size."
:type '(choice (const :tag "no maximum" nil) integer)
:group 'dired-x
- :version "29.1")
+ :version "30.1")
(defcustom dired-omit-case-fold 'filesystem
"Determine whether \"omitting\" patterns are case-sensitive.
@@ -506,14 +511,23 @@ dired-omit-expunge
(re-search-forward dired-re-mark nil t))))
count)))
+(defvar dired-omit--extension-regexp-cache
+ nil
+ "A cache of `regexp-opt' applied to `dired-omit-extensions'.
+
+This is a cons whose car is a list of strings and whose cdr is a
+regexp produced by `regexp-opt'.")
+
(defun dired-omit-regexp ()
+ (unless (equal dired-omit-extensions (car dired-omit--extension-regexp-cache))
+ (setq dired-omit--extension-regexp-cache
+ (cons dired-omit-extensions (regexp-opt dired-omit-extensions))))
(concat (if dired-omit-files (concat "\\(" dired-omit-files "\\)") "")
(if (and dired-omit-files dired-omit-extensions) "\\|" "")
(if dired-omit-extensions
(concat ".";; a non-extension part should exist
- "\\("
- (mapconcat 'regexp-quote dired-omit-extensions "\\|")
- "\\)$")
+ (cdr dired-omit--extension-regexp-cache)
+ "$")
"")))
;; Returns t if any work was done, nil otherwise.
--
2.44.0