From: sbaugh@catern.com
To: Eli Zaretskii <eliz@gnu.org>
Cc: Spencer Baugh <sbaugh@janestreet.com>, 69775@debbugs.gnu.org
Subject: bug#69775: [PATCH] Use regexp-opt in dired-omit-regexp
Date: Sat, 16 Mar 2024 17:15:52 +0000 (UTC)
Message-ID: <8734sqjdyz.fsf@catern.com>
In-Reply-To: <86o7bh9iz3.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 14 Mar 2024 13:00:00 +0200")

[-- Attachment #1: Type: text/plain, Size: 4170 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Wed, 13 Mar 2024 11:01:05 -0400
>> 
>> In my benchmarking, for large dired buffers, using regexp-opt provides
>> around a 3x speedup in omitting.
>
> Can you show a recipe for such benchmarking?  I'd like to try that on
> my systems.
>
> Also, what is the slowdown in the (improbable, but possible) case
> where dired-omit-extensions change for each call of dired-omit-regexp?

Yes, run the following after applying the patch:

(require 'dired)
(require 'dired-x)
(require 'cl-lib)

(defun dired-omit-regexp-old ()
  (concat (if dired-omit-files (concat "\\(" dired-omit-files "\\)") "")
          (if (and dired-omit-files dired-omit-extensions) "\\|" "")
          (if dired-omit-extensions
              (concat ".";; a non-extension part should exist
                      "\\("
                      (mapconcat 'regexp-quote dired-omit-extensions "\\|")
                      "\\)$")
            "")))

(defun my-do-omit (mode)
  (let ((regexp
	 (cl-case mode
	   (new (dired-omit-regexp))
	   (old (dired-omit-regexp-old))
	   (new-uncached (let ((dired-omit--extension-regexp-cache nil)) (dired-omit-regexp)))
	   (t (error "Bad mode %s" mode)))))
    (dired-mark-if
     (let ((fn (dired-get-filename nil t)))
       (and fn (string-match-p regexp fn)))
     nil)))

(defun my-bench-omit (nfiles ntimes)
  (let ((default-directory (expand-file-name "test-dired-list")))
    (make-directory default-directory t)
    (dolist (file (directory-files "." t "test-file"))
      (delete-file file))
    (dotimes (i nfiles)
      (write-region "" nil (format "test-file%s" i) nil 'nomessage nil 'excl))
    (let ((dired-omit-mode nil))
      (with-current-buffer (let ((inhibit-message t)) (dired-noselect default-directory))
	(revert-buffer)
	(message "files %s, ntimes %s: new %s old %s new-uncached %s"
		 nfiles ntimes
		 (car (benchmark-call (lambda () (my-do-omit 'new)) ntimes))
		 (car (benchmark-call (lambda () (my-do-omit 'old)) ntimes))
		 (car (benchmark-call (lambda () (my-do-omit 'new-uncached)) ntimes)))))))

(my-bench-omit 1 100)
(my-bench-omit 10 100)
(my-bench-omit 100 100)
(my-bench-omit 1000 100)
(my-bench-omit 10000 100)

Here is what I get:
$ ./src/emacs -Q --batch -l ../emacs-29/bench-omit.elc
files 1, ntimes 100: new 0.008839979999999999 old 0.018162129 new-uncached 0.031399762
files 10, ntimes 100: new 0.012037615 old 0.040232355000000004 new-uncached 0.037990543
files 100, ntimes 100: new 0.07368538100000001 old 0.314905271 new-uncached 0.10006527300000001
files 1000, ntimes 100: new 0.669103498 old 3.076339984 new-uncached 0.693134644
files 10000, ntimes 100: new 6.336211434 old 30.926320486 new-uncached 6.442762152999999

So the performance improvement is quite substantial for large
directories: about 4.6x at 1000 files and 4.9x at 10000 files in the
run above.

new-uncached is the performance if dired-omit-extensions changes on each
call of dired-omit-regexp.  For a directory of 1 file, the overhead of
recomputing regexp-opt every time makes it about 1.7x slower than the
old code (and about 3.5x slower than the cached version).  At around 10
files the improvement from regexp-opt roughly cancels out the overhead,
and above that the uncached version still outperforms the old version
substantially.
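
As a rough illustration of why the two forms are interchangeable (a
sketch only, not part of the patch: my-exts and the my-... helpers are
made-up names, and the extension list is hypothetical), the old
mapconcat-based regexp and the regexp-opt-based one should accept the
same file names:

```elisp
;; Sketch only: `my-exts' is a made-up stand-in for `dired-omit-extensions'.
(require 'cl-lib)

(defvar my-exts '(".o" ".elc" ".el~" ".orig")
  "Hypothetical extension list used only for this illustration.")

(defun my-old-style-regexp (exts)
  "Build the extension regexp from EXTS the way the old code did."
  (concat "." "\\(" (mapconcat #'regexp-quote exts "\\|") "\\)$"))

(defun my-new-style-regexp (exts)
  "Build the extension regexp from EXTS with `regexp-opt', as the patch does."
  (concat "." (regexp-opt exts) "$"))

(defun my-regexps-agree-p (exts names)
  "Return non-nil if both regexps built from EXTS accept the same NAMES."
  (let ((old (my-old-style-regexp exts))
        (new (my-new-style-regexp exts))
        (case-fold-search nil))
    (cl-every (lambda (name)
                ;; Normalize match positions to t/nil before comparing.
                (eq (and (string-match-p old name) t)
                    (and (string-match-p new name) t)))
              names)))
```

For example, (my-regexps-agree-p my-exts '("foo.o" "foo.el" ".o"))
should return non-nil: both regexps match "foo.o" and reject the other
two names (".o" has no non-extension part).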

If dired-omit-extensions doesn't change every time, the performance is
improved even for directories of 1 file.
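
The caching idea can be sketched on its own like this.  This is only an
illustration under assumed names (my-regexp-cache, my-regexp-opt-calls
and my-cached-regexp-opt are hypothetical and not in the patch); the
call counter exists purely to make the cache hits visible:

```elisp
(defvar my-regexp-cache nil
  "Hypothetical cache: a cons (STRINGS . REGEXP), or nil when empty.")

(defvar my-regexp-opt-calls 0
  "Counts how often `regexp-opt' actually ran; for illustration only.")

(defun my-cached-regexp-opt (strings)
  "Return `regexp-opt' of STRINGS, recomputing only when STRINGS changed."
  (unless (and my-regexp-cache
               (equal strings (car my-regexp-cache)))
    ;; Cache miss: remember both the input list and the computed regexp.
    (setq my-regexp-opt-calls (1+ my-regexp-opt-calls))
    (setq my-regexp-cache (cons strings (regexp-opt strings))))
  (cdr my-regexp-cache))
```

Calling my-cached-regexp-opt twice with equal lists runs regexp-opt
once; calling it with a different list recomputes.  One caveat of
comparing against the stored list object with equal: a list that is
destructively modified in place would still compare equal to itself, so
storing (copy-sequence strings) instead might be safer if that matters.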

>> regexp-opt takes around 5 milliseconds, so to avoid slowing down
>> omitting in small dired buffers we cache the return value.
>> 
>> Since omitting is now 3x faster, increase dired-omit-size-limit by 3x.
>> 
>> * lisp/dired-x.el (dired-omit--extension-regexp-cache): Add.
>> (dired-omit-regexp): Use regexp-opt.
>> (dired-omit-size-limit): Increase, since omitting is now faster.
>
> I'm okay with these changes, but:
>
>   . the change in the default value of dired-omit-size-limit should be
>     called out in NEWS
>   . please document this variable in the dired-x.texi manual, where we
>     document all the other variables relevant to dired-omit mode.
>   . the doc string of dired-omit-size-limit is embarrassingly
>     unhelpful, so bonus points for fixing that as well
>
> Thanks.

Certainly, updated patch attached.


[-- Attachment #2: 0001-Use-regexp-opt-in-dired-omit-regexp.patch --]
[-- Type: text/x-patch, Size: 4598 bytes --]

From ca0ff9d40b85b281e25b998528c1e1a71b9e70c5 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@catern.com>
Date: Sat, 16 Mar 2024 17:11:24 +0000
Subject: [PATCH] Use regexp-opt in dired-omit-regexp

In my benchmarking, for large dired buffers, using regexp-opt provides
around a 3x speedup in omitting.

regexp-opt takes around 5 milliseconds, so to avoid slowing down
omitting in small dired buffers we cache the return value.

Since omitting is now 3x faster, increase dired-omit-size-limit by 3x.
Also, document dired-omit-size-limit better.

* doc/misc/dired-x.texi (Omitting Variables): Document
dired-omit-size-limit.
* etc/NEWS: Announce increase of dired-omit-size-limit.
* lisp/dired-x.el (dired-omit--extension-regexp-cache): Add.
(dired-omit-regexp): Use regexp-opt. (bug#69775)
(dired-omit-size-limit): Increase and improve docs.
---
 doc/misc/dired-x.texi |  9 +++++++++
 etc/NEWS              |  6 ++++++
 lisp/dired-x.el       | 26 ++++++++++++++++++++------
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/doc/misc/dired-x.texi b/doc/misc/dired-x.texi
index 4cad016a0f6..66045c5f759 100644
--- a/doc/misc/dired-x.texi
+++ b/doc/misc/dired-x.texi
@@ -346,6 +346,15 @@ Omitting Variables
 match the file name relative to the buffer's top-level directory.
 @end defvar
 
+@defvar dired-omit-size-limit
+If non-@code{nil}, omitting will be skipped if the directory listing
+exceeds this size in bytes.  Since omitting can be slow for very large
+directories, this avoids having to wait before seeing the directory.
+This variable is ignored when @code{dired-omit-mode} is called
+interactively, such as by @kbd{C-x M-o}, so you can still enable
+omitting in the directory after the initial display.
+@end defvar
+
 @cindex omitting additional files
 @defvar dired-omit-marker-char
 Temporary marker used by Dired to implement omitting.  Should never be used
diff --git a/etc/NEWS b/etc/NEWS
index b4a1c887f2e..fdface7aa0c 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -668,6 +668,12 @@ marked or clicked on files according to the OS conventions.  For
 example, on systems supporting XDG, this runs 'xdg-open' on the
 files.
 
+*** The default value of 'dired-omit-size-limit' has increased.
+After performance improvements to omitting in large directories, the new
+default value is 300k, up from 100k.  This means 'dired-omit-mode' will
+omit files in directories whose directory listing is up to 300 kilobytes
+in size.
+
 +++
 *** 'dired-listing-switches' handles connection-local values if exist.
 This allows to customize different switches for different remote machines.
diff --git a/lisp/dired-x.el b/lisp/dired-x.el
index 62fdd916e69..d7d15028489 100644
--- a/lisp/dired-x.el
+++ b/lisp/dired-x.el
@@ -77,12 +77,17 @@ dired-vm-read-only-folders
 		 (other :tag "non-writable only" if-file-read-only))
   :group 'dired-x)
 
-(defcustom dired-omit-size-limit 100000
-  "Maximum size for the \"omitting\" feature.
+(defcustom dired-omit-size-limit 300000
+  "Maximum buffer size for `dired-omit-mode'.
+
+Omitting will be skipped if the directory listing exceeds this size in
+bytes.  This variable is ignored when `dired-omit-mode' is called
+interactively.
+
 If nil, there is no maximum size."
   :type '(choice (const :tag "no maximum" nil) integer)
   :group 'dired-x
-  :version "29.1")
+  :version "30.1")
 
 (defcustom dired-omit-case-fold 'filesystem
   "Determine whether \"omitting\" patterns are case-sensitive.
@@ -506,14 +511,23 @@ dired-omit-expunge
                                       (re-search-forward dired-re-mark nil t))))
         count)))
 
+(defvar dired-omit--extension-regexp-cache
+  nil
+  "A cache of `regexp-opt' applied to `dired-omit-extensions'.
+
+This is a cons whose car is a list of strings and whose cdr is a
+regexp produced by `regexp-opt'.")
+
 (defun dired-omit-regexp ()
+  (unless (equal dired-omit-extensions (car dired-omit--extension-regexp-cache))
+    (setq dired-omit--extension-regexp-cache
+          (cons dired-omit-extensions (regexp-opt dired-omit-extensions))))
   (concat (if dired-omit-files (concat "\\(" dired-omit-files "\\)") "")
           (if (and dired-omit-files dired-omit-extensions) "\\|" "")
           (if dired-omit-extensions
               (concat ".";; a non-extension part should exist
-                      "\\("
-                      (mapconcat 'regexp-quote dired-omit-extensions "\\|")
-                      "\\)$")
+                      (cdr dired-omit--extension-regexp-cache)
+                      "$")
             "")))
 
 ;; Returns t if any work was done, nil otherwise.
-- 
2.44.0


Thread overview: 6+ messages
2024-03-13 15:01 bug#69775: [PATCH] Use regexp-opt in dired-omit-regexp Spencer Baugh
2024-03-14 11:00 ` Eli Zaretskii
2024-03-16 17:15   ` sbaugh [this message]
2024-03-21 10:38     ` Eli Zaretskii
2024-03-23 13:29       ` sbaugh
2024-03-23 17:11         ` Eli Zaretskii