From: Ihor Radchenko
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Wed, 26 Jul 2023 09:09:28 +0000
Message-ID: <878rb3m43b.fsf@localhost>
In-Reply-To: <35163e56-607d-9c5b-e3e8-5d5b548b3cb7@gutov.dev>
To: Dmitry Gutov
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, Eli Zaretskii, 64735@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Dmitry Gutov writes:

>> (my-bench 10 "/usr/src/linux/" "")
>>
>> (("built-in" . "Elapsed time: 7.034326s (3.598539s in 14 GCs)")
>>  ("built-in no filename handler alist" . "Elapsed time: 5.907194s (3.698456s in 15 GCs)")
>>  ("with-find" . "Elapsed time: 6.078056s (4.052791s in 16 GCs)")
>>  ("with-find-p" . "Elapsed time: 4.496762s (2.739565s in 11 GCs)")
>>  ("with-find-sync" . "Elapsed time: 3.702760s (1.715160s in 7 GCs)"))
>
> Thanks, for the extra data point in particular. Easy to see how it
> compares to the most efficient use of 'find', right (on GNU/Linux, at
> least)?
>
> It's also something to note that, GC-wise, numbers 1 and 2 are not the
> worst: the time must be spent somewhere else.

Indeed. I did a more detailed analysis in
https://yhetil.org/emacs-devel/87cz0p2xlc.fsf@localhost/

The main contributors in the Lisp versions are, from most to least
significant: (1) file name handlers; (2) regexp matching of the file
names; (3) `nconc' calls in the current `directory-files-recursively'
implementation.

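
To make contributor (3) concrete, here is a minimal sketch of the two
accumulation patterns. The helper names are made up for illustration;
they are not part of `directory-files-recursively' or of the attached
benchmark file. The list size is arbitrary, and timings will depend on
the machine.

```elisp
;; -*- lexical-binding: t; -*-
;; Hypothetical helpers, for illustration only.

(defun my-collect-with-nconc (items)
  "Accumulate ITEMS by appending each element with `nconc'.
Every `nconc' call walks the accumulated list to its last cons,
so the whole loop is O(N^2) in the number of ITEMS."
  (let (result)
    (dolist (item items)
      (setq result (nconc result (list item))))
    result))

(defun my-collect-with-push (items)
  "Accumulate ITEMS with `push' and reverse once at the end.
Each `push' is O(1) and `nreverse' is O(N), so the loop is O(N)."
  (let (result)
    (dolist (item items)
      (push item result))
    (nreverse result)))

;; Compare the two on an arbitrary input size.  `benchmark-run'
;; returns (ELAPSED GC-COUNT GC-ELAPSED).
(let ((items (number-sequence 1 10000)))
  (list (cons "nconc" (benchmark-run 1 (my-collect-with-nconc items)))
        (cons "push" (benchmark-run 1 (my-collect-with-push items)))))
```

Both helpers return the same list; only the cost of building it
differs, and the gap grows with the length of the input.
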
I have modified `directory-files-recursively' to avoid the O(N^2)
`nconc' calls and to bypass regexp matching when REGEXP is nil.

Here are the results (using the attached modified version of your
benchmark file):

(my-bench 10 "/usr/src/linux/" "")

(("built-in" . "Elapsed time: 7.285597s (3.853368s in 6 GCs)")
 ("built-in no filename handler alist" . "Elapsed time: 5.855019s (3.760662s in 6 GCs)")
 ("built-in non-recursive no filename handler alist" . "Elapsed time: 5.817639s (4.326945s in 7 GCs)")
 ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 2.708306s (1.871665s in 3 GCs)")
 ("with-find" . "Elapsed time: 6.082200s (4.262830s in 7 GCs)")
 ("with-find-p" . "Elapsed time: 4.325503s (3.058647s in 5 GCs)")
 ("with-find-sync" . "Elapsed time: 3.267648s (1.903655s in 3 GCs)"))

(let ((gc-cons-threshold most-positive-fixnum))
  (my-bench 10 "/usr/src/linux/" ""))

(("built-in" . "Elapsed time: 2.754473s")
 ("built-in no filename handler alist" . "Elapsed time: 1.322443s")
 ("built-in non-recursive no filename handler alist" . "Elapsed time: 1.235044s")
 ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 0.750275s")
 ("with-find" . "Elapsed time: 1.438510s")
 ("with-find-p" . "Elapsed time: 1.200876s")
 ("with-find-sync" . "Elapsed time: 1.349755s"))

If we set GC aside, the Elisp version gets fairly close to GNU find.
And if we skip regexp matching (which makes sense when REGEXP is ""),
the Elisp version is faster.

--=-=-=
Content-Type: text/plain
Content-Disposition: inline; filename=find-bench.el

;; -*- lexical-binding: t; -*-

(require 'cl-lib)                       ; for `cl-assert'

(defun find-directory-files-recursively (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
            (append
             (list "find" (file-local-name dir))
             (if follow-symlinks
                 '("-L")
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))
           (remote (file-remote-p dir))
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find"
                            :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command))))
      ;; Wait until find has exited and all its output has arrived.
      (while (accept-process-output proc))
      ;; Split the NUL-separated output into a list of file names.
      (let ((start (goto-char (point-min))) ret)
        (while (search-forward "\0" nil t)
          (push (concat remote (buffer-substring-no-properties start (1- (point)))) ret)
          (setq start (point)))
        ret))))

(defun find-directory-files-recursively-2 (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (cl-assert (not (file-remote-p dir)))
  (let* (buffered
         result
         (proc
          (make-process
           :name "find" :buffer nil
           :connection-type 'pipe
           :noquery t
           :sentinel (lambda (_proc _state))
           ;; Parse the output incrementally; BUFFERED holds the
           ;; incomplete file name left over from the previous chunk.
           :filter (lambda (proc data)
                     (let ((start 0))
                       (when-let (end (string-search "\0" data start))
                         (push (concat buffered (substring data start end)) result)
                         (setq buffered "")
                         (setq start (1+ end))
                         (while-let ((end (string-search "\0" data start)))
                           (push (substring data start end) result)
                           (setq start (1+ end))))
                       (setq buffered (concat buffered (substring data start)))))
           :command (append
                     (list "find" (file-local-name dir))
                     (if follow-symlinks
                         '("-L")
                       '("!" "(" "-type" "l" "-xtype" "d" ")"))
                     (unless (string-empty-p regexp)
                       (list "-regex" (concat ".*" regexp ".*")))
                     (unless include-directories
                       '("!" "-type" "d"))
                     '("-print0")))))
    (while (accept-process-output proc))
    result))

(defun find-directory-files-recursively-3 (dir regexp &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t "find-directory-files-recursively can't accept arbitrary predicates")
  (cl-assert (not (file-remote-p dir)))
  (let ((args `(,(file-local-name dir)
                ,@(if follow-symlinks
                      '("-L")
                    '("!" "(" "-type" "l" "-xtype" "d" ")"))
                ,@(unless (string-empty-p regexp)
                    (list "-regex" (concat ".*" regexp ".*")))
                ,@(unless include-directories
                    '("!" "-type" "d"))
                "-print0")))
    (with-temp-buffer
      ;; Run find synchronously and collect its output in this buffer.
      (let ((status (apply #'process-file "find" nil t nil args))
            (pt (point-min))
            res)
        (unless (zerop status)
          (error "Listing failed"))
        (goto-char (point-min))
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties pt (1- (point))) res)
          (setq pt (point)))
        res))))

(defun directory-files-recursively-strip-nconc
    (dir regexp &optional include-directories predicate follow-symlinks)
  "Return list of all files under directory DIR whose names match REGEXP.
Subdirectories are traversed iteratively, and the returned list is
sorted with `string<'.  Each file name appears in the returned list
in its absolute form.

By default, the returned list excludes directories, but if
optional argument INCLUDE-DIRECTORIES is non-nil, they are
included.

PREDICATE can be either nil (which means that all subdirectories
of DIR are descended into), t (which means that subdirectories that
can't be read are ignored), or a function (which is called with
the name of each subdirectory, and should return non-nil if the
subdirectory is to be descended into).

If FOLLOW-SYMLINKS is non-nil, symbolic links that point to
directories are followed.  Note that this can lead to infinite
recursion."
  (let* ((result nil)
         ;; Strip the trailing slash before seeding DIRS, so that names
         ;; joined below do not get a doubled slash (as they would for
         ;; "/usr/src/linux/").
         (dir (directory-file-name dir))
         (dirs (list dir))
         ;; When DIR is "/", remote file names like "/method:" could
         ;; also be offered.  We shall suppress them.
         (tramp-mode (and tramp-mode (file-remote-p (expand-file-name dir)))))
    (while (setq dir (pop dirs))
      (dolist (file (file-name-all-completions "" dir))
        (unless (member file '("./" "../"))
          (if (directory-name-p file)
              (let* ((leaf (substring file 0 (1- (length file))))
                     (full-file (concat dir "/" leaf)))
                ;; Don't follow symlinks to other directories.
                (when (and (or (not (file-symlink-p full-file))
                               follow-symlinks)
                           ;; Allow filtering subdirectories.
                           (or (eq predicate nil)
                               (eq predicate t)
                               (funcall predicate full-file)))
                  (push full-file dirs))
                ;; RESULT is sorted at the end, so `push' is enough here.
                (when (and include-directories
                           (or (null regexp) (string-match regexp leaf)))
                  (push full-file result)))
            ;; A nil REGEXP skips matching and accepts every file.
            (when (or (null regexp) (string-match regexp file))
              (push (concat dir "/" file) result))))))
    (sort result #'string<)))

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  ;; (let ((old (directory-files-recursively path regexp))
  ;;       (new (find-directory-files-recursively-3 path regexp)))
  ;;   (dolist (path old)
  ;;     (unless (member path new) (error "! %s not in" path)))
  ;;   (dolist (path new)
  ;;     (unless (member path old) (error "!! %s not in" path))))
  (list
   (cons "built-in"
         (benchmark count (list 'directory-files-recursively path regexp)))
   (cons "built-in no filename handler alist"
         (let (file-name-handler-alist)
           (benchmark count (list 'directory-files-recursively path regexp))))
   (cons "built-in non-recursive no filename handler alist"
         (let (file-name-handler-alist)
           (benchmark count (list 'directory-files-recursively-strip-nconc path regexp))))
   (cons "built-in non-recursive no filename handler alist + skip re-match"
         (let (file-name-handler-alist)
           (benchmark count (list 'directory-files-recursively-strip-nconc path nil))))
   (cons "with-find"
         (benchmark count (list 'find-directory-files-recursively path regexp)))
   (cons "with-find-p"
         (benchmark count (list 'find-directory-files-recursively-2 path regexp)))
   (cons "with-find-sync"
         (benchmark count (list 'find-directory-files-recursively-3 path regexp)))))

(provide 'find-bench)

--=-=-=
Content-Type: text/plain

-- 
Ihor Radchenko // yantar92,
Org mode contributor

--=-=-=--