From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 12 Sep 2023 17:23:53 +0300
To: Eli Zaretskii
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org

On 11/09/2023 14:57, Eli Zaretskii wrote:
>> So there is also a second recording for
>> find-directory-files-recursively-2 with read-process-output-max=409600.
>> It does improve the performance significantly (and reduce the number of
>> GC pauses). I guess what I'm still not clear on, is whether the number
>> of GC pauses is fewer because of less consing (the only column that
>> looks significantly different is the 3rd: VECTOR-CELLS), or because the
>> process finishes faster due to larger buffers, which itself causes fewer
>> calls to maybe_gc.
> I think the latter.

It might be both.

To estimate how large the per-chunk overhead might be (CPU and GC
combined), I first implemented the same function in yet another way that
doesn't use :filter (so that the default filter is used). It is still
asynchronous, with parsing happening concurrently with the process:

(defun find-directory-files-recursively-5 (dir regexp
                                           &optional include-directories _p follow-symlinks)
  (cl-assert (null _p) t
             "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
            (append
             (list "find" (file-local-name dir))
             (if follow-symlinks
                 '("-L")
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))
           (remote (file-remote-p dir))
           ;; No :filter is set, so the default filter inserts the
           ;; output into the temp buffer.  The empty sentinel keeps
           ;; the default sentinel's status message out of it.
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find" :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command)))
           start ret)
      (setq start (point-min))
      ;; Each time output arrives, collect the complete NUL-terminated
      ;; entries; a partial trailing entry is left for the next pass.
      (while (accept-process-output proc)
        (goto-char start)
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties start (1- (point))) ret)
          (setq start (point))))
      ret)))

This method already improved the performance somewhat (compared to
find-directory-files-recursively-2), but not by much. So I tried these
next two steps:

- Dropping most of the setup in read_and_dispose_of_process_output
  (which creates some consing too) and calling
  Finternal_default_process_filter directly (call_filter_directly.diff),
  when it is the filter to be used anyway.

- Going around that function entirely, skipping the creation of a Lisp
  string (CHARS -> TEXT) and inserting into the buffer directly (when
  the filter is set to the default, of course). I copied and adapted
  some code from 'call_process' for that
  (read_and_insert_process_output.diff).

Neither is intended as a complete proposal, but here are some
comparisons. Note that either of these patches can only help the
implementations that don't set up a process filter (the naive first one,
and the new parallel number 5 above).

For testing, I used two different repo checkouts that are large enough
to not finish too quickly: gecko-dev and torvalds-linux.

master

| Function                                         | gecko-dev | linux |
|--------------------------------------------------+-----------+-------|
| find-directory-files-recursively                 |      1.69 |  0.41 |
| find-directory-files-recursively-2               |      1.16 |  0.28 |
| find-directory-files-recursively-3               |      0.92 |  0.23 |
| find-directory-files-recursively-5               |      1.07 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.35 |
| find-directory-files-recursively-2 (rpom 409600) |      0.90 |  0.25 |
| find-directory-files-recursively-5 (rpom 409600) |      0.89 |  0.24 |

call_filter_directly.diff (basically, not much difference)

| Function                                         | gecko-dev | linux |
|--------------------------------------------------+-----------+-------|
| find-directory-files-recursively                 |      1.64 |  0.38 |
| find-directory-files-recursively-5               |      1.05 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.36 |
| find-directory-files-recursively-5 (rpom 409600) |      0.91 |  0.25 |

read_and_insert_process_output.diff (noticeable differences)

| Function                                         | gecko-dev | linux |
|--------------------------------------------------+-----------+-------|
| find-directory-files-recursively                 |      1.30 |  0.34 |
| find-directory-files-recursively-5               |      1.03 |  0.25 |
| find-directory-files-recursively (rpom 409600)   |      1.20 |  0.35 |
| find-directory-files-recursively-5 (rpom 409600) | (!!) 0.72 |  0.21 |

So it seems like we have at least two potential ways to implement an
asynchronous file listing routine that is as fast as or faster than the
synchronous one (if only thanks to starting the parsing in parallel).

Combining the last patch with a very large value of
read-process-output-max seems to yield the most benefit, but I'm not
sure whether it's appropriate to just raise that value in our code.

Thoughts?
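
On the parsing side: the loop in find-directory-files-recursively-5 only consumes complete NUL-terminated entries and leaves a partial tail in the buffer for the next pass, so the chunk size should only change how many times the loop runs, not what it returns. Here is a small process-free sketch of that. The function name is hypothetical, the chunks are inserted by hand instead of arriving via accept-process-output, and unlike the function above it reverses the list at the end to return entries in arrival order.

```elisp
;; Hypothetical helper for illustration only.
(defun my/collect-nul-terminated (chunks)
  "Insert CHUNKS (a list of strings) one at a time into a temp buffer.
After each insertion, collect the complete NUL-terminated entries seen
so far.  Return the entries in the order they appeared."
  (with-temp-buffer
    (let ((start (point-min))
          ret)
      (dolist (chunk chunks)
        ;; Simulate a chunk of process output arriving in the buffer.
        (goto-char (point-max))
        (insert chunk)
        ;; Resume parsing where the previous pass stopped.
        (goto-char start)
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties start (1- (point))) ret)
          (setq start (point))))
      (nreverse ret))))

;; An entry split across two chunks is still returned whole:
;; (my/collect-nul-terminated '("a/b\0c" "/d\0"))
;; => ("a/b" "c/d")
```

Feeding the same bytes as a single chunk gives the same list, which is consistent with larger reads affecting only the per-chunk overhead.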