From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 16:30:56 +0300
Message-ID: <35f8b664-0241-9f96-1aa0-20ca51b2d34c@gutov.dev>
To: Eli Zaretskii
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org
In-Reply-To: <83v8e6lyi4.fsf@gnu.org>

On 27/07/2023 08:22, Eli Zaretskii wrote:
>> Date: Thu, 27 Jul 2023 03:41:29 +0300
>> Cc: Eli Zaretskii, luangruo@yahoo.com, sbaugh@janestreet.com,
>>  64735@debbugs.gnu.org
>> From: Dmitry Gutov
>>
>>> I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
>>> calls + bypassing regexp matches when REGEXP is nil.
>>
>> Sounds good. I haven't examined the diff closely, but it sounds like an
>> improvement that can be applied irrespective of how this discussion ends.
>
> That change should be submitted as a separate issue and discussed in
> detail before we decide we can make it.

Sure.

>>> If we forget about GC, Elisp version can get fairly close to GNU find.
>>> And if we do not perform regexp matching (which makes sense when the
>>> REGEXP is ""), Elisp version is faster.
>>
>> We can't really forget about GC, though.
>
> But we could temporarily lift the threshold while this function runs,
> if that leads to significant savings.

I mean, everything's doable, but if we do this for this function, why
not others? Most long-running code would see an improvement from that
kind of change (the 'find'-based solutions too).

IIRC the main drawback is running out of memory in extreme conditions or
on low-memory platforms/devices. It's not like this feature is
particularly protected from this.
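
For reference, "temporarily lifting the threshold" would presumably look
like the sketch below: a let-binding of gc-cons-threshold around the
call. The function name and the 512 MB figure are made up for
illustration and are not taken from any patch in this thread.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my/list-files-with-relaxed-gc (dir)
  "Return all files under DIR, with a raised GC threshold while listing.
The let-binding is undone when the function returns, so the rest of
the session keeps its normal `gc-cons-threshold'."
  ;; 512 MB is an arbitrary example value.  A larger value means fewer
  ;; GC pauses but more memory held before the next collection.
  (let ((gc-cons-threshold (* 512 1024 1024)))
    ;; A REGEXP of "" matches every file name.
    (directory-files-recursively dir "")))
```

The same wrapper could be put around any long-running function, which is
the point above: nothing makes this one special, and the memory risk is
the same everywhere.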

>> But the above numbers make me hopeful about the async-parallel solution,
>> implying that the parallelization really can help (and offset whatever
>> latency we lose on pselect), as soon as we determine the source of extra
>> consing and decide what to do about it.
>
> Isn't it clear that additional consing comes from the fact that we
> first insert the Find's output into a buffer or produce a string from
> it, and then chop that into individual file names?

But we do that in all 'find'-based solutions: the synchronous one takes
buffer text and chops it into strings. The first asynchronous one does
the same. The other ("with-find-p") works from a process filter,
chopping up strings that get passed to it.

But the amount of time spent in GC is different, with most of the
difference in performance attributable to it: if we subtract time spent
in GC, the runtimes are approximately equal.

I can imagine that the filter-based approach necessarily creates more
strings (to pass to the filter function). Maybe we could increase those
strings' size (thus reducing the number) by increasing the read buffer
size? I haven't found a relevant variable, though.

Or if there was some other callback that runs after the next chunk of
output arrives from the process, we could parse it from the buffer. But
the insertion into the buffer would need to be made efficient
(apparently internal-default-process-filter currently uses the same
sequence of strings as the other filters for input, with the same amount
of consing).
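
To make the consing concrete, here is a rough sketch of what a
filter-based reader has to do. It is not the "with-find-p" code; the
names my/split-complete-lines, my/find-filter, my/find-files-async and
the process properties my-pending and my-files are all invented for
this example. Every call receives a freshly allocated chunk string,
concatenates it with the leftover partial line, and then allocates one
more string per file name.

```elisp
;; Illustrative sketch only; all `my/' names are hypothetical.
(defun my/split-complete-lines (pending chunk)
  "Split PENDING followed by CHUNK into newline-terminated lines.
Return (LINES . REST): LINES are the complete lines in order, REST is
the unterminated tail to carry over to the next chunk."
  (let ((text (concat pending chunk))   ; one new string per chunk
        (start 0)
        (lines nil)
        end)
    (while (setq end (string-match "\n" text start))
      (push (substring text start end) lines) ; one new string per name
      (setq start (1+ end)))
    (cons (nreverse lines) (substring text start))))

(defun my/find-filter (proc chunk)
  "Process filter: accumulate file names from CHUNK on PROC.
Names are stored newest-first in the `my-files' process property."
  (let ((result (my/split-complete-lines
                 (process-get proc 'my-pending) chunk)))
    (process-put proc 'my-pending (cdr result))
    ;; Prepend the new names, so each call costs O(new names) only.
    (process-put proc 'my-files
                 (nconc (nreverse (car result))
                        (process-get proc 'my-files)))))

(defun my/find-files-async (dir)
  "Start `find' on DIR and return the process object."
  (make-process :name "my-find"
                :command (list "find" dir "-type" "f")
                :connection-type 'pipe
                :noquery t
                :filter #'my/find-filter))
```

With small chunks, the per-chunk strings (the chunk itself plus the
concat result) come on top of the per-name strings that every approach
has to allocate, which would fit the difference in GC time described
above.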