From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 29 Jul 2023 03:12:34 +0300
Message-ID: <59c30342-a7e0-d83b-a128-0faae4cbd633@gutov.dev>
To: Eli Zaretskii
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org
In-Reply-To: <35f8b664-0241-9f96-1aa0-20ca51b2d34c@gutov.dev>

On 27/07/2023 16:30, Dmitry Gutov wrote:
> I can imagine that the filter-based approach necessarily creates more
> strings (to pass to the filter function). Maybe we could increase those
> strings' size (thus reducing the number) by increasing the read buffer
> size?

To go further along this route, first of all, I verified that the input
strings are (almost) all the same length: 4096. They are then parsed into
strings 50-100 characters long, meaning the number of "junk" objects due
to the process-filter approach probably shouldn't matter too much, given
that the number of strings returned is 40-80x larger.

But then I ran these tests with different values of
read-process-output-max, which increases exactly those strings' size,
proportionally reducing their number. The results were:

> (my-bench-rpom 1 default-directory "")

=> (("with-find-p 4096" . "Elapsed time: 0.945478s (0.474680s in 6 GCs)")
    ("with-find-p 40960" . "Elapsed time: 0.760727s (0.395379s in 5 GCs)")
    ("with-find-p 409600" . "Elapsed time: 0.729757s (0.394881s in 5 GCs)"))

where

(defun my-bench-rpom (count path regexp)
  (setq path (expand-file-name path))
  (list
   (cons "with-find-p 4096"
         (let ((read-process-output-max 4096))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))
   (cons "with-find-p 40960"
         (let ((read-process-output-max 40960))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))
   (cons "with-find-p 409600"
         (let ((read-process-output-max 409600))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))))

...with the last iteration showing consistently the same or better
performance than the "sync" version I benchmarked previously.

What does that mean for us? The number of strings in the heap is reduced,
but not by much (again, the result is a list with 43x more elements). The
combined memory taken up by these intermediate strings, which all have to
be garbage-collected, is the same.

It seems like the per-chunk overhead is non-trivial, and affects GC
somehow (but not in a way that just any string would).

In this test, by default, the output produces ~6000 strings and passes
them to the filter function. Meaning, read_and_dispose_of_process_output
is called about 6000 times, producing an overhead of roughly 0.2s.
Something in there must be producing extra work for the GC.

This line seems suspect:

       list3 (outstream, make_lisp_proc (p), text),

It creates 3 conses and one Lisp object (tagged pointer).

But maybe I'm missing something bigger.
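As a rough cross-check of the figures above, the following is a minimal sketch that turns the three benchmark rows into a per-chunk cost. The timings are copied from the results; everything named my-rpom-* is hypothetical and introduced only for illustration. It assumes about 6000 filter calls at the default 4096 setting (the approximate figure quoted above) and that the number of calls scales inversely with read-process-output-max, which was not measured here. The benchmark was run with a count of 1, so the numbers are noisy and should be read as an order-of-magnitude estimate only.

```elisp
;; Timings copied from the benchmark results in the message.
;; Each entry is (READ-PROCESS-OUTPUT-MAX ELAPSED-SECONDS GC-SECONDS).
(defconst my-rpom-results
  '((4096   0.945478 0.474680)
    (40960  0.760727 0.395379)
    (409600 0.729757 0.394881))
  "Benchmark rows; the variable name is hypothetical.")

(defun my-rpom-estimated-chunks (rpom)
  "Estimate the number of filter calls when the read size is RPOM.
ASSUMPTION: about 6000 calls at 4096 bytes, scaling inversely with RPOM."
  (/ (* 6000 4096) rpom))

(defun my-rpom-per-chunk-cost ()
  "Return (TOTAL GC NON-GC), in seconds saved per avoided filter call.
Compares the first (4096) and last (409600) rows of `my-rpom-results'."
  (let* ((small (car my-rpom-results))
         (big (car (last my-rpom-results)))
         ;; Filter calls avoided by reading in larger chunks.
         (avoided (float (- (my-rpom-estimated-chunks (nth 0 small))
                            (my-rpom-estimated-chunks (nth 0 big)))))
         (total (- (nth 1 small) (nth 1 big)))
         (gc (- (nth 2 small) (nth 2 big))))
    (list (/ total avoided)
          (/ gc avoided)
          (/ (- total gc) avoided))))
```

Evaluating (my-rpom-per-chunk-cost) under these assumptions gives roughly 36 microseconds saved per avoided call, of which about 13 microseconds is reported as GC time and about 23 microseconds is not. The total difference between the first and last rows is about 0.216s, in line with the "roughly 0.2s" estimate above.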