From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#71094: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Wed, 22 May 2024 17:22:56 +0300
References: <86ttiq6or8.fsf@gnu.org> <8aedd0ed-58fe-4ac7-98d6-950be2d4700b@gutov.dev> <868r026jlq.fsf@gnu.org>
In-Reply-To: <868r026jlq.fsf@gnu.org>
To: Eli Zaretskii
Cc: sbaugh@janestreet.com, 71094@debbugs.gnu.org, rgm@gnu.org

On 22/05/2024
16:50, Eli Zaretskii wrote:
>> Date: Wed, 22 May 2024 15:34:06 +0300
>> Cc: 71094@debbugs.gnu.org, rgm@gnu.org
>> From: Dmitry Gutov
>>
>> On 22/05/2024 14:59, Eli Zaretskii wrote:
>>
>>> With how many files did you measure the 40% speedup? Can you show the
>>> performance with much fewer and much more files than what you used?
>>
>> FWIW my test indicated that for a smaller project (such as Emacs) the
>> difference is fairly small - the new code is slightly better or the same.
>>
>> The directory where I saw significant improvement has 300K files.
>
> That's what I thought. So we are changing the decade-old defaults to
> favor huge directories, which is not necessarily the wisest thing to
> do.

I don't see any regression on small directories, though. And an
improvement on big ones. So the way I see it, we're expanding Emacs's
applicability to a wider audience without any apparent drawbacks.

It might actually give us an improvement in smaller projects as well, if
we decrease xargs's batch size (with -s or -n). But those are fairly
fast already, so it's not critical.

>>> I
>>> suspect that the effect depends on that. (It also depends on the
>>> system limit on the number of files and the length of the command line
>>> that xargs can use.) The argument about 'find' waiting is no longer
>>> relevant with 'exec-plus', since in most cases there will be just one
>>> invocation of 'grep'.
>>
>> If there's just one invocation, wouldn't that mean that it will happen
>> at the end of the full directory scan? Rather than in parallel.
>
> That's true, but what is your mental model of how the pipe with xargs
> works in practice? How many invocations of grep will xargs do, and
> when will the first invocation happen?

In my mental model xargs acts like an asynchronous queue with batch
processing. The first invocation will happen after the output reaches
the configured maximum command line length or maximum number of
arguments. Both are system-dependent by default.

For example, on my system 'xargs --show-limits' says

  Size of command buffer we are actually using: 131072

Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
characters. That is only about 1.5 command buffers' worth of file
names, so 'grep' gets invoked just twice, the second time with a
roughly half-full batch. To see better parallelism there we'll need to
either lower the limit or test it in a project at least twice as big.

So here is another example: a Linux kernel checkout (76K files). Also
about 30% improvement: 1.40s vs 2.00s.
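
To make the batching model above concrete, here is a rough Python sketch. It is only an approximation of what xargs does, not its actual algorithm: it assumes each file name costs its length plus one separator character, and it ignores the command name, the environment and the argument-count limit. The function name batch_paths and the demo file list are made up for illustration; the two constants are the figures quoted above.

```python
import math
from typing import Iterable, Iterator, List


def batch_paths(paths: Iterable[str], max_chars: int) -> Iterator[List[str]]:
    """Group paths into batches, flushing when the buffer would overflow.

    Simplified model: each path costs len(path) + 1 characters.
    """
    batch: List[str] = []
    used = 0
    for path in paths:
        cost = len(path) + 1
        if batch and used + cost > max_chars:
            yield batch  # in the model, one grep starts here
            batch, used = [], 0
        batch.append(path)
        used += cost
    if batch:
        yield batch  # final, possibly partial, batch


# Figures quoted in this message.
BUFFER_SIZE = 131072        # from 'xargs --show-limits'
FIND_OUTPUT_CHARS = 202928  # from "find ... -print0 | wc" in the Emacs repository

ratio = FIND_OUTPUT_CHARS / BUFFER_SIZE
invocations = math.ceil(ratio)
print(f"buffers filled: {ratio:.2f}, grep invocations in this model: {invocations}")

# Timings quoted for the Linux kernel checkout.
improvement = 1 - 1.40 / 2.00
print(f"improvement: {improvement:.0%}")

# Hypothetical file list, only to exercise the batching function.
demo = [f"src/file{i:04d}.c" for i in range(10)]
for number, batch in enumerate(batch_paths(demo, max_chars=64), start=1):
    print(f"batch {number}: {len(batch)} files")
```

In this model the first grep cannot start until a whole buffer has been filled, which is why a smaller limit (-s or -n) or a larger tree would be needed to see more overlap between find and grep.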