From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Why does not rgrep use "grep -r"?
Date: Sat, 03 Nov 2007 10:43:03 +0100
Message-ID: <85bqabslzs.fsf@lola.goethe.zz>
References: <472B99AB.3090705@gmail.com> <87ve8k8dyw.fsf@catnip.gol.com>
 <472BB921.20104@gmail.com> <87d4us6rnl.fsf@catnip.gol.com>
 <472BD29F.5090205@gmail.com> <877il06ltg.fsf@catnip.gol.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: schwab@suse.de, emacs-devel@gnu.org, lennart.borgman@gmail.com, Miles Bader
To: Eli Zaretskii
In-Reply-To: (Eli Zaretskii's message of "Sat, 03 Nov 2007 10:40:02 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.50 (gnu/linux)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:82417

Eli Zaretskii writes:

>> From: Miles Bader
>> Date: Sat, 03 Nov 2007 12:37:47 +0900
>> Cc: Andreas Schwab, Emacs Devel
>>
>> > I believed that grep had to be started many times.  Is not that the case?
>>
>> xargs invokes grep in "batches," with as many filenames as will fit on
>> the command line; for e.g. linux, that's many thousands at once, so
>> process invocation overhead will tend to be in the noise compared to
>> file I/O overhead.
>
> Yes, but I believe "grep -r" will be still faster, even on GNU/Linux,
> since all it does to recurse is `readdir' and `fnmatch'; the need for
> writing file names to the pipe and reading them on the xargs side is
> avoided.

Totally warm cache:

dak@lola:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m0.974s
user    0m0.372s
sys     0m0.536s
dak@lola:/usr/local/texlive/2007$ time grep -r --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m1.225s
user    0m0.376s
sys     0m0.764s

Totally cold cache (after umount and mount):

dak@lola:/usr/local/texlive/2007$ time grep -r --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    1m44.387s
user    0m0.508s
sys     0m3.768s
dak@lola:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m59.633s
user    0m0.604s
sys     0m1.484s

And, for good measure:

dak@lola:/usr/local/texlive/2007$ time find -name \*.tex -exec grep snort {} \+
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m55.640s
user    0m0.576s
sys     0m1.292s

In short: interspersing the directory and file search on a per-file
basis (as grep -r does) makes the whole operation much less efficient
on a cold buffer cache.  On a warm cache, it is pretty much the same.

Using a pipe also allows for some parallelism.  In this particular
case, however, both jobs are so heavily I/O-bound that the last,
pipeless version using -exec ... \+ is still somewhat faster even
though it is strictly single-threaded in its operation.

The decisive factor appears to be the large-scale bundling of directory
searches without intervening file searches.

This is on a

    Linux lola 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux

single processor laptop with a fairly standard ATA disk.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
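
The two strategies compared in the message can be put side by side in a small script. This is only a sketch for checking that both return the same matches on a throwaway tree; it does not reproduce the timings. The function names `search_batched` and `search_recursive`, the temporary directory, and the sample files are made up for illustration. It assumes GNU grep (for `--include`), and adds `-H` so that grep prints the file name even when a batch holds a single file.

```shell
#!/bin/sh
# Batched search: find enumerates all matching names first, then hands
# them to grep in large groups ("-exec ... {} +"), so directory reads
# are not interleaved with file reads on a per-file basis.
search_batched() {
    # $1 = root directory, $2 = file name glob, $3 = pattern
    find "$1" -type f -name "$2" -exec grep -H -- "$3" {} +
}

# Interleaved search: grep walks the tree itself and reads each
# matching file as soon as it encounters it.
search_recursive() {
    # Same arguments as search_batched; --include is a GNU grep option.
    grep -r --include="$2" -- "$3" "$1"
}

# Hypothetical sample tree, created only for this demonstration.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
printf 'sneezing and snorting\n'   > "$tmp/a/sample.tex"
printf 'nothing of interest\n'     > "$tmp/a/b/other.tex"
printf 'snort in a non-tex file\n' > "$tmp/a/b/notes.txt"

# Sort both result lists so that traversal order does not matter.
batched=$(search_batched "$tmp" '*.tex' snort | sort)
recursive=$(search_recursive "$tmp" '*.tex' snort | sort)

printf 'batched:\n%s\n' "$batched"
printf 'recursive:\n%s\n' "$recursive"

rm -rf "$tmp"
```

To compare run times on a real tree, each function call could be prefixed with `time`, as in the transcripts above; a cold-cache run additionally needs the file system's cache emptied first, which the message did by unmounting and remounting.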