From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Why does not rgrep use "grep -r"?
Date: Sat, 03 Nov 2007 10:43:03 +0100
Message-ID: <85bqabslzs.fsf@lola.goethe.zz>
References: <472B99AB.3090705@gmail.com> <87ve8k8dyw.fsf@catnip.gol.com>
	<jetzo4ql9r.fsf@sykes.suse.de> <472BB921.20104@gmail.com>
	<87d4us6rnl.fsf@catnip.gol.com> <472BD29F.5090205@gmail.com>
	<877il06ltg.fsf@catnip.gol.com> <u1wb7sowt.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1194083510 23254 80.91.229.12 (3 Nov 2007 09:51:50 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 3 Nov 2007 09:51:50 +0000 (UTC)
Cc: schwab@suse.de, emacs-devel@gnu.org, lennart.borgman@gmail.com,
	Miles Bader <miles@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 03 10:51:52 2007
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1IoFfT-0002S1-0a
	for ged-emacs-devel@m.gmane.org; Sat, 03 Nov 2007 10:51:51 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1IoFfI-00034R-Gf
	for ged-emacs-devel@m.gmane.org; Sat, 03 Nov 2007 05:51:40 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1IoFfE-00034M-65
	for emacs-devel@gnu.org; Sat, 03 Nov 2007 05:51:36 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IoFfC-00034A-Gl
	for emacs-devel@gnu.org; Sat, 03 Nov 2007 05:51:34 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IoFfC-000347-84
	for emacs-devel@gnu.org; Sat, 03 Nov 2007 05:51:34 -0400
Original-Received: from fencepost.gnu.org ([140.186.70.10])
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <dak@gnu.org>) id 1IoFfB-0006v5-VN
	for emacs-devel@gnu.org; Sat, 03 Nov 2007 05:51:34 -0400
Original-Received: from localhost ([127.0.0.1] helo=lola.goethe.zz)
	by fencepost.gnu.org with esmtp (Exim 4.60)
	(envelope-from <dak@gnu.org>)
	id 1IoFfA-0000Zc-7q; Sat, 03 Nov 2007 05:51:32 -0400
Original-Received: by lola.goethe.zz (Postfix, from userid 1002)
	id E653B1C464E3; Sat,  3 Nov 2007 10:43:04 +0100 (CET)
In-Reply-To: <u1wb7sowt.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 03 Nov
	2007 10:40:02 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.50 (gnu/linux)
X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:82417
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/82417>

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Miles Bader <miles@gnu.org>
>> Date: Sat, 03 Nov 2007 12:37:47 +0900
>> Cc: Andreas Schwab <schwab@suse.de>, Emacs Devel <emacs-devel@gnu.org>
>> 
>> > I believed that grep had to be started many times. Is not that the case?
>> 
>> xargs invokes grep in "batches," with as many filenames as will fit on
>> the command line; for e.g. linux, that's many thousands at once, so
>> process invocation overhead will tend to be in the noise compared to
>> file I/O overhead.
>
> Yes, but I believe "grep -r" will be still faster, even on GNU/Linux,
> since all it does to recurse is `readdir' and `fnmatch'; the need for
> writing file names to the pipe and reading them on the xargs side is
> avoided.

Totally warm cache:

dak@lola:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m0.974s
user    0m0.372s
sys     0m0.536s
dak@lola:/usr/local/texlive/2007$ time grep -r  --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m1.225s
user    0m0.376s
sys     0m0.764s

Totally cold cache (after umount and mount):

dak@lola:/usr/local/texlive/2007$ time grep -r  --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    1m44.387s
user    0m0.508s
sys     0m3.768s


dak@lola:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m59.633s
user    0m0.604s
sys     0m1.484s


And, for good measure:

dak@lola:/usr/local/texlive/2007$ time find -name \*.tex -exec grep snort {} \+
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m55.640s
user    0m0.576s
sys     0m1.292s

In short: interspersing the directory search and the file search on a
per-file basis (as grep -r does) makes the whole operation much less
efficient on a cold buffer cache (1m44s versus just under a minute
above).  On a warm cache, it is pretty much the same.  Using a pipe
also allows for some parallelism.  In this particular case, however,
both jobs are so heavily I/O-bound that the last, pipeless version
using -exec ... \+ is still somewhat faster even though it is strictly
single-threaded in its operation.  The decisive factor appears to be
the large-scale bundling of directory searches without intervening
file searches.

This is on a single-processor laptop with a fairly standard ATA disk,
running

Linux lola 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux
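
For anyone who wants to repeat the comparison, here is a minimal
sketch that checks the three invocations really report the same
matches before any timing is done.  It assumes a grep that supports
-r and --include (GNU grep does).  The throwaway tree, its file names
and the variable names are made up for illustration and are not from
the runs above.  The extra /dev/null argument only forces grep to
print the file name even if a batch happens to hold a single file.

```shell
#!/bin/sh
# Build a small scratch tree (hypothetical contents, illustration only).
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
printf 'whooping and snorting\n'   > "$tree/a/b/sample.tex"
printf 'nothing of interest\n'     > "$tree/a/other.tex"
printf 'snort in a non-tex file\n' > "$tree/a/notes.txt"

# 1. find lists every name first; xargs hands them to grep in batches.
via_xargs=$(find "$tree" -name '*.tex' | xargs grep snort /dev/null)

# 2. grep walks the directories itself, reading files as it meets them.
via_grep_r=$(grep -r --include='*.tex' snort "$tree")

# 3. find does the batching on its own, without a pipe.
via_exec=$(find "$tree" -name '*.tex' -exec grep snort /dev/null {} +)

printf 'xargs:   %s\n' "$via_xargs"
printf 'grep -r: %s\n' "$via_grep_r"
printf '-exec +: %s\n' "$via_exec"

rm -rf "$tree"
```

To get timings, point the three commands at a real tree and prefix
each with "time", as in the transcript above.  A cold-cache figure
needs the cache emptied between runs; I did that by unmounting and
remounting the file system.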

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum