From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Ted Zlatanov <tzz@lifelogs.com>
Newsgroups: gmane.emacs.help
Subject: Re: Negative occur
Date: Thu, 29 Nov 2007 09:58:11 -0600
Organization: =?utf-8?B?0KLQtdC+0LTQvtGAINCX0LvQsNGC0LDQvdC+0LI=?= @ Cienfuegos
Message-ID: <8663zl2gdo.fsf@lifelogs.com>
References: <mailman.4275.1196290389.18990.help-gnu-emacs@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1196354529 8217 80.91.229.12 (29 Nov 2007 16:42:09 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 29 Nov 2007 16:42:09 +0000 (UTC)
To: help-gnu-emacs@gnu.org
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Thu Nov 29 17:42:17 2007
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1IxmSq-00062N-Kv
	for geh-help-gnu-emacs@m.gmane.org; Thu, 29 Nov 2007 17:42:12 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1IxmSa-0001q6-Vk
	for geh-help-gnu-emacs@m.gmane.org; Thu, 29 Nov 2007 11:41:57 -0500
Original-Path: shelby.stanford.edu!newsfeed.stanford.edu!news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!xs4all!feeder.news-service.com!newsfeed.kamp.net!newsfeed.kamp.net!newsfeed.freenet.de!news.albasani.net!not-for-mail
Original-Newsgroups: gnu.emacs.help
Original-Lines: 75
Original-X-Trace: news.albasani.net
	un5EXSajmFq1uU6jnjATMUpeHatYFhg0hijYKW3kdCBN25EXByxqTJK/tK9uVwfP8hP6k4a1c0vL4ZKeOI2Nd7sEBJIUYyoL5OLf+qSYZT1lTGu1kdF+obISOxXNOK0m
Original-X-Complaints-To: abuse@albasani.net
Original-NNTP-Posting-Date: Thu, 29 Nov 2007 15:58:04 +0000 (UTC)
X-User-ID: LbI+yiEK4sKduIGY1fzlKcXVyYgFCpfA3KEoR1Lgzis=
X-Face: bd.DQ~'29fIs`T_%O%C\g%6jW)yi[zuz6;
	d4V0`@y-~$#3P_Ng{@m+e4o<4P'#(_GJQ%TT=
	D}[Ep*b!\e,fBZ'j_+#"Ps?s2!4H2-Y"sx"
Cancel-Lock: sha1:YmjY+HJibYNfWJ3LFJoRfK20ztQ=
	sha1:tvsMR/yN9vwS/z/HX55eKFy8U6U=
User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)
X-NNTP-Posting-Host: lE+2otPpKHPpCy0+ESt+gDzehebZdJhl3FU+y6SXD+Q=
Original-Xref: shelby.stanford.edu gnu.emacs.help:154262
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:49690
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/49690>

On Wed, 28 Nov 2007 14:52:15 -0800 "Drew Adams" <drew.adams@oracle.com> wrote: 

>> >> > You could try running "occur" with the pattern "^" (which matches
>> >> > every line), then prune the results with
>> >> > M-x delete-matching-lines RET

DA> [spamfilteraccount suggested that Emacs should have this as part of
DA> `occur'...]

DA> I realize that your suggestion is that this be added to
DA> Emacs. I agree. FYI - In Icicles, just do this: C-' foobar C-~
DA> That shows and lets you visit all lines that do not match the
DA> regexp "foobar".
>> 
>> Both solutions will be slower on a large buffer than they should be.

DA> What does "slower than they should be" mean? How slow should they be? How
DA> slow are they in fact? How large is a large buffer? How do you judge that
DA> "they" (two totally different approaches and implementations) are slower
DA> than they should be?

I'm certain that building an *occur* buffer containing every line of a
100+ MB buffer and then deleting most of those lines compares poorly, in
both memory and CPU usage, with matching only the lines you need.  It's
a suboptimal approach whose only advantage is that it requires no
changes to any internal logic.  A parallel would be the two pipelines
below (using `sort' instead of `cat' to account for Emacs' memory
usage); the first does the expensive work on every line, the second
only on the matches:

sort file | grep x
grep x file | sort
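
A minimal sketch of the two pipelines above, assuming a POSIX shell and
a made-up four-line sample file (the file contents and variable names
are hypothetical, chosen only for illustration).  It shows that both
orderings produce the same output, while the filter-first version hands
`sort' only the matching lines:

```shell
# Build a small sample file (hypothetical data, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'x banana' 'cherry' 'x apple' 'date' > "$tmp"

# Filter late: sort every line, then keep the matches.
late=$(sort "$tmp" | grep x)

# Filter early: keep the matches, then sort only those.
early=$(grep x "$tmp" | sort)

# How many lines each `sort' had to process.
sorted_late=$(wc -l < "$tmp" | tr -d ' ')   # all lines in the file
sorted_early=$(grep -c x "$tmp")            # only the matching lines

printf '%s\n' "$early"
rm -f "$tmp"
```

On a file this small the difference is invisible; the point is only
that the work done before the filter grows with the whole input, not
with the number of matches.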

>> A real inversion parameter, either as a predicate function or a variable,
>> passed lexically or as a parameter to the occur-engine function call, is
>> necessary.

DA> Necessary? For what? Why necessary? These are generalizations that don't
DA> help.

Necessary to implement the solution in such a way that it will satisfy
both the OP and future needs for tuning the occur results.  I'm not
talking about Icicles (that's why I mentioned occur-1 and occur-engine
originally); sorry if I didn't state that clearly.  I assumed that,
since you recommended the filter-later approach, Icicles didn't support
predicates, so it made sense to follow up with you.

DA> Your statements are vague, but I'm guessing that what you're really trying
DA> to say is that it is often more efficient to apply a predicate earlier
DA> rather than later (filter promotion), which is true.

Sure.  Reduce the search results as early as possible, as in my earlier
example of sort/grep usage.

DA> The Icicles approach is designed for interactive use, which is why it
DA> emphasises changing search patterns (and predicates) on the fly. It works
DA> fine with any buffers I've ever used, some of which are pretty darn big.
DA> (How big is big? I just searched a 19MB buffer with no effect on
DA> interactivity.)

DA> As always, the usefulness of a tool depends on what you use it for. If you
DA> want to search a 5 terabyte file, then interactivity might suffer with some
DA> approaches (depending on your hardware... and, especially, depending on your
DA> regexp). But, as always, the devil is in the details.

I can see that argument when choosing between an O(n) and an
O(n log(n)) algorithm for small data sets, but when the difference is
that one approach copies every line and the other doesn't, and both
achieve the same result, it genuinely bothers me to recommend the
former just because the API doesn't support the latter.  So I'll
propose the API change to emacs-devel.
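
To make the comparison concrete outside Emacs, here is a rough shell
analogue, assuming a POSIX `grep' and a hypothetical five-line sample
file.  This is not how occur works internally; it only mirrors the two
strategies: numbering every line and pruning afterwards, versus asking
for the non-matching lines directly.

```shell
# Hypothetical sample data, for illustration only.
tmp=$(mktemp)
printf '%s\n' 'alpha' 'foobar one' 'beta' 'two foobar' 'gamma' > "$tmp"

# Copy-then-prune: number every line (like occur on "^"), then drop
# the lines that match the pattern.
pruned=$(grep -n '^' "$tmp" | grep -v 'foobar')

# Inverted match: emit only the non-matching lines, numbered, in one pass.
inverted=$(grep -vn 'foobar' "$tmp")

printf '%s\n' "$inverted"
rm -f "$tmp"
```

Both variables end up holding the same three numbered lines, but the
first pipeline materializes every line before discarding most of them.
It is also slightly fragile: the second `grep' sees the line-number
prefix too, so a pattern that happens to match that prefix would drop
lines it shouldn't.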

As for hardware, I maintain an Emacs Maemo port, which is for the Nokia
770/N800/N810 tablets that run GNU/Linux.  There is little memory
available and the CPU is slow, so copying a large buffer unnecessarily
would be terrible for the user experience.

Ted