From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: grischka <grishka@gmx.de>
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Thu, 26 Nov 2009 14:07:51 +0100
Message-ID: <4B0E7DA7.3010402@gmx.de>
References: <4B05A11F.5000700@gmx.de>
	<jwvskcai43z.fsf-monnier+emacs@gnu.org>	<4B05D3EE.2000101@gmx.de>
	<jwvws1mt0ao.fsf-monnier+emacs@gnu.org>	<4B0759BA.2010303@gmx.de>	<19207.50135.691132.983395@a1i15.kph.uni-mainz.de>
	<m21vjst7ha.fsf@igel.home> <tl7vdh09kie.fsf@m17n.org>
	<4B0C32BF.2020708@gmx.de> <tl7r5rn9wsc.fsf@m17n.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1259241068 13524 80.91.229.12 (26 Nov 2009 13:11:08 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 26 Nov 2009 13:11:08 +0000 (UTC)
Cc: ulm@gentoo.org, schwab@linux-m68k.org, monnier@iro.umontreal.ca,
	emacs-devel@gnu.org
To: Kenichi Handa <handa@m17n.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Nov 26 14:11:00 2009
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1NDe7S-0000jk-At
	for ged-emacs-devel@m.gmane.org; Thu, 26 Nov 2009 14:10:46 +0100
Original-Received: from localhost ([127.0.0.1]:43410 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1NDe7R-0004qB-FV
	for ged-emacs-devel@m.gmane.org; Thu, 26 Nov 2009 08:10:45 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NDe4s-0003qI-E2
	for emacs-devel@gnu.org; Thu, 26 Nov 2009 08:08:06 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NDe4n-0003nU-Bc
	for emacs-devel@gnu.org; Thu, 26 Nov 2009 08:08:05 -0500
Original-Received: from [199.232.76.173] (port=45131 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NDe4n-0003nM-6Q
	for emacs-devel@gnu.org; Thu, 26 Nov 2009 08:08:01 -0500
Original-Received: from mail.gmx.net ([213.165.64.20]:50601)
	by monty-python.gnu.org with smtp (Exim 4.60)
	(envelope-from <grishka@gmx.de>) id 1NDe4m-0004mN-FL
	for emacs-devel@gnu.org; Thu, 26 Nov 2009 08:08:00 -0500
Original-Received: (qmail invoked by alias); 26 Nov 2009 13:07:57 -0000
Original-Received: from p57A0A45D.dip0.t-ipconnect.de (EHLO [192.168.1.2])
	[87.160.164.93]
	by mail.gmx.net (mp042) with SMTP; 26 Nov 2009 14:07:57 +0100
X-Authenticated: #18588216
X-Provags-ID: V01U2FsdGVkX18HhkcEMA8HOsKzw+8yHHQb2V0gPF3PUOoJ3RhKv+
	oFWy7oiL7te4NL
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
In-Reply-To: <tl7r5rn9wsc.fsf@m17n.org>
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.5
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:117814
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/117814>

Kenichi Handa wrote:
> In article <4B0C32BF.2020708@gmx.de>, grischka <grishka@gmx.de> writes:
> 
>> DEC_BOTH is maybe not slower than INC_BOTH, but two DEC_BOTH
>> are (as with Andy's patch).  Moderately slower, still ;)
> 
> So, changing the current backward matching to forward
> matching should be effective.
> 

No, there is no such condition.  There are several ways to avoid
the duplicate DEC_POS, one being to handle the "pattern_len == 0"
case right at the top of the function, for all its branches.
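
To illustrate the early-return idea only: the following is a minimal
sketch over plain bytes, not the code in search.c.  The function name
search_backward and its signature are made up for this example, and it
does no case folding or multibyte handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-wise backward search (illustration only).
   Returns the start offset of the last match that ends at or before
   `from`, or -1 if there is none.  */
long search_backward(const char *text, size_t text_len,
                     const char *pat, size_t pat_len, size_t from)
{
    if (from > text_len)
        from = text_len;

    /* Empty pattern: it matches right where we are.  Handling it once
       here means no later branch needs its own special case.  */
    if (pat_len == 0)
        return (long) from;

    if (pat_len > from)
        return -1;

    /* Try candidate start positions from - pat_len down to 0.  */
    for (size_t pos = from - pat_len + 1; pos-- > 0; )
        if (memcmp(text + pos, pat, pat_len) == 0)
            return (long) pos;

    return -1;
}
```

With the empty-pattern case out of the way, the loop body can assume
pat_len >= 1 in every branch.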

>> The originally observed slowness was not because of the usage of
>> CHAR_TO_BYTE, but because of the flaws in CHAR_TO_BYTE, such as
>> using unrelated "best_below" and "best_above" in the same expression.
> 
>> For the numbers, with my 100MB file test case:
> 
>> backward search previously:
>> 	14 .. 90 s (random)
>> backward search with fixed CHAR_TO_BYTE:
>> 	5.6 s
> 
> I don't see any fix of CHAR_TO_BYTE in the current CVS
> code.  Where is it?

Those tests were made with ad hoc modifications as needed. There
was also some code to measure the times, of course.
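
For illustration only (this is not the ad hoc code used for the tests
above, and it is not the Emacs implementation): a sketch of the general
idea of converting a character position to a byte position by scanning
from the nearer of two known positions.  The names struct anchor and
char_to_byte are made up, and the text is assumed to be valid UTF-8.
The point is that each anchor's charpos and bytepos are kept together
and used as a pair, never mixed with the other anchor's values.

```c
#include <stddef.h>

/* A known position: a character index and its matching byte index.  */
struct anchor { size_t charpos; size_t bytepos; };

/* UTF-8 continuation bytes look like 10xxxxxx.  */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper.  Requires
   below.charpos <= target <= above.charpos, and both anchors must
   point at character boundaries in `text`.  */
size_t char_to_byte(const unsigned char *text,
                    struct anchor below, struct anchor above,
                    size_t target)
{
    if (target - below.charpos <= above.charpos - target) {
        /* Closer to the lower anchor: scan forward from it.  */
        size_t c = below.charpos, b = below.bytepos;
        while (c < target) {
            b++;
            while (b < above.bytepos && is_continuation(text[b]))
                b++;
            c++;
        }
        return b;
    } else {
        /* Closer to the upper anchor: scan backward from it.  */
        size_t c = above.charpos, b = above.bytepos;
        while (c > target) {
            b--;
            while (b > below.bytepos && is_continuation(text[b]))
                b--;
            c--;
        }
        return b;
    }
}
```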

>> In any case, with some tweaking it is possible to improve both
>> directions by ~70% (that is down to about 1 sec for the test
>> case).  I still don't know why boyer_moore with a one-char
>> pattern takes only 0.5 seconds though.  It's amazingly fast.
> 
> Are you comparing both methods with the same value of
> case-fold-search?

Same value, but not same search patterns.  One with "sharp s",
one without.

Actually I just wanted to check the facts about the "sharp s" patch
originally proposed in this thread, because some people wrote that it
would be too slow.  FWIW I don't think it would be a problem.

>> Btw it seems that the long loading time for the big file has much
>> to do with inefficient counting of newlines.  Apparently it takes
>> ~2 sec to load the file and then another ~6 sec to scan newlines.
>> It should be (far) under 0.5 sec.
> 
> Why is the code of counting newlines called when we just
> visit a file?

I have no idea why.  Opening the 100MB file calls scan_buffer
(for \n) 67637 times.  The file has 3142771 lines though, so I take
it back: it's probably not "counting newlines" in that sense.  Maybe
it comes from "Loading cc-langs ..." which happens after the first
2 seconds.
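
For reference, a plain single pass that counts newlines in a buffer
can be sketched as below.  This is not scan_buffer, the name
count_newlines is made up, and I make no timing claim for it; it only
shows the kind of linear scan meant by "counting newlines" above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count '\n' bytes in buf[0..len) in one pass,
   letting memchr do the scanning.  */
size_t count_newlines(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && (p = memchr(p, '\n', (size_t) (end - p))) != NULL) {
        count++;
        p++;    /* continue just past the newline we found */
    }
    return count;
}
```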

--- grischka

> 
> ---
> Kenichi Handa
> handa@m17n.org
>