From mboxrd@z Thu Jan 1 00:00:00 1970
From: grischka
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Thu, 26 Nov 2009 14:07:51 +0100
Message-ID: <4B0E7DA7.3010402@gmx.de>
References: <4B05A11F.5000700@gmx.de> <4B05D3EE.2000101@gmx.de> <4B0759BA.2010303@gmx.de> <19207.50135.691132.983395@a1i15.kph.uni-mainz.de> <4B0C32BF.2020708@gmx.de>
To: Kenichi Handa
Cc: ulm@gentoo.org, schwab@linux-m68k.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Kenichi Handa wrote:
> In article <4B0C32BF.2020708@gmx.de>, grischka writes:
>
>> DEC_BOTH is maybe not slower than INC_BOTH, but two DEC_BOTH
>> are (as with Andy's patch).  Moderately slower, still ;)
>
> So, changing the current backward matching to forward
> matching should be effective.
>

No, there is no such condition.  There are several ways to avoid
the duplicate DEC_POS, one being to handle the "pattern_len == 0"
case right at the top of the function, for all its branches.

>> The originally observed slowness was not because of the usage of
>> CHAR_TO_BYTE, but because of the flaws in CHAR_TO_BYTE, such as
>> using unrelated "best_below" and "best_above" in the same expression.
>
>> For the numbers, with my 100MB file test case:
>
>> backward search previously:
>>     14 .. 90 s (random)
>> backward search with fixed CHAR_TO_BYTE:
>>     5.6 s
>
> I don't see any fix of CHAR_TO_BYTE in the current CVS
> code.  Where is it?

Those tests were made with ad hoc modifications as needed.
There was also some code to measure the times, of course.

>> In any case, with some tweaking it is possible to improve both
>> directions by ~70% (that is down to about 1 sec for the test
>> case).  I still don't know why boyer_moore with a one-char
>> pattern takes only 0.5 seconds though.  It's amazingly fast.
>
> Are you comparing both methods with the same value of
> case-fold-search?

Same value, but not the same search patterns.  One with "sharp s",
one without.

Actually I just wanted to check the facts with the "sharp s" patch
originally proposed in this thread, because some people wrote it
would be too slow.  FWIW I don't think it would be any problem.

>> Btw it seems that long loading time for the big file has much to
>> do with inefficient counting of newlines.  Apparently it takes
>> ~2 sec to load the file and then another ~6 sec to scan newlines.
>> It should be (far) under 0.5 sec.
>
> Why is the code of counting newlines called when we just
> visit a file?

I have no idea why.  Opening the 100MB file calls scan_buffer
(for \n) 67637 times.  The file has 3142771 lines though, so I take
it back: it's probably not "counting newlines" in that sense.

Maybe it comes from "Loading cc-langs ..." which happens after
the first 2 seconds.

--- grischka

>
> ---
> Kenichi Handa
> handa@m17n.org
>
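
For reference on the newline-counting point above: a single pass over the whole text can lean on memchr and needs no per-line calls.  The following is only a minimal sketch, not Emacs code.  count_newlines is a hypothetical helper, and it assumes one contiguous byte buffer (an Emacs buffer has a gap, so a real scan would have to cover the text on both sides of it).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs: count the '\n' bytes in
   buf[0..len) in one pass, letting memchr do the scanning.  */
static size_t
count_newlines (const char *buf, size_t len)
{
  size_t count = 0;
  const char *p = buf;
  const char *end = buf + len;

  while (p < end)
    {
      /* Find the next newline in the part not yet scanned.  */
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      if (nl == NULL)
        break;
      count++;
      p = nl + 1;   /* Continue just past the newline found.  */
    }
  return count;
}
```

Each byte is visited once, so the cost grows with the buffer size and not with the number of calls made into the scanner.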