From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Sat, 21 Nov 2009 10:58:04 +0200
Message-ID: <83k4xkp84j.fsf@gnu.org>
References: <4B05A11F.5000700@gmx.de> <jwvskcai43z.fsf-monnier+emacs@gnu.org>
	<4B05D3EE.2000101@gmx.de> <jwvws1mt0ao.fsf-monnier+emacs@gnu.org>
	<4B0759BA.2010303@gmx.de>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: lo.gmane.org
X-Trace: ger.gmane.org 1258793976 28370 80.91.229.12 (21 Nov 2009 08:59:36 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 21 Nov 2009 08:59:36 +0000 (UTC)
Cc: handa@m17n.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: grischka <grishka@gmx.de>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 21 09:59:29 2009
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
In-reply-to: <4B0759BA.2010303@gmx.de>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by monty-python.gnu.org: Solaris 10 (beta)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:117420
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/117420>

> Date: Sat, 21 Nov 2009 04:08:42 +0100
> From: grischka <grishka@gmx.de>
> Cc: emacs-devel@gnu.org, handa@m17n.org
> 
> Stefan Monnier wrote:
> >> Also how can it happen that a C function varies between 4 and 90
> >> seconds for the same action.
> > 
> > I have no explanation for that.
> 
> Turned out that the time of backwards simple_search depends mostly
> on the number of buffer markers in the buffer.
> 
> That's because of CHAR_TO_BYTE in the inner loop and then because
> that one doesn't mind checking hundreds of markers for each single
> char in the file.

CHAR_TO_BYTE could be expensive, yes.  But how else can you convert an
arbitrary character position to the corresponding byte position?  When
you scan forward, the first byte of a multi-byte UTF-8 sequence tells
you its length, but what do you do when you scan backwards?
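
To make the scanning question concrete: in standard UTF-8, every
continuation byte matches the bit pattern 10xxxxxx, so a backward scan
can skip such bytes until it reaches a lead byte.  Below is a minimal
sketch in C, assuming well-formed standard UTF-8 (the extended
encoding Emacs uses internally may differ in details).  The function
names are hypothetical and are not taken from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of a standard UTF-8 sequence, read from its lead
   byte.  Returns 0 for a continuation byte or an invalid lead byte.
   (Hypothetical helper, for illustration only.)  */
size_t
utf8_len_from_lead (unsigned char b)
{
  if (b < 0x80)
    return 1;                   /* 0xxxxxxx: ASCII */
  if ((b & 0xE0) == 0xC0)
    return 2;                   /* 110xxxxx */
  if ((b & 0xF0) == 0xE0)
    return 3;                   /* 1110xxxx */
  if ((b & 0xF8) == 0xF0)
    return 4;                   /* 11110xxx */
  return 0;                     /* 10xxxxxx or invalid */
}

/* Byte offset of the start of the character that ends just before
   byte offset POS.  Assumes BUF holds well-formed UTF-8 and POS > 0.
   (Hypothetical helper, for illustration only.)  */
size_t
utf8_prev_char_start (const unsigned char *buf, size_t pos)
{
  assert (pos > 0);
  do
    pos--;
  while (pos > 0 && (buf[pos] & 0xC0) == 0x80);  /* skip 10xxxxxx */
  return pos;
}
```

Stepping back one character this way costs at most a few byte
inspections, but it only helps when the byte position of the starting
point is already known.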

The markers CHAR_TO_BYTE considers are a kind of cache, and are
supposed to speed things up.  I don't know what measurements were done
at the time this caching was introduced, nor whether those
measurements were repeated when Emacs switched from Mule encoding to
the current extended UTF-8 encoding of characters.  Maybe nowadays
this caching no longer helps.  Maybe UTF-8 allows a simpler conversion
than just counting bytes from the beginning of the buffer.  Or maybe
this particular use case does not benefit from the cache, and we
should have a no-cache method for doing the same conversion.
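
As a rough illustration of the trade-off, here is a sketch in C of a
conversion that starts from the nearest known (character position,
byte position) pair and walks the remaining distance.  This is not the
Emacs implementation: the struct, the function name, and the linear
search over the known pairs are all assumptions made for the example,
and it assumes well-formed standard UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* A known correspondence between a character position and a byte
   position, playing the role a marker might play as a cache entry.
   (Hypothetical type, for illustration only.)  */
struct pos_pair
{
  size_t charpos;
  size_t bytepos;
};

/* Convert CHARPOS to a byte offset in BUF (NBYTES long), starting
   from whichever of the buffer start or the NKNOWN entries in KNOWN
   is closest in characters.  (Hypothetical function.)  */
size_t
char_to_byte_cached (const unsigned char *buf, size_t nbytes,
                     const struct pos_pair *known, size_t nknown,
                     size_t charpos)
{
  assert (nknown == 0 || known != NULL);

  /* The buffer start is always a valid starting point.  */
  struct pos_pair best = { 0, 0 };
  size_t best_dist = charpos;

  /* Every lookup inspects every known pair: this is the part whose
     cost grows with the number of cache entries.  */
  for (size_t i = 0; i < nknown; i++)
    {
      size_t d = (known[i].charpos > charpos
                  ? known[i].charpos - charpos
                  : charpos - known[i].charpos);
      if (d < best_dist)
        {
          best = known[i];
          best_dist = d;
        }
    }

  size_t c = best.charpos, b = best.bytepos;

  /* Walk forward one character at a time.  */
  while (c < charpos && b < nbytes)
    {
      b++;
      while (b < nbytes && (buf[b] & 0xC0) == 0x80)
        b++;
      c++;
    }

  /* Or walk backward, skipping continuation bytes.  */
  while (c > charpos && b > 0)
    {
      b--;
      while (b > 0 && (buf[b] & 0xC0) == 0x80)
        b--;
      c--;
    }

  return b;
}
```

In this sketch a caller that already sits next to the target, as a
backward search does on each step, still pays for the full pass over
the known pairs on every call, so the search for a good starting point
can cost more than the walk it saves.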

IOW, more research is needed.