From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Case mapping of sharp s
Date: Tue, 17 Nov 2009 16:36:12 +0900
References: <19200.4158.380820.761685@a1i15.kph.uni-mainz.de> <19201.32770.352944.474086@a1i15.kph.uni-mainz.de>
In-Reply-To: <19201.32770.352944.474086@a1i15.kph.uni-mainz.de> (message from Ulrich Mueller on Mon, 16 Nov 2009 17:38:26 +0100)
To: Ulrich Mueller
Cc: emacs-devel@gnu.org

In article <19201.32770.352944.474086@a1i15.kph.uni-mainz.de>, Ulrich Mueller writes:

> So do I understand this right: In order to perform a Boyer-Moore
> search, the characters have to be either both ASCII, or must be in the
> same group of 64 adjacent characters (because the last byte in UTF-8
> encodes 6 bits)?

Yes.

> Is that the reason why also ÿ and Ÿ (U+00FF and U+0178, small/capital
> y with diaeresis) don't form a case pair?

Yes.

> > So, if you are sure that searching of ß is very rare (I have
> > no idea), please install it.

> Usage of (lower case) ß is very common in a German language context,
> so I'd guess that searching for it is not so rare.

> On the other hand, capital ẞ is not used in regular German orthography
> (that's probably the reason why the character was added to Unicode
> only in 2008).
> So if the change would cause large tradeoffs in search
> speed, then I think it's not worthwhile.

> By what factor is the non-BM search slower, as compared to the BM
> search?

I don't know exactly.  It depends on the length of the search
string; the longer the string is, the faster BM search is
compared with simple search.  At least, when this code was active,

;; (set-downcase-syntax ?İ ?i tbl)
;; (set-upcase-syntax ?I ?ı tbl)

there were complaints about the slowdown.

---
Kenichi Handa
handa@m17n.org
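
For illustration, the constraint confirmed in this message (a case pair
allows Boyer-Moore search only if both characters are ASCII, or if they
lie in the same group of 64 adjacent characters, so that their UTF-8
encodings differ only in the last byte) can be sketched in Python.  This
is a minimal sketch under stated assumptions, not the Emacs search code:
the function names are hypothetical, and standard UTF-8 is used as a
stand-in for Emacs's internal encoding.

```python
def differ_only_in_last_utf8_byte(a: str, b: str) -> bool:
    """True if the UTF-8 encodings of a and b have the same length
    and share every byte except possibly the last one."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return len(ea) == len(eb) and ea[:-1] == eb[:-1]


def bm_compatible_pair(a: str, b: str) -> bool:
    """Hypothetical check for the rule discussed in the thread:
    both characters are ASCII, or they sit in the same group of
    64 adjacent code points (same leading UTF-8 bytes)."""
    if ord(a) < 0x80 and ord(b) < 0x80:
        return True
    return differ_only_in_last_utf8_byte(a, b)


if __name__ == "__main__":
    # Pairs mentioned in the thread, plus one pair that satisfies the rule.
    pairs = [
        ("a", "A"),            # both ASCII
        ("\u00e4", "\u00c4"),  # ä / Ä: C3 A4 vs C3 84
        ("\u00ff", "\u0178"),  # ÿ / Ÿ: C3 BF vs C5 B8
        ("\u00df", "\u1e9e"),  # ß / ẞ: C3 9F vs E1 BA 9E
        ("i", "\u0130"),       # i / İ: ASCII vs C4 B0
        ("I", "\u0131"),       # I / ı: ASCII vs C4 B1
    ]
    for x, y in pairs:
        print(f"U+{ord(x):04X} U+{ord(y):04X} -> {bm_compatible_pair(x, y)}")
```

Under this sketch the ASCII pair and ä/Ä pass the check, while ÿ/Ÿ, ß/ẞ,
and the dotted/dotless i pairs do not, which matches the cases named in
the thread.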