From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Case mapping of sharp s Date: Wed, 18 Nov 2009 09:44:31 -0500 Message-ID: References: <19200.4158.380820.761685@a1i15.kph.uni-mainz.de> <83lji6mgg4.fsf@gnu.org> <83iqd9m14h.fsf@gnu.org> <83bpj0mq2v.fsf@gnu.org> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1258556315 9884 80.91.229.12 (18 Nov 2009 14:58:35 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 18 Nov 2009 14:58:35 +0000 (UTC) Cc: Eli Zaretskii , ulm@gentoo.org, emacs-devel@gnu.org To: Kenichi Handa Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Nov 18 15:58:28 2009 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1NAlzG-0007Hk-DI for ged-emacs-devel@m.gmane.org; Wed, 18 Nov 2009 15:58:26 +0100 Original-Received: from localhost ([127.0.0.1]:57676 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NAlzF-0002QF-OZ for ged-emacs-devel@m.gmane.org; Wed, 18 Nov 2009 09:58:25 -0500 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NAllz-0003aB-SQ for emacs-devel@gnu.org; Wed, 18 Nov 2009 09:44:43 -0500 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1NAllv-0003Xk-8D for emacs-devel@gnu.org; Wed, 18 Nov 2009 09:44:43 -0500 Original-Received: from [199.232.76.173] (port=39230 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NAllv-0003Xh-1d for emacs-devel@gnu.org; Wed, 18 Nov 2009 09:44:39 -0500 Original-Received: from ironport2-out.teksavvy.com ([206.248.154.181]:59912 helo=ironport2-out.pppoe.ca) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NAllp-00048Y-BA; Wed, 18 Nov 2009 09:44:33 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: As4EAMOWA0vO+II8/2dsb2JhbACBTdRwhDsEiVc X-IronPort-AV: E=Sophos;i="4.44,765,1249272000"; d="scan'208";a="49593616" Original-Received: from 206-248-130-60.dsl.teksavvy.com (HELO pastel.home) ([206.248.130.60]) by ironport2-out.pppoe.ca with ESMTP; 18 Nov 2009 09:44:32 -0500 Original-Received: by pastel.home (Postfix, from userid 20848) id 91041859F; Wed, 18 Nov 2009 09:44:31 -0500 (EST) In-Reply-To: (Kenichi Handa's message of "Wed, 18 Nov 2009 15:26:32 +0900") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux) X-detected-operating-system: by monty-python.gnu.org: Genre and OS details not recognized. X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:117177 Archived-At: >> My idea was to use the same technique to fold the text as you search >> through it. You could, for example, fold one character at a time, so >> that instead of comparing against X you compare against fold(X). > BM-search already does it. The problem is that the current > `fold' function can be applicable only to the last byte of a > character. We'd need to use a variant of BM that can search a set of strings rather than a single string. Given the fact that all the strings in the set are very similar, it should be possible to come up with an efficient algorithm for it. The basic idea I'd try is to build the BM table for each of the alternative strings and then merge them into a single table that takes for each entry the smallest shift distance of each table. There's a lot more to it, I'm sure, but it should be workable. Would make a nice project for a student. Stefan