From: Elias Oltmanns
Newsgroups: gmane.emacs.devel
Subject: Re: New buffer-case-table makes search_buffer painfully slow
Date: Sat, 06 May 2006 21:10:08 +0200
Message-ID: <87fyjnkm0f.fsf@denkblock.local>
References: <87y7xhq4wy.fsf@denkblock.local>

Elias Oltmanns wrote:
> Hi all,
>
> switching from emacs 21 to emacs 22 has a very significant performance
> impact on packages that make heavy use of search_buffer. An example
> that actually made me aware of this problem is gnus processing large
> mbox files. Further analysis of this problem revealed that in emacs 22
> an "i" in the search string makes search_buffer use simple_search()
> instead of boyer_moore().

Emacs 22's EQUIVALENCES table relates i, and thus I as well, to two
more characters with character codes 331857 and 331856 (#x51051 and
#x51050 in hexadecimal). The character lookup on www.unicode.org
found no match for U+51051 or U+51050 and reported that these codes
are most likely not assigned to any character yet.

So, here is a plain question: Is there a bug in the case table in
Emacs 22, or does the search engine on www.unicode.org for some reason
miss certain character ranges? Being slightly biased, I am
disregarding the possibility that I simply failed to use
www.unicode.org properly, although that might well be the reason for
my confusion.
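For anyone who wants to check this on their own build, here is a rough
sketch of how the equivalence chain for "i" could be listed. It
assumes that extra slot 2 of the case table holds the equivalences
char-table and that each entry points to the next character in a
cyclic list of equivalents; the variable names and the step limit are
my own choices, so treat it as an illustration rather than the
definitive way to inspect the table.

```elisp
;; Sketch: list the characters the equivalences table chains to ?i.
;; Assumption: extra slot 2 of a case table is the equivalences
;; char-table, and each entry names the next character in a cycle.
(let* ((eqv (char-table-extra-slot (standard-case-table) 2))
       (start ?i)
       (ch (and eqv (aref eqv start)))
       (steps 0)
       (cycle (list start)))
  ;; Follow the chain until it returns to the start.  The step limit
  ;; is a guard in case the table is not laid out as assumed.
  (while (and (integerp ch) (/= ch start) (< steps 16))
    (push ch cycle)
    (setq ch (aref eqv ch)
          steps (1+ steps)))
  ;; Report every member as a decimal and a hexadecimal code.
  (mapconcat (lambda (c) (format "%d (#x%X)" c c))
             (nreverse cycle)
             ", "))
```

Evaluating the form, for instance with M-: or in the *scratch*
buffer, returns a string with one entry per character in the chain.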
Second question: If the case table is right, what would be the right
way to tackle the problem described in my original post? For me the
following snippet in .emacs solves the problem:

--- ~/.emacs ---
;; Make both characters case-invariant so that they no longer count
;; as equivalents of i/I.
(unless (< emacs-major-version 22)
  (set-case-syntax 331856 "w" (standard-case-table))
  (set-case-syntax 331857 "w" (standard-case-table)))
--- ~/.emacs ---

This, of course, is a dirty hack, and I'm wondering whether Emacs
should provide a feature to "clean up" the EQUIVALENCES table in the
ASCII range in order to avoid falling back to a slow search algorithm
when we are searching for pure ASCII strings. Or do you think that
packages like gnus which make heavy use of re-search-forward, or
indeed their users, should handle these performance issues
themselves?

Regards,

Elias