From: Elias Oltmanns
Newsgroups: gmane.emacs.devel
Subject: New buffer-case-table makes search_buffer painfully slow
Date: Thu, 04 May 2006 15:46:05 +0200
Message-ID: <87y7xhq4wy.fsf@denkblock.local>

Hi all,

switching from Emacs 21 to Emacs 22 has a very significant performance impact on packages that make heavy use of search_buffer. The example that made me aware of this problem is Gnus processing large mbox files.

Further analysis revealed that in Emacs 22 an "i" in the search string makes search_buffer use simple_search() instead of boyer_moore(). This means that, for instance, a loop repeatedly calling re-search-forward with the search string "X-Gnus-Article-Number" takes several orders of magnitude more time in Emacs 22 than in Emacs 21, just because of the "i" in "Article", at least in a multibyte buffer.

The cause seems to be a change in the buffer case table. Comparing the output of M-x describe-buffer-case-table in Emacs 21 and in Emacs 22 makes me wonder whether the trouble comes from a certain character in Unicode row 32 that the Emacs 22 table matches with "i". If so, what would be the right thing to do about it?
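To give a rough feeling for why falling back from boyer_moore() to simple_search() matters, here is a toy sketch in Python. It is not the code in Emacs's search.c: the function names, the comparison counters and the sample mbox-like text are made up for illustration, it uses the simplified Horspool variant of Boyer-Moore, and it ignores case tables and multibyte encoding entirely. It only shows that a skip-table search inspects far fewer characters than one that tries every alignment.

```python
def naive_search(text, pattern):
    """Try every alignment left to right; return (index, comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons


def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, then skip ahead
    using a bad-character table; return (index, comparisons)."""
    n, m = len(text), len(pattern)
    # For each character except the last one in the pattern, the distance
    # from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    comparisons = 0
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return start, comparisons
        # Characters absent from the table allow a full-length skip.
        start += shift.get(text[start + m - 1], m)
    return -1, comparisons


# Hypothetical sample data: many uninteresting lines, then one header.
text = "From nobody\nSubject: test\n\nbody line\n" * 2000
text += "X-Gnus-Article-Number: 42\n"
pattern = "X-Gnus-Article-Number"

naive_index, naive_cmp = naive_search(text, pattern)
bm_index, bm_cmp = horspool_search(text, pattern)
print("naive:   ", naive_index, naive_cmp)
print("horspool:", bm_index, bm_cmp)
```

Both functions find the header at the same position, but the skip-table version needs noticeably fewer character comparisons. The real gap inside Emacs also depends on case translation and on the multibyte representation, which this sketch does not model.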
Of course, applications like Gnus have to open mbox files in multibyte mode, simply because mails in different languages and charsets may be stored in these files. Yet I'm quite confident that quite a few people, if not the majority, will never need "i" to match this obscure character, but would certainly prefer the boyer_moore algorithm when searching for strings containing an "i".

Any ideas and thoughts concerning this problem?

Regards,

Elias