From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Date: Sat, 15 Dec 2012 22:17:17 +0900
Message-ID: <87ehir78v6.fsf@gnu.org>
References: <87txs1l4kg.fsf@mail.jurta.org>
To: Eli Zaretskii
Cc: 13084@debbugs.gnu.org
In-Reply-To: <83obhxoo2v.fsf@gnu.org> (message from Eli Zaretskii on Thu, 13 Dec 2012 19:32:08 +0200)

In article <83obhxoo2v.fsf@gnu.org>, Eli Zaretskii writes:

> > Here, A and B belong to the same character group iff A and
> > B have the same multibyte sequence except for the last byte.
> > In this condition, we should be able to use the table
> > simple_translate.

> OK, then maybe just the comments need to be fixed.  They shouldn't
> talk about "charset" and "row", which are undefined in Unicode Emacs.
> They should instead use terminology that corresponds to the UTF-8
> multibyte representation of characters we use today.

I've just committed this change.  How does it look?

=== modified file 'src/search.c'
--- src/search.c	2012-10-10 20:09:47 +0000
+++ src/search.c	2012-12-15 13:04:46 +0000
@@ -1313,8 +1313,11 @@
 	 non-nil, we can use boyer-moore search only if TRT can be
 	 represented by the byte array of 256 elements.  For that,
 	 all non-ASCII case-equivalents of all case-sensitive
-	 characters in STRING must belong to the same charset and
-	 row.  */
+	 characters in STRING must belong to the same character
+	 group (two characters belong to the same group iff their
+	 multibyte forms are the same except for the last byte;
+	 i.e. every 64 characters form a group; U+0000..U+003F,
+	 U+0040..U+007F, U+0080..U+00BF, ...).  */
 
       while (--len >= 0)
 	{

---
Kenichi Handa
handa@gnu.org
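The "character group" rule in the new comment can be checked with a short C sketch. It assumes standard UTF-8 for code points up to U+10FFFF (Emacs's internal multibyte form extends UTF-8 to larger character codes, which this sketch ignores) and only considers non-ASCII characters, since the comment is about non-ASCII case-equivalents. The helpers encode_utf8, same_char_group, char_group and build_last_byte_table are hypothetical illustrations, not functions from src/search.c. For two non-ASCII characters, sharing every byte but the last is the same as having equal c >> 6, because the last byte carries exactly the low 6 bits of the code point; that is why every 64 characters form a group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: encode code point C as standard UTF-8 into BUF
   and return the byte count.  Emacs's internal form also covers codes
   above U+10FFFF; this sketch does not.  */
static int
encode_utf8 (unsigned int c, unsigned char buf[4])
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      buf[0] = c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = 0xC0 | (c >> 6);
      buf[1] = 0x80 | (c & 0x3F);
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = 0xE0 | (c >> 12);
      buf[1] = 0x80 | ((c >> 6) & 0x3F);
      buf[2] = 0x80 | (c & 0x3F);
      return 3;
    }
  buf[0] = 0xF0 | (c >> 18);
  buf[1] = 0x80 | ((c >> 12) & 0x3F);
  buf[2] = 0x80 | ((c >> 6) & 0x3F);
  buf[3] = 0x80 | (c & 0x3F);
  return 4;
}

/* Hypothetical: true iff the multibyte forms of A and B are the same
   except for the last byte (intended for non-ASCII A and B).  */
static bool
same_char_group (unsigned int a, unsigned int b)
{
  unsigned char ba[4], bb[4];
  int la = encode_utf8 (a, ba);
  int lb = encode_utf8 (b, bb);
  if (la != lb)
    return false;
  for (int i = 0; i < la - 1; i++)
    if (ba[i] != bb[i])
      return false;
  return true;
}

/* Shortcut: the last byte holds the low 6 bits, so for non-ASCII
   characters the group is just the code point shifted right by 6.  */
static unsigned int
char_group (unsigned int c)
{
  return c >> 6;
}

/* Hypothetical illustration, not the Emacs implementation: if every pair
   (FROM[i], TO[i]) lies in FROM[0]'s group, the translation changes only
   last bytes and fits in a 256-entry table indexed by the last byte.
   Returns false if some pair leaves that group.  */
static bool
build_last_byte_table (const unsigned int *from, const unsigned int *to,
                       int n, unsigned char table[256])
{
  for (int b = 0; b < 256; b++)
    table[b] = (unsigned char) b;   /* identity for untouched bytes */
  for (int i = 0; i < n; i++)
    {
      unsigned char fb[4], tb[4];
      if (!same_char_group (from[i], from[0])
          || !same_char_group (to[i], from[0]))
        return false;
      int fl = encode_utf8 (from[i], fb);
      int tl = encode_utf8 (to[i], tb);
      table[fb[fl - 1]] = tb[tl - 1];
    }
  return true;
}
```

For example, À (U+00C0) and à (U+00E0) encode as C3 80 and C3 A0, so they fall in the same group and a 256-entry table can map last byte 0x80 to 0xA0. Ё (U+0401, D0 81) and ё (U+0451, D1 91) differ in their leading byte, so a byte table alone cannot represent that mapping.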