From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Date: Thu, 13 Dec 2012 19:32:08 +0200
Message-ID: <83obhxoo2v.fsf@gnu.org>
References: <87r4mu6pgu.fsf@gnu.org>
In-reply-to: <87r4mu6pgu.fsf@gnu.org>
To: Kenichi Handa
Cc: 13084@debbugs.gnu.org
Xref: news.gmane.org gmane.emacs.bugs:68469

> From: Kenichi Handa
> Cc: juri@jurta.org, 13084@debbugs.gnu.org
> Date: Thu, 13 Dec 2012 22:39:29 +0900
>
> I have not yet checked the code, but what I remember is that
> search_buffer checks the search string and decides which to
> use; boyer_moore or simple_search.  If all equivalent
> characters of all non-ASCII characters in the search string
> are in the same character group, we can use boyer_moore.

Yes, that's my reading of the code as well.

> Here, A and B belongs to the same character group iff A and
> B has the same multibyte sequence except for the last byte.
> In this condition, we should be able to use the table
> simple_translate.

OK, then maybe just the comments need to be fixed.  They shouldn't
talk about "charset" and "row", which are undefined in Unicode Emacs.
They should instead use terminology that corresponds to the UTF-8
multibyte representation of characters that we use today.
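
To make the "same character group" condition quoted above concrete, here
is a minimal sketch in Python.  It is not the Emacs code: it assumes that
Python's UTF-8 encoding is a fair stand-in for the Emacs internal
multibyte representation (which holds for ordinary Unicode characters),
and the names group_prefix, same_char_group and equivalents_share_group
are hypothetical, invented only for this illustration.  The caller
supplies the equivalents explicitly; no real case table is consulted.

```python
from typing import Iterable


def group_prefix(ch: str) -> bytes:
    """Return every byte of ch's UTF-8 encoding except the last one."""
    return ch.encode("utf-8")[:-1]


def same_char_group(a: str, b: str) -> bool:
    """True iff a and b have the same byte sequence apart from the last byte."""
    return group_prefix(a) == group_prefix(b)


def equivalents_share_group(ch: str, equivalents: Iterable[str]) -> bool:
    """True iff every supplied equivalent of ch is in ch's character group.

    Assumption: the equivalents (e.g. case variants) come from the caller;
    this sketch does not look at any translation table itself.
    """
    return all(same_char_group(ch, e) for e in equivalents)


if __name__ == "__main__":
    pairs = [("\u00e9", "\u00c9"), ("\u00ff", "\u0178"), ("\u0131", "I")]
    for a, b in pairs:
        print(f"U+{ord(a):04X} U+{ord(b):04X} same group: {same_char_group(a, b)}")
```

Under these assumptions, U+00E9 and U+00C9 encode as C3 A9 and C3 89, so
they differ only in the last byte and fall in one group.  U+00FF (C3 BF)
and U+0178 (C5 B8) differ in the leading byte, so that pair would not
meet the condition described in the quoted text.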