From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Date: Thu, 13 Dec 2012 22:39:29 +0900
Message-ID: <87r4mu6pgu.fsf@gnu.org>
References: <87txs1l4kg.fsf@mail.jurta.org>
To: Eli Zaretskii
Cc: 13084@debbugs.gnu.org
In-Reply-To: <831uewa9cq.fsf@gnu.org> (message from Eli Zaretskii on Tue, 11 Dec 2012 17:37:09 +0200)

In article <831uewa9cq.fsf@gnu.org>, Eli Zaretskii writes:

> In addition, I'd suggest that Handa-san (or someone else) takes a good
> look at the code that sets up the simple_translate table in
> boyer_moore, because the constants there, like 0200 and 0x3F, and all
> the talk about characters that belong "to the same charset and row"
> smell of pre-Unicode (a.k.a. "MULE") representation of characters.
> For now, I disabled boyer_moore for unibyte characters beyond 160,
> because my reading of the code is that simple_translate and the
> supporting code cannot handle that.  Maybe I'm wrong.

I have not yet checked the code, but what I remember is that
search_buffer checks the search string and decides which routine to
use: boyer_moore or simple_search.  If all equivalent characters of
all non-ASCII characters in the search string are in the same
character group, we can use boyer_moore.  Here, A and B belong to the
same character group iff A and B have the same multibyte sequence
except for the last byte.  Under this condition, we should be able to
use the table simple_translate.

---
Kenichi Handa
handa@gnu.org
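
The character-group condition described in the message above can be
sketched roughly as follows.  This is an illustration only, not the
code in Emacs: it is written in Python, it uses standard UTF-8 as a
stand-in for Emacs's internal multibyte representation (an
assumption), and the names group_prefix, can_use_boyer_moore and
case_equivs are hypothetical.  It also reads "the same character
group" as one group shared by the whole search string.

```python
def group_prefix(ch):
    """Return the multibyte sequence of ch without its last byte.

    Two characters are in the same character group iff these
    prefixes are equal.  UTF-8 is assumed here as a stand-in for
    the internal representation.
    """
    return ch.encode("utf-8")[:-1]


def can_use_boyer_moore(pattern, equivalents):
    """Check the condition on a search string.

    equivalents(ch) is a hypothetical callback returning the
    characters that the case table treats as equivalent to ch.
    ASCII characters in the pattern are skipped, since the
    condition only concerns non-ASCII characters.
    """
    prefixes = set()
    for ch in pattern:
        if ord(ch) < 0x80:
            continue
        for c in [ch, *equivalents(ch)]:
            prefixes.add(group_prefix(c))
    # At most one distinct prefix means a single character group.
    return len(prefixes) <= 1


def case_equivs(ch):
    """Toy equivalence for the example: simple upper/lower case pairs."""
    return {ch.lower(), ch.upper()}


# U+00E9 and U+00C9 share the lead byte 0xC3, so they are in one group.
print(can_use_boyer_moore("caf\u00e9", case_equivs))
# U+00FF (0xC3 0xBF) and U+0178 (0xC5 0xB8) are in different groups.
print(can_use_boyer_moore("\u00ff", case_equivs))
```

When the check holds, only the last byte of each character varies
among its equivalents, which is presumably why a byte-indexed table
such as simple_translate is enough in that case.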