From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Chong Yidong
Newsgroups: gmane.emacs.bugs
Subject: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 00:15:57 -0400
Message-ID: <87wsi3qeiq.fsf@cyd.mit.edu>
Reply-To: Chong Yidong, 540@emacsbugs.donarmstrong.com
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 540@emacsbugs.donarmstrong.com
To: Kenichi Handa
X-Emacs-PR-Message: report 540
X-Emacs-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:19750

Hi Handa-san,

Could you take a look at this bug report?  Thanks.

Juri Linkov wrote:

> There is a weird bug in searching Unicode text.  The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'.  I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of first two lines.

Here, the failing case is:

  П          = 1055 = 10000011111
  inverse(П) = 1087 = 10000111111
                           ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is

  Р          = 1056 = 10000100000
  inverse(Р) = 1088 = 10001000000
                           ^^^^^^

I've indicated the last 6 bits, according to the logic in
search_buffer (which I don't fully understand).
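
To make the bit arithmetic above easy to re-check, here is a small
Python sketch.  It is not the code in search_buffer; it only assumes,
as the figures above suggest, that what matters is whether a character
and its case counterpart agree in everything above the last 6 bits.
The names LOW_BITS, split_bits and differs_only_in_low_bits are made up
for this illustration, and Python's str.lower() stands in for the
"inverse" mapping.

```python
# Sketch only: the 6-bit split is taken from the message above,
# not from the actual Emacs search code.
LOW_BITS = 6
MASK = (1 << LOW_BITS) - 1


def split_bits(codepoint):
    """Return (bits above the low 6, low 6 bits) of a code point."""
    return codepoint >> LOW_BITS, codepoint & MASK


def differs_only_in_low_bits(ch):
    """True if ch and its lowercase form agree above the low 6 bits."""
    upper_high, _ = split_bits(ord(ch))
    lower_high, _ = split_bits(ord(ch.lower()))
    return upper_high == lower_high


for ch in "ПР":
    lower = ch.lower()
    print(f"{ch} = {ord(ch)} = {ord(ch):011b}")
    print(f"inverse({ch}) = {ord(lower)} = {ord(lower):011b}")
    print("  same bits above the low 6:", differs_only_in_low_bits(ch))
```

Under that assumption, П and its lowercase form differ only within the
last 6 bits, while Р and its lowercase form differ above them, which
lines up with the two cases taking different search paths.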