bug#540: 23.0.60; Unicode search bug

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#540: 23.0.60; Unicode search bug
From: Juri Linkov @ 2008-07-06 18:43 UTC
  To: emacs-pretest-bug

There is a weird bug in searching Unicode text.  The search function
fails on Cyrillic letters between codepoints #x0400 and #x041f, but
successfully finds a Cyrillic letter between #x0420 and #x042f.

I tried to debug this and found that the failing search calls
`boyer_moore', while the successful search calls `simple_search'.
I checked the Unicode properties, but everything seems correct.

This bug didn't exist before the Unicode merge.

The easiest way to reproduce it: run `emacs -Q',
put in the *scratch* buffer the following 4 lines
(note the leading space):

(search-forward " П" nil t)
(search-forward " Р" nil t)
 П
 Р

and type `C-x C-e' after each of the first two lines.
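
For reference, the two characters in the recipe sit on either side of the boundary between the ranges reported above. The following is only a small illustrative check, written in Python rather than Emacs Lisp; the helper name `codepoint` and the two range constants are made up for this sketch and simply restate the ranges from the report.

```python
# Illustrative only: restates the ranges from the report, does not test Emacs.

FAILING_RANGE = range(0x0400, 0x0420)  # #x0400..#x041f, reported as not found
WORKING_RANGE = range(0x0420, 0x0430)  # #x0420..#x042f, reported as found


def codepoint(ch: str) -> str:
    """Format a character's code point in the #xNNNN notation used in the report."""
    return f"#x{ord(ch):04x}"


for ch in "ПР":
    reported = "not found" if ord(ch) in FAILING_RANGE else "found"
    print(ch, codepoint(ch), "reported as", reported)
```

So П is the last code point of the failing range and Р is the first code point of the working one.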

In GNU Emacs 23.0.60 (x86_64-pc-linux-gnu)
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

-- 
Juri Linkov
http://www.jurta.org/emacs/

* bug#540: marked as done (23.0.60; Unicode search bug)
From: Emacs bug Tracking System @ 2008-08-27 14:40 UTC
  To: Chong Yidong

[-- Attachment #1: Type: text/plain, Size: 822 bytes --]

Your message dated Wed, 27 Aug 2008 10:34:40 -0400
with message-id <87wsi2a5mn.fsf@cyd.mit.edu>
and subject line Re: bug#540: 23.0.60; Unicode search bug
has caused the Emacs bug report #540,
regarding 23.0.60; Unicode search bug
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact don@donarmstrong.com
immediately.)

-- 
540: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=540
Emacs Bug Tracking System
Contact don@donarmstrong.com with problems

[-- Attachment #3: Type: message/rfc822, Size: 1302 bytes --]

From: Chong Yidong <cyd@stupidchicken.com>
To: Andreas Schwab <schwab@suse.de>
Cc: 540-done@emacsbugs.donarmstrong.com, Kenichi Handa <handa@m17n.org>
Subject: Re: bug#540: 23.0.60; Unicode search bug
Date: Wed, 27 Aug 2008 10:34:40 -0400
Message-ID: <87wsi2a5mn.fsf@cyd.mit.edu>

Andreas Schwab <schwab@suse.de> writes:

> Should be fixed now.

Thanks!

* bug#540: 23.0.60; Unicode search bug
From: Chong Yidong @ 2008-08-27  4:15 UTC
  To: Kenichi Handa; +Cc: 540

Hi Handa-san,

Could you take a look at this bug report?  Thanks.

Juri Linkov <juri@jurta.org> wrote:
> There is a weird bug in searching Unicode text.  The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and found that the failing search calls
> `boyer_moore', while the successful search calls `simple_search'.
> I checked the Unicode properties, but everything seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
>  П
>  Р
>
> and type `C-x C-e' after each of the first two lines.

Here, the failing case is:

П          = 1055 = 10000011111
inverse(П) = 1087 = 10000111111
                         ^^^^^^

whereas the case that works (by setting boyer_moore_ok to 0) is:

Р          = 1056 = 10000100000
inverse(Р) = 1088 = 10001000000
                         ^^^^^^

I've indicated the last 6 bits, according to the logic in search_buffer
(which I don't fully understand).
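
One possible reading of the marked bits, offered as an assumption rather than a description of the actual code in search_buffer: a UTF-8 trailing byte carries six payload bits, so a character and its case counterpart that agree on everything above the low six bits differ only in their last byte. The sketch below (Python, for illustration only) applies that test to the two characters above. The names `LOW_BITS_MASK` and `shares_high_bits` are hypothetical, and Python's `str.lower()` stands in for the case-table "inverse" used in the message.

```python
# Illustrative sketch of an ASSUMED check; this is not the code from search.c.

LOW_BITS_MASK = 0x3F  # the six payload bits of a UTF-8 trailing byte


def shares_high_bits(ch: str) -> bool:
    """Return True if ch and its lowercase form agree above the low 6 bits."""
    upper = ord(ch)
    lower = ord(ch.lower())  # stand-in for the case-table inverse
    return (upper & ~LOW_BITS_MASK) == (lower & ~LOW_BITS_MASK)


for ch in "ПР":
    upper, lower = ord(ch), ord(ch.lower())
    print(f"{ch}: {upper} = {upper:011b}, inverse {lower} = {lower:011b}, "
          f"same high bits: {shares_high_bits(ch)}")
```

Under this reading, П and its inverse share their high bits while Р and its inverse do not, which would be consistent with the first search going through boyer_moore and the second falling back to simple_search.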

* bug#540: 23.0.60; Unicode search bug
From: Andreas Schwab @ 2008-08-27 10:59 UTC
  To: Chong Yidong; +Cc: 540, Kenichi Handa

Chong Yidong <cyd@stupidchicken.com> writes:

>> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
>> buffer the following 4 lines (note the leading space):
>>
>> (search-forward " П" nil t)
>> (search-forward " Р" nil t)
>>  П
>>  Р
>>
>> and type `C-x C-e' after each of the first two lines.

Should be fixed now.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
