* bug#540: 23.0.60; Unicode search bug
From: Juri Linkov @ 2008-07-06 18:43 UTC
To: emacs-pretest-bug
There is a weird bug in searching Unicode text. The search function
fails on Cyrillic letters between codepoints #x0400 and #x041f, but
successfully finds Cyrillic letters between #x0420 and #x042f.
I tried to debug this and found that in the failing case
it calls `boyer_moore', while in the successful case
it calls `simple_search'. I checked the Unicode properties,
but everything seems correct.
This bug didn't exist before the Unicode merge.
The easiest way to reproduce it: run `emacs -Q',
insert the following 4 lines into the *scratch* buffer
(note the leading space in both search strings):
(search-forward " П" nil t)
(search-forward " Р" nil t)
П
Р
and type `C-x C-e' after each of the first two lines.
In GNU Emacs 23.0.60 (x86_64-pc-linux-gnu)
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
--
Juri Linkov
http://www.jurta.org/emacs/
* bug#540: 23.0.60; Unicode search bug
From: Chong Yidong @ 2008-08-27 4:15 UTC
To: Kenichi Handa; +Cc: 540
Hi Handa-san,
Could you take a look at this bug report? Thanks.
Juri Linkov <juri@jurta.org> wrote:
> There is a weird bug in searching Unicode text. The search function
> fails on Cyrillic letters between codepoints #x0400 and #x041f, but
> successfully finds a Cyrillic letter between #x0420 and #x042f.
>
> I tried to debug this and see that in case of failure it calls
> `boyer_moore', and in case of successful search it calls
> `simple_search'. I checked the Unicode properties, but everything
> seems correct.
>
> This bug didn't exist before the Unicode merge.
>
> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
> buffer the following 4 lines (note the leading space):
>
> (search-forward " П" nil t)
> (search-forward " Р" nil t)
> П
> Р
>
> and type `C-x C-e' after each of first two lines.
Here, the failing case is:
  П          = 1055 = 10000011111
  inverse(П) = 1087 = 10000111111
                           ^^^^^^
whereas the case that works (search_buffer sets boyer_moore_ok to 0,
so simple_search is used) is:
  Р          = 1056 = 10000100000
  inverse(Р) = 1088 = 10001000000
                           ^^^^^^
I've marked the last 6 bits, following the logic in search_buffer
(which I don't fully understand).
* bug#540: 23.0.60; Unicode search bug
From: Andreas Schwab @ 2008-08-27 10:59 UTC
To: Chong Yidong; +Cc: 540, Kenichi Handa
Chong Yidong <cyd@stupidchicken.com> writes:
>> The easiest way to reproduce it: run `emacs -Q', put in the *scratch*
>> buffer the following 4 lines (note the leading space):
>>
>> (search-forward " П" nil t)
>> (search-forward " Р" nil t)
>> П
>> Р
>>
>> and type `C-x C-e' after each of first two lines.
Should be fixed now.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de