From: Kenichi Handa <handa@etl.go.jp>
Cc: bug-gnu-emacs@gnu.org
Subject: Re: Problem with Boyer Moore and Greek characters
Date: Tue, 7 May 2002 22:35:29 +0900 (JST)
Message-ID: <200205071335.WAA08426@etlken.m17n.org>
In-Reply-To: message from Thomas Morgan on 22 Apr 2002 19:44:17 -0400
Sorry for the late reply on this matter.
Although I don't fully understand this part of the code, your
fix seems correct. Richard, what do you think?
Shall I install it (both in HEAD and RC)?
---
Ken'ichi HANDA
handa@etl.go.jp
Thomas Morgan <tlm@pocketmail.com> writes:
> I ran GNU Emacs 21.1.1 (i686-pc-linux-gnu, X toolkit) with the options
> `--q --no-site-file', then typed the following into `*scratch*':
> (search-forward "ί")
> ύ
> (The first Greek character is an accented iota represented in Emacs by
> the character number 342199, and the second is an accented upsilon
> represented by 342203. I entered them with the input method
> `greek-ibycus4'.)
> Then I pressed `C-p' and `C-e' to move point to the end of the first
> line, and `C-x C-e' to evaluate the expression.
> Here is the exact input for all of that:
> ( s e a r c h - f o r w a r d SPC " C-x <return> C-\
> g r e e k - i b y c u s 4 <return> i ' C-\ " ) <return>
> C-\ u ' C-\ C-p C-e C-x C-e
> This moved the cursor to the end of the second line, and displayed
> `214', the new position of point, in the echo area. So searching for
> the iota found the upsilon. This must be a bug.
> Boyer Moore searching compares only the last bytes of the characters,
> and this leads to the problem. If you capitalize the accented iota,
> the last byte is the same as the last byte of the upsilon, although
> their second-to-last bytes are different.
> Capital accented iota \234\364\362\273
> Small accented upsilon \234\364\361\273
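
To make the collision concrete, here is a small sketch in C that uses
only the two byte sequences quoted above. It is not code from search.c;
the helper names last_byte and same_leading_bytes are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal byte sequences quoted in the report, written in octal.  */
static const unsigned char capital_iota[]  = { 0234, 0364, 0362, 0273 };
static const unsigned char small_upsilon[] = { 0234, 0364, 0361, 0273 };

/* Hypothetical helper: the final byte of a multibyte character,
   which is all that a last-byte comparison looks at.  */
static unsigned char
last_byte (const unsigned char *ch, size_t len)
{
  assert (len > 0);
  return ch[len - 1];
}

/* Hypothetical helper: nonzero if two characters of equal length
   agree on every byte except possibly the last one.  */
static int
same_leading_bytes (const unsigned char *a, const unsigned char *b,
                    size_t len)
{
  assert (len > 0);
  return memcmp (a, b, len - 1) == 0;
}
```

With these bytes, last_byte returns \273 for both characters, while
same_leading_bytes returns 0 because the third bytes (\362 and \361)
differ. A comparison that looks only at the last byte therefore cannot
tell the two characters apart.
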
> So before doing a Boyer Moore search, `search_buffer' needs to check
> that the character and its inversion have the same first three bytes.
> Here is the patch I made to do that. Please forgive my mistakes; I am
> not a programmer.
> cd ~/emacs-21.1/src/
> diff -c /home/tlm/emacs-21.1/src/search.c.\~1\~ /home/tlm/emacs-21.1/src/search.c
> *** /home/tlm/emacs-21.1/src/search.c.~1~ Mon Oct 1 02:08:20 2001
> --- /home/tlm/emacs-21.1/src/search.c Wed Apr 3 07:53:39 2002
> ***************
> *** 1237,1243 ****
>   /* Keep track of which character set row
>      contains the characters that need translation. */
>   int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! if (charset_base == -1)
>     charset_base = charset_base_code;
>   else if (charset_base != charset_base_code)
>     /* If two different rows appear, needing translation,
> --- 1237,1246 ----
>   /* Keep track of which character set row
>      contains the characters that need translation. */
>   int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! int inverse_charset_base = inverse & ~CHAR_FIELD3_MASK;
> ! if (charset_base_code != inverse_charset_base)
> !   boyer_moore_ok = 0;
> ! else if (charset_base == -1)
>     charset_base = charset_base_code;
>   else if (charset_base != charset_base_code)
>     /* If two different rows appear, needing translation,
> Diff finished at Wed Apr 3 08:00:10
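
The test added by the patch can be tried in isolation with the sketch
below. It rests on several assumptions that are not stated in the
message: that the last field of a character code is its low 7 bits (so
FIELD3_MASK stands in for CHAR_FIELD3_MASK), that the third and fourth
bytes carry 7-bit fields with the high bit set, and that the capital
iota's code can be rebuilt from its quoted bytes on that basis. The
assumed layout does agree with the report's numbers for the upsilon:
342203 yields the bytes \361 and \273. The names row_base and
last_byte_search_ok are hypothetical.

```c
/* Assumption: the last field of a character code is its low 7 bits,
   so this mask stands in for CHAR_FIELD3_MASK from the patch.  The
   value is assumed for illustration, not quoted from the message.  */
#define FIELD3_MASK 0x7F

/* Character numbers given in the report.  */
#define SMALL_IOTA    342199
#define SMALL_UPSILON 342203

/* Assumption: capital accented iota, rebuilt from its quoted bytes
   \362 and \273 by dropping each byte's high bit and keeping the
   upper bits of SMALL_IOTA (the first two bytes are identical).  */
#define CAPITAL_IOTA \
  ((SMALL_IOTA & ~0x3FFF) | ((0362 & 0x7F) << 7) | (0273 & 0x7F))

/* Hypothetical helper: the row of a character, that is, its code
   with the last field cleared.  */
static int
row_base (int c)
{
  return c & ~FIELD3_MASK;
}

/* Hypothetical helper mirroring the test the patch adds: comparing
   only last bytes is safe only when a character and its case
   inverse lie in the same row.  */
static int
last_byte_search_ok (int c, int inverse)
{
  return row_base (c) == row_base (inverse);
}
```

Under these assumptions, last_byte_search_ok (SMALL_IOTA, CAPITAL_IOTA)
is 0, which corresponds to the case where the patch sets boyer_moore_ok
to 0. The small iota and the small upsilon share a row and differ only
in the last field, so they remain distinguishable by their last byte.
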