From: Kenichi Handa <handa@etl.go.jp>
Cc: bug-gnu-emacs@gnu.org
Subject: Re: Problem with Boyer Moore and Greek characters
Date: Tue, 7 May 2002 22:35:29 +0900 (JST)
Message-ID: <200205071335.WAA08426@etlken.m17n.org>
In-Reply-To: message from Thomas Morgan on 22 Apr 2002 19:44:17 -0400

Sorry for the late reply on this matter.

Although I don't fully understand this part of the code, your
fix seems correct.  Richard, what do you think?
Shall I install it (in both HEAD and RC)?

---
Ken'ichi HANDA
handa@etl.go.jp

Thomas Morgan <tlm@pocketmail.com> writes:
> I ran GNU Emacs 21.1.1 (i686-pc-linux-gnu, X toolkit) with the options
> `--q --no-site-file', then typed the following into `*scratch*':

>   (search-forward "ί")
>   ύ

> (The first Greek character is an accented iota represented in Emacs by
> the character number 342199, and the second is an accented upsilon
> represented by 342203.  I entered them with the input method
> `greek-ibycus4'.)

> Then I pressed `C-p' and `C-e' to move point to the end of the first
> line, and `C-x C-e' to evaluate the expression.

> Here is the exact input for all of that:

> ( s e a r c h - f o r w a r d SPC " C-x <return> C-\ 
> g r e e k - i b y c u s 4 <return> i ' C-\ " ) <return> 
> C-\ u ' C-\ C-p C-e C-x C-e

> This moved the cursor to the end of the second line, and displayed
> `214', the new position of point, in the echo area.  So searching for
> the iota found the upsilon.  This must be a bug.

> Boyer Moore searching compares only the last bytes of the characters,
> and this leads to the problem.  If you capitalize the accented iota,
> the last byte is the same as the last byte of the upsilon, although
> their second-to-last bytes are different.

> Capital accented iota	\234\364\362\273
> Small accented upsilon	\234\364\361\273
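
The collision described above can be reproduced outside Emacs with a minimal
sketch.  It uses only the two byte sequences quoted in the report; the helper
names `last_byte_matches' and `all_bytes_match' are hypothetical and do not
appear in search.c.  It illustrates the symptom only and is not the actual
Boyer-Moore loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Multibyte forms quoted in the report, written as octal constants.  */
const unsigned char capital_iota[4]  = {0234, 0364, 0362, 0273};
const unsigned char small_upsilon[4] = {0234, 0364, 0361, 0273};

/* Hypothetical helper: compare only the final byte of two characters,
   which is the comparison the report says leads to the problem.  */
int last_byte_matches(const unsigned char *a, const unsigned char *b,
                      size_t len)
{
  return len > 0 && a[len - 1] == b[len - 1];
}

/* Hypothetical helper: compare every byte of the two characters.  */
int all_bytes_match(const unsigned char *a, const unsigned char *b,
                    size_t len)
{
  return memcmp(a, b, len) == 0;
}
```

Called on the two sequences with a length of 4, `last_byte_matches' reports a
match (both end in \273) while `all_bytes_match' does not, because the third
bytes differ (\362 versus \361).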

> So before doing a Boyer Moore search, `search_buffer' needs to check
> that the character and its inversion have the same first three bytes.
> Here is the patch I made to do that.  Please forgive my mistakes; I am
> not a programmer.

> cd ~/emacs-21.1/src/
> diff -c /home/tlm/emacs-21.1/src/search.c.\~1\~ /home/tlm/emacs-21.1/src/search.c
> *** /home/tlm/emacs-21.1/src/search.c.~1~	Mon Oct  1 02:08:20 2001
> --- /home/tlm/emacs-21.1/src/search.c	Wed Apr  3 07:53:39 2002
> ***************
> *** 1237,1243 ****
>   		  /* Keep track of which character set row
>   		     contains the characters that need translation.  */
>   		  int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! 		  if (charset_base == -1)
>   		    charset_base = charset_base_code;
>   		  else if (charset_base != charset_base_code)
>   		    /* If two different rows appear, needing translation,
> --- 1237,1246 ----
>   		  /* Keep track of which character set row
>   		     contains the characters that need translation.  */
>   		  int charset_base_code = c & ~CHAR_FIELD3_MASK;
> ! 		  int inverse_charset_base = inverse & ~CHAR_FIELD3_MASK;
> ! 		  if (charset_base_code != inverse_charset_base)
> ! 		    boyer_moore_ok = 0;
> ! 		  else if (charset_base == -1)
>   		    charset_base = charset_base_code;
>   		  else if (charset_base != charset_base_code)
>   		    /* If two different rows appear, needing translation,

> Diff finished at Wed Apr  3 08:00:10
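
The test added by the patch can be tried in isolation with the sketch below.
It assumes that CHAR_FIELD3_MASK selects the low 7 bits of a character code
(0x7F); that value is an assumption about Emacs 21 and is not stated in this
thread, so the macro is named `ROW_MASK_ASSUMED' here.  The function `same_row'
is a hypothetical stand-in for the comparison of `charset_base_code' with
`inverse_charset_base'; in the patch, a mismatch sets `boyer_moore_ok' to 0.

```c
#include <assert.h>

/* Assumed stand-in for CHAR_FIELD3_MASK: the low 7 bits of the code.  */
#define ROW_MASK_ASSUMED 0x7F

/* Return nonzero when a character and its case inverse differ only in
   the masked low bits, that is, when both lie in the same row.  */
int same_row(int c, int inverse)
{
  int charset_base_code = c & ~ROW_MASK_ASSUMED;
  int inverse_charset_base = inverse & ~ROW_MASK_ASSUMED;
  return charset_base_code == inverse_charset_base;
}
```

Under that assumption, the two character numbers given in the report, 342199
and 342203, fall in the same row, while 342199 and 342199 + 128 (an arbitrary
code one row further on, chosen only for illustration) do not.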



