all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: ulm@gentoo.org, emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
Subject: Re: Case mapping of sharp s
Date: Wed, 18 Nov 2009 14:33:57 +0900	[thread overview]
Message-ID: <876398wg56.fsf@uwakimon.sk.tsukuba.ac.jp> (raw)
In-Reply-To: <83bpj0mq2v.fsf@gnu.org>

Eli Zaretskii writes:

 > My idea was to use the same technique to fold the text as you search
 > through it.  You could, for example, fold one character at a time, so
 > that instead of comparing against X you compare against fold(X).

That's precisely how case-insensitive Boyer-Moore is done.  But it's
not that easy for Unicode, because you will need to check
*characters*, not octets, while Boyer-Moore is very much oriented
toward fixed-width charsets.

I think the problem here is that because of the nature of Unicode,
this folding often crosses character blocks (here meaning "sets of
characters with the same bits except for the lowest 6").  That means
that you will need to keep track not only of the folding for the
current octet, but you will also need to be able to backtrack to check
the higher bits.  By contrast, with the Mule internal encoding the
case pairs are always in the same block of 96 characters.

I don't think that it's *impossible* to implement Boyer-Moore with
translation for UTF-8, but it's nowhere near as easy as for the
unibyte charset case, which Mule internal can in practice be reduced
to.  It's possible that all this fiddling could result in a
performance hit of the same order of magnitude as the performance hit
of brute-forcing the comparison.




  reply	other threads:[~2009-11-18  5:33 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-15 14:29 Case mapping of sharp s Ulrich Mueller
2009-11-16 12:06 ` Kenichi Handa
2009-11-16 16:38   ` Ulrich Mueller
2009-11-17  7:36     ` Kenichi Handa
2009-11-17 21:23       ` Reiner Steib
2009-11-16 19:12   ` Eli Zaretskii
2009-11-17  7:43     ` martin rudalics
2009-11-17  7:49     ` Kenichi Handa
2009-11-17 18:56       ` Eli Zaretskii
2009-11-18  1:00         ` Kenichi Handa
2009-11-18  4:09           ` Eli Zaretskii
2009-11-18  5:33             ` Stephen J. Turnbull [this message]
2009-11-18  6:26             ` Kenichi Handa
2009-11-18 14:44               ` Stefan Monnier
2009-11-18 19:05                 ` Ulrich Mueller
2009-11-19  1:16                   ` Stefan Monnier
2009-11-18 17:58               ` Eli Zaretskii
2009-11-19  1:57                 ` Stephen J. Turnbull
  -- strict thread matches above, loose matches on Subject: below --
2009-11-19 19:48 grischka
2009-11-19 21:49 ` Stefan Monnier
2009-11-19 22:43   ` David Kastrup
2009-11-20  2:08     ` Stefan Monnier
2009-11-20  8:03       ` David Kastrup
2009-11-20 14:14         ` Stefan Monnier
2009-11-20  3:41     ` Stephen J. Turnbull
2009-11-20  4:20       ` Stefan Monnier
2009-11-20  7:13         ` Stephen J. Turnbull
2009-11-21  0:02           ` Richard Stallman
2009-11-21 12:39             ` David Kastrup
2009-11-21 17:40               ` Stephen J. Turnbull
2009-11-21 19:15                 ` Eli Zaretskii
2009-11-22  2:58                   ` Stephen J. Turnbull
2009-11-22  4:28                     ` Eli Zaretskii
2009-11-22  8:27                       ` Stephen J. Turnbull
2009-11-23  1:30                 ` Kenichi Handa
2009-11-21 22:52               ` Richard Stallman
2009-11-20  8:10     ` Ulrich Mueller
2009-11-20 11:46       ` Stephen J. Turnbull
2009-11-20 14:43         ` Ulrich Mueller
2009-11-21  4:33           ` Stephen J. Turnbull
2009-11-19 23:25   ` grischka
2009-11-20  2:11     ` Stefan Monnier
2009-11-21  3:08       ` grischka
2009-11-21  8:58         ` Eli Zaretskii
2009-11-21  9:33           ` Andreas Schwab
2009-11-21 11:45             ` Eli Zaretskii
2009-11-21 15:33           ` grischka
2009-11-21 10:41         ` Ulrich Mueller
2009-11-21 11:58           ` Andreas Schwab
2009-11-21 17:01             ` Ulrich Mueller
2009-11-22 12:11               ` Andreas Schwab
2009-11-22 20:15                 ` Stefan Monnier
2009-11-24 12:26             ` Kenichi Handa
2009-11-24 19:23               ` grischka
2009-11-25  2:13                 ` Kenichi Handa
2009-11-26 13:07                   ` grischka
2009-11-29 22:03                   ` Juri Linkov
2009-11-30  1:22                     ` Stefan Monnier
2009-11-30  1:28                     ` Kenichi Handa
2009-11-30  1:36                       ` Kenichi Handa
2009-11-30  7:01                     ` Ulrich Mueller
2009-11-30 12:01                       ` Juri Linkov
2009-11-30 13:09                         ` martin rudalics
2009-11-30 21:57                       ` Juri Linkov
2009-11-30 22:34                         ` Ulrich Mueller
2009-12-01  0:02                           ` Juri Linkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=876398wg56.fsf@uwakimon.sk.tsukuba.ac.jp \
    --to=stephen@xemacs.org \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=handa@m17n.org \
    --cc=ulm@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.