unofficial mirror of emacs-devel@gnu.org 
From: Ulrich Mueller <ulm@gentoo.org>
To: Kenichi Handa <handa@m17n.org>
Cc: emacs-devel@gnu.org
Subject: Re: Case mapping of sharp s
Date: Mon, 16 Nov 2009 17:38:26 +0100
Message-ID: <19201.32770.352944.474086@a1i15.kph.uni-mainz.de>
In-Reply-To: <tl7d43ir7w1.fsf@m17n.org>

>>>>> On Mon, 16 Nov 2009, Kenichi Handa wrote:

> In article <19200.4158.380820.761685@a1i15.kph.uni-mainz.de>, Ulrich Mueller <ulm@gentoo.org> writes:
>> In Unicode since version 5.1.0 the U+1E9E code point is assigned
>> to "LATIN CAPITAL LETTER SHARP S". Would it be possible to add a
>> mapping from this to the lower case ß, as in the patch below?

>> However, I've noticed that similar mappings for Turkish ı (dotless
>> i) and İ (I with dot) were commented out [1]. Is it still the case
>> that such a change would "make searches slow", as stated in the comment?

> That kind of setting surely makes searching for ß and ẞ slow,
> because we can't use BM search when case-fold-search is non-nil.
> BM search is possible only when all case-equivalent characters are
> represented by the same number of bytes and differ only in the last
> byte.
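
A minimal Python sketch of the byte condition described above, applied to the case pairs discussed in this thread. It assumes, as an approximation, that Emacs's internal multibyte form is byte-identical to UTF-8 for these characters. `bm_compatible` is a hypothetical helper, not an Emacs function.

```python
def bm_compatible(a: str, b: str) -> bool:
    """True if a and b have UTF-8 encodings of the same length that differ
    at most in their last byte (hypothetical helper, not part of Emacs)."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return len(ea) == len(eb) and ea[:-1] == eb[:-1]


# Case pairs from this thread, plus two pairs that do satisfy the condition.
pairs = [("a", "A"), ("é", "É"), ("ß", "ẞ"), ("ı", "I"), ("i", "İ")]
for small, capital in pairs:
    print(small, small.encode("utf-8").hex(),
          capital, capital.encode("utf-8").hex(),
          bm_compatible(small, capital))
```

Only a/A and é/É (c3 a9 vs c3 89) qualify. ß encodes as two bytes (c3 9f) and ẞ as three (e1 ba 9e), so that case pair would rule out BM search.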

So, do I understand this correctly: to perform a Boyer-Moore search,
case-equivalent characters must either both be ASCII, or lie in the
same aligned block of 64 code points (because the last byte of a UTF-8
sequence encodes 6 bits)?

Is that also the reason why ÿ and Ÿ (U+00FF and U+0178, small/capital
y with diaeresis) don't form a case pair?
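
The following Python sketch makes the 64-character arithmetic concrete with a hand-written UTF-8 encoder. It assumes that Emacs's multibyte form coincides with UTF-8 for Unicode characters. The names `utf8_encode` and `bm_pair_ok` are hypothetical. The last byte of every multibyte sequence is `0x80 | (cp & 0x3F)`, so two sequences of equal length share all earlier bytes exactly when `cp >> 6` agrees.

```python
def utf8_encode(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder (hypothetical helper) exposing the bit layout.
    Every continuation byte is 0b10xxxxxx, so the last byte carries cp & 0x3F."""
    if cp < 0x80:                        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),  # 4 bytes
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def bm_pair_ok(lower: int, upper: int) -> bool:
    """Both ASCII, or both in the same aligned block of 64 code points.
    Blocks never straddle a length boundary (0x80, 0x800 and 0x10000 are
    multiples of 64), so this equals 'same length, differ only in last byte'."""
    if lower < 0x80 or upper < 0x80:
        return lower < 0x80 and upper < 0x80
    return lower >> 6 == upper >> 6


for name, cp in [("ÿ", 0xFF), ("Ÿ", 0x178)]:
    print(name, hex(cp), utf8_encode(cp).hex(), "block", cp >> 6)
print(bm_pair_ok(0xFF, 0x178))  # False
```

ÿ and Ÿ both encode to two bytes (c3 bf and c5 b8). However, 0xFF >> 6 is 3 and 0x178 >> 6 is 5, so they fall in different blocks and their first bytes differ.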

> So, if you are sure that searching for ß is very rare (I have
> no idea), please install it.

Lower-case ß is very common in German text, so I'd guess that
searching for it is not rare.

On the other hand, capital ẞ is not used in regular German orthography
(which is probably why the character was added to Unicode only in
2008). So if the change would cost a lot of search speed, I don't
think it's worthwhile.

By what factor is the non-BM search slower than the BM search?

> By the way, I think it's possible to improve the current BM search
> for such a case. For instance, to search for "straße", we could first
> do a BM search for the "stra" part and then check the remaining "ße"
> part. Is anyone up for the challenge?

Ulrich




