unofficial mirror of bug-gnu-emacs@gnu.org 
From: Nathan Trapuzzano <nbtrap@nbtrap.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 17130@debbugs.gnu.org
Subject: bug#17130: 24.4.50; Deficient Unicode case folding
Date: Sat, 29 Mar 2014 11:29:43 -0400
Message-ID: <87ioqxxbtk.fsf@nbtrap.com>
In-Reply-To: <838urtdpwk.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 29 Mar 2014 17:45:47 +0300")

Eli Zaretskii <eliz@gnu.org> writes:

>> σ, ς, and Σ would all have σ in the CANONICALIZE slot, since they all
>> fold to σ.
>
> So you would need to search all characters to find those which have σ
> in the CANONICALIZE slot -- not very efficient, to say the least.

Doesn't this already happen?  If not, then what is the CANONICALIZE slot
doing that couldn't be done with the regular upcase/downcase slots by
themselves?

> IOW, what you suggest will provide a one-way mapping, whereas we need
> a two-way mapping.

Not sure I follow.  Seems to me the CANONICALIZE slot is sufficient, at
least in principle.
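
A rough sketch of why a one-way map might be enough, written in Python
rather than Emacs Lisp and purely for illustration.  The CANONICALIZE
dictionary and the helper names are hypothetical, not Emacs internals.
The idea is that the reverse direction can be derived once from the
one-way map, rather than by scanning all characters at search time
(which is roughly what I understand the EQUIVALENCES slot to be for):

```python
from collections import defaultdict

# Hypothetical one-way map: each character points at its canonical form.
# Characters not listed are assumed to canonicalize to themselves.
CANONICALIZE = {"Σ": "σ", "ς": "σ", "σ": "σ", "A": "a", "a": "a"}


def canonical(ch):
    """Return the canonical form of a single character."""
    return CANONICALIZE.get(ch, ch)


def build_equivalence_classes(table):
    """Invert the one-way map once: canonical form -> set of members."""
    classes = defaultdict(set)
    for ch, canon in table.items():
        classes[canon].add(ch)
        classes[canon].add(canon)
    return classes


# Built a single time, so later lookups need no scan over all characters.
CLASSES = build_equivalence_classes(CANONICALIZE)


def equivalents(ch):
    """Return every character that is case-equivalent to CH."""
    return CLASSES.get(canonical(ch), {ch})
```

With this table, equivalents("ς") yields the three-member set of σ, ς,
and Σ, so the class does not have to be a pair.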

>> > Besides, don't we also need to know that ς can only be present at the
>> > end of a word?
>> 
>> Don't think so.  AFAIK, Unicode says nothing about ordering except when
it comes to combining characters.  But even if it did prescribe such a
>> rule, I don't think it would have anything to do with case folding.
>
> Who said this is only about case folding?

I should have said just "case", not "case folding".

> Emacs should use this data for up-casing and down-casing as well, for
> example, so that M-l downcases Σ to ς, not σ, when it is at the end of
> the word.  Wouldn't users of Greek expect that?

Maybe.  I'm just saying that Unicode itself doesn't prescribe or even
recommend such behavior.  It defines case conversions independently of
ordering.

That said, making M-l downcase terminal Σ to ς would be a nice feature
that could be turned on, e.g., by a minor mode or by adding to some
*-functions variable whose functions get called before the normal
behavior of M-l is applied.  But it shouldn't have anything to do
with Unicode-compliant case-insensitive searching.
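
To illustrate the kind of optional, context-sensitive downcasing I
mean, here is a minimal sketch in Python (not Emacs Lisp).  The function
name is made up, and "end of the word" is approximated as "preceded by a
letter and not followed by one", which is an assumption on my part
rather than a rule taken from anywhere:

```python
def downcase_with_final_sigma(text):
    """Downcase TEXT, turning a word-final capital sigma into ς.

    Every other character is downcased on its own, without context.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == "Σ":
            prev_is_letter = i > 0 and text[i - 1].isalpha()
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            # Final form only when a letter precedes and none follows.
            out.append("ς" if prev_is_letter and not next_is_letter else "σ")
        else:
            out.append(ch.lower())
    return "".join(out)
```

For example, downcase_with_final_sigma("ΣΟΦΟΣ ΟΔΟΣ") gives "σοφος οδος".
Note that this only affects case conversion; it plays no part in
deciding whether two strings match case-insensitively.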

>> Right, but what I'm asking is: if Emacs doesn't do Unicode case folding,
>> what is the purpose of the CANONICALIZE slot except as a kind of
>> placeholder that gets autofilled?
>
> Whenever you need the canonical equivalent of a character, such as in
> case-insensitive search, you need that slot.

But there's nothing about the slot that mandates that only _pairs_ can
be case-equivalent under case folding.  Indeed, the manual speaks of
"sets" of characters that might be equivalent under case-folding, hence
my understanding that σ, ς, and Σ can all have σ in their CANONICALIZE
slot, and that's all it would take.

(Btw, I'm using "case-insensitive" to mean the same as "under
case-folding".)
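
A small sketch of what I have in mind, in Python for illustration only
(the table and function names are hypothetical, and the table holds just
the sigma entries): a search that canonicalizes both the pattern and the
text needs nothing beyond the one slot.

```python
# Hypothetical CANONICALIZE table with only the sigma entries filled in.
# Anything not listed canonicalizes to itself.
CANONICALIZE = {"Σ": "σ", "ς": "σ"}


def canonicalize(text):
    """Map every character of TEXT to its canonical form."""
    return "".join(CANONICALIZE.get(ch, ch) for ch in text)


def fold_search(pattern, text):
    """Return the index of the first match of PATTERN in TEXT, or -1.

    Both sides are canonicalized first.  Because each entry maps one
    character to one character, indexes in the canonicalized text are
    also valid indexes in the original text.
    """
    return canonicalize(text).find(canonicalize(pattern))
```

Here fold_search("οδοσ", "η οδος") finds the match at index 2 even though
the pattern ends in σ and the text ends in ς.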

>> Are there other kinds of case folding--other than traditional
>> upper/lower and Unicode--that I'm not aware of?
>
> There's "title case", of course.  

I think title case would require an extra slot in the case table.

> There are also characters whose case pair is not a single character,
> but several, like the upper-case variant of ß in German.

Good point.  "ß" should fold to "ss".  I guess for the CANONICALIZE slot
to suffice, it would have to map to a string, not a code point.

> Personally, I think we need an additional slot for what you want, and
> code to use it.

Given the point about ß, you're probably right.  Unless we can make
entries in the CANONICALIZE slot be strings rather than code points.
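
For what it's worth, here is a sketch of what string-valued entries
could look like, in Python rather than Emacs Lisp.  The FOLD table and
both function names are hypothetical; characters without an entry simply
fall back to ordinary per-character downcasing:

```python
# Hypothetical fold table whose entries are strings, not code points.
FOLD = {"ß": "ss", "Σ": "σ", "ς": "σ"}


def fold(text):
    """Fold TEXT; one character may expand to several."""
    return "".join(FOLD.get(ch, ch.lower()) for ch in text)


def equal_ignoring_case(a, b):
    """Compare two strings under the fold above."""
    return fold(a) == fold(b)
```

With this, fold("Straße") is "strasse", so "STRASSE" and "straße"
compare equal; Python's own str.casefold() gives the same result for
that word.  The cost is that the folded text can be longer than the
original, so match positions in the folded text no longer correspond
one-to-one to positions in the buffer.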






Thread overview: 17+ messages
2014-03-28 12:07 bug#17130: 24.4.50; Deficient Unicode case folding Nathan Trapuzzano
2014-03-28 15:51 ` Eli Zaretskii
2014-03-28 19:31   ` nbtrap
2014-03-29  6:45     ` Eli Zaretskii
     [not found]       ` <87ob0pnptc.fsf@nbtrap.com>
2014-03-29 13:15         ` Eli Zaretskii
2014-03-29 14:03           ` Nathan Trapuzzano
2014-03-29 14:45             ` Eli Zaretskii
2014-03-29 15:29               ` Nathan Trapuzzano [this message]
2014-03-29 17:37                 ` Eli Zaretskii
2014-03-29 18:31                   ` Nathan Trapuzzano
2014-03-29 18:36                     ` Nathan Trapuzzano
2014-03-29 19:51                       ` Eli Zaretskii
2014-03-29 20:15                         ` Nathan Trapuzzano
2014-03-30  2:45                           ` Eli Zaretskii
2014-03-29 19:50                     ` Eli Zaretskii
2014-03-29 20:01                       ` Nathan Trapuzzano
2019-09-29 14:23 ` Lars Ingebrigtsen
