From: Thien-Thi Nguyen <ttn@gnuvola.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Comparing non-English strings for sorting
Date: Fri, 13 Feb 2009 00:57:15 +0100
Message-ID: <878wobgqec.fsf@ambire.localdomain>
In-Reply-To: <e9ff7e14-d8b2-4b75-9a34-4a00ad2c6019@a12g2000yqm.googlegroups.com> (spamfilteraccount@gmail.com's message of "Tue, 10 Feb 2009 02:47:41 -0800 (PST)")
() "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com>
() Tue, 10 Feb 2009 02:47:41 -0800 (PST)
   (vconcat (downcase str1))
   (vconcat (downcase str2)))))
If all the strings you wish to compare are composed entirely of
the characters in `order', this (unconditional case smashing) is
sufficient. Otherwise, comparing a downcased character in that
set with a "downcased" character outside that set (where the
result is equal to the input) can be problematic.
Consider the ASCII character set (ascii(7)), specifically, the
six indices between ?Z and ?a (here, we use ?_, decimal 95).
  (downcase ?_) => 95   ;; no change
  (downcase ?a) => 97   ;; no change
  (downcase ?A) => 97   ;; smashed (numerically "upward", hee hee)
  ?A            => 65   ;; originally
Using unconditional case smashing in a hypothetical analog of
`my-case-insensitive-nonenglish-string-comparator', we'd see:
  (string-ci-lessp "_" "a") => t
  (string-ci-lessp "_" "A") => t
  (string-lessp    "_" "a") => t
  (string-lessp    "_" "A") => nil
Perhaps the reason behind the difference between the 2nd and 4th
results being "one is case-insensitive and the other isn't" does
indeed satisfy you. It doesn't, me. What is the case of the
underscore and why should my (in)sensitivity to it matter at all?
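For readers who think in C, the four results above can be reproduced with a minimal sketch. The names `smashed_cmp` and `plain_cmp` are hypothetical helpers invented for this illustration, and the sketch assumes ASCII and the default "C" locale, so that `tolower` leaves the underscore unchanged.

```c
#include <ctype.h>

/* Hypothetical helper: compare two characters after unconditionally
   downcasing both ("case smashing").  Negative means X sorts first.
   Assumes ASCII and the default "C" locale.  */
static int
smashed_cmp (int x, int y)
{
  return tolower (x) - tolower (y);
}

/* Hypothetical helper: plain, case-sensitive comparison by
   character code.  Negative means X sorts first.  */
static int
plain_cmp (int x, int y)
{
  return x - y;
}
```

Under those assumptions, `smashed_cmp ('_', 'A')` is negative (95 - 97) while `plain_cmp ('_', 'A')` is positive (95 - 65): the position of the underscore relative to the letter changes only because the letter was moved.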
Appended is what i think is a more rational algorithm (expressed
in C, not Emacs Lisp, because it is part of an upcoming Guile
release (which is implemented (like Emacs) in C)). It allows for
the (properly phrased ;-) mu answer.
thi
______________________________________
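#include <ctype.h>

int
scm_i_ccmp_ci (int x, int y)
{
  int d, lx, ly, ux = 0, uy = 0;

  /* 1-based alphabet index if C has that case, else 0.  */
#define ISLOWER(c)  (islower (c) ? (1 + c - 'a') : 0)
#define ISUPPER(c)  (isupper (c) ? (1 + c - 'A') : 0)
  /* Nonzero if C is a letter; sets lC or uC as a side effect.  */
#define ALPHA(c)    ((l ## c = ISLOWER (c)) || (u ## c = ISUPPER (c)))
  /* GOOD is not defined in this excerpt; assumed to mean "positive".  */
#define GOOD(d)     ((d) > 0)

  d = (!ALPHA (x) || !ALPHA (y))
    /* Subtract directly. */
    ? (x - y)
    /* Subtract in one domain or another. */
    : (lx
       ? (lx - (ly
                ? ly
                : uy))
       : (ux - (uy
                ? uy
                : ly)));

  return !d
    ? 0
    : (GOOD (d)
       ? 1
       : -1);

#undef GOOD
#undef ALPHA
#undef ISUPPER
#undef ISLOWER
}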
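
The same rule can be restated without macros, which may make the two branches easier to follow. This is only a sketch: `letter_index` and `ccmp_ci_sketch` are hypothetical names, and it assumes plain ASCII ranges rather than the locale-aware islower/isupper used above.

```c
/* Hypothetical helper: 1-based alphabet index of C if it is an
   ASCII letter (either case), else 0.  */
static int
letter_index (int c)
{
  if (c >= 'a' && c <= 'z')
    return 1 + c - 'a';
  if (c >= 'A' && c <= 'Z')
    return 1 + c - 'A';
  return 0;
}

/* Hypothetical restatement of the rule: compare case-insensitively
   when both X and Y are letters, by raw character code otherwise.
   Returns -1, 0 or 1.  */
static int
ccmp_ci_sketch (int x, int y)
{
  int ix = letter_index (x);
  int iy = letter_index (y);
  int d = (ix && iy) ? (ix - iy) : (x - y);

  return (d > 0) - (d < 0);
}
```

With this sketch, `ccmp_ci_sketch ('a', 'A')` is 0, `ccmp_ci_sketch ('_', 'a')` is -1 and `ccmp_ci_sketch ('_', 'A')` is 1, so the underscore keeps its case-sensitive position relative to each letter. One consequence is that 'A' and 'a' compare equal while '_' falls between them, which is worth keeping in mind when sorting input that mixes letters and non-letters.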