From: Alexis <flexibeast@gmail.com>
To: 20316@debbugs.gnu.org
Cc: michael.albinus@gmx.de
Subject: bug#20316: 24.5; `string-lessp' doesn't respect value of LC_COLLATE
Date: Wed, 15 Apr 2015 09:55:22 +1000
Message-ID: <87a8ya5kph.fsf@gmail.com>
In-Reply-To: <83r3rmbvvn.fsf@gnu.org>
Eli Zaretskii <eliz@gnu.org> writes:
> I think we use "lexicographic" for lack of a more accurate word.
> We could use something like "code point (binary) order", but
> would that be clear enough to be useful?
i would certainly find that more useful overall, as i think it's
less ambiguous (to me) than 'lexicographic order' in this
context. i assume it's "code point [according to the overall
encoding of the relevant buffer]"? And given your earlier point,
i'm guessing it would also be useful to say something along the
lines of "If the data being sorted contains multiple encodings,
all bets are off"? (Which is relevant in the `org-vcard' case of
people possibly trying to sort contacts whose names are based in a
variety of locales.)
> Note that we are not alone in this; at least this page:
>
> http://en.cppreference.com/w/cpp/string/byte/strcoll
>
> says that the C function 'strcmp' does a "lexicographical
> comparison". So do a few other similar pages; google for
> "difference between strcmp and strcoll".
Well, that to me feels like a continued holdover from the C+ASCII
(or at best Latin-1) 'byte == character' mindset ....
>> A,B,C,Č,Ć,D,Dž,Đ,..S,Š,..Z,Ž
>
> That's "collation order" in action, note that the diacritic
> order is applied _after_ the alphabetic order of the base
> characters. That's what string-collate-lessp does.
*nod* That's why my first thoughts about this issue went to
collation settings; given that (it seems to me) Emacs has a far
better handle on i18n and m17n issues than most software, i
assumed that sorting-by-collation-order would already be available
in 24.x. However, given what you've said, i've now got a better
understanding of why implementing this is not straightforward.
Thanks for taking the time to explain all this!
Alexis.
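
A minimal sketch of the distinction discussed above, written in Python rather than Emacs Lisp so that it does not depend on any particular Emacs version or installed locale. It sorts the single-character letters from the quoted example twice: once by code point, and once by a hand-written alphabet. The `ALPHABET` list and the `collation_key` helper are assumptions made for illustration only. They stand in for a real LC_COLLATE table, omit the digraph "Dž" and the letters elided by ".." in the example, and assume precomposed characters.

```python
# Letters from the example quoted in this thread, in reverse order.
# The digraph "Dž" is left out to keep the sketch to single characters.
letters = ["Ž", "Z", "Š", "S", "Đ", "D", "Ć", "Č", "C", "B", "A"]

# Code point order: Python compares strings by the numeric value of
# each character, with no knowledge of any locale.
by_code_point = sorted(letters)

# Hypothetical hand-written alphabet standing in for a real collation
# table (an assumption for illustration, not a locale definition).
ALPHABET = ["A", "B", "C", "Č", "Ć", "D", "Đ", "S", "Š", "Z", "Ž"]
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def collation_key(s):
    """Map each character of s to its rank in ALPHABET."""
    return [RANK[ch] for ch in s]


# Collation order: compare by position in the alphabet instead.
by_collation = sorted(letters, key=collation_key)

print("code point:", " ".join(by_code_point))
print("collation: ", " ".join(by_collation))
```

By code point, every accented letter here sorts after "Z", because their code points all lie above the ASCII range. With the assumed alphabet, each accented letter stays next to its base letter, which is the behaviour the thread calls collation order.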