unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alexis <flexibeast@gmail.com>
To: 20316@debbugs.gnu.org
Cc: michael.albinus@gmx.de
Subject: bug#20316: 24.5; `string-lessp' doesn't respect value of LC_COLLATE
Date: Wed, 15 Apr 2015 09:55:22 +1000
Message-ID: <87a8ya5kph.fsf@gmail.com>
In-Reply-To: <83r3rmbvvn.fsf@gnu.org>


Eli Zaretskii <eliz@gnu.org> writes:

> I think we use "lexicographic" for lack of a more accurate word. 
> We could use something like "code point (binary) order", but 
> would that be clear enough to be useful?

i would certainly find that more useful overall, as i think it's 
less ambiguous (to me) than 'lexicographic order' in this 
context. i assume it's "code point [according to the overall 
encoding of the relevant buffer]"? And given your earlier point, 
i'm guessing it would also be useful to say something along the 
lines of "If the data being sorted contains multiple encodings, 
all bets are off"? (Which is relevant in the `org-vcard' case of 
people possibly trying to sort contacts whose names are based in a 
variety of locales.)
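
To make "code point order" concrete, here is a minimal sketch in Python rather than Emacs Lisp. Python's default string comparison also goes code point by code point, so it serves as a stand-in; it is not a statement about how `string-lessp' is implemented. The place names are illustrative and do not come from this thread.

```python
import unicodedata

# Illustrative names only (an assumption, not data from the bug report).
# NFC normalization makes sure each accented letter is one precomposed
# code point, so the first code point of "Čakovec" is U+010C, not "C".
names = [unicodedata.normalize("NFC", n)
         for n in ["Zagreb", "Čakovec", "Cres", "Šibenik", "Split"]]


def code_points(s):
    """Return the Unicode code points of s as a list of ints."""
    return [ord(ch) for ch in s]


# Python compares str values code point by code point, so the default
# sort and the explicit code-point key produce the same order.
by_code_point = sorted(names)
assert by_code_point == sorted(names, key=code_points)

# Show each name with the code point of its first character.
for name in by_code_point:
    print(f"{name:10} U+{ord(name[0]):04X}")
```

Under this ordering every name starting with an ASCII letter sorts before any name starting with Č (U+010C) or Š (U+0160), which is the kind of result the bug report describes.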

> Note that we are not alone in this; at least this page:
>
>   http://en.cppreference.com/w/cpp/string/byte/strcoll
>
> says that the C function 'strcmp' does a "lexicographical 
> comparison".  So do a few other similar pages; google for 
> "difference between strcmp and strcoll".

Well, that to me feels like a continued holdover from the C+ASCII 
(or at best Latin-1) 'byte == character' mindset ....

>> A,B,C,Č,Ć,D,Dž,Đ,..S,Š,..Z,Ž
>
> That's "collation order" in action, note that the diacritic 
> order is applied _after_ the alphabetic order of the base 
> characters.  That's what string-collate-lessp does.

*nod* That's why my first thoughts about this issue went to 
collation settings; given that (it seems to me) Emacs has a far 
better handle on i18n and m17n issues than most software, i 
assumed that sorting-by-collation-order would already be available 
in 24.x. However, given what you've said, i've now got a better 
understanding of why implementing this is not straightforward.
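
The idea that diacritics are considered only after the base letters can be sketched with a two-level sort key. This is a toy illustration in Python, not the algorithm behind `string-collate-lessp' and not a real locale collation: the helper name `toy_collation_key` and the sample names are hypothetical, and the key relies only on Unicode canonical decomposition.

```python
import unicodedata


def toy_collation_key(s):
    """Two-level key: base letters first, diacritics as a tie-breaker.

    Hypothetical helper for illustration; real collation (for example
    the C library's strcoll under a given LC_COLLATE) is locale-specific
    and considerably more involved.
    """
    # NFD splits "Č" into "C" followed by a combining caron (U+030C).
    decomposed = unicodedata.normalize("NFD", s)
    # Primary level: drop the combining marks, keeping base letters only.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: the decomposed string, marks included.
    return (base, decomposed)


# Illustrative names only (an assumption, not data from the bug report).
names = ["Zagreb", "Čakovec", "Cres", "Šibenik", "Split"]

by_toy_collation = sorted(names, key=toy_collation_key)
print(by_toy_collation)
```

With this key, names starting with Č fall among the C names and names starting with Š among the S names, instead of after Z. The sketch has clear limits: a letter such as Đ has no canonical decomposition, so it stays at its code point position, and a locale that treats Č as a separate letter following C (as the quoted alphabet does) would order some pairs differently.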

Thanks for taking the time to explain all this! 


Alexis.


Thread overview: 11+ messages
2015-04-13  5:22 bug#20316: 24.5; `string-lessp' doesn't respect value of LC_COLLATE Alexis
2015-04-13  5:59 ` Alexis
2015-04-13  6:18 ` Michael Albinus
2015-04-13  6:42   ` Alexis
2015-04-13  6:53     ` Michael Albinus
2015-04-13 14:48     ` Eli Zaretskii
2015-04-14  0:55       ` Alexis
2015-04-14 14:57         ` Eli Zaretskii
2015-04-14 23:55           ` Alexis [this message]
2015-04-15  1:23             ` Stefan Monnier
2015-04-13 14:10 ` Paul Eggert
