* Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-27 22:42 UTC (permalink / raw)
To: help-gnu-emacs
Hi,
I want to sort a list of strings, including accented strings, in a
"meaningful way". E.g. with this list (É E T A À Z) the sorted list
should be (A À E É T Z).
(sort '(É E T A À Z) 'string<)
=> (A E T Z À É) ; expected (A À E É T Z)
I tried all the versions of 'string< that I could find with apropos.
Is there a function that will support my preferred sorting in Emacs?
Thanks,
Rasmus
--
What will be next?
* Re: Alternative to string< that works "well" with unicode
From: Pascal J. Bourguignon @ 2014-11-27 23:52 UTC (permalink / raw)
To: help-gnu-emacs
Rasmus <rasmus@gmx.us> writes:
> Hi,
>
> I want to sort a list of strings, including accented strings, in a
> "meaningful way". E.g. with this list (É E T A À Z) the sorted list
> should be (A À E É T Z).
>
> (sort '(É E T A À Z) 'string<)
> => (A E T Z À É) ; expected (A À E É T Z)
You might have expected that, but users writing different languages will
have expected something else.
This is called localization.
> I tried all the versions of 'string< that I could find with apropos.
>
> Is there a function that will support my preferred sorting in Emacs?
AFAICS, there's nothing yet.
You could try to implement the UCA (Unicode Collation Algorithm):
http://en.wikipedia.org/wiki/Unicode_collation_algorithm
http://www.unicode.org/reports/tr10/
Alternatively, you could send the data to the unix sort command, with
the right LC_ALL environment variable.
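A minimal sketch of the external-sort approach, assuming a POSIX-style
`sort' program is on the PATH and that the requested locale is installed.
The function name my-sort-strings-externally is hypothetical, not part
of Emacs:

```elisp
;; Hypothetical helper, not part of Emacs: sort STRINGS with the
;; external `sort' program under the locale named by LOCALE.
(defun my-sort-strings-externally (strings locale)
  "Return STRINGS sorted by the external `sort' command under LOCALE.
LOCALE is a locale name such as \"fr_FR.UTF-8\"; it is assumed to be
installed on the system.  The strings must not contain newlines."
  (let ((process-environment
         ;; Earlier entries take precedence, so this overrides any
         ;; LC_ALL inherited from the Emacs process.
         (cons (concat "LC_ALL=" locale) process-environment))
        (coding-system-for-read 'utf-8)
        (coding-system-for-write 'utf-8))
    (with-temp-buffer
      (insert (mapconcat #'identity strings "\n") "\n")
      ;; Replace the buffer contents with the output of sort(1).
      (call-process-region (point-min) (point-max) "sort" t t)
      ;; Split on newlines, dropping the empty trailing element.
      (split-string (buffer-string) "\n" t))))
```

It could be called as
(my-sort-strings-externally '("É" "E" "T" "A" "À" "Z") "fr_FR.UTF-8");
the resulting order depends on the collation rules of that locale.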
--
__Pascal Bourguignon__ http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk
* Re: Alternative to string< that works "well" with unicode
From: Yuri Khan @ 2014-11-28 0:06 UTC (permalink / raw)
To: Pascal J. Bourguignon; +Cc: help-gnu-emacs@gnu.org
On Fri, Nov 28, 2014 at 5:52 AM, Pascal J. Bourguignon
<pjb@informatimago.com> wrote:
>> Is there a function that will support my preferred sorting in Emacs?
>
> AFAICS, there's nothing yet.
I think Emacs bug#18051 is relevant.
* Re: Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-28 0:32 UTC (permalink / raw)
To: help-gnu-emacs
Yuri Khan <yuri.v.khan@gmail.com> writes:
> On Fri, Nov 28, 2014 at 5:52 AM, Pascal J. Bourguignon
> <pjb@informatimago.com> wrote:
>
>>> Is there a function that will support my preferred sorting in Emacs?
>>
>> AFAICS, there's nothing yet.
>
> I think Emacs bug#18051 is relevant.
Thanks, the bug pointed me to `string-collate-lessp':
(sort '(É E T A À Z) 'string-collate-lessp)
=> (A À E É T Z)
Pretty cool!
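The same call with actual strings instead of symbols, as a sketch. Since
`string-collate-lessp' is not available in older Emacs releases, the
hypothetical wrapper my-string-lessp (my own name, not an Emacs function)
falls back to `string<' when it is missing:

```elisp
;; Hypothetical wrapper: collate when possible, else compare by
;; character code.
(defun my-string-lessp (a b)
  "Return non-nil if string A sorts before string B.
Uses `string-collate-lessp' when this Emacs provides it, and falls
back to `string<' otherwise."
  (if (fboundp 'string-collate-lessp)
      (string-collate-lessp a b)
    (string< a b)))

;; `sort' is destructive, so build a fresh list rather than sorting
;; a quoted constant.
(sort (list "É" "E" "T" "A" "À" "Z") #'my-string-lessp)
```

The order returned depends on the current locale and on whether the
system supports collation at all.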
Thanks!
--
The second rule of Fight Club is: You do not talk about Fight Club
* Re: Alternative to string< that works "well" with unicode
From: Eli Zaretskii @ 2014-11-28 8:25 UTC (permalink / raw)
To: help-gnu-emacs
> From: Rasmus <rasmus@gmx.us>
> Date: Fri, 28 Nov 2014 01:32:00 +0100
>
> > I think Emacs bug#18051 is relevant.
>
> Thanks, the bug pointed me to `string-collate-lessp':
>
> (sort '(É E T A À Z) 'string-collate-lessp)
> => (A À E É T Z)
>
> Pretty cool!
Yes, it is. But please be aware that the results are heavily
locale-dependent; in particular, a non-UTF-8 locale on a typical
GNU/Linux system might behave very differently. Likewise on systems
where the standard C library is not glibc.
The upshot of all this is that the results of such sorting are
excellent from the local user POV, but should not be considered as
stable across locales and systems, and therefore Lisp programs that
are expected to be distributed should not rely on the resulting order
too much.
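One way to act on this, as a sketch only: pass the optional LOCALE
argument of `string-collate-lessp' when the order is meant for display,
and use plain `string<' wherever a program needs an order that is the
same everywhere. Both function names below are hypothetical, and the
locale name is assumed to be installed on the system:

```elisp
;; Hypothetical helpers illustrating the two options.

(defun my-sort-for-display (strings locale)
  "Return a sorted copy of STRINGS using the collation rules of LOCALE.
LOCALE is a locale name such as \"en_US.UTF-8\".  The result still
depends on the system's C library and its locale data."
  (sort (copy-sequence strings)
        (lambda (a b) (string-collate-lessp a b locale))))

(defun my-sort-stable-order (strings)
  "Return a sorted copy of STRINGS in locale-independent order.
`string<' compares character codes, so the result does not depend
on the locale or on the C library."
  (sort (copy-sequence strings) #'string<))
```

With the list from this thread, my-sort-stable-order gives the
code-point order reported in the first message (A E T Z À É), which is
less pleasant to read but identical on every system.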
* Re: Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-28 8:43 UTC (permalink / raw)
To: help-gnu-emacs
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Rasmus <rasmus@gmx.us>
>> Date: Fri, 28 Nov 2014 01:32:00 +0100
>
>>
>> > I think Emacs bug#18051 is relevant.
>>
>> Thanks, the bug pointed me to `string-collate-lessp':
>>
>> (sort '(É E T A À Z) 'string-collate-lessp)
>> => (A À E É T Z)
>>
>> Pretty cool!
>
> Yes, it is. But please be aware that the results are heavily
> locale-dependent; in particular, a non-UTF-8 locale on a typical
> GNU/Linux system might behave very differently. Likewise on systems
> where the standard C library is not glibc.
>
> The upshot of all this is that the results of such sorting are
> excellent from the local user POV, but should not be considered as
> stable across locales and systems, and therefore Lisp programs that
> are expected to be distributed should not rely on the resulting order
> too much.
The particular use case I had in mind is sorting in EMMS. Here,
locale-dependent sorting seems like a plus, and the particular order is not
so interesting.
Thanks everyone!
—Rasmus
--
There are known knowns; there are things we know that we know