* Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-27 22:42 UTC (permalink / raw)
  To: help-gnu-emacs

Hi,

I want to sort a list of strings, including accented strings, in a
"meaningful way".  E.g. with this list (É E T A À Z) the sorted list
should be (A À E É T Z).

(sort '(É E T A À Z) 'string<)
      => (A E T Z À É) ; expected (A À E É T Z)

I tried all the versions of 'string< that I could find with apropos.

Is there a function that will support my preferred sorting in Emacs?

Thanks,
Rasmus

-- 
What will be next?





* Re: Alternative to string< that works "well" with unicode
From: Pascal J. Bourguignon @ 2014-11-27 23:52 UTC (permalink / raw)
  To: help-gnu-emacs

Rasmus <rasmus@gmx.us> writes:

> Hi,
>
> I want to sort a list of strings, including accented strings, in a
> "meaningful way".  E.g. with this list (É E T A À Z) the sorted list
> should be (A À E É T Z).
>
> (sort '(É E T A À Z) 'string<)
>       => (A E T Z À É) ; expected (A À E É T Z)

You might expect that order, but users writing in other languages
would expect a different one.

This is a matter of localization: the collation order depends on the
language.


> I tried all the versions of 'string< that I could find with apropos.
>
> Is there a function that will support my preferred sorting in Emacs?

AFAICS, there's nothing yet.

http://en.wikipedia.org/wiki/Unicode_collation_algorithm

You could try to implement the UCA (Unicode Collation Algorithm):
http://www.unicode.org/reports/tr10/

Alternatively, you could send the data to the unix sort command, with
the right LC_ALL environment variable.
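
The external-sort route could be sketched roughly as follows.  This is
only an illustration under some assumptions: the helper name
`my-sort-strings-with-locale' is made up, the locale passed in must be
installed on the system, a `sort' program must be on the PATH, and none
of the strings may contain a newline.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-sort-strings-with-locale (strings locale)
  "Return STRINGS sorted by the external `sort' program under LOCALE.
Assumes no element of STRINGS contains a newline."
  (let ((process-environment
         ;; Earlier entries win, so this overrides any inherited LC_ALL.
         (cons (concat "LC_ALL=" locale) process-environment))
        (coding-system-for-read 'utf-8)
        (coding-system-for-write 'utf-8))
    (with-temp-buffer
      (insert (mapconcat #'identity strings "\n") "\n")
      ;; Send the buffer to sort(1) and replace it with the output.
      (call-process-region (point-min) (point-max) "sort" t t)
      ;; Split the output back into a list, dropping empty strings.
      (split-string (buffer-string) "\n" t))))

;; Example call; the resulting order depends on the system's locale data.
(my-sort-strings-with-locale '("É" "E" "T" "A" "À" "Z") "en_US.UTF-8")
```

The order you get back is whatever the C library's collation tables for
that locale say, so it can differ from one system to another.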

-- 
__Pascal Bourguignon__                 http://www.informatimago.com/
“The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.” -- Carl Bass CEO Autodesk



* Re: Alternative to string< that works "well" with unicode
From: Yuri Khan @ 2014-11-28  0:06 UTC (permalink / raw)
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs@gnu.org

On Fri, Nov 28, 2014 at 5:52 AM, Pascal J. Bourguignon
<pjb@informatimago.com> wrote:

>> Is there a function that will support my preferred sorting in Emacs?
>
> AFAICS, there's nothing yet.

I think Emacs bug#18051 is relevant.




* Re: Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-28  0:32 UTC (permalink / raw)
  To: help-gnu-emacs

Yuri Khan <yuri.v.khan@gmail.com> writes:

> On Fri, Nov 28, 2014 at 5:52 AM, Pascal J. Bourguignon
> <pjb@informatimago.com> wrote:
>
>>> Is there a function that will support my preferred sorting in Emacs?
>>
>> AFAICS, there's nothing yet.
>
> I think Emacs bug#18051 is relevant.

Thanks, the bug pointed me to `string-collate-lessp':

(sort '(É E T A À Z) 'string-collate-lessp)
      => (A À E É T Z)

Pretty cool!
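
For the record, `string-collate-lessp' also takes optional LOCALE and
IGNORE-CASE arguments, so the locale can be named explicitly instead of
being taken from the environment.  A small sketch, where the wrapper
name `my-collate-sort' is made up and "fr_FR.UTF-8" is only an example
that is assumed to be installed:

```elisp
;; Hypothetical wrapper, not part of Emacs.
(defun my-collate-sort (strings &optional locale ignore-case)
  "Return a sorted copy of STRINGS using `string-collate-lessp'.
LOCALE and IGNORE-CASE are passed through; a nil LOCALE means the
current locale."
  ;; `sort' is destructive, so work on a copy of the list.
  (sort (copy-sequence strings)
        (lambda (a b)
          (string-collate-lessp a b locale ignore-case))))

;; Collate with the current locale.
(my-collate-sort '("É" "E" "T" "A" "À" "Z"))

;; Collate with an explicitly named locale (assumed to be installed):
;; (my-collate-sort '("É" "E" "T" "A" "À" "Z") "fr_FR.UTF-8")
```

Copying the list first keeps the caller's list intact, which matters
when the argument is a quoted constant.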

Thanks!

-- 
The second rule of Fight Club is: You do not talk about Fight Club





* Re: Alternative to string< that works "well" with unicode
From: Eli Zaretskii @ 2014-11-28  8:25 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Rasmus <rasmus@gmx.us>
> Date: Fri, 28 Nov 2014 01:32:00 +0100
> 
> > I think Emacs bug#18051 is relevant.
> 
> Thanks, the bug pointed me to `string-collate-lessp':
> 
> (sort '(É E T A À Z) 'string-collate-lessp)
>       => (A À E É T Z)
> 
> Pretty cool!

Yes, it is.  But please be aware that the results are heavily
locale-dependent; in particular, a non-UTF-8 locale on a typical
GNU/Linux system might behave very differently.  Likewise on systems
where the standard C library is not glibc.

The upshot of all this is that the results of such sorting are
excellent from the local user POV, but should not be considered as
stable across locales and systems, and therefore Lisp programs that
are expected to be distributed should not rely on the resulting order
too much.
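
One way to act on this, as a sketch only: keep a locale-independent
order for anything a program stores or compares across machines, and
use the collated order just for what is shown to the user.  The two
helper names below are made up for the illustration.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-canonical-sort (strings)
  "Return a copy of STRINGS sorted by character code.
`string<' ignores the locale, so the order is the same everywhere."
  (sort (copy-sequence strings) #'string<))

(defun my-display-sort (strings)
  "Return a copy of STRINGS sorted by the current locale's collation.
The order may differ between locales and C libraries."
  (sort (copy-sequence strings) #'string-collate-lessp))

(my-canonical-sort '("É" "E" "T" "A" "À" "Z"))
;; => ("A" "E" "T" "Z" "À" "É")
```

The canonical order is the one `string<' produced at the start of this
thread; only the display order should be expected to vary.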





* Re: Alternative to string< that works "well" with unicode
From: Rasmus @ 2014-11-28  8:43 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Rasmus <rasmus@gmx.us>
>> Date: Fri, 28 Nov 2014 01:32:00 +0100
>
>> 
>> > I think Emacs bug#18051 is relevant.
>> 
>> Thanks, the bug pointed me to `string-collate-lessp':
>> 
>> (sort '(É E T A À Z) 'string-collate-lessp)
>>       => (A À E É T Z)
>> 
>> Pretty cool!
>
> Yes, it is.  But please be aware that the results are heavily
> locale-dependent; in particular, a non-UTF-8 locale on a typical
> GNU/Linux system might behave very differently.  Likewise on systems
> where the standard C library is not glibc.
>
> The upshot of all this is that the results of such sorting are
> excellent from the local user POV, but should not be considered as
> stable across locales and systems, and therefore Lisp programs that
> are expected to be distributed should not rely on the resulting order
> too much.

The particular use case I had in mind is sorting in EMMS.  Here,
locale-dependent sorting seems like a plus, and the particular order is
not so important.

Thanks everyone!

—Rasmus

-- 
There are known knowns; there are things we know that we know



