* bug#12008: Alphabetic sorting respect user's language and/or locale
From: martin rudalics @ 2012-07-21 14:32 UTC
To: 12008
Currently, specifying alphabetic order for output produced by functions
like `dired' and `sort-subr' makes that output appear in ASCII-code
order. This means that such output deviates from the order expected by
users of Latin-derived alphabets like French, German or Spanish and
can make working with these functions very awkward.
Please consider adding a predicate which makes it possible to produce
such output in alphabetic order respecting the language and/or locale
of the user.
Thank you, martin
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-21 16:01 UTC
To: martin rudalics; +Cc: 12008
> Date: Sat, 21 Jul 2012 16:32:50 +0200
> From: martin rudalics <rudalics@gmx.at>
>
> Currently, specifying alphabetic order for output produced by functions
> like `dired' and `sort-subr' makes that output appear in ASCII-code
> order. This means that such output deviates from the order expected by
> users of Latin-derived alphabets like French, German or Spanish and
> can make working with these functions very awkward.
>
> Please consider adding a predicate which makes it possible to produce
> such output in alphabetic order respecting the language and/or locale
> of the user.
A simple way of doing this goes along the following lines:
Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
return make_number (strcoll (enc_str1, enc_str2));
However, there are 2 potential issues with this:
. do typical libc implementations of strcoll handle multibyte
characters correctly, if ENCODE_SYSTEM happens to produce multibyte
encoding, such as UTF-8?
. is the above efficient enough, when ENCODE_SYSTEM is not a no-op
(which it is for UTF-8 locales)?
I don't know the answer to these, mainly to the first. (The
MS-Windows implementation is claimed to handle multibyte strings.)
Anyone?
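For readers who want to try the idea outside of Emacs, here is a minimal standalone sketch of strcoll-based sorting in plain C. It does not use ENCODE_SYSTEM or any Emacs internals; the names collate_cmp and sort_names_in_locale are hypothetical and exist only for this illustration. It does not answer the multibyte question raised above, since that depends on the libc and on which locales are installed.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two C strings by the collation rules of the
   current LC_COLLATE locale.  The arguments point at `char *' elements.  */
static int
collate_cmp (const void *a, const void *b)
{
  const char *s1 = *(const char *const *) a;
  const char *s2 = *(const char *const *) b;
  return strcoll (s1, s2);
}

/* Hypothetical helper: sort NAMES in place, collating according to
   LOCALE_NAME.  Pass "" to take the locale from the environment.
   Returns 0 on success, -1 if the locale is not available.  Note that
   this changes the process-wide LC_COLLATE setting.  */
int
sort_names_in_locale (const char **names, size_t n, const char *locale_name)
{
  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1;
  qsort (names, n, sizeof *names, collate_cmp);
  return 0;
}
```

Calling sort_names_in_locale (names, n, "") would pick up the user's collation locale; in the "C" locale strcoll orders strings the same way strcmp does.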
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-21 16:38 UTC
To: rudalics; +Cc: 12008
> Date: Sat, 21 Jul 2012 19:01:11 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 12008@debbugs.gnu.org
>
> Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
> Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
>
> return make_number (strcoll (enc_str1, enc_str2));
Err... make that
return make_number (strcoll (SDATA (enc_str1), SDATA (enc_str2)));
Sorry.
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Stefan Monnier @ 2012-07-22 10:24 UTC
To: Eli Zaretskii; +Cc: 12008
> A simple way of doing this goes along the following lines:
> Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
> Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> return make_number (strcoll (enc_str1, enc_str2));
That's probably OK for dired's sorting but not for sort-subr where we
need to be independent from the system locale. So better would be to
switch the locale to utf-8 and call strcoll without calling
ENCODE_SYSTEM (tho of course, only if the strings are multibyte).
Stefan
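One possible way to collate under a chosen locale without disturbing the system locale is the POSIX.1-2008 per-locale interface. The sketch below is an assumption about how that could look in plain C, not what Emacs does; collate_in_locale is a hypothetical name, and whether a UTF-8 locale such as "en_US.UTF-8" is installed varies from system to system.

```c
#define _POSIX_C_SOURCE 200809L  /* for newlocale, strcoll_l, freelocale */
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare S1 and S2 under the collation rules of
   LOCALE_NAME without touching the process-wide locale.  Stores the
   strcoll-style result in *RESULT.  Returns 0 on success, -1 if the
   locale cannot be loaded.  */
int
collate_in_locale (const char *locale_name, const char *s1,
                   const char *s2, int *result)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;
  *result = strcoll_l (s1, s2, loc);
  freelocale (loc);
  return 0;
}
```

A real implementation would probably create the locale object once and reuse it, since loading a locale for every comparison is wasteful.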
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-22 15:25 UTC
To: Stefan Monnier; +Cc: 12008
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: martin rudalics <rudalics@gmx.at>, 12008@debbugs.gnu.org
> Date: Sun, 22 Jul 2012 06:24:31 -0400
>
> > A simple way of doing this goes along the following lines:
> > Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
> > Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> > return make_number (strcoll (enc_str1, enc_str2));
>
> That's probably OK for dired's sorting but not for sort-subr where we
> need to be independent from the system locale.
How do you mean "independent of the system locale"? We already have
locale-independent string comparison: compare-strings, string<, etc.
By contrast, sorting strings in collation order is AFAIK inherently
locale-specific. Or am I missing something?
> So better would be to switch the locale to utf-8 and call strcoll
> without calling ENCODE_SYSTEM (tho of course, only if the strings
> are multibyte).
How will this be different from string< etc., that we already have?
(And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
'ls' uses 'strcoll' to sort file names. Only on MS-Windows, where we
use ls-lisp.el, do we need to collate in Lisp as part of Dired.)
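To illustrate what locale-independent comparison means in practice, here is a small sketch in plain C. It assumes the strings are UTF-8, where comparing raw bytes gives the same order as comparing code points, which is roughly what string< is understood to do with character codes. The name codepoint_less is hypothetical.

```c
#include <string.h>

/* Return nonzero if S1 sorts before S2 by raw byte value.  For valid
   UTF-8 text this is the same as code point order, and it does not
   depend on any locale setting.  */
int
codepoint_less (const char *s1, const char *s2)
{
  return strcmp (s1, s2) < 0;
}
```

With this ordering, "zebra" sorts before the French word for summer ("\xC3\xA9t\xC3\xA9" in UTF-8), because the byte for 'z' (0x7A) is smaller than the first byte of the accented letter (0xC3). A collation order for French would place them the other way round.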
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Stefan Monnier @ 2012-07-23 8:57 UTC
To: Eli Zaretskii; +Cc: 12008
>> So better would be to switch the locale to utf-8 and call strcoll
>> without calling ENCODE_SYSTEM (tho of course, only if the strings
>> are multibyte).
> How will this be different from string< etc., that we already have?
I'd expect strcoll in a utf-8 locale to sort e é è ê and such
close together.
Stefan
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: martin rudalics @ 2012-07-23 9:34 UTC
To: Eli Zaretskii; +Cc: 12008
> How do you mean "independent of the system locale"? We already have
> locale-independent string comparison: compare-strings, string<, etc.
> By contrast, sorting strings in collation order is AFAIK inherently
> locale-specific. Or am I missing something?
Ideally, it should be possible to specify a locale-independent behavior.
But using the locale-specific one would be already a great improvement
for me.
> (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
> 'ls' uses 'strcoll' to sort file names.
I suppose this doesn't hold for the `ls' coming with GnuWin32.
> Only on MS-Windows, where we
> use ls-lisp.el, do we need to collate in Lisp as part of Dired.)
martin
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-23 15:34 UTC
To: Stefan Monnier; +Cc: 12008
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: rudalics@gmx.at, 12008@debbugs.gnu.org
> Date: Mon, 23 Jul 2012 04:57:14 -0400
>
> >> So better would be to switch the locale to utf-8 and call strcoll
> >> without calling ENCODE_SYSTEM (tho of course, only if the strings
> >> are multibyte).
> > How will this be different from string< etc., that we already have?
>
> I'd expect strcoll in a utf-8 locale to sort e é è ê and such
> close together.
Does the encoding really matter for strcoll? That is, won't
de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
the same characters?
If the encoding doesn't matter, then why do you keep mentioning utf-8
locales in this context? The request, as I understood it, was to be
able to sort arbitrary strings in the locale-dependent collating
order.
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-23 15:38 UTC
To: martin rudalics; +Cc: 12008
> Date: Mon, 23 Jul 2012 11:34:53 +0200
> From: martin rudalics <rudalics@gmx.at>
> CC: Stefan Monnier <monnier@IRO.UMontreal.CA>, 12008@debbugs.gnu.org
>
> > How do you mean "independent of the system locale"? We already have
> > locale-independent string comparison: compare-strings, string<, etc.
> > By contrast, sorting strings in collation order is AFAIK inherently
> > locale-specific. Or am I missing something?
>
> Ideally, it should be possible to specify a locale-independent behavior.
I'm confused: what is "locale-independent behavior" in this context?
Do you mean the ability to request a collation specific for the German
locale when your current system locale is something else? If so, this
is not locale-independent behavior as I understand it. If you mean
something else, please elaborate.
> > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
> > 'ls' uses 'strcoll' to sort file names.
>
> I suppose this doesn't hold for the `ls' coming with GnuWin32.
Why do you think so? The code calls strcoll even on Windows.
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: martin rudalics @ 2012-07-23 15:49 UTC
To: Eli Zaretskii; +Cc: 12008
>> Ideally, it should be possible to specify a locale-independent behavior.
>
> I'm confused: what is "locale-independent behavior" in this context?
> Do you mean the ability to request a collation specific for the German
> locale when your current system locale is something else?
Yes.
> If so, this
> is not locale-independent behavior as I understand it. If you mean
> something else, please elaborate.
>
>> > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>> > 'ls' uses 'strcoll' to sort file names.
>>
>> I suppose this doesn't hold for the `ls' coming with GnuWin32.
>
> Why do you think so? The code calls strcoll even on Windows.
But how do I request it when calling `ls'? Or, maybe better, what can I
do to have `ls' respect my locale?
martin
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Eli Zaretskii @ 2012-07-23 15:55 UTC
To: martin rudalics; +Cc: 12008
> Date: Mon, 23 Jul 2012 17:49:49 +0200
> From: martin rudalics <rudalics@gmx.at>
> CC: monnier@IRO.UMontreal.CA, 12008@debbugs.gnu.org
>
> >> > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
> >> > 'ls' uses 'strcoll' to sort file names.
> >>
> >> I suppose this doesn't hold for the `ls' coming with GnuWin32.
> >
> > Why do you think so? The code calls strcoll even on Windows.
>
> But how do I request it when calling `ls'?
It works automagically. Or, shall I say, "should work" (keeping in
mind how buggy GnuWin32 ports are).
On Unix, you can set LC_COLLATE in the environment to control the
collation order, but I don't think it works on Windows.
> Or, maybe better, what can I do to have `ls' respect my locale?
Barring any bugs, it should do so already.
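For reference, this is roughly how a program on a POSIX system picks up the collation locale from the environment. The sketch is plain C with a hypothetical helper name, use_env_collation; it says nothing about how the GnuWin32 port behaves.

```c
#include <locale.h>

/* Hypothetical helper: adopt the collation locale named by the
   environment and return its name, or NULL if that locale is not
   available.  On POSIX systems the environment is consulted in the
   order LC_ALL, then LC_COLLATE, then LANG.  */
const char *
use_env_collation (void)
{
  return setlocale (LC_COLLATE, "");
}
```

A program that never makes such a call stays in the "C" locale, where strcoll behaves like strcmp regardless of what LC_COLLATE says.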
* bug#12008: Alphabetic sorting respect user's language and/or locale
From: Stefan Monnier @ 2012-07-23 23:30 UTC
To: Eli Zaretskii; +Cc: 12008
> Does the encoding really matter for strcoll? That is, won't
> de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
> the same characters?
The encoding matters because de_DE.iso8859-1's strcoll won't work right
if your strings include λ, Π, τ, ⊢, →, ↦, ≡, ...
Stefan
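The representability point can be checked mechanically. The following sketch, in plain C and assuming the input is UTF-8, tests whether a string could be encoded in ISO 8859-1 at all, i.e. whether every character is at or below U+00FF. The function name fits_in_latin1 is hypothetical.

```c
/* Return 1 if every character of the UTF-8 string S has a code point
   of at most U+00FF, so that it could be encoded in ISO 8859-1.
   Return 0 otherwise, including for malformed input.  */
int
fits_in_latin1 (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  while (*p)
    {
      if (*p < 0x80)
        p += 1;                 /* ASCII */
      else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80)
        p += 2;                 /* U+0080..U+00FF */
      else
        return 0;               /* above U+00FF, or malformed */
    }
  return 1;
}
```

An accented Latin letter such as U+00E9 passes this test, while a Greek lambda (U+03BB, bytes 0xCE 0xBB) does not, so a string containing it cannot be handed to strcoll in an ISO 8859-1 locale without losing information.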