unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#12008: Alphabetic sorting respect user's language and/or locale
@ 2012-07-21 14:32 martin rudalics
  2012-07-21 16:01 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: martin rudalics @ 2012-07-21 14:32 UTC (permalink / raw)
  To: 12008

Currently, specifying alphabetic order for output produced by functions
like `dired' and `sort-subr' makes that output appear in ASCII-code
order.  This means that such output deviates from the order expected by
users of Latin-derived alphabets like French, German or Spanish and
can make working with these functions very awkward.

Please consider adding a predicate which makes it possible to produce
such output in alphabetic order respecting the language and/or locale
of the user.

Thank you, martin






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-21 14:32 bug#12008: Alphabetic sorting respect user's language and/or locale martin rudalics
@ 2012-07-21 16:01 ` Eli Zaretskii
  2012-07-21 16:38   ` Eli Zaretskii
  2012-07-22 10:24   ` Stefan Monnier
  0 siblings, 2 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-21 16:01 UTC (permalink / raw)
  To: martin rudalics; +Cc: 12008

> Date: Sat, 21 Jul 2012 16:32:50 +0200
> From: martin rudalics <rudalics@gmx.at>
> 
> Currently, specifying alphabetic order for output produced by functions
> like `dired' and `sort-subr' makes that output appear in ASCII-code
> order.  This means that such output deviates from the order expected by
> users of Latin-derived alphabets like French, German or Spanish and
> can make working with these functions very awkward.
> 
> Please consider adding a predicate which makes it possible to produce
> such output in alphabetic order respecting the language and/or locale
> of the user.

A simple way of doing this goes along the following lines:

  Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
  Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);

  return make_number (strcoll (enc_str1, enc_str2));

However, there are 2 potential issues with this:

 . do typical libc implementations of strcoll handle multibyte
   characters correctly, if ENCODE_SYSTEM happens to produce multibyte
   encoding, such as UTF-8?

 . is the above efficient enough, when ENCODE_SYSTEM is not a no-op
   (it is a no-op for UTF-8 locales)?

I don't know the answer to these, mainly to the first.  (The
MS-Windows implementation is claimed to handle multibyte strings.)
Anyone?
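
For reference, the libc side of this can be sketched in plain C, outside
Emacs.  This is only an illustration, not the proposed change: the
names compare_collated and sort_collated are hypothetical, and the
Emacs-specific ENCODE_SYSTEM step is left out.  strcoll follows the
LC_COLLATE category of the current locale, so a program that never
calls setlocale runs in the "C" locale and gets plain byte order.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a `const char *'.  strcoll compares
   according to the LC_COLLATE category of the current locale.  */
int
compare_collated (const void *a, const void *b)
{
  const char *s1 = *(const char *const *) a;
  const char *s2 = *(const char *const *) b;
  return strcoll (s1, s2);
}

/* Sort ITEMS (N strings) in place in the current locale's collation
   order.  Hypothetical helper, not an Emacs function.  */
void
sort_collated (const char **items, size_t n)
{
  qsort (items, n, sizeof items[0], compare_collated);
}
```

A caller would normally run setlocale (LC_COLLATE, "") from <locale.h>
once at startup, so that the user's environment selects the order.
Whether multibyte input is then collated correctly still depends on
the libc and on the installed locale data.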






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-21 16:01 ` Eli Zaretskii
@ 2012-07-21 16:38   ` Eli Zaretskii
  2012-07-22 10:24   ` Stefan Monnier
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-21 16:38 UTC (permalink / raw)
  To: rudalics; +Cc: 12008

> Date: Sat, 21 Jul 2012 19:01:11 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 12008@debbugs.gnu.org
> 
>   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
>   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> 
>   return make_number (strcoll (enc_str1, enc_str2));

Err... make that

   return make_number (strcoll (SDATA (enc_str1), SDATA (enc_str2)));

Sorry.






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-21 16:01 ` Eli Zaretskii
  2012-07-21 16:38   ` Eli Zaretskii
@ 2012-07-22 10:24   ` Stefan Monnier
  2012-07-22 15:25     ` Eli Zaretskii
  1 sibling, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-07-22 10:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 12008

> A simple way of doing this goes along the following lines:
>   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
>   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
>   return make_number (strcoll (enc_str1, enc_str2));

That's probably OK for dired's sorting but not for sort-subr where we
need to be independent from the system locale.  So better would be to
switch the locale to utf-8 and call strcoll without calling
ENCODE_SYSTEM (tho of course, only if the strings are multibyte).
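
As a rough sketch of comparing under a chosen locale regardless of the
system one: the code below does not switch the global locale at all,
but uses the POSIX.1-2008 per-locale call strcoll_l instead.  It
assumes a libc that declares newlocale in <locale.h> and strcoll_l in
<string.h> (glibc does; some systems want <xlocale.h>).  The name
compare_in_locale is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <string.h>

/* Compare A and B using the collation rules of LOCALE_NAME without
   touching the process-wide locale.  On success store the strcoll_l
   result in *RESULT and return 0; return -1 if the locale cannot be
   created (for example, because it is not installed).  */
int
compare_in_locale (const char *locale_name, const char *a, const char *b,
                   int *result)
{
  locale_t loc = newlocale (LC_COLLATE_MASK | LC_CTYPE_MASK, locale_name,
                            (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;
  *result = strcoll_l (a, b, loc);
  freelocale (loc);
  return 0;
}
```

Passing a name such as "de_DE.UTF-8" would ask for German collation of
UTF-8 strings; whether that name exists depends on the system.  Real
code would create the locale object once and reuse it rather than
building it for every comparison.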


        Stefan






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-22 10:24   ` Stefan Monnier
@ 2012-07-22 15:25     ` Eli Zaretskii
  2012-07-23  8:57       ` Stefan Monnier
  2012-07-23  9:34       ` martin rudalics
  0 siblings, 2 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-22 15:25 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 12008

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: martin rudalics <rudalics@gmx.at>, 12008@debbugs.gnu.org
> Date: Sun, 22 Jul 2012 06:24:31 -0400
> 
> > A simple way of doing this goes along the following lines:
> >   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
> >   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> >   return make_number (strcoll (enc_str1, enc_str2));
> 
> That's probably OK for dired's sorting but not for sort-subr where we
> need to be independent from the system locale.

How do you mean "independent of the system locale"?  We already have
locale-independent string comparison: compare-strings, string<, etc.
By contrast, sorting strings in collation order is AFAIK inherently
locale-specific.  Or am I missing something?

> So better would be to switch the locale to utf-8 and call strcoll
> without calling ENCODE_SYSTEM (tho of course, only if the strings
> are multibyte).

How will this be different from string< etc., that we already have?

(And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
'ls' uses 'strcoll' to sort file names.  Only on MS-Windows, where we
use ls-lisp.el, do we need to collate in Lisp as part of Dired.)






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-22 15:25     ` Eli Zaretskii
@ 2012-07-23  8:57       ` Stefan Monnier
  2012-07-23 15:34         ` Eli Zaretskii
  2012-07-23  9:34       ` martin rudalics
  1 sibling, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-07-23  8:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 12008

>> So better would be to switch the locale to utf-8 and call strcoll
>> without calling ENCODE_SYSTEM (tho of course, only if the strings
>> are multibyte).
> How will this be different from string< etc., that we already have?

I'd expect strcoll in a utf-8 locale to sort e é è ê and such
close together.
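
To make the difference concrete: in UTF-8, é is the byte pair 0xC3
0xA9, so a byte-wise comparison puts it after every ASCII letter,
whereas a collating comparison in a suitable UTF-8 locale is expected
to put it next to e.  The helper below is a hypothetical sketch in
plain C (collate_sign_in is not an existing function) for checking
this on a given system.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Return the sign (-1, 0 or 1) of comparing A and B with strcoll after
   selecting LOCALE_NAME for all categories, or 2 if that locale is not
   available.  The previous locale setting is restored afterwards.  */
int
collate_sign_in (const char *locale_name, const char *a, const char *b)
{
  char saved[1024];
  const char *old = setlocale (LC_ALL, NULL);
  snprintf (saved, sizeof saved, "%s", old ? old : "C");

  if (setlocale (LC_ALL, locale_name) == NULL)
    return 2;
  int r = strcoll (a, b);
  setlocale (LC_ALL, saved);
  return (r > 0) - (r < 0);
}
```

In the "C" locale, collate_sign_in ("C", "\xc3\xa9", "f") gives 1,
i.e. é sorts after f, just as strcmp does.  With a locale such as
"en_US.UTF-8" one would expect -1, provided that locale is installed
and its collation data covers the character.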


        Stefan






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-22 15:25     ` Eli Zaretskii
  2012-07-23  8:57       ` Stefan Monnier
@ 2012-07-23  9:34       ` martin rudalics
  2012-07-23 15:38         ` Eli Zaretskii
  1 sibling, 1 reply; 12+ messages in thread
From: martin rudalics @ 2012-07-23  9:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 12008

 > How do you mean "independent of the system locale"?  We already have
 > locale-independent string comparison: compare-strings, string<, etc.
 > By contrast, sorting strings in collation order is AFAIK inherently
 > locale-specific.  Or am I missing something?

Ideally, it should be possible to specify a locale-independent behavior.
But using the locale-specific one would already be a great improvement
for me.

 > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
 > 'ls' uses 'strcoll' to sort file names.

I suppose this doesn't hold for the `ls' coming with GnuWin32.

 > Only on MS-Windows, where we
 > use ls-lisp.el, do we need to collate in Lisp as part of Dired.)

martin






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-23  8:57       ` Stefan Monnier
@ 2012-07-23 15:34         ` Eli Zaretskii
  2012-07-23 23:30           ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-23 15:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 12008

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: rudalics@gmx.at, 12008@debbugs.gnu.org
> Date: Mon, 23 Jul 2012 04:57:14 -0400
> 
> >> So better would be to switch the locale to utf-8 and call strcoll
> >> without calling ENCODE_SYSTEM (tho of course, only if the strings
> >> are multibyte).
> > How will this be different from string< etc., that we already have?
> 
> I'd expect strcoll in a utf-8 locale to sort e é è ê and such
> close together.

Does the encoding really matter for strcoll?  That is, won't
de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
the same characters?

If the encoding doesn't matter, then why do you keep mentioning utf-8
locales in this context?  The request, as I understood it, was to be
able to sort arbitrary strings in the locale-dependent collating
order.







* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-23  9:34       ` martin rudalics
@ 2012-07-23 15:38         ` Eli Zaretskii
  2012-07-23 15:49           ` martin rudalics
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-23 15:38 UTC (permalink / raw)
  To: martin rudalics; +Cc: 12008

> Date: Mon, 23 Jul 2012 11:34:53 +0200
> From: martin rudalics <rudalics@gmx.at>
> CC: Stefan Monnier <monnier@IRO.UMontreal.CA>, 12008@debbugs.gnu.org
> 
>  > How do you mean "independent of the system locale"?  We already have
>  > locale-independent string comparison: compare-strings, string<, etc.
>  > By contrast, sorting strings in collation order is AFAIK inherently
>  > locale-specific.  Or am I missing something?
> 
> Ideally, it should be possible to specify a locale-independent behavior.

I'm confused: what is "locale-independent behavior" in this context?
Do you mean the ability to request a collation specific for the German
locale when your current system locale is something else?  If so, this
is not locale-independent behavior as I understand it.  If you mean
something else, please elaborate.

>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>  > 'ls' uses 'strcoll' to sort file names.
> 
> I suppose this doesn't hold for the `ls' coming with GnuWin32.

Why do you think so?  The code calls strcoll even on Windows.






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-23 15:38         ` Eli Zaretskii
@ 2012-07-23 15:49           ` martin rudalics
  2012-07-23 15:55             ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: martin rudalics @ 2012-07-23 15:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 12008

 >> Ideally, it should be possible to specify a locale-independent behavior.
 >
 > I'm confused: what is "locale-independent behavior" in this context?
 > Do you mean the ability to request a collation specific for the German
 > locale when your current system locale is something else?

Yes.

 > If so, this
 > is not locale-independent behavior as I understand it.  If you mean
 > something else, please elaborate.
 >
 >>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
 >>  > 'ls' uses 'strcoll' to sort file names.
 >>
 >> I suppose this doesn't hold for the `ls' coming with GnuWin32.
 >
 > Why do you think so?  The code calls strcoll even on Windows.

But how do I request it when calling `ls'?  Or, maybe better, what can I
do to have `ls' respect my locale?

martin






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-23 15:49           ` martin rudalics
@ 2012-07-23 15:55             ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-07-23 15:55 UTC (permalink / raw)
  To: martin rudalics; +Cc: 12008

> Date: Mon, 23 Jul 2012 17:49:49 +0200
> From: martin rudalics <rudalics@gmx.at>
> CC: monnier@IRO.UMontreal.CA, 12008@debbugs.gnu.org
> 
>  >>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>  >>  > 'ls' uses 'strcoll' to sort file names.
>  >>
>  >> I suppose this doesn't hold for the `ls' coming with GnuWin32.
>  >
>  > Why do you think so?  The code calls strcoll even on Windows.
> 
> But how do I request it when calling `ls'?

It works automagically.  Or, shall I say, "should work" (keeping in
mind how buggy GnuWin32 ports are).

On Unix, you can set LC_COLLATE in the environment to control the
collation order, but I don't think it works on Windows.
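
For the Unix case, the lookup that setlocale (LC_COLLATE, "") performs
under POSIX can be sketched as follows: LC_ALL overrides LC_COLLATE,
which overrides LANG, and empty values count as unset.  The function
names are hypothetical, and this says nothing about what the GnuWin32
port does.

```c
#include <stdlib.h>

/* Pick the locale name that governs collation, given the values of
   LC_ALL, LC_COLLATE and LANG (each may be NULL or empty).  Return
   NULL if none is set, in which case the implementation default
   applies.  */
const char *
pick_collation_locale (const char *lc_all, const char *lc_collate,
                       const char *lang)
{
  if (lc_all != NULL && lc_all[0] != '\0')
    return lc_all;
  if (lc_collate != NULL && lc_collate[0] != '\0')
    return lc_collate;
  if (lang != NULL && lang[0] != '\0')
    return lang;
  return NULL;
}

/* Same, reading the three variables from the process environment.  */
const char *
collation_locale_from_env (void)
{
  return pick_collation_locale (getenv ("LC_ALL"), getenv ("LC_COLLATE"),
                                getenv ("LANG"));
}
```

So running ls with LC_COLLATE=de_DE.UTF-8 in its environment only
takes effect if LC_ALL is unset or empty.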

> Or, maybe better, what can I do to have `ls' respect my locale?

Barring any bugs, it should do so already.






* bug#12008: Alphabetic sorting respect user's language and/or locale
  2012-07-23 15:34         ` Eli Zaretskii
@ 2012-07-23 23:30           ` Stefan Monnier
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2012-07-23 23:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 12008

> Does the encoding really matter for strcoll?  That is, won't
> de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
> the same characters?

The encoding matters because de_DE.iso8859-1's strcoll won't work right
if your strings include λ, Π, τ, ⊢, →, ↦, ≡, ...


        Stefan





