* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
@ 2014-07-18 6:22 ` Michael Heerdegen
2014-07-18 6:53 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-18 6:22 UTC (permalink / raw)
To: 18051
Hello,
Some users will want to configure the sorting predicate used by ls-lisp,
for example, to get a natural sorting of version numbers, or to sort
files whose names start with a dot as if they had no dot, etc.
Currently, sorting is even hardcoded because `ls-lisp-string-lessp' is a
defsubst. If it were a normal function, one could advise it.
Or, with some more effort, sorting order could be made configurable via
options, and the -v switch could be implemented.
Regards,
Michael.
In GNU Emacs 24.3.92.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.12.2)
of 2014-07-17 on drachen
Windowing system distributor `The X.Org Foundation', version 11.0.11599904
System Description: Debian GNU/Linux testing (jessie)
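As a rough illustration of the request above: if the comparison were an ordinary function, a piece of :around advice could change the order without redefining any caller. The sketch below does not touch ls-lisp; `my-demo-string-lessp' is a hypothetical stand-in for a non-inlined `ls-lisp-string-lessp', and `my-demo-ignore-leading-dot' and `my-demo-sort-names' are likewise made up for the example.

```elisp
;; Sketch only: `my-demo-string-lessp' is a hypothetical stand-in for a
;; version of `ls-lisp-string-lessp' defined with `defun', not `defsubst'.
(require 'nadvice)

(defun my-demo-string-lessp (s1 s2)
  "Return non-nil if S1 sorts before S2 (plain string comparison)."
  (string-lessp s1 s2))

(defun my-demo-ignore-leading-dot (orig s1 s2)
  "Call ORIG on S1 and S2 with one leading dot removed from each."
  (funcall orig
           (replace-regexp-in-string "\\`\\." "" s1)
           (replace-regexp-in-string "\\`\\." "" s2)))

;; Because the stand-in is an ordinary function, callers that go through
;; the symbol see the advised behavior.
(advice-add 'my-demo-string-lessp :around #'my-demo-ignore-leading-dot)

(defun my-demo-sort-names (names)
  "Return a sorted copy of NAMES using the (advised) predicate."
  (sort (copy-sequence names) #'my-demo-string-lessp))
```

With a defsubst, byte-compiled callers get the body inlined, so advice on the symbol would not reach them; that is the limitation this report is about.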
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 6:22 ` bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function? Michael Heerdegen
@ 2014-07-18 6:53 ` Eli Zaretskii
2014-07-18 7:33 ` Michael Heerdegen
2014-08-27 23:57 ` bug#18051: trunk r117751: Improve robustness of new string-collation code Katsumi Yamaoka
2014-08-28 3:09 ` Katsumi Yamaoka
2 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 6:53 UTC (permalink / raw)
To: michael_heerdegen; +Cc: 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Date: Fri, 18 Jul 2014 08:22:43 +0200
>
> Some users will want to configure the sorting predicate used by ls-lisp,
> for example, to get a natural sorting of version numbers, or to sort
> files whose names start with a dot as if they had no dot, etc.
>
> Currently, sorting is even hardcoded because `ls-lisp-string-lessp' is a
> defsubst. If it were a normal function, one could advise it.
>
> Or, with some more effort, sorting order could be made configurable via
> options, and the -v switch could be implemented.
ls-lisp emulates the Unix and GNU 'ls'. So I will generally oppose
introducing any option into it that cannot be had with an external
'ls' program, as long as the latter is the main method of getting a
Dired buffer. (If Emacs ever decides that ls-lisp becomes the main
method, and will use it by default on all supported platforms, this
objection will no longer be valid, of course.)
An alternative to what you want would be a Dired-level feature, which
then will be available also to those who don't use ls-lisp.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 6:53 ` Eli Zaretskii
@ 2014-07-18 7:33 ` Michael Heerdegen
2014-07-18 8:53 ` Eli Zaretskii
2014-07-18 9:24 ` Michael Albinus
0 siblings, 2 replies; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-18 7:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 18051
Eli Zaretskii <eliz@gnu.org> writes:
> ls-lisp emulates the Unix and GNU 'ls'. So I will generally oppose
> introducing any option into it that cannot be had with an external
> 'ls' program, as long as the latter is the main method of getting a
> Dired buffer. (If Emacs ever decides that ls-lisp becomes the main
> method, and will use it by default on all supported platforms, this
> objection will no longer be valid, of course.)
That's a bit what I expected, and makes sense. I would welcome ls-lisp
to become the default, btw.
Sorting in dired could generally be improved a lot. If you want sorting
by size, you need to edit the switches, which is not very handy (there
are addons for that job). I needed to redefine the whole
`dired-sort-toggle-or-edit' because it hardcodes -t whereas I prefer -c.
> An alternative to what you want would be a Dired-level feature, which
> then will be available also to those who don't use ls-lisp.
That would be a really good thing (if it doesn't slow down things). But
before this is reality ... could I please get my `ls-lisp-string-lessp'
defined as a function?
Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 7:33 ` Michael Heerdegen
@ 2014-07-18 8:53 ` Eli Zaretskii
2014-07-18 9:37 ` Michael Heerdegen
2014-07-18 9:24 ` Michael Albinus
1 sibling, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 8:53 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 09:33:21 +0200
>
> could I please get my `ls-lisp-string-lessp' defined as a function?
You can always redefine the functions that call it, no?
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 7:33 ` Michael Heerdegen
2014-07-18 8:53 ` Eli Zaretskii
@ 2014-07-18 9:24 ` Michael Albinus
2014-07-18 9:33 ` Eli Zaretskii
1 sibling, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-07-18 9:24 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: 18051
Michael Heerdegen <michael_heerdegen@web.de> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>> ls-lisp emulates the Unix and GNU 'ls'. So I will generally oppose
>> introducing any option into it that cannot be had with an external
>> 'ls' program, as long as the latter is the main method of getting a
>> Dired buffer. (If Emacs ever decides that ls-lisp becomes the main
>> method, and will use it by default on all supported platforms, this
>> objection will no longer be valid, of course.)
>
> That's a bit what I expected, and makes sense. I would welcome ls-lisp
> to become the default, btw.
Tramp uses ls-lisp only in case it cannot use a native method on the
remote host. Experience shows that ls-lisp performs much worse
for remote directories than native implementations.
I would oppose making ls-lisp the default, and adding functionality to
it which would not be available otherwise. Such additional functionality
must be added to file name functions with a file name handler, if
desired.
> Michael.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 9:24 ` Michael Albinus
@ 2014-07-18 9:33 ` Eli Zaretskii
2014-07-18 10:12 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 9:33 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Eli Zaretskii <eliz@gnu.org>, 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 11:24:59 +0200
>
> Tramp uses ls-lisp only in case it cannot use a native method on the
> remote host. Experience shows that ls-lisp performs much worse
> for remote directories than native implementations.
Any insight as to why this happens? Perhaps the Tramp implementation
of directory-files-and-attributes needs some love?
> I would oppose making ls-lisp the default, and adding functionality to
> it which would not be available otherwise.
If this is because of Tramp, nothing prevents us from using 'ls' on
the remote host, and then manipulating the results locally in Lisp,
right? So I'm not sure I understand the rationale for your
objections. Perhaps revealing more details will help.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 8:53 ` Eli Zaretskii
@ 2014-07-18 9:37 ` Michael Heerdegen
2014-07-18 9:46 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-18 9:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 18051
Eli Zaretskii <eliz@gnu.org> writes:
> > could I please get my `ls-lisp-string-lessp' defined as a function?
>
> You can always redefine the functions that call it, no?
That would cover the whole switch processing algorithm,
`ls-lisp-handle-switches', 60 lines of code. I would shadow any future
change in that code by redefining that huge function. That's why I
wanted to avoid that.
What's the advantage of `ls-lisp-string-lessp' being a defsubst? If it
really makes it significantly faster (it's called a lot of times, of
course), then ok, let's keep it.
Thanks,
Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 9:37 ` Michael Heerdegen
@ 2014-07-18 9:46 ` Eli Zaretskii
2014-07-18 10:18 ` Michael Heerdegen
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 9:46 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 11:37:19 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > > could I please get my `ls-lisp-string-lessp' defined as a function?
> >
> > You can always redefine the functions that call it, no?
>
> That would cover the whole switch processing algorithm,
> `ls-lisp-handle-switches', 60 lines of code. I would shadow any future
> change in that code by redefining that huge function. That's why I
> wanted to avoid that.
You are going to override the behavior of the package anyway, so I
don't see the big difference.
> What's the advantage of `ls-lisp-string-lessp' being a defsubst?
From my POV, making sure the package always behaves as designed, I
guess. You agreed with my motivation, so it sounds like a
contradiction to me to still push for the change.
Anyway, if others think the comparison of file names should be up for
grabs, I won't fight the change just because I think it's wrong.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 9:33 ` Eli Zaretskii
@ 2014-07-18 10:12 ` Michael Albinus
2014-07-18 12:57 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-07-18 10:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
Hi Eli,
>> Tramp uses ls-lisp only in case it cannot use a native method on the
>> remote host. Experience shows that ls-lisp performs much worse
>> for remote directories than native implementations.
>
> Any insight as to why this happens? Perhaps the Tramp implementation
> of directory-files-and-attributes needs some love?
Maybe it is a misunderstanding. Tramp's native implementation is much
faster, because it sends exactly one remote command. For ssh-like
connections, this is for example
# echo "("; (/bin/ls --color=never -a | sed -e s/\$/\"/g -e s/^/\"/g | xargs \stat -c '("%n" ("%N") %h %ue0 %ge0 %Xe0 %Ye0 %Ze0 %se0 "%A" t %ie0 -1)' 2>/dev/null); echo ")" 2>/dev/null
This is much faster than ls-lisp, which must determine file-attributes
for every single file in a remote directory (which is a remote command
on its own).
>> I would oppose making ls-lisp the default, and adding functionality to
>> it which would not be available otherwise.
>
> If this is because of Tramp, nothing prevents us from using 'ls' on
> the remote host, and then manipulate the results locally in Lisp,
> right? So I'm not sure I understand the rationale for your
> objections. Perhaps revealing more details will help.
I would oppose only if there is additional mandatory functionality in
ls-lisp that other file name primitives are required to use. If there were
changes in, let's say, directory-files-and-attributes, there's no
problem for me. But that's not what Michael has asked for.
Best regards, Michael.
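The difference described above can be seen in the shape of the Lisp calls, independent of Tramp internals. In this sketch (both helper names are hypothetical), the first function makes one `directory-files-and-attributes' call, which a file name handler can answer with a single remote command, while the second calls `file-attributes' once per file, which on a remote directory would typically mean one round trip per file.

```elisp
(defun my-demo-attrs-one-call (dir)
  "Return an alist of (NAME . ATTRIBUTES) for DIR using a single primitive."
  (directory-files-and-attributes dir))

(defun my-demo-attrs-per-file (dir)
  "Return the same kind of alist for DIR, one `file-attributes' call per file."
  (mapcar (lambda (name)
            (cons name (file-attributes (expand-file-name name dir))))
          (directory-files dir)))
```

For a local directory both forms list the same names; the cost only diverges when every `file-attributes' call has to go over the wire.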
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 9:46 ` Eli Zaretskii
@ 2014-07-18 10:18 ` Michael Heerdegen
2014-07-18 13:03 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-18 10:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 18051
Eli Zaretskii <eliz@gnu.org> writes:
> From my POV, making sure the package always behaves as designed, I
> guess. You agreed with my motivation, so it sounds like a
> contradiction to me to still push for the change.
Seems I again misunderstood what you meant with:
> [...] So I will generally oppose introducing any option into it
> that cannot be had with an external 'ls' program, as long as the
> latter is the main method of getting a
I had read that as introducing "user options", but you obviously meant
introducing options for changing the behavior using any means.
> Anyway, if others think the comparison of file names should be up for
> grabs, I won't fight the change just because I think it's wrong.
At least we agree that there's room for improvement.
Anyway, what I want to achieve:
(1) sort files starting with a dot as if they had no dot
(2) -v sorting (sorting versions correctly)
Aren't both of those possible with ls? (2) obviously; I think (1) depends on
the system's language setting?
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 10:12 ` Michael Albinus
@ 2014-07-18 12:57 ` Eli Zaretskii
2014-07-18 13:18 ` Michael Albinus
2014-07-20 5:49 ` Michael Heerdegen
0 siblings, 2 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 12:57 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 12:12:48 +0200
>
> >> Tramp uses ls-lisp only in case it cannot use a native method on the
> >> remote host. Experience shows that ls-lisp performs much worse
> >> for remote directories than native implementations.
> >
> > Any insight as to why this happens? Perhaps the Tramp implementation
> > of directory-files-and-attributes needs some love?
>
> Maybe it is a misunderstanding. Tramp's native implementation is much
> faster, because it sends exactly one remote command. For ssh-like
> connections, this is for example
>
> # echo "("; (/bin/ls --color=never -a | sed -e s/\$/\"/g -e s/^/\"/g | xargs \stat -c '("%n" ("%N") %h %ue0 %ge0 %Xe0 %Ye0 %Ze0 %se0 "%A" t %ie0 -1)' 2>/dev/null); echo ")" 2>/dev/null
We could easily add this to ls-lisp, in case the directory is remote.
Right now, it simply doesn't support remote directories, because I
didn't know there was any interest in that.
> I would oppose only if there is additional mandatory functionality in
> ls-lisp that other file name primitives are required to use. If there were
> changes in, let's say, directory-files-and-attributes, there's no
> problem for me. But that's not what Michael has asked for.
I don't think you understood what Michael wanted, but I'll let Michael
speak for himself.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 10:18 ` Michael Heerdegen
@ 2014-07-18 13:03 ` Eli Zaretskii
2014-07-19 1:25 ` Michael Heerdegen
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 13:03 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 12:18:19 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > From my POV, making sure the package always behaves as designed, I
> > guess. You agreed with my motivation, so it sounds like a
> > contradiction to me to still push for the change.
>
> Seems I again misunderstood what you meant with:
>
> > [...] So I will generally oppose introducing any option into it
> > that cannot be had with an external 'ls' program, as long as the
> > latter is the main method of getting a
>
> I had read that as introducing "user options", but you obviously meant
> introducing options for changing the behavior using any means.
Yes, I meant adding code that would support functionality not
available when 'ls' is used.
> > Anyway, if others think the comparison of file names should be up for
> > grabs, I won't fight the change just because I think it's wrong.
>
> At least we agree that there's room for improvement.
That's trivial: there always is, including in ls-lisp.
> Anyway, what I want to achieve:
>
> (1) sort files starting with a dot as if they had no dot
Why? Personally, it would mightily confuse me: I always expect to
find all the dot-files together. This is useful, e.g., when I'm
looking for an init file related to some feature, but I don't know the
exact name of that file.
But if 'ls' supports that, so should ls-lisp.
> (2) -v sorting (sorting versions correctly)
Isn't this what "ls -v" does? If so, and if ls-lisp doesn't currently
support that, patches to add such support are welcome.
> Aren't both of those possible with ls? (2) obviously; I think (1) depends on
> the system's language setting?
Patches to support any feature available in some 'ls' are more than
welcome. TIA.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 12:57 ` Eli Zaretskii
@ 2014-07-18 13:18 ` Michael Albinus
2014-07-18 13:44 ` Eli Zaretskii
2014-07-20 5:49 ` Michael Heerdegen
1 sibling, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-07-18 13:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
>> Maybe it is a misunderstanding. Tramp's native implementation is much
>> faster, because it sends exactly one remote command. For ssh-like
>> connections, this is for example
>>
>> # echo "("; (/bin/ls --color=never -a | sed -e s/\$/\"/g -e s/^/\"/g
>> | xargs \stat -c '("%n" ("%N") %h %ue0 %ge0 %Xe0 %Ye0 %Ze0 %se0 "%A"
>> t %ie0 -1)' 2>/dev/null); echo ")" 2>/dev/null
>
> We could easily add this to ls-lisp, in case the directory is remote.
> Right now, it simply doesn't support remote directories, because I
> didn't know there was any interest in that.
No, that's not needed. Tramp does its job for different target
architectures in different ways. For example, if the stat command is not
available on the remote host, it uses another implementation with perl
for directory-files-and-attributes, and so on.
ls-lisp does not support file name handlers, and Tramp uses it only
internally in case it doesn't know better. Support for remote
directories would mean to add a file name handler for ls-lisp - is this
what you have in mind? I don't believe it is necessary.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 13:18 ` Michael Albinus
@ 2014-07-18 13:44 ` Eli Zaretskii
2014-07-18 16:21 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-18 13:44 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Fri, 18 Jul 2014 15:18:40 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> Maybe it is a misunderstanding. Tramp's native implementation is much
> >> faster, because it sends exactly one remote command. For ssh-like
> >> connections, this is for example
> >>
> >> # echo "("; (/bin/ls --color=never -a | sed -e s/\$/\"/g -e s/^/\"/g
> >> | xargs \stat -c '("%n" ("%N") %h %ue0 %ge0 %Xe0 %Ye0 %Ze0 %se0 "%A"
> >> t %ie0 -1)' 2>/dev/null); echo ")" 2>/dev/null
> >
> > We could easily add this to ls-lisp, in case the directory is remote.
> > Right now, it simply doesn't support remote directories, because I
> > didn't know there was any interest in that.
>
> No, that's not needed. Tramp does its job for different target
> architectures in different ways. For example, if the stat command is not
> available on the remote host, it uses another implementation with perl
> for directory-files-and-attributes, and so on.
We are talking past each other. What I meant was to add to ls-lisp
support for remote directories, which will simply invoke Tramp's
handlers for that.
> Support for remote directories would mean to add a file name handler
> for ls-lisp - is this what you have in mind?
Yes.
> I don't believe it is necessary.
It is necessary if ls-lisp will ever become the default.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 13:44 ` Eli Zaretskii
@ 2014-07-18 16:21 ` Michael Albinus
0 siblings, 0 replies; 63+ messages in thread
From: Michael Albinus @ 2014-07-18 16:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
>> Support for remote directories would mean to add a file name handler
>> for ls-lisp - is this what you have in mind?
>
> Yes.
>
>> I don't believe it is necessary.
>
> It is necessary if ls-lisp will ever become the default.
Well, in this case I don't know which functionality shall be added to
ls-lisp-insert-directory, which couldn't be added directly to
insert-directory. The latter function calls file name handlers already,
if needed.
(I still have the impression we're speaking about different
topics. Sorry for my stupidity)
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 13:03 ` Eli Zaretskii
@ 2014-07-19 1:25 ` Michael Heerdegen
2014-07-19 8:17 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-19 1:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 18051
Eli Zaretskii <eliz@gnu.org> writes:
> > Anyway, what I want to achieve:
> >
> > (1) sort files starting with a dot as if they had no dot
>
> Why? Personally, it would mightily confuse me: I always expect to
> find all the dot-files together. This is useful, e.g., when I'm
> looking for an init file related to some feature, but I don't know the
> exact name of that file.
But currently all dot files are listed before all the other files.
I find that annoying most of the time. Dunno yet if I like my current
setup, I'll see - but over the current sorting, I would much prefer
having all the dot files at the end. (I don't know if this is possible
with ls.)
> But if 'ls' supports that, so should ls-lisp.
It depends on "locale" settings. I tried some settings of the LC_ALL
variable. With "C" or "POSIX", I get the same sorting as with ls-lisp.
OTOH, with "en_US.utf8" or "de_DE.utf8" I get the sorting I described,
with dot files merged with the other files.
> > (2) -v sorting (sorting versions correctly)
>
> Isn't this what "ls -v" does? If so, and if ls-lisp doesn't currently
> support that, patches to add such support are welcome.
ls -v sorts backup versions in their natural order (which is not the
lexicographic order). Yes, would be good to have that in ls-lisp, and
should not be too hard. I can give it a try when I get the time. But
I'm not sure what would have to be done about the locale-dependent part.
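For the -v part, a natural-order comparison is the core of what would be needed. A minimal sketch, assuming an Emacs that provides `string-version-lessp' (added in Emacs 25.1, so not available in the 24.3.92 build this report is about); `my-demo-version-sort' is a hypothetical helper, and GNU ls -v may differ in details such as how suffixes are handled.

```elisp
(defun my-demo-version-sort (names)
  "Return a copy of NAMES sorted so that digit runs compare numerically."
  (sort (copy-sequence names) #'string-version-lessp))

;; Plain `string-lessp' would put "foo10.el" before "foo2.el";
;; here the digit runs 1, 2 and 10 are compared as numbers.
(my-demo-version-sort '("foo10.el" "foo2.el" "foo1.el"))
```

The locale-dependent part is not addressed by this comparison; it only deals with embedded numbers.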
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-19 1:25 ` Michael Heerdegen
@ 2014-07-19 8:17 ` Eli Zaretskii
2014-07-19 10:52 ` Michael Heerdegen
2014-07-19 10:56 ` Eli Zaretskii
0 siblings, 2 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-19 8:17 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: 18051@debbugs.gnu.org
> Date: Sat, 19 Jul 2014 03:25:01 +0200
>
> over the current sorting, I would much prefer having all the dot
> files at the end. (I don't know if this is possible with ls.)
If 'ls' cannot do that, I think we should have this in Dired.
> > But if 'ls' supports that, so should ls-lisp.
>
> It depends on "locale" settings. I tried some settings of the LC_ALL
> variable. With "C" or "POSIX", I get the same sorting as with ls-lisp.
> OTOH, with "en_US.utf8" or "de_DE.utf8" I get the sorting I described,
> with dot files merged with the other files.
AFAICT, 'ls' uses strcoll. I don't see how that can effectively
ignore the leading dot, but maybe I'm missing something. OTOH, if the
UTF-8 codeset says the leading dot should be ignored, then ls-lisp
should do the same by default, at least when the locale's codeset is
UTF-8.
Can you see the answer in 'ls' sources? Does just "en_US" or
"en_US.8859-1" change the order?
> > > (2) -v sorting (sorting versions correctly)
> >
> > Isn't this what "ls -v" does? If so, and if ls-lisp doesn't currently
> > support that, patches to add such support are welcome.
>
> ls -v sorts backup versions in their natural order (which is not the
> lexicographic order). Yes, would be good to have that in ls-lisp, and
> should not be too hard. I can give it a try when I get the time.
Thanks.
> I'm not sure what would have to be done about the locale-dependent part.
Assuming it's indeed the locale thing, Emacs doesn't yet support
locale-specific sorting. But we could do that in some ad-hoc manner
anyway.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-19 8:17 ` Eli Zaretskii
@ 2014-07-19 10:52 ` Michael Heerdegen
2014-07-19 10:56 ` Eli Zaretskii
1 sibling, 0 replies; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-19 10:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 18051
Eli Zaretskii <eliz@gnu.org> writes:
> AFAICT, 'ls' uses strcoll. I don't see how that can effectively
> ignore the leading dot, but maybe I'm missing something.
Yes, it uses strcoll.
> Can you see the answer in 'ls' sources?
No, I don't see anything related in the sources, it just calls strcoll
(or strcmp as fallback), nothing more.
I compiled some test program calling strcoll, and there it didn't ignore
the dot. Sorry, I don't know why this is different in ls.
> Does just "en_US" or "en_US.8859-1" change the order?
Yes! ls -al with en_US behaves the same as with en_US.utf8, dots are
ignored.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-19 8:17 ` Eli Zaretskii
2014-07-19 10:52 ` Michael Heerdegen
@ 2014-07-19 10:56 ` Eli Zaretskii
1 sibling, 0 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-19 10:56 UTC (permalink / raw)
To: michael_heerdegen; +Cc: 18051
> Date: Sat, 19 Jul 2014 11:17:15 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 18051@debbugs.gnu.org
>
> > > But if 'ls' supports that, so should ls-lisp.
> >
> > It depends on "locale" settings. I tried some settings of the LC_ALL
> > variable. With "C" or "POSIX", I get the same sorting as with ls-lisp.
> > OTOH, with "en_US.utf8" or "de_DE.utf8" I get the sorting I described,
> > with dot files merged with the other files.
>
> AFAICT, 'ls' uses strcoll. I don't see how that can effectively
> ignore the leading dot, but maybe I'm missing something.
I think I know the answer: those versions of 'ls' that do this are
based on libc implementation that supports UTS#10, the Unicode
Collation Algorithm, or at least part of it. UTS#10 specifies a
multi-level comparison, whereby base characters, accents, and
letter-case variants are compared before punctuation characters.
> OTOH, if the UTF-8 codeset says the leading dot should be ignored,
> then ls-lisp should do the same by default, at least when the
> locale's codeset is UTF-8.
For this, we would need a UTS#10 compatible compare-strings. Then
ls-lisp could simply use it when the locale is .UTF-8.
Alternatively, we could have an approximation to that, just for
sorting non-punctuation characters before the punctuation characters.
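The approximation mentioned above could look roughly like the following two-level comparison: names are first compared with punctuation removed and case folded, and the raw strings are only consulted to break ties. This is a sketch, not UTS#10; `my-demo-punct-late-lessp' and `my-demo-primary-key' are hypothetical names, and the character class used is an assumption that is only meant for ASCII file names.

```elisp
(defun my-demo-primary-key (s)
  "Return S lowercased with punctuation characters removed."
  (downcase (replace-regexp-in-string "[[:punct:]]" "" s)))

(defun my-demo-punct-late-lessp (s1 s2)
  "Compare S1 and S2, looking at punctuation and case only to break ties."
  (let ((k1 (my-demo-primary-key s1))
        (k2 (my-demo-primary-key s2)))
    (if (string= k1 k2)
        (string-lessp s1 s2)          ; same letters: fall back to raw order
      (string-lessp k1 k2))))         ; otherwise letters and digits decide

;; Dot files end up merged with the other names instead of grouped first.
(sort (list ".bashrc" "alpha" "Zeta" ".emacs" "beta")
      #'my-demo-punct-late-lessp)
```

A real UTS#10 comparison has more levels (accents, case) and locale tailoring; this only imitates the effect on leading dots.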
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-18 12:57 ` Eli Zaretskii
2014-07-18 13:18 ` Michael Albinus
@ 2014-07-20 5:49 ` Michael Heerdegen
2014-07-20 6:07 ` Eli Zaretskii
2014-07-20 6:18 ` Michael Heerdegen
1 sibling, 2 replies; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 5:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Michael Albinus, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> I don't think you understood what Michael wanted, but I'll let Michael
> speak for himself.
I don't use Tramp often currently.
Generally, I switched to ls-lisp because I liked that it gives me more
control over how dired looks.
Sorting is only one thing (that could probably be done with Tramp).
There are other things. What I don't like from ls is that it shows
symlinks like this:
lrwxrwxrwx ... ...
rwxrwxrwx is redundant. When you use M on a link in dired, you actually
set the modes of the target. I want to see the target file's modes, so
I use this:
,----------------------------------------------------------------------
| (defun my-ls-lisp-treat-symlinks-ad (file-alist &rest _)
|   "Make it show modes of truenames for symlinks."
|   (mapc (lambda (file-line)
|           (let ((filename (expand-file-name (car file-line)
|                                             default-directory))
|                 modes-string)
|             (when (file-symlink-p filename)
|               (setq modes-string (nth 8 (file-attributes
|                                          (file-truename filename))))
|               (if (not modes-string) ;; link could be dead!
|                   (setq modes-string "l?????????")
|                 (aset modes-string 0 ?l))
|               (setf (nth 9 file-line) modes-string))))
|         file-alist)
|   file-alist)
|
| (advice-add 'ls-lisp-handle-switches :after #'my-ls-lisp-treat-symlinks-ad)
`----------------------------------------------------------------------
AFAIK this can't be achieved with ls. Just one example. Trying to do
such things with Tramp would probably indeed slow it down.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 5:49 ` Michael Heerdegen
@ 2014-07-20 6:07 ` Eli Zaretskii
2014-07-20 6:21 ` Michael Heerdegen
2014-07-20 6:18 ` Michael Heerdegen
1 sibling, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 6:07 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: michael.albinus, 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: Michael Albinus <michael.albinus@gmx.de>, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 07:49:48 +0200
>
> There are other things. What I don't like from ls is that it shows
> symlinks like this:
>
> lrwxrwxrwx ... ...
>
> rwxrwxrwx is redundant.
It's what 'lstat' returns.
> When you use M on a link in dired, you actually
> set the modes of the target. I want to see the target file's modes, so
> I use this:
>
> ,----------------------------------------------------------------------
> | (defun my-ls-lisp-treat-symlinks-ad (file-alist &rest _)
> |   "Make it show modes of truenames for symlinks."
> |   (mapc (lambda (file-line)
> |           (let ((filename (expand-file-name (car file-line)
> |                                             default-directory))
> |                 modes-string)
> |             (when (file-symlink-p filename)
> |               (setq modes-string (nth 8 (file-attributes
> |                                          (file-truename filename))))
> |               (if (not modes-string) ;; link could be dead!
> |                   (setq modes-string "l?????????")
> |                 (aset modes-string 0 ?l))
> |               (setf (nth 9 file-line) modes-string))))
> |         file-alist)
> |   file-alist)
> |
> | (advice-add 'ls-lisp-handle-switches :after #'my-ls-lisp-treat-symlinks-ad)
> `----------------------------------------------------------------------
>
> AFAIK this can't be achieved with ls.
Doesn't "ls -L" give you that?
> Just one example. Trying to do such things with Tramp would
> probably indeed slow it down.
IMO, the right way to do this is to have an additional argument to
file-attributes, follow-symlinks, with the obvious semantics.
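For comparison, the view discussed above (the target's mode bits, but still a visible marker that the entry is a symlink) can be sketched directly on top of lstat and stat. This is only an illustration under stated assumptions, not code from Emacs or ls: the name mode_string_through_link is made up, and setuid/setgid/sticky bits and file types other than directories are ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/stat.h>

/* Fill OUT (at least 11 bytes) with an ls-style mode string for PATH.
   For a symlink, show the permission bits of the target but keep the
   leading 'l'; a dangling link yields "l?????????".
   Returns 0 on success, -1 if PATH itself cannot be examined.
   Hypothetical helper, for illustration only.  */
int
mode_string_through_link (const char *path, char out[11])
{
  struct stat link_st, st;

  /* lstat describes the link itself, which is why ls shows lrwxrwxrwx.  */
  if (lstat (path, &link_st) != 0)
    return -1;

  int is_link = S_ISLNK (link_st.st_mode);
  if (is_link)
    {
      /* stat follows the link; it fails if the link is dead.  */
      if (stat (path, &st) != 0)
        {
          strcpy (out, "l?????????");
          return 0;
        }
    }
  else
    st = link_st;

  static const char rwx[] = "rwxrwxrwx";
  out[0] = is_link ? 'l' : S_ISDIR (st.st_mode) ? 'd' : '-';
  for (int i = 0; i < 9; i++)
    out[i + 1] = (st.st_mode & (0400 >> i)) ? rwx[i] : '-';
  out[10] = '\0';
  return 0;
}
```

A plain "ls -L" corresponds to calling stat only, which is why the 'l' marker disappears there.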
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 5:49 ` Michael Heerdegen
2014-07-20 6:07 ` Eli Zaretskii
@ 2014-07-20 6:18 ` Michael Heerdegen
2014-07-20 14:22 ` Stefan Monnier
1 sibling, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 6:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Michael Albinus, 18051
Michael Heerdegen <michael_heerdegen@web.de> writes:
> | (advice-add 'ls-lisp-handle-switches
> :after #'my-ls-lisp-treat-symlinks-ad)
And I understand that users should not be encouraged to do such things.
Being 1:1 ls compatible has many advantages, but it makes dired
unusually inflexible compared to other Emacs packages. That's why I
said I wished ls-lisp - or some other configurable mechanism - would be
the default. I didn't think of Tramp when I said that.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 6:07 ` Eli Zaretskii
@ 2014-07-20 6:21 ` Michael Heerdegen
2014-07-20 6:33 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 6:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael.albinus, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> Doesn't "ls -L" give you that?
That's worse. Then I don't even see at all that the file is a symlink,
it is shown as a regular file. I had tried that, and found it very
confusing.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 6:21 ` Michael Heerdegen
@ 2014-07-20 6:33 ` Eli Zaretskii
2014-07-20 7:30 ` Michael Heerdegen
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 6:33 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: michael.albinus, 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: michael.albinus@gmx.de, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 08:21:27 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Doesn't "ls -L" give you that?
>
> That's worse. Then I don't even see at all that the file is a symlink,
> it is shown as a regular file. I had tried that, and found it very
> confusing.
Why is it important to you to know that the file is a symlink, if you
always want to change its target?
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 6:33 ` Eli Zaretskii
@ 2014-07-20 7:30 ` Michael Heerdegen
2014-07-20 8:14 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 7:30 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael.albinus, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> Why is it important to you to know that the file is a symlink, if you
> always want to change its target?
That's what I want for file modes - but still I want to know/see what
I'm doing!
And it makes a difference in the organization of the file system. You
can use symlinks as some kind of shortcut to move within the file system
more quickly. It makes a difference if you remove a shortcut or if you
erase a whole directory with all its files from your hard drive. It
also makes a difference when you think you make a local change, and
actually cause changes "somewhere else" in the file system.
The funniest thing you could do would be to delete the target because
you think you have two identical versions of a file/directory in your
filesystem, and then are shocked because the second version turns out to
be a dead link after that.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 7:30 ` Michael Heerdegen
@ 2014-07-20 8:14 ` Eli Zaretskii
2014-07-20 8:24 ` Michael Heerdegen
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 8:14 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: michael.albinus, 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: michael.albinus@gmx.de, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 09:30:32 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Why is it important to you to know that the file is a symlink, if you
> > always want to change its target?
>
> That's what I want for file modes - but still I want to know/see what
> I'm doing!
But the same problem exists with the size, the time stamp, the
UID/GID, the inode -- you name it. I fail to see how seeing the mode
bits of the target is important, but the same is not true for the rest
of the attributes.
Or are you just saying that you want to see the attributes of the
target, and _also_ the fact that the file is a symlink?
> And it makes a difference in the organization of the file system. You
> can use symlinks as some kind of shortcut to move within the file system
> more quickly. It makes a difference if you remove a shortcut or if you
> erase a whole directory with all its files from your hard drive. It
> also makes a difference when you think you make a local change, and
> actually cause changes "somewhere else" in the file system.
This just means that we should have an easy way of switching between
the -L view and the non-L view. Would that solve the problem?
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 8:14 ` Eli Zaretskii
@ 2014-07-20 8:24 ` Michael Heerdegen
2014-07-20 8:38 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 8:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael.albinus, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> Or are you just saying that you want to see the attributes of the
> target, and _also_ the fact that the file is a symlink?
Yes, exactly.
> This just means that we should have an easy way of switching between
> the -L view and the non-L view. Would that solve the problem?
That would not be bad, although personally I would prefer one view
providing all relevant information, even when that's not ls conform.
But even more important would be an easy way to sort by size and
such.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 8:24 ` Michael Heerdegen
@ 2014-07-20 8:38 ` Eli Zaretskii
2014-07-20 9:15 ` Michael Heerdegen
2014-07-20 11:44 ` Michael Albinus
0 siblings, 2 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 8:38 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: michael.albinus, 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: michael.albinus@gmx.de, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 10:24:46 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Or are you just saying that you want to see the attributes of the
> > target, and _also_ the fact that the file is a symlink?
>
> Yes, exactly.
>
> > This just means that we should have an easy way of switching between
> > the -L view and the non-L view. Would that solve the problem?
>
> That would not be bad, although personally I would prefer one view
> providing all relevant information, even when that's not ls conform.
Should be possible on the Dired level.
> But even more important would be an easy way to sort by size and
> such.
That is already available (e.g., sorting by size is triggered by the
"-S" switch).
I think that one important conclusion from this discussion is that we
need to have a way to sort file names in a way that is at least
partially compliant with UTS#10, to produce listings that are similar
to what 'ls' does on GNU systems under UTF-8 locales. I hope someone
will step forward to write the necessary code.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 8:38 ` Eli Zaretskii
@ 2014-07-20 9:15 ` Michael Heerdegen
2014-07-20 9:18 ` Eli Zaretskii
2014-07-20 11:44 ` Michael Albinus
1 sibling, 1 reply; 63+ messages in thread
From: Michael Heerdegen @ 2014-07-20 9:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael.albinus, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> That is already available (e.g., sorting by size is triggered by the
> "-S" switch).
But very inconveniently: C-u s [edit minibuffer] RET.
It would be more convenient when s without prefix would cycle between
more than two sorting orders, and when this would be configurable.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 9:15 ` Michael Heerdegen
@ 2014-07-20 9:18 ` Eli Zaretskii
0 siblings, 0 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 9:18 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: michael.albinus, 18051
> From: Michael Heerdegen <michael_heerdegen@web.de>
> Cc: michael.albinus@gmx.de, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 11:15:38 +0200
>
> It would be more convenient when s without prefix would cycle between
> more than two sorting orders, and when this would be configurable.
Sounds like a good idea to me.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 8:38 ` Eli Zaretskii
2014-07-20 9:15 ` Michael Heerdegen
@ 2014-07-20 11:44 ` Michael Albinus
2014-07-20 11:59 ` Eli Zaretskii
1 sibling, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-07-20 11:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Michael Heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> I think that one important conclusion from this discussion is that we
> need to have a way to sort file names in a way that is at least
> partially compliant with UTS#10, to produce listings that are similar
> to what 'ls' does on GNU systems under UTF-8 locales. I hope someone
> will step forward to write the necessary code.
Programs like `ls' honor the LC_COLLATE environment variable. Emacs
shall do it as well. This shouldn't affect only directory listings, but
could be used also for string searches.
Maybe we should expose glib's g_utf8_collate() on Lisp level. On systems
without glib, we might emulate it partially. Packages like ls-lisp could
use it then for sorting.
I have no clear forecast on my time budget next weeks. If possible, I
would play with this.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 11:44 ` Michael Albinus
@ 2014-07-20 11:59 ` Eli Zaretskii
2014-07-20 15:26 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 11:59 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Michael Heerdegen <michael_heerdegen@web.de>, 18051@debbugs.gnu.org
> Date: Sun, 20 Jul 2014 13:44:07 +0200
>
> Programs like `ls' honor the LC_COLLATE environment variable. Emacs
> shall do it as well.
That's clear, and is not the issue. The issue is why (and how) does
having UTF-8 in the codeset part of the locale cause sorting to sort
as 'ls' does on GNU platforms. And the answer is that the sorting
should implement UTS#10, which I'm not sure every platform does in its
standard C library.
> This shouldn't affect only directory listings, but could be used
> also for string searches.
That is a much larger job, and it's not clear how to do it best.
Emacs supports different languages in different buffers, and setting
and resetting LC_COLLATE for each buffer is not a good idea, IMO,
because thread-local locales are not well supported outside glibc
(AFAIK).
> Maybe we should expose glib's g_utf8_collate() on Lisp level.
Are you sure this does the job? Glib docs are minimal, and don't seem
to mention UTS#10. E.g., if g_utf8_collate relies on the underlying
libc's strcoll, we are back at square one.
> On systems without glib, we might emulate it partially. Packages
> like ls-lisp could use it then for sorting.
I think we need our own implementation in any case. If nothing else,
that would solve the issue of encoding strings into UTF-8 before
calling external C functions.
> I have no clear forecast on my time budget next weeks. If possible, I
> would play with this.
Thanks.
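To make the starting point concrete: honoring LC_COLLATE in plain C amounts to selecting the locale and comparing with strcoll. The following is a minimal sketch, assuming nothing beyond standard C; the names compare_collated and sort_names_collated are made up for illustration. What order comes out for a UTF-8 locale depends entirely on the C library's collation data, which is exactly the portability question raised above.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two C strings by the current LC_COLLATE.  */
static int
compare_collated (const void *a, const void *b)
{
  return strcoll (*(const char *const *) a, *(const char *const *) b);
}

/* Sort NAMES in place using the collation rules of LOCALE_NAME
   ("" means "take it from the environment").
   Returns 0 on success, -1 if the locale cannot be selected.
   Note that this changes LC_COLLATE for the whole process and does
   not restore it.  Hypothetical helper, for illustration only.  */
int
sort_names_collated (const char **names, size_t n, const char *locale_name)
{
  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1;
  qsort (names, n, sizeof *names, compare_collated);
  return 0;
}
```

With the "C" locale this degenerates to byte order, so upper-case ASCII names sort before lower-case ones.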
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 6:18 ` Michael Heerdegen
@ 2014-07-20 14:22 ` Stefan Monnier
0 siblings, 0 replies; 63+ messages in thread
From: Stefan Monnier @ 2014-07-20 14:22 UTC (permalink / raw)
To: Michael Heerdegen; +Cc: Michael Albinus, 18051
>> | (advice-add 'ls-lisp-handle-switches
>> | :after #'my-ls-lisp-treat-symlinks-ad)
> And I understand that users should not be encouraged to do such things.
Actually, while they're not encouraged, I definitely don't want to
discourage users from using defadvice or advice-add.
It's strongly discouraged within Emacs, and mildly discouraged for
external packages, but it's not discouraged for end-users.
Stefan
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 11:59 ` Eli Zaretskii
@ 2014-07-20 15:26 ` Michael Albinus
2014-07-20 16:16 ` Eli Zaretskii
2014-08-16 21:52 ` Michael Albinus
0 siblings, 2 replies; 63+ messages in thread
From: Michael Albinus @ 2014-07-20 15:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, Paul Eggert, 18051
Eli Zaretskii <eliz@gnu.org> writes:
>> Maybe we should expose glib's g_utf8_collate() on Lisp level.
>
> Are you sure this does the job? Glib docs are minimal, and don't seem
> to mention UTS#10. E.g., if g_utf8_collate relies on the underlying
> libc's strcoll, we are back at square one.
Well, I've checked the code of g_utf8_collate in glib 2.36. In short, it does
--8<---------------cut here---------------start------------->8---
#ifdef HAVE_CARBON
UCCompareTextDefault (kUCCollateStandardOptions,
str1_utf16, len1, str2_utf16, len2,
NULL, &retval);
#elif defined(__STDC_ISO_10646__)
result = wcscoll ((wchar_t *)str1_norm, (wchar_t *)str2_norm);
#else /* !__STDC_ISO_10646__ */
result = strcoll (str1_norm, str2_norm);
#endif
--8<---------------cut here---------------end--------------->8---
Likely, wcscoll implements only ISO 14651 (a subset of UCA these days),
and likely wcscoll supports single byte characters only. I will run some
tests in the next few days.
An alternative would be libicu, which seems to implement UCA
completely. I have no idea whether there are licensing issues when
linking with Emacs, though.
Maybe Paul knows better which library to use? I've seen in GNU grep's
Changelogs, that wcscoll was used, but removed last year. I haven't
checked (yet) what is the replacement.
>> On systems without glib, we might emulate it partially. Packages
>> like ls-lisp could use it then for sorting.
>
> I think we need our own implementation in any case. If nothing else,
> that would solve the issue of encoding strings into UTF-8 before
> calling external C functions.
Yep. But given the complexity of UCA, we will start slowly with a subset
of the algorithm only. This and performance considerations will still
demand for a native C library, if available.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 15:26 ` Michael Albinus
@ 2014-07-20 16:16 ` Eli Zaretskii
2014-08-16 21:52 ` Michael Albinus
1 sibling, 0 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-07-20 16:16 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, eggert, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org, Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 20 Jul 2014 17:26:04 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> Maybe we should expose glib's g_utf8_collate() on Lisp level.
> >
> > Are you sure this does the job? Glib docs are minimal, and don't seem
> > to mention UTS#10. E.g., if g_utf8_collate relies on the underlying
> > libc's strcoll, we are back at square one.
>
> Well, I've checked the code of g_utf8_collate in glib 2.36. In short, it does
>
> --8<---------------cut here---------------start------------->8---
> #ifdef HAVE_CARBON
>
> UCCompareTextDefault (kUCCollateStandardOptions,
> str1_utf16, len1, str2_utf16, len2,
> NULL, &retval);
>
> #elif defined(__STDC_ISO_10646__)
>
> result = wcscoll ((wchar_t *)str1_norm, (wchar_t *)str2_norm);
>
> #else /* !__STDC_ISO_10646__ */
>
> result = strcoll (str1_norm, str2_norm);
>
> #endif
> --8<---------------cut here---------------end--------------->8---
As expected, it simply relies on the Standard C library's wcscoll
implementation.
> Likely, wcscoll implements only ISO 14651 (a subset of UCA these days),
> and likely wcscoll supports single byte characters only.
No, I expect wcscoll, at least in its glibc implementation, to support
the entire Unicode range.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-07-20 15:26 ` Michael Albinus
2014-07-20 16:16 ` Eli Zaretskii
@ 2014-08-16 21:52 ` Michael Albinus
2014-08-17 16:38 ` Eli Zaretskii
1 sibling, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-16 21:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Michael Albinus <michael.albinus@gmx.de> writes:
>>> On systems without glib, we might emulate it partially. Packages
>>> like ls-lisp could use it then for sorting.
>>
>> I think we need our own implementation in any case. If nothing else,
>> that would solve the issue of encoding strings into UTF-8 before
>> calling external C functions.
>
> Yep. But given the complexity of UCA, we will start slowly with a subset
> of the algorithm only. This and performance considerations will still
> demand for a native C library, if available.
Just being curious, I've taken g_utf8_collate from the glib for a
test. It doesn't work badly.
I have added two functions `gstring-lessp' and `gstring-equalp', which
are meant to be the collation counterparts of `string-lessp' and
`string-equal'. Here are some tests, taken from UTS#10, chapter 1.1
"Multi-Level Comparison":
--8<---------------cut here---------------start------------->8---
(sort '("role" "roles" "rule") 'string-lessp)
=> ("role" "roles" "rule")
(sort '("role" "roles" "rule") 'gstring-lessp)
=> ("role" "roles" "rule")
--8<---------------cut here---------------end--------------->8---
No surprise they return the same result, this is level 1
comparison. Just base characters are compared.
--8<---------------cut here---------------start------------->8---
(sort '("role" "rôle" "roles") 'string-lessp)
=> ("role" "roles" "rôle")
(sort '("role" "rôle" "roles") 'gstring-lessp)
=> ("role" "rôle" "roles")
--8<---------------cut here---------------end--------------->8---
Accent differences are typically ignored in collation, if the base
letters differ. And so on, further tests applied from there ...
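The multi-level idea behind these results can be shown with a toy comparison. This is a sketch only, not the UCA and not what g_utf8_collate does internally: the weight table knows a single accented letter, and the names weights_of and two_level_compare are made up. It compares base letters first and looks at accents only when the base letters of the whole strings are identical.

```c
#include <wchar.h>

/* Toy collation element: a primary weight (the base letter) and a
   secondary weight (the accent).  Only U+00F4 (o with circumflex) is
   special-cased; a real implementation would take its weights from
   the UCA tables.  */
struct weights { wchar_t primary; int secondary; };

static struct weights
weights_of (wchar_t c)
{
  struct weights w = { c, 0 };
  if (c == L'\u00f4')           /* base letter o, one accent */
    {
      w.primary = L'o';
      w.secondary = 1;
    }
  return w;
}

/* Compare A and B on level 1 (base letters) first; use level 2
   (accents) only if the strings are identical on level 1.
   Returns a negative, zero, or positive value.  */
int
two_level_compare (const wchar_t *a, const wchar_t *b)
{
  int secondary_diff = 0;
  for (;; a++, b++)
    {
      struct weights wa = weights_of (*a), wb = weights_of (*b);
      /* A level-1 difference decides immediately; the terminating
         null has the lowest weight, so a prefix sorts first.  */
      if (wa.primary != wb.primary)
        return wa.primary < wb.primary ? -1 : 1;
      /* Remember only the first level-2 difference.  */
      if (secondary_diff == 0 && wa.secondary != wb.secondary)
        secondary_diff = wa.secondary < wb.secondary ? -1 : 1;
      if (*a == L'\0')
        return secondary_diff;
    }
}
```

Under this toy ordering "role" sorts before "rôle" (accent only), and "rôle" before "roles" (base letters already differ), matching the second result above.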
The collation rules could even be influenced by setting the locale
environment. The following example is taken from ISO 14651:2011,
appendix D.3. If LC_COLLATE is set to C.utf8, `string-lessp' and
`gstring-lessp' behave the same:
--8<---------------cut here---------------start------------->8---
(sort '("Alzheimer" "czar" "cæsium" "cølibat" "Aachen" "Aalborg" "Århus") 'string-lessp)
=> ("Aachen" "Aalborg" "Alzheimer" "czar" "cæsium" "cølibat" "Århus")
(sort '("Alzheimer" "czar" "cæsium" "cølibat" "Aachen" "Aalborg" "Århus") 'gstring-lessp)
=> ("Aachen" "Aalborg" "Alzheimer" "czar" "cæsium" "cølibat" "Århus")
--8<---------------cut here---------------end--------------->8---
When I set LC_COLLATE to en_US.utf8, accent differences are ignored,
again:
--8<---------------cut here---------------start------------->8---
(sort '("Alzheimer" "czar" "cæsium" "cølibat" "Aachen" "Aalborg" "Århus") 'gstring-lessp)
=> ("Aachen" "Aalborg" "Alzheimer" "Århus" "cæsium" "cølibat" "czar")
--8<---------------cut here---------------end--------------->8---
But setting LC_COLLATE to da_DK.utf8, the order differs, because "cz" is
less than "cæ", and "aa" is equivalent to "å" but greater than "z".
--8<---------------cut here---------------start------------->8---
(sort '("Alzheimer" "czar" "cæsium" "cølibat" "Aachen" "Aalborg" "Århus") 'gstring-lessp)
=> ("Alzheimer" "czar" "cæsium" "cølibat" "Aachen" "Aalborg" "Århus")
--8<---------------cut here---------------end--------------->8---
Well, for practical use cases it seems worthwhile to include
g_utf8_collate in Emacs. Of course, it could be used only if glib
is linked, so we might still need our own Lisp implementation. I don't
know how well g_utf8_collate works for non-Latin characters, though.
And the test files CollationTest_NON_IGNORABLE.txt and
CollationTest_SHIFTED.txt from UTS#10 do not run completely
successfully. I have no idea whether this is due to a limitation of
g_utf8_collate, or because I have taken the latest Unicode
7.0.0 test files, which might include tests which haven't reached
GNU/Linux distributions yet (or whether my implementation is still
erroneous).
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-16 21:52 ` Michael Albinus
@ 2014-08-17 16:38 ` Eli Zaretskii
2014-08-17 17:55 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-17 16:38 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Sat, 16 Aug 2014 23:52:16 +0200
>
> Just being curious, I've taken g_utf8_collate from the glib for a
> test. It doesn't work badly.
Are you sure this is implemented in Glib, not in the underlying libc
(glibc in your case, I presume)?
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-17 16:38 ` Eli Zaretskii
@ 2014-08-17 17:55 ` Eli Zaretskii
2014-08-17 18:46 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-17 17:55 UTC (permalink / raw)
To: michael.albinus; +Cc: michael_heerdegen, 18051
> Date: Sun, 17 Aug 2014 19:38:36 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
>
> > From: Michael Albinus <michael.albinus@gmx.de>
> > Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> > Date: Sat, 16 Aug 2014 23:52:16 +0200
> >
> > Just being curious, I've taken g_utf8_collate from the glib for a
> > test. It doesn't work badly.
>
> Are you sure this is implemented in Glib, not in the underlying libc
> (glibc in your case, I presume)?
Answering myself here: by reading the source of g_utf8_collate, it is
clear that the implementation is elsewhere. In particular, in any
environment that defines __STDC_ISO_10646__ (as does glibc),
g_utf8_collate simply calls wcscoll, after converting the UTF-8
strings to wide-character strings.
So I think a better alternative would be to base the implementation of
this feature on the system libraries directly. I think most modern
platforms have the necessary facilities.
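The "convert, then call wcscoll" step described here can be sketched with the standard C library alone. This is a minimal sketch under stated assumptions, not glib's code: the name collate_via_wcscoll is made up, the conversion uses mbstowcs (so the input must be valid in the current locale's encoding), and the normalization glib applies before comparing is left out.

```c
#include <stdlib.h>
#include <wchar.h>

/* Compare two multibyte strings (UTF-8 in a UTF-8 locale) by
   converting them to wide characters and calling wcscoll.
   Stores the comparison result in *RESULT and returns 0, or returns
   -1 if a string is not valid in the current locale's encoding or
   memory runs out.  Hypothetical helper, for illustration only.  */
int
collate_via_wcscoll (const char *s1, const char *s2, int *result)
{
  /* With a null destination, mbstowcs only counts wide characters.  */
  size_t n1 = mbstowcs (NULL, s1, 0);
  size_t n2 = mbstowcs (NULL, s2, 0);
  if (n1 == (size_t) -1 || n2 == (size_t) -1)
    return -1;

  wchar_t *w1 = malloc ((n1 + 1) * sizeof *w1);
  wchar_t *w2 = malloc ((n2 + 1) * sizeof *w2);
  int status = -1;
  if (w1 != NULL && w2 != NULL)
    {
      mbstowcs (w1, s1, n1 + 1);
      mbstowcs (w2, s2, n2 + 1);
      *result = wcscoll (w1, w2);   /* uses the current LC_COLLATE */
      status = 0;
    }
  free (w1);
  free (w2);
  return status;
}
```

All collation knowledge stays inside wcscoll, so the quality of the result is that of the system's locale data.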
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-17 17:55 ` Eli Zaretskii
@ 2014-08-17 18:46 ` Michael Albinus
2014-08-17 18:52 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-17 18:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> Answering myself here: by reading the source of g_utf8_collate, it is
> clear that the implementation is elsewhere. In particular, in any
> environment that defines __STDC_ISO_10646__ (as does glibc),
> g_utf8_collate simply calls wcscoll, after converting the UTF-8
> strings to wide-character strings.
>
> So I think a better alternative would be to base the implementation of
> this feature on the system libraries directly. I think most modern
> platforms have the necessary facilities.
But OTOH, g_utf8_collate handles also other cases, like the #ifdef
HAVE_CARBON case.
So what, maybe it is sufficient to take over the implementation from
glib, indeed. There's not too much logic added there, and we would avoid
the glib dependency.
What I would really like to test are non-Latin code points. I'm a noob
for such characters (glad to speak German, English and a little bit of
Français); do you or somebody else have some test cases for
`gstring-lessp' or `gstring-equalp' which should return different
results than `string-lessp' and `string-equal'?
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-17 18:46 ` Michael Albinus
@ 2014-08-17 18:52 ` Eli Zaretskii
2014-08-21 9:05 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-17 18:52 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Sun, 17 Aug 2014 20:46:11 +0200
>
> So what, maybe it is sufficient to take over the implementation from
> glib, indeed. There's not too much logic added there, and we would avoid
> the glib dependency.
That's what I had in mind, yes.
> do you or somebody else have some test cases for `gstring-lessp' or
> `gstring-equalp', which shall return different results than
> `string-lessp' and `string-equal'?
I don't have such test cases, but I'd start by looking in the glib
test suite.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-17 18:52 ` Eli Zaretskii
@ 2014-08-21 9:05 ` Michael Albinus
2014-08-21 14:41 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-21 9:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
[-- Attachment #1: Type: text/plain, Size: 675 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
>> So what, maybe it is sufficient to take over the implementation from
>> glib, indeed. There's not too much logic added there, and we would avoid
>> the glib dependency.
>
> That's what I had in mind, yes.
Finally, I came out with the appended patch. Comments appreciated.
>> do you or somebody else have some test cases for `gstring-lessp' or
>> `gstring-equalp', which shall return different results than
>> `string-lessp' and `string-equal'?
>
> I don't have such test cases, but I'd start by looking in the glib
> test suite.
There aren't so many glib test cases for collation. I need to search further.
Best regards, Michael.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: collate-patch --]
[-- Type: text/x-patch, Size: 5295 bytes --]
*** /usr/local/src/emacs/src/fns.c.~117719~ 2014-08-21 10:58:42.195613026 +0200
--- /usr/local/src/emacs/src/fns.c 2014-08-21 10:27:30.986334200 +0200
***************
*** 39,46 ****
#if defined (HAVE_X_WINDOWS)
#include "xterm.h"
#endif
! Lisp_Object Qstring_lessp;
static Lisp_Object Qprovide, Qrequire;
static Lisp_Object Qyes_or_no_p_history;
Lisp_Object Qcursor_in_echo_area;
--- 39,52 ----
#if defined (HAVE_X_WINDOWS)
#include "xterm.h"
#endif
+ #ifdef HAVE_SETLOCALE
+ #include <locale.h>
+ #endif /* HAVE_SETLOCALE */
+ #ifdef __STDC_ISO_10646__
+ #include <wchar.h>
+ #endif /* __STDC_ISO_10646__ */
! Lisp_Object Qstring_lessp, Qstring_collate_lessp, Qstring_collate_equalp;
static Lisp_Object Qprovide, Qrequire;
static Lisp_Object Qyes_or_no_p_history;
Lisp_Object Qcursor_in_echo_area;
***************
*** 343,348 ****
--- 349,467 ----
}
return i1 < SCHARS (s2) ? Qt : Qnil;
}
+
+ #ifdef __STDC_ISO_10646__
+ ptrdiff_t
+ str_collate (Lisp_Object s1, Lisp_Object s2)
+ {
+ register ptrdiff_t res, len, i, i_byte;
+ wchar_t *p1, *p2;
+ Lisp_Object lc_collate;
+ char *old_collate, *saved_collate;
+
+ USE_SAFE_ALLOCA;
+
+ /* Check parameters. */
+ if (SYMBOLP (s1))
+ s1 = SYMBOL_NAME (s1);
+ if (SYMBOLP (s2))
+ s2 = SYMBOL_NAME (s2);
+ CHECK_STRING (s1);
+ CHECK_STRING (s2);
+
+ /* Convert byte stream to code pointers. */
+ len = SCHARS (s1); i = i_byte = 0;
+ p1 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p1));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p1+i-1), s1, i, i_byte);
+ *(p1+len) = 0;
+
+ len = SCHARS (s2); i = i_byte = 0;
+ p2 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p2));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p2+i-1), s2, i, i_byte);
+ *(p2+len) = 0;
+
+ #ifdef HAVE_SETLOCALE
+ /* Set locale. */
+ lc_collate =
+ Fgetenv_internal (build_string ("LC_COLLATE"), Vprocess_environment);
+ if (STRINGP (lc_collate))
+ {
+ old_collate = setlocale (LC_COLLATE, NULL);
+ saved_collate = xstrdup (old_collate);
+ setlocale (LC_COLLATE, SSDATA (lc_collate));
+ }
+ #endif /* HAVE_SETLOCALE */
+
+ res = wcscoll (p1, p2);
+
+ #ifdef HAVE_SETLOCALE
+ /* Restore the original locale. */
+ if (STRINGP (lc_collate))
+ setlocale (LC_COLLATE, saved_collate);
+ #endif /* HAVE_SETLOCALE */
+
+ /* Return result. */
+ SAFE_FREE ();
+ return res;
+ }
+ #endif /* __STDC_ISO_10646__ */
+
+ DEFUN ("string-collate-lessp", Fstring_collate_lessp, Sstring_collate_lessp, 2, 2, 0,
+ doc: /* Return t if first arg string is less than second in collation order.
+
+ Case is significant. Symbols are also allowed; their print names are
+ used instead.
+
+ This function obeys the conventions for collation order in your
+ locale settings. For example, punctuation and whitespace characters
+ are considered less significant for sorting.
+
+ \(sort '\("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
+ => \("11" "1 1" "1.1" "12" "1 2" "1.2")
+
+ If your system does not support a locale environment, this function
+ behaves like `string-lessp'.
+
+ If the environment variable \"LC_COLLATE\" is set in `process-environment',
+ it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+ {
+ #ifdef __STDC_ISO_10646__
+ return (str_collate (s1, s2) < 0) ? Qt : Qnil;
+ #else
+ return Fstring_lessp (s1, s2);
+ #endif /* __STDC_ISO_10646__ */
+ }
+
+ DEFUN ("string-collate-equalp", Fstring_collate_equalp, Sstring_collate_equalp, 2, 2, 0,
+ doc: /* Return t if two strings have identical contents.
+
+ Case is significant. Symbols are also allowed; their print names are
+ used instead.
+
+ This function obeys the conventions for collation order in your locale
+ settings. For example, characters with different coding points but
+ the same meaning are considered as equal, like different grave accent
+ unicode characters.
+
+ \(string-collate-equalp \(string ?\\uFF40) \(string ?\\u1FEF))
+ => t
+
+ If your system does not support a locale environment, this function
+ behaves like `string-equal'.
+
+ If the environment variable \"LC_COLLATE\" is set in `process-environment',
+ it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+ {
+ #ifdef __STDC_ISO_10646__
+ return (str_collate (s1, s2) == 0) ? Qt : Qnil;
+ #else
+ return Fstring_equal (s1, s2);
+ #endif /* __STDC_ISO_10646__ */
+ }
\f
static Lisp_Object concat (ptrdiff_t nargs, Lisp_Object *args,
enum Lisp_Type target_type, bool last_special);
***************
*** 4919,4924 ****
--- 5038,5045 ----
defsubr (&Sdefine_hash_table_test);
DEFSYM (Qstring_lessp, "string-lessp");
+ DEFSYM (Qstring_collate_lessp, "string-collate-lessp");
+ DEFSYM (Qstring_collate_equalp, "string-collate-equalp");
DEFSYM (Qprovide, "provide");
DEFSYM (Qrequire, "require");
DEFSYM (Qyes_or_no_p_history, "yes-or-no-p-history");
***************
*** 4972,4977 ****
--- 5093,5100 ----
defsubr (&Sstring_equal);
defsubr (&Scompare_strings);
defsubr (&Sstring_lessp);
+ defsubr (&Sstring_collate_lessp);
+ defsubr (&Sstring_collate_equalp);
defsubr (&Sappend);
defsubr (&Sconcat);
defsubr (&Svconcat);
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-21 9:05 ` Michael Albinus
@ 2014-08-21 14:41 ` Eli Zaretskii
2014-08-22 14:23 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-21 14:41 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Thu, 21 Aug 2014 11:05:43 +0200
>
> >> So what, maybe it is sufficient to take over the implementation from
> >> glib, indeed. There's not too much logic added there, and we would avoid
> >> the glib dependency.
> >
> > That's what I had in mind, yes.
>
> Finally, I came out with the appended patch. Comments appreciated.
Thanks.
I have 2 comments:
. I suggest to factor out the part that converts to wchar_t, sets up
the locale, and calls strcoll. The code you wrote makes certain
assumptions about 'setlocale', and also about the wchar_t
representation. Factoring those system-dependent parts out will
minimize the number of #ifdef's needed to provide such features for
other platforms.
. I think glibc has a 'newlocale' API that is better suited to this
kind of jobs. In particular, 'setlocale' changes the locale of the
entire program, which is bad news for other threads that might be
using some locale-aware functions while the main thread calls
string-collate-lessp. (We have more than 1 thread in Emacs built
with GTK, for example, and who knows what those threads might be
doing?)
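
The per-thread approach described above can be sketched as follows. This is only an illustration under stated assumptions: it needs the POSIX.1-2008 functions newlocale, uselocale and freelocale, and collate_in_locale is a hypothetical helper name, not code from the patch. If the requested locale cannot be instantiated, the sketch silently compares in the thread's current locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper (not Emacs code): compare A and B under the
   collation rules of LOCALE_NAME, changing only the calling thread's
   locale.  If LOCALE_NAME cannot be instantiated, the comparison is
   done in the thread's current locale.  */
int
collate_in_locale (const wchar_t *a, const wchar_t *b,
                   const char *locale_name)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  locale_t oldloc = (locale_t) 0;
  int res;

  if (loc != (locale_t) 0)
    oldloc = uselocale (loc);   /* Affects the calling thread only.  */

  res = wcscoll (a, b);

  if (loc != (locale_t) 0)
    {
      uselocale (oldloc);       /* Switch back first ...  */
      freelocale (loc);         /* ... then release the locale object.  */
    }
  return res;
}
```

Other threads keep whatever locale they were using, which is the point of preferring uselocale over setlocale here.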
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-21 14:41 ` Eli Zaretskii
@ 2014-08-22 14:23 ` Michael Albinus
2014-08-23 9:05 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-22 14:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
Hi Eli,
> . I suggest to factor out the part that converts to wchar_t, sets up
> the locale, and calls strcoll. The code you wrote makes certain
> assumptions about 'setlocale', and also about the wchar_t
> representation. Factoring those system-dependent parts out will
> minimize the number of #ifdef's needed to provide such features for
> other platforms.
I see. But I don't know how to factor out. Shall I move str_collate to
another file? Or to a new file? Something else?
> . I think glibc has a 'newlocale' API that is better suited to this
> kind of jobs. In particular, 'setlocale' changes the locale of the
> entire program, which is bad news for other threads that might be
> using some locale-aware functions while the main thread calls
> string-collate-lessp. (We have more than 1 thread in Emacs built
> with GTK, for example, and who knows what those threads might be
> doing?)
OK, done.
I have added also configure checks HAVE_NEWLOCALE, HAVE_USELOCALE and
HAVE_FREELOCALE for the respective glibc functions. I don't know whether
it is overengineering, and whether I could simply apply the existing
HAVE_SETLOCALE check. I believe all these functions do exist in parallel
in locale.h, don't they?
Best regards, Michael.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-22 14:23 ` Michael Albinus
@ 2014-08-23 9:05 ` Eli Zaretskii
2014-08-23 16:42 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-23 9:05 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Fri, 22 Aug 2014 16:23:34 +0200
>
> > . I suggest to factor out the part that converts to wchar_t, sets up
> > the locale, and calls strcoll. The code you wrote makes certain
> > assumptions about 'setlocale', and also about the wchar_t
> > representation. Factoring those system-dependent parts out will
> > minimize the number of #ifdef's needed to provide such features for
> > other platforms.
>
> I see. But I don't know how to factor out. Shall I move str_collate to
> another file? Or to a new file? Something else?
I think everything in str_collate starting with the "Convert byte
stream to code pointers." comment (btw, I guess you meant "code
points" here) should be in a separate function, and the best place for
that function is sysdep.c. At least on MS-Windows, both the part that
converts a Lisp string into wchar_t array, and the part that performs
a locale-sensitive string comparison, will be implemented differently.
> > . I think glibc has a 'newlocale' API that is better suited to this
> > kind of jobs. In particular, 'setlocale' changes the locale of the
> > entire program, which is bad news for other threads that might be
> > using some locale-aware functions while the main thread calls
> > string-collate-lessp. (We have more than 1 thread in Emacs built
> > with GTK, for example, and who knows what those threads might be
> > doing?)
>
> OK, done.
Thanks. (You didn't attach the new patch.)
Btw, I wonder whether we should have a way to pass the locale string
explicitly, instead of relying on $LC_COLLATE.
> I have added also configure checks HAVE_NEWLOCALE, HAVE_USELOCALE and
> HAVE_FREELOCALE for the respective glibc functions. I don't know whether
> it is overengineering, and whether I could simply apply the existing
> HAVE_SETLOCALE check. I believe all these functions do exist in parallel
> in locale.h, don't they?
I'll defer to glibc experts on that. My knowledge of 'newlocale'
facilities is limited to what I saw in Guile's i18n.c module.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-23 9:05 ` Eli Zaretskii
@ 2014-08-23 16:42 ` Michael Albinus
2014-08-23 17:33 ` Eli Zaretskii
0 siblings, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-23 16:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
[-- Attachment #1: Type: text/plain, Size: 1858 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
> I think everything in str_collate starting with the "Convert byte
> stream to code pointers." comment (btw, I guess you meant "code
> points" here) should be in a separate function, and the best place for
> that function is sysdep.c. At least on MS-Windows, both the part that
> converts a Lisp string into wchar_t array, and the part that performs
> a locale-sensitive string comparison, will be implemented differently.
Well, I've moved (most of) str_collate to sysdep.c.
> Thanks. (You didn't attach the new patch.)
Oops. Appended this time.
> Btw, I wonder whether we should have a way to pass the locale string
> explicitly, instead of relying on $LC_COLLATE.
We could add an optional argument to string-collate-*. But this would
break signature equivalence with string-lessp and string-equal,
respectively.
Or we could introduce a global var, which shall be let-bound to the
locale string.
>> I have added also configure checks HAVE_NEWLOCALE, HAVE_USELOCALE and
>> HAVE_FREELOCALE for the respective glibc functions. I don't know whether
>> it is overengineering, and whether I could simply apply the existing
>> HAVE_SETLOCALE check. I believe all these functions do exist in parallel
>> in locale.h, don't they?
>
> I'll defer to glibc experts on that. My knowledge of 'newlocale'
> facilities is limited to what I saw in Guile's i18n.c module.
According to the manpages, setlocale is conforming to "C89, C99,
POSIX.1-2001". {new,use,free}locale are conforming to "POSIX.1-2008".
So we must check for HAVE_USELOCALE, indeed. Checks for HAVE_NEWLOCALE
and HAVE_FREELOCALE are not necessary, the functions exist in parallel
to uselocale (introduced in glibc 2.3).
This raises the question, whether we shall use also my first setlocale
approach in case of uselocale absence?
Best regards, Michael.
[-- Attachment #2: collate-patch --]
[-- Type: text/x-patch, Size: 5348 bytes --]
=== modified file 'src/fns.c'
--- src/fns.c 2014-08-02 15:56:18 +0000
+++ src/fns.c 2014-08-23 15:57:06 +0000
@@ -40,7 +40,7 @@
#include "xterm.h"
#endif
-Lisp_Object Qstring_lessp;
+Lisp_Object Qstring_lessp, Qstring_collate_lessp, Qstring_collate_equalp;
static Lisp_Object Qprovide, Qrequire;
static Lisp_Object Qyes_or_no_p_history;
Lisp_Object Qcursor_in_echo_area;
@@ -343,6 +343,84 @@
}
return i1 < SCHARS (s2) ? Qt : Qnil;
}
+
+#ifdef __STDC_ISO_10646__
+/* Defined in sysdep.c. */
+extern ptrdiff_t str_collate (Lisp_Object, Lisp_Object);
+#endif /* __STDC_ISO_10646__ */
+
+DEFUN ("string-collate-lessp", Fstring_collate_lessp, Sstring_collate_lessp, 2, 2, 0,
+ doc: /* Return t if first arg string is less than second in collation order.
+
+Case is significant. Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your
+locale settings. For example, punctuation and whitespace characters
+are considered less significant for sorting.
+
+\(sort '\("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
+ => \("11" "1 1" "1.1" "12" "1 2" "1.2")
+
+If your system does not support a locale environment, this function
+behaves like `string-lessp'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+ /* Check parameters. */
+ if (SYMBOLP (s1))
+ s1 = SYMBOL_NAME (s1);
+ if (SYMBOLP (s2))
+ s2 = SYMBOL_NAME (s2);
+ CHECK_STRING (s1);
+ CHECK_STRING (s2);
+
+ return (str_collate (s1, s2) < 0) ? Qt : Qnil;
+
+#else
+ return Fstring_lessp (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}
+
+DEFUN ("string-collate-equalp", Fstring_collate_equalp, Sstring_collate_equalp, 2, 2, 0,
+ doc: /* Return t if two strings have identical contents.
+
+Case is significant. Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your locale
+settings. For example, characters with different code points but
+the same meaning are considered as equal, like different grave accent
+Unicode characters.
+
+\(string-collate-equalp \(string ?\\uFF40) \(string ?\\u1FEF))
+ => t
+
+If your system does not support a locale environment, this function
+behaves like `string-equal'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+ /* Check parameters. */
+ if (SYMBOLP (s1))
+ s1 = SYMBOL_NAME (s1);
+ if (SYMBOLP (s2))
+ s2 = SYMBOL_NAME (s2);
+ CHECK_STRING (s1);
+ CHECK_STRING (s2);
+
+ return (str_collate (s1, s2) == 0) ? Qt : Qnil;
+
+#else
+ return Fstring_equal (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}
\f
static Lisp_Object concat (ptrdiff_t nargs, Lisp_Object *args,
enum Lisp_Type target_type, bool last_special);
@@ -4919,6 +4997,8 @@
defsubr (&Sdefine_hash_table_test);
DEFSYM (Qstring_lessp, "string-lessp");
+ DEFSYM (Qstring_collate_lessp, "string-collate-lessp");
+ DEFSYM (Qstring_collate_equalp, "string-collate-equalp");
DEFSYM (Qprovide, "provide");
DEFSYM (Qrequire, "require");
DEFSYM (Qyes_or_no_p_history, "yes-or-no-p-history");
@@ -4972,6 +5052,8 @@
defsubr (&Sstring_equal);
defsubr (&Scompare_strings);
defsubr (&Sstring_lessp);
+ defsubr (&Sstring_collate_lessp);
+ defsubr (&Sstring_collate_equalp);
defsubr (&Sappend);
defsubr (&Sconcat);
defsubr (&Svconcat);
=== modified file 'src/sysdep.c'
--- src/sysdep.c 2014-07-14 19:23:18 +0000
+++ src/sysdep.c 2014-08-23 16:36:39 +0000
@@ -3513,3 +3513,63 @@
}
#endif /* !defined (WINDOWSNT) */
+\f
+/* Wide character string collation. */
+
+#ifdef __STDC_ISO_10646__
+#include <wchar.h>
+
+#ifdef HAVE_USELOCALE
+#include <locale.h>
+#endif /* HAVE_USELOCALE */
+
+ptrdiff_t
+str_collate (Lisp_Object s1, Lisp_Object s2)
+{
+ register ptrdiff_t res, len, i, i_byte;
+ wchar_t *p1, *p2;
+#ifdef HAVE_USELOCALE
+ Lisp_Object lc_collate;
+ locale_t loc = (locale_t) 0, oldloc = (locale_t) 0;
+#endif /* HAVE_USELOCALE */
+
+ USE_SAFE_ALLOCA;
+
+ /* Convert byte stream to code points. */
+ len = SCHARS (s1); i = i_byte = 0;
+ p1 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p1));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p1+i-1), s1, i, i_byte);
+ *(p1+len) = 0;
+
+ len = SCHARS (s2); i = i_byte = 0;
+ p2 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p2));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p2+i-1), s2, i, i_byte);
+ *(p2+len) = 0;
+
+#ifdef HAVE_USELOCALE
+ /* Create a new locale object, and set it. */
+ lc_collate =
+ Fgetenv_internal (build_string ("LC_COLLATE"), Vprocess_environment);
+
+ if (STRINGP (lc_collate)
+ && (loc = newlocale (LC_COLLATE_MASK, SSDATA (lc_collate), (locale_t) 0)))
+ oldloc = uselocale (loc);
+#endif /* HAVE_USELOCALE */
+
+ res = wcscoll (p1, p2);
+
+#ifdef HAVE_USELOCALE
+ /* Reset the locale first, then free the locale object. */
+ if (oldloc)
+ uselocale (oldloc);
+ if (loc)
+ freelocale (loc);
+#endif /* HAVE_USELOCALE */
+
+ /* Return result. */
+ SAFE_FREE ();
+ return res;
+}
+#endif /* __STDC_ISO_10646__ */
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-23 16:42 ` Michael Albinus
@ 2014-08-23 17:33 ` Eli Zaretskii
2014-08-23 20:32 ` Michael Albinus
0 siblings, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-23 17:33 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Sat, 23 Aug 2014 18:42:44 +0200
>
> > Btw, I wonder whether we should have a way to pass the locale string
> > explicitly, instead of relying on $LC_COLLATE.
>
> We could add an optional argument to string-collate-*. But this would
> break signature equivalence with string-lessp and string-equal,
> respectively.
>
> Or we could introduce a global var, which shall be let-bound to the
> locale string.
Or have a new optional argument in string-lessp etc., or introduce a
new set of APIs which will accept a locale, and have string-lessp
etc. call them with that argument nil.
> This raises the question, whether we shall use also my first setlocale
> approach in case of uselocale absence?
I think so, yes.
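
A minimal sketch of such a setlocale fallback, for systems without uselocale. The helper name collate_with_setlocale is hypothetical, and plain malloc/free stand in for whatever allocator the real code would use. Unlike uselocale, this changes the locale of the whole process for the duration of the comparison, so it is not safe when other threads call locale-aware functions.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not Emacs code): compare A and B under
   LOCALE_NAME using the process-wide setlocale, restoring the
   previous LC_COLLATE setting afterwards.  */
int
collate_with_setlocale (const wchar_t *a, const wchar_t *b,
                        const char *locale_name)
{
  const char *current = setlocale (LC_COLLATE, NULL);
  char *saved = NULL;
  int res;

  /* The string returned by setlocale may be overwritten by a later
     call, so keep a private copy of the current setting.  */
  if (current != NULL)
    {
      size_t n = strlen (current) + 1;
      saved = malloc (n);
      if (saved != NULL)
        memcpy (saved, current, n);
    }

  /* Switch only if the old setting can be restored.  On failure
     setlocale returns NULL and leaves the locale unchanged.  */
  if (saved != NULL)
    setlocale (LC_COLLATE, locale_name);

  res = wcscoll (a, b);

  if (saved != NULL)
    {
      setlocale (LC_COLLATE, saved);
      free (saved);             /* Release the saved copy.  */
    }
  return res;
}
```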
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-23 17:33 ` Eli Zaretskii
@ 2014-08-23 20:32 ` Michael Albinus
2014-08-24 14:54 ` Eli Zaretskii
2014-08-25 16:45 ` Glenn Morris
0 siblings, 2 replies; 63+ messages in thread
From: Michael Albinus @ 2014-08-23 20:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
[-- Attachment #1: Type: text/plain, Size: 972 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
>> > Btw, I wonder whether we should have a way to pass the locale string
>> > explicitly, instead of relying on $LC_COLLATE.
>>
>> We could add an optional argument to string-collate-*. But this would
>> break signature equivalence with string-lessp and string-equal,
>> respectively.
>>
>> Or we could introduce a global var, which shall be let-bound to the
>> locale string.
>
> Or have a new optional argument in string-lessp etc., or introduce a
> new set of APIs which will accept a locale, and have string-lessp
> etc. call them with that argument nil.
An optional argument to string-lessp could be inconvenient. IMHO, the
most important use-case of string-lessp is being a PREDICATE of
sort. This does not support optional arguments.
>> This raises the question, whether we shall use also my first setlocale
>> approach in case of uselocale absence?
>
> I think so, yes.
Extended patch appended.
Best regards, Michael.
[-- Attachment #2: collate-patch --]
[-- Type: text/x-patch, Size: 5867 bytes --]
=== modified file 'src/fns.c'
--- src/fns.c 2014-08-02 15:56:18 +0000
+++ src/fns.c 2014-08-23 15:57:06 +0000
@@ -40,7 +40,7 @@
#include "xterm.h"
#endif
-Lisp_Object Qstring_lessp;
+Lisp_Object Qstring_lessp, Qstring_collate_lessp, Qstring_collate_equalp;
static Lisp_Object Qprovide, Qrequire;
static Lisp_Object Qyes_or_no_p_history;
Lisp_Object Qcursor_in_echo_area;
@@ -343,6 +343,84 @@
}
return i1 < SCHARS (s2) ? Qt : Qnil;
}
+
+#ifdef __STDC_ISO_10646__
+/* Defined in sysdep.c. */
+extern ptrdiff_t str_collate (Lisp_Object, Lisp_Object);
+#endif /* __STDC_ISO_10646__ */
+
+DEFUN ("string-collate-lessp", Fstring_collate_lessp, Sstring_collate_lessp, 2, 2, 0,
+ doc: /* Return t if first arg string is less than second in collation order.
+
+Case is significant. Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your
+locale settings. For example, punctuation and whitespace characters
+are considered less significant for sorting.
+
+\(sort '\("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
+ => \("11" "1 1" "1.1" "12" "1 2" "1.2")
+
+If your system does not support a locale environment, this function
+behaves like `string-lessp'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+ /* Check parameters. */
+ if (SYMBOLP (s1))
+ s1 = SYMBOL_NAME (s1);
+ if (SYMBOLP (s2))
+ s2 = SYMBOL_NAME (s2);
+ CHECK_STRING (s1);
+ CHECK_STRING (s2);
+
+ return (str_collate (s1, s2) < 0) ? Qt : Qnil;
+
+#else
+ return Fstring_lessp (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}
+
+DEFUN ("string-collate-equalp", Fstring_collate_equalp, Sstring_collate_equalp, 2, 2, 0,
+ doc: /* Return t if two strings have identical contents.
+
+Case is significant. Symbols are also allowed; their print names are
+used instead.
+
+This function obeys the conventions for collation order in your locale
+settings. For example, characters with different code points but
+the same meaning are considered as equal, like different grave accent
+Unicode characters.
+
+\(string-collate-equalp \(string ?\\uFF40) \(string ?\\u1FEF))
+ => t
+
+If your system does not support a locale environment, this function
+behaves like `string-equal'.
+
+If the environment variable \"LC_COLLATE\" is set in `process-environment',
+it overrides the setting of your current locale. */)
+ (Lisp_Object s1, Lisp_Object s2)
+{
+#ifdef __STDC_ISO_10646__
+ /* Check parameters. */
+ if (SYMBOLP (s1))
+ s1 = SYMBOL_NAME (s1);
+ if (SYMBOLP (s2))
+ s2 = SYMBOL_NAME (s2);
+ CHECK_STRING (s1);
+ CHECK_STRING (s2);
+
+ return (str_collate (s1, s2) == 0) ? Qt : Qnil;
+
+#else
+ return Fstring_equal (s1, s2);
+#endif /* __STDC_ISO_10646__ */
+}
\f
static Lisp_Object concat (ptrdiff_t nargs, Lisp_Object *args,
enum Lisp_Type target_type, bool last_special);
@@ -4919,6 +4997,8 @@
defsubr (&Sdefine_hash_table_test);
DEFSYM (Qstring_lessp, "string-lessp");
+ DEFSYM (Qstring_collate_lessp, "string-collate-lessp");
+ DEFSYM (Qstring_collate_equalp, "string-collate-equalp");
DEFSYM (Qprovide, "provide");
DEFSYM (Qrequire, "require");
DEFSYM (Qyes_or_no_p_history, "yes-or-no-p-history");
@@ -4972,6 +5052,8 @@
defsubr (&Sstring_equal);
defsubr (&Scompare_strings);
defsubr (&Sstring_lessp);
+ defsubr (&Sstring_collate_lessp);
+ defsubr (&Sstring_collate_equalp);
defsubr (&Sappend);
defsubr (&Sconcat);
defsubr (&Svconcat);
=== modified file 'src/sysdep.c'
--- src/sysdep.c 2014-07-14 19:23:18 +0000
+++ src/sysdep.c 2014-08-23 20:23:11 +0000
@@ -3513,3 +3513,77 @@
}
#endif /* !defined (WINDOWSNT) */
+\f
+/* Wide character string collation. */
+
+#ifdef __STDC_ISO_10646__
+#include <wchar.h>
+
+#if defined (HAVE_USELOCALE) || defined (HAVE_SETLOCALE)
+#include <locale.h>
+#endif /* HAVE_USELOCALE || HAVE_SETLOCALE */
+
+ptrdiff_t
+str_collate (Lisp_Object s1, Lisp_Object s2)
+{
+ register ptrdiff_t res, len, i, i_byte;
+ wchar_t *p1, *p2;
+ Lisp_Object lc_collate;
+#ifdef HAVE_USELOCALE
+ locale_t loc = (locale_t) 0, oldloc = (locale_t) 0;
+#elif defined (HAVE_SETLOCALE)
+ char *oldloc = NULL;
+#endif /* HAVE_USELOCALE */
+
+ USE_SAFE_ALLOCA;
+
+ /* Convert byte stream to code points. */
+ len = SCHARS (s1); i = i_byte = 0;
+ p1 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p1));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p1+i-1), s1, i, i_byte);
+ *(p1+len) = 0;
+
+ len = SCHARS (s2); i = i_byte = 0;
+ p2 = (wchar_t *) SAFE_ALLOCA ((len+1) * (sizeof *p2));
+ while (i < len)
+ FETCH_STRING_CHAR_ADVANCE (*(p2+i-1), s2, i, i_byte);
+ *(p2+len) = 0;
+
+#if defined (HAVE_USELOCALE) || defined (HAVE_SETLOCALE)
+ /* Create a new locale object, and set it. */
+ lc_collate =
+ Fgetenv_internal (build_string ("LC_COLLATE"), Vprocess_environment);
+
+#ifdef HAVE_USELOCALE
+ if (STRINGP (lc_collate)
+ && (loc = newlocale (LC_COLLATE_MASK, SSDATA (lc_collate), (locale_t) 0)))
+ oldloc = uselocale (loc);
+#elif defined (HAVE_SETLOCALE)
+ if (STRINGP (lc_collate))
+ {
+ oldloc = xstrdup (setlocale (LC_COLLATE, NULL));
+ setlocale (LC_COLLATE, SSDATA (lc_collate));
+ }
+#endif /* HAVE_USELOCALE */
+#endif /* HAVE_USELOCALE || HAVE_SETLOCALE */
+
+ res = wcscoll (p1, p2);
+
+#ifdef HAVE_USELOCALE
+ /* Reset the locale first, then free the locale object. */
+ if (oldloc)
+ uselocale (oldloc);
+ if (loc)
+ freelocale (loc);
+#elif defined (HAVE_SETLOCALE)
+ /* Restore the original locale. */
+ if (oldloc)
+ setlocale (LC_COLLATE, oldloc);
+#endif /* HAVE_USELOCALE */
+
+ /* Return result. */
+ SAFE_FREE ();
+ return res;
+}
+#endif /* __STDC_ISO_10646__ */
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-23 20:32 ` Michael Albinus
@ 2014-08-24 14:54 ` Eli Zaretskii
2014-08-24 16:18 ` Michael Albinus
2014-08-25 15:01 ` Stefan Monnier
2014-08-25 16:45 ` Glenn Morris
1 sibling, 2 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-24 14:54 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Sat, 23 Aug 2014 22:32:00 +0200
>
> >> > Btw, I wonder whether we should have a way to pass the locale string
> >> > explicitly, instead of relying on $LC_COLLATE.
> >>
> >> We could add an optional argument to string-collate-*. But this would
> >> break signature equivalence with string-lessp and string-equal,
> >> respectively.
> >>
> >> Or we could introduce a global var, which shall be let-bound to the
> >> locale string.
> >
> > Or have a new optional argument in string-lessp etc., or introduce a
> > new set of APIs which will accept a locale, and have string-lessp
> > etc. call them with that argument nil.
>
> An optional argument to string-lessp could be inconvenient. IMHO, the
> most important use-case of string-lessp is being a PREDICATE of
> sort. This does not support optional arguments.
In those cases, we should add the same optional argument to the sort
function.
> Extended patch appended.
Thanks.
I wonder what should this do if the new locale cannot be
instantiated/installed. As you wrote the code, it will silently use
the current locale, but I wonder if that's TRT.
Other than that, I think you should install this.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-24 14:54 ` Eli Zaretskii
@ 2014-08-24 16:18 ` Michael Albinus
2014-08-25 15:01 ` Stefan Monnier
1 sibling, 0 replies; 63+ messages in thread
From: Michael Albinus @ 2014-08-24 16:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
>> > Or have a new optional argument in string-lessp etc., or introduce a
>> > new set of APIs which will accept a locale, and have string-lessp
>> > etc. call them with that argument nil.
>>
>> An optional argument to string-lessp could be inconvenient. IMHO, the
>> most important use-case of string-lessp is being a PREDICATE of
>> sort. This does not support optional arguments.
>
> In those cases, we should add the same optional argument to the sort
> function.
That would be an option. But it would take decades until all
occurrences of sort have been adapted. That's why I'm in favor of a
global variable, which could be let-bound.
But I don't oppose strongly to your proposal.
> I wonder what should this do if the new locale cannot be
> instantiated/installed. As you wrote the code, it will silently use
> the current locale, but I wonder if that's TRT.
We could raise a warning. But if it is used in sort with a huge list of
strings, the user would be flooded with those warnings.
Maybe we could add a function, which would tell whether a given locale
value would have effect. The user could check then.
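
Such a check could look roughly like the sketch below. It assumes the POSIX.1-2008 newlocale/freelocale functions are available; collation_locale_available is a hypothetical name, and how strictly unknown locale names are rejected depends on the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper (not Emacs code): return true if a locale
   object for collation can be created from LOCALE_NAME.  */
bool
collation_locale_available (const char *locale_name)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);

  if (loc == (locale_t) 0)
    return false;               /* The C library rejected the name.  */

  freelocale (loc);             /* Only probing; nothing is installed.  */
  return true;
}
```

A caller could run this once before sorting a large list, instead of warning on every comparison.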
> Other than that, I think you should install this.
Done. For the next few days I'm on the road; the remaining work (extending
the Elisp manual, writing ert tests, adapting ls-lisp) must be postponed
until later this week, or the week after.
Best regards, Michael.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-24 14:54 ` Eli Zaretskii
2014-08-24 16:18 ` Michael Albinus
@ 2014-08-25 15:01 ` Stefan Monnier
2014-08-27 8:49 ` Michael Albinus
1 sibling, 1 reply; 63+ messages in thread
From: Stefan Monnier @ 2014-08-25 15:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, Michael Albinus, 18051
>> An optional argument to string-lessp could be inconvenient. IMHO, the
>> most important use-case of string-lessp is being a PREDICATE of
>> sort. This does not support optional arguments.
Of course it does:
(sort foo (lambda (x y) (string-lessp x y 'optional-arg)))
-- Stefan
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-23 20:32 ` Michael Albinus
2014-08-24 14:54 ` Eli Zaretskii
@ 2014-08-25 16:45 ` Glenn Morris
2014-08-25 17:36 ` Eli Zaretskii
1 sibling, 1 reply; 63+ messages in thread
From: Glenn Morris @ 2014-08-25 16:45 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
Michael Albinus wrote:
> An optional argument to string-lessp could be inconvenient. IMHO, the
> most important use-case of string-lessp is being a PREDICATE of
> sort. This does not support optional arguments.
So make the argument optional, but defaulting to what the locale says.
BTW, this is http://debbugs.gnu.org/2263 (and 12008).
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-25 16:45 ` Glenn Morris
@ 2014-08-25 17:36 ` Eli Zaretskii
0 siblings, 0 replies; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-25 17:36 UTC (permalink / raw)
To: Glenn Morris; +Cc: michael_heerdegen, michael.albinus, 18051
> From: Glenn Morris <rgm@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Mon, 25 Aug 2014 12:45:07 -0400
>
> BTW, this is http://debbugs.gnu.org/2263 (and 12008).
They should obviously be closed now, both of them.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-25 15:01 ` Stefan Monnier
@ 2014-08-27 8:49 ` Michael Albinus
2014-08-27 15:37 ` Eli Zaretskii
2014-08-27 15:48 ` Glenn Morris
0 siblings, 2 replies; 63+ messages in thread
From: Michael Albinus @ 2014-08-27 8:49 UTC (permalink / raw)
To: Stefan Monnier; +Cc: michael_heerdegen, 18051
Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>> An optional argument to string-lessp could be inconvenient. IMHO, the
>>> most important use-case of string-lessp is being a PREDICATE of
>>> sort. This does not support optional arguments.
>
> Of course it does:
>
> (sort foo (lambda (x y) (string-lessp x y 'optional-arg)))
Yes, but this would also expect optional-arg to be a variable which can
be set by the user. That's similar to what I have proposed, I believe.
> -- Stefan
Best regards, Michael.
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 8:49 ` Michael Albinus
@ 2014-08-27 15:37 ` Eli Zaretskii
2014-08-27 18:02 ` Michael Albinus
2014-08-27 15:48 ` Glenn Morris
1 sibling, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-27 15:37 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
> From: Michael Albinus <michael.albinus@gmx.de>
> Cc: Eli Zaretskii <eliz@gnu.org>, michael_heerdegen@web.de, 18051@debbugs.gnu.org
> Date: Wed, 27 Aug 2014 10:49:05 +0200
>
> Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>
> >>> An optional argument to string-lessp could be inconvenient. IMHO, the
> >>> most important use-case of string-lessp is being a PREDICATE of
> >>> sort. This does not support optional arguments.
> >
> > Of course it does:
> >
> > (sort foo (lambda (x y) (string-lessp x y 'optional-arg)))
>
> Yes, but this would also expect optional-arg to be a variable which can
> be set by the user. That's similar to what I have proposed, I believe.
True. So I suggest to define a new variable, say,
string-collate-options, which is a key-value list with up to 2
members: ':locale' (a string), and ':case-fold' (a flag). If the
locale's codeset is UTF-8, the collation on Windows will emulate what
glibc evidently does. Lisp programs will bind string-collate-options
to the value they need.
Then we can remove the reference to process-environment from the code
of str_collate.
WDYT?
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 8:49 ` Michael Albinus
2014-08-27 15:37 ` Eli Zaretskii
@ 2014-08-27 15:48 ` Glenn Morris
2014-08-27 16:53 ` Eli Zaretskii
2014-08-27 18:08 ` Michael Albinus
1 sibling, 2 replies; 63+ messages in thread
From: Glenn Morris @ 2014-08-27 15:48 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
Michael Albinus wrote:
>> (sort foo (lambda (x y) (string-lessp x y 'optional-arg)))
>
> Yes, but this would also expect optional-arg to be a variable which can
> be set by the user.
I'm missing something, because I don't get why you want me to write (in
authors.el):
(let ((process-environment
(cons "LC_COLLATE=en_US.UTF-8"
process-environment)))
(sort authors-author-list
(lambda (a b) (string-collate-lessp (car a) (car b)))))
rather than the obviously-better:
(sort authors-author-list
(lambda (a b) (string-collate-lessp (car a) (car b) "en_US.UTF-8")))
Normally one controls functions through their arguments, not the
environment.
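
At the C level, the same "argument, not environment" idea could be sketched as below. This is an assumption-laden illustration, not the committed code: it relies on the POSIX.1-2008 function wcscoll_l, and collate_explicit with its OK out-parameter is a hypothetical interface. It also reports, rather than hides, the case where the requested locale cannot be instantiated.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper (not Emacs code): compare A and B using the
   collation rules of LOCALE_NAME, passed explicitly.  *OK is set to 1
   if the locale was used, and to 0 if it could not be created, in
   which case the comparison falls back to the current locale.  */
int
collate_explicit (const wchar_t *a, const wchar_t *b,
                  const char *locale_name, int *ok)
{
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  int res;

  if (loc == (locale_t) 0)
    {
      *ok = 0;                  /* Let the caller decide what to do.  */
      return wcscoll (a, b);
    }

  *ok = 1;
  res = wcscoll_l (a, b, loc);  /* No thread or process state is changed.  */
  freelocale (loc);
  return res;
}
```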
^ permalink raw reply [flat|nested] 63+ messages in thread
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 15:48 ` Glenn Morris
@ 2014-08-27 16:53 ` Eli Zaretskii
2014-08-28 3:23 ` Stefan Monnier
2014-08-27 18:08 ` Michael Albinus
1 sibling, 1 reply; 63+ messages in thread
From: Eli Zaretskii @ 2014-08-27 16:53 UTC (permalink / raw)
To: Glenn Morris; +Cc: michael_heerdegen, michael.albinus, 18051
> From: Glenn Morris <rgm@gnu.org>
> Date: Wed, 27 Aug 2014 11:48:36 -0400
> Cc: michael_heerdegen@web.de, 18051@debbugs.gnu.org
>
> Normally one controls functions through their arguments, not the
> environment.
We also have the other variety, e.g. see coding-system-for-read.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 15:37 ` Eli Zaretskii
@ 2014-08-27 18:02 ` Michael Albinus
0 siblings, 0 replies; 63+ messages in thread
From: Michael Albinus @ 2014-08-27 18:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051
Eli Zaretskii <eliz@gnu.org> writes:
> True. So I suggest to define a new variable, say,
> string-collate-options, which is a key-value list with up to 2
> members: ':locale' (a string), and ':case-fold' (a flag). If the
> locale's codeset is UTF-8, the collation on Windows will emulate what
> glibc evidently does. Lisp programs will bind string-collate-options
> to the value they need.
>
> Then we can remove the reference to process-environment from the code
> of string_collate.
>
> WDYT?
Sounds OK to me.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 15:48 ` Glenn Morris
2014-08-27 16:53 ` Eli Zaretskii
@ 2014-08-27 18:08 ` Michael Albinus
2014-08-27 18:30 ` Glenn Morris
1 sibling, 1 reply; 63+ messages in thread
From: Michael Albinus @ 2014-08-27 18:08 UTC (permalink / raw)
To: Glenn Morris; +Cc: michael_heerdegen, 18051
Glenn Morris <rgm@gnu.org> writes:
> I'm missing something, because I don't get why you want me to write (in
> authors.el):
>
>   (let ((process-environment
>          (cons "LC_COLLATE=en_US.UTF-8"
>                process-environment)))
>     (sort authors-author-list
>           (lambda (a b) (string-collate-lessp (car a) (car b)))))
>
> rather than the obviously-better:
>
>   (sort authors-author-list
>         (lambda (a b) (string-collate-lessp (car a) (car b) "en_US.UTF-8")))
>
> Normally one controls functions through their arguments, not the
> environment.
authors.el is a special case:
- Your sort predicate is not an existing function, but a
lambda. Usually, I would expect something like
(sort any-list 'string-collate-lessp)
- You use a hard-coded value for the locale. The intention is to make it
configurable for the user.
If, for example, a user wants a collation order other than the one
given by "en_US.UTF-8", you end up offering a variable which can be
set. I don't know whether this is desirable in authors.el, though.
Best regards, Michael.
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 18:08 ` Michael Albinus
@ 2014-08-27 18:30 ` Glenn Morris
0 siblings, 0 replies; 63+ messages in thread
From: Glenn Morris @ 2014-08-27 18:30 UTC (permalink / raw)
To: Michael Albinus; +Cc: michael_heerdegen, 18051
You don't know how this feature will be used, because you just added it.
Sometimes people will want to use it to "sort in the user's specified
locale", sometimes they will want to use it to "sort according to some
specific locale". An optional argument defaulting to LC_COLLATE makes
both uses easy. (Adding a global lisp variable that overrides LC_COLLATE
seems pointless but harmless to me.)
Anyway, at this point I give up trying to get an optional argument added
to a function.
* bug#18051: trunk r117751: Improve robustness of new string-collation code.
2014-07-18 6:22 ` bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function? Michael Heerdegen
2014-07-18 6:53 ` Eli Zaretskii
@ 2014-08-27 23:57 ` Katsumi Yamaoka
2014-08-28 0:51 ` Paul Eggert
2014-08-28 3:09 ` Katsumi Yamaoka
2 siblings, 1 reply; 63+ messages in thread
From: Katsumi Yamaoka @ 2014-08-27 23:57 UTC (permalink / raw)
To: Paul Eggert; +Cc: 18051
On Wed, 27 Aug 2014 18:56:52 +0000, Paul Eggert wrote:
> ------------------------------------------------------------
> revno: 117751
[...]
> message:
> Improve robustness of new string-collation code.
> * configure.ac (newlocale): Check for this, not for uselocale.
> * src/sysdep.c (LC_COLLATE, LC_COLLATE_MASK, freelocale, locale_t)
> (newlocale, wcscoll_l): Define substitutes for platforms that
> lack them, so as to simplify the mainline code.
> (str_collate): Simplify the code by assuming the above definitions.
> Use wcscoll_l, not uselocale, as uselocale is too fragile. For
> example, the old version left the Emacs in the wrong locale if
> wcscoll reported an error. Use 'int', not ptrdiff_t, for the int
> result. Report an error if newlocale fails.
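
The approach described in the commit message above (compare through an explicit locale object with wcscoll_l rather than switching the thread's locale with uselocale) can be sketched roughly as follows. This is only an illustration, not the code installed in sysdep.c: the helper name collate_in_locale and its return convention are hypothetical, and a POSIX.1-2008 system that declares newlocale and wcscoll_l in <locale.h> and <wchar.h> (for example glibc) is assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for newlocale, wcscoll_l, locale_t */
#endif
#include <errno.h>
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: compare A and B using the collation rules of
   LOCALE_NAME.  On success store the comparison result (<0, 0, >0)
   in *RESULT and return 0; return -1 if the locale cannot be created
   or the comparison reports an error.  */
int
collate_in_locale (const wchar_t *a, const wchar_t *b,
                   const char *locale_name, int *result)
{
  /* Build a locale object that only overrides LC_COLLATE.  */
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (!loc)
    return -1;

  /* wcscoll_l has no error return value, so errno is checked instead.  */
  errno = 0;
  int res = wcscoll_l (a, b, loc);
  int err = errno;

  /* The locale object is released on every path, and the calling
     thread's own locale is never changed.  */
  freelocale (loc);

  if (err)
    return -1;
  *result = res;
  return 0;
}
```

Because no global or per-thread locale is switched, an error from the comparison cannot leave the process in the wrong locale, which is the fragility the commit message attributes to the uselocale variant. A caller would pass the value of LC_COLLATE (or an explicit name such as "en_US.UTF-8") as locale_name.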
Emacs build gets stopped on Cygwin probably due to this change.
gcc [...options below...] sysdep.c
sysdep.c: In function 'str_collate':
sysdep.c:3706:33: error: 'LC_COLLATE_MASK' undeclared (first use in this function)
locale_t loc = newlocale (LC_COLLATE_MASK, SSDATA (lc_collate), 0);
^
sysdep.c:3706:33: note: each undeclared identifier is reported only once for each function it appears in
Makefile:336: recipe for target 'sysdep.o' failed
make[1]: *** [sysdep.o] Error 1
Thanks.
gcc options:
-std=gnu99 -c -Demacs -I. -I. -I../lib -I../lib -D_REENTRANT
-I/usr/include/gtk-3.0 -I/usr/include/at-spi2-atk/2.0
-I/usr/include/gtk-3.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/cairo
-I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/pango-1.0
-I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pixman-1
-I/usr/include/freetype2 -I/usr/include/libpng15 -I/usr/include/freetype2
-I/usr/include/libpng15 -I/usr/include/gdk-pixbuf-2.0
-I/usr/include/libpng15 -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -I/usr/include/freetype2
-I/usr/include/libpng15 -I/usr/include/freetype2 -I/usr/include/libpng15
-D_REENTRANT -I/usr/include/librsvg-2.0 -I/usr/include/gdk-pixbuf-2.0
-I/usr/include/libpng15 -I/usr/include/cairo -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -I/usr/include/pixman-1
-I/usr/include/freetype2 -I/usr/include/libpng15 -I/usr/include/freetype2
-I/usr/include/libpng15 -fopenmp -I/usr/include/ImageMagick
-I/usr/include/libpng15 -I/usr/include/libxml2 -I/usr/include/dbus-1.0
-I/usr/lib/dbus-1.0/include -D_REENTRANT -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -D_REENTRANT -I/usr/include/gconf/2
-I/usr/include/dbus-1.0 -I/usr/lib/dbus-1.0/include
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include
-I/usr/include/freetype2 -I/usr/include/libpng15 -I/usr/include/freetype2
-I/usr/include/libpng15 -I/usr/include/freetype2 -I/usr/include/libpng15
-MMD -MF deps/sysdep.d -MP -I/usr/include/p11-kit-1 -D_REENTRANT
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -g3 -O2
* bug#18051: trunk r117751: Improve robustness of new string-collation code.
2014-08-27 23:57 ` bug#18051: trunk r117751: Improve robustness of new string-collation code Katsumi Yamaoka
@ 2014-08-28 0:51 ` Paul Eggert
0 siblings, 0 replies; 63+ messages in thread
From: Paul Eggert @ 2014-08-28 0:51 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: 18051
Katsumi Yamaoka wrote:
> sysdep.c:3706:33: error: 'LC_COLLATE_MASK' undeclared
Thanks, I installed a fix in trunk bzr 117753.
* bug#18051: trunk r117751: Improve robustness of new string-collation code.
2014-07-18 6:22 ` bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function? Michael Heerdegen
2014-07-18 6:53 ` Eli Zaretskii
2014-08-27 23:57 ` bug#18051: trunk r117751: Improve robustness of new string-collation code Katsumi Yamaoka
@ 2014-08-28 3:09 ` Katsumi Yamaoka
2 siblings, 0 replies; 63+ messages in thread
From: Katsumi Yamaoka @ 2014-08-28 3:09 UTC (permalink / raw)
To: Paul Eggert; +Cc: 18051
On Wed, 27 Aug 2014 17:51:34 -0700, Paul Eggert wrote:
> Katsumi Yamaoka wrote:
>> sysdep.c:3706:33: error: 'LC_COLLATE_MASK' undeclared
> Thanks, I installed a fix in trunk bzr 117753.
Now the new build is running. Thank you!
* bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
2014-08-27 16:53 ` Eli Zaretskii
@ 2014-08-28 3:23 ` Stefan Monnier
0 siblings, 0 replies; 63+ messages in thread
From: Stefan Monnier @ 2014-08-28 3:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: michael_heerdegen, 18051, michael.albinus
>> Normally one controls functions through their arguments, not the
>> environment.
> We also have the other variety, e.g. see coding-system-for-read.
Yes, we have a lot of that, but that's only used when adding an optional
argument was not really practical, in which case dynamic scoping comes
to the rescue (but with several caveats).
In the present case I don't see why we can't use an optional argument,
so an optional arg would be preferable.
Stefan
end of thread, other threads: [~2014-08-28 3:23 UTC]
Thread overview: 63+ messages
[not found] <E1XMiOq-0000si-VD@vcs.savannah.gnu.org>
2014-07-18 6:22 ` bug#18051: 24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function? Michael Heerdegen
2014-07-18 6:53 ` Eli Zaretskii
2014-07-18 7:33 ` Michael Heerdegen
2014-07-18 8:53 ` Eli Zaretskii
2014-07-18 9:37 ` Michael Heerdegen
2014-07-18 9:46 ` Eli Zaretskii
2014-07-18 10:18 ` Michael Heerdegen
2014-07-18 13:03 ` Eli Zaretskii
2014-07-19 1:25 ` Michael Heerdegen
2014-07-19 8:17 ` Eli Zaretskii
2014-07-19 10:52 ` Michael Heerdegen
2014-07-19 10:56 ` Eli Zaretskii
2014-07-18 9:24 ` Michael Albinus
2014-07-18 9:33 ` Eli Zaretskii
2014-07-18 10:12 ` Michael Albinus
2014-07-18 12:57 ` Eli Zaretskii
2014-07-18 13:18 ` Michael Albinus
2014-07-18 13:44 ` Eli Zaretskii
2014-07-18 16:21 ` Michael Albinus
2014-07-20 5:49 ` Michael Heerdegen
2014-07-20 6:07 ` Eli Zaretskii
2014-07-20 6:21 ` Michael Heerdegen
2014-07-20 6:33 ` Eli Zaretskii
2014-07-20 7:30 ` Michael Heerdegen
2014-07-20 8:14 ` Eli Zaretskii
2014-07-20 8:24 ` Michael Heerdegen
2014-07-20 8:38 ` Eli Zaretskii
2014-07-20 9:15 ` Michael Heerdegen
2014-07-20 9:18 ` Eli Zaretskii
2014-07-20 11:44 ` Michael Albinus
2014-07-20 11:59 ` Eli Zaretskii
2014-07-20 15:26 ` Michael Albinus
2014-07-20 16:16 ` Eli Zaretskii
2014-08-16 21:52 ` Michael Albinus
2014-08-17 16:38 ` Eli Zaretskii
2014-08-17 17:55 ` Eli Zaretskii
2014-08-17 18:46 ` Michael Albinus
2014-08-17 18:52 ` Eli Zaretskii
2014-08-21 9:05 ` Michael Albinus
2014-08-21 14:41 ` Eli Zaretskii
2014-08-22 14:23 ` Michael Albinus
2014-08-23 9:05 ` Eli Zaretskii
2014-08-23 16:42 ` Michael Albinus
2014-08-23 17:33 ` Eli Zaretskii
2014-08-23 20:32 ` Michael Albinus
2014-08-24 14:54 ` Eli Zaretskii
2014-08-24 16:18 ` Michael Albinus
2014-08-25 15:01 ` Stefan Monnier
2014-08-27 8:49 ` Michael Albinus
2014-08-27 15:37 ` Eli Zaretskii
2014-08-27 18:02 ` Michael Albinus
2014-08-27 15:48 ` Glenn Morris
2014-08-27 16:53 ` Eli Zaretskii
2014-08-28 3:23 ` Stefan Monnier
2014-08-27 18:08 ` Michael Albinus
2014-08-27 18:30 ` Glenn Morris
2014-08-25 16:45 ` Glenn Morris
2014-08-25 17:36 ` Eli Zaretskii
2014-07-20 6:18 ` Michael Heerdegen
2014-07-20 14:22 ` Stefan Monnier
2014-08-27 23:57 ` bug#18051: trunk r117751: Improve robustness of new string-collation code Katsumi Yamaoka
2014-08-28 0:51 ` Paul Eggert
2014-08-28 3:09 ` Katsumi Yamaoka
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git