unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
@ 2022-06-03 23:21 TAKAHASHI Yoshio
  2022-06-04  7:44 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: TAKAHASHI Yoshio @ 2022-06-03 23:21 UTC (permalink / raw)
  To: 55787

Hi,

I encounter an inconsistent sort result.  The position of "01.0" and/or
"01.2" seems wrong.


$ cat /tmp/test.el
(require 'ls-lisp)
(print (sort (vector "01.0" "10" "010" "01.2")
             (lambda (x y)
               (ls-lisp-version-lessp x y))))
$ emacs -Q --batch -l /tmp/test.el

["01.0" "10" "010" "01.2"]
$

In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.16.0)
 of 2022-05-26 built on LAPTOP-89LTAUNV
Repository revision: 531688a19e2125b20c2efa032e02b9cebbedb397
Repository branch: master
Windowing system distributor 'Microsoft Corporation', version 11.0.12010000
System Description: Ubuntu 22.04 LTS







* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-03 23:21 bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp TAKAHASHI Yoshio
@ 2022-06-04  7:44 ` Eli Zaretskii
  2022-06-04 14:11   ` TAKAHASHI Yoshio
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2022-06-04  7:44 UTC (permalink / raw)
  To: TAKAHASHI Yoshio; +Cc: 55787

> From: TAKAHASHI Yoshio <yfb02119@nifty.com>
> Date: Sat, 04 Jun 2022 08:21:48 +0900
> 
> I encounter an inconsistent sort result.  The position of "01.0" and/or
> "01.2" seems wrong.
> 
> 
> $ cat /tmp/test.el
> (require 'ls-lisp)
> (print (sort (vector "01.0" "10" "010" "01.2")
>              (lambda (x y)
>                (ls-lisp-version-lessp x y))))
> $ emacs -Q --batch -l /tmp/test.el
> 
> ["01.0" "10" "010" "01.2"]

Why do you think this is wrong?  This function is not meant to compare
dotted versions with undotted ones, only dotted to dotted or undotted
to undotted.  The strings are supposed to be file names, where a dot
begins an extension.

See the node "More details about version sort" in the GNU Coreutils
manual for more info.

If you want a general-purpose version-comparison function, use
version< instead.
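
For reference, a minimal sketch of what `version<' does with strings
like these, assuming a stock Emacs.  It parses each string as a version
number, so it signals an error for arbitrary file names.  The helper
`my-safe-version<' is hypothetical and only wraps that error:

```elisp
;; Hypothetical helper, not part of Emacs: wraps `version<' so that a
;; string that cannot be parsed as a version yields the symbol `invalid'.
(defun my-safe-version< (a b)
  "Like `version<', but return the symbol `invalid' on a parse error."
  (condition-case nil
      (version< a b)
    (error 'invalid)))

(my-safe-version< "01.0" "01.2")          ; => t    (1.0 < 1.2)
(my-safe-version< "01.2" "10")            ; => t    (1.2 < 10)
(my-safe-version< "10" "010")             ; => nil  (both parse as 10)
(my-safe-version< "hhhh.jpg" "hhhh.png")  ; => invalid (no leading digit)
```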






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-04  7:44 ` Eli Zaretskii
@ 2022-06-04 14:11   ` TAKAHASHI Yoshio
  2022-06-04 14:52     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: TAKAHASHI Yoshio @ 2022-06-04 14:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 55787

Eli-san,

Thank you for your reply.

>> I encounter an inconsistent sort result.  The position of "01.0" and/or
>> "01.2" seems wrong.
>> 
>> 
>> $ cat /tmp/test.el
>> (require 'ls-lisp)
>> (print (sort (vector "01.0" "10" "010" "01.2")
>>              (lambda (x y)
>>                (ls-lisp-version-lessp x y))))
>> $ emacs -Q --batch -l /tmp/test.el
>> 
>> ["01.0" "10" "010" "01.2"]
>
> Why do you think this is wrong?  This function is not meant to compare
> dotted versions with undotted ones, only dotted to dotted or undotted
> to undotted.  The strings are supposed to be file names, where a dot
> begins an extension.
> 
> See the node "More details about version sort" in the GNU Coreutils
> manual for more info.

I reported this "inconsistency" because ls-lisp does not sort files as the ls
program does when `dired-listing-switches' includes 'v', such as "-alGv".

# "01.0", "10", ... is a minimal reproducible pattern that I stripped down from
# my real file name pattern.

I was not aware that `ls-lisp-version-lessp' does not support mixed
dotted-undotted cases.  The doc string says it acts like `strverscmp', so I
expected the same result (order) in the Dired buffer.  And in the example
below, the result does seem to act like `strverscmp'.

    (print (sort (vector "01.0" "10" "01.2") ; no "010" in arg.
                 (lambda (x y)
                   (ls-lisp-version-lessp x y))))
    ["01.0" "01.2" "10"]
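
One way to pin down this kind of inconsistency is to test the predicate
pairwise instead of through `sort'.  The following is only a sketch:
`my-predicate-violations' is a hypothetical helper, not part of ls-lisp,
and it checks just two properties, asymmetry and transitivity.

```elisp
(require 'ls-lisp)

;; Hypothetical helper, not part of ls-lisp.
(defun my-predicate-violations (pred items)
  "Return a list of ordering violations of PRED over ITEMS.
An element is (asymmetry A B) when both (PRED A B) and (PRED B A)
are non-nil, or (transitivity A B C) when A<B and B<C but not A<C."
  (let (violations)
    (dolist (a items)
      (dolist (b items)
        ;; A strict "less than" must never hold in both directions.
        (when (and (funcall pred a b) (funcall pred b a))
          (push (list 'asymmetry a b) violations))
        (dolist (c items)
          ;; A<B and B<C must imply A<C.
          (when (and (funcall pred a b)
                     (funcall pred b c)
                     (not (funcall pred a c)))
            (push (list 'transitivity a b c) violations)))))
    (nreverse violations)))

(print (my-predicate-violations #'ls-lisp-version-lessp
                                '("01.0" "10" "010" "01.2")))
```

An empty list means the predicate passed both checks for these strings.
The result depends on the Emacs build, so none is shown here.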


> If you want a general-purpose version-comparison function, use
> version< instead.

Umm, do I need to use `version<' in `ls-lisp-handle-switches', after
extracting the numerical part from the file name argument?

-- 
tkh






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-04 14:11   ` TAKAHASHI Yoshio
@ 2022-06-04 14:52     ` Eli Zaretskii
  2022-06-05  2:37       ` TAKAHASHI Yoshio
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2022-06-04 14:52 UTC (permalink / raw)
  To: TAKAHASHI Yoshio; +Cc: 55787

> From: TAKAHASHI Yoshio <yfb02119@nifty.com>
> Cc: 55787@debbugs.gnu.org
> Date: Sat, 04 Jun 2022 23:11:17 +0900
> 
> >> $ cat /tmp/test.el
> >> (require 'ls-lisp)
> >> (print (sort (vector "01.0" "10" "010" "01.2")
> >>              (lambda (x y)
> >>                (ls-lisp-version-lessp x y))))
> >> $ emacs -Q --batch -l /tmp/test.el
> >> 
> >> ["01.0" "10" "010" "01.2"]
> >
> > Why do you think this is wrong?  This function is not meant to compare
> > dotted versions with undotted ones, only dotted to dotted or undotted
> > to undotted.  The strings are supposed to be file names, where a dot
> > begins an extension.
> > 
> > See the node "More details about version sort" in the GNU Coreutils
> > manual for more info.
> 
> I reported this "inconsistency" because ls-lisp does not sort files as the ls
> program does when `dired-listing-switches' includes 'v', such as "-alGv".

What do you see with 'ls' and what do you see with ls-lisp?  Also, in
which locale are you trying this with 'ls'?

> # "01.0", "10", ... is a minimal reproducible pattern that I stripped down from
> # my real file name pattern.

I'd prefer to see the real file names instead, since that's what
ls-lisp-version-lessp was written to handle.

> I was not aware that `ls-lisp-version-lessp' does not support mixed
> dotted-undotted cases.  The doc string says it acts like `strverscmp', so I
> expected the same result (order) in the Dired buffer.  And in the example
> below, the result does seem to act like `strverscmp'.

The exact spec of strverscmp is not known, AFAIK, and the
implementation is a state machine, which is somewhat hard to
reverse-engineer.  I'm only aware of the documentation in the glibc
manual; did you read it?

Comparing with 'ls' is also somewhat problematic, because in UTF-8
locales its collation rules ignore some punctuation characters --
again, because that's how glibc implements that.  Emacs on MS-Windows
can emulate this behavior if you set w32-collate-ignore-punctuation to
a non-nil value.
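
A minimal sketch of that setting for an init file.  The `boundp' guard
is an addition here, on the assumption that the variable is only defined
in MS-Windows builds:

```elisp
;; Ask Emacs on MS-Windows to ignore punctuation when collating strings.
;; The guard keeps this harmless on builds that lack the variable.
(when (boundp 'w32-collate-ignore-punctuation)
  (setq w32-collate-ignore-punctuation t))
```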

>     (print (sort (vector "01.0" "10" "01.2") ; no "010" in arg.
>                  (lambda (x y)
>                    (ls-lisp-version-lessp x y))))
>     ["01.0" "01.2" "10"]

If I create files by the names in your original example, I see this in
a Dired buffer created by "C-u C-x d" after I set the switches to "-alv":

    drwxrwxrwx  1 xxxxx yyy 0 06-04 10:19 .
    drwxrwxrwx  1 xxxxx yyy 0 06-04 11:02 ..
    -rw-rw-rw-  1 xxxxx yyy 0 06-04 10:19 10
    -rw-rw-rw-  1 xxxxx yyy 0 06-04 10:19 010
    -rw-rw-rw-  1 xxxxx yyy 0 06-04 10:19 01.0
    -rw-rw-rw-  1 xxxxx yyy 0 06-04 10:19 01.2

which seems reasonable.
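
For reference, the same setup can be sketched in Lisp instead of typing
the switches at the "C-u C-x d" prompt.  This assumes the test files
live in a hypothetical directory /tmp/verstest/, and it forces the
ls-lisp emulation, which is already the default on MS-Windows:

```elisp
(require 'ls-lisp)

;; Use the Lisp emulation of ls rather than an external ls program.
(setq ls-lisp-use-insert-directory-program nil)

;; "v" requests version sort, as in "ls -v".
(setq dired-listing-switches "-alv")

;; Hypothetical directory holding the files 10, 010, 01.0 and 01.2.
(dired "/tmp/verstest/")
```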

> > If you want a general-purpose version-comparison function, use
> > version< instead.
> 
> Umm, do I need to use `version<' in `ls-lisp-handle-switches', after
> extracting the numerical part from the file name argument?

No, I wrote that before I understood what you were trying to do.
Please ignore that part.






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-04 14:52     ` Eli Zaretskii
@ 2022-06-05  2:37       ` TAKAHASHI Yoshio
  2022-06-05  7:01         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: TAKAHASHI Yoshio @ 2022-06-05  2:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 55787

Eli-san,

With further tests, this ls-lisp behavior occurs only in my Mingw64
Windows Emacs environment.  I cannot reproduce it in my WSL2 Ubuntu
environment.

> What do you see with 'ls' and what do you see with ls-lisp?  Also, in
> which locale are you trying this with 'ls'?

I include my trial in the hope that it can be reproduced in your environment.
In this scenario, I use somewhat more realistic file names instead of just
numbers.

================================================================

On my Windows machine, output from "M-! (shell-command) env"

OS=Windows_NT
LANG=ja_JP.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_TIME=C


================================================================

tkh$ cat ../createfiles.sh
touch "34 アルバム-300dpi.jpg"
touch "34 アルバム-300dpi.png"
touch "054_交換機.jpg"
touch "054_交換機.png"
touch "91 部分カット.jpg"
touch "91 部分カット.png"
touch "0717-パソコン.jpg"
touch "0717-パソコン.png"
touch "1935 社屋.jpg"
touch "1935 社屋.png"
touch "FFFF_縁カット.jpg"
touch "FFFF_縁カット.png"
touch "hhhh.jpg"
touch "hhhh.png"
tkh$ sh ../createfiles.sh
tkh$ ls -l
total 0
-rw-r--r-- 1 tkh 0 Jun  5 10:45 054_交換機.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 054_交換機.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 0717-パソコン.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 0717-パソコン.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 1935 社屋.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 1935 社屋.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 34 アルバム-300dpi.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 34 アルバム-300dpi.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 91 部分カット.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 91 部分カット.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 FFFF_縁カット.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 FFFF_縁カット.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 hhhh.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 hhhh.png
tkh$ ls -lv
total 0
-rw-r--r-- 1 tkh 0 Jun  5 10:45 34 アルバム-300dpi.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 34 アルバム-300dpi.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 054_交換機.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 054_交換機.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 91 部分カット.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 91 部分カット.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 0717-パソコン.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 0717-パソコン.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 1935 社屋.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 1935 社屋.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 FFFF_縁カット.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 FFFF_縁カット.png
-rw-r--r-- 1 tkh 0 Jun  5 10:45 hhhh.jpg
-rw-r--r-- 1 tkh 0 Jun  5 10:45 hhhh.png
tkh$


================================================================

On my Windows machine, "054_交換機.{jpg,png}" are wrongly listed in
dired buffer.

  drwxrwxrwx  1 0 Jun  5 10:45 .
  drwxrwxrwx  1 0 Jun  5 10:45 ..
  -rw-rw-rw-  1 0 Jun  5 10:45 34 アルバム-300dpi.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 34 アルバム-300dpi.png
  -rw-rw-rw-  1 0 Jun  5 10:45 054_交換機.png
  -rw-rw-rw-  1 0 Jun  5 10:45 91 部分カット.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 91 部分カット.png
  -rw-rw-rw-  1 0 Jun  5 10:45 0717-パソコン.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 0717-パソコン.png
  -rw-rw-rw-  1 0 Jun  5 10:45 054_交換機.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 1935 社屋.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 1935 社屋.png
  -rw-rw-rw-  1 0 Jun  5 10:45 FFFF_縁カット.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 FFFF_縁カット.png
  -rw-rw-rw-  1 0 Jun  5 10:45 hhhh.jpg
  -rw-rw-rw-  1 0 Jun  5 10:45 hhhh.png

================================================================

When I drilled down to understand this listing, I encountered what looked
to me like the sort order inconsistency reported in my original mail.
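
The listing above can also be checked outside Dired by sorting the same
names directly in batch mode.  This is a sketch only:
`my-order-independent-p' is a hypothetical helper, and a mismatch
between the two sorts is a hint of an inconsistent predicate, not proof,
since names that compare as equivalent may legitimately swap places.

```elisp
(require 'ls-lisp)

;; Hypothetical helper, not part of ls-lisp.
(defun my-order-independent-p (pred names)
  "Return non-nil if sorting NAMES and its reverse with PRED agree."
  (equal (sort (copy-sequence names) pred)
         (sort (reverse names) pred)))

(let ((names '("34 アルバム-300dpi.jpg" "34 アルバム-300dpi.png"
               "054_交換機.jpg" "054_交換機.png"
               "91 部分カット.jpg" "91 部分カット.png"
               "0717-パソコン.jpg" "0717-パソコン.png"
               "1935 社屋.jpg" "1935 社屋.png"
               "FFFF_縁カット.jpg" "FFFF_縁カット.png"
               "hhhh.jpg" "hhhh.png")))
  ;; Print the names in the order the predicate produces.
  (dolist (name (sort (copy-sequence names) #'ls-lisp-version-lessp))
    (princ (concat name "\n")))
  ;; Report whether the input order changes the result.
  (princ (format "order-independent: %s\n"
                 (my-order-independent-p #'ls-lisp-version-lessp names))))
```

Saved to a file, this can be run the same way as the original test,
with "emacs -Q --batch -l FILE".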


>> # "01.0", "10", ... is a minimal reproducible pattern that I stripped down from
>> # my real file name pattern.
>
> I'd prefer to see the real file names instead, since that's what
> ls-lisp-version-lessp was written to handle.

I oversimplified in my original mail.  That was not good for a bug
report, sorry.


> The exact spec of strverscmp is not known, AFAIK, and the
> implementation is a state machine, which is somewhat hard to
> reverse-engineer.  I'm only aware of the documentation in the glibc
> manual; did you read it?

I read the strverscmp man page, then the source, but made no attempt to
understand the state machine implementation.


> Comparing with 'ls' is also somewhat problematic, because in UTF-8
> locales its collation rules ignore some punctuation characters --
> again, because that's how glibc implements that.  Emacs on MS-Windows
> can emulate this behavior if you set w32-collate-ignore-punctuation to
> a non-nil value.

`w32-collate-ignore-punctuation' does not seem to affect my test
case.  In my trial, the Dired buffer listing is the same whether
`w32-collate-ignore-punctuation' is t or nil.

-- 
tkh






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-05  2:37       ` TAKAHASHI Yoshio
@ 2022-06-05  7:01         ` Eli Zaretskii
  2022-06-05  9:38           ` TAKAHASHI Yoshio
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2022-06-05  7:01 UTC (permalink / raw)
  To: TAKAHASHI Yoshio; +Cc: 55787

> From: TAKAHASHI Yoshio <yfb02119@nifty.com>
> Cc: 55787@debbugs.gnu.org
> Date: Sun, 05 Jun 2022 11:37:11 +0900
> 
> > What do you see with 'ls' and what do you see with ls-lisp?  Also, in
> > which locale are you trying this with 'ls'?
> 
> I include my trial in the hope that it can be reproduced in your environment.
> In this scenario, I use somewhat more realistic file names instead of just
> numbers.

Thanks, I found two issues with the current implementation of
ls-lisp-version-lessp, and I hope I fixed them now on the master
branch.  Please see if you get a more reasonable behavior.  (I'm not
sure you will see exactly the same order as in "ls -lv", though; not
sure why.)






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-05  7:01         ` Eli Zaretskii
@ 2022-06-05  9:38           ` TAKAHASHI Yoshio
  2022-06-05  9:48             ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: TAKAHASHI Yoshio @ 2022-06-05  9:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 55787

Eli-san,

> Please see if you get a more reasonable behavior.  (I'm not
> sure you will see exactly the same order as in "ls -lv", though; not
> sure why.)

As you mentioned in an earlier mail, the specification of strverscmp is not
documented clearly.  I believe your fix generates a reasonable listing
order.  I appreciate your fix.  Thank you!

-- 
tkh






* bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp
  2022-06-05  9:38           ` TAKAHASHI Yoshio
@ 2022-06-05  9:48             ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2022-06-05  9:48 UTC (permalink / raw)
  To: TAKAHASHI Yoshio; +Cc: 55787-done

> From: TAKAHASHI Yoshio <yfb02119@nifty.com>
> Cc: 55787@debbugs.gnu.org
> Date: Sun, 05 Jun 2022 18:38:10 +0900
> 
> Eli-san,
> 
> > Please see if you get a more reasonable behavior.  (I'm not
> > sure you will see exactly the same order as in "ls -lv", though; not
> > sure why.)
> 
> As you mentioned in an earlier mail, the specification of strverscmp is not
> documented clearly.  I believe your fix generates a reasonable listing
> order.  I appreciate your fix.  Thank you!

Thanks, I'm therefore closing this bug.







Thread overview: 8+ messages
2022-06-03 23:21 bug#55787: 29.0.50; inconsistent sort order with ls-lisp-version-lessp TAKAHASHI Yoshio
2022-06-04  7:44 ` Eli Zaretskii
2022-06-04 14:11   ` TAKAHASHI Yoshio
2022-06-04 14:52     ` Eli Zaretskii
2022-06-05  2:37       ` TAKAHASHI Yoshio
2022-06-05  7:01         ` Eli Zaretskii
2022-06-05  9:38           ` TAKAHASHI Yoshio
2022-06-05  9:48             ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
