emacs-orgmode@gnu.org archives
* test-org-table/sort-lines: Failing test on macOS
@ 2022-10-06 20:15 Rudolf Adamkovič
  2022-10-07 12:04 ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Rudolf Adamkovič @ 2022-10-06 20:15 UTC (permalink / raw)
  To: emacs-orgmode

Howdy, howdy!

I see the test failure below on macOS.

Test test-org-table/sort-lines condition:

  (ert-test-failed
   ((should
     (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
            (org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n"
            ... ...)))
    :form
    (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
           #("| B | 4 |\n| a | x |\n| c | 3 |\n" 0 9
             (face org-table)
             10 19
             (face org-table)
             20 29
             (face org-table)))
    :value nil :explanation
    (array-elt 2
               (different-atoms
                (97 "#x61" "?a")
                (66 "#x42" "?B")))))
 FAILED  796/952  test-org-table/sort-lines (0.003410 sec)
   at ../lisp/test-org-table.el:1880

The isolated part of the test file that fails:

(let ((original-string-collate-lessp (symbol-function 'string-collate-lessp)))
  (cl-letf (((symbol-function 'string-collate-lessp)
             (lambda (s1 s2 &optional _locale ignore-case)
               (funcall original-string-collate-lessp
                        s1 s2 "C" nil))))
    (should
     (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
            (org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n"
                                     (org-table-sort-lines nil ?a)
                                     (buffer-string))))))

If I understand correctly, "a" should be less than "B" under the "C" locale when
ignoring case (nil), right?  Yet, I get the following:

(string-collate-lessp "a" "B" "C" nil)  ; => nil

[FYI: If I replace nil with t, the procedure returns nil too.]

Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).

Rudy
-- 
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications."
-- Alfred North Whitehead, 1861-1947

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-06 20:15 test-org-table/sort-lines: Failing test on macOS Rudolf Adamkovič
@ 2022-10-07 12:04 ` Max Nikulin
  2022-10-08  5:25   ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-10-07 12:04 UTC (permalink / raw)
  To: emacs-orgmode

On 07/10/2022 03:15, Rudolf Adamkovič wrote:
> 
> If I understand, "a" should be less than "B" when under "C" locale when
> ignoring case (nil) , right?  Yet, I get the following:
> 
> (string-collate-lessp "a" "B" "C" nil)  ; => nil

When case is not ignored (4th argument is nil) locale-dependent 
collation rules are used, so you get the expected result.

$ printf 'a\nB\n' | LC_COLLATE=C sort
B
a
$ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B
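
As a minimal illustration (a sketch added for clarity, not part of the
original message): the "C" result above follows from plain character-code
order, where every uppercase ASCII letter precedes every lowercase one.
`string-lessp' compares by character code without consulting the locale,
so it shows the same ordering.

```elisp
;; Illustration only: compare by character code, as the "C" locale does.
(< ?B ?a)               ; => t    (?B is 66, ?a is 97)
(string-lessp "B" "a")  ; => t
(string-lessp "a" "B")  ; => nil
```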

> [FYI: If I replace nil with t, the procedure returns nil too.]
> 
> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).

Strange. Emacs-26, Linux

(string-collate-lessp "a" "B" "C" t)
t

If libc is sane (assuming that sort is linked to the same libc)

printf 'a\nb\n' | LC_COLLATE=C sort
printf 'b\na\n' | LC_COLLATE=C sort
printf 'A\nB\n' | LC_COLLATE=C sort
printf 'B\nA\n' | LC_COLLATE=C sort
printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort

then you might face an Emacs bug.

P.S. Example of a subtle issue with sorting: significant space added to 
some locales like es_ES & Co, pl_PL:

Maxim Nikulin. Re: [Patch] to correctly sort the items with emphasis 
marks in a list. Fri, 16 Apr 2021 21:59:51 +0700. 
https://list.orgmode.org/s5c8p9$97n$1@ciao.gmane.io




* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-07 12:04 ` Max Nikulin
@ 2022-10-08  5:25   ` Ihor Radchenko
  2022-10-08 14:27     ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-10-08  5:25 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> On 07/10/2022 03:15, Rudolf Adamkovič wrote:
>> 
>> If I understand, "a" should be less than "B" when under "C" locale when
>> ignoring case (nil) , right?  Yet, I get the following:
>> 
>> (string-collate-lessp "a" "B" "C" nil)  ; => nil
>
> When case is not ignored (4th argument is nil) locale-dependent 
> collation rules are used, so you get the expected result.
>
> $ printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a
> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> a
> B

Should we then modify the test to set locale explicitly?

>> [FYI: If I replace nil with t, the procedure returns nil too.]
>> 
>> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).
>
> Strange. Emacs-26, Linux
>
> (string-collate-lessp "a" "B" "C" t)
> t
>
> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.

IDK if it is related, but there was a recent (fixed) bug in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787

Note that Rudolf is using Emacs 29.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-08  5:25   ` Ihor Radchenko
@ 2022-10-08 14:27     ` Max Nikulin
  2022-10-09  3:59       ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-10-08 14:27 UTC (permalink / raw)
  To: emacs-orgmode

On 08/10/2022 12:25, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> When case is not ignored (4th argument is nil) locale-dependent
>> collation rules are used, so you get the expected result.
>>
>> $ printf 'a\nB\n' | LC_COLLATE=C sort
>> B
>> a
>> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
>> a
>> B
> 
> Should we then modify the test to set locale explicitly?

Rudolf cited the context of this test and "C" locale is explicitly used 
there.

> IDK if it is related, but there was a recent (fixed) bug in
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787

I have not tried to find commits to check if only version sort is affected.

> Note that Rudolf is using Emacs 29.

and macOS, so libc and locales version may be different as well.






* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-08 14:27     ` Max Nikulin
@ 2022-10-09  3:59       ` Ihor Radchenko
  2022-10-09 15:38         ` Rudolf Adamkovič
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-10-09  3:59 UTC (permalink / raw)
  To: Max Nikulin, Rudolf Adamkovič; +Cc: emacs-orgmode

[I am adding Rudolf's email back to CC just in case]

Max Nikulin <manikulin@gmail.com> writes:

>> Should we then modify the test to set locale explicitly?
>
> Rudolf cited the context of this test and "C" locale is explicitly used 
> there.

Oops. Missed it. Thanks for the clarification.

>> Note that Rudolf is using Emacs 29.
>
> and macOS, so libc and locales version may be different as well.

[Max, correct me if my further suggestion is wrong.]

Rudolf, can you (1) try sort in a terminal to confirm that the "C" locale
behaves as expected on macOS; (2) if sort works fine, consider
reporting an Emacs bug.

> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-09  3:59       ` Ihor Radchenko
@ 2022-10-09 15:38         ` Rudolf Adamkovič
  2022-10-09 16:53           ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Rudolf Adamkovič @ 2022-10-09 15:38 UTC (permalink / raw)
  To: Ihor Radchenko, Max Nikulin; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@gmail.com> writes:

> Rudolf, can you (1) try sort in terminal to confirm that "C" locale
> behaves as expected in MacOS; (2) If sort works fine, you may consider
> reporting Emacs bug.

For the two examples given by Max on Linux, I get on macOS:

printf 'a\nB\n' | LC_COLLATE=C sort
B
a

printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
B
a

For the other examples mentioned, I get on macOS:

printf 'a\nb\n' | LC_COLLATE=C sort
a
b

printf 'b\na\n' | LC_COLLATE=C sort
a
b

printf 'A\nB\n' | LC_COLLATE=C sort
A
B

printf 'B\nA\n' | LC_COLLATE=C sort
A
B

printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
a
b

printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
a
b

printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
A
B

printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
A
B

Rudy
-- 
"Chop your own wood and it will warm you twice."
-- Henry Ford; Francis Kinloch, 1819; Henry David Thoreau, 1854

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-09 15:38         ` Rudolf Adamkovič
@ 2022-10-09 16:53           ` Max Nikulin
  2022-10-10 22:25             ` Rudolf Adamkovič
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-10-09 16:53 UTC (permalink / raw)
  To: emacs-orgmode

On 09/10/2022 22:38, Rudolf Adamkovič wrote:
> 
> For the two examples given by Max on Linux, I get on macOS:
> 
> printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a

This is the expected behavior.

> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a

This one is not consistent with what I see on Linux with glibc.

printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B

Perhaps you do not have the en_US locale generated:

locale -a | grep en_US
en_US.utf8

At least sort uses the same "C" locale definition as expected by Org 
tests. Either Emacs is linked with another libc or there is a bug in Emacs.

> printf 'a\nb\n' | LC_COLLATE=C sort
> a
> b

Sanity test passed for sort. You may try the same set of pairs with 
`string-collate-lessp'.

I am curious whether the "POSIX" locale works similarly to "C" and "C.UTF-8" in 
your case:
(string-collate-lessp "a" "B" "POSIX" nil)





* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-09 16:53           ` Max Nikulin
@ 2022-10-10 22:25             ` Rudolf Adamkovič
  2022-10-12 16:09               ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Rudolf Adamkovič @ 2022-10-10 22:25 UTC (permalink / raw)
  To: Max Nikulin, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> This one is not consistent with what I see on Linux with glibc.

Yeah, I noticed. :)

> Perhaps you do not have en_US locale generated
>
> locale -a | grep en_US
> en_US.utf8

$ locale -a | grep en_US
en_US.US-ASCII
en_US.UTF-8
en_US
en_US.ISO8859-15
en_US.ISO8859-1

> Sanity test passed for sort. You may try the same set of pairs with 
> `string-collate-lessp'.

(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil
(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil

(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil
(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil

> I am curious if "POSIX" locale works similar to "C" and "C.UTF-8" in 
> your case (string-collate-lessp "a" "B" "POSIX" nil).

(string-collate-lessp "a" "B" "POSIX" nil) ; nil

Rudy
-- 
"'Contrariwise,' continued Tweedledee, 'if it was so, it might be; and
if it were so, it would be; but as it isn't, it ain't.  That's logic.'"
-- Lewis Carroll, Through the Looking Glass, 1871/1872

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-10 22:25             ` Rudolf Adamkovič
@ 2022-10-12 16:09               ` Max Nikulin
  2022-11-15  4:10                 ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-10-12 16:09 UTC (permalink / raw)
  To: emacs-orgmode

On 11/10/2022 05:25, Rudolf Adamkovič wrote:
> (string-collate-lessp "a" "b" "C" t) ; t
..
> (string-collate-lessp "a" "b" "C" nil) ; t
..

So basic sanity tests passed.

> (string-collate-lessp "a" "B" "C" nil)  ; => nil
> (string-collate-lessp "a" "B" "POSIX" nil) ; nil

is expected behavior as well. What I do not like is

> (string-collate-lessp "a" "B" "C" t)  ; => nil

Actually, you wrote

> [FYI: If I replace nil with t, the procedure returns nil too.]

From my point of view, that is a reason to file an Emacs bug, because I get

     (string-collate-lessp "a" "B" "C" t) ; => t

It seems case folding behaves strangely in the comparison, because when the case is 
the same, "a" < "b" holds as expected:

> (string-collate-lessp "a" "b" "C" t) ; t
> (string-collate-lessp "A" "B" "C" t) ; t
> (string-collate-lessp "a" "b" "C" nil) ; t
> (string-collate-lessp "A" "B" "C" nil) ; t

Could it be that the IGNORE-CASE argument is ignored in your case? I 
believe it is improbable that the C locale is not generated and case-folding 
rules are therefore undefined:

     locale -a | grep C

Another strange result of yours is

> $ locale -a | grep en_US
> en_US.US-ASCII
> en_US.UTF-8
..
so en_US locale is defined but collation rules are different from glibc
> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a

I have no idea whether sort and Emacs use the same libc and the same locale 
definitions. I do not know how this is organized on macOS.





* Re: test-org-table/sort-lines: Failing test on macOS
  2022-10-12 16:09               ` Max Nikulin
@ 2022-11-15  4:10                 ` Ihor Radchenko
  2022-11-20  4:18                   ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-15  4:10 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>  > (string-collate-lessp "a" "B" "C" t)  ; => nil
> Actually you wrote
>> [FYI: If I replace nil with t, the procedure returns nil too.]
>  From my point of view it is a reason to file an Emacs bug because I get
>
>      (string-collate-lessp "a" "B" "C" t) ; => t

I submitted the bug report to Emacs.
See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-15  4:10                 ` Ihor Radchenko
@ 2022-11-20  4:18                   ` Ihor Radchenko
  2022-11-20  8:00                     ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-20  4:18 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@posteo.net> writes:

> Max Nikulin <manikulin@gmail.com> writes:
>
>>  > (string-collate-lessp "a" "B" "C" t)  ; => nil
>> Actually you wrote
>>> [FYI: If I replace nil with t, the procedure returns nil too.]
>>  From my point of view it is a reason to file an Emacs bug because I get
>>
>>      (string-collate-lessp "a" "B" "C" t) ; => t
>
> I submitted the bug report to Emacs.
> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275


According to the discussion on debbugs, it looks like we can use
`compare-strings' instead. It will be independent of the system locale
and always follow Unicode rules.

However, I am not sure if ignoring locale is something we really want.
WDYT?
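
For reference, a minimal sketch of what a `compare-strings'-based
predicate could look like. The helper name `my/string-lessp-ignore-case'
is hypothetical and is not taken from Org or from the debbugs discussion;
only `compare-strings' itself is an existing Emacs function.

```elisp
;; Hypothetical helper (the name is an assumption): a case-insensitive
;; "less than" built on `compare-strings', which does not go through
;; the libc locale.
(defun my/string-lessp-ignore-case (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case."
  ;; `compare-strings' returns t for equal strings, a negative integer
  ;; when S1 is less, and a positive integer when S1 is greater.
  (let ((result (compare-strings s1 nil nil s2 nil nil t)))
    (and (integerp result) (< result 0))))

(my/string-lessp-ignore-case "a" "B")  ; => t
(my/string-lessp-ignore-case "B" "a")  ; => nil
(my/string-lessp-ignore-case "a" "A")  ; => nil (equal when ignoring case)
```

The trade-off is the one raised above: the result no longer depends on
the platform, but it also ignores the user's locale.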

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-20  4:18                   ` Ihor Radchenko
@ 2022-11-20  8:00                     ` Max Nikulin
  2022-11-21  3:15                       ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-11-20  8:00 UTC (permalink / raw)
  To: emacs-orgmode

On 20/11/2022 11:18, Ihor Radchenko wrote:
>> Max Nikulin writes:
>>>   From my point of view it is a reason to file an Emacs bug because I get
>>>
>>>       (string-collate-lessp "a" "B" "C" t) ; => t
>>
>> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
> 
> According to the discussion on debbugs, it looks like we can use
> `compare-strings' instead. It will be independent of the system locale
> and always follow Unicode rules.
> 
> However, I am not sure if ignoring locale is something we really want.
> WDYT?

I think we should keep `string-collate-lessp' in the 
`org-table-sort-lines' implementation. Users expect sorting according 
to their locales. However, it is better to add a warning to the 
`org-table-sort-lines' docstring and to the manual that caseless sort 
depends on its implementation in libc, so currently it does not work 
with clang/llvm and so, e.g., on macOS.

Concerning the test, I would split the current testcase into 2 parts 
depending on WITH-CASE argument, check if caseless collation is 
available and skip the related test otherwise.

As to the thread linked to the bug report 
https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html 
"case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400, 
there is a link
https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
unrelated to the issue, but comments and answers there describe a lot of 
pitfalls and explain why string comparison ignoring case is not trivial. 
(It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)




* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-20  8:00                     ` Max Nikulin
@ 2022-11-21  3:15                       ` Ihor Radchenko
  2022-11-21 16:48                         ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-21  3:15 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> However, I am not sure if ignoring locale is something we really want.
>> WDYT?
>
> I think we should keep `string-collate-lessp' in the 
> `org-table-sort-lines' implementation. Users expect sorting accordingly 
> to their locales. However it is better to add a warning to 
> `org-table-sort-lines' docstring and to the manual that caseless sort 
> depends on its implementation in libc, so currently it does not work in 
> clang/llvm and so e.g. on MacOS.

Sounds reasonable.

Note that not only `org-table-sort-lines' is using
`string-collate-lessp'. The full list of functions potentially affected
by libc sorting is:

1. Bibliography order in `org-cite-basic-export-bibliography'
   (via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
2. `org-sort-list'
3. `org-table-sort-lines'
4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
   "Alphabetical" or "Reverse alphabetical".
5. `org-sort-entries'
6. Agenda sorting, when alphabetical sorting is involved
7. `org-map-entries'

I am not 100% sure where we should add the information to
docstring/manual and where we should not.

> Concerning the test, I would split the current testcase into 2 parts 
> depending on WITH-CASE argument, check if caseless collation is 
> available and skip the related test otherwise.

How can we check the availability?

> As to the thread linked to the bug report 
> https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html 
> "case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400, 
> there is a link
> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> unrelated to the issue, but comments and answers there describe a lot of 
> pitfalls and explain why string comparison ignoring case is not trivial. 
> (It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)

Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
what we are concerned about here is consistency. Not the pitfalls per
se.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-21  3:15                       ` Ihor Radchenko
@ 2022-11-21 16:48                         ` Max Nikulin
  2022-11-22  1:14                           ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-11-21 16:48 UTC (permalink / raw)
  To: emacs-orgmode

On 21/11/2022 10:15, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>>> However, I am not sure if ignoring locale is something we really want.
>>> WDYT?
>>
>> I think we should keep `string-collate-lessp' in the
>> `org-table-sort-lines' implementation. Users expect sorting accordingly
>> to their locales. However it is better to add a warning to
>> `org-table-sort-lines' docstring and to the manual that caseless sort
>> depends on its implementation in libc, so currently it does not work in
>> clang/llvm and so e.g. on MacOS.
> 
> Sounds reasonable.
> 
> Note that not only `org-table-sort-lines' is using
> `string-collate-lessp'. The full list of functions potentially affected
> by libc sorting is:
> 
> 1. Bibliography order in `org-cite-basic-export-bibliography'
>     (via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
> 3. `org-table-sort-lines'
Confirmed.

> 2. `org-sort-list'
> 5. `org-sort-entries'
`downcase' is used, not proper case folding, so a potential issue

> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>     "Alphabetical" or "Reverse alphabetical".

IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.

> 6. Agenda sorting, when alphabetical sorting is involved

`string-lessp' and `downcase' so even more severe locale-related issues 
might be expected.

> 7. `org-map-entries'

Unsure which predicate is used.

> I am not 100% sure where we should add the information to
> docstring/manual and where we should not.

If footnotes in the manual had fixed labels, then I would suggest 
referencing the same footnote in the manual and in the docstrings. 
Perhaps a new subsection should be added to info "(org) Miscellaneous", 
and "see info node ..." should be added to all involved docstrings.

>> Concerning the test, I would split the current testcase into 2 parts
>> depending on WITH-CASE argument, check if caseless collation is
>> available and skip the related test otherwise.
> 
> How can we check the availability?

(string-collate-lessp "a" "B" "C" t)
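
A rough sketch of how such a probe might gate a test with ERT's
`skip-unless'. The names `my/caseless-collation-available-p' and
`my/sort-ignoring-case' are hypothetical, and the test sorts a plain list
rather than an Org table, so it is an illustration of the idea and not
the actual change to test-org-table.el.

```elisp
(require 'ert)

;; Hypothetical probe: non-nil when IGNORE-CASE is honored in the "C" locale.
(defun my/caseless-collation-available-p ()
  (string-collate-lessp "a" "B" "C" t))

;; Hypothetical test: skipped, not failed, where caseless collation is
;; unavailable (as reported for macOS in this thread).
(ert-deftest my/sort-ignoring-case ()
  (skip-unless (my/caseless-collation-available-p))
  (should (equal '("a" "B" "c")
                 (sort (list "a" "c" "B")
                       (lambda (s1 s2)
                         (string-collate-lessp s1 s2 "C" t))))))
```

The case-sensitive half of the test would stay unconditional, since it
passes on both systems discussed here.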

> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
> what we are concerned about here is consistency. Not the pitfalls per
> se.

Achieving consistency across Org code requires additional effort.






* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-21 16:48                         ` Max Nikulin
@ 2022-11-22  1:14                           ` Ihor Radchenko
  2022-11-22 16:01                             ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-22  1:14 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> 2. `org-sort-list'
>> 5. `org-sort-entries'
> `downcase' is used, not proper case folding, so a potential issue

`downcase' is used to determine user input about sorting type.
Not for sorting itself.

>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>>     "Alphabetical" or "Reverse alphabetical".
>
> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.

I feel like we are slightly miscommunicating here.
I mostly tried to list the uses of libc-sensitive sorting. Not
specifically cases when we try to ignore the case.

The problem is not limited to case-sensitive comparisons. Some systems
may fail to implement specific locales and thus sorting may downgrade to
simple string-lessp.

No `downcase' is hidden anywhere there.

>> 6. Agenda sorting, when alphabetical sorting is involved
>
> `string-lessp' and `downcase' so even more severe locale-related issues 
> might be expected.

Could you please elaborate?

>> 7. `org-map-entries'
>
> Unsure which predicate is used.

It is a similar scenario with agenda. `org-map-entries' uses
`org-make-tags-matcher', which calls `org-op-to-function' when user
wants to select property values via </<=/>/>= criterion.
`org-op-to-function' calls `org-string<' or similar that, in turn, uses
`string-collate-lessp' with nil IGNORE-CASE argument.

>> I am not 100% sure where we should add the information to
>> docstring/manual and where we should not.
>
> If footnotes in the manual had fixed labels then I would suggest 
> reference the same footnote in the manual and in the docstrings. 
> Perhaps, a new subsection should be added to info "(org) Miscellaneous" 
> and "see info node ..." should be added to all involved docstrings.

Sounds reasonable.

>>> Concerning the test, I would split the current testcase into 2 parts
>>> depending on WITH-CASE argument, check if caseless collation is
>>> available and skip the related test otherwise.
>> 
>> How can we check the availability?
>
> (string-collate-lessp "a" "B" "C" t)

Thanks!

>> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
>> what we are concerned about here is consistency. Not the pitfalls per
>> se.
>
> Achieving consistency across Org code requires additional efforts.

Well. Just using `string-lessp' would make things very consistent.
Easily and with no effort.

The question though is what is the right thing to do for users while
also keeping consistency.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-22  1:14                           ` Ihor Radchenko
@ 2022-11-22 16:01                             ` Max Nikulin
  2022-11-23 10:37                               ` Ihor Radchenko
  0 siblings, 1 reply; 21+ messages in thread
From: Max Nikulin @ 2022-11-22 16:01 UTC (permalink / raw)
  To: emacs-orgmode

On 22/11/2022 08:14, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
>>> 2. `org-sort-list'
>>> 5. `org-sort-entries'
>> `downcase' is used, not proper case folding, so a potential issue
> 
> `downcase' is used to determine user input about sorting type.
> Not for sorting itself.

See case-func variable. Its initialization depends on the IGNORE-CASE 
argument. Strings to sort are passed either through `identity' or 
through `downcase'.

>>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>>>      "Alphabetical" or "Reverse alphabetical".
>>
>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
> 
> I feel like we are slightly miscommunicating here.
> I mostly tried to list the uses of libc-sensitive sorting. Not
> specifically cases when we try to ignore the case.
> 
> The problem is not limited to case-sensitive comparisons. Some systems
> may fail to implement specific locales and thus sorting may downgrade to
> simple string-lessp.

When case folding is not involved, I consider `string-lessp' a 
graceful degradation. Although locale rules are not applied, strings are 
mostly sorted. Exceptions exist, but usually the order is reasonable.

Completely disregarding IGNORE-CASE argument of `string-collate-lessp' 
on MacOS (that is not a heavily stripped embedded OS) is a bad surprise 
for me.

>>> 6. Agenda sorting, when alphabetical sorting is involved
>>
>> `string-lessp' and `downcase' so even more severe locale-related issues
>> might be expected.
> 
> Could you please elaborate?

I admit that `downcase' may be an acceptable workaround since 
`string-collate-lessp' may not honor IGNORE-CASE, but I believe that, when 
available, `string-collate-lessp' should be the preferred option for 
sorting.

>> Achieving consistency across Org code requires additional efforts.
> 
> Well. Just using `string-lessp' would make things very consistent.
> Easily and with no efforts.

With hope that clang will get better Unicode support, I would move in 
the opposite direction, namely wider usage of `string-collate-lessp'. 
Just using `string-lessp' means no ignore case sort even where it is 
available now.

I have an idea of a compatibility wrapper for `string-collate-lessp' 
with special treatment of ignoring case and a bad libc implementation: 
apply `downcase' before passing arguments to `string-lessp'. It should 
provide consistency, the best user experience when locales work properly, 
and graceful degradation otherwise. I hope it is acceptable for Org, 
even though such a trick is undesirable for Emacs for performance reasons.
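
A minimal sketch of the wrapper idea, under the assumption that the probe
(string-collate-lessp "a" "B" "C" t) is a good enough indicator of
working caseless collation. The name `my/string-collate-lessp' is
hypothetical and nothing like it exists in Org at the time of writing.

```elisp
;; Hypothetical wrapper (the name is an assumption): same arguments as
;; `string-collate-lessp', with a `downcase' fallback when the platform
;; ignores IGNORE-CASE.
(defun my/string-collate-lessp (s1 s2 &optional locale ignore-case)
  "Like `string-collate-lessp', degrading gracefully for IGNORE-CASE."
  (if (and ignore-case
           ;; Probe: on a working libc this returns t.
           (not (string-collate-lessp "a" "B" "C" t)))
      ;; Fallback: locale rules are lost, but case is ignored.
      (string-lessp (downcase s1) (downcase s2))
    (string-collate-lessp s1 s2 locale ignore-case)))

(my/string-collate-lessp "a" "B" "C" t)  ; => t
```

The probe runs on every call here for simplicity; a real implementation
would presumably cache its result.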

However I am afraid of compatibility shims after

d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time: 
Refactor into top-level `defmacro'

P.S. I am not motivated enough to build Emacs on Linux using clang to 
check if locale information will be available. I am almost sure that 
some locale information is available on MacOS, e.g. at least strcasecmp 
even if full CLDR can not be easily accessed from C. I do not have a Mac 
to check state of affairs. For objective-C there is e.g. 
comareCaseIndependent.

I do not like that Emacs relies on locale support (and timezone support as well) 
in libc. It becomes a problem as soon as more than one locale has to be 
used simultaneously. I agree that there are enough complications: 
sometimes the locale depends on the document (e.g. #+LANGUAGE:), and sometimes 
a specific locale is even restricted to a part of a document. It is tricky to 
handle such cases, but the current limitations are too strict (and the defective 
`string-collate-lessp' on macOS is an example).




* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-22 16:01                             ` Max Nikulin
@ 2022-11-23 10:37                               ` Ihor Radchenko
  2022-11-23 15:27                                 ` Max Nikulin
  0 siblings, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-23 10:37 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> On 22/11/2022 08:14, Ihor Radchenko wrote:
>> Max Nikulin writes:
>> 
>>>> 2. `org-sort-list'
>>>> 5. `org-sort-entries'
>>> `downcase' is used, not proper case folding, so a potential issue
>> 
>> `downcase' is used to determine user input about sorting type.
>> Not for sorting itself.
>
> See case-func variable. Its initialization depends on the IGNORE-CASE 
> argument. Strings to sort are passed either through `identity' or 
> through `downcase'.

Thanks for the pointer.
Now, I am getting more confused though.
Do we even need to use `string-collate-lessp' then?

Eli even argued that `string-collate-lessp' is strictly worse compared
to more predictable approach. See
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40

Do you remember any cases when users actually demanded locale-specific
sorting?

>>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
>> 
>> I feel like we are slightly miscommunicating here.
>> I mostly tried to list the uses of libc-sensitive sorting. Not
>> specifically cases when we try to ignore the case.
>> 
>> The problem is not limited to case-sensitive comparisons. Some systems
>> may fail to implement specific locales and thus sorting may downgrade to
>> simple string-lessp.
>
> When case folding is not involved, I consider `string-lessp' a
> graceful degradation. Although locale rules are not applied, strings
> are mostly sorted. Exceptions exist, but usually the order is
> reasonable.
>
> Completely disregarding the IGNORE-CASE argument of
> `string-collate-lessp' on macOS (which is not a heavily stripped
> embedded OS) is a bad surprise for me.

It was a surprise for me as well. It should be at least a bit clearer
now that I have updated the docstring of `string-collate-lessp'.

However, I feel a bit lost about what to do on the Org side.
We can put a disclaimer in the manual and all that, but it still feels
too complex.

>>>> 6. Agenda sorting, when alphabetical sorting is involved
>>>
>>> `string-lessp' and `downcase' so even more severe locale-related issues
>>> might be expected.
>> 
>> Could you please elaborate?
>
> I admit that `downcase' may be an acceptable workaround, since
> `string-collate-lessp' may not honor IGNORE-CASE, but I believe that,
> when available, `string-collate-lessp' should be the preferred option
> for sorting.

As I pointed out above, Eli has the opposite opinion.
I feel that my understanding of the topic is not sufficient to judge.
Maybe we should ask users? (But who is even aware of these things
happening under the hood?)

> I have an idea for a compatibility wrapper around `string-collate-lessp'
> with special treatment of ignoring case and of bad libc implementations.
> As a fallback, apply `downcase' before passing the arguments to
> `string-lessp'. It should provide consistency, the best user experience
> when locales work properly, and graceful degradation otherwise. I hope
> it is acceptable for Org even though such a trick is undesirable for
> Emacs due to performance reasons.

Macro idea sounds reasonable. Though I am still unsure which direction
we need to go.

> However I am afraid of compatibility shims after
>
> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time: 
> Refactor into top-level `defmacro'

What do you refer to?

> I do not like that Emacs relies on locale support (and timezone support
> as well) in libc. It becomes a problem as soon as more than one locale
> has to be used simultaneously. I agree that there are enough
> complications: sometimes the locale depends on the document (e.g.
> #+LANGUAGE:), and sometimes a specific locale is even restricted to a
> part of a document. It is tricky to handle such cases, but the current
> limitations are too strict (and the defective `string-collate-lessp' on
> macOS is an example).

The question is what can be done and, more importantly, how much effort
it will take to implement and maintain an alternative.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-23 10:37                               ` Ihor Radchenko
@ 2022-11-23 15:27                                 ` Max Nikulin
  2022-11-23 17:01                                   ` Max Nikulin
  2022-11-26  2:05                                   ` Ihor Radchenko
  0 siblings, 2 replies; 21+ messages in thread
From: Max Nikulin @ 2022-11-23 15:27 UTC (permalink / raw)
  To: emacs-orgmode

On 23/11/2022 17:37, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> Strings to sort are passed either through `identity' or
>> through `downcase'.
> 
> Thanks for the pointer.
> Now, I am getting more confused though.
> Do we even need to use `string-collate-lessp' then?

I think we do, because the sort result is presented to humans.

(setq lst '("semana" "señor" "sepia"))
;; `sort' is destructive, so each call gets its own copy.
(sort (copy-sequence lst) #'string-lessp)         ; => ("semana" "sepia" "señor")
(sort (copy-sequence lst) #'string-collate-lessp) ; => ("semana" "señor" "sepia")

> Eli even argued that `string-collate-lessp' is strictly worse compared
> to more predictable approach. See
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40

In this particular case Eli may be assuming that e.g. a list is an Elisp
structure rather than a kind of text formatting. In general, I am quite
pessimistic about the quality of locale support in Emacs, while Eli may
have a rather different point of view.

> Do you remember any cases when users actually demanded locale-specific
> sorting?

I think users too often face poor locale support in various
applications, so they are not surprised when they see incorrect results.
In some sense such results are consistent (erroneous in the same way).

Formatting of numbers in Emacs is the extreme case of consistency. For
the sake of reliably reading and writing numbers from/to files or the
network, it is impossible to present a number according to the current
locale. An exception is en_US, which has some dedicated code in calc.

I believe it is silly to adhere to a common denominator and to avoid
`string-collate-lessp' just because it is unavailable in some
environments.

> However, I feel a bit lost about what to do on Org side.
> We can put a disclaimer in the manual and all that, but it still feels
> too complex.

My current suggestion is to provide a fallback to `downcase' in the code
and to explain in the manual that runtime environments (OSes) are not
equal and that the quality of locale support varies. Emacs heavily
depends on libc in this area.

>> However I am afraid of compatibility shims after
>>
>> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
>> Refactor into top-level `defmacro'
> 
> What do you refer to?

The implementation must be chosen at compile (or load) time. Due to some
issues with native compilation, that does not work. For string
comparison the runtime performance penalty may be higher than for
timestamp processing.
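
For illustration, choosing the comparison once at load time could look
like the sketch below. `my/string-lessp-ignore-case' is a hypothetical
name, the "a" before "B" probe is an assumed heuristic for detecting a
libc that honors IGNORE-CASE, and the sketch does not try to address the
native compilation issues mentioned above.

```elisp
;; Hypothetical name, for illustration only.  `defalias' is an ordinary
;; function, so the `if' below is evaluated once, when this form is
;; loaded, and not on every comparison.
(defalias 'my/string-lessp-ignore-case
  (if (and (fboundp 'string-collate-lessp)
           ;; Heuristic probe: does collation honor IGNORE-CASE?
           (string-collate-lessp "a" "B" nil t))
      (lambda (s1 s2) (string-collate-lessp s1 s2 nil t))
    ;; Fallback: no locale rules, but case is still ignored.
    (lambda (s1 s2) (string-lessp (downcase s1) (downcase s2))))
  "Case-insensitive string comparison chosen once at load time.")
```

Callers then pay for a single function call per comparison. The
drawback is that the choice stays frozen for the session, even if the
locale environment changes later.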

> The question is what can be done and, more importantly, how much effort
> will it take to implement and maintain an alternative.

The effort is significant; however, browsers, for example, have their
own implementation of Unicode-related stuff. There is the ICU library,
but Eli is against it because Emacs already has a partial implementation
of Unicode and it would mean duplicating the character database.





* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-23 15:27                                 ` Max Nikulin
@ 2022-11-23 17:01                                   ` Max Nikulin
  2022-11-26  2:05                                   ` Ihor Radchenko
  1 sibling, 0 replies; 21+ messages in thread
From: Max Nikulin @ 2022-11-23 17:01 UTC (permalink / raw)
  To: emacs-orgmode

On 23/11/2022 22:27, Max Nikulin wrote:
> 
> (setq lst '("semana" "señor" "sepia"))
> (sort lst #'string-lessp) ;         => ("semana" "sepia" "señor")
> (sort lst #'string-collate-lessp) ; => ("semana" "señor" "sepia")
> 
> On 23/11/2022 17:37, Ihor Radchenko wrote:
>> Eli even argued that `string-collate-lessp' is strictly worse compared
>> to more predictable approach. See
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40

I think Eli is afraid of the following sort of inconsistency:

(string-collate-lessp "z" "ö" "de_DE.UTF-8") ; => nil
(string-collate-lessp "z" "ö" "sv_SE.UTF-8") ; => t

Mixed language example: U+0049 LATIN CAPITAL LETTER I vs. U+0406 
CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I

(sort '("Івана" "Ivan" "Термін" "Вони")
       (lambda (a b) (string-collate-lessp a b "uk_UA.UTF-8")))
; => ("Вони" "Івана" "Термін" "Ivan")

(sort '("Івана" "Ivan" "Термін" "Вони")
       (lambda (a b) (string-collate-lessp a b "en_US.UTF-8")))
; => ("Ivan" "Вони" "Івана" "Термін")

I suppose users should get results native to their language even though
others may get another order.
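
When an explicit locale is passed, as in the examples above,
`string-collate-lessp' may signal an error if that locale is not
installed on the system. Below is a hedged sketch of a predicate that
degrades to `string-lessp' in that case. The names
`my/string-lessp-in-locale' and `my/sort-in-locale' are hypothetical.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my/string-lessp-in-locale (s1 s2 locale)
  "Return non-nil if S1 sorts before S2 according to LOCALE.
Fall back to `string-lessp' if the collation call signals an error,
for instance because LOCALE is not available on this system."
  (condition-case nil
      (string-collate-lessp s1 s2 locale)
    (error (string-lessp s1 s2))))

(defun my/sort-in-locale (strings locale)
  "Return a sorted copy of STRINGS, collated according to LOCALE."
  ;; `sort' is destructive, so work on a copy of the caller's list.
  (sort (copy-sequence strings)
        (lambda (a b) (my/string-lessp-in-locale a b locale))))

;; The resulting order depends on the locales installed on the system.
(my/sort-in-locale '("Івана" "Ivan" "Термін" "Вони") "uk_UA.UTF-8")
```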




* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-23 15:27                                 ` Max Nikulin
  2022-11-23 17:01                                   ` Max Nikulin
@ 2022-11-26  2:05                                   ` Ihor Radchenko
  2022-11-29 16:40                                     ` Max Nikulin
  1 sibling, 1 reply; 21+ messages in thread
From: Ihor Radchenko @ 2022-11-26  2:05 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> However, I feel a bit lost about what to do on Org side.
>> We can put a disclaimer in the manual and all that, but it still feels
>> too complex.
>
> My current suggestion is to provide a fallback to `downcase' in the code
> and to explain in the manual that runtime environments (OSes) are not
> equal and that the quality of locale support varies. Emacs heavily
> depends on libc in this area.

This sounds like something to be adopted by Emacs upstream.
I suggested changing the fallback behaviour of `string-collate-lessp'
to use `downcase' when IGNORE-CASE is non-nil. See my last message in
bug#59275.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: test-org-table/sort-lines: Failing test on macOS
  2022-11-26  2:05                                   ` Ihor Radchenko
@ 2022-11-29 16:40                                     ` Max Nikulin
  0 siblings, 0 replies; 21+ messages in thread
From: Max Nikulin @ 2022-11-29 16:40 UTC (permalink / raw)
  To: emacs-orgmode

On 26/11/2022 09:05, Ihor Radchenko wrote:
> Max Nikulin writes:
> 
> This sounds like something to be adopted by Emacs upstream.
> I suggested changing the fallback behaviour of `string-collate-lessp'
> to use `downcase' when IGNORE-CASE is non-nil. See my last message in
> bug#59275.

I do not share Eli's "all or nothing" position. I prefer graceful
degradation and the best result achievable with reasonable effort.

However, whether the reason is performance or correctness, both
arguments weigh against modifying `string-collate-lessp'. I still think
that Org will benefit from a compatibility wrapper with `downcase'.

The only additional consideration is that the comparison function should
be configurable. If a user accesses the same files from Linux and macOS,
it may be really annoying to get a different order of entries in the
agenda. For most Linux users it is better to use the smarter
`string-collate-lessp'. Some care is required to sort entries obtained
from multiple buffers in a predictable environment (locale, case
conversion table).
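
A rough sketch of what a configurable comparison could look like is
given below. Both `my/org-sort-predicate' and `my/org-sort-strings' are
hypothetical names, not existing Org options; the `downcase'/`identity'
key mirrors the case-func approach discussed earlier in the thread.

```elisp
;; Hypothetical user option and helper, for illustration only.
(defcustom my/org-sort-predicate #'string-collate-lessp
  "Function used to compare two strings when sorting alphabetically.
Set it to `string-lessp' to get the same order on every system."
  :type '(choice (function-item string-collate-lessp)
                 (function-item string-lessp)
                 function)
  :group 'org)

(defun my/org-sort-strings (strings &optional ignore-case)
  "Return a sorted copy of STRINGS using `my/org-sort-predicate'.
When IGNORE-CASE is non-nil, compare downcased copies of the strings."
  (let ((key (if ignore-case #'downcase #'identity)))
    ;; `sort' is destructive, so work on a copy of the caller's list.
    (sort (copy-sequence strings)
          (lambda (a b)
            (funcall my/org-sort-predicate
                     (funcall key a)
                     (funcall key b))))))
```

A user sharing files between Linux and macOS could then set the option
to `string-lessp' on both machines and accept the loss of locale rules
in exchange for an identical order.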




Thread overview: 21+ messages
2022-10-06 20:15 test-org-table/sort-lines: Failing test on macOS Rudolf Adamkovič
2022-10-07 12:04 ` Max Nikulin
2022-10-08  5:25   ` Ihor Radchenko
2022-10-08 14:27     ` Max Nikulin
2022-10-09  3:59       ` Ihor Radchenko
2022-10-09 15:38         ` Rudolf Adamkovič
2022-10-09 16:53           ` Max Nikulin
2022-10-10 22:25             ` Rudolf Adamkovič
2022-10-12 16:09               ` Max Nikulin
2022-11-15  4:10                 ` Ihor Radchenko
2022-11-20  4:18                   ` Ihor Radchenko
2022-11-20  8:00                     ` Max Nikulin
2022-11-21  3:15                       ` Ihor Radchenko
2022-11-21 16:48                         ` Max Nikulin
2022-11-22  1:14                           ` Ihor Radchenko
2022-11-22 16:01                             ` Max Nikulin
2022-11-23 10:37                               ` Ihor Radchenko
2022-11-23 15:27                                 ` Max Nikulin
2022-11-23 17:01                                   ` Max Nikulin
2022-11-26  2:05                                   ` Ihor Radchenko
2022-11-29 16:40                                     ` Max Nikulin
