* test-org-table/sort-lines: Failing test on macOS
From: Rudolf Adamkovič @ 2022-10-06 20:15 UTC
To: emacs-orgmode

Howdy, howdy!

I see the test failure below on macOS.

Test test-org-table/sort-lines condition:
    (ert-test-failed
     ((should
       (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
              (org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n" ... ...)))
      :form
      (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
             #("| B | 4 |\n| a | x |\n| c | 3 |\n" 0 9
               (face org-table)
               10 19
               (face org-table)
               20 29
               (face org-table)))
      :value nil :explanation
      (array-elt 2
                 (different-atoms
                  (97 "#x61" "?a")
                  (66 "#x42" "?B")))))
   FAILED  796/952  test-org-table/sort-lines (0.003410 sec) at ../lisp/test-org-table.el:1880

The isolated part of the test file that fails:

(let ((original-string-collate-lessp (symbol-function 'string-collate-lessp)))
  (cl-letf (((symbol-function 'string-collate-lessp)
             (lambda (s1 s2 &optional _locale ignore-case)
               (funcall original-string-collate-lessp
                        s1 s2 "C" nil))))
    (should
     (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
            (org-test-with-temp-text "| <point>a | x |\n| c | 3 |\n| B | 4 |\n"
              (org-table-sort-lines nil ?a)
              (buffer-string))))))

If I understand, "a" should be less than "B" when under "C" locale when
ignoring case (nil) , right?  Yet, I get the following:

(string-collate-lessp "a" "B" "C" nil) ; => nil

[FYI: If I replace nil with t, the procedure returns nil too.]

Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).

Rudy
-- 
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications."
-- Alfred North Whitehead, 1861-1947

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-10-07 12:04 UTC
To: emacs-orgmode

On 07/10/2022 03:15, Rudolf Adamkovič wrote:
>
> If I understand, "a" should be less than "B" when under "C" locale when
> ignoring case (nil) , right?  Yet, I get the following:
>
> (string-collate-lessp "a" "B" "C" nil) ; => nil

When case is not ignored (4th argument is nil) locale-dependent
collation rules are used, so you get the expected result.

$ printf 'a\nB\n' | LC_COLLATE=C sort
B
a
$ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B

> [FYI: If I replace nil with t, the procedure returns nil too.]
>
> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).

Strange. Emacs-26, Linux

(string-collate-lessp "a" "B" "C" t)
t

If libc is sane (assuming that sort is linked to the same libc)

printf 'a\nb\n' | LC_COLLATE=C sort
printf 'b\na\n' | LC_COLLATE=C sort
printf 'A\nB\n' | LC_COLLATE=C sort
printf 'B\nA\n' | LC_COLLATE=C sort
printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort

then you might face an Emacs bug.

P.S. Example of a subtle issue with sorting: significant space added to
some locales like es_ES & Co, pl_PL:
Maxim Nikulin. Re: [Patch] to correctly sort the items with emphasis
marks in a list. Fri, 16 Apr 2021 21:59:51 +0700.
https://list.orgmode.org/s5c8p9$97n$1@ciao.gmane.io
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-10-08  5:25 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> On 07/10/2022 03:15, Rudolf Adamkovič wrote:
>>
>> If I understand, "a" should be less than "B" when under "C" locale when
>> ignoring case (nil) , right?  Yet, I get the following:
>>
>> (string-collate-lessp "a" "B" "C" nil) ; => nil
>
> When case is not ignored (4th argument is nil) locale-dependent
> collation rules are used, so you get the expected result.
>
> $ printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a
> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> a
> B

Should we then modify the test to set locale explicitly?

>> [FYI: If I replace nil with t, the procedure returns nil too.]
>>
>> Tested on Emacs 29 (adaa2fc90e) and Org 9.5.5 (580f28614).
>
> Strange. Emacs-26, Linux
>
> (string-collate-lessp "a" "B" "C" t)
> t
>
> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.

IDK if it is related, but there was a recent (fixed) bug in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787

Note that Rudolf is using Emacs 29.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-10-08 14:27 UTC
To: emacs-orgmode

On 08/10/2022 12:25, Ihor Radchenko wrote:
> Max Nikulin writes:
>>
>> When case is not ignored (4th argument is nil) locale-dependent
>> collation rules are used, so you get the expected result.
>>
>> $ printf 'a\nB\n' | LC_COLLATE=C sort
>> B
>> a
>> $ printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
>> a
>> B
>
> Should we then modify the test to set locale explicitly?

Rudolf cited the context of this test and "C" locale is explicitly used
there.

> IDK if it is related, but there was a recent (fixed) bug in
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55787

I have not tried to find commits to check if only version sort is affected.

> Note that Rudolf is using Emacs 29.

and macOS, so libc and locales version may be different as well.
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-10-09  3:59 UTC
To: Max Nikulin, Rudolf Adamkovič; +Cc: emacs-orgmode

[I am adding Rudolf's email back to CC just in case]

Max Nikulin <manikulin@gmail.com> writes:

>> Should we then modify the test to set locale explicitly?
>
> Rudolf cited the context of this test and "C" locale is explicitly used
> there.

Oops. Missed it. Thanks for the clarification.

>> Note that Rudolf is using Emacs 29.
>
> and macOS, so libc and locales version may be different as well.

[Max, correct me if my further suggestion is wrong.]

Rudolf, can you (1) try sort in terminal to confirm that "C" locale
behaves as expected in MacOS; (2) If sort works fine, you may consider
reporting Emacs bug.

> If libc is sane (assuming that sort is linked to the same libc)
>
> printf 'a\nb\n' | LC_COLLATE=C sort
> printf 'b\na\n' | LC_COLLATE=C sort
> printf 'A\nB\n' | LC_COLLATE=C sort
> printf 'B\nA\n' | LC_COLLATE=C sort
> printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
> printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
> printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
> printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
>
> then you might face an Emacs bug.
* Re: test-org-table/sort-lines: Failing test on macOS
From: Rudolf Adamkovič @ 2022-10-09 15:38 UTC
To: Ihor Radchenko, Max Nikulin; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@gmail.com> writes:

> Rudolf, can you (1) try sort in terminal to confirm that "C" locale
> behaves as expected in MacOS; (2) If sort works fine, you may consider
> reporting Emacs bug.

For the two examples given by Max on Linux, I get on macOS:

printf 'a\nB\n' | LC_COLLATE=C sort
B
a

printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
B
a

For the other examples mentioned, I get on macOS:

printf 'a\nb\n' | LC_COLLATE=C sort
a
b

printf 'b\na\n' | LC_COLLATE=C sort
a
b

printf 'A\nB\n' | LC_COLLATE=C sort
A
B

printf 'B\nA\n' | LC_COLLATE=C sort
A
B

printf 'a\nb\n' | LC_COLLATE=C.UTF-8 sort
a
b

printf 'b\na\n' | LC_COLLATE=C.UTF-8 sort
a
b

printf 'A\nB\n' | LC_COLLATE=C.UTF-8 sort
A
B

printf 'B\nA\n' | LC_COLLATE=C.UTF-8 sort
A
B

Rudy
-- 
"Chop your own wood and it will warm you twice."
-- Henry Ford; Francis Kinloch, 1819; Henry David Thoreau, 1854

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-10-09 16:53 UTC
To: emacs-orgmode

On 09/10/2022 22:38, Rudolf Adamkovič wrote:
>
> For the two examples given by Max on Linux, I get on macOS:
>
> printf 'a\nB\n' | LC_COLLATE=C sort
> B
> a

This is the expected behavior.

> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a

This one is not consistent with what I see on Linux with glibc.

printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
a
B

Perhaps you do not have en_US locale generated

locale -a | grep en_US
en_US.utf8

At least sort uses the same "C" locale definition as expected by Org
tests. Either Emacs is linked with another libc or there is a bug in Emacs.

> printf 'a\nb\n' | LC_COLLATE=C sort
> a
> b

Sanity test passed for sort. You may try the same set of pairs with
`string-collate-lessp'.

I am curious if "POSIX" locale works similar to "C" and "C.UTF-8" in
your case

(string-collate-lessp "a" "B" "POSIX" nil)
* Re: test-org-table/sort-lines: Failing test on macOS
From: Rudolf Adamkovič @ 2022-10-10 22:25 UTC
To: Max Nikulin, emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> This one is not consistent with what I see on Linux with glibc.

Yeah, I noticed. :)

> Perhaps you do not have en_US locale generated
>
> locale -a | grep en_US
> en_US.utf8

$ locale -a | grep en_US
en_US.US-ASCII
en_US.UTF-8
en_US
en_US.ISO8859-15
en_US.ISO8859-1

> Sanity test passed for sort. You may try the same set of pairs with
> `string-collate-lessp'.

(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil
(string-collate-lessp "a" "b" "C" t) ; t
(string-collate-lessp "b" "a" "C" t) ; nil
(string-collate-lessp "A" "B" "C" t) ; t
(string-collate-lessp "B" "A" "C" t) ; nil

(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil
(string-collate-lessp "a" "b" "C" nil) ; t
(string-collate-lessp "b" "a" "C" nil) ; nil
(string-collate-lessp "A" "B" "C" nil) ; t
(string-collate-lessp "B" "A" "C" nil) ; nil

> I am curious if "POSIX" locale works similar to "C" and "C.UTF-8" in
> your case (string-collate-lessp "a" "B" "POSIX" nil).

(string-collate-lessp "a" "B" "POSIX" nil) ; nil

Rudy
-- 
"'Contrariwise,' continued Tweedledee, 'if it was so, it might be; and
if it were so, it would be; but as it isn't, it ain't.  That's logic.'"
-- Lewis Carroll, Through the Looking Glass, 1871/1872

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-10-12 16:09 UTC
To: emacs-orgmode

On 11/10/2022 05:25, Rudolf Adamkovič wrote:
> (string-collate-lessp "a" "b" "C" t) ; t
..
> (string-collate-lessp "a" "b" "C" nil) ; t
..

So basic sanity tests passed.

> (string-collate-lessp "a" "B" "C" nil) ; => nil
> (string-collate-lessp "a" "B" "POSIX" nil) ; nil

is expected behavior as well. What I do not like is

> (string-collate-lessp "a" "B" "C" t) ; => nil

Actually you wrote
> [FYI: If I replace nil with t, the procedure returns nil too.]

From my point of view it is a reason to file an Emacs bug because I get

(string-collate-lessp "a" "B" "C" t) ; => t

It seems case folding works strange for comparison because when case is
the same "a" < "b" as expected:

> (string-collate-lessp "a" "b" "C" t) ; t
> (string-collate-lessp "A" "B" "C" t) ; t
> (string-collate-lessp "a" "b" "C" nil) ; t
> (string-collate-lessp "A" "B" "C" nil) ; t

May it happen that IGNORE-CASE argument is ignored in your case?

I believe, it is improbable that C locale is not generated, so case fold
rules are undefined

locale -a | grep C

Another your strange result is

> $ locale -a | grep en_US
> en_US.US-ASCII
> en_US.UTF-8
..

so en_US locale is defined but collation rules are different from glibc

> printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 sort
> B
> a

I have no idea if sort and Emacs use the same libc and the same locale
definitions. I am unaware which way it is organized in MacOS.
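The pairwise checks exchanged in the messages above could be gathered into a single diagnostic. What follows is only a minimal sketch: it relies on the built-in `string-collate-lessp' and nothing else, and the helper name `my/collation-report' is hypothetical (it is not part of Emacs or Org).

```elisp
;; Hypothetical helper, not part of Emacs or Org.
(defun my/collation-report (&optional locale)
  "Return a list of (S1 S2 IGNORE-CASE RESULT) rows for LOCALE.
LOCALE defaults to \"C\".  RESULT is what `string-collate-lessp'
returns for the pair S1, S2 with the given IGNORE-CASE flag."
  (let ((locale (or locale "C"))
        (pairs '(("a" . "b") ("b" . "a")
                 ("A" . "B") ("B" . "A")
                 ("a" . "B")))
        report)
    ;; First all pairs with case respected, then with case ignored.
    (dolist (ignore-case '(nil t))
      (dolist (pair pairs)
        (push (list (car pair) (cdr pair) ignore-case
                    (string-collate-lessp (car pair) (cdr pair)
                                          locale ignore-case))
              report)))
    (nreverse report)))

;; Print one row per comparison, e.g. from *scratch* or `emacs --batch'.
(dolist (row (my/collation-report "C"))
  (message "%S" row))
```

The row of interest in this thread is ("a" "B" t ...): it was reported as t with glibc on Linux and as nil on macOS.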
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-11-15  4:10 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> > (string-collate-lessp "a" "B" "C" t) ; => nil
> Actually you wrote
>> [FYI: If I replace nil with t, the procedure returns nil too.]
> From my point of view it is a reason to file an Emacs bug because I get
>
> (string-collate-lessp "a" "B" "C" t) ; => t

I submitted the bug report to Emacs.
See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-11-20  4:18 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@posteo.net> writes:

> Max Nikulin <manikulin@gmail.com> writes:
>
>> > (string-collate-lessp "a" "B" "C" t) ; => nil
>> Actually you wrote
>>> [FYI: If I replace nil with t, the procedure returns nil too.]
>> From my point of view it is a reason to file an Emacs bug because I get
>>
>> (string-collate-lessp "a" "B" "C" t) ; => t
>
> I submitted the bug report to Emacs.
> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275

According to the discussion on debbugs, it looks like we can use
`compare-strings' instead. It will be independent of the system locale
and always follow Unicode rules.

However, I am not sure if ignoring locale is something we really want.
WDYT?
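For reference, a caseless predicate built on `compare-strings' might look like the sketch below. The function name `my/string-lessp-ignore-case' is hypothetical, and this is not what Org does; it only illustrates the alternative mentioned above. Note that `compare-strings' folds case through Emacs's own case tables rather than through libc, so the result does not depend on LC_COLLATE.

```elisp
;; Hypothetical helper illustrating the `compare-strings' alternative.
(defun my/string-lessp-ignore-case (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case.
The system locale is not consulted."
  ;; `compare-strings' returns t when the strings match, a negative
  ;; number when S1 is less, and a positive number when S1 is greater.
  (let ((result (compare-strings s1 nil nil s2 nil nil t)))
    (and (integerp result) (< result 0))))

;; Usable directly as a `sort' predicate.
(sort (list "c" "B" "a") #'my/string-lessp-ignore-case)
```

The trade-off raised in the message still applies: a predicate like this is consistent across systems, but it does not apply locale-specific collation rules.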
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-11-20  8:00 UTC
To: emacs-orgmode

On 20/11/2022 11:18, Ihor Radchenko wrote:
>> Max Nikulin writes:
>>> From my point of view it is a reason to file an Emacs bug because I get
>>>
>>> (string-collate-lessp "a" "B" "C" t) ; => t
>>
>> See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275
>
> According to the discussion on debbugs, it looks like we can use
> `compare-strings' instead. It will be independent of the system locale
> and always follow Unicode rules.
>
> However, I am not sure if ignoring locale is something we really want.
> WDYT?

I think we should keep `string-collate-lessp' in the
`org-table-sort-lines' implementation. Users expect sorting accordingly
to their locales. However it is better to add a warning to
`org-table-sort-lines' docstring and to the manual that caseless sort
depends on its implementation in libc, so currently it does not work in
clang/llvm and so e.g. on MacOS.

Concerning the test, I would split the current testcase into 2 parts
depending on WITH-CASE argument, check if caseless collation is
available and skip the related test otherwise.

As to the thread linked to the bug report
https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html
"case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400,
there is a link
https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
unrelated to the issue, but comments and answers there describe a lot of
pitfalls and explain why string comparison ignoring case is not trivial.
(It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-11-21  3:15 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> However, I am not sure if ignoring locale is something we really want.
>> WDYT?
>
> I think we should keep `string-collate-lessp' in the
> `org-table-sort-lines' implementation. Users expect sorting accordingly
> to their locales. However it is better to add a warning to
> `org-table-sort-lines' docstring and to the manual that caseless sort
> depends on its implementation in libc, so currently it does not work in
> clang/llvm and so e.g. on MacOS.

Sounds reasonable.

Note that not only `org-table-sort-lines' is using
`string-collate-lessp'. The full list of functions potentially affected
by libc sorting is:

1. Bibliography order in `org-cite-basic-export-bibliography'
   (via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
2. `org-sort-list'
3. `org-table-sort-lines'
4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
   "Alphabetical" or "Reverse alphabetical".
5. `org-sort-entries'
6. Agenda sorting, when alphabetical sorting is involved
7. `org-map-entries'

I am not 100% sure where we should add the information to
docstring/manual and where we should not.

> Concerning the test, I would split the current testcase into 2 parts
> depending on WITH-CASE argument, check if caseless collation is
> available and skip the related test otherwise.

How can we check the availability?

> As to the thread linked to the bug report
> https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html
> "case-insensitive string comparison." Tue, 19 Jul 2022 13:27:50 -0400,
> there is a link
> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> unrelated to the issue, but comments and answers there describe a lot of
> pitfalls and explain why string comparison ignoring case is not trivial.
> (It is a Sisyphean task in some sense, I like the comment on 3 sigmas.)

Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
what we are concerned about here is consistency. Not the pitfalls per
se.
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-11-21 16:48 UTC
To: emacs-orgmode

On 21/11/2022 10:15, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>>> However, I am not sure if ignoring locale is something we really want.
>>> WDYT?
>>
>> I think we should keep `string-collate-lessp' in the
>> `org-table-sort-lines' implementation. Users expect sorting accordingly
>> to their locales. However it is better to add a warning to
>> `org-table-sort-lines' docstring and to the manual that caseless sort
>> depends on its implementation in libc, so currently it does not work in
>> clang/llvm and so e.g. on MacOS.
>
> Sounds reasonable.
>
> Note that not only `org-table-sort-lines' is using
> `string-collate-lessp'. The full list of functions potentially affected
> by libc sorting is:
>
> 1. Bibliography order in `org-cite-basic-export-bibliography'
>    (via org-cite-basic--sort-keys -> org-cite-basic--field-less-p)
> 3. `org-table-sort-lines'

Confirmed.

> 2. `org-sort-list'
> 5. `org-sort-entries'

`downcase' is used, not proper case folding, so a potential issue

> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>    "Alphabetical" or "Reverse alphabetical".

IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.

> 6. Agenda sorting, when alphabetical sorting is involved

`string-lessp' and `downcase' so even more severe locale-related issues
might be expected.

> 7. `org-map-entries'

Unsure which predicate is used.

> I am not 100% sure where we should add the information to
> docstring/manual and where we should not.

If footnotes in the manual had fixed labels then I would suggest
reference the same footnote in the manual and in the docstrings.
Perhaps, a new subsection should be added to info "(org) Miscellaneous"
and "see info node ..." should be added to all involved docstrings.

>> Concerning the test, I would split the current testcase into 2 parts
>> depending on WITH-CASE argument, check if caseless collation is
>> available and skip the related test otherwise.
>
> How can we check the availability?

(string-collate-lessp "a" "B" "C" t)

> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
> what we are concerned about here is consistency. Not the pitfalls per
> se.

Achieving consistency across Org code requires additional efforts.
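The proposal to skip the caseless half of the test when collation cannot ignore case could be sketched with ERT's `skip-unless'. Everything named `my/...' below is hypothetical, and `with-temp-buffer' stands in for the `org-test-with-temp-text' macro from Org's test suite; the actual change to `test-org-table/sort-lines' may well look different.

```elisp
(require 'ert)
(require 'org)
(require 'org-table)

;; Hypothetical probe; the expression itself is the one proposed above.
(defun my/caseless-collation-available-p ()
  "Return non-nil if `string-collate-lessp' honors IGNORE-CASE."
  (string-collate-lessp "a" "B" "C" t))

;; Hypothetical test name; the real test is `test-org-table/sort-lines'.
(ert-deftest my/test-org-table-sort-lines-ignore-case ()
  "Sort a table alphabetically, ignoring case, when libc supports it."
  (skip-unless (my/caseless-collation-available-p))
  (should
   (equal "| a | x |\n| B | 4 |\n| c | 3 |\n"
          (with-temp-buffer
            (org-mode)
            (insert "| a | x |\n| c | 3 |\n| B | 4 |\n")
            ;; Put point inside the first field, as <point> does in
            ;; the original test.
            (goto-char (point-min))
            (search-forward "a")
            (backward-char)
            (org-table-sort-lines nil ?a)
            (buffer-string)))))
```

On a system where the probe returns nil, ERT reports the test as skipped instead of failed; the case-sensitive half of the original test would stay in a separate `ert-deftest' without the `skip-unless' form.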
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-11-22  1:14 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

>> 2. `org-sort-list'
>> 5. `org-sort-entries'
> `downcase' is used, not proper case folding, so a potential issue

`downcase' is used to determine user input about sorting type.
Not for sorting itself.

>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>>    "Alphabetical" or "Reverse alphabetical".
>
> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.

I feel like we are slightly miscommunicating here.
I mostly tried to list the uses of libc-sensitive sorting. Not
specifically cases when we try to ignore the case.

The problem is not limited to case-sensitive comparisons. Some systems
may fail to implement specific locales and thus sorting may downgrade to
simple string-lessp.

No `downcase' is hidden anywhere there.

>> 6. Agenda sorting, when alphabetical sorting is involved
>
> `string-lessp' and `downcase' so even more severe locale-related issues
> might be expected.

Could you please elaborate?

>> 7. `org-map-entries'
>
> Unsure which predicate is used.

It is a similar scenario with agenda. `org-map-entries' uses
`org-make-tags-matcher', which calls `org-op-to-function' when user
wants to select property values via </<=/>/>= criterion.
`org-op-to-function' calls `org-string<' or similar that, in turn, uses
`string-collate-lessp' with nil IGNORE-CASE argument.

>> I am not 100% sure where we should add the information to
>> docstring/manual and where we should not.
>
> If footnotes in the manual had fixed labels then I would suggest
> reference the same footnote in the manual and in the docstrings.
> Perhaps, a new subsection should be added to info "(org) Miscellaneous"
> and "see info node ..." should be added to all involved docstrings.

Sounds reasonable.

>>> Concerning the test, I would split the current testcase into 2 parts
>>> depending on WITH-CASE argument, check if caseless collation is
>>> available and skip the related test otherwise.
>>
>> How can we check the availability?
>
> (string-collate-lessp "a" "B" "C" t)

Thanks!

>> Indeed. Also, see https://nullprogram.com/blog/2014/06/13/. However,
>> what we are concerned about here is consistency. Not the pitfalls per
>> se.
>
> Achieving consistency across Org code requires additional efforts.

Well. Just using `string-lessp' would make things very consistent.
Easily and with no efforts.
The question though is what is the right thing to do for users while
also keeping consistency.
* Re: test-org-table/sort-lines: Failing test on macOS
From: Max Nikulin @ 2022-11-22 16:01 UTC
To: emacs-orgmode

On 22/11/2022 08:14, Ihor Radchenko wrote:
> Max Nikulin writes:
>
>>> 2. `org-sort-list'
>>> 5. `org-sort-entries'
>> `downcase' is used, not proper case folding, so a potential issue
>
> `downcase' is used to determine user input about sorting type.
> Not for sorting itself.

See case-func variable. Its initialization depends on the IGNORE-CASE
argument. Strings to sort are passed either through `identity' or
through `downcase'.

>>> 4. `org-set-tags' (tag order), when `org-tags-sort-function' is set to
>>>    "Alphabetical" or "Reverse alphabetical".
>>
>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
>
> I feel like we are slightly miscommunicating here.
> I mostly tried to list the uses of libc-sensitive sorting. Not
> specifically cases when we try to ignore the case.
>
> The problem is not limited to case-sensitive comparisons. Some systems
> may fail to implement specific locales and thus sorting may downgrade to
> simple string-lessp.

When case folding is not involved, I consider `string-lessp' as a
graceful degradation. Although locale rules are not applied, strings are
mostly sorted. Exceptions exist, but usually order is reasonable.

Completely disregarding IGNORE-CASE argument of `string-collate-lessp'
on MacOS (that is not a heavily stripped embedded OS) is a bad surprise
for me.

>>> 6. Agenda sorting, when alphabetical sorting is involved
>>
>> `string-lessp' and `downcase' so even more severe locale-related issues
>> might be expected.
>
> Could you please elaborate?

I admit that `downcase' may be an acceptable workaround since
`string-collate-lessp' may not work with IGNORE-CASE, but I believe, when
available, `string-collate-lessp' should be the preferred option for
sorting.

>> Achieving consistency across Org code requires additional efforts.
>
> Well. Just using `string-lessp' would make things very consistent.
> Easily and with no efforts.

With hope that clang will get better Unicode support, I would move in
the opposite direction, namely wider usage of `string-collate-lessp'.
Just using `string-lessp' means no ignore case sort even where it is
available now.

I have an idea of a compatibility wrapper for `string-collate-lessp'
with special treatment of ignoring case and bad libc implementation.
Apply `downcase' before passing arguments to `string-lessp'. It should
provide consistency, best user experience when locales work properly,
and graceful degradation otherwise. I hope, it is acceptable for Org
even though such trick is undesired for Emacs due to performance reasons.

However I am afraid of compatibility shims after

d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
Refactor into top-level `defmacro'

P.S. I am not motivated enough to build Emacs on Linux using clang to
check if locale information will be available. I am almost sure that
some locale information is available on MacOS, e.g. at least strcasecmp
even if full CLDR can not be easily accessed from C. I do not have a Mac
to check state of affairs. For objective-C there is e.g.
comareCaseIndependent.

I do not like that Emacs relies on locale support (and timezone as well)
in libc. It becomes a problem as soon as more than one locale should be
used simultaneously. I agree that there are enough complications and
sometimes locale depends on the document (e.g. #+LANGUAGE:), sometimes
specific locale even restricted to a part of a document. It is tricky to
handle such cases, but current limitations are too strict (and defective
`string-collate-lessp' on MacOS is an example).
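The compatibility wrapper described in this message could be sketched as follows. All names are hypothetical and this is not Org's actual code; it assumes that probing `string-collate-lessp' once at load time is enough to detect a libc that disregards IGNORE-CASE.

```elisp
;; Hypothetical names throughout; this is not Org's actual code.
(defvar my/collate-ignore-case-works
  (string-collate-lessp "a" "B" "C" t)
  "Non-nil if `string-collate-lessp' honors its IGNORE-CASE argument.")

(defun my/string-collate-lessp (s1 s2 &optional locale ignore-case)
  "Like `string-collate-lessp', with a fallback for caseless sorting.
When IGNORE-CASE is non-nil but collation cannot ignore case on this
system, compare the downcased strings with `string-lessp' instead."
  (if (or (not ignore-case) my/collate-ignore-case-works)
      ;; Locale-aware path: used whenever it is known to work.
      (string-collate-lessp s1 s2 locale ignore-case)
    ;; Graceful degradation: no locale rules, but case is ignored.
    (string-lessp (downcase s1) (downcase s2))))

(my/string-collate-lessp "a" "B" "C" t)
```

With this shape, callers get locale-aware collation wherever it works, and the `downcase' path is taken only for caseless comparisons on systems where the probe fails.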
* Re: test-org-table/sort-lines: Failing test on macOS
From: Ihor Radchenko @ 2022-11-23 10:37 UTC
To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> On 22/11/2022 08:14, Ihor Radchenko wrote:
>> Max Nikulin writes:
>>
>>>> 2. `org-sort-list'
>>>> 5. `org-sort-entries'
>>> `downcase' is used, not proper case folding, so a potential issue
>>
>> `downcase' is used to determine user input about sorting type.
>> Not for sorting itself.
>
> See case-func variable. Its initialization depends on the IGNORE-CASE
> argument. Strings to sort are passed either through `identity' or
> through `downcase'.

Thanks for the pointer.
Now, I am getting more confused though.
Do we even need to use `string-collate-lessp' then?

Eli even argued that `string-collate-lessp' is strictly worse compared
to more predictable approach. See
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40

Do you remember any cases when users actually demanded locale-specific
sorting?

>>> IGNORE-CASE argument is not used, perhaps `downcase' is hidden in the code.
>>
>> I feel like we are slightly miscommunicating here.
>> I mostly tried to list the uses of libc-sensitive sorting. Not
>> specifically cases when we try to ignore the case.
>>
>> The problem is not limited to case-sensitive comparisons. Some systems
>> may fail to implement specific locales and thus sorting may downgrade to
>> simple string-lessp.
>
> When case folding is not involved, I consider `string-lessp' as a
> graceful degradation. Although locale rules are not applied, strings are
> mostly sorted. Exceptions exist, but usually order is reasonable.
>
> Completely disregarding IGNORE-CASE argument of `string-collate-lessp'
> on MacOS (that is not a heavily stripped embedded OS) is a bad surprise
> for me.

It was a surprise for me as well.
Should be at least a bit more clear now as I updated the docstring of
`string-collate-lessp'.

However, I feel a bit lost about what to do on Org side.
We can put a disclaimer in the manual and all that, but it still feels
too complex.

>>>> 6. Agenda sorting, when alphabetical sorting is involved
>>>
>>> `string-lessp' and `downcase' so even more severe locale-related issues
>>> might be expected.
>>
>> Could you please elaborate?
>
> I admit that `downcase' may be an acceptable workaround since
> `string-collate-lessp' may not work with IGNORE-CASE, but I believe, when
> available, `string-collate-lessp' should be the preferred option for
> sorting.

As I pointed above, Eli has an opposite opinion.
I feel that my understanding of the topic is not sufficient to judge.

Maybe we should ask users?
(But who is even aware about these things happening under the hood?)

> I have an idea of a compatibility wrapper for `string-collate-lessp'
> with special treatment of ignoring case and bad libc implementation.
> Apply `downcase' before passing arguments to `string-lessp'. It should
> provide consistency, best user experience when locales work properly,
> and graceful degradation otherwise. I hope, it is acceptable for Org
> even though such trick is undesired for Emacs due to performance reasons.

Macro idea sounds reasonable.
Though I am still unsure which direction we need to go.

> However I am afraid of compatibility shims after
>
> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time:
> Refactor into top-level `defmacro'

What do you refer to?

> I do not like that Emacs relies on locale support (and timezone as well)
> in libc. It becomes a problem as soon as more than one locale should be
> used simultaneously. I agree that there are enough complications and
> sometimes locale depends on the document (e.g. #+LANGUAGE:), sometimes
> specific locale even restricted to a part of a document. It is tricky to
> handle such cases, but current limitations are too strict (and defective
> `string-collate-lessp' on MacOS is an example).

The question is what can be done and, more importantly, how much effort
will it take to implement and maintain an alternative.
* Re: test-org-table/sort-lines: Failing test on macOS 2022-11-23 10:37 ` Ihor Radchenko @ 2022-11-23 15:27 ` Max Nikulin 2022-11-23 17:01 ` Max Nikulin 2022-11-26 2:05 ` Ihor Radchenko 0 siblings, 2 replies; 27+ messages in thread From: Max Nikulin @ 2022-11-23 15:27 UTC (permalink / raw) To: emacs-orgmode On 23/11/2022 17:37, Ihor Radchenko wrote: > Max Nikulin writes: >> >> Strings to sort are passed either through `identity' or >> through `downcase'. > > Thanks for the pointer. > Now, I am getting more confused though. > Do we even need to use `string-collate-lessp' then? I think we do because sort result is presented to humans. (setq lst '("semana" "señor" "sepia")) (sort lst #'string-lessp) ; => ("semana" "sepia" "señor") (sort lst #'string-collate-lessp) ; => ("semana" "señor" "sepia") > Eli even argued that `string-collate-lessp' is strictly worse compared > to more predictable approach. See > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40 In this particular case Eli may assume that e.g. list is a elisp structure, not a kind of text formatting. In general, I am quite pessimistic concerning quality of locales support in Emacs while Eli may have rather different point of view. > Do you remember any cases when users actually demanded locale-specific > sorting? I think, users too often face poor locale support in various applications, so they are not surprised when see incorrect results. In some sense such results are consistent (erroneous in the same way). Formatting of numbers in Emacs is the extreme case of consistency. For the sake of reliably reading/writing of numbers from/to files or network it is impossible to present a number accordingly to the current locale. An exception is en_US that has some dedicated code in calc. I believe, it is silly to adhere to a common denominator and to not use `string-collate-lessp' just because it is unavailable in some environments. > However, I feel a bit lost about what to do on Org side. 
> We can put a disclaimer in the manual and all that, but it still feels > too complex. My current suggestion is to provide a fallback to `downcase' in the code and to explain in the manual that runtime environments (OSes) are not equal and quality of locale support varies. Emacs heavily depends on libc in this area. >> However I am afraid of compatibility shims after >> >> d3a9c424b 2022-08-16 17:15:27 +0800 Ihor Radchenko: org-encode-time: >> Refactor into top-level `defmacro' > > What do you refer to? Implementation must be chosen at compile (or load) time. Due to some issues with native compiling it does not work. For string comparison runtime performance penalty may be higher than for timestamp processing. > The question is what can be done and, more importantly, how much effort > will it take to implement and maintain an alternative. Effort is significant however e.g. browsers have their own implementation of Unicode-related stuff. There is ICU library, but Eli is against it because Emacs already has partial implementation of Unicode and it would mean duplication of character database. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: test-org-table/sort-lines: Failing test on macOS 2022-11-23 15:27 ` Max Nikulin @ 2022-11-23 17:01 ` Max Nikulin 2022-11-26 2:05 ` Ihor Radchenko 1 sibling, 0 replies; 27+ messages in thread From: Max Nikulin @ 2022-11-23 17:01 UTC (permalink / raw) To: emacs-orgmode On 23/11/2022 22:27, Max Nikulin wrote: > > (setq lst '("semana" "señor" "sepia")) > (sort lst #'string-lessp) ; => ("semana" "sepia" "señor") > (sort lst #'string-collate-lessp) ; => ("semana" "señor" "sepia") > > On 23/11/2022 17:37, Ihor Radchenko wrote: >> Eli even argued that `string-collate-lessp' is strictly worse compared >> to more predictable approach. See >> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275#40 I think, Eli is afraid of the following sort of inconsistency (string-collate-lessp "z" "ö" "de_DE.UTF-8") ; => nil (string-collate-lessp "z" "ö" "sv_SE.UTF-8") ; => t Mixed language example: U+0049 LATIN CAPITAL LETTER I vs. U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I (sort '("Івана" "Ivan" "Термін" "Вони") (lambda (a b) (string-collate-lessp a b "uk_UA.UTF-8"))) ("Вони" "Івана" "Термін" "Ivan") (sort '("Івана" "Ivan" "Термін" "Вони") (lambda (a b) (string-collate-lessp a b "en_US.UTF-8"))) ("Ivan" "Вони" "Івана" "Термін") I suppose users should get result native to their languages even though others may get another order. ^ permalink raw reply [flat|nested] 27+ messages in thread
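The examples above suggest that the order depends on the LOCALE argument or, when it is omitted, on the locale Emacs happens to run in. A minimal sketch of pinning the locale explicitly follows. `my/sort-strings-in-locale' is a hypothetical helper, not part of Org or Emacs, and the named locale is assumed to be installed on the system; otherwise the collation rules may silently not apply.

```elisp
;; Hypothetical helper (not part of Org or Emacs): sort with an
;; explicitly chosen locale instead of the one inherited from the
;; environment.
(defun my/sort-strings-in-locale (strings locale)
  "Return a sorted copy of STRINGS using the collation rules of LOCALE.
LOCALE is a locale name such as \"C\" or \"uk_UA.UTF-8\"."
  ;; `sort' is destructive on lists, so work on a copy.
  (sort (copy-sequence strings)
        (lambda (a b) (string-collate-lessp a b locale))))

;; The same input may come back in a different order per locale,
;; assuming both locales exist on the system.
(my/sort-strings-in-locale '("Івана" "Ivan" "Термін" "Вони") "uk_UA.UTF-8")
(my/sort-strings-in-locale '("Івана" "Ivan" "Термін" "Вони") "en_US.UTF-8")
```

Copying the list first also avoids the side effect of calling `sort' directly on a variable, as in the `lst' example earlier in the thread.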
* Re: test-org-table/sort-lines: Failing test on macOS 2022-11-23 15:27 ` Max Nikulin 2022-11-23 17:01 ` Max Nikulin @ 2022-11-26 2:05 ` Ihor Radchenko 2022-11-29 16:40 ` Max Nikulin 1 sibling, 1 reply; 27+ messages in thread From: Ihor Radchenko @ 2022-11-26 2:05 UTC (permalink / raw) To: Max Nikulin; +Cc: emacs-orgmode Max Nikulin <manikulin@gmail.com> writes: >> However, I feel a bit lost about what to do on Org side. >> We can put a disclaimer in the manual and all that, but it still feels >> too complex. > > My current suggestion is to provide a fallback to `downcase' in the code > and to explain in the manual that runtime environments (OSes) are not > equal and quality of locale support varies. Emacs heavily depends on > libc in this area. This sounds like something to be adapted to Emacs upstream. I suggested to change `string-collate-lessp' fallback behaviour to use `downcase' when IGNORE-CASE is non-nil. See my last message in bug#59275. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: test-org-table/sort-lines: Failing test on macOS 2022-11-26 2:05 ` Ihor Radchenko @ 2022-11-29 16:40 ` Max Nikulin 2024-04-03 11:40 ` [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) Ihor Radchenko 0 siblings, 1 reply; 27+ messages in thread From: Max Nikulin @ 2022-11-29 16:40 UTC (permalink / raw) To: emacs-orgmode On 26/11/2022 09:05, Ihor Radchenko wrote: > Max Nikulin writes: > > This sounds like something to be adapted to Emacs upstream. > I suggested to change `string-collate-lessp' fallback behaviour to use > `downcase' when IGNORE-CASE is non-nil. See my last message in > bug#59275. I do not share Eli's position "all or nothing". I prefer graceful degradation and best result achievable with reasonable efforts. However either the reason is performance or correctness, both variants are against modification of `string-collate-lessp'. I still think that Org will benefit from a compatibility wrapper with `downcase'. The only additional consideration is that compare function should be configurable. If a user access same files from Linux and macOS then it may be really annoying to get different order of entries in agenda. For most of Linux users it is better to use more smart `string-collate-lessp'. Some care is required to sort entries obtained from multiple buffers in predictable environment (locale, case conversion table). ^ permalink raw reply [flat|nested] 27+ messages in thread
* [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) 2022-11-29 16:40 ` Max Nikulin @ 2024-04-03 11:40 ` Ihor Radchenko 2024-05-05 11:59 ` Ihor Radchenko 0 siblings, 1 reply; 27+ messages in thread From: Ihor Radchenko @ 2024-04-03 11:40 UTC (permalink / raw) To: Max Nikulin; +Cc: emacs-orgmode Max Nikulin <manikulin@gmail.com> writes: >> This sounds like something to be adapted to Emacs upstream. >> I suggested to change `string-collate-lessp' fallback behaviour to use >> `downcase' when IGNORE-CASE is non-nil. See my last message in >> bug#59275. > > I do not share Eli's position "all or nothing". I prefer graceful > degradation and best result achievable with reasonable efforts. > However either the reason is performance or correctness, both variants > are against modification of `string-collate-lessp'. I still think that > Org will benefit from a compatibility wrapper with `downcase'. Unless we have user complaints with real-world use-cases, I am leaning towards keeping things consistent with Emacs. Including Emacs-wide fallback for `string-collate-lessp'. This will make our life easier. Maintaining an Org-specific fallback will (1) cost maintenance time; (2) may confuse users used to global Emacs behaviour; (3) has no clear benefit other than our theoretical discussion. > The only additional consideration is that compare function should be > configurable. If a user access same files from Linux and macOS then it > may be really annoying to get different order of entries in agenda. For > most of Linux users it is better to use more smart > `string-collate-lessp'. Some care is required to sort entries obtained > from multiple buffers in predictable environment (locale, case > conversion table). I agree. We can introduce a new customization - `org-string-sort-function' that will be used across Org mode to sort user text. 
It would be even better to allow smart sort function that depends on document #+language, but I do not see an easy way to implement such feature - `string-collate-lessp' does accept LOCALE argument, but I have no idea how to link #+LANGUAGE to locale deterministically. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) 2024-04-03 11:40 ` [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS) Ihor Radchenko @ 2024-05-05 11:59 ` Ihor Radchenko 2024-05-07 11:06 ` [DISCUSSION] Sorting strings in Org mode vs. system locale Max Nikulin 0 siblings, 1 reply; 27+ messages in thread From: Ihor Radchenko @ 2024-05-05 11:59 UTC (permalink / raw) To: Max Nikulin; +Cc: emacs-orgmode [-- Attachment #1: Type: text/plain, Size: 769 bytes --] Ihor Radchenko <yantar92@posteo.net> writes: >> The only additional consideration is that compare function should be >> configurable. If a user access same files from Linux and macOS then it >> may be really annoying to get different order of entries in agenda. For >> most of Linux users it is better to use more smart >> `string-collate-lessp'. Some care is required to sort entries obtained >> from multiple buffers in predictable environment (locale, case >> conversion table). > > I agree. We can introduce a new customization - > `org-string-sort-function' that will be used across Org mode to sort > user text. See the attached tentative patch. I added a customization, made everything in Org obey it, and provided some default options for MacOS users. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-org-New-Org-wide-custom-option-org-sort-function.patch --] [-- Type: text/x-patch, Size: 13446 bytes --] From dbc3929d8c7a26da3bf31fb52a651da68d1f733b Mon Sep 17 00:00:00 2001 Message-ID: <dbc3929d8c7a26da3bf31fb52a651da68d1f733b.1714910323.git.yantar92@posteo.net> From: Ihor Radchenko <yantar92@posteo.net> Date: Sun, 5 May 2024 14:37:52 +0300 Subject: [PATCH] org: New Org-wide custom option `org-sort-function' * lisp/org-macs.el (org-sort-function): New customization defining how Org mode should sort headlines, table lines, agenda lines, etc. 
(org-string<): (org-string<=): (org-string>=): (org-string>): Use the new customization. (org-string<>): Add docstring. (org-sort-function-downcase): New helper function to help users on MacOS where `string-collate-lessp' is not reliable. * lisp/oc-basic.el (org-cite-basic--field-less-p): * lisp/org-agenda.el (org-cmp-category): (org-cmp-alpha): * lisp/org-list.el (org-sort-list): * lisp/org-mouse.el (org-mouse-list-options-menu): * lisp/org-table.el (org-table-sort-lines): * lisp/org.el (org-tags-sort-function): (org-sort-entries): * lisp/ox-publish.el (org-publish-sitemap): Honor the new customization. * lisp/org-mouse.el (org-mouse-tag-menu): (org-mouse-popup-global-menu): * lisp/org-agenda.el (org-cmp-tag): Honor `org-tags-sort-function' and falling back to `org-string<' if note set. * etc/ORG-NEWS (New option controlling how Org mode sorts things ~org-sort-function~): Announce the change. This change aims to standardize the way Org mode performs sorting of user data. In particular, it addresses issues with oddities of string collation rules on MacOS and tricky language environments like Turkish. Link: https://orgmode.org/list/87jzleptcs.fsf@localhost --- etc/ORG-NEWS | 20 ++++++++++++++ lisp/oc-basic.el | 2 +- lisp/org-agenda.el | 12 ++++----- lisp/org-list.el | 2 +- lisp/org-macs.el | 66 +++++++++++++++++++++++++++++++++++++--------- lisp/org-mouse.el | 13 +++++---- lisp/org-table.el | 4 +-- lisp/org.el | 6 ++--- lisp/ox-publish.el | 9 +++---- 9 files changed, 98 insertions(+), 36 deletions(-) diff --git a/etc/ORG-NEWS b/etc/ORG-NEWS index 3c597db40..af88febb1 100644 --- a/etc/ORG-NEWS +++ b/etc/ORG-NEWS @@ -710,6 +710,26 @@ any more. Run ~org-ctags-enable~ to setup hooks and advices: #+end_src ** New and changed options +*** New option controlling how Org mode sorts things ~org-sort-function~ + +Sorting of agenda items, tables, menus, headlines, etc can now be +controlled using a new custom option ~org-sort-function~. 
+ +By default, Org mode sorts things according to the operation system +language. However, language sorting rules may or may not produce good +results depending on the use case. For example, multi-language +documents may be sorted weirdly when sorting rules for system language +are applied on the text written using different language. Also, some +operations systems (e.g. MacOS), do not provide accurate string +sorting rules. + +Org mode provides 4 possible values for ~org-sort-function~: +1. (default) Sort using system language rules. +2. Sort using dumb string comparison. It is the most reliable option. +3. Sort case-insensitively, making use of UTF case conversion. This + may work better for mixed-language documents and on MacOS. +4. Custom function, if the above does not fit the needs. + *** =ob-latex= now uses a new option ~org-babel-latex-process-alist~ to generate png output Previously, =ob-latex= used ~org-preview-latex-default-process~ from diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el index 8959bb065..6e3142fa1 100644 --- a/lisp/oc-basic.el +++ b/lisp/oc-basic.el @@ -680,7 +680,7 @@ (defun org-cite-basic--field-less-p (field info) INFO is the export state, as a property list." (and field (lambda (a b) - (string-collate-lessp + (org-string< (org-cite-basic--get-field field a info 'raw) (org-cite-basic--get-field field b info 'raw) nil t)))) diff --git a/lisp/org-agenda.el b/lisp/org-agenda.el index 93c6acef2..05d2f94c0 100644 --- a/lisp/org-agenda.el +++ b/lisp/org-agenda.el @@ -7489,8 +7489,8 @@ (defsubst org-cmp-category (a b) "Compare the string values of categories of strings A and B." (let ((ca (or (get-text-property (1- (length a)) 'org-category a) "")) (cb (or (get-text-property (1- (length b)) 'org-category b) ""))) - (cond ((string-lessp ca cb) -1) - ((string-lessp cb ca) +1)))) + (cond ((org-string< ca cb) -1) + ((org-string< cb ca) +1)))) (defsubst org-cmp-todo-state (a b) "Compare the todo states of strings A and B." 
@@ -7536,8 +7536,8 @@ (defsubst org-cmp-alpha (a b) (cond ((not (or ta tb)) nil) ((not ta) +1) ((not tb) -1) - ((string-lessp ta tb) -1) - ((string-lessp tb ta) +1)))) + ((org-string< ta tb) -1) + ((org-string< tb ta) +1)))) (defsubst org-cmp-tag (a b) "Compare the string values of the first tags of A and B." @@ -7546,8 +7546,8 @@ (defsubst org-cmp-tag (a b) (cond ((not (or ta tb)) nil) ((not ta) +1) ((not tb) -1) - ((string-lessp ta tb) -1) - ((string-lessp tb ta) +1)))) + ((funcall (or org-tags-sort-function #'org-string<) ta tb) -1) + ((funcall (or org-tags-sort-function #'org-string<) tb ta) +1)))) (defsubst org-cmp-time (a b) "Compare the time-of-day values of strings A and B." diff --git a/lisp/org-list.el b/lisp/org-list.el index fca3758c8..d7559d2a7 100644 --- a/lisp/org-list.el +++ b/lisp/org-list.el @@ -2979,7 +2979,7 @@ (defun org-sort-list (error "Missing key extractor")))) (sort-func (cond - ((= dcst ?a) #'string-collate-lessp) + ((= dcst ?a) #'org-string<) ((= dcst ?f) (or compare-func (and interactive? 
diff --git a/lisp/org-macs.el b/lisp/org-macs.el index 1254ddb54..c3bef66cd 100644 --- a/lisp/org-macs.el +++ b/lisp/org-macs.el @@ -113,7 +113,6 @@ (declare-function org-fold-save-outline-visibility "org-fold" (use-markers &rest (declare-function org-fold-next-visibility-change "org-fold" (&optional pos limit ignore-hidden-p previous-p)) (declare-function org-fold-core-with-forced-fontification "org-fold" (&rest body)) (declare-function org-fold-folded-p "org-fold" (&optional pos limit ignore-hidden-p previous-p)) -(declare-function string-collate-lessp "org-compat" (s1 s2 &optional locale ignore-case)) (declare-function org-time-convert-to-list "org-compat" (time)) (declare-function org-buffer-text-pixel-width "org-compat" ()) @@ -982,20 +981,63 @@ (defun org-uuidgen-p (s) \f ;;; String manipulation -(defun org-string< (a b) - (string-collate-lessp a b)) - -(defun org-string<= (a b) - (or (string= a b) (string-collate-lessp a b))) - -(defun org-string>= (a b) - (not (string-collate-lessp a b))) - -(defun org-string> (a b) +(defcustom org-sort-function #'string-collate-lessp + "Function used to compare strings when sorting. +This function affects how Org mode sorts headlines, agenda items, +table lines, etc. + +The function must accept either 2 or 4 arguments: strings to compare +and, optionally, LOCALE and IGNORE-CASE - locale name and flag to make +comparison case-insensitive. + +The default value uses sorting rules according to OS language. Users +who want to make sorting language-independent, may customize the value +to `string-lessp'. + +Note that some string sorting rules are known to be not accurate on +MacOS. See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=59275. +MacOS users may customize the value to `org-sort-function-downcase'." + :group 'org + :package-version '(Org . 
"9.7") + :type '(choice + (const :tag "According to OS language" string-collate-lessp) + (const :tag "Using string comparison" string-lessp) + (const :tag "Case-insensitive string comparison" org-sort-function-downcase) + (function :tag "Custom function"))) + +(defun org-sort-function-downcase (a b &optional _ _) + "Return non-nil when downcased string A < string B. Ignore case." + (string-lessp (downcase a) (downcase b))) + +(defun org-string< (a b &optional locale ignore-case) + "Return non-nil when string A < string B. +LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison +ignore case." + (if (= 4 (cdr (func-arity org-sort-function))) + (funcall org-sort-function a b locale ignore-case) + (funcall org-sort-function a b))) + +(defun org-string<= (a b &optional locale ignore-case) + "Return non-nil when string A <= string B. +LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison +ignore case." + (or (string= a b) (org-string< a b locale ignore-case))) + +(defun org-string>= (a b &optional locale ignore-case) + "Return non-nil when string A >= string B. +LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison +ignore case." + (not (org-string< a b locale ignore-case))) + +(defun org-string> (a b &optional locale ignore-case) + "Return non-nil when string A > string B. +LOCALE is the locale name. IGNORE-CASE, when non-nil, makes comparison +ignore case." (and (not (string= a b)) - (not (string-collate-lessp a b)))) + (not (org-string< a b locale ignore-case)))) (defun org-string<> (a b) + "Return non-nil when string A and string B are not equal." 
(not (string= a b))) (defsubst org-trim (s &optional keep-lead) diff --git a/lisp/org-mouse.el b/lisp/org-mouse.el index 2904bad1f..0b1ddaa6e 100644 --- a/lisp/org-mouse.el +++ b/lisp/org-mouse.el @@ -426,13 +426,14 @@ (defun org-mouse-tag-menu () ;todo (append (let ((tags (org-get-tags nil t))) (org-mouse-keyword-menu - (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp) + (sort (mapcar #'car (org-get-buffer-tags)) + (or org-tags-sort-function #'org-string<)) (lambda (tag) (org-mouse-set-tags (sort (if (member tag tags) (delete tag tags) (cons tag tags)) - #'string-lessp))) + (or org-tags-sort-function #'org-string<)))) (lambda (tag) (member tag tags)) )) '("--" @@ -473,7 +474,7 @@ (defun org-mouse-list-options-menu (alloptions &optional function) (sort (if (member ',name ',options) (delete ',name ',options) (cons ',name ',options)) - 'string-lessp) + #'org-string<) " ") nil nil nil 1) (when (functionp ',function) (funcall ',function))) @@ -502,7 +503,8 @@ (defun org-mouse-popup-global-menu () ["Check TODOs" org-show-todo-tree t] ("Check Tags" ,@(org-mouse-keyword-menu - (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp) + (sort (mapcar #'car (org-get-buffer-tags)) + (or org-tags-sort-function #'org-string<)) (lambda (tag) (org-tags-sparse-tree nil tag))) "--" ["Custom Tag ..." org-tags-sparse-tree t]) @@ -512,7 +514,8 @@ (defun org-mouse-popup-global-menu () ["Display TODO List" org-todo-list t] ("Display Tags" ,@(org-mouse-keyword-menu - (sort (mapcar #'car (org-get-buffer-tags)) #'string-lessp) + (sort (mapcar #'car (org-get-buffer-tags)) + (or org-tags-sort-function #'org-string<)) (lambda (tag) (org-tags-view nil tag))) "--" ["Custom Tag ..." 
org-tags-view t]) diff --git a/lisp/org-table.el b/lisp/org-table.el index 0c2dc27ed..45fe4d0fa 100644 --- a/lisp/org-table.el +++ b/lisp/org-table.el @@ -4637,8 +4637,8 @@ (defun org-table-sort-lines (predicate (cl-case sorting-type ((?n ?N ?t ?T) #'<) - ((?a ?A) (if with-case #'string-collate-lessp - (lambda (s1 s2) (string-collate-lessp s1 s2 nil t)))) + ((?a ?A) (if with-case #'org-string< + (lambda (s1 s2) (org-string< s1 s2 nil t)))) ((?f ?F) (or compare-func (and interactive? diff --git a/lisp/org.el b/lisp/org.el index 20879685c..f9a9332aa 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -2944,8 +2944,8 @@ (defcustom org-tags-sort-function nil :group 'org-tags :type '(choice (const :tag "No sorting" nil) - (const :tag "Alphabetical" string-collate-lessp) - (const :tag "Reverse alphabetical" org-string-collate-greaterp) + (const :tag "Alphabetical" org-string<) + (const :tag "Reverse alphabetical" org-string>) (function :tag "Custom function" nil))) (defvar org-tags-history nil @@ -7955,7 +7955,7 @@ (defun org-sort-entries (t (error "Invalid sorting type `%c'" sorting-type)))) nil (cond - ((= dcst ?a) 'string-collate-lessp) + ((= dcst ?a) #'org-string<) ((= dcst ?f) (or compare-func (and interactive? 
diff --git a/lisp/ox-publish.el b/lisp/ox-publish.el index 3e526b813..1b623ce9f 100644 --- a/lisp/ox-publish.el +++ b/lisp/ox-publish.el @@ -794,17 +794,14 @@ (defun org-publish-sitemap (project &optional sitemap-filename) (concat (file-name-directory b) (org-publish-find-title b project)) b))) - (setq retval - (if ignore-case - (not (string-lessp (upcase B) (upcase A))) - (not (string-lessp B A)))))) + (setq retval (org-string<= A B nil ignore-case)))) ((or `anti-chronologically `chronologically) (let* ((adate (org-publish-find-date a project)) (bdate (org-publish-find-date b project))) (setq retval (not (if (eq sort-files 'chronologically) - (time-less-p bdate adate) - (time-less-p adate bdate)))))) + (time-less-p bdate adate) + (time-less-p adate bdate)))))) (`nil nil) (_ (user-error "Invalid sort value %s" sort-files))) ;; Directory-wise wins: -- 2.45.0 [-- Attachment #3: Type: text/plain, Size: 224 bytes --] -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply related [flat|nested] 27+ messages in thread
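The arity check in the patch's `org-string<' can be sketched in isolation as below. `my/call-comparator' is a hypothetical name used only for illustration. `func-arity' returns a cons (MIN . MAX), and MAX is the symbol `many' for functions with a &rest argument; since `=' only accepts numbers, this sketch uses `eql', so that such a comparator falls through to the 2-argument call instead of signalling an error. That difference from the patch is my assumption about a corner case, not something discussed in the thread.

```elisp
;; Hypothetical illustration of dispatching on comparator arity.
(defun my/call-comparator (fn a b &optional locale ignore-case)
  "Call comparator FN on strings A and B.
Pass LOCALE and IGNORE-CASE only when FN accepts exactly 4 arguments
at most; otherwise call FN with A and B alone."
  ;; (cdr (func-arity fn)) is a number, or the symbol `many' for
  ;; &rest functions.  `eql' is safe for both, unlike `='.
  (if (eql 4 (cdr (func-arity fn)))
      (funcall fn a b locale ignore-case)
    (funcall fn a b)))

;; `string-lessp' takes exactly 2 arguments, so the extras are dropped.
(my/call-comparator #'string-lessp "a" "b" nil t)

;; A 4-argument comparator receives LOCALE and IGNORE-CASE.
(my/call-comparator
 (lambda (a b &optional _locale ignore-case)
   (if ignore-case
       (string-lessp (downcase a) (downcase b))
     (string-lessp a b)))
 "a" "B" nil t)
```

With the patch applied, a user who prefers language-independent order would presumably set `org-sort-function' to `string-lessp', as the proposed docstring suggests.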
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale 2024-05-05 11:59 ` Ihor Radchenko @ 2024-05-07 11:06 ` Max Nikulin 2024-05-07 13:09 ` Ihor Radchenko 0 siblings, 1 reply; 27+ messages in thread From: Max Nikulin @ 2024-05-07 11:06 UTC (permalink / raw) To: emacs-orgmode On 05/05/2024 18:59, Ihor Radchenko wrote: > Ihor Radchenko writes: > >>> If a user access same files from Linux and macOS then it >>> may be really annoying to get different order of entries in agenda. For >>> most of Linux users it is better to use more smart >>> `string-collate-lessp'. Some care is required to sort entries obtained >>> from multiple buffers in predictable environment (locale, case >>> conversion table). >> >> I agree. We can introduce a new customization - >> `org-string-sort-function' that will be used across Org mode to sort >> user text. > > See the attached tentative patch. > I added a customization, made everything in Org obey it, and provided > some default options for MacOS users. Contrary to Eli, I still think that there are enough locales where completely disregarding IGNORE-CASE is worse than fallback to `downcase' when IGNORE-CASE is t. Perhaps some kind of normalization (NFD?) may improve results further. I consider the following as a kind of graceful degradation (defun org-sort-function-fallback-downcase (a b &optional LOCALE IGNORE-CASE) (if ignore-case (string-collate-lessp (downcase a) (downcase b) locale ignore-case) (string-collate-lessp a b locale ignore-case))) (defcustom org-sort-function (if (string-collate-lessp "a" "B" "C" t) #'string-collate-lessp #'org-sort-function-fallback-downcase)) I would consider a setter function for `org-sort-function' to avoid branches based of `func-arity' in `org-string<'. I see a little point in purely downcase comparator `org-sort-function-downcase'. ^ permalink raw reply [flat|nested] 27+ messages in thread
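A runnable variant of the fallback proposed above might look as follows. Emacs Lisp symbols are case-sensitive, so the parameters are spelled in lower case here to match their use in the body. The name `my/sort-fallback-downcase' is hypothetical, and this is only a sketch of the idea, not the code that was eventually committed.

```elisp
;; Hypothetical sketch of the "graceful degradation" comparator.
(defun my/sort-fallback-downcase (a b &optional locale ignore-case)
  "Return non-nil when string A sorts before string B.
Like `string-collate-lessp', but downcase A and B first when
IGNORE-CASE is non-nil, for systems whose libc ignores that flag."
  (if ignore-case
      (string-collate-lessp (downcase a) (downcase b) locale ignore-case)
    (string-collate-lessp a b locale ignore-case)))

;; Probe from the thread: non-nil where IGNORE-CASE is honored,
;; reported as nil on macOS.
(string-collate-lessp "a" "B" "C" t)

;; The wrapper should not depend on libc honoring IGNORE-CASE, because
;; both strings are already lower case when they are compared.
(my/sort-fallback-downcase "a" "B" "C" t)
```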
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale 2024-05-07 11:06 ` [DISCUSSION] Sorting strings in Org mode vs. system locale Max Nikulin @ 2024-05-07 13:09 ` Ihor Radchenko 2024-05-07 16:47 ` Max Nikulin 0 siblings, 1 reply; 27+ messages in thread From: Ihor Radchenko @ 2024-05-07 13:09 UTC (permalink / raw) To: Max Nikulin; +Cc: emacs-orgmode Max Nikulin <manikulin@gmail.com> writes: > I consider the following as a kind of graceful degradation > > (defun org-sort-function-fallback-downcase > (a b &optional LOCALE IGNORE-CASE) > (if ignore-case > (string-collate-lessp (downcase a) (downcase b) locale ignore-case) > (string-collate-lessp a b locale ignore-case))) It is indeed better than `org-sort-function-downcase'. > (defcustom org-sort-function > (if (string-collate-lessp "a" "B" "C" t) > #'string-collate-lessp > #'org-sort-function-fallback-downcase)) No. Let's be consistent with Emacs here. > I would consider a setter function for `org-sort-function' to avoid > branches based of `func-arity' in `org-string<'. Setter is not reliable when setq is used, so I prefer arity check. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale 2024-05-07 13:09 ` Ihor Radchenko @ 2024-05-07 16:47 ` Max Nikulin 2024-05-11 9:38 ` Ihor Radchenko 0 siblings, 1 reply; 27+ messages in thread From: Max Nikulin @ 2024-05-07 16:47 UTC (permalink / raw) To: emacs-orgmode On 07/05/2024 20:09, Ihor Radchenko wrote: > Max Nikulin writes: > >> I consider the following as a kind of graceful degradation >> >> (defun org-sort-function-fallback-downcase >> (a b &optional LOCALE IGNORE-CASE) >> (if ignore-case >> (string-collate-lessp (downcase a) (downcase b) locale ignore-case) >> (string-collate-lessp a b locale ignore-case))) > > It is indeed better than `org-sort-function-downcase'. `compare-strings' with upcase conversion under the hood may be an alternative. >> I would consider a setter function for `org-sort-function' to avoid >> branches based of `func-arity' in `org-string<'. > > Setter is not reliable when setq is used, so I prefer arity check. It bothers me as well. Another idea is to require the 2 optional arguments and thus wrappers for 2-argument functions. My expectation is that an extra function call may still be cheaper. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [DISCUSSION] Sorting strings in Org mode vs. system locale 2024-05-07 16:47 ` Max Nikulin @ 2024-05-11 9:38 ` Ihor Radchenko 0 siblings, 0 replies; 27+ messages in thread From: Ihor Radchenko @ 2024-05-11 9:38 UTC (permalink / raw) To: Max Nikulin; +Cc: emacs-orgmode Max Nikulin <manikulin@gmail.com> writes: >> >>> I consider the following as a kind of graceful degradation >>> >>> (defun org-sort-function-fallback-downcase >>> (a b &optional LOCALE IGNORE-CASE) >>> (if ignore-case >>> (string-collate-lessp (downcase a) (downcase b) locale ignore-case) >>> (string-collate-lessp a b locale ignore-case))) >> >> It is indeed better than `org-sort-function-downcase'. > > `compare-strings' with upcase conversion under the hood may be an > alternative. Applied, onto main. https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=413192698 I replaced the two fallback variants with #'string< and custom downcase function with a single fallback that uses `compare-strings'. Closed. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 27+ messages in thread
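For readers unfamiliar with `compare-strings', a case-insensitive "less than" built on it could be sketched like this. `my/string-lessp-ignore-case' is a hypothetical name, and the sketch only illustrates the technique; it is not the code from the commit linked above.

```elisp
;; Hypothetical sketch: case-insensitive comparison that does not rely
;; on libc collation at all.
(defun my/string-lessp-ignore-case (a b)
  "Return non-nil when string A sorts before string B, ignoring case."
  ;; `compare-strings' returns t when the strings are equal, a negative
  ;; number when A is less, and a positive number when A is greater.
  ;; A non-nil 7th argument makes the comparison ignore case.
  (let ((result (compare-strings a nil nil b nil nil t)))
    (and (integerp result) (< result 0))))

(sort (list "c" "B" "a") #'my/string-lessp-ignore-case)
```

Because the comparison goes through Emacs' own case tables rather than the system locale, the resulting order should be the same on Linux and macOS, at the cost of ignoring language-specific collation rules.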