unofficial mirror of emacs-orgmode@gnu.org
 help / color / mirror / code / Atom feed
* [Patch] to correctly sort the items with emphasis marks in a list
@ 2021-04-02 18:15 Juan Manuel Macías
  2021-04-09 22:28 ` Nicolas Goaziou
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-02 18:15 UTC (permalink / raw)
  To: orgmode

[-- Attachment #1: Type: text/plain, Size: 321 bytes --]

Hi all,

I have noticed that a couple of (spurious?) spaces in a `format'
expression of `org-sort-remove-invisible' is causing `org-sort-list' not
to sort correctly (alphabetically) the items that contain emphasis marks
in a plain list. I propose this very simple patch to fix that problem.

Best regards,

Juan Manuel 


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: org-sort-remove-invisible_remove_spaces.patch --]
[-- Type: text/x-patch, Size: 494 bytes --]

diff --git a/lisp/org.el b/lisp/org.el
index 04da1afcd..d10cc2f5c 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8114,7 +8114,7 @@ Optional argument WITH-CASE means sort case-sensitively."
   (replace-regexp-in-string
    org-verbatim-re (lambda (m) (format "%s " (match-string 4 m)))
    (replace-regexp-in-string
-    org-emph-re (lambda (m) (format " %s " (match-string 4 m)))
+    org-emph-re (lambda (m) (format "%s" (match-string 4 m)))
     (org-link-display-format s)
     t t) t t))
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-02 18:15 [Patch] to correctly sort the items with emphasis marks in a list Juan Manuel Macías
@ 2021-04-09 22:28 ` Nicolas Goaziou
  2021-04-10  0:01   ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-09 22:28 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode

Hello,

Juan Manuel Macías <maciaschain@posteo.net> writes:

> I have noticed that a couple of (spurious?) spaces in a `format'
> expression of `org-sort-remove-invisible' is causing `org-sort-list' not
> to sort correctly (alphabetically) the items that contain emphasis marks
> in a plain list. I propose this very simple patch to fix that problem.

Thank you.

Could you apply the same fix to the `org-verbatim-re' match above, and
provide an appropriate commit message?

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-09 22:28 ` Nicolas Goaziou
@ 2021-04-10  0:01   ` Juan Manuel Macías
  2021-04-10 10:19     ` Nicolas Goaziou
  2021-04-17 13:27     ` Maxim Nikulin
  0 siblings, 2 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-10  0:01 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: orgmode

[-- Attachment #1: Type: text/plain, Size: 331 bytes --]

Hellow Nicolas:

Nicolas Goaziou writes:

> Thank you.
>
> Could you apply the same fix to the `org-verbatim-re' match above, and
> provide an appropriate commit message?

Done! I've attached the corrected patch. Sorry for the flaws in me
previous patch: I'm a bit of a novice at submitting patches...

Best regards,

Juan Manuel


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Remove-spurious-spaces-in-org-sort-remove-invisible.patch --]
[-- Type: text/x-patch, Size: 1172 bytes --]

From ab104bb079e3e32d622a6ff53824b5047dc25c14 Mon Sep 17 00:00:00 2001
From: Juan Manuel Macias <maciaschain@posteo.net>
Date: Fri, 2 Apr 2021 21:20:17 +0200
Subject: [PATCH] Remove spurious spaces in org-sort-remove-invisible

Some spurious spaces in org-sort-remove-invisible is causing
org-sort-list not to sort correctly (alphabetically) the items that
contain emphasis marks in a plain list
---
 lisp/org.el | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 675a614e2..74e2afac9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8113,9 +8113,9 @@ Optional argument WITH-CASE means sort case-sensitively."
   "Remove invisible part of links and emphasis markers from string S."
   (remove-text-properties 0 (length s) org-rm-props s)
   (replace-regexp-in-string
-   org-verbatim-re (lambda (m) (format "%s " (match-string 4 m)))
+   org-verbatim-re (lambda (m) (format "%s" (match-string 4 m)))
    (replace-regexp-in-string
-    org-emph-re (lambda (m) (format " %s " (match-string 4 m)))
+    org-emph-re (lambda (m) (format "%s" (match-string 4 m)))
     (org-link-display-format s)
     t t) t t))
 
-- 
2.26.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-10  0:01   ` Juan Manuel Macías
@ 2021-04-10 10:19     ` Nicolas Goaziou
  2021-04-10 11:41       ` Juan Manuel Macías
  2021-04-17 13:27     ` Maxim Nikulin
  1 sibling, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-10 10:19 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode

Hello,

Juan Manuel Macías <maciaschain@posteo.net> writes:

> Done! I've attached the corrected patch. Sorry for the flaws in me
> previous patch: I'm a bit of a novice at submitting patches...

No problem. Thank you.

Do you have a simple test case to reproduce the problem? Currently
sorting the following trivial lists causes no issue:

    - b
    - *a*

and 

    - *b*
    - a

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-10 10:19     ` Nicolas Goaziou
@ 2021-04-10 11:41       ` Juan Manuel Macías
  2021-04-13 17:31         ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-10 11:41 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: orgmode

Hello Nicolas,

Nicolas Goaziou writes:

> Do you have a simple test case to reproduce the problem? Currently
> sorting the following trivial lists causes no issue:
>
>     - b
>     - *a*
>
> and
>
>     - *b*
>     - a

Consider this (unordered) list:

- a
- b
- /v/
- /a/

The current result is wrong:

- /a/
- /v/
- a
- b

With the spaces removed in `org-sort-remove-invisible', this would be the
expected result[1]:

- a
- /a/
- b
- /v/

[1] According to the National Information Standards Organization:
"Different typefaces (italic, boldface, blackletter, etc.) do not affect
the arrangement of letters." (see p. 3 on:
https://www.niso.org/sites/default/files/2017-08/tr03.pdf)

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-10 11:41       ` Juan Manuel Macías
@ 2021-04-13 17:31         ` Maxim Nikulin
  2021-04-13 19:08           ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-13 17:31 UTC (permalink / raw)
  To: emacs-orgmode

On 10/04/2021 18:41, Juan Manuel Macías wrote:
> Nicolas Goaziou writes:
> 
>> Do you have a simple test case to reproduce the problem? Currently
>> sorting the following trivial lists causes no issue:
>>
>>      - b
>>      - *a*
>>
>> and
>>
>>      - *b*
>>      - a
> 
> The current result is wrong:
> 
> - /a/
> - /v/
> - a
> - b

I could reproduce such result but I am in doubt if it is a reason to 
merge the patch. I believe, the following behavior is almost expected

list.org:
- v
- /v/
- /a/
- a

LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \
   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
   list.org

(org-sort-list t ?a)

- /a/
- /v/
- a
- v

LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \
   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
   list.org

(org-sort-list t ?a)

- /a/
- a
- /v/
- v

Collation rules depend on language. The question is if emphasized 
variant should be sorted first.

P.S. The thread is broken. Some new messages do not have proper 
In-Reply-To header. Original question was not referenced in the message 
with patch as well: https://orgmode.org/list/87blbac0k0.fsf@posteo.net/



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-13 17:31         ` Maxim Nikulin
@ 2021-04-13 19:08           ` Juan Manuel Macías
  2021-04-14 15:42             ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-13 19:08 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: orgmode

Hi, Maxim,

Thanks for clearing things up. So it seems obvious that the root
of the problem is in the locales and the collation rules.

The situation is that with locales configured for Spanish from Spain
(en_ES.UTF-8) the list is not ordered correctly, unless those three
spaces from org-sort-remove-invisible are removed. But I couldn't say
why or if that would be appropriate as a patch...

Regarding the collation rule of forms with emphasis, at least in Spanish
these should come after the non-emphasized forms. I don't know if this
is a general rule also for other languages (at least it seems more
natural that the forms without emphasis are placed before). 

Best regards,

Juan Manuel

Maxim Nikulin writes:

> I could reproduce such result but I am in doubt if it is a reason to
> merge the patch. I believe, the following behavior is almost expected
>
> list.org:
> - v
> - /v/
> - /a/
> - a
>
> LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> (org-sort-list t ?a)
>
> - /a/
> - /v/
> - a
> - v
>
> LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> (org-sort-list t ?a)
>
> - /a/
> - a
> - /v/
> - v
>
> Collation rules depend on language. The question is if emphasized
> variant should be sorted first.
>
> P.S. The thread is broken. Some new messages do not have proper
> In-Reply-To header. Original question was not referenced in the
> message with patch as well:
> https://orgmode.org/list/87blbac0k0.fsf@posteo.net/



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-13 19:08           ` Juan Manuel Macías
@ 2021-04-14 15:42             ` Maxim Nikulin
  2021-04-14 15:51               ` Maxim Nikulin
  2021-04-14 17:07               ` Juan Manuel Macías
  0 siblings, 2 replies; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-14 15:42 UTC (permalink / raw)
  To: emacs-orgmode

On 14/04/2021 02:08, Juan Manuel Macías wrote:
> 
> The situation is that with locales configured for Spanish from Spain
> (en_ES.UTF-8) the list is not ordered correctly, unless those three
> spaces from org-sort-remove-invisible are removed. But I couldn't say
> why or if that would be appropriate as a patch...

Did not you get a warning like the following one?

(process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C 
library.
	Using the fallback 'C' locale.

> Regarding the collation rule of forms with emphasis, at least in Spanish
> these should come after the non-emphasized forms. I don't know if this
> is a general rule also for other languages (at least it seems more
> natural that the forms without emphasis are placed before).

LANG=es_ES.UTF-8 \
   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
   list.org

- a
- /a/
- v
- /v/

So it works accordingly to your expectation. Emacs 25.2.2, Ubuntu-18.04 
container.

I have generated es_ES.UTF-8 locale using

     dpkg-reconfigure locales

Depending on linux distribution you run, the locale may be ready to use 
or not. I tend to think that in minimal environment of virtual machine 
it was missed.

I had an idea to add a test for sorting of items including emphasized 
ones but test-org-list/sort forces C locale. Maybe it was done to avoid 
failures due to missed locale.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-14 15:42             ` Maxim Nikulin
@ 2021-04-14 15:51               ` Maxim Nikulin
  2021-04-14 17:07               ` Juan Manuel Macías
  1 sibling, 0 replies; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-14 15:51 UTC (permalink / raw)
  To: emacs-orgmode

On 14/04/2021 22:42, Maxim Nikulin wrote:
> I have generated es_ES.UTF-8 locale using
> 
>      dpkg-reconfigure locales
> 
> Depending on linux distribution you run, the locale may be ready to use 
> or not. I tend to think that in minimal environment of virtual machine 
> it was missed.

I forgot to add a command how to get the list of generated locales

      locale --all-locales

locales-2.27-3ubuntu1.4 package is built from glibc sources.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-14 15:42             ` Maxim Nikulin
  2021-04-14 15:51               ` Maxim Nikulin
@ 2021-04-14 17:07               ` Juan Manuel Macías
  2021-04-14 21:36                 ` Juan Manuel Macías
  2021-04-15 14:58                 ` Maxim Nikulin
  1 sibling, 2 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-14 17:07 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: orgmode

Hi Maxim,

Thanks again. In my case, I keep getting the wrong result. In addition
to the test I did in a virtual machine with Fedora, I use Arch Linux in
my every day computers, with locales correctly (I hope) configured as
es_ES.UTF-8 (there was a typo in my previous mail, sorry:
'en_ES.UTF-8'). In my /etc/locale.conf file I have:

LANG=es_ES.UTF-8
LC_ADDRESS=es_ES.UTF-8
LC_IDENTIFICATION=es_ES.UTF-8
LC_MEASUREMENT=es_ES.UTF-8
LC_MONETARY=es_ES.UTF-8
LC_NAME=es_ES.UTF-8
LC_NUMERIC=es_ES.UTF-8
LC_PAPER=es_ES.UTF-8
LC_TELEPHONE=es_ES.UTF-8
LC_TIME=es_ES.UTF-8

And with locale -a, I get a list of enabled locales:

C
en_US.utf8
es_ES.utf8
POSIX

However, I keep getting the wrong result :-( (Arch, Emacs 27.2).

Even with

LANG=en_EN.UTF-8 \
emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
list.org

Maybe the problem is on Arch's side (?), although it was also reproducible
in the test I did with Fedora in virtual machine and Emacs 27. 

Best regards,

Juan Manuel 

Maxim Nikulin writes:

> On 14/04/2021 02:08, Juan Manuel Macías wrote:
>> The situation is that with locales configured for Spanish from Spain
>> (en_ES.UTF-8) the list is not ordered correctly, unless those three
>> spaces from org-sort-remove-invisible are removed. But I couldn't say
>> why or if that would be appropriate as a patch...
>
> Did not you get a warning like the following one?
>
> (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C
> library.
>       Using the fallback 'C' locale.
>
>> Regarding the collation rule of forms with emphasis, at least in Spanish
>> these should come after the non-emphasized forms. I don't know if this
>> is a general rule also for other languages (at least it seems more
>> natural that the forms without emphasis are placed before).
>
> LANG=es_ES.UTF-8 \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> - a
> - /a/
> - v
> - /v/
>
> So it works accordingly to your expectation. Emacs 25.2.2,
> Ubuntu-18.04 container.
>
> I have generated es_ES.UTF-8 locale using
>
>     dpkg-reconfigure locales
>
> Depending on linux distribution you run, the locale may be ready to
> use or not. I tend to think that in minimal environment of virtual
> machine it was missed.
>
> I had an idea to add a test for sorting of items including emphasized
> ones but test-org-list/sort forces C locale. Maybe it was done to
> avoid failures due to missed locale.
>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-14 17:07               ` Juan Manuel Macías
@ 2021-04-14 21:36                 ` Juan Manuel Macías
  2021-04-15 14:58                 ` Maxim Nikulin
  1 sibling, 0 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-14 21:36 UTC (permalink / raw)
  To: orgmode

Hello again,

Since I have an old Emacs version (24.5.1) on my Raspberry, I've done
a few more tests. The situation is the following:

1. On Arch Linux and Emacs 27.2:

- The system locales are set to "es_ES.UTF-8". When, inside Emacs, I do M-x
getenv RET LANG RET, I get "es_ES.UTF-8".

org-sort-list a -> wrong result;

- Launching Emacs from terminal with

LANG=es_ES.UTF-8 \
etc...

org-sort-list a -> wrong result again.

2. On Fedora 32 (virtual machine) and Emacs 27.1

- Everything as in the previous case.

3. On Raspian stretch and Emacs 24.5.1:

- The system locales are set to "es_ES.UTF-8" as well. When, inside a
normal Emacs session, I do M-x getenv RET lang RET, I get "es_ES.UTF-8".

org-sort-list a -> wrong result;

- Launching Emacs from terminal with

LANG=es_ES.UTF-8 \
etc...

In this case the list is ordered correctly, but I observe that similar
forms with or without emphasis are not always ordered in the same way.
Sometimes the non-emphasized forms are ordered before and sometimes
they are ordered after (?).

I don't know if I'm missing something...

Best regards,

Juan Manuel

Juan Manuel Macías writes:

> Hi Maxim,
>
> Thanks again. In my case, I keep getting the wrong result. In addition
> to the test I did in a virtual machine with Fedora, I use Arch Linux in
> my every day computers, with locales correctly (I hope) configured as
> es_ES.UTF-8 (there was a typo in my previous mail, sorry:
> 'en_ES.UTF-8'). In my /etc/locale.conf file I have:
>
> LANG=es_ES.UTF-8
> LC_ADDRESS=es_ES.UTF-8
> LC_IDENTIFICATION=es_ES.UTF-8
> LC_MEASUREMENT=es_ES.UTF-8
> LC_MONETARY=es_ES.UTF-8
> LC_NAME=es_ES.UTF-8
> LC_NUMERIC=es_ES.UTF-8
> LC_PAPER=es_ES.UTF-8
> LC_TELEPHONE=es_ES.UTF-8
> LC_TIME=es_ES.UTF-8
>
> And with locale -a, I get a list of enabled locales:
>
> C
> en_US.utf8
> es_ES.utf8
> POSIX
>
> However, I keep getting the wrong result :-( (Arch, Emacs 27.2).
>
> Even with
>
> LANG=en_EN.UTF-8 \
> emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
> list.org
>
> Maybe the problem is on Arch's side (?), although it was also reproducible
> in the test I did with Fedora in virtual machine and Emacs 27.
>
> Best regards,
>
> Juan Manuel
>
> Maxim Nikulin writes:
>
>> On 14/04/2021 02:08, Juan Manuel Macías wrote:
>>> The situation is that with locales configured for Spanish from Spain
>>> (en_ES.UTF-8) the list is not ordered correctly, unless those three
>>> spaces from org-sort-remove-invisible are removed. But I couldn't say
>>> why or if that would be appropriate as a patch...
>>
>> Did not you get a warning like the following one?
>>
>> (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C
>> library.
>>       Using the fallback 'C' locale.
>>
>>> Regarding the collation rule of forms with emphasis, at least in Spanish
>>> these should come after the non-emphasized forms. I don't know if this
>>> is a general rule also for other languages (at least it seems more
>>> natural that the forms without emphasis are placed before).
>>
>> LANG=es_ES.UTF-8 \
>>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>>   list.org
>>
>> - a
>> - /a/
>> - v
>> - /v/
>>
>> So it works accordingly to your expectation. Emacs 25.2.2,
>> Ubuntu-18.04 container.
>>
>> I have generated es_ES.UTF-8 locale using
>>
>>     dpkg-reconfigure locales
>>
>> Depending on linux distribution you run, the locale may be ready to
>> use or not. I tend to think that in minimal environment of virtual
>> machine it was missed.
>>
>> I had an idea to add a test for sorting of items including emphasized
>> ones but test-org-list/sort forces C locale. Maybe it was done to
>> avoid failures due to missed locale.
>>
>>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-14 17:07               ` Juan Manuel Macías
  2021-04-14 21:36                 ` Juan Manuel Macías
@ 2021-04-15 14:58                 ` Maxim Nikulin
  2021-04-15 18:21                   ` Juan Manuel Macías
  1 sibling, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-15 14:58 UTC (permalink / raw)
  To: emacs-orgmode

I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live 
image running in qemu. Org mode is current git HEAD. It seems that 
something is changed in emacs since locale is correct:

ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
   | LANG=C.UTF-8 sort
- /a/
- /v/
- a
- v
ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
   | LANG=es_ES.UTF-8 sort
- /a/
- a
- /v/
- v

Concerning org-sort-list, I do not see any problem with en_US.UTF-8, 
it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are 
sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8.
I have found some evidence that the problem is on the org side

cat list.el
(message "%S" (sort '("- /v/" "- v" "- a" "- /a/")
                     #'string-collate-lessp))

LC_ALL=C.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- /v/" "- a" "- v")

LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- a" "- /v/" "- v")

LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- a" "- /v/" "- v")

So even string-collate-lessp works as expected.

I'm puzzled why the problem is specific to org-sort-list and namely to 
Spanish locales.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-15 14:58                 ` Maxim Nikulin
@ 2021-04-15 18:21                   ` Juan Manuel Macías
  2021-04-16 14:59                     ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-15 18:21 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: orgmode

Hi Maxim

Maxim Nikulin writes:

> I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live
> image running in qemu. Org mode is current git HEAD. It seems that 
> something is changed in emacs since locale is correct:
>
> ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
>   | LANG=C.UTF-8 sort
> - /a/
> - /v/
> - a
> - v
> ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
>   | LANG=es_ES.UTF-8 sort
> - /a/
> - a
> - /v/
> - v
>
> Concerning org-sort-list, I do not see any problem with en_US.UTF-8,
> it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are 
> sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8.
> I have found some evidence that the problem is on the org side
>
> cat list.el
> (message "%S" (sort '("- /v/" "- v" "- a" "- /a/")
>                     #'string-collate-lessp))
>
> LC_ALL=C.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- /v/" "- a" "- v")
>
> LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- a" "- /v/" "- v")
>
> LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- a" "- /v/" "- v")
>
> So even string-collate-lessp works as expected.
>
> I'm puzzled why the problem is specific to org-sort-list and namely to
> Spanish locales.
>
>

Well that's pretty weird ... I wonder if the (spurious?) spaces in
`org-sort-remove-invisible' that I mentioned at the beginning may be
helpful to keep a track, or if it's just something that masks the
problem.

Anyway, I have noticed that the only space that seems determinant (from
some way that escapes me) is this (where I put the @ sign):

    org-emph-re (lambda (m) (format "@%s " (match-string 4 m)))

Following your examples, it occurred to me to try this, which I don't
know if it is relevant or if I have entered a dead end:

#+begin_src emacs-lisp :tangle list-var.el
  (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa")
			#'string-collate-lessp))
#+end_src

#+begin_src sh
exec 2>&1
LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "- a" "-  v" "- v

#+begin_src sh
exec 2>&1
LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "-  v" "- a" "- v

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-15 18:21                   ` Juan Manuel Macías
@ 2021-04-16 14:59                     ` Maxim Nikulin
  2021-04-16 15:30                       ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-16 14:59 UTC (permalink / raw)
  To: emacs-orgmode

Hi Juan,

On 16/04/2021 01:21, you wrote:
> #+begin_src emacs-lisp :tangle list-var.el
>    (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa")
> 			#'string-collate-lessp))
> #+end_src
> 
> #+begin_src sh
> exec 2>&1
> LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el
> #+end_src
> 
> #+RESULTS:
> : -  a" "- a" "-  v" "- v
> 
> #+begin_src sh
> exec 2>&1
> LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el
> #+end_src
> 
> #+RESULTS:
> : -  a" "-  v" "- a" "- v

You have managed to convince me that despite my first suspects the 
locale on your computer is correct. It is unexpectedly correct and it is 
more correct that most of locales in libc.

However I do not have opinion concerning you patch yet. I have not 
realized what is the proper way to sort list.

Space is significant. At least it may be. Only a few languages have got 
such fix, Spanish is among them 
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5

Notice "collating-symbol <space>"

I have found example of sorting names in a language where woman surname 
usually have additional "a" in comparison to man surname.

printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \
   | LANG=pl_PL.UTF-8 sort
Ivanov Adam
Ivanova Alla
Ivanova Svetlana

es_ES behavior is just as the above example.

printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \
   | LANG=en_US.UTF-8 sort
Ivanova Alla
Ivanov Adam
Ivanova Svetlana

Ukrainian sort works better than Russian one with such example:

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
   | LANG=uk_UA.UTF-8 sort
Иванов Адам
Иванова Алла
Иванова Светлана

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
   | LANG=ru_RU.UTF-8 sort
Иванова Алла
Иванов Адам
Иванова Светлана

Man names are sorted first in such lists. Other cases might exist when 
significant space is undesired.

So sorting is tricky than I expected.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-16 14:59                     ` Maxim Nikulin
@ 2021-04-16 15:30                       ` Maxim Nikulin
  0 siblings, 0 replies; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-16 15:30 UTC (permalink / raw)
  To: emacs-orgmode

On 16/04/2021 21:59, Maxim Nikulin wrote:
> Ukrainian sort works better than Russian one with such example:
> 
> printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
>    | LANG=uk_UA.UTF-8 sort
> Иванов Адам
> Иванова Алла
> Иванова Светлана
> 
> printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
>    | LANG=ru_RU.UTF-8 sort
> Иванова Алла
> Иванов Адам
> Иванова Светлана

Sorry, I forgot to generate uk_UA locale so simple ordering were applied 
without ignoring of spaces in the shared part of locale definitions.

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
   | LANG=uk_UA.UTF-8 sort
Иванова Алла
Иванов Адам
Иванова Светлана

So only pl_PL and es_ES have significant spaces

<U0020> <space>;IGNORE;IGNORE;<U0020>

vs.

<U0020> IGNORE;IGNORE;IGNORE;<U0020> % SPACE

in iso14651_t1_common



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-10  0:01   ` Juan Manuel Macías
  2021-04-10 10:19     ` Nicolas Goaziou
@ 2021-04-17 13:27     ` Maxim Nikulin
  2021-04-18 17:52       ` Juan Manuel Macías
  1 sibling, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-17 13:27 UTC (permalink / raw)
  To: emacs-orgmode

On 10/04/2021 07:01, Juan Manuel Macías wrote:
> Nicolas Goaziou writes:
>> Could you apply the same fix to the `org-verbatim-re' match above, and
>> provide an appropriate commit message?
> 
> Done! I've attached the corrected patch. Sorry for the flaws in me
> previous patch: I'm a bit of a novice at submitting patches...

I have not read yet the following documents to realize what pitfalls 
could be expected due to significant space added to Spanish and Polish 
collation rules in libc.

http://www.unicode.org/reports/tr10/
Unicode® Technical Standard #10
Unicode Collation Algorithm

http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm
Unicode Technical Standard #35
Unicode Locale Data Markup Language (LDML)
Part 5: Collation

In the meanwhile I have realized that org-sort-remove-invisible function 
has rather strange behavior. There are no tests, so I am in doubt if the 
following result is expected and intended (unpatched version)

(org-sort-remove-invisible "- (*bold*)")
"-  bold "

I would expect
"- (bold)"

P.S.

On 10/04/2021 17:19, Nicolas Goaziou wrote:
 >
 > Do you have a simple test case to reproduce the problem? Currently
 > sorting the following trivial lists causes no issue:

After the lengthy discussion the problem is traced down to the following:

- Ensure that you have *compiled* recent enough es_ES.UTF-8 locale 
(description includes 
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5)
- LC_ALL= LANG=es_ES.UTF-8 emacs
- (org-sort-list t ?a) for the following snippet

- /a/
- a
- /v/
- v

Significance of space should be better illustrated with multi-word items 
but I am not ready to provide an impressive example yet.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-17 13:27     ` Maxim Nikulin
@ 2021-04-18 17:52       ` Juan Manuel Macías
  2021-04-18 21:20         ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-18 17:52 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: orgmode

Hi Maxim,

Maxim Nikulin writes:

> On 10/04/2021 07:01, Juan Manuel Macías wrote:
>> Nicolas Goaziou writes:
>>> Could you apply the same fix to the `org-verbatim-re' match above, and
>>> provide an appropriate commit message?
>> Done! I've attached the corrected patch. Sorry for the flaws in me
>> previous patch: I'm a bit of a novice at submitting patches...
>
> I have not read yet the following documents to realize what pitfalls
> could be expected due to significant space added to Spanish and Polish 
> collation rules in libc.
>
> http://www.unicode.org/reports/tr10/
> Unicode® Technical Standard #10
> Unicode Collation Algorithm
>
> http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm
> Unicode Technical Standard #35
> Unicode Locale Data Markup Language (LDML)
> Part 5: Collation
>
> In the meanwhile I have realized that org-sort-remove-invisible
> function has rather strange behavior. There are no tests, so I am in
> doubt if the following result is expected and intended (unpatched
> version)

All this research you've done on spaces and collation rules is very
interesting and enlightening.

Certainly, space is important, as the following list in Spanish is
correctly ordered as:

- lo raro
- lo vano
- localidad

However, with the 'unpatched' version of org-sort-remove invisible I get
this result (which is correct) for this list:

- lo bueno
- lo /bueno/
- lo vano
- lo /vano/

And with the patched version, I get this, which is wrong (!):

- lo bueno
- lo vano
- lo /bueno/
- lo /vano/

It seems that what I was proposing as a patch at the beginning is not,
finally, a viable solution for all contexts...

The problem is that, if the first space is removed, we get this
abnormal result:

#+begin_src emacs-lisp
(org-sort-remove-invisible "- lo /bueno/")
#+end_src

#+RESULTS:
: - lobueno

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-18 17:52       ` Juan Manuel Macías
@ 2021-04-18 21:20         ` Juan Manuel Macías
  2021-04-19  8:33           ` Nicolas Goaziou
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-18 21:20 UTC (permalink / raw)
  To: orgmode

Juan Manuel Macías writes:

> It seems that what I was proposing as a patch at the beginning is not,
> finally, a viable solution for all contexts...
>
> The problem is that, if the first space is removed, we get this
> abnormal result:
>
> #+begin_src emacs-lisp
> (org-sort-remove-invisible "- lo /bueno/")
> #+end_src
>
> #+RESULTS:
> : - lobueno

I wonder if this other approach can be viable or if it is something
crazy: if the spaces in org-sort-remove-invisible are a
problem only for the first emphasis of each item, how about this
fix to org-sort-list? (not modifying org-sort-remove-invisible):

@@ -2940,10 +2940,20 @@ function is being called interactively."
 		     (org-sort-remove-invisible
 		      (buffer-substring (match-end 0) (point-at-eol)))))
 		   ((= dcst ?a)
-		    (funcall case-func
-			     (org-sort-remove-invisible
-			      (buffer-substring
-			       (match-end 0) (point-at-eol)))))
+		    (if (save-excursion
+			  (beginning-of-line)
+			  (forward-char)
+			  (looking-at-p org-emph-re))
+			(replace-regexp-in-string
+			 "\\(^\\)\s+" "\\1"
+			 (funcall case-func
+				  (org-sort-remove-invisible
+				   (buffer-substring
+				    (match-end 0) (point-at-eol)))))
+		      (funcall case-func
+			       (org-sort-remove-invisible
+				(buffer-substring
+				 (match-end 0) (point-at-eol))))))
 		   ((= dcst ?t)
 		    (cond
 		     ;; If it is a timer list, convert timer to seconds


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-18 21:20         ` Juan Manuel Macías
@ 2021-04-19  8:33           ` Nicolas Goaziou
  2021-04-19 12:34             ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-19  8:33 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: orgmode

Hello,

Juan Manuel Macías <maciaschain@posteo.net> writes:

> I wonder if this other approach can be viable or if it is something
> crazy: if the spaces in org-sort-remove-invisible are a
> problem only for the first emphasis of each item, how about this
> fix to org-sort-list? (not modifying org-sort-remove-invisible):

I think `org-sort-remove-invisible' is wrong, so it is the one that
needs to be fixed.

Could you try the following instead?

--8<---------------cut here---------------start------------->8---
(defun org-sort-remove-invisible (s)
  "Remove invisible part of links and emphasis markers from string S."
  (let ((remove-markers
         (lambda (m)
           (concat (match-string 1 m)
                   (match-string 4 m)
                   (match-string 5 m)))))
    (remove-text-properties 0 (length s) org-rm-props s)
    (replace-regexp-in-string
     org-verbatim-re remove-markers
     (replace-regexp-in-string org-emph-re remove-markers (org-link-display-format s) t tt)
     t t)))
--8<---------------cut here---------------end--------------->8---

Regards,
--
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19  8:33           ` Nicolas Goaziou
@ 2021-04-19 12:34             ` Maxim Nikulin
  2021-04-19 16:08               ` Nicolas Goaziou
  0 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-19 12:34 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1989 bytes --]

On 19/04/2021 15:33, Nicolas Goaziou wrote:
> 
> Could you try the following instead?
> 
> --8<---------------cut here---------------start------------->8---
> (defun org-sort-remove-invisible (s)
>    "Remove invisible part of links and emphasis markers from string S."
>    (let ((remove-markers
>           (lambda (m)

Just a curiosity, what is style guide recommendation: let - lambda or 
cl-labels?

>             (concat (match-string 1 m)
>                     (match-string 4 m)
>                     (match-string 5 m)))))
>      (remove-text-properties 0 (length s) org-rm-props s)
>      (replace-regexp-in-string
>       org-verbatim-re remove-markers
>       (replace-regexp-in-string org-emph-re remove-markers (org-link-display-format s) t tt)

Single "t" is at the end, isn't it?

>       t t)))

I think, the patch is an improvement.

There is still a minor shortcoming that ordering of emphasized and 
non-emphasized words is undefined
- /a/
- *a*
- a
Let's leave it aside since it requires multilevel comparison similar to 
collation in locales and more strict definition which part of string is 
compared at each level:
- /a/ :: A
- /a/ :: Z
- a :: A
- a :: Z

In my opinion, a more severe limitation comes from sequential 
regexp-based approach. Consider stripping markers from
1. "a =b *c* d= e"
2. "*b* /i/"
First case could be solved by splitting the input string by verbatim 
regexp at first and by applying emphasis substitution only to 
non-verbatim parts. However I am rather shy to experiment with regexps 
definition to avoid including chars before and after emphasis markers. 
It would be great to get role of particular characters from more 
reliable parser (or at least from text properties affected by 
fontification).

I have some tests for `org-sort-remove-invisible', see the attachment. 
Actually I do not like such style of tests, I strongly prefer to see all 
failures at once, but I have not found if such technique is used 
anywhere in org tests.

[-- Attachment #2: 0001-More-tests-for-org-sort-list.patch --]
[-- Type: text/x-patch, Size: 5026 bytes --]

From 6719386aa5a95394a4cb98ce28b889f545620bd9 Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Mon, 19 Apr 2021 19:06:36 +0700
Subject: [PATCH] More tests for `org-sort-list'

* lisp/org.el (org-verbatim-re): Add to the docstring a statement
concerning subgroups meaning.
* testing/lisp/test-org-list.el (test-org-list/sort): Add cases with
text emphasis.  Stress in comments that it is significant whether
locale-specific collation rules ignores spaces.
* testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add
tests for `org-sort-list' helper.

Add a test with expected failures to make apparent limitation of simple
regexp-based approach in `org-sort-remove-invisible' function.
---
 lisp/org.el                   |  3 ++-
 testing/lisp/test-org-list.el | 34 +++++++++++++++++++++++++++++++++-
 testing/lisp/test-org.el      | 27 +++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index c2738100f..3d4f5553a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements:
 5  The character after the match, empty at the end of a line")
 
 (defvar org-verbatim-re nil
-  "Regular expression for matching verbatim text.")
+  "Regular expression for matching verbatim text.
+Groups 1-5 have the same meaning as in `org-emph-re' pattern.")
 
 (defvar org-emphasis-regexp-components) ; defined just below
 (defvar org-emphasis-alist) ; defined just below
diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el
index 3689a172f..cc7914c8d 100644
--- a/testing/lisp/test-org-list.el
+++ b/testing/lisp/test-org-list.el
@@ -1225,7 +1225,39 @@ b. Item 2<point>"
        (equal "- b\n- a\n- C\n"
 	      (org-test-with-temp-text "- b\n- C\n- a\n"
 		(org-sort-list t ?A)
-		(buffer-string))))))
+		(buffer-string))))
+      ;; Emphasis in the beginning.
+      (should
+       (equal "- a\n- /z/\n"
+              (org-test-with-temp-text "- /z/\n- a\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      (should
+       (equal "- *a*\n- z\n"
+              (org-test-with-temp-text "- z\n- *a*\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      ;; Emphasis of second word.
+      ;; Locales with significant spaces (C, es_ES, pl_PL)
+      ;; are more sensitive to problems with sort.
+      (should
+       (equal "- a b\n- a /z/\n"
+              (org-test-with-temp-text "- a /z/\n- a b\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      (should
+       (equal "- a *b*\n- a z\n"
+              (org-test-with-temp-text "- a z\n- a *b*\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      ;; Space role in sorting.
+      ;; Test would fail for locales with ignored space, e.g. en_US, it works
+      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
+      (should
+       (equal "- Time stamp\n- Timer\n"
+              (org-test-with-temp-text "- Timer\n- Time stamp\n"
+                (org-sort-list t ?a)
+                (buffer-string))))))
   ;; Sort numerically.
   (should
    (equal "- 1\n- 2\n- 10\n"
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index f6fb4b3ca..307b30265 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8262,4 +8262,31 @@ two
 
 (provide 'test-org)
 
+(ert-deftest test-org/org-sort-remove-invisible ()
+  "Empasis markers are stripped by `org-sort-remove-invisible'."
+  (dolist (case '(("Test *bold* text" . "Test bold text")
+                  ("Test /italic/ text" . "Test italic text")
+                  ("Test _underlined_ text" . "Test underlined text")
+                  ("Test +strike through+ text" . "Test strike through text")
+                  ("Test =verbatim= text" . "Test verbatim text")
+                  ("Test ~code~ text" . "Test code text")
+                  ("*Beginning* of a line" . "Beginning of a line")
+                  ("End =of line=" . "End of line")
+                  ("Braces (*around*)" . "Braces (around)")
+                  ("Multiple *emphasis* options /in the/ same *line*" .
+                   "Multiple emphasis options in the same line")))
+    (let ((test-string (car case))
+          (expectation (cdr case)))
+      (should (equal expectation (org-sort-remove-invisible test-string))))))
+
+(ert-deftest test-org/org-sort-remove-invisible-failures ()
+  "Examples of `org-sort-remove-invisible` failures."
+  :expected-result :failed
+  (dolist (case '(("Lost =in *verbatim* text= fragment" .
+                   "Lost in *verbatim* text fragment")
+                  ("Adjucent emphasis: *Overlapped* /regexps/" .
+                   "Adjucent emphasis: Ovelapped regexps")))
+    (let ((test-string (car case))
+          (expectation (cdr case)))
+      (should (equal expectation (org-sort-remove-invisible test-string))))))
 ;;; test-org.el ends here
-- 
2.25.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 12:34             ` Maxim Nikulin
@ 2021-04-19 16:08               ` Nicolas Goaziou
  2021-04-19 17:00                 ` Greg Minshall
                                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-19 16:08 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Hello,

Maxim Nikulin <manikulin@gmail.com> writes:

> Just a curiosity, what is style guide recommendation: let - lambda or
> cl-labels?

I stay away from CL as much as possible, otherwise newcomers will have
to learn two languages to start contributing, Elisp and CL (cl-loop,
ewww). CL is still necessary however, as we cannot use `seq' yet.

Also, there is `letrec' instead of `cl-labels'.

> In my opinion, a more severe limitation comes from sequential
> regexp-based approach. Consider stripping markers from
> 1. "a =b *c* d= e"
> 2. "*b* /i/"

Fair enough. Here comes another, more involved, attempt.

--8<---------------cut here---------------start------------->8---
(defun org-sort-remove-invisible (s)
  "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
  ;; org-element-interpret-data clears any text property, including
  ;; invisible part.
  (org-element-interpret-data
   (let ((tree (org-element-parse-secondary-string
                s (org-element-restriction 'paragraph))))
     (org-element-map tree '(bold code italic strike-through underline verbatim)
       (lambda (o)
         (pcase (org-element-type o)
           ;; Terminal object.  Replace it with its value.
           ((or `code `verbatim)
            (let ((new (org-element-property :value o)))
              (org-element-insert-before new o)
              (org-element-put-property
               new :post-blank (org-element-property :post-blank o))))
           ;; Non-terminal objects.  Splice contents.
           (_
            (let ((contents (org-element-contents o))
                  (c nil))
              (while contents
                (setq c (pop contents))
                (org-element-insert-before c o))
              (org-element-put-property
               c :post-blank (org-element-property :post-blank o)))))
         (org-element-extract-element o)))
     ;; Return modified tree.
     tree)))
--8<---------------cut here---------------end--------------->8---

It is not perfect, but it does a better job.

WDYT?

> +      ;; Space role in sorting.
> +      ;; Test would fail for locales with ignored space, e.g. en_US, it works
> +      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
> +      (should
> +       (equal "- Time stamp\n- Timer\n"
> +              (org-test-with-temp-text "- Timer\n- Time stamp\n"
> +                (org-sort-list t ?a)
> +                (buffer-string))))))

Since this test is bound to fail for some developers, I assume it
shouldn't be included.

> +  (dolist (case '(("Lost =in *verbatim* text= fragment" .
> +                   "Lost in *verbatim* text fragment")
> +                  ("Adjucent emphasis: *Overlapped* /regexps/" .
> +                   "Adjucent emphasis: Ovelapped regexps")))
                                          ^^^^
                                          typo

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 16:08               ` Nicolas Goaziou
@ 2021-04-19 17:00                 ` Greg Minshall
  2021-04-19 17:17                   ` Tom Gillespie
  2021-04-19 17:36                 ` Maxim Nikulin
  2021-04-20 12:20                 ` Maxim Nikulin
  2 siblings, 1 reply; 38+ messages in thread
From: Greg Minshall @ 2021-04-19 17:00 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Maxim Nikulin, emacs-orgmode

hi, Nicolas,

i'm curious, not knowing history and/or procedures.

> ... CL is still necessary however, as we cannot use `seq' yet.

why is 'seq not "yet" available?  what will make it available?

cheers, Greg


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 17:00                 ` Greg Minshall
@ 2021-04-19 17:17                   ` Tom Gillespie
  2021-04-19 18:00                     ` Greg Minshall
  0 siblings, 1 reply; 38+ messages in thread
From: Tom Gillespie @ 2021-04-19 17:17 UTC (permalink / raw)
  To: Greg Minshall; +Cc: Maxim Nikulin, emacs-orgmode, Nicolas Goaziou

Hi Greg,
    seq cannot be used because it is not available in older versions
of emacs that org still supports. When support for those older
versions is dropped then seq could be used. Best,
Tom


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 16:08               ` Nicolas Goaziou
  2021-04-19 17:00                 ` Greg Minshall
@ 2021-04-19 17:36                 ` Maxim Nikulin
  2021-04-19 17:50                   ` Nicolas Goaziou
  2021-04-20 12:20                 ` Maxim Nikulin
  2 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-19 17:36 UTC (permalink / raw)
  To: emacs-orgmode

On 19/04/2021 23:08, Nicolas Goaziou wrote:
>> +      ;; Space role in sorting.
>> +      ;; Test would fail for locales with ignored space, e.g. en_US, it works
>> +      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
>> +      (should
>> +       (equal "- Time stamp\n- Timer\n"
>> +              (org-test-with-temp-text "- Timer\n- Time stamp\n"
>> +                (org-sort-list t ?a)
>> +                (buffer-string))))))
> 
> Since this test is bound to fail for some developers, I assume it
> shouldn't be included.

Locale "C" is forced for this group of tests. I have added the test to 
catch an attempt to change locale since it would affect other cases. I 
hope, the comment might be useful for those who play with sorting having 
a regular locale instead of fallback "C".



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 17:36                 ` Maxim Nikulin
@ 2021-04-19 17:50                   ` Nicolas Goaziou
  2021-04-20 12:37                     ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-19 17:50 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> On 19/04/2021 23:08, Nicolas Goaziou wrote:
>>> +      ;; Space role in sorting.
>>> +      ;; Test would fail for locales with ignored space, e.g. en_US, it works
>>> +      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
>>> +      (should
>>> +       (equal "- Time stamp\n- Timer\n"
>>> +              (org-test-with-temp-text "- Timer\n- Time stamp\n"
>>> +                (org-sort-list t ?a)
>>> +                (buffer-string))))))
>> Since this test is bound to fail for some developers, I assume it
>> shouldn't be included.
>
> Locale "C" is forced for this group of tests. 

Sorry, I don't understand. There is no locale change around this test,
so it will fail, for example, for me. I wouldn't want to get a noisy
failure each time I run tests.

Regards,


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 17:17                   ` Tom Gillespie
@ 2021-04-19 18:00                     ` Greg Minshall
  0 siblings, 0 replies; 38+ messages in thread
From: Greg Minshall @ 2021-04-19 18:00 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: Maxim Nikulin, emacs-orgmode, Nicolas Goaziou

Tom, thanks!  i assumed something like that.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 16:08               ` Nicolas Goaziou
  2021-04-19 17:00                 ` Greg Minshall
  2021-04-19 17:36                 ` Maxim Nikulin
@ 2021-04-20 12:20                 ` Maxim Nikulin
  2021-04-20 13:57                   ` Nicolas Goaziou
  2 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-20 12:20 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

On 19/04/2021 23:08, Nicolas Goaziou wrote:
> 
>> In my opinion, a more severe limitation comes from sequential
>> regexp-based approach. Consider stripping markers from
>> 1. "a =b *c* d= e"
>> 2. "*b* /i/"
> 
> Fair enough. Here comes another, more involved, attempt.

Maybe first variant deserves to be committed while discussion of a 
better option is in progress.

> --8<---------------cut here---------------start------------->8---
> (defun org-sort-remove-invisible (s)
>    "Remove emphasis markers and any invisible property from string S.
> Assume S may contain only objects."
>    ;; org-element-interpret-data clears any text property, including
>    ;; invisible part.
>    (org-element-interpret-data

Sorry, I can not help you with polishing code of this function, I am not 
familiar with functions working on org element tree yet.

I can not even determine what type of structure is returned when 
`org-sort-remove-invisible' is called from ert or from scratch buffer:

(org-sort-remove-invisible "A")
#("A" 0 1 (:parent (#("A" 0 1 ...))))

A couple of obvious problems:

1. Link handling

#+begin_src elisp
(org-sort-remove-invisible
  "- [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]")
#+end_src

#+RESULTS:
: - [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]

2. Missed spaces

#+begin_src elisp
(org-sort-remove-invisible "A *b* /i/ t.")
#+end_src

#+RESULTS:
: A bit.

>     (let ((tree (org-element-parse-secondary-string
>                  s (org-element-restriction 'paragraph))))
>       (org-element-map tree '(bold code italic strike-through underline verbatim)
>         (lambda (o)
>           (pcase (org-element-type o)
>             ;; Terminal object.  Replace it with its value.
>             ((or `code `verbatim)
>              (let ((new (org-element-property :value o)))
>                (org-element-insert-before new o)
>                (org-element-put-property
>                 new :post-blank (org-element-property :post-blank o))))
>             ;; Non-terminal objects.  Splice contents.
>             (_
>              (let ((contents (org-element-contents o))
>                    (c nil))
>                (while contents
>                  (setq c (pop contents))
>                  (org-element-insert-before c o))
>                (org-element-put-property
>                 c :post-blank (org-element-property :post-blank o)))))
>           (org-element-extract-element o)))
>       ;; Return modified tree.
>       tree)))
> --8<---------------cut here---------------end--------------->8---
> 
> It is not perfect, but it does a better job.
> 
> WDYT?



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-19 17:50                   ` Nicolas Goaziou
@ 2021-04-20 12:37                     ` Maxim Nikulin
  0 siblings, 0 replies; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-20 12:37 UTC (permalink / raw)
  To: emacs-orgmode

On 20/04/2021 00:50, Nicolas Goaziou wrote:
> Maxim Nikulin writes:
> 
>> On 19/04/2021 23:08, Nicolas Goaziou wrote:
>>>> +      ;; Space role in sorting.
>>>> +      ;; Test would fail for locales with ignored space, e.g. en_US, it works
>>>> +      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
>>>> +      (should
>>>> +       (equal "- Time stamp\n- Timer\n"
>>>> +              (org-test-with-temp-text "- Timer\n- Time stamp\n"
>>>> +                (org-sort-list t ?a)
>>>> +                (buffer-string))))))
>>> Since this test is bound to fail for some developers, I assume it
>>> shouldn't be included.
>>
>> Locale "C" is forced for this group of tests.
> 
> Sorry, I don't understand. There is no locale change around this test,
> so it will fail, for example, for me. I wouldn't want to get a noisy
> failure each time I run tests.

Certainly flaky tests must be avoided. However I can not identify the 
source of confusion (yours or mine).

There is redefinition of `string-collate-lessp' to run tests with "C" 
locale:
https://code.orgmode.org/bzg/org-mode/src/master/testing/lisp/test-org-list.el#L1207

And it works for me

#+begin_src elisp
(let ((original-string-collate-lessp (symbol-function 
#'string-collate-lessp)))
   (cl-letf (((symbol-function 'string-collate-lessp)
	     (lambda (s1 s2 &optional locale ignore-case)
	       (funcall original-string-collate-lessp
			s1 s2 "C" ignore-case))))
     (org-test-with-temp-text "- Timer\n- Time stamp\n"
       (org-sort-list t ?a)
       (buffer-string))))
#+end_src

#+RESULTS:
: - Time stamp
: - Timer

#+begin_src elisp
(let ((original-string-collate-lessp (symbol-function 
#'string-collate-lessp)))
   (cl-letf (((symbol-function 'string-collate-lessp)
	     (lambda (s1 s2 &optional locale ignore-case)
	       (funcall original-string-collate-lessp
			s1 s2 "en_US.UTF-8" ignore-case))))
     (org-test-with-temp-text "- Timer\n- Time stamp\n"
       (org-sort-list t ?a)
       (buffer-string))))
#+end_src

#+RESULTS:
: - Timer
: - Time stamp

glibc-2.31 (source for locales package), Ubuntu-20.04 focal, emacs-26.3

Could you, please, show result of the following command in your 
environment (for a chance that "C" locale definition has changed):

printf 'Timestamp\nTime zone\n' \
   | LC_ALL= LC_COLLATE= LANG=C.UTF-8 LANGUAGE=en sort --debug
sort: using ‘C.UTF-8’ sorting rules
Time zone
_________
Timestamp
_________




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-20 12:20                 ` Maxim Nikulin
@ 2021-04-20 13:57                   ` Nicolas Goaziou
  2021-04-20 15:51                     ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-20 13:57 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Hello,

Maxim Nikulin <manikulin@gmail.com> writes:

> Maybe first variant deserves to be committed while discussion of a 
> better option is in progress.

I'd rather not, since we're currently considering a somewhat different
path. The problem has been there for ages anyway.

> I can not even determine what type of structure is returned when 
> `org-sort-remove-invisible' is called from ert or from scratch buffer:
>
> (org-sort-remove-invisible "A")
> #("A" 0 1 (:parent (#("A" 0 1 ...))))

This is a string.

> A couple of obvious problems:
>
> 1. Link handling
>
> #+begin_src elisp
> (org-sort-remove-invisible
>   "- [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]")
> #+end_src
>
>
> #+RESULTS:
> : - [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]

Ah! I forgot the link part! Hopefully done here.

--8<---------------cut here---------------start------------->8---
(defun org-sort-remove-invisible (s)
  "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
  ;; org-element-interpret-data clears any text property, including
  ;; invisible part.
  (org-element-interpret-data
   (let ((tree (org-element-parse-secondary-string
                s (org-element-restriction 'paragraph))))
     (org-element-map tree '(bold code italic link strike-through underline verbatim)
       (lambda (o)
         (pcase (org-element-type o)
           ;; Terminal object.  Replace it with its value.
           ((or `code `verbatim)
            (let ((new (org-element-property :value o)))
              (org-element-insert-before new o)
              (org-element-put-property
               new :post-blank (org-element-property :post-blank o))))
           ;; Non-terminal objects.  Splice contents.
           (type
            (let ((contents
                   (or (org-element-contents o)
                       (and (eq type 'link)
                            (list (org-element-property :raw-link o)))))
                  (c nil))
              (while contents
                (setq c (pop contents))
                (org-element-insert-before c o))
              (org-element-put-property
               c :post-blank (org-element-property :post-blank o)))))
         (org-element-extract-element o)))
     ;; Return modified tree.
     tree)))
--8<---------------cut here---------------end--------------->8---

> 2. Missed spaces
>
> #+begin_src elisp
> (org-sort-remove-invisible "A *b* /i/ t.")
> #+end_src
>
> #+RESULTS:
> : A bit.

You need to update Org. I added a fix for that in "org-element"
yesterday.

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-20 13:57                   ` Nicolas Goaziou
@ 2021-04-20 15:51                     ` Maxim Nikulin
  2021-04-20 20:37                       ` Nicolas Goaziou
  0 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-20 15:51 UTC (permalink / raw)
  To: emacs-orgmode

On 20/04/2021 20:57, Nicolas Goaziou wrote:
> Maxim Nikulin writes:
>> (org-sort-remove-invisible "A")
>> #("A" 0 1 (:parent (#("A" 0 1 ...))))
> 
> This is a string.

Thank you, from second attempt I have managed to strip text properties.

Since the intended usage of return value is sorting key, would not it 
benefit from passing result through the following expression?

     (set-text-properties 0 (length s) nil s)

An alternative is to clean up keys in `org-sort-list' function.

> Ah! I forgot the link part! Hopefully done here.

Surprisingly there are still cases when the old approach works better:

(let ((s (org-sort-remove-invisible
"A /wrapping [[https://orgmode.org/?a=b&c=d#e][link]] emphasis/")))
   (set-text-properties 0 (length s) nil s)
   s)
"A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/"

I expect "A wrapping link emphasis".

In the meanwhile I have tried

     (benchmark-run 1 (org-sort-list t ?a))

in a file (1100 lines) obtained using

     grep '^- ' doc/org-manual.org >/tmp/list.org

It seems, performance is still acceptable (single run hardly could be 
considered as an accurate test):

(1.115571472 18 0.5986466069999999) ; new variant
(0.260384514 1 0.09805475199999947) ; original code

> --8<---------------cut here---------------start------------->8---
> (defun org-sort-remove-invisible (s)
>    "Remove emphasis markers and any invisible property from string S.
> Assume S may contain only objects."
>    ;; org-element-interpret-data clears any text property, including
>    ;; invisible part.
>    (org-element-interpret-data
>     (let ((tree (org-element-parse-secondary-string
>                  s (org-element-restriction 'paragraph))))
>       (org-element-map tree '(bold code italic link strike-through underline verbatim)
>         (lambda (o)
>           (pcase (org-element-type o)
>             ;; Terminal object.  Replace it with its value.
>             ((or `code `verbatim)
>              (let ((new (org-element-property :value o)))
>                (org-element-insert-before new o)
>                (org-element-put-property
>                 new :post-blank (org-element-property :post-blank o))))
>             ;; Non-terminal objects.  Splice contents.
>             (type
>              (let ((contents
>                     (or (org-element-contents o)
>                         (and (eq type 'link)
>                              (list (org-element-property :raw-link o)))))
>                    (c nil))
>                (while contents
>                  (setq c (pop contents))
>                  (org-element-insert-before c o))
>                (org-element-put-property
>                 c :post-blank (org-element-property :post-blank o)))))
>           (org-element-extract-element o)))
>       ;; Return modified tree.
>       tree)))
> --8<---------------cut here---------------end--------------->8---




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-20 15:51                     ` Maxim Nikulin
@ 2021-04-20 20:37                       ` Nicolas Goaziou
  2021-04-21 13:10                         ` Maxim Nikulin
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Goaziou @ 2021-04-20 20:37 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Hello,

Maxim Nikulin <manikulin@gmail.com> writes:

> Surprisingly there are still cases when the old approach works better:
>
> (let ((s (org-sort-remove-invisible
> "A /wrapping [[https://orgmode.org/?a=b&c=d#e][link]] emphasis/")))
>   (set-text-properties 0 (length s) nil s)
>   s)
> "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/"
>
> I expect "A wrapping link emphasis".

Yet, your expectations are wrong. There is no link in the text above.
Emphasized text starts at "/wrapping" and ends before "/?".

Granted, this is a situation where the Org syntax is not very intuitive.
Anyway, the new function is more accurate.

Maybe we should require a space after punctuation following emphasized
text. I don't know. This is orthogonal to the current discussion.

> In the meanwhile I have tried
>
>     (benchmark-run 1 (org-sort-list t ?a))
>
> in a file (1100 lines) obtained using

I don't think performance is really an issue. Of course, the suggested
function is clearly slower than the current one.

Regards,
-- 
Nicolas Goaziou


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-20 20:37                       ` Nicolas Goaziou
@ 2021-04-21 13:10                         ` Maxim Nikulin
  2021-04-21 15:45                           ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Maxim Nikulin @ 2021-04-21 13:10 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1739 bytes --]

On 21/04/2021 03:37, Nicolas Goaziou wrote:
> Maxim Nikulin writes:
> 
>> (let ((s (org-sort-remove-invisible
>> "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/"
>>
>> I expect "A wrapping link emphasis".
> 
> Yet, your expectations are wrong. There is no link in the text above.
> Emphasized text starts at "/wrapping" and ends before "/?".
>
> Granted, this is a situation where the Org syntax is not very intuitive.
> Anyway, the new function is more accurate.

I think, new variant should be committed even it does not fix Juan's 
case. He have not confirmed the fix yet.

> Maybe we should require a space after punctuation following emphasized
> text. I don't know. This is orthogonal to the current discussion.

I still believe in my expectation, however I admit such limitation of 
parser. At first I have not recognized that the issue may be similar to
https://orgmode.org/list/CAH+Wm4-_XHUZKFTf=ZtbfnCPvQWkbEoeGs8EpYm+8SPmu8LHFg@mail.gmail.com/
Anyway for my example workaround is to add more markers before, inside, 
and after the link. Maybe I will look closer at Tom's parser if it 
solves ambiguity in the same way.

>> In the meanwhile I have tried
>>
>>      (benchmark-run 1 (org-sort-list t ?a))
>>
>> in a file (1100 lines) obtained using
> 
> I don't think performance is really an issue. Of course, the suggested
> function is clearly slower than the current one.

It is OK since difference is not really huge, especially taking into 
account that new variant was not compiled.

Do you still have problem with locale dependency of added tests? I can 
not guess what could be its source and expect that test should work 
reliably. Disregard "/3" in the subject of the patches. Third change is 
your code.

[-- Attachment #2: 0001-More-tests-for-org-sort-list.patch --]
[-- Type: text/x-patch, Size: 4913 bytes --]

From c2c46d133e80ffa2323618ac88a1902e63ba1efc Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Mon, 19 Apr 2021 19:06:36 +0700
Subject: [PATCH 1/3] More tests for `org-sort-list'

* lisp/org.el (org-verbatim-re): Add to the docstring a statement
concerning subgroups meaning.
* testing/lisp/test-org-list.el (test-org-list/sort): Add cases with
text emphasis.  Stress in comments that it is significant whether
locale-specific collation rules ignores spaces.
* testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add
tests for `org-sort-list' helper.
---
 lisp/org.el                   |  3 ++-
 testing/lisp/test-org-list.el | 34 +++++++++++++++++++++++++++++++++-
 testing/lisp/test-org.el      | 25 +++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index c2738100f..3d4f5553a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements:
 5  The character after the match, empty at the end of a line")
 
 (defvar org-verbatim-re nil
-  "Regular expression for matching verbatim text.")
+  "Regular expression for matching verbatim text.
+Groups 1-5 have the same meaning as in `org-emph-re' pattern.")
 
 (defvar org-emphasis-regexp-components) ; defined just below
 (defvar org-emphasis-alist) ; defined just below
diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el
index 3689a172f..cc7914c8d 100644
--- a/testing/lisp/test-org-list.el
+++ b/testing/lisp/test-org-list.el
@@ -1225,7 +1225,39 @@ b. Item 2<point>"
        (equal "- b\n- a\n- C\n"
 	      (org-test-with-temp-text "- b\n- C\n- a\n"
 		(org-sort-list t ?A)
-		(buffer-string))))))
+		(buffer-string))))
+      ;; Emphasis in the beginning.
+      (should
+       (equal "- a\n- /z/\n"
+              (org-test-with-temp-text "- /z/\n- a\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      (should
+       (equal "- *a*\n- z\n"
+              (org-test-with-temp-text "- z\n- *a*\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      ;; Emphasis of second word.
+      ;; Locales with significant spaces (C, es_ES, pl_PL)
+      ;; are more sensitive to problems with sort.
+      (should
+       (equal "- a b\n- a /z/\n"
+              (org-test-with-temp-text "- a /z/\n- a b\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      (should
+       (equal "- a *b*\n- a z\n"
+              (org-test-with-temp-text "- a z\n- a *b*\n"
+                (org-sort-list t ?a)
+                (buffer-string))))
+      ;; Space role in sorting.
+      ;; Test would fail for locales with ignored space, e.g. en_US, it works
+      ;; in C and currently rare locales having significant space (es_ES, pl_PL)
+      (should
+       (equal "- Time stamp\n- Timer\n"
+              (org-test-with-temp-text "- Timer\n- Time stamp\n"
+                (org-sort-list t ?a)
+                (buffer-string))))))
   ;; Sort numerically.
   (should
    (equal "- 1\n- 2\n- 10\n"
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index f6fb4b3ca..3f74b3f35 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8262,4 +8262,29 @@ two
 
 (provide 'test-org)
 
+(ert-deftest test-org/org-sort-remove-invisible ()
+  "Empasis markers are stripped by `org-sort-remove-invisible'."
+  (dolist (case '(("Test *bold* text" . "Test bold text")
+                  ("Test /italic/ text" . "Test italic text")
+                  ("Test _underlined_ text" . "Test underlined text")
+                  ("Test +strike through+ text" . "Test strike through text")
+                  ("Test =verbatim= text" . "Test verbatim text")
+                  ("Test ~code~ text" . "Test code text")
+                  ("*Beginning* of a line" . "Beginning of a line")
+                  ("End =of line=" . "End of line")
+                  ("Braces (*around*)" . "Braces (around)")
+                  ("Multiple *emphasis* options /in the/ same *line*" .
+                   "Multiple emphasis options in the same line")
+                  ("/Adjucent/ *emphasis*" . "Adjucent emphasis")
+                  ("Verbatim =with *emphasis* example=" .
+                   "Verbatim with *emphasis* example")
+                  ("Emphasis [[http://orgmode.org/][inside /a link/]]" .
+                   "Emphasis inside a link")
+                  ("*Wrap [[https:/orgmode.org][link]] emphasis*" .
+                   "Wrap link emphasis")
+                  ("A =verbatim [[https://orgmode.org/][link]] example=" .
+                   "A verbatim [[https://orgmode.org/][link]] example")))
+    (let ((test-string (car case))
+          (expectation (cdr case)))
+      (should (equal expectation (org-sort-remove-invisible test-string))))))
 ;;; test-org.el ends here
-- 
2.25.1


[-- Attachment #3: 0002-testing-lisp-test-org.el-Non-obvious-cases-for-org-s.patch --]
[-- Type: text/x-patch, Size: 2330 bytes --]

From e7a326fff98fdc56b91818af526d398d8387bda4 Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Wed, 21 Apr 2021 19:22:18 +0700
Subject: [PATCH 2/3] testing/lisp/test-org.el: Non-obvious cases for
 org-sort-remove-invisible

testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add test
cases illustrating corner cases in parser behavior.

Fontification in Emacs buffer is not consistent with result of
`org-sort-remove-invisible` for 2 of 3 added examples.  The intention is
to make author of changes in parser aware of consequences, however it is
unlikely that users rely on such nuances.  Update expectations if change
is considered as improvement.
---
 testing/lisp/test-org.el | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index 3f74b3f35..4f987f509 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8283,7 +8283,18 @@ two
                   ("*Wrap [[https:/orgmode.org][link]] emphasis*" .
                    "Wrap link emphasis")
                   ("A =verbatim [[https://orgmode.org/][link]] example=" .
-                   "A verbatim [[https://orgmode.org/][link]] example")))
+                   "A verbatim [[https://orgmode.org/][link]] example")
+                  ;; A bit strange cases to catch changes of behavior.
+                  ;; It may be reasonable to update expectation in the case of failure.
+                  ("/Overlapping [[http://orgmode.org][structure/ with link]]" .
+                   ;; not "Overlapping structure with link"
+                   "Overlapping [[http://orgmode.org][structure with link]]")
+                  ("Another [[http://orgmode.org][overlapping *structure]] with* link" .
+                   ;; not "Another overlapping structure with link"
+                   "Another overlapping *structure with* link")
+                  ("A /wrapping [[https://orgmode.org/?bot=true][link]] emphasis/" .
+                   ;; not "A wrapping link emphasis", lost "/" before "?" instead
+                   "A wrapping [[https://orgmode.org?bot=true][link]] emphasis/")))
     (let ((test-string (car case))
           (expectation (cdr case)))
       (should (equal expectation (org-sort-remove-invisible test-string))))))
-- 
2.25.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-21 13:10                         ` Maxim Nikulin
@ 2021-04-21 15:45                           ` Juan Manuel Macías
  0 siblings, 0 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-21 15:45 UTC (permalink / raw)
  To: orgmode

Hi all,

Maxim Nikulin writes:

> I think, new variant should be committed even it does not fix Juan's
> case. He have not confirmed the fix yet.

Sorry, I have been busy with other matters and had lost the thread of
the discussion. I'm catching up on the messages...

I have tried the Nicolas' patch (latest version) and I see that the
items with emphasis are already ordered well. However, it seems that the
problem with identical items with or without emphasis still persists:
which items should go before and in what order? For example, in the
following list I get:

- /a/
- *a*
- a
- *b*
- /b/
- b
- /v/
- *v*
- v

Anyway, I think it is a minor problem (I can't think of many
scenarios in real life where this problem is relevant). In a more
realistic case, the result is correct:

- lo /bueno/
- lo bueno
- lo /vano/
- lo vano

...

- /a/
- b
- /v/
- *z*

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-12 23:52     ` Samuel Wales
@ 2021-04-13 14:16       ` Juan Manuel Macías
  0 siblings, 0 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-13 14:16 UTC (permalink / raw)
  To: Samuel Wales; +Cc: orgmode

Hi Samuel,

Samuel Wales writes:

> probably not a relevant non-confirmation but in recent maint, my config:
>
> - a
> - /a/
> - b

Thanks for trying. I've uploaded this screencast with a minimal Emacs on
a virtual machine:

https://gnutas.juanmanuelmacias.com/images/org-sort-issue-2021-04-13_15.44.52.mp4

and, as you can see, in my case `org-sort-list' can not sort the list
correctly. Maybe it's something related to locales (??), but the only
thing that I can confirm is that if I remove those spaces in
`org-sort-remove-invisible', the list is then sorted correctly.

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-12 23:18   ` Juan Manuel Macías
@ 2021-04-12 23:52     ` Samuel Wales
  2021-04-13 14:16       ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Samuel Wales @ 2021-04-12 23:52 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Ypo, orgmode, Nicolas Goaziou

probably not a relevant non-confirmation but in recent maint, my config:

- a
- /a/
- b


On 4/12/21, Juan Manuel Macías <maciaschain@posteo.net> wrote:
> Hi Ypo,
>
> Ypo writes:
>
>> This is my result after doing "org-sort-list a":
>>
>>   - /a/
>>   - /v/
>>   - a
>>   - b
>>
>> Regards
>
> Thanks for trying. So it seems that you can reproduce the problem as
> well... I wonder if anyone else is able to reproduce it, preferably in a
> minimal emacs.
>
> Best regards,
>
> Juan Manuel
>
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
  2021-04-12 18:51 ` Ypo
@ 2021-04-12 23:18   ` Juan Manuel Macías
  2021-04-12 23:52     ` Samuel Wales
  0 siblings, 1 reply; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-12 23:18 UTC (permalink / raw)
  To: Ypo; +Cc: orgmode, Nicolas Goaziou

Hi Ypo,

Ypo writes:

> This is my result after doing "org-sort-list a":
>
>   - /a/
>   - /v/
>   - a
>   - b
>
> Regards

Thanks for trying. So it seems that you can reproduce the problem as
well... I wonder if anyone else is able to reproduce it, preferably in a
minimal emacs.

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
       [not found] <mailman.57.1618243212.17744.emacs-orgmode@gnu.org>
@ 2021-04-12 18:51 ` Ypo
  2021-04-12 23:18   ` Juan Manuel Macías
  0 siblings, 1 reply; 38+ messages in thread
From: Ypo @ 2021-04-12 18:51 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 95 bytes --]

This is my result after doing "org-sort-list a":

   - /a/
   - /v/
   - a
   - b

Regards

[-- Attachment #2: Type: text/html, Size: 475 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Patch] to correctly sort the items with emphasis marks in a list
@ 2021-04-12 13:50 Juan Manuel Macías
  0 siblings, 0 replies; 38+ messages in thread
From: Juan Manuel Macías @ 2021-04-12 13:50 UTC (permalink / raw)
  To: orgmode

Nicolas Goaziou writes:

> I cannot reproduce it. With your initial list, and a minimal init file,
> I get:
>
>   - /a/
>   - a
>   - b
>   - /v/
>
> Could you try with a minimal Emacs, too?

That's weird ... I have tried launching emacs -q in a virtual machine,
and I keep getting the wrong result (git version, master branch):

- /a/
- /v/
- a
- b

I have tried with the typical keyboard shortcut and also with M-: and
evaluating (org-sort-list t? a nil nil nil)

Best regards,

Juan Manuel 


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2021-04-21 15:46 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-02 18:15 [Patch] to correctly sort the items with emphasis marks in a list Juan Manuel Macías
2021-04-09 22:28 ` Nicolas Goaziou
2021-04-10  0:01   ` Juan Manuel Macías
2021-04-10 10:19     ` Nicolas Goaziou
2021-04-10 11:41       ` Juan Manuel Macías
2021-04-13 17:31         ` Maxim Nikulin
2021-04-13 19:08           ` Juan Manuel Macías
2021-04-14 15:42             ` Maxim Nikulin
2021-04-14 15:51               ` Maxim Nikulin
2021-04-14 17:07               ` Juan Manuel Macías
2021-04-14 21:36                 ` Juan Manuel Macías
2021-04-15 14:58                 ` Maxim Nikulin
2021-04-15 18:21                   ` Juan Manuel Macías
2021-04-16 14:59                     ` Maxim Nikulin
2021-04-16 15:30                       ` Maxim Nikulin
2021-04-17 13:27     ` Maxim Nikulin
2021-04-18 17:52       ` Juan Manuel Macías
2021-04-18 21:20         ` Juan Manuel Macías
2021-04-19  8:33           ` Nicolas Goaziou
2021-04-19 12:34             ` Maxim Nikulin
2021-04-19 16:08               ` Nicolas Goaziou
2021-04-19 17:00                 ` Greg Minshall
2021-04-19 17:17                   ` Tom Gillespie
2021-04-19 18:00                     ` Greg Minshall
2021-04-19 17:36                 ` Maxim Nikulin
2021-04-19 17:50                   ` Nicolas Goaziou
2021-04-20 12:37                     ` Maxim Nikulin
2021-04-20 12:20                 ` Maxim Nikulin
2021-04-20 13:57                   ` Nicolas Goaziou
2021-04-20 15:51                     ` Maxim Nikulin
2021-04-20 20:37                       ` Nicolas Goaziou
2021-04-21 13:10                         ` Maxim Nikulin
2021-04-21 15:45                           ` Juan Manuel Macías
2021-04-12 13:50 Juan Manuel Macías
     [not found] <mailman.57.1618243212.17744.emacs-orgmode@gnu.org>
2021-04-12 18:51 ` Ypo
2021-04-12 23:18   ` Juan Manuel Macías
2021-04-12 23:52     ` Samuel Wales
2021-04-13 14:16       ` Juan Manuel Macías

unofficial mirror of emacs-orgmode@gnu.org

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://yhetil.org/orgmode/0 orgmode/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 orgmode orgmode/ https://yhetil.org/orgmode \
		emacs-orgmode@gnu.org
	public-inbox-index orgmode

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.orgmode
	nntp://news.gmane.io/gmane.emacs.orgmode


code repositories for project(s) associated with this inbox:

	orgmode.git.git (no URL configured)

AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git