From: Maxim Nikulin <m.a.nikulin@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Ihor Radchenko <yantar92@posteo.net>, 59275@debbugs.gnu.org
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 22:19:24 +0700
Message-ID: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com>
In-Reply-To: <83mt8cpjbd.fsf@gnu.org>
On 27/11/2022 21:23, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Sun, 27 Nov 2022 21:00:50 +0700
>>
>> Concerning Org, my point is that caseless sorting should be uniform.
>
> You need to work hard to get that. Just using 'downcase' is not enough, and
> neither is using 'string-collate-equalp'.
I do not like that some functions use `string-collate-lessp' with the
IGNORE-CASE argument while in other places strings are passed through
`downcase'. When a proper locale implementation is available, I
believe it is better to use IGNORE-CASE consistently. I assume that the
text is presented to users, not serialized to be saved or sent as data.
When `string-collate-lessp' disregards IGNORE-CASE, I consider it
acceptable to use `downcase' (`upcase' may be worse since Org currently
uses `downcase'). It provides a reasonable balance between invested
effort and obtained result.
>> Does not composed/decomposed representation affect comparison result?
>
> They are different texts, so yes, they do, and they should.
> If you want to treat such strings as equivalent, you need to work even
> harder, since Emacs currently doesn't have enough infrastructure to do it
> right in all cases.
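(setq lst `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia"))
;; `sort' is destructive, so each call works on a fresh copy.
(sort (copy-sequence lst) #'string-lessp)
=> ("semana" "señor" "sepia" "señor")
(sort (copy-sequence lst) #'string-collate-lessp)
=> ("semana" "señor" "señor" "sepia")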
`string-collate-lessp' is able to handle at least some cases, which is
another argument for using it.
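For comparison, a minimal sketch of treating composed and decomposed
forms as equal without relying on collation. The helper name
`my/nfc-string-equal' is hypothetical, and normalizing to NFC on every
comparison is only one possible approach; it does not cover all the
cases mentioned above.

```elisp
(require 'ucs-normalize)

(defun my/nfc-string-equal (a b)
  "Hypothetical helper: return non-nil if A and B are equal after NFC normalization."
  (string-equal (ucs-normalize-NFC-string a)
                (ucs-normalize-NFC-string b)))

;; "se\u00f1or" is the composed form; the NFD variant uses n + U+0303.
(let* ((composed "se\u00f1or")
       (decomposed (ucs-normalize-NFD-string composed)))
  (list (string-equal composed decomposed)          ; plain comparison
        (my/nfc-string-equal composed decomposed))) ; normalized comparison
```

The plain `string-equal' sees two different character sequences, while
the normalized comparison treats them as the same text.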
>> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
>
> This is about Python, no?
The value of this link is its collection of examples that are not obvious
to everybody. They apply to the behavior of `string-lessp' vs.
`string-collate-lessp' as well.
>> From my point of view e.g. case transformation rule for Turkish I is a
>> minor issue
>
> Why, Org doesn't want to support Turkish users?
From my point of view it is a minor issue in comparison to
(string-collate-lessp "a" "B" "C" t) ; => nil
that breaks comparison not only for accented letters.
You almost managed to convince Ihor to use `string-lessp' instead of
`string-collate-lessp'. I do not think it would improve the quality of
Turkish language support.
My suggestion is to fall back to `downcase' and `string-lessp' only if
`string-collate-lessp' is unable to provide case-insensitive comparison.
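A rough sketch of such a fallback, under the assumption that a single
probe comparison is enough to detect whether IGNORE-CASE is honored.
Both function names are hypothetical, not existing Emacs or Org API.

```elisp
(defun my/collate-ignore-case-works-p ()
  "Hypothetical probe: non-nil if `string-collate-lessp' honors IGNORE-CASE.
Assumes that a working implementation sorts \"a\" before \"B\" when
case is ignored, while an implementation that falls back to code
point order does not."
  (string-collate-lessp "a" "B" nil t))

(defun my/caseless-lessp (a b)
  "Hypothetical helper: case-insensitive comparison of strings A and B.
Use collation when IGNORE-CASE is supported; otherwise fall back to
`downcase' with `string-lessp'."
  (if (my/collate-ignore-case-works-p)
      (string-collate-lessp a b nil t)
    (string-lessp (downcase a) (downcase b))))

;; `sort' is destructive, hence the fresh list.
(sort (list "b" "A" "c") #'my/caseless-lessp)
```

In real code the probe result would likely be cached rather than
recomputed on every comparison.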
>> My argument against `downcase' in `string-collate-lessp' is that it may
>> add noticeable performance penalty.
>
> I'd worry about correctness before performance.
`downcase' with `string-lessp' handles more cases than just
`string-lessp' (leaving aside buffer-local conversion tables), so from
my point of view the former is more correct. Even `downcase' with a fixed
"C" locale may give a result more consistent with user expectations. My
impression is that users may be familiar with widespread sorting
problems.