unofficial mirror of bug-gnu-emacs@gnu.org 
From: Maxim Nikulin <m.a.nikulin@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Ihor Radchenko <yantar92@posteo.net>, 59275@debbugs.gnu.org
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 22:19:24 +0700
Message-ID: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com>
In-Reply-To: <83mt8cpjbd.fsf@gnu.org>

On 27/11/2022 21:23, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Sun, 27 Nov 2022 21:00:50 +0700
>>
>> Concerning Org, my point is that caseless sorting should be uniform.
> 
> You need to work hard to get that.  Just using 'downcase' is not enough, and
> neither is using 'string-collate-equalp'.

I do not like that some functions use `string-collate-lessp' with the
IGNORE-CASE argument while other places pass strings through `downcase'.
When a proper locale implementation is available, I believe it is better
to use IGNORE-CASE consistently. I assume that the text is presented to
users, not serialized to be saved or sent as data.

When `string-collate-lessp' disregards IGNORE-CASE, I consider it
acceptable to use `downcase' (`upcase' may be worse since Org currently
uses `downcase'). It provides a reasonable balance between invested
effort and obtained result.

>> Does not composed/decomposed representation affect comparison result?
> 
> They are different texts, so yes, they do, and they should.
> If you want to treat such strings as equivalent, you need to work even
> harder, since Emacs currently doesn't have enough infrastructure to do it
> right in all cases.

(setq lst `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia"))
(sort (copy-sequence lst) #'string-lessp)
=> ("semana" "señor" "sepia" "señor")
(sort (copy-sequence lst) #'string-collate-lessp)
=> ("semana" "señor" "señor" "sepia")

`string-collate-lessp' is able to handle at least some such cases, which
is another argument for using it.

>> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> 
> This is about Python, no?

The value of this link is its collection of examples that are not obvious
to everybody. They apply to the behavior of `string-lessp' vs.
`string-collate-lessp' as well.

>>  From my point of view e.g. case transformation rule for Turkish I is a
>> minor issue
> 
> Why, Org doesn't want to support Turkish users?

 From my point of view it is a minor issue in comparison to

     (string-collate-lessp "a" "B" "C" t)  ; => nil

that breaks comparison not only for accented letters.

You almost managed to convince Ihor to use `string-lessp' instead of
`string-collate-lessp'. I do not think it would improve the quality of
Turkish language support.

My suggestion is to fall back to `downcase' and `string-lessp' only if 
`string-collate-lessp' is unable to provide case insensitive comparison.
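
A minimal sketch of such a fallback, assuming that a one-off probe with
ASCII strings is a good enough indicator. The names
`my/collate-honors-ignore-case-p' and `my/caseless-lessp' are
hypothetical and exist neither in Emacs nor in Org:

```elisp
;; Hypothetical helper: heuristic probe, not an exhaustive check.
(defun my/collate-honors-ignore-case-p ()
  "Return non-nil if `string-collate-lessp' seems to honor IGNORE-CASE."
  ;; With a working implementation "a" sorts before "B" when case
  ;; is ignored; a plain code-point comparison returns nil here.
  (string-collate-lessp "a" "B" nil t))

;; Hypothetical helper: caseless predicate with a fallback.
(defun my/caseless-lessp (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case."
  (if (my/collate-honors-ignore-case-p)
      ;; Locale-aware comparison in the current locale.
      (string-collate-lessp s1 s2 nil t)
    ;; Fallback: compare the downcased strings by character code.
    (string-lessp (downcase s1) (downcase s2))))
```

Something like (sort (copy-sequence strings) #'my/caseless-lessp) would
then use one predicate everywhere, whichever branch is taken.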

>> My argument against `downcase' in `string-collate-lessp' is that it may
>> add noticeable performance penalty.
> 
> I'd worry about correctness before performance.

`downcase' with `string-lessp' handles more cases than `string-lessp'
alone (leaving aside buffer-local conversion tables), so from my point of
view the former is more correct. Even `downcase' with a fixed "C" locale
may give results more consistent with user expectations. My impression is
that users may be familiar with widespread sorting problems.
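
A small illustration of the difference, restricted to plain ASCII
strings so that the result does not depend on the locale (default case
table assumed):

```elisp
;; `string-lessp' compares character codes, so every uppercase ASCII
;; letter sorts before every lowercase one.
(string-lessp "a" "B")                        ; => nil
;; Downcasing both strings first gives the order users expect.
(string-lessp (downcase "a") (downcase "B"))  ; => t
```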
