From: Eli Zaretskii <eliz@gnu.org>
To: Maxim Nikulin <m.a.nikulin@gmail.com>
Cc: yantar92@posteo.net, 59275@debbugs.gnu.org
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 17:42:40 +0200
Message-ID: <83ilj0pfnz.fsf@gnu.org>
In-Reply-To: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com> (message from Maxim Nikulin on Sun, 27 Nov 2022 22:19:24 +0700)

> From: Maxim Nikulin <m.a.nikulin@gmail.com>
> Date: Sun, 27 Nov 2022 22:19:24 +0700
> Cc: Ihor Radchenko <yantar92@posteo.net>, 59275@debbugs.gnu.org
> 
> I do not like that some functions use `string-collate-lessp' with the
> IGNORE-CASE argument while other places pass strings through `downcase'.
> When a proper locale implementation is available, I believe it is
> better to use IGNORE-CASE consistently.

I already explained up-thread why we ignore IGNORE-CASE when the collation
order is not known.  I stand by that reasoning.  I believe your opinion is
based on considering only simple locales, and on a-priori knowledge of what
the locale's collation is to begin with, which Emacs cannot know in that
case.

> When `string-collate-lessp' disregards IGNORE-CASE, I consider it 
> acceptable to use `downcase' (`upcase' may be worse since Org currently 
> uses `downcase'). It provides reasonable balance of invested efforts and 
> obtained result.

We disagree, sorry.

> `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")
> (sort lst #'string-lessp)
> => ("semana" "señor" "sepia" "señor")
> (sort lst #'string-collate-lessp)
> => ("semana" "señor" "señor" "sepia")
> 
> `string-collate-lessp' is able to handle at least some cases

On what OS and with which libc?

And I don't think this is evidence of collation knowing about equivalent
sequences.  It is most probably a side effect of collation ignoring
Latin accents altogether.
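
For reference, the `string-lessp' result quoted above follows from plain
code point comparison of the two spellings.  A minimal sketch (the `my/'
variable names are made up for this illustration; what
`string-collate-lessp' returns for the same strings depends on the OS and
libc, so no result is shown for it):

```elisp
;; Illustration only: why plain `string-lessp' separates the two spellings.
(require 'ucs-normalize)

;; Escapes are used so the example does not depend on file encoding.
(defvar my/senor-nfc "se\u00f1or"
  "Precomposed form: U+00F1 LATIN SMALL LETTER N WITH TILDE.")
(defvar my/senor-nfd (ucs-normalize-NFD-string my/senor-nfc)
  "Decomposed form: plain n followed by U+0303 COMBINING TILDE.")

(length my/senor-nfc)                  ; => 5
(length my/senor-nfd)                  ; => 6
;; The third character decides: n (#x6e) < p (#x70) < U+00F1 (#xf1).
(string-lessp my/senor-nfd "sepia")    ; => t
(string-lessp "sepia" my/senor-nfc)    ; => t
```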

> >> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> > 
> > This is about Python, no?
> 
> The value of this link is a collection of examples that are not obvious 
> for everybody. They are applicable to behavior `string-lessp' vs. 
> `string-collate-lessp' as well.

Which parts are applicable, in your opinion, and in what way?

> >> From my point of view, e.g., the case transformation rule for Turkish I
> >> is a minor issue
> > 
> > Why, Org doesn't want to support Turkish users?
> 
> From my point of view it is a minor issue in comparison to
> 
>      (string-collate-lessp "a" "B" "C" t)  ; => nil
> 
> that breaks comparison not only for accented letters.

Org is free to make such misguided decisions, but Emacs won't.  We cannot
decide that some locale is "minor" and others are "major".  My suggestion is
to look for a solution that works in any locale.

> You almost managed to convince Ihor to use `string-lessp' instead of 
> `string-collate-lessp'. I do not think it would improve the quality of 
> support for the Turkish language.

I didn't try to convince Ihor of anything, just point out the pitfalls of
using locale-specific collation order in portable programs.  I said back
then that I don't know enough to evaluate your decisions.  Once you
understand the subtle issues with these APIs, it is your call to decide how
to solve your particular problems.

> My suggestion is to fall back to `downcase' and `string-lessp' only if 
> `string-collate-lessp' is unable to provide case insensitive comparison.

You can do that in Org if that's the decision of the Org developers.  Emacs
cannot do that automatically for the reasons I explained up-thread.
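
To make the proposal concrete, here is a minimal sketch of what such a
fallback might look like on the Org side.  Both function names are
hypothetical (they exist in neither Emacs nor Org), and the probe is an
assumption of this sketch: it checks only one ASCII pair in the current
locale, so it says nothing about other scripts, and the `downcase' branch
remains subject to the objections discussed in this thread (e.g. Turkish I).

```elisp
;; Hypothetical sketch, not an Emacs or Org API.
(defun my/collation-ignores-case-p ()
  "Return non-nil if collation seems to honor IGNORE-CASE here.
Heuristic: probes only ASCII letters in the current locale."
  (and (string-collate-equalp "a" "A" nil t)
       (string-collate-lessp "a" "B" nil t)))

(defun my/string-lessp-ignore-case (s1 s2)
  "Compare S1 and S2 ignoring case, preferring locale collation.
Fall back to `downcase' plus `string-lessp' when the probe fails."
  (if (my/collation-ignores-case-p)
      (string-collate-lessp s1 s2 nil t)
    (string-lessp (downcase s1) (downcase s2))))
```

Whether the probe is a sound basis for the decision is exactly the point
under dispute, so treat this as a starting point for discussion rather than
a recommended implementation.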

> >> My argument against `downcase' in `string-collate-lessp' is that it may
> >> add noticeable performance penalty.
> > 
> > I'd worry about correctness before performance.
> 
> `downcase' with `string-lessp' handles more cases than just 
> `string-lessp' (leaving aside buffer-local conversion tables), so from 
> my point of view the former is more correct.

I'm quite sure this is only true for the cases that you considered, not in
general.

> Even `downcase' with fixed "C" locale may give result more consistent with
> user expectations.

How does it help on systems where locale-specific collation is not
accessible to Emacs?

> My impression is that users may be familiar with widespread problems with
> sorting.

Not IME.  But that's a separate issue, and I don't pretend to know Org users
better than you do, so I will defer to you on this one.




