unofficial mirror of bug-gnu-emacs@gnu.org 
From: Ihor Radchenko <yantar92@posteo.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 59275@debbugs.gnu.org
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sat, 26 Nov 2022 08:47:13 +0000
Message-ID: <87v8n2je5q.fsf@localhost>
In-Reply-To: <83r0xqta0d.fsf@gnu.org>

Eli Zaretskii <eliz@gnu.org> writes:

>> We concluded that a better fallback when collation is not available
>> would be using downcase+string-lessp when `string-collate-lessp' is
>> called with non-nil IGNORE-CASE argument.
>
> This has caveats, see below.  I won't argue about your Org-local decision,
> since I don't know enough about the intended uses of what you did, but I do
> have something to say about this decision in general.  I suggest at least a
> FIXME comment where you do this stuff, based on what I tell below.

Thanks for the information!

>> Would it be acceptable for Emacs to change the fallback behavior of
>> `string-collate-lessp' to:
>> 
>> 1. If string collation is not available and IGNORE-CASE is nil, fallback
>>    to`string-lessp';
>> 2. If string collation is not available and IGNORE-CASE is non-nil,
>>    use `downcase' + `string-lessp'.
>
> 'downcase' uses the buffer-local case table if such is defined for the
> buffer that happens to be the current when you invoke 'downcase', and that's
> another cause of inconsistency and user surprises, especially when the
> strings you compare don't really "belong" to the current buffer.

Interesting. Is there any reason why this is not mentioned in the
docstring for `downcase'?

I now see the section "4.10 The Case Table" of the manual, and it looks
like case tables are mostly set up automatically (by Emacs?) according to
the language environment. Are the details of this process documented
anywhere? Are these case conversion tables independent of glibc?
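
For reference, here is a rough sketch of the fallback proposed above,
with the case table pinned to the standard one so that the table of
whatever buffer happens to be current does not leak into the comparison.
The function name `my-string-lessp-fallback' is made up for illustration
and is not an existing Emacs or Org function. Note that this only
addresses the buffer-local part of the caveat: the standard case table
itself may still be adjusted by the language environment.

```elisp
;; Hypothetical helper, not part of Emacs or Org.
(defun my-string-lessp-fallback (s1 s2 &optional ignore-case)
  "Compare S1 and S2 without relying on system collation.
If IGNORE-CASE is nil, behave like `string-lessp'.  Otherwise,
downcase both strings using the standard case table first."
  (if ignore-case
      ;; `with-case-table' temporarily installs the given table in the
      ;; current buffer and restores the old one afterwards.
      (with-case-table (standard-case-table)
        (string-lessp (downcase s1) (downcase s2)))
    (string-lessp s1 s2)))
```

With this, (my-string-lessp-fallback "apple" "Banana" t) compares
"apple" against "banana", while the same call without IGNORE-CASE
compares the strings as they are.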

> Also, in
> some (rarely-used) locales, downcasing has unexpected results, even with the
> default case-table.  For example, downcasing "I" produces "ı", not "i" as
> expected.  Did you think about these cases when making the above decision?

I did not. However, I recall reading somewhere that it is possible to
work around this kind of issue by calling case conversion several times:
upcase -> downcase -> upcase -> downcase.
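
To be concrete about what I mean, the chain would look roughly like the
sketch below. The name `my-case-fold-roundtrip' is hypothetical, and I
have not verified that this actually helps in the locales you mention;
it only shows the order of the calls.

```elisp
;; Hypothetical helper, not part of Emacs: apply the conversion chain
;; upcase -> downcase -> upcase -> downcase to STRING.
(defun my-case-fold-roundtrip (string)
  "Return STRING after repeated case conversion.
Conversions are applied in the order upcase, downcase, upcase,
downcase, using the case table of the current buffer."
  (downcase (upcase (downcase (upcase string)))))
```

For plain ASCII input this is the same as a single `downcase'; any
difference could only show up for characters whose case mappings do
not round-trip.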

Now that you have reminded me about this caveat, I also recall
https://nullprogram.com/blog/2014/06/13/, which mentioned similar
caveats with composition. I am just mentioning it for your reference.
(I am not sure whether the caveats discussed there have been raised on
emacs-devel.)

>> I also do not think that it will be backwards-incompatible. If the call
>> to `string-collate-lessp' explicitly requests ignoring case, `downcase'
>> is more expected than bare `string-lessp' that _does not_ ignore case.
>> 
>> WDYT?
>
> See above.  What you suggest is perhaps fine for plain-ASCII text, but not
> in general, IMNSHO.
>
> The reason for what Emacs currently does on systems that lack collation
> functions is that for such systems collation rules are indeterminate, and so
> inventing them by following naïve rules of plain ASCII, in particular the
> case-conversion rules, is potentially very wrong.  These are general-purpose
> APIs, not something concrete in specific Org contexts, and as such, these
> APIs cannot "mostly work", they should work always and for every possible
> use case.

I feel that I am missing something. Doesn't Emacs provide Unicode case
conversion tables? Why plain ASCII rules?
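
What I have in mind is that `downcase' does not appear to be limited to
ASCII. A quick check, assuming the default language environment (not a
Turkish one); the helper name `my-downcase-standard' is made up for
illustration:

```elisp
;; Hypothetical helper, not part of Emacs: downcase STRING using the
;; standard case table instead of the current buffer's table.
(defun my-downcase-standard (string)
  "Return STRING downcased according to the standard case table."
  (with-case-table (standard-case-table)
    (downcase string)))

;; Non-ASCII letters to try; evaluate these to see the conversion.
(my-downcase-standard "ÄÖÜ")
(my-downcase-standard "ПРИВЕТ")
```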

> And we are talking about a single system where these problems happen, which
> is macOS, right?  Wouldn't it be better for "Someone" who uses macOS to just
> bite the bullet and write a proper collation function, or find a free
> software implementation of one, and include it in Emacs?  This is what I did
> for MS-Windows at the time string-collate-lessp was added to Emacs.  Why
> cannot macOS users do the same?

It would be. But how can we ask for this? etc/TODO? Or maybe re-open
this bug report?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




