From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 17:42:40 +0200
Message-ID: <83ilj0pfnz.fsf@gnu.org>
In-Reply-To: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com> (message from Maxim Nikulin on Sun, 27 Nov 2022 22:19:24 +0700)
To: Maxim Nikulin
Cc: yantar92@posteo.net, 59275@debbugs.gnu.org

> From: Maxim Nikulin
> Date: Sun, 27 Nov 2022 22:19:24 +0700
> Cc: Ihor Radchenko, 59275@debbugs.gnu.org
>
> I do not like that in some functions `string-collate-lessp' with
> IGNORE-CASE argument is used while strings are passed through `downcase'
> in other places. When proper locales implementation is available, I
> believe, it is better to consistently use IGNORE-CASE.

I already explained up-thread why we ignore IGNORE-CASE when collation
order is not known. I stand by that reasoning. I believe your opinion
is based on considering only simple locales, and on a priori knowledge
of what the locale's collation is to begin with, something that Emacs
cannot know in that case.

> When `string-collate-lessp' disregards IGNORE-CASE, I consider it
> acceptable to use `downcase' (`upcase' may be worse since Org currently
> uses `downcase'). It provides reasonable balance of invested efforts and
> obtained result.

We disagree, sorry.

> `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")
> (sort lst #'string-lessp)
> => ("semana" "señor" "sepia" "señor")
> (sort lst #'string-collate-lessp)
> => ("semana" "señor" "señor" "sepia")
>
> `string-collate-lessp' is able to handle at least some cases

On what OS and with which libc? And I don't think this is evidence of
collation knowing about equivalent sequences. It is most probably a
side effect of collation ignoring Latin accents altogether.

> >> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> >
> > This is about Python, no?
>
> The value of this link is a collection of examples that are not obvious
> for everybody. They are applicable to behavior `string-lessp' vs.
> `string-collate-lessp' as well.

Which parts are applicable, in your opinion, and in what way?

> >> From my point of view e.g. case transformation rule for Turkish I is a
> >> minor issue
> >
> > Why, Org doesn't want to support Turkish users?
>
> From my point of view it is a minor issue in comparison to
>
> (string-collate-lessp "a" "B" "C" t) ; => nil
>
> that breaks comparison not only for accented letters.

Org is free to make such misguided decisions, but Emacs won't. We
cannot decide that some locale is "minor" and others are "major". My
suggestion is to look for a solution that works in any locale.

> You almost managed to convince Ihor to use `string-lessp' instead of
> `string-collate-lessp'. I do not think it would improve quality of
> support of Turkish language.

I didn't try to convince Ihor of anything, I just pointed out the
pitfalls of using locale-specific collation order in portable programs.
I said back then that I don't know enough to evaluate your decisions.
Once you understand the subtle issues with these APIs, it is your call
to decide how to solve your particular problems.

> My suggestion is to fall back to `downcase' and `string-lessp' only if
> `string-collate-lessp' is unable to provide case insensitive comparison.

You can do that in Org if that's the decision of the Org developers.
Emacs cannot do that automatically for the reasons I explained
up-thread.

> >> My argument against `downcase' in `string-collate-lessp' is that it may
> >> add noticeable performance penalty.
> >
> > I'd worry about correctness before performance.
>
> `downcase' with `string-lessp' handles more cases than just
> `string-lessp' (leaving aside buffer-local conversion tables), so from
> my point of view the former is more correct.

I'm quite sure this is only true for the cases that you considered,
not in general.

> Even `downcase' with fixed "C" locale may give result more consistent with
> user expectations.

How does it help on systems where locale-specific collation is not
accessible to Emacs?

> My impression is that users may be familiar with widespread problems
> with sorting.

Not IME. But that's a separate issue, and I don't pretend to know Org
users better than you do, so I will defer to you on this one.
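
For concreteness, the application-side fallback discussed above (use
`downcase' and `string-lessp' only when `string-collate-lessp' does not
provide case-insensitive comparison) might look roughly like the sketch
below. The names `my-collation-honors-ignore-case-p' and
`my-string-lessp-ignore-case' are hypothetical, and the probe is an
assumed heuristic on a single ASCII pair, not something Emacs or Org
provides. The caveats raised in this thread still apply: the fallback
branch is not locale-aware (Turkish I being one example), and the probe
says nothing about scripts it does not test.

```elisp
;; Sketch only: both function names and the probe are hypothetical
;; illustrations, not part of Emacs or Org.

(defun my-collation-honors-ignore-case-p ()
  "Return non-nil if IGNORE-CASE seems to take effect in the current locale.
This is a heuristic check on one ASCII pair; it proves nothing about
other scripts or other locales."
  (string-collate-equalp "a" "A" nil t))

(defun my-string-lessp-ignore-case (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case.
Prefer locale collation with IGNORE-CASE when the probe succeeds.
Otherwise fall back to `string-lessp' on downcased strings, which is
codepoint order and therefore not locale-correct."
  (if (my-collation-honors-ignore-case-p)
      (string-collate-lessp s1 s2 nil t)
    (string-lessp (downcase s1) (downcase s2))))
```

Such a predicate could be passed to `sort', for example
(sort (list "b" "A" "c") #'my-string-lessp-ignore-case). Whether this
trade-off is acceptable is a decision for the application, since the
result still depends on the OS and libc in use.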