From: Maxim Nikulin
Newsgroups: gmane.emacs.bugs
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 21:00:50 +0700
Message-ID: <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com>
In-Reply-To: <83k03it6i2.fsf@gnu.org>
To: Eli Zaretskii, Ihor Radchenko
Cc: 59275@debbugs.gnu.org

On 26/11/2022 16:22, Eli Zaretskii wrote:
>> From: Ihor Radchenko
>> Date: Sat, 26 Nov 2022 08:47:13 +0000
>>
>>> 'downcase' uses the buffer-local case table if such is defined for the
>>> buffer that happens to be the current when you invoke 'downcase', and that's
>>> another cause of inconsistency and user surprises, especially when the
>>> strings you compare don't really "belong" to the current buffer.

`downcase' is already used in Org for case-insensitive sorting. I am not
sure whether that usage predates the introduction of
`string-collate-lessp'. A buffer-local case table is not a problem when
sorting table rows, list items (the text formatting objects, not Elisp
lists), or tags local to the current file. However, when the agenda is
built from several files, the current buffer should not affect the order
of entries.

Concerning Org, my point is that caseless sorting should be uniform.
Currently different functions use distinct approaches, and that is a more
severe inconsistency.

>> https://nullprogram.com/blog/2014/06/13/ that mentioned something
>> similar about caveats with composition.
>
> I don't see there anything about sorting or collation. What did I miss?

Doesn't the choice between composed and decomposed representations affect
the comparison result? The emacs-devel thread mentioned earlier in this
bug contains a link describing plenty of issues with string comparison:
https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison

>>> And we are talking about a single system where these problems happen, which
>>> is macOS, right? Wouldn't it be better for "Someone" who uses macOS to just
>>> bite the bullet and write a proper collation function, or find a free
>>> software implementation of one, and include it in Emacs?

My impression was that clang should eventually get better locale support.
If so, I have doubts about a macOS-specific implementation. I do not have
a macOS machine, so my assumption about the locale implementation there
may be wrong.

However, Emacs may benefit from its own implementation of collation
(based on the built-in Unicode character database) used on (almost) all
OSes. It would allow using several locales in parallel without switching
the libc locale, which is not thread-safe.

I consider `downcase' a kind of workaround (a poor man's ignore-case)
that degrades gracefully compared to `string-lessp'. From my point of
view, issues such as the case transformation rule for the Turkish I are
minor compared to completely disregarding the IGNORE-CASE argument, at
least when the results are presented to users.

My argument against using `downcase' in `string-collate-lessp' is that it
may add a noticeable performance penalty.

Interestingly, `compare-strings' uses upcase conversion when the
IGNORE-CASE argument is non-nil. I believed that some implementations
(unrelated to Emacs) might have problems with, e.g., ß, so I considered
downcase the safer option.
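
A small illustration of the upcase vs. downcase, ß, and composed vs.
decomposed points, written in Python only because its default Unicode case
mappings and `unicodedata' normalization are easy to show. It is not Emacs
code: str.lower()/str.upper() are locale-independent Unicode defaults, not
Emacs case tables, and the key_* helpers are hypothetical names.

```python
import unicodedata

# Illustration only, in Python rather than Emacs Lisp. str.lower() and
# str.upper() apply the Unicode default (locale-independent) full case
# mappings; they stand in for, and do not reproduce, Emacs `downcase' and
# `upcase' with a case table. The key_* helpers are hypothetical names.


def key_down(s: str) -> str:
    """Ignore case by lowercasing (analogous to sorting on `downcase')."""
    return s.lower()


def key_up(s: str) -> str:
    """Ignore case by uppercasing (analogous to an upcase-based comparison)."""
    return s.upper()


def key_nfc(s: str) -> str:
    """Normalize to the composed (NFC) form before comparing."""
    return unicodedata.normalize("NFC", s)


# 1. Folding direction changes the order: '_' (U+005F) lies between
#    'Z' (U+005A) and 'a' (U+0061).
words = ["aa", "a_", "AB"]
by_down = sorted(words, key=key_down)  # ['a_', 'aa', 'AB']
by_up = sorted(words, key=key_up)      # ['aa', 'AB', 'a_']

# 2. Uppercasing is not one-to-one: U+00DF (sharp s) becomes "SS", so an
#    upcase-based comparison treats "stra\u00dfe" and "strasse" as equal,
#    while a downcase-based one does not.
same_when_up = key_up("stra\u00dfe") == key_up("strasse")        # True
same_when_down = key_down("stra\u00dfe") == key_down("strasse")  # False

# 3. Composed U+00E9 and decomposed "e" + U+0301 look identical but sort
#    differently by code point relative to "f"; NFC makes them agree.
composed, decomposed = "\u00e9", "e\u0301"
raw_composed = sorted(["f", composed])                   # "f" comes first
raw_decomposed = sorted(["f", decomposed])               # "f" comes last
nfc_decomposed = sorted(["f", decomposed], key=key_nfc)  # "f" first again

# 4. Default mappings are not Turkish-aware: "I" lowercases to dotted "i".
turkish_i = "I".lower()  # 'i', not U+0131 dotless i

if __name__ == "__main__":
    for name, value in [("by_down", by_down), ("by_up", by_up),
                        ("same_when_up", same_when_up),
                        ("same_when_down", same_when_down),
                        ("raw_decomposed", raw_decomposed),
                        ("nfc_decomposed", nfc_decomposed),
                        ("turkish_i", turkish_i)]:
        print(name, ascii(value))
```

How closely Emacs functions follow these Unicode defaults depends on the
case table in effect, so treat this only as an illustration of why the
folding direction and normalization can change comparison results.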