From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sat, 26 Nov 2022 11:22:29 +0200
Message-ID: <83k03it6i2.fsf@gnu.org>
References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> <87v8n2je5q.fsf@localhost>
Cc: 59275@debbugs.gnu.org
To: Ihor Radchenko
In-Reply-To: <87v8n2je5q.fsf@localhost> (message from Ihor Radchenko on Sat, 26 Nov 2022 08:47:13 +0000)

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Sat, 26 Nov 2022 08:47:13 +0000
>
> > 'downcase' uses the buffer-local case table if such is defined for the
> > buffer that happens to be the current when you invoke 'downcase', and that's
> > another cause of inconsistency and user surprises, especially when the
> > strings you compare don't really "belong" to the current buffer.
>
> Interesting. Is there any reason why this is not mentioned in the
> docstring for `downcase'?

Yes: because we are ashamed of that and hope to change it at some point, if
we ever figure out how to do that.

The way to avoid this caveat is simple: temporarily bind the case table
(for example, with 'with-case-table') when you call 'downcase'.

> I now see 4.10 The Case Table section of the manual, and it looks like
> case tables should be set mostly automatically (by Emacs?) according to
> the language environment.

Yes.  But a buffer can have its local case-table.

> Are details about this process documented anywhere?

No.  But see characters.el and the function I mention below.

> Are these case conversion tables independent of glibc?

Yes.  We build them completely separately and from scratch, as you will see
in characters.el.

> https://nullprogram.com/blog/2014/06/13/ that mentioned something
> similar about caveats with composition.

I don't see there anything about sorting or collation.  What did I miss?

> Just mentioning it for your reference.
> (I am not sure if the caveats
> discussed have been raised on Emacs devel).

What did you think ought to be discussed?

Btw, that blog fails to distinguish between display-time features and
processing of text without displaying it.  On display, Emacs combines
characters that are combining, so equivalent character sequences should look
the same.  But Emacs doesn't by default consider equivalent character
sequences as equal in all situations, leaving this to the Lisp program.
Considering them always as equal looks sexy in a blog post, because it
raises some brows and has the "whoah!" effect, but isn't a good policy in
general, since some applications definitely need to know about the original
decomposed sequence.  We cannot conceal this from Lisp programs by hiding
the original sequence on some low level that is not exposed to Lisp.  Yes,
this makes Lisp programs more complicated, but that comes with the
territory: you cannot have power without complexity.

> I feel that I miss something. Don't Emacs provide unicode case
> conversion tables?

The case tables we provide are based on Unicode, but are tweaked by the
language-environment.  See, for example, turkish-case-conversion-enable,
which is run when the Turkish language-environment is turned on.

> Why plain ASCII rules?

Your logic is.  What you suggest breaks down if you consider various
complications in some locales.

> > And we are talking about a single system where these problems happen, which
> > is macOS, right? Wouldn't it be better for "Someone" who uses macOS to just
> > bite the bullet and write a proper collation function, or find a free
> > software implementation of one, and include it in Emacs? This is what I did
> > for MS-Windows at the time string-collate-lessp was added to Emacs. Why
> > cannot macOS users do the same?
>
> It would be. But how can we ask for this? etc/TODO? Or maybe re-open
> this bug report?

Anything will be fine with me, but unless the people who are asking you to
do these workarounds are motivated enough to sit down and do the job, we
will never get there.  And guess what effect these workarounds have on their
motivation.
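
For illustration of the earlier point about 'downcase' and buffer-local case
tables, here is a minimal sketch of one way to make the call independent of
the current buffer.  It assumes that the standard case table is the one you
want, which may not hold for every language environment, and the helper
name 'my-downcase-standard' is hypothetical, not something that exists in
Emacs.

```elisp
;; Sketch only: `my-downcase-standard' is a hypothetical helper name.
;; Assumption: the standard case table is the desired one for STRING.
(defun my-downcase-standard (string)
  "Return STRING downcased using the standard case table.
Any case table local to the current buffer is ignored for the
duration of the call."
  ;; `with-case-table' temporarily installs the given case table in the
  ;; current buffer and restores the previous one afterwards.
  (with-case-table (standard-case-table)
    (downcase string)))

;; Usage: the result does not depend on which buffer is current.
(with-temp-buffer
  (my-downcase-standard "HELLO"))
```

Note that the standard case table can itself be adjusted by the language
environment, so this removes only the dependency on the current buffer, not
the dependency on the session's language settings.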