From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
Date: Wed, 09 Dec 2020 17:46:10 +0200
Message-ID: <83eejz56f1.fsf@gnu.org>
In-Reply-To: <28B85957-B8DB-431D-A120-F17D8AE4693F@acm.org>
To: Mattias Engdegård
Cc: kehoea@parhasard.net, larsi@gnus.org, 11309-done@debbugs.gnu.org
> From: Mattias Engdegård
> Date: Wed, 9 Dec 2020 15:37:19 +0100
> Cc: Lars Ingebrigtsen, Aidan Kehoe,
>         11309-done@debbugs.gnu.org
>
> ß is a lower-case letter, so lowercasep(ß)=false is wrong. As a consequence, matching ß with [:lower:] and [:upper:] doesn't work correctly: ß should be matched by [:lower:] when case-fold-search is nil, and by both [:lower:] and [:upper:] when case-fold-search is non-nil.
>
> The problem stems from the fact that uppercasep and lowercasep don't use the Unicode case information directly (which perhaps they should) but derive the case indirectly from the upcase and downcase tables, and there is no way to state that a char is lower case but cannot be upcased or downcased. (Below I'm going to use the notation T[C] for the table T indexed by character C.)
>
> Currently, characters missing from or self-mapping in the upcase and downcase tables are considered to be caseless. For instance, upcase[*]=downcase[*]=* and upcase[中]=downcase[中]=nil. However, we also have upcase[ß]=downcase[ß]=ß, causing the incorrect lowercasep result.
>
> The solution that I ended up applying was the simplest possible: set upcase[ß]=ẞ (U+1E9E). The special-uppercase properties ensure that (upcase "ß") => "SS", and now all tests pass.
>
> (An acceptable alternative would have been to set upcase[ß]=nil and adapt lowercasep accordingly. I tried that and it works flawlessly, but it involves slightly more changes.)

And that concludes the resolution of this bug. Thanks.
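The table-based case derivation described in the quoted message can be illustrated with a small Python model. This is a hypothetical sketch, not Emacs's C code: plain dictionaries stand in for the upcase and downcase tables (a missing entry plays the role of nil and behaves like a self-mapping), and `make_tables`, `uppercasep`, `lowercasep` and `matches_class` are illustrative names. The exact predicate rules are an assumption chosen to match the behaviour the message describes: a character counts as upper case when downcasing changes it, and as lower case when it is not upper case and upcasing changes it.

```python
# Toy model (hypothetical, not Emacs source) of case tables and the
# table-derived case predicates discussed in bug#11309.

SHARP_S = "\u00df"          # ß  LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ  LATIN CAPITAL LETTER SHARP S


def make_tables(fixed):
    """Return small (upcase, downcase) tables.

    fixed=False: ß maps to itself in both tables (the old state).
    fixed=True:  upcase[ß] = ẞ (the change described in the message).
    Characters such as 中 are simply absent, i.e. their entry is "nil".
    """
    upcase = {"a": "A", "A": "A", "*": "*", SHARP_S: SHARP_S}
    downcase = {"a": "a", "A": "a", "*": "*",
                SHARP_S: SHARP_S, CAPITAL_SHARP_S: SHARP_S}
    if fixed:
        upcase[SHARP_S] = CAPITAL_SHARP_S
    return upcase, downcase


def lookup(table, c):
    # A missing (nil) entry behaves like a self-mapping.
    return table.get(c, c)


def uppercasep(c, upcase, downcase) -> bool:
    # Assumed rule: upper case iff downcasing changes the character.
    return lookup(downcase, c) != c


def lowercasep(c, upcase, downcase) -> bool:
    # Assumed rule: lower case iff not upper case and upcasing changes it.
    return not uppercasep(c, upcase, downcase) and lookup(upcase, c) != c


def matches_class(c, cls, case_fold, upcase, downcase) -> bool:
    """Hypothetical [:lower:]/[:upper:] test following the message: with
    case folding, either class matches any lower- or upper-case letter."""
    lower = lowercasep(c, upcase, downcase)
    upper = uppercasep(c, upcase, downcase)
    if case_fold:
        return lower or upper
    return lower if cls == "lower" else upper


if __name__ == "__main__":
    for fixed in (False, True):
        up, down = make_tables(fixed)
        print("fixed" if fixed else "old",
              "lowercasep(ß):", lowercasep(SHARP_S, up, down),
              "[:upper:] folded:", matches_class(SHARP_S, "upper", True, up, down))
    # Python's own str.upper applies Unicode SpecialCasing, so ß -> "SS".
    print(SHARP_S.upper())
```

In this model, lowercasep(ß) is false with the old tables, because ß maps to itself in both, and becomes true once upcase[ß] is set to ẞ. With the fixed tables, ß then matches [:lower:] without case folding and both classes with folding. The final line uses Python's own `str.upper`, not Emacs: it applies the Unicode SpecialCasing data and yields "SS", which is analogous to the special-uppercase property that keeps `(upcase "ß")` returning "SS" even though the single-character table entry is now ẞ.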