Newsgroups: gmane.emacs.devel
Subject: Re: case-insensitive string comparison
Date: Wed, 20 Jul 2022 06:39:46 +0200
To: emacs-devel@gnu.org
Cc: Mattias Engdegård
On Tue, Jul 19, 2022 at 02:56:45PM -0400, Sam Steingold wrote:
>
> * Mattias Engdegård [2022-07-19 20:06:50 +0200]:
>
> > On 19 July 2022 at 19:27, Sam Steingold wrote:
> >
> >> (defun string-equal-ignore-case (s1 s2)
> >
> > What would you tell someone complaining that
> >
> > (let ((rue "Straße"))
> >   (string-equal-ignore-case rue (upcase rue)))
> >
> > returns nil? Asking for a friend.
>
> This is a well-known bug in user code.
> https://stackoverflow.com/q/319426/850781

One case (heh) that gets too little attention in that (good) reference is the dotted and dotless i: (1) "i", (2) "ı", (3) "İ" and (4) "I". You have to choose a language environment to have any chance of getting it right: most languages written in the Latin script have only (1) and (4), which map to each other, whereas in Turkic languages (1) and (3) correspond, as do (2) and (4).

The Unicode FAQ [1] linked from your reference shows that even the Unicode folks have given up on that. To me, it looks like an especially sleazy way to admit "well, folks, we've messed up on this one".

Human languages are a messy mix, in which politics figures prominently. Unicode reflects that.

Cheers

[1] http://unicode.org/faq/casemap_charprop.html#9

-- 
t
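Both pitfalls raised in this thread can be reproduced outside Emacs. Below is a minimal Python sketch, not Emacs Lisp and not the `string-equal-ignore-case` under discussion. First, upcasing "Straße" yields "STRASSE", so a naive compare that lowercases both strings fails, while full Unicode case folding (`str.casefold()`) succeeds. Second, the dotted/dotless i has no single right answer. The `turkic` flag and the helper names are hypothetical stand-ins for a language environment. The flag applies the Turkic ("T") mappings from Unicode's CaseFolding.txt, which `str.casefold()` does not apply. No normalization is done, so a decomposed "I" plus combining dot is not handled.

```python
# Minimal sketch of caseless string comparison in Python (not Emacs Lisp).
# ASSUMPTION: `turkic` is a hypothetical flag standing in for the
# language environment; no Unicode normalization (NFC/NFD) is performed.


def fold(s: str, turkic: bool = False) -> str:
    """Return a case-folded copy of S for caseless equality tests."""
    if turkic:
        # Turkic ("T") mappings from Unicode CaseFolding.txt, which
        # str.casefold() does not apply on its own:
        #   U+0049 'I' -> U+0131 dotless 'ı',  U+0130 'İ' -> U+0069 'i'
        s = s.replace("I", "\u0131").replace("\u0130", "i")
    # Full default case folding, e.g. 'ß' folds to 'ss'.
    return s.casefold()


def equal_ignore_case(a: str, b: str, turkic: bool = False) -> bool:
    """Compare A and B after folding both with the same rules."""
    return fold(a, turkic) == fold(b, turkic)


def naive_equal_ignore_case(a: str, b: str) -> bool:
    """Lowercase both strings and compare; breaks on 'Straße'."""
    return a.lower() == b.lower()


rue = "Straße"
print(rue.upper())                                # STRASSE
print(naive_equal_ignore_case(rue, rue.upper()))  # False: 'straße' != 'strasse'
print(equal_ignore_case(rue, rue.upper()))        # True: both fold to 'strasse'

# Dotted/dotless i: the answer depends on the language.
# "\u0131" is dotless 'ı'; "\u0130" is capital dotted 'İ'.
print(equal_ignore_case("\u0131", "I"))                             # False
print(equal_ignore_case("\u0131", "I", turkic=True))                # True
print(equal_ignore_case("\u0130STANBUL", "istanbul"))               # False
print(equal_ignore_case("\u0130STANBUL", "istanbul", turkic=True))  # True
```

Folding both sides with the same rules, rather than upcasing or downcasing one side, is what makes the "Straße" case work. The Turkic results show why the caller has to supply the language: with the flag off, "i" and "I" are equal, and with it on they are not.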