From: Bruno Haible
Newsgroups: gmane.emacs.devel
Subject: Re: case-insensitive string comparison
Date: Mon, 25 Jul 2022 21:37:16 +0200
Message-ID: <2837483.e9J7NaK4W3@nimes>
To: emacs-devel@gnu.org
Cc: Sam Steingold

Sam Steingold asked:
> > (string-collate-equalp "a" "A" current-locale-environment t)
> > ==> nil
> > current-locale-environment
> > ==> "en_US.UTF-8"
>
> So, how do we do case-insensitive string comparison in Emacs?
>
> It is okay to add a `string-equal-ignore-case' based on `compare-strings'?
> (even though it does not recognize "SS" and "ß" as equal)
>
> Or should we first implement something like casefold in Python?
> https://docs.python.org/3/library/stdtypes.html#str.casefold

The Unicode Standard's algorithm for case-insensitive string comparison is
indeed much better thought-out than anything that you could come up with
within a month.

You are pointing to the Python implementation. But there's also an
implementation in GNU libunistring [1] and one in ICU4C [2]. Emacs could
surely use one of these.

The implementation from GNU libunistring is also available through Gnulib,
as a set of modules [3]. The most relevant modules are:

  unicase/u8-casecmp
  unicase/u8-casecoll
  unicase/u8-casefold
  unicase/u8-casemap
  unicase/u8-casexfrm
  unicase/u8-ct-casefold
  unicase/u8-ct-tolower
  unicase/u8-ct-totitle
  unicase/u8-ct-toupper

Bruno

[1] https://www.gnu.org/software/libunistring/manual/html_node/Case-insensitive-comparison.html
[2] https://unicode-org.github.io/icu/userguide/transforms/casemappings.html
[3] https://www.gnu.org/software/gnulib/MODULES.html
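
For illustration, a minimal Python sketch of the Unicode Standard's canonical caseless matching (NFD, then full case folding, then NFD again), using only Python's str.casefold and unicodedata.normalize. It shows why "SS" and "ß" compare equal under case folding but not under plain lowercasing. The name string_equal_ignore_case is hypothetical, chosen to mirror the proposed Emacs function; it is not an API of libunistring, ICU4C, or Emacs.

```python
import unicodedata


def canonical_caseless(s: str) -> str:
    # Canonical caseless form: NFD(casefold(NFD(s))).
    # str.casefold applies Unicode full case folding, so "ß" -> "ss".
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold())


def string_equal_ignore_case(a: str, b: str) -> bool:
    # Hypothetical name, mirroring the proposed Emacs function.
    return canonical_caseless(a) == canonical_caseless(b)


if __name__ == "__main__":
    print(string_equal_ignore_case("SS", "ß"))            # True
    print(string_equal_ignore_case("\u00c9", "e\u0301"))  # True: precomposed vs. combining accent
    print("SS".lower() == "ß".lower())                    # False: plain lowercasing misses ß
```

Note that Python's casefold is language-independent. libunistring's u8_casecmp additionally takes an ISO 639 language code and a normalization form, which matters for language-specific tailorings such as Turkish dotted and dotless i.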