From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: case-insensitive string comparison
Date: Wed, 20 Jul 2022 20:12:26 +0300
Message-ID: <83ilnrlnd1.fsf@gnu.org>
References: <87ilnsq4cr.fsf@gnu.org> <87mtd3n455.fsf@gnu.org>
To: Roland Winkler
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-Reply-To: <87mtd3n455.fsf@gnu.org> (message from Roland Winkler on Wed, 20 Jul 2022 11:24:38 -0500)

> From: Roland Winkler
> Cc: emacs-devel@gnu.org
> Date: Wed, 20 Jul 2022 11:24:38 -0500
>
> On Tue, Jul 19 2022, Stefan Monnier wrote:
> >> PS. Actually, compare-strings/ignore_case is broken because it does,
> >> essentially, upcase both arguments, see
> >> https://stackoverflow.com/q/319426/850781
> >
> > Hmm... `string-collate-equalp`?
>
> It would be nice if the node in the elisp manual on "comparison of
> characters and strings" included some discussion on what usage cases
> with case-folding can / should preferentially be covered by the
> locale-dependent function string-collate-equalp versus something like
> compare-strings.

I hear you, but your request is impossible to fulfill in practice.
That's because the collation rules used by string-collate-equalp are
implemented in the C library, and even if we know the locale,
different implementations of libc use different collation rules (in
addition, the collation rules for some locales change over time).
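The sketch below shows the two calls side by side. It assumes an Emacs recent enough to provide string-collate-equalp. Only the compare-strings result is stated. The collation result depends on the libc and the locale, as explained above, so no result is claimed for it.

```elisp
;; compare-strings folds case (when IGNORE-CASE is non-nil) using only
;; the current buffer's case table; no locale is consulted.  It returns
;; t when the strings match, otherwise an integer.
(compare-strings "BibTeX" nil nil "bibtex" nil nil t)   ; => t

;; string-collate-equalp hands the comparison to the C library's
;; collation rules.  With LOCALE nil it uses the current locale's
;; collation setting, so the result may vary with libc and locale;
;; no result is claimed here.
(string-collate-equalp "BibTeX" "bibtex" nil t)
```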

The answer to the question "what comparison function should I use in
a specific use case?" depends on the details of the use case, on the
locale, and on the libc against which Emacs was linked. That is why
the ELisp manual and the doc strings are intentionally vague about
what exactly you should expect as the result: we simply cannot say
anything there that is both accurate enough and general enough.

compare-strings, by contrast, doesn't use any collation rules, only
the current buffer's case table. So its results are more predictable.

> - bibtex-mode needs to compare BibTeX keywords that are ascii strings
>   for which case is insignificant. So bibtex-string= is exactly what
>   Sam suggests to put into subr.el, and I believe that's good enough
>   (just as almost any other approach I can think of for this particular
>   problem).
>
> - BBDB needs to know whether a name is already present in the database
>   or not, ignoring case. The function bbdb-string= is again what Sam
>   suggests to put into subr.el. The function string-collate-equalp
>   might be better suited for this. But which locale should it use? The
>   records in my BBDB cover larger parts of the world and I do not even
>   know which locale(s) might work best for each of them, not to mention
>   that BBDB needs to loop over all records. Is there a "universal
>   default locale"?

That "universal default locale" is what Emacs uses by default, modulo
a few problematic characters like the dotless i. For 100% predictable
results, build your own case table, temporarily make it the current
buffer's case table, and then call a case-insensitive comparison
function such as compare-strings.
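Here is a minimal sketch of that recipe. The helper names (my-make-ascii-case-table, my-ascii-case-table, my-ascii-string-equal-ignore-case) are hypothetical. The table pairs only ASCII A-Z with a-z, which suits ASCII-only keys such as BibTeX keywords. The sketch uses the standard with-case-table macro to install the table temporarily around compare-strings.

```elisp
;; Hypothetical helpers, a sketch only: compare strings ignoring case
;; under an explicit, ASCII-only case table.

(defun my-make-ascii-case-table ()
  "Return a new case table that pairs only ASCII A-Z with a-z.
Hypothetical helper; every other character is left caseless."
  (let ((down (make-char-table 'case-table)))
    ;; This char-table is the downcase table.  Entries left nil mean
    ;; "no case conversion" for that character.
    (dotimes (i 26)
      (aset down (+ ?A i) (+ ?a i)))
    ;; The up/canonicalize/equivalences extra slots are left nil;
    ;; set-case-table computes them the first time the table is used.
    down))

(defvar my-ascii-case-table (my-make-ascii-case-table)
  "Case table used by `my-ascii-string-equal-ignore-case'.")

(defun my-ascii-string-equal-ignore-case (s1 s2)
  "Return non-nil if S1 and S2 are equal, ignoring ASCII case only.
Hypothetical helper: `compare-strings' folds case via the current
buffer's case table, so install ours for the duration of the call."
  (with-case-table my-ascii-case-table
    (eq t (compare-strings s1 nil nil s2 nil nil t))))

;; Usage:
(my-ascii-string-equal-ignore-case "BibTeX" "bibtex")  ; => t
(my-ascii-string-equal-ignore-case "\u00C4" "\u00E4")  ; => nil, not paired
```

Because only ASCII letters are paired, characters such as Ä/ä or the dotless i are compared exactly. Under the standard case table, compare-strings would treat Ä and ä as equal. Adding more aset calls to the table changes that policy explicitly, rather than leaving it to the libc or the locale.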