From: Roland Winkler
Newsgroups: gmane.emacs.devel
Subject: Re: case-insensitive string comparison
Date: Wed, 20 Jul 2022 12:37:29 -0500
Message-ID: <87lesnlm7a.fsf@gnu.org>
References: <87ilnsq4cr.fsf@gnu.org> <87mtd3n455.fsf@gnu.org> <83ilnrlnd1.fsf@gnu.org>
In-Reply-To: <83ilnrlnd1.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 20 Jul 2022 20:12:26 +0300")
To: Eli Zaretskii
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

On Wed, Jul 20 2022, Eli Zaretskii wrote:
>> It would be nice if the node in the elisp manual on "comparison of
>> characters and strings" included some discussion on what usage cases
>> with case-folding can / should preferentially be covered by the
>> locale-dependent function string-collate-equalp versus something like
>> compare-strings.
>
> I hear you, but your request is impossible to fulfill in practice.
> That's because the collation rules used by this function are
> implemented in the C library, and even if we know the locale,
> different implementations of libc use different collation rules (in
> addition, collation rules for some locales change with time).

Even mentioning the difficulties could be useful here.  The elisp
manual is used by people who want to develop code that works for a
wide range of users.  So even if string comparison is a slippery
terrain these elisp hackers need to make design choices that work best
for most users.
What usage scenarios in elisp packages might benefit from
string-collate-equalp even if this function depends on details that
can be quite different for different users?

>> - BBDB needs to know whether a name is already present in the database
>>   or not, ignoring case.  The function bbdb-string= is again what Sam
>>   suggests to put into subr.el.  The function string-collate-equalp
>>   might be better suited for this.  But which locale should it use?  The
>>   records in my BBDB cover larger parts of the world and I do not even
>>   know which locale(s) might work best for each of them, not to mention
>>   that BBDB needs to loop over all records.  Is there a "universal
>>   default locale"?
>
> That "universal default locale" is what Emacs uses, modulo the few
> problematic characters like the dotless I etc.  For 100% predictable
> results, build your own case table, bind the buffer's case table to
> it, and then call case-insensitive comparison.

I am not sure I can follow your argument.  Do you suggest that,
likely, BBDB will work best if it compares names using
compare-strings?  (I'd be glad to hear that.)  This code should work
for users who do not want to build their own case table and stuff like
that.

Thanks!
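
To make the question concrete, here is a minimal sketch of the kind of
comparison meant above.  The function name my-name-equal-p is
hypothetical (it is not BBDB's code), and using the standard case
table in place of a hand-built one is an assumption about what would
be "good enough" for most users.  It relies only on compare-strings,
with-case-table and standard-case-table, so it does not depend on the
C library's collation rules:

```elisp
;; Hypothetical helper, for illustration only; not part of BBDB or subr.el.
(defun my-name-equal-p (name1 name2)
  "Return t if strings NAME1 and NAME2 are equal, ignoring case.
Case folding uses the standard case table rather than the current
buffer's case table, so the result does not depend on the buffer
the caller happens to be in."
  (and (stringp name1) (stringp name2)
       (with-case-table (standard-case-table)
         ;; `compare-strings' returns t on equality and an integer otherwise.
         (eq t (compare-strings name1 0 nil name2 0 nil t)))))

(my-name-equal-p "Roland Winkler" "roland WINKLER") ; => t
(my-name-equal-p "Roland" "Ronald")                 ; => nil
```

A variant based on string-collate-equalp would additionally have to
pick a LOCALE argument, which is exactly the choice that is unclear
for a database whose records span many languages.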