unofficial mirror of emacs-devel@gnu.org 
From: Sam Steingold <sds@gnu.org>
To: emacs-devel@gnu.org, rms@gnu.org
Cc: Bruno Haible <bruno@clisp.org>
Subject: Re: case-insensitive string comparison
Date: Tue, 26 Jul 2022 10:28:01 -0400
Message-ID: <lzlesg0wzy.fsf@3c22fb11fdab.ant.amazon.com>
In-Reply-To: <E1oGBBb-00075O-Tj@fencepost.gnu.org> (Richard Stallman's message of "Mon, 25 Jul 2022 23:24:43 -0400")

> * Richard Stallman <ezf@tah.bet> [2022-07-25 23:24:43 -0400]:
>
>   > It is okay to add a `string-equal-ignore-case' based on `compare-strings'?
>   > (even though it does not recognize "SS" and "ß" as equal)
>
> A function `string-equal-ignore-case' would make sense.  My question is,
> is it worth the cost in complexity, or is it better to urge users to call
> `compare-strings' directly?

1. We already have `string-prefix-p' and `string-suffix-p', which are
thin wrappers around `compare-strings'.

> That depends on how often programs will do case-insensitive string comparison.
> If frequently, that gives a bigger upside to `string-equal-ignore-case'.

2. There are dozens of places in Emacs core with code like

--8<---------------cut here---------------start------------->8---
(eq t (compare-strings (sgml-tag-name tag-info) nil nil
                       (car stack) nil nil t))
--8<---------------cut here---------------end--------------->8---

3. Some Emacs packages already have to define their own versions of
`string-equal-ignore-case', e.g., `bbdb-string='.
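
A minimal sketch of what such a wrapper could look like, assuming it is
nothing more than the `compare-strings' idiom quoted above with the
substring bounds left at their defaults.  The name
`my-string-equal-ignore-case' is hypothetical, chosen so that it does not
clash with any existing definition:

```elisp
;; Hypothetical name; a sketch, not a definitive implementation.
(defun my-string-equal-ignore-case (string1 string2)
  "Return t if STRING1 and STRING2 are equal, ignoring case.
This is a thin wrapper around `compare-strings' and inherits its
character-by-character behavior, so \"SS\" and \"ß\" do not compare equal."
  ;; `compare-strings' returns t on equality and a number otherwise,
  ;; hence the explicit `eq t' test.
  (eq t (compare-strings string1 nil nil string2 nil nil t)))
```

With this, the quoted call site would shrink to
(my-string-equal-ignore-case (sgml-tag-name tag-info) (car stack)).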

>   > Or should we first implement something like casefold in Python?
>   > https://docs.python.org/3/library/stdtypes.html#str.casefold
>
> That casefold operation is not the same thing as ignoring case in
> Emacs.

Normally, case-insensitive comparison means something like

--8<---------------cut here---------------start------------->8---
(string= (casefold A) (casefold B))
--8<---------------cut here---------------end--------------->8---

whereas `compare-strings' effectively does

--8<---------------cut here---------------start------------->8---
(string= (upcase A) (upcase B))
--8<---------------cut here---------------end--------------->8---

(except that it works character by character, without allocating new
strings for `upcase').
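
To illustrate the character-by-character point, here is a rough Lisp
approximation of that loop.  It is a sketch only: the real
`compare-strings' is written in C and also handles substring bounds, and
the function name below is hypothetical.

```elisp
;; Hypothetical helper; approximates the per-character comparison only.
(defun my-char-by-char-equal-ignore-case (a b)
  "Return t if strings A and B match when each character is upcased.
No intermediate upcased strings are allocated."
  (and (= (length a) (length b))
       (let ((i 0)
             (n (length a))
             (same t))
         ;; Upcase one character at a time and stop at the first mismatch.
         (while (and same (< i n))
           (setq same (= (upcase (aref a i)) (upcase (aref b i)))
                 i (1+ i)))
         same)))
```

Because one character is always compared against one character, a single
"ß" can never match the two-character sequence "SS", which is the
limitation noted at the top of this message.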

> How to integrate something like that into Emacs, and in
> general how to handle `ß' properly in case conversion, calls for more
> thought.

Bruno Haible replied in this thread, suggesting libunistring via gnulib.
I think this is the easiest way to handle the issue.

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.2113
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
https://memri.org https://honestreporting.com https://ffii.org
The program isn't debugged until the last user is dead.



