* case-insensitive string comparison
From: Sam Steingold @ 2022-07-19 17:27 UTC
To: emacs-devel

Hi,

Emacs Lisp has 3 ways to describe comparison that ignores case:

1. "ignore-case", as in, e.g., `member-ignore-case'
2. "case-fold", as in, e.g., `case-fold-search'
3. "case-insensitive", as in, e.g., `minibuffer-history-case-insensitive-variables'

Is there a general rule for when to use which naming?

Specifically, I would like to add

--8<---------------cut here---------------start------------->8---
(defun string-equal-ignore-case (s1 s2)
  "Like `string-equal', but case-insensitive.
Upper-case and lower-case letters are treated as equal.
Unibyte strings are converted to multibyte for comparison."
  (eq t (compare-strings s1 0 nil s2 0 nil t)))
--8<---------------cut here---------------end--------------->8---

to subr.el next to `string-prefix-p' - is this okay?

Thanks.
-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.2113
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
https://www.peaceandtolerance.org/ https://ij.org/ https://www.memritv.org
Sex is like air. It's only a big deal if you can't get any.
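Why the proposal wraps the call in `(eq t ...)': `compare-strings' returns t when the two ranges match, and otherwise a signed integer (negative if the first string sorts earlier) whose magnitude is one more than the number of leading characters that matched. The sketch below restates the proposed wrapper under a hypothetical `my-' name, so it cannot clobber any existing definition, and shows those return values:

```elisp
;; Hypothetical name, chosen so as not to redefine any built-in.
(defun my-string-equal-ignore-case (s1 s2)
  "Return t if S1 and S2 are equal, ignoring case differences.
Case folding uses the current buffer's case table, via `compare-strings'."
  ;; `compare-strings' yields t on a match, or a nonzero integer whose
  ;; sign tells which string sorts first; `eq t' turns that into a
  ;; plain boolean.
  (eq t (compare-strings s1 0 nil s2 0 nil t)))

(my-string-equal-ignore-case "Emacs" "EMACS")  ; => t
(my-string-equal-ignore-case "Emacs" "Emac")   ; => nil
(compare-strings "abc" 0 nil "abd" 0 nil t)    ; => -3 (2 chars matched)
```

Because the wrapper returns exactly t or nil, it can be used anywhere a boolean predicate is expected.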
* Re: case-insensitive string comparison
From: Mattias Engdegård @ 2022-07-19 18:06 UTC
To: sds; +Cc: emacs-devel

On 19 Jul 2022, at 19:27, Sam Steingold <sds@gnu.org> wrote:

> (defun string-equal-ignore-case (s1 s2)

What would you tell someone complaining that

  (let ((rue "Straße"))
    (string-equal-ignore-case rue (upcase rue)))

returns nil? Asking for a friend.
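Mattias's point can be reproduced with a small sketch (the helper name is hypothetical). `compare-strings' folds case one character at a time, whereas `upcase' applied to a whole string may use multi-character special casing such as ß to "SS" (that is what makes his example come out nil), so the two strings stop lining up character by character:

```elisp
(defun my-string-equal-ignore-case (s1 s2)
  "Hypothetical helper: case-insensitive `string-equal' via `compare-strings'."
  (eq t (compare-strings s1 0 nil s2 0 nil t)))

(let* ((rue "Straße")
       ;; Upcasing a whole string may expand ß into two characters.
       (up (upcase rue)))
  (list :original-length (length rue)
        :upcased-length (length up)
        :equal-ignoring-case (my-string-equal-ignore-case rue up)))

;; Char-by-char folding only succeeds when ß faces ß:
(my-string-equal-ignore-case "Straße" "STRAßE")   ; => t
(my-string-equal-ignore-case "Straße" "STRASSE")  ; => nil
```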
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-19 18:56 UTC
To: emacs-devel, Mattias Engdegård

> * Mattias Engdegård <znggvnfr@npz.bet> [2022-07-19 20:06:50 +0200]:
>
> On 19 Jul 2022, at 19:27, Sam Steingold <sds@gnu.org> wrote:
>
>> (defun string-equal-ignore-case (s1 s2)
>
> What would you tell someone complaining that
>
>   (let ((rue "Straße"))
>     (string-equal-ignore-case rue (upcase rue)))
>
> returns nil? Asking for a friend.

This is a well-known bug in user code.
https://stackoverflow.com/q/319426/850781
* Re: case-insensitive string comparison
From: tomas @ 2022-07-20 4:39 UTC
To: emacs-devel; +Cc: Mattias Engdegård

On Tue, Jul 19, 2022 at 02:56:45PM -0400, Sam Steingold wrote:
> > * Mattias Engdegård <znggvnfr@npz.bet> [2022-07-19 20:06:50 +0200]:
> >
> > On 19 Jul 2022, at 19:27, Sam Steingold <sds@gnu.org> wrote:
> >
> >> (defun string-equal-ignore-case (s1 s2)
> >
> > What would you tell someone complaining that
> >
> >   (let ((rue "Straße"))
> >     (string-equal-ignore-case rue (upcase rue)))
> >
> > returns nil? Asking for a friend.
>
> This is a well-known bug in user code.
> https://stackoverflow.com/q/319426/850781

One case (heh) which gets too little attention in that (good) ref is
(1) "i" vs. (2) "ı" vs. (3) "İ" vs. (4) "I". You have to decide on a
language environment to get a chance of doing it right: in Latin-script
languages there are only 1 and 4, and they map to each other; in Turkic
languages 1 and 3 correspond, as do 2 and 4.

The ref to the Unicode FAQ [1] from your ref shows that even the Unicode
folks have given up on that. To me, it looks like an especially sleazy
way to admit "well, folks, we've messed up on this one".

Human languages are a messy mix, in which politics figures prominently.
Unicode reflects that.

Cheers

[1] http://unicode.org/faq/casemap_charprop.html#9
-- 
t
* Re: case-insensitive string comparison
From: Eli Zaretskii @ 2022-07-20 11:35 UTC
To: tomas; +Cc: emacs-devel, mattiase

> Date: Wed, 20 Jul 2022 06:39:46 +0200
> Cc: Mattias Engdegård <mattiase@acm.org>
> From: <tomas@tuxteam.de>
>
> One case (heh) which gets too little attention in that
> (good) ref is "i" "ı" vs. "İ" vs. "I". You've to decide
> on a language environment to get a chance of doing it
> right (in Latin languages there are only 1 and 4, and
> they map to each other, in Turkic languages 1 and 3
> correspond, as 2 and 4 do).
>
> The ref to the Unicode FAQ [1] from your ref shows that
> even the Unicode folks have given up on that. To me, it
> looks like an especially sleazy way to admit "well, folks,
> we've messed up on this one".
>
> Human languages are a messy mix, in which politics figures
> prominently. Unicode reflects that.

This could be hard on the Unicode Consortium, but relatively easy in
Emacs: just bind the case table of the current buffer to something
reasonable around code which performs case-insensitive comparison.
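Eli's suggestion, sketched: `compare-strings' folds case through the current buffer's case table, so binding a tailored table with `with-case-table' changes the rules for just that code. The Turkish table below pairs i with İ and ı with I (Emacs's Turkish language environment makes the same two `set-case-syntax-pair' calls, but on the standard case table); `my-turkish-case-table' and the wrapper are hypothetical names:

```elisp
(defun my-turkish-case-table ()
  "Return a Turkish-tailored copy of the standard case table.
Hypothetical helper: pairs i with U+0130 (İ) and U+0131 (ı) with I."
  (let ((table (copy-case-table (standard-case-table))))
    ;; Arguments are UPPER, lower, TABLE.  `set-case-syntax-pair' also
    ;; marks both characters as word constituents in the standard
    ;; syntax table.
    (set-case-syntax-pair ?\u0130 ?i table)   ; İ <-> i
    (set-case-syntax-pair ?I ?\u0131 table)   ; I <-> ı
    table))

(defun my-string-equal-ignore-case (s1 s2)
  "Hypothetical helper: case-insensitive `string-equal' via `compare-strings'."
  (eq t (compare-strings s1 0 nil s2 0 nil t)))

;; Only the body sees the Turkish rules; `with-case-table' restores
;; the buffer's previous case table on exit.
(with-case-table (my-turkish-case-table)
  (list (my-string-equal-ignore-case "istanbul" "\u0130STANBUL")
        (my-string-equal-ignore-case "\u0131stanbul" "ISTANBUL")
        (my-string-equal-ignore-case "istanbul" "ISTANBUL")))
;; => (t t nil)
```

As tomas points out in reply, the hard part is choosing the table: a buffer that mixes German and Turkish text has no single "reasonable" one.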
* Re: case-insensitive string comparison
From: tomas @ 2022-07-20 13:30 UTC
To: Eli Zaretskii; +Cc: emacs-devel, mattiase

On Wed, Jul 20, 2022 at 02:35:21PM +0300, Eli Zaretskii wrote:
> > Date: Wed, 20 Jul 2022 06:39:46 +0200
> > Cc: Mattias Engdegård <mattiase@acm.org>
> > From: <tomas@tuxteam.de>

[...]

> > Human languages are a messy mix, in which politics figures
> > prominently. Unicode reflects that.
>
> This could be hard on the Unicode Consortium, but relatively easy in
> Emacs: just bind the case table of the current buffer to something
> reasonable around code which performs case-insensitive comparison.

...still: "something reasonable" is (human) language-dependent; if
you're writing a German-Turkish dictionary...

Cheers
-- 
t
* Re: case-insensitive string comparison
From: Stefan Kangas @ 2022-07-19 18:16 UTC
To: Sam Steingold, Emacs developers

Sam Steingold <sds@gnu.org> writes:

> Emacs Lisp has 3 ways to describe comparison that ignores case:
>
> 1. "ignore-case", as in, e.g., `member-ignore-case'
> 2. "case-fold", as in, e.g., `case-fold-search'
> 3. "case-insensitive", as in, e.g., `minibuffer-history-case-insensitive-variables'

See also Bug#56401.
* Re: case-insensitive string comparison
From: Roland Winkler @ 2022-07-19 19:39 UTC
To: emacs-devel

On Tue, Jul 19 2022, Sam Steingold wrote:
> Specifically, I would like to add
>
> (defun string-equal-ignore-case (s1 s2)
>   "Like `string-equal', but case-insensitive.
> Upper-case and lower-case letters are treated as equal.
> Unibyte strings are converted to multibyte for comparison."
>   (eq t (compare-strings s1 0 nil s2 0 nil t)))
>
> to subr.el next to `string-prefix-p' - is this okay?

I have fairly often needed case-insensitive string comparison, and I
believe various elisp packages include a "private" version of the
above. I always felt that `(eq t (compare-strings s1 0 nil s2 0 nil t))'
was a crutch for this common problem. Would it make sense to give the
built-in function string-equal an optional arg ignore-case?
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-19 22:47 UTC
To: emacs-devel, Roland Winkler

> * Roland Winkler <jvaxyre@tah.bet> [2022-07-19 14:39:32 -0500]:
>
> On Tue, Jul 19 2022, Sam Steingold wrote:
>> Specifically, I would like to add
>>
>> (defun string-equal-ignore-case (s1 s2)
>>   "Like `string-equal', but case-insensitive.
>> Upper-case and lower-case letters are treated as equal.
>> Unibyte strings are converted to multibyte for comparison."
>>   (eq t (compare-strings s1 0 nil s2 0 nil t)))
>>
>> to subr.el next to `string-prefix-p' - is this okay?
>
> I have run into this problem fairly often that I needed case-insensitive
> string comparison, and I believe various elisp packages include a
> "private" version of the above. I always felt that
> `(eq t (compare-strings s1 0 nil s2 0 nil t))' was a crutch for this
> common problem. Would it make sense to give the built-in function
> string-equal an optional arg ignore-case?

No, because I need to be able to pass `string-equal-ignore-case' to
things like `cl-find' as `:test' &c.

Also, if you look at fns.c, `string-equal' is basically `memcmp', while
`compare-strings' is way more complex.

PS. Actually, compare-strings/ignore_case is broken because it
essentially upcases both arguments, see
https://stackoverflow.com/q/319426/850781
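The `:test' use case Sam describes is the main argument for a named two-argument function over an optional argument to `string-equal'. A sketch with a hypothetical helper name, alongside two existing built-ins (`member-ignore-case' and `assoc-string' with its CASE-FOLD argument) that already cover some list lookups:

```elisp
(require 'cl-lib)

(defun my-string-equal-ignore-case (s1 s2)
  "Hypothetical helper: case-insensitive `string-equal' via `compare-strings'."
  (eq t (compare-strings s1 0 nil s2 0 nil t)))

(let ((names '("Alice" "bob" "Carol")))
  (list
   ;; A named two-argument predicate can be passed directly as :test.
   (cl-find "BOB" names :test #'my-string-equal-ignore-case)  ; => "bob"
   (cl-remove-duplicates (list "a" "A" "b")
                         :test #'my-string-equal-ignore-case)
   ;; Existing built-ins that already fold case for list lookups:
   (member-ignore-case "carol" names)                         ; => ("Carol")
   (assoc-string "ALICE" '(("alice" . 1)) t)))                ; => ("alice" . 1)
```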
* Re: case-insensitive string comparison
From: Roland Winkler @ 2022-07-20 2:21 UTC
To: emacs-devel

On Tue, Jul 19 2022, Sam Steingold wrote:
> No, because I need to be able to pass `string-equal-ignore-case' to
> things like `cl-find' as `:test' &c.

That sounds like a rather particular use case that, I believe, should
not motivate the design of what goes into subr.el.

> Also, if you look at fns.c, `string-equal' is basically `memcmp', while
> `compare-strings' is way more complex.

I don't think that's an obstacle for anything.

- The string-delimiting args and underlying machinery of compare-strings
  are something that can be skipped with string-equal.

- On the other hand, string comparison with case-folding is more complex
  than string comparison without case-folding, by its very definition.

> PS. Actually, compare-strings/ignore_case is broken because it does,
> essentially, upcase both arguments, see
> https://stackoverflow.com/q/319426/850781

That's a very different issue.
* Re: case-insensitive string comparison
From: Stefan Monnier @ 2022-07-20 3:01 UTC
To: emacs-devel; +Cc: Roland Winkler

> PS. Actually, compare-strings/ignore_case is broken because it does,
> essentially, upcase both arguments, see https://stackoverflow.com/q/319426/850781

Hmm... `string-collate-equalp`?


        Stefan
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-20 16:22 UTC
To: emacs-devel, Stefan Monnier

> * Stefan Monnier <zbaavre@veb.hzbagerny.pn> [2022-07-19 23:01:31 -0400]:
>
>> PS. Actually, compare-strings/ignore_case is broken because it does,
>> essentially, upcase both arguments, see https://stackoverflow.com/q/319426/850781
>
> Hmm... `string-collate-equalp`?

--8<---------------cut here---------------start------------->8---
(string-collate-equalp "a" "A" current-locale-environment t)
==> nil
current-locale-environment
==> "en_US.UTF-8"
--8<---------------cut here---------------end--------------->8---
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-25 14:23 UTC
To: emacs-devel

> * Sam Steingold <fqf@tah.bet> [2022-07-20 12:22:33 -0400]:
>
>> * Stefan Monnier <zbaavre@veb.hzbagerny.pn> [2022-07-19 23:01:31 -0400]:
>>
>>> PS. Actually, compare-strings/ignore_case is broken because it does,
>>> essentially, upcase both arguments, see https://stackoverflow.com/q/319426/850781
>>
>> Hmm... `string-collate-equalp`?
>
> (string-collate-equalp "a" "A" current-locale-environment t)
> ==> nil
> current-locale-environment
> ==> "en_US.UTF-8"

So, how do we do case-insensitive string comparison in Emacs?

Is it okay to add a `string-equal-ignore-case' based on `compare-strings'?
(even though it does not recognize "SS" and "ß" as equal)

Or should we first implement something like casefold in Python?
https://docs.python.org/3/library/stdtypes.html#str.casefold
* Re: case-insensitive string comparison
From: Eli Zaretskii @ 2022-07-25 15:58 UTC
To: sds; +Cc: emacs-devel

> From: Sam Steingold <sds@gnu.org>
> Date: Mon, 25 Jul 2022 10:23:30 -0400
>
> >> Hmm... `string-collate-equalp`?
> >
> > (string-collate-equalp "a" "A" current-locale-environment t)
> > ==> nil
> > current-locale-environment
> > ==> "en_US.UTF-8"

I cannot reproduce this:

  (string-collate-equalp "a" "A" current-locale-environment t)
    => t
  current-locale-environment
    => "en_US.UTF-8"

What OS is this, and which Emacs version?

> So, how do we do case-insensitive string comparison in Emacs?

If you want locale-specific collation, use `string-collate-equalp', as
Stefan said above.

> Is it okay to add a `string-equal-ignore-case' based on `compare-strings'?
> (even though it does not recognize "SS" and "ß" as equal)

What's wrong with calling compare-strings directly?

> Or should we first implement something like casefold in Python?
> https://docs.python.org/3/library/stdtypes.html#str.casefold

Ha! We already have that:

  (get-char-code-property ?ß 'special-uppercase)
    => "SS"

Give us some credit, yes?
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-25 19:39 UTC
To: emacs-devel, Eli Zaretskii

> * Eli Zaretskii <ryvm@tah.bet> [2022-07-25 18:58:19 +0300]:
>
>> From: Sam Steingold <sds@gnu.org>
>> Date: Mon, 25 Jul 2022 10:23:30 -0400
>>
>> >> Hmm... `string-collate-equalp`?
>> >
>> > (string-collate-equalp "a" "A" current-locale-environment t)
>> > ==> nil
>> > current-locale-environment
>> > ==> "en_US.UTF-8"
>
> I cannot reproduce this:
>
>   (string-collate-equalp "a" "A" current-locale-environment t)
>     => t
>   current-locale-environment
>     => "en_US.UTF-8"
>
> What OS is this, and which Emacs version?

GNU Emacs 29.0.50 (build 5, x86_64-apple-darwin21.5.0, NS appkit-2113.50 Version 12.4 (Build 21F79))
 of 2022-07-25
Repository revision: ffe12ff2503917e47c0356195b31430996c148f9
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2113
System Description:  macOS 12.4

>> So, how do we do case-insensitive string comparison in Emacs?
>
> If you want locale-specific collation, as Stefan said, above.

Do I?
Is it really true that "UTF-8" without "en_US" does _not_ define case
conversion? But https://docs.python.org/3/library/stdtypes.html#str.casefold
says

>>>>> The casefolding algorithm is described in section 3.13 of the Unicode Standard.

This seems to imply that the user's locale setting is not relevant.
(Locale _is_ mentioned in https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf
but it looks like a _specification_ of the algorithm, not its _modification_.)

>> It is okay to add a `string-equal-ignore-case' based on `compare-strings'?
>> (even though it does not recognize "SS" and "ß" as equal)
>
> What's wrong with calling compare-strings directly?

I want to be able to use `string-equal-ignore-case' as a :test argument
to things like `cl-find'.
And I don't want to have to think about encodings and locales.
So I want the core Emacs maintainers who know about these things to
provide me with something that works. Thanks in advance! ;-)

The fact that there are ***TWO*** core functions that compare strings -
`string-collate-equalp' and `compare-strings' - does not look right to me.
_I_ should not have to decide which function to use.

>> Or should we first implement something like casefold in Python?
>
> Ha! we already have that:
>
>   (get-char-code-property ?ß 'special-uppercase)
>     => "SS"

Nice, but how does it help me if

--8<---------------cut here---------------start------------->8---
(compare-strings "SS" 0 nil "ß" 0 nil t)
==> -1
(string-collate-equalp "SS" "ß" "en_US.UTF-8" t)
==> nil
--8<---------------cut here---------------end--------------->8---

instead of `t'?

> Give us some credit, yes?

Sure, and I am very grateful!
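One rough way to put the `special-uppercase' data Eli mentions to work: `upcase' applied to a whole string already uses those multi-character mappings (which is why Mattias's Straße example returns nil), so upcasing and then downcasing both strings before `string=' makes "SS" and "ß" compare equal. This is a sketch with hypothetical names, and it only approximates the Unicode full case folding that Python's `str.casefold' implements; it is not an implementation of it:

```elisp
(defun my-casefold (string)
  "Return an approximately case-folded copy of STRING (hypothetical helper).
Upcasing a whole string applies multi-character mappings such as
ß -> \"SS\"; downcasing the result gives a lower-case form that can
be compared with `string='.  This only approximates Unicode full
case folding."
  (downcase (upcase string)))

(defun my-string-equal-casefold (s1 s2)
  "Hypothetical helper: compare S1 and S2 after `my-casefold'."
  (string= (my-casefold s1) (my-casefold s2)))

(my-string-equal-casefold "Straße" "STRASSE")
(my-string-equal-casefold "Hello" "hELLO")
```

The trade-off: unlike `compare-strings', this allocates fresh strings on every call, and it follows Emacs's case tables and special-casing data rather than Unicode's CaseFolding.txt, so it is a stopgap rather than a real casefold.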
* Re: case-insensitive string comparison
From: Eli Zaretskii @ 2022-07-26 13:05 UTC
To: sds; +Cc: emacs-devel

> From: Sam Steingold <sds@gnu.org>
> Date: Mon, 25 Jul 2022 15:39:34 -0400
>
> > * Eli Zaretskii <ryvm@tah.bet> [2022-07-25 18:58:19 +0300]:
> >
> >> > (string-collate-equalp "a" "A" current-locale-environment t)
> >> > ==> nil
> >> > current-locale-environment
> >> > ==> "en_US.UTF-8"
> >
> > I cannot reproduce this:
> >
> >   (string-collate-equalp "a" "A" current-locale-environment t)
> >     => t
> >   current-locale-environment
> >     => "en_US.UTF-8"
> >
> > What OS is this, and which Emacs version?
>
> GNU Emacs 29.0.50 (build 5, x86_64-apple-darwin21.5.0, NS appkit-2113.50 Version 12.4 (Build 21F79))
>  of 2022-07-25
> Repository revision: ffe12ff2503917e47c0356195b31430996c148f9
> Repository branch: master
> Windowing system distributor 'Apple', version 10.3.2113
> System Description:  macOS 12.4

Could be something macOS-specific. Maybe your system doesn't define
the __STDC_ISO_10646__ feature? In that case, string-collate-equalp
(see the doc string) behaves like string-equal, and that one doesn't
have a case-insensitive variant.

> >> So, how do we do case-insensitive string comparison in Emacs?
> >
> > If you want locale-specific collation, as Stefan said, above.
>
> Do I?
> Is it really true that "UTF-8" without "en_US" does _not_ define case conversion?

string-collate-equalp relies on the implementation in your libc, so
that's something I cannot answer (although I'd expect any reasonable
libc to work as expected here).

In general, locale-specific comparison is a bad idea in Emacs, unless
you are writing a Lisp program that absolutely _must_ meet the
locale's definitions of collation order and equivalence.
That's because some locales have unexpected requirements, and because
different libc's implement this stuff very differently. So using
string-collate-equalp and string-collate-lessp makes your program
unpredictable on any machine but your own.

For that reason, I suggest always using compare-strings instead. That
function uses the Unicode locale-independent case-conversion rules, and
you can predictably control/tailor that if you need by using a
buffer-local case-table.

> but https://docs.python.org/3/library/stdtypes.html#str.casefold says
>
> >>>>> The casefolding algorithm is described in section 3.13 of the Unicode Standard.
>
> this seems to imply that user locale setting is not relevant.

That conclusion is incorrect. The collation database is usually
tailored for each locale, and at least glibc indeed loads the tailored
collation tables for each locale you request.

> >> It is okay to add a `string-equal-ignore-case' based on `compare-strings'?
> >> (even though it does not recognize "SS" and "ß" as equal)
> >
> > What's wrong with calling compare-strings directly?
>
> I want to be able to use `string-equal-ignore-case' as a :test argument
> to things like `cl-find'.

Then write a thin wrapper around compare-strings, and be done.

> And I don't want to have to think about encodings and locales.
> So I want the core Emacs maintainers who know about these things to
> provide me with something that works. Thanks in advance! ;-)

There's nothing to think about: see above. In the Emacs context, you
get the best results by writing code that doesn't depend on the
locale, and that's what you get with compare-strings. No need to know
anything about encoding or locales.

> The fact that there are ***TWO*** core functions that compare strings -
> `string-collate-equalp' and `compare-strings' - does not look right to me.
> _I_ should not have to decide which function to use.

You can always ask.
But the documentation at least hints that the locale-specific
comparison has many hidden aspects:

  This function obeys the conventions for collation order in your locale
  settings.  For example, characters with different coding points but
  the same meaning might be considered as equal, like different grave
  accent Unicode characters:

  (string-collate-equalp (string ?\uFF40) (string ?\u1FEF))
    => t

> >> Or should we first implement something like casefold in Python?
> >
> > Ha! we already have that:
> >
> >   (get-char-code-property ?ß 'special-uppercase)
> >     => "SS"
>
> Nice, but how does it help me if
> --8<---------------cut here---------------start------------->8---
> (compare-strings "SS" 0 nil "ß" 0 nil t)
> ==> -1
> (string-collate-equalp "SS" "ß" "en_US.UTF-8" t)
> ==> nil
> --8<---------------cut here---------------end--------------->8---
> instead of `t'?

It depends on what you want to do, and why you care about the ß case
in the first place. AFAIR, you never explained that, nor described
your goal.
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-26 14:16 UTC
To: emacs-devel, Eli Zaretskii

> * Eli Zaretskii <ryvm@tah.bet> [2022-07-26 16:05:50 +0300]:
>
>> From: Sam Steingold <sds@gnu.org>
>> Date: Mon, 25 Jul 2022 15:39:34 -0400
>>
>> > * Eli Zaretskii <ryvm@tah.bet> [2022-07-25 18:58:19 +0300]:
>> >
>> >> > (string-collate-equalp "a" "A" current-locale-environment t)
>> >> > ==> nil
>> >> > current-locale-environment
>> >> > ==> "en_US.UTF-8"
>> >
>> > I cannot reproduce this:
>> >
>> >   (string-collate-equalp "a" "A" current-locale-environment t)
>> >     => t
>> >   current-locale-environment
>> >     => "en_US.UTF-8"
>> >
>> > What OS is this, and which Emacs version?
>>
>> GNU Emacs 29.0.50 (build 5, x86_64-apple-darwin21.5.0, NS appkit-2113.50 Version 12.4 (Build 21F79))
>>  of 2022-07-25
>> Repository revision: ffe12ff2503917e47c0356195b31430996c148f9
>> Repository branch: master
>> Windowing system distributor 'Apple', version 10.3.2113
>> System Description:  macOS 12.4
>
> Could be something macOS-specific. Maybe your system doesn't define
> the __STDC_ISO_10646__ feature? In that case, string-collate-equalp
> (see the doc string) behaves like string-equal, and that one doesn't
> have a case-insensitive variant.

How do I find out?

--8<---------------cut here---------------start------------->8---
echo > .zzz.c;
gcc -E -dM .zzz.c | grep __STDC_ISO_10646__
--8<---------------cut here---------------end--------------->8---

does not print anything, but maybe I need to `#include' something?

> In general, locale-specific comparison is a bad idea in Emacs...
> For that reason, I suggest always using compare-strings instead.

Thank you very much for the clear and detailed explanation!
>> >> It is okay to add a `string-equal-ignore-case' based on `compare-strings'?
>> >> (even though it does not recognize "SS" and "ß" as equal)
>> >
>> > What's wrong with calling compare-strings directly?
>>
>> I want to be able to use `string-equal-ignore-case' as a :test argument
>> to things like `cl-find'.
>
> Then write a thin wrapper around compare-strings, and be done.

I think the need is sufficiently generic: e.g., BBDB provides such a
wrapper, as, I am sure, do many other packages.
Many core files can be simplified by using `string-equal-ignore-case'
(just like with `string-prefix-p').

Thanks again!
* Re: case-insensitive string comparison
From: Eli Zaretskii @ 2022-07-26 15:53 UTC
To: sds; +Cc: emacs-devel

> From: Sam Steingold <sds@gnu.org>
> Date: Tue, 26 Jul 2022 10:16:08 -0400
>
> > Could be something macOS-specific. Maybe your system doesn't define
> > the __STDC_ISO_10646__ feature? In that case, string-collate-equalp
> > (see the doc string) behaves like string-equal, and that one doesn't
> > have a case-insensitive variant.
>
> How do I find out?
> --8<---------------cut here---------------start------------->8---
> echo > .zzz.c;
> gcc -E -dM .zzz.c | grep __STDC_ISO_10646__
> --8<---------------cut here---------------end--------------->8---
> does not print anything, but maybe I need to `#include' something?

No, that exactly means you are getting the string-equal fallback
instead. Here on GNU/Linux I get

  $ gcc -E -dM foo.c | fgrep 10646
  #define __STDC_ISO_10646__ 201706L

> >> I want to be able to use `string-equal-ignore-case' as a :test argument
> >> to things like `cl-find'.
> >
> > Then write a thin wrapper around compare-strings, and be done.
>
> I think the need is sufficiently generic, e.g., BBDB provides such a
> wrapper, as, I am sure, do many other packages.
> Many core files can be simplified by using `string-equal-ignore-case'
> (just like with the `string-prefix-p').

I'm not convinced, but I won't mount the barricades if Lars and/or
others think we need this.
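Sam's gcc command checks the compiler. From inside a running Emacs, one hedged alternative is to ask `string-collate-equalp' directly whether its IGNORE-CASE argument has any effect. A nil result is consistent with the `string-equal' fallback Eli describes, though the probe alone cannot tell which cause applies. The function name is hypothetical:

```elisp
(defun my-collation-folds-case-p ()
  "Return t if `string-collate-equalp' ignores case on this build.
Hypothetical probe using the current locale (LOCALE nil).  Where
Emacs lacks locale collation support, `string-collate-equalp'
behaves like `string-equal', so the probe returns nil."
  (string-collate-equalp "a" "A" nil t))

(message "string-collate-equalp %s case here"
         (if (my-collation-folds-case-p) "folds" "does not fold"))
```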
* Re: case-insensitive string comparison
From: Sam Steingold @ 2022-07-26 16:00 UTC
To: Eli Zaretskii; +Cc: emacs-devel

On Tue, 26 Jul 2022 at 11:53, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Sam Steingold <sds@gnu.org>
> >
> > I think the need is sufficiently generic, e.g., BBDB provides such a
> > wrapper, as, I am sure, do many other packages.
> > Many core files can be simplified by using `string-equal-ignore-case'
> > (just like with the `string-prefix-p').
>
> I'm not convinced, but I won't mount the barricades if Lars and/or
> others think we need this.

Even though we already have completion--string-equal-p and
bibtex-string= in core? (And also gnus-string-equal, which is _almost_
identical.)
* Re: case-insensitive string comparison
From: Lars Ingebrigtsen @ 2022-07-26 16:16 UTC
To: Eli Zaretskii; +Cc: sds, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I think the need is sufficiently generic, e.g., BBDB provides such a
>> wrapper, as, I am sure, do many other packages.
>> Many core files can be simplified by using `string-equal-ignore-case'
>> (just like with the `string-prefix-p').
>
> I'm not convinced, but I won't mount the barricades if Lars and/or
> others think we need this.

Since there are already three of these variations in core, I think that
shows that this would be a handy function to have.

I've used `cl-equalp' for this in the past, but having a
`string-'-prefixed function might make sense.

And in that case, what about calling it `string-equalp'? But perhaps
that's too obscure for people coming from a non-CL background, so
`string-equal-ignore-case' is fine by me.
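The `cl-equalp' route Lars mentions works today, but `cl-equalp' is a general structural equality, which is one reason a `string-'-prefixed name would state the intent more clearly: besides folding case for strings, it also treats numerically equal integers and floats as equal. A short illustration:

```elisp
(require 'cl-lib)

(list
 ;; Strings: compared ignoring case.
 (cl-equalp "Emacs" "EMACS")                          ; => t
 ;; Numbers: compared with `=', so an integer and a float can match.
 (cl-equalp 1 1.0)                                    ; => t
 ;; Usable as a :test predicate as well.
 (cl-find "BOB" '("alice" "bob") :test #'cl-equalp))  ; => "bob"
```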
* Re: case-insensitive string comparison
From: Robert Pluim @ 2022-07-26 14:43 UTC
To: Eli Zaretskii; +Cc: sds, emacs-devel

>>>>> On Tue, 26 Jul 2022 16:05:50 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Sam Steingold <sds@gnu.org>
    >> Date: Mon, 25 Jul 2022 15:39:34 -0400
    >>
    >> > * Eli Zaretskii <ryvm@tah.bet> [2022-07-25 18:58:19 +0300]:
    >> >
    >> >> > (string-collate-equalp "a" "A" current-locale-environment t)
    >> >> > ==> nil
    >> >> > current-locale-environment
    >> >> > ==> "en_US.UTF-8"
    >> >
    >> > I cannot reproduce this:
    >> >
    >> >   (string-collate-equalp "a" "A" current-locale-environment t)
    >> >     => t
    >> >   current-locale-environment
    >> >     => "en_US.UTF-8"
    >> >
    >> > What OS is this, and which Emacs version?
    >>
    >> GNU Emacs 29.0.50 (build 5, x86_64-apple-darwin21.5.0, NS appkit-2113.50 Version 12.4 (Build 21F79))
    >>  of 2022-07-25
    >> Repository revision: ffe12ff2503917e47c0356195b31430996c148f9
    >> Repository branch: master
    >> Windowing system distributor 'Apple', version 10.3.2113
    >> System Description:  macOS 12.4

    Eli> Could be something macOS-specific. Maybe your system doesn't define
    Eli> the __STDC_ISO_10646__ feature? In that case, string-collate-equalp
    Eli> (see the doc string) behaves like string-equal, and that one doesn't
    Eli> have a case-insensitive variant.

Neither Apple's clang nor llvm 13 clang defines it. It looks like there's
some plan to add it, but it hasn't happened yet; see
<https://reviews.llvm.org/D106577>

Robert
* Re: case-insensitive string comparison 2022-07-25 14:23 ` Sam Steingold 2022-07-25 15:58 ` Eli Zaretskii @ 2022-07-25 19:37 ` Bruno Haible 2022-07-26 3:24 ` Richard Stallman 2 siblings, 0 replies; 45+ messages in thread From: Bruno Haible @ 2022-07-25 19:37 UTC (permalink / raw) To: emacs-devel; +Cc: Sam Steingold Sam Steingold asked: > > (string-collate-equalp "a" "A" current-locale-environment t) > > ==> nil > > current-locale-environment > > ==> "en_US.UTF-8" > > So, how do we do case-insensitive string comparison in Emacs? > > It is okay to add a `string-equal-ignore-case' based on `compare-strings'? > (even though it does not recognize "SS" and "ß" as equal) > > Or should we first implement something like casefold in Python? > https://docs.python.org/3/library/stdtypes.html#str.casefold The Unicode Standard's algorithm for case-insensitive string comparison is indeed much better thought-out than anything that you could come up with within a month. You are pointing to the Python implementation. But there's also an implementation in GNU libunistring [1] and one in ICU4C <unicode/ustring.h> [2]. Emacs could surely use one of these. The implementation from GNU libunistring is also available through Gnulib, as a set of modules [3]. The most relevant modules are unicase/u8-casecmp unicase/u8-casecoll unicase/u8-casefold unicase/u8-casemap unicase/u8-casexfrm unicase/u8-ct-casefold unicase/u8-ct-tolower unicase/u8-ct-totitle unicase/u8-ct-toupper Bruno [1] https://www.gnu.org/software/libunistring/manual/html_node/Case-insensitive-comparison.html [2] https://unicode-org.github.io/icu/userguide/transforms/casemappings.html [3] https://www.gnu.org/software/gnulib/MODULES.html ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-25 14:23 ` Sam Steingold 2022-07-25 15:58 ` Eli Zaretskii 2022-07-25 19:37 ` Bruno Haible @ 2022-07-26 3:24 ` Richard Stallman 2022-07-26 8:00 ` Helmut Eller 2022-07-26 14:28 ` Sam Steingold 2 siblings, 2 replies; 45+ messages in thread From: Richard Stallman @ 2022-07-26 3:24 UTC (permalink / raw) To: sds; +Cc: emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > So, how do we do case-insensitive string comparison in Emacs? Users could do it by calling `compare-strings' directly. > It is okay to add a `string-equal-ignore-case' based on `compare-strings'? > (even though it does not recognize "SS" and "ß" as equal) A function `string-equal-ignore-case' would make sense. My question is, is it worth the cost in complexity, or is it better to urge users to call `compare-strings' directly? That depends on how often programs will do case-insensitive string comparison. If frequently, that gives a bigger upside to `string-equal-ignore-case'. > Or should we first implement something like casefold in Python? > https://docs.python.org/3/library/stdtypes.html#str.casefold That casefold operation is not the same thing as ignoring case in Emacs. How to integrate something like that into Emacs, and in general how to handle `ß' properly in case conversion, calls for more thought. It's possible that Python's handling is good, that we should implement something similar. It would be useful for people to study that option including designing how to put it into Emacs, and whether the results would be problem-free. Part of the issue is how this should affect the existing case features including searches in the buffer, case conversion commands and functions, and `compare-strings'. Also how it interacts with Turkish. 
-- Dr Richard Stallman (https://stallman.org) Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 45+ messages in thread
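For readers taking RMS's suggestion to call `compare-strings' directly: it returns t only on a full match. Otherwise it returns an integer whose absolute value is one plus the number of matching leading characters, negative when the first string sorts first. A mismatch is therefore also non-nil, which is why call sites test `(eq t ...)':

```elisp
;; Last argument IGNORE-CASE non-nil: case-insensitive comparison.
(compare-strings "Emacs" nil nil "EMACS" nil nil t)   ; => t

;; Two characters match, so the magnitude is 3; negative because
;; "abc" sorts before "abd".
(compare-strings "abc" nil nil "abd" nil nil t)       ; => -3

;; A mismatch is still non-nil, so always compare the result against t.
(if (compare-strings "abc" nil nil "xyz" nil nil t) 'truthy 'falsy) ; => truthy
```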
* Re: case-insensitive string comparison 2022-07-26 3:24 ` Richard Stallman @ 2022-07-26 8:00 ` Helmut Eller 2022-07-26 12:21 ` Eli Zaretskii 2022-07-27 2:58 ` Richard Stallman 2022-07-26 14:28 ` Sam Steingold 1 sibling, 2 replies; 45+ messages in thread From: Helmut Eller @ 2022-07-26 8:00 UTC (permalink / raw) To: Richard Stallman; +Cc: sds, emacs-devel On Mon, Jul 25 2022, Richard Stallman wrote: > How to integrate something like that into Emacs, and in > general how to handle `ß' properly in case conversion, calls for more > thought. Unicode defines a LATIN CAPITAL LETTER SHARP S `ẞ' U+1E9E. So maybe that's an easy problem now. Not sure what typographers think about it. Helmut ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-26 8:00 ` Helmut Eller @ 2022-07-26 12:21 ` Eli Zaretskii 2022-07-27 2:58 ` Richard Stallman 1 sibling, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-07-26 12:21 UTC (permalink / raw) To: Helmut Eller; +Cc: rms, sds, emacs-devel > From: Helmut Eller <eller.helmut@gmail.com> > Cc: sds@gnu.org, emacs-devel@gnu.org > Date: Tue, 26 Jul 2022 10:00:43 +0200 > > On Mon, Jul 25 2022, Richard Stallman wrote: > > > How to integrate something like that into Emacs, and in > > general how to handle `ß' properly in case conversion, calls for more > > thought. > > Unicode defines a LATIN CAPITAL LETTER SHARP S `ẞ' U+1E9E. So maybe > that's an easy problem now. Not sure what typographers think about it. They are in disagreement, AFAIU. The Unicode Character Database (UCD) doesn't define ß and ẞ as a case-pair: the latter down-cases to the former, but the former doesn't up-case to the latter. So it is still a "special-casing" situation (and AFAIU different languages have different views on its usage). ^ permalink raw reply [flat|nested] 45+ messages in thread
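Eli's point that ß and ẞ are not a case pair can be checked from Lisp through the Unicode character properties Emacs imports. A small sketch using `get-char-code-property': U+1E9E has a simple lowercase mapping to U+00DF, but U+00DF has no simple uppercase mapping back to U+1E9E. Its full uppercase, "SS", comes from SpecialCasing.txt instead.

```elisp
;; LATIN CAPITAL LETTER SHARP S (U+1E9E) lowercases to ß (U+00DF) ...
(get-char-code-property #x1E9E 'lowercase)              ; => 223, i.e. #xDF

;; ... but ß does not uppercase to U+1E9E, so the round trip is one-way.
(eq (get-char-code-property #xDF 'uppercase) #x1E9E)    ; => nil
```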
* Re: case-insensitive string comparison 2022-07-26 8:00 ` Helmut Eller 2022-07-26 12:21 ` Eli Zaretskii @ 2022-07-27 2:58 ` Richard Stallman 2022-07-31 8:24 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Richard Stallman @ 2022-07-27 2:58 UTC (permalink / raw) To: Helmut Eller; +Cc: sds, emacs-devel > Unicode defines a LATIN CAPITAL LETTER SHARP S `ẞ' U+1E9E. On my terminal, that character displays as \u1E9E, which is not as helpful as if it were S. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-27 2:58 ` Richard Stallman @ 2022-07-31 8:24 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-07-31 8:24 UTC (permalink / raw) To: rms; +Cc: eller.helmut, sds, emacs-devel > From: Richard Stallman <rms@gnu.org> > Cc: sds@gnu.org, emacs-devel@gnu.org > Date: Tue, 26 Jul 2022 22:58:40 -0400 > > > Unicode defines a LATIN CAPITAL LETTER SHARP S `ẞ' U+1E9E. > > On my terminal, that character dispays as \u1E9E, which is not > as helpful as if it were S. I've now added support for U+1E9E to both "C-x 8" keyboard input and to latin1-disp on text terminals. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-26 3:24 ` Richard Stallman 2022-07-26 8:00 ` Helmut Eller @ 2022-07-26 14:28 ` Sam Steingold 2022-07-26 15:42 ` Sam Steingold 2022-07-26 16:10 ` Eli Zaretskii 1 sibling, 2 replies; 45+ messages in thread From: Sam Steingold @ 2022-07-26 14:28 UTC (permalink / raw) To: emacs-devel, rms; +Cc: Bruno Haible > * Richard Stallman <ezf@tah.bet> [2022-07-25 23:24:43 -0400]: > > > It is okay to add a `string-equal-ignore-case' based on `compare-strings'? > > (even though it does not recognize "SS" and "ß" as equal) > > A function `string-equal-ignore-case' would make sense. My question is, > is it worth the cost in complexity, or is it better to urge users to call > `compare-strings' directly? 1. we already have `string-prefix-p' and `string-suffix-p' which are thin wrappers around `compare-strings' > That depends on how often programs will do case-insensitive string comparison. > If frequently, that gives a bigger upside to `string-equal-ignore-case'. 2. there are dozens of places in Emacs core with code like --8<---------------cut here---------------start------------->8--- (eq t (compare-strings (sgml-tag-name tag-info) nil nil (car stack) nil nil t)) --8<---------------cut here---------------end--------------->8--- 3. some emacs packages already have to define their own versions of `string-equal-ignore-case', e.g., `bbdb-string='. > > Or should we first implement something like casefold in Python? > > https://docs.python.org/3/library/stdtypes.html#str.casefold > > That casefold operation is not the same thing as ignoring case in > Emacs. 
Normally, case-insensitive comparison means something like --8<---------------cut here---------------start------------->8--- (string= (casefold A) (casefold B)) --8<---------------cut here---------------end--------------->8--- `compare-strings' does --8<---------------cut here---------------start------------->8--- (string= (upcase A) (upcase B)) --8<---------------cut here---------------end--------------->8--- (except that it works character by character, without allocating new strings for `upcase'). > How to integrate something like that into Emacs, and in > general how to handle `ß' properly in case conversion, calls for more > thought. Bruno Haible replied in this thread, suggesting libunistring via gnulib. I think this is the easiest way to handle the issue. -- Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.2113 http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com https://memri.org https://honestreporting.com https://ffii.org The program isn't debugged until the last user is dead. ^ permalink raw reply [flat|nested] 45+ messages in thread
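Mattias's "Straße" example makes Sam's casefold-versus-upcase distinction concrete. The sketch below contrasts character-by-character `compare-strings' with a deliberately tiny, hypothetical `my-toy-casefold' that only knows the ß -> "ss" folding. It is not Unicode case folding: CaseFolding.txt (which Python's str.casefold follows) has many more entries. It is just enough to show why a length-changing fold cannot be done one character at a time.

```elisp
(defun my-toy-casefold (s)
  "Downcase S and fold ß to \"ss\".
A toy stand-in for Unicode full case folding, NOT a real implementation."
  (replace-regexp-in-string "ß" "ss" (downcase s) t t))

(defun my-toy-casefold-equal (a b)
  "Return t if A and B are equal after `my-toy-casefold'."
  (string= (my-toy-casefold a) (my-toy-casefold b)))

;; Character-by-character comparison cannot match one char against two:
(eq t (compare-strings "Straße" nil nil "STRASSE" nil nil t))  ; => nil

;; Folding both strings first makes them equal:
(my-toy-casefold-equal "Straße" "STRASSE")                     ; => t
```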
* Re: case-insensitive string comparison 2022-07-26 14:28 ` Sam Steingold @ 2022-07-26 15:42 ` Sam Steingold 2022-07-26 16:10 ` Eli Zaretskii 1 sibling, 0 replies; 45+ messages in thread From: Sam Steingold @ 2022-07-26 15:42 UTC (permalink / raw) To: emacs-devel > * Sam Steingold <fqf@tah.bet> [2022-07-26 10:28:01 -0400]: > >> * Richard Stallman <ezf@tah.bet> [2022-07-25 23:24:43 -0400]: >> >> That depends on how often programs will do case-insensitive string comparison. >> If frequently, that gives a bigger upside to `string-equal-ignore-case'. > > 3. some emacs packages already have to define their own versions of > `string-equal-ignore-case', e.g., `bbdb-string='. and also `bibtex-string=' and `completion--string-equal-p' in the core. -- Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.2113 http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com https://fairforall.org http://think-israel.org https://www.memritv.org Growing Old is Inevitable; Growing Up is Optional. ^ permalink raw reply [flat|nested] 45+ messages in thread
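Sam's points 1 through 3 all describe the same pattern: a one-line wrapper over `compare-strings'. Here is a sketch of that pattern with hypothetical `my-' names, not the actual definitions from subr.el, BBDB, or bibtex.el. The prefix test mirrors what `string-prefix-p' does when its IGNORE-CASE argument is non-nil.

```elisp
(defun my-string-equal-ci (s1 s2)
  "Return t if S1 and S2 are equal ignoring case (per the buffer's case table)."
  (eq t (compare-strings s1 nil nil s2 nil nil t)))

(defun my-string-prefix-ci-p (prefix string)
  "Return t if PREFIX is a case-insensitive prefix of STRING."
  (let ((n (length prefix)))
    (and (<= n (length string))
         ;; Compare PREFIX with only the first N characters of STRING.
         (eq t (compare-strings prefix 0 n string 0 n t)))))

;; The sgml call site from Sam's earlier message would then read:
;; (my-string-equal-ci (sgml-tag-name tag-info) (car stack))
```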
* Re: case-insensitive string comparison 2022-07-26 14:28 ` Sam Steingold 2022-07-26 15:42 ` Sam Steingold @ 2022-07-26 16:10 ` Eli Zaretskii 2022-07-26 18:56 ` Bruno Haible 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-07-26 16:10 UTC (permalink / raw) To: sds; +Cc: emacs-devel, rms, bruno > From: Sam Steingold <sds@gnu.org> > Cc: Bruno Haible <bruno@clisp.org> > Date: Tue, 26 Jul 2022 10:28:01 -0400 > > Bruno Haible replied in this thread, suggesting libunistring via gnulib. > I think this is the easiest way to handle the issue. Using an external library whose notion of string comparison and letter-case cannot be controlled by Emacs is a non-starter. With the current machinery, a Lisp program or a user can control up/down-casing by specifying a buffer-local case-table, and we won't give up this important functionality. Other than that, I'm not aware of anything that libunistring can do that Emacs cannot: we import the same Unicode tables as libunistring does, so we have the same data to do these jobs. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-26 16:10 ` Eli Zaretskii @ 2022-07-26 18:56 ` Bruno Haible 2022-07-26 19:30 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Bruno Haible @ 2022-07-26 18:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: sds, emacs-devel Eli Zaretskii wrote: > With the > current machinery, a Lisp program or a user can control up/down-casing > by specifying a buffer-local case-table, and we won't give up this > important functionality. For which types of users, and for which use-cases, do you consider this an "important functionality"? Recall that the Unicode casing tables already cover the special cases for 'ß', Turkish i, and so on. I'd like to understand whether per-user customization of casing rules is so important that libunistring should offer it in the API (as opposed to requiring code modifications). LibreOffice, for example, allows per-user customizations of the spell-checking dictionary, but not of the casing tables. Is that a flaw, and why? Bruno ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-26 18:56 ` Bruno Haible @ 2022-07-26 19:30 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-07-26 19:30 UTC (permalink / raw) To: Bruno Haible; +Cc: sds, emacs-devel > From: Bruno Haible <bruno@clisp.org> > Cc: sds@gnu.org, emacs-devel@gnu.org > Date: Tue, 26 Jul 2022 20:56:10 +0200 > > Eli Zaretskii wrote: > > With the > > current machinery, a Lisp program or a user can control up/down-casing > > by specifying a buffer-local case-table, and we won't give up this > > important functionality. > > For which types of users, and for which use-cases, do you consider this an > "important functionality"? One example that immediately comes to mind is when you need to downcase strings without being hit by the case of 'I' in the Turkish locale. We use this, for example, when parsing various Internet protocols. More importantly, Emacs had this feature for many years, so suddenly losing it is not really a possibility. > Recall that the Unicode casing tables already cover the special cases for > 'ß', Turkisch i, and so on. We have the infrastructure for supporting that, and do so in locales where that is required. In particular, Emacs imports the data from the Unicode SpecialCasing.txt file. > I'd like to understand whether per-user customization of casing rules is > so important that libunistring should offer it in the API (as opposed to > requiring code modifications). > > LibreOffice, for example, allows per-user customizations of the spell- > checking dictionary, but not of the casing tables. Is that a flaw, and why? Emacs is not only a text-editing program, it is primarily a text-processing environment. When you write programs that process text, control of case conversions is sometimes important. Whether this means libunistring needs to grow such an API, I don't know. ^ permalink raw reply [flat|nested] 45+ messages in thread
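Eli's protocol-parsing example can be sketched with the existing `with-case-table' macro and the `ascii-case-table' he mentions later in this thread. The function name is hypothetical. Because `compare-strings' consults the current buffer's case table, binding an ASCII-only table for the duration of the comparison keeps the result independent of a Turkish (or any other) language environment.

```elisp
(defun my-protocol-token-equal (a b)
  "Compare protocol tokens A and B ignoring ASCII case only.
Binds `ascii-case-table' so the result does not depend on the
language environment's case table (e.g. Turkish dotted/dotless I)."
  (with-case-table ascii-case-table
    (eq t (compare-strings a nil nil b nil nil t))))

(my-protocol-token-equal "Content-Type" "content-type")   ; => t
```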
* Re: case-insensitive string comparison 2022-07-20 3:01 ` Stefan Monnier 2022-07-20 16:22 ` Sam Steingold @ 2022-07-20 16:24 ` Roland Winkler 2022-07-20 17:06 ` Sam Steingold 2022-07-20 17:12 ` Eli Zaretskii 1 sibling, 2 replies; 45+ messages in thread From: Roland Winkler @ 2022-07-20 16:24 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel On Tue, Jul 19 2022, Stefan Monnier wrote: >> PS. Actually, compare-strings/ignore_case is broken because it does, >> essentially, upcase both arguments, see >> https://stackoverflow.com/q/319426/850781 > > Hmm... `string-collate-equalp`? It would be nice if the node in the elisp manual on "comparison of characters and strings" included some discussion on what usage cases with case-folding can / should preferentially be covered by the locale-dependent function string-collate-equalp versus something like compare-strings. In my narrow world, I can think of two extremes: - bibtex-mode needs to compare BibTeX keywords that are ASCII strings for which case is insignificant. So bibtex-string= is exactly what Sam suggests to put into subr.el, and I believe that's good enough (just as almost any other approach I can think of for this particular problem). - BBDB needs to know whether a name is already present in the database or not, ignoring case. The function bbdb-string= is again what Sam suggests to put into subr.el. The function string-collate-equalp might be better suited for this. But which locale should it use? The records in my BBDB cover larger parts of the world and I do not even know which locale(s) might work best for each of them, not to mention that BBDB needs to loop over all records. Is there a "universal default locale"? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 16:24 ` Roland Winkler @ 2022-07-20 17:06 ` Sam Steingold 2022-07-20 17:16 ` Eli Zaretskii 2022-07-20 17:12 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Sam Steingold @ 2022-07-20 17:06 UTC (permalink / raw) To: emacs-devel, Roland Winkler > * Roland Winkler <jvaxyre@tah.bet> [2022-07-20 11:24:38 -0500]: > > On Tue, Jul 19 2022, Stefan Monnier wrote: >>> PS. Actually, compare-strings/ignore_case is broken because it does, >>> essentially, upcase both arguments, see >>> https://stackoverflow.com/q/319426/850781 >> >> Hmm... `string-collate-equalp`? > > - BBDB needs to know whether a name is already present in the database > or not, ignoring case. The function bbdb-string= is again what Sam > suggests to put into subr.el. The function string-collate-equalp > might be better suited for this. But which locale should it use? `bbdb-file-coding-system' ? > Is there a "univeral default locale"? UTF-8 is, I think, the generally accepted universal default today. -- Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.2113 http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com https://www.memritv.org https://iris.org.il https://www.dhimmitude.org If you need a helping hand, just remember that you already have two. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 17:06 ` Sam Steingold @ 2022-07-20 17:16 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-07-20 17:16 UTC (permalink / raw) To: sds; +Cc: emacs-devel, winkler > From: Sam Steingold <sds@gnu.org> > Date: Wed, 20 Jul 2022 13:06:41 -0400 > > > - BBDB needs to know whether a name is already present in the database > > or not, ignoring case. The function bbdb-string= is again what Sam > > suggests to put into subr.el. The function string-collate-equalp > > might be better suited for this. But which locale should it use? > > `bbdb-file-coding-system' ? That's not the locale, that's a locale's _codeset_. > > Is there a "univeral default locale"? > > UTF-8 is, I think, the generally accepted universal default today. There's no such locale, AFAIK. UTF-8 is again just a codeset. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 16:24 ` Roland Winkler 2022-07-20 17:06 ` Sam Steingold @ 2022-07-20 17:12 ` Eli Zaretskii 2022-07-20 17:37 ` Roland Winkler 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-07-20 17:12 UTC (permalink / raw) To: Roland Winkler; +Cc: monnier, emacs-devel > From: Roland Winkler <winkler@gnu.org> > Cc: emacs-devel@gnu.org > Date: Wed, 20 Jul 2022 11:24:38 -0500 > > On Tue, Jul 19 2022, Stefan Monnier wrote: > >> PS. Actually, compare-strings/ignore_case is broken because it does, > >> essentially, upcase both arguments, see > >> https://stackoverflow.com/q/319426/850781 > > > > Hmm... `string-collate-equalp`? > > It would be nice if the node in the elisp manual on "comparison of > characters and strings" included some discussion on what usage cases > with case-folding can / should preferentially be covered by the > locale-dependent function string-collate-equalp versus something like > compare-strings. I hear you, but your request is impossible to fulfill in practice. That's because the collation rules used by this function are implemented in the C library, and even if we know the locale, different implementations of libc use different collation rules (in addition, collation rules for some locales change with time). The answer to the question "what comparison function should I use in a specific use case" depends on the details of the use case, on the locale, and on the libc against which Emacs was linked. That is why the ELisp manual and the doc strings are intentionally vague regarding what exactly should you expect as result: we simply cannot say there anything that is accurate enough and general enough. compare-strings, by contrast, doesn't use any collation rules, only the current buffer's value of the case table. So its results are more predictable. > - bibtex-mode needs to compare BibTeX keywords that are ascii strings > for which case is insignificant. 
So bibtex-string= is exactly what > Sam suggests to put into subr.el, and I believe that's good enough > (just as almost any other approach I can think of for this particular > problem). > > - BBDB needs to know whether a name is already present in the database > or not, ignoring case. The function bbdb-string= is again what Sam > suggests to put into subr.el. The function string-collate-equalp > might be better suited for this. But which locale should it use? The > records in my BBDB cover larger parts of the world and I do not even > know which locale(s) might work best for each of them, not to mention > that BBDB needs to loop over all records. Is there a "univeral > default locale"? That "universal default locale" is what Emacs uses, modulo the few problematic characters like the dotless I etc. For 100% predictable results, build your own case table, bind the buffer's case table to it, and then call case-insensitive comparison. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 17:12 ` Eli Zaretskii @ 2022-07-20 17:37 ` Roland Winkler 2022-07-20 17:50 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Roland Winkler @ 2022-07-20 17:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel On Wed, Jul 20 2022, Eli Zaretskii wrote: >> It would be nice if the node in the elisp manual on "comparison of >> characters and strings" included some discussion on what usage cases >> with case-folding can / should preferentially be covered by the >> locale-dependent function string-collate-equalp versus something like >> compare-strings. > > I hear you, but your request is impossible to fulfill in practice. > That's because the collation rules used by this function are > implemented in the C library, and even if we know the locale, > different implementations of libc use different collation rules (in > addition, collation rules for some locales change with time). Even mentioning the difficulties could be useful here. The elisp manual is used by people who want to develop code that works for a wide range of users. So even if string comparison is a slippery terrain these elisp hackers need to make design choices that work best for most users. What usage scenarios in elisp packages might benefit from string-collate-equalp even if this function depends on details that can be quite different for different users? >> - BBDB needs to know whether a name is already present in the database >> or not, ignoring case. The function bbdb-string= is again what Sam >> suggests to put into subr.el. The function string-collate-equalp >> might be better suited for this. But which locale should it use? The >> records in my BBDB cover larger parts of the world and I do not even >> know which locale(s) might work best for each of them, not to mention >> that BBDB needs to loop over all records. Is there a "univeral >> default locale"? 
> > That "universal default locale" is what Emacs uses, modulo the few > problematic characters like the dotless I etc. For 100% predictable > results, build your own case table, bind the buffer's case table to > it, and then call case-insensitive comparison. I am not sure I can follow your argument. Do you suggest that, likely, BBDB will work best if it compares names using compare-strings? (I'd be glad to hear that.) This code should work for users who do not want to build their own case table and stuff like that. Thanks! ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 17:37 ` Roland Winkler @ 2022-07-20 17:50 ` Eli Zaretskii 2022-07-20 18:10 ` Roland Winkler 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-07-20 17:50 UTC (permalink / raw) To: Roland Winkler; +Cc: monnier, emacs-devel > From: Roland Winkler <winkler@gnu.org> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Wed, 20 Jul 2022 12:37:29 -0500 > > > I hear you, but your request is impossible to fulfill in practice. > > That's because the collation rules used by this function are > > implemented in the C library, and even if we know the locale, > > different implementations of libc use different collation rules (in > > addition, collation rules for some locales change with time). > > Even mentioning the difficulties could be useful here. I'm not sure I agree. To describe all the important aspects of this would take too long, and it isn't the job of our manual to document this stuff. Read this if you want to know: https://unicode.org/reports/tr10/ > The elisp manual is used by people who want to develop code that > works for a wide range of users. So even if string comparison is a > slippery terrain these elisp hackers need to make design choices > that work best for most users. Luckily, Emacs Lisp programs rarely need this. > What usage scenarios in elisp packages might benefit from > string-collate-equalp even if this function depends on details that can > be quite different for different users? For example, sorting file names. If you want to get anything similar to what GNU 'ls' does on GNU/Linux (in particular, with punctuation characters in file names), you need to use the locale's collation rules as implemented by glibc. Which is what string-collate-lessp does. > >> - BBDB needs to know whether a name is already present in the database > >> or not, ignoring case. The function bbdb-string= is again what Sam > >> suggests to put into subr.el. 
The function string-collate-equalp > >> might be better suited for this. But which locale should it use? The > >> records in my BBDB cover larger parts of the world and I do not even > >> know which locale(s) might work best for each of them, not to mention > >> that BBDB needs to loop over all records. Is there a "univeral > >> default locale"? > > > > That "universal default locale" is what Emacs uses, modulo the few > > problematic characters like the dotless I etc. For 100% predictable > > results, build your own case table, bind the buffer's case table to > > it, and then call case-insensitive comparison. > > I am not sure I can follow your argument. Do you suggest that, likely, > BBDB will work best if it compares names using compare-strings? Yes. But in addition, you should set up the case table of the current buffer when you do so, because otherwise special cases with the likes of the Turkish language's dotless I could in rare cases screw you. > (I'd be glad to hear that.) This code should work for users who do not > want to build their own case table and stuff like that. Not the users should build the case table, BBDB (or whatever Lisp program that needs the comparison) should. It's not that hard, really: if you only need ASCII, use ascii-case-table, otherwise copy the standard case-table and modify it to make sure I downcases to i and similarly with a few other exceptional letters. ^ permalink raw reply [flat|nested] 45+ messages in thread
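Eli's second option, copying the standard case table and pinning the exceptional letters, might look like the following. This is a sketch only: `my-names-case-table' and `my-name-equal' are hypothetical names, and only the I/i pair is pinned. Which other letters a package like BBDB should pin is exactly the judgment call discussed in this thread.

```elisp
(defvar my-names-case-table
  (let ((table (copy-case-table (standard-case-table))))
    ;; Make I and i a case pair even if the Turkish language
    ;; environment has re-paired them with dotless ı / dotted İ.
    (set-case-syntax-pair ?I ?i table)
    table)
  "Hypothetical case table: the standard one with I/i pinned.")

(defun my-name-equal (a b)
  "Return t if names A and B are equal ignoring case.
Uses `my-names-case-table' instead of the buffer's own case table."
  (with-case-table my-names-case-table
    (eq t (compare-strings a nil nil b nil nil t))))
```

Because the table is a copy, the standard case table and other buffers are left as they were. Note that `set-case-syntax-pair' also gives I and i word syntax in the standard syntax table, which they already have.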
* Re: case-insensitive string comparison 2022-07-20 17:50 ` Eli Zaretskii @ 2022-07-20 18:10 ` Roland Winkler 2022-07-20 18:16 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Roland Winkler @ 2022-07-20 18:10 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel On Wed, Jul 20 2022, Eli Zaretskii wrote: >> Even mentioning the difficulties could be useful here. > > I'm not sure I agree. To describe all the important aspects of this > would take too long, and it isn't the job of our manual to document > this stuff. Read this if you want to know: > > https://unicode.org/reports/tr10/ A footnote pointing the interested reader to this report could already be useful. I am not suggesting to try to provide a more exhaustive discussion of this topic. I am suggesting to mention briefly that the topic is subtle and depends on details "beyond emacs itself". >> I am not sure I can follow your argument. Do you suggest that, likely, >> BBDB will work best if it compares names using compare-strings? > > Yes. Thanks, that's already good to know! > But in addition, you should set up the case table of the current > buffer when you do so, because otherwise special cases with the likes > of the Turkish language's dotless I could in rare cases screw you. > >> (I'd be glad to hear that.) This code should work for users who do not >> want to build their own case table and stuff like that. > > Not the users should build the case table, BBDB (or whatever Lisp > program that needs the comparison) should. It's not that hard, > really: if you only need ASCII, use ascii-case-table, otherwise copy > the standard case-table and modify it to make sure I downcases to i > and similarly with a few other exceptional letters. I am not sure it would be possible to predict how a default case table for BBDB should differ from the standard case table. 
BBDB might be the only package of a user that accumulates strings that go beyond what otherwise a user is dealing with regularly. If there is a sensible "BBDB default case table" I'd hope that this is the standard case table. Or if not: can you suggest an emacs package that I can look into as a source of inspiration? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 18:10 ` Roland Winkler @ 2022-07-20 18:16 ` Eli Zaretskii 2022-07-20 18:18 ` [External] : " Drew Adams 2022-07-21 6:56 ` Eli Zaretskii 0 siblings, 2 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-07-20 18:16 UTC (permalink / raw) To: Roland Winkler; +Cc: monnier, emacs-devel > From: Roland Winkler <winkler@gnu.org> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Wed, 20 Jul 2022 13:10:35 -0500 > > On Wed, Jul 20 2022, Eli Zaretskii wrote: > >> Even mentioning the difficulties could be useful here. > > > > I'm not sure I agree. To describe all the important aspects of this > > would take too long, and it isn't the job of our manual to document > > this stuff. Read this if you want to know: > > > > https://unicode.org/reports/tr10/ > > A footnote pointing the interested reader to this report could already > be useful. I'll see if we have a good place for that. > > Not the users should build the case table, BBDB (or whatever Lisp > > program that needs the comparison) should. It's not that hard, > > really: if you only need ASCII, use ascii-case-table, otherwise copy > > the standard case-table and modify it to make sure I downcases to i > > and similarly with a few other exceptional letters. > > I am not sure it would be possible to predict how a default case table > for BBDB should differ from the standard case table. BBDB might be the > only package of a user that accumulates strings that go beyond what > otherwise a user is dealing with regularly. If there is a sensible > "BBDB default case table" I'd hope that this is the standard case table. Maybe BBDB can just use the standard case table, I don't know. You should be the judge of that: if your users don't care about I not being equal to i case-insensitively when the language environment happens to be Turkish, then you shouldn't worry about that. 
> Or if not: can you suggest an emacs package that I can look into as a > source of inspiration? I'm not aware of any (which is not to say there isn't any, just that I don't know). ^ permalink raw reply [flat|nested] 45+ messages in thread
* RE: [External] : Re: case-insensitive string comparison 2022-07-20 18:16 ` Eli Zaretskii @ 2022-07-20 18:18 ` Drew Adams 2022-07-21 6:56 ` Eli Zaretskii 1 sibling, 0 replies; 45+ messages in thread From: Drew Adams @ 2022-07-20 18:18 UTC (permalink / raw) To: Eli Zaretskii, Roland Winkler Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > > A footnote pointing the interested reader > > to this report could already be useful. > > I'll see if we have a good place for that. +1. Thx. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-20 18:16 ` Eli Zaretskii 2022-07-20 18:18 ` [External] : " Drew Adams @ 2022-07-21 6:56 ` Eli Zaretskii 2022-07-21 14:19 ` Roland Winkler 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-07-21 6:56 UTC (permalink / raw) To: winkler; +Cc: monnier, emacs-devel > Date: Wed, 20 Jul 2022 21:16:12 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > > > > https://unicode.org/reports/tr10/ > > > > A footnote pointing the interested reader to this report could already > > be useful. > > I'll see if we have a good place for that. Done. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-21 6:56 ` Eli Zaretskii @ 2022-07-21 14:19 ` Roland Winkler 2022-07-21 15:53 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Roland Winkler @ 2022-07-21 14:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel On Thu, Jul 21 2022, Eli Zaretskii wrote: >> Date: Wed, 20 Jul 2022 21:16:12 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org >> >> > > https://unicode.org/reports/tr10/ >> > >> > A footnote pointing the interested reader to this report could already >> > be useful. >> >> I'll see if we have a good place for that. > > Done. Thank you! - I always thought that such technical reports were beyond my comprehension. But this unicode report is quite readable. So there is no need for emacs to reinvent the wheel. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-21 14:19 ` Roland Winkler @ 2022-07-21 15:53 ` Eli Zaretskii 2022-07-21 16:35 ` Roland Winkler 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-07-21 15:53 UTC (permalink / raw) To: Roland Winkler; +Cc: monnier, emacs-devel > From: Roland Winkler <winkler@gnu.org> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > Date: Thu, 21 Jul 2022 09:19:37 -0500 > > On Thu, Jul 21 2022, Eli Zaretskii wrote: > >> Date: Wed, 20 Jul 2022 21:16:12 +0300 > >> From: Eli Zaretskii <eliz@gnu.org> > >> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > >> > >> > > https://unicode.org/reports/tr10/ > >> > > >> > A footnote pointing the interested reader to this report could already > >> > be useful. > >> > >> I'll see if we have a good place for that. > > > > Done. > > Thank you! - I always thought that such technical reports were beyond my > comprehension. But this unicode report is quite readable. So there is > no need for emacs to reinvent the wheel. The implementation of strcoll is in the underlying libc, so Emacs _cannot_ possibly reinvent this wheel, even if we wanted to. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: case-insensitive string comparison 2022-07-21 15:53 ` Eli Zaretskii @ 2022-07-21 16:35 ` Roland Winkler 0 siblings, 0 replies; 45+ messages in thread From: Roland Winkler @ 2022-07-21 16:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel On Thu, Jul 21 2022, Eli Zaretskii wrote: >> Thank you! - I always thought that such technical reports were beyond my >> comprehension. But this unicode report is quite readable. So there is >> no need for emacs to reinvent the wheel. > > The implementation of strcoll is in the underlying libc, so Emacs > _cannot_ possibly reinvent this wheel, even if we wanted to. I had in mind the elisp manual reinventing (rewriting) the readable unicode report. ^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2022-07-31 8:24 UTC | newest] Thread overview: 45+ messages 2022-07-19 17:27 case-insensitive string comparison Sam Steingold 2022-07-19 18:06 ` Mattias Engdegård 2022-07-19 18:56 ` Sam Steingold 2022-07-20 4:39 ` tomas 2022-07-20 11:35 ` Eli Zaretskii 2022-07-20 13:30 ` tomas 2022-07-19 18:16 ` Stefan Kangas 2022-07-19 19:39 ` Roland Winkler 2022-07-19 22:47 ` Sam Steingold 2022-07-20 2:21 ` Roland Winkler 2022-07-20 3:01 ` Stefan Monnier 2022-07-20 16:22 ` Sam Steingold 2022-07-25 14:23 ` Sam Steingold 2022-07-25 15:58 ` Eli Zaretskii 2022-07-25 19:39 ` Sam Steingold 2022-07-26 13:05 ` Eli Zaretskii 2022-07-26 14:16 ` Sam Steingold 2022-07-26 15:53 ` Eli Zaretskii 2022-07-26 16:00 ` Sam Steingold 2022-07-26 16:16 ` Lars Ingebrigtsen 2022-07-26 14:43 ` Robert Pluim 2022-07-25 19:37 ` Bruno Haible 2022-07-26 3:24 ` Richard Stallman 2022-07-26 8:00 ` Helmut Eller 2022-07-26 12:21 ` Eli Zaretskii 2022-07-27 2:58 ` Richard Stallman 2022-07-31 8:24 ` Eli Zaretskii 2022-07-26 14:28 ` Sam Steingold 2022-07-26 15:42 ` Sam Steingold 2022-07-26 16:10 ` Eli Zaretskii 2022-07-26 18:56 ` Bruno Haible 2022-07-26 19:30 ` Eli Zaretskii 2022-07-20 16:24 ` Roland Winkler 2022-07-20 17:06 ` Sam Steingold 2022-07-20 17:16 ` Eli Zaretskii 2022-07-20 17:12 ` Eli Zaretskii 2022-07-20 17:37 ` Roland Winkler 2022-07-20 17:50 ` Eli Zaretskii 2022-07-20 18:10 ` Roland Winkler 2022-07-20 18:16 ` Eli Zaretskii 2022-07-20 18:18 ` [External] : " Drew Adams 2022-07-21 6:56 ` Eli Zaretskii 2022-07-21 14:19 ` Roland Winkler 2022-07-21 15:53 ` Eli Zaretskii 2022-07-21 16:35 ` Roland Winkler