On Wed, Jun 26, 2024 at 01:46:28PM +0200, Maxime Devos wrote:
> 
> >>  >-Returns the number of characters in the given @var{string}.
> >> +Returns the number of bytes in the given @var{string}.
> >>  
> >> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
> >
> >Maybe `the number of codepoints` will work here.
> >
> >(string-length "👨‍🏭") ;; => 3
> >(string-length "é") ;; => 2
> >
> >The number of characters here is 1 in both cases.
> 
> No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.

It's more subtle than that: Unicode knows about "combining characters",
so it's quite possible that Andrew's "é" consists of two code points
(FWIW, it arrived here as just one, but perhaps there was some
normalization [1] step in between).
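
To make the combining-character case concrete, here is a small sketch
one could try at a Guile REPL. It assumes Guile 2.0 or later (strings
as sequences of Unicode code points); the names `decomposed' and
`precomposed' are made up for illustration.

```scheme
;; Two spellings of "é" (sketch; the variable names are hypothetical).
(define decomposed
  (string #\e (integer->char #x0301)))   ; "e" + U+0301 COMBINING ACUTE ACCENT
(define precomposed
  (string (integer->char #x00E9)))       ; U+00E9 LATIN SMALL LETTER E WITH ACUTE

;; string-length counts code points, so the two spellings differ.
(string-length decomposed)                           ; => 2
(string-length precomposed)                          ; => 1

;; Normalization converts between the two forms.
(string-length (string-normalize-nfc decomposed))    ; => 1
(string-length (string-normalize-nfd precomposed))   ; => 2
(string=? (string-normalize-nfc decomposed) precomposed) ; => #t
```

So whether "é" has length 1 or 2 depends on which form reached the
reader, which would explain why the same mail can look different on
each end.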

ISTR that "Unicode character" is actually synonymous with "Unicode
code point" -- but the common meaning of "character" is fuzzier. Perhaps
it's wise to avoid that word when trying to be precise.

Cheers

[1] https://en.wikipedia.org/wiki/Unicode_normalization

-- 
t