On Wed, Jun 26, 2024 at 01:46:28PM +0200, Maxime Devos wrote:
>
> >> >-Returns the number of characters in the given @var{string}.
> >> +Returns the number of bytes in the given @var{string}.
> >>
> >> This is false. For example, (string-length "😀") is 1, whereas
> >> in all encodings I know of it is more than one byte. Also, R5RS
> >> says: [...]
> >
> >Maybe `the number of codepoints` will work here.
> >
> >(string-length "👨‍🏭") ;; => 3
> >(string-length "é") ;; => 2
> >
> >The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.

It's more subtle than that: Unicode knows about "combining characters",
so it's quite possible that Andrew's "é" consists of two code points
(FWIW, it arrived here as just one, but perhaps there was some
canonicalization [1] step in between).

ISTR that "Unicode character" is actually synonymous with "Unicode
code point" -- but the common meaning of "character" is fuzzier.
Perhaps it's wise to avoid that word when trying to be precise.

Cheers

[1] https://en.wikipedia.org/wiki/Unicode_normalization

-- 
t
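
P.S. A minimal sketch of the three counts discussed above (code points, code points after canonical composition, bytes). It assumes a Guile recent enough to have the \u and \U string escapes and string-normalize-nfc. The two strings are my guess at what the quoted examples contained, written with escapes so that no mail client can normalize them on the way. The helpers utf8-size and report are made up for illustration and are not part of Guile.

```scheme
(use-modules (rnrs bytevectors))          ; string->utf8, bytevector-length

;; Assumed contents of the quoted examples.
(define decomposed "e\u0301")             ; e + U+0301 COMBINING ACUTE ACCENT
(define worker "\U01F468\u200D\U01F3ED")  ; man + ZERO WIDTH JOINER + factory

;; Hypothetical helper: size in bytes of the UTF-8 encoding of S.
(define (utf8-size s)
  (bytevector-length (string->utf8 s)))

;; Hypothetical helper: returns a list of
;; (code points, code points after NFC, UTF-8 bytes).
(define (report s)
  (list (string-length s)
        (string-length (string-normalize-nfc s))
        (utf8-size s)))

(display (report decomposed)) (newline)   ; (2 1 3)
(display (report worker)) (newline)       ; (3 3 11)
```

In both cases string-length reports code points, which matches neither the byte count nor the single "character" a reader sees. Normalization closes the gap for the accented letter, which has a precomposed form, but not for the joiner sequence.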