On 2024-06-26 13:46, Maxime Devos wrote:

>>> >-Returns the number of characters in the given @var{string}.
>>> +Returns the number of bytes in the given @var{string}.
>>>
>>> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
>>
>>Maybe `the number of codepoints` will work here.
>>
>>(string-length "👨‍🏭") ;; => 3
>>(string-length "é") ;; => 2
>>
>>The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.
>
> You need to fix your setup, that’s not what Guile does. Are you sure you have set the encoding of current-input-port correctly? (Probably by setting LC_ALL or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding might be interpreted in terms of some 8-bit encoding.
>
> Here’s a test: if you can input #\👨‍🏭 without errors and it evaluates to #\👨‍🏭, then the encoding should be set up correctly.

(setlocale LC_ALL) ;; => "en_US.utf8"
(display #\👨‍🏭) ;; => /home/bob/guile-ares-rs/dev/guile/tmp.scm:84:15: unknown character name 👨‍🏭

The same happens if I run it in the usual REPL:

LC_ALL=en_US.utf8 guile

-- 
Best regards,
Andrew Tropin