On Saturday, 06 July 2024 at 15:32 -0500, Rob Browning wrote:
> At a minimum, I suggest Guile should produce an error by default
> (instead of generating incorrect data) when the system bytes cannot be
> encoded in the current locale.


I agree that an error would be better than replacing with a question mark.


> As an incremental step, and as has been discussed elsewhere a bit, we
> might add support for uselocale()[2] and then document that the current
> recommendation is to always use ISO-8859-1 (i.e. Latin-1)[3] for system
> data unless you're certain your program doesn't need to be general
> purpose (perhaps you're sure you only care about UTF-8 systems).


A Latin-1 locale is a terrible default. Virtually no Linux system these days
has a locale encoding other than UTF-8. The one exception is perhaps the "C"
locale, which people still use out of habit via "LC_ALL=C" as a way of saying
"speak English, please", although most Linux distros ship a C.UTF-8 locale
these days.



> The most direct (and compact, if we do convert to UTF-8) representation
> would bytevectors, but then you would have a much more limited set of
> operations available (i.e. strings have all of srfi-13, srfi-14, etc.)
> unless we expanded them (likely re-using the existing code paths).  Of
> course you could still convert to Latin-1, perform the operation, and
> convert back, but that's not ideal.


Why is that "not ideal"? The (ice-9 iconv) API is convenient, locale-independent
and thread-safe.
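
To make the round trip concrete, here is a minimal sketch of what I have in
mind. The helper names and the sample bytes are made up for illustration; only
bytevector->string and string->bytevector come from (ice-9 iconv), and I am
assuming the data arrives as a bytevector in the first place.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Hypothetical helpers. Every byte maps to exactly one Latin-1 character,
;; so decoding arbitrary bytes this way cannot fail.
(define (bytes->latin1-string bv)
  (bytevector->string bv "ISO-8859-1"))

;; Encoding back is lossless as long as the string only contains
;; characters in the range U+0000..U+00FF.
(define (latin1-string->bytes str)
  (string->bytevector str "ISO-8859-1"))

;; Made-up file name bytes "foo<0xFF>.txt"; 0xFF is never valid in UTF-8.
(define raw
  (u8-list->bytevector '(102 111 111 255 46 116 120 116)))

;; Decode, use an ordinary string operation, then encode back.
(define name (bytes->latin1-string raw))
(define extension (substring name (string-rindex name #\.)))
(define round-tripped (latin1-string->bytes name))
```

Here extension ends up as ".txt" and round-tripped is equal? to raw, without
touching the locale at any point.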