>> >> Guile is a Scheme implementation, bound by Scheme standards and compatibility
>> >> with other Scheme implementations (and backwards compatibility too).
>> >
>> >Yes, I understand that.
>>
>> Going by what you are saying below, I think you don’t.
>
>Thank you for your vote of confidence.

That was not a vote of confidence; if anything, it’s the contrary.

> I’m pretty sure that they weren’t intending to get the 0xb5 byte. Rather, they were using the equivalent of ‘string-ref’ (i.e., ‘aref’) and demonstrating that the result is bogus in Scheme. In Scheme, ‘(string-ref ...)’ needs to return a character, and there exists no (Unicode) character with codepoint 4194229, so what Emacs returns here would be bogus for (Guile) Scheme.

>aref in Emacs and string-ref in Guile are not the same, and if Guile needs to produce a raw byte in this scenario, it can be easily arranged. In Emacs we have other goals.

It is the opposite. In Guile, string-ref does not need to produce bytes, but characters – just like aref (modulo difference in how Scheme and Emacs define ‘byte’).

>IOW, I think this argument is pointless, since it is easy to adapt the mechanism to what Guile needs.

No – the argument is about how it is impossible to adapt the mechanism to Guile, since bytes aren’t characters in Unicode.

> From the Emacs manual:
>
> >For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).
>
> Thus, (aref the-string index) is the equivalent of (string-ref the-string index).

>No, because a raw byte is not a character.

Yes, because characters are characters. Both string-ref and aref return characters. This is documented in both the Emacs and Guile manuals. Again, from the Emacs manual:

> A string is a fixed sequence of characters. [...] Since strings are arrays, and therefore sequences as well, you can operate on them with the general array and sequence functions documented in Sequences, Arrays, and Vectors.
> For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).

Hence, (aref the-string index) returns (Emacs) characters. Likewise, from the Guile manual:

> Scheme Procedure: string-ref str k
> C Function: scm_string_ref (str, k)
> Return character k of str using zero-origin indexing. k must be a valid index of str.

Clearly, these are equivalent (modulo difference in the meaning of ‘characters’).

>If Guile restricts itself to Unicode characters and only them, it will lack important features. So my suggestion is not to have this restriction.

Guile restricting strings to Unicode _is_ an important feature (simplicity, and compatibility). Guile extending strings beyond Unicode is a _limitation_ (compatibility and other trickiness for applications). I could imagine that in the far future there might be too few codepoints left in Unicode, in which case the range of what Guile (and more generally, Scheme and Unicode) considers characters would need to be extended (even if that has some compatibility implications), but that time hasn’t arrived yet.

The important feature in this thread is supporting file names (and getenv stuff, etc.) that don’t fit properly in the ‘string’ model. As mentioned earlier (in the initial message, even), there are solutions to that which do not impose the ‘let characters go beyond Unicode’ limitation.

>I think the fact that this discussion is held, and that Rob suggested to use Latin-1 for the purpose of supporting raw bytes is a clear indication that Guile, too, needs to deal with "character-like" data that does not fit the Unicode framework.

True, and I never claimed otherwise.

> So I think saying that strings in Guile can only hold Unicode characters will not give you what this discussion attempts to give.

Sure, and I wasn’t trying to. What I (and IIUC, the other person as well) was doing was pointing out that Emacs’s thing is not a solution either.
(Whether because of backwards compatibility, or because of not _wanting_ to conflate bytes with characters (and not wanting to go beyond Unicode), with all the consequences this conflation would imply for applications.)

> In particular, how will you handle the situations described by Rob where a file has a name that is not a valid UTF-8 sequence (thus not "characters" as long as you interpret text as UTF-8)?

Scheme does not interpret text as UTF-8; that’s an internal implementation detail and a matter of things like locales. Instead, to Scheme, text is (Unicode) characters.

I have outlined a solution (that does not conflate characters with bytes) in another response. IIRC, it was in a response to Rob. I would propose actually, you know, reading it. I’m not sure, but IIRC Rob also mentioned another solution (i.e., just accept bytevectors in some locations, or do Latin-1).

Also, this structure makes no sense. Even if I did not provide an alternative solution of my own, that wouldn’t mean Emacs’s thing is the answer. (Negative) criticism can be valid without providing alternatives.

Best regards,
Maxime Devos.