From: Eli Zaretskii <eliz@gnu.org>
To: Maxime Devos <maximedevos@telenet.be>
Cc: jean@abou-samra.fr, rlb@defaultvalue.org, guile-devel@gnu.org
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 07 Jul 2024 18:43:31 +0300
Message-ID: <86wmlx1a3g.fsf@gnu.org>
In-Reply-To: <20240707165910.kez92C00H4hwdlW01ezAlE@andre.telenet-ops.be> (message from Maxime Devos on Sun, 7 Jul 2024 16:59:10 +0200)

> Cc: "jean@abou-samra.fr" <jean@abou-samra.fr>, 
> 	"rlb@defaultvalue.org" <rlb@defaultvalue.org>, 
> 	"guile-devel@gnu.org" <guile-devel@gnu.org>
> From: Maxime Devos <maximedevos@telenet.be>
> Date: Sun, 7 Jul 2024 16:59:10 +0200
> 
> >> >> Guile is a Scheme implementation, bound by Scheme standards and compatibility
> >> >> with other Scheme implementations (and backwards compatibility too).
> >> >
> >> >Yes, I understand that.
> >> 
> >> Going by what you are saying below, I think you don’t.
> >
> >Thank you for your vote of confidence.
> 
> That was not a vote of confidence, if anything, it’s the contrary.

You don't say!

> > I’m pretty sure that they weren’t intending to get the 0xb5 byte. Rather, they were using the equivalent of ‘string-ref’ (i.e., ‘aref’) and demonstrating that the result is bogus in Scheme.  In Scheme, ‘(string-ref ...)’ needs to return a character, and there exists no (Unicode) character with codepoint 4194229, so what Emacs returns here would be bogus for (Guile) Scheme.
> 
> >aref in Emacs and string-ref in Guile are not the same, and if Guile
> >needs to produce a raw byte in this scenario, it can be easily
> >arranged.  In Emacs we have other goals.
> 
> It is the opposite. In Guile, string-ref does not need to produce bytes, but characters – just like aref (modulo difference in how Scheme and Emacs define ‘byte’).

But a raw byte is not a character.
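
A minimal sketch of the arithmetic behind the code point 4194229 quoted
above, assuming the raw-byte range given in the Emacs Lisp manual
(0x3FFF80..0x3FFFFF stands for bytes 0x80..0xFF).  Python is used only
for illustration, and the helper names are hypothetical:

```python
# Largest code point Unicode defines.
MAX_UNICODE = 0x10FFFF

# Range the Emacs Lisp manual documents for raw 8-bit bytes
# (assumption: 0x3FFF80 stands for byte 0x80, 0x3FFFFF for byte 0xFF).
RAW_BYTE_FIRST = 0x3FFF80
RAW_BYTE_LAST = 0x3FFFFF


def is_unicode_code_point(code: int) -> bool:
    """True if `code` lies inside the Unicode code space."""
    return 0 <= code <= MAX_UNICODE


def raw_byte_of(code: int):
    """Return the byte a raw-byte code stands for, or None otherwise."""
    if RAW_BYTE_FIRST <= code <= RAW_BYTE_LAST:
        return code - 0x3FFF00
    return None


code = 4194229
print(hex(code))                    # 0x3fffb5
print(is_unicode_code_point(code))  # False
print(hex(raw_byte_of(code)))       # 0xb5
```

The value is outside the Unicode range, which is why it cannot be a
Scheme character, and it maps back to the single byte 0xb5.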

> >IOW, I think this argument is pointless, since it is easy to adapt the
> >mechanism to what Guile needs.
> 
> No – the argument is about how it is impossible to adapt the mechanism to Guile, since bytes aren’t characters in Unicode.

I'm saying that Guile needs to support raw bytes as well, because they
happen in Real Life, including as part of otherwise legible text.
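
A small illustration of such data, assuming a hypothetical file name
that is legible ASCII except for one stray byte.  It is a Python sketch
of the general problem and of the Latin-1 idea mentioned in this
thread, not of how Guile or Emacs handle it:

```python
# Hypothetical file name: readable text with one stray byte (0xB5),
# which is not a valid UTF-8 sequence on its own.
name = b"report-\xb5.txt"


def decodes_as_utf8(data: bytes) -> bool:
    """True if `data` is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(name))  # False

# Latin-1 maps every byte to the code point with the same number, so
# decoding never fails and encoding restores the original bytes.
as_latin1 = name.decode("latin-1")
print(as_latin1)                             # report-µ.txt
print(as_latin1.encode("latin-1") == name)   # True
```

The Latin-1 round trip is lossless, but the stray byte then looks
exactly like a genuine U+00B5 character, so the program can no longer
tell the two apart from the string alone.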

> > >From the Emacs manual:
> > 
> > >For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).
> > 
> > Thus, (aref the-string index) is the equivalent of (string-ref the-string index).
> 
> >No, because a raw byte is not a character.
> 
> Yes, because characters are characters. Both string-ref and aref return characters. This is documented in both the Emacs and Guile manual:
> 
> Again, from the Emacs manual:
> 
> > A string is a fixed sequence of characters. [...] Since strings are arrays, and therefore sequences as well, you can operate on them with the general array and sequence functions documented in Sequences, Arrays, and Vectors. For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).
> 
> Hence, (aref the-string index) returns (Emacs) characters.

You missed the description of raw bytes and unibyte strings, I guess.

> >If Guile restricts itself to Unicode characters and only them, it will
> >lack important features.  So my suggestion is not to have this
> >restriction.
> 
> Guile restricting strings to Unicode _is_ an important feature (simplicity, and compatibility).
> 
> Guile extending strings beyond Unicode is a _limitation_ (compatibility and other trickiness for applications).
> 
> I could imagine in the far future there might be too few codepoints left in Unicode, in which case the range of what Guile (and more generally, Scheme and Unicode) considers characters needs to be extended (even if that has some compatibility implications), but that time hasn’t arrived yet.
> 
> The important feature of this thread is supporting file names (and getenv stuff, etc.) that don’t fit properly in the ‘string’ model. As mentioned earlier (in the initial message, even), there are solutions that do not impose the ‘let characters go beyond Unicode’ limitation.
> 
> >I think the fact that this discussion is held, and that Rob suggested
> >to use Latin-1 for the purpose of supporting raw bytes is a clear
> >indication that Guile, too, needs to deal with "character-like" data
> >that does not fit the Unicode framework.
> 
> True, and I never claimed otherwise.
> 
> > So I think saying that strings in Guile can only hold Unicode characters will not give you what this discussion attempts to give.
> 
> Sure, and I wasn’t trying to. What I (and IIUC, the other person as well) was doing was pointing out that Emacs’s approach is not a solution either, whether because of backwards compatibility or because of not _wanting_ to conflate bytes with characters (and not wanting to go beyond Unicode), with all the consequences this conflation would imply for applications.
> 
> > In particular, how will you
> > handle the situations described by Rob where a file has a name that is
> > not a valid UTF-8 sequence (thus not "characters" as long as you
> > interpret text as UTF-8)?
> 
> Scheme does not interpret text as UTF-8, that’s an internal implementation detail and a matter of things like locales. Instead, to Scheme text is (Unicode) characters.
> 
> I have outlined a solution (that does not conflate characters with bytes) in another response. IIRC, it was in a response to Rob. I would propose actually, you know, reading it. I’m not sure, but IIRC Rob also mentioned another solution (i.e., just accept bytevectors in some locations, or do Latin-1).
> 
> Also, this structure makes no sense. Even if I did not provide an alternative solution of my own, that wouldn’t mean Emacs’s thing is the answer. (Negative) criticism can be valid without providing alternatives.

That's fine by me.  I described what we have done in Emacs because I
think it works and works well, and has for many years.  So I thought
describing it would be useful to Guile and would allow you to consider
whether something like that could solve your problems, which I think are
very similar if not identical.  It is up to you whether to reject that
solution without trying to adapt it to Guile, and in that case I wish
you all the luck in finding your own solutions.



