unofficial mirror of guile-devel@gnu.org 
From: Mike Gran <spk121@yahoo.com>
To: Jean Abou Samra <jean@abou-samra.fr>, Eli Zaretskii <eliz@gnu.org>
Cc: "maximedevos@telenet.be" <maximedevos@telenet.be>,
	 "rlb@defaultvalue.org" <rlb@defaultvalue.org>,
	 "guile-devel@gnu.org" <guile-devel@gnu.org>
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 7 Jul 2024 16:56:24 +0000 (UTC)
Message-ID: <1288285059.1039882.1720371384822@mail.yahoo.com>
In-Reply-To: <86v81h19f6.fsf@gnu.org>

On Sunday, July 7, 2024 at 08:58:34 AM PDT, Eli Zaretskii <eliz@gnu.org> wrote: 
>> I don't think it's worth it. If anybody's going to work on this problem,
>> I'd recommend simply adding APIs like program-arguments-bytevector,
>> getenv-bytevector and the like, returning raw bytevectors instead of strings,
>> and letting programs which need to be reliable against invalid UTF-8
>> in the environment use these.
> 
>> That is also the approach taken in, e.g., Rust (except that due to the
>> static typing, you are forced to handle the "invalid UTF-8" error case
>> when you use, e.g., std::env::args as opposed to std::env::args_os).

> The Emacs experience shows that (rare) raw bytes as part of otherwise
> completely valid text are a fact of life.  They happen all the time,
> for whatever reasons.  Granted, those reasons are most probably
> something misconfigured somewhere, but as long as that happens in a
> program other than the one you are developing, or even on another
> computer, the ability of the user, let alone the programmer, to fix
> the whole world is, how shall I put it, somewhat limited.  The
> question is what do you do when this stuff happens, and how do you prepare
> your package for dealing with it as well as reasonably possible?

To halfway follow Emacs's lead, Guile could use some of Unicode's
Private Use Area characters to represent raw bytes.

Raw bytes 0x00 to 0xFF could map to U+100000 to U+1000FF (the start of
Supplementary Private Use Area-B, plane 16), for example.

We could add an encoding option so that, when converting from the locale
encoding to Guile's internal representation fails, the undecodable raw
bytes are stored as these PUA characters.  When that string is later
output as a locale string, those characters would be written back out as
the original raw bytes.  When it is output as UTF-8, they could either
remain PUA characters or be converted to U+FFFD REPLACEMENT CHARACTER.

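This would create corner cases: what if you use more than one non-UTF-8
locale?  What if you actually wanted to use those plane-16 PUA characters
for their own sake?  It would also not be memory efficient, since any
string containing one of these characters would need Guile's wide
(four bytes per character) string storage.  However, it would be simple
enough to implement given Guile's internals.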

Regards,
Mike



Thread overview: 18+ messages
2024-07-06 20:32 Improving the handling of system data (env, users, paths, ...) Rob Browning
2024-07-07  4:59 ` tomas
2024-07-07  5:33 ` Eli Zaretskii
2024-07-07 10:03   ` Jean Abou Samra
2024-07-07 11:04     ` Eli Zaretskii
2024-07-07 11:35       ` Maxime Devos
2024-07-07 14:25         ` Eli Zaretskii
2024-07-07 14:59           ` Maxime Devos
2024-07-07 15:43             ` Eli Zaretskii
2024-07-07 15:16           ` Jean Abou Samra
2024-07-07 15:18             ` Jean Abou Samra
2024-07-07 15:58             ` Eli Zaretskii
2024-07-07 16:09               ` Jean Abou Samra
2024-07-07 16:56               ` Mike Gran [this message]
2024-07-07  9:45 ` Jean Abou Samra
2024-07-07 19:25   ` Rob Browning
2024-07-07 10:24 ` Maxime Devos
2024-07-07 19:40   ` Rob Browning
