unofficial mirror of guile-devel@gnu.org 
From: Jean Abou Samra <jean@abou-samra.fr>
To: Eli Zaretskii <eliz@gnu.org>, Rob Browning <rlb@defaultvalue.org>
Cc: guile-devel@gnu.org
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 07 Jul 2024 12:03:06 +0200
Message-ID: <9985c529ffbbabaa259ee62226ced1feec8c7810.camel@abou-samra.fr>
In-Reply-To: <86jzhx3gxe.fsf@gnu.org>


On Sunday, 7 July 2024 at 08:33 +0300, Eli Zaretskii wrote:
> 
>     - The internal representation is a superset of UTF-8, in that it
>       is capable of representing characters for which there are no
>       Unicode codepoints (such as GB 18030, some of whose characters
>       don't have Unicode counterparts; and raw bytes, used to
>       represent byte sequences that cannot be decoded).  It uses
>       5-byte UTF-8-like sequences for these extensions.


Guile is a Scheme implementation, bound by Scheme standards and compatibility
with other Scheme implementations (and backwards compatibility too).

I just tried (aref (cadr command-line-args) 0) in a lisp-interaction-mode
Emacs buffer after launching "emacs $'\xb5'". It gave 4194229 = 0x3fffb5,
which quite logically is outside the Unicode code point range 0 - 0x10FFFF.

This doesn't work for Guile, since a character is a Unicode code point
in the Scheme semantics.
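
To make the constraint concrete, here is a minimal sketch of what Guile
does with that number. The helper name valid-char-code? is hypothetical,
and the 0x3fff00 offset reflects my understanding of how Emacs maps raw
bytes 0x80-0xff into its extended character space, so treat it as an
assumption rather than a reference.

```scheme
;; Sketch only; valid-char-code? is a hypothetical helper.
(define unicode-max #x10ffff)          ; last Unicode code point
(define emacs-value 4194229)           ; what Emacs reported for byte 0xb5

;; #t if integer->char accepts N, #f if it signals an error.
(define (valid-char-code? n)
  (catch #t
    (lambda () (integer->char n) #t)
    (lambda args #f)))

(display (= emacs-value #x3fffb5)) (newline)        ; #t
(display (> emacs-value unicode-max)) (newline)     ; #t
(display (valid-char-code? emacs-value)) (newline)  ; #f
(display (valid-char-code? #xb5)) (newline)         ; #t

;; Assumed Emacs convention: raw byte = value - #x3fff00.
(display (number->string (- emacs-value #x3fff00) 16)) (newline)  ; b5
```

So a Guile string has no slot in which to keep such a value, unless the
notion of character itself were extended.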


>     - Emacs has its own code for code-conversion, for moving by
>       characters through multibyte sequences, for producing a Unicode
>       codepoint from a byte sequence in the super-UTF-8 representation
>       and back, etc., so it doesn't use libc routines for that, and
>       thus doesn't depend on the current locale for these operations.


Guile's encoding conversions don't rely on the libc locale. They use
GNU libiconv. The issue at hand is that for argv specifically, the
conversion happens at startup with the locale encoding as a default
(AFAICT Guile uses environ_locale_charset from gnulib to convert the
C locale to an encoding name usable by libiconv) and Guile doesn't store
the original argv bytes.


>     - APIs are provided for "manual" encoding and decoding.  A Lisp
>       program can read a byte stream, then decode it "manually" using
>       a particular codeset, as deemed appropriate.  This allows to
>       handle complex situations where a program receives stuff whose
>       encoding can only be determined by examining the raw byte stream
>       (a typical example is a multipart email message with MIME
>       charset header for each part).


These exist, see (ice-9 iconv).
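
As a rough illustration, assuming only bytevector->string and
string->bytevector from (ice-9 iconv): the sketch below decodes two byte
sequences, each with its own declared charset, the way a program handling
a multipart message might. The part list and the decode-part helper are
made up for the example.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Hypothetical input: (charset . raw bytes) pairs, both spelling "café".
(define parts
  (list (cons "ISO-8859-1" #vu8(#x63 #x61 #x66 #xe9))
        (cons "UTF-8"      #vu8(#x63 #x61 #x66 #xc3 #xa9))))

;; Decode one part using the charset it declares.
(define (decode-part part)
  (bytevector->string (cdr part) (car part)))

(define decoded (map decode-part parts))

;; Different bytes, same text once each is decoded with the right charset.
(display (string=? (car decoded) (cadr decoded))) (newline)   ; #t

;; Encoding goes the other way; here the Latin-1 part is re-encoded as UTF-8.
(display (bytevector->u8-list (string->bytevector (car decoded) "UTF-8")))
(newline)                                                     ; (99 97 102 195 169)
```

The encoding name is passed explicitly on every call, so nothing here
depends on the current locale.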


>     - Emacs also has tables of Unicode attributes of characters
>       (produced by parsing the relevant Unicode data files at build
>       time), so it can up/down-case characters, determine their
>       category (letters, digits, punctuation, etc.) and script to
>       which they belong, etc. -- all with its own code, independent of
>       the underlying libc.


Also exists, and AFAICT uses GNU libunistring. See string-upcase,
char-general-category, etc.
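
A short sketch of those procedures, assuming a Guile whose source files
are read as UTF-8 (the default). The sample strings are arbitrary.

```scheme
;; Case conversion works on non-ASCII characters too.
(display (string-upcase "élan")) (newline)
(display (char=? (char-upcase #\é) #\É)) (newline)    ; #t

;; char-general-category returns the two-letter Unicode category as a symbol.
(display (char-general-category #\A)) (newline)       ; Lu
(display (char-general-category #\a)) (newline)       ; Ll
(display (char-general-category #\5)) (newline)       ; Nd

;; Classify every character of a string.
(define (categories str)
  (map char-general-category (string->list str)))

(display (categories "a1")) (newline)                 ; (Ll Nd)
```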


