From: Jean Abou Samra
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 07 Jul 2024 12:03:06 +0200
Message-ID: <9985c529ffbbabaa259ee62226ced1feec8c7810.camel@abou-samra.fr>
References: <878qyeqn1q.fsf@trouble.defaultvalue.org> <86jzhx3gxe.fsf@gnu.org>
To: Eli Zaretskii, Rob Browning
Cc: guile-devel@gnu.org
In-Reply-To: <86jzhx3gxe.fsf@gnu.org>

On Sunday, 07 July 2024 at 08:33 +0300, Eli Zaretskii wrote:
>     - The internal representation is a superset of UTF-8, in that it
>       is capable of representing characters for which there are no
>       Unicode codepoints (such as GB 18030, some of whose characters
>       don't have Unicode counterparts; and raw bytes, used to
>       represent byte sequences that cannot be decoded).  It uses
>       5-byte UTF-8-like sequences for these extensions.

Guile is a Scheme implementation, bound by the Scheme standards, by
compatibility with other Scheme implementations, and by backwards
compatibility with earlier Guile versions.

I just tried (aref (cadr command-line-args) 0) in a lisp-interaction-mode
Emacs buffer after launching "emacs $'\xb5'".
It gave 4194229 = 0x3fffb5, which, quite logically, lies outside the
Unicode code point range (0 to 0x10FFFF). That cannot work in Guile,
because in Scheme semantics a character is a Unicode code point.

>     - Emacs has its own code for code-conversion, for moving by
>       characters through multibyte sequences, for producing a Unicode
>       codepoint from a byte sequence in the super-UTF-8 representation
>       and back, etc., so it doesn't use libc routines for that, and
>       thus doesn't depend on the current locale for these operations.

Guile's encoding conversions don't depend on the libc locale either;
they use GNU libiconv. The problem at hand is specific to argv: it is
converted at startup, with the locale encoding as the default (AFAICT,
Guile uses environ_locale_charset from gnulib to derive, from the C
library's locale settings, an encoding name that libiconv accepts), and
Guile doesn't keep the original argv bytes.

>     - APIs are provided for "manual" encoding and decoding.  A Lisp
>       program can read a byte stream, then decode it "manually" using
>       a particular codeset, as deemed appropriate.  This allows to
>       handle complex situations where a program receives stuff whose
>       encoding can only be determined by examining the raw byte stream
>       (a typical example is a multipart email message with MIME
>       charset header for each part).

Guile has these too: see the (ice-9 iconv) module, which provides
string->bytevector and bytevector->string with an explicit encoding.
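For illustration only, here is a small Python sketch of the two problems above (Python simply because it runs standalone; none of this is Guile or Emacs code). It assumes that Emacs stores an undecodable byte b in the range 0x80..0xFF as the character code 0x3FFF00 + b, which agrees with the 4194229 = 0x3fffb5 result for 0xb5; the helper names emacs_raw_byte_char, is_unicode_code_point and decode_strict are hypothetical. It shows that such a code falls outside the Unicode range, and that a lone 0xB5 byte fails strict UTF-8 decoding but decodes under another codeset, which is only possible if the original bytes are still around. In Guile, that second, explicit decode would go through bytevector->string from (ice-9 iconv).

```python
import unicodedata
from typing import Optional

UNICODE_MAX = 0x10FFFF  # largest Unicode code point

# Assumption: Emacs represents an undecodable byte b (0x80..0xFF) as the
# character code 0x3FFF00 + b, which matches 0xb5 -> 0x3fffb5 above.
EMACS_RAW_BYTE_BASE = 0x3FFF00


def emacs_raw_byte_char(b: int) -> int:
    """Return the (assumed) Emacs character code for the raw byte b."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("raw-byte characters cover bytes 0x80..0xFF only")
    return EMACS_RAW_BYTE_BASE + b


def is_unicode_code_point(c: int) -> bool:
    """True if c lies in the Unicode code point range 0..0x10FFFF."""
    return 0 <= c <= UNICODE_MAX


def decode_strict(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw with the given encoding, or return None if it is invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


raw_arg = b"\xb5"  # the bytes of the argument in: emacs $'\xb5'
code = emacs_raw_byte_char(raw_arg[0])
print(code, hex(code), is_unicode_code_point(code))  # 4194229 0x3fffb5 False

# A lone 0xB5 byte is not valid UTF-8, so a UTF-8 conversion at startup fails...
print(decode_strict(raw_arg, "utf-8"))  # None

# ...whereas a program that kept the raw bytes can choose another codeset later.
text = decode_strict(raw_arg, "latin-1")
print(unicodedata.name(text))  # MICRO SIGN
```

The step that matters is the last one: once argv has been converted to strings at startup and the bytes discarded, there is nothing left to re-decode.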

>     - Emacs also has tables of Unicode attributes of characters
>       (produced by parsing the relevant Unicode data files at build
>       time), so it can up/down-case characters, determine their
>       category (letters, digits, punctuation, etc.) and script to
>       which they belong, etc. -- all with its own code, independent of
>       the underlying libc.

Guile has this as well, and AFAICT it uses GNU libunistring for it.
See string-upcase, char-general-category, etc.
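Again as a Python illustration rather than Guile code, the attribute queries in question look like this. Python's unicodedata.category returns the standard two-letter Unicode general categories (Lu, Ll, Nd, Po, ...), the same names Guile's char-general-category returns as symbols; the helper char_attributes is hypothetical, and the sample characters are chosen so that each one upcases and downcases to a single character.

```python
import unicodedata
from typing import Tuple


def char_attributes(ch: str) -> Tuple[str, str, str]:
    """Return (general category, upcased form, downcased form) for one character."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return unicodedata.category(ch), ch.upper(), ch.lower()


for ch in ["\u00e9", "7", "!", "\u03a9"]:  # e with acute, 7, !, capital omega
    category, upper, lower = char_attributes(ch)
    # Print character names rather than the characters to stay ASCII-safe.
    print(unicodedata.name(ch), category, unicodedata.name(upper), unicodedata.name(lower))
```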