From: Jean Abou Samra
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 07 Jul 2024 17:16:31 +0200
Message-ID: <6e263534728f4a7cd3d2e2869781d4eafed54b5d.camel@abou-samra.fr>
In-Reply-To: <8634ol2sal.fsf@gnu.org>
To: Eli Zaretskii, Maxime Devos
Cc: rlb@defaultvalue.org, guile-devel@gnu.org

On Sunday, 07 July 2024 at 17:25 +0300, Eli Zaretskii wrote:
>
> If Guile restricts itself to Unicode characters and only them, it will
> lack important features.  So my suggestion is not to have this
> restriction.
>
> I think the fact that this discussion is held, and that Rob suggested
> to use Latin-1 for the purpose of supporting raw bytes is a clear
> indication that Guile, too, needs to deal with "character-like" data
> that does not fit the Unicode framework.  So I think saying that
> strings in Guile can only hold Unicode characters will not give you
> what this discussion attempts to give.  In particular, how will you
> handle the situations described by Rob where a file has a name that is
> not a valid UTF-8 sequence (thus not "characters" as long as you
> interpret text as UTF-8)?

Whatever the details of aref in Emacs are (which I have not studied),
I think we all agree that

a) Strings in Scheme have the semantics of arrays of something called
   "characters".

b) According to the Scheme standards and in current Guile, a character
   is a wrapper around a Unicode scalar value. (NB: I wasn't precise
   enough in my previous email. R6RS explicitly disallows surrogate
   code points, so characters really correspond to scalar values and
   not to code points.)

c) If we want Guile strings to losslessly represent arbitrary byte
   sequences, Guile's definition of a character needs to be expanded
   to include things other than Unicode scalar values.

So what would it entail for Guile to change its string model in this
way?

First, Guile would technically no longer be R6RS-compliant. I'm not
sure how much of a problem this would actually be.

There are non-trivial backwards compatibility implications. To give a
concrete case: LilyPond definitely has code that would break if passed
a string whose "conversion to UTF-8" gave something that is not valid
UTF-8. (Examples off the top of my head: passing strings to the Pango
API and to GLib's PCRE-based regex API. By the way, running
"emacs $'\xb5'" gives a Pango warning on the terminal, I assume
because of trying to display the file name as the window title.)
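
To make points (b) and (c) concrete, here is a small sketch in Rust,
whose char type is, like a Scheme character, a Unicode scalar value.
This is only an analogy and not Guile code; the function names are
made up for illustration. It shows that a surrogate code point is not
a character, and that the single byte 0xB5 (the file name from the
Emacs example above) is not valid UTF-8, so it can only be turned
into a string by replacing it with U+FFFD, which loses the original
byte.

```rust
/// Returns true if `cp` can be stored in a Rust `char`, i.e. if it is a
/// Unicode scalar value (a code point outside the surrogate range).
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

/// Tries to view raw bytes (for example a file name) as UTF-8 text.
/// Returns `None` when the bytes are not valid UTF-8.
fn decode_strict(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Decodes leniently: every invalid sequence becomes U+FFFD, so the
/// original bytes cannot be recovered from the result.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

With these definitions, is_scalar_value(0xD800) is false,
decode_strict(&[0xB5]) is None, and decode_lossy(&[0xB5]) yields a
string that no longer contains the byte 0xB5. A string type that
round-trips such input needs "characters" beyond scalar values, which
is exactly the change described in (c).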

From the implementation point of view: conversion from one encoding to
another could no longer use libiconv, because it stops on invalid
multibyte sequences. Likewise, Guile could probably not use
libunistring anymore. This means a large implementation cost to
reimplement all of this in Guile.

I don't think it's worth it. If anybody's going to work on this
problem, I'd recommend simply adding APIs like
program-arguments-bytevector, getenv-bytevector and the like,
returning raw bytevectors instead of strings, and letting programs
which need to be reliable against invalid UTF-8 in the environment
use these.

That is also the approach taken in, e.g., Rust, which pairs
std::env::args and std::env::var with the lossless std::env::args_os
and std::env::var_os (except that, due to the static typing,
std::env::var forces you to handle the "invalid UTF-8" error case,
while std::env::args panics on it).
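
For comparison, here is a minimal Rust sketch of the "raw data first"
approach described above. It only illustrates the shape such an API
takes in Rust and says nothing about how the proposed Guile
procedures would be implemented; partition_args and describe_var are
hypothetical helper names, while the std::env and OsString calls are
the standard library's own.

```rust
use std::env;
use std::ffi::OsString;

/// Splits OS-level strings into those that are valid Unicode and those
/// that are not, without losing any of the original data.
fn partition_args(args: impl IntoIterator<Item = OsString>) -> (Vec<String>, Vec<OsString>) {
    let mut text = Vec::new();
    let mut raw = Vec::new();
    for arg in args {
        // `OsString::into_string` hands the original value back on failure,
        // so nothing is replaced or dropped.
        match arg.into_string() {
            Ok(s) => text.push(s),
            Err(original) => raw.push(original),
        }
    }
    (text, raw)
}

/// Looks up an environment variable; the `Result` returned by `env::var`
/// makes the "not valid Unicode" case explicit.
fn describe_var(name: &str) -> String {
    match env::var(name) {
        Ok(value) => format!("{name}={value}"),
        Err(env::VarError::NotPresent) => format!("{name} is not set"),
        Err(env::VarError::NotUnicode(raw)) => {
            format!("{name} is not valid Unicode: {raw:?}")
        }
    }
}
```

A program would typically call partition_args(std::env::args_os()) at
startup and decide for itself what to do with the entries that are
not valid Unicode, which is the choice the bytevector-returning
procedures would give Guile programs.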