From: Eli Zaretskii
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 07 Jul 2024 18:58:05 +0300
Message-ID: <86v81h19f6.fsf@gnu.org>
In-Reply-To: <6e263534728f4a7cd3d2e2869781d4eafed54b5d.camel@abou-samra.fr> (message from Jean Abou Samra on Sun, 07 Jul 2024 17:16:31 +0200)
To: Jean Abou Samra
Cc: maximedevos@telenet.be, rlb@defaultvalue.org, guile-devel@gnu.org

> From: Jean Abou Samra
> Cc: rlb@defaultvalue.org, guile-devel@gnu.org
> Date: Sun, 07 Jul 2024 17:16:31 +0200
>
> There are non-trivial backwards compatibility implications. To give
> a concrete case: LilyPond definitely has code that would break if
> passed a string whose "conversion to UTF-8" gave something not valid
> UTF-8. (An example off the top: passing strings to the Pango API and to
> GLib's PCRE-based regex API. By the way, running "emacs $'\xb5'"
> gives a Pango warning on the terminal, I assume because of trying
> to display the file name as the window title.)

Probably. Do you consider it a problem in Emacs or in Pango?

> From the implementation point of view: conversion from an encoding to
> another could no longer use libiconv, because it stops on invalid
> multibyte sequences. Likewise, Guile could probably not use libiconv
> anymore.
> This means a large implementation cost to reimplement all
> of this in Guile.

Or relatively small additions to libiconv, should their developers
agree with such an extension.

> I don't think it's worth it. If anybody's going to work on this problem,
> I'd recommend simply adding APIs like program-arguments-bytevector,
> getenv-bytevector and the like, returning raw bytevectors instead of strings,
> and letting programs which need to be reliable against invalid UTF-8
> in the environment use these.
>
> That is also the approach taken in, e.g., Rust (except that due to the
> static typing, you are forced to handle the "invalid UTF-8" error case
> when you use, e.g., std::env::args as opposed to std::env::args_os).

The Emacs experience shows that (rare) raw bytes as part of otherwise
completely valid text are a fact of life. They happen all the time,
for whatever reasons. Granted, those reasons are most probably
something misconfigured somewhere, but as long as that happens in a
program other than the one you are developing, or even on another
computer, the ability of the user, let alone the programmer, to fix
the whole world is, how shall I put it, somewhat limited.

The question is what you do when this stuff happens, and how you
prepare your package for dealing with it as well as reasonably
possible.

Here's an example just from today: I've received an email from RMS, no
less, with an obviously garbled address:

  To: BjÃ¶rn Bidar

Now, this is a typical case of misinterpreting UTF-8 as Latin-1 (on
RMS's machine, not on mine); the correct name is Björn Bidar. But
when you get such mojibake from your MTA, what do you do? Signal an
error and refuse to show the message? Good luck explaining to your
users that you are right to behave like that! We in Emacs decided
differently, but that's us.

Once again, I described what we do in Emacs in the hope that it will
help you find your own solution.
If it doesn't help, that's fine by me; there's no need to argue as long as what we do is understood.
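
To make the two cases above concrete, here is a minimal sketch in
Python (used only for illustration; nothing here is Guile or Emacs
code, and all helper names are hypothetical). It shows how UTF-8 bytes
read as Latin-1 yield the garbled name, how a strict decoder rejects a
stray byte such as the 0xb5 from the quoted file-name example, and how
a decoder that keeps undecodable bytes can round-trip them. The
"surrogateescape" error handler is Python's own mechanism, chosen here
as an analogue; it is an assumption of this sketch and not a
description of how Emacs represents raw bytes internally.

```python
# Sketch only: helper names are hypothetical, chosen for this example.

def garble(text: str) -> str:
    """Misread the UTF-8 encoding of text as Latin-1 (mojibake)."""
    return text.encode("utf-8").decode("latin-1")


def ungarble(text: str) -> str:
    """Undo garble(); works only if the damage is exactly UTF-8 read as Latin-1."""
    return text.encode("latin-1").decode("utf-8")


def decode_keeping_raw_bytes(data: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte to a lone surrogate."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_restoring_raw_bytes(text: str) -> bytes:
    """Inverse of decode_keeping_raw_bytes(): surrogates become raw bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # ascii() keeps the output printable whatever the terminal encoding is.
    print(ascii(garble("Bj\u00f6rn Bidar")))

    raw = b"file-\xb5.txt"  # a lone 0xb5 is not valid UTF-8
    try:
        raw.decode("utf-8")  # strict decoding: stops with an error
    except UnicodeDecodeError as err:
        print("strict decoding failed:", err.reason)

    # Tolerant decoding: the text is usable and the original bytes survive.
    text = decode_keeping_raw_bytes(raw)
    print(encode_restoring_raw_bytes(text) == raw)
```

The design point is the one made above: the strict path forces an
error on the user for something they usually cannot fix, whereas the
tolerant path lets the program display and pass on the data, at the
cost of every consumer of such strings having to cope with the
preserved raw bytes.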