From: Zefram
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20822: environment mangled by locale
Date: Tue, 16 Jun 2015 05:17:36 +0100
Message-ID: <20150616041736.GA2718@fysh.org>
To: 20822@debbugs.gnu.org

When guile-2.0 is asked to read environment variables, via getenv, it always decodes the underlying octet string according to the current locale's nominal character encoding. This is a problem, because the environment variable's value is not necessarily encoded that way, and may not even be an encoding of a character string at all. The decoding is lossy wherever the octet string isn't consistent with the character encoding, so the original octet string cannot be recovered from the mangled form. I don't see any Scheme interface that retrieves the environment without locale decoding.

The decoding is governed by the currently selected locale at the time that getenv is called, so this can be controlled to some extent by setlocale. However, this doesn't provide a way round the lossy decoding problem, because there is no guarantee of a cooperative locale being available (and especially being available under a predictable name). On my Debian system here, the "POSIX" and "C" locales' nominal character encoding is ASCII, so decoding under these locales results in all high-half octets being turned into question marks. Retrieving the environment without calling setlocale at all also yields this lossy ASCII decode.
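
To make the "cannot be recovered" point concrete outside of Guile, here is a minimal Python sketch. It only models the behaviour described above and is not taken from Guile's source: the helper names lossy_decode and survives_round_trip are hypothetical, and replacing each undecodable octet with "?" is an assumption based on the question marks reported for the ASCII locales.

```python
def lossy_decode(octets, encoding):
    """Decode octets, turning each undecodable octet into '?'.

    Assumption: this imitates the substitution described in the report;
    it is not how Guile itself is implemented.
    """
    # Python marks undecodable input with U+FFFD; map that to '?'.
    return octets.decode(encoding, errors="replace").replace("\ufffd", "?")


def survives_round_trip(octets, encoding):
    """Return True if re-encoding the decoded text yields the original octets."""
    text = lossy_decode(octets, encoding)
    return text.encode(encoding, errors="replace") == octets


# "L", 0xC3, 0xA9, "on": valid UTF-8, but not valid ASCII.
raw = b"L\xc3\xa9on"
# Two octets that are not an encoding of text in ASCII or UTF-8.
opaque = b"\xff\xfe"

for value in (raw, opaque):
    for encoding in ("ascii", "utf-8", "iso-8859-1"):
        print(value, encoding, survives_round_trip(value, encoding))
```

In this model only an encoding that assigns a character to every octet, such as ISO-8859-1, gets both values back intact. ASCII loses the high-half octets, and UTF-8 preserves a value only when it happens to be valid UTF-8.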

Demos:

$ env - FOO=$'L\xc3\xa9on' guile-2.0 -c '(write (map char->integer (string->list (getenv "FOO")))) (newline)'
(76 63 63 111 110)
$ env - FOO=$'L\xc3\xa9on' guile-2.0 -c '(setlocale LC_ALL "POSIX") (write (map char->integer (string->list (getenv "FOO")))) (newline)'
(76 63 63 111 110)
$ env - FOO=$'L\xc3\xa9on' guile-2.0 -c '(setlocale LC_ALL "de_DE.utf8") (write (map char->integer (string->list (getenv "FOO")))) (newline)'
(76 233 111 110)
$ env - FOO=$'L\xc3\xa9on' guile-2.0 -c '(setlocale LC_ALL "de_DE.iso88591") (write (map char->integer (string->list (getenv "FOO")))) (newline)'
(76 195 169 111 110)

The actual data passed between processes is an octet string, and there really needs to be some reliable way to access that octet string.

There's an obvious parallel with reading data from an input port. If setlocale is called, then input is by default decoded according to locale, including the very lossy ASCII decode for C/POSIX. But if setlocale has not been called, then input is by default decoded according to ISO-8859-1, preserving the actual octets. It would probably be most sensible that, if setlocale hasn't been called, getenv should likewise decode according to ISO-8859-1. It might also be sensible to offer some explicit control over the encoding to be used with the environment, just as I/O ports have a concept of per-port selected encoding.

The same issue applies to other environment access functions too. For setenv the corresponding problem is the inability to *write* an arbitrary octet string to an environment variable. Obviously all the functions should have mutually consistent behaviour.

-zefram