From: Zefram
To: 20823@debbugs.gnu.org
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20823: argv mangled by locale
Date: Tue, 16 Jun 2015 05:33:00 +0100
Message-ID: <20150616043300.GB2718@fysh.org>

When guile-2.0 stores argv for later access via program-arguments,
it sometimes decodes the underlying octet string according to the
nominal character encoding of the locale suggested by the environment.
This is a problem, because the arguments are not necessarily encoded
that way, and may not even be encodings of character strings at all.
The decoding is lossy, where the octet string isn't consistent with the
character encoding, so the original octet string cannot be recovered
from the mangled form.  I don't see any Scheme interface that reliably
retrieves the command line arguments without locale decoding.

The decoding doesn't follow the usual rules for locale control.  It is
not at all sensitive to setlocale, which is understandable due to the
arguments being acquired before any of the actual program's code runs.
Empirically, if the environment nominates no locale, "POSIX", or a
non-existent locale, then argv is decoded according to ISO-8859-1, thus
preserving the octets.  If the environment nominates an extant locale
other than "POSIX", then argv is decoded according to that locale's
nominal character encoding.
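
For comparison with the demos that follow, here is a minimal sketch of
what the undecoded octets of the test argument look like when a child
process receives them through argv.  The helper name argv_octets is
hypothetical, and the sketch assumes a POSIX sh with printf and od and
the default IFS; it does not involve guile at all.

```shell
# Hypothetical helper: print the octets of one argument as decimal numbers.
# The string is handed to a child sh through argv, the same channel that
# guile reads in the demos, and od dumps it without any locale decoding.
argv_octets() {
    # Unquoted expansion on purpose: word splitting collapses od's padding.
    set -- $(sh -c 'printf "%s" "$1" | od -An -tu1' sh "$1")
    echo "$*"
}

# Same argument as the demos: "L", the two-octet UTF-8 form of e-acute, "on".
arg=$(printf 'L\303\251on')
argv_octets "$arg"
```

A decoding that preserves the octets should yield the same five numbers
from program-arguments; any other list means the argument was altered
on the way in.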

Demos:

$ env - guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 195 169 111 110)
$ env - LANG=C guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 63 63 111 110)
$ env - LANG=de_DE.utf8 guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 233 111 110)
$ env - LANG=de_DE.iso88591 guile-2.0 -c '(write (map char->integer (string->list (cadr (program-arguments))))) (newline)' $'L\xc3\xa9on'
(76 195 169 111 110)

The actual data passed between processes is an octet string, and there
really needs to be some reliable way to access that octet string.
My comments about resolution in bug#20822 "environment mangled by locale"
mostly apply here too, with a slight change: it seems necessary to store
the original octet strings and decode at the time program-arguments
is called.  With that change, the decoding can be responsive to setlocale
(and in particular can reliably use ISO-8859-1 in the absence of
setlocale).

-zefram