From: Zefram
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20822: environment mangled by locale
Date: Sun, 26 Jun 2016 11:33:49 +0100
Message-ID: <20160626103349.GK1170@fysh.org>
References: <20150616041736.GA2718@fysh.org> <87eg7njfhk.fsf@pobox.com> <87wplcpxev.fsf@netris.org>
To: Mark H Weaver
Cc: 20822@debbugs.gnu.org, ludo@gnu.org
In-Reply-To: <87wplcpxev.fsf@netris.org>

Mark H Weaver wrote:
> by convention they are
> supposed to encoded in the locale encoding.

This convention is bunk. The encoding aspect of the locale system is
fundamentally broken: the model is that every string in the universe
(every file's content, filename, command-line argument, etc.)
is encoded in the same way, and the locale environment variable tells
you which universe you're in. But in the real universe, files,
filenames, and so on turn up encoded however their authors chose to
encode them, and that's not always the same way. In the real universe
we have to cope with data that is not encoded in our preferred way.

> If that convention is
> violated, I don't see what a program could do about it.

If the convention is violated, then there is some difficulty in
presenting correctly-encoded (or even consistently-encoded) output to
the user, but it is not insuperable. Perhaps the program knows by some
non-locale means how a string is encoded, and can explicitly convert.
Perhaps it doesn't know the real encoding, but can trust that the user
will understand the octet string if it is passed through with neither
decoding of input nor encoding for output. Or perhaps the program
doesn't need to put the string into textual output at all, but only to
use it in some API or file format that expects an encodingless octet
string. So there are many things a program can reasonably do about it,
and which one to do depends on the application.

> Can someone show me a realistic example of how this would be used in
> practice?

Looking specifically at environment variables: an environment variable
could give the name of a file that is to be consulted under specified
circumstances, and the right file may happen to have a name that is
inconsistent with the encoding used by the user's terminal. (The
filename is not required for output; it only needs to be passed as an
uninterpreted octet string to the open(2) syscall.)

An environment variable could specify the name of a language module to
be loaded, where that name uses Unicode characters, while the user
doesn't otherwise use Unicode, or doesn't use an encoding that covers
enough of it. (Again, the name is not required for output; it will
either be transformed into a filename or looked up in a file format
that specifies its own encoding.)
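For comparison only, here is a minimal Python sketch (not Guile code, and POSIX-only) of two of the strategies above: handing an environment variable's value to open() as raw octets via os.environb, with no locale decoding at all, and converting explicitly when the encoding is known by some non-locale means. The variable name EXAMPLE_CONFIG_FILE and the helper names are hypothetical, chosen just for this sketch; the demo also assumes a filesystem that stores names as arbitrary octets, as is typical on Linux.

```python
import os
import tempfile

# Hypothetical variable name, used only for this sketch.
VAR = b"EXAMPLE_CONFIG_FILE"


def open_from_env(var: bytes):
    # os.environb (POSIX only) yields the variable's raw octets, so the
    # filename reaches open(2) unchanged, whatever LANG/LC_ALL say.
    raw_name = os.environb[var]
    return open(raw_name, "rb")  # open() accepts a bytes path


def decode_known(raw: bytes, encoding: str) -> str:
    # When the real encoding is known by some non-locale means,
    # convert explicitly instead of trusting the locale.
    return raw.decode(encoding)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A Latin-1 filename: the lone byte 0xE9 is not valid UTF-8.
        path = os.path.join(os.fsencode(d), b"caf\xe9.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        os.environb[VAR] = path  # store the name as raw octets
        with open_from_env(VAR) as f:
            print(f.read())
```

The point of the design is that the value is never turned into text on the way from the environment to the syscall, so no locale setting can mangle it.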
The program could be env(1), which does not interpret the environment
but needs to output its octets correctly.

The program could be saving an uninterpreted environment, so that a
cron job can later run some other program with equivalent settings.

-zefram
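The env(1) and cron cases can be sketched the same way, again in Python purely for illustration (POSIX-only; all function names are hypothetical). The environment is dumped or saved as raw NAME=VALUE octets and read back without any decoding. Saved entries are NUL-terminated because a value may contain a newline but never a NUL byte, which is similar in spirit to GNU env's -0 option.

```python
import os
import sys


def dump_environment(out) -> None:
    # Like env(1): write each NAME=VALUE pair as raw octets, one per
    # line, never decoding or re-encoding anything.
    for name, value in os.environb.items():
        out.write(name + b"=" + value + b"\n")


def save_environment(path) -> None:
    # Save the environment uninterpreted for later reuse (e.g. by a job
    # a scheduler runs). NUL terminators keep entries unambiguous even
    # when values contain newlines.
    with open(path, "wb") as f:
        for name, value in os.environb.items():
            f.write(name + b"=" + value + b"\0")


def load_environment(path) -> dict:
    # Read the saved entries back as a bytes -> bytes mapping.
    with open(path, "rb") as f:
        data = f.read()
    env = {}
    for entry in data.split(b"\0"):
        if entry:
            # Names cannot contain '=', so split at the first one.
            name, _, value = entry.partition(b"=")
            env[name] = value
    return env


if __name__ == "__main__":
    dump_environment(sys.stdout.buffer)  # binary stdout: no encoding step
```

On POSIX, the bytes mapping returned by load_environment can be passed as the env argument of os.execve, so the later program sees exactly the octets that were saved.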