From: Zefram
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#22913: filenames mangled by locale
Date: Sat, 5 Mar 2016 00:42:51 +0000
Message-ID: <20160305004251.GF7946@fysh.org>
To: 22913@debbugs.gnu.org

It seems that guile-2.0 applies locale encoding and decoding to pathnames
being used in system calls. This radically breaks file access anywhere
that the locale's character encoding is anything other than a simple
8-bit encoding such as ISO-8859-1. For example, in the default C locale
with its nominal ASCII encoding,

$ guile-2.0 -c '(open-file (list->string (map integer->char '\''(76 195 169 111 110))) "w")'
$ echo L*n | od -tc
0000000   L   ?   ?   o   n  \n
0000006

Those are literal question marks in the name of the file actually created,
apparently arising as substitutions for the high-half octets in the
requested filename. Existing files with names containing high-half
octets can't be found (resulting in an ENOENT error message that shows
the actually-existing filename), and new ones can't be created (actually
being created under the mangled name instead). There's no warning or
exception advising that the requested name can't be used, just this
misbehaviour.

The equivalent problem arises with decoding when filenames are received:

$ echo foo > $'L\303\251on.txt'
$ guile-2.0 -c '(define d (opendir ".")) (let r () (let ((n (readdir d))) (if (eof-object? n) #t (begin (if (eq? (car (reverse (string->list n))) #\t) (begin (write (map char->integer (string->list n))) (newline))) (r)))))'
(76 63 63 111 110 46 116 120 116)

Again no warning or exception, just incorrect data returned.

To work around this would require the program to select a locale with a
more accommodating nominal character encoding. As I've previously noted,
there's no guarantee of such a locale existing. Thus the above behaviour
is fatal to any attempt to write in Guile Scheme a program to operate on
arbitrarily-named files.

Guile even applies this mangling to the pathname of a script that it is
to load:

$ echo '(write "hi")(newline)' > $'L\303\251on.scm'
$ guile-2.0 -s L*n.scm
[big error message saying it couldn't find the file that exists]

Obviously, even if a program could turn off the locale mangling in
general, this instance of it occurs too early for the program to avoid.
The guile framework itself has acquired the kind of 8-bit-cleanliness
bug that it is imposing on the programs that it interprets.

-zefram
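
The substitution described in this report can be modelled outside Guile,
which may help when checking whether a given filename would be affected.
The sketch below is only an illustration of the observed effect, not
Guile's actual code path: the helper name mangle_high_octets is
hypothetical, and it assumes that every octet in the range 0x80-0xFF is
replaced by a literal '?', as the od output in the report suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guile): replace each octet in the
# range 0x80-0xFF with a literal '?', mimicking the substitution that
# the report observes under the C locale.
mangle_high_octets() {
    LC_ALL=C tr '\200-\377' '?'
}

# The five octets requested in the report: 76 195 169 111 110.
printf 'L\303\251on' | od -An -tu1

# The same name after the modelled substitution, dumped as characters.
printf 'L\303\251on' | mangle_high_octets | od -An -tc
```

Under this model the two high-half octets 195 and 169 both become 63
('?'), which agrees with the mangled name and with the readdir result
(76 63 63 111 110 ...) shown in the report.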