unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
From: Zefram <zefram@fysh.org>
To: 22913@debbugs.gnu.org
Subject: bug#22913: filenames mangled by locale
Date: Sat, 5 Mar 2016 00:42:51 +0000	[thread overview]
Message-ID: <20160305004251.GF7946@fysh.org> (raw)

It seems that guile-2.0 applies locale encoding and decoding to pathnames
being used in system calls.  This radically breaks file access anywhere
that the locale's character encoding is anything other than a simple
8-bit encoding such as ISO-8859-1.  For example, in the default C locale
with its nominal ASCII encoding,

$ guile-2.0 -c '(open-file (list->string (map integer->char '\''(76 195 169 111 110))) "w")'
$ echo L*n | od -tc
0000000   L   ?   ?   o   n  \n
0000006

Those are literal question marks in the name of the file actually
created, apparently arising as substitutions for the high-half octets in
the requested filename.  Existing files with names containing high-half
octets can't be found (resulting in an ENOENT error message that shows the
actually-existing filename), and new ones can't be created (actually being
created under the mangled name instead).  There's no warning or exception
advising that the requested name can't be used, just this misbehaviour.

The equivalent problem arises with decoding when filenames are received:

$ echo foo > $'L\303\251on.txt'
$ guile-2.0 -c '(define d (opendir ".")) (let r () (let ((n (readdir d))) (if (eof-object? n) #t (begin (if (eq? (car (reverse (string->list n))) #\t) (begin (write (map char->integer (string->list n))) (newline))) (r)))))'
(76 63 63 111 110 46 116 120 116)

Again no warning or exception, just incorrect data returned.

To work around this would require the program to select a locale with
a more accommodating nominal character encoding.  As I've previously
noted, there's no guarantee of such a locale existing.  Thus the above
behaviour is fatal to any attempt to write in Guile Scheme a program to
operate on arbitrarily-named files.

Guile even applies this mangling to the pathname of a script that it is
to load:

$ echo '(write "hi")(newline)' > $'L\303\251on.scm'     
$ guile-2.0 -s L*n.scm
[big error message saying it couldn't find the file that exists]

Obviously, even if a program could turn off the locale mangling in
general, this instance of it occurs too early for the program to avoid.
The guile framework itself has acquired the kind of 8-bit-cleanliness
bug that it is imposing on the programs that it interprets.

-zefram





                 reply	other threads:[~2016-03-05  0:42 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160305004251.GF7946@fysh.org \
    --to=zefram@fysh.org \
    --cc=22913@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).