unofficial mirror of bug-guile@gnu.org 
* bug#22913: filenames mangled by locale
@ 2016-03-05  0:42 Zefram
From: Zefram @ 2016-03-05  0:42 UTC (permalink / raw)
  To: 22913

It seems that guile-2.0 applies locale encoding and decoding to pathnames
being used in system calls.  This radically breaks file access anywhere
that the locale's character encoding is anything other than a simple
8-bit encoding such as ISO-8859-1.  For example, in the default C locale
with its nominal ASCII encoding,

$ guile-2.0 -c '(open-file (list->string (map integer->char '\''(76 195 169 111 110))) "w")'
$ echo L*n | od -tc
0000000   L   ?   ?   o   n  \n
0000006

Those are literal question marks in the name of the file actually
created, apparently arising as substitutions for the high-half octets in
the requested filename.  Existing files with names containing high-half
octets can't be found (resulting in an ENOENT error message that shows the
actually-existing filename), and new ones can't be created (actually being
created under the mangled name instead).  There's no warning or exception
advising that the requested name can't be used, just this misbehaviour.
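The effect on the bytes can be imitated outside Guile.  What follows is
only a shell sketch of the observed substitution, not Guile's own code
path; the helper name mangle_high_octets is made up for illustration.
It replaces every octet in the range 0200-0377 with `?` and shows that
two distinct requested names end up as the same mangled name.

```shell
# Hypothetical helper: imitate the observed substitution by replacing
# every high-half octet (0200-0377) read from stdin with a literal '?'.
mangle_high_octets() {
    LC_ALL=C tr '\200-\377' '[?*]'
}

# Two names that differ only in their third octet (0251 versus 0250).
a=$(printf 'L\303\251on' | mangle_high_octets)
b=$(printf 'L\303\250on' | mangle_high_octets)

# After substitution the two names can no longer be told apart.
printf '%s\n%s\n' "$a" "$b"
```

Both lines print L??on, so under this kind of substitution two files
that the caller meant to keep separate would silently become one.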

The equivalent problem arises with decoding when filenames are received:

$ echo foo > $'L\303\251on.txt'
$ guile-2.0 -c '(define d (opendir ".")) (let r () (let ((n (readdir d))) (if (eof-object? n) #t (begin (if (eq? (car (reverse (string->list n))) #\t) (begin (write (map char->integer (string->list n))) (newline))) (r)))))'
(76 63 63 111 110 46 116 120 116)

Again no warning or exception, just incorrect data returned.
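The practical consequence of the decoding case is that a name obtained
from a directory listing cannot be used to reach the file it came from.
A rough shell sketch of that consequence, again only imitating the
substitution with tr rather than running Guile, and using a scratch
directory from mktemp as an assumption of the example:

```shell
# Create one file with high-half octets in its name in a scratch directory.
dir=$(mktemp -d)
: > "$dir/$(printf 'L\303\251on.txt')"

# Imitate the lossy decoding: high-half octets in the listed name become '?'.
mangled=$(ls "$dir" | LC_ALL=C tr '\200-\377' '[?*]')

# The mangled name does not refer to any existing file.
if [ -e "$dir/$mangled" ]; then
    result="found"
else
    result="not found"
fi
echo "$mangled: $result"

rm -rf "$dir"
```

The original octets are gone after the substitution, so nothing the
program does with the returned string can recover the real filename.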

Working around this would require the program to select a locale with
a more accommodating nominal character encoding.  As I've previously
noted, there's no guarantee of such a locale existing.  Thus the above
behaviour is fatal to any attempt to write in Guile Scheme a program to
operate on arbitrarily-named files.
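To make the fragility of that workaround concrete, here is a sketch of
the probing a wrapper script would have to do before starting Guile.
The helper name find_latin1_locale and the name pattern are assumptions
of this example; locale names vary between systems, and the result may
well be empty.

```shell
# Hypothetical helper: print the first installed locale whose name looks
# like an ISO-8859-1 locale, or print nothing if none is installed.
# The pattern is a guess at common spellings (iso88591, ISO8859-1, ...).
find_latin1_locale() {
    { locale -a 2>/dev/null || true; } |
        { grep -i -E 'iso-?8859-?1$' || true; } |
        head -n 1
}

loc=$(find_latin1_locale)
if [ -n "$loc" ]; then
    echo "candidate locale: $loc"
    # A wrapper could then run, for example:
    #   LC_ALL="$loc" guile-2.0 -s script.scm
else
    echo "no ISO-8859-1 locale installed; the workaround is unavailable"
fi
```

On a system where the second branch is taken, there is nothing the
script can select, which is exactly the situation described above.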

Guile even applies this mangling to the pathname of a script that it is
to load:

$ echo '(write "hi")(newline)' > $'L\303\251on.scm'
$ guile-2.0 -s L*n.scm
[big error message saying it couldn't find the file that exists]

Obviously, even if a program could turn off the locale mangling in
general, this instance of it occurs too early for the program to avoid.
Guile itself has acquired the same kind of 8-bit-cleanliness bug that
it imposes on the programs that it interprets.

-zefram




