From: Chris Vine <chris@cvine.freeserve.co.uk>
To: guile-user@gnu.org
Subject: Re: Running script from directory with UTF-8 characters
Date: Tue, 22 Dec 2015 20:12:40 +0000
Message-ID: <20151222201240.3a66fd94@bother.homenet>
In-Reply-To: <87bn9ieaup.fsf@elektro.pacujo.net>

On Tue, 22 Dec 2015 17:55:58 +0200
Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Vine <chris@cvine.freeserve.co.uk> wrote:
> > On Tue, 22 Dec 2015 03:14:18 +0200
> > Marko Rauhamaa <marko@pacujo.net> wrote:  
> >> For example,
> >> 
> >>     scheme@(guile-user)> (opendir ".")
> >>     $1 = #<directory stream f7ffa0>
> >>     [...]
> >>     scheme@(guile-user)> (readdir $1)
> >>     $4 = "?9t\x1b["
> >>     scheme@(guile-user)> (open-file $4 "r")
> >>     ERROR: In procedure open-file:
> >>     ERROR: In procedure open-file: No such file or directory: "?9t\x1b["
> >
> > You can set the locale in the REPL, if that is where you are working
> > from (as in your example), and then UTF-8 pathnames will work
> > fine.  
> 
> You misunderstood me. The problem is that Guile cannot deal with
> non-UTF-8 pathnames in a UTF-8 locale. IOW, Linux pathnames are *not*
> strings. They are bytevectors. Guile 1.x (as well as Python 2.x) was
> fine with bytevector pathnames, but Guile 2.x (as well as Python 3.x) wants
> to pretend filenames are strings. That leads to trouble, potentially
> even to security vulnerabilities.
> 
> A very typical case is a tarball that contains, say, Latin-1
> filenames. If you should expand the tarball in a UTF-8 environment,
> Guile wouldn't be able to deal with the situation.

Yes, you exceeded my powers of deduction (or clairvoyance, depending on
how you look at it).

More to the point, Unix-like pathnames are, at the implementation
level, just a sequence of bytes terminated by a null byte, with '/' as
the directory separator. Having said that, the POSIX Portable Filename
Character Set (§3.278 of the SUS) doesn't even cover all of ASCII, let
alone Unicode.

It can be useful to handle filenames as strings in the program.  My
main objection is not that filenames are not treated as collections of
bytes, but that Guile assumes the filename character set is the same as
the locale character set, which on distributed file systems may be
completely false.  I may be wrong, but I do not think you can set the
filename codeset programmatically in Guile, something most other
libraries permit.
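
To make the mismatch concrete, here is a minimal sketch of what happens
when the bytes of a filename written under one codeset are decoded
under another.  It assumes a Guile recent enough to ship the
(ice-9 iconv) module; the byte values and the names latin1-name,
as-latin1 and as-utf8 are made up for illustration, and nothing here
touches the file system.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; The bytes of "caf<e-acute>.txt" as a Latin-1 system would store them:
;; #xE9 is e-acute in ISO-8859-1.  (Illustrative data, not a real file.)
(define latin1-name
  (u8-list->bytevector '(#x63 #x61 #x66 #xE9 #x2E #x74 #x78 #x74)))

;; Decoding with the codeset the name was written in succeeds.
(define as-latin1
  (bytevector->string latin1-name "ISO-8859-1"))

;; Decoding the same bytes as UTF-8 fails: #xE9 followed by #x2E is not
;; a valid UTF-8 sequence, and the default conversion strategy is to
;; raise an error.  We catch it and record #f instead.
(define as-utf8
  (catch #t
    (lambda () (bytevector->string latin1-name "UTF-8"))
    (lambda (key . args) #f)))
```

The same eight bytes are a perfectly good name under one codeset and
undecodable under the other, which is the situation a program is in
when the file system's codeset and the locale's codeset differ.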

So I guess the best rule is that, even if you don't stick to the
Portable Filename Character Set, stick to ASCII for filenames/paths.
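
For what it's worth, both rules are easy to check in a program.  The
following is only a sketch: portable-filename? and ascii-filename? are
names I have made up, not anything Guile provides, and each takes a
single filename component rather than a whole path.  The hyphen check
reflects the POSIX advice that a portable filename should not begin
with '-'.

```scheme
;; The POSIX Portable Filename Character Set: A-Z a-z 0-9 . _ -
(define portable-filename-chars
  (string->char-set
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-"))

;; Hypothetical helper: true if NAME is non-empty, uses only portable
;; characters, and does not start with a hyphen.
(define (portable-filename? name)
  (and (not (string-null? name))
       (not (char=? (string-ref name 0) #\-))
       (string-every portable-filename-chars name)))

;; Hypothetical helper for the looser rule: any ASCII character is
;; accepted except '/' and NUL, which cannot appear in a filename.
(define (ascii-filename? name)
  (and (not (string-null? name))
       (string-every (lambda (c)
                       (and (char-set-contains? char-set:ascii c)
                            (not (char=? c #\/))
                            (not (char=? c #\nul))))
                     name)))
```

With these definitions "report_2015.txt" passes both checks, "my file"
passes only the ASCII one, and a name containing an accented character
passes neither.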

Chris


