From: Chris Vine <chris@cvine.freeserve.co.uk>
To: Eli Zaretskii <eliz@gnu.org>
Cc: guile-user@gnu.org
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 21:42:57 +0000
Message-ID: <20140115214257.45af1076@bother.homenet>
In-Reply-To: <83sisp2c65.fsf@gnu.org>

On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <chris@cvine.freeserve.co.uk>
> > Cc: guile-user@gnu.org
> > 
> > POSIX system calls are encoding agnostic.  The filename is just a
> > series of bytes terminating with a NUL character.  All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
> 
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways.  For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
> 
> So I think using UTF-8 internally is a much better way.

I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine.  glib uses UTF-32 and UTF-8 internally for most purposes.  It
is the external representation which is at issue.  This is just an
encoding transformation for the library to perform when looking up a
file (be it guile, glib or anything else).
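
To illustrate the kind of transformation meant here, a minimal sketch
in Python follows.  The helper names to_external and to_internal are
hypothetical, chosen for illustration only; they are not part of guile
or glib, and the filesystem encoding is simply passed in as an assumed
parameter.

```python
def to_external(name: str, fs_encoding: str) -> bytes:
    """Encode an internal (Unicode) string as the bytes handed to the OS.

    fs_encoding is an assumption supplied by the caller: whatever
    encoding the person naming the files happened to adopt.
    """
    return name.encode(fs_encoding)


def to_internal(raw: bytes, fs_encoding: str) -> str:
    """Decode bytes read back from the filesystem into an internal string."""
    return raw.decode(fs_encoding)


name = "r\u00e9sum\u00e9.txt"  # the same internal string in both cases

# The internal representation is unchanged; only the bytes differ.
print(to_external(name, "utf-8"))
print(to_external(name, "iso-8859-1"))
```

The same internal string maps to different byte sequences depending on
the encoding assumed for the filesystem, and decoding with that same
encoding recovers the original string.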

As it happens (although this is beside the point) using a byte value or
sequence in a filename which the operating system reserves as the '/'
character, for a purpose other than separating pathname components, or
a NUL character for anything other than marking the end of the
filename, is not POSIX compliant and will not work on any operating
system I know of, including Windows.  (As for POSIX, see SUS, Base
Definitions, sections 3.170 (Filename) and 3.267 (Pathname).)  But as I
say, that is irrelevant.  Whatever the filesystem encoding happens to
be, it happens to be.  It might not be a narrow encoding at all.
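
The byte-level rule can be sketched as follows, again in Python.  The
function is_valid_component is a hypothetical name used only for this
illustration, and it checks nothing beyond the two reserved byte
values.

```python
SLASH = 0x2F  # the byte reserved as the pathname separator '/'
NUL = 0x00    # the byte reserved as the end-of-filename terminator


def is_valid_component(raw: bytes) -> bool:
    """Return True if raw contains neither reserved byte.

    This looks only at byte values and knows nothing about the
    encoding of the name.
    """
    return SLASH not in raw and NUL not in raw


# U+8868 encodes in Shift_JIS as the bytes 0x95 0x5C.  The second byte
# has the same value as ASCII '\', but neither byte is '/' or NUL.
sjis_name = "\u8868".encode("shift_jis")

print(is_valid_component(b"report.txt"))
print(is_valid_component(b"a/b"))
print(is_valid_component(b"a\x00b"))
print(is_valid_component(sjis_name))
```

The check never needs to know which encoding produced the bytes, which
is the sense in which the system calls are encoding agnostic.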

Chris