From: Chris Vine <chris@cvine.freeserve.co.uk>
To: Eli Zaretskii <eliz@gnu.org>
Cc: guile-user@gnu.org
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 21:42:57 +0000
Message-ID: <20140115214257.45af1076@bother.homenet>
In-Reply-To: <83sisp2c65.fsf@gnu.org>
On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <chris@cvine.freeserve.co.uk>
> > Cc: guile-user@gnu.org
> >
> > POSIX system calls are encoding agnostic. The filename is just a
> > series of bytes terminating with a NUL character. All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
>
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways. For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
>
> So I think using UTF-8 internally is a much better way.

I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine. glib uses UTF-32 and UTF-8 internally for most purposes. It
is the external representation which is at issue. All the library (be
it guile, glib or anything else) has to do when looking up a file is
an encoding transformation.
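
As a rough illustration of that boundary transformation, here is a minimal sketch in Python rather than Guile. The helper name `to_external_name` and the sample encodings are assumptions made for the example, not anything guile or glib provides. The program holds the name as Unicode internally and converts it to the filesystem's byte encoding only at the point of lookup.

```python
def to_external_name(name: str, fs_encoding: str) -> bytes:
    """Convert an internal (Unicode) filename to the bytes the filesystem expects.

    Hypothetical helper: fs_encoding is whatever encoding the person who
    created the filesystem adopted; the caller has to know or be told it.
    """
    # errors="strict" makes an unrepresentable name fail loudly instead of
    # silently producing bytes that name a different file.
    return name.encode(fs_encoding, errors="strict")


# The same internal string maps to different external byte sequences.
utf8_name = to_external_name("café.txt", "utf-8")
latin1_name = to_external_name("café.txt", "iso-8859-1")
```

The internal representation never changes here; only the bytes handed to the system call depend on the filesystem encoding.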

As it happens (although this is beside the point), a filename cannot
use the byte value or sequence which the operating system reserves for
the '/' character for anything other than separating pathname
components, nor a NUL character for anything other than marking the
end of the filename. Doing so is not POSIX compliant and will not work
on any operating system I know of, including Windows. (As for POSIX,
see SUS, Base Definitions, sections 3.170 (Filename) and 3.267
(Pathname).) But as I say, that is irrelevant. Whatever the filesystem
encoding happens to be, it happens to be. It might not be a narrow
encoding at all.
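
To make the reserved-byte point concrete, here is a small sketch, again in Python and under stated assumptions: `is_posix_filename` is a hypothetical helper, and Shift_JIS and UTF-16 are chosen only as sample encodings. It checks one encoded pathname component for the two byte values that POSIX reserves.

```python
def is_posix_filename(raw: bytes) -> bool:
    """Check one encoded pathname component against the two reserved bytes."""
    # 0x2F ('/') separates components and 0x00 (NUL) ends the name, so
    # neither may occur inside a single filename.
    return len(raw) > 0 and b"/" not in raw and b"\x00" not in raw


# Shift_JIS encodes U+8868 with a trail byte equal to ASCII '\' (0x5C),
# but the result contains neither reserved byte.
sjis_name = "表".encode("shift_jis")

# UTF-16 is not a narrow encoding: ASCII letters carry a zero byte, so the
# encoded name cannot pass through a NUL-terminated byte interface as-is.
utf16_name = "a".encode("utf-16-le")
```

With these inputs `is_posix_filename(sjis_name)` holds and `is_posix_filename(utf16_name)` does not, which is why a wide filesystem encoding needs a different interface rather than a different byte check.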
Chris