* Filename encoding
@ 2014-01-15 12:52 Chris Vine
2014-01-15 18:14 ` Mark H Weaver
0 siblings, 1 reply; 19+ messages in thread
From: Chris Vine @ 2014-01-15 12:52 UTC (permalink / raw)
To: guile-user
Hi,
A number of guile's scheme procedures look-up or reference files on a
file system (open-file, load and so forth).
How does guile translate filenames from its internal string
representation (ISO-8859-1/UTF-32) to narrow string filename encoding
when looking up the file? Does it assume filenames are in locale
encoding (not particularly safe on networked file systems) or does it
provide a fluid for this? (glib caters for this with the
G_FILENAME_ENCODING environmental variable.)
Chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 12:52 Filename encoding Chris Vine
@ 2014-01-15 18:14 ` Mark H Weaver
2014-01-15 19:02 ` Eli Zaretskii
2014-01-15 19:50 ` Chris Vine
0 siblings, 2 replies; 19+ messages in thread
From: Mark H Weaver @ 2014-01-15 18:14 UTC (permalink / raw)
To: Chris Vine; +Cc: guile-user
Chris Vine <chris@cvine.freeserve.co.uk> writes:
> A number of guile's scheme procedures look-up or reference files on a
> file system (open-file, load and so forth).
>
> How does guile translate filenames from its internal string
> representation (ISO-8859-1/UTF-32) to narrow string filename encoding
> when looking up the file? Does it assume filenames are in locale
> encoding (not particularly safe on networked file systems) or does it
> provide a fluid for this? (glib caters for this with the
> G_FILENAME_ENCODING environmental variable.)
It assumes filenames are in locale encoding. Ditto for virtually
everything that interfaces with POSIX-style byte strings, including
environment variables, command-line arguments, etc. Encoding errors
will raise exceptions by default.
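As a rough illustration of that strict behaviour, here is a small sketch in Python rather than Guile. The helper name `to_filename_bytes` is hypothetical, and the encoding is passed in explicitly as an assumption; a real runtime would take it from the locale (in Python, for instance, from `locale.getpreferredencoding(False)`).

```python
def to_filename_bytes(name: str, encoding: str) -> bytes:
    """Encode a Unicode file name for a byte-oriented system call.

    errors="strict" means a character that the target encoding cannot
    represent raises UnicodeEncodeError instead of being replaced.
    """
    return name.encode(encoding, errors="strict")


# A UTF-8 "locale" can represent the name.
ok = to_filename_bytes("résumé.txt", "utf-8")

# An ASCII "locale" cannot, so the conversion raises.
try:
    to_filename_bytes("résumé.txt", "ascii")
    failed = False
except UnicodeEncodeError:
    failed = True
```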
My hope is that this will become less of an issue over time, as systems
increasingly standardize on UTF-8. I see no other good solution.
Thoughts?
Mark
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 18:14 ` Mark H Weaver
@ 2014-01-15 19:02 ` Eli Zaretskii
2014-01-15 21:34 ` Mark H Weaver
2014-01-15 19:50 ` Chris Vine
1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-15 19:02 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
> From: Mark H Weaver <mhw@netris.org>
> Date: Wed, 15 Jan 2014 13:14:39 -0500
> Cc: guile-user@gnu.org
>
> My hope is that this will become less of an issue over time, as systems
> increasingly standardize on UTF-8. I see no other good solution.
>
> Thoughts?
MS-Windows filesystems will not standardize on UTF-8 in any observable
future.
Likewise, in some Far Eastern cultures, non-UTF encodings are still
widely used.
An "other good solution" is to decode file names into Unicode based
representation (which can be UTF-8) for internal handling, then encode
them back into the locale-specific encoding when passing them to
system calls and library functions that receive file names. This is
what Emacs does.
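The decode, manipulate, re-encode cycle described above might look roughly like this in Python. This is only a sketch: the function `rename_extension` and the choice of ISO-8859-1 as the file system encoding are assumptions made for the example, not anything Emacs or Guile defines.

```python
import posixpath


def rename_extension(raw_name: bytes, fs_encoding: str, new_ext: str) -> bytes:
    """Swap a file name's extension, working on Unicode text internally."""
    text = raw_name.decode(fs_encoding)          # external bytes -> Unicode
    stem, _old_ext = posixpath.splitext(text)    # manipulate as a string
    return (stem + new_ext).encode(fs_encoding)  # Unicode -> external bytes


# A name as it might be stored on an ISO-8859-1 file system.
raw = "café.txt".encode("iso-8859-1")
result = rename_extension(raw, "iso-8859-1", ".bak")
```

Only the two boundary conversions need to know the external encoding; everything in between operates on characters.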
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 18:14 ` Mark H Weaver
2014-01-15 19:02 ` Eli Zaretskii
@ 2014-01-15 19:50 ` Chris Vine
2014-01-15 21:00 ` Eli Zaretskii
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Chris Vine @ 2014-01-15 19:50 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
On Wed, 15 Jan 2014 13:14:39 -0500
Mark H Weaver <mhw@netris.org> wrote:
> Chris Vine <chris@cvine.freeserve.co.uk> writes:
>
> > A number of guile's scheme procedures look-up or reference files on
> > a file system (open-file, load and so forth).
> >
> > How does guile translate filenames from its internal string
> > representation (ISO-8859-1/UTF-32) to narrow string filename
> > encoding when looking up the file? Does it assume filenames are in
> > locale encoding (not particularly safe on networked file systems)
> > or does it provide a fluid for this? (glib caters for this with the
> > G_FILENAME_ENCODING environmental variable.)
>
> It assumes filenames are in locale encoding. Ditto for virtually
> everything that interfaces with POSIX-style byte strings, including
> environment variables, command-line arguments, etc. Encoding errors
> will raise exceptions by default.
>
> My hope is that this will become less of an issue over time, as
> systems increasingly standardize on UTF-8. I see no other good
> solution.
>
> Thoughts?
POSIX system calls are encoding agnostic. The filename is just a series
of bytes terminating with a NUL character. All guile needs to know is
what encoding the person creating the filesystem has adopted in naming
files and which it needs to map to. So far as filenames are concerned,
this seems to me to be something for which a fluid would be just the
thing - it could default to the locale encoding but a user could set it
to something else. I suppose command lines and environmental variables
are less problematic because they are usually local to a particular
machine, although that may not necessarily be so true these days for
command lines.
Fluids would have a substantial advantage over glib's approach of an
environmental variable. Fluids can be thread safe, environmental
variables are not. (Incidentally, with glib you can set the
environmental variable G_BROKEN_FILENAMES instead of G_FILENAME_ENCODING
which will cause the glib file functions to use locale encoding, which
I guess expresses their view on the issue. However, their solution of
using environmental variables is not ideal.)
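To make the proposal concrete, here is a sketch of the same idea in Python, using a context variable as a loose analogue of a Guile fluid. Everything named here (`filename_encoding`, `with_filename_encoding`, `encode_filename`) is hypothetical; Guile provides no such fluid as far as this thread establishes.

```python
import contextvars
import locale
from contextlib import contextmanager

# Defaults to the locale encoding, as suggested above.
filename_encoding = contextvars.ContextVar(
    "filename_encoding", default=locale.getpreferredencoding(False)
)


@contextmanager
def with_filename_encoding(encoding):
    """Dynamically rebind the file name encoding for the enclosed block."""
    token = filename_encoding.set(encoding)
    try:
        yield
    finally:
        filename_encoding.reset(token)  # restore the previous binding


def encode_filename(name):
    """Encode a file name using whatever encoding is currently bound."""
    return name.encode(filename_encoding.get())


with with_filename_encoding("iso-8859-1"):
    latin1_bytes = encode_filename("café")

with with_filename_encoding("utf-8"):
    utf8_bytes = encode_filename("café")
```

Unlike an environment variable, a binding made this way is scoped to the block and to the current thread's context, so one thread changing it does not affect code already running in another.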
Chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 19:50 ` Chris Vine
@ 2014-01-15 21:00 ` Eli Zaretskii
2014-01-15 21:42 ` Chris Vine
2014-01-15 21:47 ` Mark H Weaver
2014-01-15 23:29 ` Ludovic Courtès
2 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-15 21:00 UTC (permalink / raw)
To: Chris Vine; +Cc: guile-user
> Date: Wed, 15 Jan 2014 19:50:51 +0000
> From: Chris Vine <chris@cvine.freeserve.co.uk>
> Cc: guile-user@gnu.org
>
> POSIX system calls are encoding agnostic. The filename is just a series
> of bytes terminating with a NUL character. All guile needs to know is
> what encoding the person creating the filesystem has adopted in naming
> files and which it needs to map to.
This doesn't work well, because you cannot easily take apart and
construct file names in encoding-agnostic ways. For example, some
multibyte sequence in an arbitrary encoding could include the '/' or
'\' characters, so searching for directory separators could fail,
unless you use multibyte-aware string functions (which is a nuisance,
because these functions only support a single locale at a time).
So I think using UTF-8 internally is a much better way.
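The separator problem can be seen with Shift_JIS, where the second byte of some two-byte characters is 0x5C, the ASCII code for '\'. A short Python sketch (the file name is made up for the example):

```python
# KATAKANA LETTER SO encodes to the two bytes 0x83 0x5C in Shift_JIS.
name = "ソ.txt"
raw = name.encode("shift_jis")

# A byte-wise search finds a "backslash" that is really half a character.
byte_hits = raw.count(b"\\")

# Searching the decoded Unicode string finds no separator at all.
char_hits = raw.decode("shift_jis").count("\\")
```

Code that splits paths on raw bytes would cut this name in two, whereas code that works on the decoded string would not.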
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 19:02 ` Eli Zaretskii
@ 2014-01-15 21:34 ` Mark H Weaver
2014-01-16 3:46 ` Eli Zaretskii
0 siblings, 1 reply; 19+ messages in thread
From: Mark H Weaver @ 2014-01-15 21:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: guile-user
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Mark H Weaver <mhw@netris.org>
>> Date: Wed, 15 Jan 2014 13:14:39 -0500
>> Cc: guile-user@gnu.org
>>
>> My hope is that this will become less of an issue over time, as systems
>> increasingly standardize on UTF-8. I see no other good solution.
>>
>> Thoughts?
>
> MS-Windows filesystems will not standardize on UTF-8 in any observable
> future.
Well, I understand that MS has standardized on UTF-16 (right?) but what
matters from Guile's perspective is the encoding used by the POSIX-style
interfaces that Guile uses, such as 'open'. Do you know what encoding
that is on Windows?
> Likewise, in some Far Eastern cultures, non-UTF encodings are still
> widely used.
*nod*
> An "other good solution" is to decode file names into Unicode based
> representation (which can be UTF-8) for internal handling, then encode
> them back into the locale-specific encoding when passing them to
> system calls and library functions that receive file names. This is
> what Emacs does.
That's what Guile does too. Internally, all strings are Unicode. At
present we use either Latin-1 or UTF-32, but I intend to change the
internal representation to UTF-8 at some point.
Thanks,
Mark
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 21:00 ` Eli Zaretskii
@ 2014-01-15 21:42 ` Chris Vine
2014-01-16 3:52 ` Eli Zaretskii
0 siblings, 1 reply; 19+ messages in thread
From: Chris Vine @ 2014-01-15 21:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: guile-user
On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <chris@cvine.freeserve.co.uk>
> > Cc: guile-user@gnu.org
> >
> > POSIX system calls are encoding agnostic. The filename is just a
> > series of bytes terminating with a NUL character. All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
>
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways. For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
>
> So I think using UTF-8 internally is a much better way.
I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine. glib uses UTF-32 and UTF-8 internally for most purposes. It
is the external representation which is in issue. This is just an
encoding transformation for the library when looking up a file (be it
guile, glib or anything else).
As it happens (although this is beside the point) using a byte value or
sequence in a filename which the operating system reserves as the '/'
character, for a purpose other than designating a pathname, or a NUL
character for designating anything other than end of filename, is not
POSIX compliant and will not work on any operating system I know of,
including windows. (As for POSIX, see SUS, Base Definitions, section
3.170 (Filename) and 3.267 (Pathname).) But as I say, that is
irrelevant. Whatever the filesystem encoding happens to be, it happens
to be. It might not be a narrow encoding at all.
Chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 19:50 ` Chris Vine
2014-01-15 21:00 ` Eli Zaretskii
@ 2014-01-15 21:47 ` Mark H Weaver
2014-01-15 22:32 ` Chris Vine
2014-01-16 3:55 ` Eli Zaretskii
2014-01-15 23:29 ` Ludovic Courtès
2 siblings, 2 replies; 19+ messages in thread
From: Mark H Weaver @ 2014-01-15 21:47 UTC (permalink / raw)
To: Chris Vine; +Cc: guile-user
Chris Vine <chris@cvine.freeserve.co.uk> writes:
> POSIX system calls are encoding agnostic. The filename is just a series
> of bytes terminating with a NUL character.
Yes, I know, but conceptually these things are strings. Unless you're
going to treat these filenames as black boxes to be copied from one
place to another but never manipulated, printed, or read, you need to
know their encoding and you need to treat them as strings internally.
> All guile needs to know is what encoding the person creating the
> filesystem has adopted in naming files and which it needs to map to.
Right, but how does it know that? The closest thing we have to a
standard way to tell programs what encoding to use is via the locale. I
believe that's what most existing internationalized programs do, anyway.
> So far as filenames are concerned, this seems to me to be something
> for which a fluid would be just the thing - it could default to the
> locale encoding but a user could set it to something else.
We could do that, but I'm not really sure how it would improve the
situation. If Guile expects the program to know the encoding of
filenames on the filesystem, that just passes the buck to the program.
How does the program know what encoding to use?
Yes, the program can know the encoding if it's a custom program written
for one specific system. However, if you write a program that's
supposed to work on any system, how do you know the encoding?
It seems to me that each system must standardize on a single encoding
for all filenames on that system, and the locale encoding is the de facto
standard way of telling programs what that is.
Regards,
Mark
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 21:47 ` Mark H Weaver
@ 2014-01-15 22:32 ` Chris Vine
2014-01-16 3:55 ` Eli Zaretskii
1 sibling, 0 replies; 19+ messages in thread
From: Chris Vine @ 2014-01-15 22:32 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
On Wed, 15 Jan 2014 16:47:45 -0500
Mark H Weaver <mhw@netris.org> wrote:
[snip]
> > So far as filenames are concerned, this seems to me to be something
> > for which a fluid would be just the thing - it could default to the
> > locale encoding but a user could set it to something else.
>
> We could do that, but I'm not really sure how it would improve the
> situation. If Guile expects the program to know the encoding of
> filenames on the filesystem, that just passes the buck to the program.
> How does the program know what encoding to use?
>
> Yes, the program can know the encoding if it's a custom program
> written for one specific system. However, if you write a program
> that's supposed to work on any system, how do you know the encoding?
>
> It seems to me that each system must standardize on a single encoding
> for all filenames on that system, and the locale encoding is the
> de facto standard way of telling programs what that is.
A language support library such as guile can adopt a default but it
cannot standardize on anything. It certainly cannot assume that all
filenames use the locale encoding which happens to be the chosen locale
on a particular user's computer. The only thing a general library can
do is pass the issue back to the program. Generally it would have to
be a setting for the program/network, along with many other
configuration settings. A given computer system has to know other
basic things, like the address of its nameserver. There are many
different ways in which that can be achieved.
In practice you might include it in a look-up table which the network
administrator provides. More likely it is a standard promulgated by
the administrator for all systems which happen to form part of a single
business unit. More likely still, outside Asia, the administrator
has a policy of only using ASCII names for files served across wide
area networks.
Leaving that aside, the idea that a library should not enable the
program to choose its filename encoding as a configuration option in
some way seems to me to be odd, and unworkable in many real life
situations.
Chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 19:50 ` Chris Vine
2014-01-15 21:00 ` Eli Zaretskii
2014-01-15 21:47 ` Mark H Weaver
@ 2014-01-15 23:29 ` Ludovic Courtès
2014-01-16 4:00 ` Eli Zaretskii
2 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2014-01-15 23:29 UTC (permalink / raw)
To: guile-user
Chris Vine <chris@cvine.freeserve.co.uk> skribis:
> So far as filenames are concerned,
> this seems to me to be something for which a fluid would be just the
> thing - it could default to the locale encoding but a user could set it
> to something else. I suppose command lines and environmental variables
> are less problematic because they are usually local to a particular
> machine, although that may not necessarily be so true these days for
> command lines.
It makes some sense to have a fluid for that.
That said, it’s not so great either: for each Unicode-capable library in
use, people would end up defining $THELIB_FILE_NAME_ENCODING. Apart from
GLib, I think language run-time supports (Python, etc.) typically assume
locale encoding too, no?
Does anyone know of systems where the file name encoding is commonly
different from locale encoding? Is it the case on Windows?
Ludo’.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 21:34 ` Mark H Weaver
@ 2014-01-16 3:46 ` Eli Zaretskii
0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 3:46 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
> From: Mark H Weaver <mhw@netris.org>
> Cc: chris@cvine.freeserve.co.uk, guile-user@gnu.org
> Date: Wed, 15 Jan 2014 16:34:26 -0500
>
> Well, I understand that MS has standardized on UTF-16 (right?)
Right.
> but what matters from Guile's perspective is the encoding used by
> the POSIX-style interfaces that Guile uses, such as 'open'. Do you
> know what encoding that is on Windows?
It's the current system codepage, a.k.a. "ANSI" encoding. But
sticking to that means you can never support file names with
characters outside of the current locale. So IMO this is not "good
enough" for Guile.
The POSIX-style interfaces issue can be worked around by providing
wrapper functions. A few projects already do that: Emacs, msysgit.
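The limitation of the codepage route can be modelled with plain string encoding, without calling any Windows API. In this Python sketch the helper `representable` is hypothetical, and cp1252 merely stands in for whatever the current "ANSI" codepage happens to be.

```python
def representable(name: str, codepage: str) -> bool:
    """Return True if every character of name exists in codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


name = "表.txt"

# The Western European codepage has no Japanese characters.
in_ansi = representable(name, "cp1252")

# UTF-16, as accepted by the wide-character APIs, can always carry the name.
wide = name.encode("utf-16-le")
```

A wrapper around the POSIX-style calls would convert the internal Unicode string to UTF-16 and use the wide APIs, so that `in_ansi` being false no longer matters.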
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 21:42 ` Chris Vine
@ 2014-01-16 3:52 ` Eli Zaretskii
0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 3:52 UTC (permalink / raw)
To: Chris Vine; +Cc: guile-user
> Date: Wed, 15 Jan 2014 21:42:57 +0000
> From: Chris Vine <chris@cvine.freeserve.co.uk>
> Cc: mhw@netris.org, guile-user@gnu.org
>
> I am not sure what you mean, as I am not talking about internal use.
Then I probably didn't understand why you mentioned the external
encoding. How is that relevant to the issue at hand?
I'm saying that Guile does need to know how to convert a file name
when it needs to pass it to library functions and system calls. So
the POSIX system calls may be "encoding agnostic", but Guile simply
cannot be.
> As it happens (although this is beside the point) using a byte value or
> sequence in a filename which the operating system reserves as the '/'
> character, for a purpose other than designating a pathname, or a NUL
> character for designating anything other than end of filename, is not
> POSIX compliant and will not work on any operating system I know of,
> including windows.
Windows is not Posix-compliant, so all bets are off. As a matter of
fact, there _are_ DBCS codepages where the second byte can be '\'.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 21:47 ` Mark H Weaver
2014-01-15 22:32 ` Chris Vine
@ 2014-01-16 3:55 ` Eli Zaretskii
1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 3:55 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
> From: Mark H Weaver <mhw@netris.org>
> Date: Wed, 15 Jan 2014 16:47:45 -0500
> Cc: guile-user@gnu.org
>
> > All guile needs to know is what encoding the person creating the
> > filesystem has adopted in naming files and which it needs to map to.
>
> Right, but how does it know that? The closest thing we have to a
> standard way to tell programs what encoding to use is via the locale. I
> believe that's what most existing internationalized programs do, anyway.
You can take the defaults from the locale, but you need to allow the
user or application to change those defaults.
> It seems to me that each system must standardize on a single encoding
> for all filenames on that system, and the locale encoding is the de facto
> standard way of telling programs what that is.
But I can, for example, mount a file system from a different locale
which uses different encoding. No need to prevent me from using file
names on that file system.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-15 23:29 ` Ludovic Courtès
@ 2014-01-16 4:00 ` Eli Zaretskii
2014-01-16 13:03 ` Ludovic Courtès
2014-01-16 15:36 ` Mark H Weaver
0 siblings, 2 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 4:00 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-user
> From: ludo@gnu.org (Ludovic Courtès)
> Date: Thu, 16 Jan 2014 00:29:06 +0100
>
> Does anyone know of systems where the file name encoding is commonly
> different from locale encoding? Is it the case on Windows?
Windows stores file names on disk encoded in UTF-16, but converts them
to the current codepage if you use Posix-style interfaces like 'open'
and 'rename'. (There are parallel APIs that accept UTF-16 encoded
file names.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-16 4:00 ` Eli Zaretskii
@ 2014-01-16 13:03 ` Ludovic Courtès
2014-01-16 14:07 ` John Darrington
2014-01-16 16:09 ` Eli Zaretskii
2014-01-16 15:36 ` Mark H Weaver
1 sibling, 2 replies; 19+ messages in thread
From: Ludovic Courtès @ 2014-01-16 13:03 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: guile-user
Eli Zaretskii <eliz@gnu.org> skribis:
>> From: ludo@gnu.org (Ludovic Courtès)
>> Date: Thu, 16 Jan 2014 00:29:06 +0100
>>
>> Does anyone know of systems where the file name encoding is commonly
>> different from locale encoding? Is it the case on Windows?
>
> Windows stores file names on disk encoded in UTF-16, but converts them
> to the current codepage if you use Posix-style interfaces like 'open'
> and 'rename'.
So in practice, given that Guile uses the POSIX interfaces, the
assumption that file names are in the locale encoding is valid on
Windows.
Ludo’.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-16 13:03 ` Ludovic Courtès
@ 2014-01-16 14:07 ` John Darrington
2014-01-16 16:12 ` Eli Zaretskii
2014-01-16 16:09 ` Eli Zaretskii
1 sibling, 1 reply; 19+ messages in thread
From: John Darrington @ 2014-01-16 14:07 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-user
On Thu, Jan 16, 2014 at 02:03:05PM +0100, Ludovic Courtès wrote:
> Eli Zaretskii <eliz@gnu.org> skribis:
> >> From: ludo@gnu.org (Ludovic Courtès)
> >> Date: Thu, 16 Jan 2014 00:29:06 +0100
> >>
> >> Does anyone know of systems where the file name encoding is commonly
> >> different from locale encoding? Is it the case on Windows?
> >
> > Windows stores file names on disk encoded in UTF-16, but converts them
> > to the current codepage if you use Posix-style interfaces like 'open'
> > and 'rename'.
> So in practice, given that Guile uses the POSIX interfaces, the
> assumption that file names are in the locale encoding is valid on
> Windows.
If you know that the filename was always obtained using Guile's
interface then the issue is never pertinent. The problem comes when a function
is asked to open a non-ASCII named file, without any information about where that
filename came from.
There is no answer to this general problem. We've encountered it over the years
in PSPP. What we are doing now is to pass the filename around in a structure along
with a variable indicating the encoding in which that filename should be interpreted.
This works up to a point, but eventually there comes an interface where the crucial
information is missing. For example, what happens if the filename is in a text file?
We have heuristics which can guess the encoding of a file, but that is of course not
completely reliable.
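A structure of the kind described might be sketched as follows in Python. This is not PSPP's actual code; the class `EncodedFilename` and its methods are invented for illustration, with `None` standing for the case where the encoding information is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncodedFilename:
    """A file name kept as raw bytes together with its encoding, if known."""

    raw: bytes
    encoding: Optional[str]  # None means the provenance is unknown

    def display(self, fallback: str = "utf-8") -> str:
        """Best-effort text for messages; undecodable bytes are replaced."""
        return self.raw.decode(self.encoding or fallback, errors="replace")

    def recode(self, target: str) -> bytes:
        """Convert to another encoding, refusing to guess if it is unknown."""
        if self.encoding is None:
            raise ValueError("encoding of file name is unknown")
        return self.raw.decode(self.encoding).encode(target)


known = EncodedFilename(b"caf\xe9", "iso-8859-1")
unknown = EncodedFilename(b"caf\xe9", None)
as_utf8 = known.recode("utf-8")
```

The point of carrying the encoding alongside the bytes is that the failure becomes explicit: `recode` on a name of unknown origin raises instead of silently producing the wrong bytes.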
One has to decide on an approach which will give the lowest probability of surprises.
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-16 4:00 ` Eli Zaretskii
2014-01-16 13:03 ` Ludovic Courtès
@ 2014-01-16 15:36 ` Mark H Weaver
1 sibling, 0 replies; 19+ messages in thread
From: Mark H Weaver @ 2014-01-16 15:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Ludovic Courtès, guile-user
Eli Zaretskii <eliz@gnu.org> writes:
> Windows stores file names on disk encoded in UTF-16, but converts them
> to the current codepage if you use Posix-style interfaces like 'open'
> and 'rename'. (There are parallel APIs that accept UTF-16 encoded
> file names.)
Okay, so on Windows we should use the parallel APIs that accept UTF-16,
thus (apparently) completely avoiding the issue on that platform.
Mark
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-16 13:03 ` Ludovic Courtès
2014-01-16 14:07 ` John Darrington
@ 2014-01-16 16:09 ` Eli Zaretskii
1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 16:09 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-user
> From: ludo@gnu.org (Ludovic Courtès)
> Cc: guile-user@gnu.org
> Date: Thu, 16 Jan 2014 14:03:05 +0100
>
> Eli Zaretskii <eliz@gnu.org> skribis:
>
> >> From: ludo@gnu.org (Ludovic Courtès)
> >> Date: Thu, 16 Jan 2014 00:29:06 +0100
> >>
> >> Does anyone know of systems where the file name encoding is commonly
> >> different from locale encoding? Is it the case on Windows?
> >
> > Windows stores file names on disk encoded in UTF-16, but converts them
> > to the current codepage if you use Posix-style interfaces like 'open'
> > and 'rename'.
>
> So in practice, given that Guile uses the POSIX interfaces, the
> assumption that file names are in the locale encoding is valid on
> Windows.
For now, yes. But I was under the impression that this thread
discusses future designs, and for that, I don't recommend relying on
this situation lasting for long. After all, providing wrappers for
a few Posix interfaces is not such a hard job.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Filename encoding
2014-01-16 14:07 ` John Darrington
@ 2014-01-16 16:12 ` Eli Zaretskii
0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-01-16 16:12 UTC (permalink / raw)
To: John Darrington; +Cc: ludo, guile-user
> Date: Thu, 16 Jan 2014 15:07:43 +0100
> From: John Darrington <john@darrington.wattle.id.au>
> Cc: Eli Zaretskii <eliz@gnu.org>, guile-user@gnu.org
>
> If you know that the filename was always obtained using Guile's
> interface then the issue is never pertinent. The problem comes when a function
> is asked to open a non-ASCII named file, without any information about where that
> filename came from.
>
>
> There is no answer to this general problem.
I think storing file names in some Unicode based encoding internally
is that answer. If you disagree, please say why.
> This works up to a point, but eventually there comes an interface where the crucial
> information is missing. For example, what happens if the filename is in a text file?
Then the encoding of that file is your clue.
> One has to decide on an approach which will give the lowest probability of surprises.
I think if Guile provides sensible defaults and convenient ways to
override those defaults for a specific operation, that should be
enough.
^ permalink raw reply [flat|nested] 19+ messages in thread