From: Eli Zaretskii <eliz@gnu.org>
To: emacs-devel@gnu.org
Subject: Foreign file names on MS-Windows
Date: Sat, 22 Mar 2008 14:50:59 +0200
Message-ID: <u7ifuhqh8.fsf@gnu.org>

This is a bit longish, but there's an important question near the end,
related to the release of Emacs 22.2, so please bear with me.

In the context of this message, ``foreign file names'' means file
names that cannot be expressed using the current system codepage. For
example, Cyrillic file names on a system whose codepage is 1252 (which
supports only Latin-1 characters).

Problem description: The Windows filesystem holds file names in UTF-16
encoding, which allows it to support file names outside of the current
locale. Emacs currently uses the ANSI variants of filesystem APIs, so
the file names returned by system calls on which `readdir' (emulated
in src/w32.c) relies are converted by these system calls to the
current codepage. When Windows encounters characters that cannot be
converted to the current codepage, it converts them to question marks
`?' instead. A `?' is a character that cannot appear in a valid file
name on a Windows filesystem, so Emacs primitives that are built on
top of `readdir', such as `directory-files-and-attributes',
`directory-files', and file-name completion primitives, all fail for
these file names in different ways: at best these files are silently
omitted from the output, and at worst you see some weirdo error
messages.
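
To make the failure mode above concrete, here is a minimal sketch in
portable C. It does not call any Windows API: `to_ansi_model' and
`name_is_lossy' are hypothetical names, and the rule "code units below
0x100 are representable, everything else becomes `?'" is a deliberate
simplification of what the system's conversion to codepage 1252 does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the system's UTF-16 -> ANSI conversion.
   Simplification: code units below 0x100 are treated as representable
   (roughly Latin-1); everything else becomes '?'.  Real codepages,
   including 1252, differ in detail.  */
static void
to_ansi_model (const uint16_t *wide, char *out, size_t outsize)
{
  size_t i = 0;

  if (outsize == 0)
    return;
  for (; wide[i] != 0 && i + 1 < outsize; i++)
    out[i] = wide[i] < 0x100 ? (char) wide[i] : '?';
  out[i] = '\0';
}

/* A name that contains '?' after conversion cannot be passed back to
   the filesystem, because '?' is not valid in a Windows file name.  */
static int
name_is_lossy (const char *ansi_name)
{
  return strchr (ansi_name, '?') != NULL;
}
```

Under this model, a name made of four Cyrillic letters followed by
".txt" converts to four question marks followed by ".txt", which no
longer identifies any file on disk.
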
A case in point is "C-x d", which on Windows uses `ls' emulation in
ls-lisp, which in turn calls `directory-files-and-attributes': a
simple "C-x d" silently omits foreign file names from the directory
listing, while "C-u C-x d -altr RET" complains about something being
nil instead of a number, and fails to sort the file names as
requested. This is because `file-attributes' fails for a file name
that includes `?' characters, and `directory-files-and-attributes'
then returns such files without attributes.

Eventually, Emacs 23 should switch to using Unicode APIs to the
filesystem, which will resolve this problem (but we will need to
figure out how not to break W9x versions of Windows, where Unicode
support is an add-on that is typically not installed).

A temporary bandaid, and the only solution that is practical for Emacs
22, is to modify `readdir' to return the 8+3 alias of the problematic
file name instead of the long name. The 8+3 aliases use only 7-bit
ASCII characters; they are ugly and butchered to the point of being
unrecognizable, but are otherwise fully functional. The change below,
which I installed on the trunk, shows how to do that. After this
change, at least "C-x d", both with and without "C-u", works for me on
directories with such file names.
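
The name selection described above can be sketched in portable C as
follows. This is an illustration under stated assumptions, not the
code in src/w32.c: `find_data_model' is a hypothetical stand-in for
the two name fields of WIN32_FIND_DATA, `pick_name' is a hypothetical
helper, `strchr' replaces the multibyte-aware search (so the sketch
is only meant for single-byte codepages), and the downcasing is
unconditional for brevity.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the two name fields of WIN32_FIND_DATA.  */
struct find_data_model
{
  char cFileName[260];
  char cAlternateFileName[14];
};

/* Copy into OUT the name that readdir would report.  OUT must have
   room for at least 260 bytes.  Returns 1 if the 8+3 alias was
   substituted for the long name, 0 otherwise.  */
static int
pick_name (const struct find_data_model *fd, int downcase, char *out)
{
  int used_alias = 0;
  char *p;

  strcpy (out, fd->cFileName);
  if (strchr (out, '?') != NULL)
    {
      /* The long name lost characters in conversion; fall back to
         the ASCII-only alias.  */
      strcpy (out, fd->cAlternateFileName);
      used_alias = 1;
      downcase = 1;   /* 8+3 aliases come back in upper case */
    }
  if (downcase)
    for (p = out; *p; p++)
      *p = (char) tolower ((unsigned char) *p);
  return used_alias;
}
```

The sketch does not handle the case where the alias field is empty,
for example on a volume that does not generate short names.
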
Now the important question I promised at the beginning: Should we
install this change on the release branch? Here are the pros and cons
that I could think of for this decision:

Cons:

  . It is too close to release for such non-trivial changes.
  . The affected primitives are used in lots of places, and this
    change could break them, and the Lisp code that uses them.
  . This problem has existed in Emacs for a long time, so it's not a
    big deal if it continues to exist a while longer (until resolved
    in Emacs 23).
  . The suggested solution is only partial, and the resulting file
    names are UGLY.

Pros:

  . The bug is quite grave: it causes real data loss.
  . Whatever code uses the affected primitives is probably already
    broken.
  . The change is very simple, so the probability of it being buggy
    is very low (but please eyeball the diffs below to make it lower
    still).

Yidong and Stefan, please decide whether the change below should be
installed on the release branch.

2008-03-22 Eli Zaretskii <eliz@gnu.org>

	* w32.c (readdir): If FindFirstFile/FindNextFile return in
	cFileName a file name that includes `?' characters, use the 8+3
	alias in cAlternateFileName instead.

Index: src/w32.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/w32.c,v
retrieving revision 1.130
diff -u -p -r1.130 w32.c
--- src/w32.c 24 Feb 2008 10:09:03 -0000 1.130
+++ src/w32.c 22 Mar 2008 11:51:07 -0000
@@ -1889,6 +1889,8 @@ closedir (DIR *dirp)
 struct direct *
 readdir (DIR *dirp)
 {
+  int downcase = !NILP (Vw32_downcase_file_names);
+
   if (wnet_enum_handle != INVALID_HANDLE_VALUE)
     {
       if (!read_unc_volume (wnet_enum_handle,
@@ -1923,14 +1925,23 @@ readdir (DIR *dirp)
      value returned by stat(). */
   dir_static.d_ino = 1;
 
+  strcpy (dir_static.d_name, dir_find_data.cFileName);
+
+  /* If the file name in cFileName[] includes `?' characters, it means
+     the original file name used characters that cannot be represented
+     by the current ANSI codepage. To avoid total lossage, retrieve
+     the short 8+3 alias of the long file name. */
+  if (_mbspbrk (dir_static.d_name, "?"))
+    {
+      strcpy (dir_static.d_name, dir_find_data.cAlternateFileName);
+      downcase = 1; /* 8+3 aliases are returned in all caps */
+    }
+  dir_static.d_namlen = strlen (dir_static.d_name);
   dir_static.d_reclen = sizeof (struct direct) - MAXNAMLEN + 3 +
     dir_static.d_namlen - dir_static.d_namlen % 4;
-
-  dir_static.d_namlen = strlen (dir_find_data.cFileName);
-  strcpy (dir_static.d_name, dir_find_data.cFileName);
   if (dir_is_fat)
     _strlwr (dir_static.d_name);
-  else if (!NILP (Vw32_downcase_file_names))
+  else if (downcase)
     {
       register char *p;
       for (p = dir_static.d_name; *p; p++)