* Foreign file names on MS-Windows
From: Eli Zaretskii @ 2008-03-22 12:50 UTC (permalink / raw)
  To: emacs-devel

This is a bit longish, but there's an important question near the end,
related to the release of Emacs 22.2, so please bear with me.

In the context of this message, ``foreign file names'' means file
names that cannot be expressed using the current system codepage.  For
example, Cyrillic file names on a system whose codepage is 1252 (which
supports only Latin-1 characters).

Problem description: The Windows filesystem holds file names in UTF-16
encoding, which allows it to support file names outside of the current
locale.  Emacs currently uses the ANSI variants of filesystem APIs, so
the file names returned by system calls on which `readdir' (emulated
in src/w32.c) relies are converted by these system calls to the
current codepage.  When Windows encounters characters that cannot be
converted to the current codepage, it converts them to question marks
`?' instead.  A `?' is a character that cannot appear in a valid file
name on a Windows filesystem, so Emacs primitives that are built on
top of `readdir', such as `directory-files-and-attributes',
`directory-files', and file-name completion primitives, all fail for
these file names in different ways: at best these files are silently
omitted from the output, and at worst you see obscure error
messages.
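
To illustrate the lossy conversion described above, here is a minimal
portable sketch.  It does not call any Windows API: `narrow_lossy' is a
hypothetical stand-in for the ANSI conversion, and it treats codepage
1252 as plain Latin-1 (code units up to 0xFF pass through), which is a
simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the conversion done by the ANSI APIs.
   WIDE is a NUL-terminated array of UTF-16 code units.  Code units up
   to 0xFF are copied as-is (a Latin-1 simplification of codepage
   1252); anything else becomes '?'.  Returns the number of characters
   that had to be replaced.  */
size_t
narrow_lossy (const unsigned short *wide, char *out, size_t cap)
{
  size_t i, replaced = 0;

  assert (wide != NULL && out != NULL && cap > 0);
  for (i = 0; wide[i] != 0 && i + 1 < cap; i++)
    {
      if (wide[i] <= 0xFF)
        out[i] = (char) wide[i];
      else
        {
          out[i] = '?';		/* not representable in the codepage */
          replaced++;
        }
    }
  out[i] = '\0';
  return replaced;
}
```

With a four-letter Cyrillic base name followed by ".txt", every
Cyrillic letter is replaced, and the resulting string is no longer a
name that the filesystem can look up.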

A case in point is "C-x d", which on Windows uses `ls' emulation in
ls-lisp, which in turn calls `directory-files-and-attributes': a
simple "C-x d" silently omits foreign file names from the directory
listing, while "C-u C-x d -altr RET" complains about something being
nil instead of a number, and fails to sort the file names as
requested.  This is because `file-attributes' fails for a file name
that includes `?' characters, and `directory-files-and-attributes'
then returns such files without attributes.

Eventually, Emacs 23 should switch to using Unicode APIs to the
filesystem, which will resolve this problem (but we will need to
figure out how not to break W9x versions of Windows, where Unicode
support is an add-on that is typically not installed).

A temporary bandaid, and the only solution that is practical for Emacs
22, is to modify `readdir' to return the 8+3 alias of the problematic
file name instead of the long name.  The 8+3 aliases use only 7-bit
ASCII characters; they are ugly and butchered to the point of being
unrecognizable, but are otherwise fully functional.  The change below,
which I installed on the trunk, shows how to do that.  After this
change, at least "C-x d", both with and without "C-u", works for me on
directories with such file names.
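
The selection logic can be sketched in portable C as follows.  This is
not the code from w32.c: `pick_dir_name' is a hypothetical helper, its
two string arguments stand in for cFileName and cAlternateFileName, and
it uses plain `strchr' where the real change uses `_mbspbrk', which is
only equivalent for single-byte codepages.  The guard against an empty
alias is an extra precaution that the patch itself does not have.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy into DST the name readdir should report.
   LONG_NAME and ALIAS stand in for cFileName and cAlternateFileName.
   Returns 1 if the 8+3 alias was used, 0 otherwise.  */
int
pick_dir_name (const char *long_name, const char *alias,
               char *dst, size_t cap)
{
  const char *src = long_name;
  int used_alias = 0;
  size_t i;

  assert (long_name != NULL && alias != NULL && dst != NULL && cap > 0);

  /* A `?' cannot occur in a valid file name, so its presence means the
     conversion to the current codepage was lossy.  Fall back to the
     alias, unless there is none (assumption: an empty string).  */
  if (strchr (long_name, '?') != NULL && alias[0] != '\0')
    {
      src = alias;
      used_alias = 1;
    }

  /* 8+3 aliases come back in all caps, so downcase them.  */
  for (i = 0; src[i] != '\0' && i + 1 < cap; i++)
    dst[i] = used_alias ? (char) tolower ((unsigned char) src[i]) : src[i];
  dst[i] = '\0';
  return used_alias;
}
```

For example, given the long name "????.txt" and the alias "D0C1~1.TXT"
(a made-up alias), the helper stores "d0c1~1.txt" in DST and returns 1;
a name without `?' is copied unchanged.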

Now the important question I promised at the beginning: Should we
install this change on the release branch?  Here are the pros and cons
that I could think of for this decision:

Cons:

  . It is too close to release for such non-trivial changes.

  . The affected primitives are used in lots of places, and this
    change could break them, and the Lisp code that uses them.

  . This problem has existed in Emacs for a long time, so it's not a big
    deal if it continues to exist some more (until resolved in Emacs
    23).

  . The suggested solution is only partial, and the resulting file
    names are UGLY.

Pros:

  . The bug is quite grave: it causes real data loss.

  . Whatever code uses the affected primitives is probably already
    broken.

  . The change is very simple, so the probability of it being buggy
    is very low (but please eyeball the diffs below to make it lower
    still).

Yidong and Stefan, please decide whether the change below should be
installed on the release branch.

2008-03-22  Eli Zaretskii  <eliz@gnu.org>

	* w32.c (readdir): If FindFirstFile/FindNextFile return in
	cFileName a file name that includes `?' characters, use the 8+3
	alias in cAlternateFileName instead.

Index: src/w32.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/w32.c,v
retrieving revision 1.130
diff -u -p -r1.130 w32.c
--- src/w32.c	24 Feb 2008 10:09:03 -0000	1.130
+++ src/w32.c	22 Mar 2008 11:51:07 -0000
@@ -1889,6 +1889,8 @@ closedir (DIR *dirp)
 struct direct *
 readdir (DIR *dirp)
 {
+  int downcase = !NILP (Vw32_downcase_file_names);
+
   if (wnet_enum_handle != INVALID_HANDLE_VALUE)
     {
       if (!read_unc_volume (wnet_enum_handle,
@@ -1923,14 +1925,23 @@ readdir (DIR *dirp)
      value returned by stat().  */
   dir_static.d_ino = 1;
 
+  strcpy (dir_static.d_name, dir_find_data.cFileName);
+
+  /* If the file name in cFileName[] includes `?' characters, it means
+     the original file name used characters that cannot be represented
+     by the current ANSI codepage.  To avoid total lossage, retrieve
+     the short 8+3 alias of the long file name.  */
+  if (_mbspbrk (dir_static.d_name, "?"))
+    {
+      strcpy (dir_static.d_name, dir_find_data.cAlternateFileName);
+      downcase = 1;	/* 8+3 aliases are returned in all caps */
+    }
+  dir_static.d_namlen = strlen (dir_static.d_name);
   dir_static.d_reclen = sizeof (struct direct) - MAXNAMLEN + 3 +
     dir_static.d_namlen - dir_static.d_namlen % 4;
-
-  dir_static.d_namlen = strlen (dir_find_data.cFileName);
-  strcpy (dir_static.d_name, dir_find_data.cFileName);
   if (dir_is_fat)
     _strlwr (dir_static.d_name);
-  else if (!NILP (Vw32_downcase_file_names))
+  else if (downcase)
     {
       register char *p;
       for (p = dir_static.d_name; *p; p++)



