From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Foreign file names on MS-Windows
Date: Sat, 22 Mar 2008 14:50:59 +0200
To: emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:93177

This is a bit longish, but there's an important question near the end,
related to the release of Emacs 22.2, so please bear with me.

In the context of this message, ``foreign file names'' means file
names that cannot be expressed using the current system codepage.  For
example, Cyrillic file names on a system whose codepage is 1252 (which
supports only Latin-1 characters).

Problem description:

The Windows filesystem holds file names in UTF-16 encoding, which
allows it to support file names outside of the current locale.  Emacs
currently uses the ANSI variants of filesystem APIs, so the file names
returned by system calls on which `readdir' (emulated in src/w32.c)
relies are converted by these system calls to the current codepage.
When Windows encounters characters that cannot be converted to the
current codepage, it converts them to question marks `?' instead.

A `?' is a character that cannot appear in a valid file name on a
Windows filesystem, so Emacs primitives that are built on top of
`readdir', such as `directory-files-and-attributes',
`directory-files', and file-name completion primitives, all fail for
these file names in different ways: at best these files are silently
omitted from the output, and at worst you see some weirdo error
messages.

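To illustrate why the `?' substitution described above is destructive,
here is a minimal portable sketch in C.  It does not call any Windows
API: `to_toy_ansi' is a hypothetical helper, and it assumes a toy
codepage that covers only U+0000..U+00FF and treats every UTF-16 code
unit independently.  Both are simplifications of what the real system
conversion does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the system's UTF-16 -> ANSI conversion;
   NOT a Windows API.  Assumes a toy codepage covering only
   U+0000..U+00FF.  Writes a NUL-terminated result into OUT (truncating
   if OUT is too small) and returns the number of code units that were
   replaced by `?'.  */
static size_t
to_toy_ansi (const uint16_t *wide, char *out, size_t out_size)
{
  size_t n = 0, lost = 0;

  assert (wide != NULL && out != NULL && out_size > 0);

  for (; *wide != 0 && n + 1 < out_size; wide++)
    {
      if (*wide <= 0xFF)
        out[n++] = (char) (unsigned char) *wide;
      else
        {
          /* Not representable: the original character is gone.  */
          out[n++] = '?';
          lost++;
        }
    }
  out[n] = '\0';
  return lost;
}
```

Given a four-letter Cyrillic name followed by ".txt", the sketch
yields "????.txt".  Two different Cyrillic names of the same length
collapse to the same string, and since `?' is not valid in a file
name, that string cannot be handed back to the filesystem to open
either of them.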
A case in point is "C-x d", which on Windows uses `ls' emulation in
ls-lisp, which in turn calls `directory-files-and-attributes': a
simple "C-x d" silently omits foreign file names from the directory
listing, while "C-u C-x d -altr RET" complains about something being
nil instead of a number, and fails to sort the file names as
requested.  This is because `file-attributes' fails for a file name
that includes `?' characters, and `directory-files-and-attributes'
then returns such files without attributes.

Eventually, Emacs 23 should switch to using the Unicode APIs to the
filesystem, which will resolve this problem (but we will need to
figure out how not to break W9x versions of Windows, where Unicode
support is an add-on that is typically not installed).

A temporary bandaid, and the only solution that is practical for Emacs
22, is to modify `readdir' to return the 8+3 alias of the problematic
file name instead of the long name.  The 8+3 aliases use only 7-bit
ASCII characters; they are ugly and butchered to the point of being
unrecognizable, but are otherwise fully functional.

The change below, which I installed on the trunk, shows how to do
that.  After this change, at least "C-x d", both with and without
"C-u", works for me on directories with such file names.

Now the important question I promised at the beginning: Should we
install this change on the release branch?  Here are the pros and cons
that I could think of for this decision:

Cons:

  . It is too close to release for such non-trivial changes.
  . The affected primitives are used in lots of places, and this
    change could break them, and the Lisp code that uses them.
  . This problem has existed in Emacs for a long time, so it's not a
    big deal if it continues to exist some more (until resolved in
    Emacs 23).
  . The suggested solution is only partial, and the resulting file
    names are UGLY.

Pros:

  . The bug is quite grave: it causes real data loss.
  . Whatever code uses the affected primitives is probably already
    broken.
  . The change is very simple, so the probability of it being buggy is
    very low (but please eyeball the diffs below to make it lower
    still).

Yidong and Stefan, please decide whether the change below should be
installed on the release branch.

2008-03-22  Eli Zaretskii

	* w32.c (readdir): If FindFirstFile/FindNextFile return in
	cFileName a file name that includes `?' characters, use the 8+3
	alias in cAlternateFileName instead.

Index: src/w32.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/w32.c,v
retrieving revision 1.130
diff -u -p -r1.130 w32.c
--- src/w32.c	24 Feb 2008 10:09:03 -0000	1.130
+++ src/w32.c	22 Mar 2008 11:51:07 -0000
@@ -1889,6 +1889,8 @@ closedir (DIR *dirp)
 struct direct *
 readdir (DIR *dirp)
 {
+  int downcase = !NILP (Vw32_downcase_file_names);
+
   if (wnet_enum_handle != INVALID_HANDLE_VALUE)
     {
       if (!read_unc_volume (wnet_enum_handle,
@@ -1923,14 +1925,23 @@ readdir (DIR *dirp)
      value returned by stat().  */
   dir_static.d_ino = 1;
 
+  strcpy (dir_static.d_name, dir_find_data.cFileName);
+
+  /* If the file name in cFileName[] includes `?' characters, it means
+     the original file name used characters that cannot be represented
+     by the current ANSI codepage.  To avoid total lossage, retrieve
+     the short 8+3 alias of the long file name.  */
+  if (_mbspbrk (dir_static.d_name, "?"))
+    {
+      strcpy (dir_static.d_name, dir_find_data.cAlternateFileName);
+      downcase = 1;	/* 8+3 aliases are returned in all caps */
+    }
+  dir_static.d_namlen = strlen (dir_static.d_name);
   dir_static.d_reclen = sizeof (struct direct) - MAXNAMLEN + 3 +
     dir_static.d_namlen - dir_static.d_namlen % 4;
-
-  dir_static.d_namlen = strlen (dir_find_data.cFileName);
-  strcpy (dir_static.d_name, dir_find_data.cFileName);
   if (dir_is_fat)
     _strlwr (dir_static.d_name);
-  else if (!NILP (Vw32_downcase_file_names))
+  else if (downcase)
     {
       register char *p;
       for (p = dir_static.d_name; *p; p++)
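
For readers who want to experiment with the fallback outside of Emacs,
the decision made by the patch above can be restated as a small
portable sketch.  `choose_entry_name' and its parameters are
hypothetical names, not part of w32.c.  The sketch departs from the
patch in three ways, all of them assumptions made for illustration: it
searches byte-wise with strchr where the patch uses _mbspbrk, it
guards against an empty alias, and its downcasing is a plain ASCII
tolower pass that may differ from what w32.c does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of w32.c.  Copies into OUT the name a
   readdir-like routine would report for one directory entry.
   Returns 1 if the 8+3 alias was substituted, 0 if the long name was
   kept, and -1 if the chosen name does not fit in OUT.  */
static int
choose_entry_name (const char *long_name, const char *short_alias,
                   int downcase, char *out, size_t out_size)
{
  const char *src = long_name;
  int used_alias = 0;
  size_t len;

  assert (long_name != NULL && short_alias != NULL);
  assert (out != NULL && out_size > 0);

  /* A `?' cannot occur in a real file name, so finding one means the
     ANSI conversion lost characters.  The empty-alias guard is an
     addition of this sketch; the patch does not have it.  */
  if (strchr (long_name, '?') != NULL && short_alias[0] != '\0')
    {
      src = short_alias;
      used_alias = 1;
      downcase = 1;	/* aliases come back in all caps */
    }

  len = strlen (src);
  if (len >= out_size)
    return -1;
  memcpy (out, src, len + 1);

  if (downcase)
    for (char *p = out; *p; p++)
      *p = (char) tolower ((unsigned char) *p);

  return used_alias;
}
```

Forcing `downcase' when the alias is used mirrors the patch, which
sets the flag because the aliases are returned in all caps.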