From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13515: 24.3.50; file-name operating functions are broken on Japanese Windows
Date: Tue, 22 Jan 2013 15:27:44 +0200
Message-ID: <831uddny9b.fsf@gnu.org>
References: <87txqa7ikj.wl%%xmue@d1.dion.ne.jp> <834ni9o1oz.fsf@gnu.org>
In-reply-to: <834ni9o1oz.fsf@gnu.org>
Cc: 13515@debbugs.gnu.org
To: kzhr@d1.dion.ne.jp

> Date: Tue, 22 Jan 2013 14:13:32 +0200
> From: Eli Zaretskii
> Cc: 13515@debbugs.gnu.org
>
> > Date: Mon, 21 Jan 2013 22:48:44 +0900
> > From: Kazuhiro Ito
> >
> > The code below returns an unexpected result in trunk on Windows.
> >
> > (let ((file-name-coding-system 'cp932))
> >   (expand-file-name "表" "C:/"))
> >
> > -> "c:/\225/"
> >
> > dostounix_filename does not support cp932-encoded strings, which can
> > contain '\\' as part of Kanji characters.
>
> Thanks, I will work on fixing this.

Please try the changes below (relative to the emacs-24 branch).  If no
issues are found with them, I will soon install them on the release
branch.

=== modified file 'src/w32.c'
--- src/w32.c   2013-01-01 09:11:05 +0000
+++ src/w32.c   2013-01-22 13:20:33 +0000
@@ -37,7 +37,7 @@ along with GNU Emacs.  If not, see
-#include <mbstring.h>  /* for _mbspbrk */
+#include <mbstring.h>  /* for _mbspbrk and _mbslwr */
 
 #undef access
 #undef chdir
@@ -1304,6 +1304,67 @@ srandom (int seed)
   srand (seed);
 }
 
+/* Current codepage for encoding file names.  */
+static int file_name_codepage;
+
+/* Return the maximum length in bytes of a multibyte character
+   sequence encoded in the current ANSI codepage.  This is required to
+   correctly walk the encoded file names one character at a time.  */
+static int
+max_filename_mbslen (void)
+{
+  /* A simple cache to avoid calling GetCPInfo every time we need to
+     normalize a file name.  The file-name encoding is not supposed to
+     be changed too frequently, if ever.  */
+  static Lisp_Object last_file_name_encoding;
+  static int last_max_mbslen;
+  Lisp_Object current_encoding;
+
+  current_encoding = Vfile_name_coding_system;
+  if (NILP (current_encoding))
+    current_encoding = Vdefault_file_name_coding_system;
+
+  if (!EQ (last_file_name_encoding, current_encoding))
+    {
+      CPINFO cp_info;
+
+      last_file_name_encoding = current_encoding;
+      /* Default to the current ANSI codepage.  */
+      file_name_codepage = w32_ansi_code_page;
+      if (!NILP (current_encoding))
+        {
+          char *cpname = SDATA (SYMBOL_NAME (current_encoding));
+          char *cp = NULL, *end;
+          int cpnum;
+
+          if (strncmp (cpname, "cp", 2) == 0)
+            cp = cpname + 2;
+          else if (strncmp (cpname, "windows-", 8) == 0)
+            cp = cpname + 8;
+
+          if (cp)
+            {
+              end = cp;
+              cpnum = strtol (cp, &end, 10);
+              if (cpnum && *end == '\0' && end - cp >= 2)
+                file_name_codepage = cpnum;
+            }
+        }
+
+      if (!file_name_codepage)
+        file_name_codepage = CP_ACP; /* CP_ACP = 0, but let's not assume that */
+
+      if (!GetCPInfo (file_name_codepage, &cp_info))
+        {
+          file_name_codepage = CP_ACP;
+          if (!GetCPInfo (file_name_codepage, &cp_info))
+            emacs_abort ();
+        }
+      last_max_mbslen = cp_info.MaxCharSize;
+    }
+
+  return last_max_mbslen;
+}
 
 /* Normalize filename by converting all path separators to
    the specified separator.  Also conditionally convert upper
@@ -1313,14 +1374,20 @@ static void
 normalize_filename (register char *fp, char path_sep)
 {
   char sep;
-  char *elem;
+  char *elem, *p2;
+  int dbcs_p = max_filename_mbslen () > 1;
 
   /* Always lower-case drive letters a-z, even if the filesystem
      preserves case in filenames.
      This is so filenames can be compared by string comparison
      functions that are case-sensitive.  Even case-preserving filesystems
      do not distinguish case in drive letters.  */
-  if (fp[1] == ':' && *fp >= 'A' && *fp <= 'Z')
+  if (dbcs_p)
+    p2 = CharNextExA (file_name_codepage, fp, 0);
+  else
+    p2 = fp + 1;
+
+  if (*p2 == ':' && *fp >= 'A' && *fp <= 'Z')
     {
       *fp += 'a' - 'A';
       fp += 2;
@@ -1332,7 +1399,10 @@ normalize_filename (register char *fp, c
         {
           if (*fp == '/' || *fp == '\\')
             *fp = path_sep;
-          fp++;
+          if (!dbcs_p)
+            fp++;
+          else
+            fp = CharNextExA (file_name_codepage, fp, 0);
         }
       return;
     }
@@ -1355,13 +1425,20 @@ normalize_filename (register char *fp, c
           if (elem && elem != fp)
             {
               *fp = 0;  /* temporary end of string */
-              _strlwr (elem);   /* while we convert to lower case */
+              _mbslwr (elem);   /* while we convert to lower case */
             }
           *fp = sep;            /* convert (or restore) path separator */
           elem = fp + 1;        /* next element starts after separator */
           sep = path_sep;
         }
-    } while (*fp++);
+      if (*fp)
+        {
+          if (!dbcs_p)
+            fp++;
+          else
+            fp = CharNextExA (file_name_codepage, fp, 0);
+        }
+    } while (*fp);
 }
 
 /* Destructively turn backslashes into slashes.  */
@@ -2588,15 +2665,22 @@ readdir (DIR *dirp)
   strcpy (dir_static.d_name, dir_find_data.cFileName);
   dir_static.d_namlen = strlen (dir_static.d_name);
   if (dir_is_fat)
-    _strlwr (dir_static.d_name);
+    _mbslwr (dir_static.d_name);
   else if (downcase)
     {
       register char *p;
-      for (p = dir_static.d_name; *p; p++)
-        if (*p >= 'a' && *p <= 'z')
-          break;
+      int dbcs_p = max_filename_mbslen () > 1;
+      for (p = dir_static.d_name; *p; )
+        {
+          if (*p >= 'a' && *p <= 'z')
+            break;
+          if (dbcs_p)
+            p = CharNextExA (file_name_codepage, p, 0);
+          else
+            p++;
+        }
       if (!*p)
-        _strlwr (dir_static.d_name);
+        _mbslwr (dir_static.d_name);
     }
 
   return &dir_static;
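
For readers who want to see the failure mode in isolation: in cp932
the character 表 is encoded as the two bytes 0x95 0x5C, and the trail
byte 0x5C has the same value as an ASCII backslash.  The sketch below
is not part of the patch.  It is a portable stand-in that assumes
cp932 only and hard-codes the lead-byte ranges in a hypothetical
helper (cp932_lead_byte_p) where the patch calls CharNextExA; it
covers only the separator conversion, not the down-casing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not in the patch: nonzero if C is the lead
   byte of a two-byte cp932 (Shift-JIS) character.  */
static int
cp932_lead_byte_p (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-wise walk, as in the old code: every 0x5C byte is taken for
   a backslash, including trail bytes of two-byte characters.  */
static void
slashify_bytewise (char *p)
{
  assert (p != NULL);
  for (; *p; p++)
    if (*p == '\\')
      *p = '/';
}

/* Character-wise walk: a lead byte and its trail byte are skipped
   together, so a 0x5C trail byte is never examined.  */
static void
slashify_dbcs (char *p)
{
  assert (p != NULL);
  while (*p)
    {
      if (cp932_lead_byte_p ((unsigned char) *p) && p[1] != '\0')
        p += 2;                 /* skip the whole two-byte character */
      else
        {
          if (*p == '\\')
            *p = '/';
          p++;
        }
    }
}
```

Given the bytes 43 3A 5C 95 5C 5C ("C:\", then 表, then "\"), the
byte-wise walk turns the trail byte into 0x2F, which is the lone
\225 followed by a slash seen in the report, while the character-wise
walk leaves the 0x95 0x5C pair intact and converts only the two real
separators.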
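
The way max_filename_mbslen derives a codepage number from the name
of the coding system can also be tried outside Emacs.  The following
is a rough sketch under assumptions: codepage_from_name and its
fallback argument are hypothetical names introduced here, and a plain
C string stands in for the Lisp symbol name.  The prefix tests and
the strtol checks mirror the ones in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone version of the name parsing done in
   max_filename_mbslen.  Return the codepage number encoded in NAME
   ("cpNNN" or "windows-NNN"), or FALLBACK if NAME does not have
   that form.  */
static int
codepage_from_name (const char *name, int fallback)
{
  const char *cp = NULL;
  char *end;
  long cpnum;

  assert (name != NULL);

  if (strncmp (name, "cp", 2) == 0)
    cp = name + 2;
  else if (strncmp (name, "windows-", 8) == 0)
    cp = name + 8;

  if (cp == NULL)
    return fallback;

  cpnum = strtol (cp, &end, 10);
  /* Accept only a nonzero number that spans the rest of the name
     and has at least two digits, as the patch does.  */
  if (cpnum != 0 && *end == '\0' && end - cp >= 2)
    return (int) cpnum;

  return fallback;
}
```

With a fallback of 0, "cp932" and "windows-1252" yield 932 and 1252,
while names such as "utf-8", "cp9" or "cp932x" yield the fallback,
which corresponds to the case where the patch keeps
w32_ansi_code_page.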