From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#30755: 25.3; Encoding of load-file-name wrong when path to working dir does not contain accented letter
Date: Thu, 17 May 2018 18:19:16 +0300
Message-ID: <83tvr6ia2z.fsf@gnu.org>
References: <5aa26e5f.d05b1c0a.10214.5e41@mx.google.com>
 <836065weec.fsf@gnu.org> <5aa2ff79.c786df0a.cebf4.23cd@mx.google.com>
 <83bmfutwzk.fsf@gnu.org> <5aad23e1.c1471c0a.8eac2.b6fe@mx.google.com>
 <83efkipx3r.fsf@gnu.org> <83fu30pwlz.fsf@gnu.org>
 <5af4c657.1c69fb81.34361.19f1@mx.google.com> <83tvr8kf57.fsf@gnu.org>
 <5afbe0c9.1c69fb81.2bfe1.d10b@mx.google.com>
 <04AF7956-7A0E-4D3B-BCAD-E2C0C4EE685D@gnu.org>
 <5afd97c2.1c69fb81.53c57.01cf@mx.google.com>
Reply-To: Eli Zaretskii
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 30755@debbugs.gnu.org
To: Márton Marczell
X-GNU-PR-Message: followup 30755
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: unreproducible moreinfo
In-reply-to: <5afd97c2.1c69fb81.53c57.01cf@mx.google.com> (message from Márton Marczell on Thu, 17 May 2018 16:54:56 +0200)

> Cc: "30755@debbugs.gnu.org" <30755@debbugs.gnu.org>
> From: Márton Marczell
> Date: Thu, 17 May 2018 16:54:56 +0200
>
> I cloned git.savannah.gnu.org/r/emacs.git and built emacs. I set the HOME envvar to
> ‘D:\Marci\Programozás\emacsdebug’ which contained the repro case from above. I still get wrong output
> when the load path is printed:

I haven't committed my changes yet, so it is expected that you still see the problem.

Please apply the patch below, rebuild Emacs (by typing "make" at the shell prompt in the top-level directory of the Emacs tree), and see if the problem goes away.

Thanks.

--- src/fileio.c~0 2018-02-12 12:40:44.000000000 +0200
+++ src/fileio.c 2018-05-15 18:13:28.240161500 +0300
@@ -865,33 +865,71 @@ the root directory.  */)
         }
     }
   multibyte = STRING_MULTIBYTE (name);
-  if (multibyte != STRING_MULTIBYTE (default_directory))
+  bool defdir_multibyte = STRING_MULTIBYTE (default_directory);
+  if (multibyte != defdir_multibyte)
     {
+      /* We want to make both NAME and DEFAULT_DIRECTORY have the same
+         multibyteness.  Strategy:
+         . If either NAME or DEFAULT_DIRECTORY is pure-ASCII, they
+           can be converted to the multibyteness of the other one
+           while keeping the same byte sequence.
+         . If both are non-ASCII, the only safe conversion is to
+           convert the multibyte one to be unibyte, because the
+           reverse conversion potentially adds bytes while raw bytes
+           are converted to their multibyte forms, which we will be
+           unable to account for, since the information about the
+           original multibyteness is lost.  If those additional bytes
+           later leak to system APIs because they are not encoded or
+           because they are converted to unibyte strings by keeping
+           the data, file APIs will fail.  */
       if (multibyte)
         {
-          unsigned char *p = SDATA (name);
+          bool name_ascii_p = SCHARS (name) == SBYTES (name);
+          unsigned char *p = SDATA (default_directory);
 
-          while (*p && ASCII_CHAR_P (*p))
-            p++;
-          if (*p == '\0')
+          if (!name_ascii_p)
+            while (*p && ASCII_CHAR_P (*p))
+              p++;
+          if (name_ascii_p || *p != '\0')
             {
-              /* NAME is a pure ASCII string, and DEFAULT_DIRECTORY is
-                 unibyte.  Do not convert DEFAULT_DIRECTORY to
-                 multibyte; instead, convert NAME to a unibyte string,
-                 so that the result of this function is also a unibyte
-                 string.  This is needed during bootstrapping and
-                 dumping, when Emacs cannot decode file names, because
-                 the locale environment is not set up.  */
+              /* DEFAULT_DIRECTORY is unibyte and possibly non-ASCII.
+                 Make a unibyte string out of NAME, and arrange for
+                 the result of this function to be a unibyte string.
+                 This is needed during bootstrapping and dumping, when
+                 Emacs cannot decode file names, because the locale
+                 environment is not set up.  */
               name = make_unibyte_string (SSDATA (name), SBYTES (name));
               multibyte = 0;
             }
           else
-            default_directory = string_to_multibyte (default_directory);
+            {
+              /* NAME is non-ASCII and multibyte, and
+                 DEFAULT_DIRECTORY is unibyte and pure-ASCII: make a
+                 multibyte string out of DEFAULT_DIRECTORY's data.  */
+              default_directory =
+                make_multibyte_string (SSDATA (default_directory),
+                                       SCHARS (default_directory),
+                                       SCHARS (default_directory));
+            }
         }
       else
         {
-          name = string_to_multibyte (name);
-          multibyte = 1;
+          unsigned char *p = SDATA (name);
+
+          while (*p && ASCII_CHAR_P (*p))
+            p++;
+          if (*p == '\0')
+            {
+              /* DEFAULT_DIRECTORY is multibyte and NAME is unibyte
+                 and pure-ASCII.  Make a multibyte string out of
+                 NAME's data.  */
+              name = make_multibyte_string (SSDATA (name),
+                                            SCHARS (name), SCHARS (name));
+              multibyte = 1;
+            }
+          else
+            default_directory = make_unibyte_string (SSDATA (default_directory),
+                                                     SBYTES (default_directory));
         }
     }
 
--- lisp/startup.el~0 2018-03-14 06:40:04.000000000 +0200
+++ lisp/startup.el 2018-05-16 18:48:50.597482900 +0300
@@ -560,9 +560,17 @@
       (if default-directory
           (setq default-directory
                 (if (eq system-type 'windows-nt)
-                    ;; Convert backslashes to forward slashes.
-                    (expand-file-name
-                     (decode-coding-string default-directory coding t))
+                    ;; We pass the decoded default-directory as
+                    ;; the 2nd arg to make sure expand-file-name
+                    ;; sees a multibyte string as the default
+                    ;; directory; this avoids the side effect of
+                    ;; returning a unibyte string from
+                    ;; expand-file-name because it still sees
+                    ;; the undecoded value of default-directory.
+                    (let ((defdir (decode-coding-string default-directory
+                                                        coding t)))
+                      ;; Convert backslashes to forward slashes.
+                      (expand-file-name defdir defdir))
                   (decode-coding-string default-directory coding t))))))
 
       ;; Decode all the important variables and directory lists, now
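The strategy in the new fileio.c comment can be modelled outside Emacs with a small standalone C program. This is a sketch under stated assumptions, not Emacs code. `struct lstr`, `set_lstr`, `ascii_only`, `to_multibyte_adding_bytes` and `unify_multibyteness` are hypothetical names, and a "string" here is just a byte buffer plus a multibyteness flag. The two-byte 0xC0/0xC1 form given to raw bytes reflects my understanding of how Emacs stores raw bytes inside multibyte strings (cf. the BYTE8_STRING macro in src/character.h), so check it against the sources.

The sketch shows why the patch only ever flips the flag of one string. A naive unibyte-to-multibyte conversion grows the unibyte "Programoz\xE1s" from 11 to 12 bytes. The patched decision, by contrast, leaves every byte sequence untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp string (NOT Emacs's real
   struct Lisp_String): raw data plus a multibyteness flag.  */
struct lstr
{
  unsigned char data[64];
  size_t nbytes;
  bool multibyte;
};

/* Fill S with the NUL-terminated BYTES and the given flag.  */
static void
set_lstr (struct lstr *s, const char *bytes, bool multibyte)
{
  size_t n = strlen (bytes);
  assert (n <= sizeof s->data);
  memcpy (s->data, bytes, n);
  s->nbytes = n;
  s->multibyte = multibyte;
}

/* True if every byte is ASCII.  For a multibyte string this plays the
   role of the patch's SCHARS (name) == SBYTES (name) test, since only
   ASCII characters take a single byte in the internal form.  */
static bool
ascii_only (const struct lstr *s)
{
  for (size_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}

/* Model of a string_to_multibyte-style conversion of a unibyte
   string: each raw byte >= 0x80 becomes a two-byte sequence with lead
   byte 0xC0 or 0xC1, so the byte count can grow.  This is the
   conversion the patch stops using.  */
static void
to_multibyte_adding_bytes (struct lstr *s)
{
  unsigned char buf[sizeof s->data];
  size_t j = 0;

  assert (!s->multibyte);
  for (size_t i = 0; i < s->nbytes; i++)
    {
      unsigned char b = s->data[i];
      assert (j + 2 <= sizeof buf);
      if (b < 0x80)
        buf[j++] = b;
      else
        {
          buf[j++] = 0xC0 | ((b >> 6) & 1);
          buf[j++] = 0x80 | (b & 0x3F);
        }
    }
  memcpy (s->data, buf, j);
  s->nbytes = j;
  s->multibyte = true;
}

/* The patch's decision on the model: exactly one flag flips and no
   byte is added or changed.  */
static void
unify_multibyteness (struct lstr *name, struct lstr *defdir)
{
  if (name->multibyte == defdir->multibyte)
    return;
  if (name->multibyte)
    {
      if (ascii_only (name) || !ascii_only (defdir))
        name->multibyte = false;    /* result will be unibyte */
      else
        defdir->multibyte = true;   /* DEFDIR is pure ASCII */
    }
  else
    {
      if (ascii_only (name))
        name->multibyte = true;     /* NAME is pure ASCII */
      else
        defdir->multibyte = false;  /* NAME is non-ASCII */
    }
}
```

Consider a pure-ASCII multibyte NAME such as "foo.el" and a unibyte DEFAULT_DIRECTORY containing the raw byte 0xE1. In the model, both end up unibyte and the directory bytes are unchanged. This mirrors the unibyte-result branch the patch keeps for bootstrapping and dumping.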