From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#15260: cannot build in a directory with non-ascii characters
Date: Mon, 28 Oct 2013 18:47:32 +0200
Message-ID: <8361shfil7.fsf@gnu.org>
Cc: 15260@debbugs.gnu.org
To: Stefan Monnier
> From: Stefan Monnier
> Cc: rgm@gnu.org, handa@gnu.org, 15260@debbugs.gnu.org
> Date: Mon, 28 Oct 2013 00:05:32 -0400
>
> More specifically, for the bug to appear, you need ENCODE (DECODE (s))
> to not be the identity function. Why is it not so in the "early" Emacs?

Because life's a mess that doesn't easily fit into simple and elegant schemes ;-)

For starters, we don't actually call DECODE_FILE on these file and directory names. We just use build_string or make_string, as you can easily see in the init_* functions I mentioned. If you are lucky and your file names are UTF-8 encoded, this produces the same result as DECODE_FILE. If you are less lucky and your file names are encoded in something else, like Latin-N, you get a unibyte string with the same bytes as the original. Then we pass these strings to various functions, like file_accessible_directory_p, that _do_ call ENCODE_FILE... (Luckily, during most of temacs's run, both file-name-coding-system and its default value are nil, so ENCODE_FILE is a no-op, except when they aren't; see the next paragraph.)

Next, it is quite possible that file-name-coding-system changes between the time we process and store the file name and the time we encode it and pass it to a low-level function. This is especially true during "loadup", when many packages are loaded and their top-level forms are executed. It turns out that two of them have side effects that do just that: mule-cmds.el calls reset-language-environment, and language/english.el calls set-language-info-alist; both have the effect of resetting default-file-name-coding-system to latin-1 (!? an interesting "default" for a Unicode-era Emacs; perhaps Handa-san could comment on why we still do that). When this happens, your symmetry is broken, and ENCODE_FILE (DECODE_FILE (f)) is no longer the identity function.

And then there are other players in this game.
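The broken symmetry described above, where a name effectively decoded as UTF-8 is later encoded with latin-1, can be mimicked outside Emacs. This is only a rough analogy, not Emacs code: Python `bytes` stand in for a unibyte string and `str` for a multibyte one, and `decode_file`/`encode_file` are hypothetical helpers playing the roles of DECODE_FILE and ENCODE_FILE. It assumes the directory name is stored on disk as UTF-8.

```python
# Rough Python analogy of ENCODE_FILE (DECODE_FILE (f)) in Emacs.
# Assumption: bytes ~ unibyte string, str ~ multibyte string.
# decode_file/encode_file are hypothetical stand-ins, not Emacs APIs.

def decode_file(raw: bytes, coding: str) -> str:
    """Stand-in for DECODE_FILE: turn on-disk bytes into characters."""
    return raw.decode(coding)

def encode_file(name: str, coding: str) -> bytes:
    """Stand-in for ENCODE_FILE: turn characters back into bytes."""
    return name.encode(coding)

# A directory name containing U+00E9, stored on disk in UTF-8.
on_disk = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'

# Same coding system at both ends: the round trip is the identity.
same = encode_file(decode_file(on_disk, "utf-8"), "utf-8")

# The coding system changes in between (utf-8 -> latin-1, as during
# loadup): the round trip yields different bytes, so a system call
# using them would look for a directory that does not exist.
switched = encode_file(decode_file(on_disk, "utf-8"), "latin-1")

print(same == on_disk)      # True
print(switched == on_disk)  # False
print(switched)             # b'caf\xe9'
```

The only thing that differs between the two round trips is the coding system used on the way back out, which is exactly the state that loadup mutates behind the file-name code's back.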
One of them is default-directory, which is used every time we call expand-file-name, IOW "a lot". If you look in init_buffer, you will see that the default-directory of *scratch* is first set to a multibyte representation of the unibyte string we get from getcwd. In a "normal" Emacs session, we promptly fix that in startup.el, after the call to set-locale-environment initializes all the coding systems. But "temacs -l loadup dump" doesn't run startup.el, so we are left with what init_buffer did, which is a string no file-name API will be able to grok.

Another example is the use of 'equal' (and 'member', which calls 'equal') to compare file and directory names and look them up in lists: as you know, 'equal' will not compare a unibyte and a multibyte string as equal. So a mix of unibyte and multibyte file-name strings breaks some of the code that relies on 'equal', tricking it into doing wrong things, like deciding that Emacs is _not_ being run from the source tree.

I'm sure there's more to this saga; I'm just halfway through it...
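The 'equal'/'member' pitfall with mixed unibyte and multibyte file names has a loose Python analogue: `bytes` never compare equal to `str`, so a membership test over a mixed list silently misses. This is an illustration under assumptions, not Emacs behavior verbatim (Python's rule is stricter, since even pure-ASCII `bytes` and `str` compare unequal), and the directory names are hypothetical.

```python
# Loose analogy of 'member' failing on a mix of unibyte and multibyte
# file names. Assumption: bytes ~ unibyte string, str ~ multibyte.

# Hypothetical list of known directories: one kept as raw bytes
# (undecoded, like a unibyte string), one as decoded text.
source_dirs = [b"/home/user/caf\xc3\xa9/emacs/lisp", "/tmp/other"]

# The same directory, obtained elsewhere as decoded text.
lisp_dir = "/home/user/caf\u00e9/emacs/lisp"

# str vs bytes never compare equal, so the lookup misses and code
# relying on it would conclude we are not in the source tree.
print(lisp_dir in source_dirs)  # False

# Normalizing every entry to one representation fixes the lookup.
normalized = [d.decode("utf-8") if isinstance(d, bytes) else d
              for d in source_dirs]
print(lisp_dir in normalized)   # True
```

The fix shown here, converting everything to a single representation before comparing, mirrors the general remedy of decoding file names consistently before storing or comparing them.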