From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date: Fri, 11 Sep 2020 15:24:14 +0300
Message-ID: <83een8h575.fsf@gnu.org>
In-Reply-To: <87een81rkv.fsf@gnus.org>
To: Lars Ingebrigtsen
Cc: rgm@gnu.org, 15803@debbugs.gnu.org

> From: Lars Ingebrigtsen
> Cc: rgm@gnu.org, 15803@debbugs.gnu.org
> Date: Fri, 11 Sep 2020 13:27:28 +0200
>
> make[1]: Entering directory '/home/larsi/src/emacs/f�o/test'
> ELC lisp/eshell/eshell-tests.elc
> foo2: "#(\"/home/larsi/src/emacs/fóo/test/lisp/eshell/eshell-tests.elcnjDFYY\" 0 65 (charset iso-8859-1))"
> >>Error occurred processing lisp/eshell/eshell-tests.el: File is missing (("Doing chmod" "No such file or directory" "/home/larsi/src/emacs/f\303\263o/test/lisp/eshell/eshell-tests.elcnjDFYY"))
> make[1]: *** [Makefile:165: lisp/eshell/eshell-tests.elc] Error 1
>
> So it's created a tempfile, tagged with the correct charset (I had no
> idea that that's how it worked), but decoded, and then set-file-modes
> interprets that as an UTF-8 file name.
>
> So... it's a bug in set-file-modes?  Hm, nope, write-region has the
> same problem.  There be dragons ;-)

The problematic aspect of debugging these problems is that what you
see is not always what's there, due to display and decoding/encoding
operations by both Emacs and the display software you have on your
system (which drives the terminal).  In particular, strings inside
Emacs are always in a UTF-8-compatible encoding, so the fact that you
get UTF-8 in *Messages* doesn't prove anything.

What we need is to find 2 types of possible problems:

  . raw bytes from Latin-1 encoding inside Emacs buffers or strings
    that are supposed to be decoded
  . UTF-8 encoded (instead of Latin-1 encoded) characters passed to
    libc functions

So if you found that the problem reveals itself in set-file-modes,
let's see what happens there.  The relevant code is this:

  char *fname = SSDATA (ENCODE_FILE (absname));
  mode_t imode = XFIXNUM (mode) & 07777;
  if (fchmodat (AT_FDCWD, fname, imode, nofollow) != 0)
    report_file_error ("Doing chmod", absname);

Please either run this under GDB, or add printf's, to show the byte
sequences of 'absname' and of 'fname'.
The former should be in UTF-8 (so you should see 0xC3 and 0xB3 for the
ó character), and the latter should be in Latin-1 (so you should see
0xF3 for the same letter).  This should give us some hints wrt where
to look for the cause of the problem.

> That weird file name (decoded and tagged with a charset text parameter)
> comes from make-temp-file -- everything seems to be OK before that.
> target-file is:
>
> foo: "\"/home/larsi/src/emacs/f\\363o/test/lisp/eshell/eshell-tests.elc\""
>
> which seems to be correct,

Where does the "foo:" printout come from?  I wouldn't expect to see
Latin-1 encoded strings inside Emacs, not normally anyway.

> but
>
>   (tempfile
>    (make-temp-file (expand-file-name target-file)))
>
> is
>
> "#(\"/home/larsi/src/emacs/fóo/test/lisp/eshell/eshell-tests.elcnjDFYY\" 0 65 (charset iso-8859-1))"

I see nothing wrong here: this is how decoding works in Emacs.  And
again, how did you produce this string?  As I explained above, the
details of how you display these strings matter in this case.