From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date: Fri, 08 Dec 2017 11:46:29 +0200
Message-ID: <83y3mdwo0a.fsf@gnu.org>
References: <708ten8bam.fsf@fencepost.gnu.org> <83shcu3mtf.fsf@gnu.org>
Reply-To: Eli Zaretskii
Cc: 15803@debbugs.gnu.org
To: Glenn Morris
In-reply-to: (message from Glenn Morris on Mon, 04 Dec 2017 19:35:05 -0500)

> From: Glenn Morris
> Cc: 15803@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 19:35:05 -0500
>
> Eli Zaretskii wrote:
>
> > Perhaps on Posix systems, but not elsewhere.
>
> I assume non-POSIX is newspeak for MS-Windows (native and DOS).

I didn't say "non-Posix"; you did. MS-Windows is definitely not a
Posix system, but whether it is the only one, I don't know. Are we
sure all macOS/Darwin systems are sufficiently Posix in this aspect?
AFAIR they use quite different encoding methods for file names
(canonical normalization etc.).

> > And if we make the change, we should make sure building Emacs in a
> > non-ASCII directory still works.
>
> It works fine for me on G/L to have source, build, and install
> directories be distinct non-ASCII directories.

Was it in a UTF-8 locale or in a non-UTF-8 locale? The latter is the
potentially problematic case, AFAIR.

> (Emacs works, that is,
> but makeinfo 5.1 fails to find @include files in non-ASCII directories,
> so I wonder how common such setups are.)

Building a release tarball doesn't require makeinfo.

> BTW, it feels very dated to me to have discussion of Windows 9X in the
> Emacs manual section on file-name-coding.

We still try to support it, and the aspects of file-name encoding
related to it are definitely non-trivial. Everything described there
is in the code.

> diff --git i/doc/emacs/mule.texi w/doc/emacs/mule.texi
> index 78f77cb..5fc44a6 100644
> --- i/doc/emacs/mule.texi
> +++ w/doc/emacs/mule.texi
> @@ -1214,11 +1214,8 @@ system can encode.
>
>    If @code{file-name-coding-system} is @code{nil}, Emacs uses a
>  default coding system determined by the selected language environment,
> -and stored in the @code{default-file-name-coding-system} variable.
> -@c FIXME? Is this correct? What is the "default language environment"?
> -In the default language environment, non-@acronym{ASCII} characters in
> -file names are not encoded specially; they appear in the file system
> -using the internal Emacs representation.
> +and stored in the @code{default-file-name-coding-system} variable
> +(normally UTF-8).

Not sure why you removed the sentence which had the FIXME comment. Is
it in any way related to the issue at hand?

>  @cindex file-name encoding, MS-Windows
>  @vindex w32-unicode-filenames
> diff --git i/lisp/international/mule-cmds.el w/lisp/international/mule-cmds.el
> index 9d22d6e..192f0e9 100644
> --- i/lisp/international/mule-cmds.el
> +++ w/lisp/international/mule-cmds.el
> @@ -1797,10 +1797,11 @@ The default status is as follows:
>                          'raw-text)
>
>    (set-default-coding-systems nil)
> -  (setq default-sendmail-coding-system 'iso-latin-1)
> -  ;; On Darwin systems, this should be utf-8-unix, but when this file is loaded
> -  ;; that is not yet defined, so we set it in set-locale-environment instead.
> -  (setq default-file-name-coding-system 'iso-latin-1-unix)
> +  (setq default-sendmail-coding-system 'utf-8)
> +  (setq default-file-name-coding-system (if (memq system-type
> +                                                  '(window-nt ms-dos))
> +                                            'iso-latin-1-unix
> +                                          'utf-8-unix))

Why are we changing sendmail-coding-system? It has nothing to do with
file names, AFAIK.

>    ;; Preserve eol-type from existing default-process-coding-systems.
>    ;; On non-unix-like systems in particular, these may have been set
>    ;; carefully by the user, or by the startup code, to deal with the
> @@ -1816,8 +1817,10 @@ The default status is as follows:
>          (input-coding
>           (condition-case nil
>               (coding-system-change-text-conversion
> -              (cdr default-process-coding-system) 'iso-latin-1)
> -           (coding-system-error 'iso-latin-1))))
> +              (cdr default-process-coding-system)
> +              (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
> +           (coding-system-error
> +            (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
>      (setq default-process-coding-system
>            (cons output-coding input-coding)))

And this changes the default encoding used to communicate with
sub-processes. Why? We never talked about a wholesale change of all
the defaults to UTF-8; that is a much broader issue than just the
encoding of file names.

> diff --git i/lisp/mh-e/mh-comp.el w/lisp/mh-e/mh-comp.el
> index 98067ce..25118cd 100644
> --- i/lisp/mh-e/mh-comp.el
> +++ w/lisp/mh-e/mh-comp.el
> @@ -304,6 +304,7 @@ message and scan line."
>      (let ((draft-buffer (current-buffer))
>            (file-name buffer-file-name)
>            (config mh-previous-window-config)
> +          ;; FIXME this is subtly different to select-message-coding-system.
>            (coding-system-for-write
>             (if (and (local-variable-p 'buffer-file-coding-system
>                                        (current-buffer)) ;XEmacs needs two args
> @@ -315,7 +316,7 @@ message and scan line."
>                 (or (and (boundp 'sendmail-coding-system) sendmail-coding-system)
>                     (and (default-boundp 'buffer-file-coding-system)
>                          (default-value 'buffer-file-coding-system))
> -                   'iso-latin-1))))
> +                   'utf-8))))

Changes like that in MH-E should be communicated to the MH-E
developer; I'm not sure he is reading this list.

And you never answered my question about the rationale:

> Btw, why does the default matter so much? Once Emacs starts up
> default-file-name-coding-system on GNU/Linux is set to UTF-8, if the
> locale says so. Is this just an aesthetic issue?
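
For readers following the thread, the two concerns raised above (a
non-UTF-8 locale, and the normalization applied to file names on
macOS) can be made concrete with a small sketch. It is written in
Python rather than Emacs Lisp so that it runs without an Emacs
session, and it is not taken from the patch or from Emacs itself: the
file name "café" and the helper decodes_as_utf8 are hypothetical,
chosen only for illustration.

```python
import unicodedata

# Hypothetical file name with one non-ASCII character (U+00E9).
name = "caf\u00e9"

# The same name yields different bytes under the two candidate defaults.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

# Decoding UTF-8 bytes as Latin-1 never raises, but it silently
# produces the wrong characters (two characters in place of one).
mojibake = utf8_bytes.decode("latin-1")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if RAW is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Unicode normalization: NFC keeps the precomposed character, NFD
# splits it into a base letter plus a combining accent.  Assumption:
# a decomposed form of this kind is what the message alludes to for
# macOS; the exact rules depend on the file system.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

if __name__ == "__main__":
    print("UTF-8 bytes:          ", utf8_bytes)
    print("Latin-1 bytes:        ", latin1_bytes)
    print("UTF-8 read as Latin-1:", repr(mojibake))
    print("Latin-1 valid UTF-8?  ", decodes_as_utf8(latin1_bytes))
    print("NFC == NFD?           ", nfc == nfd)
    print("NFD as UTF-8 bytes:   ", nfd.encode("utf-8"))
```

The sketch only shows that one name can have several byte
representations on disk, which is why a default that suits a UTF-8
GNU/Linux locale is not automatically right for other systems or
locales.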