From: Eli Zaretskii <eliz@gnu.org>
To: Glenn Morris <rgm@gnu.org>
Cc: 15803@debbugs.gnu.org
Subject: bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date: Fri, 08 Dec 2017 11:46:29 +0200
Message-ID: <83y3mdwo0a.fsf@gnu.org>
In-Reply-To: <cvlgiiggg6.fsf@fencepost.gnu.org> (message from Glenn Morris on Mon, 04 Dec 2017 19:35:05 -0500)
> From: Glenn Morris <rgm@gnu.org>
> Cc: 15803@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 19:35:05 -0500
>
> Eli Zaretskii wrote:
>
> > Perhaps on Posix systems, but not elsewhere.
>
> I assume non-POSIX is newspeak for MS-Windows (native and DOS).
I didn't say "non-Posix"; you did.
MS-Windows is definitely not a Posix system, but whether it is the
only one, I don't know. Are we sure all macOS/Darwin systems are
sufficiently Posix in this respect? AFAIR they use quite different
encoding methods for file names (canonical normalization etc.).
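To illustrate the normalization point: the same visible file name can be stored as different code-point sequences, and therefore as different UTF-8 bytes. The following is only a sketch, assuming the ucs-normalize library that ships with Emacs; `my-name-forms' and the name "é.txt" are hypothetical, chosen for illustration.

```elisp
;; Sketch: compare the precomposed (NFC) and decomposed (NFD) forms of
;; one file name.  `my-name-forms' is a hypothetical helper, not part
;; of Emacs.
(require 'ucs-normalize)

(defun my-name-forms (name)
  "Return a plist describing NAME in its NFC and NFD forms."
  (let ((nfc (ucs-normalize-NFC-string name))
        (nfd (ucs-normalize-NFD-string name)))
    (list :nfc-chars (length nfc)
          :nfd-chars (length nfd)
          ;; `encode-coding-string' returns a unibyte string, so its
          ;; length is the number of bytes.
          :nfc-utf8-bytes (length (encode-coding-string nfc 'utf-8))
          :nfd-utf8-bytes (length (encode-coding-string nfd 'utf-8))
          :identical (string= nfc nfd))))

;; "\u00e9" is a precomposed e-acute; in NFD it becomes "e" + U+0301.
(my-name-forms "\u00e9.txt")
```

The two forms differ both in character count and in encoded length, so a plain UTF-8 coding system does not by itself guarantee that a name round-trips on a file system that normalizes names.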
> > And if we make the change, we should make sure building Emacs in a
> > non-ASCII directory still works.
>
> It works fine for me on G/L to have source, build, and install
> directories be distinct non-ASCII directories.
Was it in a UTF-8 locale or in a non-UTF-8 locale? The latter is the
potentially problematic case, AFAIR.
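A small sketch of why the non-UTF-8 case matters: the bytes of a non-ASCII name depend on the coding system, and decoding them with the wrong one yields a different name. The name "café" is a made-up example, and the two coding systems are stand-ins for "non-UTF-8 locale" and "UTF-8 locale".

```elisp
;; Sketch: one file name, two on-disk byte sequences.
(let* ((name "caf\u00e9")
       (as-latin1 (encode-coding-string name 'iso-latin-1))
       (as-utf8   (encode-coding-string name 'utf-8)))
  (list :latin1-bytes (length as-latin1)
        :utf8-bytes   (length as-utf8)
        ;; Decoding the UTF-8 bytes as Latin-1 does not give back NAME.
        :mismatch-survives
        (string= name (decode-coding-string as-utf8 'iso-latin-1))
        ;; Decoding with the matching coding system does.
        :match-survives
        (string= name (decode-coding-string as-utf8 'utf-8))))
```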
> (Emacs works, that is,
> but makeinfo 5.1 fails to find @include files in non-ASCII directories,
> so I wonder how common such setups are.)
Building a release tarball doesn't require makeinfo.
> BTW, it feels very dated to me to have discussion of Windows 9X in the
> Emacs manual section on file-name-coding.
We still try to support it, and the aspects of file-name encoding
related to it are definitely non-trivial. Everything described there
is in the code.
> diff --git i/doc/emacs/mule.texi w/doc/emacs/mule.texi
> index 78f77cb..5fc44a6 100644
> --- i/doc/emacs/mule.texi
> +++ w/doc/emacs/mule.texi
> @@ -1214,11 +1214,8 @@ system can encode.
>
> If @code{file-name-coding-system} is @code{nil}, Emacs uses a
> default coding system determined by the selected language environment,
> -and stored in the @code{default-file-name-coding-system} variable.
> -@c FIXME? Is this correct? What is the "default language environment"?
> -In the default language environment, non-@acronym{ASCII} characters in
> -file names are not encoded specially; they appear in the file system
> -using the internal Emacs representation.
> +and stored in the @code{default-file-name-coding-system} variable
> +(normally UTF-8).
Not sure why you removed the sentence which had the FIXME comment. Is
it in any way related to the issue at hand?
> @cindex file-name encoding, MS-Windows
> @vindex w32-unicode-filenames
> diff --git i/lisp/international/mule-cmds.el w/lisp/international/mule-cmds.el
> index 9d22d6e..192f0e9 100644
> --- i/lisp/international/mule-cmds.el
> +++ w/lisp/international/mule-cmds.el
> @@ -1797,10 +1797,11 @@ The default status is as follows:
> 'raw-text)
>
> (set-default-coding-systems nil)
> - (setq default-sendmail-coding-system 'iso-latin-1)
> - ;; On Darwin systems, this should be utf-8-unix, but when this file is loaded
> - ;; that is not yet defined, so we set it in set-locale-environment instead.
> - (setq default-file-name-coding-system 'iso-latin-1-unix)
> + (setq default-sendmail-coding-system 'utf-8)
> + (setq default-file-name-coding-system (if (memq system-type
> + '(windows-nt ms-dos))
> + 'iso-latin-1-unix
> + 'utf-8-unix))
Why are we changing sendmail-coding-system? It has nothing to do with
file names, AFAIK.
> ;; Preserve eol-type from existing default-process-coding-systems.
> ;; On non-unix-like systems in particular, these may have been set
> ;; carefully by the user, or by the startup code, to deal with the
> @@ -1816,8 +1817,10 @@ The default status is as follows:
> (input-coding
> (condition-case nil
> (coding-system-change-text-conversion
> - (cdr default-process-coding-system) 'iso-latin-1)
> - (coding-system-error 'iso-latin-1))))
> + (cdr default-process-coding-system)
> + (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
> + (coding-system-error
> + (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
> (setq default-process-coding-system
> (cons output-coding input-coding)))
And this changes the default encoding used to communicate with
sub-processes. Why? We never talked about a wholesale change of all
the defaults to UTF-8; that is a much broader issue than just the
encoding of file names.
> diff --git i/lisp/mh-e/mh-comp.el w/lisp/mh-e/mh-comp.el
> index 98067ce..25118cd 100644
> --- i/lisp/mh-e/mh-comp.el
> +++ w/lisp/mh-e/mh-comp.el
> @@ -304,6 +304,7 @@ message and scan line."
> (let ((draft-buffer (current-buffer))
> (file-name buffer-file-name)
> (config mh-previous-window-config)
> + ;; FIXME this is subtly different to select-message-coding-system.
> (coding-system-for-write
> (if (and (local-variable-p 'buffer-file-coding-system
> (current-buffer)) ;XEmacs needs two args
> @@ -315,7 +316,7 @@ message and scan line."
> (or (and (boundp 'sendmail-coding-system) sendmail-coding-system)
> (and (default-boundp 'buffer-file-coding-system)
> (default-value 'buffer-file-coding-system))
> - 'iso-latin-1))))
> + 'utf-8))))
Changes like that in MH-E should be communicated to the MH-E
developer; I'm not sure he is reading this list.

And you never answered my question about the rationale:
> Btw, why does the default matter so much? Once Emacs starts up
> default-file-name-coding-system on GNU/Linux is set to UTF-8, if the
> locale says so. Is this just an aesthetic issue?
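For anyone who wants to check what a running session actually uses, here is a minimal sketch. It follows the rule stated in the manual text quoted above (a nil `file-name-coding-system' falls back to `default-file-name-coding-system') and ignores platform-specific handling such as `w32-unicode-filenames'. `my-effective-file-name-coding-system' is a hypothetical helper, not an Emacs function.

```elisp
;; Sketch: report the file-name coding system this session would use,
;; under the simplified fallback rule described in the lead-in.
(defun my-effective-file-name-coding-system ()
  "Return `file-name-coding-system', or the default if that is nil."
  (or file-name-coding-system default-file-name-coding-system))

;; Evaluate in a running session (e.g. with M-:) to see the values
;; chosen at startup for the current locale.
(list :file-name-coding-system         file-name-coding-system
      :default-file-name-coding-system default-file-name-coding-system
      :locale-coding-system            locale-coding-system
      :effective (my-effective-file-name-coding-system))
```

Comparing the result under `emacs -Q` in different locales shows whether the built-in default is ever what ends up in effect.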