unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Glenn Morris <rgm@gnu.org>
Cc: 15803@debbugs.gnu.org
Subject: bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date: Fri, 08 Dec 2017 11:46:29 +0200
Message-ID: <83y3mdwo0a.fsf@gnu.org>
In-Reply-To: <cvlgiiggg6.fsf@fencepost.gnu.org> (message from Glenn Morris on Mon, 04 Dec 2017 19:35:05 -0500)

> From: Glenn Morris <rgm@gnu.org>
> Cc: 15803@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 19:35:05 -0500
> 
> Eli Zaretskii wrote:
> 
> > Perhaps on Posix systems, but not elsewhere. 
> 
> I assume non-POSIX is newspeak for MS-Windows (native and DOS).

I didn't say "non-Posix"; you did.

MS-Windows is definitely not a Posix system, but whether it is the
only one, I don't know.  Are we sure all macOS/Darwin systems are
sufficiently Posix in this respect?  AFAIR they use quite different
encoding methods for file names (canonical normalization etc.).
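
To make the normalization concern concrete, here is a minimal Emacs Lisp sketch (an illustration only, not code from Emacs's file-name handling). It assumes the standard ucs-normalize library, and the helper name my-utf8-bytes is hypothetical. It shows that the precomposed (NFC) and decomposed (NFD) forms of the same character encode to different UTF-8 byte sequences, so two file names that look identical can differ byte-for-byte.

```elisp
(require 'ucs-normalize)

(defun my-utf8-bytes (string)
  "Return the UTF-8 encoding of STRING as a list of byte values.
Hypothetical helper, for illustration only."
  (string-to-list (encode-coding-string string 'utf-8)))

(let ((name "\u00e9"))                  ; LATIN SMALL LETTER E WITH ACUTE
  (list (my-utf8-bytes (ucs-normalize-NFC-string name))    ; composed form
        (my-utf8-bytes (ucs-normalize-NFD-string name))))  ; decomposed form
;; => ((195 169) (101 204 129))
```

Whether a given file system stores one form or the other is outside the scope of this sketch; it only shows why the two forms cannot be compared as raw bytes.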

> > And if we make the change, we should make sure building Emacs in a
> > non-ASCII directory still works.
> 
> It works fine for me on G/L to have source, build, and install
> directories be distinct non-ASCII directories.

Was it in a UTF-8 locale or in a non-UTF-8 locale?  The latter is the
potentially problematic case, AFAIR.
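
As a rough illustration of why a non-UTF-8 locale could be the problematic case, the sketch below encodes one sample name under both coding systems and then decodes the UTF-8 bytes as Latin-1. The sample name is an assumption chosen for the example; nothing here is taken from the build scripts.

```elisp
(let* ((name "caf\u00e9")               ; sample non-ASCII directory name
       (latin1-bytes (encode-coding-string name 'iso-latin-1))
       (utf8-bytes (encode-coding-string name 'utf-8)))
  (list (string-to-list latin1-bytes)   ; one byte for the accented letter
        (string-to-list utf8-bytes)     ; two bytes for the accented letter
        ;; Reading UTF-8 bytes as Latin-1 yields a different, longer name.
        (decode-coding-string utf8-bytes 'iso-latin-1)))
;; => ((99 97 102 233) (99 97 102 195 169) "caf\u00c3\u00a9")
```

The third element is the mis-decoded name: the single accented letter has turned into two Latin-1 characters, so a lookup by the original name would fail.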

> (Emacs works, that is,
> but makeinfo 5.1 fails to find @include files in non-ASCII directories,
> so I wonder how common such setups are.)

Building a release tarball doesn't require makeinfo.

> BTW, it feels very dated to me to have discussion of Windows 9X in the
> Emacs manual section on file-name-coding.

We still try to support it, and the aspects of file-name encoding
related to it are definitely non-trivial.  Everything described there
is in the code.

> diff --git i/doc/emacs/mule.texi w/doc/emacs/mule.texi
> index 78f77cb..5fc44a6 100644
> --- i/doc/emacs/mule.texi
> +++ w/doc/emacs/mule.texi
> @@ -1214,11 +1214,8 @@ system can encode.
>  
>    If @code{file-name-coding-system} is @code{nil}, Emacs uses a
>  default coding system determined by the selected language environment,
> -and stored in the @code{default-file-name-coding-system} variable.
> -@c FIXME?  Is this correct?  What is the "default language environment"?
> -In the default language environment, non-@acronym{ASCII} characters in
> -file names are not encoded specially; they appear in the file system
> -using the internal Emacs representation.
> +and stored in the @code{default-file-name-coding-system} variable
> +(normally UTF-8).

Not sure why you removed the sentence which had the FIXME comment.  Is
it in any way related to the issue at hand?

>  @cindex file-name encoding, MS-Windows
>  @vindex w32-unicode-filenames
> diff --git i/lisp/international/mule-cmds.el w/lisp/international/mule-cmds.el
> index 9d22d6e..192f0e9 100644
> --- i/lisp/international/mule-cmds.el
> +++ w/lisp/international/mule-cmds.el
> @@ -1797,10 +1797,11 @@ The default status is as follows:
>     'raw-text)
>  
>    (set-default-coding-systems nil)
> -  (setq default-sendmail-coding-system 'iso-latin-1)
> -  ;; On Darwin systems, this should be utf-8-unix, but when this file is loaded
> -  ;; that is not yet defined, so we set it in set-locale-environment instead.
> -  (setq default-file-name-coding-system 'iso-latin-1-unix)
> +  (setq default-sendmail-coding-system 'utf-8)
> +  (setq default-file-name-coding-system (if (memq system-type
> +                                                  '(windows-nt ms-dos))
> +                                            'iso-latin-1-unix
> +                                          'utf-8-unix))

Why are we changing sendmail-coding-system?  It has nothing to do with
file names, AFAIK.

>    ;; Preserve eol-type from existing default-process-coding-systems.
>    ;; On non-unix-like systems in particular, these may have been set
>    ;; carefully by the user, or by the startup code, to deal with the
> @@ -1816,8 +1817,10 @@ The default status is as follows:
>  	(input-coding
>  	 (condition-case nil
>  	     (coding-system-change-text-conversion
> -	      (cdr default-process-coding-system) 'iso-latin-1)
> -	   (coding-system-error 'iso-latin-1))))
> +	      (cdr default-process-coding-system)
> +	      (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
> +	   (coding-system-error
> +	    (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
>      (setq default-process-coding-system
>  	  (cons output-coding input-coding)))

And this changes the default encoding used to communicate with
sub-processes.  Why?  We never talked about a wholesale change of all
the defaults to UTF-8; that is a much broader issue than just the
encoding of file names.
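
For readers unfamiliar with the function used in the hunk above: coding-system-change-text-conversion keeps the end-of-line convention of its first argument and substitutes the text conversion given as its second. The inputs in this minimal sketch are hypothetical stand-ins for whatever default-process-coding-system holds on a given system.

```elisp
;; Keep the DOS end-of-line convention, change only the text encoding.
(coding-system-change-text-conversion 'iso-latin-1-dos 'utf-8)
;; => utf-8-dos

;; The eol-type of the result is unchanged (1 means DOS line endings).
(coding-system-eol-type
 (coding-system-change-text-conversion 'iso-latin-1-dos 'utf-8))
;; => 1
```

So the second argument in the patch decides which encoding sub-process input is decoded with, independently of the line-ending setting that the surrounding code tries to preserve.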

> diff --git i/lisp/mh-e/mh-comp.el w/lisp/mh-e/mh-comp.el
> index 98067ce..25118cd 100644
> --- i/lisp/mh-e/mh-comp.el
> +++ w/lisp/mh-e/mh-comp.el
> @@ -304,6 +304,7 @@ message and scan line."
>    (let ((draft-buffer (current-buffer))
>          (file-name buffer-file-name)
>          (config mh-previous-window-config)
> +        ;; FIXME this is subtly different to select-message-coding-system.
>          (coding-system-for-write
>           (if (and (local-variable-p 'buffer-file-coding-system
>                                      (current-buffer)) ;XEmacs needs two args
> @@ -315,7 +316,7 @@ message and scan line."
>             (or (and (boundp 'sendmail-coding-system) sendmail-coding-system)
>                 (and (default-boundp 'buffer-file-coding-system)
>                      (default-value 'buffer-file-coding-system))
> -               'iso-latin-1))))
> +               'utf-8))))

Changes like that in MH-E should be communicated to the MH-E
developer; I'm not sure he is reading this list.

And you never answered my question about the rationale:

> Btw, why does the default matter so much?  Once Emacs starts up
> default-file-name-coding-system on GNU/Linux is set to UTF-8, if the
> locale says so.  Is this just an aesthetic issue?
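
One way to check what a running session actually ends up with is to inspect the relevant variables after startup. The helper below is a hypothetical sketch (the name my-file-name-coding-report is not part of Emacs); the values it returns depend on the platform and locale, so no output is shown.

```elisp
(defun my-file-name-coding-report ()
  "Return a plist of the coding systems relevant to file names.
Hypothetical helper; the values depend on platform and locale."
  (list :system-type system-type
        :locale locale-coding-system
        :file-name file-name-coding-system
        :default default-file-name-coding-system
        ;; A non-nil `file-name-coding-system' takes precedence;
        ;; otherwise the default is used.
        :effective (or file-name-coding-system
                       default-file-name-coding-system)))

(my-file-name-coding-report)
```

Evaluating this in "emacs -Q" under different locales would show whether the built-in default is ever the value in effect.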
