From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Understanding how to specify UTF-8
Date: Thu, 13 Apr 2017 10:18:04 +0300
Message-ID: <83a87kixn7.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

[Resending with the correct Subject.]

> From: "B. T. Raven"
> Date: Thu, 13 Apr 2017 00:09:51 -0500
>
> I also have these lines in my .emacs:
>
> (set-locale-environment "utf-8")
> (set-language-environment 'utf-8)
> (set-default-coding-systems 'utf-8)
> (setq file-name-coding-system 'utf-8)
> (setq buffer-file-coding-system 'utf-8)
> (setq coding-system-for-write 'utf-8)
> (set-keyboard-coding-system 'utf-8)
> (set-terminal-coding-system 'utf-8)
> (prefer-coding-system 'utf-8)
> ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
> (modify-coding-system-alist 'process
>   "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>
>
> The line commented out caused a problem but I don't remember what it
> was. My os w64 vers. 7

Some of the above are not recommended, and some are downright
dangerous (a.k.a. "shooting yourself in the foot"). Especially on
MS-Windows, UTF-8 should be used with extra care, because Windows only
partially supports this encoding in its APIs. Specifically:

> (set-locale-environment "utf-8")

Don't do this on Windows, as Windows locales cannot use UTF-8 as their
encoding.

> (set-language-environment 'utf-8)
> (set-default-coding-systems 'utf-8)

Redundant as long as you have the prefer-coding-system call below.

> (setq file-name-coding-system 'utf-8)

This is a no-op: Emacs on Windows ignores the value of this variable,
except if you are on Windows 9X, and file names cannot be encoded in
UTF-8 on Windows anyway.
Starting with Emacs 24.4, Emacs on Windows uses Unicode APIs to deal
with file names, so it supports non-ASCII file names with all Unicode
characters, and you don't need to do anything to get this support.

> (setq buffer-file-coding-system 'utf-8)

Dangerous. Also redundant with prefer-coding-system below.

> (setq coding-system-for-write 'utf-8)

This is dangerous: it will produce subtle issues with some commands,
notably when invoking subprocesses with non-ASCII strings in
command-line arguments. This variable exists so that Lisp programs
can force a specific encoding where appropriate, so leave it to that
and don't set it globally.

> (set-keyboard-coding-system 'utf-8)
> (set-terminal-coding-system 'utf-8)

These are wrong, and will get in the way when you work in -nw
sessions. Emacs on MS-Windows doesn't fully support UTF-8 encoding of
keyboard input and console output, even if you tweak your system's
codepage to be 65001 (did you?).

> (prefer-coding-system 'utf-8)

This is the only setting that you should have if you want to use UTF-8
wherever possible and reasonable.

> ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
> (modify-coding-system-alist 'process
>   "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)

This is wrong: Emacs on MS-Windows doesn't support UTF-8 encoding of
program command-line arguments for subprocesses, and most Windows
programs will NOT talk UTF-8 in their standard streams.
prefer-coding-system should take care of those situations where this
is possible or actually happens; the rest should be left alone, or you
will have subtle problems with non-ASCII I/O vis-a-vis subprocesses.

HTH
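
For reference, here is a minimal sketch of what the advice above boils
down to, assuming a stock Emacs 24.4 or later on MS-Windows. The only
global setting in the init file is the prefer-coding-system call. When
a Lisp program genuinely needs a specific encoding, it binds
coding-system-for-write with `let` around the single write operation
instead of setting it globally. The function name
my-write-buffer-as-utf-8 is hypothetical, made up for illustration; it
is not part of Emacs.

```elisp
;; The one global setting: prefer UTF-8 wherever Emacs can use it.
(prefer-coding-system 'utf-8)

;; Hypothetical helper (not part of Emacs): write the whole current
;; buffer to FILE encoded as UTF-8.  `coding-system-for-write' is
;; bound only for the duration of this one `write-region' call, so
;; other commands (including those that start subprocesses) are not
;; affected.
(defun my-write-buffer-as-utf-8 (file)
  "Write the whole current buffer to FILE, encoded as UTF-8."
  (interactive "FWrite buffer as UTF-8 to file: ")
  (let ((coding-system-for-write 'utf-8))
    (write-region (point-min) (point-max) file)))
```

With this, M-x my-write-buffer-as-utf-8 prompts for a file name and
writes the buffer in UTF-8; the binding disappears as soon as the
`let` form exits.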