* Understanding how to specify UTF-8
@ 2017-04-07 23:43 Will Parsons
  2017-04-08  7:29 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Will Parsons @ 2017-04-07 23:43 UTC (permalink / raw)
  To: help-gnu-emacs

I want to always use Unicode/UTF-8 unless otherwise specified.  I've noticed
that I've attempted to do this in my .emacs file in two separate ways on two
separate platforms:

1)  (setq-default buffer-file-coding-system 'utf-8-unix)

2)  (set-language-environment "UTF-8")

Both seem to work, but I'm wondering if there are subtle differences between
the two that I should be aware of.

-- 
Will



* Re: Understanding how to specify UTF-8
  2017-04-07 23:43 Understanding how to specify UTF-8 Will Parsons
@ 2017-04-08  7:29 ` Eli Zaretskii
  2017-04-13  5:09 ` B. T. Raven
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2017-04-08  7:29 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Will Parsons <wbp@nodomain.invalid>
> Date: 7 Apr 2017 23:43:55 GMT
> 
> I want to always use Unicode/UTF-8 unless otherwise specified.

This doesn't tell us what exactly you want to happen.  The above
basically says "I want to use UTF-8 except when I don't", and doesn't
say a word about those "I don't" cases.  So please elaborate, so that
the responses can be more accurate.

For example, what about files you edit that were previously encoded in
something other than UTF-8?  What about responding to email encoded in
something other than UTF-8?  And so on.

> I've noticed that I've attempted to do this in my .emacs file in two
> separate ways on two separate platforms:
> 
> 1)  (setq-default buffer-file-coding-system 'utf-8-unix)
> 
> 2)  (set-language-environment "UTF-8")
> 
> Both seem to work, but I'm wondering if there are subtle differences between
> the two that I should be aware of.

The second one is better, as it leaves Emacs more leeway where UTF-8
might not be appropriate.  But it's difficult to know what to tell you
without the additional information.




* Re: Understanding how to specify UTF-8
  2017-04-07 23:43 Understanding how to specify UTF-8 Will Parsons
  2017-04-08  7:29 ` Eli Zaretskii
@ 2017-04-13  5:09 ` B. T. Raven
  2017-04-13  6:37   ` (unknown) Eli Zaretskii
                     ` (2 more replies)
  2017-04-21  9:28 ` Jason Rumney
  2017-04-21 18:30 ` Understanding how to specify UTF-8 Stefan Monnier
  3 siblings, 3 replies; 13+ messages in thread
From: B. T. Raven @ 2017-04-13  5:09 UTC (permalink / raw)
  To: help-gnu-emacs

Hi Will. I decided to respond because of this observation in the latest 
posting:
"They used to say emacs and vi are religions; these days they are 
starting to seem like latin."

On 4/7/2017 18:43, Will Parsons wrote:
> I want to always use Unicode/UTF-8 unless otherwise specified.  I've noticed
> that I've attempted to do this in my .emacs file in two separate ways on two
> separate platforms:
>
> 1)  (setq-default buffer-file-coding-system 'utf-8-unix)
>
> 2)  (set-language-environment "UTF-8")
>
> Both seem to work, but I'm wondering if there are subtle differences between
> the two that I should be aware of.


I can't help with any subtleties but can only recommend that you add this 
cookie to the beginning of the buffer:

  ;; -*- coding: utf-8 -*-
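
For illustration only, here is a sketch of the two places Emacs looks for a
file-local coding declaration in a Lisp file: the first line, or a local
variables list near the end. The `message` call is just an assumed way to
check which coding system the buffer ended up with.

```elisp
;; -*- coding: utf-8 -*-
;; The cookie above must sit on the first line of the file
;; (or the second line, if the first one is a #! line).

;; Illustrative check: show the coding system of the current buffer.
(message "This buffer uses: %s" buffer-file-coding-system)

;; Alternative form: a local variables list near the end of the file.
;; Local Variables:
;; coding: utf-8
;; End:
```

Either form affects only the file that contains it, so it does not change
the defaults for other files.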


I think it may be enough to save and reload the file into a new buffer 
before adding exotic characters.
I also have these lines in my .emacs:

   (set-locale-environment "utf-8")
   (set-language-environment 'utf-8)
   (set-default-coding-systems 'utf-8)
   (setq file-name-coding-system 'utf-8)
   (setq buffer-file-coding-system 'utf-8)
   (setq coding-system-for-write 'utf-8)
   (set-keyboard-coding-system 'utf-8)
   (set-terminal-coding-system 'utf-8)
   (prefer-coding-system 'utf-8)
   ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
   (modify-coding-system-alist 'process
                               "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)


The commented-out line caused a problem, but I don't remember what it 
was. My OS is 64-bit Windows 7.

Ed



* (unknown)
  2017-04-13  5:09 ` B. T. Raven
@ 2017-04-13  6:37   ` Eli Zaretskii
  2017-04-13  7:18   ` Understanding how to specify UTF-8 Eli Zaretskii
  2017-04-14 23:37   ` Will Parsons
  2 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2017-04-13  6:37 UTC (permalink / raw)
  To: help-gnu-emacs

> From: "B. T. Raven" <btraven@nihilo.net>
> Date: Thu, 13 Apr 2017 00:09:51 -0500
> 
> I also have these lines in my .emacs:
> 
>    (set-locale-environment   "utf-8")
>          (set-language-environment               'utf-8)
>          (set-default-coding-systems             'utf-8)
>          (setq file-name-coding-system           'utf-8)
>          (setq buffer-file-coding-system 'utf-8)
>          (setq coding-system-for-write           'utf-8)
>          (set-keyboard-coding-system             'utf-8)
>          (set-terminal-coding-system          'utf-8)
>          (prefer-coding-system                   'utf-8)
>          ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
>          (modify-coding-system-alist 'process 
> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
> 
> 
> The line commented out caused a problem but I don't remember what it 
> was. My os w64 vers. 7

Some of the above are not recommended, and some are downright
dangerous (a.k.a. "shooting yourself in the foot").  Especially on
MS-Windows, UTF-8 should be used with extra care, because Windows only
partially supports this encoding in its APIs.

Specifically:

>    (set-locale-environment   "utf-8")

Don't do this on Windows, as Windows locales cannot use UTF-8 as their
encoding.

>          (set-language-environment               'utf-8)
>          (set-default-coding-systems             'utf-8)

Redundant as long as you have the prefer-coding-system call below.

>          (setq file-name-coding-system           'utf-8)

This is a no-op: Emacs on Windows ignores the value of this variable,
except if you are on Windows 9X, and file names cannot be encoded in
UTF-8 on Windows anyway.  Starting with Emacs 24.4, Emacs on Windows
uses Unicode APIs to deal with file names, so it supports non-ASCII
file names with all Unicode characters, and you don't need to do
anything to get this support.

>          (setq buffer-file-coding-system 'utf-8)

Dangerous.  Also redundant with prefer-coding-system below.

>          (setq coding-system-for-write           'utf-8)

This is dangerous: it will produce subtle issues with some commands,
notably when invoking subprocesses with non-ASCII strings in
command-line arguments.  This variable exists so that Lisp programs
could force specific encoding where appropriate, so leave it to that
and don't globally set it.
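
As a rough sketch of what "leave it to Lisp programs" can look like, a
program can bind the variable with `let` around a single write, so the
override disappears as soon as the write is done. The function name
`my-write-region-as-utf-8` is hypothetical and only serves the example.

```elisp
;; Hypothetical helper: force UTF-8 for one write only.
;; The `let' binding is undone when the form exits, so nothing else in
;; the session (subprocess calls included) sees the override.
(defun my-write-region-as-utf-8 (start end file)
  "Write the text between START and END to FILE, encoded as UTF-8."
  (let ((coding-system-for-write 'utf-8-unix))
    (write-region start end file)))
```

A caller would use it like `(my-write-region-as-utf-8 (point-min) (point-max) "out.txt")`,
where the file name is again just a placeholder.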

>          (set-keyboard-coding-system             'utf-8)
>          (set-terminal-coding-system          'utf-8)

These are wrong, and will get in the way when you work in -nw
sessions.  Emacs on MS-Windows doesn't fully support UTF-8 encoding of
keyboard input and console output, even if you tweak your system's
codepage to be 65001 (did you?).

>          (prefer-coding-system                   'utf-8)

This is the only setting that you should have if you want to use UTF-8
wherever possible and reasonable.

>          ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
>          (modify-coding-system-alist 'process 
> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)

This is wrong: Emacs on MS-Windows doesn't support UTF-8 encoding of
program command-line arguments for subprocesses, and most Windows
programs will NOT talk UTF-8 in their standard streams.
prefer-coding-system should take care of those situations where this
is possible/actually happens; the rest should be left alone, or you
will have subtle problems with non-ASCII I/O vis-a-vis subprocesses.
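
To make the upshot concrete, a minimal init-file sketch following this
advice could be as short as the one call below. The `message` line is an
assumption added for illustration, as one way to see what default new
files will get; it is not part of the advice itself.

```elisp
;; Minimal sketch: ask Emacs to prefer UTF-8 wherever it has a choice,
;; without forcing it on the keyboard, the terminal, or subprocesses.
(prefer-coding-system 'utf-8)

;; Optional, illustrative check of the resulting default for new files.
(message "Default file coding system: %s"
         (default-value 'buffer-file-coding-system))
```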

HTH




* Re: Understanding how to specify UTF-8
  2017-04-13  5:09 ` B. T. Raven
  2017-04-13  6:37   ` (unknown) Eli Zaretskii
@ 2017-04-13  7:18   ` Eli Zaretskii
  2017-04-13  9:42     ` hector
  2017-04-14 23:37   ` Will Parsons
  2 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2017-04-13  7:18 UTC (permalink / raw)
  To: help-gnu-emacs

[Resending with the correct Subject.]





* Re: Understanding how to specify UTF-8
  2017-04-13  7:18   ` Understanding how to specify UTF-8 Eli Zaretskii
@ 2017-04-13  9:42     ` hector
  0 siblings, 0 replies; 13+ messages in thread
From: hector @ 2017-04-13  9:42 UTC (permalink / raw)
  To: help-gnu-emacs

@Eli: Thank you. Everything works better when you know what you're doing.




* Re: Understanding how to specify UTF-8
  2017-04-13  5:09 ` B. T. Raven
  2017-04-13  6:37   ` (unknown) Eli Zaretskii
  2017-04-13  7:18   ` Understanding how to specify UTF-8 Eli Zaretskii
@ 2017-04-14 23:37   ` Will Parsons
  2 siblings, 0 replies; 13+ messages in thread
From: Will Parsons @ 2017-04-14 23:37 UTC (permalink / raw)
  To: help-gnu-emacs

B. T. Raven wrote:
> Hi Will. I decided to respond because of this observation in the latest 
> posting:
> "They used to say emacs and vi are religions; these days they are 
> starting to seem like latin."

Not completely - "Emacs" should be spelt "Emax" first ;)
(And the plural, I suppose should be "emaces" rather than "emacsen".)

> On 4/7/2017 18:43, Will Parsons wrote:
>> I want to always use Unicode/UTF-8 unless otherwise specified.  I've noticed
>> that I've attempted to do this in my .emacs file in two separate ways on two
>> separate platforms:
>>
>> 1)  (setq-default buffer-file-coding-system 'utf-8-unix)
>>
>> 2)  (set-language-environment "UTF-8")
>>
>> Both seem to work, but I'm wondering if there are subtle differences between
>> the two that I should be aware of.
>
> I can't help with any subtleties but can only recommend that you add this 
> cookie to the beginning of the buffer:
>
>   ;; -*- coding: utf-8 -*-

Yes, I've employed that too.  (Incidentally, I've been programming a lot in
Ruby for some years now, and I was surprised to find that after inserting a
copyright symbol (©) into one of my Ruby source files, that Emacs ruby-mode
inserted a line containing '# coding: utf-8' at the top when the file was
saved.)

> I think it may be enough to save and reload the file into a new buffer 
> before adding exotic characters.
> I also have these lines in my .emacs:
>
>    (set-locale-environment   "utf-8")
>          (set-language-environment               'utf-8)
>          (set-default-coding-systems             'utf-8)
>          (setq file-name-coding-system           'utf-8)
>          (setq buffer-file-coding-system 'utf-8)
>          (setq coding-system-for-write           'utf-8)
>          (set-keyboard-coding-system             'utf-8)
>          (set-terminal-coding-system          'utf-8)
>          (prefer-coding-system                   'utf-8)
>          ;; (set-buffer-process-coding-system 'utf-8 'utf-8)
>          (modify-coding-system-alist 'process 
> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>
> The line commented out caused a problem but I don't remember what it 
> was. My os w64 vers. 7

Wow.  I should think that should cover all possibilities.  I prefer to be a
bit more minimalist than that though...

Anyway, thanks - Vale Edwarde!

-- 
Will



* Re: Understanding how to specify UTF-8
  2017-04-07 23:43 Understanding how to specify UTF-8 Will Parsons
  2017-04-08  7:29 ` Eli Zaretskii
  2017-04-13  5:09 ` B. T. Raven
@ 2017-04-21  9:28 ` Jason Rumney
  2017-04-21 10:54   ` Eli Zaretskii
                     ` (2 more replies)
  2017-04-21 18:30 ` Understanding how to specify UTF-8 Stefan Monnier
  3 siblings, 3 replies; 13+ messages in thread
From: Jason Rumney @ 2017-04-21  9:28 UTC (permalink / raw)
  To: help-gnu-emacs

On Saturday, 8 April 2017 07:43:58 UTC+8, Will Parsons  wrote:
> I want to always use Unicode/UTF-8 unless otherwise specified.  I've noticed
> that I've attempted to do this in my .emacs file in two separate ways on two
> separate platforms:
> 
> 1)  (setq-default buffer-file-coding-system 'utf-8-unix)
> 
> 2)  (set-language-environment "UTF-8")
> 
> Both seem to work, but I'm wondering if there are subtle differences between
> the two that I should be aware of.

The first only sets the default coding system for files.

The second sets it for everything, including the system clipboard, file names, process I/O ...

On modern GNU/Linux, Mac or other POSIX-based OSes, you probably want everything in UTF-8, so the latter is correct.

On Windows, the system itself does not support UTF-8 fully, so the former is safer. For clipboard and file names on Windows, the latest versions of Emacs will use Unicode regardless of what you specify for the coding system. It is really only process I/O that is the problem: Cygwin and Mingw apps may support UTF-8 I/O, but native Windows apps (including the cmd.exe shell) can have severe difficulties with it.
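
One way to fold this into a single .emacs shared between platforms is to
branch on `system-type`. This is only a sketch of the split described
above, under the assumption that native Windows builds report `windows-nt`
(or `ms-dos`), while Cygwin builds report `cygwin` and so take the second branch.

```elisp
;; Sketch: pick the UTF-8 setup according to the platform.
(if (memq system-type '(windows-nt ms-dos))
    ;; Native Windows: only default new files to UTF-8 and leave
    ;; process I/O, keyboard and terminal encodings alone.
    (setq-default buffer-file-coding-system 'utf-8-unix)
  ;; GNU/Linux, macOS and other POSIX-like systems: UTF-8 throughout.
  (set-language-environment "UTF-8"))
```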




* Re: Understanding how to specify UTF-8
  2017-04-21  9:28 ` Jason Rumney
@ 2017-04-21 10:54   ` Eli Zaretskii
  2017-04-21 17:36   ` Will Parsons
  2017-05-29 15:16   ` Understanding cross version problem Francis Belliveau
  2 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2017-04-21 10:54 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Fri, 21 Apr 2017 02:28:45 -0700 (PDT)
> From: Jason Rumney <jasonrumney@gmail.com>
> 
> On Windows, the system itself does not support UTF-8 fully, so the former is safer. For clipboard and file names on Windows, the latest versions of Emacs will use Unicode regardless of what you specify for the coding system, it is really only process I/O that is the problem - Cygwin and Mingw apps may support UTF-8 I/O, but native Windows apps (including the cmd.exe shell) can have severe difficulties with it.

MinGW apps are native apps, so they don't support UTF-8.  I think you
meant MSYS, not MinGW (and then only MSYS2 apps support UTF-8).




* Re: Understanding how to specify UTF-8
  2017-04-21  9:28 ` Jason Rumney
  2017-04-21 10:54   ` Eli Zaretskii
@ 2017-04-21 17:36   ` Will Parsons
  2017-05-29 15:16   ` Understanding cross version problem Francis Belliveau
  2 siblings, 0 replies; 13+ messages in thread
From: Will Parsons @ 2017-04-21 17:36 UTC (permalink / raw)
  To: help-gnu-emacs

Jason Rumney wrote:
> On Saturday, 8 April 2017 07:43:58 UTC+8, Will Parsons  wrote:
>> I want to always use Unicode/UTF-8 unless otherwise specified.  I've noticed
>> that I've attempted to do this in my .emacs file in two separate ways on two
>> separate platforms:
>> 
>> 1)  (setq-default buffer-file-coding-system 'utf-8-unix)
>> 
>> 2)  (set-language-environment "UTF-8")
>> 
>> Both seem to work, but I'm wondering if there are subtle differences between
>> the two that I should be aware of.
>
> The first only sets the default coding system for Files.
>
> The second sets it for everything, including system clipboard, file names, process I/O ...
>
> On modern GNU/Linux, Mac or other Posix based OS's, you probably want everything in UTF-8, so the latter is correct. 
>
> On Windows, the system itself does not support UTF-8 fully, so the former is safer. For clipboard and file names on Windows, the latest versions of Emacs will use Unicode regardless of what you specify for the coding system, it is really only process I/O that is the problem - Cygwin and Mingw apps may support UTF-8 I/O, but native Windows apps (including the cmd.exe shell) can have severe difficulties with it.

Thank you for this detailed answer.  Interestingly enough, I have them
reversed in my Unix vs Windows configurations.

-- 
Will



* Re: Understanding how to specify UTF-8
  2017-04-07 23:43 Understanding how to specify UTF-8 Will Parsons
                   ` (2 preceding siblings ...)
  2017-04-21  9:28 ` Jason Rumney
@ 2017-04-21 18:30 ` Stefan Monnier
  3 siblings, 0 replies; 13+ messages in thread
From: Stefan Monnier @ 2017-04-21 18:30 UTC (permalink / raw)
  To: help-gnu-emacs

> I want to always use Unicode/UTF-8 unless otherwise specified.

If your locale is using utf-8 (which it should nowadays in most cases
under GNU/Linux, especially if you "want to always use Unicode/UTF-8"),
then Emacs should already do that automatically.
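
A quick way to see what Emacs picked up from the locale, before adding
anything to .emacs, is to print the relevant values. The helper name
`my-describe-utf8-setup` is hypothetical; the variables it reads are
standard ones.

```elisp
;; Hypothetical helper: summarize what Emacs derived from the locale.
(defun my-describe-utf8-setup ()
  "Return a string describing the current locale-related settings."
  (format "LANG=%s  language-env=%s  locale-coding=%s  file-default=%s"
          (getenv "LANG")
          current-language-environment
          locale-coding-system
          (default-value 'buffer-file-coding-system)))

(message "%s" (my-describe-utf8-setup))
```

If the values already mention UTF-8 when Emacs is started with `emacs -Q`,
no extra configuration should be needed on that machine.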


        Stefan





* Understanding cross version problem
  2017-04-21  9:28 ` Jason Rumney
  2017-04-21 10:54   ` Eli Zaretskii
  2017-04-21 17:36   ` Will Parsons
@ 2017-05-29 15:16   ` Francis Belliveau
  2017-05-29 16:38     ` Drew Adams
  2 siblings, 1 reply; 13+ messages in thread
From: Francis Belliveau @ 2017-05-29 15:16 UTC (permalink / raw)
  To: help-gnu-emacs

I have encountered something that does not make sense to me.
I am normally running version 23.1 but my OS command line binds to 22.1

I have the following line in my .emacs file
  (if (boundp tool-bar-mode) (tool-bar-mode -1))

I put that there to eliminate an error from 22.1 about the missing variable.  However, when I run with --debug-init I am still being told:
  "void-variable tool-bar-mode"
I thought that is what "boundp" was checking?

What have I missed?

Fran



* RE: Understanding cross version problem
  2017-05-29 15:16   ` Understanding cross version problem Francis Belliveau
@ 2017-05-29 16:38     ` Drew Adams
  0 siblings, 0 replies; 13+ messages in thread
From: Drew Adams @ 2017-05-29 16:38 UTC (permalink / raw)
  To: Francis Belliveau, help-gnu-emacs

>   (if (boundp tool-bar-mode) (tool-bar-mode -1))

Change (boundp tool-bar-mode) to (boundp 'tool-bar-mode).

> I put that there to eliminate an error from 22.1 about the missing variable.
> However, when I -debug-init I am still being told:
>   "void-variable tool-bar-mode"
> I thought that is what "boundp" was checking?
> 
> What have I missed?

See above.
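
To spell out why the quote matters: `boundp` is an ordinary function, so
its argument is evaluated before the call. Without the quote, Emacs tries
to read the variable's value first and signals the error right there. The
sketch below uses a hypothetical, deliberately unbound variable named
`my-demo-unbound-variable`; the `fboundp` variant at the end is a suggested
alternative, not something from the original question.

```elisp
;; Unquoted: the argument is evaluated first, so `void-variable' is
;; signaled before `boundp' ever runs.  We catch it to show the error.
(defvar my-demo-result
  (condition-case err
      (boundp my-demo-unbound-variable)
    (void-variable (car err))))
;; my-demo-result is now the symbol `void-variable'.

;; Quoted: `boundp' receives the symbol itself and can test it safely.
(boundp 'my-demo-unbound-variable)   ; => nil

;; Since `tool-bar-mode' is being called as a function, testing with
;; `fboundp' is arguably the closer match for the intent.
(when (fboundp 'tool-bar-mode)
  (tool-bar-mode -1))
```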


