From: Simon Ledergerber <sledergerber@gmx.net>
To: 20623@debbugs.gnu.org
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Thu, 21 May 2015 20:50:58 +0200
Message-ID: <555E2912.7060509@gmx.net>

Hi,

When editing XHTML and HTML files, I wanted to make sure the BOM was
written to the file so that the browser could detect the UTF-8 encoding
more easily. I therefore changed the coding system of the file's buffer
to utf-8-with-signature-dos (since I am working on a Windows system)
before saving the file.

After a while I was surprised that the browser (IE11) did not report
UTF-8 as the file's encoding. Checking a hexdump of my (X)HTML file, I
saw that the BOM was indeed missing.

Evidently, when the string "utf-8" appears in a <meta charset="utf-8">
declaration (even a commented-out one; see below) or in an
<?xml version="1.0" encoding="utf-8"?> declaration, Emacs switches the
file's coding system to utf-8 when saving, even if utf-8-with-signature
was specified explicitly beforehand. This looks like a bug to me,
because there is then no longer any way to restore the BOM from within
Emacs.

I was not sure whether this is related to bug #8282, so I decided to
report it (again).

My Emacs version is 24.5.1 (x86_64-unknown-cygwin) of 2015-04-10, on
Windows 8.1 x64. I am running Emacs only in a text terminal (no GUI),
inside a Cygwin console.

This is my .emacs.d/init.el:
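(line-number-mode)                    ; show the line number in the mode line
(column-number-mode)                  ; show the column number in the mode line
(setq-default fill-column 80)         ; fill text at column 80
(setq-default buffer-file-coding-system 'utf-8-dos) ; UTF-8, CRLF line ends
(setq-default indent-tabs-mode nil)   ; indent with spaces, not tabs
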
With XML, the problem can be reproduced most simply with the following
steps:
- Create a new file in the current directory with C-x C-f; name it
  test.txt, for example.
- Switch to Fundamental mode with M-x fundamental-mode.
- Type the text '<?xml version="1.0"' (without the surrounding single
  quotes).
- Switch to a coding system that writes the BOM: C-x RET f
  utf-8-with-signature-dos.
- Verify the buffer's coding system with C-h C (i.e. C-h Shift-c) RET:
  yes, it is the one specified above.
- Kill the help buffer with C-x k if necessary, and save the file with
  C-x C-s.
- Check the file with a hex editor (under the Cygwin Bash shell,
  'od -Ax -t xCaz test.txt' also works): the UTF-8 BOM 'EF BB BF' has
  been written at the beginning of the file.
- Complete the XML declaration as follows: ' encoding="utf-8"?>'
- Save the file again and check: the buffer's coding system has changed
  to utf-8-dos and the BOM has disappeared from the file!
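
A scriptable alternative to the od check in the steps above: a minimal
POSIX shell sketch that only compares the first three bytes of each file
with EF BB BF. It assumes head(1) supports -c (true for the GNU coreutils
shipped with Cygwin); the function name has_utf8_bom and the script name
check-bom.sh are hypothetical, not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: report whether each file given on the command line
# starts with the UTF-8 BOM (bytes EF BB BF).
has_utf8_bom() {
    # Read the first three bytes, print them as hex via od, and strip
    # spaces/newlines so a BOM yields exactly "efbbbf".
    first3=$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}

for f in "$@"; do
    if has_utf8_bom "$f"; then
        echo "$f: BOM present"
    else
        # Also reached for empty, short, or unreadable files.
        echo "$f: BOM missing"
    fi
done
```

Running `sh check-bom.sh test.txt` after the first save should print
"test.txt: BOM present", and after the save that completes the XML
declaration, "test.txt: BOM missing", matching the behaviour described
above.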

Now the steps for HTML:
- Create a new file test1.txt in the current directory.
- Fill it with the following simple, still incomplete HTML5 document:
    <!doctype html>
    <html>
    <head>
    <title>Test</title>
    </head>
    <body>
    </body>
    </html>
- Change the coding system to utf-8-with-signature-dos and save the file.
- Verify that the buffer's coding system is correct and that the BOM is
  actually written: yes, it is.
- Insert the following *comment* between <head> and <title>:
  <!-- <meta charset="utf-8"> -->
- Save the file and check again: the coding system has changed to
  utf-8-dos and the BOM has vanished, even though the declaration is
  only a comment and has no effect!
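
To start the HTML case from a known byte sequence instead of typing the
document, here is a minimal shell sketch (hypothetical, not part of the
original steps) that writes test1.txt with the UTF-8 BOM and CRLF line
endings, i.e. what utf-8-with-signature-dos is expected to produce for
this text, assuming the last line ends with a newline. It further assumes
that Emacs auto-detects such a file as utf-8-with-signature-dos when it
is visited; confirm that with C-h C RET before inserting the comment.

```shell
#!/bin/sh
# Hypothetical fixture generator: write the HTML test document with a
# UTF-8 BOM and DOS (CRLF) line endings. Output file defaults to test1.txt.
out=${1:-test1.txt}
{
    # UTF-8 BOM: octal 357 273 277 = hex EF BB BF (octal escapes are
    # portable in printf, unlike \x escapes).
    printf '\357\273\277'
    for line in '<!doctype html>' '<html>' '<head>' \
                '<title>Test</title>' '</head>' '<body>' \
                '</body>' '</html>'; do
        printf '%s\r\n' "$line"   # terminate each line with CR LF
    done
} > "$out"
echo "wrote $out"
```

The resulting file is 92 bytes: 3 for the BOM, 73 for the text and 16
for the eight CR LF pairs.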

Regards
Simon

P.S. Information as reported by M-x report-emacs-bug:
In GNU Emacs 24.5.1 (x86_64-unknown-cygwin)
of 2015-04-10 on desktop-new
Configured using:
`configure
--srcdir=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5
--prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc
--docdir=/usr/share/doc/emacs --htmldir=/usr/share/doc/emacs/html -C
--with-x=no 'CFLAGS=-ggdb -O2 -pipe -Wimplicit-function-declaration
-fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/build=/usr/src/debug/emacs-24.5-1
-fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5=/usr/src/debug/emacs-24.5-1'
CPPFLAGS= LDFLAGS='
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Help
Minor modes in effect:
tooltip-mode: t
electric-indent-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Recent messages:
Beginning of buffer [3 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
Mark set [2 times]
Auto-saving...done
Mark set [2 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
No docstring slot for help-mode-setup
No docstring slot for help-mode-finish
Load-path shadows:
None found.
Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
help-fns mail-prsvr mail-utils misearch multi-isearch mule-diag
help-mode easymenu regexp-opt sgml-mode xterm time-date tooltip electric
uniquify ediff-hook vc-hooks lisp-float-type tabulated-list newcomment
lisp-mode prog-mode register page menu-bar rfn-eshadow timer select
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
dbusbind gfilenotify multi-tty emacs)
Memory information:
((conses 16 81797 4691)
(symbols 48 17091 0)
(miscs 40 73 387)
(strings 32 11233 4887)
(string-bytes 1 291872)
(vectors 16 7587)
(vector-slots 8 342125 27930)
(floats 8 57 393)
(intervals 56 834 26)
(buffers 960 21))