From: Simon Ledergerber
Newsgroups: gmane.emacs.bugs
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Thu, 21 May 2015 20:50:58 +0200
Message-ID: <555E2912.7060509@gmx.net>
To: 20623@debbugs.gnu.org

Hi

When I was editing XHTML and HTML files, I wanted to make sure the BOM was written out to the file, to make it easier for the browser to detect the UTF-8 encoding. Therefore I changed the coding system of the file's buffer to utf-8-with-signature-dos (since I am working on a Windows system) before saving the file. After some time I was surprised that the browser (IE11) didn't report UTF-8 as the file's encoding. Having checked a hexdump of my (X)HTML file, I saw that the BOM was definitely missing.
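
The hexdump check described above can also be scripted. The following is only a minimal sketch in Python and not part of the original report; the function names `has_utf8_bom` and `leading_bytes_hex` and the sample file names are hypothetical. It assumes the only question is whether the file begins with the three UTF-8 BOM bytes EF BB BF.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def leading_bytes_hex(path, count=8):
    """Return the first `count` bytes of the file as space-separated hex."""
    with open(path, "rb") as f:
        return " ".join(f"{b:02x}" for b in f.read(count))


def demo():
    # Write one sample file with a BOM and one without, then inspect both.
    with tempfile.TemporaryDirectory() as tmp:
        samples = (("with_bom.txt", codecs.BOM_UTF8), ("no_bom.txt", b""))
        for name, prefix in samples:
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(prefix + b"Test\r\n")
            print(name, has_utf8_bom(path), leading_bytes_hex(path))


if __name__ == "__main__":
    demo()
```

Running `has_utf8_bom` on the file after each save in Emacs would show whether the signature survived that save.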

Apparently, when a "UTF-8" string appears in an XML encoding or HTML charset declaration (even if the declaration is commented out, see below), Emacs switches the file's coding system to utf-8 when it saves the file, even if utf-8-with-signature was specified explicitly before. This looks like a bug to me, because there is then no way to restore the BOM using Emacs. I was not sure whether my bug is related to bug #8282, so I decided to report it (again).

My Emacs version is 24.5.1 (x86_64-unknown-cygwin) of 2015-04-10, on Windows 8.1 x64. I am running Emacs in text-only mode inside a Cygwin console.

This is my .emacs.d/init.el:

    (line-number-mode)
    (column-number-mode)
    (setq-default fill-column 80)
    (setq-default buffer-file-coding-system 'utf-8-dos)
    (setq-default indent-tabs-mode nil)

With XML the problem can be reproduced in the most basic way with the following steps:

- Create a new file with C-x C-f in the current directory. Name it test.txt, for example.
- Switch to fundamental mode with M-x fundamental-mode.
- Type the text ''
- Now save the file and check again: The coding system for the buffer has changed to utf-8-dos and the BOM has disappeared from the file!

Now the steps for HTML:

- Create a new file test1.txt in the current directory.
- Fill it with the following simple and yet incomplete HTML5 document:

    Test

- Change the coding system to utf-8-with-signature-dos and save the file.
- Verify that the coding system for the buffer is correct and the BOM is really written: Yes, it is.
- Insert the following *comment* between and :

    <!-- <meta charset="utf-8"> -->

- Save the file and verify: The coding system has changed to utf-8-dos and the BOM has vanished, even though it is just a comment and has no effect!

Regards
Simon

P. S.
Information as reported by M-x report-emacs-bug:

In GNU Emacs 24.5.1 (x86_64-unknown-cygwin)
 of 2015-04-10 on desktop-new
Configured using:
 `configure --srcdir=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/emacs --htmldir=/usr/share/doc/emacs/html -C --with-x=no 'CFLAGS=-ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/build=/usr/src/debug/emacs-24.5-1 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.5-1.x86_64/src/emacs-24.5=/usr/src/debug/emacs-24.5-1' CPPFLAGS= LDFLAGS='

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Help

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Beginning of buffer [3 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
Mark set [2 times]
Auto-saving...done
Mark set [2 times]
Saving file /cygdrive/c/users/.../html_basics/basic.xhtml...
Wrote /cygdrive/c/users/.../html_basics/basic.xhtml
No docstring slot for help-mode-setup
No docstring slot for help-mode-finish

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
help-fns mail-prsvr mail-utils misearch multi-isearch mule-diag
help-mode easymenu regexp-opt sgml-mode xterm time-date tooltip electric
uniquify ediff-hook vc-hooks lisp-float-type tabulated-list newcomment
lisp-mode prog-mode register page menu-bar rfn-eshadow timer select
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
dbusbind gfilenotify multi-tty emacs)

Memory information:
((conses 16 81797 4691) (symbols 48 17091 0) (miscs 40 73 387)
 (strings 32 11233 4887) (string-bytes 1 291872) (vectors 16 7587)
 (vector-slots 8 342125 27930) (floats 8 57 393)
 (intervals 56 834 26) (buffers 960 21))
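
The report states that the BOM cannot be restored from within Emacs once the declaration is present. As a possible stopgap outside Emacs, the signature could be prepended at the byte level. The following is a minimal sketch in Python, not part of the original report and not a fix for the bug; the helper name `ensure_utf8_bom` is hypothetical, and it assumes the file is already valid UTF-8 and small enough to read into memory. Going by the behaviour described in the report, saving the file from Emacs again would presumably drop the BOM once more.

```python
import codecs
import os
import tempfile


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM if it is missing.

    Returns True if the file was rewritten, False if it already had a BOM.
    Line endings and all other bytes are left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


def demo():
    # Create a BOM-less sample file, add the BOM, and show it is idempotent.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test1.txt")
        with open(path, "wb") as f:
            f.write(b"Test\r\n")
        print("first call rewrote file:", ensure_utf8_bom(path))
        print("second call rewrote file:", ensure_utf8_bom(path))


if __name__ == "__main__":
    demo()
```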