From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Alain Schneble Newsgroups: gmane.emacs.bugs Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save Date: Wed, 12 Oct 2016 23:44:57 +0200 Message-ID: <8660oxdyxy.fsf@realize.ch> References: <555E2912.7060509@gmx.net> <83iobl67ao.fsf@gnu.org> <555E44EB.6070604@gmx.net> <83egm95boc.fsf@gnu.org> <555F2D3C.6090608@gmx.net> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: blaine.gmane.org 1476308792 21456 195.159.176.226 (12 Oct 2016 21:46:32 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 12 Oct 2016 21:46:32 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (windows-nt) Cc: Stefan Monnier , 20623@debbugs.gnu.org To: Simon Ledergerber Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Oct 12 23:46:28 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1buRLu-0003y7-TG for geb-bug-gnu-emacs@m.gmane.org; Wed, 12 Oct 2016 23:46:19 +0200 Original-Received: from localhost ([::1]:36064 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1buRLt-0006vj-Fv for geb-bug-gnu-emacs@m.gmane.org; Wed, 12 Oct 2016 17:46:17 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:43765) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1buRLk-0006uC-OU for bug-gnu-emacs@gnu.org; Wed, 12 Oct 2016 17:46:10 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1buRLe-0000xE-QQ for bug-gnu-emacs@gnu.org; Wed, 12 Oct 2016 17:46:07 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:47327) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1buRLe-0000x8-Mu for 
bug-gnu-emacs@gnu.org; Wed, 12 Oct 2016 17:46:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1buRLe-0002HV-8t for bug-gnu-emacs@gnu.org; Wed, 12 Oct 2016 17:46:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Alain Schneble Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 12 Oct 2016 21:46:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 20623 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 20623-submit@debbugs.gnu.org id=B20623.14763087418737 (code B ref 20623); Wed, 12 Oct 2016 21:46:02 +0000 Original-Received: (at 20623) by debbugs.gnu.org; 12 Oct 2016 21:45:41 +0000 Original-Received: from localhost ([127.0.0.1]:53517 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1buRLJ-0002Gr-C2 for submit@debbugs.gnu.org; Wed, 12 Oct 2016 17:45:41 -0400 Original-Received: from clientmail.realize.ch ([46.140.89.53]:2660) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1buRLH-0002GX-8S for 20623@debbugs.gnu.org; Wed, 12 Oct 2016 17:45:39 -0400 Original-Received: from rintintin.hq.realize.ch.lan.rit (Unknown [192.168.0.105]) by clientmail.realize.ch with ESMTP ; Wed, 12 Oct 2016 23:45:17 +0200 Original-Received: from myngb (192.168.66.65) by rintintin.hq.realize.ch.lan.rit (192.168.0.105) with Microsoft SMTP Server (TLS) id 15.0.516.32; Wed, 12 Oct 2016 23:45:04 +0200 In-Reply-To: <555F2D3C.6090608@gmx.net> (Simon Ledergerber's message of "Fri, 22 May 2015 15:21:00 +0200") X-ClientProxiedBy: rintintin.hq.realize.ch.lan.rit (192.168.0.105) To rintintin.hq.realize.ch.lan.rit (192.168.0.105) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the 
Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:124421 Archived-At:

I'm joining this discussion and would like to report a recipe to
reproduce this issue on Windows:

- emacs -Q
- C-x C-f utf-8-bom-test.xml
- Enter the following text in the new buffer:
- C-x RET c utf-8-with-signature-dos C-x C-s yes RET
- C-x k RET
- C-x C-f utf-8-bom-test.xml
- M-: buffer-file-coding-system
  => utf-8-with-signature-dos
- Change buffer content, e.g. add some text to the root element:
  test
- C-x C-s
- M-: buffer-file-coding-system
  => utf-8-dos (expected coding system: utf-8-with-signature-dos)

As was already mentioned in this thread, just visiting the file, then
changing and saving the buffer, is enough to lose the BOM.

This is due to select-safe-coding-system (called by
choose_write_coding_system) fully trusting the coding system identified
by find-auto-coding.  So far so good.  find-auto-coding eventually runs
the functions in auto-coding-functions, which include the built-in
sgml-xml-auto-coding-function.  I think that function should take some
context into account and enrich the derived coding system with a
signature if needed, similar to what select-safe-coding-system does to
enrich the coding system with the proper eol-type.

Does that make sense to you?  If so, I'll try to come up with a patch
that enhances sgml-xml-auto-coding-function to take
buffer-file-coding-system (buffer-local and default value) into account
when it carries the same text conversion but a different signature.  In
that case the proposed "auto coding" shall inherit the signature.

Thanks for any help.
Alain
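
The signature inheritance proposed above could look roughly like the
following.  This is only an untested sketch, not the actual patch: the
helper name my-inherit-bom-sketch is hypothetical, the check is limited
to the UTF-8 family, and how it would be wired into
sgml-xml-auto-coding-function is left open.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-inherit-bom-sketch (proposed current)
  "Return PROPOSED, upgraded to carry a BOM if CURRENT has one.
PROPOSED is the coding system derived from the XML declaration,
CURRENT is the buffer's `buffer-file-coding-system'.  The upgrade
happens only when both are UTF-8 coding systems, PROPOSED has no
signature and CURRENT has one.  The eol-type of PROPOSED is kept."
  (if (and proposed current
           (coding-system-p proposed)
           (coding-system-p current)
           ;; Same text conversion: both belong to the UTF-8 family.
           (eq (coding-system-type proposed) 'utf-8)
           (eq (coding-system-type current) 'utf-8)
           ;; Different signature: only CURRENT writes a BOM.
           (not (coding-system-get proposed :bom))
           (eq (coding-system-get current :bom) t))
      ;; Take the text conversion (including the BOM) from CURRENT,
      ;; but leave the eol conversion of PROPOSED untouched.
      (coding-system-change-text-conversion
       proposed (coding-system-base current))
    proposed))
```

With the recipe above, the function would be called with utf-8 (from
the encoding declaration) and utf-8-with-signature-dos (from the
visited file), and the signature would be carried over instead of
dropped.  Anything that is not a plain UTF-8 vs. UTF-8-with-BOM
mismatch is returned unchanged.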