From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sat, 11 Aug 2018 19:27:33 +0300
Message-ID: <83pnyolvgq.fsf@gnu.org>
Cc: a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org, sledergerber@gmx.net
To: Vincent Lefevre
In-reply-to: <20180811154101.GB4800@zira.vinc17.org> (message from Vincent Lefevre on Sat, 11 Aug 2018 17:41:01 +0200)

> Date: Sat, 11 Aug 2018 17:41:01 +0200
> From: Vincent Lefevre
> Cc: monnier@iro.umontreal.ca, rgm@gnu.org, sledergerber@gmx.net,
>     a.s@realize.ch, 20623@debbugs.gnu.org
>
> > > You're completely wrong. The presence of BOM or not is very important
> > > for some applications, such as Firefox (not to determine the charset,
> > > but the MIME type of local files).
> >
> > Please provide the details, including the use case, if possible. I'm
> > still in the dark regarding the importance of the BOM in UTF-8 encoded
> > HTML stuff.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1422889
>
> for HTML. Wontfix because of:
>
> https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm
>
> For text/plain only (but this is another example that BOM can matter
> in practice), there's
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1071816
>
> (which is a bug that should be fixed).

Maybe I'm missing something, but none of these issues describes the
situation in this bug report, namely: an HTML file with an explicit
charset= tag, with or without a BOM. In fact, the first of these
issues happens only in files that _do_ have a BOM, so you could say
that Emacs did you a favor by removing it ;-)

> > I agree about the user not knowing, but that doesn't yet qualify as
> > "data loss", which has an widely accepted meaning.
>
> This is data corruption, which is a form of data loss, because some
> information is lost in the process (I recall that Emacs does not
> provide any information to the user about this transformation).

That is the most inclusive interpretation of "data loss" I've ever
seen. "Some information is lost" is nowhere near what "grave bug"
means by "data loss", so I don't think "grave" applies here.

Anyway, the Emacs issue is now fixed.