unofficial mirror of bug-gnu-emacs@gnu.org 
From: Vincent Lefevre <vincent@vinc17.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: a.s@realize.ch, monnier@iro.umontreal.ca, 20623@debbugs.gnu.org,
	sledergerber@gmx.net
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sat, 11 Aug 2018 17:41:01 +0200
Message-ID: <20180811154101.GB4800@zira.vinc17.org>
In-Reply-To: <83zhxtkwqq.fsf@gnu.org>

On 2018-08-11 13:45:17 +0300, Eli Zaretskii wrote:
> > Date: Sat, 11 Aug 2018 12:13:41 +0200
> > From: Vincent Lefevre <vincent@vinc17.net>
> > Cc: monnier@iro.umontreal.ca, rgm@gnu.org, sledergerber@gmx.net,
> > 	a.s@realize.ch, 20623@debbugs.gnu.org
> > 
> > On 2018-08-11 12:15:31 +0300, Eli Zaretskii wrote:
> > > In this case, I cannot but express my extreme surprise to see such a
> > > minor issue described as "grave".  The alleged data loss is minor, if
> > > it exists at all (the BOM is not data important for the user,
> > 
> > You're completely wrong. Whether a BOM is present is very important
> > for some applications, such as Firefox (not to determine the charset,
> > but the MIME type of local files).
> 
> Please provide the details, including the use case, if possible.  I'm
> still in the dark regarding the importance of the BOM in UTF-8 encoded
> HTML stuff.

  https://bugzilla.mozilla.org/show_bug.cgi?id=1422889

for HTML. It is marked WONTFIX because of:

  https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm

For text/plain only (but this is another example showing that the BOM
can matter in practice), there's

  https://bugzilla.mozilla.org/show_bug.cgi?id=1071816

(which is a bug that should be fixed).

> > It can be repaired, but the problems are that the user doesn't know
> > what's going on and that this breaks things.
> 
> I agree about the user not knowing, but that doesn't yet qualify as
> "data loss", which has a widely accepted meaning.

This is data corruption, which is a form of data loss, because some
information is lost in the process (recall that Emacs gives the user
no indication of this transformation).
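
To make the lost information concrete, here is a minimal Python sketch.
It is an illustration only, not Emacs code: the sample document and the
helper name has_utf8_bom are made up for this example, and the round
trip merely mimics the effect described in this bug (read as
utf-8-with-signature, written back as plain utf-8).

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical sample document, used only for illustration.
text = '<?xml version="1.0" encoding="utf-8"?>\n<root/>\n'

# "utf-8-sig" prepends the BOM when encoding; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# Mimic the reported behaviour: decode a file that has a BOM, then write
# it back as plain UTF-8. The text is unchanged, but the BOM is gone.
round_trip = with_bom.decode("utf-8-sig").encode("utf-8")

print(has_utf8_bom(with_bom))
print(has_utf8_bom(round_trip))
```

The decoded text is identical in both cases, so nothing looks different
in the editor; only the first three bytes on disk change, and those are
the bytes that a sniffer looking for a BOM inspects.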

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)






