unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Simon Ledergerber <sledergerber@gmx.net>
Cc: 20623@debbugs.gnu.org
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Fri, 22 May 2015 10:11:31 +0300
Message-ID: <83egm95boc.fsf@gnu.org>
In-Reply-To: <555E44EB.6070604@gmx.net>

[Please don't remove the bug address from the CC list, so that this
discussion is recorded in the bug database.]

> Date: Thu, 21 May 2015 22:49:47 +0200
> From: Simon Ledergerber <sledergerber@gmx.net>
> 
> From the documentation I understand that utf-8 is without BOM and 
> utf-8-with-signature is with BOM. Maybe I am wrong and should rather 
> understand that utf-8 is auto-detect. But then there is something like 
> utf-8-without-signature missing to specify explicitly that no BOM is 
> desired.
> 
> In my opinion, it is correct when Emacs prefers utf-8 over 
> utf-8-with-signature when it opens a file without BOM that can still be 
> recognized as UTF-8.
> 
> However when a file is opened with a BOM already present, it should 
> stick to the utf-8-with-signature coding system, because the BOM "EF BB 
> BF" unambiguously marks the file as UTF-8. (For UTF-16 for example, 
> there is a different BOM byte pattern. There are other coding systems 
> which do not have a BOM at all.)

What do you mean by "stick to"?  When I try visiting an XML file that
is encoded with BOM, Emacs decodes the file correctly, and the value
of buffer-file-coding-system is utf-8-with-signature.  Isn't that what
you want?  If that's what you want, but it doesn't happen for you,
please try in "emacs -Q".  It's possible that the default you set:

  (setq-default buffer-file-coding-system 'utf-8-dos)

is the reason for what you see.  (I don't understand why you need such
a default, and it sounds like a bad idea to me.)
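
As a minimal sketch of how to check this, the forms below can be
evaluated with M-: in the buffer visiting the file, ideally in a
session started with "emacs -Q".  The sample string "a" is only an
illustration; the variable and function are standard Emacs ones.

```elisp
;; Inspect what Emacs detected for the visited file.  For a file that
;; starts with a BOM this should name a utf-8-with-signature variant.
buffer-file-coding-system

;; Compare what the two coding systems write for the same text.
;; utf-8 emits only the encoded characters ...
(encode-coding-string "a" 'utf-8)

;; ... while utf-8-with-signature should prefix the result with the
;; three BOM bytes EF BB BF.
(encode-coding-string "a" 'utf-8-with-signature)
```

If the first form already reports plain utf-8 right after visiting a
file that has a BOM, the customized default is the first thing to rule
out.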

> By doing C-x <RET> f and then saving it with C-x C-s, I expect to be 
> able to change the coding system.  For example, if I specify utf-8-dos, 
> the BOM should be removed, if one was present, and CR LF should be 
> inserted for EOL. On the other hand, if I choose 
> utf-8-with-signature-unix, a BOM should be written and LF be taken for 
> EOL. (The conversion between DOS and Unix works, just the BOM is the 
> problem.)
> 
> I have found a link, where this topic was already discussed, but it 
> didn't help me further:
> http://superuser.com/questions/41254/make-emacs-not-remove-the-bom-from-xml-files
> 
> In that post Vebjorn Ljosa asked exactly the question I have. Richard 
> Hoskins replies with the answer to change the coding system with C-x 
> <RET> r utf-8-with-signature. Unfortunately, it didn't work for me - 
> after doing a change in the file and saving, it got back to utf-8 
> automatically - that's why I have filed the bug.

That's not how you force a file to be saved in a specific encoding.
You should do this instead:

  C-x RET c utf-8-with-signature RET C-x C-s

The "C-x RET c" prefix forces the next Emacs command to use the
specified encoding.  In this case, Emacs will ask for confirmation,
because the encoding you specified is different from what the XML
declaration says.
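
For use from Lisp, a rough equivalent of the key sequence above is to
bind coding-system-for-write around the save.  This is only a sketch:
the command name my-save-with-bom is hypothetical and not part of
Emacs, and it is an assumption that this form behaves like the
interactive prefix in every respect; in particular, it may not prompt
for confirmation the way "C-x RET c" does.

```elisp
;; Hypothetical helper, not part of Emacs: save the current buffer
;; once with a BOM, overriding buffer-file-coding-system for this
;; one write only.
(defun my-save-with-bom ()
  "Save the current buffer encoded as utf-8-with-signature."
  (interactive)
  ;; coding-system-for-write takes precedence over the buffer's own
  ;; coding system for write operations inside this binding.
  (let ((coding-system-for-write 'utf-8-with-signature))
    (save-buffer)))
```

As with C-x C-s, save-buffer writes nothing if the buffer is
unmodified, so make a change first when testing this.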





