From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sun, 10 Dec 2017 21:17:00 +0200
Message-ID: <838teatmtv.fsf@gnu.org>
In-Reply-To: <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> (message from Stefan Monnier on Mon, 04 Dec 2017 16:08:14 -0500)
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org, a.s@realize.ch, sledergerber@gmx.net, 20623@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 16:08:14 -0500
>
> > Isn't it better to fix this in sgml-xml-auto-coding-function? That's
> > where the root cause is, AFAIU.
>
> I'd expect the same problem would affect all other uses.

Not sure what you meant by "all other uses". Could you please
elaborate?

> > And I don't understand the comment about latin-1-mac: I don't think we
> > have such problems in Emacs. The -with-signature variety is
> > different, because it is not about EOL format.
>
> You might be right, but I don't know where/how this is handled.

I would like to propose the following alternative patch, which accepts
utf-8-with-signature and utf-8-hfs as variants of utf-8 for the
purposes of encoding of XML files. Comments? Do we want a similar
treatment for UTF-16? (That doesn't seem to be required by the bug
report, and UTF-16 in XML files is non-standard anyway. But what
about HTML?)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 857fa80..5ff1acf 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2493,7 +2493,17 @@ sgml-xml-auto-coding-function
             (let* ((match (match-string 1))
                    (sym (intern (downcase match))))
               (if (coding-system-p sym)
-                  sym
+                  ;; If the encoding tag is UTF-8 and the buffer's
+                  ;; encoding is one of the variants of UTF-8, use the
+                  ;; buffer's encoding. This allows, e.g., saving an
+                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
+                  (if (and (coding-system-equal 'utf-8
+                                                (coding-system-type sym))
+                           (coding-system-equal sym
+                                                (coding-system-type
+                                                 buffer-file-coding-system)))
+                      buffer-file-coding-system
+                    sym)
                 (message "Warning: unknown coding system \"%s\"" match)
                 nil))
             ;; Files without an encoding tag should be UTF-8. But users
@@ -2506,7 +2516,8 @@ sgml-xml-auto-coding-function
                      (coding-system-base
                       (detect-coding-region (point-min) size t)))))
               ;; Pure ASCII always comes back as undecided.
-              (if (memq detected '(utf-8 undecided))
+              (if (memq detected
+                        '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                   'utf-8
                 (warn "File contents detected as %s.
 Consider adding an encoding attribute to the xml declaration,