From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: a.s@realize.ch, 20623@debbugs.gnu.org, sledergerber@gmx.net
Subject: bug#20623: XML and HTML files with encoding/charset="utf-8" declaration lose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save
Date: Sun, 10 Dec 2017 21:17:00 +0200
Message-ID: <838teatmtv.fsf@gnu.org>
In-Reply-To: <jwvr2sab43f.fsf-monnier+emacsbugs@gnu.org> (message from Stefan Monnier on Mon, 04 Dec 2017 16:08:14 -0500)

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org,  a.s@realize.ch,  sledergerber@gmx.net,  20623@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 16:08:14 -0500
> 
> > Isn't it better to fix this in sgml-xml-auto-coding-function?  That's
> > where the root cause is, AFAIU.
> 
> I'd expect the same problem would affect all other uses.

Not sure what you meant by "all other uses".  Could you please
elaborate?

> > And I don't understand the comment about latin-1-mac: I don't think we
> > have such problems in Emacs.  The -with-signature variety is
> > different, because it is not about EOL format.
> 
> You might be right, but I don't know where/how this is handled.

I would like to propose the following alternative patch, which accepts
utf-8-with-signature and utf-8-hfs as variants of utf-8 for the
purpose of encoding XML files.  Comments?  Do we want similar
treatment for UTF-16?  (The bug report doesn't seem to require it,
and UTF-16 in XML files is non-standard anyway.  But what about
HTML?)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 857fa80..5ff1acf 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2493,7 +2493,17 @@ sgml-xml-auto-coding-function
 	    (let* ((match (match-string 1))
 		   (sym (intern (downcase match))))
 	      (if (coding-system-p sym)
-		  sym
+                  ;; If the encoding tag is UTF-8 and the buffer's
+                  ;; encoding is one of the variants of UTF-8, use the
+                  ;; buffer's encoding.  This allows, e.g., saving an
+                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
+                  (if (and (coding-system-equal 'utf-8
+                                                (coding-system-type sym))
+                           (coding-system-equal sym
+                                                (coding-system-type
+                                                 buffer-file-coding-system)))
+                      buffer-file-coding-system
+		    sym)
 		(message "Warning: unknown coding system \"%s\"" match)
 		nil))
           ;; Files without an encoding tag should be UTF-8. But users
@@ -2506,7 +2516,8 @@ sgml-xml-auto-coding-function
                    (coding-system-base
                     (detect-coding-region (point-min) size t)))))
             ;; Pure ASCII always comes back as undecided.
-            (if (memq detected '(utf-8 undecided))
+            (if (memq detected
+                      '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                 'utf-8
               (warn "File contents detected as %s.
   Consider adding an encoding attribute to the xml declaration,
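
For anyone who wants to try the idea behind the first hunk outside of
mule.el, here is a minimal sketch.  The names my-utf-8-family-p and
my-xml-choose-coding are hypothetical and not part of the patch, and
the sketch simplifies the condition by comparing both coding-system
types against utf-8, so it illustrates the intent rather than
reproducing the hunk exactly.

```elisp
;; Hypothetical helper, not part of the patch: return non-nil if
;; CODING is a coding system whose type is `utf-8', i.e. plain UTF-8
;; or one of its variants such as `utf-8-with-signature'.
(defun my-utf-8-family-p (coding)
  (and (coding-system-p coding)
       (eq (coding-system-type coding) 'utf-8)))

;; Hypothetical helper: when both the coding system DECLARED in the
;; XML encoding tag and BUFFER-CODING are UTF-8 flavors, keep the
;; buffer's one (so a BOM is preserved); otherwise use DECLARED.
(defun my-xml-choose-coding (declared buffer-coding)
  (if (and (my-utf-8-family-p declared)
           (my-utf-8-family-p buffer-coding))
      buffer-coding
    declared))

(my-xml-choose-coding 'utf-8 'utf-8-with-signature) ; => utf-8-with-signature
(my-xml-choose-coding 'utf-8 'iso-latin-1)          ; => utf-8
```

Evaluating the two calls in *scratch* or with M-x ielm shows the
buffer's coding system winning only when both sides are UTF-8 flavors.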
