unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Noam Postavsky <npostavs@gmail.com>
Cc: jszabo_98@hotmail.com, 35766@debbugs.gnu.org
Subject: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 18:34:48 +0300
Message-ID: <83tvdt9js7.fsf@gnu.org>
In-Reply-To: <87a7fle1yp.fsf@gmail.com> (message from Noam Postavsky on Fri, 17 May 2019 07:48:30 -0400)

> From: Noam Postavsky <npostavs@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  "35766\@debbugs.gnu.org" <35766@debbugs.gnu.org>
> Date: Fri, 17 May 2019 07:48:30 -0400
> 
>     UTF-16LE    1014    [RFC2781]   [RFC2781]   csUTF16LE

Ouch, I was looking at the wrong column in that document.

The problem is that our detection of encoding of XML files is based on
the assumption that the header is in ASCII-compatible encoding, which
UTF-16 isn't.  So regexp search for the XML header fails, and the
detection fails with it.
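
To illustrate the failure, here is a minimal sketch (not part of the
patch).  It assumes the file's raw bytes can be simulated with
encode-coding-string; my-xml-header-found-p is a hypothetical helper
invented for this example, not an Emacs function.

```elisp
;; Sketch only: simulate the raw bytes of an XML declaration and
;; search them for the ASCII header, as a regexp search would.
(defun my-xml-header-found-p (bytes)
  "Return t if the ASCII string \"<?xml\" occurs in BYTES.
Hypothetical helper for illustration; not part of Emacs."
  (and (string-match-p "<\\?xml" bytes) t))

(let* ((decl "<?xml version=\"1.0\" encoding=\"utf-16\"?>")
       (utf8-bytes (encode-coding-string decl 'utf-8))
       (utf16-bytes (encode-coding-string decl 'utf-16le)))
  (list (my-xml-header-found-p utf8-bytes)     ; ASCII-compatible: found
        (my-xml-header-found-p utf16-bytes)))  ; NUL after each char: not found
```

In UTF-16 every ASCII character of the header is followed (LE) or
preceded (BE) by a NUL byte, so the literal "<?xml" never appears in
the undecoded text.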

The patch below makes us at least recognize UTF-16 with BOM, and also
stop the encoding from frightening the user when she specifies UTF-16
with BOM at buffer-save time.  But by default, saving a buffer with
UTF-16BE or UTF-16LE still produces a file without BOM, and that
cannot be detected by our encoding-detection machinery, leaving it to
the user to use "C-x RET c" or "C-x RET r".
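
A small sketch of the difference between the variants with and
without BOM (again not part of the patch; my-leading-bytes is a
hypothetical helper made up for this example):

```elisp
;; Sketch only: show the first two bytes produced by each coding system.
(defun my-leading-bytes (text coding-system)
  "Return the first two bytes of TEXT encoded in CODING-SYSTEM, as a list.
Hypothetical helper for illustration; not part of Emacs."
  (append (substring (encode-coding-string text coding-system) 0 2) nil))

(my-leading-bytes "<?xml" 'utf-16le)                 ; => (60 0), "<" then NUL
(my-leading-bytes "<?xml" 'utf-16le-with-signature)  ; => (255 254), BOM FF FE
(my-leading-bytes "<?xml" 'utf-16be-with-signature)  ; => (254 255), BOM FE FF
```

Only the -with-signature variants start with bytes that identify the
encoding; plain utf-16le output begins directly with the text.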

Perhaps we should by default produce encoding with BOM when XML header
specifies UTF-16?

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index dfa9e4e..a248ef8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1029,7 +1029,11 @@ select-safe-coding-system
 		 ;; This check perhaps isn't ideal, but is probably
 		 ;; the best thing to do.
 		 (not (auto-coding-alist-lookup (or file buffer-file-name "")))
-		 (not (coding-system-equal coding-system auto-cs)))
+		 (not (coding-system-equal coding-system auto-cs))
+                 (or (equal (coding-system-type auto-cs) 'charset)
+                     (not (coding-system-equal (coding-system-type auto-cs)
+                                               (coding-system-type
+                                                coding-system)))))
 	    (unless (yes-or-no-p
 		     (format "Selected encoding %s disagrees with \
 %s specified by file contents.  Really save (else edit coding cookies \
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index b5414de..fcdcd3c 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2587,9 +2587,14 @@ xml-find-file-coding-system
       (let ((detected
              (with-coding-priority '(utf-8)
                (coding-system-base
-                (detect-coding-region (point-min) (point-max) t)))))
-        ;; Pure ASCII always comes back as undecided.
+                (detect-coding-region (point-min) (point-max) t))))
+            (bom (list (char-after 1) (char-after 2))))
         (cond
+         ((equal bom '(#xFE #xFF))
+          'utf-16be-with-signature)
+         ((equal bom '(#xFF #xFE))
+          'utf-16le-with-signature)
+         ;; Pure ASCII always comes back as undecided.
          ((memq detected '(utf-8 undecided))
           'utf-8)
          ((eq detected 'utf-16le-with-signature) 'utf-16le-with-signature)
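
For readers of the mule-cmds.el hunk, a rough sketch of the idea
behind the added condition: the question is skipped when the selected
and the detected coding systems are of the same type.  The helper
my-same-coding-type-p is hypothetical, and it compares the types with
eq, which is a simplification and not what the patch itself does.

```elisp
;; Sketch only: compare two coding systems by their type.
(defun my-same-coding-type-p (cs1 cs2)
  "Return t if coding systems CS1 and CS2 have the same type.
Hypothetical helper for illustration; not part of Emacs."
  (eq (coding-system-type cs1) (coding-system-type cs2)))

(my-same-coding-type-p 'utf-16le-with-signature 'utf-16le)  ; => t
(my-same-coding-type-p 'utf-16le 'utf-8)                    ; => nil
(coding-system-type 'iso-latin-1)                           ; => charset
```

Coding systems of type charset (such as iso-latin-1) all share one
type even though they are mutually incompatible, which is presumably
why the patch treats that type separately.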




