From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#35766: emacs saves utf-16 le xml files as utf-16 be
Date: Fri, 17 May 2019 18:34:48 +0300
Message-ID: <83tvdt9js7.fsf@gnu.org>
References: <837eaqcl9g.fsf@gnu.org> <83lfz5bfed.fsf@gnu.org> <87a7fle1yp.fsf@gmail.com>
Cc: jszabo_98@hotmail.com, 35766@debbugs.gnu.org
To: Noam Postavsky
X-GNU-PR-Message: followup 35766
X-GNU-PR-Package: emacs
In-reply-to: <87a7fle1yp.fsf@gmail.com> (message from Noam Postavsky on Fri, 17 May 2019 07:48:30 -0400)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org
 gmane.emacs.bugs:159457
Archived-At:

> From: Noam Postavsky
> Cc: Eli Zaretskii , "35766\@debbugs.gnu.org" <35766@debbugs.gnu.org>
> Date: Fri, 17 May 2019 07:48:30 -0400
>
> UTF-16LE 1014 [RFC2781] [RFC2781] csUTF16LE

Ouch, I was looking at the wrong column in that document.

The problem is that our detection of the encoding of XML files is based
on the assumption that the header is in an ASCII-compatible encoding,
which UTF-16 isn't.  So the regexp search for the XML header fails, and
the detection fails with it.

The patch below makes us at least recognize UTF-16 with BOM, and also
stops the encoding check from frightening the user when she specifies
UTF-16 with BOM at buffer-save time.  But by default, saving a buffer
with UTF-16BE or UTF-16LE still produces a file without BOM, and that
cannot be detected by our encoding-detection machinery, leaving it to
the user to use "C-x RET c" or "C-x RET r".

Perhaps we should by default produce encoding with BOM when the XML
header specifies UTF-16?

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index dfa9e4e..a248ef8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1029,7 +1029,11 @@ select-safe-coding-system
 		     ;; This check perhaps isn't ideal, but is probably
 		     ;; the best thing to do.
 		     (not (auto-coding-alist-lookup (or file buffer-file-name "")))
-		     (not (coding-system-equal coding-system auto-cs)))
+		     (not (coding-system-equal coding-system auto-cs))
+		     (or (equal (coding-system-type auto-cs) 'charset)
+			 (not (coding-system-equal (coding-system-type auto-cs)
+						   (coding-system-type
+						    coding-system)))))
 	    (unless (yes-or-no-p
 		     (format "Selected encoding %s disagrees with \
 %s specified by file contents.  Really save (else edit coding cookies \
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index b5414de..fcdcd3c 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2587,9 +2587,14 @@ xml-find-file-coding-system
           (let ((detected
                  (with-coding-priority '(utf-8)
                    (coding-system-base
-                    (detect-coding-region (point-min) (point-max) t)))))
-            ;; Pure ASCII always comes back as undecided.
+                    (detect-coding-region (point-min) (point-max) t))))
+                (bom (list (char-after 1) (char-after 2))))
             (cond
+             ((equal bom '(#xFE #xFF))
+              'utf-16be-with-signature)
+             ((equal bom '(#xFF #xFE))
+              'utf-16le-with-signature)
+             ;; Pure ASCII always comes back as undecided.
              ((memq detected '(utf-8 undecided)) 'utf-8)
              ((eq detected 'utf-16le-with-signature) 'utf-16le-with-signature)
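
To illustrate the failure mode described in this message, here is a
minimal sketch of why an ASCII-oriented regexp search misses the XML
header in a UTF-16 file, and how a two-byte BOM check can still identify
the encoding.  It is written in Python only so the raw bytes are easy to
see; it is not the Emacs code.  The helper name sniff_utf16_bom and the
regexp are assumptions made up for this example, and the returned
strings merely echo the coding-system names used in the patch.

```python
import codecs
import re

# Hypothetical helper (not part of Emacs): mirrors the two BOM
# comparisons that the patch adds to xml-find-file-coding-system.
def sniff_utf16_bom(data: bytes):
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16be-with-signature"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16le-with-signature"
    return None                                # no BOM: cannot tell

header = '<?xml version="1.0" encoding="utf-16"?>'

# An illustrative ASCII-oriented pattern for the encoding declaration.
ascii_pattern = re.compile(rb'<\?xml[^>]*encoding="([^"]+)"')

utf8_bytes = header.encode("utf-8")
utf16le_bom = codecs.BOM_UTF16_LE + header.encode("utf-16-le")
utf16le_bare = header.encode("utf-16-le")      # what a BOM-less save yields

# The pattern matches ASCII-compatible bytes...
print(ascii_pattern.search(utf8_bytes) is not None)
# ...but not UTF-16, where every ASCII character is followed by a NUL byte.
print(ascii_pattern.search(utf16le_bom) is not None)
# The BOM check works for the file with a signature, not for the bare one.
print(sniff_utf16_bom(utf16le_bom))
print(sniff_utf16_bom(utf16le_bare))
```

The last call returns None, which corresponds to the remaining case
mentioned above: a UTF-16BE or UTF-16LE file saved without a BOM gives
this kind of check nothing to look at.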