unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#35766: emacs saves utf-16 le xml files as utf-16 be
@ 2019-05-16 17:11 J S
  2019-05-16 18:22 ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-16 17:11 UTC (permalink / raw)
  To: 35766

Xml files with this tag are saved as utf-16 be by emacs, even if the file was originally utf-16 le.  Using "UTF-16LE" instead will break the encoding and remove the BOM.

<?xml version="1.0" encoding="UTF-16"?>

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-16 17:11 bug#35766: emacs saves utf-16 le xml files as utf-16 be J S
@ 2019-05-16 18:22 ` Eli Zaretskii
       [not found]   ` <BL0PR11MB34754605999DC2A03A6DF45A9E0A0@BL0PR11MB3475.namprd11.prod.outlook.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-16 18:22 UTC (permalink / raw)
  To: J S; +Cc: 35766

> From: J S <jszabo_98@hotmail.com>
> Date: Thu, 16 May 2019 17:11:21 +0000
> 
> Xml files with this tag are saved as utf-16 be by emacs, even if the file was originally utf-16 le.  Using
> "UTF-16LE" instead will break the encoding and remove the BOM.
> 
> <?xml version="1.0" encoding="UTF-16"?>

Did you try using utf-16le-with-signature?

Or maybe I don't understand the scenario: would you please describe a
full reproduction recipe, starting from "emacs -Q"?

Thanks.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
       [not found]   ` <BL0PR11MB34754605999DC2A03A6DF45A9E0A0@BL0PR11MB3475.namprd11.prod.outlook.com>
@ 2019-05-16 19:21     ` J S
  2019-05-16 20:57       ` J S
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-16 19:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org

Try saving this xml file and opening it again:

<?xml version="1.0" encoding="UTF-16LE"?>

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-16 19:21     ` J S
@ 2019-05-16 20:57       ` J S
  2019-05-17  9:26         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-16 20:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org

I should say that I'm using emacs for windows.  And it's preferring saving in big endian to little endian when this is the tag:

<?xml version="1.0" encoding="UTF-16"?>

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-16 20:57       ` J S
@ 2019-05-17  9:26         ` Eli Zaretskii
  2019-05-17 11:26           ` J S
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-17  9:26 UTC (permalink / raw)
  To: J S; +Cc: 35766

> From: J S <jszabo_98@hotmail.com>
> CC: "35766@debbugs.gnu.org" <35766@debbugs.gnu.org>
> Date: Thu, 16 May 2019 20:57:34 +0000
> 
> I should say that I'm using emacs for windows.  And it's preferring saving in big endian to little endian when
> this is the tag:
> 
> <?xml version="1.0" encoding="UTF-16"?>

This is the default, yes.  "C-h C utf-16 RET" says:

  UTF-16 (detect endian on decoding, use big endian on encoding with BOM).
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to encode in UTF-16LE, you need to tell Emacs to do this
explicitly:

  C-x RET c utf-16le-with-signature RET C-x C-s

> Try saving this xml file and opening it again:
> 
> <?xml version="1.0" encoding="UTF-16LE"?>

AFAIU, encoding="UTF-16LE" is invalid in XML.  If you see this
documented somewhere in XML docs, please tell me where it is
described.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17  9:26         ` Eli Zaretskii
@ 2019-05-17 11:26           ` J S
  2019-05-17 11:48             ` Noam Postavsky
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-17 11:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org

It would change color in emacs if encoding="UTF16-LE" were invalid.  It's hard to find the docs for it.  UTF-16LE is listed here:  http://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.wst.xmleditor.doc.user%2Ftopics%2Fcxmlenc.html

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 11:26           ` J S
@ 2019-05-17 11:48             ` Noam Postavsky
  2019-05-17 15:34               ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Noam Postavsky @ 2019-05-17 11:48 UTC (permalink / raw)
  To: J S; +Cc: 35766@debbugs.gnu.org

J S <jszabo_98@hotmail.com> writes:

> It would change color in emacs if encoding="UTF16-LE" were invalid.
> It's hard to find the docs for it.  UTF-16LE is listed here:
> http://help.eclipse.org/kepler/index.jsp?topic=%2Forg.eclipse.wst.xmleditor.doc.user%2Ftopics%2Fcxmlenc.html

A more official reference:

https://www.w3.org/TR/xml/#NT-EncName

    It is RECOMMENDED that character encodings registered (as charsets)
    with the Internet Assigned Numbers Authority [IANA-CHARSETS], other
    than those just listed, be referred to using their registered names;

[IANA-CHARSETS]: http://www.iana.org/assignments/character-sets/character-sets.xhtml

    UTF-16LE    1014    [RFC2781]   [RFC2781]   csUTF16LE

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 11:48             ` Noam Postavsky
@ 2019-05-17 15:34               ` Eli Zaretskii
  2019-05-17 16:27                 ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-17 15:34 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: jszabo_98, 35766

> From: Noam Postavsky <npostavs@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  "35766\@debbugs.gnu.org" <35766@debbugs.gnu.org>
> Date: Fri, 17 May 2019 07:48:30 -0400
> 
>     UTF-16LE    1014    [RFC2781]   [RFC2781]   csUTF16LE

Ouch, I was looking at the wrong column in that document.

The problem is that our detection of encoding of XML files is based on
the assumption that the header is in ASCII-compatible encoding, which
UTF-16 isn't.  So regexp search for the XML header fails, and the
detection fails with it.
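A small Python analogy of that failure (not Emacs's actual code): an ASCII-oriented byte regexp for the `encoding` pseudo-attribute finds the declaration in UTF-8 bytes but not in UTF-16, where every ASCII character is paired with a NUL byte. The pattern and the `declared_encoding` helper are illustrative assumptions.

```python
import re

# Illustrative ASCII-oriented pattern for the XML declaration's encoding.
XML_ENCODING = re.compile(rb'<\?xml[^>]*encoding="([^"]+)"')

DECL = '<?xml version="1.0" encoding="UTF-16"?>'


def declared_encoding(raw: bytes):
    """Return the declared encoding name, or None if the regexp cannot see it."""
    m = XML_ENCODING.search(raw)
    return m.group(1).decode("ascii") if m else None


print(declared_encoding(DECL.encode("utf-8")))      # UTF-16
print(declared_encoding(DECL.encode("utf-16-le")))  # None: bytes are b'<\x00?\x00...'
```

In UTF-16 the byte after `<` is NUL (LE) or the NUL comes first (BE), so `<\?xml` never matches and the header is invisible to this kind of search.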

The patch below makes us at least recognize UTF-16 with BOM, and also
stops the encoding check from frightening the user when she specifies UTF-16
with BOM at buffer-save time.  But by default, saving a buffer with
UTF-16BE or UTF-16LE still produces a file without BOM, and that
cannot be detected by our encoding-detection machinery, leaving it to
the user to use "C-x RET c" or "C-x RET r".

Perhaps we should by default produce encoding with BOM when XML header
specifies UTF-16?

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index dfa9e4e..a248ef8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1029,7 +1029,11 @@ select-safe-coding-system
 		 ;; This check perhaps isn't ideal, but is probably
 		 ;; the best thing to do.
 		 (not (auto-coding-alist-lookup (or file buffer-file-name "")))
-		 (not (coding-system-equal coding-system auto-cs)))
+		 (not (coding-system-equal coding-system auto-cs))
+                 (or (equal (coding-system-type auto-cs) 'charset)
+                     (not (coding-system-equal (coding-system-type auto-cs)
+                                               (coding-system-type
+                                                coding-system)))))
 	    (unless (yes-or-no-p
 		     (format "Selected encoding %s disagrees with \
 %s specified by file contents.  Really save (else edit coding cookies \
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index b5414de..fcdcd3c 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2587,9 +2587,14 @@ xml-find-file-coding-system
       (let ((detected
              (with-coding-priority '(utf-8)
                (coding-system-base
-                (detect-coding-region (point-min) (point-max) t)))))
-        ;; Pure ASCII always comes back as undecided.
+                (detect-coding-region (point-min) (point-max) t))))
+            (bom (list (char-after 1) (char-after 2))))
         (cond
+         ((equal bom '(#xFE #xFF))
+          'utf-16be-with-signature)
+         ((equal bom '(#xFF #xFE))
+          'utf-16le-with-signature)
+         ;; Pure ASCII always comes back as undecided.
          ((memq detected '(utf-8 undecided))
           'utf-8)
          ((eq detected 'utf-16le-with-signature) 'utf-16le-with-signature)
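The decision order of the patched `xml-find-file-coding-system` can be approximated in Python as below. This is a sketch under stated assumptions: `detected` stands in for the result of `detect-coding-region`, the return values are Emacs coding-system names as strings, and the final fallback is simplified because the rest of the `cond` is not shown in the diff.

```python
def guess_xml_coding(raw: bytes, detected: str) -> str:
    """Approximate the BOM-first decision order of the patched function."""
    # A BOM in the first two bytes settles the question before anything else.
    if raw[:2] == b"\xfe\xff":
        return "utf-16be-with-signature"
    if raw[:2] == b"\xff\xfe":
        return "utf-16le-with-signature"
    # Pure ASCII comes back as "undecided"; both it and UTF-8 map to utf-8.
    if detected in ("utf-8", "undecided"):
        return "utf-8"
    # Simplification (assumption): the remaining cond branches are elided,
    # so just pass the detected coding system through.
    return detected
```

Checking the BOM first is what makes UTF-16 files recognizable at all, since the XML header behind it cannot be found by an ASCII regexp.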

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 15:34               ` Eli Zaretskii
@ 2019-05-17 16:27                 ` npostavs
  2019-05-17 16:57                   ` J S
  2019-05-18  7:26                   ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: npostavs @ 2019-05-17 16:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: jszabo_98, 35766, Noam Postavsky

Eli Zaretskii <eliz@gnu.org> writes:

> Perhaps we should by default produce encoding with BOM when XML header
> specifies UTF-16?

I think yes, https://www.w3.org/TR/xml/#charencoding says

    Entities encoded in UTF-16 MUST [...] begin with the Byte Order Mark
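A minimal sketch of applying that rule at save time, using a hypothetical `encode_xml` helper: when the declaration names plain "UTF-16", the output always starts with a BOM, big-endian by default as with Emacs's `utf-16` coding system (per the "C-h C utf-16" text quoted earlier). The declaration regexp is a simplification that only handles a double-quoted attribute at the very start of the text, and any other label is handed to Python's codec registry unchanged.

```python
import codecs
import re

# Simplified: only a double-quoted encoding attribute in a leading declaration.
DECL_ENCODING = re.compile(r'<\?xml[^>]*encoding="([^"]+)"')


def encode_xml(text: str, little_endian: bool = False) -> bytes:
    """Encode XML text; a plain "UTF-16" declaration always gets a BOM."""
    m = DECL_ENCODING.match(text)
    declared = m.group(1) if m else "UTF-8"
    if declared.upper() == "UTF-16":
        if little_endian:
            return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # Other labels go straight to Python's codecs; no BOM is added here.
    return text.encode(declared)
```

Either byte order satisfies the MUST; the BOM is what lets a reader tell them apart.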

By the way, is Bug#8282 the same as this one, or just closely related?
It's talking about sgml-html-meta-auto-coding-function (though maybe
sgml-xml-auto-coding-function is more relevant).  I'm getting a little
confused between all the different *-find/auto-coding-* functions.
There is also nxml-set-auto-coding which seems to be mostly unused.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 16:27                 ` npostavs
@ 2019-05-17 16:57                   ` J S
  2019-05-17 19:46                     ` Eli Zaretskii
  2019-05-18  7:26                   ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: J S @ 2019-05-17 16:57 UTC (permalink / raw)
  To: npostavs@gmail.com, Eli Zaretskii; +Cc: 35766@debbugs.gnu.org

When an xml file just says encoding="UTF-16", how does an application pick big endian vs little endian?

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 16:57                   ` J S
@ 2019-05-17 19:46                     ` Eli Zaretskii
  2019-05-17 20:16                       ` J S
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-17 19:46 UTC (permalink / raw)
  To: J S; +Cc: 35766, npostavs

> From: J S <jszabo_98@hotmail.com>
> CC: "35766@debbugs.gnu.org" <35766@debbugs.gnu.org>
> Date: Fri, 17 May 2019 16:57:23 +0000
> 
> When an xml file just says encoding="UTF-16", how does an application pick big endian vs little endian?

What is "an application" in this context?

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 19:46                     ` Eli Zaretskii
@ 2019-05-17 20:16                       ` J S
  2019-05-18  5:33                         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-17 20:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org, npostavs@gmail.com

For example, if I save this xml file in emacs, it saves it as utf-16 big endian:


<?xml version="1.0" encoding="UTF-16"?>

If I do this in powershell (really a .net method), it saves it as utf-16 little endian (osx or windows):

[xml]$xml = get-content file.xml
$xml.save('file.xml')

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 20:16                       ` J S
@ 2019-05-18  5:33                         ` Eli Zaretskii
  2019-05-18 20:57                           ` J S
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-18  5:33 UTC (permalink / raw)
  To: J S; +Cc: 35766, npostavs

> From: J S <jszabo_98@hotmail.com>
> CC: "npostavs@gmail.com" <npostavs@gmail.com>, "35766@debbugs.gnu.org"
> 	<35766@debbugs.gnu.org>
> Date: Fri, 17 May 2019 20:16:41 +0000
> 
> For example, if I save this xml file in emacs, it saves it as utf-16 big endian:
> 
> <?xml version="1.0" encoding="UTF-16"?>

This is the Emacs default, which is well documented, and is also
according to what the UTF-16 spec (RFC 2781) says.

> If I do this in powershell (really a .net method), it saves it as utf-16 little endian (osx or windows):

Then PowerShell behaves in violation of RFC 2781.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-17 16:27                 ` npostavs
  2019-05-17 16:57                   ` J S
@ 2019-05-18  7:26                   ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-18  7:26 UTC (permalink / raw)
  To: npostavs; +Cc: jszabo_98, 35766

merge 8282 35766
close 36766
thanks

> From: npostavs@gmail.com
> Cc: Noam Postavsky <npostavs@gmail.com>,  jszabo_98@hotmail.com,  35766@debbugs.gnu.org
> Date: Fri, 17 May 2019 12:27:50 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Perhaps we should by default produce encoding with BOM when XML header
> > specifies UTF-16?
> 
> I think yes, https://www.w3.org/TR/xml/#charencoding says
> 
>     Entities encoded in UTF-16 MUST [...] begin with the Byte Order Mark

OK, I did that as well, and pushed the changes to master.

> By the way, is Bug#8282 the same as this one, or just closely related?

It's the same problem; merged the bugs.

> It's talking about sgml-html-meta-auto-coding-function (though maybe
> sgml-xml-auto-coding-function is more relevant).  I'm getting a little
> confused between all the different *-find/auto-coding-* functions.

The function relevant for the recipe in bug#8282 is
sgml-xml-auto-coding-function, which is where I made the changes.  If
the HTML and/or SGML specs also mandate that we use BOM, then maybe we
need the same changes in sgml-html-meta-auto-coding-function as well.
Note that there's no equivalent for xml-find-file-coding-system for
non-XML files, so recognition of visited UTF-16 HTML files will not
work even if they do have a BOM.

> There is also nxml-set-auto-coding which seems to be mostly unused.

It is supposed to be used by packages that build on top of nXml,
AFAIU.

Thanks.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-18  5:33                         ` Eli Zaretskii
@ 2019-05-18 20:57                           ` J S
  2019-05-19  4:58                             ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: J S @ 2019-05-18 20:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org, npostavs@gmail.com

RFC 2781, section 4.3 ("Interpreting text labelled as UTF-16"), says that if a document is labelled "UTF-16", the application should check the byte order mark to see whether it is little endian or big endian.  Only if there is no byte order mark should the document be interpreted as big endian.
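Read literally, that section describes the procedure sketched below with a hypothetical `decode_labelled_utf16` helper. The steps are spelled out rather than delegated to CPython's built-in `utf-16` decoder because, without a BOM, that decoder assumes the machine's native byte order (little-endian on common hardware), not big-endian.

```python
def decode_labelled_utf16(raw: bytes) -> str:
    """Decode bytes labelled "UTF-16" following the RFC 2781 section 4.3 order."""
    # 1. A BOM, if present, decides the byte order and is not part of the text.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xff\xfe"):
        return raw[2:].decode("utf-16-le")
    # 2. No BOM: interpret the text as big endian.
    return raw.decode("utf-16-be")


print(decode_labelled_utf16(b"\xff\xfe<\x00?\x00"))  # <?
print(decode_labelled_utf16(b"\x00<\x00?"))          # <?
```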

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-18 20:57                           ` J S
@ 2019-05-19  4:58                             ` Eli Zaretskii
  2019-05-19 14:12                               ` J S
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-05-19  4:58 UTC (permalink / raw)
  To: J S; +Cc: 35766@debbugs.gnu.org, npostavs@gmail.com

On May 18, 2019 11:57:51 PM GMT+03:00, J S <jszabo_98@hotmail.com> wrote:
> RFC 2781 under "4.3 Interpreting text labelled as UTF-16" says is that
> if a document is labelled "UTF-16", the application should check the
> byte order mark to see if it is little endian or big endian   Only if
> there's no byte order mark, should the document be interpreted as big
> endian.
> 


If you are talking about visiting an existing file, then the change I installed does just that.  I was talking about saving a file, in which case there's no BOM, since it isn't present in the buffer.

* bug#35766: emacs saves utf-16 le xml files as utf-16 be
  2019-05-19  4:58                             ` Eli Zaretskii
@ 2019-05-19 14:12                               ` J S
  0 siblings, 0 replies; 17+ messages in thread
From: J S @ 2019-05-19 14:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35766@debbugs.gnu.org, npostavs@gmail.com

Sounds good.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
