* auto-recognizing utf-16le ?
@ 2009-06-15 11:40 Miles Bader
2009-06-15 21:45 ` Andreas Schwab
0 siblings, 1 reply; 6+ messages in thread
From: Miles Bader @ 2009-06-15 11:40 UTC
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]
Someone on #emacs noticed that Emacs doesn't seem to auto-recognize
files encoded using utf-16le. Visiting a file which uses such an
encoding results in the buffer having coding-system "no-conversion
(alias: binary)", and lots of ^@ (NUL) characters in the buffer.
Forcing the encoding with "C-x C-m r utf-16le RET" results in the
correct thing happening.
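The NULs are what you'd expect when UTF-16LE text is read as raw bytes: each ASCII character is stored as its own byte value followed by a zero byte. A minimal sketch in C, assuming 7-bit ASCII input only; the function names are made up for illustration and have nothing to do with Emacs internals:

```c
#include <stddef.h>

/* Hypothetical helper: encode 7-bit ASCII text as UTF-16LE.
   OUT must have room for 2 * LEN bytes.  Returns the number of
   bytes written.  This is not a general UTF-16 encoder.  */
size_t
ascii_to_utf16le (const char *in, size_t len, unsigned char *out)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      out[2 * i] = (unsigned char) in[i];  /* low byte first (little endian) */
      out[2 * i + 1] = 0;                  /* high byte is NUL for ASCII */
    }
  return 2 * len;
}

/* Count the NUL bytes a byte-oriented reader would see.  */
size_t
count_nul_bytes (const unsigned char *buf, size_t len)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (buf[i] == 0)
      n++;
  return n;
}
```

Encoding "oink" this way gives 8 bytes, 4 of them NUL, which matches the ^@ characters seen when the file is read with no-conversion.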
[He was on Windows, where this coding system is common, so it's kind of
annoying for him.]
I noticed that the same happens on Debian.
I thought maybe he could just do:
(prefer-coding-system 'utf-16le-dos)
but it seems to have no effect.
To reproduce:
1. Save this message's attachment to a file "/tmp/oink"
2. Start emacs with: HOME=/tmp emacs -Q
3. Visit the file you saved: C-x C-f /tmp/oink RET
4. ** Notice that the buffer contains ^@ (NUL) characters, and that
   the buffer coding-system is "no-conversion (binary)"
5. Re-visit the file, forcing the coding-system:
   C-x C-m r utf-16le RET yes RET
6. ** Notice that the file contents are now correct
7. Kill the current buffer: C-x k RET
8. Evaluate: M-: (prefer-coding-system 'utf-16le) RET
9. Visit the file again: C-x C-f /tmp/oink RET
10. ** Notice that prefer-coding-system didn't seem to have any effect
Thanks,
-Miles
[-- Attachment #2: test file encoded using utf-16le --]
[-- Type: application/octet-stream, Size: 30 bytes --]
[-- Attachment #3: Type: text/plain, Size: 167 bytes --]
--
Justice, n. A commodity which in a more or less adulterated condition the
State sells to the citizen as a reward for his allegiance, taxes and personal
service.
* Re: auto-recognizing utf-16le ?
From: Andreas Schwab @ 2009-06-15 21:45 UTC
To: Miles Bader; +Cc: emacs-devel
Miles Bader <miles.bader@necel.com> writes:
> Someone on #emacs noticed that Emacs doesn't seem to auto-recognize
> files encoded using utf-16le.
UTF-16 detection never tries to auto detect files without a signature.
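To illustrate what "signature" means here: a UTF-16 file may start with a byte order mark (U+FEFF), stored as FF FE in little endian and FE FF in big endian. A rough sketch of a signature-only check follows. It is not the code in coding.c, and the enum and function names are invented for the example:

```c
#include <stddef.h>

/* Hypothetical result type for this sketch.  */
enum utf16_sig { SIG_NONE, SIG_UTF16_LE, SIG_UTF16_BE };

/* Look only at the first two bytes for a byte order mark.  */
enum utf16_sig
utf16_signature (const unsigned char *buf, size_t len)
{
  if (len < 2)
    return SIG_NONE;
  if (buf[0] == 0xFF && buf[1] == 0xFE)
    return SIG_UTF16_LE;   /* U+FEFF stored low byte first */
  if (buf[0] == 0xFE && buf[1] == 0xFF)
    return SIG_UTF16_BE;   /* U+FEFF stored high byte first */
  return SIG_NONE;
}
```

A file that begins directly with text, such as the bytes 'o' 00 'i' 00, yields SIG_NONE under this check, so anything relying on the signature alone cannot recognize it.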
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: auto-recognizing utf-16le ?
From: Miles Bader @ 2009-06-16 0:20 UTC
To: emacs-devel
Andreas Schwab <schwab@linux-m68k.org> writes:
>> Someone on #emacs noticed that Emacs doesn't seem to auto-recognize
>> files encoded using utf-16le.
>
> UTF-16 detection never tries to auto detect files without a signature.
So are UTF-16 files without a signature an anomaly?
-Miles
--
Philosophy, n. A route of many roads leading from nowhere to nothing.
* Re: auto-recognizing utf-16le ?
From: Kenichi Handa @ 2009-06-16 2:04 UTC
To: Andreas Schwab; +Cc: emacs-devel, miles
In article <m2r5xlw4qp.fsf@igel.home>, Andreas Schwab <schwab@linux-m68k.org> writes:
> Miles Bader <miles.bader@necel.com> writes:
> > Someone on #emacs noticed that Emacs doesn't seem to auto-recognize
> > files encoded using utf-16le.
> UTF-16 detection never tries to auto detect files without a signature.
No. detect_coding_utf_16 does try to determine whether the file is
UTF-16 by checking the dispersion of the E-th and O-th bytes, where E
is even and O is odd. But there were two bugs in the code. One was
already fixed by this change:
2009-06-15 Andreas Schwab <schwab@linux-m68k.org>
* coding.c (detect_coding_utf_16): Fix typo counting odd bytes.
I've just installed a fix for the other bug.
So, with the latest code, if you set inhibit-null-byte-detection to t
and prefer utf-16be and/or utf-16le, Emacs will detect UTF-16 files
without a BOM in most cases.
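The dispersion idea can be sketched roughly as follows. This is a simplified illustration, not the code in coding.c: it counts how many distinct byte values occur at even offsets and at odd offsets, on the theory that in UTF-16 text the high-order bytes take few distinct values. The names, the limit of 128, and the final comparison rule are all assumptions made for the example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch.  */
enum utf16_guess { GUESS_UNKNOWN, GUESS_UTF16_LE, GUESS_UTF16_BE };

/* Assumed cut-off: this many distinct byte values counts as "dispersed".  */
#define DISPERSION_LIMIT 128

enum utf16_guess
guess_utf16_by_dispersion (const unsigned char *buf, size_t len)
{
  unsigned char seen_even[256], seen_odd[256];
  unsigned even_num = 0, odd_num = 0;
  size_t i;

  memset (seen_even, 0, sizeof seen_even);
  memset (seen_odd, 0, sizeof seen_odd);

  /* Walk the data two bytes at a time; a trailing odd byte is ignored.  */
  for (i = 0; i + 1 < len; i += 2)
    {
      if (! seen_even[buf[i]])
        {
          seen_even[buf[i]] = 1;
          even_num++;
        }
      if (! seen_odd[buf[i + 1]])
        {
          seen_odd[buf[i + 1]] = 1;
          odd_num++;
        }
    }

  if (even_num == 0)
    return GUESS_UNKNOWN;   /* fewer than two bytes of input */
  if (even_num >= DISPERSION_LIMIT && odd_num >= DISPERSION_LIMIT)
    return GUESS_UNKNOWN;   /* both dispersed: probably binary data */
  if (odd_num < even_num)
    return GUESS_UTF16_LE;  /* high bytes at odd offsets vary less */
  if (even_num < odd_num)
    return GUESS_UTF16_BE;  /* high bytes at even offsets vary less */
  return GUESS_UNKNOWN;
}
```

For the bytes 'o' 00 'i' 00 'n' 00 'k' 00 there are four distinct values at even offsets and one at odd offsets, so the sketch guesses little endian. It can misfire on some 8-bit text, so treat it as a heuristic only.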
---
Kenichi Handa
handa@m17n.org
* Re: auto-recognizing utf-16le ?
From: Andreas Schwab @ 2009-06-16 15:01 UTC
To: Kenichi Handa; +Cc: emacs-devel, miles
Kenichi Handa <handa@m17n.org> writes:
> And, I've just installed a fix of another bug.
I think instead of
while (detect_info->rejected != CATEGORY_MASK_UTF_16)
you probably want to check this:
while ((detect_info->rejected & CATEGORY_MASK_UTF_16) != CATEGORY_MASK_UTF_16)
since there may be bits for other categories already set in rejected.
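The difference between the two tests can be seen in isolation with a small sketch. The mask values below are hypothetical stand-ins chosen for the example (the real definitions are in coding.c and are not reproduced here), and the function names are invented:

```c
/* Hypothetical category bits for this sketch only.  */
#define MASK_UTF_16_LE  0x01u
#define MASK_UTF_16_BE  0x02u
#define MASK_UTF_16     (MASK_UTF_16_LE | MASK_UTF_16_BE)
#define MASK_OTHER      0x04u   /* stands in for some unrelated category */

/* Exact comparison: true only when REJECTED equals the UTF-16 mask
   bit for bit.  */
int
all_utf16_rejected_exact (unsigned rejected)
{
  return rejected == MASK_UTF_16;
}

/* Masked comparison: ignores bits that belong to other categories.  */
int
all_utf16_rejected_masked (unsigned rejected)
{
  return (rejected & MASK_UTF_16) == MASK_UTF_16;
}
```

With rejected set to MASK_UTF_16 | MASK_OTHER, the exact test returns 0 even though every UTF-16 category has been rejected, so a loop guarded by `rejected != MASK_UTF_16` would keep running. The masked test returns 1 in that case.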
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: auto-recognizing utf-16le ?
From: Kenichi Handa @ 2009-06-17 0:43 UTC
To: Andreas Schwab; +Cc: emacs-devel, miles
In article <m24oug1av0.fsf@linux-m68k.org>, Andreas Schwab <schwab@linux-m68k.org> writes:
> Kenichi Handa <handa@m17n.org> writes:
> > And, I've just installed a fix of another bug.
> I think instead of
> while (detect_info->rejected != CATEGORY_MASK_UTF_16)
> you probably want to check this:
> while ((detect_info->rejected & CATEGORY_MASK_UTF_16) != CATEGORY_MASK_UTF_16)
> since there may be bits for other categories already set in rejected.
Ah! You are right, thank you. I installed that fix.
---
Kenichi Handa
handa@m17n.org