* bug#4051: Character Soup
@ 2009-08-05 21:09 Juri Linkov
2016-02-11 20:38 ` Alan Third
0 siblings, 1 reply; 2+ messages in thread
From: Juri Linkov @ 2009-08-05 21:09 UTC
To: emacs-pretest-bug
The coding system for the buffer with the Latin-1 character á in
the Cyrillic KOI8 language environment is detected as Chinese gb2312.
How funny!
I noticed this while reporting bug#4037, which message.el sent with
charset=gb2312. Mail readers display that message incorrectly because
of the ugly fonts associated with gb2312 (this is a separate problem).
I think it would be more natural to encode this as Latin-1 (in this
particular case) or, in general, as UTF-8, the universal encoding
specifically designed for mixing different scripts.
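To illustrate the point about mixing scripts, here is a minimal sketch
in Python rather than Emacs Lisp. It assumes that Python's codecs
`koi8_r`, `latin_1` and `utf_8` are reasonable stand-ins for the
corresponding Emacs coding systems; the helper names are hypothetical
and this is not how Emacs itself checks encodability.

```python
# Illustration only: Python codecs stand in for Emacs coding systems.

def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

CODECS = ["koi8_r", "latin_1", "utf_8"]

SAMPLES = {
    "latin": "\u00e1",         # a-acute, the character from the report
    "mixed": "\u00e1 \u044f",  # a-acute plus CYRILLIC SMALL LETTER YA
}

# For each sample, list the codecs that can represent it completely.
usable = {
    name: [codec for codec in CODECS if encodable(text, codec)]
    for name, text in SAMPLES.items()
}

for name, codecs in usable.items():
    print(name, codecs)
```

Under these assumptions the single Latin-1 character fits both Latin-1
and UTF-8, while the mixed Latin/Cyrillic sample fits only UTF-8.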
The easiest way to reproduce this problem:
1. emacs -Q
2. C-x RET l Cyrillic-KOI8
3. C-x 8 ' a
4. C-x C-s
5. File to save in: /tmp/file
After that the prompt says:
Select coding system (default chinese-iso-8bit):
and the buffer `*Warning*' contains:
These default coding systems were tried to encode text
in the buffer `file':
(cyrillic-koi8-unix (192 . 225))
However, each of them encountered characters it couldn't encode:
cyrillic-koi8-unix cannot encode these: á
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
gb2312 utf-8 euc-jis-2004 euc-jp windows-1258 viscii
iso-2022-jp-2004 cp862 iso-8859-16 hp-roman8 next mac-roman cp437
cp865 cp861 cp860 cp858 cp857 cp852 cp850 windows-1254 windows-1252
windows-1250 iso-8859-15 iso-8859-14 iso-8859-10 iso-8859-9
iso-8859-4 iso-8859-3 iso-8859-2 gb18030 gbk hz-gb-2312 utf-7
iso-8859-1 utf-16 utf-16be-with-signature utf-16le-with-signature
utf-16be utf-16le iso-2022-7bit utf-8-auto utf-8-with-signature
eucjp-ms vietnamese-tcvn vietnamese-viqr vietnamese-vscii
japanese-shift-jis-2004 japanese-iso-7bit-1978-irv ibm1047
utf-7-imap utf-8-emacs
I already figured out how to fix this problem for message.el using
(setq mm-coding-system-priorities (cons 'utf-8 mm-coding-system-priorities))
But as the test case above shows, this is a general problem.
--
Juri Linkov
http://www.jurta.org/emacs/
From: Alan Third @ 2016-02-11 20:38 UTC
To: Juri Linkov; +Cc: 4051-done, 4051
Juri Linkov <juri@jurta.org> writes:
> The coding system for the buffer with the Latin-1 character á in
> the Cyrillic KOI8 language environment is detected as Chinese gb2312.
> How funny!
Hi, sorry nobody's got back to you about this before now. It seems that
gb2312 is chosen not because the text is detected as Chinese, but
simply because it is the first encoding Emacs finds that can encode the
buffer correctly.
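The "first encoding that can encode the buffer" behaviour can be
sketched as a first-fit search over an ordered candidate list. This is
a Python analogy, not Emacs's actual code: the codec names stand in for
Emacs coding systems, and both candidate orders below are assumptions
made up for illustration (the real list in the warning is much longer).

```python
# Illustration only: first-fit selection over an ordered candidate list.

def can_encode(text, codec):
    """Return True if text can be encoded with codec without errors."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def first_fit(text, candidates):
    """Return the first codec in candidates that can encode text, or None."""
    for codec in candidates:
        if can_encode(text, codec):
            return codec
    return None

text = "\u00e1"  # a-acute, the character from the report

# Hypothetical order: the environment's default first, then gb2312
# happening to come before utf_8 and latin_1.
default_order = ["koi8_r", "gb2312", "utf_8", "latin_1"]

# Same candidates with utf_8 moved ahead of the others, analogous to
# consing utf-8 onto the front of a priority list.
utf8_first = ["koi8_r", "utf_8", "gb2312", "latin_1"]

print(first_fit(text, default_order))
print(first_fit(text, utf8_first))
```

With the first order the search settles on gb2312 without any language
detection being involved, because GB2312 happens to contain á among its
pinyin letters; with utf_8 placed earlier, the same search picks utf_8.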
There's some more discussion of this here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22436
I'm going to close this bug report, but please reopen it if you're
unhappy.
--
Alan Third