From: Juri Linkov <juri@jurta.org>
To: emacs-pretest-bug@gnu.org
Subject: bug#4051: Character Soup
Date: Thu, 06 Aug 2009 00:09:20 +0300
Message-ID: <87my6eary7.fsf@mail.jurta.org>
The coding system for the buffer with the Latin-1 character á in
the Cyrillic KOI8 language environment is detected as Chinese gb2312.
How funny!
I noticed this while reporting bug#4037, which message.el sent with
charset=gb2312. Mail readers display that message incorrectly because
of the ugly fonts associated with gb2312 (this is a separate problem).
I think it would be more natural to encode this as Latin-1 (in this
particular case) or, more generally, as UTF-8, the universal encoding
designed specifically for mixing different scripts.
The easiest way to reproduce this problem:
1. emacs -Q
2. C-x RET l Cyrillic-KOI8
3. C-x 8 ' a
4. C-x C-s
5. File to save in: /tmp/file
After that the prompt says:
Select coding system (default chinese-iso-8bit):
and the buffer `*Warning*' contains:
These default coding systems were tried to encode text
in the buffer `file':
(cyrillic-koi8-unix (192 . 225))
However, each of them encountered characters it couldn't encode:
cyrillic-koi8-unix cannot encode these: á
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
gb2312 utf-8 euc-jis-2004 euc-jp windows-1258 viscii
iso-2022-jp-2004 cp862 iso-8859-16 hp-roman8 next mac-roman cp437
cp865 cp861 cp860 cp858 cp857 cp852 cp850 windows-1254 windows-1252
windows-1250 iso-8859-15 iso-8859-14 iso-8859-10 iso-8859-9
iso-8859-4 iso-8859-3 iso-8859-2 gb18030 gbk hz-gb-2312 utf-7
iso-8859-1 utf-16 utf-16be-with-signature utf-16le-with-signature
utf-16be utf-16le iso-2022-7bit utf-8-auto utf-8-with-signature
eucjp-ms vietnamese-tcvn vietnamese-viqr vietnamese-vscii
japanese-shift-jis-2004 japanese-iso-7bit-1978-irv ibm1047
utf-7-imap utf-8-emacs
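The same information can be inspected from Lisp. This is a minimal sketch,
assuming it is evaluated in the session from the steps above (for example
with M-: or in *scratch*). The functions are standard Emacs ones, but the
exact lists returned depend on the Emacs version and the language
environment, so no output is shown:

```elisp
;; Coding systems that can safely encode the string "á".
(find-coding-systems-string "á")

;; Current priority order of coding systems; presumably the first safe
;; entry in this order is what the prompt offers as the default.
(coding-system-priority-list)

;; Index of the first character in the string that koi8-r cannot encode,
;; or nil if every character is encodable.
(unencodable-char-position 0 1 'koi8-r nil "á")
```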
I already figured out how to fix this problem for message.el using:
(setq mm-coding-system-priorities (cons 'utf-8 mm-coding-system-priorities))
But as the test case above shows, this is a general problem.
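For the general case, one possible workaround (a sketch, not a fix for the
default selection itself) is to raise the priority of UTF-8 globally from
the init file. It assumes that the side effects of `prefer-coding-system'
are acceptable, which may not hold in a KOI8 environment:

```elisp
;; Put utf-8 at the front of the coding system priority list, so that it
;; is preferred over gb2312 when several coding systems can encode the text.
;; Side effect: utf-8 also becomes the default for newly created files
;; and for some other defaults such as subprocess I/O.
(prefer-coding-system 'utf-8)
```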
--
Juri Linkov
http://www.jurta.org/emacs/