unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#4051: Character Soup
@ 2009-08-05 21:09 Juri Linkov
  2016-02-11 20:38 ` Alan Third
  0 siblings, 1 reply; 2+ messages in thread
From: Juri Linkov @ 2009-08-05 21:09 UTC (permalink / raw)
  To: emacs-pretest-bug

The coding system for the buffer with the Latin-1 character á in
the Cyrillic KOI8 language environment is detected as Chinese gb2312.
How funny!

I noticed this while reporting bug#4037, which message.el sent with
charset=gb2312.  Mail readers display that message badly because of the
ugly fonts associated with gb2312 (this is a separate problem).

I think it would be more natural to encode this as Latin-1 (in this
particular case) or, more generally, as UTF-8, the universal coding
system designed for mixing different scripts.

The easiest way to reproduce this problem:

  1. emacs -Q
  2. C-x RET l Cyrillic-KOI8
  3. C-x 8 ' a
  4. C-x C-s
  5. File to save in: /tmp/file

After that the prompt says:

  Select coding system (default chinese-iso-8bit): 

and the buffer `*Warning*' contains:

  These default coding systems were tried to encode text
  in the buffer `file':
    (cyrillic-koi8-unix (192 . 225))
  However, each of them encountered characters it couldn't encode:
    cyrillic-koi8-unix cannot encode these: á

  Click on a character (or switch to this window by `C-x o'
  and select the characters by RET) to jump to the place it appears,
  where `C-u C-x =' will give information about it.

  Select one of the safe coding systems listed below,
  or cancel the writing with C-g and edit the buffer
     to remove or modify the problematic characters,
  or specify any other coding system (and risk losing
     the problematic characters).

    gb2312 utf-8 euc-jis-2004 euc-jp windows-1258 viscii
    iso-2022-jp-2004 cp862 iso-8859-16 hp-roman8 next mac-roman cp437
    cp865 cp861 cp860 cp858 cp857 cp852 cp850 windows-1254 windows-1252
    windows-1250 iso-8859-15 iso-8859-14 iso-8859-10 iso-8859-9
    iso-8859-4 iso-8859-3 iso-8859-2 gb18030 gbk hz-gb-2312 utf-7
    iso-8859-1 utf-16 utf-16be-with-signature utf-16le-with-signature
    utf-16be utf-16le iso-2022-7bit utf-8-auto utf-8-with-signature
    eucjp-ms vietnamese-tcvn vietnamese-viqr vietnamese-vscii
    japanese-shift-jis-2004 japanese-iso-7bit-1978-irv ibm1047
    utf-7-imap utf-8-emacs

I have already figured out how to fix this problem for message.el, using:

  (setq mm-coding-system-priorities (cons 'utf-8 mm-coding-system-priorities))

But as the test case above shows, this is a general problem.
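
Outside message.el, one possible general workaround is to raise the
priority of UTF-8 globally.  This is only a minimal sketch, not something
proposed or tested in this thread: it assumes the form is evaluated after
the language environment has been selected (selecting one resets the
coding-system priorities, as far as I know), and its effect on the prompt
shown above has not been verified here.

```elisp
;; Assumption: evaluated after `set-language-environment' (or
;; `C-x RET l'), for example late in the init file.
;; `prefer-coding-system' moves utf-8 to the front of the coding-system
;; priority list and makes it the default for newly created buffers.
(prefer-coding-system 'utf-8)
```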

-- 
Juri Linkov
http://www.jurta.org/emacs/






* bug#4051: Character Soup
  2009-08-05 21:09 bug#4051: Character Soup Juri Linkov
@ 2016-02-11 20:38 ` Alan Third
  0 siblings, 0 replies; 2+ messages in thread
From: Alan Third @ 2016-02-11 20:38 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 4051-done, 4051

Juri Linkov <juri@jurta.org> writes:

> The coding system for the buffer with the Latin-1 character á in
> the Cyrillic KOI8 language environment is detected as Chinese gb2312.
> How funny!

Hi, sorry nobody has got back to you about this before now.  It seems
that gb2312 is chosen not because the text is detected as Chinese, but
simply because it is the first encoding Emacs finds that can encode the
buffer correctly.
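
The candidate encodings can be inspected directly.  The following is a
sketch to evaluate in `*scratch*' after setting the language environment
as in the original recipe.  It only lists what Emacs considers safe for
the string; the idea that the default offered at the save prompt is the
head of the priority-sorted list is an assumption, not something
confirmed in this thread.

```elisp
;; Coding systems that can safely encode the one-character string "á".
(let ((candidates (find-coding-systems-string "á")))
  (list
   ;; Non-nil if the KOI8 coding system is among the candidates (the
   ;; *Warning* buffer quoted in the report suggests it is not).
   (memq 'cyrillic-koi8 candidates)
   ;; Head of the candidates after sorting by the current priorities.
   ;; `sort-coding-systems' modifies its argument, hence the copy.
   (car (sort-coding-systems (copy-sequence candidates)))))
```

No expected value is given because the result depends on the Emacs
version and on the current coding-system priorities.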

There's some more discussion of this here:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22436

I'm going to close this bug report, but please reopen it if you're
unhappy.
-- 
Alan Third





