From: Werner LEMBERG <wl@gnu.org>
To: 12291@debbugs.gnu.org
Cc: Curtis Smith <smithcu@gvsu.edu>
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Tue, 28 Aug 2012 07:47:20 +0200 (CEST)
Message-ID: <20120828.074720.480105751.wl@gnu.org>
[-- Attachment #1: Type: Text/Plain, Size: 5007 bytes --]
[bzr revision 109796]
Have a look at the attached file (transmitted as binary to avoid
e-mail encoding issues). It contains a single four-byte UTF-8-encoded
character, 0xF4 0xB5 0x87 0x9E, which would map to the non-existent
Unicode code point U+1351DE (beyond the Unicode limit of U+10FFFF).
If I load this file as UTF-8, Emacs gives this as the output of
`C-u C-x =':
position: 1 of 2 (0%), column: 0
character: 二 (displayed as 二) (codepoint 20108, #o47214, #x4e8c)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x4E8C
syntax: w which means: word
category: .:Base, C:2-byte han, L:Left-to-right (strong), c:Chinese, h:Korean, j:Japanese, |:line breakable
to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
buffer code: #xE4 #xBA #x8C
file code: #xE4 #xBA #x8C (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-SimSun-normal-normal-normal-*-24-*-*-*-d-0-iso10646-1 (#x460)
Character code properties: customize what to show
name: CJK IDEOGRAPH-4E8C
general-category: Lo (Letter, Other)
decomposition: (20108) ('二')
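To double-check the numbers, here is a small Python sketch (not Emacs
code) that decodes the four bytes by hand using the standard four-byte
UTF-8 bit layout (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx). It shows that
the bytes would encode U+1351DE, above U+10FFFF, that a strict decoder
rejects them, and that the reported file code #xE4 #xBA #x8C is the
UTF-8 form of U+4E8C rather than the bytes stored in the file. The
helper `decode_four_byte' is hypothetical, written only for this check.

```python
# Sketch only: decode_four_byte is a hypothetical helper, not an Emacs or
# Python library function.

UNICODE_MAX = 0x10FFFF  # highest valid Unicode code point


def decode_four_byte(seq: bytes) -> int:
    """Return the value encoded by a 4-byte UTF-8-style sequence.

    Only the bit patterns are checked; values above U+10FFFF are
    deliberately NOT rejected, so the out-of-range result stays visible.
    """
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a four-byte lead byte")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    cp = seq[0] & 0x07              # 3 payload bits from the lead byte
    for b in seq[1:]:
        cp = (cp << 6) | (b & 0x3F)  # 6 payload bits per continuation byte
    return cp


file_bytes = bytes([0xF4, 0xB5, 0x87, 0x9E])  # contents of the attachment
cp = decode_four_byte(file_bytes)
print(f"U+{cp:X}", cp > UNICODE_MAX)  # U+1351DE True

# A conforming (strict) UTF-8 decoder refuses the sequence.
try:
    file_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decode fails:", err.reason)

# The "file code" Emacs reported is the UTF-8 form of U+4E8C, not the file's bytes.
reported = bytes([0xE4, 0xBA, 0x8C])
print(reported.decode("utf-8") == "\u4e8c", reported == file_bytes)  # True False
```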
Look at what Emacs reports as the file code: #xE4 #xBA #x8C, which is
the UTF-8 encoding of U+4E8C, not the four bytes actually in the file.
If I save this one-character file as UTF-8, the character code stays
as-is.
This behaviour is clearly wrong. I suspect that Emacs uses such a high
character code for its internal representation of the `emacs-mule'
encoding. However, the user must not see this. Instead, such
characters must be converted to correct UTF-8.
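For comparison only, here is a Python sketch of two conventional ways a
decoder can avoid presenting an invalid sequence as a plausible but
wrong character. The error handlers `replace' and `surrogateescape' are
Python's own; nothing here claims to describe what Emacs does or should
do.

```python
# Sketch (Python, not Emacs): handling the invalid sequence F4 B5 87 9E.

raw = bytes([0xF4, 0xB5, 0x87, 0x9E])

# 1. Substitute U+FFFD REPLACEMENT CHARACTER for the undecodable bytes,
#    so the user sees an obvious error marker instead of a CJK ideograph.
shown = raw.decode("utf-8", errors="replace")
print(all(ch == "\ufffd" for ch in shown))  # True

# 2. Keep each undecodable byte as a lone surrogate (U+DC80..U+DCFF), so
#    re-encoding on save writes back exactly the original bytes.
kept = raw.decode("utf-8", errors="surrogateescape")
print([f"U+{ord(ch):04X}" for ch in kept])
print(kept.encode("utf-8", errors="surrogateescape") == raw)  # True
```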
Werner
======================================================================
In GNU Emacs 24.2.50.1 (i686-pc-linux-gnu, GTK+ Version 2.24.9)
of 2012-08-28 on linux-nvf0
Windowing system distributor `The X.Org Foundation', version 11.0.11004000
Configured using:
`configure 'MAKEINFO=/usr/bin/makeinfo' '--with-x-toolkit=gtk''
Important settings:
value of $LANG: de_DE.UTF-8
value of $XMODIFIERS: @im=none
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Summary
Minor modes in effect:
tooltip-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
column-number-mode: t
transient-mark-mode: t
Recent input:
<return> w b u g - e m <tab> <tab> <tab> <tab> <tab>
<tab> <tab> <backspace> <backspace> <tab> <tab> C-c
C-q y M-x w r i t e - e m <tab> C-g C-h a b u g <return>
<M-next> C-x 1 M-x r e p r t <backspace> <backspace>
o r t - e m <tab> <return>
Recent messages:
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft is prepared
No matching alias [7 times]
Kill draft message? (y or n) y
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft was killed
Quit
Type C-x 4 C-o RET to restore the other window.
Load-path shadows:
None found.
Features:
(shadow emacsbug message format-spec rfc822 mml mml-sec mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
apropos descr-text latexenc preview prv-emacs byte-opt tex-buf
noutline outline font-latex warnings bytecomp byte-compile cconv
macroexp latex easy-mmode edmacro kmacro tex-style cus-edit wid-edit
cus-start cus-load pp mew-varsx mew-unix cal-menu calendar
cal-loaddefs mew-auth mew-config mew-imap2 mew-imap mew-nntp2 mew-nntp
mew-pop mew-smtp mew-ssl mew-ssh mew-net mew-highlight mew-sort
mew-fib mew-ext mew-refile mew-demo mew-attach mew-draft mew-message
mew-thread mew-virtual mew-summary4 mew-summary3 mew-summary2
mew-summary mew-search mew-pick mew-passwd mew-scan mew-syntax mew-bq
mew-smime mew-pgp mew-header mew-exec mew-mark mew-mime mew-edit
mew-decode mew-encode mew-cache mew-minibuf mew-complete mew-addrbook
mew-local mew-vars3 mew-vars2 mew-vars mew-env mew-mule3 mew-mule
mew-gemacs mew-key mew-func mew-blvs mew-const mew tex advice help-fns
advice-preload tex-site auto-loads quail help-mode easymenu cjktilde
disp-table time-date tooltip ediff-hook vc-hooks lisp-float-type
mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment lisp-mode register page menu-bar rfn-eshadow
timer select scroll-bar mouse jit-lock font-lock syntax facemenu
font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces
cus-face files text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)
[-- Attachment #2: emacs-problem.utf8 --]
[-- Type: Application/Octet-Stream, Size: 5 bytes --]