[bzr revision 109796]

Have a look at the attached file. (It's transmitted as binary to avoid e-mail encoding issues.) It contains a single four-byte UTF-8 encoded character, 0xF4 0xB5 0x87 0x9E, which would map to the non-existent Unicode code point U+1351DE (above the Unicode maximum of U+10FFFF).

If I load this file as UTF-8, Emacs gives this as the output of `C-u C-x =':

             position: 1 of 2 (0%), column: 0
            character: 二 (displayed as 二) (codepoint 20108, #o47214, #x4e8c)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x4E8C
               syntax: w 	which means: word
             category: .:Base, C:2-byte han, L:Left-to-right (strong), c:Chinese,
                       h:Korean, j:Japanese, |:line breakable
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xE4 #xBA #x8C
            file code: #xE4 #xBA #x8C (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-SimSun-normal-normal-normal-*-24-*-*-*-d-0-iso10646-1 (#x460)

Character code properties: customize what to show
  name: CJK IDEOGRAPH-4E8C
  general-category: Lo (Letter, Other)
  decomposition: (20108) ('二')

Look at what Emacs reports as the file code: #xE4 #xBA #x8C is the UTF-8 encoding of U+4E8C, not the four bytes actually stored in the file. If I save this one-character file as UTF-8, the character code stays as-is. This behaviour is clearly wrong.

I suspect that Emacs uses such a high character code for the internal representation of the `emacs-mule' encoding. However, the user must not see this; instead, such characters must be converted to correct UTF-8.
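The byte arithmetic above can be double-checked outside Emacs. What follows is a minimal sketch in Python, used only for illustration; `decode_utf8_4byte` is a hypothetical helper written for this check, not an Emacs or library function. It combines the payload bits of the four bytes, confirms the result lies above U+10FFFF, shows that a strict UTF-8 decoder rejects the sequence, and shows that #xE4 #xBA #x8C is simply the UTF-8 encoding of U+4E8C.

```python
# Sketch: verify the byte arithmetic from the report with plain Python.
# decode_utf8_4byte is a hypothetical helper, written for illustration only.

MAX_UNICODE = 0x10FFFF  # highest code point the Unicode standard defines


def decode_utf8_4byte(seq: bytes) -> int:
    """Combine the payload bits of a 4-byte UTF-8 sequence, skipping the range check."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 lead byte")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    cp = seq[0] & 0x07  # 3 payload bits from the lead byte
    for b in seq[1:]:
        cp = (cp << 6) | (b & 0x3F)  # 6 payload bits per continuation byte
    return cp


raw = bytes([0xF4, 0xB5, 0x87, 0x9E])
cp = decode_utf8_4byte(raw)
print(f"U+{cp:X}", "exceeds U+10FFFF:", cp > MAX_UNICODE)
# prints: U+1351DE exceeds U+10FFFF: True

# A strict decoder refuses the sequence outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("strict UTF-8 decode succeeds:", strict_ok)
# prints: strict UTF-8 decode succeeds: False

# The "file code" Emacs reports is the UTF-8 encoding of U+4E8C, not the file's bytes.
reported = "\u4e8c".encode("utf-8")
print(reported.hex(" "), "matches file bytes:", reported == raw)
# prints: e4 ba 8c matches file bytes: False
```

This only checks the arithmetic; it says nothing about how Emacs represents the character internally.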
Werner


======================================================================


In GNU Emacs 24.2.50.1 (i686-pc-linux-gnu, GTK+ Version 2.24.9)
 of 2012-08-28 on linux-nvf0
Windowing system distributor `The X.Org Foundation', version 11.0.11004000
Configured using:
 `configure 'MAKEINFO=/usr/bin/makeinfo' '--with-x-toolkit=gtk''

Important settings:
  value of $LANG: de_DE.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Summary

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  transient-mark-mode: t

Recent input:
w b u g - e m C-c C-q y M-x w r i t e - e m C-g C-h a b u g C-x 1 M-x r e p r t o r t - e m

Recent messages:
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft is prepared
No matching alias [7 times]
Kill draft message? (y or n) y
Saving file /home/wl/Mail/draft/11...
Wrote /home/wl/Mail/draft/11
Draft was killed
Quit
Type C-x 4 C-o RET to restore the other window.

Load-path shadows:
None found.

Features:
(shadow emacsbug message format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils apropos descr-text latexenc preview prv-emacs byte-opt tex-buf noutline outline font-latex warnings bytecomp byte-compile cconv macroexp latex easy-mmode edmacro kmacro tex-style cus-edit wid-edit cus-start cus-load pp mew-varsx mew-unix cal-menu calendar cal-loaddefs mew-auth mew-config mew-imap2 mew-imap mew-nntp2 mew-nntp mew-pop mew-smtp mew-ssl mew-ssh mew-net mew-highlight mew-sort mew-fib mew-ext mew-refile mew-demo mew-attach mew-draft mew-message mew-thread mew-virtual mew-summary4 mew-summary3 mew-summary2 mew-summary mew-search mew-pick mew-passwd mew-scan mew-syntax mew-bq mew-smime mew-pgp mew-header mew-exec mew-mark mew-mime mew-edit mew-decode mew-encode mew-cache mew-minibuf mew-complete mew-addrbook mew-local mew-vars3 mew-vars2 mew-vars mew-env mew-mule3 mew-mule mew-gemacs mew-key mew-func mew-blvs mew-const mew tex advice help-fns advice-preload tex-site auto-loads quail help-mode easymenu cjktilde disp-table time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces cus-face files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dbusbind dynamic-setting system-font-setting font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)