Kenichi Handa writes:

> &#132;Die Familie Schroffenstein&#147;
>
> I thought that the notation &#NUMBER is for transmitting
> Unicode character of code NUMBER. But, 132 and 147 are
> control codes in Unicode, not any kind of quotings.

&#NUMBERs are so-called "character references"; the SGML declaration
defines which are allowed. For HTML you must consult the html.d[e]?cl
file. The crucial section is (HTML 2):

    BASESET "ISO Registration Number 100//CHARSET ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
    DESCSET 128  32  UNUSED
            160  96  32

This basically means: &#128; to &#159; are unused. The same applies to
HTML 4 (and later to XML and XHTML):

    BASESET "ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6"
    DESCSET   0   9  UNUSED
              9   2  9
             11   2  UNUSED
             13   1  13
             14  18  UNUSED
             32  95  32
            127   1  UNUSED
            128  32  UNUSED
            [...]

To make the SGML parser happy you can provide a changed declaration:

    BASESET "ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6"
    DESCSET   0   9  UNUSED
              9   2  9
             11   2  UNUSED
             13   1  13
             14  18  UNUSED
             32  95  32
            127   1  UNUSED
            128   4  UNUSED
            132   1  "My rising double quote left (low)"
            133  14  UNUSED
            147   1  "My rising double quote right (high)"
            148  16  UNUSED
            [...]

Untested, and the result is invalid HTML. If they announced a proper
HTTP header, it could be okay:

    Content-Type: text/html; charset=windows-1252

Andreas Schwab writes:

> The numbers are supposed to be ISO 8859-1 characters codes. I'd guess the
> page has been written with some broken (a.k.a. W*nd*ws) software (the use
> of *.htm makes this apparent).

Yes, they have "interesting" guidelines online...

Kenichi Handa writes:

> Ah, I see. I found that windows-125X maps 132 and 147 to
> U+201E and U+201C. So, perhaps those systems (galeon and
> lynx) parse them as U+201E and U+201C. Anyway, how to
> encode them in X selection is their problem and Emacs can't
> do anything about it.

Yes, but once in the X selection I'd like to see Emacs honor them. The
spacing problem also occurs when I try to cut and paste from Markus
Kuhn's demo file
(http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt):

    ¥ âdeutsche
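
To illustrate the DESCSET reading discussed above, here is a minimal
Python sketch. It assumes each DESCSET entry is a triple (first code,
count, base start or UNUSED). The names HTML4_DESCSET, lookup and
cp1252_char are made up for this example and belong to no SGML toolkit.
The entries after "[...]" are elided in the declaration quoted above, so
the table only covers codes 0 to 159.

```python
# Hypothetical sketch: only the DESCSET entries quoted in the message.
# None stands for UNUSED; everything after "[...]" is not covered here.
HTML4_DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
]


def lookup(code, descset=HTML4_DESCSET):
    """Return the base-set code for `code`, or None if it is UNUSED.

    Raises ValueError for codes outside the entries listed above.
    """
    for start, count, base in descset:
        if start <= code < start + count:
            if base is None:
                return None
            return base + (code - start)
    raise ValueError(f"code {code} is not covered by the quoted entries")


def cp1252_char(code):
    """Interpret `code` as a windows-1252 byte, as some browsers do.

    Assumption: Python's "cp1252" codec stands in for that behaviour.
    It raises UnicodeDecodeError for the few bytes it leaves undefined.
    """
    return bytes([code]).decode("cp1252")


if __name__ == "__main__":
    for code in (132, 147):
        print(code, "UNUSED" if lookup(code) is None else lookup(code),
              "U+%04X" % ord(cp1252_char(code)))
```

With the quoted declaration, 132 and 147 both fall into the
"128 32 UNUSED" range, while the windows-1252 reading yields the two
code points Kenichi mentions.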