* Russian letters @ 2006-07-05 18:10 Paul Pogonyshev 2006-07-05 18:19 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread
From: Paul Pogonyshev @ 2006-07-05 18:10 UTC (permalink / raw)

Russian letters loaded from file and newly typed are different character no matter if `unify-8859-on-...-mode's are active or not.

Characters loaded from file:

  character: а (3664, #o7120, #xe50, U+0430)
  charset: cyrillic-iso8859-5 (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.)
  code point: #x50
  syntax: w which means: word
  category: y:Cyrillic
  buffer code: #x8C #xD0
  file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix)
  display: by this font (glyph code)
    -cronyx-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-5 (#xD0)

Newly typed characters:

  character: а (332880, #o1212120, #x51450, U+0430)
  charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
  code point: #x28 #x50
  syntax: w which means: word
  category: y:Cyrillic
  buffer code: #x9C #xF4 #xA8 #xD0
  file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix)
  display: by this font (glyph code)
    -Adobe-Courier-Medium-R-Normal--17-120-100-100-M-100-ISO10646-1 (#x430)

The latter are displayed as boxes on my machine, which makes editing of Russian text impossible. Reproducible with `emacs -Q'. I consider it a bug.

Paul

^ permalink raw reply [flat|nested] 23+ messages in thread
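The `describe-char' values in the report can be cross-checked outside Emacs. The following is a minimal Python sketch, not part of the original message: it checks that the shared file code is the UTF-8 encoding of U+0430 and that ISO 8859-5 places the same letter at byte 0xD0 (0x50 once the high bit is cleared). The helper `mule_unicode_0100_24ff` is hypothetical; its 96x96 arithmetic is an assumption about how the charset is laid out, used here only because it reproduces the `#x28 #x50' code point shown in the report.

```python
# Cross-check of the describe-char output for CYRILLIC SMALL LETTER A.
ch = "\u0430"

utf8_bytes = ch.encode("utf-8")        # the "file code" in both reports
iso_bytes = ch.encode("iso8859_5")     # the byte used by ISO 8859-5
iso_code_point = iso_bytes[0] & 0x7F   # right-hand part, high bit cleared


def mule_unicode_0100_24ff(code):
    """Hypothetical helper: assumed 96x96 layout starting at U+0100."""
    if not 0x100 <= code <= 0x24FF:
        raise ValueError("outside U+0100..U+24FF")
    offset = code - 0x100
    return (offset // 96 + 32, offset % 96 + 32)


print([hex(b) for b in utf8_bytes])
print(hex(iso_bytes[0]), hex(iso_code_point))
print([hex(v) for v in mule_unicode_0100_24ff(ord(ch))])
```

Both buffer representations stand for the same Unicode character and are written to disk as the same two bytes; only the internal charset, and therefore the font Emacs picks, differs.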
* Re: Russian letters 2006-07-05 18:10 Russian letters Paul Pogonyshev @ 2006-07-05 18:19 ` Andreas Schwab 2006-07-05 21:43 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2006-07-05 18:19 UTC (permalink / raw) Cc: emacs-devel Paul Pogonyshev <pogonyshev@gmx.net> writes: > Russian letters loaded from file and newly typed are different > character no matter if `unify-8859-on-...-mode's are active or > not. What's your language environment? Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 18:19 ` Andreas Schwab @ 2006-07-05 21:43 ` Paul Pogonyshev 2006-07-05 22:08 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-05 21:43 UTC (permalink / raw) Cc: Andreas Schwab Andreas Schwab wrote: > Paul Pogonyshev <pogonyshev@gmx.net> writes: > > > Russian letters loaded from file and newly typed are different > > character no matter if `unify-8859-on-...-mode's are active or > > not. > > What's your language environment? I'm not sure I understand your question. The buffer is in UTF-8 and Emacs knows that. `locale' reports LANG=en_US.utf8 LC_CTYPE="en_US.utf8" LC_NUMERIC="en_US.utf8" LC_TIME="en_US.utf8" LC_COLLATE="en_US.utf8" LC_MONETARY="en_US.utf8" LC_MESSAGES="en_US.utf8" LC_PAPER="en_US.utf8" LC_NAME="en_US.utf8" LC_ADDRESS="en_US.utf8" LC_TELEPHONE="en_US.utf8" LC_MEASUREMENT="en_US.utf8" LC_IDENTIFICATION="en_US.utf8" LC_ALL=en_US.utf8 Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 21:43 ` Paul Pogonyshev @ 2006-07-05 22:08 ` Andreas Schwab 2006-07-05 22:21 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2006-07-05 22:08 UTC (permalink / raw) Cc: emacs-devel Paul Pogonyshev <pogonyshev@gmx.net> writes: > Andreas Schwab wrote: >> Paul Pogonyshev <pogonyshev@gmx.net> writes: >> >> > Russian letters loaded from file and newly typed are different >> > character no matter if `unify-8859-on-...-mode's are active or >> > not. >> >> What's your language environment? > > I'm not sure I understand your question. C-h L (describe-language-environment) Anyway, as documented, unify-8859-on-decoding-mode can only map to `iso-latin-1' and `mule-unicode-0100-24ff'. Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 22:08 ` Andreas Schwab @ 2006-07-05 22:21 ` Paul Pogonyshev 2006-07-05 22:55 ` Andreas Schwab 2006-07-06 3:41 ` Eli Zaretskii 0 siblings, 2 replies; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-05 22:21 UTC (permalink / raw) Cc: Andreas Schwab Andreas Schwab wrote: > Paul Pogonyshev <pogonyshev@gmx.net> writes: > > > Andreas Schwab wrote: > >> Paul Pogonyshev <pogonyshev@gmx.net> writes: > >> > >> > Russian letters loaded from file and newly typed are different > >> > character no matter if `unify-8859-on-...-mode's are active or > >> > not. > >> > >> What's your language environment? > > > > I'm not sure I understand your question. > > C-h L (describe-language-environment) UTF-8 language environment Input methods (default rfc1345): rfc1345 ("m" in mode line) TeX ("\" in mode line) sgml ("&" in mode line) ucs ("U+" in mode line) Character sets: nothing specific to UTF-8 Coding systems: mule-utf-8 (`u' in mode line): UTF-8 encoding for Emacs-supported Unicode characters. It supports Unicode characters of these ranges: U+0000..U+33FF, U+E000..U+FFFF. They correspond to these Emacs character sets: ascii, latin-iso8859-1, mule-unicode-0100-24ff, mule-unicode-2500-33ff, mule-unicode-e000-ffff On decoding (e.g. reading a file), Unicode characters not in the above ranges are decoded into sequences of eight-bit-control and eight-bit-graphic characters to preserve their byte sequences. The byte sequence is preserved on i/o for valid utf-8, but not necessarily for invalid utf-8. On encoding (e.g. writing a file), Emacs characters not belonging to any of the character sets listed above are encoded into the UTF-8 byte sequence representing U+FFFD (REPLACEMENT CHARACTER). (alias: mule-utf-8 utf-8) > Anyway, as documented, unify-8859-on-decoding-mode can only map to > `iso-latin-1' and `mule-unicode-0100-24ff'. 
That's fine, but if the same characters read from file and typed from keyboard are different in a buffer, that's nothing else than a bug. Tell the average user about language environments. Ideally, Emacs should work in this case as installed, without any configuration or lines in `.emacs'. Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 22:21 ` Paul Pogonyshev @ 2006-07-05 22:55 ` Andreas Schwab 2006-07-06 15:59 ` Paul Pogonyshev 2006-07-06 3:41 ` Eli Zaretskii 1 sibling, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2006-07-05 22:55 UTC (permalink / raw) Cc: emacs-devel Paul Pogonyshev <pogonyshev@gmx.net> writes: >> Anyway, as documented, unify-8859-on-decoding-mode can only map to >> `iso-latin-1' and `mule-unicode-0100-24ff'. > > That's fine, but if the same characters read from file and typed from > keyboard are different in a buffer, that's nothing else than a bug. You can get that only if you explicitly specify a coding system during reading, otherwise your file would be decoded as latin-1 even if it is encoded as cyrillic-iso-8bit. Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 22:55 ` Andreas Schwab @ 2006-07-06 15:59 ` Paul Pogonyshev 2006-07-06 16:39 ` Andreas Schwab 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 15:59 UTC (permalink / raw) Cc: Andreas Schwab Andreas Schwab wrote: > Paul Pogonyshev <pogonyshev@gmx.net> writes: > > >> Anyway, as documented, unify-8859-on-decoding-mode can only map to > >> `iso-latin-1' and `mule-unicode-0100-24ff'. > > > > That's fine, but if the same characters read from file and typed from > > keyboard are different in a buffer, that's nothing else than a bug. > > You can get that only if you explicitly specify a coding system during > reading, otherwise your file would be decoded as latin-1 even if it is > encoded as cyrillic-iso-8bit. The file is UTF-8 and mentions its coding in `Local variables'. Again, the file is read just fine. The problems begin when I type new characters into the buffer: they are treated differently than the same characters read from the file. Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 15:59 ` Paul Pogonyshev @ 2006-07-06 16:39 ` Andreas Schwab 2006-07-06 18:17 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Andreas Schwab @ 2006-07-06 16:39 UTC (permalink / raw) Cc: emacs-devel Paul Pogonyshev <pogonyshev@gmx.net> writes: > The file is UTF-8 and mentions its coding in `Local variables'. Again, > the file is read just fine. The problems begin when I type new > characters into the buffer: they are treated differently than the same > characters read from the file. I can't reproduce that here. Whenever I read a file with russian letters that is encoded in utf-8 the letters are decoded into mule-unicode-0100-24ff. Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 16:39 ` Andreas Schwab @ 2006-07-06 18:17 ` Paul Pogonyshev 2006-07-06 20:11 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 18:17 UTC (permalink / raw) Cc: Andreas Schwab Andreas Schwab wrote: > Paul Pogonyshev <pogonyshev@gmx.net> writes: > > > The file is UTF-8 and mentions its coding in `Local variables'. Again, > > the file is read just fine. The problems begin when I type new > > characters into the buffer: they are treated differently than the same > > characters read from the file. > > I can't reproduce that here. Whenever I read a file with russian letters > that is encoded in utf-8 the letters are decoded into > mule-unicode-0100-24ff. I said many times that the problems begin when I _type_ characters, not when they are read from file. They end up being different characters, at least in the sense that `describe-char' shows different things. I presume that new characters are still valid, but they are shown as boxes (no font support, apparently), while read characters are shown just fine. Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 18:17 ` Paul Pogonyshev @ 2006-07-06 20:11 ` Eli Zaretskii 0 siblings, 0 replies; 23+ messages in thread From: Eli Zaretskii @ 2006-07-06 20:11 UTC (permalink / raw) Cc: schwab, emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Thu, 6 Jul 2006 21:17:24 +0300 > Cc: Andreas Schwab <schwab@suse.de> > > Andreas Schwab wrote: > > Paul Pogonyshev <pogonyshev@gmx.net> writes: > > > > > The file is UTF-8 and mentions its coding in `Local variables'. Again, > > > the file is read just fine. The problems begin when I type new > > > characters into the buffer: they are treated differently than the same > > > characters read from the file. > > > > I can't reproduce that here. Whenever I read a file with russian letters > > that is encoded in utf-8 the letters are decoded into > > mule-unicode-0100-24ff. > > I said many times that the problems begin when I _type_ characters, not > when they are read from file. No, you said, and I quote: Russian letters loaded from file and newly typed are different character no matter if `unify-8859-on-...-mode's are active or not. Characters loaded from file: character: a (3664, #o7120, #xe50, U+0430) charset: cyrillic-iso8859-5 (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.) code point: #x50 syntax: w which means: word category: y:Cyrillic buffer code: #x8C #xD0 file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix) display: by this font (glyph code) -cronyx-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-5 (#xD0) Newly typed characters: character: a (332880, #o1212120, #x51450, U+0430) charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.) 
code point: #x28 #x50 syntax: w which means: word category: y:Cyrillic buffer code: #x9C #xF4 #xA8 #xD0 file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix) display: by this font (glyph code) -Adobe-Courier-Medium-R-Normal--17-120-100-100-M-100-ISO10646-1 (#x430) That is, you said that characters read from a file are decoded into cyrillic-iso8859-5, while characters you type are decoded into mule-unicode-0100-24ff. Now it sounds like it's the other way around, especially since you say that the file is encoded in UTF-8 (which is _always_ decoded into mule-unicode-0100-24ff, AFAIR). Please clarify which one is it. Also, please try the same file and keyboard keys in "emacs -Q", perhaps something in your .emacs has unpleasant side effects. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-05 22:21 ` Paul Pogonyshev 2006-07-05 22:55 ` Andreas Schwab @ 2006-07-06 3:41 ` Eli Zaretskii 2006-07-06 15:56 ` Paul Pogonyshev 1 sibling, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2006-07-06 3:41 UTC (permalink / raw) Cc: emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Thu, 6 Jul 2006 01:21:26 +0300 > Cc: Andreas Schwab <schwab@suse.de> > > > Anyway, as documented, unify-8859-on-decoding-mode can only map to > > `iso-latin-1' and `mule-unicode-0100-24ff'. > > That's fine, but if the same characters read from file and typed from > keyboard are different in a buffer, that's nothing else than a bug. What was the file's encoding? Does it help to play with the value of utf-fragment-on-decoding? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 3:41 ` Eli Zaretskii @ 2006-07-06 15:56 ` Paul Pogonyshev 2006-07-06 20:12 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 15:56 UTC (permalink / raw) Eli Zaretskii wrote: > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > Date: Thu, 6 Jul 2006 01:21:26 +0300 > > Cc: Andreas Schwab <schwab@suse.de> > > > > > Anyway, as documented, unify-8859-on-decoding-mode can only map to > > > `iso-latin-1' and `mule-unicode-0100-24ff'. > > > > That's fine, but if the same characters read from file and typed from > > keyboard are different in a buffer, that's nothing else than a bug. > > What was the file's encoding? UTF-8. > Does it help to play with the value of utf-fragment-on-decoding? Not really. Nothing seems to change and the results of `describe-char' are identical. Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 15:56 ` Paul Pogonyshev @ 2006-07-06 20:12 ` Eli Zaretskii 2006-07-06 20:27 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2006-07-06 20:12 UTC (permalink / raw) Cc: emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Thu, 6 Jul 2006 18:56:54 +0300 > > Eli Zaretskii wrote: > > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > > Date: Thu, 6 Jul 2006 01:21:26 +0300 > > > Cc: Andreas Schwab <schwab@suse.de> > > > > > > > Anyway, as documented, unify-8859-on-decoding-mode can only map to > > > > `iso-latin-1' and `mule-unicode-0100-24ff'. > > > > > > That's fine, but if the same characters read from file and typed from > > > keyboard are different in a buffer, that's nothing else than a bug. > > > > What was the file's encoding? > > UTF-8. UTF-8 is always decoded into Unicode character set, while you originally said that characters read from a file were decoded into Cyrillic ISO character set. Which one is true? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 20:12 ` Eli Zaretskii @ 2006-07-06 20:27 ` Paul Pogonyshev 2006-07-06 20:38 ` Paul Pogonyshev 2006-07-06 21:14 ` Eli Zaretskii 0 siblings, 2 replies; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 20:27 UTC (permalink / raw) Eli Zaretskii wrote: > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > Date: Thu, 6 Jul 2006 18:56:54 +0300 > > > > Eli Zaretskii wrote: > > > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > > > Date: Thu, 6 Jul 2006 01:21:26 +0300 > > > > Cc: Andreas Schwab <schwab@suse.de> > > > > > > > > > Anyway, as documented, unify-8859-on-decoding-mode can only map to > > > > > `iso-latin-1' and `mule-unicode-0100-24ff'. > > > > > > > > That's fine, but if the same characters read from file and typed from > > > > keyboard are different in a buffer, that's nothing else than a bug. > > > > > > What was the file's encoding? > > > > UTF-8. > > UTF-8 is always decoded into Unicode character set, while you > originally said that characters read from a file were decoded into > Cyrillic ISO character set. Which one is true? I explicitly reverted the buffer in UTF-8 (though I know it is): C-x RET r utf-8 RET yes RET `describe-char' on the Cyrillic characters from the file shows this: character: а (3664, #o7120, #xe50, U+0430) charset: cyrillic-iso8859-5 (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.) code point: #x50 syntax: w which means: word category: y:Cyrillic buffer code: #x8C #xD0 file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix) display: by this font (glyph code) -cronyx-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-5 (#xD0) Note the file code, it is UTF-8! Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 20:27 ` Paul Pogonyshev @ 2006-07-06 20:38 ` Paul Pogonyshev 2006-07-07 8:41 ` Eli Zaretskii 2006-07-06 21:14 ` Eli Zaretskii 1 sibling, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 20:38 UTC (permalink / raw) Cc: Eli Zaretskii Paul Pogonyshev wrote: > I explicitly reverted the buffer in UTF-8 (though I know it is): > > C-x RET r utf-8 RET yes RET > > `describe-char' on the Cyrillic characters from the file shows this: > > character: а (3664, #o7120, #xe50, U+0430) > charset: cyrillic-iso8859-5 (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.) > code point: #x50 > syntax: w which means: word > category: y:Cyrillic > buffer code: #x8C #xD0 > file code: #xD0 #xB0 (encoded by coding system mule-utf-8-unix) > display: by this font (glyph code) > -cronyx-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-5 (#xD0) > > Note the file code, it is UTF-8! Actually, this doesn't happen in `emacs -Q', not sure why... There characters are decoded to `mule-unicode-0100-24ff' (and displayed as boxes, gah.) Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 20:38 ` Paul Pogonyshev @ 2006-07-07 8:41 ` Eli Zaretskii 0 siblings, 0 replies; 23+ messages in thread From: Eli Zaretskii @ 2006-07-07 8:41 UTC (permalink / raw) Cc: emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Thu, 6 Jul 2006 23:38:12 +0300 > Cc: Eli Zaretskii <eliz@gnu.org> > > Actually, this doesn't happen in `emacs -Q', not sure why... There > characters are decoded to `mule-unicode-0100-24ff' (and displayed as > boxes, gah.) This is expected behavior, AFAIK. The empty boxes mean you need to install a Unicode font that spans the Cyrillic characters. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 20:27 ` Paul Pogonyshev 2006-07-06 20:38 ` Paul Pogonyshev @ 2006-07-06 21:14 ` Eli Zaretskii 2006-07-06 21:48 ` Paul Pogonyshev 1 sibling, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2006-07-06 21:14 UTC (permalink / raw) Cc: emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Thu, 6 Jul 2006 23:27:27 +0300 > > > > > What was the file's encoding? > > > > > > UTF-8. > > > > UTF-8 is always decoded into Unicode character set, while you > > originally said that characters read from a file were decoded into > > Cyrillic ISO character set. Which one is true? > > I explicitly reverted the buffer in UTF-8 (though I know it is): > > C-x RET r utf-8 RET yes RET This gets more and more complicated with each message. Could you please post a short file (as a binary attachment) and a clear recipe how you visit it and how you type Cyrillic characters in order to reproduce the problem? Did I understand correctly that, unlike you first said, the Cyrillic ISO characters come from keyboard input, while the mule-unicode characters come from a file? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-06 21:14 ` Eli Zaretskii @ 2006-07-06 21:48 ` Paul Pogonyshev 2006-07-07 8:46 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-06 21:48 UTC (permalink / raw) [-- Attachment #1: Type: text/plain, Size: 1384 bytes --] Eli Zaretskii wrote: > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > Date: Thu, 6 Jul 2006 23:27:27 +0300 > > > > > > > What was the file's encoding? > > > > > > > > UTF-8. > > > > > > UTF-8 is always decoded into Unicode character set, while you > > > originally said that characters read from a file were decoded into > > > Cyrillic ISO character set. Which one is true? > > > > I explicitly reverted the buffer in UTF-8 (though I know it is): > > > > C-x RET r utf-8 RET yes RET > > This gets more and more complicated with each message. > > Could you please post a short file (as a binary attachment) and a > clear recipe how you visit it and how you type Cyrillic characters in > order to reproduce the problem? OK, after some testing I came up with a test (I didn't find it earlier because `customize-variable' is essential, simply setting it doesn't work): $ emacs -Q M-x customize-variable RET utf-fragment-on-decoding RET [set to t, set for current session] C-x C-f test.text RET Now, the characters from the file are decoded into `cyrillic-iso8859-5', while new, typed characters are in `mule-unicode-0100-24ff'. > Did I understand correctly that, unlike you first said, the Cyrillic > ISO characters come from keyboard input, while the mule-unicode > characters come from a file? No. 
Please try yourself, that way it must be easier to understand ;) Paul [-- Attachment #2: test.text --] [-- Type: text/plain, Size: 102 bytes --] АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ Local variables: coding: utf-8 End: ^ permalink raw reply [flat|nested] 23+ messages in thread
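For anyone who wants to repeat the recipe without the attachment, the test file can be regenerated. This is a hedged Python sketch rather than anything posted in the thread: the archive flattened the attachment, so the line breaks used below are an assumption about its layout. That layout does come to 102 bytes, the size given in the attachment header, which is suggestive but not proof.

```python
import os
import tempfile

# The 32 capital letters in the attachment are U+0410..U+042F in order.
alphabet = "".join(chr(c) for c in range(0x0410, 0x0430))

# Assumed layout: alphabet line, blank line, then the "Local variables"
# block with one entry per line.
text = alphabet + "\n\nLocal variables:\ncoding: utf-8\nEnd:\n"

path = os.path.join(tempfile.mkdtemp(), "test.text")
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()

first_line = raw.split(b"\n", 1)[0]
print(len(alphabet), len(first_line), len(raw))
```

Each Cyrillic letter takes two bytes in UTF-8, so the first line is 64 bytes long even though it holds 32 characters.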
* Re: Russian letters 2006-07-06 21:48 ` Paul Pogonyshev @ 2006-07-07 8:46 ` Eli Zaretskii 2006-07-07 19:59 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2006-07-07 8:46 UTC (permalink / raw) Cc: emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Fri, 7 Jul 2006 00:48:15 +0300 > > $ emacs -Q > M-x customize-variable RET utf-fragment-on-decoding RET > [set to t, set for current session] > C-x C-f test.text RET > > Now, the characters from the file are decoded into `cyrillic-iso8859-5', > while new, typed characters are in `mule-unicode-0100-24ff'. This is exactly what is expected. Here's the doc string of utf-fragment-on-decoding: utf-fragment-on-decoding's value is nil Whether or not to decode some chars in UTF-8/16 text into iso8859 charsets. Setting this means that the relevant Cyrillic and Greek characters are decoded into the iso8859 charsets rather than into mule-unicode-0100-24ff. The iso8859 charsets take half as much space in the buffer, but using them may affect how the buffer can be re-encoded and may require a different input method to search for them, for instance. See `unify-8859-on-decoding-mode' and `unify-8859-on-encoding-mode' for mechanisms to make this largely transparent. The reason why the default value is nil is precisely that most users will not want the fragmentation, they will want the characters to belong to a single character set. Did you set this variable to a non-nil value in your .emacs? If so, how about removing that customization? If the reason is that you don't have Unicode fonts installed, I think installing them is a better solution. ^ permalink raw reply [flat|nested] 23+ messages in thread
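The "fragmentation" the doc string describes can be pictured with a toy model. The Python sketch below is only an illustration and not Emacs code: `decode_utf8` and `in_iso8859_5` are hypothetical names, and the rule they apply (send Cyrillic letters covered by ISO 8859-5 to that charset when the flag is set) is a simplification of what the doc string says. It models only the decoding path; the typed character is given the charset reported earlier in the thread.

```python
# Toy model, not Emacs code: tag each decoded character with the charset
# it would land in, depending on a fragment flag.

def in_iso8859_5(ch):
    """True for characters in the Cyrillic block that ISO 8859-5 covers."""
    try:
        ch.encode("iso8859_5")
    except UnicodeEncodeError:
        return False
    return 0x0400 <= ord(ch) <= 0x04FF


def decode_utf8(data, fragment):
    tagged = []
    for ch in data.decode("utf-8"):
        code = ord(ch)
        if code < 0x80:
            charset = "ascii"
        elif fragment and in_iso8859_5(ch):
            charset = "cyrillic-iso8859-5"
        elif 0x100 <= code <= 0x24FF:
            charset = "mule-unicode-0100-24ff"
        else:
            charset = "other (not modelled)"
        tagged.append((ch, charset))
    return tagged


# Character as reported for keyboard input in the thread.
typed = ("\u0430", "mule-unicode-0100-24ff")

from_file_fragmented = decode_utf8(b"\xd0\xb0", fragment=True)[0]
from_file_default = decode_utf8(b"\xd0\xb0", fragment=False)[0]

print(from_file_fragmented == typed)  # same letter, different charset
print(from_file_default == typed)     # same letter, same charset
```

With the flag off, file and keyboard agree on one charset; with it on, only the decoded file content is moved to the ISO charset, which is the mismatch reported at the start of the thread.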
* Re: Russian letters 2006-07-07 8:46 ` Eli Zaretskii @ 2006-07-07 19:59 ` Paul Pogonyshev 2006-07-08 12:35 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-07 19:59 UTC (permalink / raw) Eli Zaretskii wrote: > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > Date: Fri, 7 Jul 2006 00:48:15 +0300 > > > > $ emacs -Q > > M-x customize-variable RET utf-fragment-on-decoding RET > > [set to t, set for current session] > > C-x C-f test.text RET > > > > Now, the characters from the file are decoded into `cyrillic-iso8859-5', > > while new, typed characters are in `mule-unicode-0100-24ff'. > > This is exactly what is expected. Here's the doc string of > utf-fragment-on-decoding: > > utf-fragment-on-decoding's value is nil > > Whether or not to decode some chars in UTF-8/16 text into iso8859 charsets. > [...] Why not do the same to the typed characters? Current behavior is inconsistent---some characters are decoded (into iso-8859 charsets), some are not. > The reason why the default value is nil is precisely that most users > will not want the fragmentation, they will want the characters to > belong to a single character set. I understand you, but actually, most users do not bother. Emacs should work `out of the box' and display the characters. Apparently, it can show Cyrillic letters, but won't show them, uh? Why doesn't Emacs try to decode characters on displaying? This can be done only once just to check if the decoded characters can be shown normally, not as boxes. > Did you set this variable to a non-nil value in your .emacs? If so, > how about removing that customization? If the reason is that you > don't have Unicode fonts installed, I think installing them is a > better solution. I use Debian Sarge which is only 1 year old. And Emacs doesn't work with its standard font and Cyrillic letters as is. (Well, I didn't try the standard package, but CVS `emacs -Q' shows boxes.) 
I had enough persistence to find the reason (here, thank you), but most users won't. Especially since Emacs cannot even list font families (at least I don't know how.) Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-07 19:59 ` Paul Pogonyshev @ 2006-07-08 12:35 ` Eli Zaretskii 2006-07-08 15:30 ` Paul Pogonyshev 0 siblings, 1 reply; 23+ messages in thread From: Eli Zaretskii @ 2006-07-08 12:35 UTC (permalink / raw) Cc: handa, emacs-devel > From: Paul Pogonyshev <pogonyshev@gmx.net> > Date: Fri, 7 Jul 2006 22:59:40 +0300 > > > utf-fragment-on-decoding's value is nil > > > > Whether or not to decode some chars in UTF-8/16 text into iso8859 charsets. > > [...] > > Why not do the same to the typed characters? Maybe it does, let's find out: how did you type those characters? Did you use a Leim input method (which one?), or did you type them on your keyboard? > Current behavior is inconsistent---some characters are decoded (into > iso-8859 charsets), some are not. I think it is consistent in the default configuration. > > The reason why the default value is nil is precisely that most users > > will not want the fragmentation, they will want the characters to > > belong to a single character set. > > I understand you, but actually, most users do not bother. Emacs should > work `out of the box' and display the characters. It does work `out of the box', if you don't change the value of utf-fragment-on-decoding. > Why doesn't Emacs try to decode characters on displaying? Decoding happens on input, when the characters are inserted into a buffer, not when they are displayed. Such insertion occurs when you either (a) type the characters at the keyboard, or (b) visit a file, or (c) paste them from an X selection or a clipboard, or (d) read output of some process which interacts with Emacs. (I hope I didn't forget any other possibilities.) If you describe how you typed those characters, maybe we will find a bug that needs to be fixed. > > Did you set this variable to a non-nil value in your .emacs? If so, > > how about removing that customization? If the reason is that you > > don't have Unicode fonts installed, I think installing them is a > > better solution. 
> > I use Debian Sarge which is only 1 year old. And Emacs doesn't work > with its standard font and Cyrillic letters as is. (Well, I didn't > try the standard package, but CVS `emacs -Q' shows boxes.) I had enough > persistence to find the reason (here, thank you), but most users won't. > Especially since Emacs cannot even list font families (at least I don't > know how.) I still don't understand whether you modified the value of utf-fragment-on-decoding or it came that way with Debian Sarge. In the latter case, I think it's something to complain about to Debian maintainers. The missing fonts are also an issue with Debian, I think. Perhaps they have an optional package you need to install, but since you live in a Cyrillic locale (if I understand correctly the headers of your message), I find it hard to believe that your system lacks Unicode fonts that support Cyrillic characters. If you do have these fonts installed, maybe it's yet another bug in Emacs. One of your prior messages showed that your locale is en_US.utf8. I don't know enough about font selection and fontsets; Handa-san, could you please tell Paul what information to send in order to find out why Unicode fonts aren't found by Emacs? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters 2006-07-08 12:35 ` Eli Zaretskii @ 2006-07-08 15:30 ` Paul Pogonyshev 2006-07-08 16:06 ` Eli Zaretskii 0 siblings, 1 reply; 23+ messages in thread From: Paul Pogonyshev @ 2006-07-08 15:30 UTC (permalink / raw) Cc: handa Eli Zaretskii wrote: > > From: Paul Pogonyshev <pogonyshev@gmx.net> > > Date: Fri, 7 Jul 2006 22:59:40 +0300 > > > > > utf-fragment-on-decoding's value is nil > > > > > > Whether or not to decode some chars in UTF-8/16 text into iso8859 charsets. > > > [...] > > > > Why not do the same to the typed characters? > > Maybe it does, let's find out: how did you type those characters? Did > you use a Leim input method (which one?), or did you type them on your > keyboard? I think it is Leim input method `russian-computer'. I.e. I use `C-\' in Emacs to switch between US English and Russian keyboard layouts. > > Current behavior is inconsistent---some characters are decoded (into > > iso-8859 charsets), some are not. > > I think it is consistent in the default configuration. Yes, in the default. But not if you change `utf-fragment-on-decoding', I think. > > > The reason why the default value is nil is precisely that most users > > > will not want the fragmentation, they will want the characters to > > > belong to a single character set. > > > > I understand you, but actually, most users do not bother. Emacs should > > work `out of the box' and display the characters. > > It does work `out of the box', if you don't change the value of > utf-fragment-on-decoding. And displays boxes in place of Russian characters (all of them.) If `utf-fragment-on-decoding' is non-nil, it displays read characters fine, but not the newly typed characters. > > Why doesn't Emacs try to decode characters on displaying? > > Decoding happens on input, when the characters are inserted into a > buffer, not when they are displayed. 
Such insertion occurs when you > either (a) type the characters at the keyboard, or (b) visit a file, > or (c) paste them from an X selection or a clipboard, or (d) read > output of some process which interacts with Emacs. (I hope I didn't > forget any other possibilities.) > > If you describe how you typed those characters, maybe we will find a > bug that needs to be fixed. Maybe I used imprecise words. We know that Emacs can display Russian characters if they are decoded into a national ISO charset. The same (conceptually, from the user's point of view) characters are shown as boxes when they are in UTF charset. It should be possible to display ranges from UTF charset as national charsets. I.e. if character U+0430 is displayed as ISO-8859-5 0x50, all problems are solved. No matter how the characters are encoded, if they conceptually are the same, they should be displayed using the same method, no? > > > Did you set this variable to a non-nil value in your .emacs? If so, > > > how about removing that customization? If the reason is that you > > > don't have Unicode fonts installed, I think installing them is a > > > better solution. > > > > I use Debian Sarge which is only 1 year old. And Emacs doesn't work > > with its standard font and Cyrillic letters as is. (Well, I didn't > > try the standard package, but CVS `emacs -Q' shows boxes.) I had enough > > persistence to find the reason (here, thank you), but most users won't. > > Especially since Emacs cannot even list font families (at least I don't > > know how.) > > I still don't understand whether you modified the value of > utf-fragment-on-decoding or it came that way with Debian Sarge. In > the latter case, I think it's something to complain about to Debian > maintainers. I think modification of `utf-fragment-on-decoding' is a remnant of the times I tried to solve Russian characters problem. Maybe it worked then, not sure. > The missing fonts is also an issue with Debian, I think. 
Perhaps they > have an optional package you need to install, but since you live in a > Cyrillic locale (if I understand correctly the headers of your > message), I find it hard to believe that your system lacks Unicode > fonts that don't support Cyrillic characters. I'll try writing to Debian. However, Emacs does poor job: while it _can_ show Cyrillic characters in `adobe-courier', it does so only when they are in a certain encoding. Cronyx fonts do indeed support Russian characters. However, customizing `default' face to use cronyx-courier for some reason influences only the current Emacs session. Bug? In the current session: I customize the default face to use `cronyx-courier' and press the ``Save for Future Sessions'' button. Cyrillic characters are now displayed with the Cronyx font, but ASCII characters are shown with `adobe-courier'... A new session: all characters are shown with `adobe-courier'. In particular, Cyrillic characters are shown as boxes. `.emacs' does indeed contain `cronyx-courier', but for some reason it doesn't take effect at all... Actually, I now see that I had this problem before and wrote about it in ``Pango-like font fallback (was Re: Russian numero sign)'' thread: I went to install all the fonts I could find in my Debian Sarge. And found cronyx-courier font, which looks nice _and_ has Cyrillic characters. However, when I customize the default face in Emacs and set that font family, latin characters are still displayed in adobe-courier (though Cyrillic ones are shown in cronyx-courier)... And the customization doesn't take any effect after I restart Emacs... Any ideas? Kenichi Handa answered: Perhaps that because you don't have -cronyx-courier-...-iso8859-1. Emacs by default uses an iso8859-1 font for ASCII. To change it, you must create a proper fontset by one of these ways: [...] How an average user is supposed to find it is beyond me. I disovered it only here. Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Russian letters
From: Eli Zaretskii @ 2006-07-08 16:06 UTC
Cc: handa, emacs-devel

> From: Paul Pogonyshev <pogonyshev@gmx.net>
> Date: Sat, 8 Jul 2006 18:30:12 +0300
> Cc: handa@m17n.org
>
> Eli Zaretskii wrote:
> > > From: Paul Pogonyshev <pogonyshev@gmx.net>
> > > Date: Fri, 7 Jul 2006 22:59:40 +0300
> > >
> > > > utf-fragment-on-decoding's value is nil
> > > >
> > > > Whether or not to decode some chars in UTF-8/16 text into iso8859 charsets.
> > > > [...]
> > >
> > > Why not do the same to the typed characters?
> >
> > Maybe it does, let's find out: how did you type those characters?  Did
> > you use a Leim input method (which one?), or did you type them on your
> > keyboard?
>
> I think it is Leim input method `russian-computer'.  I.e. I use `C-\' in
> Emacs to switch between US English and Russian keyboard layouts.

Handa-san, should Leim obey utf-fragment-on-decoding?  I think it
should, but maybe there's some complication that prevents it.

> No matter how the characters are encoded, if they conceptually are
> the same, they should be displayed using the same method, no?

Ideally, yes.  However, this is a harsh requirement: a font assumes a
certain encoding of a character, so Emacs cannot easily use another
font if it's for a different encoding.

> Cronyx fonts do indeed support Russian characters.  However, customizing
> `default' face to use cronyx-courier for some reason influences only the
> current Emacs session.  Bug?

Probably.  I'll let Handa-san answer this.

> Actually, I now see that I had this problem before and wrote about it in
> ``Pango-like font fallback (was Re: Russian numero sign)'' thread:
>
>     I went to install all the fonts I could find in my Debian Sarge.  And
>     found cronyx-courier font, which looks nice _and_ has Cyrillic
>     characters.  However, when I customize the default face in Emacs and
>     set that font family, latin characters are still displayed in
>     adobe-courier (though Cyrillic ones are shown in cronyx-courier)...
>     And the customization doesn't take any effect after I restart Emacs...
>     Any ideas?
>
>     Kenichi Handa answered:
>
>         Perhaps that because you don't have
>         -cronyx-courier-...-iso8859-1.  Emacs by default uses an
>         iso8859-1 font for ASCII.  To change it, you must create a
>         proper fontset by one of these ways: [...]
>
> How an average user is supposed to find it is beyond me.

They shouldn't.  But I think Debian should add a -cronyx-courier font
for Latin-1, because without that Emacs is broken for Cyrillic
scripts.  Or maybe there's some other Unicode font that covers both
Cyrillic and Latin-1.
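
The byte values quoted in the character descriptions at the start of the thread can be cross-checked outside Emacs.  What follows is a minimal sketch in Python, not something posted in the thread; it assumes only the standard `iso8859_5' and `utf-8' codecs, and the helper name `describe' is hypothetical.  It shows that U+0430 is the single byte 0xD0 in ISO 8859-5 (its low seven bits, 0x50, match the charset code point reported for the character loaded from file) and the two bytes 0xD0 0xB0 in UTF-8, matching the reported file code.

```python
def describe(ch):
    """Return a few encodings of one character as hex strings.

    Illustrative helper only; it is not an Emacs function.
    """
    iso = ch.encode("iso8859_5")   # single byte in ISO 8859-5
    utf8 = ch.encode("utf-8")      # multi-byte sequence in UTF-8
    return {
        "codepoint": "U+%04X" % ord(ch),
        "iso8859_5": iso.hex(),
        # Low 7 bits of the ISO 8859-5 byte, i.e. the position within
        # the right-hand part of the charset.
        "iso8859_5_low7": "%02x" % (iso[0] & 0x7F),
        "utf8": utf8.hex(),
    }


info = describe("\u0430")  # CYRILLIC SMALL LETTER A
print(info)
```

Both internal representations discussed in the thread therefore denote the same character and produce the same bytes on disk; they differ only in which charset, and hence which font, Emacs associates with the character.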