unofficial mirror of emacs-devel@gnu.org 
* Cyrillic, utf-8 and windows
@ 2003-12-08 18:39 Sam Steingold
  2003-12-09 18:25 ` Sam Steingold
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Steingold @ 2003-12-08 18:39 UTC


GNU Emacs 21.3.50.1 (i386-msvc-nt5.0.2195)
 of 2003-11-20 on WINSTEINGOLDLAP
--with-msvc (12.00)

I can open in Emacs a utf-8 file with Cyrillic characters in it and it
is displayed just fine - with correct glyphs &c.
I set `default-input-method' to "cyrillic-yawerty" in .emacs,
so when I try C-\ `toggle-input-method', I get 2 "character outline
boxes" in the modeline and when I type, I see these "character outline
boxes" in the buffer instead of the characters I just typed.
When I save the buffer, kill it, and re-visit the file,
I see what I just typed displayed correctly as Cyrillic!
So, why does Emacs display the characters that I type as boxes
(rectangles) but shows them correctly when loaded from a file on disk?

I use:

  (setq default-input-method "cyrillic-yawerty")
  (prefer-coding-system 'utf-8)
  (when (fboundp 'utf-translate-cjk-mode) (utf-translate-cjk-mode 1))

Is there anything else I need to do?

Thanks!


-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
NY survival guide: when crossing a street, mind cars, not streetlights.

* Re: Cyrillic, utf-8 and windows
  2003-12-08 18:39 Cyrillic, utf-8 and windows Sam Steingold
@ 2003-12-09 18:25 ` Sam Steingold
  2003-12-09 23:58   ` Kenichi Handa
  2003-12-10  1:27   ` Jason Rumney
  0 siblings, 2 replies; 8+ messages in thread
From: Sam Steingold @ 2003-12-09 18:25 UTC


> * Sam Steingold <fqf@tah.bet> [2003-12-08 13:39:32 -0500]:
>
> GNU Emacs 21.3.50.1 (i386-msvc-nt5.0.2195)
>  of 2003-11-20 on WINSTEINGOLDLAP
> --with-msvc (12.00)
>
> I can open in Emacs a utf-8 file with Cyrillic characters in it and it
> is displayed just fine - with correct glyphs &c.
> I set `default-input-method' to "cyrillic-yawerty" in .emacs,
> so when I try C-\ `toggle-input-method', I get 2 "character outline
> boxes" in the modeline and when I type, I see these "character outline
> boxes" in the buffer instead of the characters I just typed.
> When I save the buffer, kill it, and re-visit the file,
> I see what I just typed displayed correctly as Cyrillic!
> So, why does Emacs display the characters that I type as boxes
> (rectangles) but shows them correctly when loaded from a file on disk?
>
> I use:
>
>   (setq default-input-method "cyrillic-yawerty")
>   (prefer-coding-system 'utf-8)
>   (when (fboundp 'utf-translate-cjk-mode) (utf-translate-cjk-mode 1))

when I type using cyrillic-yawerty, I get this:

  character: а (07120, 3664, 0xe50, U+0430)
    charset: cyrillic-iso8859-5
             (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.)
 code point: 80
     syntax: w 	which means: word
   category: y:Cyrillic  
buffer code: 0x8C 0xD0
  file code: 0xD0 0xB0 (encoded by coding system mule-utf-8-unix)
    display: no font available

when I save the file, kill the buffer and visit the file again, that
character becomes

  character: а (01212120, 332880, 0x51450, U+0430)
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 80
     syntax: w 	which means: word
   category: y:Cyrillic  
buffer code: 0x9C 0xF4 0xA8 0xD0
  file code: 0xD0 0xB0 (encoded by coding system mule-utf-8-unix)
    display: by this font (glyph code)
     -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-80-iso10646-1 (0x430)
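
The two "code point" values in the descriptions above can be reproduced by hand. The following is only a sketch under stated assumptions: that ISO 8859-5 places U+0410..U+044F at bytes 0xB0..0xEF, and that mule-unicode-0100-24ff is a 96x96 grid starting at U+0100 with each coordinate offset by 32. The helper names are hypothetical and not part of Emacs.

```elisp
;; Hypothetical helpers for illustration only; not part of Emacs.

(defun my-iso8859-5-code-point (ucs)
  "Return the 7-bit code point of Cyrillic UCS in cyrillic-iso8859-5.
Assumes UCS lies in U+0410..U+044F, where ISO 8859-5 is a fixed
offset from Unicode (U+0410 is byte 0xB0)."
  ;; Map to the 8-bit ISO 8859-5 byte, then drop the high bit.
  (- (+ (- ucs #x0410) #xB0) #x80))

(defun my-mule-unicode-0100-24ff-code-points (ucs)
  "Return the two code points of UCS in mule-unicode-0100-24ff.
Assumes a 96x96 grid starting at U+0100, each coordinate offset by 32."
  (let ((offset (- ucs #x0100)))
    (list (+ (/ offset 96) 32)
          (+ (% offset 96) 32))))

;; Compare with "code point: 80" in the first description.
(my-iso8859-5-code-point #x0430)

;; Compare with "code point: 40 80" in the second description.
(my-mule-unicode-0100-24ff-code-points #x0430)
```

Under the same assumptions, adding 0x80 to each code point gives the trailing bytes on the "buffer code" lines (0xD0 in the first case, 0xA8 0xD0 in the second), which suggests the same U+0430 is stored as two different internal characters.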

So, how do I tell cyrillic-yawerty to insert UTF-8?!


-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
When you talk to God, it's prayer; when He talks to you, it's schizophrenia.

* Re: Cyrillic, utf-8 and windows
  2003-12-09 18:25 ` Sam Steingold
@ 2003-12-09 23:58   ` Kenichi Handa
  2003-12-11 19:38     ` Sam Steingold
  2003-12-10  1:27   ` Jason Rumney
  1 sibling, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2003-12-09 23:58 UTC
  Cc: emacs-devel

Thank you for the report.  

In article <ufzftq26b.fsf@gnu.org>, Sam Steingold <sds@gnu.org> writes:
>> 
>>  I can open in Emacs a utf-8 file with Cyrillic characters in it and it
>>  is displayed just fine - with correct glyphs &c.
>>  I set `default-input-method' to "cyrillic-yawerty" in .emacs,
>>  so when I try C-\ `toggle-input-method', I get 2 "character outline
>>  boxes" in the modeline and when I type, I see these "character outline
>>  boxes" in the buffer instead of the characters I just typed.
>>  When I save the buffer, kill it, and re-visit the file,
>>  I see what I just typed displayed correctly as Cyrillic!
>>  So, why does Emacs display the characters that I type as boxes
>>  (rectangles) but shows them correctly when loaded from a file on disk?

Because those are different characters for Emacs, as you
already found below.

> when I type using cyrillic-yawerty, I get this:
[...]
>     charset: cyrillic-iso8859-5
[...]
> when I save the file, kill the buffer and visit the file again, that
> character becomes
[...]
>     charset: mule-unicode-0100-24ff

> So, how do I tell cyrillic-yawerty to insert UTF-8?!

The input method cyrillic-yawerty generates iso-8859-5
characters, and Emacs has a facility to automatically adjust
an input character to what the buffer-file-coding-system
expects.  But I found a bug in that facility and a
shortcoming in set-default-coding-systems (called from
prefer-coding-system).  Please try the attached patch.
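
To see whether the current buffer's coding system offers a translation table at all, something like the following can be evaluated in that buffer. It is a diagnostic sketch only: it mirrors the lookup in the ucs-tables.el hunk of the attached patch and assumes a 21.3.50-era Emacs in which coding systems carry a `translation-table-for-encode' property. The message text is made up for this example.

```elisp
;; Diagnostic sketch; mirrors the lookup done in the ucs-tables.el hunk.
(when buffer-file-coding-system
  (let* ((cs (coding-system-base buffer-file-coding-system))
         (table (coding-system-get cs 'translation-table-for-encode)))
    ;; The property may hold a symbol naming the table rather than the
    ;; char-table itself; the patch adds handling for that case.
    (if (and table (symbolp table))
        (setq table (get table 'translation-table)))
    (message "%s: translation table for encode %s"
             cs
             (if (char-table-p table) "found" "not found"))))
```

If this reports "not found", the patched code falls back to the `translation-table-for-input' property, as the unchanged context lines of the hunk show.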

But there is still one problem.  As you don't have
iso8859-5 fonts, the input-method indicator in the modeline
can't be displayed correctly.  For the moment, Emacs doesn't
have a facility to automatically try other fonts
(e.g. iso10646-1).  The Emacs-unicode version has it.

---
Ken'ichi HANDA
handa@m17n.org

*** ucs-tables.el.~1.34.~	Tue Sep  2 08:25:38 2003
--- ucs-tables.el	Wed Dec 10 08:17:57 2003
***************
*** 2507,2512 ****
--- 2507,2514 ----
  		     (coding-system-base default-buffer-file-coding-system))))
        (when cs
  	(setq table (coding-system-get cs 'translation-table-for-encode))
+ 	(if (and table (symbolp table))
+ 	    (setq table (get table 'translation-table)))
  	(unless (char-table-p table)
  	  (setq table (coding-system-get cs 'translation-table-for-input)))
  	(when (char-table-p table)
*** mule-cmds.el.~1.249.~	Wed Nov 26 08:10:10 2003
--- mule-cmds.el	Wed Dec 10 08:42:43 2003
***************
*** 321,326 ****
--- 321,331 ----
    o default value for the command `set-keyboard-coding-system'."
    (check-coding-system coding-system)
    (setq-default buffer-file-coding-system coding-system)
+   (if (fboundp 'ucs-set-table-for-input)
+       (dolist (buffer (buffer-list))
+ 	(or (local-variable-p 'buffer-file-coding-system buffer)
+ 	    (ucs-set-table-for-input buffer))))
+ 
    (if default-enable-multibyte-characters
        (setq default-file-name-coding-system coding-system))
    ;; If coding-system is nil, honor that on MS-DOS as well, so

* Re: Cyrillic, utf-8 and windows
  2003-12-09 18:25 ` Sam Steingold
  2003-12-09 23:58   ` Kenichi Handa
@ 2003-12-10  1:27   ` Jason Rumney
  2003-12-10  7:20     ` Roman Belenov
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Rumney @ 2003-12-10  1:27 UTC
  Cc: emacs-devel

Sam Steingold <sds@gnu.org> writes:

> when I type using cyrillic-yawerty, I get this:
> 
>   character: а (07120, 3664, 0xe50, U+0430)
>     charset: cyrillic-iso8859-5
>              (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.)
>  code point: 80
>      syntax: w 	which means: word
>    category: y:Cyrillic  
> buffer code: 0x8C 0xD0
>   file code: 0xD0 0xB0 (encoded by coding system mule-utf-8-unix)
>     display: no font available

There seems to be a bug in the font handling of the Windows
version. Courier New (the default font) has Cyrillic characters, and
most versions of Windows support iso8859-5 - you can verify this: if
(w32-get-valid-codepages) lists 28595, then it should be supported;
so Cyrillic should display automatically. But I just checked and it
doesn't seem to be the case.
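
The codepage check mentioned above can be written as a small form to evaluate in *scratch*. This is a sketch, assuming a Windows build in which `w32-get-valid-codepages' is defined and returns a list of integers; the message strings are only illustrative.

```elisp
;; 28595 is the Windows codepage number for ISO 8859-5, per the text above.
(if (fboundp 'w32-get-valid-codepages)
    (if (memq 28595 (w32-get-valid-codepages))
        (message "Codepage 28595 (iso8859-5) is reported as valid")
      (message "Codepage 28595 (iso8859-5) is not reported as valid"))
  (message "w32-get-valid-codepages is not defined in this Emacs"))
```

The `fboundp' guard keeps the form harmless on non-Windows builds, where the function does not exist.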

 

* Re: Cyrillic, utf-8 and windows
  2003-12-10  1:27   ` Jason Rumney
@ 2003-12-10  7:20     ` Roman Belenov
  2003-12-11 19:39       ` Sam Steingold
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Belenov @ 2003-12-10  7:20 UTC
  Cc: help-emacs-windows, sds, emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> There seems to be a bug in the font handling of the Windows
> version. Courier New (the default font) has Cyrillic characters, and
> most versions of Windows support iso8859-5 - you can verify this: if
> (w32-get-valid-codepages) lists 28595, then it should be supported;
> so Cyrillic should display automatically. But I just checked and it
> doesn't seem to be the case.

The list of character sets supported by Windows can be customized (at
least in NT derivatives) via Control Panel->Regional Settings
("Advanced" tab). It seems that iso8859-5 is enabled automatically if
Russian locale is in use, but in other locales you may have to enable
iso8859-5 manually.

-- 
 							With regards, Roman.





* Re: Cyrillic, utf-8 and windows
  2003-12-09 23:58   ` Kenichi Handa
@ 2003-12-11 19:38     ` Sam Steingold
  2003-12-11 23:20       ` Kenichi Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Steingold @ 2003-12-11 19:38 UTC


> * Kenichi Handa <unaqn@z17a.bet> [2003-12-10 08:58:28 +0900]:
>
> Please try the attached patch.

works, thanks!

BTW, why does gnus save koi8 messages in something like uuencode?


-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
Abandon all hope, all ye who press Enter.

* Re: Cyrillic, utf-8 and windows
  2003-12-10  7:20     ` Roman Belenov
@ 2003-12-11 19:39       ` Sam Steingold
  0 siblings, 0 replies; 8+ messages in thread
From: Sam Steingold @ 2003-12-11 19:39 UTC
  Cc: emacs-devel

> * Roman Belenov <eoryrabi@lnaqrk.eh> [2003-12-10 10:20:04 +0300]:
>
> The list of character sets supported by Windows can be customized (at
> least in NT derivatives) via Control Panel->Regional Settings
> ("Advanced" tab). It seems that iso8859-5 is enabled automatically if
> Russian locale is in use, but in other locales you may have to enable
> iso8859-5 manually.

thanks, this helped!

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
Those who can't write, write manuals.





* Re: Cyrillic, utf-8 and windows
  2003-12-11 19:38     ` Sam Steingold
@ 2003-12-11 23:20       ` Kenichi Handa
  0 siblings, 0 replies; 8+ messages in thread
From: Kenichi Handa @ 2003-12-11 23:20 UTC
  Cc: emacs-devel

In article <u4qw7no0w.fsf@gnu.org>, Sam Steingold <sds@gnu.org> writes:
>>  * Kenichi Handa <unaqn@z17a.bet> [2003-12-10 08:58:28 +0900]:
>> 
>>  Please try the attached patch.

> works, thanks!

Thank you for testing.

> BTW, why does gnus save koi8 messages in something like uuencode?

Sorry, I have no idea about it.

---
Ken'ichi HANDA
handa@m17n.org
