* Problem with multilingual input?
@ 2007-11-20  6:11 Bostjan Vilfan
  2007-11-20  6:50 ` martin rudalics
  0 siblings, 1 reply; 5+ messages in thread
From: Bostjan Vilfan @ 2007-11-20  6:11 UTC (permalink / raw)
  To: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 6080 bytes --]

From:  <bostjanv@alum.mit.edu>
To: bug-gnu-emacs@gnu.org
Subject: Problem with multilingual input?
--text follows this line--

DESCRIPTION:

I'm having problems with Cyrillic input. For example, if I do the following:

1. Open a new file
2. Write "bla bla" on line one
3. Go to next line
4. Options->Mule->Select Input Method
5. Enter "cyrillic-translit"
6. Write "bla bla" (Cyrillic characters appear: бла бла)
7. Save
8. On query about coding system, enter mule-utf-8 (default)
9. Close file
10.ON REOPEN CYRILLIC CHARACTERS RENDERED AS EMPTY RECTANGLES

However, if I do the same thing on a Linux system the result is OK
(cyrillic characters rendered correctly)

Regards,
Bostjan Vilfan


In GNU Emacs 22.1.1 (i386-mingw-nt5.1.2600)
 of 2007-06-02 on RELEASE
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4) --cflags -Ic:/gnuwin32/include'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: ENU
  locale-coding-system: cp1250
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  encoded-kbd-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  auto-compression-mode: t
  line-number-mode: t

Recent input:
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <menu-bar> 
<help-menu> <emacs-manual> <help-echo> <help-echo> 
C-s b u g s <help-echo> <down-mouse-2> <mouse-2> <help-echo> 
<help-echo> <help-echo> <help-echo> C-h C-e C-s m u 
l t i l <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <menu-bar> 
<file> <kill-buffer> <help-echo> <help-echo> <down-mouse-2> 
<mouse-2> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <down-mouse-2> 
<mouse-2> <help-echo> <down-mouse-2> <mouse-2> <help-echo> 
<help-echo> <help-echo> <down-mouse-2> <mouse-2> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<down-mouse-2> <mouse-2> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <down-mouse-2> 
<mouse-2> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <menu-bar> <file> <kill-buffer> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <menu-bar> <help-menu> 
<report-emacs-bug>

Recent messages:
Mark saved where search started [2 times]
Loading view...done
Loading outline...
Loading easy-mmode...done
Loading outline...done
View mode: type C-h for help, h for commands, q to quit.
Mark saved where search started
Loading emacsbug...
Loading regexp-opt...done
Loading emacsbug...done









[-- Attachment #2: Type: text/html, Size: 8180 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem with multilingual input?
  2007-11-20  6:11 Bostjan Vilfan
@ 2007-11-20  6:50 ` martin rudalics
  0 siblings, 0 replies; 5+ messages in thread
From: martin rudalics @ 2007-11-20  6:50 UTC (permalink / raw)
  To: Bostjan Vilfan; +Cc: bug-gnu-emacs

> 10.ON REOPEN CYRILLIC CHARACTERS RENDERED AS EMPTY RECTANGLES
>
> However, if I do the same thing on a Linux system the result is OK
> (cyrillic characters rendered correctly)

Does it work if you do

M-x set-language-environment RET utf-8 RET

before reopening that file?

What are the values of `current-language-environment' on your
Windows and GNU/Linux systems?






^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem with multilingual input?
       [not found] <14794.6183.qm@web58603.mail.re3.yahoo.com>
@ 2007-11-21  7:23 ` martin rudalics
  0 siblings, 0 replies; 5+ messages in thread
From: martin rudalics @ 2007-11-21  7:23 UTC (permalink / raw)
  To: Bostjan Vilfan; +Cc: Bug-Gnu-Emacs

 > On Windows I tried your suggestion (set-language-environment) and the
 > result was the same (empty rectangles). Then I selected
 > Options->Mule-Show All of Mule Status and read off the current
 > language environment as UTF-8. Thus, language environment equals utf-8
 > or English does not influence the outcome.

Sorry for the delay.  I hoped someone else would respond but apparently
all language environment experts are busy at the moment.  Please CC to
bug-gnu-emacs when answering - maybe we'll get qualified help.

 > On Linux the outcome is OK (cyrillic characters visible), again

The correct name of this OS is GNU/Linux.

 > regardless of the language environment settings (utf-8 or English)

When I have a file saved with mule-utf-8 containing the two lines

bla bla
бла бла

visit the file with `current-language-environment' utf-8 and do
`describe-char' for the first character of the first line I get

   character: b (98, #o142, #x62, U+0062)
     charset: ascii (ASCII (ISO646 IRV))
  code point: #x62
      syntax: w 	which means: word
    category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
   file code: #x62 (encoded by coding system mule-utf-8-dos)
     display: by this font (glyph code)
      -outline-Courier New-normal-r-normal-normal-16-96-120-120-c-*-iso8859-1 (#x62)

`describe-char' for the first character of the second line gets me

   character: б (332881, #o1212121, #x51451, U+0431)
     charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
  code point: #x28 #x51
      syntax: w 	which means: word
    category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
   file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
     display: by this font (glyph code)
      -outline-Courier New-normal-r-normal-normal-16-96-120-120-c-*-iso10646-1 (#x431)

on WindowsME.  What do you get?
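
For readers comparing their own output with the above: the `file code: #xD0 #xB1` line is the UTF-8 encoding of U+0431, and the three numbers after the character in the `character:` line are one and the same internal code printed in decimal, octal and hex. A minimal Python sketch that checks this (Python is used purely for illustration, nothing here is an Emacs API, and the variable names are made up):

```python
# Illustration only: cross-check the numbers printed by describe-char.
ch = "\u0431"  # CYRILLIC SMALL LETTER BE

# UTF-8 encoding of U+0431, to compare with "file code: #xD0 #xB1".
utf8_bytes = ch.encode("utf-8")
utf8_hex = " ".join(f"#x{b:02X}" for b in utf8_bytes)

# The internal character code from the "character:" line, in three bases.
internal_code = 332881
same_in_all_bases = internal_code == 0o1212121 == 0x51451

print(utf8_hex)
print(same_in_all_bases)
```

If the file code on another system differs from these two bytes, the file was probably not saved as UTF-8 in the first place.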






^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem with multilingual input?
@ 2007-11-21  8:31 Bostjan Vilfan
  2007-11-21  9:28 ` Jason Rumney
  0 siblings, 1 reply; 5+ messages in thread
From: Bostjan Vilfan @ 2007-11-21  8:31 UTC (permalink / raw)
  To: martin rudalics; +Cc: Bug-Gnu-Emacs

[-- Attachment #1: Type: text/plain, Size: 6617 bytes --]

Hello,
I followed your instructions, and I think I made some progress.

When I reopened the file containing cyrillic characters with language environment = utf-8, I obtained the following results for describe-char:


 character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
 code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
  file code: #x62 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso8859-1 (#x62)


  character: б (332881, #o1212121, #x51451, U+0431)
    charset: mule-unicode-0100-24ff
         (Unicode characters of the range U+0100..U+24FF.)
 code point: #x28 #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
  file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso10646-1 (#x431)

Comparing this result to yours in your previous message, it would appear that the font is the culprit. Namely I invoke Emacs with the command line options

"C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font "-outline-Bitstream Vera Sans Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"

If I invoke Emacs simply with the command line

Emacs

then the describe-char commands yield:

  character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
 code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
  file code: #x62 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-*-iso8859-1 (#x62)


  character: б (332881, #o1212121, #x51451, U+0431)
    charset: mule-unicode-0100-24ff
         (Unicode characters of the range U+0100..U+24FF.)
 code point: #x28 #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x9C #xF4 #xA8 #xD1
  file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
    display: by this font (glyph code)
     -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-*-iso10646-1 (#x431)

and the Cyrillic characters are clearly visible. However, this still does not exhaust the possible questions. Namely, when I invoke Emacs with the "problematic font" as described above, I can still display Cyrillic characters in a new file. Problems arise only when I _reopen_ the file. To investigate this problem I invoked Emacs with

"C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font "-outline-Bitstream Vera Sans Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"

and entered the same lines as before in a new file (even without language environment = utf-8). describe-char yields

  character: b (98, #o142, #x62, U+0062)
    charset: ascii (ASCII (ISO646 IRV))
 code point: #x62
     syntax: w     which means: word
   category: a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0]) l:Latin
buffer code: #x62
  file code: #x62 (encoded by coding system iso-latin-1-dos)
    display: by this font (glyph code)
     -outline-Bitstream Vera Sans Mono-bold-r-normal-normal-16-120-96-96-c-*-iso8859-1 (#x62)


  character: б (3665, #o7121, #xe51, U+0431)
    charset: cyrillic-iso8859-5
         (Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5): ISO-IR-144.)
 code point: #x51
     syntax: w     which means: word
   category: y:Cyrillic
buffer code: #x8C #xD1
  file code: not encodable by coding system iso-latin-1-dos
    display: by this font (glyph code)
     -outline-Arial-bold-r-normal-normal-16-120-96-96-p-*-iso8859-5 (#x431)

In this case the Cyrillic characters are visible.

THUS IT WOULD APPEAR THAT IN THIS CASE EMACS IS ABLE TO SELECT A SUBSTITUTE FONT THAT RENDERS THE CHARACTERS CORRECTLY. WHY DOES IT NOT DO SO WHEN THE FILE IS REOPENED?
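
A side note on the `file code: not encodable by coding system iso-latin-1-dos` line above: Latin-1 has no slot for Cyrillic letters, which is presumably why Emacs asks for a coding system when the buffer is saved. The same limitation can be reproduced outside Emacs; a small Python sketch, for illustration only (the variable names are made up):

```python
# Illustration only: "б" cannot be written as Latin-1, but can as UTF-8.
ch = "\u0431"  # CYRILLIC SMALL LETTER BE

try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Latin-1 covers only U+0000..U+00FF, so U+0431 is rejected.
    latin1_ok = False

# UTF-8 can represent it, as the two bytes D0 B1.
utf8_ok = ch.encode("utf-8") == b"\xd0\xb1"

print(latin1_ok, utf8_ok)
```

This only explains the coding-system query on save; it says nothing about which font is picked for display.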

 
Regards,
Bostjan



[-- Attachment #2: Type: text/html, Size: 8334 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem with multilingual input?
  2007-11-21  8:31 Problem with multilingual input? Bostjan Vilfan
@ 2007-11-21  9:28 ` Jason Rumney
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Rumney @ 2007-11-21  9:28 UTC (permalink / raw)
  To: Bostjan Vilfan; +Cc: Bug-Gnu-Emacs

Bostjan Vilfan wrote:
>   character: б (332881, #o1212121, #x51451, U+0431)
>     charset: mule-unicode-0100-24ff
>          (Unicode characters of the range U+0100..U+24FF.)
>  code point: #x28 #x51
>      syntax: w     which means: word
>    category: y:Cyrillic
> buffer code: #x9C #xF4 #xA8 #xD1
>   file code: #xD0 #xB1 (encoded by coding system mule-utf-8-dos)
>     display: by this font (glyph code)
>      -outline-Bitstream Vera Sans
> Mono-bold-r-normal-normal-16-120-96-96-c-*-iso10646-1 (#x431)
>
> Comparing this result to yours in your previous message, it would
> appear that the font is the culprit. Namely I invoke Emacs with the
> command line options
>
> "C:\Program Files\Emacs\emacs-22.1\bin\runemacs.exe" -g -0 --font
> "-outline-Bitstream Vera Sans
> Mono-bold-r-normal-normal-*-*-96-96-c-*-iso8859-1"
Try the font "DejaVu Sans Mono". It is an extended version of Bitstream
Vera Sans Mono that supports many more characters, including Cyrillic.

> and the cyrillic characters are clearly visible. However, this still
> does not exhaust  the possible questions. Namely, when I invoke Emacs
> with the "problematic font" as described above, I can still display
> cyrillic characters in a new file. Problems arise only when I 
> _reopen_ the file.
The difference is the character encoding: when you enter characters,
they are entered as iso8859-5 encoded characters, so Emacs chooses a
Cyrillic font to display them. When you read them from a UTF-8 encoded
file, they are read as mule-unicode-0100-24ff encoded characters, so
Emacs chooses a Unicode font to display them. On Windows, all truetype
fonts are Unicode fonts for some subset of characters, but Emacs 22 does
not look in any more detail to see what subset that is.
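
To make the two representations concrete, here is a small Python sketch (illustration only, not Emacs code) that reproduces the `code point:` values from the describe-char outputs quoted in this thread. The 96x96 layout used for mule-unicode-0100-24ff is an assumption; it is chosen because it matches the reported `#x28 #x51`.

```python
# Illustration only: the two representations of "б" discussed above.
ch = "\u0431"  # CYRILLIC SMALL LETTER BE

# Typed via the input method: a cyrillic-iso8859-5 character.
# ISO 8859-5 stores it as one byte; its low 7 bits are what
# describe-char reports as "code point: #x51".
iso_byte = ch.encode("iso8859_5")[0]
iso_code_point = iso_byte & 0x7F

# Read back from the UTF-8 file: a mule-unicode-0100-24ff character.
# ASSUMPTION: the charset is a 96x96 grid starting at U+0100, with
# each coordinate offset by 32.
offset = ord(ch) - 0x100
row, col = divmod(offset, 96)
mule_code_point = (row + 32, col + 32)

print(hex(iso_byte), hex(iso_code_point))
print([hex(v) for v in mule_code_point])
```

Both forms denote the same letter, but they belong to different charsets, so font selection takes a different path for each.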

This will improve with Emacs 23 (once the unicode branch is merged),
since Cyrillic will always be consistently encoded as Unicode, and a new
font backend now has the ability to look more closely at Unicode fonts
to see which Unicode subranges they support.






^ permalink raw reply	[flat|nested] 5+ messages in thread
