unofficial mirror of emacs-devel@gnu.org 
* Re: [w32] display international HELLO
@ 2007-11-09  8:37 Richard Wordingham
  2007-11-09 10:55 ` Eli Zaretskii
  2007-11-09 12:40 ` Kenichi Handa
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Wordingham @ 2007-11-09  8:37 UTC (permalink / raw)
  To: emacs-devel

On 31 January 2007, Takashi Hiromatsu wrote (archived at 
http://lists.gnu.org/archive/html/emacs-devel/2007-01/msg01087.html ):

> I'm trying to display every language's "HELLO" in Emacs on Windows using 
> the original Microsoft TrueType fonts.
>    --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

> I succeeded with most of them using the "Arial Unicode MS" font, except 
> for the 7 languages listed below:
>    Amharic
>    Arabic
>    Braille
>    Hindi
>    Kannada
>    Malayalam
>    Tibetan

> My ~/.emacs contains only the font settings shown below.
> ----------------------------------------------------------------------------
> (add-to-list 'default-frame-alist '(font . "fontset-default"))

> (set-fontset-font "fontset-default"
>                   'mule-unicode-0100-24ff
>                   '("Arial Unicode MS*" . "iso10646-1"))
<snip>
> Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode 
> MS", because it has no glyphs for them. But I hope to see the other 5 
> languages with it.

> Is there any way to display them?
> Or should I use other fonts?

Hindi and Malayalam are a tougher problem.  Although the basic text is 
encoded in mule-unicode-0100-24ff, 'composition' properties are actually 
specified in the file.  The composition property should provide renderable 
text and mark-up that replace the basic text in the display; ideally this 
should be totally unnecessary on an MS Windows system.  (Realising this 
ideal requires the ability to upgrade the Uniscribe library to cover extra 
scripts and even newly admitted characters in supported scripts.)  These 
compositions are defined by elements for the charset indian-glyph, and its 
characters have no specified Unicode equivalent.  You need a non-Unicode 
font to display these characters.  Arial Unicode MS does not contain much in 
the way of shaping tables, so it will not work properly for any of the truly 
'complex' Indic scripts.  (This may be why Microsoft seems to have abandoned 
this font.)

Tibetan and Lao also use the composition property, but in terms of 
characters in the same charset.  However, I'm having display problems for 
Lao - see item 3 below.  Tibetan won't display for me as I don't have a font 
that supports Tibetan.

I'm trying to understand how the input and display mechanisms of Emacs 
22.1.1 work on Windows XP - I'm particularly interested in Indic scripts. 
My machine is set up with Thai as its 'ANSI' character set.  I'm seeing some 
rather bizarre behaviours, and I'm having difficulty understanding them. 
Once I realised that Emacs was not accepting Unicode input from the 
keyboard, I tried to understand the built-in input methods.  I investigated 
Lao input.

1. With the default font, the Windows keyboard set to Thai Kedmanee, Thai 
displays badly as it is typed.  Bits of characters are left behind as the 
typing position moves rightwards faster than it should.  However, when I 
switch to Code2000, a font with a wide Unicode coverage, Thai displays as 
well as it does with native products such as Notepad.  This may be because 
the alleged default font, Courier New, has no Thai glyphs, and so glyph 
metrics and glyphs bear no relationship to one another.

The Thai characters produced in this fashion are in one of the Unicode 
charsets (mule-unicode-0100-24ff).

2. My first discovery with Lao was that just selecting a font (Code2000) 
that supported Lao was not enough.  It would not normally display Lao 
characters (in the Lao charset), until I discovered that a trick such as

(set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))

suddenly made the Lao text displayable.  How does this work?  I have studied 
the code of xdisp.c and its supporting functions, but I cannot find where 
Emacs character codes are converted to Unicode.  I did notice that if I 
pasted Lao in from an MS application, Emacs would accept them as Unicode 
characters and they would be displayed properly if I selected an appropriate 
font.

3. Compositions of Lao characters (i.e. with the 'composition' string 
property) using the Code2000 font (the only fully working Lao font I have), 
do not display properly, whether they are in the Lao or 
mule-unicode-0100-24ff charset.  With the latter I have seen left-hand parts 
of Hangul syllables displayed instead of Lao!  Perhaps when I understand how 
uncomposed display does work, I will be able to understand this problem. At 
present I need to defeat the composition logic by typing consonant + vowel 
as <consonant, space, delete, vowel>!  The text entered thus then displays 
properly, mocking the hard work that has gone into carefully composing 
grapheme clusters.

4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one 
of its variants), the Lao input method suddenly switches from generating Lao 
characters in the Lao charset to generating Lao characters in the 
mule-unicode-0100-24ff charset.  How is this effect achieved?  I can't work 
it out.  Characters already stored in the Lao charset remain in the Lao 
charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not 
change the charset used by the Lao input method.
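
For anyone wanting to repeat the check in item 4, here is a minimal sketch 
of one way to see which charset the character after point belongs to.  It 
assumes only the built-in functions char-after, char-charset and split-char; 
the command name my-show-char-charset is made up for illustration and is not 
part of Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): report the charset of the
;; character after point, e.g. `lao' versus `mule-unicode-0100-24ff'.
(defun my-show-char-charset ()
  "Echo the charset and split form of the character after point."
  (interactive)
  (let ((ch (char-after (point))))
    (if ch
        ;; `split-char' returns a list whose first element is the charset.
        (message "charset: %s  split: %S"
                 (char-charset ch) (split-char ch))
      (message "No character after point"))))
```

Running M-x my-show-char-charset on a character typed before and after 
changing the buffer's coding system should show whether the input method 
switched charsets.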

5. Possibly not news, but I have found that with a Uniscribe that supports 
Khmer, Unicode-encoded Khmer text pasted into Emacs displays properly, 
including 'Indic rearrangement'.  As far as I can tell, Emacs 22.1 has no 
support for Khmer!  (Cursor positioning does look wrong for Khmer.)  When I 
understand what is happening with Lao, I intend to write an input method for 
Khmer - unless I find Emacs on Windows has evolved to accept UTF-16 as 
the coding system for keyboard input.

6. Latin ligaturing does not work.  'Caesar' with a ZWJ between 'a' and 'e' 
does not ligate even using a font for which it does ligate in Notepad. 
Perhaps that can get swept up with the handling of Unicode viramas, i.e. 
Indic conjuncts.
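
A minimal sketch for reproducing the test string in item 6, assuming that 
(decode-char 'ucs #x200D) yields ZERO WIDTH JOINER in this Emacs.  The 
command name my-insert-caesar-zwj is made up for illustration.

```elisp
;; Hypothetical helper (not part of Emacs): insert "Caesar" with a
;; ZERO WIDTH JOINER (U+200D) between the `a' and the `e'.
(defun my-insert-caesar-zwj ()
  "Insert the ligature test string at point."
  (interactive)
  (let ((zwj (decode-char 'ucs #x200D)))
    (if zwj
        (insert "Ca" (char-to-string zwj) "esar")
      (error "U+200D is not representable in this Emacs"))))
```

Whether the result ligates then depends on the selected font and on the 
display code, which is the point in question.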

Richard. 

* [w32] display international HELLO
@ 2007-01-31  6:34 Takashi Hiromatsu
  2007-01-31  6:51 ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Hiromatsu @ 2007-01-31  6:34 UTC (permalink / raw)
  To: Emacs Devel ML

Dear all,

I'm trying to display every language's "HELLO" in Emacs on Windows using
the original Microsoft TrueType fonts.
    --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

I succeeded with most of them using the "Arial Unicode MS" font, except
for the 7 languages listed below:
    Amharic
    Arabic
    Braille
    Hindi
    Kannada
    Malayalam
    Tibetan

My ~/.emacs contains only the font settings shown below.
----------------------------------------------------------------------------
(add-to-list 'default-frame-alist '(font . "fontset-default"))

(set-fontset-font "fontset-default"
                  'mule-unicode-0100-24ff
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'latin-iso8859-3
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'cyrillic-iso8859-5
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'greek-iso8859-7
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'vietnamese-viscii-lower
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'vietnamese-viscii-upper
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'tibetan
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'lao
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'ipa
                  '("Arial Unicode MS*" . "iso10646-1"))

----------------------------------------------------------------------------
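
The repeated calls above could also be written as a single loop.  This is
only a sketch, assuming the same fontset, the same charset names, and the
same font pattern as in the settings above; it is meant to behave the same
way, not to fix the display problem.

```elisp
;; Apply the same font pattern to every charset listed in the settings above.
(dolist (charset '(mule-unicode-0100-24ff
                   latin-iso8859-3
                   cyrillic-iso8859-5
                   greek-iso8859-7
                   vietnamese-viscii-lower
                   vietnamese-viscii-upper
                   tibetan
                   lao
                   ipa))
  (set-fontset-font "fontset-default"
                    charset
                    '("Arial Unicode MS*" . "iso10646-1")))
```

Adding another charset then only means adding one symbol to the list.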
According to Microsoft, the "Arial Unicode MS" font covers the many
ranges listed below:
    Basic Latin (95); Latin-1 Supplement (96); Latin Extended-A (128);
    Latin Extended-B (148); IPA Extensions (89); Spacing Modifier Letters
    (57); Combining Diacritical Marks (72); Greek (105); Cyrillic (226);
    Armenian (85); Hebrew (82); Arabic (194); Devanagari (104); Bengali
    (89); Gurmukhi (75); Gujarati (78); Oriya (79); Tamil (61); Telugu
    (80); Kannada (80); Malayalam (78); Thai (87); Lao (65); Tibetan
    (168); Georgian (78); Hangul Jamo (240); Latin Extended Additional
    (246); Greek Extended (233); General Punctuation (63); Superscripts
    and Subscripts (28); Currency Symbols (13); Combining Diacritical
    Marks for Symbols (18); Letterlike Symbols (57); Number Forms (48);
    Arrows (91); Mathematical Operators (242); Miscellaneous Technical
    (123); Control Pictures (37); Optical Character Recognition (11);
    Enclosed Alphanumerics (139); Box Drawing (128); Block Elements (22);
    Geometric Shapes (80); Miscellaneous Symbols (106); Dingbats (160);
    CJK Symbols and Punctuation (57); Hiragana (90); Katakana (94);
    Bopomofo (40); Hangul Compatibility Jamo (94); Kanbun (16); Enclosed
    CJK Letters and Months (202); CJK Compatibility (249); CJK Unified
    Ideographs (20,902); Hangul Syllables (11,172); CJK Compatibility
    Ideographs (302); Alphabetic Presentation Forms (57); Arabic
    Presentation Forms-A (593); Combining Half Marks (4); CJK
    Compatibility Forms (28); Small Form Variants (26); Arabic
    Presentation Forms-B (139); Halfwidth and Fullwidth Forms (223);
    Specials (2)

----------------------------------------------------------------------------

Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode
MS", because it has no glyphs for them. But I hope to see the other 5
languages with it.

Is there any way to display them?
Or should I use other fonts?

Takashi Hiromatsu


end of thread, other threads:[~2007-11-21  1:51 UTC | newest]

Thread overview: 13+ messages
2007-11-09  8:37 [w32] display international HELLO Richard Wordingham
2007-11-09 10:55 ` Eli Zaretskii
2007-11-09 12:40 ` Kenichi Handa
2007-11-15  2:48   ` Richard Wordingham
2007-11-19  2:35     ` Kenichi Handa
2007-11-19  8:51       ` Jason Rumney
2007-11-20  1:49       ` Richard Wordingham
2007-11-20 11:30         ` Jason Rumney
2007-11-20 12:50           ` Kenichi Handa
2007-11-21  1:51       ` Richard Wordingham
  -- strict thread matches above, loose matches on Subject: below --
2007-01-31  6:34 Takashi Hiromatsu
2007-01-31  6:51 ` Kenichi Handa
2007-01-31  7:07   ` Takashi Hiromatsu
