[w32] display international HELLO

unofficial mirror of emacs-devel@gnu.org 

* [w32] display international HELLO
@ 2007-01-31  6:34 Takashi Hiromatsu
  2007-01-31  6:51 ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Hiromatsu @ 2007-01-31  6:34 UTC (permalink / raw)
  To: Emacs Devel ML

Dear all,

I'm trying to display every language's "HELLO" in Emacs on Windows
using the original Microsoft TrueType fonts.
    --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

I succeeded with many of them using the "Arial Unicode MS" font, except for
the 7 languages listed below:
    Amharic
    Arabic
    Braille
    Hindi
    Kannada
    Malayalam
    Tibetan

My ~/.emacs contains only the font settings shown below.
----------------------------------------------------------------------------
(add-to-list 'default-frame-alist '(font . "fontset-default"))

(set-fontset-font "fontset-default"
                  'mule-unicode-0100-24ff
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'latin-iso8859-3
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'cyrillic-iso8859-5
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'greek-iso8859-7
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'vietnamese-viscii-lower
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'vietnamese-viscii-upper
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'tibetan
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'lao
                  '("Arial Unicode MS*" . "iso10646-1"))

(set-fontset-font "fontset-default"
                  'ipa
                  '("Arial Unicode MS*" . "iso10646-1"))

----------------------------------------------------------------------------
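
As a side note, the repeated calls above could probably be written more
compactly as a loop.  This is only a sketch: it assumes the same Emacs 22
charset names and the same font pattern as in the settings above, and should
have the same effect as the individual calls.

```elisp
;; Sketch: one set-fontset-font call per charset, using the same
;; fontset name, charset symbols and font pattern as the settings above.
(dolist (charset '(mule-unicode-0100-24ff
                   latin-iso8859-3
                   cyrillic-iso8859-5
                   greek-iso8859-7
                   vietnamese-viscii-lower
                   vietnamese-viscii-upper
                   tibetan
                   lao
                   ipa))
  (set-fontset-font "fontset-default"
                    charset
                    '("Arial Unicode MS*" . "iso10646-1")))
```
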
Microsoft states that the "Arial Unicode MS" font covers the Unicode
blocks listed below (glyph counts in parentheses):
    Basic Latin (95); Latin-1 Supplement (96); Latin Extended-A (128);
    Latin Extended-B (148); IPA Extensions (89); Spacing Modifier Letters
    (57); Combining Diacritical Marks (72); Greek (105); Cyrillic (226);
    Armenian (85); Hebrew (82); Arabic (194); Devanagari (104); Bengali
    (89); Gurmukhi (75); Gujarati (78); Oriya (79); Tamil (61); Telugu
    (80); Kannada (80); Malayalam (78); Thai (87); Lao (65); Tibetan
    (168); Georgian (78); Hangul Jamo (240); Latin Extended Additional
    (246); Greek Extended (233); General Punctuation (63); Superscripts
    and Subscripts (28); Currency Symbols (13); Combining Diacritical
    Marks for Symbols (18); Letterlike Symbols (57); Number Forms (48);
    Arrows (91); Mathematical Operators (242); Miscellaneous Technical
    (123); Control Pictures (37); Optical Character Recognition (11);
    Enclosed Alphanumerics (139); Box Drawing (128); Block Elements (22);
    Geometric Shapes (80); Miscellaneous Symbols (106); Dingbats (160);
    CJK Symbols and Punctuation (57); Hiragana (90); Katakana (94);
    Bopomofo (40); Hangul Compatibility Jamo (94); Kanbun (16); Enclosed
    CJK Letters and Months (202); CJK Compatibility (249); CJK Unified
    Ideographs (20,902); Hangul Syllables (11,172); CJK Compatibility
    Ideographs (302); Alphabetic Presentation Forms (57); Arabic
    Presentation Forms-A (593); Combining Half Marks (4); CJK
    Compatibility Forms (28); Small Form Variants (26); Arabic
    Presentation Forms-B (139); Halfwidth and Fullwidth Forms (223);
    Specials (2)

----------------------------------------------------------------------------

Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode
MS", because it has no glyphs for them. But I hope to display the other 5
languages with it.

Is there any way to display them?
Or should I use other fonts?

Takashi Hiromatsu


* Re: [w32] display international HELLO
  2007-01-31  6:34 Takashi Hiromatsu
@ 2007-01-31  6:51 ` Kenichi Handa
  2007-01-31  7:07   ` Takashi Hiromatsu
  0 siblings, 1 reply; 13+ messages in thread
From: Kenichi Handa @ 2007-01-31  6:51 UTC (permalink / raw)
  To: Takashi Hiromatsu; +Cc: emacs-devel

In article <u3b5ru2jy.wl%takashi-hiromatsu@isuzu.co.jp>, Takashi Hiromatsu <matsuan@ca2.so-net.ne.jp> writes:

> Dear all,
> I'm trying to display every language's "HELLO" in Emacs on Windows
> using the original Microsoft TrueType fonts.
>     --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

> I succeeded with many of them using the "Arial Unicode MS" font, except for
> the 7 languages listed below:
>     Amharic
>     Arabic
>     Braille
>     Hindi
>     Kannada
>     Malayalam
>     Tibetan

The current Emacs still doesn't have a proper OpenType font
driver for displaying Indic scripts on Windows.  I'm working
on it in the emacs-unicode-2 branch, but progress is very
slow.  :-(

By the way, does "Arial Unicode MS" contain proper OpenType
tables for Indic scripts?

---
Kenichi Handa
handa@m17n.org


* Re: [w32] display international HELLO
  2007-01-31  6:51 ` Kenichi Handa
@ 2007-01-31  7:07   ` Takashi Hiromatsu
  0 siblings, 0 replies; 13+ messages in thread
From: Takashi Hiromatsu @ 2007-01-31  7:07 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

At Wed, 31 Jan 2007 15:51:49 +0900,
Kenichi Handa wrote:
> >     Kannada
> >     Malayalam
> >     Tibetan
> 
> The current Emacs still doesn't have a proper OpenType font
> driver for displaying Indic scripts on Windows.  I'm working
> on it in the emacs-unicode-2 branch, but progress is very
> slow.  :-(
Thank you for your quick reply. I will try BDF fonts for the Indic scripts.

> By the way, does "Arial Unicode MS" contain proper OpenType
> tables for Indic scripts?
Sorry, I have no detailed information about it beyond what Microsoft publishes.

Takashi Hiromatsu


* Re: [w32] display international HELLO
@ 2007-11-09  8:37 Richard Wordingham
  2007-11-09 10:55 ` Eli Zaretskii
  2007-11-09 12:40 ` Kenichi Handa
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Wordingham @ 2007-11-09  8:37 UTC (permalink / raw)
  To: emacs-devel

On 31 January 2007, Takashi Hiromatsu wrote (archived at 
http://lists.gnu.org/archive/html/emacs-devel/2007-01/msg01087.html ):

> I'm trying to display every language's "HELLO" in Emacs on Windows
> using the original Microsoft TrueType fonts.
>    --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

> I succeeded with many of them using the "Arial Unicode MS" font, except for
> the 7 languages listed below:
>    Amharic
>    Arabic
>    Braille
>    Hindi
>    Kannada
>    Malayalam
>    Tibetan

> My ~/.emacs contains only the font settings shown below.
> ----------------------------------------------------------------------------
> (add-to-list 'default-frame-alist '(font . "fontset-default"))

> (set-fontset-font "fontset-default"
>                   'mule-unicode-0100-24ff
>                   '("Arial Unicode MS*" . "iso10646-1"))
<snip>
> Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode
> MS", because it has no glyphs for them. But I hope to display the other 5
> languages with it.

> Is there any way to display them?
> Or should I use other fonts?

Hindi and Malayalam are a tougher problem.  Although the basic text is 
encoded in mule-unicode-0100-24ff, 'composition' properties are actually 
specified in the file.  The composition property should provide renderable 
text and mark-up which replace the basic text in the display, which ideally 
should be totally unnecessary in an MS Windows system.  (Realising this 
ideal requires the ability to upgrade the Uniscribe library to cover extra 
scripts and even newly admitted characters in supported scripts.)  These 
compositions are defined by elements for the charset indian-glyph, and its 
characters have no specified Unicode equivalent.  You need a non-Unicode 
font to display these characters.  Arial Unicode MS does not contain much in 
the way of shaping tables, so it will not work properly for any of the truly 
'complex' Indic scripts.  (This may be why Microsoft seems to have abandoned 
this font.)

Tibetan and Lao also use the composition property, but in terms of 
characters in the same charset.  However, I'm having display problems for 
Lao - see item 3 below.  Tibetan won't display for me as I don't have a font 
that supports Tibetan.

I'm trying to understand how the input and display mechanisms of Emacs 
22.1.1 work on Windows XP - I'm particularly interested in Indic scripts. 
My machine is set up with Thai as its 'ANSI' character set.  I'm seeing some 
rather bizarre behaviours, and I'm having difficulty understanding them. 
Once I realised that Emacs was not accepting Unicode input from the 
keyboard, I tried to understand the built-in input methods.  I investigated 
Lao input.

1. With the default font, the Windows keyboard set to Thai Kedmanee, Thai 
displays badly as it is typed.  Bits of characters are left behind as the 
typing position moves rightwards faster than it should.  However, when I 
switch to Code2000, a font with a wide Unicode coverage, Thai displays as 
well as it does with native products such as Notepad.  This may be because 
the alleged default font, Courier New, has no Thai glyphs, and so glyph 
metrics and glyphs bear no relationship to one another.

The Thai characters produced in this fashion are in one of the Unicode 
charsets (mule-unicode-0100-24ff).

2. My first discovery with Lao was that just selecting a font (Code2000) 
that supported Lao was not enough.  It would not normally display Lao 
characters (in the Lao charset), until I discovered that a trick such as

(set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))

suddenly made the Lao text displayable.  How does this work?  I have studied 
the code of xdisp.c and its supporting functions, but I cannot find where 
Emacs character codes are converted to Unicode.  I did notice that if I 
pasted Lao in from an MS application, Emacs would accept them as Unicode 
characters and they would be displayed properly if I selected an appropriate 
font.

3. Compositions of Lao characters, (i.e. with the 'composition' string 
property) using the Code2000 font (the only fully working Lao font I have), 
do not display properly, whether they are in the Lao or 
mule-unicode-0100-24ff charset.  With the latter I have seen left-hand parts 
of Hangul syllables displayed instead of Lao!  Perhaps when I understand how 
uncomposed display does work, I will be able to understand this problem. At 
present I need to defeat the composition logic by typing consonant + vowel 
as <consonant, space, delete, vowel>!  The text entered thus then displays 
properly, mocking the hard work that has gone into carefully composing 
grapheme clusters.

4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one 
of its variants), the Lao input method suddenly switches from generating Lao 
characters in the Lao charset to generating Lao characters in the 
mule-unicode-0100-24ff charset.  How is this effect achieved?  I can't work 
it out.  Characters already stored in the Lao charset remain in the Lao 
charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not 
change the charset used by the Lao input method.

5. Possibly not news, but I have found that with a Uniscribe that supports 
Khmer, Unicode-encoded Khmer text pasted in to Emacs displays properly, 
including 'Indic rearrangement'.  As far as I can tell, Emacs 22.1 has no 
support for Khmer!  (Cursor positioning does look wrong for Khmer.)  When I 
understand what is happening with Lao, I intend to write an input method for 
Khmer - unless I find Emacs on Windows has evolved to accepting UTF-16 as 
the coding system for keyboard input.

6. Latin ligaturing does not work.  'Caesar' with a ZWJ between 'a' and 'e' 
does not ligate even using a font for which it does ligate in Notepad. 
Perhaps that can get swept up with the handling of Unicode viramas, i.e. 
Indic conjuncts.

Richard. 


* Re: [w32] display international HELLO
  2007-11-09  8:37 [w32] display international HELLO Richard Wordingham
@ 2007-11-09 10:55 ` Eli Zaretskii
  2007-11-09 12:40 ` Kenichi Handa
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2007-11-09 10:55 UTC (permalink / raw)
  To: Richard Wordingham; +Cc: emacs-devel

> From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
> Date: Fri, 9 Nov 2007 08:37:42 -0000
> 
> > Is there any way to display them?
> > Or should I use other fonts?

I guess some of the useful stuff you posted should find its way into
PROBLEMS.  So please keep posting any findings you make as a result
of this discussion.  Thanks!

> 2. My first discovery with Lao was that just selecting a font (Code2000) 
> that supported Lao was not enough.  It would not normally display Lao 
> characters (in the Lao charset), until I discovered that a trick such as
> 
> (set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))
> 
> suddenly made the Lao text displayable.  How does this work?  I have studied 
> the code of xdisp.c and its supporting functions, but I cannot find where 
> Emacs character codes are converted to Unicode.

I think you need to look in w32term.c as well.  For example, the
function w32_encode_char there seems to be a good place to start.

> I did notice that if I pasted Lao in from an MS application, Emacs
> would accept them as Unicode characters and they would be displayed
> properly if I selected an appropriate font.

Yes, Emacs on Windows uses Unicode for working with the clipboard
whenever possible.

> 4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one 
> of its variants), the Lao input method suddenly switches from generating Lao 
> characters in the Lao charset to generating Lao characters in the 
> mule-unicode-0100-24ff charset.  How is this effect achieved?

That's because UTF-8 is supported only for mule-unicode-0100-24ff, so
decoding UTF-8 will _always_ produce Unicode characters.

> Characters already stored in the Lao charset remain in the Lao 
> charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

Yes.  This is because changing buffer's encoding does not change the
buffer contents in any way, it just tells Emacs how to encode the
buffer text when the time comes to write it to disk.  At that time, if
there are any characters in the buffer that cannot be encoded in the
encoding you specified, Emacs will complain and show you those characters,
so that you can choose a different encoding.
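
A minimal way to observe this, sketched under the assumption that point is on
a character that was already in the buffer (for example a Lao character typed
earlier), is to change the coding system and then ask for the charset of the
character at point:

```elisp
;; Sketch: change only the coding system used when the buffer is saved.
(set-buffer-file-coding-system 'utf-8)

;; Then inspect the charset of the character at point.  Point must be on
;; a character, since char-after returns nil at the end of the buffer.
(char-charset (char-after (point)))
```

Per the explanation above, the second form would be expected to report the
same charset before and after the first form is evaluated.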

> Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not 
> change the charset used by the Lao input method.

Probably a bug of some sort.

> 6. Latin ligaturing does not work.

Yes, Emacs currently does not support this feature.


* Re: [w32] display international HELLO
  2007-11-09  8:37 [w32] display international HELLO Richard Wordingham
  2007-11-09 10:55 ` Eli Zaretskii
@ 2007-11-09 12:40 ` Kenichi Handa
  2007-11-15  2:48   ` Richard Wordingham
  1 sibling, 1 reply; 13+ messages in thread
From: Kenichi Handa @ 2007-11-09 12:40 UTC (permalink / raw)
  To: Richard Wordingham; +Cc: emacs-devel

In article <001501c822ab$ccfec5e0$d5101252@JRWXP1>, "Richard Wordingham" <richard.wordingham@ntlworld.com> writes:

> Hindi and Malayalam are a tougher problem.  Although the basic text is 
> encoded in mule-unicode-0100-24ff, 'composition' properties are actually 
> specified in the file.  The composition property should provide renderable 
> text and mark-up which replace the basic text in the display, which ideally 
> should be totally unnecessary in an MS Windows system.  (Realising this 
> ideal requires the ability to upgrade the Uniscribe library to cover extra 
> scripts and even newly admitted characters in supported scripts.)  These 
> compositions are defined by elements for the charset indian-glyph, and its 
> characters have no specified Unicode equivalent.  You need a non-Unicode 
> font to display these characters.  Arial Unicode MS does not contain much in 
> the way of shaping tables, so it will not work properly for any of the truly 
> 'complex' Indic scripts.  (This may be why Microsoft seems to have abandoned 
> this font.)

In emacs-unicode-2 branch, I'm working on supporting Indic
(and any other scripts that require CTL (Complex Text
Layout)) by OpenType fonts (though the progress is slow).

> 2. My first discovery with Lao was that just selecting a font (Code2000) 
> that supported Lao was not enough.  It would not normally display Lao 
> characters (in the Lao charset), until I discovered that a trick such as

> (set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))

> suddenly made the Lao text displayable.  How does this work?  I have studied 
> the code of xdisp.c and its supporting functions, but I cannot find where 
> Emacs character codes are converted to Unicode.  I did notice that if I 
> pasted Lao in from an MS application, Emacs would accept them as Unicode 
> characters and they would be displayed properly if I selected an appropriate 
> font.

Emacs 22 still doesn't unify characters in legacy charsets
and Unicode by default.  And, as Lao doesn't have an official
national charset, Emacs invented one long ago (before
Unicode).  The Lao characters in the HELLO file use that
charset.  The above set-fontset-font tells Emacs to use an
iso10646-1 font (i.e. a Unicode-encoded font) for that Lao
charset.  Then, the CCL code ccl-encode-unicode-font
(defined in lisp/international/fontset.el) converts Lao
codes to the corresponding Unicode codes at display time.  It
is done in x_encode_char of xterm.c (or in w32_encode_char
of w32term.c).

This ugly mechanism is not used in emacs-unicode-2.

> 3. Compositions of Lao characters, (i.e. with the 'composition' string 
> property) using the Code2000 font (the only fully working Lao font I have), 
> do not display properly, whether they are in the Lao or 
> mule-unicode-0100-24ff charset.  With the latter I have seen left-hand parts 
> of Hangul syllables displayed instead of Lao!  Perhaps when I understand how 
> uncomposed display does work, I will be able to understand this problem. At 
> present I need to defeat the composition logic by typing consonant + vowel 
> as <consonant, space, delete, vowel>!  The text entered thus then displays 
> properly, mocking the hard work that has gone into carefully composing 
> grapheme clusters.

I think it's a waste of time to learn composition mechanism
of Emacs 22.  It will be a lot improved in emacs-unicode-2.

> 4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one 
> of its variants), the Lao input method suddenly switches from generating Lao 
> characters in the Lao charset to generating Lao characters in the 
> mule-unicode-0100-24ff charset.  How is this effect
> achieved?

When a buffer is created, Emacs 22 sets up the char table
translation-table-for-input to suit the buffer's
file-coding-system.  That table converts the Lao charset to the
mule-unicode-0100-24ff charset within an input method.
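
One way to look at this table from Lisp is sketched below.  It assumes Emacs
22, that the variable holds a char table in the current buffer, and that point
is on a character; the guards are there because none of that is certain in
every buffer.

```elisp
;; Sketch: look up the character at point in the input translation table.
;; The result is the replacement character, or nil if there is no mapping.
(let ((ch (char-after (point))))
  (when (and ch
             (boundp 'translation-table-for-input)
             (char-table-p translation-table-for-input))
    (aref translation-table-for-input ch)))
```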

>  I can't work 
> it out.  Characters already stored in the Lao charset remain in the Lao 
> charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

> Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not 
> change the charset used by the Lao input method.

Yes.  Just saving doesn't change buffer contents.  Re-read
the file.

Anyway, emacs-unicode-2 doesn't have those bizarre problems.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: [w32] display international HELLO
  2007-11-09 12:40 ` Kenichi Handa
@ 2007-11-15  2:48   ` Richard Wordingham
  2007-11-19  2:35     ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Wordingham @ 2007-11-15  2:48 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa wrote:
> Richard Wordingham wrote:

Thanks for the explanations.

>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>> property) using the Code2000 font (the only fully working Lao font I 
>> have),
>> do not display properly, whether they are in the Lao or
>> mule-unicode-0100-24ff charset.

> I think it's a waste of time to learn composition mechanism
> of Emacs 22.  It will be a lot improved in emacs-unicode-2.

Are you talking of future work or the present state of emacs-unicode-2, as 
in Emacs 23.0.60.0?  Are you talking of applying the composition property, 
or of the rendering of composed sequences?  With the help of Jason Rumney I 
got 23.0.60.0 to build and run on Windows XP, but the Lao compositions are 
still displaying unreadably bizarrely.  Unfortunately the more thorough 
application of composition in emacs-unicode-2 currently makes matters worse! 
Before, one could easily if tediously avoid the composition mechanism.  For 
example, to make a Lao closed syllable consonant-vowel sequence <ka>, in 
Emacs 22.1 one can type d, space, delete, a.  The best I can find in Emacs 
23.0.60.0 is d, space, a, left, delete.  However, as soon as one moves the 
cursor the characters compose and are then misdisplayed.

The display problems also apply to Thai.  It seems, however, that with Emacs 
22.1 and a system default codepage ('ANSI' in Microsoft jargon) of Thai 
(CP-874), using a Thai keyboard avoids composition until the Emacs Thai 
input method (I've only investigated Kedmanee - I don't use Pattachote) is 
selected.   Thus, Emacs 22.1 works fairly well for Thai on a 'Thai PC'. 
Emacs 23.0.60.0 seems not to be aware of the system default codepage - just 
switching the keyboard to Thai Kedmanee in Windows and issuing no commands 
in Emacs resulted in Latin-1 characters appearing as I typed.

These display problems seem to me to be a straightforward bug, at least as 
far as operation on MS Windows is concerned.  However, I am still no nearer 
to locating it.

Richard. 


* Re: [w32] display international HELLO
  2007-11-15  2:48   ` Richard Wordingham
@ 2007-11-19  2:35     ` Kenichi Handa
  2007-11-19  8:51       ` Jason Rumney
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kenichi Handa @ 2007-11-19  2:35 UTC (permalink / raw)
  To: Richard Wordingham; +Cc: emacs-devel

In article <00d801c82732$0001dfb0$d5101252@JRWXP1>, "Richard Wordingham" <richard.wordingham@ntlworld.com> writes:

>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>> property) using the Code2000 font (the only fully working Lao font I 
>>> have),
>>> do not display properly, whether they are in the Lao or
>>> mule-unicode-0100-24ff charset.

> > I think it's a waste of time to learn composition mechanism
> > of Emacs 22.  It will be a lot improved in emacs-unicode-2.

> Are you talking of future work or the present state of emacs-unicode-2, as 
> in Emacs 23.0.60.0?

I'm talking about future work, but the latest
emacs-unicode-2 code already contains some half-done code.

> Are you talking of applying the composition property, 
> or of the rendering of composed sequences?

I'm going to allow each font backend to generate proper
composition information that will vary depending on the font,
instead of the current fixed way of composition.  So, on
Windows, perhaps the font backend can utilize Uniscribe.  On
GNU/Linux, I have not yet decided what to do: using Pango,
using m17n-lib, or using newly written code that directly
uses libotf or harfbuzz for OTF handling.  I think the last
choice will result in the fastest rendering.

> With the help of Jason Rumney I 
> got 23.0.60.0 to build and run on Windows XP, but the Lao compositions are 
> still displaying unreadably bizarrely.  Unfortunately the more thorough 
> application of composition in emacs-unicode-2 currently makes matters worse! 
> Before, one could easily if tediously avoid the composition mechanism.  For 
> example, to make a Lao closed syllable consonant-vowel sequence <ka>, in 
> Emacs 22.1 one can type d, space, delete, a.  The best I can find in Emacs 
> 23.0.60.0 is d, space, a, left, delete.  However, as soon as one moves the 
> cursor the characters compose and are then misdisplayed.

That is because of the auto-composition mechanism introduced
in emacs-unicode-2.  The problem is that the current fixed
composition is suitable only for a specific font (usually a
fixed-width terminal font).  Even on Windows, I think
there's a way to use BDF fonts distributed as intlfonts.  If
you use those fonts on Windows, the rendering should be
good.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: [w32] display international HELLO
  2007-11-19  2:35     ` Kenichi Handa
@ 2007-11-19  8:51       ` Jason Rumney
  2007-11-20  1:49       ` Richard Wordingham
  2007-11-21  1:51       ` Richard Wordingham
  2 siblings, 0 replies; 13+ messages in thread
From: Jason Rumney @ 2007-11-19  8:51 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: Richard Wordingham, emacs-devel

Kenichi Handa wrote:
>   Even on Windows, I think
> there's a way to use BDF fonts distributed as intlfonts.

Not with the new font backend though.


* Re: [w32] display international HELLO
  2007-11-19  2:35     ` Kenichi Handa
  2007-11-19  8:51       ` Jason Rumney
@ 2007-11-20  1:49       ` Richard Wordingham
  2007-11-20 11:30         ` Jason Rumney
  2007-11-21  1:51       ` Richard Wordingham
  2 siblings, 1 reply; 13+ messages in thread
From: Richard Wordingham @ 2007-11-20  1:49 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa wrote:

> Richard Wordingham writes:
>
>>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>>> property) using the Code2000 font (the only fully working Lao font I
>>>> have),
>>>> do not display properly, whether they are in the Lao or
>>>> mule-unicode-0100-24ff charset.

> I'm going to allow each font backend to generate proper
> composition information that will vary depending on the font,
> instead of the current fixed way of composition.  So, on
> Windows, perhaps the font backend can utilize Uniscribe.

For OpenType fonts in scripts supported by Uniscribe, that's generally the 
way to go - especially for quick results.  Might Pango be superior, even on 
MS Windows, though?  It was very noticeable that when Unicode belatedly 
added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil 
letter, let alone form the shri ligature from it in those fonts that had 
been updated.  (Previously the shri ligature had been implemented via the 
hack of using U+0BB7 TAMIL LETTER SSA instead.)

There is another composition technology around, intended to cater for those 
scripts not or inadequately supported by Uniscribe, namely Graphite from 
SIL.  For some time it was the only way of supporting the Burmese script in 
Unicode on Windows.  (I don't know if Windows Vista and related products 
support the Burmese script, at least for Burmese.  I'd be impressed if the 
Shan extensions were in.)  The OpenType font has extra tables for Graphite, 
so an application (such as at least some versions of Firefox and OpenOffice) 
knows whether to use Graphite or Uniscribe/Pango for its GSUB and GPOS 
tables.  (I presume similar considerations apply to Apple-defined mort and 
morx tables.)  By putting the composition knowledge in the font, Graphite 
even allows one to encode complex scripts in the Private Use Areas.

Incidentally, part of the reason for the poor Lao rendering was that in 
Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI' 
sequence.  I've fixed that problem by adding some MS Windows only code to 
append_composite_glyph() in xdisp.c to apply the identification rules in the 
same way as done for uncomposed characters, but that doesn't really seem the 
best place for it.  Populating and using the unused field font_type in 
W32FontStruct would be a clearer solution.  (A cleaner solution still would 
be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always 
generates an intermediate sequence of 16-bit codes, but the burden of 
recoding for hack fonts might be transferred from the OS to emacs.)  Judging 
by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I 
can trust version.el).  Most spectacularly, plain text 'underlined' 'o' 
<U+006F U+0331> renders as 'o' with the digit '1' written below it!
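
For anyone wanting to reproduce this, the test sequence can be inserted from
Lisp.  The sketch below assumes Emacs 22, where decode-char with the ucs
charset is the usual way to obtain the character for a Unicode code point;
how the result renders depends on the font in use.

```elisp
;; Sketch: insert "o" followed by U+0331 COMBINING MACRON BELOW at point.
(insert ?o (decode-char 'ucs #x0331))
```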

This then exposes the next set of problems - Uniscribe often refuses to draw 
a combining mark on its own (prefixing U+00A0 might work) - and determining 
when a composition should be left to Uniscribe.  The latter is slightly 
complicated by such features as an ASCII or Latin-1 base character plus a 
combining mark, admittedly fairly rare if one is using Normal Form Composed 
(NFC).  (Indic transliteration and typewriter-based American Indian 
orthographies are the best sources, e.g. underlining for nasal vowels in 
Choctaw.)  In these cases, the character sequence is broken, at least in 
Emacs 22.1, because the base and combining characters seem to come from 
different fonts!

I'm tempted to go for the brute force rule of assuming that the combining 
marks are always taken from the same OpenType font as the base character and 
giving the job to Uniscribe.  This hits the practical problem that many 
OpenType fonts don't stack arbitrary combinations of diacritic marks. 
However, I have seen an Emacs-related statement that it is the user's 
responsibility to provide a font that works properly.

Richard. 


* Re: [w32] display international HELLO
  2007-11-20  1:49       ` Richard Wordingham
@ 2007-11-20 11:30         ` Jason Rumney
  2007-11-20 12:50           ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Rumney @ 2007-11-20 11:30 UTC (permalink / raw)
  To: Richard Wordingham; +Cc: emacs-devel

Richard Wordingham wrote:
> For OpenType fonts in scripts supported by Uniscribe, that's generally
> the way to go - especially for quick results.  Might Pango be
> superior, even on MS Windows, though?
With the new font-backend design, there is no reason why someone could
not make a pango backend work on Windows. But I think we still need a
native windows backend.


* Re: [w32] display international HELLO
  2007-11-20 11:30         ` Jason Rumney
@ 2007-11-20 12:50           ` Kenichi Handa
  0 siblings, 0 replies; 13+ messages in thread
From: Kenichi Handa @ 2007-11-20 12:50 UTC (permalink / raw)
  To: Jason Rumney; +Cc: richard.wordingham, emacs-devel

In article <4742C54E.1030600@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> Richard Wordingham wrote:
> > For OpenType fonts in scripts supported by Uniscribe, that's generally
> > the way to go - especially for quick results.  Might Pango be
> > superior, even on MS Windows, though?
> With the new font-backend design, there is no reason why someone could
> not make a pango backend work on Windows. But I think we still need a
> native windows backend.

I'm now designing an API to utilize the font backend's shaping
engine.  At the moment, I'm thinking about adding a new
callback function `shape' to the font driver, and making it
callable from auto-composition-function to attach a proper
composition property to text.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: [w32] display international HELLO
  2007-11-19  2:35     ` Kenichi Handa
  2007-11-19  8:51       ` Jason Rumney
  2007-11-20  1:49       ` Richard Wordingham
@ 2007-11-21  1:51       ` Richard Wordingham
  2 siblings, 0 replies; 13+ messages in thread
From: Richard Wordingham @ 2007-11-21  1:51 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa wrote on Monday, November 19, 2007 2:35 AM

> The problem is that the current fixed
> composition is suitable only for a specific font (usually a
> fixed-width terminal font).  Even on Windows, I think
> there's a way to use BDF fonts distributed as intlfonts.  If
> you use those fonts on Windows, the rendering should be
> good.

I installed intlfonts 1.2.1 using the procedure given at 
http://www.gnu.org/software/emacs/windows/faq5.html , save that I updated 
the directory names to those actually present, thus names ending in '.X', 
not '-X'.  When I selected the 16-point BDF fonts for Emacs 22.1, the Lao 
came out nicely except that the tone marks were not treated as combining 
marks.  Unfortunately, with Emacs 23.0.60.1 (+ bug fixes Jason Rumney 
mentioned earlier), I just got square boxes for the BDF fonts.  Do I need to 
define the fontset specially to allow it to use a Lao-encoded font for Lao 
characters?  I tried

(set-fontset-font "fontset-bdf" 'lao 
'("-misc-fixed-medium-r-normal--16-160-72-72-m-80-MuleLao-1" . "MuleLao-1"))

and several variations, but to no apparent avail.

Richard.

