* [w32] display international HELLO
From: Takashi Hiromatsu @ 2007-01-31 6:34 UTC
To: Emacs Devel ML
Dear all,
I'm trying to display every language's "HELLO" in Emacs on Windows using
the original Microsoft TrueType fonts.
--- Emacs/22.0.92 (i386-mingw-nt5.0.2195)
I succeeded with many of them using the "Arial Unicode MS" font, except for
the 7 languages listed below:
Amharic
Arabic
Braille
Hindi
Kannada
Malayalam
Tibetan
My ~/.emacs contains only the font settings shown below.
----------------------------------------------------------------------------
(add-to-list 'default-frame-alist '(font . "fontset-default"))
(set-fontset-font "fontset-default"
'mule-unicode-0100-24ff
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'latin-iso8859-3
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'cyrillic-iso8859-5
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'greek-iso8859-7
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'vietnamese-viscii-lower
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'vietnamese-viscii-upper
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'tibetan
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'lao
'("Arial Unicode MS*" . "iso10646-1"))
(set-fontset-font "fontset-default"
'ipa
'("Arial Unicode MS*" . "iso10646-1"))
----------------------------------------------------------------------------
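The repeated `set-fontset-font' calls above can also be written as a single loop. This is a minimal sketch, assuming Emacs 22 and the same charsets and font spec as the settings above; it should behave the same as the individual calls:

```elisp
;; Use fontset-default for new frames.
(add-to-list 'default-frame-alist '(font . "fontset-default"))

;; Map each legacy charset to the Unicode-encoded (iso10646-1)
;; Arial Unicode MS font, exactly as the individual calls do.
(dolist (charset '(mule-unicode-0100-24ff
                   latin-iso8859-3
                   cyrillic-iso8859-5
                   greek-iso8859-7
                   vietnamese-viscii-lower
                   vietnamese-viscii-upper
                   tibetan
                   lao
                   ipa))
  (set-fontset-font "fontset-default" charset
                    '("Arial Unicode MS*" . "iso10646-1")))
```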
According to Microsoft, the "Arial Unicode MS" font covers the following
Unicode blocks (number of characters in parentheses):
Basic Latin (95); Latin-1 Supplement (96); Latin Extended-A (128);
Latin Extended-B (148); IPA Extensions (89); Spacing Modifier Letters
(57); Combining Diacritical Marks (72); Greek (105); Cyrillic (226);
Armenian (85); Hebrew (82); Arabic (194); Devanagari (104); Bengali
(89); Gurmukhi (75); Gujarati (78); Oriya (79); Tamil (61); Telugu
(80); Kannada (80); Malayalam (78); Thai (87); Lao (65); Tibetan
(168); Georgian (78); Hangul Jamo (240); Latin Extended Additional
(246); Greek Extended (233); General Punctuation (63); Superscripts
and Subscripts (28); Currency Symbols (13); Combining Diacritical
Marks for Symbols (18); Letterlike Symbols (57); Number Forms (48);
Arrows (91); Mathematical Operators (242); Miscellaneous Technical
(123); Control Pictures (37); Optical Character Recognition (11);
Enclosed Alphanumerics (139); Box Drawing (128); Block Elements (22);
Geometric Shapes (80); Miscellaneous Symbols (106); Dingbats (160);
CJK Symbols and Punctuation (57); Hiragana (90); Katakana (94);
Bopomofo (40); Hangul Compatibility Jamo (94); Kanbun (16); Enclosed
CJK Letters and Months (202); CJK Compatibility (249); CJK Unified
Ideographs (20,902); Hangul Syllables (11,172); CJK Compatibility
Ideographs (302); Alphabetic Presentation Forms (57); Arabic
Presentation Forms-A (593); Combining Half Marks (4); CJK
Compatibility Forms (28); Small Form Variants (26); Arabic
Presentation Forms-B (139); Halfwidth and Fullwidth Forms (223);
Specials (2)
----------------------------------------------------------------------------
Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode
MS", because it does not contain those scripts. But I hope to display the
other 5 languages with it.
Is there any way to display them?
Or should I use other fonts?
Takashi Hiromatsu
* Re: [w32] display international HELLO
From: Kenichi Handa @ 2007-01-31 6:51 UTC
To: Takashi Hiromatsu; +Cc: emacs-devel
In article <u3b5ru2jy.wl%takashi-hiromatsu@isuzu.co.jp>, Takashi Hiromatsu <matsuan@ca2.so-net.ne.jp> writes:
> Dear all,
> I'm trying to display every language's "HELLO" in Emacs on Windows using
> the original Microsoft TrueType fonts.
> --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)
> I succeeded with many of them using the "Arial Unicode MS" font, except for
> the 7 languages listed below:
> Amharic
> Arabic
> Braille
> Hindi
> Kannada
> Malayalam
> Tibetan
The current Emacs still doesn't have a proper OpenType font
driver for displaying Indic scripts on Windows. I'm working
on it in the emacs-unicode-2 branch, but progress is very
slow. :-(
By the way, does "Arial Unicode MS" contain proper OpenType
tables for Indic scripts?
---
Kenichi Handa
handa@m17n.org
* Re: [w32] display international HELLO
From: Takashi Hiromatsu @ 2007-01-31 7:07 UTC
To: Kenichi Handa; +Cc: emacs-devel
At Wed, 31 Jan 2007 15:51:49 +0900,
Kenichi Handa wrote:
> > Kannada
> > Malayalam
> > Tibetan
>
> The current Emacs still doesn't have a proper OpenType font
> driver for displaying Indic scripts on Windows. I'm working
> on it in the emacs-unicode-2 branch, but progress is very
> slow. :-(
Thank you for your quick reply. I will try BDF fonts for the Indic scripts.
> By the way, does "Arial Unicode MS" contain proper OpenType
> tables for Indic scripts?
Sorry, I don't have detailed information about it, only what Microsoft publishes.
Takashi Hiromatsu
* Re: [w32] display international HELLO
From: Richard Wordingham @ 2007-11-09 8:37 UTC
To: emacs-devel
On 31 January 2007, Takashi Hiromatsu wrote (archived at
http://lists.gnu.org/archive/html/emacs-devel/2007-01/msg01087.html ):
> I'm trying to display every language's "HELLO" in Emacs on Windows using
> the original Microsoft TrueType fonts.
> --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)
> I succeeded with many of them using the "Arial Unicode MS" font, except for
> the 7 languages listed below:
> Amharic
> Arabic
> Braille
> Hindi
> Kannada
> Malayalam
> Tibetan
> My ~/.emacs contains only the font settings shown below.
> ----------------------------------------------------------------------------
> (add-to-list 'default-frame-alist '(font . "fontset-default"))
> (set-fontset-font "fontset-default"
> 'mule-unicode-0100-24ff
> '("Arial Unicode MS*" . "iso10646-1"))
<snip>
> Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode
> MS", because it does not contain those scripts. But I hope to display the
> other 5 languages with it.
> Is there any way to display them?
> Or should I use other fonts?
Hindi and Malayalam are a tougher problem. Although the basic text is
encoded in mule-unicode-0100-24ff, 'composition' properties are actually
specified in the file. The composition property should provide renderable
text and mark-up which replace the basic text in the display, which ideally
should be totally unnecessary in an MS Windows system. (Realising this
ideal requires the ability to upgrade the Uniscribe library to cover extra
scripts and even newly admitted characters in supported scripts.) These
compositions are defined in terms of characters in the charset indian-glyph,
whose characters have no specified Unicode equivalent. You need a non-Unicode
font to display these characters. Arial Unicode MS does not contain much in
the way of shaping tables, so it will not work properly for any of the truly
'complex' Indic scripts. (This may be why Microsoft seems to have abandoned
this font.)
Tibetan and Lao also use the composition property, but in terms of
characters in the same charset. However, I'm having display problems for
Lao - see item 3 below. Tibetan won't display for me as I don't have a font
that supports Tibetan.
I'm trying to understand how the input and display mechanisms of Emacs
22.1.1 work on Windows XP - I'm particularly interested in Indic scripts.
My machine is set up with Thai as its 'ANSI' character set. I'm seeing some
rather bizarre behaviours, and I'm having difficulty understanding them.
Once I realised that Emacs was not accepting Unicode input from the
keyboard, I tried to understand the built-in input methods. I investigated
Lao input.
1. With the default font and the Windows keyboard set to Thai Kedmanee, Thai
displays badly as it is typed. Bits of characters are left behind as the
typing position moves rightwards faster than it should. However, when I
switch to Code2000, a font with a wide Unicode coverage, Thai displays as
well as it does with native products such as Notepad. This may be because
the alleged default font, Courier New, has no Thai glyphs, and so glyph
metrics and glyphs bear no relationship to one another.
The Thai characters produced in this fashion are in one of the Unicode
charsets (mule-unicode-0100-24ff).
2. My first discovery with Lao was that just selecting a font (Code2000)
that supported Lao was not enough. It would not normally display Lao
characters (in the Lao charset), until I discovered that a trick such as
(set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))
suddenly made the Lao text displayable. How does this work? I have studied
the code of xdisp.c and its supporting functions, but I cannot find where
Emacs character codes are converted to Unicode. I did notice that if I
pasted Lao in from an MS application, Emacs would accept them as Unicode
characters and they would be displayed properly if I selected an appropriate
font.
3. Compositions of Lao characters, (i.e. with the 'composition' string
property) using the Code2000 font (the only fully working Lao font I have),
do not display properly, whether they are in the Lao or
mule-unicode-0100-24ff charset. With the latter I have seen left-hand parts
of Hangul syllables displayed instead of Lao! Perhaps when I understand how
uncomposed display does work, I will be able to understand this problem. At
present I need to defeat the composition logic by typing consonant + vowel
as <consonant, space, delete, vowel>! The text entered thus then displays
properly, mocking the hard work that has gone into carefully composing
grapheme clusters.
4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one
of its variants), the Lao input method suddenly switches from generating Lao
characters in the Lao charset to generating Lao characters in the
mule-unicode-0100-24ff charset. How is this effect achieved? I can't work
it out. Characters already stored in the Lao charset remain in the Lao
charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).
Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not
change the charset used by the Lao input method.
5. Possibly not news, but I have found that with a Uniscribe that supports
Khmer, Unicode-encoded Khmer text pasted into Emacs displays properly,
including 'Indic rearrangement'. As far as I can tell, Emacs 22.1 has no
support for Khmer! (Cursor positioning does look wrong for Khmer.) When I
understand what is happening with Lao, I intend to write an input method for
Khmer - unless I find Emacs on Windows has evolved to accepting UTF-16 as
the coding system for keyboard input.
6. Latin ligaturing does not work. 'Caesar' with a ZWJ between 'a' and 'e'
does not ligate even using a font for which it does ligate in Notepad.
Perhaps that can get swept up with the handling of Unicode viramas, i.e.
Indic conjuncts.
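A minimal way to reproduce item 6 from Lisp, assuming Emacs 22.1, where `decode-char' with the `ucs' argument converts a Unicode code point into an Emacs character:

```elisp
;; Insert "Caesar" with U+200D ZERO WIDTH JOINER between "a" and "e",
;; the sequence that should ligate with a suitable font.
(insert "Ca" (decode-char 'ucs #x200D) "esar")
```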
Richard.
* Re: [w32] display international HELLO
From: Eli Zaretskii @ 2007-11-09 10:55 UTC
To: Richard Wordingham; +Cc: emacs-devel
> From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
> Date: Fri, 9 Nov 2007 08:37:42 -0000
>
> > Is there any ways to display them?
> > Or should I use other fonts?
I think some of the useful information you posted should find its way into
PROBLEMS. So please keep posting any findings you make as a result of
this discussion. Thanks!
> 2. My first discovery with Lao was that just selecting a font (Code2000)
> that supported Lao was not enough. It would not normally display Lao
> characters (in the Lao charset), until I discovered that a trick such as
>
> (set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))
>
> suddenly made the Lao text displayable. How does this work? I have studied
> the code of xdisp.c and its supporting functions, but I cannot find where
> Emacs character codes are converted to Unicode.
I think you need to look in w32term.c as well. For example, the
function w32_encode_char there seems to be a good place to start.
> I did notice that if I pasted Lao in from an MS application, Emacs
> would accept them as Unicode characters and they would be displayed
> properly if I selected an appropriate font.
Yes, Emacs on Windows uses Unicode for working with the clipboard
whenever possible.
> 4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one
> of its variants), the Lao input method suddenly switches from generating Lao
> characters in the Lao charset to generating Lao characters in the
> mule-unicode-0100-24ff charset. How is this effect achieved?
That's because UTF-8 is supported only for mule-unicode-0100-24ff, so
decoding UTF-8 will _always_ produce Unicode characters.
> Characters already stored in the Lao charset remain in the Lao
> charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).
Yes. This is because changing a buffer's encoding does not change the
buffer contents in any way; it just tells Emacs how to encode the
buffer text when the time comes to write it to disk. At that time, if
there are any characters in the buffer that cannot be encoded in the
encoding you specified, Emacs will complain and show you those characters,
so that you can choose a different encoding.
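A minimal sketch of that distinction, assuming Emacs 22: changing the coding system affects only the next save, whereas reverting the buffer decodes the text from the file again.

```elisp
;; Only records how the buffer will be encoded when it is next saved;
;; characters already in the buffer keep their current charsets.
(set-buffer-file-coding-system 'utf-8)
(save-buffer)

;; Re-read the saved file, decoding it explicitly as UTF-8.
;; Interactively this is C-x RET r.
(revert-buffer-with-coding-system 'utf-8)
```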
> Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not
> change the charset used by the Lao input method.
Probably a bug of some sort.
> 6. Latin ligaturing does not work.
Yes, Emacs currently does not support this feature.
* Re: [w32] display international HELLO
From: Kenichi Handa @ 2007-11-09 12:40 UTC
To: Richard Wordingham; +Cc: emacs-devel
In article <001501c822ab$ccfec5e0$d5101252@JRWXP1>, "Richard Wordingham" <richard.wordingham@ntlworld.com> writes:
> Hindi and Malayalam are a tougher problem. Although the basic text is
> encoded in mule-unicode-0100-24ff, 'composition' properties are actually
> specified in the file. The composition property should provide renderable
> text and mark-up which replace the basic text in the display, which ideally
> should be totally unnecessary in an MS Windows system. (Realising this
> ideal requires the ability to upgrade the Uniscribe library to cover extra
> scripts and even newly admitted characters in supported scripts.) These
> compositions are defined in terms of characters in the charset indian-glyph,
> whose characters have no specified Unicode equivalent. You need a non-Unicode
> font to display these characters. Arial Unicode MS does not contain much in
> the way of shaping tables, so it will not work properly for any of the truly
> 'complex' Indic scripts. (This may be why Microsoft seems to have abandoned
> this font.)
In the emacs-unicode-2 branch, I'm working on supporting Indic
scripts (and any other scripts that require CTL, Complex Text
Layout) with OpenType fonts, though progress is slow.
> 2. My first discovery with Lao was that just selecting a font (Code2000)
> that supported Lao was not enough. It would not normally display Lao
> characters (in the Lao charset), until I discovered that a trick such as
> (set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))
> suddenly made the Lao text displayable. How does this work? I have studied
> the code of xdisp.c and its supporting functions, but I cannot find where
> Emacs character codes are converted to Unicode. I did notice that if I
> pasted Lao in from an MS application, Emacs would accept them as Unicode
> characters and they would be displayed properly if I selected an appropriate
> font.
Emacs 22 still doesn't unify characters in legacy charsets
and Unicode by default. And, as Lao doesn't have an official
national charset, Emacs invented one long ago (before
Unicode). The Lao characters in the HELLO file use that
charset. The above set-fontset-font tells Emacs to use an
iso10646-1 font (i.e. a Unicode-encoded font) for that Lao
charset. Then, the CCL program ccl-encode-unicode-font
(defined in lisp/international/fontset.el) converts Lao
codes to the corresponding Unicode codes at display time.
This is done in x_encode_char of xterm.c (or in
w32_encode_char of w32term.c).
This ugly mechanism is not used in emacs-unicode-2.
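A minimal sketch for observing this mechanism in Emacs 22. It assumes the variable `font-ccl-encoder-alist', which maps font-name patterns to CCL programs, contains an entry for `ccl-encode-unicode-font'; the Code2000 font name is taken from the message above:

```elisp
;; Which charset does the character after point belong to?
;; Lao text from the HELLO file should report `lao'; Lao pasted in
;; from the clipboard should report `mule-unicode-0100-24ff'.
(char-charset (following-char))

;; Draw the legacy `lao' charset with a Unicode-encoded (iso10646-1) font.
(set-fontset-font "fontset-default" 'lao '("Code2000" . "iso10646-1"))

;; Return the (PATTERN . CCL-PROGRAM) entry that selects
;; ccl-encode-unicode-font for matching fonts, or nil if none.
(rassq 'ccl-encode-unicode-font font-ccl-encoder-alist)
```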
> 3. Compositions of Lao characters, (i.e. with the 'composition' string
> property) using the Code2000 font (the only fully working Lao font I have),
> do not display properly, whether they are in the Lao or
> mule-unicode-0100-24ff charset. With the latter I have seen left-hand parts
> of Hangul syllables displayed instead of Lao! Perhaps when I understand how
> uncomposed display does work, I will be able to understand this problem. At
> present I need to defeat the composition logic by typing consonant + vowel
> as <consonant, space, delete, vowel>! The text entered thus then displays
> properly, mocking the hard work that has gone into carefully composing
> grapheme clusters.
I think it's a waste of time to learn the composition mechanism
of Emacs 22. It will be greatly improved in emacs-unicode-2.
> 4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one
> of its variants), the Lao input method suddenly switches from generating Lao
> characters in the Lao charset to generating Lao characters in the
> mule-unicode-0100-24ff charset. How is this effect
> achieved?
When a buffer is created, Emacs 22 sets up the char-table
translation-table-for-input to suit the buffer's file
coding system. That table converts Lao-charset characters to
mule-unicode-0100-24ff characters within an input method.
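A minimal sketch for inspecting that table in the current buffer, assuming Emacs 22 and that `translation-table-for-input' is either nil or a char-table; the Lao code #x21 is an arbitrary example position:

```elisp
;; Look up one Lao-charset character in the buffer's input
;; translation table, if the buffer has one.
(let ((lao-char (make-char 'lao #x21)))
  (if (char-table-p translation-table-for-input)
      ;; A non-nil entry is what an input method would insert instead;
      ;; report it together with its charset.
      (let ((to (aref translation-table-for-input lao-char)))
        (if to
            (list lao-char '=> to (char-charset to))
          'no-translation))
    'no-table))
```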
> I can't work
> it out. Characters already stored in the Lao charset remain in the Lao
> charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).
> Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not
> change the charset used by the Lao input method.
Yes. Just saving doesn't change the buffer contents. Re-read
the file.
Anyway, emacs-unicode-2 doesn't have those bizarre problems.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [w32] display international HELLO
From: Richard Wordingham @ 2007-11-15 2:48 UTC
To: emacs-devel
Kenichi Handa wrote:
> Richard Wordingham wrote:
Thanks for the explanations.
>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>> property) using the Code2000 font (the only fully working Lao font I
>> have),
>> do not display properly, whether they are in the Lao or
>> mule-unicode-0100-24ff charset.
> I think it's a waste of time to learn the composition mechanism
> of Emacs 22. It will be greatly improved in emacs-unicode-2.
Are you talking of future work or the present state of emacs-unicode-2, as
in Emacs 23.0.60.0? Are you talking of applying the composition property,
or of the rendering of composed sequences? With the help of Jason Rumney I
got 23.0.60.0 to build and run on Windows XP, but the Lao compositions are
still display bizarrely and unreadably. Unfortunately the more thorough
application of composition in emacs-unicode-2 currently makes matters worse!
Before, one could easily if tediously avoid the composition mechanism. For
example, to make a Lao closed syllable consonant-vowel sequence <ka>, in
Emacs 22.1 one can type d, space, delete, a. The best I can find in Emacs
23.0.60.0 is d, space, a, left, delete. However, as soon as one moves the
cursor the characters compose and are then misdisplayed.
The display problems also apply to Thai. It seems, however, that with Emacs
22.1 and a system default codepage ('ANSI' in Microsoft jargon) of Thai
(CP-874), using a Thai keyboard avoids composition until the Emacs Thai
input method (I've only investigated Kesmanee - I don't use Pattachote) is
selected. Thus, Emacs 22.1 works fairly well for Thai on a 'Thai PC'.
Emacs 23.0.60.0 seems not to be aware of the system default codepage - just
switching the keyboard to Thai Kedmanee in Windows and issuing no commands
in Emacs resulted in Latin-1 characters appearing as I typed.
These display problems seem to me to be a straightforward bug, at least as
far as operation on MS Windows is concerned. However, I am still no nearer
to locating it.
Richard.
* Re: [w32] display international HELLO
From: Kenichi Handa @ 2007-11-19 2:35 UTC
To: Richard Wordingham; +Cc: emacs-devel
In article <00d801c82732$0001dfb0$d5101252@JRWXP1>, "Richard Wordingham" <richard.wordingham@ntlworld.com> writes:
>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>> property) using the Code2000 font (the only fully working Lao font I
>>> have),
>>> do not display properly, whether they are in the Lao or
>>> mule-unicode-0100-24ff charset.
> > I think it's a waste of time to learn the composition mechanism
> > of Emacs 22. It will be greatly improved in emacs-unicode-2.
> Are you talking of future work or the present state of emacs-unicode-2, as
> in Emacs 23.0.60.0?
I'm talking about future work, but the latest
emacs-unicode-2 code already contains half-finished code for it.
> Are you talking of applying the composition property,
> or of the rendering of composed sequences?
I'm going to allow each font backend to generate proper
composition information that will vary depending on the font,
instead of the current fixed way of composition. So, on
Windows, perhaps the font backend can use Uniscribe. On
GNU/Linux, I have not yet decided what to do: using Pango,
using m17n-lib, or using newly written code that directly
uses libotf or HarfBuzz for OTF handling. I think the last
choice will result in the fastest rendering.
> With the help of Jason Rumney I
> got 23.0.60.0 to build and run on Windows XP, but the Lao compositions are
> still display bizarrely and unreadably. Unfortunately the more thorough
> application of composition in emacs-unicode-2 currently makes matters worse!
> Before, one could easily if tediously avoid the composition mechanism. For
> example, to make a Lao closed syllable consonant-vowel sequence <ka>, in
> Emacs 22.1 one can type d, space, delete, a. The best I can find in Emacs
> 23.0.60.0 is d, space, a, left, delete. However, as soon as one moves the
> cursor the characters compose and are then misdisplayed.
That is because of the auto-composition mechanism introduced
in emacs-unicode-2. The problem is that the current fixed
composition is suitable only for a specific font (usually a
fixed-width terminal font). Even on Windows, I think
there's a way to use BDF fonts distributed as intlfonts. If
you use those fonts on Windows, the rendering should be
good.
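A minimal sketch of the BDF setup for Emacs 22 on Windows; it concerns Emacs 22's Windows font code, not the new font backend. The directory names under C:/intlfonts are hypothetical and must match the actual installation, and after registering the files the BDF fonts still have to be selected through a fontset:

```elisp
;; Hypothetical locations: replace with the real intlfonts directories.
(setq bdf-directory-list
      '("C:/intlfonts/Asian"
        "C:/intlfonts/European"
        "C:/intlfonts/Misc"))

;; Find the BDF files in those directories and tell the Windows
;; display code where each BDF font lives.
(setq w32-bdf-filename-alist (w32-find-bdf-fonts bdf-directory-list))
```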
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [w32] display international HELLO
From: Jason Rumney @ 2007-11-19 8:51 UTC
To: Kenichi Handa; +Cc: Richard Wordingham, emacs-devel
Kenichi Handa wrote:
> Even on Windows, I think
> there's a way to use BDF fonts distributed as intlfonts.
Not with the new font backend though.
* Re: [w32] display international HELLO
From: Richard Wordingham @ 2007-11-20 1:49 UTC
To: emacs-devel
Kenichi Handa wrote:
> Richard Wordingham writes:
>
>>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>>> property) using the Code2000 font (the only fully working Lao font I
>>>> have),
>>>> do not display properly, whether they are in the Lao or
>>>> mule-unicode-0100-24ff charset.
> I'm going to allow each font backend to generate proper
> composition information that will vary depending on the font,
> instead of the current fixed way of composition. So, on
> Windows, perhaps the font backend can use Uniscribe.
For OpenType fonts in scripts supported by Uniscribe, that's generally the
way to go - especially for quick results. Might Pango be superior, even on
MS Windows, though? It was very noticeable that when Unicode belatedly
added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil
letter, let alone form the shri ligature from it in those fonts that had
been updated. (Previously the shri ligature had been implemented via the
hack of using U+0BB7 TAMIL LETTER SSA instead.)
There is another composition technology around, intended to cater for those
scripts not or inadequately supported by Uniscribe, namely Graphite from
SIL. For some time it was the only way of supporting the Burmese script in
Unicode on Windows. (I don't know if Windows Vista and related products
support the Burmese script, at least for Burmese. I'd be impressed if the
Shan extensions were in.) A Graphite-enabled OpenType font has extra tables
for Graphite, so an application (such as at least some versions of Firefox
and OpenOffice) knows whether to use Graphite, or Uniscribe/Pango with the
font's GSUB and GPOS tables. (I presume similar considerations apply to Apple-defined mort and
morx tables.) By putting the composition knowledge in the font, Graphite
even allows one to encode complex scripts in the Private Use Areas.
Incidentally, part of the reason for the poor Lao rendering was that in
Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI'
sequence. I've fixed that problem by adding some MS Windows only code to
append_composite_glyph() in xdisp.c to apply the identification rules in the
same way as done for uncomposed characters, but that doesn't really seem the
best place for it. Populating and using the unused field font_type in
W32FontStruct would be a clearer solution. (A cleaner solution still would
be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always
generates an intermediate sequence of 16-bit codes, but the burden of
recoding for hack fonts might be transferred from the OS to emacs.) Judging
by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I
can trust version.el). Most spectacularly, plain text 'underlined' 'o'
<U+006F U+0331> renders as 'o' with the digit '1' written below it!
This then exposes the next set of problems - Uniscribe often refuses to draw
a combining mark on its own (prefixing U+00A0 might work) - and determining
when a composition should be left to Uniscribe. The latter is slightly
complicated by such features as an ASCII or Latin-1 base character plus a
combining mark, admittedly fairly rare if one is using Normalization Form C
(NFC). (Indic transliteration and typewriter-based American Indian
orthographies are the best sources, e.g. underlining for nasal vowels in
Choctaw.) In these cases, the character sequence is broken, at least in
Emacs 22.1, because the base and combining characters seem to come from
different fonts!
I'm tempted to go for the brute force rule of assuming that the combining
marks are always taken from the same OpenType font as the base character and
giving the job to Uniscribe. This hits the practical problem that many
OpenType fonts don't stack arbitrary combinations of diacritic marks.
However, I have seen an Emacs-related statement that it is the user's
responsibility to provide a font that works properly.
Richard.
* Re: [w32] display international HELLO
From: Jason Rumney @ 2007-11-20 11:30 UTC
To: Richard Wordingham; +Cc: emacs-devel
Richard Wordingham wrote:
> For OpenType fonts in scripts supported by Uniscribe, that's generally
> the way to go - especially for quick results. Might Pango be
> superior, even on MS Windows, though?
With the new font-backend design, there is no reason why someone could
not make a pango backend work on Windows. But I think we still need a
native windows backend.
* Re: [w32] display international HELLO
From: Kenichi Handa @ 2007-11-20 12:50 UTC
To: Jason Rumney; +Cc: richard.wordingham, emacs-devel
In article <4742C54E.1030600@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:
> Richard Wordingham wrote:
> > For OpenType fonts in scripts supported by Uniscribe, that's generally
> > the way to go - especially for quick results. Might Pango be
> > superior, even on MS Windows, though?
> With the new font-backend design, there is no reason why someone could
> not make a pango backend work on Windows. But I think we still need a
> native windows backend.
I'm now designing an API for using a font backend's shaping
engine. At the moment, I'm thinking about adding a new
callback function `shape' to the font driver, and making it
callable from auto-composition-function so that it can attach
a proper composition property to text.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [w32] display international HELLO
From: Richard Wordingham @ 2007-11-21 1:51 UTC
To: emacs-devel
Kenichi Handa wrote on Monday, November 19, 2007 2:35 AM
> The problem is that the current fixed
> composition is suitable only for a specific font (usually a
> fixed-width terminal font). Even on Windows, I think
> there's a way to use BDF fonts distributed as intlfonts. If
> you use those fonts on Windows, the rendering should be
> good.
I installed intlfonts 1.2.1 using the procedure given at
http://www.gnu.org/software/emacs/windows/faq5.html , save that I updated
the directory names to those actually present, thus names ending in '.X',
not '-X'. When I selected the 16-point BDF fonts for Emacs 22.1, the Lao
came out nicely except that the tone marks were not treated as combining
marks. Unfortunately, with Emacs 23.0.60.1 (+ bug fixes Jason Rumney
mentioned earlier), I just got square boxes for the BDF fonts. Do I need to
define the fontset specially to allow it use a Lao-encoded font for Lao
characters? I tried
(set-fontset-font "fontset-bdf" 'lao
'("-misc-fixed-medium-r-normal--16-160-72-72-m-80-MuleLao-1" . "MuleLao-1"))
and several variations, but to no apparent avail.
Richard.