From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
To: <emacs-devel@gnu.org>
Subject: Re: [w32] display international HELLO
Date: Tue, 20 Nov 2007 01:49:14 -0000
Message-ID: <001101c82b17$8f9ad3f0$d5101252@JRWXP1>
In-Reply-To: E1ItwTu-0007aR-Jb@etlken.m17n.org
Kenichi Handa wrote:
> Richard Wordingham writes:
>
>>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>>> property) using the Code2000 font (the only fully working Lao font I
>>>> have),
>>>> do not display properly, whether they are in the Lao or
>>>> mule-unicode-0100-24ff charset.
> I'm going to allow each font backend to generate proper
> composition information that will vary depending on the font,
> instead of the current fixed way of composition. So, on
> Windows, perhaps the font backend can utilize Uniscribe.
For OpenType fonts in scripts supported by Uniscribe, that's generally the
way to go - especially for quick results. Might Pango be superior, even on
MS Windows, though? It was very noticeable that when Unicode belatedly
added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil
letter, let alone form the shri ligature from it in those fonts that had
been updated. (Previously the shri ligature had been implemented via the
hack of using U+0BB7 TAMIL LETTER SSA instead.)
There is another composition technology around, intended to cater for those
scripts not or inadequately supported by Uniscribe, namely Graphite from
SIL. For some time it was the only way of supporting the Burmese script in
Unicode on Windows. (I don't know if Windows Vista and related products
support the Burmese script, at least for Burmese. I'd be impressed if the
Shan extensions were in.) A Graphite-enabled OpenType font carries extra
tables for Graphite, so an application (such as at least some versions of
Firefox and OpenOffice) can tell whether to use Graphite or to fall back on
Uniscribe/Pango with the font's GSUB and GPOS tables. (I presume similar
considerations apply to the Apple-defined mort and
morx tables.) By putting the composition knowledge in the font, Graphite
even allows one to encode complex scripts in the Private Use Areas.
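The table check described above can be sketched as follows. This is an illustration in Python, not code taken from Firefox, OpenOffice or Emacs: it reads the table directory of an sfnt (TrueType/OpenType) font and reports which shaping technology the font carries tables for. The function names and the preference order in choose_shaper are assumptions made for the example; the tags tested are Graphite's 'Silf' rule table, the OpenType layout tables 'GSUB' and 'GPOS', and the Apple tables 'mort' and 'morx'.

```python
import struct

OPENTYPE_TAGS = {"GSUB", "GPOS"}
AAT_TAGS = {"mort", "morx"}


def table_tags(font_data: bytes) -> set:
    """Return the set of table tags listed in an sfnt table directory."""
    # Offset table: sfntVersion (4 bytes), then numTables, searchRange,
    # entrySelector and rangeShift (2 bytes each), all big-endian.
    _version, num_tables = struct.unpack_from(">4sH", font_data, 0)
    tags = set()
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        (tag,) = struct.unpack_from(">4s", font_data, 12 + 16 * i)
        tags.add(tag.decode("latin-1"))
    return tags


def choose_shaper(tags: set) -> str:
    """Hypothetical policy: prefer Graphite when its rule table is present."""
    if "Silf" in tags:
        return "graphite"
    if tags & OPENTYPE_TAGS:
        return "opentype"  # i.e. hand the text to Uniscribe or Pango
    if tags & AAT_TAGS:
        return "aat"
    return "none"
```

With a real font one would call table_tags(open(path, "rb").read()); a font collection (.ttc) has a different header and is not handled by this sketch.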
Incidentally, part of the reason for the poor Lao rendering was that in
Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI'
sequence. I've fixed that problem by adding some MS Windows only code to
append_composite_glyph() in xdisp.c to apply the identification rules in the
same way as done for uncomposed characters, but that doesn't really seem the
best place for it. Populating and using the unused field font_type in
W32FontStruct would be a cleaner solution. (A cleaner solution still would
be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always
generates an intermediate sequence of 16-bit codes, but the burden of
recoding for hack fonts might be transferred from the OS to Emacs.) Judging
by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I
can trust version.el). Most spectacularly, plain text 'underlined' 'o'
<U+006F U+0331> renders as 'o' with the digit '1' written below it!
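The digit '1' is suggestive. As a guess that nothing above confirms: if the 16-bit code U+0331 were cut down to a single byte somewhere on the 'ANSI' path, the byte left over would be 0x31, which is the ASCII code for '1'. The arithmetic can be checked in Python; the helper name is made up for the illustration.

```python
def low_byte_char(code_point: int) -> str:
    """Return the character whose code is the low byte of a 16-bit code."""
    return chr(code_point & 0xFF)


# U+0331 COMBINING MACRON BELOW keeps only its low byte when cut to 8 bits.
print(hex(0x0331 & 0xFF), repr(low_byte_char(0x0331)))
```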
Fixing this then exposes the next set of problems: Uniscribe often refuses
to draw a combining mark on its own (prefixing U+00A0 might work), and one
has to determine when a composition should be left to Uniscribe. The latter
is slightly complicated by cases such as an ASCII or Latin-1 base character
plus a combining mark, admittedly fairly rare if one is using Normalization
Form C
(NFC). (Indic transliteration and typewriter-based American Indian
orthographies are the best sources, e.g. underlining for nasal vowels in
Choctaw.) In these cases, the character sequence is broken, at least in
Emacs 22.1, because the base and combining characters seem to come from
different fonts!
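To make the NFC point concrete, here is a small Python sketch using the standard unicodedata module. NFC absorbs the combining mark when a precomposed character exists (o + U+0308) but leaves the sequence alone when none does (o + U+0331), so only the second kind still needs composition at display time. The sketch also splits text into base-plus-marks clusters and prefixes U+00A0 to a mark found at the very start of the text. The names clusters and needs_composition, and the U+00A0 policy itself, are assumptions for illustration, not Emacs or Uniscribe behaviour.

```python
import unicodedata

NBSP = "\u00a0"


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def clusters(text: str) -> list:
    """Split text into clusters of one base character plus following marks."""
    out = []
    for ch in text:
        if is_mark(ch) and out:
            out[-1] += ch          # attach the mark to the preceding cluster
        elif is_mark(ch):
            out.append(NBSP + ch)  # mark at start of text: supply a base
        else:
            out.append(ch)
    return out


def needs_composition(cluster: str) -> bool:
    """True if the cluster still contains a combining mark after NFC."""
    return any(is_mark(ch) for ch in unicodedata.normalize("NFC", cluster))
```

For example, needs_composition("o\u0308") is false because NFC yields the single character U+00F6, while needs_composition("o\u0331") is true.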
I'm tempted to go for the brute force rule of assuming that the combining
marks are always taken from the same OpenType font as the base character and
giving the job to Uniscribe. This hits the practical problem that many
OpenType fonts don't stack arbitrary combinations of diacritic marks.
However, I have seen an Emacs-related statement that it is the user's
responsibility to provide a font that works properly.
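As a sketch of that brute-force rule (in Python, with a hypothetical font_for callback standing in for whatever font selection Emacs actually performs): each combining mark inherits the font chosen for its base character, so a base and its marks always land in the same run and can be handed to the shaper together.

```python
import unicodedata


def font_runs(text, font_for):
    """Group text into (font, string) runs; marks inherit the base's font.

    font_for is a hypothetical callback mapping one character to a font name.
    """
    runs = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and runs:
            font = runs[-1][0]  # brute force: reuse the base character's font
        else:
            font = font_for(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))              # start a new run
    return runs
```

This only guarantees that the base and its marks reach the shaper in one font; whether the marks are then positioned or stacked correctly still depends on the font.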
Richard.