From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
To: <emacs-devel@gnu.org>
Subject: Re: [w32] display international HELLO
Date: Tue, 20 Nov 2007 01:49:14 -0000
Message-ID: <001101c82b17$8f9ad3f0$d5101252@JRWXP1>
In-Reply-To: E1ItwTu-0007aR-Jb@etlken.m17n.org

Kenichi Handa wrote:


> Richard Wordingham writes:
>
>>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>>> property) using the Code2000 font (the only fully working Lao font I
>>>> have),
>>>> do not display properly, whether they are in the Lao or
>>>> mule-unicode-0100-24ff charset.

> I'm going to allow each font-backends to generate proper
> composition information that will vary depending on a font,
> instead of the current fixed way of composition.  So, On
> Windows, perhaps the font backend can utilize uniscribe.

For OpenType fonts in scripts supported by Uniscribe, that's generally the 
way to go - especially for quick results.  Might Pango be superior, even on 
MS Windows, though?  It was very noticeable that when Unicode belatedly 
added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil 
letter, let alone form the shri ligature from it in those fonts that had 
been updated.  (Previously the shri ligature had been implemented via the 
hack of using U+0BB7 TAMIL LETTER SSA instead.)
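
To make the two spellings concrete, here is a small Python sketch that lists the code points involved. It assumes the shri ligature is spelled as consonant + virama + RA + vowel sign II; that sequence, and the helper names, are my assumptions for illustration and are not taken from any font or from Uniscribe.

```python
import unicodedata

SHA = "\u0bb6"       # TAMIL LETTER SHA, the later addition
SSA = "\u0bb7"       # TAMIL LETTER SSA, used by the older hack described above
VIRAMA = "\u0bcd"
RA = "\u0bb0"
VOWEL_II = "\u0bc0"


def shri(first_consonant):
    """Assumed spelling of the ligature: consonant + virama + RA + vowel sign II."""
    return first_consonant + VIRAMA + RA + VOWEL_II


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for label, consonant in (("with SHA", SHA), ("with SSA", SSA)):
        print(label)
        for line in describe(shri(consonant)):
            print("  " + line)
```

A shaping engine that does not classify U+0BB6 as a Tamil letter will not apply the font's ligature rule to the first sequence, even if the font itself has been updated.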

There is another composition technology around, intended to cater for those 
scripts that Uniscribe supports inadequately or not at all, namely Graphite 
from SIL.  For some time it was the only way of supporting the Burmese script in 
Unicode on Windows.  (I don't know if Windows Vista and related products 
support the Burmese script, at least for Burmese.  I'd be impressed if the 
Shan extensions were in.)  A Graphite-enabled OpenType font carries extra 
tables for Graphite, so an application (such as at least some versions of 
Firefox and OpenOffice) can tell whether to use Graphite or to fall back on 
Uniscribe/Pango with the font's GSUB and GPOS tables.  (I presume similar 
considerations apply to the Apple-defined mort and 
morx tables.)  By putting the composition knowledge in the font, Graphite 
even allows one to encode complex scripts in the Private Use Areas.
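
As a rough sketch of how an application might make that choice, the Python below reads the table directory at the start of an sfnt (TrueType/OpenType) file and looks at which tables are present. The Graphite tag names (Silf, Glat, Gloc, Feat, Sill) and the preference order in choose_shaper are assumptions on my part, not something taken from Firefox, OpenOffice or Emacs; fake_font exists only so the example runs without a real font file.

```python
import struct

# Assumed Graphite table tags; 'Silf' is treated as the marker table here.
GRAPHITE_TAGS = {b"Silf", b"Glat", b"Gloc", b"Feat", b"Sill"}
OPENTYPE_LAYOUT_TAGS = {b"GSUB", b"GPOS"}
AAT_TAGS = {b"mort", b"morx"}


def table_tags(font_data):
    """Return the set of 4-byte table tags in an sfnt table directory."""
    if len(font_data) < 12:
        raise ValueError("too short to be an sfnt font")
    # Header: uint32 version, uint16 numTables, then three uint16 search fields.
    _version, num_tables = struct.unpack(">IH", font_data[:6])
    tags = set()
    for i in range(num_tables):
        start = 12 + 16 * i          # each record: tag, checksum, offset, length
        record = font_data[start:start + 16]
        if len(record) < 16:
            raise ValueError("truncated table directory")
        tags.add(record[:4])
    return tags


def choose_shaper(tags):
    """Illustrative preference order only: Graphite, then OpenType layout, then AAT."""
    if b"Silf" in tags:
        return "graphite"
    if tags & OPENTYPE_LAYOUT_TAGS:
        return "opentype"
    if tags & AAT_TAGS:
        return "aat"
    return "none"


def fake_font(tags):
    """Build a table directory with empty records, for demonstration only."""
    header = struct.pack(">IHHHH", 0x00010000, len(tags), 0, 0, 0)
    records = b"".join(struct.pack(">4sIII", tag, 0, 0, 0) for tag in tags)
    return header + records


if __name__ == "__main__":
    print(choose_shaper(table_tags(fake_font([b"GSUB", b"GPOS", b"Silf"]))))
    print(choose_shaper(table_tags(fake_font([b"GSUB", b"GPOS"]))))
```

A real implementation would also have to handle font collections and check that the tables are usable, which this sketch ignores.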

Incidentally, part of the reason for the poor Lao rendering was that in 
Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI' 
sequence.  I've fixed that problem by adding some MS Windows only code to 
append_composite_glyph() in xdisp.c to apply the identification rules in the 
same way as done for uncomposed characters, but that doesn't really seem the 
best place for it.  Populating and using the unused field font_type in 
W32FontStruct would be a cleaner solution.  (A cleaner solution still would 
be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always 
generates an intermediate sequence of 16-bit codes, but the burden of 
recoding for hack fonts might then be transferred from the OS to Emacs.)  Judging 
by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I 
can trust version.el).  Most spectacularly, plain text 'underlined' 'o' 
<U+006F U+0331> renders as 'o' with the digit '1' written below it!
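
One plausible reading of the '1' is that only the low byte of each 16-bit code reaches the 8-bit ('ANSI') output path. The Python sketch below models just that arithmetic; narrow_to_8bit is a hypothetical stand-in, and I have not checked that this is what the Emacs code actually does.

```python
def narrow_to_8bit(codes):
    """Keep only the low byte of each 16-bit code (hypothetical model of the bug)."""
    return bytes(code & 0xFF for code in codes)


text = "o\u0331"                      # 'o' followed by COMBINING MACRON BELOW
codes = [ord(ch) for ch in text]      # [0x006F, 0x0331]
narrowed = narrow_to_8bit(codes)      # low bytes are 0x6F and 0x31

if __name__ == "__main__":
    print(["%04X" % code for code in codes])   # ['006F', '0331']
    print(narrowed.decode("latin-1"))          # o1
```

The low byte of U+0331 is 0x31, which is the ASCII digit '1', matching what appears under the 'o'.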

Fixing this exposes the next set of problems: Uniscribe often refuses to draw 
a combining mark on its own (prefixing U+00A0 might work), and one has to 
determine when a composition should be left to Uniscribe.  The latter is 
slightly complicated by such cases as an ASCII or Latin-1 base character plus 
a combining mark, admittedly fairly rare if one is using Normalization Form C 
(NFC).  (Indic transliteration and typewriter-based American Indian 
orthographies are the best sources, e.g. underlining for nasal vowels in 
Choctaw.)  In these cases, the character sequence is broken, at least in 
Emacs 22.1, because the base and combining characters seem to come from 
different fonts!
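
The point about NFC can be checked with Python's unicodedata module. This sketch shows that some base-plus-mark pairs compose to a single code point under NFC while others, such as the underlined vowels mentioned above, keep their combining mark; the helper name is mine.

```python
import unicodedata


def marks_left_after_nfc(text):
    """Return the combining marks (general category M*) that survive NFC."""
    composed = unicodedata.normalize("NFC", text)
    return [ch for ch in composed if unicodedata.category(ch).startswith("M")]


if __name__ == "__main__":
    # 'o' + COMBINING ACUTE ACCENT composes to U+00F3, so nothing is left over.
    print(marks_left_after_nfc("o\u0301"))
    # 'o' + COMBINING MACRON BELOW has no precomposed form, so the mark remains.
    print(marks_left_after_nfc("o\u0331"))
```

Text of the second kind is where an ASCII base and its mark can end up being drawn from different fonts.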

I'm tempted to go for the brute force rule of assuming that the combining 
marks are always taken from the same OpenType font as the base character and 
giving the job to Uniscribe.  This hits the practical problem that many 
OpenType fonts don't stack arbitrary combinations of diacritic marks. 
However, I have seen an Emacs-related statement that it is the user's 
responsibility to provide a font that works properly.
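
A minimal sketch of the brute force rule, in Python rather than in Emacs's C: group each base character with the combining marks that follow it, so the whole cluster can be handed to one font and one shaper, and give an isolated mark a U+00A0 base as suggested earlier. The cluster rule here is a deliberate simplification of Unicode grapheme clusters, and none of the names come from Emacs or Uniscribe.

```python
import unicodedata

NBSP = "\u00a0"


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def clusters(text):
    """Split text into base+marks clusters; a mark with no base gets NBSP."""
    result = []
    for ch in text:
        if not is_mark(ch):
            result.append(ch)                 # a new base starts a new cluster
        elif result and unicodedata.category(result[-1][0]) != "Cc":
            result[-1] += ch                  # attach the mark to the previous base
        else:
            result.append(NBSP + ch)          # no usable base: supply U+00A0
    return result


if __name__ == "__main__":
    for cluster in clusters("o\u0331k\n\u0331"):
        print(["U+%04X" % ord(ch) for ch in cluster])
```

Each cluster would then be looked up in the base character's font as a unit; whether that font can actually stack the marks is, as noted, a separate matter.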

Richard. 
