From: "Richard Wordingham"
Newsgroups: gmane.emacs.devel
Subject: Re: [w32] display international HELLO
Date: Tue, 20 Nov 2007 01:49:14 -0000
Message-ID: <001101c82b17$8f9ad3f0$d5101252@JRWXP1>
References: <001501c822ab$ccfec5e0$d5101252@JRWXP1> <00d801c82732$0001dfb0$d5101252@JRWXP1>

Kenichi Handa wrote:
> Richard Wordingham writes:
>
>>>> 3. Compositions of Lao characters, (i.e. with the 'composition' string
>>>> property) using the Code2000 font (the only fully working Lao font I
>>>> have),
>>>> do not display properly, whether they are in the Lao or
>>>> mule-unicode-0100-24ff charset.

> I'm going to allow each font-backends to generate proper
> composition information that will vary depending on a font,
> instead of the current fixed way of composition. So, On
> Windows, perhaps the font backend can utilize uniscribe.

For OpenType fonts in scripts supported by Uniscribe, that's generally the way to go - especially for quick results. Might Pango be superior, even on MS Windows, though? It was very noticeable that when Unicode belatedly added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil letter, let alone form the shri ligature from it in those fonts that had been updated.
(Previously the shri ligature had been implemented via the hack of using U+0BB7 TAMIL LETTER SSA instead.)

There is another composition technology around, intended to cater for those scripts not or inadequately supported by Uniscribe, namely Graphite from SIL. For some time it was the only way of supporting the Burmese script in Unicode on Windows. (I don't know if Windows Vista and related products support the Burmese script, at least for Burmese. I'd be impressed if the Shan extensions were in.) The OpenType font has extra tables for Graphite, so an application (such as at least some versions of Firefox and OpenOffice) knows whether to use Graphite or Uniscribe/Pango for its GSUB and GPOS tables. (I presume similar considerations apply to Apple-defined mort and morx tables.) By putting the composition knowledge in the font, Graphite even allows one to encode complex scripts in the Private Use Areas.

Incidentally, part of the reason for the poor Lao rendering was that in Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI' sequence. I've fixed that problem by adding some MS Windows only code to append_composite_glyph() in xdisp.c to apply the identification rules in the same way as done for uncomposed characters, but that doesn't really seem the best place for it. Populating and using the unused field font_type in W32FontStruct would be a cleaner solution. (A cleaner solution still would be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always generates an intermediate sequence of 16-bit codes, but the burden of recoding for hack fonts might be transferred from the OS to Emacs.)

Judging by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I can trust version.el). Most spectacularly, plain text 'underlined' 'o' renders as 'o' with the digit '1' written below it!
This then exposes the next set of problems: Uniscribe often refuses to draw a combining mark on its own (prefixing U+00A0 might work), and one has to determine when a composition should be left to Uniscribe. The latter is slightly complicated by such features as an ASCII or Latin-1 base character plus a combining mark, admittedly fairly rare if one is using Normalization Form C (NFC). (Indic transliteration and typewriter-based American Indian orthographies are the best sources, e.g. underlining for nasal vowels in Choctaw.) In these cases, the character sequence is broken, at least in Emacs 22.1, because the base and combining characters seem to come from different fonts!

I'm tempted to go for the brute-force rule of assuming that the combining marks are always taken from the same OpenType font as the base character and giving the job to Uniscribe. This hits the practical problem that many OpenType fonts don't stack arbitrary combinations of diacritic marks. However, I have seen an Emacs-related statement that it is the user's responsibility to provide a font that works properly.

Richard.
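
To make the NFC remark above concrete, here is a small Python sketch (not Emacs code, and not how Emacs or Uniscribe actually segment text). It groups each base character with the combining marks that follow it and reports which groups still need composing after NFC. The function names are made up for illustration, and the choice of U+0332 COMBINING LOW LINE for the underlined vowel is an assumption; the message does not say which mark was used.

```python
import unicodedata


def is_mark(ch):
    # General categories Mn, Mc and Me are the Unicode combining marks.
    return unicodedata.category(ch).startswith("M")


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a mark with no preceding base forms its own
    cluster, which is the 'combining mark on its own' case.
    """
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if out and is_mark(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def needs_shaping(cluster):
    """True when NFC could not fold the marks into one precomposed character."""
    return len(cluster) > 1


if __name__ == "__main__":
    for sample in ("e\u0301", "o\u0332k", "\u0332a"):
        for c in clusters(sample):
            codes = " ".join("U+%04X" % ord(ch) for ch in c)
            print(codes, "needs shaping" if needs_shaping(c) else "single character")
```

Under the brute-force rule, every cluster for which needs_shaping() is true would be drawn from one font and handed to the shaping engine as a unit. 'e' plus an acute accent never reaches that path because NFC folds it into U+00E9, whereas an underlined 'o' has no precomposed form and stays as two characters.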