From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 20:56:57 +0300
Message-ID: <83txvydaty.fsf@gnu.org>
References: <349071341393469@web30d.yandex.ru> <87k3wwimlk.fsf@gnu.org> <87393j7fdv.fsf@gnu.org>
In-reply-to: <87393j7fdv.fsf@gnu.org>
To: Jason Rumney
Cc: 11860@debbugs.gnu.org, smias@yandex.ru

> From: Jason Rumney
> Cc: Eli Zaretskii, 11860@debbugs.gnu.org, smias@yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
>
> Kenichi Handa writes:
>
> > In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii writes:
> >
> >> > From: Kenichi Handa
> >> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> >
> >> > So, apparently Emacs on Windows and GNU/Linux uses the
> >> > different metrics of glyphs.
>
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at the
correct horizontal positions.  The values of the offsets are directly
relevant to that, because they describe how many pixels to advance
after drawing each glyph.  By contrast, the sum of the offsets will
always be approximately the same, since the entire grapheme cluster
occupies a single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7 and there I see the problem with Hebrew as well.
Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it is never hit.  On Windows 7, that function does
get called.

Jason, how can I find out whether Uniscribe is used for rendering
Hebrew, or why Emacs doesn't call uniscribe_shape?  (I know about
uniscribe_font->cache, but I don't see that function called even if I
start Emacs with a breakpoint in it, so it seems the cache is not the
issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on
XP.  I guess that's why the problem with Arabic is visible on both XP
and Windows 7.

For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: iso-8859-8 (ISO/IEC 8859/8)
  code point in charset: 0xE2
                 syntax: w        which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "d" with hebrew-full
            buffer code: #xD7 #x92
              file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
  by these glyphs:
    [0 1 1490 674 8 0 6 12 4 nil]
    [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05D2
                 syntax: w        which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
            buffer code: #xD7 #x92
              file code: not encodable by coding system iso-latin-1-dos
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1490 674 8 1 6 12 4 nil]
    [0 1 1490 663 0 2 6 12 4 nil]

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

               position: 197 of 198 (99%), column: 0
              character: ع‎ (displayed as ع‎) (codepoint 1593, #o3071, #x639)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x0639
                 syntax: w        which means: word
               category: .:Base, R:Right-to-left (strong), b:Arabic
            buffer code: #xD8 #xB9
              file code: not encodable by coding system hebrew-iso-8bit-dos
                display: composed to form "عْ" (see below)

  Composed with the following character(s) "ْ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1593 969 8 2 8 12 4 nil]
    [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the one in
the Windows 7 output.  I have no idea why.

> >> >     [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >     [0 1 1593 969 8 1 8 12 4 nil]
>
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

                      LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
                                                     + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.
As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.

> Could this be causing us problems later on? The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)

> >                   /* Detect clusters, for linking codes back to
> >                      characters.  */
> >                   if (attributes[j].fClusterStart)
> >                     {
> >                       while (from < nchars_in_run && clusters[from] < j)
> >                         from++;
> >                       if (from >= nchars_in_run)
> >                         from = to = nchars_in_run - 1;
> >                       else
> >                         {
> >                           int k;
> >                           to = nchars_in_run - 1;
> >                           for (k = from + 1; k < nchars_in_run; k++)
> >                             {
> >                               if (clusters[k] > j)
> >                                 {
> >                                   to = k - 1;
> >                                   break;
> >                                 }
> >                             }
> >                         }
> >                     }
> >
> > The comment refers to "clusters".  I don't know what it
> > exactly means in uniscribe, but I guess it relates to
> > grapheme cluster, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >     [0 1 1593 969 8 1 8 12 4 nil]
> >     [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code, it simply sets it to the code
of the base character.  But again, I don't think this is relevant.
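
To make the quoted cluster loop easier to follow outside of
uniscribe_shape, here is a minimal standalone sketch of the same
from/to computation.  It is not the Emacs code: the function name
cluster_char_range and the struct char_range are hypothetical, plain
unsigned short stands in for the Windows WORD type, and the sketch
assumes a cluster map whose entries never decrease (clusters[i] being
the index of the first glyph of the cluster that contains character
i).  Whether that assumption holds for right-to-left runs is not
checked here.

```c
/* Hypothetical result type: the range of characters [from, to] in the
   run that belong to the cluster starting at a given glyph.  */
struct char_range
{
  int from;
  int to;
};

/* Sketch of the quoted loop, assuming CLUSTERS is non-decreasing.
   J is the index of a glyph that starts a cluster; FROM is the
   character index at which the previous search stopped (0 at first).  */
static struct char_range
cluster_char_range (const unsigned short *clusters, int nchars_in_run,
                    int j, int from)
{
  struct char_range r;
  int to;

  /* Skip characters whose cluster starts before glyph J.  */
  while (from < nchars_in_run && clusters[from] < j)
    from++;

  if (from >= nchars_in_run)
    /* Ran off the end: clamp to the last character of the run.  */
    from = to = nchars_in_run - 1;
  else
    {
      int k;

      /* The cluster extends up to the character just before the first
         one whose cluster starts after glyph J, or to the end.  */
      to = nchars_in_run - 1;
      for (k = from + 1; k < nchars_in_run; k++)
        if (clusters[k] > j)
          {
            to = k - 1;
            break;
          }
    }

  r.from = from;
  r.to = to;
  return r;
}
```

With the made-up cluster map {0, 0, 2} (a base character and a
combining mark sharing the cluster at glyph 0, followed by another
base character at glyph 2), glyph 0 maps to characters 0..1 and glyph
2 maps to character 2.  In both cases chars[... + from] is the base
character, which is consistent with every glyph of a cluster getting
the base character's code.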