From: Jason Rumney
Newsgroups: gmane.emacs.bugs
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 11:02:52 +0800
Message-ID: <87393j7fdv.fsf@gnu.org>
References: <349071341393469@web30d.yandex.ru> <87k3wwimlk.fsf@gnu.org>
To: Kenichi Handa
Cc: 11860@debbugs.gnu.org, smias@yandex.ru
In-Reply-To: <87k3wwimlk.fsf@gnu.org> (Kenichi Handa's message of "Sat, 18 Aug 2012 18:19:19 +0900")

Kenichi Handa writes:

> In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii writes:
>
>> > From: Kenichi Handa
>> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
>> > Date: Sat, 18 Aug 2012 11:45:27 +0900
>> >
>> > So, apparently Emacs on Windows and GNU/Linux uses the
>> > different metrics of glyphs.

Right, but if we add the offsets to the corresponding metrics, we get
the same result in both the Windows and GNU/Linux cases, except for
the total height of the font. I think that difference is because
Windows counts inter-line spacing in the height, while on GNU/Linux it
is kept separate. So I'm not sure this is what is causing our problems
(see Eli's report about Hebrew); it's just a case of Windows and
GNU/Linux using different reference points.

> For Hebrew too, on Windows, I see the same problem as what
> Steffan reported:

If you are seeing something different from Eli for Hebrew with the
same font, then I suspect the cause is the version of Uniscribe that
is installed. Maybe diacritic handling for Hebrew and Arabic was added
to Uniscribe more recently than basic support for those languages.
>> > For instance, in the above case, we may have to render glyphs in
>> > this order (diacritical mark first):
>> >
>> > [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> > [0 1 1593 969 8 1 8 12 4 nil]

I'm curious how we ended up with the same C entry in both of those
vectors. Could this be causing us problems later on? The glyph index
is correct (compared with the GNU/Linux version), but I wonder whether
Uniscribe refers back to the character at some point and trips up
because it has been changed.

> I've just read the function uniscribe_shape in
> w32uniscribe.c. It seems that these are the key API for
> uniscribe:
>
>   * ScriptItemize -- no idea what is this

This should be a no-op for Emacs, as we have already split the string
into LGSTRING components. But if it is not called, subsequent
Uniscribe operations fail, so it must also be initializing some
internal structures.

>   * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
>   * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)

Yes, I think that is correct.

> So at first please check the documentation of ScriptShape
> and figure out how it works for bidi script; i.e. what order
> does it expect for input, and what order does it produce.
>
> Next please find the meaning of this code fragment:
>
>   /* Detect clusters, for linking codes back to
>      characters.  */
>   if (attributes[j].fClusterStart)
>     {
>       while (from < nchars_in_run && clusters[from] < j)
>         from++;
>       if (from >= nchars_in_run)
>         from = to = nchars_in_run - 1;
>       else
>         {
>           int k;
>           to = nchars_in_run - 1;
>           for (k = from + 1; k < nchars_in_run; k++)
>             {
>               if (clusters[k] > j)
>                 {
>                   to = k - 1;
>                   break;
>                 }
>             }
>         }
>     }
>
> The comment refer to "clusters". I don't know what it
> exactly means in uniscribe, but I guess it relates to
> grapheme cluster, and if so, this part seems to relates to
> the ordering of glyphs in this kind of grapheme clauster:
>
> [0 1 1593 969 8 1 8 12 4 nil]
> [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

That seems to be correct. Maybe this is the code that is changing the
character code to 1593. I seem to recall that something like this was
required for Indic languages, to let Emacs know which characters had
been linked into a single glyph.