From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.bugs
Subject: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 20:56:57 +0300
Message-ID: <83txvydaty.fsf@gnu.org>
References: <349071341393469@web30d.yandex.ru> <87k3wwimlk.fsf@gnu.org>
	<87393j7fdv.fsf@gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
Cc: 11860@debbugs.gnu.org, smias@yandex.ru
To: Jason Rumney <jasonr@gnu.org>
X-Loop: help-debbugs@gnu.org
Resent-From: Eli Zaretskii <eliz@gnu.org>
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sun, 19 Aug 2012 17:58:01 +0000
Resent-Message-ID: <handler.11860.B11860.134539903417580@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 11860
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
In-reply-to: <87393j7fdv.fsf@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:63295
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/63295>

> From: Jason Rumney <jasonr@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  11860@debbugs.gnu.org,  smias@yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
>
> Kenichi Handa <handa@gnu.org> writes:
>
> > In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> >
> >> > From: Kenichi Handa <handa@gnu.org>
> >> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> >
> >> > So, apparently Emacs on Windows and GNU/Linux uses the
> >> > different metrics of glyphs.
>
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at the
correct horizontal positions.  The values of the offsets are directly
relevant to that, because they describe how many pixels to advance
after drawing each glyph.  By contrast, the sum of the offsets will
always be approximately the same, since the entire grapheme cluster
occupies a single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7 and there I see the problem with Hebrew as well.

Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it never breaks.  On Windows 7, that function does
get called.

Jason, how can I find out whether Uniscribe is used for rendering
Hebrew, or why Emacs doesn't call uniscribe_shape?  (I know about
uniscribe_font->cache, but I don't see that function called even if I
start Emacs with a breakpoint in it, so it seems the cache is not the
issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on
XP.  I guess that's why the problem with Arabic is visible on both XP
and Windows 7.

For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: iso-8859-8 (ISO/IEC 8859/8)
  code point in charset: 0xE2
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "d" with hebrew-full
            buffer code: #xD7 #x92
              file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
  by these glyphs:
    [0 1 1490 674 8 0 6 12 4 nil]
    [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

               position: 193 of 194 (99%), column: 1
              character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05D2
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong)
               to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
            buffer code: #xD7 #x92
              file code: not encodable by coding system iso-latin-1-dos
                display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1490 674 8 1 6 12 4 nil]
    [0 1 1490 663 0 2 6 12 4 nil]
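
To make the difference between the two Hebrew listings concrete, here
is a rough sketch of how the drawing position of each glyph follows
from the glyph vectors.  It assumes the usual LGLYPH layout ([FROM TO
C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT], where
ADJUSTMENT is nil or [XOFF YOFF WADJUST]) and a simple left-to-right
pen; the struct and the function name place_glyphs are made up for
illustration, and this is not the actual Emacs drawing code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one LGLYPH entry; names are illustrative.  */
struct glyph_metrics
{
  int width;       /* WIDTH element of the glyph vector */
  int has_adjust;  /* nonzero if ADJUSTMENT is [XOFF YOFF WADJUST] */
  int xoff;        /* XOFF, used only when has_adjust is nonzero */
  int wadjust;     /* WADJUST, used only when has_adjust is nonzero */
};

/* Store in DRAW_X[i] the x position at which glyph I would be drawn,
   relative to the start of the cluster, and return the total advance
   of the cluster.  */
int
place_glyphs (const struct glyph_metrics *g, size_t n, int *draw_x)
{
  int pen = 0;

  assert (g != NULL && draw_x != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (g[i].has_adjust)
        {
          /* Adjusted glyph: shift by XOFF, advance by WADJUST.  */
          draw_x[i] = pen + g[i].xoff;
          pen += g[i].wadjust;
        }
      else
        {
          /* Plain glyph: draw at the pen, advance by WIDTH.  */
          draw_x[i] = pen;
          pen += g[i].width;
        }
    }
  return pen;
}
```

Under these assumptions, the XP vectors (width 8, then width 8 with
adjustment [-8 0 0]) put both glyphs at x = 0 with a total advance of
8, while the Windows 7 vectors (width 8, then width 0 with no
adjustment) give the same total of 8 but put the second glyph at
x = 8.  The sums agree; the position of the diacritic does not.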

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

               position: 197 of 198 (99%), column: 0
              character: ع‎ (displayed as ع‎) (codepoint 1593, #o3071, #x639)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x0639
                 syntax: w      which means: word
               category: .:Base, R:Right-to-left (strong), b:Arabic
            buffer code: #xD8 #xB9
              file code: not encodable by coding system hebrew-iso-8bit-dos
                display: composed to form "عْ" (see below)

  Composed with the following character(s) "ْ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1593 969 8 2 8 12 4 nil]
    [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the Windows
7 output.  I have no idea why.

> >> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >   [0 1 1593 969 8 1 8 12 4 nil]
>
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

                  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
                                                 + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.

As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.
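
A stripped-down sketch of that effect, for illustration only: the
struct, the function name fill_glyphs and its parameters are invented
here and do not appear in w32uniscribe.c, and the sketch assumes the
character index stays fixed for the whole run (as it does within one
cluster).

```c
#include <assert.h>
#include <stddef.h>

/* Only the two LGLYPH fields that matter for this point.  */
struct sketch_glyph
{
  int c;     /* character code (the C element) */
  int code;  /* font glyph index (the CODE element) */
};

/* Fill OUT[0..NGLYPHS-1].  Each glyph gets its own glyph index from
   GLYPH_IDS, but the same character code, chars[BASE_POS].  */
void
fill_glyphs (struct sketch_glyph *out, const int *glyph_ids,
             size_t nglyphs, const int *chars, size_t base_pos)
{
  assert (out != NULL && glyph_ids != NULL && chars != NULL);
  for (size_t j = 0; j < nglyphs; j++)
    {
      out[j].c = chars[base_pos];  /* always the base character */
      out[j].code = glyph_ids[j];  /* distinct per glyph */
    }
}
```

With chars = {1593, 1618} (Ayin and sukun), glyph indices {969, 1028}
and base_pos = 0, both entries end up with C = 1593 and only CODE
differs, which matches the shape of the vectors shown above.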

> Could this be causing us problems later on?  The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)

> >                   /* Detect clusters, for linking codes back to
> >                      characters.  */
> >                   if (attributes[j].fClusterStart)
> >                     {
> >                       while (from < nchars_in_run && clusters[from] < j)
> >                         from++;
> >                       if (from >= nchars_in_run)
> >                         from = to = nchars_in_run - 1;
> >                       else
> >                         {
> >                           int k;
> >                           to = nchars_in_run - 1;
> >                           for (k = from + 1; k < nchars_in_run; k++)
> >                             {
> >                               if (clusters[k] > j)
> >                                 {
> >                                   to = k - 1;
> >                                   break;
> >                                 }
> >                             }
> >                         }
> >                     }
> >
> > The comment refers to "clusters".  I don't know what it
> > exactly means in uniscribe, but I guess it relates to
> > grapheme clusters, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code, it simply sets it to the code
of the base character.  But again, I don't think this is relevant.