bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear


From: Eli Zaretskii <eliz@gnu.org>
To: Jason Rumney <jasonr@gnu.org>
Cc: 11860@debbugs.gnu.org, smias@yandex.ru
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 20:56:57 +0300
Message-ID: <83txvydaty.fsf@gnu.org>
In-Reply-To: <87393j7fdv.fsf@gnu.org>

> From: Jason Rumney <jasonr@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  11860@debbugs.gnu.org,  smias@yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
> 
> Kenichi Handa <handa@gnu.org> writes:
> 
> > In article <83txw0aczg.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> >
> >> > From: Kenichi Handa <handa@gnu.org>
> >> > Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> > 
> >> > So, apparently Emacs on Windows and GNU/Linux use
> >> > different glyph metrics.
> 
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at
correct horizontal positions.  The values of the offsets are directly
relevant to that, because they describe how many pixels to advance
after drawing each glyph.  By contrast, the sum of the offsets will
always be approximately the same, since the entire grapheme cluster
occupies a single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed. Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7 and there I see the problem with Hebrew as well.

Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it never breaks.  On Windows 7, that function does
get called.

Jason, how can I find out whether Uniscribe is used for rendering
Hebrew, or why Emacs doesn't call uniscribe_shape?  (I know about
uniscribe_font->cache, but I don't see that function called even if I
start Emacs with a breakpoint in it, so it seems the cache is not the
issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on
XP.  I guess that's why the problem with Arabic is visible on both XP
and Windows 7.

For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

	       position: 193 of 194 (99%), column: 1
	      character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: iso-8859-8 (ISO/IEC 8859/8)
  code point in charset: 0xE2
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong)
	       to input: type "d" with hebrew-full
	    buffer code: #xD7 #x92
	      file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
		display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
  by these glyphs:
    [0 1 1490 674 8 0 6 12 4 nil]
    [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

	       position: 193 of 194 (99%), column: 1
	      character: ג‎ (displayed as ג‎) (codepoint 1490, #o2722, #x5d2)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x05D2
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong)
	       to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
	    buffer code: #xD7 #x92
	      file code: not encodable by coding system iso-latin-1-dos
		display: composed to form "גֻ" (see below)

  Composed with the following character(s) "ֻ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1490 674 8 1 6 12 4 nil]
    [0 1 1490 663 0 2 6 12 4 nil]
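
To make the difference between the two dumps concrete, here is a
minimal sketch of how the horizontal origin of each glyph follows from
these vectors.  It is not the actual Emacs drawing code.  It assumes
the LGLYPH layout [FROM TO C CODE WIDTH LBEARING RBEARING ASCENT
DESCENT ADJUSTMENT], with ADJUSTMENT being either nil or
[XOFF YOFF WADJUST], and it assumes that a glyph with an adjustment is
drawn at pen + XOFF and advances the pen by WADJUST, while a glyph
without one is drawn at the pen and advances it by WIDTH.  The names
glyph_info and layout_cluster are made up for this illustration.

```c
/* Simplified, hypothetical model of one LGLYPH entry.  Only the
   fields needed for horizontal placement are kept.  */
struct glyph_info
{
  int code;           /* CODE: font glyph index */
  int width;          /* WIDTH: advance width in pixels */
  int has_adjustment; /* nonzero if ADJUSTMENT is [XOFF YOFF WADJUST] */
  int xoff;           /* XOFF, meaningful only with an adjustment */
  int wadjust;        /* WADJUST, used instead of WIDTH */
};

/* Store in ORIGINS[i] the x coordinate at which glyph I would be
   drawn, relative to the start of the cluster, and return the total
   advance of the cluster.  The drawing rule is the assumption stated
   in the text above, not a copy of the real code.  */
int
layout_cluster (const struct glyph_info *glyphs, int nglyphs,
                int *origins)
{
  int pen = 0;

  for (int i = 0; i < nglyphs; i++)
    {
      if (glyphs[i].has_adjustment)
        {
          origins[i] = pen + glyphs[i].xoff;
          pen += glyphs[i].wadjust;
        }
      else
        {
          origins[i] = pen;
          pen += glyphs[i].width;
        }
    }
  return pen;
}

/* The Hebrew cluster as reported on XP: the second glyph carries
   the adjustment [-8 0 0].  */
const struct glyph_info xp_cluster[2] = {
  { 674, 8, 0, 0, 0 },
  { 663, 8, 1, -8, 0 },
};

/* The same cluster as reported on Windows 7: the second glyph has
   WIDTH 0 and no adjustment.  */
const struct glyph_info w7_cluster[2] = {
  { 674, 8, 0, 0, 0 },
  { 663, 0, 0, 0, 0 },
};
```

Under these assumptions, layout_cluster returns 8 for both clusters,
but the origin of glyph 663 is 0 with the XP data and 8 with the
Windows 7 data: the same total, yet different drawing positions.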

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

	       position: 197 of 198 (99%), column: 0
	      character: ع‎ (displayed as ع‎) (codepoint 1593, #o3071, #x639)
      preferred charset: unicode (Unicode (ISO10646))
  code point in charset: 0x0639
		 syntax: w 	which means: word
	       category: .:Base, R:Right-to-left (strong), b:Arabic
	    buffer code: #xD8 #xB9
	      file code: not encodable by coding system hebrew-iso-8bit-dos
		display: composed to form "عْ" (see below)

  Composed with the following character(s) "ْ" using this font:
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 1593 969 8 2 8 12 4 nil]
    [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the Windows
7 output.  I have no idea why.

> >> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >   [0 1 1593 969 8 1 8 12 4 nil]
> 
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

		  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
						 + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.

As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.
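
A stripped-down model of that assignment, only to show why both
vectors end up with the same C entry.  The struct and the function
fill_cluster are hypothetical stand-ins, not the real LGLYPH macros,
and the sketch assumes 'from' stays the same for every glyph of the
cluster:

```c
/* Hypothetical, simplified stand-in for an LGLYPH entry.  */
struct lglyph_model
{
  unsigned int ch;   /* the C field (character code) */
  unsigned int code; /* the CODE field (font glyph index) */
};

/* Fill OUT[0..NGLYPHS-1] the way the statement quoted above does
   when FROM does not change within the cluster: every glyph gets
   the character at CHARS[CHAR_POS + FROM], while CODE is taken
   from the shaper's glyph array.  */
void
fill_cluster (struct lglyph_model *out, const unsigned int *chars,
              int char_pos, int from,
              const unsigned short *glyph_codes, int nglyphs)
{
  for (int j = 0; j < nglyphs; j++)
    {
      out[j].ch = chars[char_pos + from];
      out[j].code = glyph_codes[j];
    }
}
```

With chars = { 1593, 1618 } (Ayin, sukun) and glyph codes
{ 969, 1028 }, both entries get C = 1593 while keeping distinct glyph
indices, which is the pattern seen in the vectors above.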

> Could this be causing us problems later on?  The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)

> > 		  /* Detect clusters, for linking codes back to
> > 		     characters.  */
> > 		  if (attributes[j].fClusterStart)
> > 		    {
> > 		      while (from < nchars_in_run && clusters[from] < j)
> > 			from++;
> > 		      if (from >= nchars_in_run)
> > 			from = to = nchars_in_run - 1;
> > 		      else
> > 			{
> > 			  int k;
> > 			  to = nchars_in_run - 1;
> > 			  for (k = from + 1; k < nchars_in_run; k++)
> > 			    {
> > 			      if (clusters[k] > j)
> > 				{
> > 				  to = k - 1;
> > 				  break;
> > 				}
> > 			    }
> > 			}
> > 		    }
> >
> > The comment refers to "clusters".  I don't know what exactly it
> > means in Uniscribe, but I guess it relates to grapheme
> > clusters, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> 
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code; it simply sets it to the code
of the base character.  But again, I don't think this is relevant.
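
For reference, the cluster-detection fragment quoted above can be
read as the following standalone function.  This is a sketch under
assumptions: cluster_char_range is a made-up name, 'clusters' is
taken to map each character of the run to the index of the first
glyph of its cluster, and the caller is assumed to invoke it only for
glyphs J that start a cluster (the fClusterStart test).

```c
/* Given glyph index J that starts a cluster, update *FROM and *TO
   to the first and last character indices covered by that cluster.
   *FROM carries over from the previous call, as in the quoted
   loop.  */
void
cluster_char_range (const unsigned short *clusters, int nchars_in_run,
                    int j, int *from, int *to)
{
  /* Skip characters that belong to earlier clusters.  */
  while (*from < nchars_in_run && clusters[*from] < j)
    (*from)++;

  if (*from >= nchars_in_run)
    *from = *to = nchars_in_run - 1;
  else
    {
      /* The cluster extends up to the character just before the
         first one mapped to a later glyph.  */
      *to = nchars_in_run - 1;
      for (int k = *from + 1; k < nchars_in_run; k++)
        if (clusters[k] > j)
          {
            *to = k - 1;
            break;
          }
    }
}
```

For a hypothetical run of three characters with clusters = { 0, 0, 2 },
calling it with j = 0 and from = 0 gives from = 0, to = 1; calling it
again with j = 2 gives from = to = 2.  Every glyph of the first
cluster is thus tied to the character at 'from', i.e. the base
character.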
