unofficial mirror of emacs-devel@gnu.org 
From: Kenichi Handa <handa@m17n.org>
To: Yair F <yair.f.lists@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Composing Hebrew diacriticals
Date: Thu, 01 Jul 2010 14:52:23 +0900
Message-ID: <tl7sk434ulk.fsf@m17n.org>
In-Reply-To: <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com> (message from Yair F on Thu, 1 Jul 2010 00:28:36 +0300)


In article <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Sorry about that. Please find hebrew-sample2.txt, the source file.
> Arial-anottated.png is this file displayed using Emacs with the Arial font.
> The numbers in red refer to the following comments; the general flow is
> top to bottom, right to left:
> 1. The Shin dot should be rendered near the right leg. Currently it is
> rendered above the centre leg, which is unreadable.
> 2. All points below should be horizontally centred relative to the
> base letter. Currently it seems that they are aligned to the left.
> The exception to this rule is letters that have a single downward leg,
> such as ו, ר, ד, ז: the points should be rendered directly under the
> leg for these letters.
> 3. The Shva point touches Qof's leg. The result is unreadable.
> 4. The Dagesh point is hidden within the Shin letter.
> 5. This is not Hebrew, but the combining dot above should be composed
> with the letter A.
> 6. The Holam point should be to the left of the leg, not to the right.
> The result is unreadable.
> 7. The Shuruq point should be to the left of the vav letter, not to the
> right. The result is unreadable.

All those are glyph positioning problems and can be improved
by adding more code to hebrew-shape-gstring.

> > Anyway, for fonts that don't have OpenType tables for Hebrew
> > script, we can do nothing other than artificially adjusting
> > glyph position.  Have you seen any other application
> > rendering Hebrew well with that Arial font?
> OpenOffice and Firefox correctly render Hebrew points.

??? When I open your hebrew-sample2.txt with oowriter, and
specify Arial font, the rendering is almost (exactly?) the
same as that of Emacs (see the attached image).

I confirmed that Firefox (and all applications using
Pango/HarfBuzz, e.g. gedit) do render Hebrew better
with Arial.  Reading the Pango code, I found
that it has a fallback shaping engine that is used for
fonts without Hebrew GPOS OpenType tables.  Below is an excerpt
from pango/module/hebrew-shaper.c.  You'll see that it
checks various character combinations and adjusts glyph
offsets accordingly.  But the code has many magic numbers
(e.g. 3.5, 0.7, 0.5, 1/3, 3/5, ...).  I think it's a dirty,
ad-hoc hack.
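
To make the magic-number point concrete, here is a minimal sketch
(not Pango's code) of the kind of test the excerpt relies on.  The
helper name ink_is_narrow_bar and its signature are hypothetical;
only the thresholds 3.5 (for vav) and 2 (for yod) come from the
excerpt.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of Pango.  It guesses from the ink
   box alone whether a base glyph is a narrow vertical bar, in which
   case room must be made for a dot placed beside it.  RATIO is the
   magic number: the excerpt uses 3.5 for vav and 2 for yod.  */
static bool
ink_is_narrow_bar (int ink_width, int ink_height, double ratio)
{
  return ink_height > ink_width * ratio;
}
```

With a threshold of 3.5, an ink box 2 wide and 8 high counts as a
bare bar, while one 4 wide and 10 high does not.  The excerpt's own
comment says the threshold was chosen to tell two particular fonts
apart, so it may not generalize to others.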

Theoretically, it is possible to do the same thing in the
function hebrew-shape-gstring.  But is it really worth
doing?  Isn't it enough to tell Hebrew users to use
properly designed OpenType fonts?

============================================================
void
hebrew_shaper_get_cluster_kerning(gunichar            *cluster,
				  gint                cluster_length,
				  PangoRectangle      ink_rect[],

				  /* input and output */
				  gint                width[],
				  gint                x_offset[],
				  gint                y_offset[])
{
  int i;
  int base_ink_x_offset, base_ink_y_offset, base_ink_width, base_ink_height;
  gunichar base_char = cluster[0];

  x_offset[0] = 0;
  y_offset[0] = 0;

  if (cluster_length == 1)
    {
      /* Make lone 'vav dot' have zero width */
      if (base_char == UNI_SHIN_DOT
	  || base_char == UNI_SIN_DOT
	  || base_char == UNI_HOLAM
	  ) {
	x_offset[0] = -ink_rect[0].x - ink_rect[0].width;
	width[0] = 0;
      }

      return;
    }

  base_ink_x_offset = ink_rect[0].x;
  base_ink_y_offset = ink_rect[0].y;
  base_ink_width = ink_rect[0].width;
  base_ink_height = ink_rect[0].height;

  /* Do heuristics */
  for (i=1; i<cluster_length; i++)
    {
      int gl = cluster[i];
      x_offset[i] = 0;
      y_offset[i] = 0;

      /* Check if it is a point */
      if (gl < 0x5B0 || gl >= 0x05D0)
	continue;

      /* Center dot of VAV */
      if (gl == UNI_MAPIQ && base_char == UNI_VAV)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x;

	  /* If VAV is a vertical bar without a roof, then we
	     need to make room for the dot by increasing the
	     cluster width. But how can I check if that is the
	     case??
	  */
	  /* This is wild, but it does the job of differentiating
	     between two M$ fonts... Base the decision on the
	     aspect ratio of the vav...
	  */
	  if (base_ink_height > base_ink_width * 3.5)
	    {
	      int j;
	      double space = 0.7;
	      double kern = 0.5;

	      /* Shift all characters to make place for the mapiq */
	      for (j=0; j<i; j++)
		  x_offset[j] += ink_rect[i].width*(1+space-kern);

	      width[cluster_length-1] += ink_rect[i].width*(1+space-kern);
	      x_offset[i] -= ink_rect[i].width*(kern);
	    }
	}

      /* Dot over SHIN */
      else if (gl == UNI_SHIN_DOT && base_char == UNI_SHIN)
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* Dot over SIN */
      else if (gl == UNI_SIN_DOT && base_char == UNI_SHIN)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x;
	}

      /* VOWEL DOT above to any other character than
	 SHIN or VAV should stick out a bit to the left. */
      else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM)
	       && base_char != UNI_SHIN && base_char != UNI_VAV)
	{
	  x_offset[i] = base_ink_x_offset -ink_rect[i].x - ink_rect[i].width * 3/ 2;
	}

      /* VOWELS under resh or vav are right aligned, if they are
	 narrower than the characters. Otherwise they are centered.
       */
      else if ((base_char == UNI_VAV
		|| base_char == UNI_RESH
		|| base_char == UNI_YOD
		|| base_char == UNI_DALED
		)
	       && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) ||
		   gl == UNI_QUBUTS)
	       && ink_rect[i].width < base_ink_width
	       )
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* VOWELS under FINAL KAF are offset centered and offset in
	 y */
      else if ((base_char == UNI_FINAL_KAF
		)
	       && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) ||
		   gl == UNI_QUBUTS))
	{
	  /* x are at 1/3 to take into account the stem */
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 1/3 - ink_rect[i].width/2;

	  /* Center in y */
	  y_offset[i] = base_ink_y_offset - ink_rect[i].y
	    + base_ink_height * 1/2 - ink_rect[i].height/2;
	}


      /* MAPIQ in PE or FINAL PE */
      else if (gl == UNI_MAPIQ
	       && (base_char == UNI_PE || base_char == UNI_FINAL_PE))
	{
	  x_offset[i]= base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 2/3 - ink_rect[i].width/2;

	  /* Another option is to offset the MAPIQ in y...
	     glyphs->glyphs[cluster_start_idx+i].geometry.y_offset
	     -= base_ink_height/5; */
	}

      /* MAPIQ in SHIN should be moved a bit to the right */
      else if (gl == UNI_MAPIQ
	       && base_char == UNI_SHIN)
	{
	  x_offset[i]=  base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 3/5 - ink_rect[i].width/2;
	}

      /* MAPIQ in YUD is right aligned */
      else if (gl == UNI_MAPIQ
	       && base_char == UNI_YOD)
	{
	  x_offset[i]=  base_ink_x_offset - ink_rect[i].x;

	  /* Lower left in y */
	  y_offset[i] = base_ink_y_offset - ink_rect[i].y
	    + base_ink_height - ink_rect[i].height*1.75;

	  if (base_ink_height > base_ink_width * 2)
	    {
	      int j;
	      double space = 0.7;
	      double kern = 0.5;

	      /* Shift all cluster characters to make space for mapiq */
	      for (j=0; j<i; j++)
		x_offset[j] += ink_rect[i].width*(1+space-kern);

	      width[cluster_length-1] += ink_rect[i].width*(1+space-kern);
	    }

	}

      /* VOWEL DOT next to any other character */
      else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM)
	       && (base_char != UNI_VAV))
	{
	  x_offset[i] = base_ink_x_offset -ink_rect[i].x;
	}

      /* Move nikud of taf a bit ... */
      else if (base_char == UNI_TAV && gl == UNI_MAPIQ)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 5/8 - ink_rect[i].width/2;
	}

      /* Move center dot of characters with a right stem and no
	 left stem. */
      else if (gl == UNI_MAPIQ &&
	       (base_char == UNI_BET
		|| base_char == UNI_DALED
		|| base_char == UNI_KAF
		|| base_char == UNI_GIMMEL
		))
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 3/8 - ink_rect[i].width/2;
	}

      /* Right align wide nikud under QOF */
      else if (base_char == UNI_QOF &&
	       ( (gl >= UNI_HATAF_SEGOL
		  && gl <= UNI_HATAF_QAMATZ)
		 || (gl >= UNI_TSERE
		     && gl<= UNI_QAMATS)
		 || (gl == UNI_QUBUTS)))
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* Center by default */
      else
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width/2 - ink_rect[i].width/2;
	}
    }

}
============================================================
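
Most branches in the excerpt reduce to one of two alignments of the
mark's ink box against the base's ink box.  A minimal sketch, with a
hypothetical Rect type standing in for PangoRectangle and
hypothetical function names:

```c
/* Hypothetical stand-in for PangoRectangle: the ink box of a glyph.  */
typedef struct { int x, y, width, height; } Rect;

/* x offset that centers MARK on BASE (the "Center by default"
   branch of the excerpt).  */
static int
center_x_offset (Rect base, Rect mark)
{
  return base.x - mark.x + base.width / 2 - mark.width / 2;
}

/* x offset that aligns the right ink edges of MARK and BASE (used in
   the excerpt for the shin dot and for narrow vowels under vav, resh,
   yod and daled).  */
static int
right_align_x_offset (Rect base, Rect mark)
{
  return base.x + base.width - mark.x - mark.width;
}
```

For a base with x = 1, width = 12 and a mark with x = 0, width = 4,
centering gives 5 and right alignment gives 9.  Several of the other
branches only replace the 1/2 of the base width by another fraction
(1/3, 3/5, 5/8, ...).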

> The poetry site
> you mentioned, http://www.zemer.co.il/song.asp?id=393, uses David and
> is rendered correctly.
> Kate (using Pango?) also renders better with Arial and David-CLM. It has
> some other issues, but the result is mostly readable.

As Kate is a KDE application, I think it's not using Pango.
But if it renders Hebrew well with Arial, it (or the rendering
module of KDE/Qt) must have similar ad-hoc code.

---
Kenichi Handa
handa@m17n.org


[-- Attachment #2: oowriter-arial.png --]
[-- Type: image/png, Size: 79797 bytes --]
