unofficial mirror of emacs-devel@gnu.org 
* Composing Hebrew diacriticals
@ 2010-04-30 12:29 Eli Zaretskii
  2010-05-05  2:39 ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-04-30 12:29 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

AFAICS, Hebrew diacriticals are not currently composed with
consonants, even in the pre-bidi display.  Is that true?

If it's true, what will it take to construct the appropriate
composition rules for the diacriticals?

I tried to read composite.el and some of the files in lisp/language
that appear to use its facilities, but quickly got lost, probably
because I know too little about this matter.  The ELisp manual also
doesn't document these facilities, unfortunately.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Composing Hebrew diacriticals
  2010-04-30 12:29 Composing Hebrew diacriticals Eli Zaretskii
@ 2010-05-05  2:39 ` Kenichi Handa
  2010-05-05 15:49   ` David Kastrup
                     ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-05  2:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83mxwlw2c0.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> AFAICS, Hebrew diacriticals are not currently composed with
> consonants, even in the pre-bidi display.  Is that true?

Yes.

> If it's true, what will it take to construct the appropriate
> composition rules for the diacriticals?

This will be ok.

(set-char-table-range 
 composition-function-table '(#x591 . #x5F4)
 '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))

On Windows, I expect Uniscribe already has the capability to
handle that.

On GNU/Linux, unfortunately, the m17n-db packages of various
distributions are not yet updated.  So you have to install a
few database files manually.  I'll prepare those files soon
so that GNU/Linux users can test Hebrew composition easily.
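As a small illustration of what the expression above stores (a sketch for inspecting the table, not part of the suggestion itself): set-char-table-range gives every character from U+0591 to U+05F4 the same rule list, and each rule is a vector [PATTERN PREV-CHARS FUNCTION], where a PREV-CHARS of 0 means PATTERN is matched starting at the character itself.

```elisp
;; Install the broad rule suggested above for U+0591..U+05F4.
(set-char-table-range composition-function-table '(#x591 . #x5F4)
                      '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))

;; Every character in the range now maps to the same list of rules.
;; Take the first rule stored for HEBREW LETTER BET (U+05D1) and
;; return its PREV-CHARS and FUNCTION slots.
(let ((rule (car (aref composition-function-table #x5D1))))
  (list (aref rule 1) (aref rule 2)))
;; => (0 font-shape-gstring)
```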

> I tried to read composite.el and some of the files in lisp/language
> that appear to use its facilities, but quickly got lost, probably
> because I know too little about this matter.  The ELisp manual also
> doesn't document these facilities, unfortunately.

Sorry for my laziness.  For composition, there are many
stakeholders, and I'm still wondering to what degree to
document it.  For instance, you have to understand what an
OpenType font can do and what it can't do.  The former part
can be delegated to font-shape-gstring, but the latter part
must be implemented in Lisp.  The characteristics of the
terminal (graphical or character) must also be taken into
account.  I'll provide a proper internal document (i.e. for
Emacs developers) as soon as possible.

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-05  2:39 ` Kenichi Handa
@ 2010-05-05 15:49   ` David Kastrup
  2010-05-05 20:51     ` Eli Zaretskii
  2010-05-06  0:45     ` Kenichi Handa
  2010-05-05 18:01   ` Eli Zaretskii
  2010-05-06 14:59   ` Yair F.
  2 siblings, 2 replies; 88+ messages in thread
From: David Kastrup @ 2010-05-05 15:49 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <83mxwlw2c0.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
>> AFAICS, Hebrew diacriticals are not currently composed with
>> consonants, even in the pre-bidi display.  Is that true?
>
> Yes.
>
>> If it's true, what will it take to construct the appropriate
>> composition rules for the diacriticals?
>

Uh, with the current version of Emacs

GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola

I tried some copy&paste of Yiddish text including diacriticals into
Emacs, with the result that without bidi-display-reordering set, the
diacriticals were not composed with the letters, and with
bidi-display-reordering set, they were composed just fine.

So I am not sure whether you currently are discussing how to best fix
something that is not actually broken.

-- 
David Kastrup






* Re: Composing Hebrew diacriticals
  2010-05-05  2:39 ` Kenichi Handa
  2010-05-05 15:49   ` David Kastrup
@ 2010-05-05 18:01   ` Eli Zaretskii
  2010-05-07 11:15     ` Kenichi Handa
  2010-05-06 14:59   ` Yair F.
  2 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-05 18:01 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Wed, 05 May 2010 11:39:28 +0900
> 
> > If it's true, what will it take to construct the appropriate
> > composition rules for the diacriticals?
> 
> This will be ok.
> 
> (set-char-table-range 
>  composition-function-table '(#x591 . #x5F4)
>  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> 
> On Windows, I expect Uniscribe already has the capability to
> handle that.

Thanks, but when I evaluate this expression and insert a consonant
followed by a diacritical, Emacs infloops.  Do you think that this is
something specific to uniscribe, or to how Emacs uses uniscribe?





* Re: Composing Hebrew diacriticals
  2010-05-05 15:49   ` David Kastrup
@ 2010-05-05 20:51     ` Eli Zaretskii
  2010-05-06  7:20       ` David Kastrup
  2010-05-06  0:45     ` Kenichi Handa
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-05 20:51 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Wed, 05 May 2010 17:49:35 +0200
> 
> Uh, with the current version of Emacs
> 
> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
> 
> I tried some copy&paste of Yiddish text including diacriticals into
> Emacs, with the result that without bidi-display-reordering set, the
> diacriticals were not composed with the letters, and with
> bidi-display-reordering set, they were composed just fine.

Please show a recipe to reproduce this.  Composition with reordering
is something that hasn't been coded yet (I think that's what Handa-san
works on now), so I'm not sure how this could work.  In any case,
compositions without reordering ought to work.





* Re: Composing Hebrew diacriticals
  2010-05-05 15:49   ` David Kastrup
  2010-05-05 20:51     ` Eli Zaretskii
@ 2010-05-06  0:45     ` Kenichi Handa
  2010-05-06 12:14       ` David Kastrup
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-06  0:45 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

In article <87eihqqrg0.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:

> Uh, with the current version of Emacs

> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola

> I tried some copy&paste of Yiddish text including diacriticals into
> Emacs, with the result that without bidi-display-reordering set, the
> diacriticals were not composed with the letters, and with
> bidi-display-reordering set, they were composed just fine.

Does cursor movement treat a character and the combined
diacritical mark as one unit?

What is shown when you put a cursor on a character that is
composed with a diacritical mark and type C-u C-x =?

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-05 20:51     ` Eli Zaretskii
@ 2010-05-06  7:20       ` David Kastrup
  0 siblings, 0 replies; 88+ messages in thread
From: David Kastrup @ 2010-05-06  7:20 UTC (permalink / raw)
  To: emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Wed, 05 May 2010 17:49:35 +0200
>> 
>> Uh, with the current version of Emacs
>> 
>> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>> 
>> I tried some copy&paste of Yiddish text including diacriticals into
>> Emacs, with the result that without bidi-display-reordering set, the
>> diacriticals were not composed with the letters, and with
>> bidi-display-reordering set, they were composed just fine.
>
> Please show a recipe to reproduce this.  Composition with reordering
> is something that hasn't been coded yet (I think that's what Handa-san
> works on now), so I'm not sure how this could work.  In any case,
> compositions without reordering ought to work.

Copy and paste from <URL:http://en.wikipedia.org/wiki/Yiddish_language>
into a buffer.

Pasting

    the language is called לשון־אַשכּנז  (loshn-ashkenaz

with bidi-display-reordering unset, I get


[Screenshot attachment: Screenshot.png]


and with it being set, I get


[Screenshot attachment: Screenshot-1.png]



As you can see, the aleph is composed with the "a" vowel underneath, and
the kaph gets a dagesh.

So there clearly happens some composition.  Whether GTK+ is involved, I
have no idea.

-- 
David Kastrup


* Re: Composing Hebrew diacriticals
  2010-05-06  0:45     ` Kenichi Handa
@ 2010-05-06 12:14       ` David Kastrup
  2010-05-06 13:01         ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: David Kastrup @ 2010-05-06 12:14 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <87eihqqrg0.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
>
>> Uh, with the current version of Emacs
>
>> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>
>> I tried some copy&paste of Yiddish text including diacriticals into
>> Emacs, with the result that without bidi-display-reordering set, the
>> diacriticals were not composed with the letters, and with
>> bidi-display-reordering set, they were composed just fine.
>
> Does cursor movement treat a character and the combined
> diacritical mark as one unit?

No.

> What is shown when you put a cursor on a character that is
> composed with a diacritical mark and type C-u C-x =?

Depends on where I put the cursor.

-- 
David Kastrup





* Re: Composing Hebrew diacriticals
  2010-05-06 12:14       ` David Kastrup
@ 2010-05-06 13:01         ` Kenichi Handa
  0 siblings, 0 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-06 13:01 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

In article <876331p6r3.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:

>>> I tried some copy&paste of Yiddish text including diacriticals into
>>> Emacs, with the result that without bidi-display-reordering set, the
>>> diacriticals were not composed with the letters, and with
>>> bidi-display-reordering set, they were composed just fine.
> >
> > Does cursor movement treat a character and the combined
> > diacritical mark as one unit?

> No.

Then it seems that your Yiddish font just has special left
and right bearings for diacriticals that are suitable for the
case where they are drawn before (i.e. to the left of) a base
character.

> > What is shown when you put a cursor on a character that is
> > composed with a diacritical mark and type C-u C-x =?

> Depends on where I put the cursor.

Of course.  I just wanted to see an example to check whether
they are really composed or not.

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-05  2:39 ` Kenichi Handa
  2010-05-05 15:49   ` David Kastrup
  2010-05-05 18:01   ` Eli Zaretskii
@ 2010-05-06 14:59   ` Yair F.
  2010-05-06 17:41     ` Eli Zaretskii
  2010-05-07  0:48     ` Kenichi Handa
  2 siblings, 2 replies; 88+ messages in thread
From: Yair F. @ 2010-05-06 14:59 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa <handa <at> m17n.org> writes:


> This will be ok.
> 
> (set-char-table-range 
>  composition-function-table '(#x591 . #x5F4)
>  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> 

There are some restrictions on which characters are allowed to be composed.

I've started working on composition, and will update when it's ready.



* Re: Composing Hebrew diacriticals
  2010-05-06 14:59   ` Yair F.
@ 2010-05-06 17:41     ` Eli Zaretskii
  2010-05-07  0:48     ` Kenichi Handa
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-06 17:41 UTC (permalink / raw)
  To: Yair F.; +Cc: emacs-devel

> From: Yair F. <yair.f.lists@gmail.com >
> Date: Thu, 6 May 2010 14:59:23 +0000 (UTC)
> 
> I've started working on composition, and will update when it's ready.

Thanks.





* Re: Composing Hebrew diacriticals
  2010-05-06 14:59   ` Yair F.
  2010-05-06 17:41     ` Eli Zaretskii
@ 2010-05-07  0:48     ` Kenichi Handa
  2010-05-07  4:41       ` Yair F
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-07  0:48 UTC (permalink / raw)
  To: Yair F.; +Cc: emacs-devel

In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:

> Kenichi Handa <handa <at> m17n.org> writes:
> > This will be ok.
> > 
> > (set-char-table-range 
> >  composition-function-table '(#x591 . #x5F4)
> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> > 

> There are some restrictions on which characters are allowed to be composed.

Yes.  If font_driver->shape is implemented correctly,
font-shape-gstring should correctly compose a given sequence
even if it contains multiple grapheme clusters.

But I admit the above regexp is not optimal.  For
efficiency, the regexp should match only a character
sequence that is composed into one grapheme cluster.
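A rough illustration of the efficiency point, under stated assumptions: the sample word and the narrower pattern are chosen for this sketch, and my-match-length-at-start is a hypothetical helper.  The broad regexp matches the whole pointed word, so presumably the whole word would be handed over in one piece, while a pattern limited to one letter plus its marks stops at the first cluster boundary.

```elisp
(defun my-match-length-at-start (regexp string)
  "Return the length of REGEXP's match at the start of STRING, or nil."
  (and (eq (string-match regexp string) 0)
       (match-end 0)))

;; The broad pattern from the original suggestion.
(defconst my-broad-hebrew-pattern "[\u0591-\u05F4]+")

;; Assumed narrower pattern: one letter followed by combining marks.
(defconst my-cluster-hebrew-pattern
  "[\u05D0-\u05EA][\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]*")

;; KAF DAGESH PATAH, GIMEL DAGESH SEGOL, PE SEGOL, FINAL NUN (9 chars).
(defconst my-sample-word
  "\u05DB\u05BC\u05B7\u05D2\u05BC\u05B6\u05E4\u05B6\u05DF")

(list (my-match-length-at-start my-broad-hebrew-pattern my-sample-word)
      (my-match-length-at-start my-cluster-hebrew-pattern my-sample-word))
;; => (9 3)
```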

---
Kenichi Handa
handa@m17n.org


* Re: Composing Hebrew diacriticals
  2010-05-07  0:48     ` Kenichi Handa
@ 2010-05-07  4:41       ` Yair F
  2010-05-07  6:23         ` Kenichi Handa
  2010-05-07  9:28         ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yair F @ 2010-05-07  4:41 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Fri, May 7, 2010 at 3:48 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:
>
>> Kenichi Handa <handa <at> m17n.org> writes:
>> > This will be ok.
>> >
>> > (set-char-table-range
>> >  composition-function-table '(#x591 . #x5F4)
>> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
>> >
>
>> There are some restrictions on which characters are allowed to be composed.
>

Something like:

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

and then hebrew-composable-pattern needs to be defined.

> Yes.  If font_driver->shape is implemented correctly,
> font-shape-gstring should correctly compose a given sequence
> even if it contains multiple grapheme clusters.
>
> But I admit the above regexp is not optimal.  For
> efficiency, the regexp should match only a character
> sequence that is composed into one grapheme cluster.





* Re: Composing Hebrew diacriticals
  2010-05-07  4:41       ` Yair F
@ 2010-05-07  6:23         ` Kenichi Handa
  2010-05-07 10:00           ` Yair F
  2010-05-07  9:28         ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-07  6:23 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <x2hba5bff411005062141rbcadbcd5va8b1ead65f40aef8@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Something like:

> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

> and then hebrew-composable-pattern needs to be defined.

If only diacritical marks are composed, and they can be
placed on any base character, it is more efficient to set
that kind of rule list only for the Hebrew diacriticals.
So the code will be something like this:

(let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
      (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
  (dolist (elt hebrew-diacriticals-list)
    (set-char-table-range composition-function-table elt
                          (list (vector regexp 1 'font-shape-gstring)))))

Here "1" means: move back one character (to the base
character) before trying to match REGEXP.
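A concrete sketch of the scheme above, under assumptions that are not from this thread: the ranges are the combining points and accents of the Unicode Hebrew block (maqaf, paseq, sof pasuq and nun hafukha are punctuation and are left out), the base letters are U+05D0..U+05EA, and all my- names are hypothetical.  my-rule-matches-p only mimics the PREV-CHARS lookback to show what the 1 does; it is not the display engine's actual code.

```elisp
;; Assumed ranges: combining points and accents in the Hebrew block.
(defconst my-hebrew-diacriticals-list
  '((#x0591 . #x05BD) (#x05BF . #x05BF) (#x05C1 . #x05C2)
    (#x05C4 . #x05C5) (#x05C7 . #x05C7)))

;; Assumed pattern: one base letter followed by one or more such marks.
(defconst my-hebrew-mark-regexp
  "[\u05D0-\u05EA][\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]+")

;; Attach the rule only to the diacriticals.  PREV-CHARS = 1 means the
;; match is tried starting one character before the mark.
(dolist (elt my-hebrew-diacriticals-list)
  (set-char-table-range composition-function-table elt
                        (list (vector my-hebrew-mark-regexp 1
                                      'font-shape-gstring))))

(defun my-rule-matches-p (string index)
  "Return t if the first rule for the mark at INDEX in STRING matches.
Matching starts PREV-CHARS characters before INDEX; this is only an
illustration of the rule's meaning, not Emacs's own code."
  (let* ((rule (car (aref composition-function-table (aref string index))))
         (start (and rule (- index (aref rule 1)))))
    (and start
         (>= start 0)
         (eq (string-match (aref rule 0) string start) start))))
```

For instance, (my-rule-matches-p "\u05D1\u05BC" 1) gives t for BET followed by DAGESH, while a DAGESH at the very start of a string has nothing to look back to and gives nil.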

>> There are some restrictions on which characters are allowed to be composed.

If the restrictions are more rigid, the regexp should vary
for each diacritical mark.

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-07  4:41       ` Yair F
  2010-05-07  6:23         ` Kenichi Handa
@ 2010-05-07  9:28         ` Eli Zaretskii
  2010-05-10 14:09           ` Yair F
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-07  9:28 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Fri, 7 May 2010 07:41:54 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: emacs-devel@gnu.org
> 
> On Fri, May 7, 2010 at 3:48 AM, Kenichi Handa <handa@m17n.org> wrote:
> > In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:
> >
> >> Kenichi Handa <handa <at> m17n.org> writes:
> >> > This will be ok.
> >> >
> >> > (set-char-table-range
> >> >  composition-function-table '(#x591 . #x5F4)
> >> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> >> >
> >
> >> There are some restrictions on which characters are allowed to be composed.
> >
> 
> Something like:
> 
> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
> 
> and then hebrew-composable-pattern needs to be defined.

Yair, does the above work for you?  For me (on MS-Windows), it
infloops.  I will try to find out where it loops later.






* Re: Composing Hebrew diacriticals
  2010-05-07  6:23         ` Kenichi Handa
@ 2010-05-07 10:00           ` Yair F
  2010-05-07 11:11             ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-07 10:00 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Fri, May 7, 2010 at 9:23 AM, Kenichi Handa <handa@m17n.org> wrote:

> If only diacritical marks are composed, and they can be
> placed on any base character, it is more efficient to set
> that kind of rule list only for the Hebrew diacriticals.
> So the code will be something like this:
>
> (let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
>       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
>   (dolist (elt hebrew-diacriticals-list)
>     (set-char-table-range composition-function-table elt
>                           (list (vector regexp 1 'font-shape-gstring)))))
>
> Here "1" means: move back one character (to the base
> character) before trying to match REGEXP.
>
>>> There are some restrictions on which characters are allowed to be composed.
>
> If the restrictions are more rigid, the regexp should vary
> for each diacritical mark.

This is the composition regexp (I added whitespace and comments for readability):

\\(
[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3] ;; base
  [\u05BC\u05BF]?                  ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05B9\u05BB\u05C7]?     ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
\\|
\u05D5                             ;; base
  \u05BC?                          ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05BB\u05C7]?           ;; 0-1 marks of extended 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
\\|
\u05E9                             ;; base
  \u05BC?                          ;; 0-1 marks of 1st class (dagesh)
  [\u05C1\u05C2]?                  ;; 0-1 marks of 2nd class (shin dot)
  [\u05B0-\u05B9\u05BB\u05C7]?     ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
\\)

What would be the best way in this case?
In the most extreme case there are 6 marks attached to a base character.





* Re: Composing Hebrew diacriticals
  2010-05-07 10:00           ` Yair F
@ 2010-05-07 11:11             ` Kenichi Handa
  0 siblings, 0 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-07 11:11 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <p2hba5bff411005070300lbc01fb06k9a753cf629b1b4c0@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> This is the composition regexp (I added whitespace and comments for readability):

> \\(
> [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3] ;; base
>   [\u05BC\u05BF]?                  ;; 0-1 marks of 1st class (dagesh)
>   [\u05B0-\u05B9\u05BB\u05C7]?     ;; 0-1 marks of 3rd class (niqud)
>   [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
> \\|
> \u05D5                             ;; base
>   \u05BC?                          ;; 0-1 marks of 1st class (dagesh)
>   [\u05B0-\u05BB\u05C7]?           ;; 0-1 marks of extended 3rd class (niqud)
>   [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
> \\|
> \u05E9                             ;; base
>   \u05BC?                          ;; 0-1 marks of 1st class (dagesh)
>   [\u05C1\u05C2]?                  ;; 0-1 marks of 2nd class (shin dot)
>   [\u05B0-\u05B9\u05BB\u05C7]?     ;; 0-1 marks of 3rd class (niqud)
>   [\u0591-\u05AF\u05BD]*           ;; 0-2 (possibly 3) marks of 4th class
> \\)

> What would be the best way in this case?
> In the most extreme case there are 6 marks attached to a base character.

Thank you for the data.  As I'm now busy fixing the
display problem of the bidi/composition interaction, I'll
reply after that work is finished.

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-05 18:01   ` Eli Zaretskii
@ 2010-05-07 11:15     ` Kenichi Handa
  2010-05-08 12:51       ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-07 11:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83r5lqjkho.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: emacs-devel@gnu.org
> > Date: Wed, 05 May 2010 11:39:28 +0900
> > 
> > > If it's true, what will it take to construct the appropriate
> > > composition rules for the diacriticals?
> > 
> > This will be ok.
> > 
> > (set-char-table-range 
> >  composition-function-table '(#x591 . #x5F4)
> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> > 
> > On Windows, I expect Uniscribe already has the capability to
> > handle that.

> Thanks, but when I evaluate this expression and insert a consonant
> followed by a diacritical, Emacs infloops.  Do you think that this is
> something specific to uniscribe, or to how Emacs uses uniscribe?

I'm not sure.  I don't see the infloop on GNU/Linux.  But
investigating it based on the current, problematic code would
be a waste of time.  I'd like to concentrate on fixing the
current code at the moment.

---
Kenichi Handa
handa@m17n.org





* Re: Composing Hebrew diacriticals
  2010-05-07 11:15     ` Kenichi Handa
@ 2010-05-08 12:51       ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-08 12:51 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 07 May 2010 20:15:03 +0900
> 
> > > (set-char-table-range 
> > >  composition-function-table '(#x591 . #x5F4)
> > >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> > > 
> > > On Windows, I expect Uniscribe already has the capability to
> > > handle that.
> 
> > Thanks, but when I evaluate this expression and insert a consonant
> > followed by a diacritical, Emacs infloops.  Do you think that this is
> > something specific to uniscribe, or to how Emacs uses uniscribe?
> 
> I'm not sure.  I don't see the infloop on GNU/Linux.  But
> investigating it based on the current, problematic code would
> be a waste of time.

I'm not sure what problematic code you have in mind.  I see the
infinite loop without turning bidi-display-reordering on, and also in
Emacs 23.2.

In the debugger, I see that it loops inside move_it_in_display_line_to,
because set_iterator_to_next does not advance to the next character.
This happens because autocmp_chars and composition_update_it produce a
cmp_it structure with these contents:

    (gdb) p *cmp_it
    $62 = {
      stop_pos = 308,
      id = 4,
      ch = 1489,
      lookback = 0,
      nglyphs = 1,
      nchars = 0,
      nbytes = 0,
      from = 0,
      to = 1,
      width = 0
    }

Since nchars and nbytes are zero, the iterator does not move.

The character whose codepoint is 1489 is u+05d1, HEBREW LETTER BET.  I
typed that character, followed by a u+05bc, HEBREW POINT DAGESH, after
evaluating the expression you suggested (see above).  Emacs loops
after I type the DAGESH.

This code is almost 100% platform independent, except where
font-shape-gstring calls Uniscribe.  Since you don't see the loop,
what could be the reason for that, if it's not something specific to
Uniscribe?  Did I do something wrong?

Thanks.





* Re: Composing Hebrew diacriticals
  2010-05-07  9:28         ` Eli Zaretskii
@ 2010-05-10 14:09           ` Yair F
  2010-05-11  0:25             ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-10 14:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

> Yair, does the above work for you?  For me (on MS-Windows), it
> infloops.  I will try to find out where it loops later.
>
The code below sort of works for me.
However, composition only occurs sometimes.  I believe it depends on
the font being used, including the font size.
How can I debug it further?

(Kubuntu/GTK/XFT)

(defconst hebrew-composable-pattern
  (concat
   "\\("
   "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
                                      ;; base (except vav and shin)
   "[\u05BC\u05BF]?"                  ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"     ;; 0-1 marks of 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\|"
   "\u05D5"                           ;; base (vav)
   "\u05BC?"                          ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?"           ;; 0-1 marks of extended 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\|"
   "\u05E9"                           ;; base (shin)
   "\u05BC?"                          ;; 0-1 marks of 1st class (dagesh)
   "[\u05C1\u05C2]?"                  ;; 0-1 marks of 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"     ;; 0-1 marks of 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\)")
  "Regexp matching a composable sequence of Hebrew characters.")

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
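When composition appears only sometimes, one way to separate the regexp from the font is to list the runs a pattern would match.  The helper below, my-hebrew-clusters, is hypothetical (not an Emacs function): it splits a string into successive matches of REGEXP, falling back to single characters where nothing matches at the current position.  If a run splits where it should not, the pattern is at fault; if the runs look right but the rendering is still wrong, the font or the shaping engine is the more likely suspect.

```elisp
(defun my-hebrew-clusters (regexp string)
  "Split STRING into the runs that REGEXP matches, left to right.
A character where REGEXP does not match is returned on its own."
  (let ((pos 0)
        (runs nil))
    (while (< pos (length string))
      (if (and (eq (string-match regexp string pos) pos)
               (> (match-end 0) pos))
          ;; REGEXP matches a non-empty run starting exactly at POS.
          (progn
            (push (substring string pos (match-end 0)) runs)
            (setq pos (match-end 0)))
        ;; No match here: keep the character and move on.
        (push (substring string pos (1+ pos)) runs)
        (setq pos (1+ pos))))
    (nreverse runs)))
```

For example, with the hebrew-composable-pattern above evaluated, (my-hebrew-clusters hebrew-composable-pattern "\u05E9\u05C1\u05B8\u05DC") should return two elements: the shin with its shin dot and qamats, then the lamed.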




* Re: Composing Hebrew diacriticals
  2010-05-10 14:09           ` Yair F
@ 2010-05-11  0:25             ` Kenichi Handa
  2010-05-11 12:20               ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-11  0:25 UTC (permalink / raw)
  To: Yair F; +Cc: eliz, emacs-devel

In article <AANLkTikXVDpVwaB6xGa_ktqw0fsQ7nXdvE9SZ_r0m6mM@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> > Yair, does the above work for you?  For me (on MS-Windows), it
> > infloops.  I will try to find out where it loops later.
> >
> The code below sort of works for me.
> However, composition only occurs sometimes.  I believe it depends on
> the font being used, including the font size.
> How can I debug it further?

I'm now working on fixing the problem of bidi-composition
interaction.  Please wait for a while.

---
Kenichi Handa
handa@m17n.org

> (Kubuntu/GTK/XFT)

> (defconst hebrew-composable-pattern
>   (concat
>    "\\("
>    "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
>                                       ;; base (except vav and shin)
>    "[\u05BC\u05BF]?"                  ;; 0-1 marks of 1st class (dagesh)
>    "[\u05B0-\u05B9\u05BB\u05C7]?"     ;; 0-1 marks of 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\|"
>    "\u05D5"                           ;; base (vav)
>    "\u05BC?"                          ;; 0-1 marks of 1st class (dagesh)
>    "[\u05B0-\u05BB\u05C7]?"           ;; 0-1 marks of extended 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\|"
>    "\u05E9"                           ;; base (shin)
>    "\u05BC?"                          ;; 0-1 marks of 1st class (dagesh)
>    "[\u05C1\u05C2]?"                  ;; 0-1 marks of 2nd class (shin dot)
>    "[\u05B0-\u05B9\u05BB\u05C7]?"     ;; 0-1 marks of 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"           ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\)")
>   "Regexp matching a composable sequence of Hebrew characters.")

> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))






* Re: Composing Hebrew diacriticals
  2010-05-11  0:25             ` Kenichi Handa
@ 2010-05-11 12:20               ` Kenichi Handa
  2010-05-11 16:22                 ` Eli Zaretskii
  2010-05-11 21:40                 ` Yair F
  0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-11 12:20 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, yair.f.lists, emacs-devel


In article <tl7d3x3cmja.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> I'm now working on fixing the problem of bidi-composition
> interaction.  Please wait for a while.

It seems that my code has started to work.  The attachment
is an image of Emacs displaying the following line:
A "אֲעוֹלֵל 123 כַּגֶּפֶן" B.
I took the two Hebrew words from
http://www.zemer.co.il/song.asp?id=626&artist=157, and
inserted "123" between them just to test nesting.  The Hebrew
characters are displayed with "dejavu sans mono", which has an
OpenType table for Hebrew.

Could you please check the rendering (position of
diacriticals, etc.)?


[Image attachment: Hebrew sample]


---
Kenichi Handa
handa@m17n.org


* Re: Composing Hebrew diacriticals
  2010-05-11 12:20               ` Kenichi Handa
@ 2010-05-11 16:22                 ` Eli Zaretskii
  2010-05-12  8:04                   ` Kenichi Handa
  2010-05-11 21:40                 ` Yair F
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-11 16:22 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: yair.f.lists@gmail.com, eliz@gnu.org, emacs-devel@gnu.org
> Date: Tue, 11 May 2010 21:20:57 +0900
> 
> Could you please check the rendering (position of
> diacriticals, etc.)?

Looks okay to me.  Thanks!




* Re: Composing Hebrew diacriticals
  2010-05-11 12:20               ` Kenichi Handa
  2010-05-11 16:22                 ` Eli Zaretskii
@ 2010-05-11 21:40                 ` Yair F
  2010-05-12  3:15                   ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-11 21:40 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3011 bytes --]

Is there a way to get what-cursor-position to show composition?

On Tue, May 11, 2010 at 3:20 PM, Kenichi Handa <handa@m17n.org> wrote:
> Could you please check the rendering (position of
> diacriticals, etc.)?

I see some problems, some of which are not present in your image; see my attached example:
In the phrase:
שָׁלוֹם לְמִשְׁתַּמְּשֵׁי אִמַאקְס
The Holam point above Vav is not centred above the letter.
The Dagesh points in the Tav and Mem letters are not centred, but are
too close to the left side of the letter.

In the phrase:
A "אֲעוֹלֵל 123 כַּגֶּפֶן" B.
The point Hataf-Patah under the letter Alef is not centred, but is
placed slightly to the left of where it should be. This is also visible
in your sample.

The last word:
עַשֶּׁשֶׁת
The Dagesh point on the first (right) Shin is not visible at all.
(e.g. http://he.wiktionary.org/wiki/%D7%A2%D7%A9%D7%A9%D7%AA)

The font used is xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1
Font version 2.29

When these samples are rendered correctly, you might want to try more
complex ones, such as http://www.mechon-mamre.org/c/ct/c2831.htm; V12
has a challenging example.

I now use a refined composition regexp:
(defconst hebrew-composable-pattern
  (concat
   "\\("
   "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]"
                                    ;; base (allows rafe)
   "[\u05BC\u05BF]?"                ;; 0-1 marks of 1st class (dagesh/rafe)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05D5"                         ;; base (vav)
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?"         ;; 0-1 marks of extended 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05E9"                         ;; base (shin)
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05C1\u05C2]?"                ;; 0-1 marks of 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05F1-\u05F3]"                ;; base (yidish ligatures)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\)")
  "Regexp matching a composable sequence of Hebrew characters.")
(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
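One way to see what a branch of this pattern accepts, independently of fonts and of the display engine, is to match it against a string in Lisp. This is only a sketch: `my-hebrew-shin-pattern` and `my-hebrew-cluster-length` are hypothetical names, not part of Emacs, and to keep the block self-contained it copies just the shin branch of the regexp above. A successful match here says nothing about whether the font then positions the marks correctly.

```elisp
;; Hypothetical copy of only the shin branch of the regexp above:
;; shin, optional dagesh, optional shin/sin dot, optional vowel point,
;; then any number of cantillation marks.
(defconst my-hebrew-shin-pattern
  (concat "\u05E9"
          "\u05BC?"
          "[\u05C1\u05C2]?"
          "[\u05B0-\u05B9\u05BB\u05C7]?"
          "[\u0591-\u05AF\u05BD]*"))

(defun my-hebrew-cluster-length (pattern string)
  "Return how many chars PATTERN matches at the start of STRING, or nil."
  (when (string-match (concat "\\`\\(?:" pattern "\\)") string)
    (match-end 0)))

;; Shin + shin dot + qamats: all three characters are matched (3).
(my-hebrew-cluster-length my-hebrew-shin-pattern "\u05E9\u05C1\u05B8")
;; Shin + qamats + shin dot: this branch expects the dot before the
;; vowel, so only shin + qamats are matched (2).
(my-hebrew-cluster-length my-hebrew-shin-pattern "\u05E9\u05B8\u05C1")
```

The second call shows that, as the branch is written, the order in which the marks are stored in the buffer matters.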

[-- Attachment #2: c.png --]
[-- Type: image/png, Size: 19876 bytes --]


* Re: Composing Hebrew diacriticals
  2010-05-11 21:40                 ` Yair F
@ 2010-05-12  3:15                   ` Eli Zaretskii
  2010-05-12 15:11                     ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-12  3:15 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Wed, 12 May 2010 00:40:59 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: eliz@gnu.org, emacs-devel@gnu.org
> 
> Is there a way to get what-cursor-position to show composition?

Try "C-u C-x =".

> I see some problems, some of which are not present in your image; see my attached example:

How did you get the images you posted?  I don't think Handa-san posted
his changes to Emacs yet, at least I didn't see them.  Did I miss
something?




* Re: Composing Hebrew diacriticals
  2010-05-11 16:22                 ` Eli Zaretskii
@ 2010-05-12  8:04                   ` Kenichi Handa
  2010-05-12 17:35                     ` Eli Zaretskii
  2010-05-19 17:23                     ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-12  8:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel

In article <83pr12pfw6.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > Could you please check the rendering (position of
> > diacriticals, etc.)?

> Looks okay to me.  Thanks!

Thank you for the check.  Now at least rendering with bidi
and composition is working.  But, I found that cursor
positioning sometimes failed even for a text that doesn't
involve composition.  For instance, when I have this text in
a buffer with bidi-display-reordering set to t,
----------
A א B.
בג
----------
and point is at (i.e. before) "A", typing C-e moves point to
"ג".

This happens with the vanilla trunk code.

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-12  3:15                   ` Eli Zaretskii
@ 2010-05-12 15:11                     ` Yair F
  2010-05-12 17:43                       ` Eli Zaretskii
  2010-05-13  0:29                       ` Kenichi Handa
  0 siblings, 2 replies; 88+ messages in thread
From: Yair F @ 2010-05-12 15:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

On Wed, May 12, 2010 at 6:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Wed, 12 May 2010 00:40:59 +0300
>> From: Yair F <yair.f.lists@gmail.com>
>> Cc: eliz@gnu.org, emacs-devel@gnu.org
>>
>> Is there a way to get what-cursor-position to show composition?
>
> Try "C-u C-x =".
>
This only shows information about the character after point, whether
it is a base character or a combining mark. It does not tell whether a
composition occurred.

>> I see some problems, some of which are not present in your image; see my attached example:
>
> How did you get the images you posted?  I don't think Handa-san posted
> his changes to Emacs yet, at least I didn't see them.  Did I miss
> something?
>
Sorry, my mistake. I thought that Handa-san had already committed his work.




* Re: Composing Hebrew diacriticals
  2010-05-12  8:04                   ` Kenichi Handa
@ 2010-05-12 17:35                     ` Eli Zaretskii
  2010-05-12 19:05                       ` Juanma Barranquero
  2010-05-13  0:42                       ` Kenichi Handa
  2010-05-19 17:23                     ` Eli Zaretskii
  1 sibling, 2 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-12 17:35 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Wed, 12 May 2010 17:04:09 +0900
> 
> I found that cursor positioning sometimes failed even for a text
> that doesn't involve composition.  For instance, when I have this
> text in a buffer with bidi-display-reordering set to t,
> ----------
> A א B.
> בג
> ----------
> and point is at (i.e. before) "A", typing C-e moves point to
> "ג".
> 
> This happens with the vanilla trunk code.

Yes, cursor motion is a constant source of trouble with bidi
reordering.  Too many places assume strictly unidirectional display
and rely on the fact that the first (leftmost) character of line N
follows the last character of line N-1 in buffer's logical order.  It
takes time to learn all of the hidden assumptions and fix them.  C-a
and C-e are even trickier than C-f and C-b, because they move by
buffer positions assuming that screen positions correspond to those
1:1.

I already refactored the related functions twice, and recently came to
a conclusion that I need to refactor them yet again, sigh...

Thanks for the test case, I will add it to my other tests.





* Re: Composing Hebrew diacriticals
  2010-05-12 15:11                     ` Yair F
@ 2010-05-12 17:43                       ` Eli Zaretskii
  2010-05-12 22:01                         ` Yair F
  2010-05-13  0:29                       ` Kenichi Handa
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-12 17:43 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Wed, 12 May 2010 18:11:16 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: handa@m17n.org, emacs-devel@gnu.org
> 
> On Wed, May 12, 2010 at 6:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >> Date: Wed, 12 May 2010 00:40:59 +0300
> >> From: Yair F <yair.f.lists@gmail.com>
> >> Cc: eliz@gnu.org, emacs-devel@gnu.org
> >>
> >> Is there a way to get what-cursor-position to show composition?
> >
> > Try "C-u C-x =".
> >
> This only shows information about the character after point, whether
> it is a base character or a combining mark. It does not tell whether a
> composition occurred.

Sorry, I don't understand what you mean.  If "C-u C-x =" shows you
something like this:

	    display: composed to form "ດີ" (see below)

  Composed with the following character(s) "ີ" using this font:
    uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
  by these glyphs:
    [0 1 3732 1091 8 1 7 12 3 nil]
    [0 1 3732 1117 0 0 8 12 3 [-8 1 0]]

then you know that character at point was composed and that the
composition did happen.  What would you like to see instead?





* Re: Composing Hebrew diacriticals
  2010-05-12 17:35                     ` Eli Zaretskii
@ 2010-05-12 19:05                       ` Juanma Barranquero
  2010-05-13  3:06                         ` Eli Zaretskii
  2010-05-13  0:42                       ` Kenichi Handa
  1 sibling, 1 reply; 88+ messages in thread
From: Juanma Barranquero @ 2010-05-12 19:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel, Kenichi Handa

On Wed, May 12, 2010 at 19:35, Eli Zaretskii <eliz@gnu.org> wrote:

> Yes, cursor motion is a constant source of trouble with bidi
> reordering.

It's weird that it also happens in text with no bidi reordering
whatsoever, just activating line-by-line-scrolling and moving over a
control char (bug#6036).

    Juanma




* Re: Composing Hebrew diacriticals
  2010-05-12 17:43                       ` Eli Zaretskii
@ 2010-05-12 22:01                         ` Yair F
  2010-05-13 17:14                           ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-12 22:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

On Wed, May 12, 2010 at 8:43 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> Sorry, I don't understand what you mean.  If "C-u C-x =" shows you
> something like this:
>
>            display: composed to form "ດີ" (see below)
>
>  Composed with the following character(s) "ີ" using this font:
>    uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
>  by these glyphs:
>    [0 1 3732 1091 8 1 7 12 3 nil]
>    [0 1 3732 1117 0 0 8 12 3 [-8 1 0]]
>
> then you know that character at point was composed and that the
> composition did happen.  What would you like to see instead?
>

For Hebrew, the display is a bit different (no composition info):
          display: by this font (glyph code)
    xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x53D)

Character code properties: customize what to show
  name: HEBREW LETTER SHIN
...

          display: by this font (glyph code)
    xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x51F)

Character code properties: customize what to show
  name: HEBREW POINT SHIN DOT
...

          display: by this font (glyph code)
    xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x516)

Character code properties: customize what to show
  name: HEBREW POINT QAMATS

While a Latin test gives:
          display: composed to form "Ȧ" (see below)

Composed with the following character(s) "̇" using this font:
  xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-27-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 65 36 16 1 16 20 0 nil]
  [0 1 775 652 16 7 10 20 -17 [-16 -6 0]]

Character code properties: customize what to show
  name: LATIN CAPITAL LETTER A




* Re: Composing Hebrew diacriticals
  2010-05-12 15:11                     ` Yair F
  2010-05-12 17:43                       ` Eli Zaretskii
@ 2010-05-13  0:29                       ` Kenichi Handa
  1 sibling, 0 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-13  0:29 UTC (permalink / raw)
  To: Yair F; +Cc: eliz, emacs-devel

In article <AANLkTimTBtuObr0aS7_cJ-XCCKJRW5e0rNT4HyGdAJUN@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

>>> Is there a way to get what-cursor-position to show composition?
> >
> > Try "C-u C-x =".
> >
> This only shows information about the character after point, whether
> it is a base character or a combining mark.

Then, it means that the diacritical mark is not really
composed with the previous base character but is just
displayed with a proper x-offset (lbearing).

>>> I see some problems, some of which are not present in your image; see my attached example:
> >
> > How did you get the images you posted?  I don't think Handa-san posted
> > his changes to Emacs yet, at least I didn't see them.  Did I miss
> > something?
> >
> Sorry, my mistake. I thought that Handa-san had already committed his work.

Sorry for not writing that clearly.  I need some more work
to commit the code.

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-12 17:35                     ` Eli Zaretskii
  2010-05-12 19:05                       ` Juanma Barranquero
@ 2010-05-13  0:42                       ` Kenichi Handa
  2010-05-14  8:10                         ` Kenichi Handa
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-13  0:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel

In article <83fx1xowfj.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Yes, cursor motion is a constant source of trouble with bidi
> reordering.  Too many places assume strictly unidirectional display
> and rely on the fact that the first (leftmost) character of line N
> follows the last character of line N-1 in buffer's logical order.  It
> takes time to learn all of the hidden assumptions and fix them.  C-a
> and C-e are even trickier than C-f and C-b, because they move by
> buffer positions assuming that screen positions correspond to those
> 1:1.

> I already refactored the related functions twice, and recently came to
> a conclusion that I need to refactor them yet again, sigh...

> Thanks for the test case, I will add it to my other tests.

Ok, then, as soon as I conclude that at least the displaying
looks correct, I'll commit my new code without testing it
against cursor positioning.

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-12 19:05                       ` Juanma Barranquero
@ 2010-05-13  3:06                         ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-13  3:06 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: yair.f.lists, emacs-devel, handa

> From: Juanma Barranquero <lekktu@gmail.com>
> Date: Wed, 12 May 2010 21:05:20 +0200
> Cc: Kenichi Handa <handa@m17n.org>, yair.f.lists@gmail.com, emacs-devel@gnu.org
> 
> On Wed, May 12, 2010 at 19:35, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Yes, cursor motion is a constant source of trouble with bidi
> > reordering.
> 
> It's weird that it also happens in text with no bidi reordering
> whatsoever, just activating line-by-line-scrolling and moving over a
> control char (bug#6036).

That's a different, though related, problem.  The cursor positioning
code was completely rewritten in Emacs 24, to be able to support
reordered lines, and that rewrite affects even lines that aren't
actually reordered.




* Re: Composing Hebrew diacriticals
  2010-05-12 22:01                         ` Yair F
@ 2010-05-13 17:14                           ` Eli Zaretskii
  2010-05-13 19:46                             ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-13 17:14 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Thu, 13 May 2010 01:01:38 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: handa@m17n.org, emacs-devel@gnu.org
> 
> For Hebrew, the display is a bit different (no composition info):

IIUC, this means no composition took place.  Why did you expect a
composition?  If this is in stock Emacs 24.0.50, then there are no
compositions defined for any of the Hebrew characters out of the box.
This is why we need your work.




* Re: Composing Hebrew diacriticals
  2010-05-13 17:14                           ` Eli Zaretskii
@ 2010-05-13 19:46                             ` Yair F
  2010-05-13 19:56                               ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-13 19:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

On Thu, May 13, 2010 at 8:14 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Thu, 13 May 2010 01:01:38 +0300
>> From: Yair F <yair.f.lists@gmail.com>
>> Cc: handa@m17n.org, emacs-devel@gnu.org
>>
>> For Hebrew, the display is a bit different (no composition info):
>
> IIUC, this means no composition took place.  Why did you expect a
> composition?  If this is in stock Emacs 24.0.50, then there are no
> compositions defined for any of the Hebrew characters out of the box.
> This is why we need your work.
>

Something strange happens here as these characters *are* composed
(Shin+shin dot+qamats).

One more thing: In the test case attached the Latin composition
sometimes occurs and sometimes not. I haven't been able to identify
why.

All of this applies to the current trunk built with the attached
lisp/language/hebrew.el (Kubuntu/GTK/Xft).

[-- Attachment #2: hebrew-sample2.txt --]
[-- Type: text/plain, Size: 143 bytes --]

שָׁלוֹם לְמִשְׁתַּמְּשֵׁי אִמַאקְס

A "אֲעוֹלֵל 123 כַּגֶּפֶן" B.

עַשֶּׁשֶׁת 

Ȧ

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: hebrew.el --]
[-- Type: text/x-emacs-lisp; name="hebrew.el", Size: 5304 bytes --]

;;; hebrew.el --- support for Hebrew -*- coding: iso-2022-7bit; no-byte-compile: t -*-

;; Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
;;   Free Software Foundation, Inc.
;; Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
;;   2005, 2006, 2007, 2008, 2009, 2010
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H14PRO021

;; Copyright (C) 2003
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H13PRO009

;; Keywords: multilingual, Hebrew

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; For Hebrew, the character set ISO8859-8 is supported.
;; See http://www.ecma.ch/ecma1/STAND/ECMA-121.HTM.
;; Windows-1255 is also supported.

;;; Code:

(define-coding-system 'hebrew-iso-8bit
  "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)."
  :coding-type 'charset
  :mnemonic ?8
  :charset-list '(iso-8859-8)
  :mime-charset 'iso-8859-8)

(define-coding-system-alias 'iso-8859-8 'hebrew-iso-8bit)

;; These are for Explicit and Implicit directionality information, as
;; defined in RFC 1556.  We don't yet support directional information
;; in bidi languages, so these aliases are a lie, especially as far as
;; iso-8859-8-e is concerned.  FIXME.
(define-coding-system-alias 'iso-8859-8-e 'hebrew-iso-8bit)
(define-coding-system-alias 'iso-8859-8-i 'hebrew-iso-8bit)

(set-language-info-alist
 "Hebrew" '((charset iso-8859-8)
	    (coding-priority hebrew-iso-8bit)
	    (coding-system hebrew-iso-8bit windows-1255 cp862)
	    (nonascii-translation . iso-8859-8)
	    (input-method . "hebrew")
	    (unibyte-display . hebrew-iso-8bit)
	    (sample-text . "Hebrew	שלום")
	    (documentation . "Right-to-left writing is not yet supported.")))

(set-language-info-alist
 "Windows-1255" '((coding-priority windows-1255)
		  (coding-system windows-1255)
		  (documentation . "\
Support for Windows-1255 encoding, e.g. for Yiddish.
Right-to-left writing is not yet supported.")))

(define-coding-system 'windows-1255
  "windows-1255 (Hebrew) encoding (MIME: WINDOWS-1255)"
  :coding-type 'charset
  :mnemonic ?h
  :charset-list '(windows-1255)
  :mime-charset 'windows-1255)
(define-coding-system-alias 'cp1255 'windows-1255)

(define-coding-system 'cp862
  "DOS codepage 862 (Hebrew)"
  :coding-type 'charset
  :mnemonic ?D
  :charset-list '(cp862)
  :mime-charset 'cp862)
(define-coding-system-alias 'ibm862 'cp862)

;; For automatic composition.
(defconst hebrew-composable-pattern
  (concat
   "\\("
   "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]" 
                                    ;; base (allows rafe)
   "[\u05BC\u05BF]?"                ;; 0-1 marks of 1st class (dagesh/rafe)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05D5"                         ;; base (vav)
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?"         ;; 0-1 marks of extended 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05E9"                         ;; base (shin)
   "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
   "[\u05C1\u05C2]?"                ;; 0-1 marks of 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05F1-\u05F3]"                ;; base (yidish ligatures)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
   "\\)")
  "Regexp matching a composable sequence of Hebrew characters.")

;; Handa-san suggested this; it still needs to be understood.
;; (let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
;;       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
;;   (dolist (elt hebrew-diacriticals-list)
;;     (set-char-table-range composition-function-table elt
;;       (list (vector regexp 1 'font-shape-gstring)))))

(set-char-table-range 
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

(provide 'hebrew)

;; arch-tag: 3ca04f32-3f1e-498e-af46-8267498ba5d9
;;; hebrew.el ends here


* Re: Composing Hebrew diacriticals
  2010-05-13 19:46                             ` Yair F
@ 2010-05-13 19:56                               ` Eli Zaretskii
  2010-05-13 20:08                                 ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-13 19:56 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Thu, 13 May 2010 22:46:03 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: handa@m17n.org, emacs-devel@gnu.org
> 
> Something strange happens here as these characters *are* composed
> (Shin+shin dot+qamats).

When you move cursor through that "composed" character, does Emacs
treat them as a single character?  That is, is a single C-f enough to
move all the way past Shin+shin dot+qamats?  If not, then there is no
composition; rather, the diacriticals are displayed with a proper
offset so that they _appear_ to be composed with the base character
(shin, in this case).




* Re: Composing Hebrew diacriticals
  2010-05-13 19:56                               ` Eli Zaretskii
@ 2010-05-13 20:08                                 ` Yair F
  2010-05-14  2:35                                   ` Miles Bader
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-13 20:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, handa

On Thu, May 13, 2010 at 10:56 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> If not, then there is no
> composition; rather, the diacriticals are displayed with a proper
> offset so that they _appear_ to be composed with the base character
> (shin, in this case).
>

You are right. However, this is somewhat counter-intuitive and makes
editing difficult. Leaving bidi aside, consider this example: d̊̂
(d+ring above+circumflex). The user would like to remove the ring
component. How can it be done?

Also, when a composition occurs, what-cursor-position gives Unicode
information only for the base character and not for the other composed
components.




* Re: Composing Hebrew diacriticals
  2010-05-13 20:08                                 ` Yair F
@ 2010-05-14  2:35                                   ` Miles Bader
  2010-05-14 10:45                                     ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Miles Bader @ 2010-05-14  2:35 UTC (permalink / raw)
  To: Yair F; +Cc: Eli Zaretskii, handa, emacs-devel

Yair F <yair.f.lists@gmail.com> writes:
> Consider this example: d̊̂ (d+ring above+circumflex). The user would
> like to remove the ring component. How can it be done?

_Editing_ commands (e.g., DEL) work on the individual components, even
for composed characters -- so if you put d̊̂ in a buffer, position the
cursor after it, and hit DEL, it will remove just the last accent,
leaving "d̊".  This duality works pretty well I think, and seems
intuitive.
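The duality described here can be observed at the buffer level without any display involved: d̊̂ is stored as three characters, and deleting one character backward removes only the circumflex. In this sketch `my-delete-last-mark` is a hypothetical helper; it calls `delete-char` with a negative argument, which is assumed here to behave like DEL for a single character in the simple case (no active region).

```elisp
(defun my-delete-last-mark (text)
  "Insert TEXT in a temporary buffer, delete one char backward, return the rest."
  (with-temp-buffer
    (insert text)      ; point is now at the end of TEXT
    (delete-char -1)   ; delete one character before point, like DEL
    (buffer-string)))

;; d + COMBINING RING ABOVE + COMBINING CIRCUMFLEX ACCENT
(length "d\u030A\u0302")               ; 3 characters in the buffer
(my-delete-last-mark "d\u030A\u0302")  ; "d\u030A", i.e. d with ring above
```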

> Also, when a composition occurs, what-cursor-position gives Unicode
> information only for the base character and not for the other composed
> components.

Use "C-u C-x =" to see more info.

-Miles

-- 
When you empty yourself and open your heart, the way opens.




* Re: Composing Hebrew diacriticals
  2010-05-13  0:42                       ` Kenichi Handa
@ 2010-05-14  8:10                         ` Kenichi Handa
  2010-05-14 10:02                           ` Eli Zaretskii
       [not found]                           ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>
  0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-14  8:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, yair.f.lists, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

In article <tl7r5lgbpit.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> Ok, then, as soon as I conclude that at least the displaying
> looks correct, I'll commit my new code without testing it
> against cursor positioning.

I've just committed a fix.

Eli, please check the comments of set_iterator_to_next, and
verify that I'm doing the right thing.

I have not yet committed proper code for Hebrew
composition.  I'm now testing with this simple version.

(let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
  (set-char-table-range
   composition-function-table '(#x591 . #x5C7)
   (list (vector pattern 1 'font-shape-gstring)
	 ["[\u0591-\u05C7]" 0 font-shape-gstring]))
  (set-char-table-range
   composition-function-table #x5C0 nil)
  (set-char-table-range
   composition-function-table #x5C6 nil))

(set-fontset-font
 t 'hebrew '("dejavu sans" . "unicode-bmp"))

On GNU/Linux, you have to build Emacs with the libotf and
m17n-flt libraries, and put the attached file in
~/.m17n.d/HEBR-FF.flt.
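Each vector in these `composition-function-table` entries has the form [PATTERN PREV-CHARS FUNCTION]: when a character from the given range is seen, PATTERN is tried starting PREV-CHARS characters before it. So the base+marks pattern, paired with 1, is tried from the letter preceding the mark, while the lone-mark fallback, paired with 0, is tried from the mark itself. The sketch below roughly imitates only that matching step on a plain string; `my-match-from-mark` is a hypothetical helper and does not call the real composition code, which also involves the font and `font-shape-gstring`.

```elisp
;; The base+marks pattern from the test setup above.
(defconst my-hebrew-simple-pattern
  "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+")

(defun my-match-from-mark (pattern prev-chars string mark-index)
  "Roughly imitate a [PATTERN PREV-CHARS FUNC] rule for the mark at MARK-INDEX.
Start matching PREV-CHARS characters before the mark and return the
matched substring, or nil if PATTERN does not match exactly there."
  (let ((start (- mark-index prev-chars)))
    (when (and (>= start 0)
               ;; Require the match to begin at START, not later.
               (eq (string-match pattern string start) start))
      (match-string 0 string))))

;; Bet + dagesh + qamats: the dagesh at index 1 triggers the rule and,
;; with PREV-CHARS = 1, the match starts at the bet and covers all three.
(my-match-from-mark my-hebrew-simple-pattern 1 "\u05D1\u05BC\u05B8" 1)
;; With PREV-CHARS = 0 the match would start at the dagesh, which is not
;; a base letter, so this pattern does not apply (nil).
(my-match-from-mark my-hebrew-simple-pattern 0 "\u05D1\u05BC\u05B8" 1)
```

The nil result is the situation of a mark with no base letter before it; presumably that is what the second, single-mark vector in the list above is there to cover.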

---
Kenichi Handa
handa@m17n.org


[-- Attachment #2: HEBR-FF.flt --]
[-- Type: text/plain, Size: 1398 bytes --]

;; HEBR-FF.flt -- Font Layout Table for Hebrew OTF font
;; Copyright (C) 2010  AIST (H15PRO112)
;; See the end for copying conditions.

(font layouter hebr-ff nil
      (font (nil nil unicode-bmp) :otf=hebr+mark))

(category
 (0x0591 0x05BF	?h)
 (0x05C1 0x05C5	?h)
 (0x05C7	?h)
 (0x05D0 0x05F2	?H))

(generator
 (0
  (cond
   ("Hhh*" < :otf=hebr=ccmp+mark >)
   ("h" [ = ])
   ("." =))))

;; Copyright (C) 2010
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H15PRO112

;; This file is part of the m17n database; a sub-part of the m17n
;; library.

;; The m17n library is free software; you can redistribute it and/or
;; modify it under the terms of the GNU Lesser General Public License
;; as published by the Free Software Foundation; either version 2.1 of
;; the License, or (at your option) any later version.

;; The m17n library is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; Lesser General Public License for more details.

;; You should have received a copy of the GNU Lesser General Public
;; License along with the m17n library; if not, write to the Free
;; Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.

;; Local Variables:
;; mode: lisp
;; End:


* Re: Composing Hebrew diacriticals
  2010-05-14  8:10                         ` Kenichi Handa
@ 2010-05-14 10:02                           ` Eli Zaretskii
  2010-05-14 11:58                             ` Kenichi Handa
       [not found]                           ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>
  1 sibling, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-14 10:02 UTC (permalink / raw)
  To: Kenichi Handa, Jason Rumney; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: eliz@gnu.org, yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Fri, 14 May 2010 17:10:33 +0900
> 
> I've just committed a fix.
> 
> Eli, please check the comments of set_iterator_to_next, and
> verify that I'm doing the right thing.

It looks okay at a first glance, thank you!

In the HELLO buffer, the RLM character is not composed with the
following parenthesis, though.  Is this a separate problem?

I will work on the issues you raised in the comments.  For now, I have
just one response: in this fragment from set_iterator_to_next:

		/* Update IT's char/byte positions to point the first
		   character of the next grapheme cluster, or to the
		   character visually after the current composition.  */
  #if 0
		/* Is it ok to do this directly? */
		IT_CHARPOS (*it) += it->cmp_it.nchars;
		IT_BYTEPOS (*it) += it->cmp_it.nbytes;
  #else
		/* Or do we have to call bidi_get_next_char_visually
		   repeatedly (perhaps not to confuse some internal
		   state of bidi_it)?  At least we must do this if we
		   have consumed all grapheme clusters in the current
		   composition because the next character will be in the
		   different bidi level.  */
		for (i = 0; i < it->cmp_it.nchars; i++)
		  bidi_get_next_char_visually (&it->bidi_it);

the "#else" part is doing TRT.  You cannot jump to a different place
in the buffer/string behind the back of bidi_get_next_char_visually,
because that would violate the integrity of its internal cache, which
must correspond to the buffer/string positions 1:1.
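Eli's point can be illustrated with a toy model in C. The struct and functions below (`struct toy_iter`, `toy_next` and friends) are hypothetical and are not the real `bidi_it` interface. The sketch only assumes, as described above, an iterator whose internal cache must stay in 1:1 correspondence with the positions it has actually visited.

```c
#include <stddef.h>

/* Hypothetical toy iterator, NOT Emacs's struct bidi_it.  It records the
   positions it visits in a small ring-buffer cache; the invariant is that
   the newest cache entry always describes the current position.  */
#define TOY_CACHE_SIZE 16

struct toy_iter
{
  long pos;                     /* current character position */
  long cache[TOY_CACHE_SIZE];   /* recently visited positions */
  size_t nvisited;              /* number of positions recorded so far */
};

static void
toy_record (struct toy_iter *it)
{
  it->cache[it->nvisited % TOY_CACHE_SIZE] = it->pos;
  it->nvisited++;
}

static void
toy_init (struct toy_iter *it, long start)
{
  it->pos = start;
  it->nvisited = 0;
  toy_record (it);
}

/* The sanctioned way to move: one character at a time, keeping the cache
   in step (playing the role of bidi_get_next_char_visually).  */
static void
toy_next (struct toy_iter *it)
{
  it->pos++;
  toy_record (it);
}

/* Nonzero if the newest cache entry matches the current position.  */
static int
toy_consistent (const struct toy_iter *it)
{
  return it->nvisited > 0
         && it->cache[(it->nvisited - 1) % TOY_CACHE_SIZE] == it->pos;
}

/* Skip a composition of NCHARS characters, like the "#else" branch.  */
static void
toy_skip_by_stepping (struct toy_iter *it, int nchars)
{
  for (int i = 0; i < nchars; i++)
    toy_next (it);
}

/* Skip the same composition by jumping, like the "#if 0" branch: the
   position changes behind the cache's back.  */
static void
toy_skip_by_jumping (struct toy_iter *it, int nchars)
{
  it->pos += nchars;
}
```

Skipping a 3-character composition from position 10 leaves both toy iterators at position 13, but only the stepping one has a cache that still describes where it is.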

> I have not yet committed proper codes for Hebrew
> composition.  I'm now testing with this simple version.
> 
> (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
>   (set-char-table-range
>    composition-function-table '(#x591 . #x5C7)
>    (list (vector pattern 1 'font-shape-gstring)
> 	 ["[\u0591-\u05C7]" 0 font-shape-gstring]))
>   (set-char-table-range
>    composition-function-table #x5C0 nil)
>   (set-char-table-range
>    composition-function-table #x5C6 nil))
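For readers trying to follow the rules above, here is a C sketch of which characters they group together. It assumes the documented meaning of a `composition-function-table' entry [PATTERN PREV-CHARS FUNCTION] (PATTERN is matched starting PREV-CHARS characters before the character that has the rule) and that the rules in the list are tried in order. The helper names are hypothetical, and the sketch ignores what `font-shape-gstring' then does with the match.

```c
#include <stddef.h>
#include <stdint.h>

/* Base letters, as in the first bracket of the pattern: U+05D0..U+05F2.  */
static int
is_hebrew_base (uint32_t c)
{
  return c >= 0x05D0 && c <= 0x05F2;
}

/* Marks, as in the second bracket: U+0591..U+05BF, U+05C1..U+05C5 and
   U+05C7.  This is also exactly the set of characters that keep a rule
   after U+05C0 and U+05C6 are reset to nil.  */
static int
is_hebrew_mark (uint32_t c)
{
  return (c >= 0x0591 && c <= 0x05BF)
         || (c >= 0x05C1 && c <= 0x05C5)
         || c == 0x05C7;
}

/* Return the number of characters composed when the display reaches
   TEXT[I], storing the index of the first one in *START; return 0 if
   TEXT[I] has no rule.  Hypothetical helper, for illustration only.  */
static size_t
hebrew_composition_at (const uint32_t *text, size_t len, size_t i,
                       size_t *start)
{
  if (i >= len || !is_hebrew_mark (text[i]))
    return 0;                   /* no rule for this character */

  /* Rule 1: PREV-CHARS = 1, so try "base mark+" starting at I - 1.  */
  if (i >= 1 && is_hebrew_base (text[i - 1]))
    {
      size_t end = i;
      while (end < len && is_hebrew_mark (text[end]))
        end++;
      *start = i - 1;
      return end - (i - 1);
    }

  /* Rule 2: PREV-CHARS = 0, a lone mark is composed by itself.  */
  *start = i;
  return 1;
}
```

For BET + DAGESH + PATAH followed by a space, the rule triggered at DAGESH covers all three characters; a PATAH with no preceding base letter falls through to the second rule and is composed alone; PASEQ (U+05C0) has no rule at all.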

Could you please look at the message I posted in
http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html?
I still see the infloop, with the current trunk, even when
bidi-display-reordering is set to nil, after I type BET and DAGESH, as
described in that message.  What kind of problems in the information
that Uniscribe returns to Emacs could cause such a loop?

If I type a different diacritical after BET, like PATAH, there's no
infloop, but the display is incorrect: I see both the isolated PATAH
and the composed BET+PATAH after it.

Jason, could you help me with this?  It looks like some
Uniscribe-specific issue.  TIA





* Re: Composing Hebrew diacriticals
  2010-05-14  2:35                                   ` Miles Bader
@ 2010-05-14 10:45                                     ` Yair F
  2010-05-14 13:05                                       ` Eli Zaretskii
                                                         ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Yair F @ 2010-05-14 10:45 UTC (permalink / raw)
  To: Miles Bader; +Cc: Eli Zaretskii, handa, emacs-devel

On Fri, May 14, 2010 at 5:35 AM, Miles Bader <miles@gnu.org> wrote:
> Yair F <yair.f.lists@gmail.com> writes:
>> Consider this example: d̊̂ (d+ring above+circumflex) The user would
>> like to remove the ring component. How can it be done?
>
> _Editing_ commands (e.g., DEL) work on the individual components, even
> for composed characters -- so if you put d̊̂ in a buffer, position the
> cursor after it, and hit DEL, it will remove just the last accent,
> leaving "d̊".  This duality works pretty well I think, and seems
> intuitive.
>
I still fail to understand how to directly access and edit individual components.
In the example above, how can the ring component be directly edited
without editing the circumflex?

A base character composed with 3 marks is not that infrequent.




* Re: Composing Hebrew diacriticals
  2010-05-14 10:02                           ` Eli Zaretskii
@ 2010-05-14 11:58                             ` Kenichi Handa
  2010-05-14 13:29                               ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-14 11:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel, jasonr

In article <837hn64x96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Eli, please check the comments of set_iterator_to_next, and
> > verify that I'm doing the right thing.

> It looks okay at a first glance, thank you!

> In the HELLO buffer, the RLM character is not composed with the
> following parenthesis, though.  Is this a separate problem?

Currently Emacs' category of RLM (U+200F) is not '^'
(Combining).  So it is not composed with the previous base
character.

BTW, I don't know which is better; showing RLM (and the
other control characters) to users or not.  If we show it,
the display is a little bit annoying.  If we hide it, it
gets very difficult for users to manually edit it.

> I will work on the issues you raised in the comments.  For now, I have
> just one response: in this fragment from set_iterator_to_next:

> 		/* Update IT's char/byte positions to point the first
> 		   character of the next grapheme cluster, or to the
> 		   character visually after the current composition.  */
>   #if 0
> 		/* Is it ok to do this directly? */
> 		IT_CHARPOS (*it) += it->cmp_it.nchars;
> 		IT_BYTEPOS (*it) += it->cmp_it.nbytes;
>   #else
> 		/* Or do we have to call bidi_get_next_char_visually
> 		   repeatedly (perhaps not to confuse some internal
> 		   state of bidi_it)?  At least we must do this if we
> 		   have consumed all grapheme clusters in the current
> 		   composition because the next character will be in the
> 		   different bidi level.  */
> 		for (i = 0; i < it->cmp_it.nchars; i++)
> 		  bidi_get_next_char_visually (&it->bidi_it);

> the "#else" part is doing TRT.  You cannot jump to a different place
> in the buffer/string behind the back of bidi_get_next_char_visually,
> because that would violate the integrity of its internal cache, which
> must correspond to the buffer/string positions 1:1.

Ok, I see.

> > I have not yet committed proper codes for Hebrew
> > composition.  I'm now testing with this simple version.
> > 
> > (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
> >   (set-char-table-range
> >    composition-function-table '(#x591 . #x5C7)
> >    (list (vector pattern 1 'font-shape-gstring)
> > 	 ["[\u0591-\u05C7]" 0 font-shape-gstring]))
> >   (set-char-table-range
> >    composition-function-table #x5C0 nil)
> >   (set-char-table-range
> >    composition-function-table #x5C6 nil))

> Could you please look at the message I posted in
> http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html?
> I still see the infloop, with the current trunk, even when
> bidi-display-reordering is set to nil, after I type BET and DAGESH, as
> described in that message.  What kind of problems in the information
> that Uniscribe returns to Emacs could cause such a loop?

I'm now investigating that problem.

But, as I don't have Emacs on Windows now, all I can do is
guess.

To debug C code of Emacs on Windows, which environment
(cygwin, mingw, ...) is good?  It would be good to have
instructions showing just one method, saying
something like:

(1) Download this and this and this...  (with actual URLs).
(2) Install this, then do this, then install this, then do this...
(3) Build Emacs under this terminal program.

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-14 10:45                                     ` Yair F
@ 2010-05-14 13:05                                       ` Eli Zaretskii
  2010-05-14 13:15                                       ` Kenichi Handa
  2010-05-15  0:46                                       ` Miles Bader
  2 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-14 13:05 UTC (permalink / raw)
  To: Yair F; +Cc: handa, emacs-devel, miles

> Date: Fri, 14 May 2010 13:45:41 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, handa@m17n.org, emacs-devel@gnu.org
> 
> In the example above, how can the ring component be directly edited
> without editing the circumflex?
> 
> A base character composed with 3 marks is not that infrequent.

Well, at least in Hebrew, everyday written language almost never
uses marks at all, so I think 3 marks would be limited to Biblical
Hebrew or maybe poetry (and I'd be thrilled to know that
Hebrew-writing poets start using Emacs for their craft ;-).

If worse comes to worst, you can always disable auto-composition-mode,
edit the text, then re-enable it.

In the long run, it would be nice to have specialized commands for
moving inside composed characters, but I suspect that making it happen
would need non-trivial changes to the display engine.  Volunteers are
welcome to code that, but I personally have much more important things
on my plate for the foreseeable future.




* Re: Composing Hebrew diacriticals
  2010-05-14 10:45                                     ` Yair F
  2010-05-14 13:05                                       ` Eli Zaretskii
@ 2010-05-14 13:15                                       ` Kenichi Handa
  2010-05-15  0:46                                       ` Miles Bader
  2 siblings, 0 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-14 13:15 UTC (permalink / raw)
  To: Yair F; +Cc: eliz, emacs-devel, miles

In article <AANLkTikiwXfiL0WUjkkSUQw9SbGYNYGLg-6rO_LhNWWe@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Fri, May 14, 2010 at 5:35 AM, Miles Bader <miles@gnu.org> wrote:
> > Yair F <yair.f.lists@gmail.com> writes:
>>> Consider this example: d̊̂ (d+ring above+circumflex) The user would
>>> like to remove the ring component. How can it be done?
> >
> > _Editing_ commands (e.g., DEL) work on the individual components, even
> > for composed characters -- so if you put d̊̂ in a buffer, position the
> > cursor after it, and hit DEL, it will remove just the last accent,
> > leaving "d̊".  This duality works pretty well I think, and seems
> > intuitive.
> >
> I still fail to understand how to directly access and edit individual components.
> In the example above, how can the ring component be directly edited
> without editing the circumflex?

> A base character composed with 3 marks is not that infrequent.

I used to use this code for debugging.  It allows you to put
point within a composition.  But I found that, although it does
move point, the screen is not updated properly now.
I'll debug it after I finish more urgent tasks.

(defun forward-char-intrusive ()
  (interactive)
  (setq disable-point-adjustment t)
  (forward-char 1))

(defun backward-char-intrusive ()
  (interactive)
  (setq disable-point-adjustment t)
  (forward-char -1))

(global-set-key (kbd "C-S-f") 'forward-char-intrusive)
(global-set-key (kbd "C-S-b") 'backward-char-intrusive)

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-14 11:58                             ` Kenichi Handa
@ 2010-05-14 13:29                               ` Eli Zaretskii
  2010-05-14 14:06                                 ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-14 13:29 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Fri, 14 May 2010 20:58:00 +0900
> 
> In article <837hn64x96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > Eli, please check the comments of set_iterator_to_next, and
> > > verify that I'm doing the right thing.
> 
> > It looks okay at a first glance, thank you!
> 
> > In the HELLO buffer, the RLM character is not composed with the
> > following parenthesis, though.  Is this a separate problem?
> 
> Currently Emacs' category of RLM (U+200F) is not '^'
> (Combining).  So it is not composed with the previous base
> character.

A separate problem, then.

> BTW, I don't know which is better; showing RLM (and the
> other control characters) to users or not.  If we show it,
> the display is a little bit annoying.  If we hide it, it
> gets very difficult for users to manually edit it.

We need both, I think.  But AFAIU, it would be easy to write a command
that modifies the category you show above, and thus makes these
characters appear and disappear.

> > Could you please look at the message I posted in
> > http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html?
> > I still see the infloop, with the current trunk, even when
> > bidi-display-reordering is set to nil, after I type BET and DAGESH, as
> > described in that message.  What kind of problems in the information
> > that Uniscribe returns to Emacs could cause such a loop?
> 
> I'm now investigating that problem.

Thanks.

> But, as I don't have Emacs on Windows now, all I can do is
> guess.
> 
> To debug C code of Emacs on Windows, which environment
> (cygwin, mingw, ...) is good?

MinGW.  (Cygwin is just a usual Unix build running on Windows using an
X server, so I doubt if you will see there the same problem I do in
the MinGW build, which uses Uniscribe to render characters.)

> It would be good to have instructions showing just one
> method, saying something like:
> 
> (1) Download this and this and this...  (with actual URLs).
> (2) Install this, then do this, then install this, then do this...
> (3) Build Emacs under this terminal program.

This place may be a good starting point:

   http://ourcomments.org/Emacs/w32-build-emacs.html

I use GCC, Binutils, GDB, and Make downloaded from the MinGW site
(http://www.mingw.org/), `cp', `mv', and `rm' from the latest port of
Coreutils from GnuWin32 (http://gnuwin32.sourceforge.net/).  You will
also need the MinGW Runtime (mingw-runtime-*.tar.gz) and the Windows
API headers (w32api-*.tar.gz) from MinGW -- these are headers and
libraries that are needed to compile and link against Windows system
APIs.  The XPM library, needed for displaying the tool bar, is
available from ftp://ftp.gnu.org/gnu/emacs/windows, where you will
also find a precompiled binary.

I think that's about it.

One other thing: to unpack .tar.gz archives on Windows, you can use
the handy djtarnt.exe utility, which you can find here:

  ftp://ftp.gnu.org/old-gnu/emacs/windows/utilities/i386/

This utility supports only .tar.gz, not .tar.bz2.




* Re: Composing Hebrew diacriticals
  2010-05-14 13:29                               ` Eli Zaretskii
@ 2010-05-14 14:06                                 ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-14 14:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: jasonr, yair.f.lists, emacs-devel, handa

> Date: Fri, 14 May 2010 16:29:09 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org, jasonr@gnu.org
> 
> The XPM library, needed for displaying the tool bar, is
> available from ftp://ftp.gnu.org/gnu/emacs/windows, where you will
> also find a precompiled binary.

Of course, the precompiled binary is not what you want, since you want
to build the latest trunk.  Sorry for any confusion.




* Re: Composing Hebrew diacriticals
  2010-05-14 10:45                                     ` Yair F
  2010-05-14 13:05                                       ` Eli Zaretskii
  2010-05-14 13:15                                       ` Kenichi Handa
@ 2010-05-15  0:46                                       ` Miles Bader
  2 siblings, 0 replies; 88+ messages in thread
From: Miles Bader @ 2010-05-15  0:46 UTC (permalink / raw)
  To: Yair F; +Cc: Eli Zaretskii, emacs-devel, handa

Yair F <yair.f.lists@gmail.com> writes:
> I still fail to understand how to directly access and edit individual components.
> In the example above, how can the ring component be directly edited
> without editing the circumflex?

Not so far as I know (though it can be done simply enough in elisp), but
as far as I can see, it's really not very important.

You could easily write an elisp function to split apart the composed
components to allow easier editing, e.g., by putting a space after the
base character (a user could then re-compose them by deleting the
space).  However, given the shortness of such sequences in all but the
rarest cases, I expect most users simply wouldn't bother to use such a
function, and would probably just delete the accents and re-add them if
they were incorrect.
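The splitting trick can be sketched in C on an array of code points. The helpers are hypothetical; they assume, as above, that marks separated from their base character by a space are no longer composed with it. In Emacs itself this would be a short Lisp command using `insert' and `delete-char'.

```c
#include <stddef.h>
#include <stdint.h>

/* Insert a SPACE after the base character at index BASE, so that the marks
   which followed it are no longer adjacent to it.  BUF has room for CAP
   elements and currently holds LEN.  Returns the new length, or LEN
   unchanged if BASE is out of range or there is no room.  */
static size_t
split_after_base (uint32_t *buf, size_t len, size_t cap, size_t base)
{
  if (base >= len || len + 1 > cap)
    return len;
  for (size_t i = len; i > base + 1; i--)   /* shift the tail right */
    buf[i] = buf[i - 1];
  buf[base + 1] = 0x0020;
  return len + 1;
}

/* Delete the element at index POS (e.g. the separator inserted above) so
   the marks follow the base character again.  Returns the new length.  */
static size_t
delete_at (uint32_t *buf, size_t len, size_t pos)
{
  if (pos >= len)
    return len;
  for (size_t i = pos; i + 1 < len; i++)    /* shift the tail left */
    buf[i] = buf[i + 1];
  return len - 1;
}
```

With "d" + ring above + circumflex, splitting after index 0 yields "d", SPACE, ring, circumflex, so the ring can be edited on its own; deleting index 1 afterwards restores the original order.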

-Miles

-- 
Next to fried food, the South has suffered most from oratory.
  			-- Walter Hines Page




* Re: Composing Hebrew diacriticals
       [not found]                           ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>
@ 2010-05-15  2:14                             ` Kenichi Handa
  2010-05-15 21:35                               ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-15  2:14 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:
> Now most simple compositions (niqqud only) works.

> I still see these issues:
> 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot
> (U+5C2) which breaks the composition.

In my environment, the sequence U+5E9 U+5C2 is surely
composed.  Please show me the output of this command.

% MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")'

Mine is this:

 [FLT] (hebr-ff (dejavu sans)
 [FLT]   (SOURCE 05E9 05C2)
 [FLT]   (STAGE 0 "Hh" (05E9 05C2)
 [FLT]     (SUBPART 0
 [FLT]      (COND
 [FLT]       (REGEX "^Hhh*" "Hh" 2
 [FLT]        <0
 [FLT]        :otf=hebr=ccmp+mark 2>))))
 [FLT]   (RESULT (1299 1216 0 0)))

> 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is
> composed to the right of the letter instead of left.

It seems that the "dejavu sans" font has neither the 'ccmp' nor
the 'mark' feature applicable to that sequence.  It is possible
to detect such a case and adjust glyph-string by Lisp code.
I'll try to implement that kind of code.

> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into
> presentation form (U+FB1D). This should only happen with specific
> control (Either CGJ or ZWJ I'll check).

I tried to show that sequence with gedit and Firefox.
Both show it with U+FB1D.

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-15  2:14                             ` Kenichi Handa
@ 2010-05-15 21:35                               ` Yair F
  2010-05-17  4:35                                 ` Kenichi Handa
  2010-05-17 13:53                                 ` Stefan Monnier
  0 siblings, 2 replies; 88+ messages in thread
From: Yair F @ 2010-05-15 21:35 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Sat, May 15, 2010 at 5:14 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:
>> Now most simple compositions (niqqud only) works.
>
>> I still see these issues:
>> 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot
>> (U+5C2) which breaks the composition.
>
> In my environment, the sequence U+5E9 U+5C2 is surely
> composed.  Please show me the output of this command.
>
> % MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")'
>
> Mine is this:
>
>  [FLT] (hebr-ff (dejavu sans)
>  [FLT]   (SOURCE 05E9 05C2)
>  [FLT]   (STAGE 0 "Hh" (05E9 05C2)
>  [FLT]     (SUBPART 0
>  [FLT]      (COND
>  [FLT]       (REGEX "^Hhh*" "Hh" 2
>  [FLT]        <0
>  [FLT]        :otf=hebr=ccmp+mark 2>))))
>  [FLT]   (RESULT (1299 1216 0 0)))

Mine is
 [FLT] (hebr-ff (dejavu sans)
 [FLT]   (SOURCE 05E9)
 [FLT]   (STAGE 0 "H" (05E9)
 [FLT]     (SUBPART 0
 [FLT]      (COND
 [FLT]       (REGEX "^." "H" 1
 [FLT]        (COPY 0x5E9)))))
 [FLT]   (RESULT (053D 1152 0 0)))

>
>> 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is
>> composed to the right of the letter instead of left.
>
> It seems that the "dejavu sans" font has neither the 'ccmp' nor
> the 'mark' feature applicable to that sequence.
>

In that case, may I suggest switching to a different font?

>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into
>> presentation form (U+FB1D). This should only happen with specific
>> control (Either CGJ or ZWJ I'll check).
>
> I tried to show that sequence with gedit and Firefox.
> Both show it with U+FB1D.

Kate shows this as well; Firefox doesn't. Nevertheless, it's still a bug.

Thank you for investing effort in Hebrew composition.

Yair




* Re: Composing Hebrew diacriticals
  2010-05-15 21:35                               ` Yair F
@ 2010-05-17  4:35                                 ` Kenichi Handa
  2010-05-17 17:32                                   ` Eli Zaretskii
                                                     ` (2 more replies)
  2010-05-17 13:53                                 ` Stefan Monnier
  1 sibling, 3 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-17  4:35 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTin5fDs_thjY2I3S4anUDpgkqbzfZt2TBdzRo-8Y@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

>>> 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot
>>> (U+5C2) which breaks the composition.
> >
> > In my environment, the sequence U+5E9 U+5C2 is surely
> > composed.  Please show me the output of this command.
> >
> > % MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")'
[...]
> Mine is
>  [FLT] (hebr-ff (dejavu sans)
>  [FLT]   (SOURCE 05E9)
>  [FLT]   (STAGE 0 "H" (05E9)
>  [FLT]     (SUBPART 0
>  [FLT]      (COND
>  [FLT]       (REGEX "^." "H" 1
>  [FLT]        (COPY 0x5E9)))))
>  [FLT]   (RESULT (053D 1152 0 0)))

Are you using the same settings as mine, which I wrote
below in my previous mail?

> I have not yet committed proper codes for Hebrew
> composition.  I'm now testing with this simple version.
> 
> (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
>   (set-char-table-range
>    composition-function-table '(#x591 . #x5C7)
>    (list (vector pattern 1 'font-shape-gstring)
> 	 ["[\u0591-\u05C7]" 0 font-shape-gstring]))
>   (set-char-table-range
>    composition-function-table #x5C0 nil)
>   (set-char-table-range
>    composition-function-table #x5C6 nil))

>>> 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is
>>> composed to the right of the letter instead of left.
> >
> > It seems that the "dejavu sans" font has neither the 'ccmp' nor
> > the 'mark' feature applicable to that sequence.

> In that case, may I suggest switching to a different font?

It's possible to write such code (though it will slow down the
display), but if you don't have a better font, we have to use
the available font anyway.

By the way, do you have a better font than 'dejavu sans' for
Hebrew?

>>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into
>>> presentation form (U+FB1D). This should only happen with specific
>>> control (Either CGJ or ZWJ I'll check).
> >
> > I tried to show that sequence with gedit and Firefox.
> > Both show it with U+FB1D.

> Kate shows this as well; Firefox doesn't. Nevertheless, it's still a bug.

Then what is the correct rendering of the sequence "\u05D9\u05B4"?
Applying only the `mark' feature?  With the dejavu sans font, U+FB1D
has its lower dot above the baseline, but just applying the `mark'
feature results in displaying the dot below the baseline.

See the attached image.  The left one is the composition
with `ccmp' GSUB feature (i.e. into U+FB1D) for the sequence
"\u05D9\u200D\u05B4", and the right one is the composition
with `mark' GPOS feature for the sequence "\u05D9\u05B4".
Is this the correct rendering?

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-15 21:35                               ` Yair F
  2010-05-17  4:35                                 ` Kenichi Handa
@ 2010-05-17 13:53                                 ` Stefan Monnier
  1 sibling, 0 replies; 88+ messages in thread
From: Stefan Monnier @ 2010-05-17 13:53 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, Kenichi Handa

>> It seems that the "dejavu sans" font has neither the 'ccmp' nor
>> the 'mark' feature applicable to that sequence.
> In that case, may I suggest switching to a different font?

More constructive would be to first report the problem so that the font
can be fixed (assuming the problem is indeed in the font data rather
than in Emacs's use of it).


        Stefan




* Re: Composing Hebrew diacriticals
  2010-05-17  4:35                                 ` Kenichi Handa
@ 2010-05-17 17:32                                   ` Eli Zaretskii
  2010-05-18  0:36                                     ` Kenichi Handa
  2010-05-17 21:08                                   ` Yair F
  2010-05-18  7:29                                   ` Composing Hebrew diacriticals Eli Zaretskii
  2 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-17 17:32 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 17 May 2010 13:35:42 +0900
> Cc: emacs-devel@gnu.org
> 
> See the attached image.

I didn't receive any attachments.




* Re: Composing Hebrew diacriticals
  2010-05-17  4:35                                 ` Kenichi Handa
  2010-05-17 17:32                                   ` Eli Zaretskii
@ 2010-05-17 21:08                                   ` Yair F
  2010-05-20  2:09                                     ` Kenichi Handa
  2010-05-18  7:29                                   ` Composing Hebrew diacriticals Eli Zaretskii
  2 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-17 21:08 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Mon, May 17, 2010 at 7:35 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <AANLkTin5fDs_thjY2I3S4anUDpgkqbzfZt2TBdzRo-8Y@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Are you using the same settings as mine, which I wrote
> below in my previous mail?

>

No. I was using this:
;; For automatic composition.
(defconst hebrew-composable-pattern
 (concat
  "\\("
  "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base
  "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
  "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
  "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
  "\\|"
  "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]"
                                   ;; base (allows rafe)
  "[\u05BC\u05BF]?"                ;; 0-1 marks of 1st class (dagesh/rafe)
  "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
  "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
  "\\|"
  "\u05D5"                         ;; base (vav)
  "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
  "[\u05B0-\u05BB\u05C7]?"         ;; 0-1 marks of extended 3rd class (niqqud)
  "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
  "\\|"
  "\u05E9"                         ;; base (shin)
  "\u05BC?"                        ;; 0-1 marks of 1st class (dagesh)
  "[\u05C1\u05C2]?"                ;; 0-1 marks of 2nd class (shin dot)
  "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
  "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
  "\\|"
  "[\u05F1-\u05F3]"                ;; base (Yiddish ligatures)
  "[\u05B0-\u05B9\u05BB\u05C7]?"   ;; 0-1 marks of 3rd class (niqqud)
  "[\u0591-\u05AF\u05BD]*"         ;; 0-2 (possibly 3) marks of 4th class
  "\\)")
 "Regexp matching a composable sequence of Hebrew characters.")
(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
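The shin branch of the pattern above requires the marks in a fixed order: optional dagesh, optional shin/sin dot, optional niqqud, then any number of 4th-class marks. As an illustration only, here is a hand-written C matcher for that one branch; the function names are hypothetical and the character ranges are copied from the regexp.

```c
#include <stddef.h>
#include <stdint.h>

/* Mark classes copied from the shin branch of hebrew-composable-pattern.  */
static int is_dagesh (uint32_t c) { return c == 0x05BC; }       /* 1st class */
static int is_shin_sin_dot (uint32_t c)                        /* 2nd class */
{
  return c == 0x05C1 || c == 0x05C2;
}
static int is_niqqud (uint32_t c)                              /* 3rd class */
{
  return (c >= 0x05B0 && c <= 0x05B9) || c == 0x05BB || c == 0x05C7;
}
static int is_4th_class (uint32_t c)                           /* 4th class */
{
  return (c >= 0x0591 && c <= 0x05AF) || c == 0x05BD;
}

/* Length of the longest prefix of TEXT[0..LEN) matching
   SHIN DAGESH? [SHIN/SIN DOT]? NIQQUD? 4TH-CLASS*, or 0 if TEXT does not
   start with SHIN.  The classes are disjoint, so a single greedy pass
   gives the same result as the regexp.  */
static size_t
match_shin_cluster (const uint32_t *text, size_t len)
{
  size_t i;

  if (len == 0 || text[0] != 0x05E9)
    return 0;
  i = 1;
  if (i < len && is_dagesh (text[i]))
    i++;
  if (i < len && is_shin_sin_dot (text[i]))
    i++;
  if (i < len && is_niqqud (text[i]))
    i++;
  while (i < len && is_4th_class (text[i]))
    i++;
  return i;
}
```

With this ordering, the sequence U+05E9 U+05BC U+05C1 U+05B6 matches in full (length 4), while a shin followed by a shin dot and then a dagesh stops after the dot (length 2).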

With your changes there is some composition. But this word doesn't
compose properly:
עַשֶּׁשֶׁת

The first Shin (U+05E9) composes with Dagesh (U+05BC), but the shin dot
(U+05C1) isn't visible, and the Segol (U+05B6) goes under the previous
base letter.

what-cursor-position gives this:
         display: composed to form "שֶּׁ" (see below)

Composed with the following character(s) "ֶּׁ" using this font:
 xft:-unknown-DejaVu Sans-normal-normal-normal-*-23-*-*-*-*-0-iso10646-1
by these glyphs:
 [0 3 0 4786 18 2 16 13 0 nil]
 [0 3 1473 1311 0 15 17 16 -14 nil]
 [0 3 1462 1300 0 5 11 -1 6 nil]
MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05BC\u05C1\u05B6")'
[FLT] (hebr-ff (dejavu sans)
[FLT]   (SOURCE 05E9 05BC 05C1 05B6)
[FLT]   (STAGE 0 "Hhhh" (05E9 05BC 05C1 05B6)
[FLT]     (SUBPART 0
[FLT]      (COND
[FLT]       (REGEX "^Hhh*" "Hhhh" 4
[FLT]        <0
[FLT]        :otf=hebr=ccmp+mark 4>))))
[FLT]   (RESULT (12B2 1152 0 0) (051F 0 0 0) (0514 0 0 0)))



> By the way, do you have a better font than 'dejavu sans' for
> Hebrew?

There are two major options.  The first is the fonts from the Culmus
package (http://culmus.sourceforge.net/): Miriam Mono (blends with
Courier), David (serif), Nachlieli (the OpenOffice default), or most
of the others.

The other option is using msttcorefonts.

I'll try to approach Dejavu designers as well.

>
>>>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into
>>>> presentation form (U+FB1D). This should only happen with specific
>>>> control (Either CGJ or ZWJ I'll check).

> Then what is the correct rendering of the sequence "\u05D9\u05B4"?
Hiriq should be rendered below the baseline, as it is under all other letters.


> See the attached image.

I'm sorry, but the attachment was lost. Can you please resend it?




* Re: Composing Hebrew diacriticals
  2010-05-17 17:32                                   ` Eli Zaretskii
@ 2010-05-18  0:36                                     ` Kenichi Handa
  0 siblings, 0 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-18  0:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 247 bytes --]

In article <83ljbibfiy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > 
> > See the attached image.

> I didn't receive any attachments.

Oops, sorry, here it is.

---
Kenichi Handa
handa@m17n.org


[-- Attachment #2: hebrew.png --]
[-- Type: image/png, Size: 901 bytes --]


* Re: Composing Hebrew diacriticals
  2010-05-17  4:35                                 ` Kenichi Handa
  2010-05-17 17:32                                   ` Eli Zaretskii
  2010-05-17 21:08                                   ` Yair F
@ 2010-05-18  7:29                                   ` Eli Zaretskii
  2 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-18  7:29 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 17 May 2010 13:35:42 +0900
> Cc: emacs-devel@gnu.org
> 
> >>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into
> >>> presentation form (U+FB1D). This should only happen with specific
> >>> control (Either CGJ or ZWJ I'll check).
> > >
> > > I tried to show that sequence with gedit and Firefox.
> > > Both show it with U+FB1D.
> 
> > Kate shows this as well; Firefox doesn't. Nevertheless, it's still a bug.
> 
> Then what is the correct rendering of the sequence "\u05D9\u05B4"?
> Applying only the `mark' feature?  With the Dejavu Sans font, U+FB1D
> has the lower dot above the baseline, but applying only the `mark'
> feature displays the dot below the baseline.
> 
> See the attached image.  The left one is the composition
> with `ccmp' GSUB feature (i.e. into U+FB1D) for the sequence
> "\u05D9\u200D\u05B4", and the right one is the composition
> with `mark' GPOS feature for the sequence "\u05D9\u05B4".
> Is this the correct rendering?

IMHO, U+FB1D looks quite ugly with this font, but I guess that's not
really the issue here.  Otherwise, looks good to me, although I'm far
from being an expert on Hebrew typesetting.




* Re: Composing Hebrew diacriticals
  2010-05-12  8:04                   ` Kenichi Handa
  2010-05-12 17:35                     ` Eli Zaretskii
@ 2010-05-19 17:23                     ` Eli Zaretskii
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-19 17:23 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Wed, 12 May 2010 17:04:09 +0900
> 
> I found that cursor positioning sometimes failed even for a text
> that doesn't involve composition.  For instance, when I have this
> text in a buffer with bidi-display-reordering set to t,
> ----------
> A א B.
> בג
> ----------
> and point is at (i.e. before) "A", typing C-e moves point to
> "ג".

Cursor motion problems aside, this turned out to be a genuine bug in
move-end-of-line: it assumed that the first character in every line
necessarily follows the newline which ends the previous line.  This
is, of course, false when characters are reordered for display under
bidi-display-reordering.

I think I fixed it now (revno 100369).





* Re: Composing Hebrew diacriticals
  2010-05-17 21:08                                   ` Yair F
@ 2010-05-20  2:09                                     ` Kenichi Handa
  2010-05-25  1:45                                       ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-20  2:09 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTinYS8B06ev3fWapc4eRqy_4Q2UQ7UPQggJRdmnw@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:
> > By the way, do you have a better font than 'dejavu sans' for
> > Hebrew?

> There are two major options. The first is the fonts from the Culmus
> package (http://culmus.sourceforge.net/):
> Miriam Mono (blends with Courier), David (serif), Nachlieli (the
> OpenOffice default), or most of the others.

> The other option is using msttcorefonts.

I downloaded the Culmus fonts.  It seems the "David CLM" font is
quite good.  And, while testing with that font, I found a
bug in the current Emacs code.  I'll fix it soon.  Please
wait for a while.

> I'll try to approach Dejavu designers as well.

Thank you.  Another problem I found with Dejavu Sans is that
the lookup tables implementing the "mark" feature for Hebrew
don't have the proper LookupFlag to ignore preceding combining
marks (GDEF class 3).  I think that's why it can't compose
two or more diacriticals correctly.
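Here is a simplified model of why the missing flag matters.  When a mark-to-base lookup searches backward for the glyph that a second point should attach to, it skips only the glyphs that its LookupFlag tells it to ignore.  The sketch below assumes an engine that honours the IgnoreMarks bit (0x0008) literally, and it uses the GDEF class numbers from the OpenType specification.  The function names are hypothetical, and real shapers may apply mark-skipping rules of their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GDEF glyph class values from the OpenType specification. */
enum gdef_class { GDEF_BASE = 1, GDEF_LIGATURE = 2, GDEF_MARK = 3,
                  GDEF_COMPONENT = 4 };

/* LookupFlag bit telling the engine to skip mark glyphs. */
#define LOOKUP_IGNORE_MARKS 0x0008u

/* Nonzero if a glyph of class CLS is skipped by a lookup with FLAG.
   Only IgnoreMarks is modelled; other flag bits are not.  */
static int
glyph_skipped (int cls, uint16_t flag)
{
  return cls == GDEF_MARK && (flag & LOOKUP_IGNORE_MARKS) != 0;
}

/* Search backward from the mark at index MARK for the glyph it would
   attach to.  Return that glyph's index if it is a base glyph, or -1
   if the nearest non-skipped glyph is not a base.  */
static int
find_base_for_mark (const int *classes, int mark, uint16_t flag)
{
  assert (classes != NULL && mark >= 0);
  for (int j = mark - 1; j >= 0; j--)
    {
      if (glyph_skipped (classes[j], flag))
        continue;
      return classes[j] == GDEF_BASE ? j : -1;
    }
  return -1;
}
```

Consider a letter followed by two points (classes {1, 3, 3}).  The first point always finds the letter.  The second point finds it only when IgnoreMarks is set.  Without the flag, the search stops at the first point and attachment fails, which is consistent with the failure on two or more diacriticals.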

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-20  2:09                                     ` Kenichi Handa
@ 2010-05-25  1:45                                       ` Kenichi Handa
  2010-05-25 20:56                                         ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-25  1:45 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> I downloaded culmus fonts.  It seems "David CLM" font is
> quite good.  And, while testing with that font, I found a
> bug in the current Emacs code.  I'll fix it soon.  Please
> wait for a while.

I've just committed fixes.  Please try the latest code.

With the David CLM font, font-shape-gstring alone works well.  But
with the Dejavu Sans font, which lacks several OTF capabilities,
we have to adjust the result of font-shape-gstring.  For
that, I implemented hebrew-shape-gstring.  So now Emacs
should be able to use both fonts.

I have not yet adopted your code for setting up
composition-function-table.  It requires some modifications.
For instance, we should register hebrew-shape-gstring only
for combining characters for fast displaying of a Hebrew
text that doesn't have diacriticals.
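The following is a sketch of that refinement.  It assumes the trigger is simply "does this run contain a Hebrew point or accent?"  The ranges below are the nonspacing marks (general category Mn) of the Hebrew block.  By contrast, U+05BE, U+05C0, U+05C3 and U+05C6 are punctuation, and U+05D0..U+05EA are letters.  The function names are illustrative, not existing Emacs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for the nonspacing marks (general category Mn) of the Hebrew
   block: cantillation accents, vowel points, dagesh, meteg, rafe,
   shin/sin dots, upper/lower dots and qamats qatan.  */
static bool
hebrew_mark_p (uint32_t c)
{
  return (c >= 0x0591 && c <= 0x05BD)
         || c == 0x05BF
         || c == 0x05C1 || c == 0x05C2
         || c == 0x05C4 || c == 0x05C5
         || c == 0x05C7;
}

/* True if the run S[0..N) contains at least one Hebrew mark and so
   needs the shaping function; runs of bare letters can skip it.  */
static bool
run_needs_shaping (const uint32_t *s, size_t n)
{
  assert (s != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (hebrew_mark_p (s[i]))
      return true;
  return false;
}
```

With this check, the common case of unpointed text never reaches the shaper, and the cost of shaping is paid only where diacriticals actually occur.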

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-25  1:45                                       ` Kenichi Handa
@ 2010-05-25 20:56                                         ` Yair F
  2010-05-26  0:36                                           ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-25 20:56 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Tue, May 25, 2010 at 4:45 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> I've just committed fixes.  Please try the latest code.
>
> With the David CLM font, font-shape-gstring alone works well.
I was not able to compose with this code (revision 100441) using any of
the CLM or MS fonts.

MDEBUG_FLT doesn't produce any output. Instead, this appears in the *Messages* buffer:
Error during redisplay: (wrong-type-argument arrayp nil) [148 times]

>  But
> with Dejavu Sans font, as it lacks several OTF capabilities

Dejavu Sans still composes with no difference from my last report.
> I have not yet adopted your code for setting up
> composition-function-table.  It requires some modifications.
> For instance, we should register hebrew-shape-gstring only
> for combining characters for fast displaying of a Hebrew
> text that doesn't have diacriticals.

Good, it needs further refinement.
In addition, I found out that scholars sometimes compose Hebrew letters
with diacritical marks from the U+0300 block as well.




* Re: Composing Hebrew diacriticals
  2010-05-25 20:56                                         ` Yair F
@ 2010-05-26  0:36                                           ` Kenichi Handa
  2010-05-26  4:37                                             ` Yair F
  2010-05-26 13:28                                             ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier
  0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-26  0:36 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTil_LKUI8n6kQvbj6CIDtPi3UQnyfDFp1dPunxOy@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Tue, May 25, 2010 at 4:45 AM, Kenichi Handa <handa@m17n.org> wrote:
> > In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> > I've just committed fixes.  Please try the latest code.
> >
> > With the David CLM font, font-shape-gstring alone works well.
> I was not able to compose with this code (revision 100441) using any of
> the CLM or MS fonts.

> MDEBUG_FLT doesn't produce any output. Instead, this appears in the *Messages* buffer:
> Error during redisplay: (wrong-type-argument arrayp nil) [148 times]

Ummm, strange.  I don't see that message.  I can see correct
Hebrew composing with this:

% emacs -Q
ESC : (setq-default bidi-display-reordering t) RET
ESC : (set-fontset-font t 'hebrew '("David CLM" . "unicode-bmp")) RET

Please run Emacs under gdb, set breakpoint at Fsignal, and
find out which code causes the above error.

% cd .../emacs/src
% gdb emacs
(gdb) br Fsignal
(gdb) run -Q

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-26  0:36                                           ` Kenichi Handa
@ 2010-05-26  4:37                                             ` Yair F
  2010-05-26  6:00                                               ` Kenichi Handa
  2010-05-26 13:28                                             ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier
  1 sibling, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-26  4:37 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Wed, May 26, 2010 at 3:36 AM, Kenichi Handa <handa@m17n.org> wrote:
> Please run Emacs under gdb, set breakpoint at Fsignal, and
> find out which code causes the above error.

It's been a while since I last ran gdb, and I have never run it on Emacs.
If you need more information, please specify the commands.

(gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default
bidi-display-reordering t) (setq debug-on-error t) (message
"\u05E9\u05C2"))'
Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn
'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t)
(setq debug-on-error t) (message "\u05E9\u05C2"))'
[Thread debugging using libthread_db enabled]

Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
1686      register struct handler *allhandlers = handlerlist;
(gdb) bt
#0  Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
#1  0x08301401 in xsignal (error_symbol=142100970, data=143920942) at
eval.c:1796
#2  0x08301467 in xsignal2 (error_symbol=142100970, arg1=142101594,
arg2=142070194) at eval.c:1820
#3  0x082dea27 in wrong_type_argument (predicate=0, value=5) at data.c:118
#4  0x082e612e in Faref (array=142070194, idx=0) at data.c:2215
#5  0x0837c054 in Fbyte_code (bytestr=140298249, vector=140343389,
maxdepth=8) at bytecode.c:980
#6  0x08305d62 in funcall_lambda (fun=140343341, nargs=1,
arg_vector=0xbfffa9a0) at eval.c:3260
#7  0x08305786 in apply_lambda (fun=140343341, args=140362406,
eval_flag=1) at eval.c:3184
#8  0x083035d1 in Feval (form=140362398) at eval.c:2456
#9  0x083001a8 in Flet (args=140362590) at eval.c:1072
#10 0x08302c6b in Feval (form=140362382) at eval.c:2361
#11 0x082fe7c0 in Fprogn (args=140363030) at eval.c:416
#12 0x08305c53 in funcall_lambda (fun=140363046, nargs=1,
arg_vector=0xbfffad94) at eval.c:3253
#13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
#14 0x0837afb2 in Fbyte_code (bytestr=140347985, vector=140348053,
maxdepth=24) at bytecode.c:680
#15 0x08305d62 in funcall_lambda (fun=140347925, nargs=5,
arg_vector=0xbfffb1e0) at eval.c:3260
#16 0x08305541 in Ffuncall (nargs=6, args=0xbfffb1dc) at eval.c:3119
#17 0x083010a8 in internal_condition_case_n (bfun=0x8304689
<Ffuncall>, nargs=6, args=0xbfffb1dc, handlers=142070218,
hfun=0x808f2b6 <safe_eval_handler>) at eval.c:1656
#18 0x0808f358 in safe_call (nargs=6, args=0xbfffb1dc) at xdisp.c:2454
#19 0x083b2c39 in autocmp_chars (rule=144490029, charpos=1, bytepos=1,
limit=3, win=0x8b1dc70, face=0x8b10940, string=142070194) at
composite.c:991
#20 0x083b57d4 in composition_reseat_it (cmp_it=0xbfffdab8, charpos=1,
bytepos=1, endpos=3, w=0x8b1dc70, face=0x8b10940, string=142070194) at
composite.c:1321
#21 0x0809ddf5 in next_element_from_buffer (it=0xbfffd600) at xdisp.c:6984
#22 0x0809a219 in get_next_display_element (it=0xbfffd600) at xdisp.c:5886
#23 0x0809e56a in move_it_in_display_line_to (it=0xbfffd600,
to_charpos=3, to_x=-1, op=MOVE_TO_POS) at xdisp.c:7246
#24 0x0809fcdd in move_it_to (it=0xbfffd600, to_charpos=3, to_x=-1,
to_y=-1, to_vpos=-1, op=8) at xdisp.c:7747
#25 0x080a5704 in resize_mini_window (w=0x8b1dc70, exact_p=0) at xdisp.c:9242
#26 0x080a4ed0 in display_echo_area_1 (a1=145874032, a2=142070194,
a3=0, a4=0) at xdisp.c:9105
#27 0x080a3aba in with_echo_area_buffer (w=0x8b1dc70, which=0,
fn=0x80a4eaa <display_echo_area_1>, a1=145874032, a2=142070194, a3=0,
a4=0) at xdisp.c:8892
#28 0x080a4e78 in display_echo_area (w=0x8b1dc70) at xdisp.c:9073
#29 0x080a6a13 in echo_area_display (update_frame_p=1) at xdisp.c:9671
#30 0x080a2716 in message3_nolog (m=145559521, nbytes=4, multibyte=1)
at xdisp.c:8568
#31 0x080a21e4 in message3 (m=145559521, nbytes=4, multibyte=1) at xdisp.c:8503
#32 0x082f68ff in Fmessage (nargs=1, args=0xbfffe070) at editfns.c:3418
#33 0x08302e13 in Feval (form=143921278) at eval.c:2387
#34 0x082fe7c0 in Fprogn (args=143921262) at eval.c:416
#35 0x08302c6b in Feval (form=143921350) at eval.c:2361
#36 0x08304fbf in Ffuncall (nargs=2, args=0xbfffe320) at eval.c:3073
#37 0x0837afb2 in Fbyte_code (bytestr=140089481, vector=140089957,
maxdepth=40) at bytecode.c:680
#38 0x08305d62 in funcall_lambda (fun=140089453, nargs=1,
arg_vector=0xbfffe614) at eval.c:3260
#39 0x08305541 in Ffuncall (nargs=2, args=0xbfffe610) at eval.c:3119
#40 0x0837afb2 in Fbyte_code (bytestr=140072329, vector=140073261,
maxdepth=28) at bytecode.c:680
#41 0x08305d62 in funcall_lambda (fun=140072309, nargs=0,
arg_vector=0xbfffe8f4) at eval.c:3260
#42 0x08305541 in Ffuncall (nargs=1, args=0xbfffe8f0) at eval.c:3119
#43 0x0837afb2 in Fbyte_code (bytestr=140069033, vector=140069253,
maxdepth=24) at bytecode.c:680
#44 0x08305d62 in funcall_lambda (fun=140069013, nargs=0,
arg_vector=0xbfffeb20) at eval.c:3260
#45 0x08305786 in apply_lambda (fun=140069013, args=142070194,
eval_flag=1) at eval.c:3184
#46 0x083035d1 in Feval (form=142425222) at eval.c:2456
#47 0x0823350a in top_level_2 () at keyboard.c:1365
#48 0x08300d82 in internal_condition_case (bfun=0x82334f7
<top_level_2>, handlers=142100922, hfun=0x8232fe7 <cmd_error>) at
eval.c:1510
#49 0x08233541 in top_level_1 () at keyboard.c:1373
#50 0x083007bf in internal_catch (tag=142099042, func=0x823350c
<top_level_1>, arg=142070194) at eval.c:1246
#51 0x0823347a in command_loop () at keyboard.c:1328
#52 0x082326ee in recursive_edit_1 () at keyboard.c:950
#53 0x08232c0b in Frecursive_edit () at keyboard.c:1012
#54 0x0823088c in main (argc=6, argv=0xbffff324) at emacs.c:1801

Lisp Backtrace:
"lgstring-header" (0xbfffa9a0)
"let" (0xbfffac2c)
"hebrew-shape-gstring" (0xbfffad94)
"auto-compose-chars" (0xbfffb1e0)
"message" (0xbfffe070)
"progn" (0xbfffe22c)
"eval" (0xbfffe324)
"command-line-1" (0xbfffe614)
"command-line" (0xbfffe8f4)
"normal-top-level" (0xbfffeb20)




* Re: Composing Hebrew diacriticals
  2010-05-26  4:37                                             ` Yair F
@ 2010-05-26  6:00                                               ` Kenichi Handa
  2010-05-26 16:12                                                 ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-26  6:00 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTilIzGMNuvACJucMikm5z0U-kik58W-5ezcB-X1G@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Wed, May 26, 2010 at 3:36 AM, Kenichi Handa <handa@m17n.org> wrote:
> > Please run Emacs under gdb, set breakpoint at Fsignal, and
> > find out which code causes the above error.

> It's been a while since I last ran gdb, and I have never run it on Emacs.
> If you need more information, please specify the commands.

> (gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default
> bidi-display-reordering t) (setq debug-on-error t) (message
> "\u05E9\u05C2"))'
> Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn
> 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t)
> (setq debug-on-error t) (message "\u05E9\u05C2"))'
> [Thread debugging using libthread_db enabled]

> Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
> 1686      register struct handler *allhandlers = handlerlist;
> (gdb) bt
> #0  Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
[...]
> #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130

Please show me the result of this:

(gdb) up 13
(gdb) pp args[1]

---
Kenichi Handa
handa@m17n.org




* Enabling bidi (was: Composing Hebrew diacriticals)
  2010-05-26  0:36                                           ` Kenichi Handa
  2010-05-26  4:37                                             ` Yair F
@ 2010-05-26 13:28                                             ` Stefan Monnier
  2010-05-26 17:14                                               ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Stefan Monnier @ 2010-05-26 13:28 UTC (permalink / raw)
  To: emacs-devel

> % emacs -Q
> ESC : (setq-default bidi-display-reordering t) RET

I think it's time we set bidi-display-reordering to t by default.
Any objection?


        Stefan




* Re: Composing Hebrew diacriticals
  2010-05-26  6:00                                               ` Kenichi Handa
@ 2010-05-26 16:12                                                 ` Yair F
  2010-05-27  7:27                                                   ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-26 16:12 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Wed, May 26, 2010 at 9:00 AM, Kenichi Handa <handa@m17n.org> wrote:
>> (gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default
>> bidi-display-reordering t) (setq debug-on-error t) (message
>> "\u05E9\u05C2"))'
>> Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn
>> 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t)
>> (setq debug-on-error t) (message "\u05E9\u05C2"))'
>> [Thread debugging using libthread_db enabled]
>
>> Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
>> 1686      register struct handler *allhandlers = handlerlist;
>> (gdb) bt
>> #0  Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
> [...]
>> #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
>
> Please show me the result of this:
>
> (gdb) up 13
> (gdb) pp args[1]
#13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
3130            val = funcall_lambda (fun, numargs, args + 1);
(gdb) pp args[1]
[[#<font-object "-bitstream-David
CLM-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1"> 1513 1474] nil [0
0 1513 141 15 0 16 15 1 nil] [1 1 1474 113 0 3 6 17 -13 nil] nil nil
nil nil nil nil]
(gdb)




* Re: Enabling bidi (was: Composing Hebrew diacriticals)
  2010-05-26 13:28                                             ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier
@ 2010-05-26 17:14                                               ` Eli Zaretskii
  2010-05-27  4:13                                                 ` Enabling bidi Stefan Monnier
  0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-26 17:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 26 May 2010 09:28:13 -0400
> 
> > % emacs -Q
> > ESC : (setq-default bidi-display-reordering t) RET
> 
> I think it's time we set bidi-display-reordering to t by default.
> Any objection?

There are still 2 problems I'd like to fix before we do this:

  . Annoying unnecessary scrolls when moving cursor through continued
    lines (for one of its manifestations, see
    http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00634.html)

  . Cursor positioning problems with truncated lines (cursor gets
    stuck at end of lines truncated on the left)

These problems affect anyone who turns on the bidi reordering, even if
all the characters in the buffer are strict L2R and no reordering
actually happens.

The first problem will almost certainly be fixed this weekend (I
already know where the bug is and how to fix it).  The second one
needs to be debugged, but I hope to get it nailed by this coming
Sunday as well.

However, if you think these problems are not grave enough to delay, go
ahead and turn it on, and let's see how much I screwed up.




* Re: Enabling bidi
  2010-05-26 17:14                                               ` Eli Zaretskii
@ 2010-05-27  4:13                                                 ` Stefan Monnier
  2010-05-27 17:43                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 88+ messages in thread
From: Stefan Monnier @ 2010-05-27  4:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> However, if you think these problems are not grave enough to delay, go
> ahead and turn it on, and let's see how much I screwed up.

If you still have enough work to do as is, then we can leave it nil
for now.  But please set it to t as soon as you're done with such
"obvious problems".


        Stefan




* Re: Composing Hebrew diacriticals
  2010-05-26 16:12                                                 ` Yair F
@ 2010-05-27  7:27                                                   ` Kenichi Handa
  2010-05-27 21:59                                                     ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-27  7:27 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTinqUAXWipZRQoYQMnnTSWbSJ5rIppF4EjPlfx27@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> > (gdb) up 13
> > (gdb) pp args[1]
> #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
> 3130            val = funcall_lambda (fun, numargs, args + 1);
> (gdb) pp args[1]
> [[#<font-object "-bitstream-David
> CLM-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1"> 1513 1474] nil [0
> 0 1513 141 15 0 16 15 1 nil] [1 1 1474 113 0 3 6 17 -13 nil] nil nil
> nil nil nil nil]
> (gdb)

This indicates that somehow the "shape" method of a font
driver didn't work correctly.  Please set a breakpoint at
Ffont_shape_gstring and investigate why that function
doesn't reach this last line by stepping through the code
with gdb's n(ext) command:

L4600:  return composition_gstring_put_cache (gstring, XINT (n));

By the way, the Lisp error was because of a bug in
hebrew-shape-gstring.  I've committed a fix.

---
Kenichi Handa
handa@m17n.org




* Re: Enabling bidi
  2010-05-27  4:13                                                 ` Enabling bidi Stefan Monnier
@ 2010-05-27 17:43                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-27 17:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Thu, 27 May 2010 00:13:01 -0400
> 
> If you still have enough work to do as is, then we can leave it nil
> for now.  But please set it to t as soon as you're done with such
> "obvious problems".

Will do.




* Re: Composing Hebrew diacriticals
  2010-05-27  7:27                                                   ` Kenichi Handa
@ 2010-05-27 21:59                                                     ` Yair F
  2010-05-28  0:42                                                       ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-27 21:59 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Thu, May 27, 2010 at 10:27 AM, Kenichi Handa <handa@m17n.org> wrote:

> This indicates that somehow the "shape" method of a font
> driver didn't work correctly.  Please set the break point at
> Ffont_shape_gstring, and investigate why that function
> doesn't reach this last code by stepping through the code
> using n(ext) command of gdb:

Ffont_shape_gstring fails because otf == 0.

p *ftfont_info
$5 = {
  font = {
    size = 1075838994,
    next = 0x8bef578,
    props = {142227066, 145465234, 145465186, 142070194, 142222906,
102720, 102528, 102656, 108, 142070194, 142070194, 0, 144727838,
142070194, 145453985, 145453969, 143750657, 145465258},
    max_width = 19,
    pixel_size = 27,
    height = 29,
    space_width = 10,
    average_width = 12,
    min_width = 10,
    ascent = 22,
    descent = 7,
    underline_thickness = 1,
    underline_position = 2,
    vertical_centering = 0,
    encoding_type = 200 '\310',
    baseline_offset = 0,
    relative_compose = 0,
    default_ascent = 0,
    font_encoder = 0x8b8b0e0,
    driver = 0x8771fa0,
    encoding_charset = -1,
    repertory_charset = -1
  },
  maybe_otf = 0,
  otf = 0x0,
  ft_size = 0x8ae9010,
  index = 35,
  matrix = {
    xx = 0,
    xy = 38,
    yx = 147122784,
    yy = 39
  }
}




* Re: Composing Hebrew diacriticals
  2010-05-27 21:59                                                     ` Yair F
@ 2010-05-28  0:42                                                       ` Kenichi Handa
  2010-06-01  8:58                                                         ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-28  0:42 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTilqW4R3C5RsSWEXkRMUOR6JH0pppYIMclqnqwAq@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Thu, May 27, 2010 at 10:27 AM, Kenichi Handa <handa@m17n.org> wrote:
> > This indicates that somehow the "shape" method of a font
> > driver didn't work correctly.  Please set the break point at
> > Ffont_shape_gstring, and investigate why that function
> > doesn't reach this last code by stepping through the code
> > using n(ext) command of gdb:

> Ffont_shape_gstring fails because otf == 0.

> p *ftfont_info
> $5 = {
[...]
>   maybe_otf = 0,
>   otf = 0x0,

Then please find why maybe_otf and otf are set to zero by
stepping through the code of ftfont_get_otf which is called
from ftfont_shape.

BTW, are you sure that your Emacs is compiled with libotf?

---
Kenichi Handa
handa@m17n.org




* Re: Composing Hebrew diacriticals
  2010-05-28  0:42                                                       ` Kenichi Handa
@ 2010-06-01  8:58                                                         ` Yair F
  0 siblings, 0 replies; 88+ messages in thread
From: Yair F @ 2010-06-01  8:58 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote:

> Then please find why maybe_otf and otf are set to zero by
> stepping through the code of ftfont_get_otf which is called
> from ftfont_shape.

ftfont_get_otf sets otf only if maybe_otf != 0.

maybe_otf is initialized from ft_face->face_flags in xftfont_open.
For David CLM maybe_otf = 0 because ft_face->face_flags = 2577.
For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649.




* Re: Composing Hebrew diacriticals
       [not found] <tl7fx0v9nra.fsf@m17n.org>
@ 2010-06-15 11:02 ` Kenichi Handa
  2010-06-24  6:33   ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-06-15 11:02 UTC (permalink / raw)
  To: emacs-devel; +Cc: yair.f.lists

In article <AANLkTinkfapIXNSnij20psfpKU1ZKS-6wJsVIDbVaQ7i@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote:
> > Then please find why maybe_otf and otf are set to zero by
> > stepping through the code of ftfont_get_otf which is called
> > from ftfont_shape.

> ftfont_get_otf sets otf only if maybe_otf != 0.

> maybe_otf is initialized from ft_face->face_flags in xftfont_open.
> For David CLM maybe_otf = 0 because ft_face->face_flags = 2577.
> For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649.

That's very strange.  Perhaps your David CLM font is
different from mine.

In freetype.h, FT_FACE_FLAG_SFNT is explained as this:

  /*    FT_FACE_FLAG_SFNT ::                                               */
  /*      Indicates that the face uses the `sfnt' storage scheme.  For     */
  /*      now, this means TrueType and OpenType.                           */

So, if the font doesn't have this flag set, it means that
the font is surely not OTF.
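Decoding the two face_flags values reported above shows the difference directly.  The bit values are copied from FreeType's freetype.h, where FT_FACE_FLAG_SFNT is 1 << 3, i.e. 8.  They are renamed here so that the sketch builds without FreeType.  Treating maybe_otf as the masked SFNT bit is an assumption, but it agrees with the values 0 and 8 that Yair saw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bit values as defined for FT_FACE_FLAG_* in FreeType's freetype.h. */
#define FACE_FLAG_SCALABLE     (1L << 0)
#define FACE_FLAG_SFNT         (1L << 3)
#define FACE_FLAG_HORIZONTAL   (1L << 4)
#define FACE_FLAG_KERNING      (1L << 6)
#define FACE_FLAG_GLYPH_NAMES  (1L << 9)
#define FACE_FLAG_HINTER       (1L << 11)

/* Assumed equivalent of the maybe_otf initialization: keep only the
   sfnt bit, giving 8 for sfnt faces and 0 otherwise.  */
static long
maybe_otf_from_flags (long face_flags)
{
  return face_flags & FACE_FLAG_SFNT;
}

/* Print the names of the flags listed above that are set in
   FACE_FLAGS, and return how many were printed.  */
static int
print_face_flags (FILE *out, long face_flags)
{
  static const struct { long bit; const char *name; } names[] = {
    { FACE_FLAG_SCALABLE, "SCALABLE" },
    { FACE_FLAG_SFNT, "SFNT" },
    { FACE_FLAG_HORIZONTAL, "HORIZONTAL" },
    { FACE_FLAG_KERNING, "KERNING" },
    { FACE_FLAG_GLYPH_NAMES, "GLYPH_NAMES" },
    { FACE_FLAG_HINTER, "HINTER" },
  };
  int count = 0;

  assert (out != NULL);
  fprintf (out, "%ld:", face_flags);
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (face_flags & names[i].bit)
      {
        fprintf (out, " %s", names[i].name);
        count++;
      }
  fputc ('\n', out);
  return count;
}
```

For 2577, print_face_flags prints SCALABLE HORIZONTAL GLYPH_NAMES HINTER.  For 2649 it additionally prints SFNT and KERNING.  Only the second face advertises sfnt storage, so only that face is probed for OTF tables.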

This is some info about my David CLM font.

% ls -l DavidCLM-Medium.ttf
-rw-r--r-- 1 handa handa 24156 2010-06-15 09:48 DavidCLM-Medium.ttf
% fc-list 'david clm' capability
:capability=otlayout\:hebr
% od -t x1 DavidCLM-Medium.ttf |head
0000000 00 01 00 00 00 10 01 00 00 04 00 00 46 46 54 4d
0000020 4f 58 4a 2a 00 00 5e 40 00 00 00 1c 47 44 45 46
0000040 08 87 07 9c 00 00 50 24 00 00 00 6e 47 50 4f 53
0000060 c3 06 cd 7e 00 00 55 34 00 00 09 0a 47 53 55 42
0000100 48 82 52 49 00 00 50 94 00 00 04 9e 4f 53 2f 32
0000120 89 5b 2c ee 00 00 01 88 00 00 00 56 63 6d 61 70
0000140 ae 86 db a7 00 00 05 3c 00 00 02 0a 63 76 74 20
0000160 00 28 02 f8 00 00 07 48 00 00 00 04 67 61 73 70
0000200 ff ff 00 03 00 00 50 1c 00 00 00 08 67 6c 79 66
0000220 62 9d 8f 85 00 00 08 fc 00 00 3c 34 68 65 61 64
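The dump above is an sfnt header.  It starts with the version 00 01 00 00 (TrueType outlines) and 0x0010, i.e. 16 tables.  Then come 16-byte table records whose tags read FFTM, GDEF, GPOS, GSUB, OS/2, cmap, and so on.  GDEF, GPOS and GSUB are the OpenType layout tables that shaping relies on.  Below is a minimal sketch of checking for them.  It assumes the first bytes of the file are already in memory, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian readers; sfnt data is stored most significant byte first. */
static uint16_t
be16 (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t
be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | p[3];
}

/* Look for the 4-byte TAG in the sfnt table directory at the start of
   BUF (LEN bytes).  The directory is a 12-byte header (version,
   numTables, searchRange, entrySelector, rangeShift) followed by
   numTables 16-byte records (tag, checksum, offset, length).
   Return 1 if TAG is found, 0 if not (or if BUF ends first), and -1
   if BUF does not start with a TrueType or CFF ("OTTO") sfnt header.  */
static int
sfnt_has_table (const unsigned char *buf, size_t len, const char *tag)
{
  assert (buf != NULL && tag != NULL && strlen (tag) == 4);
  if (len < 12
      || (be32 (buf) != 0x00010000u && memcmp (buf, "OTTO", 4) != 0))
    return -1;

  uint16_t num_tables = be16 (buf + 4);
  for (uint16_t i = 0; i < num_tables; i++)
    {
      size_t off = 12 + (size_t) i * 16;
      if (off + 16 > len)
        break;                  /* directory truncated in BUF */
      if (memcmp (buf + off, tag, 4) == 0)
        return 1;
    }
  return 0;
}
```

Given the 80 bytes from the first five dump lines, sfnt_has_table (buf, 80, "GSUB") returns 1.  A buffer without an sfnt header yields -1.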

---
Kenichi Handa
handa@m17n.org

PS.  I got WiFi (WiMAX) now, and the Internet access has
been much improved. :-)




* Re: Composing Hebrew diacriticals
  2010-06-15 11:02 ` Kenichi Handa
@ 2010-06-24  6:33   ` Kenichi Handa
  2010-06-25 10:16     ` Eli Zaretskii
  2010-06-28 16:40     ` Yair F
  0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-06-24  6:33 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

Yair, could you please check your David CLM font with these
commands?

% ls -l DavidCLM-Medium.ttf
% fc-list 'david clm' capability
% od -t x1 DavidCLM-Medium.ttf |head

---
Kenichi Handa
handa@m17n.org

PS.  I left the hospital yesterday. :-)

In article <tl7eig8pnim.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <AANLkTinkfapIXNSnij20psfpKU1ZKS-6wJsVIDbVaQ7i@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:
> > On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote:
> > > Then please find why maybe_otf and otf are set to zero by
> > > stepping through the code of ftfont_get_otf which is called
> > > from ftfont_shape.

> > ftfont_get_otf sets otf only if maybe_otf != 0.

> > maybe_otf is initialized from ft_face->face_flags in xftfont_open.
> > For David CLM maybe_otf = 0 because ft_face->face_flags = 2577.
> > For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649.

> That's very strange.  Perhaps your David CLM font is
> different from mine.

> In freetype.h, FT_FACE_FLAG_SFNT is explained as this:

>   /*    FT_FACE_FLAG_SFNT ::                                               */
>   /*      Indicates that the face uses the `sfnt' storage scheme.  For     */
>   /*      now, this means TrueType and OpenType.                           */

> So, if the font doesn't have this flag set, it means that
> the font is surely not OTF.

> This is some info about my David CLM font.

> % ls -l DavidCLM-Medium.ttf
> -rw-r--r-- 1 handa handa 24156 2010-06-15 09:48 DavidCLM-Medium.ttf
> % fc-list 'david clm' capability
> :capability=otlayout\:hebr
> % od -t x1 DavidCLM-Medium.ttf |head
> 0000000 00 01 00 00 00 10 01 00 00 04 00 00 46 46 54 4d
> 0000020 4f 58 4a 2a 00 00 5e 40 00 00 00 1c 47 44 45 46
> 0000040 08 87 07 9c 00 00 50 24 00 00 00 6e 47 50 4f 53
> 0000060 c3 06 cd 7e 00 00 55 34 00 00 09 0a 47 53 55 42
> 0000100 48 82 52 49 00 00 50 94 00 00 04 9e 4f 53 2f 32
> 0000120 89 5b 2c ee 00 00 01 88 00 00 00 56 63 6d 61 70
> 0000140 ae 86 db a7 00 00 05 3c 00 00 02 0a 63 76 74 20
> 0000160 00 28 02 f8 00 00 07 48 00 00 00 04 67 61 73 70
> 0000200 ff ff 00 03 00 00 50 1c 00 00 00 08 67 6c 79 66
> 0000220 62 9d 8f 85 00 00 08 fc 00 00 3c 34 68 65 61 64

> ---
> Kenichi Handa
> handa@m17n.org

> PS.  I got WiFi (WiMAX) now, and the Internet access has
> been much improved. :-)





* Re: Composing Hebrew diacriticals
  2010-06-24  6:33   ` Kenichi Handa
@ 2010-06-25 10:16     ` Eli Zaretskii
  2010-06-28 16:40     ` Yair F
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-06-25 10:16 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Thu, 24 Jun 2010 15:33:06 +0900
> Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org
> 
> PS.  I left the hospital yesterday. :-)

Glad to hear that.

* Re: Composing Hebrew diacriticals
  2010-06-24  6:33   ` Kenichi Handa
  2010-06-25 10:16     ` Eli Zaretskii
@ 2010-06-28 16:40     ` Yair F
  2010-06-29  8:07       ` Kenichi Handa
  1 sibling, 1 reply; 88+ messages in thread
From: Yair F @ 2010-06-28 16:40 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Sorry for the late response.

Apparently the Culmus fonts are type1:
/usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font
text (DavidCLM-Medium 0.101)

But MS fonts are TTF, and they don't compose either.
/usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data


On Thu, Jun 24, 2010 at 9:33 AM, Kenichi Handa <handa@m17n.org> wrote:
> Yair, could you please check your David CLM font with these
> commands?
>
> % ls -l DavidCLM-Medium.ttf
> % fc-list 'david clm' capability
> % od -t x1 DavidCLM-Medium.ttf |head
>
> ---
> Kenichi Handa
> handa@m17n.org
>
> PS.  I left the hospital yesterday. :-)

This is the best news!

* Re: Composing Hebrew diacriticals
  2010-06-28 16:40     ` Yair F
@ 2010-06-29  8:07       ` Kenichi Handa
  2010-06-29 18:57         ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-06-29  8:07 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTilenRSGCRXJNj8TtdXqUlyoBOuk-PGld8geCah1@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Sorry for the late response.
> Apparently the Culmus fonts are type1:
> /usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font
> text (DavidCLM-Medium 0.101)

How did you install that font?  I downloaded
culmus-0.104.tar.gz from this page:
   http://sourceforge.net/projects/culmus/files/culmus/0.104/
and extracted DavidCLM-Medium.ttf from that tarball, and put
it under ~/.fonts.

Please try that (and uninstall the above type1 font), and
check if Emacs can use TrueType version of that font
correctly.

> But MS fonts are TTF, and they don't compose either.
> /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data

I tried that font too.  That font doesn't have OpenType
tables for hebrew script.

% fc-list arial family capability
Arial
Arial:capability=otlayout\:arab

But, the function hebrew-shape-gstring has workaround code
for such fonts, and in my environment, hebrew diacriticals
are surely composed (although the positioning is not
optimal).

---
Kenichi Handa
handa@m17n.org

* Re: Composing Hebrew diacriticals
  2010-06-29  8:07       ` Kenichi Handa
@ 2010-06-29 18:57         ` Yair F
  2010-06-30  5:27           ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-06-29 18:57 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]

On Tue, Jun 29, 2010 at 11:07 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <AANLkTilenRSGCRXJNj8TtdXqUlyoBOuk-PGld8geCah1@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:
>
>> Sorry for the late response.
>> Apparently the Culmus fonts are type1:
>> /usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font
>> text (DavidCLM-Medium 0.101)
>
> How did you install that font?  I downloaded
> culmus-0.104.tar.gz from this page:

This is from the culmus package on Ubuntu (and Debian, as well as most
other distributions). I would assume most Hebrew speakers on X-based
platforms will have these two packages installed. Most Hebrew-based
remixes package it.

>   http://sourceforge.net/projects/culmus/files/culmus/0.104/
> and extracted DavidCLM-Medium.ttf from that tarball, and put
> it under ~/.fonts.
>
> Please try that (and uninstall the above type1 font), and
> check if Emacs can use TrueType version of that font
> correctly.
I tried Keter-YG, which is IMO the best Hebrew font, and indeed
the rendering looks OK with my sample (see attached). This font comes
from culmus-ancient. The "problem" with that font is that it does
indeed have an ancient look.

>
>> But MS fonts are TTF, and they don't compose either.
>> /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data
>
> I tried that font too.  That font doesn't have OpenType
> tables for hebrew script.
>
> % fc-list arial family capability
> Arial
> Arial:capability=otlayout\:arab
>
> But, the function hebrew-shape-gstring has workaround code
> for such fonts, and in my environment, hebrew diacriticals
> are surely composed (although the positioning is not
> optimal).

I would say that the positioning is not sufficient. See the attached
screenshots of the same file.

[-- Attachment #2: arial.png --]
[-- Type: image/png, Size: 23621 bytes --]

[-- Attachment #3: keter.png --]
[-- Type: image/png, Size: 29097 bytes --]

* Re: Composing Hebrew diacriticals
  2010-06-29 18:57         ` Yair F
@ 2010-06-30  5:27           ` Kenichi Handa
       [not found]             ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-06-30  5:27 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTintXoyqvqO5Mqqbyci-AKuBqMYRyp7TBnVUKT-Z@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

>>> But MS fonts are TTF, and they don't compose either.
>>> /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data
> >
> > I tried that font too.  That font doesn't have OpenType
> > tables for hebrew script.
> >
> > % fc-list arial family capability
> > Arial
> > Arial:capability=otlayout\:arab
> >
> > But, the function hebrew-shape-gstring has workaround code
> > for such fonts, and in my environment, hebrew diacriticals
> > are surely composed (although the positioning is not
> > optimal).

> I would say that the positioning is not sufficient. See the attached
> screenshots of the same file.

Comparing images of different fonts in a script unfamiliar to
me is very difficult.  Please tell me exactly which character
sequence requires more than positioning, and show me images
of only that sequence.

Anyway, for fonts that don't have OpenType tables for Hebrew
script, we can do nothing other than artificially adjusting
glyph position.  Have you seen any other application
rendering Hebrew well with that Arial font?

---
Kenichi Handa
handa@m17n.org

* Re: Composing Hebrew diacriticals
       [not found]             ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>
  2010-06-30 21:48               ` Fwd: " Yair F
@ 2010-07-01  5:52               ` Kenichi Handa
  2010-07-01 20:30                 ` Yair F
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-07-01  5:52 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 9138 bytes --]

In article <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Sorry about that. Please find hebrew-sample2.txt, the source file.
> Arial-anottated.png is this file displayed using Emacs with the Arial font.
> The numbers in red refer to the following comments; the general flow is
> top-to-bottom, right-to-left:
> 1. The Shin dot should be rendered near the right leg. Currently it is
> rendered above the centre leg; this is unreadable.
> 2. All points below should be horizontally centred relative to the
> base letter. Currently it seems that they are aligned to the left.
> The exception to this rule is letters that have a single downward leg,
> such as ו, ר, ד, ז; for these letters the points should be rendered
> directly under the leg.
> 3. The Shva point touches Qof's leg; the result is unreadable.
> 4. The Dagesh point is hidden within the Shin letter.
> 5. This is not Hebrew, but the combining dot above should be composed
> with the letter A.
> 6. The Holam point should be to the left of the leg, not to the right.
> The result is unreadable.
> 7. The Shuruq point should be to the left of the vav letter, not to the
> right. The result is unreadable.

All those are glyph positioning problems and can be improved
by adding more code to hebrew-shape-gstring.

> > Anyway, for fonts that don't have OpenType tables for Hebrew
> > script, we can do nothing other than artificially adjusting
> > glyph position.  Have you seen any other application
> > rendering Hebrew well with that Arial font?
> OpenOffice and Firefox correctly render Hebrew points.

??? When I open your hebrew-sample2.txt with oowriter, and
specify Arial font, the rendering is almost (exactly?) the
same as that of Emacs (see the attached image).

I confirmed that Firefox (and all applications using
Pango/harfbuzz, e.g. gedit) do render Hebrew better with
Arial.  By reading the code of Pango, I found that it has a
fallback shaping engine that is used for fonts without
Hebrew GPOS OpenType tables.  Here's an excerpt from
pango/module/hebrew-shaper.c.  You'll see that it checks
various character combinations and adjusts glyph offsets
properly.  But the code has many magic numbers
(e.g. 3.5, 0.7, 0.5, 1/3, 3/5, ...).  I think it's a dirty &
ad-hoc hack.

Theoretically, it is possible to do the same thing in the
function hebrew-shape-gstring.  But, is it really worth
doing that?  Isn't it enough to tell Hebrew users to use
properly designed OpenType fonts?

============================================================
void
hebrew_shaper_get_cluster_kerning(gunichar            *cluster,
				  gint                cluster_length,
				  PangoRectangle      ink_rect[],

				  /* input and output */
				  gint                width[],
				  gint                x_offset[],
				  gint                y_offset[])
{
  int i;
  int base_ink_x_offset, base_ink_y_offset, base_ink_width, base_ink_height;
  gunichar base_char = cluster[0];

  x_offset[0] = 0;
  y_offset[0] = 0;

  if (cluster_length == 1)
    {
      /* Make lone 'vav dot' have zero width */
      if (base_char == UNI_SHIN_DOT
	  || base_char == UNI_SIN_DOT
	  || base_char == UNI_HOLAM
	  ) {
	x_offset[0] = -ink_rect[0].x - ink_rect[0].width;
	width[0] = 0;
      }

      return;
    }

  base_ink_x_offset = ink_rect[0].x;
  base_ink_y_offset = ink_rect[0].y;
  base_ink_width = ink_rect[0].width;
  base_ink_height = ink_rect[0].height;

  /* Do heuristics */
  for (i=1; i<cluster_length; i++)
    {
      int gl = cluster[i];
      x_offset[i] = 0;
      y_offset[i] = 0;

      /* Check if it is a point */
      if (gl < 0x5B0 || gl >= 0x05D0)
	continue;

      /* Center dot of VAV */
      if (gl == UNI_MAPIQ && base_char == UNI_VAV)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x;

	  /* If VAV is a vertical bar without a roof, then we
	     need to make room for the dot by increasing the
	     cluster width. But how can I check if that is the
	     case??
	  */
	  /* This is wild, but it does the job of differentiating
	     between two M$ fonts... Base the decision on the
	     aspect ratio of the vav...
	  */
	  if (base_ink_height > base_ink_width * 3.5)
	    {
	      int j;
	      double space = 0.7;
	      double kern = 0.5;

	      /* Shift all characters to make place for the mapiq */
	      for (j=0; j<i; j++)
		  x_offset[j] += ink_rect[i].width*(1+space-kern);

	      width[cluster_length-1] += ink_rect[i].width*(1+space-kern);
	      x_offset[i] -= ink_rect[i].width*(kern);
	    }
	}

      /* Dot over SHIN */
      else if (gl == UNI_SHIN_DOT && base_char == UNI_SHIN)
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* Dot over SIN */
      else if (gl == UNI_SIN_DOT && base_char == UNI_SHIN)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x;
	}

      /* VOWEL DOT above to any other character than
	 SHIN or VAV should stick out a bit to the left. */
      else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM)
	       && base_char != UNI_SHIN && base_char != UNI_VAV)
	{
	  x_offset[i] = base_ink_x_offset -ink_rect[i].x - ink_rect[i].width * 3/ 2;
	}

      /* VOWELS under resh or vav are right aligned, if they are
	 narrower than the characters. Otherwise they are centered.
       */
      else if ((base_char == UNI_VAV
		|| base_char == UNI_RESH
		|| base_char == UNI_YOD
		|| base_char == UNI_DALED
		)
	       && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) ||
		   gl == UNI_QUBUTS)
	       && ink_rect[i].width < base_ink_width
	       )
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* VOWELS under FINAL KAF are offset centered and offset in
	 y */
      else if ((base_char == UNI_FINAL_KAF
		)
	       && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) ||
		   gl == UNI_QUBUTS))
	{
	  /* x are at 1/3 to take into account the stem */
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 1/3 - ink_rect[i].width/2;

	  /* Center in y */
	  y_offset[i] = base_ink_y_offset - ink_rect[i].y
	    + base_ink_height * 1/2 - ink_rect[i].height/2;
	}


      /* MAPIQ in PE or FINAL PE */
      else if (gl == UNI_MAPIQ
	       && (base_char == UNI_PE || base_char == UNI_FINAL_PE))
	{
	  x_offset[i]= base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 2/3 - ink_rect[i].width/2;

	  /* Another option is to offset the MAPIQ in y...
	     glyphs->glyphs[cluster_start_idx+i].geometry.y_offset
	     -= base_ink_height/5; */
	}

      /* MAPIQ in SHIN should be moved a bit to the right */
      else if (gl == UNI_MAPIQ
	       && base_char == UNI_SHIN)
	{
	  x_offset[i]=  base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 3/5 - ink_rect[i].width/2;
	}

      /* MAPIQ in YUD is right aligned */
      else if (gl == UNI_MAPIQ
	       && base_char == UNI_YOD)
	{
	  x_offset[i]=  base_ink_x_offset - ink_rect[i].x;

	  /* Lower left in y */
	  y_offset[i] = base_ink_y_offset - ink_rect[i].y
	    + base_ink_height - ink_rect[i].height*1.75;

	  if (base_ink_height > base_ink_width * 2)
	    {
	      int j;
	      double space = 0.7;
	      double kern = 0.5;

	      /* Shift all cluster characters to make space for mapiq */
	      for (j=0; j<i; j++)
		x_offset[j] += ink_rect[i].width*(1+space-kern);

	      width[cluster_length-1] += ink_rect[i].width*(1+space-kern);
	    }

	}

      /* VOWEL DOT next to any other character */
      else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM)
	       && (base_char != UNI_VAV))
	{
	  x_offset[i] = base_ink_x_offset -ink_rect[i].x;
	}

      /* Move nikud of taf a bit ... */
      else if (base_char == UNI_TAV && gl == UNI_MAPIQ)
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 5/8 - ink_rect[i].width/2;
	}

      /* Move center dot of characters with a right stem and no
	 left stem. */
      else if (gl == UNI_MAPIQ &&
	       (base_char == UNI_BET
		|| base_char == UNI_DALED
		|| base_char == UNI_KAF
		|| base_char == UNI_GIMMEL
		))
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width * 3/8 - ink_rect[i].width/2;
	}

      /* Right align wide nikud under QOF */
      else if (base_char == UNI_QOF &&
	       ( (gl >= UNI_HATAF_SEGOL
		  && gl <= UNI_HATAF_QAMATZ)
		 || (gl >= UNI_TSERE
		     && gl<= UNI_QAMATS)
		 || (gl == UNI_QUBUTS)))
	{
	  x_offset[i] = base_ink_x_offset + base_ink_width
	    - ink_rect[i].x - ink_rect[i].width;
	}

      /* Center by default */
      else
	{
	  x_offset[i] = base_ink_x_offset - ink_rect[i].x
	    + base_ink_width/2 - ink_rect[i].width/2;
	}
    }

}
============================================================

> The poetry site
> you mentioned http://www.zemer.co.il/song.asp?id=393 uses David and
> is rendered correctly.
> Kate (using Pango?) also renders better with Arial and David-CLM. It has
> some other issues though, but the result is mostly readable.

As Kate is a KDE application, I think it's not using Pango.
But, if it renders Hebrew with Arial well, it (or the rendering
module of KDE/Qt) should have similar ad-hoc code.

---
Kenichi Handa
handa@m17n.org


[-- Attachment #2: oowriter-arial.png --]
[-- Type: image/png, Size: 79797 bytes --]

* Re: Composing Hebrew diacriticals
  2010-06-30 21:48               ` Fwd: " Yair F
@ 2010-07-01  5:59                 ` Miles Bader
  0 siblings, 0 replies; 88+ messages in thread
From: Miles Bader @ 2010-07-01  5:59 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

Just FYI, the Emacs rendering of your sample text looks correct in my
Gnus buffer, using the Truetype version of "Lucida Sans".

[by "correct" I mean, (1) it handles all the points you describe
correctly, and (2) everything looks "nice".]

-Miles

-- 
Suburbia: where they tear out the trees and then name streets after them.

* Re: Composing Hebrew diacriticals
  2010-07-01  5:52               ` Kenichi Handa
@ 2010-07-01 20:30                 ` Yair F
  2010-07-02  7:51                   ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-07-01 20:30 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Thu, Jul 1, 2010 at 8:52 AM, Kenichi Handa <handa@m17n.org> wrote:
> All those are glyph positioning problems and can be improved
> by adding more code to hebrew-shape-gstring.

What other problems do you expect? So far I see no other problems
regarding bidi or compositions.


> ??? When I open your hebrew-sample2.txt with oowriter, and
> specify Arial font, the rendering is almost (exactly?) the
> same as that of Emacs (see the attached image).
You are right. Maybe it was with a special Hebrew OO version that I don't
have now, or maybe on another OS. Current OO practice is "use proper
fonts" :(


> I think it's a dirty &
> ad-hoc hack.
>
> Theoretically, it is possible to do the same thing in the
> function hebrew-shape-gstring.  But, is it really worth
> doing that?  Isn't it enough to tell Hebrew users to use
> properly designed OpenType fonts?

The sad answer on free systems is that there are nearly no such fonts.
The common answer to "Why is Hebrew so ugly on Linux?" is "Install
Culmus and msttcorefonts".
I guess that is the reason for the tweaks you mentioned.

>
> As Kate is a KDE application, I think it's not using Pango.
> But, if it renders Hebrew with Arial well, it (or the rendering
> module of KDE/Qt) should have similar ad-hoc code.

Maybe; as you can see, I don't know much about rendering engines.

An additional and possibly less ugly path is to use presentation forms
when available (U+FB20). There are additional forms in the private use
area.



* Re: Composing Hebrew diacriticals
  2010-07-01 20:30                 ` Yair F
@ 2010-07-02  7:51                   ` Kenichi Handa
  2010-07-12  8:17                     ` Kenichi Handa
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-07-02  7:51 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTil7SZlvtMBLmfz3DG_wHKKki72LwSIITx53w0tf@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Thu, Jul 1, 2010 at 8:52 AM, Kenichi Handa <handa@m17n.org> wrote:
> > All those are glyph positioning problems and can be improved
> > by adding more code to hebrew-shape-gstring.

> What other problems do you expect?

Sorry, I just misread what you wrote "I would say that the
positioning is not sufficient" as "there should be more work
other than positioning".

> > I think it's a dirty &
> > ad-hoc hack.
> >
> > Theoretically, it is possible to do the same thing in the
> > function hebrew-shape-gstring.  But, is it really worth
> > doing that?  Isn't it enough to tell Hebrew users to use
> > properly designed OpenType fonts?

> The sad answer on free systems is that there are nearly no such fonts.
> The common answer to "Why is Hebrew so ugly on Linux?" is "Install
> Culmus and msttcorefonts".
> I guess that is the reason for the tweaks you mentioned.

Sigh...

> An additional and possibly less ugly path is to use presentation forms
> when available (U+FB20). There are additional forms in the private use
> area.

Hmmm, that seems to be a practical approach provided that
the presentation forms cover most of the frequently used
character combinations.  I'll try to implement it.

---
Kenichi Handa
handa@m17n.org

* Re: Composing Hebrew diacriticals
  2010-07-02  7:51                   ` Kenichi Handa
@ 2010-07-12  8:17                     ` Kenichi Handa
  2010-07-12 21:10                       ` Yair F
  0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-07-12  8:17 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

In article <tl7hbki48zt.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> > An additional and possibly less ugly path is to use presentation forms
> > when available (U+FB20). There are additional forms in the private use
> > area.

> Hmmm, that seems to be a practical approach provided that
> the presentation forms cover most of the frequently used
> character combinations.  I'll try to implement it.

I've just committed the code to do that.  I tested with the
Arial font and it seems that most of the points you listed
are now solved, except for this:

5. This is not Hebrew, but the combining dot above should be composed
with the letter A.

It seems that the Arial font doesn't have a glyph for #x307.

When you set both the default font and the font for #x307 to
"dejavu sans mono", #x307 is composed with the preceding
"A".

---
Kenichi Handa
handa@m17n.org

* Re: Composing Hebrew diacriticals
  2010-07-12  8:17                     ` Kenichi Handa
@ 2010-07-12 21:10                       ` Yair F
  2010-07-13  4:11                         ` Kenichi Handa
  2010-07-13 12:01                         ` Eli Zaretskii
  0 siblings, 2 replies; 88+ messages in thread
From: Yair F @ 2010-07-12 21:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Mon, Jul 12, 2010 at 11:17 AM, Kenichi Handa <handa@m17n.org> wrote:

> I've just committed the code to do that.  I tested with the
> Arial font and it seems that most of the points you listed
> are now solved, except for this:

Now it's much, much better. Thank you!
Here are some more improvements needed:
The placement of the Holam (05B9) point seems to be top-center. It should
be top-left instead.
Specifically, for the Lamed (05DC) base letter it should be to the left of
the top vertical leg.
Some fonts have a presentation form for that at E804.

The Sheva (05B0) and Qamats (05B8) points should be shifted above the
baseline, to approximately the center-center position, when composed
with Final Kaf (05DA).
Again, some fonts precompose these at E802 and E803 respectively.

Currently I'm trying to hunt down a problem where, sometimes, when
transient-mode is active, some characters suddenly stop composing.
Once I get a recipe, I'll let you know.

Thanks again to you and Eli,
Yair

* Re: Composing Hebrew diacriticals
  2010-07-12 21:10                       ` Yair F
@ 2010-07-13  4:11                         ` Kenichi Handa
  2010-07-13  4:47                           ` Yair F
  2010-07-13 12:01                         ` Eli Zaretskii
  1 sibling, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-07-13  4:11 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel

In article <AANLkTimYSLAA4UTDjZ2MF5NhqOxtX_m7_oqQ6TanAMZl@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Now it's much, much better. Thank you!
> Here are some more improvements needed:
> The placement of the Holam (05B9) point seems to be top-center. It should
> be top-left instead.
> Specifically, for the Lamed (05DC) base letter it should be to the left of
> the top vertical leg.
> Some fonts have a presentation form for that at E804.

But, E804 is in a Private Use Area, and there's no way to
check if the glyph there (if any) is a Hebrew glyph or not.

Or is there any consensus among Hebrew font designers?

> Currently I'm trying to hunt down a problem where, sometimes, when
> transient-mode is active, some characters suddenly stop composing.
> Once I get a recipe, I'll let you know.

I see.

---
Kenichi Handa
handa@m17n.org

* Re: Composing Hebrew diacriticals
  2010-07-13  4:11                         ` Kenichi Handa
@ 2010-07-13  4:47                           ` Yair F
  0 siblings, 0 replies; 88+ messages in thread
From: Yair F @ 2010-07-13  4:47 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

On Tue, Jul 13, 2010 at 7:11 AM, Kenichi Handa <handa@m17n.org> wrote:
> But, E804 is in a Private Use Area, and there's no way to
> check if the glyph there (if any) is a Hebrew glyph or not.
By glyph name?

>
> Or is there any consensus among Hebrew font designers?
It is available in some fonts, including some of those that don't provide
enough information for proper rendering.

* Re: Composing Hebrew diacriticals
  2010-07-12 21:10                       ` Yair F
  2010-07-13  4:11                         ` Kenichi Handa
@ 2010-07-13 12:01                         ` Eli Zaretskii
  1 sibling, 0 replies; 88+ messages in thread
From: Eli Zaretskii @ 2010-07-13 12:01 UTC (permalink / raw)
  To: Yair F; +Cc: emacs-devel, handa

> Date: Tue, 13 Jul 2010 00:10:04 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: emacs-devel@gnu.org
> 
> Currently I'm trying to hunt down a problem where, sometimes, when
> transient-mode is active, some characters suddenly stop composing.

Is this in a buffer that's bidi-reordered for display?  If so, does
the problem go away if you turn off bidi-display-reordering?

When bidi reordering is in effect, both face resolution and character
composition need to examine buffer text backwards, because text
properties and character compositions are still defined in logical
order.  It's possible that face resolution somehow interferes with
character composition in that case.

Let me know if I can help.

end of thread (latest message: 2010-07-13 12:01 UTC)

Thread overview: 88+ messages
2010-04-30 12:29 Composing Hebrew diacriticals Eli Zaretskii
2010-05-05  2:39 ` Kenichi Handa
2010-05-05 15:49   ` David Kastrup
2010-05-05 20:51     ` Eli Zaretskii
2010-05-06  7:20       ` David Kastrup
2010-05-06  0:45     ` Kenichi Handa
2010-05-06 12:14       ` David Kastrup
2010-05-06 13:01         ` Kenichi Handa
2010-05-05 18:01   ` Eli Zaretskii
2010-05-07 11:15     ` Kenichi Handa
2010-05-08 12:51       ` Eli Zaretskii
2010-05-06 14:59   ` Yair F.
2010-05-06 17:41     ` Eli Zaretskii
2010-05-07  0:48     ` Kenichi Handa
2010-05-07  4:41       ` Yair F
2010-05-07  6:23         ` Kenichi Handa
2010-05-07 10:00           ` Yair F
2010-05-07 11:11             ` Kenichi Handa
2010-05-07  9:28         ` Eli Zaretskii
2010-05-10 14:09           ` Yair F
2010-05-11  0:25             ` Kenichi Handa
2010-05-11 12:20               ` Kenichi Handa
2010-05-11 16:22                 ` Eli Zaretskii
2010-05-12  8:04                   ` Kenichi Handa
2010-05-12 17:35                     ` Eli Zaretskii
2010-05-12 19:05                       ` Juanma Barranquero
2010-05-13  3:06                         ` Eli Zaretskii
2010-05-13  0:42                       ` Kenichi Handa
2010-05-14  8:10                         ` Kenichi Handa
2010-05-14 10:02                           ` Eli Zaretskii
2010-05-14 11:58                             ` Kenichi Handa
2010-05-14 13:29                               ` Eli Zaretskii
2010-05-14 14:06                                 ` Eli Zaretskii
     [not found]                           ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>
2010-05-15  2:14                             ` Kenichi Handa
2010-05-15 21:35                               ` Yair F
2010-05-17  4:35                                 ` Kenichi Handa
2010-05-17 17:32                                   ` Eli Zaretskii
2010-05-18  0:36                                     ` Kenichi Handa
2010-05-17 21:08                                   ` Yair F
2010-05-20  2:09                                     ` Kenichi Handa
2010-05-25  1:45                                       ` Kenichi Handa
2010-05-25 20:56                                         ` Yair F
2010-05-26  0:36                                           ` Kenichi Handa
2010-05-26  4:37                                             ` Yair F
2010-05-26  6:00                                               ` Kenichi Handa
2010-05-26 16:12                                                 ` Yair F
2010-05-27  7:27                                                   ` Kenichi Handa
2010-05-27 21:59                                                     ` Yair F
2010-05-28  0:42                                                       ` Kenichi Handa
2010-06-01  8:58                                                         ` Yair F
2010-05-26 13:28                                             ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier
2010-05-26 17:14                                               ` Eli Zaretskii
2010-05-27  4:13                                                 ` Enabling bidi Stefan Monnier
2010-05-27 17:43                                                   ` Eli Zaretskii
2010-05-18  7:29                                   ` Composing Hebrew diacriticals Eli Zaretskii
2010-05-17 13:53                                 ` Stefan Monnier
2010-05-19 17:23                     ` Eli Zaretskii
2010-05-11 21:40                 ` Yair F
2010-05-12  3:15                   ` Eli Zaretskii
2010-05-12 15:11                     ` Yair F
2010-05-12 17:43                       ` Eli Zaretskii
2010-05-12 22:01                         ` Yair F
2010-05-13 17:14                           ` Eli Zaretskii
2010-05-13 19:46                             ` Yair F
2010-05-13 19:56                               ` Eli Zaretskii
2010-05-13 20:08                                 ` Yair F
2010-05-14  2:35                                   ` Miles Bader
2010-05-14 10:45                                     ` Yair F
2010-05-14 13:05                                       ` Eli Zaretskii
2010-05-14 13:15                                       ` Kenichi Handa
2010-05-15  0:46                                       ` Miles Bader
2010-05-13  0:29                       ` Kenichi Handa
     [not found] <tl7fx0v9nra.fsf@m17n.org>
2010-06-15 11:02 ` Kenichi Handa
2010-06-24  6:33   ` Kenichi Handa
2010-06-25 10:16     ` Eli Zaretskii
2010-06-28 16:40     ` Yair F
2010-06-29  8:07       ` Kenichi Handa
2010-06-29 18:57         ` Yair F
2010-06-30  5:27           ` Kenichi Handa
     [not found]             ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>
2010-06-30 21:48               ` Fwd: " Yair F
2010-07-01  5:59                 ` Miles Bader
2010-07-01  5:52               ` Kenichi Handa
2010-07-01 20:30                 ` Yair F
2010-07-02  7:51                   ` Kenichi Handa
2010-07-12  8:17                     ` Kenichi Handa
2010-07-12 21:10                       ` Yair F
2010-07-13  4:11                         ` Kenichi Handa
2010-07-13  4:47                           ` Yair F
2010-07-13 12:01                         ` Eli Zaretskii
