Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
       [not found]   ` <tl739yppmat.fsf@m17n.org>
@ 2010-04-23 18:52     ` Eli Zaretskii
  2010-04-23 20:34       ` Andreas Schwab
                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-23 18:52 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: Peter_Dyballa@Freenet.DE, 5977@debbugs.gnu.org
> Date: Wed, 21 Apr 2010 11:32:58 +0900
> 
> I've just build the trunk code on GNU/Linus, and found that all
> characters displayed by composition are incorrect.

Only when bidi-display-reordering is turned on (etc/HELLO does that
automatically).

> Here's a brief explanation about control flow.

Thanks, that part was quite clear from the code.  I now fixed display
of composed characters from L2R scripts when bidi-display-reordering
is set to non-nil.

Where I really need help is in getting compositions to work when text
is reordered.  Is it true that composition_reseat_it and its
subroutines need to see the to-be-composed characters in strict
logical order, i.e. left to right?  Or can they also work if they see
the characters to be composed in the reverse order?

Also, what does this condition (in next_element_from_composition)
check?

      if (it->c < 0)
	{
	  IT_CHARPOS (*it) += it->cmp_it.nchars;
	  IT_BYTEPOS (*it) += it->cmp_it.nbytes;

If the meaning of the test is that there's no composition at the
iterator's position, then why do we skip some of the buffer text under
this condition?

Thanks for your help.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-23 18:52     ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
@ 2010-04-23 20:34       ` Andreas Schwab
  2010-04-23 20:43         ` Eli Zaretskii
  2010-04-26  2:09       ` Kenichi Handa
  2010-04-26 11:29       ` Kenichi Handa
  2 siblings, 1 reply; 27+ messages in thread
From: Andreas Schwab @ 2010-04-23 20:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks, that part was quite clear from the code.  I now fixed display
> of composed characters from L2R scripts when bidi-display-reordering
> is set to non-nil.

There is still a problem with the cursor positioning when the line ends
with a composed character (try moving point to the end of the Lao line).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-23 20:34       ` Andreas Schwab
@ 2010-04-23 20:43         ` Eli Zaretskii
  2010-04-24 11:27           ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-23 20:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel, handa

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Kenichi Handa <handa@m17n.org>,  emacs-devel@gnu.org
> Date: Fri, 23 Apr 2010 22:34:35 +0200
> 
> There is still a problem with the cursor positioning when the line ends
> with a composed character (try moving point to the end of the Lao line).

Yes, I know.  I'm working on that.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-23 20:43         ` Eli Zaretskii
@ 2010-04-24 11:27           ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-24 11:27 UTC (permalink / raw)
  To: schwab, emacs-devel, handa

> Date: Fri, 23 Apr 2010 23:43:48 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org, handa@m17n.org
> 
> > From: Andreas Schwab <schwab@linux-m68k.org>
> > Cc: Kenichi Handa <handa@m17n.org>,  emacs-devel@gnu.org
> > Date: Fri, 23 Apr 2010 22:34:35 +0200
> > 
> > There is still a problem with the cursor positioning when the line ends
> > with a composed character (try moving point to the end of the Lao line).
> 
> Yes, I know.  I'm working on that.

Fix it, I think (revno 100025).




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-23 18:52     ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
  2010-04-23 20:34       ` Andreas Schwab
@ 2010-04-26  2:09       ` Kenichi Handa
  2010-04-26  2:38         ` Kenichi Handa
  2010-04-26 11:29       ` Kenichi Handa
  2 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-26  2:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <834oj22e96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > I've just build the trunk code on GNU/Linus, and found that all
> > characters displayed by composition are incorrect.

> Only when bidi-display-reordering is turned on (etc/HELLO does that
> automatically).

> > Here's a brief explanation about control flow.

> Thanks, that part was quite clear from the code.  I now fixed display
> of composed characters from L2R scripts when bidi-display-reordering
> is set to non-nil.

I've just 


> Where I really need help is in getting compositions to work when text
> is reordered.  Is it true that composition_reseat_it and its
> subroutines need to see the to-be-composed characters in strict
> logical order, i.e. left to right?  Or can they also work if they see
> the characters to be composed in the reverse order?

> Also, what does this condition (in next_element_from_composition)
> check?

>       if (it->c < 0)
> 	{
> 	  IT_CHARPOS (*it) += it->cmp_it.nchars;
> 	  IT_BYTEPOS (*it) += it->cmp_it.nbytes;

> If the meaning of the test is that there's no composition at the
> iterator's position, then why do we skip some of the buffer text under
> this condition?

I vaguely remember that this is to avoid crash by a bug of a
composition function.

A composition function is written in Lisp and can be tested
interactively without restarting Emacs each time.  If it has
a bug while testing, it may produce no glyphs for a chunk of
text.  In such a case, composition_update_it returns -1 and
it->c is set to that return value.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-26  2:09       ` Kenichi Handa
@ 2010-04-26  2:38         ` Kenichi Handa
  0 siblings, 0 replies; 27+ messages in thread
From: Kenichi Handa @ 2010-04-26  2:38 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, emacs-devel

Oops, I typed C-c C-c too early.

In article <tl7r5m3hsmd.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> > Thanks, that part was quite clear from the code.  I now fixed display
> > of composed characters from L2R scripts when bidi-display-reordering
> > is set to non-nil.

> I've just 

I meant "I've just confimed it, thank you."

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-23 18:52     ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
  2010-04-23 20:34       ` Andreas Schwab
  2010-04-26  2:09       ` Kenichi Handa
@ 2010-04-26 11:29       ` Kenichi Handa
  2010-04-26 18:40         ` Compositions and bidi display Eli Zaretskii
  2010-04-27  3:13         ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
  2 siblings, 2 replies; 27+ messages in thread
From: Kenichi Handa @ 2010-04-26 11:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <834oj22e96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Where I really need help is in getting compositions to work when text
> is reordered.  Is it true that composition_reseat_it and its
> subroutines need to see the to-be-composed characters in strict
> logical order, i.e. left to right?  Or can they also work if they see
> the characters to be composed in the reverse order?

All composition-related functions expect characters are in
logical order.  The bottom-most library for OTF handling
(libotf) requires it because OpenType tables expect
characters in logical order.  So, the bidi reordering must
happen after composition handling is done.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-26 11:29       ` Kenichi Handa
@ 2010-04-26 18:40         ` Eli Zaretskii
  2010-04-27 12:15           ` Kenichi Handa
  2010-04-27  3:13         ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-26 18:40 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 26 Apr 2010 20:29:18 +0900
> Cc: emacs-devel@gnu.org
> 
> All composition-related functions expect characters are in
> logical order.

I assumed that much.  Sigh...

> So, the bidi reordering must happen after composition handling is
> done.

Unfortunately, this is impossible, not without throwing away the
entire design and current implementation of the bidi reordering, and
implementing it in a totally different way that will have to be much
more invasive into the overall design of Emacs display engine.

The reason is, as you know, that bidi reordering in Emacs is
conceptually just a replacement for advancing from one character to
the next during iteration through buffers or strings.  Instead of
incrementing the character position to the next character, we modify
the position non-linearly to get to the next character in the visual
order.  Obviously, this iteration is a lower-level operation than character
composition.

In addition, the bidi reordering engine knows nothing about the
characters it encounters except their bidirectional properties; in
particular, it doesn't know anything about character compositions, and
teaching it about them would mean rather serious complications.

Moreover, the bidirectional properties are in general defined for
individual characters, not for the composed ones, which is one more
reason it is very hard to do what you suggest, even if we would turn
the current design inside out.  For example, we compose Hebrew
consonants with diacriticals into a single glyph, but that glyph has
no character codepoint to look up its bidirectional properties in the
Unicode database.  So, once composed, these characters cannot be
reordered by following the UAX#9 algorithm without complications,
because UAX#9 is explicitly defined to work _before_ any shaping of
characters for display, see Section 3.5 there.

Therefore, I will need to find and handle sequences of characters to
be composed as an integral part of next_element_from_buffer, similarly
to what is already done with face changes there.

The idea is to detect the situation where the bidi iteration placed us
into a composable sequence of characters, and when that happens,
compose them and deliver them as a single display element, and then
skip the entire sequence, like we do today in the unidirectional
display.  The tricky part is that today we only detect this when we
hit the beginning of such a sequence, while moving in the strictly
increasing order of buffer positions; with bidi reordering we will
need to detect them from the end of the sequence as well, for when the
bidi iterator moves backwards or jumps across many character
positions.

Is it possible to write a function or macro that will find out, for a
particular buffer/string position, whether that position is at the end
or in the middle of a composable sequence of characters, and if so,
return the character positions of the first and last characters of the
sequence?  Something like CHAR_COMPOSED_P, but one that looks back in
the buffer?  If so, could you please help me write such a function?

TIA

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-26 11:29       ` Kenichi Handa
  2010-04-26 18:40         ` Compositions and bidi display Eli Zaretskii
@ 2010-04-27  3:13         ` Eli Zaretskii
  2010-04-27 12:26           ` Kenichi Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-27  3:13 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 26 Apr 2010 20:29:18 +0900
> Cc: emacs-devel@gnu.org
> 
> All composition-related functions expect characters are in
> logical order.  The bottom-most library for OTF handling
> (libotf) requires it because OpenType tables expect
> characters in logical order.

Btw, where does libotf come into this picture?  That is, which libotf
functions we use for composite characters, and at what stage in the
redisplay process?




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-26 18:40         ` Compositions and bidi display Eli Zaretskii
@ 2010-04-27 12:15           ` Kenichi Handa
  2010-04-28  3:18             ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-27 12:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <837hnuys42.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > So, the bidi reordering must happen after composition handling is
> > done.

> Unfortunately, this is impossible, not without throwing away the
> entire design and current implementation of the bidi reordering, and
> implementing it in a totally different way that will have to be much
> more invasive into the overall design of Emacs display engine.

> The reason is, as you know, that bidi reordering in Emacs is
> conceptually just a replacement for advancing from one character to
> the next during iteration through buffers or strings.  Instead of
> incrementing the character position to the next character, we modify
> the position non-linearly to get to the next character in the visual
> order.  Obviously, this iteration is a lower-level operation than character
> composition.

> In addition, the bidi reordering engine knows nothing about the
> characters it encounters except their bidirectional properties; in
> particular, it doesn't know anything about character compositions, and
> teaching it about them would mean rather serious complications.

> Moreover, the bidirectional properties are in general defined for
> individual characters, not for the composed ones, which is one more
> reason it is very hard to do what you suggest, even if we would turn
> the current design inside out.  For example, we compose Hebrew
> consonants with diacriticals into a single glyph, but that glyph has
> no character codepoint to look up its bidirectional properties in the
> Unicode database.

I think it's possible to apply Unicode's bidi algorithm to
the glyph sequence if each glyph provides a character code
to check for reordering.  For composition glyph, we can use
the first character of the composed sequence.  But, as your
algorithm is incremental and don't cache glyphs, such a
method may slow down the display engine.

> So, once composed, these characters cannot be
> reordered by following the UAX#9 algorithm without complications,
> because UAX#9 is explicitly defined to work _before_ any shaping of
> characters for display, see Section 3.5 there.

The example of Section 3.5 is for base characters, not
applicable for base and combining character sequence.  First
of all, TR9's bidi model is not incremental, and thus the
shaping engine can see a result of all reordering result at
once.

In that model, it's possible for the shaping engine to
reverse the order of a base character and combining
characters after bidi processing as written in L3 of 3.4:

============================================================
L3. Combining marks applied to a right-to-left base
character will at this point precede their base
character. If the rendering engine expects them to follow
the base characters in the final display process, then the
ordering of the marks and the base character must be
reversed.
============================================================

So, how to do that in the current incremental method?

> Therefore, I will need to find and handle sequences of characters to
> be composed as an integral part of next_element_from_buffer, similarly
> to what is already done with face changes there.

> The idea is to detect the situation where the bidi iteration placed us
> into a composable sequence of characters, and when that happens,
> compose them and deliver them as a single display element, and then
> skip the entire sequence, like we do today in the unidirectional
> display.  The tricky part is that today we only detect this when we
> hit the beginning of such a sequence, while moving in the strictly
> increasing order of buffer positions; with bidi reordering we will
> need to detect them from the end of the sequence as well, for when the
> bidi iterator moves backwards or jumps across many character
> positions.

> Is it possible to write a function or macro that will find out, for a
> particular buffer/string position, whether that position is at the end
> or in the middle of a composable sequence of characters, and if so,
> return the character positions of the first and last characters of the
> sequence?  Something like CHAR_COMPOSED_P, but one that looks back in
> the buffer?  If so, could you please help me write such a function?

Here's a rough idea.

(1) Call composition_compute_stop_pos with ENDPOS < CHARPOS
if we are now in R2L range.  ENDPOS is the start of this R2L
range.  And modify this function to search a buffer/string
backward if ENDPOS < CHARPOS.

Provided that uppercase letters denote Hebrew consonants,
lowercase denotes Hebrew diacriticals, a buffer has the
character sequence "AaBbCc", CHARPOS is the position of 'c',
ENDPOS is the position of 'A'.

(2) Do the same for composition_reseat_it.

(3) Add member 'direction' to struct composition_it that
records in which direction context the composition was made.

(4) Modify composition_update_it to update members 'from'
and 'to' of "struct composition_it" in the reverse order if
'direction' is R2L.  Note that a single composition may
contain multiple graphem clusters.  For instance, it's
possible to write a composition fuction that accepts
"AaBbCc" (above example) at onse and produce a single
composition that contains three graphem clusters "Aa", "Bb",
and "Cc".

To do all of them, perhaps all I need is to know the way to
find the correct ENDPOS.  Please tell me how to do that.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-27  3:13         ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
@ 2010-04-27 12:26           ` Kenichi Handa
  0 siblings, 0 replies; 27+ messages in thread
From: Kenichi Handa @ 2010-04-27 12:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <8339yhziyj.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > All composition-related functions expect characters are in
> > logical order.  The bottom-most library for OTF handling
> > (libotf) requires it because OpenType tables expect
> > characters in logical order.

> Btw, where does libotf come into this picture?  That is, which libotf
> functions we use for composite characters, and at what stage in the
> redisplay process?

For instance, an OpenType font may have independent glyphs
for Hebrew consontants and diacriticals, and provide GPOS
(glyph positioning) table to tell where to place a specific
diacritical glyph on a specific consontant.  To utilize such
a font, a composition function calls libotf's OTF_drive_gpos
in this calling sequence.

CHAR_COMPOSED_P
 -> composition_reseat_it
   -> autocmp_chars
     -> a Lisp function in `composition-function-table
       -> font-shape-gstring
         -> font_driver->shape
           -> ftfont_shape_by_flt
             -> mflt_run (of libm17n-flt)
               -> ftfont_drive_otf (ftfont.c) as a callback rountine
                 -> OTF_drive_gpos (of libotf)

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-27 12:15           ` Kenichi Handa
@ 2010-04-28  3:18             ` Eli Zaretskii
  2010-04-28  4:01               ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-28  3:18 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 27 Apr 2010 21:15:04 +0900
> 
> Provided that uppercase letters denote Hebrew consonants,
> lowercase denotes Hebrew diacriticals, a buffer has the
> character sequence "AaBbCc", CHARPOS is the position of 'c',
> ENDPOS is the position of 'A'.
> [...]
> To do all of them, perhaps all I need is to know the way to
> find the correct ENDPOS.  Please tell me how to do that.

What is the definition of ENDPOS?  If that's the beginning of the
composition sequence, that's the same question I asked, for which I
don't know the answer.  If that's the other end of the R2L run of
characters, you need to iterate with bidi_get_next_char_visually until
some condition (which I cannot yet formulate) is satisfied.  But note
that this is tricky, because the bidi iteration changes direction and
jumps at will.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-28  3:18             ` Eli Zaretskii
@ 2010-04-28  4:01               ` Kenichi Handa
  2010-04-28 17:38                 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-28  4:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83mxwoxo1t.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Provided that uppercase letters denote Hebrew consonants,
> > lowercase denotes Hebrew diacriticals, a buffer has the
> > character sequence "AaBbCc", CHARPOS is the position of 'c',
> > ENDPOS is the position of 'A'.
> > [...]
> > To do all of them, perhaps all I need is to know the way to
> > find the correct ENDPOS.  Please tell me how to do that.

> What is the definition of ENDPOS?  If that's the beginning of the
> composition sequence, that's the same question I asked, for which I
> don't know the answer.  If that's the other end of the R2L run of
> characters,

Yes, that one.

> you need to iterate with bidi_get_next_char_visually until
> some condition (which I cannot yet formulate) is
> satisfied.  But note that this is tricky, because the bidi
> iteration changes direction and jumps at will.

The condition should be "until it reaches a character that
should never be composed with the currently looking
character".  We may be able to simplify that condition to
"until it reaches a character in the different bidi level
(or chunk)".

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-28  4:01               ` Kenichi Handa
@ 2010-04-28 17:38                 ` Eli Zaretskii
  2010-04-28 22:49                   ` Stefan Monnier
  2010-04-30  6:06                   ` Kenichi Handa
  0 siblings, 2 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-28 17:38 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Wed, 28 Apr 2010 13:01:10 +0900
> 
> > What is the definition of ENDPOS?  If that's the beginning of the
> > composition sequence, that's the same question I asked, for which I
> > don't know the answer.  If that's the other end of the R2L run of
> > characters,
> 
> Yes, that one.
> 
> > you need to iterate with bidi_get_next_char_visually until
> > some condition (which I cannot yet formulate) is
> > satisfied.  But note that this is tricky, because the bidi
> > iteration changes direction and jumps at will.
> 
> The condition should be "until it reaches a character that
> should never be composed with the currently looking
> character".

That is the condition I'm looking for.  But how to code it?  Is the
code in find_automatic_composition a good starting point?  AFAIU, it
can search backward as well as forward.

> We may be able to simplify that condition to
> "until it reaches a character in the different bidi level
> (or chunk)".

But that could be very far back.  I would really like to avoid going
too far back, just to find out whether we reached a composition
sequence, because (again AFAIU) the length of most such sequences is
just a few characters.  Is it correct that searching back
MAX_AUTO_COMPOSITION_LOOKBACK characters is enough?

If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
long can a composition sequence be?

Another idea would be to call composition_compute_stop_pos repeatedly,
starting from the last cmp_it->stop_pos, until we find the last
stop_pos before the current iterator position, then compute the
beginning and end of the composable sequence at that position, and
record it in the iterator.  Then we handle the composition when we
enter the sequence from either end.

Btw, do we still need to support static compositions?  Those are based
on the `composition' text property, which are no longer supported,
right?  Or am I confused?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-28 17:38                 ` Eli Zaretskii
@ 2010-04-28 22:49                   ` Stefan Monnier
  2010-04-29  3:12                     ` Eli Zaretskii
  2010-04-30  6:06                   ` Kenichi Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Stefan Monnier @ 2010-04-28 22:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

> Btw, do we still need to support static compositions?  Those are based
> on the `composition' text property, which are no longer supported,
> right?  Or am I confused?

They're not?
Does that mean that

  (font-lock-add-keywords
   nil `(("(lambda\\>"
	  (0 (progn (compose-region (1+ (match-beginning 0)) (match-end 0)
				    ;; ,(make-char 'greek-iso8859-7 107)
				    ?λ)
		    nil)))))

is using unsupported features?  I know I could use `display' instead,
but some details of `compose-region' are handy (e.g. the fact that it's
automatically removed when the text is modified).


        Stefan




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-28 22:49                   ` Stefan Monnier
@ 2010-04-29  3:12                     ` Eli Zaretskii
  2010-04-30  2:28                       ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-29  3:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 28 Apr 2010 18:49:39 -0400
> Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> 
> > Btw, do we still need to support static compositions?  Those are based
> > on the `composition' text property, which are no longer supported,
> > right?  Or am I confused?
> 
> They're not?

I deduced this from the fact that we removed Qcomposition and the
associated handle_composition_prop from xdisp.c.  Again, I could be
confused.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-29  3:12                     ` Eli Zaretskii
@ 2010-04-30  2:28                       ` Kenichi Handa
  2010-04-30  6:41                         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-30  2:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

In article <838w87x87a.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Wed, 28 Apr 2010 18:49:39 -0400
> > Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> > 
> > > Btw, do we still need to support static compositions?  Those are based
> > > on the `composition' text property, which are no longer supported,
> > > right?  Or am I confused?
> > 
> > They're not?

> I deduced this from the fact that we removed Qcomposition and the
> associated handle_composition_prop from xdisp.c.  Again, I could be
> confused.

??? I've never removed handle_composition_prop nor any of
codes for static composition.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-28 17:38                 ` Eli Zaretskii
  2010-04-28 22:49                   ` Stefan Monnier
@ 2010-04-30  6:06                   ` Kenichi Handa
  2010-04-30  7:08                     ` Eli Zaretskii
  2010-04-30 10:07                     ` Eli Zaretskii
  1 sibling, 2 replies; 27+ messages in thread
From: Kenichi Handa @ 2010-04-30  6:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83d3xjxys1.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > The condition should be "until it reaches a character that
> > should never be composed with the currently looking
> > character".

> That is the condition I'm looking for.  But how to code it?  Is the
> code in find_automatic_composition a good starting point?

No.  The checking of possibility of composing characters at
a specific position is done within
composition_compute_stop_pos.  What we need now is where we
should stop searching in composition_compute_stop_pos.

In the case of "english HEBREW TEXT text" (lowercases are
l2r characters, upppercases are r2l characters),
get_next_display_element starts from the first "e" and
proceeds to the first " " (stage 1), then jumps to the last
"T" and proceeds back to the first "H" (stage 2), then jumps
to the last " " and proceeds to the last "t" (stage 3).

When composition_compute_stop_pos is called in stage 1,
ENDPOS should be the first " " because searching far is
useless (we may have to compose some of "TEXT" before
composing some of "HEBREW").  When
composition_compute_stop_pos is called in stage 2, ENDPOS
should be the first "H" because searching far back is
useless, and so on.

Note that composition_compute_stop_pos just finds a stop
position to check, and the actual checking and composing is
done by composition_reseat_it which is called by
CHAR_COMPOSED_P.  But composition_reseat_it also needs
ENDPOS because when that funciton finds that there's no need
of composition at the stop position, it calls
composition_compute_stop_pos to update the next stop
position.

> > We may be able to simplify that condition to
> > "until it reaches a character in the different bidi level
> > (or chunk)".

> But that could be very far back.

Isn't it possible to record where the current bidi-run
started while you scan a buffer in
bidi_get_next_char_visually?

> I would really like to avoid going too far back, just to
> find out whether we reached a composition sequence,

We don't have to re-calculate ENDPOS each time.  It must be
updated only when we pass over bidi boundary.  Consider the
above example case ("english ...").

> because (again AFAIU) the length of most such sequences is
> just a few characters.  Is it correct that searching back
> MAX_AUTO_COMPOSITION_LOOKBACK characters is enough?

No.

> If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
> long can a composition sequence be?

It is MAX_COMPOSITION_COMPONENTS (16), but here it's not
relevant.  What we need is to find where in a buffer (before
the scan reaches ENDPOS) next composition will happen.  And,
to perform it efficiently, giving a proper ENDPOS is
necessary.

> Another idea would be to call composition_compute_stop_pos repeatedly,
> starting from the last cmp_it->stop_pos, until we find the last
> stop_pos before the current iterator position, then compute the
> beginning and end of the composable sequence at that position, and
> record it in the iterator.  Then we handle the composition when we
> enter the sequence from either end.

To move from one composition position to the next, we must
actually call autocmp_chars and find where the current
composition ends, then start searching for the next
composition.  As autocmp_chars calls Lisp and all functions
to compose characters, it's so inefficient to call it
repeatedly just to find the last one.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30  2:28                       ` Kenichi Handa
@ 2010-04-30  6:41                         ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30  6:41 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 11:28:40 +0900
> 
> In article <838w87x87a.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > > Date: Wed, 28 Apr 2010 18:49:39 -0400
> > > Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> > > 
> > > > Btw, do we still need to support static compositions?  Those are based
> > > > on the `composition' text property, which are no longer supported,
> > > > right?  Or am I confused?
> > > 
> > > They're not?
> 
> > I deduced this from the fact that we removed Qcomposition and the
> > associated handle_composition_prop from xdisp.c.  Again, I could be
> > confused.
> 
> ??? I've never removed handle_composition_prop nor any of
> codes for static composition.

Sorry, I got confused.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30  6:06                   ` Kenichi Handa
@ 2010-04-30  7:08                     ` Eli Zaretskii
  2010-05-03  2:39                       ` Kenichi Handa
  2010-04-30 10:07                     ` Eli Zaretskii
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30  7:08 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 15:06:11 +0900
> 
> In the case of "english HEBREW TEXT text" (lowercases are
> l2r characters, upppercases are r2l characters),
> get_next_display_element starts from the first "e" and
> proceeds to the first " " (stage 1), then jumps to the last
> "T" and proceeds back to the first "H" (stage 2), then jumps
> to the last " " and proceeds to the last "t" (stage 3).

This is only the simplest case, with just 2 embedding levels: the base
level of the paragraph, and the (higher) level of the embedded R2L
text.  The general case is much more complex: there could be up to 60
nested levels, and some of them could begin or end at the same buffer
position.  bidi.c handles all this complexity by means of a very
simple algorithm, but that algorithm needs to know a lot about the
characters traversed so far.  I don't think exposing all these
internals to xdisp.c is a good idea.

> Note that composition_compute_stop_pos just finds a stop
> position to check, and the actual checking and composing is
> done by composition_reseat_it which is called by
> CHAR_COMPOSED_P.

Right, but the same is true for the bidi iteration: I need only to
know when to check for composition; the actual composing will be still
done by composition_reseat_it.  I just cannot assume that I always
move linearly forward in the buffer.  Therefore, it is not enough to
have only the next stop position recorded in the iterator.  I need
more information recorded.  What I'm trying to determine in this
thread is what needs to be recorded and how to compute what's needed.
Thanks for helping me.

> > > We may be able to simplify that condition to
> > > "until it reaches a character in the different bidi level
> > > (or chunk)".
> 
> > But that could be very far back.
> 
> Isn't it possible to record where the current bidi-run
> started while you scan a buffer in
> bidi_get_next_char_visually?

See above: it's tricky.  The function in bidi.c that looks for the
beginning and end of a level run relies on almost all the other
functions in bidi.c, and it does that on the fly.  The level edges are
not recorded anywhere, except in an internal cache used to speed up
moving back in the buffer.

> > If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
> > long can a composition sequence be?
> 
> It is MAX_COMPOSITION_COMPONENTS (16), but here it's not
> relevant.

Why not?  Isn't it true that if none of the 16 characters preceding
the current position can start a composition sequence, then the
current position is not inside a composition sequence?

> > Another idea would be to call composition_compute_stop_pos repeatedly,
> > starting from the last cmp_it->stop_pos, until we find the last
> > stop_pos before the current iterator position, then compute the
> > beginning and end of the composable sequence at that position, and
> > record it in the iterator.  Then we handle the composition when we
> > enter the sequence from either end.
> 
> To move from one composition position to the next, we must
> actually call autocmp_chars and find where the current
> composition ends, then start searching for the next
> composition.  As autocmp_chars calls Lisp and all functions
> to compose characters, it's so inefficient to call it
> repeatedly just to find the last one.

If the buffer or string is full of composed characters, then yes, it
would be a slowdown.  Especially if the number of ``suspect'' stop
positions is much larger than the number of actual composition
sequences.  But what else can be done, given the design of the
compositions that doesn't let us know the sequence length without
actually composing the character?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30  6:06                   ` Kenichi Handa
  2010-04-30  7:08                     ` Eli Zaretskii
@ 2010-04-30 10:07                     ` Eli Zaretskii
  2010-04-30 12:12                       ` Kenichi Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30 10:07 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 15:06:11 +0900

After re-reading the code of composition_compute_stop_pos, I have a
few more questions about what you wrote.

> Note that composition_compute_stop_pos just finds a stop
> position to check, and the actual checking and composing is
> done by composition_reseat_it which is called by
> CHAR_COMPOSED_P.

But it looks like composition_compute_stop_pos does use at least some
validation for the candidate stop position.  AFAIU, this fragment
finds and validates a static composition:

  if (find_composition (charpos, endpos, &start, &end, &prop, string)
      && COMPOSITION_VALID_P (start, end, prop))
    {
      cmp_it->stop_pos = endpos = start;
      cmp_it->ch = -1;
    }

So it looks like COMPOSITION_VALID_P is the proper way of validating a
position that is a candidate for a static composition.  Is that true?
If it is true, then the end point of the static composition is given
by the `end' argument to find_composition, and all we need is record
it in cmp_it.  If not true, what _does_ COMPOSITION_VALID_P validate?

And the loop after that, conditioned on auto-composition-mode, seems
to do a similar job for automatic compositions.  Omitting some
secondary details, that loop does this:

  while (charpos < endpos)
    {
      [advance to the next character]
      val = CHAR_TABLE_REF (Vcomposition_function_table, c);
      if (! NILP (val))
	{
	  Lisp_Object elt;

	  for (; CONSP (val); val = XCDR (val))
	    {
	      elt = XCAR (val);
	      if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
		  && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
		break;
	    }
	  if (CONSP (val))
	    {
	      cmp_it->lookback = XFASTINT (AREF (elt, 1));
	      cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
	      cmp_it->ch = c;
	      return;
	    }
	}
    }

This looks as if a position that is a candidate for starting a
composition sequence should have a non-nil entry in
composition-function-table for the character at that position, and
that entry should specify the (relative) character position where the
sequence might start.  Is my understanding correct?

> To move from one composition position to the next, we must actually
> call autocmp_chars and find where the current composition ends, then
> start searching for the next composition.

It is true that the code looking for stop position that might begin an
automatic composition does not compute the end of the sequence.  That
end is computed by autocmp_chars.  But what does this mean in
practice?  Suppose we have found a candidate stop_pos, marked by S
below:

     abcdeSuvwxyz

First, a composition sequence cannot be shorter than 2 characters,
right?  So the next stop_pos cannot be before v.  Now suppose that the
actual composition sequence is "Suvw", and we issue the next call to
composition_compute_stop_pos at v -- are you saying that it will
suggest that v is also a possible stop_pos, even though it is in the
middle of a composition sequence?  If not, then repeated calls to
composition_compute_stop_pos in the bidi case, without calling
composition_reseat_it in between, will just be slightly
more expensive because they will need to examine more positions.  Is
this analysis correct?

> But composition_reseat_it also needs ENDPOS

We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
call composition_reseat_it and composition_compute_stop_pos in the
forward direction repeatedly, can't we?  That's because, when the
iterator is some position, we are only interested in compositions that
cover that position.

> We don't have to re-calculate ENDPOS each time.  It must be
> updated only when we pass over bidi boundary.

Btw, can we always assume that all the characters of a composition
sequence are at the same embedding level?  I guess IOW I'm asking what
Emacs features are currently implemented based on compositions?
Obviously, all the characters in a sequence that produces a single
grapheme must have the same level, but what about compositions that
produce several grapheme clusters -- can each of the clusters have
different bidirectional properties?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30 10:07                     ` Eli Zaretskii
@ 2010-04-30 12:12                       ` Kenichi Handa
  2010-04-30 13:15                         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-30 12:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

I'll reply to this before replying to your previous mail.

In article <83r5lxw8wi.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> But it looks like composition_compute_stop_pos does use at least some
> validation for the candidate stop position.  AFAIU, this fragment
> finds and validates a static composition:

>   if (find_composition (charpos, endpos, &start, &end, &prop, string)
>       && COMPOSITION_VALID_P (start, end, prop))
>     {
>       cmp_it->stop_pos = endpos = start;
>       cmp_it->ch = -1;
>     }

> So it looks like COMPOSITION_VALID_P is the proper way of validating a
> position that is a candidate for a static composition.  Is that true?

Yes.

> If it is true, then the end point of the static composition is given
> by the `end' argument to find_composition,

Yes.

> and all we need is record it in cmp_it.

Record it for what purpose?

Anyway, calling COMPOSITION_VALID_P here is because we can
avoid calling it again in composition_reseat_it.  But, for
automatic composition, the checking and actual composing
happens at the same time.  So, even if we do that in
composition_compute_stop_pos, composition_reseat_it has to
do that again (for actual composing).

> And the loop after that, conditioned on auto-composition-mode, seems
> to do a similar job for automatic compositions.  Omitting some
> secondary details, that loop does this:

>   while (charpos < endpos)
>     {
>       [advance to the next character]
>       val = CHAR_TABLE_REF (Vcomposition_function_table, c);
>       if (! NILP (val))
> 	{
> 	  Lisp_Object elt;

> 	  for (; CONSP (val); val = XCDR (val))
> 	    {
> 	      elt = XCAR (val);
> 	      if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
> 		  && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
> 		break;
> 	    }
> 	  if (CONSP (val))
> 	    {
> 	      cmp_it->lookback = XFASTINT (AREF (elt, 1));
> 	      cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
> 	      cmp_it->ch = c;
> 	      return;
> 	    }
> 	}
>     }

> This looks as if a position that is a candidate for starting a
> composition sequence should have a non-nil entry in
> composition-function-table for the character at that position, and
> that entry should specify the (relative) character position where the
> sequence might start.  Is my understanding correct?

Mostly, but not accuate.  The correct one is "A position
that will be composed with the following and/or the
preceding characters should have a non-nil entry in ...".

The reason why we don't record all characters that will
start a composition is for efficiency (for instance, to
record only combining characters (U+0300...U+03FF) in
composition-function-table).

> > To move from one composition position to the next, we must actually
> > call autocmp_chars and find where the current composition ends, then
> > start searching for the next composition.

> It is true that the code looking for stop position that might begin an
> automatic composition does not compute the end of the sequence.  That
> end is computed by autocmp_chars.  But what does this mean in
> practice?  Suppose we have found a candidate stop_pos, marked by S
> below:

>      abcdeSuvwxyz

> First, a composition sequence cannot be shorter than 2 characters,
> right?

No, a single character can composed.

> So the next stop_pos cannot be before v.  Now suppose that the
> actual composition sequence is "Suvw", and we issue the next call to
> composition_compute_stop_pos at v -- are you saying that it will
> suggest that v is also a possible stop_pos, even though it is in the
> middle of a composition sequence?  --- (Q1)

Yes, that happens in Indic scripts.  Actually both a line
starting with "Suvw" and a line staring with "vw" can have
different composition at BOL.  But, AFAIK, all R2L scripts
(Arabic, Dhivehi, Hebrew) don't have such a charactics.  So,
in a adhoc way, we can say that your (Q1) is false.  So, 

> If not, then repeated calls to
> composition_compute_stop_pos in the bidi case, without calling
> composition_reseat_it in between, will just be slightly
> more expensive because they will need to examine more positions.  Is
> this analysis correct?

it is correct but just empirically.  There will be a script
that uses the same writing system as Devanagari but in R2L
manner somewhere between Indic and Arabic region.  I have no
idea.

> > But composition_reseat_it also needs ENDPOS

> We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> call composition_reseat_it and composition_compute_stop_pos in the
> forward direction repeatedly, can't we?  That's because, when the
> iterator is some position, we are only interested in compositions that
> cover that position.

No.  Such a way slows down the display of a buffer that has
no composition at all.  For such a buffer,
composition_compute_stop_pos should set cmp_it->stop_pos to
the actual endpos so that CHAR_COMPOSED_P quickly returns
zero.

> > We don't have to re-calculate ENDPOS each time.  It must be
> > updated only when we pass over bidi boundary.

> Btw, can we always assume that all the characters of a composition
> sequence are at the same embedding level?  I guess IOW I'm asking what
> Emacs features are currently implemented based on compositions?

Yes.  I can't think of any situation that characters must be
composed striding over bidi-boundary.   First of all, in
what embedding level, such a composition belongs?

> Obviously, all the characters in a sequence that produces a single
> grapheme must have the same level, but what about compositions that
> produce several grapheme clusters -- can each of the clusters have
> different bidirectional properties?

It is possible to setup a regular expression of an entry of
composition-function-table to do such a composition.  But, I
think we don't have to support such a thing until we face
with a concrete example of the necessity (quite doubtfull).

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30 12:12                       ` Kenichi Handa
@ 2010-04-30 13:15                         ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30 13:15 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 21:12:04 +0900
> 
> > So it looks like COMPOSITION_VALID_P is the proper way of validating a
> > position that is a candidate for a static composition.  Is that true?
> 
> Yes.
> 
> > If it is true, then the end point of the static composition is given
> > by the `end' argument to find_composition,
> 
> Yes.
> 
> > and all we need is record it in cmp_it.
> 
> Record it for what purpose?

For determining (1) whether the current iterator position is inside a
composition sequence, and (2) when to look for the next possible
composition sequence.

Consider a buffer with 3 composition sequence indicated by Sn..En:

   S1..E1.......S2..E2.....|.....S3..E3

Suppose the iterator is at the position marked by |.  Then the
iterator does not need to consider composite characters as long as its
character position is between E2 and S3 (exclusively).  If it gets to
between S2 and E2, then it needs to produce the composite character
from S2..E2.  If it goes back beyond S2, it will need to find the
places S1 and E1, and if it gets beyond E3, it will need to find the
next sequence, S4..E4 (not shown above).

IOW, the idea is to keep track of 2 potential composition sequences,
one before and one after the current iterator position, and recompute
them when the iterator is placed outside the region between the start
of the leftmost and the end of the rightmost one.

But it looks like this idea is not going to work with automatic
compositions, see below.

> > This looks as if a position that is a candidate for starting a
> > composition sequence should have a non-nil entry in
> > composition-function-table for the character at that position, and
> > that entry should specify the (relative) character position where the
> > sequence might start.  Is my understanding correct?
> 
> Mostly, but not accuate.  The correct one is "A position
> that will be composed with the following and/or the
> preceding characters should have a non-nil entry in ...".

Yes, that's what I meant, but failed to express.  Thanks.

> > So the next stop_pos cannot be before v.  Now suppose that the
> > actual composition sequence is "Suvw", and we issue the next call to
> > composition_compute_stop_pos at v -- are you saying that it will
> > suggest that v is also a possible stop_pos, even though it is in the
> > middle of a composition sequence?  --- (Q1)
> 
> Yes, that happens in Indic scripts.  Actually both a line
> starting with "Suvw" and a line staring with "vw" can have
> different composition at BOL.  But, AFAIK, all R2L scripts
> (Arabic, Dhivehi, Hebrew) don't have such a charactics.  So,
> in a adhoc way, we can say that your (Q1) is false.  So, 
> 
> > If not, then repeated calls to
> > composition_compute_stop_pos in the bidi case, without calling
> > composition_reseat_it in between, will just be slightly
> > more expensive because they will need to examine more positions.  Is
> > this analysis correct?
> 
> it is correct but just empirically.

Unfortunately, this means that Q1 must be considered to be true.  The
reason is the following subtlety of bidi reordering: in R2L
paragraphs, where the base embedding level is 1 (as opposed to zero in
L2R paragraphs), the bidi iterator delivers R2L characters in their
logical order, and reorders the L2R characters.  (We then reverse the
character order for display in append_glyph, which prepends each new
glyph instead of appending it, in such paragraphs.)  So, if an Indic
script is embedded in an R2L paragraph, it will hit this issue,
because the iterator will see Indic characters in reverse order.

Is there _any_ way to precompute the length of a composition sequence
when the entry is added to composition-function-table?  Or is it only
possible to compute the length given the text surrounding the
sequence, when it is actually encountered in a buffer or string?

If the latter, I see no other way except calling autocmp_chars inside
composition_compute_stop_pos.  This would slow down redisplay by a
factor of 2 at the worst.  If that turns out too expensive, we will
have to introduce some mechanism to avoid computing each composition
more than once.  What results of the call to autocmp_chars need to be
recorded in order to avoid calling it again in composition_reseat_it?

> > We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> > call composition_reseat_it and composition_compute_stop_pos in the
> > forward direction repeatedly, can't we?  That's because, when the
> > iterator is some position, we are only interested in compositions that
> > cover that position.
> 
> No.  Such a way slows down the display of a buffer that has
> no composition at all.  For such a buffer,
> composition_compute_stop_pos should set cmp_it->stop_pos to
> the actual endpos so that CHAR_COMPOSED_P quickly returns
> zero.

It could be that having CHAR_COMPOSED_P return non-zero once every 16
characters in a buffer with no compositions at all is still the best
we can do, see above.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-04-30  7:08                     ` Eli Zaretskii
@ 2010-05-03  2:39                       ` Kenichi Handa
  2010-05-03  7:31                         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-05-03  2:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83tyqtwh7z.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: emacs-devel@gnu.org
> > Date: Fri, 30 Apr 2010 15:06:11 +0900
> > 
> > In the case of "english HEBREW TEXT text" (lowercases are
> > l2r characters, upppercases are r2l characters),
> > get_next_display_element starts from the first "e" and
> > proceeds to the first " " (stage 1), then jumps to the last
> > "T" and proceeds back to the first "H" (stage 2), then jumps
> > to the last " " and proceeds to the last "t" (stage 3).

> This is only the simplest case, with just 2 embedding levels: the base
> level of the paragraph, and the (higher) level of the embedded R2L
> text.  The general case is much more complex: there could be up to 60
> nested levels, and some of them could begin or end at the same buffer
> position.  bidi.c handles all this complexity by means of a very
> simple algorithm, but that algorithm needs to know a lot about the
> characters traversed so far.  I don't think exposing all these
> internals to xdisp.c is a good idea.

Just exposing (or creating) one function that tells where
the current bidi-run ends is enough.  Is it that difficult?

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> Right, but the same is true for the bidi iteration: I need only to
> know when to check for composition; the actual composing will be still
> done by composition_reseat_it.  I just cannot assume that I always
> move linearly forward in the buffer.  Therefore, it is not enough to
> have only the next stop position recorded in the iterator.  I need
> more information recorded.  What I'm trying to determine in this
> thread is what needs to be recorded and how to compute what's needed.
> Thanks for helping me.

I don't understand the logic of "Therefore" in the above
paragraph.

> > Isn't it possible to record where the current bidi-run
> > started while you scan a buffer in
> > bidi_get_next_char_visually?

> See above: it's tricky.  The function in bidi.c that looks for the
> beginning and end of a level run relies on almost all the other
> functions in bidi.c, and it does that on the fly.  The level edges are
> not recorded anywhere, except in an internal cache used to speed up
> moving back in the buffer.

Then, what we need is a function that return the value of that cache.

> > > If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
> > > long can a composition sequence be?
> > 
> > It is MAX_COMPOSITION_COMPONENTS (16), but here it's not
> > relevant.

> Why not?  Isn't it true that if none of the 16 characters preceding
> the current position can start a composition sequence, then the
> current position is not inside a composition sequence?

It's true, but how does it contribute to find where to check a
composition next time?

> > > Another idea would be to call composition_compute_stop_pos repeatedly,
> > > starting from the last cmp_it->stop_pos, until we find the last
> > > stop_pos before the current iterator position, then compute the
> > > beginning and end of the composable sequence at that position, and
> > > record it in the iterator.  Then we handle the composition when we
> > > enter the sequence from either end.
> > 
> > To move from one composition position to the next, we must
> > actually call autocmp_chars and find where the current
> > composition ends, then start searching for the next
> > composition.  As autocmp_chars calls Lisp and all functions
> > to compose characters, it's so inefficient to call it
> > repeatedly just to find the last one.

> If the buffer or string is full of composed characters, then yes, it
> would be a slowdown.  Especially if the number of ``suspect'' stop
> positions is much larger than the number of actual composition
> sequences.  But what else can be done, given the design of the
> compositions that doesn't let us know the sequence length without
> actually composing the character?

Isn't it faster to call bidi_get_next_char_visually
repeatedly.  At least it doesn't call Lisp.

And, aren't there any possibility in the current bidi code
to provide a function that gives the information I'm asking?

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-05-03  2:39                       ` Kenichi Handa
@ 2010-05-03  7:31                         ` Eli Zaretskii
  2010-05-04  9:19                           ` Kenichi Handa
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-05-03  7:31 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 03 May 2010 11:39:24 +0900
> 
> > This is only the simplest case, with just 2 embedding levels: the base
> > level of the paragraph, and the (higher) level of the embedded R2L
> > text.  The general case is much more complex: there could be up to 60
> > nested levels, and some of them could begin or end at the same buffer
> > position.  bidi.c handles all this complexity by means of a very
> > simple algorithm, but that algorithm needs to know a lot about the
> > characters traversed so far.  I don't think exposing all these
> > internals to xdisp.c is a good idea.
> 
> Just exposing (or creating) one function that tells where
> the current bidi-run ends is enough.  Is it that difficult?

Maybe not, but what will this solve?  The end of a level run can still
potentially be far away, much farther than we need to look to find
compositions.  I'm trying to find a way of searching smaller parts of
the buffer.

In addition, going back in the buffer is much less efficient than
going forward, so it's probably a good idea to avoid looking back by
decrementing buffer positions.

> > > Note that composition_compute_stop_pos just finds a stop
> > > position to check, and the actual checking and composing is
> > > done by composition_reseat_it which is called by
> > > CHAR_COMPOSED_P.
> 
> > Right, but the same is true for the bidi iteration: I need only to
> > know when to check for composition; the actual composing will be still
> > done by composition_reseat_it.  I just cannot assume that I always
> > move linearly forward in the buffer.  Therefore, it is not enough to
> > have only the next stop position recorded in the iterator.  I need
> > more information recorded.  What I'm trying to determine in this
> > thread is what needs to be recorded and how to compute what's needed.
> > Thanks for helping me.
> 
> I don't understand the logic of "Therefore" in the above
> paragraph.

When we traverse the buffer in a single direction, like with Emacs 23
redisplay, we only need to record the single next position to check
for compositions, which is always _after_ (at higher buffer position)
than where we are.  Until we get to that position, we _know_ there
will be no composition sequences in the buffer.

By contrast, when we traverse the buffer non-linearly, changing
direction and jumping back and forth, we can suddenly find ourselves
beyond this single next position, without actually passing it and
handling the composition at that position.  So we need to record more
information about possible places of compositions in the buffer, to
account for such non-linear movement.

> > > > Another idea would be to call composition_compute_stop_pos repeatedly,
> > > > starting from the last cmp_it->stop_pos, until we find the last
> > > > stop_pos before the current iterator position, then compute the
> > > > beginning and end of the composable sequence at that position, and
> > > > record it in the iterator.  Then we handle the composition when we
> > > > enter the sequence from either end.
> > > 
> > > To move from one composition position to the next, we must
> > > actually call autocmp_chars and find where the current
> > > composition ends, then start searching for the next
> > > composition.  As autocmp_chars calls Lisp and all functions
> > > to compose characters, it's so inefficient to call it
> > > repeatedly just to find the last one.
> 
> > If the buffer or string is full of composed characters, then yes, it
> > would be a slowdown.  Especially if the number of ``suspect'' stop
> > positions is much larger than the number of actual composition
> > sequences.  But what else can be done, given the design of the
> > compositions that doesn't let us know the sequence length without
> > actually composing the character?
> 
> Isn't it faster to call bidi_get_next_char_visually
> repeatedly.  At least it doesn't call Lisp.

I'm confused.  bidi_get_next_char_visually is what we use now to move
through the buffer, so using it gets me back at the problem I'm trying
to solve: how to know, at an arbitrary position returned by
bidi_get_next_char_visually, whether it is inside a composition
sequence.

What am I missing?




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-05-03  7:31                         ` Eli Zaretskii
@ 2010-05-04  9:19                           ` Kenichi Handa
  2010-05-04 17:47                             ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-05-04  9:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <E1O8q7I-0003HV-FH@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > > If the buffer or string is full of composed characters, then yes, it
> > > would be a slowdown.  Especially if the number of ``suspect'' stop
> > > positions is much larger than the number of actual composition
> > > sequences.  But what else can be done, given the design of the
> > > compositions that doesn't let us know the sequence length without
> > > actually composing the character?
> > 
> > Isn't it faster to call bidi_get_next_char_visually
> > repeatedly.  At least it doesn't call Lisp.

> I'm confused.  bidi_get_next_char_visually is what we use now to move
> through the buffer, so using it gets me back at the problem I'm trying
> to solve: how to know, at an arbitrary position returned by
> bidi_get_next_char_visually, whether it is inside a composition
> sequence.

It seems that we are discussing based on different
strategies for solving the current problem.

My current plan is not to make bidi_get_next_char_visually aware of
composition, but to make composition codes pay attention to bidi and
take responsibility on setting character positions at composition
boundary.

I'm now modifying my local copy along that line.  As soon as I finish
it, I'll show you the code and ask your comment.

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Compositions and bidi display
  2010-05-04  9:19                           ` Kenichi Handa
@ 2010-05-04 17:47                             ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-05-04 17:47 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 04 May 2010 18:19:30 +0900
> 
> My current plan is not to make bidi_get_next_char_visually aware of
> composition, but to make composition codes pay attention to bidi and
> take responsibility on setting character positions at composition
> boundary.

I meant the same.  I probably simply misunderstood you, sorry.

> I'm now modifying my local copy along that line.  As soon as I finish
> it, I'll show you the code and ask your comment.

Thank you.




^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-05-04 17:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <3A521851-F7CC-45DB-A2ED-8348EF96D5CF@Freenet.DE>
     [not found] ` <83fx2q5w86.fsf@gnu.org>
     [not found]   ` <tl739yppmat.fsf@m17n.org>
2010-04-23 18:52     ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
2010-04-23 20:34       ` Andreas Schwab
2010-04-23 20:43         ` Eli Zaretskii
2010-04-24 11:27           ` Eli Zaretskii
2010-04-26  2:09       ` Kenichi Handa
2010-04-26  2:38         ` Kenichi Handa
2010-04-26 11:29       ` Kenichi Handa
2010-04-26 18:40         ` Compositions and bidi display Eli Zaretskii
2010-04-27 12:15           ` Kenichi Handa
2010-04-28  3:18             ` Eli Zaretskii
2010-04-28  4:01               ` Kenichi Handa
2010-04-28 17:38                 ` Eli Zaretskii
2010-04-28 22:49                   ` Stefan Monnier
2010-04-29  3:12                     ` Eli Zaretskii
2010-04-30  2:28                       ` Kenichi Handa
2010-04-30  6:41                         ` Eli Zaretskii
2010-04-30  6:06                   ` Kenichi Handa
2010-04-30  7:08                     ` Eli Zaretskii
2010-05-03  2:39                       ` Kenichi Handa
2010-05-03  7:31                         ` Eli Zaretskii
2010-05-04  9:19                           ` Kenichi Handa
2010-05-04 17:47                             ` Eli Zaretskii
2010-04-30 10:07                     ` Eli Zaretskii
2010-04-30 12:12                       ` Kenichi Handa
2010-04-30 13:15                         ` Eli Zaretskii
2010-04-27  3:13         ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
2010-04-27 12:26           ` Kenichi Handa

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).