* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Eli Zaretskii @ 2010-04-23 18:52 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: Peter_Dyballa@Freenet.DE, 5977@debbugs.gnu.org
> Date: Wed, 21 Apr 2010 11:32:58 +0900
>
> I've just built the trunk code on GNU/Linux, and found that all
> characters displayed by composition are incorrect.

Only when bidi-display-reordering is turned on (etc/HELLO does that
automatically).

> Here's a brief explanation about control flow.

Thanks, that part was quite clear from the code. I now fixed display
of composed characters from L2R scripts when bidi-display-reordering
is set to non-nil.

Where I really need help is in getting compositions to work when text
is reordered. Is it true that composition_reseat_it and its
subroutines need to see the to-be-composed characters in strict
logical order, i.e. left to right? Or can they also work if they see
the characters to be composed in the reverse order?

Also, what does this condition (in next_element_from_composition)
check?

  if (it->c < 0)
    {
      IT_CHARPOS (*it) += it->cmp_it.nchars;
      IT_BYTEPOS (*it) += it->cmp_it.nbytes;

If the meaning of the test is that there's no composition at the
iterator's position, then why do we skip some of the buffer text under
this condition?

Thanks for your help.
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Andreas Schwab @ 2010-04-23 20:34 UTC
To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks, that part was quite clear from the code. I now fixed display
> of composed characters from L2R scripts when bidi-display-reordering
> is set to non-nil.

There is still a problem with the cursor positioning when the line ends
with a composed character (try moving point to the end of the Lao line).

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Eli Zaretskii @ 2010-04-23 20:43 UTC
To: Andreas Schwab; +Cc: emacs-devel, handa

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Kenichi Handa <handa@m17n.org>, emacs-devel@gnu.org
> Date: Fri, 23 Apr 2010 22:34:35 +0200
>
> There is still a problem with the cursor positioning when the line ends
> with a composed character (try moving point to the end of the Lao line).

Yes, I know. I'm working on that.
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Eli Zaretskii @ 2010-04-24 11:27 UTC
To: schwab, emacs-devel, handa

> Date: Fri, 23 Apr 2010 23:43:48 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org, handa@m17n.org
>
> > From: Andreas Schwab <schwab@linux-m68k.org>
> > Cc: Kenichi Handa <handa@m17n.org>, emacs-devel@gnu.org
> > Date: Fri, 23 Apr 2010 22:34:35 +0200
> >
> > There is still a problem with the cursor positioning when the line ends
> > with a composed character (try moving point to the end of the Lao line).
>
> Yes, I know. I'm working on that.

Fixed it, I think (revno 100025).
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Kenichi Handa @ 2010-04-26 2:09 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <834oj22e96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > I've just built the trunk code on GNU/Linux, and found that all
> > characters displayed by composition are incorrect.

> Only when bidi-display-reordering is turned on (etc/HELLO does that
> automatically).

> > Here's a brief explanation about control flow.

> Thanks, that part was quite clear from the code. I now fixed display
> of composed characters from L2R scripts when bidi-display-reordering
> is set to non-nil.

I've just

> Where I really need help is in getting compositions to work when text
> is reordered. Is it true that composition_reseat_it and its
> subroutines need to see the to-be-composed characters in strict
> logical order, i.e. left to right? Or can they also work if they see
> the characters to be composed in the reverse order?

> Also, what does this condition (in next_element_from_composition)
> check?

>   if (it->c < 0)
>     {
>       IT_CHARPOS (*it) += it->cmp_it.nchars;
>       IT_BYTEPOS (*it) += it->cmp_it.nbytes;

> If the meaning of the test is that there's no composition at the
> iterator's position, then why do we skip some of the buffer text under
> this condition?

I vaguely remember that this is to avoid a crash caused by a bug in a
composition function. A composition function is written in Lisp and
can be tested interactively without restarting Emacs each time. If it
has a bug while being tested, it may produce no glyphs for a chunk of
text. In such a case, composition_update_it returns -1 and it->c is
set to that return value.

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Kenichi Handa @ 2010-04-26 2:38 UTC
To: Kenichi Handa; +Cc: eliz, emacs-devel

Oops, I typed C-c C-c too early.

In article <tl7r5m3hsmd.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> > Thanks, that part was quite clear from the code. I now fixed display
> > of composed characters from L2R scripts when bidi-display-reordering
> > is set to non-nil.

> I've just

I meant "I've just confirmed it, thank you."

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
From: Kenichi Handa @ 2010-04-26 11:29 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <834oj22e96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Where I really need help is in getting compositions to work when text
> is reordered. Is it true that composition_reseat_it and its
> subroutines need to see the to-be-composed characters in strict
> logical order, i.e. left to right? Or can they also work if they see
> the characters to be composed in the reverse order?

All composition-related functions expect characters to be in logical
order. The bottom-most library for OTF handling (libotf) requires it
because OpenType tables expect characters in logical order.

So, the bidi reordering must happen after composition handling is
done.

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display
From: Eli Zaretskii @ 2010-04-26 18:40 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 26 Apr 2010 20:29:18 +0900
> Cc: emacs-devel@gnu.org
>
> All composition-related functions expect characters to be in
> logical order.

I assumed that much. Sigh...

> So, the bidi reordering must happen after composition handling is
> done.

Unfortunately, this is impossible, not without throwing away the
entire design and current implementation of the bidi reordering, and
implementing it in a totally different way that will have to be much
more invasive into the overall design of the Emacs display engine.

The reason is, as you know, that bidi reordering in Emacs is
conceptually just a replacement for advancing from one character to
the next during iteration through buffers or strings. Instead of
incrementing the character position to the next character, we modify
the position non-linearly to get to the next character in the visual
order. Obviously, this iteration is a lower-level operation than
character composition.

In addition, the bidi reordering engine knows nothing about the
characters it encounters except their bidirectional properties; in
particular, it doesn't know anything about character compositions, and
teaching it about them would mean rather serious complications.

Moreover, the bidirectional properties are in general defined for
individual characters, not for the composed ones, which is one more
reason it is very hard to do what you suggest, even if we would turn
the current design inside out. For example, we compose Hebrew
consonants with diacriticals into a single glyph, but that glyph has
no character codepoint to look up its bidirectional properties in the
Unicode database. So, once composed, these characters cannot be
reordered by following the UAX#9 algorithm without complications,
because UAX#9 is explicitly defined to work _before_ any shaping of
characters for display, see Section 3.5 there.

Therefore, I will need to find and handle sequences of characters to
be composed as an integral part of next_element_from_buffer, similarly
to what is already done with face changes there.

The idea is to detect the situation where the bidi iteration placed us
into a composable sequence of characters, and when that happens,
compose them and deliver them as a single display element, and then
skip the entire sequence, like we do today in the unidirectional
display. The tricky part is that today we only detect this when we
hit the beginning of such a sequence, while moving in the strictly
increasing order of buffer positions; with bidi reordering we will
need to detect them from the end of the sequence as well, for when the
bidi iterator moves backwards or jumps across many character
positions.

Is it possible to write a function or macro that will find out, for a
particular buffer/string position, whether that position is at the end
or in the middle of a composable sequence of characters, and if so,
return the character positions of the first and last characters of the
sequence? Something like CHAR_COMPOSED_P, but one that looks back in
the buffer? If so, could you please help me write such a function?

TIA
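The look-back helper requested in this message can be pictured with a toy model. The sketch below is not Emacs code and does not use the real composition machinery, which decides composability by running Lisp composition functions. It assumes, purely for illustration, that an uppercase ASCII letter is a base character, a lowercase letter is a combining mark, and a composable sequence is one base followed by one or more marks. The name toy_find_sequence and its max_lookback parameter are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT Emacs code: uppercase = base character,
   lowercase = combining mark, composable sequence = base + marks.  */

/* If POS lies anywhere inside a composable sequence of TEXT (length
   LEN), store the positions of its first and last characters in
   *START and *END and return true.  Look back at most MAX_LOOKBACK
   characters before giving up.  */
bool
toy_find_sequence (const char *text, size_t len, size_t pos,
                   size_t max_lookback, size_t *start, size_t *end)
{
  assert (text != NULL && start != NULL && end != NULL);
  if (pos >= len)
    return false;

  /* Walk back over marks until something that is not a mark.  */
  size_t s = pos;
  size_t steps = 0;
  while (islower ((unsigned char) text[s]))
    {
      if (s == 0 || steps == max_lookback)
        return false;   /* no base within the look-back window */
      s--;
      steps++;
    }
  if (!isupper ((unsigned char) text[s]))
    return false;       /* stopped on a character that starts nothing */

  /* Walk forward from the base over all marks that follow it.  */
  size_t e = s;
  while (e + 1 < len && islower ((unsigned char) text[e + 1]))
    e++;
  if (e == s)
    return false;       /* a lone base character is not composed */

  *start = s;
  *end = e;
  return true;
}
```

Called on "AaBbCc" with POS at the final 'c', this reports the sequence "Cc" whether the iterator arrived from the left or from the right, which is the property the message asks for. How large the look-back window would have to be in real Emacs is exactly the open question of the thread.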
* Re: Compositions and bidi display
From: Kenichi Handa @ 2010-04-27 12:15 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <837hnuys42.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > So, the bidi reordering must happen after composition handling is
> > done.

> Unfortunately, this is impossible, not without throwing away the
> entire design and current implementation of the bidi reordering, and
> implementing it in a totally different way that will have to be much
> more invasive into the overall design of the Emacs display engine.

> The reason is, as you know, that bidi reordering in Emacs is
> conceptually just a replacement for advancing from one character to
> the next during iteration through buffers or strings. Instead of
> incrementing the character position to the next character, we modify
> the position non-linearly to get to the next character in the visual
> order. Obviously, this iteration is a lower-level operation than
> character composition.

> In addition, the bidi reordering engine knows nothing about the
> characters it encounters except their bidirectional properties; in
> particular, it doesn't know anything about character compositions, and
> teaching it about them would mean rather serious complications.

> Moreover, the bidirectional properties are in general defined for
> individual characters, not for the composed ones, which is one more
> reason it is very hard to do what you suggest, even if we would turn
> the current design inside out. For example, we compose Hebrew
> consonants with diacriticals into a single glyph, but that glyph has
> no character codepoint to look up its bidirectional properties in the
> Unicode database.

I think it's possible to apply Unicode's bidi algorithm to the glyph
sequence if each glyph provides a character code to check for
reordering. For a composition glyph, we can use the first character of
the composed sequence. But, as your algorithm is incremental and
doesn't cache glyphs, such a method may slow down the display engine.

> So, once composed, these characters cannot be
> reordered by following the UAX#9 algorithm without complications,
> because UAX#9 is explicitly defined to work _before_ any shaping of
> characters for display, see Section 3.5 there.

The example in Section 3.5 is for base characters; it is not
applicable to a sequence of a base character and combining characters.
First of all, TR9's bidi model is not incremental, and thus the
shaping engine can see the result of all reordering at once. In that
model, it's possible for the shaping engine to reverse the order of a
base character and combining characters after bidi processing, as
written in L3 of 3.4:

============================================================
L3. Combining marks applied to a right-to-left base character will at
this point precede their base character. If the rendering engine
expects them to follow the base characters in the final display
process, then the ordering of the marks and the base character must be
reversed.
============================================================

So, how to do that in the current incremental method?

> Therefore, I will need to find and handle sequences of characters to
> be composed as an integral part of next_element_from_buffer, similarly
> to what is already done with face changes there.

> The idea is to detect the situation where the bidi iteration placed us
> into a composable sequence of characters, and when that happens,
> compose them and deliver them as a single display element, and then
> skip the entire sequence, like we do today in the unidirectional
> display. The tricky part is that today we only detect this when we
> hit the beginning of such a sequence, while moving in the strictly
> increasing order of buffer positions; with bidi reordering we will
> need to detect them from the end of the sequence as well, for when the
> bidi iterator moves backwards or jumps across many character
> positions.

> Is it possible to write a function or macro that will find out, for a
> particular buffer/string position, whether that position is at the end
> or in the middle of a composable sequence of characters, and if so,
> return the character positions of the first and last characters of the
> sequence? Something like CHAR_COMPOSED_P, but one that looks back in
> the buffer? If so, could you please help me write such a function?

Here's a rough idea.

(1) Call composition_compute_stop_pos with ENDPOS < CHARPOS if we are
    now in R2L range. ENDPOS is the start of this R2L range. And
    modify this function to search a buffer/string backward if
    ENDPOS < CHARPOS.

    Provided that uppercase letters denote Hebrew consonants,
    lowercase denotes Hebrew diacriticals, a buffer has the
    character sequence "AaBbCc", CHARPOS is the position of 'c',
    ENDPOS is the position of 'A'.

(2) Do the same for composition_reseat_it.

(3) Add member 'direction' to struct composition_it that records in
    which direction context the composition was made.

(4) Modify composition_update_it to update members 'from' and 'to' of
    "struct composition_it" in the reverse order if 'direction' is
    R2L. Note that a single composition may contain multiple grapheme
    clusters. For instance, it's possible to write a composition
    function that accepts "AaBbCc" (above example) at once and
    produces a single composition that contains three grapheme
    clusters "Aa", "Bb", and "Cc".

To do all of them, perhaps all I need is to know the way to
find the correct ENDPOS. Please tell me how to do that.

---
Kenichi Handa
handa@m17n.org
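The effect of rule L3 quoted in this message can be illustrated on the "AaBbCc" example. The sketch below is a toy, not Emacs or libotf code: it assumes the thread's convention that uppercase letters are R2L base characters and lowercase letters are their combining marks, works on plain ASCII strings, and uses the hypothetical names toy_reverse_run and toy_apply_l3. It first reverses a whole R2L run, as non-incremental reordering would, and then puts each group of marks back after its base.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Reverse the characters of S in the half-open range [FROM, TO).  */
static void
toy_reverse_range (char *s, size_t from, size_t to)
{
  while (from + 1 < to)
    {
      char tmp = s[from];
      s[from++] = s[--to];
      s[to] = tmp;
    }
}

/* Reverse a whole R2L run in place, as plain reordering would.
   After this step every mark precedes its base character.  */
void
toy_reverse_run (char *s)
{
  assert (s != NULL);
  toy_reverse_range (s, 0, strlen (s));
}

/* Rule L3 in the toy model: wherever a group of marks (lowercase) is
   immediately followed by a base (uppercase), reverse that group so
   the base comes first and the marks regain their logical order.  */
void
toy_apply_l3 (char *s)
{
  assert (s != NULL);
  size_t len = strlen (s);
  size_t i = 0;
  while (i < len)
    {
      if (!islower ((unsigned char) s[i]))
        {
          i++;
          continue;
        }
      size_t j = i;
      while (j < len && islower ((unsigned char) s[j]))
        j++;
      if (j < len && isupper ((unsigned char) s[j]))
        {
          toy_reverse_range (s, i, j + 1);
          i = j + 1;
        }
      else
        i = j;          /* marks without a base: leave them alone */
    }
}
```

Starting from "AaBbCc", toy_reverse_run gives "cCbBaA" and toy_apply_l3 then gives "CcBbAa": the clusters appear in reverse order while each cluster keeps its base first. This needs the whole reversed run at once, which is why the message asks how the same result could be obtained by an incremental iterator.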
* Re: Compositions and bidi display
From: Eli Zaretskii @ 2010-04-28 3:18 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Tue, 27 Apr 2010 21:15:04 +0900
>
> Provided that uppercase letters denote Hebrew consonants,
> lowercase denotes Hebrew diacriticals, a buffer has the
> character sequence "AaBbCc", CHARPOS is the position of 'c',
> ENDPOS is the position of 'A'.
> [...]
> To do all of them, perhaps all I need is to know the way to
> find the correct ENDPOS. Please tell me how to do that.

What is the definition of ENDPOS? If that's the beginning of the
composition sequence, that's the same question I asked, for which I
don't know the answer. If that's the other end of the R2L run of
characters, you need to iterate with bidi_get_next_char_visually until
some condition (which I cannot yet formulate) is satisfied. But note
that this is tricky, because the bidi iteration changes direction and
jumps at will.
* Re: Compositions and bidi display
From: Kenichi Handa @ 2010-04-28 4:01 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <83mxwoxo1t.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Provided that uppercase letters denote Hebrew consonants,
> > lowercase denotes Hebrew diacriticals, a buffer has the
> > character sequence "AaBbCc", CHARPOS is the position of 'c',
> > ENDPOS is the position of 'A'.
> > [...]
> > To do all of them, perhaps all I need is to know the way to
> > find the correct ENDPOS. Please tell me how to do that.

> What is the definition of ENDPOS? If that's the beginning of the
> composition sequence, that's the same question I asked, for which I
> don't know the answer. If that's the other end of the R2L run of
> characters,

Yes, that one.

> you need to iterate with bidi_get_next_char_visually until
> some condition (which I cannot yet formulate) is
> satisfied. But note that this is tricky, because the bidi
> iteration changes direction and jumps at will.

The condition should be "until it reaches a character that
should never be composed with the character currently being
examined". We may be able to simplify that condition to
"until it reaches a character in a different bidi level
(or chunk)".

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display
From: Eli Zaretskii @ 2010-04-28 17:38 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Wed, 28 Apr 2010 13:01:10 +0900
>
> > What is the definition of ENDPOS? If that's the beginning of the
> > composition sequence, that's the same question I asked, for which I
> > don't know the answer. If that's the other end of the R2L run of
> > characters,
>
> Yes, that one.
>
> > you need to iterate with bidi_get_next_char_visually until
> > some condition (which I cannot yet formulate) is
> > satisfied. But note that this is tricky, because the bidi
> > iteration changes direction and jumps at will.
>
> The condition should be "until it reaches a character that
> should never be composed with the character currently being
> examined".

That is the condition I'm looking for. But how to code it? Is the
code in find_automatic_composition a good starting point? AFAIU, it
can search backward as well as forward.

> We may be able to simplify that condition to
> "until it reaches a character in a different bidi level
> (or chunk)".

But that could be very far back. I would really like to avoid going
too far back, just to find out whether we reached a composition
sequence, because (again AFAIU) the length of most such sequences is
just a few characters. Is it correct that searching back
MAX_AUTO_COMPOSITION_LOOKBACK characters is enough?

If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
long can a composition sequence be?

Another idea would be to call composition_compute_stop_pos repeatedly,
starting from the last cmp_it->stop_pos, until we find the last
stop_pos before the current iterator position, then compute the
beginning and end of the composable sequence at that position, and
record it in the iterator. Then we handle the composition when we
enter the sequence from either end.

Btw, do we still need to support static compositions? Those are based
on the `composition' text property, which are no longer supported,
right? Or am I confused?
* Re: Compositions and bidi display
From: Stefan Monnier @ 2010-04-28 22:49 UTC
To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

> Btw, do we still need to support static compositions? Those are based
> on the `composition' text property, which are no longer supported,
> right? Or am I confused?

They're not? Does that mean that

   (font-lock-add-keywords
    nil
    `(("(lambda\\>"
       (0 (progn (compose-region (1+ (match-beginning 0)) (match-end 0)
                                 ;; ,(make-char 'greek-iso8859-7 107)
                                 ?λ)
                 nil)))))

is using unsupported features? I know I could use `display' instead,
but some details of `compose-region' are handy (e.g. the fact that it's
automatically removed when the text is modified).

        Stefan
* Re: Compositions and bidi display
From: Eli Zaretskii @ 2010-04-29 3:12 UTC
To: Stefan Monnier; +Cc: handa, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 28 Apr 2010 18:49:39 -0400
> Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
>
> > Btw, do we still need to support static compositions? Those are based
> > on the `composition' text property, which are no longer supported,
> > right? Or am I confused?
>
> They're not?

I deduced this from the fact that we removed Qcomposition and the
associated handle_composition_prop from xdisp.c. Again, I could be
confused.
* Re: Compositions and bidi display
From: Kenichi Handa @ 2010-04-30 2:28 UTC
To: Eli Zaretskii; +Cc: monnier, emacs-devel

In article <838w87x87a.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Date: Wed, 28 Apr 2010 18:49:39 -0400
> > Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> >
> > > Btw, do we still need to support static compositions? Those are based
> > > on the `composition' text property, which are no longer supported,
> > > right? Or am I confused?
> >
> > They're not?

> I deduced this from the fact that we removed Qcomposition and the
> associated handle_composition_prop from xdisp.c. Again, I could be
> confused.

??? I've never removed handle_composition_prop nor any of
the code for static composition.

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display
From: Eli Zaretskii @ 2010-04-30 6:41 UTC
To: Kenichi Handa; +Cc: monnier, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 11:28:40 +0900
>
> In article <838w87x87a.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
> > > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > > Date: Wed, 28 Apr 2010 18:49:39 -0400
> > > Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> > >
> > > > Btw, do we still need to support static compositions? Those are based
> > > > on the `composition' text property, which are no longer supported,
> > > > right? Or am I confused?
> > >
> > > They're not?
>
> > I deduced this from the fact that we removed Qcomposition and the
> > associated handle_composition_prop from xdisp.c. Again, I could be
> > confused.
>
> ??? I've never removed handle_composition_prop nor any of
> the code for static composition.

Sorry, I got confused.
* Re: Compositions and bidi display
From: Kenichi Handa @ 2010-04-30 6:06 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <83d3xjxys1.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > The condition should be "until it reaches a character that
> > should never be composed with the character currently being
> > examined".

> That is the condition I'm looking for. But how to code it? Is the
> code in find_automatic_composition a good starting point?

No. Checking whether characters at a specific position can be
composed is done within composition_compute_stop_pos. What we
need now is to know where we should stop searching in
composition_compute_stop_pos.

In the case of "english HEBREW TEXT text" (lowercase letters are
L2R characters, uppercase letters are R2L characters),
get_next_display_element starts from the first "e" and
proceeds to the first " " (stage 1), then jumps to the last
"T" and proceeds back to the first "H" (stage 2), then jumps
to the last " " and proceeds to the last "t" (stage 3).

When composition_compute_stop_pos is called in stage 1,
ENDPOS should be the first " " because searching farther is
useless (we may have to compose some of "TEXT" before
composing some of "HEBREW"). When
composition_compute_stop_pos is called in stage 2, ENDPOS
should be the first "H" because searching farther back is
useless, and so on.

Note that composition_compute_stop_pos just finds a stop
position to check, and the actual checking and composing is
done by composition_reseat_it which is called by
CHAR_COMPOSED_P. But composition_reseat_it also needs
ENDPOS because when that function finds that there's no need
of composition at the stop position, it calls
composition_compute_stop_pos to update the next stop
position.

> > We may be able to simplify that condition to
> > "until it reaches a character in a different bidi level
> > (or chunk)".

> But that could be very far back.

Isn't it possible to record where the current bidi-run
started while you scan a buffer in
bidi_get_next_char_visually?

> I would really like to avoid going too far back, just to
> find out whether we reached a composition sequence,

We don't have to re-calculate ENDPOS each time. It must be
updated only when we pass over a bidi boundary. Consider the
above example case ("english ...").

> because (again AFAIU) the length of most such sequences is
> just a few characters. Is it correct that searching back
> MAX_AUTO_COMPOSITION_LOOKBACK characters is enough?

No.

> If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
> long can a composition sequence be?

It is MAX_COMPOSITION_COMPONENTS (16), but here it's not
relevant. What we need is to find where in a buffer (before
the scan reaches ENDPOS) the next composition will happen. And,
to perform it efficiently, giving a proper ENDPOS is
necessary.

> Another idea would be to call composition_compute_stop_pos repeatedly,
> starting from the last cmp_it->stop_pos, until we find the last
> stop_pos before the current iterator position, then compute the
> beginning and end of the composable sequence at that position, and
> record it in the iterator. Then we handle the composition when we
> enter the sequence from either end.

To move from one composition position to the next, we must
actually call autocmp_chars and find where the current
composition ends, then start searching for the next
composition. As autocmp_chars calls Lisp and all functions
to compose characters, it's too inefficient to call it
repeatedly just to find the last one.

---
Kenichi Handa
handa@m17n.org
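The three stages described for "english HEBREW TEXT text" can be reproduced with a small toy. The sketch below is not bidi.c and does not implement UAX#9: it assumes only two levels, treats lowercase ASCII letters as L2R and uppercase letters as R2L, and resolves any other character as R2L only when the nearest letters on both sides are R2L. The names toy_is_r2l and toy_visual_order are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy direction of the character at logical position I of TEXT
   (length LEN).  Uppercase is R2L, lowercase is L2R; anything else
   is R2L only if the nearest letters on both sides are R2L.  */
static bool
toy_is_r2l (const char *text, size_t len, size_t i)
{
  unsigned char c = (unsigned char) text[i];
  if (isupper (c))
    return true;
  if (islower (c))
    return false;

  size_t l = i, r = i;
  while (l > 0 && !isalpha ((unsigned char) text[l]))
    l--;
  while (r + 1 < len && !isalpha ((unsigned char) text[r]))
    r++;
  return (isupper ((unsigned char) text[l])
          && isupper ((unsigned char) text[r]));
}

/* Fill ORDER[0..LEN-1] with the logical positions of TEXT in the
   order a display iterator would visit them: L2R runs forward,
   each maximal R2L run from its last position back to its first.  */
void
toy_visual_order (const char *text, size_t len, size_t *order)
{
  assert (text != NULL && order != NULL);
  size_t out = 0;
  size_t i = 0;
  while (i < len)
    {
      if (!toy_is_r2l (text, len, i))
        {
          order[out++] = i++;
          continue;
        }
      /* Find the last position of this R2L run.  */
      size_t run_end = i;
      while (run_end + 1 < len && toy_is_r2l (text, len, run_end + 1))
        run_end++;
      /* Visit the run backward; its first position comes last.  */
      for (size_t p = run_end + 1; p > i; p--)
        order[out++] = p - 1;
      i = run_end + 1;
    }
}
```

For the example string the visiting order is positions 0 to 7 (stage 1), then 18 down to 8 (stage 2), then 19 to 23 (stage 3). The last position visited in each stage, 7, then 8, then 23, plays the role of ENDPOS in the message. Real text can nest many more levels, so this only illustrates the simplest case.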
* Re: Compositions and bidi display 2010-04-30 6:06 ` Kenichi Handa @ 2010-04-30 7:08 ` Eli Zaretskii 2010-05-03 2:39 ` Kenichi Handa 2010-04-30 10:07 ` Eli Zaretskii 1 sibling, 1 reply; 27+ messages in thread From: Eli Zaretskii @ 2010-04-30 7:08 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Fri, 30 Apr 2010 15:06:11 +0900 > > In the case of "english HEBREW TEXT text" (lowercases are > l2r characters, upppercases are r2l characters), > get_next_display_element starts from the first "e" and > proceeds to the first " " (stage 1), then jumps to the last > "T" and proceeds back to the first "H" (stage 2), then jumps > to the last " " and proceeds to the last "t" (stage 3). This is only the simplest case, with just 2 embedding levels: the base level of the paragraph, and the (higher) level of the embedded R2L text. The general case is much more complex: there could be up to 60 nested levels, and some of them could begin or end at the same buffer position. bidi.c handles all this complexity by means of a very simple algorithm, but that algorithm needs to know a lot about the characters traversed so far. I don't think exposing all these internals to xdisp.c is a good idea. > Note that composition_compute_stop_pos just finds a stop > position to check, and the actual checking and composing is > done by composition_reseat_it which is called by > CHAR_COMPOSED_P. Right, but the same is true for the bidi iteration: I need only to know when to check for composition; the actual composing will be still done by composition_reseat_it. I just cannot assume that I always move linearly forward in the buffer. Therefore, it is not enough to have only the next stop position recorded in the iterator. I need more information recorded. What I'm trying to determine in this thread is what needs to be recorded and how to compute what's needed. Thanks for helping me. 
> > > We may be able to simplify that condition to > > > "until it reaches a character in the different bidi level > > > (or chunk)". > > > But that could be very far back. > > Isn't it possible to record where the current bidi-run > started while you scan a buffer in > bidi_get_next_char_visually? See above: it's tricky. The function in bidi.c that looks for the beginning and end of a level run relies on almost all the other functions in bidi.c, and it does that on the fly. The level edges are not recorded anywhere, except in an internal cache used to speed up moving back in the buffer. > > If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how > > long can a composition sequence be? > > It is MAX_COMPOSITION_COMPONENTS (16), but here it's not > relevant. Why not? Isn't it true that if none of the 16 characters preceding the current position can start a composition sequence, then the current position is not inside a composition sequence? > > Another idea would be to call composition_compute_stop_pos repeatedly, > > starting from the last cmp_it->stop_pos, until we find the last > > stop_pos before the current iterator position, then compute the > > beginning and end of the composable sequence at that position, and > > record it in the iterator. Then we handle the composition when we > > enter the sequence from either end. > > To move from one composition position to the next, we must > actually call autocmp_chars and find where the current > composition ends, then start searching for the next > composition. As autocmp_chars calls Lisp and all functions > to compose characters, it's so inefficient to call it > repeatedly just to find the last one. If the buffer or string is full of composed characters, then yes, it would be a slowdown. Especially if the number of ``suspect'' stop positions is much larger than the number of actual composition sequences. 
But what else can be done, given the design of the compositions that doesn't let us know the sequence length without actually composing the character? Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
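A minimal sketch in C of the three-stage traversal quoted in the message above ("english HEBREW TEXT text"). It is a toy model under stated assumptions: an L2R paragraph with a single R2L run, uppercase ASCII letters standing for R2L characters, and everything between the first and last uppercase letter treated as part of that run. The name `toy_visual_order` is hypothetical and does not exist in Emacs; bidi.c resolves nested embedding levels and does not work this way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: fill ORDER with the buffer positions of TEXT in the
   order a display iterator would visit them, assuming an L2R paragraph
   that contains at most one R2L run.  Uppercase ASCII letters stand
   for R2L characters.  ORDER must have room for strlen (TEXT) entries.
   Returns the number of positions written.  */
size_t
toy_visual_order (const char *text, size_t *order)
{
  size_t len, first, last, i, n = 0;

  assert (text != NULL && order != NULL);
  len = strlen (text);
  first = last = len;           /* LEN means "no R2L character seen" */

  for (i = 0; i < len; i++)
    if (isupper ((unsigned char) text[i]))
      {
        if (first == len)
          first = i;
        last = i;
      }

  if (first == len)             /* pure L2R text: plain forward scan */
    {
      for (i = 0; i < len; i++)
        order[n++] = i;
      return n;
    }

  for (i = 0; i < first; i++)           /* stage 1: forward to the run */
    order[n++] = i;
  for (i = last + 1; i > first; i--)    /* stage 2: backward through it */
    order[n++] = i - 1;
  for (i = last + 1; i < len; i++)      /* stage 3: forward after it */
    order[n++] = i;
  return n;
}
```

For the 24-character example string, this visits positions 0 through 7, then 18 down to 8, then 19 through 23, which matches the three stages described in the quoted text.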
* Re: Compositions and bidi display 2010-04-30 7:08 ` Eli Zaretskii @ 2010-05-03 2:39 ` Kenichi Handa 2010-05-03 7:31 ` Eli Zaretskii 0 siblings, 1 reply; 27+ messages in thread From: Kenichi Handa @ 2010-05-03 2:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <83tyqtwh7z.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > From: Kenichi Handa <handa@m17n.org> > > Cc: emacs-devel@gnu.org > > Date: Fri, 30 Apr 2010 15:06:11 +0900 > > > > In the case of "english HEBREW TEXT text" (lowercases are > > l2r characters, upppercases are r2l characters), > > get_next_display_element starts from the first "e" and > > proceeds to the first " " (stage 1), then jumps to the last > > "T" and proceeds back to the first "H" (stage 2), then jumps > > to the last " " and proceeds to the last "t" (stage 3). > This is only the simplest case, with just 2 embedding levels: the base > level of the paragraph, and the (higher) level of the embedded R2L > text. The general case is much more complex: there could be up to 60 > nested levels, and some of them could begin or end at the same buffer > position. bidi.c handles all this complexity by means of a very > simple algorithm, but that algorithm needs to know a lot about the > characters traversed so far. I don't think exposing all these > internals to xdisp.c is a good idea. Just exposing (or creating) one function that tells where the current bidi-run ends is enough. Is it that difficult? > > Note that composition_compute_stop_pos just finds a stop > > position to check, and the actual checking and composing is > > done by composition_reseat_it which is called by > > CHAR_COMPOSED_P. > Right, but the same is true for the bidi iteration: I need only to > know when to check for composition; the actual composing will be still > done by composition_reseat_it. I just cannot assume that I always > move linearly forward in the buffer. 
Therefore, it is not enough to > have only the next stop position recorded in the iterator. I need > more information recorded. What I'm trying to determine in this > thread is what needs to be recorded and how to compute what's needed. > Thanks for helping me. I don't understand the logic of "Therefore" in the above paragraph. > > Isn't it possible to record where the current bidi-run > > started while you scan a buffer in > > bidi_get_next_char_visually? > See above: it's tricky. The function in bidi.c that looks for the > beginning and end of a level run relies on almost all the other > functions in bidi.c, and it does that on the fly. The level edges are > not recorded anywhere, except in an internal cache used to speed up > moving back in the buffer. Then, what we need is a function that return the value of that cache. > > > If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how > > > long can a composition sequence be? > > > > It is MAX_COMPOSITION_COMPONENTS (16), but here it's not > > relevant. > Why not? Isn't it true that if none of the 16 characters preceding > the current position can start a composition sequence, then the > current position is not inside a composition sequence? It's true, but how does it contribute to find where to check a composition next time? > > > Another idea would be to call composition_compute_stop_pos repeatedly, > > > starting from the last cmp_it->stop_pos, until we find the last > > > stop_pos before the current iterator position, then compute the > > > beginning and end of the composable sequence at that position, and > > > record it in the iterator. Then we handle the composition when we > > > enter the sequence from either end. > > > > To move from one composition position to the next, we must > > actually call autocmp_chars and find where the current > > composition ends, then start searching for the next > > composition. 
As autocmp_chars calls Lisp and all functions > > to compose characters, it's so inefficient to call it > > repeatedly just to find the last one. > If the buffer or string is full of composed characters, then yes, it > would be a slowdown. Especially if the number of ``suspect'' stop > positions is much larger than the number of actual composition > sequences. But what else can be done, given the design of the > compositions that doesn't let us know the sequence length without > actually composing the character? Isn't it faster to call bidi_get_next_char_visually repeatedly. At least it doesn't call Lisp. And, aren't there any possibility in the current bidi code to provide a function that gives the information I'm asking? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Compositions and bidi display 2010-05-03 2:39 ` Kenichi Handa @ 2010-05-03 7:31 ` Eli Zaretskii 2010-05-04 9:19 ` Kenichi Handa 0 siblings, 1 reply; 27+ messages in thread From: Eli Zaretskii @ 2010-05-03 7:31 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Mon, 03 May 2010 11:39:24 +0900 > > > This is only the simplest case, with just 2 embedding levels: the base > > level of the paragraph, and the (higher) level of the embedded R2L > > text. The general case is much more complex: there could be up to 60 > > nested levels, and some of them could begin or end at the same buffer > > position. bidi.c handles all this complexity by means of a very > > simple algorithm, but that algorithm needs to know a lot about the > > characters traversed so far. I don't think exposing all these > > internals to xdisp.c is a good idea. > > Just exposing (or creating) one function that tells where > the current bidi-run ends is enough. Is it that difficult? Maybe not, but what will this solve? The end of a level run can still potentially be far away, much farther than we need to look to find compositions. I'm trying to find a way of searching smaller parts of the buffer. In addition, going back in the buffer is much less efficient than going forward, so it's probably a good idea to avoid looking back by decrementing buffer positions. > > > Note that composition_compute_stop_pos just finds a stop > > > position to check, and the actual checking and composing is > > > done by composition_reseat_it which is called by > > > CHAR_COMPOSED_P. > > > Right, but the same is true for the bidi iteration: I need only to > > know when to check for composition; the actual composing will be still > > done by composition_reseat_it. I just cannot assume that I always > > move linearly forward in the buffer. Therefore, it is not enough to > > have only the next stop position recorded in the iterator. 
I need > > more information recorded. What I'm trying to determine in this > > thread is what needs to be recorded and how to compute what's needed. > > Thanks for helping me. > > I don't understand the logic of "Therefore" in the above > paragraph. When we traverse the buffer in a single direction, like with Emacs 23 redisplay, we only need to record the single next position to check for compositions, which is always _after_ (at higher buffer position) than where we are. Until we get to that position, we _know_ there will be no composition sequences in the buffer. By contrast, when we traverse the buffer non-linearly, changing direction and jumping back and forth, we can suddenly find ourselves beyond this single next position, without actually passing it and handling the composition at that position. So we need to record more information about possible places of compositions in the buffer, to account for such non-linear movement. > > > > Another idea would be to call composition_compute_stop_pos repeatedly, > > > > starting from the last cmp_it->stop_pos, until we find the last > > > > stop_pos before the current iterator position, then compute the > > > > beginning and end of the composable sequence at that position, and > > > > record it in the iterator. Then we handle the composition when we > > > > enter the sequence from either end. > > > > > > To move from one composition position to the next, we must > > > actually call autocmp_chars and find where the current > > > composition ends, then start searching for the next > > > composition. As autocmp_chars calls Lisp and all functions > > > to compose characters, it's so inefficient to call it > > > repeatedly just to find the last one. > > > If the buffer or string is full of composed characters, then yes, it > > would be a slowdown. Especially if the number of ``suspect'' stop > > positions is much larger than the number of actual composition > > sequences. 
But what else can be done, given the design of the > > compositions that doesn't let us know the sequence length without > > actually composing the character? > > Isn't it faster to call bidi_get_next_char_visually > repeatedly. At least it doesn't call Lisp. I'm confused. bidi_get_next_char_visually is what we use now to move through the buffer, so using it gets me back at the problem I'm trying to solve: how to know, at an arbitrary position returned by bidi_get_next_char_visually, whether it is inside a composition sequence. What am I missing? ^ permalink raw reply [flat|nested] 27+ messages in thread
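The problem described in the message above, that a single forward-looking stop position can be jumped over when the buffer is not traversed linearly, can be illustrated with a small C sketch. This is a toy model, not Emacs code: `toy_first_overshoot` and the plain array of visited positions are hypothetical stand-ins for the iterator and for `cmp_it->stop_pos`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only.  ORDER[0..N-1] lists buffer positions in the order
   the iterator visits them.  STOP_POS is the single recorded position
   where a composition check is due.  Return the index into ORDER of
   the first visit that lands beyond STOP_POS before STOP_POS itself
   has been visited, or -1 if that never happens.  */
long
toy_first_overshoot (const size_t *order, size_t n, size_t stop_pos)
{
  size_t i;

  assert (order != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (order[i] == stop_pos)
        return -1;              /* reached the stop position normally */
      if (order[i] > stop_pos)
        return (long) i;        /* jumped past it without a check */
    }
  return -1;
}
```

With a strictly increasing visiting order the function always returns -1. With an order such as 0, 1, 2, 7, 6, 5, 4, 3, 8, 9 and a stop position of 4, the fourth visit (position 7) is already past the stop position, which is the situation that calls for recording more than one position.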
* Re: Compositions and bidi display
  2010-05-03 7:31 ` Eli Zaretskii
@ 2010-05-04 9:19 ` Kenichi Handa
  2010-05-04 17:47 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-05-04 9:19 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <E1O8q7I-0003HV-FH@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > If the buffer or string is full of composed characters, then yes, it
> > > would be a slowdown. Especially if the number of ``suspect'' stop
> > > positions is much larger than the number of actual composition
> > > sequences. But what else can be done, given the design of the
> > > compositions that doesn't let us know the sequence length without
> > > actually composing the character?
> >
> > Isn't it faster to call bidi_get_next_char_visually
> > repeatedly. At least it doesn't call Lisp.

> I'm confused. bidi_get_next_char_visually is what we use now to move
> through the buffer, so using it gets me back at the problem I'm trying
> to solve: how to know, at an arbitrary position returned by
> bidi_get_next_char_visually, whether it is inside a composition
> sequence.

It seems that we are working from different strategies for solving
the current problem. My current plan is not to make
bidi_get_next_char_visually aware of composition, but to make the
composition code pay attention to bidi and take responsibility for
setting character positions at composition boundaries.

I'm now modifying my local copy along those lines. As soon as I
finish, I'll show you the code and ask for your comments.

---
Kenichi Handa
handa@m17n.org
* Re: Compositions and bidi display 2010-05-04 9:19 ` Kenichi Handa @ 2010-05-04 17:47 ` Eli Zaretskii 0 siblings, 0 replies; 27+ messages in thread From: Eli Zaretskii @ 2010-05-04 17:47 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Tue, 04 May 2010 18:19:30 +0900 > > My current plan is not to make bidi_get_next_char_visually aware of > composition, but to make composition codes pay attention to bidi and > take responsibility on setting character positions at composition > boundary. I meant the same. I probably simply misunderstood you, sorry. > I'm now modifying my local copy along that line. As soon as I finish > it, I'll show you the code and ask your comment. Thank you. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Compositions and bidi display
  2010-04-30 6:06 ` Kenichi Handa
  2010-04-30 7:08 ` Eli Zaretskii
@ 2010-04-30 10:07 ` Eli Zaretskii
  2010-04-30 12:12 ` Kenichi Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30 10:07 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 15:06:11 +0900

After re-reading the code of composition_compute_stop_pos, I have a
few more questions about what you wrote.

> Note that composition_compute_stop_pos just finds a stop
> position to check, and the actual checking and composing is
> done by composition_reseat_it which is called by
> CHAR_COMPOSED_P.

But it looks like composition_compute_stop_pos does use at least some
validation for the candidate stop position. AFAIU, this fragment
finds and validates a static composition:

    if (find_composition (charpos, endpos, &start, &end, &prop, string)
        && COMPOSITION_VALID_P (start, end, prop))
      {
        cmp_it->stop_pos = endpos = start;
        cmp_it->ch = -1;
      }

So it looks like COMPOSITION_VALID_P is the proper way of validating a
position that is a candidate for a static composition. Is that true?
If it is true, then the end point of the static composition is given
by the `end' argument to find_composition, and all we need is record
it in cmp_it. If not true, what _does_ COMPOSITION_VALID_P validate?

And the loop after that, conditioned on auto-composition-mode, seems
to do a similar job for automatic compositions. Omitting some
secondary details, that loop does this:

    while (charpos < endpos)
      {
        [advance to the next character]
        val = CHAR_TABLE_REF (Vcomposition_function_table, c);
        if (! NILP (val))
          {
            Lisp_Object elt;

            for (; CONSP (val); val = XCDR (val))
              {
                elt = XCAR (val);
                if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
                    && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
                  break;
              }
            if (CONSP (val))
              {
                cmp_it->lookback = XFASTINT (AREF (elt, 1));
                cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
                cmp_it->ch = c;
                return;
              }
          }
      }

This looks as if a position that is a candidate for starting a
composition sequence should have a non-nil entry in
composition-function-table for the character at that position, and
that entry should specify the (relative) character position where the
sequence might start. Is my understanding correct?

> To move from one composition position to the next, we must actually
> call autocmp_chars and find where the current composition ends, then
> start searching for the next composition.

It is true that the code looking for stop position that might begin an
automatic composition does not compute the end of the sequence. That
end is computed by autocmp_chars. But what does this mean in
practice? Suppose we have found a candidate stop_pos, marked by S
below:

    abcdeSuvwxyz

First, a composition sequence cannot be shorter than 2 characters,
right? So the next stop_pos cannot be before v. Now suppose that the
actual composition sequence is "Suvw", and we issue the next call to
composition_compute_stop_pos at v -- are you saying that it will
suggest that v is also a possible stop_pos, even though it is in the
middle of a composition sequence? If not, then repeated calls to
composition_compute_stop_pos in the bidi case, without calling
composition_reseat_it in between, will just be slightly
more expensive because they will need to examine more positions. Is
this analysis correct?

> But composition_reseat_it also needs ENDPOS

We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
call composition_reseat_it and composition_compute_stop_pos in the
forward direction repeatedly, can't we? That's because, when the
iterator is at some position, we are only interested in compositions
that cover that position.

> We don't have to re-calculate ENDPOS each time. It must be
> updated only when we pass over bidi boundary.

Btw, can we always assume that all the characters of a composition
sequence are at the same embedding level? I guess IOW I'm asking what
Emacs features are currently implemented based on compositions?
Obviously, all the characters in a sequence that produces a single
grapheme must have the same level, but what about compositions that
produce several grapheme clusters -- can each of the clusters have
different bidirectional properties?
* Re: Compositions and bidi display
  2010-04-30 10:07 ` Eli Zaretskii
@ 2010-04-30 12:12 ` Kenichi Handa
  2010-04-30 13:15 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Kenichi Handa @ 2010-04-30 12:12 UTC
To: Eli Zaretskii; +Cc: emacs-devel

I'll reply to this before replying to your previous mail.

In article <83r5lxw8wi.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> But it looks like composition_compute_stop_pos does use at least some
> validation for the candidate stop position. AFAIU, this fragment
> finds and validates a static composition:

>     if (find_composition (charpos, endpos, &start, &end, &prop, string)
>         && COMPOSITION_VALID_P (start, end, prop))
>       {
>         cmp_it->stop_pos = endpos = start;
>         cmp_it->ch = -1;
>       }

> So it looks like COMPOSITION_VALID_P is the proper way of validating a
> position that is a candidate for a static composition. Is that true?

Yes.

> If it is true, then the end point of the static composition is given
> by the `end' argument to find_composition,

Yes.

> and all we need is record it in cmp_it.

Record it for what purpose?

Anyway, COMPOSITION_VALID_P is called here so that we can avoid
calling it again in composition_reseat_it. But, for automatic
composition, the checking and the actual composing happen at the same
time. So, even if we do that in composition_compute_stop_pos,
composition_reseat_it has to do it again (for the actual composing).

> And the loop after that, conditioned on auto-composition-mode, seems
> to do a similar job for automatic compositions. Omitting some
> secondary details, that loop does this:

>     while (charpos < endpos)
>       {
>         [advance to the next character]
>         val = CHAR_TABLE_REF (Vcomposition_function_table, c);
>         if (! NILP (val))
>           {
>             Lisp_Object elt;
>
>             for (; CONSP (val); val = XCDR (val))
>               {
>                 elt = XCAR (val);
>                 if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
>                     && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
>                   break;
>               }
>             if (CONSP (val))
>               {
>                 cmp_it->lookback = XFASTINT (AREF (elt, 1));
>                 cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
>                 cmp_it->ch = c;
>                 return;
>               }
>           }
>       }

> This looks as if a position that is a candidate for starting a
> composition sequence should have a non-nil entry in
> composition-function-table for the character at that position, and
> that entry should specify the (relative) character position where the
> sequence might start. Is my understanding correct?

Mostly, but not accurate. The correct one is "A position
that will be composed with the following and/or the
preceding characters should have a non-nil entry in ...".
The reason why we don't record all characters that will
start a composition is for efficiency (for instance, to
record only combining characters (U+0300...U+036F) in
composition-function-table).

> > To move from one composition position to the next, we must actually
> > call autocmp_chars and find where the current composition ends, then
> > start searching for the next composition.

> It is true that the code looking for stop position that might begin an
> automatic composition does not compute the end of the sequence. That
> end is computed by autocmp_chars. But what does this mean in
> practice? Suppose we have found a candidate stop_pos, marked by S
> below:

>     abcdeSuvwxyz

> First, a composition sequence cannot be shorter than 2 characters,
> right?

No, a single character can be composed.

> So the next stop_pos cannot be before v. Now suppose that the
> actual composition sequence is "Suvw", and we issue the next call to
> composition_compute_stop_pos at v -- are you saying that it will
> suggest that v is also a possible stop_pos, even though it is in the
> middle of a composition sequence? --- (Q1)

Yes, that happens in Indic scripts. Actually both a line
starting with "Suvw" and a line starting with "vw" can have
different composition at BOL. But, AFAIK, all R2L scripts
(Arabic, Dhivehi, Hebrew) don't have such a characteristic. So,
in an ad hoc way, we can say that your (Q1) is false. So,

> If not, then repeated calls to
> composition_compute_stop_pos in the bidi case, without calling
> composition_reseat_it in between, will just be slightly
> more expensive because they will need to examine more positions. Is
> this analysis correct?

it is correct but just empirically. There may be a script
that uses the same writing system as Devanagari but in R2L
manner somewhere between the Indic and Arabic regions. I have no
idea.

> > But composition_reseat_it also needs ENDPOS

> We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> call composition_reseat_it and composition_compute_stop_pos in the
> forward direction repeatedly, can't we? That's because, when the
> iterator is at some position, we are only interested in compositions
> that cover that position.

No. Such a way slows down the display of a buffer that has
no composition at all. For such a buffer,
composition_compute_stop_pos should set cmp_it->stop_pos to
the actual endpos so that CHAR_COMPOSED_P quickly returns
zero.

> > We don't have to re-calculate ENDPOS each time. It must be
> > updated only when we pass over bidi boundary.

> Btw, can we always assume that all the characters of a composition
> sequence are at the same embedding level? I guess IOW I'm asking what
> Emacs features are currently implemented based on compositions?

Yes. I can't think of any situation in which characters must be
composed across a bidi boundary. First of all, to what
embedding level would such a composition belong?

> Obviously, all the characters in a sequence that produces a single
> grapheme must have the same level, but what about compositions that
> produce several grapheme clusters -- can each of the clusters have
> different bidirectional properties?

It is possible to set up a regular expression in an entry of
composition-function-table to do such a composition. But I
think we don't have to support such a thing until we are faced
with a concrete example of the need (quite doubtful).

---
Kenichi Handa
handa@m17n.org
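The stop-position arithmetic discussed in the message above (`stop_pos = charpos - 1 - lookback`, and falling back to ENDPOS when nothing is found) can be sketched in self-contained C. This is a simplified model under stated assumptions, not the Emacs implementation: the 256-entry `lookback_for` table is a hypothetical stand-in for composition-function-table, it holds one lookback value per byte instead of a list of rules, and `toy_compute_stop_pos` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only.  Scan TEXT from START up to (not including) ENDPOS.
   LOOKBACK_FOR has 256 entries; a value >= 0 means "this character
   may take part in a composition that begins that many characters
   earlier", and -1 means "no rule".  Return the first candidate
   position where a composition check is due, or ENDPOS if the scan
   finds none, so a caller with no compositions never stops early.  */
long
toy_compute_stop_pos (const char *text, long start, long endpos,
                      const int *lookback_for)
{
  long charpos = start;

  assert (text != NULL && lookback_for != NULL);
  assert (0 <= start && start <= endpos);

  while (charpos < endpos)
    {
      /* Fetch the character and advance, as the real loop does;
         this is why the arithmetic below uses CHARPOS - 1.  */
      unsigned char c = (unsigned char) text[charpos++];
      int lookback = lookback_for[c];

      /* A rule applies only if its lookback stays within the scan.  */
      if (lookback >= 0 && charpos - 1 - lookback >= start)
        return charpos - 1 - lookback;
    }
  return endpos;
}
```

With a table in which only `'^'` has a lookback of 1, scanning "abcde^x" from 0 to 7 yields 4, the position of the character before `'^'`. With an empty table the result is 7, the ENDPOS, which models the fast path for buffers without compositions.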
* Re: Compositions and bidi display
  2010-04-30 12:12 ` Kenichi Handa
@ 2010-04-30 13:15 ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2010-04-30 13:15 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 21:12:04 +0900
>
> > So it looks like COMPOSITION_VALID_P is the proper way of validating a
> > position that is a candidate for a static composition. Is that true?
>
> Yes.
>
> > If it is true, then the end point of the static composition is given
> > by the `end' argument to find_composition,
>
> Yes.
>
> > and all we need is record it in cmp_it.
>
> Record it for what purpose?

For determining (1) whether the current iterator position is inside a
composition sequence, and (2) when to look for the next possible
composition sequence.

Consider a buffer with 3 composition sequences indicated by Sn..En:

    S1..E1.......S2..E2.....|.....S3..E3

Suppose the iterator is at the position marked by |. Then the
iterator does not need to consider composite characters as long as its
character position is between E2 and S3 (exclusively). If it gets to
between S2 and E2, then it needs to produce the composite character
from S2..E2. If it goes back beyond S2, it will need to find the
places S1 and E1, and if it gets beyond E3, it will need to find the
next sequence, S4..E4 (not shown above).

IOW, the idea is to keep track of 2 potential composition sequences,
one before and one after the current iterator position, and recompute
them when the iterator is placed outside the region between the start
of the leftmost and the end of the rightmost one.

But it looks like this idea is not going to work with automatic
compositions, see below.

> > This looks as if a position that is a candidate for starting a
> > composition sequence should have a non-nil entry in
> > composition-function-table for the character at that position, and
> > that entry should specify the (relative) character position where the
> > sequence might start. Is my understanding correct?
>
> Mostly, but not accurate. The correct one is "A position
> that will be composed with the following and/or the
> preceding characters should have a non-nil entry in ...".

Yes, that's what I meant, but failed to express. Thanks.

> > So the next stop_pos cannot be before v. Now suppose that the
> > actual composition sequence is "Suvw", and we issue the next call to
> > composition_compute_stop_pos at v -- are you saying that it will
> > suggest that v is also a possible stop_pos, even though it is in the
> > middle of a composition sequence? --- (Q1)
>
> Yes, that happens in Indic scripts. Actually both a line
> starting with "Suvw" and a line starting with "vw" can have
> different composition at BOL. But, AFAIK, all R2L scripts
> (Arabic, Dhivehi, Hebrew) don't have such a characteristic. So,
> in an ad hoc way, we can say that your (Q1) is false. So,
>
> > If not, then repeated calls to
> > composition_compute_stop_pos in the bidi case, without calling
> > composition_reseat_it in between, will just be slightly
> > more expensive because they will need to examine more positions. Is
> > this analysis correct?
>
> it is correct but just empirically.

Unfortunately, this means that Q1 must be considered to be true. The
reason is the following subtlety of bidi reordering: in R2L
paragraphs, where the base embedding level is 1 (as opposed to zero in
L2R paragraphs), the bidi iterator delivers R2L characters in their
logical order, and reorders the L2R characters. (We then reverse the
character order for display in append_glyph, which prepends each new
glyph instead of appending it, in such paragraphs.) So, if an Indic
script is embedded in an R2L paragraph, it will hit this issue,
because the iterator will see Indic characters in reverse order.

Is there _any_ way to precompute the length of a composition sequence
when the entry is added to composition-function-table? Or is it only
possible to compute the length given the text surrounding the
sequence, when it is actually encountered in a buffer or string?

If the latter, I see no other way except calling autocmp_chars inside
composition_compute_stop_pos. This would slow down redisplay by a
factor of 2 at the worst. If that turns out too expensive, we will
have to introduce some mechanism to avoid computing each composition
more than once. What results of the call to autocmp_chars need to be
recorded in order to avoid calling it again in composition_reseat_it?

> > We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> > call composition_reseat_it and composition_compute_stop_pos in the
> > forward direction repeatedly, can't we? That's because, when the
> > iterator is at some position, we are only interested in compositions
> > that cover that position.
>
> No. Such a way slows down the display of a buffer that has
> no composition at all. For such a buffer,
> composition_compute_stop_pos should set cmp_it->stop_pos to
> the actual endpos so that CHAR_COMPOSED_P quickly returns
> zero.

It could be that having CHAR_COMPOSED_P return non-zero once every 16
characters in a buffer with no compositions at all is still the best
we can do, see above.
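The bookkeeping idea from the message above, tracking one composition sequence before and one after the iterator, can be sketched in C as follows. It is a toy model of that proposal for static compositions only; the message itself doubts that it carries over to automatic compositions. The struct, enum, and function names are hypothetical, and sequence ends are treated as exclusive for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the two sequences bracketing the iterator:
   [PREV_START, PREV_END) before it and [NEXT_START, NEXT_END) after
   it.  In the picture S1..E1...S2..E2..|..S3..E3 these correspond to
   S2..E2 and S3..E3.  */
struct toy_cmp_bracket
{
  long prev_start, prev_end;
  long next_start, next_end;
};

enum toy_cmp_action
{
  TOY_NO_COMPOSITION,   /* in the gap between the two sequences */
  TOY_COMPOSE_PREV,     /* inside the earlier sequence */
  TOY_COMPOSE_NEXT,     /* inside the later sequence */
  TOY_RECOMPUTE         /* outside the bracket: search for new ones */
};

/* Decide what the iterator should do when it lands on CHARPOS,
   wherever it came from and in whichever direction it is moving.  */
enum toy_cmp_action
toy_classify (const struct toy_cmp_bracket *b, long charpos)
{
  assert (b != NULL);
  assert (b->prev_start <= b->prev_end
          && b->prev_end <= b->next_start
          && b->next_start <= b->next_end);

  if (charpos < b->prev_start || charpos >= b->next_end)
    return TOY_RECOMPUTE;
  if (charpos < b->prev_end)
    return TOY_COMPOSE_PREV;
  if (charpos >= b->next_start)
    return TOY_COMPOSE_NEXT;
  return TOY_NO_COMPOSITION;
}
```

With sequences at [10, 14) and [20, 23), position 16 needs no composition handling, 12 and 21 fall inside a recorded sequence, and 5 or 23 lie outside the bracket and would trigger a new search.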
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) 2010-04-26 11:29 ` Kenichi Handa 2010-04-26 18:40 ` Compositions and bidi display Eli Zaretskii @ 2010-04-27 3:13 ` Eli Zaretskii 2010-04-27 12:26 ` Kenichi Handa 1 sibling, 1 reply; 27+ messages in thread From: Eli Zaretskii @ 2010-04-27 3:13 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Mon, 26 Apr 2010 20:29:18 +0900 > Cc: emacs-devel@gnu.org > > All composition-related functions expect characters are in > logical order. The bottom-most library for OTF handling > (libotf) requires it because OpenType tables expect > characters in logical order. Btw, where does libotf come into this picture? That is, which libotf functions we use for composite characters, and at what stage in the redisplay process? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed)
  2010-04-27 3:13 ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
@ 2010-04-27 12:26 ` Kenichi Handa
  0 siblings, 0 replies; 27+ messages in thread
From: Kenichi Handa @ 2010-04-27 12:26 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <8339yhziyj.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > All composition-related functions expect characters are in
> > logical order. The bottom-most library for OTF handling
> > (libotf) requires it because OpenType tables expect
> > characters in logical order.

> Btw, where does libotf come into this picture? That is, which libotf
> functions we use for composite characters, and at what stage in the
> redisplay process?

For instance, an OpenType font may have independent glyphs
for Hebrew consonants and diacriticals, and provide a GPOS
(glyph positioning) table to tell where to place a specific
diacritical glyph on a specific consonant. To utilize such
a font, a composition function calls libotf's OTF_drive_gpos
in this calling sequence:

CHAR_COMPOSED_P
 -> composition_reseat_it
 -> autocmp_chars
 -> a Lisp function in `composition-function-table'
 -> font-shape-gstring
 -> font_driver->shape
 -> ftfont_shape_by_flt
 -> mflt_run (of libm17n-flt)
 -> ftfont_drive_otf (ftfont.c) as a callback routine
 -> OTF_drive_gpos (of libotf)

---
Kenichi Handa
handa@m17n.org