From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Compositions and bidi display
Date: Tue, 27 Apr 2010 21:15:04 +0900
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-Reply-To: <837hnuys42.fsf@gnu.org> (message from Eli Zaretskii on Mon, 26 Apr 2010 21:40:45 +0300)

In article <837hnuys42.fsf@gnu.org>, Eli Zaretskii writes:

> > So, the bidi reordering must happen after composition handling is
> > done.
>
> Unfortunately, this is impossible, not without throwing away the
> entire design and current implementation of the bidi reordering, and
> implementing it in a totally different way that will have to be much
> more invasive into the overall design of Emacs display engine.
>
> The reason is, as you know, that bidi reordering in Emacs is
> conceptually just a replacement for advancing from one character to
> the next during iteration through buffers or strings. Instead of
> incrementing the character position to the next character, we modify
> the position non-linearly to get to the next character in the visual
> order. Obviously, this iteration is a lower-level operation than character
> composition.
>
> In addition, the bidi reordering engine knows nothing about the
> characters it encounters except their bidirectional properties; in
> particular, it doesn't know anything about character compositions, and
> teaching it about them would mean rather serious complications.
> Moreover, the bidirectional properties are in general defined for
> individual characters, not for the composed ones, which is one more
> reason it is very hard to do what you suggest, even if we would turn
> the current design inside out. For example, we compose Hebrew
> consonants with diacriticals into a single glyph, but that glyph has
> no character codepoint to look up its bidirectional properties in the
> Unicode database.

I think it is possible to apply Unicode's bidi algorithm to a glyph
sequence if each glyph provides a character code to check for
reordering. For a composition glyph, we can use the first character
of the composed sequence. But as your algorithm is incremental and
doesn't cache glyphs, such a method may slow down the display engine.

> So, once composed, these characters cannot be
> reordered by following the UAX#9 algorithm without complications,
> because UAX#9 is explicitly defined to work _before_ any shaping of
> characters for display, see Section 3.5 there.

The example in Section 3.5 is about base characters; it does not apply
to a sequence of a base character and combining characters. First of
all, TR9's bidi model is not incremental, so the shaping engine can
see the whole reordering result at once. In that model, the shaping
engine can reverse the order of a base character and its combining
characters after bidi processing, as described in rule L3 of Section
3.4:

============================================================
L3. Combining marks applied to a right-to-left base character will at
this point precede their base character.
If the rendering engine expects them to follow the base characters in
the final display process, then the ordering of the marks and the base
character must be reversed.
============================================================

So, how can we do that with the current incremental method?

> Therefore, I will need to find and handle sequences of characters to
> be composed as an integral part of next_element_from_buffer, similarly
> to what is already done with face changes there.
>
> The idea is to detect the situation where the bidi iteration placed us
> into a composable sequence of characters, and when that happens,
> compose them and deliver them as a single display element, and then
> skip the entire sequence, like we do today in the unidirectional
> display. The tricky part is that today we only detect this when we
> hit the beginning of such a sequence, while moving in the strictly
> increasing order of buffer positions; with bidi reordering we will
> need to detect them from the end of the sequence as well, for when the
> bidi iterator moves backwards or jumps across many character
> positions.
>
> Is it possible to write a function or macro that will find out, for a
> particular buffer/string position, whether that position is at the end
> or in the middle of a composable sequence of characters, and if so,
> return the character positions of the first and last characters of the
> sequence? Something like CHAR_COMPOSED_P, but one that looks back in
> the buffer? If so, could you please help me write such a function?

Here's a rough idea.

(1) Call composition_compute_stop_pos with ENDPOS < CHARPOS if we are
    now in an R2L range; ENDPOS is the start of this R2L range. Modify
    this function to search a buffer/string backward if ENDPOS <
    CHARPOS. For example, suppose uppercase letters denote Hebrew
    consonants, lowercase letters denote Hebrew diacriticals, and a
    buffer has the character sequence "AaBbCc". Then CHARPOS is the
    position of 'c' and ENDPOS is the position of 'A'.
(2) Do the same for composition_reseat_it.

(3) Add a member 'direction' to struct composition_it that records in
    which direction context the composition was made.

(4) Modify composition_update_it to update the members 'from' and 'to'
    of struct composition_it in reverse order if 'direction' is R2L.
    Note that a single composition may contain multiple grapheme
    clusters. For instance, it's possible to write a composition
    function that accepts "AaBbCc" (the above example) at once and
    produces a single composition containing three grapheme clusters:
    "Aa", "Bb", and "Cc".

To do all of this, perhaps all I need is a way to find the correct
ENDPOS. Please tell me how to do that.

---
Kenichi Handa
handa@m17n.org