From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Arabic support
Date: Fri, 03 Sep 2010 16:25:49 +0300
Message-ID: <83pqwvhsbm.fsf@gnu.org>
References: <83bp8oml9c.fsf@gnu.org>
Reply-To: Eli Zaretskii
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org
To: Kenichi Handa
List-Id: "Discussion of Emacs support for multi-directional text."
Xref: news.gmane.org gmane.emacs.bidi:814 gmane.emacs.devel:129643

> From: Kenichi Handa
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Fri, 03 Sep 2010 10:00:02 +0900
>
> > > > I'm asking because it's possible that we will need to modify
> > > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > > order it expects them. That's if it turns out that Uniscribe cannot
> > > > otherwise shape them correctly.
> > >
> > > ??? Currently characters and glyphs in LGSTRING are always
> > > in logical order.
>
> > See my mail from yesterday, where I describe that I see in GDB that
> > Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> > order:
>
> >   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html
>
> In this mail, you wrote:
>
> > Also, it looks like uniscribe_shape is repeatedly called from
> > font-shape-gstring to shape the same text that is progressively
> > shortened. For example, the first call will be with a 7-character
> > string whose contents is
>
> >   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}
>
> and this character sequence is surely in logical order. So
> I don't know why you think uniscribe_shape is given a
> LGSTRING of visual order.

Sorry, you are right.
I got fooled by the fact that the end of the string is
almost a mirror image of its beginning.

There's something I'm missing in how character compositions and font
shaping work together with bidi reordering. I need to understand that
to figure out what, if anything, needs to be fixed in uniscribe_shape
to get it to work correctly. So let me describe how the bidi
reordering works and my understanding of how it interacts with
character compositions, and ask you to correct any inaccuracies and
fill in the blanks. Thanks in advance.

There are two use-cases that bidi reordering supports. The first one
is reordering in left-to-right paragraphs, containing mostly L2R text
with embedded R2L characters. I will call this "the L2R paragraph"
case. The other use-case is reordering in right-to-left paragraphs,
which typically almost entirely consist of R2L characters with
embedded L2R letters, digits, and other characters that are displayed
left to right. I call this "the R2L paragraph" case.

For L2R paragraphs, runs of R2L characters are delivered in reverse
order (ignoring for the moment complications caused by directional
override control characters). When the bidi iterator bumps into an
R2L character, it scans forward until the end of the run, then begins
to go back delivering the characters, thus reversing them on display.
When the run of R2L characters is exhausted, the iterator jumps to the
end of the run and resumes its normal forward scan.

For R2L paragraphs, runs of R2L characters are delivered in their
buffer's logical order, without reversing them. L2R characters in
such paragraphs _are_ reversed, by the same process of scanning
forward past them, then delivering them back to front. This produces
a mirror image of the line as it should be displayed, wherein the
character to be displayed the rightmost is the first glyph we produce.
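
To make the delivery order concrete, here is a toy model in Python (not
the actual iterator code in Emacs). Everything in it is an assumption
made for illustration: the names char_dir, resolve_dirs and
delivery_order are hypothetical, capital letters stand for R2L
characters and lowercase letters for L2R ones, every other character is
treated as neutral, and directional controls and nested embeddings are
ignored.

```python
def char_dir(ch):
    """Toy classification: uppercase letters are R2L, lowercase are L2R."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None  # neutral (spaces, digits, punctuation in this toy model)


def resolve_dirs(text, para_dir):
    """Give each neutral run a direction: that of its neighbours when they
    agree, otherwise the paragraph direction (a simplification)."""
    dirs = [char_dir(c) for c in text]
    n = len(text)
    i = 0
    while i < n:
        if dirs[i] is None:
            j = i
            while j < n and dirs[j] is None:
                j += 1
            before = dirs[i - 1] if i > 0 else para_dir
            after = dirs[j] if j < n else para_dir
            fill = before if before == after else para_dir
            for k in range(i, j):
                dirs[k] = fill
            i = j
        else:
            i += 1
    return dirs


def delivery_order(text, para_dir):
    """Return the characters in the order the toy iterator delivers them:
    runs that match the paragraph direction are delivered front to back,
    runs of the opposite direction are delivered back to front."""
    dirs = resolve_dirs(text, para_dir)
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        out.append(run if dirs[i] == para_dir else run[::-1])
        i = j
    return "".join(out)


print(delivery_order("foobar ABCDE", "L"))  # foobar EDCBA
print(delivery_order("ABCD foo", "R"))      # ABCD oof
```

In the L2R paragraph the R2L run comes out reversed; in the R2L
paragraph it is the L2R run that comes out reversed.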
To mirror the line into its correct order, the PRODUCE_GLYPHS macro,
which calls the produce_glyphs method of the terminal-specific
redisplay interface, _prepends_ each new glyph to those already
produced for the glyph row, rather than appending them as in the L2R
paragraph case.

To illustrate, if we have a buffer with the following contents
(capital letters represent R2L characters):

    ABCD foo

then the bidi iterator will produce the characters in this order:

    ABCD oof

and then PRODUCE_GLYPHS will mirror them into

    foo DCBA

which is the correct visual order.

Note that in both cases, the glyph row generated by the above
procedure is drawn from left to right by the terminal-specific method
that delivers glyphs to the glass. That method draws glyphs one by
one in the order they are stored in the glyph row. No reordering
happens on this level, and in fact this level is totally ignorant
about the text directionality.

Enter character compositions. During the buffer scan that delivers
characters to PRODUCE_GLYPHS, if the next character to be delivered is
a composed character, then composition_reseat_it and
next_element_from_composition are called. If they succeed in
composing the character with one or more following characters, the
whole sequence of characters that were composed is recorded in the
glyph row as a single element of type IT_COMPOSITION. This single
element is expanded into the actual font glyphs when the glyph row is
drawn by the terminal-specific draw_glyphs method. The bidi
reordering treats this single element as if it were a single glyph,
and thus does not reorder its font glyphs. So this single element
winds up in the glyph row in the position corresponding to the first
character of the composed sequence.

The question is: in what order should the font glyphs be held in the
LGSTRING returned by the font driver's `shape' method?

Let's take an example.
Suppose we have an L2R paragraph in a buffer with these contents:

    foobar ABCDE

and suppose that "ABCDE" will be shaped by the font driver's `shape'
method into a logical-order sequence of glyphs "XYZ". Since this is
an L2R paragraph, and since no reordering will happen to "XYZ" when it
is delivered to the glass, it must be stored in the LGSTRING in the
visual order, i.e. "ZYX", with X being the first character to be read
and the rightmost to display, Y the second, etc.

Now suppose we have an R2L paragraph:

    ABCDE foobar

The mirroring of the glyph row in PRODUCE_GLYPHS will now produce

    foobar XYZ

because it treats "XYZ" as a single element. Again, no reordering
will happen to "XYZ" when it is drawn on the terminal. So again, we
need "XYZ" to be stored in visual order, i.e. "ZYX".

You say that the contents of LGSTRING passed to the `shape' method are
in logical order. The conclusion from the above seems to be that we
need to have the `shape' method reorder the shaped glyphs into visual
order. Is that what happens with the libotf driver? Does it indeed
reorder R2L glyphs it returns after reshaping? If not, how does a
reshaped sequence of glyphs wind up correctly on display?

Even if everything I said above is correct, there are complications.
ABCDE could be inside an embedding with left-to-right override, like
this:

    foobar LRO ABCDE PDF

This should be displayed as

    foobar ABCDE

i.e., "ABCDE" is not reordered, but displayed in the logical order, as
forced by LRO. Therefore, the reshaped "XYZ" should also be displayed
left to right:

    foobar XYZ

But, if I understand correctly how composition works, the
auto-composed sequence in this case will still be just "XYZ", without
the LRO and PDF control characters. So the `shape' method of the font
driver will still see just "XYZ" in the LGSTRING, without the control
characters, and will reorder "XYZ", which is incorrect.
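
The argument above can be replayed with a small Python model. It is
only a sketch under stated assumptions: draw, produce_glyphs_r2l and
shape_reordering are hypothetical stand-ins, not Emacs functions; a
composed element is modelled as a tuple that the reordering treats as
one unit; and "XYZ" is a made-up result of shaping "ABCDE".

```python
def draw(glyph_row):
    """Draw a glyph row left to right. A composed element (a tuple) is
    expanded in the order its glyphs are stored, with no reordering."""
    return "".join("".join(e) if isinstance(e, tuple) else e
                   for e in glyph_row)


def produce_glyphs_r2l(delivered):
    """R2L paragraph case: prepend each delivered element to the row."""
    row = []
    for element in delivered:
        row.insert(0, element)
    return row


def shape_reordering(chars):
    """Hypothetical `shape' that always returns visual order for R2L
    input. It sees only the characters, never the bidi controls."""
    logical = ("X", "Y", "Z")  # made-up shaping result for "ABCDE"
    return logical[::-1]


logical = ("X", "Y", "Z")
visual = logical[::-1]

# L2R paragraph "foobar ABCDE": elements are appended as delivered.
print(draw(list("foobar ") + [logical]))  # foobar XYZ (X is not rightmost)
print(draw(list("foobar ") + [visual]))   # foobar ZYX (wanted)

# R2L paragraph "ABCDE foobar": the composition is delivered first,
# then the space, then "foobar" back to front.
delivered = [visual, " "] + list("foobar"[::-1])
print(draw(produce_glyphs_r2l(delivered)))  # foobar ZYX (wanted)

# Left-to-right override around ABCDE: the wanted display is
# "foobar XYZ", but a `shape' that always reorders cannot know that.
print(draw(list("foobar ") + [shape_reordering("ABCDE")]))  # foobar ZYX
```

The last line is the problem case: the glyphs come out reversed even
though the override asks for them to be displayed left to right.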
If we need the `shape' method to reorder glyphs, then for it to do its
job correctly, we need to give it the entire bidi context of the
string we are asking it to reshape. In the above example, we need to
tell it about the override directive, i.e. pass it "ABCDE" with the
surrounding override and PDF controls. This flies in the face of the
current design, which separates reordering from glyph shaping.

So the conclusion is that we need the `shape' method to return the
reshaped glyphs in the logical order, and then reorder them
afterwards. If this is correct, we need to make 2 changes:

  . change the interface to the `shape' method, so that the reshaped
    LGSTRING holds glyphs in the logical order

  . modify fill_gstring_glyph_string to reorder glyphs when it puts
    them into a glyph_string structure

Am I missing something?
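
For illustration only, the two proposed changes might look roughly like
this in a Python model. The names shape_logical and fill_glyph_string
are hypothetical stand-ins (the latter loosely modelled on the role of
fill_gstring_glyph_string), "XYZ" is a made-up shaping result, and the
resolved direction is assumed to be supplied by the bidi reordering
code, which is the only part that knows about overrides.

```python
def shape_logical(chars):
    """Hypothetical `shape' under the proposed interface: it returns the
    shaped glyphs in logical order and knows nothing about bidi."""
    return ("X", "Y", "Z")  # made-up shaping result for "ABCDE"


def fill_glyph_string(glyphs, resolved_dir):
    """Hypothetical fill step: put the glyphs into display order using
    the direction the bidi code resolved for the composed characters
    ("R" for right to left, "L" for left to right)."""
    if resolved_dir == "R":
        return list(reversed(glyphs))
    return list(glyphs)


glyphs = shape_logical("ABCDE")

# Plain R2L run: the bidi code resolves the direction to "R".
print("".join(fill_glyph_string(glyphs, "R")))  # ZYX

# Same characters inside a left-to-right override: resolved to "L".
print("".join(fill_glyph_string(glyphs, "L")))  # XYZ
```

With this split, shaping needs no bidi context at all; the override
only affects the direction handed to the fill step.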