From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Compositions and bidi display
Date: Fri, 30 Apr 2010 10:08:00 +0300
Message-ID: <83tyqtwh7z.fsf@gnu.org>
References: <3A521851-F7CC-45DB-A2ED-8348EF96D5CF@Freenet.DE> <83fx2q5w86.fsf@gnu.org> <834oj22e96.fsf@gnu.org> <837hnuys42.fsf@gnu.org> <83mxwoxo1t.fsf@gnu.org> <83d3xjxys1.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Kenichi Handa
Cc: emacs-devel@gnu.org

> From: Kenichi Handa
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 15:06:11 +0900
>
> In the case of "english HEBREW TEXT text" (lowercases are
> l2r characters, uppercases are r2l characters),
> get_next_display_element starts from the first "e" and
> proceeds to the first " " (stage 1), then jumps to the last
> "T" and proceeds back to the first "H" (stage 2), then jumps
> to the last " " and proceeds to the last "t" (stage 3).

This is only the simplest case, with just 2 embedding levels: the base
level of the paragraph, and the (higher) level of the embedded R2L
text. The general case is much more complex: there could be up to 60
nested levels, and some of them could begin or end at the same buffer
position. bidi.c handles all this complexity by means of a very
simple algorithm, but that algorithm needs to know a lot about the
characters traversed so far. I don't think exposing all these
internals to xdisp.c is a good idea.
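
To make the three stages of the simplest case concrete, here is a toy
sketch in Python. It is not Emacs code and not how bidi.c works: the
name visual_positions is hypothetical, it assumes a left-to-right base
paragraph with at most one embedded level, it uses the convention from
the quoted example (uppercase stands for R2L, lowercase for L2R), and
it resolves neutrals with a simplified form of the usual rule (a
neutral run between two R2L characters becomes R2L, anything else
takes the base direction).

```python
def visual_positions(text):
    """Return the buffer positions of TEXT in left-to-right visual order.

    Toy model only: uppercase = strong R2L (level 1), lowercase = strong
    L2R (level 0), everything else is neutral.  Base direction is L2R and
    there are no explicit embeddings.
    """
    n = len(text)
    levels = [None] * n
    for i, ch in enumerate(text):
        if ch.isupper():
            levels[i] = 1
        elif ch.islower():
            levels[i] = 0

    # Resolve each run of neutrals: R2L only if both neighbours are R2L,
    # otherwise the base (L2R) level.  Text edges count as base level.
    i = 0
    while i < n:
        if levels[i] is not None:
            i += 1
            continue
        j = i
        while j < n and levels[j] is None:
            j += 1
        before = levels[i - 1] if i > 0 else 0
        after = levels[j] if j < n else 0
        fill = 1 if before == 1 and after == 1 else 0
        for k in range(i, j):
            levels[k] = fill
        i = j

    # Emit level runs in buffer order, reversing the R2L runs.
    order = []
    i = 0
    while i < n:
        j = i
        while j < n and levels[j] == levels[i]:
            j += 1
        run = list(range(i, j))
        if levels[i] == 1:
            run.reverse()
        order.extend(run)
        i = j
    return order


text = "english HEBREW TEXT text"
order = visual_positions(text)
print(order)
print("".join(text[i] for i in order))
```

For the quoted string the positions run 0..7, then 18 down to 8, then
19..23, so the sequence of buffer positions visited is not monotonic
even in this two-level case. With more nested levels the jumps
themselves nest, which is the part this sketch deliberately leaves out.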
> Note that composition_compute_stop_pos just finds a stop
> position to check, and the actual checking and composing is
> done by composition_reseat_it which is called by
> CHAR_COMPOSED_P.

Right, but the same is true for the bidi iteration: I need only to
know when to check for composition; the actual composing will still be
done by composition_reseat_it.

I just cannot assume that I always move linearly forward in the
buffer. Therefore, it is not enough to have only the next stop
position recorded in the iterator. I need more information recorded.
What I'm trying to determine in this thread is what needs to be
recorded and how to compute what's needed. Thanks for helping me.

> > > We may be able to simplify that condition to
> > > "until it reaches a character in the different bidi level
> > > (or chunk)".
>
> > But that could be very far back.
>
> Isn't it possible to record where the current bidi-run
> started while you scan a buffer in
> bidi_get_next_char_visually?

See above: it's tricky. The function in bidi.c that looks for the
beginning and end of a level run relies on almost all the other
functions in bidi.c, and it does that on the fly. The level edges are
not recorded anywhere, except in an internal cache used to speed up
moving back in the buffer.

> > If MAX_AUTO_COMPOSITION_LOOKBACK is not the right number, then how
> > long can a composition sequence be?
>
> It is MAX_COMPOSITION_COMPONENTS (16), but here it's not
> relevant.

Why not? Isn't it true that if none of the 16 characters preceding
the current position can start a composition sequence, then the
current position is not inside a composition sequence?

> > Another idea would be to call composition_compute_stop_pos repeatedly,
> > starting from the last cmp_it->stop_pos, until we find the last
> > stop_pos before the current iterator position, then compute the
> > beginning and end of the composable sequence at that position, and
> > record it in the iterator.
> > Then we handle the composition when we
> > enter the sequence from either end.
>
> To move from one composition position to the next, we must
> actually call autocmp_chars and find where the current
> composition ends, then start searching for the next
> composition. As autocmp_chars calls Lisp and all functions
> to compose characters, it's so inefficient to call it
> repeatedly just to find the last one.

If the buffer or string is full of composed characters, then yes, it
would be a slowdown. Especially if the number of ``suspect'' stop
positions is much larger than the number of actual composition
sequences. But what else can be done, given the design of the
compositions that doesn't let us know the sequence length without
actually composing the character?

Thanks.
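
For concreteness, the repeated-scan idea and its cost can be sketched
roughly as follows. This is a Python toy, not the Emacs
implementation: find_enclosing_composition and both callbacks are
hypothetical stand-ins (next_stop_pos plays the role of finding the
next suspect stop position, compose_at plays the role of the expensive
step that actually composes and reports where the sequence ends), and
their signatures are assumptions made only for this illustration.

```python
def find_enclosing_composition(start, target, next_stop_pos, compose_at):
    """Scan forward from START for a composition that contains TARGET.

    next_stop_pos(pos) -> first suspect position >= pos, or None.
    compose_at(pos)    -> end (exclusive) of the composition starting at
                          pos, or None if nothing composes there.  This
                          stands in for the expensive call.
    Returns ((begin, end) or None, number_of_compose_calls).
    """
    calls = 0
    pos = start
    while True:
        stop = next_stop_pos(pos)
        if stop is None or stop > target:
            return None, calls
        calls += 1
        end = compose_at(stop)
        if end is None:
            pos = stop + 1      # suspect position that did not compose
        elif target < end:
            return (stop, end), calls
        else:
            pos = end           # skip past this composition, keep scanning


# Toy data: three suspect stop positions, only two of which compose.
SUSPECTS = [3, 10, 20]
COMPOSED = {3: 6, 20: 23}


def toy_next_stop_pos(pos):
    later = [s for s in SUSPECTS if s >= pos]
    return min(later) if later else None


def toy_compose_at(pos):
    return COMPOSED.get(pos)


print(find_enclosing_composition(0, 21, toy_next_stop_pos, toy_compose_at))
print(find_enclosing_composition(0, 15, toy_next_stop_pos, toy_compose_at))
```

In the first call, three compose calls are needed to learn that
position 21 lies inside the sequence 20..23, and one of them is spent
on a suspect position (10) that composes nothing. That is the cost
under discussion: it grows with the number of suspect stop positions
between the starting point and the target, not with the number of
actual composition sequences.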