From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Compositions and bidi display
Date: Fri, 30 Apr 2010 21:12:04 +0900
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-Reply-To: <83r5lxw8wi.fsf@gnu.org> (message from Eli Zaretskii on Fri, 30 Apr 2010 13:07:41 +0300)

I'll reply to this before replying to your previous mail.

In article <83r5lxw8wi.fsf@gnu.org>, Eli Zaretskii writes:

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> But it looks like composition_compute_stop_pos does use at least some
> validation for the candidate stop position.  AFAIU, this fragment
> finds and validates a static composition:

>   if (find_composition (charpos, endpos, &start, &end, &prop, string)
>       && COMPOSITION_VALID_P (start, end, prop))
>     {
>       cmp_it->stop_pos = endpos = start;
>       cmp_it->ch = -1;
>     }

> So it looks like COMPOSITION_VALID_P is the proper way of validating a
> position that is a candidate for a static composition.
> Is that true?

Yes.

> If it is true, then the end point of the static composition is given
> by the `end' argument to find_composition,

Yes.

> and all we need is record it in cmp_it.

Record it for what purpose?  Anyway, COMPOSITION_VALID_P is called
here so that we can avoid calling it again in composition_reseat_it.
But for automatic composition, the checking and the actual composing
happen at the same time.  So, even if we do the checking in
composition_compute_stop_pos, composition_reseat_it has to do it
again (for the actual composing).

> And the loop after that, conditioned on auto-composition-mode, seems
> to do a similar job for automatic compositions.  Omitting some
> secondary details, that loop does this:

>   while (charpos < endpos)
>     {
>       [advance to the next character]
>       val = CHAR_TABLE_REF (Vcomposition_function_table, c);
>       if (! NILP (val))
>         {
>           Lisp_Object elt;
>
>           for (; CONSP (val); val = XCDR (val))
>             {
>               elt = XCAR (val);
>               if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
>                   && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
>                 break;
>             }
>           if (CONSP (val))
>             {
>               cmp_it->lookback = XFASTINT (AREF (elt, 1));
>               cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
>               cmp_it->ch = c;
>               return;
>             }
>         }
>     }

> This looks as if a position that is a candidate for starting a
> composition sequence should have a non-nil entry in
> composition-function-table for the character at that position, and
> that entry should specify the (relative) character position where the
> sequence might start.  Is my understanding correct?

Mostly, but not accurate.  The correct statement is "A position that
will be composed with the following and/or the preceding characters
should have a non-nil entry in ...".  The reason we don't record every
character that can start a composition is efficiency (for instance,
it lets us record only combining characters (U+0300...U+03FF) in
composition-function-table).
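The lookback rule discussed above can be modelled in a few lines of plain C.  This is only a sketch under simplifying assumptions, not the Emacs code: `struct rule', `lookup_rule', `struct stop' and this `compute_stop_pos' are hypothetical stand-ins, the table holds one rule per character instead of a list of vectors, the buffer is a plain array indexed from 0, and the combining range is simply the one given as an example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one composition-function-table rule.
   LOOKBACK is the number of characters before the triggering
   character at which the composed sequence may begin.  */
struct rule { int lookback; };

/* Hypothetical stand-in for the char-table lookup.  Only characters
   in the example combining range have a rule, so base characters
   need no entry at all.  */
static const struct rule *
lookup_rule (unsigned c)
{
  static const struct rule combining = { 1 };
  return (c >= 0x0300 && c <= 0x03FF) ? &combining : NULL;
}

/* Result of the scan: where to stop, how far to look back, and the
   triggering character (-1 if none was found).  */
struct stop { ptrdiff_t stop_pos; int lookback; long ch; };

/* Scan TEXT from CHARPOS up to (not including) ENDPOS for the first
   character that has a rule whose lookback does not reach before
   START.  If nothing is found, stop_pos is ENDPOS.  */
static struct stop
compute_stop_pos (const unsigned *text, ptrdiff_t start,
                  ptrdiff_t charpos, ptrdiff_t endpos)
{
  struct stop s = { endpos, 0, -1 };

  assert (start <= charpos && charpos <= endpos);
  for (ptrdiff_t pos = charpos; pos < endpos; pos++)
    {
      const struct rule *r = lookup_rule (text[pos]);

      if (r && pos - r->lookback >= start)
        {
          s.stop_pos = pos - r->lookback;
          s.lookback = r->lookback;
          s.ch = (long) text[pos];
          return s;
        }
    }
  return s;
}
```

For the text "a e U+0301 b" the scan stops at the `e', one position before the combining character, although `e' itself has no table entry.  That is the point of the correction above: the entry belongs to a character that takes part in the composition, and the lookback locates the start.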
> > To move from one composition position to the next, we must actually
> > call autocmp_chars and find where the current composition ends, then
> > start searching for the next composition.

> It is true that the code looking for stop position that might begin an
> automatic composition does not compute the end of the sequence.  That
> end is computed by autocmp_chars.  But what does this mean in
> practice?  Suppose we have found a candidate stop_pos, marked by S
> below:

>     abcdeSuvwxyz

> First, a composition sequence cannot be shorter than 2 characters,
> right?

No, a single character can be composed.

> So the next stop_pos cannot be before v.  Now suppose that the
> actual composition sequence is "Suvw", and we issue the next call to
> composition_compute_stop_pos at v -- are you saying that it will
> suggest that v is also a possible stop_pos, even though it is in the
> middle of a composition sequence? --- (Q1)

Yes, that happens in Indic scripts.  Actually, a line starting with
"Suvw" and a line starting with "vw" can have different compositions
at BOL.  But, AFAIK, no R2L script (Arabic, Dhivehi, Hebrew) has such
a characteristic.  So, in an ad hoc way, we can say that your (Q1) is
false.  So,

> If not, then repeated calls to
> composition_compute_stop_pos in the bidi case, without calling
> composition_reseat_it in between, will just be slightly
> more expensive because they will need to examine more positions.  Is
> this analysis correct?

it is correct, but only empirically.  There may be a script, somewhere
between the Indic and Arabic regions, that uses the same writing
system as Devanagari but in an R2L manner.  I have no idea.

> > But composition_reseat_it also needs ENDPOS

> We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> call composition_reseat_it and composition_compute_stop_pos in the
> forward direction repeatedly, can't we?  That's because, when the
> iterator is some position, we are only interested in compositions that
> cover that position.
No.  Doing it that way slows down the display of a buffer that has no
composition at all.  For such a buffer, composition_compute_stop_pos
should set cmp_it->stop_pos to the actual endpos so that
CHAR_COMPOSED_P quickly returns zero.

> > We don't have to re-calculate ENDPOS each time.  It must be
> > updated only when we pass over bidi boundary.

> Btw, can we always assume that all the characters of a composition
> sequence are at the same embedding level?  I guess IOW I'm asking what
> Emacs features are currently implemented based on compositions?

Yes.  I can't think of any situation in which characters must be
composed across a bidi boundary.  First of all, to what embedding
level would such a composition belong?

> Obviously, all the characters in a sequence that produces a single
> grapheme must have the same level, but what about compositions that
> produce several grapheme clusters -- can each of the clusters have
> different bidirectional properties?

It is possible to set up a regular expression in an entry of
composition-function-table to do such a composition.  But I think we
don't have to support such a thing until we are faced with a concrete
example of the necessity (which is quite doubtful).

---
Kenichi Handa
handa@m17n.org