From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Compositions and bidi display
Date: Fri, 30 Apr 2010 13:07:41 +0300
Message-ID: <83r5lxw8wi.fsf@gnu.org>
References: <3A521851-F7CC-45DB-A2ED-8348EF96D5CF@Freenet.DE>
	<83fx2q5w86.fsf@gnu.org> <834oj22e96.fsf@gnu.org>
	<837hnuys42.fsf@gnu.org> <83mxwoxo1t.fsf@gnu.org>
	<83d3xjxys1.fsf@gnu.org>
To: Kenichi Handa
Cc: emacs-devel@gnu.org

> From: Kenichi Handa
> Cc: emacs-devel@gnu.org
> Date: Fri, 30 Apr 2010 15:06:11 +0900

After re-reading the code of composition_compute_stop_pos, I have a few
more questions about what you wrote.

> Note that composition_compute_stop_pos just finds a stop
> position to check, and the actual checking and composing is
> done by composition_reseat_it which is called by
> CHAR_COMPOSED_P.

But it looks like composition_compute_stop_pos does use at least some
validation for the candidate stop position.  AFAIU, this fragment
finds and validates a static composition:

    if (find_composition (charpos, endpos, &start, &end, &prop, string)
        && COMPOSITION_VALID_P (start, end, prop))
      {
        cmp_it->stop_pos = endpos = start;
        cmp_it->ch = -1;
      }

So it looks like COMPOSITION_VALID_P is the proper way of validating a
position that is a candidate for a static composition.  Is that true?
If it is true, then the end point of the static composition is given
by the `end' argument to find_composition, and all we need is to record
it in cmp_it.
If not true, what _does_ COMPOSITION_VALID_P validate?

And the loop after that, conditioned on auto-composition-mode, seems
to do a similar job for automatic compositions.  Omitting some
secondary details, that loop does this:

    while (charpos < endpos)
      {
        [advance to the next character]
        val = CHAR_TABLE_REF (Vcomposition_function_table, c);
        if (! NILP (val))
          {
            Lisp_Object elt;

            for (; CONSP (val); val = XCDR (val))
              {
                elt = XCAR (val);
                if (VECTORP (elt) && ASIZE (elt) == 3
                    && NATNUMP (AREF (elt, 1))
                    && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
                  break;
              }
            if (CONSP (val))
              {
                cmp_it->lookback = XFASTINT (AREF (elt, 1));
                cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
                cmp_it->ch = c;
                return;
              }
          }
      }

This looks as if a position that is a candidate for starting a
composition sequence should have a non-nil entry in
composition-function-table for the character at that position, and
that entry should specify the (relative) character position where the
sequence might start.  Is my understanding correct?

> To move from one composition position to the next, we must actually
> call autocmp_chars and find where the current composition ends, then
> start searching for the next composition.

It is true that the code looking for stop position that might begin an
automatic composition does not compute the end of the sequence.  That
end is computed by autocmp_chars.  But what does this mean in
practice?  Suppose we have found a candidate stop_pos, marked by S
below:

    abcdeSuvwxyz

First, a composition sequence cannot be shorter than 2 characters,
right?  So the next stop_pos cannot be before v.  Now suppose that the
actual composition sequence is "Suvw", and we issue the next call to
composition_compute_stop_pos at v -- are you saying that it will
suggest that v is also a possible stop_pos, even though it is in the
middle of a composition sequence?
If not, then repeated calls to composition_compute_stop_pos in the
bidi case, without calling composition_reseat_it in between, will just
be slightly more expensive because they will need to examine more
positions.  Is this analysis correct?

> But composition_reseat_it also needs ENDPOS

We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
call composition_reseat_it and composition_compute_stop_pos in the
forward direction repeatedly, can't we?  That's because, when the
iterator is at some position, we are only interested in compositions
that cover that position.

> We don't have to re-calculate ENDPOS each time.  It must be
> updated only when we pass over bidi boundary.

Btw, can we always assume that all the characters of a composition
sequence are at the same embedding level?  I guess IOW I'm asking what
Emacs features are currently implemented based on compositions?
Obviously, all the characters in a sequence that produces a single
grapheme must have the same level, but what about compositions that
produce several grapheme clusters -- can each of the clusters have
different bidirectional properties?
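
To make the lookback question above concrete, here is a minimal
standalone model of the automatic-composition scan as I read it.  It
is only a sketch under assumptions, not the Emacs code: rule_table,
rule_table_init and find_stop_pos are hypothetical names, the table
maps single bytes to a lookback value instead of consulting
composition-function-table, and `start' is taken to be the lowest
position at which a sequence may begin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for composition-function-table: for each byte
   value, the lookback of its rule, or -1 if the byte has no rule.  */
typedef struct
{
  int lookback[256];
} rule_table;

void
rule_table_init (rule_table *t)
{
  assert (t != NULL);
  for (int i = 0; i < 256; i++)
    t->lookback[i] = -1;
}

/* Scan TEXT in [CHARPOS, ENDPOS) and return the first candidate stop
   position, i.e. the position of a character that has a rule, minus
   that rule's lookback, provided the result is not before START.
   Store the lookback in *LOOKBACK_OUT.  Return -1 if no candidate is
   found.  The end of the sequence is deliberately not computed.  */
ptrdiff_t
find_stop_pos (const rule_table *t, const char *text, ptrdiff_t start,
               ptrdiff_t charpos, ptrdiff_t endpos, int *lookback_out)
{
  assert (t != NULL && text != NULL && lookback_out != NULL);
  while (charpos < endpos)
    {
      /* Advance to the next character; CHARPOS is now one past C.  */
      unsigned char c = (unsigned char) text[charpos++];
      int lb = t->lookback[c];

      if (lb >= 0 && charpos - 1 - lb >= start)
        {
          *lookback_out = lb;
          return charpos - 1 - lb;
        }
    }
  return -1;
}
```

With a single rule on `u' whose lookback is 1, scanning "abcdeSuvwxyz"
from position 0 reports S (position 5) as the candidate.  Scanning
again from `u' with `start' also set to `u' reports nothing, because
the lookback would reach before `start'.  Whether the real code
behaves this way in the middle of a sequence is exactly the question
asked above; the model does not settle it.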