Re: Compositions and bidi display

unofficial mirror of emacs-devel@gnu.org 

From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Compositions and bidi display
Date: Fri, 30 Apr 2010 21:12:04 +0900
Message-ID: <tl71vdxgmwb.fsf@m17n.org>
In-Reply-To: <83r5lxw8wi.fsf@gnu.org> (message from Eli Zaretskii on Fri, 30 Apr 2010 13:07:41 +0300)

I'll reply to this before replying to your previous mail.

In article <83r5lxw8wi.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> But it looks like composition_compute_stop_pos does use at least some
> validation for the candidate stop position.  AFAIU, this fragment
> finds and validates a static composition:

>   if (find_composition (charpos, endpos, &start, &end, &prop, string)
>       && COMPOSITION_VALID_P (start, end, prop))
>     {
>       cmp_it->stop_pos = endpos = start;
>       cmp_it->ch = -1;
>     }

> So it looks like COMPOSITION_VALID_P is the proper way of validating a
> position that is a candidate for a static composition.  Is that true?

Yes.

> If it is true, then the end point of the static composition is given
> by the `end' argument to find_composition,

Yes.

> and all we need is record it in cmp_it.

Record it for what purpose?

Anyway, we call COMPOSITION_VALID_P here so that we can
avoid calling it again in composition_reseat_it.  But for
automatic composition, the checking and the actual composing
happen at the same time.  So even if we did that in
composition_compute_stop_pos, composition_reseat_it would
have to do it again (for the actual composing).

> And the loop after that, conditioned on auto-composition-mode, seems
> to do a similar job for automatic compositions.  Omitting some
> secondary details, that loop does this:

>   while (charpos < endpos)
>     {
>       [advance to the next character]
>       val = CHAR_TABLE_REF (Vcomposition_function_table, c);
>       if (! NILP (val))
> 	{
> 	  Lisp_Object elt;

> 	  for (; CONSP (val); val = XCDR (val))
> 	    {
> 	      elt = XCAR (val);
> 	      if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
> 		  && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
> 		break;
> 	    }
> 	  if (CONSP (val))
> 	    {
> 	      cmp_it->lookback = XFASTINT (AREF (elt, 1));
> 	      cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
> 	      cmp_it->ch = c;
> 	      return;
> 	    }
> 	}
>     }

> This looks as if a position that is a candidate for starting a
> composition sequence should have a non-nil entry in
> composition-function-table for the character at that position, and
> that entry should specify the (relative) character position where the
> sequence might start.  Is my understanding correct?

Mostly, but not quite accurate.  The correct statement is "A
position that will be composed with the following and/or the
preceding characters should have a non-nil entry in ...".

The reason we don't record all characters that may start a
composition is efficiency (for instance, so that we can
record only combining characters (U+0300...U+036F) in
composition-function-table).
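The scan quoted above, together with the clarification that only characters composed with their neighbors need a table entry, can be modeled in a few lines of C.  This is a sketch under stated assumptions, not Emacs code: the text is a plain array of code points, toy_rule_lookback is a hypothetical stand-in for the composition-function-table lookup (one rule per character instead of a list), and it assumes, for illustration only, that combining diacritical marks compose with exactly one preceding character.

```c
#include <assert.h>

/* Hypothetical stand-in for looking up C in composition-function-table.
   Returns the LOOKBACK of C's rule, or -1 if C has no entry.
   Assumption for illustration: combining diacritical marks compose
   with the one base character before them; nothing else has an entry.  */
static int toy_rule_lookback (int c)
{
  if (c >= 0x0300 && c <= 0x036F)
    return 1;
  return -1;
}

/* Scan [CHARPOS, ENDPOS) for a character that may be composed with its
   neighbors, and return the position where that composition might
   start (its position minus LOOKBACK, never below START).  Stores the
   triggering character in *TRIGGER.  If nothing qualifies, return
   ENDPOS and store -1, so the caller can skip checks until ENDPOS.  */
static long toy_compute_stop_pos (const int *text, long start,
                                  long charpos, long endpos, int *trigger)
{
  assert (start <= charpos && charpos <= endpos);
  for (long pos = charpos; pos < endpos; pos++)
    {
      int lookback = toy_rule_lookback (text[pos]);
      /* Accept the rule only if looking back stays inside the text.  */
      if (lookback >= 0 && pos - lookback >= start)
        {
          *trigger = text[pos];
          return pos - lookback;
        }
    }
  *trigger = -1;
  return endpos;
}
```

With the text "abe" followed by U+0301 and "x", the base letter e has no table entry of its own; it is reached through the lookback of the combining mark that follows it, so the scan returns position 2.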

> > To move from one composition position to the next, we must actually
> > call autocmp_chars and find where the current composition ends, then
> > start searching for the next composition.

> It is true that the code looking for stop position that might begin an
> automatic composition does not compute the end of the sequence.  That
> end is computed by autocmp_chars.  But what does this mean in
> practice?  Suppose we have found a candidate stop_pos, marked by S
> below:

>      abcdeSuvwxyz

> First, a composition sequence cannot be shorter than 2 characters,
> right?

No, a single character can be composed.

> So the next stop_pos cannot be before v.  Now suppose that the
> actual composition sequence is "Suvw", and we issue the next call to
> composition_compute_stop_pos at v -- are you saying that it will
> suggest that v is also a possible stop_pos, even though it is in the
> middle of a composition sequence?  --- (Q1)

Yes, that happens in Indic scripts.  Actually, a line
starting with "Suvw" and a line starting with "vw" can have
different compositions at BOL.  But AFAIK, none of the R2L
scripts (Arabic, Dhivehi, Hebrew) has such a characteristic.
So, as an ad hoc rule, we can say that the answer to your
(Q1) is no.  So,

> If not, then repeated calls to
> composition_compute_stop_pos in the bidi case, without calling
> composition_reseat_it in between, will just be slightly
> more expensive because they will need to examine more positions.  Is
> this analysis correct?

it is correct, but only empirically.  There may be a script
somewhere between the Indic and Arabic regions that uses
the same writing system as Devanagari but is written R2L.
I have no idea.

> > But composition_reseat_it also needs ENDPOS

> We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> call composition_reseat_it and composition_compute_stop_pos in the
> forward direction repeatedly, can't we?  That's because, when the
> iterator is some position, we are only interested in compositions that
> cover that position.

No.  That approach would slow down the display of a buffer
that has no compositions at all.  For such a buffer,
composition_compute_stop_pos should set cmp_it->stop_pos to
the actual endpos so that CHAR_COMPOSED_P quickly returns
zero.
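The efficiency argument can be illustrated by counting how often a display loop over a buffer with no compositions has to leave the fast path.  This is a toy model, not the real CHAR_COMPOSED_P: it assumes the per-character check is a single comparison of CHARPOS against STOP_POS, and the window parameter is a hypothetical stand-in for the IT_CHARPOS + MAX_COMPOSITION_COMPONENTS proposal (its value here is arbitrary).

```c
#include <assert.h>

/* Toy model: count the slow-path stops while iterating over NCHARS
   characters of which none is composed.  WINDOW > 0 models a search
   limited to ENDPOS = CHARPOS + WINDOW; WINDOW <= 0 models searching
   up to the real ENDPOS (the end of the text).  */
static long toy_count_stops (long nchars, long window)
{
  long stops = 0;

  /* Initial search: nothing composes, so the search gives up at its
     ENDPOS, which becomes the next stop position.  */
  long stop_pos = (window > 0 && window < nchars) ? window : nchars;

  for (long charpos = 0; charpos < nchars; charpos++)
    {
      if (charpos != stop_pos)
        continue;  /* fast path: a single comparison */

      /* Slow path: try to compose here (nothing composes), then search
         again, limited by the same kind of ENDPOS as before.  */
      stops++;
      long endpos = window > 0 ? charpos + window : nchars;
      stop_pos = endpos < nchars ? endpos : nchars;
    }
  return stops;
}
```

For 1000 characters, toy_count_stops (1000, 0) never leaves the fast path, while toy_count_stops (1000, 16) stops and re-searches 62 times, once per window, even though nothing is ever composed.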

> > We don't have to re-calculate ENDPOS each time.  It must be
> > updated only when we pass over bidi boundary.

> Btw, can we always assume that all the characters of a composition
> sequence are at the same embedding level?  I guess IOW I'm asking what
> Emacs features are currently implemented based on compositions?

Yes.  I can't think of any situation where characters must be
composed across a bidi boundary.  First of all, to which
embedding level would such a composition belong?
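If every composition stays within one embedding level, the ENDPOS for the composition search can be derived from the bidi levels and only needs updating when the iterator enters a new level run, as in the sketch below.  It assumes, purely for illustration, that the resolved embedding level of each character is available in an array (the real bidi iterator computes levels incrementally); toy_level_run_end and toy_same_level_p are hypothetical helpers, not Emacs functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the end (exclusive) of the level run containing POS, i.e. the
   first position after POS whose embedding level differs.  A
   composition search starting at POS could use this as its ENDPOS.  */
static long toy_level_run_end (const int *levels, long nchars, long pos)
{
  assert (0 <= pos && pos < nchars);
  long end = pos + 1;
  while (end < nchars && levels[end] == levels[pos])
    end++;
  return end;
}

/* True if every character in [START, END) has the same embedding
   level, i.e. the range does not cross a bidi boundary.  */
static bool toy_same_level_p (const int *levels, long start, long end)
{
  assert (start < end);
  for (long i = start + 1; i < end; i++)
    if (levels[i] != levels[start])
      return false;
  return true;
}
```

With levels {0, 0, 1, 1, 1, 0}, the run containing position 3 ends at 5, so a search starting there would use 5 as ENDPOS, and a candidate covering positions 1 and 2 would be rejected because it crosses the level change.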

> Obviously, all the characters in a sequence that produces a single
> grapheme must have the same level, but what about compositions that
> produce several grapheme clusters -- can each of the clusters have
> different bidirectional properties?

It is possible to set up the regular expression of a
composition-function-table entry to do such a composition.
But I think we don't have to support such a thing until we
are faced with a concrete example of the need for it (which
I quite doubt will happen).

---
Kenichi Handa
handa@m17n.org

Thread overview: 27+ messages
     [not found] <3A521851-F7CC-45DB-A2ED-8348EF96D5CF@Freenet.DE>
     [not found] ` <83fx2q5w86.fsf@gnu.org>
     [not found]   ` <tl739yppmat.fsf@m17n.org>
2010-04-23 18:52     ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
2010-04-23 20:34       ` Andreas Schwab
2010-04-23 20:43         ` Eli Zaretskii
2010-04-24 11:27           ` Eli Zaretskii
2010-04-26  2:09       ` Kenichi Handa
2010-04-26  2:38         ` Kenichi Handa
2010-04-26 11:29       ` Kenichi Handa
2010-04-26 18:40         ` Compositions and bidi display Eli Zaretskii
2010-04-27 12:15           ` Kenichi Handa
2010-04-28  3:18             ` Eli Zaretskii
2010-04-28  4:01               ` Kenichi Handa
2010-04-28 17:38                 ` Eli Zaretskii
2010-04-28 22:49                   ` Stefan Monnier
2010-04-29  3:12                     ` Eli Zaretskii
2010-04-30  2:28                       ` Kenichi Handa
2010-04-30  6:41                         ` Eli Zaretskii
2010-04-30  6:06                   ` Kenichi Handa
2010-04-30  7:08                     ` Eli Zaretskii
2010-05-03  2:39                       ` Kenichi Handa
2010-05-03  7:31                         ` Eli Zaretskii
2010-05-04  9:19                           ` Kenichi Handa
2010-05-04 17:47                             ` Eli Zaretskii
2010-04-30 10:07                     ` Eli Zaretskii
2010-04-30 12:12                       ` Kenichi Handa [this message]
2010-04-30 13:15                         ` Eli Zaretskii
2010-04-27  3:13         ` Compositions and bidi display (was: bug#5977: 24.0.50; Lao HELLO is incorrectly displayed) Eli Zaretskii
2010-04-27 12:26           ` Kenichi Handa
