unofficial mirror of help-gnu-emacs@gnu.org
From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: Composed Sequences
Date: Sat, 26 Feb 2022 17:35:22 +0200
Message-ID: <83fso5prnp.fsf@gnu.org>
In-Reply-To: <20220226151144.4c0b641e@JRWUBU2> (message from Richard Wordingham on Sat, 26 Feb 2022 15:11:44 +0000)

> Date: Sat, 26 Feb 2022 15:11:44 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> 
> > > Different renderers give different clusters, and thus, by default,
> > > different cursor motion!  
> 
> > Not "different renderers", but "different fonts".
> 
> I experimented with the Tai Tham composition-function-table entry
> 
> (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring))
> 
> For GNU Emacs 23.4.1 (i386-mingw-nt6.2.9200) using Uniscribe, the word
> ᨠᩣ᩠ᨿ <1A20 HIGH KA, 1A63 AA, 1A60 SAKOT, 1A3F LOW YA>, the glyph string
> for Version 0.8 of my font Da Lekh is divided into two
> clusters as identified by the 'glyph' values [0 1 6688...] [0 1
> 6688...] [2 3 6752...] and confirmed by ordinary cursor motion.  While
> this division into <1A20, 1A63> and <1A60, 1A3F> is not the Unicode
> division into grapheme clusters, it accords with what are natively
> namable clusters.
> 
> For GNU Emacs 27.1 (build1 i686-w64-mingw32) of 2020-08-21, which uses
> HarfBuzz, the same word is one indivisible cluster (at least with
> Version 0.13 of the same font).  I think this is a change in the
> behaviour of HarfBuzz.

If you must have the last word on this: it's quite clear that in gray
areas, such as Tai Tham, and where a shaping engine has a bug or a
misfeature, the results will also depend on the shaping engine.  But
that is not the main lesson to be taken home from the original issue,
which, btw, was with Arabic, not Tai Tham.
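For anyone who wants to reproduce the Tai Tham experiment quoted above, this is a minimal sketch of installing that entry in a running Emacs.  The assumption here is that the entry should apply to every character in U+1A20..U+1AAD; the character range given to `set-char-table-range' is an assumption, since the thread shows only the rule itself.  After evaluating it, "C-u C-x =" on a character of the word shows how the shaping engine split it into clusters.

```elisp
;; Assumed range: attach the rule to every character U+1A20..U+1AAD.
;; The rule composes a maximal run of such characters (no preceding
;; characters are examined, hence the 0) and passes it to the font's
;; shaping engine via `font-shape-gstring'.
(set-char-table-range composition-function-table
                      '(#x1a20 . #x1aad)
                      (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring)))
```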

> > > The reason Arabic seemed different is that when lam+hah appears to
> > > ligate, what is happening (at least with Amiri) is that
> > > substitutions are made which give the effect of a ligature, while
> > > remaining two distinct glyphs.  
> 
> > Yes, I see that as well.  "C-u C-x =" should tell you whether ligation
> > happened or not.  What you see is normal, I think: Emacs obeys the
> > decisions of the font designers.
> 
> Unless they recorded the positions of the boundaries between the parts
> of a ligature!

I don't understand what you mean by that.

Emacs behaves according to what the shaping engine tells us about the
number of graphemes in the cluster.  Each grapheme is (by default) a
single unit for the purposes of cursor motion: Emacs will not let you
"enter" the grapheme, even if it is made of several glyphs.  But
there's nothing in particular that Emacs expects from the number and
order of the graphemes in a cluster; we just use what the shaping
engine hands back to us.  And cursor motion in Emacs is by default
in logical order, i.e. in the increasing order of buffer positions of
the original codepoints.
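The model described above, in which the shaping engine decides the clusters and Emacs only moves point between them in logical order, can be sketched in Emacs Lisp.  The two functions below are hypothetical helpers, not part of Emacs.  They assume each glyph is a vector whose slots 0 and 1 hold the FROM and TO character indices of its cluster (as in the glyph values [0 1 6688...] quoted earlier), and that all glyphs of one cluster are adjacent.

```elisp
(defun my-glyph-clusters (glyphs)
  "Return the clusters of GLYPHS as a list of (FROM . TO) pairs.
GLYPHS is a list of glyph vectors whose slots 0 and 1 hold the
first and last character indices covered by the glyph's cluster.
Adjacent glyphs with the same FROM/TO pair form one cluster."
  (let (clusters)
    (dolist (glyph glyphs)
      (let ((span (cons (aref glyph 0) (aref glyph 1))))
        ;; Start a new cluster only when the FROM/TO span changes.
        (unless (equal span (car clusters))
          (push span clusters))))
    (nreverse clusters)))

(defun my-cursor-stops (glyphs)
  "Return the character offsets where point may stop within GLYPHS.
Point may sit before each cluster or after the last one, never
inside a cluster; offsets increase in logical order.  GLYPHS must
be a non-empty list."
  (let ((clusters (my-glyph-clusters glyphs)))
    (append (mapcar #'car clusters)
            ;; One stop just past the last character of the last cluster.
            (list (1+ (cdr (car (last clusters))))))))
```

With the three glyphs quoted from the Uniscribe run (shortened to their first three slots), (my-cursor-stops (list [0 1 6688] [0 1 6688] [2 3 6752])) returns (0 2 4): one stop between <1A20, 1A63> and <1A60, 1A3F>, plus the word's end.  If, hypothetically, every glyph reported the span 0..3, the result would be (0 4), which is the single indivisible cluster described for HarfBuzz.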


Thread overview: 9+ messages
2022-02-20 11:09 Manually parsing char-tables Richard Wordingham
2022-02-20 12:50 ` Eli Zaretskii
2022-02-21  1:39   ` Richard Wordingham
2022-02-26  0:28   ` Composed Sequences (was: Manually parsing char-tables) Richard Wordingham
2022-02-26  6:33     ` Eli Zaretskii
2022-02-26 15:11       ` Composed Sequences Richard Wordingham
2022-02-26 15:35         ` Eli Zaretskii [this message]
2022-02-26 19:46           ` Richard Wordingham
2022-02-26 20:02             ` Eli Zaretskii