From: Eli Zaretskii
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: Composed Sequences
Date: Sat, 26 Feb 2022 17:35:22 +0200
Message-ID: <83fso5prnp.fsf@gnu.org>
In-Reply-To: <20220226151144.4c0b641e@JRWUBU2> (message from Richard Wordingham on Sat, 26 Feb 2022 15:11:44 +0000)

> Date: Sat, 26 Feb 2022 15:11:44 +0000
> From: Richard Wordingham
>
> > > Different renderers give different clusters, and thus, by default,
> > > different cursor motion!
>
> > Not "different renderers", but "different fonts".
>
> I experimented with the Tai Tham composition-function-table entry
>
> (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring))
>
> For GNU Emacs 23.4.1 (i386-mingw-nt6.2.9200) using Uniscribe, the word
> ᨠᩣ᩠ᨿ <1A20 HIGH KA, 1A63 AA, 1A60 SAKOT, 1A3F LOW YA>, the glyph string
> for Version 0.8 of my font Da Lekh is divided into two
> clusters as identified by the 'glyph' values [0 1 6688...] [0 1
> 6688...] [2 3 6752...] and confirmed by ordinary cursor motion.
> While this division into <1A20, 1A63> and <1A60, 1A3F> is not the Unicode
> division into grapheme clusters, it accords with what are natively
> namable clusters.
>
> For GNU Emacs 27.1 (build1 i686-w64-mingw32) of 2020-08-21, which uses
> HarfBuzz, the same word is one indivisible cluster (at least with
> Version 0.13 of the same font).  I think this is a change in the
> behaviour of HarfBuzz.

If you must have the last word in this.

(It's quite clear that in gray areas, such as Tai Tham, and where a
shaping engine has a bug or a misfeature, the results will also depend
on the shaping engine.  But that is not the main lesson to be taken
home from the original issue, which btw was with Arabic, not Tai
Tham.)

> > > The reason Arabic seemed different is that when lam+hah appears to
> > > ligate, what is happening (at least with Amiri) is that
> > > substitutions are made which give the effect of a ligature, while
> > > remaining two distinct glyphs.
>
> > Yes, I see that as well.  "C-u C-x =" should tell you whether ligation
> > happened or not.  What you see is normal, I think: Emacs obeys the
> > decisions of the font designers.
>
> Unless they recorded the positions of the boundaries between the parts
> of a ligature!

I don't understand what you mean by that.  Emacs behaves according to
what the shaping engine tells us about the number of graphemes in the
cluster.  Each grapheme is (by default) a single unit for the purposes
of cursor motion: Emacs will not let you "enter" the grapheme, even if
it is made out of several glyphs.  But there's nothing in particular
that Emacs expects from the number and order of the graphemes in a
cluster, we just use what the shaping engine hands back to us.  And
the cursor motion in Emacs is by default in logical order, i.e. in the
increasing order of buffer positions of the original codepoints.
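
To make the last paragraph concrete, here is a toy model in Python.  It
is only a sketch and not how Emacs implements cursor motion.  It assumes
that each glyph record starts with the first and last codepoint index it
covers, followed by the character, as the quoted values [0 1 6688...]
suggest.  The names cursor_stops and forward_char, and the single-cluster
record (0, 3, ...), are hypothetical and chosen for illustration.

```python
def cursor_stops(glyphs, text_len):
    """Return the positions where the cursor may rest, in logical order.

    Each glyph is (from_index, to_index, char); glyphs that share the
    same index range belong to the same unit and add no extra stops.
    """
    stops = {0, text_len}
    for start, end, _char in glyphs:
        stops.add(start)       # the cursor may sit before the unit
        stops.add(end + 1)     # ... and after it, but never inside it
    return sorted(stops)


def forward_char(pos, stops):
    """Move to the next stop after pos, i.e. by increasing buffer position."""
    for stop in stops:
        if stop > pos:
            return stop
    return pos  # already at the end of the text


# The word <1A20, 1A63, 1A60, 1A3F> is four codepoints long.
# Two units, mirroring the quoted values [0 1 6688...] [0 1 6688...] [2 3 6752...]:
two_units = [(0, 1, 0x1A20), (0, 1, 0x1A20), (2, 3, 0x1A60)]
# One indivisible unit (hypothetical record covering all four codepoints):
one_unit = [(0, 3, 0x1A20)]

print(cursor_stops(two_units, 4))
print(cursor_stops(one_unit, 4))
```

With the first glyph list the cursor can rest in the middle of the word;
with the second it can only rest before or after the whole word.  In
both cases motion follows increasing codepoint positions, whatever the
visual order of the glyphs.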