From: Richard Wordingham
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: Composed Sequences
Date: Sat, 26 Feb 2022 19:46:16 +0000
Message-ID: <20220226194616.4c6e0330@JRWUBU2>
In-Reply-To: <83fso5prnp.fsf@gnu.org>

On Sat, 26 Feb 2022 17:35:22 +0200
Eli Zaretskii wrote:

> > Date: Sat, 26 Feb 2022 15:11:44 +0000
> > From: Richard Wordingham
> >
> > > > Different renderers give different clusters, and thus, by
> > > > default, different cursor motion!
> >
> > > Not "different renderers", but "different fonts".
> >
> > I experimented with the Tai Tham composition-function-table entry
> >
> > (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring))
> >
> > For GNU Emacs 23.4.1 (i386-mingw-nt6.2.9200) using Uniscribe, the
> > word ᨠᩣ᩠ᨿ <1A20 HIGH KA, 1A63 AA, 1A60 SAKOT, 1A3F LOW YA>, the
> > glyph string for Version 0.8 of my font Da Lekh is divided into two
> > clusters as identified by the 'glyph' values [0 1 6688...] [0 1
> > 6688...] [2 3 6752...] and confirmed by ordinary cursor motion.
> > While this division into <1A20, 1A63> and <1A60, 1A3F> is not the
> > Unicode division into grapheme clusters, it accords with what are
> > natively namable clusters.
> >
> > For GNU Emacs 27.1 (build1 i686-w64-mingw32) of 2020-08-21, which
> > uses HarfBuzz, the same word is one indivisible cluster (at least
> > with Version 0.13 of the same font).  I think this is a change in
> > the behaviour of HarfBuzz.
>
> If you must have the last word in this.  (It's quite clear that in
> gray areas, such as Tai Tham, and where a shaping engine has a bug or
> a misfeature, the results will also depend on the shaping engine.  But
> that is not the main lesson to be taken home from the original issue,
> which btw was with Arabic, not Tai Tham.)
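For anyone wanting to repeat the experiment quoted above, here is a minimal sketch of how the entry might be installed and the resulting glyph string inspected. It assumes a graphical frame with Automatic Composition mode on and a font covering Tai Tham. The two `my-` functions are hypothetical helpers, not part of Emacs. Reading glyphs that share the same leading [FROM TO] pair as one unit for cursor motion is an interpretation of the 'glyph' values quoted above, not a documented guarantee.

```elisp
;; Install the quoted rule for the range U+1A20..U+1AAD.
(set-char-table-range
 composition-function-table
 '(#x1a20 . #x1aad)
 (list (vector "[\u1a20-\u1aad]+" 0 'font-shape-gstring)))

;; Hypothetical helper: collect the distinct (FROM . TO) index pairs
;; carried by consecutive glyphs of GSTRING, in glyph order.
(defun my-gstring-graphemes (gstring)
  "Return the distinct (FROM . TO) pairs of the glyphs in GSTRING."
  (let (pairs)
    (dotimes (i (lgstring-glyph-len gstring))
      (let ((g (lgstring-glyph gstring i)))
        (when g                         ; unused trailing slots are nil
          (let ((pair (cons (lglyph-from g) (lglyph-to g))))
            (unless (equal pair (car pairs))
              (push pair pairs))))))
    (nreverse pairs)))

;; Hypothetical command: report the automatically composed chunk at point.
(defun my-show-graphemes-at-point ()
  "Show the index pairs of the automatic composition at point, if any."
  (interactive)
  (let ((comp (find-composition (point))))
    ;; For an automatic composition the third element is a glyph-string
    ;; (a vector); for a `composition' text property it is VALID-P.
    (if (and comp (vectorp (nth 2 comp)))
        (message "Chunk %d..%d, index pairs: %S"
                 (nth 0 comp) (nth 1 comp)
                 (my-gstring-graphemes (nth 2 comp)))
      (message "No automatic composition at point"))))
```

With point on the word in question, `M-x my-show-graphemes-at-point` should list one pair per unit; for the Uniscribe glyph values quoted above that would be two pairs, (0 . 1) and (2 . 3).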
The original query was how the cursor could wind up being displayed
inside a cluster as defined by the composition rules.  The answer is
that it is always allowed at the boundary of graphemes, as defined
below.

It does, unfortunately, seem that the Uniscribe behaviour results from
oppressive coding, rather than any desire to support default grapheme
clusters (Unicode) or the like.

> > > Emacs
> > > obeys the decisions of the font designers.

> > Unless they recorded the positions of the boundaries between the
> > parts of a ligature!

> I don't understand what you mean by that.

The GDEF table of an OpenType font records the boundary between the
components of a ligature glyph, via the 'ligature caret list' table
therein.  These data, if they exist, are amongst the 'decisions of the
font designers'.

Annoyingly, the font designers may be overridden by the rendering
engine designers.  A font designer can merge 'graphemes', but
seemingly not split 'graphemes'.

Glossary:

cluster - A sequence of coded characters presented to the shaping
engine to be shaped.

grapheme - A sequence of coded characters which the shaping engine
treats as a unit for the purpose of 'hit detection'.

(Perhaps this glossary has been published somewhere.)

In principle, a glyph may be shared between two graphemes, but I doubt
that Emacs has a mechanism to support that.

> Emacs behaves according to what the shaping engine tells us about the
> number of graphemes in the cluster.  Each grapheme is (by default) a
> single unit for the purposes of cursor motion: Emacs will not let you
> "enter" the grapheme, even if it is made out of several glyphs.  But
> there's nothing in particular that Emacs expects from the number and
> order of the graphemes in a cluster, we just use what the shaping
> engine hands back to us.  And the cursor motion in Emacs is by default
> in logical order, i.e. in the increasing order of buffer positions of
> the original codepoints.
I hope you mean "several characters", not "several glyphs".  The
exception is related to disable-point-adjustment and its relatives,
and I think also to undisplayed buffers.

Richard.
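As an illustration of the disable-point-adjustment exception mentioned above, the following is a minimal sketch of a command that leaves point wherever character-wise motion puts it. The command name is hypothetical and not part of Emacs. The sketch assumes that the post-command adjustment is what normally moves point out of a composed sequence, and makes no claim about where the cursor is then drawn.

```elisp
;; Hypothetical command, not part of Emacs.
(defun my-forward-char-unadjusted (&optional n)
  "Move forward N characters, suppressing post-command point adjustment."
  (interactive "p")
  (forward-char (or n 1))
  ;; Emacs resets this variable to nil before reading each command and
  ;; checks it just after the command has run, so setting it here
  ;; affects only this one invocation.
  (setq disable-point-adjustment t))
```

Setting `global-disable-point-adjustment` to a non-nil value is the persistent relative of this per-command switch.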