From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Mon, 07 Feb 2022 16:04:35 +0200
Message-ID: <83czjyydqk.fsf@gnu.org>
In-Reply-To: <20220206220958.5a4d8ffe@JRWUBU2> (message from Richard Wordingham on Sun, 6 Feb 2022 22:09:58 +0000)
To: Richard Wordingham
Cc: 20140@debbugs.gnu.org, handa@gnu.org, larsi@gnus.org

> Date: Sun, 6 Feb 2022 22:09:58 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org, Kenichi Handa
>
> > > Sad to see that Khaled Hosny's suggestion not to use composition
> > > rules seems not to have been taken.
> >
> > You mean, to pass all the text via HarfBuzz instead?  That makes the
> > Emacs redisplay painfully slow, and would require a complete redesign
> > of how we render text to be bearable.  So as long as such a redesign
> > is not available, we cannot use that advice.
>
> Except for Malayalam!  (Subexpression XX* in indian.el at the moment.)

(That was changed lately.  But it is a tangent.)

> > > They're complicated by the facts that the 'regular expressions' are
> > > not interpreted as regular expressions and they are not interpreted
> > > as closed under canonical equivalence.  I therefore calculate the
> > > regular expression.
> >
> > I'm not sure I understand the issue: what you do seems to be very
> > similar to what we do for the Indic scripts in indian.el, so what kind
> > of complications are you talking about here?
>
> Well, those rules themselves are a bit odd.  Why are you composing
> single clusters?  Why are you breaking clusters where Microsoft
> imitators are likely to insert dotted circles?

I'm not sure this is what I asked.  I asked why you think this way of
defining patterns for composition rules is in any way exceptional.
It seems pretty much boilerplate to me.

> The best sources are the regular expressions in the proposals, but they
> missed out the combination of tone mark and final consonant signs.

Can you be more specific about those proposals?  Any specific
pointers?

Also, does this mean there's currently no widely accepted agreement
regarding Tai Tham shaping?  What do native readers of that script
expect?

> What do you mean by 'shaping'?

Whatever is needed to produce correct display from a sequence of
codepoints in a given script.

> > At least for the dotted circle case, Emacs has a general composition
> > rule; see compose-gstring-for-dotted-circle and the corresponding rule
> > in composite.c.  So I'm not sure we need anything specific to Tai Tham
> > there.
>
> Does the 3-character Khmer sequence "◌្ក" work
> in Version 28?  It doesn't in Version 26.3.  It should look like a
> dotted circle with the lower part of ក្ក below it.  In Version 26.3, I
> don't even get the consonant U+1780 subscripted!

No, it doesn't produce what you want (though the 2nd and the 3rd
characters do combine), but that's not surprising: the general rules
for U+25CC that we have cover only a single combining mark after it:

  (aset composition-function-table
        #x25CC
        `([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))

So a sequence of more than one character after U+25CC needs an
explicit rule to work.  What is the rule in this case?

(And what does Khmer have to do with the question I asked, which is
about Tai Tham?)

> With HarfBuzz, if you don't compose U+25CC with the following mark, you
> are very likely to get two dotted circles - are you deliberately
> deleting one?

No.  And I don't get 2 dotted circles with the above in Emacs 28 with
HarfBuzz.  Anyway, Khmer is a separate issue.
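
For readers who want to see why the general U+25CC rule quoted above
stops after one mark, here is a rough model of it outside Emacs.  It is
only a sketch, not Emacs code: the helper names are hypothetical, and it
assumes that the Emacs category `^` can be approximated by the Unicode
general categories Mn, Mc and Me.  It shows that a pattern equivalent to
".\\c^" anchored at U+25CC covers two characters of the 3-character
Khmer sequence, leaving U+1780 outside the match.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
# The Khmer sequence from the discussion: U+25CC, U+17D2 (coeng), U+1780 (KA).
KHMER_SAMPLE = "\u25cc\u17d2\u1780"


def is_combining(ch: str) -> bool:
    """Approximate the Emacs category `^` (assumption: Mn, Mc or Me)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def match_length(text: str, start: int = 0) -> int:
    """Length matched by a model of the rule ".\\c^" keyed on U+25CC.

    Returns 2 when text[start] is U+25CC followed by one combining
    character, and 0 otherwise.  The model never matches more than two
    characters, which mirrors the single-mark limitation.
    """
    if (
        start + 1 < len(text)
        and text[start] == DOTTED_CIRCLE
        and is_combining(text[start + 1])
    ):
        return 2
    return 0


if __name__ == "__main__":
    matched = match_length(KHMER_SAMPLE)
    for index, ch in enumerate(KHMER_SAMPLE):
        status = "inside match" if index < matched else "outside match"
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {status}")
```

In this model the third character, U+1780, has general category Lo, so
a longer pattern would have to name the script-specific sequence
explicitly; the sketch does not attempt to say what that rule should be.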