From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 01 Jun 2023 19:10:16 +0300
Message-ID: <83a5xjcgev.fsf@gnu.org>
In-Reply-To: <87wn0n9uol.fsf@gmail.com> (message from Robert Pluim on Thu, 01 Jun 2023 15:30:18 +0200)
To: Robert Pluim
Cc: 63731@debbugs.gnu.org, steven@stebalien.com

> From: Robert Pluim
> Cc: 63731@debbugs.gnu.org, steven@stebalien.com
> Date: Thu, 01 Jun 2023 15:30:18 +0200
>
> Eli> OK, the issue is quite clear even without stepping with a debugger.
>
> Eli> Bottom line: we cannot support a situation where the same character
> Eli> can be composed by more than one slot in composition-function-table.
> Eli> If there are more than a single slot for the same character, one of
> Eli> them will be tried, and the rest will be ignored (not even tried).
> Eli> In particular, if a character CH has a "forward" composition rule that
> Eli> starts with itself, and also has a "backward" rule (one with non-zero
> Eli> look-back parameter) triggered by a different character (which should
> Eli> follow CH), the latter rule will never be tried.
>
> OK, that makes sense. Where would be a good place to document this?

In the doc string of composition-function-table, I think.
We already document there the caveat of arranging rules in descending
order of look-back, which is part of the same "misfeature".

> Eli> Which means that to have #xFE0F compose correctly with Emoji
> Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
>
> Thatʼs easy enough:
>
> diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
> index 7d2ff6cb900..d1195ebbad8 100644
> --- a/admin/unidata/emoji-zwj.awk
> +++ b/admin/unidata/emoji-zwj.awk
> @@ -106,7 +106,8 @@ END {
>
>    for (elt in ch)
>      {
> -      printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
> +      entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
> +      printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
>      }
>    print "))"
>    print " (set-char-table-range composition-function-table"
>
> That makes all the VS-16 sequences in
> admin/unidata/emoji-variation-sequences.txt display with the emoji
> font for me.

Ready to install this on the emacs-29 branch?

> Eli> The reason why "C-u C-x =" lies to us saying there's a composition
> Eli> where really there isn't is because descr-text.el uses the
> Eli> find-composition primitive, whose implementation is parallel and
> Eli> separate from that of the display-engine routines, and is structured
> Eli> differently. So find-composition does succeed to detect the second
> Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
> Eli> I will think whether this can be fixed, to avoid such false positives,
> Eli> but if we accept that there can be only one set of composition rules
> Eli> for a character, then we basically invoked undefined behavior here,
> Eli> and we got what we deserved.
>
> If find-composition DTRT, could we not use it in the display engine?

Not easily, because the display code calls subroutines of
find-composition in a certain order, and that's what causes the
behavior I described.  And even if we could make this happen, I'm not
sure we should: basically, having multiple matching slots would mean
users and callers will never be sure which one "wins".
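
For readers who want to see the "one slot wins" behavior in isolation, here is a toy model in Python. It is only a sketch under loud assumptions: the dictionary `table`, the function `compose_at`, and the (look-back, regexp) rule shape are all hypothetical stand-ins, and this is not how the display engine is actually implemented. It merely mimics the outcome described above: when a character has its own slot, a look-back rule belonging to the following character is never tried, and adding the FE0F sequence to the character's own slot makes the composition succeed.

```python
import re

HEART = "\u2764"   # a character that has its own "forward" slot
STAR = "\u2605"    # a character with no slot of its own
VS16 = "\ufe0f"    # Variation Selector 16
ZWJ = "\u200d"     # Zero Width Joiner


def compose_at(table, text, pos):
    """Return the text composed starting at POS, or None.

    TABLE is a hypothetical stand-in for composition-function-table:
    it maps a character to a list of (look_back, regexp) rules.
    """
    ch = text[pos]
    if ch in table:
        # CH has its own slot: only its forward rules are tried.
        patterns = [p for lb, p in table[ch] if lb == 0]
    elif pos + 1 < len(text) and text[pos + 1] in table:
        # No slot for CH: the next character may reach back one position.
        patterns = [p for lb, p in table[text[pos + 1]] if lb == 1]
    else:
        patterns = []
    for pattern in patterns:
        m = re.compile(pattern).match(text, pos)
        if m:
            return m.group(0)
    return None


# Two slots could both claim HEART followed by VS16.
table = {
    HEART: [(0, HEART + ZWJ + ".")],   # forward rule starting at HEART
    VS16: [(1, "." + VS16)],           # backward rule triggered by VS16
}

# STAR has no slot, so the VS16 look-back rule applies.
star_result = compose_at(table, STAR + VS16, 0)

# HEART has a slot, so the VS16 look-back rule is never tried.
shadowed_result = compose_at(table, HEART + VS16, 0)

# The fix discussed above: put the VS16 sequence in HEART's own slot.
table[HEART].append((0, HEART + VS16))
fixed_result = compose_at(table, HEART + VS16, 0)
```

In this model `shadowed_result` is None while `star_result` and `fixed_result` are the two-character sequences, which is the same shape as the emoji-zwj.awk change: the FE0F sequence has to live in the slot of the base character.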