From: Robert Pluim
To: Eli Zaretskii
Cc: 63731@debbugs.gnu.org, steven@stebalien.com
Newsgroups: gmane.emacs.bugs
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 01 Jun 2023 15:30:18 +0200
Message-ID: <87wn0n9uol.fsf@gmail.com>
In-Reply-To: <83edmvcpzl.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 01 Jun 2023 15:43:26 +0300")

>>>>> On Thu, 01 Jun 2023 15:43:26 +0300, Eli Zaretskii said:

    >> Cc: 63731@debbugs.gnu.org, steven@stebalien.com
    >> Date: Wed, 31 May 2023 19:18:22 +0300
    >> From: Eli Zaretskii
    >>
    >> > From: Robert Pluim
    >> > Cc: 63731@debbugs.gnu.org, steven@stebalien.com
    >> > Date: Wed, 31 May 2023 18:11:36 +0200
    >> >
    >> >     Eli> So there are two issues here: (a) why there's no composition in the
    >> >     Eli> first case, and (b) why does "C-u C-x =" say there is when there
    >> >     Eli> isn't.
    >> >
    >> > OK. I can poke around in gdb if you give me some idea of what I should
    >> > be looking at.
    >>
    >> I don't really know.  I plan to just step through the code in
    >> composite.c tomorrow, unless you beat me to it.  Once we understand
    >> issue (a), I think we will also understand issue (b).

    Eli> OK, the issue is quite clear even without stepping with a debugger.

    Eli> Bottom line: we cannot support a situation where the same character
    Eli> can be composed by more than one slot in composition-function-table.
    Eli> If there is more than one slot for the same character, one of
    Eli> them will be tried, and the rest will be ignored (not even tried).

    Eli> In particular, if a character CH has a "forward" composition rule that
    Eli> starts with itself, and also has a "backward" rule (one with non-zero
    Eli> look-back parameter) triggered by a different character (which should
    Eli> follow CH), the latter rule will never be tried.

OK, that makes sense. Where would be a good place to document this?

    Eli> This is what happens in this case: the character #x1F44D has several
    Eli> rules that start with itself in emoji-zwj.el:

    Eli>     (#x1F44D .
    Eli>     ,(eval-when-compile (regexp-opt
    Eli>     '(
    Eli>     "\N{U+1F44D}\N{U+1F3FB}"
    Eli>     "\N{U+1F44D}\N{U+1F3FC}"
    Eli>     "\N{U+1F44D}\N{U+1F3FD}"
    Eli>     "\N{U+1F44D}\N{U+1F3FE}"
    Eli>     "\N{U+1F44D}\N{U+1F3FF}"
    Eli>     ))))

    Eli> and it also has a "backward" rule:

    Eli>     (set-char-table-range
    Eli>      composition-function-table
    Eli>      #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

    Eli> The latter is triggered by #xFE0F and has a 1-character look-back,
    Eli> which will match #x1F44D, since its category is '.' (it's a "base
    Eli> character").  This latter rule is never tried.  Why?  Because the
    Eli> former rules, anchored at #x1F44D, are tried first (Emacs redisplay
    Eli> examines characters in the order of their buffer positions), and fail
    Eli> to match.  When those rules fail to match, due to how the
    Eli> composition-related functions called by the display engine are
    Eli> factored, we never again consider compositions triggered by a later
    Eli> character which also "cover" #x1F44D: once that position was examined
    Eli> and the attempted composition failed, we move to the next character.
    Eli> IOW, we assume that the first set of composition rules we find for a
    Eli> given character is the only one that could possibly be relevant for
    Eli> that character.

    Eli> Which means that to have #xFE0F compose correctly with Emoji
    Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.

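At the regexp level the effect is easy to check.  A small sketch (the
`demo-' names are made up for illustration, and this only exercises
the regexps, not the display engine or composite.c): the alternatives
quoted above do not match thumbs-up followed by VS-16, while the same
list with one extra entry does.

```elisp
;; Illustration only: hypothetical names, regexp matching only.
(defconst demo-old-re
  (regexp-opt '("\N{U+1F44D}\N{U+1F3FB}"
                "\N{U+1F44D}\N{U+1F3FC}"
                "\N{U+1F44D}\N{U+1F3FD}"
                "\N{U+1F44D}\N{U+1F3FE}"
                "\N{U+1F44D}\N{U+1F3FF}"))
  "Regexp built from the #x1F44D alternatives quoted above.")

(defconst demo-new-re
  (regexp-opt '("\N{U+1F44D}\N{U+1F3FB}"
                "\N{U+1F44D}\N{U+1F3FC}"
                "\N{U+1F44D}\N{U+1F3FD}"
                "\N{U+1F44D}\N{U+1F3FE}"
                "\N{U+1F44D}\N{U+1F3FF}"
                "\N{U+1F44D}\N{U+FE0F}"))
  "Same alternatives plus the VS-16 sequence.")

(defconst demo-text "\N{U+1F44D}\N{U+FE0F}"
  "Thumbs-up followed by Variation Selector 16.")

;; The existing forward rule cannot match the VS-16 sequence ...
(string-match-p demo-old-re demo-text)  ; => nil
;; ... but with the extra entry it matches, anchored at the emoji.
(string-match-p demo-new-re demo-text)  ; => 0
```

So the VS-16 sequence has to live in the slot for the emoji itself.
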
Thatʼs easy enough:

diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..d1195ebbad8 100644
--- a/admin/unidata/emoji-zwj.awk
+++ b/admin/unidata/emoji-zwj.awk
@@ -106,7 +106,8 @@ END {
 
     for (elt in ch)
     {
-        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
+        entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
+        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
     }
     print "))"
     print " (set-char-table-range composition-function-table"

That makes all the VS-16 sequences in
admin/unidata/emoji-variation-sequences.txt display with the emoji
font for me.

    Eli> The reason why "C-u C-x =" lies to us, saying there's a composition
    Eli> where really there isn't, is that descr-text.el uses the
    Eli> find-composition primitive, whose implementation is parallel to and
    Eli> separate from that of the display-engine routines, and is structured
    Eli> differently.  So find-composition does succeed in detecting the second
    Eli> rule, the one triggered by #xFE0F, which the display engine ignores.

    Eli> I will think whether this can be fixed, to avoid such false positives,
    Eli> but if we accept that there can be only one set of composition rules
    Eli> for a character, then we basically invoked undefined behavior here,
    Eli> and we got what we deserved.

If find-composition DTRT, could we not use it in the display engine?

Robert
-- 