From: Robert Pluim
Newsgroups: gmane.emacs.bugs
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Fri, 26 May 2023 10:34:02 +0200
Message-ID: <87v8gfmqyt.fsf@gmail.com>
References: <87a5xrzsph.fsf@stebalien.com> <83pm6nlhll.fsf@gnu.org>
In-Reply-To: <83pm6nlhll.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 26 May 2023 09:41:42 +0300")
To: Eli Zaretskii
Cc: 63731@debbugs.gnu.org, Steven Allen

Disclaimer: I havenʼt looked at the patch yet

>>>>> On Fri, 26 May 2023 09:41:42 +0300, Eli Zaretskii said:

>> From: Steven Allen
>> Date: Thu, 25 May 2023 20:18:02 -0700
>>
>> This patch imports the full list from unicode.org instead of
>> special-casing a few characters as was done previously.
>>
>> With this patch, '👍️' (1F44D FE0F) should look the same as '👍' (1F44D).
>> Without it, it will look like '👍‌️'.
>>
>> As a simple regression test, '✔' (2714) should still display as "text" while '✔️'
>> (2714 FE0F) should still display as an emoji.
>>
>> Fixes https://github.com/alphapapa/ement.el/issues/137
>>
>> NOTE: I'm not a Unicode expert, nor do I understand how Emacs handles
>> Unicode (beyond what was required to implement this patch). But this
>> patch appears to work and I can't find any regressions.

Eli> AFAIU, this change will populate composition-function-table for many
Eli> "normal" characters, including ASCII digits and symbol/punctuation
Eli> characters from the 0x2xxx blocks.  E.g., after you build Emacs with
Eli> this patch, what do the following evaluations yield:

Eli>   M-: (aref composition-function-table ?0) RET
Eli>   M-: (aref composition-function-table #x2122) RET

Eli> If they yield non-nil values, it could mean dramatic slowdown of
Eli> redisplay with these characters.  Which is precisely what we wanted to
Eli> avoid when we made the decision which parts of the Unicode-defined
Eli> Emoji sequences to support in Emacs, and how to arrange for that
Eli> support to work.

Yes. We donʼt want to do composition checks for ASCII if we can avoid
it.

Eli> The issue you cite is strange: according to the "C-u C-x =" display
Eli> there, Emacs did compose #x1f44d with VS-16 using the Noto Color Emoji
Eli> font, so I don't quite understand why VS-16 is then also shown as an
Eli> empty rectangle.
Eli> On my system Noto Color Emoji doesn't work, and "C-u
Eli> C-x =" says this instead:

Eli>   Composed with the following character(s) "️" using this font:
Eli>     harfbuzz:-outline-Noto Emoji-regular-normal-normal-mono-15-*-*-*-c-*-iso10646-1
Eli>   by these glyphs:
Eli>     [0 1 128077 422 19 2 17 14 2 nil]
Eli>     [0 1 65039 3 19 0 1 0 1 [0 0 0]]
Eli>   with these character(s):
Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

Eli> which explains why I see two glyphs and not 1.  But in the display
Eli> shown in the above issue, I see

Eli>   Composed with the following character(s) "️" using this font:
Eli>     ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-18-*-*-*-m-0-iso10646-1
Eli>   by these glyphs:
Eli>     [0 1 128077 569 22 0 23 17 5 [0 0 136]]
Eli>   with these character(s):
Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

Eli> which describes only one glyph, not two.  So the result ought to be
Eli> what you expect.

I see the emoji followed by a blank box with Noto Color Emoji here. I
donʼt yet understand why.

Eli> Robert, what am I missing here?

1F44D FE0F is a valid sequence according to tr51

(aref composition-function-table #x1f44d)
=> (["\\(?:👍[🏻-🏿]\\)" 0 compose-gstring-for-graphic])

which means that the composition is being triggered by this entry:

(aref composition-function-table #xfe0f)
=> (["\\c.\\c^+" 1 compose-gstring-for-graphic] [nil 0 compose-gstring-for-graphic])

(time passes)

Ugh. The following fixes it for me:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..af86d1436d3 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -756,7 +756,7 @@ compose-gstring-for-dotted-circle
 ;; Allow for bootstrapping without uni-*.el.
 (when unicode-category-table
   (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
-               [nil 0 compose-gstring-for-graphic])))
+               )))
     (map-char-table
      #'(lambda (key val)
          (if (memq val '(Mn Mc Me))

Although the following is less invasive:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..333428f008a 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
          (if (memq val '(Mn Mc Me))
              (set-char-table-range composition-function-table key elt)))
      unicode-category-table))
+  ;; for Emoji presentation selector
+  (set-char-table-range
+   composition-function-table
+   #xFE0F
+   `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
   ;; for dotted-circle
   (aset composition-function-table #x25CC
         `([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))

Didnʼt we conclude that composition had some issues with multiple
entries for the same codepoint if there was a mix of forward and
backward looking regexps?

Robert
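
For anyone who wants to compare the table entries before and after
either change, here is a rough sketch (not part of either patch) that
prints the rules registered for the code points mentioned in this
thread.  It only uses `aref' on composition-function-table, as in the
evaluations quoted above; the list of characters is just the ones
discussed here.

```elisp
;; Sketch only: show the composition rules for the characters
;; discussed in this thread.  Evaluate in *scratch* or via M-:.
;; A nil value means no rule is registered for that character.
(dolist (ch '(?0 #x2122 #x1f44d #xfe0f))
  (message "U+%04X => %S" ch (aref composition-function-table ch)))
```

The output goes to *Messages*, one line per character, so the entries
for ?0 and #x2122 can be checked for nil at a glance.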