From: Robert Pluim
To: Eli Zaretskii
Cc: lists@traduction-libre.org, emacs-devel@gnu.org, juri@linkov.net
Newsgroups: gmane.emacs.devel
Subject: Re: Better emoji support
Date: Sun, 19 Sep 2021 20:40:11 +0200
Message-ID: <87ee9kl88k.fsf@gmail.com>
In-Reply-To: <83tuig2zc7.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 19 Sep 2021 21:29:44 +0300")

>>>>> On Sun, 19 Sep 2021 21:29:44 +0300, Eli Zaretskii said:

>> From: Robert Pluim
>> Cc: Juri Linkov, lists@traduction-libre.org,
>> emacs-devel@gnu.org
>> Date: Sun, 19 Sep 2021 20:10:22 +0200
>>
Eli> Hmm... Robert, I see quite a few characters that now belong to the
Eli> emoji script, which shouldn't be there, AFAIU.  The above is one of
Eli> them (AFAIK, the Arrows block doesn't belong to Emoji).  But there are
Eli> more stark cases, for example:
>>
>> The whole block might not, but some of the codepoints do:
>>
>> 2194..2199    ; Emoji                # E0.6   [6] (↔️..↙️)    left-right arrow..down-left arrow

Eli> Only if followed by a variation selector VS-16, right?

Iʼm inclined to agree, but Iʼd have to re-read tr51, and I have a
headache. They definitely have Emoji_Presentation=No.

Eli> (aref char-script-table ?#) => emoji
Eli> (aref char-script-table ?0) => emoji
>>
>> I donʼt see that here (and itʼs definitely not the
>> intention). Blocks.awk skips any ASCII codepoints (and those both
>> evaluate to "latin" here). Could you double-check your
>> lisp/international/charscript.el?

Eli> I see them there:

Eli>      (#x0023 #x0023 emoji) ; Autogenerated emoji
Eli>      (#x002A #x002A emoji) ; Autogenerated emoji
Eli>      (#x0030 #x0039 emoji) ; Autogenerated emoji
Eli>      (#x00A9 #x00A9 emoji) ; Autogenerated emoji
Eli>      (#x00AE #x00AE emoji) ; Autogenerated emoji

Eli> Which corresponds to these lines in emoji-data.txt:

Eli> 0023          ; Emoji                # E0.0   [1] (#️)       hash sign
Eli> 002A          ; Emoji                # E0.0   [1] (*️)       asterisk
Eli> 0030..0039    ; Emoji                # E0.0  [10] (0️..9️)    digit zero..digit nine
Eli> 00A9          ; Emoji                # E0.6   [1] (©️)       copyright
Eli> 00AE          ; Emoji                # E0.6   [1] (®️)       registered

Blocks.awk has this:

FILENAME ~ "emoji-data.txt" && /^00[0-9A-F]{2}.*; Emoji / {
    next
}

so those should have been filtered out (this is where I learn more
about Awk incompatibilities than I care to, perhaps).

Eli> It seems like these characters ended up in the emoji script because
Eli> they should render as emoji when followed by variation selectors?  But
Eli> in that case, the place to do this is in composition-function-table,
Eli> if we can, and if we cannot, let's for now decide we don't support
Eli> these sequences, because the cure sounds worse than the disease with
Eli> our current infrastructure.
>>
Eli> Am I missing something?
>>
>> Are you now saying that we only want to add to the emoji script those
>> characters with Emoji_Presentation=Yes?

Eli> Yes, I think so.  Are there any downsides to that?

Not that I can see. As a side effect it will fix whatever is causing
those ASCII codepoints to be treated as Emoji for you.

Robert
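
For illustration only, here is a rough sketch of the selection rule agreed
on above: take only the lines of emoji-data.txt whose property is
Emoji_Presentation, and emit entries in the charscript.el style quoted
earlier. It is a Python stand-in, not the actual Blocks.awk change; the
function names are hypothetical, and the sample lines are illustrative
entries in the emoji-data.txt format rather than a copy of the file.

```python
import re

# Illustrative lines in the emoji-data.txt format (assumption: a tiny
# hand-picked sample, not the real file contents).
SAMPLE = """\
# comment lines and blank lines are ignored

0023          ; Emoji                # E0.0   [1] hash sign
0030..0039    ; Emoji                # E0.0  [10] digit zero..digit nine
2194..2199    ; Emoji                # E0.6   [6] left-right arrow..down-left arrow
231A..231B    ; Emoji_Presentation   # E0.6   [2] watch..hourglass done
"""

# "START[..END] ; PROPERTY" at the beginning of a data line.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def emoji_presentation_ranges(text):
    """Return (start, end) codepoint pairs for Emoji_Presentation lines only."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None or m.group(3) != "Emoji_Presentation":
            continue  # skips comments, blanks, and plain "Emoji" lines
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)  # single codepoint: end == start
        ranges.append((start, end))
    return ranges


def as_charscript_entries(ranges):
    """Format ranges like the autogenerated entries in charscript.el."""
    return ["(#x%04X #x%04X emoji)" % r for r in ranges]


if __name__ == "__main__":
    for entry in as_charscript_entries(emoji_presentation_ranges(SAMPLE)):
        print(entry)
```

Because the match is on the property name, the ASCII lines (which carry
only the Emoji property) never reach the output, so a separate regexp to
skip the 00xx range is not needed in this variant. Whether the `{2}`
interval expression in the existing rule is what misbehaves under some Awk
implementations is not established in this thread; it is only one possible
explanation.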