From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: Better emoji support
Date: Mon, 20 Sep 2021 22:05:10 +0200
Message-ID: <87lf3rhv2h.fsf@gmail.com>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, kevin.legouguec@gmail.com
In-Reply-To: <835yuv11ay.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 20 Sep 2021 22:42:29 +0300")

>>>>> On Mon, 20 Sep 2021 22:42:29 +0300, Eli Zaretskii said:

>> From: Robert Pluim
>> Cc: kevin.legouguec@gmail.com, emacs-devel@gnu.org
>> Date: Mon, 20 Sep 2021 21:30:13 +0200
>>
>> >>>>> On Mon, 20 Sep 2021 21:54:57 +0300, Eli Zaretskii said:
>>
Eli> for Emoji sequences in composition-function-table should be anchored
Eli> on the VS-n codepoints (which I think is a good idea regardless).
>> >>
>> >> Weʼd have to raise the lookback limit for composition-function-table
>> >> rules higher than 3 (maybe only to 4).
>>
Eli> Examples?  Not that it's a catastrophe.
>>
>> From emoji-zwj-sequences.txt:
>>
>> 1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB ; RGI_Emoji_ZWJ_Sequence ; couple with heart: man, man, light skin tone # E13.1 [1] (👨🏻‍❤️‍👨🏻)
>>
>> With the current limit you'd get no further than the 1F3FB if you
>> anchored at FE0F, and miss the 1F468.
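
To make the lookback arithmetic in the quoted example concrete, here is a minimal sketch in Python (not Emacs code). It only counts positions in the sequence quoted above; the helper names `lookback_needed` and `reachable_start` are made up for illustration and do not correspond to anything in Emacs.

```python
# Sequence copied from the emoji-zwj-sequences.txt entry quoted above.
SEQUENCE = "1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB"
VS16 = 0xFE0F  # VARIATION SELECTOR-16


def lookback_needed(codepoints, anchor):
    """Return how many characters precede the first occurrence of anchor."""
    return codepoints.index(anchor)


def reachable_start(codepoints, anchor, lookback):
    """Return the earliest codepoint reachable when looking back
    `lookback` characters from the first occurrence of anchor."""
    pos = codepoints.index(anchor)
    return codepoints[max(0, pos - lookback)]


codepoints = [int(cp, 16) for cp in SEQUENCE.split()]
needed = lookback_needed(codepoints, VS16)

print(f"characters before FE0F: {needed}")
print(f"lookback 3 reaches U+{reachable_start(codepoints, VS16, 3):X}")
print(f"lookback 4 reaches U+{reachable_start(codepoints, VS16, 4):X}")
```

With a lookback of 3 the match starts at 1F3FB, and only a lookback of 4 reaches the leading 1F468, which is the case described in the quoted text.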
Eli> Ah, that's a misunderstanding.  I meant what I said only for sequences
Eli> that start with a non-emoji character.  When the first character is
Eli> from the emoji script, we don't need anything special to have the
Eli> right font used.

Phew. Let's talk about en/de-coding next, fun for all the family :-)

>> >> I guess it reduces the number of entries in
>> >> composition-function-table, but then you end up with a lot of rules
>> >> for e.g. VS-16.
>>
Eli> Why do you think we need to have a lot of such rules?  What kind of
Eli> rules did you think about?
>>
>> For whatever reason, a lot of the sequences in emoji-zwj-sequences.txt
>> contain codepoints with Emoji_Presentation = No, hence theyʼre
>> followed by VS-16. As a result, anchoring to VS-16 would produce a
>> lot of rules for VS-16.

Eli> We don't need a separate rule for every sequence, we can use a regular
Eli> expression with character sets.  We can even have regexps that match
Eli> more than emoji-zwj-sequences.txt specifies, since the font and the
Eli> shaping engine will sort that out and return a failure indication for
Eli> sequences that the font doesn't support.

Right.

>> Anyway, we can measure the difference, if any, once we have the base
>> implementation and Someone™ implements the VS-16 anchored version (it
>> would only be a dozen lines of awk, I think).

Eli> Let's cross that bridge when we get to it.

Right. For now we key off the first character in the sequence
specified in emoji-zwj-sequences.txt.

Robert
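
As a rough illustration of the two keying strategies discussed in this message, the following Python sketch groups sequences by their first codepoint and counts how many contain VS-16. It is not the awk script mentioned above and not Emacs code. The sample holds four entries written in the file's `codepoints ; type ; description` layout with the trailing `#` comments left out; only the first is quoted in this thread, and the other three are included on the assumption that they match the real file. The function names are hypothetical.

```python
from collections import defaultdict

# Sample entries in the layout of emoji-zwj-sequences.txt (trailing
# comments omitted).  Only the first entry is quoted in this thread;
# the rest are assumed to match the real data file.
SAMPLE = """\
1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB ; RGI_Emoji_ZWJ_Sequence ; couple with heart: man, man, light skin tone
1F468 200D 2764 FE0F 200D 1F468 ; RGI_Emoji_ZWJ_Sequence ; couple with heart: man, man
1F469 200D 1F4BB ; RGI_Emoji_ZWJ_Sequence ; woman technologist
1F3F3 FE0F 200D 1F308 ; RGI_Emoji_ZWJ_Sequence ; rainbow flag
"""

VS16 = 0xFE0F  # VARIATION SELECTOR-16


def parse(text):
    """Parse data lines into lists of integer codepoints."""
    sequences = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        field = line.split(";", 1)[0]  # first field holds the codepoints
        sequences.append([int(cp, 16) for cp in field.split()])
    return sequences


def group_by_first(sequences):
    """Key each sequence on its first codepoint."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[seq[0]].append(seq)
    return dict(groups)


sequences = parse(SAMPLE)
by_first = group_by_first(sequences)
with_vs16 = [seq for seq in sequences if VS16 in seq]

for first, seqs in sorted(by_first.items()):
    print(f"U+{first:X}: {len(seqs)} sequence(s)")
print(f"sequences containing VS-16: {len(with_vs16)}")
```

Running the same grouping over the full data file would give a first estimate of how many table entries each approach needs, before any regexps with character sets collapse them further.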