From: Robert Pluim
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, kevin.legouguec@gmail.com
Newsgroups: gmane.emacs.devel
Subject: Re: Better emoji support
Date: Mon, 20 Sep 2021 21:30:13 +0200
Message-ID: <87pmt3hwoq.fsf@gmail.com>
In-Reply-To: <838rzr13i6.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 20 Sep 2021 21:54:57 +0300")

>>>>> On Mon, 20 Sep 2021 21:54:57 +0300, Eli Zaretskii said:

    >> From: Robert Pluim
    >> Cc: kevin.legouguec@gmail.com, emacs-devel@gnu.org
    >> Date: Mon, 20 Sep 2021 19:32:23 +0200
    >>
    Eli> My idea is to trick the font_range function, which verifies that the
    Eli> sequence can be composed using a single font, to use an Emoji font
    Eli> when it sees something followed by a variation selector, instead of
    Eli> the font for the first character in the sequence.  For that, the rules
    Eli> for Emoji sequences in composition-function-table should be anchored
    Eli> on the VS-n codepoints (which I think is a good idea regardless).
    >>
    >> Weʼd have to raise the lookback limit for composition-function-table
    >> rules higher than 3 (maybe only to 4).

    Eli> Examples?  Not that it's a catastrophe.

From emoji-zwj-sequences.txt:

1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB ; RGI_Emoji_ZWJ_Sequence ; couple with heart: man, man, light skin tone # E13.1 [1] (👨🏻‍❤️‍👨🏻)

With the current limit you'd get no further than the 1F3FB if you
anchored at FE0F, and miss the 1F468.

    >> I guess it reduces the number of entries in
    >> composition-function-table, but then you end up with a lot of rules
    >> for eg VS-16.

    Eli> Why do you think we need to have a lot of such rules?  What kind of
    Eli> rules did you think about?

For whatever reason, a lot of the sequences in emoji-zwj-sequences.txt
contain codepoints with Emoji_Presentation = No, hence theyʼre
followed by VS-16. As a result, anchoring to VS-16 would produce a lot
of rules for VS-16.

    >> emoji-zwj-sequences.txt would result in about 840 rules,
    >> with a lot of redundancy, which could be reduced, but I think that can
    >> wait until after the zwj sequence stuff goes in.

    Eli> I guess I'm missing something because I don't see a problem there.

Perhaps Iʼm worrying too much. It would be a lot of rules to check,
but only when encountering VS-16. Anyway, we can measure the
difference, if any, once we have the base implementation and Someone™
implements the VS-16 anchored version (it would only be a dozen lines
of awk, I think).

Robert
-- 
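
The lookback arithmetic for the example sequence above can be checked
with a small sketch. This is only an illustration in Python, not
anything from the Emacs sources: the helper names parse_sequence and
lookback_needed are made up here, and it assumes that "lookback" simply
means the number of codepoints preceding the first FE0F in the
sequence.

```python
# Illustrative sketch only: helper names are hypothetical, and "lookback"
# is assumed to be the count of codepoints before the first U+FE0F anchor.
VS16 = 0xFE0F


def parse_sequence(line):
    """Return the codepoints from one emoji-zwj-sequences.txt data line."""
    field = line.split(";", 1)[0]  # codepoints come before the first ';'
    return [int(cp, 16) for cp in field.split()]


def lookback_needed(codepoints, anchor=VS16):
    """Count the codepoints before the first anchor; None if it is absent."""
    if anchor not in codepoints:
        return None
    return codepoints.index(anchor)


line = (
    "1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB ; RGI_Emoji_ZWJ_Sequence ; "
    "couple with heart: man, man, light skin tone # E13.1"
)
seq = parse_sequence(line)
needed = lookback_needed(seq)

# With a lookback limit of 3, this is the earliest codepoint a rule
# anchored at FE0F could still see.
limit = 3
reached = seq[seq.index(VS16) - limit]

print(needed)           # 4
print(f"{reached:X}")   # 1F3FB
```

So a limit of 3 stops at 1F3FB, and 4 is what it takes to reach the
leading 1F468. Running the same function over every line of
emoji-zwj-sequences.txt would give the maximum lookback any rule
anchored this way needs.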