From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Better emoji support
Date: Mon, 20 Sep 2021 22:42:29 +0300
Message-ID: <835yuv11ay.fsf@gnu.org>
To: Robert Pluim
Cc: emacs-devel@gnu.org, kevin.legouguec@gmail.com
In-Reply-To: <87pmt3hwoq.fsf@gmail.com> (message from Robert Pluim on Mon, 20 Sep 2021 21:30:13 +0200)

> From: Robert Pluim
> Cc: kevin.legouguec@gmail.com, emacs-devel@gnu.org
> Date: Mon, 20 Sep 2021 21:30:13 +0200
>
> >>>>> On Mon, 20 Sep 2021 21:54:57 +0300, Eli Zaretskii said:
>
> Eli> for Emoji sequences in composition-function-table should be anchored
> Eli> on the VS-n codepoints (which I think is a good idea regardless).
> >>
> >> Weʼd have to raise the lookback limit for composition-function-table
> >> rules higher than 3 (maybe only to 4).
>
> Eli> Examples? Not that it's a catastrophe.
>
> From emoji-zwj-sequences.txt:
>
> 1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB ; RGI_Emoji_ZWJ_Sequence
> ; couple with heart: man, man, light skin tone #
> E13.1 [1] (👨🏻‍❤️‍👨🏻)
>
> With the current limit you'd get no further than the 1F3FB if you
> anchored at FE0F, and miss the 1F468.

Ah, that's a misunderstanding. I meant what I said only for sequences
that start with a non-emoji character. When the first character is
from the emoji script, we don't need anything special to have the
right font used.
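The lookback arithmetic in the quoted example can be checked with a small Python sketch (Python only for illustration; this is not Emacs code). It assumes, as in the exchange above, that a rule anchored on a character can see only a fixed number of preceding characters, taken here as the current limit of 3. The helper names `parse_codepoints`, `required_lookback` and `max_lookback` are hypothetical, and the data line is the one quoted from emoji-zwj-sequences.txt.

```python
from typing import Iterable, List, Optional

# Data line quoted above from emoji-zwj-sequences.txt (trailing comment
# shortened).
COUPLE_LINE = ("1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB"
               " ; RGI_Emoji_ZWJ_Sequence ; couple with heart")

VS16 = 0xFE0F
LOOKBACK_LIMIT = 3  # the current limit mentioned in this thread


def parse_codepoints(line: str) -> List[int]:
    """Return the hex codepoints in the first ';'-separated field."""
    return [int(tok, 16) for tok in line.split(";", 1)[0].split()]


def required_lookback(seq: List[int], anchor: int) -> Optional[int]:
    """How many codepoints precede the first ANCHOR in SEQ, i.e. how far
    back a rule anchored on ANCHOR must look to see the whole sequence.
    Returns None if ANCHOR does not occur in SEQ."""
    return seq.index(anchor) if anchor in seq else None


def max_lookback(lines: Iterable[str], anchor: int) -> int:
    """Largest required lookback over all data lines containing ANCHOR.
    Blank lines and comment lines starting with '#' are skipped."""
    needs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        need = required_lookback(parse_codepoints(line), anchor)
        if need is not None:
            needs.append(need)
    return max(needs, default=0)


seq = parse_codepoints(COUPLE_LINE)
need = required_lookback(seq, VS16)
print(need)                    # 4
print(need <= LOOKBACK_LIMIT)  # False
# Looking back only 3 characters from FE0F stops at 1F3FB, not 1F468:
print(hex(seq[seq.index(VS16) - LOOKBACK_LIMIT]))  # 0x1f3fb
```

Run over a local copy of the whole file, e.g. `max_lookback(open("emoji-zwj-sequences.txt", encoding="utf-8"), VS16)`, the same helper would report the largest lookback a VS-16-anchored scheme would need.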
> >> I guess it reduces the number of entries in
> >> composition-function-table, but then you end up with a lot of rules
> >> for eg VS-16.
>
> Eli> Why do you think we need to have a lot of such rules? What kind of
> Eli> rules did you think about?
>
> For whatever reason, a lot of the sequences in emoji-zwj-sequences.txt
> contain codepoints with Emoji_Presentation = No, hence theyʼre
> followed by VS-16. As a result, anchoring to VS-16 would produces a
> lot of rules for VS-16.

We don't need a separate rule for every sequence; we can use a regular
expression with character sets. We can even have regexps that match
more than emoji-zwj-sequences.txt specifies, since the font and the
shaping engine will sort that out and return a failure indication for
sequences that the font doesn't support.

> Anyway, we can measure the difference, if any, once we have the base
> implementation and Someone™ implements the VS-16 anchored version (it
> would only be a dozen lines of awk, I think).

Let's cross that bridge when we get to it.
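As an illustration of matching many sequences with one regular expression built from character sets, rather than one rule per line of emoji-zwj-sequences.txt, here is a minimal sketch using Python's `re` module. It is not the Emacs regexp a composition-function-table rule would use, and the character ranges and names (`BASE`, `ELEMENT`, `ZWJ_SEQUENCE`, `from_hex`) are hypothetical, chosen only to cover the example codepoints. It deliberately over-matches, on the assumption stated in the message that the font and shaping engine reject sequences they don't support.

```python
import re

# Deliberately loose, hypothetical character classes: the ranges only
# cover the codepoints in the example and are not a full emoji inventory.
BASE = "[\u2600-\u27BF\U0001F300-\U0001FAFF]"  # symbols and pictographs
SKIN_TONE = "[\U0001F3FB-\U0001F3FF]"          # Fitzpatrick modifiers
VS16 = "\uFE0F"                                # VARIATION SELECTOR-16
ZWJ = "\u200D"                                 # ZERO WIDTH JOINER

# One element: a base character, then an optional skin tone and an
# optional VS-16.
ELEMENT = f"{BASE}{SKIN_TONE}?{VS16}?"

# A ZWJ sequence: two or more elements joined by ZWJ.  This single
# pattern stands in for many per-sequence rules.
ZWJ_SEQUENCE = re.compile(f"{ELEMENT}(?:{ZWJ}{ELEMENT})+")


def from_hex(codes: str) -> str:
    """Turn a space-separated list of hex codepoints into a string."""
    return "".join(chr(int(tok, 16)) for tok in codes.split())


COUPLE = from_hex("1F468 1F3FB 200D 2764 FE0F 200D 1F468 1F3FB")
NONSENSE = from_hex("1F600 200D 1F600")  # grinning face ZWJ grinning face

print(ZWJ_SEQUENCE.fullmatch(COUPLE) is not None)    # True
print(ZWJ_SEQUENCE.fullmatch(NONSENSE) is not None)  # True: over-matches
print(ZWJ_SEQUENCE.fullmatch("abc") is not None)     # False
```

The over-match on the nonsense combination is the point: a permissive pattern keeps the table small, and deciding which sequences actually render is left to the font.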