unofficial mirror of bug-gnu-emacs@gnu.org 
From: Robert Pluim <rpluim@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 63731@debbugs.gnu.org, steven@stebalien.com
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 01 Jun 2023 15:30:18 +0200
Message-ID: <87wn0n9uol.fsf@gmail.com>
In-Reply-To: <83edmvcpzl.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 01 Jun 2023 15:43:26 +0300")

>>>>> On Thu, 01 Jun 2023 15:43:26 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> Cc: 63731@debbugs.gnu.org, steven@stebalien.com
    >> Date: Wed, 31 May 2023 19:18:22 +0300
    >> From: Eli Zaretskii <eliz@gnu.org>
    >> 
    >> > From: Robert Pluim <rpluim@gmail.com>
    >> > Cc: 63731@debbugs.gnu.org,  steven@stebalien.com
    >> > Date: Wed, 31 May 2023 18:11:36 +0200
    >> > 
    >> >     Eli> So there are two issues here: (a) why there's no composition in the
    >> >     Eli> first case, and (b) why "C-u C-x =" says there is when there
    >> >     Eli> isn't.
    >> > 
    >> > OK. I can poke around in gdb if you give me some idea of what I should
    >> > be looking at.
    >> 
    >> I don't really know.  I plan to just step through the code in
    >> composite.c tomorrow, unless you beat me to it.  Once we understand
    >> issue (a), I think we will also understand issue (b).

    Eli> OK, the issue is quite clear even without stepping with a debugger.

    Eli> Bottom line: we cannot support a situation where the same character
    Eli> can be composed by more than one slot in composition-function-table.
    Eli> If there is more than one slot for the same character, one of
    Eli> them will be tried, and the rest will be ignored (not even tried).
    Eli> In particular, if a character CH has a "forward" composition rule that
    Eli> starts with itself, and also has a "backward" rule (one with non-zero
    Eli> look-back parameter) triggered by a different character (which should
    Eli> follow CH), the latter rule will never be tried.

OK, that makes sense. Where would be a good place to document this?
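
To make the limitation concrete, here is a toy model in Emacs Lisp.
It is only a sketch of the behaviour described above, not the code in
composite.c: the function name toy-composed-ranges, the alist layout
(TRIGGER-CHAR REGEXP LOOKBACK), and the "frontier" variable are all
invented for illustration, and the VS-16 rule uses "." instead of the
real "\\c." category test so that it does not depend on the category
table.

```elisp
(defun toy-composed-ranges (text table)
  "Return (START . END) ranges of TEXT that compose under a toy model.
TABLE is an alist of (TRIGGER-CHAR REGEXP LOOKBACK) entries.
This is an illustrative model only, not the logic of composite.c."
  (let ((frontier 0)   ; positions before this are never reconsidered
        (ranges '()))
    (while (< frontier (length text))
      (let ((pos frontier) (next nil))
        ;; Find the first character at or after FRONTIER whose rule
        ;; would start at or after FRONTIER.
        (while (and (not next) (< pos (length text)))
          (let* ((rule (cdr (assq (aref text pos) table)))
                 (start (and rule (- pos (nth 1 rule)))))
            (cond
             ((and rule (>= start frontier))
              ;; Only this one rule is tried for the covered positions.
              (if (eq (string-match (nth 0 rule) text start) start)
                  (progn (push (cons start (match-end 0)) ranges)
                         (setq next (match-end 0)))
                ;; Failure: move on; a later look-back rule can no
                ;; longer reach this position.
                (setq next (1+ pos))))
             (t (setq pos (1+ pos))))))
        (setq frontier (or next (length text)))))
    (nreverse ranges)))

(let* ((text "\N{U+1F44D}\N{U+FE0F}")
       (vs16 '(#xFE0F ".\N{U+FE0F}" 1))   ; simplified "backward" rule
       (fwd  `(#x1F44D ,(regexp-opt '("\N{U+1F44D}\N{U+1F3FB}")) 0)))
  (list (toy-composed-ranges text (list vs16))       ; => ((0 . 2))
        (toy-composed-ranges text (list fwd vs16)))) ; => nil
```

With only the backward rule in the table, the pair composes.  As soon
as #x1F44D has a forward rule of its own that fails to match, the
model never gets back to the rule triggered by #xFE0F.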

    Eli> This is what happens in this case: the character #x1F44D has several
    Eli> rules that start with itself in emoji-zwj.el:

    Eli>   (#x1F44D .
    Eli>   ,(eval-when-compile (regexp-opt
    Eli>    '(
    Eli>    "\N{U+1F44D}\N{U+1F3FB}"
    Eli>    "\N{U+1F44D}\N{U+1F3FC}"
    Eli>    "\N{U+1F44D}\N{U+1F3FD}"
    Eli>    "\N{U+1F44D}\N{U+1F3FE}"
    Eli>    "\N{U+1F44D}\N{U+1F3FF}"
    Eli>    ))))

    Eli> and it also has a "backward" rule:

    Eli>   (set-char-table-range
    Eli>    composition-function-table
    Eli>    #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

    Eli> The latter is triggered by #xFE0F and has a 1-character look-back,
    Eli> which will match #x1F44D, since its category is '.' (it's a "base
    Eli> character").  This latter rule is never tried.  Why?  Because the
    Eli> former rules, anchored at #x1F44D, are tried first (Emacs redisplay
    Eli> examines characters in the order of their buffer positions), and fail
    Eli> to match.  When those rules fail to match, due to how the
    Eli> composition-related functions called by the display engine are
    Eli> factored, we never again consider compositions triggered by a later
    Eli> character which also "cover" #x1F44D: once that position was examined
    Eli> and the attempted composition failed, we move to the next character.
    Eli> IOW, we assume that this first set of composition rules we find for a
    Eli> given character are the only ones that could possibly be relevant for
    Eli> that character.

    Eli> Which means that to have #xFE0F compose correctly with Emoji
    Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.

Thatʼs easy enough:

diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..d1195ebbad8 100644
--- a/admin/unidata/emoji-zwj.awk
+++ b/admin/unidata/emoji-zwj.awk
@@ -106,7 +106,8 @@ END {
 
      for (elt in ch)
     {
-        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
+        entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
+        printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
     }
      print "))"
      print "  (set-char-table-range composition-function-table"

That makes all the VS-16 sequences in
admin/unidata/emoji-variation-sequences.txt display with the emoji
font for me.
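
As a quick sanity check at the regexp level only (it says nothing
about fonts or redisplay), the following sketch compares the
alternatives generated for #x1F44D before and after the patch.  The
variable names before, after and seq are made up for the example; the
string lists are the ones quoted above plus the new FE0F entry.

```elisp
(let ((before (regexp-opt
               '("\N{U+1F44D}\N{U+1F3FB}"
                 "\N{U+1F44D}\N{U+1F3FC}"
                 "\N{U+1F44D}\N{U+1F3FD}"
                 "\N{U+1F44D}\N{U+1F3FE}"
                 "\N{U+1F44D}\N{U+1F3FF}")))
      (after (regexp-opt
              '("\N{U+1F44D}\N{U+1F3FB}"
                "\N{U+1F44D}\N{U+1F3FC}"
                "\N{U+1F44D}\N{U+1F3FD}"
                "\N{U+1F44D}\N{U+1F3FE}"
                "\N{U+1F44D}\N{U+1F3FF}"
                ;; Entry appended by the patched emoji-zwj.awk.
                "\N{U+1F44D}\N{U+FE0F}")))
      (seq "\N{U+1F44D}\N{U+FE0F}"))
  ;; Old regexp does not match the VS-16 sequence, new one matches at 0.
  (list (string-match-p before seq)
        (string-match-p after seq)))
;; => (nil 0)
```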

    Eli> The reason why "C-u C-x =" lies to us saying there's a composition
    Eli> where really there isn't is because descr-text.el uses the
    Eli> find-composition primitive, whose implementation is parallel and
    Eli> separate from that of the display-engine routines, and is structured
    Eli> differently.  So find-composition does succeed in detecting the second
    Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
    Eli> I will think whether this can be fixed, to avoid such false positives,
    Eli> but if we accept that there can be only one set of composition rules
    Eli> for a character, then we basically invoked undefined behavior here,
    Eli> and we got what we deserved.

If find-composition DTRT, could we not use it in the display engine?
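
For anyone who wants to see what find-composition reports at a given
position without going through "C-u C-x =", a minimal helper could
look like the sketch below.  The command name
my-show-composition-at-point is hypothetical.  As explained above, a
non-nil result only shows what find-composition thinks; it does not
prove that the display engine composed the characters.

```elisp
(defun my-show-composition-at-point ()
  "Echo what `find-composition' reports at point.
Hypothetical debugging helper; the result may differ from what
redisplay actually does."
  (interactive)
  ;; Arguments are POS, LIMIT, STRING, DETAIL-P.
  (message "%S" (find-composition (point) nil nil t)))
```

Put point on the emoji or on the following #xFE0F and run it with
M-x my-show-composition-at-point.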

Robert

