From: Eli Zaretskii <eliz@gnu.org>
To: rpluim@gmail.com
Cc: 63731@debbugs.gnu.org, steven@stebalien.com
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 01 Jun 2023 15:43:26 +0300
Message-ID: <83edmvcpzl.fsf@gnu.org>
In-Reply-To: <83ilc8eapd.fsf@gnu.org> (message from Eli Zaretskii on Wed, 31 May 2023 19:18:22 +0300)
> Cc: 63731@debbugs.gnu.org, steven@stebalien.com
> Date: Wed, 31 May 2023 19:18:22 +0300
> From: Eli Zaretskii <eliz@gnu.org>
>
> > From: Robert Pluim <rpluim@gmail.com>
> > Cc: 63731@debbugs.gnu.org, steven@stebalien.com
> > Date: Wed, 31 May 2023 18:11:36 +0200
> >
> > Eli> So there are two issues here: (a) why there's no composition in the
> > Eli> first case, and (b) why does "C-u C-x =" says there is when there
> > Eli> isn't.
> >
> > OK. I can poke around in gdb if you give me some idea of what I should
> > be looking at.
>
> I don't really know. I plan to just step through the code in
> composite.c tomorrow, unless you beat me to it. Once we understand
> issue (a), I think we will also understand issue (b).

OK, the issue is quite clear even without stepping with a debugger.
Bottom line: we cannot support a situation where the same character
can be composed by more than one slot in composition-function-table.
If there is more than one slot for the same character, one of
them will be tried, and the rest will be ignored (not even tried).

In particular, if a character CH has a "forward" composition rule that
starts with itself, and also has a "backward" rule (one with non-zero
look-back parameter) triggered by a different character (which should
follow CH), the latter rule will never be tried.

This is what happens in this case: the character #x1F44D has several
rules that start with itself in emoji-zwj.el:

    (#x1F44D .
     ,(eval-when-compile (regexp-opt
                          '(
                            "\N{U+1F44D}\N{U+1F3FB}"
                            "\N{U+1F44D}\N{U+1F3FC}"
                            "\N{U+1F44D}\N{U+1F3FD}"
                            "\N{U+1F44D}\N{U+1F3FE}"
                            "\N{U+1F44D}\N{U+1F3FF}"
                            ))))

and it also has a "backward" rule:

    (set-char-table-range
     composition-function-table
     #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

The latter is triggered by #xFE0F and has a 1-character look-back,
which will match #x1F44D, since its category is '.' (it's a "base
character"). This latter rule is never tried. Why? Because the
former rules, anchored at #x1F44D, are tried first (Emacs redisplay
examines characters in the order of their buffer positions), and fail
to match. When those rules fail to match, due to how the
composition-related functions called by the display engine are
factored, we never again consider compositions triggered by a later
character which also "cover" #x1F44D: once that position was examined
and the attempted composition failed, we move to the next character.
IOW, we assume that the first set of composition rules we find for a
given character is the only one that could possibly be relevant for
that character.
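To see the two slots involved, something like the following could be evaluated in a running Emacs. This is only an inspection sketch: it assumes both slots hold lists of [PATTERN PREV-CHARS FUNC] rule vectors, and what it prints depends on the Emacs version and on which rules have been loaded.

```elisp
;; Sketch: print the look-back count and shaping function of every rule
;; stored in the slots for U+1F44D and U+FE0F.
(dolist (ch '(#x1F44D #xFE0F))
  (message "U+%04X:" ch)
  (let ((rules (char-table-range composition-function-table ch)))
    ;; A slot may also hold a function instead of a rule list; skip
    ;; anything that is not a rule vector.
    (when (listp rules)
      (dolist (rule rules)
        (when (vectorp rule)
          (message "  look-back=%S func=%S"
                   (aref rule 1) (aref rule 2)))))))
```

A rule with look-back 0 is a "forward" rule anchored at the slot's own character; a non-zero look-back marks a "backward" rule of the kind described above.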

This means that, to have #xFE0F compose correctly with Emoji
codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
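As a rough illustration of that idea, and not the actual change to emoji-zwj.el, the VS-16 sequence could be folded into the forward rule anchored at #x1F44D. The shaping function font-shape-gstring is borrowed from the backward rule quoted above; the function used by the real emoji-zwj.el may differ, and this sketch overwrites whatever the slot already contains.

```elisp
;; Hypothetical sketch: one forward rule for U+1F44D that matches the
;; skin-tone modifier sequences and also U+1F44D followed by U+FE0F.
(set-char-table-range
 composition-function-table
 #x1F44D
 (list (vector (regexp-opt
                '("\N{U+1F44D}\N{U+1F3FB}"
                  "\N{U+1F44D}\N{U+1F3FC}"
                  "\N{U+1F44D}\N{U+1F3FD}"
                  "\N{U+1F44D}\N{U+1F3FE}"
                  "\N{U+1F44D}\N{U+1F3FF}"
                  "\N{U+1F44D}\N{U+FE0F}")) ; the added VS-16 sequence
               0                    ; no look-back: starts at U+1F44D
               'font-shape-gstring)))      ; assumption, see above
```

With the sequence handled by the rule anchored at #x1F44D itself, the display engine no longer needs the look-back rule triggered by #xFE0F for this character.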

The reason "C-u C-x =" lies to us, saying there's a composition
where there really isn't one, is that descr-text.el uses the
find-composition primitive, whose implementation is parallel to and
separate from that of the display-engine routines, and is structured
differently. So find-composition does succeed in detecting the second
rule, the one triggered by #xFE0F, which the display engine ignores.
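The same primitive can be called directly to see what it reports. The command name below is made up for illustration, and the result depends on the fonts in use and on whether auto-composition-mode is enabled, so no particular output is claimed.

```elisp
(defun my-composition-at-point ()
  "Hypothetical helper: show what `find-composition' reports at point."
  (interactive)
  ;; Arguments are POS, LIMIT, STRING and DETAIL-P.
  (message "%S" (find-composition (point) nil nil t)))
```

A non-nil result here says only that find-composition found a matching rule; as explained above, it does not prove that the display engine actually composed the characters.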

I will think about whether this can be fixed, to avoid such false
positives, but if we accept that there can be only one set of
composition rules for a character, then we basically invoked undefined
behavior here, and we got what we deserved.

Thanks.