bug#54562: 28.0.91; Emoji sequence not composed

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#54562: 28.0.91; Emoji sequence not composed
       [not found] <87bkxu8k7t.fsf.ref@yahoo.com>
@ 2022-03-25  9:17 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 10:27   ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-25  9:17 UTC (permalink / raw)
  To: 54562

The following Emoji does not display correctly:

  7⃣️

In other programs, it displays as the digit "7" inside a square, but
inside Emacs it displays as the digit "7", followed by a blue square
and an empty hollow black square.

In GNU Emacs 28.0.91 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo version 1.17.4, Xaw3d scroll bars)
 of 2022-02-08 built on trinity
Repository revision: 82e74e4559b8becd44f3e7ac0134e2baddd69921
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12014000
System Description: Fedora Linux 35 (Workstation Edition)

Configured using:
 'configure --with-x-toolkit=lucid --with-native-compilation
 --cache-file=/tmp/ccache'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11
XAW3D XDBE XIM XPM LUCID ZLIB

Important settings:
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25  9:17 ` bug#54562: 28.0.91; Emoji sequence not composed Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-25 10:27   ` Eli Zaretskii
  2022-03-25 10:32     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-25 10:27 UTC (permalink / raw)
  To: Po Lu; +Cc: 54562

> Date: Fri, 25 Mar 2022 17:17:26 +0800
> From:  Po Lu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> The following Emoji does not display correctly:
> 
>   7⃣️
> 
> In other programs, it displays as the digit "7" inside a square, but
> inside Emacs it displays as the digit "7", followed by the blue square,
> and an empty hollow black square.

I think this means your default font doesn't support the U+20E3
COMBINING ENCLOSING KEYCAP character.  Emacs cannot compose characters
that aren't supported by the font used for the base character.  Here's
what I see in "C-u C-x =" on my system, when Emacs uses a font that
does support it (and where I do see "7" inside a square):

	       position: 148 of 150 (98%), column: 2
	      character: 7 (displayed as 7) (codepoint 55, #o67, #x37)
		charset: ascii (ASCII (ISO646 IRV))
  code point in charset: 0x37
		 script: latin
		 syntax: w 	which means: word
	       category: .:Base, a:ASCII, l:Latin, r:Roman
	       to input: type "C-x 8 RET 37" or "C-x 8 RET DIGIT SEVEN"
	    buffer code: #x37
	      file code: #x37 (encoded by coding system iso-latin-1-dos)
		display: composed to form "7⃣️" (see below)

  Composed with the following character(s) "⃣️" using this font:
    harfbuzz:-outline-Symbola-normal-normal-normal-serif-16-*-*-*-p-*-iso8859-1
  by these glyphs:
    [0 2 55 26 8 0 7 11 0 nil]
    [0 2 8419 2327 0 -10 4 10 4 nil]
    [0 2 65039 3 4 0 1 0 1 [0 0 0]]
  with these character(s):
    ⃣ (#x20e3) COMBINING ENCLOSING KEYCAP
    ️ (#xfe0f) VARIATION SELECTOR-16






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 10:27   ` Eli Zaretskii
@ 2022-03-25 10:32     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 10:54       ` Robert Pluim
  2022-03-25 11:23       ` Eli Zaretskii
  0 siblings, 2 replies; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-25 10:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 54562

Eli Zaretskii <eliz@gnu.org> writes:

> I think this means your default font doesn't support the U+20E3
> COMBINING ENCLOSING KEYCAP character.  Emacs cannot compose characters
> that aren't supported by the font used for the base character.  Here's
> what I see in "C-u C-x =" on my system, when Emacs uses a font that
> does support it (and where I do see "7" inside a square):
>
> 	       position: 148 of 150 (98%), column: 2
> 	      character: 7 (displayed as 7) (codepoint 55, #o67, #x37)
> 		charset: ascii (ASCII (ISO646 IRV))
>   code point in charset: 0x37
> 		 script: latin
> 		 syntax: w 	which means: word
> 	       category: .:Base, a:ASCII, l:Latin, r:Roman
> 	       to input: type "C-x 8 RET 37" or "C-x 8 RET DIGIT SEVEN"
> 	    buffer code: #x37
> 	      file code: #x37 (encoded by coding system iso-latin-1-dos)
> 		display: composed to form "7⃣️" (see below)
>
>   Composed with the following character(s) "⃣️" using this font:
>     harfbuzz:-outline-Symbola-normal-normal-normal-serif-16-*-*-*-p-*-iso8859-1
>   by these glyphs:
>     [0 2 55 26 8 0 7 11 0 nil]
>     [0 2 8419 2327 0 -10 4 10 4 nil]
>     [0 2 65039 3 4 0 1 0 1 [0 0 0]]
>   with these character(s):
>     ⃣ (#x20e3) COMBINING ENCLOSING KEYCAP
>     ️ (#xfe0f) VARIATION SELECTOR-16

Thanks.  But does it really make sense to require that the default font
(on my system, Source Code Pro) support Emoji?  20E3 COMBINING ENCLOSING
KEYCAP displays by itself using Noto Color Emoji.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 10:32     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-25 10:54       ` Robert Pluim
  2022-03-25 11:47         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 11:23       ` Eli Zaretskii
  1 sibling, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-25 10:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Po Lu, 54562

>>>>> On Fri, 25 Mar 2022 18:32:08 +0800, Po Lu via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org> said:

    Bug> Eli Zaretskii <eliz@gnu.org> writes:
    >> I think this means your default font doesn't support the U+20E3
    >> COMBINING ENCLOSING KEYCAP character.  Emacs cannot compose characters
    >> that aren't supported by the font used for the base character.

... except when you use the correct emoji sequence, which in this case
is

U+0037 U+FE0F U+20E3
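
To make the difference between the two orderings concrete, here is a
minimal Python sketch. It is an aside, not part of Emacs, and relies only
on Python's bundled Unicode database; the names REPORTED, EXPECTED and
describe are illustrative. It spells out the code points of the sequence
from the original report (as shown by the "C-u C-x =" output earlier in
this thread) and of the sequence given above.

```python
import unicodedata

# Order found in the original report: digit, keycap, variation selector.
REPORTED = "7\u20e3\ufe0f"
# Order given above: digit, variation selector, keycap.
EXPECTED = "7\ufe0f\u20e3"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in (("reported", REPORTED), ("expected", EXPECTED)):
        print(label, describe(seq))
```

Both strings contain the same three characters; only the position of
U+FE0F differs.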

    Bug> Thanks.  But does it really make sense to require that the default font
    Bug> (on my system, Source Code Pro) support Emoji?  20E3 COMBINING ENCLOSING
    Bug> KEYCAP displays by itself using Noto Color Emoji.

See above

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 10:32     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 10:54       ` Robert Pluim
@ 2022-03-25 11:23       ` Eli Zaretskii
  1 sibling, 0 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-25 11:23 UTC (permalink / raw)
  To: Po Lu; +Cc: 54562

> From: Po Lu <luangruo@yahoo.com>
> Cc: 54562@debbugs.gnu.org
> Date: Fri, 25 Mar 2022 18:32:08 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I think this means your default font doesn't support the U+20E3
> > COMBINING ENCLOSING KEYCAP character.  Emacs cannot compose characters
> > that aren't supported by the font used for the base character.  Here's
> > what I see in "C-u C-x =" on my system, when Emacs uses a font that
> > does support it (and where I do see "7" inside a square):
> >
> > 	       position: 148 of 150 (98%), column: 2
> > 	      character: 7 (displayed as 7) (codepoint 55, #o67, #x37)
> > 		charset: ascii (ASCII (ISO646 IRV))
> >   code point in charset: 0x37
> > 		 script: latin
> > 		 syntax: w 	which means: word
> > 	       category: .:Base, a:ASCII, l:Latin, r:Roman
> > 	       to input: type "C-x 8 RET 37" or "C-x 8 RET DIGIT SEVEN"
> > 	    buffer code: #x37
> > 	      file code: #x37 (encoded by coding system iso-latin-1-dos)
> > 		display: composed to form "7⃣️" (see below)
> >
> >   Composed with the following character(s) "⃣️" using this font:
> >     harfbuzz:-outline-Symbola-normal-normal-normal-serif-16-*-*-*-p-*-iso8859-1
> >   by these glyphs:
> >     [0 2 55 26 8 0 7 11 0 nil]
> >     [0 2 8419 2327 0 -10 4 10 4 nil]
> >     [0 2 65039 3 4 0 1 0 1 [0 0 0]]
> >   with these character(s):
> >     ⃣ (#x20e3) COMBINING ENCLOSING KEYCAP
> >     ️ (#xfe0f) VARIATION SELECTOR-16
> 
> Thanks.  But does it really make sense to require that the default font
> (on my system, Source Code Pro) support Emoji?  20E3 COMBINING ENCLOSING
> KEYCAP displays by itself using Noto Color Emoji.

U+20E3 is not an Emoji character, so how do you want Emacs to know to
use the Emoji font for it?  And "7" is definitely not Emoji.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 10:54       ` Robert Pluim
@ 2022-03-25 11:47         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 12:15           ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-25 11:47 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Eli Zaretskii, 54562

Robert Pluim <rpluim@gmail.com> writes:

> ... except when you use the correct emoji sequence, which in this case
> is
>
> U+0037 U+FE0F U+20E3

Hmm, odd, thanks.  I wonder why other programs display the original
sequence correctly.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 11:47         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-25 12:15           ` Eli Zaretskii
  2022-03-25 12:46             ` Andreas Schwab
  2022-03-25 14:05             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-25 12:15 UTC (permalink / raw)
  To: Po Lu; +Cc: rpluim, 54562

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  54562@debbugs.gnu.org
> Date: Fri, 25 Mar 2022 19:47:21 +0800
> 
> Robert Pluim <rpluim@gmail.com> writes:
> 
> > ... except when you use the correct emoji sequence, which in this case
> > is
> >
> > U+0037 U+FE0F U+20E3
> 
> Hmm, odd, thanks.  I wonder why other programs display the original
> sequence correctly.

Why do you think what they do is "correct"?  AFAIK, we use the Unicode
Standard's definition of Emoji sequences to decide when U+FE0F
warrants an Emoji representation.  Maybe those other applications
default to Emoji representation of every character that can possibly
have such a representation, but in Emacs such a default cannot make
sense.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 12:15           ` Eli Zaretskii
@ 2022-03-25 12:46             ` Andreas Schwab
  2022-03-25 13:05               ` Eli Zaretskii
  2022-03-25 14:05             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 47+ messages in thread
From: Andreas Schwab @ 2022-03-25 12:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Po Lu, rpluim, 54562

On Mar 25 2022, Eli Zaretskii wrote:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  54562@debbugs.gnu.org
>> Date: Fri, 25 Mar 2022 19:47:21 +0800
>> 
>> Robert Pluim <rpluim@gmail.com> writes:
>> 
>> > ... except when you use the correct emoji sequence, which in this case
>> > is
>> >
>> > U+0037 U+FE0F U+20E3
>> 
>> Hmm, odd, thanks.  I wonder why other programs display the original
>> sequence correctly.
>
> Why do you think what they do is "correct"?

If you switch to Symbola as the default font, Emacs is able to
combine 7 U+20E3 U+FE0F.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 12:46             ` Andreas Schwab
@ 2022-03-25 13:05               ` Eli Zaretskii
  2022-03-25 13:14                 ` Andreas Schwab
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-25 13:05 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Po Lu <luangruo@yahoo.com>,  rpluim@gmail.com,  54562@debbugs.gnu.org
> Date: Fri, 25 Mar 2022 13:46:37 +0100
> 
> On Mär 25 2022, Eli Zaretskii wrote:
> 
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: Eli Zaretskii <eliz@gnu.org>,  54562@debbugs.gnu.org
> >> Date: Fri, 25 Mar 2022 19:47:21 +0800
> >> 
> >> Robert Pluim <rpluim@gmail.com> writes:
> >> 
> >> > ... except when you use the correct emoji sequence, which in this case
> >> > is
> >> >
> >> > U+0037 U+FE0F U+20E3
> >> 
> >> Hmm, odd, thanks.  I wonder why other programs display the original
> >> sequence correctly.
> >
> > Why do you think what they do is "correct"?
> 
> If you switch to Symbola as the default font, Emacs is able to
> combine 7 U+20E3 U+FE0F.

By which composition rule?  Isn't that because U+20E3 is a combining
character?






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 13:05               ` Eli Zaretskii
@ 2022-03-25 13:14                 ` Andreas Schwab
  2022-03-25 13:30                   ` Robert Pluim
  2022-03-25 13:44                   ` Eli Zaretskii
  0 siblings, 2 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-25 13:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, rpluim, 54562

On Mar 25 2022, Eli Zaretskii wrote:

> By which composition rule?  Isn't that because U+20E3 is a combining
> character?

Sure.  If Emacs were able to do that even if the default does not
contain U+20E3 that would be ideal.  Or if Emacs were able to combine a
and U+0308 even if the latter is not available in the default font.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 13:14                 ` Andreas Schwab
@ 2022-03-25 13:30                   ` Robert Pluim
  2022-03-25 13:57                     ` Andreas Schwab
  2022-03-25 13:44                   ` Eli Zaretskii
  1 sibling, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-25 13:30 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, 54562

>>>>> On Fri, 25 Mar 2022 14:14:27 +0100, Andreas Schwab <schwab@linux-m68k.org> said:

    Andreas> On Mar 25 2022, Eli Zaretskii wrote:
    >> By which composition rule?  Isn't that because U+20E3 is a combining
    >> character?

    Andreas> Sure.  If Emacs were able to do that even if the default does not
    Andreas> contain U+20E3 that would be ideal.  Or if Emacs were able to combine a
    Andreas> and U+0308 even if the latter is not available in the default font.

For U+20E3 you could try playing with the value of
`auto-composition-emoji-eligible-codepoints'. For U+308, how common is
it to have a font that doesnʼt have a glyph for it?

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 13:14                 ` Andreas Schwab
  2022-03-25 13:30                   ` Robert Pluim
@ 2022-03-25 13:44                   ` Eli Zaretskii
  2022-03-25 14:03                     ` Andreas Schwab
  1 sibling, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-25 13:44 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: luangruo@yahoo.com,  rpluim@gmail.com,  54562@debbugs.gnu.org
> Date: Fri, 25 Mar 2022 14:14:27 +0100
> 
> On Mär 25 2022, Eli Zaretskii wrote:
> 
> > By which composition rule?  Isn't that because U+20E3 is a combining
> > character?
> 
> Sure.  If Emacs were able to do that even if the default does not
> contain U+20E3 that would be ideal.  Or if Emacs were able to combine a
> and U+0308 even if the latter is not avaliable in the default font.

I think Emacs only considers the font of the base character when it
tries to compose?  So even if I do

  (set-fontset-font t #x20e3 '("Symbola" . "iso10646-1") nil 'prepend)

there's no composition between '7' and U+20E3.

(This does work with Emoji sequences, but AFAIR that's because we have
an extra-special hack in composite.c for characters that are in
auto-composition-emoji-eligible-codepoints.)






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 13:30                   ` Robert Pluim
@ 2022-03-25 13:57                     ` Andreas Schwab
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-25 13:57 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, 54562

On Mar 25 2022, Robert Pluim wrote:

> For U+308, how common is it to have a font that doesnʼt have a glyph
> for it?

Rather common, I would think.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 13:44                   ` Eli Zaretskii
@ 2022-03-25 14:03                     ` Andreas Schwab
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-25 14:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, rpluim, 54562

On Mar 25 2022, Eli Zaretskii wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: luangruo@yahoo.com,  rpluim@gmail.com,  54562@debbugs.gnu.org
>> Date: Fri, 25 Mar 2022 14:14:27 +0100
>> 
>> On Mär 25 2022, Eli Zaretskii wrote:
>> 
>> > By which composition rule?  Isn't that because U+20E3 is a combining
>> > character?
>> 
>> Sure.  If Emacs were able to do that even if the default does not
>> contain U+20E3 that would be ideal.  Or if Emacs were able to combine a
>> and U+0308 even if the latter is not avaliable in the default font.
>
> I think Emacs only considers the font of the base character when it
> tries to compose?

I guess that's the difference between Emacs and other display engines.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 12:15           ` Eli Zaretskii
  2022-03-25 12:46             ` Andreas Schwab
@ 2022-03-25 14:05             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-25 14:14               ` Robert Pluim
  1 sibling, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-25 14:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rpluim, 54562

Eli Zaretskii <eliz@gnu.org> writes:

> Why do you think what they do is "correct"?  AFAIK, we use the Unicode
> Standard's definition of Emoji sequences to decide when U+FE0F
> warrants an Emoji representation.  maybe those other applications
> default to Emoji representation of every character that can possibly
> have such a representation, but in Emacs such a default cannot make
> sense.

I don't know whether or not their behavior is correct, but this sequence
is seen in the wild (for example, the Mac OS input methods generate
these sequences), so maybe it is worth supporting.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 14:05             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-25 14:14               ` Robert Pluim
  2022-03-26  1:16                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-25 14:14 UTC (permalink / raw)
  To: Po Lu; +Cc: 54562

>>>>> On Fri, 25 Mar 2022 22:05:32 +0800, Po Lu <luangruo@yahoo.com> said:

    Po> Eli Zaretskii <eliz@gnu.org> writes:
    >> Why do you think what they do is "correct"?  AFAIK, we use the Unicode
    >> Standard's definition of Emoji sequences to decide when U+FE0F
    >> warrants an Emoji representation.  maybe those other applications
    >> default to Emoji representation of every character that can possibly
    >> have such a representation, but in Emacs such a default cannot make
    >> sense.

    Po> I don't know whether or not their behavior is correct, but this sequence
    Po> is seen in the wild (for example, the Mac OS input methods generate
    Po> these sequences), so maybe it is worth supporting.

Iʼve just tested adding U+20E3 to
`auto-composition-emoji-eligible-codepoints', and it seems to work OK.

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-25 14:14               ` Robert Pluim
@ 2022-03-26  1:16                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-26  5:56                   ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-26  1:16 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Eli Zaretskii, 54562

Robert Pluim <rpluim@gmail.com> writes:

> Iʼve just tested adding U+20E3 to
> `auto-composition-emoji-eligible-codepoints', and it seems to work OK.

Works here too.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-26  1:16                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-26  5:56                   ` Eli Zaretskii
  2022-03-26 16:51                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-26  5:56 UTC (permalink / raw)
  To: Po Lu; +Cc: rpluim, 54562

> From: Po Lu <luangruo@yahoo.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  54562@debbugs.gnu.org
> Date: Sat, 26 Mar 2022 09:16:43 +0800
> 
> Robert Pluim <rpluim@gmail.com> writes:
> 
> > Iʼve just tested adding U+20E3 to
> > `auto-composition-emoji-eligible-codepoints', and it seems to work OK.
> 
> Works here too.

That's fine, but what is the conclusion here?  Unicode defines quite a
few more COMBINING ENCLOSING <SOMETHING> codepoints, so IMO either we
do this for all of them (the entire Combining Diacritical Marks for
Symbols block), or none at all.
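
As a rough illustration of how many characters are involved, the
following Python sketch lists the characters in the Combining Diacritical
Marks for Symbols block (U+20D0..U+20FF) whose general category is Me
(enclosing mark). It is an aside that relies only on Python's
`unicodedata` module, so the result reflects the Unicode version bundled
with the Python in use; the function name is illustrative.

```python
import unicodedata

# Combining Diacritical Marks for Symbols block.
BLOCK_START, BLOCK_END = 0x20D0, 0x20FF


def enclosing_marks():
    """Return (code point, name) pairs for category-Me characters in the block."""
    return [
        (cp, unicodedata.name(chr(cp)))
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) == "Me"
    ]


if __name__ == "__main__":
    for cp, name in enclosing_marks():
        print(f"U+{cp:04X} {name}")
```

The remaining assigned characters in the block are non-enclosing
combining marks, so "all of them" can mean either the enclosing marks
only or the whole block.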






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-26  5:56                   ` Eli Zaretskii
@ 2022-03-26 16:51                     ` Lars Ingebrigtsen
  2022-03-27  0:32                       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 47+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-26 16:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Po Lu, rpluim, 54562

Eli Zaretskii <eliz@gnu.org> writes:

> That's fine, but what is the conclusion here?  Unicode defines quite a
> few more COMBINING ENCLOSING <SOMETHING> codepoints, so IMO either we
> do this for all of them (the entire Combining Diacritical Marks for
> Symbols block), or none at all.

If other applications do composition on the entire block, I guess we
should do the same.  I guess it's this range:

20D0..20FF; Combining Diacritical Marks for Symbols

Could somebody do some testing in some other programs and see what they
do?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-26 16:51                     ` Lars Ingebrigtsen
@ 2022-03-27  0:32                       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-27 15:10                         ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-27  0:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, 54562, rpluim

Lars Ingebrigtsen <larsi@gnus.org> writes:

> If other applications does composition on the entire block, I guess we
> should do the same.  I guess it's this range:
>
> 20D0..20FF; Combining Diacritical Marks for Symbols
>
> Could somebody do some testing in some other programs and see what they
> do?

Please tell me what sequences to input in order to test, thanks.







* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-27  0:32                       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-27 15:10                         ` Robert Pluim
  2022-03-28  0:19                           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-27 15:10 UTC (permalink / raw)
  To: Po Lu; +Cc: Lars Ingebrigtsen, 54562

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

>>>>> On Sun, 27 Mar 2022 08:32:55 +0800, Po Lu <luangruo@yahoo.com> said:

    Po> Lars Ingebrigtsen <larsi@gnus.org> writes:
    >> If other applications does composition on the entire block, I guess we
    >> should do the same.  I guess it's this range:
    >>
    >> 20D0..20FF; Combining Diacritical Marks for Symbols
    >>

20e3 is the only one there which is used in an emoji sequence, though.

    >> Could somebody do some testing in some other programs and see what they
    >> do?

    Po> Please tell me what sequences to input in order to test, thanks.

Hereʼs what I tested

20d1 a⃑
20d2 a⃒
20d3 a⃓
20d4 a⃔
20d5 a⃕
20d6 a⃖
20d7 a⃗
20d8 a⃘
20d9 a⃙
20da a⃚
20db a⃛
20dc a⃜
20dd a⃝
20de a⃞
20df a⃟
20e0 a⃠
20e1 a⃡
20e2 a⃢
20e3 a⃣
7⃣
20e4 a⃤
20e5 a⃥
20e6 a⃦
20e7 a⃧
20e8 a⃨
20e9 a⃩
20ea a⃪
20eb a⃫
20ec a⃬
20ed a⃭
20ee a⃮
20ef a⃯
20f0 a⃰
20f1 a⃱
20f2 a⃲
20f3 a⃳
20f4 a⃴
20f5 a⃵
20f6 a⃶
20f7 a⃷
20f8 a⃸
20f9 a⃹
20fa a⃺
20fb a⃻
20fc a⃼
20fd a⃽
20fe a⃾
20ff a⃿

(I think 20f1-20ff are actually codepoint non grata, but I canʼt find
the reference for the moment)

gedit combines some of them, but not others. It does not use the emoji
font for 0037 20e3, though, you need to add fe0f in the middle for
that to happen.

libreoffice combines more, but not the same set as gedit. It does
however render 0037 20e3 with the emoji font.

Screenshots below.

Robert
-- 

[-- Attachment #2: gedit combining.png --]
[-- Type: image/png, Size: 44994 bytes --]

[-- Attachment #3: libreoffice combining.png --]
[-- Type: image/png, Size: 36090 bytes --]


* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-27 15:10                         ` Robert Pluim
@ 2022-03-28  0:19                           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-03-28  7:47                             ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-28  0:19 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Lars Ingebrigtsen, 54562, Eli Zaretskii

Robert Pluim <rpluim@gmail.com> writes:

> (I think 20f1-20ff are actually codepoint non grata, but I canʼt find
> the reference for the moment)
>
> gedit combines some of them, but not others. It does not use the emoji
> font for 0037 20e3, though, you need to add fe0f in the middle for
> that to happen.

For me, Gedit combines everything in that list before 20f1, except for
20e3 and 20dd.

> libreoffice combines more, but not the same set as gedit. It does
> however render 0037 20e3 with the emoji font.

I see the same results with LibreOffice as in your screenshot.

Thanks.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28  0:19                           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-03-28  7:47                             ` Robert Pluim
  2022-03-28 11:51                               ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-28  7:47 UTC (permalink / raw)
  To: Po Lu; +Cc: Lars Ingebrigtsen, 54562

>>>>> On Mon, 28 Mar 2022 08:19:39 +0800, Po Lu via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org> said:

    Po> Robert Pluim <rpluim@gmail.com> writes:
    >> (I think 20f1-20ff are actually codepoint non grata, but I canʼt find
    >> the reference for the moment)
    >> 
    >> gedit combines some of them, but not others. It does not use the emoji
    >> font for 0037 20e3, though, you need to add fe0f in the middle for
    >> that to happen.

    Po> For me, Gedit combines everything in that list before 20f1, except for
    Po> 20e3 and 20dd.

    >> libreoffice combines more, but not the same set as gedit. It does
    >> however render 0037 20e3 with the emoji font.

    Po> I see the same results with LibreOffice as in your screenshot.

OK. So it sounds like we should perhaps look at doing composition for
the codepoints in that block by doing face lookup based on the
combining character rather than the base character. Eli, should we
look at doing that for other combining characters, such as Andreas'
0308?

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28  7:47                             ` Robert Pluim
@ 2022-03-28 11:51                               ` Eli Zaretskii
  2022-03-28 12:46                                 ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 11:51 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  54562@debbugs.gnu.org,  Eli
>  Zaretskii <eliz@gnu.org>
> Date: Mon, 28 Mar 2022 09:47:54 +0200
> 
> OK. So it sounds like we should perhaps look at doing composition for
> the codepoints in that block by doing face lookup based on the
> combining character rather than the base character.

I guess we should try.  It should be optional behavior, because Emacs
never did that, and I cannot predict what that will do to all the
different use cases where we compose text, and thus whether users will
like that in all the cases.  It could, for example, mean that a
particular Latin character with a diacritic will be displayed with a
font that's different from the rest of the Latin text, which some
users might consider worse than seeing just the base character in the
"expected" font.  And that's just the simplest use case.

And I think "based on combining character" is not the correct
definition.  We should allow selection of the font based on the
character that triggered the composition, i.e. the character whose
slot in composition-function-table stores the rule which we are using
to produce the composition.  Like we already do for Emoji.  For
combining characters, the default is that the combining character is
that trigger.  By contrast, today we use the font for the first
character in the composition sequence (NOT the base character, as I
incorrectly wrote earlier, although in practice it is the same for
Latin).

> Eli, should we look at doing that for other combining characters,
> such as Andreas' 0308?

"Look at" in what sense?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 11:51                               ` Eli Zaretskii
@ 2022-03-28 12:46                                 ` Robert Pluim
  2022-03-28 13:12                                   ` Eli Zaretskii
  2022-03-28 13:19                                   ` Andreas Schwab
  0 siblings, 2 replies; 47+ messages in thread
From: Robert Pluim @ 2022-03-28 12:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, larsi, 54562

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

>>>>> On Mon, 28 Mar 2022 14:51:49 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  54562@debbugs.gnu.org,  Eli
    >> Zaretskii <eliz@gnu.org>
    >> Date: Mon, 28 Mar 2022 09:47:54 +0200
    >> 
    >> OK. So it sounds like we should perhaps look at doing composition for
    >> the codepoints in that block by doing face lookup based on the
    >> combining character rather than the base character.

    Eli> I guess we should try.  It should be optional behavior, because Emacs
    Eli> never did that, and I cannot predict what will that do to all the
    Eli> different use cases where we compose text, and thus whether users will
    Eli> like that in all the cases.  It could, for example, mean that a
    Eli> particular Latin character with a diacritic will be displayed with a
    Eli> font that's different from the rest of the Latin text, which some
    Eli> users might consider worse than seeing just the base character in the
    Eli> "expected" font.  And that's just the simplest use case.

Yes, thatʼs exactly what happens with U+0308 here sometimes, see
screenshot below. I had to search a bit to find a font to use as the
default that didnʼt have a glyph for U+0308, so Iʼm not sure how
important this issue is in practice.

    Eli> And I think "based on combining character" is not the correct
    Eli> definition.  We should allow selection of the font based on the
    Eli> character that triggered the composition, i.e. the character whose
    Eli> slot in composition-function-table stores the rule which we are using
    Eli> to produce the composition.  Like we already do for Emoji.  For
    Eli> combining characters, the default is that the combining character is
    Eli> that trigger.  By contrast, today we use the font for the first
    Eli> character in the composition sequence (NOT the base character, as I
    Eli> incorrectly wrote earlier, although in practice it is the same for
    Eli> Latin).

Imprecise wording on my part. It would indeed be the triggering
character, as with emoji.

    >> Eli, should we look at doing that for other combining characters,
    >> such as Andreas' 0308?

    Eli> "Look at" in what sense?

'consider'

Rough patch attached. It does U+20E3, U+0308, and U+20D0..U+20FF. It
works kind of ok, but U+006F U+0308 suffers from the font problem you
were worried about. With Bitstream Vera Mono, the composed glyph ends
up being from Latin Modern Roman, which looks very different.

The composed glyphs for U+20D0..U+20FF look pretty bad in all the
fonts Iʼve tried so far: Unifont, FreeSans, Free Mono, Menlo,
Bitstream Vera Mono. Does anyone have an idea of a good font for
those?

Robert
-- 

[-- Attachment #2: 0308 font difference.png --]
[-- Type: image/png, Size: 2088 bytes --]

[-- Attachment #3: Type: text/plain, Size: 1603 bytes --]

diff --git i/admin/unidata/emoji-zwj.awk w/admin/unidata/emoji-zwj.awk
index 3d605d5d64..331095d56f 100644
--- i/admin/unidata/emoji-zwj.awk
+++ w/admin/unidata/emoji-zwj.awk
@@ -69,6 +69,7 @@ END {
      # emoji sequences.  We have code in font.c:font_range that will
      # try to display them with the emoji font anyway.
 
+     trigger_codepoints[0] = "20E3"
      trigger_codepoints[1] = "261D"
      trigger_codepoints[2] = "26F9"
      trigger_codepoints[3] = "270C"
diff --git i/src/font.c w/src/font.c
index 7e0219181c..265bec6ce5 100644
--- i/src/font.c
+++ w/src/font.c
@@ -3937,6 +3937,14 @@ codepoint_is_emoji_eligible (int ch)
   return false;
 }
 
+static bool
+codepoint_is_combining_lookup_eligible (int ch)
+{
+  if ((0x20D0 <= ch && ch <= 0x20FF) || ch == 0x308)
+    return true;
+  return false;
+}
+
 /* Check how many characters after character/byte position POS/POS_BYTE
    (at most to *LIMIT) can be displayed by the same font in the window W.
    FACE, if non-NULL, is the face selected for the character at POS.
@@ -3996,6 +4004,13 @@ font_range (ptrdiff_t pos, ptrdiff_t pos_byte, ptrdiff_t *limit,
 	    val = AREF (val, 0);
 	  font_object = font_for_char (face, XFIXNAT (val), pos, string);
 	}
+    } else if (codepoint_is_combining_lookup_eligible (ch))
+  /* If the triggering codepoint is a combining character, use the
+     font of that character rather than the font of the base
+     character, since that increases the chances of composition
+     working.  */
+    {
+      font_object = font_for_char (face, ch, pos, string);
     }
 
   while (pos < *limit)

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 12:46                                 ` Robert Pluim
@ 2022-03-28 13:12                                   ` Eli Zaretskii
  2022-03-28 14:59                                     ` Robert Pluim
  2022-03-28 13:19                                   ` Andreas Schwab
  1 sibling, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 13:12 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
> Date: Mon, 28 Mar 2022 14:46:09 +0200
> 
>     Eli> I guess we should try.  It should be optional behavior, because Emacs
>     Eli> never did that, and I cannot predict what will that do to all the
>     Eli> different use cases where we compose text, and thus whether users will
>     Eli> like that in all the cases.  It could, for example, mean that a
>     Eli> particular Latin character with a diacritic will be displayed with a
>     Eli> font that's different from the rest of the Latin text, which some
>     Eli> users might consider worse than seeing just the base character in the
>     Eli> "expected" font.  And that's just the simplest use case.
> 
> Yes, thatʼs exactly what happens with U+0308 here sometimes, see
> screenshot below. I had to search a bit to find a font to use as the
> default that didnʼt have a glyph for U+0308, so Iʼm not sure how
> important this issue is in practice.

I wasn't talking specifically about U+0308, I was talking about
combining diacritics in general.  Some newer ones could be missing
from fonts that otherwise cover Latin character sets.

>     Eli> "Look at" in what sense?
> 
> 'consider'
> 
> Rough patch attached. It does U+20E3, U+0308, and U+20D0..U+20FF. It
> works kind of ok, but U+006F U+0308 suffers from the font problem you
> were worried about. With Bitstream Vera Mono, the composed glyph ends
> up being from Latin Modern Roman, which looks very different.
> 
> The composed glyphs for U+20D0..U+20FF look pretty bad in all the
> fonts Iʼve tried so far: Unifont, FreeSans, Free Mono, Menlo,
> Bitstream Vera Mono. Does anyone have an idea of a good font for
> those?

I'll let people comment on whether this is worth an optional
behavior.

> +static bool
> +codepoint_is_combining_lookup_eligible (int ch)
> +{
> +  if ((0x20D0 <= ch && ch <= 0x20FF) || ch == 0x308)
> +    return true;
> +  return false;
> +}

Any reason not to use the Unicode category here?  Or do we want to
support only specific characters (in which case U+0308 is still not
the only one)?





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 12:46                                 ` Robert Pluim
  2022-03-28 13:12                                   ` Eli Zaretskii
@ 2022-03-28 13:19                                   ` Andreas Schwab
  2022-03-28 15:01                                     ` Robert Pluim
  1 sibling, 1 reply; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 13:19 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, 54562, larsi

On Mär 28 2022, Robert Pluim wrote:

> Yes, thatʼs exactly what happens with U+0308 here sometimes, see
> screenshot below. I had to search a bit to find a font to use as the
> default that didnʼt have a glyph for U+0308, so Iʼm not sure how
> important this issue is in practice.

It's quite common in NFKD encoded texts.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 13:12                                   ` Eli Zaretskii
@ 2022-03-28 14:59                                     ` Robert Pluim
  2022-03-28 16:07                                       ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-28 14:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, larsi, 54562

>>>>> On Mon, 28 Mar 2022 16:12:06 +0300, Eli Zaretskii <eliz@gnu.org> said:

    Eli> I wasn't talking specifically about U+0308, I was talking about
    Eli> combining diacritics in general.  Some newer ones could be missing
    Eli> from fonts that otherwise cover Latin character sets.

Andreas indicates that missing glyphs are an issue. I think a user
option (default 'off') would be in order.

    >> +static bool
    >> +codepoint_is_combining_lookup_eligible (int ch)
    >> +{
    >> +  if ((0x20D0 <= ch && ch <= 0x20FF) || ch == 0x308)
    >> +    return true;
    >> +  return false;
    >> +}

    Eli> Any reason not to use the Unicode category here?  Or do we want to
    Eli> support only specific characters (in which case U+0308 is still not
    Eli> the only one)?

You'd want to apply this to everything in Mn? Thatʼs a lot of
codepoints. Or did you mean Me? Or anything in Mn thatʼs Latin? The
possibilities are endless :-)

Robert
-- 





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 13:19                                   ` Andreas Schwab
@ 2022-03-28 15:01                                     ` Robert Pluim
  2022-03-28 15:35                                       ` Andreas Schwab
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-28 15:01 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, larsi, 54562

>>>>> On Mon, 28 Mar 2022 15:19:47 +0200, Andreas Schwab <schwab@linux-m68k.org> said:

    Andreas> On Mär 28 2022, Robert Pluim wrote:
    >> Yes, thatʼs exactly what happens with U+0308 here sometimes, see
    >> screenshot below. I had to search a bit to find a font to use as the
    >> default that didnʼt have a glyph for U+0308, so Iʼm not sure how
    >> important this issue is in practice.

    Andreas> It's quite common in NFKD encoded texts.

That may be true, but the issue is how common it is to have a font
that canʼt compose it, not how often the non-precomposed form appears.
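
For reference, a minimal sketch of the two forms being compared, using
"ö" as the example: the precomposed character is U+00F6, while the
decomposed form is U+006F followed by U+0308.  The helper names
utf8_encode and utf8_encode_seq are made up for this illustration and
are not Emacs functions.

```c
#include <stddef.h>

/* Encode the code point CP as UTF-8 into OUT and return the number of
   bytes written (1 to 4).  CP is assumed to be a valid Unicode scalar
   value; no validation is done in this sketch.  */
static size_t
utf8_encode (unsigned int cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Encode the N code points in CPS into OUT, which must have room for
   4 * N bytes.  Return the total number of bytes written.  */
static size_t
utf8_encode_seq (const unsigned int *cps, size_t n, unsigned char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    len += utf8_encode (cps[i], out + len);
  return len;
}
```

Encoding { 0x00F6 } gives the two bytes C3 B6, and encoding
{ 0x006F, 0x0308 } gives the three bytes 6F CC 88.  Only the second
form asks the font (or the shaping engine) to position a separate
combining glyph on the base character.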

Robert
-- 





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 15:01                                     ` Robert Pluim
@ 2022-03-28 15:35                                       ` Andreas Schwab
  2022-03-28 16:11                                         ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 15:35 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

On Mär 28 2022, Robert Pluim wrote:

> That may be true, but the issue is how common it is to have a font
> that canʼt compose it

How do I search for fonts containing a specific character?

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 14:59                                     ` Robert Pluim
@ 2022-03-28 16:07                                       ` Eli Zaretskii
  2022-03-29 10:45                                         ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 16:07 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
> Date: Mon, 28 Mar 2022 16:59:16 +0200
> 
>     >> +static bool
>     >> +codepoint_is_combining_lookup_eligible (int ch)
>     >> +{
>     >> +  if ((0x20D0 <= ch && ch <= 0x20FF) || ch == 0x308)
>     >> +    return true;
>     >> +  return false;
>     >> +}
> 
>     Eli> Any reason not to use the Unicode category here?  Or do we want to
>     Eli> support only specific characters (in which case U+0308 is still not
>     Eli> the only one)?
> 
> You'd want to apply this to everything in Mn? Thatʼs a lot of
> codepoints. Or did you mean Me? Or anything in Mn thatʼs latin? The
> possibilities are endless :-)

I thought about any Mn character whose canonical-combining-class
property is 200 and above.  The COMBINING ENCLOSING <SOMETHING> stuff
will need to be added to that, of course.  And we could have that
option have multiple possible values, not just on/off...

Btw, for sequences that include a base character and 2 or more
diacritics, selecting a font that supports the first diacritic (the
one which triggers the composition) might not be enough, since the
rest of the diacritics could be absent from that font.  Instead, we'd
need something like "find the font for each one of them and then use
the one which supports the largest subset of them".
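
A rough sketch of what "use the font which supports the largest subset"
could mean, written against a toy data structure rather than real Emacs
font objects.  The type toy_font and the functions toy_font_has_char
and best_font_for_marks are hypothetical names for this illustration.
The marks are assumed to be distinct, and ties go to the earliest font
in the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a font: a name plus the list of combining
   characters it has glyphs for.  */
struct toy_font
{
  const char *name;
  const int *covered;
  size_t n_covered;
};

/* Return true if FONT has a glyph for CH.  */
static bool
toy_font_has_char (const struct toy_font *font, int ch)
{
  for (size_t i = 0; i < font->n_covered; i++)
    if (font->covered[i] == ch)
      return true;
  return false;
}

/* Return the index in FONTS of the font that supports the largest
   number of the characters in MARKS, or -1 if no font supports any of
   them.  When several fonts tie, the earliest one is returned.  */
static int
best_font_for_marks (const struct toy_font *fonts, size_t n_fonts,
                     const int *marks, size_t n_marks)
{
  int best = -1;
  size_t best_count = 0;

  for (size_t f = 0; f < n_fonts; f++)
    {
      size_t count = 0;
      for (size_t m = 0; m < n_marks; m++)
        if (toy_font_has_char (&fonts[f], marks[m]))
          count++;
      /* Strictly greater, so that an earlier font wins a tie.  */
      if (count > best_count)
        {
          best_count = count;
          best = (int) f;
        }
    }
  return best;
}
```

A caller would still have to decide what to do when the best font
covers only some of the marks, since the composition is then unlikely
to come out right.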

^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 15:35                                       ` Andreas Schwab
@ 2022-03-28 16:11                                         ` Eli Zaretskii
  2022-03-28 16:20                                           ` Andreas Schwab
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 16:11 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562, larsi

> Resent-From: Andreas Schwab <schwab@linux-m68k.org>
> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
> Resent-CC: bug-gnu-emacs@gnu.org
> Resent-Sender: help-debbugs@gnu.org
> From: Andreas Schwab <schwab@linux-m68k.org>
> Date: Mon, 28 Mar 2022 17:35:10 +0200
> Cc: luangruo@yahoo.com, larsi@gnus.org, 54562@debbugs.gnu.org
> 
> On Mär 28 2022, Robert Pluim wrote:
> 
> > That may be true, but the issue is how common it is to have a font
> > that canʼt compose it
> 
> How do I search for fonts containing a specific character?

With fc or with Emacs?





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 16:11                                         ` Eli Zaretskii
@ 2022-03-28 16:20                                           ` Andreas Schwab
  2022-03-28 16:26                                             ` Robert Pluim
  2022-03-28 17:10                                             ` Eli Zaretskii
  0 siblings, 2 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 16:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, rpluim, 54562, larsi

On Mär 28 2022, Eli Zaretskii wrote:

>> Resent-From: Andreas Schwab <schwab@linux-m68k.org>
>> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
>> Resent-CC: bug-gnu-emacs@gnu.org
>> Resent-Sender: help-debbugs@gnu.org
>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Date: Mon, 28 Mar 2022 17:35:10 +0200
>> Cc: luangruo@yahoo.com, larsi@gnus.org, 54562@debbugs.gnu.org
>> 
>> On Mär 28 2022, Robert Pluim wrote:
>> 
>> > That may be true, but the issue is how common it is to have a font
>> > that canʼt compose it
>> 
>> How do I search for fonts containing a specific character?
>
> With fc or with Emacs?

Whatever works.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 16:20                                           ` Andreas Schwab
@ 2022-03-28 16:26                                             ` Robert Pluim
  2022-03-28 16:41                                               ` Andreas Schwab
  2022-03-28 17:10                                             ` Eli Zaretskii
  1 sibling, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-28 16:26 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, 54562, larsi

>>>>> On Mon, 28 Mar 2022 18:20:31 +0200, Andreas Schwab <schwab@linux-m68k.org> said:

    Andreas> On Mär 28 2022, Eli Zaretskii wrote:
    >>> Resent-From: Andreas Schwab <schwab@linux-m68k.org>
    >>> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
    >>> Resent-CC: bug-gnu-emacs@gnu.org
    >>> Resent-Sender: help-debbugs@gnu.org
    >>> From: Andreas Schwab <schwab@linux-m68k.org>
    >>> Date: Mon, 28 Mar 2022 17:35:10 +0200
    >>> Cc: luangruo@yahoo.com, larsi@gnus.org, 54562@debbugs.gnu.org
    >>> 
    >>> On Mär 28 2022, Robert Pluim wrote:
    >>> 
    >>> > That may be true, but the issue is how common it is to have a font
    >>> > that canʼt compose it
    >>> 
    >>> How do I search for fonts containing a specific character?
    >> 
    >> With fc or with Emacs?

    Andreas> Whatever works.

fc-match --format='%{charset}\n' Menlo

will list the codepoints that Menlo supports.

Robert
-- 





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 16:26                                             ` Robert Pluim
@ 2022-03-28 16:41                                               ` Andreas Schwab
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 16:41 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, 54562, larsi

On Mär 28 2022, Robert Pluim wrote:

> fc-match --format='%{charset}\n' Menlo
>
> will list the codepoints that Menlo supports.

That's the wrong way round.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 16:20                                           ` Andreas Schwab
  2022-03-28 16:26                                             ` Robert Pluim
@ 2022-03-28 17:10                                             ` Eli Zaretskii
  2022-03-28 17:14                                               ` Eli Zaretskii
  2022-03-28 17:39                                               ` Andreas Schwab
  1 sibling, 2 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 17:10 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562, larsi

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: rpluim@gmail.com,  luangruo@yahoo.com,  larsi@gnus.org,
>   54562@debbugs.gnu.org
> Date: Mon, 28 Mar 2022 18:20:31 +0200
> 
> On Mär 28 2022, Eli Zaretskii wrote:
> 
> >> How do I search for fonts containing a specific character?
> >
> > With fc or with Emacs?
> 
> Whatever works.

Try this (only very lightly tested):

  (defun fonts-supporting-char (test-char)
    (let* ((inhibit-compacting-font-caches t)
	   (frame (selected-frame))
	   (fnt-list
	    (delete-dups
	     (x-list-fonts "-*-*-medium-r-normal-*-*-*-*-*-*-iso10646-1"
			   'default frame)))
	   fspec fonts-for-char ffont font-obj glyphs)
      (dolist (fnt fnt-list)
	(setq fspec (ignore-errors (font-spec :name fnt)))
	(if fspec
	    (setq ffont (find-font fspec frame)))
	(when ffont
	  (setq font-obj (open-font ffont nil frame))
	  (when font-obj
	    (setq glyphs (font-get-glyphs font-obj 0 1 (string test-char)))
	    (if (vectorp glyphs)
		(push (symbil- name (font-get font-obj :family))
		      fonts-for-char)))))
      (clear-font-cache)
      fonts-for-char))

Invoke like this:

  (fonts-supporting-char #x308) => [... long list of font names ...]






^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 17:10                                             ` Eli Zaretskii
@ 2022-03-28 17:14                                               ` Eli Zaretskii
  2022-03-28 17:39                                               ` Andreas Schwab
  1 sibling, 0 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 17:14 UTC (permalink / raw)
  To: schwab; +Cc: luangruo, rpluim, 54562, larsi

> Date: Mon, 28 Mar 2022 20:10:22 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: luangruo@yahoo.com, rpluim@gmail.com, 54562@debbugs.gnu.org, larsi@gnus.org
> 
> 		(push (symbil- name (font-get font-obj :family))
                       ^^^^^^^^^^^^
This should be symbol-name, of course.  Sorry for my naughty fingers.





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 17:10                                             ` Eli Zaretskii
  2022-03-28 17:14                                               ` Eli Zaretskii
@ 2022-03-28 17:39                                               ` Andreas Schwab
  2022-03-28 18:12                                                 ` Eli Zaretskii
  2022-03-28 18:15                                                 ` Eli Zaretskii
  1 sibling, 2 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 17:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, rpluim, 54562, larsi

On Mär 28 2022, Eli Zaretskii wrote:

> Try this (only very lightly tested):
>
>   (defun fonts-supporting-char (test-char)

Doesn't work.  It claims support for a lot of fonts that don't contain
that character.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 17:39                                               ` Andreas Schwab
@ 2022-03-28 18:12                                                 ` Eli Zaretskii
  2022-03-28 18:14                                                   ` Andreas Schwab
  2022-03-28 18:15                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 18:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562, larsi

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: luangruo@yahoo.com,  rpluim@gmail.com,  54562@debbugs.gnu.org,
>   larsi@gnus.org
> Date: Mon, 28 Mar 2022 19:39:43 +0200
> 
> On Mär 28 2022, Eli Zaretskii wrote:
> 
> > Try this (only very lightly tested):
> >
> >   (defun fonts-supporting-char (test-char)
> 
> Doesn't work.  It claims support for a lot of fonts that don't contain
> that character.

For base characters also, or only for combining characters?





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 18:12                                                 ` Eli Zaretskii
@ 2022-03-28 18:14                                                   ` Andreas Schwab
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Schwab @ 2022-03-28 18:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, rpluim, 54562, larsi

On Mär 28 2022, Eli Zaretskii wrote:

> For base characters also, or only for combining characters?

I have only tested #x308.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 17:39                                               ` Andreas Schwab
  2022-03-28 18:12                                                 ` Eli Zaretskii
@ 2022-03-28 18:15                                                 ` Eli Zaretskii
  1 sibling, 0 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-28 18:15 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: luangruo, rpluim, 54562, larsi

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: luangruo@yahoo.com,  rpluim@gmail.com,  54562@debbugs.gnu.org,
>   larsi@gnus.org
> Date: Mon, 28 Mar 2022 19:39:43 +0200
> 
> On Mär 28 2022, Eli Zaretskii wrote:
> 
> > Try this (only very lightly tested):
> >
> >   (defun fonts-supporting-char (test-char)
> 
> Doesn't work.  It claims support for a lot of fonts that don't contain
> that character.

Try this fixed version instead:

  (defun fonts-supporting-char (test-char)
    (let* ((inhibit-compacting-font-caches t)
	   (test-str (string test-char))
	   (frame (selected-frame))
	   (fnt-list
	    (delete-dups
	     (x-list-fonts "-*-*-medium-r-normal-*-*-*-*-*-*-iso10646-1"
			   'default frame)))
	   fspec fonts-for-char ffont font-obj glyphs)
      (dolist (fnt fnt-list)
	(setq fspec (ignore-errors (font-spec :name fnt)))
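	;; Reset ffont on every iteration, so that a failed font-spec
	;; doesn't reuse the font entity from the previous iteration.
	(setq ffont (and fspec (find-font fspec frame)))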
	(when ffont
	  (setq font-obj (open-font ffont nil frame))
	  (when font-obj
	    (setq glyphs (font-get-glyphs font-obj 0 1 test-str))
	    (if (and (vectorp glyphs) (aref glyphs 0))
		(push (symbol-name (font-get font-obj :family))
		      fonts-for-char)))))
      (clear-font-cache)
      fonts-for-char))





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-28 16:07                                       ` Eli Zaretskii
@ 2022-03-29 10:45                                         ` Robert Pluim
  2022-03-29 11:44                                           ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-29 10:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, larsi, 54562

>>>>> On Mon, 28 Mar 2022 19:07:53 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
    >> Date: Mon, 28 Mar 2022 16:59:16 +0200
    >> 
    >> >> +static bool
    >> >> +codepoint_is_combining_lookup_eligible (int ch)
    >> >> +{
    >> >> +  if ((0x20D0 <= ch && ch <= 0x20FF) || ch == 0x308)
    >> >> +    return true;
    >> >> +  return false;
    >> >> +}
    >> 
    Eli> Any reason not to use the Unicode category here?  Or do we want to
    Eli> support only specific characters (in which case U+0308 is still not
    Eli> the only one)?
    >> 
    >> You'd want to apply this to everything in Mn? Thatʼs a lot of
    >> codepoints. Or did you mean Me? Or anything in Mn thatʼs latin? The
    >> possibilities are endless :-)

    Eli> I thought about any Mn character whose canonical-combining-class
    Eli> property is 200 and above.  The COMBINING ENCLOSING <SOMETHING> stuff
    Eli> will need to be added to that, of course.  And we could have that
    Eli> option have multiple possible values, not just on/off...

OK. Would Me be ok for you, or would you specifically want only the
codepoints from the "Combining Diacritical Marks for Symbols" block?

I guess you'd want options like:

'all => combining-class + enclosing
'enclosing
'combining-class
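
To make the three values concrete, here is a minimal sketch of how such
an option could drive the eligibility test.  Everything in it is an
assumption for illustration: the enum names, the function
combining_lookup_eligible, and the small property table, which stands
in for a real lookup of the general category and the canonical
combining class.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option values, mirroring the list above.  */
enum combining_lookup_mode
{
  LOOKUP_NONE,            /* option off: current behavior */
  LOOKUP_ENCLOSING,       /* general category Me only */
  LOOKUP_COMBINING_CLASS, /* Mn with canonical combining class >= 200 */
  LOOKUP_ALL              /* both of the above */
};

enum toy_category { CAT_MN, CAT_ME };

/* Tiny excerpt of character properties, for illustration only.  Real
   code would consult the full Unicode data instead.  */
struct char_props
{
  int ch;
  enum toy_category category;
  int ccc;                /* canonical combining class */
};

static const struct char_props sample_props[] = {
  { 0x0308, CAT_MN, 230 },  /* COMBINING DIAERESIS */
  { 0x0327, CAT_MN, 202 },  /* COMBINING CEDILLA */
  { 0x05B0, CAT_MN, 10 },   /* HEBREW POINT SHEVA */
  { 0x20D2, CAT_MN, 1 },    /* COMBINING LONG VERTICAL LINE OVERLAY */
  { 0x20DD, CAT_ME, 0 },    /* COMBINING ENCLOSING CIRCLE */
  { 0x20E3, CAT_ME, 0 },    /* COMBINING ENCLOSING KEYCAP */
};

/* Return true if the font for a composition triggered by CH should be
   looked up using CH itself, given the option value MODE.  Characters
   missing from the sample table are never eligible.  */
static bool
combining_lookup_eligible (int ch, enum combining_lookup_mode mode)
{
  size_t n = sizeof sample_props / sizeof sample_props[0];

  for (size_t i = 0; i < n; i++)
    {
      const struct char_props *p = &sample_props[i];
      if (p->ch != ch)
        continue;

      bool enclosing = p->category == CAT_ME;
      bool high_ccc = p->category == CAT_MN && p->ccc >= 200;

      switch (mode)
        {
        case LOOKUP_ENCLOSING:
          return enclosing;
        case LOOKUP_COMBINING_CLASS:
          return high_ccc;
        case LOOKUP_ALL:
          return enclosing || high_ccc;
        default:
          return false;
        }
    }
  return false;
}
```

With this table, U+20E3 is eligible under 'enclosing and 'all, U+0308
under 'combining-class and 'all, and U+20D2 (class 1) under none of
them.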

(did we want to cover the 'number followed by U+20E3 => emoji' case with
an option too?)

    Eli> Btw, for sequences that include a base character and 2 or more
    Eli> diacritics, selecting a font that supports the first diacritic (the
    Eli> one which triggers the composition) might not be enough, since the
    Eli> rest of the diacritics could be absent from that font.  Instead, we'd
    Eli> need something like "find the font for each one of them and then use
    Eli> the one which supports the largest subset of them".

font_range currently only has access to the first diacritic, so that
would be a bigger change. And that subset had better have the same
size as the number of unique diacritics, otherwise itʼs unlikely to
work.

Robert
-- 





^ permalink raw reply	[flat|nested] 47+ messages in thread

* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-29 10:45                                         ` Robert Pluim
@ 2022-03-29 11:44                                           ` Eli Zaretskii
  2022-03-29 14:50                                             ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-29 11:44 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
> Date: Tue, 29 Mar 2022 12:45:44 +0200
> 
>     Eli> I thought about any Mn character whose canonical-combining-class
>     Eli> property is 200 and above.  The COMBINING ENCLOSING <SOMETHING> stuff
>     Eli> will need to be added to that, of course.  And we could have that
>     Eli> option have multiple possible values, not just on/off...
> 
> OK. Would Me be ok for you, or would you specifically want only the
> codepoints from the "Combining Diacritical Marks for Symbols" block?

Using Me is fine with me.

> I guess you'd want options like:
> 
> 'all => combining-class + enclosing
> 'enclosing
> 'combining-class
> 
> (did we want to cover the 'number followed U+20E3 => emoji' case with
> an option too?)

That's a separate issue, IMO, and it can be handled via
auto-composition-emoji-eligible-codepoints, I think?  We could even
tell users to do that by themselves.

> 
>     Eli> Btw, for sequences that include a base character and 2 or more
>     Eli> diacritics, selecting a font that supports the first diacritic (the
>     Eli> one which triggers the composition) might not be enough, since the
>     Eli> rest of the diacritics could be absent from that font.  Instead, we'd
>     Eli> need something like "find the font for each one of them and then use
>     Eli> the one which supports the largest subset of them".
> 
> font_range currently only has access to the first diacritic, so that
> would be a bigger change. And that subset had better have the same
> size as the number of unique diacritics, otherwise itʼs unlikely to
> work.

We could perhaps avoid the complexity by rewriting the composition
rule for diacritics.  Instead of "\\c.\\c^+" with 1-character
look-back, we could have several rules:

   "\\c.\\c^\\c^\\c^\\c^" with 4-character look-back
   "\\c.\\c^\\c^\\c^+"    with 3-character look-back
   "\\c.\\c^\\c^+"        with 2-character look-back
   "\\c.\\c^+"            with 1-character look-back

(in that order).  I didn't test this, but if it works, maybe it could
solve the problem without any deep changes on the C level.






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-29 11:44                                           ` Eli Zaretskii
@ 2022-03-29 14:50                                             ` Robert Pluim
  2022-03-29 15:42                                               ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-29 14:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, larsi, 54562

>>>>> On Tue, 29 Mar 2022 14:44:47 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
    >> Date: Tue, 29 Mar 2022 12:45:44 +0200
    >> 
    Eli> I thought about any Mn character whose canonical-combining-class
    Eli> property is 200 and above.  The COMBINING ENCLOSING <SOMETHING> stuff
    Eli> will need to be added to that, of course.  And we could have that
    Eli> option have multiple possible values, not just on/off...
    >> 
    >> OK. Would Me be ok for you, or would you specifically want only the
    >> codepoints from the "Combining Diacritical Marks for Symbols" block?

    Eli> Using Me is fine with me.

OK. There are probably subtleties surrounding things like U+20D2 that
I need to read up on (or we say "overlays are deprecated, letʼs ignore
them").

    >> I guess you'd want options like:
    >> 
    >> 'all => combining-class + enclosing
    >> 'enclosing
    >> 'combining-class
    >> 
    >> (did we want to cover the 'number followed U+20E3 => emoji' case with
    >> an option too?)

    Eli> That's a separate issue, IMO, and it can be handled via
    Eli> auto-composition-emoji-eligible-codepoints, I think?  We could even
    Eli> tell users to do that by themselves.

We could, although my purist side doesnʼt want to do it, since the
standard exists for a reason, dammit.

    Eli> We could perhaps avoid the complexity by rewriting the composition
    Eli> rule for diacritics.  Instead of "\\c.\\c^+" with 1-character
    Eli> look-back, we could have several rules:

    Eli>    "\\c.\\c^\\c^\\c^\\c^" with 4-character look-back
    Eli>    "\\c.\\c^\\c^\\c^+"    with 3-character look-back
    Eli>    "\\c.\\c^\\c^+"        with 2-character look-back
    Eli>    "\\c.\\c^+"            with 1-character look-back

    Eli> (in that order).  I didn't test this, but if it works, maybe it could
    Eli> solve the problem without any deep changes on the C level.

That might work. What would the fallback look like? Suppose we have 4
diacritics, 3 of which are covered by the same font, and one by a
different one. Would you prefer to attempt to use the font of 3 of
them, or would you prefer to fall back to the font of the base
character? (Iʼm not sure which would give better results in practice,
they might both fail)

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-29 14:50                                             ` Robert Pluim
@ 2022-03-29 15:42                                               ` Eli Zaretskii
  2022-03-29 15:59                                                 ` Robert Pluim
  0 siblings, 1 reply; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-29 15:42 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
> Date: Tue, 29 Mar 2022 16:50:10 +0200
> 
>     Eli> We could perhaps avoid the complexity by rewriting the composition
>     Eli> rule for diacritics.  Instead of "\\c.\\c^+" with 1-character
>     Eli> look-back, we could have several rules:
> 
>     Eli>    "\\c.\\c^\\c^\\c^\\c^" with 4-character look-back
>     Eli>    "\\c.\\c^\\c^\\c^+"    with 3-character look-back
>     Eli>    "\\c.\\c^\\c^+"        with 2-character look-back
>     Eli>    "\\c.\\c^+"            with 1-character look-back
> 
>     Eli> (in that order).  I didn't test this, but if it works, maybe it could
>     Eli> solve the problem without any deep changes on the C level.
> 
> That might work. What would the fallback look like? Suppose we have 4
> diacritics, 3 of which are covered by the same font, and one by a
> different one. Would you prefer to attempt to use the font of 3 of
> them, or would you prefer to fall back to the font of the base
> character?

I think I'd prefer to have the font that covers the majority.

But I'm not sure it's a real-life dilemma.  I fully expect a font that
supports the rare diacritic to also support the less rare ones.  And
if I'm wrong, I'm sure we will hear about that soon enough ;-)
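
The "font that covers the majority" preference can be sketched as follows. This is only a Python model under stated assumptions, not the Emacs font-selection code: `pick_font` is a hypothetical name, the coverage table is made-up input supplied by the caller, and ties are broken by whichever font comes first in the table.

```python
def pick_font(marks, coverage, base_font):
    """Pick the font that covers the most distinct marks.

    MARKS is a sequence of combining characters, COVERAGE maps a
    font name to the set of characters it supports (hypothetical
    data), and BASE_FONT is returned when no font covers any mark.
    """
    unique = set(marks)
    best_font, best_count = base_font, 0
    for font, chars in coverage.items():
        count = len(unique & chars)
        # Strict comparison: on a tie the earlier font is kept.
        if count > best_count:
            best_font, best_count = font, count
    return best_font
```

With four diacritics of which three are in one font and one in another, this returns the font covering the three. It falls back to the base character's font only when nothing covers any of them.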







* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-29 15:42                                               ` Eli Zaretskii
@ 2022-03-29 15:59                                                 ` Robert Pluim
  2022-03-29 16:49                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 47+ messages in thread
From: Robert Pluim @ 2022-03-29 15:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, larsi, 54562

>>>>> On Tue, 29 Mar 2022 18:42:03 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> That might work. What would the fallback look like? Suppose we have 4
    >> diacritics, 3 of which are covered by the same font, and one by a
    >> different one. Would you prefer to attempt to use the font of 3 of
    >> them, or would you prefer to fall back to the font of the base
    >> character?

    Eli> I think I'd prefer to have the font that covers the majority.

OK. Btw, the limit is a 3-character lookback, not 4 (although I guess
we could always raise it).

    Eli> But I'm not sure it's a real-life dilemma.  I fully expect a font that
    Eli> supports the rare diacritic to also support the less rare ones.  And
    Eli> if I'm wrong, I'm sure we will hear about that soon enough ;-)

That we will.

Robert
-- 






* bug#54562: 28.0.91; Emoji sequence not composed
  2022-03-29 15:59                                                 ` Robert Pluim
@ 2022-03-29 16:49                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 47+ messages in thread
From: Eli Zaretskii @ 2022-03-29 16:49 UTC (permalink / raw)
  To: Robert Pluim; +Cc: luangruo, larsi, 54562

> From: Robert Pluim <rpluim@gmail.com>
> Cc: luangruo@yahoo.com,  larsi@gnus.org,  54562@debbugs.gnu.org
> Date: Tue, 29 Mar 2022 17:59:46 +0200
> 
> >>>>> On Tue, 29 Mar 2022 18:42:03 +0300, Eli Zaretskii <eliz@gnu.org> said:
> 
>     >> From: Robert Pluim <rpluim@gmail.com>
>     >> That might work. What would the fallback look like? Suppose we have 4
>     >> diacritics, 3 of which are covered by the same font, and one by a
>     >> different one. Would you prefer to attempt to use the font of 3 of
>     >> them, or would you prefer to fall back to the font of the base
>     >> character?
> 
>     Eli> I think I'd prefer to have the font that covers the majority.
> 
> OK. Btw, the limit is a 3-character lookback, not 4 (although I guess
> we could always raise it).

Right.  So with this trick we can support at most 3 diacritics on the
same base character.
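
The ordered rules, capped at the 3-character look-back limit mentioned above, can be modelled with a toy Python sketch. It is not how the C code in Emacs evaluates composition rules: the character classes standing in for `\c.` and `\c^` cover only the Combining Diacritical Marks and Combining Diacritical Marks for Symbols blocks, and `compose_at` is a hypothetical name.

```python
import re

# Stand-ins for the Emacs categories (assumption: two blocks only).
MARK = "[\u0300-\u036f\u20d0-\u20ff]"   # plays the role of \c^
BASE = "[^\u0300-\u036f\u20d0-\u20ff]"  # plays the role of \c.

# (pattern, look-back) pairs, tried in this order.
RULES = [
    (re.compile(BASE + MARK + MARK + MARK + "+"), 3),
    (re.compile(BASE + MARK + MARK + "+"), 2),
    (re.compile(BASE + MARK + "+"), 1),
]


def compose_at(text, pos):
    """Try each rule with text[pos] as the triggering character.

    A rule with look-back N is matched starting N characters
    before POS.  Returns (start, end, look_back) for the first
    rule that matches, or None if no rule applies.
    """
    for pattern, look_back in RULES:
        start = pos - look_back
        if start < 0:
            continue
        match = pattern.match(text, start)
        if match:
            return (start, match.end(), look_back)
    return None
```

In this model a base character followed by three diacritics is matched by the first rule when the third diacritic is the trigger, and by the later rules when an earlier diacritic is the trigger. A fourth diacritic can only be absorbed by the trailing `+`, never used as a trigger with its own look-back.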






Thread overview: 47+ messages
     [not found] <87bkxu8k7t.fsf.ref@yahoo.com>
2022-03-25  9:17 ` bug#54562: 28.0.91; Emoji sequence not composed Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-25 10:27   ` Eli Zaretskii
2022-03-25 10:32     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-25 10:54       ` Robert Pluim
2022-03-25 11:47         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-25 12:15           ` Eli Zaretskii
2022-03-25 12:46             ` Andreas Schwab
2022-03-25 13:05               ` Eli Zaretskii
2022-03-25 13:14                 ` Andreas Schwab
2022-03-25 13:30                   ` Robert Pluim
2022-03-25 13:57                     ` Andreas Schwab
2022-03-25 13:44                   ` Eli Zaretskii
2022-03-25 14:03                     ` Andreas Schwab
2022-03-25 14:05             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-25 14:14               ` Robert Pluim
2022-03-26  1:16                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-26  5:56                   ` Eli Zaretskii
2022-03-26 16:51                     ` Lars Ingebrigtsen
2022-03-27  0:32                       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-27 15:10                         ` Robert Pluim
2022-03-28  0:19                           ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-03-28  7:47                             ` Robert Pluim
2022-03-28 11:51                               ` Eli Zaretskii
2022-03-28 12:46                                 ` Robert Pluim
2022-03-28 13:12                                   ` Eli Zaretskii
2022-03-28 14:59                                     ` Robert Pluim
2022-03-28 16:07                                       ` Eli Zaretskii
2022-03-29 10:45                                         ` Robert Pluim
2022-03-29 11:44                                           ` Eli Zaretskii
2022-03-29 14:50                                             ` Robert Pluim
2022-03-29 15:42                                               ` Eli Zaretskii
2022-03-29 15:59                                                 ` Robert Pluim
2022-03-29 16:49                                                   ` Eli Zaretskii
2022-03-28 13:19                                   ` Andreas Schwab
2022-03-28 15:01                                     ` Robert Pluim
2022-03-28 15:35                                       ` Andreas Schwab
2022-03-28 16:11                                         ` Eli Zaretskii
2022-03-28 16:20                                           ` Andreas Schwab
2022-03-28 16:26                                             ` Robert Pluim
2022-03-28 16:41                                               ` Andreas Schwab
2022-03-28 17:10                                             ` Eli Zaretskii
2022-03-28 17:14                                               ` Eli Zaretskii
2022-03-28 17:39                                               ` Andreas Schwab
2022-03-28 18:12                                                 ` Eli Zaretskii
2022-03-28 18:14                                                   ` Andreas Schwab
2022-03-28 18:15                                                 ` Eli Zaretskii
2022-03-25 11:23       ` Eli Zaretskii
