unofficial mirror of emacs-devel@gnu.org 
* emojis and other multi-character glyphs
@ 2021-12-26  9:43 Evgeny Zajcev
  2021-12-26 10:15 ` Eli Zaretskii
  2021-12-26 10:45 ` Lars Ingebrigtsen
  0 siblings, 2 replies; 25+ messages in thread
From: Evgeny Zajcev @ 2021-12-26  9:43 UTC (permalink / raw)
  To: emacs-devel


There is some inconsistency in naming and behaviour in Emacs master.
We have the `forward-char', `backward-char', `delete-char' and
`backward-delete-char' commands.  All of them use "char" in their names;
however, `forward-char' and `backward-char' treat "char" differently than
`delete-char' and `backward-delete-char' do.

Let me explain.  Emacs supports composed characters, which display
multiple characters as a single glyph.  Almost the same is done for
multi-character emojis such as 🇷🇺 or 👨‍👩‍👧‍👦: multiple Unicode characters
are composed into a single glyph representing the emoji.  Now, if you
put point on a composed character or emoji and run `forward-char' or
`backward-char', point moves over the whole glyph.  However, if you run
`delete-char' (with point on the composed character) or
`backward-delete-char' (with point just after the glyph), only a single
character of the multi-character sequence is deleted, so pressing `C-d'
on 🇷🇺 magically turns the Russian flag into 🇺.  This is very misleading
behaviour, especially when the emoji contains invisible characters.
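
A minimal sketch of the effect at the code point level, written in Python rather than Emacs Lisp.  It only models "remove one code point from the sequence"; it is not how `delete-char' is implemented, and the helper name `describe` is made up for this illustration.

```python
import unicodedata

# The flag is a pair of regional indicator code points: R followed by U.
flag = "\U0001F1F7\U0001F1FA"


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Removing a single code point (what a code-point-wise delete does)
# leaves the second regional indicator on its own.
after_delete = flag[1:]

print(len(flag), describe(flag))
print(len(after_delete), describe(after_delete))
```

What is left after the deletion is a lone regional indicator, which has no flag to pair with and is typically drawn as a boxed letter.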

Maybe introduce a "glyph" term, meaning the graphical representation of a
sequence of characters that is displayed in the buffer and operated on as
a whole?

For example, these things create a glyph in a buffer:
1) (compose-chars ?a ?b)
2) (concat "\x1F1F7" "\x1F1FA")
3) (propertize "aaaa" 'display "B")

In this case, we can rename `forward-char' to `forward-glyph', keeping
naming and behaviour consistent.

It would then also be possible to write something like
`string-glyph-length' that returns 1 for "👨‍👩‍👧‍👦" instead of the 7 that
`length' returns now.
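
To make the 7-versus-1 difference concrete, here is a hedged Python sketch.  The function `count_clusters_simplified` is hypothetical and deliberately simplified: it only glues code points that are linked by a zero width joiner, and ignores combining marks, variation selectors and regional indicator pairs, so it is not a full grapheme cluster segmenter.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def count_clusters_simplified(text):
    """Count user-visible units, treating ZWJ-linked code points as one unit.

    Simplification: only zero width joiners are handled.
    """
    count = 0
    joined = False
    for ch in text:
        if ch == ZWJ:
            # A joiner only glues onto a unit that already exists.
            joined = count > 0
        elif joined:
            # This code point belongs to the previous unit.
            joined = False
        else:
            # A new unit starts here.
            count += 1
    return count


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

print(len(family))                        # number of code points: 7
print(count_clusters_simplified(family))  # ZWJ-joined units: 1
```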

What do you think?

Thanks

-- 
lg



* Re: emojis and other multi-character glyphs
  2021-12-26  9:43 emojis and other multi-character glyphs Evgeny Zajcev
@ 2021-12-26 10:15 ` Eli Zaretskii
  2021-12-26 10:41   ` Evgeny Zajcev
  2021-12-26 10:45 ` Lars Ingebrigtsen
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 10:15 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: emacs-devel

> From: Evgeny Zajcev <lg.zevlg@gmail.com>
> Date: Sun, 26 Dec 2021 12:43:34 +0300
> 
> There is some inconsistency in naming and behaviour in Emacs master.
> We have `forward-char', `backward-char', `delete-char', `backward-delete-char' commands.  All of them use
> "char" in their names, however, `forward-char' and `backward-char' treats "char" differently than
> `delete-char' and `backward-delete-char'.
> 
> Let me explain.  Emacs has support for composed characters to display multiple characters composed into
> a single glyph.  Almost the same is done for multi-character emojis such as 🇷🇺 or 👨‍👩‍👧‍👦 - multiple
> unicode chars are composed into single glyph representing some emoji.  Now, if you put point under
> composed character or emoji and run `forward-char' or `backward-char' it moves point to the whole glyph,
> however, if you run `delete-char' (when point is under composed char) or `backward-delete-char'(when
> point just after the glyph) it will delete only single character from multiple character representation, so
> pressing `C-d' under 🇷🇺 will magically turn Russian flag into 🇺.  This is very misleading behaviour
> especially when invisible characters are used in the emojis

Emacs had in the past a feature whereby the user could move and delete
by single codepoints in composed character sequences.  This feature
was somehow lost.  I have been trying for some time to determine how
and why it was lost, and how to restore it.  So this issue is known and
is in the works, albeit slowly.

> Maybe introduce "glyph" term meaning graphical representation of chars sequence, displayed in the buffer
> and operated as a whole thing?

There's no need for that, because we can provide dwim-ish operation
for existing commands without any new terminology or new commands.

> And also it will be possible to write something like `string-glyph-length' to return 1 for "👨‍👩‍👧‍👦" instead of 7
> as `length' returns now.

Why would that be useful?




* Re: emojis and other multi-character glyphs
  2021-12-26 10:15 ` Eli Zaretskii
@ 2021-12-26 10:41   ` Evgeny Zajcev
  2021-12-26 10:51     ` Eli Zaretskii
  2021-12-26 18:00     ` Stefan Monnier
  0 siblings, 2 replies; 25+ messages in thread
From: Evgeny Zajcev @ 2021-12-26 10:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Sun, 26 Dec 2021 at 13:15, Eli Zaretskii <eliz@gnu.org>:

> > From: Evgeny Zajcev <lg.zevlg@gmail.com>
> > Date: Sun, 26 Dec 2021 12:43:34 +0300
> >
> > There is some inconsistency in naming and behaviour in Emacs master.
> > We have `forward-char', `backward-char', `delete-char',
> > `backward-delete-char' commands.  All of them use "char" in their names,
> > however, `forward-char' and `backward-char' treats "char" differently than
> > `delete-char' and `backward-delete-char'.
> >
> > Let me explain.  Emacs has support for composed characters to display
> > multiple characters composed into a single glyph.  Almost the same is done
> > for multi-character emojis such as 🇷🇺 or 👨‍👩‍👧‍👦 - multiple unicode
> > chars are composed into single glyph representing some emoji.  Now, if you
> > put point under composed character or emoji and run `forward-char' or
> > `backward-char' it moves point to the whole glyph, however, if you run
> > `delete-char' (when point is under composed char) or
> > `backward-delete-char'(when point just after the glyph) it will delete only
> > single character from multiple character representation, so pressing `C-d'
> > under 🇷🇺 will magically turn Russian flag into 🇺.  This is very
> > misleading behaviour especially when invisible characters are used in the
> > emojis
>
> Emacs had in the past a feature whereby the user could move and delete
> by single codepoints in composed character sequences.  This feature
> was somehow lost.  I'm trying for some time to determine how and why
> it was lost, and how to restore it.  So this issue is known and is in
> the works, albeit slowly.
>

Ah, I see, nice.  I'll try to debug this as well to help you.


> > Maybe introduce "glyph" term meaning graphical representation of chars
> > sequence, displayed in the buffer and operated as a whole thing?
>
> There's no need for that, because we can provide dwim-ish operation
> for existing commands without any new terminology or new commands.
>

Yeah, if "char" consistency is restored, then there is no need to
introduce "glyph".  I just thought it was a new feature that chars and
glyphs are treated differently.


>
> > And also it will be possible to write something like
> > `string-glyph-length' to return 1 for "👨‍👩‍👧‍👦" instead of 7
> > as `length' returns now.
>
> Why would that be useful?
>

Sometimes it is useful to know the real string length before acting on
it.  In my case, I use a service that limits the number of characters it
can act on, and emojis are counted as a single character.  Anyway, having
something like an `emoji' text property (analogous to the `composition'
text property for composed chars) would be very useful for various use
cases.

-- 
lg



* Re: emojis and other multi-character glyphs
  2021-12-26  9:43 emojis and other multi-character glyphs Evgeny Zajcev
  2021-12-26 10:15 ` Eli Zaretskii
@ 2021-12-26 10:45 ` Lars Ingebrigtsen
  2021-12-26 10:50   ` Evgeny Zajcev
  1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2021-12-26 10:45 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: emacs-devel

Evgeny Zajcev <lg.zevlg@gmail.com> writes:

> And also it will be possible to write something like `string-glyph-length' to return 1
> for "👨‍👩‍👧‍👦" instead of 7 as `length' returns now.

(length (string-glyph-split "👨‍👩‍👧‍👦"))
=> 1

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: emojis and other multi-character glyphs
  2021-12-26 10:45 ` Lars Ingebrigtsen
@ 2021-12-26 10:50   ` Evgeny Zajcev
  0 siblings, 0 replies; 25+ messages in thread
From: Evgeny Zajcev @ 2021-12-26 10:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel


Sun, 26 Dec 2021 at 13:45, Lars Ingebrigtsen <larsi@gnus.org>:

> Evgeny Zajcev <lg.zevlg@gmail.com> writes:
>
> > And also it will be possible to write something like
> > `string-glyph-length' to return 1
> > for "👨‍👩‍👧‍👦" instead of 7 as `length' returns now.
>
> (length (string-glyph-split "👨‍👩‍👧‍👦"))
> => 1
>
>
Nice!  This suits my needs, thanks.

-- 
lg



* Re: emojis and other multi-character glyphs
  2021-12-26 10:41   ` Evgeny Zajcev
@ 2021-12-26 10:51     ` Eli Zaretskii
  2021-12-26 10:56       ` Evgeny Zajcev
  2021-12-26 18:00     ` Stefan Monnier
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 10:51 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: emacs-devel

> From: Evgeny Zajcev <lg.zevlg@gmail.com>
> Date: Sun, 26 Dec 2021 13:41:21 +0300
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
>  > And also it will be possible to write something like `string-glyph-length' to return 1 for "👨‍👩‍👧‍👦" instead of 7
>  > as `length' returns now.
> 
>  Why would that be useful?
> 
> Sometimes it is useful to know real string length before acting on
> it.

If you mean their width on display, then we have string-width for
that.  And if you need absolute accuracy, use window-text-pixel-size.

> In my case, I use a service that has
> limitation on number chars it can act on and emojis are counted as
> single char.

But that is incorrect: most Emoji sequences occupy two columns on display.
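
The distinction between "how many characters" and "how many columns" can be sketched as follows.  This is Python, not Emacs: it does not call `string-width' or `window-text-pixel-size', and the rule "East Asian Wide or Fullwidth means two columns" inside the hypothetical helper `columns_for_code_point` is an assumption made for the illustration, valid only for a single code point.

```python
import unicodedata


def columns_for_code_point(ch):
    """Rough column estimate for ONE code point.

    Assumption: East Asian Width classes W and F take 2 columns,
    everything else takes 1.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


man = "\U0001F468"  # MAN, a single emoji code point

print(len(man))                     # 1 code point
print(columns_for_code_point(man))  # estimated display columns
print(columns_for_code_point("a"))  # an ASCII letter for comparison
```

A limit expressed in "characters" and a limit expressed in display columns can therefore disagree even for a single-code-point emoji.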




* Re: emojis and other multi-character glyphs
  2021-12-26 10:51     ` Eli Zaretskii
@ 2021-12-26 10:56       ` Evgeny Zajcev
  2021-12-26 10:58         ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Evgeny Zajcev @ 2021-12-26 10:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Sun, 26 Dec 2021 at 13:51, Eli Zaretskii <eliz@gnu.org>:

> > From: Evgeny Zajcev <lg.zevlg@gmail.com>
> > Date: Sun, 26 Dec 2021 13:41:21 +0300
> > Cc: emacs-devel <emacs-devel@gnu.org>
> >
> >  > And also it will be possible to write something like
> >  > `string-glyph-length' to return 1 for "👨‍👩‍👧‍👦" instead of 7
> >  > as `length' returns now.
> >
> >  Why would that be useful?
> >
> > Sometimes it is useful to know real string length before acting on
> > it.
>
> If you mean their width on display, then we have string-width for
> that.  And if you need absolute accuracy, use window-text-pixel-size.
>
> > In my case, I use a service that has
> > limitation on number chars it can act on and emojis are counted as
> > single char.
>
> But that is incorrect: most Emoji sequences occupy two columns on display.
>

No, no, not string-width, string length in number of glyphs, as Swift
counts them: "👨‍👩‍👧‍👦".length == 1

-- 
lg



* Re: emojis and other multi-character glyphs
  2021-12-26 10:56       ` Evgeny Zajcev
@ 2021-12-26 10:58         ` Eli Zaretskii
  2021-12-26 11:09           ` Evgeny Zajcev
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 10:58 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: emacs-devel

> From: Evgeny Zajcev <lg.zevlg@gmail.com>
> Date: Sun, 26 Dec 2021 13:56:12 +0300
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
>  > In my case, I use a service that has
>  > limitation on number chars it can act on and emojis are counted as
>  > single char.
> 
>  But that is incorrect: most Emoji sequences occupy two columns on display.
> 
> No, no, not string-width, string length in number of glyphs, as Swift counts them: "👨‍👩‍👧‍👦".length == 1

Why and where is this important?




* Re: emojis and other multi-character glyphs
  2021-12-26 10:58         ` Eli Zaretskii
@ 2021-12-26 11:09           ` Evgeny Zajcev
  2021-12-26 11:26             ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Evgeny Zajcev @ 2021-12-26 11:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Sun, 26 Dec 2021 at 13:58, Eli Zaretskii <eliz@gnu.org>:

> > From: Evgeny Zajcev <lg.zevlg@gmail.com>
> > Date: Sun, 26 Dec 2021 13:56:12 +0300
> > Cc: emacs-devel <emacs-devel@gnu.org>
> >
> >  > In my case, I use a service that has
> >  > limitation on number chars it can act on and emojis are counted as
> >  > single char.
> >
> >  But that is incorrect: most Emoji sequences occupy two columns on
> >  display.
> >
> > No, no, not string-width, string length in number of glyphs, as Swift
> > counts them: "👨‍👩‍👧‍👦".length == 1
>
> Why and where is this important?
>

I think in any WYSIWYG environment it is essential to get the length in
glyphs that are actually displayed, instead of the number of bytes or
multibyte chars used by the internal representation.

30 years ago we had only single-byte chars, and length was the number of
bytes in a string.  Then we got multibyte chars, and the length of such a
string became the number of multibyte chars.  Now we have emojis
represented by multiple multibyte chars, and length should be adapted as
well.
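
The successive notions of "length" mentioned above can be compared on the family emoji with a short Python sketch.  It assumes UTF-8 and UTF-16 as the byte and code-unit encodings; the variable names are chosen for this illustration only.

```python
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

# Length as bytes: 4 emoji of 4 bytes each plus 3 joiners of 3 bytes each.
utf8_bytes = len(family.encode("utf-8"))

# Length as UTF-16 code units: each emoji needs a surrogate pair.
utf16_units = len(family.encode("utf-16-le")) // 2

# Length as code points, which is what a character-based length counts.
code_points = len(family)

print(utf8_bytes, utf16_units, code_points)  # 25 11 7

# The first code point, MAN, encodes to the same four UTF-8 bytes
# regardless of what follows it.
print(family[0].encode("utf-8"))  # b'\xf0\x9f\x91\xa8'
```

None of these three numbers is the "1" a user sees on screen; that count needs grapheme cluster segmentation on top of the code points.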

-- 
lg



* Re: emojis and other multi-character glyphs
  2021-12-26 11:09           ` Evgeny Zajcev
@ 2021-12-26 11:26             ` Eli Zaretskii
  2021-12-26 11:53               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 11:26 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: emacs-devel

> From: Evgeny Zajcev <lg.zevlg@gmail.com>
> Date: Sun, 26 Dec 2021 14:09:01 +0300
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
>  > No, no, not string-width, string length in number of glyphs, as Swift counts them: "👨‍👩‍👧‍👦".length
>  == 1
> 
>  Why and where is this important?
> 
> I think in any WYSIWYG env it is essential that you get length in glyphs that are actually displayed instead
> number or bytes or multibyte chars used by internal representation.

But the above sequence displays here as 4 glyphs, not as one.  IOW,
a single "grapheme cluster" doesn't mean there's just one glyph there:
that "cluster" part is there for a reason.

> 30 years ago we had only single byte chars and length was a number of bytes in string.  Then we had
> multibyte chars and length of such string become a number of multibyte chars.  Now we have emojis
> represented by multiple multibyte chars and length should be adopted as well.

Do you really think I needed that lecture?  I didn't mention bytes of
a multibyte sequence in any of my messages, and string-width doesn't
measure the number of bytes, either.




* Re: emojis and other multi-character glyphs
  2021-12-26 11:26             ` Eli Zaretskii
@ 2021-12-26 11:53               ` Lars Ingebrigtsen
  2021-12-26 11:57                 ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2021-12-26 11:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Evgeny Zajcev, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Swift counts them: "👨‍👩‍👧‍👦".length
>>  == 1

[...]

> But the above sequence displays here as 4 glyphs, not as one.

It displays as one glyph here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: emojis and other multi-character glyphs
  2021-12-26 11:53               ` Lars Ingebrigtsen
@ 2021-12-26 11:57                 ` Eli Zaretskii
  2021-12-26 12:03                   ` Lars Ingebrigtsen
  2021-12-26 12:35                   ` LdBeth
  0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 11:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lg.zevlg, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Evgeny Zajcev <lg.zevlg@gmail.com>,  emacs-devel@gnu.org
> Date: Sun, 26 Dec 2021 12:53:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Swift counts them: "👨‍👩‍👧‍👦".length
> >>  == 1
> 
> [...]
> 
> > But the above sequence displays here as 4 glyphs, not as one.
> 
> It displays as one glyph here.

??? Don't you see 4 faces there?




* Re: emojis and other multi-character glyphs
  2021-12-26 11:57                 ` Eli Zaretskii
@ 2021-12-26 12:03                   ` Lars Ingebrigtsen
  2021-12-26 12:13                     ` Eli Zaretskii
  2021-12-26 12:35                   ` LdBeth
  1 sibling, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2021-12-26 12:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lg.zevlg, emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:

>> >> Swift counts them: "👨‍👩‍👧‍👦".length
>> >>  == 1
>> 
>> [...]
>> 
>> > But the above sequence displays here as 4 glyphs, not as one.
>> 
>> It displays as one glyph here.
>
> ??? Don't you see 4 faces there?

Nope:


[-- Attachment #2: Type: image/png, Size: 13571 bytes --]



Perhaps you don't have a recent enough emoji face installed?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: emojis and other multi-character glyphs
  2021-12-26 12:03                   ` Lars Ingebrigtsen
@ 2021-12-26 12:13                     ` Eli Zaretskii
  2021-12-26 12:16                       ` Po Lu
  2021-12-26 12:26                       ` Lars Ingebrigtsen
  0 siblings, 2 replies; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 12:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lg.zevlg, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lg.zevlg@gmail.com,  emacs-devel@gnu.org
> Date: Sun, 26 Dec 2021 13:03:25 +0100
> 
> >> It displays as one glyph here.
> >
> > ??? Don't you see 4 faces there?
> 
> Nope:

Not sure what that is about, since I clearly see 4 faces of 4 people
(2 adults and 2 kids) in the image you posted.




* Re: emojis and other multi-character glyphs
  2021-12-26 12:13                     ` Eli Zaretskii
@ 2021-12-26 12:16                       ` Po Lu
  2021-12-26 12:26                       ` Lars Ingebrigtsen
  1 sibling, 0 replies; 25+ messages in thread
From: Po Lu @ 2021-12-26 12:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, lg.zevlg, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: lg.zevlg@gmail.com,  emacs-devel@gnu.org
>> Date: Sun, 26 Dec 2021 13:03:25 +0100
>> 
>> >> It displays as one glyph here.
>> >
>> > ??? Don't you see 4 faces there?
>> 
>> Nope:
>
> Not sure what that is about, since I clearly see 4 faces of 4 people
> (2 adults and 2 kids) in the image you posted.

I also see 4 distinct faces, on a GNU/Linux system.




* Re: emojis and other multi-character glyphs
  2021-12-26 12:13                     ` Eli Zaretskii
  2021-12-26 12:16                       ` Po Lu
@ 2021-12-26 12:26                       ` Lars Ingebrigtsen
  2021-12-26 12:50                         ` Po Lu
  2021-12-26 13:00                         ` Eli Zaretskii
  1 sibling, 2 replies; 25+ messages in thread
From: Lars Ingebrigtsen @ 2021-12-26 12:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lg.zevlg, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Not sure what that is about, since I clearly see 4 faces of 4 people
> (2 adults and 2 kids) in the image you posted.

I thought you meant that you were seeing four separate glyphs.  But then
we're all on the same page, and we're all seeing one glyph.  (And that
glyph, in this case, is showing a drawing of four people.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: emojis and other multi-character glyphs
  2021-12-26 11:57                 ` Eli Zaretskii
  2021-12-26 12:03                   ` Lars Ingebrigtsen
@ 2021-12-26 12:35                   ` LdBeth
  2021-12-26 13:01                     ` Eli Zaretskii
  1 sibling, 1 reply; 25+ messages in thread
From: LdBeth @ 2021-12-26 12:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, lg.zevlg, emacs-devel

>>>>> In <83wnjro9xv.fsf@gnu.org> 
>>>>>	Eli Zaretskii <eliz@gnu.org> wrote:
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Evgeny Zajcev <lg.zevlg@gmail.com>,  emacs-devel@gnu.org
> Date: Sun, 26 Dec 2021 12:53:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Swift counts them: "👨‍👩‍👧‍👦".length
> >>  == 1
> 
> [...]
> 
> > But the above sequence displays here as 4 glyphs, not as one.
> 
> It displays as one glyph here.

EZ> ??? Don't you see 4 faces there?

At first I thought you were joking, but I guess not everyone's devices
are configured the same way for displaying Unicode or Emoji. :P

On a Mac (or computers with Apple Color Emoji installed), the above
Emoji is indeed displayed as one glyph.

The following is the output of `describe-char':

             position: 464 of 641 (72%), column: 25
            character: 👨 (displayed as 👨) (codepoint 128104, #o372150, #x1f468)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F468
               script: symbol
               syntax: w 	which means: word
             category: .:Base
             to input: type "C-x 8 RET 1f468" or "C-x 8 RET MAN"
          buffer code: #xF0 #x9F #x91 #xA8
            file code: #xF0 #x9F #x91 #xA8 (encoded by coding system utf-8)
              display: composed to form "👨‍👩‍👧‍👦" (see below)

Composed with the following character(s) "‍👩‍👧‍👦" using this font:
  mac-ct:-*-Apple Color Emoji-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 6 0 1438 17 -1 18 13 4 nil]

Character code properties: customize what to show
  name: MAN
  general-category: So (Symbol, Other)
  decomposition: (128104) ('👨')

There are text properties here:
  face                 wl-highlight-message-cited-text-4
  mime-view-entity     [Show]
  mime-view-entity-body [Show]
  mime-view-situation  [Show]




* Re: emojis and other multi-character glyphs
  2021-12-26 12:26                       ` Lars Ingebrigtsen
@ 2021-12-26 12:50                         ` Po Lu
  2021-12-26 13:00                         ` Eli Zaretskii
  1 sibling, 0 replies; 25+ messages in thread
From: Po Lu @ 2021-12-26 12:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, lg.zevlg, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:

>> Not sure what that is about, since I clearly see 4 faces of 4 people
>> (2 adults and 2 kids) in the image you posted.

> I thought you meant that you were seeing four separate glyphs.  But then
> we're all on the same page, and we're all seeing one glyph.  (And that
> glyph, in this case, is showing a drawing of four people.)

I see 4 glyphs, one for each face.

This is `emacs-28' though, and I don't know if something changed on
master.

Thanks.




* Re: emojis and other multi-character glyphs
  2021-12-26 12:26                       ` Lars Ingebrigtsen
  2021-12-26 12:50                         ` Po Lu
@ 2021-12-26 13:00                         ` Eli Zaretskii
  2021-12-27 10:30                           ` Lars Ingebrigtsen
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 13:00 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lg.zevlg, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lg.zevlg@gmail.com,  emacs-devel@gnu.org
> Date: Sun, 26 Dec 2021 13:26:59 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Not sure what that is about, since I clearly see 4 faces of 4 people
> > (2 adults and 2 kids) in the image you posted.
> 
> I thought you meant that you were seeing four separate glyphs.  But then
> we're all on the same page, and we're all seeing one glyph.  (And that
> glyph, in this case, is showing a drawing of four people.)

Not one glyph: one grapheme cluster.  Which is constructed from 4
glyphs with offsets.




* Re: emojis and other multi-character glyphs
  2021-12-26 12:35                   ` LdBeth
@ 2021-12-26 13:01                     ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-26 13:01 UTC (permalink / raw)
  To: LdBeth; +Cc: larsi, lg.zevlg, emacs-devel

> Date: Sun, 26 Dec 2021 20:35:27 +0800
> From: LdBeth <andpuke@foxmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> 	lg.zevlg@gmail.com,
> 	emacs-devel@gnu.org
> 
> EZ> ??? Don't you see 4 faces there?
> 
> At first I thought you are joking, but I guess not everyone's deviced
> are configued differently for display Unicode or Emoji. :P
> 
> On a Mac (or computers with Apple Color Emoji installed), the above
> Emoji is indeed displayed as one glyph.

That's irrelevant: the font you are using has this sequence as a
precomposed glyph, that's all.




* Re: emojis and other multi-character glyphs
  2021-12-26 10:41   ` Evgeny Zajcev
  2021-12-26 10:51     ` Eli Zaretskii
@ 2021-12-26 18:00     ` Stefan Monnier
  1 sibling, 0 replies; 25+ messages in thread
From: Stefan Monnier @ 2021-12-26 18:00 UTC (permalink / raw)
  To: Evgeny Zajcev; +Cc: Eli Zaretskii, emacs-devel

>> > And also it will be possible to write something like
>> `string-glyph-length' to return 1 for "👨‍👩‍👧‍👦" instead of 7
>> > as `length' returns now.
>> Why would that be useful?
> Sometimes it is useful to know real string length before acting on it.  In
> my case, I use a service that has limitation on number chars it can act on
> and emojis are counted as single char.  Anyway, having something like
> `emoji' text-property (as analogue to `composition' text property for
> composed chars) will be very useful for different use-cases

I suspect that whichever way we may define "glyph" it's unlikely to be
100% the same as what your service uses to enforce its limit.


        Stefan





* Re: emojis and other multi-character glyphs
  2021-12-26 13:00                         ` Eli Zaretskii
@ 2021-12-27 10:30                           ` Lars Ingebrigtsen
  2021-12-27 14:46                             ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Lars Ingebrigtsen @ 2021-12-27 10:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lg.zevlg, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Not one glyph: one grapheme cluster.  Which is constructed from 4
> glyphs with offsets.

Nope.  It's one glyph, composed from a grapheme cluster.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: emojis and other multi-character glyphs
  2021-12-27 10:30                           ` Lars Ingebrigtsen
@ 2021-12-27 14:46                             ` Eli Zaretskii
  2021-12-29  4:54                               ` Anand Tamariya
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-27 14:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lg.zevlg, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lg.zevlg@gmail.com,  emacs-devel@gnu.org
> Date: Mon, 27 Dec 2021 11:30:06 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Not one glyph: one grapheme cluster.  Which is constructed from 4
> > glyphs with offsets.
> 
> Nope.  It's one glyph, composed from a grapheme cluster.

That's nothing but a terminology mishap.




* Re: emojis and other multi-character glyphs
  2021-12-27 14:46                             ` Eli Zaretskii
@ 2021-12-29  4:54                               ` Anand Tamariya
  2021-12-29 13:01                                 ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Anand Tamariya @ 2021-12-29  4:54 UTC (permalink / raw)
  To: emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:
>
> That's nothing but a terminology mishap.
It's got to be more than this.

The Family: Man, Woman, Girl, Boy emoji is a ZWJ sequence combining 👨
Man, ‍ Zero Width Joiner, 👩 Woman, ‍ Zero Width Joiner, 👧 Girl, ‍ Zero
Width Joiner and 👦 Boy. These display as a single emoji on supported
platforms.
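
The structure of this ZWJ sequence can be reproduced from the character names given above.  A small Python sketch, using only the standard `unicodedata` module; whether the result is drawn as one image or four depends on the font and renderer, which this code does not model.

```python
import unicodedata

# Character names exactly as listed in the description of the sequence.
parts = ["MAN", "WOMAN", "GIRL", "BOY"]

zwj = unicodedata.lookup("ZERO WIDTH JOINER")  # U+200D

# Interleave the four people with zero width joiners.
family = zwj.join(unicodedata.lookup(name) for name in parts)

for ch in family:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(family))  # 4 people + 3 joiners = 7 code points
```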

Simply copying the sequence into the Firefox address bar shows me a single
emoji (I have the NotoColorEmoji font installed).  This leads me to
conclude that it's not a font feature but a renderer feature.


[-- Attachment #2: firefox-emoji --]
[-- Type: image/png, Size: 7642 bytes --]


* Re: emojis and other multi-character glyphs
  2021-12-29  4:54                               ` Anand Tamariya
@ 2021-12-29 13:01                                 ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2021-12-29 13:01 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Wed, 29 Dec 2021 10:24:05 +0530
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >
> > That's nothing but a terminology mishap.
> It's got to be more than this.
> 
> The Family: Man, Woman, Girl, Boy emoji is a ZWJ sequence combining 👨
> Man, ‍ Zero Width Joiner, 👩 Woman, ‍ Zero Width Joiner, 👧 Girl, ‍ Zero
> Width Joiner and 👦 Boy. These display as a single emoji on supported
> platforms.

"Single emoji" is ambiguous and basically inaccurate.  It's a single
"grapheme cluster".

> Simply copying the sequence in firefox address bar shows me single emoji (I've
> NotoColorEmoji Font). This leads me to conclude it's not a font feature
> - but a renderer feature.

It is actually both: the rendering engine asks the font how to display
this sequence, the font provides the response in the form of one or
more font glyphs to use, and the renderer then displays those glyphs.

But that wasn't what I referred to as "terminology mishap".



