unofficial mirror of help-gnu-emacs@gnu.org
* Display of decomposed characters
@ 2020-12-23 10:05 Philipp Stephani
  2020-12-23 13:00 ` Janusz S. Bień
  2020-12-23 15:44 ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Philipp Stephani @ 2020-12-23 10:05 UTC
  To: help-gnu-emacs

Hi,

Before filing a bug, I wanted to ask whether the following Emacs
behavior is intentional: Even with Cairo and Harfbuzz, Emacs displays
decomposed Unicode characters (e.g. "a" followed by U+0308 COMBINING
DIAERESIS) as separate glyphs. While that's not technically wrong, I
think it would be better to display them as a single glyph, in other
words, not distinguish between canonically equivalent Unicode strings.
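
For readers who want to check the canonical-equivalence claim, here is
a minimal Python sketch (Python is used only for illustration; it is
not Emacs code, and the helper name is made up for this example).  It
compares the decomposed and precomposed spellings after normalization:

```python
import unicodedata

decomposed = "a\u0308"   # "a" followed by U+0308 COMBINING DIAERESIS
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS


def canonically_equivalent(s1: str, s2: str) -> bool:
    """Return True if both strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)


# The raw code point sequences differ ...
print(decomposed == precomposed)
# ... but the strings are canonically equivalent.
print(canonically_equivalent(decomposed, precomposed))
```

Whether two equivalent strings also look the same on screen is a
separate matter that depends on the display code and the fonts.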




* Re: Display of decomposed characters
  2020-12-23 10:05 Display of decomposed characters Philipp Stephani
@ 2020-12-23 13:00 ` Janusz S. Bień
  2020-12-23 15:44 ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Janusz S. Bień @ 2020-12-23 13:00 UTC
  To: Philipp Stephani; +Cc: help-gnu-emacs

On Wed, Dec 23 2020 at 11:05 +01, Philipp Stephani wrote:
> Hi,
>
> Before filing a bug, I wanted to ask whether the following Emacs
> behavior is intentional: Even with Cairo and Harfbuzz, Emacs displays
> decomposed Unicode characters (e.g. "a" followed by U+0308 COMBINING
> DIAERESIS) as separate glyphs. While that's not technically wrong, I
> think it would be better to display them as a single glyph, in other
> words, not distinguish between canonically equivalent Unicode strings.

I don't have such a problem (GNU Emacs 26.1 (build 2,
 x86_64-pc-linux-gnu, GTK+ Version 3.24.5) of 2019-09-23, modified by
 Debian).

Regards

Janusz

-- 
Janusz S. Bień
emeryt (emeritus)
https://sites.google.com/view/jsbien




* Re: Display of decomposed characters
  2020-12-23 10:05 Display of decomposed characters Philipp Stephani
  2020-12-23 13:00 ` Janusz S. Bień
@ 2020-12-23 15:44 ` Eli Zaretskii
  2020-12-25 17:14   ` Philipp Stephani
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2020-12-23 15:44 UTC
  To: help-gnu-emacs

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 23 Dec 2020 11:05:13 +0100
> 
> Before filing a bug, I wanted to ask whether the following Emacs
> behavior is intentional: Even with Cairo and Harfbuzz, Emacs displays
> decomposed Unicode characters (e.g. "a" followed by U+0308 COMBINING
> DIAERESIS) as separate glyphs. While that's not technically wrong, I
> think it would be better to display them as a single glyph, in other
> words, not distinguish between canonically equivalent Unicode strings.

They are (or should be) displayed as a composed glyph if you are using
a font that supports both a and COMBINING DIAERESIS.  Emacs cannot
compose characters that aren't supported by the same font (because
composition processing stops at face boundaries, and each font defines
internally a separate face).
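
A toy model of that constraint, as a Python sketch.  The font names
and coverage tables below are hypothetical, and real Emacs font
selection (fontsets, faces) is considerably more involved; the sketch
only shows how per-character font choice can split a base character
and its accent into separate runs:

```python
def pick_font(ch, fonts):
    """Return the name of the first font whose coverage includes ch, else None."""
    for name, coverage in fonts:
        if ch in coverage:
            return name
    return None


def split_into_font_runs(text, fonts):
    """Group consecutive characters that end up in the same font."""
    runs = []
    for ch in text:
        font = pick_font(ch, fonts)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


# Hypothetical coverage: the default font lacks U+0308, a fallback has it.
FONTS = [
    ("DefaultFont", set("abcdefghijklmnopqrstuvwxyz")),
    ("FallbackFont", {"\u0308"}),
]
# Hypothetical font that covers both the letters and the combining mark.
FULL_FONTS = [("FullFont", set("abcdefghijklmnopqrstuvwxyz") | {"\u0308"})]

split_runs = split_into_font_runs("ba\u0308r", FONTS)
single_run = split_into_font_runs("ba\u0308r", FULL_FONTS)
print(len(split_runs), len(single_run))
```

In the first case the accent sits in its own run, so no run contains
both "a" and U+0308; in the second case the whole word is one run.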




* Re: Display of decomposed characters
  2020-12-23 15:44 ` Eli Zaretskii
@ 2020-12-25 17:14   ` Philipp Stephani
  2020-12-25 19:01     ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2020-12-25 17:14 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Wed, 23 Dec 2020 at 16:44, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 23 Dec 2020 11:05:13 +0100
> >
> > Before filing a bug, I wanted to ask whether the following Emacs
> > behavior is intentional: Even with Cairo and Harfbuzz, Emacs displays
> > decomposed Unicode characters (e.g. "a" followed by U+0308 COMBINING
> > DIAERESIS) as separate glyphs. While that's not technically wrong, I
> > think it would be better to display them as a single glyph, in other
> > words, not distinguish between canonically equivalent Unicode strings.
>
> They are (or should be) displayed as a composed glyph if you are using
> a font that supports both a and COMBINING DIAERESIS.  Emacs cannot
> compose characters that aren't supported by the same font (because
> composition processing stops at face boundaries, and each font defines
> internally a separate face).

Interesting. Indeed the two glyphs come from different fonts. Is there
a way to force a single font for both of them? Or should the algorithm
be changed to perform composition before font selection?




* Re: Display of decomposed characters
  2020-12-25 17:14   ` Philipp Stephani
@ 2020-12-25 19:01     ` Eli Zaretskii
  2021-01-24 18:58       ` Philipp Stephani
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2020-12-25 19:01 UTC
  To: help-gnu-emacs

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Fri, 25 Dec 2020 18:14:31 +0100
> Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
> 
> > They are (or should be) displayed as a composed glyph if you are using
> > a font that supports both a and COMBINING DIAERESIS.  Emacs cannot
> > compose characters that aren't supported by the same font (because
> > composition processing stops at face boundaries, and each font defines
> > internally a separate face).
> 
> Interesting. Indeed the two glyphs come from different fonts. Is there
> a way to force a single font for both of them?

If the default font supports the diaeresis, that will happen
automatically.  If not, then simply don't choose the default font that
doesn't support accents.

> Or should the algorithm be changed to perform composition before
> font selection?

That's (a) hard, and (b) a bad idea in general, because different
fonts generally have very different sizes of accents, and are
generally incompatible in terms of pixel dimensions of characters vs
accents.




* Re: Display of decomposed characters
  2020-12-25 19:01     ` Eli Zaretskii
@ 2021-01-24 18:58       ` Philipp Stephani
  2021-01-24 19:48         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp Stephani @ 2021-01-24 18:58 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Fri, 25 Dec 2020 at 20:02, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Fri, 25 Dec 2020 18:14:31 +0100
> > Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
> >
> > > They are (or should be) displayed as a composed glyph if you are using
> > > a font that supports both a and COMBINING DIAERESIS.  Emacs cannot
> > > compose characters that aren't supported by the same font (because
> > > composition processing stops at face boundaries, and each font defines
> > > internally a separate face).
> >
> > Interesting. Indeed the two glyphs come from different fonts. Is there
> > a way to force a single font for both of them?
>
> If the default font supports the diaeresis, that will happen
> automatically.  If not, then simply don't choose the default font that
> doesn't support accents.

The font will always support the composite variant (because it's part
of Latin-1). I guess fonts assume that applications will first try to
normalize strings to avoid issues like this?

>
> > Or should the algorithm be changed to perform composition before
> > font selection?
>
> That's (a) hard, and (b) a bad idea in general, because different
> fonts generally have very different sizes of accents, and are
> generally incompatible in terms of pixel dimensions of characters vs
> accents.
>

But the situation here is that all characters should be contained in
the default font *except* the combining diaeresis. So normalizing the
string to the composite form before trying to display it would
increase compatibility between the characters.
Does it ever make sense to pick different fonts for a base character
and its combining characters? Wouldn't that fundamentally prevent
using combining characters? IIUC text rendering engines should be able
to pick the right glyph if that didn't happen (assuming they can
perform Unicode normalization).




* Re: Display of decomposed characters
  2021-01-24 18:58       ` Philipp Stephani
@ 2021-01-24 19:48         ` Eli Zaretskii
  2021-01-24 19:57           ` Eli Zaretskii
  2021-02-28 18:10           ` Philipp
  0 siblings, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-01-24 19:48 UTC
  To: help-gnu-emacs

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 24 Jan 2021 19:58:18 +0100
> Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
> 
> > If the default font supports the diaeresis, that will happen
> > automatically.  If not, then simply don't choose the default font that
> > doesn't support accents.
> 
> The font will always support the composite variant (because it's part
> of Latin-1).

That is only relevant if Emacs decides to compose the characters.
Then, and only then, will it ask the text-shaping engine to produce
glyphs for the base character and the accent together, and then the
font could provide a single precomposed glyph for them.

> I guess fonts assume that applications will first try to normalize
> strings to avoid issues like this?

Normalizing strings before you know whether the font has the
precomposed glyphs makes no sense.

What the text-shaping folks tell us is that we should pass _all_ the
text through the text shaper, then the shaper will DTRT in every
case.  But this would mean a thorough redesign and reimplementation of
how we do that in Emacs, and that is not easy if we want to keep the
current flexibility and customizability (which is why the character
composition code calls out to Lisp, and that makes sending all the
text that way tool expensive to be practical).

> Does it ever make sense to pick different fonts for a base character
> and its combining characters?

If the default font doesn't support the combining accent, what else
can you do?  Most fonts don't have precomposed glyphs for every
arbitrary sequence of base character followed by several combining
accents.  So sometimes you will have to compose the accents "by hand",
and that is not really possible if they come from different fonts.

> Wouldn't that fundamentally prevent using combining characters? IIUC
> text rendering engines should be able to pick the right glyph if
> that didn't happen (assuming they can perform Unicode
> normalization).

Unicode normalization is only tangentially relevant here.




* Re: Display of decomposed characters
  2021-01-24 19:48         ` Eli Zaretskii
@ 2021-01-24 19:57           ` Eli Zaretskii
  2021-02-28 18:10           ` Philipp
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-01-24 19:57 UTC
  To: help-gnu-emacs

> Date: Sun, 24 Jan 2021 21:48:07 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> 
> text that way tool expensive to be practical).
                ^^^^^^^^^^^^^^
Should be "too expensive", of course.  Sorry.




* Re: Display of decomposed characters
  2021-01-24 19:48         ` Eli Zaretskii
  2021-01-24 19:57           ` Eli Zaretskii
@ 2021-02-28 18:10           ` Philipp
  2021-02-28 18:42             ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: Philipp @ 2021-02-28 18:10 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs



> On 24 Jan 2021, at 20:48, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Philipp Stephani <p.stephani2@gmail.com>
>> Date: Sun, 24 Jan 2021 19:58:18 +0100
>> Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
>> 
>>> If the default font supports the diaeresis, that will happen
>>> automatically.  If not, then simply don't choose the default font that
>>> doesn't support accents.
>> 
>> The font will always support the composite variant (because it's part
>> of Latin-1).
> 
> That is only relevant if Emacs decides to compose the characters.
> Then, and only then, will it ask the text-shaping engine to produce
> glyphs for the base character and the accent together, and then the
> font could provide a single precomposed glyph for them.

So in this case the decision to not compose the characters is incorrect or happens too early?

> 
>> I guess fonts assume that applications will first try to normalize
>> strings to avoid issues like this?
> 
> Normalizing strings before you know whether the font has the
> precomposed glyphs makes no sense.

Why? If the font doesn’t support a precomposed character, wouldn’t the rendering engine automatically fall back to a decomposed representation? (Serious question; I don’t know whether Harfbuzz does that.) IOW, would normalizing strings to NFC before sending them to the rendering engine ever break anything?

> 
> What the text-shaping folks tell us is that we should pass _all_ the
> text through the text shaper, then the shaper will DTRT in every
> case.  But this would mean a thorough redesign and reimplementation of
> how we do that in Emacs, and that is not easy if we want to keep the
> current flexibility and customizability (which is why the character
> composition code calls out to Lisp, and that makes sending all the
> text that way tool expensive to be practical).

Would it be possible to implement a more minimal change to fix the problem at hand?

> 
>> Does it ever make sense to pick different fonts for a base character
>> and its combining characters?
> 
> If the default font doesn't support the combining accent, what else
> can you do?  Most fonts don't have precomposed glyphs for every
> arbitrary sequence of base character followed by several combining
> accents.  So sometimes you will have to compose the accents "by hand",
> and that is not really possible if they come from different fonts.

Which is why they shouldn’t come from different fonts. What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?

> 
>> Wouldn't that fundamentally prevent using combining characters? IIUC
>> text rendering engines should be able to pick the right glyph if
>> that didn't happen (assuming they can perform Unicode
>> normalization).
> 
> Unicode normalization is only tangentially relevant here.
> 

Sure, but in this case it would fix the problem AFAICS.





* Re: Display of decomposed characters
  2021-02-28 18:10           ` Philipp
@ 2021-02-28 18:42             ` Eli Zaretskii
  2021-03-18 14:16               ` Philipp
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-02-28 18:42 UTC
  To: help-gnu-emacs

> From: Philipp <p.stephani2@gmail.com>
> Date: Sun, 28 Feb 2021 19:10:57 +0100
> Cc: help-gnu-emacs@gnu.org
> 
> >> The font will always support the composite variant (because it's part
> >> of Latin-1).
> > 
> > That is only relevant if Emacs decides to compose the characters.
> > Then, and only then, will it ask the text-shaping engine to produce
> > glyphs for the base character and the accent together, and then the
> > font could provide a single precomposed glyph for them.
> 
> So in this case the decision to not compose the characters is incorrect or happens too early?

That's one way of looking at the issue.  But it will lead you to the
conclusion that Emacs should send all the text it displays through the
shaping engine, which with the current design of how this stuff works
in Emacs will be much slower than what we have.  IOW, doing something
like that requires redesign of how we display text.

> >> I guess fonts assume that applications will first try to normalize
> >> strings to avoid issues like this?
> > 
> > Normalizing strings before you know whether the font has the
> > precomposed glyphs makes no sense.
> 
> Why? If the font doesn’t support a precomposed character, wouldn’t
> the rendering engine automatically fall back to a decomposed
> representation?

No.  How can it?

The fallback is in the composition code, not in the renderer.  The
latter just lays out the glyphs that it gets from the composition
code.  (Assuming that when you say "rendering engine" you mean the
part in the Emacs display code which handles layout.)

IOW, there's no "font doesn't support" in Emacs.  It works like this:

  . we check whether the current character should compose with the
    following and/or preceding ones
    . if it should compose, then:
      . pass the chunk of text that should compose to the shaping
        engine (e.g., HarfBuzz)
      . if the shaping engine succeeds, render the glyphs it returns
    . otherwise render the original character "normally", i.e. without
      consulting the shaping engine

(The above omits some secondary details in the interests of clarity.)
The "otherwise" part is the fallback you alluded to.  As you see, we
never ask the font, we only talk to the shaping engine.
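
The steps above can be sketched as a toy model in Python.  This is
not the Emacs implementation: the function names, the rule for
deciding what should compose, and the stand-in "shaper" are all
assumptions made for illustration.

```python
import unicodedata


def chunk_to_compose(text, i):
    """Toy rule: a base character followed by combining marks should compose.

    Returns the length of the chunk starting at i, or 0 for "do not compose".
    """
    if unicodedata.combining(text[i]) != 0:
        return 0
    j = i + 1
    while j < len(text) and unicodedata.combining(text[j]) != 0:
        j += 1
    return j - i if j - i > 1 else 0


def display(text, shape):
    """Walk the text; shape(chunk) returns a list of glyphs, or None on failure."""
    out = []
    i = 0
    while i < len(text):
        n = chunk_to_compose(text, i)
        glyphs = shape(text[i:i + n]) if n > 0 else None
        if glyphs is not None:
            # The shaping engine succeeded: render what it returned.
            out.append(("composed", glyphs))
            i += n
        else:
            # Fallback: render the character "normally", without the shaper.
            out.append(("plain", text[i]))
            i += 1
    return out


def toy_shaper(chunk):
    """Stand-in for a shaping engine; only knows one hypothetical glyph."""
    return {"a\u0308": ["adieresis"]}.get(chunk)


def failing_shaper(chunk):
    """Stand-in for a shaper that cannot handle the chunk."""
    return None


print(display("a\u0308b", toy_shaper))
print(display("a\u0308b", failing_shaper))
```

Note that in this model, as in the description, the font is never
queried directly; the only question asked is whether shaping succeeds.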

> IOW, would normalizing strings to NFC before sending them to the rendering engine ever break anything?

Yes, it might.  Shaping engines don't usually decompose characters if
they get codepoints of precomposed ones.

Moreover, some precomposed glyphs don't even have codepoints, so you
cannot even ask the shaper to produce them by passing it a precomposed
character in that case -- such a character doesn't exist.
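
A related point can be checked at the code point level with a short
Python sketch (an illustration only, and the helper name is made up):
NFC can only compose sequences for which Unicode defines a precomposed
character, so some base-plus-accent pairs stay decomposed no matter
what.

```python
import unicodedata


def nfc_length(base, mark="\u0308"):
    """Number of code points left after NFC-normalizing base + combining mark."""
    return len(unicodedata.normalize("NFC", base + mark))


# "a" + U+0308 has a precomposed form (U+00E4); "q" + U+0308 has none.
for base in ("a", "q"):
    print(base, nfc_length(base))
```

So even unconditional normalization to NFC would still leave some
sequences that need composition at display time.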

> > What the text-shaping folks tell us is that we should pass _all_ the
> > text through the text shaper, then the shaper will DTRT in every
> > case.  But this would mean a thorough redesign and reimplementation of
> > how we do that in Emacs, and that is not easy if we want to keep the
> > current flexibility and customizability (which is why the character
> > composition code calls out to Lisp, and that makes sending all the
> > text that way tool expensive to be practical).
> 
> Would it be possible to implement a more minimal change to fix the problem at hand?

Like what?  (And why are we discussing such an issue on the help
list?)

> >> Does it ever make sense to pick different fonts for a base character
> >> and its combining characters?
> > 
> > If the default font doesn't support the combining accent, what else
> > can you do?  Most fonts don't have precomposed glyphs for every
> > arbitrary sequence of base character followed by several combining
> > accents.  So sometimes you will have to compose the accents "by hand",
> > and that is not really possible if they come from different fonts.
> 
> Which is why they shouldn’t come from different fonts. What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?

What would that produce if the font of the previous character didn't
have a glyph for the accent?  The accent will disappear, or maybe will
be displayed as "tofu", right?  Does that sound like a good strategy?

> >> Wouldn't that fundamentally prevent using combining characters? IIUC
> >> text rendering engines should be able to pick the right glyph if
> >> that didn't happen (assuming they can perform Unicode
> >> normalization).
> > 
> > Unicode normalization is only tangentially relevant here.
> 
> > Sure, but in this case it would fix the problem AFAICS.

Sorry, I no longer understand what this was about (what does "that"
allude to here?).  That's bound to happen when a response comes more
than a month after the original exchange.




* Re: Display of decomposed characters
  2021-02-28 18:42             ` Eli Zaretskii
@ 2021-03-18 14:16               ` Philipp
  2021-03-18 14:37                 ` Philipp
  2021-03-18 15:01                 ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Philipp @ 2021-03-18 14:16 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs



> On 28 Feb 2021, at 19:42, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>>>> I guess fonts assume that applications will first try to normalize
>>>> strings to avoid issues like this?
>>> 
>>> Normalizing strings before you know whether the font has the
>>> precomposed glyphs makes no sense.
>> 
>> Why? If the font doesn’t support a precomposed character, wouldn’t
>> the rendering engine automatically fall back to a decomposed
>> representation?
> 
> No.  How can it?
> 
> The fallback is in the composition code, not in the renderer.  The
> latter just lays out the glyphs that it gets from the composition
> code.  (Assuming that when you say "rendering engine" you mean the
> part in the Emacs display code which handles layout.)

What I mean is Harfbuzz (given your comment below, apparently the more correct term is "shaping engine").

> 
> IOW, there's no "font doesn't support" in Emacs.  It works like this:
> 
>  . we check whether the current character should compose with the
>    following and/or preceding ones

Is my understanding right that this is the step that comes too late, i.e. after font selection?  Otherwise I'd assume that the answer is always "yes" if the current character is a combining character.

>    . if it should compose, then:
>      . pass the chunk of text that should compose to the shaping
>        engine (e.g., HarfBuzz)
>      . if the shaping engine succeeds, render the glyphs it returns
>    . otherwise render the original character "normally", i.e. without
>      consulting the shaping engine
> 
> (The above omits some secondary details in the interests of clarity.)
> The "otherwise" part is the fallback you alluded to.  As you see, we
> never ask the font, we only talk to the shaping engine.

Hmm.  If these steps all happen before font selection, then I'm wondering where the problem comes from.
Or do they happen after font selection?

> 
>> IOW, would normalizing strings to NFC before sending them to the rendering engine ever break anything?
> 
> Yes, it might.  Shaping engines don't usually decompose characters if
> they get codepoints of precomposed ones.
> 
> Moreover, some precomposed glyphs don't even have codepoints, so you
> cannot even ask the shaper to produce them by passing it a precomposed
> character in that case -- such a character doesn't exist.

OK, so I guess we then definitely can't precompose unconditionally.

> 
>>> What the text-shaping folks tell us is that we should pass _all_ the
>>> text through the text shaper, then the shaper will DTRT in every
>>> case.  But this would mean a thorough redesign and reimplementation of
>>> how we do that in Emacs, and that is not easy if we want to keep the
>>> current flexibility and customizability (which is why the character
>>> composition code calls out to Lisp, and that makes sending all the
>>> text that way tool expensive to be practical).
>> 
>> Would it be possible to implement a more minimal change to fix the problem at hand?
> 
> Like what?

What I'd propose would be to perform font selection after the "compose/no-compose" decision.

>  (And why are we discussing such an issue on the help
> list?)

I'd first wanted to check whether this is actually a bug before filing a formal report, but I'll do that now.

> 
>>>> Does it ever make sense to pick different fonts for a base character
>>>> and its combining characters?
>>> 
>>> If the default font doesn't support the combining accent, what else
>>> can you do?  Most fonts don't have precomposed glyphs for every
>>> arbitrary sequence of base character followed by several combining
>>> accents.  So sometimes you will have to compose the accents "by hand",
>>> and that is not really possible if they come from different fonts.
>> 
>> Which is why they shouldn’t come from different fonts. What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
> 
> What would that produce if the font of the previous character didn't
> have a glyph for the accent?  The accent will disappear, or maybe will
> be displayed as "tofu", right?  Does that sound like a good strategy?

Can't the shaping engine produce fake compositions in that case?

> 
>>>> Wouldn't that fundamentally prevent using combining characters? IIUC
>>>> text rendering engines should be able to pick the right glyph if
>>>> that didn't happen (assuming they can perform Unicode
>>>> normalization).
>>> 
>>> Unicode normalization is only tangentially relevant here.
>> 
>> Sure, but in this case it would fix the problem AFAICS.
> 
> Sorry, I no longer understand what this was about (what does "that"
> allude to here?).

'That' refers to "pick different fonts for a base character
and its combining characters".

>  That's bound to happen when a response comes more
> than a month after the original exchange.

Yes, but unfortunately answering these questions takes some time, which I don't always have.  I'll try to respond in a more timely manner in the future, but I can't really promise that.





* Re: Display of decomposed characters
  2021-03-18 14:16               ` Philipp
@ 2021-03-18 14:37                 ` Philipp
  2021-03-18 15:01                 ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Philipp @ 2021-03-18 14:37 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs



> On 18 Mar 2021, at 15:16, Philipp <p.stephani2@gmail.com> wrote:
> 
> 
>> (And why are we discussing such an issue on the help
>> list?)
> 
> I'd first wanted to check whether this is actually a bug before filing a formal report, but I'll do that now.

Filed https://debbugs.gnu.org/cgi/bugreport.cgi?bug=47235.





* Re: Display of decomposed characters
  2021-03-18 14:16               ` Philipp
  2021-03-18 14:37                 ` Philipp
@ 2021-03-18 15:01                 ` Eli Zaretskii
  2021-03-19 16:37                   ` Philipp
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-03-18 15:01 UTC
  To: help-gnu-emacs

> From: Philipp <p.stephani2@gmail.com>
> Date: Thu, 18 Mar 2021 15:16:42 +0100
> Cc: help-gnu-emacs@gnu.org
> 
> >  . we check whether the current character should compose with the
> >    following and/or preceding ones
> 
> Is my understanding right that this is the step that comes too late, i.e. after font selection?

It comes after the font selection, yes.  And it cannot be any other
way, because the shaping engine must have the font to return any
meaningful results.  The results of text shaping depend heavily on the
font and its capabilities and features it supports.

> Otherwise I'd assume that the answer is always "yes" if the current character is a combining character.

Not only combining characters should be composed.  In fact, in Emacs
you can compose anything with anything else by tweaking a Lisp data
structure.

> >> What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
> > 
> > What would that produce if the font of the previous character didn't
> > have a glyph for the accent?  The accent will disappear, or maybe will
> > be displayed as "tofu", right?  Does that sound like a good strategy?
> 
> Can't the shaping engine produce fake compositions in that case?

What do you mean by "fake compositions"?  What would they entail, and
which glyphs would they use?

> >  That's bound to happen when a response comes more
> > than a month after the original exchange.
> 
> Yes, but unfortunately answering these questions takes some time, which I don't always have.  I'll try to respond in a more timely manner in the future, but I can't really promise that.

You don't have to promise, but you must understand that such long
pauses almost guarantee that misunderstandings are more frequent.




* Re: Display of decomposed characters
  2021-03-18 15:01                 ` Eli Zaretskii
@ 2021-03-19 16:37                   ` Philipp
  2021-03-19 16:44                     ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp @ 2021-03-19 16:37 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs



> On 18 Mar 2021, at 16:01, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Philipp <p.stephani2@gmail.com>
>> Date: Thu, 18 Mar 2021 15:16:42 +0100
>> Cc: help-gnu-emacs@gnu.org
>> 
>>> . we check whether the current character should compose with the
>>>   following and/or preceding ones
>> 
>> Is my understanding right that this is the step that comes too late, i.e. after font selection?
> 
> It comes after the font selection, yes.  And it cannot be any other
> way, because the shaping engine must have the font to return any
> meaningful results.  The results of text shaping depend heavily on the
> font and its capabilities and features it supports.

I get that, I'm just saying that in this case it leads to a suboptimal outcome.

> 
>>>> What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
>>> 
>>> What would that produce if the font of the previous character didn't
>>> have a glyph for the accent?  The accent will disappear, or maybe will
>>> be displayed as "tofu", right?  Does that sound like a good strategy?
>> 
>> Can't the shaping engine produce fake compositions in that case?
> 
> What do you mean by "fake compositions"?  What would they entail, and
> which glyphs would they use?

For example, the shaping engine could use U+00A8 (assuming it's available in the font), but place it on top of the base glyph, without horizontal shift.  (At least that would be a possibility; I don't know whether Harfbuzz actually does that.)
That would still produce suboptimal results, but probably slightly better ones.

The optimal approach (for this case) would still be to try out composition before font selection, and use that if it works.

I should note that Emacs is not alone in producing suboptimal results in this case; other GUI programs on that system appear to either perform the fake composition I mentioned before, or no composition at all (placing the base and combining characters next to each other).



* Re: Display of decomposed characters
  2021-03-19 16:37                   ` Philipp
@ 2021-03-19 16:44                     ` Eli Zaretskii
  2021-03-21 11:43                       ` Philipp
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-03-19 16:44 UTC
  To: help-gnu-emacs

> From: Philipp <p.stephani2@gmail.com>
> Date: Fri, 19 Mar 2021 17:37:31 +0100
> Cc: help-gnu-emacs@gnu.org
> 
> >>>> What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
> >>> 
> >>> What would that produce if the font of the previous character didn't
> >>> have a glyph for the accent?  The accent will disappear, or maybe will
> >>> be displayed as "tofu", right?  Does that sound like a good strategy?
> >> 
> >> Can't the shaping engine produce fake compositions in that case?
> > 
> > What do you mean by "fake compositions"?  What would they entail, and
> > which glyphs would they use?
> 
> For example, the shaping engine could use U+00A8 (assuming it's available in the font), but place it on top of the base glyph, without horizontal shift.

First, we were talking about the case where U+00A8 is NOT available in
the font.  (If it _is_ available, then this whole discussion is
pointless, because things already work well in that case.)

> (At least that would be a possibility; I don't know whether Harfbuzz actually does that.)
> That would still produce suboptimal results, but probably slightly better ones.

I don't understand what you are describing here.  If the font does
have U+00A8, what you describe already happens.  If the font doesn't
have the glyph, what can the shaper do?

The horizontal shift happens because we use U+00A8 from a different
font.  Placing a glyph from a different font on top of a base glyph is
in general an impossible task, because different fonts have different
ways of describing the points where the accents shall be placed on top
of base characters.

> The optimal approach (for this case) would still be to try out composition before font selection, and use that if it works.

I tried to explain why that's not possible, but I guess I failed
miserably.

> I should note that Emacs is not alone in producing suboptimal results in this case; other GUI programs on that system appear to either perform the fake composition I mentioned before, or no composition at all (placing the base and combining characters next to each other).

Which should tell us something about the issue and the ways it can and
cannot be solved.




* Re: Display of decomposed characters
  2021-03-19 16:44                     ` Eli Zaretskii
@ 2021-03-21 11:43                       ` Philipp
  2021-03-21 12:10                         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Philipp @ 2021-03-21 11:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs



> On 19.03.2021 at 17:44, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Philipp <p.stephani2@gmail.com>
>> Date: Fri, 19 Mar 2021 17:37:31 +0100
>> Cc: help-gnu-emacs@gnu.org
>> 
>>>>>> What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
>>>>> 
>>>>> What would that produce if the font of the previous character didn't
>>>>> have a glyph for the accent?  The accent will disappear, or maybe will
>>>>> be displayed as "tofu", right?  Does that sound like a good strategy?
>>>> 
>>>> Can't the shaping engine produce fake compositions in that case?
>>> 
>>> What do you mean by "fake compositions"? what would they entail, and
>>> which glyphs would they use?
>> 
>> For example, the shaping engine could use U+00A8 (assuming it's available in the font), but place it on top of the base glyph, without horizontal shift.
> 
> First, we were talking about the case where U+00A8 is NOT available in
> the font.  (If it _is_ available, then this whole discussion is
> pointless, because things already work well in that case.)

No, the case is that U+00A8 (the spacing diaeresis) is available, but U+0308 (the combining diaeresis) is not.
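
The difference between the two code points can be checked from the Unicode character database.  The following is only a sketch using Python's standard unicodedata module (not Emacs or HarfBuzz); the helper name describe is made up for illustration:

```python
import unicodedata

SPACING = "\u00a8"    # DIAERESIS (stands on its own, takes up space)
COMBINING = "\u0308"  # COMBINING DIAERESIS (attaches to the preceding base)

def describe(ch):
    """Return (name, general category, canonical combining class) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

for ch in (SPACING, COMBINING):
    print("U+%04X" % ord(ch), describe(ch))
```

The spacing character is a modifier symbol with combining class 0, while the combining one is a nonspacing mark with a nonzero combining class, so a font may well cover one and not the other.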

> 
>> (At least that would be a possibility; I don't know whether Harfbuzz actually does that.)
>> That would still produce suboptimal results, but probably slightly better ones.
> 
> I don't understand what you are describing here.  If the font does
> have U+00A8, what you describe already happens.  If the font doesn't
> have the glyph, what can the shaper do?

See above; here I'm assuming that U+0308 is unavailable, not U+00A8.

> 
> The horizontal shift happens because we use U+00A8 from a different
> font.  Placing a glyph from a different font on top of a base glyph is
> in general an impossible task, because different fonts have different
> ways of describing the points where the accents shall be placed on top
> of base characters.

Yes, but a fallback option where the two glyphs would just be centered horizontally on top of each other would be at least thinkable.  It wouldn't give great results, but I wouldn't call it impossible.

> 
>> The optimal approach (for this case) would still be to try out composition before font selection, and use that if it works.
> 
> I tried to explain why that's not possible, but I guess I failed
> miserably.

At least I'm not convinced.  Surely it's possible to call ucs-normalize-NFC-string before selecting fonts or sending a combined character sequence to the shaping engine; it produces optimal results in this case (I've tried it), and https://lists.freedesktop.org/archives/harfbuzz/2011-July/001426.html appears to talk about something very similar.  The question is rather whether this normalization would cause more problems than it fixes; at least the HarfBuzz approach shouldn't.
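
To illustrate what the normalization step does to the sequence from the original report, here is a minimal sketch.  It assumes Python's standard unicodedata module as a stand-in for ucs-normalize-NFC-string, and the function name compose_for_display is hypothetical; it says nothing about how Emacs would have to integrate such a step with font selection:

```python
import unicodedata

def compose_for_display(text):
    """Return the NFC form of text (a stand-in for the proposed pre-pass)."""
    return unicodedata.normalize("NFC", text)

decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS
precomposed = compose_for_display(decomposed)

print([hex(ord(c)) for c in decomposed])
print([hex(ord(c)) for c in precomposed])
```

After composition the text is a single code point (U+00E4), so only one glyph has to be found in the font; decomposing again with NFD gives back the original two code points.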

> 
>> I should note that Emacs is not alone in producing suboptimal results in this case; other GUI programs on that system appear to either perform the fake composition I mentioned before, or no composition at all (placing the base and combining characters next to each other).
> 
> Which should tell us something about the issue and the ways it can and
> cannot be solved.
> 

It tells us that it's a difficult problem, as text rendering is in general.  But it's not unsolvable: for example, I just tried Firefox and Google Chrome, and both produce optimal results.





* Re: Display of decomposed characters
  2021-03-21 11:43                       ` Philipp
@ 2021-03-21 12:10                         ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-03-21 12:10 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Philipp <p.stephani2@gmail.com>
> Date: Sun, 21 Mar 2021 12:43:47 +0100
> Cc: help-gnu-emacs@gnu.org
> 
> >> For example, the shaping engine could use U+00A8 (assuming it's available in the font), but place it on top of the base glyph, without horizontal shift.
> > 
> > First, we were talking about the case where U+00A8 is NOT available in
> > the font.  (If it _is_ available, then this whole discussion is
> > pointless, because things already work well in that case.)
> 
> No, the case is that U+00A8 (the spacing diaeresis) is available, but U+0308 (the combining diaeresis) is not.

But that's a completely different character, with completely
different metrics.  The results of superimposing them may well be
illegible.  It is certainly not what the author of the text intended.

And then what to do about diacritics which don't have such
counterparts?

> > The horizontal shift happens because we use U+00A8 from a different
> > font.  Placing a glyph from a different font on top of a base glyph is
> > in general an impossible task, because different fonts have different
> > ways of describing the points where the accents shall be placed on top
> > of base characters.
> 
> Yes, but a fallback option where the two glyphs would just be centered horizontally on top of each other would be at least thinkable.  It wouldn't give great results, but I wouldn't call it impossible.

It could be illegible.  The two dots could become located on some part
of the base character, for example.  Think lower-case and upper-case
base characters.
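
The arithmetic behind this objection can be sketched as follows.  All numbers and names here (Box, centered_x_offset, collides) are invented for illustration and do not come from any real font or from HarfBuzz; the point is only that horizontal centering is easy while the vertical position borrowed from another font is not adapted to the base glyph:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical ink box of a glyph, in font units."""
    width: int   # horizontal extent of the glyph
    bottom: int  # y coordinate of the lowest ink
    top: int     # y coordinate of the highest ink

def centered_x_offset(base, mark):
    """Shift that centers the mark horizontally over the base."""
    return (base.width - mark.width) / 2

def collides(base, mark):
    """True if the mark's ink starts below the top of the base's ink."""
    return mark.bottom < base.top

# Made-up metrics: the spacing diaeresis sits at a fixed height in its
# own font, regardless of which base glyph it is placed over.
lower_a = Box(width=500, bottom=0, top=480)
upper_a = Box(width=650, bottom=0, top=700)
diaeresis = Box(width=300, bottom=560, top=660)

for name, base in (("a", lower_a), ("A", upper_a)):
    print(name, centered_x_offset(base, diaeresis), collides(base, diaeresis))
```

With these made-up metrics the dots clear the lower-case letter but land inside the upper-case one, which is the kind of result described above.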

> >> The optimal approach (for this case) would still be to try out composition before font selection, and use that if it works.
> > 
> > I tried to explain why that's not possible, but I guess I failed
> > miserably.
> 
> At least I'm not convinced.  Surely it's possible to call ucs-normalize-NFC-string before selecting fonts or sending a combined character sequence to the shaping engine; it produces optimal results in this case (I've tried it), and https://lists.freedesktop.org/archives/harfbuzz/2011-July/001426.html appears to talk about something very similar.  The question is rather whether this normalization would cause more problems than it fixes; at least the HarfBuzz approach shouldn't.

Once again: (a) HarfBuzz folks (who are better text-shaping experts
than you and me combined) tell us this is the job of the shaping
engine, in particular because the shaper can handle the codepoints in
any order, not just the canonical order; (b) what about sequences
where NFC produces nothing (because the precomposed character doesn't
exist)?
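
Both points can be seen with a small sketch.  It assumes Python's standard unicodedata module rather than Emacs's ucs-normalize, and the helper name nfc_length is hypothetical:

```python
import unicodedata

def nfc_length(text):
    """Number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

# (b) "a" + COMBINING DIAERESIS has a precomposed form; "q" + COMBINING
# DIAERESIS does not, so NFC leaves it as two code points.
with_precomposed = "a\u0308"
without_precomposed = "q\u0308"
print(nfc_length(with_precomposed), nfc_length(without_precomposed))

# (a) Marks can be typed in non-canonical order.  Normalization reorders
# them by combining class: DOT BELOW (220) sorts before DIAERESIS (230).
typed = "a\u0308\u0323"
canonical = unicodedata.normalize("NFD", typed)
print([hex(ord(c)) for c in canonical])
```

For the "q" sequence the display code still has to position a combining mark over a base glyph, so normalization alone does not remove the need for the shaper.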

> >> I should note that Emacs is not alone in producing suboptimal results in this case; other GUI programs on that system appear to either perform the fake composition I mentioned before, or no composition at all (placing the base and combining characters next to each other).
> > 
> > Which should tell us something about the issue and the ways it can and
> > cannot be solved.
> 
> It tells us that it's a difficult problem, as text rendering is in general.  But it's not unsolvable: for example, I just tried Firefox and Google Chrome, and both produce optimal results.

But that doesn't mean they do what you propose we should do.





Thread overview: 17+ messages
2020-12-23 10:05 Display of decomposed characters Philipp Stephani
2020-12-23 13:00 ` Janusz S. Bień
2020-12-23 15:44 ` Eli Zaretskii
2020-12-25 17:14   ` Philipp Stephani
2020-12-25 19:01     ` Eli Zaretskii
2021-01-24 18:58       ` Philipp Stephani
2021-01-24 19:48         ` Eli Zaretskii
2021-01-24 19:57           ` Eli Zaretskii
2021-02-28 18:10           ` Philipp
2021-02-28 18:42             ` Eli Zaretskii
2021-03-18 14:16               ` Philipp
2021-03-18 14:37                 ` Philipp
2021-03-18 15:01                 ` Eli Zaretskii
2021-03-19 16:37                   ` Philipp
2021-03-19 16:44                     ` Eli Zaretskii
2021-03-21 11:43                       ` Philipp
2021-03-21 12:10                         ` Eli Zaretskii
