From: Philipp <p.stephani2@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: help-gnu-emacs@gnu.org
Subject: Re: Display of decomposed characters
Date: Thu, 18 Mar 2021 15:16:42 +0100
Message-ID: <CB140634-ED75-4A27-ADA8-A77F3BF00530@gmail.com>
In-Reply-To: <83pn0k825c.fsf@gnu.org>



> On 28.02.2021 at 19:42, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> 
>>>> I guess fonts assume that applications will first try to normalize
>>>> strings to avoid issues like this?
>>> 
>>> Normalizing strings before you know whether the font has the
>>> precomposed glyphs makes no sense.
>> 
>> Why? If the font doesn’t support a precomposed character, wouldn’t
>> the rendering engine automatically fall back to a decomposed
>> representation?
> 
> No.  How can it?
> 
> The fallback is in the composition code, not in the renderer.  The
> latter just lays out the glyphs that it gets from the composition
> code.  (Assuming that when you say "rendering engine" you mean the
> part in the Emacs display code which handles layout.)

What I mean is HarfBuzz (given your comment below, the more accurate term is apparently "shaping engine").

> 
> IOW, there's no "font doesn't support" in Emacs.  It works like this:
> 
>  . we check whether the current character should compose with the
>    following and/or preceding ones

Do I understand correctly that this is the step that comes too late, i.e. after font selection?  Otherwise I'd assume that the answer is always "yes" whenever the current character is a combining character.

>    . if it should compose, then:
>      . pass the chunk of text that should compose to the shaping
>        engine (e.g., HarfBuzz)
>      . if the shaping engine succeeds, render the glyphs it returns
>    . otherwise render the original character "normally", i.e. without
>      consulting the shaping engine
> 
> (The above omits some secondary details in the interests of clarity.)
> The "otherwise" part is the fallback you alluded to.  As you see, we
> never ask the font, we only talk to the shaping engine.

Hmm.  If these steps all happen before font selection, then I'm wondering where the problem comes from.
Or do they happen after font selection?
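
To make the quoted steps concrete, here is a minimal Python sketch of that control flow. It is not Emacs code: `should_compose`, `chunk_end`, `display`, and the `shape` callback are hypothetical names, and the composition check is reduced to "is a combining mark involved", which is an assumption made for illustration only. The point is just that the fallback lives in the composition step, and that the shaper is only consulted for chunks that are candidates for composition.

```python
import unicodedata


def should_compose(text, i):
    """Hypothetical composition check: true if text[i] is a combining
    mark or is immediately followed by one."""
    if unicodedata.combining(text[i]):
        return True
    return i + 1 < len(text) and unicodedata.combining(text[i + 1]) != 0


def chunk_end(text, i):
    """Return the index just past the combining marks that follow text[i]."""
    j = i + 1
    while j < len(text) and unicodedata.combining(text[j]):
        j += 1
    return j


def display(text, shape):
    """Walk the text as in the quoted steps.

    `shape` stands in for the shaping engine: it takes a chunk of text
    and returns a list of glyphs, or None if it fails.
    """
    out = []
    i = 0
    while i < len(text):
        if should_compose(text, i):
            j = chunk_end(text, i)
            glyphs = shape(text[i:j])
            if glyphs is not None:
                out.append(("shaped", glyphs))
                i = j
                continue
        # The "otherwise" branch: render the original character
        # without consulting the shaping engine.
        out.append(("plain", text[i]))
        i += 1
    return out
```

Calling `display("e\u0301x", shape)` with a shaper that always fails yields three "plain" entries, one per character, which corresponds to the fallback described in the quoted text.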

> 
>> IOW, would normalizing strings to NFC before sending them to the rendering engine ever break anything?
> 
> Yes, it might.  Shaping engines don't usually decompose characters if
> they get codepoints of precomposed ones.
> 
> Moreover, some precomposed glyphs don't even have codepoints, so you
> cannot even ask the shaper to produce them by passing it a precomposed
> character in that case -- such a character doesn't exist.

OK, so I guess we then definitely can't precompose unconditionally.
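
As a small illustration of why NFC cannot help in every case, the Python snippet below uses the standard `unicodedata` module to normalize two sequences. The helper name `nfc_codepoints` is made up for this example. Note that it only shows which sequences have a precomposed code point in Unicode; whether a given font has a glyph for either form is a separate question that this snippet does not answer.

```python
import unicodedata


def nfc_codepoints(s):
    """Return the code points of s after NFC normalization."""
    composed = unicodedata.normalize("NFC", s)
    return [f"U+{ord(c):04X}" for c in composed]


# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9).
print(nfc_codepoints("e\u0301"))  # ['U+00E9']

# "q" + COMBINING ACUTE ACCENT has no precomposed code point,
# so NFC leaves the sequence decomposed.
print(nfc_codepoints("q\u0301"))  # ['U+0071', 'U+0301']
```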

> 
>>> What the text-shaping folks tell us is that we should pass _all_ the
>>> text through the text shaper, then the shaper will DTRT in every
>>> case.  But this would mean a thorough redesign and reimplementation of
>>> how we do that in Emacs, and that is not easy if we want to keep the
>>> current flexibility and customizability (which is why the character
>>> composition code calls out to Lisp, and that makes sending all the
>>> text that way too expensive to be practical).
>> 
>> Would it be possible to implement a more minimal change to fix the problem at hand?
> 
> Like what?

What I'd propose would be to perform font selection after the "compose/no-compose" decision.
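
A rough Python sketch of that idea, under strong simplifying assumptions: fonts are modeled as hypothetical name-to-coverage mappings (sets of code points), a "cluster" is just a base character plus the combining marks that follow it, and none of the function names correspond to anything in Emacs. Real font selection and shaping involve much more than code point coverage.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def pick_font(cluster, fonts):
    """Return the first font whose coverage includes every code point
    of the cluster, or None if no single font covers all of them."""
    for name, coverage in fonts.items():
        if all(ord(c) in coverage for c in cluster):
            return name
    return None


def select_fonts(text, fonts):
    """Decide on clusters first, then select one font per cluster."""
    return [(c, pick_font(c, fonts)) for c in clusters(text)]
```

With two hypothetical fonts, one lacking the combining acute accent and one that has it, the base character and its accent are assigned to the same font because the choice is made for the cluster as a whole. A `None` result marks the case discussed further down, where no single font covers the whole sequence.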

>  (And why are we discussing such an issue on the help
> list?)

I first wanted to check whether this is actually a bug before filing a formal report, but I'll file one now.

> 
>>>> Does it ever make sense to pick different fonts for a base character
>>>> and its combining characters?
>>> 
>>> If the default font doesn't support the combining accent, what else
>>> can you do?  Most fonts don't have precomposed glyphs for every
>>> arbitrary sequence of base character followed by several combining
>>> accents.  So sometimes you will have to compose the accents "by hand",
>>> and that is not really possible if they come from different fonts.
>> 
>> Which is why they shouldn’t come from different fonts. What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
> 
> What would that produce if the font of the previous character didn't
> have a glyph for the accent?  The accent will disappear, or maybe will
> be displayed as "tofu", right?  Does that sound like a good strategy?

Can't the shaping engine produce fake compositions in that case?

> 
>>>> Wouldn't that fundamentally prevent using combining characters? IIUC
>>>> text rendering engines should be able to pick the right glyph if
>>>> that didn't happen (assuming they can perform Unicode
>>>> normalization).
>>> 
>>> Unicode normalization is only tangentially relevant here.
>> 
>> Sure, but in this case it would fix the problem AFAICS.
> 
> Sorry, I no longer understand what this was about (what does "that"
> allude to here?).

'That' refers to "pick different fonts for a base character
and its combining characters".

>  That's bound to happen when a response comes more
> than a month after the original exchange.

Yes, but unfortunately answering these questions takes some time, which I don't always have.  I'll try to respond more promptly in the future, but I can't really promise that.



