From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Philipp
Newsgroups: gmane.emacs.help
Subject: Re: Display of decomposed characters
Date: Thu, 18 Mar 2021 15:16:42 +0100
Message-ID:
References:
<83v9csplwq.fsf@gnu.org>
<83wnx5n1zw.fsf@gnu.org>
<831rea3ymg.fsf@gnu.org> <0077B374-A65D-412D-B1A5-4ADDD50D41A7@gmail.com>
<83pn0k825c.fsf@gnu.org>
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.60.0.2.21\))
Content-Type: text/plain;
charset=utf-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
logging-data="3771"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: help-gnu-emacs@gnu.org
To: Eli Zaretskii
In-Reply-To: <83pn0k825c.fsf@gnu.org>
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Users list for the GNU Emacs text editor
Original-Sender: "help-gnu-emacs"
Xref: news.gmane.io gmane.emacs.help:128441
> On 28.02.2021 at 19:42, Eli Zaretskii wrote:
>
>>>> I guess fonts assume that applications will first try to normalize
>>>> strings to avoid issues like this?
>>>
>>> Normalizing strings before you know whether the font has the
>>> precomposed glyphs makes no sense.
>>
>> Why? If the font doesn’t support a precomposed character, wouldn’t
>> the rendering engine automatically fall back to a decomposed
>> representation?
>
> No. How can it?
>
> The fallback is in the composition code, not in the renderer. The
> latter just lays out the glyphs that it gets from the composition
> code. (Assuming that when you say "rendering engine" you mean the
> part in the Emacs display code which handles layout.)
What I mean is HarfBuzz (given your comment below, apparently the more
correct term is "shaping engine").
>
> IOW, there's no "font doesn't support" in Emacs. It works like this:
>
> . we check whether the current character should compose with the
> following and/or preceding ones
Is my understanding right that this is the step that comes too late,
i.e. after font selection? Otherwise I'd assume that the answer is
always "yes" if the current character is a combining character.
> . if it should compose, then:
> . pass the chunk of text that should compose to the shaping
> engine (e.g., HarfBuzz)
> . if the shaping engine succeeds, render the glyphs it returns
> . otherwise render the original character "normally", i.e. without
> consulting the shaping engine
>
> (The above omits some secondary details in the interests of clarity.)
> The "otherwise" part is the fallback you alluded to. As you see, we
> never ask the font, we only talk to the shaping engine.
Hmm. If these steps all happen before font selection, then I'm
wondering where the problem comes from.
Or do they happen after font selection?
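To make sure I read the steps correctly, here is a rough sketch of that flow in Python. All names (`display_chunk`, `toy_should_compose`, `toy_shaper`, the glyph strings) are made up for illustration and are not Emacs or HarfBuzz functions; the real composition code also calls out to Lisp and handles details omitted here.

```python
import unicodedata
from typing import Callable, List, Optional

# Hypothetical stand-in for a shaping engine: returns a list of glyph
# names on success, or None if it cannot shape the chunk.
Shaper = Callable[[str], Optional[List[str]]]


def display_chunk(chunk: str,
                  should_compose: Callable[[str], bool],
                  shaper: Shaper) -> List[str]:
    """Return the glyph names to render for one chunk of text."""
    if should_compose(chunk):
        glyphs = shaper(chunk)
        if glyphs is not None:
            # The shaping engine succeeded: render what it returned.
            return glyphs
    # Fallback: render each character "normally", without the shaper.
    return [f"glyph:{ord(ch):04X}" for ch in chunk]


def toy_should_compose(chunk: str) -> bool:
    # Assumption: compose whenever the chunk contains a combining mark
    # (a character with a non-zero canonical combining class).
    return any(unicodedata.combining(ch) != 0 for ch in chunk)


def toy_shaper(chunk: str) -> Optional[List[str]]:
    # Assumption: this toy shaper knows exactly one sequence.
    known = {"e\u0301": ["eacute"]}
    return known.get(chunk)
```

In this toy model the shaper knows only one sequence, so `q` followed by U+0301 falls through to the "otherwise" branch and is rendered character by character. Note that nothing in the sketch asks a font directly, which matches the description above.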
>
>> IOW, would normalizing strings to NFC before sending them to the
>> rendering engine ever break anything?
>
> Yes, it might. Shaping engines don't usually decompose characters if
> they get codepoints of precomposed ones.
>
> Moreover, some precomposed glyphs don't even have codepoints, so you
> cannot even ask the shaper to produce them by passing it a precomposed
> character in that case -- such a character doesn't exist.
OK, so I guess we then definitely can't precompose unconditionally.
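For the record, a small check of the Unicode side of this, using Python's standard `unicodedata` module. The helper name `nfc_codepoints` is my own. It only shows which sequences have a precomposed code point; it says nothing about which glyphs a particular font contains.

```python
import unicodedata


def nfc_codepoints(text: str) -> list:
    """Normalize text to NFC and list the resulting code points."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# e + COMBINING ACUTE ACCENT has a precomposed form (U+00E9).
print(nfc_codepoints("e\u0301"))  # ['U+00E9']

# q + COMBINING ACUTE ACCENT has no precomposed code point, so NFC
# leaves the sequence decomposed.
print(nfc_codepoints("q\u0301"))  # ['U+0071', 'U+0301']
```

So NFC cannot turn every base-plus-accent sequence into a single character, and any approach that relies on precomposed characters alone would still have to handle the remaining sequences.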
>
>>> What the text-shaping folks tell us is that we should pass _all_ the
>>> text through the text shaper, then the shaper will DTRT in every
>>> case. But this would mean a thorough redesign and reimplementation of
>>> how we do that in Emacs, and that is not easy if we want to keep the
>>> current flexibility and customizability (which is why the character
>>> composition code calls out to Lisp, and that makes sending all the
>>> text that way too expensive to be practical).
>>
>> Would it be possible to implement a more minimal change to fix the
>> problem at hand?
>
> Like what?
What I'd propose would be to perform font selection after the
"compose/no-compose" decision.
> (And why are we discussing such an issue on the help
> list?)
I first wanted to check whether this is actually a bug before filing a
formal report, but I'll do that now.
>
>>>> Does it ever make sense to pick different fonts for a base character
>>>> and its combining characters?
>>>
>>> If the default font doesn't support the combining accent, what else
>>> can you do? Most fonts don't have precomposed glyphs for every
>>> arbitrary sequence of base character followed by several combining
>>> accents. So sometimes you will have to compose the accents "by hand",
>>> and that is not really possible if they come from different fonts.
>>
>> Which is why they shouldn’t come from different fonts. What if Emacs
>> ignored font lookup for combining characters and always picked the
>> font of the previous base character?
>
> What would that produce if the font of the previous character didn't
> have a glyph for the accent? The accent will disappear, or maybe will
> be displayed as "tofu", right? Does that sound like a good strategy?
Can't the shaping engine produce fake compositions in that case?
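To illustrate the kind of font selection I have in mind, here is a sketch in Python that first groups each base character with its combining marks and only then picks one font for the whole group. The function names and the font tables are hypothetical, and "coverage" is modelled as a plain set of characters, which is far simpler than what a real font backend reports. It also uses the canonical combining class as the test for combining marks, so marks with class 0 are not handled.

```python
import unicodedata
from typing import Dict, List, Optional, Set


def split_clusters(text: str) -> List[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the preceding base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def pick_font(cluster: str, fonts: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first font (in dict order) covering the whole cluster."""
    for name, coverage in fonts.items():
        if all(ch in coverage for ch in cluster):
            return name
    # No single font covers the base and all of its accents.
    return None


# Hypothetical fonts: "Default" lacks the combining acute accent.
fonts = {
    "Default": set("abe"),
    "Fallback": set("abe\u0301"),
}
for cluster in split_clusters("ae\u0301b"):
    print(repr(cluster), pick_font(cluster, fonts))
```

With these toy tables the cluster `e` + U+0301 is assigned to "Fallback" as a unit instead of taking `e` from one font and the accent from another. When `pick_font` returns `None`, the question above about fake compositions still applies.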
>
>>>> Wouldn't that fundamentally prevent using combining characters? IIUC
>>>> text rendering engines should be able to pick the right glyph if
>>>> that didn't happen (assuming they can perform Unicode
>>>> normalization).
>>>
>>> Unicode normalization is only tangentially relevant here.
>>
>> Sure, but in this case it would fix the problem AFAICS.
>
> Sorry, I no longer understand what this was about (what does "that"
> allude to here?).
'That' refers to "pick different fonts for a base character
and its combining characters".
> That's bound to happen when a response comes more
> than a month after the original exchange.
Yes, but unfortunately answering these questions takes some time, which
I don't always have. I'll try to respond more promptly in the future, but
I can't really promise that.