From: Philipp
Newsgroups: gmane.emacs.help
Subject: Re: Display of decomposed characters
Date: Thu, 18 Mar 2021 15:16:42 +0100
To: Eli Zaretskii
Cc: help-gnu-emacs@gnu.org

> On 28.02.2021 at 19:42, Eli Zaretskii wrote:
>
>>>> I guess fonts assume that applications will first try to normalize
>>>> strings to avoid issues like this?
>>>
>>> Normalizing strings before you know whether the font has the
>>> precomposed glyphs makes no sense.
>>
>> Why? If the font doesn't support a precomposed character, wouldn't
>> the rendering engine automatically fall back to a decomposed
>> representation?
>
> No. How can it?
>
> The fallback is in the composition code, not in the renderer. The
> latter just lays out the glyphs that it gets from the composition
> code. (Assuming that when you say "rendering engine" you mean the
> part in the Emacs display code which handles layout.)

What I mean is HarfBuzz (given your comment below, apparently the more correct term is "shaping engine").
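The precomposed/decomposed distinction this exchange turns on can be checked with Python's standard `unicodedata` module. This is only an illustration of Unicode normalization at the codepoint level; it is not Emacs code and says nothing about which glyphs a given font contains. NFD splits `é` into `e` plus U+0301 COMBINING ACUTE ACCENT, NFC joins them back, and a sequence such as `1` plus U+0301, for which Unicode defines no precomposed codepoint, stays two codepoints after NFC.

```python
import unicodedata

PRECOMPOSED = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one codepoint
DECOMPOSED = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def codepoints(s):
    """Return the codepoints of s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


# NFD decomposes the precomposed character ...
nfd = unicodedata.normalize("NFD", PRECOMPOSED)
# ... and NFC recomposes the sequence, because a precomposed codepoint exists.
nfc = unicodedata.normalize("NFC", DECOMPOSED)

# Unicode has no precomposed "digit one with acute", so NFC cannot shorten it.
no_precomposed = unicodedata.normalize("NFC", "1\u0301")

if __name__ == "__main__":
    print(codepoints(nfd))             # ['U+0065', 'U+0301']
    print(codepoints(nfc))             # ['U+00E9']
    print(codepoints(no_precomposed))  # ['U+0031', 'U+0301']
    # A nonzero canonical combining class marks U+0301 as a combining character.
    print(unicodedata.combining("\u0301"))  # 230
```

The last case is the situation Eli describes below: normalizing to NFC cannot produce a single precomposed character when Unicode does not define one.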

>
> IOW, there's no "font doesn't support" in Emacs. It works like this:
>
>   . we check whether the current character should compose with the
>     following and/or preceding ones

Is my understanding right that this is the step that comes too late, i.e. after font selection? Otherwise I'd assume that the answer is always "yes" if the current character is a combining character.

>   . if it should compose, then:
>     . pass the chunk of text that should compose to the shaping
>       engine (e.g., HarfBuzz)
>     . if the shaping engine succeeds, render the glyphs it returns
>     . otherwise render the original character "normally", i.e. without
>       consulting the shaping engine
>
> (The above omits some secondary details in the interests of clarity.)
> The "otherwise" part is the fallback you alluded to. As you see, we
> never ask the font, we only talk to the shaping engine.

Hmm. If these steps all happen before font selection, then I'm wondering where the problem comes from. Or do they happen after font selection?

>
>> IOW, would normalizing strings to NFC before sending them to the rendering engine ever break anything?
>
> Yes, it might. Shaping engines don't usually decompose characters if
> they get codepoints of precomposed ones.
>
> Moreover, some precomposed glyphs don't even have codepoints, so you
> cannot even ask the shaper to produce them by passing it a precomposed
> character in that case -- such a character doesn't exist.

OK, so I guess we then definitely can't precompose unconditionally.

>
>>> What the text-shaping folks tell us is that we should pass _all_ the
>>> text through the text shaper, then the shaper will DTRT in every
>>> case.
>>> But this would mean a thorough redesign and reimplementation of
>>> how we do that in Emacs, and that is not easy if we want to keep the
>>> current flexibility and customizability (which is why the character
>>> composition code calls out to Lisp, and that makes sending all the
>>> text that way too expensive to be practical).
>>
>> Would it be possible to implement a more minimal change to fix the problem at hand?
>
> Like what?

What I'd propose would be to perform font selection after the "compose/no-compose" decision.

> (And why are we discussing such an issue on the help
> list?)

I first wanted to check whether this is actually a bug before filing a formal report, but I'll do that now.

>
>>>> Does it ever make sense to pick different fonts for a base character
>>>> and its combining characters?
>>>
>>> If the default font doesn't support the combining accent, what else
>>> can you do? Most fonts don't have precomposed glyphs for every
>>> arbitrary sequence of base character followed by several combining
>>> accents. So sometimes you will have to compose the accents "by hand",
>>> and that is not really possible if they come from different fonts.
>>
>> Which is why they shouldn't come from different fonts. What if Emacs ignored font lookup for combining characters and always picked the font of the previous base character?
>
> What would that produce if the font of the previous character didn't
> have a glyph for the accent? The accent will disappear, or maybe will
> be displayed as "tofu", right? Does that sound like a good strategy?

Can't the shaping engine produce fake compositions in that case?

>
>>>> Wouldn't that fundamentally prevent using combining characters? IIUC
>>>> text rendering engines should be able to pick the right glyph if
>>>> that didn't happen (assuming they can perform Unicode
>>>> normalization).
>>>
>>> Unicode normalization is only tangentially relevant here.
>>
>> Sure, but in this case it would fix the problem AFAICS.
>
> Sorry, I no longer understand what this was about (what does "that"
> allude to here?).

'That' refers to "pick different fonts for a base character and its combining characters".

> That's bound to happen when a response comes more
> than a month after the original exchange.

Yes, but unfortunately answering these questions takes some time, which I don't always have. I'll try to respond more promptly in the future, but I can't really promise that.
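The two font-selection strategies debated in this thread can be contrasted in a toy Python model. Everything here is hypothetical: the font names and coverage sets are made up, "first font in a fixed order that has the glyph" is an assumed stand-in for real font selection, and none of it reflects how Emacs fontsets or the composition code are actually implemented. `per_character` picks a font for each codepoint independently, which can split a base character and its accent across fonts; `inherit_base_font` follows the suggestion of giving combining characters the base character's font, and marks an accent that font lacks with `None`, standing in for the vanished accent or "tofu" Eli describes.

```python
import unicodedata

# Hypothetical coverage tables: each toy "font" is the set of characters
# it has glyphs for. Real fonts and Emacs fontsets are far more complex.
FONTS = {
    "TextFont": set("abcdefghijklmnopqrstuvwxyz "),
    "SymbolFont": {"\u0301", "\u0323"},  # accents only, no Latin letters
}
# Assumed fallback order: try TextFont first, then SymbolFont.
FONT_ORDER = ["TextFont", "SymbolFont"]


def first_font_with(ch):
    """Return the first font in FONT_ORDER with a glyph for ch, or None."""
    return next((name for name in FONT_ORDER if ch in FONTS[name]), None)


def per_character(text):
    """Pick a font for every codepoint independently."""
    return [(ch, first_font_with(ch)) for ch in text]


def inherit_base_font(text):
    """Give each combining character the font of the preceding base character.

    If that font has no glyph for the mark, the mark gets None: it would
    disappear or be drawn as a missing-glyph box ("tofu").
    """
    result = []
    base_font = None
    for ch in text:
        if result and unicodedata.combining(ch):
            has_glyph = ch in FONTS.get(base_font, set())
            result.append((ch, base_font if has_glyph else None))
        else:
            base_font = first_font_with(ch)
            result.append((ch, base_font))
    return result


if __name__ == "__main__":
    text = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
    # Base and accent end up in two different fonts.
    print(per_character(text))
    # Same font for both, but TextFont has no glyph for the accent.
    print(inherit_base_font(text))
```

With these tables, `per_character("e\u0301")` puts `e` in `TextFont` and the accent in `SymbolFont`, while `inherit_base_font` keeps both in `TextFont` but leaves the accent without a glyph, so each strategy fails in one of the ways described in the exchange.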