From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Display of decomposed characters
Date: Sun, 28 Feb 2021 20:42:39 +0200
Message-ID: <83pn0k825c.fsf@gnu.org>
References: <83v9csplwq.fsf@gnu.org> <83wnx5n1zw.fsf@gnu.org> <831rea3ymg.fsf@gnu.org> <0077B374-A65D-412D-B1A5-4ADDD50D41A7@gmail.com>
In-Reply-To: <0077B374-A65D-412D-B1A5-4ADDD50D41A7@gmail.com> (message from Philipp on Sun, 28 Feb 2021 19:10:57 +0100)
To: help-gnu-emacs@gnu.org

> From: Philipp
> Date: Sun, 28 Feb 2021 19:10:57 +0100
> Cc: help-gnu-emacs@gnu.org
>
> >> The font will always support the composite variant (because it's part
> >> of Latin-1).
> >
> > That is only relevant if Emacs decides to compose the characters.
> > Then, and only then, will it ask the text-shaping engine to produce
> > glyphs for the base character and the accent together, and then the
> > font could provide a single precomposed glyph for them.
>
> So in this case the decision to not compose the characters is
> incorrect or happens too early?

That's one way of looking at the issue.  But it will lead you to the
conclusion that Emacs should send all the text it displays through the
shaping engine, which with the current design of how this stuff works
in Emacs will be much slower than what we have.  IOW, doing something
like that requires redesign of how we display text.

> >> I guess fonts assume that applications will first try to normalize
> >> strings to avoid issues like this?
> >
> > Normalizing strings before you know whether the font has the
> > precomposed glyphs makes no sense.
>
> Why?  If the font doesn’t support a precomposed character, wouldn’t
> the rendering engine automatically fall back to a decomposed
> representation?

No.  How can it?  The fallback is in the composition code, not in the
renderer.  The latter just lays out the glyphs that it gets from the
composition code.
(Assuming that when you say "rendering engine" you mean the part in
the Emacs display code which handles layout.)

IOW, there's no "font doesn't support" in Emacs.  It works like this:

  . we check whether the current character should compose with the
    following and/or preceding ones
  . if it should compose, then:
    . pass the chunk of text that should compose to the shaping engine
      (e.g., HarfBuzz)
    . if the shaping engine succeeds, render the glyphs it returns
    . otherwise render the original character "normally", i.e. without
      consulting the shaping engine

(The above omits some secondary details in the interests of clarity.)

The "otherwise" part is the fallback you alluded to.  As you see, we
never ask the font, we only talk to the shaping engine.

> IOW, would normalizing strings to NFC before sending them to the
> rendering engine ever break anything?

Yes, it might.  Shaping engines don't usually decompose characters if
they get codepoints of precomposed ones.  Moreover, some precomposed
glyphs don't even have codepoints, so you cannot even ask the shaper
to produce them by passing it a precomposed character in that case --
such a character doesn't exist.

> > What the text-shaping folks tell us is that we should pass _all_ the
> > text through the text shaper, then the shaper will DTRT in every
> > case.  But this would mean a thorough redesign and reimplementation of
> > how we do that in Emacs, and that is not easy if we want to keep the
> > current flexibility and customizability (which is why the character
> > composition code calls out to Lisp, and that makes sending all the
> > text that way too expensive to be practical).
>
> Would it be possible to implement a more minimal change to fix the
> problem at hand?

Like what?  (And why are we discussing such an issue on the help list?)

> >> Does it ever make sense to pick different fonts for a base character
> >> and its combining characters?
> >
> > If the default font doesn't support the combining accent, what else
> > can you do?  Most fonts don't have precomposed glyphs for every
> > arbitrary sequence of base character followed by several combining
> > accents.  So sometimes you will have to compose the accents "by hand",
> > and that is not really possible if they come from different fonts.
>
> Which is why they shouldn’t come from different fonts.  What if Emacs
> ignored font lookup for combining characters and always picked the
> font of the previous base character?

What would that produce if the font of the previous character didn't
have a glyph for the accent?  The accent will disappear, or maybe will
be displayed as "tofu", right?  Does that sound like a good strategy?

> >> Wouldn't that fundamentally prevent using combining characters?  IIUC
> >> text rendering engines should be able to pick the right glyph if
> >> that didn't happen (assuming they can perform Unicode
> >> normalization).
> >
> > Unicode normalization is only tangentially relevant here.
>
> Sure, but in this case it would fix the problem AFAICS.

Sorry, I no longer understand what this was about (what does "that"
allude to here?).  That's bound to happen when a response comes more
than a month after the original exchange.
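
To illustrate the point made earlier in this message, that NFC cannot
always yield a single precomposed character because for many sequences
such a character doesn't exist, here is a minimal Python sketch using
the standard unicodedata module.  The helper name nfc_codepoints is
made up for this illustration, and the sketch says nothing about how
Emacs or HarfBuzz behave; it only shows what Unicode normalization
itself produces.

```python
import unicodedata


def nfc_codepoints(text):
    """Return the codepoints of TEXT after NFC normalization, as U+XXXX strings."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# "e" followed by COMBINING ACUTE ACCENT (U+0301) has a precomposed
# form, U+00E9, so NFC collapses the pair into one codepoint.
print(nfc_codepoints("e\u0301"))

# "q" followed by the same accent has no precomposed form in Unicode,
# so NFC leaves the base character and the accent as two codepoints.
print(nfc_codepoints("q\u0301"))
```

In the second case the accent still has to be placed relative to the
base character at display time, whether or not the text was normalized
first.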