From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 16:04:54 +0300
Message-ID: <83sgfqzts9.fsf@gnu.org>
References: <83wo544hx5.fsf@gnu.org> <831rnc43ih.fsf@gnu.org>
 <83ftbs2jr5.fsf@gnu.org> <83lflj16jn.fsf@gnu.org>
 <834ks7110w.fsf@gnu.org> <20200523112412.GA30384@odonien.localdomain>
In-Reply-To: <20200523112412.GA30384@odonien.localdomain> (message from Vasilij Schneidermann on Sat, 23 May 2020 13:24:12 +0200)
To: Vasilij Schneidermann
Cc: cpitclaudel@gmail.com, alan@idiocy.org, pipcet@gmail.com, emacs-devel@gnu.org

> Date: Sat, 23 May 2020 13:24:12 +0200
> From: Vasilij Schneidermann
> Cc: emacs-devel@gnu.org, pipcet@gmail.com, cpitclaudel@gmail.com,
>  alan@idiocy.org
>
> Out of curiosity, is this the same reason why font fallback is
> handled on a per-script basis for most cases and with carefully
> chosen ranges for emoji? I see a similar problem there, with
> updates being necessary for every Unicode release.

No, our font selection machinery is completely separate from text
shaping, and is also agnostic to character compositions. Basically,
we have a char-table (the one set-fontset-font manipulates) which
provides the various fonts to try for every given character, and some
very convoluted code (see fontset.c) that implements the logic of how
to try the fonts and which fonts to prefer for a character. IOW, the
font selection is basically per-character and not per-script.

The relation to emoji is that emoji _sequences_ need character
composition, and Emacs currently cannot compose characters that aren't
supported by the same font. This _is_ related to ligatures etc., as
it indeed touches on one of the basic premises of the display engine's
iteration through buffer text: we stop wherever the 'face' property of
characters changes (and the font is one attribute of the face), then
continue after loading and realizing the new face.
This is why you see strange artifacts when you press and hold Shift,
and then move with arrow keys across the Arabic line in etc/HELLO: the
shaping of adjacent characters breaks because we pass only part of the
text to the shaper. This is another bug that cannot be fixed cleanly
while keeping the current design of the display engine and its
low-level method of iteration through text and of producing glyphs.

> Given your previous explanation, a regex-based approach heuristic is the best
> we can hope for then. From what I understand the display engine uses a
> rectangular grid, not unlike what terminal emulators do.

It uses a rectangular array of glyphs, not a rectangular grid. The
difference is that glyphs can have variable metrics, which breaks the
grid concept. IOW, the glyph at coordinates (i, j) in the array and
the glyph at (i, j+1) are not necessarily one above the other on
display.

> Are there any tricks
> to steal from existing terminal emulators? For example there is an open pull
> request [1] for alacritty using Harfbuzz and FreeType for ligature support.

I cannot claim I understood well enough what this attempts to do, but
I don't think this is our problem in Emacs. It is not a problem of
layout per se -- Emacs is well equipped to deal with layout of glyphs
and grapheme clusters that have wildly different metrics (recall that
we are able to lay out images of more-or-less arbitrary dimensions on
the same line as simple text).

The problem is that we make the layout decisions as soon as we have
the glyph metrics, on the fly, for each "thing" we need to display.
HarfBuzz people would like us to send them the entire paragraph of
text, then get it back as a series of glyphs, then make the layout
decisions based on that. This would need entirely different
algorithms, if not also different data structures; for starters, we'd
need to know how to find the paragraph(s) that will end up on display
without first trying to display them.
And all our redisplay shortcuts and optimizations implicitly also
assume the current basic iteration, one character at a time, which can
be started at any arbitrary buffer position.

> The greatest challenge I see with redesigning the display engine is supporting
> textual terminals.

Really? Why do you think this to be the greatest challenge? For any
model of the display we will come up with, TTY frames will always be a
proper subset, no?
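
To make the face-boundary issue described above more concrete, here is
a toy sketch in Python. It is only an illustration under stated
assumptions: toy_shape and its one-entry ligature table are
hypothetical stand-ins for a real shaper, and none of this is Emacs or
HarfBuzz code. It shows that shaping each same-face run separately
loses a ligature which shaping the whole text at once would form.

```python
from itertools import groupby

# Hypothetical ligature table: the toy shaper knows a single ligature.
# A real shaper takes its substitutions from the font, not from a dict.
LIGATURES = {"->": "→"}


def toy_shape(text):
    """Return a list of 'glyphs' for TEXT, forming ligatures greedily."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LIGATURES:
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


def shape_per_face_run(chars_with_faces):
    """Shape each maximal run of characters sharing a face on its own.

    This mimics stopping wherever the face changes and handing only
    that part of the text to the shaper.
    """
    glyphs = []
    for _face, run in groupby(chars_with_faces, key=lambda cf: cf[1]):
        glyphs.extend(toy_shape("".join(ch for ch, _ in run)))
    return glyphs


def shape_whole_text(chars_with_faces):
    """Shape the whole text in one call, ignoring face boundaries."""
    return toy_shape("".join(ch for ch, _ in chars_with_faces))


# "a->b" where a highlighted region (a different face) starts at ">".
text = [("a", "default"), ("-", "default"), (">", "region"), ("b", "region")]

per_run = shape_per_face_run(text)
whole = shape_whole_text(text)
print(per_run)
print(whole)
```

With the face boundary falling between "-" and ">", the per-run variant
never sees the two characters together, so the ligature is not formed;
the whole-text variant forms it. The same mechanism, on a much larger
scale, is behind the broken Arabic shaping mentioned earlier.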