From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 17:08:20 +0300
Message-ID: <83mu5yzquj.fsf@gnu.org>
In-Reply-To: (message from Pip Cet on Sat, 23 May 2020 12:36:56 +0000)
To: Pip Cet
Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org

> From: Pip Cet
> Date: Sat, 23 May 2020 12:36:56 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > > You write: "(b) is not really feasible without redesigning the entire
> > > Emacs display engine". I don't see how that's true at all. All we need
> > > is some limited look-ahead.
> >
> > We already have look-ahead: that's what the regexp part of the
> > composition rules are about. That is not the crucial problem.
>
> But it's the only problem I see!

Then maybe I don't understand what you mean by look-ahead. Is that
the decision about how to choose those 32 characters of "context"?
Then why not use the current regexp-based approach, which is already
much smarter than just blindly taking a fixed amount of surrounding
text?

> When you see an IT_CHARACTER, you get some context, hand it to
> HarfBuzz, slice up the relevant glyphs, and display them.

The problem is, of course, in the "some context" part.
Your patch used an arbitrary 32-character chunk of text around the
character to shape, which is of course not what the shaping engines
want: they want _all_ of the surrounding text, the entire paragraph.
Your patch also invokes the shaper twice on the same 32 characters,
once in the encode_char method and again in the text_extents method,
which is wasteful. The code in composite.c caches the composed
characters to avoid that, but you bypass it.

This is okay for showing the concept, but we cannot use this in
production. There are too many arbitrary decisions and expensive
operations.

> It doesn't involve composite.c at all, and that's good, because for
> those tricky special cases composite.c does a better job than standard
> shaping, and we need to keep that feature. It just shouldn't be the
> regular route.

Of course, you never say how to distinguish between the "tricky
special cases" for which we still need to use composite.c and
friends, and the other kind. Moreover, the HarfBuzz guys clearly say
that what we do now is wrong for those "tricky" cases as well, so if
we are going to fix that, why fix it only for ligatures made out of
ASCII characters?

> > The crucial problem is that we currently perform layout decisions one
> > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > we should basically do that one screen line at a time.
>
> I think we're going to have to compromise: that's why my patch used a
> 32-character context rather than an entire line or just a single
> character.

If we are going to compromise, then why not compromise on what we
already have, which is much less than 32 characters? Why should we
enormously complicate and slow down our code without actually solving
the problem? Did you ever see ligatures that are 32 characters long?

> Ideally, of course, in most real cases we'd use whitespace-delimited
> words as chunks. That's mere optimization, though.

That'd be the wrong optimization, AFAIK.
E.g., some scripts don't have whitespace-separated words at all, and
still need shaping. And what exactly is whitespace for this purpose?
E.g., does it include Unicode control characters such as ZWJ?

> > A secondary (but important) problem is that character composition
> > involves calls to Lisp, which is relatively slow. This precludes
> > calling the shaper for too many characters at once, too many times for
> > each redisplay cycle of a window.
>
> I agree we shouldn't go through Lisp. My patch didn't.

Your patch hard-codes arbitrary numbers without any way to control
that from Lisp. Such code will never fly in Emacs.

> Calling the shaper less often is an important optimization, too. For
> whitespace-delimited words, we only need to call it once.

This doesn't work when the produced sequence of glyphs doesn't fit on
the screen line. What the current layout code does in this case
won't work well when you need to break a long sequence of glyphs in
the middle and then continue on the next line from where you left off
on this one. The longer the sequence of glyphs you get from the
shaper in one go, the higher the probability of hitting this issue.

The bottom line is that I think you will find very quickly that the
basic assumption of the current design (that we produce single glyphs,
or very short sequences of them, for each call to the shaper) will
bite you at every step, because the code which deals with layout
implicitly relies on it. In short, I really don't see how this could
ever work, except in a very limited set of simple use cases. E.g.,
what do you do with bidirectional text? Ignore it?

> > I don't think there's any disagreements on this high and abstract
> > level.
>
> I think there are: if we treat fonts as programs, we need to let them
> do their job, which involves kerning, substitutions, ligatures, and
> even crazy stuff like randomizing the glyph used for each character to
> get a more hand-written appearance.
> We don't need to know about
> ligatures, we just let the font do it. No Lisp callbacks, just a call
> to harfbuzz.

I think this is a simplistic view of how the display engine works,
and I don't see how it could work in production while supporting all
the use cases we already do. I could be wrong, though, so I'm looking
forward to seeing you present a series of patches that support the
existing use cases and ligatures as well, and don't cause any
slowdown in redisplay.
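
The objection above to whitespace-delimited chunks can be checked with
a few lines of code. What follows is only a rough sketch in Python,
not Emacs code and not taken from any patch in this thread: the helper
whitespace_chunks is a hypothetical name, the Thai sample string is an
illustration chosen here, and Python's notion of whitespace is assumed
to stand in for whatever definition a real implementation would pick.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, the control character asked about above


def whitespace_chunks(text):
    """Hypothetical chunker: split TEXT on Unicode whitespace.

    This mimics the "whitespace-delimited words as chunks" idea; it is
    an illustration only, not how any shaping engine segments text.
    """
    return text.split()


# ASCII ligature candidates fall into small, convenient chunks.
print(whitespace_chunks("a -> b != c"))

# Thai is normally written without spaces between words, so a whole
# sentence comes back as a single chunk of unbounded length.
thai = "ภาษาไทยไม่มีช่องว่างระหว่างคำ"
print(len(whitespace_chunks(thai)), len(thai))

# ZWJ is a format character, not whitespace, under this definition,
# so it never ends a chunk.
print(unicodedata.category(ZWJ), ZWJ.isspace())
print(whitespace_chunks("x" + ZWJ + "y"))
```

Under these assumptions the chunk size is bounded only for scripts
that happen to separate words with spaces, which is why such a chunker
would need an explicit, script-aware definition of its delimiters.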