From: Pip Cet
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 15:13:38 +0000
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org

On Sat, May 23, 2020 at 2:08 PM Eli Zaretskii wrote:
> > From: Pip Cet
> > Date: Sat, 23 May 2020 12:36:56 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > > You write: "(b) is not really feasible without redesigning the entire
> > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > is some limited look-ahead.
> > >
> > > We already have look-ahead: that's what the regexp part of the
> > > composition rules are about. That is not the crucial problem.
> >
> > But it's the only problem I see!
>
> Then maybe I don't understand what you mean by look-ahead. Is that
> the decision how to choose those 32 characters of "context"?

Yes.

> Then why
> not use the current regexp-based approach, which is already much
> smarter than just blindly taking a fixed amount of surrounding text?

Because I do not know the regexp to use?

> > When you see an IT_CHARACTER, you get some context, hand it to
> > HarfBuzz, slice up the relevant glyphs, and display them.
>
> The problem is, of course, in the "some context" part. Your patch
> used an arbitrary 32-character chunk of text around the character to
> shape, which is of course not what the shaping engines want: they want
> _all_ of the surrounding text, the entire paragraph.

Which is clearly too expensive to actually give them; I didn't think
that even needed to be spelled out.

> Your patch also invokes the shaper twice, on the same 32 characters,
> once in encode_char method and again in the text_extents method, which
> is another waste. The code in composite.c caches the composed
> characters to avoid that, but you bypass it.

Absolutely.

> This is okay for showing the concept, but we cannot use this in
> production. There are too many arbitrary decisions and inefficient
> expensive operations.

I agree, of course! In fact, the 32-character limit was chosen as a
reminder to myself that things would inherently be inefficient.

> > It doesn't involve composite.c at all, and that's good, because for
> > those tricky special cases composite.c does a better job than standard
> > shaping, and we need to keep that feature. It just shouldn't be the
> > regular route.
>
> Of course, you never tell how to distinguish between the "tricky
> special cases" for which we still need to use composite.c and friends,
> and the other kind.

The tricky special cases get handled as before, and come in with the
iterator's .what field set to IT_COMPOSITION. The standard cases come
in with .what set to IT_CHARACTER.

> Moreover, the HarfBuzz guys clearly say that what we do now is wrong
> for those "tricky" cases as well, so if we are going to fix that, why
> fix it only for ligatures made out of ASCII characters?

There's no such limitation, but, yes, ideally people would find they
don't need automatic compositions anymore...
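On the point about shaping the same chunk twice (once for encode_char, once for text_extents): a minimal sketch of the obvious fix, assuming a one-entry cache keyed on the exact chunk text. All names here (shape_cache, shape_cached, toy_shaper, MAX_CHUNK) are hypothetical, the "shaper" is a toy stand-in rather than a HarfBuzz call, and this is not how the cache in composite.c works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on chunk length; 32 mirrors the context
   size of the patch under discussion.  */
#define MAX_CHUNK 32

/* Placeholder for shaped output.  A real cache would store glyph ids,
   advances and cluster mappings; two summary numbers suffice here.  */
struct shaped_run
{
  size_t nglyphs;
  int total_advance;
};

typedef struct shaped_run (*shaper_fn) (const uint32_t *text, size_t len);

/* Toy shaper for illustration only: one glyph per character, 10 pixels
   each.  It counts its calls so the effect of the cache is visible.  */
static int toy_shaper_calls;

static struct shaped_run
toy_shaper (const uint32_t *text, size_t len)
{
  (void) text;
  toy_shaper_calls++;
  struct shaped_run r = { len, 10 * (int) len };
  return r;
}

/* One-entry cache: remembers the last chunk handed to the shaper and
   its result, so two consumers asking about the same chunk (say, a
   glyph-lookup step and a text-extents step) cost one shaper call.  */
struct shape_cache
{
  int valid;
  size_t len;
  uint32_t text[MAX_CHUNK];
  struct shaped_run result;
};

static struct shaped_run
shape_cached (struct shape_cache *cache, shaper_fn shape,
              const uint32_t *text, size_t len)
{
  assert (len <= MAX_CHUNK);
  if (cache->valid && cache->len == len
      && memcmp (cache->text, text, len * sizeof *text) == 0)
    return cache->result;              /* Hit: reuse the stored result.  */

  cache->result = shape (text, len);   /* Miss: shape once, remember it.  */
  memcpy (cache->text, text, len * sizeof *text);
  cache->len = len;
  cache->valid = 1;
  return cache->result;
}
```

A single entry is enough for the encode_char/text_extents pair, since both ask about the same chunk back to back; a real implementation would need a larger cache and an invalidation policy.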
> > > The crucial problem is that we currently perform layout decisions one
> > > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > > we should basically do that one screen line at a time.
> >
> > I think we're going to have to compromise: that's why my patch used a
> > 32-character context rather than an entire line or just a single
> > character.
>
> If we are going to compromise, then why not compromise on what we
> already have, which is much less than 32 characters?

0 characters?

> Why should we
> enormously complicate and slow down our code without actually solving
> the problem?

We shouldn't.

> Did you ever see ligatures that are 32-character long?

"Zapfino" is the longest I've seen.

> > Ideally, of course, in most real cases we'd use whitespace-delimited
> > words as chunks. That's mere optimization, though.
>
> That'd be the wrong optimization, AFAIK.

Sure, but since it is exclusively an optimization, performance
considerations alone will decide whether it's the wrong one.

> E.g., some scripts don't
> have whitespace separated words at all, and still need shaping.

Thus "most".

> And
> what exactly is whitespace for this purpose? e.g., does it include
> Unicode control characters such as ZWJ?

Thankfully, that doesn't matter much: it's just a question of what we
optimize for, not one of what the results will look like. So I'd say
" ", "\t", and "\n" are enough, which is what the display engine
already handles specially.

> > > A secondary (but important) problem is that character composition
> > > involves calls to Lisp, which is relatively slow. This precludes
> > > calling the shaper for too many characters at once, too many times for
> > > each redisplay cycle of a window.
> >
> > I agree we shouldn't go through Lisp. My patch didn't.
>
> Your patch hard-codes arbitrary numbers without any way to control
> that from Lisp.

Yes.

> Such code will never fly in Emacs.

Of course not.
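The whitespace-delimited chunking discussed above can be sketched as follows, assuming text is available as an array of code points. shaping_chunk and chunk_boundary_p are hypothetical names, not Emacs functions. Only " ", "\t" and "\n" end a chunk, and the context cap is passed in as a parameter (max_context) rather than hard-coded, so text without word spaces still yields a bounded chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Characters that end a chunk: only space, tab and newline.
   Everything else, ZWJ included, stays inside the chunk.  */
static int
chunk_boundary_p (uint32_t c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: store in [*START, *END) the range of TEXT (LEN
   code points) to hand to the shaper for the character at POS.  This is
   the whitespace-delimited word containing POS, clipped to at most
   MAX_CONTEXT characters on each side so that text without word spaces
   still yields a bounded chunk.  A boundary character is its own chunk.  */
static void
shaping_chunk (const uint32_t *text, size_t len, size_t pos,
               size_t max_context, size_t *start, size_t *end)
{
  assert (pos < len);
  size_t s = pos, e = pos + 1;

  if (!chunk_boundary_p (text[pos]))
    {
      /* Extend backward to the previous boundary or the limit.  */
      while (s > 0 && pos - s < max_context
             && !chunk_boundary_p (text[s - 1]))
        s--;
      /* Extend forward to the next boundary or the limit.  */
      while (e < len && e - pos - 1 < max_context
             && !chunk_boundary_p (text[e]))
        e++;
    }

  *start = s;
  *end = e;
}
```

For "ab cde\tf" and the position of "d", this returns the range covering "cde"; since the choice of boundaries only affects how often the shaper is called, a wrong guess costs performance, not correctness, as long as no real ligature spans a boundary.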
> > Calling the shaper less often is an important optimization, too. For
> > whitespace-delimited words, we only need to call it once.
>
> This doesn't work when the produced sequence of glyphs doesn't fit on
> the screen line. What the current layout code does in this case won't
> work well when you need to break a long sequence of glyphs in the
> middle and then continue on the next line from where you left off on
> this one.

You mean in visual-line-mode? Because what the current layout code
does by default is to break along any glyph boundary, and I don't see
how that's broken in any way.

> The longer the sequence of glyphs you get from the shaper
> in one go, the higher the probability of hitting this issue.

You break between the glyphs. It doesn't depend on whether you have
two or 20 or 100.

> The bottom line of this is that I think you will find very quickly
> that the basic assumptions of the current design -- that we produce
> single glyphs or very short sequences of them for each call to the
> shaper -- that these assumptions bite you on every step, because the
> code which deals with layout implicitly assumes this.

The shaper interface I described would actually return a single glyph
for each top-level call, with a number of callbacks to provide
context. So that assumption would hold up very well indeed...

> In short, I really don't see how this could ever work, except in a
> very limited set of simple use cases. E.g., what do you do with
> bidirectional text? ignore it?

A bidi boundary is a hard boundary for HarfBuzz, and no shaping
happens across it. Is that what you mean by "ignore it"?

> > > I don't think there's any disagreements on this high and abstract
> > > level.
> >
> > I think there are: if we treat fonts as programs, we need to let them
> > do their job, which involves kerning, substitutions, ligatures, and
> > even crazy stuff like randomizing the glyph used for each character to
> > get a more hand-written appearance. We don't need to know about
> > ligatures, we just let the font do it. No Lisp callbacks, just a call
> > to harfbuzz.
>
> I think this is a simplistic view of how the display engine works,

Quite possibly :-)

> and
> I don't see how it could work in production while supporting all the
> use cases we already do.

It only comes in for use cases not handled otherwise, i.e. those where
the iterator is at an IT_CHARACTER. All other use cases are
unaffected, because they mean we're overriding the font decision
anyway.

As I said, the problem I have is getting look-ahead working, which you
think isn't a problem. I've got an idea for it, but it doesn't work
(yet); my theory is that the bidi.c code fails to keep its state in
the iterator and can't deal with multiple parallel iterators.

> I could be wrong, though, so I'm looking
> forward to see you present a series of patches that do support the
> existing use cases and the ligatures as well, and don't cause any
> slowdown in redisplay.

As I said, what's stopping me is the look-ahead problem, and in
particular some code in bidi.c that doesn't play along well with
look-ahead.