From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 20:13:36 +0300
Message-ID: <83tv01s3lr.fsf@gnu.org>
Cc: emacs-devel@gnu.org
To: Pip Cet

> From: Pip Cet
> Date: Wed, 27 May 2020 09:36:52 +0000
> Cc: emacs-devel@gnu.org
>
> > Any measurements to back that up?
>
> Yes. With a regexp of "....", the composite.c code takes 175 billion
> cycles to display every line of composite.c. My code takes 144 billion
> cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> described.

What did you compare, exactly? On the one hand, the code you posted
here, which took 128 characters around each character to be displayed?
Any other changes in the code you posted here? And what does "limiting
it as described" mean here? And on the other hand, the existing
automatic composition machinery? With what setup of
composition-function-table, exactly? And finally, which code was
included in the count of cycles?

> > > > and others, including (but not limited to) the dreaded bidi thing.
> > >
> > > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
> >
> > That's because you look in the wrong place.
>
> What's the right place?
> I'm using all the code in bidi.c, of course,

No, actually you don't. Your make_context copies characters in strict
logical order, bypassing bidi.c, and thereby also potentially crossing
boundaries of different directionality (and even line and paragraph
boundaries), which is a no-no in text shaping. Then, after you call the
shaper, you don't reorder the glyphs it delivers, so they will appear in
the wrong order on display. And there may be other subtle issues as
well -- this stuff was finalized so long ago that I'm not even sure I
remember all the details of what needed to be done to get it right.

> > > The code shouldn't break horribly for RTL text (it doesn't).
> >
> > It _will_ break for RTL text, you just didn't yet see it because you
> > only tested it in simple use cases. UAX#9 defines a lot of optional
> > features, including multi-level directional overrides and embeddings,
> > it isn't just right-to-left vs left-to-right.
>
> I assume bidi.c handles that, as it does for composite.c?

Yes, but only _if_you_use_them_correctly_! If you bypass them, then all
bets are off.

> > > We have something that superficially results in a similar screen
> > > layout to what I want, but that actually represents display elements
> > > in a way that makes them unusable for my purposes.
> >
> > Then please describe what doesn't fit your purpose, and let's focus on
> > extending the existing code to do what's missing.
>
> The three main things are:
> - "entering" glyphs, instead of treating them as atomic

Why is that needed? A ligature is a single display entity; that's why
fonts ligate. Why would we want to break ligatures when we wrap lines?

> - providing context automatically rather than by providing specific
> regexps for it in advance

That's a separate part of the problem; I wasn't talking about it. It
needs a separate solution (which has not yet been presented), but the
solution doesn't have to be based on regexps if a better or smarter or
faster way is available.
Extending composition-function-table to support context definition by
means other than regexp is easy and doesn't disrupt the way the code
works.

> - kerning, which requires context for every character

That's again about that separate part of the problem, because once the
context is determined correctly, the shaper will perform the kerning
for you.

> - ligatures that come partly from a display property and partly from
> the buffer (composite.c doesn't allow for those, as far as I can tell)

It doesn't and it shouldn't! Text of display strings and overlay
strings is completely isolated from buffer text, and is even
bidi-reordered independently. This is by design. These strings are
more akin to images than to a part of buffer text, so mixing them with
buffer text on display would be a grave mistake.

> > Please note: I'm not talking about the regexp part -- that part you
> > anyway will need to decide how to extend or augment. I'm telling you
> > right here and now that blindly taking a fixed amount of surrounding
> > text will not be acceptable. You can either come up with some smarter
> > regexp (and you are wrong: the regexps in composition-function-table
> > do NOT have to match only fixed strings, you can see that they don't
> > in the part of the table we set up for the Arabic script);
>
> Again, I think the limits are fixed: 4 characters of history and 500
> characters of look-ahead. What am I missing?

Fixed limits and fixed strings are two different things. You can match
strings of many different lengths up to a limit. The 3 previous
characters are rarely needed, certainly not for English ligatures,
because you can detect the sequence by the first character. So this is
rarely a limitation; but again, it can be expanded if needed with little
if any effect on the code. (And where did you see the 500-character
limitation of look-ahead?)

Anyway, you again focus on the (separate) issue of determining the
context.
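To make the distinction between fixed limits and fixed strings concrete, here is a minimal C sketch. It is not Emacs code: `struct compose_rule`, `match_rule` and `MAX_MATCH` are hypothetical names, the rule only loosely mirrors the PATTERN and PREV-CHARS parts of a composition-function-table entry, POSIX extended regexps (`regcomp`/`regexec` from `<regex.h>`) stand in for Emacs regexps, and it works on single-byte ASCII text rather than on characters. The window handed to the regexp has a fixed upper bound, yet a single rule matches sequences of different lengths.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule, loosely modeled on a composition-function-table
   entry: the character that triggers the rule, how many characters
   before it the match starts (PREV-CHARS), and an extended regexp
   describing the whole sequence.  */
struct compose_rule
{
  char trigger;
  size_t prev_chars;
  const char *pattern;
};

/* Hypothetical fixed limit on the text the regexp may look at.  */
enum { MAX_MATCH = 64 };

/* Try RULE at index POS of TEXT.  On success, store the index where the
   matched sequence starts in *START and return its length; return 0 if
   the rule does not apply there.  */
static size_t
match_rule (const struct compose_rule *rule, const char *text, size_t pos,
            size_t *start)
{
  size_t len = strlen (text);
  assert (pos < len);
  if (text[pos] != rule->trigger || pos < rule->prev_chars)
    return 0;

  /* Copy a bounded window: the limit is fixed, but the pattern below
     may still match sequences of any length up to that limit.  */
  size_t from = pos - rule->prev_chars;
  size_t n = len - from < (size_t) MAX_MATCH ? len - from : (size_t) MAX_MATCH;
  char window[MAX_MATCH + 1];
  memcpy (window, text + from, n);
  window[n] = '\0';

  /* Compiled on every call for brevity; real code would cache it.  */
  regex_t re;
  if (regcomp (&re, rule->pattern, REG_EXTENDED) != 0)
    return 0;
  regmatch_t m;
  int rc = regexec (&re, window, 1, &m, 0);
  regfree (&re);

  /* Only accept a match that begins exactly at the window start.  */
  if (rc != 0 || m.rm_so != 0)
    return 0;
  *start = from;
  return (size_t) m.rm_eo;
}
```

With the rule `{ '-', 0, "^-+>" }`, a call at the first `-` of `a->b` returns 2 and at the first `-` of `a--->b` returns 4: one rule and one fixed limit, yet matches of different lengths. A rule such as `{ '>', 1, "^[-=]>" }` triggered on the `>` of `x=>y` starts its match one character earlier, at index 1.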
Whereas I was mainly talking about what happens _after_ you determine
the context: how you collect the characters to pass to the shaper, how
you present to the layout code the glyphs returned by the shaper, and
how you lay out those glyphs by inserting them into the glyph rows of
the glyph matrix. It is this code that I see no reason to modify,
definitely not significantly.

> > or you can
> > decide on something more complex, like a function. Either way, the
> > amount of text that this will pick up and pass to the shaper should be
> > reasonable and should be determined by some understandable rules. And
> > those rules must be controllable from Lisp.
>
> That last part isn't true for the composite.c code, which imposes a
> limit of 4 characters of history and 500 characters of look-ahead

How do those limits violate the above requirement? The 3-char
prev-chars limit is "reasonable" because we currently don't need more,
and the other limit doesn't exist AFAICT -- you could make a regexp
that matched very long strings, if needed. And the rules to use to set
up the regexp are definitely "understandable" and can be controlled
from Lisp.
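The layout concern raised earlier in this message (collecting characters in logical order across directionality and line boundaries, then not reordering the glyphs the shaper returns) can be illustrated with a minimal C sketch. It is a generic illustration, not how bidi.c or composite.c are implemented: `next_run_end`, `reverse_range` and `reorder_line` are hypothetical names, the per-character embedding levels are assumed to have been computed already by a UAX#9 implementation, and each character stands in for one glyph, so no real shaping happens. Text for the shaper is cut into runs that never cross a level change or a newline, and a line is then put into visual order following rule L2 of UAX#9.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the end (exclusive) of the shaping run that starts at START.
   A run never crosses a change in bidi embedding level or a newline,
   so the shaper is never handed text of mixed directionality or text
   that spans lines.  LEVELS is assumed to have been computed already
   by a UAX#9 implementation; this sketch does not compute it.  */
static size_t
next_run_end (const char *text, const unsigned char *levels, size_t n,
              size_t start)
{
  assert (start < n);
  if (text[start] == '\n')
    return start + 1;           /* A newline forms a run by itself.  */
  size_t i = start;
  while (i < n && levels[i] == levels[start] && text[i] != '\n')
    i++;
  return i;
}

/* Reverse glyphs [FROM, TO), moving each glyph's level along with it.  */
static void
reverse_range (char *glyphs, unsigned char *levels, size_t from, size_t to)
{
  while (from + 1 < to)
    {
      to--;
      char g = glyphs[from];
      glyphs[from] = glyphs[to];
      glyphs[to] = g;
      unsigned char l = levels[from];
      levels[from] = levels[to];
      levels[to] = l;
      from++;
    }
}

/* Put one line of glyphs (one per character here) into visual order,
   following rule L2 of UAX#9: from the highest level down to the
   lowest odd level, reverse every maximal sequence of glyphs at that
   level or higher.  */
static void
reorder_line (char *glyphs, unsigned char *levels, size_t n)
{
  unsigned char highest = 0, lowest_odd = UCHAR_MAX;
  for (size_t i = 0; i < n; i++)
    {
      if (levels[i] > highest)
        highest = levels[i];
      if ((levels[i] & 1) && levels[i] < lowest_odd)
        lowest_odd = levels[i];
    }
  if (lowest_odd == UCHAR_MAX)
    return;                     /* No right-to-left text on this line.  */
  for (int lev = highest; lev >= lowest_odd; lev--)
    for (size_t i = 0; i < n;)
      {
        if (levels[i] < lev)
          {
            i++;
            continue;
          }
        size_t j = i;
        while (j < n && levels[j] >= lev)
          j++;
        reverse_range (glyphs, levels, i, j);
        i = j;
      }
}
```

For instance, with uppercase letters standing in for right-to-left characters and levels {1, 1, 2, 2} (a number embedded in right-to-left text), `reorder_line` turns the logical sequence `CD12` into the visual `12DC`: the digits keep their left-to-right order while the right-to-left letters are reversed. Overrides, embeddings and isolates only come out right if the level computation that this sketch takes as given handles them.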