From: Pip Cet
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 18:42:07 +0000
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-Reply-To: <83tv01s3lr.fsf@gnu.org>

On Wed, May 27, 2020 at 5:13 PM Eli Zaretskii wrote:
> > From: Pip Cet
> > Date: Wed, 27 May 2020 09:36:52 +0000
> > Cc: emacs-devel@gnu.org
> >
> > > Any measurements to back that up?
> >
> > Yes. With a regexp of "....", the composite.c code takes 175 billion
> > cycles to display every line of composite.c. My code takes 144 billion
> > cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> > described.
>
> What did you compare, exactly? On the one hand, the code you posted
> here, which took 128 characters around each character to be displayed?

No. Not anything like that code.

> any other changes in the code you posted here? And what does
> "limiting it as described" mean here?

I described the algorithm for selecting context.

> And on the other hand, the existing automatic composition machinery?
> With what setup of composition-function-table, exactly?

As I said, a regexp of "....".

> And finally, which code was included in the count of cycles?

All of it.
There's no reason to believe the composite.c regexp design will
perform adequately. It doesn't.

> > > > > and others, including (but not limited to) the dreaded bidi thing.
> > > >
> > > > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
> > >
> > > That's because you look in the wrong place.
> >
> > What's the right place? I'm using all the code in bidi.c, of course,
>
> No, actually you don't.
> Your make_context copies characters in strict
> logical order, bypassing bidi.c

My current code doesn't.

> , and by that also potentially crossing
> boundaries of different directionality (and even line and paragraph
> boundaries), which is a no-no in text shaping. Then, after you call
> the shaper, you don't reorder the glyphs it delivers, so they will
> look on display in the wrong order.

I do now.

> And there may be other subtle
> issues as well -- this stuff was finalized so long ago that I'm not
> even sure I remember all the details of what needed to be done to get
> it right.

(It's not enough. Open emacs -Q etc/HELLO, place point on the lam in
"aleikum", and hit control-space. The shape changes to something
incorrect.)

> > > > The code shouldn't break horribly for RTL text (it doesn't).
> > >
> > > It _will_ break for RTL text, you just didn't yet see it because you
> > > only tested it in simple use cases. UAX#9 defines a lot of optional
> > > features, including multi-level directional overrides and embeddings,
> > > it isn't just right-to-left vs left-to-right.
> >
> > I assume bidi.c handles that, as it does for composite.c?
>
> Yes, but only _if_you_use_them_correctly_! If you bypass them, then
> all bets are off.

Obviously.

> > > > We have something that superficially results in a similar screen
> > > > layout to what I want, but that actually represents display elements
> > > > in a way that makes them unusable for my purposes.
> > >
> > > Then please describe what doesn't fit your purpose, and let's focus on
> > > extending the existing code to do what's missing.
> >
> > The three main things are:
> > - "entering" glyphs, instead of treating them as atomic
>
> Why is that needed? A ligature is a single display entity, that's why
> fonts ligate.

"ffi" is not. When I enter "official" C-a C-f C-f, point MUST be on
the second f.

> Why would we want to break ligatures when we wrap
> lines?

Who said we do? I personally like it, but it's obviously not something
we should do by default?

> > - providing context automatically rather than by providing specific
> > regexps for it in advance
>
> That's a separate part of the problem; I wasn't talking about it. It
> needs a separate solution (which was not yet presented), but the
> solution doesn't have to be based on regexps if a better or smarter or
> faster way is available. Extending composition-function-table to
> support context definition by means other than regexp is easy and
> doesn't disrupt the way the code works.
>
> > - kerning, which requires context for every character
>
> That's again about that separate part of the problem, because once the
> context was determined correctly, the shaper will perform the kerning
> for you.
> > - ligatures that come partly from a display property and partly from
> > the buffer (composite.c doesn't allow for those, as far as I can tell)
>
> It doesn't and it shouldn't! Text of display strings and overlay
> strings is completely isolated from buffer text, and is even
> bidi-reordered independently. This is by design.

Unacceptable design for my use case, then. I don't see how revealing
buffer text that has a replacing display property, rather than the
replacement, is good design.

The results of putting display properties on autocompositions
are...entertaining, in any case. I've now got an "x" character that
C-x = tells me is an "i"...

> These strings are
> more akin to images than to a part of buffer text, so mixing them with
> buffer text on display would be a grave mistake.

No, it wouldn't be. If two letters appear with no intervening space,
they need to be kerned and ligated if appropriate, no matter where
they come from. If people want a ZWNJ, that's perfectly available to
them.

> > > Please note: I'm not talking about the regexp part -- that part you
> > > anyway will need to decide how to extend or augment. I'm telling you
> > > right here and now that blindly taking a fixed amount of surrounding
> > > text will not be acceptable. You can either come up with some smarter
> > > regexp (and you are wrong: the regexps in composition-function-table
> > > do NOT have to match only fixed strings, you can see that they don't
> > > in the part of the table we set up for the Arabic script);
> >
> > Again, I think the limits are fixed: 4 characters of history and 500
> > characters of look-ahead. What am I missing?
>
> Fixed limits and fixed strings are two different things. You can
> match strings of many different lengths up to a limit.

Which effectively means you can match strings of that limited length.

> The 3 previous characters are rarely needed, certainly not for English
> ligatures, because you can detect the sequence by the first character.

Precisely the same argument applies to my 16-character limit. A script
in which a glyph depends on something happening 16 codepoints onwards,
or back, is extremely unlikely.

> Anyway, you again focus on the (separate) issue of determining the
> context. Whereas I was mainly talking about what happens _after_ you
> determine the context: how do you collect the characters to pass to
> the shaper, how you present to the layout code the glyphs returned by
> the shaper, and how you lay out those glyphs by inserting them into
> the glyph rows of the glyph matrix. It is this code that I see no
> reason to modify, definitely not significantly.

It needs to be modified, significantly, to support entering glyphs, to
support kerning, and to support things like ligating across a buffer
text / display string boundary.

> > > or you can
> > > decide on something more complex, like a function. Either way, the
> > > amount of text that this will pick up and pass to the shaper should be
> > > reasonable and should be determined by some understandable rules. And
> > > those rules must be controllable from Lisp.
> >
> > That last part isn't true for the composite.c code, which imposes a
> > limit of 4 characters of history and 500 characters of look-ahead
>
> How do those limits violate the above requirement? The 3-char
> prev-chars limit is "reasonable" because we currently don't need more,

It's hardcoded in C, though. A 16-character limit, as explained above,
is perfectly "reasonable" for determining the shape of a single glyph.

> and the other limit doesn't exist AFAICT -- you could make a regexp
> that matched very long strings, if needed.

Hmm. I thought I saw weirdness around the 500th character, but it's
probably one of the other bugs.

But, seriously, you're still willing to argue that point shouldn't be
able to enter the "ffi" glyph? Not even if the user wants it? Because
if so, I suggest we interrupt the discussion here.