From: Pip Cet
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 09:36:52 +0000
In-Reply-To: <83mu5utr7j.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

On Tue, May 26, 2020 at 7:46 PM Eli Zaretskii wrote:
> > From: Pip Cet
> > Date: Tue, 26 May 2020 18:13:55 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > Assuming that the alternative for selecting the "context" is found,
> > > and composite.c is augmented to apply it instead of the regexps, why
> > > not use the rest of the automatic composition code to produce the
> > > glyphs and display them?
> >
> > I chose not to do that for a patch which I have stated repeatedly was
> > not in any way a finalized design, and I don't see any good reason to
> > do it for a real patch, either, so far.
>
> Why not?

Which part are you asking about? I don't see any good reason because
I've read the composite.c code (I'm not ignoring it), with an eye to
reusing what's reusable, and come up empty. But you've convinced me I
need to do a careful rereading.

> > > The code which does that exists and works,
> >
> > (I suspect: slowly)
>
> Any measurements to back that up?

Yes.
With a regexp of "....", the composite.c code takes 175 billion cycles
to display every line of composite.c. My code takes 144 billion
cycles, with a lookahead/lookbehind each set to 128 but limiting it as
described.

> E.g., is scrolling through
> etc/HELLO especially slow, once all the fonts were loaded (i.e. each
> character in the file was displayed at least once)?
> (And why are you using Emacs 26 and not
> Emacs 27, where we support HarfBuzz and made several improvements and
> bugfixes in the character composition area?)

Because I was trying to test your implication that all this was usable
years ago. It wasn't. I'm not using Emacs 26 :-)

> > > It already solves the problems of look-ahead,
> >
> > If it does so efficiently, I'll certainly try reusing that code. But I
> > strongly suspect it doesn't.
>
> Why suspect? why not try and see what does and doesn't work, what is
> and isn't efficient?

I have, now, coming up with the above measurement which confirms my
suspicion.

> > > and others, including (but not limited to) the dreaded bidi thing.
> >
> > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
>
> That's because you look in the wrong place.

What's the right place? I'm using all the code in bidi.c, of course,
so as far as I can tell what I'm not doing is using composite.c...

> Once again, try looking
> at etc/HELLO, there are portions of it that need both bidi and
> compositions. I can explain how it works (the code is spread over
> several files), but please believe me that it does, it passed the
> HarfBuzz developers' eyes most of whom are native Arabic and Farsi
> speakers, and wouldn't allow us to display Arabic script incorrectly.
>
> The whole point of using the existing code is that you don't _need_ to
> understand how exactly we handle the bidi reordering when character
> compositions are required.

But that's true without using the existing code!

> It just works, for all you care.
> It did take several iterations to get right at the time; why would you
> want to repeat all that, when the code is there to use and extend?

> > second, precisely because it works well for the purposes of others,
> > and I'd like to have as little impact as possible on existing use
> > cases. They should just continue working, and so far they do.
>
> You are thinking of breaking those other cases by your changes?

No! If I break them, that's a severe bug in my code!

> But
> we haven't yet established that changes are needed,

"Enter"ing ligature glyphs is definitely something we need to do
before any user can reasonably use variable-pitch fonts with ligatures
for displaying English text. Kerning is another such thing. Both don't
work with the current code.

> Because the features you are talking about should "just work" in
> Emacs.
> Not only for some use cases and some scripts -- that is not
> how we develop features. Features that work only for some cases are
> broken and will draw bug reports. They make Emacs look unclean and
> unprofessional.

Not as much as the current lack of support does.

> And there's no need to add such half-broken features because code that
> supports much broader class of use cases already exists, you just need
> to use it and maybe extend and augment it a bit.

I don't think I agree with the "a bit".

> > The code shouldn't break horribly for RTL text (it doesn't).
>
> It _will_ break for RTL text, you just didn't yet see it because you
> only tested it in simple use cases. UAX#9 defines a lot of optional
> features, including multi-level directional overrides and embeddings,
> it isn't just right-to-left vs left-to-right.

I assume bidi.c handles that, as it does for composite.c?

> > > What's more, we already have the code which implements all
> > > that, so I don't understand why you want to bypass it.
> >
> > We have something that superficially results in a similar screen
> > layout to what I want, but that actually represents display elements
> > in a way that makes them unusable for my purposes.
>
> Then please describe what doesn't fit your purpose, and let's focus on
> extending the existing code to do what's missing.

The three main things are:

- "entering" glyphs, instead of treating them as atomic
- providing context automatically rather than by providing specific
  regexps for it in advance
- kerning, which requires context for every character

Secondary concerns:

- ligatures that come partly from a display property and partly from
  the buffer (composite.c doesn't allow for those, as far as I can
  tell)

> Please note: I'm not talking about the regexp part -- that part you
> anyway will need to decide how to extend or augment. I'm telling you
> right here and now that blindly taking a fixed amount of surrounding
> text will not be acceptable. You can either come up with some smarter
> regexp (and you are wrong: the regexps in composition-function-table
> do NOT have to match only fixed strings, you can see that they don't
> in the part of the table we set up for the Arabic script);

Again, I think the limits are fixed: 4 characters of history and 500
characters of look-ahead. What am I missing?

> or you can
> decide on something more complex, like a function. Either way, the
> amount of text that this will pick up and pass to the shaper should be
> reasonable and should be determined by some understandable rules. And
> those rules must be controllable from Lisp.

That last part isn't true for the composite.c code, which imposes a
limit of 4 characters of history and 500 characters of look-ahead, as
far as I can tell. But, sure, if that's a requirement, I'll keep it in
mind.

> But that is a separate part of the problem that you will need to
> solve, and you will need to solve it whether or not you use character
> compositions.
> What I _am_ saying is that the rest of the machinery
> that implements automatic compositions does exactly what you need: it
> calls the shaper, handling LTR and RTL text as needed, then lays out
> the glyphs the shaper returns in a way that handles all the usual
> stuff our users expect, such as line wrapping and truncation.
> It is silly to disregard that code, so please don't.

You've convinced me that it's worth reading it again, more carefully,
but I'm not optimistic I'll come to a different conclusion this time
around.
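
To make the "fixed amount of surrounding text" point concrete, here is
a minimal sketch of a bounded context window. It is plain Python, not
Emacs code: the function name and its parameters are hypothetical, the
defaults of 4 and 500 are only the limits quoted above for composite.c
("as far as I can tell"), and 128/128 is the lookahead/lookbehind
setting used for the measurement above.

```python
def context_window(text_len, pos, lookbehind=4, lookahead=500):
    """Return (start, end), a half-open range of text around POS.

    Hypothetical helper: the range is what would be handed to the
    shaper as context. The window is clamped to the text boundaries,
    so it never reaches before 0 or past TEXT_LEN.
    """
    if not 0 <= pos <= text_len:
        raise ValueError("pos must lie within the text")
    start = max(0, pos - lookbehind)      # at most `lookbehind` chars of history
    end = min(text_len, pos + lookahead)  # at most `lookahead` chars ahead
    return start, end


# Limits as quoted for composite.c (an assumption, see above).
print(context_window(1000, 100))
# Symmetric 128/128 window, as in the measurement above.
print(context_window(1000, 100, lookbehind=128, lookahead=128))
```

Making both limits parameters is the point of the sketch: whatever
rule picks the context, the amount of text passed on stays bounded and
can be changed by the caller.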