From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pip Cet
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Tue, 26 May 2020 18:13:55 +0000
In-Reply-To: <831rn9xs98.fsf@gnu.org>
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:251469

On Sun, May 24, 2020 at 3:33 PM Eli Zaretskii wrote:
> > From: Pip Cet
> > Date: Sat, 23 May 2020 22:38:18 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > On Sat, May 23, 2020 at 4:34 PM Eli Zaretskii wrote:
> > > > From: Pip Cet
> > > > Date: Sat, 23 May 2020 15:13:38 +0000
> > > > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > > Because what the current layout code does by default is to break
> > > > along any glyph boundary, and I don't see how that's broken in any
> > > > way.
> > >
> > > The code assumes that breaking on some glyph leaves the buffer
> > > iterator ('struct it') in a state that we can simply continue to the
> > > next buffer position.
> >
> > Yes. I see no reason to change that.
> >
> > > But if you already picked up several characters
> > > via look-ahead, that is not true, and you will have to return back
> > > several character positions, in order to continue on the next screen
> > > line.
> >
> > You're describing why look-ahead is difficult: a while ago, you
> > appeared to be saying it wasn't. This confuses me.
> >
> > Obviously, when I say "look-ahead", I mean receiving the next display
> > elements an iterator would produce if it were actually advanced,
> > without advancing it.
> That's not what you said earlier:

I think it is what I said.

> > > > > > You write: "(b) is not really feasible without redesigning the entire
> > > > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > > > is some limited look-ahead.
> > > > >
> > > > > We already have look-ahead: that's what the regexp part of the
> > > > > composition rules are about. That is not the crucial problem.
> > > >
> > > > But it's the only problem I see!
> > >
> > > Then maybe I don't understand what you mean by look-ahead. Is that
> > > the decision how to choose those 32 characters of "context"?
> >
> > Yes.
>
> Here you said that look-ahead means how to _choose_ the context.

The distinction escapes me: look-ahead is how to get the context for a character, obviously without ruining any persistent state. I'm puzzled as to what else it could have meant.

> > > If we want the shaper to handle all the text we display,
> >
> > Do we? A while back you said Lisp control over compositions was an
> > important feature, and I'm inclined to think we shouldn't break the
> > existing composition code.
> >
> > > we should go all the way and do it for any text, ASCII, non-ASCII,
> > > symbols, emoji, everything.
> >
> > Are you suggesting I'm somehow limiting myself to ASCII? Let me assure
> > you that's not the case.
>
> Then I really don't understand what problem are you trying to solve.

Ligatures and kerning.

> Let's try again from the beginning: which parts of the code that
> implements automatic compositions are you trying to avoid,
> and why?

I'm not trying to avoid any of it!
I just see no reason to use any of it, so far, because the part we have in common is about a dozen lines of code around the call to hb_shape.

> Is that the part that identifies the "context" via regular
> expressions? If so, then this problem needs to be solved by some
> alternative; using an arbitrary chosen fixed number of characters is
> not suitable for production.

I'm puzzled as to how these regular expressions, which only work when they match fixed-length strings, as far as I can tell, are worse than a fixed-length context.

You're right that the number shouldn't be hardcoded in Emacs, and shouldn't be arbitrary, but obviously there has to be a limit shorter than a word or paragraph. (The composite.c code currently hardcodes a limit of 500 characters.)

(And as I've said repeatedly, this is a deficiency specifically in HarfBuzz: the OpenType format makes it very easy to tell what the longest pattern is and how much context is needed. HarfBuzz should pass on that information, ideally by providing an incremental asynchronous API that requests only as much context as is needed until the glyphs in question can be returned.)

> You haven't yet shown any viable alternative.

To what? We still haven't seen any actual regular expressions that work. You just keep saying "regular expressions" like that's a solution, rather than simply constituting a restriction on the set of possible solutions.

And keep in mind that this context is used only for deciding what the "current" glyph looks like: the next glyph will have its own context, which might or might not be different.

What I'm currently playing with is something that I'm not sure is even expressible as a regexp: starting with the character at point, keep adding surrounding characters unless doing so would create a delimiter-nondelimiter boundary after the first char, or a nondelimiter-delimiter boundary before the last char, but limit the whole thing to 16 characters each way.
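
For illustration only, here is a minimal sketch in C of the heuristic just described. It is not the code from the patch. The names select_context, is_delimiter, struct span and CONTEXT_LIMIT are hypothetical, the definition of "delimiter" (anything that is not an ASCII letter, digit, or underscore) is an assumption, and it works on plain bytes rather than on Emacs characters or display elements.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cut-off: at most this many characters on each side. */
#define CONTEXT_LIMIT 16

/* Half-open range [start, end) of indices into the text. */
struct span { size_t start; size_t end; };

/* Assumption: a "delimiter" is any byte that is not an ASCII letter,
   digit, or underscore (in the default "C" locale).  */
static int is_delimiter(char c)
{
  unsigned char u = (unsigned char) c;
  return !(isalnum(u) || u == '_');
}

/* Pick the shaping context for the character at POS; requires POS < LEN. */
struct span select_context(const char *text, size_t len, size_t pos)
{
  struct span s = { pos, pos + 1 };
  assert(pos < len);

  /* Grow left, unless the new first character would be a delimiter
     directly followed by a non-delimiter.  */
  while (s.start > 0 && pos - s.start < CONTEXT_LIMIT) {
    if (is_delimiter(text[s.start - 1]) && !is_delimiter(text[s.start]))
      break;
    s.start--;
  }

  /* Grow right, unless the new last character would be a delimiter
     directly preceded by a non-delimiter.  */
  while (s.end < len && s.end - 1 - pos < CONTEXT_LIMIT) {
    if (!is_delimiter(text[s.end - 1]) && is_delimiter(text[s.end]))
      break;
    s.end++;
  }
  return s;
}
```

Under these assumptions, for the text "foo a->b bar" with point on the '-', the selected range covers "a->b". In a long run of 'f's the range is capped at 16 characters on each side of point, which is the cut-off case.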
As I've explained, it would be much better to let HarfBuzz tell us whether to provide more context, but even then we'd need a cut-off: imagine a file containing a gigabyte of 'f's.

> Assuming that the alternative for selecting the "context" is found,
> and composite.c is augmented to apply it instead of the regexps, why
> not use the rest of the automatic composition code to produce the
> glyphs and display them?

I chose not to do that for a patch which I have stated repeatedly was not in any way a finalized design, and I don't see any good reason to do it for a real patch, either, so far. (I'll be honest: I strongly suspect that the code is too slow, we know it to be buggy, and it's simply too different from what I actually want to benefit from sharing the code.)

> The code which does that exists and works,

(I suspect: slowly)

> and is tested by years of use.

It's unusable for me in Emacs 26.3.

> It already solves the problems of look-ahead,

If it does so efficiently, I'll certainly try reusing that code. But I strongly suspect it doesn't.

> of wrapping long lines,

Very poorly, for my purposes.

> and others, including (but not limited to) the dreaded bidi thing.

Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.

> Why reinvent that wheel when we already have it, and it works well?

First, because it doesn't work that well for my purposes; second, precisely because it works well for the purposes of others, and I'd like to have as little impact as possible on existing use cases. They should just continue working, and so far they do.

> > > and on top of that solve only a small part of the
> > > underlying problem.
> >
> > Ligatures and kerning (right now, for LTR text). Is that a small
> > problem because of the lack of RTL support?
>
> Yes, of course.

Why? I honestly don't see what's bad about a patch that improves things for most languages and doesn't affect RTL languages (which, as you point out, have existing support).
The code shouldn't break horribly for RTL text (it doesn't). If it works, that's great; if it doesn't work and leaves things unshaped, that's the existing behavior, and auto-composition-mode will still work if enabled.

> An acceptable solution should support any text Emacs
> supports.

By that standard, bidi.c and composite.c are unacceptable.

> What's more, we already have the code which implements all
> that, so I don't understand why you want to bypass it.

We have something that superficially results in a similar screen layout to what I want, but that actually represents display elements in a way that makes them unusable for my purposes.