From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Thu, 21 May 2020 22:08:52 +0300
Message-ID: <834ks95cmz.fsf@gnu.org>
References: <20200517165953.000044d2@web.de> <83lflqblp0.fsf@gnu.org>
 <83ftbybio3.fsf@gnu.org> <83zha69xs2.fsf@gnu.org> <83367x9qeq.fsf@gnu.org>
 <0ccae2a4-533b-d15c-2884-c2f00b067776@gmail.com> <83wo5987mk.fsf@gnu.org>
 <99d4beed-88ae-b5cd-3ecb-a44325c8a1dc@gmail.com>
 <20200518215908.GA57594@breton.holly.idiocy.org> <83mu6481v3.fsf@gnu.org>
 <75a90563-51b4-d3b8-4832-fc0e2542af0d@gmail.com> <83blmi7hys.fsf@gnu.org>
 <837dx55qff.fsf@gnu.org>
To: Pip Cet
Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
In-Reply-To: (message from Pip Cet on Thu, 21 May 2020 16:26:13 +0000)

> From: Pip Cet
> Date: Thu, 21 May 2020 16:26:13 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> On Thu, May 21, 2020 at 2:11 PM Eli Zaretskii wrote:
> > > From: Pip Cet
> > > Date: Thu, 21 May 2020 10:01:03 +0000
> > > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > > but
> > > > if we really want this only for these limited cases, we will need to
> > > > somehow indicate to the display engine which ligatures are to be
> > > > handled like this and which aren't.
> > >
> > > Well, we now know that fonts can provide information about how a
> > > ligature is to be split into one-dimensional slices;
> >
> > The question is: do we want to show those carets for all the character
> > compositions, even if the information is provided? If not, we will
> > have to indicate somehow whether they should or shouldn't be shown for
> > each particular grapheme cluster.
>
> Oh. I hadn't thought about fonts providing such caret information in
> cases where they shouldn't, but of course that's a valid concern.
>
> > > Of course that means that Emacs behavior would depend on the font
> > > tables in ways it currently doesn't. That's a problem.
> >
> > It isn't a problem to depend on that if most fonts provide this
> > information.
>
> > Then we could simply say this is not supported when the
> > information is not in the font.
>
> I'm not sure how simple that would be: we could treat ligatures
> without carets as atomic, or we could tell harfbuzz not to apply
> ligatures without carets, or maybe make that decision depend on
> whether the ligature is required or discretionary...
>
> > But if many fonts that support
> > ligatures don't provide this information, we will need to have some
> > fallback, like assume that every codepoint has the same share of the
> > ligature's width. The fact that other applications use a simplistic
> > heuristic and not the information in the fonts suggests that either
> > the information is not readily available or there are some other
> > problems with using it.
>
> Correct, it does. I'm not sure which one is the case.
>
> > > > Right, the actual implementation will have to be different. In
> > > > particular, I think that if ligatures will use automatic compositions,
> > > > the information you need is already stored in the composition table
> > > > and reachable from the glyph string, so you don't need to invoke the
> > > > shaper again.
> > >
> > > Well, I'm sorry to bring up a different (though somewhat related
> > > issue), but kerning is also an issue: we need a shaper to get that
> > > right, not just a composition table, right?
> >
> > Automatic compositions already use the shaper, see autocmp_chars.
>
> I'm not sure I understand how kerning would work using automatic compositions.
>
> > > > I see you implemented this for static compositions, which are
> > > > semi-obsolete.
> > >
> > > I'm sorry, I'm afraid I don't understand. This should handle any
> > > composition the shaper does, and only those, but slices up everything
> > > horizontally by default.
> >
> > I'm talking about the changes in gui_produce_glyphs. Its high-level
> > structure is basically
> >
> >   if (it->what == IT_CHARACTER)
> >     {
> >       ... /* handles character glyphs */
> >     }
> >   else if (it->what == IT_COMPOSITION && it->cmp_it.ch < 0)
> >     {
> >       ... /* A static composition. */
> >     }
> >   else if (it->what == IT_COMPOSITION)
> >     {
> >       /* A dynamic (automatic) composition. */
> >     }
> >   [...]
> >
> > You made changes only in the "static compositions" part.
>
> No. I didn't touch the "static compositions" part at all, except for
> passing an extra NULL pointer to an API I'd extended. (At least,
> that's what I intended, for all the changes to be in the IT_CHARACTER
> part).

I mean this part:

@@ -30433,8 +30483,9 @@ gui_produce_glyphs (struct it *it)
          else
            {
              get_char_face_and_encoding (it->f, ch, face_id,
-                                         &char2b, false);
-             pcm = get_per_char_metric (font, &char2b);
+                                         &char2b, false,
+                                         make_context (it));
+             pcm = get_per_char_metric (font, &char2b, make_context (it));
            }

This calls make_context and passes it to these functions. This code
handles static compositions only.

> > The "modern" way of composing text in Emacs uses automatic
> > compositions, which are controlled by data in
> > composition-function-table. This is where we call the shaping
> > engine to produce the glyphs according to rules stored in the
> > font. I don't see in your patch any changes that affect ligatures
> > created by automatic compositions; did I miss something?
>
> I don't think so; I went for a third route, that of leaving all
> compositions handling to the shaper and doing none of it in Emacs
> itself.

But automatic compositions do work by calling the shaper.

> Perhaps I can digress a little and describe what I think the
> interaction with the shaper should be like:
>
> Emacs: I'd like to display codepoint 'f'
> Harfbuzz: you'll have to tell me the codepoint before that
> Emacs: 'f'
> Harfbuzz: and the one after those two
> Emacs: 'i'
> Harfbuzz: and the one before all of those
> Emacs: That's too expensive for me to compute / it's the beginning of
> paragraph / a bidi boundary / an object without an assigned codepoint
> / ...
> Harfbuzz: okay, display it as the middle slice of the "ffi" glyph
>
> I.e., I'd like Harfbuzz to be asynchronous, and request more
> information, parsimoniously, about the context of the codepoint we're
> describing, rather than working in one go from "complete" information
> to an indefinitely-long line of glyphs. And deal well with us deciding
> it's too expensive to perform that much look-back/look-ahead. (Because
> in real life, ligatures depend on knowing some amount of the context,
> but not all of it, or people could never start writing.)

That would prevent Emacs from controlling what is and what isn't
composed, leaving the shaper in charge. We currently allow Lisp to
control that via composition-function-table, which provides a regexp
that text around a character must match in order for the matching
substring to be passed to the shaper. We never call the shaper unless
composition-function-table tells us to do so. I'm not sure I
understand what problems you see with this design.
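
To make the gating described above concrete, here is a toy sketch in C.
It is not how Emacs implements this: Emacs uses its own regexp engine
and a char-table, whereas this uses POSIX regex.h and a small array.
The names toy_rule, toy_rules and toy_composable_length are made up for
illustration, and the two patterns are only examples. The point it
shows is that text reaches the shaper only when a rule keyed on the
current character matches, and only the matching substring is passed.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for one composition-function-table rule:
   text starting at FIRST is handed to the shaper only if it matches
   PATTERN (a POSIX extended regexp here, not an Emacs regexp).  */
struct toy_rule
{
  char first;           /* character that triggers the lookup */
  const char *pattern;  /* must match starting at that character */
};

static const struct toy_rule toy_rules[] = {
  { 'f', "^f(f[il]?|[il])" },  /* ff, ffi, ffl, fi, fl */
  { '-', "^-+>" },             /* ->, -->, ... */
};

/* Return the number of bytes at TEXT that would be passed to the
   shaper, or 0 if no rule matches, in which case the shaper is
   never called.  */
size_t
toy_composable_length (const char *text)
{
  for (size_t i = 0; i < sizeof toy_rules / sizeof toy_rules[0]; i++)
    {
      regex_t re;
      regmatch_t m;
      size_t len = 0;

      /* Rules are keyed on the first character, so most text is
         rejected without running any regexp at all.  */
      if (text[0] != toy_rules[i].first)
        continue;

      if (regcomp (&re, toy_rules[i].pattern, REG_EXTENDED) != 0)
        continue;
      if (regexec (&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (size_t) m.rm_eo;
      regfree (&re);

      if (len > 0)
        return len;
    }
  return 0;
}
```

With these example rules, toy_composable_length ("ffi") covers all
three characters, while "fox" yields 0 and so would never reach the
shaper. Which sequences get composed is decided by the table, not by
the shaper.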