From: Pip Cet <pipcet@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 15:13:38 +0000
Message-ID: <CAOqdjBfUCvv2QbxtmqGkYMOh5Rep9WC4mvAWgdGRXm3a_ES9=Q@mail.gmail.com>
In-Reply-To: <83mu5yzquj.fsf@gnu.org>
On Sat, May 23, 2020 at 2:08 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 12:36:56 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > > You write: "(b) is not really feasible without redesigning the entire
> > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > is some limited look-ahead.
> > >
> > > We already have look-ahead: that's what the regexp part of the
> > > composition rules are about. That is not the crucial problem.
> >
> > But it's the only problem I see!
>
> Then maybe I don't understand what you mean by look-ahead. Is that
> the decision how to choose those 32 characters of "context"?
Yes.
> Then why
> not use the current regexp-based approach, which is already much
> smarter than just blindly taking a fixed amount of surrounding text?
Because I do not know the regexp to use?
> > When you see an IT_CHARACTER, you get some context, hand it to
> > HarfBuzz, slice up the relevant glyphs, and display them.
>
> The problem is, of course, in the "some context" part. Your patch
> used an arbitrary 32-character chunk of text around the character to
> shape, which is of course not what the shaping engines want: they want
> _all_ of the surrounding text, the entire paragraph.
Which is clearly too expensive to actually give them; I didn't think
that was necessary to even spell out.
> Your patch also invokes the shaper twice, on the same 32 characters,
> once in encode_char method and again in the text_extents method, which
> is another waste. The code in composite.c caches the composed
> characters to avoid that, but you bypass it.
Absolutely.
> This is okay for showing the concept, but we cannot use this in
> production. There are too many arbitrary decisions and inefficient
> expensive operations.
I agree, of course! In fact, the 32-character limit was chosen as a
reminder to myself that things would inherently be inefficient.
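Eli's point about the double shaper call (once from encode_char, once from text_extents) is the usual argument for memoizing shaper output, much as composite.c already caches composed characters. The sketch below is only an illustration under stated assumptions: the cache, its string key, `shape_cached`, and `mock_shape` are all hypothetical, and the mock stands in for a real HarfBuzz call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cache entry for one shaped run of text.  */
struct shaped_run
{
  char text[64];  /* the characters that were shaped (the cache key) */
  int nglyphs;    /* number of glyphs the shaper produced */
  int valid;      /* nonzero once this slot holds a result */
};

#define CACHE_SIZE 16
static struct shaped_run cache[CACHE_SIZE];

/* Counts how often the expensive shaper actually runs.  */
static int shaper_calls;

/* Mock stand-in for a real shaper call such as HarfBuzz: it simply
   pretends that every byte becomes one glyph.  */
static int
mock_shape (const char *text)
{
  shaper_calls++;
  return (int) strlen (text);
}

/* djb2-style string hash used to pick a cache slot.  */
static unsigned
hash_text (const char *text)
{
  unsigned h = 5381;
  for (; *text; text++)
    h = h * 33 + (unsigned char) *text;
  return h % CACHE_SIZE;
}

/* Return the shaped result for TEXT, calling the shaper only on a
   cache miss; a colliding entry in the same slot is overwritten.
   Both an encode_char-like and a text_extents-like caller would go
   through this function instead of shaping independently.  */
static const struct shaped_run *
shape_cached (const char *text)
{
  struct shaped_run *slot = &cache[hash_text (text)];
  assert (strlen (text) < sizeof slot->text);
  if (!slot->valid || strcmp (slot->text, text) != 0)
    {
      strcpy (slot->text, text);
      slot->nglyphs = mock_shape (text);
      slot->valid = 1;
    }
  return slot;
}
```

If both callers pass the same context string, only the first call reaches the shaper. A real cache would also have to key on the font and be invalidated when the buffer text changes.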
> > It doesn't involve composite.c at all, and that's good, because for
> > those tricky special cases composite.c does a better job than standard
> > shaping, and we need to keep that feature. It just shouldn't be the
> > regular route.
>
> Of course, you never tell how to distinguish between the "tricky
> special cases" for which we still need to use composite.c and friends,
> and the other kind.
The tricky special cases get handled as before, and come in with the
iterator's .what field set to IT_COMPOSITION. The standard cases come
in with .what set to IT_CHARACTER.
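The routing described here can be pictured as a switch on the iterator's element type. This is a hypothetical reduction for illustration: the two-member enum and the `glyph_route` values are made up, with names modeled on Emacs's IT_CHARACTER and IT_COMPOSITION element types; the real iterator distinguishes many more kinds of element.

```c
#include <assert.h>

/* Hypothetical two-member reduction of the iterator's element type;
   the real iterator distinguishes many more kinds of element.  */
enum element_type { IT_CHARACTER, IT_COMPOSITION };

/* Which code path would produce the glyphs (hypothetical).  */
enum glyph_route { ROUTE_FONT_SHAPER, ROUTE_COMPOSITE_C };

/* Compositions keep going through composite.c and friends; plain
   characters are handed to the font's shaper along with some context.  */
static enum glyph_route
choose_route (enum element_type what)
{
  switch (what)
    {
    case IT_COMPOSITION:
      return ROUTE_COMPOSITE_C;
    case IT_CHARACTER:
      return ROUTE_FONT_SHAPER;
    }
  assert (!"unhandled element type");
  return ROUTE_COMPOSITE_C;
}
```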
> Moreover, the HarfBuzz guys clearly say that what we do now is wrong
> for those "tricky" cases as well, so if we are going to fix that, why
> fix it only for ligatures made out of ASCII characters?
There's no such limitation, but, yes, ideally people would find they
don't need automatic compositions anymore...
> > > The crucial problem is that we currently perform layout decisions one
> > > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > > we should basically do that one screen line at a time.
> >
> > I think we're going to have to compromise: that's why my patch used a
> > 32-character context rather than an entire line or just a single
> > character.
>
> If we are going to compromise, then why not compromise on what we
> already have, which is much less than 32 characters?
0 characters?
> Why should we
> enormously complicate and slow down our code without actually solving
> the problem?
We shouldn't.
> Did you ever see ligatures that are 32-character long?
"Zapfino" is the longest I've seen.
> > Ideally, of course, in most real cases we'd use whitespace-delimited
> > words as chunks. That's mere optimization, though.
>
> That'd be the wrong optimization, AFAIK.
Sure, but since it is exclusively an optimization, it's performance
considerations alone that will decide whether it is the right one.
> E.g., some scripts don't
> have whitespace separated words at all, and still need shaping.
Thus "most".
> And
> what exactly is whitespace for this purpose? e.g., does it include
> Unicode control characters such as ZWJ?
Thankfully, that doesn't matter much: it's just a question of what we
optimize for, not one of what the results will look like.
So I'd say " ", "\t", and "\n" are enough, which is what the display
engine already handles specially.
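A minimal sketch of whitespace-delimited chunking, assuming ASCII text in a plain C string (real buffer text is multibyte) and treating only space, tab, and newline as delimiters. Because some scripts have no spaces at all, overlong words are cut into fixed-size pieces counted from the start of the word, so every position inside a piece maps to the same chunk. `find_chunk`, `MAX_CHUNK`, and its value of 32 are hypothetical; in Emacs proper such a limit would presumably need to be controllable from Lisp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on chunk length, so text without any whitespace
   still yields bounded chunks.  */
#define MAX_CHUNK 32

/* The delimiters the display engine already treats specially.  */
static int
is_chunk_delimiter (char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Store in [*START, *END) the chunk of TEXT (LEN bytes) that contains
   position POS.  A delimiter is a chunk by itself; otherwise the chunk
   is the surrounding word, cut into MAX_CHUNK-sized pieces counted from
   the word's start so that all positions in a piece agree.  */
static void
find_chunk (const char *text, size_t len, size_t pos,
            size_t *start, size_t *end)
{
  assert (pos < len);
  if (is_chunk_delimiter (text[pos]))
    {
      *start = pos;
      *end = pos + 1;
      return;
    }
  /* Find the whitespace-delimited word around POS.  */
  size_t ws = pos, we = pos + 1;
  while (ws > 0 && !is_chunk_delimiter (text[ws - 1]))
    ws--;
  while (we < len && !is_chunk_delimiter (text[we]))
    we++;
  /* Pick the MAX_CHUNK-sized piece of the word that contains POS.  */
  size_t s = ws + (pos - ws) / MAX_CHUNK * MAX_CHUNK;
  *start = s;
  *end = we - s < MAX_CHUNK ? we : s + MAX_CHUNK;
}
```

As the reply says, the choice of delimiters only affects how often the shaper is called, not what it produces, provided no font substitution spans a delimiter.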
> > > A secondary (but important) problem is that character composition
> > > involves calls to Lisp, which is relatively slow. This precludes
> > > calling the shaper for too many characters at once, too many times for
> > > each redisplay cycle of a window.
> >
> > I agree we shouldn't go through Lisp. My patch didn't.
>
> Your patch hard-codes arbitrary numbers without any way to control
> that from Lisp.
Yes.
> Such code will never fly in Emacs.
Of course not.
> > Calling the shaper less often is an important optimization, too. For
> > whitespace-delimited words, we only need to call it once.
>
> This doesn't work when the produced sequence of glyphs doesn't fit on
> the screen line.
> What the current layout code does in this case won't
> work well when you need to break a long sequence of glyphs in the
> middle and then continue on the next line from where you left off on
> this one.
You mean in visual-line-mode? Because what the current layout code
does by default is to break along any glyph boundary, and I don't see
how that's broken in any way.
> The longer the sequence of glyphs you get from the shaper
> in one go, the higher the probability of hitting this issue.
You break between the glyphs. It doesn't depend on whether you have
two or 20 or 100.
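The argument here is that breaking a shaped run at a line boundary is the same operation regardless of the run's length. Assuming each glyph has a known advance width and that a single glyph (including a ligature glyph) is never split, a character-wrap style break can be sketched as follows; word wrapping as in visual-line-mode would need extra logic to back up to a word boundary. `glyphs_that_fit` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many glyphs, starting at index FROM of a shaped run with
   NGLYPHS glyphs whose advance widths are ADVANCES, fit into LINE_WIDTH
   pixels.  The break always falls between two glyphs, so the next line
   resumes at FROM plus the result.  At least one glyph is returned so
   layout always makes progress, even if that glyph is too wide.  */
static size_t
glyphs_that_fit (const int *advances, size_t nglyphs, size_t from,
                 int line_width)
{
  assert (from < nglyphs);
  size_t i = from;
  int x = 0;
  while (i < nglyphs && x + advances[i] <= line_width)
    x += advances[i++];
  return i > from ? i - from : 1;
}
```

Calling it repeatedly with FROM advanced by the previous result lays out a run of any length; the loop does not care whether the shaper returned two glyphs or a hundred.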
> The bottom line of this is that I think you will find very quickly
> that the basic assumptions of the current design -- that we produce
> single glyphs or very short sequences of them for each call to the
> shaper -- that these assumptions bite you on every step, because the
> code which deals with layout implicitly assumes this.
The shaper interface I described would actually return a single glyph
for each top-level call, with a number of callbacks to provide
context. So that assumption would hold up very well indeed...
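The interface mentioned here (one glyph per top-level call, with callbacks supplying context) is not spelled out in the message, so the following is only a guess at its shape. `struct shaping_context`, `shape_one_glyph`, `string_char_at`, and `GLYPH_FI_LIGATURE` are hypothetical, and the mock font knows a single "fi" ligature. HarfBuzz itself does not work this way: hb_shape shapes a whole buffer per call, so a real implementation would shape a chunk and hand out its glyphs one at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback interface: the shaper asks the display code for
   characters around the current position instead of receiving a fixed
   chunk up front.  CHAR_AT returns -1 outside the available text.  */
struct shaping_context
{
  int (*char_at) (void *closure, ptrdiff_t pos);
  void *closure;
};

/* Hypothetical glyph code for an "fi" ligature in a mock font.  */
enum { GLYPH_FI_LIGATURE = 0x10000 };

/* Produce exactly one glyph for the text starting at POS, storing the
   number of characters it covers in *NCHARS.  The mock font only knows
   the "fi" ligature; any other character maps to a glyph whose code
   equals the character code.  */
static int
shape_one_glyph (const struct shaping_context *ctx, ptrdiff_t pos,
                 int *nchars)
{
  int c = ctx->char_at (ctx->closure, pos);
  assert (c >= 0);
  /* Look ahead one character through the callback.  */
  if (c == 'f' && ctx->char_at (ctx->closure, pos + 1) == 'i')
    {
      *nchars = 2;
      return GLYPH_FI_LIGATURE;
    }
  *nchars = 1;
  return c;
}

/* Example callback reading from a NUL-terminated C string.  */
static int
string_char_at (void *closure, ptrdiff_t pos)
{
  const char *s = closure;
  if (pos < 0)
    return -1;
  for (ptrdiff_t i = 0; i < pos; i++)
    if (s[i] == '\0')
      return -1;
  return s[pos] ? (unsigned char) s[pos] : -1;
}
```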
> In short, I really don't see how this could ever work, except in a
> very limited set of simple use cases. E.g., what do you do with
> bidirectional text? ignore it?
A bidi boundary is a hard boundary for HarfBuzz, and no shaping
happens across it. Is that what you mean by "ignore it"?
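Treating bidi boundaries as hard shaping boundaries amounts to splitting the text into runs of uniform embedding level and shaping each run separately. A sketch, assuming the resolved levels are already available per character (in Emacs they would come from bidi.c, whose interface is not modeled here); `split_bidi_runs` is hypothetical, and splitting on every level change, even between two left-to-right levels, is a deliberately conservative choice.

```c
#include <assert.h>
#include <stddef.h>

/* Split a line of N characters into runs that share the same resolved
   bidi level, given one level per character in LEVELS.  The end index
   of each run is written to BOUNDS (room for N entries) and the number
   of runs is returned.  Each run would then be shaped separately, so no
   ligature or kerning is applied across a direction change.  */
static size_t
split_bidi_runs (const unsigned char *levels, size_t n, size_t *bounds)
{
  assert (n == 0 || (levels != NULL && bounds != NULL));
  size_t nruns = 0;
  for (size_t i = 1; i <= n; i++)
    if (i == n || levels[i] != levels[i - 1])
      bounds[nruns++] = i;
  return nruns;
}
```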
> > > I don't think there's any disagreements on this high and abstract
> > > level.
> >
> > I think there are: if we treat fonts as programs, we need to let them
> > do their job, which involves kerning, substitutions, ligatures, and
> > even crazy stuff like randomizing the glyph used for each character to
> > get a more hand-written appearance. We don't need to know about
> > ligatures, we just let the font do it. No Lisp callbacks, just a call
> > to harfbuzz.
>
> I think this is a simplistic view of how the display engine works,
Quite possibly :-)
> and
> I don't see how it could work in production while supporting all the
> use cases we already do.
It only comes in for use cases not handled otherwise, i.e. those where
the iterator is at an IT_CHARACTER. All other use cases are
unaffected, because they mean we're overriding the font decision
anyway.
As I said, the problem I have is to get look-ahead working, which you
think isn't a problem. I've got an idea for it, but it doesn't work
(yet); my theory is that the bidi.c code fails to keep its state in
the iterator and can't deal with multiple parallel iterators.
> I could be wrong, though, so I'm looking
> forward to seeing you present a series of patches that do support the
> existing use cases and the ligatures as well, and don't cause any
> slowdown in redisplay.
As I said, what's stopping me is the look-ahead problem, and in
particular some code in bidi.c that doesn't play along well with
look-ahead.