* Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
@ 2020-05-17 10:41 Julius Pfrommer
2020-05-17 14:09 ` Arthur Miller
2020-05-17 14:35 ` Eli Zaretskii
0 siblings, 2 replies; 145+ messages in thread
From: Julius Pfrommer @ 2020-05-17 10:41 UTC (permalink / raw)
To: emacs-devel
Hi all,
during the recent discussion on "Emacs being too square", I recalled a
few projects that use OpenGL for terminal emulators [1,2]. With good
performance, smooth scrolling and the possibility to add more visual
*bling*.
I had a good look at Emacs' code-base to see if similar approaches
could be used. As you can imagine, I got lost in a forest of #ifdef for
different platforms and GUI toolkits. The code looks scary to touch. If
you don't have access to *all supported platform*, it is likely that
changes break a platform you could not test locally.
To make the code-base less scary, there should be more code-sharing
across GUI platforms. And this is indeed possible!
The GTK-based Emacs GUI can use Cairo for rendering. Cairo + FreeType +
HarfBuzz (calling it CFH for simplicity) is available for the other
supported platforms as well (besides pure TTY):
- GnuSTEP [http://wiki.gnustep.org/index.php/Backend]
- Raw Xlib [https://www.cairographics.org/Xlib/]
- Windows+MacOS [https://www.cairographics.org/download/]
Big portions of the platform-specific GUI code could be unified based on
the CFH libraries. Is a hard dependency on the CFH libraries imaginable?
Maybe one of the platforms is a "low-hanging fruit" to get things going.
As every major refactoring, there should be a series of small steps in
order to keep things stable.
Thank you for the hard work put into this amazing piece of software!
Regards, Julius
[1] https://sw.kovidgoyal.net/kitty/
[2] https://github.com/alacritty/alacritty
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 10:41 Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
@ 2020-05-17 14:09 ` Arthur Miller
2020-05-17 14:30 ` Eli Zaretskii
2020-05-17 14:35 ` Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Arthur Miller @ 2020-05-17 14:09 UTC (permalink / raw)
To: Julius Pfrommer; +Cc: emacs-devel
Julius Pfrommer <julius.pfrommer@web.de> writes:
> Hi all,
>
> during the recent discussion on "Emacs being too square", I recalled a
> few projects that use OpenGL for terminal emulators [1,2]. With good
> performance, smooth scrolling and the possibility to add more visual
> *bling*.
>
> I had a good look at Emacs' code-base to see if similar approaches
> could be used. As you can imagine, I got lost in a forest of #ifdef for
> different platforms and GUI toolkits. The code looks scary to touch. If
> you don't have access to *all supported platform*, it is likely that
> changes break a platform you could not test locally.
I have been looking into same, some time ago and recently, and I
experience same problem. A forest of cases, all coded into same place in
giant files of 5K+ lines :-).
> To make the code-base less scary, there should be more code-sharing
> across GUI platforms. And this is indeed possible!
Emacs and Emacs src could benefit of some modularization and refactoring
definitely.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 14:09 ` Arthur Miller
@ 2020-05-17 14:30 ` Eli Zaretskii
2020-05-17 15:06 ` Arthur Miller
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 14:30 UTC (permalink / raw)
To: emacs-devel, Arthur Miller, Julius Pfrommer
On May 17, 2020 5:09:08 PM GMT+03:00, Arthur Miller <arthur.miller@live.com> wrote:
> Julius Pfrommer <julius.pfrommer@web.de> writes:
>
> > Hi all,
> >
> > during the recent discussion on "Emacs being too square", I recalled
> a
> > few projects that use OpenGL for terminal emulators [1,2]. With good
> > performance, smooth scrolling and the possibility to add more visual
> > *bling*.
> >
> > I had a good look at Emacs' code-base to see if similar approaches
> > could be used. As you can imagine, I got lost in a forest of #ifdef
> for
> > different platforms and GUI toolkits. The code looks scary to touch.
> If
> > you don't have access to *all supported platform*, it is likely that
> > changes break a platform you could not test locally.
>
> I have been looking into same, some time ago and recently, and I
> experience same problem. A forest of cases, all coded into same place
> in
> giant files of 5K+ lines :-).
>
> > To make the code-base less scary, there should be more code-sharing
> > across GUI platforms. And this is indeed possible!
> Emacs and Emacs src could benefit of some modularization and
> refactoring
> definitely.
I suggest to go through the archives and the Git logs to see how many such efforts have been made and are already in the codebase. It isn't like the advantages of this are unclear to the development team, or that nothing is being done in that direction. Far from it.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 10:41 Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
2020-05-17 14:09 ` Arthur Miller
@ 2020-05-17 14:35 ` Eli Zaretskii
2020-05-17 14:59 ` Julius Pfrommer
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 14:35 UTC (permalink / raw)
To: emacs-devel, Julius Pfrommer
On May 17, 2020 1:41:25 PM GMT+03:00, Julius Pfrommer <julius.pfrommer@web.de> wrote:
> Hi all,
>
> during the recent discussion on "Emacs being too square", I recalled a
> few projects that use OpenGL for terminal emulators [1,2]. With good
> performance, smooth scrolling and the possibility to add more visual
> *bling*.
>
> I had a good look at Emacs' code-base to see if similar approaches
> could be used. As you can imagine, I got lost in a forest of #ifdef
> for
> different platforms and GUI toolkits. The code looks scary to touch.
> If
> you don't have access to *all supported platform*, it is likely that
> changes break a platform you could not test locally.
>
> To make the code-base less scary, there should be more code-sharing
> across GUI platforms. And this is indeed possible!
>
> The GTK-based Emacs GUI can use Cairo for rendering. Cairo + FreeType
> +
> HarfBuzz (calling it CFH for simplicity) is available for the other
> supported platforms as well (besides pure TTY):
>
> - GnuSTEP [http://wiki.gnustep.org/index.php/Backend]
> - Raw Xlib [https://www.cairographics.org/Xlib/]
> - Windows+MacOS [https://www.cairographics.org/download/]
>
> Big portions of the platform-specific GUI code could be unified based
> on
> the CFH libraries. Is a hard dependency on the CFH libraries
> imaginable?
>
> Maybe one of the platforms is a "low-hanging fruit" to get things
> going.
> As every major refactoring, there should be a series of small steps in
> order to keep things stable.
>
> Thank you for the hard work put into this amazing piece of software!
>
> Regards, Julius
>
> [1] https://sw.kovidgoyal.net/kitty/
> [2] https://github.com/alacritty/alacritty
Any work in this direction is and always has been welcome. The practical problem with that is that you need to have access to all the supported platforms to make sure the refactoring works.
FWIW, I'm not sure I share your optimism regarding the Cairo way, I think it requires something from the system as well, so it might be not so easy.
And the GUI toolkits are AFAIU a separate issue, not directly related to how we draw to the glass.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 14:35 ` Eli Zaretskii
@ 2020-05-17 14:59 ` Julius Pfrommer
2020-05-17 15:55 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Julius Pfrommer @ 2020-05-17 14:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli,
> Any work in this direction is and always has been welcome. The
> practical problem with that is that you need to have access to all
> the supported platforms to make sure the refactoring works.
>
> FWIW, I'm not sure I share your optimism regarding the Cairo way, I
> think it requires something from the system as well, so it might be
> not so easy.
>
> And the GUI toolkits are AFAIU a separate issue, not directly related
> to how we draw to the glass.
I am well aware of the effort to keep the many different platforms
alive.
Let me phrase the question differently: Would it be okay to have a hard
dependency on the Cairo+FreeType+Harfbuzz (CFH) libraries, as they are
available everywhere?
It would be a pity to invest time into a direction that is infeasible
from the outset.
Even on Linux, this would unlock quite a few simplifications. I count
at least three font handling "backends" here.
Regards, Julius
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 14:30 ` Eli Zaretskii
@ 2020-05-17 15:06 ` Arthur Miller
2020-05-17 15:56 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Arthur Miller @ 2020-05-17 15:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Julius Pfrommer, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> On May 17, 2020 5:09:08 PM GMT+03:00, Arthur Miller <arthur.miller@live.com> wrote:
>> Julius Pfrommer <julius.pfrommer@web.de> writes:
>>
>> > Hi all,
>> >
>> > during the recent discussion on "Emacs being too square", I recalled
>> a
>> > few projects that use OpenGL for terminal emulators [1,2]. With good
>> > performance, smooth scrolling and the possibility to add more visual
>> > *bling*.
>> >
>> > I had a good look at Emacs' code-base to see if similar approaches
>> > could be used. As you can imagine, I got lost in a forest of #ifdef
>> for
>> > different platforms and GUI toolkits. The code looks scary to touch.
>> If
>> > you don't have access to *all supported platform*, it is likely that
>> > changes break a platform you could not test locally.
>>
>> I have been looking into same, some time ago and recently, and I
>> experience same problem. A forest of cases, all coded into same place
>> in
>> giant files of 5K+ lines :-).
>>
>> > To make the code-base less scary, there should be more code-sharing
>> > across GUI platforms. And this is indeed possible!
>> Emacs and Emacs src could benefit of some modularization and
>> refactoring
>> definitely.
>
> I suggest to go through the archives and the Git logs to see how many such
> efforts have been made and are already in the codebase. It isn't like the
> advantages of this are unclear to the development team, or that nothing is being
> done in that direction. Far from it.
I understand that, and I am conscius myself that you devs are aware of
it and that you would probably do something about it if it was less work
than it probably is. I believe you it is not easy considering the long
history of Emacs. I am just reflecting over how I feel every time I peek
into souces. It feels like I am looking into sqlite ammalgamation :-).
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 14:59 ` Julius Pfrommer
@ 2020-05-17 15:55 ` Eli Zaretskii
2020-05-17 16:28 ` Pip Cet
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 15:55 UTC (permalink / raw)
To: Julius Pfrommer; +Cc: emacs-devel
> Date: Sun, 17 May 2020 16:59:53 +0200
> From: Julius Pfrommer <julius.pfrommer@web.de>
> Cc: emacs-devel@gnu.org
>
> Let me phrase the question differently: Would it be okay to have a hard
> dependency on the Cairo+FreeType+Harfbuzz (CFH) libraries, as they are
> available everywhere?
First, we need to establish that this is a solution, and for what
problem(s). It is important to realize that the GUI backends we use
handle much more than just drawing text, they need to be able to
display GUI widgets, frame and window decorations (menu bar, tool bar,
scroll bars, the frame title, etc.), and much more. Is the
configuration you propose capable of doing all that? I don't think
the answer will be full and definitive until "Someone" walks through
all the APIs we implement in x/w32/ns/fns.c and x/w32/ns/term.c, and
makes sure they all can be covered.
Next, please be aware that we already made the decision to use
HarfBuzz as our main text-shaping engine. X and w32 already use it;
for NS someone has to write the code (and they are not very likely to
do so because macOS users consider the native text shaping more
feature-rich). Dropping the other font backends is a matter of time,
but it could take a long time.
In any case, the font backend is not the main issue here; in
particular, the likes of FreeType are hardly even seen except on very
low level of the code. It's the other aspects of GUI code that
bothers me much more.
> Even on Linux, this would unlock quite a few simplifications. I count
> at least three font handling "backends" here.
Down to 2 and one deprecated one on master. Bu again, font backends
is a relatively easy problem, and it is being dealt with.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 15:06 ` Arthur Miller
@ 2020-05-17 15:56 ` Eli Zaretskii
2020-05-17 16:50 ` Arthur Miller
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 15:56 UTC (permalink / raw)
To: Arthur Miller; +Cc: julius.pfrommer, emacs-devel
> From: Arthur Miller <arthur.miller@live.com>
> Cc: emacs-devel@gnu.org, Julius Pfrommer <julius.pfrommer@web.de>
> Date: Sun, 17 May 2020 17:06:35 +0200
>
> I am just reflecting over how I feel every time I peek
> into souces. It feels like I am looking into sqlite ammalgamation :-).
It was worse just a year ago. It will be better a year from now.
patches are welcome.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 15:55 ` Eli Zaretskii
@ 2020-05-17 16:28 ` Pip Cet
2020-05-17 17:00 ` Eli Zaretskii
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-17 16:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel, Julius Pfrommer
On Sun, May 17, 2020 at 3:56 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Sun, 17 May 2020 16:59:53 +0200
> > From: Julius Pfrommer <julius.pfrommer@web.de>
> > Cc: emacs-devel@gnu.org
> Next, please be aware that we already made the decision to use
> HarfBuzz as our main text-shaping engine.
That's a decision that, having just played with HarfBuzz, I find
puzzling. It appears to have no practical support for treating
ligatures as anything but monolithic glyphs: is there a documented way
of getting HarfBuzz to tell you which part of the "ffi" ligature is
the middle "f"? I suspect the answer is "there are some languages
whose scripts don't allow for the equivalent operation, so we won't
support it at all, as a matter of principle".
I'm not sure PangoCairo does better, but whatever Libreoffice uses
appears to get the job done, so at least one display engine out there
solves this problem.
(This is assuming we want kerning, ligatures, and subpixel rendering
for English text. "Real" text shaping, composition, reordrant glyphs,
and bidi concerns are something that I can't really comment on, beyond
admitting that, of course, supporting the world's major languages at
all is more important than supporting English with the typographic
finesse we currently lack).
Years ago, I ran a WebAssembly version of Emacs in a web browser.
(Back then, I used a terminal emulator written in JavaScript.) I'd
certainly like to do that again some day, and I think a hard
dependency on Cairo and FreeType would make that even harder.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 15:56 ` Eli Zaretskii
@ 2020-05-17 16:50 ` Arthur Miller
2020-05-17 17:06 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Arthur Miller @ 2020-05-17 16:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: julius.pfrommer, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Arthur Miller <arthur.miller@live.com>
>> Cc: emacs-devel@gnu.org, Julius Pfrommer <julius.pfrommer@web.de>
>> Date: Sun, 17 May 2020 17:06:35 +0200
>>
>> I am just reflecting over how I feel every time I peek
>> into souces. It feels like I am looking into sqlite ammalgamation :-).
>
> It was worse just a year ago. It will be better a year from now.
> patches are welcome.
Are there any guidelines if one would like to restructure something?
For example, I am looking a lot in image.c I was playing with line drawing
on an image the other day, and I would love to not have to look into ns
and gdi code while working with x11 & cairo only. It is so easy to miss
if a single line is actually outside of some platform ifdef and similar. It
is so messy, at least if one is n00b like me :-).
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 16:28 ` Pip Cet
@ 2020-05-17 17:00 ` Eli Zaretskii
2020-05-17 18:50 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 17:00 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel, julius.pfrommer
> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 17 May 2020 16:28:30 +0000
> Cc: Julius Pfrommer <julius.pfrommer@web.de>, emacs-devel@gnu.org
>
> On Sun, May 17, 2020 at 3:56 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > Date: Sun, 17 May 2020 16:59:53 +0200
> > > From: Julius Pfrommer <julius.pfrommer@web.de>
> > > Cc: emacs-devel@gnu.org
> > Next, please be aware that we already made the decision to use
> > HarfBuzz as our main text-shaping engine.
>
> That's a decision that, having just played with HarfBuzz, I find
> puzzling. It appears to have no practical support for treating
> ligatures as anything but monolithic glyphs: is there a documented way
> of getting HarfBuzz to tell you which part of the "ffi" ligature is
> the middle "f"?
You are accusing HarfBuzz of crimes it didn't commit ;-) What you see
is not produced by HarfBuzz, it's produced by Emacs.
HarfBuzz (and any other text-shaping engine we ever used) has a very
simple job: Emacs hands it a string of codepoints, and HarfBuzz
returns a series of font glyphs to be used to display that string.
That's all. All the rest is the Emacs display engine.
And yes, the current design is that a ligature (like any other
"grapheme cluster" produced by character composition) is a single
"display element": you move across all of it with a single C-f/C-b.
The only exception to this rule is that we allow DEL (but not C-d or
Delete) to erase individual codepoints going back from the end of the
grapheme cluster -- to facilitate editing ligatures and other composed
characters. This is the minimum "editing" capability that the user
must have, and I don't think I've heard complaints that it wasn't
enough. But if required, we could easily add special forward and
backward movements that could "enter" the composed character, we just
need to figure out how to display the result in order to give the user
some visual feedback. (Without visual feedback, I think you can have
it today if you customize global-disable-point-adjustment to a non-nil
value.)
In any case, the question "which part of the ligature corresponds to
some codepoint" is meaningless in the context of ligation and complex
text shaping: a sequence of N codepoints in general produces M font
glyphs, where M can be smaller, equal, or greater than N. The
relation between the N codepoints and M glyphs is many-to-many.
> I'm not sure PangoCairo does better, but whatever Libreoffice uses
> appears to get the job done
What job is that?
> (This is assuming we want kerning, ligatures, and subpixel rendering
> for English text. "Real" text shaping, composition, reordrant glyphs,
> and bidi concerns are something that I can't really comment on, beyond
> admitting that, of course, supporting the world's major languages at
> all is more important than supporting English with the typographic
> finesse we currently lack).
The truth is that "we" the Emacs project don't want to know anything
about ligatures, we want to delegate that job to the shaper. That's
the shaper's job, and HarfBuzz does its job very well and stays on top
of the relevant technological advances.
> Years ago, I ran a WebAssembly version of Emacs in a web browser.
> (Back then, I used a terminal emulator written in JavaScript.) I'd
> certainly like to do that again some day, and I think a hard
> dependency on Cairo and FreeType would make that even harder.
I think there's some measure of confusion here: AFAIR we don't use
Cairo for text shaping, only for its display. IOW, we tell Cairo to
display this and that glyphs, after HarfBuzz returned them.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 16:50 ` Arthur Miller
@ 2020-05-17 17:06 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 17:06 UTC (permalink / raw)
To: Arthur Miller; +Cc: julius.pfrommer, emacs-devel
> From: Arthur Miller <arthur.miller@live.com>
> Cc: emacs-devel@gnu.org, julius.pfrommer@web.de
> Date: Sun, 17 May 2020 18:50:04 +0200
>
> Are there any guidelines if one would like to restructure something?
The guideline is to factor any GUI code into common part and
platform-specific part, and define interfaces for the latter whose
implementation is in the corresponding *term.[cm] or *fns.[cm].
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 15:55 ` Eli Zaretskii
2020-05-17 16:28 ` Pip Cet
@ 2020-05-17 18:28 ` Julius Pfrommer
2020-05-17 18:45 ` Eli Zaretskii
` (2 more replies)
1 sibling, 3 replies; 145+ messages in thread
From: Julius Pfrommer @ 2020-05-17 18:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> First, we need to establish that this is a solution, and for what
> problem(s). It is important to realize that the GUI backends we use
> handle much more than just drawing text, they need to be able to
> display GUI widgets, frame and window decorations (menu bar, tool bar,
> scroll bars, the frame title, etc.), and much more.
I am quite supportive of the native GUI toolkits.
Cairo is a vector-drawing library and only responsible for the "glass"
of each frame (called the "canvas" in other communities). All the
event-handling logic, menu-drawing, etc. is untouched by it.
> Next, please be aware that we already made the decision to use
> HarfBuzz as our main text-shaping engine. X and w32 already use it;
Very good to see Emacs settle on HarfBuzz! Text-shaping touches into the
very core, as the glyph rendering impacts line-breaking, redisplay, and
so on.
> I don't think the answer will be full and definitive until "Someone"
> walks through all the APIs we implement in x/w32/ns/fns.c and
> x/w32/ns/term.c, and makes sure they all can be covered.
Looking at xterm.c, it is littered with #ifdef USE_CAIRO.
A first step could be to assume Cairo on X-based platforms and remove
duplicate code. The second step could be to decouple the "glass" from
the tookit "chrome" more thoroughly in xterm.c. That is easier to do
when a Cairo-canvas can be assumed for drawing.
Then, that entire "glass" could be reused by other platforms once they
have a Cairo-canvas for drawing as well. (Modulo the XWidget support
that depends on GTK.)
Once a switchover is in reach, it can live separately to the existing
platform-specific "glass" until all the kinks are worked out.
Regards, Julius
Am Sun, 17 May 2020 18:55:23 +0300
schrieb Eli Zaretskii <eliz@gnu.org>:
> > Date: Sun, 17 May 2020 16:59:53 +0200
> > From: Julius Pfrommer <julius.pfrommer@web.de>
> > Cc: emacs-devel@gnu.org
> >
> > Let me phrase the question differently: Would it be okay to have a
> > hard dependency on the Cairo+FreeType+Harfbuzz (CFH) libraries, as
> > they are available everywhere?
>
> First, we need to establish that this is a solution, and for what
> problem(s). It is important to realize that the GUI backends we use
> handle much more than just drawing text, they need to be able to
> display GUI widgets, frame and window decorations (menu bar, tool bar,
> scroll bars, the frame title, etc.), and much more. Is the
> configuration you propose capable of doing all that? I don't think
> the answer will be full and definitive until "Someone" walks through
> all the APIs we implement in x/w32/ns/fns.c and x/w32/ns/term.c, and
> makes sure they all can be covered.
>
> Next, please be aware that we already made the decision to use
> HarfBuzz as our main text-shaping engine. X and w32 already use it;
> for NS someone has to write the code (and they are not very likely to
> do so because macOS users consider the native text shaping more
> feature-rich). Dropping the other font backends is a matter of time,
> but it could take a long time.
>
> In any case, the font backend is not the main issue here; in
> particular, the likes of FreeType are hardly even seen except on very
> low level of the code. It's the other aspects of GUI code that
> bothers me much more.
>
> > Even on Linux, this would unlock quite a few simplifications. I
> > count at least three font handling "backends" here.
>
> Down to 2 and one deprecated one on master. Bu again, font backends
> is a relatively easy problem, and it is being dealt with.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
@ 2020-05-17 18:45 ` Eli Zaretskii
2020-05-17 22:28 ` chad
2020-05-18 22:08 ` Alan Third
2 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 18:45 UTC (permalink / raw)
To: Julius Pfrommer; +Cc: emacs-devel
> Date: Sun, 17 May 2020 20:28:02 +0200
> From: Julius Pfrommer <julius.pfrommer@web.de>
> Cc: emacs-devel@gnu.org
>
> Cairo is a vector-drawing library and only responsible for the "glass"
> of each frame (called the "canvas" in other communities). All the
> event-handling logic, menu-drawing, etc. is untouched by it.
Which is what I said. So Cairo alone will be unable to provide all
the GUI features we need, we will need something else. And that
something is done different on different platforms.
> Looking at xterm.c, it is littered with #ifdef USE_CAIRO.
Yes, because Cairo and Xlib are two quite different ways of doing GUI
display.
> A first step could be to assume Cairo on X-based platforms and remove
> duplicate code.
We are going there, but it takes time. We've just made Cairo the
default build on master; it couldn't be that previously because the
Cairo code had several grave bugs which took us time to fix.
> The second step could be to decouple the "glass" from
> the tookit "chrome" more thoroughly in xterm.c. That is easier to do
> when a Cairo-canvas can be assumed for drawing.
>
> Then, that entire "glass" could be reused by other platforms once they
> have a Cairo-canvas for drawing as well. (Modulo the XWidget support
> that depends on GTK.)
>
> Once a switchover is in reach, it can live separately to the existing
> platform-specific "glass" until all the kinks are worked out.
Sounds like a good plan for several years, maybe more, of extensive
development on several platforms. Can I interest you in doing this?
And meanwhile, we also need to come up with enough new features every
2 - 3 years to keep our users engaged and attract new ones.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 17:00 ` Eli Zaretskii
@ 2020-05-17 18:50 ` Pip Cet
2020-05-17 19:17 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-17 18:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel, julius.pfrommer
On Sun, May 17, 2020 at 5:00 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 17 May 2020 16:28:30 +0000
> > Cc: Julius Pfrommer <julius.pfrommer@web.de>, emacs-devel@gnu.org
> >
> > On Sun, May 17, 2020 at 3:56 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > > Date: Sun, 17 May 2020 16:59:53 +0200
> > > > From: Julius Pfrommer <julius.pfrommer@web.de>
> > > > Cc: emacs-devel@gnu.org
> > > Next, please be aware that we already made the decision to use
> > > HarfBuzz as our main text-shaping engine.
> >
> > That's a decision that, having just played with HarfBuzz, I find
> > puzzling. It appears to have no practical support for treating
> > ligatures as anything but monolithic glyphs: is there a documented way
> > of getting HarfBuzz to tell you which part of the "ffi" ligature is
> > the middle "f"?
>
> You are accusing HarfBuzz of crimes it didn't commit ;-) What you see
> is not produced by HarfBuzz, it's produced by Emacs.
I don't think that's true.
> HarfBuzz (and any other text-shaping engine we ever used) has a very
> simple job: Emacs hands it a string of codepoints, and HarfBuzz
> returns a series of font glyphs to be used to display that string.
> That's all. All the rest is the Emacs display engine.
HarfBuzz also tells you which codepoints are used for which glyphs. It
should also, for languages where it can do so, tell you which
codepoints are used for which subglyphs. It fails to do the latter.
(I'm aware of what the Emacs display engine does; I'm, obviously, not
accusing HarfBuzz of failing to present ligatures, because that's
easily fixable. What isn't easily fixable is going back from the
ligature glyph to its subglyphs. LibreOffice does it, and I wonder
how, because the alternative is jumping back and forth between
ligatures and individual characters depending on where PT is, and that
looks horrible.)
> And yes, the current design is that a ligature (like any other
> "grapheme cluster" produced by character composition) is a single
> "display element": you move across all of it with a single C-f/C-b.
I'm using a different design :-)
That one is simply unworkable for English and its limited traditional
set of ligatures.
> In any case, the question "which part of the ligature corresponds to
> some codepoint" is meaningless in the context of ligation and complex
> text shaping:
No, it's not. It's meaningless for some languages, but not for English
and its limited set of traditional ligatures. That a problem cannot be
solved in general is no excuse to refuse to solve it in the specific
cases where it can be.
> > I'm not sure PangoCairo does better, but whatever Libreoffice uses
> > appears to get the job done
>
> What job is that?
LibreOffice highlights sub-glyphs of ligatures correctly. I enter
"official", and it renders <o> <ffi> <c> <i> <a> <l>. I move the
cursor right twice, and it highlights precisely what it should, the
middle "f" of the ligature glyph.
> > (This is assuming we want kerning, ligatures, and subpixel rendering
> > for English text. "Real" text shaping, composition, reordrant glyphs,
> > and bidi concerns are something that I can't really comment on, beyond
> > admitting that, of course, supporting the world's major languages at
> > all is more important than supporting English with the typographic
> > finesse we currently lack).
>
> The truth is that "we" the Emacs project don't want to know anything
> about ligatures, we want to delegate that job to the shaper.
I don't see how that's true. Treating a ligature as a single character
for entry purposes is simply unworkable for English. It might be okay
for other languages, but for English, we really need to display "ffi"
correctly and still allow it to be edited as three characters.
> That's
> the shaper's job, and HarfBuzz does its job very well and stays on top
> of the relevant technological advances.
I don't see any evidence for that positive statement about HarfBuzz:
out of the box, Emacs fails miserably to do anything about English
ligatures. Trying to find a way to fix it, I ran into HarfBuzz
limitations that appear to make it impossible to use it to deal with
English ligatures. It might deal very well with other languages and
their ligatures, but for English text, it fails to do what TeX did
since its inception.
> > Years ago, I ran a WebAssembly version of Emacs in a web browser.
> > (Back then, I used a terminal emulator written in JavaScript.) I'd
> > certainly like to do that again some day, and I think a hard
> > dependency on Cairo and FreeType would make that even harder.
>
> I think there's some measure of confusion here: AFAIR we don't use
> Cairo for text shaping, only for its display. IOW, we tell Cairo to
> display this and that glyphs, after HarfBuzz returned them.
Yes, that's correct. Which means that a WebAssembly version of Emacs
would need to bundle Cairo, even though it would prefer to simply
render things in the browser using HTML 5 canvases or something
similar.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 18:50 ` Pip Cet
@ 2020-05-17 19:17 ` Eli Zaretskii
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-17 19:17 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel, julius.pfrommer
> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 17 May 2020 18:50:19 +0000
> Cc: julius.pfrommer@web.de, emacs-devel@gnu.org
>
> HarfBuzz also tells you which codepoints are used for which glyphs. It
> should also, for languages where it can do so, tell you which
> codepoints are used for which subglyphs. It fails to do the latter.
No, it doesn't fail. You can see what it tells us in the display of
the composition produced by "C-u C-x =".
> That one is simply unworkable for English and its limited traditional
> set of ligatures.
The main reason we want ligatures in Emacs is for displaying program
source. Latin ligatures are not the main reason. But I see no reason
we couldn't do what you want, it's just the question of someone who'd
need to write the code. The information is there.
> LibreOffice highlights sub-glyphs of ligatures correctly. I enter
> "official", and it renders <o> <ffi> <c> <i> <a> <l>. I move the
> cursor right twice, and it highlights precisely what it should, the
> middle "f" of the ligature glyph.
We can do that in Emacs as well. The information is there, we just
need to use it. For Latin ligatures that information will allow the
display you describe. Doing that for other scripts would be harder,
and the results will be less one-to-one.
> > The truth is that "we" the Emacs project don't want to know anything
> > about ligatures, we want to delegate that job to the shaper.
>
> I don't see how that's true. Treating a ligature as a single character
> for entry purposes is simply unworkable for English.
I didn't say we must treat ligatures as a single character, I just
said we do that now. But that has nothing to do with the fact that we
want all the information about the ligature to come from the shaper.
> out of the box, Emacs fails miserably to do anything about English
> ligatures. Trying to find a way to fix it, I ran into HarfBuzz
> limitations that appear to make it impossible to use it to deal with
> English ligatures. It might deal very well with other languages and
> their ligatures, but for English text, it fails to do what TeX did
> since its inception.
I don't think this is right, but since you haven't shown any code, or
what you tried to do, or which HarfBuzz limitations you allude to, it
is hard to be more specific. I can only suggest, again, to look at
the output of "C-u C-x =" -- that information comes directly from
HarfBuzz.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
2020-05-17 18:45 ` Eli Zaretskii
@ 2020-05-17 22:28 ` chad
2020-05-18 22:08 ` Alan Third
2 siblings, 0 replies; 145+ messages in thread
From: chad @ 2020-05-17 22:28 UTC (permalink / raw)
To: Julius Pfrommer; +Cc: Eli Zaretskii, EMACS development team
[-- Attachment #1: Type: text/plain, Size: 727 bytes --]
On Sun, May 17, 2020 at 11:30 AM Julius Pfrommer <julius.pfrommer@web.de>
wrote:
> A first step could be to assume Cairo on X-based platforms and remove
> duplicate code. The second step could be to decouple the "glass" from
> the tookit "chrome" more thoroughly in xterm.c. That is easier to do
> when a Cairo-canvas can be assumed for drawing.
>
> Then, that entire "glass" could be reused by other platforms once they
> have a Cairo-canvas for drawing as well. (Modulo the XWidget support
> that depends on GTK.)
>
FWIW, there exists code to bring xwidgets and webkit into macOS without GTK:
https://github.com/veshboo/emacs
I haven't tried it since my macOS machine died ~1.5 years ago, but it
worked once upon a time.
[-- Attachment #2: Type: text/html, Size: 1210 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-17 19:17 ` Eli Zaretskii
@ 2020-05-18 16:08 ` Eli Zaretskii
2020-05-18 16:45 ` tomas
` (3 more replies)
0 siblings, 4 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 16:08 UTC (permalink / raw)
To: pipcet; +Cc: emacs-devel
> Date: Sun, 17 May 2020 22:17:17 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org, julius.pfrommer@web.de
>
> > LibreOffice highlights sub-glyphs of ligatures correctly. I enter
> > "official", and it renders <o> <ffi> <c> <i> <a> <l>. I move the
> > cursor right twice, and it highlights precisely what it should, the
> > middle "f" of the ligature glyph.
>
> We can do that in Emacs as well. The information is there, we just
> need to use it. For Latin ligatures that information will allow the
> display you describe. Doing that for other scripts would be harder,
> and the results will be less one-to-one.
On second thought, I think I misunderstood you. If the font that is
used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
highlights parts of this glyph, then I'd like to know how it does
that, and how far does this capability extend. I mean, what does it
do with ligatures like ae, displayed as æ -- does it highlight the
common vertical stroke for both parts? And what about "st", displayed
as st -- this has a curved "hand" connecting s and t -- to which of the
2 does it belong for the purposes of highlighting? There's also "hv"
displayed as ƕ, let alone "fs" displayed as ẞ and "fz" displayed as
ß.
IOW, I really don't think I understand how this could work even for
what you call "English ligatures". Do you know how they do it?
The information I said we get from HarfBuzz is returned when HarfBuzz
produces a grapheme cluster from several font glyphs. When the result
is a single font glyph, that information just says which of the
original codepoints are to be displayed as that single glyph, it
doesn't provide any sub-glyph information.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
@ 2020-05-18 16:45 ` tomas
2020-05-18 16:49 ` Eli Zaretskii
2020-05-18 17:05 ` Ligatures Stefan Monnier
` (2 subsequent siblings)
3 siblings, 1 reply; 145+ messages in thread
From: tomas @ 2020-05-18 16:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: pipcet, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 866 bytes --]
On Mon, May 18, 2020 at 07:08:45PM +0300, Eli Zaretskii wrote:
> > Date: Sun, 17 May 2020 22:17:17 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: emacs-devel@gnu.org, julius.pfrommer@web.de
> >
> > > LibreOffice highlights sub-glyphs of ligatures correctly. I enter
> > > "official", and it renders <o> <ffi> <c> <i> <a> <l>. I move the
> > > cursor right twice, and it highlights precisely what it should, the
> > > middle "f" of the ligature glyph.
[...]
> On second thought, I think I misunderstood you. If the font that is
> used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
> highlights parts of this glyph, then I'd like to know how it does
> that [...]
Didn't work for me [1]. It treated the whole ligature as one "character".
Cheers
[1] LibreOffice 6.1.5.2 10(Build:2), Debian GNU/Linux (buster).
-- tomás
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 16:45 ` tomas
@ 2020-05-18 16:49 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 16:49 UTC (permalink / raw)
To: tomas; +Cc: pipcet, emacs-devel
> Date: Mon, 18 May 2020 18:45:43 +0200
> Cc: pipcet@gmail.com, emacs-devel@gnu.org
> From: <tomas@tuxteam.de>
>
> > On second thought, I think I misunderstood you. If the font that is
> > used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
> > highlights parts of this glyph, then I'd like to know how it does
> > that [...]
>
> Didn't work for me [1]. It treated the whole ligature as one "character".
That's what I'd expect.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
2020-05-18 16:45 ` tomas
@ 2020-05-18 17:05 ` Stefan Monnier
2020-05-18 17:18 ` Ligatures Eli Zaretskii
2020-05-18 17:24 ` Ligatures tomas
2020-05-18 17:31 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Clément Pit-Claudel
2020-05-19 5:43 ` Ligatures ASSI
3 siblings, 2 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-18 17:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: pipcet, emacs-devel
[ I know nothing about the underlying APIs and such, so speaking here
only as a random user. ]
> On second thought, I think I misunderstood you. If the font that is
> used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
> highlights parts of this glyph, then I'd like to know how it does
> that, and how far does this capability extend. I mean, what does it
> do with ligatures like ae, displayed as æ -- does it highlight the
> common vertical stroke for both parts? And what about "st", displayed
> as st -- this has a curved "hand" connecting s and t -- to which of the
> 2 does it belong for the purposes of highlighting?
As a mere user I wouldn't care very much about this detail: I'd just
want the cursor to have 2 different positions depending on whether I'm
on the "s" or on the "t", and hopefully those two positions are
sufficiently self-evident that I don't have to read a manual to
understand which is which.
So, maybe we don't need very much info: all we need is a boolean which
tells us whether the glyph should be treated atomically or not.
When not treating it atomically, we would (somewhat arbitrarily) divide
the glyph horizontally into N equal sized "subglyphs" and draw the
cursor on the corresponding subglyph.
If Harfbuzz could tell us more precisely how to divide the glyph into
subglyphs, we could do a better job, of course.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:05 ` Ligatures Stefan Monnier
@ 2020-05-18 17:18 ` Eli Zaretskii
2020-05-18 19:19 ` Ligatures Pip Cet
2020-05-18 17:24 ` Ligatures tomas
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 17:18 UTC (permalink / raw)
To: Stefan Monnier; +Cc: pipcet, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: pipcet@gmail.com, emacs-devel@gnu.org
> Date: Mon, 18 May 2020 13:05:53 -0400
>
> So, maybe we don't need very much info: all we need is a boolean which
> tells us whether the glyph should be treated atomically or not.
> When not treating it atomically, we would (somewhat arbitrarily) divide
> the glyph horizontally into N equal sized "subglyphs" and draw the
> cursor on the corresponding subglyph.
That strikes me as not a very user-friendly UX. Especially if you
keep in mind that glyphs can be composed into a grapheme cluster using
2D offsets, not just left-right one-dimensional offsets.
An alternative which might be nicer is to "split" the composition:
display it as if a ZWNJ character was inserted at point. Thus, moving
forward one buffer position into the ffi would show f followed by a thin bar
cursor followed by the fi; moving forward one more buffer position
would show ff followed by a thin bar cursor followed by i. Etc.
> If Harfbuzz could tell us more precisely how to divide the glyph into
> subglyphs, we could do a better job, of course.
I don't think it's possible because AFAIK fonts don't store this
information. It should be possible, of course, to have a private
database of such offsets, but I don't really see how it could work in
general.
Maybe I'm missing something, though. If someone wants to have a
definitive answer, I suggest to ask on the HarfBuzz mailing list.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:05 ` Ligatures Stefan Monnier
2020-05-18 17:18 ` Ligatures Eli Zaretskii
@ 2020-05-18 17:24 ` tomas
2020-05-18 17:41 ` Ligatures Eli Zaretskii
2020-05-18 20:33 ` Ligatures Stefan Monnier
1 sibling, 2 replies; 145+ messages in thread
From: tomas @ 2020-05-18 17:24 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 909 bytes --]
On Mon, May 18, 2020 at 01:05:53PM -0400, Stefan Monnier wrote:
> [ I know nothing about the underlying APIs and such, so speaking here
> only as a random user. ]
[...]
> So, maybe we don't need very much info: all we need is a boolean which
> tells us whether the glyph should be treated atomically or not.
> When not treating it atomically, we would (somewhat arbitrarily) divide
> the glyph horizontally into N equal sized "subglyphs" and draw the
> cursor on the corresponding subglyph.
I'm somewhat out of my depth here, but I have the hunch that some
"ligatures" aren't "just stacked horizontally".
> If Harfbuzz could tell us more precisely how to divide the glyph into
> subglyphs, we could do a better job, of course.
On a very superficial glance it seems they can [1]
Cheers
[1] https://github.com/harfbuzz/harfbuzz/blob/master/docs/usermanual-clusters.xml
-- tomás
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
2020-05-18 16:45 ` tomas
2020-05-18 17:05 ` Ligatures Stefan Monnier
@ 2020-05-18 17:31 ` Clément Pit-Claudel
2020-05-18 17:39 ` Eli Zaretskii
2020-05-19 10:09 ` Trevor Spiteri
2020-05-19 5:43 ` Ligatures ASSI
3 siblings, 2 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-18 17:31 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]
On 18/05/2020 12.08, Eli Zaretskii wrote:
> On second thought, I think I misunderstood you. If the font that is
> used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
> highlights parts of this glyph, then I'd like to know how it does
> that, and how far does this capability extend. I mean, what does it
> do with ligatures like ae, displayed as æ -- does it highlight the
> common vertical stroke for both parts? And what about "st", displayed
> as st -- this has a curved "hand" connecting s and t -- to which of the
> 2 does it belong for the purposes of highlighting? There's also "hv"
> displayed as ƕ, let alone "fs" displayed as ẞ and "fz" displayed as
> ß.
I've attached a screenshot with a few examples, though I couldn't find a font that displays ae as æ.
Firefox does the same as LibreOffice (try it here, for example: https://developer.mozilla.org/en-US/docs/Web/CSS/font-variant-ligatures). Since Firefox uses Harbuzz, I think there's a good chance we can support that feature too :)
[-- Attachment #2: ligatures.png --]
[-- Type: image/png, Size: 13362 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 17:31 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Clément Pit-Claudel
@ 2020-05-18 17:39 ` Eli Zaretskii
2020-05-18 19:01 ` Clément Pit-Claudel
2020-05-19 10:09 ` Trevor Spiteri
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 17:39 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Mon, 18 May 2020 13:31:30 -0400
>
> I've attached a screenshot with a few examples, though I couldn't find a font that displays ae as æ.
Thanks. Once again, I wonder how they decide where each parts starts
and ends. The examples show very simple cases, so it's hard to know
where this ends.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:24 ` Ligatures tomas
@ 2020-05-18 17:41 ` Eli Zaretskii
2020-05-18 19:07 ` Ligatures tomas
2020-05-18 20:33 ` Ligatures Stefan Monnier
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 17:41 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
> Date: Mon, 18 May 2020 19:24:41 +0200
> From: <tomas@tuxteam.de>
>
> > If Harfbuzz could tell us more precisely how to divide the glyph into
> > subglyphs, we could do a better job, of course.
>
> On a very superficial glance it seems they can [1]
>
> Cheers
> [1] https://github.com/harfbuzz/harfbuzz/blob/master/docs/usermanual-clusters.xml
AFAIK, each "cluster" corresponds to a single font glyph, and we
already get this information from HarfBuzz, see hbfont.c.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 17:39 ` Eli Zaretskii
@ 2020-05-18 19:01 ` Clément Pit-Claudel
2020-05-18 19:15 ` Eli Zaretskii
` (3 more replies)
0 siblings, 4 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-18 19:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 18/05/2020 13.39, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Mon, 18 May 2020 13:31:30 -0400
>>
>> I've attached a screenshot with a few examples, though I couldn't find a font that displays ae as æ.
>
> Thanks. Once again, I wonder how they decide where each parts starts
> and ends. The examples show very simple cases, so it's hard to know
> where this ends.
Hi Eli,
I asked on Firefox' Matrix server. Here is a lightly edited transcript:
cpitclaudel> Hi all. I noticed that Firefox has this nifty feature that makes it possible to move the cursor within a ligature (for example, with the right font config, "ffi" can be rendered as "ffi" while allowing the cursor to move between the individual glyphs that make up that composition). Is the extraction of ligature information and the rendering done by Firefox itself, or by a lower-level library? Most font shaping libraries I've seen don't seem to return glyph-decomposition information for ligatures, so I'm curious to understand how Firefox does it ^^
jfkthame> Firefox uses harfbuzz to handle the font shaping (ligature rules, etc). I'd expect what you describe to work pretty much the same in other browsers too, fwiw.
krosylight
cpitclaudel> Thanks! But Harfbuzz doesn't give sub-glyph information for ligatures, does it? So how does Firefox know where to put the caret when it moves through a ligature?
jfkthame> it doesn't, really - it just knows how many underlying characters are represented by the ligature glyph, and divides the advance width up into that many slices (usually that works pretty reasonably, but it's possible to come up with fonts where the inaccuracy becomes obvious)
jfkthame> In principle, OpenType fonts can provide specific positions for the caret within a ligature (see the LigatureCaretList subtable within the GDEF table), but in practice that's rarely supported or used (harfbuzz can provide this information if it's present, see the hb_ot_layout_get_ligature_carets function, but currently firefox doesn't use it anyhow)
cpitclaudel> Thanks, that's very useful! How does that work for glyphs like "fs" displayed as ẞ or "fz" displayed as ß? Does Firefox move in that single glyph? (I couldn't find a font that does that, otherwise I'd have tested it ^^) Thanks a lot for your help :)
jfkthame> Yes, it'd be the same - doesn't matter what the specific characters are, if there's a ligature of two characters Firefox would put the caret half-way through the ligature glyph when it is between the component characters in the underlying text
jfkthame> btw, if you're on a mac (or have access to one), you can see an extreme case if you try the word "Zapfino" in the font Zapfino .... the entire word is a single 7-character ligature, and the seven equal slices that Firefox treats it as for selection/editing purposes don't match up to the visual shapes of the sub-glyphs at all well
HTH,
Clément.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:41 ` Ligatures Eli Zaretskii
@ 2020-05-18 19:07 ` tomas
2020-05-18 19:17 ` Ligatures Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: tomas @ 2020-05-18 19:07 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 744 bytes --]
On Mon, May 18, 2020 at 08:41:09PM +0300, Eli Zaretskii wrote:
> > Date: Mon, 18 May 2020 19:24:41 +0200
> > From: <tomas@tuxteam.de>
> >
> > > If Harfbuzz could tell us more precisely how to divide the glyph into
> > > subglyphs, we could do a better job, of course.
> >
> > On a very superficial glance it seems they can [1]
> >
> > Cheers
> > [1] https://github.com/harfbuzz/harfbuzz/blob/master/docs/usermanual-clusters.xml
>
> AFAIK, each "cluster" corresponds to a single font glyph, and we
> already get this information from HarfBuzz, see hbfont.c.
I see, thanks. As I said, my reading was a very cursory. I'm sure
you read that doc much more thoroughly than me :-)
Thanks for the insights
Cheers
-- tomás
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 19:01 ` Clément Pit-Claudel
@ 2020-05-18 19:15 ` Eli Zaretskii
2020-05-18 19:18 ` tomas
` (2 subsequent siblings)
3 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 19:15 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Mon, 18 May 2020 15:01:49 -0400
>
> I asked on Firefox' Matrix server. Here is a lightly edited transcript:
Thanks. So it's pure heuristic, and works only in simple cases.
We could ask on the HarfBuzz list how many fonts provide meaningful
information for the hb_ot_layout_get_ligature_carets function to
return useful data. If someone is interested in working on that, that
is.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:07 ` Ligatures tomas
@ 2020-05-18 19:17 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 19:17 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
> Date: Mon, 18 May 2020 21:07:35 +0200
> From: tomas@tuxteam.de
> Cc: emacs-devel@gnu.org
>
> > > [1] https://github.com/harfbuzz/harfbuzz/blob/master/docs/usermanual-clusters.xml
> >
> > AFAIK, each "cluster" corresponds to a single font glyph, and we
> > already get this information from HarfBuzz, see hbfont.c.
>
> I see, thanks. As I said, my reading was a very cursory. I'm sure
> you read that doc much more thoroughly than me :-)
Some of the docs is impossible to understand without asking the
HarfBuzz developers (who are always willing to help). The HarfBuzz
docs is really minimal.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 19:01 ` Clément Pit-Claudel
2020-05-18 19:15 ` Eli Zaretskii
@ 2020-05-18 19:18 ` tomas
2020-05-18 20:37 ` Ligatures Stefan Monnier
2020-05-18 21:59 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Alan Third
3 siblings, 0 replies; 145+ messages in thread
From: tomas @ 2020-05-18 19:18 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 968 bytes --]
On Mon, May 18, 2020 at 03:01:49PM -0400, Clément Pit-Claudel wrote:
> On 18/05/2020 13.39, Eli Zaretskii wrote:
> >> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> >> Date: Mon, 18 May 2020 13:31:30 -0400
> >>
> >> I've attached a screenshot with a few examples, though I couldn't find a font that displays ae as æ.
> >
> > Thanks. Once again, I wonder how they decide where each parts starts
> > and ends. The examples show very simple cases, so it's hard to know
> > where this ends.
>
> Hi Eli,
>
> I asked on Firefox' Matrix server. Here is a lightly edited transcript:
Thanks, that's interesting. So they just assume the subcharacters in a
cluster stack side-by-side. Works most of the time, but is bound to
give surprising results with things which stack the "wrong" way (i.e.
on the top or bottom for LR or RL scripts, like accents and crazy
scripts like Devanagari).
Thanks for gathering the information.
Cheers
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:18 ` Ligatures Eli Zaretskii
@ 2020-05-18 19:19 ` Pip Cet
2020-05-18 19:25 ` Ligatures tomas
` (2 more replies)
0 siblings, 3 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-18 19:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
On Mon, May 18, 2020 at 5:18 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > So, maybe we don't need very much info: all we need is a boolean which
> > tells us whether the glyph should be treated atomically or not.
> > When not treating it atomically, we would (somewhat arbitrarily) divide
> > the glyph horizontally into N equal sized "subglyphs" and draw the
> > cursor on the corresponding subglyph.
>
> That strikes me as not a very user-friendly UX. Especially if you
> keep in mind that glyphs can be composed into a grapheme cluster using
> 2D offsets, not just left-right one-dimensional offsets.
So such clusters would be marked as atomic? I like Stefan's proposal,
and maybe it's what LibreOffice actually does: at large font sizes,
the horizontal division of "subglyphs" seems off.
> An alternative which might be nicer is to "split" the composition:
> display it as if a ZWNJ character was inserted at point. Thus, moving
> forward one buffer position into the ffi would show f followed by a thin bar
> cursor followed by the fi; moving forward one more buffer position
> would show ff followed by a thin bar cursor followed by i. Etc.
I tried something like that (with a variable-pitch font), and the
effect is nauseating because the rest of the line shifts as the width
of the word at point changes. What I tried was to use Harfbuzz to
shape entire words when PT is not in them, then split them up into
individual characters (the way it's done now) when PT enters them.
Of course, people might still like it.
> > If Harfbuzz could tell us more precisely how to divide the glyph into
> > subglyphs, we could do a better job, of course.
>
> I don't think it's possible because AFAIK fonts don't store this
> information.
Well, they should!
> It should be possible, of course, to have a private
> database of such offsets, but I don't really see how it could work in
> general.
And this is where it gets back to "let's not hardcode the dependency
on Harfbuzz and FreeType, because other backends might actually give
us the information we need".
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:19 ` Ligatures Pip Cet
@ 2020-05-18 19:25 ` tomas
2020-05-18 19:41 ` Ligatures Pip Cet
2020-05-18 19:33 ` Ligatures Eli Zaretskii
2020-05-18 19:38 ` Ligatures Clément Pit-Claudel
2 siblings, 1 reply; 145+ messages in thread
From: tomas @ 2020-05-18 19:25 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 406 bytes --]
On Mon, May 18, 2020 at 07:19:19PM +0000, Pip Cet wrote:
[...]
> And this is where it gets back to "let's not hardcode the dependency
> on Harfbuzz and FreeType, because other backends might actually give
> us the information we need".
But how should a backend guess where the subparts of a cluster are
without the font providing it? And in the latter case, HarfBuzz
does give us the info.
Cheers
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:19 ` Ligatures Pip Cet
2020-05-18 19:25 ` Ligatures tomas
@ 2020-05-18 19:33 ` Eli Zaretskii
2020-05-18 19:44 ` Ligatures Clément Pit-Claudel
2020-05-18 19:38 ` Ligatures Clément Pit-Claudel
2 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-18 19:33 UTC (permalink / raw)
To: Pip Cet; +Cc: monnier, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 18 May 2020 19:19:19 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>
> > An alternative which might be nicer is to "split" the composition:
> > display it as if a ZWNJ character was inserted at point. Thus, moving
> > forward one buffer position into the ffi would show f followed by a thin bar
> > cursor followed by the fi; moving forward one more buffer position
> > would show ff followed by a thin bar cursor followed by i. Etc.
>
> I tried something like that (with a variable-pitch font), and the
> effect is nauseating because the rest of the line shifts as the width
> of the word at point changes.
The idea is that this is used only rarely. Most use cases don't need
to deconstruct a ligature that way; after all, that's what ligatures
are for.
> And this is where it gets back to "let's not hardcode the dependency
> on Harfbuzz and FreeType, because other backends might actually give
> us the information we need".
You cannot avoid hardcoding something, because each shaper has its
idiosyncrasies. But those are only limited to the implementation of
the font driver interfaces described in font.h, they don't leak above
that level. So if we will support such sub-glyph movements, we will
probably introduce one more method into the font driver interface, and
the display engine will use that.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:19 ` Ligatures Pip Cet
2020-05-18 19:25 ` Ligatures tomas
2020-05-18 19:33 ` Ligatures Eli Zaretskii
@ 2020-05-18 19:38 ` Clément Pit-Claudel
2020-05-19 14:55 ` Ligatures Pip Cet
2 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-18 19:38 UTC (permalink / raw)
To: emacs-devel
On 18/05/2020 15.19, Pip Cet wrote:
> So such clusters would be marked as atomic? I like Stefan's proposal,
> and maybe it's what LibreOffice actually does: at large font sizes,
> the horizontal division of "subglyphs" seems off.
Yup, that's what Firefox and LibreOffice do.
>>> If Harfbuzz could tell us more precisely how to divide the glyph into
>>> subglyphs, we could do a better job, of course.
>>
>> I don't think it's possible because AFAIK fonts don't store this
>> information.
>
> Well, they should!
They can, but few do (the LigatureCaretList subtable within the GDEF table)
>> It should be possible, of course, to have a private
>> database of such offsets, but I don't really see how it could work in
>> general.
>
> And this is where it gets back to "let's not hardcode the dependency
> on Harfbuzz and FreeType, because other backends might actually give
> us the information we need".
Harfbuzz can give us this info: hb_ot_layout_get_ligature_carets
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:25 ` Ligatures tomas
@ 2020-05-18 19:41 ` Pip Cet
2020-05-18 20:20 ` Ligatures tomas
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-18 19:41 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
On Mon, May 18, 2020 at 7:27 PM <tomas@tuxteam.de> wrote:
> > And this is where it gets back to "let's not hardcode the dependency
> > on Harfbuzz and FreeType, because other backends might actually give
> > us the information we need".
>
> But how should a backend guess where the subparts of a cluster are
> without the font providing it?
Well, of course it shouldn't. It should return the information that is
available, and then we can decide, based on a user setting, what we
want to do about it: the options are, at least, to treat the ligature
as atomic (the right thing to do for ligatures like %, &, and ß),
guess (possibly the right thing to do for ffi?), or refuse to use the
ligature in question and fall back to individual characters (which
isn't always possible, but it is what we do right now for ASCII
ligatures).
> And in the latter case, HarfBuzz
> does give us the info.
How so? I honestly don't think it does, because it would treat the
ligature as one glyph.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:33 ` Ligatures Eli Zaretskii
@ 2020-05-18 19:44 ` Clément Pit-Claudel
2020-05-19 2:25 ` Ligatures Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-18 19:44 UTC (permalink / raw)
To: emacs-devel
On 18/05/2020 15.33, Eli Zaretskii wrote:
>> From: Pip Cet <pipcet@gmail.com>
>> Date: Mon, 18 May 2020 19:19:19 +0000
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>>
>>> An alternative which might be nicer is to "split" the composition:
>>> display it as if a ZWNJ character was inserted at point. Thus, moving
>>> forward one buffer position into the ffi would show f followed by a thin bar
>>> cursor followed by the fi; moving forward one more buffer position
>>> would show ff followed by a thin bar cursor followed by i. Etc.
>> I tried something like that (with a variable-pitch font), and the
>> effect is nauseating because the rest of the line shifts as the width
>> of the word at point changes.
> The idea is that this is used only rarely. Most use cases don't need
> to deconstruct a ligature that way; after all, that's what ligatures
> are for.
In an earlier thread, you mentioned programming font ligatures — wouldn't it be very common to deconstruct such ligatures, like → into ->?
Maybe the effect wouldn't be jarring with monospaced fonts, but for these the simple approach of subdividing the glyph works nicely too.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:41 ` Ligatures Pip Cet
@ 2020-05-18 20:20 ` tomas
0 siblings, 0 replies; 145+ messages in thread
From: tomas @ 2020-05-18 20:20 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 603 bytes --]
On Mon, May 18, 2020 at 07:41:19PM +0000, Pip Cet wrote:
> On Mon, May 18, 2020 at 7:27 PM <tomas@tuxteam.de> wrote:
[...]
> > But how should a backend guess where the subparts of a cluster are
> > without the font providing it?
>
> Well, of course it shouldn't. It should return the information that is
> available [...]
> > And in the latter case, HarfBuzz
> > does give us the info.
>
> How so? I honestly don't think it does, because it would treat the
> ligature as one glyph.
Eli and Clément already looked it up for us: hb_ot_layout_get_ligature_carets()
Cheers
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 17:24 ` Ligatures tomas
2020-05-18 17:41 ` Ligatures Eli Zaretskii
@ 2020-05-18 20:33 ` Stefan Monnier
1 sibling, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-18 20:33 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
>> So, maybe we don't need very much info: all we need is a boolean which
>> tells us whether the glyph should be treated atomically or not.
>> When not treating it atomically, we would (somewhat arbitrarily) divide
>> the glyph horizontally into N equal sized "subglyphs" and draw the
>> cursor on the corresponding subglyph.
>
> I'm somewhat out of my depth here, but I have the hunch that some
> "ligatures" aren't "just stacked horizontally".
That's why we need a boolean to tell us whether this ligature is
"stacked horizontally" (which I called "not atomic").
This boolean could actually be a global constant (so it give the wrong
behavior half the time, but that would be good enough for those people
who use the kind of latin-ligatures talked about here and almost no
other ligatures, and would be no-worse than what we have now for people
who do use languages where many ligatures aren't "stacked horizontally".
But rather than a global constant, we could probably try and do better
either by asking the font-backend (in case it can provide that kind of
info) of by using a heuristic based on the script of the characters that
are being combined.
Obviously, I'm discussing a *heuristic*, not a 100% perfect solution.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:01 ` Clément Pit-Claudel
2020-05-18 19:15 ` Eli Zaretskii
2020-05-18 19:18 ` tomas
@ 2020-05-18 20:37 ` Stefan Monnier
2020-05-18 21:59 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Alan Third
3 siblings, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-18 20:37 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, emacs-devel
> jfkthame> it doesn't, really - it just knows how many underlying characters
> jfkthame> are represented by the ligature glyph, and divides the advance
> jfkthame> width up into that many slices (usually that works pretty
> jfkthame> reasonably, but it's possible to come up with fonts where the
> jfkthame> inaccuracy becomes obvious)
Apparently, great minds think alike ;-)
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 19:01 ` Clément Pit-Claudel
` (2 preceding siblings ...)
2020-05-18 20:37 ` Ligatures Stefan Monnier
@ 2020-05-18 21:59 ` Alan Third
2020-05-19 13:56 ` Eli Zaretskii
3 siblings, 1 reply; 145+ messages in thread
From: Alan Third @ 2020-05-18 21:59 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 721 bytes --]
On Mon, May 18, 2020 at 03:01:49PM -0400, Clément Pit-Claudel wrote:
> jfkthame> btw, if you're on a mac (or have access to one), you can
> see an extreme case if you try the word "Zapfino" in the font
> Zapfino .... the entire word is a single 7-character ligature, and
> the seven equal slices that Firefox treats it as for
> selection/editing purposes don't match up to the visual shapes of
> the sub-glyphs at all well
In case anyone's interested, I've attached a screenshot of Apple's
Pages.app displaying the word Zapfino with the cursor after the "a".
Clearly not ideal. OTOH, if LibreOffice, Firefox, and even Apple's
products do this, perhaps it's just the way people will expect it to
be done.
--
Alan Third
[-- Attachment #2: Screenshot 2020-05-18 at 22.52.23.png --]
[-- Type: image/png, Size: 28933 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
2020-05-17 18:45 ` Eli Zaretskii
2020-05-17 22:28 ` chad
@ 2020-05-18 22:08 ` Alan Third
2 siblings, 0 replies; 145+ messages in thread
From: Alan Third @ 2020-05-18 22:08 UTC (permalink / raw)
To: Julius Pfrommer; +Cc: Eli Zaretskii, emacs-devel
On Sun, May 17, 2020 at 08:28:02PM +0200, Julius Pfrommer wrote:
> > I don't think the answer will be full and definitive until "Someone"
> > walks through all the APIs we implement in x/w32/ns/fns.c and
> > x/w32/ns/term.c, and makes sure they all can be covered.
>
> Looking at xterm.c, it is littered with #ifdef USE_CAIRO.
>
> A first step could be to assume Cairo on X-based platforms and remove
> duplicate code. The second step could be to decouple the "glass" from
> the tookit "chrome" more thoroughly in xterm.c. That is easier to do
> when a Cairo-canvas can be assumed for drawing.
>
> Then, that entire "glass" could be reused by other platforms once they
> have a Cairo-canvas for drawing as well. (Modulo the XWidget support
> that depends on GTK.)
>
> Once a switchover is in reach, it can live separately to the existing
> platform-specific "glass" until all the kinks are worked out.
It may be worth your while looking into the PGTK port that some people
are working on:
https://github.com/masm11/emacs
I believe it will be using pure Cairo rendering which may make this
project a bit easier.
--
Alan Third
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:44 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 2:25 ` Eli Zaretskii
2020-05-19 2:44 ` Ligatures Clément Pit-Claudel
2020-05-19 3:47 ` Ligatures Stefan Monnier
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 2:25 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Mon, 18 May 2020 15:44:01 -0400
>
> > The idea is that this is used only rarely. Most use cases don't need
> > to deconstruct a ligature that way; after all, that's what ligatures
> > are for.
>
> In an earlier thread, you mentioned programming font ligatures — wouldn't it be very common to deconstruct such ligatures, like → into ->?
No, I don't think so. Why would this be common?
> Maybe the effect wouldn't be jarring with monospaced fonts, but for these the simple approach of subdividing the glyph works nicely too.
It might work in some simple cases, but I wonder what gains would that
give the users. It sounds very unusual to me to do something like
that, and I don't think we ever heard any such complaints until now,
although prettify-symbols-mode exists for several years.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 2:25 ` Ligatures Eli Zaretskii
@ 2020-05-19 2:44 ` Clément Pit-Claudel
2020-05-19 13:59 ` Ligatures Eli Zaretskii
2020-05-19 3:47 ` Ligatures Stefan Monnier
1 sibling, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 2:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 18/05/2020 22.25, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Mon, 18 May 2020 15:44:01 -0400
>>
>>> The idea is that this is used only rarely. Most use cases don't need
>>> to deconstruct a ligature that way; after all, that's what ligatures
>>> are for.
>>
>> In an earlier thread, you mentioned programming font ligatures — wouldn't it be very common to deconstruct such ligatures, like → into ->?
>
> No, I don't think so. Why would this be common?
I thought it would be the default. Emacs shows →, and you can put the point either before (|→), in the middle (-|>), or after (→|).
This is what prettify-symbols-unprettify-at-point exists for, I believe, though it doesn't work perfectly often the composed glyph doesn't have the same width as the non-composed one.
Here's a fairly common case: when writing html or XML, you may type <, then >, then press C-b and type the tag name; or you may use < and a paredit-like setup that inserts the > automatically. If the font has a ligature for <> and you can't put the point in the middle, this breaks. Same for || — the notation |x| { … } is used for lambdas in some languages; if you type || then try to move the point back inside the composed || glyph it won't work.
>> Maybe the effect wouldn't be jarring with monospaced fonts, but for these the simple approach of subdividing the glyph works nicely too.
>
> It might work in some simple cases, but I wonder what gains would that
> give the users. It sounds very unusual to me to do something like
> that, and I don't think we ever heard any such complaints until now,
> although prettify-symbols-mode exists for several years.
I thought I did complain in the past, but I can't find the thread any more :/ prettify-symbols-unprettify-at-point helps, and it's the default in some popular Emacs configs.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 2:25 ` Ligatures Eli Zaretskii
2020-05-19 2:44 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 3:47 ` Stefan Monnier
2020-05-19 4:51 ` Ligatures Clément Pit-Claudel
1 sibling, 1 reply; 145+ messages in thread
From: Stefan Monnier @ 2020-05-19 3:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, emacs-devel
> It might work in some simple cases, but I wonder what gains would that
> give the users. It sounds very unusual to me to do something like
> that, and I don't think we ever heard any such complaints until now,
> although prettify-symbols-mode exists for several years.
For things like `→`, I think of `->` as an "encoding" used to stay
within the confines of ASCII whereas `→` is what is really "meant".
So when I see `→` I'm not likely to want to "look inside" and am instead
happy if `C-p` skips over both characters at once (except when I want
to change it to `=>`, of course).
In contrast I don't think of "ffi" as the ASCII encoding of `ffi`.
Instead I think of `ffi` as just a more refined way to draw "ffi" and I'd
find it odd for `C-p` to skip over those three chars.
So, the right behavior depends on the intention, AFAICT.
Since 99.99% of my Emacs windows is made up of monospace text,
I probably won't be too significantly affected either way.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 3:47 ` Ligatures Stefan Monnier
@ 2020-05-19 4:51 ` Clément Pit-Claudel
0 siblings, 0 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 4:51 UTC (permalink / raw)
To: Stefan Monnier, Eli Zaretskii; +Cc: emacs-devel
On 18/05/2020 23.47, Stefan Monnier wrote:
> (except when I want
> to change it to `=>`, of course).
Variants of this case are not too uncommon, and they're not always as simple as removing the beginning of the composition to replace it with something else. For example, I'm typing a regexp in javascript, enclosed in /…/; then I add a backslash at the end of the regexp to escape a character that I haven't typed yet, and \/ turns into a composition, and the point disappears. Or I write html, with a buffer that contains <a href>, I type an = sign after the href, and => gets composed into ⇒, and the point disappears. There are many such examples, and if I lose my position, I need to delete part of the composition.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
` (2 preceding siblings ...)
2020-05-18 17:31 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Clément Pit-Claudel
@ 2020-05-19 5:43 ` ASSI
2020-05-19 7:22 ` Ligatures tomas
2020-05-19 14:18 ` Ligatures Eli Zaretskii
3 siblings, 2 replies; 145+ messages in thread
From: ASSI @ 2020-05-19 5:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: pipcet, emacs-devel
Eli Zaretskii writes:
> On second thought, I think I misunderstood you. If the font that is
> used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
> highlights parts of this glyph, then I'd like to know how it does
> that, and how far does this capability extend. I mean, what does it
> do with ligatures like ae, displayed as æ -- does it highlight the
> common vertical stroke for both parts?
The only program I ever used that I remember doing this (a WYSIWYG TeX
editor for DOS, natch) temporarily broke the ligature while you were
moving the cursor inside. It looked a bit strange and was slightly
distracting if you were just moving the cursor without trying to edit
it, but otherwise did the job well.
I expect that fonts that make extensive use of ligatures have
information on where the ligatures can be broken and exactly how to
display the parts in that case, although I wouldn't be surprised if that
information is not very reliable even when just considering latin family
scripts.
> And what about "st", displayed as st -- this has a curved "hand"
> connecting s and t -- to which of the 2 does it belong for the
> purposes of highlighting? There's also "hv" displayed as ƕ, let alone
> "fs" displayed as ẞ and "fz" displayed as ß.
The origin of this ligature has no general consensus AFAIK, but if you
read older (facsimile) printed literature from around 1800 it becomes
pretty obvious that the typeface evolved from a combination of long s
(mainly used inside a word) and round s (used at the end). The origin
of "sz" in that place is even more complicated to figure out, but it
seems (to me anyway) that this was driven by a desire to preserve the
distinction to double s / "ss" when using typefaces that didn't have the
proper glyphs for the various types of "s" previously available in
Fraktur. Neither "fs" nor "fz" should ligature into "ß" (which is a
proper glyph these days and no longer a ligature, although you are still
allowed to break it into either "ss" or "sz" when using typefaces that
don't support it, like most versalia).
Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
Samples for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldSamplesExtra
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 5:43 ` Ligatures ASSI
@ 2020-05-19 7:22 ` tomas
2020-05-19 7:55 ` Ligatures Joost Kremers
2020-05-19 14:18 ` Ligatures Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: tomas @ 2020-05-19 7:22 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]
On Tue, May 19, 2020 at 07:43:00AM +0200, ASSI wrote:
[...]
> [...] Neither "fs" nor "fz" should ligature into "ß" (which is a
> proper glyph these days and no longer a ligature, although you are still
> allowed to break it into either "ss" or "sz" when using typefaces that
> don't support it, like most versalia).
Definitely. This "long" and "short" vaiants of s were in use in Germany
early in te twentieth, in Fraktur and also in handwriting [1]. This two
forms of "s" (one for terminal position) still exists in Greek. The
ß "ligature" (which isn't perceived as such nowadays) evolved from
"ss", the first s being a non-terminal (yeah, looks a bit like an "f"
to the untrained eye).
In the German speaking part of Switzerland, "ß" is always replaced by
"ss". There's no capital version of "ß", you use "SS" (thus breaking
bijectivity of upper- and lowercase).
Writing is human. Human is messy :-/
Cheers
[1] https://de.wikipedia.org/wiki/S%C3%BCtterlinschrift
-- tomás
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 7:22 ` Ligatures tomas
@ 2020-05-19 7:55 ` Joost Kremers
2020-05-19 8:07 ` Ligatures tomas
0 siblings, 1 reply; 145+ messages in thread
From: Joost Kremers @ 2020-05-19 7:55 UTC (permalink / raw)
To: tomas; +Cc: emacs-devel
On Tue, May 19 2020, tomas@tuxteam.de wrote:
> There's no capital version of "ß", you use "SS" (thus breaking
> bijectivity of upper- and lowercase).
Actually, uppercase ẞ was accepted into the official German
spelling in 2017:
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E (cf. last line of
Section "History").
--
Joost Kremers
Life has its moments
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 7:55 ` Ligatures Joost Kremers
@ 2020-05-19 8:07 ` tomas
2020-05-19 10:17 ` Ligatures Yuri Khan
2020-05-19 10:43 ` Ligatures Werner LEMBERG
0 siblings, 2 replies; 145+ messages in thread
From: tomas @ 2020-05-19 8:07 UTC (permalink / raw)
To: Joost Kremers; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1054 bytes --]
On Tue, May 19, 2020 at 09:55:25AM +0200, Joost Kremers wrote:
>
> On Tue, May 19 2020, tomas@tuxteam.de wrote:
> >There's no capital version of "ß", you use "SS" (thus breaking
> >bijectivity of upper- and lowercase).
>
> Actually, uppercase ẞ was accepted into the official German spelling
> in 2017:
>
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E (cf. last line of
> Section "History").
Yes, Officially. Nearly nobody uses it. If I had to bet, I'd expect
'ß' to disappear and be replaced by 'ss', as the Swiss do before
uppercase ß has a chance :-)
But we disgress: I was just trying to highlight how much cultural
bias there is in one's view of seemingly technical things. When
talking ligatures, one should try to first understand what crazy
stuff other languages have to take care of.
I wish I could say a thing or two about Devanagari or Hangul [1],
but knowledge is just too limited.
Cheers
[1] https://en.wikipedia.org/wiki/Hangul
for another example where you stack stuff in two dimensions
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 17:31 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Clément Pit-Claudel
2020-05-18 17:39 ` Eli Zaretskii
@ 2020-05-19 10:09 ` Trevor Spiteri
2020-05-19 14:22 ` Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Trevor Spiteri @ 2020-05-19 10:09 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1506 bytes --]
On 18/05/2020 19:31, Clément Pit-Claudel wrote:
> On 18/05/2020 12.08, Eli Zaretskii wrote:
>> On second thought, I think I misunderstood you. If the font that is
>> used shows "ffi" as a _single_ glyph ffi, and LibreOffice indeed
>> highlights parts of this glyph, then I'd like to know how it does
>> that, and how far does this capability extend. I mean, what does it
>> do with ligatures like ae, displayed as æ -- does it highlight the
>> common vertical stroke for both parts? And what about "st", displayed
>> as st -- this has a curved "hand" connecting s and t -- to which of the
>> 2 does it belong for the purposes of highlighting? There's also "hv"
>> displayed as ƕ, let alone "fs" displayed as ẞ and "fz" displayed as
>> ß.
> I've attached a screenshot with a few examples, though I couldn't find a font that displays ae as æ.
>
> Firefox does the same as LibreOffice (try it here, for example: https://developer.mozilla.org/en-US/docs/Web/CSS/font-variant-ligatures). Since Firefox uses Harbuzz, I think there's a good chance we can support that feature too :)
For what it's worth, LibreOffice does it differently. I think what it
does is place the cursor on the position it would be if any following
text was missing. So moving after the second f in ffi would move the
cursor to the same position as after ff if the i was missing. This is
evident from fraction ligatures; in the screenshot I'm attaching, "63"
is selected and the selection matches the 63 in the bottom line.
[-- Attachment #2: fraction.png --]
[-- Type: image/png, Size: 5505 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 8:07 ` Ligatures tomas
@ 2020-05-19 10:17 ` Yuri Khan
2020-05-19 14:26 ` Ligatures Eli Zaretskii
2020-05-19 10:43 ` Ligatures Werner LEMBERG
1 sibling, 1 reply; 145+ messages in thread
From: Yuri Khan @ 2020-05-19 10:17 UTC (permalink / raw)
To: tomas; +Cc: Joost Kremers, Emacs developers
On Tue, 19 May 2020 at 15:11, <tomas@tuxteam.de> wrote:
> [1] https://en.wikipedia.org/wiki/Hangul
> for another example where you stack stuff in two dimensions
An example of character combining other than side-by-side stacking is
much closer than that: Combining diacritics. Sure, you can delete an
acute accent from á by pressing Backspace, but you cannot put point
between the ‘a’ and the accent if you want to put a different
diacritic between them. (And putting multiple diacritics over a single
base character in various orders is a thing, it is the subject of the
Unicode Canonical Order subsection in Unicode standard.)
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 8:07 ` Ligatures tomas
2020-05-19 10:17 ` Ligatures Yuri Khan
@ 2020-05-19 10:43 ` Werner LEMBERG
2020-05-19 10:48 ` Ligatures tomas
1 sibling, 1 reply; 145+ messages in thread
From: Werner LEMBERG @ 2020-05-19 10:43 UTC (permalink / raw)
To: tomas; +Cc: joostkremers, emacs-devel
>> >There's no capital version of "ß", you use "SS" (thus breaking
>> >bijectivity of upper- and lowercase).
>>
>> Actually, uppercase ẞ was accepted into the official German
>> spelling in 2017:
>>
>> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E (cf. last line of
>> Section "History").
>
> Yes, Officially. Nearly nobody uses it. If I had to bet, I'd expect
> 'ß' to disappear and be replaced by 'ss', as the Swiss do before
> uppercase ß has a chance :-)
Well, if your family name is 'Dreßen', you don't want to see your name
written as 'DRESSEN' in your passport (which usually requires
uppercase for family names): All German speakers would pronounce the
first 'e' as a short vowel instead of the correct long one. Exactly
for this situation – and for hardly anything else – you should write
'DREẞEN'.
Werner
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 10:43 ` Ligatures Werner LEMBERG
@ 2020-05-19 10:48 ` tomas
0 siblings, 0 replies; 145+ messages in thread
From: tomas @ 2020-05-19 10:48 UTC (permalink / raw)
To: Werner LEMBERG; +Cc: joostkremers, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 861 bytes --]
On Tue, May 19, 2020 at 12:43:06PM +0200, Werner LEMBERG wrote:
[...]
> Well, if your family name is 'Dreßen', you don't want to see your name
> written as 'DRESSEN' in your passport (which usually requires
> uppercase for family names): All German speakers would pronounce the
> first 'e' as a short vowel instead of the correct long one. Exactly
> for this situation – and for hardly anything else – you should write
> 'DREẞEN'.
Yes, I know -- that's why such things change slowly. But the Swiss
prove that it works. We're used to having things which are written
the same and pronounced differently, anyway. One more wouldn't
change things.
Note that I'm not advocating [1] for dropping the 'ß'. I'm just betting
that it might happen rather sooner than later.
Cheers
[1] I've enough to do advocating free software ;-D
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-18 21:59 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Alan Third
@ 2020-05-19 13:56 ` Eli Zaretskii
2020-05-19 14:39 ` Clément Pit-Claudel
2020-05-19 20:26 ` Alan Third
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 13:56 UTC (permalink / raw)
To: Alan Third; +Cc: cpitclaudel, emacs-devel
> Date: Mon, 18 May 2020 23:59:11 +0200 (CEST)
> From: Alan Third <alan@idiocy.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
>
> In case anyone's interested, I've attached a screenshot of Apple's
> Pages.app displaying the word Zapfino with the cursor after the "a".
I don't see anything on or after "a", I see a thin vertical line on
the "Z". is that what is actually displayed? If so, how do people
know the cursor is after "a"??
> Clearly not ideal. OTOH, if LibreOffice, Firefox, and even Apple's
> products do this, perhaps it's just the way people will expect it to
> be done.
If someone wants to work on such a feature, I'm sure it will be
welcome by at least some of the users.
Thanks.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 2:44 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 13:59 ` Eli Zaretskii
2020-05-19 14:35 ` Ligatures Clément Pit-Claudel
2020-05-19 15:36 ` Ligatures Tassilo Horn
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 13:59 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Mon, 18 May 2020 22:44:27 -0400
>
> >> In an earlier thread, you mentioned programming font ligatures — wouldn't it be very common to deconstruct such ligatures, like → into ->?
> >
> > No, I don't think so. Why would this be common?
>
> I thought it would be the default. Emacs shows →, and you can put the point either before (|→), in the middle (-|>), or after (→|).
Doesn't sound as a useful default to me. It could be an optional
feature, though.
> Here's a fairly common case: when writing html or XML, you may type <, then >, then press C-b and type the tag name; or you may use < and a paredit-like setup that inserts the > automatically. If the font has a ligature for <> and you can't put the point in the middle, this breaks. Same for || — the notation |x| { … } is used for lambdas in some languages; if you type || then try to move the point back inside the composed || glyph it won't work.
Sounds like a bug or misfeature that needs a solution, not necessarily
the one that's been proposed here. For example, how about a special
insert command that would disable ligation with the character it
inserts?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 5:43 ` Ligatures ASSI
2020-05-19 7:22 ` Ligatures tomas
@ 2020-05-19 14:18 ` Eli Zaretskii
2020-05-19 14:52 ` Ligatures Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 14:18 UTC (permalink / raw)
To: ASSI; +Cc: pipcet, emacs-devel
> From: ASSI <Stromeko@nexgo.de>
> Cc: pipcet@gmail.com, emacs-devel@gnu.org
> Date: Tue, 19 May 2020 07:43:00 +0200
>
> The only program I ever used that I remember doing this (a WYSIWYG TeX
> editor for DOS, natch) temporarily broke the ligature while you were
> moving the cursor inside. It looked a bit strange and was slightly
> distracting if you were just moving the cursor without trying to edit
> it, but otherwise did the job well.
That's what I had in mind (although I never used such an editor).
> The origin of this ligature has no general consensus AFAIK, but if you
> read older (facsimile) printed literature from around 1800 it becomes
> pretty obvious that the typeface evolved from a combination of long s
> (mainly used inside a word) and round s (used at the end). The origin
> of "sz" in that place is even more complicated to figure out, but it
> seems (to me anyway) that this was driven by a desire to preserve the
> distinction to double s / "ss" when using typefaces that didn't have the
> proper glyphs for the various types of "s" previously available in
> Fraktur. Neither "fs" nor "fz" should ligature into "ß" (which is a
> proper glyph these days and no longer a ligature, although you are still
> allowed to break it into either "ss" or "sz" when using typefaces that
> don't support it, like most versalia).
I think we should support these unusual ligatures for those who'd like
to see them, probably as an opt-in feature.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 10:09 ` Trevor Spiteri
@ 2020-05-19 14:22 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 14:22 UTC (permalink / raw)
To: Trevor Spiteri; +Cc: emacs-devel
> From: Trevor Spiteri <tspiteri@ieee.org>
> Date: Tue, 19 May 2020 12:09:32 +0200
>
> For what it's worth, LibreOffice does it differently. I think what it
> does is place the cursor on the position it would be if any following
> text was missing. So moving after the second f in ffi would move the
> cursor to the same position as after ff if the i was missing.
This is only possible if the metrics of a sole f and f inside the
ligature are identical or sufficiently close. That is not generally
true in ligatures, not even in Latin ligatures.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 10:17 ` Ligatures Yuri Khan
@ 2020-05-19 14:26 ` Eli Zaretskii
2020-05-19 19:00 ` Ligatures Yuri Khan
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 14:26 UTC (permalink / raw)
To: Yuri Khan; +Cc: joostkremers, tomas, emacs-devel
> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Tue, 19 May 2020 17:17:25 +0700
> Cc: Joost Kremers <joostkremers@fastmail.fm>,
> Emacs developers <emacs-devel@gnu.org>
>
> An example of character combining other than side-by-side stacking is
> much closer than that: Combining diacritics. Sure, you can delete an
> acute accent from á by pressing Backspace, but you cannot put point
> between the ‘a’ and the accent if you want to put a different
> diacritic between them.
Well, you can (this is Emacs, right?): just disable automatic
composition with "M-x auto-composition-mode", and you can do any
editing you want. Then re-enable the mode again.
> (And putting multiple diacritics over a single base character in
> various orders is a thing, it is the subject of the Unicode
> Canonical Order subsection in Unicode standard.)
Canonical order of diacritics is indeed important for jobs such as
comparison, searching, etc. But we are talking about display, and for
display there's a requirement that the order should not matter as long
as the base character comes first. AFAIR, HarfBuzz supports that
requirement, but not every other shaping engine does.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 13:59 ` Ligatures Eli Zaretskii
@ 2020-05-19 14:35 ` Clément Pit-Claudel
2020-05-19 15:21 ` Ligatures Eli Zaretskii
2020-05-19 15:36 ` Ligatures Tassilo Horn
1 sibling, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 14:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 19/05/2020 09.59, Eli Zaretskii wrote:
>> Cc: emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Mon, 18 May 2020 22:44:27 -0400
>>
>>>> In an earlier thread, you mentioned programming font ligatures — wouldn't it be very common to deconstruct such ligatures, like → into ->?
>>>
>>> No, I don't think so. Why would this be common?
>>
>> I thought it would be the default. Emacs shows →, and you can put the point either before (|→), in the middle (-|>), or after (→|).
>
> Doesn't sound as a useful default to me. It could be an optional
> feature, though.
Do we know of other editors that support ligatures but chose not to support moving through a composed character? If not, that would be a fairly strong signal that it's a reasonable default, I'd expect.
>> Here's a fairly common case: when writing html or XML, you may type <, then >, then press C-b and type the tag name; or you may use < and a paredit-like setup that inserts the > automatically. If the font has a ligature for <> and you can't put the point in the middle, this breaks. Same for || — the notation |x| { … } is used for lambdas in some languages; if you type || then try to move the point back inside the composed || glyph it won't work.
>
> Sounds like a bug or misfeature that needs a solution, not necessarily
> the one that's been proposed here.
Possibly! But the feature discussed here seems to fit the bill pretty perfectly, so …
> For example, how about a special
> insert command that would disable ligation with the character it
> inserts?
Would that command be called automatically, or would it require a different input?
I don't think Emacs can guess whether it should enable or disable ligation, so I imagine you mean different input, but that doesn't sound pleasant to use, so maybe I'm misunderstanding?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 13:56 ` Eli Zaretskii
@ 2020-05-19 14:39 ` Clément Pit-Claudel
2020-05-19 21:43 ` Pip Cet
2020-05-19 20:26 ` Alan Third
1 sibling, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 14:39 UTC (permalink / raw)
To: Eli Zaretskii, Alan Third; +Cc: emacs-devel
On 19/05/2020 09.56, Eli Zaretskii wrote:
> I don't see anything on or after "a", I see a thin vertical line on
> the "Z". is that what is actually displayed? If so, how do people
> know the cursor is after "a"??
They don't: "the seven equal slices that Firefox treats it as for selection/editing purposes don't match up to the visual shapes of the sub-glyphs at all well"
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 14:18 ` Ligatures Eli Zaretskii
@ 2020-05-19 14:52 ` Eli Zaretskii
2020-05-19 15:11 ` Ligatures Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 14:52 UTC (permalink / raw)
To: Stromeko; +Cc: pipcet, emacs-devel
> Date: Tue, 19 May 2020 17:18:41 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: pipcet@gmail.com, emacs-devel@gnu.org
>
> > From: ASSI <Stromeko@nexgo.de>
> > Cc: pipcet@gmail.com, emacs-devel@gnu.org
> > Date: Tue, 19 May 2020 07:43:00 +0200
> >
> > The only program I ever used that I remember doing this (a WYSIWYG TeX
> > editor for DOS, natch) temporarily broke the ligature while you were
> > moving the cursor inside. It looked a bit strange and was slightly
> > distracting if you were just moving the cursor without trying to edit
> > it, but otherwise did the job well.
>
> That's what I had in mind (although I never used such an editor).
Btw, there's one subtle issue that will need to be resolved if we are
to have this feature of "sub-glyph" cursor movement inside composed
characters. The way we currently display the default block cursor is
by simply redrawing the glyph at point in reverse video. So we don't
have a way of displaying a cursor that "covers" only part of a glyph.
To make this happen, we'd probably need to draw the cursor as part of
drawing the glyph foreground and/or background, which is against the
current flow of the display code: we generally first completely draw
the background and foreground of the entire text that needs to be
redrawn, and only then draw the cursor where it should be placed.
Something to figure out by that "Someone" who'd volunteer for the job.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-18 19:38 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 14:55 ` Pip Cet
2020-05-19 15:30 ` Ligatures Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-19 14:55 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
On Mon, May 18, 2020 at 7:40 PM Clément Pit-Claudel
<cpitclaudel@gmail.com> wrote:
> > And this is where it gets back to "let's not hardcode the dependency
> > on Harfbuzz and FreeType, because other backends might actually give
> > us the information we need".
>
> Harfbuzz can give us this info: hb_ot_layout_get_ligature_carets
Thanks, I hadn't looked there!
So Harfbuzz provides a non-core API which, after a separate call for
each cluster, allows us to split up a glyph into non-overlapping
bounding boxes of the same height (the information returned is
one-dimensional, and intended for carets, not for Emacs-style box
cursors).
I don't see how that API design is so great we should hardcode
dependencies on it, though I do agree it's sufficient to work with.
Again, this isn't about some exotic use case: I open a buffer, type
"ffi", and hit C-b twice. What should happen?
AFAIU, people are still seriously considering the possibility that all
of "ffi" would be covered by the cursor. I hope I'm misunderstanding
that, because it's so obviously the wrong thing to do in this case.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 14:52 ` Ligatures Eli Zaretskii
@ 2020-05-19 15:11 ` Pip Cet
2020-05-19 15:36 ` Ligatures Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-19 15:11 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stromeko, emacs-devel
On Tue, May 19, 2020 at 2:52 PM Eli Zaretskii <eliz@gnu.org> wrote:
> Btw, there's one subtle issue that will need to be resolved if we are
> to have this feature of "sub-glyph" cursor movement inside composed
> characters. The way we currently display the default block cursor is
> by simply redrawing the glyph at point in reverse video. So we don't
> have a way of displaying a cursor that "covers" only part of a glyph.
I thought that was what glyph_row->clip was for.
> To make this happen, we'd probably need to draw the cursor as part of
> drawing the glyph foreground and/or background, which is against the
I believe that's a change we should make anyway: late cursor drawing
makes sense on TTYs with physical cursors, but on GUI backends, we
should simply use a special face for drawing the struct glyph a cursor
is on, IMHO.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 14:35 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 15:21 ` Eli Zaretskii
2020-05-19 15:44 ` Ligatures Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 15:21 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 19 May 2020 10:35:50 -0400
>
> > Doesn't sound as a useful default to me. It could be an optional
> > feature, though.
>
> Do we know of other editors that support ligatures but chose not to support moving through a composed character? If not, that would be a fairly strong signal that it's a reasonable default, I'd expect.
OTOH, the current default exists since Emacs 21, so it sounds like a
reasonable default as well.
And I don't think arguing about defaults in Emacs is useful, because
changing the default if you don't like it is easy. We do change the
default behavior slowly, though.
(And please note that we are talking about defaults for a feature that
doesn't yet exist, which makes this dispute even less useful.)
> > For example, how about a special
> > insert command that would disable ligation with the character it
> > inserts?
>
> Would that command be called automatically, or would it require a different input?
You'd invoke it when you either know in advance you don't want the
next character to ligate, or after you saw the ligature to disable the
ligation for the sequence at or before point.
> I don't think Emacs can guess whether it should enable or disable ligation, so I imagine you mean different input, but that doesn't sound pleasant to use, so maybe I'm misunderstanding?
Emacs cannot, but the user can. Thus a separate command.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 14:55 ` Ligatures Pip Cet
@ 2020-05-19 15:30 ` Clément Pit-Claudel
2020-05-19 15:52 ` Ligatures Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 15:30 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel
On 19/05/2020 10.55, Pip Cet wrote:
> On Mon, May 18, 2020 at 7:40 PM Clément Pit-Claudel
> <cpitclaudel@gmail.com> wrote:
>>> And this is where it gets back to "let's not hardcode the dependency
>>> on Harfbuzz and FreeType, because other backends might actually give
>>> us the information we need".
>>
>> Harfbuzz can give us this info: hb_ot_layout_get_ligature_carets
>
> Thanks, I hadn't looked there!
>
> So Harfbuzz provides a non-core API which, after a separate call for
> each cluster, allows us to split up a glyph into non-overlapping
> bounding boxes of the same height (the information returned is
> one-dimensional, and intended for carets, not for Emacs-style box
> cursors).
Are you worried about the height of the box? For the width part, isn't it just the difference between two consecutive carets?
> I don't see how that API design is so great we should hardcode
> dependencies on it, though I do agree it's sufficient to work with.
No opinions there ^^
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:11 ` Ligatures Pip Cet
@ 2020-05-19 15:36 ` Eli Zaretskii
2020-05-19 16:16 ` Ligatures Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 15:36 UTC (permalink / raw)
To: Pip Cet; +Cc: Stromeko, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 19 May 2020 15:11:27 +0000
> Cc: Stromeko@nexgo.de, emacs-devel@gnu.org
>
> On Tue, May 19, 2020 at 2:52 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > Btw, there's one subtle issue that will need to be resolved if we are
> > to have this feature of "sub-glyph" cursor movement inside composed
> > characters. The way we currently display the default block cursor is
> > by simply redrawing the glyph at point in reverse video. So we don't
> > have a way of displaying a cursor that "covers" only part of a glyph.
>
> I thought that was what glyph_row->clip was for.
We could use that, but that's not the main problem. After all,
clipping while drawing is simple and doesn't need any special help.
The problem is that we need to change how the cursor is drawn, from
the control flow POV. We'd need to audit the code and see that the
information required for drawing the cursor is available when we are
drawing the text. And then there's the popular use case where nothing
changes except the cursor position, in which case no text is redrawn
at all.
> > To make this happen, we'd probably need to draw the cursor as part of
> > drawing the glyph foreground and/or background, which is against the
>
> I believe that's a change we should make anyway: late cursor drawing
> makes sense on TTYs with physical cursors, but on GUI backends, we
> should simply use a special face for drawing the struct glyph a cursor
> is on, IMHO.
It cannot be a single face, because the "thing under cursor" can be
anything, and can have different colors. We will need to merge faces,
which is slower than the current simple but effective method, which
completely sidesteps the issue.
in any case, using a face doesn't solve the main problem, as we'd
still need to draw the glyph with partial colors.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 13:59 ` Ligatures Eli Zaretskii
2020-05-19 14:35 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 15:36 ` Tassilo Horn
2020-05-19 16:08 ` Ligatures Eli Zaretskii
2020-05-19 16:14 ` Ligatures Stefan Monnier
1 sibling, 2 replies; 145+ messages in thread
From: Tassilo Horn @ 2020-05-19 15:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1835 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
>> > it be very common to deconstruct such ligatures, like → into ->?
>> >
>> > No, I don't think so. Why would this be common?
>>
>> I thought it would be the default. Emacs shows →, and you can put the
>> point either before (|→), in the middle (-|>), or after (→|).
>
> Doesn't sound as a useful default to me. It could be an optional
> feature, though.
To me it sounds like a good default.
>> Here's a fairly common case: when writing html or XML, you may type
>> <, then >, then press C-b and type the tag name; or you may use < and
>> a paredit-like setup that inserts the > automatically. If the font
>> has a ligature for <> and you can't put the point in the middle, this
>> breaks. Same for || — the notation |x| { … } is used for lambdas in
>> some languages; if you type || then try to move the point back inside
>> the composed || glyph it won't work.
>
> Sounds like a bug or misfeature that needs a solution, not necessarily
> the one that's been proposed here. For example, how about a special
> insert command that would disable ligation with the character it
> inserts?
I use the attached self-written ligature.el (Eli, you've helped me with
that some months back). That's all nice but sometimes I too have the
problem that I want to edit the name of a "private" function/variable
foo--do-stuff and cannot move point inside the double-dash because it is
composed as one char. As a little cure, I disable ligatures in the
minibuffer where I absolutely need to do completion stuff like
foo-<TAB>-bar.
Another case is where when inserting < automatically inserts >
immediately giving a <> diamond where I cannot move into.
A special insert command will not help here because it is already
inserted.
Bye,
Tassilo
[-- Attachment #2: ligature.el --]
[-- Type: text/plain, Size: 3251 bytes --]
(defgroup ligature nil
"Support for font ligatures"
:version "28.1"
:prefix "ligature-")
(defcustom ligature-arrows
(list "-->" "<!--" "->>" "<<-" "->" "<-"
"<-<" ">>-" ">-" "<~>" "-<" "-<<"
"<=>" "=>" "<=<" "<<=" "<==" "<==>" "==>" "=>>" ">=>" ">>="
"<-|" "<=|" "|=>" "|->" "<~~" "<~" "~~>"
"~>" "<->")
"Arrow ligatures."
:type '(repeat string))
(defcustom ligature-misc
(list "..<" "~-" "-~" "~@" "-|" "_|_" "|-" "||-" "|=" "||="
".?" "?=" "<|>" "<:" ":<" ":>" ">:"
".=" ".-" "__" "<<<" ">>>" "<<" ">>" "~~"
"<$>" "<$" "$>" "<+>" "<+" "+>" "<*>" "<*" "*>" "</" "</>" "/>"
"|}" "{|" "[<" ">]" ":?>" ":?" "[||]" "?:" "?."
"|>" "<|" "||>" "<||" "|||>" "<|||::=" "|]" "[|"
"#{" "#[" "]#" "#(" "#?" "#_" "#_(" "#:" "#!" "#=")
"Miscellaneous ligatures."
:type '(repeat string))
(defcustom ligature-relations
(list "==" "!=" "<=" ">=" "=:=" "!==" "===" "<>" "/==" "=!=" "=/=" "~=" ":="
"/=" "^=")
"Relation ligatures."
:type '(repeat string))
(defcustom ligature-operators
(list "&&" "&&&" "||" "++" "--" "!!" "::" "+++" "??" ":::" "***" "---"
"/\\" "\\/")
"Operator ligatures."
:type '(repeat string))
(defcustom ligature-comments-c-like
(list "//" "///" "/**" "/*" "*/")
"Ligatures for comments in C-like languages."
:type '(repeat string))
(defcustom ligature-comments-xml-like
(list "<!--" "-->")
"Ligatures for comments in XML-like languages."
:type '(repeat string))
(defcustom ligature-hashes
(list "##" "###" "####")
"Ligatures for comments in languages with # being the comment character."
:type '(repeat string))
(defcustom ligature-dots
(list "..." "..")
"Dot ligatures."
:type '(repeat string))
(defcustom ligature-semicolons
(list ";;" ";;;")
"Ligatures for comments in lisp languages."
:type '(repeat string))
(defun ligature--get-all ()
(append ligature-arrows
ligature-relations
ligature-operators
ligature-misc
ligature-dots
ligature-comments-c-like
ligature-comments-xml-like
ligature-hashes
ligature-semicolons))
(defun ligature--apply (ligatures)
(let ((groups (seq-group-by #'string-to-char ligatures)))
(dolist (group groups)
(let ((c (car group))
(rx (regexp-opt (mapcar (lambda (s) (substring s 1))
(cdr group)))))
(set-char-table-range composition-function-table
c `([,(concat "." rx) 0 compose-gstring-for-graphic]))))))
(define-minor-mode ligature-minor-mode
"A mode for font ligatures."
nil "" nil
(if ligature-minor-mode
(progn
(when (minibufferp)
(error "Cannot use ligature-minor-mode in minibuffer"))
;; FIXME: This doesn't work. When enabled, there will be a local
;; variable but the global value is the same (and also includes the
;; ligature composition rules).
(ligature--apply (ligature--get-all)))
;; FIXME: Even if the above worked, this could remove much more than this
;; mode added itself.
(kill-local-variable 'composition-function-table)))
(defun ligature-minor-mode--apply-if-possible ()
(unless (minibufferp)
(ligature-minor-mode)))
(define-globalized-minor-mode global-ligature-minor-mode
ligature-minor-mode
ligature-minor-mode--apply-if-possible)
(provide 'ligature)
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:21 ` Ligatures Eli Zaretskii
@ 2020-05-19 15:44 ` Clément Pit-Claudel
2020-05-19 16:15 ` Ligatures Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-19 15:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 19/05/2020 11.21, Eli Zaretskii wrote:
> And I don't think arguing about defaults in Emacs is useful, because
> changing the default if you don't like it is easy. We do change the
> default behavior slowly, though.
I see this argument often (changing settings is easy), but I don't find it very convincing: in my experience, even after years of using Emacs, figuring which variable controls a given behavior, if there is even such a variable, is usually not easy: it requires reading manuals, guessing the right keywords, and often stepping through function implementations.
It's quite a bit easier in Emacs than in other editors, but still not easy at all.
>>> For example, how about a special
>>> insert command that would disable ligation with the character it
>>> inserts?
>>
>> Would that command be called automatically, or would it require a different input?
>
> You'd invoke it when you either know in advance you don't want the
> next character to ligate, or after you saw the ligature to disable the
> ligation for the sequence at or before point.
That assumes that I know whether inserting a character will introduce a ligation, but I usually don't. I can't keep in my head a list of all the ligatures that my font supports, so I'm bound to be surprised from time to time (besides, this is very contextual. When I write a language where /\ and \/ are used to mean "and" and "or", I think of it when I type a / or a \. But when I'm in a context where /…/ is used to delimit regular expressions and \ is used to escape a character, I don't think of the \/ ligature.
>> I don't think Emacs can guess whether it should enable or disable ligation, so I imagine you mean different input, but that doesn't sound pleasant to use, so maybe I'm misunderstanding?
>
> Emacs cannot, but the user can. Thus a separate command.
I don't think that will work, but maybe I'm missing something. How does this work if I open a file that already has a ligature and I want to modify it? Do I have to explicitly break the ligature before I can edit it?
More importantly, though, I don't understand what problem it would solve, at least in the context of programming ligatures. What is the problem with allowing cursor movement through ligatures like → for ->?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:30 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 15:52 ` Pip Cet
0 siblings, 0 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-19 15:52 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]
On Tue, May 19, 2020 at 3:30 PM Clément Pit-Claudel
<cpitclaudel@gmail.com> wrote:
> > So Harfbuzz provides a non-core API which, after a separate call for
> > each cluster, allows us to split up a glyph into non-overlapping
> > bounding boxes of the same height (the information returned is
> > one-dimensional, and intended for carets, not for Emacs-style box
> > cursors).
>
> Are you worried about the height of the box? For the width part, isn't it just the difference between two consecutive carets?
That's what I'd work with, yeah.
Perhaps I can make things a little clearer by attaching a screenshot
of how things currently look with the "Linux Libertine Display O"
font, which has especially prominent ligatures and overhangs (I guess
it's somehow inspired by the operating system kernel it's named for?).
I think there's plenty to be improved about that: use a ligature,
sure, but also maybe get away from the "invert a box" style of drawing
the cursor, or handle overhangs specially, or...something.
But that would require an idea of which pixels belong to which
(sub)glyphs (in the ligature). And caret positioning doesn't give us
enough information to do that.
Thank you again for pointing out that API! Whether it's a core feature
of a shaper or a backend-dependent extra feature is a secondary
concern, the important part is that it's there and we can do the right
thing.
[-- Attachment #2: ffi.jpg --]
[-- Type: image/jpeg, Size: 1741 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:36 ` Ligatures Tassilo Horn
@ 2020-05-19 16:08 ` Eli Zaretskii
2020-05-19 16:14 ` Ligatures Stefan Monnier
1 sibling, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 16:08 UTC (permalink / raw)
To: Tassilo Horn; +Cc: cpitclaudel, emacs-devel
> From: Tassilo Horn <tsdh@gnu.org>
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
> Date: Tue, 19 May 2020 17:36:44 +0200
>
> I use the attached self-written ligature.el (Eli, you've helped me with
> that some months back). That's all nice but sometimes I too have the
> problem that I want to edit the name of a "private" function/variable
> foo--do-stuff and cannot move point inside the double-dash because it is
> composed as one char. As a little cure, I disable ligatures in the
> minibuffer where I absolutely need to do completion stuff like
> foo-<TAB>-bar.
>
> Another case is where when inserting < automatically inserts >
> immediately giving a <> diamond where I cannot move into.
Yes, the user-level (and perhaps also some infrastructure level) of
support for ligatures is not yet ready. There's a TODO item for that,
patches are welcome.
> A special insert command will not help here because it is already
> inserted.
Then maybe we need both a command to insert a character without
ligation, and a command to disassemble a ligature at point.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:36 ` Ligatures Tassilo Horn
2020-05-19 16:08 ` Ligatures Eli Zaretskii
@ 2020-05-19 16:14 ` Stefan Monnier
1 sibling, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-19 16:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, emacs-devel
>> Doesn't sound as a useful default to me. It could be an optional
>> feature, though.
> To me it sounds like a good default.
For `->` and `ffi` it sounds good, indeed. For prettify-symbol-mode's
combining of `lambda` into `λ`, OTOH that would be rather undesirable.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:44 ` Ligatures Clément Pit-Claudel
@ 2020-05-19 16:15 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 16:15 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 19 May 2020 11:44:31 -0400
>
> > You'd invoke it when you either know in advance you don't want the
> > next character to ligate, or after you saw the ligature to disable the
> > ligation for the sequence at or before point.
>
> That assumes that I know whether inserting a character will
> introduce a ligation, but I usually don't. [...]
Did you miss the part after "or after"?
> I don't think that will work, but maybe I'm missing something. How does this work if I open a file that already has a ligature and I want to modify it? Do I have to explicitly break the ligature before I can edit it?
"M-x toggle-ligature-mode RET", perhaps? Or go to the ligature you
want to edit and invoke that command I mentioned above (after "or
after")?
> More importantly, though, I don't understand what problem it would solve, at least in the context of programming ligatures. What is the problem with allowing cursor movement through ligatures like → for ->?
It doesn't feel right to me, and it goes against what Emacs did for
the past 20 years. But that's me.
But again, this is a purely academic argument. Ligature support in
Emacs is not yet ready for prime time, the sub-glyph cursor motion
needs to be implemented in the display engine, and only after that it
would make sense arguing about the defaults of this imaginary mode.
Let's not finish arguing now, lest we will have nothing to argue about
then, okay? ;-)
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 15:36 ` Ligatures Eli Zaretskii
@ 2020-05-19 16:16 ` Pip Cet
2020-05-19 16:41 ` Ligatures Eli Zaretskii
2020-05-19 17:00 ` Ligatures Eli Zaretskii
0 siblings, 2 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-19 16:16 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stromeko, emacs-devel
On Tue, May 19, 2020 at 3:36 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Tue, 19 May 2020 15:11:27 +0000
> > Cc: Stromeko@nexgo.de, emacs-devel@gnu.org
> >
> > On Tue, May 19, 2020 at 2:52 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > Btw, there's one subtle issue that will need to be resolved if we are
> > > to have this feature of "sub-glyph" cursor movement inside composed
> > > characters. The way we currently display the default block cursor is
> > > by simply redrawing the glyph at point in reverse video. So we don't
> > > have a way of displaying a cursor that "covers" only part of a glyph.
> >
> > I thought that was what glyph_row->clip was for.
>
> We could use that, but that's not the main problem.
Sorry, I genuinely don't understand what the problem is. draw_glyphs
is called by draw_phys_cursor_glyph, so all we need is a line or two
of extra code in draw_phys_cursor_glyphs to set row->clip to the
rectangle surrounding the subglyph the cursor is on. No further change
of the display engine is required for that, is it?
> The problem is that we need to change how the cursor is drawn, from
> the control flow POV.
That's a separate thing that, yes, we need to do. Because optimizing
for TTYs is no longer appropriate. But I don't see why we need to
perform this large change before performing the little one that makes
things work for subglyphs.
> We'd need to audit the code and see that the
> information required for drawing the cursor is available when we are
> drawing the text. And then there's the popular use case where nothing
> changes except the cursor position, in which case no text is redrawn
> at all.
Except for the glyphs the cursor is on, right? Those are redrawn by
draw_phys_cursor_glyph, or am I missing something here?
> > > To make this happen, we'd probably need to draw the cursor as part of
> > > drawing the glyph foreground and/or background, which is against the
> >
> > I believe that's a change we should make anyway: late cursor drawing
> > makes sense on TTYs with physical cursors, but on GUI backends, we
> > should simply use a special face for drawing the struct glyph a cursor
> > is on, IMHO.
>
> It cannot be a single face, because the "thing under cursor" can be
> anything, and can have different colors.
Agreed.
> We will need to merge faces,
> which is slower than the current simple but effective method, which
> completely sidesteps the issue.
I believe performance concerns are an entirely different subject (put
briefly, my opinion is that we've painted ourselves into a corner by
micro-optimizing fast loops over an essentially inefficient basic
design).
> in any case, using a face doesn't solve the main problem, as we'd
> still need to draw the glyph with partial colors.
Which we can do by setting glyph_row->clip? I don't see how there's
any problem here at all.
Again, I see three totally separate problems here:
1. draw a box cursor over a partial glyph
2. improve the display engine to handle cursor(s) like other
highlighting on graphical terminals
3. identify and counteract actual performance problems in the redisplay engine
I still don't see how (1) depends on (2), and I think I disagree with
you on the subject of (3), because I think we need to fix the design
first, moving a lot of C code out to Lisp, then see where things
actually chafe and maybe move some special code back to C.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 16:16 ` Ligatures Pip Cet
@ 2020-05-19 16:41 ` Eli Zaretskii
2020-05-19 17:00 ` Ligatures Eli Zaretskii
1 sibling, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 16:41 UTC (permalink / raw)
To: Pip Cet; +Cc: Stromeko, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 19 May 2020 16:16:53 +0000
> Cc: Stromeko@nexgo.de, emacs-devel@gnu.org
>
> I think we need to fix the design first, moving a lot of C code out
> to Lisp
No, we don't need to fix the design of the display engine. We need to
design a new and different display engine, based on ideas more
flexible and powerful than the current rectangular array of glyphs.
You (or someone else) is more than welcome to work on such a new
design, present it here, discuss ideas, etc. If I can help, I will.
I will reserve my judgment on the "move to Lisp" part until I see the
overall design of this new engine, and at least some of the
implementation ideas, including how not to lose existing display
features.
By contrast, "fixing the design" of the current display engine, let
alone moving parts of it to Lisp, is IMNSHO a waste of effort. It
simply cannot be fixed, it's already stretched beyond limit. We can
(and do) make small adjustments, but that's all.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 16:16 ` Ligatures Pip Cet
2020-05-19 16:41 ` Ligatures Eli Zaretskii
@ 2020-05-19 17:00 ` Eli Zaretskii
1 sibling, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-19 17:00 UTC (permalink / raw)
To: Pip Cet; +Cc: Stromeko, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 19 May 2020 16:16:53 +0000
> Cc: Stromeko@nexgo.de, emacs-devel@gnu.org
>
> Sorry, I genuinely don't understand what the problem is.
There's no need to argue. There's a TODO item regarding ligature
support, and I just updated it with the ideas from this discussion.
You, or anyone else, are welcome to work on some or all of that. I
think good ligature support in Emacs is long overdue; that is one of
the reasons we added HarfBuzz support and are steadily moving towards
making it the default font backend. Any advances in the direction of
letting Emacs use advanced features of modern fonts are welcome.
> draw_glyphs is called by draw_phys_cursor_glyph, so all we need is a
> line or two of extra code in draw_phys_cursor_glyphs to set
> row->clip to the rectangle surrounding the subglyph the cursor is
> on. No further change of the display engine is required for that, is
> it?
Feel free to ignore me. I may be completely wrong about this. Please
disregard what I said and just code away what you think is needed to
implement this.
> > And then there's the popular use case where nothing
> > changes except the cursor position, in which case no text is redrawn
> > at all.
>
> Except for the glyphs the cursor is on, right? Those are redrawn by
> draw_phys_cursor_glyph, or am I missing something here?
Basically, yes, draw_phys_cursor_glyph. But there are other functions
related to that, and which ones need to be changed for this "partial"
cursor drawing to work, I really don't know/remember, sorry. You need
to read the code.
> > We will need to merge faces,
> > which is slower than the current simple but effective method, which
> > completely sidesteps the issue.
>
> I believe performance concerns are an entirely different subject (put
> briefly, my opinion is that we've painted ourselves into a corner by
> micro-optimizing fast loops over an essentially inefficient basic
> design).
The current design is that faces are realized lazily and cached for
subsequent use, because realizing a face is expensive. It makes no
sense to realize a face each time we blink the cursor. No matter what
you think about the current design, code which does unnecessary
calculations is bad code. Gerd Moellmann, which designed and
implemented the current display engine, isn't stupid or incompetent,
quite the contrary.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 14:26 ` Ligatures Eli Zaretskii
@ 2020-05-19 19:00 ` Yuri Khan
0 siblings, 0 replies; 145+ messages in thread
From: Yuri Khan @ 2020-05-19 19:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Joost Kremers, tomas, Emacs developers
> > (And putting multiple diacritics over a single base character in> > various orders is a thing, it is the subject of the Unicode
> > Canonical Order subsection in Unicode standard.)
>
> Canonical order of diacritics is indeed important for jobs such as
> comparison, searching, etc. But we are talking about display, and for
> display there's a requirement that the order should not matter as long
> as the base character comes first. AFAIR, HarfBuzz supports that
> requirement, but not every other shaping engine does.
I meant, the Canonical Order spec could be a lot simpler (“just sort
all diacritics according to their codepoint value” rather than “take
great care to only swap two adjacent diacritics if their combining
classes differ and ordered wrongly”) if diacritics order did not
matter. But it does; <a> <acute> <diaeresis> is different from <a>
<diaeresis> <acute>, so the use case of putting point between the base
character and its following diacritic in order to insert a different
one is somewhat important. Indeed, toggling auto-composition-mode
solves that.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 13:56 ` Eli Zaretskii
2020-05-19 14:39 ` Clément Pit-Claudel
@ 2020-05-19 20:26 ` Alan Third
1 sibling, 0 replies; 145+ messages in thread
From: Alan Third @ 2020-05-19 20:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel
On Tue, May 19, 2020 at 04:56:32PM +0300, Eli Zaretskii wrote:
> > Date: Mon, 18 May 2020 23:59:11 +0200 (CEST)
> > From: Alan Third <alan@idiocy.org>
> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> >
> > In case anyone's interested, I've attached a screenshot of Apple's
> > Pages.app displaying the word Zapfino with the cursor after the "a".
>
> I don't see anything on or after "a", I see a thin vertical line on
> the "Z". is that what is actually displayed? If so, how do people
> know the cursor is after "a"??
Yep, that's what's displayed. The vertical line is the cursor. The
only reason I know it's after the a is because I hit the right arrow
twice to get there from the left of the glyph.
--
Alan Third
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 14:39 ` Clément Pit-Claudel
@ 2020-05-19 21:43 ` Pip Cet
2020-05-20 1:41 ` Clément Pit-Claudel
` (3 more replies)
0 siblings, 4 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-19 21:43 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, Alan Third, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]
On Tue, May 19, 2020 at 2:39 PM Clément Pit-Claudel
<cpitclaudel@gmail.com> wrote:
> On 19/05/2020 09.56, Eli Zaretskii wrote:
> > I don't see anything on or after "a", I see a thin vertical line on
> > the "Z". is that what is actually displayed? If so, how do people
> > know the cursor is after "a"??
>
> They don't: "the seven equal slices that Firefox treats it as for selection/editing purposes don't match up to the visual shapes of the sub-glyphs at all well"
And I'm afraid the difference is much more obvious with box cursors
than it is with carets. I'm attaching a screenshot of a patched Emacs
displaying "ffi", with point on the second f, in the "Linux Libertine
Display O" font (using approximately equal slices).
I think this is a bit of a worst-case scenario, a three-letter
ligature in a font using ligatures and overhangs very
enthusiastically. It might be okay for other fonts.
My remaining idea is to stretch characters so we can break up a
ligature without changing its total width. I'm not sure how to do
that, though.
(I'm also attaching the patch, for the morbidly curious; it isn't
clean, readable, or finished in any way, and contains at least one
obvious bug. It's just good enough to produce the screenshot, and
maybe it can serve as a hint as to which files need changing for
ligatures to work; but such changes would have to be done very
differently from the patch.).
[-- Attachment #2: ffi-box-cursor.png --]
[-- Type: image/png, Size: 1067 bytes --]
[-- Attachment #3: 0001-Ligatures.diff --]
[-- Type: text/x-patch, Size: 21370 bytes --]
diff --git a/src/alloc.c b/src/alloc.c
index ebc55857ea..1395f647f4 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -322,7 +322,7 @@ #define PUREBEG (char *) pure
/* If positive, garbage collection is inhibited. Otherwise, zero. */
-static intptr_t garbage_collection_inhibited;
+static intptr_t garbage_collection_inhibited = 3;
/* The GC threshold in bytes, the last time it was calculated
from gc-cons-threshold and gc-cons-percentage. */
diff --git a/src/composite.c b/src/composite.c
index 518502be49..e2bece40c8 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -836,7 +836,7 @@ fill_gstring_body (Lisp_Object gstring)
LGLYPH_SET_CHAR (g, c);
if (font != NULL)
- code = font->driver->encode_char (font, LGLYPH_CHAR (g));
+ code = font->driver->encode_char (font, LGLYPH_CHAR (g), NULL);
else
code = FONT_INVALID_CODE;
if (code != FONT_INVALID_CODE)
diff --git a/src/dispextern.h b/src/dispextern.h
index 0b1f3d14ae..2f6b33e74c 100644
--- a/src/dispextern.h
+++ b/src/dispextern.h
@@ -397,6 +397,15 @@ #define SET_GLYPH_FROM_GLYPH_CODE(glyph, gc) \
};
+struct glyph_context
+{
+ union vectorlike_header header;
+ Lisp_Object string;
+ Lisp_Object position;
+ int i;
+ int n;
+};
+
/* Glyphs.
Be extra careful when changing this structure! Esp. make sure that
@@ -567,6 +576,8 @@ #define FACE_ID_BITS 20
/* Used to compare all bit-fields above in one step. */
unsigned val;
} u;
+
+ struct glyph_context *context;
};
diff --git a/src/font.c b/src/font.c
index ab00402b40..8de3c969b9 100644
--- a/src/font.c
+++ b/src/font.c
@@ -3010,7 +3010,7 @@ font_has_char (struct frame *f, Lisp_Object font, int c)
if (result >= 0)
return result;
}
- return (fontp->driver->encode_char (fontp, c) != FONT_INVALID_CODE);
+ return (fontp->driver->encode_char (fontp, c, NULL) != FONT_INVALID_CODE);
}
@@ -3023,7 +3023,7 @@ font_encode_char (Lisp_Object font_object, int c)
eassert (FONT_OBJECT_P (font_object));
font = XFONT_OBJECT (font_object);
- return font->driver->encode_char (font, c);
+ return font->driver->encode_char (font, c, NULL);
}
@@ -4418,7 +4418,7 @@ font_fill_lglyph_metrics (Lisp_Object glyph, struct font *font, unsigned int cod
struct font_metrics metrics;
LGLYPH_SET_CODE (glyph, code);
- font->driver->text_extents (font, &code, 1, &metrics);
+ font->driver->text_extents (font, &code, 1, &metrics, NULL);
LGLYPH_SET_LBEARING (glyph, metrics.lbearing);
LGLYPH_SET_RBEARING (glyph, metrics.rbearing);
LGLYPH_SET_WIDTH (glyph, metrics.width);
@@ -4638,7 +4638,7 @@ DEFUN ("internal-char-font", Finternal_char_font, Sinternal_char_font, 1, 2, 0,
struct face *face = FACE_FROM_ID (f, face_id);
if (! face->font)
return Qnil;
- unsigned code = face->font->driver->encode_char (face->font, c);
+ unsigned code = face->font->driver->encode_char (face->font, c, NULL);
if (code == FONT_INVALID_CODE)
return Qnil;
Lisp_Object font_object;
@@ -4965,7 +4965,7 @@ DEFUN ("font-get-glyphs", Ffont_get_glyphs, Sfont_get_glyphs, 3, 4, 0,
unsigned code;
struct font_metrics metrics;
- code = font->driver->encode_char (font, c);
+ code = font->driver->encode_char (font, c, NULL);
if (code == FONT_INVALID_CODE)
{
ASET (vec, i, Qnil);
@@ -4976,7 +4976,7 @@ DEFUN ("font-get-glyphs", Ffont_get_glyphs, Sfont_get_glyphs, 3, 4, 0,
LGLYPH_SET_TO (g, i);
LGLYPH_SET_CHAR (g, c);
LGLYPH_SET_CODE (g, code);
- font->driver->text_extents (font, &code, 1, &metrics);
+ font->driver->text_extents (font, &code, 1, &metrics, NULL);
LGLYPH_SET_WIDTH (g, metrics.width);
LGLYPH_SET_LBEARING (g, metrics.lbearing);
LGLYPH_SET_RBEARING (g, metrics.rbearing);
diff --git a/src/font.h b/src/font.h
index 8614e7fa10..952a9fa4c3 100644
--- a/src/font.h
+++ b/src/font.h
@@ -565,6 +565,8 @@ #define FONT_PIXEL_SIZE_QUANTUM 1
#define FONT_INVALID_CODE 0xFFFFFFFF
+struct glyph_context;
+
/* Font driver. Members specified as "optional" can be NULL. */
struct font_driver
@@ -645,14 +647,15 @@ #define FONT_INVALID_CODE 0xFFFFFFFF
/* Return a glyph code of FONT for character C (Unicode code point).
If FONT doesn't have such a glyph, return FONT_INVALID_CODE. */
- unsigned (*encode_char) (struct font *font, int c);
+ unsigned (*encode_char) (struct font *font, int c, struct glyph_context *context);
/* Compute the total metrics of the NGLYPHS glyphs specified by
the font FONT and the sequence of glyph codes CODE, and store the
result in METRICS. */
void (*text_extents) (struct font *font,
const unsigned *code, int nglyphs,
- struct font_metrics *metrics);
+ struct font_metrics *metrics,
+ struct glyph_context *context);
#ifdef HAVE_WINDOW_SYSTEM
diff --git a/src/ftcrfont.c b/src/ftcrfont.c
index 7832d4f5ce..19c2644285 100644
--- a/src/ftcrfont.c
+++ b/src/ftcrfont.c
@@ -323,7 +323,7 @@ ftcrfont_has_char (Lisp_Object font, int c)
}
static unsigned
-ftcrfont_encode_char (struct font *font, int c)
+ftcrfont_encode_char (struct font *font, int c, struct glyph_context *context)
{
struct font_info *ftcrfont_info = (struct font_info *) font;
unsigned code = FONT_INVALID_CODE;
@@ -331,20 +331,53 @@ ftcrfont_encode_char (struct font *font, int c)
int utf8len = CHAR_STRING (c, utf8);
cairo_glyph_t stack_glyph;
cairo_glyph_t *glyphs = &stack_glyph;
- int num_glyphs = 1;
- if (cairo_scaled_font_text_to_glyphs (ftcrfont_info->cr_scaled_font, 0, 0,
- (char *) utf8, utf8len,
- &glyphs, &num_glyphs,
- NULL, NULL, NULL)
- == CAIRO_STATUS_SUCCESS)
+ if (context == NULL)
{
- if (glyphs != &stack_glyph)
- cairo_glyph_free (glyphs);
- else if (stack_glyph.index)
- code = stack_glyph.index;
+ context = xmalloc (sizeof *context);
+ context->string = CALLN (Fstring, make_fixnum (c));
+ context->position = make_fixnum (0);
}
+ unsigned int num_glyphs = 0;
+ unsigned int num_clusters = 0;
+ hb_buffer_t *hb_buf = hb_buffer_create ();
+ hb_buffer_set_cluster_level (hb_buf, HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
+ hb_buffer_add_utf8 (hb_buf, SDATA (context->string), -1, 0, -1);
+ hb_buffer_set_direction (hb_buf, HB_DIRECTION_LTR);
+ hb_font_t *hb_font = hb_ft_font_create_referenced
+ (cairo_ft_scaled_font_lock_face (ftcrfont_info->cr_scaled_font));
+ hb_shape (hb_font, hb_buf, NULL, 0);
+ hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos
+ (hb_buf, &num_glyphs);
+ hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions
+ (hb_buf, &num_glyphs);
+ int i0, i1;
+ int c0, c1;
+ i0 = 0;
+ for (int i = num_glyphs - 1; i >= 0; i--)
+ {
+ if (glyph_info[i].cluster <= XFIXNUM (context->position))
+ {
+ i0 = i;
+ c0 = glyph_info[i].cluster;
+ break;
+ }
+ }
+ i1 = num_glyphs;
+ for (int i = 0; i < num_glyphs; i++)
+ {
+ if (glyph_info[i].cluster > c0)
+ {
+ i1 = i;
+ c1 = glyph_info[i].cluster;
+ break;
+ }
+ }
+ context->i = XFIXNUM (context->position) - c0;
+ context->n = c1 - c0;
+ code = glyph_info[i0].codepoint;
+
return code;
}
@@ -352,30 +385,65 @@ ftcrfont_encode_char (struct font *font, int c)
ftcrfont_text_extents (struct font *font,
const unsigned *code,
int nglyphs,
- struct font_metrics *metrics)
+ struct font_metrics *metrics,
+ struct glyph_context *context)
{
+ struct font_info *ftcrfont_info = (struct font_info *) font;
int width, i;
block_input ();
- width = ftcrfont_glyph_extents (font, code[0], metrics);
- for (i = 1; i < nglyphs; i++)
+
+ if (context == NULL)
{
- struct font_metrics m;
- int w = ftcrfont_glyph_extents (font, code[i], metrics ? &m : NULL);
+ context = xmalloc (sizeof *context);
+ context->string = CALLN (Fstring, make_fixnum (code[0]));
+ context->position = make_fixnum (0);
+ }
- if (metrics)
+ unsigned int num_glyphs = 0;
+ unsigned int num_clusters = 0;
+ hb_buffer_t *hb_buf = hb_buffer_create ();
+ hb_buffer_set_cluster_level (hb_buf, HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
+ hb_buffer_set_direction (hb_buf, HB_DIRECTION_LTR);
+ hb_buffer_set_content_type (hb_buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
+ int n = 0;
+ for (const char *p = SDATA (context->string); p <= SDATA (context->string) + SBYTES (context->string);)
+ {
+ int c = string_char_advance (&p);
+ hb_buffer_add (hb_buf, c, n++);
+ }
+ hb_font_t *hb_font = hb_ft_font_create_referenced
+ (cairo_ft_scaled_font_lock_face (ftcrfont_info->cr_scaled_font));
+ hb_shape (hb_font, hb_buf, NULL, 0);
+ hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos
+ (hb_buf, &num_glyphs);
+ hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions
+ (hb_buf, &num_glyphs);
+ int i0, i1;
+ int c0, c1;
+ i0 = 0;
+ for (int i = num_glyphs - 1; i >= 0; i--)
+ {
+ if (glyph_info[i].cluster <= XFIXNUM (context->position))
+ {
+ i0 = i;
+ c0 = glyph_info[i].cluster;
+ break;
+ }
+ }
+ i1 = num_glyphs;
+ for (int i = 0; i < num_glyphs; i++)
+ {
+ if (glyph_info[i].cluster > c0)
{
- if (width + m.lbearing < metrics->lbearing)
- metrics->lbearing = width + m.lbearing;
- if (width + m.rbearing > metrics->rbearing)
- metrics->rbearing = width + m.rbearing;
- if (m.ascent > metrics->ascent)
- metrics->ascent = m.ascent;
- if (m.descent > metrics->descent)
- metrics->descent = m.descent;
+ i1 = i;
+ c1 = glyph_info[i].cluster;
+ break;
}
- width += w;
}
+ context->i = XFIXNUM (context->position) - c0;
+ context->n = c1 - c0;
+ width = glyph_pos[i0].x_advance / (c1 - c0) / 64;
unblock_input ();
if (metrics)
@@ -508,6 +576,8 @@ ftcrfont_draw (struct glyph_string *s,
glyphs[i].index = s->char2b[from + i];
glyphs[i].x = x;
glyphs[i].y = y;
+ struct glyph_context *context = s->first_glyph->context;
+ glyphs[i].x -= (context->i * s->width);
x += (s->padding_p ? 1 : ftcrfont_glyph_extents (s->font,
glyphs[i].index,
NULL));
diff --git a/src/hbfont.c b/src/hbfont.c
index 576c5fe7f6..5c3c690281 100644
--- a/src/hbfont.c
+++ b/src/hbfont.c
@@ -578,7 +578,7 @@ hbfont_shape (Lisp_Object lgstring, Lisp_Object direction)
LGLYPH_SET_CODE (lglyph, info[i].codepoint);
unsigned code = info[i].codepoint;
- font->driver->text_extents (font, &code, 1, &metrics);
+ font->driver->text_extents (font, &code, 1, &metrics, NULL);
LGLYPH_SET_WIDTH (lglyph, metrics.width);
LGLYPH_SET_LBEARING (lglyph, metrics.lbearing);
LGLYPH_SET_RBEARING (lglyph, metrics.rbearing);
diff --git a/src/lisp.h b/src/lisp.h
index ad7d67ae69..c4ae954999 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1103,6 +1103,7 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
PVEC_MUTEX,
PVEC_CONDVAR,
PVEC_MODULE_FUNCTION,
+ PVEC_GLYPH_CONTEXT,
/* These should be last, for internal_equal and sxhash_obj. */
PVEC_COMPILED,
diff --git a/src/xdisp.c b/src/xdisp.c
index cf15f579b5..41a7b4235a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -27499,14 +27499,15 @@ append_glyph_string (struct glyph_string **head, struct glyph_string **tail,
static struct face *
get_char_face_and_encoding (struct frame *f, int c, int face_id,
- unsigned *char2b, bool display_p)
+ unsigned *char2b, bool display_p,
+ struct glyph_context *context)
{
struct face *face = FACE_FROM_ID (f, face_id);
unsigned code = 0;
if (face->font)
{
- code = face->font->driver->encode_char (face->font, c);
+ code = face->font->driver->encode_char (face->font, c, context);
if (code == FONT_INVALID_CODE)
code = 0;
@@ -27533,7 +27534,7 @@ get_char_face_and_encoding (struct frame *f, int c, int face_id,
static struct face *
get_glyph_face_and_encoding (struct frame *f, struct glyph *glyph,
- unsigned *char2b)
+ unsigned *char2b, struct glyph_context *context)
{
struct face *face;
unsigned code = 0;
@@ -27549,7 +27550,8 @@ get_glyph_face_and_encoding (struct frame *f, struct glyph *glyph,
if (CHAR_BYTE8_P (glyph->u.ch))
code = CHAR_TO_BYTE8 (glyph->u.ch);
else
- code = face->font->driver->encode_char (face->font, glyph->u.ch);
+ code = face->font->driver->encode_char (face->font, glyph->u.ch,
+ context);
if (code == FONT_INVALID_CODE)
code = 0;
@@ -27565,14 +27567,15 @@ get_glyph_face_and_encoding (struct frame *f, struct glyph *glyph,
Return true iff FONT has a glyph for C. */
static bool
-get_char_glyph_code (int c, struct font *font, unsigned *char2b)
+get_char_glyph_code (int c, struct font *font, unsigned *char2b,
+ struct glyph_context *context)
{
unsigned code;
if (CHAR_BYTE8_P (c))
code = CHAR_TO_BYTE8 (c);
else
- code = font->driver->encode_char (font, c);
+ code = font->driver->encode_char (font, c, context);
if (code == FONT_INVALID_CODE)
return false;
@@ -27620,7 +27623,8 @@ fill_composite_glyph_string (struct glyph_string *s, struct face *base_face,
-1, Qnil);
face = get_char_face_and_encoding (s->f, c, face_id,
- s->char2b + i, true);
+ s->char2b + i, true,
+ NULL);
if (face)
{
if (! s->face)
@@ -27777,12 +27781,13 @@ fill_glyph_string (struct glyph_string *s, int face_id,
&& glyph->glyph_not_available_p == glyph_not_available_p)
{
s->face = get_glyph_face_and_encoding (s->f, glyph,
- s->char2b + s->nchars);
+ s->char2b + s->nchars,
+ glyph->context);
++s->nchars;
eassert (s->nchars <= end - start);
s->width += glyph->pixel_width;
- if (glyph++->padding_p != s->padding_p)
- break;
+ glyph++;
+ break;
}
s->font = s->face->font;
@@ -27877,7 +27882,8 @@ fill_stretch_glyph_string (struct glyph_string *s, int start, int end)
}
static struct font_metrics *
-get_per_char_metric (struct font *font, const unsigned *char2b)
+get_per_char_metric (struct font *font, const unsigned *char2b,
+ struct glyph_context *context)
{
static struct font_metrics metrics;
@@ -27886,7 +27892,7 @@ get_per_char_metric (struct font *font, const unsigned *char2b)
if (*char2b == FONT_INVALID_CODE)
return NULL;
- font->driver->text_extents (font, char2b, 1, &metrics);
+ font->driver->text_extents (font, char2b, 1, &metrics, context);
return &metrics;
}
@@ -27908,9 +27914,10 @@ normal_char_ascent_descent (struct font *font, int c, int *ascent, int *descent)
/* Get metrics of C, defaulting to a reasonably sized ASCII
character. */
- if (get_char_glyph_code (c >= 0 ? c : '{', font, &char2b))
+ if (get_char_glyph_code (c >= 0 ? c : '{', font, &char2b, NULL))
{
- struct font_metrics *pcm = get_per_char_metric (font, &char2b);
+ struct font_metrics *pcm = get_per_char_metric (font, &char2b,
+ NULL);
if (!(pcm->width == 0 && pcm->rbearing == 0 && pcm->lbearing == 0))
{
@@ -27952,10 +27959,12 @@ gui_get_glyph_overhangs (struct glyph *glyph, struct frame *f, int *left, int *r
if (glyph->type == CHAR_GLYPH)
{
unsigned char2b;
- struct face *face = get_glyph_face_and_encoding (f, glyph, &char2b);
+ struct face *face = get_glyph_face_and_encoding (f, glyph, &char2b,
+ NULL);
if (face->font)
{
- struct font_metrics *pcm = get_per_char_metric (face->font, &char2b);
+ struct font_metrics *pcm = get_per_char_metric (face->font, &char2b,
+ NULL);
if (pcm)
{
if (pcm->rbearing > pcm->width)
@@ -29841,12 +29850,12 @@ produce_glyphless_glyph (struct it *it, bool for_no_font, Lisp_Object acronym)
str = buf;
}
for (len = 0; str[len] && ASCII_CHAR_P (str[len]) && len < 6; len++)
- code[len] = font->driver->encode_char (font, str[len]);
+ code[len] = font->driver->encode_char (font, str[len], NULL);
upper_len = (len + 1) / 2;
font->driver->text_extents (font, code, upper_len,
- &metrics_upper);
+ &metrics_upper, NULL);
font->driver->text_extents (font, code + upper_len, len - upper_len,
- &metrics_lower);
+ &metrics_lower, NULL);
@@ -29936,6 +29945,40 @@ #define IT_APPLY_FACE_BOX(it, face) \
} \
} while (false)
+static struct glyph_context *
+make_context (struct it *it)
+{
+ struct glyph_context *context = xmalloc (sizeof *context); // XXX GC
+ char *string = xmalloc (128);
+ char *p = string;
+ ptrdiff_t bytepos = it->current.pos.bytepos;
+ ptrdiff_t charpos = it->current.pos.charpos;
+ ptrdiff_t bp5 = bytepos;
+ ptrdiff_t bp0 = bp5;
+ ptrdiff_t bp1 = bp5;
+ while (bytepos > BEG_BYTE && bp5 - bytepos < 32)
+ dec_both (&charpos, &bytepos);
+ bp0 = bytepos;
+ int i = 0;
+ Lisp_Object pos = make_fixnum (0);
+ while (bytepos >= BEG_BYTE && bytepos < Z_BYTE && bytepos - bp0 < 32)
+ {
+ inc_both (&charpos, &bytepos);
+ memcpy (p, BUF_BYTE_ADDRESS (current_buffer, bytepos - prev_char_len (bytepos)), prev_char_len (bytepos));
+ p += prev_char_len (bytepos);
+ ++i;
+ if (bytepos == bp5)
+ pos = make_fixnum (i);
+ }
+ bp1 = bytepos;
+ eassert (strlen (p) == bp1 - bp0);
+ *p++ = it->c;
+ *p++ = 0;
+ context->string = build_string (string);
+ context->position = pos;
+ return context;
+}
+
/* RIF:
Produce glyphs/get display metrics for the display element IT is
loaded with. See the description of struct it in dispextern.h
@@ -29973,6 +30016,7 @@ gui_produce_glyphs (struct it *it)
if (font->vertical_centering)
boff = VCENTER_BASELINE_OFFSET (font, it->f) - boff;
+ struct glyph_context *context = NULL;
if (it->char_to_display != '\n' && it->char_to_display != '\t')
{
it->nglyphs = 1;
@@ -29989,9 +30033,11 @@ gui_produce_glyphs (struct it *it)
it->descent = FONT_DESCENT (font) - boff;
}
- if (get_char_glyph_code (it->char_to_display, font, &char2b))
+ context = make_context (it);
+ if (get_char_glyph_code (it->char_to_display, font, &char2b,
+ context))
{
- pcm = get_per_char_metric (font, &char2b);
+ pcm = get_per_char_metric (font, &char2b, context);
if (pcm->width == 0
&& pcm->rbearing == 0 && pcm->lbearing == 0)
pcm = NULL;
@@ -30079,9 +30125,13 @@ gui_produce_glyphs (struct it *it)
/ FONT_HEIGHT (font));
append_stretch_glyph (it, it->object, it->pixel_width,
it->ascent + it->descent, ascent);
+ it->glyph_row->glyphs[it->area][it->glyph_row->used[it->area] - 1].context = NULL;
}
else
- append_glyph (it);
+ {
+ append_glyph (it);
+ it->glyph_row->glyphs[it->area][it->glyph_row->used[it->area] - 1].context = context;
+ }
/* If characters with lbearing or rbearing are displayed
in this line, record that fact in a flag of the
@@ -30233,9 +30283,9 @@ gui_produce_glyphs (struct it *it)
it->nglyphs = 1;
if (FONT_TOO_HIGH (font))
{
- if (get_char_glyph_code (' ', font, &char2b))
+ if (get_char_glyph_code (' ', font, &char2b, NULL))
{
- pcm = get_per_char_metric (font, &char2b);
+ pcm = get_per_char_metric (font, &char2b, NULL);
if (pcm->width == 0
&& pcm->rbearing == 0 && pcm->lbearing == 0)
pcm = NULL;
@@ -30372,8 +30422,8 @@ gui_produce_glyphs (struct it *it)
if (! font_not_found_p)
{
get_char_face_and_encoding (it->f, c, it->face_id,
- &char2b, false);
- pcm = get_per_char_metric (font, &char2b);
+ &char2b, false, NULL);
+ pcm = get_per_char_metric (font, &char2b, NULL);
}
/* Initialize the bounding box. */
@@ -30433,8 +30483,9 @@ gui_produce_glyphs (struct it *it)
else
{
get_char_face_and_encoding (it->f, ch, face_id,
- &char2b, false);
- pcm = get_per_char_metric (font, &char2b);
+ &char2b, false,
+ make_context (it));
+ pcm = get_per_char_metric (font, &char2b, make_context (it));
}
if (! pcm)
cmp->offsets[i * 2] = cmp->offsets[i * 2 + 1] = 0;
diff --git a/src/xterm.c b/src/xterm.c
index 7989cecec7..3b5f0d3524 100644
--- a/src/xterm.c
+++ b/src/xterm.c
@@ -1703,7 +1703,8 @@ x_compute_glyph_string_overhangs (struct glyph_string *s)
if (s->first_glyph->type == CHAR_GLYPH)
{
struct font *font = s->font;
- font->driver->text_extents (font, s->char2b, s->nchars, &metrics);
+ font->driver->text_extents (font, s->char2b, s->nchars, &metrics,
+ NULL);
}
else
{
@@ -2047,7 +2048,7 @@ x_draw_glyphless_glyph_string_foreground (struct glyph_string *s)
/* It is assured that all LEN characters in STR is ASCII. */
for (j = 0; j < len; j++)
- char2b[j] = s->font->driver->encode_char (s->font, str[j]) & 0xFFFF;
+ char2b[j] = s->font->driver->encode_char (s->font, str[j], NULL) & 0xFFFF;
s->font->driver->draw (s, 0, upper_len,
x + glyph->slice.glyphless.upper_xoff,
s->ybase + glyph->slice.glyphless.upper_yoff,
^ permalink raw reply related [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 21:43 ` Pip Cet
@ 2020-05-20 1:41 ` Clément Pit-Claudel
2020-05-20 2:07 ` Ligatures Stefan Monnier
` (2 subsequent siblings)
3 siblings, 0 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-20 1:41 UTC (permalink / raw)
To: Pip Cet; +Cc: Eli Zaretskii, Alan Third, emacs-devel
On 19/05/2020 17.43, Pip Cet wrote:
> And I'm afraid the difference is much more obvious with box cursors
> than it is with carets. I'm attaching a screenshot of a patched Emacs
> displaying "ffi", with point on the second f, in the "Linux Libertine
> Display O" font (using approximately equal slices).
Beauty is in the eye of the beholder :) This looks great to me, actually.
Maybe I'm just used to it because it's consistent with what Firefox does when I select text, and I have a habit of randomly selecting text while I read?
Thanks for working on this!
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-19 21:43 ` Pip Cet
2020-05-20 1:41 ` Clément Pit-Claudel
@ 2020-05-20 2:07 ` Stefan Monnier
2020-05-20 7:14 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) tomas
2020-05-20 15:18 ` Eli Zaretskii
3 siblings, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-20 2:07 UTC (permalink / raw)
To: Pip Cet; +Cc: Clément Pit-Claudel, emacs-devel, Eli Zaretskii, Alan Third
> than it is with carets. I'm attaching a screenshot of a patched Emacs
> displaying "ffi", with point on the second f, in the "Linux Libertine
> Display O" font (using approximately equal slices).
This looks pretty good to me. Not perfect, but to the extent that the
border of the drawn cursor go right through the "space" that separates the
letters, it shows clearly where we are.
> I think this is a bit of a worst-case scenario
I hope you're right.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 21:43 ` Pip Cet
2020-05-20 1:41 ` Clément Pit-Claudel
2020-05-20 2:07 ` Ligatures Stefan Monnier
@ 2020-05-20 7:14 ` tomas
2020-05-20 15:18 ` Eli Zaretskii
3 siblings, 0 replies; 145+ messages in thread
From: tomas @ 2020-05-20 7:14 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 591 bytes --]
On Tue, May 19, 2020 at 09:43:49PM +0000, Pip Cet wrote:
[...]
> And I'm afraid the difference is much more obvious with box cursors
> than it is with carets. I'm attaching a screenshot of a patched Emacs
> displaying "ffi", with point on the second f, in the "Linux Libertine
> Display O" font (using approximately equal slices).
Nice. I understand what miffs you (the overhang falls off the cursor
box, "compensated" by the wrong overhang entering from the left),
but given the information available you just can't do better.
IMHO it looks fine. Thanks for showing us :-)
Cheers
-- t
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-19 21:43 ` Pip Cet
` (2 preceding siblings ...)
2020-05-20 7:14 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) tomas
@ 2020-05-20 15:18 ` Eli Zaretskii
2020-05-20 17:31 ` Clément Pit-Claudel
2020-05-21 10:01 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Pip Cet
3 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-20 15:18 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 19 May 2020 21:43:49 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, Alan Third <alan@idiocy.org>, emacs-devel@gnu.org
>
> And I'm afraid the difference is much more obvious with box cursors
> than it is with carets. I'm attaching a screenshot of a patched Emacs
> displaying "ffi", with point on the second f, in the "Linux Libertine
> Display O" font (using approximately equal slices).
>
> I think this is a bit of a worst-case scenario, a three-letter
> ligature in a font using ligatures and overhangs very
> enthusiastically. It might be okay for other fonts.
I'm not sure this is the worst case. It might be the worst case if we
are talking about ligatures that involve only ASCII characters, and
don't involve symbols like ==> that gets converted to ⇒. But in
general, there are worse cases, like á (two codepoints). And for
kicks see the Khmer hello in etc/HELLO, where you can find 4
codepoints that produce a grapheme cluster made of 3 glyphs.
If we only want this feature for ASCII ligatures, then it sounds like
a limitation to me (and frankly, somewhat unclean as features go), but
if we really want this only for these limited cases, we will need to
somehow indicate to the display engine which ligatures are to be
handled like this and which aren't.
> My remaining idea is to stretch characters so we can break up a
> ligature without changing its total width. I'm not sure how to do
> that, though.
I don't think I understand what you'd like to do. Can you elaborate?
> (I'm also attaching the patch, for the morbidly curious; it isn't
> clean, readable, or finished in any way, and contains at least one
> obvious bug. It's just good enough to produce the screenshot, and
> maybe it can serve as a hint as to which files need changing for
> ligatures to work; but such changes would have to be done very
> differently from the patch.).
Right, the actual implementation will have to be different. In
particular, I think that if ligatures will use automatic compositions,
the information you need is already stored in the composition table
and reachable from the glyph string, so you don't need to invoke the
shaper again.
I see you implemented this for static compositions, which are
semi-obsolete. Also, I don't see the code which moves point inside
the ligature; Emacs will not allow doing that by default. In
particular, how did you tell the display code to show the cursor on
the middle 'f', not on the first one? Did I miss something?
And finally, you said you intended to do this via row->clip, but this
patch does something very different. What changed your mind?
Thanks.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 15:18 ` Eli Zaretskii
@ 2020-05-20 17:31 ` Clément Pit-Claudel
2020-05-20 18:01 ` Eli Zaretskii
2020-05-20 23:19 ` Ligatures Stefan Monnier
2020-05-21 10:01 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Pip Cet
1 sibling, 2 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-20 17:31 UTC (permalink / raw)
To: Eli Zaretskii, Pip Cet; +Cc: alan, emacs-devel
On 20/05/2020 11.18, Eli Zaretskii wrote:
> It might be the worst case if we are talking about ligatures that
> involve only ASCII characters, and don't involve symbols like ==>
> that gets converted to ⇒.
Wouldn't ==> be converted to ⟹ instead of ⇒? But regardless, what's the issue with ⇒?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 17:31 ` Clément Pit-Claudel
@ 2020-05-20 18:01 ` Eli Zaretskii
2020-05-20 18:33 ` Clément Pit-Claudel
2020-05-20 23:19 ` Ligatures Stefan Monnier
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-20 18:01 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 20 May 2020 13:31:13 -0400
>
> On 20/05/2020 11.18, Eli Zaretskii wrote:
> > It might be the worst case if we are talking about ligatures that
> > involve only ASCII characters, and don't involve symbols like ==>
> > that gets converted to ⇒.
>
> Wouldn't ==> be converted to ⟹ instead of ⇒?
Yes, to ⟹, sorry.
> But regardless, what's the issue with ⇒?
The issue with ⟹ is that the stem doesn't seem to be splittable into 2
parts, whereas "==" are two characters.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 18:01 ` Eli Zaretskii
@ 2020-05-20 18:33 ` Clément Pit-Claudel
2020-05-20 18:49 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-20 18:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 20/05/2020 14.01, Eli Zaretskii wrote:
>> But regardless, what's the issue with ⇒?
>
> The issue with ⟹ is that the stem doesn't seem to be splittable into 2
> parts, whereas "==" are two characters.
Oh, I see the worry, but I don't think it's a problem — it's a feature to split the stem into two parts :) In a monospace font, it should look obvious what's happening, since ⟹ will occupy three columns.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 18:33 ` Clément Pit-Claudel
@ 2020-05-20 18:49 ` Eli Zaretskii
2020-05-20 18:53 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-20 18:49 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 20 May 2020 14:33:24 -0400
>
> On 20/05/2020 14.01, Eli Zaretskii wrote:
> >> But regardless, what's the issue with ⇒?
> >
> > The issue with ⟹ is that the stem doesn't seem to be splittable into 2
> > parts, whereas "==" are two characters.
>
> Oh, I see the worry, but I don't think it's a problem — it's a feature to split the stem into two parts :)
Then I guess we have very different views of what is a "feature". To
me, this looks like a terrible kludge.
> In a monospace font, it should look obvious what's happening, since ⟹ will occupy three columns.
Here it occupies only two.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 18:49 ` Eli Zaretskii
@ 2020-05-20 18:53 ` Clément Pit-Claudel
2020-05-20 19:02 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-20 18:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 20/05/2020 14.49, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Wed, 20 May 2020 14:33:24 -0400
>>
>> On 20/05/2020 14.01, Eli Zaretskii wrote:
>>>> But regardless, what's the issue with ⇒?
>>>
>>> The issue with ⟹ is that the stem doesn't seem to be splittable into 2
>>> parts, whereas "==" are two characters.
>>
>> Oh, I see the worry, but I don't think it's a problem — it's a feature to split the stem into two parts :)
>
> Then I guess we have very different views of what is a "feature". To
> me, this looks like a terrible kludge.
Yet, that's what everyone else is doing, so at least it's a predictable (and convenient) kludge.
>> In a monospace font, it should look obvious what's happening, since ⟹ will occupy three columns.
>
> Here it occupies only two.
Do you have a font with ligatures that composes ==> into ⟹, taking only two characters?
Most of the monospace fonts on my machine show ⇒ as one character and ⟹ as two — but the ones that have ligatures changing => into ⇒ and ==> into ⟹ all respect the widths of the characters they compose, so ⇒ is two characters wide and ⟹ is three characters wide.
I don't think the width of ⟹ as a non-composed character is too relevant, since we won't break it up, right?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 18:53 ` Clément Pit-Claudel
@ 2020-05-20 19:02 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-20 19:02 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 20 May 2020 14:53:59 -0400
>
> On 20/05/2020 14.49, Eli Zaretskii wrote:
> >> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> >> Date: Wed, 20 May 2020 14:33:24 -0400
> >>
> >> Oh, I see the worry, but I don't think it's a problem — it's a feature to split the stem into two parts :)
> >
> > Then I guess we have very different views of what is a "feature". To
> > me, this looks like a terrible kludge.
>
> Yet, that's what everyone else is doing, so at least it's a predictable (and convenient) kludge.
Since when we in Emacs do stuff "like everyone else" and feel good
about that?
Anyway, this argument about personal preferences is futile. Just
understand that a feature that works for some vaguely-defined use
cases, but doesn't work for the rest is a misfeature in my book.
> I don't think the width of ⟹ as a non-composed character is too relevant, since we won't break it up, right?
My point is that you cannot rely on the width being 3 columns. It may
or may not be so.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-20 17:31 ` Clément Pit-Claudel
2020-05-20 18:01 ` Eli Zaretskii
@ 2020-05-20 23:19 ` Stefan Monnier
1 sibling, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-20 23:19 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, alan, Pip Cet, emacs-devel
>> It might be the worst case if we are talking about ligatures that
>> involve only ASCII characters, and don't involve symbols like ==>
>> that gets converted to ⇒.
> Wouldn't ==> be converted to ⟹ instead of ⇒? But regardless, what's the issue with ⇒?
Using `misc-fixed` here, those two above are displayed identically (as
single-column char) ;-)
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-20 15:18 ` Eli Zaretskii
2020-05-20 17:31 ` Clément Pit-Claudel
@ 2020-05-21 10:01 ` Pip Cet
2020-05-21 14:11 ` Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-21 10:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
Hi, Eli,
On Wed, May 20, 2020 at 3:31 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Tue, 19 May 2020 21:43:49 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, Alan Third <alan@idiocy.org>, emacs-devel@gnu.org
> >
> > And I'm afraid the difference is much more obvious with box cursors
> > than it is with carets. I'm attaching a screenshot of a patched Emacs
> > displaying "ffi", with point on the second f, in the "Linux Libertine
> > Display O" font (using approximately equal slices).
> >
> > I think this is a bit of a worst-case scenario, a three-letter
> > ligature in a font using ligatures and overhangs very
> > enthusiastically. It might be okay for other fonts.
>
> I'm not sure this is the worst case. It might be the worst case if we
> are talking about ligatures that involve only ASCII characters, and
> don't involve symbols like ==> that gets converted to ⇒. But in
> general, there are worse cases, like á (two codepoints). And for
> kicks see the Khmer hello in etc/HELLO, where you can find 4
> codepoints that produce a grapheme cluster made of 3 glyphs.
You're correct: I'm simply not dealing with Khmer or composed
characters (which are different from ligatures, of course) in the
patch, and I'm not certain how to deal with them in theory, either.
> If we only want this feature for ASCII ligatures, then it sounds like
> a limitation to me (and frankly, somewhat unclean as features go),
Not "only for ASCII ligatures", but not "any conceivable combination
of codepoints into glyphs" either. Just those supported by the font
and Harfbuzz.
> but
> if we really want this only for these limited cases, we will need to
> somehow indicate to the display engine which ligatures are to be
> handled like this and which aren't.
Well, we now know that fonts can provide information about how a
ligature is to be split into one-dimensional slices; I filed a pull
request against Harfbuzz (since merged) that would actually make the
corresponding API work, at least for the "Libertinus" font family.
Of course that means that Emacs behavior would depend on the font
tables in ways it currently doesn't. That's a problem.
> > My remaining idea is to stretch characters so we can break up a
> > ligature without changing its total width. I'm not sure how to do
> > that, though.
>
> I don't think I understand what you'd like to do. Can you elaborate?
My idea was to display "ffi" with the point on the second f by
condensing an "f" glyph to cover the middle third of the "ffi" glyph.
However, I might have been too critical of how good the simple
solution deals with this case.
> > (I'm also attaching the patch, for the morbidly curious; it isn't
> > clean, readable, or finished in any way, and contains at least one
> > obvious bug. It's just good enough to produce the screenshot, and
> > maybe it can serve as a hint as to which files need changing for
> > ligatures to work; but such changes would have to be done very
> > differently from the patch.).
>
> Right, the actual implementation will have to be different. In
> particular, I think that if ligatures will use automatic compositions,
> the information you need is already stored in the composition table
> and reachable from the glyph string, so you don't need to invoke the
> shaper again.
Well, I'm sorry to bring up a different (though somewhat related
issue), but kerning is also an issue: we need a shaper to get that
right, not just a composition table, right?
> I see you implemented this for static compositions, which are
> semi-obsolete.
I'm sorry, I'm afraid I don't understand. This should handle any
composition the shaper does, and only those, but slices up everything
horizontally by default.
> Also, I don't see the code which moves point inside
> the ligature; Emacs will not allow doing that by default. In
> particular, how did you tell the display code to show the cursor on
> the middle 'f', not on the first one? Did I miss something?
I produce three "struct glyph"s for "ffi": each has width one third of
the actual font glyph, and stores, in convoluted form, information
about which slice of the font glyph is to be actually drawn.
> And finally, you said you intended to do this via row->clip, but this
> patch does something very different. What changed your mind?
I was surprised this no longer seemed to be strictly necessary: as far
as the display code is concerned, we're dealing with three separate
glyphs with overhang areas, and those are already handled by the
cursor-drawing code.
Clipping is still needed: to deal with double-drawing issues, and to
deal with such crimes as making part of a ligature have a different
foreground color.
I'm sorry it's not particularly obvious from the patch, but the
approach I took yesterday is this:
1. every struct glyph has a "context", which specifies the character
for the struct glyph and some surrounding text.
2. every struct glyph is converted to a slice of (currently) a single
font glyph, by sending the context through the shaper and cutting out
the relevant bits
3. struct glyphs are displayed one by one
Problems:
1. ligatures can cross line boundaries
2. the context has to be updated, and trigger redisplay of the struct glyph
3. clipping is necessary
4. there are N clipped drawing operations for a single glyph covering
N struct glyphs.
5. corner cases can have ambiguous context: for example, a string of
many "f"s would be paired into "ff" glyphs, and simply cutting off the
context after a certain number of characters might result in the wrong
pairing
On the other hand, it deals with kerning as well as ligatures. And
other problems (right now, we call the shaper on 64 characters for
every character we actually display, which makes things noticeably
slow) are fixable.
Overall, I'd like to think more about alternative approaches to the
"context string" one before implementing anything. How would that work
for kerning, in particular?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 10:01 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Pip Cet
@ 2020-05-21 14:11 ` Eli Zaretskii
2020-05-21 16:26 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-21 14:11 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Thu, 21 May 2020 10:01:03 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > If we only want this feature for ASCII ligatures, then it sounds like
> > a limitation to me (and frankly, somewhat unclean as features go),
>
> Not "only for ASCII ligatures", but not "any conceivable combination
> of codepoints into glyphs" either. Just those supported by the font
> and Harfbuzz.
>
> > but
> > if we really want this only for these limited cases, we will need to
> > somehow indicate to the display engine which ligatures are to be
> > handled like this and which aren't.
>
> Well, we now know that fonts can provide information about how a
> ligature is to be split into one-dimensional slices;
The question is: do we want to show those carets for all the character
compositions, even if the information is provided? If not, we will
have to indicate somehow whether they should or shouldn't be shown for
each particular grapheme cluster.
> Of course that means that Emacs behavior would depend on the font
> tables in ways it currently doesn't. That's a problem.
It isn't a problem to depend on that if most fonts provide this
information. Then we could simply say this is not supported when the
information is not in the font. But if many fonts that support
ligatures don't provide this information, we will need to have some
fallback, like assume that every codepoint has the same share of the
ligature's width. the fact that other applications use a simplistic
heuristic and not the information in the fonts suggests that either
the information is not readily available or there are some other
problems with using it.
> > Right, the actual implementation will have to be different. In
> > particular, I think that if ligatures will use automatic compositions,
> > the information you need is already stored in the composition table
> > and reachable from the glyph string, so you don't need to invoke the
> > shaper again.
>
> Well, I'm sorry to bring up a different (though somewhat related
> issue), but kerning is also an issue: we need a shaper to get that
> right, not just a composition table, right?
Automatic compositions already use the shaper, see autocmp_chars.
> > I see you implemented this for static compositions, which are
> > semi-obsolete.
>
> I'm sorry, I'm afraid I don't understand. This should handle any
> composition the shaper does, and only those, but slices up everything
> horizontally by default.
I'm talking about the changes in gui_produce_glyphs. Its high-level
structure is basically
if (it->what == IT_CHARACTER)
{
... /* handles character glyphs */
}
else if (it->what == IT_COMPOSITION && it->cmp_it.ch < 0)
{
... /* A static compositions. */
}
else if (it->what == IT_COMPOSITION)
{
/* A dynamic (automatic) composition. */
}
[...]
You made changes only in the "static compositions" part. That code
handles compositions created by compose-region. The "modern" way of
composing text in Emacs uses automatic compositions, which are
controlled by data in composition-function-table. This is where we
call the shaping engine to produce the glyphs according to rules
stored in the font. I don't see in your patch any changes that affect
ligatures created by automatic compositions; did I miss something?
If you use the automatic compositions route, then the information you
need, i.e. the number of clusters in the shaped text and the overall
width of the ligature, is already produced by the shaper and stored in
the "gstring" object in the composition table, see the description of
that object in the doc string of composition-get-gstring. So there
should be no need to invoke the shaper inside gui_produce_glyphs and
elsewhere. (If we want to use the carets information from the font,
we will probably need to extend the gstring object to store that as
well, and extend the shape method to extract this information when
available.)
> > Also, I don't see the code which moves point inside
> > the ligature; Emacs will not allow doing that by default. In
> > particular, how did you tell the display code to show the cursor on
> > the middle 'f', not on the first one? Did I miss something?
>
> I produce three "struct glyph"s for "ffi": each has width one third of
> the actual font glyph, and stores, in convoluted form, information
> about which slice of the font glyph is to be actually drawn.
Ah, okay, I missed that. But producing 3 glyphs instead of just one
is not necessarily the best idea, I think. As you point out, one
problem will be with splitting the ligature across lines. Another
problem is more expensive display. And we won't be able to display
the ligature as a single glyph, for those who want that, at least not
easily.
> > And finally, you said you intended to do this via row->clip, but this
> > patch does something very different. What changed your mind?
>
> I was surprised this no longer seemed to be strictly necessary: as far
> as the display code is concerned, we're dealing with three separate
> glyphs with overhang areas, and those are already handled by the
> cursor-drawing code.
Yes. But if we return to a single glyph, then we'd need to do some
clipping.
> On the other hand, it deals with kerning as well as ligatures.
You mean, kerning of simple characters, for which we don't produce
ligatures? Or kerning within ligatures? If the latter, then I don't
see why we'd need that: font designers already design the ligatures to
have the optimal kerning, no?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 14:11 ` Eli Zaretskii
@ 2020-05-21 16:26 ` Pip Cet
2020-05-21 19:08 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-21 16:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Thu, May 21, 2020 at 2:11 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Thu, 21 May 2020 10:01:03 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > but
> > > if we really want this only for these limited cases, we will need to
> > > somehow indicate to the display engine which ligatures are to be
> > > handled like this and which aren't.
> >
> > Well, we now know that fonts can provide information about how a
> > ligature is to be split into one-dimensional slices;
>
> The question is: do we want to show those carets for all the character
> compositions, even if the information is provided? If not, we will
> have to indicate somehow whether they should or shouldn't be shown for
> each particular grapheme cluster.
Oh. I hadn't thought about fonts providing such caret information in
cases where they shouldn't, but of course that's a valid concern.
> > Of course that means that Emacs behavior would depend on the font
> > tables in ways it currently doesn't. That's a problem.
>
> It isn't a problem to depend on that if most fonts provide this
> information.
> Then we could simply say this is not supported when the
> information is not in the font.
I'm not sure how simple that would be: we could treat ligatures
without carets as atomic, or we could tell harfbuzz not to apply
ligatures without carets, or maybe make that decision depend on
whether the ligature is required or discretionary...
> But if many fonts that support
> ligatures don't provide this information, we will need to have some
> fallback, like assume that every codepoint has the same share of the
> ligature's width. the fact that other applications use a simplistic
> heuristic and not the information in the fonts suggests that either
> the information is not readily available or there are some other
> problems with using it.
Correct, it does. I'm not sure which one is the case.
> > > Right, the actual implementation will have to be different. In
> > > particular, I think that if ligatures will use automatic compositions,
> > > the information you need is already stored in the composition table
> > > and reachable from the glyph string, so you don't need to invoke the
> > > shaper again.
> >
> > Well, I'm sorry to bring up a different (though somewhat related
> > issue), but kerning is also an issue: we need a shaper to get that
> > right, not just a composition table, right?
>
> Automatic compositions already use the shaper, see autocmp_chars.
I'm not sure I understand how kerning would work using automatic compositions.
> > > I see you implemented this for static compositions, which are
> > > semi-obsolete.
> >
> > I'm sorry, I'm afraid I don't understand. This should handle any
> > composition the shaper does, and only those, but slices up everything
> > horizontally by default.
>
> I'm talking about the changes in gui_produce_glyphs. Its high-level
> structure is basically
>
> if (it->what == IT_CHARACTER)
> {
> ... /* handles character glyphs */
> }
> else if (it->what == IT_COMPOSITION && it->cmp_it.ch < 0)
> {
> ... /* A static compositions. */
> }
> else if (it->what == IT_COMPOSITION)
> {
> /* A dynamic (automatic) composition. */
> }
> [...]
>
> You made changes only in the "static compositions" part.
No. I didn't touch the "static compositions" part at all, except for
passing an extra NULL pointer to an API I'd extended. (At least,
that's what I intended, for all the changes to be in the IT_CHARACTER
part).
> That code
> handles compositions created by compose-region. The "modern" way of
> composing text in Emacs uses automatic compositions, which are
> controlled by data in composition-function-table. This is where we
> call the shaping engine to produce the glyphs according to rules
> stored in the font. I don't see in your patch any changes that affect
> ligatures created by automatic compositions; did I miss something?
I don't think so; I went for a third route, that of leaving all
compositions handling to the shaper and doing none of it in Emacs
itself.
> If you use the automatic compositions route, then the information you
> need, i.e. the number of clusters in the shaped text and the overall
> width of the ligature, is already produced by the shaper and stored in
> the "gstring" object in the composition table, see the description of
> that object in the doc string of composition-get-gstring. So there
> should be no need to invoke the shaper inside gui_produce_glyphs and
> elsewhere. (If we want to use the carets information from the font,
> we will probably need to extend the gstring object to store that as
> well, and extend the shape method to extract this information when
> available.)
Yes, and that seemed too complicated for me for something that I
thought wouldn't handle kerning anyway...
> > > Also, I don't see the code which moves point inside
> > > the ligature; Emacs will not allow doing that by default. In
> > > particular, how did you tell the display code to show the cursor on
> > > the middle 'f', not on the first one? Did I miss something?
> >
> > I produce three "struct glyph"s for "ffi": each has width one third of
> > the actual font glyph, and stores, in convoluted form, information
> > about which slice of the font glyph is to be actually drawn.
>
> Ah, okay, I missed that. But producing 3 glyphs instead of just one
> is not necessarily the best idea, I think.
I agree! I'd be happy to hear better ideas, and I think for now "use
fixed-width fonts" is a better idea...
> As you point out, one
> problem will be with splitting the ligature across lines. Another
> problem is more expensive display.
You mean the actual "copy the glyph bitmap to the glass" display?
Because I don't think that's relevant. Overall redisplay() time really
goes up calling the shaper on 32 characters for every character
displayed, though, so that's a concern I agree with.
> And we won't be able to display
> the ligature as a single glyph, for those who want that, at least not
> easily.
But that's what they can do now, with the IT_COMPOSITION case, right?
Because I did not touch that code so I didn't expect that to break
(famous last words).
> > > And finally, you said you intended to do this via row->clip, but this
> > > patch does something very different. What changed your mind?
> >
> > I was surprised this no longer seemed to be strictly necessary: as far
> > as the display code is concerned, we're dealing with three separate
> > glyphs with overhang areas, and those are already handled by the
> > cursor-drawing code.
>
> Yes. But if we return to a single glyph, then we'd need to do some
> clipping.
As I said, we need to do the clipping to render antialiased pixels properly.
It's just two lines of code in ftcrfont_draw:
cairo_rectangle (cr, x, y - FONT_BASE (face->font),
s->width, FONT_HEIGHT (face->font));
cairo_clip (cr);
> > On the other hand, it deals with kerning as well as ligatures.
>
> You mean, kerning of simple characters, for which we don't produce
> ligatures?
Yes, that's what I mean.
> Or kerning within ligatures? If the latter, then I don't
> see why we'd need that: font designers already design the ligatures to
> have the optimal kerning, no?
It's certainly not our job to fix that if they don't!
Perhaps I can digress a little and describe what I think the
interaction with the shaper should be like:
Emacs: I'd like to display codepoint 'f'
Harfbuzz: you'll have to tell me the codepoint before that
Emacs: 'f'
Harfbuzz: and the one after those two
Emacs: 'i'
Harfbuzz: and the one before all of those
Emacs: That's too expensive for me to compute / it's the beginning of
paragraph / a bidi boundary / an object without an assigned codepoint
/ ...
Harfbuzz: okay, display it as the middle slice of the "ffi" glyph
I.e., I'd like Harfbuzz to be asynchronous, and request more
information, parsimoniously, about the context of the codepoint we're
describing, rather than working in one go from "complete" information
to an indefinitely-long line of glyphs. And deal well with us deciding
it's too expensive to perform that much look-back/look-ahead. (Because
in real life, ligatures depend on knowing some amount of the context,
but not all of it, or people could never start writing.)
Of course, all this doesn't change that the "struct it" design is
somewhat difficult to extend to handling look-ahead: it's easy enough
to create a copy of the iterator and advance that while leaving the
actual iterator intact, but it's also really slow. In fact I suspect
the best way would be to make struct it a heap-allocated pseudovector
(not necessarily one ordinarily garbage-collected, though), and cache
"future" iterator states once we compute them.
You're correct when you say that some major redesign is needed in this
area, but I don't think that's the subject of the current discussion.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 16:26 ` Pip Cet
@ 2020-05-21 19:08 ` Eli Zaretskii
2020-05-21 20:51 ` Clément Pit-Claudel
2020-05-21 21:06 ` Pip Cet
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-21 19:08 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Thu, 21 May 2020 16:26:13 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> On Thu, May 21, 2020 at 2:11 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > From: Pip Cet <pipcet@gmail.com>
> > > Date: Thu, 21 May 2020 10:01:03 +0000
> > > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > > but
> > > > if we really want this only for these limited cases, we will need to
> > > > somehow indicate to the display engine which ligatures are to be
> > > > handled like this and which aren't.
> > >
> > > Well, we now know that fonts can provide information about how a
> > > ligature is to be split into one-dimensional slices;
> >
> > The question is: do we want to show those carets for all the character
> > compositions, even if the information is provided? If not, we will
> > have to indicate somehow whether they should or shouldn't be shown for
> > each particular grapheme cluster.
>
> Oh. I hadn't thought about fonts providing such caret information in
> cases where they shouldn't, but of course that's a valid concern.
>
> > > Of course that means that Emacs behavior would depend on the font
> > > tables in ways it currently doesn't. That's a problem.
> >
> > It isn't a problem to depend on that if most fonts provide this
> > information.
>
> > Then we could simply say this is not supported when the
> > information is not in the font.
>
> I'm not sure how simple that would be: we could treat ligatures
> without carets as atomic, or we could tell harfbuzz not to apply
> ligatures without carets, or maybe make that decision depend on
> whether the ligature is required or discretionary...
>
> > But if many fonts that support
> > ligatures don't provide this information, we will need to have some
> > fallback, like assume that every codepoint has the same share of the
> > ligature's width. the fact that other applications use a simplistic
> > heuristic and not the information in the fonts suggests that either
> > the information is not readily available or there are some other
> > problems with using it.
>
> Correct, it does. I'm not sure which one is the case.
>
> > > > Right, the actual implementation will have to be different. In
> > > > particular, I think that if ligatures will use automatic compositions,
> > > > the information you need is already stored in the composition table
> > > > and reachable from the glyph string, so you don't need to invoke the
> > > > shaper again.
> > >
> > > Well, I'm sorry to bring up a different (though somewhat related
> > > issue), but kerning is also an issue: we need a shaper to get that
> > > right, not just a composition table, right?
> >
> > Automatic compositions already use the shaper, see autocmp_chars.
>
> I'm not sure I understand how kerning would work using automatic compositions.
>
> > > > I see you implemented this for static compositions, which are
> > > > semi-obsolete.
> > >
> > > I'm sorry, I'm afraid I don't understand. This should handle any
> > > composition the shaper does, and only those, but slices up everything
> > > horizontally by default.
> >
> > I'm talking about the changes in gui_produce_glyphs. Its high-level
> > structure is basically
> >
> > if (it->what == IT_CHARACTER)
> > {
> > ... /* handles character glyphs */
> > }
> > else if (it->what == IT_COMPOSITION && it->cmp_it.ch < 0)
> > {
> > ... /* A static compositions. */
> > }
> > else if (it->what == IT_COMPOSITION)
> > {
> > /* A dynamic (automatic) composition. */
> > }
> > [...]
> >
> > You made changes only in the "static compositions" part.
>
> No. I didn't touch the "static compositions" part at all, except for
> passing an extra NULL pointer to an API I'd extended. (At least,
> that's what I intended, for all the changes to be in the IT_CHARACTER
> part).
I mean this part:
@@ -30433,8 +30483,9 @@ gui_produce_glyphs (struct it *it)
else
{
get_char_face_and_encoding (it->f, ch, face_id,
- &char2b, false);
- pcm = get_per_char_metric (font, &char2b);
+ &char2b, false,
+ make_context (it));
+ pcm = get_per_char_metric (font, &char2b, make_context (it));
}
This calls make_context and passes it to these functions. This code
handles static compositions only.
> > The "modern" way of composing text in Emacs uses automatic
> > compositions, which are controlled by data in
> > composition-function-table. This is where we call the shaping
> > engine to produce the glyphs according to rules stored in the
> > font. I don't see in your patch any changes that affect ligatures
> > created by automatic compositions; did I miss something?
>
> I don't think so; I went for a third route, that of leaving all
> compositions handling to the shaper and doing none of it in Emacs
> itself.
But automatic compositions do work by calling the shaper.
> Perhaps I can digress a little and describe what I think the
> interaction with the shaper should be like:
>
> Emacs: I'd like to display codepoint 'f'
> Harfbuzz: you'll have to tell me the codepoint before that
> Emacs: 'f'
> Harfbuzz: and the one after those two
> Emacs: 'i'
> Harfbuzz: and the one before all of those
> Emacs: That's too expensive for me to compute / it's the beginning of
> paragraph / a bidi boundary / an object without an assigned codepoint
> / ...
> Harfbuzz: okay, display it as the middle slice of the "ffi" glyph
>
> I.e., I'd like Harfbuzz to be asynchronous, and request more
> information, parsimoniously, about the context of the codepoint we're
> describing, rather than working in one go from "complete" information
> to an indefinitely-long line of glyphs. And deal well with us deciding
> it's too expensive to perform that much look-back/look-ahead. (Because
> in real life, ligatures depend on knowing some amount of the context,
> but not all of it, or people could never start writing.)
That would prevent Emacs from controlling what is and what isn't
composed, leaving the shaper in charge. We currently allow Lisp to
control that via composition-function-table, which provides a regexp
that text around a character must match in order for the matching
substring to be passed to the shaper. We never call the shaper unless
composition-function-table tells us to do so.
I'm not sure I understand what problems do you see with this design.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 19:08 ` Eli Zaretskii
@ 2020-05-21 20:51 ` Clément Pit-Claudel
2020-05-21 21:16 ` Pip Cet
2020-05-22 11:44 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
2020-05-21 21:06 ` Pip Cet
1 sibling, 2 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-21 20:51 UTC (permalink / raw)
To: Eli Zaretskii, Pip Cet; +Cc: alan, emacs-devel
On 21/05/2020 15.08, Eli Zaretskii wrote:
> That would prevent Emacs from controlling what is and what isn't
> composed, leaving the shaper in charge. We currently allow Lisp to
> control that via composition-function-table, which provides a regexp
> that text around a character must match in order for the matching
> substring to be passed to the shaper. We never call the shaper unless
> composition-function-table tells us to do so.
Does this mean that for each font we need to re-encode the font's logic for deciding whether to use a ligature?
Some concrete examples: in Iosevka (*, (**, (***, (**** etc are all displayed with the * character vertically centered relative to the (, but a lone * is not centered. In Fira Code, punctuation is context-aware, so the "+" in "A + B" is not the same as the "+" in "a + b". In both of these faces, arrows can be of any length, and in Fira Code you can even mix and match them (see https://raw.githubusercontent.com/tonsky/FiraCode/master/extras/arrows.png).
The documentation of Fira Code does recommend composition-function-table here: https://github.com/tonsky/FiraCode/wiki/Emacs-instructions, but it seems like a lot of extra work for each font, isn't it?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 19:08 ` Eli Zaretskii
2020-05-21 20:51 ` Clément Pit-Claudel
@ 2020-05-21 21:06 ` Pip Cet
2020-05-22 6:06 ` Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-21 21:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Thu, May 21, 2020 at 7:08 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Thu, 21 May 2020 16:26:13 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > On Thu, May 21, 2020 at 2:11 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > No. I didn't touch the "static compositions" part at all, except for
> > passing an extra NULL pointer to an API I'd extended. (At least,
> > that's what I intended, for all the changes to be in the IT_CHARACTER
> > part).
>
> I mean this part:
>
> @@ -30433,8 +30483,9 @@ gui_produce_glyphs (struct it *it)
> else
> {
> get_char_face_and_encoding (it->f, ch, face_id,
> - &char2b, false);
> - pcm = get_per_char_metric (font, &char2b);
> + &char2b, false,
> + make_context (it));
> + pcm = get_per_char_metric (font, &char2b, make_context (it));
> }
>
> This calls make_context and passes it to these functions. This code
> handles static compositions only.
Oops, sorry. You're right, that change was harmless but unintended;
the relevant change is
@@ -29989,9 +30033,11 @@ gui_produce_glyphs (struct it *it)
it->descent = FONT_DESCENT (font) - boff;
}
- if (get_char_glyph_code (it->char_to_display, font, &char2b))
+ context = make_context (it);
+ if (get_char_glyph_code (it->char_to_display, font, &char2b,
+ context))
{
- pcm = get_per_char_metric (font, &char2b);
+ pcm = get_per_char_metric (font, &char2b, context);
if (pcm->width == 0
&& pcm->rbearing == 0 && pcm->lbearing == 0)
pcm = NULL;
> > > The "modern" way of composing text in Emacs uses automatic
> > > compositions, which are controlled by data in
> > > composition-function-table. This is where we call the shaping
> > > engine to produce the glyphs according to rules stored in the
> > > font. I don't see in your patch any changes that affect ligatures
> > > created by automatic compositions; did I miss something?
> >
> > I don't think so; I went for a third route, that of leaving all
> > compositions handling to the shaper and doing none of it in Emacs
> > itself.
>
> But automatic compositions do work by calling the shaper.
Yes, that observation is correct. What I'm doing is still very
different from the (semi-)automatic compositions composite.c does.
> > Perhaps I can digress a little and describe what I think the
> > interaction with the shaper should be like:
> >
> > Emacs: I'd like to display codepoint 'f'
> > Harfbuzz: you'll have to tell me the codepoint before that
> > Emacs: 'f'
> > Harfbuzz: and the one after those two
> > Emacs: 'i'
> > Harfbuzz: and the one before all of those
> > Emacs: That's too expensive for me to compute / it's the beginning of
> > paragraph / a bidi boundary / an object without an assigned codepoint
> > / ...
> > Harfbuzz: okay, display it as the middle slice of the "ffi" glyph
> >
> > I.e., I'd like Harfbuzz to be asynchronous, and request more
> > information, parsimoniously, about the context of the codepoint we're
> > describing, rather than working in one go from "complete" information
> > to an indefinitely-long line of glyphs. And deal well with us deciding
> > it's too expensive to perform that much look-back/look-ahead. (Because
> > in real life, ligatures depend on knowing some amount of the context,
> > but not all of it, or people could never start writing.)
>
> That would prevent Emacs from controlling what is and what isn't
> composed, leaving the shaper in charge.
Well, yes and no: the shaper is in charge, and I see absolutely
nothing wrong with that. You can tell the shaper not to perform
ligatures (or perform only some of them), or kerning, if you want to.
> We currently allow Lisp to
> control that via composition-function-table, which provides a regexp
> that text around a character must match in order for the matching
> substring to be passed to the shaper.
And you're suggesting that regexp be set to, say, ".+"? Because that's
the only way I've found of getting it to do kerning.
> We never call the shaper unless
> composition-function-table tells us to do so.
...whereas I want to call it every time, which is why having
composition-function-table in the loop seemed wasteful.
> I'm not sure I understand what problems do you see with this design.
I meant the redisplay engine in general, not the way automatic
compositions work.
(That's not to say I'm happy with automatic compositions, but that's a
different subject).
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 20:51 ` Clément Pit-Claudel
@ 2020-05-21 21:16 ` Pip Cet
2020-05-22 6:12 ` Eli Zaretskii
2020-05-22 11:44 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-21 21:16 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, alan, emacs-devel
On Thu, May 21, 2020 at 8:51 PM Clément Pit-Claudel
<cpitclaudel@gmail.com> wrote:
> On 21/05/2020 15.08, Eli Zaretskii wrote:
> > That would prevent Emacs from controlling what is and what isn't
> > composed, leaving the shaper in charge. We currently allow Lisp to
> > control that via composition-function-table, which provides a regexp
> > that text around a character must match in order for the matching
> > substring to be passed to the shaper. We never call the shaper unless
> > composition-function-table tells us to do so.
>
> Does this mean that for each font we need to re-encode the font's logic for deciding whether to use a ligature?
I think
(set-char-table-range composition-function-table t '([".+" 0
font-shape-gstring]))
should work, but it has weird side effects that I'm pretty sure aren't
intended (paren highlighting is broken, for example).
Is that supposed to happen?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 21:06 ` Pip Cet
@ 2020-05-22 6:06 ` Eli Zaretskii
2020-05-22 9:34 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 6:06 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Thu, 21 May 2020 21:06:27 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > But automatic compositions do work by calling the shaper.
>
> Yes, that observation is correct. What I'm doing is still very
> different from the (semi-)automatic compositions composite.c does.
For ligatures, I don't think I understand why the automatic
compositions are not the way to go.
> > That would prevent Emacs from controlling what is and what isn't
> > composed, leaving the shaper in charge.
>
> Well, yes and no: the shaper is in charge, and I see absolutely
> nothing wrong with that. You can tell the shaper not to perform
> ligatures (or perform only some of them), or kerning, if you want to.
Tell it how? by introducing new Lisp options and data structures?
What would those new data structures be, and how will they be
different from composition-function-table?
> > We currently allow Lisp to
> > control that via composition-function-table, which provides a regexp
> > that text around a character must match in order for the matching
> > substring to be passed to the shaper.
>
> And you're suggesting that regexp be set to, say, ".+"? Because that's
> the only way I've found of getting it to do kerning.
I'm not talking about the kerning. This discussion is about
ligatures, AFAIU. For ligatures, the regexp should catch the
sequences of characters that should be ligated. ".+" is definitely
not right for ligatures, since it will significantly slow down
redisplay for no good reason.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 21:16 ` Pip Cet
@ 2020-05-22 6:12 ` Eli Zaretskii
2020-05-22 9:25 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 6:12 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Thu, 21 May 2020 21:16:44 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, alan@idiocy.org, emacs-devel@gnu.org
>
> (set-char-table-range composition-function-table t '([".+" 0
> font-shape-gstring]))
>
> should work, but it has weird side effects that I'm pretty sure aren't
> intended (paren highlighting is broken, for example).
This is not the right way. The right way is to do the likes of the
following:
(set-char-table-range
composition-function-table '(?f . ?f)
(list (vector "ffi" 0 'compose-gstring-for-graphic)))
This shows how to do this only for the "ffi" ligature, but I think it
makes the idea clear. Tassilo posted here some code ho wrote that
supports more (and different) ligatures which are supposed to be used
like prettify-symbols-mode. The idea is to populate
composition-function-table only for characters that should trigger
ligation.
Whether to use compose-gstring-for-graphic or font-shape-gstring
depends on what you want to happen when the font doesn't have a glyph
for a certain ligature: the latter will then cause the characters be
displayed as usual, as separate characters, the latter will display
them as a single display element, a kind of "fake ligature".
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 6:12 ` Eli Zaretskii
@ 2020-05-22 9:25 ` Pip Cet
2020-05-22 11:23 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-22 9:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Fri, May 22, 2020 at 6:12 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Thu, 21 May 2020 21:16:44 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, alan@idiocy.org, emacs-devel@gnu.org
> >
> > (set-char-table-range composition-function-table t '([".+" 0
> > font-shape-gstring]))
> >
> > should work, but it has weird side effects that I'm pretty sure aren't
> > intended (paren highlighting is broken, for example).
>
> This is not the right way.
What is the right way, then? I want all ligatures my font supports.
Also, even if it is the wrong thing to do, why does it break seemingly
unrelated things?
> The right way is to do the likes of the
> following:
>
> (set-char-table-range
> composition-function-table '(?f . ?f)
> (list (vector "ffi" 0 'compose-gstring-for-graphic)))
> This shows how to do this only for the "ffi" ligature, but I think it
> makes the idea clear.
I'm afraid it doesn't, to me.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 6:06 ` Eli Zaretskii
@ 2020-05-22 9:34 ` Pip Cet
2020-05-22 11:33 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-22 9:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Fri, May 22, 2020 at 6:06 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Thu, 21 May 2020 21:06:27 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > But automatic compositions do work by calling the shaper.
> >
> > Yes, that observation is correct. What I'm doing is still very
> > different from the (semi-)automatic compositions composite.c does.
>
> For ligatures, I don't think I understand why the automatic
> compositions are not the way to go.
I don't think I've concluded they're not, though I'm strongly leaning
that way. I didn't use them in the first patch, but that's probably
easy enough to change.
(Playing around with composite.c, I noticed it's very easy to get into
an unquittable infinite loop by specifying invalid values in
composition-function-table. That should probably be fixed).
> > > That would prevent Emacs from controlling what is and what isn't
> > > composed, leaving the shaper in charge.
> >
> > Well, yes and no: the shaper is in charge, and I see absolutely
> > nothing wrong with that. You can tell the shaper not to perform
> > ligatures (or perform only some of them), or kerning, if you want to.
>
> Tell it how? by introducing new Lisp options and data structures?
Yes. A buffer option to disable ligatures/kerning would probably
suffice, because it would essentially only be used to work around
buggy fonts.
> What would those new data structures be, and how will they be
> different from composition-function-table?
> > > We currently allow Lisp to
> > > control that via composition-function-table, which provides a regexp
> > > that text around a character must match in order for the matching
> > > substring to be passed to the shaper.
> >
> > And you're suggesting that regexp be set to, say, ".+"? Because that's
> > the only way I've found of getting it to do kerning.
>
> I'm not talking about the kerning. This discussion is about
> ligatures, AFAIU.
Oh. I understood it differently, because kerning is an important
problem to solve in order to use variable-pitch fonts for English
text.
> For ligatures, the regexp should catch the
> sequences of characters that should be ligated.
I have to know that before using auto-composition-mode? How do I work
it out? Do I have to disassemble the font and reimplement the relevant
tables?
> ".+" is definitely
> not right for ligatures, since it will significantly slow down
> redisplay
So that's another argument against auto-composition-mode: it's too
slow unless you know in advance which ligatures you want. Right?
> for no good reason.
I think "because I want the ligatures the font provides, and I don't
care to work out in advance which ones those are" is a pretty good
reason.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 9:25 ` Pip Cet
@ 2020-05-22 11:23 ` Eli Zaretskii
2020-05-22 12:52 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 11:23 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 09:25:31 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > > should work, but it has weird side effects that I'm pretty sure aren't
> > > intended (paren highlighting is broken, for example).
> >
> > This is not the right way.
>
> What is the right way, then? I want all ligatures my font supports.
You can request all the ligatures that _can_ be supported; those which
aren't available in the font you use will not be ligated (if you use
font-shape-gstring in the composition-function-table slot).
Or you can request only those ligatures that make sense for the
particular use case. For example, when displaying program source code
you'd probably want the various symbols, like -> etc., to produce
ligatures, but you most probably won't want "ffi" in a variable name
to produce a ligature.
Or you can provide your own function to use in the
composition-function-table, and that function can do more complex
stuff, like refuse to ligate under some complicated conditions.
Therefore, I think letting Lisp programs (and thus users) control what
gets composed into ligatures and what doesn't is an important feature
to have. We should develop it more, because currently it lacks some
features we'd need for better ligature support (see the TODO item
about that), but I think the basic design is valid. At least I didn't
yet see any evidence that it isn't valid; perhaps when we develop it
more and/or start using it more, we will find some problems, but I
don't see them yet.
> Also, even if it is the wrong thing to do, why does it break seemingly
> unrelated things?
I don't know. Can you show how to reproduce that in the current
codebase on master? Then I'll look into it.
> > (set-char-table-range
> > composition-function-table '(?f . ?f)
> > (list (vector "ffi" 0 'compose-gstring-for-graphic)))
>
> > This shows how to do this only for the "ffi" ligature, but I think it
> > makes the idea clear.
>
> I'm afraid it doesn't, to me.
Doesn't make the idea clear or doesn't produce the ligature? If the
latter, then I'm puzzled, because it did work for me with a font that
has the ffi ligature. If the former, please ask more questions and I
will try to explain as best I can.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 9:34 ` Pip Cet
@ 2020-05-22 11:33 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 11:33 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 09:34:54 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > > Well, yes and no: the shaper is in charge, and I see absolutely
> > > nothing wrong with that. You can tell the shaper not to perform
> > > ligatures (or perform only some of them), or kerning, if you want to.
> >
> > Tell it how? by introducing new Lisp options and data structures?
>
> Yes. A buffer option to disable ligatures/kerning would probably
> suffice, because it would essentially only be used to work around
> buggy fonts.
That option already exists: disable auto-composition-mode in a buffer
where you don't want that.
If you want to disable only some compositions, like only ligatures, or
only some of the ligatures, you can do that in two ways:
. modify composition-function-table (although this currently cannot
be done only for a single buffer, I think: something to fix for
better ligature support)
. provide your own composition function to be used in
composition-function-table, which could then be programmed to
decide which ligatures to allow and which not to allow
> > I'm not talking about the kerning. This discussion is about
> > ligatures, AFAIU.
>
> Oh. I understood it differently, because kerning is an important
> problem to solve in order to use variable-pitch fonts for English
> text.
Perhaps so, but let's discuss the kerning issue separately. It's a
separate problem, AFAIU.
> > For ligatures, the regexp should catch the
> > sequences of characters that should be ligated.
>
> I have to know that before using auto-composition-mode? How do I work
> it out?
I tried to answer this in my previous message in this thread.
> > ".+" is definitely
> > not right for ligatures, since it will significantly slow down
> > redisplay
>
> So that's another argument against auto-composition-mode: it's too
> slow unless you know in advance which ligatures you want. Right?
It's too slow if we have too many ligatures, or, more generally, too
many characters to compose. Character composition works by calling
Lisp (so as to allow use the flexibility we need, see the other
messages), and calling Lisp for too many characters during redisplay
will make redisplay slower. This is one reason why we don't run every
buffer substring through the shaper, although the HarfBuzz developers
told me long ago they thought this was a flaw in our design.
> > for no good reason.
>
> I think "because I want the ligatures the font provides, and I don't
> care to work out in advance which ones those are" is a pretty good
> reason.
Let's see if I succeeded to convince you that we have better
solutions.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-21 20:51 ` Clément Pit-Claudel
2020-05-21 21:16 ` Pip Cet
@ 2020-05-22 11:44 ` Eli Zaretskii
2020-05-22 13:26 ` Clément Pit-Claudel
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 11:44 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 21 May 2020 16:51:47 -0400
>
> On 21/05/2020 15.08, Eli Zaretskii wrote:
> > That would prevent Emacs from controlling what is and what isn't
> > composed, leaving the shaper in charge. We currently allow Lisp to
> > control that via composition-function-table, which provides a regexp
> > that text around a character must match in order for the matching
> > substring to be passed to the shaper. We never call the shaper unless
> > composition-function-table tells us to do so.
>
> Does this mean that for each font we need to re-encode the font's logic for deciding whether to use a ligature?
I don't think so, but I'm not yet sure I understand all the details of
the use cases you have in mind. See also my responses to Pip Cet:
perhaps they answer also your questions here.
> Some concrete examples: in Iosevka (*, (**, (***, (**** etc are all displayed with the * character vertically centered relative to the (, but a lone * is not centered. In Fira Code, punctuation is context-aware, so the "+" in "A + B" is not the same as the "+" in "a + b". In both of these faces, arrows can be of any length, and in Fira Code you can even mix and match them (see https://raw.githubusercontent.com/tonsky/FiraCode/master/extras/arrows.png).
How do you solve this in prettify-symbols-mode?
In general, I envision that people would use the font they find
acceptable for the ligatures they want/need in each mode or buffer
where they need that. If for some reason different fonts could
determine which ligatures you do NOT want to see, then I guess we will
have to provide some easy-to-use UI for that, which would manipulate
the relevant data structures under the hood. Alternatively each font
could require a separate composition function to go with it.
See, this is exactly part of the job that still awaits us: to figure
out the various use cases for displaying ligatures in a buffer, and
then provide the necessary user-facing features to adapt Emacs to each
use case. The infrastructure for this already exists: it's the
auto-composition-mode and composition-function-table that underlies it
(although we may need to add something so that
composition-function-table could be modified on per-buffer basis), but
we lack an easy-to-use UI and customization features that will allow
users to use that machinery in practice. See the TODFO item about
ligatures; volunteers are welcome to work on that.
> The documentation of Fira Code does recommend composition-function-table here: https://github.com/tonsky/FiraCode/wiki/Emacs-instructions, but it seems like a lot of extra work for each font, isn't it?
That's for static compositions, not for automatic compositions. I was
talking about the latter, and consider the former to be a
semi-obsolete feature that we should eventually remove.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 11:23 ` Eli Zaretskii
@ 2020-05-22 12:52 ` Pip Cet
2020-05-22 13:15 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-22 12:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Fri, May 22, 2020 at 11:23 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 09:25:31 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > > should work, but it has weird side effects that I'm pretty sure aren't
> > > > intended (paren highlighting is broken, for example).
> > >
> > > This is not the right way.
> >
> > What is the right way, then? I want all ligatures my font supports.
>
> You can request all the ligatures that _can_ be supported;
How do I do that? Opentype fonts can support arbitrary ligatures, such
as "Zapfino" being a seven-letter ligature.
> those which
> aren't available in the font you use will not be ligated (if you use
> font-shape-gstring in the composition-function-table slot).
> Or you can request only those ligatures that make sense for the
> particular use case.
My use case is English text, and all ligatures supported by the font
make sense for that.
> For example, when displaying program source code
> you'd probably want the various symbols, like -> etc., to produce
> ligatures, but you most probably won't want "ffi" in a variable name
> to produce a ligature.
Why not?
> Or you can provide your own function to use in the
> composition-function-table, and that function can do more complex
> stuff, like refuse to ligate under some complicated conditions.
If that kind of thing turns out to be necessary, we can find ways of
doing it, such as setting a text property with harfbuzz feature
strings to be applied when rendering.
> Therefore, I think letting Lisp programs (and thus users) control what
> gets composed into ligatures and what doesn't is an important feature
> to have.
Okay, I can accept that requirement. But it should be possible to get
"all ligatures", rather than a finite set you know about in advance.
> We should develop it more, because currently it lacks some
> features we'd need for better ligature support (see the TODO item
> about that), but I think the basic design is valid.
The TODO item is confusing and, I believe, confused.
"For the list of typographical ligatures, see
https://en.wikipedia.org/wiki/Orthographic_ligature#Ligatures_in_Unicode_(Latin_alphabets)"
That's very wrong: typographical ligatures generally aren't assigned
Unicode codepoints; those that have them usually do so for historical
reasons. There's no finite "the" list of typographical ligatures, it's
up to each font to define glyphs covering codepoint clusters as it
sees fit.
I disagree with pretty much every statement in the rest of the TODO item.
> At least I didn't yet see any evidence that it isn't valid;
But how do I make it work? For English/Western text with ligatures
that I don't know about in advance? Please treat this as a dumb
end-user question. What lines of Lisp do I enter to get all the
ligatures my font supports, most of which do not have individual
Unicode codepoints?
> > Also, even if it is the wrong thing to do, why does it break seemingly
> > unrelated things?
>
> I don't know. Can you show how to reproduce that in the current
> codebase on master? Then I'll look into it.
bug#41454
> > > (set-char-table-range
> > > composition-function-table '(?f . ?f)
> > > (list (vector "ffi" 0 'compose-gstring-for-graphic)))
> >
> > > This shows how to do this only for the "ffi" ligature, but I think it
> > > makes the idea clear.
> >
> > I'm afraid it doesn't, to me.
>
> Doesn't make the idea clear or doesn't produce the ligature?
It doesn't make the idea clear, because I simply see no practical way
we're going to know about the ligatures the font provides in advance.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 12:52 ` Pip Cet
@ 2020-05-22 13:15 ` Eli Zaretskii
2020-05-22 13:29 ` Clément Pit-Claudel
2020-05-22 13:56 ` Pip Cet
0 siblings, 2 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 13:15 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 12:52:41 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > You can request all the ligatures that _can_ be supported;
>
> How do I do that? Opentype fonts can support arbitrary ligatures, such
> as "Zapfino" being a seven-letter ligature.
I thought the set of all the ligatures is known, and guided by
typography experts. Do font designers really support ligatures from
any arbitrary combination of characters? If so, where can I read
about this?
> > Or you can request only those ligatures that make sense for the
> > particular use case.
>
> My use case is English text, and all ligatures supported by the font
> make sense for that.
Which ones are those? Is there an exhaustive list of such ligatures
somewhere?
> > For example, when displaying program source code
> > you'd probably want the various symbols, like -> etc., to produce
> > ligatures, but you most probably won't want "ffi" in a variable name
> > to produce a ligature.
>
> Why not?
It makes no sense to me. Why ligate them in that use case? Program
source code isn't supposed to behave like typeset human-readable text.
> Okay, I can accept that requirement. But it should be possible to get
> "all ligatures", rather than a finite set you know about in advance.
Let's first reach an understanding of what "all ligatures" actually
means. I thought the full list of all ligatures is known in advanced
and quite small, but maybe this is wrong, see above.
> "For the list of typographical ligatures, see
>
> https://en.wikipedia.org/wiki/Orthographic_ligature#Ligatures_in_Unicode_(Latin_alphabets)"
>
> That's very wrong: typographical ligatures generally aren't assigned
> Unicode codepoints; those that have them usually do so for historical
> reasons.
Indeed, ligatures don't have to have Unicode codepoints, only some of
them are precomposed. Emacs doesn't need them to have codepoints when
we use auto-composition-mode. The reference is there only to show the
list of ligatures, and I believe the list is full regardless of the
codepoint issue. Can you point me to a larger list of ligatures made
out of ASCII letters?
> There's no finite "the" list of typographical ligatures, it's up to
> each font to define glyphs covering codepoint clusters as it sees
> fit.
Really? Any reference for this?
> > At least I didn't yet see any evidence that it isn't valid;
>
> But how do I make it work? For English/Western text with ligatures
> that I don't know about in advance? Please treat this as a dumb
> end-user question. What lines of Lisp do I enter to get all the
> ligatures my font supports, most of which do not have individual
> Unicode codepoints?
You tell Emacs that a given series of characters should be composed,
via composition-function-table, and the shaper then does the job of
providing the font glyphs for displaying that sequence.
But I don't think we should continue with these details before we have
a clear idea of whether the list of possible ligatures is really
infinite, as you seem to imply.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 11:44 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
@ 2020-05-22 13:26 ` Clément Pit-Claudel
2020-05-22 14:29 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 13:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 22/05/2020 07.44, Eli Zaretskii wrote:
>> Cc: alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Thu, 21 May 2020 16:51:47 -0400
>>
>> On 21/05/2020 15.08, Eli Zaretskii wrote:
>>> That would prevent Emacs from controlling what is and what isn't
>>> composed, leaving the shaper in charge. We currently allow Lisp to
>>> control that via composition-function-table, which provides a regexp
>>> that text around a character must match in order for the matching
>>> substring to be passed to the shaper. We never call the shaper unless
>>> composition-function-table tells us to do so.
>>
>> Does this mean that for each font we need to re-encode the font's logic for deciding whether to use a ligature?
>
> I don't think so, but I'm not yet sure I understand all the details of
> the use cases you have in mind. See also my responses to Pip Cet:
> perhaps they answer also your questions here.
>
>> Some concrete examples: in Iosevka (*, (**, (***, (**** etc are all displayed with the * character vertically centered relative to the (, but a lone * is not centered. In Fira Code, punctuation is context-aware, so the "+" in "A + B" is not the same as the "+" in "a + b". In both of these faces, arrows can be of any length, and in Fira Code you can even mix and match them (see https://raw.githubusercontent.com/tonsky/FiraCode/master/extras/arrows.png).
>
> How do you solve this in prettify-symbols-mode?
You don't, which is unfortunate. prettify-symbols-mode was extremely cool a few years ago when fonts with programming ligatures were mostly unheard of, and it's still extremely nice for things like prettifying lambda in λ, but for things like turning ascii arrows into pretty arrows it lags behind the more recent ligature stuff.
> In general, I envision that people would use the font they find
> acceptable for the ligatures they want/need in each mode or buffer
> where they need that. If for some reason different fonts could
> determine which ligatures you do NOT want to see, then I guess we will
> have to provide some easy-to-use UI for that, which would manipulate
> the relevant data structures under the hood. Alternatively each font
> could require a separate composition function to go with it.
It would be weird for Emacs to be the only program that requires re-encoding the entire ligature logic of each font it attempts to use. Different fonts offer different ligatures, and if I want to select a subset the font itself provides variants that let me do this. Meanwhile, I hope that we can make Emacs act like browsers or other editors in that if I select a font it will just, by default, use the ligatures that this font provides according to the logic embedded in the font.
>> The documentation of Fira Code does recommend composition-function-table here: https://github.com/tonsky/FiraCode/wiki/Emacs-instructions, but it seems like a lot of extra work for each font, isn't it?
>
> That's for static compositions, not for automatic compositions. I was
> talking about the latter, and consider the former to be a
> semi-obsolete feature that we should eventually remove.
I see. I need to read up on the difference.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 13:15 ` Eli Zaretskii
@ 2020-05-22 13:29 ` Clément Pit-Claudel
2020-05-22 14:30 ` Eli Zaretskii
2020-05-22 13:56 ` Pip Cet
1 sibling, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 13:29 UTC (permalink / raw)
To: Eli Zaretskii, Pip Cet; +Cc: alan, emacs-devel
On 22/05/2020 09.15, Eli Zaretskii wrote:
> I thought the set of all the ligatures is known, and guided by
> typography experts.
I don't think so, at least not for programming fonts?
> Do font designers really support ligatures from
> any arbitrary combination of characters? If so, where can I read
> about this?
Yes; that's what I was alluding to in my example with comment signs and arrows. I think the pictures on https://github.com/tonsky/FiraCode should be illuminating.
I hope I'm not misunderstanding your question :/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 13:15 ` Eli Zaretskii
2020-05-22 13:29 ` Clément Pit-Claudel
@ 2020-05-22 13:56 ` Pip Cet
[not found] ` <83lflj16jn.fsf@gnu.org>
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-22 13:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Fri, May 22, 2020 at 1:15 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 12:52:41 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > You can request all the ligatures that _can_ be supported;
> >
> > How do I do that? Opentype fonts can support arbitrary ligatures, such
> > as "Zapfino" being a seven-letter ligature.
>
> I thought the set of all the ligatures is known, and guided by
> typography experts.
No, that's not how Opentype handles things at all. I just added a "ta"
ligature to a font by converting it to ttx format, editing the XML,
and converting back to .otf. It works fine.
So ad-hoc ligatures certainly are a feature of Opentype.
> Do font designers really support ligatures from
> any arbitrary combination of characters? If so, where can I read
> about this?
https://docs.microsoft.com/en-us/typography/opentype/spec/gsub#lookuptype-4-ligature-substitution-subtable
The font I'm looking at right now has these: Th, ch, ck, ffh, ffi,
ffj, ffk, ffl, ff, fh, fi, fj, fk, fl, ft, tt, tz
But I've also come across an example where "fä" was displayed
differently, though I'm not sure it used Opentype ligatures.
> > > For example, when displaying program source code
> > > you'd probably want the various symbols, like -> etc., to produce
> > > ligatures, but you most probably won't want "ffi" in a variable name
> > > to produce a ligature.
> >
> > Why not?
>
> It makes no sense to me. Why ligate them in that use case? Program
> source code isn't supposed to behave like typeset human-readable text.
Seems like an aesthetic decision. As far as I'm concerned, program
source code is typeset human-readable text, it just has different (and
possibly better) conventions for typesetting it. I wouldn't choose to
use a variable-pitch font for program source code ordinarily, but if I
did, I'd want ligatures.
> > "For the list of typographical ligatures, see
> >
> > https://en.wikipedia.org/wiki/Orthographic_ligature#Ligatures_in_Unicode_(Latin_alphabets)"
> >
> > That's very wrong: typographical ligatures generally aren't assigned
> > Unicode codepoints; those that have them usually do so for historical
> > reasons.
>
> Indeed, ligatures don't have to have Unicode codepoints, only some of
> them are precomposed. Emacs doesn't need them to have codepoints when
> we use auto-composition-mode. The reference is there only to show the
> list of ligatures, and I believe the list is full regardless of the
> codepoint issue. Can you point me to a larger list of ligatures made
> out of ASCII letters?
"Th" is mentioned as an example in a few places, and it's not on the list.
> But I don't think we should continue with these details before we have
> a clear idea of whether the list of possible ligatures is really
> infinite, as you seem to imply.
I agree.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 13:26 ` Clément Pit-Claudel
@ 2020-05-22 14:29 ` Eli Zaretskii
2020-05-22 14:32 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 14:29 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 May 2020 09:26:05 -0400
>
> > In general, I envision that people would use the font they find
> > acceptable for the ligatures they want/need in each mode or buffer
> > where they need that. If for some reason different fonts could
> > determine which ligatures you do NOT want to see, then I guess we will
> > have to provide some easy-to-use UI for that, which would manipulate
> > the relevant data structures under the hood. Alternatively each font
> > could require a separate composition function to go with it.
>
> It would be weird for Emacs to be the only program that requires re-encoding the entire ligature logic of each font it attempts to use. Different fonts offer different ligatures, and if I want to select a subset the font itself provides variants that let me do this. Meanwhile, I hope that we can make Emacs act like browsers or other editors in that if I select a font it will just, by default, use the ligatures that this font provides according to the logic embedded in the font.
If this is a real problem, it should be possible to have a function
that will extract all the ligatures supported by a font, I think.
But I don't think I agree with the "logic embedded in the font" part.
I think we should let the user control which ligatures are really
used.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 13:29 ` Clément Pit-Claudel
@ 2020-05-22 14:30 ` Eli Zaretskii
2020-05-22 14:34 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 14:30 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 May 2020 09:29:57 -0400
>
> > Do font designers really support ligatures from
> > any arbitrary combination of characters? If so, where can I read
> > about this?
>
> Yes; that's what I was alluding to in my example with comment signs and arrows. I think the pictures on https://github.com/tonsky/FiraCode should be illuminating.
>
> I hope I'm not misunderstanding your question :/
I was talking about ligatures made from letters, not symbols.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 14:29 ` Eli Zaretskii
@ 2020-05-22 14:32 ` Clément Pit-Claudel
2020-05-22 19:00 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 14:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 22/05/2020 10.29, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 May 2020 09:26:05 -0400
>>
>>> In general, I envision that people would use the font they find
>>> acceptable for the ligatures they want/need in each mode or buffer
>>> where they need that. If for some reason different fonts could
>>> determine which ligatures you do NOT want to see, then I guess we will
>>> have to provide some easy-to-use UI for that, which would manipulate
>>> the relevant data structures under the hood. Alternatively each font
>>> could require a separate composition function to go with it.
>>
>> It would be weird for Emacs to be the only program that requires re-encoding the entire ligature logic of each font it attempts to use. Different fonts offer different ligatures, and if I want to select a subset the font itself provides variants that let me do this. Meanwhile, I hope that we can make Emacs act like browsers or other editors in that if I select a font it will just, by default, use the ligatures that this font provides according to the logic embedded in the font.
>
> If this is a real problem, it should be possible to have a function
> that will extract all the ligatures supported by a font, I think.
>
> But I don't think I agree with the "logic embedded in the font" part.
> I think we should let the user control which ligatures are really
> used.
I agree. We should let them control the logic, but that doesn't mean we have to force them to do so; which means we need a way to extract that logic, somehow. My udnerstanding was that it could be quite complex, so there was no point in re-implementing it in ELisp.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 14:30 ` Eli Zaretskii
@ 2020-05-22 14:34 ` Clément Pit-Claudel
2020-05-22 19:01 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 14:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 22/05/2020 10.30, Eli Zaretskii wrote:
>> Cc: alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 May 2020 09:29:57 -0400
>>
>>> Do font designers really support ligatures from
>>> any arbitrary combination of characters? If so, where can I read
>>> about this?
>>
>> Yes; that's what I was alluding to in my example with comment signs and arrows. I think the pictures on https://github.com/tonsky/FiraCode should be illuminating.
>>
>> I hope I'm not misunderstanding your question :/
>
> I was talking about ligatures made from letters, not symbols.
But then how do you handle symbol ligatures? You showed the example below in response to Pip's suggestion of using .+ to support everything that I had mentioned; was that only for letters? What about symbols then?
(set-char-table-range
composition-function-table '(?f . ?f)
(list (vector "ffi" 0 'compose-gstring-for-graphic)))
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 14:32 ` Clément Pit-Claudel
@ 2020-05-22 19:00 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 19:00 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 May 2020 10:32:30 -0400
>
> > But I don't think I agree with the "logic embedded in the font" part.
> > I think we should let the user control which ligatures are really
> > used.
>
> I agree. We should let them control the logic, but that doesn't mean we have to force them to do so; which means we need a way to extract that logic, somehow.
If we decide to enable only the ligatures that are supported by the
default font, then yes, we should find a way of detecting which ones
it supports. But if we find out that the list of the possible
ligatures is fixed, we could by default enable all of them, and let
the shaping engine deal with those that the font doesn't support.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 14:34 ` Clément Pit-Claudel
@ 2020-05-22 19:01 ` Eli Zaretskii
2020-05-22 19:33 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 19:01 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 May 2020 10:34:06 -0400
>
> > I was talking about ligatures made from letters, not symbols.
>
> But then how do you handle symbol ligatures?
By using suitable regular expressions. E.g., you could take the list
of ligatures in that FiraCode site and convert them into a regexp or a
set of regexps.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 19:01 ` Eli Zaretskii
@ 2020-05-22 19:33 ` Clément Pit-Claudel
2020-05-22 19:44 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 19:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 22/05/2020 15.01, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 May 2020 10:34:06 -0400
>>
>>> I was talking about ligatures made from letters, not symbols.
>>
>> But then how do you handle symbol ligatures?
>
> By using suitable regular expressions. E.g., you could take the list
> of ligatures in that FiraCode site and convert them into a regexp or a
> set of regexps.
Thanks. I don't understand why we need to do this, but if we have technical limitations that force us to add those regular expressions then maybe it's not the end of the world (I understand that there is value in being able to selectively disable ligatures, using regexps or something else, but it seems surprising that we'll need extra Emacs-specific work for each and every font that includes ligatures).
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 19:33 ` Clément Pit-Claudel
@ 2020-05-22 19:44 ` Eli Zaretskii
2020-05-22 20:02 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-22 19:44 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 22 May 2020 15:33:59 -0400
>
> >> But then how do you handle symbol ligatures?
> >
> > By using suitable regular expressions. E.g., you could take the list
> > of ligatures in that FiraCode site and convert them into a regexp or a
> > set of regexps.
>
> Thanks. I don't understand why we need to do this
I'm not sure I follow. Do you understand why
https://github.com/tonsky/FiraCode/wiki/Emacs-instructions includes a
long list of strings to be replaced with ligatures? If so, why don't
you understand the reason we need to specify similar things when we
use automatic compositions?
And who is "we" in this case? Users of these features indeed
shouldn't need to mess with these long lists of character sequences,
but why is it a problem if "we" the Emacs developers provide data
bases of such sequences in advance, which user-facing features could
use, hiding them behind much easier UI?
> it seems surprising that we'll need extra Emacs-specific work for each and every font that includes ligatures).
I don't understand how you got to this conclusion. This is true for
prettify-symbols-mode, but that's exactly why I don't like that
implementation, and why I think automatic compositions are a better
way to go. And for automatic compositions we didn't yet decide that
any user-level action is needed when you switch to another font, we
are still discussing what is involved. Up front, I don't yet see why
such font-specific adjustment would be required from users.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-22 19:44 ` Eli Zaretskii
@ 2020-05-22 20:02 ` Clément Pit-Claudel
[not found] ` <83mu5z171j.fsf@gnu.org>
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-22 20:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 22/05/2020 15.44, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 May 2020 15:33:59 -0400
>>
>>>> But then how do you handle symbol ligatures?
>>>
>>> By using suitable regular expressions. E.g., you could take the list
>>> of ligatures in that FiraCode site and convert them into a regexp or a
>>> set of regexps.
>>
>> Thanks. I don't understand why we need to do this
>
> I'm not sure I follow. Do you understand why
> https://github.com/tonsky/FiraCode/wiki/Emacs-instructions includes a
> long list of strings to be replaced with ligatures?
Yes, I do understand: that's because Emacs' ligature support is currently weaker than other editors, and so you need to jump through hoops to use Fira Code. These hoops include telling Emacs what sequences to turn into ligatures. This problem is specific to Emacs: in other text editors, you just pick the font, and all supported ligatures are used. Importantly, the instructions on that page are a poor workaround that doesn't give you all the features of Fira Code (I don't mean that we couldn't support all of them, as I don't know if that true currently. I just mean that the page shouldn't be understood as providing full support for Fira Code in Emacs).
That's why Emacs is in the fairly short list of "Doesn't work" editors, I think.
> If so, why don't
> you understand the reason we need to specify similar things when we
> use automatic compositions?
What I don't understand is what it is about Emacs that means that we need special lists of regexps for each new font, while other editors don't need them.
> And who is "we" in this case? Users of these features indeed
> shouldn't need to mess with these long lists of character sequences,
> but why is it a problem if "we" the Emacs developers provide data
> bases of such sequences in advance, which user-facing features could
> use, hiding them behind much easier UI?
We can't provide these data bases in advance, I think. Each font supports a different set of symbol ligatures, and so the list for each font will be different.
>> it seems surprising that we'll need extra Emacs-specific work for each and every font that includes ligatures).
>
> I don't understand how you got to this conclusion. This is true for
> prettify-symbols-mode, but that's exactly why I don't like that
> implementation, and why I think automatic compositions are a better
> way to go. And for automatic compositions we didn't yet decide that
> any user-level action is needed when you switch to another font, we
> are still discussing what is involved. Up front, I don't yet see why
> such font-specific adjustment would be required from users.
Each font offers a different set of symbol ligatures: there is no common superset that covers all fonts, except the ".+" regexp that Pip posted earlier. From earlier messages, I understood that we need to specify which character sequences to ligate. So, I conclude that we'll need new work every time a new font comes out, or the ligatures in a font change (every time Fira Code is updated, for example). Since other editors don't need that work, I wonder why it's needed in Emacs.
Sorry if I misunderstood something; I don't want to waste anyone's time.
Clément.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
[not found] ` <834ks7110w.fsf@gnu.org>
@ 2020-05-23 11:24 ` Vasilij Schneidermann
2020-05-23 13:04 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Vasilij Schneidermann @ 2020-05-23 11:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, pipcet, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3359 bytes --]
> The reason is how the current Emacs display engine is designed: it
> cannot pass large substrings of buffer text to the shaping engine
> without incurring performance penalties and/or disrupting the way the
> layout decisions, as currently designed, work. the current design of
> the display engine is that we examine the stuff to be displayed one
> grapheme cluster at a time, and make the layout decisions after each
> grapheme cluster's metrics is produced. Unless someone begins working
> on a new design of the Emacs display, I see no good way for overcoming
> these problems, based on what I know about the display code.
Thanks for describing the problem in detail. Out of curiosity, is this the
same reason why font fallback is handled on a per-script basis for most cases
and with carefully chosen ranges for emoji? I see a similar problem there,
with updates being necessary for every Unicode release.
> Of course, it's possible that I'm missing something in the current display
> code, which could luckily allow us to support any ligature made up from any
> number of characters without any significant design changes. So please by
> all means study the current code, see if something like that is possible,
> describe such a possible solution, and I'll gladly admit my mistake. I don't
> claim a 110% understanding of all the subtleties of the current code, so it
> is perfectly possible that I'm missing something. I don't think it is good
> for Emacs to have just one person who knows these details, especially if that
> person is myself. We need to enlarge the circle of our experts on this, and
> then perhaps a practical solution could present itself. Although I'm
> skeptical, to tell the truth.
Given your previous explanation, a regex-based approach heuristic is the best
we can hope for then. From what I understand the display engine uses a
rectangular grid, not unlike what terminal emulators do. Are there any tricks
to steal from existing terminal emulators? For example there is an open pull
request [1] for alacritty using Harfbuzz and FreeType for ligature support.
> If I _am_ right, and the complete solution is impossible, we could, of course
> decide that partial solutions based on heuristics are not good enough for us,
> and wait for the redesign of the display code. I hope we will not do that,
> because IMO partial solutions that satisfy 80% of the needs are much better
> than no solutions. That is why I described how this stuff could work under
> the current limitations, albeit without supporting every possible use case.
> Eventually, this is something the community should decide.
The greatest challenge I see with redesigning the display engine is supporting
textual terminals. One alternative design would be using something akin to a
typesetting engine, like TeX's boxes and glue model or something from the roff
family (which is used successfully in terminal emulators for `man`). Another
approach is to build upon a browser engine and use copious amounts of CSS and
JavaScript to build an editor. Neither is known to be performant and power
efficient enough for continuous redisplay. It's no wonder that custom designs
are used, for example in GUI toolkits. Maybe that is the way forward?
Vasilij
[1]: https://patch-diff.githubusercontent.com/raw/alacritty/alacritty/pull/2677.patch
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
[not found] ` <831rnb0zld.fsf@gnu.org>
@ 2020-05-23 12:36 ` Pip Cet
2020-05-23 14:08 ` Eli Zaretskii
2020-05-23 12:47 ` Ligatures Stefan Monnier
1 sibling, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-23 12:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Sat, May 23, 2020 at 9:28 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 08:44:22 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > You write: "(b) is not really feasible without redesigning the entire
> > Emacs display engine". I don't see how that's true at all. All we need
> > is some limited look-ahead.
>
> We already have look-ahead: that's what the regexp part of the
> composition rules are about. That is not the crucial problem.
But it's the only problem I see! When you see an IT_CHARACTER, you get
some context, hand it to HarfBuzz, slice up the relevant glyphs, and
display them.
This is not complicated or difficult, except for the "get some context" part.
It doesn't involve composite.c at all, and that's good, because for
those tricky special cases composite.c does a better job than standard
shaping, and we need to keep that feature. It just shouldn't be the
regular route.
> The crucial problem is that we currently perform layout decisions one
> grapheme cluster at a time, whereas what HarfBuzz people say is that
> we should basically do that one screen line at a time.
I think we're going to have to compromise: that's why my patch used a
32-character context rather than an entire line or just a single
character.
Ideally, of course, in most real cases we'd use whitespace-delimited
words as chunks. That's mere optimization, though.
> A secondary (but important) problem is that character composition
> involves calls to Lisp, which is relatively slow. This precludes
> calling the shaper for too many characters at once, too many times for
> each redisplay cycle of a window.
I agree we shouldn't go through Lisp. My patch didn't.
Calling the shaper less often is an important optimization, too. For
whitespace-delimited words, we only need to call it once.
> > I think at the heart of it, it's about whether we treat fonts like
> > pieces of software, to be given a specific task and fixed if they fail
> > to perform it, or as bitmaps for simulating a TTY. Fonts are software:
> > they're written in a weird limited language, but essentially they're
> > programs to measure and display characters as glyphs.
>
> I don't think there's any disagreements on this high and abstract
> level.
I think there are: if we treat fonts as programs, we need to let them
do their job, which involves kerning, substitutions, ligatures, and
even crazy stuff like randomizing the glyph used for each character to
get a more hand-written appearance. We don't need to know about
ligatures, we just let the font do it. No Lisp callbacks, just a call
to harfbuzz.
> The problem is how to support that within the limits of the
> current design of the display engine.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
[not found] ` <831rnb0zld.fsf@gnu.org>
2020-05-23 12:36 ` Pip Cet
@ 2020-05-23 12:47 ` Stefan Monnier
2020-05-23 13:10 ` Ligatures Eli Zaretskii
` (2 more replies)
1 sibling, 3 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-23 12:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, Pip Cet, emacs-devel
> The crucial problem is that we currently perform layout decisions one
> grapheme cluster at a time, whereas what HarfBuzz people say is that
> we should basically do that one screen line at a time.
I wonder how it is supposed to work and it works in other applications:
Disregarding the theoretical question of whether a font can use
ligatures that involve the LF character (and hence affect the definition
of what is a line), I still see a chicken-and-egg problems:
How do you know where the current "screen line" ends if you don't know
how narrow/wide the font and its ligatures will render the text?
Do current applications use a heuristic like "ligatures won't reduce the
size by more than a factor 2, so estimate the lower bound on the final
size to be at most half of what the font metrics say", so they will send
up to twice as much text to be shaped as needed, and then they throw
away the left overs?
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 11:24 ` Vasilij Schneidermann
@ 2020-05-23 13:04 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 13:04 UTC (permalink / raw)
To: Vasilij Schneidermann; +Cc: cpitclaudel, alan, pipcet, emacs-devel
> Date: Sat, 23 May 2020 13:24:12 +0200
> From: Vasilij Schneidermann <mail@vasilij.de>
> Cc: emacs-devel@gnu.org, pipcet@gmail.com, cpitclaudel@gmail.com,
> alan@idiocy.org
>
> Out of curiosity, is this the same reason why font fallback is
> handled on a per-script basis for most cases and with carefully
> chosen ranges for emoji? I see a similar problem there, with
> updates being necessary for every Unicode release.
No, our font selection machinery is completely separate from text
shaping, and is also agnostic to character compositions. Basically,
we have a char-table (the one set-fontset-font manipulates) which
provides the various fonts to try for every given character, and some
very convoluted code (see fontset.c) that implements the logic of how
to try the fonts and which fonts to prefer for a character. IOW, the
font selection is basically per-character and not per-script.
The relation to emoji is that emoji _sequences_ need character
composition, and Emacs currently cannot compose characters that aren't
supported by the same font. This _is_ related to ligatures etc., as
it indeed touches on one of the basic premises of the display engine's
iteration through buffer text: we stop wherever the 'face' property of
characters changes (and the font is one attribute of the face), then
continue after loading and realizing the new face. This is why you
see strange artifacts when you press and hold Shift, and then move
with arrow keys across the Arabic line in etc/HELLO: the shaping of
adjacent characters breaks because we pass only part of the text to
the shaper. This is another bug that cannot be fixed cleanly while
keeping the current design of the display engine and its low-level
method of iteration through text and of producing glyphs.
> Given your previous explanation, a regex-based approach heuristic is the best
> we can hope for then. From what I understand the display engine uses a
> rectangular grid, not unlike what terminal emulators do.
It uses a rectangular array of glyphs, not a rectangular grid. The
difference is that glyphs can have variable metrics, which breaks the
grid concept. IOW, the glyph at coordinates (i, j) in the array and
the glyph at (i, j+1) are not necessarily one above the other on
display.
> Are there any tricks
> to steal from existing terminal emulators? For example there is an open pull
> request [1] for alacritty using Harfbuzz and FreeType for ligature support.
I cannot claim I understood well enough what this attempts to do, but
I don't think this is our problem in Emacs. It is not a problem of
layout per se -- Emacs is well equipped to deal with layout of glyphs
and grapheme clusters that have wildly different metrics (recall that
we are able to lay out images of more-or-less arbitrary dimensions on
the same line as simple text). The problem is that we make the layout
decisions as soon as we have the glyph metrics, on the fly, for each
"thing" we need to display. HarfBuzz people would like us to send
them the entire paragraph of text, then get it back as a series of
glyphs, then make the layout decisions based on that. This would need
entirely different algorithms, if not also different data structures;
for starters, we'd need to know how to find the paragraph(s) that will
end up on display without first trying to display them. And all our
redisplay shortcuts and optimizations implicitly also assume the
current basic iteration, one character at a time, which can be started
at any arbitrary buffer position.
> The greatest challenge I see with redesigning the display engine is supporting
> textual terminals.
Really? Why do you think this to be the greatest challenge? For any
model of the display we will come up, TTY frames will always be a
proper subset, no?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 12:47 ` Ligatures Stefan Monnier
@ 2020-05-23 13:10 ` Eli Zaretskii
2020-05-23 13:45 ` Ligatures Stefan Monnier
2020-05-23 13:36 ` Ligatures 조성빈
2020-05-23 14:37 ` Ligatures Pip Cet
2 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 13:10 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cpitclaudel, alan, pipcet, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Pip Cet <pipcet@gmail.com>, cpitclaudel@gmail.com, alan@idiocy.org,
> emacs-devel@gnu.org
> Date: Sat, 23 May 2020 08:47:57 -0400
>
> I wonder how it is supposed to work and it works in other applications:
I have no idea. If someone does, it would be good to hear the
details.
> Do current applications use a heuristic like "ligatures won't reduce the
> size by more than a factor 2, so estimate the lower bound on the final
> size to be at most half of what the font metrics say", so they will send
> up to twice as much text to be shaped as needed, and then they throw
> away the left overs?
As I wrote elsewhere, HarfBuzz developers actually prefer to see the
entire paragraph, not just screen line, because some shaping decisions
depend on that. Not sure what the other applications do about that.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 12:47 ` Ligatures Stefan Monnier
2020-05-23 13:10 ` Ligatures Eli Zaretskii
@ 2020-05-23 13:36 ` 조성빈
2020-05-23 14:15 ` Ligatures Stefan Monnier
2020-05-23 14:37 ` Ligatures Pip Cet
2 siblings, 1 reply; 145+ messages in thread
From: 조성빈 @ 2020-05-23 13:36 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, cpitclaudel, alan, Pip Cet, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> 작성:
>> The crucial problem is that we currently perform layout decisions one
>> grapheme cluster at a time, whereas what HarfBuzz people say is that
>> we should basically do that one screen line at a time.
>
> I wonder how it is supposed to work and it works in other applications:
I don’t know how much you know about text rendering, (I’m fairly confident
that a previous Emacs maintainer to know more about this than me) but for
people who are curious about this, I found the ’Text Rendering Hates
You’[0] article which was very helpful for understanding the problem.
[0]: https://gankra.github.io/blah/text-hates-you/
> Disregarding the theoretical question of whether a font can use
> ligatures that involve the LF character (and hence affect the definition
> of what is a line), I still see a chicken-and-egg problems:
> How do you know where the current "screen line" ends if you don't know
> how narrow/wide the font and its ligatures will render the text?
>
> Do current applications use a heuristic like "ligatures won't reduce the
> size by more than a factor 2, so estimate the lower bound on the final
> size to be at most half of what the font metrics say", so they will send
> up to twice as much text to be shaped as needed, and then they throw
> away the left overs?
According to the article I mentioned, it’s just passing the total text
repeatedly until it runs out of space.
> You have to assume that your text fits on a single line and shape it
> until you run out of space. At that point you can perform layout
> operations and figure out where to break the text and start the next
> line. Repeat until everything is shaped and laid out.
>
> Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 13:10 ` Ligatures Eli Zaretskii
@ 2020-05-23 13:45 ` Stefan Monnier
2020-05-23 14:12 ` Ligatures Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Stefan Monnier @ 2020-05-23 13:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, pipcet, emacs-devel
>> Do current applications use a heuristic like "ligatures won't reduce the
>> size by more than a factor 2, so estimate the lower bound on the final
>> size to be at most half of what the font metrics say", so they will send
>> up to twice as much text to be shaped as needed, and then they throw
>> away the left overs?
> As I wrote elsewhere, HarfBuzz developers actually prefer to see the
> entire paragraph, not just screen line, because some shaping decisions
> depend on that. Not sure what the other applications do about that.
But the entire "paragraph" could be 10MB of text?!
Sounds like making the "long lines problem" even worse than it already is.
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 12:36 ` Pip Cet
@ 2020-05-23 14:08 ` Eli Zaretskii
2020-05-23 15:13 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 14:08 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 12:36:56 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > > You write: "(b) is not really feasible without redesigning the entire
> > > Emacs display engine". I don't see how that's true at all. All we need
> > > is some limited look-ahead.
> >
> > We already have look-ahead: that's what the regexp part of the
> > composition rules are about. That is not the crucial problem.
>
> But it's the only problem I see!
Then maybe I don't understand what you mean by look-ahead. Is that
the decision how to choose those 32 characters of "context"? Then why
not use the current regexp-based approach, which is already much
smarter than just blindly taking a fixed amount of surrounding text?
> When you see an IT_CHARACTER, you get some context, hand it to
> HarfBuzz, slice up the relevant glyphs, and display them.
The problem is, of course, in the "some context" part. Your patch
used an arbitrary 32-character chunk of text around the character to
shape, which is of course not what the shaping engines want: they want
_all_ of the surrounding text, the entire paragraph.
Your patch also invokes the shaper twice, on the same 32 characters,
once in encode_char method and again in the text_extents method, which
is another waste. The code in composite.c caches the composed
characters to avoid that, but you bypass it.
This is okay for showing the concept, but we cannot use this in
production. There are too many arbitrary decisions and inefficient
expensive operations.
> It doesn't involve composite.c at all, and that's good, because for
> those tricky special cases composite.c does a better job than standard
> shaping, and we need to keep that feature. It just shouldn't be the
> regular route.
Of course, you never tell how to distinguish between the "tricky
special cases" for which we still need to use composite.c and friends,
and the other kind.
Moreover, the HarfBuzz guys clearly say that what we do now is wrong
for those "tricky" cases as well, so if we are going to fix that, why
fix it only for ligatures made out of ASCII characters?
> > The crucial problem is that we currently perform layout decisions one
> > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > we should basically do that one screen line at a time.
>
> I think we're going to have to compromise: that's why my patch used a
> 32-character context rather than an entire line or just a single
> character.
If we are going to compromise, then why not compromise on what we
already have, which is much less than 32 characters? Why should we
enormously complicate and slow down our code without actually solving
the problem? Did you ever see ligatures that are 32-character long?
> Ideally, of course, in most real cases we'd use whitespace-delimited
> words as chunks. That's mere optimization, though.
That'd be the wrong optimization, AFAIK. E.g., some scripts don't
have whitespace separated words at all, and still need shaping. And
what exactly is whitespace for this purpose? e.g., does it include
Unicode control characters such as ZWJ?
> > A secondary (but important) problem is that character composition
> > involves calls to Lisp, which is relatively slow. This precludes
> > calling the shaper for too many characters at once, too many times for
> > each redisplay cycle of a window.
>
> I agree we shouldn't go through Lisp. My patch didn't.
Your patch hard-codes arbitrary numbers without any way to control
that from Lisp. Such code will never fly in Emacs.
> Calling the shaper less often is an important optimization, too. For
> whitespace-delimited words, we only need to call it once.
This doesn't work when the produced sequence of glyphs doesn't fit on
the screen line. What the current layout code does in this case won't
work well when you need to break a long sequence of glyphs in the
middle and then continue on the next line from where you left off on
this one. The longer the sequence of glyphs you get from the shaper
in one go, the higher the probability of hitting this issue.
The bottom line of this is that I think you will find very quickly
that the basic assumptions of the current design -- that we produce
single glyphs or very short sequences of them for each call to the
shaper -- that these assumptions bite you on every step, because the
code which deals with layout implicitly assumes this.
In short, I really don't see how this could ever work, except in a
very limited set of simple use cases. E.g., what do you do with
bidirectional text? ignore it?
> > I don't think there's any disagreements on this high and abstract
> > level.
>
> I think there are: if we treat fonts as programs, we need to let them
> do their job, which involves kerning, substitutions, ligatures, and
> even crazy stuff like randomizing the glyph used for each character to
> get a more hand-written appearance. We don't need to know about
> ligatures, we just let the font do it. No Lisp callbacks, just a call
> to harfbuzz.
I think this is a simplistic view of how the display engine works, and
I don't see how it could work in production while supporting all the
use cases we already do. I could be wrong, though, so I'm looking
forward to see you present a series of patches that do support the
existing use cases and the ligatures as well, and don't cause any
slowdown in redisplay.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 13:45 ` Ligatures Stefan Monnier
@ 2020-05-23 14:12 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 14:12 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cpitclaudel, alan, pipcet, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: pipcet@gmail.com, cpitclaudel@gmail.com, alan@idiocy.org,
> emacs-devel@gnu.org
> Date: Sat, 23 May 2020 09:45:12 -0400
>
> > As I wrote elsewhere, HarfBuzz developers actually prefer to see the
> > entire paragraph, not just screen line, because some shaping decisions
> > depend on that. Not sure what the other applications do about that.
>
> But the entire "paragraph" could be 10MB of text?!
Yes. And?
> Sounds like making the "long lines problem" even worse than it already is.
Presumably, you use other algorithms and data structures to replace
the slow parts we have now. But yes, this is one of the problems that
would need to be solved by the new display engine.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 13:36 ` Ligatures 조성빈
@ 2020-05-23 14:15 ` Stefan Monnier
0 siblings, 0 replies; 145+ messages in thread
From: Stefan Monnier @ 2020-05-23 14:15 UTC (permalink / raw)
To: 조성빈
Cc: Eli Zaretskii, emacs-devel, cpitclaudel, Pip Cet, alan
> According to the article I mentioned, it’s just passing the total text
> repeatedly until it runs out of space.
But wouldn't that inherently imply an O(N²) complexity?
Stefan
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
[not found] ` <83mu5z171j.fsf@gnu.org>
@ 2020-05-23 14:34 ` Clément Pit-Claudel
2020-05-23 16:18 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-23 14:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 23/05/2020 02.47, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Fri, 22 May 2020 16:02:22 -0400
>>
>> What I don't understand is what it is about Emacs that means that we need special lists of regexps for each new font, while other editors don't need them.
>
> Emacs doesn't need a special list for each font. I already said that
> several times. Please look at some examples of composition rules we
> already have, for example the Arabic rules at the very end of
> misc-lang.el. Do you see any fonts mentioned there? These rules work
> with any font that supports Arabic.
The only thing I'm talking about is symbol compositions in programming fonts, and for these, we *will* need a custom list for each font, right?
>> Each font offers a different set of symbol ligatures: there is no common superset that covers all fonts, except the ".+" regexp that Pip posted earlier.
>
> I'm not yet sure this is indeed so. I didn't see any reference which
> implies that any combination of 26 ASCII letters could become a
> ligature.
I think that's where I'm confused. I'm talking of ligatures like -> and =>, which do not involve the 26 ASCII letters.
> This is a discussion that didn't yet happen. It is quite possible
> that in practice the list of ligatures we want to support is not very
> long. E.g., the list in
> https://github.com/tonsky/FiraCode/wiki/Emacs-instructions is not
> long, and I doubt manu additions to it will ever make sense for us.
As I said, this list is incomplete and broken.
> And finally, if a given font doesn't support some ligature, the
> original characters will be displayed "normally", so nothing is lost,
> and there's no need to tune the list of ligatures to each and every
> font. I said that as well several times already.
As long as you can produce a superset of all ligatures, yes. My claim is that this superset is ".+".
Otherwise, how do you handle the fact that Fira Code handles arrows of arbitrary lengths? Or is that different from ligatures?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures
2020-05-23 12:47 ` Ligatures Stefan Monnier
2020-05-23 13:10 ` Ligatures Eli Zaretskii
2020-05-23 13:36 ` Ligatures 조성빈
@ 2020-05-23 14:37 ` Pip Cet
2 siblings, 0 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-23 14:37 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel, cpitclaudel, alan
On Sat, May 23, 2020 at 12:48 PM Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> > The crucial problem is that we currently perform layout decisions one
> > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > we should basically do that one screen line at a time.
>
> I wonder how it is supposed to work and it works in other applications:
That's why I'd like us to use a more advanced internal API rather than
the limited HarfBuzz API, one that asynchronously requests information
about preceding/following codepoints, incrementally informing us of
the minimum width already reached, until it can reach a decision. It
should be easy enough to put in some heuristics that work in practice
until a better shaper comes along...
> Do current applications use a heuristic like "ligatures won't reduce the
> size by more than a factor 2, so estimate the lower bound on the final
> size to be at most half of what the font metrics say", so they will send
> up to twice as much text to be shaped as needed, and then they throw
> away the left overs?
I don't know.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 14:08 ` Eli Zaretskii
@ 2020-05-23 15:13 ` Pip Cet
2020-05-23 16:34 ` Eli Zaretskii
2020-05-23 17:32 ` Eli Zaretskii
0 siblings, 2 replies; 145+ messages in thread
From: Pip Cet @ 2020-05-23 15:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Sat, May 23, 2020 at 2:08 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 12:36:56 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > > You write: "(b) is not really feasible without redesigning the entire
> > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > is some limited look-ahead.
> > >
> > > We already have look-ahead: that's what the regexp part of the
> > > composition rules are about. That is not the crucial problem.
> >
> > But it's the only problem I see!
>
> Then maybe I don't understand what you mean by look-ahead. Is that
> the decision how to choose those 32 characters of "context"?
Yes.
> Then why
> not use the current regexp-based approach, which is already much
> smarter than just blindly taking a fixed amount of surrounding text?
Because I do not know the regexp to use?
> > When you see an IT_CHARACTER, you get some context, hand it to
> > HarfBuzz, slice up the relevant glyphs, and display them.
>
> The problem is, of course, in the "some context" part. Your patch
> used an arbitrary 32-character chunk of text around the character to
> shape, which is of course not what the shaping engines want: they want
> _all_ of the surrounding text, the entire paragraph.
Which is clearly too expensive to actually give them, which is
something I didn't think it was necessary to even spell out.
> Your patch also invokes the shaper twice, on the same 32 characters,
> once in encode_char method and again in the text_extents method, which
> is another waste. The code in composite.c caches the composed
> characters to avoid that, but you bypass it.
Absolutely.
> This is okay for showing the concept, but we cannot use this in
> production. There are too many arbitrary decisions and inefficient
> expensive operations.
I agree, of course! In fact, the 32-character limit was chosen as a
reminder to myself that things would inherently be inefficient.
> > It doesn't involve composite.c at all, and that's good, because for
> > those tricky special cases composite.c does a better job than standard
> > shaping, and we need to keep that feature. It just shouldn't be the
> > regular route.
>
> Of course, you never tell how to distinguish between the "tricky
> special cases" for which we still need to use composite.c and friends,
> and the other kind.
The tricky special cases get handled as before, and come in with the
iterator .what set to IT_COMPOSITE. The standard cases come in with
.what set to IT_CHARACTER.
> Moreover, the HarfBuzz guys clearly say that what we do now is wrong
> for those "tricky" cases as well, so if we are going to fix that, why
> fix it only for ligatures made out of ASCII characters?
There's no such limitation, but, yes, ideally people would find they
don't need automatic compositions anymore...
> > > The crucial problem is that we currently perform layout decisions one
> > > grapheme cluster at a time, whereas what HarfBuzz people say is that
> > > we should basically do that one screen line at a time.
> >
> > I think we're going to have to compromise: that's why my patch used a
> > 32-character context rather than an entire line or just a single
> > character.
>
> If we are going to compromise, then why not compromise on what we
> already have, which is much less than 32 characters?
0 characters?
> Why should we
> enormously complicate and slow down our code without actually solving
> the problem?
We shouldn't.
> Did you ever see ligatures that are 32-character long?
"Zapfino" is the longest I've seen.
> > Ideally, of course, in most real cases we'd use whitespace-delimited
> > words as chunks. That's mere optimization, though.
>
> That'd be the wrong optimization, AFAIK.
Sure, but since it is exclusively an optimization, it's performance
considerations alone that will decide whether it is.
> E.g., some scripts don't
> have whitespace separated words at all, and still need shaping.
Thus "most".
> And
> what exactly is whitespace for this purpose? e.g., does it include
> Unicode control characters such as ZWJ?
Thankfully, that doesn't matter much: it's just a question of what we
optimize for, not one of what the results will look like.
So I'd say " ", "\t", and "\n" are enough, which is what the display
engine already handles specially.
> > > A secondary (but important) problem is that character composition
> > > involves calls to Lisp, which is relatively slow. This precludes
> > > calling the shaper for too many characters at once, too many times for
> > > each redisplay cycle of a window.
> >
> > I agree we shouldn't go through Lisp. My patch didn't.
>
> Your patch hard-codes arbitrary numbers without any way to control
> that from Lisp.
Yes.
> Such code will never fly in Emacs.
Of course not.
> > Calling the shaper less often is an important optimization, too. For
> > whitespace-delimited words, we only need to call it once.
>
> This doesn't work when the produced sequence of glyphs doesn't fit on
> the screen line.
> What the current layout code does in this case won't
> work well when you need to break a long sequence of glyphs in the
> middle and then continue on the next line from where you left off on
> this one.
You mean in visual-mode? Because what the current layout code does by
default is to break along any glyph boundary, and I don't see how
that's broken in any way.
> The longer the sequence of glyphs you get from the shaper
> in one go, the higher the probability of hitting this issue.
You break between the glyphs. It doesn't depend on whether you have
two or 20 or 100.
> The bottom line of this is that I think you will find very quickly
> that the basic assumptions of the current design -- that we produce
> single glyphs or very short sequences of them for each call to the
> shaper -- that these assumptions bite you on every step, because the
> code which deals with layout implicitly assumes this.
The shaper interface I described would actually return a single glyph
for each top-level call, with a number of callbacks to provide
context. So that assumption would hold up very well indeed...
> In short, I really don't see how this could ever work, except in a
> very limited set of simple use cases. E.g., what do you do with
> bidirectional text? ignore it?
A bidi boundary is a hard boundary for HarfBuzz, and no shaping
happens across it. Is that what you mean by "ignore it"?
> > > I don't think there's any disagreements on this high and abstract
> > > level.
> >
> > I think there are: if we treat fonts as programs, we need to let them
> > do their job, which involves kerning, substitutions, ligatures, and
> > even crazy stuff like randomizing the glyph used for each character to
> > get a more hand-written appearance. We don't need to know about
> > ligatures, we just let the font do it. No Lisp callbacks, just a call
> > to harfbuzz.
>
> I think this is a simplistic view of how the display engine works,
Quite possibly :-)
> and
> I don't see how it could work in production while supporting all the
> use cases we already do.
It only comes in for use cases not handled otherwise, i.e. those where
the iterator is at an IT_CHARACTER. All other use cases are
unaffected, because they mean we're overriding the font decision
anyway.
As I said, the problem I have is to get look-ahead working, which you
think isn't a problem. I've got an idea for it, but it doesn't work
(yet); my theory is the bidi.c code fails to keep its state in the
iterator and can't deal with multiple parallel iterators.
> I could be wrong, though, so I'm looking
> forward to see you present a series of patches that do support the
> existing use cases and the ligatures as well, and don't cause any
> slowdown in redisplay.
As I said, what's stopping me is the look-ahead problem, and in
particular some code in bidi.c that doesn't play along well with
look-ahead.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 14:34 ` Clément Pit-Claudel
@ 2020-05-23 16:18 ` Eli Zaretskii
2020-05-23 16:37 ` Clément Pit-Claudel
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 16:18 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: alan, pipcet, emacs-devel
> Cc: pipcet@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Sat, 23 May 2020 10:34:23 -0400
>
> > Emacs doesn't need a special list for each font. I already said that
> > several times. Please look at some examples of composition rules we
> > already have, for example the Arabic rules at the very end of
> > misc-lang.el. Do you see any fonts mentioned there? These rules work
> > with any font that supports Arabic.
>
> The only thing I'm talking about is symbol compositions in programming fonts, and for these, we *will* need a custom list for each font, right?
No, we won't need custom lists. Not if we will use the same character
composition machinery as we use now for Arabic and other scripts that
require it.
> > And finally, if a given font doesn't support some ligature, the
> > original characters will be displayed "normally", so nothing is lost,
> > and there's no need to tune the list of ligatures to each and every
> > font. I said that as well several times already.
>
> As long as you can produce a superset of all ligatures, yes. My claim is that this superset is ".+".
It cannot be literally ".+", if you are talking about symbols, because
(a) not every character starts a symbol, and (b) symbols cannot be of
arbitrary length.
> Otherwise, how do you handle the fact that Fira Code handles arrows
> of arbitrary lengths?
We won't handle arrows of arbitrary length, no. Not as long as we
keep the current design of the display engine.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 15:13 ` Pip Cet
@ 2020-05-23 16:34 ` Eli Zaretskii
2020-05-23 22:38 ` Pip Cet
2020-05-23 17:32 ` Eli Zaretskii
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 16:34 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 15:13:38 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > > Calling the shaper less often is an important optimization, too. For
> > > whitespace-delimited words, we only need to call it once.
> >
> > This doesn't work when the produced sequence of glyphs doesn't fit on
> > the screen line.
>
> > What the current layout code does in this case won't
> > work well when you need to break a long sequence of glyphs in the
> > middle and then continue on the next line from where you left off on
> > this one.
>
> You mean in visual-mode?
Not just in visual-line-mode, but also for the default line
continuation.
> Because what the current layout code does by default is to break
> along any glyph boundary, and I don't see how that's broken in any
> way.
The code assumes that breaking on some glyph leaves the buffer
iterator ('struct it') in a state that we can simply continue to the
next buffer position. But if you already picked up several characters
via look-ahead, that is not true, and you will have to return back
several character positions, in order to continue on the next screen
line. The whole convoluted logic of display_line (and a similar one
in move_it_in_display_line_to) is based on the assumption that this
line-wrap decisions are made as soon as a single glyph is produced;
that code will need to be rewritten if this assumption breaks. And
since the code is already hairy, to say the least, I cannot even
imagine what it will look like after such rewriting.
This is just a small example of how deep are the current design
assumptions entrenched in the code. I don't see how this can be
resolved to yield code that is readable and maintainable without
changing the design. Again, maybe I'm missing something.
> > In short, I really don't see how this could ever work, except in a
> > very limited set of simple use cases. E.g., what do you do with
> > bidirectional text? ignore it?
>
> A bidi boundary is a hard boundary for HarfBuzz, and no shaping
> happens across it. Is that what you mean by "ignore it"?
I don't mean the boundary, I meant the fact that clusters need to be
reordered.
> > I don't see how it could work in production while supporting all the
> > use cases we already do.
>
> It only comes in for use cases not handled otherwise, i.e. those where
> the iterator is at an IT_CHARACTER. All other use cases are
> unaffected, because they mean we're overriding the font decision
> anyway.
I see no reason to add such patches just to handle some simple enough
use cases. If we want the shaper to handle all the text we display,
we should go all the way and do it for any text, ASCII, non-ASCII,
symbols, emoji, everything. The current codebase is already very
difficult to understand and modify; you seem to suggest to make it
even more so, and on top of that solve only a small part of the
underlying problem. That makes very little sense to me.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 16:18 ` Eli Zaretskii
@ 2020-05-23 16:37 ` Clément Pit-Claudel
0 siblings, 0 replies; 145+ messages in thread
From: Clément Pit-Claudel @ 2020-05-23 16:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: alan, pipcet, emacs-devel
On 23/05/2020 12.18, Eli Zaretskii wrote:
> We won't handle arrows of arbitrary length, no. Not as long as we
> keep the current design of the display engine.
Ah, OK, then I understand.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 15:13 ` Pip Cet
2020-05-23 16:34 ` Eli Zaretskii
@ 2020-05-23 17:32 ` Eli Zaretskii
2020-05-23 21:29 ` Pip Cet
1 sibling, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-23 17:32 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 15:13:38 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> As I said, the problem I have is to get look-ahead working, which you
> think isn't a problem. I've got an idea for it, but it doesn't work
> (yet); my theory is the bidi.c code fails to keep its state in the
> iterator and can't deal with multiple parallel iterators.
>
> > I could be wrong, though, so I'm looking
> > forward to see you present a series of patches that do support the
> > existing use cases and the ligatures as well, and don't cause any
> > slowdown in redisplay.
>
> As I said, what's stopping me is the look-ahead problem, and in
> particular some code in bidi.c that doesn't play along well with
> look-ahead.
I don't think you understand the depth of the issue. If we are going
to send large chunks of text to the shaping engine, then none of the
insane complexity of bidi.c makes sense; we should simply throw all of
it away and use a very different, batch-style reordering code, of the
kind you can find in the freebidi library. The sole reason for
bidi.c's existence is to produce character positions in the _visual_
order, one position at a time, something that no other bidi-aware
editor does.
Moreover, not even the basic iteration, one level above bidi.c, where
we call get_next_display_element, then PRODUCE_GLYPHS, then
set_iterator_to_next -- not even that makes sense. This basic loop
exists only because we examine characters one by one, switching from
buffer text to overlay or display strings, then back, as needed, and
applying faces as we go. Doing this in large chunks calls for a very
different structure of the code, and very different separation into
layers.
This needs to be carefully designed in advance in a clean and
well-defined way, not lumped one patch upon another until it kinda
works...
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 17:32 ` Eli Zaretskii
@ 2020-05-23 21:29 ` Pip Cet
2020-05-24 15:19 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-23 21:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Sat, May 23, 2020 at 5:32 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 15:13:38 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > As I said, the problem I have is to get look-ahead working, which you
> > think isn't a problem. I've got an idea for it, but it doesn't work
> > (yet); my theory is the bidi.c code fails to keep its state in the
> > iterator and can't deal with multiple parallel iterators.
> >
> > > I could be wrong, though, so I'm looking
> > > forward to see you present a series of patches that do support the
> > > existing use cases and the ligatures as well, and don't cause any
> > > slowdown in redisplay.
> >
> > As I said, what's stopping me is the look-ahead problem, and in
> > particular some code in bidi.c that doesn't play along well with
> > look-ahead.
>
> I don't think you understand the depth of the issue.
I think I do, actually. It's just that you'd prefer the display engine
to be torn out by the roots and rewritten in one fell swoop, but that
option isn't currently on the table.
> If we are going
> to send large chunks of text to the shaping engine, then none of the
> insane complexity of bidi.c makes sense; we should simply throw all of
> it away and use a very different, batch-style reordering code, of the
> kind you can find in the freebidi library. The sole reason for
> bidi.c's existence is to produce character positions in the _visual_
> order, one position at a time, something that no other bidi-aware
> editor does.
Yes, except we're not talking about "large chunks of text" here:
somehow you went from "we need only a bunch of regexps to catch
ligatures" to "we need to send entire paragraphs to the shaping
engine, nothing less will do". My opinion is that we need a reasonable
amount of context, often just a single character, and I see no reason
to throw out the entire display engine because we want some
look-ahead.
> Moreover, not even the basic iteration, one level above bidi.c, where
> we call get_next_display_element, then PRODUCE_GLYPHS, then
> set_iterator_to_next -- not even that makes sense.
Again, a single character of lookahead in the typical case, four
characters for most ligatures; that doesn't push us over the line to
"only a complete rewrite makes sense".
> This basic loop
> exists only because we examine characters one by one, switching from
> buffer text to overlay or display strings, then back, as needed, and
> applying faces as we go. Doing this in large chunks calls for a very
> different structure of the code, and very different separation into
> layers.
Indeed. Which is why I'm not talking about doing it in large chunks,
at this point. Let's keep doing it character by character but add what
little we need to in order to look ahead a little.
> This needs to be carefully designed in advance in a clean and
> well-defined way, not lumped one patch upon another until it kinda
> works...
I agree "just start hacking on it with no understanding of the code
until things appear to start working" is a bad strategy. So is "first,
redesign the universe". To me, it seems like what I want is a
reasonable compromise: not large chunks of text, because we can't do
that, but some context, enough to do kerning and deal with ligatures.
Remember that this discussion started when I mentioned that I was
unhappy with HarfBuzz, and I still am, precisely because of its
"first, send me your entire document" approach. I don't think it's the
right approach to take this design flaw of HarfBuzz for granted and
conclude that we need to rewrite the Emacs display engine before we
can get English ligatures to display properly.
If, that is, we can get look-ahead to work.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 16:34 ` Eli Zaretskii
@ 2020-05-23 22:38 ` Pip Cet
2020-05-24 15:33 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-23 22:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Sat, May 23, 2020 at 4:34 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 15:13:38 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > Because what the current layout code does by default is to break
> > along any glyph boundary, and I don't see how that's broken in any
> > way.
>
> The code assumes that breaking on some glyph leaves the buffer
> iterator ('struct it') in a state that we can simply continue to the
> next buffer position.
Yes. I see no reason to change that.
> But if you already picked up several characters
> via look-ahead, that is not true, and you will have to return back
> several character positions, in order to continue on the next screen
> line.
You're describing why look-ahead is difficult: a while ago, you
appeared to be saying it wasn't. This confuses me.
Obviously, when I say "look-ahead", I mean receiving the next display
elements an iterator would produce if it were actually advanced,
without advancing it.
An easy, but potentially slow, way of doing that is to copy the
iterator to a new one, advance that, retrieve the display elements,
then throw away the copied iterator and return.
> The whole convoluted logic of display_line (and a similar one
> in move_it_in_display_line_to) is based on the assumption that this
> line-wrap decisions are made as soon as a single glyph is produced;
> that code will need to be rewritten if this assumption breaks.
I see no reason to break that assumption.
> And
> since the code is already hairy, to say the least, I cannot even
> imagine what it will look like after such rewriting.
Good thing I'm not planning to do that, then.
> This is just a small example of how deep are the current design
> assumptions entrenched in the code.
One I don't understand, because those fundamental design assumptions
aren't something I'm willing to break at this point.
> > > In short, I really don't see how this could ever work, except in a
> > > very limited set of simple use cases. E.g., what do you do with
> > > bidirectional text? ignore it?
> >
> > A bidi boundary is a hard boundary for HarfBuzz, and no shaping
> > happens across it. Is that what you mean by "ignore it"?
>
> I don't mean the boundary, I meant the fact that clusters need to be
> reordered.
I see no fundamental problem there, certainly not of the "I don't see
how this could ever work" variety.
> > > I don't see how it could work in production while supporting all the
> > > use cases we already do.
> >
> > It only comes in for use cases not handled otherwise, i.e. those where
> > the iterator is at an IT_CHARACTER. All other use cases are
> > unaffected, because they mean we're overriding the font decision
> > anyway.
>
> I see no reason to add such patches just to handle some simple enough
> use cases.
If it's so simple to get ligatures and kerning right, please tell me
how to do it.
> If we want the shaper to handle all the text we display,
Do we? A while back you said Lisp control over compositions was an
important feature, and I'm inclined to think we shouldn't break the
existing composition code.
> we should go all the way and do it for any text, ASCII, non-ASCII,
> symbols, emoji, everything.
Are you suggesting I'm somehow limiting myself to ASCII? Let me assure
you that's not the case.
> The current codebase is already very
> difficult to understand and modify;
I agree with that.
> you seem to suggest to make it
> even more so,
Well, yes, it's not going to be a free feature. The changes are
comparatively tiny compared to what else has been done to xdisp.c.
> and on top of that solve only a small part of the
> underlying problem.
Ligatures and kerning (right now, for LTR text). Is that a small
problem because of the lack of RTL support?
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 21:29 ` Pip Cet
@ 2020-05-24 15:19 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-24 15:19 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 21:29:32 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > If we are going
> > to send large chunks of text to the shaping engine, then none of the
> > insane complexity of bidi.c makes sense; we should simply throw all of
> > it away and use a very different, batch-style reordering code, of the
> > kind you can find in the freebidi library. The sole reason for
> > bidi.c's existence is to produce character positions in the _visual_
> > order, one position at a time, something that no other bidi-aware
> > editor does.
>
> Yes, except we're not talking about "large chunks of text" here:
> somehow you went from "we need only a bunch of regexps to catch
> ligatures" to "we need to send entire paragraphs to the shaping
> engine, nothing less will do".
The former is what we do now. If you want to treat fonts as software,
then the HarfBuzz guys tell us we need to pass all the text through
the shaper.
> My opinion is that we need a reasonable amount of context, often
> just a single character, and I see no reason to throw out the entire
> display engine because we want some look-ahead.
The problem is to determine how much of surrounding text is needed.
The answer I was given was "all of it". So if we want to do it right,
that is what we should do. What you propose stops short of that goal,
so it's yet another partial solution. Doing that to avoid the need of
specifying a fixed set of ligatures in advance sounds like a lot of
pain for minimal gain to me.
> Remember that this discussion started when I mentioned that I was
> unhappy with HarfBuzz, and I still am, precisely because of its
> "first, send me your entire document" approach.
I'm familiar with 3 shaping engines, and they all behave like that.
So this is not an idiosyncrasy of HarfBuzz, it's how text-shaping
works in general.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-23 22:38 ` Pip Cet
@ 2020-05-24 15:33 ` Eli Zaretskii
2020-05-26 18:13 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-24 15:33 UTC (permalink / raw)
To: Pip Cet; +Cc: cpitclaudel, alan, emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 22:38:18 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> On Sat, May 23, 2020 at 4:34 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > From: Pip Cet <pipcet@gmail.com>
> > > Date: Sat, 23 May 2020 15:13:38 +0000
> > > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > Because what the current layout code does by default is to break
> > > along any glyph boundary, and I don't see how that's broken in any
> > > way.
> >
> > The code assumes that breaking on some glyph leaves the buffer
> > iterator ('struct it') in a state that we can simply continue to the
> > next buffer position.
>
> Yes. I see no reason to change that.
>
> > But if you already picked up several characters
> > via look-ahead, that is not true, and you will have to return back
> > several character positions, in order to continue on the next screen
> > line.
>
> You're describing why look-ahead is difficult: a while ago, you
> appeared to be saying it wasn't. This confuses me.
>
> Obviously, when I say "look-ahead", I mean receiving the next display
> elements an iterator would produce if it were actually advanced,
> without advancing it.
That's not what you said earlier:
> > > > > You write: "(b) is not really feasible without redesigning the entire
> > > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > > is some limited look-ahead.
> > > >
> > > > We already have look-ahead: that's what the regexp part of the
> > > > composition rules are about. That is not the crucial problem.
> > >
> > > But it's the only problem I see!
> >
> > Then maybe I don't understand what you mean by look-ahead. Is that
> > the decision how to choose those 32 characters of "context"?
>
> Yes.
Here you said that look-ahead means how to _choose_ the context. Now
you are saying something very different: that look-ahead is how to
advance the iterator without advancing it. It's a small wonder we are
going in circles when the same term is used for two very different
things.
> > If we want the shaper to handle all the text we display,
>
> Do we? A while back you said Lisp control over compositions was an
> important feature, and I'm inclined to think we shouldn't break the
> existing composition code.
>
> > we should go all the way and do it for any text, ASCII, non-ASCII,
> > symbols, emoji, everything.
>
> Are you suggesting I'm somehow limiting myself to ASCII? Let me assure
> you that's not the case.
Then I really don't understand what problem are you trying to solve.
Let's try again from the beginning: which parts of the code that
implements automatic compositions are you trying to avoid, and why?
Is that the part that identifies the "context" via regular
expressions? If so, then this problem needs to be solved by some
alternative; using an arbitrary chosen fixed number of characters is
not suitable for production. You haven't yet shown any viable
alternative.
Assuming that the alternative for selecting the "context" is found,
and composite.c is augmented to apply it instead of the regexps, why
not use the rest of the automatic composition code to produce the
glyphs and display them? The code which does that exists and works,
and is tested by years of use. It already solves the problems of
look-ahead, of wrapping long lines, and others, including (but not
limited to) the dreaded bidi thing. Why reinvent that wheel when we
already have it, and it works well?
> > and on top of that solve only a small part of the
> > underlying problem.
>
> Ligatures and kerning (right now, for LTR text). Is that a small
> problem because of the lack of RTL support?
Yes, of course. An acceptable solution should support any text Emacs
supports. What's more, we already have the code which implements all
that, so I don't understand why you want to bypass it. Please
explain.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-24 15:33 ` Eli Zaretskii
@ 2020-05-26 18:13 ` Pip Cet
2020-05-26 19:46 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-26 18:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, alan, emacs-devel
On Sun, May 24, 2020 at 3:33 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 22:38:18 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > On Sat, May 23, 2020 at 4:34 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > > From: Pip Cet <pipcet@gmail.com>
> > > > Date: Sat, 23 May 2020 15:13:38 +0000
> > > > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> > > > Because what the current layout code does by default is to break
> > > > along any glyph boundary, and I don't see how that's broken in any
> > > > way.
> > >
> > > The code assumes that breaking on some glyph leaves the buffer
> > > iterator ('struct it') in a state that we can simply continue to the
> > > next buffer position.
> >
> > Yes. I see no reason to change that.
> >
> > > But if you already picked up several characters
> > > via look-ahead, that is not true, and you will have to return back
> > > several character positions, in order to continue on the next screen
> > > line.
> >
> > You're describing why look-ahead is difficult: a while ago, you
> > appeared to be saying it wasn't. This confuses me.
> >
> > Obviously, when I say "look-ahead", I mean receiving the next display
> > elements an iterator would produce if it were actually advanced,
> > without advancing it.
> That's not what you said earlier:
I think it is what I said.
> > > > > > You write: "(b) is not really feasible without redesigning the entire
> > > > > > Emacs display engine". I don't see how that's true at all. All we need
> > > > > > is some limited look-ahead.
> > > > >
> > > > > We already have look-ahead: that's what the regexp part of the
> > > > > composition rules are about. That is not the crucial problem.
> > > >
> > > > But it's the only problem I see!
> > >
> > > Then maybe I don't understand what you mean by look-ahead. Is that
> > > the decision how to choose those 32 characters of "context"?
> >
> > Yes.
>
> Here you said that look-ahead means how to _choose_ the context.
The distinction escapes me: look-ahead is how to get the context for a
character, obviously without ruining any persistent state. I'm puzzled
as to what else it could have meant.
> > > If we want the shaper to handle all the text we display,
> >
> > Do we? A while back you said Lisp control over compositions was an
> > important feature, and I'm inclined to think we shouldn't break the
> > existing composition code.
> >
> > > we should go all the way and do it for any text, ASCII, non-ASCII,
> > > symbols, emoji, everything.
> >
> > Are you suggesting I'm somehow limiting myself to ASCII? Let me assure
> > you that's not the case.
>
> Then I really don't understand what problem are you trying to solve.
Ligatures and kerning.
> Let's try again from the beginning: which parts of the code that
> implements automatic compositions are you trying to avoid,
> and why?
I'm not trying to avoid any of it! I just see no reason to use any of
it, so far, because the part we have in common is about a dozen lines
of code around the call to hb_shape.
> Is that the part that identifies the "context" via regular
> expressions? If so, then this problem needs to be solved by some
> alternative; using an arbitrary chosen fixed number of characters is
> not suitable for production.
I'm puzzled as to how these regular expressions, which only work when
they match fixed-length strings, as far as I can tell, are worse than
a fixed-length context. You're right that the number shouldn't be
hardcoded in Emacs, and shouldn't be arbitrary, but obviously there
has to be a limit shorter than a word or paragraph. (The composite.c
code currently hardcodes a limit of 500 characters).
(And as I've said repeatedly, this is a deficiency specifically in
HarfBuzz: the OpenType format makes it very easy to tell what the
longest pattern is and how much context is needed. HarfBuzz should
pass on that information, ideally by providing an incremental
asynchronous API that requests only as much context as is needed until
the glyphs in question can be returned.)
> You haven't yet shown any viable alternative.
To what? We still haven't seen any actual regular expressions that
work. You just keep saying "regular expressions" like that's a
solution, rather than simply constituting a restriction on the set of
possible solutions.
And keep in mind that this context is used only for deciding what the
"current" glyph looks like: the next glyph will have its own context,
which might or might not be different.
What I'm currently playing with is something that I'm not sure is even
expressible as a regexp: starting with the character at point, keep
adding surrounding characters unless doing so would create a
delimiter-nondelimiter boundary after the first char, or a
nondelimiter-delimiter boundary before the last char, but limit the
whole thing to 16 characters each way.
As I've explained, it would be much better to let HarfBuzz tell us
whether to provide more context, but even then we'd need a cut-off:
imagine a file containing a gigabyte of 'f's.
> Assuming that the alternative for selecting the "context" is found,
> and composite.c is augmented to apply it instead of the regexps, why
> not use the rest of the automatic composition code to produce the
> glyphs and display them?
I chose not to do that for a patch which I have stated repeatedly was
not in any way a finalized design, and I don't see any good reason to
do it for a real patch, either, so far.
(I'll be honest: I strongly suspect that the code is too slow, we know
it to be buggy, and it's simply too different from what I actually
want to benefit from sharing the code).
> The code which does that exists and works,
(I suspect: slowly)
> and is tested by years of use.
It's unusable for me in Emacs 26.3.
> It already solves the problems of look-ahead,
If it does so efficiently, I'll certainly try reusing that code. But I
strongly suspect it doesn't.
> of wrapping long lines,
Very poorly, for my purposes.
> and others, including (but not limited to) the dreaded bidi thing.
Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
> Why reinvent that wheel when we already have it, and it works well?
First, because it doesn't work that well for my purposes; second,
precisely because it works well for the purposes of others, and I'd
like to have as little impact as possible on existing use cases. They
should just continue working, and so far they do.
> > > and on top of that solve only a small part of the
> > > underlying problem.
> >
> > Ligatures and kerning (right now, for LTR text). Is that a small
> > problem because of the lack of RTL support?
>
> Yes, of course.
Why? I honestly don't see what's bad about a patch that improves
things for most languages and doesn't affect RTL languages (which, as
you point out, have existing support).
The code shouldn't break horribly for RTL text (it doesn't). If it
works, that's great; if it doesn't work and leaves things unshaped,
that's the existing behavior, and auto-composition-mode will still
work if enabled.
> An acceptable solution should support any text Emacs
> supports.
By that standard, bidi.c and composite.c are unacceptable.
> What's more, we already have the code which implements all
> that, so I don't understand why you want to bypass it.
We have something that superficially results in a similar screen
layout to what I want, but that actually represents display elements
in a way that makes them unusable for my purposes.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-26 18:13 ` Pip Cet
@ 2020-05-26 19:46 ` Eli Zaretskii
2020-05-27 9:36 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-26 19:46 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Tue, 26 May 2020 18:13:55 +0000
> Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
>
> > Assuming that the alternative for selecting the "context" is found,
> > and composite.c is augmented to apply it instead of the regexps, why
> > not use the rest of the automatic composition code to produce the
> > glyphs and display them?
>
> I chose not to do that for a patch which I have stated repeatedly was
> not in any way a finalized design, and I don't see any good reason to
> do it for a real patch, either, so far.
Why not? How about trying to do that before giving up?
> (I'll be honest: I strongly suspect that the code is too slow, we know
> it to be buggy, and it's simply too different from what I actually
> want to benefit from sharing the code).
>
> > The code which does that exists and works,
>
> (I suspect: slowly)
Any measurements to back that up? E.g., is scrolling through
etc/HELLO especially slow, once all the fonts were loaded (i.e. each
character in the file was displayed at least once)?
> > and is tested by years of use.
>
> It's unusable for me in Emacs 26.3.
How so? what doesn't work? (And why are you using Emacs 26 and not
Emacs 27, where we support HarfBuzz and made several improvements and
bugfixes in the character composition area?)
> > It already solves the problems of look-ahead,
>
> If it does so efficiently, I'll certainly try reusing that code. But I
> strongly suspect it doesn't.
Why suspect? why not try and see what does and doesn't work, what is
and isn't efficient?
> > of wrapping long lines,
>
> Very poorly, for my purposes.
How so? what doesn't wrap correctly, and why?
> > and others, including (but not limited to) the dreaded bidi thing.
>
> Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
That's because you look in the wrong place. Once again, try looking
at etc/HELLO, there are portions of it that need both bidi and
compositions. I can explain how it works (the code is spread over
several files), but please believe me that it does, it passed the
HarfBuzz developers' eyes most of whom are native Arabic and Farsi
speakers, and wouldn't allow us to display Arabic script incorrectly.
The whole point of using the existing code is that you don't _need_ to
understand how exactly we handle the bidi reordering when character
compositions are required. It just works, for all you care. It did
take several iterations to get right at the time; why would you want
to repeat all that, when the code is there to use and extend?
> > Why reinvent that wheel when we already have it, and it works well?
>
> First, because it doesn't work that well for my purposes;
What doesn't work? please be specific.
> second, precisely because it works well for the purposes of others,
> and I'd like to have as little impact as possible on existing use
> cases. They should just continue working, and so far they do.
You are thinking of breaking those other cases by your changes? But
we haven't yet established that changes are needed, let alone which
changes. How do you know you will break anything at all?
> > > Ligatures and kerning (right now, for LTR text). Is that a small
> > > problem because of the lack of RTL support?
> >
> > Yes, of course.
>
> Why?
Because the features you are talking about should "just work" in
Emacs. Not only for some use cases and some scripts -- that is not
how we develop features. Features that work only for some cases are
broken and will draw bug reports. They make Emacs look unclean and
unprofessional.
And there's no need to add such half-broken features because code that
supports much broader class of use cases already exists, you just need
to use it and maybe extend and augment it a bit.
> The code shouldn't break horribly for RTL text (it doesn't).
It _will_ break for RTL text, you just didn't yet see it because you
only tested it in simple use cases. UAX#9 defines a lot of optional
features, including multi-level directional overrides and embeddings,
it isn't just right-to-left vs left-to-right.
Again, there's no need for you to reinvent this wheel, we already have
it figured out.
> > What's more, we already have the code which implements all
> > that, so I don't understand why you want to bypass it.
>
> We have something that superficially results in a similar screen
> layout to what I want, but that actually represents display elements
> in a way that makes them unusable for my purposes.
Then please describe what doesn't fit your purpose, and let's focus on
extending the existing code to do what's missing. Throwing everything
away and starting anew is not the right way, it's a huge waste of
energy and time to implement something that we already have. It is
also a maintenance burden in the long run.
Please note: I'm not talking about the regexp part -- that part you
anyway will need to decide how to extend or augment. I'm telling you
right here and now that blindly taking a fixed amount of surrounding
text will not be acceptable. You can either come up with some smarter
regexp (and you are wrong: the regexps in composition-function-table
do NOT have to match only fixed strings, you can see that they don't
in the part of the table we set up for the Arabic script); or you can
decide on something more complex, like a function. Either way, the
amount of text that this will pick up and pass to the shaper should be
reasonable and should be determined by some understandable rules. And
those rules must be controllable from Lisp.
But that is a separate part of the problem that you will need to
solve, and you will need to solve it whether or not you use character
compositions. What I _am_ saying is that the rest of the machinery
that implements automatic compositions does exactly what you need: it
calls the shaper, handling LTR and RTL text as needed, then lays out
the glyphs the shaper returns in a way that handles all the usual
stuff our users expect, such as line wrapping and truncation. It is
silly to disregard that code, so please don't.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-26 19:46 ` Eli Zaretskii
@ 2020-05-27 9:36 ` Pip Cet
2020-05-27 17:13 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-27 9:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On Tue, May 26, 2020 at 7:46 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Tue, 26 May 2020 18:13:55 +0000
> > Cc: cpitclaudel@gmail.com, alan@idiocy.org, emacs-devel@gnu.org
> >
> > > Assuming that the alternative for selecting the "context" is found,
> > > and composite.c is augmented to apply it instead of the regexps, why
> > > not use the rest of the automatic composition code to produce the
> > > glyphs and display them?
> >
> > I chose not to do that for a patch which I have stated repeatedly was
> > not in any way a finalized design, and I don't see any good reason to
> > do it for a real patch, either, so far.
>
> Why not?
Which part are you asking about? I don't see any good reason because
I've read the composite.c code (I'm not ignoring it), with an eye to
reusing what's reusable, and come up empty.
But you've convinced me I need to do a careful rereading.
> > > The code which does that exists and works,
> >
> > (I suspect: slowly)
>
> Any measurements to back that up?
Yes. With a regexp of "....", the composite.c code takes 175 billion
cycles to display every line of composite.c. My code takes 144 billion
cycles, with a lookahead/lookbehind each set to 128 but limiting it as
described.
> E.g., is scrolling through
> etc/HELLO especially slow, once all the fonts were loaded (i.e. each
> character in the file was displayed at least once)?
> (And why are you using Emacs 26 and not
> Emacs 27, where we support HarfBuzz and made several improvements and
> bugfixes in the character composition area?)
Because I was trying to test your implication that all this was usable
years ago. It wasn't. I'm not using Emacs 26 :-)
> > > It already solves the problems of look-ahead,
> >
> > If it does so efficiently, I'll certainly try reusing that code. But I
> > strongly suspect it doesn't.
>
> Why suspect? why not try and see what does and doesn't work, what is
> and isn't efficient?
I have, now, coming up with the above measurement which confirms my suspicion.
> > > and others, including (but not limited to) the dreaded bidi thing.
> >
> > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
>
> That's because you look in the wrong place.
What's the right place? I'm using all the code in bidi.c, of course,
so as far as I can tell what I'm not doing is using composite.c...
> Once again, try looking
> at etc/HELLO, there are portions of it that need both bidi and
> compositions. I can explain how it works (the code is spread over
> several files), but please believe me that it does, it passed the
> HarfBuzz developers' eyes most of whom are native Arabic and Farsi
> speakers, and wouldn't allow us to display Arabic script incorrectly.
>
> The whole point of using the existing code is that you don't _need_ to
> understand how exactly we handle the bidi reordering when character
> compositions are required.
But that's true without using the existing code!
> It just works, for all you care. It did
> take several iterations to get right at the time; why would you want
> to repeat all that, when the code is there to use and extend?
> > second, precisely because it works well for the purposes of others,
> > and I'd like to have as little impact as possible on existing use
> > cases. They should just continue working, and so far they do.
>
> You are thinking of breaking those other cases by your changes?
No! If I break them, that's a severe bug in my code!
> But
> we haven't yet established that changes are needed,
"Enter"ing ligature glyphs is definitely something we need to do
before any user can reasonably use variable-pitch fonts with ligatures
for displaying English text. Kerning is another such thing. Both don't
work with the current code.
> Because the features you are talking about should "just work" in
> Emacs.
> Not only for some use cases and some scripts -- that is not
> how we develop features. Features that work only for some cases are
> broken and will draw bug reports. They make Emacs look unclean and
> unprofessional.
Not as much as the current lack of support does.
> And there's no need to add such half-broken features because code that
> supports much broader class of use cases already exists, you just need
> to use it and maybe extend and augment it a bit.
I don't think I agree with the "a bit".
> > The code shouldn't break horribly for RTL text (it doesn't).
>
> It _will_ break for RTL text, you just didn't yet see it because you
> only tested it in simple use cases. UAX#9 defines a lot of optional
> features, including multi-level directional overrides and embeddings,
> it isn't just right-to-left vs left-to-right.
I assume bidi.c handles that, as it does for composite.c?
> > > What's more, we already have the code which implements all
> > > that, so I don't understand why you want to bypass it.
> >
> > We have something that superficially results in a similar screen
> > layout to what I want, but that actually represents display elements
> > in a way that makes them unusable for my purposes.
>
> Then please describe what doesn't fit your purpose, and let's focus on
> extending the existing code to do what's missing.
The three main things are:
- "entering" glyphs, instead of treating them as atomic
- providing context automatically rather than by providing specific
regexps for it in advance
- kerning, which requires context for every character
Secondary concerns:
- ligatures that come partly from a display property and partly from
the buffer (composite.c doesn't allow for those, as far as I can tell)
> Please note: I'm not talking about the regexp part -- that part you
> anyway will need to decide how to extend or augment. I'm telling you
> right here and now that blindly taking a fixed amount of surrounding
> text will not be acceptable. You can either come up with some smarter
> regexp (and you are wrong: the regexps in composition-function-table
> do NOT have to match only fixed strings, you can see that they don't
> in the part of the table we set up for the Arabic script);
Again, I think the limits are fixed: 4 characters of history and 500
characters of look-ahead. What am I missing?
> or you can
> decide on something more complex, like a function. Either way, the
> amount of text that this will pick up and pass to the shaper should be
> reasonable and should be determined by some understandable rules. And
> those rules must be controllable from Lisp.
That last part isn't true for the composite.c code, which imposes a
limit of 4 characters of history and 500 characters of look-ahead, as
far as I can tell. But, sure, if that's a requirement, I'll keep it in
mind.
> But that is a separate part of the problem that you will need to
> solve, and you will need to solve it whether or not you use character
> compositions. What I _am_ saying is that the rest of the machinery
> that implements automatic compositions does exactly what you need: it
> calls the shaper, handling LTR and RTL text as needed, then lays out
> the glyphs the shaper returns in a way that handles all the usual
> stuff our users expect, such as line wrapping and truncation.
> It is silly to disregard that code, so please don't.
You've convinced me that it's worth reading it again, more carefully,
but I'm not optimistic I'll come to a different conclusion this time
around.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-27 9:36 ` Pip Cet
@ 2020-05-27 17:13 ` Eli Zaretskii
2020-05-27 18:42 ` Pip Cet
0 siblings, 1 reply; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-27 17:13 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Wed, 27 May 2020 09:36:52 +0000
> Cc: emacs-devel@gnu.org
>
> > Any measurements to back that up?
>
> Yes. With a regexp of "....", the composite.c code takes 175 billion
> cycles to display every line of composite.c. My code takes 144 billion
> cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> described.
What did you compare, exactly? On the one hand, the code you posted
here, which took 128 characters around each character to be displayed?
any other changes in the code you posted here? And what does
"limiting it as described" mean here?
And on the other hand, the existing automatic composition machinery?
With what setup of composition-function-table, exactly?
And finally, which code was included in the count of cycles?
> > > > and others, including (but not limited to) the dreaded bidi thing.
> > >
> > > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
> >
> > That's because you look in the wrong place.
>
> What's the right place? I'm using all the code in bidi.c, of course,
No, actually you don't. Your make_context copies characters in strict
logical order, bypassing bidi.c, and by that also potentially crossing
boundaries of different directionality (and even line and paragraph
boundaries), which is a no-no in text shaping. Then, after you call
the shaper, you don't reorder the glyphs it delivers, so they will
look on display in the wrong order. And there may be other subtle
issues as well -- this stuff was finalized so long ago that I'm not
even sure I remember all the details of what needed to be done to get
it right.
> > > The code shouldn't break horribly for RTL text (it doesn't).
> >
> > It _will_ break for RTL text, you just didn't yet see it because you
> > only tested it in simple use cases. UAX#9 defines a lot of optional
> > features, including multi-level directional overrides and embeddings,
> > it isn't just right-to-left vs left-to-right.
>
> I assume bidi.c handles that, as it does for composite.c?
Yes, but only _if_you_use_them_correctly_! If you bypass them, then
all bets are off.
> > > We have something that superficially results in a similar screen
> > > layout to what I want, but that actually represents display elements
> > > in a way that makes them unusable for my purposes.
> >
> > Then please describe what doesn't fit your purpose, and let's focus on
> > extending the existing code to do what's missing.
>
> The three main things are:
> - "entering" glyphs, instead of treating them as atomic
Why is that needed? A ligature is a single display entity, that's why
fonts ligate. Why would we want to break ligatures when we wrap
lines?
> - providing context automatically rather than by providing specific
> regexps for it in advance
That's a separate part of the problem; I wasn't talking about it. It
needs a separate solution (which was not yet presented), but the
solution doesn't have to be based on regexps if a better or smarter or
faster way is available. Extending composition-function-table to
support context definition by means other than regexp is easy and
doesn't disrupt the way the code works.
> - kerning, which requires context for every character
That's again about that separate part of the problem, because once the
context was determined correctly, the shaper will perform the kerning
for you.
> - ligatures that come partly from a display property and partly from
> the buffer (composite.c doesn't allow for those, as far as I can tell)
It doesn't and it shouldn't! Text of display strings and overlay
strings is completely isolated from buffer text, and is even
bidi-reordered independently. This is by design. These strings are
more akin to images than to a part of buffer text, so mixing them with
buffer text on display would be a grave mistake.
> > Please note: I'm not talking about the regexp part -- that part you
> > anyway will need to decide how to extend or augment. I'm telling you
> > right here and now that blindly taking a fixed amount of surrounding
> > text will not be acceptable. You can either come up with some smarter
> > regexp (and you are wrong: the regexps in composition-function-table
> > do NOT have to match only fixed strings, you can see that they don't
> > in the part of the table we set up for the Arabic script);
>
> Again, I think the limits are fixed: 4 characters of history and 500
> characters of look-ahead. What am I missing?
Fixed limits and fixed strings are two different things. You can
match strings of many different lengths up to a limit.
The 3 previous characters are rarely needed, certainly not for English
ligatures, because you can detect the sequence by the first character.
So this is rarely a limitation; but again, it can be expanded if
needed with little if any effect on the code.
(And where did you see the 500-character limitation of look-ahead?)
Anyway, you again focus on the (separate) issue of determining the
context. Whereas I was mainly talking about what happens _after_ you
determine the context: how do you collect the characters to pass to
the shaper, how you present to the layout code the glyphs returned by
the shaper, and how you lay out those glyphs by inserting them into
the glyph rows of the glyph matrix. It is this code that I see no
reason to modify, definitely not significantly.
> > or you can
> > decide on something more complex, like a function. Either way, the
> > amount of text that this will pick up and pass to the shaper should be
> > reasonable and should be determined by some understandable rules. And
> > those rules must be controllable from Lisp.
>
> That last part isn't true for the composite.c code, which imposes a
> limit of 4 characters of history and 500 characters of look-ahead
How do those limits violate the above requirement? The 3-char
prev-chars limit is "reasonable" because we currently don't need more,
and the other limit doesn't exist AFAICT -- you could make a regexp
that matched very long strings, if needed. And the rules to use to
set up the regexp are definitely "understandable" and can be
controlled from Lisp.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-27 17:13 ` Eli Zaretskii
@ 2020-05-27 18:42 ` Pip Cet
2020-05-27 19:19 ` Eli Zaretskii
0 siblings, 1 reply; 145+ messages in thread
From: Pip Cet @ 2020-05-27 18:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On Wed, May 27, 2020 at 5:13 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Wed, 27 May 2020 09:36:52 +0000
> > Cc: emacs-devel@gnu.org
> >
> > > Any measurements to back that up?
> >
> > Yes. With a regexp of "....", the composite.c code takes 175 billion
> > cycles to display every line of composite.c. My code takes 144 billion
> > cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> > described.
>
> What did you compare, exactly? On the one hand, the code you posted
> here, which took 128 characters around each character to be displayed?
No. Not anything like that code.
> any other changes in the code you posted here? And what does
> "limiting it as described" mean here?
I described the algorithm for selecting context.
> And on the other hand, the existing automatic composition machinery?
> With what setup of composition-function-table, exactly?
As I said, a regexp of "....".
> And finally, which code was included in the count of cycles?
All of it.
There's no reason to believe the composite.c regexp design will
perform adequately. It doesn't.
> > > > > and others, including (but not limited to) the dreaded bidi thing.
> > > >
> > > > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
> > >
> > > That's because you look in the wrong place.
> >
> > What's the right place? I'm using all the code in bidi.c, of course,
>
> No, actually you don't.
> Your make_context copies characters in strict
> logical order, bypassing bidi.c
My current code doesn't.
> , and by that also potentially crossing
> boundaries of different directionality (and even line and paragraph
> boundaries), which is a no-no in text shaping. Then, after you call
> the shaper, you don't reorder the glyphs it delivers, so they will
> look on display in the wrong order.
I do now.
> And there may be other subtle
> issues as well -- this stuff was finalized so long ago that I'm not
> even sure I remember all the details of what needed to be done to get
> it right.
(It's not enough. Open emacs -Q etc/HELLO, place point on the lam in
"aleikum", and hit control-space. The shape changes to something
incorrect.)
> > > > The code shouldn't break horribly for RTL text (it doesn't).
> > >
> > > It _will_ break for RTL text, you just didn't yet see it because you
> > > only tested it in simple use cases. UAX#9 defines a lot of optional
> > > features, including multi-level directional overrides and embeddings,
> > > it isn't just right-to-left vs left-to-right.
> >
> > I assume bidi.c handles that, as it does for composite.c?
>
> Yes, but only _if_you_use_them_correctly_! If you bypass them, then
> all bets are off.
Obviously.
> > > > We have something that superficially results in a similar screen
> > > > layout to what I want, but that actually represents display elements
> > > > in a way that makes them unusable for my purposes.
> > >
> > > Then please describe what doesn't fit your purpose, and let's focus on
> > > extending the existing code to do what's missing.
> >
> > The three main things are:
> > - "entering" glyphs, instead of treating them as atomic
>
> Why is that needed? A ligature is a single display entity, that's why
> fonts ligate.
"ffi" is not. When I enter "official" C-a C-f C-f, point MUST be on
the second f.
> Why would we want to break ligatures when we wrap
> lines?
Who said we do? I personally like it, but it's obviously not something
we should do by default?
> > - providing context automatically rather than by providing specific
> > regexps for it in advance
>
> That's a separate part of the problem; I wasn't talking about it. It
> needs a separate solution (which was not yet presented), but the
> solution doesn't have to be based on regexps if a better or smarter or
> faster way is available. Extending composition-function-table to
> support context definition by means other than regexp is easy and
> doesn't disrupt the way the code works.
>
> > - kerning, which requires context for every character
>
> That's again about that separate part of the problem, because once the
> context was determined correctly, the shaper will perform the kerning
> for you.
> > - ligatures that come partly from a display property and partly from
> > the buffer (composite.c doesn't allow for those, as far as I can tell)
>
> It doesn't and it shouldn't! Text of display strings and overlay
> strings is completely isolated from buffer text, and is even
> bidi-reordered independently. This is by design.
Unacceptable design for my use case, then.
I don't see how revealing buffer text that has a replacing display
property, rather than the replacement, is good design.
The results of putting display properties on autocompositions
are...entertaining, in any case. I've now got an "x" character that
C-x = tells me is an "i"...
> These strings are
> more akin to images than to a part of buffer text, so mixing them with
> buffer text on display would be a grave mistake.
No, it wouldn't be. If two letters appear with no intervening space,
they need to be kerned and ligated if appropriate, no matter where
they come from. If people want a ZWNJ, that's perfectly available to
them.
> > > Please note: I'm not talking about the regexp part -- that part you
> > > anyway will need to decide how to extend or augment. I'm telling you
> > > right here and now that blindly taking a fixed amount of surrounding
> > > text will not be acceptable. You can either come up with some smarter
> > > regexp (and you are wrong: the regexps in composition-function-table
> > > do NOT have to match only fixed strings, you can see that they don't
> > > in the part of the table we set up for the Arabic script);
> >
> > Again, I think the limits are fixed: 4 characters of history and 500
> > characters of look-ahead. What am I missing?
>
> Fixed limits and fixed strings are two different things. You can
> match strings of many different lengths up to a limit.
Which effectively means you can match strings of that limited length.
> The 3 previous characters are rarely needed, certainly not for English
> ligatures, because you can detect the sequence by the first character.
Precisely the same argument applies to my 16-character limit. A script
in which a glyph depends on something happening 16 codepoints onwards,
or back, is extremely unlikely.
> Anyway, you again focus on the (separate) issue of determining the
> context. Whereas I was mainly talking about what happens _after_ you
> determine the context: how do you collect the characters to pass to
> the shaper, how you present to the layout code the glyphs returned by
> the shaper, and how you lay out those glyphs by inserting them into
> the glyph rows of the glyph matrix. It is this code that I see no
> reason to modify, definitely not significantly.
It needs to be modified, significantly, to support entering glyphs, to
support kerning, and to support things like ligating across a buffer
text / display string boundary.
> > > or you can
> > > decide on something more complex, like a function. Either way, the
> > > amount of text that this will pick up and pass to the shaper should be
> > > reasonable and should be determined by some understandable rules. And
> > > those rules must be controllable from Lisp.
> >
> > That last part isn't true for the composite.c code, which imposes a
> > limit of 4 characters of history and 500 characters of look-ahead
>
> How do those limits violate the above requirement? The 3-char
> prev-chars limit is "reasonable" because we currently don't need more,
It's hardcoded in C, though. A 16-character limit, as explained above,
is perfectly "reasonable" for determining the shape of a single glyph.
> and the other limit doesn't exist AFAICT -- you could make a regexp
> that matched very long strings, if needed.
Hmm. I thought I saw weirdness around the 500th character, but it's
probably one of the other bugs.
But, seriously, you're still willing to argue that point shouldn't be
able to enter the "ffi" glyph? Not even if the user wants it? Because
if so, I suggest we interrupt the discussion here.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
2020-05-27 18:42 ` Pip Cet
@ 2020-05-27 19:19 ` Eli Zaretskii
0 siblings, 0 replies; 145+ messages in thread
From: Eli Zaretskii @ 2020-05-27 19:19 UTC (permalink / raw)
To: Pip Cet; +Cc: emacs-devel
> From: Pip Cet <pipcet@gmail.com>
> Date: Wed, 27 May 2020 18:42:07 +0000
> Cc: emacs-devel@gnu.org
>
> > What did you compare, exactly? On the one hand, the code you posted
> > here, which took 128 characters around each character to be displayed?
>
> No. Not anything like that code.
Then your numbers cannot be meaningfully reasoned about, because no
one knows what you did.
> There's no reason to believe the composite.c regexp design will
> perform adequately. It doesn't.
I guess in your eyes only your code performs adequately.
Sorry, this means any further discussion with you on these matters is
futile. I regret to have wasted so much time trying to explain how
this stuff works. I will try to be smarter next time when you ask
some question.
> (It's not enough. Open emacs -Q etc/HELLO, place point on the lam in
> "aleikum", and hit control-space. The shape changes to something
> incorrect.)
A known limitation of our handling of faces in conjunction with
character composition. Finding the reason is left as an exercise.
> > > - "entering" glyphs, instead of treating them as atomic
> >
> > Why is that needed? A ligature is a single display entity, that's why
> > fonts ligate.
>
> "ffi" is not. When I enter "official" C-a C-f C-f, point MUST be on
> the second f.
That doesn't require producing separate glyphs.
> > It doesn't and it shouldn't! Text of display strings and overlay
> > strings is completely isolated from buffer text, and is even
> > bidi-reordered independently. This is by design.
>
> Unacceptable design for my use case, then.
This is the design of the Emacs display engine. If it doesn't fit
your case, your case cannot be had in Emacs without rewriting the
display code.
> No, it wouldn't be. If two letters appear with no intervening space,
> they need to be kerned and ligated if appropriate, no matter where
> they come from. If people want a ZWNJ, that's perfectly available to
> them.
That's not what display and overlay strings are for in Emacs.
> > Fixed limits and fixed strings are two different things. You can
> > match strings of many different lengths up to a limit.
>
> Which effectively means you can match strings of that limited length.
Except that there's no limit, of course.
> > The 3 previous characters are rarely needed, certainly not for English
> > ligatures, because you can detect the sequence by the first character.
>
> Precisely the same argument applies to my 16-character limit. A script
> in which a glyph depends on something happening 16 codepoints onwards,
> or back, is extremely unlikely.
You are wrong. Please read this:
https://lists.freedesktop.org/archives/harfbuzz/2020-May/007517.html
https://lists.freedesktop.org/archives/harfbuzz/2020-May/007521.html
This is what is needed for doing ligatures The Right Way. Collecting
an arbitrary number of codepoint doesn't cut it.
And in any case, I was talking about the need to look _backward_,
i.e. when the character that triggers the composition is not the first
one in the sequence of the characters to be composed. This is usually
needed as an optimization: if you have 2-character sequences where the
second character is one of a much smaller set than the first, then
using the second character as an anchor will use up less memory when
you set up composition-function-table. A case in point is a base
character and a diacritic.
How many characters you need _forward_ is an entirely different issue.
> It needs to be modified, significantly, to support entering glyphs, to
> support kerning, and to support things like ligating across a buffer
> text / display string boundary.
Two of these are not needed or are outright wrong, and the third
doesn't need anything, the shaper already does that with any text you
pass through it.
> But, seriously, you're still willing to argue that point shouldn't be
> able to enter the "ffi" glyph? Not even if the user wants it? Because
> if so, I suggest we interrupt the discussion here.
See above. I indeed see no reason to continue this discussion, as
evidently any progress here is impossible with your attitude in place.
^ permalink raw reply [flat|nested] 145+ messages in thread
end of thread, other threads:[~2020-05-27 19:19 UTC | newest]
Thread overview: 145+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-17 10:41 Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
2020-05-17 14:09 ` Arthur Miller
2020-05-17 14:30 ` Eli Zaretskii
2020-05-17 15:06 ` Arthur Miller
2020-05-17 15:56 ` Eli Zaretskii
2020-05-17 16:50 ` Arthur Miller
2020-05-17 17:06 ` Eli Zaretskii
2020-05-17 14:35 ` Eli Zaretskii
2020-05-17 14:59 ` Julius Pfrommer
2020-05-17 15:55 ` Eli Zaretskii
2020-05-17 16:28 ` Pip Cet
2020-05-17 17:00 ` Eli Zaretskii
2020-05-17 18:50 ` Pip Cet
2020-05-17 19:17 ` Eli Zaretskii
2020-05-18 16:08 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
2020-05-18 16:45 ` tomas
2020-05-18 16:49 ` Eli Zaretskii
2020-05-18 17:05 ` Ligatures Stefan Monnier
2020-05-18 17:18 ` Ligatures Eli Zaretskii
2020-05-18 19:19 ` Ligatures Pip Cet
2020-05-18 19:25 ` Ligatures tomas
2020-05-18 19:41 ` Ligatures Pip Cet
2020-05-18 20:20 ` Ligatures tomas
2020-05-18 19:33 ` Ligatures Eli Zaretskii
2020-05-18 19:44 ` Ligatures Clément Pit-Claudel
2020-05-19 2:25 ` Ligatures Eli Zaretskii
2020-05-19 2:44 ` Ligatures Clément Pit-Claudel
2020-05-19 13:59 ` Ligatures Eli Zaretskii
2020-05-19 14:35 ` Ligatures Clément Pit-Claudel
2020-05-19 15:21 ` Ligatures Eli Zaretskii
2020-05-19 15:44 ` Ligatures Clément Pit-Claudel
2020-05-19 16:15 ` Ligatures Eli Zaretskii
2020-05-19 15:36 ` Ligatures Tassilo Horn
2020-05-19 16:08 ` Ligatures Eli Zaretskii
2020-05-19 16:14 ` Ligatures Stefan Monnier
2020-05-19 3:47 ` Ligatures Stefan Monnier
2020-05-19 4:51 ` Ligatures Clément Pit-Claudel
2020-05-18 19:38 ` Ligatures Clément Pit-Claudel
2020-05-19 14:55 ` Ligatures Pip Cet
2020-05-19 15:30 ` Ligatures Clément Pit-Claudel
2020-05-19 15:52 ` Ligatures Pip Cet
2020-05-18 17:24 ` Ligatures tomas
2020-05-18 17:41 ` Ligatures Eli Zaretskii
2020-05-18 19:07 ` Ligatures tomas
2020-05-18 19:17 ` Ligatures Eli Zaretskii
2020-05-18 20:33 ` Ligatures Stefan Monnier
2020-05-18 17:31 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Clément Pit-Claudel
2020-05-18 17:39 ` Eli Zaretskii
2020-05-18 19:01 ` Clément Pit-Claudel
2020-05-18 19:15 ` Eli Zaretskii
2020-05-18 19:18 ` tomas
2020-05-18 20:37 ` Ligatures Stefan Monnier
2020-05-18 21:59 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Alan Third
2020-05-19 13:56 ` Eli Zaretskii
2020-05-19 14:39 ` Clément Pit-Claudel
2020-05-19 21:43 ` Pip Cet
2020-05-20 1:41 ` Clément Pit-Claudel
2020-05-20 2:07 ` Ligatures Stefan Monnier
2020-05-20 7:14 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) tomas
2020-05-20 15:18 ` Eli Zaretskii
2020-05-20 17:31 ` Clément Pit-Claudel
2020-05-20 18:01 ` Eli Zaretskii
2020-05-20 18:33 ` Clément Pit-Claudel
2020-05-20 18:49 ` Eli Zaretskii
2020-05-20 18:53 ` Clément Pit-Claudel
2020-05-20 19:02 ` Eli Zaretskii
2020-05-20 23:19 ` Ligatures Stefan Monnier
2020-05-21 10:01 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Pip Cet
2020-05-21 14:11 ` Eli Zaretskii
2020-05-21 16:26 ` Pip Cet
2020-05-21 19:08 ` Eli Zaretskii
2020-05-21 20:51 ` Clément Pit-Claudel
2020-05-21 21:16 ` Pip Cet
2020-05-22 6:12 ` Eli Zaretskii
2020-05-22 9:25 ` Pip Cet
2020-05-22 11:23 ` Eli Zaretskii
2020-05-22 12:52 ` Pip Cet
2020-05-22 13:15 ` Eli Zaretskii
2020-05-22 13:29 ` Clément Pit-Claudel
2020-05-22 14:30 ` Eli Zaretskii
2020-05-22 14:34 ` Clément Pit-Claudel
2020-05-22 19:01 ` Eli Zaretskii
2020-05-22 19:33 ` Clément Pit-Claudel
2020-05-22 19:44 ` Eli Zaretskii
2020-05-22 20:02 ` Clément Pit-Claudel
[not found] ` <83mu5z171j.fsf@gnu.org>
2020-05-23 14:34 ` Clément Pit-Claudel
2020-05-23 16:18 ` Eli Zaretskii
2020-05-23 16:37 ` Clément Pit-Claudel
2020-05-22 13:56 ` Pip Cet
[not found] ` <83lflj16jn.fsf@gnu.org>
[not found] ` <AF222EA0-FE05-4224-8459-2BF82CE27266@vasilij.de>
[not found] ` <834ks7110w.fsf@gnu.org>
2020-05-23 11:24 ` Vasilij Schneidermann
2020-05-23 13:04 ` Eli Zaretskii
[not found] ` <83eerb145r.fsf@gnu.org>
[not found] ` <CAOqdjBeef8Fa596raEyBUwv0Zr+41LSiYvHW39EdoaXpyxCXVw@mail.gmail.com>
[not found] ` <831rnb0zld.fsf@gnu.org>
2020-05-23 12:36 ` Pip Cet
2020-05-23 14:08 ` Eli Zaretskii
2020-05-23 15:13 ` Pip Cet
2020-05-23 16:34 ` Eli Zaretskii
2020-05-23 22:38 ` Pip Cet
2020-05-24 15:33 ` Eli Zaretskii
2020-05-26 18:13 ` Pip Cet
2020-05-26 19:46 ` Eli Zaretskii
2020-05-27 9:36 ` Pip Cet
2020-05-27 17:13 ` Eli Zaretskii
2020-05-27 18:42 ` Pip Cet
2020-05-27 19:19 ` Eli Zaretskii
2020-05-23 17:32 ` Eli Zaretskii
2020-05-23 21:29 ` Pip Cet
2020-05-24 15:19 ` Eli Zaretskii
2020-05-23 12:47 ` Ligatures Stefan Monnier
2020-05-23 13:10 ` Ligatures Eli Zaretskii
2020-05-23 13:45 ` Ligatures Stefan Monnier
2020-05-23 14:12 ` Ligatures Eli Zaretskii
2020-05-23 13:36 ` Ligatures 조성빈
2020-05-23 14:15 ` Ligatures Stefan Monnier
2020-05-23 14:37 ` Ligatures Pip Cet
2020-05-22 11:44 ` Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY)) Eli Zaretskii
2020-05-22 13:26 ` Clément Pit-Claudel
2020-05-22 14:29 ` Eli Zaretskii
2020-05-22 14:32 ` Clément Pit-Claudel
2020-05-22 19:00 ` Eli Zaretskii
2020-05-21 21:06 ` Pip Cet
2020-05-22 6:06 ` Eli Zaretskii
2020-05-22 9:34 ` Pip Cet
2020-05-22 11:33 ` Eli Zaretskii
2020-05-19 20:26 ` Alan Third
2020-05-19 10:09 ` Trevor Spiteri
2020-05-19 14:22 ` Eli Zaretskii
2020-05-19 5:43 ` Ligatures ASSI
2020-05-19 7:22 ` Ligatures tomas
2020-05-19 7:55 ` Ligatures Joost Kremers
2020-05-19 8:07 ` Ligatures tomas
2020-05-19 10:17 ` Ligatures Yuri Khan
2020-05-19 14:26 ` Ligatures Eli Zaretskii
2020-05-19 19:00 ` Ligatures Yuri Khan
2020-05-19 10:43 ` Ligatures Werner LEMBERG
2020-05-19 10:48 ` Ligatures tomas
2020-05-19 14:18 ` Ligatures Eli Zaretskii
2020-05-19 14:52 ` Ligatures Eli Zaretskii
2020-05-19 15:11 ` Ligatures Pip Cet
2020-05-19 15:36 ` Ligatures Eli Zaretskii
2020-05-19 16:16 ` Ligatures Pip Cet
2020-05-19 16:41 ` Ligatures Eli Zaretskii
2020-05-19 17:00 ` Ligatures Eli Zaretskii
2020-05-17 18:28 ` Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY) Julius Pfrommer
2020-05-17 18:45 ` Eli Zaretskii
2020-05-17 22:28 ` chad
2020-05-18 22:08 ` Alan Third
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).