Hello all,
I'm brand new to emacs development, so please forgive any breaches of
protocol in the following missive. This is mostly based on a rather
janky println debugging branch I hacked together, plus haphazard
profiling of a quite "non -Q" session.
Recently I have been investigating slowdowns in overall responsiveness
and snappiness in my emacs setup, which arise during the course of
normal work. I attached a sampling profiler to the process
(Instruments on MacOS), and recorded ten or so seconds of
mashing the movement cursors in my org-agenda window. The result is
that 93.4% of the total samples are inside of the function
lface_from_face_name_no_resolve, in xfaces.c. The culprit seems to be
a large association list, frame->face_list, which in my current
session contains over 1000 faces.
This association list is queried every time that face properties need
to be resolved. I believe several factors are contributing to this
being a major bottleneck:
- Some important faces are looked up once per line, on every redisplay. For
example :fringe and :line-number.
- new faces are *pre*-pended to the alist, so old, commonly used faces get
pushed to the back. The linear search in assq_no_quit takes longer and longer
to find them. This is compounded as packages create faces for their own
purposes. See above about my session containing over 1000 faces.
- This may be less of a problem in vanilla emacs, but some packages create faces
that result in quite deep recursive calls to merge_named_face. Each such frame
in the stack (I observed upwards of 50 such frames with my org-agenda button
mashing) is doing its own face lookups.
- display-line-number-mode is also a major offender, since it has to
look up several faces and merge them, at least once per line number.
To verify this bottleneck for yourself, I encourage you to attach a sampling
profiler to an emacs -Q session, and determine the percentage of time spent
inside the function lface_from_face_name_no_resolve. By turning on
display-line-numbers-mode, I was able to get this proportion above 40% in just a
few seconds of typing and mashing the up and down arrows. The other diagnostic I used was
to compile a duplicate function my_assq_no_quit which returns the number of
iterations required to find the key in the list. This can then be logged inside
of lface_from_face_name_no_resolve. It is easy to verify that as more faces are
registered (e.g. by turning on hl-line-mode, line numbers, org-agenda, etc.),
the number of iterations required to find commonly needed faces like :fringe
grows.
So, what should be done about this? The frame's face_alist field is exposed
directly to elisp via the (frame-face-alist) function. It is probably not a good
idea to break any contract there, explicit or implicit. An alternative is to add
a sister field to the frame struct, which would be a fixed-size cache of the
most recently accessed faces, laid out contiguously in memory for access speed.
The invalidation strategy for such a cache can be extremely simple, even as
simple as clearing the whole cache. As the profiling demonstrates,
accesses should dominate face updates by many orders of magnitude. An implementation
like this with, say, 100 cached pointers to face objects, would benefit from
cache coherency and probably outperform the association list lookup by
quite a lot for almost all faces.
----
I have not even touched on the question of why faces like :fringe need to be
looked up so many times per redisplay. I imagine that there may be a reason
like, in principle it's possible for elisp to modify the value of :fringe in
between consecutive uses of it. Setting that aside, I think
that the use of the association list for such a hot function as
lface_from_face_name_no_resolve should be reconsidered.
----
Thank you for reading this far! I look forward to a fruitful conversation about
everyone's favorite editor. :)
-Jack
----
Appendix: output of face lookup logging, piped through uniq -c. Each pair of
lines is a single redisplay. [N iters] indicates the number of iterations the
assq_no_quit lookup took. N goes up as the number of faces in the
frame goes up.
```
48 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
50 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
52 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
54 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
56 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
58 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
60 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
62 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
64 [97 iters] recalling face obj named fringe
68 [68 iters] recalling face obj named line-number-current-line
148 [97 iters] recalling face obj named fringe
```