Iteration over frame->face_alist is a huge performance suck

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Iteration over frame->face_alist is a huge performance suck

From:	John Coughlin
Subject:	Iteration over frame->face_alist is a huge performance suck
Date:	Thu, 01 Jul 2021 20:58:07 -0700
User-agent:	Cyrus-JMAP/3.5.0-alpha0-530-gd0c265785f-fm-20210616.002-gd0c26578

Hello all,

I'm brand new to emacs development, so please forgive any breaches of

protocol in the following missive. This is mostly based on a rather

janky println debugging branch I hacked together, plus haphazard

profiling of a quite "non -Q" session.

Recently I have been investigating slowdowns in overall responsiveness

and snappiness in my emacs setup, which arise during the course of

normal work. I attached a sampling profiler to the process

(Instruments on MacOS), and recorded ten or so seconds of

mashing the movement cursors in my org-agenda window. The result is

that 93.4% of the total samples are inside of the function

lface_from_face_name_no_resolve, in xfaces.c. The culprit seems to be

a large association list, frame->face_list, which in my current

session contains over 1000 faces.

This association list is queried every time that face properties need

to be resolved. I believe several factors are contributing to this

being a major bottleneck:

- Some important faces are looked up once per line, on every redisplay. For

example :fringe and :line-number.

- new faces are *pre*-pended to the alist, so old, commonly used faces get

pushed to the back. The linear search in assq_no_quit takes longer and longer

to find them. This is compounded as packages create faces for their own

purposes. See above about my session containing over 1000 faces.

- This may be less of a problem in vanilla emacs, but some packages create faces

that result in quite deep recursive calls to merge_named_face. Each such frame

in the stack (I observed upwards of 50 such frames with my org-agenda button

mashing) is doing its own face lookups.

- display-line-number-mode is also a major offender, since it has to

look up several faces and merge them, at least once per line number.

To verify this bottleneck for yourself, I encourage you to attach a sampling

profiler to an emacs -Q session, and determine the percentage of time spent

inside the function lface_from_face_name_no_resolve. By turning on

display-line-numbers-mode, I was able to get this proportion above 40% in just a

few seconds of typing and mashing the up and down arrows. The other diagnostic I used was

to compile a duplicate function my_assq_no_quit which returns the number of

iterations required to find the key in the list. This can then be logged inside

of lface_from_face_name_no_resolve. It is easy to verify that as more faces are

registered (e.g. by turning on hl-line-mode, line numbers, org-agenda, etc.),

the number of iterations required to find commonly needed faces like :fringe

grows.

So, what should be done about this? The frame's face_alist field is exposed

directly to elisp via the (frame-face-alist) function. It is probably not a good

idea to break any contract there, explicit or implicit. An alternative is to add

a sister field to the frame struct, which would be a fixed-size cache of the

most recently accessed faces, laid out contiguously in memory for access speed.

The invalidation strategy for such a cache can be extremely simple, even as

simple as clearing the whole cache. As the profiling demonstrates,

accesses should dominate face updates by many orders of magnitude. An implementation

like this with, say, 100 cached pointers to face objects, would benefit from

cache coherency and probably outperform the association list lookup by

quite a lot for almost all faces.

----

I have not even touched on the question of why faces like :fringe need to be

looked up so many times per redisplay. I imagine that there may be a reason

like, in principle it's possible for elisp to modify the value of :fringe in

between consecutive uses of it. Setting that aside, I think

that the use of the association list for such a hot function as

lface_from_face_name_no_resolve should be reconsidered.

----

Thank you for reading this far! I look forward to a fruitful conversation about

everyone's favorite editor. :)

-Jack

----

Appendix: output of face lookup logging, piped through uniq -c. Each pair of

lines is a single redisplay. [N iters] indicates the number of iterations the

assq_no_quit lookup took. N goes up as the number of faces in the

frame goes up.

```

48 [97 iters] recalling face obj named fringe