--- Begin Message ---
Subject: 27.0.50; elixir-mode fontification is very slow
Date: Thu, 7 Nov 2019 17:40:11 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0
I haven't been able to track this down to a particular component (e.g. a
regexp) so far, but font-lock-fontify-region is now considerably slower
than it was in Emacs 26 (at least as of revision cb8fb597e5bf4f14).
To reproduce: install elixir-mode (e.g. from MELPA Stable):
(add-to-list 'package-archives
'("melpa-stable" . "https://stable.melpa.org/packages/") t)
Run M-x list-packages and install elixir-mode.
Save the attached tiny.__ex__ as tiny.ex.
Visit tiny.ex.
Eval: (benchmark 1 '(font-lock-fontify-region (point-min) (point-max))).
"Elapsed time: 0.158824s"
With larger files, the times are much longer.
I had a break from Elixir, so I noticed this only now.
In GNU Emacs 27.0.50 (build 11, x86_64-pc-linux-gnu, GTK+ Version 3.24.8)
of 2019-11-05 built on potemkin
Repository revision: dd19cc3aa16ccc441a8a2bfcdeb3005a6eef2543
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
System Description: Ubuntu 19.04
Attachment: tiny.__ex__ (text document)
--- End Message ---
--- Begin Message ---
Subject: Re: bug#38104: 27.0.50; elixir-mode fontification is very slow
Date: Wed, 27 Nov 2019 23:58:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0
Hi Mattias,
On 26.11.2019 21:32, Mattias Engdegård wrote:
As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el,
we have
(identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
                    (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                    (optional (or "?" "!"))))
First, this regex is suboptimal: the first character of an identifier should
be matched exactly once. Otherwise the same identifier can be matched in
several ways, and you get bad backtracking behaviour. Just remove the
one-or-more construct:
(identifiers . ,(rx (any "A-Z" "a-z" "_")
                    (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                    (optional (or "?" "!"))))
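The corrected definition can be sanity-checked outside elixir-mode by building the regexp with plain rx and matching a few sample names against it. This is a minimal sketch; the names my-elixir-identifier-re and my-elixir-identifier-p are hypothetical and not part of elixir-mode.

```elisp
(require 'rx)

;; Corrected identifier pattern: the leading character is matched exactly
;; once, so any given identifier can only be matched in one way.
(defconst my-elixir-identifier-re
  (rx (any "A-Z" "a-z" "_")
      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
      (optional (or "?" "!"))))

(defun my-elixir-identifier-p (string)
  "Return non-nil if the whole of STRING is a single identifier."
  (string-match-p (concat "\\`" my-elixir-identifier-re "\\'") string))

;; Example:
;; (my-elixir-identifier-p "valid?")  ; non-nil
;; (my-elixir-identifier-p "9lives")  ; nil, cannot start with a digit
```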
This definition is then used in several places, but two in particular are of
interest to us:
;; Module attributes
(,(elixir-rx (and "@" (1+ identifiers)))
The construct (1+ identifiers) was perhaps meant to match multiple identifiers,
but it doesn't (no separator); it just matches an identifier in several ways,
which again leads to bad backtracking behaviour.
The same problem here:
;; Map keys
(,(elixir-rx (group (and (one-or-more identifiers) ":")) space)
Remove the 1+ and one-or-more and it's fast again.
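With the 1+ and one-or-more wrappers removed, the two rules reduce to something like the sketch below. It assumes Emacs 27 or later and uses rx-define to give the identifier pattern a global name instead of going through elixir-mode's own elixir-rx table; my-elixir-identifier, my-elixir-attribute-re and my-elixir-map-key-re are hypothetical names, and the face part of each font-lock rule is left out.

```elisp
(require 'rx)

;; Hypothetical global rx name for the corrected identifier (Emacs 27+).
(rx-define my-elixir-identifier
  (seq (any "A-Z" "a-z" "_")
       (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
       (optional (or "?" "!"))))

;; Module attributes: "@" followed by exactly one identifier.
(defconst my-elixir-attribute-re
  (rx "@" my-elixir-identifier))

;; Map keys: one identifier, then ":" and whitespace; group 1 is the key.
(defconst my-elixir-map-key-re
  (rx (group my-elixir-identifier ":") space))
```

Each identifier is now matched in exactly one way, so a failed match no longer has to try every possible split of a run of word characters.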
That makes a lot of sense. I removed these one-or-more and 1+ wrappers (and a
few others), and it became fast again.
I'll send a patch upstream. Thanks for your help!
(Looking at their tracker, a smaller version of this change has already been
submitted.)
Why did this "work" with the old rx implementation? Because that code had a
nasty bug: it did not bracket definitions in rx-constituents properly. Example:
(let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
  (rx-to-string '(1+ hello) t))
=> "HELLO+"
The new rx implementation does not suffer from this bug.
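The fixed behaviour can be observed with rx-let, the Emacs 27 way of binding a local rx definition without touching rx-constituents. This is a minimal sketch; hello and my-hello-re are illustrative names. Because the new rx treats the definition as a unit, the 1+ applies to the whole of "HELLO", so the regexp accepts "HELLOHELLO" and rejects "HELLOOO", the opposite of what the old "HELLO+" output did.

```elisp
(require 'rx)

;; Bind hello locally; (1+ hello) repeats the whole string, so the
;; new rx wraps it in a shy group before adding the "+".
(defconst my-hello-re
  (rx-let ((hello "HELLO"))
    (rx bos (1+ hello) eos)))

;; (string-match-p my-hello-re "HELLOHELLO")  ; non-nil
;; (string-match-p my-hello-re "HELLOOO")     ; nil
```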
The result in your case is that the old rx, when translating (1+ identifiers), only
tacked the "+" onto whatever regexp 'identifiers' produced, resulting in
"[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"
which is a lot faster, since the extra "+" only applies to the final [!?],
which already had a "?" (and it probably doesn't match very often).
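To see how much the bracketing matters, one can time a failing search with each form of the regexp. This sketch hard-codes the old rx output quoted above and a hand-bracketed version of it (my-old-rx-output, my-bracketed-re and my-time-failing-search are hypothetical names), appends the ":" used by the map-key rule, and searches a run of letters containing no colon, so the matcher must exhaust every alternative before giving up. No particular timings are claimed; they depend on the machine, but the bracketed form should slow down far more quickly as N grows.

```elisp
(require 'benchmark)

;; What the old rx produced for (1+ identifiers): only the final [!?]
;; gets the extra "+", so a failing match backtracks cheaply.
(defconst my-old-rx-output "[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+")

;; What (1+ identifiers) actually means once the definition is
;; bracketed: the whole identifier is repeated, and a run of letters
;; can be split into identifiers in exponentially many ways.
(defconst my-bracketed-re "\\(?:[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?\\)+")

(defun my-time-failing-search (regexp n)
  "Return the seconds spent searching for REGEXP plus \":\" in N letters.
The subject contains no colon, so the search always fails."
  (let ((subject (make-string n ?a))
        (re (concat regexp ":")))
    (car (benchmark-run 1 (string-match-p re subject)))))

;; Compare, e.g., for N = 10, 12 and 14:
;; (my-time-failing-search my-old-rx-output 14)
;; (my-time-failing-search my-bracketed-re 14)
```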
It's funny to think that someone probably beat the current code into
submission by trial and error.
--- End Message ---