bug-gnu-emacs


From: Dmitry Gutov
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 21:35:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 26/01/2023 20:24, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 19:15:51 +0200
>> Cc: 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 26/01/2023 10:10, Eli Zaretskii wrote:
>>> Perhaps Dmitry could present comparison of profiles from perf which
>>> would allow us to understand the reason(s)?
>> I believe I did that in the second message in this thread:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
>>
>> To quote the specific profiles, it's
>>
>>     15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>     14.92%  emacs  emacs                  [.] process_mark_stack
>>      9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>      8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>      3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :pred vs.
>>
>>     23.72%  emacs  emacs                  [.] process_mark_stack
>>     12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>      7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>      7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>      3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :match.
>>
>> And to continue the quote:
>>
>>     Here's a significant jump in GC time which is almost the same as the
>>     difference in runtime. And all of it is spent marking?
>>
>>     I suppose if the problem is allocation of a large string (many times
>>     over), the GC could be spending a lot of time scanning through the
>>     memory. Could this be avoided by passing some substitute handle to TS,
>>     instead of the full string? E.g. some kind of reference to it in the
>>     regexp cache.
> If you are saying that GC is responsible, then running the benchmark
> with gc-cons-threshold set to most-positive-fixnum should produce a
> more interesting profile and perhaps a more interesting comparison.

That really helps:

(benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))

=> (16.078430587 251 5.784299419999996)

(let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure)))))

=> (10.369389725 0 0.0)
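
A note on reading these results: benchmark-run returns a list of the form (ELAPSED GC-COUNT GC-ELAPSED), so the non-GC portion of a run can be estimated by subtracting the third element from the first. A minimal sketch of that arithmetic follows; the helper name my/non-gc-seconds is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): given the list returned by
;; `benchmark-run', i.e. (ELAPSED GC-COUNT GC-ELAPSED), return the
;; elapsed time that was not spent in garbage collection.
(defun my/non-gc-seconds (result)
  "Return the non-GC seconds for RESULT, a `benchmark-run' return value."
  (- (nth 0 result) (nth 2 result)))

;; The result of the first run quoted above (GC enabled).
(my/non-gc-seconds '(16.078430587 251 5.784299419999996))
```

For the first run this comes to roughly 10.29 seconds, close to the 10.37 seconds measured with GC disabled, which is consistent with the extra time being spent in garbage collection.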

Do you want a perf profile for the latter? It might not be very useful.

> (But I thought you concluded that GC alone cannot explain the
> difference in performance?)

I'm inclined to think the difference is related to copying of the regexp string, but whether the time is spent actually copying it or scanning its copies for garbage later was harder to say. It seems to be the latter, though.