bug-gnu-emacs



From: Yuan Fu
Subject: bug#59415: 29.0.50; [feature/tree-sitter] c-ts-mode fails to fontify a portion of a large C file
Date: Sun, 20 Nov 2022 12:59:42 -0800


> On Nov 20, 2022, at 12:33 PM, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> From: Theodor Thornhill <theo@thornhill.no>
>>> Cc: Yuan Fu <casouri@gmail.com>
>>> Date: Sun, 20 Nov 2022 20:54:05 +0100
>>> 
>>>> Observe that fontifications stop at this line for some reason.
>>>> Fontification reappears on line 209271.  Maybe it's because of the many
>>>> braces that appear in warning face?  Why does TS think there are syntax
>>>> errors here?  The C++ TS parser doesn't have that problem, btw.
>>> 
>>> It seems the c parser definitely can't handle what it's seeing.
>> 
>> Yes, but do you have any clue why it gives up at that line?
>> 
> 
> No, not yet.

Because the whole thing is contained in an ERROR node. It wasn’t covered in
error face because our rule for errors doesn’t “override”: if there are existing
faces in the range, the error face isn’t applied. If I change the rule
fontifying errors to override, everything is in error face. Alternatively,
you can disable fontifying errors, like this:

(add-hook 'c-ts-mode-hook #'c-ts-setup)
(defun c-ts-setup ()
  ;; Remove the `error' feature from this buffer's tree-sitter font-lock.
  (treesit-font-lock-recompute-features nil '(error)))

> 
> 
>> One thing that I see is that many braces around there are shown in warning
>> face, so perhaps the parser is overwhelmed by the amount of parsing errors?
>> 
> 
> Yeah that's my first guess, but that shouldn't be an issue, it should be
> able to font-lock _something_.

Yeah, see above.

> 
>>>> P.S. Btw, isn't the treesit-max-buffer-size limit too low?  4 MiB?
>>> 
>>> It might be!  IIRC treesit uses 10x the buffer size to store the ast, so
>>> it'll be some more memory usage.
>> 
>> After lifting the limit to allow visiting the file, this file causes Emacs
>> to go up to 350 MiB.  Which is significant, but definitely not outrageous
>> enough to prevent using TS with this file.  And I'm sure "normal" C files
>> (as opposed to ones written by a program) will need less memory.  So 4 MiB
>> sounds too restrictive to me.  We should maybe increase that to 15 MiB on
>> 32-bit systems and say 40 MiB on 64-bit?
>> 
> 
> I think it should probably be the same as in the C level, as I mentioned
> in the other mail?

4 GB is the absolute upper limit, but the practical maximum size is well below
that. Though 4 MB might be too conservative.
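For anyone who wants to try the limits Eli suggested before any default changes, a minimal init-file sketch follows. It assumes `treesit-max-buffer-size` is the variable behind the 4 MiB check, and it uses the size of `most-positive-fixnum` as a rough proxy for a 64-bit build. That proxy is an assumption, not an official test: a 32-bit build with wide integers would be classified as 64-bit.

```elisp
;; Raise the tree-sitter buffer-size limit discussed above.
;; Heuristic only: on 64-bit builds most-positive-fixnum is far larger
;; than 2^32, so use 40 MiB there and 15 MiB otherwise, following the
;; figures suggested in this thread.
(setq treesit-max-buffer-size
      (let ((mib (* 1024 1024)))
        (if (> most-positive-fixnum (expt 2 32))
            (* 40 mib)
          (* 15 mib))))
```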

> 
>>> I'll do some more digging, but in the
>>> meantime I attach this profiler report that shows font-locking as the
>>> culprit:
>> 
>> Culprit for what?  For slow performance?
> 
> Yeah.
> 
>> Don't get me wrong: from my POV, TS works here better than CC Mode, in
>> many use cases which are much more important than scrolling through
>> the entire humongous file top to bottom.  For example, just visiting
>> the file takes 3 times as long with CC Mode as with c-ts-mode; going
>> to EOB with CC Mode takes more than 1 min 20 sec, whereas TS does it in 2.5
>> sec.  And likewise jumping into a random point in the file.  Instead
>> of Alan's 150 sec for a full scroll by CC Mode I get 27 min.  The
>> number of GC cycles with CC Mode is 10 times as large as with TS.
>> (Caveat: my Emacs is built without optimizations, whereas Tree-sitter
>> and the language support libraries are, of course, fully optimized.)
>> 
> 
> Ok, that's good to know!
> 
>>> In this profile I followed your repro, and did some more movement around
>>> the buffer after.  This isn't from emacs -Q, but I believe the results
>>> will be just the same, considering where the slowness seems to be.
>>> 
>>> 
>>>       16695  85% - redisplay_internal (C function)
>>>       16695  85%  - jit-lock-function
>>>       16695  85%   - jit-lock-fontify-now
>>>       16695  85%    - jit-lock--run-functions
>>>       16695  85%     - run-hook-wrapped
>>>       16695  85%      - #<compiled -0x156eddb48a262583>
>>>       16695  85%       - font-lock-fontify-region
>>>       16695  85%        - font-lock-default-fontify-region
>>>       16679  84%         - treesit-font-lock-fontify-region
>> 
>> Yes, treesit-font-lock-fontify-region takes the lion's share.  If you or
>> Yuan can speed this up, please do.  But I see no reason to consider this a
>> catastrophe, quite to the contrary.
> 
> I think it boils down to getting the root too many times.  In an
> unmodified buffer I think getting the root node should be instant, and
> it seems to take some time.  I'll try to figure out why.

Getting the root node is trivial; the bulk of the time is spent in
treesit-query-capture.

Running the following in that file takes 1.87 seconds, while in a smaller
file it takes only 0.00016 seconds.

(benchmark-run 100
  (let ((query (caar treesit-font-lock-settings))
        (root (treesit-buffer-root-node)))
    (treesit-query-capture root query 7700472 7703604)))
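To check whether a smaller starting node helps in this file, the root-node measurement above can be compared against capturing from `treesit-node-on`, the node used in the patch quoted below. The helper name `my/compare-capture-cost` is hypothetical. It takes a compiled query, a region, and a repetition count, and returns the `benchmark-run` results (elapsed seconds, GC count, GC seconds) for both starting nodes. Unlike the snippet above, it fetches the root node once, outside the timed loop.

```elisp
(require 'treesit)
(require 'benchmark)

(defun my/compare-capture-cost (query start end &optional n)
  "Time `treesit-query-capture' of QUERY over START..END, N times.
Return a plist with :root and :node-on entries, each the list
\(ELAPSED GC-COUNT GC-ELAPSED) produced by `benchmark-run'.
Hypothetical helper for illustration only."
  (let ((n (or n 100))
        ;; Both starting nodes are computed once, outside the timing.
        (root (treesit-buffer-root-node))
        (node (treesit-node-on start end)))
    (list :root (benchmark-run n
                  (treesit-query-capture root query start end))
          :node-on (benchmark-run n
                     (treesit-query-capture node query start end)))))

;; Usage in the large C buffer, with the query and region from above:
;; (my/compare-capture-cost (caar treesit-font-lock-settings)
;;                          7700472 7703604)
```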

> This diff fixes the font-lock issues:
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 674c984dfe..0f84d8b83e 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -774,12 +774,12 @@ treesit-font-lock-fontify-region
>       ;; will give you that quote node.  We want to capture the string
>       ;; and apply string face to it, but querying on the quote node
>       ;; will not give us the string node.
> -      (when-let ((root (treesit-buffer-root-node language))
> +      (when-let (
>                  ;; Only activate if ENABLE flag is t.
>                  (activate (eq t enable)))
>         (ignore activate)
>         (let ((captures (treesit-query-capture
> -                         root query start end))
> +                         (treesit-node-on start end) query start end))
>               (inhibit-point-motion-hooks t))
>           (with-silent-modifications
>             (dolist (capture captures)
> 
> 
> However, the comment right above makes a case for why we should have
> this.  BUT, is this still relevant, Yuan, after the changes in treesit
> reporting what has changed etc?  In what exact case is that an issue?  And
> is it more severe than the behavior this bug is exhibiting?

The case described by the comment is still relevant: with this patch, the
quote described there still wouldn’t be fontified. We can use a heuristic to
get a node that is “large enough” without being the root node, e.g., find a
top-level node. That should make query-capture much faster.
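The top-level-node heuristic could look roughly like the sketch below. The function name `my/treesit-top-level-node-on` is hypothetical and not part of treesit.el. It starts from `treesit-node-on` and climbs parents until it reaches a direct child of the root node, so a string node enclosing the region is still inside the queried node; if the region spans several top-level nodes, it falls back to the root. One caveat for this particular bug: if the ERROR node described above is itself a direct child of the root, the climb stops at that ERROR node, which spans most of the file, so the query would not get much cheaper there.

```elisp
(require 'treesit)

(defun my/treesit-top-level-node-on (start end &optional language)
  "Return the top-level node spanning START..END in LANGUAGE's tree.
A top-level node is a direct child of the root node.  If the
region spans more than one top-level node, return the root node.
Hypothetical helper for illustration only."
  (let* ((root (treesit-buffer-root-node language))
         (node (treesit-node-on start end language))
         (parent (treesit-node-parent node)))
    ;; Climb until NODE is the root (no parent) or its parent is the root.
    (while (and parent (not (treesit-node-eq parent root)))
      (setq node parent
            parent (treesit-node-parent node)))
    node))

;; In the patch above, this would replace (treesit-node-on start end)
;; as the first argument of `treesit-query-capture'.
```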

Yuan



