[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Comments to the new tree sitter implementation
From: |
Theodor Thornhill |
Subject: |
Comments to the new tree sitter implementation |
Date: |
Sat, 23 Apr 2022 14:33:04 +0200 |
Hi there!
I think I finally understand most of the important bits of the tree
sitter implementation as seen from a major mode maintainer, and I want
to share my thoughts so that we can have something concrete to work on
when the feature branch hits.
1. New parent-beginning-of-line preset
For typescript-mode more often that not you want to find some close
parent, then go to beginning of line to calculate the offset. A
suggested implementation of this is:
```elisp
(defun parent-beginning-of-line ()
(lambda (node parent bol &rest _)
(when-let ((parent (tree-sitter-node-parent node)))
(save-excursion
(goto-char (tree-sitter-node-start parent))
(back-to-indentation)
(tree-sitter-node-start
(tree-sitter-node-at (point) (point) 'tree-sitter-tsx))))))
```
This works as following:
```typescript
function foo() {
bar(() => ({
baz
}))
}
```
baz will know where to indent because it finds the closest {, then goes
to the start of 'bar' and returns that nodes position. The same applies
to how 'bar' finds its closest '{', then goes to the first letter of
'function'.
To do all this you need a rule such as
```elisp
((node-is "}" (parent-beginning-of-line) 0)
```
and
```elisp
((parent-is "object_or_some_other_parent" (parent-beginning-of-line) 2)
```
inside of the 'simple-indent-rules'
2. Performance of query preset
The performance of the query preset is not good. I finally understood
how the indentation engine worked by using that preset, but only after 4
or 5 queries indenting a file took several seconds. Either it should be
marked with a caveat, or improved in some way. I didn't get a
profiler-report before removing them. Removing them and supplying
equivalent rules by using the 'parent-beginning-of-line' function made
indenting huge files instantaneous.
3. Performance of font-locking
The font-locking also can be flaky, and I'm not sure what happens here
either. It may be the current implementation of typescript-mode, it
could be in tree-sitter. Does the complexity increase proportionally to
the granularity of the queries, maybe?
Steps to reproduce:
1. paste in some file, for example this:
https://raw.githubusercontent.com/tree-sitter/tree-sitter-javascript/master/grammar.js
2. start scrolling using 'C-n'
3. observe enourmous lags when scrolling.
This time I got a profiler report:
```
15588 97% - command-execute
15588 97% - call-interactively
15417 96% - funcall-interactively
15417 96% - previous-line
15417 96% - line-move
13677 85% - line-move-visual
13655 85% - vertical-motion
13655 85% - jit-lock-function
13651 85% - jit-lock-fontify-now
13651 85% - jit-lock--run-functions
13651 85% - run-hook-wrapped
13651 85% - #<compiled -0x156eafb94cdd4083>
13651 85% - font-lock-fontify-region
13647 85% - tree-sitter-font-lock-fontify-region
762 4% + tree-sitter-query-capture
112 0% + font-lock-default-fontify-region
4 0% + font-lock-unfontify-region
1716 10% - line-pixel-height
1716 10% - jit-lock-function
1716 10% - jit-lock-fontify-now
1716 10% - jit-lock--run-functions
1716 10% - run-hook-wrapped
1716 10% - #<compiled -0x156ecffbbac1d883>
1716 10% - font-lock-fontify-region
1716 10% - tree-sitter-font-lock-fontify-region
108 0% - tree-sitter-query-capture
104 0% - tree-sitter-expand-pattern
36 0% - tree-sitter-expand-pattern
8 0% tree-sitter-expand-pattern
12 0% - font-lock-default-fontify-region
12 0% - font-lock-fontify-syntactically-region
12 0% syntax-ppss
171 1% + byte-code
156 0% + redisplay_internal (C function)
89 0% + jsonrpc--process-filter
63 0% + eldoc-pre-command-refresh-echo-area
17 0% + timer-event-handler
4 0% internal-timer-start-idle
0 0% + ...
```
As you can see most of the time is spent inside of
'tree-sitter-font-lock-fontify-region'. I'm not sure how to get even
more granular information than this.
4. How do I make sure that the null node indents without the catch all
I want to be able to type this:
```typescript
const foo = () => {
| // This is the indentation I get
}
```
```typescript
const foo = () => {
| // This is the indentation I want
}
```
In some cases the parser returns null, which I can match on using the
provided preset. However, usually tree-sitter-tsx returns something
like:
```elisp
(some_parent (object))
```
where object is not a null node. I don't want to match on this
specifically, for two reasons
1. I have to make rules for every case where this happens
2. I need to be mindful of the precedence of rules inside of the
simple-indent-rules
This is because when I start to type something, the ast changes to:
```elisp
(some_parent (object (some_child)))
```
It is the 'some_child' I want to indent, not the object itself. As such
a rule such as:
```elisp
((parent-is "object") parent 2)
```
will not work for this problem. It will indent the 'some_child', but
not the cursor inside of the squigglies.
Is there a way to do this with a simple preset?
5. Bugs in error-handling during font-locking
When typing it seems tree-sitter-tsx returns ERROR (which is expected
during typing), but then emacs shifts around the font-locking and
returning broken font-locking. Sometimes it recovers, sometimes it does
not. Maybe we shouldn't font-lock when there are errors, or at least
don't change the font-locking from outside of the node where there is an
error? Now it looks like the whole file errors out, but it is really
only one node, or maybe it bleeds into its sibling.
I attach the screenshots I've already sent to make documentation a
little easier.
Lastly I want to say that I'm really impressed by this, Yuan! Thank you,
and I'm looking forward to getting it mainlined.
I'll also add the link to the typescript-mode I'm making atm, so that
the context of what I'm talking about may be a little clearer:
https://github.com/emacs-typescript/typescript.el/blob/feature/tsx-support/typescript-ts.el
All the best,
Theodor
correct.png
Description: PNG image
wrong.png
Description: PNG image
- Comments to the new tree sitter implementation,
Theodor Thornhill <=