[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#70077: An easier way to track buffer changes
From: |
Ihor Radchenko |
Subject: |
bug#70077: An easier way to track buffer changes |
Date: |
Tue, 02 Apr 2024 16:21:25 +0000 |
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> I'm not sure how to combine the benefits of combining small changes into
>>> larger ones with the benefits of keeping distant changes separate.
>> I am not sure if combining small changes into larger ones is at all a
>> good idea.
>
> In my experience, if the amount of work to be done "per change" is not
> completely trivial, combining small changes is indispensable (the worst
> part being that the cases where it's needed are sufficiently infrequent
> that naive users of `*-change-functions` won't know about that, so we
> end up with unacceptable slowdowns in corner cases).
I agree, but only when subsequent changes intersect each other.
When distant changes are combined together, they cover a lot of buffer
text that has not been changed - they may trigger a lot more work
compared to more granular changes. Consider, for example, two changes
near the beginning and the end of the buffer:
(1..10) and (1000000...(point-max))
If such changes are tracked by tree-sitter or other parser, there is a
world of difference between requesting to re-parse individual segments
and the whole buffer.
>> The changes are stored in strings, which get allocated and
>> re-allocated repeatedly.
>
> Indeed. Tho for N small changes, my code should perform only O(log N)
> re-allocations.
May you explain a bit more about this?
My reading of `track-changes--before' is that `concat' is called on
every change except when a new change is already inside the previously
recorded one. (aside: I have a hard time reading the code because of
confusing slot names: bbeg vs beg??)
>> Repeated string allocations, especially when strings keep growing
>> towards the buffer size, is likely going to increase consing and make
>> GCs more frequent.
>
> Similar allocations presumably take place anyway while running the code
> (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be
> "lost in the noise".
I disagree. `buffer-undo-list' only includes the text that has been
actually changed. It never creates strings that span between
distant regions. As a terminal case, consider alternating between first
and second half of the buffer, starting in the middle and editing
towards the buffer boundaries - this will involve re-allocation of
buffer-long strings on average.
>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>> which avoids the last problem. But it's a fair bit more work for us, it
>>> makes the API more complex for the clients, and it's rarely what the
>>> clients really want anyway.
>> FYI, Org parser maintains such a list.
>
> Could you point me to the relevant code (I failed to find it)?
That code handles a lot more than just changes, so it might not be the
best reference. Anyway...
See the docstring of `org-element--cache-sync-requests', and
the branch in `org-element--cache-submit-request' starting from
;; Current changes can be merged with first sync request: we
;; can save a partial cache synchronization.
>> We previously discussed a similar API in
>> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/
>
> IIUC this discusses a *sequence* of edits. In the point to which you
> replied I was discussing keeping a list of *simultaneous* edits.
Hmm. By referring to buffer-undo-list, I meant that intersecting edits
will be automatically merged, as it is usually done in undo system.
I am also not 100% sure why edits being simultaneous is any relevant.
Maybe we are talking about different things?
What I was talking about is (1) automatically merging subsequent edits
like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
intersect with previous edited region, record it separately without
merging.
> This said, the downside in both cases is that the specific data that we
> need from such a list tends to depend on the user. E.g. you suggest
>
> (BEG END_BEFORE END_AFTER COUNTER)
>
> but that is not sufficient reconstruct the corresponding buffer state,
> so things like Eglot/CRDT can't use it. Ideally for CRDT I think you'd
> want a sequence of
>
> (BEG END-BEFORE STRING-AFTER)
Right. It's just that Org mode does not need STRING-AFTER, which is why
I did not think about it in my proposal. Of course, having STRING-AFTER
is required to get full reconstruction of the buffer state.
> but for Eglot this is not sufficient because Eglot needs to convert BEG
> and END_BEFORE into LSP positions (i.e. "line+col") and for that it
> needs to reproduce the past buffer state. So instead, what Eglot needs
> (and does indeed build using `*-change-functions`) is a sequence of
>
> (LSP-BEG LSP-END-BEFORE STRING-AFTER)
>
> [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure
> exactly why, but I guess it's a piece of redundant info the servers
> can use to sanity-check the data? ]
It is to support deprecated LSP spec (I presume that older LSP servers
may still be using the old spec):
https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/
export type TextDocumentContentChangeEvent = {
* The range of the document that changed.
range: Range;
* The optional length of the range that got replaced.
* @deprecated use range instead.
rangeLength?: number;
* The new text for the provided range.
text: string;
> The code changes were actually quite simple, so I have now included it.
> See my current PoC code below.
Am I missing something, or will `track-changes--signal-if-disjoint'
alter `track-changes--state' when it calls `track-changes-fetch' ->
`track-changes-reset'? Won't that interfere with the information passed
to non-disjoint trackers?
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
- bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/01
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/01
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/01
- bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/02
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/02
- bug#70077: An easier way to track buffer changes,
Ihor Radchenko <=
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/02
- bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/03
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/03
- bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/04