bug#70077: An easier way to track buffer changes

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#70077: An easier way to track buffer changes

From:	Ihor Radchenko
Subject:	bug#70077: An easier way to track buffer changes
Date:	Tue, 02 Apr 2024 16:21:25 +0000

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I'm not sure how to combine the benefits of combining small changes into
>>> larger ones with the benefits of keeping distant changes separate.
>> I am not sure if combining small changes into larger ones is at all a
>> good idea.
>
> In my experience, if the amount of work to be done "per change" is not
> completely trivial, combining small changes is indispensable (the worst
> part being that the cases where it's needed are sufficiently infrequent
> that naive users of `*-change-functions` won't know about that, so we
> end up with unacceptable slowdowns in corner cases).

I agree, but only when subsequent changes intersect each other.
When distant changes are combined together, they cover a lot of buffer
text that has not been changed - they may trigger a lot more work
compared to more granular changes. Consider, for example, two changes
near the beginning and the end of the buffer:
(1..10) and (1000000...(point-max))
If such changes are tracked by tree-sitter or other parser, there is a
world of difference between requesting to re-parse individual segments
and the whole buffer.

>> The changes are stored in strings, which get allocated and
>> re-allocated repeatedly.
>
> Indeed.  Tho for N small changes, my code should perform only O(log N)
> re-allocations.

May you explain a bit more about this?
My reading of `track-changes--before' is that `concat' is called on
every change except when a new change is already inside the previously
recorded one. (aside: I have a hard time reading the code because of
confusing slot names: bbeg vs beg??)

>> Repeated string allocations, especially when strings keep growing
>> towards the buffer size, is likely going to increase consing and make
>> GCs more frequent.
>
> Similar allocations presumably take place anyway while running the code
> (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be
> "lost in the noise".

I disagree. `buffer-undo-list' only includes the text that has been
actually changed. It never creates strings that span between
distant regions. As a terminal case, consider alternating between first
and second half of the buffer, starting in the middle and editing
towards the buffer boundaries - this will involve re-allocation of
buffer-long strings on average.

>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>> which avoids the last problem.  But it's a fair bit more work for us, it
>>> makes the API more complex for the clients, and it's rarely what the
>>> clients really want anyway.
>> FYI, Org parser maintains such a list.
>
> Could you point me to the relevant code (I failed to find it)?

That code handles a lot more than just changes, so it might not be the
best reference. Anyway...

See the docstring of `org-element--cache-sync-requests', and
the branch in `org-element--cache-submit-request' starting from
          ;; Current changes can be merged with first sync request: we
          ;; can save a partial cache synchronization.

>> We previously discussed a similar API in
>> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/
>
> IIUC this discusses a *sequence* of edits.  In the point to which you
> replied I was discussing keeping a list of *simultaneous* edits.

Hmm. By referring to buffer-undo-list, I meant that intersecting edits
will be automatically merged, as it is usually done in undo system.

I am also not 100% sure why edits being simultaneous is any relevant.
Maybe we are talking about different things?

What I was talking about is (1) automatically merging subsequent edits
like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
intersect with previous edited region, record it separately without
merging.

> This said, the downside in both cases is that the specific data that we
> need from such a list tends to depend on the user.  E.g. you suggest
>
>     (BEG END_BEFORE END_AFTER COUNTER)
>
> but that is not sufficient reconstruct the corresponding buffer state,
> so things like Eglot/CRDT can't use it.  Ideally for CRDT I think you'd
> want a sequence of
>
>     (BEG END-BEFORE STRING-AFTER)

Right. It's just that Org mode does not need STRING-AFTER, which is why
I did not think about it in my proposal. Of course, having STRING-AFTER
is required to get full reconstruction of the buffer state.

> but for Eglot this is not sufficient because Eglot needs to convert BEG
> and END_BEFORE into LSP positions (i.e. "line+col") and for that it
> needs to reproduce the past buffer state.  So instead, what Eglot needs
> (and does indeed build using `*-change-functions`) is a sequence of
>
>     (LSP-BEG LSP-END-BEFORE STRING-AFTER)
>
> [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure
>   exactly why, but I guess it's a piece of redundant info the servers
>   can use to sanity-check the data?  ]

It is to support deprecated LSP spec (I presume that older LSP servers
may still be using the old spec):

https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/
export type TextDocumentContentChangeEvent = {
         * The range of the document that changed.
        range: Range;

         * The optional length of the range that got replaced.
         * @deprecated use range instead.
        rangeLength?: number;

         * The new text for the provided range.
        text: string;

> The code changes were actually quite simple, so I have now included it.
> See my current PoC code below.

Am I missing something, or will `track-changes--signal-if-disjoint'
alter `track-changes--state' when it calls `track-changes-fetch' ->
`track-changes-reset'? Won't that interfere with the information passed
to non-disjoint trackers?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

[Prev in Thread]

Current Thread

[Next in Thread]

bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/01
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/01
  - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/01
    - bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/02
    - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/02
    - bug#70077: An easier way to track buffer changes, Ihor Radchenko <=
    - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/02
    - bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/03
    - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/03
    - bug#70077: An easier way to track buffer changes, Ihor Radchenko, 2024/04/04
- bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/05
  - bug#70077: An easier way to track buffer changes, Eli Zaretskii, 2024/04/06
    - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/08
    - bug#70077: An easier way to track buffer changes, Eli Zaretskii, 2024/04/08
    - bug#70077: An easier way to track buffer changes, Stefan Monnier, 2024/04/08
    - bug#70077: An easier way to track buffer changes, Andrea Corallo, 2024/04/08

Prev by Date: bug#70007: [PATCH] native JSON encoder
Next by Date: bug#70145: [PATCH] Add sqlite-execute-batch command
Previous by thread: bug#70077: An easier way to track buffer changes
Next by thread: bug#70077: An easier way to track buffer changes
Index(es):
- Date
- Thread