emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch suggestion] Mitigating the poor Emacs performance on huge org


From: Ihor Radchenko
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date: Tue, 19 May 2020 00:52:08 +0800

> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
> element is a O(N) operation by the number of elements before it in
> a section. In particular, it is not bounded, and not mitigated by
> a cache. For large documents, it is going to be unbearably slow, too.

Ouch. I thought it is faster.
What do you mean by "not mitigated by a cache"?

The reason I would like to utilise org-element parser to make tracking
modifications more robust. Using details of the syntax would make the
code fragile if any modifications are made to syntax in future.
Debugging bugs in modification functions is not easy, according to my
experience.

One possible way to avoid performance issues during modification is
running parser in advance. For example, folding an element may
as well add information about the element to its text properties.
This will not degrade performance of folding since we are already
parsing the element during folding (at least, in
org-hide-drawer-toggle).

The problem with parsing an element during folding is that we cannot
easily detect changes like below without re-parsing.

:PROPERTIES: <folded>
:CREATED: [2020-05-18 Mon]
:END: <- added line
:ID: test
:END:

or even

:PROPERTIES:
:CREATED: [2020-05-18 Mon]
:ID: test
:END: <- delete this line

:DRAWER: <folded, cannot be unfolded if we don't re-parse after deletion>
test
:END:

The re-parsing can be done via regexp, as you suggested, but I don't
like this idea, because it will end up re-implementing
org-element-*-parser. Would it be acceptable to run org-element-*-parser
in after-change-functions?

> If you use modification-hooks and al., you don't need to parse anything,
> because you can store information as text properties. Therefore, once
> the modification happens, you already know where you are (or, at least
> where you were before the change).

> The ideas I suggested about sensitive parts of elements are worth
> exploring, IMO. Do you have any issue with them?

If I understand correctly, it is not as easy.
Consider the following example:

:PROPERTIES:
:CREATED: [2020-05-18 Mon]
<region-beginning>
:ID: example
:END:

<... a lot of text, maybe containing other drawers ...>

Nullam rutrum.
Pellentesque dapibus suscipit ligula.
<region-end>
Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.

If the region gets deleted, the modification hooks from chars inside
drawer will be called as (hook-function <region-beginning>
<region-end>). So, there is still a need to find the drawer somehow to
mark it as about to be modified (modification hooks are ran before
actual modification).

The only difference between using modification hooks and
before-change-functions is that modification hooks will trigger less
frequently. Considering the performance of org-element-at-point, it is
probably worth doing. Initially, I wanted to avoid it because setting a
single before-change-functions hook sounded cleaner than setting
modification-hooks, insert-behind-hooks, and insert-in-front-hooks.
Moreover, these text properties would be copied by default if one uses 
buffer-substring. Then, the hooks will also trigger later in the yanked
text, which may cause all kinds of bugs.

> `org-element-at-point' is local, `org-element-parse-buffer' is global.
> They are not equivalent, but is it an issue?

It was mostly an annoyance, because they returned different results on
the same element. Specifically, they returned different :post-blank and
:end properties, which does not sound right.

Best,
Ihor




Nicolas Goaziou <address@hidden> writes:

> Hello,
>
> Ihor Radchenko <address@hidden> writes:
>
>> Apparently my previous email was again refused by your mail server (I
>> tried to add patch as attachment this time).
>
> Ah. This is annoying, for you and for me.
>
>> The patch is in
>> https://gist.github.com/yantar92/6447754415457927293acda43a7fcaef
>
> Thank you.
>
>>> I have finished a seemingly stable implementation of handling changes
>>> inside drawer and block elements. For now, I did not bother with
>>> 'modification-hooks and 'insert-in-font/behind-hooks, but simply used
>>> before/after-change-functions.
>>>
>>> The basic idea is saving parsed org-elements before the modification
>>> (with :begin and :end replaced by markers) and comparing them with the 
>>> versions of the same elements after the modification.
>>> Any valid org element can be examined in such way by an arbitrary
>>> function (see org-track-modification-elements) [1].
>
> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
> element is a O(N) operation by the number of elements before it in
> a section. In particular, it is not bounded, and not mitigated by
> a cache. For large documents, it is going to be unbearably slow, too.
>
> I don't think the solution is to use combine-after-change-calls either,
> because even a single call to `org-element-at-point' can be noticeable
> in a very large section. Such low-level code should avoid using the
> Element library altogether, except for the initial folding part, which
> is interactive.
>
> If you use modification-hooks and al., you don't need to parse anything,
> because you can store information as text properties. Therefore, once
> the modification happens, you already know where you are (or, at least
> where you were before the change).
>
> The ideas I suggested about sensitive parts of elements are worth
> exploring, IMO. Do you have any issue with them?
>
>>> For (2), I have introduced org--property-drawer-modified-re to override
>>> org-property-drawer-re in relevant *-change-function. This seems to work
>>> for property drawers. However, I am not sure if similar problem may
>>> happen in some border cases with ordinary drawers or blocks. 
>
> I already specified what parts were "sensitive" in a previous message.
>
>>> 2. I have noticed that results of org-element-at-point and
>>> org-element-parse-buffer are not always consistent.
>
> `org-element-at-point' is local, `org-element-parse-buffer' is global.
> They are not equivalent, but is it an issue?
>
>
> Regards,
>
> -- 
> Nicolas Goaziou

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong 
University, Xi'an, China
Email: address@hidden, address@hidden



reply via email to

[Prev in Thread] Current Thread [Next in Thread]