Re: [patch suggestion] Mitigating the poor Emacs performance on huge org

emacs-orgmode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch suggestion] Mitigating the poor Emacs performance on huge org

From:	Ihor Radchenko
Subject:	Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date:	Tue, 02 Jun 2020 17:23:07 +0800

The patch (against 758b039c0) is attached.

featuredrawertextprop-20200602.patch
Description: Text Data

Ihor Radchenko <yantar92@gmail.com> writes:

> Hello,
>
> [The patch itself will be provided in the following email]
>
> I have three updates from the previous version of the patch:
>
> 1. I managed to implement buffer-local text properties.
>    Now, outline folding also uses text properties without a need to give
>    up independent folding in indirect buffers.
>
> 2. The code handling modifications in folded drawers/blocks was
>    rewritten. The new code uses after-change-functions to re-hide text
>    inserted in the middle of folded regions; and text properties to
>    unfold folded drawers/blocks if one changes BEGIN/END line.
>
> 3. [experimental] Started working on improving memory and cpu footprint
>    of the old code related to folding/unfolding. org-hide-drawer-all now
>    works significantly faster because I can utilise simplified drawer
>    parser, which require a lot less memory. Overall, I managed to reduce
>    Emacs memory footprint after loading all my agenda_files twice. The
>    loading is also noticeably faster.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the buffer-local text properties:
>
> I have found char-property-alias-alist variable that controls how Emacs
> calculates text property value if the property is not set. This variable
> can be buffer-local, which allows independent 'invisible states in
> different buffers.
>
> All the implementation stays in
> org--get-buffer-local-text-property-symbol, which takes care about
> generating unique property name and mapping it to 'invisible (or any
> other) text property.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the new implementation for tracking changes:
>
> I simplified the code as suggested, without using pairs of before- and
> after-change-functions.
>
> Handling text inserted into folded/invisible region is handled by a
> simple after-change function. After testing, it turned out that simple
> re-hiding text based on 'invisible property of the text before/after the
> inserted region works pretty well.
>
> Modifications to BEGIN/END line of the drawers and blocks is handled via
> 'modification-hooks + 'insert-behind-hooks text properties (there is no
> after-change-functions analogue for text properties in Emacs). The
> property is applied during folding and the modification-hook function is
> made aware about the drawer/block boundaries (via apply-partially
> passing element containing :begin :end markers for the current
> drawer/block). Passing the element boundary is important because the
> 'modification-hook will not directly know where it belongs to. Only the
> modified region (which can be larger than the drawer) is passed to the
> function. In the worst case, the region can be the whole buffer (if one
> runs revert-buffer).
>
> It turned out that adding 'modification-hook text property takes a
> significant cpu time (partially, because we need to take care about
> possible existing 'modification-hook value, see
> org--add-to-list-text-property). For now, I decided to not clear the
> modification hooks during unfolding because of poor performance.
> However, this approach would lead to partial unfolding in the following
> case:
>
> :asd:
> :drawer:
> lksjdfksdfjl
> sdfsdfsdf
> :end:
>
> If :asd: was inserted in front of folded :drawer:, changes in :drawer:
> line of the new folded :asd: drawer would reveal the text between
> :drawer: and :end:.
>
> Let me know what you think on this.
>
>> You shouldn't be bothered by the case you're describing here, for
>> multiple reasons.
>> 
>> First, this issue already arises in the current implementation. No one
>> bothered so far: this change is very unlikely to happen. If it becomes
>> an issue, we could make sure that `org-reveal' handles this.
>> 
>> But, more importantly, we actually /want it/ as a feature. Indeed, if
>> DRAWER is expanded every time ":BLAH:" is inserted above, then inserting
>> a drawer manually would unfold /all/ drawers in the section. The user is
>> more likely to write first ":BLAH:" (everything is unfolded) then
>> ":END:" than ":END:", then ":BLAH:".
>
> Agree. This allowed me to simplify the code significantly.
>
>> It seems you're getting it backwards. `before-change-functions' are the
>> functions being called with a possibly wide, imprecise, region to
>> handle:
>> 
>>     When that happens, the arguments to ‘before-change-functions’ will
>>     enclose a region in which the individual changes are made, but won’t
>>     necessarily be the minimal such region
>> 
>> however, after-change-functions calls are always minimal:
>> 
>>     and the arguments to each successive call of
>>     ‘after-change-functions’ will then delimit the part of text being
>>     changed exactly.
>> 
>> If you stick to `after-change-functions', there will be no such thing as
>> you describe.
>
> You are right here, I missed that before-change-functions are likely to
> be called on large regions. I thought that the regions are same for
> before/after-change-functions, but after-change-functions could be
> called more than 1 time. After second thought, your vision that it is
> mostly 0 or 1 times should be the majority of cases in practice.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on reducing cpu and memory footprint of org buffers:
>
> My simplified implementation of element boundary parser
> (org--get-element-region-at-point) appears to be much faster and also
> uses much less memory in comparison with org-element-at-point.
> Moreover, not all the places where org-element-at-point is called
> actually need the full parsed element. For example, org-hide-drawer-all,
> org-hide-drawer-toggle, org-hide-block-toggle, and
> org--hide-wrapper-toggle only need element type and some information
> about the element boundaries - the information we can get from
> org--get-element-region-at-point.
>
> The following version of org-hide-drawer-all seems to work much faster
> in comparison with original:
>
> (defun org-hide-drawer-all ()
>   "Fold all drawers in the current buffer."
>   (save-excursion
>     (goto-char (point-min))
>     (while (re-search-forward org-drawer-regexp nil t)
>       (when-let* ((drawer (org--get-element-region-at-point '(property-drawer 
> drawer)))
>                 (type (org-element-type drawer)))
>       (org-hide-drawer-toggle t nil drawer)
>       ;; Make sure to skip drawer entirely or we might flag it
>       ;; another time when matching its ending line with
>       ;; `org-drawer-regexp'.
>       (goto-char (org-element-property :end drawer))))))
>
> What do you think about the idea of making use of
> org--get-element-region-at-point in org code base?
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> Further work:
>
> 1. Look into other code using overlays. Specifically,
> org-toggle-custom-properties, Babel hashes, and narrowed table columns.
>
> Best,
> Ihor
>
> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> Hello,
>>
>> Ihor Radchenko <yantar92@gmail.com> writes:
>>
>>> I have five updates from the previous version of the patch:
>>
>> Thank you.
>>
>>> 1. I implemented a simplified version of element parsing to detect
>>> changes in folded drawers or blocks. No computationally expensive calls
>>> of org-element-at-point or org-element-parse-buffer are needed now.
>>>
>>> 2. The patch is now compatible with master (commit 2e96dc639). I
>>> reverted the earlier change in folding drawers and blocks. Now, they are
>>> back to using 'org-hide-block and 'org-hide-drawer. Using 'outline would
>>> achieve nothing when we use text properties.
>>>
>>> 3. 'invisible text property can now be nested. This is important, for
>>> example, when text inside drawers contains fontified links (which also
>>> use 'invisible text property to hide parts of the link). Now, the old
>>> 'invisible spec is recovered after unfolding.
>>
>> Interesting. I'm running out of time, so I cannot properly inspect the
>> code right now. I'll try to do that before the end of the week.
>>
>>> 4. Some outline-* function calls in org referred to outline-flag-region
>>> implementation, which is not in sync with org-flag-region in this patch.
>>> I have implemented their org-* versions and replaced the calls
>>> throughout .el files. Actually, some org-* versions were already
>>> implemented in org, but not used for some reason (or not mentioned in
>>> the manual). I have updated the relevant sections of manual. These
>>> changes might be relevant to org independently of this feature branch.
>>
>> Yes, we certainly want to move to org-specific versions in all cases.
>>
>>> 5. I have managed to get a working version of outline folding via text
>>> properties. However, that approach has a big downside - folding state
>>> cannot be different in indirect buffer when we use text properties. I
>>> have seen packages relying on this feature of org and I do not see any
>>> obvious way to achieve different folding state in indirect buffer while
>>> using text properties for outline folding.
>>
>> Hmm. Good point. This is a serious issue to consider. Even if we don't
>> use text properties for outline, this also affects drawers and blocks.
>>
>>> For now, I still used before/after-change-functions combination.
>>
>> You shouldn't.
>>
>>> I see the following problems with using only after-change-functions: 
>>>
>>> 1. They are not guaranteed to be called after every single change:
>>
>> Of course they are! See below.
>>
>>> From (elisp) Change Hooks:
>>> "... some complex primitives call ‘before-change-functions’ once before
>>> making changes, and then call ‘after-change-functions’ zero or more
>>> times"
>>
>> "zero" means there are no changes at all, so, `after-change-functions'
>> are not called, which is expected.
>>
>>> The consequence of it is a possibility that region passed to the
>>> after-change-functions is quite big (including all the singular changes,
>>> even if they are distant). This region may contain changed drawers as
>>> well and unchanged drawers and needs to be parsed to determine which
>>> drawers need to be re-folded.
>>
>> It seems you're getting it backwards. `before-change-functions' are the
>> functions being called with a possibly wide, imprecise, region to
>> handle:
>>
>>     When that happens, the arguments to ‘before-change-functions’ will
>>     enclose a region in which the individual changes are made, but won’t
>>     necessarily be the minimal such region
>>
>> however, after-change-functions calls are always minimal:
>>
>>     and the arguments to each successive call of
>>     ‘after-change-functions’ will then delimit the part of text being
>>     changed exactly.
>>
>> If you stick to `after-change-functions', there will be no such thing as
>> you describe.
>>
>>>> And, more importantly, they are not meant to be used together, i.e., you
>>>> cannot assume that a single call to `before-change-functions' always
>>>> happens before calling `after-change-functions'. This can be tricky if
>>>> you want to use the former to pass information to the latter.
>>>
>>> The fact that before-change-functions can be called multiple times
>>> before after-change-functions, is trivially solved by using buffer-local
>>> changes register (see org--modified-elements).
>>
>> Famous last words. Been there, done that, and it failed.
>>
>> Let me quote the manual:
>>
>>     In general, we advise to use either before- or the after-change
>>     hooks, but not both.
>>
>> So, let me insist: don't do that. If you don't agree with me, let's at
>> least agree with Emacs developers.
>>
>>> The register is populated by before-change-functions and cleared by
>>> after-change-functions.
>>
>> You cannot expect `after-change-functions' to clear what
>> `before-change-functions' did. This is likely to introduce pernicious
>> bugs. Sorry if it sounds like FUD, but bugs in those areas are just
>> horrible to squash.
>>
>>>> Well, `before-change-fuctions' and `after-change-functions' are not
>>>> clean at all: you modify an unrelated part of the buffer, but still call
>>>> those to check if a drawer needs to be unfolded somewhere.
>>>
>>> 2. As you pointed, instead of global before-change-functions, we can use
>>> modification-hooks text property on sensitive parts of the
>>> drawers/blocks. This would work, but I am concerned about one annoying
>>> special case:
>>>
>>> -------------------------------------------------------------------------
>>> :BLAH: <inserted outside any of the existing drawers>
>>>
>>> <some text>
>>>
>>> :DRAWER: <folded>
>>> Donec at pede.
>>> :END:
>>> -------------------------------------------------------------------------
>>> In this example, the user would not be able to unfold the folder DRAWER
>>> because it will technically become a part of a new giant BLAH drawer.
>>> This may be especially annoying if <some text> is more than one screen
>>> long and there is no easy way to identify why unfolding does not work
>>> (with point at :DRAWER:).
>>
>> You shouldn't be bothered by the case you're describing here, for
>> multiple reasons.
>>
>> First, this issue already arises in the current implementation. No one
>> bothered so far: this change is very unlikely to happen. If it becomes
>> an issue, we could make sure that `org-reveal' handles this.
>>
>> But, more importantly, we actually /want it/ as a feature. Indeed, if
>> DRAWER is expanded every time ":BLAH:" is inserted above, then inserting
>> a drawer manually would unfold /all/ drawers in the section. The user is
>> more likely to write first ":BLAH:" (everything is unfolded) then
>> ":END:" than ":END:", then ":BLAH:".
>>
>>> Because of this scenario, limiting before-change-functions to folded
>>> drawers is not sufficient. Any change in text may need to trigger
>>> unfolding.
>>
>> after-change-functions is more appropriate than before-change-functions,
>> and local parsing, as explained in this thread, is more efficient than
>> re-inventing the parser.
>>
>>> In the patch, I always register possible modifications in the
>>> blocks/drawers intersecting with the modified region + a drawer/block
>>> right next to the region.
>>>
>>> -----------------------------------------------------------------------
>>> -----------------------------------------------------------------------
>>>
>>> More details on the nested 'invisible text property implementation.
>>>
>>> The idea is to keep 'invisible property stack push and popping from it
>>> as we add/remove 'invisible text property. All the work is done in
>>> org-flag-region.
>>
>> This sounds like a good idea.
>>
>>> This was originally intended for folding outlines via text properties.
>>> Since using text properties for folding outlines is not a good idea,
>>> nested text properties have much less use.
>>
>> AFAIU, they have. You mention link fontification, but there are other
>> pieces that we could switch to text properties instead of overlays,
>> e.g., Babel hashes, narrowed table columns…
>>
>>> 3. Multiple calls to before/after-change-functions is still a problem. I
>>> am looking into following ways to reduce this number:
>>>  - reduce the number of elements registered as potentially modified
>>>    + do not add duplicates to org--modified-elements
>>>    + do not add unfolded elements to org--modified-elements
>>>    + register after-change-function as post-command hook and remove it
>>>      from global after-change-functions. This way, it will be called
>>>      twice per command only.
>>>  - determine common region containing org--modified-elements. if change
>>>    is happening within that region, there is no need to parse
>>>    drawers/blocks there again.
>>
>> This is over-engineering. Again, please focus on local changes, as
>> discussed before.
>>
>>> Recipe to have different (org-element-at-point) and
>>> (org-element-parse-buffer 'element)
>>> -------------------------------------------------------------------------
>>> <point-min>
>>> :PROPERTIES:
>>> :CREATED:  [2020-05-23 Sat 02:32]
>>> :END:
>>>
>>>
>>> <point-max>
>>> -------------------------------------------------------------------------
>>
>> I didn't look at this situation in particular, but there are cases where
>> different :post-blank values are inevitable, for example at the end of
>> a section.
>>
>> Regards,
>>
>> -- 
>> Nicolas Goaziou
>
> -- 
> Ihor Radchenko,
> PhD,
> Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
> State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong 
> University, Xi'an, China
> Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong 
University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/02
- Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko <=
  - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Bastien, 2020/06/02
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/02
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Bastien, 2020/06/02
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/02
- Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/02
- Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Nicolas Goaziou, 2020/06/05
  - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/05
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Nicolas Goaziou, 2020/06/05
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/08
    - Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, Ihor Radchenko, 2020/06/08

Prev by Date: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Next by Date: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Previous by thread: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Next by thread: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Index(es):
- Date
- Thread