emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Reliable after-change-functions (via: Using incremental parsing in E


From: Tuấn-Anh Nguyễn
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 2 Apr 2020 00:55:45 +0700

On Wed, Apr 1, 2020 at 8:26 PM Eli Zaretskii <address@hidden> wrote:
>
> > From: Tuấn Anh Nguyễn <address@hidden>
> > Date: Wed, 1 Apr 2020 13:17:42 +0700
> > Cc: address@hidden
> >
> > Real usage with "xdisp.c":
> >
> >     (define-advice tree-sitter--do-parse (:around (f &rest args) benchmark)
> >       (message "%s" (benchmark-run (apply f args))))
> >
> >     (0.257998 1 0.13326100000000096)
>
> And that is even without encoding the buffer text, IIUC what the
> package does.
>
> > So yes, direct access to buffer's text from dynamic modules would be nice.
>
> Did you consider using the API where an application can provide a
> function to return text at a given offset?  Such a function could be
> relatively easily implemented for Emacs.
>

I don't understand what you mean. Below I'll explain how it works
currently.

`ts-parse' uses the Tree-sitter's API that consumes text in chunks:

    TSTree *ts_parser_parse(
      TSParser *self,
      const TSTree *old_tree,
      TSInput input
    );

    typedef struct {
      void *payload;
      const char *(*read)(
        void *payload,
        uint32_t byte_offset,
        TSPoint position,
        uint32_t *bytes_read
      );
      TSInputEncoding encoding;
    } TSInput;

Because dynamic modules don't have direct access to buffer text,
`ts-parse' uses the module function `copy_string_contents', and exposes
this interface:

    (ts-parse PARSER INPUT-FUNCTION OLD-TREE)

Here INPUT-FUNCTION must return a chunk of the buffer text, starting
from the given byte offset, as a Lisp string. `ts-buffer-input' is one
such function.

So:

1. Chunks of the buffer text are copied into Lisp strings, through
   `buffer-substring-no-properties'.
2. These Lisp strings are copied into buffers of null-terminated utf-8
   bytes, through `copy_string_contents'.
3. All these temporary Lisp strings create GC pressure. In the xdisp.c
   example, it was 100ms for GC, in addition to 150ms for parsing.
4. emacs-module-rs has an automatic, blanket workaround for this bug
   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31238. The workaround
   involves pairs of `make_global_ref' and `free_global_ref' calls, on
   all "suspected" `emacs_value's.

#4 can be avoided if emacs-module-rs allows selectively disabling the
blanket workaround. It's band-aid on top of band-aid, but at least it's
workable.

#3 can probably be alleviated by increasing the chunk size.

However, they are consequences of #1 and #2. If dynamic modules have
direct access to the buffer text, none of the above is an issue.

Such direct access can be enabled by something like this:

    char* (*access_buffer_text) (emacs_env *env,
                                 emacs_value buffer,
                                 ptrdiff_t byte_offset,
                                 ptrdiff_t *size_inout);

Of course, such an API would require extensive documentation on how it
must be used, to ensure safety and correctness.

> Btw, what do you do with the tree returned by the tree-sitter parser?
> store it in some buffer-local variable?  If so, how much memory does
> such a tree take, and when, if ever, is that memory released?
>

It's stored in a buffer-local variable. I haven't measured the memory
they take. Memory is released when the tree object is garbage-collected
(it's a `user-ptr').


--
Tuấn-Anh Nguyễn
Software Engineer



reply via email to

[Prev in Thread] Current Thread [Next in Thread]