Re: Reliable after-change-functions (via: Using incremental parsing in E


From: Eli Zaretskii
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 02 Apr 2020 18:02:06 +0300

> From: Tuấn-Anh Nguyễn <address@hidden>
> Date: Thu, 2 Apr 2020 11:21:49 +0700
> Cc: address@hidden
> 
> > Buffer text is not exactly UTF-8, it's a superset of UTF-8.  So one
> > question to answer is what to do with byte sequences that are not
> > valid UTF-8.  Any suggestions or ideas?  How does tree-sitter handle
> > invalid byte sequences in general?
> >
> 
> I haven't checked yet. It will probably bail out, which is usually the
> desired behavior.

"Bail out" meaning that this breaks the parse?  I'd be surprised if
that was what happens in these cases.  But if it does, we will need to
replace such sequences with something like U+FFFD in the reader
function we provide.
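
To make the U+FFFD idea concrete, here is a minimal sketch in Python
(not Emacs code).  The name sanitize_utf8 is hypothetical, and the
sketch assumes the reader hands over a chunk that starts and ends on
character boundaries.  It only shows the substitution itself, not how
a real reader function would be wired into the parser.

```python
# U+FFFD REPLACEMENT CHARACTER encoded as UTF-8 (three bytes).
REPLACEMENT = "\ufffd".encode("utf-8")


def sanitize_utf8(raw: bytes) -> bytes:
    """Return RAW with every invalid UTF-8 sequence replaced by U+FFFD.

    Hypothetical helper: decoding with errors="replace" substitutes
    U+FFFD for bytes that cannot be decoded, and re-encoding yields
    valid UTF-8 that a strict consumer can accept.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8")
```

One caveat: the replacement can change the length of the text (a
single stray byte becomes three), so a consumer that tracks byte
offsets would need them mapped back to buffer positions.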

> With direct access, no Lisp code will be run between these calls.

Then this issue is taken care of.

> > Next, I'm still asking whether parsing the whole buffer when it is
> > first created is necessary.  Can we pass to the parser just a small
> > chunk (say, 500 bytes) of the buffer around the window-full to be
> > displayed next?  If this presents problems, what are those problems?
> >
> 
> In principle (not in tree-sitter ATM), and in very specific cases, yes.
> IMO that's the wrong focus on a premature optimization anyway.

I tried to explain elsewhere why I don't think this is premature.
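
For illustration, the range computation for "a small chunk around the
window-full" could look roughly like the Python sketch below.  The
names chunk_around and is_continuation are hypothetical, the text is
assumed to be plain UTF-8, and the 500-byte margin is just the figure
mentioned above.  Whether a parser can do something useful with such
a fragment is exactly the open question; this only shows how to pick
the bytes.

```python
def is_continuation(byte: int) -> bool:
    """True if BYTE is a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def chunk_around(buf: bytes, win_start: int, win_end: int, margin: int = 500):
    """Return (start, end) byte offsets of a chunk around the window.

    The window [win_start, win_end) is widened by MARGIN bytes on each
    side, clamped to the buffer, and then extended outwards so that
    neither end falls in the middle of a multibyte character.
    """
    start = max(0, win_start - margin)
    end = min(len(buf), win_end + margin)
    # Step back to the lead byte of the character containing START.
    while start > 0 and is_continuation(buf[start]):
        start -= 1
    # Step forward past the rest of a character straddling END.
    while end < len(buf) and is_continuation(buf[end]):
        end += 1
    return start, end
```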

> As others noted, even in the pathological case of xdisp.c, the
> performance is acceptable.

xdisp.c is not a pathological case for me; I edit it very frequently.
More importantly, parsing the whole buffer up front scales poorly.

> Also keep in mind that syntax highlighting is just one
> application. Other use cases usually want a full parse tree.

Other applications have different restrictions and requirements, so
trying to satisfy all of them at once might not be the best way.

> If we really want to tackle this issue, there are other approaches to
> consider, e.g. background parsing, or parsing up until a time limit, and
> resume parsing when Emacs is idle. Tree-sitter's API supports the
> latter.

JIT-lock already supports background fontification (see
jit-lock-stealth-time), so using such parsers from jit-lock gives that
to you at almost no cost.
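
The general shape of "work until a time limit, resume when idle" can
be sketched as follows, in Python and independent of both jit-lock and
tree-sitter.  The class ResumableJob and its methods are hypothetical
names; the clock is injectable only so that the behavior can be
checked deterministically.

```python
import time


class ResumableJob:
    """Process a list of work items in time-limited slices."""

    def __init__(self, items, clock=time.monotonic):
        self._items = list(items)
        self._next = 0          # index of the first unprocessed item
        self._clock = clock
        self.done_items = []    # results accumulated so far

    @property
    def finished(self):
        return self._next >= len(self._items)

    def run_for(self, budget_seconds, work):
        """Apply WORK to pending items until the budget is used up.

        At least one item is processed per call, so repeated calls
        always make progress.  Returns True once nothing is left.
        """
        deadline = self._clock() + budget_seconds
        while not self.finished:
            self.done_items.append(work(self._items[self._next]))
            self._next += 1
            if self._clock() >= deadline:
                break
        return self.finished
```

An idle timer would call run_for repeatedly with a small budget until
it returns True, and would start over when the text changes.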

> > IOW, the issue with exposing access to buffer text to modules is IMO
> > secondary.  My suggestion is first to figure out how to do this stuff
> > efficiently from within Emacs itself, as if the module interface were
> > not part of the equation.  We can add that aspect back later.
> >
> 
> My opinion is that it's better to experiment with this kind of stuff
> out-of-core. It can move forward faster that way, allowing more lessons
> to be learned. Real lessons, involving real-world use cases, not thought
> exercises.

I'm talking about trying different design ideas.  It is best to do
that without being limited by what modules can and cannot do.
Building a hacked version of Emacs to test those ideas doesn't
necessarily contradict the desire to collect real-life experience.

IOW, I suggest testing alternative design ideas that are not based on
copying portions of the buffer via Lisp strings.  If those ideas are
workable (and I think they are), they will support a more scalable
implementation that exerts less memory pressure on Emacs and on the
host system.
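
As one illustration of a design that avoids copying, the Python sketch
below models buffer text as two contiguous segments separated by a
gap, and gives the consumer a read function that returns a view of
the text starting at a given offset.  This is a toy model, not Emacs
internals: GapBuffer, read and read_all are hypothetical names, and
the only assumption about the parser is that it can pull text
piecewise through a callback that takes a byte offset.

```python
class GapBuffer:
    """Toy text buffer: BEFORE bytes, an unused gap, then AFTER bytes."""

    def __init__(self, before: bytes, gap_size: int, after: bytes):
        self._data = bytearray(before) + bytearray(gap_size) + bytearray(after)
        self._gap_start = len(before)
        self._gap_end = len(before) + gap_size

    def __len__(self):
        # Logical length excludes the gap.
        return len(self._data) - (self._gap_end - self._gap_start)

    def read(self, offset: int) -> memoryview:
        """Return a zero-copy view of the text starting at OFFSET.

        The view runs to the end of the segment that contains OFFSET,
        and is empty when OFFSET is at or past the end of the text.
        """
        view = memoryview(self._data)
        if offset < self._gap_start:
            return view[offset:self._gap_start]
        physical = offset + (self._gap_end - self._gap_start)
        return view[physical:]


def read_all(buf: GapBuffer) -> bytes:
    """Drive the reader the way a pull-based consumer would."""
    chunks = []
    offset = 0
    while True:
        chunk = buf.read(offset)
        if len(chunk) == 0:
            break
        chunks.append(bytes(chunk))  # copied here only to build the result
        offset += len(chunk)
    return b"".join(chunks)
```

The views share storage with the buffer, so they are valid only as
long as the buffer is not modified, which is why it matters that no
Lisp code runs between the calls.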

HTH


