emacs-devel
Re: Question collaborative editing.


From: Ergus
Subject: Re: Question collaborative editing.
Date: Thu, 1 Oct 2020 17:55:50 +0200

On Thu, Oct 01, 2020 at 04:40:59PM +0300, Eli Zaretskii wrote:
Date: Thu, 1 Oct 2020 01:11:59 +0200
From: Ergus <spacibba@aol.com>
Cc: Qiantan Hong <qhong@mit.edu>, fmfs@posteo.net, bugs@gnu.support,
        npostavs@gmail.com, emacs-devel@gnu.org, kfogel@red-bean.com,
        monnier@iro.umontreal.ca

If I understood Qiantan's idea, his approach is to add such an ID as a
text property within the Emacs buffer. To modify the buffer, it is only
necessary to search for (go to) the property ID of the text
chars/words/whatever in the Emacs buffer and perform the action. So the
error probability is smaller because the IDs ARE in the text itself.

IMO, that is not a good idea, to put it mildly.  Since these
properties can potentially span very few characters, or even change
the value at each buffer position in extreme cases, they will most
probably slow down redisplay, perhaps even significantly so.  The
problem here is that the display engine always knows where the next
"stop position" is, i.e. the next buffer position at which text
properties change values.  Between the stop positions, the display engine runs at
full throttle, using the information about properties (notably, faces)
and overlays it computed at the last stop position.  But when it comes
to the next stop position, it stops and reconsiders all the stuff that
can affect display: text properties (faces, invisibility), overlays,
composed characters, etc., something that requires non-trivial
processing.  Having text properties change too often will cause
slowdown, even though these properties don't affect redisplay.
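
To make the stop-position argument concrete, here is a toy model in
Python. It is not Emacs code and does not measure redisplay; it only
counts how many positions a scan would have to stop at, assuming one
property value per buffer position. The function name and the run length
of 50 characters are made up for the illustration.

```python
from itertools import groupby


def stop_positions(props):
    """Return the indices at which the property value changes.

    `props` holds one property value per buffer position. Index 0 counts
    as a stop, since the scan has to start somewhere.
    """
    stops = []
    pos = 0
    for _value, run in groupby(props):
        stops.append(pos)
        pos += sum(1 for _ in run)
    return stops


n = 10_000
# Text whose property changes only every 50 characters (long runs).
faces = [i // 50 for i in range(n)]
# A distinct ID on every character, as in the per-character ID scheme.
char_ids = list(range(n))

print(len(stop_positions(faces)))     # 200
print(len(stop_positions(char_ids)))  # 10000
```

With a unique value on every character, every position becomes a stop,
so the expensive per-stop work described above would run once per
character instead of once per run.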

You can measure this slowdown by putting some text property on each
character in a buffer (the property should have a different value for
each character), then timing redisplay in a large enough buffer with
such properties.  For example, lean on the DOWN arrow and see how long
it takes to scroll through a large file; or run one of the scroll-down
benchmarks that were posted here in the past.

Maybe I'm being overly pessimistic, but I expect slower redisplay with
so many text property changes.  So I definitely wouldn't recommend
going that way.

Oh, you are right. Now that I remember some of that code in the display
engine, you are totally right.

This will reduce undetected errors from inserting at the wrong position
when translating from the CRDT ID to the real "global" buffer position
to do an "insert". That could happen in some corner cases (let's say,
basically, synchronization errors between local modifications, which
modify local absolute positions, and the processing of remote ones using
outdated information when translating to global indices)...
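
The corner case described here can be sketched in a few lines of Python.
This is only an illustration under assumed names (`Char`,
`insert_after_id` and the ID strings are hypothetical), not the design
of any of the implementations discussed in this thread, and the linear
scan is there for brevity only.

```python
from dataclasses import dataclass


@dataclass
class Char:
    id: str      # hypothetical stable identifier, unique per character
    value: str


def text(doc):
    return "".join(c.value for c in doc)


def insert_after_id(doc, anchor_id, new_char):
    """Insert new_char right after the character whose id is anchor_id."""
    for i, c in enumerate(doc):
        if c.id == anchor_id:
            doc.insert(i + 1, new_char)
            return
    raise KeyError(anchor_id)


def make_doc():
    return [Char(f"a{i}", ch) for i, ch in enumerate("hello")]


# A remote peer wants "!" after the final "o"; on its copy that is index 5.
# Meanwhile a local edit prepends ">", shifting every absolute position.

# 1) Applying the remote insert with the stale absolute index.
by_index = make_doc()
by_index.insert(0, Char("b0", ">"))
by_index.insert(5, Char("r0", "!"))
print(text(by_index))  # >hell!o

# 2) Applying the same insert anchored on the ID of the "o".
by_id = make_doc()
by_id.insert(0, Char("b0", ">"))
insert_after_id(by_id, "a4", Char("r0", "!"))
print(text(by_id))     # >hello!
```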

I'm not sure we need to implement this in Emacs.  For example, Tandem
doesn't require this from its plugins; presumably, it builds the
character IDs internally?  You could look at its implementation of
CRDT and take the ideas from there; AFAIU, it requires only a couple
of very simple operations to be implemented by plugins.

Indeed; that's why I was considering the external library in C. In
general it is actually feasible. Sadly, the translation from Tandem to C
will require some time because it is split into a mixture of code, none
of it in C... but if you think it is better we could do that (with
SOME time).

Anyway, we could wait for Qiantan's implementation and consider some
optimizations if you think it is worth the effort.

I also am not sure we should disregard the OT-based designs.  It is
true that they scale worse than CRDTs, but do we really envision
groups of tens, let alone hundreds or thousands of users working at
the same time in Emacs on the same document?  For smaller groups,
OT-based design might be good enough, and if it is simpler to
implement and requires less from Emacs core, maybe this possibility
should also be considered?
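
For readers unfamiliar with OT, the core idea can be shown with a
minimal Python sketch that handles only concurrent inserts between two
sites. The `Insert` type, the `site` tie-breaker and the function names
are assumptions made for this example; a real OT system also has to
handle deletions, more than two sites, and operation history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int   # hypothetical site number, used only to break ties


def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Rewrite op so it can be applied after other has been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


base = "emacs"
a = Insert(0, "GNU ", site=1)     # made at site 1
b = Insert(5, " rocks", site=2)   # made concurrently at site 2

# Each site applies its own operation first, then the transformed remote one.
site1 = apply(apply(base, a), transform(b, a))
site2 = apply(apply(base, b), transform(a, b))
print(site1)  # GNU emacs rocks
print(site2)  # GNU emacs rocks
```

Both sites converge on the same text even though they applied the two
operations in different orders.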

Indeed. If we use an external library, one of the advantages is that we
could have both, because the Emacs side will be exactly the same: it
only receives a "list" of changes to apply, then a set of external
ones. And the network implementation and so on will be more or less the
same. IMO the hardest part of this is all the network management, the
server and so on... not the CRDT or OT implementations.
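
A rough Python sketch of what such a backend-agnostic "list of changes"
could look like follows. The tuple layout `(kind, pos, payload)` and the
function name are invented for the example; the assumption is that each
position refers to the text as it stands after the previous change in
the list, whichever algorithm (CRDT or OT) produced the list.

```python
def apply_changes(text, changes):
    """Apply a list of (kind, pos, payload) changes in order.

    For "insert", payload is the string to insert at pos.
    For "delete", payload is the number of characters to remove at pos.
    """
    for kind, pos, payload in changes:
        if kind == "insert":
            text = text[:pos] + payload + text[pos:]
        elif kind == "delete":
            text = text[:pos] + text[pos + payload:]
        else:
            raise ValueError(f"unknown change kind: {kind!r}")
    return text


changes = [("delete", 0, 5), ("insert", 0, "goodbye")]
print(apply_changes("hello world", changes))  # goodbye world
```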

My previous concerns about the internal storage and performance were
because the linear search of text properties in a long linear buffer may
be very expensive, as described in the referenced paper;

Text property search in Emacs is not linear; it uses interval trees.
The problem is not in the search, the problem is in how often we will
need to do this when we traverse the buffer text, as redisplay does.
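
The distinction between the cost of one search and the number of
searches can be illustrated with a small Python stand-in. It uses a
sorted list with binary search rather than a tree, which is an
assumption made for brevity: it gives the same logarithmic lookup, but
it is not how Emacs stores intervals. The class and method names are
hypothetical.

```python
from bisect import bisect_right


class PropertyRuns:
    """Contiguous property runs; starts[i] is the first position of run i."""

    def __init__(self, starts, values):
        self.starts = starts    # sorted, starts[0] == 0
        self.values = values    # values[i] applies from starts[i] onwards

    def value_at(self, pos):
        # Logarithmic lookup of the run that contains pos.
        return self.values[bisect_right(self.starts, pos) - 1]

    def next_change(self, pos):
        # Position of the next run boundary after pos, or None at the end.
        i = bisect_right(self.starts, pos)
        return self.starts[i] if i < len(self.starts) else None


runs = PropertyRuns([0, 5, 12], ["default", "bold", "default"])
print(runs.value_at(7))      # bold
print(runs.next_change(7))   # 12
print(runs.next_change(12))  # None
```

Each lookup is cheap, but a scan that walks the text has to do one per
run boundary, so a property that changes on every character means one
lookup per character.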


