return to table of content

Movable tree CRDTs and Loro's implementation

wim
8 replies
1d4h

We're building a new multiplayer editor for tasks/notes [1] which supports both text and outliner operations. Although it behaves like a flat text document, the outliner features essentially turn the document into a large tree under the hood. We do something similar to the highly-available move operation to sync changes:

There is one operation to change the tree, called insmov (move-or-insert). Whenever a client is online it can sync changes C to a server. Whenever the server has remote changes for us, it will send us back a list R of all changes since our last sync in a global linear order. We then undo any of the insmovs in our changeset C, and (re)apply all changes in R + any new changes we didn't sync yet.

We don't use any fractional indices though. Instead, our insmov tuple not only contains a parent P, but also a previous sibling guid A. Because all tree ops will eventually be applied in the global linear order as determined by the server, "sorting" is handled by just using the insmov operation.

Most of the time the undo'ing of operations is not needed though. Only when the server has insmov changes we don't know about while we are sending new insmovs ourselves do we need to ensure we replay the operations in the correct order. That's likely to happen when you reconnect to wifi after a long flight, but not so likely when updates are pushed in real-time over websocket when you're online (plus it's not needed for non-insmov operations, like updating text).

[1] https://thymer.com

patrick91
3 replies
1d4h

what rich text are you using?

wim
2 replies
1d3h

We built it from scratch, so not based on prosemirror or contenteditable or anything like that (as we needed something which feels as if you're just editing text but also supports outlining features)

tomcam
0 replies
20h0m

Does it to RTL and vertical text? Full Unicode?

patrick91
0 replies
1d2h

very cool! can't wait to try I'm doing a note editor as well, and would love to have great outlining support!

mweidner
1 replies
1d2h

We don't use any fractional indices though. Instead, our insmov tuple not only contains a parent P, but also a previous sibling guid A. Because all tree ops will eventually be applied in the global linear order as determined by the server, "sorting" is handled by just using the insmov operation.

For what it's worth, this sounds equivalent to the RGA list CRDT [1], using the server's global linear order as a logical timestamp (in place of e.g. Lamport timestamps).

[1] https://inria.hal.science/inria-00555588/

wim
0 replies
1d2h

Right but rather than working on an array it's combined with a tree operation in this case, so if someone drags a task to reorder but someone else moves it to another parent it won't cause (cycle) conflicts

wim
0 replies
1d3h

Hah well there is definitely some scope/vision creep involved, and it all took a bit longer than planned. Not 15 years though! (that's about the very first app we ever made, which we only keep online for existing users). We've been working on this new project as a team of 2 for almost three years now. We really wanted to get it right so we spent a lot of time building the editor/IDE completely from scratch, as well as all the other stuff like the syncing layer (which is how I became interested in the topic of CRDTs and such).

billconan
7 replies
1d4h

I wonder if there has been any practical CRDT for data dense applications, such as images (pixels) and 3D models?

danielvaughn
1 replies
1d4h

I'm not sure that CRDTs would be necessary for image editing, since all conflicting edits could easily be resolved with a last-writer-wins approach. 3D models are a different beast, and I haven't seen any collaborative 3D modeling tool on the market (though I haven't actively searched).

okcdz
0 replies
1d2h

There is a 3D modelling tool called Spline supports multiplayer editing. I suppose it's using OT

paulgb
0 replies
1d

A cool example I've seen is Modyfi, which is a non-destructive editor for raster graphics. They use Yjs to represent the data, but instead of storing raw pixels, they are storing the entire history of transformations.

https://digest.browsertech.com/archive/browsertech-digest-ho...

jitl
0 replies
1d

With any collaborative application, you need to start with a conceptual framework for the edits a user may perform, and how to best preserve the intention of the user (1) and the coherency of the resulting document (2) when any such edits may occur asynchronously. Even if a document is data-intensive in its concrete representation, the way you encode the user's discrete edits & operations can still be tiny.

Let's say we're building an image editor like Photoshop. An uncompressed 102 megapixel image with 16-bit color depth per channel (a photo from a Fujifilm GFX100 camera) would be about 610 MB as a TIFF. Representing each pixel of the image as a separate last-write-wins register would impose a high overhead, but such a representation doesn't actually make any sense to preserve a user's intention. The edits the user will perform are things like "increase image contrast by 15%" or "paint spline [(0,0), (1500, 1500)] with brush Q and color #000". If we sync each pixel by lamport timestamp, we could end up with user 1's contrast applying to all pixels except for those painted by user 2, which would give a weird looking image with painted-over pixels looking out of place.

Instead you'd probably want to represent user intention as a list of edit operations, which are much smaller than a whole 102MB pixel grid. A CRDT data structure is one possible technical mechanism perform to synchronize that user intent, but you pick the structure to match the user intent semantics, not to match the concrete data layout of your output.

You may still end up having edit operations that contain massive amounts of data, like "add new layer named `bg` below layer `fg` with pixels `data:(10mb of pixels)` at (1500, 1500)". But the overhead for the synchronization of that kind of edit command is very low, it's size is O(1), not O(pixels in the edit command).

archagon
0 replies
22h34m

I sketched out what a performant pixel-based CRDT might look like in my big CRDT article: http://archagon.net/blog/2018/03/24/data-laced-with-history/...

Never tried building it, though. And I’m not sure it would actually be practical, but it would at least preserve the full history of the document.

CGamesPlay
0 replies
1d4h

What would a concurrent edit look like? If there are two concurrent edits to the same pixel or vertex, what should the result look like? The easiest answer is that you decide arbitrarily (say by timestamp), which is "last-write-wins", which is a CRDT.

koromak
4 replies
1d4h

Asking for advice: I do not have a multiplayer app, but I have some large, interconnected, denormalized trees on my frontend as user profiles. Think like a tiled layout, where a user can add/remove/resize tiles, and then add a number of components into each tiled slot, each of those having their own profiles too. Multiple "layouts" can exist with different arrangements of tiles, and theres some other complexity with individual tiles referencing and sharing other pieces of state globally.

Making safe updates via regular REST is difficult, as you have to make sure someone with two tabs open isn't going to make an update on tab 1, then another on tab 2 which puts the overall profile into an invalid state. And in general, ordering matters. Skipping an update serverside that was properly applied clientside could break things.

The dumb-as-rocks solution I came up with is to just send the minimal amount of data over that can completely overwrite a particular chunk of state, and place it behind a queue. Usually thats fine, but sometimes thats a lot of wasted data, 50KB when the actual change was only a couple bytes.

I don't need CRDTs for any of the regular reasons, but it seems like it would make state management a million times easier, even for a single user. For one, I'd get syncing between a user's browser tabs, which is good. But moreover, I can just make simple changes to frontend state, and trust that the CRDT is going to negotiate it properly with the server. I no longer have to deal with it myself.

Does this make sense? Or is the overhead required to make something like Yjs work not worth it when I don't even need multiplayer and local-first.

danielvaughn
2 replies
1d4h

If your application makes active use of multiple tabs, it might make sense to use YJS or something, because it's very effective in resolving those types of problems.

However, if your profile edits are single-user only, it's probably overkill to introduce a CRDT. At first glance, it seems the two-tabs-open scenario is your highest source of bugs, so what you could do is use a BroadcastChannel to signal update events to all other tabs.

TheColorYellow
1 replies
1d1h

How is YJS different from introducing CRDT? Doesn't it basically just do that for you anyways?

If CRDT is complications and difficult to manage, either YJS resolves that completely, or more likely that complexity will leak out of the abstraction layer no matter what.

To me it seems more like that OP should compare and contrast concurrency solutions, one of which is CDRT via YJS or another could be something like concurrency based on Go routines.

Edit: Should obviously mention Loro, the literal thread we're in now lol

danielvaughn
0 replies
20h10m

I wrote that comment as a stream-of-consciousness, so it could have been written much clearer. What I meant was that you probably don't want to reach for a CRDT of any kind unless either multi-tab or multi-user editing is an inherent part of your app's experience. Else you can get the same benefits with less complexity.

pshc
0 replies
21h28m

I think CRDTs make sense for your usecase.

Maintaining the shared state with REST calls that overwrite parts of server state is indeed brittle, and really only suitable for overwriting fields on flat data records. It also requires that you consider server-client state coordination with care at all times, and things can easily get out of sync in non-happy paths.

Like you said, building out a CRDT which specifies how updates are coalesced will significantly reduce cognitive burden.

lewisjoe
2 replies
1d4h

When working with formatted text content like in Google Docs / Zoho Writer: moving a list item down or adding a new column or any table/list operation is essentially a tree manipulation op.

Concurrent conflicts in such cases are notoriously hard to converge without contextual special handling [1]. Does this implementation generalize a solution for such use-cases?

I guess it should be possible to combine a list(or string) CRDT for leaf nodes (i.e text blocks) and use this tree CRDT for structural nodes (lists & tables).

But that will make augmenting every op with two-dimensional address (parent-id, index_offset_into_that_parent)

[1] https://github.com/inkandswitch/peritext/issues/27

josephg
0 replies
1d4h

That’s always how I’ve imagined it. Rich text is plain text with 2 additions: Annotation ranges (for bolded regions and such) and non-character elements (Eg a table or embedded image). A text crdt is fundamentally just a list crdt that happens to contain character data. So embedded elements can easily be modelled as a special item (the embedded child node), and with size of 1 like any other item in the string. And then with the right approach, you can mix and match different CRDTs in a tree as needed. (Rich text, contains a table, and one of the cells has an image and so on).

Augmenting every op with a parent-crdt-id field is unfortunate but I think unavoidable. Thankfully I suspect that in most real world use cases, it would be very common for runs of operations to share the same parent crdt. As such, I think those ID fields would run-length encode very well.

czx111331
0 replies
1d3h

The implementation can indeed combine multiple different CRDTs. Within Loro's internal implementation, each op does need to store a parent ID. However, as Seph mentioned, consecutive operations under the same parent can be effectively compressed, so the amortized overhead of these parent IDs is often not significant.

rwieruch
1 replies
1d

Wow I have to read this! For a freelance client of mine, I have open sourced React Table Library [0] with the focus on tree operations. They are handling a folder/file tree structure of 100 thousands nodes where it is possible to move folders/files, clone them, lazy load them on a top and nested level, etc. And all of it in the same table structure.

After I finished the project, I kinda knew why Google Drive only allows to display and modify on the same hierarchical level. There are so many constraints that you have to consider when implementing this in a nested view with many nodes.

[0] https://react-table-library.com/

cyanydeez
0 replies
20h55m

Looks nice, when will it be completely headless?

curtisblaine
1 replies
1d1h

Did you use a GPT to check your article? The first paragraph has a strong ChatGPT voice IMHO.

mkl
0 replies
19h45m

Not really. This kind of grammatical error is very unlike ChatGPT:

This article introduces the implementation difficulties and challenges of Movable Tree CRDTs when collaboration, and how Loro implements it and sorts child nodes.