Is it possible to recompose Doc from Updates?

robertoalvarezalonso · April 27, 2023, 2:53pm

Hello, I am currently working on building a collaborative editor using Yjs alongside a WebSocket server that cannot be modified.

The WebSocket server broadcasts incoming messages to all connected clients and temporarily stores these messages (and therefore the Yjs updates) in a table.

Although I can run my own code on the backend, its JavaScript engine only supports ECMAScript 5. As a result, I don’t believe I can use Yjs on the server side.

Given this, is it possible to reconstruct the document’s content from the updates stored on the server?

The idea is to follow dmonad’s recommendation, instead of saving snapshots from the clients.

Thanks!

raine · April 28, 2023, 3:31pm

Yjs stores documents as Uint8Array updates. As long as you can store these binary updates, you can recreate the YDoc with Y.mergeUpdates and Y.applyUpdate.

robertoalvarezalonso · May 3, 2023, 8:06am

Thanks, raine! That was actually my first thought too, but then I started worrying about the number of updates we might end up with.

So I’m thinking maybe we could have clients send a snapshot (Y.encodeStateAsUpdate) of the document every so often, and then on the server side we could just get rid of any updates that were made before the most recent snapshot. It’s kind of like what they do in this project I found on GitHub: YousefED/Matrix-CRDT (Use Matrix as a backend for local-first applications with the Matrix-CRDT Yjs provider).

This way, when we load the document from the server we’d just need to grab a snapshot and a few updates. Do you think this could work?
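The pruning step is plain bookkeeping over stored rows, so it does not need Yjs and could run in the server’s ES5 engine. A minimal sketch, assuming each stored row is a hypothetical object { type: 'snapshot' | 'update', seq: number, payload: ... } where seq is unique and increases in arrival order:

```javascript
// Keep only the most recent snapshot and the updates that arrived after it.
// Written in ES5 (var, function expressions) so it can run on the server.
function pruneBeforeLatestSnapshot(rows) {
  var latest = -1;
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].type === 'snapshot' && rows[i].seq > latest) {
      latest = rows[i].seq;
    }
  }
  if (latest === -1) {
    // No snapshot stored yet: nothing can be discarded.
    return rows.slice();
  }
  // `latest` is the highest snapshot seq, so every row at or after it is
  // either that snapshot or an update that arrived later.
  return rows.filter(function (row) {
    return row.seq >= latest;
  });
}

var rows = [
  { type: 'update', seq: 1, payload: 'u1' },
  { type: 'snapshot', seq: 2, payload: 's1' },
  { type: 'update', seq: 3, payload: 'u2' },
  { type: 'snapshot', seq: 4, payload: 's2' },
  { type: 'update', seq: 5, payload: 'u3' }
];
var kept = pruneBeforeLatestSnapshot(rows);
// kept: the snapshot with seq 4 and the update with seq 5
```

The arrival-order assumption matters: an update that reached the server before a snapshot row, but that the snapshotting client had not yet received, would be dropped by this filter, so keeping a margin of older updates may be safer.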

raine · May 3, 2023, 1:09pm

Yes, that could work.

Note that a client that makes offline changes before the snapshot is generated, and then reconnects after it is generated and the past updates have been removed, will not be able to merge conflicts. For true offline-first functionality you need the whole history.

It’s usually best to store the Yjs updates as-is. They are heavily optimized and allow efficient storage, syncing, and transmission over the wire. But I understand that a lot of projects have different needs, and people often prefer JSON or snapshots. I’m just not sure of the full implications for the conflict-resolution behavior.

robertoalvarezalonso · May 3, 2023, 4:27pm

Thanks raine! The issue on my side is that sending many small updates from the server to the clients is much slower than sending a single large one.

Now I have noticed that the document state is growing fast, even though the content is empty. The server doesn’t need to store every transaction, only the content of the document when the snapshot was taken and any updates made since then. Do you know how that could be done?

raine · May 3, 2023, 10:19pm

I’m not sure I know what you mean here. y-websocket sends all updates in a single response to the client in sync step 2. If you are using a different WebSocket server, you would need to emulate that behavior.

The process is described here:

Are you sure? CRDTs rely on the full document history to recreate the document state and resolve conflicts. In a distributed environment, there is no way to guarantee that all clients are up to date with a given snapshot.

Maybe there are others who have gotten snapshots to work in this fashion though.

robertoalvarezalonso · May 6, 2023, 1:25pm

Unfortunately, I can’t use y-websocket and I can’t modify the existing WebSocket server, but so far my approach seems to work. I have also just found a solution to the growing document state that was worrying me.

The solution consists of using Y.snapshot to capture the document state at a given point (when I want to save the snapshot) without the full document history.

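```javascript
import * as Y from 'yjs';

// this.doc is the editor's live Y.Doc.
if (shouldSendSnapshot) {
  // Y.createDocFromSnapshot throws if the origin doc has gc enabled,
  // so copy the live document into a doc with garbage collection disabled.
  const doc = new Y.Doc({ gc: false });
  const currentUpdate = Y.encodeStateAsUpdate(this.doc);
  console.log('Current doc: ' + currentUpdate.length);

  Y.applyUpdate(doc, currentUpdate);
  // Capture the current state and build a fresh doc from it.
  const snap = Y.snapshot(doc);
  const docRestored = Y.createDocFromSnapshot(doc, snap);
  const restoredUpdate = Y.encodeStateAsUpdate(docRestored);
  console.log('Restored doc: ' + restoredUpdate.length);

  // persistUpdate is the application's own storage function.
  persistUpdate(restoredUpdate);
}
```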

There is probably a more efficient way of doing this, but I hope it at least helps others with similar needs.