Syncing persistent storage without loading Y.Doc with y-leveldb

MaxNoetzold · August 30, 2022, 4:57pm

Hey,
I'm really trying to understand how to store a Y.Doc in a database without loading the Y.Doc into memory. I have read this discussion post, which, as I read it, says that this is possible with the code from this GitHub example. However, I don't understand how, and I hope you can help me. As I understand y-websocket's persistence object, the ydoc passed to bindState has to stay in memory so that it can listen for updates. I would love to see an example of how to use it properly. The best I could come up with is the following code for bindState:

const Y = require("yjs");
const { LeveldbPersistence } = require("y-leveldb");

// "./storage-location" is a placeholder path
const ldb = new LeveldbPersistence("./storage-location");

// used as the bindState hook of y-websocket's persistence object
async function bindState(docName, ydoc) {
	const persistedYdoc = await ldb.getYDoc(docName);

	// retrieving the state vector with the ldb function may also flush the
	// database, which isn't the worst thing we could do
	const persistedStateVector = await ldb.getStateVector(docName);
	// computing it from the already loaded doc would be faster, though:
	// const calcPersistedStateVector = Y.encodeStateVector(persistedYdoc);

	// the default code saves the following value in the db, which leads to multiple
	// complete Y.Docs being saved (https://github.com/fadiquader/y-mongodb/issues/7)
	// const newUpdates = Y.encodeStateAsUpdate(ydoc);

	// better to compute only the differences and save those:
	const diff = Y.encodeStateAsUpdate(ydoc, persistedStateVector);

	// store the new data in the db
	await ldb.storeUpdate(docName, diff);

	// send the persisted data to clients
	Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(persistedYdoc));

	// store all later updates of the document in the db
	ydoc.on("update", (update) => {
		ldb.storeUpdate(docName, update);
	});

	// free some memory
	persistedYdoc.destroy();
}

However, this isn't even close to the proposed solution.

dmonad · August 31, 2022, 12:01pm

My comment in the thread you mentioned says that it is now possible to sync with other peers without loading the document into memory. It does not say that this approach is “faster”. Having all the information readily available in memory is, of course, faster, so the initial sync is definitely “slower” (if a few milliseconds matter…).

I also mention a few of the advantages. The biggest one is that you don't have to keep documents in memory all the time. This allows a server to serve more clients, because after the initial sync you only have to forward update messages.

For the initial sync, compute the state vector without loading the Yjs document, using persistence.getStateVector(..). The differences can then be computed with persistence.getDiff(..).
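A minimal sketch of that initial sync, assuming the custom server keeps y-websocket's wire format: an outer message type (0 = sync), then the y-protocols sync sub-type (0 = sync step 1, 1 = sync step 2), then a length-prefixed payload. Only `getStateVector(docName)` and `getDiff(docName, stateVector)` are y-leveldb calls; `serverSyncStep1`, `answerClientStep1`, and `encodeSyncMessage` are hypothetical names, and the varint writer is hand-rolled to match lib0's encoding so the snippet has no dependencies (a real server would use `lib0/encoding` and the `y-protocols/sync` constants instead).

```javascript
// Message types (assumed y-websocket framing): MESSAGE_SYNC is the outer type,
// the other two are the y-protocols sync sub-types.
const MESSAGE_SYNC = 0;
const SYNC_STEP_1 = 0;
const SYNC_STEP_2 = 1;

// lib0-style variable-length unsigned integer: 7 bits per byte,
// high bit set on every byte except the last.
function writeVarUint(bytes, num) {
  while (num > 127) {
    bytes.push(0x80 | (num & 0x7f));
    num = Math.floor(num / 128);
  }
  bytes.push(num & 0x7f);
}

// Layout: [outer type][sync sub-type][payload length][payload bytes]
function encodeSyncMessage(syncType, payload) {
  const header = [];
  writeVarUint(header, MESSAGE_SYNC);
  writeVarUint(header, syncType);
  writeVarUint(header, payload.length);
  const message = new Uint8Array(header.length + payload.length);
  message.set(header, 0);
  message.set(payload, header.length);
  return message;
}

// Sent when a client connects: the persisted state vector, read from the
// database without creating a Y.Doc.
async function serverSyncStep1(persistence, docName) {
  const stateVector = await persistence.getStateVector(docName);
  return encodeSyncMessage(SYNC_STEP_1, stateVector);
}

// Reply to the client's own sync step 1: everything the client is missing,
// computed by y-leveldb directly from the stored updates.
async function answerClientStep1(persistence, docName, clientStateVector) {
  const diff = await persistence.getDiff(docName, clientStateVector);
  return encodeSyncMessage(SYNC_STEP_2, diff);
}
```

Here `persistence` would be a `LeveldbPersistence` instance. The client's sync step 1 payload is its state vector, which is exactly the second argument `getDiff` expects, so the server never needs `Y.encodeStateAsUpdate(ydoc, stateVector)` on a loaded document.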

All incremental updates that you receive from clients should be stored in leveldb and also be forwarded to all other clients. You don’t want to use ydoc.on('update', ..) anymore.
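As a sketch of that rule, the handler below decodes an incoming sync message, stores the contained update with y-leveldb's `storeUpdate(docName, update)`, and re-sends it to the other connections as an update message, with no Y.Doc and no `ydoc.on('update', ..)` involved. The wire format (outer type 0 = sync; sub-types 1 = sync step 2, 2 = update) is assumed to follow y-websocket and y-protocols; `relayUpdate`, the `peers` set, and the connections' `send(message)` method are hypothetical.

```javascript
const MESSAGE_SYNC = 0; // assumed y-websocket outer type for sync messages
const SYNC_STEP_2 = 1; // y-protocols sync sub-types that carry document changes
const SYNC_UPDATE = 2;

// lib0-style varint reader: returns [value, offsetAfterValue].
function readVarUint(buf, pos) {
  let num = 0;
  let mult = 1;
  for (;;) {
    if (pos >= buf.length) throw new Error("unexpected end of message");
    const byte = buf[pos++];
    num += (byte & 0x7f) * mult;
    mult *= 128;
    if (byte < 0x80) return [num, pos];
  }
}

// lib0-style varint writer: 7 bits per byte, high bit = "more bytes follow".
function writeVarUint(bytes, num) {
  while (num > 127) {
    bytes.push(0x80 | (num & 0x7f));
    num = Math.floor(num / 128);
  }
  bytes.push(num & 0x7f);
}

// Stores an incoming update and forwards it. Returns false for messages it does
// not handle (sync step 1, awareness, ...), which other code has to route.
async function relayUpdate(persistence, docName, peers, sender, message) {
  let pos = 0;
  let type, syncType, length;
  [type, pos] = readVarUint(message, pos);
  if (type !== MESSAGE_SYNC) return false;
  [syncType, pos] = readVarUint(message, pos);
  if (syncType !== SYNC_STEP_2 && syncType !== SYNC_UPDATE) return false;
  [length, pos] = readVarUint(message, pos);
  if (pos + length > message.length) throw new Error("truncated update");
  // copy into a fresh Uint8Array so the stored bytes don't alias the incoming buffer
  const update = new Uint8Array(message.subarray(pos, pos + length));

  // 1. persist the raw Yjs update instead of applying it to a Y.Doc
  await persistence.storeUpdate(docName, update);

  // 2. forward it to every other client as a plain update message
  const header = [];
  writeVarUint(header, MESSAGE_SYNC);
  writeVarUint(header, SYNC_UPDATE);
  writeVarUint(header, update.length);
  const forward = new Uint8Array(header.length + update.length);
  forward.set(header, 0);
  forward.set(update, header.length);
  for (const conn of peers) {
    if (conn !== sender) conn.send(forward);
  }
  return true;
}
```

Persisting before forwarding is a design choice: Yjs ignores updates it has already integrated, so a client that gets the same change both from the database and from the broadcast ends up in the same state.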

y-websocket does not support this approach because its bindState function requires a loaded Yjs document. If you want to use the database-only approach, you can write your own custom provider.
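A custom provider along those lines might only track which connections are open per document. The sketch below assumes the same wire format as y-websocket (outer type 0 = sync, sub-type 0 = sync step 1); `DocRooms`, `encodeSyncStep1`, and the connections' `send` method are hypothetical, and `getStateVector` is the y-leveldb call mentioned in the reply.

```javascript
// lib0-style varint writer (7 bits per byte, high bit = "more bytes follow").
function writeVarUint(bytes, num) {
  while (num > 127) {
    bytes.push(0x80 | (num & 0x7f));
    num = Math.floor(num / 128);
  }
  bytes.push(num & 0x7f);
}

// Assumed framing: outer type 0 (sync) + sub-type 0 (sync step 1) + state vector.
function encodeSyncStep1(stateVector) {
  const header = [];
  writeVarUint(header, 0);
  writeVarUint(header, 0);
  writeVarUint(header, stateVector.length);
  const message = new Uint8Array(header.length + stateVector.length);
  message.set(header, 0);
  message.set(stateVector, header.length);
  return message;
}

// Hypothetical room registry: per document it keeps only the open connections,
// never a Y.Doc.
class DocRooms {
  constructor(persistence) {
    this.persistence = persistence; // assumed: a y-leveldb LeveldbPersistence
    this.rooms = new Map(); // docName -> Set of open connections
  }

  async join(docName, conn) {
    let room = this.rooms.get(docName);
    if (room === undefined) {
      room = new Set();
      this.rooms.set(docName, room);
    }
    // Register before reading the database, so updates forwarded while the
    // read is in flight also reach this connection.
    room.add(conn);
    const stateVector = await this.persistence.getStateVector(docName);
    conn.send(encodeSyncStep1(stateVector));
  }

  leave(docName, conn) {
    const room = this.rooms.get(docName);
    if (room === undefined) return;
    room.delete(conn);
    // Nothing to write back or destroy: every update was stored on arrival.
    if (room.size === 0) this.rooms.delete(docName);
  }

  peers(docName) {
    return this.rooms.get(docName) || new Set();
  }
}
```

By comparison, y-websocket's server utilities load the document through bindState on the first connection and call writeState and destroy the Y.Doc after the last one leaves. This registry holds no document state at all, which is where the memory advantage comes from.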

MaxNoetzold · August 31, 2022, 12:20pm

Oh okay. Thank you very much for the detailed explanation, I think I do understand it now!