Persisting to DB, could it be this easy?

mikkokam · January 17, 2021, 2:13pm

Hi all,
just starting out with Yjs. It’s awesome!

I’m attempting to make a PoC of a collaborative tool where each document will have only a few editors, but the number of (concurrent) docs might be large.
To keep the server side memory usage low I’m making a naïve attempt to persist the documents without loading them on the server.

I came up with an idea - and would like to know if this is stupid or won’t work:

I’d like to use Y.encodeStateAsUpdate at the front end side, and send the result to be saved by the back end (without the back end knowing anything about what this data is).
This should happen when the last (or only) editor is exiting the app - or at reasonable intervals.

When someone wishes to start / or continue editing the document, they fetch the update and use Y.applyUpdate to read the state into their doc.

If there were concurrent editors, I would use y-websocket as usual (or y-webrtc, if y-websocket is loading the doc into memory anyhow?).

Could it be as simple as this:

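// front end, using a db service (`db` stands for any key-value storage client)
import * as Y from 'yjs';
// assumption: the base64 helpers come from the js-base64 package
import { fromUint8Array, toUint8Array } from 'js-base64';

// Serialize the whole document state and store it as base64 (or as binary).
async function save(id, doc) {
    const update = Y.encodeStateAsUpdate(doc);
    await db.set(id, fromUint8Array(update));
}

// Fetch the stored state, if any, and merge it into the local doc.
async function load(id, doc) {
    const update = await db.get(id);
    if (update) Y.applyUpdate(doc, toUint8Array(update));
}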

dmonad · January 18, 2021, 6:54pm

Hi @mikkokam

Sure that would work, and it is actually how applications like room.sh work with Yjs. They use y-webrtc and then store the document on some server (if there is any).

The downside of this approach is that you can’t always determine if a document is …

synced to the server. What happens if the user is closing the document before your app can sync it to a server?
completely synced to the server. In case two users are concurrently syncing a change, you can’t determine if all the latest changes are synced. For example, there might be a proxy that prevents a client from syncing via WebRTC; the client behind the proxy will then overwrite all document updates from other users with an old state.

Generally, I recommend adding more reliability to an application. There are tremendous benefits in having a server that accepts granular updates:

You don’t need to sync up the complete document state (a few hundred KB) at every interval
Document updates that are synced to the server won’t be lost when you close your application window
Allows faster syncing with the clients. No need to sync the complete document state, only the differences are exchanged.

I’m also currently working on enabling applications to sync changes without loading them to memory. I described my approach here: https://github.com/yjs/yjs/issues/263 Once this is done, I will rewrite the server to work on the binary format directly.

mikkokam · January 18, 2021, 7:19pm

Thanks!
I made a few tests with PostgreSQL, Firestore and Google’s Storage – saving the Y.encodeStateAsUpdate as binary from the clients.
I also tested cutting off network for some of the clients while editing, and any old edits did not overwrite newer ones created by others meanwhile at sync / load. We are using a top level Y.Map with Y.Maps as children (no text editing in this case).

Most of the projected costs of the app being designed come from the number of (DB) writes. Therefore I’d like not to write at every change…

Syncing without loading to memory will be great, I am definitely going to test that once it’s done.

dmonad · January 18, 2021, 7:27pm

That makes sense. I would also discourage writing every edit to a Postgres database. In this case another solution would be to have a minimal server (like the y-websocket server) that caches the results and eventually writes the data once all clients have disconnected.
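
A rough sketch of that caching idea, not the actual y-websocket server code: updates are buffered in memory per document and written once when the last client leaves. `DocCache`, `merge` and `persist` are hypothetical names. With Yjs, `merge` could be `Y.mergeUpdates` and `persist` a single database write.

```javascript
// Minimal in-memory cache per document. `merge` and `persist` are
// injected callbacks (assumptions, see the text above).
class DocCache {
  constructor({ merge, persist }) {
    this.merge = merge;
    this.persist = persist;
    this.rooms = new Map(); // docId -> { clients: number, updates: Uint8Array[] }
  }

  connect(docId) {
    const room = this.rooms.get(docId) ?? { clients: 0, updates: [] };
    room.clients += 1;
    this.rooms.set(docId, room);
  }

  // Buffer an incoming update in memory; no database write happens here.
  receive(docId, update) {
    const room = this.rooms.get(docId);
    if (!room) throw new Error(`no connected clients for ${docId}`);
    room.updates.push(update);
  }

  // When the last client leaves, write everything once and free the memory.
  disconnect(docId) {
    const room = this.rooms.get(docId);
    if (!room) return;
    room.clients -= 1;
    if (room.clients > 0) return;
    this.rooms.delete(docId);
    if (room.updates.length > 0) {
      this.persist(docId, this.merge(room.updates));
    }
  }
}

// Usage with stand-in callbacks: concatenation instead of a real Yjs merge.
const writes = [];
const cache = new DocCache({
  merge: (updates) => Uint8Array.from(updates.flatMap((u) => [...u])),
  persist: (docId, data) => writes.push({ docId, data }),
});
cache.connect('doc-1');
cache.connect('doc-1');
cache.receive('doc-1', Uint8Array.of(1, 2));
cache.receive('doc-1', Uint8Array.of(3));
cache.disconnect('doc-1'); // one client still connected: nothing written
cache.disconnect('doc-1'); // last client left: a single write
console.log(writes.length);
```

The trade-off is that buffered updates are lost if the server crashes before the final write, so a periodic flush may be worth adding.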

mikkokam · January 18, 2021, 7:33pm

Yes, that sounds like an optimal way to go. We are planning to test an architecture along those lines.
We can horizontally scale this by splitting the users on several minimal servers on Kubernetes. The server can cache and persist as needed - and also make some other DB writes at the end; persisting data for indexing elsewhere.
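
One common way to split documents across such servers (an assumption for illustration, not a confirmed part of this setup) is to route by a hash of the document id, so that every editor of a given document lands on the same instance. The server names and `serverFor` below are hypothetical.

```javascript
const { createHash } = require('crypto');

// Hypothetical server names; in Kubernetes these might be pod addresses.
const servers = ['yjs-0', 'yjs-1', 'yjs-2'];

// Map a document id to one server deterministically, so every client
// editing the same document reaches the same instance.
function serverFor(docId, pool = servers) {
  const digest = createHash('sha256').update(docId).digest();
  return pool[digest.readUInt32BE(0) % pool.length];
}

console.log(serverFor('doc-1') === serverFor('doc-1'));
```

Plain modulo routing reassigns most documents when the pool size changes. Consistent hashing is the usual refinement if servers are added or removed often.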

Fibs7000 · November 1, 2021, 6:55pm

Hi, I know this is very late, but I would love to hear which approach you have taken to split the traffic for the documents! (To ensure every mini server has all the users connected who are working on a single document.)