Question about Y.encodeState(ydoc).byteLength

zlv-thisF · August 21, 2020, 6:56am

hey guys!

I have run into the biggest challenge so far in my ProseMirror editor, which uses Yjs and Y.PermanentUserData.

In my project, I broadcast every update received from one client to all other clients instead of keeping a shared doc on the server side, and everything worked as expected at first. However, I recently found that when too many people edit the same doc, the system collapses.

I have looked into the reason. As I mentioned above, the server has no shared doc and only broadcasts updates, so I have to broadcast each client's encoded state first to ensure that every client's document stays in sync. In that case, if one client's encoded state is 200k bytes, 12 people would produce about 2.4 MB of updates just to sync with each other, and it would be even larger if I encode the updates as base64.

Besides, with no shared doc and only client updates being broadcast, deciding when to broadcast the encoded state bothers me quite a lot… The benefit of a shared doc is that the server syncs with each client using its state vector and sends only the diffs.

I don't know where to go from here…

dmonad · September 2, 2020, 1:48pm

In my project, I broadcast every update received from one client to all other clients instead of keeping a shared doc on the server side, and everything worked as expected at first. However, I recently found that when too many people edit the same doc, the system collapses.

From a Yjs perspective, it doesn’t matter how you propagate document updates. Client-server or p2p - no difference.

In that case, if one client's encoded state is 200k bytes,

If one client inserts a lot of content, you will end up with a large document. If you want to track the whole document history and be able to restore old document states, you can't delete edits, and you will end up with a large document. This is a logical consequence and also true for any other CRDT/OT approach. One solution is to take fewer snapshots and clean up (garbage-collect) the document properly, as I do on the yjs.dev website. The yjs.dev website is a bit experimental and I don't fully support this GC approach. But it will reduce the document size overhead.

I seriously have no idea how you ended up with these large documents. In practice, people type slowly and it is basically impossible to end up with a document size of 2MB.

if I encode the updates as base64.

Seriously, don’t use base64. It is inefficient and slows down your application.
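One measurable part of that cost is size. The following is a minimal sketch, assuming Node.js (for the global `Buffer`) and a made-up 200 kB buffer standing in for an encoded document state. It only illustrates how much larger the same bytes become as a base64 string.

```javascript
// Hypothetical 200 kB binary payload, a stand-in for an encoded Yjs state.
const binaryUpdate = new Uint8Array(200 * 1000);

// Base64 encodes every 3 bytes as 4 characters (padding the last group).
const base64Update = Buffer.from(binaryUpdate).toString('base64');

// Ratio of encoded length to raw length, roughly 4/3.
const inflation = base64Update.length / binaryUpdate.byteLength;

console.log(binaryUpdate.byteLength, base64Update.length, inflation.toFixed(2));
```

Sending the `Uint8Array` as a binary message avoids this extra third, as well as the encode and decode work on every message.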

zlv-thisF · September 3, 2020, 8:53am

Thanks for answering!

Sorry for not expressing myself clearly. Imagine 49 people are already editing the doc, and then a new editor joins. Now all 50 people have to broadcast the encoded state of their own doc to sync. If each document is 200k, the message volume is now 200k * 50, and it grows much further because every join goes through this logic.
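As a rough back-of-the-envelope sketch of that arithmetic (the helper name and the flat 200k size per document are assumptions for illustration, and state vectors and protocol overhead are ignored):

```javascript
// Hypothetical helper: bytes delivered to one client when every
// participant broadcasts its full encoded state on a join.
function fullStateJoinCost(participants, docBytes) {
  return participants * docBytes;
}

const DOC_BYTES = 200_000; // assumed size of one encoded state

const twelvePeople = fullStateJoinCost(12, DOC_BYTES); // 2,400,000 bytes
const fiftyPeople = fullStateJoinCost(50, DOC_BYTES);  // 10,000,000 bytes

// For comparison: a new client that syncs with a single peer needs at most
// about one copy of the document, however many people are in the room.
const singlePeerUpperBound = DOC_BYTES;

console.log({ twelvePeople, fiftyPeople, singlePeerUpperBound });
```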

I have to do this because, in a p2p environment rather than client-server, a newly joined client has no centralized, already-synced doc to sync with.

Besides, I restore the doc from the last encoded state plus the updates created since then, so the DB ends up storing a great many encoded states… and too many large updates make both restoring and typing slow.

dmonad · September 3, 2020, 12:58pm

Why don’t you use state vectors to only compute the differences? https://github.com/yjs/yjs#example-sync-two-clients-by-computing-the-differences

I guess a problem is that you will connect to all peers at the same time. One solution is to sync with one client at a time. In y-webrtc, I only connect to ~10 clients and basically create a partially connected network. This also limits the amount of exchanged messages.
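One way to picture such a partially connected network is to have each client connect to a bounded random subset of the known peers. This is only an illustrative sketch and not the y-webrtc implementation; `pickPeers`, the peer IDs, and the limit of 10 are assumptions.

```javascript
// Hypothetical helper: choose at most `maxConnections` peers at random.
function pickPeers(peerIds, maxConnections = 10) {
  const shuffled = [...peerIds];
  // Fisher-Yates shuffle so every peer is equally likely to be chosen.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, maxConnections);
}

// 49 peers already in the room; the newcomer connects to only 10 of them.
const allPeers = Array.from({ length: 49 }, (_, i) => `peer-${i}`);
const chosenPeers = pickPeers(allPeers);

console.log(chosenPeers);
```

Updates still reach everyone as long as the resulting network stays connected, because each client forwards what it receives to its own neighbours.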

The problem you are referring to is not unique to PermanentUserData - a document size of 200 kB is reasonable in my opinion. You just need to handle syncing more efficiently.

zlv-thisF · September 3, 2020, 1:18pm

Yes! Doing the two-step sync (state vector first, then the diff) with all clients at once is too difficult…

OK! I will try it!!! Thanks a lot.