Initial load of Yjs - too many updates

I read through the conversations here.
So I’m using Node Change Streams on the server side to add changes to a ydoc, and on the client side I’m connecting to the same ydoc and listening for changes.
But whenever I load the Yjs document, it receives all the previous updates as well. Because of that, my application becomes unresponsive for some time (until all the changes have been received), and when there are too many of them it even crashes.

So is there any way I can optimize it? For example, can I tell Yjs not to do this, or to only consider the changes that are missing from the current ydoc?

I also went through the documentation. Here’s the approach I’ve implemented:

const {doc: yDoc, wsProvider: socket} = connectToWebSocket(id);
const stateVector1 = Y.encodeStateVector(socket.doc)
const stateVector2 = Y.encodeStateVector(yDoc)
const diff1 = Y.encodeStateAsUpdate(socket.doc, stateVector2)
const diff2 = Y.encodeStateAsUpdate(yDoc, stateVector1)
Y.applyUpdate(socket.doc, diff2)
Y.applyUpdate(yDoc, diff1)

I don’t see much of a difference with this change. Am I doing something wrong, or is there any way I can debug it in more depth?

With a state vector, you can retrieve only the updates the client is missing. However, the initial load must always include all updates in order to populate the Doc, because a fresh Doc has an empty state vector. Encoding all updates into a single update saves some space.

Without knowing which providers you are using or how big your Doc is, it’s hard to say more. I have had to break some Docs into multiples because the initial load was unacceptable. I think the need to load the entire Doc into memory is poor design on Yjs’s part.

+1 to splitting state across multiple Y.Docs. Unfortunately that first load is nasty. I also strongly recommend keeping your YDoc processing isolated to a web worker if you suspect your application will continue to require large ydocs. That way you at least don’t block the UI thread.

How can I do this?
My data flow is roughly this: whenever the Yjs map receives a change, it dispatches a Redux action to update the Redux store.
So right now, when Yjs loads, it receives all the changes and dispatches a Redux action for each of them. Is there any way I can prevent that?

Y.encodeStateAsUpdate already does this, so you’re probably already doing it unless you are sending separate, raw updates from the server to the client.

Maybe you could skip the observe handler until the Doc is fully loaded, and just dispatch a single action with the static content of the Doc.
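A rough sketch of that idea follows. `bindMapToStore`, the action types, and the `markLoaded` callback are all hypothetical names, not part of Yjs or Redux. The helper only assumes the map has `observe()`, `get()`, and `toJSON()`, and that observe events expose `keysChanged`, as a `Y.Map` does.

```javascript
// Hypothetical helper: `ymap` is anything shaped like a Y.Map,
// `dispatch` stands in for the Redux store's dispatch function.
function bindMapToStore (ymap, dispatch) {
  let loaded = false

  ymap.observe((event) => {
    // Ignore the flood of change events that arrive during the initial sync.
    if (!loaded) return
    event.keysChanged.forEach((key) => {
      dispatch({ type: 'map/keyChanged', key, value: ymap.get(key) })
    })
  })

  // Call this once the provider reports that the initial sync is complete.
  return function markLoaded () {
    if (loaded) return
    loaded = true
    // One action carrying the whole static content of the map.
    dispatch({ type: 'map/replaced', payload: ymap.toJSON() })
  }
}
```

With y-websocket, one place to call `markLoaded()` might be the provider's `sync` event, e.g. `wsProvider.on('sync', (isSynced) => { if (isSynced) markLoaded() })`. Other providers signal completion differently, so treat that wiring as an assumption to verify.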

How big exactly is your Doc? i.e. Y.encodeStateAsUpdate(doc).length

If your doc is multiple MB in size, you’re going to be limited by the raw download size more than by any specific update handling regimen. Then you might have to rethink your update frequency or split it into multiple Docs.

Sorry to hijack this thread, but how does one break a Doc into multiple Docs? In my case the Doc contains Prosemirror-generated data that is synced through a Yjs WebSocket provider. Thanks.

I’m not sure about Prosemirror data. My use case is a large graph that can be segmented. The OP doesn’t mention what the shape of their data is so I can only speculate.

Theoretically XML could be split into chunks, though that sounds like a pain to me.

If your YJS Doc becomes too big, it’s no small fix unfortunately. It is necessary to evaluate your data shape, update frequency, and atomicity requirements to determine the best solution and a migration path.
