I have a collaborative application in which the clients and the server communicate through WebSocket messages. Each client receives an update from the server and applies it directly to its Y.Doc with Y.applyUpdate(doc, update); the server only sends incremental updates, never the whole state of the Y.Doc.
I’ve run a test that starts from an empty Y.Doc (created with new Y.Doc()) and applies updates that are typically around 600 bytes each. As more updates are applied, the time consumed by applyUpdate grows from about 1 ms per update to about 100 ms per update. The test uses around 3,800 updates captured from real user collaboration in our testing environment.
Is there any way to optimize the applyUpdate time? Spending around 50 ms on the client for every incoming update is really unacceptable.
Yes, it’s quite bad. I have a 1 MB Doc that takes about 10 seconds to load from disk.
I can’t speak to the potential for optimization, but I can share a couple of workarounds I’ve considered.
You may be able to reduce the time by combining updates with Y.mergeUpdates before applying them. The merged update is always smaller than the sum of its parts.
A more obtrusive workaround is to completely reset the history of the Doc when it grows too big by populating a new Doc from JSON. Of course, you lose the ability to merge changes from offline clients when this happens, so it’s a big tradeoff, and you’d need some kind of migration strategy to move clients to the new Doc.
The other workaround is to keep the Doc in a web worker. That offloads applyUpdate to a separate thread so that it doesn’t block the UI. Still, it’s a linear improvement at best: the work takes just as long, it just happens off the main thread.
I’ve been disappointed with Yjs’s handling of large Docs and its absence of lazy loading. It too often assumes a single, prose-like document and neglects the large-data realities of many other use cases.