Clear document history and reject old updates

We spoke to Kevin on this topic yesterday and here is my summary:

Very large documents are most likely caused by code that produces unnecessary operations. The source could be an editor binding or your own application code. This was the case for us. With “normal” operations Yjs is very efficient and optimized. We also identified two approaches to pruning documents so that old, unnecessary history is removed.

Identifying the cause of the excessively large documents

This can be done by inspecting the update: Y.logUpdate(Y.encodeStateAsUpdate(yDoc)). It’s perhaps not trivial to read these messages, but we got the hang of it and we managed to identify a bunch of unnecessary operations in our case. Fixing them will mitigate most of the issues that we are seeing with our large YDocs.
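As a rough starting point for that inspection, here is a minimal sketch that prints the update and also reduces it to two numbers you can track over time. It assumes the `yjs` package is installed; `summarizeDoc` and the sample keys are hypothetical names made up for illustration.

```typescript
import * as Y from "yjs";

// Hypothetical helper: how many bytes and structs does the full state carry?
function summarizeDoc(doc: Y.Doc): { bytes: number; structs: number } {
  const update = Y.encodeStateAsUpdate(doc);
  // decodeUpdate exposes the structs (operations) contained in the update
  const { structs } = Y.decodeUpdate(update);
  return { bytes: update.byteLength, structs: structs.length };
}

// Small sample document, only here to make the sketch runnable
const inspectedDoc = new Y.Doc();
const inspectedRoot = inspectedDoc.getMap("root");
inspectedRoot.set("title", "hello");
inspectedRoot.set("count", 1);

// Human-readable dump of every struct, as mentioned above
Y.logUpdate(Y.encodeStateAsUpdate(inspectedDoc));

const summary = summarizeDoc(inspectedDoc);
console.log(summary);
```

If the struct count grows much faster than the amount of real user input, that is a hint that some code path is producing operations it doesn't need to.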

Prune documents: Approach A: Isolated “sessions”

Initialize a new YDoc from a JSON snapshot and give it a new documentName (perhaps ${documentId}:${sessionId}). This new YDoc will have no history and thus it will be as small as possible. Then, keep track of the active session id in your system. Make sure that new connections always connect to the active session id, and that existing connections are told to reconnect to the new session. Edits to old sessions should be refused.
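The bookkeeping side of this could look roughly like the sketch below. It is plain TypeScript with no Yjs calls. `SessionRegistry`, its method names, the numeric session ids and the in-memory `Map` are all assumptions for illustration. A real system would persist the active session id and would also have to create the fresh YDoc and notify connected clients.

```typescript
// Hypothetical in-memory registry of the active session per document.
class SessionRegistry {
  private active = new Map<string, number>();

  // Name that new connections should use, e.g. "doc-42:3"
  activeName(documentId: string): string {
    const sessionId = this.active.get(documentId) ?? 0;
    return `${documentId}:${sessionId}`;
  }

  // Start a new session; the caller creates the fresh YDoc under this name
  rotate(documentId: string): string {
    const next = (this.active.get(documentId) ?? 0) + 1;
    this.active.set(documentId, next);
    return this.activeName(documentId);
  }

  // Edits are only accepted for the currently active session
  acceptsEdits(documentName: string): boolean {
    const sep = documentName.lastIndexOf(":");
    if (sep < 0) return false;
    const documentId = documentName.slice(0, sep);
    return documentName === this.activeName(documentId);
  }
}

const registry = new SessionRegistry();
const oldSession = registry.activeName("doc-42");
const newSession = registry.rotate("doc-42");

console.log(registry.acceptsEdits(newSession)); // true
console.log(registry.acceptsEdits(oldSession)); // false
```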

Prune documents: Approach B: Clearing YMap keys

If you have a root YMap in which you put all your data, you can completely delete and reinitialize its keys. Everything under that level will then be garbage collected efficiently, though not 100%, because tombstones are retained. This is perhaps simpler than Approach A since it doesn’t require reloading the document or keeping track of session ids.
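A minimal sketch of this approach, assuming the `yjs` package is installed and garbage collection is left at its default (enabled). The `content` key, the `resetKey` helper and the flat number values are made up for illustration. Nested shared types under the key would need to be rebuilt explicitly, since `toJSON()` turns them into plain JSON.

```typescript
import * as Y from "yjs";

const prunedDoc = new Y.Doc();
const prunedRoot = prunedDoc.getMap<Y.Map<number>>("root");
prunedRoot.set("content", new Y.Map<number>());

// Simulate a noisy history under the "content" key
const noisy = prunedRoot.get("content")!;
for (let i = 0; i < 500; i++) {
  noisy.set(i % 2 === 0 ? "a" : "b", i);
}
const sizeBefore = Y.encodeStateAsUpdate(prunedDoc).byteLength;

// Hypothetical helper: replace the value under `key` with a fresh Y.Map
// that holds only the current values, so the old subtree can be collected.
function resetKey(doc: Y.Doc, root: Y.Map<Y.Map<number>>, key: string): void {
  const current = root.get(key);
  const entries: [string, number][] = current
    ? Object.entries(current.toJSON())
    : [];
  doc.transact(() => {
    root.set(key, new Y.Map<number>(entries));
  });
}

resetKey(prunedDoc, prunedRoot, "content");
const sizeAfter = Y.encodeStateAsUpdate(prunedDoc).byteLength;
console.log({ sizeBefore, sizeAfter });
```

The size does not drop to zero because the tombstones for the replaced subtree remain, but they are stored far more compactly than the original operations.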

Additional notes

Q: Why cannot some unnecessary operations be optimized away?
A: Y.Map doesn’t make use of Yjs’ optimizations if you write key-value entries in alternating order. Always writing the same entry doesn’t significantly increase the size of the document. But writing key1, then key2, then key1, then key2 (alternating order) breaks Yjs’ optimization. As a consequence, Kevin has started exploring a more optimized implementation of a “YKeyValue” type, similar to a YMap. It is still early and not yet feature complete; see the yjs/y-utility repository on GitHub (utility features for Yjs). I think it will be very interesting to follow the development of this.
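The effect is easy to reproduce. The sketch below, which assumes the `yjs` package is installed and uses a hypothetical `sizeAfterWrites` helper, performs the same number of writes in both patterns and compares the encoded document sizes.

```typescript
import * as Y from "yjs";

// Hypothetical helper: write `writes` values, cycling through `keys`,
// and return the size of the resulting document in bytes.
function sizeAfterWrites(keys: string[], writes: number): number {
  const doc = new Y.Doc();
  const map = doc.getMap<number>("root");
  for (let i = 0; i < writes; i++) {
    map.set(keys[i % keys.length], i);
  }
  return Y.encodeStateAsUpdate(doc).byteLength;
}

// Same key every time: overwritten entries can be merged compactly
const sameKeyBytes = sizeAfterWrites(["key1"], 1000);

// key1, key2, key1, key2, ...: overwritten entries cannot be merged
const alternatingBytes = sizeAfterWrites(["key1", "key2"], 1000);

console.log({ sameKeyBytes, alternatingBytes });
```

Both documents end up with at most two live values, yet the alternating one should come out considerably larger.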
