Squash document updates in-place

davidbrochart · June 26, 2024, 7:31am

Since updates to a document always grow in memory, this will eventually lead to memory overflow. Squashing updates would solve this issue, but is it possible to do that “in-place” in Yjs/Yrs, i.e. without creating a new document?

dmonad · June 26, 2024, 4:59pm

“Since updates to a document always grow in memory, this will eventually lead to memory overflow.”

Is that the case? My argument is that no human being can type fast enough to create a document that leads to memory overflow; see “Are CRDTs suitable for shared editing?”.

Yjs optimizes for storage and removes unused data. I often refer to this as “garbage-collection”. This happens automatically, unless you disable this feature.

At the very least, it is incorrect that memory always grows: deletions do result in a smaller document state.

I assume that “squashing” doesn’t refer to “garbage-collection”? What exactly do you want to do?

If you want to purge the old document and start with a new one, then you must do that manually. You could create your whole document under a single Y.Map key. Once you delete that key, all content will be garbage-collected, and there will be almost no trace of it anymore.

That said, I really recommend against purging documents. It is best to retain document state. Especially for the Jupyter project, there is really no need for purging documents. You should retain document state to keep them mergeable.

davidbrochart · June 26, 2024, 7:44pm

No human being can type fast enough to create a document that will lead to memory overflow, but in Jupyter we can modify a shared document programmatically, through a kernel that will produce cell outputs, and these outputs can be quite big (images) and/or produced at a fast rate.
What you are saying about “garbage collection” in Yjs is interesting. But our main memory issue is on the server. Does Yrs have the same garbage-collection system?

dmonad · June 27, 2024, 11:56am

The size of operations doesn’t really matter. Once you delete an image, for example, all that remains is a single “tombstone” that occupies only a few bytes.

It doesn’t matter how many changes you perform as long as you only append to outputs. If you only perform appends, then Yjs compacts them into a single operation. Once the output is deleted, all its content is garbage-collected. All that remains is a single reference of a few bytes.
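To make the idea of compacting appends and leaving a tombstone more concrete, here is a minimal toy model in plain Python. It is only a sketch of the concept, not Yjs or Yrs internals: the names `Item`, `ToyLog`, `append`, `delete_all` and `payload_size` are hypothetical, and the model assumes each item is identified by a client id plus a logical clock, with contiguous appends from the same client merged into one item.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    client: int             # id of the peer that created the item
    clock: int              # logical clock of the item's first element
    length: int             # number of elements the item covers
    content: Optional[str]  # None marks a tombstone: payload dropped, length kept


class ToyLog:
    """Hypothetical append-only sequence; a teaching model, not the Yjs struct store."""

    def __init__(self) -> None:
        self.items: List[Item] = []
        self.clocks: Dict[int, int] = {}  # next free clock per client

    def append(self, client: int, text: str) -> None:
        clock = self.clocks.get(client, 0)
        self.clocks[client] = clock + len(text)
        last = self.items[-1] if self.items else None
        if (last is not None and last.content is not None
                and last.client == client
                and last.clock + last.length == clock):
            # Contiguous append by the same client: extend the existing item
            # instead of storing a new one.
            last.content += text
            last.length += len(text)
        else:
            self.items.append(Item(client, clock, len(text), text))

    def delete_all(self) -> None:
        # Drop every payload but keep the id ranges, merging adjacent
        # tombstones that belong to the same client.
        merged: List[Item] = []
        for item in self.items:
            prev = merged[-1] if merged else None
            if (prev is not None and prev.client == item.client
                    and prev.clock + prev.length == item.clock):
                prev.length += item.length
            else:
                merged.append(Item(item.client, item.clock, item.length, None))
        self.items = merged

    def payload_size(self) -> int:
        # Total number of characters still held in memory.
        return sum(len(i.content) for i in self.items if i.content is not None)


log = ToyLog()
for _ in range(1000):
    log.append(client=1, text="x" * 100)   # 1000 separate appends
print(len(log.items), log.payload_size())  # 1 100000

log.delete_all()
print(len(log.items), log.payload_size())  # 1 0
```

In this model, a thousand appends end up as one item, and deleting the output leaves one small record that remembers only which id range was deleted. Interleaved appends from different clients would not merge in the same way, which is one reason to measure a real document before drawing conclusions.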

My recommendation is to first try to produce a Yjs document that is “too large”. Then, optimize the representation, i.e., the operations that lead to such a large document. Only if none of that helps would I recommend purging document state and starting anew. However, I have only recommended that once, to a company that uses Yjs for syncing game application state (I recommended the Y.Map approach above). Purging leads to a lot of other problems that should be avoided.