Should the size of a binary ydoc be monotonically increasing?

I’ve been doing some automated testing of a TipTap implementation using a ydoc. My understanding is that a ydoc stores the entire history. I would think that this means the size of the binary ydoc would have to be monotonically increasing, because it wouldn’t be able to delete any information if it contains all the history.

However, it seems like the ydoc size goes up and down, and does NOT simply increase.

Is this correct? Am I doing something wrong?

Here is how I am saving the doc, which I periodically write to disk during editing:

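  // Encode the full document state as one binary update (Uint8Array).
  const ydocUintArray = Y.encodeStateAsUpdate(document);
  // Wrap it in a Node.js Buffer for writing to disk.
  const ydocBuffer = Buffer.from(ydocUintArray);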

I’m issuing random mutations. When I have a huge amount of text in the ydoc, it can reach, for example, 10 MB after writing to disk. A single random deletion can then drop the doc down to 2 KB.

Does that 2 KB file still have the entire history? Am I doing something else wrong? Am I misunderstanding how the ydoc works?

I could be wrong, but I think what you’re seeing might be the result of garbage collection:

Recent publications describe Yjs as a CRDT that performs some kind of garbage collection scheme to collect tombstones to reduce the size of the document. The document size only decreases because of the optimizations that are laid out in this article. Yjs doesn’t perform a garbage collection scheme that would result in convergence issues. To be fair, my first publication in 2016 described a garbage collection scheme including its limitations. The mentioned garbage collection scheme did work as intended, but was never enabled by default because it, as it was described, only works under specific circumstances and needed more work. The garbage collection approach has been removed in 2017 and replaced by a compound representation to improve performance. This article shows that tombstone garbage collection is not even necessary for CRDTs to work in practice.

I’m not entirely clear from that description if the newer “compound representation” that Kevin describes still involves deletion of tombstones, though based on discussions of snapshots and version restoration I’m guessing it does.

You could run your tests with new Y.Doc({ gc: false }) to compare.


+1 to it being GC. With collection turned on, your doc sizes will trend upwards over time and have a rising floor size, but will go up and down in the short term, especially for runs of edits and deletes, which are the easiest for the GC to clean up, iirc.

Indeed, looks like that’s it. GC is turned on in my implementation, and this description does indeed suggest deletions are purged:

Thank you all!
