We are using Yjs with the UndoManager, and we've noticed that this makes the doc size grow unbounded (albeit slowly, thanks to the optimizations around encoding, merging items, etc.). We have a few related questions:
We see that the UndoManager exposes a clear() method, which calls keepItem(item, false) on the deleted items it removes from the undo stack so that they can be GC'd. But we don't see any mechanism that actually runs GC on those items. From what we can tell, GC only happens automatically at the end of each transaction, and only looks at the items deleted in that transaction. What was the intended way to GC the deleted items that were cleared from the undo stack? Are we meant to call the top-level Y.tryGc() method on the entire doc manually?
We'd love a way to truncate part of the undo stack instead of clearing it entirely, e.g. if we want to support undoing only the most recent X actions. Are you open to adding that ability? We are also open to giving this a stab ourselves. Do you have suggestions on what the interface should look like? For example, a configurable maximum size or number of entries, or simply a truncate() method?
Generally, it should not be a problem to keep the items. The next time you load the document, Yjs will properly gc them. This is the preferred way to gc items.
You can clear the data (I honestly don't remember implementing this) and make the items available for gc. However, there might be unintended side effects when you use multiple UndoManagers. Today, I'd probably not allow items to be garbage-collected again (or at least I'd rename the method to unstableClearItems() and add a note that it might break other UndoManagers).
You can always safely call Y.tryGc. I didn't test it, but the document should shrink again.
You can also partially gc the undo stack by clearing (see clear()) only the undo/redo items that you don't need anymore.
You can safely modify the undoStack and redoStack by deleting and reordering items. That shouldn't be a problem. If you only want to keep the past X items, you can just splice after undoManager.on('stack-item-added', () => .. ).
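The splice suggested here might look like the following TypeScript sketch. UndoStackLike is a hypothetical structural type standing in for Y.UndoManager, whose undoStack is a public array with the newest entry last (undo() pops from the end). Note that splicing alone only shortens the stack; it does not mark the evicted entries' deleted items as collectable.

```typescript
// Hypothetical structural type: the only part of Y.UndoManager this helper
// touches is its public undoStack array (newest entry last).
interface UndoStackLike<T> {
  undoStack: T[];
}

// Keep only the newest `keep` entries on the undo stack and return the
// evicted (oldest) entries. This shortens the stack but does NOT release
// the evicted entries' deleted items for garbage collection.
function limitUndoStack<T>(um: UndoStackLike<T>, keep: number): T[] {
  const excess = um.undoStack.length - Math.max(keep, 0);
  if (excess <= 0) return [];
  // splice mutates the array in place and returns the removed entries.
  return um.undoStack.splice(0, excess);
}
```

Hooked up as described, that would be undoManager.on('stack-item-added', () => { limitUndoStack(undoManager, 5) }).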
Ah, OK, we didn't realize we could modify the undoStack/redoStack directly. Makes sense!
It doesn't seem like clear() accepts any parameters controlling which items to clear (just the booleans for undo vs. redo), but we've got it working for our case, where we only want to keep the last 5 undo items.
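One way to get this effect with only public UndoManager members is sketched below; it is a reconstruction under stated assumptions, not necessarily the code used here. It assumes a Yjs version whose clear() takes the two booleans mentioned above, and that clear(true, false) marks the deletions of every entry currently on the undo stack as "don't keep" before emptying undoStack. The evicted entries are therefore swapped in as the whole undo stack just for that call. ClearableUndoManager is a hypothetical structural type covering only the members used. Keep in mind that clear() opens a transaction and emits 'stack-cleared', and that the warning above about multiple UndoManagers still applies.

```typescript
// Hypothetical structural type covering the Y.UndoManager members used here.
interface ClearableUndoManager<T> {
  undoStack: T[];
  clear(clearUndoStack?: boolean, clearRedoStack?: boolean): void;
}

// Mark the given entries' deleted items as collectable by making them the
// entire undo stack for the duration of clear(true, false), then restoring
// the real undo stack afterwards.
function releaseStackItems<T>(um: ClearableUndoManager<T>, items: T[]): void {
  if (items.length === 0) return;
  const saved = um.undoStack;
  um.undoStack = items;
  try {
    um.clear(true, false); // undo stack only; the redo stack is untouched
  } finally {
    um.undoStack = saved;
  }
}

// Keep the newest `keep` undo entries and release the older ones.
function truncateUndoStack<T>(um: ClearableUndoManager<T>, keep: number): void {
  const excess = um.undoStack.length - Math.max(keep, 0);
  if (excess <= 0) return;
  releaseStackItems(um, um.undoStack.splice(0, excess));
}
```

With keep = 5, this would be called from the same stack-item-added listener as the plain splice.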
We also noticed that the doc size could still grow unbounded if you repeatedly alternate between undo and redo. We addressed this by doing a similar operation on the undo and redo stacks in a stack-item-popped listener, in order to “clear” the stack item that was just popped off.
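A sketch of such a stack-item-popped listener, under the same assumptions as before about clear(true, false). It assumes the popped entry is no longer needed by anything once the event fires, and StackItemPoppedEvent is a hypothetical type describing only the subset of the Yjs event payload (stackItem, type) used here. The undo stack serves as the temporary vehicle for both undo and redo pops, because clear(true, false) only looks at whatever is in undoStack at that moment.

```typescript
// Hypothetical structural type covering the Y.UndoManager members used here.
interface ClearableUndoManager<T> {
  undoStack: T[];
  clear(clearUndoStack?: boolean, clearRedoStack?: boolean): void;
}

// Subset of the payload passed to 'stack-item-popped' listeners (assumed).
interface StackItemPoppedEvent<T> {
  stackItem: T;
  type: "undo" | "redo";
}

// Release the entry that undo() or redo() just popped so that its deleted
// items become collectable, then restore the real undo stack.
function releasePoppedItem<T>(
  um: ClearableUndoManager<T>,
  event: StackItemPoppedEvent<T>,
): void {
  const saved = um.undoStack;
  um.undoStack = [event.stackItem];
  try {
    um.clear(true, false);
  } finally {
    um.undoStack = saved;
  }
}
```

Wiring it up would be along the lines of undoManager.on('stack-item-popped', event => releasePoppedItem(undoManager, event)).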
Is this what you had in mind?
Also – do you have a better suggestion for building the delete set to pass to Y.tryGc()? We are concerned about the overhead of calling Y.snapshot(), and it didn’t look like addToDeleteSet() or the DeleteItem class were exposed.
Yjs is a CRDT implementation. As such, it must retain some metadata for every edit to resolve potential conflicts. It is pretty good at compressing information and “garbage-collecting” as much as possible, but there will be some information left (only a few bytes for any edit).
There is a function “createDeleteSetFromStructStore”, but I’m not sure if it is exposed. Y.snapshot is a good idea.
Yep, makes sense that we always keep some metadata even after GC. What I was describing, though, was that if you alternated between undo and redo, GC didn't happen at all. The undo/redo stacks were always of length 0 or 1, so we never hit our cleanup threshold, but the deletions still never got GC'd. I think this is because we don't mark them as “don't keep” when we pop something off the stack, which is why adding the listener on stack-item-popped seemed to fix it!
createDeleteSetFromStructStore also works, though it seems that's also where most of the expensive computation happens (iterating over all structs). We don't know yet whether it'll cause issues, so we'll try it out.
Oh right. GC will be disabled on those items forever. That is something we could change. Feel free to open a ticket on GitHub, but to be honest, it will probably take me a while to get to it.
You can always safely run it (maybe not inside a transaction, because that would mess up the event emitter). But you are right, it will only clean up things that are marked as “don't keep”.
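Putting the last few replies together, a full GC pass might look like the sketch below. It assumes Y.tryGc and Y.createDeleteSetFromStructStore are exported by the installed yjs version (both are used in this thread). As far as I can tell from the Yjs source, Y.snapshot(doc) builds its delete set with the same createDeleteSetFromStructStore call plus a state vector. If so, switching saves little, since both walk every struct. gcDeletedItems is a hypothetical helper name; call it outside of doc.transact(...).

```typescript
import * as Y from "yjs";

// Run one GC pass over every deleted item in the document. Call this outside
// of doc.transact(...). Only items that are deleted, no longer marked as kept
// (e.g. released via UndoManager.clear()) and accepted by doc.gcFilter are
// collected; deletions still referenced by an undo/redo entry stay intact.
function gcDeletedItems(doc: Y.Doc): void {
  // Respect documents created with { gc: false }.
  if (!doc.gc) return;
  // Delete set covering every deleted struct; this walks the whole store,
  // so the cost grows with document size.
  const ds = Y.createDeleteSetFromStructStore(doc.store);
  Y.tryGc(ds, doc.store, doc.gcFilter);
}
```

How often to run this is a judgment call: after trimming the stacks is a natural point, but each pass is proportional to the size of the document.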