Garbage Collection and Version Snapshotting

In my production app with millions of Y.Docs, we have a certain type of domain model entity that uses Yjs with garbage collection turned OFF because we utilize version snapshotting.

This tradeoff has not been a great one. While the snapshots are nice, the GC is far more useful. We’re finding docs with GC disabled are pretty awful for performance, disk space, and network throughput. I would advise anyone thinking about doing the same to strongly reconsider.
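
For anyone who wants to see the size effect on a small scale, here is a rough sketch in TypeScript, assuming the `yjs` package is installed. It inserts and then deletes the same text in two docs, one with GC on and one with GC off, and compares the size of the encoded state. The `churn` helper, the `'content'` key, and the round counts are made up for illustration; real numbers depend entirely on your data and editing patterns.

```typescript
import * as Y from 'yjs'

// Hypothetical helper: repeatedly insert a 1000-character string and delete it
// again, then return the size of the full encoded document state in bytes.
function churn(doc: Y.Doc, rounds: number): number {
  const text = doc.getText('content')
  for (let i = 0; i < rounds; i++) {
    text.insert(0, 'x'.repeat(1000))
    text.delete(0, 1000)
  }
  return Y.encodeStateAsUpdate(doc).byteLength
}

// Same edits, different GC setting.
const sizeWithGc = churn(new Y.Doc(), 100) // gc defaults to true
const sizeWithoutGc = churn(new Y.Doc({ gc: false }), 100)

console.log({ sizeWithGc, sizeWithoutGc })
```

Both docs end up with empty text, but the one with GC disabled still carries every deleted string in its encoded state, which is the extra weight that shows up in storage and on the wire.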

If I just turn GC back on for all these docs, anyone using the snapshots will get screwed over, but the ramifications of turning GC off have been bad enough that I’m considering it. Does anyone have any ideas for using these features together gracefully? There was talk of being able to use both, but I can’t find the example, and from what I remember it seemed like an “off the books” kind of thing.

I found a compromise:

Our backend now garbage-collects everything, no matter what. The client garbage-collects anything that isn’t exposed to the versioning feature, and now we store the snapshots in the browser, rather than in the Y.Doc. We no longer offer fully networked versioning, only local version snapshots that are lost when the cache is cleared. No one seems to mind, as most usage of the version snapshot feature was recovering accidentally deleted data after the session had ended.
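
A minimal sketch of what local-only version snapshots could look like on the client, in TypeScript and assuming the `yjs` package. This is not the actual code from the app described above. The `localVersions` map is a stand-in for browser storage such as IndexedDB, and the `'content'` key and `'v1'` label are hypothetical. As far as I know, `Y.createDocFromSnapshot` needs GC disabled on the doc it restores from, which is why the versioned client doc keeps `gc: false` here.

```typescript
import * as Y from 'yjs'

// Stand-in for browser storage (e.g. IndexedDB); a Map keeps the sketch runnable anywhere.
const localVersions = new Map<string, Uint8Array>()

// Client-side doc that is exposed to versioning: GC stays off so that
// snapshots can still be resolved against deleted content.
const clientDoc = new Y.Doc({ gc: false })
const text = clientDoc.getText('content')
text.insert(0, 'hello world')

// Save a version locally instead of inside the Y.Doc.
localVersions.set('v1', Y.encodeSnapshot(Y.snapshot(clientDoc)))

// The user accidentally deletes some data...
text.delete(0, 6)

// ...and later restores the saved version into a separate doc.
const saved = Y.decodeSnapshot(localVersions.get('v1')!)
const restored = Y.createDocFromSnapshot(clientDoc, saved)
const restoredText = restored.getText('content').toString()

console.log({ current: text.toString(), restored: restoredText })
```

The saved versions only live as long as the local store does, which matches the tradeoff described above: once the cache is cleared, they are gone.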

EDIT:

Before and after enabling GC on the server. :wink: The CPU spikes are from database writes of oversized, non-GC’d documents. GC was enabled on the 11th.

[Screenshot 2023-05-20 at 5.26.48 PM]

@braden Can you share the pseudocode for how you enabled client and server garbage collection? This would be of great help.

@braden Could you explain how you achieved “Our backend now garbage-collects everything”?

You set { gc: true } in the Y.Doc options when instantiating a document. gc: true is also the default, so in practice you only need to pass { gc: false } when you want to explicitly turn GC off.
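
In code, that looks roughly like this (TypeScript, assuming the `yjs` package; the variable names are just for illustration):

```typescript
import * as Y from 'yjs'

// GC is on unless you say otherwise.
const defaultDoc = new Y.Doc()

// Explicitly enabling GC, e.g. for every doc the backend loads.
const gcDoc = new Y.Doc({ gc: true })

// Explicitly disabling GC, e.g. for a client doc that still needs snapshots.
const versionedDoc = new Y.Doc({ gc: false })

// The setting is readable on the doc instance.
console.log(defaultDoc.gc, gcDoc.gc, versionedDoc.gc)
```

The option is set per document instance, so the server and the client can each choose their own setting for the same document.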
