Compliance Limitations on Data Retention

I’d like to store documents in a central server/database and preserve their history. For compliance reasons, though, if content (say email addresses or other personally identifying information) is deleted from a document, I have to make sure that no record of the deleted content remains after some time (say 30 days).

Correct me if I’m wrong, but it seems like garbage collection deletes ALL content that is not in the latest/current version/state-vector of a document, leaving no history whatsoever. Is it possible to garbage collect only the content that was deleted (approximately) before some state-vector? (…this just occurred to me) Could I do something like this: take a snapshot [A] of the document [B] at the earliest compliant state-vector [C], produce the diff/updates between A and B [D], garbage collect A to get [E], and apply D to E to get a document with a compliant history that I can safely save in the database?

Thanks in advance.

I see now that there is an (undocumented?) gcFilter option that can be passed when constructing a Y.Doc, which is probably the right way to go about it.

What you are trying to do makes sense, but it is very complicated and requires a deep understanding of the internal structure of a Yjs document.

The yjs.dev website contains a demo that garbage-collects in-between snapshots. There is an example of how you can use gcFilter here: https://github.com/yjs/website/blob/master/src/sharedTypes.js#L15
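
For illustration, here is a minimal sketch of a state-vector-based predicate in the shape gcFilter expects. It is not taken from the linked file. The helper name makeGcFilter and the sample values are hypothetical, and the sketch assumes that returning true lets Yjs collect a deleted item while returning false keeps it. The items below are plain stand-in objects, so the snippet runs without Yjs installed.

```javascript
// Hypothetical helper (not part of Yjs): builds a predicate in the shape
// the gcFilter option expects, i.e. (item) => boolean. Assumption: true
// means "this deleted item may be garbage-collected", false means "keep it".
//
// cutoffStateVector: Map<clientID, clock>, for example the result of
// Y.decodeStateVector(...) for the earliest state that may still be kept.
function makeGcFilter (cutoffStateVector) {
  return (item) => {
    // Clock the client had reached at the cutoff (0 if it was unknown then).
    const cutoffClock = cutoffStateVector.get(item.id.client) || 0
    // Items created before the cutoff are eligible for collection;
    // items created at or after it keep their content as history.
    return item.id.clock < cutoffClock
  }
}

// Stand-in objects carrying only the fields the predicate reads.
const cutoff = new Map([[1, 10]])
const filter = makeGcFilter(cutoff)
const oldItem = { id: { client: 1, clock: 3 } }
const recentItem = { id: { client: 1, clock: 12 } }
const newClientItem = { id: { client: 2, clock: 0 } }

console.log(filter(oldItem))       // true  (may be collected)
console.log(filter(recentItem))    // false (kept)
console.log(filter(newClientItem)) // false (client unknown at cutoff, kept)

// With Yjs installed, the predicate would be wired in roughly like this:
//   import * as Y from 'yjs'
//   const doc = new Y.Doc({ gc: true, gcFilter: makeGcFilter(cutoff) })
```

This is deliberately conservative: content created before the cutoff is collected once deleted, even if the deletion itself was recent, so some history is lost but nothing older than the cutoff survives deletion. As far as I can tell, the filter is only consulted when a deletion is processed, so moving the cutoff forward would probably mean reloading the stored update into a fresh Y.Doc with the new filter.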

But first you should understand how the Yjs document is modeled internally. https://docs.yjs.dev/api/internals

Please note that undocumented APIs are considered experimental and are subject to change.