Document branches like git branches?

Hideman85 · August 9, 2021, 11:03am

Hi,

I’m wondering whether I can build something like a branch structure with Yjs docs.

My use case: imagine I have a parent document with a lot of work done on it, and we want to make a copy of this document to derive some content from it.
We should be able to integrate parent changes into our clone and, conversely, to move some child content back to the parent.
The child document could also have different users and access policies, so we want to clean up the data somewhat.

For that, I have multiple questions:

Can I force garbage collection after cloning my document? Since I have a versioned doc, GC is disabled on the documents, but for the brand-new clone it would be really great to compact the document.
Can I reset the clientID & PermanentUserData on all the current data? Basically, we want the data to appear as just pre-initialized data in the cloned doc. If we want to know more (and have the permission), we could just look at the history of the parent.
Then, once we have a clone, can we easily take the updates from one doc and put them in the other one?
Last question, a bit more tricky, can we cherry-pick an update/transaction and apply to the parent? I mean, I can do a lot of updates on it, but I just want to take some of them back to the parent.

Thanks in advance for your help

dmonad · August 16, 2021, 10:55am

Hi @Hideman85,

If you are building a collaborative text / richtext application, you shouldn’t worry about manually cleaning up metadata (garbage collection). The gist of this article is that it is basically impossible for humans to create documents that are too large for Yjs (that is, unless you are storing images or other large blobs in the file).

Can I force garbage collection after cloning my document? Since I have a versioned doc, GC is disabled on the documents, but for the brand-new clone it would be really great to compact the document.

Yes, that is possible. You can disable garbage collection for tracking changes (or even only keep changes that are relevant for restoring versions). Once you create a new snapshot, you could store the non-gc’d document, reload it again with garbage-collection enabled and then disable garbage-collection again for tracking changes. If you want to load a certain version, you can load the specific snapshot. In order to load the non-gc’d document, you can load all document snapshots from beginning to end:

nongcd = Y.mergeUpdates([version1, version2, version3])

The merged document has the complete version history of all changes.

Can I reset the clientID & PermanentUserData on all the current data? Basically, we want the data to appear as just pre-initialized data in the cloned doc. If we want to know more (and have the permission), we could just look at the history of the parent.

Can you please clarify what you mean?

Then, once we have a clone, can we easily take the updates from one doc and put them in the other one?

Last question, a bit more tricky, can we cherry-pick an update/transaction and apply to the parent? I mean, I can do a lot of updates on it, but I just want to take some of them back to the parent.

It is generally not possible to cherry-pick specific changes from a version history (in the form of a document update) and apply them to another document. In some cases, this might work, but sometimes updates depend on each other. You always need to merge all changes from the diverged documents in the order in which they have been produced. Instead, you could work with a different representation for the changes (e.g. ProseMirror transactions or the delta format).

Cheers,
Kevin

Hideman85 · August 17, 2021, 7:51am

Hi @dmonad,

Thanks a lot for your reply. Now that I’m more familiar with Yjs, I have also worked out part of the answer myself.

Currently, I’m using Yjs for versioning, conflict-free merging, and the blame feature rather than for realtime collaboration (our app is not ready for that yet), which is why I was wondering about the “branch” aspect.

In my case, GC or metadata compression is not meant to save space, but to remove information that we don’t want to expose. To understand that, consider that we have some public documents maintained by us that others can clone and derive from. When they clone a doc, we don’t want them to be able to see who did what in the doc, but rather just “this is the initial content”.

After a deeper look, I now understand that all content carries the clientID and the clock, and that PermanentUserData “just” retains a mapping from clientID to user and from clientID to deletions, if I understand correctly. So if I want to collapse the metadata at the end, I can simply create a brand new doc and iterate over the previous one to clone the content, so I would have only one clientID and one clock. The question is: if I do so, could I still apply the updates made on the clone to the parent? I probably need the clock to be the latest one in the source doc to be able to merge, right?

The cherry-pick feature would be super nice. I think we could still compute the originLeft and originRight and try to find origins that are part of the previous document, but this would only work if the parent exists in the previous document; otherwise you would also have to pick the parent’s creation. I do not think deltas would help either, since we have the same issue: the parent must exist, or we need to pick its creation too.

Anyway, I will look into that challenge at a later time; for now I still need to finish our implementation.

And thanks a lot for this great lib, it is really awesome, and in the end the implementation is pretty easy to understand.

dmonad · August 17, 2021, 8:56am

This is correct!

So if I want to collapse the metadata at the end, I can simply create a brand new doc and iterate over the previous one to clone the content, so I would have only one clientID and one clock.

The clientIDs don’t leak any user information. In order to get rid of stored content, you can simply load the content into a fresh Y.Doc with gc enabled (it is enabled by default).

const freshDoc = new Y.Doc({ gc: true })
Y.applyUpdate(freshDoc, Y.encodeStateAsUpdate(existingDoc))

This way, you can still merge updates from the original document and vice versa. The metadata that is retained doesn’t leak any user information or editing snippets. If you tried, you might be able to recognize editing patterns, since you can see how much content was inserted and deleted.

Good luck with your app. It would be awesome if you could share it here once you have something to show (in the “show” section).