Oh my, the deep dive video is such a good one. I'm still midway through watching it, and I'm starting to piece together the philosophy behind Yjs’ implementation, filling many of the holes I had overlooked when I just read the code. It is just so cool. Thank you so much for doing that with Joseph; it was like a dream come true.
I guess you can associate a timestamp with each update on the server. But you would also need to store every single update separately on the server. Then you can reconstruct the Yjs document using all updates before a certain timestamp. You could also store a lot of snapshots instead. I’m not sure which approach would work better in practice. The advantage of using snapshots is that you don’t need to reconstruct the document all the time. Maybe you could simply take a snapshot after 5 seconds without any changes by any user.
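Here is a minimal sketch of the timestamped log and idle snapshots described above. It is written in TypeScript and kept independent of Yjs so that it runs on its own. Everything in it is hypothetical: `TimestampedLog`, `record`, `maybeSnapshot`, `reconstructAt` and `IDLE_MS` are illustrative names. Updates are treated as opaque bytes, and the document state comes from a pure `apply` fold that you supply. With Yjs, that fold might be built on `Y.mergeUpdates` over encoded updates, but that wiring is an assumption and is not shown.

```typescript
// One stored update: opaque CRDT bytes plus the server time (ms) it arrived.
interface StoredUpdate {
  timestamp: number;
  update: Uint8Array;
}

// A snapshot: the state after applying the first `upTo` stored updates.
interface Snapshot<S> {
  timestamp: number;
  upTo: number;
  state: S;
}

// Hypothetical threshold: snapshot once nobody has changed the doc for 5 seconds.
const IDLE_MS = 5000;

class TimestampedLog<S> {
  private readonly updates: StoredUpdate[] = [];
  private readonly snapshots: Snapshot<S>[] = [];
  private lastChange = -Infinity;
  private readonly empty: () => S;
  // Must be pure: returns a new state and never mutates `state` (snapshots share it).
  private readonly apply: (state: S, update: Uint8Array) => S;

  constructor(empty: () => S, apply: (state: S, update: Uint8Array) => S) {
    this.empty = empty;
    this.apply = apply;
  }

  // Store every update separately. Assumes timestamps never decrease.
  record(timestamp: number, update: Uint8Array): void {
    this.updates.push({ timestamp, update });
    this.lastChange = timestamp;
  }

  // Call periodically (e.g. from a timer). Takes a snapshot if the document has been
  // idle for IDLE_MS and the newest snapshot does not already include every update.
  maybeSnapshot(now: number): boolean {
    const newest = this.snapshots[this.snapshots.length - 1];
    const covered = newest !== undefined && newest.upTo === this.updates.length;
    if (this.updates.length === 0 || covered || now - this.lastChange < IDLE_MS) {
      return false;
    }
    this.snapshots.push({
      timestamp: this.lastChange,
      upTo: this.updates.length,
      state: this.reconstructAt(this.lastChange),
    });
    return true;
  }

  // Rebuild the document as it was at `time`: start from the newest snapshot taken at or
  // before `time`, then replay only the updates stored after that snapshot.
  reconstructAt(time: number): S {
    let base: Snapshot<S> | undefined;
    for (const s of this.snapshots) {
      if (s.timestamp <= time) base = s;
    }
    let state = base !== undefined ? base.state : this.empty();
    for (let i = base !== undefined ? base.upTo : 0; i < this.updates.length; i++) {
      if (this.updates[i].timestamp > time) break; // timestamps are non-decreasing
      state = this.apply(state, this.updates[i].update);
    }
    return state;
  }
}
```

Taking snapshots only after an idle period ties their number to editing sessions rather than to individual keystrokes. The cost of any reconstruction is then bounded by the updates recorded since the nearest earlier snapshot.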
Indeed, I believe it would need one data structure for individual updates and another for snapshots (automatic or manual), so that we don't need to replay all the updates every time. Since the time slider (or the history feature as a whole) is not something most users would access all the time, we can probably move older updates to colder storage, while keeping the latest few snapshots, plus the updates not yet included in a snapshot, in hotter storage for synchronization.
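The hot/cold split above could be sketched roughly like this. The sketch only decides which records go where. The record shapes, the `planTiers` name and the `keepSnapshots` parameter are hypothetical, and the actual storage backends are left out.

```typescript
// Hypothetical metadata records; the update and snapshot bytes would live elsewhere.
interface UpdateRecord {
  seq: number; // server-assigned, strictly increasing
  timestamp: number;
}

interface SnapshotRecord {
  coversUpToSeq: number; // every update with seq <= this is folded into the snapshot
  timestamp: number;
}

interface TierPlan {
  hotSnapshots: SnapshotRecord[];
  coldSnapshots: SnapshotRecord[];
  hotUpdates: UpdateRecord[];  // not yet in any snapshot; needed for sync
  coldUpdates: UpdateRecord[]; // already in a snapshot; only the time slider needs them
}

// Decide what stays in fast storage: the `keepSnapshots` newest snapshots plus every
// update not yet covered by the newest snapshot. Everything else can move to cold storage.
function planTiers(
  snapshots: SnapshotRecord[],
  updates: UpdateRecord[],
  keepSnapshots: number,
): TierPlan {
  const keep = Math.max(1, keepSnapshots); // sync needs at least the newest snapshot
  const sorted = [...snapshots].sort((a, b) => a.coversUpToSeq - b.coversUpToSeq);
  const split = Math.max(0, sorted.length - keep);
  const newestCovered =
    sorted.length > 0 ? sorted[sorted.length - 1].coversUpToSeq : -Infinity;
  return {
    hotSnapshots: sorted.slice(split),
    coldSnapshots: sorted.slice(0, split),
    hotUpdates: updates.filter((u) => u.seq > newestCovered),
    coldUpdates: updates.filter((u) => u.seq <= newestCovered),
  };
}
```

Take snapshots covering updates up to 10, 20 and 30, with `keepSnapshots = 2`. The snapshots at 20 and 30 stay hot, along with any updates numbered above 30. The snapshot at 10 and updates 1 to 30 can be archived. Scrubbing the time slider back past the newest snapshot would then read from cold storage, which matches the assumption that history is accessed rarely.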
A graph representation would definitely be interesting. A graph representation is much easier to implement if you have a central authority. In Yjs, you can merge with any peer at any time, so it is a bit harder to represent. But I think that would be very interesting.
I imagine the graph representation being reminiscent of git history. When a client syncs with the central authority, it is like a git merge: the merge commit references both the original branch's latest commit (the authority's updates/snapshots) and the merging branch's latest commit (the client's updates). Unlike commits, CRDT updates are much more versatile and do not contain references to previous records. So I think both the client and the server will have to keep track of updates in a strictly incremental manner (which I believe y-leveldb already does). The server also needs to save individual updates, including the session ID and the client's own incremental update sequence number. To my knowledge, it does not currently do this.
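The git analogy can be made concrete with a server-side history graph like the following rough sketch. As far as I know, none of this is an existing Yjs or y-leveldb API. `HistoryGraph`, `receive`, `UpdateEntry` and the `baseId` that a client would send (the id of the authority node it last synced with) are all hypothetical. Because CRDT updates carry no parent references themselves, the server attaches them when it stores each batch.

```typescript
// One stored update, tagged with who sent it and the client's own sequence number.
interface UpdateEntry {
  sessionId: string;
  clientSeq: number;
  update: Uint8Array; // opaque CRDT update bytes
}

interface HistoryNode {
  id: number;
  parents: number[];      // [] for the root, [p] for a linear step, [authority, branch] for a merge
  entries: UpdateEntry[]; // empty for the root and for merge nodes
}

class HistoryGraph {
  readonly nodes: HistoryNode[] = [];
  authorityHead: number;

  constructor() {
    // An empty root so every later node has a parent to point at.
    this.authorityHead = this.add([], []);
  }

  private add(parents: number[], entries: UpdateEntry[]): number {
    const id = this.nodes.length;
    this.nodes.push({ id, parents, entries });
    return id;
  }

  // `baseId`: the authority node this client last synced with (hypothetical field
  // the client would send). Returns the new authority head.
  receive(baseId: number, entries: UpdateEntry[]): number {
    if (!Number.isInteger(baseId) || baseId < 0 || baseId >= this.nodes.length) {
      throw new Error(`unknown base node ${baseId}`);
    }
    if (baseId === this.authorityHead) {
      // Nothing happened on the authority meanwhile: fast-forward, no merge node.
      this.authorityHead = this.add([baseId], entries);
    } else {
      // The client's offline work becomes a branch off the state it last saw...
      const branch = this.add([baseId], entries);
      // ...and the merge node references both the authority's latest node and the branch.
      this.authorityHead = this.add([this.authorityHead, branch], []);
    }
    return this.authorityHead;
  }
}
```

If nobody else has written since the client's base, the batch is appended as a fast-forward, just as in git. Only concurrent work produces a two-parent merge node. The CRDT converges regardless of order, so the merge node needs no update payload of its own. It only records where the two histories joined.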
If I were to hack on the internals of Yjs here, where would you suggest I start? Should I first start with y-prosemirror, to add a timestamp to each update (I don't think there is an existing API for that?), and then y-protocols, to create a new synchronization protocol that requires clients to send all of their updates serially instead of just the two-step handshake? Or do you have any other suggestions?
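Here is one way the serial protocol from the second question could look on the server side. The sketch assumes that each client session numbers its updates 1, 2, 3, ... and stamps them with a time, and that the server stores an update only if it is exactly the next one expected for that session. `SerialUpdateMsg`, `SerialReceiver` and the ack shapes are hypothetical. They are not part of y-protocols, whose existing sync, as I understand it, exchanges state vectors and then the missing updates.

```typescript
// Hypothetical message a client would send under the proposed serial protocol.
interface SerialUpdateMsg {
  sessionId: string;
  clientSeq: number;  // 1, 2, 3, ... per session, no gaps
  timestamp: number;  // time the update was produced (assumed client-supplied)
  update: Uint8Array; // opaque CRDT update bytes
}

type Ack =
  | { kind: "ack"; nextExpected: number }       // stored; send nextExpected next
  | { kind: "duplicate"; nextExpected: number } // already stored; safe to drop
  | { kind: "gap"; nextExpected: number };      // out of order; resend from nextExpected

class SerialReceiver {
  private readonly nextSeq = new Map<string, number>();
  readonly stored: SerialUpdateMsg[] = [];

  handle(msg: SerialUpdateMsg): Ack {
    const expected = this.nextSeq.get(msg.sessionId) ?? 1;
    if (msg.clientSeq < expected) return { kind: "duplicate", nextExpected: expected };
    if (msg.clientSeq > expected) return { kind: "gap", nextExpected: expected };
    // Exactly the next update: keep it individually, in per-session order.
    this.stored.push(msg);
    this.nextSeq.set(msg.sessionId, expected + 1);
    return { kind: "ack", nextExpected: expected + 1 };
  }
}
```

Rejecting gaps rather than buffering keeps the server simple, since the client just resends from `nextExpected`. The trade-off compared with the state-vector handshake is that the client must keep its unacknowledged updates around until they are confirmed.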
Thank you again!