As part of a new project, we are looking into realtime collaboration, with a goal similar to Figma: being able to collaborate on design documents.
We’ve run some tests with y-websocket (with y-mongodb) and we think this would be a viable solution for us. However, it seems that the yjs transactions collection/table (either in memory or persisted) is the leading source of data and that a document should always be initialized through this data.
We currently persist the actual data inside the ydocument into a separate collection when all connections in a specific room close. Ideally we’d like to make this data the leading source when a new session starts, after which the transactions/updates will become the source again.
So to clarify, we’d like the situation to be like this:
user opens design document
load “real” data from our separate collection
let yjs take over with transactions
user closes design document
save “real” data (from content of ydoc)
clear transactional data
repeat when needed
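The lifecycle in the steps above can be sketched roughly as follows. This is a minimal TypeScript sketch, not y-websocket code: `RoomManager`, `SnapshotStore`, `UpdateLog` and the in-memory stores are hypothetical names, and a plain object stands in for the content of the ydoc so that the example runs without any dependencies. In a real setup the `data` field would presumably be a Y.Doc and the log entries would be Yjs updates.

```typescript
// Hypothetical stand-ins; none of these names come from Yjs or y-websocket.
type Json = Record<string, unknown>;

interface SnapshotStore {
  load(room: string): Json | undefined;
  save(room: string, data: Json): void;
}

interface UpdateLog {
  append(room: string, update: string): void;
  clear(room: string): void;
  size(room: string): number;
}

interface Room {
  data: Json; // stand-in for the live ydoc content
  connections: number;
}

class MemorySnapshots implements SnapshotStore {
  private docs = new Map<string, Json>();
  load(room: string): Json | undefined { return this.docs.get(room); }
  save(room: string, data: Json): void { this.docs.set(room, data); }
}

class MemoryUpdates implements UpdateLog {
  private log = new Map<string, string[]>();
  append(room: string, update: string): void {
    const entries = this.log.get(room) ?? [];
    entries.push(update);
    this.log.set(room, entries);
  }
  clear(room: string): void { this.log.delete(room); }
  size(room: string): number { return this.log.get(room)?.length ?? 0; }
}

class RoomManager {
  private rooms = new Map<string, Room>();
  private snapshots: SnapshotStore;
  private updates: UpdateLog;

  constructor(snapshots: SnapshotStore, updates: UpdateLog) {
    this.snapshots = snapshots;
    this.updates = updates;
  }

  // Only the first connection initializes from the "real" data;
  // later joiners share the state that is already live in memory.
  connect(name: string): Room {
    let room = this.rooms.get(name);
    if (!room) {
      room = { data: { ...(this.snapshots.load(name) ?? {}) }, connections: 0 };
      this.rooms.set(name, room);
    }
    room.connections += 1;
    return room;
  }

  // While the session is live, incremental updates are the source of truth.
  edit(name: string, key: string, value: unknown): void {
    const room = this.rooms.get(name);
    if (!room) throw new Error(`room ${name} is not open`);
    room.data[key] = value;
    this.updates.append(name, JSON.stringify({ key, value }));
  }

  // When the last connection closes: save the "real" data, clear the updates.
  disconnect(name: string): void {
    const room = this.rooms.get(name);
    if (!room) return;
    room.connections -= 1;
    if (room.connections > 0) return;
    this.snapshots.save(name, { ...room.data });
    this.updates.clear(name);
    this.rooms.delete(name);
  }
}

const snapshots = new MemorySnapshots();
const updates = new MemoryUpdates();
const manager = new RoomManager(snapshots, updates);

manager.connect("design-1"); // first user: loads the snapshot (empty here)
manager.connect("design-1"); // second user: joins the live state
manager.edit("design-1", "title", "Homepage");
manager.disconnect("design-1");
manager.disconnect("design-1"); // last user leaves: save, then clear updates
```

Note that only the first connection reads the snapshot; the second one joins the room that is already live, which matches the assumption that later users sync from the running session rather than from the stored data.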
Other users that join in would sync from the websockets rather than init with the “real” data I assume?
Yes, it’s possible, with a couple of caveats that I’m aware of. We do something similar in Relm: when a designer (of a 3D world) wants to truncate the history of a relm, we export a snapshot of the current state of the YDoc and then import it into a new YDoc.
(I could wrap that up in a PR for y-websocket if it’s valuable to you and @dmonad finds it an acceptable change).
The other piece that’s a little tricky is that Yjs doesn’t keep track of the schema of your data. In other words, you might know that your YDoc consists of a y-array with a bunch of y-maps containing y-text; however, the YDoc itself doesn’t track how that maps to, say, a JSON export. So you’d need to hard-code or otherwise track the schema of the YDoc so that when you import it, you can put all the data you exported into the right Y types.
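To illustrate the schema point: in a JSON export both a y-text and a plain string come out as strings, so the import side needs to be told which is which. Below is a small dependency-free TypeScript sketch under that assumption. The `Schema` and `Plan` types and `planImport` are made up for illustration and are not part of Yjs; the function only decides which shared type each value should go into, and the actual construction of the Y types is left out.

```typescript
// Hypothetical schema description: each tag names the Yjs shared type
// that a node of the JSON export should be imported into.
type Schema =
  | { kind: "ytext" }
  | { kind: "plain" }
  | { kind: "yarray"; item: Schema }
  | { kind: "ymap"; fields: Record<string, Schema> };

// The result is a tagged tree; a real importer would create the
// corresponding Y types instead of these plain objects.
type Plan =
  | { type: "Y.Text"; text: string }
  | { type: "plain"; value: unknown }
  | { type: "Y.Array"; items: Plan[] }
  | { type: "Y.Map"; entries: Record<string, Plan> };

function planImport(schema: Schema, json: unknown): Plan {
  switch (schema.kind) {
    case "ytext":
      if (typeof json !== "string") throw new TypeError("expected a string for y-text");
      return { type: "Y.Text", text: json };
    case "plain":
      return { type: "plain", value: json };
    case "yarray":
      if (!Array.isArray(json)) throw new TypeError("expected an array for y-array");
      return { type: "Y.Array", items: json.map((v) => planImport(schema.item, v)) };
    case "ymap": {
      if (typeof json !== "object" || json === null || Array.isArray(json)) {
        throw new TypeError("expected an object for y-map");
      }
      const source = json as Record<string, unknown>;
      const entries: Record<string, Plan> = {};
      for (const [key, fieldSchema] of Object.entries(schema.fields)) {
        entries[key] = planImport(fieldSchema, source[key]);
      }
      return { type: "Y.Map", entries };
    }
  }
}

// A y-array of y-maps; "label" is collaborative text, "color" is a plain value.
const schema: Schema = {
  kind: "yarray",
  item: {
    kind: "ymap",
    fields: { label: { kind: "ytext" }, color: { kind: "plain" } },
  },
};

const exported = [{ label: "Header", color: "#ff0000" }];
const plan = planImport(schema, exported);
```

Both `label` and `color` are strings in the export; only the schema records that `label` should become a y-text again, which is the information the YDoc itself does not give you.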
@lucien Ideally, you store the Yjs document alongside your JSON representation. This will introduce some overhead because you are storing the same data twice, but there are a lot of advantages to keeping the Yjs metadata around.
A client might not realize that it has disconnected; in some cases this takes a while (e.g. over 3G, Starbucks Wi-Fi, …). Once the server document is destroyed, you won’t be able to apply the edits that client made in the meantime.
A nice feature of Yjs is that you can store your data offline using y-indexeddb. This improves load time and ensures that users never lose any data unless the server AND the client lose all their data.
When you introduce the feature that you described, and that @canadaduane implemented, then you need to think about more special cases. That is a lot of developer overhead in exchange for losing some essential features.
Even ShareDB doesn’t recommend deleting the history - ever!
In @canadaduane’s case, it really makes sense to restart the session without any associated metadata. If you have a document that really receives millions upon millions of changes every day (e.g. a game world that allows thousands of users to concurrently move & rotate 3D objects) then you should think about buying into the complexity (and the restrictions!) of restarting sessions. If you are building a collaborative application that only receives a couple of million changes in its entire lifetime, then you don’t need to think about this feature. You can always implement it later. Build it first, and improve later.
@canadaduane I’m hesitant right now to make it part of y-websocket because I don’t want to give the impression that implementing this should be the norm. This feature won’t play nicely with other features I have planned (e.g. autoscaling of y-websocket). But I would appreciate it if you would write a tutorial on how you implemented this feature. Initially I only planned to build collaborative apps, but you built a whole 3D world with Yjs. It would be interesting to hear more about the challenges and solutions you came up with.