Y-websocket - re-init from different data source

Hi!

As part of a new project, we are looking into realtime collaboration, with a goal similar to Figma: being able to collaborate on design documents.

We’ve run some tests with y-websocket (with y-mongodb) and we think this would be a viable solution for us. However, it seems that the yjs transactions collection/table (either in memory or persisted) is the leading source of data and that a document should always be initialized through this data.

We currently persist the actual data inside the ydocument into a separate collection when all connections in a specific room close. Ideally we’d like to make this data the leading source when a new session starts, after which the transactions/updates will become the source again.

So to clarify, we’d like the situation to be like this:

  • user opens design document
  • load “real” data from our separate collection
  • let yjs take over with transactions
  • user closes design document
  • save “real” data (from content of ydoc)
  • clear transactional data
  • repeat when needed

I assume other users that join in would sync over the websocket rather than init with the “real” data?

Would this be possible to do?

Yes, it’s possible, with a couple of caveats that I’m aware of. We do something similar in Relm: when a designer (of a 3D world) wants to truncate the history of a relm, we export a snapshot of the current state of the YDoc and then import it into a new YDoc.

To accomplish this, we needed the ability to “get” the current snapshot of the YDoc, and we did so by exporting a new function, getYDoc in a custom version of the y-websocket code: https://github.com/relm-us/relm/blob/main/server/yws.js#L51

(I could wrap that up in a PR for y-websocket if it’s valuable to you and @dmonad finds it an acceptable change).

The other piece that’s a little tricky is that Yjs doesn’t keep track of the schema of your data. In other words, you might know that your YDoc consists of a y-array with a bunch of y-maps containing y-text; however, the YDoc itself doesn’t track how that maps to, say, a JSON export. So you’d need to hard-code or otherwise track the schema of the YDoc so that when you import it, you can put all the data you exported into the right Y types.
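
To illustrate the schema point, here is a minimal sketch, assuming a hypothetical document made of a y-array named `layers` that holds y-maps, each with a y-text `label`. The names `LayerJSON`, `DesignJSON`, `exportDesign`, `importDesign`, and the `layers` key are made up for this example; they are not part of Yjs or of the Relm code. Only `Y.Doc`, `Y.Map`, `Y.Text`, `getArray`, and `transact` come from Yjs.

```typescript
import * as Y from 'yjs'

// Hypothetical plain-JSON shape of a design document (an assumption for illustration).
interface LayerJSON { id: string; name: string; label: string }
interface DesignJSON { layers: LayerJSON[] }

// Export: walk the schema we know about and read plain values out of the Y types.
function exportDesign(doc: Y.Doc): DesignJSON {
  const layers = doc.getArray<Y.Map<unknown>>('layers')
  return {
    layers: layers.toArray().map((layer) => ({
      id: layer.get('id') as string,
      name: layer.get('name') as string,
      label: (layer.get('label') as Y.Text).toString(),
    })),
  }
}

// Import: rebuild the same Y types in a fresh Y.Doc, which carries no edit history.
function importDesign(json: DesignJSON): Y.Doc {
  const doc = new Y.Doc()
  const layers = doc.getArray<Y.Map<unknown>>('layers')
  doc.transact(() => {
    for (const layer of json.layers) {
      const ymap = new Y.Map<unknown>()
      ymap.set('id', layer.id)
      ymap.set('name', layer.name)
      ymap.set('label', new Y.Text(layer.label)) // must be a Y.Text again, not a string
      layers.push([ymap])
    }
  })
  return doc
}
```

The important part is that `importDesign` has to know which fields were Y.Text, Y.Map, or plain values; the JSON alone does not tell you.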

@lucien Ideally, you store the Yjs document alongside your JSON representation. This will introduce some overhead because you are storing the same data twice, but there are a lot of advantages of keeping the Yjs metadata around.

  • A client might not realize that it disconnected (in some cases this takes a while, e.g. over 3G, Starbucks Wifi, …). You won’t be able to apply its late edits after the server document is destroyed.
  • A nice feature of Yjs is that you can store your data offline using y-indexeddb. This improves load time and ensures that users never lose any data unless the server AND the client lose all their data.
  • When you introduce the feature that you described, and that @canadaduane implemented, you need to think about more special cases. That is a lot of developer overhead in exchange for losing some essential features.
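
As a rough sketch of storing the Yjs document alongside a JSON copy, assuming a single shared map named `design` and a hypothetical `StoredDocument` record for your own database (neither is prescribed by Yjs or y-websocket): the binary state keeps the metadata needed to merge late edits, while the JSON copy is there for your own queries.

```typescript
import * as Y from 'yjs'

// Hypothetical record shape for your own storage layer (not part of Yjs).
interface StoredDocument {
  yjsState: Uint8Array           // full Yjs state, including merge metadata
  json: Record<string, unknown>  // readable copy for queries and exports
}

// Persist both representations when the room closes.
function serialize(doc: Y.Doc): StoredDocument {
  return {
    yjsState: Y.encodeStateAsUpdate(doc),
    json: doc.getMap('design').toJSON(), // 'design' is an assumed top-level key
  }
}

// Reopen the room from the stored Yjs state rather than from the JSON copy.
function restore(stored: StoredDocument): Y.Doc {
  const doc = new Y.Doc()
  Y.applyUpdate(doc, stored.yjsState)
  return doc
}

// Merge edits from a client that kept working after the room was closed.
// Only the part of the client state that the server is missing gets applied.
function mergeLateClient(server: Y.Doc, client: Y.Doc): void {
  const missing = Y.encodeStateAsUpdate(client, Y.encodeStateVector(server))
  Y.applyUpdate(server, missing)
}
```

Because `restore` starts from the Yjs state, an update produced by a client that edited while disconnected still merges cleanly. If the document were rebuilt from the JSON copy instead, the same update would refer to history that the new document does not have.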

Even ShareDB doesn’t recommend deleting the history, ever!

In @canadaduane’s case, it really makes sense to restart the session without any associated metadata. If you have a document that really receives millions upon millions of changes every day (e.g. a game world that allows thousands of users to concurrently move & rotate 3D objects), then you should think about buying into the complexity (and the restrictions!) of restarting sessions. If you are building a collaborative application that only receives a couple of million changes in its entire lifetime, then you don’t need to think about this feature. You can always implement it later. Build it first, and improve later.

@canadaduane I’m hesitant right now to make it part of y-websocket because I don’t want to give the impression that implementing this should be the norm. This feature won’t play nicely with other features I have planned (e.g. autoscaling of y-websocket). But I would appreciate it if you would write a tutorial on how you implemented this feature. Initially I only planned to build collaborative apps, but you built a whole 3d world with Yjs. It would be interesting to hear more about the challenges and solutions you came up with.

Thanks both for the detailed responses! We’ll have to give it some more thought and perhaps try to simulate a few environments to see what would work best for us.

In any case, thanks for all the hard work on yjs :smile: