Persist and Retrieve Doc to MongoDB

GMatt · April 23, 2023, 1:15pm

Hi - I am trying to understand how persistence and retrieval of a doc work. I am using MongoDB for long-term storage.

I have read the Yjs document update documentation and all the discussions in this forum, but it is still very unclear how to do this.

I think especially the retrieval of a doc from offline storage is not covered anywhere.

This was the closest anybody got to addressing the issue.

This diagram showed exactly what I understood about how YJS works

I have 2 questions

Can anyone share how to retrieve a file from the database? It looks like I cannot just do

let ydoc = await Document.findOne({documentId: documentId})

My persistence looks like this:


  socket.on('message', async (message) => {
    console.log(`Received update for document id=${documentId}`);
    const update = Y.encodeStateAsUpdate(ydoc, message);
    // [... then store update to DB]
  });

I see all the support for storing docs in the browser. While those use cases are OK for certain apps, I am confused by the apparent lack of support for storing to and retrieving from a server-based database.

Thank you

GMatt · April 23, 2023, 1:25pm

I get an error when I try to retrieve the file for the first time.

I could be storing it improperly as well, though, so maybe my issue is not only retrieval.

If I do not try to store to or retrieve from the database, the ydoc sync between clients seems to work well.

raine · April 27, 2023, 12:56pm

You will need a MongoDB provider, such as y-mongodb-provider, to persist YDocs to MongoDB.

On the server-side, run an HTTP/WebSocket server that persists to your MongoDB instance. You can use the simple server example in the link above. I recommend testing locally first so you don’t have to deal with ports and firewalls of a public server.

On the client-side, use a WebsocketProvider to sync your local YDoc to the websocket server.

On the server-side, you can use persistence.getYDoc to get a YDoc instance. But retrieving files is kind of the wrong way to think about it. Think about it as syncing with one or more providers and then listening for a synced or observe event. Once synced has fired on the YDoc, or observe has fired on the Shared Type, then you can access the document data.

GMatt · April 27, 2023, 4:23pm

Thank you Raine.

That is helpful, and in the past week I have been shifting my thinking. I think I understand it in an abstract way - but any chance you know of an implementation example? It would help.

I also realize that the desired functionality is for the server to be a hub for collaboration and possibly backup/long-term storage - which is really interesting.

In terms of pseudocode, when a client connects for the first time:

on(connect) - I want to retrieve the doc from the DB
I create a new YDoc(documentId) - where documentId is “room_name”
then this is where it gets fuzzy - can you break it down for me a bit?

I am reading this:

So then, will I always end up with two docs every time a client requests a file that is not already in my yMap?

Thank you

raine · April 28, 2023, 3:44pm

You’re using Websocket + MongoDB, right? Just follow the instructions at y-websocket and y-mongodb-provider. I don’t have an example app on hand unfortunately.

The provider handles this for you when you create the WebsocketProvider instance.

No, document id and room name are not the same. The root document id is generally different every time you refresh the page, while the room name identifies the channel which the document is synced with. You would only pass a document id to the constructor if you’re trying to clone a document. You would just do new Y.Doc() and then instantiate your WebsocketProvider with the right credentials to connect to your websocket server, just as shown in the y-websocket README.

The document update API is too low level for you, unless there is a requirement you didn’t mention. You should just be instantiating Y.Doc and the appropriate providers.

If you already have a functioning client-side, the only change to the client-side code will be new WebsocketProvider. The server will need to be running with the y-mongodb-provider code.

chrysalis · April 28, 2023, 4:30pm

As I understand it, the application on the server needs to act as a client of the websocket server, and use WebsocketProvider to send/receive document updates to/from the websocket server.

The websocket server itself is sending/receiving updates to/from the WebsocketProviders of the other client applications (e.g. applications running in the browser, such as an editor).

Does this mean that the server application must create multiple instances of the WebsocketProvider where each WebsocketProvider deals with one shared document? In other words, the WebsocketProvider today is not capable of handling ALL shared documents simultaneously?

raine · April 28, 2023, 4:55pm

Based on the example on https://github.com/MaxNoetzold/y-mongodb-provider, it looks like a single HTTP + WebSocket server is launched which persists to a given MongoDB instance. The application on the server is the websocket server, so it wouldn’t be acting as a client to it, if I am understanding it correctly.

Of course, you could split these services onto separate servers, but that would require additional setup.

Yes, each WebsocketProvider deals with one shared document.

Yes, y-websocket currently supports one WebsocketProvider (and thus one websocket connection) per YDoc. Not very scalable. Multiplexing was recently added to hocuspocus.

chrysalis · April 28, 2023, 5:06pm

Thanks for your answers and also thanks for pointing me to the Hocuspocus release supporting multiplexing.

GMatt · April 29, 2023, 3:29am

Yes, the hocuspocus direction looks like it will sidestep a lot of my issues. Thank you for that. I am surprised this is the first I have heard of it; nevertheless, thanks!

GMatt · April 29, 2023, 3:31am

Thanks for bringing up the multiplex issues… Nice catch there.