Multiple room sync & subdocument

Hi Yjs community,

I read the subdocuments documentation. It seems some providers support them (y-websocket?) and some don’t (y-webrtc?).
My scenario: I’m using Yjs to sync each user’s own data across browsers, e.g. a list of items per user. It works well, except that the IndexedDB entries keep growing.
Now the feature needs to be extended so that each entry in the list syncs between users whenever the entries’ ids are the same.
Currently my sync happens inside a “user room” (identified by the userId when using the websocket provider). If I use a subdocument for each list entry, with the same guid for the same entry, will the entries stay in sync between different users, i.e. across different userId rooms on the same websocket server?
I don’t see any guid-related code in y-websocket, so I don’t know how this works. Does subdocument sync happen only inside the current “user room”, or does it sync across user rooms (e.g. internally using the subdocument’s guid as the room id)?
Does anyone know about this?

Y-Websocket does not explicitly support subdocs; it treats them exactly like any other Y.Doc, so each subdoc needs its own provider. For example, at the bottom of the code on this page you can see that an event is fired when a subdoc is loaded. The example code then wires up a new provider for the subdocument.
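
The one-provider-per-subdoc pattern can be sketched as a small registry keyed by `guid`. This is a minimal sketch, assuming the payload shape of the Yjs `subdocs` event (`{ added, removed, loaded }`, each a Set of docs). `SubdocProviders` and the `createProvider` factory are hypothetical names; with y-websocket the factory would return something like `new WebsocketProvider(serverUrl, subdoc.guid, subdoc)`, and the registry would be fed via `doc.on('subdocs', (event) => providers.handleSubdocs(event))`.

```javascript
// Hypothetical registry: one provider per loaded subdocument, keyed by guid.
// `createProvider(subdoc)` is an assumed factory returning an object with a
// destroy() method (y-websocket's WebsocketProvider has one).
class SubdocProviders {
  constructor (createProvider) {
    this.createProvider = createProvider
    this.providers = new Map() // guid -> provider
  }

  // Accepts the { loaded, removed } parts of a 'subdocs' event payload
  handleSubdocs ({ removed = new Set(), loaded = new Set() }) {
    for (const subdoc of loaded) {
      // Connect only once per guid, even if the event fires again
      if (!this.providers.has(subdoc.guid)) {
        this.providers.set(subdoc.guid, this.createProvider(subdoc))
      }
    }
    for (const subdoc of removed) {
      const provider = this.providers.get(subdoc.guid)
      if (provider) {
        provider.destroy()
        this.providers.delete(subdoc.guid)
      }
    }
  }

  // Tear down every subdoc connection, e.g. when the parent doc is closed
  destroyAll () {
    for (const provider of this.providers.values()) provider.destroy()
    this.providers.clear()
  }
}
```

Note that this still opens one websocket per subdoc, which is exactly the cost discussed next.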

As it stands now, you would need a separate websocket provider, and therefore a separate websocket connection, for each subdocument. That being said, it’s not terribly hard to add multiplexing support so that many subdocs sync over a single websocket connection. One way to do this is to extend the y-websocket server message protocol, found here, with a message type for subdocuments. In my case, I added a new message type called “subdocumentSync” that includes the id of the target subdocument, and I use the same websocket connection to perform the sync operation on that subdocument. You would also need to extend WebsocketProvider to understand this new message type.
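
To make the multiplexing idea concrete, here is a minimal framing sketch. It assumes lib0-style variable-length encoding (unsigned LEB128 integers; strings as a varuint byte length followed by UTF-8 bytes) so that it would sit alongside y-websocket’s existing `messageSync` frames. The constant `messageSyncSub = 4` and the helper names are hypothetical, and `payload` stands for the bytes an ordinary sync message would carry after its type byte.

```javascript
// messageSync matches y-websocket; messageSyncSub is a hypothetical new type.
const messageSync = 0
const messageSyncSub = 4

// Unsigned LEB128: 7 bits per byte, high bit set when more bytes follow.
function writeVarUint (bytes, num) {
  while (num > 0x7f) {
    bytes.push((num & 0x7f) | 0x80)
    num = Math.floor(num / 128)
  }
  bytes.push(num)
}

// Returns [value, nextPosition]; throws on a truncated message.
function readVarUint (buf, pos) {
  let num = 0
  let mult = 1
  while (pos < buf.length) {
    const byte = buf[pos++]
    num += (byte & 0x7f) * mult
    if (byte < 0x80) return [num, pos]
    mult *= 128
  }
  throw new Error('unexpected end of message')
}

// [messageSyncSub][subdocId][payload]
function encodeSubdocMessage (subdocId, payload) {
  const header = []
  writeVarUint(header, messageSyncSub)
  const idBytes = new TextEncoder().encode(subdocId)
  writeVarUint(header, idBytes.length) // length-prefixed UTF-8 id
  for (const b of idBytes) header.push(b)
  const out = new Uint8Array(header.length + payload.length)
  out.set(header, 0)
  out.set(payload, header.length) // the unchanged inner sync message
  return out
}

// Splits a frame into { type, subdocId, payload }; subdocId is null for the main doc.
function decodeMessage (buf) {
  const [type, afterType] = readVarUint(buf, 0)
  if (type === messageSync) {
    return { type, subdocId: null, payload: buf.subarray(afterType) }
  }
  if (type === messageSyncSub) {
    const [len, idStart] = readVarUint(buf, afterType)
    const subdocId = new TextDecoder().decode(buf.subarray(idStart, idStart + len))
    return { type, subdocId, payload: buf.subarray(idStart + len) }
  }
  throw new Error(`unknown message type ${type}`)
}
```

Because the subdoc id sits in front of an otherwise unchanged sync payload, the receiving side can hand `payload` to the same sync code it already uses for the main document.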

In regards to the numerous IndexedDB entries, it’s true that every IndexedDB provider you create results in a separate database. I’m not sure whether there’s a limit on how many databases a browser can have; I’ve been curious about this as well. You could probably modify y-indexeddb to use individual object stores for subdocuments, but if the number of allowed IndexedDB databases is functionally unlimited, that would probably be wasted effort. I’d need to do some research on this.

Also make sure you absolutely need subdocs! YJS is super duper simple when you aren’t using subdocs. Subdocs are incredibly useful, but add some complexity to your application design. That being said, if you need them, they are awesome :slight_smile:

Wow, @braden, your answer is very helpful. I thought at least some providers already supported subdocs, but that seems not to be the case. A separate websocket for each subdoc looks terrible, so your approach would be great. Would you consider contributing your changes so that y-websocket supports subdocs? Yjs users and the community would benefit from it. I had assumed y-websocket already supported subdocs, since each subdoc has a guid it could use for syncing, but it looks too early to say that. I will look into it and also do some tests without the single websocket. Thank you!

I believe @dmonad is working on a significant update to y-websocket. If his update doesn’t include multiplexing support, I’d be happy to adapt my solution and submit a PR once the update goes out. As it stands now, my fork of y-websocket has diverged greatly from the current version of y-websocket and would be made irrelevant once the new y-websocket goes out anyways. That being said, in the meantime I can outline a loose solution:

Add a new message type so that the full y-websocket protocol has these message formats:

```
[messageSync][messageType][data]
[messageSyncSub][subdocId][messageType][data]
```

Extend WebsocketProvider on the client and WSSharedDoc on the server with a data structure that tracks loaded subdocs:

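```js
class WSSharedDoc extends Y.Doc {
  // ...
  subdocs = {}
  // ...
  loadSubdoc (subdocId) {
    // return the already loaded subdoc, or create it and remember it
    if (!this.subdocs[subdocId]) {
      this.subdocs[subdocId] = getYDoc(subdocId)
    }
    return this.subdocs[subdocId]
  }
}

const messageHandler = (conn, doc, message) => {
  const encoder = encoding.createEncoder()
  const decoder = decoding.createDecoder(message)
  const messageType = decoding.readVarUint(decoder)

  switch (messageType) {
    case messageSync:
      // same as the default y-websocket implementation
      break
    case messageSyncSub: {
      const subdocId = decoding.readVarString(decoder)
      const subdoc = doc.loadSubdoc(subdocId)
      // prefix the reply with the same header so the client can route it
      encoding.writeVarUint(encoder, messageSyncSub)
      encoding.writeVarString(encoder, subdocId)
      const headerLength = encoding.length(encoder)
      // now that we have the subdoc, the sync process is identical to the standard case.
      // be sure to broadcast any subdoc changes using the messageSyncSub type as well
      syncProtocol.readSyncMessage(decoder, encoder, subdoc, conn)
      // only reply if readSyncMessage wrote something (e.g. sync step 2)
      if (encoding.length(encoder) > headerLength) {
        send(doc, conn, encoding.toUint8Array(encoder))
      }
      break
    }
  }
}
```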

When we detect a subdoc sync message, we extract the subdoc id, and from there the sync process is identical to that of a non-subdoc. There’s probably a clever way to reduce the code duplication here, but I haven’t refactored much. Regardless, it works very well and can sync an enormous number of subdocs very quickly.

The client does the exact same thing: add a subdocs collection to WebsocketProvider; when we detect a subdoc message, use the subdoc id to load the new Y.Doc into memory, then use the contents of the message to apply updates to that doc, just like a non-subdoc Y.Doc. A lot of these details change when you introduce Yjs’ newest features, mergeUpdates and diffUpdate, but I’m still getting acquainted with the API.
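
On the client, the routing step described above boils down to a map from subdoc id to document that is filled lazily. A minimal sketch, assuming messages have already been decoded into `{ subdocId, payload }` (with `subdocId === null` for the main document). `SubdocRouter`, `createDoc(id)` and `applyPayload(doc, payload)` are hypothetical stand-ins for creating a `Y.Doc` (e.g. `new Y.Doc({ guid: id })`) and running the sync protocol on it.

```javascript
// Hypothetical client-side router: keeps a collection of loaded subdocs and
// hands each incoming payload to the document it belongs to.
class SubdocRouter {
  constructor (mainDoc, createDoc, applyPayload) {
    this.mainDoc = mainDoc
    this.createDoc = createDoc       // stand-in for creating a Y.Doc for this id
    this.applyPayload = applyPayload // stand-in for running the sync protocol
    this.subdocs = new Map()         // subdocId -> doc
  }

  // Load the subdoc into memory on first use, then reuse it.
  getSubdoc (subdocId) {
    let doc = this.subdocs.get(subdocId)
    if (doc === undefined) {
      doc = this.createDoc(subdocId)
      this.subdocs.set(subdocId, doc)
    }
    return doc
  }

  // Route one decoded message; returns the doc that received it.
  handle ({ subdocId, payload }) {
    const target = subdocId === null ? this.mainDoc : this.getSubdoc(subdocId)
    this.applyPayload(target, payload)
    return target
  }
}
```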

It’s a very smart implementation, and simple. I will read through the y-websocket source code so I can put the pieces together. Many thanks for your help and contribution!!

Can you provide the code repository for reference and learning?

Hi @braden! Sorry to ping you about this so long after the fact. Would you be willing to share your modifications to WebsocketProvider? I have been trying to implement basically what you describe here but ran into some roadblocks, which is how I discovered this thread. I fully understand if you can’t or aren’t willing to share; I just thought I’d ask.

@ChasLui if you are still looking for a reference example, take a look here: https://github.com/DAlperin/y-websocket/blob/master/src/y-websocket.js
I’ll try to update the server reference implementation soon as well.

Syncing each subdocument in its own room is not acceptable on the server side (if you have one), because it can mean far too many connections. To let the main doc and many subdocuments communicate over the same websocket connection, my idea is to add the doc guid to the sync message, something like below:

```js
/**
 * Listens to Yjs updates and sends them to remote peers (ws and broadcastchannel)
 * @param {Uint8Array} update
 * @param {any} origin
 */
this._updateHandler = (update, origin) => {
  if (origin !== this) {
    const encoder = encoding.createEncoder()
    encoding.writeVarUint(encoder, messageSync)
    encoding.writeVarString(encoder, this.roomname) // identify which ydoc changed
    syncProtocol.writeUpdate(encoder, update)
    broadcastMessage(this, encoding.toUint8Array(encoder))
  }
}
this.doc.on('update', this._updateHandler)

// sub document update handler
/**
 * Creates an update listener for one subdocument
 * @param {string} id identifier of the subdocument
 * @returns {function(Uint8Array, any): void} handler to register with subdoc.on('update', ...)
 */
this._getSubDocUpdateHandler = (id) => {
  return (update, origin) => {
    if (origin === this) return
    const encoder = encoding.createEncoder()
    encoding.writeVarUint(encoder, messageSync)
    encoding.writeVarString(encoder, id) // identify which sub document changed
    syncProtocol.writeUpdate(encoder, update)
    broadcastMessage(this, encoding.toUint8Array(encoder))
  }
}
```

I have finished a PoC, and it works.
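
A handler factory like `_getSubDocUpdateHandler` only helps if each returned handler is attached to the subdocument’s own `update` event: each subdoc is its own `Y.Doc`, so as far as I know its edits are reported there and not on the parent’s `update` event. The handler also has to be kept around so it can later be removed with `off`. A minimal sketch of that bookkeeping; `UpdateWiring` and the `send(docId, update)` callback (standing in for encoding `[messageSync][docId][update]` and calling `broadcastMessage`) are hypothetical, and `doc` can be any object with `on`/`off` like a `Y.Doc`.

```javascript
// Hypothetical helper for the "doc id in every sync message" approach.
class UpdateWiring {
  constructor (origin, send) {
    this.origin = origin      // the provider; updates it applied are not echoed back
    this.send = send          // assumed callback: (docId, update) => void
    this.handlers = new Map() // docId -> { doc, handler }
  }

  // Attach one handler per doc and remember it so it can be removed later.
  watch (docId, doc) {
    if (this.handlers.has(docId)) return
    const handler = (update, origin) => {
      if (origin === this.origin) return
      this.send(docId, update)
    }
    doc.on('update', handler)
    this.handlers.set(docId, { doc, handler })
  }

  // Detach the stored handler, e.g. when a subdoc is removed.
  unwatch (docId) {
    const entry = this.handlers.get(docId)
    if (!entry) return
    entry.doc.off('update', entry.handler)
    this.handlers.delete(docId)
  }
}
```

With Yjs, `watch` would be called for the main doc (using the room name as its id, as in the post) and for every subdoc in the `loaded` set of the parent’s `subdocs` event, and `unwatch` for every subdoc in `removed`.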

Hey,
I’ve been working on syncing subdocuments with y-websocket providers for the last few days. All three of my attempts have failed so far, so I wanted to continue the thread here.

My first attempt was to create a new provider for each subdocument. This worked fine as long as the connection to the server was stable. As soon as the clients had to reconnect, parts of the subdoc data were regularly deleted.

So I implemented the ideas expressed in this thread. I have also created a GitHub repository for this, so you can see the code there. If any questions come up about my code, I’ll be very happy to answer them.

First I tried to finish the fork of @DAlperin (in /try1).

Then (in /try2) I tried to implement the suggestion from @LeeSanity.
The differences from /try1 are basically that the messageSubDocSync message type was removed again and instead every messageSync message carries the corresponding docId.

Neither attempt worked. Both fail on the same problem: the ydoc.on('update') listeners are not triggered when a subdoc is updated.

Does anyone have any ideas on how to solve any of my problems?
Or does anyone have a working implementation that I may look at?

FYI @MaxNoetzold, I have a post about the detailed implementation; maybe you can have a look.
