Sync protocol over websockets

We are implementing real-time collaboration with ProseMirror. The client currently uses the y-prosemirror bindings and the WebSocket provider from y-websocket (this may soon be swapped for a custom implementation), and it talks over a WebSocket to our own server-side implementation of the y-protocols.

The y-protocols package says that in a client/server model (e.g. over HTTP or WebSockets), the server should only respond to the client's SyncStep1 during the sync stage, instead of both sides sending SyncStep1 on their own, as is done in P2P architectures.

However, the server in the y-websocket package appears to follow the P2P variant of the protocol (and does so when tested): it immediately sends SyncStep1 to the client. For a client/server architecture over WebSockets, is it still recommended that the server initiate sync only when it receives the client's SyncStep1? It seems to work either way, but that might change if we moved to a socket.io or SSE implementation.

Hi @ButteryCrumpet, you are right. It will sync either way and won’t make a difference in most cases.

There is a section in the Yjs README that explains how syncing works. https://github.com/yjs/yjs#Document-Updates
A SyncStep1 is basically just a state vector. A SyncStep2 encodes the document updates the other side is missing, computed from the state vector it received.
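
To make the state-vector idea concrete, here is a toy TypeScript model. It is not the Yjs binary encoding or API; `ToyDoc`, `stateVector`, `diff` and `apply` are hypothetical names chosen for illustration. Each edit is identified by a client id and a per-client clock, the state vector records how many edits from each client a document has seen, and the SyncStep2 reply is every edit the requester's state vector does not cover yet.

```typescript
// Toy model for illustration only, not the Yjs encoding or API.
// Every edit is identified by (client, clock); each client's clocks run 0, 1, 2, ...
type Item = { client: number; clock: number; content: string };
// State vector: client -> next clock expected from that client (= edits seen).
type StateVector = Map<number, number>;

class ToyDoc {
  items: Item[] = [];

  // A local edit gets the next clock for its client.
  insert(client: number, content: string): void {
    const clock = this.stateVector().get(client) ?? 0;
    this.items.push({ client, clock, content });
  }

  // What SyncStep1 carries: a compact summary of everything we have seen.
  stateVector(): StateVector {
    const sv: StateVector = new Map();
    for (const { client, clock } of this.items) {
      sv.set(client, Math.max(sv.get(client) ?? 0, clock + 1));
    }
    return sv;
  }

  // What SyncStep2 carries: every item the remote state vector does not cover.
  diff(remote: StateVector): Item[] {
    return this.items.filter(({ client, clock }) => clock >= (remote.get(client) ?? 0));
  }

  // Apply a received update, ignoring items we already have.
  apply(update: Item[]): void {
    for (const item of update) {
      const known = this.items.some((i) => i.client === item.client && i.clock === item.clock);
      if (!known) this.items.push(item);
    }
  }
}

const a = new ToyDoc();
const b = new ToyDoc();
a.insert(1, "hello");
a.insert(1, " world");
b.insert(2, "!");

// b sends SyncStep1 (its state vector); a answers with SyncStep2 (the diff), and vice versa.
b.apply(a.diff(b.stateVector()));
a.apply(b.diff(a.stateVector()));
console.log(a.items.length, b.items.length); // 3 3
```

Sending only the state vector keeps SyncStep1 small: its size grows with the number of clients that have edited the document, not with the number of edits.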

In a P2P network, each client connects to only a small number of other clients. Each one sends SyncStep1 immediately and receives the same document updates from several peers (you are essentially asking multiple clients for the same document updates).

In a client-server architecture, many clients might connect to the same server. When you restart the server, it asks all clients to send their document so it can sync up. Because the server document is initially empty, its SyncStep1 requests the whole document, forcing many clients to send the same document updates to the server. You can improve performance by waiting for each client to send its SyncStep1 first, before you ask it for its document updates. That way, some fast clients will have synced with the server before the server sends SyncStep1 to the other clients, and by then the server's state vector already covers what those other clients would otherwise have sent. When Yjs still used JSON encoding, this really improved performance when syncing with more than 50 clients.
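
The effect of the ordering can be sketched with a small simulation. This is a hypothetical model, not y-websocket code: a document is summarized as client id -> number of edits (clocks are contiguous), all clients reconnect at the same moment holding identical copies of the document, and we count how many edits the freshly restarted server receives. In the eager variant the server sends its (empty) SyncStep1 to everyone right away. In the client-initiated variant, as we understand the y-protocols description, each client sends SyncStep1 first and the server answers with SyncStep2 followed by its own SyncStep1, so later clients are only asked for what the server still lacks.

```typescript
// Toy model, not the Yjs wire format: each client's edits are numbered
// 0, 1, 2, ... so a document is summarized as clientId -> number of edits held.
type Doc = Map<number, number>;
// An update carries the edits [from, to) of one client.
type Update = { client: number; from: number; to: number }[];

// SyncStep2 in this toy: the edits `local` has that `remoteStateVector` lacks.
function diff(local: Doc, remoteStateVector: Doc): Update {
  const update: Update = [];
  for (const [client, count] of local) {
    const have = remoteStateVector.get(client) ?? 0;
    if (count > have) update.push({ client, from: have, to: count });
  }
  return update;
}

function apply(doc: Doc, update: Update): void {
  for (const { client, to } of update) {
    doc.set(client, Math.max(doc.get(client) ?? 0, to));
  }
}

const size = (update: Update): number =>
  update.reduce((sum, { from, to }) => sum + (to - from), 0);

// Returns how many edits a freshly restarted (empty) server receives from `clients`.
function editsReceivedByServer(clients: Doc[], waitForClientStep1: boolean): number {
  const server: Doc = new Map();
  let received = 0;
  // A client answers the server's SyncStep1 (a state vector) with SyncStep2.
  const answerServerStep1 = (client: Doc, serverStateVector: Doc): void => {
    const update = diff(client, serverStateVector);
    received += size(update);
    apply(server, update);
  };

  if (!waitForClientStep1) {
    // Eager (P2P-style): the server sends SyncStep1 to every client on connect,
    // before any reply arrives, so every client sees the empty state vector.
    const initialStateVector = new Map(server);
    for (const client of clients) answerServerStep1(client, initialStateVector);
  } else {
    // Client-initiated: each client sends SyncStep1 (its state vector) first.
    for (const client of clients) {
      apply(client, diff(server, client)); // server replies with SyncStep2 ...
      answerServerStep1(client, new Map(server)); // ... then its own SyncStep1
    }
  }
  return received;
}

const makeClients = (n: number): Doc[] =>
  Array.from({ length: n }, () => new Map<number, number>([[1, 1000], [2, 500]]));

console.log(editsReceivedByServer(makeClients(50), false)); // 75000
console.log(editsReceivedByServer(makeClients(50), true)); // 1500
```

With 50 clients each holding 1,500 edits, the eager variant makes the server receive every edit 50 times, while the client-initiated variant receives each edit once. The sequential loop is the best case; on a real network messages interleave, so the actual savings depend on how quickly the first clients finish syncing.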

Background: We had one large server instance and many users connecting to our beta product. When we tested restarting the server, we found that it became unresponsive for a couple of minutes because so much data was coming in. This change improved performance quite a bit without any downsides.

Nowadays I think it is quite an edge case. Binary encoding and a change to the algorithm that applies document updates have since improved performance considerably, so you don’t need to worry about it anymore.

Another advantage of client-initiated sync is that the client can wait for data from a database before initiating the sync with the server. At the moment the client syncs with the database and the server concurrently, but waiting is something we might want to do in the future.
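
A minimal sketch of that deferred start, under loud assumptions: `DeferredSyncClient`, `onDatabaseLoaded` and `onSocketOpen` are hypothetical names (not part of y-websocket or any Yjs persistence provider), and the document is reduced to a toy client id -> edit count summary. The client sends its SyncStep1 only after both the database load and the socket connection have completed, in either order, so the state vector it sends already reflects the persisted data.

```typescript
// Toy model, not the Yjs API: a document is summarized as clientId -> edits held.
type Doc = Map<number, number>;
type SyncStep1 = { type: "sync-step-1"; stateVector: Doc };

// Hypothetical client that sends SyncStep1 only once BOTH the persisted state
// has been loaded and the socket is open, in whichever order those happen.
class DeferredSyncClient {
  private readonly doc: Doc = new Map();
  private readonly send: (msg: SyncStep1) => void;
  private dbLoaded = false;
  private socketOpen = false;
  private syncStarted = false;

  constructor(send: (msg: SyncStep1) => void) {
    this.send = send;
  }

  // Called with whatever the (hypothetical) database returned.
  onDatabaseLoaded(persisted: Doc): void {
    for (const [client, count] of persisted) {
      this.doc.set(client, Math.max(this.doc.get(client) ?? 0, count));
    }
    this.dbLoaded = true;
    this.maybeStartSync();
  }

  onSocketOpen(): void {
    this.socketOpen = true;
    this.maybeStartSync();
  }

  private maybeStartSync(): void {
    if (this.dbLoaded && this.socketOpen && !this.syncStarted) {
      this.syncStarted = true;
      // The state vector already includes the persisted edits, so the server's
      // SyncStep2 only needs to contain what the database did not have.
      this.send({ type: "sync-step-1", stateVector: new Map(this.doc) });
    }
  }
}

const sent: SyncStep1[] = [];
const client = new DeferredSyncClient((msg) => sent.push(msg));
client.onSocketOpen(); // nothing is sent yet
client.onDatabaseLoaded(new Map([[1, 1000]]));
console.log(sent.length, sent[0].stateVector.get(1)); // 1 1000
```

This only works cleanly when the server waits for the client: if the server sent its SyncStep1 first, the client would either answer with an incomplete SyncStep2 or have to buffer the request until its database had loaded.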

My suggestion is to still follow what is described in the y-protocols package unless you have a reason not to; in that case, the other approach will work fine too. Now you know the background.

Thanks for the fantastic answer! It is very useful to know the rationale behind the implementation.
As you recommend, we will follow the approach described in y-protocols.