How to move on from a dropped update?

Bono · April 27, 2024, 12:50pm

I know that websockets shouldn’t “just” drop messages, but I can’t help but aim to design my system to be resilient in the case that a document update from one client goes missing or is dropped and is never received by the server for whatever reason.

Consider the following scenario:

  import * as Y from 'yjs'

  // Updates as the server would have stored them
  const serverState: Uint8Array[] = []

  const doc1 = new Y.Doc()
  const doc2 = new Y.Doc()

  let nextUpdateShouldBeDropped = false

  doc1.on('update', (update) => {
    if (nextUpdateShouldBeDropped) {
      nextUpdateShouldBeDropped = false
    } else {
      serverState.push(update)
      Y.applyUpdate(doc2, update)
    }
  })

  doc2.on('update', (update) => {
    serverState.push(update)
    Y.applyUpdate(doc1, update)
  })

  doc1.getText().insert(0, 'H')
  doc1.getText().insert(1, 'e')
  nextUpdateShouldBeDropped = true
  doc1.getText().insert(2, 'l')
  doc1.getText().insert(3, 'l')
  doc1.getText().insert(4, 'o')

  console.log('Doc 1 state', doc1.getText().toString())
  console.log('Doc 2 state', doc2.getText().toString())

  /**
    Result:
      Doc 1 state Hello
      Doc 2 state He

      Client 2's document was halted essentially because it did not receive one update.

      Question now is, how do we move on from this?
   */

In this scenario, the server state is missing the third update, and this update will never be recovered.

Let’s assume now that both clients close their window and come back a few days later and download the server state, which has 4 updates (and not 5).

When they replay these server updates, the document will show “He”. Ideally in this state I think we should just allow the clients to move on from that missing update, so that the document shows “Helo”.

How do we get clients to move on from that missing update and just continue to play the other updates? Or am I looking at this the wrong way?

In this issue, @dmonad says:

After sync is fired, you should assume that the client received all content from the server. There is no message that confirms that a server really received all updates from the client. That is something that you would need to implement yourself.

If this is really important to you, you could extend the sync protocol and give each update that is sent from the client a unique identifier. The server must confirm that each message has been written to the database before the client shows a sync message. This, however, only makes sense for super sensitive content.
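A minimal sketch of what such a confirmation layer could look like on the client, assuming a custom message format. The names `AckTracker`, `UpdateMessage`, `sendUpdate`, `ack`, and `resendUnacked` are hypothetical and are not part of Yjs or y-websocket; the fake transport below stands in for a real socket and confirms each message immediately.

```typescript
// Hypothetical message shape: every update carries a client-assigned id.
type UpdateMessage = { id: number; update: Uint8Array }

class AckTracker {
  private nextId = 0
  private unacked = new Map<number, Uint8Array>()
  private send: (msg: UpdateMessage) => void

  constructor(send: (msg: UpdateMessage) => void) {
    this.send = send
  }

  // Assign an id, remember the update, then hand it to the transport.
  sendUpdate(update: Uint8Array): number {
    const id = this.nextId++
    this.unacked.set(id, update)
    this.send({ id, update })
    return id
  }

  // Call when the server confirms that update `id` has been persisted.
  ack(id: number): void {
    this.unacked.delete(id)
  }

  // True only when every update sent so far has been confirmed.
  get synced(): boolean {
    return this.unacked.size === 0
  }

  // Re-send everything that is still unconfirmed, e.g. after a reconnect.
  resendUnacked(): void {
    Array.from(this.unacked.entries()).forEach(([id, update]) => {
      this.send({ id, update })
    })
  }
}

// Fake transport: drops one message on request, otherwise "stores" the
// message and confirms it right away.
const received: number[] = []
let dropNext = false
const tracker = new AckTracker((msg) => {
  if (dropNext) {
    dropNext = false
    return
  }
  received.push(msg.id)
  tracker.ack(msg.id)
})

tracker.sendUpdate(new Uint8Array([1]))
dropNext = true
tracker.sendUpdate(new Uint8Array([2])) // never reaches the server
tracker.sendUpdate(new Uint8Array([3]))
const syncedBeforeResend = tracker.synced

tracker.resendUnacked()
const syncedAfterResend = tracker.synced
console.log(syncedBeforeResend, syncedAfterResend, received)
```

Because applying the same Yjs update twice has no additional effect, re-sending an update that the server did in fact receive is harmless in this design.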

In my eyes this seems like a super high priority, but I’m still in development of my Yjs application and don’t have any real world experience. So it may be that I am being unnecessarily paranoid.

Resyncs can help, assuming the client with the most complete document state doesn’t close their browser before a resync can complete.
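A rough sketch of such a resync, using the state-vector functions that Yjs exports (`Y.encodeStateVector`, `Y.encodeStateAsUpdate`, `Y.applyUpdate`). It reproduces the dropped third update from the scenario above and assumes that doc1, which still holds the full content, is reachable when the resync happens. The delivery loop is a simplification of a real network.

```typescript
import * as Y from 'yjs'

const doc1 = new Y.Doc()
const doc2 = new Y.Doc()

// Collect every update doc1 produces instead of delivering it directly.
const updates: Uint8Array[] = []
doc1.on('update', (update: Uint8Array) => {
  updates.push(update)
})

// Five separate inserts, so five separate updates.
'Hello'.split('').forEach((ch, i) => {
  doc1.getText().insert(i, ch)
})

// Deliver everything except the third update (the first "l").
updates.forEach((update, i) => {
  if (i !== 2) Y.applyUpdate(doc2, update)
})
const before = doc2.getText().toString() // "He", as in the scenario above

// Resync: doc2 reports what it has, doc1 answers with what is missing.
const stateVector = Y.encodeStateVector(doc2)
const diff = Y.encodeStateAsUpdate(doc1, stateVector)
Y.applyUpdate(doc2, diff)
const after = doc2.getText().toString()

console.log('before resync:', before)
console.log('after resync:', after)
```

This only works while some peer still has the missing content. If doc1 is gone and the server never stored the third update, there is nothing left to compute the diff from.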

Even if we trust websockets to be reliable, we’d need to have really careful exception handling on the server so that connections are immediately closed if a save cannot be performed for whatever reason. But if there is an uncaught exception on the server, it would spell trouble for documents.

So overall I’m curious just how much of a problem this stands to be, and whether it’s worth the effort to build some sort of message confirmation system, such that the client does not send the next update until it receives confirmation from the server that the last update was received.

dmonad · April 27, 2024, 11:58pm

If a user types “hello”, and the first l operation is dropped, all future operations from that client will be cached until all “dependencies” are received.
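To illustrate the idea of holding operations back until their dependencies arrive, here is a toy model. It is not how Yjs is implemented: it assumes each operation only depends on the previous operation from the same client, identified by a per-client counter, and the names `Op` and `ToyReplica` are made up for this sketch.

```typescript
// Toy operation: author id, per-author sequence number, and one character.
type Op = { client: number; clock: number; char: string }

class ToyReplica {
  private applied: Op[] = []
  private pending: Op[] = []
  private nextClock = new Map<number, number>()

  receive(op: Op): void {
    const expected = this.nextClock.get(op.client) ?? 0
    if (op.clock < expected) return // duplicate, already applied
    this.pending.push(op)
    this.flush()
  }

  // Apply pending ops for as long as one of them is the next expected op.
  private flush(): void {
    let progressed = true
    while (progressed) {
      progressed = false
      for (let i = 0; i < this.pending.length; i++) {
        const op = this.pending[i]
        const expected = this.nextClock.get(op.client) ?? 0
        if (op.clock === expected) {
          this.applied.push(op)
          this.nextClock.set(op.client, expected + 1)
          this.pending.splice(i, 1)
          progressed = true
          break
        }
      }
    }
  }

  text(): string {
    return this.applied.map((op) => op.char).join('')
  }

  pendingCount(): number {
    return this.pending.length
  }
}

const ops: Op[] = 'Hello'.split('').map((char, clock) => ({ client: 1, clock, char }))
const replica = new ToyReplica()

// Deliver everything except the op with clock 2 (the first "l").
ops.filter((op) => op.clock !== 2).forEach((op) => replica.receive(op))
const textWhileMissing = replica.text()
const pendingWhileMissing = replica.pendingCount()

// Once the missing op arrives, the held-back ops are applied too.
replica.receive(ops[2])
const textAfter = replica.text()
console.log(textWhileMissing, pendingWhileMissing, textAfter)
```

The later "l" and "o" are not discarded while the dependency is missing. They wait in the pending list, which is why the document stays at "He" instead of becoming "Helo".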

If you used a different algorithm for conflict resolution (e.g., Operational Transformation), you would end up with scrambled content as future operations depend on previous operations. Yjs does something incredible: It detects that a dependency is missing and prevents the content from being scrambled.

Asking Yjs (or any similar algorithm) to do more is unreasonable. A reliable network protocol is a fair assumption. You can’t recover from missing dependencies. All operations that depend on the missing op will be lost as well. That’s why we need a reliable network protocol.

To be honest, this phrasing sounds paranoid. When you download a website, you also trust that no random paragraphs are missing. WebSocket and HTTP share the same transport protocol (TCP). TCP is a reliable protocol that is battle-tested like no other. It doesn’t just randomly drop messages.

With this kind of thinking, we would be unable to design applications nowadays.

Yjs expects a reliable communication protocol for message distribution. That is a fair assumption. These protocols exist. WebSocket (which is established through an HTTP upgrade handshake and, like HTTP, runs over TCP) is such a reliable protocol. Either all sent messages are delivered, or the connection is dropped.

If you have lost messages, it is most definitely a user error (e.g., because you use an unreliable pubsub server, or your server forgets to store & broadcast a message received from a client, …). Redis pubsub, for example, is not a reliable communication protocol; WebSocket is. An unreliable distribution mechanism is intolerable for any application, and it’s not something we should try to fix on the Yjs or networking level.

Sorry for the rant. But I want to make clear that there is no coming back from dropped messages. Not in any application. You dropped content, you lost content. It is most definitely not TCP’s fault.

This problem is not worth anyone’s time. Exactly this has been implemented by TCP. If I recall correctly, I implemented the resync interval for a company that uses an unreliable pubsub implementation to broadcast messages.

Depending on the WebSocket library you are using, you could check whether there are conditions under which the library drops messages. uws, for example, can drop messages if the buffer is full and if the client is configured to drop messages. The same is true for the ws library. In these cases an error handler is called. y-websocket handles these errors for you.

Bono · April 28, 2024, 4:01am

Got it, thanks very much for your input.