Set the key in the YMap to its old value (to ensure consistency in the higher-level semantic interpretation)

Hello everyone, I’ve recently been pondering a question in a project that uses Yjs: how can multiple clients share (and, of course, modify) the same YMap instance while ensuring that the merged YMap always remains semantically consistent at the business level?

I understand that achieving this goal is impossible without imposing constraints on the contents of the YMap. Therefore, I plan to adhere to the following principles when modifying the YMap at the business layer:

  1. The keys in the YMap form a fixed set and will never be added or removed (for example, there will always be exactly two keys, ‘type’ and ‘content’).
  2. All modifications to this YMap instance by any client are performed within a single transaction.
  3. All modifications to this YMap instance by any client will involve setting both keys simultaneously; there will never be a situation where a client sets only ‘type’ and not ‘content’.

These principles are based on my understanding of Yjs’s conflict resolution algorithm. If anything about them is wrong, please feel free to correct me.

Let’s discuss further using a YMap that contains only two fields, ‘type’ and ‘content’. The value of ‘type’ is a regular string, while the value of ‘content’ is a Y.AbstractType instance. The business layer decides how to interpret ‘content’ based on the value of ‘type’ (though some ‘type’ values share the same ‘content’ format).

Assume there are three type values: type-A, type-B, and type-C. type-A and type-B share the same content format, while type-C uses a different one.

When Bob wants to change ‘type’ from type-A to type-B, the principles above require him to set ‘content’ in the same transaction, even though its value is unchanged. However, if Bob re-uses the original value in the set operation, the content is actually cleared (since the old value is an already-integrated Y.AbstractType instance). Bob could serialize the old content and assign a freshly initialized Y.AbstractType instance to ‘content’, but that approach would discard concurrent collaborative edits to ‘content’ made by other clients.

const doc1 = new Y.Doc()
const map1 = doc1.getMap("M")
map1.set("type", "type-A")
map1.set("content", new Y.Text("tttttt"))

doc1.transact((txn) => {
    map1.set("type", "type-B")
    map1.set("content", map1.get("content")) // <- actually the content will be cleared 
})
map1.get("content").toString() // <- will be empty

So my questions are: first, are the principles assumed above correct for ensuring consistency in this scenario? Second, how can Bob preserve the old value of ‘content’ within the transaction while still obtaining the same logical status in Yjs’s conflict resolution process as if ‘content’ had been reset?

I think trying to set content to the same value in order to appease the validation logic is the wrong way to go about it. Your business logic is not actually “always set type and content together”; it’s “type should always match content”.

Yes, and the validation logic should do the same.

Since users can modify client-side code, the best place to put validation logic is on the server (or in a peer-to-peer model, on the receiving client). Then the server/other client can reject invalid updates.

This is just pseudocode (and the clone is slow), but this is what I have in mind:

on('message', (update, doc) => {
  // first apply and observe the change only on a clone
  const clone = new Y.Doc()
  Y.applyUpdate(clone, Y.encodeStateAsUpdate(doc))
  clone.getMap('M').observe(delta => {
    if (validate(delta)) {
      // if it's a valid update, then it's safe to apply to the real doc
      Y.applyUpdate(doc, update)
    }
  })
  Y.applyUpdate(clone, update)
})

Your validation logic would look something like this:

function validate(delta) {
  // `oldType` is the value of 'type' before this update
  // (read from the doc before applying, or from the observer event)
  if (delta.type === 'type-A' || delta.type === 'type-B') {
    return delta.content == null
      // if content is missing, the type must be changing
      // from A to B or B to A
      ? oldType === 'type-A' || oldType === 'type-B'
      // if content is present, it must be valid A/B content
      : isValidAB(delta.content)
  }
  else if (delta.type === 'type-C') {
    // content must be provided and be valid
    return delta.content && isValidC(delta.content)
  }
  return false // unknown types are rejected
}

This way, you allow content to be missing, but only when that is valid.

(As an aside, I think your code example reveals a bug in YJS. Logically that should be a noop.)

@stefanw You might have ideas as well since we were just discussing validation.


Thank you very much for your reply! Your validation approach is excellent and has inspired another thought: what if we register an observer (or listen for an afterTransaction event, etc.) on this YMap? In the callback we would know what changes have occurred to ‘type’ and ‘content’, and whether their current states are consistent. If an inconsistency is detected, we could then run another transaction within the callback to set ‘type’ back to its old value.

However, I am not sure whether making modifications in another transaction within the observer callback aligns with Yjs’s best practices.

Thank you again for your reply!


If a client has already called transact, then it’s too late to redo the transaction. At best you could create another transaction, but then you lose atomicity since one transaction could succeed and the other could fail, which kind of defeats the purpose of what you’re trying to do.

When it comes to security, you need to validate it on the server anyway. Client-side validation is just for user feedback.

@raine It’d be interesting to see how we could improve the validation logic to maybe save a few extra CPU cycles.

Also, I was thinking that it’d be best to set a rate limit / throttler server-side and a debouncer client-side, and to queue updates (already merging them on the client) before sending them out, to save on expensive validations.

I guess the debounce time could be set dynamically depending on the scenario; e.g. a doc under active collaboration needs a higher refresh rate (0.5s) than a document that is only edited by one person (10s).

There’s also a whole blog post by Notion engineers describing how they handle their data types and real-time collaboration. Apparently they don’t use CRDTs, but seem to compare and resolve changes server-side (maybe OT?).
