Split update into smaller updates

I’m trying to split large updates into a set of smaller ones to work around limitations of the infrastructure I use regarding message sizes.

I’ve considered breaking messages themselves into frames but not having to introduce that additional layer would help a lot with taming complexity.

I’ve poked around in the alternative update API, aiming to craft half-sized updates like so:

import * as Y from "yjs";

const doc = new Y.Doc();
const map = doc.getMap();

map.set("a", "1");
map.set("b", "2");
// ---- split here
map.set("c", "3");
map.set("d", "4");

const update = Y.encodeStateAsUpdate(doc);
const updateMeta = Y.parseUpdateMeta(update);
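// Clock halfway between the first and last clock this update covers for our client.
// The offset by `from` matters whenever the update does not start at clock 0.
const from = updateMeta.from.get(doc.clientID);
const to = updateMeta.to.get(doc.clientID);
const breakPoint = from + Math.ceil((to - from) / 2);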

const secondHalfVector = Y.encodeStateVector(new Map([[doc.clientID, breakPoint]]));
const secondHalf = Y.diffUpdate(update, secondHalfVector);
const secondHalfMeta = Y.parseUpdateMeta(secondHalf);
console.log(secondHalfMeta.from.get(doc.clientID), '→', secondHalfMeta.to.get(doc.clientID));

// How can I obtain the first half of the update?
// const firstHalfVector = Y.encodeStateVector(new Map([[doc.clientID, 1]]));
// const firstHalf = Y.diffUpdate(update, firstHalfVector);
// const firstHalfMeta = Y.parseUpdateMeta(firstHalf);
// console.log(firstHalfMeta.from.get(doc.clientID), '→', firstHalfMeta.to.get(doc.clientID));

The code snippet is also available in this CodeSandbox to play with: gallant-drake-xzr9ey

Hi @marionebl,

This can never work reliably because a single large change (writing a very long string or a binary blob) cannot be split across two update messages. Furthermore, splitting updates voids some of the guarantees that transactions give (changes made within a transaction must be applied together). A third problem: while the partial updates are being applied one by one, the intermediate states might not make sense or might violate some kind of data schema. Your approach will definitely be problematic when working with the prosemirror editor binding, and possibly with other applications.

The only reliable approach is to split messages into frames. Framing adds no significant overhead, and it avoids all of the problems above.
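A minimal sketch of what such framing could look like, assuming the transport only caps the size of a single message. The helpers splitIntoFrames and joinFrames and the frame shape (index, total, payload) are hypothetical and not part of Yjs:

```javascript
// Hypothetical helpers, not part of Yjs: frame one encoded update for transport.

// Split `bytes` into frames whose payload is at most `maxPayload` bytes.
// Each frame carries its position (`index`) and the frame count (`total`).
function splitIntoFrames(bytes, maxPayload) {
  if (!Number.isInteger(maxPayload) || maxPayload <= 0) {
    throw new RangeError("maxPayload must be a positive integer");
  }
  // An empty input still yields one (empty) frame so the receiver sees a message.
  const total = Math.max(1, Math.ceil(bytes.length / maxPayload));
  const frames = [];
  for (let index = 0; index < total; index++) {
    const start = index * maxPayload;
    // subarray returns a view on the same buffer, so nothing is copied here.
    frames.push({ index, total, payload: bytes.subarray(start, start + maxPayload) });
  }
  return frames;
}

// Concatenate the payloads of a complete frame set in index order.
// Assumes the caller has already dropped duplicate frames.
function joinFrames(frames) {
  const ordered = [...frames].sort((a, b) => a.index - b.index);
  if (ordered.length === 0 || ordered.length !== ordered[0].total) {
    throw new Error("incomplete frame set");
  }
  const size = ordered.reduce((n, frame) => n + frame.payload.length, 0);
  const out = new Uint8Array(size);
  let offset = 0;
  for (const frame of ordered) {
    out.set(frame.payload, offset);
    offset += frame.payload.length;
  }
  return out;
}
```

The sender would pass the result of Y.encodeStateAsUpdate(doc) to splitIntoFrames, and the receiver would hand the output of joinFrames to Y.applyUpdate, so the document only ever sees the complete update.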

If you still want to try this approach, I recommend working with the LazyUpdateEncoder/Decoder directly. Look at some of the transformation functions to understand how to read and recreate update messages. But generally I really don’t recommend going this route.

Thanks for the advice!

Hi @marionebl, I have the same need. Were you able to find a solution?

I think this is not really a Yjs problem. How do you implement your provider? You could add a total and a current field to each response from the server, and apply the data to the ydoc only once the client has received all parts.
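The total/current idea could be sketched on the receiving side roughly like this. FrameAssembler and the messageId field are hypothetical names chosen for illustration, and current is assumed to count from 0:

```javascript
// Hypothetical receiver-side buffer, not part of Yjs.
// Collects the parts of one message and releases the bytes only when all arrived.
class FrameAssembler {
  constructor(onComplete) {
    this.onComplete = onComplete;
    // messageId -> { total, parts: Map<current, Uint8Array> }
    this.pending = new Map();
  }

  // Returns true if this part completed its message, false otherwise.
  receive({ messageId, current, total, payload }) {
    if (!Number.isInteger(current) || current < 0 || current >= total) {
      throw new RangeError("current must be in the range 0..total-1");
    }
    let entry = this.pending.get(messageId);
    if (!entry) {
      entry = { total, parts: new Map() };
      this.pending.set(messageId, entry);
    } else if (entry.total !== total) {
      throw new Error("total changed between parts of the same message");
    }
    entry.parts.set(current, payload); // a re-sent part simply overwrites
    if (entry.parts.size < entry.total) return false;

    // Every index 0..total-1 is present: concatenate in order and hand over.
    this.pending.delete(messageId);
    let size = 0;
    for (const part of entry.parts.values()) size += part.length;
    const full = new Uint8Array(size);
    let offset = 0;
    for (let i = 0; i < entry.total; i++) {
      const part = entry.parts.get(i);
      full.set(part, offset);
      offset += part.length;
    }
    this.onComplete(full);
    return true;
  }
}
```

In a provider this might be wired up as new FrameAssembler((bytes) => Y.applyUpdate(doc, bytes)), so the ydoc never observes a partially received update.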

A second idea: each time you receive a small update from doc.on('update'), store it in your database as-is instead of letting Yjs merge the updates into one large update, since a merged update cannot be split.
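A rough sketch of that second idea, assuming an append-only store. recordUpdates and the log array (standing in for a database table) are hypothetical, and doc is expected to offer the same on/off('update', handler) methods as a Y.Doc:

```javascript
// Hypothetical sketch: `log` stands in for a database table of update rows.
// `doc` must offer on("update", handler) and off("update", handler), as a Y.Doc does.
function recordUpdates(doc, log, maxBytes) {
  const handler = (update) => {
    // Store each incremental update as its own row instead of merging them.
    // A single transaction can still exceed the limit, so flag that case.
    log.push({ bytes: update, fitsInOneMessage: update.byteLength <= maxBytes });
  };
  doc.on("update", handler);
  // Return a function that stops recording.
  return () => doc.off("update", handler);
}
```

Rows flagged as not fitting into one message would still need framing, since one large change cannot be broken into smaller updates.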