Merging two different Y.js documents

Hey folks, I have an implementation question. We have a bunch of services that asynchronously generate HTML content for a document. We are using Hocuspocus plus a simple Postgres bytea column to persist the Y.Doc via the database extension.

The problem I am running into is that merging this new HTML with an existing document isn’t working correctly on the server side:

import * as Y from "yjs";
import { HocuspocusProvider } from "@hocuspocus/provider";

async function mergeDocs(htmlContent: string, serverProvider: HocuspocusProvider) {
  await serverProvider.connect();

  // poll until the initial sync with the server has finished
  while (!serverProvider.isSynced) {
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  // htmlToYdoc is our own helper; it uses y-prosemirror to create a new ydoc
  const htmlYdoc = htmlToYdoc(htmlContent);

  // i am guessing this is wrong
  Y.applyUpdate(
    serverProvider.document,
    Y.encodeStateAsUpdate(htmlYdoc)
  );

  // poll until the local changes have been acknowledged by the server
  while (serverProvider.hasUnsyncedChanges) {
    console.log("waiting for changes to sync");
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  console.log("done updating");
}

Any idea how I can get the new html to be added to the document’s history correctly? I am guessing I need to do some diff first, but I’m not sure if the separate histories would allow that either.

Yjs doesn’t calculate diffs with the other users. It manages a change history.

Think of Yjs documents as a git repository. Each action you perform (e.g. inserting a character, adding an HTML node) is a commit to the document.

When you call htmlToYdoc, I assume you create a completely new Y.Doc, so it has no common change history with the other peers. In git you would get a merge conflict. Yjs performs automatic merging, which is great, but not always what you want. I assume that in your case automatic merging results in either duplication (e.g. Y.Text content is duplicated) or overwriting (e.g. Y.Map properties overwrite each other, and only one version will “win”).
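A minimal sketch of that effect, assuming only the `yjs` package. The shared type names `body` and `meta` and the sample values are made up for illustration; they are not from the original post.

```typescript
import * as Y from "yjs";

// Two docs created independently, so they share no change history.
const existing = new Y.Doc();
existing.getText("body").insert(0, "Hello");
existing.getMap("meta").set("title", "Existing title");

const generated = new Y.Doc();
generated.getText("body").insert(0, "Hello");
generated.getMap("meta").set("title", "Generated title");

// Merge the generated doc into the existing one, as in the question.
Y.applyUpdate(existing, Y.encodeStateAsUpdate(generated));

// Both inserts are kept, so the text now appears twice.
const mergedText = existing.getText("body").toString();

// Only one of the two concurrent writes to the same key survives.
const mergedTitle = existing.getMap("meta").get("title");

console.log(mergedText, mergedTitle);
```

The text ends up duplicated because Yjs treats the two inserts as concurrent edits and keeps both. Which title survives depends on the randomly assigned client IDs, so it can differ between runs.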

mergeDocs should pull the remote doc serverProvider.document and then apply the necessary changes individually (see the Shared Types page in the Yjs docs). You need to keep the editing history intact. Calculating a diff might be a good idea, but you still need to apply the resulting changes individually to the Yjs document. Alternatively, if you are using an editor binding like y-prosemirror, you could use its diffing. However, please keep in mind that y-prosemirror (and other editor bindings) were designed for individual edits and will not produce a “minimal” diff if the documents have diverged a lot.
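As a rough illustration of "diff, then apply the changes individually", here is a sketch for a plain Y.Text. The helper `applyTextDiff` is hypothetical, and the common prefix/suffix comparison is just a simple stand-in for a real diff algorithm, not what y-prosemirror does.

```typescript
import * as Y from "yjs";

// Hypothetical helper: turn `ytext` into `next` by editing only the part
// that differs, so the edit becomes part of the doc's existing history.
function applyTextDiff(doc: Y.Doc, ytext: Y.Text, next: string): void {
  const current = ytext.toString();

  // Length of the common prefix.
  let start = 0;
  const maxStart = Math.min(current.length, next.length);
  while (start < maxStart && current[start] === next[start]) start++;

  // Walk back over the common suffix without crossing the prefix.
  let endCurrent = current.length;
  let endNext = next.length;
  while (
    endCurrent > start &&
    endNext > start &&
    current[endCurrent - 1] === next[endNext - 1]
  ) {
    endCurrent--;
    endNext--;
  }

  // Apply the delete and the insert as one transaction.
  doc.transact(() => {
    if (endCurrent > start) ytext.delete(start, endCurrent - start);
    if (endNext > start) ytext.insert(start, next.slice(start, endNext));
  });
}

const doc = new Y.Doc();
const ytext = doc.getText("body");
ytext.insert(0, "Hello world");

applyTextDiff(doc, ytext, "Hello brave world");
const result = ytext.toString();
console.log(result);
```

Because the edit is made on the existing document, peers that already hold "Hello world" only receive the inserted characters instead of a second copy of the whole text.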

If it’s really just HTML, you could overwrite the content of serverProvider.document completely, which will, at least, keep the change history intact.
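A sketch of the "overwrite completely" option, assuming only the `yjs` package. The helper `replaceFragment`, the fragment name `default`, and the `paragraph` node name are illustrative assumptions; a real setup would build the nodes from the parsed HTML according to the editor schema.

```typescript
import * as Y from "yjs";

// Hypothetical helper: clear a fragment and refill it inside the same doc,
// so the replacement is recorded in the existing change history.
function replaceFragment(doc: Y.Doc, name: string, paragraphs: string[]): void {
  const fragment = doc.getXmlFragment(name);
  doc.transact(() => {
    // Delete every child, not just the first one.
    if (fragment.length > 0) fragment.delete(0, fragment.length);

    const nodes = paragraphs.map((text) => {
      const p = new Y.XmlElement("paragraph");
      p.insert(0, [new Y.XmlText(text)]);
      return p;
    });
    fragment.insert(0, nodes);
  });
}

const doc = new Y.Doc();
replaceFragment(doc, "default", ["old paragraph"]);

// A peer that already shares history with `doc`.
const peer = new Y.Doc();
Y.applyUpdate(peer, Y.encodeStateAsUpdate(doc));

replaceFragment(doc, "default", ["first", "second"]);

// Send the peer only the changes it is missing.
Y.applyUpdate(peer, Y.encodeStateAsUpdate(doc, Y.encodeStateVector(peer)));

const peerFragment = peer.getXmlFragment("default");
console.log(peerFragment.toString());
```

The peer ends up with just the new paragraphs, because the delete and the inserts were made against items it already knows about.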

Thanks for the reply! Appreciate it - that clears things up quite a bit.

This is basically the method I decided to go with. Here’s the Hocuspocus code snippet that performs the update, in case anyone needs it.

// express + hocuspocus: https://tiptap.dev/docs/hocuspocus/server/examples#express

app.post("/update/:documentName", async (request, response) => {

  const htmlBody = request.body?.html;
  const docConnection = await server.openDirectConnection(
    request.params.documentName
  );

  docConnection.transact((doc) => {
    const htmlYdoc = htmlToYdoc(htmlBody);
    if (doc.get(xmlFragmentName, Y.XmlFragment).length > 0) {
      doc.get(xmlFragmentName, Y.XmlFragment).delete(0);
    }
    Y.applyUpdate(doc, Y.encodeStateAsUpdate(htmlYdoc));
  });

  await docConnection.disconnect();

  return response.status(200).send("document updated");
});

I suggest that you first fork the document and then apply the changes. I bet that in 50% of cases, the merged result will simply be empty. For example:

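const yxmlroot = doc.get(xmlFragmentName, Y.XmlFragment);
if (yxmlroot.length > 0) {
  // delete every child; delete(0) alone removes only the first one
  yxmlroot.delete(0, yxmlroot.length);
}

// htmlToYdoc is assumed here to accept the target fragment and write into it
const htmlYdoc = htmlToYdoc(yxmlroot, htmlBody);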