Diffing two YDocs or snapshots

Hello guys,

How can I compute a diff between 2 snapshots or yDocs? E.g.:

yDoc2 - yDoc1 = result

Where result should be a readable output that expresses node positions or something similar. I found something like this for y-prosemirror: https://github.com/yjs/yjs-demos/tree/d8e33e619d6f2da0fae0c6a361286e6901635a9b/prosemirror-versions and a live example is here: https://demos.yjs.dev/prosemirror-versions/prosemirror-versions.html .
Unfortunately I am not using ProseMirror but Slate, so the y-prosemirror integration is not that helpful for me.
I am wondering if there is a way to obtain the positions, or a readable data structure, of the changes between 2 yDocs/snapshots.

I have been following the path of using diffUpdate, something like this:

        // Encode the full state of each doc as a binary update.
        const currentState1 = Y.encodeStateAsUpdate(yDoc1);
        const currentState2 = Y.encodeStateAsUpdate(yDoc2);

        // Derive each doc's state vector from its update.
        const stateVector1 = Y.encodeStateVectorFromUpdate(currentState1);
        const stateVector2 = Y.encodeStateVectorFromUpdate(currentState2);

        // Diff each update against the other doc's state vector.
        const diff1 = Y.diffUpdate(currentState1, stateVector2);
        const diff2 = Y.diffUpdate(currentState2, stateVector1);

        console.log({ diff1 });

I am not sure if this is the correct approach, but how can I decode diff1 into something more readable, in order to actually see the diffs/operations/changed positions etc.?

After playing a bit with the Yjs API, I think I have a better understanding of how things work, but since I couldn’t find any related information about this, I will try to describe it as clearly as possible.

Currently I have 2 delta strings:

  1. delta1 = …
  2. delta2= …

From these 2 deltas I create 2 Uint8Arrays:

      const diff1 = toUint8Array(delta1)
      const diff2 = toUint8Array(delta2)

What is interesting is that when I try to make a diff between these 2 arrays:

    const updates = Y.diffUpdate(diff1, diff2);

It is a bit confusing, since I didn’t expect to receive content identical to diff1, and vice versa.
My expectation was that if I make a diff between 2 states:
state1: [p1, p2, p3] and state2: [p1, p2], then the result would be [p3].

What am I doing wrong, given that the diff between 2 updates is always the first one?

Per the docs, Y.diffUpdate does not compare two updates. It compares an update with a state vector. They are both represented as Uint8Array, but they are very different entities. An update is an encoded set of changes, while a state vector is an encoded listing of all clients and their clocks that together specifies a point in time in the change history. You’re getting diff1 back because diff2 is not a valid state vector, so it thinks the entire update consists of missing differences, i.e. changes not represented in the (invalid) state vector.
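
To make the distinction concrete, here is a minimal sketch of diffing an update against a real state vector. The two in-memory docs, the 'content' array name and the `missingUpdate` helper are illustrative assumptions, not code from this thread; only the `Y.*` calls are the Yjs API.

```typescript
import * as Y from 'yjs'

// Hypothetical helper: the part of `update` that a peer at `stateVector` has not seen yet.
const missingUpdate = (update: Uint8Array, stateVector: Uint8Array): Uint8Array =>
  Y.diffUpdate(update, stateVector)

// doc1 holds [p1, p2]; doc2 starts from doc1's state and then adds p3.
const doc1 = new Y.Doc()
doc1.getArray<string>('content').insert(0, ['p1', 'p2'])

const doc2 = new Y.Doc()
Y.applyUpdate(doc2, Y.encodeStateAsUpdate(doc1))
doc2.getArray<string>('content').push(['p3'])

// The second argument is a state vector (clients and clocks), not another update.
const diff = missingUpdate(Y.encodeStateAsUpdate(doc2), Y.encodeStateVector(doc1))

// The diff is still a binary update; it becomes readable once integrated into a doc.
Y.applyUpdate(doc1, diff)
const merged = doc1.getArray<string>('content').toArray()
console.log(merged)
```

Note that this only works because doc2 descends from doc1. Two docs created independently share no history, so there is no common state for the state vector to describe.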

(I’m assuming by “delta string” you mean an update encoded in base64? Correct me if I’m wrong…)

There isn’t an easy way to compare two updates directly that I know of, as they must be integrated into a Doc in order to know the convergent state. However, you may be able to extract some relevant information with Y.decodeUpdate.
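
As a rough sketch of what Y.decodeUpdate exposes: it returns the decoded structs plus a delete set (`ds`), so deletions show up as clock ranges rather than as content. The `summarizeUpdate` helper, the `UpdateSummary` shape and the sample doc below are hypothetical names used only for illustration.

```typescript
import * as Y from 'yjs'

// Hypothetical summary shape, for illustration only.
interface UpdateSummary {
  structs: Array<{ client: number; clock: number; length: number }>
  deletes: Array<{ client: number; clock: number; length: number }>
}

const summarizeUpdate = (update: Uint8Array): UpdateSummary => {
  const { structs, ds } = Y.decodeUpdate(update)
  const summary: UpdateSummary = { structs: [], deletes: [] }
  // Every struct carries the id (client, clock) of its first element and a length.
  for (const s of structs) {
    summary.structs.push({ client: s.id.client, clock: s.id.clock, length: s.length })
  }
  // The delete set maps each client to the clock ranges that were deleted.
  ds.clients.forEach((ranges, client) => {
    for (const r of ranges) {
      summary.deletes.push({ client, clock: r.clock, length: r.len })
    }
  })
  return summary
}

// Sample: insert three elements, then delete the middle one.
const sampleDoc = new Y.Doc()
const sampleItems = sampleDoc.getArray<string>('content')
sampleItems.insert(0, ['a', 'b', 'c'])
sampleItems.delete(1, 1)

const sampleSummary = summarizeUpdate(Y.encodeStateAsUpdate(sampleDoc))
console.log(sampleSummary)
```

The ids are positions in the change history, not positions in the document, so mapping them back to node positions still requires a doc that has integrated the update.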

Hi @raine. First of all, thank you for engaging with this thread. Much appreciated!

Thank you for clearing that up.

You are indeed correct, I do save the doc in a base64-encoded format.

That was a path that I’ve also been exploring: actually applying the deltas to a doc, as shown below, and trying to obtain the differences between the 2 yDocs, but this seems to give the same result :slightly_frowning_face: .

I did something like this:

export const docDeltaYDoc = (docDelta: string, opts?: DocOpts): Y.Doc => {
  const uInt8Doc = toUint8Array(docDelta) // is a Uint8Array

  const yDoc = new Y.Doc(opts)
  Y.applyUpdate(yDoc, uInt8Doc)

  return yDoc
}

export const getSlateNodes = (yDocument: Y.Doc): Element[] => {
  const sharedRoot = yDocument.get('content', Y.XmlText) as Y.XmlText
  const slateContents = yTextToSlateElement(sharedRoot).children

  return slateContents;
}

const ydoc1 = docDeltaYDoc(delta1);
const ydoc2 = docDeltaYDoc(delta2);

const ydoc1State = Y.encodeStateAsUpdate(ydoc1);
const ydoc2State = Y.encodeStateAsUpdate(ydoc2);

const stateVector1 = Y.encodeStateVectorFromUpdate(ydoc1State);
const stateVector2 = Y.encodeStateVectorFromUpdate(ydoc2State);

const diff1 = Y.diffUpdate(ydoc1State, stateVector2);
const diff2 = Y.diffUpdate(ydoc2State, stateVector1);

const ydoc3 = new Y.Doc()
const ydoc4 = new Y.Doc()

Y.applyUpdate(ydoc3, diff1)
Y.applyUpdate(ydoc4, diff2)

const slateState1 = getSlateNodes(ydoc3)
const slateState2 = getSlateNodes(ydoc4)
console.log({ slateState1, slateState2 });

This seems to have the same output as above too…

Yes, this would be the end goal, but first I must find a way to obtain the desired/correct results from comparing 2 ydocs, and then I will try to decode the updates and try to map them to desired format.

P.S.: I am not sure whether I am still reproducing the issue you mentioned above, but this still does not produce any diff between the two (update and state vector).

I don’t think this will work… It may not throw an error, but the shared types will not be recreated correctly if you apply an arbitrary update to an empty Y.Doc. That would only work if you had the entire state.


Another idea is to observe your shared type(s) on each doc. It will fire as soon as data is added to your Doc (including the initial load from providers). You can access the deltas in e.changes.delta. If doc1 has 100 changes and doc2 has 120 changes, then the difference is the last 20 changes of doc2.
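
A minimal sketch of that observer idea, using a Y.Array named 'content' as a stand-in for the real shared type (both the name and the type are assumptions here). When changes are made in separate transactions on a live doc, inserts and deletes each produce their own delta.

```typescript
import * as Y from 'yjs'

type ArrayDelta = Array<{ insert?: unknown[] | string; retain?: number; delete?: number }>

const liveDoc = new Y.Doc()
const liveContent = liveDoc.getArray<string>('content')

// Collect one delta per transaction that touches the shared type.
const recordedDeltas: ArrayDelta[] = []
liveContent.observe(event => {
  recordedDeltas.push(event.changes.delta as ArrayDelta)
})

liveContent.insert(0, ['p1', 'p2', 'p3']) // first transaction: one insert op
liveContent.delete(2, 1)                  // second transaction: retain 2, delete 1

console.log(JSON.stringify(recordedDeltas))
```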

I will explore both ideas and see where it goes :crossed_fingers: !
Thanks for pointing them out!

I explored several options based on the previous discussion. It seems I haven’t reached the promised land yet.
The first option is the following:

I am creating a snapshot of the document and adding it to a YArray in the desired YDoc. Something like this:

export const addSnapshotOnYDoc = (document: Y.Doc): void => {
  document.gc = false
  const versions = document.getArray<YDocVersion>('versions')

  const prevVersion = versions.length === 0 ? null : versions.get(0)
  const prevSnapshot: Y.Snapshot =
    prevVersion === null
      ? Y.emptySnapshot
      : Y.decodeSnapshot(prevVersion.snapshot)
  const snapshot: Y.Snapshot = Y.snapshot(document)

  if (!Y.equalSnapshots(prevSnapshot, snapshot)) {
    versions.delete(0, versions.length)

    versions.insert(0, [
      {
        date: new Date().getTime(),
        snapshot: Y.encodeSnapshot(snapshot),
        clientID: document.clientID,
      },
    ])
  }
}

I am interested in saving only the last version of the document, and only once, so I am not handling multi-version scenarios for the moment.

From here I tried 2 approaches:

  1. Trying to get the decoded diff between the current doc and the snapshotted version of the doc:
export const diffYjsDocs = (document: Y.Doc) => {
  // Snapshots need the deleted content to be kept around.
  document.gc = false
  const versions = document.getArray<YDocVersion>('versions')

  const prevVersion = versions.length === 0 ? null : versions.get(0)

  if (!prevVersion) {
    return
  }

  // Rebuild the doc as it was when the snapshot was taken.
  const docFromSnapshot = Y.createDocFromSnapshot(
    document,
    Y.decodeSnapshot(prevVersion.snapshot)
  )

  const ydoc1State = Y.encodeStateAsUpdate(document)
  const ydoc2State = Y.encodeStateAsUpdate(docFromSnapshot)

  const stateVector2 = Y.encodeStateVectorFromUpdate(ydoc2State)

  // Everything in the current doc that the snapshotted doc has not seen.
  const diff = Y.diffUpdate(ydoc1State, stateVector2)
  const decodedDiff = Y.decodeUpdate(diff)

  console.log({ decodedDiff })
}

It seems this approach works only for added “nodes”, not for changed/removed ones.

  2. Trying to observe the structure that is being changed inside both documents, and diff the deltas as suggested. Here I couldn’t find a way to make the observer trigger upon Y.createDocFromSnapshot. I had to call applyUpdate in order to trigger the observer, even though behind the scenes createDocFromSnapshot also uses applyUpdate. So I am trying to do something like this:
const diffDeltaChanges = (originalDelta: string) => {
  let originalDeltas: Delta = []
  let currentDeltas: Delta = []

  // `document` and `editor` are assumed to come from the surrounding scope.
  document.gc = false
  const originalDocDelta = toUint8Array(originalDelta)

  const originalDoc = new Y.Doc()
  const currentSharedRoot = editor.sharedRoot

  const sharedRoot = originalDoc.get('content', Y.XmlText) as Y.XmlText

  sharedRoot.observe(event => {
    originalDeltas = event.changes.delta as Delta
  })

  currentSharedRoot.observe(event => {
    currentDeltas = event.changes.delta as Delta
  })

  // Applying the update is what triggers the observer on the original doc.
  Y.applyUpdate(originalDoc, originalDocDelta)

  // I can see only insert deltas, but when I am changing some item attributes,
  // or delete some, this is not represented in deltas or changes.
}

Also, if I compare deltas, e.g.:
initialDoc: 2 deltas
currentDoc: 5 deltas

I can get the diff between them as added items.
But if I change item[0], which is contained in both initialDoc and currentDoc, this is not reflected in the deltas.
The same goes for removing an element that is common to both docs.

I am not sure what I am doing wrong in this case.
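
One possible explanation for seeing only inserts, offered as a sketch rather than a confirmed diagnosis: when a full-state update is applied to a fresh doc, the observer fires once and describes the change from empty to the final state, so content that was already deleted in the source never appears in the delta. The docs and the 'content' Y.Array below are illustrative assumptions.

```typescript
import * as Y from 'yjs'

// Source doc with some history: three elements inserted, then one deleted.
const source = new Y.Doc()
const sourceContent = source.getArray<string>('content')
sourceContent.insert(0, ['p1', 'p2', 'p3'])
sourceContent.delete(2, 1)

// Fresh doc that observes 'content' before the full state is applied.
const replica = new Y.Doc()
const loadDeltas: unknown[] = []
replica.getArray<string>('content').observe(event => {
  loadDeltas.push(event.changes.delta)
})

// A single transaction: the event reports only what is visible afterwards.
Y.applyUpdate(replica, Y.encodeStateAsUpdate(source))

console.log(JSON.stringify(loadDeltas))
```

If that is what is happening here, comparing the deltas recorded while loading two docs can only ever surface content that exists in one and not the other, not edits or removals.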

Seems like you’ve made some progress! I haven’t worked with snapshots before, but it seems promising.
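
For what it's worth, here is a small sketch of how two snapshots of the same doc can be compared on a Y.Text: toDelta accepts a snapshot and a previous snapshot and marks each segment with a ychange attribute of type 'added' or 'removed'. Y.XmlText extends Y.Text, so the same call should be available on a slate-yjs shared root, though that is an assumption I have not verified against a Slate document. The doc, the 'content' name and the `textOf` helper are made up for the example.

```typescript
import * as Y from 'yjs'

// Snapshots can only be compared while deleted content is retained, so gc is off.
const versionedDoc = new Y.Doc({ gc: false })
const text = versionedDoc.getText('content')

text.insert(0, 'hello world')
const before = Y.snapshot(versionedDoc)

text.delete(0, 6)             // removes "hello "
text.insert(text.length, '!') // appends "!"
const after = Y.snapshot(versionedDoc)

type DiffOp = { insert: string; attributes?: { ychange?: { type: 'added' | 'removed' } } }

// Render the text as of `after`, annotated with what changed since `before`.
const diffOps = text.toDelta(after, before) as DiffOp[]

// Hypothetical helper: concatenate all segments marked with the given change type.
const textOf = (kind: 'added' | 'removed'): string =>
  diffOps
    .filter(op => op.attributes?.ychange?.type === kind)
    .map(op => op.insert)
    .join('')

console.log({ added: textOf('added'), removed: textOf('removed') })
```

Because the ops come back in document order, positions can be derived by summing the lengths of the preceding segments that are not marked as removed.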