Y-prosemirror updateYFragment algorithm accuracy

Hello!

I’m working on a feature that allows user to compare prosemirror documents, and highlight differences between them. I’ve successfully implemented calculating and highlighting the differences, and generating a doc that works pretty much like the versions demo in the yjs repository.

One thing I’ve noticed is that updateYFragment updates the mapping pretty aggressively instead of trying to keep the changes small.

Here is an example:

The first version has the following content:

# Hello
p How is it going?

The second one has the following content:

# Hi
# Hello
p How is it going?
# Ok

An optimal update would keep “Hello” and “How is it going” the same, and insert the other two headings.
The algorithm in updateYFragment though prioritises updating nodes of the same type, regardless of their content.

let updateLeft = leftY instanceof Y.XmlElement && matchNodeName(leftY, leftP)
let updateRight = rightY instanceof Y.XmlElement && matchNodeName(rightY, rightP)
if (updateLeft && updateRight) {
  // decide which which element to update
  // ...
} else {
  // ...
}

In the case above though, because we added a new heading at the bottom, that won’t match the paragraph of the old version, and so we’ll start updating all the nodes from the top, effectively marking all the doc as changed.

Would it be possible to optimise the algorithm to output a smaller changeset? It reminds me of the stable marriage problem, which would help find the best match between nodes and keep the mapping more “stable”.

Thanks a lot for your work!

I thought about this a bit more, and realised this would mean implementing something like the Patience algorithm in git: The patience diff algorithm – The If Works

I’m still not sure how this could be applied to a tree structure though.

I understand that a minimal diff might be preferable. However, Yjs takes the approach of showing
what the user actually changed.

If you deleted 234 in the word 12345 and then re-inserted them again. Nothing changed, but you would still see that the user deleted and re-inserted the same content. I believe that we can optimize this in the future. For now, however, we focus on showing the changes and highlighting changes by user.

Thank you for the response @dmonad !

I realise i’m bending yjs heavily to do something that’s not supported as the primary use case.
I need to diff two prosemirror documents that have no common history, so it’s basically only 2 updates to the shared type: the first one inserts the content of the first doc, the second one replaces everything with the content of the second doc. Then I take two snapshots and create a new doc by merging them, like shown in the versions demo.

I’ve now patched the y-prosemirror package to export all the utilities used in updateYFragment, so that I can implement a “patient” version of the algorithm, which seems to be giving good results!