Hello,
I’m working on iasakura/lean-yjs, a Lean4 project to formally model and prove the correctness of Yjs algorithms.
My goal is to contribute to the theoretical reliability of Yjs by providing formal proofs, and hopefully to share observations from the verification process, in case they might be useful for those interested in the correctness of the Yjs algorithm.
So far I’ve proved the commutativity of the Yjs conflict resolution algorithm, which is modeled as the integrate function from the reference implementation by Joseph Gentle.
Currently, my work is still at the algorithmic level of verification, but my plan is to gradually move towards proofs that are closer to the actual implementation.
Along the way, I also noticed that the original YATA insert algorithm described in “Near Real-Time Peer-to-Peer Shared Editing on Extensible Data Types” admits a potential non-convergent case.
This issue does not affect Yjs at all, because the Yjs algorithm already addresses this problem. In fact, my verification shows that my formal model of Yjs does not exhibit such non-convergent cases.
I’m sharing this because I believe it might be useful for those interested in understanding the correctness of the YATA algorithm.
Non-convergent example
(based on Section 3.4 insert algorithm)
Setup (three clients, concurrent inserts):
- Initial document: empty.
- Client 1 inserts
"a"at position 0 concurrently with Client 2 inserting"b"at position 0.
Hence:a.origin = ⊥,b.origin = ⊥. - Client 3 observes
"b"and then inserts"c"after"b".
Hence:c.origin = b.
We compare two scenarios that lead to two different results:
Order A: a → b → c
- A-1: apply
a→"a" - A-2: apply
b- compare with
a:a.origin = b.origin = ⊥anda.id < b.id→ placebaftera - result:
"ab"
- compare with
- A-3: apply
c(withc.origin = b) → placecafterb- result:
"abc"
- result:
Order B: b → c → a
- B-1: apply
b→"b" - B-2: apply
c(withc.origin = b) →"bc" - B-3: apply
a- loop #1 vs
b:a.origin = b.origin = ⊥anda.id < b.id→ no matching “place after” condition in this pass, keep scanning - loop #2 vs
c:a.origin < c.origin(sincec.origin = banda.origin = ⊥) → placeaafterc - result:
"bca"
- loop #1 vs
Thus, the two scenarios diverge: "abc" vs "bca".
(This non-convergent case arises from the subtlety in the scanning process of the integration loop (e.g., comparing a vs. b), where the insert position does not move until an item appears whose origin makes the ordering unambiguous. This turned out to be one of the most challenging parts of my formal verification.)
If I’m misunderstanding something here, I’d be happy to be corrected.
I would greatly appreciate any feedback or comments.