Easy to understand explanation of how Yjs works

riguz · August 31, 2024, 12:03pm

After read the paper I think I can explain the basic idea of Yjs/CRDT in a much more easier way:

The core concept of Yjs is events, we log all events as single source of truth, this pattern is usually named Event Sourcing
Like any other event based distributed systems, eg. Raft, we need to ensure that events will be replicated to all clients, eventually
We replay events to calculate the state, so if we have same events, then we get same results.
What makes Yjs different to other event based systems is the order of events. Unlike other systems relying on natual order of events, eg. event happen time or global IDs, we cannot use absolute time or IDs to sort events, simply because Yjs is distributed. Also, in order to keep the replay results logically correct, the sort method must be reasonable. So the core of Yjs is to find such a method to sort events, which is refered as “strict total order <c” in the paper.

In conclusion, the most important part is Event Sourcing and Sorting. Am I understanding it correctly?

dmonad · September 2, 2024, 10:03pm

Basically yes, but some annotations:

The core concept of Yjs is events, we log all events as single source of truth, this pattern is usually named Event Sourcing

In Event Sourcing we usually have a single source of truth. This makes it easier to reason about changes.

The unique thing in CRDTs is that there can be multiple sources of truths / “event sources” (client A, client B, …), each emitting different events. However, if you interpret these events, all should end up in the same state. E.g. client 1 might emit “insert(A, 0) • insert(B, 1) ⇒ AB” and client b might emit “insert(B,0) • insert(A, 0) ⇒ AB” (same ops, different order). The peers are eventually consistent because the state will eventually converge. In CRDTs, changes are optimistic (we apply them locally first and then apply them on remote peers). This is usually not the case in Event Sourcing.

Like any other event-based distributed systems, eg. Raft, we need to ensure that events will be replicated to all clients, eventually

The changes (In Yjs they are binary encoded) can be replicated in any order to all connected peers. There are different syncing approaches. As long as all changes are distributed, all clients will end up with the same Yjs document.

Raft achieves convergence via “consensus”. CRDTs only rely on the algorithm and need no further communication.

Raft provide (semi-)pessimistic replication. CRDTs provide optimistic replication.

What makes Yjs different to other event based systems is the order of events. Unlike other systems relying on natual order of events, eg. event happen time or global IDs, we cannot use absolute time or IDs to sort events, simply because Yjs is distributed. Also, in order to keep the replay results logically correct, the sort method must be reasonable. So the core of Yjs is to find such a method to sort events, which is refered as “strict total order <c” in the paper.

In CRDTs we usually don’t think in terms of events. For me, it would be more natural to think about state. Yjs maintains a linked list. Eventually, all clients will end up with the same linked list. CRDTs give you an algorithm to “put that state together” so that everyone ends up with the same state.

This is very different to Operation Transformation, where we think in terms of operations that need to be transformed so that everyone ends up with the same content.

However, this is just my interpretation.