Question regarding updates and state vectors in y-leveldb

Hey there! I’m digging around in y-leveldb and had a couple questions.

One, I noticed that the implementation included in y-websocket server reads from LDB, then immediately writes back to it. Each time I start my server, this of course results in the update list growing by one, even though the updates have identical information in them. What’s the intention of this behavior? (Answered this one myself upon re-reading; it’s using the in-memory ydoc here, which could have new stuff in it. Oops!)

In that same area, it seems like I’d be able to use Y.diffUpdate and the stored state vector to avoid writing updates I already have. The state vector, however, is only updated on flushDocument, and on state vector creation. Is the intention to go ahead and write updates I already have and let the trimming take care of compaction? Or do I need to update the state vector on every update? Is there a reason y-levelDB doesn’t do this?

Anyways, just a few questions I had while digging around. Really enjoying playing with the new Y.diffUpdate and Y.mergeUpdate; they are super cool. :smiley:

Hey @braden! Happy you like the diffUpdates feature :slight_smile:

The getStateVector method in y-leveldb is currently incomplete. Nevertheless, it could potentially save database-queries (e.g. computing the state-vector for sync-step-1 can be avoided even if it might lead to a bit more overhead).

The state-vector database-field should be interpreted as “the database has at least the information of the defined state-vector” or “the compacted state has the information of the defined state-vector”. Updating the state-vector on every update is potentially expensive as it doubles the number of writes. While leveldb is really fast, other databases that are supported by levelup have a lot more overhead.

With the diff-update feature, we can update the state vector in the database by simply merging all updates that come after the compacted state and then extracting the state-vector. I.e. if we have [compactedState, update1, update2], stateVector: sv0 in the database we first compact update1-2 compactedUpdates = Y.mergeUpdates([update1, update2]) and extract the stateVector sv12 = Y.getStateVectorFromUpdate(compacted) and write everything back to the database:

[compactedState, compactedState], stateVector:  mergeStateVectors(sv0 + sv12)

I plan to update the getStateVector method in leveldb to always retrieve the latest state vector. If the state is not compacted and the state-vector in the database is not up to date, the state-vector will be updated using the above approach.

Regarding your other question. I think it should be possible to avoid duplicate writes of updates in many cases. I’m unsure if it makes sense to check every update if it should be written. So I would favor letting the compaction handle removing the duplicate updates.

1 Like

Awesome, that all makes sense. Thank you for the reply!

Another question: Does diffUpdate always converge to an empty diff after applying mutual updates to documents? I’m experimenting with sending the smallest amount of information possible between two nodes and I’m finding I can’t get the diff to converge to an empty Uint8Array. I imagine I just have a bug somewhere, but I wanted to confirm my assumption about convergence. I have side effects based on the results of a sync, and I’d like to be able to opt out of those side effects if there’s no meaningful changes in an update.

Hey @braden,

Does diffUpdate always converge to an empty diff after applying mutual updates to documents?

It does not.

Basically, Yjs has a struct store (containing all insertions) and a delete store (containing an optimized representation of all ids that have been deleted). We can compute a minimal diff for the struct store (using lamport timestamps). But the delete store is always sent to the other client when syncing.

Other CRDTs don’t make that distinction and store everything in an “operation store”. In Yjs, we achieve better compression faster parsing of the encoded document by separating struct- and delete store.

That might be hard to achieve. If you opt-out of syncing some changes, you might not be able to apply other updates (that depend on the changes that you didn’t apply).

1 Like