Transactions: atomicity and labeling?

Hi! I’m very new to Yjs, but am a long time OT/CRDT armchair enthusiast and have used ShareDB in production for a few years. I’ve spent a couple evenings playing around with Yjs and wow: it looks amazing!

I have a couple questions about transactions:

  1. Do Yjs transactions (doc.transact(...)) apply the inner operations atomically? I’ve been working on a shared-state library (totally unrelated to Yjs) based on Redux which transmits actions, and one advantage I perceive of that approach is that each Redux action is applied (or not) atomically, regardless of how complicated it is, even under reordering. This helps preserve user intent for actions that modify several parts of the state. But it looks like Yjs transactions may provide this behavior too? I think so… but I wanted to confirm that this is the actual intent, not just an optimization of the number of messages sent and the number of observer calls. (One of these evenings I’ll settle in and watch the 3 hour internals video… not yet though 😅)

  2. Another nice aspect of Redux actions is that logging the action history is a developer-friendly tool for debugging and support. I see that Yjs transactions support an origin metadata field. Is there a way one might also provide some kind of label or other arbitrary k/v set on a transaction that exists solely for logging/observability?

Thanks for any pointers!

Hm, for (2) perhaps I could have an Array type in the doc which holds these “logs”, and have each transaction explicitly append to that array.

Welcome to the forum @jasonm,

As you assumed, transactions are mostly for managing events. We don’t want to fire a separate event for each character that is inserted. Instead, we can bundle many changes into a single transaction, so the event fires only once, and we only need to update the view (presumably the DOM) once after all changes have been performed.

In most cases, you can assume that transactions ensure that a change is applied as an atomic operation. However, there are some edge cases when this is not true. These edge cases always involve at least three clients. If one of the changes performed depends on a change that has not been applied yet to the local copy, we will only apply the changes that can be applied and apply the other changes once the dependencies have been applied.

In theory, it would be possible to do what you are asking. However, this hasn’t been an issue yet and we are focusing on other things at the moment.

If you want to replay the operations (e.g. for debugging purposes), I suggest that you log the generated update messages somewhere instead of the transactions (including the origin field and the transaction.local field if you like). You can, of course, also store Transactions. However, they contain a lot of information that we usually would like to discard (e.g. objects intended for garbage collection and potentially a lot of object references).

@dmonad Thanks for the reply!

I’m having a little difficulty imagining such a case. I tried to come up with a scenario here. Is this such a case?

Imagine three clients A, B, and C:

  • A sends change 1.
  • B receives change 1 and applies it.
  • B then generates change 2 based on change 1. Change 2 is a transaction containing two operations 2a and 2b.
  • B sends change 2.
  • C receives 2 before it receives 1.
  • C is able to apply 2a but not able to apply 2b, and so it applies 2a. This incompletely applies the transaction of 2.
  • Later, C receives 1 and is able to apply that. Then, it also can finish applying 2 by applying 2b.

I’m trying to build an intuition of when this partial transaction application might happen. For the phrase “can be applied” above, what does this mean? My understanding of CRDT mechanisms is fairly weak, feel free to refer me to some literature. Does this have to do with data available in the types? Or clocks? If you have seen realistic motivating examples here, that’d be helpful to share, too.

Also, it seems that this may be impossible in a provider setup where all changes are routed through a single node, e.g. y-websocket, where a single node can impose an authoritative ordering on the operation history. Is this a fair assumption? I understand that this depends on a specific topology and would not generalize e.g. to y-webrtc with p2p.

Last, I came across INTERNALS.md in the yjs/yjs repository on GitHub, which describes a transaction as “a set of updates to the Yjs document to be applied on remote peers atomically”. Would you be open to a PR updating this with some caveat example(s) if I’m understanding them correctly?

Thanks!

@jasonm This is a perfect example of a case where we don’t have atomicity. However, as you said, this can only happen in p2p scenarios.

To give you a better understanding how some operations “depend” on others:

Imagine a Y.Text containing the characters “ab”. Now client 1 inserts character x between a and b. Client 2 inserts character y between x and b. If client 3 receives “insert y between x and b” before it receives character “x”, we have to wait until character “x” is available.
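The waiting behavior in this example can be sketched with a toy model. To be clear, this is not how Yjs is implemented internally: the ToyReplica class, the op shape, and the single-character ids are all invented for illustration, and concurrent inserts at the same position are ignored. It only shows the idea that an insert is buffered until the character it refers to is available.

```javascript
// Toy model only: NOT the actual Yjs data structures.
// Each insert names the id of the character it should be placed after ("left").
class ToyReplica {
  constructor (initial) {
    this.items = [...initial] // characters known to this replica, in order
    this.pending = []         // inserts whose left neighbour is not known yet
  }

  receive (op) {
    this.pending.push(op)
    // Keep retrying until no buffered insert can make progress.
    let progressed = true
    while (progressed) {
      progressed = false
      for (const candidate of [...this.pending]) {
        const index = this.items.findIndex(item => item.id === candidate.left)
        if (index === -1) continue // dependency missing: keep waiting
        this.items.splice(index + 1, 0, { id: candidate.id, char: candidate.char })
        this.pending = this.pending.filter(p => p !== candidate)
        progressed = true
      }
    }
  }

  text () {
    return this.items.map(item => item.char).join('')
  }
}

// Client 3 starts with "ab".
const client3 = new ToyReplica([{ id: 'a', char: 'a' }, { id: 'b', char: 'b' }])
const insertX = { id: 'x', char: 'x', left: 'a' } // from client 1
const insertY = { id: 'y', char: 'y', left: 'x' } // from client 2, depends on x

client3.receive(insertY)                // arrives first, cannot be placed yet
const textWhileWaiting = client3.text() // still "ab"
client3.receive(insertX)                // x arrives, then the buffered y is placed
const textAfterBoth = client3.text()    // "axyb"

console.log(textWhileWaiting, textAfterBoth)
```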

The intention always was to make transactions atomic. We could achieve that by detecting unresolved dependencies before applying updates. However, I never got around to doing that. So for now we only have atomicity when all operations are routed through a central entity (because that prevents the case that @jasonm described).
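For intuition, here is what "detecting unresolved dependencies before applying updates" could look like in a toy model, using the three-client scenario from earlier in the thread (change 1, then change 2 made of 2a and 2b). None of this exists in Yjs: the AtomicReplica class, the op ids, and the `dependsOn` field are hypothetical. The point is only that a whole transaction is held back until every one of its operations can be applied.

```javascript
// Toy model only: a hypothetical all-or-nothing receiver, not part of Yjs.
class AtomicReplica {
  constructor () {
    this.applied = new Set() // ids of operations applied so far
    this.waiting = []        // whole transactions that are not ready yet
  }

  // A transaction is ready when every dependency of every op is either
  // already applied or provided by another op in the same transaction.
  isReady (tx) {
    const ownIds = new Set(tx.ops.map(op => op.id))
    return tx.ops.every(op =>
      op.dependsOn.every(dep => this.applied.has(dep) || ownIds.has(dep)))
  }

  receive (tx) {
    this.waiting.push(tx)
    let progressed = true
    while (progressed) {
      progressed = false
      for (const candidate of [...this.waiting]) {
        if (!this.isReady(candidate)) continue // hold back the whole transaction
        for (const op of candidate.ops) this.applied.add(op.id)
        this.waiting = this.waiting.filter(t => t !== candidate)
        progressed = true
      }
    }
  }
}

const change1 = { ops: [{ id: '1', dependsOn: [] }] }
const change2 = {
  ops: [
    { id: '2a', dependsOn: [] },   // could be applied on its own
    { id: '2b', dependsOn: ['1'] } // needs change 1 first
  ]
}

const clientC = new AtomicReplica()
clientC.receive(change2)                  // arrives before change 1
const afterChange2 = [...clientC.applied] // nothing applied, not even 2a
clientC.receive(change1)                  // unblocks change 2 as a whole
const afterChange1 = [...clientC.applied]

console.log(afterChange2, afterChange1)
```

The trade-off is that 2a becomes visible later than it strictly has to, in exchange for never exposing a half-applied transaction.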

“Would you be open to a PR updating this with some caveat example(s) if I’m understanding them correctly?”

Yes, please. Thanks for bringing this up!
