How to achieve atomicity?

kuwajima · October 10, 2024, 4:55pm

I sometimes perform a series of operations that, if interrupted (by an app error or a crash), leaves my document in an inconsistent state.

One such example is moving a node in a graph to another parent. Depending on the implementation, an interruption may leave that node without a parent or with two parents.

Is there a way to either detect such errors and revert to a known good state, or even better – group a series of operations and execute them atomically?

“Cheating” this is also a possibility, by, say, creating a copy of a document and then applying a single update. I fear this may be too slow for large enough documents.

I wonder if there are any established best practices.

Thanks!

dmonad · October 10, 2024, 5:36pm

If you are building a tree with Yjs, concurrent actions can always lead to an inconsistent state (as a tree requires that nodes are connected).

Often, individual changes are fine on their own: 1) User1 connects nodes A and B with edge E. 2) User2 deletes node B.

But once changes get merged, we have an unconnected edge E (as node B was deleted).

For certain kinds of data models, you need to look for these kinds of issues after every edit. Then you can repair the graph by deleting unconnected nodes and edges. Note that insertions may lead to duplication, as “repairs” happen on every device.
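
The repair step described above could be sketched roughly as follows. This is only an illustration under assumptions that are not from this thread: nodes and edges live in two map-like collections keyed by id, and each edge value has hypothetical `from` and `to` fields. The helper relies only on `has`, `forEach` and `delete`, which both a native `Map` and a `Y.Map` provide; the demo uses a native `Map` so it runs without Yjs. It only deletes, because deletions are safe to repeat on every device, whereas repairing by insertion may duplicate data.

```javascript
// Hypothetical helper: drop every edge whose endpoints no longer exist.
function removeDanglingEdges(nodes, edges) {
  const dangling = [];
  edges.forEach((edge, edgeId) => {
    if (!nodes.has(edge.from) || !nodes.has(edge.to)) {
      dangling.push(edgeId);
    }
  });
  // Delete after iterating so the collection is not mutated mid-loop.
  dangling.forEach((edgeId) => edges.delete(edgeId));
  return dangling;
}

// Merged state from the example above:
// User1 added edge E (A -> B), User2 deleted node B.
const nodes = new Map([['A', { label: 'A' }]]);
const edges = new Map([['E', { from: 'A', to: 'B' }]]);

const removedEdges = removeDanglingEdges(nodes, edges);
console.log(removedEdges); // [ 'E' ]
```

With Yjs, a check like this would presumably run after each merged update, wrapped in ydoc.transact so that the cleanup itself is delivered as one event.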

y-prosemirror also needs to check for “invalid nodes”, as rich-text nodes in ProseMirror have schemas that can be violated by concurrent edits. A rule might be: “Every citation has at least one paragraph”. Concurrent actions (user1 deletes paragraph 1, user2 deletes paragraph 2) may lead to empty citations, which must be cleaned up before reflecting the changes in ProseMirror.

Y.Text also has certain kinds of cleanups for formatting attributes. But this is more of an internal optimization. The user doesn’t have to deal with them.

Is there a way to either detect such errors and revert to a known good state, or even better – group a series of operations and execute them atomically?

For all intents and purposes, transactions are atomic. You should group changes in ydoc.transact(() => { /* .. apply changes to data types here .. */ }, 'my cool graph changes'). Remote clients will receive these changes as a single event. The state should be consistent, unless there were concurrent actions that led to an inconsistent state (see examples above).

kuwajima · October 10, 2024, 6:04pm

Thanks for the answer.

How can I handle the case in which a transaction crashes midway?

If I want to move node C from parent A to parent B, and I have a transaction with these two steps:

1. Remove C from A
2. Add C to B

and the browser crashes after step 1, I have a client with an inconsistent state.

dmonad · October 10, 2024, 10:43pm

If the browser crashes, then you probably just restart it.

I think what you meant to say is: “what should I do when my code throws an exception during a Yjs transaction”. My suggestion: add a try { /* your code */ } catch (e) {} clause. But once that happens, the Yjs data is no longer in sync with the data from the graph. Hence, you should probably implement a better fallback and make exceptions the exception.

kuwajima · October 10, 2024, 11:58pm

I am catching everything there is to catch (and correcting), but I’ve been worried that the same consequences could happen if the browser experiences a crash.

Pseudo code

ydoc.transact(() => {
  try {
    aNodes.remove(c);
  } catch (e) { 
    // we're going to skip this transaction
    return;
  }

  // what happens if the browser crashes here?
  // will it be as if c wasn't removed from a?

  try {
    bNodes.add(c);
  } catch (e) {
    aNodes.add(c);
  }
})

kuwajima · October 11, 2024, 12:02am

Ah, after writing that previous comment I think I have understood what’s happening. Since providers react to events, and a transaction guarantees a single event, a crash would not fire an event and the change would not be persisted. I should have figured it out to begin with, having written two providers.

I do wonder – how is it that an event fires even if the code throws? When I omitted that try catch block in dev and checked, the removal of c from a persisted.

dmonad · October 11, 2024, 1:17pm

I’m still not quite sure what you mean by “browser crashing”, but I’m glad you figured out your issue!

While I recommend not throwing exceptions in events, Yjs usually handles exceptions gracefully by catching them and throwing them again once the Yjs state is consistent again.

If you are interested, have a look at the cleanup logic in Transaction.js. In particular, cleanupTransaction and the event emitter's callEventHandlerListeners have interesting implementations that allow multiple errors to be thrown. Still, I recommend making exceptions the exception.

kuwajima · October 11, 2024, 6:00pm

Thanks for following up

First I have to clarify I never intentionally throw inside a transaction. What I’m trying to do is handle a throw that occurs despite my exception-avoidance, due to a mistake in the code.

Here is a functioning example:

import * as Y from 'yjs';
import { IndexeddbPersistence } from 'y-indexeddb';

const ydoc = new Y.Doc();
const provider = new IndexeddbPersistence('testDoc', ydoc);

provider.on('synced', () => {
  const firstMap = ydoc.getMap('first');
  const secondMap = ydoc.getMap('second');

  function addToAll(key, value) {
    console.log('before: ', firstMap.toJSON(), secondMap.toJSON());

    const shouldDemoThrow = !localStorage.getItem('alreadyThrew');

    ydoc.transact(() => {
      firstMap.set(key, value);

      if (shouldDemoThrow) {
        localStorage.setItem('alreadyThrew', 'true');
        throw new Error('an exception we unfortunately overlooked');
      }

      secondMap.set(key, value);
    });

    console.log('after: ', firstMap.toJSON(), secondMap.toJSON());
  }

  addToAll(Math.random().toString(), Math.random().toString());
});

Running this twice results in firstMap holding one extra value compared to secondMap.

I would like to achieve atomicity: if some of the transaction fails, it’s as if none of it ever happened. With it, the first time the example code runs (and throws), nothing would have been added to either firstMap or secondMap.

dmonad · October 14, 2024, 1:56pm

You can catch exceptions thrown inside a transaction and reset the document, but Yjs won’t do that for you. You could also use the undo manager to achieve the same.

Automatic rollback is useful if you work with database transactions that might fail because of concurrent writes, or because you access a remote data store and you don’t know what is in there. But Yjs is not a database. It is a mutable, automatically syncing data type. Similar to normal data types, there is no concept of “reset on fail”/atomicity (which usually applies to database transactions that write to the same data, a case Yjs handles intentionally by automatic syncing). I feel this edge-case requirement might not be worth the implementation overhead. Handling exceptions the way I already do is hard enough.

kuwajima · October 14, 2024, 4:06pm

Yes, I completely agree. I was wondering what the best approach would be for me to implement on top of Yjs.

A couple of questions:

Is there a way to indicate a point in time/snapshot a state so I won’t need to count the number of undos needed to revert a transaction?
How expensive (in compute, not memory) is it to clone a YDoc? If there aren’t any snapshots I could clone the document, perform the transaction, then use applyUpdate to apply the diff to the original document, which as I understand is atomic.