How to fetch initial data on the server side?

jin · July 24, 2020, 3:17am

Hi, everybody. I have an issue with initial data.
y-leveldb currently supports storing a ydoc object in LevelDB and fetching it back.
But I need to fetch the initial data from a table's field value, so I built my own persistence module using ytext.insert(0, value).

getYDoc (ydoc, docName) {
  return this._transact(async db => {
    // `info` identifies the MongoDB record for docName (its definition is not shown here)
    const document = await getMongoData(db, info)
    for (const key in document) {
      const value = document[key]
      if (typeof value === 'string' || value instanceof String) {
        // Insert each string field into its own Y.Text
        const yTextName = {generated from collection, document_id and name}
        ydoc.getText(yTextName).insert(0, value)
      }
    }
  })
}

This is my persistence module’s getYDoc function.
And I added setPersistence in server.js

utils.setPersistence({
  bindState: async (docName, ydoc, a, b) => {
    await ldb.getYDoc(ydoc, docName);
  },
  writeState: async (docName, ydoc) => {}
})

When a user loads the page for the first time, it works well. But if the server restarts or a client reconnects, the data is duplicated.

I think bindState runs again on the server in this case, because all the clients have been closed.

So I am not sure how to fix this issue.

Thanks.

dmonad · July 27, 2020, 3:02pm

Hi @jin

Yjs documents will always sync. But if you load the same text data using insert(0, text) every time you restart the server, you will duplicate content. There is no mechanism to detect how you initialized your content. So I suggest that you init the data with the content from the Yjs model. This means that you need to persist the Yjs doc (Y.encodeStateAsUpdate(ydoc)) instead of the pure text content. If you would like to index the text document, you can store it alongside the Yjs encoded document.

Many people have tried to solve this problem by implementing some kind of protocol that discards the Yjs document. But you will always run into duplication troubles if the client disconnects for a time and then reconnects with existing content. Even if you would use Operational Transformation (e.g. using ShareDB) you need to persist the log of all operations that were ever created in order to ensure convergence. There is no difference in Yjs - you need to persist the log of all operations (the Yjs document) in order to ensure that clients can always sync and that there is no duplication. The advantage of Yjs is that the encoded document is actually pretty small even for long editing traces and that it works peer-to-peer without a central authority.

ShareDB created a FAQ just for this type of issue https://github.com/share/sharedb/blob/master/docs/faq.md

I know there are concerns that the Yjs document is larger than the pure text document. So naturally, you would like to discard the Yjs document. The same goes for the operation log in ShareDB. As I highlighted above, there are many advantages of persisting the operation log. You can probably think of a way to discard the operation log after a few days - but I would also like to discourage that idea. This is extremely hard to implement correctly and you won’t be able to use other Yjs modules like y-redis or y-indexeddb for improved load-time. The Yjs document only has an overhead of 45% in practice for long editing traces. This is a small price to pay for convergence.

jin · July 27, 2020, 3:25pm

Hi @dmonad. Thanks for your reply.
I understand your answer.
We already have a lot of data in the db, which was built by our CRUD editor.
But we are going to upgrade to a CRDT editor, starting from the current db content.
In this case we had to render the current content from the db, so we used insert(0, text).

I am not sure how to initialize the document from the current content.

Thanks, Regards.

dmonad · July 27, 2020, 3:31pm

A simple upgrade method is to initialize using .insert(0, 'text') if the Yjs document doesn’t exist yet. But then immediately persist the Yjs document so that the next call doesn’t initialize the document again, but instead uses the persisted Yjs document.

jin · July 27, 2020, 3:36pm

Great @dmonad

Thanks

jstleger0 · January 19, 2021, 4:49pm

Hi, still very new here… but I was having a similar issue with duplicate documents. I would be keen to get your thoughts on the solution below. (See the block inside if (!persistedYdoc.share.size). Is this solution a nightmare?)

This is the WIP solution we have used in this spike to prevent the duplication issue…

Frontend

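import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

// docName and username are defined elsewhere in the app
const ydoc = new Y.Doc();
const permanentUserData = new Y.PermanentUserData(ydoc);
permanentUserData.setUserMapping(ydoc, ydoc.clientID, username);
const wsProvider = new WebsocketProvider(
  'ws://localhost:8080',
  docName,
  ydoc,
);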

Backend

As a proof of concept we have been using the y-websocket/bin/utils.js

utils.setPersistence({
  bindState: async (docName, ydoc) => {
    const persistedYdoc = await ldb.getYDoc(docName);

    if (!persistedYdoc.share.size) { // if no ydoc is stored in LevelDB
      // This loads documents that haven't been loaded by Yjs before from our old prosemirror store
      const initialYdoc = await getPreviousProsemirrorDocAsNewYdoc(docName);
      const initialUpdates = Y.encodeStateAsUpdate(initialYdoc);
      await ldb.storeUpdate(docName, initialUpdates);
      // Merge the initial content into persistedYdoc so it reaches ydoc below on this first load
      Y.applyUpdate(persistedYdoc, initialUpdates);
    }

    const newUpdates = Y.encodeStateAsUpdate(ydoc);
    ldb.storeUpdate(docName, newUpdates);
    Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(persistedYdoc));

    ydoc.on('update', async (update) => {
      ldb.storeUpdate(docName, update);
    });
  },
  writeState: async (docName, ydoc) => {
    // This is called when all connections to the document are closed.
    // In the future, this method might also be called in intervals or after a certain number of updates.
    return new Promise((resolve) => {
      // When the returned Promise resolves, the document will be destroyed.
      // So make sure that the document really has been written to the database.
      resolve();
    });
  },
});

dmonad · January 23, 2021, 3:34pm

Hey @jstleger0, this is exactly right. You make sure to initialize the document only once.

!persistedYdoc.share.size is a neat hack. It works as you intended - i.e. it is only empty if the document is empty.