Issue with serializing and deserializing Y.Doc to database

milesingrams · November 26, 2021, 9:22pm

My plan is as follows:

1. Create new list and Yjs doc on server
2. Persist Yjs doc to database in base64 since I’m using postgres:

   persistedYDoc = byteArrayToBase64(Y.encodeStateAsUpdate(yDoc))

3. When any user goes to list page /list/:listID then load Yjs doc from DB and deserialize from base64:

   const loadedYDoc = new Y.Doc()
   Y.applyUpdate(loadedYDoc, base64ToByteArray(persistedYDoc))

4. Connect Y doc over WebRTC
5. Document is autosaved to database periodically by the last user to make an update
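
The byteArrayToBase64 and base64ToByteArray helpers named in the plan are not shown in the post. The following is only a minimal sketch of what such helpers might look like, assuming a runtime that provides the global btoa and atob functions (browsers and recent Node.js versions); the bodies are an illustration, not the poster's actual code:

```typescript
// Hypothetical implementations of the helper names used in the plan.
// Assumes global btoa/atob are available in the runtime.

function byteArrayToBase64(bytes: Uint8Array): string {
  // Build a "binary string" with one character per byte, then base64-encode it.
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}

function base64ToByteArray(base64: string): Uint8Array {
  // Decode to a binary string, then copy each character code into a byte array.
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes
}
```

Because the encoding is lossless, the resulting string can be stored in a text column and decoded back to the exact bytes that Y.applyUpdate expects.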

However, when I try this, persisting and then loading the doc fails to yield identical docs. For example, when I run the following code:

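import * as Y from 'yjs'
// fromUint8Array / toUint8Array are assumed to come from the js-base64 package
import { fromUint8Array, toUint8Array } from 'js-base64'

// Encode the full document state as a single update and base64-encode it
function serializeYDoc(yDoc: Y.Doc) {
  const documentState = Y.encodeStateAsUpdate(yDoc)
  const base64Encoded = fromUint8Array(documentState)
  return base64Encoded
}

// Decode the base64 string and apply it as an update to a fresh doc
function deserializeYDoc(base64YDoc: string) {
  const binaryEncoded = toUint8Array(base64YDoc)
  const deserializedYDoc = new Y.Doc()
  Y.applyUpdate(deserializedYDoc, binaryEncoded)
  return deserializedYDoc
}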


/*
yDoc is a Y.Doc initialized as such:
{
  value: {
    test: 'this is a test'
  }
}
*/
const serialized1 = serializeYDoc(yDoc) // AQHT/L7sAwAoAQV2YWx1ZQR0ZXN0AXcOdGhpcyBpcyBhIHRlc3QA
const deserialized1 = deserializeYDoc(serialized1) // Doc { ..... }
const serialized2 = serializeYDoc(deserialized1) // AQHT/L7sAwAoAQV2YWx1ZQR0ZXN0AXcOdGhpcyBpcyBhIHRlc3QA

console.log(yDoc.toJSON()) // {"value":{"test":"this is a test"}}
console.log(deserialized1.toJSON()) // {} // This should be the same but is empty
console.log(serialized1 === serialized2) // true // Somehow when reserialized to base64 it still is the same

Note that the serialized and then deserialized doc outputs empty JSON despite being identical in base64 form.

Perhaps I am missing something. Is there a way to persist and load a Y.Doc to a database in base64 without having access to the original JavaScript object? Perhaps the problem comes from the fact that I am applying the state update to a new Y.Doc, but I don’t know of any Yjs function that lets you serialize and deserialize the entire document.

Any help much appreciated!

Thanks!

philip · November 27, 2021, 1:42am

Probably a bug. Reproduction: repl

Interesting! If you call
deserialized1.getMap('value').toJSON()
then
deserialized1.toJSON()
will work!
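
A minimal sketch of this workaround, assuming a doc shaped like the one in the question (a root-level map named 'value' with a single 'test' key). The variable names are made up for illustration, and the round trip uses the raw binary update without the base64 step:

```typescript
import * as Y from 'yjs'

// Build a doc shaped like the one in the question:
// { value: { test: 'this is a test' } }
const original = new Y.Doc()
original.getMap('value').set('test', 'this is a test')

// Round-trip through a binary update into a brand-new doc.
const update = Y.encodeStateAsUpdate(original)
const loaded = new Y.Doc()
Y.applyUpdate(loaded, update)

// Define the root-level type on the loaded doc first,
// then read the data from the shared type itself.
const loadedValue = loaded.getMap('value').toJSON()
console.log(loadedValue) // { test: 'this is a test' }
```

The data was present in the loaded doc all along; it only becomes readable as a map once getMap('value') has been called on that doc.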

philip · December 1, 2021, 1:57pm

It seems that toJSON on the doc has been deprecated:

"@deprecated — Do not use this method and rather call toJSON directly on the shared types."

Does anyone know how to get all the keys within a ydoc? Edit: Found it! yDoc.share.keys()

dmonad · December 2, 2021, 11:21am

Yes, it has been deprecated exactly for this reason. It will be removed in the next major release. Types need to be defined at the root-level before accessing them (e.g. using getMap, or getText). toJSON hence can produce different results for different users (depending on which types they defined).

milesingrams · December 3, 2021, 6:49pm

Thanks for finding a workaround Philip! Calling toJSON on the subitem and not the doc works and solves my dilemma. I understand this was deprecated for consistency reasons, but perhaps I still don’t understand why a Doc is treated differently than a Map since both are a collection of key-value pairs with the Doc just being the root. For my purposes I decided to go with my Doc having a single Map in it called ‘value’ and just attaching all my data to that ‘value’ object in a nested fashion. I feel like I’m missing the use case for why “Types need to be defined at the root-level before accessing them”. Is it so different clients can opt in to different parts of the Doc for performance reasons?

dmonad · December 14, 2021, 2:50pm

You are not the only one who feels this way. I decided to design this API like this because overwriting entries on the root-level almost always leads to problems (one client overwriting content from another client).

This is the idiomatic way to work with Yjs documents and I suggest that everyone works with documents by defining a limited number of types on the root level.

If you want to have a root-level map, I suggest that you simply create a single rootmap = ydoc.getMap('root') and use that as the root map. It is safe to call rootmap.toJSON() and it is safe to generate an unlimited number of objects on that root map. However, this is not always what you want, because it is unclear who will initialize the initial content of your document (there could be two clients initializing rootmap.set('my editor', new Y.Text()), hence overwriting each other's content).
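
The overwrite hazard described above can be sketched with two in-memory docs standing in for two clients. This is only an illustration: the initEditor helper and the 'from A' / 'from B' strings are hypothetical, and the docs are synced by exchanging full state updates directly rather than over a network provider:

```typescript
import * as Y from 'yjs'

// Two docs standing in for two clients that have not synced yet.
const docA = new Y.Doc()
const docB = new Y.Doc()

// Hypothetical initializer: each client creates its own Y.Text under the same key.
function initEditor(doc: Y.Doc, initialText: string): void {
  const rootmap = doc.getMap('root')
  const text = new Y.Text()
  rootmap.set('my editor', text)
  text.insert(0, initialText)
}

initEditor(docA, 'from A')
initEditor(docB, 'from B')

// Sync both ways by exchanging full state updates.
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA))
Y.applyUpdate(docA, Y.encodeStateAsUpdate(docB))

// Both docs converge on a single Y.Text under 'my editor';
// the text written by the other client is gone.
const survivorA = (docA.getMap('root').get('my editor') as Y.Text).toString()
const survivorB = (docB.getMap('root').get('my editor') as Y.Text).toString()
console.log(survivorA === survivorB)

// Reading through the root map is safe on either client.
console.log(docA.getMap('root').toJSON())
```

Contrast this with a root-level type obtained via ydoc.getText('my editor'): both clients would then refer to the same shared type, and their concurrent inserts would be merged instead of one replacing the other.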