I’m running a test with two documents that are kept in sync: in a loop I insert an element into a shared array and immediately delete it again, then measure the encoded document size. When I always apply the operations to the same document, the encoded size is 45B. When I pick one of the two documents at random for each iteration, the encoded size is over 7000B. I tried following the structs but I can’t see why this happens. Any ideas?
Here is the code to reproduce:
const Y = require('yjs')
let doc1 = new Y.Doc()
let doc2 = new Y.Doc()
let docs = [doc1, doc2]
doc1.on('update', update => {
  Y.applyUpdate(doc2, update)
})
doc2.on('update', update => {
  Y.applyUpdate(doc1, update)
})
for (let n = 1; n <= 2048; n++) {
  const c = Math.floor(Math.random() * 2) // over 7000 bytes
  // const c = 0 // <- 45 bytes
  const yarray = docs[c].getArray('a')
  yarray.insert(0, [n])
  yarray.delete(0, 1)
}
let size1 = Y.encodeStateAsUpdateV2(doc1).byteLength
let size2 = Y.encodeStateAsUpdateV2(doc2).byteLength
console.log(`${size1};${size2}`)
This is expected. CRDTs need to retain meta-information to avoid sync conflicts with other peers. Yjs can drop meta-information in many cases (e.g. if you delete the parent type, or if you insert in order). Deleting randomly creates a lot of meta-information that is only removed once the parent type is deleted.
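For illustration, here is a rough sketch of the parent-type case (the nesting under a Y.Map and the doc names are my own; root-level types such as getArray('a') cannot themselves be deleted, so the array is nested inside a map whose key can be removed):

const Y = require('yjs')
const docA = new Y.Doc()
const docB = new Y.Doc()
docA.on('update', update => { Y.applyUpdate(docB, update) })
docB.on('update', update => { Y.applyUpdate(docA, update) })
// Nest the array in a map so it has a parent type that can be deleted later
docA.getMap('root').set('list', new Y.Array())
const docs = [docA, docB]
for (let n = 1; n <= 2048; n++) {
  const c = Math.floor(Math.random() * 2)
  const yarray = docs[c].getMap('root').get('list')
  yarray.insert(0, [n])
  yarray.delete(0, 1)
}
console.log(Y.encodeStateAsUpdateV2(docA).byteLength) // large, as in your test
// Deleting the parent type lets Yjs garbage-collect the meta-information of everything inside it
docA.getMap('root').delete('list')
console.log(Y.encodeStateAsUpdateV2(docA).byteLength) // should shrink substantially (exact size depends on the Yjs version)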
I just noticed that you have different clients inserting content. In this case, the blowup is still due to the amount of meta-information that needs to be retained. Meta-information from a single client can be garbage-collected more easily.
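As a quick sketch of that comparison (the measure helper is my own; exact byte counts depend on the Yjs version, but the single-writer run should reproduce your 45B case while the random two-writer run retains meta-information from both clients):

const Y = require('yjs')
// Run the same insert/delete workload; `pick` decides which of the two
// synced documents performs each iteration. Returns the encoded size.
const measure = pick => {
  const d1 = new Y.Doc()
  const d2 = new Y.Doc()
  d1.on('update', update => { Y.applyUpdate(d2, update) })
  d2.on('update', update => { Y.applyUpdate(d1, update) })
  const docs = [d1, d2]
  for (let n = 1; n <= 2048; n++) {
    const yarray = docs[pick()].getArray('a')
    yarray.insert(0, [n])
    yarray.delete(0, 1)
  }
  return Y.encodeStateAsUpdateV2(d1).byteLength
}
console.log(measure(() => 0))                             // single writer: meta-information can be garbage-collected
console.log(measure(() => Math.floor(Math.random() * 2))) // two writers: meta-information from both must be retained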
The worst case is that you create around 20 bytes per operation that you apply to a document. In most cases, the garbage collector will virtually remove operations from history, or the binary encoder will compress them into a small document. Randomness is that worst case; Yjs will encode any kind of pattern more efficiently.
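As a back-of-the-envelope check against your numbers (treating the ~20 bytes as a worst-case upper bound per operation, not a typical cost):

// 2048 loop iterations, each performing one insert and one delete
const ops = 2048 * 2
const worstCaseBytesPerOp = 20
console.log(ops * worstCaseBytesPerOp) // 81920; the observed ~7000 B stays well below this bound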