Multi-doc array metadata explosion

Hi,

I’m running a test with 2 documents, where I repeatedly insert an element into an array and then immediately delete it, and afterwards measure the document size. When I always update the same document, the encoded size is 45 bytes. When I randomly select which document to update, the encoded size is over 7000 bytes. I tried following the structs, but I’m not sure why this is happening. Any ideas?

Here is the code to reproduce:

const Y = require('yjs')
let doc1 = new Y.Doc()
let doc2 = new Y.Doc()
let docs = [doc1, doc2]

doc1.on('update', update => {
    Y.applyUpdate(doc2, update)
})

doc2.on('update', update => {
    Y.applyUpdate(doc1, update)
})

for (let n = 1; n <= 2048; n++) {
    const c = Math.floor(Math.random() * 2) // random doc -> over 7000 bytes
    // const c = 0 // <- always the same doc -> 45 bytes
    const yarray = docs[c].getArray('a')
    yarray.insert(0, [n])
    yarray.delete(0, 1)
}

let size1 = Y.encodeStateAsUpdateV2(doc1).byteLength
let size2 = Y.encodeStateAsUpdateV2(doc2).byteLength

console.log(`${size1};${size2}`)

Hi @Arik,

This is expected. CRDTs need to retain meta-information to avoid sync conflicts with other peers. Yjs can delete meta-information in many cases (e.g. if you delete the parent-type, or if you insert in-order). Deleting randomly creates a lot of meta-information that will only be deleted when the parent-type is deleted.
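If it helps to see the parent-type point concretely, here is a rough sketch (my own, not from the thread): it nests the array under a root Y.Map so it actually has a deletable parent (root types themselves can never be deleted), runs the same random insert/delete loop, and re-encodes before and after deleting the parent entry. Exact byte counts will vary from run to run, but the size after deleting the parent should drop sharply, because the retained meta-information can then be garbage-collected.

const Y = require('yjs')

const doc1 = new Y.Doc()
const doc2 = new Y.Doc()
const docs = [doc1, doc2]

doc1.on('update', update => { Y.applyUpdate(doc2, update) })
doc2.on('update', update => { Y.applyUpdate(doc1, update) })

// Nest the array under a root map so it has a parent that can be deleted
// (root types themselves cannot be deleted).
doc1.getMap('root').set('a', new Y.Array())

for (let n = 1; n <= 2048; n++) {
    const c = Math.floor(Math.random() * 2)
    const yarray = docs[c].getMap('root').get('a')
    yarray.insert(0, [n])
    yarray.delete(0, 1)
}

console.log('before parent delete:', Y.encodeStateAsUpdateV2(doc1).byteLength)

// Deleting the parent type allows Yjs to garbage-collect the meta-information
// that was retained for everything inside it.
doc1.getMap('root').delete('a')

console.log('after parent delete:', Y.encodeStateAsUpdateV2(doc1).byteLength)

The root-level array in the original snippet never gets this chance, because a root type cannot be deleted, so its meta-information stays in the document.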

@dmonad Thanks! So essentially the blowup is due to an insertion on one node and a deletion on the other?

I just noticed that you have different clients inserting content. In this case, the blowup is still due to the amount of meta-information that needs to be retained. Meta-information from a single client can be garbage-collected more easily.

In the worst case, you create around 20 bytes per operation that you apply to a document. In most cases, the garbage collector will virtually remove operations from history, or the binary encoder will compress them down to a small document. Random edits are that worst case; Yjs will encode any kind of pattern more efficiently.
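To put the numbers from the snippet at the top next to that bound (my back-of-the-envelope arithmetic, not from the thread): the loop runs 2048 iterations with one insert and one delete each, i.e. 4096 operations.

// Rough check against the ~20 bytes-per-operation worst case.
const ops = 2048 * 2              // one insert + one delete per iteration
const encodedBytes = 7000         // roughly what the random-client run printed
console.log(`${(encodedBytes / ops).toFixed(2)} bytes per operation`) // ≈ 1.71

So even the random-client variant stays well below the worst-case bound, and the 45-byte single-client variant is consistent with the garbage collector removing virtually all of the history.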