Merging documents

Hello, I’m new to Yjs and I must be missing some fundamentals. I’ll describe the problem with some code examples.

I have written a document to disk using the following code:

import * as y from 'yjs'
import fs from 'node:fs'

const doc = new y.Doc()
doc.getMap('a').set('b', new y.Map(Object.entries({ 'c': 1 })))
const data = y.encodeStateAsUpdate(doc)

console.log('Storing', doc.getMap('a').toJSON())

fs.writeFileSync('./doc', data)

The output is: “Storing { b: { c: 1 } }”

Now I retrieve the document and apply it as updates in two ways:

import * as y from 'yjs'
import fs from 'node:fs'

const data = fs.readFileSync('./doc')

const doc1 = new y.Doc()
y.applyUpdate(doc1, data)
console.log('Read 1', doc1.getMap('a').toJSON())

const doc2 = new y.Doc()
doc2.getMap('a').set('b', new y.Map(Object.entries({ 'd': 2 })))
y.applyUpdate(doc2, data)
console.log('Read 2', doc2.getMap('a').toJSON())

This is the output:

Read 1 { b: { c: 1 } }
Read 2 { b: { d: 2 } }

I would expect Read 2 to have both c and d, but it only has d. I suspect this is to be expected and my assumption of the outcome is wrong.

Where does my thinking go wrong? What fundamental piece of the puzzle am I missing?

Hi, welcome.

The reason that you are not seeing doc1 data on doc2 is that you cannot apply updates from one Doc to another. They are incompatible.

More specifically, you cannot apply updates to disparate histories. For example, if you have a doc1 with changes A1 → A2 → A3 and doc2 with changes D1 → D2 → D3, you can apply doc1’s state to a doc with A1, A1 → A2, A1 → A2 → A3 (no change), A1 → A2 → A3 → A4 (no change), etc. But you cannot apply any A updates to Docs with D updates. In other words, updates can be applied to move forward or backward in a single line of history, but cannot be used to mix and match from different histories. Different Docs have different clientIds and clocks (the values used by the CRDT to track causality) so they won’t match up.

If you want to merge the content of two documents, you have to copy the content over manually. If you wanted, you could write a function that traverses a Doc and copies everything over to another Doc.
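A minimal sketch of such a copy function might look like the following. It is written against a generic map interface (get, set, has, forEach) so that it runs without Yjs installed; the demo uses native Maps as stand-ins for the two top-level maps. The helper name copyMapInto and its isMap and makeMap parameters are hypothetical, and the rule that the source wins on conflicting plain values is an assumption of this sketch. It only handles nested maps and plain values, not other shared types.

```javascript
// Hypothetical helper: recursively copy entries from `source` into `target`.
// Both must offer get/set/has/forEach (native Map does; Y.Map is assumed to as well).
function copyMapInto (source, target, isMap, makeMap) {
  source.forEach((value, key) => {
    if (!isMap(value)) {
      // Plain value: the source wins if the key already exists in the target.
      target.set(key, value)
      return
    }
    // Nested map: reuse the target's nested map if there is one, else create it.
    if (!target.has(key) || !isMap(target.get(key))) {
      target.set(key, makeMap())
    }
    // Read the nested map back from the target before recursing into it.
    copyMapInto(value, target.get(key), isMap, makeMap)
  })
}

// Demo with native Maps standing in for the two documents' top-level maps.
const isNativeMap = (v) => v instanceof Map
const stored = new Map([['b', new Map([['c', 1]])]])
const local = new Map([['b', new Map([['d', 2]])]])

copyMapInto(stored, local, isNativeMap, () => new Map())
console.log(Object.fromEntries(local.get('b'))) // { d: 2, c: 1 }
```

With Yjs you would presumably pass doc1.getMap('a') and doc2.getMap('a') as source and target, v => v instanceof y.Map as isMap, and () => new y.Map() as makeMap. The helper sets a new nested map on the target first and reads it back before recursing, so it never reads from a map that has not been added to a document yet.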

Thank you very much for your response, it greatly helps.

Is it correct to think that a new y.Doc() can receive any changes, but as soon as you modify it, you make it incompatible with any other documents?

Would it be hard to explain what clientId has to do with it? I can see how a clock (if interpreted as some kind of step) could influence it. clientId, however, sounds like something that would be different on each client.

Yes

Each update is uniquely identified by a clientId and clock. clientIds have to be unique among all connected clients. They can change between sessions though. They are used to provide arbitrary but deterministic tie-breakers when there are conflicts.
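To show the tie-breaker idea in isolation, here is a toy sketch. It is not Yjs's actual algorithm: it assumes each concurrent write to a key is tagged with its author's clientId and that the higher clientId wins, which is an arbitrary choice made for this example. The names pickWinner and resolve, and the clientId values, are made up.

```javascript
// Toy model, not Yjs internals: each concurrent write carries its author's clientId.
function pickWinner (a, b) {
  // Arbitrary but deterministic rule (an assumption of this sketch): higher clientId wins.
  return a.clientId > b.clientId ? a : b
}

// Fold all concurrent writes to one key into the single surviving write.
function resolve (writes) {
  return writes.reduce(pickWinner)
}

const diskWrite = { clientId: 17, value: { c: 1 } }
const localWrite = { clientId: 42, value: { d: 2 } }

// Every peer picks the same winner, regardless of the order the writes arrive in.
console.log(resolve([diskWrite, localWrite]).value) // { d: 2 }
console.log(resolve([localWrite, diskWrite]).value) // { d: 2 }
```

In this toy model only one of the two values written to the same key survives, which resembles the "Read 2" output in the question. Which write wins depends only on the clientIds, so every peer arrives at the same result.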

I’m afraid I don’t know in greater detail how the clientId affects the specific situation you describe though.

Thank you again for the reply. Your answer about the clientId is sufficient, and using it as a tiebreaker makes a lot of sense.

I think I have a good enough understanding now to proceed. Thanks again.
