Initial offline value of a shared document

I had to solve this problem and came to this thread looking for advice, so I might as well share my approach.

So when I fetch the document from the DB, if it doesn’t exist I create a new Y.Doc and then immediately run doc.createDefaultDoc(), which is just:

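  createDefaultDoc() {
    // Create an empty document, filled in as needed to satisfy the schema.
    const node = schema.nodes.doc.createAndFill()
    // Convert it to a Y.Doc and merge its state into this.yDoc.
    applyUpdate(this.yDoc, encodeStateAsUpdate(prosemirrorToYDoc(node, 'pm-doc')))
  }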

Then I immediately write that update to the DB. I think that’s the simplest and safest approach. If you allow new documents to be created client-side, it becomes hard to tell who has initialized the doc. I’d advise against that, or at least make sure no one else can create a doc with the same id.
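
The create-if-missing step described above could be sketched roughly like this. It is only a minimal sketch under stated assumptions, not the actual server code: the Map stands in for the database, loadOrCreate and makeDefault are hypothetical names, and the bytes are placeholders rather than a real Yjs update.

```typescript
// Hypothetical in-memory stand-in for the database: docId -> encoded update.
type UpdateStore = Map<string, Uint8Array>

/**
 * Returns the stored state for docId. If none exists, builds the default
 * state, persists it, and returns it, so makeDefault runs at most once per id.
 */
function loadOrCreate(
  store: UpdateStore,
  docId: string,
  makeDefault: () => Uint8Array
): Uint8Array {
  const existing = store.get(docId)
  if (existing !== undefined) return existing
  const initial = makeDefault()
  store.set(docId, initial) // persist before handing the state to any client
  return initial
}

const docStore: UpdateStore = new Map()
let initCount = 0
const makeDefault = (): Uint8Array => {
  initCount += 1
  return new Uint8Array([1, 2, 3]) // placeholder bytes, not a real Yjs update
}

const firstLoad = loadOrCreate(docStore, 'doc-1', makeDefault)
const secondLoad = loadOrCreate(docStore, 'doc-1', makeDefault)
console.log(initCount, firstLoad === secondLoad)
```

With a real database the check and the write would presumably need to be atomic (for example a unique constraint on the document id), so that two concurrent requests cannot both initialize the same doc.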

Is there a way to do this with y-webrtc? Is it something that I should set up with peerOpts, or should I use y-awareness?

I think the missing piece here is that Ydoc updates are idempotent.

Once an initial state template “update” has been encoded like so:

const template = '8ab…'
const myDoc = new Y.Doc()
Y.applyUpdate(myDoc, fromBase64(template))
// Then bind to provider and to editor

Then it doesn’t matter that this happens on every local offline client. When they all come online and sync, the ydoc history “knows” that the initial template has already been applied (server/dev clientID plus Lamport clock 1) and does not “reapply” that edit. It’s not just an “add this initial content” record (which would not be idempotent).

Does that make sense @dmonad ?

I think the missing piece here is that Ydoc updates are idempotent.

@joshuafontany

As far as I know, ydoc is not idempotent.
Each time you apply the update, it adds an update under your client id (which is a new random number on every page refresh).

So you cannot give it a template as the initial data. I can guarantee that it will often overwrite the server data.

GitHub - yjs/yjs: Shared data types for building collaborative software
Document Updates
Changes on the shared document are encoded into document updates. Document updates are commutative and idempotent. This means that they can be applied in any order and multiple times.

That is not my understanding.

IF the local client applies the template as an update to the document, then yes, a random client ID will be generated on each page refresh/load. Separate client IDs will cause the data to “duplicate”.

IF, though, you encode a document update into a base64 string, and then hardcode (or fetch) that string (with a “reliable”, already-encoded client ID) and apply it as the first update every time the ydoc is loaded into memory, then the idempotent nature of updates should prevent data duplication.

@joshuafontany Yes, that is correct. If the clientID and the clock are the same, the update is idempotent.
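
A toy model may help show why a matching clientID and clock make the update idempotent. This is a deliberately simplified sketch and not how Yjs is implemented internally: ToyDoc and Item are hypothetical names, each item is identified by a (client, clock) pair, and updates are assumed to arrive in clock order for each client.

```typescript
// Simplified model: every inserted item is identified by (client, clock).
interface Item {
  client: number
  clock: number
  content: string
}

class ToyDoc {
  // Next expected clock per client, similar in spirit to a state vector.
  private nextClock = new Map<number, number>()
  readonly items: Item[] = []

  applyUpdate(update: Item[]): void {
    for (const item of update) {
      const expected = this.nextClock.get(item.client) ?? 0
      if (item.clock < expected) continue // already integrated, so skip it
      this.items.push(item)
      this.nextClock.set(item.client, item.clock + 1)
    }
  }

  text(): string {
    return this.items.map((item) => item.content).join('')
  }
}

// The same pre-encoded template (fixed client 0, clock 0) applied twice.
const templateUpdate: Item[] = [{ client: 0, clock: 0, content: 'Hello' }]
const sharedDoc = new ToyDoc()
sharedDoc.applyUpdate(templateUpdate)
sharedDoc.applyUpdate(templateUpdate) // same client and clock: no effect
const withFixedId = sharedDoc.text()

// The same content generated under two different random client IDs.
const naiveDoc = new ToyDoc()
naiveDoc.applyUpdate([{ client: 111, clock: 0, content: 'Hello' }])
naiveDoc.applyUpdate([{ client: 222, clock: 0, content: 'Hello' }])
const withRandomIds = naiveDoc.text()

console.log(withFixedId) // Hello
console.log(withRandomIds) // HelloHello
```

In this model the duplicate is dropped only because both copies carry the same identifier, which mirrors the distinction between applying one pre-encoded template and regenerating the content on each client.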

This question was never adequately answered, and I believe the answer is yes, it is basically the same.

Given how common this question is, I’m going to provide a reusable solution based on the dreaded clientID manipulation. I really don’t think users have been given a good alternative yet. As long as the Doc is synced with the same initial content on every client, I see no issue at all. Please correct me if I’m wrong. (And note that this is WAY better than waiting for the provider to sync. That’s not offline-first!)

Remember: Never sync docs with different initial content. i.e. Always construct the Doc with the same initial content.

Here we go. I just extended Y.Doc with a constructor param to provide initial content:

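class TemplateDoc extends Y.Doc {
  constructor(options) {
    super(options)
    if (options?.init) {
      // Temporarily use a fixed clientID so that every client generates
      // identical (clientID, clock) pairs for the initial content.
      const clientID = this.clientID
      this.clientID = 0
      this.transact(() => options.init?.(this))
      // Restore the random clientID for all subsequent edits.
      this.clientID = clientID
    }
  }
}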

And here it is in action, with the naive approach that duplicates content as a comparison:

Demo: View in CodeSandbox

import * as Y from 'yjs'

/** Syncs two Docs. */
const sync = (doc1, doc2) => {
  const state1 = Y.encodeStateAsUpdate(doc1)
  const state2 = Y.encodeStateAsUpdate(doc2)
  Y.applyUpdate(doc1, state2)
  Y.applyUpdate(doc2, state1)
}

const initialContent = 'This is your new document.'

// Naive approach - duplicate initial content

const doc1 = new Y.Doc()
const doc2 = new Y.Doc()

doc1.getText().insert(0, initialContent)
doc1.getText().insert(initialContent.length, ' Uh oh.')

doc2.getText().insert(0, initialContent)
doc2.getText().insert(initialContent.length, ' Oh no!')

sync(doc1, doc2)

console.log('doc1', doc1.getText().toString())
console.log('doc2', doc2.getText().toString())

// TemplateDoc - insert idempotent initial content

const init = (doc: Y.Doc) => {
  doc.getText().insert(0, initialContent)
}
const docWithTemplate1 = new TemplateDoc({ init })
const docWithTemplate2 = new TemplateDoc({ init })

docWithTemplate1.getText().insert(initialContent.length, ' Fantastic.')
docWithTemplate2.getText().insert(initialContent.length, ' Wonderful.')

sync(docWithTemplate1, docWithTemplate2)

console.log('docWithTemplate1', docWithTemplate1.getText().toString())
console.log('docWithTemplate2', docWithTemplate2.getText().toString())

Output:

doc1 This is your new document. Oh no!This is your new document. Uh oh.
doc2 This is your new document. Oh no!This is your new document. Uh oh.
docWithTemplate1 This is your new document. Wonderful. Fantastic.
docWithTemplate2 This is your new document. Wonderful. Fantastic.

Since this clientID is only manipulated when the Doc is created, there is no risk of breakage.

Just make sure to never sync two docs with different initial content.

Also, once a client uses the initial content, you can never change the value in your code. Think of the initial content of a Doc as part of its schema. You would need a migration strategy to change it (just like if you wanted to change the shared types for code that’s already in production).

Good to have that spelled out in code. In my use-case, I will render the initial doc state as an Update string, & store that string in the html page that starts up the app (actually store it in Fission Drive, but stashing it in the page header at first will work).

Don’t think “placeholder data with a template”; think of it as “first user creates a new document with some template operations”. These are two different things.

Great!! :clap: I think it fits at least my use case. I will definitely try this approach :crossed_fingers:

Thank you very much @raine

Just to make sure I understand when this corruption might occur: is it when a Y.Doc (or any type?) is created with the same properties/id but with a different init value on 2 different clients? Wouldn’t simply using a UUID solve this?

This strategy tricks Yjs into thinking that the initial update originates from a single client, thus avoiding duplication. It doesn’t matter that the update was generated on multiple clients, because it is byte-for-byte identical on each client.

Using a different clientID on each client (the default behavior) carries no risk of corruption, but each insertion would be considered a separate update, so it is not suitable for setting an initial value shared by all clients.
