Then I immediately write that update to DB. Think that’s the simplest and safest approach. If you allow creation of new documents from client-side, then it becomes a problem to see who has initialized the doc. I’d advise not to do that. Or at least make sure no one else can create the same doc with the same id.
I think the missing piece here is that Ydoc updates are idempotent.
Once an initial state template “update” has been encoded like so:
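import * as Y from 'yjs'
import { fromBase64 } from 'lib0/buffer'

const template = '8ab…' // base64-encoded initial update (truncated here)
const myDoc = new Y.Doc()
Y.applyUpdate(myDoc, fromBase64(template))
// Then bind to provider and to editor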
Then it doesn’t matter that this happens on every local offline client. When they all come online and sync, the ydoc history “knows” that the initial template has been applied (server/dev ClientID plus Lamport clock 1) and does not “reapply” that edit. It’s not just an “add this initial content” record (which would not be idempotent).
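To make the "(ClientID plus clock)" point concrete, here is a toy model in plain JavaScript. It is not Yjs's implementation and uses none of its API: ToyDoc, its fields, and the op shape are all made up for illustration, and it assumes ops from one client arrive in clock order. It only shows why an update whose ops are already covered by the known state is skipped, while the same text under a different client ID is treated as new.

```javascript
// Toy model only: NOT Yjs internals. Names and op shape are hypothetical.
class ToyDoc {
  constructor () {
    this.state = new Map() // clientID -> next clock expected from that client
    this.content = []
  }

  applyUpdate (ops) {
    for (const { client, clock, text } of ops) {
      const next = this.state.get(client) || 0
      if (clock < next) continue // already integrated, so skip it
      this.content.push(text)
      this.state.set(client, clock + 1)
    }
  }

  toString () {
    return this.content.join('')
  }
}

// The "template" is a fixed list of ops with a fixed client ID.
const template = [
  { client: 1, clock: 0, text: 'This is your ' },
  { client: 1, clock: 1, text: 'new document.' }
]

const doc = new ToyDoc()
doc.applyUpdate(template)
doc.applyUpdate(template) // same client + clocks: nothing is added
const appliedTwice = doc.toString()

// Same text, but issued under another client ID: treated as new content.
const sameTextOtherClient = template.map(op => ({ ...op, client: 2 }))
doc.applyUpdate(sameTextOtherClient)
const duplicated = doc.toString()

console.log(appliedTwice)
console.log(duplicated)
```

The second applyUpdate call changes nothing because both ops fall below the recorded clock for client 1; the third call duplicates the text because client 2 has no recorded state yet.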
As far as I know, ydoc is not idempotent.
Each time you apply the update, it will add an update with your client id (which is a random number on every page refresh).
So, you cannot give it a template as the initial data. I can guarantee that it will often overwrite the server data.
IF the local client inserts the template content into the document itself, then yes - a random client ID will be generated on each page refresh/load. Separate client IDs will cause the data to “duplicate”.
IF, tho, you encode a document update into a base64 string, and then hardcode (or fetch) that string (with a “reliable” already encoded client ID) and apply it as the first update every time the ydoc is initiated into memory, then the idempotent nature of updates should prevent data duplication.
@joshuafontany Yes, that is correct. If the clientID and the clock are the same, the update is idempotent.
This question was never adequately answered, and I believe the answer is yes, it is basically the same.
Given how common this question is, I’m going to provide a reusable solution based on the dreaded clientID manipulation. I really don’t think users have been given a good alternative yet. As long as the Doc is synced with the same initial content on every client, I see no issue at all. Please correct me if I’m wrong. (And note that this is WAY better than waiting for the provider to sync. That’s not offline-first!)
Remember: Never sync docs with different initial content. i.e. Always construct the Doc with the same initial content.
Here we go. I just extended Y.Doc with a constructor param to provide initial content:
doc1 This is your new document. Oh no!This is your new document. Uh oh.
doc2 This is your new document. Oh no!This is your new document. Uh oh.
docWithTemplate1 This is your new document. Wonderful. Fantastic.
docWithTemplate2 This is your new document. Wonderful. Fantastic.
Since this clientID is only manipulated when the Doc is created, there is no risk of breakage.
Just make sure to never sync two docs with different initial content.
Also, once a client uses the initial content, you can never change the value in your code. Think of the initial content of a Doc as part of its schema. You would need a migration strategy to change it (just like if you wanted to change the shared types for code that’s already in production).
Good to have that spelled out in code. In my use-case, I will render the initial doc state as an Update string, & store that string in the html page that starts up the app (actually store it in Fission Drive, but stashing it in the page header at first will work).
Don’t think “placeholder data with a template”; think of it as “first user creates a new document with some template operations”. These are two different things.
Just to make sure I understand when this corruption might occur: if a Y.Doc (or any type?) is generated with the same properties/id but with a different init value on 2 different clients. Wouldn’t simply using a UUID solve this?
This strategy tricks Yjs into thinking that the initial update originates from a single client, thus avoiding duplication. It doesn’t matter that the update originated on multiple clients, because it is byte-for-byte identical for each client.
Using a different clientID on each client (the default behavior) ensures no risk of corruption, but each insertion would be considered a separate update, and thus not suitable for setting an initial value among all clients.