Initial offline value of a shared document

aliak00 · April 12, 2021, 12:26pm

Hey, I’m hitting a slight snag: two (or more) people can enter a document while offline. The document has an initial value that can be whatever a document template defines it to be (a real example: the slatejs editor framework requires an empty first text node to be present on initialization).

Now two people go to the same document in offline mode, so both their documents are initialized with the initial document value.

When one goes online, all good. When the other goes online, this initial document is duplicated.

So I’m wondering if there’s a way to tell yjs that “this value here is the initial value and if you see it again then ignore it and sync whatever is after”

My test app is using slate-yjs and y-websocket if that makes a difference to the approach.

I was directed here by slate-yjs btw so there may be more details in the issue I created there (https://github.com/BitPhinix/slate-yjs/issues/192)

Thanks for any help/pointers!

dmonad · April 13, 2021, 11:22am

Hi @aliak00,

I’m happy to help, but can you share a bit more insight about the application you are building? Specifically, why do you need to initialize two documents with the same value?

I assume you have something like a list of notes. As a user, I can either create a note (which involves initializing the note with some content, empty paragraph, …) or I can open an existing document (loading the Yjs document from the server or something like indexeddb).

Cheers,
Kevin

aliak00 · April 13, 2021, 7:33pm

Hi @dmonad!

Thanks for helping! The application is a collaborative editor that can possibly start in offline mode. And two people can start in offline mode in the same globally-unique document.

SlateJS (an editor framework) requires the empty document be defined as:

{ children: [ { text: '' } ] } <-- that’s a node

Our text editor displays each node as a list item, so the above node (the empty document) is displayed as a single list item.

When you press enter, you get a new list item.

So when two people create their empty document, and neither of them have entered any keys, but both of them try to go online then the yjs shared type gets both “empty nodes” in and the editor data looks like:

[ 
  { children: [ { text: '' } ] },
  { children: [ { text: '' } ] }
]

Which means each person’s empty document gets updated to show two empty list items.

And I guess if we have X people go online from an empty document state then there’ll be X list items. Even though none of them have entered any data.

Of course this is quite an edge case but I was wondering if there’s a way to handle it?

Hope that made sense. Let me know if anything is unclear.

Cheers,

Ali

dmonad · April 13, 2021, 9:07pm

My recommendation is to wait for the initial content to be synced before rendering the editor content. If you are using y-websocket, you could do:

provider.on('synced', () => {
  // you received the initial content (e.g. the empty paragraph) from the other peers
})
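
A slightly fuller sketch of that idea, in case it helps. It assumes the provider exposes on/off for the 'synced' event (called with a boolean) and a boolean synced property, which is my understanding of y-websocket's WebsocketProvider. renderWhenSynced and makeFakeProvider are made-up names, and the fake provider only exists so the snippet runs without a server.

```javascript
// Call `render` exactly once, as soon as the provider reports the initial sync.
// Assumption: `provider` has on/off for a 'synced' event and a boolean `synced` property.
function renderWhenSynced (provider, render) {
  if (provider.synced) {
    render()
    return
  }
  const onSynced = isSynced => {
    if (!isSynced) return
    provider.off('synced', onSynced)
    render()
  }
  provider.on('synced', onSynced)
}

// Stand-in provider so the sketch runs without a network (hypothetical, not part of Yjs).
function makeFakeProvider () {
  const handlers = new Set()
  return {
    synced: false,
    on (name, f) { if (name === 'synced') handlers.add(f) },
    off (name, f) { if (name === 'synced') handlers.delete(f) },
    emitSynced () {
      this.synced = true
      for (const f of Array.from(handlers)) f(true)
    }
  }
}

const provider = makeFakeProvider()
let renderCount = 0
renderWhenSynced(provider, () => { renderCount++ })
provider.emitSynced()
provider.emitSynced() // a second event does not render again
console.log(renderCount) // 1
```

This only gates rendering for clients that actually connect; a client that stays offline never gets the callback.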

Does that work for you?

YousefED · April 14, 2021, 1:01pm

I’m running into similar issues whilst getting started with yJS. My scenario is the following:

I’m building a notebook-style programming environment. New users can create a new notebook by specifying a title. As soon as the notebook is created, it should be initialized with an empty “cell” (currently stored as Xml).

Users can do the following simultaneously, whilst offline:

Create notebook “Cats”
We add an empty “cell” to the XmlFragment, so the content of the notebook YDoc will be <cell />

None of the users edit anything, they just open the blank document. Now, when both users go online, the content would become <cell /><cell />, correct?

I think this is exactly the same issue as mentioned above, but hope this extra context helps. Listening to provider “synced” events is not ideal:

The user can be offline, or the provider can be “down”. In this case the synced event will never fire. How do we distinguish between yet-to-be-synced and “offline”?
Waiting for “synced” is only a workaround for users that are connected and can afford to wait for a synced document. For disconnected users it won’t work right?
I prefer my data model to be detached as much as possible from the providers, so this would introduce an extra layer of logic

Hope the scenario explanation helps, perhaps you’re aware of an easier fix

dmonad · April 15, 2021, 6:36pm

When I create a new notebook “cats”, I can initialize the content with, for example, an empty paragraph or with some kind of template and sync that document to other users. Then I can render it.

When I open a document, I need to have it already (e.g. locally stored in y-indexeddb) or download it from another client / server. Then I can render it.

What you are trying to do is something really complicated (I’m not sure you are aware of the trouble you are in). You want to be able to open a document and see some “template” content before you download it from another client. E.g., when you open a notebook “cats”, you want to be able to render the template, that doesn’t show the latest (or any!) state of the document.

The user-experience is rather questionable. I open a document and see some initial content. Then, magically, the content might appear once you have a connection.

Now that this is out of the way: you can do what you are describing.

// THIS CODE IS DISCOURAGED AND WILL LIKELY BREAK YOUR Yjs DOCUMENTS
// temporarily change your client id:
const myID = ydoc.clientID
ydoc.clientID = 0
// insert the initial content
ydoc.getXmlFragment().insert(0, [new Y.XmlElement('p')])
ydoc.clientID = myID

This was a common thing to do in Yjs v11 (a few years back). It allows you to initialize the state with some initial content (e.g. an empty paragraph). However, once a client initializes state slightly differently, you will break all documents. I can’t stress how dangerous this code is. I believe there is no good reason to use that “initialization pattern”, which proved to be far too dangerous in practice. For that reason, I won’t help you when you break your documents (you might receive error messages, or simply have divergent state). I encourage you to use the idiomatic approach that I explained at the beginning.

YousefED · April 19, 2021, 8:52am

Thanks for the explanation, and for the clear warning.

I think what would be helpful is a design pattern for initiating documents (with user addressable identifiers) in offline-first distributed applications. It looks like it’s a complex problem to be able to create new documents offline, though it might be a common scenario. Probably the ideal scenario would:

1. When creating a document, check with a server or peers whether the ID is available; if not, automatically load or prompt to load the existing document.
2. If step 1 isn’t possible (user is offline), go ahead and create the document locally, but some sort of conflict resolution must happen (e.g.: prompt the user? change ID?) when the document is synced with the server / peers and a document with the same ID has been created already.

For now, my quick fix to work around it is by prompting the user to initialize a new document, that solves most issues. I’ll stay away from your hack

dmonad · April 19, 2021, 9:09am

Just to be clear: It is very easy to create a document offline. You create a document, you manipulate it (i.e. set the initial content), then you send it to the server once you are connected. The thing that I’m discouraging is to have some kind of initialization step that all clients must perform before receiving the actual state from another client/server.

aliak00 · April 19, 2021, 9:56am

@dmonad Thanks for the responses. Yeah it seems a bit tricky. Ideally I’d like this to be doable:

User1: opens app offline (since we’re offline-first) and starts doc “a”
User2: opens app offline (since we’re offline-first) and starts doc “a”

And then when the users do go online, again ideally, this is what’d happen:

If User1 gets connected with empty doc and User2 gets connected with empty doc, the doc stays empty.
If User1 adds content and then both go online then that content is seen and vice versa

The problem is that slate requires initialization data. Which is indeed similar to initializing a document with a template.

The above is I think an edge case though, so @YousefED’s workaround would work (though not be ideal).

If there was a way to compare and then sync that’d be very cool. E.g.

provider.on('synced', () => {
  // If content_to_sync === current_content === initial_content  then set content to initial_content only
})

dmonad · April 19, 2021, 10:23am

I think there is a misunderstanding here. How is it possible to create the same document offline twice? Either user1 creates the document or user2. Not both.

When you open a document, you should already have it. You could either store it in y-indexeddb or retrieve it from a server. You can’t open a document that you don’t have.

In which case does your scenario happen? And why specifically is a “workaround” needed?

aliak00 · April 19, 2021, 11:30am

Could be a misunderstanding indeed!

Maybe you can imagine a workspace with a single button that says “start doc”. Now say both users are offline and go into this workspace and click on start doc. The first thing they see is an empty doc, but it’s the same single doc in the same workspace. So both have created the same empty doc now. Now both come online with an empty (but initialised) doc. In the case of slatejs, which requires initialisation data, yjs thinks they are separate data entered independently by each user, so it joins the data together.

But again, this is an edge case.

Another case would be two users going into a folder and each creating an empty document with the same name. In that case the same thing happens. The workaround would be that when the second user goes online with an empty doc, the system asks the second user to create a copy since someone already created this doc (even though both are empty and have no content).

Maybe this can be solved from the slate-yjs plugin though, since slate-yjs can know what an empty slate doc is?

YousefED · April 19, 2021, 4:08pm

To summarize and confirm @aliak00’s answer, I think the difference from scenarios you might be used to is that we’re working with documents where the user defines the ID. Imagine building a website together and we both start working on “/page.html” while working offline.

dmonad · May 1, 2021, 11:57am

In another discussion (“Merging changes from one document into another”), I described a “templating engine” that might be relevant for your use-case.

The idea would be to always start with a “template” that, for example, contains a headline and an empty paragraph. You could create a template and store it as a base64 in your source-code. E.g.

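import * as Y from 'yjs'

// Run this once to generate the template, then copy the resulting string into your source code
const ydoc = new Y.Doc()
ydoc.getXmlFragment().insert(0, [new Y.XmlElement('p')])
const template = bufferToBase64(Y.encodeStateAsUpdate(ydoc))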

Every time you execute the above code, you will end up with operations from a different clientID. Execute the above code only once and then copy the content from template to your javascript code.

const template = "8ab.."
const myDoc = new Y.Doc()
Y.applyUpdate(myDoc, fromBase64(template))
// Then bind to provider and to editor
..

Now when you open a document, you can always apply the template first. The empty paragraph is already contained in the template.
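
bufferToBase64 and fromBase64 are left undefined in the snippets above. A minimal sketch of what they could look like, assuming an environment with the global btoa/atob (browsers, recent Node.js). The sample bytes are only a stand-in for a real update produced by Y.encodeStateAsUpdate:

```javascript
// Hypothetical helpers; the names simply match the ones used in the snippets above.
function bufferToBase64 (bytes) {
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}

function fromBase64 (base64) {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes
}

// Round trip with stand-in bytes (not a real Yjs update)
const sample = new Uint8Array([0, 1, 2, 250, 255])
const encoded = bufferToBase64(sample)
const decoded = fromBase64(encoded)
console.log(encoded) // AAEC+v8=
console.log(decoded.length === sample.length) // true
```

The important property is that the string pasted into the source code decodes to exactly the same bytes on every client.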

I described a similar solution in Initial offline value of a shared document. However, the template solution (using base64) is safe to use because you don’t manually set the clientID (which can be very dangerous).

Hope this provides an alternative solution to you.

YousefED · May 1, 2021, 12:44pm

Thanks for looking into this! This sounds powerful. I’ve worked around the issue for now, but if I revisit that part I’ll definitely give it a try and share my findings.

Just to make sure: would conflicts be resolved if the template is changed later on (and there’s an “initial template mismatch” across users)? I think it should be safe, but I’m not 100% comfortable yet with the internals of updates.

dmonad · May 1, 2021, 10:30pm

It would sync, but you need to extend the previous template. Otherwise, you would just duplicate the content. If anyone wants to try this out, I recommend playing a bit with the example that I provided here: Merging changes from one document into another
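
To make the "extend, don't regenerate" point concrete, here is a toy model in plain JavaScript. It is not how Yjs is implemented; it only assumes that every inserted item carries an id made of a client id and a clock, and that merging keeps one copy per id. makeDoc, merge and the client ids 1001, 2002 and 3003 are made up for illustration.

```javascript
// Toy model only: a "doc" is a Map from item id ("client:clock") to content.
// Real Yjs also tracks item order, deletions and types.
function makeDoc (clientID, contents) {
  const doc = new Map()
  contents.forEach((content, clock) => doc.set(`${clientID}:${clock}`, content))
  return doc
}

// Merging keeps a single copy per item id.
function merge (a, b) {
  return new Map([...a, ...b])
}

const contentOf = doc => Array.from(doc.values())

// v1 template: generated once, shipped to every client
const templateV1 = makeDoc(1001, ['<p/>'])
// v2 extends v1: the old item is kept, a new item is added on top of it
const templateV2 = merge(templateV1, makeDoc(3003, ['<h1/>']))
// v2 regenerated from scratch: every item gets a fresh id
const regeneratedV2 = makeDoc(2002, ['<p/>', '<h1/>'])

console.log(contentOf(merge(templateV1, templateV1))) // one '<p/>'
console.log(contentOf(merge(templateV1, templateV2))) // '<p/>', '<h1/>'
console.log(contentOf(merge(templateV1, regeneratedV2))) // '<p/>', '<p/>', '<h1/>'
```

In the last case the client that still has v1 and the client with the regenerated template each contribute their own paragraph, which is the duplication described above.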

vojto · May 13, 2022, 2:45pm

dmonad wrote: “I described a similar solution in Initial offline value of a shared document. However, the template solution (using base64) is safe to use because you don’t manually set the clientID (which can be very dangerous).”

This is not that different from setting clientID manually, is it?

You’re just making sure that when both Alice and Bob make that update, there’s no way one could enter content that is different from another, and thus making sure it won’t break in a terrible way.

So would it make sense to somehow derive the clientID from template content? (Calculate the hash for example.) It would achieve the same guarantee.

Now imagine Alice and Bob are not in sync. Alice generates header Fri 13th and Bob generates 13th, Fri. Once connected, they both send update to server, but the clientID differs, so we fall back to having duplicated content, which isn’t that bad.

Do you think we could revive “Support passing clientID to doc transformation utils”?

I’m thinking the ideal solution here would be some kind of named range. I would add something like named(0-14, "header") and it would somehow make sure that piece is never duplicated, but last write wins.

dmonad · May 14, 2022, 11:43am

Please take my warning seriously:

Never manipulate clientIDs, unless you really know what you are doing (you understand the YATA CRDT + you are familiar with its implementation)

There will never be utility functions for this because it is very dangerous. Really, don’t take this lightly. There are a lot of things going on that you don’t understand.

I already regret mentioning this here. Working with the templated Yjs update is fine. Manipulating the clientID is not.

dmonad · May 14, 2022, 11:51am

Now, you might be fairly frustrated because what you want to do seems so easily possible.

The problem is that Fri 13th and 13th, Fri have a different number of characters. Hence the merged document will be Fri 13thi for one peer and 13th, Fri for the other. The problem is much worse if you generate complex Y.XML documents like this, because the CRDT items will have different types on different peers.

The problem, as a maintainer of Yjs, is that I will get people reporting all kinds of weird bug reports. I can’t help them, because they manipulated the clientID and violated the integrity of the document. The document will be broken forever.

Just don’t do it.

In all my years of experience in this field (almost 10 years, oh god…), there is only one acceptable usage: For replaying test cases.

vojto · May 15, 2022, 5:46am

Thanks a lot. Yeah so it would be ideal to avoid manipulating clientID. However, when you create a template like you did - isn’t that exactly the same as having a constant update and then manipulating its client ID? With template you’re making two guarantees:

Client ID won’t change
The template content won’t change (so it will be exactly the same set of items)

Is template the only idea? It won’t work for us, because I need a bunch of templates for each day "Fri, 13th". So I was really trying to come up with a dynamic template, while making the same guarantees - clientID stays the same, and the update body stays the same.

What other options do I have to resolve the problem originally asked in this thread: initializing content on two offline devices and then making sure I don’t end up with duplicates?

vojto · May 15, 2022, 12:59pm

Little background:

We’re creating daily notes for each day, their ID will be derived from the current date - such as 13052022.
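
Roughly like this (a simplified sketch, not our actual code; dailyNoteId is a made-up name and I'm assuming the DDMMYYYY layout from the example ID above):

```javascript
// Hypothetical helper: derive a daily-note ID in DDMMYYYY form from a local date.
function dailyNoteId (date) {
  const dd = String(date.getDate()).padStart(2, '0')
  const mm = String(date.getMonth() + 1).padStart(2, '0') // getMonth() is zero-based
  const yyyy = String(date.getFullYear())
  return dd + mm + yyyy
}

// Two offline devices compute the same ID for the same calendar day
const onLaptop = dailyNoteId(new Date(2022, 4, 13, 9, 0))
const onPhone = dailyNoteId(new Date(2022, 4, 13, 22, 30))
console.log(onLaptop, onPhone) // 13052022 13052022
```

Since the ID only depends on the date, every device ends up addressing the same document without ever talking to the others.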

A new daily note will have some initial content: a header with a nicely formatted date, such as <h1>Fri May 13th, 2022</h1>.

If you have two devices offline, each one of them might create the daily note and initialize the content with header.

Now if both these devices initialize with a random clientID, you’ll end up with the content inserted twice when merging.

So what would be the solution to a situation like this?