Common Concepts & Best Practices

MentalGear · February 15, 2024, 5:22pm

Intro

Venturing into the realm of Y.js, I’m impressed by its capabilities, yet I’m struggling to understand some concepts and how to handle potential conflict states. I’d really appreciate guidance and best practices.

We are planning to build a PoC Notes App with Y.js, and in this context, an array of technical questions has surfaced.

1. Structure

a. Should every note be its own Y.Doc?
I.e. n(Y.Doc) = n(notes)

b. Or should there be only one central Y.Doc that has a Y.Array of Y.Maps, each containing one note?

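import * as Y from 'yjs';

const root = new Y.Doc();
// Top-level shared array; each entry is a Y.Map holding one note.
const notesList = root.getArray<Y.Map<unknown>>('notesList');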

1.2 Possible Conflict States

Let’s say there’s a Group A that has access to all notes, while Group B should only have access to a subset, notesListB.

Y.Doc : {
	notesListOnlyA: [],
	notesListB: [],
}

a. What would be the state of the Y.Doc that Group B receives? Will notesListOnlyA be empty for them, or not even exist?
b. If a member of Group B syncs their state back to the server, would Y.js assume, since there are no items in notesListOnlyA (as Group B was never allowed access), that the member of Group B deleted all of notesListOnlyA, and try to propagate this state?

2. Permission/Access Management

As far as I understand, permissions should be checked on the server side before a client’s update is committed to the persistent server-side database provider.

What will happen in a scenario like the one above, where the client tries to update properties on the server that it has no permission to change?

Or another example that might make it more clear:

Member C adds a note Test. Member D opens it.
Member D disconnects
Member C sets the note Test to private (not accessible by D)
Member D makes edits on note Test, and various other edits to other notes
Member D reconnects to the server

What will happen here?

Will note Test disappear for Member D, including the content they have added to note Test?
When Member D reconnects: will the server (partykit) send an “access denied” for the note Test property?
- Can the rest of Member D’s data (other notes) still be synced, or will the whole dataset (everything the user did while offline) be invalid?
- Will this result in a feedback loop? E.g. will the client keep trying to push the update again and again as the server always denies it?

3. Data Migration

Y.js is schema-less. However, if a client comes back online after a long time, and the dataset that they try to insert is very different from the current structure, there needs to be some migration strategy in place, right?
What’s the best practice for handling this? Should a version number be added to each Y.Doc?

4. General Conflict Resolution Failure Scenarios

Is there a list of common situations where convergence fails, or where the converged result is not correct, and which we therefore need to handle with custom code?

raine · February 15, 2024, 10:52pm

I’ll offer some brief thoughts.

The advantages of a single Y.Doc are:

- storage efficiency: One Y.Doc has less overhead and maximizes use of the binary compression used to store updates.
- low complexity: One Y.Doc is trivial to load and sync.
- atomicity: Multiple notes can be edited in a transaction (not sure if that is a requirement you have).
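
As a rough illustration of the atomicity point, the sketch below edits two notes inside one `doc.transact` call and counts the update events that result. The `notesList` name comes from the question; the `title` field and the counter are assumptions made up for the example.

```typescript
import * as Y from 'yjs'

// One shared document holding every note (layout b from the question).
const doc = new Y.Doc()
const notesList = doc.getArray<Y.Map<string>>('notesList')

// Two notes; the 'title' field is a made-up example property.
const noteA = new Y.Map<string>()
const noteB = new Y.Map<string>()
notesList.push([noteA, noteB])

// Count the update events emitted from here on.
let updateCount = 0
doc.on('update', () => { updateCount++ })

// Both edits belong to one transaction, so they are bundled into one update.
doc.transact(() => {
  noteA.set('title', 'Groceries')
  noteB.set('title', 'Meeting notes')
})

console.log(updateCount) // 1
```

With one Y.Doc per note, the same two edits would live in two separate documents and could not be grouped this way.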

The disadvantages of a single Y.Doc are mainly:

The entire Y.Doc must be synced and loaded into memory. There is no way to partially load a Y.Doc, and it grows in size over time. This can affect load time and memory usage. I’ve crashed the browser with ~10k items after they accumulate some history.

There is no way to grant partial access to a Y.Doc. It’s all or nothing.
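
On question 1.2b specifically: in Yjs a deletion is an explicit operation, so a peer that simply lacks some data is not treated as having deleted it. The sketch below assumes a hypothetical Group B client that never received `notesListOnlyA` (how that would be achieved is outside Yjs and not shown) and merges its state into a server copy. The list names come from the question; the note contents are invented.

```typescript
import * as Y from 'yjs'

// Server copy with both lists populated.
const server = new Y.Doc()
server.getArray<string>('notesListOnlyA').push(['secret note'])
server.getArray<string>('notesListB').push(['shared note'])

// Hypothetical Group B client that never received notesListOnlyA.
const clientB = new Y.Doc()
clientB.getArray<string>('notesListB').push(['note written by B'])

// Client B syncs its full state back to the server.
Y.applyUpdate(server, Y.encodeStateAsUpdate(clientB))

// The missing list is not interpreted as a deletion.
const onlyA = server.getArray<string>('notesListOnlyA').toArray()
const listB = server.getArray<string>('notesListB').toArray()
console.log(onlyA.length, listB.length) // 1 2
```

Note that the standard sync protocol would send the whole document, including `notesListOnlyA`, to every connected client, which is why the lists need to live in separate Y.Docs if Group B must not see them.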

Unfortunately, Yjs doesn’t provide granular access control. You would have to build that out. The y-websocket server uses a simple “secret url” approach that allows per-Doc syncing and persistence to a given room name. You can send a token in the request and do additional auth yourself. You would need to build the logic for setting/changing permissions and for how the various scenarios are handled on the client.
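
One way such custom logic could look, sketched under the assumption of one Y.Doc per note and a plain in-memory permission table on the server. `noteDocs`, `writers`, `getNoteDoc` and `applyIfAllowed` are hypothetical names for this example and not part of Yjs or y-websocket; only `Y.applyUpdate` and `Y.encodeStateAsUpdate` are library calls.

```typescript
import * as Y from 'yjs'

// Hypothetical server-side state: one Y.Doc per note, plus a plain permission table.
const noteDocs = new Map<string, Y.Doc>()
const writers = new Map<string, Set<string>>() // noteId -> user ids allowed to write

function getNoteDoc(noteId: string): Y.Doc {
  let doc = noteDocs.get(noteId)
  if (!doc) {
    doc = new Y.Doc()
    noteDocs.set(noteId, doc)
  }
  return doc
}

// Apply an update only if the user may write to that note.
function applyIfAllowed(userId: string, noteId: string, update: Uint8Array): boolean {
  if (!writers.get(noteId)?.has(userId)) return false // rejected; other notes are unaffected
  Y.applyUpdate(getNoteDoc(noteId), update)
  return true
}

// Scenario from the question: D edited two notes offline and lost access to "test" meanwhile.
writers.set('test', new Set(['C']))
writers.set('other', new Set(['C', 'D']))

const offlineTest = new Y.Doc()
offlineTest.getText('body').insert(0, 'edit by D')
const offlineOther = new Y.Doc()
offlineOther.getText('body').insert(0, 'another edit by D')

const testAccepted = applyIfAllowed('D', 'test', Y.encodeStateAsUpdate(offlineTest))
const otherAccepted = applyIfAllowed('D', 'other', Y.encodeStateAsUpdate(offlineOther))
console.log(testAccepted, otherAccepted) // false true
```

Because each note is its own document, rejecting the update for one note leaves D’s other offline edits intact. What the client does with the rejected note (keep a local copy, stop retrying, show a message) is application logic that still has to be designed.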

A migration strategy is something else that Yjs lacks. I recommend searching the forums to find some hints on how others have done this. Again, it’ll be a custom job.
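
A minimal sketch of the version-number idea from the question, assuming a top-level `meta` map with a `schemaVersion` key and a made-up v1 to v2 step that adds an `archived` flag to every note. None of these names are Yjs conventions; they are placeholders for whatever the application defines.

```typescript
import * as Y from 'yjs'

const CURRENT_SCHEMA_VERSION = 2 // assumed version numbering

// Bring a document up to the current schema; returns the version it ends at.
function migrate(doc: Y.Doc): number {
  const meta = doc.getMap<number>('meta')
  const version = meta.get('schemaVersion') ?? 1 // docs without a marker count as v1
  if (version >= CURRENT_SCHEMA_VERSION) return version

  doc.transact(() => {
    // Hypothetical v1 -> v2 step: every note gains an 'archived' flag.
    doc.getArray<Y.Map<unknown>>('notesList').forEach((note) => {
      if (!note.has('archived')) note.set('archived', false)
    })
    meta.set('schemaVersion', CURRENT_SCHEMA_VERSION)
  })
  return CURRENT_SCHEMA_VERSION
}

// A v1 document with a single note and no version marker.
const oldDoc = new Y.Doc()
const legacyNote = new Y.Map<unknown>()
legacyNote.set('title', 'Legacy note')
oldDoc.getArray<Y.Map<unknown>>('notesList').push([legacyNote])

const migratedTo = migrate(oldDoc)
console.log(migratedTo, legacyNote.get('archived')) // 2 false
```

The step is written to be idempotent because several clients may run the same migration concurrently and their changes will be merged. Steps that create new nested shared types or move data around need more care for that reason.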

As for convergence failures: luckily, that is where Yjs and CRDTs in general really shine (they’re “conflict-free”). Convergence is rock solid, and you can add additional providers for persistence, horizontal scaling, redundancy, etc.
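
To make the convergence claim concrete, here is a small sketch in which two documents make conflicting edits while disconnected and then exchange their states. The `note` map, `title` key and `body` text are example names only.

```typescript
import * as Y from 'yjs'

const docA = new Y.Doc()
const docB = new Y.Doc()

// Concurrent, conflicting edits made while the two peers are disconnected.
docA.getMap<string>('note').set('title', 'Title from A')
docB.getMap<string>('note').set('title', 'Title from B')
docA.getText('body').insert(0, 'Hello from A. ')
docB.getText('body').insert(0, 'Hello from B. ')

// Exchange full states in both directions.
const updateA = Y.encodeStateAsUpdate(docA)
const updateB = Y.encodeStateAsUpdate(docB)
Y.applyUpdate(docA, updateB)
Y.applyUpdate(docB, updateA)

// Both peers now hold the same state.
const titleA = docA.getMap<string>('note').get('title')
const titleB = docB.getMap<string>('note').get('title')
const bodyA = docA.getText('body').toString()
const bodyB = docB.getText('body').toString()
console.log(titleA === titleB, bodyA === bodyB) // true true
```

Converging is not the same as producing the result every user intended: only one of the two titles survives, and both greetings end up in the body. If the application cares which one wins, that rule has to be expressed in the data model or in custom code.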

savaki · May 11, 2024, 6:19pm

This is a fantastic response. I did have a follow-up question about this block:

The entire Y.Doc must be synced and loaded into memory. There is no way to partially load a Y.Doc, and it grows in size over time. This can affect load time and memory usage. I’ve crashed the browser with ~10k items after they accumulate some history.

Is this true? Does the entire document always need to be loaded into memory, or is there a snapshot form besides enabling gc, e.g. a form that maintains the ids for each of the fields but does not need to keep deleted ids/content in memory? In the use case I’m considering, time travel would be a critical feature.

dmonad · May 14, 2024, 10:06pm

By default (gc enabled) the Yjs document will clean up a good chunk of history. Not everything though.
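
A quick way to see the effect of the `gc` option is to compare the encoded size of a document after inserting and deleting the same content with gc on and off. This is only a sketch: the `body` text name and the 10000-character payload are arbitrary, and the exact byte counts depend on the Yjs version.

```typescript
import * as Y from 'yjs'

// Insert a large string, delete it again, and return the encoded document size.
function sizeAfterInsertAndDelete(gc: boolean): number {
  const doc = new Y.Doc({ gc })
  const text = doc.getText('body')
  const content = 'x'.repeat(10000)
  text.insert(0, content)
  text.delete(0, content.length)
  return Y.encodeStateAsUpdate(doc).byteLength
}

const withGc = sizeAfterInsertAndDelete(true)
const withoutGc = sizeAfterInsertAndDelete(false)
console.log(withGc < withoutGc) // true
```

With gc enabled the deleted characters are dropped and only a small record of the deletion remains, which matches the “not everything” caveat: some metadata about deleted content is still kept.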

I think I made a good point about why we don’t need more garbage-collection, in this article: Are CRDTs suitable for shared editing?

This doesn’t apply to all kinds of applications. Also, there are some editing behaviors that Yjs is really good at optimizing automatically (e.g. text editing), but in other areas Yjs accumulates a lot of history (if you change many different properties for every single mouse event - that’s something Yjs can’t optimize well and should be avoided).

Often, you can get away with disabling garbage collection. Then, you can time-travel using Y.snapshot to any point in time, although that is not something that I’d recommend to everyone. Generally, I suggest that you store copies of different versions so you only distribute the data that is currently necessary. It doesn’t make sense to distribute all data that was ever generated whenever you load a shared text document. I recommend only loading/distributing what is currently necessary.
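
For reference, a minimal sketch of the time-travel approach mentioned above, using a document created with `gc: false`, `Y.snapshot` and `Y.createDocFromSnapshot`. The `body` text name and the draft contents are placeholders.

```typescript
import * as Y from 'yjs'

// Snapshots can only be restored from a document created with gc disabled.
const doc = new Y.Doc({ gc: false })
const text = doc.getText('body')

text.insert(0, 'First draft')
const snapshotV1 = Y.snapshot(doc) // remember this point in time

text.delete(0, text.length)
text.insert(0, 'Second draft')

// Rebuild a separate document showing the state at the snapshot.
const docV1 = Y.createDocFromSnapshot(doc, snapshotV1)
const v1 = docV1.getText('body').toString()
const latest = text.toString()
console.log(v1, '|', latest) // First draft | Second draft
```

A snapshot can be serialized with `Y.encodeSnapshot` for storage, but restoring it still requires the full gc-disabled document. Storing an independent copy per version (for example the output of `Y.encodeStateAsUpdate` at that moment) avoids shipping the entire history to every client, which is the trade-off described above.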