Separating the application model from the shared data structure

NickDarvey · March 27, 2022, 8:00am

In this article, @dmonad describes seeing value in separating the application model and shared data structure.

Yjs’ shared types are very powerful and allow you to make any kind of application collaborative. But shared models that define an application-specific API make it easier for developers to manipulate the data without understanding how the data is represented in the CRDT. This is particularly relevant because CRDT implementations are almost always schemaless (Cambria being the exception). A well-maintained model could ensure that the model is compatible with previous versions.

The canonical Yjs examples involve editor UIs. Some (all?) of those editor bindings take a Yjs type as a parameter.

What would it look like to separate the model from the shared data with editor bindings? Would application developers essentially be writing bindings to make their own model (like this) compatible with both Yjs and the editor UI?

dmonad · March 27, 2022, 9:49am

For a collaborative editor project, you essentially only need Y.Text (or Y.Xml). It doesn’t make sense to create a complex model around either of them.

However, if you are about to represent a large portion of your application model in Yjs, then it can make sense to invest some time in building a model as we did for the Jupyter Project.

A few reasons why building a shared application model on top of Yjs can make sense for your application:

- Yjs’ types are untyped and don’t have a schema.
  - Building a typed model around Yjs allows you to make some assumptions about the data that you are retrieving. It allows others to consume your data without duck-typing.
  - It ensures that no third-party vendor (e.g. a Jupyter extension) writes invalid data into a Yjs document.
  - Compatibility: Changing names is less of a concern. Changing a key name in Yjs results in a model that older clients no longer understand (e.g. changing “users” to “accounts” means replacing ymap.get('users') with ymap.get('accounts')). The abstract model can keep using the old name internally but expose a different name externally. This gives you more control over compatibility.
- You are working in a team and not everyone is familiar with Yjs.
- You want to have the option to swap Yjs (or only parts of the model) for a different sync solution without affecting the whole codebase.
- You only want to dedicate one person to the complex “syncing stuff” while the others can focus on consuming an easy-to-use API.
- Usability: Yjs was built for performance, not usability. Other shared models are much easier to use and integrate well in certain UI frameworks. I encourage you to try wrappers like synced-store around Yjs that make it easier to work with your data.
- Not all concurrent actions result in an expected outcome.
  - An example from the Jupyter Project: A cell can be of type “MarkdownCell” or “CodeCell”. A cell can also have metadata that sometimes relates to the type of cell. Every cell is represented as a Y.Map. If User1 adds an execution_count as metadata to a CodeCell, but User2 concurrently changes the CodeCell to a MarkdownCell, we end up with a MarkdownCell that has an execution_count (which really doesn’t make sense in this context). This is one of the scenarios that we want to prevent. A more specialized model around Yjs’ types can handle these issues internally (e.g. by removing invalid properties) while exposing a typed interface.
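
The compatibility and concurrency points above can be sketched as a thin typed wrapper. This is only a minimal sketch, not the actual Jupyter model: `SharedMap` is a hypothetical interface covering just the `get`/`set`/`delete` subset that `Y.Map` also offers, and a plain `Map` stands in for the shared type so the example runs without Yjs. The names `AccountsModel`, `CellModel`, `normalize`, and the `type` key are assumptions made for illustration.

```typescript
// Hypothetical subset of the Y.Map API; a plain Map satisfies it as well.
interface SharedMap {
  get(key: string): unknown;
  set(key: string, value: unknown): unknown;
  delete(key: string): unknown;
}

// Exposes "accounts" to the application while the data stays under the
// legacy key "users", so older clients keep reading a key they understand.
class AccountsModel {
  private static readonly KEY = "users";
  private readonly root: SharedMap;

  constructor(root: SharedMap) {
    this.root = root;
  }

  get accounts(): string[] {
    const raw = this.root.get(AccountsModel.KEY);
    // Validate instead of duck-typing: drop anything that is not a string.
    return Array.isArray(raw)
      ? raw.filter((x): x is string => typeof x === "string")
      : [];
  }

  set accounts(value: string[]) {
    this.root.set(AccountsModel.KEY, [...value]);
  }
}

type CellType = "CodeCell" | "MarkdownCell";

class CellModel {
  private readonly cell: SharedMap;

  constructor(cell: SharedMap) {
    this.cell = cell;
  }

  get type(): CellType {
    return this.cell.get("type") === "MarkdownCell" ? "MarkdownCell" : "CodeCell";
  }

  // Only code cells expose an execution count; other cells report null.
  get executionCount(): number | null {
    const raw = this.cell.get("execution_count");
    return this.type === "CodeCell" && typeof raw === "number" ? raw : null;
  }

  // Remove properties that are invalid for the current cell type, e.g. after
  // a concurrent "add execution_count" and "change to MarkdownCell" were merged.
  normalize(): void {
    if (this.type !== "CodeCell") this.cell.delete("execution_count");
  }
}

// Usage with plain Maps standing in for shared types.
const root = new Map<string, unknown>();
const accountsModel = new AccountsModel(root);
accountsModel.accounts = ["ada", "grace"]; // stored under "users"

const mergedCell = new Map<string, unknown>([
  ["type", "MarkdownCell"],
  ["execution_count", 3],
]);
const cell = new CellModel(mergedCell);
cell.normalize(); // drops the stray execution_count
```

The typed getters let consumers ignore invalid data even before `normalize` has run, which matters because another client may write a conflicting value at any time.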

The only downside that I can think of is that you need to invest time in your abstract model. However, that is probably time well spent.

An alternative solution for shared models is to provide some form of schema for CRDTs. See for example Project Cambria: Translate your data with lenses. However, schemas are not yet practical: all existing solutions are experimental and far from being usable in practice.

Please also note that we still have so much to learn about how to use Yjs (or CRDTs in general) to sync application state. I’m fairly certain that, as the field grows, we will find better abstractions and tools to sync application state.

NickDarvey · March 28, 2022, 2:17am

Even in this case, we’d be defining some structure via the editor UI schema and then binding the editor UI directly to the shared data structure. The editor UI schema may evolve over time and any instance of the shared data structure may be outdated.

Having some kind of application model in between would give us a place to perform migrations, I guess like the approach Cambria explores.

[state schemas] <- [codecs] -> [app schemas] -> [editor UI]
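
One way to read this diagram: a codec decodes whatever version of the state schema is stored in the shared document into the current app schema, and encodes app-level values back into the newest state schema. Below is a minimal sketch under assumptions of my own: a hypothetical note document whose v1 state stored a single `text` field and whose v2 state stores `title` and `body`. None of these names come from Yjs or Cambria.

```typescript
// App schema: what the editor UI and the rest of the application see.
interface Note {
  title: string;
  body: string;
}

// State schemas: the shapes that may actually be stored in the shared document.
type StateV1 = { version: 1; text: string };
type StateV2 = { version: 2; title: string; body: string };
type StoredState = StateV1 | StateV2;

interface Codec<State, App> {
  decode(state: State): App;
  encode(app: App): State;
}

const noteCodec: Codec<StoredState, Note> = {
  decode(state) {
    if (state.version === 2) {
      return { title: state.title, body: state.body };
    }
    // v1 had no title: treat the first line as the title, the rest as the body.
    const [title = "", ...rest] = state.text.split("\n");
    return { title, body: rest.join("\n") };
  },
  // Always write the newest state schema.
  encode(note) {
    return { version: 2, title: note.title, body: note.body };
  },
};

// An outdated document is read through the codec and written back as v2.
const decodedNote = noteCodec.decode({ version: 1, text: "Groceries\nmilk\neggs" });
const reencoded = noteCodec.encode(decodedNote);
```

The editor UI would only ever see `Note`, so an editor schema change becomes a new codec branch instead of a change to every consumer.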

dmonad · March 28, 2022, 8:03am

If you have the resources, it does make sense to build your own custom editor model. This would even allow you to switch to a different editor in the future by writing another editor binding.
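
A minimal sketch of that idea, assuming a hypothetical `EditorModel` that owns the document and an `EditorBinding` interface that each editor integration implements. No real editor or Yjs API is used here; `PlainTextBinding` is only a stand-in, and a real binding would have to translate between the model and the editor's own change format.

```typescript
type Listener = (text: string) => void;

// Editor-agnostic model; in a real application it would be backed by a shared type.
class EditorModel {
  private text = "";
  private listeners = new Set<Listener>();

  getText(): string {
    return this.text;
  }

  setText(next: string): void {
    if (next === this.text) return; // skip no-op updates to avoid echo loops
    this.text = next;
    this.listeners.forEach((listener) => listener(next));
  }

  // Returns a function that removes the listener again.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

// Each supported editor gets its own binding; the model itself does not change.
interface EditorBinding {
  attach(model: EditorModel): void;
  detach(): void;
}

// Stand-in for a concrete editor: it only mirrors the model into a string field.
class PlainTextBinding implements EditorBinding {
  rendered = "";
  private unsubscribe: (() => void) | null = null;

  attach(model: EditorModel): void {
    this.rendered = model.getText();
    this.unsubscribe = model.subscribe((text) => {
      this.rendered = text;
    });
  }

  detach(): void {
    this.unsubscribe?.();
    this.unsubscribe = null;
  }
}

const editorModel = new EditorModel();
const binding = new PlainTextBinding();
binding.attach(editorModel);
editorModel.setText("hello");
binding.detach();
editorModel.setText("not seen by the detached binding");
```

Switching editors then means writing another class that implements `EditorBinding`, while the model and its consumers stay untouched.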

Another user from the ProseMirror discussion board is taking this step: Offline, Peer-to-Peer, Collaborative Editing using Yjs - #33 by jessejorgenson - Show - discuss.ProseMirror

However, this will require you to build a complex editor model and a custom editor binding to whatever editor you are using. I would argue that most companies don’t have the resources to spend on this and might want to find an alternative approach (e.g. transforming the old schema to a new one and treating clients that still work on the old model as outdated).
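
That alternative could be sketched as a version gate: the document carries a schema version, newer clients upgrade old data, and older clients detect that they are outdated and stop writing. This is a minimal sketch under assumptions: the `schemaVersion`, `content`, and `text` keys and the `openDocument` function are hypothetical, and a plain `Map` stands in for the shared document root.

```typescript
const SUPPORTED_VERSION = 2;

type OpenResult =
  | { mode: "read-write" }
  | { mode: "outdated-client"; documentVersion: number };

// Migrations keyed by the version they upgrade *from*.
const migrations: Record<number, (root: Map<string, unknown>) => void> = {
  1: (root) => {
    // Hypothetical change: v1 stored the text under "content", v2 uses "text".
    root.set("text", root.get("content") ?? "");
    root.delete("content");
  },
};

function openDocument(
  root: Map<string, unknown>,
  supported: number = SUPPORTED_VERSION,
): OpenResult {
  const raw = root.get("schemaVersion");
  // Documents without a version marker are treated as v1.
  let version = typeof raw === "number" ? raw : 1;

  if (version > supported) {
    // A newer client already upgraded this document: do not write to it.
    return { mode: "outdated-client", documentVersion: version };
  }
  while (version < supported) {
    migrations[version]?.(root);
    version += 1;
    root.set("schemaVersion", version);
  }
  return { mode: "read-write" };
}

const sharedRoot = new Map<string, unknown>([["content", "hello"]]);
const currentClient = openDocument(sharedRoot); // supports v2, migrates v1 -> v2
const oldClient = openDocument(sharedRoot, 1); // only supports v1
```

With a real CRDT, two clients may run the same migration concurrently, so migrations probably need to be written so that applying them twice is harmless.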

Again, I hope that in the future we find better, more generic solutions for this.