One large Y.Doc or many smaller Y.Doc?

ViktorQvarfordt · January 2, 2022, 12:56am

Structuring data in YDocs

One basically needs to decide on the following:

To use one or multiple YDocs for an entity or set of entities in your application.
How to structure the data within a YDoc.

When reasoning around how to structure data in Yjs I recommend to consider these aspects:

The flow of data for common use cases: It can be good to group data that is often used together. In contrast, it may not be practical to load hundreds of YDocs at once or load new YDocs very frequently.
Read/write permissions: Permissions cannot be practically enforced within a YDoc so you need to split data into multiple YDocs if you need different permissions for different parts of the data.
Size is very rarely a practical problem as long as you deal with human-entered text input. (See benchmarks.)
Separate structure and data: In some cases it can be practical to have one YDoc that holds the only the id references across entities (eg. pages) and one YDoc per entity data. This is particularly relevant if you need different permission levels for different entities. If you have no need for granular control, a split like this may be unnecessarily complex.
History and undo: At what level is it natural to track edit history and perform undo? It is much easier to perform history tracking within a single YDoc rather than spread across multiple YDocs.
Consider using a single top-level YMap: Top-level shared types cannot be deleted, so you may want to structure all your data in a single top-level YMap, eg. yDoc.getMap('data').get('page-1').
Subdocuments: You may also consider using subdocuments. However, it gets bit more complex and your provider may not support it.

Storing YDocs in a database

The return type of Y.encodeStateAsUpdate is a byte array (Uint8Array). Postgres has a data type for binary data just like this, called BYTEA. Other SQL databases call this BLOB or BINARY LARGE OBJECT.

Estimating the size of YDocs

Generally speaking, the size of the byte array representation given by Y.encodeStateAsUpdate will grow as you apply edit operations on your document. Yjs does apply garbage collection but some traces of past edits cannot be fully garbage collected in order to maintain the properties of a CRDT. The advice I can give on this is to 1) use the update format V2 version which provides much better compression and 2) run some experiments where you simulate scenarios that will be common for your application and see how your YDocs grow in size.