One large Y.Doc or many smaller Y.Doc?

Hi,

I’m learning yjs and very impressed so far. I have a couple lingering questions.

  1. I’m considering building a collaborative note-taking application. A single project can have many pages of notes. I’m trying to decide whether I should have one Y.Doc per project, with each “page” essentially being a Y.Map with nested children, or whether each individual “page” should be its own Y.Doc instance. I’m curious about the pros and cons of each approach. A user can only edit one page at a time, but may edit many pages in a single session. Is there any reason to favor one approach over the other?

  2. I’m trying to determine a strategy for long-term document storage in Postgres. Is there a maximum size for the binary format? In other words, if I call Y.encodeStateAsUpdate with a large document, can I make any assumptions about the size of the Uint8Array that is returned? Will it grow without bound as my document grows? What kind of database column do people typically use for storing these updates?

Structuring data in YDocs

One basically needs to decide on the following:

  1. To use one or multiple YDocs for an entity or set of entities in your application.
  2. How to structure the data within a YDoc.

When reasoning about how to structure data in Yjs, I recommend considering these aspects:

  1. The flow of data for common use cases: It can be good to group data that is often used together. In contrast, it may not be practical to load hundreds of YDocs at once or load new YDocs very frequently.
  2. Read/write permissions: Permissions cannot be practically enforced within a YDoc so you need to split data into multiple YDocs if you need different permissions for different parts of the data.
  3. Size is very rarely a practical problem as long as you deal with human-entered text input. (See benchmarks.)
  4. Separate structure and data: In some cases it can be practical to have one YDoc that holds only the id references across entities (e.g. pages) and one YDoc per entity for its data. This is particularly relevant if you need different permission levels for different entities. If you have no need for granular control, a split like this may be unnecessarily complex.
  5. History and undo: At what level is it natural to track edit history and perform undo? It is much easier to perform history tracking within a single YDoc rather than spread across multiple YDocs.
  6. Consider using a single top-level YMap: Top-level shared types cannot be deleted, so you may want to structure all your data in a single top-level YMap, e.g. yDoc.getMap('data').get('page-1').
  7. Subdocuments: You may also consider using subdocuments. However, this gets a bit more complex and your provider may not support them.
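
To make point 6 concrete, here is a minimal sketch of the "one YDoc per project, one nested YMap per page" layout. The `addPage` helper and the `title`/`body` keys are hypothetical names chosen for illustration, not part of Yjs.

```typescript
import * as Y from 'yjs'

// One Y.Doc per project; every page lives under a single top-level Y.Map.
const projectDoc = new Y.Doc()
const data = projectDoc.getMap('data')

// Hypothetical helper: a page is a nested Y.Map with a title and a Y.Text body.
function addPage(id: string, title: string): Y.Map<unknown> {
  const page = new Y.Map<unknown>()
  data.set(id, page) // attach to the document first, then fill in fields
  page.set('title', title)
  page.set('body', new Y.Text())
  return page
}

addPage('page-1', 'Meeting notes')
addPage('page-2', 'Scratch')

const page1 = data.get('page-1') as Y.Map<unknown>
const body = page1.get('body') as Y.Text
body.insert(0, 'Hello')

// Nested entries can be removed, unlike top-level shared types.
data.delete('page-2')

const remainingPages = Array.from(data.keys())
console.log(body.toString(), remainingPages)
```

Because everything hangs off one top-level map, deleting a page is an ordinary `delete` on that map. The trade-off is that the whole project is loaded and synced as one unit.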

Storing YDocs in a database

The return type of Y.encodeStateAsUpdate is a byte array (Uint8Array). Postgres has a data type for binary data just like this, called BYTEA. Other SQL databases call this BLOB or BINARY LARGE OBJECT.
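
As a rough sketch of the save/load round trip: the in-memory `fakeTable` below is a hypothetical stand-in for a table with a BYTEA column, and `savePage`/`loadPage` are illustrative names. With a real Postgres driver you would pass the same bytes as a query parameter instead.

```typescript
import * as Y from 'yjs'

// Hypothetical stand-in for a table such as: pages(id TEXT PRIMARY KEY, state BYTEA)
const fakeTable = new Map<string, Uint8Array>()

function savePage(id: string, doc: Y.Doc): void {
  // The full document state as a single binary value
  fakeTable.set(id, Y.encodeStateAsUpdate(doc))
}

function loadPage(id: string): Y.Doc {
  const doc = new Y.Doc()
  const state = fakeTable.get(id)
  // A missing row simply yields an empty document
  if (state !== undefined) Y.applyUpdate(doc, state)
  return doc
}

const original = new Y.Doc()
original.getText('body').insert(0, 'Persist me')
savePage('page-1', original)

const restored = loadPage('page-1')
const restoredText = restored.getText('body').toString()
console.log(restoredText)
```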

Estimating the size of YDocs

Generally speaking, the size of the byte array returned by Y.encodeStateAsUpdate grows as you apply edit operations to your document, not just with the amount of visible content. Yjs does apply garbage collection, but some traces of past edits cannot be fully garbage collected if the properties of a CRDT are to be maintained. The advice I can give on this is to 1) use the V2 update format (Y.encodeStateAsUpdateV2 and Y.applyUpdateV2), which provides much better compression, and 2) run some experiments where you simulate scenarios that will be common for your application and see how your YDocs grow in size.
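
One way to run such an experiment is sketched below. The editing pattern (typing a sentence one character at a time, then deleting half of it, 50 times over) is an arbitrary assumption; substitute whatever is typical for your application. No byte counts are quoted here because they depend on the Yjs version and the edit pattern.

```typescript
import * as Y from 'yjs'

const doc = new Y.Doc()
const text = doc.getText('body')

// Simulated session: type a sentence character by character, then delete half of it.
const sentence = 'The quick brown fox jumps over the lazy dog. '
const rounds = 50
for (let round = 0; round < rounds; round++) {
  for (let i = 0; i < sentence.length; i++) {
    text.insert(text.length, sentence[i])
  }
  text.delete(0, Math.floor(sentence.length / 2))
}

// Encode the same state in both formats and compare sizes.
const v1 = Y.encodeStateAsUpdate(doc)
const v2 = Y.encodeStateAsUpdateV2(doc)
console.log('visible characters:', text.length)
console.log('V1 bytes:', v1.byteLength)
console.log('V2 bytes:', v2.byteLength)

// V2 updates must be applied with the matching V2 function.
const copy = new Y.Doc()
Y.applyUpdateV2(copy, v2)
const roundTripOk = copy.getText('body').toString() === text.toString()
console.log('round trip ok:', roundTripOk)
```

Comparing the byte counts against the number of visible characters gives a feel for how much overhead the edit history adds for your workload.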

Thanks a lot @ViktorQvarfordt! This is extremely helpful.

Thank you so much for the summary @ViktorQvarfordt! Would you mind if I copy that to the documentation? :)

I’m glad it was useful. Feel free to copy and reuse in any way!
