When should you use subdocuments?

braden · November 24, 2020, 3:20am

Hey there! I’m finally diving into Yjs and building a prototype of my product entirely on top of it. The experience has been incredible so far, but I have a question regarding subdocs.

At what scale should I consider using them? It seems like migrating to subdocs later would be difficult, so I’m trying to design my first schema with subdocs in mind. For example, many of my users’ projects consist of thousands upon thousands of documents, so I’m modeling the project as a Y.Map index of document metadata, and then their content is lazily loaded via subdocuments.

Is this an appropriate usage? I don’t want to prematurely optimize, as subdocs add complexity, especially when trying to export or index data for searching, but at the same time, I’m hesitant to have everything in one document, since many of my users are so prolific with their content.
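
To make the question concrete, here is a minimal sketch of the layout described above: one Y.Map holding lightweight metadata and a second Y.Map holding one subdocument per document. The map names (`index`, `contents`), the `DocMeta` fields, and the `createDocument` helper are assumptions made up for illustration; they are not part of Yjs.

```typescript
import * as Y from 'yjs'

// Hypothetical metadata shape; the field names are illustrative only.
interface DocMeta {
  title: string
  updatedAt: number
}

const project = new Y.Doc()

// Lightweight index that every client loads: document id -> metadata.
const index = project.getMap<DocMeta>('index')
// Content lives in subdocuments stored under the same ids.
const contents = project.getMap<Y.Doc>('contents')

// Hypothetical helper: registers the metadata and the content subdocument together.
function createDocument (id: string, title: string): Y.Doc {
  const content = new Y.Doc()
  project.transact(() => {
    index.set(id, { title, updatedAt: Date.now() })
    contents.set(id, content)
  })
  return content
}

const draft = createDocument('doc-1', 'First draft')
draft.getText('body').insert(0, 'Hello')
```

On other clients the subdocument arrives as an unloaded reference, so its content should only be fetched once something calls `load()` on it and a provider is attached.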

dmonad · November 26, 2020, 2:59pm

Hi @braden,

There is little experience with using subdocuments as they were just released in October.

Could you explain more about how you separate the metadata from the content? Maybe you could share some sample JSON.

I imagine that you want to allow users to load the editor as fast as possible. But when they only look at metadata (the name of the document, …) they shouldn’t load the editor-content at all. In this case, you should definitely load the editor-content as a subdocument.

Aside from the asynchronous nature of subdocuments (they load asynchronously and, depending on the provider, require a separate network request), I don’t see any disadvantages of using them. One exception: in some cases you want to apply atomic transformations (e.g. you want to change two related properties at the same time) that should be executed as one transaction. A transaction belongs to a single document, so in that case you probably don’t want to split the data across subdocuments.
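
The point about atomic changes can be illustrated with `doc.transact`, which groups several writes to one document into a single transaction and a single update message. This is a minimal sketch; the `meta` map and its keys are invented for the example. A subdocument is a separate `Y.Doc`, so the same grouping is not available across a parent and its subdocuments.

```typescript
import * as Y from 'yjs'

const doc = new Y.Doc()
const meta = doc.getMap<string>('meta')

// Count the update messages this document emits.
let updateCount = 0
doc.on('update', () => { updateCount += 1 })

// Both writes belong to one transaction, so observers and providers
// receive them together rather than as two separate changes.
doc.transact(() => {
  meta.set('status', 'published')
  meta.set('publishedBy', 'braden')
})
```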

braden · November 27, 2020, 7:49pm

Hey @dmonad, thanks for the reply! I’ve been playing with subdocuments the past few days and have gotten pretty comfortable with them. You have confirmed my suspicions: there are very few downsides to using them. I think they were the missing piece for me; in only a few days I’ve moved most of my application’s data layer to Yjs, using subdocuments as lazy-load boundaries where necessary. Really excellent work. Yjs is a dream.

dmonad · November 28, 2020, 7:12pm

Oh wow, that’s great to hear!

braden · December 7, 2020, 1:42am

@dmonad, for subdocuments, from the docs I read that “subdoc added/deleted” events can be used for syncing two clients. For example, each client has a search index that it wants to keep up to date, so the client can load and index any new doc whenever it gets a corresponding “subdoc added” event. Is there an equivalent way to detect changes to an existing subdocument? Is that something Yjs can do internally, or will I need to keep some kind of shared update log?

For example, I’m connected via a remote provider to Subdoc A, but not to Subdoc B. Someone changes Subdoc B, and I’d like to respond by loading it, attaching a provider, waiting for sync step 2, and then indexing Subdoc B’s content after a debounce interval.
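
For reference, the “subdoc added/deleted” mechanism mentioned above is the `subdocs` event on the parent document. The sketch below queues newly announced subdocuments for indexing. The `pendingIndex` set and the `contents` map are assumptions for this example, and the second `Y.Doc` merely stands in for a remote client whose update would normally arrive through a provider.

```typescript
import * as Y from 'yjs'

type SubdocsEvent = { added: Set<Y.Doc>, removed: Set<Y.Doc>, loaded: Set<Y.Doc> }

const local = new Y.Doc()
// Guids of subdocuments that still need to be loaded and indexed (hypothetical queue).
const pendingIndex = new Set<string>()

local.on('subdocs', (event: SubdocsEvent) => {
  event.added.forEach(subdoc => pendingIndex.add(subdoc.guid))
  event.removed.forEach(subdoc => pendingIndex.delete(subdoc.guid))
})

// Stand-in for a remote peer that creates a new subdocument.
const remote = new Y.Doc()
const remoteSubdoc = new Y.Doc()
remote.getMap<Y.Doc>('contents').set('doc-b', remoteSubdoc)

// Deliver the remote state by hand; a provider would do this over the network.
Y.applyUpdate(local, Y.encodeStateAsUpdate(remote))
```

The event only reports that a subdocument was added, removed, or loaded. It carries no information about edits made inside a subdocument.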

dmonad · December 7, 2020, 11:56am

That’s a really interesting question…

There is no method to subscribe to changes on subdocuments. You need to implement that manually.
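
One way to implement that manually is the shared update log suggested in the question: clients that have a subdocument loaded record a timestamp in the parent document, and everyone else observes that record. This is only a sketch under assumptions. The `lastModified` map and both helper functions are hypothetical, and the code relies on the `update` event passing the transaction as its fourth argument, as recent Yjs releases do.

```typescript
import * as Y from 'yjs'

// Hypothetical: record local edits to `subdoc` in a log kept in the parent document.
function trackSubdocChanges (parent: Y.Doc, subdoc: Y.Doc): void {
  const log = parent.getMap<number>('lastModified')
  subdoc.on('update', (_update: Uint8Array, _origin: unknown, _doc: Y.Doc, transaction: Y.Transaction) => {
    // Only log edits made on this client, so peers do not echo each other's entries.
    if (transaction.local) log.set(subdoc.guid, Date.now())
  })
}

// Hypothetical: react to log entries without loading the subdocuments themselves.
function onSubdocChanged (parent: Y.Doc, callback: (guid: string) => void): void {
  parent.getMap<number>('lastModified').observe(event => {
    event.keysChanged.forEach(guid => callback(guid))
  })
}

// Client A has the subdocument loaded and edits it.
const parentA = new Y.Doc()
const subA = new Y.Doc()
parentA.getMap<Y.Doc>('contents').set('doc-b', subA)
trackSubdocChanges(parentA, subA)

// Client B only holds the parent document.
const parentB = new Y.Doc()
const changed: string[] = []
onSubdocChanged(parentB, guid => { changed.push(guid) })

subA.getText('body').insert(0, 'hello')
// Deliver A's parent state to B by hand; a provider would do this over the network.
Y.applyUpdate(parentB, Y.encodeStateAsUpdate(parentA))
```

Client B can then load the subdocument, attach a provider, and re-index it. The cost is one extra write to the parent document for each local edit, which a real implementation would probably want to throttle.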

With the new differential updates feature, I’m planning to allow the y-websocket client to subscribe to subdocuments without loading them. We could add a “hook” that is called, after a debounce timeout, whenever any document has been updated.

In any case, this would be implemented at the provider level. Yjs doesn’t know about the contents of subdocuments. From the perspective of a Yjs document, a subdocument is only a reference to another document.
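
The debounced hook described above is only a plan at this point in the thread, so the following is a purely hypothetical sketch of what such a provider-level helper could look like. `DebouncedDocHook`, `notify`, and the document ids are invented names, not y-websocket API; a provider would call `notify` whenever it sees an update for a document.

```typescript
// Hypothetical helper; not part of Yjs or y-websocket.
class DebouncedDocHook {
  private readonly delayMs: number
  private readonly hook: (docId: string) => void
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>()

  constructor (delayMs: number, hook: (docId: string) => void) {
    this.delayMs = delayMs
    this.hook = hook
  }

  // Call on every update for `docId`; restarts that document's timer.
  notify (docId: string): void {
    const existing = this.timers.get(docId)
    if (existing !== undefined) clearTimeout(existing)
    this.timers.set(docId, setTimeout(() => {
      this.timers.delete(docId)
      this.hook(docId)
    }, this.delayMs))
  }

  // Number of documents with a hook call still pending.
  get pending (): number {
    return this.timers.size
  }

  // Drop all pending calls, e.g. when the provider disconnects.
  cancelAll (): void {
    this.timers.forEach(timer => clearTimeout(timer))
    this.timers.clear()
  }
}

// A burst of updates to one document collapses into a single pending hook call.
const reindexed: string[] = []
const indexer = new DebouncedDocHook(2000, docId => { reindexed.push(docId) })
indexer.notify('subdoc-b')
indexer.notify('subdoc-b')
indexer.notify('subdoc-b')
indexer.notify('subdoc-a')
```

Keeping one timer per document means a busy document does not delay indexing of a quiet one.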