Document RAM usage after being loaded from storage

Hi, after reviewing the following article I have a question:

Suppose a document uses 220 MB of RAM and contains about 1,097,100 “item” objects. It is stored in a database and removed from RAM. Will reloading the document into RAM recreate the 1,097,100 “item” objects, again taking up 220 MB of RAM?

I would like some guidance on this so I can plan RAM usage for my application.

Yes, the document is stored on disk in virtually the same format as it is stored in memory: a compact binary structure. When it is loaded from disk, all items will be loaded back into memory.

Generally there is plenty of RAM available for this amount of data. It is more of a concern when sending it over the wire.

It is possible to lazy-load subdocuments and avoid loading the subdocuments’ items until they are needed, which may reduce memory usage depending on the shape of your data. Subdocuments are still missing a lot of support, though.

Ah OK, I understand, thank you very much for your prompt response. What concerns me is the number of documents that can exist simultaneously on my server, since that could considerably increase the amount of memory used.

I apologize for asking so much, but I have the following questions:

1.- So, could I manage a document using sub-documents so that, depending on the part of the document used in a given view, only that part is loaded rather than the entire document (which could be quite heavy)?

2.- If a document is stored and it has sub-documents, should I store the sub-documents separately?

2.- What do you mean when you say that the subdocuments need more support?

3.- What other recommendations or memory requirements could you give me to reduce, as much as possible, the risk of my server running out of memory?

4.- Is there a way to perform searches on the documents efficiently or is it more advisable to keep certain metadata stored externally and perform these searches on that metadata?

Again thank you very much.

1.- So, could I manage a document using sub-documents so that, depending on the part of the document used in a given view, only that part is loaded rather than the entire document (which could be quite heavy)?

Yes. The subdocument keys will always be loaded into memory, but the contents can be loaded on demand.

2.- If a document is stored and it has sub-documents, should I store the sub-documents separately?

Storing them separately is supported out of the box.
Storing multiple documents in a single database is not currently supported afaik.

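import { IndexeddbPersistence } from 'y-indexeddb'

// Persist each subdocument in its own IndexedDB database, named after its guid.
// `doc` is the parent Y.Doc; the handler runs whenever subdocuments are loaded.
doc.on('subdocs', ({ loaded }) => {
  loaded.forEach(subdoc => {
    new IndexeddbPersistence(subdoc.guid, subdoc)
  })
})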

2.- What do you mean when you say that the subdocuments need more support?

Subdocuments are fully supported in memory with the Y.Doc class. The only limitation that I am aware of is that updates to multiple subdocuments cannot be contained within a single transaction.

Providers, however, currently are not aware of subdocuments, so you have to create a separate provider instance for each subdocument (as shown above). This is not the most efficient. In particular, y-websocket creates a separate websocket connection per subdocument, which is quite bad. (There are some forks out there that may have a solution but I haven’t used them myself.)

3.- What other recommendations or memory requirements could you give me to reduce, as much as possible, the risk of my server running out of memory?

I’m pretty new to Yjs and I have not used it in production yet, so I’m not sure about this one :). You might want to check out this fork that provides horizontal scaling: erdtool/yjs-scalable-ws-backend on GitHub.

4.- Is there a way to perform searches on the documents efficiently or is it more advisable to keep certain metadata stored externally and perform these searches on that metadata?

I imagine you would use a separate indexing service to provide full text search.
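
To give an idea of what keeping searchable data outside the documents could look like, here is a toy sketch of an in-process inverted index. Everything in it (`index`, `indexDocument`, `search`, the document ids) is hypothetical; it is not part of Yjs, and a real deployment would more likely hand the text to a dedicated search service. The text passed in is assumed to have been extracted from the document beforehand, for example with `toString()` on a `Y.Text`.

```javascript
// Hypothetical index: maps each lowercase word to the set of document ids
// whose extracted text contains that word.
const index = new Map()

// Add the words of one document's text to the index.
// Tokenization is a naive split on non-word characters (ASCII only).
function indexDocument (docId, text) {
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    if (!index.has(word)) index.set(word, new Set())
    index.get(word).add(docId)
  }
}

// Return the sorted ids of documents containing the given word.
function search (word) {
  return [...(index.get(word.toLowerCase()) || [])].sort()
}

indexDocument('doc-1', 'Quarterly sales report')
indexDocument('doc-2', 'Sales forecast draft')
const hits = search('sales')
```

With an index like this, a search never needs to load the Yjs documents themselves into memory. The sketch does not handle re-indexing a document whose text has changed.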
