I’m using the standard y-websocket provider with LevelDB for persistence in PRSM. It has worked fine for more than a year. However, the LevelDB database is now more than 2.5 GB and will keep growing as more users add more ‘rooms’ and more data, all of which is kept in the database indefinitely. So I’m worried about scaling.
One approach would be to clean out of the DB any rooms that have not been accessed for, say, a year. However, as far as I can see, the current implementation stores no last-access (or last-modification) date with the data. Before I start hacking LevelDB or y-websocket, I wanted to ask whether anyone has thoughts on the best solution.
A few thoughts:
- If possible, I don’t want to have to redo any changes I make to y-websocket every time @dmonad updates the y-websocket code.
- The server is already running PostgreSQL, so a solution that uses that rather than LevelDB would be fine, although I have no idea how I would transfer the 2.5 GB of existing data to a new database.
Your suggestions are very welcome.
If you want to transfer the data to Postgres, I propose that you write a wrapper around y-leveldb. Postgres is not a good target for storing incremental updates, so you probably want to keep LevelDB (or something similar) for caching purposes.
You can already add “meta information” to Yjs documents in y-leveldb (see setMeta). The first step is probably to add a timestamp to every entry.
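A minimal sketch of that first step, using y-leveldb’s setMeta/getMeta. The `lastAccess` key name is an arbitrary choice of mine, and the persistence object is passed in as an argument just to keep the helpers easy to test — in practice it would be your `LeveldbPersistence` instance:

```javascript
// Stamp a last-access timestamp next to each doc via y-leveldb's meta store.
// `persistence` is assumed to expose setMeta(docName, key, value) and
// getMeta(docName, key), as y-leveldb's LeveldbPersistence does.
const touchDoc = (persistence, docName, now = Date.now()) =>
  persistence.setMeta(docName, 'lastAccess', now)

const lastAccess = (persistence, docName) =>
  persistence.getMeta(docName, 'lastAccess')

// In the websocket server you would call touchDoc whenever a room is loaded,
// e.g. from the persistence hook that binds a doc to its stored state.
```

A doc that has never been stamped simply returns `undefined` from `lastAccess`, which the cleanup routine can treat as “stale”.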
Then create a wrapper around y-leveldb that pulls the document from Postgres if it doesn’t exist in y-leveldb and inserts it there. See the documentation (“document updates”) on how to transform a Yjs doc into a binary blob or base64 string.
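A sketch of that read-through logic. The three injected functions are placeholders of my own, not names from either library — you would implement them with y-leveldb and your Postgres client:

```javascript
// Read-through cache: if leveldb doesn't have the doc, pull the stored binary
// update from Postgres and replay it into leveldb. Placeholder functions:
//   ldbHas(name)                 -> does leveldb already hold this doc?
//   pgFetchBlob(name)            -> Uint8Array as produced by
//                                   Y.encodeStateAsUpdate, or null if absent
//   ldbStoreUpdate(name, update) -> y-leveldb's storeUpdate
const ensureCached = async ({ ldbHas, pgFetchBlob, ldbStoreUpdate }, docName) => {
  if (await ldbHas(docName)) return 'cached'
  const blob = await pgFetchBlob(docName)
  if (blob === null) return 'missing'   // brand-new room, nothing archived yet
  await ldbStoreUpdate(docName, blob)   // replaying the update recreates the doc
  return 'restored'
}
```

Because a full Yjs doc can be serialized as a single update, restoring from Postgres is just storing that one blob back into y-leveldb.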
This wrapper should also update the mentioned timestamp every time a document is accessed.
Lastly, you can create a routine in the websocket server that writes documents that haven’t been accessed in a while to Postgres. Once a doc is persisted in Postgres, you can delete it from LevelDB.
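That sweep could look something like the following. Again, every injected name is a placeholder standing in for the corresponding y-leveldb or Postgres call, and the `lastAccess` meta key is assumed from the earlier step:

```javascript
// Archive docs not touched within maxAgeMs: write the encoded doc to Postgres
// first, then delete it from leveldb. Placeholder functions:
//   docNames()        -> y-leveldb's getAllDocNames
//   getLastAccess(n)  -> getMeta(n, 'lastAccess'), undefined if never stamped
//   exportBlob(n)     -> Y.encodeStateAsUpdate of the loaded doc
//   pgStore(n, blob)  -> upsert into a bytea column keyed by doc name
//   ldbDelete(n)      -> y-leveldb's clearDocument
const archiveStale = async (deps, maxAgeMs, now = Date.now()) => {
  const archived = []
  for (const name of await deps.docNames()) {
    const last = await deps.getLastAccess(name)
    if (last === undefined || now - last > maxAgeMs) {
      await deps.pgStore(name, await deps.exportBlob(name)) // persist first...
      await deps.ldbDelete(name)                            // ...then drop the cached copy
      archived.push(name)
    }
  }
  return archived
}
```

Writing to Postgres before deleting from LevelDB means a crash between the two steps leaves you with a harmless duplicate rather than a lost room.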
Thanks for these very helpful hints. My mention of Postgres was something of a red herring: I have no desire to use Postgres unless I absolutely have to. My main aim is to keep everything as simple as possible.