y-websocket/levelDB/ and a bit too much persistence

micrology · September 29, 2022, 5:08pm

I’m using the standard y-websocket provider with levelDB to provide persistence for PRSM. It works fine and has been doing so for more than a year. However, the levelDB database is now more than 2.5GB and will grow bigger and bigger as more users add more ‘rooms’ and more data, all of which is kept in the database indefinitely. So I’m worrying about scaling.

One approach would be to clean out from the DB any rooms that have not been accessed for, say, a year. However, as far as I can see, no last access (or last modification) date is kept with the data in the current implementation. Before I start hacking levelDB or y-websocket, I wanted to ask whether anyone has thoughts about the best solution.

A few thoughts:

If possible I don’t want to redo any changes I might make to y-websocket every time @dmonad updates the y-websocket code.
The server is running PostgreSQL already, so a solution that uses that rather than levelDB would be fine, although I have no idea how to transfer the 2.5GB of existing data to a new database.

Your suggestions are very welcome.

dmonad · September 30, 2022, 12:23pm

If you want to transfer the data to postgres, I propose that you write a wrapper around y-leveldb. Postgres is not a good target to store incremental updates, so you probably want to keep leveldb (or something similar) for caching purposes.

You can already add “metainformation” to Yjs documents in y-leveldb (see setMeta). The first step is probably to add a timestamp to every entry.
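A minimal sketch of that first step, assuming `persistence` is a y-leveldb `LeveldbPersistence` instance (or any object with the same async `setMeta`/`getMeta` methods). The meta key name `lastAccess` and both helper names are made up for illustration; y-leveldb does not define a timestamp key itself.

```javascript
// Hypothetical meta key; pick any name you like, y-leveldb does not reserve one.
const LAST_ACCESS_KEY = 'lastAccess'

// Store "now" (milliseconds since the epoch) as the document's last-access time.
async function touchDoc (persistence, docName, now = Date.now()) {
  await persistence.setMeta(docName, LAST_ACCESS_KEY, now)
  return now
}

// Read the stamp back. Resolves to undefined for a document that was never stamped.
async function lastAccess (persistence, docName) {
  return persistence.getMeta(docName, LAST_ACCESS_KEY)
}
```

With the real library this would be called as something like `touchDoc(new LeveldbPersistence('./dbDir'), 'my-roomname')`, from inside the server process that owns the database.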

Then you create a wrapper around y-leveldb that pulls the document from postgres if the document doesn’t exist in y-leveldb and inserts it into y-leveldb. See the documentation (“document updates”) on how to transform a Yjs doc to a binary blob or base64.
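For the blob/base64 part: `Y.encodeStateAsUpdate(doc)` returns a `Uint8Array`, and `Y.applyUpdate(doc, update)` loads one back into a doc. The sketch below only covers the conversion between that `Uint8Array` and a base64 string in Node.js; the two helper names are hypothetical.

```javascript
// Turn a Yjs update (Uint8Array) into a base64 string, e.g. for a text column.
function updateToBase64 (update) {
  return Buffer.from(update).toString('base64')
}

// Turn the base64 string back into a Uint8Array that Y.applyUpdate accepts.
function base64ToUpdate (text) {
  return new Uint8Array(Buffer.from(text, 'base64'))
}
```

If the target column is binary (for example `bytea` in postgres), the base64 step can probably be skipped and the `Uint8Array` stored directly.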

This wrapper should also update the mentioned timestamp every time a document is accessed.

Lastly, you can create a routine in the websocket server that writes documents that haven’t been accessed in a while to postgres. Once the doc is persisted in postgres, you can delete it from leveldb.
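A rough sketch of such a routine, under a few assumptions: documents carry a numeric timestamp under a hypothetical `lastAccess` meta key, `persistence` behaves like a y-leveldb `LeveldbPersistence`, and `archiveDoc` is a callback you supply that writes the doc somewhere durable (postgres, a file, ...). The function name is made up, and the sketch does not check whether a room is currently open on the server.

```javascript
// Archive and delete every document whose last access is older than maxAgeMs.
// Uses the y-leveldb methods getAllDocNames, getMeta, setMeta, getYDoc and clearDocument.
async function evictStaleDocs (persistence, archiveDoc, maxAgeMs, now = Date.now()) {
  const evicted = []
  for (const docName of await persistence.getAllDocNames()) {
    const stamp = await persistence.getMeta(docName, 'lastAccess')
    if (stamp === undefined) {
      // Age unknown (e.g. data written before stamping began): start the clock now.
      await persistence.setMeta(docName, 'lastAccess', now)
      continue
    }
    if (now - stamp < maxAgeMs) continue // accessed recently enough, keep it

    const ydoc = await persistence.getYDoc(docName)
    await archiveDoc(docName, ydoc) // must finish before anything is deleted
    await persistence.clearDocument(docName)
    evicted.push(docName)
  }
  return evicted
}
```

If the goal is only to reclaim space and old rooms can simply be dropped, `archiveDoc` can be a no-op. LevelDB allows one process to open a database at a time, so this would have to run inside the websocket server process rather than as a separate script against the same directory.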

micrology · September 30, 2022, 12:58pm

Thanks for these very helpful hints. My reference to Postgres was somewhat of a red herring: I have no desire to use postgres unless I absolutely have to. My main aim is to keep everything as simple as possible.

micrology · May 29, 2024, 4:13pm

I have been distracted by other things, but now need to get back to this - my levelDB is now over 7GB!

@dmonad said:

You can already add “metainformation” to Yjs documents in y-leveldb (see setMeta). The first step is probably to add a timestamp to every entry.

which would be an excellent idea, but I can’t work out how to get to the leveldb instance (called persistence in the y-leveldb API docs) from the WebsocketProvider instance that I obtain in my client code. Is it accessible? If not, how do I call y-leveldb’s setMeta method?

I am using the standard

const wsProvider = new WebsocketProvider('ws://localhost:1234', 'my-roomname', doc)

to connect to the websocket, and at the server end,

HOST=localhost PORT=1234 YPERSISTENCE=./dbDir node ./node_modules/y-websocket/bin/server.js
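For context, the y-leveldb instance lives in the server process, so it cannot be reached from the client-side WebsocketProvider. One possible way in, sketched below, is to wrap the server's persistence object rather than edit y-websocket itself. This assumes y-websocket 1.x, where `bin/utils` exports `setPersistence` and `getPersistence` and the persistence object has the shape `{ provider, bindState, writeState }` with the `LeveldbPersistence` as `provider`; please check this against the installed version. The function name `withAccessStamp` and the `lastAccess` key are hypothetical.

```javascript
// Wrap an existing y-websocket persistence object so that every time a room is
// loaded into server memory (bindState), a timestamp is written via setMeta.
function withAccessStamp (basePersistence, now = () => Date.now()) {
  return {
    ...basePersistence,
    bindState: async (docName, ydoc) => {
      // Let the original persistence load and subscribe to the document first.
      await basePersistence.bindState(docName, ydoc)
      // Then stamp it; 'lastAccess' is a made-up meta key.
      await basePersistence.provider.setMeta(docName, 'lastAccess', now())
    }
  }
}

// Assumed wiring in a small custom start script (y-websocket 1.x, YPERSISTENCE set):
//   const utils = require('y-websocket/bin/utils')
//   utils.setPersistence(withAccessStamp(utils.getPersistence()))
//   ...then start the websocket server from this same process.
```

The stamp here records when a room was last opened on the server, not every edit, which should be enough for a "not accessed for a year" cleanup rule.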