Guidance on persistent storage and working with databases

Hi,

I was wondering if there is any guidance on how to store and maintain the data inside a database. Simply storing every update as a separate record will make the database huge. Is there a guide on how to store the data efficiently while keeping some history (e.g. the last 7 days) and combining the updates that happened before that?

I know I could use mergeUpdates, but how do I merge only the older updates?
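One way to keep a rolling window is to store each update with a timestamp and periodically fold everything older than the window into a single record, leaving recent updates granular. Below is a minimal sketch, assuming a hypothetical `StoredUpdate` row shape and an in-memory array standing in for the database table. The `merge` parameter is where Yjs's `Y.mergeUpdates` would go (it combines several encoded updates into one without loading a Y.Doc); it is injected here so the sketch has no dependencies.

```typescript
// Hypothetical row shape: one record per stored Yjs update.
interface StoredUpdate {
  createdAt: number; // milliseconds since epoch
  data: Uint8Array;  // binary update, e.g. as emitted by doc.on('update')
}

// Assumed to behave like Y.mergeUpdates.
type MergeFn = (updates: Uint8Array[]) => Uint8Array;

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Folds every update older than `retentionMs` into one record and
 * returns it followed by the untouched recent updates.
 */
function compactUpdates(
  rows: StoredUpdate[],
  now: number,
  merge: MergeFn,
  retentionMs = 7 * DAY_MS,
): StoredUpdate[] {
  const cutoff = now - retentionMs;
  const old = rows.filter((r) => r.createdAt < cutoff);
  const recent = rows.filter((r) => r.createdAt >= cutoff);
  if (old.length <= 1) return rows; // nothing worth merging yet

  const merged: StoredUpdate = {
    // Stamp the merged record with the newest timestamp it contains.
    createdAt: old.reduce((max, r) => Math.max(max, r.createdAt), -Infinity),
    data: merge(old.map((r) => r.data)),
  };
  return [merged, ...recent];
}
```

In a real database you would want to insert the merged row and delete the rows it replaces in one transaction, so a crash halfway through cannot lose updates.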

Any examples of implementation or pseudo code will be greatly appreciated.


I haven’t gotten around to implementing my version, but basically you’d store the history as updates in, say, Redis, or on disk with something like RocksDB. But, and here’s the difficult part, you also want to keep the materialized doc as a JSON blob in, say, Postgres, as the final source of truth.

So you keep using Redis/RocksDB for storing the updates, but after a certain period (a day, a week, or whatever suits you) you serialize the Y.Doc with the latest updates applied using toJSON, save it to Postgres, and discard the updates. That works reasonably well: the next time, you can fetch the doc from Postgres, load it into an empty Y.Doc, and use encodeStateAsUpdate to get the state you sync to clients.
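A minimal sketch of this two-tier setup, under stated assumptions: two in-memory maps stand in for Redis/RocksDB and Postgres, and `TwoTierStore`, `flushToSnapshot` and `loadState` are hypothetical names. As a variant of the JSON-only approach, it keeps the merged binary state, because a Y.Doc can be restored from an encoded update with `Y.applyUpdate`, whereas restoring from `toJSON()` output means re-inserting the content as new items. The `merge` parameter is where `Y.mergeUpdates` would go.

```typescript
// Assumed to behave like Y.mergeUpdates.
type Merge = (updates: Uint8Array[]) => Uint8Array;

/**
 * Hypothetical store: `hotLog` stands in for Redis/RocksDB,
 * `snapshots` for a Postgres table with one binary state per document.
 */
class TwoTierStore {
  private readonly merge: Merge;
  private readonly hotLog = new Map<string, Uint8Array[]>();
  private readonly snapshots = new Map<string, Uint8Array>();

  constructor(merge: Merge) {
    this.merge = merge;
  }

  // Append each incoming update to the fast log.
  appendUpdate(docId: string, update: Uint8Array): void {
    const log = this.hotLog.get(docId) ?? [];
    log.push(update);
    this.hotLog.set(docId, log);
  }

  // Run periodically: fold pending updates into the snapshot, then drop them.
  flushToSnapshot(docId: string): void {
    const pending = this.hotLog.get(docId) ?? [];
    if (pending.length === 0) return;
    const base = this.snapshots.get(docId);
    const parts = base ? [base, ...pending] : pending;
    this.snapshots.set(docId, this.merge(parts));
    this.hotLog.delete(docId);
  }

  // State to apply to an empty Y.Doc: snapshot plus anything not yet flushed.
  loadState(docId: string): Uint8Array | undefined {
    const base = this.snapshots.get(docId);
    const pending = this.hotLog.get(docId) ?? [];
    const parts = base ? [base, ...pending] : pending;
    return parts.length > 0 ? this.merge(parts) : undefined;
  }

  pendingCount(docId: string): number {
    return this.hotLog.get(docId)?.length ?? 0;
  }
}
```

With Yjs you would pass `Y.mergeUpdates` to the constructor and, after `loadState`, call `Y.applyUpdate(doc, state)` on a fresh `Y.Doc`. You can still write the `toJSON()` output of your shared types to a separate column if you want the content queryable.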

The problem only arises if users have kept their tab open for that week (or if you support offline edits): when they start editing again, their history clashes with the now-erased history. I think you basically have to resolve those conflicts yourself. Maybe use a separate sync step to detect whether the history was wiped. Then either send the client the server's state (encodeStateAsUpdate on the server's doc), have it discard its old doc and apply that state to a fresh one, and sync as usual; or directly discard the client's version and assume it was their fault for not syncing before going offline (if the client's history contains unsaved edits). I wouldn't discard the server's version in favor of the client's unless the doc has been edited by only one user.
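The separate sync step could be as simple as a version counter. A minimal sketch, assuming a hypothetical per-document `epoch` that the server bumps every time it wipes history and rebuilds the doc, and that the client echoes back when it reconnects; the decision names mirror the options above and are illustrative only.

```typescript
// Outcomes of the hypothetical "was history wiped?" check on reconnect.
type SyncDecision =
  | "sync-normally"          // same epoch: normal Yjs sync
  | "reset-client"           // history wiped, client has nothing unsaved
  | "discard-client-edits"   // history wiped, client's unsaved edits are dropped
  | "accept-client-version"; // history wiped, but only one user edits this doc

interface ClientHello {
  epoch: number;             // epoch of the server state the client last synced with
  hasUnsyncedEdits: boolean;
}

interface ServerDocInfo {
  epoch: number;             // bumped each time the server wipes history
  singleEditor: boolean;     // true if only one user has edited the doc
}

function decideSync(client: ClientHello, server: ServerDocInfo): SyncDecision {
  if (client.epoch === server.epoch) return "sync-normally";
  // Nothing to lose: replace the client's doc with the server's state.
  if (!client.hasUnsyncedEdits) return "reset-client";
  // Prefer the server's version unless only one person edits this doc.
  return server.singleEditor ? "accept-client-version" : "discard-client-edits";
}
```

For `reset-client`, the server would send `Y.encodeStateAsUpdate(serverDoc)`, and the client would apply it with `Y.applyUpdate` to a new `Y.Doc` instead of its old one.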

Maybe that helps.

@websiddu you can check out this repo: kapv89/yjs-scalable-ws-backend on GitHub.

This is a great question. With Y.js and similar libraries, there is a balance you have to strike: how long do you wish to allow users to work offline? That duration is what really matters. If you set it to 24 hours, your system would reject changes from clients that have been offline for longer than 24 hours, and you can then safely discard history older than 24 hours.

If you want to allow users to work offline indefinitely, then you can never wipe history without resolving conflicts manually.
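The bounded-window policy from the first paragraph of this reply can be sketched as two checks, assuming the server records a hypothetical `lastSyncedAt` timestamp per client; `acceptsClient`, `compactionCutoff` and `OfflinePolicy` are illustrative names, not part of Yjs.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical policy: the longest a client may stay offline and still merge.
interface OfflinePolicy {
  maxOfflineMs: number;
}

// Accept a reconnecting client's updates only if it last synced within the window.
function acceptsClient(lastSyncedAt: number, now: number, policy: OfflinePolicy): boolean {
  return now - lastSyncedAt <= policy.maxOfflineMs;
}

// Updates older than this can be compacted: every accepted client
// synced after this point, so it already has them.
function compactionCutoff(now: number, policy: OfflinePolicy): number {
  return now - policy.maxOfflineMs;
}
```

Keeping both checks driven by the same `maxOfflineMs` is the point of the design: a client that is still allowed to merge can never depend on history that compaction has already discarded.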