I’m just curious how people persist Y.js docs to databases or other storage. I can see at least two approaches that I think are valid:
1. Store the document as a snapshot after changes, using a debounce mechanism, and overwrite the previously stored state.
2. Store individual updates (or merge consecutive updates using a debounce interval) in append-only storage.
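A minimal sketch of the two approaches, assuming an in-memory `Map` as a stand-in for the database (the class names `SnapshotStore` and `UpdateLog` are hypothetical). In approach 1 each save replaces the previous snapshot. In approach 2 every update becomes a new row, and a reader has to combine them, e.g. with `Y.mergeUpdates` or by applying each one with `Y.applyUpdate`.

```typescript
// Stand-in type: a Yjs update or snapshot is just a binary blob.
type Update = Uint8Array;

// Approach 1 (hypothetical): one row per document, overwritten on every save.
class SnapshotStore {
  private rows = new Map<string, Update>();

  save(docName: string, snapshot: Update): void {
    this.rows.set(docName, snapshot); // the previously stored state is discarded
  }

  load(docName: string): Update | undefined {
    return this.rows.get(docName);
  }
}

// Approach 2 (hypothetical): every update is appended; nothing is overwritten.
class UpdateLog {
  private rows = new Map<string, Update[]>();

  append(docName: string, update: Update): void {
    const list = this.rows.get(docName) ?? [];
    list.push(update);
    this.rows.set(docName, list);
  }

  // Returns every stored update in insertion order; the caller merges or applies them.
  loadAll(docName: string): Update[] {
    return (this.rows.get(docName) ?? []).slice();
  }
}
```

The trade-off from the post shows up directly: `SnapshotStore` keeps one blob per document, while `UpdateLog` grows with every write but keeps the full sequence.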
To me it seems that approach 1 yields more compact storage and faster document retrieval, while approach 2 lets you go back to any given point in history. The second approach can also be optimized by storing snapshots, to avoid having to “play back” the whole update history when loading a document into memory.
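The snapshot optimization for the append-only approach could look roughly like this. It is a sketch with hypothetical names: `merge` is injected so the code has no dependencies (in a Yjs app it would be `Y.mergeUpdates`), a snapshot is stored every `snapshotEvery` updates, and reading the state at any sequence number starts from the nearest earlier snapshot instead of replaying from the beginning.

```typescript
type Update = Uint8Array;
// In a Yjs app this would be Y.mergeUpdates; injected so the sketch has no dependencies.
type Merge = (updates: Update[]) => Update;

interface Snapshot {
  upToSeq: number; // the snapshot covers updates 1..upToSeq
  state: Update;
}

// Hypothetical append-only log that also stores a snapshot every `snapshotEvery` updates.
class LogWithSnapshots {
  private updates: Update[] = []; // updates[i] has sequence number i + 1
  private snapshots: Snapshot[] = []; // ascending by upToSeq
  private readonly merge: Merge;
  private readonly snapshotEvery: number;

  constructor(merge: Merge, snapshotEvery: number) {
    this.merge = merge;
    this.snapshotEvery = snapshotEvery;
  }

  append(update: Update): number {
    this.updates.push(update);
    const seq = this.updates.length;
    if (seq % this.snapshotEvery === 0) {
      this.snapshots.push({ upToSeq: seq, state: this.loadAt(seq) });
    }
    return seq;
  }

  // State as of `seq`: start from the newest snapshot at or before `seq`,
  // then merge only the updates after it instead of the whole history.
  loadAt(seq: number): Update {
    let base: Snapshot | undefined;
    for (const s of this.snapshots) {
      if (s.upToSeq > seq) break;
      base = s;
    }
    const from = base ? base.upToSeq : 0;
    const tail = this.updates.slice(from, seq);
    return this.merge(base ? [base.state, ...tail] : tail);
  }

  loadLatest(): Update {
    return this.loadAt(this.updates.length);
  }
}
```

Keeping the raw updates alongside the snapshots is what preserves point-in-time reads; the snapshots only shorten the replay.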
What are your thoughts and experiences on Y.js document storage alternatives? Am I missing something relevant here?
For light use, you can just persist the whole blob to Postgres alongside the serialized JSON. For any “real” production app, you should first write the updates to e.g. Redis and persist the whole binary to Postgres at debounced intervals and on eviction. This avoids needlessly bombarding your production DB.
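A sketch of that write-behind setup, assuming a `Map` stands in for Redis and a `persist` callback stands in for an upsert into Postgres (all names here are hypothetical). Time is passed in explicitly so the logic is easy to test; in a server you would call `tick(Date.now())` from a timer. On flush it stores the whole document binary, which in Yjs could come from `Y.encodeStateAsUpdate(doc)`.

```typescript
type Update = Uint8Array;

// Hypothetical write-behind persistence: updates are buffered in a fast store (Redis in
// the post; a Map here) and the whole document binary is written to the primary DB
// after `debounceMs` without new updates, or immediately when the doc is evicted.
class WriteBehindPersistence {
  private buffered = new Map<string, Update[]>(); // stand-in for a Redis list per doc
  private lastWriteMs = new Map<string, number>(); // docs with unflushed changes
  private readonly encodeState: (docName: string) => Update; // e.g. Y.encodeStateAsUpdate(doc)
  private readonly persist: (docName: string, state: Update) => void; // e.g. Postgres upsert
  private readonly debounceMs: number;

  constructor(
    encodeState: (docName: string) => Update,
    persist: (docName: string, state: Update) => void,
    debounceMs: number,
  ) {
    this.encodeState = encodeState;
    this.persist = persist;
    this.debounceMs = debounceMs;
  }

  onUpdate(docName: string, update: Update, nowMs: number): void {
    const list = this.buffered.get(docName) ?? [];
    list.push(update); // in Redis this would be a cheap RPUSH, kept until the flush
    this.buffered.set(docName, list);
    this.lastWriteMs.set(docName, nowMs); // every update pushes the deadline back
  }

  // Call periodically, e.g. setInterval(() => p.tick(Date.now()), 1000).
  tick(nowMs: number): void {
    const due: string[] = [];
    this.lastWriteMs.forEach((t, docName) => {
      if (nowMs - t >= this.debounceMs) due.push(docName);
    });
    for (const docName of due) this.flush(docName);
  }

  // Call when a document is unloaded from memory so buffered changes are not lost.
  evict(docName: string): void {
    this.flush(docName);
  }

  bufferedCount(docName: string): number {
    return this.buffered.get(docName)?.length ?? 0;
  }

  private flush(docName: string): void {
    if (!this.lastWriteMs.has(docName)) return; // nothing unflushed
    this.persist(docName, this.encodeState(docName)); // whole doc, not just the deltas
    this.buffered.delete(docName); // the stored state now covers these updates
    this.lastWriteMs.delete(docName);
  }
}
```

One caveat of a pure debounce: it never fires while a document is edited continuously, so a maximum wait, or a throttle instead of a debounce, may be preferable.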
It’s not recommended to turn off garbage collection (GC) for Yjs docs, as that balloons the size of any moderately used doc. As a result, you can’t have a fully traversable history, so to make sure you have exact snapshots you should add another table and store the doc state binaries/JSON there.
Anything beyond that only comes up once you have actual traffic, and those problems are probably not relevant for most users of Yjs.
I get a condescending vibe from this; please don’t dismiss other people’s applications as “not having actual traffic”. I, for one, am very interested in optimized solutions for high traffic and high performance, as well as in random access to change history and so on, and especially in hearing how challenging cases have been implemented. This is why I asked.
Okay, I see. I’m not being condescending, just saying, from personal experience: don’t build a rocket ship until you need one. Yjs can be a bit of a time sink, to the detriment of your X note app.
And storing individual updates in the DB seems less than ideal for long-term storage. It’s basically the same as turning GC off, but worse, since you need to retrieve the updates separately to apply them.
Liveblocks - Hosted datastore for local‑first sync engines. Liveblocks Yjs is a realtime sync engine designed for collaborative text editors such as Notion and Google Docs.
y-crossws - Yjs server for crossws.unjs.io that works with Node.js, Deno, Bun, and Cloudflare Workers without any framework dependency (research project).
I don’t have any experience with any of these yet; we use Hocuspocus as the Yjs backend for our PKM app Clibu Notes.
You can also run compaction on the individual-updates store, e.g. merge all updates into a single one at a set interval, or once a certain number of updates has accumulated. That optimizes the read path a bit.
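Count-triggered compaction on an append-only update store might look like the following sketch. The in-memory `rows` array stands in for a database table, `merge` would be `Y.mergeUpdates` in a Yjs app, and `maxRows` is an illustrative threshold, not a recommended value. For interval-based compaction, `compact()` could instead be called from a timer.

```typescript
type Update = Uint8Array;
type Merge = (updates: Update[]) => Update; // Y.mergeUpdates in a Yjs app

// Hypothetical append-only update store with count-triggered compaction.
class CompactingUpdateStore {
  private rows: Update[] = []; // stand-in for rows of an updates table
  private readonly merge: Merge;
  private readonly maxRows: number;

  constructor(merge: Merge, maxRows: number) {
    this.merge = merge;
    this.maxRows = maxRows;
  }

  append(update: Update): void {
    this.rows.push(update);
    if (this.rows.length > this.maxRows) this.compact();
  }

  // Replace the stored rows with a single merged row. Against a real database, merge
  // only the rows you read (e.g. id <= max id seen) and do the insert plus delete in one
  // transaction, so updates appended concurrently are not lost.
  compact(): void {
    if (this.rows.length <= 1) return;
    this.rows = [this.merge(this.rows)];
  }

  // Read path: after compaction there are fewer rows to fetch and merge.
  read(): Update {
    return this.merge(this.rows);
  }

  rowCount(): number {
    return this.rows.length;
  }
}
```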
Exactly! On my hobby project ourboard.io I’ve applied this approach:
Document state is persisted in a throttled manner every couple of seconds, so that all updates during that period are merged into a single update, which is then appended to the database as a new row.
When there are more than N stored update rows, a compaction is run, where a larger batch of rows is merged into a single update.
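The throttled persistence step described above could be sketched as follows (hypothetical names; `merge` would again be `Y.mergeUpdates`, and `appendRow` an INSERT into the updates table). Unlike a debounce, the window opens with the first pending update and is not extended by later ones, so a continuously edited document still gets a new row roughly every `intervalMs`.

```typescript
type Update = Uint8Array;
type Merge = (updates: Update[]) => Update; // Y.mergeUpdates in a Yjs app

// Hypothetical throttled appender: all updates that arrive within one window are merged
// into a single update and appended as one new row.
class ThrottledAppender {
  private pending: Update[] = [];
  private windowStartMs = 0;
  private readonly merge: Merge;
  private readonly appendRow: (merged: Update) => void; // e.g. INSERT into the updates table
  private readonly intervalMs: number;

  constructor(merge: Merge, appendRow: (merged: Update) => void, intervalMs: number) {
    this.merge = merge;
    this.appendRow = appendRow;
    this.intervalMs = intervalMs;
  }

  onUpdate(update: Update, nowMs: number): void {
    if (this.pending.length === 0) this.windowStartMs = nowMs; // first update opens a window
    this.pending.push(update); // later updates do not extend the window
  }

  // Call from a timer; writes at most one row per window.
  tick(nowMs: number): void {
    if (this.pending.length === 0 || nowMs - this.windowStartMs < this.intervalMs) return;
    this.flush();
  }

  // Write whatever is pending right away, e.g. on shutdown or document eviction.
  flush(): void {
    if (this.pending.length === 0) return;
    this.appendRow(this.merge(this.pending));
    this.pending = [];
  }
}
```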
Admittedly, this “compaction” step is not compatible with the goal of also being able to return to any save point later. In another project, I optimized instead by storing a snapshot every now and then, to avoid having to run through all updates when reading state from DB.
Generally, I believe that appending updates instead of storing full state makes a lot of sense, because it makes each individual write transaction lightweight, and append-only writes of idempotent updates are not prone to race conditions. With a stateful server, Y.js database traffic is mostly writes, because the document state only needs to be read from the DB when a document is opened. It could be something like 1 read for every 100 writes.
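To illustrate why appends avoid the race, here is a toy model, not Yjs itself: a document state is a set of operation ids and merging is set union, which, like applying Yjs updates, is commutative and idempotent. Two writers doing read-modify-write on a single snapshot row can lose one change; the same writers appending rows cannot, and a duplicated (retried) append is harmless.

```typescript
// Toy model, not Yjs: a document state is a set of operation ids, and merging is set
// union, which (like applying Yjs updates) is commutative and idempotent.
type State = Set<string>;

function union(states: State[]): State {
  const out: State = new Set();
  for (const s of states) s.forEach((op) => out.add(op));
  return out;
}

// Read-modify-write on one snapshot row: both writers read the same old state, and the
// second write silently overwrites the first writer's change.
function racingSnapshotWrites(): State {
  let row: State = new Set(["a"]);
  const readByWriter1 = row;
  const readByWriter2 = row; // read before writer 1 has written
  row = union([readByWriter1, new Set(["w1"])]);
  row = union([readByWriter2, new Set(["w2"])]); // "w1" is lost here
  return row;
}

// Append-only: each writer inserts only its own update without reading first, so there
// is nothing to overwrite. Order does not matter, and a retried insert is harmless.
function racingAppends(): State {
  const rows: State[] = [new Set(["a"])];
  rows.push(new Set(["w2"]));
  rows.push(new Set(["w1"]));
  rows.push(new Set(["w1"])); // duplicate, e.g. a retry after a timeout
  return union(rows);
}
```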
Thanks for these tips. It seems that these projects are run by companies that want me to host my apps on their platforms. There’s nothing wrong with that! Even though that’s not what I’m currently looking for, it’s good to be aware of these options for when the time comes to quickly build something with minimal hassle.
I had a glance at Y-Sweet in particular, and it seems easy to set up both locally and hosted on Jamsocket. What I found lacking was the configurability of the Hocuspocus server; I’ve found its configurability and extensive hook support very useful. For instance, with Hocuspocus you can:
Run it as part of an Express application
Add a custom persistence layer
Perform operations on document load, which is very useful for repairs, cleanup and document migrations (needed when your data model changes)
Add server-side observers for data, so you can, for instance, prevent unwanted things from happening
I may be wrong, but I got the impression that you can’t do these things with Y-Sweet, which is fine: they occupy a different niche.