Very large datasets


I’m in the planning stage of a project, and I think the YJS data model makes a lot of sense for what I’m trying to do, but I want to architect it in a way that scales.

Basically, I have objects that only progress forward (little state machines), each with an associated hash. I’d like to create one big document mapping “name” to “current hash”. There will be rules governing when the current hash may change, but the only operation on the CRDT document itself will ever be “set name to hash”.

The catch here is this could grow quite quickly to hundreds of thousands of key/value pairs. What I like about YJS is the garbage collection, but I’m trying to figure out how to architect it so that the state doesn’t become insanely large.

Sharding is an obvious answer but then “how many shards” is the question… is there a way to grow that set somehow in a CRDT environment?
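To make the "how many shards" question concrete, here is a minimal sketch of one possible scheme. It is not something Yjs provides. Each name is assigned to a shard by the low bits of a deterministic hash, so raising the bit depth by one splits every shard in two without moving keys between unrelated shards. The names `fnv1a`, `shardFor`, and the `depth` parameter are hypothetical, and how each shard maps onto a separate document is left open.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units.
// Deterministic and cheap, but not cryptographic.
function fnv1a(input: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h >>> 0
}

// Shard index taken from the low `depth` bits of the hash,
// giving 2^depth shards in total (depth 0 means a single shard).
function shardFor(name: string, depth: number): number {
  if (!Number.isInteger(depth) || depth < 0 || depth > 30) {
    throw new RangeError('depth must be an integer between 0 and 30')
  }
  return fnv1a(name) & ((1 << depth) - 1)
}
```

With this layout, a key that lives in shard `s` at depth `d` ends up in either `s` or `s + 2^d` at depth `d + 1`, so growing the shard set only ever splits existing shards. Every peer would still need to agree on the current depth, which is a coordination problem this sketch does not address.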

Anyway - I don’t need a full solution… just interested in ideas on how to manage large state sets that could grow unbounded over time.

Thanks! YJS is awesome.

Hey @tobowers,

There are many CRDT implementations for key-value storage. Yjs’s key-value implementation keeps a reference to every key string ever created and might therefore be unsuited to your project. It does, however, handle updates to existing keys very efficiently.

The best approach here is to test and benchmark. 100k key/value pairs sounds like something Yjs can handle, though other implementations might be better suited to this exact scenario.
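A minimal sketch of such a benchmark, assuming the `yjs` npm package is installed: it fills a single `Y.Map` with name/hash pairs, optionally overwrites every key a few times, and reports the size of the encoded document. The helper name `measureMapSize`, the map name `'hashes'`, and the key/value formats are made up for illustration. Real names and hashes will be longer and will change the numbers.

```typescript
import * as Y from 'yjs'

// Hypothetical helper: write `count` name -> hash entries into a Y.Map,
// overwrite each entry `rewrites` more times, then measure the encoded state.
function measureMapSize(
  count: number,
  rewrites = 0
): { entries: number; bytes: number; ms: number } {
  const doc = new Y.Doc() // garbage collection is on by default
  const hashes = doc.getMap<string>('hashes')
  const start = Date.now()
  for (let round = 0; round <= rewrites; round++) {
    // One transaction per round keeps the writes batched.
    doc.transact(() => {
      for (let i = 0; i < count; i++) {
        hashes.set(`name-${i}`, `hash-${i}-${round}`)
      }
    })
  }
  // Size of the full document state as it would be stored or synced.
  const bytes = Y.encodeStateAsUpdate(doc).byteLength
  return { entries: hashes.size, bytes, ms: Date.now() - start }
}
```

Comparing `measureMapSize(100_000)` with `measureMapSize(100_000, 10)` gives a rough picture of how much repeated "set name to hash" updates add on top of the initial key set. Results depend on the Yjs version and on how writes are spread across clients, so treat them as a starting point only.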