Yjs and AWS Amplify DataStore

mdegrees · April 21, 2021, 6:21pm

DataStore is a persistent on-device storage engine from AWS. Under the hood it uses IndexedDB on the web and SQLite on mobile. Its compelling feature is the ability to model data with GraphQL syntax. You can then interact with the data models online or offline, and DataStore takes care of syncing the data to DynamoDB. https://aws.amazon.com/blogs/aws/amplify-datastore-simplify-development-of-offline-apps-with-graphql/

My initial plan for using Yjs with DataStore was:

1. Create an updates table.
2. On a remote update, apply the update to Yjs.
3. On a local update, write the update to the updates table. Through the syncing mechanism, the update ends up in DynamoDB and is then dispatched to whoever is listening for it, at which point they run step 2.
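The plan above could be sketched roughly as follows. This is only an illustration: `CrdtDoc`, `UpdatesTable`, `bindDocToTable`, and `REMOTE_ORIGIN` are hypothetical names, not the Yjs or DataStore API. In a real app I would expect the document side to map onto `Y.applyUpdate(doc, update, origin)` and `doc.on('update', (update, origin) => …)`, and the table side onto `DataStore.save` and `DataStore.observe`.

```typescript
// Hypothetical stand-ins: neither interface is the real Yjs or DataStore API.
type Update = Uint8Array;

interface CrdtDoc {
  // Apply an encoded update and tag it with an origin marker.
  applyUpdate(update: Update, origin: unknown): void;
  // Called for every update applied to the document, with the origin it was tagged with.
  onUpdate(listener: (update: Update, origin: unknown) => void): void;
}

interface UpdatesTable {
  // Step 3: write a local update; the sync engine forwards it to the backend.
  save(update: Update): void;
  // Step 2: receive updates that arrive through the sync engine.
  observe(listener: (update: Update) => void): void;
}

// Marker used to recognise updates that this binding applied itself.
const REMOTE_ORIGIN = Symbol("remote");

function bindDocToTable(doc: CrdtDoc, table: UpdatesTable): void {
  // Remote update: apply it to the document, tagged as remote.
  table.observe((update) => doc.applyUpdate(update, REMOTE_ORIGIN));

  // Local update: write it to the table. Updates tagged as remote are
  // skipped, otherwise every received update would be written back again.
  doc.onUpdate((update, origin) => {
    if (origin !== REMOTE_ORIGIN) table.save(update);
  });
}
```

The origin check is the important part: without it, each client would echo every update it receives back into the table. If the table's observer also reports a client's own writes, that should be harmless with Yjs, since applying the same update twice has no further effect.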

Then I stumbled upon @dmonad’s comment on why it’s better/cheaper to use a custom WebSocket server, as writing granular updates to a database is expensive/slow. https://github.com/yjs/yjs/issues/189#issuecomment-707703284

I set out to build a custom WebSocket server, only to find out later that I would have to write updates to DynamoDB anyway in order to persist data. See “bindState” in: https://github.com/yjs/yjs/issues/170#issuecomment-536464934

Can anyone please help me clear up this confusion? Is piggybacking on DataStore’s syncing engine to forward updates even a good idea?

Thank you

dmonad · April 22, 2021, 6:13pm

Generally, Yjs works over any network protocol, so you can absolutely plug it into DataStore.
I don’t have any experience with DataStore at all, but as far as I understand, the idea sounds promising.

However, databases with complex abstractions (SQL / GraphQL / …) always come with overhead. PaaS providers will charge $$ for operations on the database. Collaborative applications make a lot of write & read requests (at least one r/w request per keystroke). Furthermore, when you want to propagate cursor & presence information, you will quickly end up with a lot of (very small) updates per second. DynamoDB, for example, costs $1.25 per million write requests, which is fair, but you could do a lot better.
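To put the per-write price into perspective, here is a rough back-of-the-envelope calculation. Only the $1.25 per million writes figure comes from the paragraph above; the usage profile (5 updates per second, 2 active hours a day, 22 days a month) and all function names are assumptions made up for illustration.

```typescript
// Price quoted above; check the current AWS pricing page before relying on it.
const USD_PER_MILLION_WRITES = 1.25;

// Cost of a given number of write requests at a flat per-million price.
function writeCostUSD(writes: number, usdPerMillion: number = USD_PER_MILLION_WRITES): number {
  return (writes / 1e6) * usdPerMillion;
}

// Hypothetical usage profile: every number put in here is an assumption.
interface UsageProfile {
  updatesPerSecond: number; // document edits plus cursor/presence updates
  activeHoursPerDay: number;
  activeDaysPerMonth: number;
}

// Write requests one user generates per month if every update is persisted individually.
function monthlyWritesPerUser(p: UsageProfile): number {
  return p.updatesPerSecond * 3600 * p.activeHoursPerDay * p.activeDaysPerMonth;
}

const profile: UsageProfile = { updatesPerSecond: 5, activeHoursPerDay: 2, activeDaysPerMonth: 22 };
const monthlyWrites = monthlyWritesPerUser(profile); // 5 * 3600 * 2 * 22 = 792,000
console.log(`${monthlyWrites} writes/user/month costs about $${writeCostUSD(monthlyWrites).toFixed(2)}`);
```

Under these assumptions a single user produces 792,000 writes a month, or roughly $0.99 for the writes alone. That excludes read requests, delivery of each update to the other clients, and any charges of the sync layer itself, so the real figure would be higher.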

Can anyone please help me clear up this confusion? Is piggybacking on DataStore’s syncing engine to forward updates even a good idea?

The New York Times, for example, wrote their internal collaborative editing application on top of Firestore, which is very expensive in ops/sec. It is only used by their writers. Obviously, a large corporation can spend $10/user in costs for simply propagating changes (pricing is just a rough estimate in my head). You can’t publish such an application to the public, though, because you would have to charge too much.

My advice is to go through their pricing plan and check how expensive this would be. The “ideal” solution would use a custom WebSocket connection and something like a Redis cache to avoid write requests to a database. But it’s completely fair to throw money at the problem to reduce maintenance overhead.
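One way to read the “cache to avoid write requests” idea is sketched below: the server forwards every update to the other clients right away, but collects updates in a buffer and writes them to the database as a single record every so often. `UpdateBuffer` and `concatUpdates` are hypothetical names, and a plain in-memory array stands in for Redis. The default merge is simple byte concatenation, which is only a placeholder; with Yjs, I would expect `Y.mergeUpdates` to be passed in as the merge function instead.

```typescript
// Placeholder merge: plain concatenation. NOT a valid way to combine real Yjs updates.
function concatUpdates(updates: Uint8Array[]): Uint8Array {
  const total = updates.reduce((n, u) => n + u.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const u of updates) {
    out.set(u, offset);
    offset += u.length;
  }
  return out;
}

// Collects small updates and persists them as one merged record per flush.
class UpdateBuffer {
  private pending: Uint8Array[] = [];

  constructor(
    private readonly persist: (merged: Uint8Array) => void,
    private readonly merge: (updates: Uint8Array[]) => Uint8Array = concatUpdates
  ) {}

  // Called for every incoming update; does not touch the database.
  push(update: Uint8Array): void {
    this.pending.push(update);
  }

  // Number of updates waiting to be persisted.
  get size(): number {
    return this.pending.length;
  }

  // Writes everything collected so far as one record; returns false when there was nothing to write.
  flush(): boolean {
    if (this.pending.length === 0) return false;
    const batch = this.pending;
    this.pending = [];
    this.persist(this.merge(batch));
    return true;
  }
}
```

Calling `flush()` from a timer (say, every few seconds) or when the last client disconnects turns hundreds of keystroke-sized writes into one. The trade-off is that updates still sitting in the buffer are lost if the server crashes before the next flush, which is where an external cache such as Redis would help.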

mdegrees · April 26, 2021, 9:59am

@dmonad, crystal clear. Thank you for the thorough answer. I hadn’t thought about the downsides of forwarding cursor and awareness updates. Indeed, it would be such a waste of resources if they were channeled through a “heavy” backend. I will try to use a WebSocket API from AWS and persist to a DynamoDB table.