Y-prosemirror persistence

flow · December 2, 2020, 1:16pm

Great to hear your thoughts! I'm absolutely with you that the target should be a resilient system that scales. Good to hear that you are also considering DynamoDB.

I did not really get how the provisioned concurrency of Lambdas fits in.
I assume that many collaborations are happening in different ProseMirror documents, where each document's state is stored in its own Y.Doc.
If that is the case, the cached states have to be kept apart per document. I think caching every document in the Lambdas might not be ideal.
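To make the "differentiation between cached states" a bit more concrete, here is a minimal sketch of a cache keyed by document ID that only keeps the most recently used documents. Everything in it is an assumption of mine for illustration: `DocCache`, `maxDocs` and the `load` callback are hypothetical names, `T` stands in for whatever is cached per document (for example a Y.Doc), and the loader is synchronous only to keep the sketch short (a real DynamoDB read would be async).

```typescript
// Hypothetical per-document cache with least-recently-used eviction.
// T stands in for whatever is cached per document (e.g. a Y.Doc).
class DocCache<T> {
  private entries = new Map<string, T>();

  constructor(
    private maxDocs: number,
    private load: (docId: string) => T, // assumed loader, e.g. a database read
  ) {}

  get(docId: string): T {
    let doc = this.entries.get(docId);
    if (doc === undefined) {
      // Cache miss: fetch the state for exactly this document.
      doc = this.load(docId);
    } else {
      // Cache hit: remove it so the re-insert below marks it as most recent.
      this.entries.delete(docId);
    }
    this.entries.set(docId, doc);

    if (this.entries.size > this.maxDocs) {
      // A Map keeps insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return doc;
  }

  has(docId: string): boolean {
    return this.entries.has(docId);
  }
}
```

With a bound like this, a warm Lambda instance would hold only a handful of active documents instead of every document it has ever seen.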

Another concern is that Yjs needs a one-to-many architecture for collaborative real-time editing. That means I somehow have to look up all the relevant client connection IDs and send them the changes. A blog post on AWS does this via a third Lambda function that iterates over all the connections and sends them the changes. As one of my earlier links (the first one in my last post) states, this does not scale very well.
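A rough sketch of that "iterate over all connections" step, under my own assumptions: `broadcast`, `Send` and `BroadcastResult` are hypothetical names, and `send` is an injected function standing in for whatever actually pushes data to one connection (with API Gateway that would be a call to its connection management API). For brevity, every failed send is treated as a stale connection, which a real implementation would want to distinguish more carefully.

```typescript
// Injected transport: pushes one update to one connection (assumption).
type Send = (connectionId: string, update: Uint8Array) => Promise<void>;

interface BroadcastResult {
  delivered: string[];
  stale: string[]; // connections whose send failed; candidates for cleanup
}

async function broadcast(
  connectionIds: string[],
  senderId: string,
  update: Uint8Array,
  send: Send,
): Promise<BroadcastResult> {
  const result: BroadcastResult = { delivered: [], stale: [] };
  // The sender already has its own change, so skip it.
  const targets = connectionIds.filter((id) => id !== senderId);

  // Send in parallel; a failure for one connection must not abort the rest.
  await Promise.all(
    targets.map((id) =>
      send(id, update).then(
        () => { result.delivered.push(id); },
        () => { result.stale.push(id); },
      ),
    ),
  );
  return result;
}
```

The cost is one send per connected client per update, which is where the scalability concern comes from.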

Another thing that keeps me thinking is that (if I understood it correctly) it could be possible to apply changes without loading the complete Y.Doc instance. This could be a huge benefit! (But it would also be specific to the provider.) To fully understand this feature, though, I have to dig deeper into the core of Yjs and CRDTs.
The related post:

As far as I can tell, there are two approaches to realising a one-to-many architecture (only a short high-level overview):

Most common approach (also supported by the current Yjs providers):

Sticky WebSocket connections, where each client is connected to one host.
To scale horizontally, a Redis Pub/Sub mechanism distributes changes to all the other hosts, which then send the changes to their connected clients.
The problem here is scaling down, because of the long-lived sticky sessions.
For this approach there are currently the y-websocket and y-redis packages.
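To illustrate how the first approach relays changes between hosts, here is a small in-memory sketch. It is not how y-websocket or y-redis are implemented; `Bus` is a hypothetical stand-in for a Redis Pub/Sub channel, and `Host` and `Listener` are names I made up for the example.

```typescript
type Listener = (update: Uint8Array) => void;
type Handler = (docId: string, update: Uint8Array, originHost: string) => void;

// In-memory stand-in for a Redis Pub/Sub channel (assumption for this sketch).
class Bus {
  private handlers: Handler[] = [];
  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }
  publish(docId: string, update: Uint8Array, originHost: string): void {
    this.handlers.forEach((h) => h(docId, update, originHost));
  }
}

class Host {
  // docId -> clients with a sticky connection to this host
  private clients = new Map<string, Listener[]>();

  constructor(public name: string, private bus: Bus) {
    bus.subscribe((docId, update, originHost) => {
      // Updates that originated here were already delivered locally.
      if (originHost !== this.name) this.deliverLocal(docId, update);
    });
  }

  connect(docId: string, listener: Listener): void {
    const list = this.clients.get(docId) || [];
    list.push(listener);
    this.clients.set(docId, list);
  }

  receiveFromClient(docId: string, update: Uint8Array, sender: Listener): void {
    this.deliverLocal(docId, update, sender);      // same-host clients
    this.bus.publish(docId, update, this.name);    // every other host
  }

  private deliverLocal(docId: string, update: Uint8Array, except?: Listener): void {
    (this.clients.get(docId) || []).forEach((client) => {
      if (client !== except) client(update);
    });
  }
}
```

Each host only ever talks to its own sockets, which is also why a host cannot be shut down while clients are still attached to it.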

A newer, emerging approach that not many people have tried so far:

The client's WebSocket connection is handled by a gateway that holds the client connections, converts incoming messages into HTTP requests, and sends them to the backend.
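A toy model of the second approach, just to show the division of labour. All names here (`Gateway`, `GatewayRequest`, `makeBackend`, the route names) are hypothetical, the backend call is a plain function call standing in for an HTTP request, and a `Set` stands in for a connection table in a database.

```typescript
interface GatewayRequest {
  route: "connect" | "message" | "disconnect";
  connectionId: string;
  body?: string;
}

type Backend = (req: GatewayRequest, gateway: Gateway) => void;

// The gateway owns the sockets; the backend never sees them directly.
class Gateway {
  private sockets = new Map<string, (data: string) => void>();

  constructor(private backend: Backend) {}

  open(connectionId: string, deliver: (data: string) => void): void {
    this.sockets.set(connectionId, deliver);
    this.backend({ route: "connect", connectionId }, this);
  }

  receive(connectionId: string, body: string): void {
    // A socket frame becomes a stateless request to the backend.
    this.backend({ route: "message", connectionId, body }, this);
  }

  close(connectionId: string): void {
    this.sockets.delete(connectionId);
    this.backend({ route: "disconnect", connectionId }, this);
  }

  // Callback path: the backend asks the gateway to push data to a client.
  postToConnection(connectionId: string, data: string): boolean {
    const deliver = this.sockets.get(connectionId);
    if (!deliver) return false;
    deliver(data);
    return true;
  }
}

// Stateless backend: all state lives in the external connection table.
function makeBackend(connectionTable: Set<string>): Backend {
  return (req, gateway) => {
    if (req.route === "connect") {
      connectionTable.add(req.connectionId);
    } else if (req.route === "disconnect") {
      connectionTable.delete(req.connectionId);
    } else {
      connectionTable.forEach((id) => {
        if (id !== req.connectionId) gateway.postToConnection(id, req.body || "");
      });
    }
  };
}
```

Because the backend holds no sockets, it can scale up and down freely; the long-lived connections are the gateway's problem.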

I personally think the second approach will be the future, and it is the one I want to try, although there is still a lot to explore to get it running.
Happy to share my findings along the way and to collaborate on the topic.