Implementing end-to-end encryption

dhmacs · December 9, 2020, 3:30pm

I’ve been reading the docs and some of the Y.js repos in the past few days. First of all, impressive work!
Anyway, I’m experimenting with CRDTs and I would like to know how I could add e2ee when using Y.js. Ideally I’d like to put something like a middleware in the y-websocket provider. Since I’d like to store documents on a server, I’d like to keep the state vector in plain text so that I can query the database (of course there is no need to index documents, as that would break the e2ee requirement).

dmonad · December 9, 2020, 4:18pm

Hi @dhmacs,

I propose that you don’t interpret the data on the server when doing e2e encryption. You can simply store a linear history of all the document updates. If you enumerate them, the clients can specifically ask for a range of document updates.
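
As a minimal sketch of what "not interpreting the data" can look like from the client side: each document update (a Uint8Array in Yjs) is encrypted before it leaves the client, so the server only ever stores opaque bytes. The cipher (AES-256-GCM via Node's built-in crypto module), the blob layout and the names encryptUpdate / decryptUpdate are assumptions made for illustration, not part of Yjs. How the clients share the key is out of scope here.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require("node:crypto");

// Assumed blob layout: 12-byte nonce | 16-byte auth tag | ciphertext.
const NONCE_LEN = 12;
const TAG_LEN = 16;

// Encrypt one document update (Uint8Array) with a 32-byte key shared by the clients.
function encryptUpdate(key, update) {
  const nonce = randomBytes(NONCE_LEN); // fresh random nonce for every update
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const ciphertext = Buffer.concat([cipher.update(update), cipher.final()]);
  const tag = cipher.getAuthTag();
  return new Uint8Array(Buffer.concat([nonce, tag, ciphertext]));
}

// Decrypt a blob produced by encryptUpdate. Throws if the blob was modified.
function decryptUpdate(key, blob) {
  const buf = Buffer.from(blob);
  const nonce = buf.subarray(0, NONCE_LEN);
  const tag = buf.subarray(NONCE_LEN, NONCE_LEN + TAG_LEN);
  const ciphertext = buf.subarray(NONCE_LEN + TAG_LEN);
  const decipher = createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAuthTag(tag);
  return new Uint8Array(Buffer.concat([decipher.update(ciphertext), decipher.final()]));
}
```

A client would call encryptUpdate on every update it produces before sending it, and decryptUpdate on every blob it receives before passing the result to Yjs.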

There is room for optimization. In case the history of document updates is too long, you can ask the clients (that have read-permission) to merge ranges of document updates. Very soon this API might be helpful to you to merge ranges. In the meantime, you can easily polyfill the missing API by creating a Yjs document to merge document updates.
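
The polyfill mentioned above could look roughly like the following sketch. It relies only on Y.Doc, Y.applyUpdate and Y.encodeStateAsUpdate; the function name mergeUpdatesViaDoc is made up, and the Yjs module is passed in as a parameter so the snippet does not assume how you load it. The updates must already be decrypted, so this runs on a client, never on the server.

```javascript
// Merge several (decrypted) document updates into a single update by
// replaying them into a temporary Yjs document.
// `Y` is the Yjs module, e.g. `const Y = require("yjs")`.
function mergeUpdatesViaDoc(Y, updates) {
  const doc = new Y.Doc();
  for (const update of updates) {
    Y.applyUpdate(doc, update);
  }
  // A single update that contains the whole state of the temporary document.
  const merged = Y.encodeStateAsUpdate(doc);
  doc.destroy();
  return merged;
}
```

This is probably safest for a range that starts at the beginning of the history, since updates whose dependencies are missing cannot be fully integrated into the temporary document.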

Hope this gave you a new perspective.

Helpful docs about how document updates work: https://docs.yjs.dev/api/document-updates

y-websocket server probably can’t be extended to support e2e encryption because it needs to interpret document updates.

dhmacs · December 9, 2020, 4:53pm

“If you enumerate them, the clients can specifically ask for a range of document updates.”

Is there any guide on how to achieve this?
Specifically, I’m looking to use something like DynamoDB to store updates. The partition key (PK) would be something like the document guid (this will allow me to assign a partition to each document), but I’m not quite sure how to design the sort key (SK).
My understanding is that the client sends the state vector (map<clientId, nextClock>) to the server, but I’m not quite sure how to use this information to retrieve the missing updates.

One possible idea would be to design the sort key like this: clientId#clock. Then I could query for updates like this:
get all updates with PK = docGuid and SK >= minClientId#clock

But I don’t know if it makes sense. Can this be a solution to store encrypted updates on the server and be able to retrieve only the missing ones?

dmonad · December 9, 2020, 8:02pm

If you send the state vector to the server, the server can reply with a “diff” containing all the missing information needed to sync up the client. But in order to do that, the server needs to interpret the data, which is not a good idea if you intend to build an e2e encrypted service. Unless we have a different idea of what e2e encryption means.

This is why I’m proposing that you handle synchronization yourself. You have a central server anyway (DynamoDB as the central source of truth). For each update you receive, you attach an increasing clock to the update and store it in your database. The clients then ask for the ranges they are missing. This is basically how document updates in Yjs work anyway.
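
A minimal in-memory sketch of that scheme, with the database replaced by a plain array so the idea stays visible. The class name UpdateLog and its methods are hypothetical. With DynamoDB, the clock would presumably become the sort key under the document guid, and assigning it would have to be atomic. The server never looks inside the stored bytes.

```javascript
// Append-only log of opaque (encrypted) updates for a single document.
class UpdateLog {
  constructor() {
    this.entries = []; // { clock, blob }, ordered by clock
    this.nextClock = 0;
  }

  // Store an update and return the clock the server assigned to it.
  append(blob) {
    const clock = this.nextClock++;
    this.entries.push({ clock, blob });
    return clock;
  }

  // Return the entries with fromClock <= clock <= toClock.
  // Omitting toClock returns everything from fromClock onwards.
  range(fromClock, toClock = Infinity) {
    return this.entries.filter((e) => e.clock >= fromClock && e.clock <= toClock);
  }
}

// Client side: remember the highest clock seen so far and ask for the rest.
const log = new UpdateLog();
log.append(new Uint8Array([1]));
log.append(new Uint8Array([2]));
log.append(new Uint8Array([3]));
const lastSeenClock = 0;
const missing = log.range(lastSeenClock + 1);
console.log(missing.map((e) => e.clock)); // [ 1, 2 ]
```

Because the clock is assigned by the server rather than derived from the update, nothing in this log depends on the content of an update.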