Is there any documentation or example that explains (or gives pointers on) how to implement user authorization/access control?
I’d like to persist documents (using LevelDB, I guess) that have a single owner. Some documents can also be shared with selected other users.
I wasn’t able to find any information (I might have looked in the wrong places).
To be clear: I know how to build a “classic” user authentication & authorization backend using PHP/MySQL or Node.js/MongoDB, or even using a provider like Auth0. I’m just not sure about the implications of using Yjs, or how to integrate Yjs with a classic backend.
The threads on this topic are all over the place, which was the main reason for creating this discussion board ^^
So I’m afraid the following links won’t be too helpful:
https://github.com/yjs/y-websocket/issues/7 - Winston Fasset modified the y-websocket server to authenticate clients. I think he eventually adapted the protocol and added an authentication event message. In any case, you need to modify the y-websocket server.
https://github.com/yjs/y-websocket/issues/14 - Everything you need to know about persisting data is here: https://github.com/yjs/yjs#Document-Updates - but again, you probably want to implement this yourself on top of the database you are using. Important: Don’t be tempted to store the document as JSON or as plain text (if you are making a text editor collaborative). The best approach is to simply store the Yjs model (its binary-encoded updates) in a database.
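To make the “store the Yjs model” advice a bit more concrete, here is a minimal sketch, assuming an in-memory Map stands in for LevelDB or whatever database you use. The class `UpdateStore` and its methods are hypothetical; the binary updates themselves would come from Yjs (for example the `update` event on a `Y.Doc`), be replayed on load with `Y.applyUpdate`, and a compacted snapshot could be produced with `Y.encodeStateAsUpdate`.

```typescript
// Minimal sketch: persist a Yjs document as the list of its binary updates.
// Assumption: an in-memory Map stands in for LevelDB or your own database.
class UpdateStore {
  private logs = new Map<string, Uint8Array[]>();

  // Append one binary update (never JSON or plain text) to the document's log.
  append(docName: string, update: Uint8Array): void {
    const log = this.logs.get(docName) ?? [];
    log.push(update.slice()); // store a copy so later mutations can't leak in
    this.logs.set(docName, log);
  }

  // Return all stored updates for a document, e.g. to replay with Y.applyUpdate.
  load(docName: string): Uint8Array[] {
    return [...(this.logs.get(docName) ?? [])];
  }

  // Replace the log with one compacted update, e.g. from Y.encodeStateAsUpdate(doc).
  compact(docName: string, snapshot: Uint8Array): void {
    this.logs.set(docName, [snapshot.slice()]);
  }
}
```

Because Yjs updates are commutative and idempotent, the order in which stored updates are replayed doesn’t matter, which is why a plain append-only log is enough here.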
Authorization, you guessed it, is also something that is too application-specific to be included in Yjs. My suggestion is to work with random document keys that identify the document. Only clients with the correct key have access to the document. This is also how room.sh and many public notepads work. For more fine-grained control you need to implement something on your own.
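A minimal sketch of the random-document-key idea, assuming Node’s built-in `crypto` module and 128 bits of randomness; `createDocumentKey` and `shareLink` are hypothetical helpers. The key doubles as the room name, so anyone who has the link can open the document.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: 16 random bytes (128 bits) encoded as 32 hex characters.
// The key doubles as the room name, so it must be unguessable.
function createDocumentKey(): string {
  return randomBytes(16).toString("hex");
}

// Hypothetical helper: whoever receives this link can open the document.
function shareLink(baseUrl: string, key: string): string {
  return `${baseUrl}/${key}`;
}
```

The flip side of this design is that access can’t be taken back once a link has leaked; the only way to “revoke” it is to move the document to a new key.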
https://github.com/yjs/yjs/issues/170 - another helpful thread. A user shared their y-leveldb implementation there, which you could make use of. Eventually (this is discussed in that thread) I’d like to have a maintained y-leveldb implementation.
I’ll also chime in here and mention that authorization with eventual consistency can be difficult. For example, how would you revoke access to a document once you have already granted access? Put differently, how do you reject unauthorized writes while preserving eventual consistency? In a broader sense, how do you reject writes at all with Yjs?
The short answer is: it’s not really possible in a decentralized way. If you’re going to reject writes, you probably need some sort of consensus among all nodes. More discussion on this topic can be found in the Automerge issue “Claims-based authorization” (automerge/automerge #419 on GitHub).
Authorization (especially revoking access etc.) in a decentralized system is indeed a very hard problem. The problem is, of course, much simpler when there is a central authority that can reject changes. Rejecting writes to a Yjs document is discussed here: "read-only" or one-way only sync
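As a rough sketch of how a central server could reject changes: it can drop incoming sync messages that carry document changes from clients that only have read access. This assumes the y-websocket wire format (a varUint message type, 0 for sync and 1 for awareness, followed for sync messages by the y-protocols sync sub-type: 0 for SyncStep1, 1 for SyncStep2, 2 for Update); the `acceptMessage` function and the `Role` type are hypothetical.

```typescript
// y-websocket top-level message type for sync messages (1 is awareness).
const MESSAGE_SYNC = 0;
// y-protocols/sync sub-types: 0 = SyncStep1, 1 = SyncStep2, 2 = Update.
const SYNC_STEP_1 = 0;

type Role = "read" | "write";

// Decide whether the server should apply/forward a message from a client.
// All type values are < 128, so each lib0 varUint occupies exactly one byte.
function acceptMessage(role: Role, message: Uint8Array): boolean {
  if (role === "write") return true;
  if (message.length === 0) return false; // malformed: drop
  if (message[0] !== MESSAGE_SYNC) return true; // awareness etc. carry no document changes
  // SyncStep1 only sends a state vector (a request for missing data), so it is safe.
  // SyncStep2 and Update contain document changes, i.e. writes: drop them.
  return message.length > 1 && message[1] === SYNC_STEP_1;
}
```

The read-only client’s UI should also prevent editing; otherwise its local copy silently diverges from the server’s.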
It is possible to implement your own user management in y-websocket and reject client connections once access is revoked.
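One possible shape for that user management, matching the original question (one owner per document plus an explicit share list). Everything here is hypothetical: `Conn` stands in for a WebSocket connection, `DocAccess` for your own backend, and 4403 is an arbitrary application-defined close code. The idea is that `connect` is checked during the handshake and `revoke` closes any connections the user still has open.

```typescript
// Hypothetical stand-in for a WebSocket connection.
interface Conn {
  userId: string;
  close(code: number, reason: string): void;
}

// Hypothetical access-control layer: one owner per document plus a share list.
class DocAccess {
  private owners = new Map<string, string>();
  private shares = new Map<string, Set<string>>();
  private conns = new Map<string, Set<Conn>>();

  createDoc(docId: string, ownerId: string): void {
    this.owners.set(docId, ownerId);
    this.shares.set(docId, new Set<string>());
  }

  share(docId: string, userId: string): void {
    this.shares.get(docId)?.add(userId);
  }

  canAccess(docId: string, userId: string): boolean {
    return this.owners.get(docId) === userId || (this.shares.get(docId)?.has(userId) ?? false);
  }

  // Call during the WebSocket handshake, after the user has been authenticated.
  connect(docId: string, conn: Conn): boolean {
    if (!this.canAccess(docId, conn.userId)) {
      conn.close(4403, "forbidden"); // 4403: arbitrary application-defined close code
      return false;
    }
    const open = this.conns.get(docId) ?? new Set<Conn>();
    open.add(conn);
    this.conns.set(docId, open);
    return true;
  }

  // Revoke a share and close every connection that user still has to the document.
  revoke(docId: string, userId: string): void {
    if (this.owners.get(docId) === userId) return; // owners can't be revoked here
    this.shares.get(docId)?.delete(userId);
    const open = this.conns.get(docId);
    if (!open) return;
    for (const conn of Array.from(open)) {
      if (conn.userId === userId) {
        conn.close(4403, "access revoked");
        open.delete(conn);
      }
    }
  }
}
```

Closing the socket only stops future syncs; whatever the client already received stays on its device.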
@dmonad - Yeah, thanks for linking to the one-way sync discussion. I remember reading that a few weeks ago but I couldn’t remember where, ha!
I’ve spoken with Herb about authorization (authZ) as well. I think authentication (authN) is relatively straightforward using cryptography, but as we discussed, it is difficult to ensure eventual consistency if you need to reject unauthorized writes. For this to work, unless I’m missing something, you have to choose between having a quorum of nodes reaching a consensus about rejecting future writes, or you have to choose to reject all writes that were causally dependent on a rejected write. The Automerge issue discusses this at length. The problem with the former solution is, well… implementing consensus. The problem with the latter solution is that a very large number of writes could causally depend on a rejected write, especially if that write is only rejected after a long period of time.
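The second option (rejecting everything that causally depends on a rejected write) amounts to a transitive closure over the dependency graph. A sketch, assuming a hypothetical operation model in which every op lists the IDs of the ops it directly depends on (this is not Yjs’s internal format):

```typescript
// Hypothetical op model: each op lists its direct causal predecessors.
interface Op {
  id: string;
  deps: string[];
}

// Returns the rejected op plus every op transitively built on top of it.
function opsToDiscard(ops: Op[], rejectedId: string): Set<string> {
  // Reverse edges: dependency -> ops that directly depend on it.
  const dependents = new Map<string, string[]>();
  for (const op of ops) {
    for (const dep of op.deps) {
      const list = dependents.get(dep) ?? [];
      list.push(op.id);
      dependents.set(dep, list);
    }
  }
  // Breadth-first walk over the reverse edges.
  const discard = new Set<string>([rejectedId]);
  const queue: string[] = [rejectedId];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of dependents.get(current) ?? []) {
      if (!discard.has(next)) {
        discard.add(next);
        queue.push(next);
      }
    }
  }
  return discard;
}
```

For example, with ops a, b (depends on a), c (depends on b), d (depends on a) and e (depends on c and d), rejecting b also discards c and e. The later a write is rejected, the more ops have been built on top of it, which is exactly the cost described above.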
I can understand that all of this is probably outside the scope of Yjs, but I am starting to realize that using Yjs or other CRDTs in business applications might not be feasible because of this issue. Yet, it seems like something that can be solved. I’d argue that most database writes can commute, so most writes can enjoy the performance benefits of eventual consistency. On the other hand, most applications require at least a small number of writes that require strong consistency (i.e. linearizability).
@dmonad - I’m really interested in your thoughts about this problem, as I think it’s quite fundamental when building robust software applications. Thanks so much for your work on Yjs!
I should also mention that Martin Kleppmann shared a paper with me about RedBlue consistency, the presentation for which I’ll link below. It discusses a methodology that “colors” blue writes as eventually consistent and red writes as linearizable. The criterion for “blue” is essentially commutativity, but they note that “blue” writes should also never violate application invariants. If you meet these criteria, you fall in the eventually consistent (blue) category; otherwise, you fall in the linearizable (red) category. Their preliminary research showed that most writes in “typical” applications are blue, but a few apps required a small number of “red” writes.
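A toy illustration of the blue/red split, not the paper’s method: it assumes a single bank-account balance with the invariant `balance >= 0` and only checks a handful of sample states instead of proving anything. The names `AccountOp` and `color` are hypothetical. Both operations commute (adding and subtracting are order-independent), but `withdraw` can break the invariant, so it comes out red.

```typescript
type Balance = number;

interface AccountOp {
  name: string;
  apply: (b: Balance) => Balance; // the op's effect, applied unconditionally at every replica
}

// Application invariant: the balance may never go negative.
const invariant = (b: Balance): boolean => b >= 0;

// Sample states used for the (non-exhaustive) checks.
const samples: Balance[] = [0, 5, 10, 100];

const deposit: AccountOp = { name: "deposit(10)", apply: (b) => b + 10 };
const withdraw: AccountOp = { name: "withdraw(10)", apply: (b) => b - 10 };
const allOps: AccountOp[] = [deposit, withdraw];

// Does applying `op` and `other` in either order give the same result?
function commutesWithAll(op: AccountOp): boolean {
  return allOps.every((other) =>
    samples.every((s) => op.apply(other.apply(s)) === other.apply(op.apply(s))),
  );
}

// Starting from any valid sample state, does `op` keep the invariant?
function preservesInvariant(op: AccountOp): boolean {
  return samples.filter(invariant).every((s) => invariant(op.apply(s)));
}

// Blue = commutes and is invariant-safe; anything else needs coordination (red).
function color(op: AccountOp): "blue" | "red" {
  return commutesWithAll(op) && preservesInvariant(op) ? "blue" : "red";
}
```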
> For this to work, unless I’m missing something, you have to choose between having a quorum of nodes reaching a consensus about rejecting future writes, or you have to choose to reject all writes that were causally dependent on a rejected write.
Something like this is definitely possible with an adapted version of Yjs. While this is a super interesting topic, I don’t believe it can ever lead to a good user experience. It is never good when edits simply disappear after a while (because you synced permissions with another peer, all of your causally dependent changes are now being removed). It is impossible for casual users to build a mental model of why this happens.
Another disadvantage is that you must associate a user and a timestamp with every operation that is created. You need to propagate this information so that others can reject operations from a user whose access was revoked in the past.
If I were to tackle this problem, I’d try to find a solution that can be easily expressed as a mental model. Ideally, there is a central authority for authorization on a document. But if that’s not possible and we need completely decentralized authorization management, I have two other ideas:
One approach would be to write edits to a blockchain. Users need to check permissions on the blockchain before accepting changes from a user. The blockchain would also help keep all permissions distributed when users are offline. You want to prevent a user from sharing secret information with a user whose access was revoked. If authorization is only handled within the small group of users that have access to the document, it might take quite a while for this information to propagate to all clients. My assumption is that the document will be opened only sporadically. The users might live in different time zones and only be at their computers for a few hours a day. In some cases, it might take weeks until this information has propagated. Blockchains (or any other always-available authorization entity) are an appropriate solution for this kind of problem.
Another approach that would work well for pure p2p applications is to fork the document into a new “room” when a user’s permissions are revoked. The client that forks the document propagates information about the new “room” to the other users in such a way that only users with permission can read it. The user who no longer has access won’t even know that they have lost access (unless you make it explicit).
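A sketch of that extra metadata, assuming a hypothetical operation format (Yjs updates don’t carry an author or timestamp by default): each operation records who made it and when, and peers drop operations made at or after the author’s revocation time.

```typescript
// Hypothetical operation format with the metadata this approach requires.
interface AttributedOp {
  author: string;
  timestamp: number; // assumed comparable across peers (e.g. ms since epoch)
  payload: Uint8Array;
}

// revokedAt maps a user ID to the time their access was revoked.
// Keeps ops by users who were never revoked, or made before the revocation.
function acceptedOps(ops: AttributedOp[], revokedAt: Map<string, number>): AttributedOp[] {
  return ops.filter((op) => {
    const revoked = revokedAt.get(op.author);
    return revoked === undefined || op.timestamp < revoked;
  });
}
```

Note that this trusts the timestamps, so a revoked user could simply backdate their edits unless timestamps are signed or issued by a trusted party; operations that others built on top of the dropped ones would also need the same treatment.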
The fork approach has the advantage that there is a clear mental model of what happens when permissions change. Users that are still in the old room and catch up eventually will have to carry over their offline edits manually, which makes this action explicit. This approach also doesn’t have any additional overhead (like storing user information and a timestamp with each operation).
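A rough sketch of the fork step, under hypothetical assumptions: rooms are identified by random IDs, the document state is an opaque `Uint8Array` (with Yjs you could produce it with `Y.encodeStateAsUpdate(oldDoc)` and load it into the new room with `Y.applyUpdate(newDoc, state)`), and each `Invite` is delivered only to its recipient (in a p2p app, for instance, encrypted to that user’s key). `Room`, `Invite` and `forkRoom` are illustrative names.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical room model: a random ID, its members, and the encoded document state.
interface Room {
  id: string;
  members: string[];
  state: Uint8Array;
}

// Hypothetical message telling one remaining member where the document moved.
interface Invite {
  to: string;
  newRoomId: string;
}

function forkRoom(old: Room, revokedUser: string): { room: Room; invites: Invite[] } {
  const members = old.members.filter((m) => m !== revokedUser);
  const room: Room = {
    id: randomBytes(16).toString("hex"), // fresh, unguessable room ID
    members,
    state: old.state.slice(), // the new room starts from a copy of the current state
  };
  // Only remaining members learn about the new room; the revoked user is not told.
  const invites: Invite[] = members.map((to) => ({ to, newRoomId: room.id }));
  return { room, invites };
}
```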
But these are just a few suggestions. Different solutions have different tradeoffs. I believe that the proper permission model is very application-specific and can’t be generalized.
@dmonad - Thanks for your response. Yeah, both of your approaches essentially rely on consensus for nodes to agree on what the “next” fork of the document should be (i.e. based on access control rules / user authorization). Users whose permissions were revoked end up with the “old” fork of the document. It’s then an application decision whether you simply reject syncs on the “old” fork or allow syncing to continue on the “old” fork while most nodes migrate to the “new” fork, ignoring changes on the “old” fork. The client app could also be written such that the “old” document is deleted. In all cases, the consensus of nodes is what allows writes to continue down the “new” fork.
Consensus could be in the form of a blockchain, a central authoritative server, or a quorum of nodes.
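For the quorum option, the core acceptance rule fits in a few lines. This is a hypothetical, heavily simplified sketch: a proposed fork becomes the “new” document once a strict majority of known nodes has voted for it. A real system would use an established consensus protocol such as Raft or Paxos to deal with crashes, retries and competing proposals.

```typescript
// Hypothetical quorum rule. `votes` maps each node to the single fork proposal it
// voted for, so a node can never support two competing proposals at once.
function hasQuorum(nodes: string[], votes: Map<string, string>, proposalId: string): boolean {
  const yes = nodes.filter((n) => votes.get(n) === proposalId).length;
  return yes > Math.floor(nodes.length / 2); // strict majority
}
```

Because each node holds a single vote, two competing forks can never both reach a strict majority.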