SQL/GraphQL Backend. where do I start?

We have been working on our app for a bit over a year now, it started not being collaborative and now it’s pivoting to add the whole multiplayer experience. The app is a BI dash-boarding building tool. We have data structures that are very nested and have relations to other data structures.
Ex. A board has a block, a block has a payload, payloads have a table and columns (columns have IDS).
with these columns, we create graphs which are a bunch of nested settings for all the settings we offer (graph type, dashed lines, color, etc…) but also have settings that depend on the column ID’s such as changing that columns color or selecting it to be plotted on an axis.

We built the whole app using Apollo GraphQL both on the backend and front end, and we store everything in our trusty PSQL. Everything is a graphQL endpoint, to update (mutate) our models or query them. Ex. createBlock, updateBlock, createBoard, updateBoard, board, block, etc…
It’s also important to know that we use apollo cache client side to store all that data.

Since GraphQL’s subscription implementation leaves a lot to be desired we wanted to see if a library could help us achieve these features without it being a massive undertaking.

With that in mind I have been researching around CRDT (automerge and yjs) to see if this could be a solution to our problem. After looking for hours I could not find any info around migrating an existing codebase over to using these libraries.

Where should I start? Do I need to start at the data model level, simplifying our whole database to just be a single column to store the whole “Block Document”? How do I model a PSQL database to work well with YJS? There are no sql examples of what kind of datastructure a YJS document would have. I did find a psql project that allowed for yjs doc columns in sql but it is not meant for production.
How do I sync user actions with a GraphQL backend? do I need to write a custom sync plugin to dispatch document changes to the correct backend mutations? do I have to keep track of resource invalidation on our backend and tell Yjs on the backend to emit updates to all clients?
Would i need a custom sync for every document?
Would i need to then take into account the various mutations for different actions (create, update, delete, restore).

I have a lot more questions around all of this and I really don’t have a clear vision yet of how something like YJS is implemented efficiently on an app that is not built entirely with YJS from its start.

if anyone has pointers or experience with repos that retroactively added YJS to their stack in a large project (graphql preferably but not required) your help would be much appreciated.

Thank you and have a good day.

I don’t have any experience with GraphQL, but I can share some general aspects of YJS that are hopefully relevant to your questions.

The minimum SQL schema for YJS has two columns:

  • docName: string
  • value: Blob

Each record is an update associated with a Doc. Updates can be merged after a certain threshold to save space. Saving to Postgres would require a custom provider, which could be modeled after y-leveldb. I’m pretty sure others have built SQL providers before, but I’m not sure if they have been made public.

The implication of the binary update format is that all of your data modeling needs to happen in your Doc structure, i.e. with shared types like Y.Map and Y.Array, which can be nested. There is no schema validation in YJS, so you’ll have to have your own checks in place to ensure the data stays in integrity.

There is a key tradeoff with large YJS Docs: Data can be lazy or atomic, but not both. That is, if you store everything in a single Doc, then you have atomic transactions across all shared types (“tables”). However, this comes with the cost of needing to load the entire database into memory when the page loads. This may be entirely feasible if you are modeling a text document + some settings, or entirely infeasible if you are modeling a graph with tens of thousands of nodes. As an alternative, you can split your data across multiple Docs, but you lose atomicity and you need to manage loading and unloading Docs yourself.

If you can sync data unidirectionally from YJS → GraphQL, then perhaps you can query your dataset with GraphQL as needed. YJS doesn’t have any querying capabilities.

Fwiw, the YJS database must be the source of truth. You can store snapshots, JSON, or other representations separately, but the binary YJS updates must be the source of truth, not the other way around. This is because YJS updates contain the full history of edits, whereas synchronic representations do not. The history is needed for the CRDT to resolve potential conflicts.

I integrated YJS into an existing React/Redux frontend, but I had to gut the backend entirely because YJS has its own system of syncing and storage. You may run into similar problems with other CRDT libraries, due to the need to store the full history. There are some libraries like VLCN that are more SQL-centric, but they’re even more experimental.

2 Likes

I integrated YJS into an existing React/Redux frontend, but I had to gut the backend entirely because YJS has its own system of syncing and storage.

explains the issue. It will be a very hard sell to just up and say that all our backend models need to be migrated and rewritten.

I guess i will have to go down the custom solution route.
Thanks for taking the time and explaining the tradeoffs.

1 Like