How to implement data persistence on the server side

shi-yan · October 27, 2020, 5:42pm

I’m new to CRDT. It’s unclear to me how to implement data persistence on the server side.

It feels like simply saving the serialized JSON doesn’t work. I will need to actively merge updates on the server side.

If my server is not built in nodejs, what options do I have?

flow · October 27, 2020, 6:54pm

To be honest, I am new, too. Here is what I’ve found so far; maybe it helps a little bit:

There is a y-leveldb package (which @canadaduane funded, if I’m correct, so thank you @canadaduane). It is an adapter for the LevelDB database.
As far as the docs state, the adapter uses level. Level lets you switch the storage medium. Their GitHub organization has an awesome repo where you can find some storage adapter packages for level:

You can implement bidirectional communication between client and server via websockets. Next to the y-websocket client implementation you can find an example server implementation, which offers the option of persisting data with y-leveldb.
This might be a good starting point for further investigation.

The following answer might help with choosing the right data format for storing the documents:

Right now Yjs lives mainly in the JavaScript/Node.js ecosystem.
With other languages or tech stacks, it would only work alongside them.
You could host a Node.js server beside your current server application and use it only for realtime awareness data (like cursor positions), or expand it to also store the Yjs documents. Everything else you can keep in your current server application.
This might not be ideal, but good for a first prototype or first fast integration.

As far as I can spot it, @dmonad works on a Rust implementation of YATA called Yrs.
But this port to Rust is quite new, so it might not be stable right now (there are also no GitHub releases yet).

shi-yan · October 27, 2020, 7:26pm

My backend is built in Rust. I wonder what options I have, given that the Rust port of Yjs is very new.

one option I’m thinking is using https://deno.land/manual/embedding_deno to create an embedded javascript runtime and run yjs in it.

dmonad · October 29, 2020, 11:20am

Hi @shi-yan,

@flow explained it nicely: At the moment Yjs only works in JavaScript. There is a real need for a native port of Yjs that we can also bind to other languages (Rust projects integrate nicely into other languages using either wasm or language bindings). I have applied to multiple organizations for funding for the Yrs project, which will allow you to use the CRDT in native applications as well. Maybe we will have a working Rust port sometime next year.

Alternatively, you could implement your own update logic by simply persisting each update message. You don’t need to care about the content of the update message. You just need to ensure that every client gets every update message (the order doesn’t matter, it also doesn’t matter if you apply updates multiple times).
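
A minimal sketch of this idea, assuming a hypothetical `UpdateLog` class and an in-memory `Map` standing in for a real database. The server stores each update message as opaque bytes and hands the whole list back when a client connects; it never decodes anything. None of these names come from Yjs itself.

```typescript
// An update is treated as opaque bytes; the server never inspects it.
type Update = Uint8Array;

// Hypothetical append-only log. The Map is a stand-in for a real database.
class UpdateLog {
  private readonly updates = new Map<string, Update[]>();

  // Persist one update message for a document.
  append(docName: string, update: Update): void {
    const list = this.updates.get(docName) ?? [];
    list.push(update.slice()); // copy, so later changes by the caller do not alter the log
    this.updates.set(docName, list);
  }

  // Return every stored update for a document, e.g. for a client that just connected.
  replay(docName: string): Update[] {
    return (this.updates.get(docName) ?? []).map((u) => u.slice());
  }
}

// Usage: store two updates, then replay them for a newly connected client.
const updateLog = new UpdateLog();
updateLog.append("doc-1", new Uint8Array([1, 2, 3]));
updateLog.append("doc-1", new Uint8Array([4, 5]));
const replayedUpdates = updateLog.replay("doc-1");
console.log(`replaying ${replayedUpdates.length} updates`);
```

On the client, each replayed message would be handed to Yjs (for example via `Y.applyUpdate`). Because updates can be applied in any order and more than once, the log does not need to sort or deduplicate them.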

svenali · February 26, 2021, 10:18am

Hello @dmonad,

Thank you for your great work!!! I have a similar task to do. On the client side I work with JavaScript, so it is no problem to use Yjs there. On the server side I don’t want to use Node.js. But theoretically I can write a websocket server in Java which updates all clients and doesn’t need to understand the internals of Yjs? Right?

Greetings,
Sven

dmonad · February 28, 2021, 12:37pm

Hey @svenali, that’s right. You can sync clients without being able to parse Yjs document updates. Here is a related thread about possible approaches: Stateless server broadcasting implementation (in Go)

Of course, this is not always ideal. I’m working on shipping “differential updates” to Yrs (an unfinished Rust port of Yjs). The idea is to port Yrs to many different languages like Java using language bindings. With differential updates, you would be able to sync clients using conventional SyncStep1/2 messages which are more efficient.
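
The relay approach described above (syncing clients without parsing updates) can be sketched roughly as follows. `Peer`, `Room`, and `recordingPeer` are hypothetical names; in a real server `send` would write to a websocket, while here it only records what arrived so the sketch runs on its own.

```typescript
// A connected client. In a real server, send() would write to a websocket.
interface Peer {
  id: string;
  send(message: Uint8Array): void;
}

// Hypothetical relay: forwards raw messages and never decodes them.
class Room {
  private readonly peers = new Map<string, Peer>();

  join(peer: Peer): void {
    this.peers.set(peer.id, peer);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forward a message to every peer except the one that sent it.
  broadcast(senderId: string, message: Uint8Array): void {
    this.peers.forEach((peer, id) => {
      if (id !== senderId) peer.send(message);
    });
  }
}

// Test double that records received messages instead of using a socket.
function recordingPeer(id: string): Peer & { received: Uint8Array[] } {
  const received: Uint8Array[] = [];
  return { id, received, send: (message) => { received.push(message); } };
}

// Usage: alice sends one update; bob receives it, alice does not get an echo.
const room = new Room();
const alice = recordingPeer("alice");
const bob = recordingPeer("bob");
room.join(alice);
room.join(bob);
room.broadcast("alice", new Uint8Array([1, 2, 3]));
console.log(`bob received ${bob.received.length} message(s)`);
```

A pure relay like this only reaches clients that are connected at the time, so a client that joins later would still need the earlier updates from somewhere, for example from a stored log of update messages.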

shi-yan · March 8, 2021, 6:16pm

Thank you everybody.

I have a follow-up question. I looked at the indexeddb storage adapter. It seems to save every update.

Initially I thought the storage would save a document’s state, not updates.

If we are saving updates, isn’t recovering the current document from the updates very slow? Imagine I’m building an editor, the first time I open the editor, I need to pull all updates and reconstruct the document from scratch?

dmonad · March 8, 2021, 8:12pm

Incremental updates are written to the database to improve performance. It would be infeasible to compute and save the complete document after every keystroke. After 100 or so updates, y-indexeddb will merge the updates to a single entry. This happens asynchronously after the keystroke has been rendered.
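
To illustrate the compaction idea, here is a rough sketch and not y-indexeddb’s actual code. `CompactingStore` and `concatMerge` are hypothetical names, the merge function is passed in by the caller, and merging happens synchronously here to keep the example short (the real adapter does it asynchronously, as described above). With Yjs you would pass a real merge such as `Y.mergeUpdates`; the byte concatenation used below is only a placeholder so the example runs without Yjs and does NOT produce a valid Yjs update.

```typescript
// Function that combines many updates into one entry.
type Merge = (updates: Uint8Array[]) => Uint8Array;

// Hypothetical store: appends are cheap, and the entries are collapsed
// into a single one once the threshold is reached.
class CompactingStore {
  private entries: Uint8Array[] = [];
  private readonly merge: Merge;
  private readonly threshold: number;

  constructor(merge: Merge, threshold: number = 100) {
    this.merge = merge;
    this.threshold = threshold;
  }

  append(update: Uint8Array): void {
    this.entries.push(update);
    if (this.entries.length >= this.threshold) {
      // Replace all stored entries with one merged entry.
      this.entries = [this.merge(this.entries)];
    }
  }

  // Number of entries a client would have to load to rebuild the document.
  size(): number {
    return this.entries.length;
  }

  load(): Uint8Array[] {
    return this.entries.slice();
  }
}

// Placeholder merge (an assumption for this sketch): concatenates bytes.
const concatMerge: Merge = (updates) => {
  const total = updates.reduce((sum, u) => sum + u.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  updates.forEach((u) => {
    merged.set(u, offset);
    offset += u.length;
  });
  return merged;
};

// Usage: 250 one-byte updates with the default threshold of 100.
const store = new CompactingStore(concatMerge);
for (let i = 0; i < 250; i++) {
  store.append(new Uint8Array([i % 256]));
}
console.log(`entries to load: ${store.size()}`);
```

The point of the design is that opening a document never requires reading more than roughly one threshold’s worth of entries, while each keystroke still only costs a small append.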