Persisting deltas instead of base64 for optimization

I’m currently storing a base64 representation of my Yjs doc in my db. The base64 doc size is significantly larger (e.g. 10x) than the simple delta representation. I can reduce the base64 size by converting to deltas and creating a new doc, but even then the base64 size is still larger (e.g. 1.5x) than the delta.

If I’m already going through the process of converting my base64 to deltas (to reduce the size), I’m thinking I might as well store the stringified deltas in the db. I understand that base64 is preferred for transport, so I would still need to convert the deltas back to base64 when retrieving the document.

The benefits of storing deltas are: 1) smaller size 2) readability

My process would change from:

  1. running cleanDoc below before storing a base64 doc

to:

  1. running convertBase64toDelta when I receive a base64 doc

  2. running convertDeltaToBase64 when I retrieve the deltas from the db.

In terms of server CPU time, I think it’s somewhat of a wash. Right?

Am I missing something? Is there any reason to store the base64 version?

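   import * as Y from 'yjs';
   // Assumption: toUint8Array / fromUint8Array are the base64 helpers from js-base64.
   import { fromUint8Array, toUint8Array } from 'js-base64';

   // Decode a base64 Yjs update and return the text content as a delta (an array of ops).
   // Stringify the result with JSON.stringify before writing it to the db.
   function convertBase64toDelta(ydocBase64: string): any[] {
        const yDoc = new Y.Doc();
        Y.applyUpdate(yDoc, toUint8Array(ydocBase64));
        return yDoc.getText().toDelta();
    }

   // Build a fresh doc from a (parsed) delta and encode its state as a base64 update.
   function convertDeltaToBase64(delta: any[]): string {
        const yDoc = new Y.Doc();
        yDoc.getText().applyDelta(delta);
        const uint8Array = Y.encodeStateAsUpdate(yDoc);
        return fromUint8Array(uint8Array);
    }

   // Round-trip through a delta to drop the history and shrink the base64 doc.
   function cleanDoc(ydocBase64: string): string {
        const yDoc = new Y.Doc();
        Y.applyUpdate(yDoc, toUint8Array(ydocBase64));
        const docDeltas = yDoc.getText().toDelta();
        const yDoc2 = new Y.Doc();  // looks redundant, but yDoc still carries the full history
        yDoc2.getText().applyDelta(docDeltas);
        const uint8Array = Y.encodeStateAsUpdate(yDoc2);
        return fromUint8Array(uint8Array);
    }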

Hi @mattch,

You can certainly convert the Yjs document to the delta format and back. However, the delta doesn’t retain editing history. Clients that still hold the old Yjs update will basically duplicate content or overwrite changes once they sync with the server.
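To make the duplication concrete, here is a minimal sketch of what happens when a doc rebuilt from a delta meets a client that still holds the original state. The variable names are only for illustration, and the setup (a single unnamed text type, one client) is an assumption rather than your actual configuration.

```typescript
import * as Y from 'yjs';

// The document as originally created; a client keeps this state around.
const original = new Y.Doc();
original.getText().insert(0, 'hello');
const originalUpdate = Y.encodeStateAsUpdate(original);

// Server side: round-trip through a delta. The rebuilt doc has the same
// text, but it was written by a new client ID and has no shared history.
const delta = original.getText().toDelta();
const rebuilt = new Y.Doc();
rebuilt.getText().applyDelta(delta);

// A client that still holds the original state syncs with the rebuilt doc.
const staleClient = new Y.Doc();
Y.applyUpdate(staleClient, originalUpdate);
Y.applyUpdate(staleClient, Y.encodeStateAsUpdate(rebuilt));
const afterDeltaRoundTrip = staleClient.getText().toString();

// For comparison: re-applying the original binary update is idempotent.
const freshClient = new Y.Doc();
Y.applyUpdate(freshClient, originalUpdate);
Y.applyUpdate(freshClient, originalUpdate);
const afterBinaryResync = freshClient.getText().toString();

console.log(afterDeltaRoundTrip); // content appears twice
console.log(afterBinaryResync);   // content appears once
```

Yjs treats the two "hello" inserts as concurrent edits by different clients, so it keeps both.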

You can avoid a lot of nasty issues by simply storing the Yjs update in the database.

The recommended way to transport & store updates is in the original binary format. If you must, you can use base64, although it is fairly inefficient to do so. It is not recommended to store the delta as it doesn’t retain the editing history, and Yjs won’t be able to merge conflicts anymore.
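To put a rough number on the base64 overhead: standard padded base64 encodes every 3 bytes as 4 characters, so the encoded form is about a third larger than the binary update. The helper below is a hypothetical name used only to show the arithmetic.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
// Hypothetical helper, for illustration only.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const binaryBytes = 3000;                        // size of a binary Yjs update
const base64Chars = base64Length(binaryBytes);   // 4000 characters
const overhead = base64Chars / binaryBytes - 1;  // about 0.33

console.log(base64Chars, overhead.toFixed(2));
```

If the database has a binary column type, storing the Uint8Array directly avoids this overhead and the encode/decode step.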

The v2 update format is more efficient, although IMHO, upgrading to v2 is not worth the effort yet. See the yjs/yjs repository on GitHub (“Shared data types for building collaborative software”).

Thanks @dmonad! That’s a good point. I don’t currently support offline mode so losing the history is not an issue, but I kinda like the idea of saving binary updates. Currently, I save the binary updates between the bindState and writeState, and merge them before storing them.
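For reference, a rough sketch of that merge step, assuming the updates are collected from the doc's update event into an array (the `pending` array and the surrounding wiring are hypothetical; the real bindState/writeState hooks are not shown):

```typescript
import * as Y from 'yjs';

// Collect every incremental update emitted by the doc.
const pending: Uint8Array[] = [];
const liveDoc = new Y.Doc();
liveDoc.on('update', (update: Uint8Array) => {
  pending.push(update);
});

// Two separate edits produce two separate updates.
liveDoc.getText().insert(0, 'a');
liveDoc.getText().insert(1, 'b');

// Merge the collected updates into one binary blob before storing it.
const mergedUpdate: Uint8Array = Y.mergeUpdates(pending);

// Loading the merged blob into a fresh doc restores the same content.
const restored = new Y.Doc();
Y.applyUpdate(restored, mergedUpdate);
console.log(restored.getText().toString());
```

Y.mergeUpdates works on the encoded updates directly, so no Y.Doc has to be loaded just to compact them.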

It’s not just about offline support. Sometimes clients will just disconnect for a few seconds (or for a few hours or days after closing the laptop lid). So in any case, I recommend storing the Yjs document whenever possible.
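A small sketch of such a reconnect, assuming the server kept the Yjs document with its history. Both sides exchange only what the other is missing, using state vectors, and the edits merge without duplication. The doc names are illustrative.

```typescript
import * as Y from 'yjs';

// Server and client start from the same state.
const serverDoc = new Y.Doc();
serverDoc.getText().insert(0, 'hello');
const clientDoc = new Y.Doc();
Y.applyUpdate(clientDoc, Y.encodeStateAsUpdate(serverDoc));

// While disconnected, both sides make edits.
clientDoc.getText().insert(5, ' world');
serverDoc.getText().insert(0, '> ');

// On reconnect, each side sends only the changes the other has not seen.
const toServer = Y.encodeStateAsUpdate(clientDoc, Y.encodeStateVector(serverDoc));
const toClient = Y.encodeStateAsUpdate(serverDoc, Y.encodeStateVector(clientDoc));
Y.applyUpdate(serverDoc, toServer);
Y.applyUpdate(clientDoc, toClient);

const serverText = serverDoc.getText().toString();
const clientText = clientDoc.getText().toString();
console.log(serverText, '|', clientText);
```

This only works because both docs share the same history; a server doc rebuilt from a delta would not.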

Got it. Thanks again, @dmonad.