How to sync thousands of documents and have local persistent store?

nokola · January 27, 2021, 6:25pm

I am building an Electron note-taking app with Yjs that has the following capabilities:

Offline-only support where all notes are stored on disk
“online mode” option: If a user opts-in to “online” mode, they can access their notes from anywhere.
Share option to enable collaborating on notes with others. For example user A shares note N1 with user B in read-write mode and note N2 in read-only mode. Meanwhile, note N3 is private to A.

Yjs seems to be a really good fit for the collaboration part, however I’m having trouble figuring out some important details about local storage and syncing.

I’m considering the following setup:

There’s a central server that acts as “gatekeeper” for user accounts and authorization/permissions.
The server also acts as a Yjs peer for shared notes.
In “online” mode, the user’s notes are shared with the central server over y-websocket.
To share notes, encrypt the user’s notes with a public/private key pair and have the central server give out and revoke keys for shareable notes.
Offline mode does not require any servers or any web connection, ever.

I’ve done a similar app in the past (with firebase+firepad), and know that typical users have upwards of 1000 notes stored. 2-3K notes is typical after a year of using the app. Notes are usually <1K in size.

How would I go about:

Supporting offline-only: writing notes on disk?
I read through the “Document Updates” section of the yjs/yjs README, and it seems like I can use doc.on('update', ...) and write the updates for a note to an append-only file on disk. Is this the recommended way to persist the notes? I’m currently thinking of one file per note. I also looked at yjs/ydb (“A distributed database for Yjs documents”, on github.com); however, it seems deprecated.
Syncing thousands of notes effectively?
I assume that if I put each note in a separate yjs document, using something like y-websocket to sync 1000+ notes will be slow or open too many connections from the electron app or cause other issues. If I put all notes in a single doc somehow, I’m not sure how to do sharing at a note level - does YDoc have ability to share (encrypt) only parts of it?

Any pointers are much appreciated! Thanks a lot for reading and for your help!

dmonad · January 28, 2021, 6:11pm

An append-only file on disk would work fine. In order to reduce metadata, you might want to concatenate the updates from time to time. Either by calling Y.encodeStateAsUpdate or by using the new differential updates feature (not yet released).
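As a rough illustration of the compaction idea, here is a minimal sketch. The `log` array is a hypothetical in-memory stand-in for the append-only file, and the `'note'` text name is made up for the example. `Y.mergeUpdates` exists in current Yjs releases; treating it as the "differential updates" feature mentioned above is an assumption on my part.

```typescript
import * as Y from 'yjs'

// Hypothetical stand-in for an append-only file: one entry per update.
const log: Uint8Array[] = []

const doc = new Y.Doc()
doc.on('update', (update: Uint8Array) => {
  log.push(update) // on disk, this would be an append
})

const text = doc.getText('note')
text.insert(0, 'Hello')
text.insert(5, ' world')

// Option 1: snapshot the whole in-memory document as a single update.
const snapshot: Uint8Array = Y.encodeStateAsUpdate(doc)

// Option 2: merge the logged updates directly, without needing a Y.Doc.
const merged: Uint8Array = Y.mergeUpdates(log)

// Either result could replace the log; replaying it rebuilds the content.
function restore(update: Uint8Array): string {
  const restored = new Y.Doc()
  Y.applyUpdate(restored, update)
  return restored.getText('note').toString()
}
```

Calling `restore(snapshot)` or `restore(merged)` should yield the same text as the live document, so either value can be written back as the new, shorter file.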

Why don’t you start with y-leveldb? It works very well in nodejs. Alternatively you can use y-indexeddb, which is supported in the browser and in electron.
There is a separate section about offline editing in the docs: Offline Support | Yjs Docs & y-indexeddb | Yjs Docs

I built the subdocuments feature exactly for this. You can manage all your Yjs documents as references from a top-level Yjs document. The provider would be responsible for syncing subdocuments efficiently.

At the moment, none of the official providers support efficient syncing of subdocuments. I recommend adapting an existing provider and implementing a syncing mechanism that makes sense for your application. For example, you could maintain a “last-modified” field that is updated when other clients should sync the document.
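One way the “last-modified” idea could look is sketched below. The names `ModifiedIndex` and `notesNeedingSync` are hypothetical, and plain `Map`s stand in for whatever the top-level document and the local bookkeeping would actually use. A provider could use the result to decide which subdocuments to open a sync session for, instead of syncing all of them.

```typescript
// noteId -> last-modified marker (e.g. a millisecond timestamp).
// In a real app the "remote" index would be read from the top-level Yjs doc;
// here it is a plain Map so the sketch stays self-contained.
type ModifiedIndex = Map<string, number>

// Returns the ids of notes whose published marker is newer than the marker
// recorded at the last successful local sync (or that were never synced).
function notesNeedingSync(remote: ModifiedIndex, localSynced: ModifiedIndex): string[] {
  const stale: string[] = []
  remote.forEach((modifiedAt, noteId) => {
    const syncedAt = localSynced.get(noteId)
    if (syncedAt === undefined || syncedAt < modifiedAt) {
      stale.push(noteId)
    }
  })
  return stale
}

// Usage: only the notes returned here would need a sync session.
const remoteIndex: ModifiedIndex = new Map([['n1', 100], ['n2', 250]])
const syncedIndex: ModifiedIndex = new Map([['n1', 100], ['n2', 200]])
const toSync = notesNeedingSync(remoteIndex, syncedIndex)
```

Wall-clock timestamps from different clients can disagree, so a per-note counter may be a safer marker than a timestamp; that choice is left open here.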

nokola · January 29, 2021, 9:49am

Thanks, will try y-leveldb. I wanted to have each note as a separate file on disk (easier to debug, and easier to see what is synced), but a DB may be more efficient.

Thanks! This looks awesome! I read the docs you sent and I’m not sure: why is there yDoc.subdocs and .getSubdocs() when the docs recommend using rootDoc.getMap().set("subdoc.txt", subDoc) for subdocs? Why not use e.g. rootDoc.getSubdocs().add(...) instead of getMap? Just curious about the design decision behind it and whether the intent is to change something in the future.

dmonad · January 29, 2021, 10:21am

The idea of subdocs is that you can embed Y.Doc instances into the shared types. So, for example, you could maintain a list of Y.Doc instances in a Y.Array. With Y.Map, you could create something like a filesystem that represents each file as a separate Y.Doc instance. This feature is very powerful as it allows for lazy loading of sub-content.

The provider is responsible for syncing each document. You don’t want the providers to query through all shared types to find the subdoc instances. This is why you can retrieve all subdoc instances using rootDoc.getSubdocs() and listen to events that tell you when subdocs are added/removed.
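A small sketch of the pattern described above, assuming a `'notes'` map and a `'body'` text type (both names are made up for the example). It embeds a Y.Doc in a Y.Map, observes the `subdocs` event, and shows that syncing the root document transfers only the reference to the subdoc, not its content.

```typescript
import * as Y from 'yjs'

const rootDoc = new Y.Doc()
const loadedGuids: string[] = []

// A provider would hook in here instead of scanning the shared types.
rootDoc.on('subdocs', ({ loaded }: { added: Set<Y.Doc>, removed: Set<Y.Doc>, loaded: Set<Y.Doc> }) => {
  loaded.forEach(subdoc => loadedGuids.push(subdoc.guid))
})

// Embed a note as a subdocument of the root document.
const note = new Y.Doc()
note.getText('body').insert(0, 'First note')
rootDoc.getMap('notes').set('note-1', note)

// Simulate a second client that receives only the root document.
const remoteRoot = new Y.Doc()
Y.applyUpdate(remoteRoot, Y.encodeStateAsUpdate(rootDoc))
const remoteNote = remoteRoot.getMap('notes').get('note-1') as Y.Doc

// The reference arrives with the same guid, but the content is still empty.
const contentBeforeSync = remoteNote.getText('body').toString()

// The subdoc's content has to be synced separately, e.g. by a provider.
Y.applyUpdate(remoteNote, Y.encodeStateAsUpdate(note))
const contentAfterSync = remoteNote.getText('body').toString()
```

Here the two `Y.applyUpdate` calls stand in for whatever transport a provider would use; the point is only that root and subdoc are synced as independent documents that share a guid.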

nokola · January 29, 2021, 6:29pm

Thanks for explaining! Your last post would be a great addition to the docs. It helped me understand the intent and clarified that subdocs can be placed anywhere (in a Y.Array or a Y.Map). Before, I thought the only way was to use Y.Map.

I’ll modify one of the providers to add subdocs support and see how it goes

nokola · January 31, 2021, 8:13am

Update: some successful code below!
I implemented this intermediate MultiDocProvider class to help sync multiple docs. The class assumes simple append/overwrite/read functions for the underlying store. I think such a “middleware” approach may be useful for all the existing y-* providers, to avoid duplicating code for doc/subdoc updates and tracking, and [in the future] other common code such as debouncing. Here’s the TypeScript code; anyone feel free to use it under the MIT, Apache, or CC0 license:

import * as Y from 'yjs'

export interface UpdateStore {
    append(docName: string, arr: Uint8Array): void;
    overwrite(docName: string, arr: Uint8Array): void;
    read(docName: string): Promise<Uint8Array[]>;
}

export class MultiDocProvider {
    private store: UpdateStore;
    private trimOpsCount: number;
    constructor(store: UpdateStore, trimOpsCount?: number) {
        this.store = store;
        this.trimOpsCount = trimOpsCount ?? 500;
    }

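    public trackDoc(docName: string, doc: Y.Doc): void {
        // Replay persisted updates, passing this provider as the origin so
        // that onUpdate below does not append them to the store again.
        this.store.read(docName).then((updates: Uint8Array[]) => {
            updates.forEach(update => Y.applyUpdate(doc, update, this));
        });

        let docUpdateCount: number = 0;
        const onUpdate = (update: Uint8Array, origin: any, doc: Y.Doc) => {
            if (origin === this) {
                return; // came from the store, already persisted
            }
            docUpdateCount++;
            // Past the threshold, replace the log with one merged update.
            if (docUpdateCount > this.trimOpsCount) {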
                const fullUpdate: Uint8Array = Y.encodeStateAsUpdate(doc);
                this.store.overwrite(docName, fullUpdate);
                docUpdateCount = 0;
            }
            else {
                this.store.append(docName, update);
            }
        };

        const onSubdocs = this.onSubdocs.bind(this);
        const onDestroy = (doc: Y.Doc): void => {
            doc.off('update', onUpdate);
            doc.off('subdocs', onSubdocs);
            doc.off('destroy', onDestroy);
        };

        doc.on('update', onUpdate);
        doc.on('subdocs', onSubdocs);
        doc.on('destroy', onDestroy);
    }

    private onSubdocs(docs: { added: Set<Y.Doc>, removed: Set<Y.Doc>, loaded: Set<Y.Doc> }): void {
        docs.loaded.forEach((subDoc: Y.Doc) => {
            this.trackDoc(subDoc.guid, subDoc);
        });
    }
}

Here’s a very simple in-memory store, useful for testing:

class MemoryStore implements UpdateStore {
    private store: { [docName: string]: Uint8Array[] } = {};
    public append(docName: string, arr: Uint8Array): void {
        const data: Uint8Array[] = this.store[docName];
        if (data) {
            data.push(arr);
        }
        else {
            this.store[docName] = [arr];
        }
    }

    public overwrite(docName: string, arr: Uint8Array): void {
        this.store[docName] = [arr];
    }

    public read(docName: string): Promise<Uint8Array[]> {
        return Promise.resolve(this.store[docName] ?? []);
    }
}

I’m planning to link this multi-doc provider with different stores: either the filesystem directly or LevelDB.
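For the filesystem option, a minimal sketch could look like the following. `FileStore`, the `.yupdates` extension, the `readSync` helper, and the length-prefixed framing are all assumptions made for illustration, not part of the code above. The interface is repeated so the block stands alone. It uses synchronous Node `fs` calls for brevity.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Same shape as the UpdateStore interface above.
interface UpdateStore {
  append(docName: string, arr: Uint8Array): void
  overwrite(docName: string, arr: Uint8Array): void
  read(docName: string): Promise<Uint8Array[]>
}

// Each update is stored as a 4-byte big-endian length followed by its bytes,
// so several updates can be appended to one file and split apart again.
function encodeFrame(update: Uint8Array): Buffer {
  const header = Buffer.alloc(4)
  header.writeUInt32BE(update.byteLength, 0)
  return Buffer.concat([header, Buffer.from(update)])
}

function decodeFrames(data: Buffer): Uint8Array[] {
  const updates: Uint8Array[] = []
  let offset = 0
  while (offset + 4 <= data.length) {
    const length = data.readUInt32BE(offset)
    const end = offset + 4 + length
    if (end > data.length) break // ignore a partially written trailing frame
    updates.push(new Uint8Array(data.subarray(offset + 4, end)))
    offset = end
  }
  return updates
}

// One file per document, named after the (URL-encoded) document name.
class FileStore implements UpdateStore {
  private dir: string

  constructor(dir: string) {
    this.dir = dir
    fs.mkdirSync(dir, { recursive: true })
  }

  private fileFor(docName: string): string {
    return path.join(this.dir, encodeURIComponent(docName) + '.yupdates')
  }

  public append(docName: string, arr: Uint8Array): void {
    fs.appendFileSync(this.fileFor(docName), encodeFrame(arr))
  }

  public overwrite(docName: string, arr: Uint8Array): void {
    fs.writeFileSync(this.fileFor(docName), encodeFrame(arr))
  }

  public readSync(docName: string): Uint8Array[] {
    const file = this.fileFor(docName)
    return fs.existsSync(file) ? decodeFrames(fs.readFileSync(file)) : []
  }

  public read(docName: string): Promise<Uint8Array[]> {
    return Promise.resolve(this.readSync(docName))
  }
}
```

Note that `overwrite` here is not atomic; writing to a temporary file and renaming it over the original would be a safer variant if a crash during compaction is a concern.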

dmonad · February 2, 2021, 7:58pm

Thanks for the feedback. I will update the docs.

I love when people share code. I currently don’t have time to work on syncing of subdocs, so I’m happy you started to work on this. I’m looking forward to seeing what you come up with. You are right, it makes sense to create a common abstraction for the providers.

kelvinkoko · May 30, 2023, 2:25am

I think this structure is very useful for managing multiple docs!
By the way, I tried setting up a root doc with a Y.Map and adding subdocs to that Y.Map.
When I use it with the y-indexeddb provider, only the root doc is persisted; the content of the subdocs is not. Does y-indexeddb support persisting subdocs like this, or do I need to implement syncing of subdocs myself?

raine · May 30, 2023, 3:45am

@kelvinkoko You have to persist the subdocs manually. Yjs subdoc support is very thin: basically just an internal list with some handlers for adding/removing subdocs. You have to wire up syncing and provider support yourself.

Something like this:

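import { IndexeddbPersistence } from 'y-indexeddb'

// Attach a persistence provider to each subdoc as it is loaded,
// keyed by the subdoc's guid.
doc.on('subdocs', ({ loaded }) => {
  loaded.forEach(subdoc => {
    new IndexeddbPersistence(subdoc.guid, subdoc)
  })
})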

kelvinkoko · May 31, 2023, 1:28pm

I see, thanks for the information and the reference. I will try that!

Himself65 · June 24, 2023, 5:29pm

I’ve implemented two subdocument-support providers, indexeddb and broadcast channel.

You can read all of the source code here.

github.com/toeverything/AFFiNE/blob/master/packages/y-indexeddb/src/index.ts

import { openDB } from 'idb';
import {
  applyUpdate,
  diffUpdate,
  Doc,
  encodeStateAsUpdate,
  encodeStateVector,
  UndoManager,
} from 'yjs';

import type {
  BlockSuiteBinaryDB,
  IndexedDBProvider,
  WorkspaceMilestone,
} from './shared';
import { dbVersion, DEFAULT_DB_NAME, upgradeDB } from './shared';
import { tryMigrate } from './utils';

const indexeddbOrigin = Symbol('indexeddb-provider-origin');
const snapshotOrigin = Symbol('snapshot-origin');

This file has been truncated.

github.com/toeverything/blocksuite/blob/master/packages/store/src/providers/async-call-rpc.ts

import type { AsyncCallOptions } from 'async-call-rpc';
import { AsyncCall } from 'async-call-rpc';
import { merge } from 'merge';
import {
  applyAwarenessUpdate,
  encodeAwarenessUpdate,
} from 'y-protocols/awareness';
import type { Doc } from 'yjs';

import { Workspace } from '../workspace/index.js';
import type { SubdocEvent } from '../yjs/index.js';
import type { DocProviderCreator, PassiveDocProvider } from './type.js';

const Y = Workspace.Y;

export type AwarenessChanges = Record<
  'added' | 'updated' | 'removed',
  number[]
>;

This file has been truncated.