Advice on a sync engine with thousands of docs

I am building an app where 30K-40K pages can exist in the UI. Each page is a JSON object whose content property is a Yjs doc.

I am using Tiptap, and only the editors in the viewport are rendered. For realtime sync and awareness I am using Hocuspocus (single socket, multiplexed by document name).

Right now on every doc.on('update',...), I persist the whole state to IndexedDB. I do not store individual updates.

I also have an events system: a local per-device queue plus a server events table with a monotonically increasing eventIndex. Each device tracks an event “checkpoint” and advances it after consuming an event such as UPDATE_PAGE or CREATE_PAGE, but these events are strictly for JSON properties/objects. They are delivered in realtime over a socket and can also be replayed when a device that has fallen behind comes back online.

I have some problems around sync for pages that are not rendered / do not have an active provider:

  • A device is offline, edits Page 1, and creates 20 more pages. Page 1 is never rendered again, so provider-based sync never runs for it.
  • A device is online and updates Page X. Another device is also online, but Page X isn’t open / rendered there, so it never receives the update.

I considered two approaches:

  • Utilize my events table. But every update would produce an event and bloat the events table. Even with debounce, it would be too much.
  • On each Yjs update, increment an “unsynced” counter for that page in IndexedDB (I have had race issues with boolean flags before). A background worker can push unsynced pages to the server. Other devices’ workers can periodically poll for all “updatedAt” values and pull/merge changes.

But even with these I am worried about some edge cases:

  • If I mark a document as unsynced while offline and then come online, the provider might sync it anyway and my worker could still push the whole state, causing duplicate writes on the server?
  • Should I mark a document unsynced even if it’s currently rendered and has an active provider? If I don’t, what happens if the socket drops and the tab is refreshed?
    • I don’t see a way in Hocuspocus to know whether an update has been “acknowledged by the server”, so it’s hard to decide when to mark an update as “synced”. I also don’t know whether it sends individual updates at all or batches them.

Is there a better way to handle sync? Maybe I need to write my own provider and server implementation?

Maybe try the onUnsyncedChanges hook? It seems to correctly log the count of unsynced changes. You could write it directly to IndexedDB as the unsynced boolean with data.number !== 0. It is probably easier to write the offline changes back to the server in one batch REST request, so you can just await the response and clear the cache.

What is crucial, though, is to have one writer coordinating the writes to IndexedDB: you don’t want the REST request to overwrite what the Hocuspocus hook wrote. If the doc is open, the REST result doesn’t count; a redundant write to the DB is better than skipping unsynced changes. Or you could derive the last update version and compare which is the latest, but I don’t think it’s worth the trouble.
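
To make the "one writer" idea concrete, here is a minimal sketch in which every change to a page’s sync flag goes through a single function. The record shape, the names reduceSyncRecord and writeSyncState, and the in-memory Map standing in for IndexedDB are all assumptions for illustration; only the hook payload field (number) comes from the suggestion above.

```typescript
// Hypothetical record shape; field names are illustrative, not from Hocuspocus.
interface SyncRecord {
  docName: string;
  unsynced: boolean;
  isOpen: boolean; // true while a provider is attached to this doc
}

type SyncWrite =
  | { source: "provider"; unsyncedCount: number } // value reported by onUnsyncedChanges
  | { source: "rest"; ok: boolean }; // outcome of the batch REST push

// The single place that decides what gets stored for a document.
function reduceSyncRecord(rec: SyncRecord, write: SyncWrite): SyncRecord {
  if (write.source === "provider") {
    return { ...rec, unsynced: write.unsyncedCount !== 0 };
  }
  // REST result: ignored while the doc is open (or when the push failed),
  // so it can never clear a flag the provider hook is responsible for.
  if (rec.isOpen || !write.ok) return rec;
  return { ...rec, unsynced: false };
}

// In-memory stand-in for the IndexedDB object store (assumption).
const records = new Map<string, SyncRecord>();

function writeSyncState(docName: string, write: SyncWrite): SyncRecord {
  const current: SyncRecord =
    records.get(docName) ?? { docName, unsynced: false, isOpen: false };
  const next = reduceSyncRecord(current, write);
  records.set(docName, next);
  return next;
}
```

With a provider attached, the hook would forward its count as writeSyncState(name, { source: "provider", unsyncedCount: data.number }), and the REST path would call the same function with { source: "rest", ok: true } once the response arrives. Because both paths share one reducer, the "open doc wins" rule lives in exactly one place.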

Actually, this problem can be broken down into two core parts:

  1. How to reliably achieve Yjs synchronization for unrendered documents;
  2. How to avoid duplicate writes and event storms while maintaining offline-first + multi-device consistency.

Below is a practical approach that maps onto your Yjs + Tiptap architecture.

Overall Architecture Design

  • Completely decouple real-time collaboration and final persistence into two layers:
    • Real-time Layer: Continue using a single Hocuspocus socket with multiplexing by docName to handle low-latency collaboration during online sessions.
    • Persistence Layer: Implement a separate document version/snapshot synchronization layer that uses Yjs’ update/snapshot mechanism for pull-based synchronization.
  • Whether a page is rendered only affects whether the UI needs immediate updates; it does not affect whether the document should ultimately be synchronized.

Document Updates and Local Persistence

  • Keep using doc.on('update', ...), but track sync state alongside the content instead of only writing the entire state on every update:
    • For each page in IndexedDB, maintain:
      • lastPersistedUpdate (or lastSnapshotClock)
      • unsynced = true/false, or a version counter (a counter makes it possible to detect edits that arrive while a push is in flight)
    • On each update:
      • Either append the incremental update to a local log keyed by docName, or simply overwrite the stored state with the output of Y.encodeStateAsUpdate (the simpler option; the log is optional).
      • Mark the document as unsynced (set the flag or increment the counter).
  • With this approach, any update is added to the pending sync queue whether or not the document is rendered, without relying on the provider. This works for offline edits as well.
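
The bookkeeping above could look roughly like this. It is a sketch under assumptions: the record fields (dirtyCounter, pushedCounter), the function names, and the in-memory Map used in place of IndexedDB are all hypothetical, and the full state is passed in as bytes so the example does not depend on Yjs itself.

```typescript
// Hypothetical per-page record; in the app this would live in IndexedDB.
interface PageSyncState {
  docName: string;
  state: Uint8Array; // latest full state, e.g. the output of Y.encodeStateAsUpdate(doc)
  dirtyCounter: number; // incremented on every recorded update
  pushedCounter: number; // value of dirtyCounter at the last successful push
}

// In-memory stand-in for the IndexedDB object store (assumption).
const pageStore = new Map<string, PageSyncState>();

// Intended to be called from doc.on('update', ...) with the freshly encoded state.
function recordLocalUpdate(docName: string, fullState: Uint8Array): PageSyncState {
  const prev = pageStore.get(docName);
  const next: PageSyncState = {
    docName,
    state: fullState,
    dirtyCounter: (prev?.dirtyCounter ?? 0) + 1,
    pushedCounter: prev?.pushedCounter ?? 0,
  };
  pageStore.set(docName, next);
  return next;
}

// A page is unsynced when it has updates newer than the last successful push.
function isUnsynced(rec: PageSyncState): boolean {
  return rec.dirtyCounter !== rec.pushedCounter;
}

// What the background worker would scan for.
function listUnsynced(): string[] {
  const names: string[] = [];
  pageStore.forEach((rec) => {
    if (isUnsynced(rec)) names.push(rec.docName);
  });
  return names;
}
```

One thing to keep in mind: the Yjs update event also fires for updates applied from remote sources, so you may want to inspect the origin argument of the handler and skip the counter increment for changes that came from the server.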

Background Worker Synchronization Strategy

Upload (Push)

  • The worker periodically scans IndexedDB:
    • Identify documents where unsynced = true.
    • For each document, call encodeStateAsUpdate and send the result to your custom sync API (avoid using the events table from the event system).
  • Server-side Logic:
    • Use Yjs’ applyUpdate to merge the update into the server-side Y.Doc.
    • Record a server-side version number (this can be a serialized Yjs state vector, or a simple combination of updatedAt and an incremental clock).
  • After successful synchronization, the worker updates local storage:
    • Save the latest version vector or serverVersion returned by the server.
    • Mark the document as unsynced = false.
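
A sketch of the push step, assuming a counter rather than a boolean so that an edit made while the request is in flight is not accidentally marked as synced. All names here (PendingDoc, PushTicket, beginPush, completePush, pushAll, and the send callback standing in for the HTTP call) are hypothetical.

```typescript
interface PendingDoc {
  docName: string;
  state: Uint8Array; // full state to upload
  dirtyCounter: number; // bumped on every local update
  pushedCounter: number; // dirtyCounter value covered by the last successful push
  serverVersion: number; // last version returned by the server
}

interface PushTicket {
  docName: string;
  state: Uint8Array;
  counterAtSend: number;
}

// In-memory stand-in for IndexedDB (assumption).
const pending = new Map<string, PendingDoc>();

// Capture what is about to be sent, or null if there is nothing to push.
function beginPush(docName: string): PushTicket | null {
  const rec = pending.get(docName);
  if (!rec || rec.dirtyCounter === rec.pushedCounter) return null;
  return { docName, state: rec.state, counterAtSend: rec.dirtyCounter };
}

// Record success. Only the updates that were actually sent count as pushed;
// if dirtyCounter moved on in the meantime, the doc stays unsynced.
function completePush(ticket: PushTicket, serverVersion: number): void {
  const rec = pending.get(ticket.docName);
  if (!rec) return;
  pending.set(ticket.docName, {
    ...rec,
    pushedCounter: Math.max(rec.pushedCounter, ticket.counterAtSend),
    serverVersion: Math.max(rec.serverVersion, serverVersion),
  });
}

// Worker loop body. `send` is a hypothetical transport that uploads the state
// and resolves with the server's version number.
async function pushAll(send: (t: PushTicket) => Promise<number>): Promise<void> {
  for (const docName of Array.from(pending.keys())) {
    const ticket = beginPush(docName);
    if (!ticket) continue;
    try {
      completePush(ticket, await send(ticket));
    } catch {
      // Leave the doc unsynced; it will be retried on the next scan.
    }
  }
}
```

Splitting the push into beginPush and completePush keeps the compare step explicit: the worker never writes "synced" for updates it did not send.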

Download (Pull)

  • Each device maintains a server checkpoint (you can retain your eventIndex concept, but limit its use to JSON metadata/page CRUD operations).
  • For the document content layer:
    • The worker periodically calls an API to list recently updated documents, filtering by updatedAt or serverVersion > lastSeenVersion.
    • For each matched document, fetch the server’s full state (the output of Y.encodeStateAsUpdate) or a snapshot, then apply it to the document stored in local IndexedDB using Y.applyUpdate (no UI rendering required).
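
The pull step might be sketched as follows. Everything here is an assumption for illustration: the types, the applyPullBatch name, and the merge callback, which in the real app would be backed by Yjs (for example Y.mergeUpdates on the two byte arrays, or loading a Y.Doc and calling Y.applyUpdate).

```typescript
interface ChangedEntry {
  docName: string;
  serverVersion: number;
}

interface LocalDoc {
  state: Uint8Array;
}

interface PullState {
  lastSeenVersion: number; // checkpoint for document content, separate from eventIndex
  docs: Map<string, LocalDoc>; // stand-in for IndexedDB
}

type Merge = (local: Uint8Array, remote: Uint8Array) => Uint8Array;

// Apply one batch of "changed since checkpoint" results and return the new state.
// `fetched` holds the server state for each doc that was downloaded successfully.
function applyPullBatch(
  local: PullState,
  changed: ChangedEntry[],
  fetched: Map<string, Uint8Array>,
  merge: Merge,
): PullState {
  const docs = new Map(local.docs);
  let checkpoint = local.lastSeenVersion;
  // Process in version order so the checkpoint never jumps past a failed fetch.
  const ordered = changed.slice().sort((a, b) => a.serverVersion - b.serverVersion);
  for (const entry of ordered) {
    if (entry.serverVersion <= local.lastSeenVersion) continue; // already seen
    const remote = fetched.get(entry.docName);
    if (!remote) break; // download failed: stop here and retry on the next poll
    const existing = docs.get(entry.docName);
    docs.set(entry.docName, {
      state: existing ? merge(existing.state, remote) : remote,
    });
    checkpoint = entry.serverVersion;
  }
  return { lastSeenVersion: checkpoint, docs };
}
```

The checkpoint only advances past documents that were actually merged, so a dropped connection in the middle of a batch results in a repeat download rather than a missed one.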

Role of Rendering and Provider

  • When a page is rendered:
    • Read the latest state of the document from IndexedDB (local storage already includes updates from the local device and synced changes from other devices).
    • Initialize the Tiptap/Yjs document with this state.
    • Attach the Hocuspocus provider to enable real-time collaboration.
  • When a page is no longer rendered:
    • The provider can be destroyed to save resources, but the document will continue to synchronize with the server via the background worker (only low-latency collaboration will be disabled).

Solutions to the Mentioned Edge Cases

1. How to synchronize offline edits for documents that are never re-rendered?

  • Since updates are written to IndexedDB and marked as unsynced in doc.on('update'), the background worker will handle pushing these changes to the server—even if the document is never re-rendered afterward.
  • The worker on another device will retrieve the latest updates for the document during periodic pulls, even if the page was never opened on that device. The UI will initialize with the latest state when the page is eventually opened.

2. User A edits Page X online; User B is online but has not opened X—will User B miss the updates?

  • The real-time layer will not deliver updates for documents the device is not subscribed to.
  • However, your background mechanism for fetching recently updated documents will periodically retrieve the latest version of Page X on User B’s device (via updatedAt or version number filtering).
  • Thus, eventual consistency is guaranteed by the worker, while strong real-time synchronization is only applied to open pages.

3. Will using both provider synchronization and worker synchronization cause duplicate writes?

  • For correctness, there is no need to track whether the server has acknowledged a specific update.
  • The server-side only uses Yjs CRDT:
    • Writes from the provider: handled via applyUpdate.
    • Writes from worker pushes: also handled via applyUpdate.
    • Applying a Yjs update is idempotent: an update whose contents are already reflected in the document’s state vector changes nothing.
  • You only need to implement update merging by docName in the upload API, without worrying about the update source. Repeated API calls will not corrupt the document state; they cost only some bandwidth and a no-op applyUpdate.
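
To illustrate why a duplicate push is harmless, here is a toy model of state-vector deduplication. This is deliberately not Yjs and does not reflect its internals; ToyOp and ToyDoc are made up for the example. It only shows the principle that operations identified by (client, clock) can be recognized and skipped when they arrive a second time.

```typescript
// Toy model only: each operation is identified by (client, clock).
interface ToyOp {
  client: number;
  clock: number;
  text: string;
}

class ToyDoc {
  // client id -> next clock expected from that client (a toy "state vector")
  private nextClock = new Map<number, number>();
  readonly ops: ToyOp[] = [];

  // Integrates an update and returns how many ops were actually new.
  applyUpdate(update: ToyOp[]): number {
    let integrated = 0;
    const ordered = update.slice().sort((a, b) => a.clock - b.clock);
    for (const op of ordered) {
      const expected = this.nextClock.get(op.client) ?? 0;
      // Lower clock: already integrated, skip. Higher clock: a gap, which this
      // toy simply ignores (a real CRDT library would hold it until the gap fills).
      if (op.clock !== expected) continue;
      this.ops.push(op);
      this.nextClock.set(op.client, expected + 1);
      integrated += 1;
    }
    return integrated;
  }
}
```

Sending the same full state once via the provider and once via the worker leaves the toy document unchanged the second time. With the real library, an equivalent check would be to call Y.applyUpdate twice with the same update and compare the results of Y.encodeStateVector before and after the second call.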

4. Should rendered pages still be marked as unsynced?

  • Recommendation: Mark all document updates as unsynced, regardless of whether an active provider exists.
  • Rationale:
    • The provider is responsible for low-latency collaboration, but it does not guarantee immediate persistence of updates to the server database (especially if you plan to implement snapshot/GC later).
    • The worker acts as a fallback for final writes. A unified logic is the simplest approach: mark a document as unsynced whenever there are local changes, and only mark it as synced after the worker successfully pushes the changes, the server merges them and returns the updated version, and no further local changes arrived while the push was in flight.

Do You Need a Custom Provider/Server?

  • Not for the real-time layer:
    • The real-time collaboration layer can still use Hocuspocus; it is sufficient for this purpose.
    • What you actually need to customize is the persistence service:
      • Expose a POST /y-sync/{docId} endpoint that accepts the output of Y.encodeStateAsUpdate.
      • Expose a GET /y-sync/{docId} endpoint that returns the document’s current full state (the output of Y.encodeStateAsUpdate) or a snapshot.
      • Expose a GET /y-sync/changed?since=... endpoint to list recently updated documents.
  • Your sync engine will thus consist of two components:
    • Hocuspocus: For short-term, online collaboration.
    • Your custom y-sync API: For long-term data retention, offline recovery, and background state alignment.
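
The server side of those three endpoints could be backed by something like the store below. This is a sketch under assumptions: YSyncStore, its method names, and the injected merge function are hypothetical, storage is an in-memory Map, and the HTTP layer is omitted. In a real service the merge would be done with Yjs (Y.applyUpdate on a server-side Y.Doc, or Y.mergeUpdates on the stored bytes).

```typescript
interface StoredDoc {
  state: Uint8Array;
  serverVersion: number;
}

type MergeFn = (current: Uint8Array, incoming: Uint8Array) => Uint8Array;

class YSyncStore {
  private docs = new Map<string, StoredDoc>();
  private version = 0; // global, monotonically increasing
  private merge: MergeFn;

  constructor(merge: MergeFn) {
    this.merge = merge;
  }

  // Backs POST /y-sync/{docId}: merge the upload and return the new version.
  push(docId: string, update: Uint8Array): number {
    const existing = this.docs.get(docId);
    const state = existing ? this.merge(existing.state, update) : update;
    this.version += 1;
    this.docs.set(docId, { state, serverVersion: this.version });
    return this.version;
  }

  // Backs GET /y-sync/{docId}.
  get(docId: string): StoredDoc | undefined {
    return this.docs.get(docId);
  }

  // Backs GET /y-sync/changed?since=...: docs whose version is newer than `since`.
  changedSince(since: number): { docId: string; serverVersion: number }[] {
    const out: { docId: string; serverVersion: number }[] = [];
    this.docs.forEach((doc, docId) => {
      if (doc.serverVersion > since) out.push({ docId, serverVersion: doc.serverVersion });
    });
    return out.sort((a, b) => a.serverVersion - b.serverVersion);
  }
}
```

A single global counter is used here instead of updatedAt so that ordering does not depend on clocks. As written, the version is bumped on every push, including ones that change nothing; comparing the merged state with the stored one before bumping would save other devices some redundant pulls.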