Retrieve and store a y-websocket persisted document

I’d like to retrieve the current state of a y-websocket/y-leveldb persisted document, and then create a new document with just the “latest state” (truncating/ignoring all history).

I’m using a custom server that combines HTTP and WS, which allows me to accept an HTTP POST to trigger the action.

I thought I might be able to copy the latest state with direct access to the docs map; however, it looks like this map only contains documents held in memory, so I can’t use it to retrieve a document unless it has already been “vivified” from its persistent store.

Furthermore, this map contains only the AbstractType type, not the concrete types (e.g. YMap, etc.). So, even if I did have access to all vivified documents, I would not be able to call toJSON() on any particular one to get its current state. (Calling toJSON() on AbstractType simply returns undefined.)

What’s the best approach to retrieving and storing an arbitrary document in y-websocket? Considerations:

  • I’d prefer not to duplicate storage by using a database (but I will if I have to)
  • I can’t open the leveldb database directly, because its rules require that only one process has access to its on-disk store at a time

I’ve made some progress. It looks like I can do something similar to bindState and export a function that gets a ydoc (persistence.getYDoc, in the following code) from the combined persistence data and in-memory data:

  // Assumes Y, ldb, docs, map (lib0/map) and WSSharedDoc are already in scope,
  // as they are in y-websocket's server utilities.
  persistence = {
    ldb,
    // Bring the in-memory doc and the persisted doc to the same state
    mutualSync: async (docName, ydoc) => {
      const persistedYdoc = await ldb.getYDoc(docName)
      const newUpdates = Y.encodeStateAsUpdate(ydoc)
      // wait for the write so callers see a fully synced doc
      await ldb.storeUpdate(docName, newUpdates)
      Y.applyUpdate(ydoc, Y.encodeStateAsUpdate(persistedYdoc))
    },
    bindState: async (docName, ydoc) => {
      await persistence.mutualSync(docName, ydoc)
      // persist every subsequent update
      ydoc.on('update', (update) => {
        ldb.storeUpdate(docName, update)
      })
    },
    writeState: async (docName, ydoc) => {},
    getYDoc: async (docName) => {
      const ydoc = findOrCreateDoc(docName)
      // bindState is not awaited in findOrCreateDoc, so sync explicitly here
      await persistence.mutualSync(docName, ydoc)
      return ydoc
    },
  }

  function findOrCreateDoc(docName, gc = true) {
    // get doc, create if it does not exist yet
    return map.setIfUndefined(docs, docName, () => {
      const doc = new WSSharedDoc(docName)
      doc.gc = gc
      if (persistence !== null) {
        persistence.bindState(docName, doc)
      }
      docs.set(docName, doc)
      return doc
    })
  }

I still need to work out how to get the “current state only” from the ydoc and store it as another ydoc.

When using the y-leveldb adapter, it makes sense to duplicate storage. This is also what big databases like Cassandra and Bigtable do: they replicate every change. You could use a pub/sub server to distribute document updates and store each of them on all servers (using the methods of y-leveldb, you don’t even need to load the Yjs document).

A bit of background: For conflict resolution, there is no difference between Y.Array and Y.Map and Y.XmlElement. They all implement Y.AbstractType. In order to keep the Yjs update format consistent, it doesn’t send the type information of the top-level types. This is why you need to specify them individually with ydoc.getText(). In some cases you can take advantage of this, because a user can load a type as a Y.XmlFragment (ProseMirror) and another user can load a type as Y.XmlElement (the old dom binding).

Short answer: You need to know what the document structure looks like (i.e. load the types using doc.getText(), doc.get*()).

I like how @Mansehej implemented storing the JSON structure. https://github.com/yjs/y-websocket/pull/20

He basically specified beforehand what kind of types he wants to transform to JSON. Maybe this implementation is helpful to you as well. Also it makes sense to know beforehand what you are storing in a database. A user could store arbitrary data in the ydoc. You should only store the data that is relevant for your application.
