Shared references to nested yjs maps

vez · October 18, 2023, 2:29pm

Hi all, I’m building an application in which a user builds a tree of maps. For example:

const root = new Y.Map();
const nestedYmap = new Y.Map();
root.set('hello', nestedYmap);
nestedYmap.set('world', new Y.Map());

In this application, the user’s cursor can be “at” any of the nested ymaps (the nodes of the tree).

Now for the problem that I am not sure how to approach: For yjs arrays we have relative positions that can be shared between peers, but how can we share positions in nested maps?

Let me illustrate the problem by showing an approach that does not work:

const cursorPath = ['hello', 'world']
// now I can send this path to another user,
// and they can display the cursor by doing:
root.get(cursorPath[0]).get(cursorPath[1])

But this does not actually work properly! If another user changes the tree, for example by doing root.set('hello', new Y.Map()), then this path no longer points to the same place in the yjs document. It is the same problem as using indices to share cursor positions in shared arrays.

So is there a way around this with yjs?

raine · October 18, 2023, 4:21pm

I would recommend giving each node a unique id and storing them in a single level of Maps (either top-level or in a container Map). You can create the nesting through maps of children ids. This is akin to normalization in relational databases.

e.g.

a
- b
  - c

const id_root = 'root'
const id_a = guid()
const id_b = guid()
const id_c = guid()

const root = doc.getMap(id_root)
const a = doc.getMap(id_a)
const b = doc.getMap(id_b)
const c = doc.getMap(id_c)

// note that children are stored in a Map, not an array, to avoid 
// concurrency issues
root.set(id_a, true)
a.set(id_b, true)
b.set(id_c, true)

With this structure, you can share any node with another client just by sharing its id. It can be accessed in O(1) and its path can be determined in O(depth) if you add links from child → parent.

vez · October 18, 2023, 7:51pm

Thanks @raine for the solution! I can see how that works. I appreciate your analysis of the time complexity of the operations.

Since the trees may become quite large, the storage complexity of all these guids scares me a bit. It makes me wonder, can the guids of deleted nodes be cleaned up?

I don’t see a way to delete a map that is created using doc.getMap. Does this mean the gc will never clean them up and they will be stored in the document for eternity? Modifying off of your example, would it allow the gc to clean up the maps once they are removed from the tree (deleted from their parent map) if I instead did something like:

const id_root = 'root'
const id_a = guid()
const id_b = guid()
const id_c = guid()

const root = doc.getMap(id_root)
const a = new Y.Map()
const b = new Y.Map()
const c = new Y.Map()

root.set(id_a, a)
a.set(id_b, b)
b.set(id_c, c)

a.delete(id_b)
// can the gc clean up the maps b and c 
// since they are not connected to the document any more?

I’m not sure how to think about managing memory in yjs, so this may or may not make sense. I just wonder if there is ways to reduce the storage complexity, and this is my idea to try to reduce it in the deleted parts of the tree.

raine · October 19, 2023, 12:08am

You can use new Map instead of doc.getMap() if you prefer.

Deletes are very cheap in YJS. Just delete all the keys and they will be moved into an efficient delete set. However, you would have to traverse and delete all descendants if you use the flat map as described. YJS won’t delete orphaned nodes in this model since the hierarchical structure has been moved outside YJS and is only captured in the business logic.

As far as overall size of the Doc, that depends a lot on your application. It will grow over time. I’ve had to switch over to a multi-Doc architecture to improve initial load time and allow lazy loading, but it’s a lot of work to manage. I’m aiming for a capacity for 100k items, with room to grow to 1M, and I want the initial visible items (~100) to load in under 1 second. If you have less data or less stringent load time requirements, one Doc is far easier.