Moving elements in lists

aslakhellesoy · May 21, 2020, 8:18am

Hello everyone,

Kevin and contributors - thanks so much for Yjs - it’s an impressive piece of kit!

I’m building a collaborative application where users can reorder nodes in an ordered tree. I’ve modeled the hierarchy as a Y.Map<Y.Array<string>> where the keys are node ids, and the values are the ids of that node’s children. (My move operation is a delete followed by an insert).

This works nicely except for moves. If two peers concurrently move a node to different destinations we end up with a dupe.

Martin Kleppmann has described the problem nicely in this paper/video.

Are there any known techniques or workarounds for dealing with moves in Yjs?

Many thanks,

canadaduane · May 21, 2020, 1:28pm

Just to be clear–the hierarchy is modeled as Y.Map<string, Y.Array<string>>, correct? (Reading a Map with only one type in its type signature was tripping me up).

aslakhellesoy · May 21, 2020, 9:57pm

Correct. Consider the following tree:

countries
+--france
|  +--paris
+--spain
   +--madrid
   +--seville

That would be represented as follows in a Y.Map<string, Y.Array<string>> (using JSON syntax)

{
    "__ROOTS__": ["countries"], // convenience entry for quickly finding root(s)
    "countries": ["france", "spain"],
    "france": ["paris"],
    "paris": [],
    "spain": ["madrid", "seville"],
    "madrid": [],
    "seville": [],
}

The problem of moving items doesn’t require a model this complex though. Let’s consider reordering in a simple list (Y.Array<string>):

["wash floors", "get milk", "homeschooling"]

Peer A moves "wash floors" to index 1, while Peer B moves "wash floors" to index 2. Since the move operation doesn’t exist, they both delete the "wash floors" element, then insert it at the new index. When these operations are merged we end up with "wash floors" twice in the list.

My question is - can this kind of conflict be resolved in Yjs? If not, would this be possible to add?

According to Martin Kleppmann, a CRDT move operation can be implementd with a combination of a List CRDT (Y.Array in the case of Yjs), an AWSet and a LWWRegister. I’d love to contribute this, but I think I need to learn a lot more about CRDTs and the Yjs codebase before I’d be able to.

Anyone more able than me?

aslakhellesoy · May 21, 2020, 10:28pm

Btw here is the paper: https://martin.kleppmann.com/papers/list-move-papoc20.pdf

dmonad · May 27, 2020, 3:30pm

Hi @aslakhellesoy,

Moving element(s) is possible in any list CRDT, as Martin Kleppmann described. He just discusses a single approach, but there are probably more solutions that can be explored. My first idea was to use special move-items that will refer to a range (start;beginning) of items (In Yjs terms an item is an integrated list-CRDT operation that consists of a unique id, content(s), and a position (next item; previous item)). The moved items would be marked as moved and would refer to the move operation. In case of a concurrent move, a last writer wins algorithm is simply implemented by comparing the unique identifiers of the move items that overlap.

This would have the advantage that you can move ranges of text. It would even be possible to move text from one Y.Array to another Y.Array.

Here are some of my concerns:
• How would rich-text semantic (formatted text) work in combination with the ability to move text? It would probably expected that moved text retains the formatting attributes.
• Introducing moving semantics requires a new event system. Implementing an event system for just insert/delete with formatting attributes is already incredible difficult. How do you even express moving semantics in events?
• Moving semantics also affect other areas of Yjs. It will probably take a lot of effort to make it work with version histories.
• How do you even visualize moving semantics with, for example, ProseMirrors version history implementation.
• While your example example is absolutely reasonable, I think that moving semantics on text/arrays is not yet worth the effort. How would you even recognize moving of text in current text editors? They also only work with insert/delete semantics.
• If I’m going to implement moving semantics in Yjs, I also want to introduce copying semantics. This would allow us copy types from one type to another. Basically you should be able to do const user = ydoc.getMap('my-user'); ydoc.getMap('editor').set('currentUser', user). My idea is also that you would be able to use copy semantics to use Yjs to better organize data / replace Firestore / provide some kind of indexing ability.

I was just writing down my thoughts. I do want to have moving / copying semantics in Yjs eventually. But there is a lot of work to be done.

You can actually implement some kind of moving semantic yourself. You don’t need to have Yjs for that. It only gets complicated when you introduce copying ranges. Based on your second example you could do the following:

// Every item has a unique ID:
const data = [
  { content: "wash floors", id: "Xyz" },
  { content: "get milk", id: "gz3" },
  { content: "homeschooling", id: "39z" }
]

const yarray = ydoc.getArray('myfood')

/*
On every change you ensure that there are no duplicate ids.
This will ensure that content is not duplicated when a two clients move the same item at the same time.
The easiest implementation is to keep the last unique item and delete the rest:
 */
yarray.observe(event => {
  // temporary Set of all ids used in yarray
  const uniqueIds = new Set()
  const array = yarray.toArray()
  // bundle all changes in a transaction, so that only one event is fired
  yarray.doc.transact(() => {
    // delete from right to left so that deletions don't affect the current position
    for (let i = array.length - 1; i >= 0; i--) {
      const item = array[i]
      if (uniqueIds.has(item.id)) {
        // We already found this item, delete it
        yarray.delete(i, 1)
      } else {
       // This is the first time we found this item (the id is unique)
       uniqueIds.add(item.id)
      }
    }
  })
})

// Initialize content _after_ the above handler has been registered.
yarray.insert(0, data)

const moveItem = (yarray, from, to) => {
  yarray.doc.transact(() => {
    const item = yarray.get(from)
    yarray.delete(from)
    // we already deleted an item, we might need to adjust the position
    const adjustedPosition = from < to ? to - 1 : to
    yarray.insert(adjustedPosition, [item])
  })
}

// Now you can move items:

moveItem(yarray, 0, 2)

aslakhellesoy · May 28, 2020, 1:42pm

Thanks @dmonad for your detailed reply and the workaround. I had something like that in mind, and it’s reassuring to see your recommendation.

gzp-crey · June 23, 2020, 6:49am

I’m thinking on reimplementing some of the types/concept of yjs for a pet project written in rust. I think I’m starting to understand the basics. You’ve mentioned you are thinking on creating a copy (and move) functionality. How would it fit into the structs. For YATA to work each operation has a left/right operation where it can be integrated. For copy and (move) there are two. One source and one destination. All pending operation that target the old location should have the effect on the new location.
For move I’m thinking on reordering the operations but i’m not sure if the ordering requirement (non crossing link as of the article) will be still valid. Or i’m nut sure what extra constraint is required.
For copy i’m more stuck as in this case the effect (operation) have to be duplicated (and the two site can even diverge). Thus it should mimic as if there were two independent operation at different location. But how could you find the id for the duplicated operation?

What do you have in mind to implement it? Do you have some trick i’ve missed or was not stated yet.

dmonad · June 25, 2020, 8:14pm

Hi @gzp-crey
To my knowledge nobody is currently working on something like this. As a start, you should definitely have a look at the move approach that @aslakhellesoy mentioned and continue from there. Reimplementing Yjs, or any CRDT for that matter, is already pretty difficult. I suggest to make small steps.

jacobp100 · July 20, 2020, 3:26pm

Just to be clear, in the above example with observing an array, is that needed even for the case of moving only within a single array? (Not maps of arrays like the original question)

One idea I was playing around with is representing the array as a map, and having a sortIndex property on each element. Figma had an interesting blogpost on how they handle reordering of layers (this exact issue) - and they’re using fractional indexing (using the string versions of this to avoid needing bignum)

Then I have a utility function to convert this map to a standard, sorted array.

Would that work for this case? It should at least avoid data inconsistencies, since re-ordering is just setting a map value.

dmonad · July 20, 2020, 9:03pm

I’m unclear if it is necessary to represent move operations in the event system. I need to be sure that the move semantic that is implemented is robust and can actually do everything that is needed. Move semantic brings in extra complexity as the way I see it now, simple move semantics can easily be implemented using the approach I highlighted before.

Yjs’s arrays have stricter semantics than the fractional indexing implementation. In particular, Y.Arrays do not suffer from interleaving. So this is not an option for Yjs in particular.

Of course, you can represent the array as a map and then convert the map to an array using some kind of sort on the data. This is basically the approach I laid out in the bottom of my answer Moving elements in lists

I suggest not to use maps as the top-level data structure. Deleted properties will always take up space (the key-name can’t be removed). When you store the data in a Y.Array, you can delete data very efficiently.

I’m actually planning a Y.Set implementation for exactly for this use-case. This would allow to implement efficient move-semantics using a sort algorithm on the data.

marbemac · August 11, 2020, 1:36am

@dmonad thanks for putting together the earlier example! Would you expect that pattern to work if it’s a Y.Array<Y.Map>? Seeing an exception when trying to delete/add the Map entry.

I reproduced it over in this codesandbox: https://codesandbox.io/s/sleepy-visvesvaraya-nux76?file=/src/App.js

Ignore missing names in the second list, not sure why that’s happening - but if you click on the second button it will try to sort the array of maps, and produce the same exception we’re seeing in our proof of concept (the top list is your working example of an array of plain objects).

dmonad · August 11, 2020, 10:00am

Hi @marbemac,

You can’t copy a Y.Map from one position to another. A Y.Map instance can only be present once in the document.

In moveItem you take one map and insert it at a different position.

An approach that supports types as children would look like this:


const arr = doc.getArray('move-array')

// the array is ordered using "index" markers. They represent the order of the items
// index is misleading, because index simply specifies a random value to mark the maps position
const getArrayContent = arr => arr.toArray().sort((a, b) => a.get('index') - b.get('index'))

const insertMap = (map, index) => {
  moveMap(map, index)
  // always appended
  arr.append([map])
}

const moveMap = (map, index) => {
  // you want to insert the map between two other maps
  const sortedArray = getArrayContent()
  const left = sortedArray[index] || null
  const right = sortedArray[index + 1] || null
  const leftIndex = left === null ? 0 : left.get('index')
  const rightIndex = right === null ? index : right.get('index')
  // you probably need to use another in-between measurement. Eventually you might run out of numbers
  map.set('index', (rightIndex - leftIndex) / 2)
}

This example is inspired by the id approach used by LSEQ and LAGOOT. You basically assign each map a number that specifies its position. If a map is moved, you set the index in a way that moves the map to the intended position. Together with the pre-ordering of the Y.Array, you will always get a uniquely ordered list that supports move.

I did not test the above example. I appreciate that you created a codesandbox to test out the previous examples. Let me know if it works for you, but you probably have to adapt it a bit.
You also need to ensure that the sort-algorithm is stable. I know that chrome uses a stable sort algorithm. Another approach is to use a stable sort algorithm from npm.
Eventually you might run out of numbers. You can get inspired from the LSEQ an LAGOOT CRDTs how they assign ids to limit the amount of numbers. Your implementation should work well in the following cases:
** A user inserts 1 million maps at a position (insert at pos 1, insert at pos 2, insert at pos 3, …)
** A user prepends 1 millions maps at a position (insert at pos 1, insert at pos 1, insert at pos 1, …)

marbemac · August 11, 2020, 8:58pm

Got it! Will give it a go now, thank you.

This is basically the fractional indexing approach described above that Figma uses right?

Have got one related follow up (LMK if makes more sense to start a new thread for it) - since deleting a Y.Map from one array and adding it to another does not work, how would one “re-parent”?

For example - moving a file (Y.Map) from one folder (Y.Array) to another in a filesystem/tree like situation. Given this use case, does it make more sense for us to consider a different data structure e.g. Y.Map<File> has parent property pointing to id of Y.Map<Folder>? This would make it easy to “re-parent”, but it seems like we’d lose out on a number of neat yjs features such as the automatic child cleanup when deleting arrays etc.

Will let you know how well the indexing approach works once I have a moment to try it!

dmonad · August 11, 2020, 11:10pm

This is basically the fractional indexing approach described above that Figma uses right?

Exactly! It is just an adaption to support move operations.

Re-parenting on maps is currently not supported. v12 supported reparenting, but I removed this feature because it is hard to keep track of potentially cyclic structures. Furthermore, the child cleanup approach wouldn’t work anymore.

Having a separate map might make sense. I think you read a thread of me suggesting to Duane that he should not use a top-level data structure. In his case, he stored millions of different keys in a single top-level map. For a small file system (with probably less than 10k files) it won’t make a difference. The advantage is that you could use the much simpler first approach. Although the second approach is probably optimal because it is more efficient and allows you to keep everything in a single data structure.

FYI: the children (i.e. values of the maps) will always be cleaned up after they are deleted. But the keys, alongside some other metadata, will remain as part of the Yjs document. This metadata is needed for conflict resolution. Writing only to a few keys with the same name is very efficient, because Yjs does a lot of internal optimizations.

I suggest that you implement the approach that is easier to work with, and then optimize later if you notice any significant overhead.

marbemac · August 11, 2020, 11:46pm

I think you read a thread of me suggesting to Duane that he should not use a top-level data structure. In his case, he stored millions of different keys in a single top-level map. For a small file system (with probably less than 10k files) it won’t make a difference.

I did indeed - this is good to know. I’ll experiment with a couple of different structures to see what works best. Checking this out next, will let ya know where I end up. Now I’ve moved down the POC checklist to the “branching” item - this one seems like it should be pretty straightforward with yjs, fingers crossed.

BTW below is where I ended up re moving items within a parent - what you shared is working like a charm with some small tweaks . Built on the ideas in this article to create an index (the generateKeyBetween func used but not defined in example below) that would scale to work with most realistic situations - Implementing Fractional Indexing / David Greenspan | Observable.

// the array is ordered using "pos" markers. They represent the order of the items
// this function returns the sorted array
export const getArrayContent = (arr: Y.Array<Y.Map<any>>) =>
  arr
    .toArray()
    .sort((a, b) =>
      a.get('pos') < b.get('pos') ? -1 : a.get('pos') > b.get('pos') ? 1 : 0,

const insertMap = (
  arr: Y.Array<Y.Map<any>>,
  map: Y.Map<any>,
  index: number,
) => {
  moveMap(arr, map, index);

  // always appended, since sort order is not reliant on array order
  arr.push([map]);
};

export const moveMap = (
  arr: Y.Array<Y.Map<any>>,
  map: Y.Map<any>,
  index: number,
) => {
  const sortedArray = getArrayContent(arr);
  const currentIndex = sortedArray.findIndex((i) => i === map);

  let left: Y.Map<any> | null;
  let right: Y.Map<any> | null;
  if (currentIndex >= 0 && currentIndex < index) {
    // we're moving an item down in the list
    left = sortedArray[index] || null;
    right = sortedArray[index + 1] || null;
  } else {
    // we're adding an item for first time, or moving up in list
    left = sortedArray[index - 1] || null;
    right = sortedArray[index] || null;
  }

  const leftIndex = left ? (left.get('pos') as string) : null;
  const rightIndex = right ? (right.get('pos') as string) : null;

  map.set('pos', generateKeyBetween(leftIndex, rightIndex));
};

dmonad · August 12, 2020, 1:45pm

Nice, thanks for compiling this into a useful example!

zib · August 12, 2020, 4:20pm

Thank you for the explanation. Is it the same reason why I couldn’t move (delete and reuse for insertion) Y.XmlElement? What’s your suggested approach (as recreating the the whole sub-tree is quite a lot of work)?

  const doc = new Y.Doc();
  const body = doc.getXmlFragment('body');
  const p = new Y.XmlElement("p");
  body.insert(0, [p]);
  body.delete(0, 1);
  body.insert(0, [p]); // TypeError: content is null

dmonad · August 12, 2020, 4:58pm

Deleting an element and then inserting it somewhere else would also be a move operation. Implementing moves is not trivial and has a lot of consequences.

As soon as you delete a type, all children are also deleted. The advantage is that children can be removed from the document. So even if Yjs would support moving operations, you shouldn’t delete them beforehand.
The content of a deleted type is empty.
What happens when two users move the same Y.Type at the same time?

The best approach is to copy the type before you insert it somewhere else. You might duplicate content if two users move an element at the same time. If that is unacceptable, you can implement one of the move approaches we discussed above. Although, duplicating content through concurrent moves could also be seen as an advantage (assume two users move a paragraph and modify it a bit. In most versioning systems you would expect that you can see both versions).

marbemac · September 22, 2020, 3:16pm

Yeah however when talking about tree structures, copying approach comes with a significant disadvantage. With copy approach any in flight edits by other users to children of the original tree will be lost - e.g. if I drag a folder full of files to re-parent it under a separate folder, and user two is renaming one of the child files in the folder tree, the rename might be lost right?

Just making sure I understand correctly .

dmonad · September 22, 2020, 5:15pm

Move semantics have a lot of trade-offs. Performance vs semantics. Complicated semantic vs simple semantic. Your kind of move semantics will also introduce new semantic issues. What is the expected outcome when two users move the same paragraph and then modify it - adjust it to the surrounding context?

Move semantics are essentially useless in collaborative text editing scenarios because they will never be able to catch the intention of the user.

What does it matter anyway when documents sync within milliseconds?

For offline editing you need to display the differences anyway so that the user can manually merge conflicts.

For folder structures you can already implement move semantics as explained above. I’m not saying that Yjs will never support move. With some research I’d like to support a generic move eventually.