Yjs vs Loro (new CRDT lib)

I just learned about the new CRDT solution named Loro.

# Features

## Supported CRDT Algorithms

- **Common Data Structures**: Support for `List` for ordered collections, LWW (Last-Write-Wins) `Map` for key-value pairs, `Tree` for hierarchical data, and `Text` for rich text manipulation, enabling various applications (a brief usage sketch follows this list).
- **Text Editing with Fugue**: Loro integrates [Fugue](https://arxiv.org/abs/2305.00583), a CRDT algorithm designed to minimize interleaving anomalies in concurrent text editing.
- **Peritext-like Rich Text CRDT**: Drawing inspiration from [Peritext](https://www.inkandswitch.com/peritext/), Loro manages rich text CRDTs that excel at merging concurrent rich text style edits, preserving the original intent of users' input as much as possible. Details on this will be explored further in an upcoming blog post.
- **Moveable Tree**: For applications requiring directory-like data manipulation, Loro utilizes the algorithm from [*A Highly-Available Move Operation for Replicated Trees*](https://ieeexplore.ieee.org/document/9563274), which simplifies the process of moving hierarchical data structures.
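
For orientation, here is a rough usage sketch of these containers. It assumes the `loro-crdt` 0.x JS bindings as I understand them; exact method names may differ between versions.

```js
// Sketch only: assumes the loro-crdt 0.x JS bindings; names may have changed.
import { Loro } from "loro-crdt";

const doc = new Loro();

const text = doc.getText("text");   // Fugue-based text CRDT
text.insert(0, "Hello");

const map = doc.getMap("meta");     // LWW map for key-value pairs
map.set("title", "Untitled");

const list = doc.getList("items");  // ordered list
list.insert(0, "first item");

// A movable tree container (doc.getTree) is also available for
// directory-like, hierarchical data.
```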

## Advanced Features in Loro

- **Preserve Editing History**
  - With Loro, you can track changes effortlessly as it records the editing history with low overhead. 
  - This feature is essential for audit trails, undo/redo functionality, and understanding the evolution of your data over time.
- **Time Travel Through History**
  - It allows users to compare and merge manually when needed, although CRDTs typically resolve conflicts well.
- **High Performance**
  - [See benchmarks](https://www.loro.dev/docs/performance).

> **Build time-travel features easily for large documents.**


I was wondering if anyone has tried it out yet, and what the main differences (features, DX) are when compared to Yjs?

I have a bit of a problem with the Loro CRDT as their benchmarks are not reproducible. They don’t even publish the source code for the benchmarks. Yet, they make bold claims.

I tried to reproduce their results and ended up with very different numbers: GitHub - dmonad/crdt-benchmarks: A collection of CRDT benchmarks

My argument against wasm implementations in general is that they are way too large and, ironically, consume more memory than a Yjs document would in most cases. The Loro bundle is over 1 MB, and it needs to be base64-encoded if you ship it to the browser (+30% overhead). Once wasm is ready, we will encourage users to use Ywasm (a Yjs-compatible port). But the web is not ready yet for wasm CRDTs.

6 Likes

Thanks for the detailed reply. Not open-sourcing the benchmarks is indeed very suspicious on Loro's part.

Links in the post are removed; otherwise, the community will flag the post as spam.

I am the author of Loro. The latest benchmark data can be found at github zxch3n/crdt-benchmarks, which includes the most recent Loro benchmark data as well as the November 2023 version, both of which you can easily reproduce. Over the past few months, Loro has undergone numerous changes, especially in the encoding format, shifting from a version focused more on performance to one emphasizing compatibility, resulting in performance differences (loro-dev/loro/pull/219). We also have experimental work underway that may significantly change performance in the near future, so even the current Loro benchmarks are not stable.

For a fairer comparison of Yjs and Loro in this benchmark, Yjs should turn off garbage collection because Loro and Automerge’s documents record the complete editing history, whereas Yjs in GC mode does not. Even with GC disabled, Yjs cannot directly implement Time Travel, whereas Loro and Automerge can.
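
For reference, this is how garbage collection is toggled on a Yjs document (standard Yjs API):

```js
import * as Y from "yjs";

// Default: garbage collection enabled, deleted content is collected.
const docWithGc = new Y.Doc();

// GC disabled: tombstones for deleted content are kept, which is closer
// to how Loro and Automerge retain the full editing history.
const docWithoutGc = new Y.Doc({ gc: false });
```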

Regarding package size, it’s true that the WASM size is larger. However, it’s important to consider the context. Most use cases for CRDTs are in web apps, not web pages. In this scenario, caching can be fully utilized, and items like WASM binaries do not require frequent updates. Additionally, WASM binaries do not need to be embedded in the JavaScript source code as Base64, which can misleadingly suggest a 30% increase in size due to encoding. They can be loaded as separate files. If there’s a strong community demand for a lightweight library, we will explore a mode based on Loro’s REG algorithm features. This mode would rely on server-side conflict resolution computation, where the client only needs a lightweight JS library to log operations and apply diffs.

If you have any questions, please email me directly or ask in our community. I greatly appreciate the work of Yjs and have even sponsored Kevin for over a year. We have given Yjs credits in our README, and I have supported Kevin when he tried to reproduce our results, explaining why there are performance differences with the original version. I had hoped for a more friendly community. The insinuations in this discussion made me upset and disappointed.

2 Likes

Hi @zxch3n ,

I do want you to feel welcome and I am thankful that you were a sponsor!

I also appreciate that you made the source code for reproducing the updated benchmarks available. I don't want to single you/Loro out. I have spoken out publicly many times when papers and CRDT authors don't publish data on how to reproduce their benchmarks. It is, unfortunately, common practice to publish only the results, hiding the drawbacks of the approach. I do like to point that out whenever I have the chance.

With crdt-benchmarks I want to give some kind of guideline for benchmarks that are relevant in the real world. You don't have to "win" them all, but it's good if a CRDT implementation doesn't take exponentially long for anything, and Loro never has!

I remember your helpful input when I was working on the Loro benchmarks. I also remember pointing out that your benchmarks were not reproducible and that the source code is missing. In some cases, your benchmarks were 40x faster than what I was able to reproduce. Of course, this makes me wonder. To this day, I’m unable to reproduce your old results.

When I was asked in this forum about Loro, I believe I was right to point out that I was unable to reproduce your results. I'm still unable to reproduce the old results, and you still haven't published code that would allow reproducing them. I want to note the following differences:

[Screenshot: Loro results according to GitHub - zxch3n/crdt-benchmarks: A collection of CRDT benchmarks]

[Screenshot: Loro's website last month (2024-05-07, loro-old)]

The last column doesn't match the results of loro_old, which is supposedly much faster.

This makes me think that other things were modified in the original crdt-benchmarks source code. These modifications were not mentioned anywhere (I remember, for example, that you removed the report of the size of the web bundle).

I also believe that I'm right to point out that turning off garbage collection in Yjs is unfair and misleading to the user. It is an integral feature of Yjs and the reason why it performs well in practice. Turning it off will yield very different results. The github repository (GitHub - zxch3n/crdt-benchmarks: A collection of CRDT benchmarks) doesn't mention anywhere that gc is disabled in Yjs. It also doesn't explain why it is "fairer" to turn it off. (I could have done this in a ticket in the repository, but I just learned about it.)

Regarding versioning:

I do appreciate Loro's approach to versioning. It is very simple and easy to use.

However, I don't recommend that users who want to do versioning in Yjs turn off gc. Instead, I recommend storing the encoded Yjs document versions in separate database entries. The differences can be computed using snapshots (state vector + delete set). I don't think it is right for every application to load the full editing history into memory, like Loro does. This might be problematic for huge documents with a long editing history.
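
Roughly, the approach I mean looks like this (a minimal sketch using standard Yjs functions; how and where versions are persisted is up to the application):

```js
import * as Y from "yjs";

// Persist a version: the encoded update is a self-contained snapshot of the
// document at this point in time. Store it as a separate database entry.
function saveVersion (doc) {
  return Y.encodeStateAsUpdate(doc); // Uint8Array
}

// Compute the difference between two stored versions without keeping the
// full editing history in memory.
function diffVersions (earlierVersion, laterVersion) {
  const stateVector = Y.encodeStateVectorFromUpdate(earlierVersion);
  return Y.diffUpdate(laterVersion, stateVector);
}

// Restore any stored version into a fresh document.
function loadVersion (versionUpdate) {
  const doc = new Y.Doc();
  Y.applyUpdate(doc, versionUpdate);
  return doc;
}
```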

In conclusion, versioning in Yjs is still possible with gc turned on. It is just different than Loro’s approach, which has an integrated versioning feature but lacks garbage collection.

Regarding base64 encoding:

Unless you want to load Loro asynchronously, you have to base64-encode the wasm bundle. I'd like to see a demo where a bundler doesn't base64-encode the wasm bundle. In the future this might be different.

1 Like

I'm sorry if this response was again quite critical. I really do like to learn about new CRDT implementations, and Loro is a very interesting one. Please don't feel discouraged by my above response.

I'm really happy that the benchmarks are now published and that I can play with them.

I welcome criticism, but please communicate directly.

How to Reproduce

You can find and reproduce the same result at this commit: GitHub - zxch3n/crdt-benchmarks at 029df673e42564850da4b83c6531ff9c06e004ef. The difference is that it merged the upstream changes from January 24, 2024, to February 12, 2024. You can find the differences in the benchmark results before and after the merge here: Update the benchmark results base on the new benchmark rules from ups… · zxch3n/crdt-benchmarks@44819d4 · GitHub

Why It Causes a Performance Change

The difference is caused by the new version providing an additional, unused updateHandler, which triggers an export every time a character is inserted or deleted. We have not optimized for this scenario, resulting in high additional computational costs (such optimizations seem to only make the benchmark numbers look better without any other benefits). There are also costs associated with communication from WASM to JS. The new version 0.15.2 could be much faster without this unused updateHandler, as the exported data even includes an MD5 checksum.
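
To illustrate the pattern being measured, here is a sketch using the Yjs API as a stand-in (the benchmark applies the same rule to every implementation; this is not the actual crdt-benchmarks code):

```js
import * as Y from "yjs";

const doc = new Y.Doc();
const text = doc.getText("text");
const collected = [];

// The update handler fires on every single insert/delete, so each keystroke
// pays the cost of producing an encoded update, even though the collected
// updates are never used afterwards in this scenario.
doc.on("update", update => {
  collected.push(update);
});

for (let i = 0; i < 10000; i++) {
  text.insert(i, "a"); // each insert triggers the handler above
}
```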

Why the February Version Cannot Reproduce the Performance of Last November

We continuously improve the design and adjust the architecture, introducing many breaking changes, some of which have disrupted the original optimizations. We know there are new optimizations we could undertake, but they are not a high priority for us right now. When you try to reproduce the old results, the code is in an unoptimized state.

About Loading WASM

Our blog post Introduction to Loro's Rich Text CRDT – Loro loads WASM as a separate file. Loading it asynchronously isn't difficult, as the ecosystem is now sufficiently friendly towards loading content asynchronously.
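
A rough sketch of what loading the wasm binary as a separate, asynchronously fetched asset can look like with a wasm-bindgen style package (the module path and export names below are placeholders, not Loro's actual entry points):

```js
// The JS glue and the .wasm file are emitted as separate assets; nothing is
// inlined as base64. Names below are placeholders for illustration only.
const { default: init, createDoc } = await import("./pkg/crdt_lib.js");
await init();            // fetches and instantiates the .wasm file next to the glue code
const doc = createDoc(); // ready once the wasm module is instantiated
```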

About Fair Benchmarks

If the original benchmark compares document size and encode & decode speed without stating that Loro & Automerge save the complete editing history and are therefore larger and slower, it is clearly unfair; it directly ignores this fact.

If the comparison is intended only for real-time editing scenarios in terms of document size and encode & decode speed, we will have a more suitable mode for it in the future: one that can trim off the needless history. Meaningful comparisons of document size or performance for real-time collaboration scenarios will only be relevant then. In our older codebase, we had GC capabilities, but we later removed this feature (Feat remove gc by Leeeon233 · Pull Request #110 · loro-dev/loro · GitHub) to simplify the overall architecture, instead preparing to utilize the properties of the REG algorithm to directly avoid loading unnecessary history. With REG, not loading needless history eliminates CRDT overheads such as tombstones.

The community hides the posts because they contain links to GitHub. You can find the reply here:

https://twitter.com/zx_loro/status/1787994354176659605

While I think that in most applications using snapshots + GC is preferable, I also think that on some occasions having the full editing history can be preferable.

My recommendation is to make a website that shows the benchmark results interactively, with notes about each implementation's configuration. The results could be shown with or without GC (or leave one of those sections blank if the library does not support it).

This is what the well-known js-framework-benchmark does, which is quite well regarded among JS framework authors (view the latest edition here). You can see that keyed results are separated from non-keyed results, and that there are notes indicating whether an implementation uses manual DOM manipulation, explicit requestAnimationFrame calls, etc.

Another aspect that could be worked on is the presentation and weighting of the results. In the benchmark I mentioned, I left some suggestions that were well received but that I haven't had time to implement yet.

I think the results of the CRDT benchmark are a little difficult to parse even for people experienced in CRDTs, much more so for beginners.

I would make a weighted average of the results that match real-world document traces, and put worst-case scenarios like random or concurrent operations aside. I would also group them by what they measure (docSize, updateSize, time, etc.).
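
A tiny sketch of the kind of summary I have in mind (the data shape is hypothetical; the point is only the grouping and weighting):

```js
// results: [{ benchmark, metric, value, realWorld, weight }]
function weightedSummary (results) {
  const groups = new Map();
  for (const r of results) {
    if (!r.realWorld) continue; // keep worst-case scenarios out of the headline number
    const g = groups.get(r.metric) ?? { sum: 0, weight: 0 };
    g.sum += r.value * r.weight;
    g.weight += r.weight;
    groups.set(r.metric, g);
  }
  // One weighted average per metric group (docSize, updateSize, time, ...)
  return Object.fromEntries(
    [...groups].map(([metric, g]) => [metric, g.sum / g.weight])
  );
}
```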

I’m sorry there were misunderstandings here. I believe that if there is collaboration, a benchmark can be achieved that shows the results in the most objective way possible, along with the tradeoffs of each implementation.

PS: I don’t know the reason why some messages in this post were hidden, but I think they contributed to the discussion in a positive way.

As I have also seen the discussion on Twitter, I want to add something.

I 100% agree that efforts to win in a benchmark can be a poor use of resources when the threshold of what the end user can perceive has already been exceeded.

In this regard, I think Google has done a good job with its Core Web Vitals (CWV). There is no incentive to improve metrics once the recommended threshold is reached.

I hope that the fatigue of optimizing certain rare scenarios, which both of you face as library maintainers, does not result in a lack of collaboration or even of healthy competition. The benchmark can be a great tool for you as maintainers and for us as users, if the methodology and presentation of the benchmark are polished.

1 Like

These benchmarks are good at spotting performance issues related to unexpected time or memory use. However, they can be misleading if they don't reflect real-world situations, since different projects often make different compromises. It's not accurate to say that Project A is better than Project B just because it performs better in some benchmarks; Project B might be better in other important ways.

To make these benchmarks more useful, we should discuss the trade-offs different projects make along with the benchmarks and try to make them fairer by allowing different setups.

I'll create the online benchmarks after Loro reaches v1.0.

1 Like