How efficient is updateV2 encoding

Gin-Quin · May 20, 2022, 4:24pm

Hi,

I would like to know more about the v2 encoding. If I understood correctly, it is 10x more efficient but still has bugs or is slow?

What will change compared to v1 encoding? Will the same API be used?

And is it possible to use it right now? It’s hard to find documentation about that.

dmonad · May 23, 2022, 7:02pm

Hi @Gin-Quin,

Who said that it has bugs or that it is slow?

v2 encoding is slightly slower than v1 encoding, but has a better compression ratio.

The size-benefit is (IMHO) not too important. Furthermore, all existing protocols and tooling only support the v1 encoding.

The reason why I don’t mention v2 encoding in the documentation is that users will get confused by the fact that there are two update formats. It makes more sense to recommend only one encoding format that has wide support.

V2 is stable and will be supported in the future. However, there will likely be a v3 encoding format that will be the new standard in the next Yjs release.

Gin-Quin · May 24, 2022, 8:16am

Okay, that’s awesome. I can’t remember exactly where I read that v2 is buggy or slow. I think it was partly in a comment in some old, outdated code and partly in a GitHub thread from when v2 was still in progress. Since I could not find any documentation for the v2 encoding, I wrongly assumed instability was the reason.

I’ve tried v2 a bit nonetheless and in my case updates were larger with v2 encoding than with v1. Does that mean the better compression ratio applies when all updates are merged into one state? Or is it that v2 is better-compressed only in some situations?

Anyway, if there is a better v3 coming up, that’s not very important. I was thinking about how powerful it would be to have an update format so minimal that it would make complete sense to store only the updates instead of a static state.

For example, for text documents, the most common operation is to “insert n characters at the end of the document”. Could there be some kind of optimisation so that the size of this update would be n or n+1 with a leading byte indicating the kind of the operation?
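To make the byte count concrete, here is a toy sketch of that idea. It is purely hypothetical and is not any Yjs format: the `OP_APPEND` opcode and both function names are invented for illustration. Note that it carries no information about who made the edit or what it was concurrent with, so it only works for a single writer appending in order.

```javascript
// Hypothetical opcode; not part of any Yjs encoding.
const OP_APPEND = 0x01

// Encodes "append this text" as one opcode byte followed by the UTF-8 bytes.
function encodeAppend(text) {
  const payload = new TextEncoder().encode(text)
  const update = new Uint8Array(payload.length + 1)
  update[0] = OP_APPEND
  update.set(payload, 1)
  return update
}

// Applies an append update to a plain string.
function applyAppend(documentText, update) {
  if (update[0] !== OP_APPEND) throw new Error('unknown operation')
  return documentText + new TextDecoder().decode(update.subarray(1))
}

const update = encodeAppend('world')
const result = applyAppend('hello ', update)
console.log(update.length, result) // 6 bytes for 5 ASCII characters
```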

Or maybe there could be a “cursor” system that would deal with all cases, not only “end of the text”. Most of the time you don’t type a single letter here and there. You first put a cursor somewhere in the text and then perform a series of insertions. It could be cool, when stacking updates together, to detect that “the last updates share the same cursor” and so avoid carrying the position of the insertion/deletion/… on every update.

I don’t have experience working with document updates, so maybe you have already thought about that and it is completely unachievable. I just believe a super-compact update format would be a complete game-changer and would change the way we store a “state”.

dmonad · May 25, 2022, 7:41am

v2 is larger for small updates (e.g. a single keystroke). However, v2 is much smaller for large updates (e.g. encoding the complete Yjs document).

The update format still needs to persist metadata, so it is always going to be less efficient than “insert x at y”. However, the v2 compression has an overhead of only ~50% in size compared to the UTF-8 encoded document (measured on datasets of pure text editing traces).

Gin-Quin · May 25, 2022, 12:48pm

An awesome feature with storing updates instead of having a static state is that you have access to the history of a document.

Is history a planned feature for Yjs? Would it be compatible with compressing the updates of a document into one big update?

dmonad · May 25, 2022, 3:02pm

Yjs already persists the history in the update if you disable garbage-collection (ydoc.gc = false). However, the API to restore old states is not yet public.

Gin-Quin · May 25, 2022, 4:54pm

Okay, that’s very cool. I suppose the downside of disabling GC is that documents can get huge over time.

Looking forward to using the history API, thanks!