Memory handling in YJS

vin · July 4, 2023, 2:42pm

Hi,
I am trying to build my own collaborative whiteboard. I have been using socket.io up until now, but I am looking at alternatives like YJS. I am also new to web application development, so I would like some advice regarding memory management in my system.

The whiteboard application that I have built allows users to join “meeting” rooms and draw collaboratively on the whiteboard. However, I have been using arrays to store my drawing data, with one entry per drawing path: start point, end point, color and line width. This drawing data is used to load the existing drawing for a new user who has joined a room, and it also gives my application its undo and redo functionality. This form of storage is very inefficient, as each room has its own arrays. When I get this website hosted, I am expecting at least 100 rooms open in one instance, and storing all the drawing data in arrays will crash my site and expose it to data leakage. An array can grow to hundreds of thousands of elements in a matter of minutes. I have explored alternatives such as databases and session storage, but these still feel very inefficient: storing this much data in a DB would be expensive and would make data manipulation very difficult.
In essence, my system has scalability issues due to poor data management.
Does the YJS framework give me any advantage or solution to this problem?

rozek · July 4, 2023, 2:49pm

I don’t see why your “site” should crash…

presumably every web client is joining a single room only
your “site” is mainly serving the code and distributing messages, the actual execution is done by every client’s browser
of course, freehand lines will have dozens, perhaps hundreds of nodes - so what?
if there is a real need, you may still limit the number of nodes per line and/or the number of lines in your drawing
for data modelling: every sheet in your whiteboard could be a Y.Array of “strokes” (i.e., plain JS objects with color information and a list of nodes)

vin · July 4, 2023, 2:58pm

Thanks Rozek.
Can you please explain the last point that you made about Data Modelling?
That's a good point about limiting the nodes in my lines before storing them.

I am still trying to learn and understand what YJS, WebRTC, etc. would be doing differently from socket.io if I used them in my application. I would really appreciate some help with this as well!

rozek · July 4, 2023, 3:02pm

well,

first of all: read the docs
then model your data (e.g., one Y.Doc per whiteboard sheet, to be distributed using its own “room”)
now decide on a persistence (if need be, e.g., y-indexeddb) and
decide on a “network provider” (e.g., y-websocket)

Don’t forget to decide how users may join a certain room

Finally: implement

rozek · July 4, 2023, 3:12pm

Perhaps the following helps a bit

Y.Docs are containers for individually shareable data
Y.Maps have entries which may be individually modified by multiple people (if you don’t need this feature, use plain JS objects instead)
Y.Arrays have elements which may be individually added, shifted around and removed by multiple people (if you don’t need this feature, use plain JS arrays instead)

Yjs itself could be viewed as a distributed database with offline support and “awareness” features during live connections

raine · July 4, 2023, 3:13pm

The main difference is that YJS is a CRDT implementation. YJS is a good choice if you need decentralized conflict resolution, such as in an offline-first situation. If you don’t need to support offline-first, i.e. the app is always expected to be connected to the internet, then a centralized approach with socket.io is better.

In order for YJS to perform merges without conflicts, it needs to store the entire history of changes. While YJS is efficient compared to other CRDT implementations, it uses more memory than simple arrays. A lot more. You may be better off with socket.io, and just doing some throttling and horizontal scaling to reduce bandwidth.
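
As one possible reading of "throttling", here is a sketch that batches drawing points and sends them at most once per interval instead of once per pointer event. It is plain JavaScript with no dependencies; `createPointBatcher`, the 50 ms default and the `'draw'` event name in the usage comment are assumptions for illustration, not socket.io API.

```javascript
// Collects points and hands them to `send` as one batch, at most once per `intervalMs`.
function createPointBatcher (send, intervalMs = 50) {
  let pending = []
  let timer = null

  // Send whatever has accumulated and cancel any scheduled flush.
  function flush () {
    if (timer !== null) {
      clearTimeout(timer)
      timer = null
    }
    if (pending.length === 0) return
    const batch = pending
    pending = []
    send(batch)
  }

  // Queue a point; the first point of a batch schedules the flush.
  function add (point) {
    pending.push(point)
    if (timer === null) timer = setTimeout(flush, intervalMs)
  }

  return { add, flush }
}

// Usage with a hypothetical socket.io client `socket` and canvas element `canvas`:
// const batcher = createPointBatcher(points => socket.emit('draw', points))
// canvas.addEventListener('pointermove', e => batcher.add([e.offsetX, e.offsetY]))
// canvas.addEventListener('pointerup', () => batcher.flush())
```

This reduces the number of messages, not the amount of drawing data, so it would usually be combined with a cap on the nodes stored per line.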