Y-redis: An alternative backend to y-websocket

dmonad · March 15, 2024, 12:52am

I just released y-redis - an alternative backend for y-websocket that is easy to scale and doesn’t keep data in-memory. It shows how you can implement auth, and it enables you to setup your own auth. I’ve made a lot of iterations on y-redis over the years. This is the first time I’m happy with my implementation.

I’ve been keeping this gem hidden while looking for an organization that could possibly sponsor the work. I took the last few months to finish this up, because a lot of people have been asking for it.

Please note that there are other backends to y-websocket, that probably provide better support (e.g. y-sweet).

marko · March 17, 2024, 4:04pm

Hey @dmonad,

First of all, congrats on the amazing work and for sharing it publicly.

I’ve just tried y-redis, and here’s my feedback so far. I managed to get it up and running, but I needed to make some tweaks.

I followed the instructions, but after cloning it and running ‘npm i’, the command ‘npm run start:server’ failed:

> @y/redis@1.5.3 start:server
> dotenv run node ./bin/server.js
sh: dotenv: command not found

After running ‘npm install -g dotenv-cli’ and trying again, I encountered the following:

After that, I changed the package.json line to:

“start:server”: “dotenv-run – node ./bin/server.js”,

and that did the trick

Am I missing something? Is it the wrong dotenv library? I guess it should be listed in the dependencies?

Also, command “npx 0ecdsa-generate-keypair --name auth” from .env file outputs this format

AUTH_PUBLIC_KEY=“{"key_ops":["verify"],"ext":true,"kty":…h3O","crv":"P-384"}”
AUTH_PRIVATE_KEY=“{"key_ops": … fJGTlx0im"}”

which gives the parsing error

SyntaxError: Expected property name or '}' in JSON at position 1 (line 1 column 2)
    at Module.parse (<anonymous>)

I changed it to this, which is working

AUTH_PUBLIC_KEY={"key_ops":["verify"],"ext":true ...
AUTH_PRIVATE_KEY={"key_ops":["sign"],"ext":true ....

Thanks

Marko

dmonad · March 17, 2024, 7:08pm

Thank you so much for the feedback @marko !

I just fixed the key generation and opted for node --env-file .env .. instead, so I don’t have to include another dependency. I thought that dotenv was a standard utility (it was included in my distro). Turns out this is a node thing and dotenv is just one of many implementations.

marko · March 18, 2024, 8:03am

Happy to help! Amazing work!

By the way, I came across an issue where the server crashes without any particular reason. I don’t have the exact scenario yet, but here is the error. Maybe you’ll know right away…

@y/redis/ws: client connected (uid=14, ip=0000:0000:0000:0000:0000:0000:0000:0001) +14ms
file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497
export const readAny = decoder => readAnyLookupTable[127 - readUint8(decoder)](decoder)
                                                                              ^

TypeError: readAnyLookupTable[(127 - readUint8(...))] is not a function
    at readAny (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497:79)
    at Array.<anonymous> (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:479:18)
    at readAny (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497:79)
    at Array.readAnyLookupTable (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:487:16)
    at readAny (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497:79)
    at Array.<anonymous> (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:479:18)
    at readAny (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497:79)
    at Array.readAnyLookupTable (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:487:16)
    at readAny (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:497:79)
    at Array.<anonymous> (file:///Users/marko/dev/blockbuilder/y-redis/node_modules/lib0/decoding.js:479:18)

Node.js v21.1.0

By the way, do you prefer to report this kind of issue on GitHub or here?

junoriosity · March 18, 2024, 11:52am

@dmonad Many thanks for sharing this amazing work.

Is it somehow possible to connect y-redis to Tiptap, for further information see here?

dmonad · March 18, 2024, 11:27pm

Thanks @marko,

It would be great if you could open an issue with a full stack-trace.

@junoriosity Sure you can! FYI They also have Hocuspocus. You basically just need to connect the Y.Doc to a y-websocket instance and connect it to the y-redis instance. You can use the demo in y-redis/demos as a starting point.

dwzkit · March 18, 2024, 11:28pm

Hey @dmonad , Greatly appreciate your selfless contribution to such a fantastic project! I’ve successfully set up the environment locally and connected it to a third-party object storage service compatible with the S3 protocol. Everything seems to be working fine in the demo environment. Of course, to get everything running smoothly, I encountered issues with dotenv and the key format, but I managed to solve them similarly. Additionally, I made a small adjustment: changing

javascript

try {
  // make sure the bucket exists
  await store.client.makeBucket(bucketName);
} catch (e) {}

to:

javascript

// Check if the bucket already exists
const bucketExists = await store.client.bucketExists(bucketName);
if (bucketExists) {
  console.log(`Bucket '${bucketName}' already exists.`);
} else {
  try {
    // Attempt to create the bucket if it does not exist
    await store.client.makeBucket(bucketName);
    console.log(`Bucket '${bucketName}' created successfully.`);
  } catch (e) {
    console.error("Error creating bucket", e);
  }
}

In handling the Redis connection, I added a .on('error') listener to avoid the server and worker programs from hanging up directly due to a Redis restart. I’ve found that as long as Redis itself is persistent, even if Redis restarts due to a fault, the system can still function normally after the server and worker reconnect to Redis. Thanks again for your sharing! I wonder if there are any plans to support multi-version storage in the future, to allow for the selection and restoration of document versions.

dwzkit · March 19, 2024, 4:28pm

Hi @dmonad , I’m currently exploring y-redis for enhancing real-time collaboration capabilities in a project. The deployment plan involves Kubernetes, where I’m aiming to set up y-redis across multiple pods. This approach is to ensure that the system can efficiently manage varying loads and maintain high availability.

The y-redis documentation highlights its scalability, mentioning that one can launch as many instances as necessary to handle fluctuating client numbers, without needing explicit coordination. From this, I gather that y-redis is designed to automatically distribute messages among instances, ensuring no message is processed more than once, even in a distributed setup. It suggests the inclusion of load balancing, idempotent processing strategies, and possibly leveraging Redis Streams’ consumer groups or similar features.

Given my plans to deploy y-redis within a Kubernetes environment utilizing multiple pods, I’d love to hear your thoughts on a couple of points:

Does my interpretation of y-redis’s scalability and automatic message distribution align with its intended capabilities?
Can y-redis inherently prevent the duplication of message processing across different instances, facilitating seamless scalability in a microservices architecture without necessitating additional coordination?

Your insights into these queries would be invaluable as I proceed with the deployment strategy. Thank you very much for your time and consideration.

dmonad · March 19, 2024, 10:24pm

Hi @dwzkit,

I just added a bit more documentation to the Components section. You can scale each component of y-redis. However, if you want to do sharding in redis, you’d need to adapt the lua scripts a bit (in that case you want one y:worker stream per shard).

Document updates are distributed using Redis streams (one stream per “room” or document). However, you don’t need to manage the streams yourself. The worker component does this for you. The rooms don’t use consumer groups (which is a good thing). Only the worker queue uses consumer groups to ensure that only one worker persists/cleans up a room at a time. This is just a performance optimization. You could safely let multiple workers clean up the same room without coordination.

Can y-redis inherently prevent the duplication of message processing across different instances, facilitating seamless scalability in a microservices architecture without necessitating additional coordination?

I’m unsure what you wanted to know. Is there a specific problem from other scaling approaches you are referring to?

Every message that a server component receives is distributed via redis. All server components that are “listening” for updates on a specific room (because their clients subscribed to it), will process the messages and distribute them to the respective clients. There is as little processing going on as possible.

dmonad · March 19, 2024, 10:37pm

In handling the Redis connection, I added a .on('error') listener to avoid the server and worker programs from hanging up directly due to a Redis restart. I’ve found that as long as Redis itself is persistent, even if Redis restarts due to a fault, the system can still function normally after the server and worker reconnect to Redis. Thanks again for your sharing!

It was my intention that the worker & server restart when the redis connection breaks. By default, redis only retains messages for 1-2 minutes. After that, the worker may persist and clean-up old messages. If the Redis connection breaks for longer than 1 minute, the server won’t be able to distribute all messages anymore (because they have already been cleaned up). At least make sure to throw out clients if the redis connection breaks for longer than a few seconds.

I wonder if there are any plans to support multi-version storage in the future, to allow for the selection and restoration of document versions.

This is part of a much larger plan to finally make versioning and suggestions easy to use in Yjs. I’ve big plans here, but I need to find the right sponsor for this… So this will take a while.

dwzkit · March 20, 2024, 1:30am

I just added a bit more documentation to the Components section. You can scale each component of y-redis. However, if you want to do sharding in redis, you’d need to adapt the lua scripts a bit (in that case you want one y:worker stream per shard).

Document updates are distributed using Redis streams (one stream per “room” or document). However, you don’t need to manage the streams yourself. The worker component does this for you. The rooms don’t use consumer groups (which is a good thing). Only the worker queue uses consumer groups to ensure that only one worker persists/cleans up a room at a time. This is just a performance optimization. You could safely let multiple workers clean up the same room without coordination.

Can y-redis inherently prevent the duplication of message processing across different instances, facilitating seamless scalability in a microservices architecture without necessitating additional coordination?

I’m unsure what you wanted to know. Is there a specific problem from other scaling approaches you are referring to?

Every message that a server component receives is distributed via redis. All server components that are “listening” for updates on a specific room (because their clients subscribed to it), will process the messages and distribute them to the respective clients. There is as little processing going on as possible.

Thank you for the insights on handling sharding with Redis. I’ve come to understand the implications of operating in a sharded Redis architecture and indeed, my setup involves a sharded Redis configuration. Given this, I’ll be making some adjustments to ensure that my Lua scripts are adapted to work seamlessly with the sharding setup. This adaptation, as discussed, is particularly crucial to maintain the scripts’ functionality across different shards by employing hash tags to keep related keys within the same shard.

Regarding my second question, please disregard it as it may have been somewhat repetitive upon review. I was essentially inquiring about the feasibility of scaling in a microservices architecture without needing significant adjustments. From your response, it seems that aside from the special considerations required for sharding, no additional modifications are needed for scaling within a microservices framework, which answers my query.

I appreciate your guidance and look forward to implementing these changes to ensure compatibility with the sharded environment.

dmonad · March 20, 2024, 10:06am

If you want this software to survive, it makes sense to hire me to scale y-redis and fix issues along the way. I won’t provide any support for larger companies anymore that don’t give back at all while profiting from free software. I can’t (and don’t want to) prevent you from using and extending my software and making it better for your own purposes. But I want to encourage you and your company to give back and work together instead.

MaksimSokolovsky · March 28, 2024, 11:33am

@dmonad Can you explain, Is it possible to combine hocuspocus and y-redis? I have Hocuspocus instance and I want to improve my performance with y-redis. Is it possible?

dmonad · March 28, 2024, 12:08pm

Nope, it’s either one. But hocuspocus has a redis-extension as well

dwzkit · April 20, 2024, 10:46am

I wanted to share some updates I made to ensure compatibility with Redis Cluster. Given the challenges of operating across different shards in a Redis Cluster environment, I made three key adjustments:

Script Separation: To address the issue of redisWorkerStreamName and redisWorkerGroupName being global variables and potentially causing script execution problems across different shards, I’ve split the related operations into separate Lua scripts. Initially, the script combined stream existence checks and message additions.
The original script:

addMessage: redis.defineScript({
  NUMBER_OF_KEYS: 1,
  SCRIPT: `
    if redis.call("EXISTS", KEYS[1]) == 0 then
      redis.call("XADD", "${this.redisWorkerStreamName}", "*", "compact", KEYS[1])
    end
    redis.call("XADD", KEYS[1], "*", "m", ARGV[1])
  `,
  transformArguments(key, message) {
    return [key, message];
  },
  transformReply(x) {
    return x;
  },
}),

And it has been refactored into these more focused scripts:

checkStreamExists: redis.defineScript({
  NUMBER_OF_KEYS: 1,
  SCRIPT: `
    return redis.call("EXISTS", KEYS[1])
  `,
  transformArguments(streamKey) {
    return [streamKey];
  },
  transformReply(reply) {
    return reply === 1;
  },
}),
addCompactToWorkerStream: redis.defineScript({
  NUMBER_OF_KEYS: 1,
  SCRIPT: `
    redis.call("XADD", KEYS[1], "*", "compact", ARGV[1])
  `,
  transformArguments(workerStreamKey, streamKey) {
    return [workerStreamKey, streamKey];
  },
  transformReply(x) {
    return x;
  },
}),
addMessageToStream: redis.defineScript({
  NUMBER_OF_KEYS: 1,
  SCRIPT: `
    redis.call("XADD", KEYS[1], "*", "m", ARGV[1])
  `,
  transformArguments(key, message) {
    return [key, message];
  },
  transformReply(x) {
    return x;
  },
}),

Individual Script Functions: Each part of the process is now handled by its own dedicated Lua script. The refactoring was necessary to handle the distinct operations in a manner compatible with Redis Cluster requirements. Here’s the adjustment in the application logic:

Original function:

async addMessage(path, room, docid, m) {
  if (m[0] === protocol.messageSync && m[1] === protocol.messageSyncStep2) {
    if (m.byteLength < 4) {
      return promise.resolve();
    }
    m[1] = protocol.messageSyncUpdate;
  }
  return this.redis.addMessage(
    computeRedisRoomStreamName(path, room, docid, this.prefix),
    m
  );
}

Updated function:

async addMessage(path, room, docid, m) {
  if (m[0] === protocol.messageSync && m[1] === protocol.messageSyncStep2) {
    if (m.byteLength < 4) {
      return promise.resolve();
    }
    m[1] = protocol.messageSyncUpdate;
  }
  const streamKey = computeRedisRoomStreamName(
    path,
    room,
    docid,
    this.prefix
  );
  const exists = await this.redis.checkStreamExists(streamKey);
  if (!exists) {
    await this.redis.addCompactToWorkerStream(
      this.redisWorkerStreamName,
      streamKey
    );
  }
  return this.redis.addMessageToStream(streamKey, m);
}

Stream Key Management: To further enhance compatibility and ensure all keys map to appropriate hash slots, I modified how stream keys are managed:

Original key computation:

export const computeRedisRoomStreamName = (room, docid, prefix) =>
  `${prefix}:room:${encodeURIComponent(room)}:${encodeURIComponent(docid)}`;

Updated key computation:

export const computeRedisRoomStreamName = (path, room, docid, prefix) => {
  return `${prefix}:{${encodeURIComponent(room)}:${encodeURIComponent(
    docid
  )}}:stream`;
};

These changes were essential to ensure that the application runs smoothly across multiple nodes in a Redis Cluster, avoiding cross-slot command executions which were causing failures. I hope these insights might be helpful for your future projects

dwzkit · May 15, 2024, 1:05am

Regarding the use of y-redis in cluster mode, there are two additional reference points:

The getMessages function in api.js needs to be adjusted to first fetch the slot of each stream and then process it. Otherwise, using the same xRead command for multiple slots will result in an error.
Due to the modification in point 1 and the limitation that Redis cluster cannot process across slots at once (including the limitation that cross-slot does not support pipeline), the execution speed of getMessages becomes very slow when it is called by the run function in subscriber.js. Therefore, I discarded the run function and added a custom function in subscriber.js:

/**
 * @param {string} stream
 */
async processStreamMessages(stream) {
  try {
    const sub = this.subs.get(stream);
    if (sub == null) return;
    const messages = await this.client.getMessages([
      { key: stream, id: sub.id },
    ]);

    for (const message of messages) {
      sub.id = message.lastId;
      if (sub.nextId != null) {
        sub.id = sub.nextId;
        sub.nextId = null;
      }
      sub.fs.forEach((callback) =>
        callback(message.stream, message.messages)
      );
    }
  } catch (e) {
    console.error(e);
  }
}

hen, in the ws.js function message: async (ws, messageBuffer) => {, after executing await client.addMessage(user.path, user.room, "index", message);, immediately execute subscriber.processStreamMessages(stream); to broadcast. In other words, the original method of polling all room messages in a while loop in the run function was adjusted to call processStreamMessages immediately after a message is successfully added to Redis to broadcast it, avoiding message broadcast blocking.

pixeldrew · May 21, 2024, 3:01pm

Thanks very much for this and i’m looking at how best to implement. I’m starting to evaluate y-redis and I’ve setup the demo app. One of the major tasks I would need to address is how to migrate documents from y-socket’s indexdb to the s3 or postgres stores. I assume i’m going to have to create an app that will have to initialize the YJS doc and then to save it directly to the store. Does that sound right?

pixeldrew · May 22, 2024, 4:01pm

i’m working on something and will be happy to contribute this back to the community.