Can I get advice on how to work with streaming AI LLMs?

Hey everybody! How are you doing?

I’m building an editor that assists you with the help of an LLM.

My current stack is:

  • TipTap (which uses the y-prosemirror bindings) for the editor
  • Liveblocks for synchronizing the Yjs doc.

However, I’m having a bit of trouble figuring out exactly how to make the LLM streaming feature work with Liveblocks and a TipTap editor.

Let me explain:
The LLM streams markdown content to my backend.
From there I need to find a way to sync that content to the Yjs doc that’s hosted in Liveblocks.
However, the conversion to a Yjs doc is not straightforward at all. Or at least I can’t figure out the exact sequence of steps for converting a chunk of markdown (which may still be invalid, because remember, it’s just a chunk for now!) into the correct yXmlFragment, and from there pushing it to Liveblocks.

I think the only viable solution right now is to:

  1. Have my users receive the streamed markdown from the backend in their browser client
  2. Automatically populate the TipTap editor content with the setContent hook, which will then do all the heavy lifting: transform it into a valid Yjs doc and push it to Liveblocks thanks to the Collaboration extension (sketched below)
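
For context, here is roughly what I mean by that flow. This assumes the community tiptap-markdown extension so that setContent accepts markdown strings, and the endpoint name is made up:

async function streamIntoEditor(editor) {
  // hypothetical streaming endpoint that emits raw markdown chunks
  const response = await fetch('/api/generate')
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    // re-set the whole accumulated markdown; the Collaboration extension
    // mirrors the resulting editor state into Yjs / Liveblocks
    editor.commands.setContent(buffer)
  }
}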

HOWEVER, this latter approach blocks the ability to generate content in the background. Meaning that if one of my users closes the editor’s tab, the content stops being pushed to Liveblocks.

Is there a correct or recommended way to stream content (markdown) from an LLM into an existing Yjs doc?

Thanks!

Sure, you can insert Y.XmlElements directly into the Yjs document. However, how exactly you insert content really depends on your setup.

What kind of block elements do you have defined in TipTap?

Let’s say you want to insert LLM content as paragraphs using the paragraph block type. Then you would do:

const yxml = ydoc.getXmlFragment('my-tiptap-content')
// insert a single paragraph block with some content..
const p = new Y.XmlElement('paragraph')
p.insert(0, [new Y.XmlText('my LLM generated content....')])
yxml.insert(0, [p])

I recommend stripping the markdown part for now.

If you want to keep it, you could parse the markdown content (you need a parser for that; you can’t do it manually) and transform it into the delta format (a rich-text format that Y.Text understands; it supports bold, italic, etc.). TipTap should be able to pick up the rich text from Y.Text. The formatting attributes in Yjs will be picked up by TipTap as “marks”.
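
For example, a minimal sketch (assuming TipTap registers its bold mark under the name 'bold'; the exact attribute shape depends on how y-prosemirror encodes marks):

const yxml = ydoc.getXmlFragment('my-tiptap-content')
const p = new Y.XmlElement('paragraph')
const text = new Y.XmlText()
// a delta is a list of inserts; `attributes` show up in TipTap as marks
text.applyDelta([
  { insert: 'Welcome to ' },
  { insert: 'my Article', attributes: { bold: {} } },
])
p.insert(0, [text])
yxml.insert(0, [p])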

Hey @dmonad ! Thank you very much for your explanation.

I definitely do need to keep the markdown, since that’s what adds a lot of the differentiation to the product (e.g. the titles being big and bold, etc.).

My main issue with parsing markdown in a streamed way is that I can’t know whether we’ve finished a block.

For example, the first text chunk may be "**Welcome to"
and the second text chunk may be " my Article**".

And when you parse the first one, you don’t add the bold mark because you haven’t yet received the closing “**” asterisks.

So that’s an issue that’s blocking me and I don’t know how to continue from there…

One idea that I had was to keep a string in memory of the whole article that the LLM streams to my server. Meaning, as soon as the LLM sends a new text chunk I do

fullArticle += newTextChunk

Then, each time we get an update, I’d like to completely erase the Yjs doc and replace it with the new contents of fullArticle in its entirety.

Is this possible? Do you think it’s a good idea? What would be the best way to delete everything and push it again?

You can replace the Y.XmlElement that contains the current LLM answer with a new one. That sounds like a good idea. Then it shouldn’t be a problem to have temporary parsing issues.

However, you shouldn’t replace the whole Yjs document on every change; that might lead to some issues down the line. For one, there is no method to erase all content in Yjs. Furthermore, large amounts of generated text will result in a lot of overhead (especially for Yjs) that can easily be avoided. Also, the client would have to re-render everything on every LLM update (which is unnecessary if the previous paragraphs didn’t change).
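
A rough sketch of that replace-one-block idea (the fragment name and the streamingBlockIndex bookkeeping are made up; the transaction makes remote peers see the swap as one atomic update):

ydoc.transact(() => {
  const yxml = ydoc.getXmlFragment('my-tiptap-content')
  // remove the previous version of the block that is still being streamed...
  yxml.delete(streamingBlockIndex, 1)
  // ...and insert the freshly re-parsed one in its place
  const p = new Y.XmlElement('paragraph')
  p.insert(0, [new Y.XmlText(reparsedBlockText)])
  yxml.insert(streamingBlockIndex, [p])
})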

Hey @dmonad! Thanks again. Feels like I’m getting closer.

I’m trying to build the Y.XmlElement with the parsed markdown, but I’m getting stuck in the middle.

I have the following as of now:

import Document from "@tiptap/extension-document";
import Paragraph from "@tiptap/extension-paragraph";
import Text from "@tiptap/extension-text";
import { getSchema } from "@tiptap/react";
import { defaultMarkdownParser } from "prosemirror-markdown";
import { prosemirrorJSONToYDoc } from "y-prosemirror";

const jsonState = defaultMarkdownParser.parse("testing!");
const schema = getSchema([Document, Paragraph, Text]);
const yDoc = prosemirrorJSONToYDoc(schema, jsonState, "default");

console.log(yDoc);

Unfortunately the above fails with the following error in the prosemirrorJSONToYDoc call:

RangeError: Invalid input for Fragment.fromJSON

I’m sorry. I can’t help with that :frowning:

My best guess is that the jsonState does not conform to the schema.

If you want to use the code above from the server: I believe ProseMirror depends on the DOM, so you have to install jsdom as well.
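
One more guess, since that error usually points at the input shape: defaultMarkdownParser.parse returns a ProseMirror Node, while prosemirrorJSONToYDoc expects plain JSON. Untested, but something like this might work:

const node = defaultMarkdownParser.parse("testing!");
// serialize the Node to JSON before handing it to y-prosemirror
const yDoc = prosemirrorJSONToYDoc(schema, node.toJSON(), "default");

Note that the markdown parser builds nodes with its own schema, so the node types it produces still have to exist in your TipTap schema.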

@mikealche Could you rework the design somehow to work around this?

I’m working off the same stack (TipTap, Yjs, Liveblocks) and this is what came to my mind:

Perhaps you can create a custom TipTap node view for the streamed LLM response, and have it render a React or Vue component. All ProseMirror or Yjs knows is that this is a custom component; it wouldn’t know what the component’s internal state is. That’s your escape hatch.

I would give these components a unique ID prop that gets stored in ProseMirror, but that’s about all you need to store there.

The actual state inside of this component (your Markdown text) could live outside of Prosemirror while the LLM content is streamed in.
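
As a hedged sketch of that node (LlmStreamView is a hypothetical React component that holds the streamed markdown in its own state):

import { Node } from "@tiptap/core";
import { ReactNodeViewRenderer } from "@tiptap/react";
import { LlmStreamView } from "./LlmStreamView"; // hypothetical component

// An atom block node: ProseMirror/Yjs only ever sees the id attribute
const LlmStream = Node.create({
  name: "llmStream",
  group: "block",
  atom: true,
  addAttributes() {
    return { id: { default: null } };
  },
  parseHTML() {
    return [{ tag: "llm-stream" }];
  },
  renderHTML({ HTMLAttributes }) {
    return ["llm-stream", HTMLAttributes];
  },
  addNodeView() {
    return ReactNodeViewRenderer(LlmStreamView);
  },
});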

You could also have the state of these components streamed live to other users in the same Liveblocks room by piggybacking on Liveblocks’ presence (that way, the state is ephemeral, at least while it’s being streamed in). If you do this, other users can see what’s being streamed in too.
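
For instance (streamingDraft is a made-up presence field):

import { useUpdateMyPresence } from "@liveblocks/react";

// Broadcast the in-progress markdown to everyone in the room via presence,
// so others can watch the stream without it ever touching the Yjs doc
function useStreamingDraft(blockId) {
  const updateMyPresence = useUpdateMyPresence();
  return (markdownSoFar) =>
    updateMyPresence({ streamingDraft: { blockId, markdownSoFar } });
}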

After the LLM has completed its response, you can let the user “accept” or “decline” what it generated, similar to what Cursor does. This is also where the user can ask a follow-up edit prompt or click on a “regenerate” button if they didn’t like the LLM’s response.

During this acceptance flow, you can take the ephemeral state and insert actual text nodes into your TipTap editor. By then, you should have valid Markdown.
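
On accept, roughly (getPos, node, and parsedJson are placeholders; parsedJson would come from running the final markdown through your parser):

// swap the placeholder node for the parsed content in one command
const from = getPos();
const to = from + node.nodeSize;
editor.chain().insertContentAt({ from, to }, parsedJson).run();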

I wonder if that would solve your problem. Let me know how it goes and I’d love to see your solution if you get something working!