Refreshing page causes y-indexeddb to accumulate db entries

I have a standard call to y-indexeddb in a web app, e.g.:

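	import * as Y from 'yjs'
	import { IndexeddbPersistence } from 'y-indexeddb'

	const doc = new Y.Doc()
	// `room` is the name of the IndexedDB database for this document
	const persistence = new IndexeddbPersistence(room, doc)
	// once the map is loaded, it can be displayed
	persistence.once('synced', () => {
		console.log('local content loaded')
	})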

When I look in DevTools at Application > IndexedDB > [room] > updates, I see 2 rows, one for a Uint8Array[0,0,buffer: arrayBuffer(2)...] and one long Uint8Array.

If I now refresh the page in the browser (without making any changes to the doc), another two rows are added (similar to the first two). Each refresh adds another two rows. Is this the expected behaviour? Since the data to be stored hasn’t changed, I wasn’t expecting any change to the database.

(The motive for asking this question is that I am experiencing some unreliability in retrieving the contents of a yDoc when I use both y-indexedDB and y-websocket, and I am trying to track down the cause).

@micrology, are you using Firefox? This is probably related to the thread “Y-websocket-server connection event emitted twice on page reload”.

This occurs on Chrome and Safari as well as Firefox (on a Mac, Monterey), so I don’t think there is a link to that issue. Also I am logging connection events and I don’t see it emitted twice.

Here is a minimal example:

	import * as Y from 'yjs'
	import { IndexeddbPersistence } from 'y-indexeddb'

	const doc = new Y.Doc()
	const persistence = new IndexeddbPersistence('test11-2', doc)
	// runs once the stored updates have been loaded from IndexedDB
	persistence.once('synced', () => {
		console.log(` ${yMap1.get('prop1')} loaded from IDB`)
	})
	const yMap1 = doc.getMap('map1')
	// writes 'foo' on every page load, producing a new update each time
	yMap1.set('prop1', 'foo')

If you run this and look at IndexedDB in the debugger, you will see that every reload adds a new key and value to the updates object store of database test11-2. This is true in both Chrome and Firefox.

This happens here as well, but it doesn’t seem to cause any issues. What kinds of unreliabilities are you experiencing?

I have a yDoc that can often be 3MB or more. Reloading this a good few times uses a lot of memory since 3MB is added each time. I suspect that the problem a client had with my app was caused by IndexedDB being ‘out of memory’ (or out of disk space), but it was not possible to reproduce the issue. Whether or not that was the case, I don’t understand why we are seeing this behaviour.

@micrology did you find a solution for this? I’m also facing the same issue and checking how to resolve it.

I didn’t! Sorry I have nothing to help.

You can think of a Yjs document as a git repository. Every time you change or create a value, you create a new commit. The only relevant difference is that in Yjs, conflicts are merged automatically.

When you insert a value every time you start the app, you are creating a commit on an empty document. Of course, the change will automatically be merged (if the same value is set by a remote client/server, then the values will be merged - in the case of Y.Map, one will overwrite the other).

y-indexeddb notices that you created a change and stores the “commit” in a database. It squashes commits into a single entry every now and then, but the produced metadata can never be deleted… So you should avoid making unnecessary changes.
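
A toy model may help picture the "squash every now and then" behaviour. The sketch below is plain JavaScript, not Yjs: each update is just a list of operation ids, and the class name ToyUpdateLog is hypothetical. It assumes y-indexeddb appends every update as a new row and, once the row count reaches a threshold (exported as PREFERRED_TRIM_SIZE, 500 in the versions I have looked at), replaces all rows with one merged row. Squashing reduces the number of rows, but every operation survives the merge, which is why repeated writes keep costing space.

```javascript
// Toy model of an append-only update log with periodic compaction.
// Not Yjs internals: an "update" here is just an array of operation ids.
class ToyUpdateLog {
  constructor (trimSize) {
    this.trimSize = trimSize // stands in for y-indexeddb's PREFERRED_TRIM_SIZE
    this.rows = [] // stands in for the IndexedDB `updates` object store
  }

  // Every change is stored as a new row, like a new commit.
  append (update) {
    this.rows.push(update)
    if (this.rows.length >= this.trimSize) this.compact()
  }

  // Squash all rows into a single row. Nothing is dropped: like Yjs
  // metadata, every operation ever stored survives the merge.
  compact () {
    this.rows = [this.rows.flat()]
  }

  get operationCount () {
    return this.rows.reduce((total, row) => total + row.length, 0)
  }
}

// Ten page loads, each re-running yMap1.set('prop1', 'foo'), with a tiny threshold.
const log = new ToyUpdateLog(4)
for (let load = 1; load <= 10; load++) log.append([`set-prop1#${load}`])
console.log(log.rows.length, log.operationCount) // 1 10
```

The row count stays small, but the amount of stored data keeps growing with every reload that writes something.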

I talked about this a number of times on this discussion board (search for “initial content”). You should only initialize the content once (not every time you load the document). Firstly, it is extremely inefficient (Yjs needs to store all data that was ever produced, even metadata of content that was overwritten). Secondly, there is a good chance that you overwrite the content that is currently used by all other clients which might include new changes. If you manipulate the Yjs document (even if you overwrite content with the exact same content), then Yjs needs to store it in the database (indexeddb) and propagate the change to all clients.
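
One way to apply this advice to the thread's minimal example is to write defaults only when the key is absent, and only after the synced event, so that locally stored content has been loaded first. The helper name initializeIfMissing is hypothetical; it relies only on has and set, which both Y.Map and a plain Map provide. It does not remove the race described above: a client that has not yet received remote content can still write a default over a value another client has changed, so initializing once (for example, when the document is first created) remains the safer option.

```javascript
/**
 * Hypothetical helper: write each default only if its key is missing, so a
 * reload does not create a new update for content that already exists.
 * Works with any map exposing has(key) and set(key, value), e.g. Y.Map or Map.
 * Returns the keys that were actually written.
 */
function initializeIfMissing (map, defaults) {
  const written = []
  for (const [key, value] of Object.entries(defaults)) {
    if (!map.has(key)) {
      map.set(key, value)
      written.push(key)
    }
  }
  return written
}

// Intended wiring with y-indexeddb (sketch, not run here):
// persistence.once('synced', () => {
//   doc.transact(() => initializeIfMissing(doc.getMap('map1'), { prop1: 'foo' }))
// })

// A plain Map standing in for a Y.Map across two "page loads".
const stored = new Map()
console.log(initializeIfMissing(stored, { prop1: 'foo' })) // first load writes prop1
console.log(initializeIfMissing(stored, { prop1: 'foo' })) // reload writes nothing
```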

I am seeing the same behavior when there is no initialization or changes to the doc. It still persists a new update on every load.

Minimal example:

Notice that I never update the Doc, or even create a shared type on it for that matter.

I have traced the behavior to the following commit. Anybody know why this was added? It seems like a mistake that it grows the object store without any changes to the Doc.
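
A rough model of why a refresh with no edits still adds a row, based on my reading of the y-indexeddb load path (treat this as an assumption and check it against the commit): on load, beforeApplyUpdatesCallback writes Y.encodeStateAsUpdate(doc) for the in-memory document before the stored updates are applied. After a refresh, that document is a fresh, empty Y.Doc, whose encoded state is the two-byte update [0, 0], which would match the Uint8Array[0,0] row described in the first post. simulatePageLoad below is a hypothetical stand-in, not library code.

```javascript
// Encoded (v1) state of an empty Y.Doc: no structs, empty delete set.
const EMPTY_DOC_STATE = Uint8Array.of(0, 0)

// Hypothetical stand-in for one page load; `rows` models the `updates` store.
function simulatePageLoad (rows, encodeCurrentState) {
  // 1. Before applying stored updates, persist the in-memory doc's state.
  rows.push(encodeCurrentState())
  // 2. Applying `rows` to the doc would happen here; it adds no rows itself.
  return rows.length
}

const rows = []
// Three refreshes with no edits: each starts from a fresh, empty Y.Doc.
for (let refresh = 0; refresh < 3; refresh++) {
  simulatePageLoad(rows, () => EMPTY_DOC_STATE)
}
console.log(rows.length) // 3
```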

New to Yjs and still learning, so I kept trying to figure out what I was doing to cause this issue. My updates were growing without me understanding why.

@micrology thanks for setting up the minimal example. And @raine, hopefully that leads to a fix. Nicely done!

ChatGPT-4 thinks there is a logic error in the storeState method. I'm really not sure how accurate that is at the moment…

The y-indexeddb module, as shown in your code, is a persistence layer that uses IndexedDB to store the state of a Yjs document. The module listens for updates to the Yjs document, and stores these updates in an IndexedDB database. When the Yjs document is loaded, it fetches these updates from the database and applies them to the document, effectively restoring its state.

The storeState function is called periodically (as defined by _storeTimeout) or when the number of updates in the database reaches a certain threshold (PREFERRED_TRIM_SIZE). This function fetches all updates from the database, applies them to the Yjs document, and then stores the resulting state of the document in the database. If the number of updates has reached the threshold, it also deletes the oldest updates from the database, ensuring that the total number of updates does not exceed the threshold.

Refreshing the page causes a new instance of the Yjs document to be created, and this new instance fetches and applies all updates from the database. However, these updates are not deleted from the database, because the storeState function only deletes updates when the number of updates reaches the threshold. As a result, each refresh of the page causes the database to accumulate more updates.

To prevent this accumulation of updates, you could modify the storeState function to delete all updates from the database after they have been applied to the Yjs document, regardless of the number of updates. Alternatively, you could decrease the value of PREFERRED_TRIM_SIZE to a smaller number, so that updates are deleted more frequently.

Please note that this is a high-level explanation based on the provided code and may not cover all edge cases or specific implementation details.

Interesting. That’s more or less correct. However, it doesn’t explain why empty updates are being stored on load. That code was only added in September, so it is not part of the core logic, and it was presumably addressing a newly discovered edge case.

A couple of steps that might be worth a try:

  1. See which (if any) tests fail when that commit is reverted.
  2. Add a condition that only stores the update if it is non-empty.
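
Step 2 could look roughly like the sketch below. It assumes the v1 update encoding, in which an update with no content and no deletions is exactly the two bytes [0, 0]; isEmptyUpdate and storeIfNonEmpty are hypothetical names. A byte-level check only catches the truly empty case: a non-empty update that merely repeats data already in the database passes straight through.

```javascript
// v1 encoding of an update with no structs and an empty delete set.
const EMPTY_UPDATE = [0, 0]

// True only if `update` is byte-for-byte the empty update.
function isEmptyUpdate (update) {
  return update.length === EMPTY_UPDATE.length &&
    update.every((byte, i) => byte === EMPTY_UPDATE[i])
}

// Hypothetical guard around the initial write on load.
// Returns true if the update was stored.
function storeIfNonEmpty (rows, update) {
  if (isEmptyUpdate(update)) return false
  rows.push(update)
  return true
}

const store = []
storeIfNonEmpty(store, Uint8Array.of(0, 0)) // skipped
storeIfNonEmpty(store, Uint8Array.of(1, 1, 0)) // stored (illustrative bytes)
console.log(store.length) // 1
```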

Several tests fail if the call to the new beforeApplyUpdatesCallback is removed.

I added the condition if (updates.length === 0) to the call site of beforeApplyUpdatesCallback, which stops the growth. All the tests still pass, though this doesn’t say a whole lot given that the test coverage is spotty. Some real-world testing is needed, as well as a more precise understanding of the redundancy condition under which beforeApplyUpdatesCallback can be skipped.

Update: Never mind, that definitely doesn’t work.

Opened an issue here:

@raine Thanks for filing an actual bug!

If it is a problem with how y-indexeddb is constructed, I wonder if we could compare the state of the current doc with a doc created from all of the states already in the store, and store the new state only if there is a difference. It might cost a few ms on construction, but it would avoid this memory leak.
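
A cheaper variant of this comparison, sketched here under assumptions, is to compare state vectors instead of rebuilding a second doc. Yjs can decode a state vector into a Map from client id to clock (Y.decodeStateVector), and Y.encodeStateVectorFromUpdate computes one from an update without a doc; hasUnseenInserts is a hypothetical helper over that Map shape. One caveat: a state vector only records inserted content, while deletions live in the update's delete set, so an update that only deletes things looks redundant to this check. That may be part of why the state-vector comparisons reported later in the thread produced false positives.

```javascript
/**
 * Hypothetical helper: does `current` contain inserts that `stored` lacks?
 * Both arguments are state vectors shaped like Map<clientID, clock>,
 * the shape returned by Y.decodeStateVector.
 * Caveat: deletions are not part of a state vector, so a delete-only
 * change returns false here even though it still needs to be stored.
 */
function hasUnseenInserts (stored, current) {
  for (const [client, clock] of current) {
    if ((stored.get(client) ?? 0) < clock) return true
  }
  return false
}

const storedSV = new Map([[101, 5], [202, 3]])
console.log(hasUnseenInserts(storedSV, new Map([[101, 5]]))) // false: nothing new
console.log(hasUnseenInserts(storedSV, new Map([[101, 6]]))) // true: client 101 advanced
console.log(hasUnseenInserts(storedSV, new Map([[303, 1]]))) // true: unknown client
```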

I actually think it may be simpler than that. I’m currently testing a small change that skips empty updates, and it is working well so far: “do not save empty updates - fixes #31” (raineorshine/y-indexeddb@5e0917e on GitHub).

I’d love to get some help testing this before submitting a PR, to make sure there are no regressions. To test, replace the y-indexeddb dependency in your package.json with the following: "y-indexeddb": "https://github.com/raineorshine/y-indexeddb#empty-update"

I did some more testing and discovered that a simple check for empty updates is not enough when there are multiple providers. The empty update is prevented when the first provider syncs, but then the second provider causes a new, redundant db entry. This grows continuously as originally observed.

I made many attempts to detect when an update is redundant by comparing binary updates or state vectors, but they always resulted in either false positives or regressions.

I have resorted to a more brute-force approach: triggering a database compaction on the initial sync. You can do this by setting PREFERRED_TRIM_SIZE = 0, or by calling a debounced storeState(this, true) after the initial sync. Here is an example of the latter approach, which I am currently using:

So far this works with any number of providers, and there is no visible performance impact with hundreds of Docs.
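
Not raine's code, but a rough, hypothetical sketch of the second option: a small debounce helper, plus (in comments) how it might be wired to y-indexeddb. The wiring assumes y-indexeddb exports storeState(persistence, forceStore), which holds for the versions I have looked at but should be checked against yours; in those versions, forceStore = true writes the current doc state as one row and deletes the older rows. The document name 'my-doc' is a placeholder, and the schedule/cancel parameters exist only so the helper can be tested without real timers.

```javascript
/**
 * Return a debounced version of `fn`: repeated calls within `wait` ms collapse
 * into a single call made `wait` ms after the last one.
 * `schedule`/`cancel` default to the real timers; tests can inject fakes.
 */
function debounce (fn, wait, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null
  return (...args) => {
    if (timer !== null) cancel(timer)
    timer = schedule(() => {
      timer = null
      fn(...args)
    }, wait)
  }
}

// Possible wiring (sketch, not run here; storeState is assumed to be exported):
// import { IndexeddbPersistence, storeState } from 'y-indexeddb'
// const persistence = new IndexeddbPersistence('my-doc', doc)
// const compact = debounce(() => storeState(persistence, true), 1000)
// persistence.once('synced', compact)
```

Debouncing matters when several providers sync at nearly the same time: their requests collapse into one compaction instead of one per provider.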

Has this fix ever been merged?

No, a PR was never created because this change comes with a performance cost and I wasn’t entirely sure it would be the correct default behavior.