Refreshing page causes y-indexeddb to accumulate db entries

micrology · February 4, 2022, 5:08pm

I have a standard call to y-indexeddb in a web app, e.g.

	const persistence = new IndexeddbPersistence(room, doc)
	// once the map is loaded, it can be displayed
	persistence.once('synced', () => {
		console.log(' local content loaded')
	})

When I look in DevTools at Application > IndexedDB > [room] > updates, I see two rows: one for a Uint8Array[0,0,buffer: arrayBuffer(2)...] and one for a long Uint8Array.

If I now refresh the page in the browser (without making any changes to the doc), another two rows are added (similar to the first two). Each refresh adds another two rows. Is this the expected behaviour? Since the data to be stored hasn’t changed, I wasn’t expecting any change to the database.

(The motive for asking this question is that I am experiencing some unreliability in retrieving the contents of a yDoc when I use both y-indexeddb and y-websocket, and I am trying to track down the cause.)

yufw · February 6, 2022, 8:41am

@micrology, are you using Firefox? This is probably related to: Y-websocket-server connection event emitted twice on page reload

micrology · February 6, 2022, 10:49am

This occurs on Chrome and Safari as well as Firefox (on a Mac, Monterey), so I don’t think there is a link to that issue. Also I am logging connection events and I don’t see it emitted twice.

micrology · February 11, 2022, 10:35pm

Here is a minimal example:

import * as Y from 'yjs'
import {IndexeddbPersistence} from 'y-indexeddb'

const doc = new Y.Doc()
const persistence = new IndexeddbPersistence('test11-2', doc)
persistence.once('synced', () => {
	console.log(` ${yMap1.get('prop1')} loaded from IDB`)
})
const yMap1 = doc.getMap('map1')
yMap1.set('prop1', 'foo')

If you run this and look at IndexedDB in the browser's developer tools, you will see that every time the page is reloaded, a new key and value is added to the updates object store of database test11-2. This is true in both Chrome and Firefox.

gustavotoyota · February 12, 2022, 1:05pm

This happens here as well, but it doesn’t seem to cause any issues. What kinds of unreliabilities are you experiencing?

micrology · February 12, 2022, 9:43pm

I have a yDoc that can often be 3MB or more. Reloading this a good few times uses a lot of memory since 3MB is added each time. I suspect that the problem a client had with my app was caused by IndexedDB being ‘out of memory’ (or out of disk space), but it was not possible to reproduce the issue. Whether or not that was the case, I don’t understand why we are seeing this behaviour.

santhosh-ps · September 25, 2022, 12:22pm

@micrology did you find a solution for this? I’m also facing the same issue and checking how to resolve it.

micrology · September 29, 2022, 4:50pm

I didn’t! Sorry I have nothing to help.

dmonad · September 30, 2022, 12:13pm

Regarding micrology's minimal example (quoted above):

You can think of a Yjs document as a git repository. Every time you change or create a value, you create a new commit. The only relevant difference is that in Yjs conflicts are automatically merged.

When you insert a value every time you start the app, you are creating a commit on an empty document. Of course, the change will automatically be merged (if the same value is set by a remote client/server, then the values will be merged - in the case of Y.Map, one will overwrite the other).

y-indexeddb notices that you created a change and stores the “commit” in a database. It squashes commits into a single entry every now and then, but the produced metadata can never be deleted… So you should avoid making unnecessary changes.
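To make the "squashing" idea concrete, here is a simplified, hypothetical model (not y-indexeddb's actual code): updates are appended to a log, and once the log holds `trimSize` entries it is collapsed into one merged entry. The merge function is injected so the sketch runs without dependencies; with Yjs you would presumably pass `Y.mergeUpdates`. The default of 500 is an arbitrary assumption. Note that squashing reduces the number of rows, not the history: the merged update still carries all of the metadata.

```javascript
// Simplified model of an append-only update log that is squashed into a
// single entry once it grows to `trimSize` entries. `mergeUpdates` is
// injected (e.g. Y.mergeUpdates, which takes an array of updates).
class UpdateLog {
  constructor (mergeUpdates, trimSize = 500) {
    this.mergeUpdates = mergeUpdates
    this.trimSize = trimSize
    this.entries = []
  }

  // Store one update; squash the log when it reaches the threshold.
  append (update) {
    this.entries.push(update)
    if (this.entries.length >= this.trimSize) this.compact()
  }

  // Replace all stored entries with one merged entry. The merged entry
  // still contains the full history, it is just stored as one row.
  compact () {
    if (this.entries.length <= 1) return
    this.entries = [this.mergeUpdates(this.entries)]
  }
}
```

With `trimSize` set to 0, every append triggers a merge, so the log never holds more than one entry after a write; this mirrors raine's later suggestion of setting `PREFERRED_TRIM_SIZE = 0`, at the cost of a merge on every write.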

I talked about this a number of times on this discussion board (search for “initial content”). You should only initialize the content once (not every time you load the document). Firstly, it is extremely inefficient (Yjs needs to store all data that was ever produced, even metadata of content that was overwritten). Secondly, there is a good chance that you overwrite the content that is currently used by all other clients which might include new changes. If you manipulate the Yjs document (even if you overwrite content with the exact same content), then Yjs needs to store it in the database (indexeddb) and propagate the change to all clients.
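A minimal sketch of "initialize only once", assuming the `persistence` and `doc` objects from micrology's example (an `IndexeddbPersistence` and a `Y.Doc`). The helper name `seedDefaultsOnce` and the `defaults` object are hypothetical. It waits for the local `'synced'` event and only writes keys that are still missing, inside one `doc.transact` call so the writes produce a single update. It only looks at local IndexedDB state; if a network provider such as y-websocket is also attached, you would probably also want to wait for that provider to sync first, otherwise two clients can still seed concurrently.

```javascript
// Hypothetical helper: seed default values into a Y.Map only if they are
// not already present once local IndexedDB content has been loaded.
// `persistence` is expected to emit 'synced' (as IndexeddbPersistence does);
// `doc` is expected to be a Y.Doc.
function seedDefaultsOnce (persistence, doc, defaults) {
  persistence.once('synced', () => {
    const yMap = doc.getMap('map1')
    // Group all writes into one transaction so they form a single update.
    doc.transact(() => {
      for (const [key, value] of Object.entries(defaults)) {
        // Skip keys that already exist: rewriting even an identical value
        // creates a new update that must be stored and propagated.
        if (!yMap.has(key)) {
          yMap.set(key, value)
        }
      }
    })
  })
}
```

In micrology's example this would replace the unconditional `yMap1.set('prop1', 'foo')` with `seedDefaultsOnce(persistence, doc, { prop1: 'foo' })`.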

raine · May 28, 2023, 2:04am

I am seeing the same behavior when there is no initialization or changes to the doc. It still persists a new update.

Minimal example:

Notice that I never update the Doc, or even create a shared type on it for that matter.

I have traced the behavior to the following commit. Anybody know why this was added? It seems like a mistake that it grows the object store without any changes to the Doc.

mpex · May 28, 2023, 3:34am

New to Yjs and still learning, so I kept trying to figure out what I was doing to cause this issue. My updates were growing without me understanding why.

@micrology thanks for setting up the minimal example. And @raine, hopefully that leads to a fix. Nicely done!

joshuafontany · June 6, 2023, 9:30pm

ChatGPT4 thinks that there is a logic error in the storeState method. Really not sure how accurate that is atm…

The y-indexeddb module, as shown in your code, is a persistence layer that uses IndexedDB to store the state of a Yjs document. The module listens for updates to the Yjs document, and stores these updates in an IndexedDB database. When the Yjs document is loaded, it fetches these updates from the database and applies them to the document, effectively restoring its state.

The storeState function is called periodically (as defined by _storeTimeout) or when the number of updates in the database reaches a certain threshold (PREFERRED_TRIM_SIZE). This function fetches all updates from the database, applies them to the Yjs document, and then stores the resulting state of the document in the database. If the number of updates has reached the threshold, it also deletes the oldest updates from the database, ensuring that the total number of updates does not exceed the threshold.

Refreshing the page causes a new instance of the Yjs document to be created, and this new instance fetches and applies all updates from the database. However, these updates are not deleted from the database, because the storeState function only deletes updates when the number of updates reaches the threshold. As a result, each refresh of the page causes the database to accumulate more updates.

To prevent this accumulation of updates, you could modify the storeState function to delete all updates from the database after they have been applied to the Yjs document, regardless of the number of updates. Alternatively, you could decrease the value of PREFERRED_TRIM_SIZE to a smaller number, so that updates are deleted more frequently.

Please note that this is a high-level explanation based on the provided code and may not cover all edge cases or specific implementation details.

raine · June 7, 2023, 3:27am

Interesting. That’s more or less correct. However it doesn’t clarify why empty updates are being stored on load. That code was just added in September, so it was not part of the core logic, and presumably was addressing a newly discovered edge case.

A couple steps that might be worth a try:

See which (if any) tests fail when that commit is reverted.
Add a condition that only stores the update if it is non-empty.
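The second idea can be sketched as a guard on the write path. It relies on one assumption worth checking against your Yjs version: with the default (V1) update encoding, `Y.encodeStateAsUpdate` on a document with no content and no deletions yields the two bytes `[0, 0]` (zero struct groups, then an empty delete set), which matches the `Uint8Array[0,0,...]` row micrology saw in DevTools. `isEmptyUpdate` and `storeUnlessEmpty` are hypothetical names, and `writeUpdate` stands in for whatever appends to the object store. As the later posts show, this check alone turned out not to be sufficient.

```javascript
// Assumes the V1 update encoding: an update with no structs and an empty
// delete set is encoded as two zero bytes.
function isEmptyUpdate (update) {
  return update.length === 2 && update[0] === 0 && update[1] === 0
}

// Hypothetical wrapper: only call `writeUpdate` for non-empty updates.
// Returns true if the update was written.
function storeUnlessEmpty (update, writeUpdate) {
  if (isEmptyUpdate(update)) return false
  writeUpdate(update)
  return true
}
```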

raine · June 15, 2023, 1:32pm

Several tests fail if the call to the new beforeApplyUpdatesCallback is removed.

I added the condition if (updates.length === 0) to the call site of beforeApplyUpdatesCallback, which stops the growth. All the tests still pass, though this doesn't say a whole lot given that the test coverage is spotty. Some real-world testing is needed, as well as a more precise understanding of the redundancy condition under which beforeApplyUpdatesCallback can be skipped.

Update: Never mind, that definitely doesn’t work.

Opened an issue here:

github.com/yjs/y-indexeddb: "Doc grows without any changes" (issue opened by raineorshine on 12 Jun 2023, 11:08 PM UTC; label: bug)

**Checklist**
* [x] Are you reporting a bug? Use github issues for bug reports and feature requests. For general questions, please use https://discuss.yjs.dev/
* [x] Try to report your issue in the correct repository. Yjs consists of many modules. When in doubt, report it to https://github.com/yjs/yjs/issues/

**Describe the bug**
The size of the object store grows every time the page is refreshed. This occurs because the doc is encoded as an update and appended to the object store every time `IndexeddbPersistence` is instantiated. The db is compacted after reaching the `PREFERRED_TRIM_SIZE` whenever an update is stored. However, this assumes 1) a small number of Docs, and 2) Docs will be modified. **While reasonable on the surface, neither of these assumptions hold in all cases.** I currently instantiate thousands of Docs for an offline-first graph database built on YJS. Many hundreds are instantiated and destroyed dynamically as the user navigates the graph, and some may never be modified.

**To Reproduce**
Minimal example: https://codesandbox.io/p/sandbox/pedantic-euler-jje92f?file=%2Fsrc%2FApp.tsx%3A1%2C1-2%2C1

Notice that no modifications are made to the Doc, yet it grows in size.

**Expected behavior**
The persisted Doc should not grow indefinitely when no changes are made.

**Environment Information**
- Browser: Chrome
- Yjs: v13.6.2
- y-indexeddb: v9.0.11

**Additional context**
I can see that the behavior was introduced in [this commit](https://github.com/yjs/y-indexeddb/commit/217fd6da1ed12b300eba647db247af4ac1257cf2). ~One possible solution is adding the condition `if (updates.length === 0)` to `beforeApplyUpdatesCallback`, as it seems to only be needed on a new Doc. That stops the infinite growth, and the tests pass, however I don't know the full implications of this change. If `beforeApplyUpdatesCallback` needs to be invoked in other cases that are not covered by the tests that would be important to know.~ Is there a condition that detects when `beforeApplyUpdatesCallback` does not need to be called? I have tried skipping it when the Doc is empty, but that does not work.

joshuafontany · June 26, 2023, 2:22am

@raine Thanks for filing an actual bug!

If the problem lies in how the y-indexeddb provider is constructed, I wonder if we could compare the state of the current doc with a doc created from all of the states already in the store, and store the new state only if there is a difference. That might cost a few ms on construction, but would avoid this memory leak.
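One way to approximate this comparison without rebuilding a second document is to compare state vectors, which map each client ID to a logical clock counting how much content from that client has been seen. The sketch below (hypothetical helper `hasUnsavedInserts`) works on plain `Map<number, number>` objects. With Yjs these could presumably be obtained via `Y.decodeStateVector(Y.encodeStateVector(doc))` for the in-memory document and `Y.decodeStateVector(Y.encodeStateVectorFromUpdate(Y.mergeUpdates(storedUpdates)))` for the stored entries. The caveat is significant: state vectors do not record deletions, so a document that only deleted content would compare as unchanged, and raine later reports that comparisons of this kind produced false positives or regressions.

```javascript
// Returns true if `localSV` has seen content from some client beyond what
// `storedSV` covers. Both arguments are Map<clientId, clock>.
// Deletions are NOT reflected in state vectors, so this can miss changes.
function hasUnsavedInserts (localSV, storedSV) {
  for (const [client, clock] of localSV) {
    // A client missing from the stored vector counts as clock 0.
    if ((storedSV.get(client) ?? 0) < clock) return true
  }
  return false
}
```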

raine · June 26, 2023, 3:14pm

I actually think it may be simpler than that. I’m currently testing a small change that skips empty updates, and it is working well so far: "do not save empty updates - fixes #31" (raineorshine/y-indexeddb@5e0917e on GitHub)

I’d love to get some help testing this before submitting a PR, to make sure there are no regressions. To test, replace the y-indexeddb dependency in your package.json with the following: "y-indexeddb": "https://github.com/raineorshine/y-indexeddb#empty-update"

raine · July 18, 2023, 9:00pm

I did some more testing and discovered that a simple check for empty updates is not enough when there are multiple providers. The empty update is prevented when the first provider syncs, but then the second provider causes a new, redundant db entry. This grows continuously as originally observed.

I made many attempts to detect when an update is redundant by comparing binary updates or state vectors, but they always resulted in either false positives or regressions.

I have resorted to a more brute force approach of triggering a database compaction on the initial sync. You can do this by setting PREFERRED_TRIM_SIZE = 0, or calling a debounced storeUpdate(this, true) after the initial sync. Here is an example of the latter approach, which I am currently using:

So far this works with any number of providers, and there is no visible performance impact with hundreds of Docs.
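The "debounced" part of the second approach can be illustrated generically, independent of y-indexeddb: a trailing-edge debounce collapses many compaction requests (for example, several providers syncing the same Doc in quick succession) into one call made after the requests stop. This is a plain helper, not y-indexeddb API.

```javascript
// Generic trailing-edge debounce: repeated calls within `waitMs` collapse
// into a single call of `fn`, made `waitMs` after the most recent request.
function debounce (fn, waitMs) {
  let timer = null
  return (...args) => {
    // Cancel any pending call and restart the wait window.
    if (timer !== null) clearTimeout(timer)
    timer = setTimeout(() => {
      timer = null
      fn(...args)
    }, waitMs)
  }
}
```

For example, `const compactSoon = debounce(() => compact(doc), 1000)`, where `compact` is a hypothetical stand-in for whatever forces a full compaction, could be called from each provider's sync handler; only the last call within the window actually runs.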

MentalGear · February 17, 2024, 3:55pm

Has this fix ever been merged?

raine · February 17, 2024, 10:56pm

No, a PR was never created because this change comes with a performance cost and I wasn’t entirely sure it would be the correct default behavior.