I'm considering the following architecture for a local search indexer. Given:
- a user can follow many documents, e.g. 100
- documents can be updated by different users
- I don't want to keep 100 documents in memory, subscribe to all 100, and index every change
- I would like to be able to search across all 100 documents, indexing them client-side rather than relying on a central server
This requires a way to get notified of document changes without observing every document. I'm considering:
1. Have a YDoc "feed" containing a Y.Map "documentUpdates".
2. Whenever a user edits a document, call documentUpdates.set(docId, stateVector).
3. The Indexer subscribes to documentUpdates.
4. If there's a new or changed documentUpdates entry that hasn't been indexed yet, load the corresponding YDoc and sync it until its state includes that stateVector. Then index the document and unload it.
5. The Indexer keeps track of which (documentId, stateVector) combinations have been indexed. It always reindexes complete documents.
(Later on, we could optimize this further and index incremental updates; for now, it would be OK to always reindex complete documents.)
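A minimal sketch of the Indexer's bookkeeping: which (documentId, stateVector) pairs have been indexed and which feed entries still need work. It assumes state vectors have been decoded into plain `Map<clientID, clock>` objects (in Yjs, `Y.decodeStateVector(Y.encodeStateVector(doc))` produces this shape). `IndexerState`, `svKey`, `pending`, and `markIndexed` are hypothetical names, and the feed is modelled as a plain `Map` rather than a live Y.Map so the example stays dependency-free:

```typescript
// Decoded state vector: clientID -> next expected clock for that client.
type StateVector = Map<number, number>;

// Canonical string for a state vector, so two equal vectors compare equal
// regardless of Map insertion order.
function svKey(sv: StateVector): string {
  return [...sv.entries()]
    .sort(([a], [b]) => a - b)
    .map(([client, clock]) => `${client}:${clock}`)
    .join(",");
}

// Hypothetical bookkeeping for the Indexer: remembers, per document, the
// feed entry (stateVector) that was last indexed.
class IndexerState {
  private indexed = new Map<string, string>(); // docId -> svKey of last indexed entry

  // docIds whose current feed entry differs from what was last indexed.
  pending(feed: Map<string, StateVector>): string[] {
    const result: string[] = [];
    for (const [docId, sv] of feed) {
      if (this.indexed.get(docId) !== svKey(sv)) result.push(docId);
    }
    return result;
  }

  // Call after the full document has been reindexed for this feed entry.
  markIndexed(docId: string, sv: StateVector): void {
    this.indexed.set(docId, svKey(sv));
  }
}

// Usage with a feed containing two documents.
const state = new IndexerState();
const feed = new Map<string, StateVector>([
  ["doc-a", new Map([[1, 5]])],
  ["doc-b", new Map([[2, 3], [1, 1]])],
]);
const first = state.pending(feed); // ["doc-a", "doc-b"]
state.markIndexed("doc-a", feed.get("doc-a")!);
const second = state.pending(feed); // ["doc-b"]
```

Comparing canonical keys means any change to a feed entry triggers a full reindex, which matches the "always reindex complete documents" approach. The sketch does not check whether the loaded document actually includes the stateVector; that is the open question below.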
a) Step 4 (loading a document until its state includes the stateVector) requires determining whether a given stateVector is already contained in a document's current state. Does such an API exist?
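Whether or not Yjs ships a dedicated helper for this, the check itself can be written as a per-client comparison: a state vector records, for each clientID, the next expected clock, so a document's state includes a target vector when it has reached at least that clock for every client the target mentions. A sketch, assuming both vectors are already decoded (e.g. via `Y.decodeStateVector`); `coversStateVector` is a hypothetical name:

```typescript
// Decoded state vector: clientID -> next expected clock for that client.
type StateVector = Map<number, number>;

// True if `current` has integrated everything `target` describes, i.e. for
// every client in `target`, `current` has reached at least the same clock.
// Clients missing from `current` count as clock 0 (nothing seen yet).
function coversStateVector(current: StateVector, target: StateVector): boolean {
  for (const [client, clock] of target) {
    if ((current.get(client) ?? 0) < clock) return false;
  }
  return true;
}

// Example: the loaded document is at clock 10 for client 1 and clock 4 for client 2.
const loaded: StateVector = new Map([[1, 10], [2, 4]]);
const ok = coversStateVector(loaded, new Map([[1, 7]])); // true
const notYet = coversStateVector(loaded, new Map([[1, 7], [3, 1]])); // false: nothing from client 3 yet
```

In step 4, the Indexer could re-run this check after each update applied to the loaded document and index it once the check returns true.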
b) This design is probably not completely foolproof, e.g. in "split-brain" situations. However, I think it's relatively simple and should cover most scenarios.
c) As I understand it, deletions are not reflected in state vectors. So if a change contains only deletions, the stateVector written to documentUpdates stays the same, and delete-only changes won't trigger reindexing. Is my overall concept flawed, and is there a better approach?
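On c): in Yjs, deletions are recorded in the document's delete set rather than its state vector, so a fingerprint built from both would change on delete-only edits. `Y.snapshot(doc)` bundles the state vector with the delete set and `Y.encodeSnapshot` serializes it, so the feed could plausibly store an encoded snapshot instead of a bare state vector; how well that fits this design is an untested assumption. The sketch below models both structures as plain Maps (hypothetical `DocVersion` and `versionKey` names) and assumes each client's deleted ranges are already normalized (merged, non-overlapping):

```typescript
// Decoded state vector: clientID -> next expected clock.
type StateVector = Map<number, number>;
// Simplified delete set: clientID -> deleted [clock, length] ranges.
// Assumes ranges are already merged and non-overlapping.
type DeleteSet = Map<number, Array<[number, number]>>;

interface DocVersion {
  sv: StateVector;
  ds: DeleteSet;
}

// Canonical key covering both inserts (sv) and deletions (ds), with clients
// and ranges sorted so equal versions always yield the same string.
function versionKey(v: DocVersion): string {
  const svPart = [...v.sv.entries()]
    .sort(([a], [b]) => a - b)
    .map(([client, clock]) => `${client}:${clock}`)
    .join(",");
  const dsPart = [...v.ds.entries()]
    .sort(([a], [b]) => a - b)
    .map(
      ([client, ranges]) =>
        `${client}:` +
        [...ranges] // copy so the caller's array is not reordered
          .sort(([a], [b]) => a - b)
          .map(([clock, len]) => `${clock}+${len}`)
          .join(";"),
    )
    .join(",");
  return `${svPart}|${dsPart}`;
}

// A delete-only edit: the state vector is unchanged, only the delete set grows.
const before: DocVersion = {
  sv: new Map<number, number>([[1, 10]]),
  ds: new Map<number, Array<[number, number]>>(),
};
const after: DocVersion = {
  sv: new Map<number, number>([[1, 10]]),
  ds: new Map<number, Array<[number, number]>>([[1, [[3, 2]]]]),
};
const changed = versionKey(before) !== versionKey(after); // true
```

Keying the Indexer's bookkeeping on such a fingerprint, rather than on the state vector alone, would let delete-only changes trigger a reindex.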