Good vector search means more than embeddings.
Embeddings don’t know whether a result actually matches. Similarity floors don’t work consistently: a cutoff that works for one query might be disastrous for another. Even worse, your embedding usually can’t capture every bit of meaning in your corpus.
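To make the similarity-floor problem concrete, here is a minimal sketch using toy 2-D vectors. The documents, vectors, and the `above_floor` helper are all invented for illustration; real embeddings have hundreds of dimensions, but the same effect can show up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def above_floor(query_vec, docs, floor):
    """Names of docs whose similarity to the query meets the floor."""
    return [name for name, vec in docs.items()
            if cosine(query_vec, vec) >= floor]

# Toy vectors, made up for this example only.
docs = {
    "red shoes": (1.0, 0.0),
    "blue shoes": (0.9, 0.3),
    "garden hose": (0.0, 1.0),
}
query_a = (1.0, 0.1)  # lands close to the shoe vectors
query_b = (0.6, 0.6)  # lands in between everything

FLOOR = 0.9
hits_a = above_floor(query_a, docs, FLOOR)    # both shoe docs pass
hits_b = above_floor(query_b, docs, FLOOR)    # nothing passes
hits_b_loose = above_floor(query_b, docs, 0.7)  # everything passes
```

With a floor of 0.9, the first query keeps both shoe documents while the second query returns nothing, even though "blue shoes" is its closest match. Loosening the floor to 0.7 rescues the second query but also lets "garden hose" through.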
You need to efficiently pick the best N candidates from your vector database.
What do you need?
- Query Understanding - translating the query into domain language (categories, colors, etc.) likely to produce the best results
- Filters - Exclude results that would obviously be irrelevant before scoring them
- Boosts - Promote items close to the information need in ways not expressed in your embedding. Bring up the most popular, the one with shipping availability, etc.
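Here is a rough sketch of how the three pieces above might fit together around vector similarity. Everything in it is an assumption for illustration: the `Product` fields, the keyword lookup standing in for query understanding, and the boost weights (0.1 and 0.05) are hypothetical, not a recommended configuration.

```python
import math
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    category: str
    in_stock: bool
    popularity: float  # assumed to be normalized to 0..1
    vector: tuple      # toy embedding, invented for this example

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def understand_query(query):
    """Hypothetical query understanding: keyword -> category lookup.
    A real system might use a classifier or an LLM instead."""
    keyword_to_category = {"sneakers": "shoes", "shoes": "shoes",
                           "hose": "garden"}
    for word in query.lower().split():
        if word in keyword_to_category:
            return keyword_to_category[word]
    return None

def search(query, query_vec, products, top_n=2):
    category = understand_query(query)
    scored = []
    for p in products:
        # Filter: drop obviously irrelevant items before scoring.
        if category is not None and p.category != category:
            continue
        score = cosine(query_vec, p.vector)
        # Boosts: signals the embedding does not express.
        score += 0.1 * p.popularity
        if p.in_stock:
            score += 0.05
        scored.append((score, p.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]

products = [
    Product("trail runner", "shoes", True, 0.9, (0.9, 0.3)),
    Product("canvas sneaker", "shoes", False, 0.2, (1.0, 0.0)),
    Product("garden hose", "garden", True, 0.8, (0.95, 0.1)),
]
results = search("red sneakers", (1.0, 0.1), products)
```

In this toy data the garden hose has the highest raw cosine similarity to the query, so similarity alone would rank it first. The category filter removes it, and the popularity and in-stock boosts lift "trail runner" above "canvas sneaker".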
Vector search is not enough. Search requires a full suite of solutions to work.
-Doug
This is part of Doug’s Daily Search tips - subscribe here
Enjoy softwaredoug in training course form!
Starting May 18!
Signup here - http://maven.com/softwaredoug/cheat-at-search
I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.