What slows full-text search down? Too many unique terms.
Consider search with numerical values. It’s unlikely you care about the distinction between 3.1415927 and 3.14 when searching. Both are pi! 🥧
Instead of a postings list that looks like:
3.1415927 → [1, 5, 9]
3.14 → [1, 3, 9]
Collapse them to:
pi → [1, 3, 5, 9]
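
Here is a minimal sketch of that collapse in Python. It is a toy inverted index, not how any particular search engine does it. The `normalize` function, the `PI_TOLERANCE` threshold, and the tiny corpus are all assumptions made up for illustration; the doc ids are the ones from the postings lists above.

```python
import math
from collections import defaultdict

# Assumed threshold: how close to pi a number must be before we collapse it.
PI_TOLERANCE = 0.01


def normalize(token: str) -> str:
    """Map numeric tokens that are close to pi onto the single term 'pi'."""
    try:
        value = float(token)
    except ValueError:
        return token  # not a number, leave it alone
    return "pi" if abs(value - math.pi) < PI_TOLERANCE else token


def build_postings(docs, normalizer=lambda t: t):
    """Build a mapping of term -> sorted list of doc ids."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[normalizer(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Toy corpus: doc id -> tokens already split out of the text.
docs = {
    1: ["3.1415927", "3.14"],
    3: ["3.14"],
    5: ["3.1415927"],
    9: ["3.1415927", "3.14"],
}

raw = build_postings(docs)
collapsed = build_postings(docs, normalize)

print(raw)        # {'3.1415927': [1, 5, 9], '3.14': [1, 3, 9]}
print(collapsed)  # {'pi': [1, 3, 5, 9]}
```

Two terms in the term dictionary become one, and the merged postings list is just the union of the originals.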
This requires you to pay attention to tokenization. Whether you actually have numbers or, more likely, you’re dealing with stemming or synonyms, collapsing terms to a single concept pays performance dividends.
And it helps improve recall too!
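
To see the recall effect with synonyms and plural forms, here is a rough sketch that applies the same analyzer at index time and at query time. The `CONCEPTS` table is a hypothetical, hand-written stand-in; a real setup would more likely lean on a stemmer or a synonym filter in the engine's analysis chain. The documents and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical synonym/stem table: every variant maps to one concept term.
CONCEPTS = {
    "car": "car",
    "cars": "car",
    "automobile": "car",
    "automobiles": "car",
}


def plain(text):
    """Baseline analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def analyze(text):
    """Same as plain(), then collapse known variants to their concept."""
    return [CONCEPTS.get(tok, tok) for tok in plain(text)]


def build_index(docs, analyzer):
    """Build term -> set of doc ids using the given analyzer."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyzer):
    """Return sorted ids of docs containing every analyzed query term."""
    terms = analyzer(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


docs = {1: "red car", 2: "vintage automobiles", 3: "cars for sale"}

plain_index = build_index(docs, plain)
concept_index = build_index(docs, analyze)

print(search(plain_index, "automobile", plain))      # []
print(search(concept_index, "automobile", analyze))  # [1, 2, 3]
print(len(plain_index), len(concept_index))          # 7 5
```

The collapsed index has fewer unique terms (5 instead of 7 here), and a query for "automobile" now finds all three documents instead of none.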
-Doug
This is part of Doug’s Daily Search tips.