What slows full-text search down? Too many unique terms.

Consider search with numerical values. It’s unlikely you care about the distinction between 3.1415927 and 3.14 when searching. Both are pi! 🥧

Instead of postings lists that look like

3.1415927 → [1, 5, 9]

3.14 → [1, 3, 9]

Collapse them to:

pi → [1, 3, 5, 9]
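Here's a rough sketch of that collapse in Python. The `collapse` function and the `concept_of` mapping are names I made up for illustration, and the long-form key is just an illustrative spelling of pi. A real search engine does this at index time rather than by rewriting postings afterwards.

```python
# Illustrative postings lists keyed by raw term (doc IDs mirror the example above).
postings = {
    "3.14159": [1, 5, 9],
    "3.14": [1, 3, 9],
}

# Assumed mapping from each raw term to the concept it should collapse to.
concept_of = {"3.14159": "pi", "3.14": "pi"}


def collapse(postings, concept_of):
    """Merge the postings of every term that maps to the same concept."""
    merged = {}
    for term, doc_ids in postings.items():
        # Terms without a mapping keep their own name.
        concept = concept_of.get(term, term)
        merged.setdefault(concept, set()).update(doc_ids)
    # Postings lists are conventionally kept sorted by doc ID.
    return {concept: sorted(ids) for concept, ids in merged.items()}


collapsed = collapse(postings, concept_of)
print(collapsed)  # {'pi': [1, 3, 5, 9]}
```

Two dictionary entries become one, and the merged list is the union of the originals.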

This requires you to pay attention to tokenization. Whether you actually have numbers or, more likely, you’re dealing with stemming or synonyms, collapsing terms to a single concept pays performance dividends.
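In practice the collapsing lives in the tokenizer, so the index never sees the extra terms. A minimal sketch, assuming a toy regex tokenizer, a hypothetical `SYNONYMS` table, and a made-up rule that rounds numbers to two decimals (none of this is any particular engine's API):

```python
import re
from collections import defaultdict

# Hypothetical synonym table: each surface form maps to one concept token.
SYNONYMS = {"π": "pi"}

NUMBER = re.compile(r"\d+(\.\d+)?")


def normalize(token):
    """Collapse a raw token to a concept token (assumed rules, for illustration)."""
    token = token.lower()
    if NUMBER.fullmatch(token):
        # Round to two decimals so 3.14159 and 3.14 become the same term.
        value = round(float(token), 2)
        return "pi" if value == 3.14 else f"{value:.2f}"
    return SYNONYMS.get(token, token)


def tokenize(text):
    # Match decimals first so "3.14" is not split at the dot.
    return [normalize(t) for t in re.findall(r"\d+\.\d+|\w+", text)]


def build_index(docs):
    """Build term -> sorted doc IDs from a dict of doc ID -> text."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        for term in set(tokenize(text)):
            index[term].append(doc_id)
    return dict(index)


docs = {1: "pi is 3.14159", 3: "roughly 3.14", 5: "the constant π"}
index = build_index(docs)
print(index["pi"])  # [1, 3, 5]
```

Run the query through the same `tokenize` function, otherwise a search for "3.14" looks up a term that no longer exists in the index.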

And it helps improve recall too!

-Doug

This is part of Doug’s Daily Search tips - subscribe here



Doug Turnbull
