You may know BM25 lets you tune two parameters:

  • k1: how quickly to saturate document term frequency’s contribution
  • b: how much to bias towards below average length docs
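
To make those two knobs concrete, here is a minimal sketch of the classic textbook form of BM25's document term frequency component, with IDF left out. The function name `bm25_doc_tf` and the defaults (k1=1.2, b=0.75) are my own illustrative assumptions, and real engines differ in small details.

```python
def bm25_doc_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Classic BM25 document term frequency component (IDF omitted)."""
    # b scales the penalty for longer-than-average docs
    # (and the boost for shorter-than-average ones)
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding to the score
    return (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Average-length doc: the contribution grows with tf but never reaches k1 + 1
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_doc_tf(tf, doc_len=100, avg_doc_len=100), 3))
```

With b=0 the length term drops out entirely, and a smaller k1 makes the curve flatten sooner.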

What you may NOT know is that there is a third parameter: k3.

What does k3 do? It handles repeated query terms.

Old papers suggest k3=100 to 1000. At values that large the factor barely saturates at all, so it works out to almost exactly the raw query term frequency. That’s why Lucene ignores k3. It just uses the query term frequency. Some other search engines like Terrier set it to 8.

So for the query, “Best dog toys for rambunctious dog”

  • Lucene engines count dog twice
  • Terrier, with k3=8, would count it 1.8 times: ((8 + 1) * 2) / (8 + 2) = 18 / 10 = 1.8
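
If you want to play with the numbers, here is a small sketch of that query term frequency factor, ((k3 + 1) * qtf) / (k3 + qtf). The function name `query_tf_weight` is hypothetical, not something from Lucene or Terrier.

```python
def query_tf_weight(qtf, k3):
    """BM25 query term frequency factor: ((k3 + 1) * qtf) / (k3 + qtf)."""
    return ((k3 + 1.0) * qtf) / (k3 + qtf)


# "dog" appears twice in the query, so qtf = 2
for k3 in (0, 8, 100, 1000):
    print(f"k3={k3}: 'dog' x2 counts {query_tf_weight(2, k3):.3f} times")
```

With k3=0 the repeat is ignored entirely (the factor is 1), with k3=8 you get the 1.8 above, and by k3=1000 the factor is within a hair of the raw count of 2.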

Which is right? For traditional search queries, we ignore k3. A few keywords usually don’t have repeated terms.

In today’s question answering world, though, where queries are longer and more likely to repeat a term, it’s reasonable to wonder if we should bring k3 back.

If you’ve used k3 in search, let me know, I’d love to hear your story!

-Doug


This is part of Doug’s Daily Search tips - subscribe here


Doug Turnbull
