In a Lucene-based search engine, BM25 rewards matches in shorter snippets of text through its length normalization. Even though I’ve mentioned that term frequency might not matter, it’s still good to bias towards shorter snippets of text with fewer terms.
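For reference, here is the standard BM25 score for a query term t in a field d. The |d|/avgdl ratio in the denominator is what penalizes fields longer than average: b controls how strongly length counts (b = 0 turns it off), and k1 controls how quickly repeated terms saturate. Lucene's implementation follows this shape, though recent versions drop the constant (k1 + 1) factor, which doesn't change rankings.

```latex
\mathrm{score}(t, d) = \mathrm{idf}(t) \cdot
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1 \left(1 - b + b \,\frac{|d|}{\mathrm{avgdl}}\right)}
```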

Consider the case where a user searches for angularjs.

Which is more relevant?

  • A book title that mentions “AngularJS”, but also “web design” and “JavaScript”
  • A book title that mentions JUST AngularJS, and nothing else

The latter is more likely to be “about” AngularJS than the title that spreads its words across many topics. It’s a safer bet.
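To see the length bias in numbers, here is a minimal sketch that scores the two kinds of titles with classic BM25. It assumes Lucene's default parameters (k1 = 1.2, b = 0.75), Lucene-style IDF, simple lowercase whitespace tokenization, and two hypothetical titles forming a toy two-document collection. Real Lucene analyzes text differently and stores field lengths in a lossy one-byte norm, so absolute scores will differ, but the ordering is the point.

```python
import math

# Lucene's default BM25 parameters (BM25Similarity uses k1=1.2, b=0.75)
K1 = 1.2
B = 0.75


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene-style IDF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float, idf: float,
                    k1: float = K1, b: float = B) -> float:
    """Classic BM25 score for one query term in one field."""
    # Longer-than-average fields get a norm > 1, which shrinks the score
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Two hypothetical book titles; whitespace tokenization is a simplification
titles = {
    "broad": "AngularJS web design and JavaScript",
    "focused": "AngularJS",
}
tokens = {name: title.lower().split() for name, title in titles.items()}
avg_len = sum(len(t) for t in tokens.values()) / len(tokens)

query_term = "angularjs"
doc_freq = sum(query_term in t for t in tokens.values())
idf = bm25_idf(len(tokens), doc_freq)

scores = {
    name: bm25_term_score(t.count(query_term), len(t), avg_len, idf)
    for name, t in tokens.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Both titles match angularjs exactly once and share the same IDF, so the gap between them comes entirely from the length-normalization term. Setting b = 0 would make the two scores identical.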

-Doug

This is part of Doug’s Daily Search tips.


Doug Turnbull
