Late interaction models, like ColBERT, give you fine-grained passage scoring.

In normal vector search, every document has exactly one vector. You score it against the query’s vector. You get a similarity. You rank the document. Done.
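In code, single-vector scoring is just one dot product per document (toy hand-picked vectors here, not real embeddings):

```python
import numpy as np

query = np.array([0.1, 0.7, 0.2])   # one vector for the whole query
doc = np.array([0.2, 0.6, 0.3])     # one vector for the whole document

# One similarity per document; rank documents by this number
score = float(np.dot(query, doc))
print(score)  # 0.5
```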

We call this a single-vector representation.

Late interaction works with multi-vector representations.

Setup:

  • We’re scoring a passage “susan loved her baby sheep”
  • For the query “mary had a little lamb”
  • Every token in the document has a vector[1]

Scoring the passage:

  1. We encode our first query token [mary]
  2. We find the passage token with highest similarity. In this case, probably [susan]
    • (This is the max sim operation)
  3. We continue with [had], finding the max sim token, summing it in
  4. Continuing with every query token, until we have our final score
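The steps above can be sketched in a few lines of NumPy. The vectors here are random stand-ins (real ColBERT embeddings come from a trained BERT encoder), but the MaxSim math is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(m):
    """Unit-normalize each row so dot products are cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

# One vector per token (random stand-ins for learned embeddings)
query_vecs = normalize(rng.normal(size=(5, 8)))  # [mary, had, a, little, lamb]
doc_vecs = normalize(rng.normal(size=(5, 8)))    # [susan, loved, her, baby, sheep]

def late_interaction_score(query_vecs, doc_vecs):
    # Similarity of every query token against every doc token
    sims = query_vecs @ doc_vecs.T   # shape: (num_query_tokens, num_doc_tokens)
    # MaxSim: take each query token's best-matching doc token, then sum
    return float(sims.max(axis=1).sum())

score = late_interaction_score(query_vecs, doc_vecs)
```

Note that a passage scored against itself gets a score equal to its token count: each token's best match is itself, with cosine similarity 1.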

How does one train a ColBERT model to produce a multi-vector representation? Learn more in the ColBERT paper.

1 - using words here for simplicity; in practice we’d use a BERT-style subword tokenizer like WordPiece

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Enjoy softwaredoug in training course form!

Starting June 22!

I hope you join me at Cheat at Search with LLMs to learn how to apply LLMs to search applications. Check out this post for a sneak preview.

Doug Turnbull

More from Doug
Twitter | LinkedIn | Newsletter | Bsky
Take My New Course - Cheat at Search with LLMs