Late interaction models, like ColBERT, give you fine-grained passage scoring.

In normal vector search, every document has exactly one vector. You score it against the query’s vector. You get a similarity. You rank the document. Done.
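That one-vector-per-document scoring can be sketched like this (toy vectors stand in for real model embeddings):

```python
import numpy as np

# Hypothetical embeddings -- in practice these come from an encoder model
query_vec = np.array([0.1, 0.9, 0.2])
doc_vec = np.array([0.2, 0.8, 0.1])

def cosine_sim(a, b):
    """One similarity score between one query vector and one document vector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

score = cosine_sim(query_vec, doc_vec)  # a single number -- rank documents by it
```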

We call this a single-vector representation.

Late interaction, by contrast, works with multi-vector representations.

Setup:

  • We’re scoring a passage “susan loved her baby sheep”
  • For the query “mary had a little lamb”
  • Every token in the document has a vector[1]

Scoring the passage:

  1. We encode our first query token [mary]
  2. We find the passage token with highest similarity. In this case, probably [susan]
    • (This is the max sim operation)
  3. We continue with [had], finding its max sim token and adding that similarity to a running sum
  4. We repeat for every query token; the final sum is the passage’s score
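The steps above are the MaxSim operation: for each query token vector, take the max similarity over all passage token vectors, then sum. A minimal sketch in NumPy, using random toy vectors in place of real token embeddings:

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction scoring.

    query_vecs: (num_query_tokens, dim) -- one vector per query token
    doc_vecs:   (num_doc_tokens, dim)   -- one vector per passage token
    """
    # Normalize rows so dot products become cosine similarities
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                        # (num_query_tokens, num_doc_tokens)
    # Max over passage tokens (max sim), then sum over query tokens
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.random((5, 8))   # "mary had a little lamb" -> 5 token vectors
passage = rng.random((5, 8)) # "susan loved her baby sheep"
score = maxsim_score(query, passage)
```

Note the score is a sum over query tokens, so a passage scored against an identical set of vectors gets exactly one point per query token.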

How does one train a ColBERT to produce a multi-vector representation? Learn more in the ColBERT paper.

1 - Using words here for simplicity, but we’d use a BERTy tokenizer like WordPiece

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Enjoy softwaredoug in training course form!

Starting May 18!

Signup here - http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull

More from Doug
Twitter | LinkedIn | Newsletter | Bsky