Doug Turnbull's Blog

March 10th, 2026

The tests are the code now

With AI coding, tests suddenly become the most important part of our code to maintain

March 9th, 2026

Ugly hack to force BM25 to 0-1

It's convenient to have a lexical score normalized from 0 to 1. Sadly, BM25 scores tend to be all over the...

March 6th, 2026

BM25’s other parameter - WTF is k3?

You may know BM25 lets you tune two parameters: k1, how quickly to saturate document term frequency’s contribution; b, how...

March 6th, 2026

Can BM25 be a probability?

BM25 odds vs probabilities: a tour of Bayesian BM25 and what it means for hybrid search calibration.

March 4th, 2026

High IDF doesn't always mean relevant search

Rare terms have high inverse document frequency (IDF). BM25 scoring treats high-IDF terms as more relevant. Why? We assume...

March 2nd, 2026

BM25: probabilistic but not a probability

BM25 models the odds a term would be observed in a relevant document vs the term occurring in an irrelevant...

February 27th, 2026

Consider pairwise evals instead of pointwise

If pointwise evals ask "How relevant is this, from 1-5?", pairwise search evals say "Which of these two results...

February 25th, 2026

Measure your rater reliability

In the previous tip, we discussed how pointwise 1-5 labels fall apart. The expert rater gives only nit-picky...

February 23rd, 2026

How pointwise evals fall apart

A judgment list labels a document as relevant / irrelevant for a query. So you get a label, say 1-5...

February 20th, 2026

Know where search management hits limits

I mentioned my experience with Shopify merchants that controlled their own search quality. They manually outperformed our best algorithms. The...

February 18th, 2026

Search management takes pressure off algorithms

You built a pretty good query understanding solution. It’s an improvement. You have to ship tomorrow. One problem: the query...

February 16th, 2026

Don’t discount search management

At Shopify, while our work improved search quality overall, one segment of users stood out: those that manually controlled results...

February 13th, 2026

What are visual language models?

A visual language model learns embeddings for parts of an image. In a normal language model, we look up the...

February 11th, 2026

What is a late interaction model?

Late interaction models like ColBERT give you fine-grained passage scoring. In normal vector search, every document has exactly one...

February 9th, 2026

Why do single vector representations fail?

This week we’ll talk a bit about late interaction. But to get there, we need to think about why single...

February 6th, 2026

HNSW doesn't always scale?

In previous tips, I talked about tail latency. Your search cluster becomes as slow as the slowest node. A wide...

February 4th, 2026

Relevant retrieval w/ predictable latency

The higher the scale, the stronger the incentive to simplify your retrieval. There are two conflicting incentives: improving relevance, requiring more...

February 2nd, 2026

Be aware of tail latency in your search cluster

Don't push complex ranking into the search engine. Layering in operation on top of plugin on top of who knows...

January 30th, 2026

Spreadsheet implementation of word2vec

Here’s a fun spreadsheet that implements word2vec. Use it as a jumping-off point. It has: a single small vocabulary of...

January 28th, 2026

word2vec isn’t just for words

In my previous tip, I introduced word2vec. I discussed it in terms of language: this word, "mary", shared context with...

January 26th, 2026

Embedding triplet training - know word2vec

With embedding similarity, you train with an anchor, a positive, and a negative. You want to move the positive's embeddings...

January 23rd, 2026

Underrated: retention; overrated: conversions

Have you been to a conversion-crazy site? It’s nuts. Their site screams at you. They probably have the modern...

January 21st, 2026

To measure search well, watch Youtube

Youtube masterminded how to turn engagement into insights. Whether search or their feed, you can learn from how they learn...

January 19th, 2026

Don’t measure search with end-of-funnel conversions

When I worked at Shopify, the gold standard was GMV (the dollar amount in revenue). Naturally, that’s what we wanted...

January 17th, 2026

Ralph, too, needs a test train split

When we think of AI coding as ML modeling, we enable AI to solve deeper, more difficult, and less black and white problems.

January 16th, 2026

Be wary of public benchmarks

You may know ANN Benchmarks. It’s a leaderboard of vector search algorithms. It’s referenced a lot by companies when choosing...

January 14th, 2026

Unlike text search, filtered vector search can be slower

Full-text search instincts about filters don’t translate to vector search. In full-text search, the rule has been to...

January 12th, 2026

Know the two families of vector retrieval

Vector search isn’t that hard. Think about maps. Nearest neighbors in 768 dimensions is like nearest neighbors in 2 dimensions...

January 9th, 2026

In vector retrieval: orthogonality is the norm

One place 2D analogies of vector search break down: orthogonality. In 2D, there’s little special about orthogonality. Two normal vectors...

January 8th, 2026

Semantic Search Without Embeddings

Pay attention to the other ways to model similarity and filter search results

January 7th, 2026

Field length (norms) matters for relevance

In a Lucene-based search engine, BM25 rewards shorter snippets of text. I’ve mentioned that term frequency might not matter...

January 5th, 2026

Slow full-text search? pay attention to term cardinality

What slows full-text search down? Too many unique terms. Consider search with numerical values. It’s unlikely you care about...

January 1st, 2026

Use a pepsi-coke challenge to evaluate search results

Evaluating search? Don’t jump to complex labeling systems; just do simple side-by-sides. When I worked at Reddit, I...

December 19th, 2025

Don't confuse similarity for relevance

It’s easy to be seduced by the out-of-the-box capabilities of an embedding model. Immediately you get results...

December 18th, 2025

Replace complex search with agentic search

When people search for their job, they have complex information needs. They could express what they want in a paragraph...

December 17th, 2025

Query understanding matters more than ranking

Today’s daily search tip comes from friend of the newsletter Daniel Tunkelang. From the talk "Query understanding matters more than...

December 16th, 2025

The other kind of “semantic search”

When we think semantic search, we think embeddings. This is just one kind of semantic search. There’s a less sexy...

December 15th, 2025

Remove term frequency from title fields

Users want to know if a document is about the searched-for terms. Search tech news articles for iPhone relevance...

December 14th, 2025

Free course: Cheat at Search Essentials

A free introductory search course for anyone who wants better search without all the hard work

December 13th, 2025

Welcome to Doug's Daily Search Tips

Welcome Test Doug...

December 9th, 2025

RAG Isn’t a Vector Search Problem

Through market forces, embeddings became the singular framework through which we understood RAG. It's the wrong lens for thinking about the problem

November 2nd, 2025

LLM Judges aren’t the shortcut you think

After the LLM judge hype curve crashes, what will come after?

October 19th, 2025

An agent-coded search reranker

An experiment guiding an agent to code a search reranker optimized for NDCG. How badly overfit is it?

October 15th, 2025

General purpose agentic loop in 40 lines of Python

Drop in any well-typed Python function; the loop builds the tool schema and wiring for you. How I build dumb demos and experiments.

October 6th, 2025

Reasoning boosts search relevance 15-30%

Kicking the tires on an initial, naive agentic search with some thoughts on how it could be improved further

September 22nd, 2025

Agents turn simple keyword search into compelling search experiences

Agents need tools they understand, like simple keyword search. They can reason about these tools, evaluate the results, refine, and iterate to deliver rather interesting results. But maybe with some caveats.

September 18th, 2025

BM25F from scratch

BM25 run across multiple fields isn't as simple as summing a bunch of field-level BM25 scores.

September 16th, 2025

The scapegoat's guide to transforming organizations

Leading a new effort at work meant to transform the company? May you last long enough to become the villain.

September 5th, 2025

Do well-written, clear instructions beat few-shotting for tiny-LLMs?

Quick experiment: few-shotting vs using rules with little LLMs

August 21st, 2025

OpenAI lost the plot on 'boring' LLM use-cases

LLMs have uses far beyond agents. We should all be concerned how quickly GPT-5 thrashed away from those use cases

August 19th, 2025

Good agents are good researchers

We need to evaluate agents from a completely new angle: as researchers not reasoners.

July 30th, 2025

The tradeoff between AI and human context

AI coding requires a tradeoff between human and AI context, and your job is now to decide where to best spend your limited attention vs to let the AI do work

July 2nd, 2025

Mercenaries over Missionaries

Believing in the mission is the new we're a family. Why be a true believer in anything that's about to lay you off?

June 22nd, 2025

Grug-brained evals

Big brain spend months building perfect quality metrics. Grug brain no trust, and just want dumb labels from coworkers 👍/👎.

June 15th, 2025

Zero shot is not a free lunch

Can you prompt your way to solving every NLP problem?

June 7th, 2025

Be wary of high variance NDCG changes

High variance NDCG delta in an offline experiment can mean we don't understand our change

June 3rd, 2025

Liberating search from the search engine

Instead of learning complex Query DSLs, we need better API-level abstractions to deal with Top N Retrieval

May 16th, 2025

RAG's big blindspot

RAG apps have a big blindspot - using actual user engagement to drive improvement. But it's a hard problem. Let's discuss!

May 3rd, 2025

On the road to your own vector db - some basics

Baseline concepts and ideas for vector search in the direction of graph data structures

April 29th, 2025

Two tower embeddings instead of 'hybrid search'

Building hybrid search means constantly tweaking and customizing your embedding

April 26th, 2025

Stop overbuilding evals

Some teams have no idea on evals. Other teams massively overscale them and don't make progress

April 16th, 2025

Hybrid Search - optimizing the "R" in RAG

Slides from my Hybrid Search talk for Maven. How to think about L0 retrieval with hybrid search

April 8th, 2025

An LLM Query Understanding Service

LLMs turn query understanding from a complex, multi-month project into a matter of days

April 2nd, 2025

All search is structured now

There's no excuse for unstructured search queries in the age of LLMs

March 28th, 2025

AI Brainrot means developer opportunity

AI makes us lazier - today's inconveniences feel excruciating enough to pay for them to go away

March 13th, 2025

Elasticsearch Hybrid Search Recipes - Benchmarked

Various strategies of Elasticsearch benchmarked w/ NDCG stats

February 8th, 2025

Elasticsearch hybrid search in practice

Elasticsearch knn query is both a joy and a headache - here is where you'll get stuck and the hacks I've used to overcome them.

January 21st, 2025

Classic ML to cope with Dumb LLM Judges

Taking the output of many dumb LLM search relevance judges and feeding the output to a decision tree to improve precision

January 19th, 2025

Check twice, cut once with LLM search relevance eval

Checking both directions in LLM pairwise evaluation of search relevance

January 13th, 2025

Turning my laptop into a Search Relevance Judge with local LLMs

Local LLMs can evaluate 100s of result pairs a minute on a MacBook, enabling a new age of rapid search relevance improvements

December 30th, 2024

Turns out an AI-only twitter is pretty boring

I built AI-bot twitter and learned they like to argue about pumpkin spice lattes

December 20th, 2024

Reflecting on 6 years of "AI Powered Search"

The book AI Powered Search is out and I couldn't be more grateful to Trey Grainger and Max Irwin for having me on this journey

December 14th, 2024

Preferring throwaway code over design docs

If you have the discipline to throw away your first idea, draft, throwaway PRs often drive more progress than a design doc.

December 10th, 2024

Go from Python - initial impressions

Some notes when you get into Go from Python for the fellow Go newb.

December 3rd, 2024

Your big company can't be a startup again.

Large companies can't put the genie back in the bottle because most employees don't have autonomy over their roles

November 18th, 2024

Failing at an Elasticsearch 'full' phrase match

Elasticsearch doesn't have a straightforward way to match the 'full' field (all the tokens as a phrase).

November 3rd, 2024

RRF is Not Enough

Reciprocal Rank Fusion, while a useful tool, doesn't magically make hybrid search relevant

October 19th, 2024

Real life NDCG notebook

A notebook showing the real decisions computing search evaluation stats

October 13th, 2024

The hidden danger that kills search products

The lack of objective definition of good search creates huge hazards when creating search, RAG, AI solutions

September 25th, 2024

Stop avoiding conflict on your teams

Avoiding conflict is the death knell of organizations that leads to a lack of progress and careers that implode.

September 11th, 2024

Staff engineers exist in a system of patronage

In reality, staff engineers aren't about 'company wide' impact but a system of patronage where managers reward behaviors they value

September 11th, 2024

Generative AI Augmented Retrieval - GAR presi

My GAR slides from Systematically Improving RAG Applications Sept 2024 course

September 7th, 2024

Your company needs Junior devs

Junior engineers are foundational to whether a team can collaborate and innovate

August 9th, 2024

Search query analysis minus the noise

Finding search queries to improve is harder than you think. Here's one statistical procedure for deciding whether a query really has a problem -- or if it's just noise.

August 6th, 2024

I made a worse search engine than Elasticsearch

Integrating my BM25 pandas search library, SearchArray, into BEIR, in order to embarrass myself in public.

July 31st, 2024

Will AI Chickens come home to roost?

With normal/boring stock returns - have GOOG, MSFT, etc. run out of AI+layoff cards to play?

June 25th, 2024

What AI Engineers Should Know about Search

All that lexical search context you need to build that RAG app

June 21st, 2024

Planning of E-Commerce Relevance Work - MICES 2024

MICES (Mix Camp E-Commerce) talk about planning e-commerce search relevance work with fast prototypes

June 17th, 2024

Reddit Learning to Rank - project retro

Berlin Buzzwords + Haystack talks about Reddit's Learning to Rank journey

May 22nd, 2024

Flavors of NDCG - normalized to what!?

Every team chooses different types of NDCG; choosing your ideal is perhaps the most consequential decision

May 16th, 2024

Just code dumb shit to impress your friends

Groundbreaking and courageous software ideas start by first impressing 3 good friends

May 8th, 2024

Don't have F-You money? Build an F-You Network.

Be of service to others and true to your craft to build a great network

May 5th, 2024

100x faster sorted array intersections

Implementing an exponential search in Cython to speed up position intersections in SearchArray phrase search

April 28th, 2024

SearchArray Results Gathering Performance

The actual bottlenecks are the search results we gather along the way

March 24th, 2024

Adding position-aware search to SearchArray

As Yoda would Say - A joke it is not to add slop to a search system.

March 24th, 2024

The other hard retrieval problems

We need more than dense embeddings in our 'vector' search

January 24th, 2024

Are we at peak vector database?

Seriously - why do we need all these vector databases? Do we need dozens of them?

January 21st, 2024

A Roaringish phrase search algorithm

How phrase search works in SearchArray by intersecting roaring-like numpy arrays.

November 20th, 2023

SearchArray: Making search relevance not so special

Make traditional text search a core part of the Python data stack

October 15th, 2023

View VisualVM Java profiler output as a flamegraph

Convert VisualVM's profiler output to a format suitable for a flamegraph

October 13th, 2023

Fighting undead documentation

Software documentation that doesn't suck needs to exist with the living

October 10th, 2023

NDCG is Overrated (talk at Berlin Search Technology Meetup)

Slides from Berlin Search Tech. Meetup describing an alternative way beyond Judgments and NDCG to think about search offline evaluation.

September 12th, 2023

Take calls. Help people :).

Helping random people, for free, can be one of the best things you can do for your career.

September 11th, 2023

It belongs in a foundation

Why do we rely on such a fractured, vendor-dominated database layer in our supply chain? Why aren't we more worried?

September 5th, 2023

Vector Search The Hard Way (talk at Chicago Search Meetup)

Slides from Chicago Search Meetup, discussing real-world tradeoffs in vector search beyond just the benchmarks

August 22nd, 2023

LSH in Numpy C for some fun but no profit

Implementing random projections based LSH as a C Numpy function

August 21st, 2023

A pure python LSH nearest neighbors implementation

Badly implementing locality-sensitive hashing as a vector search solution... for science, edification, 💩, and giggles.

July 27th, 2023

The wrong feedback loop

Software engineering is about designing the right feedback loop(s) with limited resources.

July 8th, 2023

One big reason search teams fail

Search orgs fail because teams get stuck in functional silos rather than empowering their peers

June 24th, 2023

Stay a Beginner

Visibly failing, learning, and discovering first principles is how you have real influence on a field

June 15th, 2023

Hyperfocus vs perspective as a staff engineer

Navigating between hyperfocus/executing vs perspective/information-gathering mental states is f*cking hard.

May 29th, 2023

Search relevance for understaffed teams

Where to get started, next steps to take, how to evolve search from 0 to 1.

May 28th, 2023

When feedback is not a gift

Feedback is the lifeblood of getting better, but be careful who you accept feedback from.

May 13th, 2023

Switch repos with "git cd"

Idiot proof "git cd" command to cd to repos in your project dir with fuzzy matching and tab completion.

May 6th, 2023

NDCG is overrated

You can get started improving search relevance without labels and judgments. Which is an imperfect model anyway.

May 1st, 2023

Recover relevance ranking from weak labels

In this post, I have very weak, uncertain labels of relevance / not. However, in aggregate, they may be able to help us make a strong determination on the importance of ranking signals.

March 12th, 2023

Reconstructing a cosine similarity

If you know u and v's dot products to A1...An can you reconstruct u.v?

March 10th, 2023

Estimate dot product through two shared references

Taking things up exactly one notch, from one shared reference, to two to estimate a dot product.

March 2nd, 2023

Estimate dot product through a shared reference

Given a reference vector `A`, where we know `u.A` and `v.A` what can we say about `u.v`?

February 28th, 2023

How likely is a given dot product?

Finding the probability of a dot product between two vectors lets us quantify how much information is in cosine similarity.

February 13th, 2023

Vector Search for the Uninitiated

What is vector search and why all of a sudden are we talking about it?

December 26th, 2022

Orthogonality expected at higher dimensions

Ninety degrees isn't particularly special in 2D, but 3D and beyond, it's the expected angle between two unit vectors.

December 24th, 2022

What ChatGPT Says About The Web

ChatGPT unlocks information from the Web, and away from sites that abuse their users' attention with spam and writing that targets Google, not humans.

December 4th, 2022

Make a search engine in ChatGPT

Index some documents, provide some queries, ChatGPT will tell you the most relevant documents for those queries

December 3rd, 2022

Meet Fred, a person living inside ChatGPT

Fred works in marketing in New York City and enjoys running. He lost his job at the beginning of the pandemic.

November 14th, 2022

The Importance of Naive Solutions

With algorithm development, naive solutions provide a crucial reference implementation for your testing.

November 9th, 2022

Idiot proof git

Aliases, etc. that have made rebase-based workflows in Git feel much less advanced.

September 19th, 2022

We always work with a broken definition of the problem

Experiments are to search relevance correctness as unit tests are to code correctness. By definition they're a broken but necessary definition of the problem we need to get started.

September 11th, 2022

Using Elasticsearch from Google Colab with Bonsai

No need for local setup to play with Elasticsearch from a Jupyter notebook - just use Bonsai + Colab!

July 16th, 2022

What is Presentation Bias in search?

Let's explore this key bias in search systems towards the old algorithm and how to overcome it!

June 20th, 2022

Reconstructing relevance judgments - two scenarios

Analyzing the plausibility of guessing relevance judgments from runs in the VMWare Zero Shot Kaggle Competition

June 8th, 2022

Deriving Search Relevance Judgments from an A/B Test

Can we simulate the likely search relevance labels just from knowing which results shifted and the outcome of an A/B test?

April 23rd, 2022

Start with Who, not Why

Work with amazing people you love collaborating with, the rest (mission, purpose, etc) falls out from that.

January 17th, 2022

LambdaMART in Depth

Reimplementing LambdaMART in Python for endless tinkering and learning

November 28th, 2021

How LambdaMART works - optimizing product ranking goals

LambdaMART directly optimizes whatever search relevance ranking metric matters to your business. This article details how this neat machine learning trick works to target what matters most to your product

November 12th, 2021

Ruby vs Python comes down to the for loop

Contrasting how each language handles iteration helps understand how to work effectively in either.

May 5th, 2021

Finding the relevance cutoff: when to stop showing search results

In this article: we assume users review every search result. So we need to find that sweet spot when we get to look reaaalllly fricken smart and declare, with confidence, "we have nothing else that matches your query".

April 21st, 2021

Compute Mean Reciprocal Rank (MRR) using Pandas

Using Pandas to compute Mean Reciprocal Rank using the MSMarco Dataset

February 21st, 2021

What Is a Judgment List?

Judgment lists prevent search whack-a-mole. They provide a safety net for search, allowing you to innovate quickly on relevance with a high degree of confidence.

December 22nd, 2020

Hack your Career With Consulting

High end technical consulting is a fantastic thing for you to do mid career. It helps you build a personal brand, deepen soft skills, and focus on challenging technical problems. Why and why not you might want to take this step.

August 6th, 2020

Political Twitter is The Opposite of Activism

Twitter gives you an illusion of influence over political events. In reality, it meaninglessly fiddles our energy away. Doing our duty requires real work in the real world.

May 20th, 2020

Avoiding Grubhub: Ethical Online Delivery Options in Charlottesville

What I know so far on getting cheaper delivery in Charlottesville that avoids Grubhub's shenanigans and supports local restaurants

April 5th, 2020

Kill Your Twitter Addiction With This One Weird Trick

Add friction to your twitter login to keep yourself sane.

August 26th, 2019

Write for yourself, not the audience

Write to grow closer to the truth. Not because you have all the answers, not to get page views or win internet points. Instead write to broach a point of view and test it against your audience's norms and points of view.