In previous tips I talked about tail latency:

  • Your search cluster becomes as slow as the slowest node
  • A wide per-node latency distribution results in an even wider per-cluster distribution

The bigger the scale, the more sharded your data becomes, and the more small per-node latency problems get amplified into cluster-wide latency problems.
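
A quick sketch of why this happens (all names and numbers here are made up for illustration): a fan-out query isn't done until the slowest of its N shard responses arrives, so the cluster's latency distribution is the max of N draws from the per-node distribution.

```python
# Hypothetical simulation: cluster latency as the max over shard latencies.
import random
import statistics

random.seed(42)

def node_latency_ms():
    # Assumed per-node latency: ~10ms median with a heavy right tail.
    return random.lognormvariate(mu=2.3, sigma=0.5)

def cluster_latency_ms(num_shards):
    # The fan-out query completes only when the slowest shard responds.
    return max(node_latency_ms() for _ in range(num_shards))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

node = [node_latency_ms() for _ in range(10_000)]
cluster = [cluster_latency_ms(50) for _ in range(10_000)]

print(f"node    p50={statistics.median(node):.1f}ms  p99={p99(node):.1f}ms")
print(f"cluster p50={statistics.median(cluster):.1f}ms  p99={p99(cluster):.1f}ms")
```

Even the cluster's *median* here sits out near the single node's tail: with 50 shards, a "rare" slow node response is no longer rare per query.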

So when I think about graph-based vector retrieval at scale, like HNSW, I get nervous.

With HNSW you’re:

  • Navigating a non-deterministic graph, whose structure depends on the order it was constructed in
  • Built from whichever subset of points happened to land on this node
  • Jumping around to different areas of memory, destroying any memory locality

All the non-determinism here worries me. Some nodes will quickly converge on nearest neighbors. Other nodes will take more work.

The cluster’s latency becomes the latency of those nodes that take more work.
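
Here's a toy illustration of that non-determinism (this is a deliberately simplified sketch, not HNSW itself): run a greedy nearest-neighbor walk over a randomly built neighbor graph, and watch how the hop count for the *same query* changes when the graph is built differently.

```python
# Toy sketch: greedy search over randomly constructed neighbor graphs.
# Different graph builds (here: different random seeds, standing in for
# different insertion orders / point sets) do different amounts of work
# for the same query.
import random

def build_graph(points, num_neighbors, seed):
    # Assumed construction: each point links to a random sample of others.
    # (Real HNSW links depend on insertion order and layer structure.)
    rng = random.Random(seed)
    return {i: rng.sample([j for j in range(len(points)) if j != i], num_neighbors)
            for i in range(len(points))}

def greedy_hops(points, graph, query, start):
    # Walk to whichever neighbor is closest to the query; stop at a local
    # minimum. The hop count is the work done for this query.
    def dist(i):
        return (points[i][0] - query[0]) ** 2 + (points[i][1] - query[1]) ** 2
    current, hops = start, 0
    while True:
        best = min(graph[current], key=dist)
        if dist(best) >= dist(current):
            return hops
        current, hops = best, hops + 1

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(2000)]
query = (0.5, 0.5)
hop_counts = [greedy_hops(points, build_graph(points, 8, seed), query, start=0)
              for seed in range(20)]
print("hops per graph build:", sorted(hop_counts))
```

In a sharded cluster, each node is effectively one of these differently-built graphs, and the query waits for the one that takes the most hops.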

Since benchmarks like ANN-Benchmarks run in nicely warmed memory, on a single system, they'll miss these issues.

So search carefully and measure your cluster behavior, not just single node behavior!

-Doug

This is part of Doug’s Daily Search tips - subscribe here


Enjoy softwaredoug in training course form!

Starting May 18!

Signup here - http://maven.com/softwaredoug/cheat-at-search

I hope you join me at Cheat at Search with Agents to learn to use agents in search, build better RAG, and use LLMs in query understanding.

Doug Turnbull

More from Doug
Twitter | LinkedIn | Newsletter | Bsky