With embedding similarity you train with an anchor, a positive, and a negative. You want to move the positive’s embedding closer to the anchor’s, while pushing the negative’s farther away.
Enter good ole word2vec
- Every word in the vocabulary starts with its own random embedding
- When a word co-occurs with another word, it’s a positive (training moves them together)
- A random word, sampled out of context, is a negative (training pushes them apart)
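Those two moves can be sketched in a few lines of toy Python. This is a simplification, not the real word2vec implementation (which trains separate input/output vector tables and samples several negatives per positive), but the push/pull is the same idea:

```python
import math
import random

random.seed(0)
vocab = ["mary", "had", "a", "little", "lamb", "toenail", "banana"]
DIM = 8
# every word starts with its own small random embedding
emb = {w: [random.uniform(-0.1, 0.1) for _ in range(DIM)] for w in vocab}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(anchor, positive, negative, lr=0.1):
    """One negative-sampling-style update: pull the positive toward
    the anchor, push the negative away."""
    a, p, n = emb[anchor], emb[positive], emb[negative]
    g_pos = 1.0 - sigmoid(dot(a, p))   # want sigmoid(a . p) -> 1
    g_neg = -sigmoid(dot(a, n))        # want sigmoid(a . n) -> 0
    emb[positive] = [pi + lr * g_pos * ai for pi, ai in zip(p, a)]
    emb[negative] = [ni + lr * g_neg * ai for ni, ai in zip(n, a)]
    emb[anchor] = [ai + lr * (g_pos * pi + g_neg * ni)
                   for ai, pi, ni in zip(a, p, n)]

for _ in range(100):
    train_step("mary", "lamb", "toenail")
print(cosine(emb["mary"], emb["lamb"]))    # climbs toward 1
print(cosine(emb["mary"], emb["toenail"])) # drops below 0
```

Repeat that step over billions of co-occurrences and the geometry of the vocabulary starts to encode which words keep company.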
From just the context, “mary had a little lamb”, we might have:
ANCHOR  POSITIVE  NEGATIVE
mary    little    toenail
mary    lamb      banana
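Generating those triples is just a sliding window plus a random draw. Here’s one way to sketch it (the window size of 2 and the extra vocabulary words are made up for illustration):

```python
import random

def make_triplets(tokens, vocab, window=2, seed=0):
    """Pair each word (anchor) with every neighbor within `window`
    words (a positive), plus a random out-of-context word (a negative)."""
    rng = random.Random(seed)
    triples = []
    for i, anchor in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = set(tokens[lo:hi])
        out_of_context = [w for w in vocab if w not in context]
        for j in range(lo, hi):
            if j != i:
                triples.append((anchor, tokens[j], rng.choice(out_of_context)))
    return triples

sentence = "mary had a little lamb".split()
vocab = sentence + ["toenail", "banana", "church", "poppins"]
for triple in make_triplets(sentence, vocab)[:3]:
    print(triple)
```

(Real word2vec also downsamples frequent words and biases negative sampling by word frequency, but this captures the gist.)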
Over many passages, you might imagine each of these pairs growing more similar:
- mary + lamb
- mary + church
- bloody + mary
- mary + poppins
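You can see that kind of similarity by ranking words by cosine similarity to “mary”. The embeddings below are invented toy numbers for illustration, not real word2vec output:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# toy 3-d embeddings, made up for illustration
emb = {
    "mary":    [0.9, 0.1, 0.2],
    "lamb":    [0.8, 0.2, 0.1],
    "poppins": [0.7, 0.3, 0.2],
    "toenail": [-0.1, 0.9, 0.4],
}

neighbors = sorted((w for w in emb if w != "mary"),
                   key=lambda w: cosine(emb["mary"], emb[w]),
                   reverse=True)
print(neighbors)  # words that shared context with "mary" rank first
```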
Importantly, these embeddings only know that two words shared context: they appeared within a few words of each other. They do not act as language models.
- Language models use the entire document as context; here, context is binary (two words either co-occur within a few tokens, or they don’t count)
- Language models use a transformer architecture that weighs long-range relationships between this token and other, distant tokens
Is the article’s topic Disney? A language model knows the next token after mary is more likely to be poppins. But word2vec just as easily picks the nursery rhyme, church, and other “mary” themes.
-Doug
PS - 7 days left to sign up for Cheat at Search with Agents!
This is part of Doug’s Daily Search tips - subscribe here
Enjoy softwaredoug in training course form!
Starting June 22!
I hope you join me at Cheat at Search with LLMs to learn how to apply LLMs to search applications. Check out this post for a sneak preview.