In my previous tip I introduced word2vec. I discussed it in terms of language: this word, "mary", shared context with this other word, "lamb", so their embeddings moved closer together.
Why constrain ourselves to language?
We could treat "Doug likes Star Wars" as the same kind of co-occurrence. We can make a table mapping users to the movies they like:
| Anchor | Positive movie       | Negative movie |
|--------|----------------------|----------------|
| doug   | star wars            | king kong      |
| doug   | star trek            | cinderella     |
| tom    | star wars            | citizen kane   |
| tom    | battlestar galactica | the aviator    |
Think about what we have:
- Doug's and Tom's embeddings grow closer through star wars. Word2vec-style training here shrinks the distance Doug ←→ Star Wars and Tom ←→ Star Wars, making Doug a more similar user to Tom.
- In the same way, battlestar galactica moves closer to star trek through doug + tom.
And now we have a movie recommender system, built on the same technology behind word2vec.
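To make that concrete, here's a minimal sketch in plain numpy (my assumptions, not a production recipe: a shared embedding table for users and movies, 16 dimensions, a fixed learning rate, and the skip-gram-with-negative-sampling update word2vec uses) trained on the toy triplets from the table above:

```python
import numpy as np

rng = np.random.default_rng(0)

# (anchor user, positive movie, negative movie) triplets from the table.
triplets = [
    ("doug", "star wars", "king kong"),
    ("doug", "star trek", "cinderella"),
    ("tom", "star wars", "citizen kane"),
    ("tom", "battlestar galactica", "the aviator"),
]

# One shared embedding table covering both users and movies.
vocab = sorted({name for row in triplets for name in row})
idx = {name: i for i, name in enumerate(vocab)}
dim = 16
emb = rng.normal(scale=0.1, size=(len(vocab), dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for _ in range(500):
    for anchor, pos, neg in triplets:
        a = emb[idx[anchor]].copy()
        p = emb[idx[pos]].copy()
        n = emb[idx[neg]].copy()
        # Skip-gram negative sampling gradients:
        # pull the anchor toward the liked movie, push it from the negative.
        grad_pos = sigmoid(a @ p) - 1.0   # want a·p large
        grad_neg = sigmoid(a @ n)         # want a·n small
        emb[idx[pos]] -= lr * grad_pos * a
        emb[idx[neg]] -= lr * grad_neg * a
        emb[idx[anchor]] -= lr * (grad_pos * p + grad_neg * n)

def cosine(x, y):
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Doug and Tom end up more similar to each other than to their negatives,
# because they share "star wars" as context.
print(cosine(emb[idx["doug"]], emb[idx["tom"]]))
```

With embeddings trained this way, recommending movies to a user is just a nearest-neighbor lookup in the shared space.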
We could use this for quite a lot of domains:
- Queries and documents
- Images and captions
And so on!
-Doug
PS - 5 days left to sign up for Cheat at Search with Agents!
This is part of Doug’s Daily Search tips - subscribe here