In my previous tip I introduced word2vec. I discussed it in terms of language: this word, "mary", shared context with this other word, "lamb", so their embeddings moved closer together.
Why constrain ourselves to language?
We could treat "Doug likes Star Wars" as the same kind of co-occurrence. We can make a table mapping users to the movies they like:
| Anchor | Positive movie       | Negative movie |
|--------|----------------------|----------------|
| doug   | star wars            | king kong      |
| doug   | star trek            | cinderella     |
| tom    | star wars            | citizen kane   |
| tom    | battlestar galactica | the aviator    |
Think about what we have:
- Doug's and Tom's embeddings grow closer through star wars. Word2vec-style training here shrinks the distance Doug ←→ Star Wars and Tom ←→ Star Wars, making Doug a more similar user to Tom.
- In the same way, battlestar galactica moves closer to star trek through doug + tom.
And now we have a movie recommender system, built on the same technology behind word2vec.
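To make that concrete, here's a minimal sketch in plain numpy (my assumptions, not a production recipe: a shared embedding table for users and movies, 16 dimensions, a fixed learning rate, and the skip-gram-with-negative-sampling update word2vec uses) trained on the toy triplets from the table above:

```python
import numpy as np

rng = np.random.default_rng(0)

# (anchor user, positive movie, negative movie) triplets from the table.
triplets = [
    ("doug", "star wars", "king kong"),
    ("doug", "star trek", "cinderella"),
    ("tom", "star wars", "citizen kane"),
    ("tom", "battlestar galactica", "the aviator"),
]

# One shared embedding table covering both users and movies.
vocab = sorted({name for row in triplets for name in row})
idx = {name: i for i, name in enumerate(vocab)}
dim = 16
emb = rng.normal(scale=0.1, size=(len(vocab), dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for _ in range(500):
    for anchor, pos, neg in triplets:
        a = emb[idx[anchor]].copy()
        p = emb[idx[pos]].copy()
        n = emb[idx[neg]].copy()
        # Skip-gram negative sampling gradients:
        # pull the anchor toward the liked movie, push it from the negative.
        grad_pos = sigmoid(a @ p) - 1.0   # want a·p large
        grad_neg = sigmoid(a @ n)         # want a·n small
        emb[idx[pos]] -= lr * grad_pos * a
        emb[idx[neg]] -= lr * grad_neg * a
        emb[idx[anchor]] -= lr * (grad_pos * p + grad_neg * n)

def cosine(x, y):
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Doug and Tom end up more similar to each other than to their negatives,
# because they share "star wars" as context.
print(cosine(emb[idx["doug"]], emb[idx["tom"]]))
```

With embeddings trained this way, recommending movies to a user is just a nearest-neighbor lookup in the shared space.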
We could use this for quite a lot of domains:
- Queries and documents
- Images and captions
And so on!
-Doug
PS - 5 days left to sign up for Cheat at Search with Agents!
This is part of Doug’s Daily Search tips - subscribe here