In depth
An embedding model converts a variable-length input (a sentence, a document, a code snippet) into a fixed-size numerical vector. The magic: semantically similar inputs produce vectors that are close under cosine similarity, even when they share no keywords. 'How do I fix a flat tire?' and 'tire repair tutorial' land near each other in embedding space.
Embeddings are the mechanism behind semantic search. To find the most relevant documents for a query, you embed each document once ahead of time, embed the query at search time, and rank by cosine similarity — the closest vectors are the most semantically relevant hits. This is the retrieval step behind features like ChatGPT's connected knowledge bases.
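The retrieval step above can be sketched in a few lines. This is a toy illustration: the 4-dimensional vectors are made up to stand in for real embedding-model output (real models produce hundreds or thousands of dimensions), and the document titles are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding-model output.
docs = {
    "tire repair tutorial":      [0.9, 0.1, 0.0, 0.1],
    "chocolate cake recipe":     [0.0, 0.2, 0.9, 0.1],
    "bicycle maintenance guide": [0.7, 0.3, 0.1, 0.2],
}
query = [0.8, 0.2, 0.1, 0.0]  # embedding of "How do I fix a flat tire?"

# Rank documents by similarity to the query — the closest vector wins.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # → tire repair tutorial
```

Note that the top hit shares no keywords with the query — the similarity lives entirely in the vectors.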
Popular embedding models: OpenAI `text-embedding-3-large` (3072 dimensions), Voyage AI's `voyage-3` (the embedding provider Anthropic recommends), Cohere `embed-v3`, and open-weights options like `bge-large` or `e5-mistral`. Dimension count is a trade-off: larger vectors are more expressive, but cost more to store and are slower to search.
The MCP ecosystem includes embedding-focused servers: Pinecone, Weaviate, pgvector on Postgres, Voyage. Combine one with a knowledge-base MCP server (Notion, Confluence) to build an end-to-end RAG pipeline.