What is RAG (Retrieval-Augmented Generation) and when should you use it?
· 5 min read · By Jon Jovinsson
RAG stands for Retrieval-Augmented Generation. Instead of relying entirely on what an LLM learned during training, RAG retrieves relevant documents or data at query time and includes them in the model's context. This means the model answers questions using your specific, up-to-date information rather than generic training knowledge that may be outdated or irrelevant.
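The core idea can be shown in a few lines: retrieve some text, then place it in the prompt ahead of the question. This is a minimal sketch; `retrieve` is a hypothetical stand-in that returns canned chunks rather than querying a real vector database.

```python
# Minimal sketch of the core RAG idea: put retrieved text into the prompt.
# `retrieve` is a stand-in for a real retriever; here it returns canned chunks.

def retrieve(question: str) -> list[str]:
    # In a real system this would query a vector database.
    return [
        "Policy 4.2: Staff may work remotely up to three days per week.",
        "Policy 4.3: Remote work requires manager approval.",
    ]

def build_prompt(question: str) -> str:
    chunks = retrieve(question)
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt("How many days can I work remotely?")
print(prompt)
```

The prompt that reaches the LLM now carries your documents, which is what lets the model answer from your data instead of its training set.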
How RAG works step by step
1. Ingestion: your documents (PDFs, wikis, emails, etc.) are chunked and converted to vector embeddings
2. Storage: embeddings are stored in a vector database (pgvector, Pinecone, Vertex AI Vector Search)
3. Retrieval: when a question arrives, it is embedded and the most relevant chunks are retrieved
4. Generation: the retrieved chunks plus the question are passed to the LLM, which answers using that context
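The four steps above can be sketched end to end in plain Python. To keep the example self-contained, it uses a bag-of-words "embedding" and cosine similarity instead of a trained embedding model and a real vector database, so treat it as an illustration of the flow, not a production recipe.

```python
# Toy end-to-end version of the four RAG steps. A bag-of-words vector stands
# in for a real embedding model so the example runs with no ML libraries.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # "Embedding" = token counts. Real systems use a trained embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Ingestion: chunk the documents (here, one sentence per chunk)
chunks = [
    "Invoices are paid within 30 days of receipt.",
    "The mining site operates Monday to Saturday.",
    "Annual leave accrues at four weeks per year.",
]

# 2. Storage: keep (chunk, embedding) pairs — our stand-in "vector database"
index = [(c, embed(c)) for c in chunks]

# 3. Retrieval: embed the question and rank stored chunks by similarity
question = "When are invoices paid?"
q_vec = embed(question)
top_chunk, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))

# 4. Generation: top_chunk plus the question would now be sent to the LLM
print(top_chunk)
```

Swapping the toy pieces for real ones (an embedding model for `embed`, pgvector or Pinecone for `index`) gives you the production architecture without changing the shape of the flow.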
When RAG is the right choice
RAG is the right architecture when your use case involves answering questions from a specific corpus: internal policy documents, product manuals, CRM notes, legal contracts, past project reports, or research libraries. It's more accurate than pure LLM generation for factual questions and more auditable because you can show which sources the answer drew from.
Hybrid search: better than pure vector
Pure vector search retrieves semantically similar content but misses exact keyword matches. Hybrid search combines vector similarity with BM25 keyword scoring for better recall across diverse query types. For most production RAG systems we build, hybrid search outperforms pure vector search, especially on queries that include specific names, codes, or technical terms common in Australian finance, mining, and legal contexts.
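One common way to combine the two signals is to normalise each score list and blend them with a weight. The sketch below uses a token-overlap placeholder for vector similarity and a simplified BM25; both are assumptions for illustration, and production systems would use a real embedding model and a full BM25 engine such as Elasticsearch or OpenSearch.

```python
# Hedged sketch of hybrid search: blend a (stand-in) vector similarity score
# with a simplified BM25 keyword score via min-max normalisation.
import math
import re
from collections import Counter

docs = [
    "ASX code BHP fell two percent on iron ore prices.",
    "Commodity prices moved broadly lower this quarter.",
    "The contract cites clause 12b of the services agreement.",
]

def tokens(text):
    return re.findall(r"\w+", text.lower())

def vector_score(query, doc):
    # Placeholder for embedding cosine similarity: token-overlap ratio.
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q | d) if q | d else 0.0

def bm25_score(query, doc, k1=1.5, b=0.75):
    # Simplified BM25 over this tiny in-memory corpus.
    doc_toks = tokens(doc)
    avgdl = sum(len(tokens(d)) for d in docs) / len(docs)
    tf = Counter(doc_toks)
    score = 0.0
    for term in tokens(query):
        df = sum(1 for d in docs if term in tokens(d))
        if df == 0:
            continue
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_toks) / avgdl))
    return score

def hybrid_best(query, alpha=0.5):
    # Min-max normalise each score list, then blend with weight alpha.
    vs = [vector_score(query, d) for d in docs]
    ks = [bm25_score(query, d) for d in docs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    blended = [alpha * v + (1 - alpha) * k for v, k in zip(norm(vs), norm(ks))]
    return max(zip(docs, blended), key=lambda p: p[1])[0]

print(hybrid_best("BHP share price"))
```

Here the exact keyword match on the ticker "BHP" is what pulls the right document to the top, which is precisely the case pure vector search tends to miss.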
Common RAG mistakes
- Chunk size too large: the model gets too much irrelevant context and gets confused
- No re-ranking: returning the top-k chunks without re-ranking by relevance reduces answer quality
- Missing metadata filtering: not filtering by date, department, or document type pollutes retrieval
- No evaluation: deploying a RAG system without testing answer quality on real queries
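The last mistake is the cheapest to fix. Even a tiny evaluation harness over hand-labelled queries catches retrieval regressions before users do. This is a minimal sketch; the `retrieve` function and the document IDs are hypothetical stand-ins for your real retriever and corpus.

```python
# Minimal retrieval evaluation: recall@k over hand-labelled (query, expected
# chunk ID) pairs. `retrieve` is a hypothetical stand-in for a real retriever.

def recall_at_k(retrieve, labelled_queries, k=3):
    # Fraction of queries whose expected chunk appears in the top k results.
    hits = 0
    for query, expected_id in labelled_queries:
        if expected_id in retrieve(query)[:k]:
            hits += 1
    return hits / len(labelled_queries)

# Toy retriever over a fake index, returning ranked chunk IDs.
def retrieve(query):
    index = {
        "leave policy": ["hr-07", "hr-02", "fin-01"],
        "invoice terms": ["fin-03", "fin-01", "hr-07"],
    }
    return index.get(query, [])

labelled = [("leave policy", "hr-07"), ("invoice terms", "fin-09")]
score = recall_at_k(retrieve, labelled, k=3)
print(score)
```

Run this on every index rebuild or prompt change; a drop in recall@k tells you retrieval broke before anyone notices wrong answers.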