Blog·03/05/2026

RAG: verifiable answers over your own documents

How retrieval-augmented generation works: indexing, semantic retrieval, chunking and evaluation. Why every answer should be able to cite where it comes from …

RAG (retrieval-augmented generation) is an architecture that combines semantic search with text generation: instead of asking the LLM to 'remember' your company's data, you retrieve the relevant data and supply it in the context of each query. The result is a system where every answer can cite where its information comes from.

The problem RAG solves. A pretrained LLM knows nothing about your internal documents, and the fine-tuning alternative is expensive, slow and does not preserve source citations. RAG solves this without retraining: you index your documents, retrieve the relevant fragments for each question and include them in the prompt. The model generates from those fragments, not from its memory.
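The index, retrieve and prompt loop described above can be sketched in a few lines. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names, the sample fragments and the prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(fragments: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: vectorise every fragment once, ahead of any query."""
    return [(fragment, embed(fragment)) for fragment in fragments]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval: return the k fragments most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, fragments: list[str]) -> str:
    """Generation input: the model sees the fragments, not its memory."""
    context = "\n".join(f"- {fragment}" for fragment in fragments)
    return (
        "Answer using only the fragments below.\n"
        f"Fragments:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical internal documents, already split into fragments.
fragments = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available Monday to Friday, 9:00 to 17:00.",
    "Invoices are issued on the first day of each month.",
]
index = build_index(fragments)
question = "When are refunds accepted?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)  # this string is what you would send to the LLM
```

In a real system the term counts would be replaced by dense embeddings and the sorted list by a vector index, but the shape of the flow stays the same: nothing is retrained, and the model only sees what retrieval hands it.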

The three components you need to get right. They are the index (how you chunk and vectorise documents), retrieval (how well you find the right fragments for each question) and generation (how the model uses those fragments). The most common mistake is focusing on the model and neglecting retrieval: if the system retrieves the wrong fragment, even the most powerful LLM in the world will generate an incorrect response, no matter how plausible it sounds.

Chunking: not all pieces are equal. Splitting a PDF into 512-token fragments with a sliding window is a reasonable starting point, but you also need to preserve semantic context: do not cut text mid-sentence, respect the document structure (sections, articles, tables), and add metadata (document title, page number) that can later appear in the citation. For long documents with complex structure, hierarchical chunking (summary + detail) improves retrieval.
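As a rough sketch of those rules, the function below packs whole sentences into chunks up to a token budget, carries the last sentence over as a sliding-window overlap, and attaches the title and page to every chunk. The names (`Chunk`, `chunk_page`) are hypothetical, and two simplifications are assumed: tokens are approximated by whitespace-separated words, and sentences are detected with a simple regular expression rather than a real tokenizer or sentence splitter.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    title: str  # document title, reused later in the citation
    page: int   # page number, reused later in the citation


def chunk_page(
    text: str,
    title: str,
    page: int,
    max_tokens: int = 512,
    overlap_sentences: int = 1,
) -> list[Chunk]:
    """Split one page into sentence-aligned chunks with overlap."""
    # Split after ., ! or ? followed by whitespace, so no sentence is cut.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[Chunk] = []
    current: list[str] = []
    for sentence in sentences:
        size = sum(len(s.split()) for s in current) + len(sentence.split())
        if current and size > max_tokens:
            chunks.append(Chunk(" ".join(current), title, page))
            # Sliding window: keep the last sentences as shared context.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(Chunk(" ".join(current), title, page))
    return chunks


sample = "One two three. Four five six. Seven eight nine."
chunks = chunk_page(sample, title="Example manual", page=4, max_tokens=6)
# chunks[0].text == "One two three. Four five six."
# chunks[1].text == "Four five six. Seven eight nine."
```

A sentence longer than the budget is kept whole rather than cut, and the carried-over sentence can push a chunk slightly past the budget, so `max_tokens` is a target here, not a hard limit. Respecting sections and tables would need structure-aware parsing that this sketch does not attempt.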

How to evaluate retrieval. Before evaluating final answers, you need to evaluate whether the system retrieves the right fragments. With a set of questions and their expected fragments, you can measure precision@k (what fraction of the k retrieved fragments are relevant?) and recall@k (what fraction of the relevant fragments appear among the k retrieved?). A better LLM cannot compensate for poor retrieval.
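Both metrics are short enough to write by hand. The sketch below assumes each fragment has a stable identifier and that you have labelled, for every test question, the set of fragments that should be retrieved; the identifiers and function names are illustrative.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved fragments that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for fragment_id in retrieved[:k] if fragment_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant fragments that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for fragment_id in retrieved[:k] if fragment_id in relevant)
    return hits / len(relevant)


# One labelled question: the ranked ids the system returned,
# and the ids a human marked as the expected fragments.
retrieved = ["c3", "c7", "c1", "c9"]
relevant = {"c1", "c3", "c5"}

p = precision_at_k(retrieved, relevant, k=4)  # 2 hits out of 4 retrieved
r = recall_at_k(retrieved, relevant, k=4)     # 2 hits out of 3 relevant
```

Here precision@4 is 0.5 and recall@4 is 2/3: half of what was retrieved is useful, and one expected fragment (`c5`) was never found. Averaging both numbers over the whole question set gives a baseline to compare chunking or embedding changes against.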

Citation as a guarantee. The distinctive value of RAG is that every answer can cite the fragment it came from. That design changes the user's relationship with the system: instead of having to trust the answer, they can verify it. For applications where accuracy matters (technical, legal, medical, financial documentation), citation is not a cosmetic detail; it is the core feature.
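One simple way to make citations checkable is to number the fragments in the prompt and ask the model to mark each claim with `[n]`. The sketch below assumes that convention; the `[n]` marker format, the dictionary keys and the function names are choices made for this example, and the answer string is hand-written, not real model output.

```python
import re


def format_sources(fragments: list[dict]) -> str:
    """Number each fragment so the model can cite it as [n]."""
    return "\n".join(
        f"[{i}] {f['title']}, p. {f['page']}: {f['text']}"
        for i, f in enumerate(fragments, start=1)
    )


def cited_sources(answer: str, fragments: list[dict]) -> list[dict]:
    """Return the fragments the answer cites; reject unknown markers."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    invalid = [n for n in numbers if not 1 <= n <= len(fragments)]
    if invalid:
        raise ValueError(f"answer cites fragments that were not supplied: {invalid}")
    return [fragments[n - 1] for n in numbers]


# Hypothetical retrieved fragments with the metadata added at chunking time.
fragments = [
    {"title": "Returns policy", "page": 2, "text": "Refunds are accepted within 30 days."},
    {"title": "Billing guide", "page": 7, "text": "Invoices are issued monthly."},
]
context = format_sources(fragments)

# Hand-written stand-in for a model response.
answer = "Refunds are accepted within 30 days of purchase [1]."
sources = cited_sources(answer, fragments)
```

This only checks that each citation points at a fragment that was actually supplied. Whether the fragment supports the claim still has to be judged by the reader or by a separate evaluation step, which is exactly the verification the citation makes possible.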
