Question 1

When does RAG make sense instead of just an LLM?

Accepted Answer

RAG makes sense when answers need to come from your specific documents rather than from the model's general knowledge. It is the difference between 'the model knows about this in general' and 'the model searches your documentation and cites where the answer comes from'. Use it whenever answers must be verifiable and updatable without retraining.
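The retrieve-then-cite flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the `DOCS` contents, the file names, and the `retrieve` and `build_prompt` helpers are all hypothetical, and word overlap stands in for the embedding search a real system would use.

```python
import re
from collections import Counter

# Hypothetical in-memory "documentation"; a real system would index your own files.
DOCS = {
    "vacation-policy.md": "Employees accrue 23 vacation days per year.",
    "expenses.md": "Expense reports must be submitted within 30 days.",
    "onboarding.md": "New hires receive a laptop on their first day.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, docs, k=2):
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    question_counts = Counter(tokenize(question))
    scored = []
    for source, text in docs.items():
        overlap = sum((question_counts & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((overlap, source, text))
    # Highest overlap first; ties broken by source name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source, text) for _, source, text in scored[:k]]


def build_prompt(question, passages):
    """Build a prompt that restricts the model to the retrieved context and asks for citations."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return (
        "Answer using only the context below and cite the source in brackets.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "How many vacation days do employees get?"
passages = retrieve(question, DOCS)
prompt = build_prompt(question, passages)
```

The resulting `prompt` would then be sent to whichever model is in use. Updating an answer means editing the document and re-indexing it, with no retraining involved.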

Question 2

How do you evaluate whether a RAG system works well?

Accepted Answer

With a set of test questions that have expected answers, scored on faithfulness (the answer is supported by the retrieved context) and relevance (the retrieved context is the right one for the question). Retrieval also needs to be evaluated separately, to confirm that the system finds the right fragments before it generates anything.
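As a rough sketch of what those two checks can look like, the example below scores retrieval with a hit rate at k and approximates faithfulness with token overlap. The evaluation records, the document ids, and both function names are made up for illustration, and the overlap score is only a crude proxy; real evaluations typically use a stronger faithfulness judge.

```python
import re


def hit_rate_at_k(eval_set, k):
    """Fraction of questions whose expected fragment is among the top-k retrieved ids."""
    hits = sum(
        1 for item in eval_set if item["expected_id"] in item["retrieved_ids"][:k]
    )
    return hits / len(eval_set)


def token_support(answer, context):
    """Crude faithfulness proxy: share of answer tokens that also occur in the context."""
    answer_tokens = re.findall(r"[a-z0-9]+", answer.lower())
    context_tokens = set(re.findall(r"[a-z0-9]+", context.lower()))
    if not answer_tokens:
        return 0.0
    supported = sum(1 for token in answer_tokens if token in context_tokens)
    return supported / len(answer_tokens)


# Hypothetical evaluation records: the expected fragment id per question,
# plus the ids the retriever actually returned, in rank order.
eval_set = [
    {"expected_id": "doc-a", "retrieved_ids": ["doc-a", "doc-c"]},
    {"expected_id": "doc-b", "retrieved_ids": ["doc-c", "doc-b"]},
    {"expected_id": "doc-d", "retrieved_ids": ["doc-a", "doc-c"]},
]

retrieval_at_1 = hit_rate_at_k(eval_set, k=1)
retrieval_at_2 = hit_rate_at_k(eval_set, k=2)

context = "Employees accrue 23 vacation days per year."
grounded = token_support("23 days per year", context)
ungrounded = token_support("30 days", context)
```

Keeping the retrieval score separate from the answer score shows where a failure comes from: a low hit rate points at indexing or search, while a low support score with good retrieval points at generation.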

Question 3

What LLM models do you use?

Accepted Answer

It depends on the case. GPT-4o and GPT-4o-mini from OpenAI for most production cases. Claude from Anthropic for tasks with long documents. Hugging Face models for fine-tuning or when data privacy rules out external APIs. Llama and Mistral for on-premise deployment.

Question 4

Can I use LLMs with my internal data without it leaving my infrastructure?

Accepted Answer

Yes. If the data is confidential, I work with models deployed on-premise or in your private cloud: Llama, Mistral or other open-weight models. The RAG system runs on your infrastructure and your documents are never sent to external APIs.

Question 5

How much does developing an LLM application cost?

Accepted Answer

A basic RAG assistant (indexing + retrieval + API + minimal interface) costs between €2,500 and €5,000. An agent with multiple tools, or a system with evaluation and fine-tuning, costs between €5,000 and €12,000. Model API usage is billed on top and depends on volume.

LLM applications in Madrid

RAG: answers that cite their source

Agents with guardrails

Evaluation, not just a demo

Fine-tuning when the API is not enough

Want to build something with LLMs that actually works?
