LLMs · RAG · Agents · Madrid

LLM applications in Madrid

I build language model applications that work: RAG over your own documents, agents that query structured data, and assistants with memory and tools. Each one comes with the evaluation needed to know whether it gives correct answers, not just answers that sound right.

RAG: answers that cite their source

Corpus indexing, semantic retrieval with embeddings, and answer generation that cites the fragment each answer came from. The goal is for every answer to be verifiable, not merely to sound good. Long documents are split into chunks before indexing, and large corpora get a re-ranking step after retrieval.
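As a rough illustration of the chunking, indexing and retrieval steps, here is a minimal Python sketch. It is not a production pipeline: the function names (`chunk_text`, `embed`, `build_index`, `retrieve`) are hypothetical, the bag-of-words `embed` is only a stand-in for a real embedding model, and re-ranking is left out. What it does show is the part that matters for verifiability: every retrieved fragment carries a source label that the answer can cite.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Index every chunk together with a citable source label."""
    index = []
    for doc_id, text in documents.items():
        for i, chunk in enumerate(chunk_text(text)):
            index.append({
                "source": f"{doc_id}#chunk{i}",
                "text": chunk,
                "vector": embed(chunk),
            })
    return index


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    ranked = sorted(index, key=lambda e: cosine(query, e["vector"]), reverse=True)
    return ranked[:k]
```

In a real system the retrieved `text` would go into the prompt and the `source` label would be returned alongside the generated answer, so a reader can check the fragment it was based on.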

Agents with guardrails

Agents that translate natural language into SQL queries and run them read-only, or that call external APIs through well-defined tools. The aim is to be useful without being dangerous: bounded scope, no write access by default, and a log of everything the agent does.
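One way to sketch the read-only guardrail, assuming the data lives in a SQLite file: the hypothetical `run_readonly` helper below rejects anything that is not a single SELECT, logs the query, and opens the database with SQLite's `mode=ro` URI flag so that a write would fail even if the text check were bypassed. The helper name, the row cap and the logger name are illustrative assumptions, not a fixed design.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.sql")  # hypothetical logger name


def run_readonly(db_path, sql, max_rows=100):
    """Run one model-generated SELECT against a SQLite file opened read-only."""
    statement = sql.strip().rstrip(";").strip()
    # Crude single-statement check: it also rejects a ';' inside a string
    # literal, which errs on the safe side for a sketch.
    if ";" in statement:
        raise ValueError("only a single statement is allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")

    log.info("agent query: %s", statement)  # audit trail of what the agent ran

    # mode=ro makes SQLite itself refuse writes, independent of the check above.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(statement).fetchmany(max_rows)
    finally:
        conn.close()
```

The two layers are deliberate: the text check gives the agent a clear error message, while the read-only connection is the actual guarantee. On a server database the equivalent would be a database user with SELECT-only permissions.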

Evaluation, not just a demo

The problem with LLMs is that they generate text that sounds correct whether or not it is. I evaluate answers with automatic metrics (ROUGE, BERTScore, faithfulness) and black-box tests before the system reaches production. An honest benchmark says more than a demo.
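To make the idea of an automatic metric concrete, here is a minimal sketch of a simplified ROUGE-1 score (unigram overlap between an answer and a reference, without stemming) and a small black-box loop that flags answers scoring below a threshold. The `evaluate` helper, the 0.5 threshold and the test-set format are assumptions for illustration; a real evaluation would use an established metrics library and add faithfulness checks, since word overlap with a reference does not by itself prove that an answer is correct.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; no stemming, so this is a simplified ROUGE-1."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Unigram overlap between a candidate answer and a reference answer."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(system, test_set, threshold=0.5):
    """Treat `system` as a black box: question in, answer out.

    Returns the cases whose ROUGE-1 F1 against the expected answer
    falls below `threshold`.
    """
    failures = []
    for question, expected in test_set:
        answer = system(question)
        score = rouge1(answer, expected)["f1"]
        if score < threshold:
            failures.append((question, answer, round(score, 2)))
    return failures
```

Here `system` can be any callable, for example a function that wraps the full RAG pipeline, so the same test set can be rerun unchanged after every change to prompts, retrieval or models.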

Fine-tuning when the API is not enough

For very specific tasks where a base model does not reach the required performance, I do supervised fine-tuning with LoRA or QLoRA on Hugging Face models. I evaluate before and after to know whether the cost is worth it.

Want to build something with LLMs that actually works?

Write to me with the use case: which documents, which questions, which users. The first call is free and is for working out what makes sense to build.

Start a project

FAQ

  • When does RAG make sense instead of just an LLM?

    When answers need to come from your specific documents, not from the model's general knowledge. RAG is the difference between 'the model knows about this in general' and 'the model searches your documentation and cites where the answer comes from'. It makes sense whenever answers must be verifiable and must stay up to date without retraining the model.

  • How do you evaluate whether a RAG system works well?

    With a set of questions that have expected answers, plus faithfulness metrics (is the answer supported by the retrieved context?) and relevance metrics (is the retrieved context the right one?). Retrieval also needs to be evaluated separately: the system has to find the right fragments before it generates anything.

  • Which LLMs do you use?

    It depends on the case. GPT-4o and GPT-4o-mini from OpenAI for most production cases. Claude from Anthropic for tasks with long documents. Hugging Face models for fine-tuning or when data privacy rules out external APIs. Llama and Mistral for on-premise deployment.

  • Can I use LLMs with my internal data without it leaving my infrastructure?

    Yes. If the data is confidential, I work with models deployed on-premise or in your private cloud: Llama, Mistral or other open-weight models. The RAG system runs on your infrastructure and documents are never sent to external APIs.

  • How much does developing an LLM application cost?

    A basic RAG assistant (indexing + retrieval + API + minimal interface) costs between €2,500 and €5,000. An agent with multiple tools, or a system with evaluation and fine-tuning, costs between €5,000 and €12,000. Model API usage is billed on top and depends on volume.

hola@jmwebsoluciones.com