Mariano Rodrigo

AI Solutions Engineer building production systems with artificial intelligence, automation, and full-stack architecture. This is my public engineering lab: architecture decisions, implementation reports, experiments, and lessons from real systems.

Designing a RAG pipeline that survives production

A retrieval-augmented generation (RAG) demo takes a weekend. A RAG system you can trust is a different project, and most of its problems are not in the generation step.

Where it actually breaks

  • Retrieval, not generation. If the right chunk never makes it into context, no prompt can save the answer. Chunking, embeddings, and ranking are the real work.
  • Grounding. The model must answer from the retrieved context, not from its priors. Without enforcement, it confidently fills gaps.
  • Verification. For anything factual, a second pass that checks claims against sources is not optional.
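
To make the verification point concrete, here is a minimal sketch of a second pass that checks each claim in an answer against the retrieved sources. It is an illustration under loud assumptions: the function names (split_claims, verify_claims), the sentence-per-claim split, and the 0.6 threshold are all hypothetical, and plain token overlap stands in for whatever real check you would use (an entailment model, an LLM judge, exact-quote matching).

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, dropping words of one or two characters."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 2}


def split_claims(answer: str) -> list[str]:
    """Naive split: every sentence is treated as one claim (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def verify_claims(
    answer: str, sources: list[str], min_support: float = 0.6
) -> list[tuple[str, bool]]:
    """Return (claim, supported) pairs.

    A claim counts as supported when at least `min_support` of its tokens
    appear in a single source. Token overlap is a crude stand-in for a real
    entailment check and will miss paraphrases and negations.
    """
    source_tokens = [_tokens(s) for s in sources]
    results: list[tuple[str, bool]] = []
    for claim in split_claims(answer):
        claim_tokens = _tokens(claim)
        if not claim_tokens:
            results.append((claim, False))
            continue
        best = max(
            (len(claim_tokens & st) / len(claim_tokens) for st in source_tokens),
            default=0.0,
        )
        results.append((claim, best >= min_support))
    return results
```

Any claim that comes back unsupported is a candidate for removal, regeneration, or escalation. The useful part is not the overlap heuristic but the contract: the verifier sees only the answer and the sources, never the prompt.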

A shape that holds up

query → retrieve (hybrid) → rerank → ground → generate → verify → cite

                                            fail → escalate
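
The flow above can be sketched as a function that takes each stage as a plain callable. This is one possible wiring, not a reference implementation: run_pipeline, Result, Chunk, and the stage signatures are hypothetical names chosen for illustration. It also assumes that escalation fires in two cases, when no context survives reranking and when verification rejects the answer. The diagram itself does not pin down which stage triggers the failure branch.

```python
from dataclasses import dataclass, field
from typing import Callable

Chunk = tuple[str, str]  # (chunk_id, text); a hypothetical minimal shape


@dataclass
class Result:
    status: str  # "answered" or "escalated"
    answer: str = ""
    citations: list[str] = field(default_factory=list)
    trace: dict[str, int] = field(default_factory=dict)  # per-stage counts


def run_pipeline(
    query: str,
    retrieve: Callable[[str], list[Chunk]],
    rerank: Callable[[str, list[Chunk]], list[Chunk]],
    ground: Callable[[str, list[Chunk]], str],
    generate: Callable[[str], str],
    verify: Callable[[str, list[Chunk]], bool],
    top_k: int = 3,
) -> Result:
    trace: dict[str, int] = {}

    candidates = retrieve(query)
    trace["retrieved"] = len(candidates)

    context = rerank(query, candidates)[:top_k]
    trace["reranked"] = len(context)

    # Assumption: with no context, escalate instead of letting the model
    # answer from its priors.
    if not context:
        return Result("escalated", trace=trace)

    prompt = ground(query, context)  # builds a context-restricted prompt
    answer = generate(prompt)

    # Assumption: a failed verification also escalates.
    if not verify(answer, context):
        return Result("escalated", trace=trace)

    # Cite the ids of the chunks the answer was generated from.
    return Result("answered", answer, [cid for cid, _ in context], trace)
```

Because every stage is injected, each one can be replaced with a stub in tests, and the trace dictionary gives a first, very rough, per-stage signal to log.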

Each stage is observable and independently testable. When quality drops, you can tell which stage moved — retrieval recall, rerank precision, or grounding — instead of guessing at the prompt.
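
As a sketch of what "independently testable" can look like for the first two stages, the functions below compute recall@k for retrieval and precision@k for the reranked list against a labelled set of relevant chunk ids. The function names and the idea of a hand-labelled evaluation set are assumptions for illustration. The metric definitions themselves are the standard ones.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k ranked chunk ids that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for chunk_id in top if chunk_id in relevant) / len(top)
```

For example, with relevant ids {c1, c3, c5} and a retrieved list [c1, c7, c3, c9], recall@4 is 2/3, because c5 never made it into the candidates. If the reranker then reorders the list to [c1, c3, c7, c9], precision@2 rises from 0.5 to 1.0. Tracking the two numbers separately shows whether a regression came from retrieval or from reranking.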

The lesson

Treat RAG as an information-retrieval system with a language model attached, not a language model with documents attached. The ordering of those words is the whole difference in reliability.