Get this course for free with a Premium Ai+ subscription

Description

Building proof-of-concept LLM/RAG apps is easy—we know that. The next step, which consumes the most time and is the most challenging, is bringing the app to a production-ready level. You must increase accuracy, reduce latency and costs, and create reproducible results.

To meet these requirements, you must optimize your LLM and RAG layers: dig into open-source LLMs, fine-tune them for your specialized tasks, optimize them for inference, and so on.

However, before optimizing anything, you must first determine what to optimize. That means quantifying your system's key metrics: latency, costs, accuracy, recall, hallucinations, and so on.

Because developing AI applications is an iterative process, the first critical step toward production is learning how to evaluate and monitor your LLM/RAG applications. The best strategy is to build something simple end-to-end, attach an evaluation layer on top of it, and then iterate quickly in the right direction by clearly identifying what needs improvement.

This workshop therefore focuses on evaluating LLM/RAG apps. We will take a simple, predefined agentic RAG system built in LangGraph and learn how to evaluate and monitor it.
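
For reference, here is a minimal sketch of what such an agentic RAG graph can look like in LangGraph. The state fields, node bodies, and placeholder documents are illustrative assumptions, not the workshop's actual code:

    from typing import TypedDict

    from langgraph.graph import END, StateGraph

    class RAGState(TypedDict):
        question: str
        context: list[str]
        answer: str

    def retrieve(state: RAGState) -> dict:
        # Stand-in for a real vector-store query returning top-k chunks.
        return {"context": ["<chunk 1>", "<chunk 2>"]}

    def generate(state: RAGState) -> dict:
        # Stand-in for an LLM call that answers using the retrieved context.
        prompt = f"Context: {state['context']}\nQuestion: {state['question']}"
        return {"answer": f"<answer produced from: {prompt[:40]}...>"}

    graph = StateGraph(RAGState)
    graph.add_node("retrieve", retrieve)
    graph.add_node("generate", generate)
    graph.set_entry_point("retrieve")
    graph.add_edge("retrieve", "generate")
    graph.add_edge("generate", END)
    app = graph.compile()

    print(app.invoke({"question": "What is RAG evaluation?"})["answer"])

Every node, edge, and prompt in such a graph is a surface you can instrument and evaluate, which is what the topics below cover.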

To evaluate and monitor it, we will explore the following topics:

Add a prompt monitoring layer.
Visualize the quality of the embeddings (see the projection sketch after this list).
Evaluate the context from the retrieval step used for RAG.
Compute application-level metrics to expose hallucinations, moderation issues, and performance, using LLM-as-a-judge (see the scoring sketch after this list).
Log the metrics to a prompt management tool to compare experiments.
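
For the embedding-quality step, a common approach is to project the high-dimensional vectors to 2D and inspect clusters and outliers. A minimal sketch, assuming UMAP is used for the projection; the random embeddings below are stand-ins for vectors exported from your own vector store:

    import matplotlib.pyplot as plt
    import numpy as np
    import umap  # pip install umap-learn

    # Stand-in data: replace with embeddings pulled from your vector store.
    embeddings = np.random.rand(200, 384).astype("float32")

    # Project to 2D so clusters and outliers become visible to the eye.
    projected = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)

    plt.scatter(projected[:, 0], projected[:, 1], s=8)
    plt.title("2D projection of document embeddings")
    plt.show()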
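
For the application-level metrics, the LLM-as-a-judge pattern scores each answer against its retrieved context with a second LLM call. A minimal sketch; call_llm and the rubric below are hypothetical stand-ins for whichever model client and judging prompt you use:

    import json

    JUDGE_PROMPT = """You are an impartial judge. Given a question, the retrieved
    context, and an answer, rate from 1 to 5 how well the answer is grounded in
    the context (1 = hallucinated, 5 = fully supported).
    Respond as JSON: {{"score": <int>, "reason": "<short reason>"}}

    Question: {question}
    Context: {context}
    Answer: {answer}"""

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for your model client; replace with a real call.
        return '{"score": 4, "reason": "Mostly supported by the context."}'

    def judge_groundedness(question: str, context: str, answer: str) -> dict:
        raw = call_llm(JUDGE_PROMPT.format(
            question=question, context=context, answer=answer))
        return json.loads(raw)  # e.g. {"score": 4, "reason": "..."}

    metric = judge_groundedness(
        question="What does the workshop cover?",
        context="The workshop covers evaluating and monitoring LLM/RAG apps.",
        answer="It covers LLM/RAG evaluation and monitoring.",
    )
    print(metric["score"], "-", metric["reason"])

Averaging such scores over a fixed test set yields an application-level hallucination metric you can log and compare across experiments.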

Instructor's Bio

Paul Iusztin

Senior AI Engineer / Founder at Decoding ML

Paul Iusztin is a senior AI/ML engineer with over seven years of experience building GenAI, Computer Vision, and MLOps solutions. Most recently, at Metaphysic, he was one of the core AI engineers who took large, GPU-heavy models to production. He previously worked at CoreAI, Everseen, and Continental.

He is the co-author of the LLM Engineer's Handbook, a bestseller on Amazon, which presents a hands-on framework for building LLM applications.

Paul is the Founder of Decoding ML, an educational channel on GenAI and information retrieval that provides code, posts, articles, and courses teaching people to build production-ready AI systems that work. His contributions to the open-source community have sparked collaborations with industry leaders like MongoDB, Comet, Qdrant, ZenML and 11 other AI companies.

Webinar

  • Workshop "LLM & RAG Evaluation Playbook for Production Apps"

Unlock Premium Features with a Subscription

  • Live Training:

    Full access to all live workshops and training sessions.

  • 20+ Expert-Led Workshops:

    Dive deep into AI Agents, RAG, and the latest LLMs.

  • ODSC Conference Discounts:

    Receive extra discounts to attend ODSC conferences.