Session Overview

How should machine learning models be evaluated? Specifically, if you have an existing model and need to decide whether to replace it with a new version, how do you make that decision?

The most common approach is to compare the two models on a standard suite of metrics, such as F1 score, ROC-AUC, or perplexity. In this talk, I'll explain why this approach is incomplete and describe a different approach that SentiLink uses before pushing new models to production: manually examining the "swap-ins" and "swap-outs", i.e. the cases where one model does especially poorly and the other does especially well.
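
As a rough illustration of the idea (not the talk's actual code), the sketch below computes swap-ins and swap-outs from two models' label predictions using NumPy; the function name find_swaps and the toy labels are hypothetical.

```python
import numpy as np

def find_swaps(y_true, preds_old, preds_new):
    """Identify swap-ins and swap-outs between two models' predictions.

    Swap-ins:  examples the new model gets right but the old model got wrong.
    Swap-outs: examples the old model got right but the new model gets wrong.
    """
    y_true = np.asarray(y_true)
    old_correct = np.asarray(preds_old) == y_true
    new_correct = np.asarray(preds_new) == y_true

    swap_in_idx = np.where(new_correct & ~old_correct)[0]
    swap_out_idx = np.where(old_correct & ~new_correct)[0]
    return swap_in_idx, swap_out_idx

# Toy example: inspect the disagreements, not just aggregate metrics.
y_true    = [1, 0, 1, 1, 0, 1]
old_preds = [1, 0, 0, 1, 1, 1]
new_preds = [1, 0, 1, 0, 0, 1]
swap_ins, swap_outs = find_swaps(y_true, old_preds, new_preds)
print("swap-ins (new model fixed):", swap_ins)    # indices [2, 4]
print("swap-outs (new model broke):", swap_outs)  # index [3]
```

The point of surfacing these indices is to read the underlying examples manually: two models with nearly identical aggregate scores can fail on very different kinds of cases.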

I'll walk through some real-world examples of how SentiLink uses this approach to evaluate models. I'll also give a concrete illustration of using it to compare a cutting-edge deep learning model to a more standard deep learning model on a popular NLP dataset, complete with code for attendees to take away.

