When optimizing machine learning pipelines, model performance often takes precedence over model stability, if stability is considered at all. While this can be beneficial in Kaggle competitions, it can lead to unexpected outcomes when a model is in production. For example, credit risk scores, medical classifiers, and customer lifetime value models should be consistent for the same person. We present a real-world use case of how we optimize for both model stability and performance at Farfetch.
Farfetch is an online luxury-fashion platform with over one million active customers and more than one billion dollars of transactions in its marketplace yearly. We use machine learning to optimize customer relationship management (CRM) activities through customer-lifetime value and churn modelling. For this application, model stability is crucial for adoption by our internal stakeholders and to ensure a consistent customer experience.
Model stability relates to the variability of model predictions arising from the training process, the training data, and shifts in the distribution of features over time. We will discuss how this arises in customer-lifetime value modelling, which requires making predictions for the same set of customers periodically. Firstly, we will introduce how to measure the variability arising from the sources above using methods such as bootstrapping, re-training, and simulation. Secondly, we present our solutions to enhance model stability. Finally, we benchmark a wide array of model classes, including linear models, random forests, and gradient boosting, on our dataset and use case. We find that the most performant models are not the most stable.
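As an illustration of the bootstrapping approach mentioned above, the sketch below measures prediction variability by re-training a model on bootstrap resamples and computing the per-customer standard deviation of predictions. The data, model choice, and function name are all illustrative assumptions, not Farfetch's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for customer features and lifetime-value targets
# (purely illustrative; not real Farfetch data).
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=500)

def bootstrap_prediction_std(X, y, n_boot=20):
    """Train on bootstrap resamples of (X, y) and return the
    per-customer standard deviation of predictions, a simple
    proxy for model stability under training-data variability."""
    preds = []
    for seed in range(n_boot):
        # Resample customers with replacement.
        idx = np.random.default_rng(seed).integers(0, len(X), len(X))
        model = GradientBoostingRegressor(random_state=seed)
        model.fit(X[idx], y[idx])
        # Always predict on the same fixed customer set.
        preds.append(model.predict(X))
    return np.std(preds, axis=0)

per_customer_std = bootstrap_prediction_std(X, y)
print(per_customer_std.mean())
```

The same loop can be repeated for each model class (swapping in a linear model or random forest) to compare average prediction variability alongside held-out accuracy.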
By the end of the talk, we will have demonstrated practical methods to navigate the trade-off between model performance and model stability on a real-world problem.
Practical Methods to Optimise Model Stability: A Case Study Using Customer-Lifetime Value at Farfetch