Aric LaBarr, PhD
Associate Professor of Analytics | Institute for Advanced Analytics at NC State University
Develop good features (recency, frequency, and monetary value as well as categorical transformations) for detecting and preventing fraud
Identify anomalies using statistical techniques like z-scores, robust z-scores, Mahalanobis distances, k-nearest neighbors (k-NN), and local outlier factor (LOF)
Identify anomalies using machine learning approaches like isolation forests and classifier-adjusted density estimation (CADE)
Visualize the anomalies identified by these approaches
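As a rough illustration of two of the distance-based scores named above, here is a minimal NumPy sketch. It assumes a purely numeric feature matrix, uses the classical (non-robust) sample mean and covariance for the Mahalanobis distance, and scores k-NN anomalies by each point's distance to its k-th nearest neighbor. The function names, the choice of k = 3, and the simulated data are illustrative assumptions, not the course's own code; robust Mahalanobis distances and LOF refine these same ideas.

```python
import numpy as np

def mahalanobis_distances(X):
    """Classical Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance (columns are variables)
    inv_cov = np.linalg.pinv(cov)      # pseudo-inverse tolerates a singular covariance
    # Row-wise quadratic form: d_i^2 = (x_i - mean)^T S^-1 (x_i - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2)

def knn_scores(X, k=3):
    """Anomaly score = Euclidean distance to the k-th nearest neighbor (self excluded)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]          # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))     # n x n distance matrix
    # After sorting each row, column 0 is the point's zero distance to itself
    return np.sort(dists, axis=1)[:, k]

# Simulated data (assumption): 200 standard-normal points plus one planted outlier
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, -8.0]]])

maha = mahalanobis_distances(X)
knn = knn_scores(X, k=3)
print("Most anomalous by Mahalanobis:", maha.argmax())
print("Most anomalous by k-NN:", knn.argmax())
```

Note that the planted outlier also inflates the classical covariance estimate, which is the masking problem that robust Mahalanobis distances address. The full pairwise distance matrix costs O(n²) memory, which is fine for a sketch; for real transaction volumes, scikit-learn's `NearestNeighbors` or `LocalOutlierFactor` would be a more practical choice.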
1. Introduction to Fraud
- The Problem of Fraud - How can we analytically define fraud? Several important characteristics of fraud put its modeling and identification into better perspective.
- Detection and Prevention - The two biggest pieces that any holistic fraud solution should have are detection of previous instances of fraud and prevention of new instances. This section also defines the typical fraud identification process in organizations.
- Analytical Solution - Now that we know what fraud is and how organizations are structured to deal with it, we introduce the analytical approaches an organization needs to mature in detecting and preventing fraud.
2. Data Preparation
- Feature Engineering - The best way to glean information from data is to develop good features that help detect and identify fraud. We discuss and develop strategies for building good features for anomaly detection.
- RFM Features - Thinking about new features in terms of recency, frequency, and monetary value helps define important characteristics of fraud. This is where the session gets interactive, as participants put on their "fraudster hat" and try to think like a criminal to help develop new features.
- Categorical Feature Engineering - This section covers ways to use categorical information to create even richer features for anomaly detection.
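To make the RFM idea concrete, the sketch below computes per-customer recency (days since the last transaction), frequency (number of transactions), and monetary value (total amount), plus a simple frequency encoding of a categorical column as one example of categorical feature engineering. The record layout, the as-of date, and the sample transactions are hypothetical, and frequency encoding is only one of many possible categorical transformations, not necessarily the one the course uses.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, transaction_date, amount, merchant_category)
transactions = [
    ("C1", date(2024, 1, 5), 120.00, "grocery"),
    ("C1", date(2024, 3, 1), 80.50, "grocery"),
    ("C2", date(2024, 2, 20), 2500.00, "electronics"),
    ("C2", date(2024, 2, 21), 2400.00, "electronics"),
    ("C2", date(2024, 2, 21), 2600.00, "gift_cards"),
    ("C3", date(2023, 11, 30), 45.25, "fuel"),
]

def rfm_features(records, as_of):
    """Per-customer recency (days), frequency (count), and monetary value (total)."""
    last_seen = {}
    counts = defaultdict(int)
    totals = defaultdict(float)
    for cust, when, amount, _category in records:
        # Keep the most recent transaction date seen for each customer
        if cust not in last_seen or when > last_seen[cust]:
            last_seen[cust] = when
        counts[cust] += 1
        totals[cust] += amount
    return {
        cust: {
            "recency_days": (as_of - last_seen[cust]).days,
            "frequency": counts[cust],
            "monetary": round(totals[cust], 2),
        }
        for cust in last_seen
    }

def frequency_encode(records, index=3):
    """Replace a categorical column with the share of all records sharing its value."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec[index]] += 1
    n = len(records)
    return [counts[rec[index]] / n for rec in records]

features = rfm_features(transactions, as_of=date(2024, 3, 31))
for cust, feats in features.items():
    print(cust, feats)
print(frequency_encode(transactions))
```

In this toy data, customer C2 stands out on frequency and monetary value within a two-day window, which is the kind of pattern the "fraudster hat" exercise is meant to surface.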
3. Anomaly Models
- Non-statistical Techniques - This section covers Benford's Law and why it was (and still is) used for basic anomaly detection.
- Univariate Analysis - When addressing anomalies one variable at a time, we can use a variety of techniques. This section covers z-scores, robust z-scores, the IQR rule, and the adjusted IQR rule.
- Multivariate Analysis - This is where the biggest improvements in anomaly detection have happened over the past decade. We will start with more statistical approaches like Mahalanobis distances (and their robust counterparts) as well as k-Nearest Neighbors (k-NN) and the Local Outlier Factor (LOF). Then we will move into more advanced machine learning approaches to anomaly detection like isolation forests and classifier-adjusted density estimation (CADE).
- Wrap-up - Here we will summarize everything we have done to build up our anomaly detection and look ahead to the next course on more advanced fraud detection models.
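The univariate techniques listed under Anomaly Models can be sketched in a few lines of NumPy. This minimal version assumes the common conventions: the robust z-score replaces the mean with the median and the standard deviation with the median absolute deviation (MAD) scaled by 1.4826, and the IQR rule flags values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The transaction amounts are hypothetical, and the adjusted IQR rule (which corrects the fences for skewness) and Benford's Law are not shown.

```python
import numpy as np

def z_scores(x):
    """Classical z-score: distance from the mean in sample standard deviations."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def robust_z_scores(x):
    """Robust z-score: median and scaled MAD replace the mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 1.4826 makes the MAD consistent with the standard deviation for normal data
    return (x - med) / (1.4826 * mad)

def iqr_outliers(x, k=1.5):
    """Boolean mask of values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Hypothetical transaction amounts with one obvious anomaly at the end
amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.6, 49.1, 500.0]

z = z_scores(amounts)
rz = robust_z_scores(amounts)
flags = iqr_outliers(amounts)
print("z-score of 500.0:", round(z[-1], 2))
print("robust z-score of 500.0:", round(rz[-1], 1))
print("IQR rule flags:", flags.tolist())
```

With only nine points, the ordinary z-score of 500.0 cannot exceed (n - 1)/√n ≈ 2.67, so a |z| > 3 cutoff misses it: the outlier inflates both the mean and the standard deviation. The robust z-score and the IQR rule both flag it, which is the usual argument for the robust versions.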
Basic introduction to decision trees (this isn't required, but helpful for understanding)
Basic introduction to classification models like logistic regression, decision trees, etc. (this isn't required, but helpful for understanding)
Access to live training and a Q&A session with the instructor
Access to the on-demand recording
Certificate of completion