Course Abstract

Training duration: 4 hours (Hands-on)

Data is everywhere, and it drives decisions in almost every industry. However, anomalies in data can lead to incorrect or out-of-date decisions. Whether you are doing exploratory data analysis and cleaning your data, monitoring the health of a computer system to make sure it is working properly, or trying to catch fraudulent claims in life insurance, anomaly detection helps find outliers before they become a serious problem for decision makers. This course examines anomaly detection through the example of fraud, but all of the techniques can be applied to other areas as well. We will start with the importance of feature creation and transformation. We will then cover statistical approaches to anomaly detection. Finally, we will end with machine learning-based approaches, so that the learner can approach anomalies from any angle and for any industry need.

DIFFICULTY LEVEL: BEGINNER-INTERMEDIATE

Learning Objectives

  • Develop good features (recency, frequency, and monetary value as well as categorical transformations) for detecting and preventing fraud

  • Identify anomalies using statistical techniques like z-scores, robust z-scores, Mahalanobis distances, k-nearest neighbors (k-NN), and local outlier factor (LOF)

  • Identify anomalies using machine learning approaches like isolation forests and classifier-adjusted density estimation (CADE)

  • Visualize the anomalies identified by the above approaches
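
As a small taste of the statistical techniques named above, the following is a minimal sketch (not taken from the course materials) contrasting a classical z-score with a robust z-score based on the median and the median absolute deviation (MAD). The claim amounts, the function names, and the threshold of 3 are illustrative assumptions; the constant 1.4826 is the usual scaling that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import mean, median, stdev


def z_scores(values):
    """Classical z-score: distance from the mean in standard deviations."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def robust_z_scores(values):
    """Robust z-score: distance from the median in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 rescales the MAD to match a standard deviation under normality.
    return [(v - med) / (1.4826 * mad) for v in values]


# Hypothetical claim amounts; the last value is an obvious outlier.
claims = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0, 138.0, 5000.0]

THRESHOLD = 3.0  # a common rule of thumb, assumed here for illustration

classical_flags = [abs(z) > THRESHOLD for z in z_scores(claims)]
robust_flags = [abs(z) > THRESHOLD for z in robust_z_scores(claims)]

print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

In this small sample the outlier inflates the mean and standard deviation enough that its own classical z-score stays below the threshold, while the robust version, built on the median and MAD, still flags it.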

Instructor Bio:

Aric LaBarr, PhD

Associate Professor of Analytics | Institute for Advanced Analytics at NC State University

A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. At the Institute, home of the nation's first Master of Science in Analytics degree program, he helps design an innovative program that prepares a modern workforce to communicate wisely and handle a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office, he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.

Course Outline

Module 1: Introduction

Lesson 1.1 - Who am I

Lesson 1.2 - What are Anomalies

Lesson 1.3 - Anomaly Detection Analytical Framework

 

Module 2: Data Preparation

Lesson 2.1 - Feature Engineering

Lesson 2.2 - Recency and Frequency

Lesson 2.3 - Periodic Means

Lesson 2.4 - Categorical Feature Engineering

 

Module 3: Probability and Statistical Approaches

Lesson 3.1 - Benford's Law

Lesson 3.2 - Z-scores and Robust Z-scores

Lesson 3.3 - IQR Rule and Its Adjustment

Lesson 3.4 - Mahalanobis Distances and Robust Mahalanobis

 

Module 4: Machine Learning Approaches

Lesson 4.1 - k-Nearest Neighbors (k-NN)

Lesson 4.2 - Local Outlier Factor (LOF)

Lesson 4.3 - Isolation Forests

Lesson 4.4 - Classifier-Adjusted Density Estimation (CADE)

Lesson 4.5 - One-Class Support Vector Machines (SVM)

Background knowledge

  • Introductory knowledge of statistics, enough to understand means and standard deviations

  • Basic familiarity with machine learning, enough to grasp the concepts behind the more advanced anomaly detection methods

This course could be useful for:

  • Data scientists in the banking industry

  • Data scientists in the insurance industry

  • Data scientists in the retail industry