Statistics for Data Science by Andrew Zirm, PhD
Programming with Data: Python and Pandas by Daniel Gerlanc
The emergence of data science as a discipline has affected businesses in many ways. One primary impact has been to elevate the role of data in decision-making, using statistical methods to assess the ever-growing datasets that companies collect. This workshop reviews and introduces statistical techniques and touches on more advanced methods for handling noisy data and applying real-world constraints to analyses. It assumes a working knowledge of standard statistical methods and aims to connect theory to practice through real-world examples.
Lesson 1: Descriptive statistics and exploring data statistically
- (Re)familiarize yourself with basic descriptive statistics
- Use simple data exploration techniques to identify problems and limitations of a new dataset
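As a rough illustration of Lesson 1, the sketch below summarizes one numeric column and flags simple data-quality problems using only Python's standard-library `statistics` module. The data and the `describe` helper are hypothetical, and flagging outliers with Tukey's 1.5 × IQR fences is one common convention assumed here, not a method prescribed by the workshop.

```python
import statistics

# Hypothetical sample: daily order counts with a missing entry (None)
# and one suspicious value. Illustrative data, not from a real dataset.
raw = [12, 15, 14, None, 13, 16, 15, 140, 14, 13]

def describe(values):
    """Summarize a numeric column and flag basic data-quality problems."""
    present = [v for v in values if v is not None]
    # Quartiles using the module's default 'exclusive' method.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Tukey's fences: values beyond 1.5 * IQR from the quartiles are flagged.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
        "outliers": [v for v in present if v < low or v > high],
    }

summary = describe(raw)
for key, value in summary.items():
    print(f"{key:>8}: {value}")
```

Here the single value 140 pulls the mean up to 28 while the median stays at 14, which is the kind of discrepancy that early exploration is meant to surface.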
Lesson 2: Statistical analyses
- Review of statistical tests to compare datasets and groups within those data
- Assessment of correlations and other properties of the data, with an eye toward modeling
Lesson 3: More advanced analyses and methods
- Linear modeling and its statistical outputs
- Stats -> ML: connections and methodologies
Whether in R, MATLAB, Stata, or Python, modern data analysis requires some kind of programming for many researchers. The proliferation of specialized tools and languages for data analysis suggests that general-purpose programming languages such as C and Java do not readily address the needs of data scientists; something more is needed.
Lesson 1: Introduction to Python and Pandas DataFrames
In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library designed specifically for interactive data analysis.
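A first look at a Pandas DataFrame might resemble the sketch below: a small table built from a dictionary of columns, then inspected with a few standard attributes and methods. The city and sales values are hypothetical.

```python
import pandas as pd

# A small, hypothetical DataFrame built from a dict of column name -> values.
df = pd.DataFrame(
    {
        "city": ["Boston", "Denver", "Austin", "Boston"],
        "month": ["Jan", "Jan", "Jan", "Feb"],
        "sales": [120.0, 95.5, 130.25, 110.0],
    }
)

print(df.shape)                # (rows, columns)
print(df.dtypes)               # column types inferred by Pandas
print(df.head())               # first rows, handy for interactive inspection
print(df["sales"].describe())  # count, mean, std, min, quartiles, max
```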
Lesson 2: Core Functionality of Pandas
Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. After completing this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
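The four core operations named above can be sketched in a few lines. This example assumes a small in-memory CSV in place of a real file, with hypothetical column names and values. It also shows one common pitfall, chained indexing during assignment, alongside the `.loc` form that avoids it.

```python
import io

import pandas as pd

# Loading: read_csv accepts any file-like object; an in-memory CSV stands in
# for a real file here (hypothetical data).
csv_text = """city,month,sales
Boston,Jan,120
Denver,Jan,95
Austin,Jan,130
Boston,Feb,110
Denver,Feb,105
"""
df = pd.read_csv(io.StringIO(csv_text))

# Filtering: a boolean mask keeps only the matching rows.
big_sales = df[df["sales"] > 100]

# Grouping: split the rows by city and aggregate each group.
totals = df.groupby("city")["sales"].sum()

# Transforming: transform returns a result aligned to the original rows,
# so it can be assigned back as a new column.
df["share_of_city"] = df["sales"] / df.groupby("city")["sales"].transform("sum")

# Pitfall: chained indexing such as df[df["sales"] > 100]["tier"] = "high"
# assigns to a temporary copy and leaves df unchanged (Pandas warns about it).
# Selecting rows and column in one .loc call modifies df itself.
df["tier"] = "standard"
df.loc[df["sales"] > 100, "tier"] = "high"

print(big_sales)
print(totals)
print(df)
```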