Learning Objectives

  • Get an overview of the kinds of data analysis possible with Pandas

  • Get introduced to the DataFrame and Series, the two primary containers of data in Pandas

  • Learn about the different data types available

  • Learn how to select subsets of data by label and integer location

  • Learn how to filter data via boolean selection or the query method

  • Learn best practices for using Pandas most efficiently

Course Abstract

Training duration: 90 min (Hands-on)

Pandas is an extremely popular and powerful Python library for analyzing data. It lets you explore, query, transform, aggregate, and visualize data, and it provides an enormous amount of functionality for a wide variety of data operations. In this course, you’ll be introduced to the DataFrame and Series, the two primary containers of data in the Pandas library. You’ll learn about their structure, the types of data they may contain, how to select subsets of data, and how to filter for specific conditions. Because Pandas offers so much flexibility in how operations can be performed, you’ll also learn best practices for using it effectively and efficiently. The goal of this course is to lay a foundation in Pandas so that you can continue learning the library and use it to analyze data and produce trusted results.

DIFFICULTY LEVEL: BEGINNER

Instructor Bio:

Teddy Petrou

Python Data Science Expert Instructor - Author of Multiple Books and Python Libraries | Founder | Dunder Data

Teddy Petrou is the author of Pandas Cookbook, a highly rated text on performing real-world data analysis with Pandas. He is also the author of the books Exercise Python and Master Data Analysis with Python. He is the founder of Dunder Data, a company that teaches the fundamentals of data science and machine learning. He really enjoys discovering best practices on how to use and teach data analysis with Python.

Course Outline

Module 1: An Overview of Data Analysis and Pandas 

  • What is data analysis?
  • What is Pandas?
  • Data analysis examples with Pandas

Module 2: The DataFrame and Series 

  • Components of the DataFrame and Series
  • Displaying the DataFrame and Series in the notebook
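As a rough preview of this module, the sketch below builds a small DataFrame and pulls out its main components. The data and the variable names are made up for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
df = pd.DataFrame(
    {"city": ["Houston", "Dallas"], "population": [2.3, 1.3]},
    index=["a", "b"],
)

# A DataFrame has three main components
row_labels = df.index        # the index (row labels)
column_labels = df.columns   # the column names
values = df.to_numpy()       # the underlying data as a NumPy array

# Selecting one column returns a Series, which has an index,
# values, and a name, but no columns
population = df["population"]
series_labels = population.index
series_name = population.name
```

In a Jupyter Notebook, placing `df` or `population` on the last line of a cell displays it.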

Module 3: Data Types and Missing Values

  • Common column data types
  • New data types for Pandas 1.0
  • Missing value representation
  • Setting a meaningful index
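The topics in this module can be previewed with a minimal sketch. It assumes a small made-up table and shows missing values, the nullable `string` and `boolean` types and the `pd.NA` marker that arrived with Pandas 1.0 (alongside the nullable `Int64` integer type), and an index built from a descriptive column. All column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this illustration only
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "score": [90.5, np.nan, 72.0],
        "visits": [3, None, 7],
        "member": [True, None, False],
    }
)

# Count the missing values in each column
missing_per_column = df.isna().sum()

# Nullable types keep integers and booleans intact and
# mark missing entries with pd.NA
visits = df["visits"].astype("Int64")
member = df["member"].astype("boolean")
names = df["name"].astype("string")

# Use a descriptive column as the index so rows can be selected by label
indexed = df.set_index("name")
```

The nullable types are optional: a column read without them still works, but an integer column containing a missing value is stored as floating point by default.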

Module 4: Five-Step Process for Data Exploration 

  • Data analysis within a Jupyter Notebook
  • Execute a single main line of code per cell

Module 5: Selecting Subsets of Data 

  • Subset selection with just the brackets
  • Subset selection using labels with loc
  • Subset selection using integer location with iloc
  • Selecting subsets from Series
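The three selection styles listed above can be sketched as follows. This is only an illustration on made-up data, not an excerpt from the course notebooks, and the labels and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
df = pd.DataFrame(
    {
        "state": ["TX", "CA", "NY"],
        "population": [29, 39, 20],
        "area": [269, 164, 55],
    },
    index=["Texas", "California", "New York"],
)

# Just the brackets: a single column name returns a Series
population = df["population"]

# loc selects by label: rows first, then columns
by_label = df.loc[["Texas", "New York"], ["state", "area"]]

# iloc selects by integer location; the end of a slice is excluded
by_position = df.iloc[0:2, 1:3]

# The same indexers work on a Series
ny_population = population.loc["New York"]
first_population = population.iloc[0]
```

Keeping `loc` for labels and `iloc` for integer positions makes it clear to a reader which kind of selection is intended.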

Module 6: Boolean Selection 

  • Single boolean conditions
  • Multiple boolean conditions
  • Complex boolean selection
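For a sense of what boolean selection looks like, here is a minimal sketch on a made-up table. The column names and values are hypothetical; the operators `&`, `|`, and `~` are the standard Pandas way to combine conditions.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Dee"],
        "age": [12, 25, 40, 17],
        "city": ["Dallas", "Houston", "Dallas", "Houston"],
    }
)

# Single condition: a boolean Series with one True/False per row
is_adult = df["age"] >= 18
adults = df[is_adult]

# Multiple conditions: wrap each one in parentheses and combine with & or |
adults_in_houston = df[(df["age"] >= 18) & (df["city"] == "Houston")]

# A more complex filter mixing or (|), and (&), and not (~)
mask = ((df["age"] < 18) | (df["city"] == "Dallas")) & ~(df["name"] == "Ana")
selected = df[mask]
```

The parentheses around each comparison are required because `&` and `|` bind more tightly than comparison operators in Python.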

Module 7: Selection with the query Method 

  • Creating boolean conditions with and, or, not
  • Multiple equality comparisons
  • Referencing strings and variables
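The `query` method expresses the same kind of filters as a string. The sketch below assumes a small made-up DataFrame; the column names and the `min_age` variable are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Dee"],
        "age": [12, 25, 40, 17],
        "city": ["Dallas", "Houston", "Dallas", "Houston"],
    }
)

# Conditions are joined with and, or, not; string values go in inner quotes
adults_in_houston = df.query("age >= 18 and city == 'Houston'")

# Several equality comparisons on one column collapse into `in`
in_listed_cities = df.query("city in ['Dallas', 'Austin']")

# Python variables are referenced with the @ prefix
min_age = 18
adults_outside_dallas = df.query("age >= @min_age and not (city == 'Dallas')")
```

Compared with bracket-based boolean selection, the query string avoids repeating the DataFrame name and the extra parentheses around each comparison.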

Module 8: Miscellaneous Subset Selection 

  • How not to select subsets of data
  • Other indexers

Background knowledge

  • Elementary knowledge of the Python programming language is necessary

  • Previous work with Jupyter Notebooks would be helpful as all material is delivered with them

Target Audience

  • Data enthusiasts and professionals who wish to gain a deep understanding of the fundamentals of the Pandas library

  • Anyone who would like to programmatically analyze data using Python

  • Anyone who wants best practices for using a complex library such as Pandas

  • Anyone who wants to become an expert at Pandas

  • Anyone who enjoys completing exercises to test the knowledge learned

  • Anyone who wishes to take challenging certification exams to show proof of the knowledge acquired