Course Abstract

Training duration: 90 minutes

Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, this course is for you. We will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.

DIFFICULTY LEVEL: BEGINNER

Learning Objectives

  • Be familiar with the NetworkX and nxviz Python packages for modelling and rationally visualizing networks

  • Be able to load node and edge data from a pandas DataFrame

  • Be familiar with object-oriented and matrix-oriented representations of graphs

  • Be able to find paths between nodes, interesting structures in graphs, and projections of bipartite graphs

  • Be able to use matrix operations to simulate diffusion of information on networks
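As a rough illustration of what these objectives look like in practice, here is a minimal sketch that loads edges from a pandas DataFrame, finds a path, and runs a simple matrix-based diffusion. The edge table, the people's names, and the two-step diffusion are made up for illustration and are not taken from the course materials.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical edge table: each row is one friendship.
edges = pd.DataFrame(
    {
        "source": ["alice", "bob", "carol", "dave"],
        "target": ["bob", "carol", "dave", "erin"],
    }
)

# Object-oriented representation: a NetworkX Graph built from the DataFrame.
G = nx.from_pandas_edgelist(edges, source="source", target="target")

# Path finding: the shortest chain of friendships between two people.
path = nx.shortest_path(G, source="alice", target="erin")

# Matrix-oriented representation: adjacency matrix with a fixed node order.
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)

# Diffusion: start a message at "alice" and repeatedly multiply by A.
# A nonzero entry after k multiplications means a walk of length k
# from "alice" ends at that node.
state = np.zeros(len(nodes))
state[nodes.index("alice")] = 1.0
reached = state.copy()
for _ in range(2):
    state = A @ state
    reached += state

# Everyone the message could have reached within two steps.
reached_nodes = [n for n, v in zip(nodes, reached) if v > 0]
```

Fixing the node order with `nodelist` keeps the rows and columns of `A` aligned with the `nodes` list, which is what makes the final lookup by position valid.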

Instructor Bio:

Eric J. Ma

Senior Expert II/Investigator III (Data Science & Statistical Learning) | Novartis Institutes for BioMedical Research (NIBR)

Eric is a data scientist at the Novartis Institutes for BioMedical Research. There, he conducts biomedical data science research, with a focus on using Bayesian statistical methods in the service of making medicines for patients. Prior to Novartis, he was an Insight Health Data Fellow in the summer of 2017, and he defended his doctoral thesis in the Department of Biological Engineering at MIT in the spring of 2017. Eric is also an open-source software developer and has led the development of pyjanitor, a clean API for cleaning data in Python, and nxviz, a visualization package for NetworkX. In addition, he gives back to the open-source community through code contributions to multiple projects.

Course Outline

  1. Course Introduction
  2. Introduction to Graphs
  3. The NetworkX API
  4. Graph Visualization
  5. Hubs
  6. Paths
  7. Structures
  8. Graph I/O
  9. Testing
  10. Bipartite Graphs
  11. Linear Algebra
  12. Statistical Inference
  13. Conclusions

Background knowledge

  • This course is for current and aspiring data scientists, and for data visualization and network theory enthusiasts

  • Knowledge of the following is useful: Python programming fundamentals, including loops and basic Python data structures

Real-world applications

  • Recommender systems: Use graph structures to recommend products or professional connections.

  • Epidemiological analysis: Identify the most important spreaders of disease.

  • Logistics: Identify the most efficient path to move goods and services.
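To make these applications concrete, the sketch below runs all three ideas on one small NetworkX graph. The graph, the names, and the `recommend` helper are hypothetical. Counting shared neighbors and ranking by degree centrality are simple stand-ins chosen for illustration, not necessarily the methods taught in the course.

```python
import networkx as nx

# Hypothetical contact network.
G = nx.Graph()
G.add_edges_from(
    [
        ("ana", "ben"),
        ("ana", "cai"),
        ("ben", "cai"),
        ("cai", "dee"),
        ("dee", "eli"),
    ]
)


def recommend(graph, person):
    """Rank people not yet connected to `person` by shared neighbors."""
    scores = {}
    for other in graph.nodes():
        if other == person or graph.has_edge(person, other):
            continue
        shared = len(list(nx.common_neighbors(graph, person, other)))
        if shared > 0:
            scores[other] = shared
    # Most shared neighbors first; ties broken alphabetically.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Recommender systems: suggest connections through mutual contacts.
suggestions = recommend(G, "ana")

# Epidemiological analysis: the best-connected node as a likely spreader.
centrality = nx.degree_centrality(G)
top_spreader = max(centrality, key=centrality.get)

# Logistics: the route with the fewest hops between two nodes.
route = nx.shortest_path(G, source="ana", target="eli")
```

On a weighted graph, passing `weight="..."` to `nx.shortest_path` would pick the cheapest route rather than the one with the fewest hops.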