Course Abstract

Training duration: 90 minutes

Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this course is for you. In this course, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.

DIFFICULTY LEVEL: BEGINNER

Learning Objectives

  • Familiarity with how to use the NetworkX and nxviz Python packages for modelling and rationally visualizing networks

  • Be able to load node and edge data from a pandas DataFrame

  • Familiarity with object-oriented and matrix-oriented representations of graphs

  • Be able to find paths between nodes, interesting structures in graphs, and projections of bipartite graphs

  • Be able to use matrix operations to simulate diffusion of information on networks
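As a rough taste of what these objectives look like in practice, here is a minimal sketch, not taken from the course materials. The friendship table and the names in it are made up for illustration; the NetworkX calls (`from_pandas_edgelist`, `shortest_path`, `to_numpy_array`) are standard library functions.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical edge table: each row is one friendship between two people.
edges = pd.DataFrame(
    {
        "source": ["Ann", "Ann", "Bob", "Cat", "Dan"],
        "target": ["Bob", "Cat", "Dan", "Dan", "Eve"],
    }
)

# Load edge data from the DataFrame into an undirected graph.
G = nx.from_pandas_edgelist(edges, source="source", target="target")

# Find a path with the fewest hops between two nodes.
path = nx.shortest_path(G, source="Ann", target="Eve")

# Spot a hub: the node with the most connections.
hub, hub_degree = max(G.degree, key=lambda pair: pair[1])

# Matrix view of the same graph, with a fixed node order for rows and columns.
nodes = sorted(G.nodes)
A = nx.to_numpy_array(G, nodelist=nodes)

# One step of a very simple diffusion: a message starts at Ann and is
# passed to every direct neighbour via a vector-matrix product.
message = np.zeros(len(nodes))
message[nodes.index("Ann")] = 1.0
after_one_step = message @ A
reached = {name for name, value in zip(nodes, after_one_step) if value > 0}

print(path, hub, hub_degree, reached)
```

The same graph is handled both as an object (nodes, edges, paths) and as an adjacency matrix, which mirrors the two representations named in the objectives above.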

Instructor Bio

Eric J. Ma

Principal Data Scientist, Platform Research | Moderna Therapeutics

Eric is a Principal Data Scientist at Moderna Therapeutics and an alumnus of Novartis, Insight Data Science, and MIT. His ScD thesis research was conducted in the Department of Biological Engineering at MIT. His thesis addresses two distinct topics that are unified by a mission for infectious disease. The first problem he has addressed is the ecological question of whether genome shuffling is quantitatively important for ecological niche switching. The second problem he is addressing is the data science problem of interpretable machine learning models for predicting protein phenotype from genotype. He believes in using open data, open science, and open source tools to ensure the long-term integrity of the scientific work that he conducts. To that end, he is committed to releasing source code and documentation for his scientific work and has already done so on two manuscripts that are currently submitted and under consideration.

Course Outline

  1. Course Introduction
  2. Introduction to Graphs
  3. The NetworkX API
  4. Graph Visualization
  5. Hubs
  6. Paths
  7. Structures
  8. Graph I/O
  9. Testing
  10. Bipartite Graphs
  11. Linear Algebra
  12. Statistical Inference
  13. Conclusions

Background knowledge

  • This course is for current and aspiring Data Scientists and Data Visualization & Network Theory enthusiasts

  • A working grasp of Python programming, including loops and basic Python data structures, is useful

Real-world applications

  • Recommender systems: Use graph structures to recommend products or professional connections.

  • Epidemiological analysis: Identify the most important spreaders of disease.

  • Logistics: Identify the most efficient path to move goods and services.