Course Abstract

Training duration: 90 minutes

In this hands-on training, we will use free-tier resources on the Google Cloud Platform (GCP) to introduce learners to the practical use of cloud computing in data science and machine learning. Learners should have some experience with data analytics, data science, or machine learning. Learners should also have a Gmail account with no prior GCP use associated with it, or be willing to create one. Fluency in R or Python will be very helpful but is not strictly required, as well-annotated scripts will be provided. No previous exposure to cloud computing is required; the training is introductory in its cloud computing assumptions.


Learning Objectives

  • Understand the practical use of cloud computing resources in data science and machine learning

  • Create a new GCP account and explore the documentation and tutorials it offers

  • Explore public datasets hosted on GCP’s BigQuery service

  • Use SQL to do data analysis on a public dataset

  • Create a Jupyter notebook on a free-tier compute environment and use Python to analyze data

  • Create an RStudio Community server environment on a free-tier compute environment and use R to analyze data

  • Create a machine learning predictive model on public data


Instructor Bio

Supervisor of Data Education | Children's Hospital of Philadelphia

Joy Payton

Joy Payton is a data scientist and data educator at the Children’s Hospital of Philadelphia (CHOP), where she helps biomedical researchers learn the reproducible computational methods that will speed time to science and improve the quality and quantity of research conducted at CHOP. A longtime open source evangelist, Joy develops and delivers data science instruction on topics related to R, Python, and git to an audience that includes physicians, nurses, researchers, analysts, developers, and other staff. Her personal research interests include using natural language processing to identify linguistic differences in a neurodiverse population as well as the use of government open data portals to conduct citizen science that draws attention to issues affecting vulnerable groups. Joy holds a degree in philosophy and math from Agnes Scott College, a divinity degree from the Universidad Pontificia de Comillas (Madrid), and a data science Master's from the City University of New York (CUNY).


Course Outline

• Cloud computing concepts and vocabulary

• Cloud providers

• Free tier and cost considerations

• Public datasets and citizen science

• Redundancy, security, and privacy

• Continuum of management levels

• Cloud data storage and analytics

• Machine learning in the cloud

Background knowledge

  • This course is for current or aspiring Data Engineers, Machine Learning Engineers, Software Engineers and Data Analysts

  • Knowledge of the following tools and concepts:

  • Background in R or Python will be helpful, although it is not strictly required

  • This training will be useful for those considering cloud adoption, interested in data engineering, or interested in working with public data as citizen scientists.

Real-world applications

  • Chatbots, personal assistants, and other interactive digital applications use ML on the cloud to put models into production.

  • ML models on the cloud are used to build intuitive recommendation engines for services such as Netflix, LinkedIn, and Waze.

  • ML on the cloud has helped put models into production in industries including healthcare, finance, and e-commerce.