In this presentation, phData’s Director of Machine Learning, Robert Coop, will walk the audience through the process of detecting and classifying audio events using deep learning. The objective is to enable the listener to use open source tools, pre-trained networks, and data augmentation for audio analytics projects. The target audience is someone with intermediate or advanced experience programming in Python and any level of experience with deep learning.

The first part of the talk will introduce the background techniques, available data, and theory. We will cover basic audio processing techniques such as time-frequency domain transformations, frequency scales designed to approximate human perception of audio, and spectrographic representations of audio. The talk will focus on Google’s publicly available AudioSet data and how it can be processed using deep learning. We will demonstrate the similarity between image recognition and audio processing, and cover a VGG-inspired network that has been used successfully in audio processing work. There will also be some discussion of recently developed state-of-the-art techniques.
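As a rough illustration of the preprocessing described above, the following sketch turns a waveform into a log-mel spectrogram using only NumPy. It is not the talk's own code: the function names are hypothetical, and the default parameters (16 kHz audio, 25 ms windows, 10 ms hop, 64 mel bands between 125 Hz and 7500 Hz) are assumptions chosen to resemble the settings commonly associated with VGGish.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (a perceptually motivated scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, fmin, fmax):
    """Triangular filters spaced evenly in mel; shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges_hz[i:i + 3]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(waveform, sample_rate=16000, win_len=400, hop_len=160,
                        n_fft=512, n_mels=64, fmin=125.0, fmax=7500.0,
                        log_offset=0.01):
    """Return a (n_frames, n_mels) log-mel spectrogram of a mono waveform."""
    waveform = np.asarray(waveform, dtype=float)
    if waveform.size < win_len:
        raise ValueError("waveform is shorter than one analysis window")
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (waveform.size - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = waveform[idx] * np.hanning(win_len)
    # Time-frequency transform: magnitude of the short-time Fourier transform.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Pool FFT bins into mel bands, then compress the dynamic range with a log.
    bank = mel_filterbank(n_mels, n_fft, sample_rate, fmin, fmax)
    return np.log(magnitude @ bank.T + log_offset)


# Usage: one second of a synthetic 1 kHz tone (placeholder input, not AudioSet).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
spectrogram = log_mel_spectrogram(tone, sample_rate=sample_rate)
print(spectrogram.shape)  # (98, 64)
```

The resulting two-dimensional array can be treated much like a grayscale image, which is what makes image-style convolutional networks applicable to audio.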

The second part of the talk will cover the hands-on application of these techniques in Python, concentrating on the end-to-end process of classifying AudioSet data using the VGGish deep learning network. We will cover loading the data and applying common preprocessing techniques in Python. TensorFlow will be used to load the VGGish network and to process the transformed audio data. We will demonstrate using the network to create embedding vectors for classification, as well as the process of transfer learning using the pretrained weights as a foundation.
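To make the idea of classifying from embedding vectors concrete, here is a minimal sketch that trains a small multi-label classifier on top of fixed embeddings. It is an illustration under stated assumptions rather than the talk's method: it uses plain NumPy instead of TensorFlow, the 128-dimensional embedding size is assumed, the function names are hypothetical, and random vectors stand in for real embeddings and labels.

```python
import numpy as np

EMBEDDING_DIM = 128  # assumed size of one embedding vector


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_multilabel_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit one logistic-regression output per class with batch gradient descent.

    embeddings: (n_clips, dim) fixed features from a pretrained network.
    labels:     (n_clips, n_classes) array of 0/1; a clip may carry several labels.
    """
    n_clips, dim = embeddings.shape
    weights = np.zeros((dim, labels.shape[1]))
    bias = np.zeros(labels.shape[1])
    for _ in range(epochs):
        probs = sigmoid(embeddings @ weights + bias)
        error = probs - labels  # gradient of binary cross-entropy w.r.t. the logits
        weights -= lr * (embeddings.T @ error) / n_clips
        bias -= lr * error.mean(axis=0)
    return weights, bias


def predict(embeddings, weights, bias, threshold=0.5):
    """Return a boolean (n_clips, n_classes) array of predicted labels."""
    return sigmoid(embeddings @ weights + bias) >= threshold


# Placeholder data: random "embeddings" with labels from a hidden linear rule.
rng = np.random.default_rng(0)
n_clips, n_classes = 200, 3
hidden_rule = rng.normal(size=(EMBEDDING_DIM, n_classes))
X = rng.normal(size=(n_clips, EMBEDDING_DIM))
Y = (X @ hidden_rule > 0).astype(float)

weights, bias = train_multilabel_head(X, Y)
train_accuracy = (predict(X, weights, bias) == Y.astype(bool)).mean()
print(f"training accuracy on placeholder data: {train_accuracy:.2f}")
```

Only the small classifier head is trained here, with the embeddings held fixed. Fine-tuning the pretrained weights themselves would require loading the network in a deep learning framework.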

After this talk, audience members will understand the motivation behind using deep learning for audio processing, be able to locate and use publicly available audio data for experimentation, and be able to use Python with TensorFlow to classify audio samples.



Session Overview

ODSC East 2020: Audio Event Detection via Deep Learning in Python

Instructor Bio:

Robert Coop, PhD

Director of Machine Learning | phData


With more than seven years of experience in the data science space, Dr. Robert Coop (who goes by Coop) is an expert at establishing data science teams in corporate environments. As the Director of Machine Learning at phData, Coop brings his expertise to help phData build its machine learning practice and provide the same leadership to its customers. Previously, Coop founded and led the Artificial Intelligence and Machine Learning team at Stanley Black & Decker, a team that focuses on applying cutting-edge algorithms to solve enterprise problems at scale. Coop received his doctorate from the University of Tennessee, where he studied deep learning, neural network training algorithms, and ensemble techniques.