Twitter is what's happening in the world right now. To understand and organize content on the platform, we leverage a semantic text representation that is useful across a variety of tasks. Because content on Twitter spans a wide range of topics and is constantly changing, traditional supervised training, which relies on human-annotated corpora, is expensive and does not scale.
Sijun He and Kenny Leung share their experience building and serving self-supervised content representations for heterogeneous content on Twitter. They also highlight various applications of content embeddings in recommendation systems, as well as the engineering challenge of maintaining such embeddings at scale.




ODSC West 2020: Building Content Embedding with Self-Supervised Learning


Instructor Bios:

Sijun He

Machine Learning Engineer | Twitter Cortex

Sijun He is a machine learning engineer at Twitter Cortex, where he works on content understanding with deep learning and NLP. Previously, he was a data scientist at Autodesk. Sijun holds an MS in statistics from Stanford University.

Kenny Leung

Machine Learning Engineer | Twitter Cortex

Kenny Leung is interested in bringing human capabilities (particularly language, vision, and the acquisition of everyday knowledge) to modern technology.