Course Abstract

The Transformer has been around for a while now, yet it remains one of the most interesting models in modern deep learning. Recent advances in Transformer research have demonstrated its domain-agnostic nature. Although the Transformer was initially applied to seq2seq NLP tasks on 1D sequences of text, its 1D input can originate from more complex domains. For example, a 2D image unrolled into a long 1D sequence of pixels can still be understood in terms of its 2D image characteristics, such as object appearance and category, and can even be used to predict the next image appearance in very long sequences. Recent research shows that Transformer-based architectures for computer vision often tend to be simpler, while providing performance that is at worst on par with modern architectures, such as R-CNNs, used for computer vision tasks. The course focuses on recent research into Transformer applications in new domains.
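
As a rough illustration of the unrolling idea described above, the following minimal NumPy sketch flattens a 2D image into a 1D sequence and keeps track of where each element came from. The function names (image_to_sequence, sequence_positions) and the row-major ordering are assumptions made for this example, not something prescribed by the course.

```python
import numpy as np

def image_to_sequence(image):
    """Flatten an (H, W, C) image into an (H*W, C) sequence in row-major order."""
    height, width, channels = image.shape
    return image.reshape(height * width, channels)

def sequence_positions(height, width):
    """Return the (row, col) coordinate of every sequence element, shape (H*W, 2)."""
    rows, cols = np.divmod(np.arange(height * width), width)
    return np.stack([rows, cols], axis=1)

# Toy 4x6 RGB "image" (hypothetical data, for illustration only).
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)
sequence = image_to_sequence(image)    # 24 "tokens", each with 3 channels
positions = sequence_positions(4, 6)   # 2D origin of each token
```

The positions array is the kind of information a positional encoding would need to carry so that the model can reason about 2D structure despite the 1D input.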


Learning Objectives

  • Know the pros and cons of applying solutions that preceded the Transformer, such as RNNs, LSTMs, and CNNs.

  • Understand the essential theory of transformer-based deep neural networks, including their most important building blocks.

  • Learn the modern approaches to solving computer vision problems using components of transformer-based architectures.


Instructor Bio:

Director, R&D | UBS

Michał Chromiak, PhD

Dr Michał Chromiak is a Director at UBS, contributing to text document analytics and leading efforts to democratize AI in the financial sector and to investigate applications for multiple ML-based tasks. Michał is also a member of the Department of Intelligent Systems at Maria Curie-Skłodowska University. His past research on integrating big data from distributed and heterogeneous sources led him to concentrate on data perception using deep learning. He is interested in understanding and improving the ways modern deep learning algorithms generalize, and in finding their best-suited AI applications. He is fascinated by how deep learning can be improved to match and exceed biological forms of intelligence in terms of performance.


Course Outline

Module 1: Introduction to Attention and Transformers 

- The concept of attention and its applications before it became an integral part of the transformer architecture.

- Key building blocks of the transformer architecture, with their intuition and applications.
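
To make the attention building block concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single head. It assumes the queries, keys, and values have already been projected, and it omits masking and multi-head logic; the function names are chosen for this example only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ values, weights

# Toy example with random data (hypothetical sizes).
rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 8))
keys = rng.normal(size=(7, 8))
values = rng.normal(size=(7, 4))
output, weights = scaled_dot_product_attention(queries, keys, values)
```

Each output row is a weighted average of the value vectors, with weights determined by how well the corresponding query matches each key.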

Module 2: Transformers for object detection

- The intuitions behind recent research.

- How the transformer architecture, originating from NLP, has proven to be suitable for the computer vision domain.

- State-of-the-art research with examples.

- How 2D images can be used with Transformers for object detection.

Module 3: Slot Attention 

- Extracting object-centric representations with Slot Attention 

- Enabling generalization to unseen compositions with Slot Attention.

- Relation to unsupervised object discovery from images. 

- Explaining Slot Attention based on recent research.
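
The core idea of Slot Attention, where slots compete for input features, can be sketched as follows. This is a heavily simplified illustration: it assumes no learned projections, layer normalization, GRU, or MLP, and it replaces the learned slot update with the attention-weighted mean of the inputs. The function name slot_attention_step and all sizes are hypothetical.

```python
import numpy as np

def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified iteration: slots (K, D) compete for inputs (N, D)."""
    d = slots.shape[-1]
    logits = inputs @ slots.T / np.sqrt(d)        # (N, K)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over slots: competition
    # Normalize over inputs so each slot takes a weighted mean of its inputs.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    # Simplification: the weighted mean is used directly as the new slot value.
    return weights.T @ inputs, attn

# Toy example: 16 input features, 3 slots (random data, for illustration only).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(16, 8))
slots = rng.normal(size=(3, 8))
for _ in range(3):
    slots, attn = slot_attention_step(slots, inputs)
```

The distinguishing design choice is that the softmax runs over the slot axis rather than the input axis, so each input feature distributes its attention among the slots.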

Background knowledge

  • This course is for current or aspiring Data Scientists, Machine Learning Engineers, and NLP Practitioners

  • Knowledge of the following tools and concepts:

  • Attention, Transformers, and Convolutional Neural Networks

Real-world applications

  • Transformers are used in Natural Language Processing applications in finance, healthcare, education, and other fields, and are widely used by Google for its search engine autocomplete.

  • A more recent application of transformers is end-to-end object detection, a method recently launched by Facebook AI Research.