Natural language processing (NLP) has advanced significantly in recent years and is now used for many tasks that once required substantial human involvement. In this seminar, we will share our experience applying NLP techniques to a robotic process automation problem.

In particular, we discuss the challenges of automating the extraction of the Table of Contents from PDF research reports, a task made difficult by the heterogeneity of report formats and by the many arbitrary constraints that were designed for humans to handle.

Using a combination of approaches, ranging from transfer learning with a Single Shot Detector model to rule-based methods, we highlight situations where such a hybrid approach might prove useful.

We will also cover how to break down and scope a complex set of data science requirements, as well as common pitfalls in handling a dynamic set of requirements that were originally tailored for human operators.

Instructors' Bios

Jia Hui Chow, Data Scientist at Refinitiv

Currently a Data Scientist at Refinitiv, Jia Hui has experience with NLP tasks such as named entity recognition and relation extraction for both English and Chinese. She built a deep learning model that was released in an existing product, Eikon, and has also released a relation extraction model for a risk intelligence tool.

Melvin Perera, Data Scientist at Refinitiv

Currently a Data Scientist at Refinitiv, Melvin uses natural language processing to improve Refinitiv’s existing product offerings (TRIT, Eikon, World-Check) for information retrieval in the Asian markets. He is currently working on named entity recognition techniques to improve the quality of news document tagging in Chinese.

Local ODSC chapter in Singapore

Use discount code Meetup2020 to get an extra 10% off your pass for Virtual Conference West and Virtual Conference APAC.

Automating human tasks using NLP: Table of Contents generation for PDF research reports
