Beyond OCR: Using Deep Learning to Understand Documents
Extracting key fields from a variety of document types remains a challenging problem. Services such as AWS and Google Cloud provide text extraction services to "digitize" images or PDFs. These services use OCR techniques and return phrases, words, and characters with their corresponding coordinate locations. Working with these outputs remains challenging and does not scale, because different document types require different heuristics and new types are uploaded daily. Additionally, OCR does not attempt to understand the document; for example, dollar amounts must be numeric, yet OCR may read a “1” as an “l”. Furthermore, a performance ceiling is reached even when parsing algorithms work perfectly: while third-party OCR is excellent, it is not perfectly accurate.
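To make the heuristic burden concrete, here is a minimal sketch of the kind of post-correction rule a parser ends up needing for a single field type. The function name `parse_amount` and the look-alike table are hypothetical illustrations, not part of any OCR service's API, and the table is far from exhaustive.

```python
import re
from typing import Optional

# Hypothetical table of characters that OCR commonly confuses with digits.
CONFUSIONS = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0", "S": "5"})


def parse_amount(ocr_token: str) -> Optional[float]:
    """Try to read an OCR token as a dollar amount, repairing look-alike characters."""
    cleaned = ocr_token.strip().lstrip("$").replace(",", "").translate(CONFUSIONS)
    # Accept only plain decimal numbers with at most two fractional digits.
    if re.fullmatch(r"\d+(\.\d{1,2})?", cleaned):
        return float(cleaned)
    return None
```

A rule like this only covers amounts; dates, invoice numbers, and vendor names each need their own table, which is why the approach becomes hard to maintain as document types multiply.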
We propose an end-to-end, scalable solution built on a deep learning architecture that connects a computer vision component to a sequence generation component. By training on millions of documents, the model learns document patterns and characteristics and extracts important fields directly from raw documents. We show a marked improvement in accuracy compared to third-party OCR services. Additional benefits include character-level probabilities that serve as confidence scores, and the ability to apply explainability algorithms such as SmoothGrad and occlusion to determine which areas of the document are responsible for the predictions.
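The two benefits mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the model described in the talk: `field_confidence` assumes the field score is the joint probability of its characters, `occlusion_map` is a generic occlusion-sensitivity loop, and `toy_score` is a hypothetical stand-in for a trained model's score on an image.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale page as rows of pixel intensities


def field_confidence(char_probs: Sequence[float]) -> float:
    """Combine per-character probabilities into one field-level score.

    Assumption: the score is the joint probability of the decoded string,
    computed in log space to avoid underflow on long fields.
    """
    if not char_probs or min(char_probs) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in char_probs))


def occlusion_map(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> List[List[float]]:
    """Mask one patch at a time and record how much the score drops."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop  # large drop = region mattered
    return heat


def toy_score(image: Image) -> float:
    """Hypothetical stand-in for a model: only the top-left 2x2 region matters."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0
```

Calling `occlusion_map(image, toy_score)` on a uniform image yields a nonzero drop only over the top-left patch, which is the behavior one would look for in a real model: high values over the region that actually contains the extracted field.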
Bill.com is working to build a paperless future. We parse millions of documents a year, including invoices, contracts, receipts, and a variety of other types. Understanding those documents is critical to building intelligent products for our users.
Overview and Author Bio
Eitan Anzenberg, PhD