In industries from government services to insurance, paper forms that were designed for physical use are increasingly sent over the internet. In the past, these forms have proved difficult to integrate with digital tools and applications.
As a result, demand has increased for robust Optical Character Recognition (OCR) tools that turn written forms and tables into machine-readable text. Document processors that require users to define a template are excellent for certain applications, but they often have trouble recognizing text in scanned or faxed documents. Amazon introduced Textract in late 2018 as a template-free OCR solution that both extracts and analyzes text.
On April 28, Kayla Cross, a software engineer on the Security Control team at Evans & Chambers, held a virtual Tech Talk to provide an overview of Textract’s functionality. She outlined its advantages, such as reducing the burden on the user, and highlighted its easy-to-use API. She also described its possible uses, including identifying text for natural language processing, analyzing data with multiple columns, and parsing complex government forms like DD254s and SF86s. She ended with two demonstrations that showed different approaches to using Textract in the Amazon ecosystem: one parsed a government form, and the other parsed a picture of a blog post taken with a phone.
Tools like Textract can spare users the arduous experience of re-entering information that is already contained in a form into separate software, which makes them an attractive solution for developers seeking to improve user experiences and streamline their applications.
About the EC Tech Talk Series
The Tech Talk Series is an employee-led platform dedicated to EC’s core value of continual learning. These talks aim to cover a broad range of technology-based topics to promote the sharing of best practices and ideas across EC’s project teams.