Cracking The Code Behind Intelligent Document Processing (IDP)

September 23, 2022

Intelligent Document Processing (IDP) is a powerful technology that enables the end-to-end process automation of scanning and processing documents.
Learn more about the history of IDP, the techniques employed, and the benefits of using IDP in modern business environments.

Whether because of their environmental or logistical benefits, paperless systems that rely primarily on digitized documents have been steadily gaining traction among businesses and organizations worldwide.

However, digitizing paper documents is traditionally perceived as both time-consuming and costly. But what if there were a solution that not only eases this process but also introduces new features that take advantage of the latest advancements in AI?

Enter Intelligent Document Processing (IDP).

What is IDP?

IDP is a software solution that automates data extraction from different forms of scanned text and converts the obtained information into machine-encoded, editable, and searchable content. Beyond data extraction, IDP imitates human behavior by analyzing and contextualizing documents. To do so, it draws on several areas of machine learning and AI.

Deploying an IDP solution can greatly streamline a business’s data capture process from start to finish. IDP tasks such as reading, interpreting, and transforming documents into structured, usable, and organized data are all automated, leaving little margin for error and increasing work efficiency in a business environment.

IDP vs OCR

When discussing IDP, the term OCR often gets thrown around. OCR, which stands for Optical Character Recognition, is an older technology that converts a scanned image into text by recognizing one character at a time. One could argue that OCR is a stripped-down version of IDP, as the former lacks most of the automation features found in the latter. But in reality, both technologies have their specific use cases, even in today’s ever-evolving digital world.

To fully understand the origins of IDP, we must first delve into the history of its precursor and parent technology, OCR.

How did it all start?

A very early form of OCR can be traced back to 1913 when Irish physicist Dr. Edmund Fournier d’Albe invented the Optophone, a device designed to help the visually impaired consume written content. The Optophone scanned words on a page and converted letters into audible tones.

Another primitive form of OCR was used in telegraphy in the 1930s, when the then-new teleprinter converted printed characters to electrical impulses.

Omni-font OCR emerged in the 1970s and 1980s, when postal services and other governmental organizations started to employ OCR techniques to scan official documents.
During that time, OCR software was steadily refined to improve font and character recognition, gradually ushering in the modern era of Optical Character Recognition.

Nowadays, OCR is readily available and accessible through various cloud services and desktop/mobile-based applications.

OCR limitations

While OCR can extract information from scanned documents, it cannot determine the context of that information. OCR has no way to interpret what it extracts or to identify what each piece of information belongs to, which rules out a fully automated end-to-end system built on OCR alone.

For example, if a workflow requires the extraction of specific fields in a scanned identification card such as name, date of birth, and SSN, OCR can seamlessly transform the information listed under the fields into machine-encoded text.

But OCR, by itself, is incapable of associating the extracted information with its respective fields. In fact, associating the information with specific fields or properties is part of the post-OCR processing phase, and is often performed manually.

This is where IDP outperforms OCR.

How does IDP work?

IDP technology incorporates several machine learning techniques, including OCR, to extract specific data from different types of text documents, then determine the relevant context of that data.

With the advancement of AI technologies, implementing an IDP pipeline can be achieved through a variety of methods. But, at the time of writing this article, there is no end-to-end system that can apply IDP out-of-the-box.

It is, however, possible to achieve strong results with IDP today by combining image processing and deep learning techniques.

To form a clear understanding of the techniques and thought process behind the IDP technology, let’s go over the entire IDP framework that we at CME usually apply.

1. Image Segmentation

The first step of our IDP process aims at locating the target document in a snapshot image.
Image Segmentation identifies and selects the area where the document is located and then crops out the rest of the image.
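
To make the idea concrete, here is a minimal sketch of locating and cropping a document. It swaps in a much simpler technique than a production system would use: plain brightness thresholding followed by a bounding box, under the assumption that the document is a bright region on a dark background. The image representation (a list of pixel rows) and the names `document_bounding_box` and `crop` are illustrative assumptions, not part of any specific product.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def document_bounding_box(image: Image, threshold: int = 128) -> Box:
    """Return the smallest box containing every pixel brighter than threshold."""
    rows = [y for y, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] > threshold for row in image)]
    if not rows or not cols:
        raise ValueError("no document-like region found")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the box and discard the rest of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# A dark 5x6 snapshot with a bright 2x3 "document" inside it (made-up data).
snapshot = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 200, 210, 220, 10],
    [10, 10, 205, 215, 225, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
box = document_bounding_box(snapshot)
document = crop(snapshot, box)
```

Real snapshots rarely have such a clean background, which is why learned segmentation models are typically preferred over a fixed threshold.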

2. Document Orientation Correction

Now that the subject document has been identified, the second step entails fixing orientation issues. Our IDP solution automatically detects and corrects the degree of skew of the subject document. Having a perfectly aligned document greatly enhances the accuracy of data extraction.
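
As a rough illustration of skew correction, the sketch below assumes the two top corners of the document are already known (for example, from the segmentation step) and derives the skew angle from the slope of the top edge. The function names and the corner coordinates are hypothetical; a real pipeline would then rotate the whole image by the negative of that angle, and the sign convention depends on the image coordinate system in use.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y)


def skew_angle_degrees(top_left: Point, top_right: Point) -> float:
    """Angle between the document's top edge and the horizontal axis."""
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point: Point, center: Point, degrees: float) -> Point:
    """Rotate a point around a center using the standard 2D rotation matrix."""
    theta = math.radians(degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


# Made-up corners of a document whose top edge is tilted by 45 degrees.
top_left, top_right = (0.0, 0.0), (100.0, 100.0)
angle = skew_angle_degrees(top_left, top_right)
# Rotating by the opposite angle brings the top edge back onto the horizontal.
corrected = rotate_point(top_right, top_left, -angle)
```

After correction, the top-right corner has the same vertical coordinate as the top-left one, which is what "perfectly aligned" means in this simplified setting.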

3. Field Detection

During this step, our IDP solution runs an initial analysis of the subject document to identify the available fields.
If the subject document is a passport card, fields with labels such as first name, last name, and date of birth are automatically identified and marked for later use.
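
One way to picture the "marked" fields is as labeled boxes with a confidence score. The sketch below does not detect anything itself: it assumes some detector (not shown, and not specified in this article) has already produced candidate boxes, and it only keeps the most confident candidate per field. The `Detection` class, the `select_fields` helper, the field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    label: str                      # field name, e.g. "first_name"
    box: Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels
    score: float                    # detector confidence in [0, 1]


def select_fields(detections: List[Detection], min_score: float = 0.5) -> Dict[str, Detection]:
    """Keep the highest-scoring detection per label and ignore weak candidates."""
    best: Dict[str, Detection] = {}
    for det in detections:
        if det.score < min_score:
            continue
        if det.label not in best or det.score > best[det.label].score:
            best[det.label] = det
    return best


# Made-up detector output for a passport card.
raw = [
    Detection("first_name", (40, 120, 70, 300), 0.93),
    Detection("first_name", (42, 118, 72, 298), 0.61),
    Detection("last_name", (80, 120, 110, 300), 0.88),
    Detection("date_of_birth", (120, 120, 150, 260), 0.32),
]
fields = select_fields(raw)
```

Here the low-confidence date of birth candidate is dropped, so a downstream step could flag that field for manual review instead of passing a doubtful crop to OCR.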

4. Image Processing

After determining the relevant fields, our IDP solution proceeds by isolating/cropping those fields and performing a set of image processing routines on them, such as image denoising.
This step reduces image noise and streak artifacts as much as possible before the cropped fields are passed along to the OCR phase.
A clean image significantly improves the OCR accuracy rate.
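
Denoising can take many forms. As one simple example, the sketch below applies a 3x3 median filter, a classic way to remove isolated specks. It is an assumption that a median filter suits a given document type; the article does not state which denoising routine is used, and the `median_filter` helper and sample pixels are illustrative only.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def median_filter(image: Image) -> Image:
    """Replace each pixel with the median of its 3x3 neighborhood.

    Border pixels use only the neighbors that actually exist.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median_low(window))
        result.append(row)
    return result


# A white field crop with a single dark speck (made-up data).
noisy = [
    [255, 255, 255, 255],
    [255, 0, 255, 255],
    [255, 255, 255, 255],
]
clean = median_filter(noisy)
```

The speck disappears because it is never the median of its neighborhood. The trade-off is that a median filter can also erode thin strokes, so the window size should stay small relative to the text.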

5. OCR

Each of the previously cropped fields is processed separately in OCR. Since the fields have already been marked in the Field Detection phase, the OCR software just has to extract the information listed under each field.
In the case of our subject passport card, here are the obtained OCR results:

– Card No: “Passport Card No \n C15326485”

– First Name: “Given Names \n JOHN”

– Last Name: “Surname \n SMITH”

6. Text Processing

The final step in our IDP process involves automated text cleaning and correction. The text extracted by the OCR software gets processed using NLP techniques such as regular expressions, entity recognition, and spelling correction.

Upon process completion, the cleaned text will be ready to integrate into a business workflow.
Here is what the final result looks like when our IDP process is completed:

– Card No: C15326485
– First Name: John
– Last Name: Smith
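
The cleanup from raw OCR strings to the final values can be sketched with regular expressions alone. The example below covers only that part of the text processing step (entity recognition and spelling correction are left out), and the card number pattern is an assumption inferred from the single sample above, not an official format. The helper names are hypothetical.

```python
import re
from typing import Dict

# Assumed format: one uppercase letter followed by eight digits.
CARD_NUMBER = re.compile(r"^[A-Z]\d{8}$")


def clean_field(raw: str) -> str:
    """Drop the printed label (everything up to the last line break) and trim spaces."""
    return raw.rsplit("\n", 1)[-1].strip()


def process(ocr_output: Dict[str, str]) -> Dict[str, str]:
    """Clean every field, validate the card number, and normalize name casing."""
    cleaned = {name: clean_field(raw) for name, raw in ocr_output.items()}
    if not CARD_NUMBER.match(cleaned["Card No"]):
        raise ValueError(f"unexpected card number: {cleaned['Card No']!r}")
    for name in ("First Name", "Last Name"):
        cleaned[name] = cleaned[name].title()
    return cleaned


ocr_output = {
    "Card No": "Passport Card No \n C15326485",
    "First Name": "Given Names \n JOHN",
    "Last Name": "Surname \n SMITH",
}
result = process(ocr_output)
```

Validating against a pattern is what lets the pipeline reject a misread value instead of silently passing it on. Note that `str.title()` is a simplification that mishandles some names (for example, "MCDONALD" becomes "Mcdonald").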

At first glance, our IDP approach might seem complex. However, in practice, the process provides highly accurate results, especially for business workflows that require a specific set of information to be extracted from a scanned image or document.

Benefits of IDP

Up to this point, we have established that IDP is a ground-breaking technology that essentially enables a machine to read and contextualize text. But what does it offer to businesses looking to increase their operational abilities? And how can it enhance a business’s day-to-day workflow?

Here are four key benefits of using IDP in your business:

1. Reducing Cost

By adding IDP to a business process, you reduce the labor cost of transcribing and editing scanned documents and images. IDP also helps decrease costs in the areas of data entry, printing, copying, office equipment, and consumables.

2. Increasing Speed

Given that there is a fixed number of hours in the workday, it is crucial for every business to make the most of that time by keeping manual, time-consuming work to a bare minimum. Using AI techniques, IDP can significantly decrease the processing time of a document, allowing your employees to focus on their core competencies.

3. Reducing Errors

While not foolproof, IDP can greatly help you reduce human errors, and even traditional OCR errors. As a machine learning technology, IDP can operate with minimal human intervention and improve over time by learning from its mistakes.

4. Improving Availability

By leveraging the features of IDP to scan and extract information from images and documents, you can make the extracted information available to multiple systems while eliminating the delay in locating and retrieving that information.

Wrapping up

IDP is a highly effective solution that automates the scanning and processing of images and documents. It can save businesses great amounts of time and money by reducing the need for manual data extraction and processing, and enabling workflows to run automatically with minimal human intervention.

If your business still relies on traditional data entry and document processing methods, it may be worth integrating an IDP solution into your day-to-day workflow soon.