Magazine Article | October 23, 2006

Speeding Up Document Capture

Source: Field Technologies Magazine

Electronic document imaging and validation yield big benefits for AncestryDPS.

Integrated Solutions, November 2006

AncestryDPS is part of Provo, UT-based, Inc., which bills itself as a network for connecting families via the Internet by providing consumers with online access to information about their family trees. AncestryDPS processes 100,000 document pages per day, including birth/death records, military records, community directories, immigration records, and census records.

Not long ago, the company began looking for a document capture solution to replace a manual data entry procedure that was excessively time-consuming and, given the volume of pages handled daily, increasingly impractical. “We had staff who would manually key in information and process documents by hand, but considering the complexity of many of our projects, as well as the effort involved, we knew this wasn’t the best approach,” says Shawn Reid, AncestryDPS’ development director.


Reid and his team first looked at several off-the-shelf options, none of which proved suitable. “By this point, we had, in preparation for moving away from manual methods, developed several homegrown add-on solutions to suit the special requirements and challenges presented by the type of documents we handle,” Reid explains. One such tool defined settings for brightness, contrast, and other parameters and applied these settings to batches of images. However, the configuration of the off-the-shelf products allowed for neither an interface with the proprietary modules nor other customization.
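The article does not describe how the batch settings tool was built. As a rough illustration only, the idea of defining one set of brightness and contrast parameters and applying it to a whole batch of images might look like the following sketch. The names `ImageSettings`, `apply_settings`, and `apply_to_batch` are hypothetical, and images are modeled as plain lists of grayscale pixel values rather than real scan files.

```python
from dataclasses import dataclass
from typing import List

# Assumption: a grayscale image is a row-major grid of pixel values in 0-255.
Image = List[List[int]]


@dataclass(frozen=True)
class ImageSettings:
    """Hypothetical settings profile shared by every image in a batch."""
    brightness: int = 0    # offset added to each pixel after contrast scaling
    contrast: float = 1.0  # scale factor applied around mid-gray (128)


def apply_settings(image: Image, settings: ImageSettings) -> Image:
    """Return a new image with contrast and brightness applied, clamped to 0-255."""
    def adjust(pixel: int) -> int:
        value = (pixel - 128) * settings.contrast + 128 + settings.brightness
        return max(0, min(255, round(value)))

    return [[adjust(pixel) for pixel in row] for row in image]


def apply_to_batch(batch: List[Image], settings: ImageSettings) -> List[Image]:
    """Apply one settings profile to every image in the batch."""
    return [apply_settings(image, settings) for image in batch]


# Two tiny one-row "scans" processed with the same illustrative profile.
faded_scan_profile = ImageSettings(brightness=10, contrast=1.5)
batch = [[[100, 128, 200]], [[0, 255, 64]]]
adjusted = apply_to_batch(batch, faded_scan_profile)
```

Keeping the settings in a single immutable profile means an operator could tune the values once on a sample page and then reuse them unchanged for the rest of a batch of similar documents.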

AncestryDPS then turned to integrator DoxTek, which recommended that the company implement the Ascent platform and INDICIUS solution from Kofax because of these products’ abilities to process and classify unstructured (e.g., handwritten census forms), semistructured (e.g., printed birth certificates containing some handwritten data), and structured (e.g., telephone directories) documents. The availability of an open application program interface (API) from the vendor helped clinch the deal.

In tandem with DoxTek and the Kofax Professional Services Team, AncestryDPS developed several more custom modules and, via the API, interfaced them with the new technology. For example, the integrator built an audit module designed to assess the accuracy of data keyed by offshore operators before that data flows electronically into the new system. Another custom module converts images to the JPEG2000 compression format required for publication on the Web site. The same module also converts images to the format required by INDICIUS.
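The article gives no detail on how the audit module measures accuracy. One common approach, shown here purely as an assumption, is to compare a sample of operator-keyed records against a trusted second keying of the same records and to hold back the batch if too few fields agree. The function names, the field names, and the 98% threshold are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def field_accuracy(keyed: List[Record], reference: List[Record]) -> float:
    """Fraction of reference fields that the keyed records reproduce.

    Comparison ignores case and surrounding whitespace (an assumption).
    """
    if len(keyed) != len(reference):
        raise ValueError("keyed and reference samples must be the same length")
    total = 0
    matches = 0
    for keyed_record, reference_record in zip(keyed, reference):
        for field, expected in reference_record.items():
            total += 1
            actual = keyed_record.get(field, "")
            if actual.strip().lower() == expected.strip().lower():
                matches += 1
    return matches / total if total else 1.0


def batch_passes(keyed: List[Record], reference: List[Record],
                 threshold: float = 0.98) -> bool:
    """Accept the batch only if sampled accuracy meets the (hypothetical) threshold."""
    return field_accuracy(keyed, reference) >= threshold


# Illustrative sample: one of six fields was keyed incorrectly.
reference_sample = [
    {"surname": "Smith", "given": "John", "year": "1901"},
    {"surname": "Jones", "given": "Mary", "year": "1911"},
]
keyed_sample = [
    {"surname": "smith", "given": "John", "year": "1901"},
    {"surname": "Jones", "given": "Mary", "year": "1917"},
]
accuracy = field_accuracy(keyed_sample, reference_sample)
accepted = batch_passes(keyed_sample, reference_sample)
```

In this sample five of six fields match, so the batch would fall below the illustrative threshold and be held for review rather than passed downstream.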

All modules reside on AncestryDPS’ main server. Documents are scanned with a variety of hardware, such as the Kurta optical character recognition (OCR) robotic scanner. A custom module parses all data entries according to company-defined parameters and imports them to Ascent Capture, which digitizes entire documents or extracted data and uses OCR, intelligent character recognition (ICR), and optical mark recognition (OMR) to recognize machine- and hand-printed text. Customized image processing, review, and conversion are performed using the company’s proprietary tools.

The digitized documents or data are then imported to INDICIUS, which classifies, corrects, and validates AncestryDPS’ digitized data and images. The final data is then output to a file structure for import to the appropriate Web site.


According to Reid, automating document capture via the system is helping the company reap big gains in efficiency and accuracy. For example, AncestryDPS recently faced the challenge of capturing 72 million entries from British Telecom telephone directories published between 1880 and 1984. While it took operators 20 minutes to manually enter one page of a directory (about 300 listings), capturing the same page via OCR and validating it using the new solution required only 3 to 4 minutes.
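To put those per-page figures in perspective, the short calculation below scales them to the whole directory project. It is a back-of-the-envelope sketch, not a figure reported by the company: it assumes every page holds about 300 listings and counts only per-page processing time, ignoring scanning, setup, and review overhead.

```python
# Figures taken from the article.
TOTAL_ENTRIES = 72_000_000
LISTINGS_PER_PAGE = 300          # "about 300 listings" per directory page
MANUAL_MIN_PER_PAGE = 20
AUTO_MIN_PER_PAGE = (3, 4)       # low and high ends of the quoted range

# Assumption: all pages are roughly the same size.
pages = TOTAL_ENTRIES // LISTINGS_PER_PAGE

# Total processing time in hours for each method.
manual_hours = pages * MANUAL_MIN_PER_PAGE / 60
auto_hours = tuple(pages * minutes / 60 for minutes in AUTO_MIN_PER_PAGE)

# Percentage reduction in per-page time, using integer arithmetic.
reduction_pct = tuple(
    100 * (MANUAL_MIN_PER_PAGE - minutes) // MANUAL_MIN_PER_PAGE
    for minutes in AUTO_MIN_PER_PAGE
)

print(f"Pages: {pages:,}")
print(f"Manual keying: {manual_hours:,.0f} hours")
print(f"Automated capture: {auto_hours[0]:,.0f} to {auto_hours[1]:,.0f} hours")
print(f"Per-page time reduction: {reduction_pct[1]}% to {reduction_pct[0]}%")
```

Under these assumptions the project covers roughly 240,000 pages, and the per-page time drops by 80% to 85%, which would cut about 80,000 hours of manual keying to somewhere between 12,000 and 16,000 hours.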

“The Ascent platform has improved the processing of our documents by 85%,” Reid notes. “And adding the INDICIUS component means we can automatically capture both handwritten and printed information from all types of documents with a high degree of validation, ensuring that business processes run as smoothly as possible and allowing us to better serve the customers who access our Web site.”