Magazine Article | February 21, 2008

Data Capture In An 'Unstructured' World

Source: Field Technologies Magazine

Template-driven forms processing is giving way to semistructured and unstructured data capture options, but how do you know if the solution is right for you?


Integrated Solutions, March 2008

ECM (enterprise content management) software continues to push the envelope of data collection and management. This is particularly true in the capture market, where forms processing solutions have escaped the bounds of static, template-driven applications. Forms-intensive business functions, such as those in the accounting, finance, and healthcare markets, have driven the development of semistructured and unstructured forms processing solutions that employ advanced recognition and classification technologies.

"Educating the SMB market on the benefits of unstructured data capture solutions can prove to be difficult," says Julie Lautsch, director of marketing and channel sales at Questys Solutions. "In larger enterprises, the labor cost of manual data processing remains high when you consider the massive volumes of paper these organizations have to handle. By contrast, in smaller firms, the cost savings is not necessarily as evident." In today's data capture market, however, there are solutions designed to offer advanced capabilities on an SMB budget. When the increase in labor efficiency is factored in, SMBs can easily justify the purchase of an unstructured data capture package. "In our economy, freeing up as little as one or two employees' time to perform other duties can prove to be immensely beneficial," continues Lautsch.

Chris Preston, senior director of content management and archiving at EMC Corporation, agrees with this line of thinking. "The cost savings an enterprise-level business can achieve are often easy to quantify, and the benefits, like accelerated processing, become apparent when demonstrating the solution," says Preston. "But this does not exclude small and medium businesses from unstructured forms processing solutions. Smaller organizations also can benefit from classification and data extraction technologies, but need to look for specific paper-based processes within their organizations that are most critical to their operations." Essentially, any organization that finds it has paper-intensive processes requiring manual presorting and keying of business data can benefit from an advanced forms processing solution.

Organizations of any size, enterprise or SMB, should take the time to perform a thorough needs analysis prior to tackling a semistructured or unstructured forms solution. It is easy to get caught up in the allure of the automation benefits that can be achieved with unstructured forms processing, but the expense warrants due diligence. Analyzing the specific challenges and business processes to be addressed will clearly outline the requirements a data capture solution will need to satisfy.

Just as important as a needs analysis is the selection of an evaluation team. The evaluation team should include not only those who will develop the system, but also those who will support it, those who will use it every day, and those who will be involved in measuring the project's success and benefits. This group should be able to examine the organization's financial and operational circumstances with a skeptical eye, and then proceed to evaluate prospective vendors if a purchase or upgrade is deemed appropriate.

Once prospective vendors have been chosen, an often overlooked or rushed step is the proof-of-concept stage. "In order to successfully plan for an implementation of semistructured or unstructured forms processing, we recommend that a small pilot be conducted," says Randy Blevins, executive vice president at EDAC Systems, Inc. "This pilot should include a representative sampling of the different document types and meet the project's index requirements." This sample would likely include documents with varying paper sizes, as well as different layouts and formats. For example, one form may be set up in a vertical, columnar format, while the next carries data in a horizontal, linear framework. The sample sets should also be in random order and not presorted by size or type.

"All too often, vendors can show a high success and accuracy rate with a small sampling of documents provided by the client," says Lautsch. "However, in many cases these vendors tweak their systems to produce 100% accuracy with a small sampling, when true production accuracy levels would be lower." A simple way to monitor and validate a vendor's results is to provide a large sampling of documents and allow the vendor to configure (and optimize) the system to run that sample. Then require the vendor to run the initial sample set again, followed by another large sample set that the vendor has never seen before. The results on the unseen set will give a far more representative picture of how the system will operate in your daily environment. "Key advice to anyone looking to implement an unstructured data capture solution is to establish a true accuracy and success rate and decide if such a success rate truly provides a rapid return on investment," says Lautsch.
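The comparison between the tuned sample and the unseen sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the field-level definition of accuracy (extracted value matches the hand-keyed value), and the invoice data are all hypothetical and do not come from any vendor's product.

```python
def field_accuracy(extracted, hand_keyed):
    """Fraction of hand-keyed fields that the capture system extracted correctly.

    Both arguments map a (document id, field name) pair to a string value.
    A field missing from `extracted` counts as an error.
    """
    if not hand_keyed:
        raise ValueError("hand_keyed must contain at least one field")
    matches = sum(
        1 for key, value in hand_keyed.items() if extracted.get(key) == value
    )
    return matches / len(hand_keyed)


def compare_runs(tuned_accuracy, unseen_accuracy):
    """Summarize how far accuracy drops once the vendor leaves the tuned sample."""
    return {
        "tuned": tuned_accuracy,
        "unseen": unseen_accuracy,
        "gap": tuned_accuracy - unseen_accuracy,
    }


# Hypothetical data: the sample the vendor was allowed to optimize against.
tuned_truth = {
    ("inv-001", "total"): "120.00",
    ("inv-001", "date"): "2008-02-21",
}
tuned_output = dict(tuned_truth)  # the tuned system gets every field right

# Hypothetical data: a sample the vendor has never seen before.
unseen_truth = {
    ("inv-101", "total"): "75.50",
    ("inv-101", "date"): "2008-03-01",
    ("inv-102", "total"): "310.00",
    ("inv-102", "date"): "2008-03-04",
}
unseen_output = dict(unseen_truth)
unseen_output[("inv-102", "total")] = "810.00"  # one misread field

report = compare_runs(
    field_accuracy(tuned_output, tuned_truth),
    field_accuracy(unseen_output, unseen_truth),
)
print(report)
```

The figure to carry into an ROI discussion is the accuracy on the unseen set, not the accuracy on the sample the system was configured against; a large gap between the two suggests the demonstration was over-tuned.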

Taking the time to thoroughly evaluate the integration capabilities of the solution is also important, because some vendors will claim that the capture application is easy to configure and set up. What users often do not realize is that these simplified, almost plug-and-play applications may not provide complete functionality, or may limit the amount of customization that can be done to meet the business's unique requirements.

Organizations must also consider more than just the document classification and data extraction capabilities of a capture solution. "It is important that the software is able to scale to handle anticipated volume requirements and also support the other key elements required to manage the entire transactional process," says Preston. "Therefore, it's extremely important for companies to require that the software vendor in question provide a complete end-to-end platform solution that will deliver compelling value not only in the area of forms processing, but also in areas of business process management, content management, archiving, and storage."

An important aspect to consider when evaluating potential solutions is the storage required by any data that will be sent to third-party or back-end systems. "For example, a customer service database that tracks customer surveys that are manually entered is growing at a rate of 50 surveys per week. In implementing a data capture solution, the throughput can be increased to 100,000 per week," says Lautsch. "Although the volumes may not be there initially, the additional throughput capability may expose additional volume that you didn't know was there, exploding your storage requirements."
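To put rough numbers on that example, the short Python sketch below projects yearly storage at both rates. The 50 and 100,000 surveys per week come from the quote; the per-survey size is purely an assumed figure for illustration (a scanned image plus index data), and the automated rate is a throughput ceiling rather than a volume the database is guaranteed to reach.

```python
WEEKS_PER_YEAR = 52
BYTES_PER_GB = 1_000_000_000

# Assumption for illustration only: the article gives no per-survey size.
ASSUMED_BYTES_PER_SURVEY = 50_000


def yearly_storage_gb(surveys_per_week, bytes_per_survey):
    """Storage consumed in one year at a steady weekly intake, in gigabytes."""
    return surveys_per_week * WEEKS_PER_YEAR * bytes_per_survey / BYTES_PER_GB


manual_rate = 50          # surveys per week, keyed by hand (from the quote)
automated_rate = 100_000  # surveys per week, maximum throughput (from the quote)

manual_gb = yearly_storage_gb(manual_rate, ASSUMED_BYTES_PER_SURVEY)
automated_gb = yearly_storage_gb(automated_rate, ASSUMED_BYTES_PER_SURVEY)
growth_factor = automated_rate / manual_rate

print(f"Manual entry:    {manual_gb:.2f} GB per year")
print(f"Automated (max): {automated_gb:.2f} GB per year")
print(f"Growth factor:   {growth_factor:.0f}x")
```

Whatever the real per-survey size turns out to be, the growth factor is the same: if the full throughput were ever used, storage would grow 2,000 times faster than it does under manual entry.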

One of the biggest challenges of any ECM software application remains the complexity of integrating it into a business solution that consists of disparate applications from multiple vendors. Examples of these applications include document capture, workflow, and business process management. Historically, these technologies were selected individually to meet specific business needs and then integrated on the back end. While these solutions provide best-of-breed capabilities in each area, the integration and ongoing maintenance challenges can be significant.

For data capture solutions, whether the focus is on structured or unstructured content, ROI is largely based on the intangible results of increased efficiency and improved business processes. Transactional processes can be streamlined and accelerated, and paper processes that traditionally took days or weeks to complete can be condensed into a few hours. Furthermore, staff once tied down by manual data entry and indexing processes can be reduced or reassigned to higher-value tasks within the organization.