Discovery

 

 

 

Once we have captured the documents from the various sources, the Discovery module automates the extraction and verification of the data and its assignment to cabinet indexes.

 

 

 

We first need to understand what type of document we are dealing with. For example, are we dealing with an invoice or a loan document? This process is called Classification. If several document types are contained in one document "blob", it must first be separated into the different types so that each can be processed. In order to classify and index documents, we need to gather information about the content.

 

 

Separation, Classification and Indexing

Document Separation - Documents can be separated using several methods:

Manual (A user can split and re-arrange pages within the classification UI)

Barcodes (Code 11, Code 39, Code 128, CodaBar, Inter2of5, EAN13, EAN18, UPCE, Add2, PDF417, ReadDataMatrix, ReadQRCode)*

Patch Codes

Business Rules (using values in the content of the document to determine where to separate)
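As a rough illustration of separation by business rule, the sketch below starts a new document at every page whose text matches a marker pattern. The function name `split_on_marker` and the "Invoice Number" marker are assumptions made for this example; this is not DocuNECT's API or DocScript syntax.

```python
import re
from typing import List

def split_on_marker(pages: List[str], marker: str) -> List[List[str]]:
    """Start a new document at every page whose text matches `marker`."""
    pattern = re.compile(marker, re.IGNORECASE)
    documents: List[List[str]] = []
    for text in pages:
        # A matching page opens a new document; so does the very first page.
        if pattern.search(text) or not documents:
            documents.append([])
        documents[-1].append(text)
    return documents

# Hypothetical page texts from one scanned batch.
pages = [
    "Invoice Number: 1001\nTotal due ...",
    "continued line items",
    "Invoice Number: 1002\nTotal due ...",
]
batches = split_on_marker(pages, r"invoice number")
```

Here `batches` holds two documents: the first with two pages, the second with one.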

Document Classification - Once the document has been separated (if required), it can be classified:

Manual (A user can split and re-arrange pages within the classification UI)

Barcodes (Code 11, Code 39, Code 128, CodaBar, Inter2of5, EAN13, EAN18, UPCE, Add2, PDF417, ReadDataMatrix, ReadQRCode)

Business Rules (Using business rules to identify text, tags and properties of the document to determine the type)
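To make rule-based classification more concrete, here is a minimal sketch that scores a document's text against keyword lists and picks the best match. The rule table, the keywords and the `classify` function are hypothetical; real business rules can also use tags and document properties, which this sketch ignores.

```python
from typing import Dict, List

# Hypothetical rule table: document type -> keywords that suggest it.
RULES: Dict[str, List[str]] = {
    "Invoice": ["invoice number", "amount due", "remit to"],
    "Promissory Note": ["promise to pay", "principal", "interest rate"],
}

def classify(text: str, rules: Dict[str, List[str]] = RULES) -> str:
    """Return the type with the most matching keywords, or 'Unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in rules.items()
    }
    best_type = max(scores, key=scores.get)
    # No keyword matched at all: leave the document for manual classification.
    return best_type if scores[best_type] > 0 else "Unknown"

doc_type = classify("The borrower's promise to pay the principal sum ...")
```

A document that matches no keywords comes back as "Unknown", which is where the manual option would take over.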

Document Indexing - Once we have identified the document type, we can assign or extract the index data:

Manual Indexing

Database Lookup (gather information about the document from external databases and business applications)

Extract data from an external reference file (text file, Microsoft Excel, XML, etc.)

Business Rules (using the content of the document to extract data)
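The reference-file option above can be pictured as a simple key lookup. The sketch below uses an in-memory CSV as a stand-in for the external file; the column names, the sample rows and the `lookup_indexes` function are all invented for illustration.

```python
import csv
import io
from typing import Dict, Optional

# Stand-in for an external reference file; columns and rows are hypothetical.
REFERENCE_CSV = """loan_number,borrower,branch
L-1001,Jane Smith,Denver
L-1002,Raj Patel,Austin
"""

def lookup_indexes(loan_number: str, source: str = REFERENCE_CSV) -> Optional[Dict[str, str]]:
    """Return the reference row for `loan_number`, or None if absent."""
    for row in csv.DictReader(io.StringIO(source)):
        if row["loan_number"] == loan_number:
            return dict(row)
    return None

# One value read from the document (the loan number) fills the other indexes.
indexes = lookup_indexes("L-1002")
```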

 

 

Extracting the Text From Documents

The business rules work on the content of the document. DocuNECT can extract the text of the document from multiple formats:

 

Extract data from the content of text-based documents (Microsoft Office, Adobe PDF Text, Adobe PDF Forms, Text, XML, HTML)

Full-text OCR of image-based documents (TIFF, Adobe PDF Image, JPG, BMP, GIF). The process also automatically rotates documents and cleans up the image.

 

 

Using Business Rules


Business rules can be assigned to the Separation, Classification and Indexing stages. Percentage confidence thresholds can be set to flag potential issues to users, who can then verify the information. The business rules for extraction use DocuNECT's DocScript capabilities, which allow comprehensive rule sets to be created.
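A confidence threshold of this kind can be sketched as a simple filter. The `Extraction` record, the field names and the 85% default below are assumptions for illustration, not DocuNECT settings.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Extraction:
    index_name: str
    value: str
    confidence: float  # percentage, 0 to 100

def needs_review(items: List[Extraction], threshold: float = 85.0) -> List[Extraction]:
    """Return extractions whose confidence falls below the threshold."""
    return [item for item in items if item.confidence < threshold]

# Hypothetical extraction results for one document.
results = [
    Extraction("interest_rate", "3.04", 97.5),
    Extraction("borrower", "J. Sm1th", 62.0),
]
flagged = needs_review(results)
```

Only the low-confidence `borrower` value is flagged for a user to verify; the rest pass through untouched.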

 

For example, if we have a Promissory Note document type where the loan interest rate is buried in a paragraph, the DocScript text extraction capabilities can identify and extract the interest rate based on proximity information. We can then apply business rules to check, for example, that the interest rate is within a specific range. If the extracted rate is 304%, the decimal point was probably missed during the OCR process.
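The promissory note example can be sketched in plain Python. This is not DocScript; the anchor phrase, the 80-character window and the 0 to 25% range are assumptions chosen for the illustration.

```python
import re
from typing import Optional

def extract_rate(text: str, anchor: str = "interest rate", window: int = 80) -> Optional[float]:
    """Return the first percentage found within `window` chars after `anchor`."""
    start = text.lower().find(anchor)
    if start == -1:
        return None
    nearby = text[start : start + len(anchor) + window]
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", nearby)
    return float(match.group(1)) if match else None

def check_rate(rate: float, low: float = 0.0, high: float = 25.0) -> str:
    """Validate the rate; suggest a decimal fix when dividing by 100 fits."""
    if low <= rate <= high:
        return "ok"
    if low <= rate / 100 <= high:
        return f"suspect: did OCR drop the decimal point? ({rate / 100:.2f}%)"
    return "out of range"

# Hypothetical OCR output for a promissory note paragraph.
note = "The Borrower shall pay interest at an annual interest rate of 304% on the unpaid principal."
rate = extract_rate(note)
status = check_rate(rate) if rate is not None else "not found"
```

The range check does not silently rewrite 304 to 3.04; it only flags the value so a user can confirm the correction.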

 

For each document, DocuNECT stores the following information for each index value:

 

The business rule that was used to extract the information

The text location of where the information was found (Page number and location)

A description of the source to provide reference information on how and why the value was extracted

Whether the value was corrected by a user and, if so, what it was corrected to. Note: this information is used by the DocuNECT Machine Learning module.
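One way to picture the information listed above is as a record per index value. The class and field names below are hypothetical and do not reflect DocuNECT's actual storage format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexValueRecord:
    index_name: str
    value: str
    rule_name: str            # business rule that extracted the value
    page: int                 # page where the text was found
    location: str             # position on the page; format is an assumption
    source_description: str   # how and why the value was extracted
    corrected_value: Optional[str] = None  # set only if a user corrected it

    @property
    def was_corrected(self) -> bool:
        return self.corrected_value is not None

    @property
    def final_value(self) -> str:
        # Prefer the user's correction over the extracted value.
        return self.corrected_value if self.corrected_value is not None else self.value

record = IndexValueRecord(
    index_name="interest_rate",
    value="304",
    rule_name="rate_near_anchor",
    page=2,
    location="line 14",
    source_description="First percentage after the phrase 'interest rate'",
)
record.corrected_value = "3.04"  # a user fixes the missing decimal point
```

Keeping the original value alongside the correction is what makes the record usable later as learning input.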

 

 

 

Work Center (Web Based Indexing and Classification)


DocuNECT's Work Center includes a web-based classification review and indexing module. This allows users to review the information extracted by the business rules and make manual adjustments. To support manual indexing, DocuNECT also has a Data Grab feature that allows users to grab data directly from the image itself to save typing.

 

 

 

Learn "As You Go"


This machine learning process observes how users interact with the data extracted by the business rules and uses those interactions as a refinement mechanism. It looks for repetitive tasks performed by users and, once one is identified, creates an automation rule. These new rules, together with the effectiveness ratings of existing rules, are analyzed to assess the overall productivity of the lifecycle.
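The idea of turning repeated user corrections into a rule can be illustrated with a simple frequency count. This is a deliberately simplified stand-in, not the algorithm used by the DocuNECT Machine Learning module; the `propose_rules` function, the repeat threshold and the sample history are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (index name, extracted value, corrected value)
Correction = Tuple[str, str, str]

def propose_rules(corrections: List[Correction], min_repeats: int = 3) -> Dict[Tuple[str, str], str]:
    """Map (index name, extracted value) to a correction seen at least `min_repeats` times."""
    counts = Counter(corrections)
    return {
        (index_name, extracted): corrected
        for (index_name, extracted, corrected), seen in counts.items()
        if seen >= min_repeats
    }

# Hypothetical log of user corrections made during review.
history = [
    ("vendor", "ACME C0RP", "ACME CORP"),
    ("vendor", "ACME C0RP", "ACME CORP"),
    ("vendor", "ACME C0RP", "ACME CORP"),
    ("vendor", "Globex", "Globex Inc"),
]
rules = propose_rules(history)
```

Only the correction repeated three times becomes a candidate rule; the one-off "Globex" fix is ignored until it recurs.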

 

 

Process Monitoring


The "black box" approach requires insight into bottlenecks and monitoring of Key Performance Indicators (KPIs) to make sure the lifecycle is performing. Dashboards can be configured to provide access to this information at a glance.

 

 

 

 

Copyright © 2021