Kofax KTM – Understanding Your Kofax System's Potential
March 17, 2015
Adrian Enders, Senior ECM Consultant, DoxTek, Inc.

References
1. Van Ittersum, Randy and Erin Spalding, 2005. "Understanding the Difference Between Structured and Unstructured Documents." http://www.disusa.com/privatelibrary/documents/WP_Structured_&_Unstructured_Documents.pdf
2. Fenton, Paul, January 6, 2014. "10 Benefits of Moving to Electronic Document Management System." http://blog.montrium.com/blog/10-benefits-of-moving-to-electronic-document-management-for-life-sciencecompanies

Overview of KTM
Kofax Transformation Modules (KTM) is an advanced forms processing engine designed to identify electronic documents and then automatically extract data from them. KTM specializes in processing unstructured documents. A structured document always has the same layout, with the data in the same location on the page; examples are a W-2, an IRS Form 1040, or a Form 4506-T. With unstructured documents, data can appear in unexpected places on the document.

KTM can be used for many applications; one of the most obvious is to support uploading documents into an Electronic Document Management System (EDMS). There are many sound business reasons to employ an EDMS: to lower manual labor costs, to facilitate collaboration, and to increase security and control. One of the major steps in deploying an EDMS, or in moving closer to a paperless process, is to convert paper documents into electronic images. Conversion is only part of the process; you also need to consider how the documents will be retrieved in the future. An EDMS allows you to assign keywords that describe or identify the document. For example, on a W-2 the form name (W-2) and employee name (e.g., Jim Shoe) could be used to retrieve the W-2 form for a specific employee.
Rather than having employees manually type in the keywords for each document, KTM automatically extracts the data and assigns it to the image. This leaves your employees more time for doing business rather than typing data into an EDMS.

There are three steps in processing these electronic documents: separation, classification, and extraction of the data.

Separation
Separation determines where a document starts and stops. Ideally, you would scan documents by simply stacking them in a document scanner and pressing "Start." Keeping pre-scanning preparation to a minimum is most effective. KTM can automatically separate a stack of documents after scanning.

Join us! For more information contact us at: [email protected] or 866-678-8400. www.doxtek.com

KTM uses several methods to automatically separate pages into documents. One method is to insert separator pages that carry a unique look or unique data, indicating to KTM that a new document starts at that point. This provides a high level of accuracy but requires preparation time for your employees to create the separator pages and insert them in the correct locations before scanning.

If the first pages of the documents being scanned all differ from each other or contain some unique data, KTM can be set up to separate automatically based on these characteristics. Scanned pages are compared to a sample set of first pages given to KTM, and KTM starts a new document each time it finds a match in the stack. All pages following a matched first page are treated as part of the same document until KTM identifies the next first page. This method generally requires only 1-5 copies of each document as samples.

In some cases, documents are more complex or do not vary enough in appearance for KTM to distinguish between the start and end of a document.
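The first-page method described above can be illustrated with a short Python sketch. This is not KTM code or the KTM API: `separate_pages` and `looks_like_first_page` are hypothetical names, pages are represented as plain text strings, and the prefix test merely stands in for comparison against trained sample pages.

```python
from typing import Callable, List


def separate_pages(pages: List[str],
                   is_first_page: Callable[[str], bool]) -> List[List[str]]:
    """Group a scanned stack into documents, opening a new one at each first page."""
    documents: List[List[str]] = []
    for page in pages:
        # A recognized first page opens a new document. The very first page of
        # the stack does too, so leading pages are never dropped.
        if is_first_page(page) or not documents:
            documents.append([page])
        else:
            # Any other page belongs to the most recently opened document.
            documents[-1].append(page)
    return documents


def looks_like_first_page(page_text: str) -> bool:
    # Hypothetical stand-in for matching a page against trained samples.
    return page_text.startswith(("INVOICE", "FORM W-2"))


stack = ["INVOICE 1001", "line items", "terms",
         "FORM W-2 2014", "INVOICE 1002", "line items"]
documents = separate_pages(stack, looks_like_first_page)
```

Here the six-page stack is grouped into three documents. Grouping of this kind only works while first pages stay recognizable, and sometimes they are not.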
In such cases, KTM can be configured for trainable document separation. The content of each page is analyzed to determine whether it is a first, middle, or last page. This requires a larger base set of sample images so that KTM can identify the first and last pages of a document.

Classification
Classification identifies what type of document is being scanned. It answers the question, "What document is this?" Is it a W-2 form, a travel request, or an employee's I-9 form? KTM uses three methods to classify a document.

First, the layout, or the "look" of the document, is used to determine what the document is. This is generally the easiest and fastest method when it is feasible.

Second, the content of the document is used to determine what the document is. This requires that Optical Character Recognition (OCR) be performed first to locate the words, so it takes a little longer than layout classification. KTM then runs a series of algorithms to match the words to its trained document set.

Finally, specific instructions can be applied to the classification set. If the document contains the words "invoice number," "invoice amount," and "amount due," then it is probably an invoice.

Document classification can be closely intertwined with document separation: documents are not always separated before they are classified. These techniques can be blended to provide successful automatic document separation and classification.

Extraction
Extraction is the process of reading the data on the page. OCR is performed to locate the words on the pages of the document. Remember that KTM is designed for extracting data from unstructured documents. When your company receives invoices from your vendors, each vendor's invoice will look different but contain the same kinds of data. KTM uses keywords as markers on the document to find the data.
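As a rough illustration of keyword markers, the Python sketch below searches OCR output for a marker word and returns the value that follows it on the same line. It is a simplified assumption of how such a lookup could work, not KTM's implementation: the marker list, the currency pattern, the sample text, and the function name `find_amount_near_marker` are all made up for the example.

```python
import re
from typing import Optional

# Assumed marker words; \b keeps "total" from matching inside "Subtotal".
MARKER = re.compile(r"\b(?:total due|total|pay amount)\b", re.IGNORECASE)
# Assumed currency format: optional "$", digits with optional thousands
# separators, and exactly two decimal places.
AMOUNT = re.compile(r"\$?\d+(?:,\d{3})*\.\d{2}")


def find_amount_near_marker(ocr_text: str) -> Optional[str]:
    """Return the first currency-like value that follows a marker on its line."""
    for line in ocr_text.splitlines():
        marker = MARKER.search(line)
        if marker is None:
            continue
        # Only look to the right of the marker, where the value usually sits.
        amount = AMOUNT.search(line, marker.end())
        if amount:
            return amount.group()
    return None


sample = "Invoice Number: 4471\nSubtotal: $1,150.00\nTotal Due: $1,234.56\n"
amount = find_amount_near_marker(sample)
```

With this sample text the function skips the subtotal line and returns the value next to "Total Due". A production system would also have to handle values that sit below or to the left of the marker, which this sketch ignores.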
For example, the invoice amount will always be a currency value and will probably appear close to words like "total," "total due," "pay amount," or something similar.

Some pieces of data follow specific patterns: amounts, dates, phone numbers, etc. KTM also uses these format patterns to find information.

Additionally, the content of the document can be compared to external sources for identification. KTM can compare the document against a database table of vendors to find the vendor name and address, and then confidently associate the document with that vendor.

Summary
KTM can be used for a variety of applications, perhaps most notably to automate the extraction of data for an EDMS. There are many business reasons to employ an EDMS. Although cost has historically been a driving factor for many companies implementing such a system, the desire to protect data becomes increasingly important as concerns grow about security and control of internal information and documentation. DoxTek is ready and willing to analyze the current processes of any company that is looking to employ an EDMS or is interested in reviewing its current Kofax system.

Adrian Enders