Automated content structure and data reassembly
Bring New Life to Content in an Automated Way
Many organizations have extensive content buried in image-based PDFs (and even paper!) that cannot be digitized using standard OCR tools due to complex tables, charts, figures, foreign characters, chemical formulae, etc. DCL Reformer is an automated solution that transforms static content into structured formats, improving the content's utility for downstream systems.
DCL uses computer vision techniques to detect and remove poor OCR-quality content, retaining text for high-accuracy OCR processing and conversion to unstructured text. Complex algorithms, NLP engines, and other techniques are then applied to analyze the unstructured text from documents with wide variations in format and quality, and accurately structure the data.
The importance of having a plan and process for content QA
Assuring content quality in today's business environment is vitally important. Most content accumulates from various sources and builds up over time. Periodic review and analysis focus efforts on identifying and improving ongoing content quality, consistency, and accuracy.
The first step in DCL Markup Check is content analysis, which quickly identifies areas for investigation. Organizations recognize immediate benefits including
Conversion alone does not improve content functionality
DCL Reformer amends and improves the structure, and hence the quality, of your content. Reformer is a fully automated workflow system that receives and classifies documents, OCRs TIFF images, extracts free-form text from textual and form-based documents, and generates XML in the target schema with image attachments.
Computer vision techniques detect and remove content that OCRs poorly, retaining the text for high-accuracy OCR processing and conversion to unstructured text.
DCL Reformer separates free-form text or image-like content from true textual content. The removed artifacts are transformed into image files, and the remaining content is ready for OCR.
Complex algorithms, NLP engines, and other techniques are applied to analyze the unstructured text from documents. The automated system references the extracted artifacts as images in the resulting XML.
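The last step above, in which extracted artifacts are referenced as images in the resulting XML, can be pictured with a minimal sketch. This is not DCL Reformer's implementation or schema: the `Region` structure, the `document`, `body`, `para`, and `graphic` element names, and the image file naming are all hypothetical, chosen only to show text regions becoming XML text while artifacts become image references.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """One detected page region (hypothetical structure, for illustration)."""
    region_id: str
    kind: str        # "text", or "artifact" for a table, chart, formula, etc.
    text: str = ""   # OCR output; left empty for artifacts


def build_xml(doc_id: str, regions: List[Region]) -> str:
    """Emit text regions as <para> and artifacts as <graphic> references."""
    root = ET.Element("document", id=doc_id)
    body = ET.SubElement(root, "body")
    for region in regions:
        if region.kind == "text":
            para = ET.SubElement(body, "para")
            para.text = region.text
        else:
            # Artifacts are not OCRed in this sketch; they are assumed to be
            # saved as image files and are referenced here by file name.
            href = f"{doc_id}_{region.region_id}.png"
            ET.SubElement(body, "graphic", href=href)
    return ET.tostring(root, encoding="unicode")


# Assumed input: regions already classified by an upstream detection step.
regions = [
    Region("r1", "text", "Melting point data for the samples:"),
    Region("r2", "artifact"),
    Region("r3", "text", "Values were measured at ambient pressure."),
]
xml_output = build_xml("doc001", regions)
print(xml_output)
```

Keeping the artifact as an image reference in document order means a downstream system can still render the table or chart in place, even though its content was never passed through OCR.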
Expertise Across All Formats
reform verb (1)
1a: to put or change into an improved form or condition
b: to amend or improve by change of form or removal of faults or abuses
: to become changed for the better