61-18 190th Street, Suite 205

Fresh Meadows, NY 11365

+1 718.357.8700

info@dclab.com

© 2019 Copyright Data Conversion Laboratory, All Rights Reserved.

DCL Reformer

Automated content structure and data reassembly

Bring New Life to Content in an Automated Way

Many organizations have extensive content buried in image-based PDFs (and even paper!) that cannot be digitized using standard OCR tools due to complex tables, charts, figures, foreign characters, chemical formulae, etc. DCL Reformer is an automated solution that transforms static content into structured formats, improving the content's utility for downstream systems.

 

DCL uses computer vision techniques to detect and remove content that OCRs poorly, retaining the text for high-accuracy OCR processing and conversion to unstructured text. Algorithms, NLP engines, and other techniques then analyze the unstructured text, which comes from documents that vary widely in format and quality, and structure the data accurately.

The importance of having a plan and process for content QA

Assuring content quality is vitally important in today's business environment. Most content accumulates from various sources over time. Periodic review and analysis focus effort on identifying problems and improving content quality, consistency, and accuracy.

Analysis

The first step in DCL Markup Check is content analysis, which quickly identifies areas for investigation. Organizations recognize immediate benefits including

Conversion alone does not improve content functionality

DCL Reformer amends and improves the structure, and hence the quality, of your content. Reformer is a fully automated workflow system that receives and classifies documents, OCRs TIFF images, extracts free-form text from textual and form-based documents, and generates XML in the target schema with image attachments.

Artifact Removal

Computer vision techniques detect and remove content that OCRs poorly, retaining the text for high-accuracy OCR processing and conversion to unstructured text.

Post Extraction

DCL Reformer separates image-like content from true textual content. Removed artifacts are saved as image files, and the remaining content is ready for OCR.
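
As a rough illustration of the split described above, the sketch below partitions page regions into text to be OCRed and artifacts to be saved as images. It is not DCL Reformer's implementation: the `Region` structure, the `ocr_confidence` score, the 0.8 threshold, and the file-naming pattern are all assumptions made for the example, and a simple threshold stands in for the computer vision step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """A rectangular area of a scanned page (hypothetical structure)."""
    page: int
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    ocr_confidence: float            # 0.0 to 1.0, assumed to come from an upstream detector


def split_regions(regions: List[Region], min_confidence: float = 0.8):
    """Return (text_regions, artifacts).

    Regions scoring below min_confidence are treated as artifacts and paired
    with an illustrative image file name; the rest are kept for OCR.
    """
    text_regions = []
    artifacts = []
    for region in regions:
        if region.ocr_confidence >= min_confidence:
            text_regions.append(region)
        else:
            name = f"page{region.page}_artifact{len(artifacts) + 1}.png"
            artifacts.append((name, region))
    return text_regions, artifacts


regions = [
    Region(page=1, bbox=(50, 40, 550, 120), ocr_confidence=0.97),   # body paragraph
    Region(page=1, bbox=(50, 140, 550, 400), ocr_confidence=0.31),  # e.g. a chemical structure
]
text_regions, artifacts = split_regions(regions)
```

Here the first region would go on to OCR, while the second would be cropped and written out under the generated file name.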

DCL Reformer - images extracted
DCL Reformer - XML

Target XML

Complex algorithms, NLP engines and other techniques are applied to analyze the unstructured text from documents. The automated system references the extracted artifacts as images in the resulting XML. 
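
To make the idea of referencing extracted artifacts from the XML concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`document`, `p`, `graphic`) and the `href` attribute are placeholders chosen for the example, not the schema DCL Reformer targets.

```python
import xml.etree.ElementTree as ET


def build_document_xml(blocks):
    """Build a small XML document from ordered content blocks.

    blocks is a list of dicts with a 'kind' ('text' or 'image') and a 'value'
    (the paragraph text, or the file name of an extracted artifact image).
    """
    root = ET.Element("document")
    for block in blocks:
        if block["kind"] == "text":
            para = ET.SubElement(root, "p")
            para.text = block["value"]
        elif block["kind"] == "image":
            # The artifact stays an image; the XML only points at the file.
            ET.SubElement(root, "graphic", {"href": block["value"]})
        else:
            raise ValueError(f"unknown block kind: {block['kind']!r}")
    return ET.tostring(root, encoding="unicode")


xml_text = build_document_xml([
    {"kind": "text", "value": "The compound is shown below."},
    {"kind": "image", "value": "page1_artifact1.png"},
])
```

Keeping the blocks in reading order means each image reference lands where the artifact appeared in the original page.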

Expertise Across all Formats

  • DITA

  • XML

  • HTML, HTML5

  • PubMed JATS

  • MathML

  • NLM XML

  • NISO STS

  • Bookshelf

  • EPUB/MOBI

  • S1000D

  • SGML

  • MS Word

  • and more

reform verb (1)

1a: to put or change into an improved form or condition

  b: to amend or improve by change of form or removal of faults or abuses

intransitive verb

: to become changed for the better

Markets Served

DCL Reformer is a useful solution for any organization with a high volume of incoming content, or with complex legacy content that exists in multiple variants.

RELATED WHITE PAPER

This paper describes the implementation of DCL Reformer at the United States Patent and Trademark Office (USPTO). The system processes millions of pages each month, with turnaround measured in minutes.
