Harmonize Redundant Content
DCL works with organizations to analyze large document collections and identify content reuse across multiple document sets and source formats.
Harmonizer is our proprietary software application that uses natural language processing to process and analyze thousands of pages and identify redundant content in document collections. Harmonizer examines content across a variety of source formats. Its HTML and Excel reports help organizations act on the findings and shape their reuse strategy when moving to XML.
Tools: Harmonizer, natural language processing
Document and content analysis
Analyzing the paragraphs in a content set for exact or close matches shows organizations which content is redundant and provides a roadmap for content cleanup. The results are presented in either an HTML or an Excel report.
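The paragraph-matching idea above can be sketched in a few lines of Python. This is not Harmonizer's method, which is proprietary. It is a minimal illustration, assuming paragraphs have already been extracted as plain strings. Paragraphs that are identical after whitespace normalization count as exact matches, and the standard library's `difflib.SequenceMatcher` similarity ratio stands in for close-match detection. The function names, sample paragraphs, and the 0.9 threshold are hypothetical choices.

```python
import difflib
import itertools


def normalize(paragraph: str) -> str:
    # Collapse runs of whitespace so line wrapping and indentation are ignored;
    # case, spelling, and punctuation are left intact.
    return " ".join(paragraph.split())


def find_matches(paragraphs: list[str], threshold: float = 0.9) -> list[tuple[int, int, str, float]]:
    # Hypothetical helper: compare every pair of paragraphs and report
    # (index_a, index_b, "exact" or "close", similarity score).
    normalized = [normalize(p) for p in paragraphs]
    matches = []
    for i, j in itertools.combinations(range(len(normalized)), 2):
        a, b = normalized[i], normalized[j]
        if a == b:
            matches.append((i, j, "exact", 1.0))
            continue
        # ratio() returns 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        score = difflib.SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            matches.append((i, j, "close", round(score, 3)))
    return matches


paragraphs = [
    "Submit the form to your manager before Friday.",
    "Submit the form to your   manager\nbefore Friday.",
    "Submit the form to you manager before Friday.",
    "Quarterly training is mandatory for all staff.",
]

for match in find_matches(paragraphs):
    print(match)
# (0, 1) is reported as "exact"; (0, 2) and (1, 2) as "close"; paragraph 3 matches nothing.
```

Comparing every pair is quadratic in the number of paragraphs, so a collection of thousands of pages would likely need a faster candidate-selection step (for example, hashing or shingling) before pairwise scoring. The resulting list of matches could then be written to a CSV file for review in Excel.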
DCL’s solution harnesses technology in Natural Language Processing and Machine Learning to help enable solutions powered by Artificial Intelligence. With sophisticated automated processes, DCL optimizes content to collect information, streamline compliance, facilitate migration to new systems and databases, maximize reuse potential, and ready it for delivery to all outputs.
Mark Gross, President, DCL
RELATED CASE STUDY
A multinational, Fortune 500 financial services organization had thousands of policy, procedure, and training files in multiple source formats, including HTML, XML, MS Word, and RoboHelp.
BENEFITS OF CONTENT ANALYSIS WITH HARMONIZER
Our customers find that Harmonizer reveals unintended differences in phrasing, spelling, and punctuation, often uncovering errors that have been deeply embedded in documentation for many years.
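Once two paragraphs are flagged as close matches, a word-level diff shows exactly where they diverge. The sketch below is a hypothetical illustration built on `difflib.SequenceMatcher.get_opcodes` from Python's standard library, not Harmonizer's report format. In the sample pair it surfaces a misspelling and a dropped period, the kind of unintended difference described above.

```python
import difflib


def show_differences(a: str, b: str) -> list[str]:
    # Hypothetical helper: list the word-level edits that turn paragraph a into paragraph b.
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            old = " ".join(a_words[i1:i2])
            new = " ".join(b_words[j1:j2])
            changes.append(f"{tag}: {old!r} -> {new!r}")
    return changes


original = "Employees must complete the compliance training every year."
variant = "Employees must complete the complaince training every year"

for change in show_differences(original, variant):
    print(change)
# replace: 'compliance' -> 'complaince'
# replace: 'year.' -> 'year'
```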
Content reuse analysis is ideal for any content-centric organization managing document collections with high reuse potential. This analysis is a starting point for any organization that is moving to a topic-based authoring workflow or implementing DITA.