The Evolution of Content Conversion and Why It Matters
Updated: May 6
“Content conversion” and “data transformation” are loaded terms that mean different things to people depending on where they work in the content supply chain. When content distribution followed a linear workflow (idea > manuscript > proof > composed page > published piece), content conversion meant reformatting content from one format to another, typically PDF to a specific XML format. Some organizations still need only simple conversion services, such as converting legacy paper content to PDF.
More often, though, conversion needs in today’s content ecosystem are more complex.
Digitization is non-negotiable, but enriching digitized content is more important than ever. The need for content to be discoverable across devices and platforms, along with the increasing importance of voice-first technology, means digitization alone is not enough.
Content structure and semantic enrichment are mandatory for downstream content consumption, whether the consumer is a person or a machine. These tenets are particularly important for systems with enhanced AI-based search and discovery.
Semantic enrichment is a primary building block that enables downstream capabilities, such as search engine and database discoverability, consumption through voice interfaces, and new monetization opportunities.
Depending on your content and your business needs, there are a few options for enriching content, including:
embedding metadata directly in XML files
storing tags in databases or content management systems that reference content elements
adding metadata headers for content that is not easily accessible, such as videos and images
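As a rough illustration of the first option, the sketch below embeds metadata directly in an XML document using Python's standard library. The element names (article, metadata, subject, keyword) and the sample values are hypothetical and chosen only for illustration; a real project would follow whatever schema it has adopted.

```python
import xml.etree.ElementTree as ET

def embed_metadata(xml_text, keywords, subject):
    """Return xml_text with a <metadata> element inserted as the root's first child."""
    root = ET.fromstring(xml_text)
    meta = ET.Element("metadata")  # hypothetical element name, not a standard schema
    ET.SubElement(meta, "subject").text = subject
    for kw in keywords:
        ET.SubElement(meta, "keyword").text = kw
    root.insert(0, meta)  # place metadata ahead of the existing content
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample document and enrichment values.
sample = "<article><title>Solvent Safety</title><p>Handle acetone with care.</p></article>"
enriched = embed_metadata(sample, ["acetone", "laboratory safety"], "chemistry")
print(enriched)
```

Because the metadata travels inside the file, any downstream system that receives the XML also receives the enrichment, without needing access to a separate database.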
Digitization Without Structure
Scanning is a good first step in information preservation. However, scanned content is still considered "flat": it has no structure or semantic markup.
XML: The Building Blocks of Intelligent Content
Content markup is required to search through content in a meaningful way with modern tools and interfaces.
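To make this concrete, here is a minimal sketch of a search that relies on markup rather than raw text matching. The document, the tag names (book, chapter, term), and the type attribute are all hypothetical; the point is only that tagged terms let a tool answer a question like "which chapters discuss this chemical?"

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up content; tag and attribute names are illustrative only.
DOC = """
<book>
  <chapter id="c1">
    <title>Solvents</title>
    <para><term type="chemical">Acetone</term> is a common solvent.</para>
  </chapter>
  <chapter id="c2">
    <title>Storage</title>
    <para>Store <term type="chemical">ethanol</term> away from heat.</para>
  </chapter>
</book>
"""

def chapters_mentioning(xml_text, term):
    """Return titles of chapters whose tagged chemical terms include `term`."""
    root = ET.fromstring(xml_text)
    hits = []
    for chapter in root.findall("chapter"):
        # Collect only text that was explicitly tagged as a chemical term.
        tagged = [
            t.text.lower()
            for t in chapter.findall(".//term[@type='chemical']")
            if t.text
        ]
        if term.lower() in tagged:
            hits.append(chapter.findtext("title"))
    return hits

print(chapters_mentioning(DOC, "acetone"))
```

A scanned page image or an unstructured PDF offers nothing comparable to query against, which is why markup comes before meaningful search.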
Technology Enables Multiple Dimensions
Where content was previously static and "flat," content structure, semantic enrichment, and technology enable interactivity and discovery.
Many organizations have extensive content buried in paper, PDF, and Word files that is not structured or enriched, and not necessarily even digitized, especially when it contains complex tables, charts, figures, foreign characters, or chemical formulae. Until recently, unraveling this mess was challenging and expensive.
It's time to revisit that content! You might even find high-value content and data hidden away in your organization that are perfect for new product development, tools, and monetization.