  • Marianne Calilhanna

The Evolution of Content Conversion and Why it Matters

“Content conversion” and “data transformation” are loaded terms that mean different things to people depending on where they work in the content supply chain. Back when content distribution followed a linear workflow (idea > manuscript > proof > composed page > published piece), content conversion meant reformatting from one output format to another, typically PDF to a specific XML format. Some organizations still require simple conversion services, for example converting legacy paper content to PDF.

More often, though, conversion needs in today’s content ecosystem are more complex.

Digitization is non-negotiable, but enriching digitized content is more important than ever. Content discoverability across devices and platforms, together with the increasing importance of voice-first technology, means digitization alone is not enough.

Content structure and semantic enrichment are mandatory for downstream content consumption, whether the consumer is a person or a machine. These tenets are particularly important for systems with enhanced AI-based search and discovery.

Semantic Enrichment

Semantic enrichment is a primary building block that enables downstream capabilities, such as search engine and database discoverability, consumption through voice interfaces, and new monetization opportunities.

Depending on your content and your business needs, there are a few options to enrich content, including

  • metadata embedded directly in XML files

  • tags in databases or content management systems that reference content elements

  • metadata headers for content that is not easily accessible, such as videos and images
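
As a rough illustration of the first option, here is a minimal Python sketch that embeds subject metadata directly in an XML file using the standard library. The element names (article, metadata, subject), the sample document, and the embed_metadata helper are all assumptions made up for this example; a real project would follow whatever schema or DTD the content already uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element names are illustrative, not a real schema.
SOURCE = "<article><title>Aspirin</title><body>Aspirin reduces fever.</body></article>"


def embed_metadata(xml_text: str, subjects: list[str]) -> str:
    """Return the document with a <metadata> block inserted as its first child."""
    root = ET.fromstring(xml_text)
    meta = ET.Element("metadata")
    for subject in subjects:
        # One <subject> element per term keeps each tag individually addressable.
        ET.SubElement(meta, "subject").text = subject
    root.insert(0, meta)
    return ET.tostring(root, encoding="unicode")


enriched = embed_metadata(SOURCE, ["pharmacology", "analgesics"])
print(enriched)
```

The original title and body are untouched; the enrichment travels with the file, so any downstream system that reads the XML also receives the subject terms.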

Digitization Without Structure

Scanning is a good first step in information preservation. However, scanned content is still considered "flat."

iPhone displaying PDF content of reference material.
Here is an example of a PDF with reference material. But it's impossible to search, click, or dive deeper. It's "flat."


XML: The Building Blocks of Intelligent Content

Content markup is required if one wants to search through content in a meaningful way with modern tools and interfaces.

iPhone with XML depicted as a layer over content.
When content is structured, the fun can commence!
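
To give a sense of what markup makes possible, the short Python sketch below queries a small piece of structured reference content by meaning rather than by raw text match. The sample entries, the element names (reference, entry, term, class), and the terms_in_class helper are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference content; element and attribute names are illustrative.
REFERENCE = """
<reference>
  <entry id="e1"><term>Aspirin</term><class>analgesic</class></entry>
  <entry id="e2"><term>Ibuprofen</term><class>analgesic</class></entry>
  <entry id="e3"><term>Amoxicillin</term><class>antibiotic</class></entry>
</reference>
"""


def terms_in_class(xml_text: str, drug_class: str) -> list[str]:
    """Return the terms of all entries whose <class> matches drug_class."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("term")
        for entry in root.iter("entry")
        if entry.findtext("class") == drug_class
    ]


print(terms_in_class(REFERENCE, "analgesic"))
```

A question like "which entries are analgesics?" cannot be answered reliably against a scanned page or flat PDF, because nothing in the file says which words are terms and which are classifications.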


Technology Enables Multiple Dimensions

Where content was previously static and "flat," content structure, semantic enrichment, and technology enable interactivity and discovery.

iPhone with multiple dimensions of content structure
Without semantically enriched structured content, there would be no search.


Many organizations have extensive content buried in paper, PDF, and Word files that is not structured or enriched, and not necessarily even digitized, especially if it includes complex tables, charts, figures, foreign characters, chemical formulae, etc. Until recently, unraveling this mess has been challenging and expensive.

It's time to revisit that content! You might even find high-value content and data hidden away in your organization that is perfect for new product development, tools, and monetization.

