Content Transformation for Intelligent Content

Marianne Calilhanna
Oct 1, 2020

The following is a guest post from SDL by Arpita Maity. Val Swisher, CEO of Content Rules, shares her wisdom with Arpita about content strategy, reuse, content structure, and more. Both SDL and Content Rules are sponsors of DCL DITA Day!

A truly digital organization is one that addresses the core of its business—its knowledge—which is made up of information and content stored in its systems and applied by its employees and partners to perform their jobs. Managing that information, in an adequate way, should be the starting point for supporting any digital transformation initiative.

As content demands scale, so does the need for automation and cross-departmental collaboration, along with the quality and consistency that ensure accurate information. All of this means we need to make content intelligent.

Taking existing content and turning it into intelligent content requires a content transformation strategy. Content transformation expert Val Swisher of Content Rules shared with me how to embark on this journey using a component content management system (CCMS).

Arpita: Val, can you share some thoughts about what content transformation is and how to start this journey?

Val: Content transformation allows organizations to adopt a structured approach to how they create, store, manage, publish, and sunset content. This approach starts preparing content and organizations for digital transformation and the future.

There are many ways to divide the tasks needed to transform existing content for structure, but I think these three are essential:

Define content architecture
Curate existing (legacy) content
Transform the content for the new architecture

Step 1: Define Content Architecture

Before you do anything with your existing content, it is important to have a strategy—a roadmap for how you want your content to be created, stored, and tagged. A robust content architecture has these pieces:

Content models
Reuse strategy
Taxonomy and metadata
Workflow
Governance

Content Models

To start with, it is important to define your content models.

Content models are the structures which govern how you write or rewrite your content. They define what content you must include, the order in which each item appears, whether certain pieces of content are mandatory or optional, and so on. The model is the plan for each small chunk of content and also the plan for the larger information product that is ultimately published.

Models free up writers from worrying about what content needs to be included and how to include it. The structure, itself, defines what to put in the chunk. The writer creates the content that addresses the components of the structure.
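
As a rough illustration of how a model can tell a writer what belongs in a chunk, the sketch below encodes a model as an ordered list of components, each marked mandatory or optional, and checks a chunk against it. The "task" model, its component names, and the validate_chunk helper are hypothetical and invented for this example; structured formats such as DITA express rules like these in their own schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    mandatory: bool


# Hypothetical model for a "task" chunk: components listed in required order.
TASK_MODEL = [
    Component("title", mandatory=True),
    Component("prerequisites", mandatory=False),
    Component("steps", mandatory=True),
    Component("result", mandatory=False),
]


def validate_chunk(chunk, model):
    """Return a list of problems; an empty list means the chunk fits the model.

    `chunk` is a list of (component_name, text) pairs in authored order.
    """
    problems = []
    allowed = [c.name for c in model]
    names = [name for name, _ in chunk]

    # Anything the model does not define is reported.
    for name in names:
        if name not in allowed:
            problems.append(f"unknown component: {name}")

    # Every mandatory component must be present.
    for component in model:
        if component.mandatory and component.name not in names:
            problems.append(f"missing mandatory component: {component.name}")

    # Known components must appear in the order the model lists them.
    positions = [allowed.index(n) for n in names if n in allowed]
    if positions != sorted(positions):
        problems.append("components are out of order")

    return problems


good = [("title", "How to walk the dog"), ("steps", "1. Attach the leash.")]
bad = [("steps", "1. Attach the leash."), ("summary", "Done.")]

print(validate_chunk(good, TASK_MODEL))
print(validate_chunk(bad, TASK_MODEL))
```

The first chunk passes with no problems reported; the second is flagged for an undefined component and a missing title.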

Reuse Strategy

Content reuse is a key component of almost all digital transformation strategies. Companies of all sizes are realizing that creating two, three, four, sometimes sixteen versions of almost-the-same content is incredibly inefficient. It costs a lot of money and time to create, store, and maintain multiple versions when a single version will suffice. It also creates a lot less confusion when your customer reads the same text in many places, rather than different variations. We call this having a single source of truth.

Even something as mundane as using consistent terminology can be the difference between content that is truly reusable and content that is confusing. For example, let’s say that four of us are creating a set of content chunks that, when published, will produce a guide on how to take care of Moose.

We each create a chunk of content:

How to walk the dog
How to feed the canine
How to groom the pooch
How to train the hound

If we combine these chunks and publish a guide on taking care of Moose, people will be very confused. Is a dog the same as a hound? A canine?

In order to successfully reuse content, we need to decide what we will call Moose and stick to it. If your content is more difficult than Moose-care, your terminology is even more important.
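
A terminology check like this is easy to automate. The sketch below is a minimal example that assumes a hypothetical term base in which "dog" is the approved term and "canine", "pooch", and "hound" are variants to flag; the find_variants helper is invented for illustration and is not a feature of any particular CCMS.

```python
import re

# Hypothetical term base: the approved term and the variants to flag.
PREFERRED = "dog"
VARIANTS = ["canine", "pooch", "hound"]

# Match any variant as a whole word, ignoring case.
_variant_pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, VARIANTS)) + r")\b", re.IGNORECASE
)


def find_variants(text):
    """Return the non-approved terms found in `text`, lowercased, in order."""
    return [m.group(1).lower() for m in _variant_pattern.finditer(text)]


chunks = [
    "How to walk the dog",
    "How to feed the canine",
    "How to groom the pooch",
    "How to train the hound",
]

for chunk in chunks:
    found = find_variants(chunk)
    if found:
        print(f"{chunk!r}: replace {', '.join(found)} with {PREFERRED!r}")
```

Run against the four chunk titles, only the first passes untouched; the other three are reported with the variant that needs replacing.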

Taxonomy and Metadata

Locating reusable content within your component content management system is a critical element of being able to reuse it. A taxonomy is a system that you develop so that your content is well organized and you can easily locate and reuse it. Within the structure, each chunk of content is tagged with metadata. Metadata is data about data. In other words, metadata is information that describes an individual chunk of content.

Libraries use a time-tested, well-organized taxonomy for how books are put on shelves. That way, you can go to a particular section if you are looking for books about travel, or cookbooks, or history books, and so on.
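
To make the idea concrete, here is a small sketch of chunks tagged with metadata and a lookup that filters on those tags. The metadata fields ("topic", "content_type"), the chunk ids, and the find_chunks helper are all assumptions made for this example; a real CCMS would provide its own tagging and search features.

```python
# Hypothetical chunks, each tagged with metadata drawn from a small taxonomy.
chunks = [
    {"id": "c1", "title": "How to walk the dog",
     "metadata": {"topic": "exercise", "content_type": "task"}},
    {"id": "c2", "title": "How to feed the dog",
     "metadata": {"topic": "nutrition", "content_type": "task"}},
    {"id": "c3", "title": "What dogs should not eat",
     "metadata": {"topic": "nutrition", "content_type": "reference"}},
]


def find_chunks(chunks, **criteria):
    """Return the ids of chunks whose metadata matches every criterion."""
    return [
        chunk["id"]
        for chunk in chunks
        if all(chunk["metadata"].get(k) == v for k, v in criteria.items())
    ]


print(find_chunks(chunks, topic="nutrition"))
print(find_chunks(chunks, topic="nutrition", content_type="task"))
```

Adding a criterion narrows the result, in the same way that walking from the cookbook section to a single shelf narrows a library search.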

Workflow

Workflow comprises all the processes you use to create, review, check in, and publish content. There are often several people in different departments who are part of the content workflow. For example, subject matter experts (SMEs) provide information and review, and product marketing and sometimes legal are also involved in the content workflow.

When you move to a structured environment, your workflow changes because the way you create content changes. It is important to define your workflow in advance so that all parties involved in the content ecosystem know what is expected of them and when.

Governance

Content governance defines the rules for everything that concerns your content. Governance can include rules for:

Creating
Writing
Retrieving
Tagging
Reviewing
Checking in
Publishing
Sunsetting

And a host of other things that affect your content ecosystem. If you have a solid governance strategy, people who touch the content will know exactly what is expected of them and how to act on it.

Arpita: In short, good content hygiene comes with defining content architecture. What do we do with existing content?

Val: This is where the second step starts: curating legacy content and deciding what to do with years and years of existing content.

Step 2: Curate Legacy Content

Curation is the process where you find and assess your existing content. Many organizations have a lot of content, but they do not necessarily know where it all is, how up-to-date it is, and whether or not it is still useful.

Locating the content is the first step of content curation. Often, other groups in the organization have created content that needs to be part of your strategy. Sometimes writers forget to check in the latest version and have things on their desktop. Things happen. Content gets lost or misplaced or misfiled.

Once you have located the content, you need to decide how to act on it. The first decision is whether or not you will move the existing content into the new system.

To Move or Not to Move

Decisions on content are largely based on your company’s needs and goals. Some companies pick new content to use as the starting point for their digital transformation. They leave the old content behind, stored in a legacy system, and use the new architecture for new projects as they come up. Some companies need to move all their existing content into the new system. Most companies have a combination of content that needs to be moved and content that does not.

If you have a lot of content and the task is particularly daunting, I suggest you follow the 80/20 rule. Start with the 20% of the content that is used 80% of the time. Work on the most important content first. Once that is done, reassess and do it again: select the 20% of the remaining content that accounts for 80% of the remaining usage. And so on.

Arpita: In short, whether to transform all of the old and new content really depends on the organization and how it operates.

Val: Correct! That’s the final step of this journey: Content Transformation.

Step 3: Transform the Content for the New Architecture

Converting Content

Unless your legacy content is already in the target format (for example, DITA), all legacy content that needs to be moved to the new system will need to have its file format converted. For example, MS Word .DOCX files, FrameMaker .FM files, and Adobe InDesign .INDD files all need to be converted. There are companies that use specialized software to bulk convert large amounts of content. File formats are changed, the underlying format structure is changed, and even variables, conditions, and cross-references can be converted.

Often, though, there is content that needs more than just file conversion. This content needs to be tweaked or rewritten. That is called content transformation.

Arpita: How does an organization maintain this newly transformed content, and what are the best practices around it?

Val: Once the content has been transformed, it is important to maintain its hygiene by adopting best practices and checking the new content operation model for reuse:

Write in small chunks. The longer your chunk of content is, the less likely it will be reusable. On the other hand, shorter and more focused chunks of content are easier to reuse in a variety of settings.
Standardize terminology. It is so important that I want to repeat it here. In order to make content reusable, we must use the same words to describe the same thing.
Avoid dependent language. Because each chunk of content must stand on its own, you need to stay far away from dependent language. Dependent language includes:

A chunk of content that refers to another chunk of content
A chunk of content that requires users to read another chunk to understand this one
Words that anchor the content in time or space
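
A first pass at spotting dependent language can be automated with a simple phrase scan. The sketch below assumes a hypothetical and deliberately short phrase list, and the flag_dependent_language helper is invented for illustration. It can only catch explicit wording; a chunk that silently relies on another chunk for its meaning still needs a human reviewer.

```python
import re

# Hypothetical, deliberately small phrase lists. Each key is the reason a
# phrase is suspicious; extend the lists to suit your own content.
DEPENDENT_PHRASES = {
    "refers to another chunk": [
        "see above", "see below", "as mentioned earlier",
        "in the previous section",
    ],
    "anchored in time or space": [
        "currently", "recently", "last year", "this year",
    ],
}


def flag_dependent_language(text):
    """Return (reason, matched_phrase) pairs for each suspicious phrase."""
    hits = []
    for reason, phrases in DEPENDENT_PHRASES.items():
        # Whole-phrase, case-insensitive match for any phrase in the list.
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((reason, match.group(0).lower()))
    return hits


sample = (
    "As mentioned earlier, attach the leash. "
    "Currently, Moose prefers the red one."
)
for reason, phrase in flag_dependent_language(sample):
    print(f"{phrase!r}: {reason}")
```

Both sentences in the sample are flagged: the first points at another chunk, and the second ties the content to a moment in time.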

Summary

Digital transformation involves many things: new tools, new processes, and new ways of publishing. At the heart of digital transformation is content. By taking a content-first approach, your digital transformation project will be faster and easier to achieve, and it will yield long-term results.

Arpita: Thank you, Val, for sharing your comprehensive method on intelligent content transformation!

Val: Thank you too!