DCL Learning Series

The Basics of Structured Content for Pharma and Life Sciences: What It Means and Why It’s Important

Marianne Calilhanna

Hello, and welcome to the DCL Learning Series. Today's webinar is titled "The Basics of Structured Content for Pharma and Life Sciences: What It Means and Why It's Important." My name is Marianne Calilhanna. I'm the VP of marketing here at Data Conversion Laboratory. And just a couple quick things before we begin. This webinar is being recorded and it will be available in the on-demand section of our website at dataconversionlaboratory.com. We will save time at the end to answer any questions. So please, as a question or a comment comes to mind, feel free to jot it down and share it with us.

You know, digital transformation is taking place across all industries, to various degrees, from highly regulated information such as drug labeling and package inserts to research articles. Content comes in many different forms. And simply having that information in a digital format, such as a Word file or an image-based PDF, is simply not enough for today's digital-savvy consumers, as well as the computer systems that find and deliver that information.

So today, I am delighted to have two people here who have spent a lot of time helping organizations transform content and processes. We have my colleague, David Turner, who works on digital transformation, content technologies, and publishing automation. And we have Val Swisher, founder and CEO of Content Rules. Thanks, both of you, for lending your time and your expertise. I'll turn it over to you, David.

David Turner

Oh. Thank you so much. We are thrilled to get the chance to talk. And kind of the genesis of this webinar was that there was a really fantastic article shared by one of our partners, FontoXML, from the Journal of Pharmaceutical Sciences, about structured content and data management. This article really pointed out a lot of great things, specifically talking about strategies to increase the efficiency of drug development and to speed up the process, in large part through restructuring data management and content authoring. And so we got to talking. And Val and her team had just put out a really fantastic white paper and said, "We really should try to recap this in a webinar, at least address some of the concepts." I mean, the article's pretty dense and I don't know if we have a link or anything we could put to it, but I'm sure we can get that if somebody just wants to contact us.

It was pretty dense, pretty scholarly. But, again, we thought maybe we could kind of bring it home in a meaningful way by doing this webinar, sort of a structured content 101 course, if you will. So we've got kind of two main goals that we're really trying to accomplish today. First of all, we're trying to provide a safe space for people out there who have heard terms like "XML" and "SPL" and "structured content." But in their day-to-day actual work that they're doing, they don't really understand what they mean and they don't fully understand the value, but they know that they're supposed to.

So we thought, "Hey, this will be a good place where you can just hear some topics, hear some basics, validate some things that you're thinking, ask some questions, et cetera." And then we're hoping that this will also help to get you thinking about ways that you can implement structured content and start taking advantage of it to create some results.

4:06

So with that, being that it's a structured content 101 course, if you will, I thought I would give a quick syllabus here. What we're going to try to accomplish today is really divided into three main sections. First of all, we're going to try to define structured content, you know, what we're talking about here. And then we're going to spend some time talking through really why it's important and what it's useful for. And then probably the largest part is we're going to talk about some ways that you can tangibly start to get started with this concept. Then of course we'll have some time for Q&A at the end. So the introduction, Val, are you ready to go?

Val Swisher

I am ready to go.

David Turner

All right. Well, I'm excited to get to do this with you. So let me kick things off here and just start by saying a lot of people in pharma traditionally think of structured content in a relatively narrow way. They think of structured content really in terms of structured content formats. Like if you ask them about structured content, they'll bring up the structured product label. You know? Something that we get to the FDA. And yes, structured content formats or structured content formatted documents are in fact structured content. But as this article is going to point out, the definition is really broadening in life sciences and pharma specifically today.

The old definition kind of had this idea that you kind of developed everything across the drug development life cycle, in documents, and everything was kind of unstructured. And structured content was something that you did at the end. Right? It was this thing that you did, threw it over the wall to some vendor who helped you to get it to the FDA so that you could meet the FDA regulations, you can get listed on the DailyMed. But as I mentioned, this is starting to change. And so Val, I'm going to just put my first question to you today. How have you seen this understanding of structured content starting to change, and how are you seeing structured content starting to be understood more broadly?

Val Swisher

Yeah, it's a good question. And it has been the topic of many of our conversations recently. What I've seen happen is that, yes, structured content formats like SPL, and we'll talk a bit about XML and some of those other formats, are definitely structured. But I think what we are looking at now broadens that definition to be more of a way to create, and manage, and store, and publish those structured formatted files. So it's really more of a system, I would say, or an ecosystem, and a methodology around how we approach content and how we approach writing it and all the things we need to do with it. So it's broader.

David Turner

Yeah. And I would say, the way I would put it is, it's kind of a mindset that it's not so much worrying about those exact formats and you're not having to try to formulate those things, but moving to this concept of, you know, where traditionally we've dealt with things all at the document level, right, and we think about, "I've got my document and somebody else has a document. Maybe we collaborate on a document." But really moving from that idea to more of a granular level, more of an idea where instead of looking at just thinking about just the document, we start thinking about the document parts, the component pieces.

8:01

How can those be reused and reapplied and used in an efficient way so that you're not doing things like, you know, when somebody has to make a change, let's say you change the drug name along the way. You're not having to go find it in 30 different documents and make that change in all those places, but be able to make it in one place and have it automatically populate.

Val Swisher

Absolutely. Absolutely agree. So we need to move from thinking about creating, and managing, and storing, and publishing content in documents, which really are monoliths, to more components, which are smaller chunks of content and they are more nimble. So as you say, first, I can mix and match them so that I can create whatever kind of document I need. So I create these components and then I build a document from these components.

Well, this allows us to reuse a given component in many places, and it allows us to have a single source of truth. So just as you said, if I have to come back later and I have to make a change, rather than thinking, "Gosh, what documents does this information exist in? And let me go find all those documents." Now, I only have to find that one component, make the change there, and then whatever documents contain that component, they're automatically updated.

And to bring this more into the pharma realm, if we think about the core data sheet, the core data sheet really is a document that's made up of a conglomeration, it's my word for the morning, of what really should be components. So we take all these different pieces of information and we put them in the core data sheet so we can find them later. And then we copy and paste them into all these other documents.

If instead we split apart the CDS, and each component was really a component so each one is its own structured file, then we can build whatever we need from it. So again, we would just, if we had a change, if we had other things, we would just go back to that one component. So it's a different way of creating your content and thinking about your content and managing your content. And it is so much more efficient.
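As an illustration of the single-source idea described above, here is a minimal Python sketch. It is not how any particular structured content management system works; the component IDs, document names, and the drug name "Examplamab" are all made up for the example. Documents hold references to components rather than copies, so one edit reaches every document built from that component.

```python
# Hypothetical component library: each component is stored exactly once, keyed by ID.
components = {
    "drug-name": "Examplamab",
    "dosage": "Take one tablet daily.",
    "storage": "Store below 25 C.",
}

# Documents hold references to component IDs, not copies of the text.
documents = {
    "core-data-sheet": ["drug-name", "dosage", "storage"],
    "patient-leaflet": ["drug-name", "dosage"],
}


def build(doc_id):
    """Assemble a document from the current text of its components."""
    return "\n".join(components[cid] for cid in documents[doc_id])


# One edit to the component is picked up by every document built from it.
components["drug-name"] = "Examplamab XR"
leaflet = build("patient-leaflet")
cds = build("core-data-sheet")
```

Because neither document contains its own copy of the drug name, there is nothing to hunt down and fix by hand after the change.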

David Turner

Yeah. It's moving from managing documents to creating a library of these components. And to answer the question that just came in, yes. Just as you would manage a change in a document, now you're managing changes to the component level. So you can keep a full audit trail of all of these pieces together into this huge content library. And with that, let's go ahead and let's jump here from this concept of the content library, and really look at what kind of the content ecosystem looks like when you change your mindset from this documents to components. I'll share that here. And when that comes up... All right. I'll let you take it from here.

Val Swisher

Yeah. This might look overwhelming. So if it does, bear with us and we will unoverwhelm it.

11:55

What we have on the left are all of the small components that we create, just whether it's clinical or labeling or CMC, that we create as part of all of that content that a lot of it eventually gets to regulatory. And rather than creating them in documents where we have to copy and paste different pieces, whether it's dosage, administration, whatever, we create it in small components.

And all of those components, regardless of where they're created in the ideal world, this is the ideal ecosystem, those components all go into what we call a structured content management system, or SCMS, or sometimes we call it an SCM. There are various riffs on that acronym. So everything is stored in this central, essentially library, like you were saying before, database of structured components.

And we have an authoring interface. So we create the components. We're going to talk more about the technologies later, but we also have a way of publishing these components. One of the things that we have to do with the components is we have to take them, put them into the various documents that then need to go to a RIM system and off to the health authorities. And then the information comes back, maybe we have to make some changes, whatever we need to do.

And eventually, we get to actually create the final published pieces, whether it's the labels, it's packaging, it's whatever goes into marketing, patient instructions, instructions for use, whatever we need to create. And certainly the label is the most obvious form of it. But each one of those little pieces lives one and only one time in the SCM, in the structured content management system.

So this is just a different way of visualizing it. So that's, a lot of the steps don't change. We still go to a RIM system. We still go to the health authorities. We still work on the content using an authoring interface where we type. All of that is still the same. But behind the scenes, a lot of things are more efficient.

David Turner

Yes. Yes. And we've seen this thing change in some other industries through the years. I think about, like, the technical documentation world as they began to move this way. It was a change, but I think many of them began to see: when they stopped worrying about the format per se and they let the technology handle the formats, and really just started focusing on the benefits and on understanding, you know, this really works the same way. We're just going to change some of the tooling and we're going to change from this document mindset to this pieces mindset. I think many of them have found that it's just been so much more efficient and there's just so many more benefits. And with that, that actually brings us to benefits.

Val Swisher

You bring up a really interesting and very important point that I think we should say. This way of creating content is not new. It's new to pharma, but we have been doing this for years and years and years in technology, in finance, in hardware, in manufacturing. This is proven, tried, and true technology. We're not asking to take any risks because we don't want to take risks with this information.

15:58

But this is very stable technology that's been around for a long, long time. So that was a really important point that you were kind of getting there. And that was really, so, thanks for leading me there.

David Turner

Well, thank you. All right. Well, all right, so that kind of finishes our defining what we're talking about here. We're trying to get the understanding of structured content; it's moving from this document mindset to this component mindset and kind of the ecosystem around. And now that we've got our definition, let's jump in and let's talk a little bit about why we would want to do this and some of the benefits that were listed in the article. I'm going to kick off here with the first two kind of at once here. Talk to us, Val, a little bit: in your experience, how does moving from this document mindset to the structured content processes, how does that help to reduce manual effort, redundant processes? How does it prevent replication errors and things like that?

Val Swisher

Yeah. Well, as we were saying, when I store everything in a document form rather than components, to create all of my subsequent documents, I copy, and I paste, and I inevitably make changes and tweaks. Sometimes I make errors when I copy, and paste, and tweak. So that whole process of copying and pasting, once you've copied and pasted, the second you move a comma, you now have strayed from the original. You now have version two. And the more you do this, the more potential there is to enter errors and the more likely it is that you're going to make a change to this new version that will not be replicated in the previous versions. So it's very manual, it's very redundant, and it is prone to error to do it that way.

David Turner

Yeah. You know, I think of our own examples here at DCL. We have a tool that looks for content reuse. It's called Harmonizer. And I can't tell you how many times we've run a report and people are looking at it and they're like, "Wait a second. That should be exactly the same as that. Why is that different?" Or somebody said, "Oh yeah, yeah. We changed to legal, so that's in every one of the documents," and you look at it and you go, "Oh, well, it's in these 15 documents, but in 16 and 17, it's different." And they go, "Oh my gosh. I can't believe we let that out. Let's go fix that."

And a lot of that just comes down to this whole idea of copy, paste, and tweak. Whereas with the structured content approach, you're making the edit in one place and it's allowed to live in all the other places. All right. Let's jump onto the next little bullet here. Another thing that this article talked about that's a real benefit was that, you know, it's really been difficult, in pharma especially, for the various content systems, the places where content lives across the organization, to communicate with each other. Pharma's got silos, really almost like no other industry. I mean, every industry's got silos, but pharma's really unique. Why is that, do you think? And what does structured content bring to the table to facilitate the collaboration between these systems?

Val Swisher

Yeah. It's very interesting. So when we move to structure, and as we were looking at that ecosystem slide, in the ideal world, multiple groups that have to create content are all using the same system. And we know that there are pieces of content that are created in one group that eventually end up in another group. Right?

20:00

Pieces that are created in clinical, they eventually end up in the label. You know, there's a lot of this. So having this central place where they're all at this library that's available to different groups that need it, that really enhances the collaboration.

Another thing that we need to think about is: how are we going to automate getting the data from the actual studies into the content? And these technologies that we're looking at now that we're working with really make it much easier to grab the number of people who had this adverse reaction, for example, that number, and put it where it goes in the content without, again, typing it, because I'd hate to get that three and that nine or whatever reversed by accident.

So this is not standardized yet, but there are many groups. In fact, we've just started working on this interchange standard. So there's CDISC, and there's the ICH, and there's lots of groups that are working on this. But we really need structured content, that format-free structured content to be able to make this work.
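To make the point about pulling study numbers into content without retyping them more concrete, here is a small Python sketch using the standard library's `string.Template`. The field names and the figures are invented for illustration, and this is not a representation of any CDISC or ICH interchange format.

```python
from string import Template

# Hypothetical study results, as they might arrive from a data system.
study_data = {"event": "Headache", "n_events": 39, "n_patients": 1300}

# The sentence is written once with placeholders instead of hand-typed numbers.
sentence = Template("$event was reported in $n_events of $n_patients patients ($pct%).")


def render(data):
    """Fill the placeholders from the data, computing the percentage on the way."""
    pct = round(100 * data["n_events"] / data["n_patients"], 1)
    return sentence.substitute(data, pct=pct)


text = render(study_data)
```

The numbers travel from the data record into the sentence without anyone typing them, which removes the chance of transposing digits.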

David Turner

Very good. All right. Let's move on to the next one, which is around different geographies. One of the things that I know based on the work we've done here at DCL, we're one of those companies that will throw the content into a structured format for somebody. So I know that one of the big headaches that a lot of these pharma organizations have is managing their drug labels in different geographies. Right? The FDA needs one document, the EMA in Europe needs another one, you're in South Africa or Australia or Japan, you have other documents and formats that you need. How can moving to this whole idea of this broad structured content processing, how can that help facilitate perhaps the concurrent creation of these various filing documents?

Val Swisher

Yeah. It really is a challenge, I think, for all pharma companies because we do have multiple tweaked versions of the same piece of content that have to be a little bit different for the different regulatory. Even if we just take the USPI and the SmPC, we can see that "Gee, it's the same. Oh no, it's not the same here, but it's-

David Turner

No.

Val Swisher

Well, it's not the same here." So again, if we are building from components, we assemble those components at the point of publishing. So I can take the components that are identical and put in the components that are different and publish it for one regulatory agency. And I can take, again, those very same components, but put in these other components that need to be different for another region, another agency. So rather than having to construct and manage the whole document, because what if you need to make a change and you've sent this document to 20 regulatory agencies? Now you have to come back and you have to find all 20 documents. You have to change all 20 documents. And oh, by the way, while the one for Japan was open, you tweaked these other things and you forgot to tweak them for the EU or whatever.

David Turner

Yeah.

Val Swisher

So, yeah. Big problem.
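The assemble-at-publish idea from this exchange can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the component IDs, the `"*"` key for shared text, and the wording of each component. Shared components are stored once, and only the parts that genuinely differ have a variant per region.

```python
# Hypothetical components: shared ones have a single entry under "*",
# region-specific ones have one entry per region.
components = {
    "indications": {"*": "Indicated for condition X."},
    "dosage": {"*": "One tablet daily."},
    "reporting": {
        "US": "Report side effects to the FDA.",
        "EU": "Report side effects via the national reporting system.",
    },
}

# The order in which components appear in the published label.
label_outline = ["indications", "dosage", "reporting"]


def publish(region):
    """Pick the region-specific variant if one exists, otherwise the shared text."""
    parts = []
    for cid in label_outline:
        variants = components[cid]
        parts.append(variants.get(region, variants.get("*")))
    return parts


us_label = publish("US")
eu_label = publish("EU")
```

A later change to the shared dosage text shows up in both outputs the next time they are published, while the regional differences stay isolated in their own variants.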

23:51

David Turner

Well, and the article also talks about the new industry organization, the synergy that's developing this idea for a future platform to be able to load your content and then let the regulatory agencies all come to you. And so I think the idea of moving to this structured content process is what's going to facilitate that, as the article pointed out, and that, I think, will be truly transformational because you'll be loading your content that one time, and then, you can let the health authorities really put the content together in whatever way they like. Kind of a personalization, which actually leads me then to the next thing that we talked about, which is personalization in a little bit different way.

Let's talk about content personalization. You can think of it, you know, the health authorities want it in a certain way, and they could put content together, but you could start personalizing information for patients and things like that as well. I've read about this in this book. I don't know. You may have heard of it. It's called The Personalization Paradox. I have it here on my desk for ready reference at all times. But talk to me a little bit about what that personalization paradox is and how structured content leads to more personalized content experiences.

Val Swisher

Yeah. Absolutely. And thank you for the plug for the book. So I really like to think about personalization for point-of-care content. You know, when a physician or other health professional is with the patient and really trying to get that information that only that patient needs at that moment in time. So personalized content is the ability to provide the right content to the right person on the right device at the right time in whatever language they need.

And the only way to do that at scale is to take these monolithic blocks and break them into these components so that, again, for Sally, I can combine this set of components, and now here's Sally's protocol of what she needs to do for her situation, and here's Sam. And so that's the way that I can pull just those components that I need and really personalize. If Sally is pregnant, I want to make sure Sally sees, "This is what you need to know." Sam may not be pregnant and he may not need to see that. So it really allows me to personalize that experience for people. Yeah.
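The Sally and Sam example can be sketched as a simple filter over tagged components. This is a minimal illustration under assumed names: the `applies_to` field and the sample texts are hypothetical, not taken from any real system.

```python
# Hypothetical components, each tagged with the conditions under which it applies.
# An empty set means the component applies to everyone.
components = [
    {"text": "Take with food.", "applies_to": set()},
    {"text": "Talk to your doctor before use if you are pregnant.",
     "applies_to": {"pregnant"}},
]


def personalize(patient_attributes):
    """Keep components whose conditions are all met by this patient."""
    return [c["text"] for c in components
            if c["applies_to"] <= patient_attributes]


sally = personalize({"pregnant"})
sam = personalize(set())
```

Sally receives both components, while Sam receives only the general one, and neither result required writing a separate document by hand.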

David Turner

Yeah. Well, one of my favorite examples that was in your book talks about the idea of the custom home versus, like, the mostly custom home. Right? We're not talking about building a complete custom package for somebody. We're talking about putting together these elements that go together. And you can explain this much better than I can. So let me just pass it back to you. Talk a little bit about your home example. I think you'll know the one I'm talking about.

Val Swisher

I know the one you're talking about. So let's think about custom homes and apartments. So a custom home, you get to pick everything. You get to pick every single faucet, every doorknob, if you want different doorknobs for every room. Right? You get to pick everything. It is expensive to build a custom home. It takes a long time and it doesn't scale. You cannot build 100,000 custom homes all at the same time, or even 100 custom homes all at the same time so that they're all ready for the same, for the different people.

27:54

On the other side of the spectrum, we have apartments. And if I built an apartment building and there were a hundred apartments in it, every apartment will have the same door knob. This is in a standard building, right? We all get the same faucets, the same sinks. Even if I have three bedrooms versus one, we usually have the same appliances. And that is not customized at all. We don't think about who's moving into the apartment when we build an apartment building. So these are the two sides of the spectrum. Apartments are very scalable. I can build housing for 100 people in one fell swoop pretty quickly as opposed to each custom home. So what we really need is the middle ground, right? So think of the custom home as every single document is personalized for each person, you could never scale it.

David Turner

Can never do that. Yeah.

Val Swisher

But I don't really want to give everybody an apartment because, again, Sam might not be pregnant. I don't want to give him that information. He's got enough to worry about without reading that part. So the building community came up with this idea of standardizing components. So I can break it into components and I can standardize them. So for example, you go to one of these suburban, you know, the builder buys X number of acres and starts putting up houses.

And you can personalize your house because you can choose from one of six different door knobs, three different counter types. So what I've done is I've componentized the selections and I've narrowed it down to what choices I can make. And I can only do that because everything is standard and everything can mix and match. And that's what we need to do with content. It needs to be standard. We need to use the same terminology. We need to use the same grammar. We need to use the same writing style. You know? If we're in the present tense, we need to be present tense. If we're in the future tense, we need to be future tense. Then I can grab these little components and mix and match them and publish them just for Sally, just for Sam, just for Pat, just whatever. So that's where we do that.

David Turner

Well, thank you. Thank you for allowing me to go off script with that. Of course, I did put us a little bit further behind in the agenda, but I thought that was worth it to me. That's such a great example. Just real quickly, a couple of other benefits. I saw a survey last week that said that pharma executives think that artificial intelligence is the number one technology that is expected to have an impact on the industry. And we are firm believers that content structure really is something that helps to enable things like artificial intelligence. Structured content helps to enable things like analytics and decision making. Can you talk just quickly about that and kind of how structured content fits in with these topics?

Val Swisher

Yeah. Absolutely. So without a doubt, AI is coming, it's here. Natural language processing, natural language generation, all of the various kind of analytics, descriptive, prescriptive, predictive. All of these things are accelerating. But you know my saying. If you buy a new expensive tool but you still have the same old crappy content, what you end up with is expensive, crappy content. That's my saying. You can't just buy the tool and think that you can just shove that same old crappy content into it and it's magically going to be better. It's not.

31:58

So, structuring your content, standardizing your content, before you have to upload and ingest that content, the system has to ingest that content, is absolutely critical. You have no choice because the system itself can't fix your crappy content.

You've got to work on your content in order to get these brilliant results. The standardization, it doesn't matter what tools you're using. You need to standardize and chunk. Also AI, you need to have it structured. It needs to learn. You need to train an AI engine with very structured content. So you're really preparing for the future as well when you move to this kind of system. Yeah.

David Turner

Absolutely. All right. Last benefit that the article mentioned talks about speed. And it's just certainly no secret that the industry's trying to get products to market faster. I think we've hit on some of these things already. You know, you don't have to go and find, if you make a change, you don't have to go and find it in a bunch of documents, which don't underestimate that. I've had people tell me that if they make a change in something, it takes them anywhere from three to nine months. I think I've had someone tell me it takes up to a year to find everything like that. There are so many review processes that you have to keep doing over and over again in a copy and paste workflow that you can eliminate, not to mention rewriting, et cetera. And honestly, we could go on and we could talk about more and more benefits here that weren't mentioned in the article. Like, I think about accessibility, 508 compliance. You know, building that in just helps make all of that happen.

But we are at this point behind, so we did promise that we were going to talk about getting started and what it is that we need to do. So let's jump into that. And we're going to really break this down into three parts because we think the three big pieces that you need to consider if you want to get started here, first of all, is you got to put together some sort of a strategy. That's kind of building block number one. Second of all, you're going to need some technology. And we're going to talk about some of the technologies that you need to be thinking about. And then there's a conversion piece, right? How do I get my old content into these new formats in a way that we can work with it? So this is kind of where we're going next here in section three. And with that, Val, you get to talk about strategy now.

Val Swisher

All right. I'm going to go a little bit fast so that we can get through all three pieces and still have some time if folks have questions. You have to have a strategy. You can't just say, "Oh, I'm going to get this new technology," because remember the crappy content, expensive technology thing. So your strategy is going to consist of four different parts, and I'm going to talk about each one: content modeling, reuse strategy, taxonomy, and metadata.

So let's talk about content models. So a content model defines the structure. So if you think about it, pharma content is already very much structured. So it's really taking the monoliths and breaking it into the small pieces. So we understand this is the structure of an SmPC. I don't know, that's the piece of content in my head. We're doing a lot of USPI/SmPC structuring right now. So this is the structure. The overall structure of a USPI and an SmPC are very, very, very, very similar, but until they're not. So we need to have, right, and a lot of it's already defined.

35:58

So we're defining the structure of each piece that goes in. This is part of how we separate the content from the format, which you were talking about, so we can publish to whatever we want. It has to be structured and modeled in order for us to reuse it and to automate things. The systems, which you'll talk about more, need to know what to expect.

David Turner

Right.

Val Swisher

The models enforce the consistency and that increases the quality. And again, we even see this now in the TransCelerate templates that were released. We're trying to streamline the writing process so we can model it so that the writer knows exactly what to write, where they need to write it. This goes here, this goes here. No, you don't have to look at the last one you did to remember. It's right there in the models for you. So that's the first part of your strategy, is modeling.
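One way to picture a content model enforcing consistency is as a checklist a system can run against a draft. The sketch below is a simplification under stated assumptions: the model is just an ordered list of section names, and the section names themselves are illustrative rather than the actual structure of a USPI or SmPC.

```python
# Hypothetical content model: the ordered sections a document type must contain.
CONTENT_MODEL = {
    "label": ["indications", "dosage", "contraindications", "warnings"],
}


def validate(doc_type, sections):
    """Return a list of problems; an empty list means the draft fits the model."""
    expected = CONTENT_MODEL[doc_type]
    problems = [f"missing section: {s}" for s in expected if s not in sections]
    problems += [f"unexpected section: {s}" for s in sections if s not in expected]
    # Compare the order of the recognized sections with the order the model defines.
    present = [s for s in sections if s in expected]
    if present != [s for s in expected if s in sections]:
        problems.append("sections out of order")
    return problems


# A draft that has two sections swapped and two sections missing.
issues = validate("label", ["dosage", "indications"])
```

The writer gets told what goes where up front, and the system can refuse or flag a draft that does not match what it expects.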

The next part is creating a reuse strategy. So before you start reusing this content, you really need to think about, "Well, which parts of this content can be reused? Which content can be reused but I need to have a derivative?" So perhaps it's future tense here, and it's present tense there, and it's past tense here. Right? So can I use it as it is or do I need to make some changes and how? And who's allowed to change the reusable content?

Where am I going to use it? Right? All of these different things go into your reuse strategy. But if you do it right, then you end up with a single source of truth for each component that's right there. They're built into documents later on and this is how you reduce your costs and you speed up your time. So you have to have a strategy.

David Turner

Yeah. Well, it sounds like it's a really important topic and we probably could do a whole webinar on it. Oh, wait: we are doing a whole webinar on it! So here's a shameless plug. Next month, we're going to be talking about Content Reuse Strategies For Pharmaceutical Clinical Content. So feel free to go ahead and click the link. I think, did we put the link in the chat? Yeah. We've got the link in there. Feel free to click on that. Go ahead and go register. We'd love to have you. But shameless plugs behind. Let's let you talk about the next part of the strategy.

Val Swisher

No, it's going to be a good webinar. Good webinar. Well, once you have all these little components, you need to be able to organize them. So you need a taxonomy. And a taxonomy is a system of classification. You can think of it as a folder structure, right? It's the structure that you're going to use to organize all of these little pieces of content. You really need to be able to find them quickly and conveniently so that you don't make the mistake of accidentally making another one when you already have one. So the taxonomy is usually used for search. It can be very multilayered and tree-branched or it can be flatter. It really depends. We work with organizations all the time on what is the right taxonomy for your content. So it's not a one-size-fits-all, but it's very important, as you make more granular units of content, that you can organize them.
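The folder-structure picture of a taxonomy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the path syntax, and the component ID are hypothetical, and a real structured content management system would supply its own classification features.

```python
# A taxonomy as a nested "folder" tree. Leaves are lists of component IDs.
# All category names here are hypothetical.
taxonomy = {
    "clinical": {
        "protocol": {"objectives": [], "eligibility": []},
        "safety": {"adverse-events": []},
    },
    "labeling": {"storage": []},
}


def file_component(tree, path, component_id):
    """File a component under an existing leaf category such as 'labeling/storage'.

    Raises KeyError if the path names a category that was never defined,
    which keeps people from inventing categories on the fly.
    """
    node = tree
    for part in path.split("/"):
        node = node[part]
    node.append(component_id)


def find(tree, prefix=""):
    """Yield (category_path, component_id) for every filed component."""
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            yield from find(child, path)
        else:
            for component_id in child:
                yield path, component_id
```

Searching the tree before writing something new is what helps avoid accidentally making a second copy of a component that already exists.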

David Turner

So when you're using that technology, you're going to need to make sure that you ask about capabilities with taxonomy. And I think you'll probably talk about that when we get to technology. But anyway. Onto metadata strategy.

39:56

Val Swisher

So metadata, again, is often used for grouping. It's used for search. So you need a strategy. Again, you can't go in and do metadata on the fly. "Oh, I'm going to tag this one this way. I'm going to tag this one this way. And, oh yeah, we didn't actually think about this." So for the technology to work, you need to have consistent tags and you need to think about each piece of content. Not only does it go here, but I can also access it using this tag and this tag and this tag. And what's the minimum that needs to be applied to each piece? And what's allowed and where are you going to do it? And how are you going to do it? So you really need to have that strategy in place. So the strategy, before you start touching your content, the strategy is important because otherwise it's going to be a free-for-all.
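One way to picture a metadata strategy is as a small set of rules that every component must pass before it is stored. The Python sketch below assumes a hypothetical set of required fields and allowed values; the field names and vocabularies are invented for illustration only.

```python
# Hypothetical rules: the minimum tags every component needs,
# and the controlled vocabulary allowed for some of them.
REQUIRED_FIELDS = {"product", "document_type", "language"}
ALLOWED_VALUES = {
    "document_type": {"protocol", "label", "leaflet"},
    "language": {"en", "fr", "de"},
}


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the tags are acceptable."""
    problems = []
    # Check that the agreed minimum set of tags is present.
    for field_name in sorted(REQUIRED_FIELDS - set(metadata)):
        problems.append(f"missing required field: {field_name}")
    # Check that tagged values come from the agreed vocabulary.
    for field_name, allowed in ALLOWED_VALUES.items():
        value = metadata.get(field_name)
        if value is not None and value not in allowed:
            problems.append(f"{field_name}: '{value}' is not an allowed value")
    return problems
```

Checks like these are what keep tagging from turning into a free-for-all: the decisions about what is required and what is allowed are made once, up front, and then enforced the same way for every piece of content.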

David Turner

Yeah.

Val Swisher

Yeah.

David Turner

Well, certainly there's a lot to think about when you look at content models, when you think about reuse, when you think about taxonomy, when you think about metadata strategy. I would think it probably pays to have a really good consultant for that. Do you know any good consultants? Can you recommend anybody?

Val Swisher

Oh my gosh. Your shameless plugs are the best shameless plugs ever. So absolutely. At Content Rules, we have been working with customers on this type of strategy for 28 years. We work with many pharma companies, med device companies, and other vertical market companies. So we have the knowledge that you need. If you need some help with this, we would be delighted to help you.

David Turner

Well, thank you very much. All right. So we've talked about the strategy block. Now let's talk about the technology block. We had shown before this picture of the content ecosystem. But we're going to really kind of just highlight the middle section. Talk us through these pieces of technology here that are on the slide that are critical, what they are, what they do, kind of as part one, and then part two, you made a comment earlier about crappy content. Too often, I think projects fall off the rails because people get so focused on the technology piece. And the technology pieces are important and there's some great technology providers attending today. But you can't just focus on the technology piece. If you focus just on the technology and you don't do anything with the strategy piece, what are the dangers of that? How do you keep things in balance? So just kind of a big question for you and I'll let you run.

Val Swisher

Yes, it is a big topic. If we just focus on the key technologies that are in this slide. So we talked a bit already about the structured content management system. This is the system that stores all of your components, that has all of the features you need such as version control and change tracking. And many of them have collaborative ways of doing review. And also workflow, built in workflow. It goes from here, then it goes to here, then to here. Some of them have translation capabilities. You were going to say?

David Turner

How is it different than a document management system?

Val Swisher

Well, a document management system manages documents. So, and it can do a lot of the same kinds of tracking, but it can't-

David Turner

So it's really at the document level and not at the structured content component level.

Val Swisher

Exactly. It cannot manage components and really track: this little piece of content is used here, here, here, here, and here. A document is a document. It's a completely different level of granularity.

44:00

And it's not reusable in the same way as a component. So the structured content management system, some people might have heard it referred to as a component content management system as well. It's a key piece that you need in order for this whole thing to work. In addition, you have to have an authoring interface. The authoring interface is very important because that's how the writers interact with the system. I think that years ago there were companies looking at this type of environment. They were looking at moving to components. They were looking at, you'll talk a bit about XML.

And the biggest roadblock was the authoring interface. And it still is. We can't seem to pry our fingers off of Microsoft Word because everybody knows it, everybody has it. It's ubiquitous. "Don't change my authoring environment." Well, in the past, working in these systems as an author, you kind of felt like you were sort of a coder and an author. They were really complicated. But nowadays, there are interfaces that really look like Word or Google Docs. They allow you to work in what looks like a document, and behind the scenes it's components, it's getting tagged with metadata. All of the reuse is happening.

David Turner

Yeah. And I love that you brought up Google Docs there because I think that's an example. Right? We've had people so focused on Microsoft Word and they still are focused on that. But a lot of people have made the move to Google Docs and they'll tell you, "Oh, well, I mean, it's just like Word. I mean, it's an easy switch." And I think similarly, people need to realize that moving from Word and Google Docs to one of these new editors is really the same, except that now you can take advantage of all these component pieces. It is not so different, right? It's not like you're having to learn a whole huge new thing. You're just taking an extra step.

Val Swisher

Exactly. Exactly. It really, it's time to, if you looked before, it's time to look again because the authoring experience is simple. Hey, I picked it up right away. So it is simple. So that's the authoring interface. And obviously, it's a key part of the system. And then the third part is the publishing system, because one of the things we haven't focused on, but it's important to remember, is that your components are stored without any formatting. So Helvetica, 16, bold, left-justified. That's part of your Word document, but it's not part of your structured ecosystem, which is great because I can take that same information and I can format it with, let's just take font size, font size for submission in a document versus the font size for the label where I need my microscope to, right, I need something to look at it because it's so small. But it's the same content. I don't have to redo it. And that's done by the publishing system. And I can put all different formats on it before I publish it. So if it's marketing, it can look this way. If it's a label, it can look this way. And that's the third key piece of the puzzle.
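The separation of content from formatting can be sketched in Python as a plain-text component plus a set of per-channel style profiles that are applied only at publish time. The channel names, font sizes, and HTML output below are assumptions chosen for illustration, not the behavior of any specific publishing system.

```python
import html

# Hypothetical style profiles, one per output channel.
# The component itself carries no formatting at all.
STYLE_PROFILES = {
    "submission": {"font_family": "Helvetica", "font_size_pt": 12},
    "label": {"font_family": "Helvetica", "font_size_pt": 6},
}


def publish(text, channel):
    """Wrap unformatted component text in the styling for one channel."""
    style = STYLE_PROFILES[channel]
    css = f"font-family: {style['font_family']}; font-size: {style['font_size_pt']}pt"
    return f'<p style="{css}">{html.escape(text)}</p>'
```

Calling `publish` twice with the same text and two different channels yields two differently formatted outputs from one stored component, which is the point being made about the label versus the submission document.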

47:57

David Turner

Absolutely. And you just never know what your outputs are going to be. And so being able to separate that, the content from the design, I think is critical because you can do this multi-channel type of approach. You know, I've got content that needs to go onto a label. I've got content that needs to go to a leaflet. I've got content that needs to go to a website. I've got content that needs to go wherever. And you don't have to maintain all those different ways. And you also don't have to police people or have people focused all the time on, "How do I make this extra paragraph here?" You let the technology take care of that. You give me the content and we'll take care of the formatting, which I think is brilliant. So we still got a minute or two here on technology that we can spend. Talk a little bit about the other piece here. How is it that, what are the dangers of focusing on the technology piece without doing the strategy? And how do you work to keep things in balance?

Val Swisher

Yeah. So that's back to the bright, shiny object syndrome because people love to buy new tools. And they think that the tool is just going to magically solve the problems. You cannot take the same exact content that you have in a Microsoft Word document and just magically think that it's going to componentize itself. It's not. So in order for all of this to work, you need to think about, that's why we need the strategy, what your content needs to look like, how big is the component that you're managing? What's the granularity of reuse that you want? Your content's going to need to be chunked up, your existing content. I happen to know a company that can help you with this. Maybe they're going to talk about it next.

David Turner

Maybe there'll be a shameless plug.

Val Swisher

Maybe they're going to do a shameless plug. And this is why Content Rules and DCL are great partners because you need all of it. You can't just buy the tool. You need to work with your content. And you can't just chunk it up either. You need to make sure that it's written in a way that it can be put back together.

David Turner

Absolutely. And what you were just hitting on there is the whole idea of conversion because when you implement one of these technologies and you've got a nice strategy in place, you could feasibly say, "Well, I'm just going to start. And everything I do from today forward, we're going to do in this new system." But honestly, most companies say, "Well, hold on a second. I've already got a whole pile of these old documents and have content that I want to reuse. How can I get that in here? How can I bring this in?" So that's the conversion piece that we're talking about. And I happen to work in the realm of conversion, so I thought I would take the lead.

So remember before we talked about creating this library of content components? Well, conversion is the process of breaking down whatever existing documents you have, or existing Lego pieces here, into their component parts, and then organizing them for search, for reuse, et cetera. It's getting the content in the system, or getting the content in the formats that can be maximized by the structured content management system that you put in place in the authoring environment. Right? It basically is setting up so when somebody says, "Hey, I want to reuse a piece of content." You know? Type, type, type, type, type. "Oh, there's the component of content. I can pull it in." Now, I'm going to give you a little bit of an under-the-hood look here just to kind of give you an idea of kind of what's going on.

Ultimately, what has to happen with this content is we've got to help out our friend, the computer. Right? As humans, we can read a document and we can infer structure and we can infer the different component parts just kind of naturally. So I went out, and this is not one of our clients or anything like that, I just went out and I pulled a random drug that started with the letter A, and put this in my presentation here.

52:03

But as a human, you can look at this and you could recognize that Accuretic is a drug name. And if you're in pharma a little bit, you probably recognize that this is a generic name here, and then you've got kind of the drug form there. Right? That's just what we can do naturally as humans. It doesn't work as well for computers to do that. Right? So a computer can read these words, but to the computer, they're just words without some sort of structured content. So what we have to do is we have to help the computer out here. And we use a concept that's called XML. Right?

We're using a concept called XML to help the computer to know what it is that it's reading. So we start putting these little tags that are hidden, kind of behind-the-scenes, a beginning tag and an ending tag that tell the computer, "Hey, this is the name of something." And based on where it is in the document, this might be the name of the drug, or this might be the generic name. So if you pulled out kind of an XML file, we can say, "Here's the drug name. This is how it's laid out." And you can see there, there's a little tag there that says "name," and then there's a little tag with a little slash mark in it that tells the computer, "Hey, the name is over." Right?

Or if we go down and we look at the generic name, you see there's the slash, there's "name" in brackets, and then there's slash name after it. That tells the computer, "Hey, the generic name is contained between those two tags." And similarly, the drug form. Now, with this drug form, we've actually given the computer a lot more additional information. Right? We're not just saying, here's, "TABLET," "FILM COATED" is the form, but we actually have a code that's related to it, a code system that's related to it. They're all pieces of metadata. So what we're trying to do is we're trying to help your content to get into this format that the computer can read. And then what your technology partner does is make sure all of this is really hidden from you so that you won't have to mess with it, so that you can work in a Word-like environment and have all of this stuff magically happen kind of behind the scenes.
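To make the tagging idea concrete, here is a small Python sketch that reads a tagged fragment with the standard library's `xml.etree.ElementTree`. The fragment is an assumption written for illustration: the element names (`product`, `genericName`, `form`) and the attribute values are hypothetical placeholders, not the markup shown on the slide and not a real labeling schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. Only the drug name and the form text come from
# the talk; element names, the generic name, and the codes are placeholders.
SAMPLE = """
<product>
  <name>Accuretic</name>
  <genericName>GENERIC NAME PLACEHOLDER</genericName>
  <form code="CODE-PLACEHOLDER" codeSystem="CODE-SYSTEM-PLACEHOLDER">TABLET, FILM COATED</form>
</product>
"""


def describe(xml_text):
    """Pull the tagged pieces out of the fragment by name, not by position."""
    root = ET.fromstring(xml_text.strip())
    form = root.find("form")
    return {
        "name": root.findtext("name"),
        "generic_name": root.findtext("genericName"),
        "form": form.text,
        # Attributes carry the extra metadata attached to the form.
        "form_code": form.get("code"),
        "form_code_system": form.get("codeSystem"),
    }
```

Once the words sit between a beginning tag and an ending tag, the program no longer has to guess which string is the drug name and which is the form. It simply asks for them by tag.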

So when talking about conversion, we're talking about getting the content from the document into all of those hidden tags, et cetera, in a way that your new structured content system can read it. So in other words, breaking it down into this content. Now, what that looks like in reality a lot of times is it could involve a lot of different things. There's a conversion process where a company says, "All right. Let's see. What do you have? You have some Word documents. You have some PDF. Maybe you have some InDesign. We're going to get that converted for you and we'll deliver that back." And it doesn't mean you necessarily have to throw away your Word, PowerPoint, or InDesign. You can keep all those things. But now you're going to have them broken down into the components.

Part of this process could involve things like scanning documents, if your content's in paper, and doing OCR, which is optical character recognition, that's helping the computer recognize the text that's on one of these scans. It could involve creating metadata or remediating metadata. You know, maybe you did metadata a certain way, and then you bought another company and they did it a different way. And there's a bunch of gaps, so we need to fix that. It could be normalizing content. You know, now you're bringing together content that's been in four different systems and they each had a little bit different data model.
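Metadata remediation of the kind described here can be pictured as mapping each legacy system's field names onto one agreed set. The Python sketch below is a minimal illustration; the alias table and field names are hypothetical, and a real conversion would involve far more rules plus quality control.

```python
# Hypothetical mapping from legacy field names to the agreed canonical names.
FIELD_ALIASES = {
    "doc_type": "document_type",
    "DocType": "document_type",
    "lang": "language",
}


def normalize(record):
    """Rename legacy fields and tidy string values so records from different systems match."""
    normalized = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key, key)
        if isinstance(value, str):
            value = value.strip().lower()
        normalized[canonical] = value
    return normalized
```

After normalization, a record tagged `DocType` in one system and `doc_type` in another ends up with the same `document_type` field, so search and reuse can treat them alike.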

So part of the conversion process could involve normalizing content. It can involve quality control to make sure that it works and it shows itself right in your system and when it gets published. And of course, a big part of this could be just migrating the content from one system to another and making sure that it gets loaded in the right place. Make sure it gets organized in the right way. Like strategy, there's a lot of work to do. So here's the shameless plug. It's the kind of thing that we do here at DCL, right? So we help to get content organized, broken down, converted, loaded, all those kinds of different things. And we've been in business now for 40 years. With that, we have made it to the last section. Yes.

56:26*

We have actually kept on schedule and we're here at the Q&A portion. So let's ask Marianne to come back here. Marianne, what questions do we have today?

Marianne Calilhanna

Thank you both so much. Okay. So we just have a couple minutes for questions. But one came in from someone who works for a small- to mid-size company. They say they may or may not be managing documents and metadata effectively. Can you recommend three or four areas where they should first focus?

David Turner

Val, I'm going to let you take that. That sounds like the strategy.

Val Swisher

Yeah. So I don't know whether they're interested in understanding how to focus specifically on metadata or focus on the whole thing because a lot of people get very overwhelmed at the idea of doing this. And we would normally suggest that perhaps we start with one molecule and we do a certain, you know, that we don't try to boil the ocean, that we focus on either one set of documents or one molecule.

And we like to talk about the first pickle out of the jar. You know? We can iterate after that. Getting the first pickle out of the jar, that's kind of the harder part. And once we've worked on one, then the rest of it starts to look a lot easier. So even with metadata, we usually work with people on metadata in a very facilitated way because we need to really kind of talk out all the different ways you want to tag information and why and what's important. So it depends on whether it's metadata. With metadata, it's more important to think broadly than it is when starting the content modeling process and all of that, where we really can start kind of small. So hopefully I answered the question, however it was.

Marianne Calilhanna

A good follow up: someone just asked, "How long does it take to move to this environment of structured content?"

Val Swisher

Yeah. And that really is the pickle out of the jar. So the first one, I'm just going to guess out here, the first one could be six months to really just start. The rest of the ones we start iterating go faster than that. So it is definitely an effort, but it's certainly doable and worth it.

Marianne Calilhanna

Well, we have come to the top of the hour. I'll just leave with one open ended question. How did you two coordinate exact matching paint color on two separate sides of the country? So that's pretty interesting. Thanks, everyone, for taking time out of your day to join us. Thank you so much, Val and David. So today's webinar is part of the DCL Learning Series and that learning series comprises more than just webinars. We have a blog, we have a monthly newsletter.

And all of these things always focus on topics around content structure, related methodologies, and technologies. So we invite you to take part with these offerings that we have. This concludes today's webinar. We hope to see you next month when we'll talk content reuse. And until then, enjoy the rest of your afternoon, or evening, or day, wherever you might be. Thanks a lot.

David Turner

Thank you.

Val Swisher

Thanks. Bye now.

*Resources named on the screen at 56:26:

• 30-minute digital strategy consulting session with David

• "The Pharma Content Evolution: Content Reuse and Automation"