DCL Learning Series
Making the Case for DITA in Pharma and Life Sciences Content
Hello, everyone, and welcome to the DCL Learning Series. Today's webinar is titled "Making the Case for DITA in Pharma and Life Sciences Content." My name is Marianne Calilhanna, and I am the VP of Marketing here at Data Conversion Laboratory. Before we begin, I do want to share that this conversation is being recorded and it will be available in the on-demand webinar section of our website at dataconversionlaboratory.com.
We invite you to submit questions and comments at any time during today's webinar. We are going to save some time at the end, but don't wait; if something comes to mind, please send your question in. Our panelists have a lot to discuss with you, so let's get right to it. I'm so pleased with today's lineup of speakers, and I'm happy to introduce my colleague, David Turner.
David is DCL's Digital Transformation Consultant. He helps organizations understand the benefits of content structure, semantic enrichment, and content management. Hi, David.
Hello, and thank you for having me.
I'd like to briefly give a little perspective about DCL. DCL services and solutions are all around converting, structuring, and enriching content and data. We are the leading provider of XML conversion services, data conversion, structured product labeling conversion, and S1000D conversion. While we are best known for these excellent content conversion services and content structure services, we also do a lot of work in the other areas listed on this slide, things like semantic enrichment, entity extraction, data harvesting, third-party validation for previously converted content, content reuse analysis, and structured content delivery to major industry platforms.
No matter the service we provide, we harness the latest innovations in artificial intelligence, including machine learning and natural language processing, to help businesses organize and structure data and content for modern technologies, systems, and platforms. Next slide, please. Our next speaker is Val Swisher, CEO of Content Rules, the global content experts. Hi, Val.
Hi, Marianne. Hi, David. Good morning, for me. Thank you for introducing me. I am the CEO of Content Rules. We are, I think, the oldest content services company in the world. We've been around for just about 30 years. We offer a variety of services, including content strategy, which helps customers move from an unstructured environment to a structured environment, which we'll be talking about today, global content strategy, if you're going to be taking over the rest of the world, content development.
We have writers and editors and artists and any service you need to create content. And content optimization and transformation where, along with DCL, we help customers take their legacy content that is unstructured and move it into structure and do all the other things that the DCL conversion doesn't do, which includes de-duping, rewriting, cleanup, that sort of thing. We've been working with DCL for a long time, and I'm thrilled to be here. Thank you.
Our third speaker is Sophie McMonagle. Sophie is Information Architect at Novo Nordisk. Sophie, we are thrilled to have you with us today. Thank you so much.
Thank you, Marianne. Really happy to be here. Thank you very much. I'm the Information Architect at Novo Nordisk. I think Novo Nordisk, many may have heard of already, but we are a Danish pharmaceutical company celebrating our 100th anniversary, in fact, this year, focusing particularly on diabetes and rare diseases, so cardiovascular, hemophilia, obesity as well.
You may have heard of Novo Nordisk in the news with Wegovy and Ozempic. So really happy to be representing Novo Nordisk today. My background is actually mostly in IT, so I have around 17 years of structured content experience. So DITA XML is my baby, so I think I'm talking today from Novo Nordisk's experiences and our very ambitious vision, I think, in terms of adopting structured content, but also my history with other companies and my journey with DITA as well.
Well, I'm going to go away, and I really look forward to this conversation. Thank you, everyone.
Well, thank you so much. I am excited, and as Sophie hit on there, it is nice to have a couple of different perspectives today. What we wanted to get at was the idea that structured content in pharma and life sciences has been getting a lot of traction, a lot of awareness, but because of the many different applications for structured content, there are a ton of different standards, and it can just feel really overwhelming.
So when a pharma company wants to start working with structured content, a lot of times it can be confusing, they don't know exactly where to start. The purpose of this webinar is to talk about a flexible architecture called DITA XML that could possibly be used for this purpose. Sophie has a perspective – well, obviously she worked in other companies, but she brings this perspective of working at Novo Nordisk, who is undertaking this project with DITA, and then Val has broad experience in working with some other clients who used both DITA and tried some other things.
That's where we're coming from today. A quick disclaimer: whatever XML model you use is driven by your specific use case, okay? We're not saying today that DITA is for everybody. There could be compelling reasons to use a different model. At DCL, we have partners who we love and do a lot of work with who have great tools that do not use DITA but are based on a custom XML model or on DocBook XML, for example.
But the purpose of this particular webinar is to offer a consideration for the merits of a DITA-based approach. We're going to divide this up into four parts. I'm going to let Val go in and define our terms for us. Then we're going to discuss just the case for structured content in general. Then we're going to get into some of the reasons why Sophie's team chose DITA, why some of Val's clients have chosen DITA, why it might make sense for you.
And then we'll wrap it up with a summary at the end and do some Q&A. So like Marianne said, please start adding those questions right now, and we'll dig in and make it happen. We do have a lot to cover. It's going to be high-level and quick-paced, but hopefully it'll be good. So Val, I'm going to turn it over to you. Why don't you give us a quick high-level overview: what is structured content, and what are we talking about here in our definitions?
Sure. Well, let me introduce structure by first talking about what unstructured content is. Unstructured content is what we're used to. It's when we basically start at page one and just keep going until we hit page 320 or whatever.
It is written, updated, stored, managed, published, and translated as one big monolithic document. Historically, we've had unstructured content for a lot longer than structured content.
One of the things to keep in mind, and this is really important, is that in unstructured content you combine content and format. It is "what you see is what you get." And if you need multiple formats, then you have to have multiple copies of that same unstructured content. So keep that in mind. It's created using all the tools we know: Google Docs, Word, et cetera. To contrast, structured content is content that's created in small chunks.
We call the chunks components. You may hear them called topics. You may hear them called fragments. You may hear them called blocks. Call them whatever you want, but they are small chunks, and they are tagged individually with metadata, which is another thing to think about with unstructured content.
In most unstructured content situations, you can only tag the file. And if your file has 52 pages, good luck finding the information you need quickly. It is separate from format, so your text is truly your text. That means that from the very same file, that one chunk/block/fragment, you can publish to a variety of formats. Very important. It's organized. It's stored in a central repository.
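To make the separation of content and format concrete, here is a minimal Python sketch using only the standard library. It stores one component as DITA-style XML (`<note type="warning">` and `<p>` are real DITA elements) and renders that same component two ways. The component text, the CSS class name, and the two renderer functions are hypothetical illustrations; a real pipeline would typically rely on a DITA processor such as the DITA Open Toolkit rather than hand-written renderers.

```python
import html
import xml.etree.ElementTree as ET

# One hypothetical component, stored once, with no formatting information in it.
COMPONENT = '<note type="warning"><p>Keep out of the reach and sight of children.</p></note>'


def to_html(xml_text: str) -> str:
    """Render the component for a web page (the CSS class name is hypothetical)."""
    root = ET.fromstring(xml_text)
    paras = "".join(f"<p>{html.escape(p.text or '')}</p>" for p in root.iter("p"))
    return f'<div class="note-{root.get("type")}">{paras}</div>'


def to_plain_text(xml_text: str) -> str:
    """Render the very same component as plain text, e.g. for a label proof."""
    root = ET.fromstring(xml_text)
    text = " ".join(p.text or "" for p in root.iter("p"))
    return f"{root.get('type', 'note').upper()}: {text}"


print(to_html(COMPONENT))
print(to_plain_text(COMPONENT))
```

The text lives in exactly one place; each output format is just a different rendering of it, so adding a new format means adding a renderer, not another copy of the content.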
So specifically then, what is DITA?
Okay. There are many ways to create and use structured content, and most of the ways that we work with in the content world are based on XML, which is a markup language. One of them is DITA, which stands for Darwin Information Typing Architecture. Maybe Sophie can tell us how they came up with that name.
Think of DITA as a subset of all the possibilities that XML offers to you. So it is a subset. It was introduced by IBM, Sophie was there, and it has become the de facto standard for structured content. So that means that, first of all, it's an open standard, which makes it much easier to create tools. There is a large ecosystem of tools that support DITA.
There are known methods for doing things. In fact, in DITA the elements and attributes are semantically classified, which means a task is actually called a task and a concept is actually called a concept. So you don't really have to wonder, "Gee, what is that?" And you can classify those different types of elements using semantic structure.
Now, Val, you stole my thunder a little bit here. I was going to take us back here to the early 2000s and the fun stuff that was around. As you mentioned, I know that in the early 2000s this came from IBM. And as you mentioned, Sophie was there. Here's a picture from back then. Sophie, I was going to just ask you quickly here since you were there. I read an article by the great Eliot Kimber, who gave a couple of different things about what their initial goals were.
First of all was this idea of enabling interoperability without having to require strict agreement between sources, so you could have your DITA and their data, and it'd be flexible enough for that. And then the other big one was just enable this reuse. How does this track with what you remember, and your thoughts on that?
Yeah, so I mean, I wasn't there in 2001 in the labs, but I did join the IBM Hursley labs in 2006. They had previously been on SGML DocBook, so they were actually just migrating the majority of the content into DITA. So reuse was absolutely massive as a driver, because, I mean, the product I was on was an old mainframe product, transaction processing, and we had 30,000 topics.
And every release, we would have to replicate those topics. Sometimes the only thing that changed was the product number: version one to version two to version three. As soon as you can actually start capturing those pieces and say, "Okay, we'll create that as a variable and we can update just the version number," it makes your content really portable, really reusable. So it was a really big deal. That wasn't my badge photo. I think I had my student photo for many, many years, but that was the time.
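The version-number example maps onto DITA keys: a map defines a key once (`<keydef keys="...">`), and topics reference it with a `keyref` attribute. The Python sketch below is a hypothetical, heavily simplified resolver built on the standard library's `xml.etree.ElementTree`. The key name `product-version`, the sample sentence, and the helper functions are invented for illustration; real key resolution (key scopes, precedence, and so on) is handled by DITA processors, not by code like this.

```python
import xml.etree.ElementTree as ET

# A simplified map that defines the variable once (a text-only key, no href).
MAP_XML = """<map>
  <keydef keys="product-version">
    <topicmeta><keywords><keyword>3.0</keyword></keywords></topicmeta>
  </keydef>
</map>"""

# A topic paragraph that references the key instead of hard-coding the version.
TOPIC_XML = '<p>This guide applies to version <ph keyref="product-version"/> of the product.</p>'


def load_keys(map_xml: str) -> dict[str, str]:
    """Collect key name -> keyword text from <keydef> elements (hypothetical helper)."""
    keys = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        keyword = keydef.find("./topicmeta/keywords/keyword")
        if keyword is not None:
            for name in keydef.get("keys", "").split():
                keys[name] = keyword.text or ""
    return keys


def render_text(elem: ET.Element, keys: dict[str, str]) -> str:
    """Flatten an element to text, replacing <ph keyref="..."/> with the key's value."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "ph" and child.get("keyref") in keys:
            parts.append(keys[child.get("keyref")])
        else:
            parts.append(render_text(child, keys))
        parts.append(child.tail or "")
    return "".join(parts)


print(render_text(ET.fromstring(TOPIC_XML), load_keys(MAP_XML)))
```

Changing the `<keyword>` value in the map and re-rendering updates every topic that references the key, which is the "update just the version number" idea in miniature.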
You're taking away the magic of the movies here. All right, so we hit our quick definitions: structured content, unstructured content, DITA. Let's just talk about the case for structure. I think the big question that always comes up, especially when we're talking to people in pharma and life sciences, is "Why structured content?" Why not just Word? In a nutshell, what's the value of this? Anyway, so Val, I thought this might make a great start for you and maybe a great new series on Netflix or something: What's the VALue? I've already put it into my list there. Obviously you're working with more and more organizations who are implementing structured content. What is the value?
Oh, there's so many values to structured content. Let's talk specifically about life sciences. It's the same in every industry, but let's just talk about life sciences since we are here for that. In the current ecosystem, as your medical writers create information and update information, often they need variations of that same document. In addition, there are pieces of content that need to appear in many documents over and over and over again as your drug product moves through its regulatory life cycle.
The way things work today is your authors write something and eventually what they write goes to the health authorities, who inevitably come back and say, "You need to make these changes." And then every single place that content appears, you have to find and update it. If you miss one and you're audited, you're in big trouble. And we all know that. It doesn't scale. It's inefficient in every way possible. It's high risk, high risk. That's today. Instead –
All right, let's talk about a better way then.
Yes, the better way. If we single-source, if we reuse components, fragments, call them whatever, call them label blocks, you write it once, you update it one time, and those updates are available immediately in everything that you publish that uses that chunk of content. The risk is minimized and the time is cut. Depending on where you are in the process, it can take you a lot of time to find all those versions. If you only need to find one, it's going to be a lot quicker. So it's just a much better way.
Let me also say, because I haven't said it yet, I have this whole grouping of chunks. They're stored usually in a structured content management system. I build my output, my documents by assembling these chunks in a particular order, because the chunk is going to appear in different documents in a different place, and I assemble them. It's that assembly that becomes the output, but the storage of the content is in individual single source of truth, reusable chunks of content.
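The write-once, assemble-many model described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the component IDs, placeholder texts, and output names are hypothetical, and the `assemblies` dictionary stands in for what a DITA map (an ordered list of topic references) does inside a real structured content management system.

```python
# Hypothetical component store: each chunk is written and stored exactly once.
repository = {
    "storage": "Keep out of the sight and reach of children.",
    "dosage": "[Dosage text for the product]",
    "pen_cap": "Pull off the pen cap.",
}

# Each output is only an ordered list of component IDs (like topicrefs in a map).
# The same chunk can appear in several outputs, in a different position in each.
assemblies = {
    "patient_leaflet": ["dosage", "pen_cap", "storage"],
    "carton": ["storage", "dosage"],
}


def assemble(doc_name: str) -> str:
    """Build one output by pulling components from the repository in order."""
    return "\n".join(repository[cid] for cid in assemblies[doc_name])


# One edit to the single source of truth...
repository["storage"] = "Keep out of the reach and sight of children."

# ...is picked up by every assembled output on the next build.
for name in assemblies:
    print(f"--- {name} ---")
    print(assemble(name))
```

Because the outputs hold references rather than copies, there is nothing to hunt down after a health authority requests a change: the next build of each output picks up the one edited component.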
All right, so let's summarize this here.
Benefits: managing and reusing content from a single source of truth, and automating the assembly and the formatting. Remember, if you're not using structured content, you need a separate copy of that content for every format. With structure, you get all the different formats from a single source. Another thing is that, because you're using a standard, it's much easier to exchange information. It's also much easier to automate the flow of data directly into your content, so your medical writers can't accidentally type a three instead of a nine or whatever. It's automated. You have traceability, audit trails, every bell and whistle you can imagine. We can also make sure that the content is FAIR, a set of data principles that we're now applying to content: findable, accessible, interoperable, and reusable.
The way I see it, based on what you've been telling me here, is that with Word in our current ecosystem, we can obviously do some of these things if we work at it, but really, it's the wrong tool for the job. I mean, I can mow my grass with a pair of scissors, and if I've got a little bitty yard, or in the UK a little bitty garden, maybe that works. But if I live on Val's property out in Napa or have a yard like this, it could never work.
When you look at reuse, yeah, maybe you can do some reuse with Word, but it's copy and paste, so it's extremely limited. This automated content assembly, that doesn't really happen with Word. Single-source publishing, okay, I can save it as a PDF, maybe I can save it as an HTML file if I really, really target those styles or what have you. But yeah, pretty limited. Automated convergence of content and data, limited. Future-proofing, no.
It doesn't do any of those things. Interoperability, I think that's the big one there. So that's our case for structured content, if you will: not specifically DITA, but structured content in general. We do have some resources available if somebody would like to check this out. We have two white papers from Content Rules in the handout section, so feel free to download those. And Leigh Anne is going to put some links into the chat to a couple of other webinars that we've done and a blog post that we have.
So feel free to contact us on those and we can talk about these resources and others. Sophie, before we get into the actual case for DITA, let's talk a little bit about your structured content use case. I know that at Novo we've talked a little bit about this idea of this NextGen labeling vision. What was behind this? What led you to, really, at Novo, promote this idea of a structured content approach?
Yeah, thanks, David. I think the vision was very, very clear from management from the start, and I think that's really important: if you have a very clear, very simple vision, everyone can buy into it. Pharma is, I think, one of the most important places for your documentation to be right. When we talk about documentation, we are looking at things like the patient leaflet, the carton, and the label. If you get your dosage wrong, you get your product name wrong, any of those kinds of things, you are not only potentially affecting people's lives, but you are also, from a business perspective, looking at a potential recall. If you are copy-pasting, there is always the possibility that somebody is in a hurry or hasn't had the time to check what they are pasting.
The other side of that is you have immense quality checks. You have to go through rounds and rounds and rounds of review to make sure what is going out is absolutely accurate. So it was absolutely imperative that the labeling workflow was connected: having one shared process for delivering the submissions to health authorities, but also preparing for the future, creating smarter ways of working, and flowing that all through from our submissions to the carton, to the label, to the patient leaflet, because it's all one and the same.
In terms of preparing for the future, it was really saying, okay, reuse is a huge, huge use case for us. We have portfolios of products. They might all be insulin-based or they might all be hemophilia-based or they might all be based around one active ingredient, for example. You naturally write things in the same way across those texts. But when they're in Word, they're quite often slightly different because a different author has come in and said, "Well, I want to say 'could' rather than 'would.' This maybe sounds slightly better this way." So people add personality, but they also don't necessarily see what has come before.
With that personalization, it means that you can't reuse. So we want to be able to reuse across portfolios and across the whole of Novo Nordisk. Addresses, stock statements: why write those every time? Why make your affiliates write the same thing that they add every single time on top of the text that's been provided by HQ?
Copy-paste I've already spoken about. But also just to expand a little bit on that future, the world is progressing. It's not just Word output, it's not just InDesign for print. We want to be able to publish to HTML, to e-labeling, to apps, to digital media. Not just what the health authorities are asking, but what our customers want.
Investing in things like animations for showing exactly how you put the pen lid on, how you take it off, how you administer your drug, so that we are actually creating an infrastructure so that we can extend and we are not just at the mercy of the next output type that the health authorities want, be that XML for the US or Word for another country or some other media.
Let's take the part about creating smarter ways of working a little further. I like this slide that I saw in a presentation that you gave. What's in it for the Novo Nordisk employee who's working with this end-to-end labeling process? What were you guys hoping to accomplish there?
This was very much driven by the business side: they spend a huge amount of time reviewing. There are few industries that are more regulated than pharma. So in terms of having to perform QA, it takes a lot of people and a lot of approval cycles. So they're very much looking at: if you create something once and it's locked, you need far fewer rounds of review, because there is no debate about it; you see it in the context of everything else.
Fewer handovers. That was a bit about the process side of things: having things in one place so that you are reviewing and editing everything in the context of the same thing, so there are fewer handovers. Transparency: with that reuse, and with the workflows that sit alongside it, actually seeing what the impact of a change is. Val, you talked about metadata; we've added so much metadata to our content so that you can search and filter it, but also so that we can trigger events.
So we know that if the EU has been signed off, that will trigger other projects through the reference metadata that we've got in there. So it's actually that transparency. Better overview of the labeling flow, again, that's transparency. Collaboration, so actually having an editing environment where we can actually collaborate, easy workflows. I won't go through all of it, David, but that was the kind of flavor of "How can we make the business easier? How can we actually simplify, add transparency, align all in one?"
Which is huge, because I think a lot of times people hear about these kinds of initiatives and they think about it from the perspective of there's a business benefit, but they don't think about the actual person that's actually working with the content, and there are a lot of benefits for that person as well.
But I want to now hit on, quickly before we move into the DITA part, the fact that you used that term "reuse" over and over and over again. I recall you had this very compelling reuse use case, which I've tried to illustrate here. Can you take a couple of minutes to maybe walk through this slide, what we're seeing here, and this concept of how a single-source update might require updating 50 to 100 other documents individually?
Yes, and I alluded a little bit to this before in terms of: we have these drug families, so it might be that you are working in the diabetes portfolio, and within that portfolio you've got three or four drugs or more that all have the same – they all have maybe the same core content to some degree, but their product name changes. So you've got drug A, drug B, drug C. One of them has a 10-milligram tablet, one has a 20-milligram. For another it's maybe that you are using a solution for injection. You might have a different presentation. For one of those drugs, you might have four or five of those presentations. It might be a vial. We call that a presentation. So it's the dosage and the way that that's administered in one.
Then once you start layering after that, we serve nearly every country around the world, so all of those countries will have at least some of those drugs within that drug family and they'll have some of the presentations that are shown out of that list, if not all.
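One common DITA mechanism for this kind of product-by-product variation is conditional processing: variant text is flagged with an attribute such as `product`, and a DITAVAL file tells the publishing process what to include or exclude. The Python sketch below approximates that filtering with the standard library. The product names and sentences are hypothetical (loosely echoing the 10-milligram and 20-milligram example above), the helper is a simplification rather than DITA Open Toolkit behavior, and it is not a description of how Novo Nordisk has set up its content.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared topic: common text plus product-specific variants,
# each variant flagged with DITA's @product attribute.
TOPIC_XML = """<section>
  <p>Keep out of the reach and sight of children.</p>
  <p product="drugA">Drug A is available as a 10 mg tablet.</p>
  <p product="drugB">Drug B is available as a 20 mg tablet.</p>
</section>"""


def filter_for_product(xml_text: str, product: str) -> list[str]:
    """Keep unflagged elements and those flagged for `product`; drop the rest.

    Roughly what a DITAVAL file excluding every other product would do,
    greatly simplified (hypothetical helper).
    """
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            flagged = child.get("product")
            if flagged is not None and product not in flagged.split():
                parent.remove(child)
    return [p.text or "" for p in root.iter("p")]


for product in ("drugA", "drugB"):
    print(product, filter_for_product(TOPIC_XML, product))
```

The shared sentence is written once and appears in every product's output, while each variant only reaches the product it is flagged for.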
And then once you look at that as a submission, so the submission itself has to get signed off by the health authority, everything in that submission goes onto either the package leaflet, the carton, or the label. It is locked at that point. So it is absolutely your source of truth, what you have submitted, and then you are free to put that onto pieces of documentation.
As you say you've got here, it might be some countries also need HTML. So they need to be able to publish that same information on a website. It needs to be accessible, so it has to have alternative text, it has to have active links. It's the same information but it's rendered in a different way. I mentioned this animation. We are also investing in other formats. It might be a mobile app, it might be an animation. It's exactly the same text.
So once you start layering up all those millions actually of differences going from a product family and you think "No wonder people struggle with Word and no wonder when you have the smallest update, it might mean 50 to 100 documents that you actually have to update." So it's a very, very compelling use case in pharma, I think, because this is really standard.
You look at just some text like "Keep out of the reach and sight of children." That text is on every single drug that we put out there. I understand that a few years ago that wording changed from "Keep out of the reach and sight–" Sorry, from "Keep out of the sight and reach" to "...out of the reach and sight." Now, can you imagine the impact of that very, very small change across an entire company? So yes, I think the vision for me is one of the clearest I've ever seen. It's a very, very compelling view. And then to –
Oh, I was going to say, now talk us through this slide.
Of course, David. This is just rounding it all up and putting it into perspective. The project I work on is NextGen labeling. We are basically putting together a content management system, which is the structured content side of things, alongside the artwork management system. As I said, we are looking at this end-to-end workflow: being able to pull the content that we submit, which is actually in DITA XML, through to the carton.
Val talked very nicely about the outputs we've got there, and we've talked about it in different situations: being able to generate a Word document from that content, but also generating an artwork PDF in InDesign and being able to preview that at the point of writing. So I'm writing a label text. Because I've pulled that content through, I can then see in context what it is actually going to look like on my carton or label. Is this going to overflow? Is this something that won't fit once it's translated into German, because German is going to be very verbose and take up more space? So it's actually making things very tangible in terms of what you are writing and what you are going to experience.
All right. Well, Val, let's bring it back to you here because I've seen this slide on some webinars we've done together before. We've talked a little bit about Sophie's specific use case here. Broaden it out; how do you see that comparing to this ecosystem that we've talked about before?
Well, sure. What Sophie is describing is in this slide. It starts with labeling all the way to the left and goes to regulatory and comes back and then goes out to the packaging or wherever.
Wow, look at that, that's fancy. What this slide is trying – I like it. What this slide is showing is that in addition to labeling, we have clinical and we also have CMC. Again, we share content between clinical, labeling, and CMC. There is absolutely no reason to be copying and pasting this content.
If we think that it gets complicated in labeling, and it is very complicated if you're copying and pasting, expand that to all the other 10,000 pages of content that have to accompany a drug product to regulatory. This is very simplified, but it shows the ecosystem you would be looking at, where everybody uses the same structured content management system.
That's where all of the components, the blocks, are stored. There's an authoring interface; I think we're going to talk about that a little later. You can publish whatever the regulatory agency wants. They want Word, you publish to Word. They want a PDF, you publish to PDF. It can go into your RIM system, so you can keep track of what you've sent to regulatory.
Ultimately, when the drug is approved, it will then go to whatever, as Sophie said, your customers need. Obviously, it's going to be the labeling and packaging, but you could have safety information, education, marketing, electronic health records. I'm excited about the possibility of publishing to apps. I think that that's the wave of the future. We're doing it now, but it's so cumbersome. It's so much easier and less risky and faster, and it's all the things. So this is where we are going on a broader sense.
I love that approach because with Sophie's example, we talked about a specific area, which is a pretty big area with a lot of benefit, this labeling. But you can also step back and say "There's also value for clinical trials to do this. There's also value for quality to do this. And then there's value to bring all of those together."
One of the things, I guess, that we can hit on when we talk about DITA, and that's interesting, is that if you've got clinical trials people who say, "I don't want to be forced to use what labeling's using because we have different needs," well, with a flexible format or architecture like DITA, you can actually do that.
But anyway, that's enough about that. So we've done our definitions, and we've done our intro to the benefits of structured content. Let's now take this into the case for DITA, because, Sophie, you did actually choose DITA for several reasons. When we talked ahead of time, you gave me a whole list, but we whittled it down to just these six things. With that, let's jump right in with the first one there, reuse. As we've mentioned, DITA, compared to other XML models, is really, really built for reuse. Sophie, what are we seeing here on this particular slide?
I have to give credit to my business colleagues for pulling a lot of this analysis together right at the beginning of our project, because they were looking at: what is the use case, what's the business case for actually doing this? What you are looking at here is a Tresiba patient leaflet. On one side you've got all the safety information, why you need to use it, all the precautions.
On the other side you have the instructions for use, so that's: pull off the pen cap, do all those kinds of things. So for Tresiba here, it's showing that we worked out that about 70 to 80% of the content is identical to another of the drugs in our portfolio called Fiasp. The only pieces in white are the pieces that are unique, and in some of those cases, it's only the product name. And they didn't even look further to ask, "Well, is it just something slightly different, or is it significantly different?" You'd expect things like the clinical trial data to be different, but other than that it's amazing, particularly if you look at the instructions for use.
Tresiba is actually administered in a prefilled pen, as are a lot of our insulin products. It's exactly the same way of doing it; it's just that they've used different colors. So we want to be able to swap out the images for the right color, because that's very helpful to the user: "I know I've got a yellow pen, I can see my yellow pen on my leaflet." That's really good contextual information. But for Fiasp it's yellow and red. It's just a device. It just happens to have a different drug in it. So, very, very powerful.
You mentioned here that we can reuse the same content across packaging materials?
But you can also reuse similar content across packaging materials, which, I think, that's really powerful. And then, Val, you put out the example here of, you can reuse not just the sentences, but you can use entire components, sections, and things like that in your different regulatory documents. Oh, did you say something there, Val?
Yeah. Recently, one of the white papers that we put out was The CDS Is Dead! Long Live the CDS! (or CCDS). Right now, companies who are using Word essentially use the CCDS, CDS, whatever you call it, as a repository. It's a Word document that contains all these little sections. And then to create other documents, for example the USPI or the SmPC labels, they copy and paste out of one Word document into another Word document and then tweak, and who knows what happens.
In a structured environment, there is no CCDS as an entity because each one of these little components lives in a repository. You want to pretend you need a CCDS, fine, print them all out, make an assembly that's a CCDS. You don't need it anymore, but if it makes you feel better, sure. But the point is you need to use the same content, and doing it in Word makes no sense. Doing it in DITA is the way to do it. So that's what this slide is trying to show.
All right. But honestly, can't you do reuse with any XML? I mean, we don't really see reuse with JATS XML in scholarly publishing, because in scholarly publishing they have another name for reuse: "plagiarism." But you can do reuse with custom XML schemas; we have one partner that does that. You could do it with DocBook. What is it that makes DITA special when it comes to this reuse?
I'll chime in and then, Sophie, you may have additional things.
To me, the most important point about DITA is that it is a standard.
It's an open standard. You can do great things with custom XML, but you can't interoperate with anything. As we like to say, when you take standard DITA, every specialization you make to it makes it less and less interoperable with anything else. So it's all about creating to the same standard, and that is very special and very important.
The standard exists today. There are some efforts underway to create a new XML standard; I don't know why. There are lots of tools available. It's an ecosystem, and DITA is unique in that respect in this part of the content industry. So Sophie, I don't know if you want to add anything, but to me that's the best thing.
Just to add to that, I think it actually requires less deep XML skill. There's an awful lot out of the box that you can just use. It absolutely ties into what you are saying, Val, from the perspective of "Can I just go and get started?" Okay, you need a little bit of skill, but you don't need deep XML knowledge.
You can actually go and use what's there, and there's a huge amount of tooling that will help you do that, and a huge community that will help too. So it really does make a difference. Yeah, the skills are easier to come by because, as you say, it's an open standard; people invest in it.
I'll say this too about the reuse part of it: DITA was designed for reuse. Most custom XML schemas are not necessarily designed for reuse. DocBook has some reuse but was not designed specifically for that purpose. In fact, DITA was meant to be an improvement on some of that. So that's something I would add. But actually, Val, you led us into this idea of open standards. And so, my question was, what makes DITA special here?
I'll just take this one here. The things I think about are things that you've mentioned, right? It's got one of the largest communities out there, with 20 years of lessons learned, so you're not going to be reinventing the wheel. There are tons of available tools and add-ons, and, I think of critical importance, there's a large pool of DITA developers out there. Sometimes when you get into other XML models, it gets hard to find that development talent. Let me talk quickly about the community.
One of the things I think is really impressive in the DITA world is the DITA Open Toolkit. We've got a link here if you want to go to it, but it's a place where there are pre-built transforms, resources, and documentation that can give any organization a real head start. You want to go DITA-to-HTML? There's code that you can adapt for that. You want to go DITA-to-InDesign? There's code that you can adapt for that too. It's not like you're starting from scratch, because there's that great community.
And then toolset-wise, compared with other types of formats, the DITA community has by far the most available tools and add-ons of any information standard. There are robust XML editors, robust content management systems, robust publishing systems, all of that. So as an open standard, DITA really steps up.
Another one that you mentioned, Sophie, was this concept of single-source publishing. DITA is especially good at creating multiple publishing outputs from a single source file. What did this mean for you at Novo Nordisk?
I think, looking back to that original image we had of the drug, then the presentation, then going out to the health authorities and then out to the packaging leaflets: today we are publishing to Word, because that's what the health authorities want. However, many health authorities are now moving to XML. They're using FHIR, which is a standard that can be expressed in XML. So we are now looking at transforming DITA XML into another form of XML, and immediately there's huge value in having our content in an XML framework.
Word is actually one of the hardest outputs to produce. That is a really hard transform. But the point is that we don't have to do it many, many times; we keep all of that source in one place. PDF, and you mentioned HTML: we are using the DITA Open Toolkit, and it's an amazing resource, absolutely fantastic. What I love is that I can run it locally as well, so from a local tool I can just whip off a webpage and see how my output looks. You've got XLIFF there, and XLIFF is perfect for translation. But actually, most of our translation companies can work with DITA as well. It's a common standard.
The e-pack leaflet, apps: it's about having a machine-readable format (and you're coming on to machine-readable, I know), but it's machine-readable and human-readable. The value in that is absolutely immense, because it means you can suddenly automate; you can send things to these different formats and outputs, but you edit and work day to day within the source in that component content management system.
So, Val, that brings us to the hard question for you. If all XML separates content from design, what is it that makes DITA particularly well suited for this?
Well, I think it's all the things we've been talking about. It's purpose-built for this. It was designed for single-source publishing. It's an open standard. The DITA Open Toolkit is a resource I have not seen for any other content standard. It's truly all of those things. As Sophie was saying, it's machine-readable, and it's also human-readable. Semantic tagging is a big deal.
If you've ever done tagging that wasn't semantic, then you can see what a big deal being able to tag things semantically is. That's part of what really makes it human-readable. Then there's the flexibility it provides in terms of output: XML to XML, or XML to anything else you can dream of. If there's not a standard way to do it today, DCL can do it for you. There's no doubt about it: XML to anything.
For example, if you have a task, a step can be presented in different ways for different uses. You can go down to the minutest piece of information and make it look however you want. So it's very powerful.
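To make the single-source idea concrete, here is a minimal Python sketch, not the DITA Open Toolkit's actual pipeline: one DITA-style task is parsed once and rendered two ways, as an HTML ordered list and as numbered plain text. The element names (`task`, `taskbody`, `steps`, `step`, `cmd`) follow the DITA task model; the renderer function names and the step text are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source topic; the step text is placeholder content for illustration.
SOURCE = """<task id="use-pen">
  <title>Using the pen</title>
  <taskbody>
    <steps>
      <step><cmd>Check the label.</cmd></step>
      <step><cmd>Remove the cap.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def step_texts(task):
    """Collect the text of each <cmd>, in document order."""
    return ["".join(cmd.itertext()).strip() for cmd in task.iter("cmd")]


def to_html(task):
    """Render the task as an HTML heading plus an ordered list."""
    title = escape(task.findtext("title", default=""))
    items = "".join(f"<li>{escape(text)}</li>" for text in step_texts(task))
    return f"<h1>{title}</h1><ol>{items}</ol>"


def to_plain_text(task):
    """Render the same task as plain text with explicit step numbers."""
    lines = [task.findtext("title", default="")]
    lines += [f"Step {n}: {text}" for n, text in enumerate(step_texts(task), start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    task = ET.fromstring(SOURCE)
    print(to_html(task))
    print(to_plain_text(task))
```

The authors only ever touch `SOURCE`; adding another output means adding another renderer, not another copy of the content.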
I think, yeah, and it's modular and it's hierarchical, so you can put sections under other sections and put those things together.
It's really built for that component piece, which makes it really good for that output. It also makes it good for this concept of data publishing that we hear about a lot. Sophie, talk to us a little bit about this idea of publishing data, things that you don't necessarily want writers to be editing.
Yes. We talk about the reuse components that come almost for free with DITA: essentially, content references and key references. You can see here a section within the editor (I've taken a screen capture) with a little reuse symbol. That means it is a content reference. It is a piece of content I've written, and it is locked for a normal user because it's been written, signed off, and approved, and it is appropriate to be used in multiple places.
I've just shown the lock icon. You can have this at topic level, or, as Val says, at phrase level or paragraph level. You can see what is in this leaflet: these are actually all phrases that are pulled into a numbered list. They could also be used as titles, because they're titles on other topics, but here they're a numbered list. And the pieces in green, where you've got the product name, are my variable, my key reference. So that might be Fiasp, or it might be Tresiba. Suddenly you can see how I've got a whole topic there: name of the medicinal product.
All it contains is keys, because all I ever have in that topic for the EU PI is my product name and my presentation. So I don't actually have to edit it in any way. It'll be auto-populated with my keys and my content references when I start my new drug, and I can just forget about it. The way we talk about it in the business is that, as an author, you get to concentrate on the interesting and fun stuff, because everything else is created for you. Your titles are all there, all your set text is there; you don't have to worry about any of that anymore.
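Content references (`conref`) and key references (`keyref`) are what make the locked phrases and the green product-name variables work. The Python sketch below is a deliberately simplified resolver, not how any particular CCMS or the DITA Open Toolkit implements it. It assumes one map with `<keydef>` elements, a single library topic, and conref targets matched only on the element id after the last `/`; the file name, ids, key name `product-name`, and phrase text are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Map: a <keydef> binds the key "product-name" to a product name.
MAP_XML = """<map>
  <keydef keys="product-name">
    <topicmeta><keywords><keyword>Fiasp</keyword></keywords></topicmeta>
  </keydef>
</map>"""

# Library topic holding approved, locked text that other topics reuse.
LIBRARY_XML = """<topic id="phrases">
  <title>Approved phrases</title>
  <body><p id="storage">Approved, signed-off storage text.</p></body>
</topic>"""

# Leaflet topic: a keyref for the product name and a conref for the phrase.
TOPIC_XML = """<topic id="leaflet">
  <title>What <keyword keyref="product-name"/> is</title>
  <body><p conref="phrases.dita#phrases/storage"/></body>
</topic>"""


def load_keys(map_root):
    """Map each key name to the text of its first <keyword>; first definition wins."""
    keys = {}
    for keydef in map_root.iter("keydef"):
        word = keydef.find("topicmeta/keywords/keyword")
        if word is not None:
            for name in keydef.get("keys", "").split():
                keys.setdefault(name, word.text)
    return keys


def resolve(topic_root, keys, library_root):
    """Return a copy of the topic with keyrefs filled in and conrefs pulled in."""
    out = copy.deepcopy(topic_root)
    by_id = {el.get("id"): el for el in library_root.iter() if el.get("id")}
    for el in list(out.iter()):
        # Keyref: an empty <keyword keyref="..."/> takes the key's text.
        name = el.get("keyref")
        if el.tag == "keyword" and name in keys and not el.text:
            el.text = keys[name]
        # Conref: pull in the target's content (simplified: match on the last id).
        ref = el.get("conref")
        if ref:
            target = by_id.get(ref.rsplit("/", 1)[-1])
            if target is not None:
                el.text = target.text
                el.extend(copy.deepcopy(list(target)))
                del el.attrib["conref"]
    return out


if __name__ == "__main__":
    keys = load_keys(ET.fromstring(MAP_XML))
    leaflet = resolve(ET.fromstring(TOPIC_XML), keys, ET.fromstring(LIBRARY_XML))
    print(ET.tostring(leaflet, encoding="unicode"))
```

Changing the `<keyword>` in the map to another product name and resolving again would re-populate every topic that uses the key, which is the "auto-populated" behavior described above, in miniature.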
What about tables?
Tables are often really complicated because, from my investigations, the table data at our company is quite often delivered in a doc text file, which was quite surprising to me. We ran a project with the biostatistics team. Biostatistics are the guys and girls who look after all of the content that comes out of the clinical trials, and they hold all their data in a SAS database. They have, obviously, a massive database covering all of the clinical trials that are run. We've run a project with them to extract the data for the tables required for labeling and convert it into a DITA table.
As we talked about, it's an open standard, so everything is out there in terms of how to create a DITA table, and they created a script so that when we request a table, we get it as a DITA table and can pull it straight into the content. We're still working on the magic for that last bit, but they can create this DITA table that we can pull into the document and lock. So isn't that amazing, to be able just to pull in the whole of the data? There is no question that it is exactly the data that biostatistics have given us, and it's an entity that can just be used in situ.
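The biostatistics team's script isn't described in detail, so the following is only a minimal Python sketch of the idea under an assumption: that the tabular data has already been exported from SAS as CSV with a header row. It builds a CALS-style DITA `<table>` (`tgroup`, `colspec`, `thead`, `tbody`, `row`, `entry`) with the standard library; the function name, table id, and sample values are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_dita_table(csv_text, table_id, title):
    """Build a CALS-style DITA <table> from CSV text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no rows in input")
    header, body = rows[0], rows[1:]
    table = ET.Element("table", {"id": table_id})
    ET.SubElement(table, "title").text = title
    tgroup = ET.SubElement(table, "tgroup", {"cols": str(len(header))})
    for i in range(len(header)):
        ET.SubElement(tgroup, "colspec", {"colname": f"c{i + 1}"})
    head_row = ET.SubElement(ET.SubElement(tgroup, "thead"), "row")
    for cell in header:
        ET.SubElement(head_row, "entry").text = cell
    tbody = ET.SubElement(tgroup, "tbody")
    for record in body:
        # Fail loudly rather than emit a table with ragged rows.
        if len(record) != len(header):
            raise ValueError(f"row has {len(record)} cells, expected {len(header)}")
        row = ET.SubElement(tbody, "row")
        for cell in record:
            ET.SubElement(row, "entry").text = cell
    return table


if __name__ == "__main__":
    sample = "Parameter,Group A,Group B\nExample 1,10,12\nExample 2,7,9\n"
    table = csv_to_dita_table(sample, "tbl-demo", "Demo table")
    ET.indent(table)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(table, encoding="unicode"))
```

Locking the generated table so authors can't change the numbers would happen in the component content management system, not in a conversion script like this one.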
I love that. I love that. Let's talk just quickly about metadata. We could do a whole webinar on metadata. I think with DITA the big thing is that it was, again, designed to be modular and customizable without breaking interoperability. Sometimes you have to change elements and create individual things in other XML models, whereas DITA gives you some really flexible metadata.
So it's going to empower your search and your filtering, things like that; that's true of most metadata. It also really helps to empower your outputs without breaking the model for interoperability. You might put things in output classes so that the system knows that if it's going to this country, in this instance, for this kind of deliverable, this text should be colored this way, and if it's going to that deliverable, it should be styled like that. The metadata really makes that happen. The metadata is what really empowers that reuse, and obviously so much more.
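DITA's conditional processing attributes (such as `product` and `deliveryTarget`) together with DITAVAL files are one concrete form of the metadata-driven filtering described here. The Python sketch below applies a simplified version of the DITAVAL exclude rule: an element is dropped when, for any filtered attribute, every one of its space-separated values is excluded. It ignores flagging, `outputclass` styling, and DITAVAL's other actions; the rule format (a plain dict), the product values, and the sample content are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder topic: inline variants by product, paragraphs by delivery target.
SOURCE = """<topic id="t">
  <body>
    <p>Use the <ph product="prodA">Product A</ph><ph product="prodB">Product B</ph> pen.</p>
    <p deliveryTarget="print">Print-only note.</p>
    <p deliveryTarget="print web">Shared note.</p>
  </body>
</topic>"""


def is_excluded(el, rules):
    """True if, for some filtered attribute, every value on el is excluded."""
    for att, excluded_values in rules.items():
        values = el.get(att, "").split()
        if values and all(v in excluded_values for v in values):
            return True
    return False


def remove_keeping_tail(parent, child):
    """Remove child but keep the text that followed it (its tail)."""
    if child.tail:
        idx = list(parent).index(child)
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def filter_topic(root, rules):
    """Remove excluded descendants in place and return the root."""
    for parent in list(root.iter()):
        for child in list(parent):
            if is_excluded(child, rules):
                remove_keeping_tail(parent, child)
    return root


if __name__ == "__main__":
    # Simplified DITAVAL: attribute name -> set of values with action="exclude".
    rules = {"product": {"prodB"}, "deliveryTarget": {"print"}}
    root = filter_topic(ET.fromstring(SOURCE), rules)
    print(ET.tostring(root, encoding="unicode"))
```

Swapping in a different rule set, for example excluding `prodA` instead, would produce the other product's text from the same source, which is the reuse the metadata is there to enable.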
Let's quickly talk about the last one here, because I want to make sure we have time for questions. I love the concept of DITA being both machine-readable and human-readable. We talk a lot about the machine-readable part, but not always about the human-readable part. What we mean is that non-technical people can work with DITA.
Technical people can work with DITA, and they can take it to the nth degree, but non-technical people can do it as well. And so, you had a quote that said, "The way that we're going to win people over in this move to structured content is when they learn they don't have to be a tech writer." Talk a little bit about what you mean by that and why it's important.
I think the way we've been handling this is by looking at the key activities. If you are reviewing and collaborating, then you should be able to do that without having to get into the midst of the content. So you need a platform where, for the majority of cases, you can review, collaborate, propose text, that kind of thing. And then there's another perspective: "Okay, I need to edit my content, but do I need to edit the reuse?" Probably not.
Maybe that's actually a more skilled role, someone who would think about "What is the impact of what I'm doing here?" Otherwise, I should be able to go into one of the many editors available for DITA and not have to worry about the XML under the covers, because out of the box DITA already has sections, paragraphs, things that relate very easily to a Word document today.
You showed one editor view. The word of caution I would add is that the more reuse you have in there, the more you need to govern your reuse. You don't get reuse for free. You could actually use DITA without worrying about reuse at all. You could just create your paragraphs and your sections, with absolutely no technical knowledge required.
Where you invest in reuse, I think you do need a little bit more skill, and it's more of a mindset change than a technical writing skill, in terms of "Okay, what is the impact of this if I create it as a Word document, as HTML?" I am not viewing the actual thing I am going to be publishing. I am writing the words. This is the raw content, and whatever I publish to will render what it actually looks like at the end of the day. So it's still a big mindset change, but it is much less involved than being an XML expert.
I'm afraid we're running out of time, so we'll have to cover this last bit quickly. But DITA is also machine-readable, as we've talked about. So if you've got other content management systems around the world that are based on other standards, the nice thing about going with DITA is that it can talk to and work with all of those.
So just a really quick high-level review. We talked about unstructured content, we talked about structured content, we talked about DITA, and then we specifically talked about "Why not just Word?" and said that the whole idea of structured content is better.
And then we made the case that what really sets DITA apart is its reusability; its open standard and community; single-source publishing, which it's built for; data publishing, which it's also built for; its very flexible metadata; and the fact that it's human- and machine-readable well beyond what other systems and architectures are.
So quick high-level overview there. We're a minute or so behind here, Marianne, but I think that still gives us some time for some questions. Did we get some questions in?
We do have some questions. I know we're not going to get through all of them, so I want everyone to know we'll be reaching out to you personally to make sure we answer your questions. One that came through: "What's the relation between DITA and FHIR? Are they redundant, or do they complement each other?"
Yeah, either one of us can answer. The relationship is that they're both standards: when you're using DITA, you can publish to FHIR, and FHIR is the standard the health authorities are looking at as an interface. I'm trying to answer quickly. Sophie, if you want to add, feel free; I'm giving a very quick answer.
Yeah, I mean, I think that's fine. From my perspective, it is another XML format. The great thing is that it is a standard, and from what I've understood, all of the health authorities will be using FHIR, so it should make life a lot easier in terms of that transform. It's a slightly different recipe depending on the health authority, but it's all within the same framework.
One other question, Marianne.
Another question. Sophie, did you mention that there's a new XML format authorities are looking into for labeling? Was that FHIR?
FHIR. I just meant that different countries are at different levels of maturity. The US, Canada, and Japan are a long way along; the EU is slightly further behind. But everyone will come along in due course, I think, certainly the major ones.
Okay. This is a big question, so I think you're going to have 10 seconds each, but "What's prevented pharma from adopting structured content management, authoring, distribution processes and tools? Is it the complexity? Is it change management? Why is it different now? Why are we talking about this now?"
I have a whole soapbox about this. You'll be really surprised. My opinion is that the reason we haven't been successful with this until now is the authoring environment and the tooling. Until very recently, and I mean very recently, like four years ago-ish, in order to use DITA you really needed to understand DITA. There was no way around it.
As Sophie explained quite well just a few minutes ago, you can now use DITA without knowing what's under the covers. There will be people that still need to understand it, don't get me wrong, as Sophie rightly said, but I think that the main reason is that we were expecting medical writers to be quasi-coders in the past, and now the tools are much simpler to use. Crystal clear.
Right. Well, thank you, everyone. We do have some more questions. We're going to follow up with those of you who've submitted something. David, can you bring us on home here?
Let me just give a quick shout out here. We've got a new industry group that has just started. If you're a part of DIA, there's a new structured content reference model group that's just started meeting. The next meeting is tomorrow, I think. If you're interested in that, put that in chat or you could send me an email at email@example.com. I would love to hear from you, I'd love to get you involved in that. And then just with that, let me just say thank you to Val and to Sophie. We're so excited that everybody was here. This will be available online. Marianne, is there anything else I'm supposed to say?
I just want to thank everyone for attending this webinar and would like to remind everyone that the DCL Learning Series comprises webinars, a monthly newsletter, and our blog. You can access all of these things and recordings on our website at dataconversionlaboratory.com. We hope to see you in future webinars. And to end everything, I would like to wish Sophie the warmest of birthday wishes today. Thank you so much.
May you have lots of cake later today, and in your future. This concludes today's webinar.