DCL Learning Series
Metadata and Taxonomies In the Drug Development Life Cycle: Why Should I Care?
[David Turner] Hello, everybody, and welcome to the DCL Learning Series. We're absolutely thrilled to have such a great group here to join today's webinar, which is titled "Metadata and Taxonomies in the Drug Development Life Cycle: Why Should I Care?" My name is David Turner. I consult with clients on content and digital transformation projects here at Data Conversion Laboratory, and I'll be your moderator.
Couple of quick things before we get started. First of all, number one, this webinar is being recorded, and it will be available in the On Demand webinar section of our website, which is dataconversionlaboratory.com, shortly. And then second, we do invite you to submit questions at any time during the conversation today. We're going to reserve 15 minutes at the end where we can take those, and it looks like we've already got some lined up. Excellent. All right. We're on the next slide already, here.
Just in terms of background, we all know technology plays a critical role in life sciences. Accuracy is critical. Traceability, compliance, shortening speed to market, all these things are huge topics right now. And what we've found is that improving your content management, your data management, along with related IT systems, compliance, and program management, streamlines all these things, and it helps you to move things more quickly. So Data Conversion Laboratory, Court Square Group, and JANA Life Sciences came together to develop this learning series to address how structure and technology can contribute to your success in life sciences.
This is actually the second of seven individual webinars. The other webinar topics are listed here, on the slide, and today's webinar is built a little bit on our last one, which was about technology and the best practices for content management. You can watch a recording of that webinar on our website. And Marianne, if you could push a link to that page in the chat box for everybody, or Leigh Anne, one of you, there. All right. Let's move on. Let me introduce the panelists today.
First of all, we have the great Charles Cooper from The Rockley Group; he's a Vice President there. I've had the pleasure of hearing him speak a number of times, and he's excellent. We've also got Keith Parent from Court Square Group, another excellent speaker with years and years and years of life sciences experience. And then we've got Ron Niland and Glenn Emerson from JANA Life Sciences, also two individuals who've been in this industry a long time with a lot of knowledge to bring to you today. So, I'm excited about the consultants who are with us and the leaders.
This webinar itself is, just quickly, brought to you – if you could go ahead and go to the next slide on this one, too – it's brought to you by Data Conversion Laboratory, or DCL, as we're also known. Our mission at DCL is structuring the world's content. Our services and solutions are all about converting, structuring, enriching content and data. So, for example, in life sciences, we're one of the leading providers of SPL conversion services. But we're also doing a lot of work with structured content, and structured content authoring, structured content management, you know, in other parts of life sciences, to, again, help shorten time to market, help to improve patient experiences and health outcomes, et cetera. So, if you're working on any kind of digital transformation projects like that, we're happy to help.
All right. But enough about me. Let's get into it; we've got a ton of topics to cover today, and I think we're starting off with Keith from Court Square Group. So, Keith, I'm going to turn off my camera. I'll let you turn yours on, and take it, and get started.
[Keith Parent] Thank you, David. I hope everybody can hear me. My name is Keith Parent, and I am the CEO at Court Square Group. We have been around for 26 years now, and we've been hosting qualified and validated systems within our Audit Ready Compliant Cloud for a long time. We have a RegDocs365 product line. That is a content management system. So we're uniquely qualified to talk about why we should actually use metadata and taxonomies in all the things that we do. With this, I'm looking forward to working with my colleagues on this webinar. We did the first one a little while back, and I think that this whole series is going to grow on each other. So hopefully you'll all be able to get a lot out of it. Thank you. And next.
[Ron Niland] Good morning. And for some of you, good afternoon. Thank you very much for joining us. My name is Ron Niland, and I'm with a company called JANA. JANA is a third-generation family-owned company, formed in 1973. And in the last two years, we've formed a new division focused exclusively on the life sciences. Our company focus is in four areas, those being technical documentation, that includes everything from parts lists to user manuals, maintenance manuals, and procedural documentation that could include SOPs and work instructions. We really focus on the aspects around operational excellence as it relates to business process mapping. Particularly, our focus, as you might expect, working with data and documentation extends to an area that includes IT systems and program management. And as a company, we're ISO 9001:2015-certified, as well as AS9100D. Thank you.
[Charles Cooper] Hello, there. My name is Charles Cooper, Vice President of The Rockley Group. The Rockley Group was founded in 1992 by Ann Rockley, often known as "the mother of content strategy" due to the foundational work she started in the late eighties and early nineties. As an organization, we have over 25 years of experience in the life sciences, helping companies organize and structure their content for publication and submission. In 2012, we published the second edition of what was our seminal book, Managing Enterprise Content, which you see here. We often partner with our, with Content Rules out of California. They're licensees of our methodology that we use in the life sciences. And today, I'm going to provide you with a bit of overview on structure, metadata, and taxonomies. And the other speakers will then delve deeper into those as we go on through today's presentation. Next, please.
So the first thing we're going to do is talk about content structure. Like nuts, bolts, and screws, content is much easier to use and more useful when it's structured. Creating the structure in which to place your content does take time. But that time is well rewarded, and there are standards which can help you with this, and we'll discuss those a little bit later on in today's session and throughout the rest of the series. Obviously, a significant goal is to reduce time to market. And structure allows you to organize the dossier and labeling for rapid internal review, which helps you, and then enables you to provide it to the health authority in a consistent and much more easily reviewed format, which, again, helps you in the long run. Next, please.
So, structure has to be defined. And you need a plan in order to do that. Obviously, structure is structured. You have to organize what you're going to do. And it's a multi-layered process where you sort of have to look at both ends at the same time. You need to examine the content that you have and compare it with where you need it to be, and then work your way through in an organized process of examination. You need to analyze your data. Is it consistent, but officially unstructured? If there's consistency there, even though it's not officially structured, you may have some inherent structure that you can leverage. Look at the data that you have.
Once you've done your initial analysis, does everything you have need to be structured? What was the purpose of the data in its original form? Why are you doing it? Has the purpose of this data changed? Will it have to remain in its, in its current form throughout its entire life cycle, from your initial investigation through, through the approval process? When you're thinking about structure, think about the process. Remember why you're doing it. A lot of companies will get into a hole where they start trying to do everything, trying to boil the ocean and forget why they're trying to do it. Structuring it is not an end in itself, you need to figure out why you want to do the structuring and move towards that. Next, please.
We talk about enriched content, enriched or intelligent content. That's content which is format free. It's structurally and semantically rich. It's discoverable, re-usable, and adaptable. Structure is that first step to enrichment. Enriched content allows you to better understand the information you have, you can use it to gain new insights, allowing you to better design, to design better clinical trial processes, or enable better targeting for potential study applicants. It's the first step. Next, please. Next slide, please. There we are.
Now, as the first step to creating intelligent or enriched content is to provide structure, ontologies are at the far end of that scale. You start by providing structure. Then you tag content with metadata to provide context. After that, you create taxonomies, which are organized hierarchical structures that allow you to define the relationships of information within one realm, such as targets, diseases, or anatomy, as we see here on the left. But you can go well beyond that. Ontologies are a couple of levels above. They allow you to compare data across different realms, between different realms, by linking and relating information across them. For example, we might relate a disease to an anatomy, identifying that that particular disease affects a particular part of the anatomy.
And this type of data research can provide deep insights for data analysis moving forward. And we'll be talking about ontologies later on. Next, please. Metadata. So I mentioned metadata. And the simplest description of metadata is usually data about data or information about information. The FDA of course has their own definition. They say metadata is the contextual information required to understand data. A data value by itself is meaningless without additional information about the data. Effectively, what they're saying is you need context. Next, please.
If we consider a number, I'll pick 42 because it's my favorite number. But without metadata, we have no idea what it is. We have no idea what it actually represents. Is it a measure of volume? Is it in milligrams, milliliters? Is it a temperature? What scale is that temperature? Of course, my first thought is the meaning of life, the universe, and everything. Without context, we don't know. Metadata provides the context to that otherwise undefined number. Next please.
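[Editor's illustration] The point about the number 42 can be sketched in a few lines of Python. This is a minimal sketch, not anything shown in the webinar: the `Measurement` class and its `quantity` and `unit` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A value plus the metadata needed to interpret it (hypothetical fields)."""
    value: float
    quantity: str  # what was measured
    unit: str      # the scale the value is expressed in


def describe(m: Measurement) -> str:
    # The bare number only becomes meaningful once its context is attached.
    return f"{m.quantity}: {m.value:g} {m.unit}"


bare_value = 42  # no context: dose? volume? temperature?
dose = Measurement(42, "dose", "mg")
storage_temp = Measurement(42, "storage temperature", "degC")

print(describe(dose))
print(describe(storage_temp))
```

The same value yields two different, unambiguous statements once the metadata travels with it.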
Now, all of the old people will know what this is, it's a library catalog. In the olden days, we used to go to a library, you'd select, search in the alphabetical catalog of books by author or name or title and then pull out a card that told us where that book was in the library. But it did more than just tell us where it was. It provided information such as, well, obviously the title or author name, but it included the publishing date, the name of the publisher, where it was published, the number of pages, whether it was a hardback, or a soft cover. And of course an abstract or a short description of the book, and each of these things are metadata about that entire publication that enables you to identify the book.
Now in the pharma world – next, please – we often have the need to identify metadata at the publication level, as well. If you're creating a PI from a CMS, or a content management system, having individual metadata on the elements will allow you to assemble it. And if you want to identify previously created PI that you've already put together, you can search metadata within the CMS to find it. So the metadata you see here: the country where it's applicable, what revision it is, the DIN, the manufacturer's name. These are all pieces of metadata that would enable you to find the assembled PI that you're looking for by doing a progressive search. Next, please.
So you have to establish a metadata strategy. You can't apply it willy-nilly. It has to be consistent across all of the realms of that work where you're applying it. Like when you're applying structure to the information, you have to have a strategy, and you start with similar questions: What data do you have? Why have you collected it? When did you collect it? Are you looking at documents? Do you need to find completed, quote, "documents," whatever that means? Are you trying to provide information? Who are you trying to provide the information to? When in the life cycle? Stephen Covey was right. You have to, quote, you know, start with the end in mind, and then work your way back. Think to yourself, why are you trying to do this, and what will you be looking for.
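[Editor's illustration] A progressive search over publication-level metadata might look like the sketch below. It assumes the metadata is available as plain records; the record values, the field names, and the `narrow` helper are all invented for illustration and do not represent any particular CMS.

```python
# Hypothetical publication-level metadata for assembled package inserts (PIs).
records = [
    {"id": "PI-001", "country": "CA", "revision": 3, "din": "00000001", "manufacturer": "Example Pharma"},
    {"id": "PI-002", "country": "CA", "revision": 2, "din": "00000001", "manufacturer": "Example Pharma"},
    {"id": "PI-003", "country": "US", "revision": 3, "din": "00000002", "manufacturer": "Example Pharma"},
    {"id": "PI-004", "country": "CA", "revision": 3, "din": "00000003", "manufacturer": "Other Labs"},
]


def narrow(candidates, **criteria):
    """Keep only the records whose metadata matches every given criterion."""
    return [r for r in candidates if all(r.get(k) == v for k, v in criteria.items())]


# Each step narrows the previous result set, which is the "progressive" part.
by_country = narrow(records, country="CA")
by_manufacturer = narrow(by_country, manufacturer="Example Pharma")
by_revision = narrow(by_manufacturer, revision=3)

print([r["id"] for r in by_revision])
```

Each filter only works if the field was applied consistently when the content was created, which is why the metadata strategy comes first.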
There are also industry standards for metadata strategies, and we'll be learning a little bit about those later in the session. So, let's look at some of the overall purposes of metadata. Next, please. Workflow. Perhaps you have determined that you want to get a better understanding of your workflow. Metadata can help you here. Using metadata in the CMS, you can identify where a particular piece of information is in its life cycle. But, it's not simple. You have to apply multiple pieces of metadata. It's multi-dimensional. You need to be able to track who interacted with that piece of information at each stage, and what that interaction was. So next, please.
You have to think about the granularity. The level of detail. Are you concerned about completed documents such as labels or package inserts? One level of granularity. Are you looking at document components, such as reusable images, like storage temperature icons or company logos that might be used in multiple documents, pieces of information? That's another level. Are you looking at tracking or providing context for raw data? That is yet another level. You have to consider this level of granularity when you're working to define the metadata strategy. Remember, the point is to provide context to the data that you have or might be creating in the future. Next.
If you're looking at tracking data, at the very lowest level, you need to be able to identify the provenance of that data. It might be a data result, that number 42. So there's a whole host of information you need, as you see below that, and you'll see that you will have to track the audit level provenance of that data. Next, please. And because metadata is data, it falls under the cGMP requirements. It must be secured from change and modification just like any other data. That is something that you're going to have to deal with. The tools that exist will help you do that. And my final slide.
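[Editor's illustration] One common way to make an audit trail tamper-evident is to chain entries together with hashes, so that changing any earlier entry invalidates everything after it. The sketch below shows that general technique only; it is an assumption on the editor's part, not a method described by the speaker, and not a compliance implementation. The entry fields (`actor`, `action`, `value`) are hypothetical.

```python
import hashlib
import json

FIELDS = ("actor", "action", "value", "prev")


def _digest(body):
    # Serialize with sorted keys so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_entry(trail, actor, action, value):
    """Append a provenance entry that records who did what, linked to the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else ""
    body = {"actor": actor, "action": action, "value": value, "prev": prev_hash}
    trail.append({**body, "hash": _digest(body)})


def verify(trail):
    """Return True only if no entry was altered and the chain is unbroken."""
    prev_hash = ""
    for entry in trail:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
add_entry(trail, "analyst-1", "recorded result", 42)
add_entry(trail, "reviewer-1", "approved result", 42)
print(verify(trail))
```

In practice the tools the speaker mentions would handle this; the sketch only shows why an edit to recorded data or metadata can be made detectable.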
Metadata can be created in many ways. It can be explicit, which is something that you apply directly to the content that's being created. On the right here, you see a simplified sample at the guts of a piece of content, where I'm showing you a tagged file, you can actually see what the metadata looks like as it's applied to the content at the computer level. For the authors and the content users, you wouldn't see this, you would see a higher level of, of, of data that looks more like, like Word, but deep in its lowest levels, you have something like this. It can also – next slide, please.
Metadata can also be implicit. So it's not defined specifically by information that is applied in the raw information, but it's, it's content that is provided by the structure of a document, whether it's been specifically structured or whether you've added to the structure later. The titles in the data spreadsheet, as we see here, are the record identifiers in the database. So, there's all sorts of places where you can get data. It's something you can apply, or it can come out of the, the system itself. And that's actually something that DCL can help you do. That's my last slide. Thank you very much. And we can move on to the next, please.
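[Editor's illustration] Implicit metadata of the kind described here, column titles in a data spreadsheet, can be read straight from the structure of the file. A minimal sketch using Python's standard `csv` module; the column names and values are made up for illustration.

```python
import csv
import io

# A tiny stand-in for a data spreadsheet exported as CSV (hypothetical content).
sheet = io.StringIO(
    "subject_id,visit,systolic_mmHg\n"
    "S-01,1,128\n"
    "S-02,1,135\n"
)

reader = csv.DictReader(sheet)
rows = list(reader)

# The header row was never "applied" as metadata, yet it labels every value.
implicit_metadata = reader.fieldnames

print(implicit_metadata)
print(rows[0]["systolic_mmHg"])
```

Note that the header here even carries the unit, which is exactly the kind of context the bare value would otherwise lack.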
[Glenn Emerson] Thank you. Thank you. So, let's back up for a little bit and talk about some of the key terms that are involved in taxonomy, ontology, and glossary. So, first off, we have the basic unit, which is the term. It is typically a noun or a noun phrase. The metadata that Charles was just mentioning, for instance, that can often be found simply by scanning the content. However, there's usually broader ideas that are implicit or abstract that are not stated in the content, and those would be concepts. That's where you start to get into meaning. So, yes, a term has meaning.
Think of this like a gear. A gear on a bicycle, a gear on a car. A gear is a gear; that's a term, and it has the same definition. However, the concept of a transmission on the car versus the concept of a 10-speed bicycle are two different concepts, and the gears that would be used on the bicycle are not the same ones that would be used in a car transmission. So the concept provides a broader context of use. And we're going to see that that's important as we dig into this further. And very often, that concept is known to the people working with the information, but not contained in the information itself.
And, when it comes to analyzing content, especially machine learning and some of this natural language processing that we're hearing more and more about, we look, especially, at syntax. Just as in computer programming there's a syntax for use in computer languages, in any spoken language there is also a set of grammar rules that form what's considered well-formed speech, and they convey meaning. And the semantics are the meaning that's conveyed by that syntax. And I have two examples here that are very important. Both of those statements – "Eats shoots and leaves" and "Eats, shoots, and leaves" – are syntactically correct. You'll notice that the addition of two commas changes "shoots" and "leaves" from nouns into verbs. So essentially, on one hand, we're describing an animal like a panda bear.
And on the other hand, we're maybe describing a scene from the movie The Godfather. The meanings changed. The words are the same; all that's changed is the position of commas. It changed the grammatical syntax. For an automated natural language processor, both meanings are correct, but that may not be the meaning you intended. So there are limitations on AI, and we'll get into that a little bit more later on. Finally, you pull all this together in what's known as a vocabulary. A vocabulary is exactly what it says there: it's a system for organizing knowledge with explicit controlled terms. Meaning, if a term has a synonym, it's clearly called out as a synonym. You don't use multiple terms for the same thing. And this is a deliberate process to get to this point.
Mining your data with, with software is a starting point to find some of this metadata, but you need people to come together and agree on that metadata, and we'll see more of that momentarily. The reason is you'll find multiple synonyms for the same terms. You'll find multiple variant spellings. You'll find all kinds of things that, that, some are broader concepts, and some are narrower concepts, and it really takes deliberate thought to organize this. Next slide, please.
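[Editor's illustration] A controlled vocabulary with explicit synonyms can be sketched as a lookup from every known variant to one preferred term. The terms below, including the deliberate misspelling, are invented examples, and `normalize` is a hypothetical helper; which term is preferred is exactly the decision the speaker says people have to agree on.

```python
# Preferred term -> known synonyms and variant spellings (hypothetical entries).
PREFERRED = {
    "hypertension": {"high blood pressure", "htn", "hypertention"},
    "ace inhibitor": {"angiotensin-converting enzyme inhibitor", "acei"},
}

# Flatten into a single lookup so every variant resolves to its preferred term.
LOOKUP = {
    variant: preferred
    for preferred, variants in PREFERRED.items()
    for variant in variants | {preferred}
}


def normalize(term):
    """Return the preferred term, or None if the term still needs human review."""
    key = " ".join(term.lower().split())  # ignore case and stray whitespace
    return LOOKUP.get(key)


print(normalize("High  Blood Pressure"))
print(normalize("ACEI"))
print(normalize("gear"))
```

Terms that come back as `None` are the ones software harvesting surfaces but cannot resolve on its own.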
So, vocabularies can be expressed in many ways, and going from left to right, or simpler to more complex, there are other means too that I didn't get into here. For instance, with websites, you can get into what's called the thesaurus or synonym ring. The glossary is the most simple; it uses natural language. It's a word and a definition. The taxonomy starts to get more formal. It not only expresses words and definitions, but it expresses them in a hierarchy. So within life sciences, for instance, that's a very broad term. You might get down into pharmaceutical development, which is a bit narrower, because there might also be biomedical research. And as you start to get into those, you can find much narrower terms as you go down in the hierarchy. Think of your public library catalog that Charles mentioned earlier. That starts with broad subjects and delves into narrower subjects. That's also a taxonomy.
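[Editor's illustration] The broader-to-narrower hierarchy of a taxonomy can be sketched as a mapping from each term to its broader term. The first two entries follow the example in the talk; "clinical trials" is a hypothetical third level added only to show depth, and both helper functions are illustrative names.

```python
# Each term points to its broader term.
BROADER = {
    "pharmaceutical development": "life sciences",
    "biomedical research": "life sciences",
    "clinical trials": "pharmaceutical development",  # hypothetical narrower term
}


def path_to_root(term):
    """Walk upward from a term to the broadest term in the hierarchy."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def narrower(term):
    """Collect every term below the given one, at any depth."""
    found = []
    for child, parent in BROADER.items():
        if parent == term:
            found.append(child)
            found.extend(narrower(child))
    return found


print(path_to_root("clinical trials"))
print(sorted(narrower("life sciences")))
```

Searching on a broad term can then include everything tagged with any of its narrower terms, which a flat glossary cannot do.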
Finally, the most complex, the most expensive to develop and maintain, but the ones that allow the greatest possibility, are the ontology. And that not only gets into the meaning and definition of things, and there are hierarchical relationships, but the relationships of concepts to other concepts. Sort of a nature of being: what concepts are fundamental, what concepts are auxiliary? And that sounds very abstract. I'll show you a slide in a minute, though, where you'll see how this is applied. But where this is really used is your more sophisticated computer processing, and think of, for instance, systems that when you do a search and you get related items coming up as a result of the search, how does the computer know when it's a related item? How does it know you want gears for an automobile transmission and not for a bicycle? A gear is a gear, right? Those things are expressed in an ontology in a language that the computer can interpret to decide what's related. Next slide, please.
Next slide, please. Thank you. So, on the left, here, we have a simple taxonomy; I just kind of threw this together. Everybody's familiar with architecture and the concepts of government. There are different kinds of government. In the ancient world, we had city states; Rome and Carthage are two examples. In more modern times, we've had imperial systems of government, which are bigger scale. We have modern architecture, we have ancient architecture. The Romans were known for inventing many kinds of architectures. So within these categories you can come up with narrower terms that get into very specific subjects.
We can also begin to express that in an ontology. So if you're searching on Rome and you're searching on the Roman city state and you're searching on military history, you might be very interested in Carthage and Hannibal and the Alps and all that. This is expressed there where you see Rome had an opponent in Carthage. The computer doesn't understand what that means, it just sees this relationship expressed formally. So it knows to grab information that's been tagged with metadata about the ancient city state of Carthage.
We can apply the same thing to concepts of clinical trials, for instance. So you might want to find everything in a clinical trial that's been through a phase two trial, but that could be a lot of documents in your organization. What if it was only for blood pressure medicine, and within that field a type called ACE inhibitor? You've just narrowed the field considerably. So something could be in a phase two trial, and not have anything to do with blood pressure or ACE inhibitor, and therefore be excluded because it doesn't have all of the characteristics. And that's the advantage of taking the taxonomy and expressing relationships between the items in the taxonomy using an ontology. It allows you to distinguish between what's truly related to the information you're looking for versus what just happens to have the same keyword. Clinical trial, or phase two, might be a keyword. Next slide, please.
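[Editor's illustration] Ontology relationships are often written as subject, predicate, object statements, in the same form as "Rome had an opponent in Carthage." The sketch below applies that form to the clinical trial example. Every study, drug, and relationship name is invented for illustration, and a production system would use a dedicated ontology language and store rather than a Python set.

```python
# (subject, predicate, object) statements; all names are hypothetical.
TRIPLES = {
    ("study-A", "in_phase", "phase 2"),
    ("study-A", "investigates", "drug-X"),
    ("drug-X", "treats", "high blood pressure"),
    ("drug-X", "has_class", "ACE inhibitor"),
    ("study-B", "in_phase", "phase 2"),
    ("study-B", "investigates", "drug-Y"),
    ("drug-Y", "treats", "asthma"),
}


def objects(subject, predicate):
    """Everything the subject is related to through the given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def phase2_studies():
    """Keyword-style match: anything tagged as phase 2."""
    return sorted(s for s, p, o in TRIPLES if p == "in_phase" and o == "phase 2")


def phase2_ace_inhibitor_bp_studies():
    """Relationship-based match: phase 2 AND a blood pressure drug AND an ACE inhibitor."""
    hits = []
    for study in phase2_studies():
        for drug in objects(study, "investigates"):
            if ("ACE inhibitor" in objects(drug, "has_class")
                    and "high blood pressure" in objects(drug, "treats")):
                hits.append(study)
                break
    return hits


print(phase2_studies())
print(phase2_ace_inhibitor_bp_studies())
```

The software does not understand what "treats" means; it only follows the relationship as formally stated, which is enough to exclude the study that merely shares the phase two keyword.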
So we've talked of the advantages of this, what we call rich semantics. That's the, the vernacular in the taxonomy and ontology fields, semantic being meaning, and you're adding rich meaning, and, again, for human beings, the meaning is understood. We have background experience. We work together, we understand each other, we can make allusions to things that automatically trigger the same reference. With a computer system, it has to be very explicit, and so it has to be well thought out, and it has to be carefully modeled.
However, it does allow, as the model gets applied, you begin to find a greater context of information, you begin to find concepts and materials in your content that you didn't know were there. Because you're suddenly finding documents that are related, because they are tagged according to the taxonomy model and the ontology model, by somebody else in the organization that start to appear on your search. You're like, Hey, I didn't even know those documents existed.
But they do, because you're following the same model. Again, think of the public library. Once you find your subject in the card catalog, you go to the shelf and start looking. You see all kinds of books on the shelf that you might not have thought to search for. That's a very good research technique. Essentially, you're automating the same thing here with the computer-based information. This can lead to greater content reuse, because you integrate these vocabularies. And you start to find new patterns in the information that you own, that you didn't know was there before. Next slide please.
Now, we've talked a bit about harvesting terms. That is a very effective way of doing a lot of grunt work, but it needs human intervention. The example earlier, the "Eats, shoots, and leaves," the commas, very often there's ambiguity in language that the computers don't understand. They may identify as separate terms, things that are really a single term. You may have to make decisions over which term is a preferred term; that commonly happens. So it's not unusual for computer-based discovery to find many different variations of the same kind of metadata, or the same label. You need to take those labels and rationalize them into something that applies across the board, and then agree on that with the subject matter experts in your organization. That is a human process.
It takes time, but it makes these taxonomies become, these controlled vocabularies become more efficacious and livable and usable because they can be maintained in an orderly fashion and thus applied in an orderly fashion. If they're not applied consistently, you won't get consistent results from your, from your systems. Next slide, please.
So I mentioned natural language processing before. It does analyze ambiguous human language. If you've used Amazon's Alexa, or Google, I forget what they call theirs, or Apple's Siri, what you've got there is a computer system that's hearing your spoken words, processing the sound patterns, and trying to extract syntactical meaning from what you're saying. So it does this by identifying what the subject, predicate, and the objects of your statement are, then graphing those relationships. The difficulty, as we've already seen, is that the same set of words can be valid and have two different meanings.
And so the abstract, the ambiguous nature of the language can be difficult for the computers to cope with. The other part of this is that, while it's very good at finding the literal terms, when you scan a spreadsheet, for instance, for the column headers, those literal terms are there, but the abstract domains, the more abstract knowledge or concepts are not always stated or obvious. And then the relationships between those abstract domains are not as obvious. And so this is where you can start with these tools, but people who are trained in the theories of knowledge organization and ontology and library science and so on are really, really helpful to come in and begin to analyze this with an orderly process. Next slide, please. And I think that was my last one.
[Ron Niland] Thank you, Glenn. Yeah, I appreciate– [Glenn Emerson] Thank you. [Ron Niland] You're welcome. Yeah. So at this juncture, I wanted to shift gears a little bit with the group, and as my friend and colleague, Henry Hernandez, says to me, Hey, Ron, what have you done lately to move forward the aspect of democracy? As you go down this path with your, your organization, in terms of helping them to understand and appreciate the aspects of good data management, but including aspects around taxonomies and ontologies as well as the metadata, you, you would be very well served to think about this from two different levels. One is sort of the 30,000-foot level, and then maybe another is more at the 10,000-foot level. At the 30,000-foot level, there's the aspect of core strategy and how people are coming together.
As you can see on the left side, the core strategy versus the content coming together on the right side. And people invariably are working in processes and workflows, if you will, that then need to be governed. Whereas the content, while you want substantive content, it needs to be structured. So we're looking at the right side here, then, you can see there are a myriad of elements that factor into developing a really good content strategy.
And by the way, when you look at this cartoon, if you will, this depiction says a few different things; it talks about all of the different connection points, if you will, with content strategy. Not the least of which is business goals and objectives, but also things like the governance of that content, things like your brand, as you can see at the 11 o'clock position there, or the tone of voice that comes through. Whereas down in the section of taxonomy and classification around the four o'clock portion, you can see that the aspects of tagging and classification models come into play.
So, you know, the idea is really just to think about these two different levels, because you, you need to be true to both, and as a company or an organization, your goals need to really align with your objectives, which need to align with that content strategy, which needs to tie in with your focus messages, and then your, your focus materials.
Over the, this journey, if you will, your, your focus is not only on your internal organization and your goals that would include the employee needs, but also, and more importantly, on your customers and what their needs are. So you've got your audience, the target audience, and the audience needs that you just need to think about.
Over the next few slides, if you could advance, I'd appreciate it. I'll talk to the aspects of governance and workflows. So first, let's talk about governance. In my 30 years in life sciences, working with different companies, but including Pfizer, and Amgen, and Genentech, I found that governance had different levels of maturity in the organizations at the point when I joined versus the point when I left, and what I found was invariably around information management, data management, that, that governance increased just, you know, on a consistent level, sometimes there were larger steps of control, if you will. But the fact of the matter is, good governance just really enables you just to ensure, ultimately, you're meeting the needs of your customers.
So, if you look in the lower right, you can see here a governance model that basically just sort of gives you a sense of who the players are. How the groups may interact with one another, and then you can also see that there's a bidirectional flow of information. And this is really key to remember, you know, information and decisions shouldn't just flow up. It needs to be, again, very bidirectional. The, the aspect of the decisions, then, really, are tied to this governance and understanding who's making what decisions and who's providing information, but including content to enable the decisions. And getting to good decisions, I think everyone would agree it really comes down to getting the right information in an organized fashion to the right people at the right time, so they can make the right decision and then communicate it, capture it and communicate it effectively. So when we talk about workflows, then, you know, business process maps: you really need to think about this from both a strategic and a very tactical level.
On the right side, you can see the aspect of metadata and taxonomy ties into this, but on the left side is the strategy, understanding, OK, as an organization, we want to do this work. And we want to develop taxonomies and we want to understand how to tag data and develop metadata. Well, you need to step back and you need to have that strategy articulated. And part of that is the governance.
Then, when it comes to workflows, you need to develop your standards, like what do your workflows look like? Meaning, you know, business process maps and swim lanes. Do the swim lanes go horizontally or vertically? How are we taking the names of the different characters in the swim lanes and capturing them? Are they individuals, or are they departments, or are they roles? And those things, once you get those solidified, enable you to then create your workbench. And the idea behind the workbench is that everyone in your organization will know where to go to get those standards.
They'll get the base training, they'll get the base templates, and they can then go out and, in their individual areas, develop their workflows. So that's the next step in the center: identifying the flows and then mapping them out. Once you get your business processes mapped out, then you want to build your SOPs and work instructions around them. For those SOPs, in many ways, you really need a process map. The FDA will say, if you don't have a process map, you probably don't have a process.
When you think about workflows, there are a whole series of considerations on the right that you need to think about, not the least of which is the quality of those workflows and the functionality and the accessibility, but also conformance to different standards. For instance, if you're talking about medical imaging, and you need to share that across a series of groups, you know, one aspect here might be DICOM as an example. And at the same time, you need to think about just regulatory standards that are associated with your work procedures, that procedural documentation. Next slide, please.
So, you know, as Henry and I were reviewing these slides, we definitely came to an agreement that, in some ways, what we're trying to develop is a subway map, if you will, for an organization, of how this series of processes and information come together just to enable flow. So that if you're in the organization, and you're riding the subway from one station to another, that when you get there, you know, they're taking the same steps toward communicating information in the same language, if you will. Or they're measuring time in the same manner so that you're not jarred as you go from one station stop to the next.
But the idea is, really, you want to streamline this, and you want to develop sort of an understanding of the connectivity, and you'll do that through your process mapping. But you need to think about the maturity of your organization as well: are you more mature, are you a Pfizer-like company, or are you in this nascent stage? Do you have some aspect of centralization, like master data management or procedural documentation? Or even timeline management, because all of these things have the interrelations with the aspects of the taxonomy and metadata. Last slide, please.
And then, lastly, you need to think about this journey and the maturation process of your organization and how you get from point A to point B. And you'll do that with a really good and sound learning management system that's oriented toward training individuals. A step above that, if you will, if you're shifting gears in maturity, would be having the knowledge management system, where basically you're building knowledge in the organization. You're not just sort of sharing information on SOPs but now you're suddenly sharing insights that are gleaned through the processes that are being run with the SOPs and the work instructions.
You'll need to really make sure you have a base understanding in the organization as to what you're driving toward and what things really mean. I mean, down to the level of the terms. What is a taxonomy? What is an ontology? And educating your colleagues is a really good first step toward that end. Lastly, in that section there, I would say that understanding how to write things and write things right is really important, because at the end of the day, you know, as we strive towards the integration of data and the automation, that's going to be an imperative.
Lastly, in closing, I would say that there, oops, sorry, if you're just able to go back for one quick second. You know, things like DITA and XML come into play, where you need to think about how you're reformulating or reprocessing or repurposing information chunks. And then as you go through your process mapping, you're basically developing discrete processes but chipping away at them with the idea of bringing them all together in this subway mapping. So I'd like to pass the baton to Keith, and thank you very much.
[Keith Parent] Thanks, Ron, and thank you, Charles, Ron, and Glenn. You guys did a great job, kind of putting a lot of texture into a lot of terms and concepts that people have to kind of get. So I'm here today to kind of help pull it all together and talk specifically about, well, how does all this matter, and how do we use metadata and taxonomies and ontologies, and how can we use them in our day-to-day lives? And part of that is using some of these reference models. And where do these reference models come from?
Well, DIA is the Drug Information Association. They help – that's an industry group that has, you know, ties into the health authorities and to each of the pharma, biotech, med device companies, as well as a lot of clinical research and some of the vendors. Our goal is to kind of work together as a group of people in an industry, to say: How can we pull these things together? How can we kind of put something out there that becomes a place for people to start with? Because right now, if you listen to everything we just talked about for the last 40 minutes, you would say: Wow. Where do I start?
Well, this could be a starting point for you. If you're in the process of getting ready to do submissions to the FDA, you may want to look at the EDM-eCTD reference model. If you're doing a trial, and you need to figure out what kind of documents or what kind of data needs to be collected for a trial, the electronic Trial Master File is something to look at. And for everybody, RIM is something that they hear a lot about but they have no clue what it's all about. I've been serving on a RIM reference model committee for the last year or so, and we've got a white paper group, we've got an AI working group on that team, and we're putting out a reference model, so that new reference model, version two, is coming out in June. But, again, it takes a lot of different opinions and a lot of different departments to talk about how these things work together. So, next slide, please.
So, let me talk about each of the individual ones. The electronic Trial Master File is just a way of gathering clinical trial data. There are 11 different zones in it, and so it's a way of categorizing documents and data that makes it easy for you to get ready for a submission to the FDA. Next slide, please.
When I look at my Trial Master File, a lot of times I want to say: Well, how do I look at all the different documents that are out there? The TMF spec actually has a lot of this metadata defined for you. So when you look at the columns, and in this case this happens to be a SharePoint site that we're using, you'll look at TMF Section, TMF Component, TMF Source. And under each one of those columns, you're gonna see different data elements that are there. That's some of the metadata Glenn was talking about a little bit earlier, or Charles was talking about earlier, metadata for classifying things. You're going to see that there are three files in that TMF Zone of 01 Trial Management. And each of those files has a slightly different component.
So if you wanted to search for something specific, you could pull those out. If I was looking for sponsor documents, I'd look at my TMF Source. If I was looking at Site Files, I would look specifically for anything that's classified as a Site File. So, the goal is to really help you get to those documents more easily. Now, part of the issue is you're going to be working with a bigger group. So, it may be that somebody's working on certain types of documents within that trial, and somebody's working on something else. Next slide, please.
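The kind of metadata search Keith describes can be sketched in a few lines of Python. This is a minimal illustration, not the SharePoint configuration shown on the slide: the column names (TMF Zone, TMF Section, TMF Component, TMF Source) and the zone label "01 Trial Management" come from the talk, while the file names, the section and component values, the source values "Sponsor" and "Site File", and the helper `find_documents` are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmfDocument:
    """One row of the document library, with its TMF metadata columns."""
    name: str
    tmf_zone: str
    tmf_section: str
    tmf_component: str
    tmf_source: str


# Hypothetical library contents; only the zone label "01 Trial Management"
# comes from the talk. Every other value is an illustrative placeholder.
LIBRARY = [
    TmfDocument("file-1.pdf", "01 Trial Management", "Section A", "Component A", "Sponsor"),
    TmfDocument("file-2.pdf", "01 Trial Management", "Section A", "Component B", "Sponsor"),
    TmfDocument("file-3.pdf", "01 Trial Management", "Section B", "Component C", "Site File"),
    TmfDocument("file-4.pdf", "Another Zone", "Section C", "Component D", "Site File"),
]


def find_documents(library, **criteria):
    """Return documents whose metadata equals every column=value pair given."""
    return [
        doc for doc in library
        if all(getattr(doc, column) == value for column, value in criteria.items())
    ]


# Three files in the Trial Management zone, each with a different component.
trial_management = find_documents(LIBRARY, tmf_zone="01 Trial Management")

# Filter on TMF Source to separate sponsor documents from site files.
sponsor_docs = find_documents(LIBRARY, tmf_source="Sponsor")
site_files = find_documents(LIBRARY, tmf_zone="01 Trial Management", tmf_source="Site File")
```

Because every document carries the same set of columns, any combination of them can serve as a filter, which is what lets different people on a larger team pull out just the documents they are working on.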
If I'm going to be getting ready for submission, now I'm going past the trial phase, and I'm pulling that data in. I'm going to be, you know, going from where they're doing the trial information down to the regulatory group that's going to be actually working on my submission. We're going to use the EDM reference model, again, that's the DIA reference model that was created and the eCTD mandate. The FDA wanted to cut down on the amount of time it took for, um, going through a drug submission. The only way they could do that was really kind of putting out a standard that said, hey, we need to have the document sent to us and need to have a certain font type, all the different things that are happening with it.
But, again, it's, it's based on a certain kind of taxonomy and, and metadata format that you use when you're laying this information out, and setting it all up. It's a great way for anybody that's going to be doing this. If they learn that, they can actually tie that together. Next.
At the site level, when you're doing a trial, there are the electronic Investigative Site Files. These are binders and a folder structure for actually capturing the data at the individual sites; those are then going to feed up into the eTMF. You may have a CTMF system and, at one of our later presentations, we're going to talk about how to integrate multiple systems together. Well, one system might be used at the site level, and another one comes in at the CRO or the sponsor level. And you gotta have that interaction between the two different groups. Next slide, please.
When we start to pull all of these documents together and you're looking at everything, your Regulatory Information Management system kinda pulls everything together, and there are lots of different categories. And you start to talk about the investigational applications, marketed applications, the differences between the two. When you're gonna be doing submission planning and tracking, you have to identify these things. Labeling: actually, at the next couple of webinars, we're gonna be talking a little bit about some of the labeling, and some of the work that we're gonna be doing there, specifically with DCL. Process integrations: Ron had talked a lot about process mapping, and how we can use that, and use the metadata between those to kind of pull those things together. All of this and all that metadata can actually help you to get metrics and KPIs.
Somebody asked me recently about regulatory intelligence. Well, that means so many different things to so many different people, but the ultimate goal is that you want to be able to do your submissions to the FDA, or to Health Canada, or the EU, in a timely fashion. You want to understand what the regulators are going to be looking for. Correspondence and commitment: you want to be able to get something in from a health authority and identify the data in it. Glenn talked a little bit about natural language processing. A lot of times you're gonna get a correspondence in; you can use NLP, or natural language processing, on that to identify metadata elements that can then tie it to a specific trial, or tie it to a specific drug product, and you can tie those things together. Lastly, if we go to the next slide, please.
You start to see that there's a continuum, and we're looking at the way that these things work together. We start out with the Site Files. From the Site Files we can have an overlap for the eTMF, um, where it's pulling in some of the eISF data and some of the EDM work together. You're usually putting into working libraries where companies are working on all the document submissions. You've got the medical writers in there doing their work, putting documents together, you're setting links between lots of these documents, you're putting it all into an eCTD format, and then you're tying it in with a submission processing application.
All of that ties back into a RIM, which is going to be, you know, gathering all that information, pulling it all together and hopefully giving you from, from start to finish all the information around that drug product or multiple products, if it's going to be, in particular, if it's going to go into multiple countries, different efficacies of the drug, all the different things you're going to be dealing with from a drug level. All those kind of things kind of tied together. With that, this slide was kind of pulling it all together. My goal was to say, and to take some of those, those, um, more educational terms early on and say: how do we actually bring those together, and work together?
Here are the things that you're gonna do, you're gonna look at any of these kind of reference models that are out there as a starting point. You don't necessarily have to use them. When we do a configuration workshop, setting up a document management system for somebody, and we're looking at how to lay it out, the reference model can be daunting, and you may only need a fraction of that, because you're only gonna be using certain pieces of it. That's OK. It's a starting point. It's for something for you to use and move forward with.
If you're gonna be tying in multiple applications together, you may want to look at what data elements are in one application, data elements that you're gonna map in another application, and make sure that you've got a taxonomy that's tied together between those two. That's where that whole ontology and making sure that, that things connect up together, that Glenn had talked about a little bit earlier, and Charles had jumped on. That's an, actually, where, a place where you start to look at, when you're integrating multiple applications together. We're actually gonna cover that in a future webinar coming up, so I'm looking forward to doing that with you. With that, I think we're at the point, David, where we're at the end of all the slides. We're gonna get into the questions and start to get on that.
[David Turner] Absolutely. So, thank you to everybody. That was fantastic and a ton of content. I really do appreciate all the work that, that went into that. We do have some questions to go through here. And please, if you still have questions, go ahead and put them in. If we don't get to yours, we can always respond using e-mail or something. Also, if you want to speak to anybody from the panel today, just drop us a note in the chat, and we'll make sure that whichever panelist you'd like to speak to gets that note, and to make sure that you're contacted. So, do we have everybody's cameras on? Let's make sure all our panelists' cameras are on and feel free to, I'm just going to throw these out to the whole group, and feel free to jump in and answer as you see fit.
The first one, I kinda laughed: I really like the subway map example, and this question made me think of that because that subway map, like any other subway map, has a lot of stops. The question is: “For a relatively small to mid-size company that may not be managing metadata or managing documents effectively, or at all, where should we focus first? Are there certain document types that make sense as a starting point? Are there certain people we need to hire?” I'll throw that out there.
[Ron Niland] If it's OK if I first chime in, that'd be great. I think one of the easiest places to start first is with your procedural documentation in the company. You should have a quality policy, quality manual, and operating procedures, hopefully. And with those, it may be possible that they were developed over time, and that they never were really woven together. And so that could be a great starting point where you could establish those standards, weave them together, and from those procedural documents tease out taxonomy- and metadata-related information.
[Keith Parent] Ron, if I could just jump on that for a second. So the way we kinda look at it is through an information architecture, or looking at the process map. When you allude to the SOPs or the procedures, by actually looking at the process maps up front, you can see how these different processes are going to talk back and forth, or what's going to get sent along the way. And setting up an information architecture of information that's going to flow from one process to another, from one department to another, is a great way to start. And that can actually feed into some of the metadata elements you're going to define, and it could feed into your governance plan, the way you're gonna set up how all your documents and data are captured. So that would be what I would start to look at.
[David Turner] Excellent.
[Glenn Emerson] I think you'll find, if you want to be serious with a taxonomy or controlled vocabulary, that you do need a dedicated effort to developing and maintaining that. Especially in any kind of enterprise level, and organizations that are really known for this, and successful with this – and Amazon is always at the top of the list – they have dedicated teams that work this, and that is how their systems are driven. This is by ontologies in the background. It doesn't happen by accident. There's a business value to it. And that's where you weigh the decision to do that. But we're seeing this also in financial technology, which is my background, is the fintech sector.
And, again, enterprise-level taxonomies that, that are turning into ontologies that not only allow organizations to associate content with related content, but also with related API codes and related network functions and data values in database tables for processing financial transactions. And, you know, you see the application of this – not to go off on a tangent; I realize this is life sciences – but when you're looking at tracking things like international terrorism and funding of it, human trafficking, this kind of stuff, a lot of that is done through financial technology networks, and it's done through creating these sophisticated vocabulary models that allow for tracking this kind of information.
[David Turner] Next question we have, I think, goes a little bit towards what Keith was talking about here at the end. So, Keith, if you want to take the first shot at this, and then we can go around the room: “Because we have multiple vendors, and different content management systems, regulatory systems, metadata systems, what can we do to get the data talking from one system to another?”
[Keith Parent] Well, again, it goes back to that information architecture concept. That's really understanding each of the different systems and how the data is being captured. Even the terminology of how the data is stored, whether it's in a vault, or it's in a folder, or it's in, you know, whatever content repository they have in an application, just identifying what are the data elements that would actually tie one system to another. That's the first step: looking at how the data, or the content, the document, flows from one system to another. That process map is going to be really important in order to get them working. Are they meant to work together?
Some places, you know, a lot of times, when I'm working at a client site, with the quality system, everybody thinks everything's going to go into the quality system. But then, all of a sudden, the quality people are using it, and they don't want anybody else in it because they want the records to be pristine when the regulators come to look at them or they get audited. So, it's one of those areas where they've got a lot of working systems around, and they're tying things together through other systems. So, it all depends on how they're going to be working with the data. But the reality is, looking at how data goes from one system to another is a great place to start identifying which metadata elements can work between multiple systems. We're actually going to be doing an entire seminar, or webinar, upcoming that's talking a lot about those integrations, so look for that in future as well.
[Charles Cooper] Yeah, I'd like to weigh in on that as well. I think one of the challenges, Keith, as you said, is that you don't necessarily need everything, every part of the metadata, every one of those things. As your business requires, look at what you need to do. So start with what you need. Don't try to boil the ocean. Don't try to do everything at once. Also, it sounds strange, but get your different vendors in the room and have them talk to each other once you figure out what you need to do or what you think you need to do. Because in many cases, sometimes the vendor will say, Well, we can't do that. And that's the standard response because it will take time and money. Come back to the vendors and say, Well, we need you to do this, and we are paying you to do this. So, let's work together to figure it out. And most of the time, the vendors will be able to do it. There are certainly places where you don't want data to cross the boundary.
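One way to picture "identifying the data elements that tie one system to another" is a small crosswalk that maps each system's own field names onto a shared vocabulary. The sketch below is a hedged illustration only: the system labels `etmf` and `rim`, every field name, the sample values, and the helper `to_shared_terms` are hypothetical and do not describe any particular vendor's product.

```python
# Hypothetical crosswalk: shared term -> the field name each system uses for it.
CROSSWALK = {
    "study_id": {"etmf": "TrialNumber", "rim": "protocol_no"},
    "product": {"etmf": "DrugName", "rim": "product_name"},
}


def to_shared_terms(record, system, crosswalk=CROSSWALK):
    """Rename a system-specific record to the shared vocabulary.

    Returns (shared_record, unmapped_fields) so that gaps in the crosswalk
    stay visible instead of being silently dropped.
    """
    local_to_shared = {
        names[system]: shared for shared, names in crosswalk.items() if system in names
    }
    shared_record = {}
    unmapped = []
    for field, value in record.items():
        if field in local_to_shared:
            shared_record[local_to_shared[field]] = value
        else:
            unmapped.append(field)
    return shared_record, sorted(unmapped)


# Hypothetical records describing the same study in two systems.
etmf_record = {"TrialNumber": "STUDY-001", "DrugName": "Example Product", "Vault": "V1"}
rim_record = {"protocol_no": "STUDY-001", "product_name": "Example Product"}

etmf_shared, etmf_unmapped = to_shared_terms(etmf_record, "etmf")
rim_shared, rim_unmapped = to_shared_terms(rim_record, "rim")
```

Once both records are expressed in shared terms, they can be compared or joined directly, and the list of unmapped fields (here, the hypothetical `Vault` field) shows what the crosswalk still has to decide on.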
Glenn, you mentioned, you know, stepping outside of the, the, pharma, the life sciences world, into the finances. Maybe different levels of finance organizations shouldn't be talking to each other. You know, corporate banking versus investment banking should not talk to each other. We have 2008 to think about when that happened. So, to focus back here, look at your business requirements, figure out what you needed to do, and then go out and talk to your vendors and say, We need you to do this. Let's work together to figure out a way.
[Glenn Emerson] Yeah, corporate and investment banking started talking with the repeal of Glass-Steagall, but that's a totally separate subject. I don't disagree with your point. The other thing you need to think about, and I run into this a lot, is to question your assumptions. Very often, we get used to doing things a certain way. And that's just the way it's done. It really helps to stop and say, What if we didn't do it that way?
And, you know, I mentioned Amazon earlier, not that I'm trying to promote them or anything, but that's one of the keys to their business model, is, let's think of different ways to do it. You know, let's redefine the marketplace, that's what they set out to do. It wasn't to just create a product or an offering; it's to redefine the nature of transactions. That was their business case. That doesn't have to be yours. You don't want to boil the ocean. That's right.
But don't assume that the things you've always done always have to remain that way. A lot of times there's sunk cost in that. There's hidden cost in that because you just keep spending that money on payroll and salary and process. And some of those are the real barriers to the business efficiencies you're looking for.
[Ron Niland] David, can I take 30 seconds and add a few other quick thoughts?
[David Turner] Oh, sure!
[Ron Niland] For two years after Pfizer acquired Warner Lambert, I was tasked to bring together several dozen databases. And to Keith's point, it really comes down to just doing an inventory, if you will: going through these databases field by field and mapping them out. And then you need to understand what's the parent data versus the child. The parent is the pristine data that you need to protect to create what might ultimately be a single source of truth.
And I think in this person's question there, they've got a series of redundant systems, perhaps they're overlapping, and maybe the way to get there in the shorter term is to create the connectivity and data cube that enables reporting out on the data that's coming from the myriad systems, but then there's the rationalization over time. But it really comes down to basically taking everything out of the closet and asking, you know, What is this piece of clothing? Have I worn it the last few years? and doing that rationalization piece by piece. And then, once you get it sort of all mapped out, then you're able to go to that next level, if you will, in the journey.
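Ron's field-by-field inventory can be sketched as a simple data structure. This is a minimal sketch under stated assumptions: the database names, field names, and parent assignments below are invented for illustration, and the helpers `systems_by_field`, `overlapping_fields`, and `child_systems` are hypothetical names, not part of any tool mentioned in the discussion.

```python
from collections import defaultdict

# Hypothetical inventory: database name -> fields found in it.
INVENTORY = {
    "db_a": ["compound_id", "compound_name", "site_id"],
    "db_b": ["compound_id", "batch_no"],
    "db_c": ["site_id", "compound_name", "investigator"],
}

# Hypothetical decision about which database holds the parent (protected) copy.
PARENT_SYSTEM = {"compound_id": "db_a", "site_id": "db_c"}


def systems_by_field(inventory):
    """Invert the inventory: field name -> databases that contain it."""
    owners = defaultdict(list)
    for database, fields in inventory.items():
        for field in fields:
            owners[field].append(database)
    return dict(owners)


def overlapping_fields(inventory):
    """Fields held in more than one database, the candidates for rationalization."""
    return {
        field: databases
        for field, databases in systems_by_field(inventory).items()
        if len(databases) > 1
    }


def child_systems(field, inventory, parents):
    """Databases holding a copy of the field other than the designated parent."""
    parent = parents[field]
    return [db for db in systems_by_field(inventory)[field] if db != parent]


overlaps = overlapping_fields(INVENTORY)
compound_children = child_systems("compound_id", INVENTORY, PARENT_SYSTEM)
```

The overlap list is the "everything out of the closet" view: each field that appears in more than one place needs a decision about which copy is the parent, and the remaining copies are the ones to report from in the short term and rationalize over time.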
[David Turner] Excellent. These two questions, I'm kinda combining together. This, I think, has to do with what you guys had mentioned about the metadata and content models that are predefined by industry groups, et cetera. One question is: “Are metadata values configurable, or are they static in nature?” And the other one is: “When people talk about a reference model in an industry group, how strictly do we have to follow that model?”
[Keith Parent] I'll just jump on that real quick: it's never static. I mean, it all depends on the system, but most of those reference models are just that: they're a starting point. They're meant to be changeable, they're meant to be adaptable to your system. So you can do what you need to do with those models. Like I said, when we do a configuration workshop, one of the things we do is kinda skinny it down to what's really applicable for that customer, and what they want to do with that. And metadata is meant to be changed. Companies change over time, so as you add new states to a process, you want to be able to change that metadata to reflect that. So I think it's all changeable and it's all based on how the business runs.
[David Turner] Excellent.
[Charles Cooper] It all comes back to supporting the business.
[David Turner] I'm sorry, we're on our last minute here, and we've got to throw up one more thing here. I want to just thank everybody for being a part, and we do have a short exit poll today when you leave. I think that's being either thrown up on the screen or maybe it's being sent out by email, I'm not sure. But we do have a short exit poll today. If you could take an additional minute just to provide some quick feedback, it's going to help us to follow up in a more meaningful way.
Anyway, let me just finish by saying the DCL Learning Series does comprise webinars like these, a monthly newsletter, our blog, and you can access this and, and other webinars related to content structure, XML, et cetera, on our on-demand webinars section. We hope to see you at future webinars, we hope to see you on our mailing list, and if you have any questions, please let me know. Thank you, panelists. Have a great day, and this concludes today's broadcast.
[Ron Niland] Thank you very much.
[Keith Parent] Thank you, everybody.
[Charles Cooper] Thank you, everyone.