On-Demand Webinars

DCL Learning Series

PubMed Central Primer and XML101

Webinar Transcript



Hello, everyone, and welcome to today's webinar. We will begin in just one more minute. We're going to allow some folks to continue logging on.


Welcome, everyone! Welcome Colin! 


Hello, Christina. Hello Jeff.


Hello Susan. 


Welcome Virginia. 


Happy everyone's taking a little bit of time out of their day. 


So welcome to the DCL Learning Series. Today we are hosting a PubMed Central primer, along with an introduction to XML.


My name is Marianne Calilhanna and I am the Vice President of Marketing at Data Conversion Laboratory. I'll be your moderator today and just a couple quick things before we begin. We are recording this webinar and it will be available from the on-demand section of the Data Conversion Laboratory website at dataconversionlaboratory.com. 


If you could hit the next slide, please. 


Before we begin, I'd like to make just a really quick introduction to Data Conversion Laboratory, or DCL, as we are also known. Our mission is to structure the world's content. 


Content can unlock new opportunities for innovation and monetization when it has a foundation of rich structure and metadata. 


DCL's services and solutions are all about converting, structuring, and enriching content and data. 


We are one of the leading providers of XML conversion services, DITA conversion, structured product labeling conversion, and S1000D conversion.


Many people are well aware of our excellent content conversion and transformation services, which is what you see on that first blue box there. But we also do a lot of work in the other areas listed on the slide. 


Things like entity extraction, third party validation of previously converted content, semantic and metadata enrichment, data harvesting or website scraping, content re-use analysis, and structured content delivery to industry platforms. 


So, if you have complex content or data and any challenges around those things, we can help. 


I feel fortunate to work with some great people here at DCL, and David Turner is indeed one of them. David helps our customers, and those who are not yet our customers, with content technology initiatives.


He's a great person to reach out to if you have any content structure challenge, and he really understands and demonstrates how XML, metadata, and updated workflows translate to digital transformation and, ultimately, revenue. David, help us make sense of PubMed, PubMed Central, and XML.


All right, definitely. We'll do that. Thanks so much. I appreciate the introduction, and hello to everybody. 


I'm coming to you today from my home in beautiful Sachse, Texas.


If anybody on this webinar has ever been to Sachse, Texas, let me know there in the chat box, because I'd be interested to know that. I'm coming to you from my home office, so who knows if some kind of craziness will happen in the background? There might be some noise, or some loud children might come by.


So, in terms of just background on this webinar and where the idea came from, these are some questions that we've gotten on a repeated basis. As Marianne alluded to earlier, DCL is in the structured content business. What that means is that, often, we're doing conversions to XML, and we're making updates to metadata and things like that.


And a big part of our structured content practice is really around structured content for repositories, right? So, for example, I think she had mentioned we do conversions to XML for things that are going to be submitted to the FDA. That's called SPL, or structured product labeling.


We do conversion of patent documents, the supporting documents that go with the patent application for the U.S. Patent Office, and we also do a lot of work converting content and structuring it so that it can be loaded onto PubMed Central, or PMC.


Because our clients have asked a lot of similar questions about this, we thought we'd put this together in two parts. Part one is about the PubMed Central process, because that's typically where the first half of the questions come from.


But then part two is going to be about just XML. 


That's from an introductory perspective. For some of our clients, when we come in and they are going to load content onto a repository for the first time, this is really their first exposure to structured content, or their first exposure to XML. And so we've got some explaining to do.


Well, let's just dig in, and let's start here with part one, which is the part about PubMed Central.


Alright, so here's a quick screenshot of PubMed Central. This is what it looks like, and I'll actually show you some of this live here in just a minute.


Basically, what PubMed Central is, is an online library or repository that's part of the US National Library of Medicine. So, as you can imagine, the National Library of Medicine has lots and lots of shelves of books and journals and things like that. But they also have electronic content, and they have this repository, PMC, that is maintained by them.


It's actually operated by the National Center for Biotechnology Information, and I think we actually have a couple of NCBI people, or at least one, on today. So welcome, and thank you for coming.


This repository contains full-text journal articles from a particular area: biomedical and life sciences journals. So you'll know the journals, and it's got all this content. And it's available for free.


And I think currently the number is something like 6.3 million articles, and they go back all the way to the 1700s. I guess, just a quick look here at what it looks like.


This is not it. 


Yeah, supposed to be. 




Alright, so this is what PubMed Central actually looks like. 


If you want to look up a particular journal, you can go in here to the journal list. If you want to look up a particular topic, you can do that as well.


I'm gonna jump in here and, you know, just put in a journal that I know. 


Journal of Health Economics and Outcomes Research. I search for that particular journal, and it's going to tell me what content from it is on this site.


It's going to tell me, you know, what the latest volumes are, et cetera. I'm just going to click on this here.


You can see, I can jump in, and I can actually get into this particular volume. I can get into each of the articles, and open up an article here. 


And you can see here in this article, that it's got the title, the Author information, the abstract, and then as you go down further, it's got the entire article. 


It contains the images, it contains the tables, in a very flexible format.


You can also get various different outputs. You can do an EPUB output. You can do a PDF output. You can read it in a variety of different ways, but that's what PubMed Central looks like.


Kind of the next question that we often get about PubMed Central is, what's the difference between PMC and PubMed, or how is PMC different from Medline?


If my journal's already indexed in Medline, isn't it already in PMC? Things like that.


So I'm going to walk through the difference here, and for those of you who've been in scholarly publishing a long time, you might already know some of these things, but I'm going to try to explain in layman's terms. So when you think of PubMed Central, I mentioned that the National Library of Medicine has all these shelves of content out there, right?


Well, electronically they also have shelves of content.


So you can think of PubMed Central as this shelf of journals about medical research. It's the actual full-text articles. It's all of the different pieces together.


Whereas PubMed is actually a bibliographic database. It's a card catalog, if you will. If you've been to the library, you've looked at the card catalog to find out where those full-text articles are.


In this case, you use this to find references or links to the online articles and books. So it looks like:


This here. This is what PubMed looks like. I can look at that same article that I had there before. 


And I can see just, you know, just see items that go along with it. So maybe I've done something like this here. 


You put an author's name or something like that. 


Hmm, hmm. 


I can find this article, yep, there's that article right there. 


And you can see here that it does include some of the same information: the abstract, the authors' names, et cetera. But it does not have the full record. Instead, what it has is links to the article on PMC.


So again, it's intended to be kind of that card catalog.


So it has the information that points you to the bookshelf, the full-text articles that are out there.


It doesn't just point to PubMed Central. I mentioned that PubMed Central has, what, 6.3 million articles.


Well, PubMed actually has something like 30 million citations.


So just by doing the math, you can see that it also points to other repositories, and it also points to publisher websites. So there can be lots of content out there where the full text exists someplace else.


PubMed will show you the citations for PubMed Central. PubMed Central is just that specific group that the National Library of Medicine is hosting.


In terms of Medline, you think of Medline. 


Medline is a subset of PubMed, so it's also a bibliographic database. It's the largest subset of PubMed. It represents something like 26 million of those 30 million or so citations in total.


26 million of those are indexed in the Medline database. And typically, one of the things that makes them very useful and special is that they're indexed using MeSH terms, but that's another topic for another day.
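
The "doing the math" step can be sketched in a few lines of Python. This is only an illustration using the approximate figures quoted in this talk, not current official counts, and it assumes that every PMC article also has a PubMed citation (the talk only says this is generally true).

```python
# Approximate figures quoted in the talk (assumptions, not official counts).
PUBMED_CITATIONS = 30_000_000
MEDLINE_CITATIONS = 26_000_000
PMC_ARTICLES = 6_300_000


def percent(part, whole):
    """Return part as a whole-number percentage of whole."""
    return round(100 * part / whole)


# Share of PubMed citations that could have full text hosted in PMC.
pmc_share = percent(PMC_ARTICLES, PUBMED_CITATIONS)

# Share of PubMed citations that are indexed in Medline.
medline_share = percent(MEDLINE_CITATIONS, PUBMED_CITATIONS)

# Citations whose full text must live somewhere other than PMC,
# for example on a publisher website or in another repository.
full_text_elsewhere = PUBMED_CITATIONS - PMC_ARTICLES

print(f"PMC share of PubMed citations: about {pmc_share}%")
print(f"Medline share of PubMed citations: about {medline_share}%")
print(f"Citations pointing outside PMC: at least {full_text_elsewhere:,}")
```

Under those assumptions, roughly one in five PubMed citations can have its full text in PMC, which is why PubMed has to point to publisher sites and other repositories as well.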


Moving along here to this next slide, in terms of submission: if a publisher is being asked to submit content to PubMed or Medline versus PubMed Central, it's a little different. The rigor is a lot different. The amount of content is different. So, when you're submitting to PubMed, typically you're submitting XML header-type information only. You're sending the basic citation data, and you're sending over the basic abstract, and I think they can turn these around in something like 24 hours.


24 to 48 hours, something like that. It's a really simple, fast process.


Now, when they go to the full-text side, when they go into PMC, there's a big difference. Publishers here are certainly going to submit the citation data and they're certainly going to submit the abstract data.


Then they're also going to submit the full text for the articles, all the tables, all the images, all the references, all the supporting documentation, things like that. That's what's involved. So, from a submission perspective, it's pretty significantly different.


Alright, so I'm sure you guys have got this and you're ready. So I'm going to give you a quick pop quiz here. I'm not going to ask you to do this as an online poll or anything like that.


Just grade yourself. Question one: can an article be indexed in Medline and not exist in PubMed Central?


Be thinking of your answer.


The answer is yes, certainly. So, Medline, again, has 26 million citations. PubMed Central has 6.3 million articles, so you can certainly have an article that's sitting on a publisher site, or in another database, that does not exist in PubMed Central.


Question number two. 


Are articles in PubMed Central generally also indexed in PubMed? 


Your answer there? 


No cheating. 


Yes. Yes, absolutely. Articles in PubMed Central are generally also indexed in PubMed. They tend to work hand in hand. 


And question number three. 


Do PubMed and PMC have the same rigorous requirements for full text XML? 


Answer is, no. 


We just talked about number three. For PubMed, it's typically a relatively quick process with XML header-type info, whereas PMC requires a lot more information and a lot longer process. All right. So if you graded yourself, and if you got 100, let me know in the chat.


If you didn't make 100, also let me know in the chat, and just let us know how you did on this quiz. Let's move along here, and let's talk a little bit more about why publishers put content on PubMed Central. What are the benefits to them?


You know, basically they do it for four main reasons. First of all, there's the idea of increased discoverability and access. When you publish research, you obviously want your content to be accessed by the largest audience possible. And PubMed Central is certainly the pre-eminent repository of this kind of information.


Another great benefit is that it allows you to archive your articles, really in perpetuity.


Journals and publishers might come, and they might go, but over time, ideally, the National Library of Medicine will be there. So this allows you to archive the articles separately from what the publisher's website might have or what another database might have, in a way that's there for the common good.


Another great one is that you get a lot of increased exposure, because you're integrating with the other databases that are part of the National Library of Medicine and NCBI.


And sometimes it's really just important because maybe your funder is requiring it.


All right, Um, now let's move on. Let's talk about the process itself. 


So how do we get a journal into PubMed Central?


Well, first of all, I'll tell you, they do actually have this whole process lined up here. 


Right, there's a website here. We're going to give you a handout that includes all the links to this kind of thing. There is a process outlined on their website that walks through a lot of this detail, but I'll hit it from kind of a high level here today. Alright?


So first of all, you've got the process of just submitting the application. Because you're submitting an application here, you have to give them some basic information.


An important thing to understand here is that when you're submitting content to PubMed Central, it's not a book or a journal article that's typically getting approved. It's an actual full journal, right? So what they're going to want to know is: Who publishes this journal? Who's the management of this journal? What's the journal title and ISSN? When was it first published? How often is it published? What's the website? And then they want links to, you know, what your various policies are for editorial or peer review and things like that. So you submit all that information, kind of as the first step, just to make sure.


Well, to prove that you are a reputable journal and that your content deserves to go on this site. Once you pass that step, they move into an application screening process. So PubMed Central starts looking at the journal, and they start asking a lot of different questions. They start asking, you know, is this peer-reviewed content?


Is this the right kind of content? Is this biomedical content? Does it actually belong here on this site?


Or does it belong in some other database? What are these author affiliations that we've got? Are they appropriate for this kind of content?


Are the article types relevant? So it's kind of a high-level application screening there, to get the idea, from a macro perspective, of whether this content fits.


From there, they move into a more detailed view, which is the scientific quality review. This, this is really intended to be a review of the content itself, alright? It's a content review, not a format review. 


And if you're already indexed in Medline, you generally don't have to go through much more in terms of the scientific quality review.


If you're not in Medline already, they're going to start asking questions like:


What's the scientific rigor here? 


Have the authors applied the scientific method? Do they provide full transparency in providing their supporting data, things like that?


They want to know, you know, is there good editorial quality here?


Whatever your articles claim, are those things clear and logical?


The figures and tables that they've added, are those well constructed? Do they really contribute to the article? So they look at both scientific rigor and editorial quality, and then ultimately they come back and tell the publisher whether you made it or not.


If you don't make it, you are eligible to re-apply in 24 months. 


And that's pretty consistent across the way here. I will say, there are four criteria online for the scientific quality review. They walk through all of the different questions that they ask. Lots and lots of sample questions.


I wouldn't worry too much about jotting down that website, because we've included it in the handout, but I certainly encourage you to take a look.


Alright, so we've submitted an application. 


They did kind of a high-level screening, and they dug in and looked at the content. At this point, they move in and they want to look at the format of the content. You know, does this thing meet our technical requirements for this content? Ultimately, PubMed Central wants their content to be consistent and to follow a similar format. They want to be able to have nice linking, and to do that, it has to be in the right format. That's often where a vendor gets involved.


The publisher is asked to submit an initial package of 25 articles, and those articles should include XML, PDF, images, and supplemental data, all in a big package.


At this point, they review all of that content. 


Now, if after three rounds of evaluation the publisher can't address the reported errors, the application gets rejected.


And so, you've got to make sure that you have clean XML. We've had clients come to us a couple of different times where they have two strikes and they need somebody to help them on the third try.


You're not required to use a vendor. 


You can certainly do this yourself, but you just need to know that after three evaluations they will reject you. My understanding is that the same 24-month period applies, so you won't be able to re-apply for another two years.


So after you pass that technical evaluation, they move you into a pre-production process. The pre-production process is: all right, your content looks good. Let's execute an agreement.


Let's get a larger set of article files. So you now bring in the rest of the articles that you're going to load, based on what you told them in terms of publishing frequency, et cetera, in the first step.


Then they obviously review those as well, and they'll send a correction report if any data errors are found. As before, if they continue to find repeated errors, they will reject your application, and you will have to wait again to re-apply.




Alright, so we've been through all of these processes. At that point you move into a live release phase. What I'm going to show here is a workflow that's typical of what we do as a vendor.


This could be different if you do it yourself. Or even if you use another vendor, it just kind of depends on their process. 


But, generally, the process that we use is that the publisher will send us a PDF and a Word version of their articles, all right?


We take those. Sometimes they'll send us the images separately. Other times, they'll just ask us to extract the images from what they've got in these other documents.


So we take those, we go through a process of converting that content, and we create an XML file. Then we submit the XML, PDF, and images to PubMed Central for you. We just do that on a regular, ongoing basis.


As your journal comes out, you send us the content, we make the conversion, we do the submission, and then we'll also typically send a copy of the XML back to our client, so that they can keep it as well.


That's overall the entire process for getting a journal loaded. If you have questions on that, do put them in the questions pane, and we'll hit them at the end. I'm happy to do that, or I can obviously talk to you afterwards as well.




Alright, so still here in this PubMed Central section, let's let's look at some frequently asked questions. 


I will say that PubMed does have a really great, frequently asked questions page, here's a screenshot of it. 


And I'm also going to show you the link here.


Again, don't worry about copying it down. We'll put it in that set of links on the handout. They do a great job of going through a lot of common questions that are out there. And if you don't see an answer to your question there, you can contact me anytime you like. But you can also contact PubMed Central directly at this e-mail address. They are a very pleasant group to work with, and they get back to publishers quickly, so feel free to reach out to them as well. And that comes from their website.


All right, but I'm going to hit a couple of the frequently asked questions that we get. Every now and then publishers will ask us: why do we have to go through all this process of putting together the complete articles, rather than just linking to our journal site? We've already got it up on our site. Why can't they just link to it?


The answer really goes back to what we talked about earlier.


You know, the linking is going to be on PubMed, the card catalog. That's already established to do links to the full text on the publisher sites.


The idea here is that you're having this full-text content saved in a different repository, the National Library of Medicine repository, that's accessible for all time in the future. So they require full-text articles and supplemental information.


Another common question: why do they require this article in XML? I've already got a PDF.


You know, maybe I've put it into HTML on my website. Why do I have to give you this XML?


Well, you know, there's a number of reasons why and they do list these on their website. 


First of all, XML is hands down the most effective archival format.


It's hardware and software independent.


And so, you know, it just really works well to interact with all systems and devices. 


The example that I think of here is from when I was in college, and I'm going to date myself here. We used WordPerfect to create all of our documents.


So I had all this great stuff that I had written back then, you know, my résumé for job searching, all sorts of different documents.


I saved them in WordPerfect on 3.5-inch floppy disks, and 10 years later I couldn't access those documents anymore. I didn't own WordPerfect anymore. I'd moved on to Word. I know WordPerfect is still out there.


And I can get to those things somehow if I want. But the idea here is that if you put your content into specific formats that require a specific kind of reader, you limit the ability to interact with other systems. You limit the future-proof aspect of it. So XML is really good at that.


Another reason, XML is fantastic for readily transforming the content into, you know, whatever the best format is for a particular reading device. 
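
As a rough illustration of that "one source, many outputs" idea, here is a minimal Python sketch that renders the same small XML document as both HTML and plain text. The element names (`article`, `title`, `author`, `p`) and the sample content are simplified assumptions made up for this example. They are not the JATS tag set, and this is not how PubMed Central generates its PDF or EPUB outputs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified article markup (not real JATS).
ARTICLE_XML = """<article>
  <title>Sample Article Title</title>
  <author>David Turner</author>
  <p>First paragraph of the article.</p>
  <p>Second paragraph of the article.</p>
</article>"""


def to_html(root):
    """Render the article as a small HTML fragment for a web page."""
    parts = [
        f"<h1>{escape(root.findtext('title'))}</h1>",
        f'<p class="author">{escape(root.findtext("author"))}</p>',
    ]
    parts += [f"<p>{escape(p.text)}</p>" for p in root.findall("p")]
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same article as plain text, e.g. for a simple reader."""
    lines = [root.findtext("title").upper(), "by " + root.findtext("author"), ""]
    lines += [p.text for p in root.findall("p")]
    return "\n".join(lines)


root = ET.fromstring(ARTICLE_XML)
html_output = to_html(root)
text_output = to_plain_text(root)

print(html_output)
print(text_output)
```

The content is written once, and each output format is just another function over the same tagged source, which is why adding a new reading format later does not require touching the article itself.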


I mentioned, on your screen, they have a link to get a PDF. So if you're on your laptop and want a PDF, or if you want to print something out to read it, you can pull up a PDF.


But if you've got an e-reader, you can get the EPUB version, or if you want some other accessibility features, things like that, it allows for that. Maybe there are readers that we don't even know about yet. XML is going to set the stage.


Other reasons: it's a better search experience than PDF with XML, because there's this tagging around the different sorts of elements. You tell the computer, hey, this is the author's name, and it recognizes that's the author's name and it's not just the name David Turner. And so that allows you to really search in a much more granular, practical way.
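
Here is a small Python sketch of the difference between searching raw text and searching a tagged field. The records and element names are invented for illustration and are much simpler than real JATS. The point is only that a tag lets the search distinguish an article by David Turner from an article that merely mentions him.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: one article written by David Turner,
# one article that only mentions him in its title.
RECORDS_XML = """<articles>
  <article>
    <author>David Turner</author>
    <title>Structuring Journal Content</title>
  </article>
  <article>
    <author>Jane Smith</author>
    <title>A Profile of David Turner</title>
  </article>
</articles>"""

root = ET.fromstring(RECORDS_XML)


def plain_text_hits(name):
    """Search all the text of each article, ignoring the tags."""
    return [
        article.findtext("title")
        for article in root.findall("article")
        if name in "".join(article.itertext())
    ]


def author_hits(name):
    """Search only the text tagged as the author."""
    return [
        article.findtext("title")
        for article in root.findall("article")
        if article.findtext("author") == name
    ]


print(plain_text_hits("David Turner"))  # matches both articles
print(author_hits("David Turner"))      # matches only the article he wrote
```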


It also enables effective linking of content. Because we have XML set up, you know, we can go into this particular article.


And now we can come in here, and we can see what the other articles are. Oh, I'm back on PubMed Central.


Anyway, you can link, and you can see what the other articles that this person has written are, because of that XML that's behind it.


Then, kind of lastly, it does provide for those accessibility features, and we'll talk about that more in a minute.


But accessibility is so crucial, and you've got to have good structure to your content to be able to do that.


That leads me to my next question, and this one is a lot more common than you might expect, and that's: what is XML? You know, we started this journal, and we've been publishing now for two years, but I've never heard of XML. Or, we've kind of heard of it, but we don't really know what it is. It's something computer-related or whatever.


And so that's what kind of leads into this next segment, part two, the introduction to XML.


Now, when I talk about structuring content, typically I am talking about XML. 


And the idea behind XML is really this. 


So when you look at, say, a journal article's references, like on the screen here, as a human being you can pretty easily look at this reference and recognize the different elements, right?


You can look at this and know that that's an author name.


You can look at that and know that that's the name of a journal, or you can look at this and know that those are page numbers.


To a computer, even if it can search and it can recognize that, you know, that EH, NAR is a word, it doesn't know if that's an author. It's just meaningless text. So, yes, if you're searching for just the right thing, you can search for it, but it doesn't give the computer any real semantic meaning, if you will.


The idea behind XML is to provide a standard way of structuring content so that the computer can recognize elements like these, too.


When we convert a journal article into XML, the computer now knows this is an author name, now knows this is a journal title, and now knows these are the page numbers.


The way it does this is through a series of hidden tags, all right?


So there are standard XML tags running behind the scenes, which you'll see with an opening tag and a closing tag. The closing tag has a slash or a backslash; I can never remember which one you call it. OK, so the idea here is that when you see those tags, like you see surname here, that lets the computer know, hey, the text that's in between these two sets of brackets, that's the author's surname.


The next one, you see, is given names. Or if you go down a little further, this is the source of that reference. These are the page numbers that are relevant to it. So we create these tags, kind of behind the scenes, so that the computer knows what all these different pieces are.
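
To show what a computer can do once a reference is tagged this way, here is a minimal Python sketch. The element names (`surname`, `given-names`, `source`, `fpage`, `lpage`, and so on) are JATS-style elements, but the reference itself is made-up sample data, not the one on the slide. Also note that in XML the closing tag uses a forward slash, as in `</surname>`.

```python
import xml.etree.ElementTree as ET

# A made-up reference tagged in a JATS-like style.
REFERENCE_XML = """<ref id="r1">
  <element-citation publication-type="journal">
    <person-group person-group-type="author">
      <name><surname>Smith</surname><given-names>Jane</given-names></name>
    </person-group>
    <article-title>An example article title</article-title>
    <source>Example Journal</source>
    <year>2019</year>
    <volume>7</volume>
    <fpage>101</fpage>
    <lpage>110</lpage>
  </element-citation>
</ref>"""

ref = ET.fromstring(REFERENCE_XML)
citation = ref.find("element-citation")

# Because each piece is tagged, the program can ask for it by name.
surname = citation.findtext("person-group/name/surname")
given_names = citation.findtext("person-group/name/given-names")
journal = citation.findtext("source")
pages = (citation.findtext("fpage"), citation.findtext("lpage"))

print(f"Author: {given_names} {surname}")
print(f"Journal: {journal}")
print(f"Pages: {pages[0]} to {pages[1]}")
```

Without the tags, the same program would only see one line of text and would have to guess which words are the author, the journal, and the page range.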


Just to clarify, for the most part, your readers aren't going to see these tags. Typically, your readers are still going to see, you know, the HTML, a PDF, a printout, or something like that.


What we do is, we set up where the computer can see these tags, so that we can enable a lot of functionality. 


Which then leads to the next question? 


How do we know what tags to use? Do we just make up the tags? How do we know what tags to use?


Well, the answer to that comes in the idea of an XML specification, or spec. Different communities have different specs, but the idea is that an XML spec will allow you to create the elements in a standardized way, with elements that are common to a particular publishing community.


There is a great presentation on this by my mentor, Bill, about XML specs. It's a few years old, but it's still a fantastic set of presentation slides.


I think I put a link to them on there. If you have trouble finding them, let me know, and I can always hit up Bill and get those as well.


And he may have more recent ones than this, but this was kind of a transformational one for me a few years back. In any case, let's just use the example of, you know, the journal article.


Certain things are always going to be important when you're tagging a journal article. As I mentioned before, you care who the authors are, right?


You might have a volume number.


You might have certain citations, you might have page numbers. So certain things are always important to a journal article. 


So for this, there's a community out there that developed a suite of tags for journal articles. It's called the Journal Article Tag Suite, or JATS.


We use the JATS specification when we're loading content that has to do with journals and scientific journals like this.


I've got the link here. You can go here to the ... website, and you can look up and see what all these different elements are.


All these are the standard tags that you're supposed to use when you create your content, OK?


Again, don't worry about this. These tags are in the, in the handouts, as well. 


All right. But on the idea behind a spec, I want to show a different example.


I mentioned a particular kind of publishing community. So, if you're trying to create a journal article, JATS is going to make a lot of sense. But let's say you're trying to create a re-usable section of technical documentation. I think we have some non-scholarly publishers that are on here. Maybe you're trying to create educational content.


Maybe you're trying to create clinical trial protocols in the pharma industry, and you want to be able to re-use this kind of content.


Well, you're probably not going to care about the volume number or something like that. That's an element that doesn't really matter.


Honestly, because of the fluid nature of modules and topics, you're probably not going to care very much about page numbers either. So, you know, those elements might not be all that important.


What might be important, though, is being able to distinguish between a topic title and a section title. Do I want to re-use the whole topic?


Or do I want to re-use just part of it?


So there are elements in this community that allow you to have that distinction between the topic title and the section title. Or maybe you want to do conditional text.


One of the things that you see a lot with, you know, technical documentation is the idea of being able to re-use the content for maybe a different audience. So you might create technical documentation that is 80 percent the same,


and 20 percent is different, because some of your audience is novice and some of your audience is expert. In DITA, they created something for that.


They created attributes, you know, for audience. So depending on the particular kind of content, you might want to use a different specification: JATS for one, DITA for another. On this slide, here are some common XML specifications. It's by no means an exhaustive list. Depending on the industry you're in, there might be different communities that use these.
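
The audience idea can be sketched in a few lines of Python. The topic below is a made-up, DITA-style fragment; `audience` is a real DITA attribute, but the values `novice` and `expert` and the filtering function are assumptions for illustration. Real DITA toolchains do this kind of conditional filtering themselves, and the attribute can hold more than one value, which this simple sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA-style topic: one shared paragraph, one paragraph per audience.
TOPIC_XML = """<topic id="install">
  <title>Installing the product</title>
  <body>
    <p>Run the installer and follow the prompts.</p>
    <p audience="novice">If you are unsure, accept the default options.</p>
    <p audience="expert">Use the command-line options to script the process.</p>
  </body>
</topic>"""


def paragraphs_for(audience):
    """Return the paragraphs that apply to one audience.

    A paragraph with no audience attribute is shared by everyone.
    """
    body = ET.fromstring(TOPIC_XML).find("body")
    return [
        p.text
        for p in body.findall("p")
        if p.get("audience") in (None, audience)
    ]


for reader in ("novice", "expert"):
    print(reader.upper())
    for text in paragraphs_for(reader):
        print("  " + text)
```

Both audiences get the shared paragraph, and each gets only its own conditional paragraph, so the common content is written and maintained once.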


... is often used for scholarly journals. It also has a sister spec that's called the Book Interchange Tag Suite, or BITS, that's very similar. And then the next one here, DITA, is kind of general purpose. You see this a lot: pharma uses it, education uses it, and it's used for technical documentation.


That's the one I just talked about. 


If you're in the humanities, if you're transcribing, you know, the Thomas Jefferson papers or something like that, TEI is a really common XML spec for that. 


If you're in health care, there's this HL7 spec that you may have heard about, for billing records, patient tracking, things like that. 


If you publish standards, there's a whole standard that's for standards organizations. There actually was a standard called ISO STS, and that was replaced by a new one called ... a few years ago. 


So the idea here is that in your different communities, there are these different standard suites of tags. You can find websites on these, and you can find all this information about them, and I'll be glad to help. 


All right, so we talked about what XML is. We talked about this idea of a spec. 


How do we create this XML? That's the idea here. Well, you've really got two options. 


First of all, you can create the XML yourself and you can submit to PubMed Central yourself, as long as you have the right tools, which include some sort of XML editor or XML transformation tool. 


You can really do it for free if you use Notepad or Notepad++ and have some technical knowledge to be able to do that. 


If that's not exactly your thing, you can spend a little money. I'm not endorsing any particular editor, but I just happen to know that Oxygen has a setup online where, you know, you can practice. So here's an example of an XML editor, where you can make changes in this content however you like, you know, "ABC 123." You can add comments and attributes and all that kind of thing over here. 


You can see the XML itself. You can see here, I just added "ABC 123" without ever actually going into that code. 
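For anyone curious what an editor is doing behind the scenes in a demo like this, here is a minimal sketch using Python's standard library. The sample document and the `id` value are invented for illustration. It adds the same kind of text, an attribute, and a comment, and then writes the XML back out.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document standing in for the content shown in the editor.
SOURCE = "<article><title>Sample title</title><p>First paragraph.</p></article>"

root = ET.fromstring(SOURCE)

# Add text to the paragraph, the way you would type into the editor view.
paragraph = root.find("p")
paragraph.text = paragraph.text + " ABC 123"

# Add an attribute and a comment.
paragraph.set("id", "p1")
root.insert(0, ET.Comment("edited by script"))

# Serialize the tree back to XML text.
edited = ET.tostring(root, encoding="unicode")
print(edited)
```

A visual editor simply hides this step: you change the content, and the tool keeps the tags consistent underneath.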


Certainly, there are several great authoring tools out there. 


You can have something like that to be able to create XML yourself. 


Another thing that you probably should have for this is some sort of an XML-enabled CMS or an XML-aware CMS. There are a lot of great tools like that out there. 


I've also seen that there are some publishers that are trying the free option, which is using a Git repository. I will say, while you might be able to get started for free, typically there's a lot of setup that's involved. You're going to have to pay somebody to do that, whether that's somebody on your own team or not. 


And then lastly, there are some technologies that are out there that will automate the creation of XML. 


Some will even automate the submission to PubMed Central. 


I would caution you a little bit about those. Our experience is that, yes, there are some great tools that will automate a lot of this, but if you need it to be exact, automation may not be your only choice. You may need to have another step in there. 


Because, remember, three strikes and you're out on this. And we've seen this more than once, where a publisher has come to us and said, we bought this technology, it's supposed to be doing this, but, you know, during our scientific review, we kept getting errors and we can't figure them out. 


So I'm just saying, be careful when you're going into that process. You can do it, and you can do it yourself. 
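One simple "extra step" of the kind mentioned above is to check that every generated file is at least well-formed XML before it goes anywhere. The sketch below uses only Python's standard library, and the sample strings are invented. It is not PubMed Central's own validation and does not check against any tag set. It only catches basic syntax problems and reports where they are.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (ok, message) for a string that should contain XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the line and column where parsing failed.
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"
    return True, "well-formed"


good = "<article><title>Sample</title></article>"
bad = "<article><title>Sample</article>"  # the title element is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

A check like this is only a first gate. Content can be well-formed and still fail a review for other reasons, which is why a human or a second tool in the loop can still matter.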


Obviously, you can also use a data conversion specialist, some sort of a vendor, to do this. You can use that, but again, you need to remember this: you have a lot of options when it comes to vendors, and quality does matter. Quality matters for getting the content into PubMed Central. Quality matters for how it's going to be used afterwards. And to demonstrate this, to talk about how quality matters, I do have a lesson. 


It's from the classic eighties movie Moonstruck. OK, let me know in the chat if you loved this movie, or you hate this movie, or if you've never heard of this movie; that's fine too. And we're ready to turn to this. 


Just a note to everybody: if you're listening to this using your phone for the audio, you may not hear the video. You're going to have to turn up the volume on your computer, or just read the captioning, something like that. With that, Marianne, I want you to kick it off. 


[Video clip plays.] 


There we go. 


OK, back to you, David. Right. 


Alright, so I hope you enjoyed that. You know, obviously, we don't sell any type of pipe, but we do find that quality does matter, right? 


And the idea here, the message you can take away from this, is that I think we should follow Mr. Castorini: quality XML costs money because it saves money, right? And we've got countless examples of this, where publishers have taken a less expensive route to try to get the XML, and they come to us later and say, we need to fix this. And that's a much bigger pain on the back end. 


All right, I'm just going to finish up a few things here. I want to hit on kind of the why of XML, some of the great benefits. 


We did hit on this a little bit during the PubMed Central part of the program, but I thought I would hit on it a little bit more right here. 


Some benefits here. When you've got this kind of XML structure behind the scenes, no matter what industry you're in, you're really going to see a lot of benefits. First of all, the idea of interoperability, simplified data sharing. 


So the XML makes the text readable by a computer, which makes it actually easy to share with other computers. 


And because it's software and hardware independent, you can easily move that between systems. 


So that's one really great reason why XML makes sense, as opposed to having a lot of PDFs. 


Second off, the whole idea of search. XML tagging makes it easier to automatically parse the content of an article, right? It separates the different pieces, which really helps when you try to get into more detailed, more focused, and more filtered types of searching. 
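Here is a small sketch of what that more focused, filtered searching can look like once the pieces of an article are tagged. The two sample articles and their element names are invented for illustration and are far simpler than real journal XML. The point is only that the same word can be searched for in the title alone, the abstract alone, or the body alone.

```python
import xml.etree.ElementTree as ET

# Two made-up articles; real article XML is much richer than this.
ARTICLES = [
    "<article><title>Copper pipe corrosion</title>"
    "<abstract>We study corrosion rates.</abstract>"
    "<body><p>Bronze fittings were used as controls.</p></body></article>",
    "<article><title>Bronze alloys</title>"
    "<abstract>A survey of bronze alloys.</abstract>"
    "<body><p>Copper content varies.</p></body></article>",
]


def search(articles, term, field):
    """Return the titles of articles whose given field mentions the term."""
    term = term.lower()
    hits = []
    for xml_text in articles:
        root = ET.fromstring(xml_text)
        node = root.find(field)
        # Collect all text inside the field, including nested elements.
        text = "".join(node.itertext()) if node is not None else ""
        if term in text.lower():
            hits.append(root.findtext("title"))
    return hits


title_hits = search(ARTICLES, "copper", "title")
body_hits = search(ARTICLES, "copper", "body")
```

Searching for "copper" in titles finds the first article, while searching in the body finds the second, a distinction that is hard to make when the content is an untagged file.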


If that content is in, you know, Word or PDF, you just don't have that same kind of ability to find things. Next benefit: the idea of multichannel publishing from a single source. 


So with XML, instead of creating your content in a lot of different ways, you can create it in XML, and then you can automatically get your outputs through various transforms as needed: if you need a PDF, if you need a print-on-demand file, if you need HTML for the website, if you need an EPUB, if you need whatever. 


You've got it, because it's in that XML, and you don't have to go and recreate it each time. You simply transform it. 
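As a rough sketch of the single-source idea, the example below takes one small XML document and produces two different outputs from it. Transforms like this are often written in XSLT; plain Python functions are swapped in here only to keep the example self-contained, and the sample document and function names are invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

# One made-up source document.
SOURCE = """<article>
  <title>Quality &amp; structure</title>
  <p>Structured content is created once.</p>
  <p>Outputs are generated as needed.</p>
</article>"""


def to_html(root):
    """Produce a simple HTML fragment for a website."""
    parts = [f"<h1>{html.escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{html.escape(p.text)}</p>" for p in root.findall("p")]
    return "\n".join(parts)


def to_plain_text(root):
    """Produce a plain-text version, for example for an email or a feed."""
    lines = [root.findtext("title").upper(), ""]
    lines += [p.text for p in root.findall("p")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
web_page = to_html(root)
text_file = to_plain_text(root)
```

If the source changes, both outputs are regenerated from it; nothing is retyped or recreated by hand.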


And that means you've got them on it. 


Accessibility, I talked about accessibility a little bit. 


The structure behind XML, its ability to adapt to various environments, really makes it a key component when it comes to accessibility. 


It also lets the computer know what's important to read. If you've ever used, you know, a screen reader on content that's not optimized for it, it just reads everything, and it doesn't really know where to stop and start, et cetera. But when you put in this structure and tagging, you can give a computer instructions about what's important to read, how it should be read, where it should be read, and where it should go next. 


But it also does a lot in terms of things like voice assist applications and let you know it can put those things together that can be used in bullets underneath. 


We hit on this a little earlier as well this idea of future proofing. 


It's not tied to a particular software or hardware. This XML is still going to be able to be deciphered, you know, and used several generations from now, regardless of changes in technology. 


And after future proofing, we've got enforcing consistency. XML really does help to enforce consistency. After all, you don't want to have a journal article that doesn't have an abstract, for example. 




Or if you're a pharmaceutical company, you don't want the objectives and endpoints created for this study to be wildly different from your objectives and endpoints created for other studies. 
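The consistency point can be illustrated with a very small check for required pieces. In practice this kind of rule usually lives in a schema or DTD that a validating tool enforces; the Python sketch below is a simplified stand-in, and the list of required elements is a hypothetical rule set, not a rule from any real specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: every article needs a title and an abstract.
REQUIRED = ["title", "abstract"]


def missing_elements(xml_text, required=REQUIRED):
    """Return the names of required elements that do not appear anywhere."""
    root = ET.fromstring(xml_text)
    return [name for name in required if root.find(f".//{name}") is None]


complete = "<article><front><title>T</title><abstract>A</abstract></front></article>"
incomplete = "<article><front><title>T</title></front></article>"

print(missing_elements(complete))
print(missing_elements(incomplete))
```

The same idea extends to other structured content: a rule can state that every study document carries its objectives and endpoints, and every document can be checked against it the same way.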


If you're trying to comply with the different regulatory requirements that are out there, you need to have that kind of consistency, and XML is perfect for creating and enforcing that kind of consistency. Last item: I'll mention one that's not really so much for the PubMed Central part, but when you create content in XML, you do have the ability to re-use that content. 


All right, so typically in scholarly publishing, scholarly journals, we don't do a lot of re-use; that's called plagiarism. But, you know, in some of the other industries that are out there, XML lets you do this kind of content re-use. 
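To make content re-use a little more concrete, here is a minimal sketch in which one shared warning is written once and pulled into two different manuals by reference. The `reuse` element, its `ref` attribute, and the library key are all hypothetical names chosen for this example; real specifications have their own mechanisms for this.

```python
import copy
import xml.etree.ElementTree as ET

# A shared library of re-usable pieces, written once (hypothetical key name).
LIBRARY = {
    "warning-power": ET.fromstring("<note>Disconnect power before servicing.</note>"),
}


def resolve_reuse(root, library):
    """Replace each <reuse ref="..."/> placeholder with a copy of the shared piece."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "reuse":
                replacement = copy.deepcopy(library[child.get("ref")])
                replacement.tail = child.tail  # keep surrounding whitespace
                parent[index] = replacement
    return root


user_manual = ET.fromstring(
    '<manual><title>User guide</title><reuse ref="warning-power"/></manual>'
)
service_manual = ET.fromstring(
    '<manual><title>Service guide</title><reuse ref="warning-power"/></manual>'
)

resolve_reuse(user_manual, LIBRARY)
resolve_reuse(service_manual, LIBRARY)
```

If the warning ever changes, it is updated once in the library and both manuals pick up the new wording the next time they are built.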


Alright, so those are the great benefits of XML, and that really concludes part two of this presentation. So, I'm going to give a quick summary here in the next couple of minutes, and then we'll open it up to any questions or chat. 


Sounds good? 


All right, so, in summary, structured content, XML, is really useful in a lot of different industries. There's STM publishing, like we've talked about for PubMed Central. 


Educational publishing really makes a lot of sense, especially if you've got a lot of content to re-use, that can be really useful. 


Documentation is the same way. 


If you're in pharma, the move towards structured documentation has been really, really picking up speed over the last couple of years. 


And that's something that we're helping pharma companies with. Healthcare: you know, XML makes a lot of sense for healthcare and healthcare records. Financial services: we're seeing more and more, you know, standards in and around how financial information is reported, how it's shared, et cetera. 


So one thing I want you to take away from this is that XML is used in all these different industries. 


Second takeaway: there are a lot of benefits. We just covered all of these, so I'm not going to go through them individually, but, you know, there are a lot of positives. 


A key point I want you to get is that, you know, using XML, creating structured content, doesn't have to be difficult. It's not something you should think is impossible. 


It's something you could do. 


But it really helps to have a partner for this kind of thing. Again, it doesn't have to be DCL, but DCL will be glad to be that partner. 


And, lastly, the PubMed Central submission process doesn't have to be difficult either. 


And again, it can help to have a partner, and we'd be glad to talk to anybody about that. 


I'm going to just show you here that I do have a slide of resources. 


All of these links are in the handout that we've provided. If you want to try to write down a couple of things that are here, you certainly can. And Marianne, we can move over and answer whatever questions we might have received. 


Well, thank you, David. 


And just to clarify, everyone, there is a section on your GoToWebinar control panel that is called Handouts, and if you click on that arrow, you should be able to access a PDF that we put together with all of David's great consolidation of the PMC links and other content that might be useful. So we do have a couple of questions. And, David, the first question, I think it's interesting, because, you know, the terms PubMed Central and PubMed are confusing. I'm going to read the question, but, if I understand it right, I think the two terms should be swapped. 


So it illustrates why this landscape is a little bit, you know, just kind of confusing with the terminology. So the question is: is there a way to have an article in PubMed before it is indexed in PMC? 


So I think that should be flipped: is it possible to have an article in PMC, PubMed Central, before it is indexed in PubMed? 


Typically, it is possible to have an article cited in PubMed and not have it exist in PubMed Central yet. 


That is entirely possible. 




You typically won't find a case where an article has been loaded and submitted, and the journal has been approved for PubMed Central, and it's not also in PubMed, because, typically, when they do the PubMed Central process, they're also indexing with the other major databases: MEDLINE, PubMed, et cetera. 


OK, and so related to that: if a journal is already indexed in PubMed Central and interested in applying for MEDLINE, will there be any additional technical requirements for data submission once accepted, or do the XML files sent to PubMed Central handle both indices? 


Yeah, that's a good question. And, you know, in general, when you submit the content to PubMed Central, they're also going to take the step of making sure that it's indexed in the appropriate place in PubMed, and in turn MEDLINE. You should not have to do a separate application. 


Typically, when we work on this, we get it the other way around, where something's been in MEDLINE, or PubMed Central says, hey, get indexed in MEDLINE first, and then we'll do PubMed Central after that. Typically, you don't have to do a separate application. 


And if you do, the applications for MEDLINE and for PubMed are really, really simple. 


And if you already have the information from PubMed Central, literally, you click a link, and you upload content. 


So, But in general, you shouldn't have to do that. 


Thank you. 


So, another question: if a publisher is submitting Word and image files for all submissions to PubMed Central, could they just send an XML package to an FTP site instead? 


OK, so then they are loading content to PubMed Central themselves. I'm not sure I understand it. "We are a publisher; we submit Word and image files for all submissions. Could we just send XML packages to an FTP site for submissions instead?" 


Well, you've got to look at the PubMed Central platform, and typically, as part of your package, you have the XML and the images. I'm not sure I understand the question exactly. Ordinarily, if you're loading to PubMed Central, you wouldn't just load Word documents and images. You have to load a package that includes the XML, the PDF of the article, and all the image files, and we typically load those to their platform. 


It's not a simple FTP process. Now, if they're wanting to work with us, you know, they can load the content to our FTP, and we can help to convert that and get it submitted. 


I saw that question. I hope. I hope that makes sense. I didn't, I didn't fully understand it, but if you want to contact me afterwards, and we can certainly walk through it in more detail. 


There's another question. How quickly can DCL generate XML for a journal submission to PubMed Central? 


That's it, that's a good question. 


I mean, normally, you know, we don't put a specific time frame on it, but it's something that can be turned around, I think, in just a few days. 


It's something that, if the need is there, we can turn around really fast as well. I don't want to put our production team on the spot, so I'm not going to give an exact number. 


But, yeah, it's usually a period of days, not a period of several weeks or months. 


Yeah, and I do remember when the pandemic just started, there was a period where we had some really rushed COVID-related articles, and I think our team was working over the weekend, continuously, to get them submitted to PMC, to do whatever it takes to get it done in time. And usually that's one of the first questions we'll ask a publisher: what timeframe are you looking at here? Where are you in this PubMed Central process? You know, how can we help you the best way? And a lot of times, there's no real time constraint. Other times, it's been, well, hey, we've entered our application, and we've had two strikes. 


We need something. We've gotta give something to them by Friday. 


Because, I'm going to say, there is a bit of a timeframe from when you get your scientific application done to when you get your technical requirements in. And there is a period of time after which it can expire, and your application can be rejected because you didn't get things in within that time. But we'll make sure that you do. 


Thank you. 


So does PubMed Central require a specific flavor of JATS XML? 


No, it's really just NISO JATS. 


So that's the XML, and I actually provide a link to that same page that I showed on the slide. 


It's just the vocabulary that's involved with that. 


Shown here. 




I've added some reflector. 


That's where that might be. 


Yeah. So, it's the NISO JATS general-purpose tag set, and you use that to conform to the PMC style. 


They do have, as it says here on the screen, they do have a style checker that you can run to make sure that it does. 


And I think this is our last question. Does PubMed Central only accept journal articles? Or does it accept books, monographs, and chapters as well? 


My understanding is that it only accepts journal articles at this time. 


Now, the National Library of Medicine accepts book chapters and other pieces as well. 


But PubMed Central, my understanding, as I said, is that it is, in fact, journal articles, unless that has changed recently. 


I don't think so. 


I seem to be having trouble with my audio. Forgive me for a moment. 


And now I think there's a strange echo. I don't know if anyone else hears it. 


I'm not hearing it. 


OK, well, thank you so much for all this information. I want to thank everyone who's joined us today. If there was a comment submitted and we didn't get to address it here live, we will be in touch after today's webinar. So please hang tight. The DCL Learning Series comprises webinars, a monthly newsletter, and blogs. And I invite any of you to visit dataconversionlaboratory.com to sign up for those things and keep abreast of what we put out to try to help our community and our industry grow together. 


Thank you very much for your time. And this concludes today's webinar.