DCL Learning Series
PubMed, PubMed Central – Ask the Experts
Hello, and welcome to the DCL Learning Series. My name is Marianne Calilhanna, and I'm the VP of Marketing here at Data Conversion Laboratory. Our presentation today is titled “PubMed, PubMed Central – Ask the Experts,” in which we invited you to submit any question related to these resources provided by the US National Library of Medicine. We collected questions during the registration process. And we also invite you to use the question box here in GoToWebinar, if any additional questions come to mind. Today's agenda comprises a short bit of housekeeping for me, and then I'm going to introduce today's speakers. We'll get into your questions, pausing to make sure we answer any new items that come to mind during the conversation. And then, we'll close with just a little bit more about DCL and how we support structured markup, XML conversion, and more.
So, today's event is being recorded. It will be available in the On-Demand section of our website. Simply go to www.dataconversionlaboratory.com. You can select "Resources" and "On-Demand Webinars" from the navigation. There, you'll find a number of webinars related to this topic and many others that you might find interesting.
I am delighted to introduce today's speakers, my colleagues Devorah Ashlem and David Turner. Devorah has been with DCL for more than 21 years. She is fluent in many XML standards and presently supports so many of our scholarly publishing clients, not only in terms of structuring their content, but also really helping them understand the nuances behind the structure and how that tracks back to discoverability and accessibility. David is an industry veteran in the areas around content management and content structure. He's particularly adept at demonstrating the business benefits of digital transformation and helping organizations identify ROI to gauge investments in systems, structure, and semantics. Welcome, Devorah and David and everyone joining us today. So, David, I'd like to turn it over to you. And could you please start us off with the first question?
Yeah. Well, first of all, let me just say thank you. Thank you, Marianne, for including us, and giving us the chance to present and talk. I also just want to say thank you to everybody who is participating today. Thank you for participating by being here. Thank you to those who asked questions in advance. That was fantastic. And I'll go ahead and thank the rest of you who are going to ask questions that will arise on this webinar. So please, as you have questions you're gonna want to put those into the questions box, or into the chat, you can do either one, and they'll get to us some way, somehow.
I will say this. We have a really diverse audience today. Some of you are our clients, so hi, good to see you again. Some of you are people we know in the industry. Some of you are new names, and I think looking at this list that, some of you out there are really people who have a pretty solid understanding of some of the concepts we'll talk about today. But I'm sure for some of you, this is brand new. And one of the things that I always discover in talking about this topic is that some people feel like they're supposed to know something, but maybe they don't.
And so we try to create this as one of those safe spaces, right? So you can ask whatever question you want, so that you can get a little bit better understanding.
So, anyway, I will say this. We know not everything that we share today is going to be useful for every person on the call, but we hope that at least something, and maybe many things, will be useful. If you want to make sure that it's useful, though, it's going to be partly up to you. So ask those questions. Don't be shy, et cetera. Oh, before I get in, one other disclaimer: while Devorah and I do have a lot of experience with this, as does our colleague Gitty, there actually is a lot of information available at the National Library of Medicine, NLM, website. And they are great people there. We've worked with some fantastic people through the years there who can help. They are the ultimate authority on all the answers to these questions. So we're glad to help in whatever way we can, and just know that they're there as well. So, with that, Devorah, are you ready?
All right, well, let's start with this first question, which is really one that I'm going to take. That’s just a really, really common question. And that's What is the difference between PubMed and PubMed Central? And then, this person actually also threw in MedLine. It can be really confusing, mostly because PubMed and PubMed Central do sound a lot alike. But there are some distinct differences here. So, I'm going to start by defining each one individually, and then I'll try to put them all kind of into the context of their relationships to each other. So with that, Marianne, go ahead and click us over and let's take a look at the PubMed slide.
So when somebody talks about PubMed, and sometimes when people talk about PubMed Central, they use the term PubMed. But really, PubMed is a bibliographic database that is run by the National Library of Medicine. It has 35, well, 34 million plus citations from all sorts of places, and what I tell people, when they think of PubMed, is to think of it as a card catalog, if you will, of information. Instead of giving you the actual article, it's going to give you a pointer to that article, will give you a link to that article, whether it's on the NLM site, whether it's on the publisher site, it might give you links to books, things like that. It doesn't include the full text of the article. XML requirements are generally header only. I think we've got an example here, when Marianne clicks the next thing there.
Here's an article from a University of Toronto Press journal. And you'll see here, you don't actually see the full article text, right? You see the citation information, all right? And then if you want to actually read the article you would click, there's a link there for the DOI. That would take you to the website of the journal, The Canadian Liver Journal. And depending on your access to the journal, you may or may not have access to that article. So there’s a big, massive amount of information that says Hey, there is research about this particular topic. And if you have access to it, if it's open access, or if you have a subscription to it, you can get to it. So you might go to another website. Or you might go to the publisher's website, or you might go to an aggregator's website.
If you're a researcher at a university or a hospital, you might have a subscription to ProQuest, or to EBSCO, or one of the big, big providers, and you might be able to have access to that article. But if you don't have access, then you just know that it's there. You can at least read the abstract, and you can decide if you want to purchase the article or purchase access.
So, PubMed points you to the articles, gives you a link to where you can find the article. It doesn't give you the article itself. And it's big.
All right. Let's talk about MedLine now. So, MedLine is really a part of PubMed. It's a subset, if you will. It's also a National Library of Medicine database. It also is a card catalog, and it's really focused on particular areas. It is the largest subset of PubMed. It's searchable via PubMed, but it also can be searched on its own, has a very, very similar interface. And again, it'll point you to the articles; it’ll point you to links to those where you can find them. But it doesn't necessarily give you the article itself.
All right, let's talk about PubMed Central next. When we talk about PubMed Central, I try to make it a habit, and Devorah probably does, too, to use the term “PMC” as much as possible, because that tends to be a little more clear. Whenever you say “PubMed,” it does increase that confusion. But whereas PubMed is the card catalog, PubMed Central is the actual library, the actual online library, right. It's actually got the access to the articles themselves. It's full text access. It's everything that you need based on journals that have been accepted there. There's something like 8.5 million articles on it, from a huge number of journals. There are very, very strict standards for getting that content in. I do have an article screenshot here. So we looked at this same article’s citation on PubMed a minute ago. If you take a look at the actual article here, you can see that there's actual article text. You can read the entire article. It's not sending you to the publisher website, although it does have a link to that. It's not pulling up a PDF, although you do have a link to be able to get to a PDF. It just brings you the full text of the article. So it's really great if you're an independent researcher and you don't otherwise have access to library subscriptions. You know, you can read the entire article.
Another thing about PubMed Central that's different from PubMed: with PubMed, you don't have to have your entire journal approved. It can just be a certain article or collection of articles. It's not a requirement. But with PubMed Central, they index entire journals, and so you don't just submit an article, you submit your entire journal, and they decide Will we index this journal and will we store all the articles from this journal on PubMed Central? And so that's kind of a key thing. All right, before we move on to the XML requirements, which is the next question, Devorah, anything you want to add here, any two cents?
No, you’re doing great.
All right. Well, you'll get plenty of chances to talk in a minute, so just in summary here, PubMed, think of it as the card catalog. It sends you to all the links to articles and books, wherever they may be. Could be on the NLM site. Could be others. MedLine is the largest subset of that card catalog. Then PMC is the actual online library. So whereas PubMed has citations for 35 million articles, PubMed Central has roughly eight and a half million articles, but it's the full text of those articles, and it's the complete journal issue for those particular articles.
All right, let's move on to the next question. And this one came from one of our customers, and it says Can you explain the differences in XML requirements? Discuss any changes in linking done by PMC from the journal version. We had to get a little clarity on this one. I wasn't exactly sure what it was asking, but the second half of the question I'm going to address first. I think the real question here is How is the published article different on PMC than, say, from the publisher's website? Well, I'll say this. First of all, PubMed Central has a very standard layout. It has a global text, it has a couple of different article views, it has the ability to get the PDF, it has a little button for citing the info. It's a nice mobile experience. All those different things. On a publisher's website, it's really up to the publisher. The publisher's website may have some of those features. But it might not have all of those features, or it might have some additional features. You know, in PubMed Central, you can do a couple of things like extracting figures, or texts and things. But some of the journal hosting platforms have some special features around, you know, ways that figures or tables are rendered. So it really just depends on the actual individual publisher.
All right, but from there, let's answer the first half of the question now, which is What are the differences in the XML requirements? Really this comes down to what you get. Remember with PubMed, you get primarily a citation. And so everything there is going to be all about what do you need to make a citation. PubMed Central is all about full text articles, so it's what do you need to be able to load full text articles. I'll give you an example. Go to the next slide, Marianne, if you would. On PubMed, since we're only storing the citation information, the information that they need to be able to create that is the elements around things like What's the first page of the article? What's the last page of the article? Who are the authors? What's the date? Is there an abstract? Typical XML header and footer type information.
I went online. I took a screenshot of what they had on PubMed's site. I counted them up; it's, I don't know, 45, 50 elements, something like that, that are required. And again, those can be done for a particular article. You don't have to have a whole journal approved, if you will. Now, PMC, on the other hand, if you're getting an article there, go to the next one there. They have a very specific way that this content is going to roll out. There’s a very specific user experience they want to give. And so, here, because you're storing the full text of the article, they need the elements that relate to full text.
So, that's a lot more, right. Whereas PubMed was roughly 50 elements, PMC, we've got something like 90 different elements of information that we can use. You may not use all 90 of those in any one particular article. But that certainly is what the schema’s like there. Devorah, any thoughts on this? Any best practices or experiences around dealing with PubMed XML or PMC?
I just want to jump in there and say that with PubMed, although there are about 50 elements that are allowed in their DTD, there's not that many that are required. If you go back to the slide, you'll see some of them are listed with an R and some of them with an O. Marianne, sorry. And the ones that are an R are actually required; the ones that are an O are optional. So even though some of them are allowed, it's not actually required. I just want to clarify that.
Oh, that's perfect, thank you. All right. Well then let's move on, then, to our next question. What are some important things to consider when getting a new journal or article to be submitted here? Devorah, I think you have prepared some information on this. So I'm going to let you take it away from here.
OK, so there are a couple of key considerations that you need to take into account when you're getting ready to submit, or applying to submit, to PMC. Number one is the publication model: are these published as full issues, or are they published as articles being released slowly and then at the end of the year it's an issue? You know, the packaging and the format that PMC will expect them in is going to be different if it's a full journal, a full issue, at a time, or if it's article by article. Do the articles use standard page numbering, or is e-page numbering used? Are these published both electronically and in print? When you have a print publication that's being posted online, the page numbers reflect that: you have a sequence across the whole issue. It starts from one, and then, at the end of the issue, it ends up 345 or whatever it is. So each article has a first page and a last page, but when you're publishing online, they may not have that page numbering system. So, it needs something, an e-location ID. Basically, PubMed Central needs to know, what kinds of page numbering are we going to be using here? If it's published electronically, print, both, am I going to be using that publication date for one of them, both of them? You know, these are all kinds of things that PMC needs to know when you're starting to submit a journal to them.
The last one here is, does the journal release ahead-of-print articles? Now, the short answer is that PMC does not really accept ahead-of-print articles. They have a couple of exceptions where they have a certain, a couple of outstanding requirements that, if you meet those, then they will accept it. But, in general, PMC does not take these ahead-of-print articles. PubMed, on the other hand, they do accept ahead-of-print articles. So if you have, if you're a publisher and you have a journal that has these ahead-of-print articles and that's part of your publishing cycle, then that's something that needs to be addressed and discussed with PMC prior to becoming an online journal there.
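To make the page-numbering consideration concrete, here is a minimal sketch, assuming JATS-style metadata in which print-style pagination is carried by the fpage and lpage elements and online-only articles carry an elocation-id instead. The sample values and the function name pagination_model are made up for illustration; they are not from the webinar or from PMC.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS-style metadata, trimmed to the pagination fields.
PRINT_STYLE = """<article-meta>
  <volume>12</volume><issue>3</issue>
  <fpage>301</fpage><lpage>345</lpage>
</article-meta>"""

ONLINE_ONLY = """<article-meta>
  <volume>12</volume>
  <elocation-id>e0451</elocation-id>
</article-meta>"""


def pagination_model(article_meta_xml):
    """Classify how an article records its location in the journal."""
    meta = ET.fromstring(article_meta_xml)
    has_pages = meta.find("fpage") is not None and meta.find("lpage") is not None
    has_eloc = meta.find("elocation-id") is not None
    if has_pages and has_eloc:
        return "both"
    if has_pages:
        return "page-range"
    if has_eloc:
        return "elocation-id"
    return "missing"


print(pagination_model(PRINT_STYLE))
print(pagination_model(ONLINE_ONLY))
```

A check like this could be run over a batch of articles to confirm that every file follows the model the journal told PMC it would use.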
Outstanding. Outstanding. Well, and I know some of you that are in the audience today, you probably have some other things that you might recommend from your own experience. If you've got anything like that, if you’ve got any helpful hints, any things that you think might be useful, feel free to throw those in the chat and we'll be glad to share those with the rest of the group as well.
All right. Let's go on to the next question, which is What tools are used and QC checks are performed before submission to PMC? I'm gonna let you talk a lot about the quality and the quality control piece. But as I looked at this question I thought before we talk about that, it's probably helpful just to describe the process of getting approved for PMC. Because that's something that I think, a publisher is busy, and it gets really, really difficult to kind of understand everything that goes into this. And so I'm gonna just walk really quickly through this process.
Now some of this process has to do with a vendor like DCL. But some of the process really doesn't have to do much with us at all. But, in general, it starts with you deciding, hey, I want to submit content to PubMed Central. So, they have an application. And the application itself, remember, is not for a particular article, it's not really for your organization as a whole, it's for a particular journal. And so, you're applying for your journal. And so there's an application, and they want to know who the publisher is, they want to know when the first publication was. And the reason they want to know that is because they're gonna make a decision, a determination: how much of the old content do we want? If this publisher started publishing in 1946, do we want them to submit all the content from 1946 on? Do we want to start in 1999, or do we want to start with the current content? So they want to know about that. They want to know: where is this journal hosted? You know, any links? What’s your peer review process? They'll want to make sure you’re a reputable journal, they want to make sure that you're a journal that has all the standard things in the scholarly community. So, that's all kind of part of that application on the front end. Once you've put that in, it goes through an application screening process.
And so PubMed Central will read your application, they’ll review your content on your website, they are gonna review some of your content that's out there. They're gonna look, and they're gonna see whether – they're gonna come, they're gonna ask you questions. All sorts of things. And at this point, you want to pass that hurdle. They then take you into what's called the scientific evaluation. And the scientific evaluation, or scientific quality review, this is where they really try to make sure that your content is what they're looking for.
All right. PubMed Central is all about medical content, right? And so, if you have a humanities journal and they start reading that, they're gonna look at that and think, hmm, this may not fit, unless, of course, it's one where humanities relates to medicine and science some way. So they're gonna make sure that your content as a whole looks like it's the right kind of content. A lot of times, if you're already indexed on MedLine, or already indexed on PubMed, that's kinda good enough, but they're going to, they're gonna assess what do you do, what, what kind of scientific career do you have, all of that kind of thing. If you don't pass, you don't get to apply again, I think the slide says it here, yeah: 24 months. If you don't, you can show again, in 24 months, that you've met the standard, but assuming that you've met that standard, at that point, they move into another evaluation, which is called the technical evaluation.
Now, this is the part of the process where a company like DCL typically gets involved, because they've already made sure that as a whole your journal looks like it fits the process. Now, they want to make sure that you can get your content to them in the right format. You gotta think about this: with all the articles that they've got and all the articles that are coming in, and a limited staff, anytime they get content that is not in the right format, that's a headache for them to try to get it into the right format. So they want to enforce this on the front end, so they ask you to show that you can submit content in the format that's required, that you can use the right XML done in the correct way. So on and so forth.
Now, you can do this yourself, or you can hire a vendor. But they're going to ask you, specifically, for a package that includes 25 articles, and they want that package in a very particular way: certain XML, PDF, images, supplemental data, et cetera. And they want to then review that for errors. If you make an error, not a huge deal, they're gonna give you some feedback that says, Hey, we found this error, or these errors. You make a second error, it starts to be a bigger deal. You make a third error, they basically cancel your application, and you have to wait three years to apply again. So, you want to make sure you get the technical evaluation right. It's kind of a “three strikes, you're out” thing, and that's why the quality steps that Devorah is going to talk about here in just a second are so important. And so, this is, frankly, one of the reasons we get involved. We have publishers that call us all the time that say, I have a reputation in the marketplace. I want to make sure there's no issues whatsoever. So please take this on for me.
We have others that come and say, Hey, I bought a Word plugin, or, I bought a technology that promised me I can automatically get this done. But now I've got two strikes, so I have to make sure it's done correctly this time. And we can help with that. Anyway, after they review your content, it takes several weeks. Once you get approved for that, you move into what's called the pre-production stage. And at this point, PMC's going to ask for a couple of things: they're going to ask for you to sign an agreement with them. There’s some paperwork involved. They're going to request a couple of things for the website, like a website banner and other such things.
Then they're going to talk to you about how much of your existing content they want you to submit. Sometimes, like I said, it could be that they say, Yeah, we'll just do it from today forward. Sometimes they might say, Let's get the last three years. Sometimes they say, Let's go ahead, let's do 40 years. And you can work that out with them and you work that out with a vendor like us. I will say this. We've had some people that have finished their technical evaluation, and were excited, and then a few months later they said, how come my content's not on PMC? And it's because they've never completed these steps of this pre-production. So, after you complete the technical evaluation, keep an eye on your e-mail. Watch for the PubMed Central staff to send you content so that you can get the agreement done, you can get the banner loaded, you can get your backfiles squared away. And as soon as that's done, things start to really pick up, and you move into what's, I guess we call that the production phase or the, whatever the last bullet is that we put on this slide. Marianne? Live release, that's what we called it. And in that case you're just doing these things in a very standard way. If you work with a company like us, either we'll do the typesetting for you and then we can convert to XML and we can load to PubMed Central on your behalf, or some will use a typesetter of their own, and then send it to us to do the conversion and the loading. It really just depends on what you want to do.
So that's the overall PubMed Central process. Again, there's a “three strikes and you're out” process there. Another thing I again just want to stress is that you're applying for a journal. Every time, it's not just an article, and it's not all your journals. You're doing a journal at a time. Anyway, so the quality piece is important, and Devorah, now I'll shut up and let you talk a little bit about the quality pieces.
OK, so the first thing that you want to pay attention to is that the XML itself has to be valid. So, what does that mean? Every open tag has to have a close tag. It has to be valid against the DTD that you're tagging against. And we like to use JATS. So, for every figure, if there needs to be a caption in there, the caption and the title and the actual figure need to be there. Basically, you need to parse it against the DTD that it's tagged to. But, as many of us know, or don't know, a file could be completely valid, but it still is not correct. So, that's when we have the PMC style checker that comes into play. PMC has a whole bunch of rules that they check for within that XML file. The article type has to be valid. Or the DOI has to be a certain format. Because the DTD, the XML DTD will say, OK, a DOI tag has to have text in there, but the style checker will say, Oh, well, it has to have a ten-dot-five digit, that's the publisher ID.
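The gap between "parses" and "correct" can be illustrated with a small sketch. This is not the PMC style checker. It assumes a JATS-style article-id element with pub-id-type="doi", and the DOI pattern below (a "10." prefix, a run of digits, a slash, then a suffix) is an assumed approximation, not PMC's actual rule. Note also that Python's standard library checks only well-formedness here; validating against the DTD itself needs a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumed DOI shape: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

GOOD = ('<article><front><article-meta>'
        '<article-id pub-id-type="doi">10.1234/example.2021.001</article-id>'
        '</article-meta></front></article>')
BAD_DOI = ('<article><front><article-meta>'
           '<article-id pub-id-type="doi">doi:example-001</article-id>'
           '</article-meta></front></article>')
BROKEN = '<article><front></article>'  # open tag without its close tag


def check_article(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    try:
        root = ET.fromstring(xml_text)  # well-formedness only, no DTD validation
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    dois = [el for el in root.iter("article-id") if el.get("pub-id-type") == "doi"]
    if not dois:
        problems.append("no DOI found")
    for el in dois:
        value = (el.text or "").strip()
        if not DOI_PATTERN.match(value):
            problems.append(f"DOI has unexpected format: {value!r}")
    return problems


for sample in (GOOD, BAD_DOI, BROKEN):
    print(check_article(sample))
```

The second sample is the interesting one: it parses cleanly, so a structural check alone would accept it, and only the content rule flags it.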
So it kind of checks the content within the XML that is beyond what the DTD or what the tagging structure allows for. We also check, and PMC in general, when you're sending them files, will right away bounce back files that don't parse, files that don't pass the style checker. And then, external files. If you have an XML file that calls out a figure, or calls out a supplementary file, all of those external files need to be included in your package. And you can't send them multiple packages or multiple files. It needs to be one full package. And depending on if this is article-based or issue-based, the whole package needs to be named accordingly and packaged up in one zip file. In addition to those three things that PMC will check for, where if someone tries to upload or submit a file that fails, it just won't go, there's also some quality control that we have put into place, and we obviously check for everything that PMC’s style checker is checking for.
But in addition to that, we'll also check for things that the PMC style checker may not take into account. So, for example, the style checker really checks across all XML: all XML that comes to them needs to pass the style checker, but it doesn't check for anything that's journal specific. So, if, in your initial process you said that you're going to use, like we talked about, the first page/last page, based on the print, or if you said, I'm going to use this kind of license information, we have quality checks in place that check, specifically for that journal, does that match?
So, even if it passes the parser and it passes the style checker, it may not actually pass our quality control, and then we won't send it to PMC, obviously. Now, if someone sends a file to PMC that doesn't pass, then they would just send it back to you. And that would be one of those feedback items and things that would have to be addressed. And so we at DCL, as you know, we try to catch everything beforehand so that when it gets to PMC, and as David mentioned, with their limited staff and lots of data, they want the files to come to them as perfect as possible. So we basically put in some extra checks, specifically for that specific journal. And yeah, that's what we do.
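The external-files requirement mentioned above lends itself to a quick pre-submission check. The following is a rough sketch, not DCL's or PMC's tooling. It assumes a JATS-style article in which graphic, media, and supplementary-material elements point at files through an xlink:href attribute, and that the href matches the file name in the zip exactly; real packages may follow different naming rules. The file names and the function missing_external_files are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# Elements assumed to call out external files in a JATS-style article.
FILE_ELEMENTS = {"graphic", "inline-graphic", "media", "supplementary-material"}


def missing_external_files(package, xml_name):
    """List files the article XML calls out that are absent from the zip."""
    root = ET.fromstring(package.read(xml_name))
    referenced = {
        el.get(XLINK_HREF)
        for el in root.iter()
        if el.tag in FILE_ELEMENTS and el.get(XLINK_HREF)
    }
    return sorted(referenced - set(package.namelist()))


def build_demo_package():
    """Build a small in-memory zip that deliberately omits supp1.pdf."""
    article = (
        '<article xmlns:xlink="http://www.w3.org/1999/xlink"><body>'
        '<fig><graphic xlink:href="fig1.tif"/></fig>'
        '<supplementary-material xlink:href="supp1.pdf"/>'
        '</body></article>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("article.xml", article)
        zf.writestr("fig1.tif", b"placeholder image bytes")
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_demo_package())) as package:
    MISSING = missing_external_files(package, "article.xml")
print(MISSING)
```

Journal-specific checks of the kind described here, such as confirming the agreed license statement or pagination style, would sit alongside this as further functions over the same parsed XML.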
All right, so, and one of the things I guess you're saying here is that it could be that your typesetter has XML that works perfectly on your journal hosting platform, but it may or may not conform exactly to what you need for PMC.
Yeah, I mean, we've definitely had experiences with some publishers that would send us their data and it works great on their platform. But then as we look into the source, we see, oh, wait, some of this information is not accurate, and the reason for that is maybe on the platform, there's some other metadata elements or another metadata file that they are pulling in that information from. The XML standalone is not what's actually driving the client's publisher platform. So, PMC takes the XML as, that's their only intake really for that article. So you need to make sure that everything in there is accurate.
All right, well, very good. Well, I think, based on our timescale, we got another question we need to get to next. This has to do with funding. Regarding potentially discrepant funding information from publishers in PubMed versus from authors in PMC, are there checks in place? Any thoughts on this?
So, currently, PubMed and PMC do share data. For example, when we send information to PubMed, we send a whole bunch of metadata. But if that article exists on PMC and there are figures there, for example, PubMed will automatically pull those figures in and display them on PubMed, even if they were not in the files that we've submitted. So, there's definitely some relationship between the two. But, in terms of reconciliation between this kind of data, there are not currently any checks in place between PubMed and PMC as far as I know. If someone, an author or publisher, is looking at one of their articles, and they see some discrepant information, that's definitely something that they should reach out to PubMed or PMC about and have them update that manually. But there is not currently any reconciliation between the two.
All right, Marianne, let’s hit the next one. Oh, I like this one. Why does PubMed Central require full text and XML? Why not just a PDF or HTML file? It's actually, I love the “Why XML?” question in general. It makes me not always the most popular person at parties. This weekend, somebody was asking about what I did, and I started talking about XML, and he quickly started looking for somebody else to talk to. But hopefully, here among friends, you'll get what I'm talking about. Because XML really does kinda make the world go round, at least information on the Internet, right? And I think the reason for XML here is that the library staff was, it was killing them to do this themselves when they didn't have requirements around XML.
So, let me start by just giving a couple of practical reasons. Marianne, if you go to that next slide, that’s the first kind of practical reason for “Why not PDF?” is really the user experience, right? Because of the XML, we can then produce HTML and we can have this reflowable text. So, when you're on a laptop, you have a nice experience where you can read the content. If you're on an e-reader, you can reflow the text. If you're using mobile, like in the picture here, the text just reflows and it works nicely. Anybody who's worked with PDF on their mobile device and has had to pinch and zoom knows that that will make you lose your mind. So it's one of those things that really is nice here.
I'll say also, about user experience, I mentioned this before, but on their website, they actually have a pub reader view that you can get onto, and it's a really nice experience as well. And that's really driven, again, by the fact that this content is in XML and can be easily transformed, then, into other experiences. Also, user experience-wise, there's things like, if you click on a table, on this next slide, we'll show, there's like these table pop-ups. Things like that that you can use. And it really makes things work well. And why not just HTML? Well, you probably do a lot of these things in HTML, but really, HTML is a very, very limited tag set. So like we were talking before about, we had like those 90 different elements, I think when you look at HTML it's really header and paragraph; it's not all those really specific things that you need for a journal article. So XML really kinda takes you to the next level.
So, go to the next screen. I've got just some reasons here why I think XML is such a benefit. Right. I mean, it's one of those things that just overall makes it easier to parse the content in an article. You know, it's an international data standard. So, one of the great things is that when you get your content right for PMC, it's really pretty easy to make it work right on your journal platform, or if you have a new journal platform, or if you're going to be hosted by some other aggregator, you've already got your content in the standard.
The XML helps ensure consistency. If we didn't have these XML guidelines, the PubMed Central people would be pulling their hair out, because they'd get somebody who sends them an article and tags a reference as, you know, reference tag at the beginning, reference tag at the end, and nothing in the middle. Or somebody might tag author name as the whole author name, while somebody else might tag the author's first name and last name separately. You know, this makes everybody use the same terminology, same tags. Everything is consistent, and ultimately that leads to the big one here, which is less searching, more finding, right? We get researchers that are trying to get to this information. Putting it in this XML really helps you find these things. No matter what platform you're on, you can get to them and be able to use them.
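To make the point about consistent tagging concrete, here is a minimal sketch in Python. It assumes a simplified, JATS-style fragment in which each author's surname and given names are tagged separately; the sample authors and the helper names author_names and sorted_by_surname are made up for illustration and are not part of any PMC tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a simplified JATS-style contributor group.
SAMPLE = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Rivera</surname><given-names>Ana</given-names></name>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Chen</surname><given-names>Li</given-names></name>
  </contrib>
</contrib-group>
"""


def author_names(xml_text):
    """Return (surname, given names) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (
            name.findtext("surname", default=""),
            name.findtext("given-names", default=""),
        )
        for name in root.iter("name")
    ]


def sorted_by_surname(xml_text):
    """Sort authors by surname, which only works because surname is its own tag."""
    return sorted(author_names(xml_text))


if __name__ == "__main__":
    for surname, given in sorted_by_surname(SAMPLE):
        print(f"{surname}, {given}")
```

If every provider tags names the same way, the same few lines work on every article. If one provider put the whole name in a single tag, the code would have to guess where the surname starts.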
Another key benefit, I think, is that it's platform agnostic. You know, XML is one of those things that you don't necessarily have to worry about having a particular reader for. Devorah, you're far too young to remember this, but when I was in college, I saved my resume, not in Word, but in WordPerfect. And a few years later, I went to go get that resume out, and I realized, oh, I don't own WordPerfect anymore. And I had no way to get to that information. So I've got all this stuff on these old floppy disks that I can't use. Well, XML is not like that. Twenty years from now, machines are going to be able to read XML. It doesn't require a particular program, and it's very, very interoperable. It works between systems.
Anyway, let's go now to the next one, and I think the next question has to do with procedure. Oh, yeah, it has to do with the procedures for Online First. You know, as publishers continue to get content to the market faster and faster, they want their research out there. We've had this whole concept of Online First. How does that work with PubMed and with PMC?
So, I think I mentioned this earlier, PubMed does accept ahead-of-print articles, and those can be submitted. PubMed Central, on the other hand, does not. They do have a couple of exceptions: if the first version is the version of record, and if it needs to get out there because of funding. You know, there are a couple of requirements under which they will make an exception to accept it. But in general, they don't, and the one thing to keep in mind, because it has a bit of an impact, is that PubMed requires that there be only one data provider for a journal. So, say, for example, you have a journal that has Online First articles and you want those to be on PubMed. PubMed Central, PMC, does not include those ahead-of-print articles on their site, so PMC cannot be the data provider to PubMed for those citations. They need to come from either the publisher or a vendor like us. But basically, PubMed will not accept content for a journal from multiple places or multiple providers. So that's just one thing to keep in mind: if you do have ahead-of-print articles that are coming from a different place, all your content will have to be submitted to PubMed in a different way and not from PMC. Yeah, I think that's it.
OK. All right, I think we have one more question that we've gotten in advance. So, I'll just stop at this point also and talk to you. Those of you out there, you know who you are. You've got that question and you're thinking, should I bring this up? Listen, nobody's going to chastise you. Nobody's even going to know, except for me and Devorah, who asked the question. So, if you think the question is stupid, doesn't matter. Submit it. We're gonna get to those in just a second. All right. Let us know what it is. But, question number eight. We had somebody ask, “Why bother? Why bother submitting to PubMed Central?” I've come up with four reasons myself here. I think the first is, there's an increased ability to find your content, right? And discoverability drives everything for publishers, right? It's one of those things.
If you want the aggregators to include your journal in their subscription, if you want the libraries to buy your journal, they're gonna look at usage statistics, right? They want to know that you've been discovered, you've been cited, all those different kinds of things. Being on PubMed Central gives you that kind of discoverability and access. It certainly is, I mean, it's a US-based resource, but it is an internationally used resource. It really helps to drive that for publishers, so that's the first one.
The second one is, it gives you kind of this long-lasting place to keep your articles. Publishers come and publishers go and publishers get acquired and merged, and your content that's on the server that you've set up may or may not be here for all time. But there's a pretty good chance that the National Library of Medicine is going to be able to keep these articles for a really long time and have them available to the scholarly community. There's a sense of permanence, I think, that you get when you go to PubMed Central. And I think it's also good for your journal's credibility. It's certainly something you can use for marketing, et cetera.
The third thing I was thinking about kinda goes along with the first one, but it's really kind of an increased exposure, if you will. Because PubMed and MEDLINE and these other databases are so frequently used, you have a really, really good chance of your content being found, being cited, being utilized, all of those things.
The fourth thing I put down was that sometimes it's a requirement. And you publishers know what I'm talking about. Sometimes you have a funder who says, “Hey, we have this expectation that it's going to be this certain way,” and so, yeah, it gives you that compliance. I also just put another thing here. I want to put this out here to the audience. We've got 15 minutes left here. Those of you who have submitted to PubMed Central, why? Why did you submit? What were your key reasons? Throw something in the chat to us. Let us know what your thinking was, why you bothered submitting to PMC. We'd be interested to get your thoughts on that.
All right, that brings us to the place where we have audience questions. And, Marianne, I can't see the questions here, so is that something that you can feed me?
Yeah, and let me just jump back to our slides.
You had a really lovely desktop.
It was meant to be a little bit cleaner before today, but, you know, work trumps all. So, we have a couple questions around ahead-of-print, so I'm gonna kind of read both of these, because they're related, and let you two kind of discuss it. So one person would like to know a little bit more around sending ahead-of-print records to PubMed and then updating those records once the articles are set in issues. So, if you could speak to that, and then, the other one was related to any costs that might be related to PubMed in publishing ahead of print.
So, I'll talk about the different versions of the file. So, if you have an ahead-of-print article and it's online, it's submitted to PubMed, and it lives there, it has a DOI. Now, once it becomes part of an issue, that DOI will not change, but you might have some other updated information that you need to include there, you know, whether it's part of a certain issue, or something else changed, the page numbering or whatever. So, you can resubmit it to PubMed, re-feed it to them, and they will replace what was there, because there will always only be one DOI per article. So you won't have two versions on PubMed, you'll have just the updated one. Does that answer the question?
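The replace-on-resubmit behavior described above can be pictured with a small Python sketch. This is only a toy model, not PubMed's actual system or submission format: the submit function, the field names, and the DOI are all hypothetical, and the only idea it shows is that records keyed by DOI get overwritten instead of duplicated.

```python
def submit(records, citation):
    """Store a citation keyed by DOI; a resubmission replaces the earlier one."""
    records[citation["doi"]] = citation
    return records


# Hypothetical DOI and field names, for illustration only.
DOI = "10.1234/example.2023.001"

records = {}

# First feed: the ahead-of-print citation, with no issue details yet.
submit(records, {"doi": DOI, "status": "ahead-of-print",
                 "volume": None, "issue": None, "pages": None})

# Later feed: same DOI, now with volume, issue, and page numbers.
submit(records, {"doi": DOI, "status": "in-issue",
                 "volume": "12", "issue": "3", "pages": "101-110"})

if __name__ == "__main__":
    print(len(records))            # one record for the DOI
    print(records[DOI]["status"])  # the updated status
```

Because the DOI stays the same between the two feeds, the second submission lands on the same key and the first one is gone.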
And what about costs related to ahead-of-print?
I don't know about that. David?
I mean, as far as cost to PubMed, I don't think there is one, but if you deal with a service provider, there may be a charge for making an edit, handling submissions, things like that. I think it depends on how much work is involved, et cetera.
OK, we have another question and it's related to funding and I'm just going to take a moment to give a little bit of context. So the question is: How should the XML requirements be considered for federal government-funded articles, per the OSTP memo? And I just want to spell that out. That would be Office of Science and Technology Policy. That comes from the US government. They put out this memo in late August. It's also called the Nelson Memo, and it's a requirement that by December 31st, 2025, all federally funded research needs to be made freely available without delay. So, where do the XML requirements fit into getting this immediately available? Because some publishers might consider providing minimal standard XML data and retain more robust article features on their platform. So, that's sort of a two-parter. And I think we need to kinda separate feature functionality and XML. Over to you guys, if that makes sense. Let me know if you need more clarification.
Can you ask the question one more time? After I get all the explanation, I'm gonna go back to the question.
Yeah. How should the XML requirements be considered for federal government-funded articles per the OSTP memo? Some publishers might consider providing minimal standard XML data and retain more robust article features on their platform.
I think it's really a publisher decision more than anything. I don't think it affects the XML requirements so much per se. I think it's more of a business decision. Now, that said, it's a big topic, and there are certainly people that have a lot more expertise on the Nelson Memo than I do. There was a really good article about it, Marianne, that you had passed to us from Delta Think’s newsletter, I think.
Yeah, they wrapped up the conversation that took place at the Charleston Conference around the Nelson Memo. And I know Scholarly Kitchen has a lot of really good resources. If I'm interpreting the question the right way, I'm thinking back to the conversation that the three of us had while preparing for this webinar around, let's take the example around table pop-ups. So on some platforms, you click on a figure or a table, and that's going to pop up, or a new window might pop up, or it might pop up in the same window. So that's a functionality. That's a feature of your platform. The XML is the XML. How that XML is presented and how a user interacts with that XML, those are sort of two different things, right?
Gotcha. Well, do remember that for PubMed, the XML we're talking about is relatively limited, right? So, you're not going to be worried about all the features and things that are showing on the journal website. It is going to be a minimal amount of header-type XML that can be delivered almost immediately. It's things that are going to be standard to every journal. What is the title? What is the volume? What is the abstract, et cetera.
So, I don't think that's going to change. And as far as what happens on the full text, again, PubMed Central already has very specific requirements about what is required there. So, I don't know that the XML requirements are going to change a lot from a technical perspective. Devorah, any thoughts that you have?
Yeah, just to repeat, really, the XML requirements will not change. I mean, as far as we know. It's really, how does the platform display that content? So, take funding, specifically. Let's say we provide funding tagging in the XML, and we give that to PubMed Central when they ingest the files. What they do with that, how they display it, or where they display it, that has to do with the platform, with the build, and it's the same issue for really any tag.
XML is a semantic set of tagging. It doesn't say, “Make this big and bold and beautiful.” It just says, “This is a title.” And then it's up to the platform to display it how they want to display it. So, by saying, “Oh, we're going to reserve some XML for the publisher site and not give it to PMC,” I mean, the XML itself doesn't really have anything to do with how it's going to be displayed. The builder of the platform, or the way that the XML is translated into HTML and into a display, is what takes that into consideration.
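As a rough illustration of that separation between tagging and display, here is a minimal Python sketch that renders one XML fragment two different ways. The fragment borrows the JATS element names article-title and funding-source but is heavily simplified and not valid JATS; the two render functions, the sample funder, and the HTML they emit are invented for the example and do not describe how PMC or any publisher platform actually works.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified fragment: the tags say what things are, not how they look.
ARTICLE = (
    "<article>"
    "<article-title>Reflowable text on mobile</article-title>"
    "<funding-source>Example Research Fund</funding-source>"
    "</article>"
)


def _fields(xml_text):
    """Pull the title and funder out of the semantic tags."""
    root = ET.fromstring(xml_text)
    return root.findtext("article-title"), root.findtext("funding-source")


def render_journal_site(xml_text):
    """One platform's choice: big title, funding in a sidebar."""
    title, funder = _fields(xml_text)
    return (f'<h1 class="big-bold">{escape(title)}</h1>'
            f"<aside>Funded by {escape(funder)}</aside>")


def render_archive(xml_text):
    """Another platform's choice: smaller heading, funding as a footnote line."""
    title, funder = _fields(xml_text)
    return (f"<h2>{escape(title)}</h2>"
            f"<p><small>Funding: {escape(funder)}</small></p>")


if __name__ == "__main__":
    print(render_journal_site(ARTICLE))
    print(render_archive(ARTICLE))
```

Both outputs come from the same XML. The difference is entirely in the rendering code, which is the part each platform owns.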
Any other questions that we have, or any comments in the chat as well?
We do have another question. I think we have time for one more; it's something we always see: How much does it cost? How much does it cost to work with DCL on this? How much does it cost? What are some of the costs related to getting this information up on PMC?
Good question. Good question. Really, there's a couple different ways to look at it. Right, there are costs that are gonna be involved in: how much time do you have to spend doing an application? How much time do you have to spend making sure that your banner meets this, that, and the other? As far as getting the content into the XML format and getting it loaded, it really depends on, do you want to try to do that yourself? Or do you want to have a vendor that does that? Do you trust your current vendor to do that? You know, we see that a lot. We have a lot of publishers who might trust their vendor for typesetting, but they don't necessarily trust that vendor for getting their content onto PubMed Central. So, really, you just have to kind of look at which way you want to go. If you want to do it yourself, you're gonna have to invest in some technology, and some training.
Probably more importantly, you're gonna have to invest in some quality checks, some quality tools, because I think that's a big piece that's not always noticed: you want to make sure things are right. It's really easy to fix things when somebody points out something's wrong. It's harder to find the things that are wrong on the front end. You gotta have good technology to help you identify those things so that you can fix them. What we typically do when somebody works with us is, we usually have some sort of a small setup charge, a few hundred dollars, something like that. Then we charge a per-page rate based on how many PDF pages, because usually it's PDF or Word that is being converted. How many pages do they have? If they have 25 articles and their articles are two pages each, that's a little different than if somebody has articles that are 20 pages each.
Then the next question is, how much does PubMed Central require you to do? Are they asking for a one-year backfile, or do you need to do 20 years’ backfile? We generally give a better price per page if you're doing a large amount as opposed to a small amount. It's kind of an economies of scale. Anyway, I can give you more specific information. If you ever have content, we'd be happy to do that. I see you've actually moved on to the next slide here, which is my cue to start kinda closing things up here. I will say this: we do have a number of ways we can help you with your PubMed needs, or PubMed Central. We're happy to answer questions. We're happy to get on a call and walk through stuff with you. We're happy to have you outsource some of your needs for converting and loading to PubMed Central. We've got quite a machine with a lot of publishers that we work with. And we're glad to make that available.
If you've got other content challenges, maybe we can help with that, too. We have a lot of publishers lately who are moving from one platform to another. And they need some data conversion that's involved, or maybe they're trying to tackle a backfile project. We have a lot that are really interested in trying to identify where there are problems in their content, especially on a platform migration. We have a tool called Content Clarity that's kind of a QA validation tool that helps to identify all these problems. And it's really a relatively inexpensive tool to identify them. Maybe you're looking to extract some things from existing content. We were talking about funding information. Sometimes publishers come to us and say, “Oh, my gosh, funding information has become so critical for us. But we weren't tagging funding information until last year. So we've got all these old PDFs, can you help us to get that content extracted and get it back?” That's the kind of stuff we can pull up.
Or maybe it's harvesting data because you're creating a new publication. You want to aggregate information from somewhere. Or maybe, like you said, maybe you're getting content from your typesetters and you need to have it verified and validated. Actually we do typesetting as part of the process. So let us know if we can help in any way. And that's the last slide. The last slide says if you have any other questions, you can just scan my information here. You can hit us up at info@dclab, so on and so forth. So with that, Marianne, let me hand it back to you and let you close this up.
Thank you, David and Devorah. Thank you, everyone, for attending today's webinar. I just want to close by saying The DCL Learning Series comprises webinars, a monthly newsletter, and our blog. You can access many of our webinars that are related to content structure, XML standards, and more from the On-Demand section of our website. We do hope to see you at future webinars. Have a great day, everyone. This concludes today's broadcast.