
DCL Learning Series
Research Integrity by Design – Leveraging Structure and Technology for Publishing Workflows
Marianne Calilhanna
Hello, and welcome to today's webinar. Today's webinar is titled "Research Integrity by Design: Leveraging Structure and Technology for Publishing Workflows." My name is Marianne Calilhanna and I'm the VP of marketing here at Data Conversion Laboratory. Before we begin, I do want to let everyone know that this webinar is being recorded and it will be available in the on-demand section of our website at dataconversionlaboratory.com. We will absolutely save time at the end of our conversation today to answer any questions you have. So please feel free to submit questions via the GoToWebinar control panel and you can do that as they come to mind.
I'm really excited to introduce today's panelists to everyone. We have Mark Gross, who is President and Founder of DCL. Mark is an industry leader in structuring content and digitization best practices, and directs the development of DCL's extensive list of services and solutions. We also have Madalina Pop. Madalina joins us today from Morressier, where she serves as Research Integrity Team Lead. She leads Morressier's product team toward building a tech stack that integrates research integrity checks that support authors, reviewers, and publishers at every step of the publication workflow. And finally, we have Adrian Stanley. Adrian has been deeply involved in many tech initiatives across the ecosystem and today wears his hat as a Partner with Clear Skies. You probably remember Adrian as a past President of the Society for Scholarly Publishing. And welcome, everyone.
So before we jump into the conversation, I'd like to very quickly share a little bit of background on DCL. We've been in business for more than forty years and are US-based. Our focus is on resolving problems related to content formats and metadata. DCL's core mission is transforming content and data into structured formats. We believe that well-structured content is fundamental to fostering innovation. Today, people know DCL as the leading provider of XML conversion services, but Mark Gross started the business in 1981, long before markup languages were a thing. Today, DCL boasts more than ten thousand projects in fifty countries. But what really sets us apart is quality. We work on complex content and data challenges when it has to be right. And listed here are just a few clients representing a broad range of industries in which we work. Now I'm going to turn it over to Madalina and have her share a little bit about Morressier.
Madalina Pop
Thank you, Marianne. It's a pleasure to be here. I'm Madalina Pop. I am part of the research integrity team here at Morressier. Our team manages Morressier's Integrity Manager service, supporting publishers in assessing the quality and integrity of research that is about to be published or that has already been published. A few words about Morressier. Morressier provides societies and publishers with integrity-protected workflows. This is in line with our vision of a world in which all scientific outputs are traceable and trustworthy.
4:00
Morressier's initial focus was digitizing conferences in order to share early-stage research. And this nicely evolved into a workflow platform. And now we are committed to defending publishing programs against fraud and research misconduct with our integrity-protected workflows.
Marianne Calilhanna
Adrian, can you tell us a little bit about the work you're currently doing with Papermill Alarm?
Adrian Stanley
Yes, yes. So the Papermill Alarm, founded by Adam Day, was actually the first sort of commercial research integrity and fraud detection tool on the market, and it integrated very early with places like the STM Integrity Hub. The Papermill Alarm takes a number of signals and indicators from author history, citations, articles, and templates, and has built an AI-based machine learning capacity that has analyzed basically all of the research literature. We've done some studies and tests on retracted papers and saw that it delivers a simple red, green, or orange alert. For example, on the known Hindawi retractions of over twelve thousand articles, it detected signals in over 98.8% of that work. So we work together.
We do integrate with the good people at Morressier, too, along with a number of other partners, Aries, Kriyadocs, so it's an easy way to fit these alerts into someone's systems. I would say there are two use cases and angles. One is working with editorial teams, signaling new manuscripts when there are alerts on them and sharing the details of why, which I find publishers feel is very important. But we also have a historical view across the whole published content, and a number of publishers either want to know if there's a level of exposure or want to go back and do investigations into research misconduct. So that's that. But for this discussion with Mark, I will be wearing multiple hats. I was a publisher, I was a technology person, I support other companies. But I think this is a great discussion. So thanks for inviting me along.
Marianne Calilhanna
Thanks for joining us. All right, I'm going to turn the conversation over to you, Mark. Take it away.
Mark Gross
Okay, thank you, Marianne, and thank you also, Madalina and Adrian, for joining us on this session. Before we start, this concept of integrity issues is not new. Just to level-set, how do you define a paper mill? What is a paper mill? Because people throw the term around pretty loosely, I think. Adrian?
Adrian Stanley
Yeah, to me the sort of broad discussion around that is research misconduct, and there are groups that work at scale to help authors publish articles. And there are people that sell authorship for a fee. It's a broad-ranging topic, but there are a number of signals, a number of activities, a number of sleuths and people out there working to identify either fake science or made-up science. So it's a good question. One of the things we distinguish: the Papermill Alarm isn't checking for things like AI-generated text as such.
8:00
It's really looking at signals about the researchers and the research that's published. The paper mills have a business of selling authorship and selling research work, so citing their own work, boosting their citations. So I hope that helps define it.
Mark Gross
Okay, well, so a paper mill by that definition is always fraudulent. There probably are businesses that do something like that that are not fraudulent, but we don't call them paper mills; we call them publishing support services or something. Right? So I just wanted to get that clear. Anyway, this really came onto my radar only a few years ago, this concept that research integrity has become an alarming issue. Suddenly, around three years ago, in the middle of COVID or something like that, it became a topic at every conference I was at. What is it? But it's not new. I mean, Marianne dug up this article from 1964 in The New York Times talking about the integrity of science, but those pieces were few and far between. It's only in the last few years that it's hit the mainstream. What's the cause of that? What has changed in the last few years that suddenly it's an issue? I don't know, Madalina, you want to take a first shot at that?
Madalina Pop
Sure, Mark, this is actually one of my favorite topics, and that's exactly because of the increased attention it has been getting in the past few years, especially outside of the research communities, because the research integrity problem has become a public concern. And indeed we've heard a lot about research integrity in the past two or three years. But we know that issues like plagiarism, for example, are definitely not new. We hear about plagiarism as far back as the year 80. We hear about a legal framework around plagiarism being set up around the 1600s or the 1700s. So this is not a new problem. It's the same with paper mills. I bumped into an article a few weeks back that was talking about paper mills and was dated 2013, so that's eleven years ago. Even though we've heard about this a lot in the past few years, it is not new.
I see two reasons why this is now more popular, a hotter topic. First of all, the increased attention kind of feels natural considering the rapid development of technology. On one hand, this makes it easier to commit fraud. We have access to research, we can just copy/paste text, we can run that text through tools that can rephrase it, that can replace words with synonyms and create those so-called tortured phrases, and it's easy to do that. It's also easy to get in touch with paper mills, with these organizations that sell authorship slots to authors. They promote themselves on social networks, but you can also just run a Google search and identify these services. So it's easy to commit fraud. On the other hand, the rapid development of technology also allows us to identify these issues more easily.
12:05
So if, years ago, it was really difficult to identify that some text was plagiarized, now you have technology that can process information at scale, and we can actually identify these papers and this kind of research more easily.
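To make the "tortured phrases" point concrete: one way such rephrasings are screened for at scale is a simple lookup against a curated list of known substitutions. The sketch below is purely illustrative and not any particular vendor's method; the two sample phrases are documented examples from the research-integrity literature, and a real screening list would contain thousands of entries.

```python
import re

# Illustrative only: a tiny sample of known "tortured phrase" substitutions.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, likely original term) pairs found in the text."""
    hits = []
    lowered = text.lower()
    for phrase, original in TORTURED_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((phrase, original))
    return hits

# Example usage
sample = "We apply counterfeit consciousness to the early detection of bosom peril."
for phrase, original in flag_tortured_phrases(sample):
    print(f"Flag: '{phrase}' may be a rephrasing of '{original}'")
```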
Mark Gross
Okay. And Adrian, do you want to add anything to that?
Adrian Stanley
Well, yeah, yeah. Madalina's right about the advances in technology. I would just say technology can give a lot of signals and indicators and highlight things; if you've got a hundred thousand papers, you may only need to look at a hundred of those carefully, but you do need a human also to look into the details. And I see a kind of growth of research integrity departments, skilled people who have been able to analyze these signals and really identify: is this a paper we should reject? Is it something that just needs a very good peer review? I think there are a number of gray lines. If an article cites one retracted reference versus if it cites ten, that's a different level of signal. But you need humans to be able to understand that sort of variance. And I know Morressier's been doing a great job to really develop all these checks and put research integrity at the forefront, but those are tools and systems to help people do their job in a better way, if that makes sense.
Mark Gross
Yeah, that makes a lot of sense. We'll get back to that, I think, in a little while, but let's talk about what kind of misconduct you've been seeing. I know Retraction Watch is out there, and a retraction is a big black mark for any journal and every author. And that's also become one of those areas that gets disputed: should it have been retracted, all those kinds of things. But let's see, what are the pieces? I see we're on a slide with a whole bunch of them. Can we talk about which of these things are more important, less important, which things are happening all the time? You already mentioned plagiarism has always been around, going back centuries and probably millennia. We didn't call it plagiarism then, I guess; we called it something else. But do you want to talk about how they manifest themselves? And Adrian, you talk about signals. What kind of signals do you get from these forms of misconduct?
Adrian Stanley
Yeah, that's a good point. And I would just say, while retractions have a negative connotation, I do think they can also be highlighted to show that publishers have good practices in place. Not every retraction is for a fraudulent reason. There could have been a genuine mistake that affected the science. So I think we have to be careful in how we view retractions. In some cases, it's better to ensure that that version of record is either updated or checked. And maybe going back to some of the reasons we see, or understand, why paper mills have really grown and been highlighted –
16:00
Just take, for example, a doctor in China who is having to work fourteen, fifteen, sixteen hours a day looking after patients, but still has a need within the system, or certainly used to, to publish a certain number of articles. So when a service comes along and says, "I can help you tick a box that you published this," the focus is not so much on doing the work as on keeping their academic credits.
The other areas we've seen are things like someone needing a paper in a certain career area, whether it be machine learning or sustainability, or an area that can help them get a promotion; they may not have done as much of that research, but they want a shortcut to it. The crazy thing is that these articles are there as a version of record and can be analyzed and found later. So I think you have to be careful. And then one last point on research misconduct outside of scholarly publishing: there was a presenter at the STM meeting in Washington, DC in April who basically said that's a twenty-one-million-dollar business when you put together all the academic credits and career progression. So I think even that's a bigger issue, of shortcutting a way to get to where maybe you want to be in a career. But these researchers are making a decision that ultimately is not the right one, and with the ways this can be detected, there are consequences. But Madalina, maybe you want to add something more at this point?
Mark Gross
Do you have some thoughts on that? I'm sure you do.
Madalina Pop
I do. Mark, to your question earlier, which of these signals are more important than others: I think it's very hard to say that plagiarism is worse than peer review manipulation or than data manipulation. What is important is to be able to detect all of these signals and to analyze them in the context of the submission, or across submissions. It's not one single signal or one single check that qualifies the quality of the submitted research; it's looking at all these aspects together in order to make the decision. Is this good quality? Do I have enough proof, enough evidence, to reject something or to retract something? And in terms of the ways of misconduct, I think there are so many creative ways of doing this.
So we see data manipulation and image manipulation, which are things that have been talked about for a long time, and plagiarism as well, of text and images. And this varies from simply copy/pasting someone else's writing, to processing it through additional tools that rephrase it, to even copying someone's text in a different language and then translating it to English, which makes it even more difficult to detect as plagiarism. There's also misconduct related to authorship. We talked about paper mills selling authorship slots to authors. But there are also things like adding someone as an author to a paper they've never contributed to because they are a well-established researcher in the field, and you think that increases your chances of getting published.
20:04
Retraction Watch actually has some really nice stories; well, not nice, but some stories around researchers who found themselves authoring papers on research they never had anything to do with. So imagine finding that out in any context, but imagine how problematic it is when you find out in the context of that paper being flagged for misconduct, and you are one of the authors and you had nothing to do with it. Another form of authorship misconduct is authors changing contributions to bypass, for example, open access fees: because you are from a country that benefits from a discount, I'm adding you as the main contributor to this article so I can publish for free. This is also misconduct, right? There's citation manipulation, there's peer –
Mark Gross
Getting a discount you're not entitled to. Right?
Madalina Pop
Exactly. So there are all these creative ways of misconduct, and the question for us then becomes, how do we combat this? What can we do to make this better? I think there are a few things we can look at. First of all, we should look at how we prevent misconduct in the first place and how we make it hard for these bad actors to get to us at all. And I think this is where we start talking about research integrity by design: how do we prevent these bad actors from reaching us in the first place? We're not going to stop them completely, that's for sure. But if they bypass our vetting processes, how do we run as many checks as possible and flag as many issues as possible? And this is not only at the level of a single manuscript, but across publications, because there's misconduct that is not necessarily obvious in the context of a single manuscript. You have to look across publications in order to identify those types of misconduct.
Mark Gross
Across multiple publications? How do you do that? I mean everything the person has authored, you mean? So it's really a collection of authorship kinds of things. Let me just go back to some of the things we talked about. Some of them are, as you said, mechanically detectable, right? Plagiarism you can figure out, even if, as you said, languages add complications. But the worst of these, in my mind, are fabrication and falsification. How do you find those?
Madalina Pop
Well, it depends on what is being falsified. For image manipulation, for example, we have scientific sleuths who actually provide training on how you can identify that an image has been manipulated, and there are tools that identify that an image has been spliced, or that something in the image was erased, or that the image was tampered with.
24:00
With data I think it's more difficult and Adrian can talk to that.
Mark Gross
Looks like he's waiting to say something.
Adrian Stanley
Well, I was going to say, one of the use cases we see when there isn't a flag and alert is really just the publisher's editorial office ensuring they send it to good, trusted peer reviewers they know. Peer review still works. It's still a great process. So use these signals to perhaps triage the manuscripts. But I would add one point, Mark, to your earlier comment: there's been a lot of great work done around detecting and highlighting research fraud. One example is duplicate submissions, where a paper mill will submit to multiple journals through different systems; certainly with STM and Clear Skies you can see that, and it's a really good indicator. There is an argument that when you notice it and you think it is a paper mill submission, and you have a pretty strong case, maybe you don't just reject it automatically. Maybe it goes into a holding area somewhere, because the paper mills will just resubmit until they find the right journal. So that's that.
But just one additional point that I think is really interesting, that I'm seeing come forward now: there are also signals where people have done good research, where they've been able to share the underlying data, the code, a protocol, maybe a preprint, maybe a presentation, a sort of honest signaling. So you can sometimes counterbalance or compare signals: when you get a paper mill alert, did they actually follow the good scientific practices that make that research reproducible and transparent? Because paper mills don't really want to go to that level of effort. So I think there's an interesting dichotomy where you can also begin to focus on behavior that's positive and could be supported, as well as the negative things.
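Picking up Adrian's earlier point about duplicate submissions across journals: the underlying idea can be illustrated with a rough title-similarity check. This is a minimal sketch only, assuming nothing about how Clear Skies or the STM Integrity Hub actually implement their matching; production systems compare far richer signals than titles.

```python
import re
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", title.lower())).strip()

def likely_duplicate(title_a: str, title_b: str, threshold: float = 0.9) -> bool:
    """Flag two submission titles as potential duplicates of one another."""
    ratio = SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()
    return ratio >= threshold

# Example: the same manuscript submitted to two journals with a lightly tweaked title
print(likely_duplicate(
    "Deep learning for early detection of lung nodules in CT images",
    "Deep Learning for the Early Detection of Lung Nodules in CT Images",
))  # True
```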
Mark Gross
Right. So I think that's a very interesting point. It used to be that who the author was was a very important piece of it, and the peer reviewers knew people. And I think that takes us to that slide from before, which we also skipped over because we're running out of time on some of this stuff; it's such an interesting conversation. But part of it is just the sheer volume of material that's being published today. I mean, MEDLINE added one and a quarter million citations in 2023. That is so much. I love that every twenty-four and a half seconds a new paper is coming out, which is incredible. But there are also so many authors now that there's no way there's a personal touch anymore. And I think that's part of it; it probably seems very anonymous.
I mean, fifty years ago, and I don't even know what the numbers are, it's probably a chart that goes up like that, but fifty years ago the number of papers coming out in any discipline was small enough that people knew each other. You weren't going to cheat, because it was going to come back to you. People knew you, and the peer reviewers knew everybody. Today, it seems very anonymous. And I think what you said about positive signaling is also very important; that comes down to people. But I just wondered, if you got a paper retracted – I think Adrian made a very good point that retraction isn't always a bad thing.
28:01
But if you get a paper retracted for a bad reason, like when you were twenty-three, and now, ten or fifteen years later, you've turned over a new leaf and you've been doing good science for fifteen years, is there a process to improve your author integrity rating?
Adrian Stanley
Well, I think those are the really interesting, key editorial decisions that the publishers are having to make. If there is a signal, but it was from a while ago, and now they're doing good work, or maybe there are paper mill papers mixed in with the kind of research they're doing. I think there's got to be a sort of normalizing, or a discussion, around best practices for what you would do with that case: if someone has published ten great articles but one bad one, is it fair to say, "Maybe I shouldn't publish the next one"? Those are, I think, challenging questions, and I suspect each publishing house may look at them differently.
Mark Gross
And to the extent that you're putting metrics on all this stuff and it's an automated process, people might get cut out without further review. There almost needs to be a way to ask for a second chance, or to ask for a review of a decision. But yeah, one of the things we talked about before is standards and metadata and structure and all those things. The rule of thumb in security is don't let the bad actor in in the first place. So to the extent that you can find the repeat bad actors, that's the best way to keep bad science or bad articles out. And I guess putting the standards and metadata earlier in the process probably helps a lot. I mean, I guess you're getting a lot of information right at the beginning of the process. Madalina and Adrian, you're getting information on authorship, I guess. What other kinds of things are you really doing today?
Madalina Pop
It's not only authorship. I mean, authorship is definitely part of it. But it's also the institutions those authors work for, right? It's also their previous publications and their networks, their authorship networks, which are important, and their history of publications. And I think this is where metadata and standards and structure help, because in the context of the increased volumes you were talking about earlier, Mark, assessing a submission is not an easy task for the editors, it's not an easy task for reviewers, and it requires effort. With these standards, you can automate part of the time-consuming tasks that an editor or a reviewer would have to take on, because you are assessing the quality of the research. But as you said, you're also interested in who the author is and what institution they work for, especially in this context in which the reviewers don't know the authors anymore, the publishers don't know the authors anymore.
31:58
And having this metadata available, and these standards and these persistent identifiers, helps us automate some of these tasks. Take Ringgold, for example, the Ringgold Identify Database. You see an institution you've never seen before. Do you have to manually check whether that institution is a genuine institution? Do you have to search different sources to get this information? Maybe, if it's not already vetted by Ringgold. But if it is already vetted by Ringgold, you have some level of confidence that, yes, this is a genuine institution, and this is all the information we can provide you about it. Same for ORCID. ORCID has this amazing initiative where they partner with institutions to verify the fact that someone is affiliated with that institution. So this saves you the time of, I don't know, going to that institution's staff webpage to see whether this person actually works there, or whether these people actually published this information. This is information that is already vetted, and providing it to the investigator in one place is essential for optimizing the assessment process.
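On the verified-affiliation point: ORCID's public API exposes a researcher's employment records, including which party asserted them, so an institution-added record is effectively a verified affiliation. Below is a minimal sketch, assuming the Python `requests` library; the field names follow the v3.0 public API as I understand it and should be checked against current ORCID documentation.

```python
import requests

def get_employments(orcid_id: str) -> list[dict]:
    """Fetch public employment records for an ORCID iD via the v3.0 public API."""
    url = f"https://pub.orcid.org/v3.0/{orcid_id}/employments"
    resp = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    resp.raise_for_status()
    records = []
    # Field names are per the v3.0 schema; verify against the ORCID docs.
    for group in resp.json().get("affiliation-group", []):
        for summary in group.get("summaries", []):
            emp = summary.get("employment-summary", {})
            records.append({
                "organization": (emp.get("organization") or {}).get("name"),
                # If the asserting source is the institution rather than the
                # researcher, the affiliation was added (verified) by that institution.
                "asserted_by": ((emp.get("source") or {}).get("source-name") or {}).get("value"),
            })
    return records

# Example using ORCID's well-known demo record (Josiah Carberry)
for record in get_employments("0000-0002-1825-0097"):
    print(record)
```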
Mark Gross
Right.
Adrian Stanley
Maybe just to add to Madalina's point about ORCID, Mark, if that's okay. For many journals, the policy has been that the lead corresponding author should have an ORCID, but not necessarily all the other contributors. At the publisher I worked at, we did ensure every contributor had an ORCID, and as Madalina said, it's verified and it's there. That's a great way to put a few standards and checks in: update your submission forms and process. It may take the author a little more time sometimes, but it really does say that we need these cross-publisher initiatives with people like ORCID and other persistent identifier providers.
Mark Gross
Yeah, I'm surprised that the percentage of authors who have ORCID IDs is still relatively low, it seems to me. I don't know; you're closer to it than I am. But like you said, the lead authors will have it, and a lot of times some of these articles have a hundred authors on them. So I think the authors should take a little bit more trouble to get verified and be trusted. In the '60s, you'd get onto an airplane and nobody was checking anything. I don't know if you remember those days. I remember at the Marine Air Terminal I would go, basically in three minutes, from cab to airplane, and I'd be flying to Washington. There was none of this other stuff.
But today you have to be a verified traveler and all those things. So the idea of more of this verification kind of concept should be a regular thing. An ORCID ID, institutional identifiers, all of those seem like naturals, but they're not all there yet. So I want to talk about one last thing, tying back to a theme of our conversation: putting structure throughout the article, and the idea of having a JATS article early on in the process, even before peer review. That probably wasn't feasible a few years ago, but it probably is today.
36:02
I'm wondering; we do a lot of work for the US Patent Office, and patents, when they were coming in, used to come in as PDF documents, and the patent examiners would have to leaf through them. There were no automation aids for any of that. Today, everything's turned into XML right at the beginning, before the examiner ever gets it, so you can take advantage of all the automation. What do you see as the state of the art now in converting to JATS even before peer review? Is that happening at all? We've been working on automated approaches to taking a Word article and turning it into XML, but it doesn't seem that many of the platforms are actually using that yet or doing that. What are your thoughts on that?
Adrian Stanley
Well, my assessment is that some of these different AI-based technologies run off platforms like OpenAlex, which has a lot of data. Often they will do some of the automatic conversion and use tools. I know GROBID is one that gets used a bit to get the key parts of the manuscript: the references, the title, the abstract, the authors, the affiliations. Those are a lot of the key areas. Often maybe you don't need as much of the full text, and I sense those systems wouldn't do a great job if you were pushing that straight into a publishing workflow, but they're using XML as more of a metadata standard, to consistently tag and look across the data and across all the publications. So that's my sense. I think it's good if more and more publishers can have upfront XML workflows and formats. I think that's positive, too.
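For what it's worth, the GROBID step Adrian describes can be driven through GROBID's REST service. The sketch below assumes a GROBID instance running locally on its default port; it returns TEI XML for the manuscript header (title, authors, affiliations, abstract), which a downstream workflow could then map onto JATS or submission metadata.

```python
import requests

# Assumes a local GROBID service; adjust the host/port for your deployment.
GROBID_HEADER_ENDPOINT = "http://localhost:8070/api/processHeaderDocument"

def extract_header_tei(pdf_path: str) -> str:
    """Send a manuscript PDF to GROBID and return the header as TEI XML."""
    with open(pdf_path, "rb") as pdf:
        resp = requests.post(GROBID_HEADER_ENDPOINT, files={"input": pdf}, timeout=120)
    resp.raise_for_status()
    return resp.text

# Example usage (hypothetical file name):
# print(extract_header_tei("manuscript.pdf"))
```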
Mark Gross
Right? It's going to take time. Madalina, what are you seeing in the field?
Madalina Pop
Kind of the same thing that Adrian is seeing. I think that there are solutions that are extracting these key aspects of a manuscript. What we bumped into with these solutions is that they're very dependent on the format of the manuscript because they are typically trained on certain formats. So this makes it difficult to extract. So absolutely, implementing this early on in the process, it absolutely helps with automation and with keeping the metadata clean and so on.
Mark Gross
Right. But are you finding – of course, people have different formats, so it's not easy to do. I mean, as technology develops, it will be. That's one area where it seems AI really can help, right? Taking apart formats and pulling them together. So if that were available, that might be helpful in the early evaluation of an article. The earlier you can tell an article is suspect, the better; that's got to be right, because after you've gone through peer review and then you find your signals, that's probably an expensive process. And getting peer reviewers – Adrian, you mentioned how peer review is really important in all this, but I've heard that getting peer reviewers has become a real issue. Is that what you're seeing also?
39:56
Adrian Stanley
I mean, certainly there are technologies that help match the article to people's research profiles, but academics are overloaded; that's certainly been my experience of finding peer reviewers. But having good editorial board members you can rely on and trust to do the peer review for certain papers, those sorts of things can help. Matching the right content with the right, trusted people is another area I think is important, too. We actually did a ranking of the quality of peer review when I was at JMIR with the founder there. The editorial office rated it, and the peer reviewers rated the article and gave us an assessment of what they thought, and many of those articles ended up getting a lot higher citations and Altmetric scores and things like that. So I think there are systems that help reward peer reviewers, whether that's credits that can offset publication fees, or copies of the journals. So looking after your peer reviewers is a good step that's important, too. Thanking them with a page of thanks at the end of your issue is something simple you could do.
Mark Gross
Right. And so, I mean, are people doing that? Is that helping?
Adrian Stanley
It certainly was at the publishing organization I worked at.
Mark Gross
Right. I guess it depends on what field you're in. In some fields there are probably plenty of people, and in some fields, I think you mentioned, the author and three other people are the only people in the world qualified to review it. Sometimes there aren't even three other people. So that's probably an area where you can't do very much. Any other thoughts – maybe a quick statement and closing thoughts on where we are and what's going to happen? Madalina, you want to go first?
Madalina Pop
Sure. So returning to the topic of research integrity by design, I would like to highlight the importance of keeping integrity in mind at every step of the publication process, and not only the publication process, but at every step of the research process: from the moment you're documenting and conducting the research, through to beyond the publication process, when this research becomes groundwork for new research. And today we focused a lot on the publication process and on the publishers, but the publishers are not the only party involved in this. There are different actors involved in the process: the researchers, the funders, the regulators, the institutions. So it takes effort from all of these actors to increase the quality of research at every step.
Mark Gross
Right. Very good points. Adrian, final thoughts?
Adrian Stanley
One thing we didn't touch on: looking at examples from, say, the banking industry, where obviously there's a lot of fraudulent activity that needs to be measured and monitored, they have systems for credit scores and all sorts of things like that. In publishing there are still regulations, things like GDPR, that make it hard to share data about people across publishers. So you need systems that can anonymize and centralize and take the big-picture view, but be careful how they share that data across publishers.
44:02
You can have a flag, but not necessarily say this paper from this journal and this publisher was bad and they're trying to publish in your journal. So I think some of the regulations need to be thought through. And in an ideal world, and there was a discussion about this in Frankfurt, right, if you really wanted to stop paper mills, you would identify these bad actors who are submitting to many, many journals, but it does seem like there are some legal restrictions in some areas around that.
And you have to be very careful with accusations of research misconduct. It does take time for institutions and publishers to really investigate. So putting up the stops and checks at the beginning, I think, is one way to avoid getting into those bigger, more costly operations down the line. I think I read somewhere that a retraction and research investigation can cost something like seven hundred thousand dollars, which is kind of crazy when you get all the people in a room and count all the meetings and the time to investigate. So I think putting checks in earlier is a sound move, but as an industry, I think we have to keep working together and finding solutions to tackle this, because I don't think it's going to go away. There's always going to be incentives for someone to –
Mark Gross
Yeah, I'm not sure what regulation – I mean, in the banking industry you have regulators with a pretty strong hammer. A threat of a five-billion-dollar fine is very effective with banks. But in publishing, I don't think we have something like that, other than a good name and the threat of retraction. So it's sort of self-policing rather than – right?
Adrian Stanley
Well, it is, but I know there have been a couple of calls and posts around whether institutions should be part of this, perhaps a little earlier upstream, so it's not always just on the publisher. Publishers are the gatekeepers of the content and the quality, but at some point I think institutions perhaps need to play a bigger role, too.
Mark Gross
Right, right, right. They don't want to – oh, Marianne's here. I mean, they don't want to turn off their authors either, right? There's also the competition for authors. Marianne, I guess it's question time.
Marianne Calilhanna
It's question time. We have a few questions coming in, so I want to take these final minutes to address those. What incentives can publishers offer authors to offset the additional trouble of navigating technological guardrails? Any thoughts on that?
Adrian Stanley
Well, I think an author would be pleased to know that these checks are happening, so that the journal and publication of their choice is doing things to only publish the best work, using technology and people at their best. So I'm not sure it's actually that much of an issue for the authors. It's more the editorial people who are finding and assessing and doing the work, and then the author gets the decision. I don't know, but Madalina may add more to that. Maybe there's more context to the question.
Madalina Pop
I would also add that we are now in this context in which we have to process a lot of research. So this is a great opportunity for us to rethink our current processes and try to increase their efficiency, because we've talked about automation here in the context of the publishers' needs.
48:02
But there's room for optimizing the process through which the authors go when submitting something. So making it simple for the authors and especially for the good examples that Adrian was talking about, you don't want to put a burden on those people just because there are a few bad actors who are trying to bypass the systems and the checks in place. You want to make it easy for these people to submit their research. And there is a lot of room for optimization for authors, as well, while also checking for integrity and making sure that the publishers have the right information to make their decision, of course.
Mark Gross
Right.
Marianne Calilhanna
All right. The conversation today talked a lot about technology tools that are in use, but then Adrian, you mentioned bringing humans in the loop. There's this distribution of technology and human intervention. What are you seeing in terms of publishers' staffing? Are they creating research integrity staff or research integrity departments to keep pace with what's going on?
Adrian Stanley
Yeah. Sabina Alam from Taylor & Francis has presented at a couple of STM meetings, and I think she basically said something along the lines of, in 2019 they had a hundred and eighty serious cases, not just plagiarism. Don't quote me exactly on this, but a couple of years later they had two to three thousand. And then when she presented at the last STM meeting, it was four to five thousand integrity cases that they need to look at and coordinate with authors and with institutions. But I think that's the historic part of it.
I would say, for the checking part up front, a number of publishers are working with their existing vendors and people to help triage, where you may have a group of people that can assess: is everything there in the manuscript? Are the figures there? Did the author send the supplemental data and things? So I think there are ways people are integrating these into their processes, but I still see that as evolving, and there are different ways people are using these technologies. And I suspect it would be a really great meeting for us to have, to look at all the different ways people use technologies and people together, normalizing some of the decisions people have to make and how training is done. I think there's a huge opportunity there somewhere for someone.
Marianne Calilhanna
All right, thank you. Another question. Can we talk about humanities misconduct or research integrity issues in humanities compared with STM? Are the same things going on? Are there different considerations? Any thoughts on that?
Adrian Stanley
Madalina, do you –
Madalina Pop
I would say that publishers working in different domains do face certain types of issues more than others. Actually, I think we ran an analysis on the Retraction Watch database a few years ago, and there are different types of misconduct that are leading in the different domains, but these are not necessarily –
52:02
I would not say we see a huge difference between the two sectors in types of misconduct.
Marianne Calilhanna
Okay. And another question. Scholarly publishing in itself is an evolution of scientific ideas. We know that science changes and evolves as we learn more and understand our world. So can you speak to using some of the tools and the technology to differentiate between true misconduct and what we're talking about with evolution of ideas? I think this circles back to what Adrian said about retractions aren't necessarily an evil thing, a bad thing. Thoughts on that?
Madalina Pop
I think, Marianne, this actually brings us back to what Adrian said. Technology can support people with certain tasks, with automation, with automating standard tasks, with processing information at scale. But at least not yet, it does not replace the people that are making the decision. And we have all sorts of flags that we check for, but a human, a person is the one who analyzes the results of those checks. And it's important for them to have all the information in one place to be able to easily make the decision. And this is what technology can help with. But in the end, it is the decision of the people that says "Okay, this author has retracted papers, but this retraction happened twenty years ago. They've had a great reputation ever since. And the reason for retraction was this, and maybe the author themselves was the one who announced the issue and was frank about it. So okay, I decide that this moves forward."
What we found helps is providing the investigators with mechanisms to actually flag, for example, false positives and override them. So the check is saying that this is a problem, but I reviewed this carefully, I've made my decision, and I'm overriding the result of this check for this reason, which is an informed decision. So technology is not ready to replace the people. And to the point earlier around how we see publishers reacting, I think we see them reacting in two areas. On one hand, we see the research integrity teams getting staffed up more and more to keep pace with the volumes. On the other hand, we see them looking for automated tools that can help optimize the assessment process and, again, help optimize –
Mark Gross
I mean, the human process is an expensive one. I mean, so what percentage of papers that have these signals –
55:58
What percent of papers actually get to an investigation point or have enough signals that warrant investigation? Is that 10% of the papers, 20% of the papers, or is it 1% of the papers?
Adrian Stanley
Well, on the published literature, there was an article in Nature that estimated it was about 2% of the published literature. Now, that's not all the submissions and things. But maybe to answer Marianne's question that came in: I sense there have been a number of high-profile cases, Harvard, where the dean resigned because they either didn't cite other work or claimed it as their own, and these things happened a long time ago. Also, I do think for things like the image manipulation checks, there should be a sort of statute of limitations: how far do you go back? When did technology change? And with that version of record, there may be reasons why something was wrong that are different from pure fraud. But I think as an industry it would be great if there are forums like the World Conference on Research Integrity, where all these like-minded people from institutions and publishers come together to talk. And I think that's just going to help. I'd love to see publishers playing a more active role, educating and working with institutions and all our partners, bringing the communities together to really digest and understand and set a course forwards. So, yeah.
Marianne Calilhanna
Well, I think that's a good place to stop. We have come to the top of the hour. I'd like to thank everyone for attending this webinar and taking time out of your day. Thank you so much, Mark, Madalina, and Adrian for having this conversation with us. The DCL Learning Series comprises webinars like this, a monthly newsletter, and our blog. You can access many other webinars related to content structure, XML standards, scholarly publishing, and more from the on-demand webinar section of our website at dataconversionlaboratory.com. We hope to see you at future webinars, and have a great day. This does conclude today's broadcast.
Mark Gross
I thank you, Madalina and Adrian. This is great. And thank you everybody who's in the audience. Okay, thanks.
Madalina Pop
Thank you.
Mark Gross
Call that a wrap. Thanks.
Adrian Stanley
Thank you.
Madalina Pop
Thank you.