DCL Learning Series
Platform Strategies - Not Your Mother’s Migration: Lessons from Migrating More than 47GB of Technical XML
Betsy Donohue
Just to kick things off, just a little bit of an introduction here. Anyone who has ever performed a platform migration can tell you that the state of your archive is one of the biggest determinants of the level of complexity of the entire project. However, it's also one of the biggest areas of opportunity, and that's because migrations are a great time to review, to modernize, and to upgrade your entire corpus of content. So just over a year ago, Silverchair, AIP Publishing, and DCL undertook exactly that task, and that task was moving more than 90 years of content to its new home on the Silverchair platform.
I'm joined today by Melissa Jones, who is the content architect at Silverchair; Tracy Denien, who is the head of product development and operations at AIP Publishing; Richard O'Keeffe, who is the manager of digital asset management at AIP Publishing; and David Turner, who is the digital transformation consultant at DCL. We're all here today to talk about that process: not only how it worked, but what publishers can learn from it. But before we get started, we decided to launch a poll. Our aim here is to get a sense of our audience's experience with this topic.
Steph has popped this poll up. If you could take a minute to quickly answer "Yes," "No," or "No, but we have one coming up," that would be wonderful. Thanks for participating in that poll. Awesome. Thanks a lot. Okay. We've got immediate results. Oh, pretty cool. 70% yes and 30% no. Okay. All right. That's a good mix. So let's hand it over to today's speakers. We're going to take an approach where the speakers introduce themselves and really set the stage, sharing with us their role in their migration project, what their goals were, what their experience was in this project, and afterwards then we'll engage in really an open discussion and you'll have an opportunity to ask questions and we'll provide the answers. Yeah. We're going to start off with Tracy from AIP.
Tracy Denien
Great, thank you. I'll just share a couple of quick slides. As Betsy said, my name's Tracy Denien. I'm head of product development and operations at AIP Publishing. My team drives the technology related areas for the platform, including integrations with other systems, our front-end services, and most importantly, the content. AIP Publishing, itself, was formed in 2013 as a wholly owned subsidiary of the American Institute of Physics, and we're charged with publishing over 30 journals and conference proceedings. The American Institute of Physics has, for nearly a century, worked to advance, promote, and serve the physical sciences.
Both organizations share in this mission and continue to work together to achieve their goals. Now, if we're talking numbers, AIP Publishing publishes the American Institute of Physics flagship magazine, Physics Today, 10 open access titles and growing, 12 subscription titles for our publishing partners along with 16 of our own titles. We have over 2,400 conference proceedings published. In total, that's over 1.1 million articles currently on the Silverchair platform, which is why we're here today. So, why did we move?
4:00
Let me just stop sharing, because I just had those two slides as a visual. Why did we migrate? As you know, a platform move is not for the faint of heart. In 2021, AIP Publishing reached a fork in the road where we needed to do a major and costly upgrade with our then platform vendor. The platform had an outdated look and feel, and it was the perfect opportunity to explore another option, which was to migrate to Silverchair. Even prior to being a Silverchair customer, we attended their platform strategy days, we spoke to other Silverchair customers and had exposure to many of the Silverchair staff, so Silverchair itself was not something that was unfamiliar to us. We had many areas identified as we worked on the discovery with Silverchair, which I can get into if time allows, but right now I'll pass it off to Rich O'Keeffe on my team to introduce himself.
Richard O'Keeffe
Hello, my name's Rich O'Keeffe. I am, as Tracy mentioned, the manager of the digital assets group here. Our primary role at AIP is to support the production workflow, the maintenance of the archive, and the integrity of the archive, as well as our XML implementation for current, past, and future implementations of the content that we published. Our role in this particular migration was we were the primary contact with Data Conversion Laboratory and with the Silverchair content architects responsible for the delivery of the material and the answering of any kind of questions and any kind of anomalies that had been found during the conversion process.
David Turner
All right. I guess I take it from here. As they mentioned, they worked with us at DCL to move their content and, as Tracy hinted at before, it was a massive amount of content. I think it's the largest set of content ever loaded on Silverchair, if I'm not mistaken. This slide gives you a sense of some of the size here: 47 gigabytes of journal content, 1.7 terabytes of journal assets, and not just loaded, but really optimized. That's the role that we played in this process. Again, as they mentioned before, I'm David Turner and I do a little consulting with new customers, I do a little business development and I also manage the relationships with partners like Silverchair, that's my role at DCL.
DCL stands for Data Conversion Laboratory. We're based in Queens. We've been solving content challenges for more than 40 years. We really shine in situations like this where quality is mission-critical and where the content is complex. We've got a really solid team: Beth, Devora, Robert, David, some of those people. We've been doing a lot of these projects. I mean, I don't even know the exact number that we've done with Silverchair now, but it's been over several years.
Just quickly, a little more about DCL and some of the things that we do. I mentioned the content conversion, not just conversion, but optimization. We didn't want to just get their content onto Silverchair, but make sure that it was optimized for the new platform. We also handled the migration piece. We have some other services around accessibility, scanning, and entity extraction. The other really big part of where we were involved here had to do with identifying problems with metadata. As you're moving to one of these new platforms, you want to make sure you take advantage of the opportunity to be able to get in and fix those kinds of things because they're going to affect user experience, they're going to affect discoverability. What we did is we employed a tool that we have, and we do this with all of our Silverchair projects.
8:00
It's called Content Clarity. You'll hear us talk about this a little bit more today. Just quickly about that, Content Clarity is effectively an analysis that we do of all of a publisher's XML content, and it's really designed to do three things. First of all, provide content metrics on things like you see here: number of files, number of bytes, how much full text you have versus header-only XML, what DTDs you've used historically. It also identifies where information may be missing. If you've got missing doc types or missing titles or missing issue numbers or things like that, we can identify those and start working on getting those fixed. Then, finally, it also surfaces errors and duplications and things like that. When we're talking about Content Clarity, that's what we're talking about. With that, I will stop sharing that and we'll pass this over. I think Melissa's next.
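A minimal sketch of the kind of corpus audit described here, not DCL's Content Clarity tool itself. It assumes JATS-style element names (`article-title`, `issue`, `article-id`, `body`), and the function names `audit_article` and `summarize` are hypothetical. It counts files and bytes, separates full-text from header-only XML, and flags missing titles, missing issue numbers, and duplicated DOIs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def audit_article(xml_text: str) -> dict:
    """Collect a few health indicators for one JATS-style article (assumed tagging)."""
    root = ET.fromstring(xml_text)
    title = root.find(".//article-title")
    issue = root.find(".//article-meta/issue")
    doi = None
    for el in root.iter("article-id"):
        if el.get("pub-id-type") == "doi" and el.text:
            doi = el.text.strip()
            break
    return {
        "has_title": title is not None and bool("".join(title.itertext()).strip()),
        "has_issue": issue is not None and bool((issue.text or "").strip()),
        # Treat an article with a <body> child as full text, otherwise header-only.
        "full_text": root.find("body") is not None,
        "doi": doi,
    }


def summarize(articles: dict) -> dict:
    """Aggregate per-article findings; `articles` maps a file name to its XML text."""
    report = {
        "files": 0, "bytes": 0, "full_text": 0, "header_only": 0,
        "missing_title": 0, "missing_issue": 0, "duplicate_dois": 0,
    }
    doi_counts = Counter()
    for xml_text in articles.values():
        info = audit_article(xml_text)
        report["files"] += 1
        report["bytes"] += len(xml_text.encode("utf-8"))
        report["full_text" if info["full_text"] else "header_only"] += 1
        if not info["has_title"]:
            report["missing_title"] += 1
        if not info["has_issue"]:
            report["missing_issue"] += 1
        if info["doi"]:
            doi_counts[info["doi"]] += 1
    # A DOI that appears on more than one file is reported once as a duplicate.
    report["duplicate_dois"] = sum(1 for n in doi_counts.values() if n > 1)
    return report
```

Calling `summarize` on a mapping of file names to XML strings returns one dictionary of counts, which is enough to see at a glance where titles or issue numbers are missing before a conversion starts.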
Melissa Jones
Yes. I'm Melissa Jones. I'm a content architect here at Silverchair. A content architect wears many hats here, but our role in the migration process is to act as a facilitator for the migration and also to provide support for the content conversion vendor and the client to make sure that they have the answers that they need to make informed decisions about the content and the migration process. I co-managed the AIPP migration with my colleague Brooke Begin, who did a lot of the day-to-day work on this. I have a slide I want to share just to give some context on where this project falls in Silverchair's migration journey, and it has been a long one.
Let me just share my slides here. This timeline starts with the OUP migration in 2015, 2016. I think that the OUP migration is a really good mirror for the AIPP migration, because of the size. The OUP migration was approximately 2 million journal articles and AIPP, which is the largest we've done since OUP, is well over a million with book content as well as proceedings content. These two are very similar in terms of scope, in terms of the level of publisher engagement and involvement and expertise, but the context for us was much different. I want to talk a little bit about that.
OUP was actually our charter Zipline 3 and SCJATS client. We were building Zipline 3 and creating the SCJATS specifications while we were migrating. For that reason, we decided to do an internal conversion, and that provided us with the flexibility to adjust as we were building the tools and the specs. It also gave us a significant opportunity to learn all the ins and outs of the migration process. I mean, what better way to learn than with millions of articles and doing everything on the fly. It was a challenging project for those reasons, but it was also a significant turning point for us as a company. A lot of the lessons that we learned in partnership with OUP informed the direction and the trajectory that we took for the years after that. The DUP migration, Duke University Press, is on this timeline, not because it's similar in size – it was a much smaller migration – but this was actually the first full vendor conversion that we did on Zipline 3 and SCJATS and SCBITS. It was also our first Zipline 3 partnership with DCL. DCL was no stranger to Silverchair at this point,
12:00
but it was their first encounter with the SCJATS and SCBITS specifications and the Zipline 3 tool. This is where we started to form the beginnings of the content project plan that we still use today. We came up with some internal ways of working, and also some ways of working with content vendors and with clients to maximize outcomes for the migration process. We also discovered, during this project, that DCL was probably going to be a really great partner for us that could grow with us and that could provide a strategic option for customers who were looking to de-risk the migration process. And so DCL became the unofficial preferred vendor after the Duke project and eventually became a Universe partner.
As David said, they've done many, many migrations with us. I would say probably 90%, if not more, of the migrations we've done since 2017 have been in partnership with DCL. Our relationship with them has really grown and matured, and the outcomes of that have been things like Content Clarity and additional innovations from them that have really made the migration process go even smoother. Now that brings us to the AIPP migration, which was the largest since 2016. AIPP brought a lot to the table, but from our perspective, this was also an opportunity for us to leverage very mature tools, specs, processes, and also benefit from the longstanding relationship that we had with DCL.
Betsy Donohue
Awesome. Well, thanks, everybody. That was a fantastic start to get the intro and kind of lay the groundwork for what's coming next, which is discussion points. We have a couple discussion questions I'm going to kick off. Starting with the first one; you each have a great deal of experience with migrations across the board and from all different perspectives. What made this one unique and what lessons were you able to apply to this project? Who wants to go first there?
Tracy Denien
I mean, I can speak just from an overall perspective. I think, for me, what made this project unique was the speed at which we went from decision to migrate, to launch. It was just over a year from the ink being dry, I think, on the contract. That, to me, was at lightspeed as compared to past migrations that we'd been through. I think that what the DCL and Silverchair teams brought to the table played a large part in that. Also, from past experience, AIP Publishing ensured that the business owners were always available, working across Silverchair and with DCL as needed to make sure that we met our deadlines. To me, just the speed at which we did it was very unique.
Betsy Donohue
Nice. Great perspective. Anybody want to add anything additional to that one?
Richard O'Keeffe
I would say what we were excited about was Content Clarity, as David and Melissa had spoken about, because everyone thinks their content is good, but there are some landmines out there in it. We were really excited to have this tool run through our corpus of content and identify those particular discrepancies and disparities because when you have this much content, nobody really knows that in 1942 maybe they did this. It helps bring everything into alignment so you can have a more consistent presentation on the platform. We were really excited to have that tool applied to our content and learn from it.
Betsy Donohue
Yep. Right. That's a great point. That's kind of intermingled with the next discussion point.
David Turner
Can I add one other thing to that first part?
16:00
Betsy Donohue
Oh, absolutely. Absolutely.
David Turner
I thought one of the things that made this unique was really the knowledge that AIP Publishing people brought to the table. A lot of times in these engagements we're working with people, it's their first time to do this, they're not really sure where all their assets are. They're trying to get things together. One of the things we quickly discovered about working with Rich was that he had his ducks in a row and he knew where things were, he'd done this before, he had a lot of answers to questions ready to go, and really made the process a lot easier by bringing that level of preparedness.
Betsy Donohue
Very nice. Great point. The next add-on to this area and this topic that we're covering, and this is specifically for David and Melissa, what gains have you seen other publishers achieve through migrations like this?
Melissa Jones
One area that is typically addressed the most is metadata. Enhancing metadata, adding DOIs, normalizing article types, normalizing casing in article titles, adding article titles, if you don't have them. Our platform actually requires them, so it's kind of a forced improvement. Taking that opportunity to make sure that your metadata is clean and up-to-date tends to be the area where most of the focus is. The brave among us tackle their references from eons ago and try to make them more granular and machine-readable. That is a pretty challenging thing to do, but we have had some clients recently who have taken that on.
David Turner
I think there's a whole host of gains that we've seen. I mean, obviously there's a lot in terms of new functionality when you move to a platform like Silverchair that – Hannah and her team put together great use cases, and I think all that probably factored into why they were chosen here. For other publishers, sometimes you see it's just the ability to get all their content in one place. Sometimes people have their journal content here, their book content here, their conference proceedings over there, and so I think that's a big gain for people.
Also, really, it's a chance sometimes for people just to establish an archive. I mentioned that Rich and Tracy had done this before and they had an archive, but there are a lot of publishers out there who haven't really kept things in-house, it's always been someplace else, and doing a migration like this really helps with that. Just a couple other things just off the top of my head here. Well, Melissa already mentioned the chance to fix those metadata problems, those nagging problems that are there, but I think there's also – I saw in the chat here there's a question about digitizing content. Some publishers, they'll really take this as an opportunity to make gains around getting content from a paper format into a digital format, or from taking things that maybe were PDF into XML or things that were header-only XML upgraded to full-text XML. Anyway. Should I go ahead and answer that question about the digitization? I can't remember where I saw that.
Betsy Donohue
Sure. Yeah.
David Turner
Is it in questions or is it in the chat?
Betsy Donohue
Yeah, it's in the chat.
20:00
From Guy Jackson. Was that the one you were referencing?
David Turner
Yep, yep. Guy, thanks for that question. We absolutely can do non-destructive digitization of fragile hard copy and return that. We do have that hardware in-house. I will say, it depends a little bit. Occasionally, we'll come across some really special things where we'll use some out-of-house vendors, we might subcontract someplace else, or if the cost of shipping it to our facility is just incredibly cumbersome, but anyway. If you want to contact me afterwards, I'd be glad to talk about that with you.
Melissa Jones
I did just want to add something that I think is important to call out about this project in particular. MathML remediation was pretty significant for the AIPP project. Any client who has very math-heavy content, DCL is very good at pulling out math errors and helping to resolve those. That's another area that's often addressed.
Betsy Donohue
Nice. Great point. Taking the discussion questions and kind of bringing it back to Rich and Tracy for a minute: in this process with the project, did you have any moments that really surprised you and/or really got you excited to see things addressed that you didn't expect?
Tracy Denien
Rich, you may be better equipped to speak to that, as far as the work you've been doing on the conversion.
Richard O'Keeffe
Yeah. I mean, there were a couple of aspects of the Silverchair platform that differed from some of our previous platforms, one of which was that it gave us an opportunity to consolidate and pair up our supplementary material with the rest of the assets of articles, because we had a legacy process that had it in a separate repository and we had the opportunity to merge it in so we could deliver everything consistently to DCL and then have it converted for the platform. I guess, from a surprise standpoint, and this builds on Melissa's comment about MathML, that was probably the largest area where we encountered content that needed to be updated.
We had older processes where we had the composition process integrated with the XML generation process. While the MathML would've been valid if you opened it up, in many cases, if you turned it on strict or were using the latest version of MathML, it would've been invalid almost always because of attributes that were put in that just weren't permitted, because someone decided to use Roman instead of normal or they used a font face instead of that. It worked fine on the PDF and everything back in the day, in the context in which it was done, but it wasn't working here. We had a lot of corrections for that. When you compare it to the overall number that David had flashed, it's a very, very small percentage, which was nice, but still it was a number of files that we had to update and we were able to get that fixed, updated and redelivered to DCL. It gave us that one opportunity to clean that up in the archive. That was probably the biggest surprise. We knew it was out there, but not to the extent that it was.
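A minimal sketch of how attribute problems like these could be surfaced, not the check that AIP Publishing or DCL actually ran. It assumes the MathML is namespaced with the standard MathML namespace, compares `mathvariant` against the values defined in MathML 3, and also flags the older style attributes that MathML 3 deprecates; `find_mathml_problems` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# mathvariant values defined by MathML 3; anything else (e.g. "roman") is flagged.
ALLOWED_MATHVARIANT = {
    "normal", "bold", "italic", "bold-italic", "double-struck", "bold-fraktur",
    "script", "bold-script", "fraktur", "sans-serif", "bold-sans-serif",
    "sans-serif-italic", "sans-serif-bold-italic", "monospace",
    "initial", "tailed", "looped", "stretched",
}

# Style attributes deprecated in MathML 3 in favor of mathvariant, mathsize, etc.
DEPRECATED_STYLE_ATTRS = {
    "fontfamily", "fontweight", "fontstyle", "fontsize", "color", "background",
}


def find_mathml_problems(xml_text: str) -> list:
    """Return (element, attribute, value) tuples for suspicious MathML attributes."""
    problems = []
    prefix = "{" + MATHML_NS + "}"
    for el in ET.fromstring(xml_text).iter():
        if not el.tag.startswith(prefix):
            continue  # only inspect elements in the MathML namespace
        local_name = el.tag[len(prefix):]
        variant = el.get("mathvariant")
        if variant is not None and variant not in ALLOWED_MATHVARIANT:
            problems.append((local_name, "mathvariant", variant))
        for attr in sorted(DEPRECATED_STYLE_ATTRS & set(el.attrib)):
            problems.append((local_name, attr, el.get(attr)))
    return problems
```

A report like this only locates candidates for correction; deciding whether `roman` should become `normal` in a given expression is still an editorial call.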
Betsy Donohue
Wonderful. Thanks, Rich. That's a great answer. The next discussion question is a little bit of a different flavor. I'd like to start with Tracy and add on or ask for a little bit more detail. One of the points you made when you first shared your initial slide, you shared a little bit about the situation when the decision was made to undertake the migration and where AIPP was in crossroads. If you could give a little bit more detail?
24:00
What were the motivating and compelling reasons and organizationally what that was like for you all to come to that realization and work together and kind of start to make the decisions to do that migration and work with Silverchair?
Tracy Denien
I think, at first, it was how to get people within the organization motivated as opposed to "here we go again." We did, and I think because of the situation we found ourselves in, the motivating factor was to really bring our platform into the modern day. With Silverchair, we were able to achieve this researcher-friendly web design. We have a new, professional look, we have new features that we didn't have in the past, like a split-screen article view, in which we have our associated content and the data in the same view with the research in the article, so that's huge for us. Getting access to the Silverchair community and workshops, that was really a positive.
Another thing for us too was we didn't have a lot of control over our old platform and we were at the mercy of waiting for the vendor to make front-end changes. What Silverchair provides us is a robust tool allowing us to manage the front-end displays, configurations, adding publications or publishers much easier. That allows us to respond to things a lot quicker. I think a lot of that and really working together with our stakeholders on all the benefits. We brought them into the process more this time than we had in the past, which has pros and cons. The more people involved, the more voices there are. A lot of it was managing the voices, working along with Silverchair and working along with DCL. But I do think in the end it really brought everyone together and having everyone involved was an important aspect of it.
Betsy Donohue
Tracy, that's awesome. Thanks for that extra detail. That's great, because it all comes down to the people, right?
Tracy Denien
Yeah. Yeah.
Betsy Donohue
For sure. The next discussion question is a little bit more precise. We want to focus on newer content for this next question. How did you manage the switchover? What we mean for that is specifically about the content actively in production. Setting up a parallel publishing process, how did that get done? How was that achieved?
Richard O'Keeffe
Well, that's always the most challenging part of any migration. I mean, you see 1.1 million articles, and while that's a large number, the first million's the easy part. We copied off 1929 to 2021 on a drive, and conveniently the DCL drop-off location was on the way home from work, so I could just take the drive and drop it off. That was the easy part. It's those subsequent delta deliveries that are the most challenging, because there were specific times in the schedule where we would drop off the 2022 content and then it was the first couple months of 2023.
The target is always shifting, because as anyone who works in publishing knows, the publishing engine just never stops. It just keeps going and going, and you're supporting that while you're trying to do the migration simultaneously, which is always a challenge. It's coordinating internally on a couple of different matters, one of which is establishing, with your production groups, a cutoff time, which in retrospect, I wish we had allowed a little more time for ourselves, but moving so much content, it was difficult to get a solid date. We actually were publishing up to a day before we cut over. The other part is having a mechanism to identify changes in your content. You may have delivered everything through 2022.
28:00
But what has changed since then? Fortunately, we have a nice content management system where we could run reports and identify everything that had changed since the last time we collected it. We were able to run those reports and include those in the delta along with the new material.
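A minimal sketch of one way to identify a delta between deliveries when a content management system report is not available. This swaps in a different technique from the one described: it compares SHA-256 checksums of files between two snapshots. The function names `build_manifest` and `compute_delta` are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compute_delta(previous: dict, current: dict) -> dict:
    """Compare the manifest from the last delivery with the current one."""
    return {
        # Files that did not exist at the last delivery.
        "new": sorted(k for k in current if k not in previous),
        # Files present in both snapshots whose contents differ.
        "changed": sorted(
            k for k in current if k in previous and current[k] != previous[k]
        ),
        # Files that were delivered before but no longer exist.
        "removed": sorted(k for k in previous if k not in current),
    }
```

Saving the manifest alongside each delivery means the next delta is the union of the `new` and `changed` lists, which corresponds to the two kinds of material mentioned here: new articles and anything modified since the last collection.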
Those are really the two facets: all the new material as well as everything that has changed, and being able to identify those precisely and send them over. That was a key benefit for us as well. The other probably most important facet of the parallel publishing is determining, well, what's your output channel going to be. For AIP, we use the JATS archive article 1.3 version as our archival content. That's what we deliver out to customers and everything else like that. It would've been too large a project to modify that, so we were exporting out to SCJATS. And we had the good fortune to work with my talented colleague, Jennifer McAndrews; she wrote the XSLT, and while we were preparing the final deltas earlier, she was writing the conversion, and we were testing and doing that to get that output.
You just work at that. We went from 60% failures with articles we were uploading into the test area, down to 2 or 3% by the time we were at launch. There's still some cleanup to do, for the outliers that you never catch during the process, but the overall bulk of material goes through. Since launch, we've been publishing anywhere from 100 to 400 articles a day, and we have very few failures with it. Making sure you understand the spec, you read the documentation that's there, you rely on the advice of the content architects and the information that DCL had provided during the Content Clarity analysis where it identified how you could tag things, you could use their material for samples, pulling all that together to build your export, that was the critical path for us because without that export, nothing's getting to the platform at the quality that we wanted.
Betsy Donohue
That was great, Rich. That was a really detailed, thorough answer to that question. Anybody want to add on to that?
David Turner
I think it's important that at the beginning of these projects, Brooke and Melissa and Beth and Devora, they work really closely with the clients to outline the schedule and the process and exactly how this is all going to work. Does it change along the way? Sometimes. Fortunately, this is not our first rodeo, as we say here in Texas. It's something that we feel pretty confident that we can get through, even with a large amount of content. Sometimes it's just a matter of we have to push through and we have to make it work and we've got to work some extra hours to make it happen, but it's all worth it in the end.
Betsy Donohue
Nice. Related to that notion, worth it in the end, literally, this was part of the question that we got in our Q&A here. Let me read it. David, let me know if you want to take this, and we probably can have a couple different answers, but since it dovetails beautifully onto what you just said. Are there examples, from folks in this group, of organizations righting past wrongs in content migration to inject best practices like accessibility, et cetera? What other rewrites, seen as onerous, are worth it in the end?
David Turner
Yeah, I think we've seen a lot of that. Part of that is because publishers don't always –
32:00
How do I put this nicely? Budgets tend not to be unlimited for publishers, right? Unless, AIP, you guys probably have unlimited budgets, right, Tracy?
Betsy Donohue
[Laughs]
David Turner
You always discover that little shortcuts were taken along the way. Yeah, we're going to convert this content, but we're not going to – it would cost too much to do this data. Or we're going to capture all of our math as images. We're going to tag our references in this way. We have seen things come along in terms of connecting with supplemental material. We've seen wrongs righted around, well, like the math that was just mentioned. Cross references, the way we handle affiliations is not exactly the same today as it was 25 years ago. Being able to address some of those things I think has made a big difference. Cleaning up DOIs. As discoverability becomes more and more important, all these little things just make such a big difference. Accessibility. I'm trying to think if we've done any purely accessible type of things, but there has been just a lot of times when you're converting, just getting that extra measure of tagging around certain elements makes it that much more able to be read by a screen reader and things like that. We have seen eBook productions and things like that from these projects as well.
Betsy Donohue
Nice. Anyone else want to add on to that?
Tracy Denien
I'll just say along with what you said, David, you're always managing budgets, but certainly looking at the costs to do it during a migration versus backtracking: once you're already on the platform, you really have to think about it because in a lot of cases it's worth the investment while you're doing a migration rather than trying to fix something once you're already on a platform.
Betsy Donohue
Right. Melissa, it looks like you had something to add as well.
Melissa Jones
Yeah. I just wanted to say a couple of things. The first thing is that a big part of our mission with SCJATS and SCBITS was to nudge people in the direction of better practices as much as possible. We really started that project so that content could be more reusable across multiple platforms and deposit services like Crossref and PubMed. That is one big thing that I think helps in the migration: we do have a much more restrictive spec than just everything that's available in JATS, because you can do a lot of things in JATS, but not everybody agrees on the right way. We try, as much as possible, to put the right way, or as close to the right way as we know at the time, into those specs to nudge people in the right direction during the conversion process.
The second thing, accessibility. I think I would like to see more, but I think that money does become an issue because it is a significant undertaking, especially when you're talking about hundreds of years of content and adding something that is really pretty editorial. Alt text that makes sense is not something that's easy, yet, for a machine to do. We have some initiatives going on in the industry. JATS4R is planning to release a recommendation around accessibility, I think probably later this year, so I'm hoping that there will be some momentum around more publishers taking advantage of the migration process to implement some of those practices.
36:00
Betsy Donohue
Nice. Great points, Melissa. Thank you. Now to take a little bit of a pivot, and we've been talking about technical stuff. What about communication? Broadly, non-technical strategies. The three groups participating on today's webinar, can you summarize for us your strategy and your approach to communication and collaboration during the project?
Melissa Jones
I'll start with that, because the sort of hub of communications is actually a Silverchair ticketing system. We have a migration ticketing system that both DCL and the publisher have access to and they're able to talk to each other and talk to us, and we can all see what's going on and we can chime in as necessary. It enables us to communicate and see the history of the conversation. Instead of getting pulled into an email thread that's branched off from another email thread, we can see the original problem and everything that's happened since then. I think that that created a lot of cohesion around communication.
Richard O'Keeffe
I would agree with that, because being able to go back and see the history, as you mentioned, is really key. How many times, four months down the road, is it, "Oh, what did we decide?" Then you can go back and you can look at it. In addition, there was also always the opportunity to speak with either a Silverchair staff member or with DCL staff in a separate call, if you had something very specific for which a give-and-take through an electronic format might not have been sufficient to get the answer fast enough or to get the details that you wanted. You would always go back and want to document in the thread what was discussed, but you could, in order to maintain schedules, have that opportunity to speak with one of the partners and get the proper answer as fast as you could, which was very helpful because sometimes you just need to bounce off those repetitive, follow-up questions so you can move on instead of waiting for people's schedules in a ticketing system.
David Turner
Yeah. I'll add to that. We try to spend a lot of time in communication on the front end of these projects. We probably wore Rich and Tracy out a little bit with all the questions that we asked at the beginning, but we do try to make sure that we get off on the right foot and that we've got all the details and we're all in alignment. We do try to have regular communication along the way. Then just between DCL and Silverchair, we're in constant communication with their teams at multiple levels. We do think that that's all absolutely critical in this.
Tracy Denien
Yep. I agree, David. A lot of people have meeting fatigue, Zoom fatigue, but in a migration like this, having constant meetings, even with the ticketing system, which I agree, Melissa, was really key, being able to have the in-person communication where you could address certain questions, issues, flesh things out, that really is needed as well.
Betsy Donohue
Mm-hmm. All really good points. Then kind of a follow-on from that general communication question, and then it looks like we're getting some questions from the audience that I can switch to. What are some of the most important non-technical things that partners can do in projects like this?
40:00
Is it that face-to-face time, either Zoom or in-person? Tracy, what do you think?
Tracy Denien
That's definitely one area. I think another, whether you've done a migration before or you haven't, is trying to pull together as much of your information up front as possible. I mean, we did have a lot of information that we were able to share with both DCL and Silverchair up front. A lot of our requirements were already outlined, having conversations with the vendor. I know with Silverchair, there were things that we asked for just because that was the way we always did it. It helps to be open-minded and listen to your vendor, because they're working with multiple organizations as opposed to just us and there are best practices that we may not really be thinking about. It really does pay to listen to what your vendor is saying.
Betsy Donohue
Great point. Great point. Anyone else want to add to that?
David Turner
Yeah. Just touching back on it, I think it's been said a couple of times, but when it comes to doing one of these migrations, they are big and they're time-consuming and they can be expensive. You should take advantage of that opportunity and be really strategic about how you do it. You're going to have more chance to be able to fix things and enhance things and make them the way that you want during an initiative like this, as opposed to, like Tracy said, doing it later and trying to backtrack. There are some platforms out there that when you migrate to them, they just simply want to move it over, lift and shift. One of the things I love about working with Silverchair is that they don't take that approach. They realize that if you're going to take the time and the money to move, you ought to do whatever you can to really take advantage of that, be strategic, get your content upgraded.
I think then the follow-up to that is really plan for the time that it takes. Whatever you can do to not rush it – I know occasionally Silverchair will get a deal where somebody says "Oh, hey. My contract with my other vendor is ending in September and it's June. Hey, I want to see if we can get this converted over really quickly." That's probably a recipe for trouble right there. Plan for the time that it takes, don't try to rush it. Then I'd also say it's a good idea to have a consultant. We do a lot of these with a third-party consultant being involved, just sort of helping to manage the process and there are some good consultants out there.
Betsy Donohue
Great. Awesome. Great tips. Let's shift for a bit to the questions coming in on the chat. Let me know who wants to grab this one, or everybody. What are the panel views on header only versus full-text XML archives? Is the industry trending towards the latter? What are the cost trends? Who wants to grab that one first?
David Turner
I could talk about the cost. It tends to be a lot more expensive to do full-text XML. That's when you're converting content to full-text XML, if you don't already have full-text XML – much of what we did for AIPP was moving XML to XML.
44:00
But if you've got PDF that you're trying to move to full-text XML, that's typically charged on a per-page basis. Whereas when you're doing header only, that's typically charged on a per-article basis or per-chapter if you're doing books. It does tend to be more expensive to do the full text.
At the same time, full text really gives you all the flexibility. It enhances your search, it gives you the ability to create multiple types of outputs. It allows your content to be mobile-friendly. Instead of reviewing a PDF on your mobile phone, you can have reflowable content if it's full-text XML. I think the industry is trending toward full-text XML, and I think most publishers are trying to create it as they go now. The bigger question is what do you want to do on the back file? Of course at DCL, we want you to move all of your old things from PDF to full-text XML. We're happy to help.
Melissa Jones
Yeah. I've seen this most often with smaller projects where you convert from a PDF. If you have one book and it's only ever been available as a PDF, it really only makes sense to make it full text; otherwise you kind of just get what you already had. Or maybe there are five or six books and the cost implication is not as high as if you're talking about someone with thousands of book assets, millions of articles. Those tend to be more hybrid, like we'll select part of our back file to make that investment, but the rest we'll leave as PDF.
David Turner
We have seen a couple of publishers do things where if you look at their site, they have 10 years of full-text XML, and then the next 10 years are PDF with XML headers. Then what they'll do is afterwards, over time they'll come to DCL and they'll say "Hey, we want to maybe do another 10 years. Let's either take that to header only." Or they'll say "We're going to go back and we're going to take the 10 years that are header XML. Let's get those to full-text and then let's expand the ones that we have that are not on Silverchair because they're PDF only. Let's add XML header so that we can get them up as well."
Betsy Donohue
Nice. Thanks for that detail. A little bit of a different area coming in on the Q&A. Generally, looking back – I think this is one for AIP folks. Well, actually for everybody. Looking back, what would you do differently? What are the key lessons learned in this project for future migrations?
Tracy Denien
You go first, Rich, then I can go.
Richard O'Keeffe
Well, I would say one, it goes back to the parallel publishing. I just don't think we allowed enough time for that based on scheduling, resourcing that we had at the time. We got it done, but I would've liked to have a little more time to have our internal staff get familiar with the publishing tools here. We were working in a staging area and it was coinciding with DCL still loading content in, as we were in that mad rush to get everything ready for launch date, and I wish we had extended a little bit more time for that.
The other is coordinate with your people responsible for the mock-ups and things like that, because on a mock-up, it's pretty straightforward; oh, this is the author, this is the main title, this is the abstract. All that stuff is pretty straightforward. It's all the surrounding metadata to make sure you understand exactly where that's coming from. There's a date. Okay, is it coming from a pub date tag? Is it coming from a history tag? Is it system generated? Understanding where that is because it's all the peripheral metadata, whether it's the full text record or just the header record, that is so key and very noticeable by your end users,
48:00
whether it's a table of contents setting, your tagging for licensing, copyrights, permissions, subject information that you might have in there, as well as how are your author footnotes? What's the labeling style? What's all that? Those little details are what people tend to notice and find, because the full text is pretty much a dump of text. You have paragraph tags with a few stylistic tags, maybe math tags in there.
It's not as critical for search and other functions on the platform as your other metadata. I wish I had been a little bit more in tune with that early on. It sometimes can be difficult because the platform isn't built yet, and yet you're needing to make decisions on it. Understanding the mock-ups and engaging in a more in-depth conversation to understand how that data is applied on the platform will go a long way to helping perfect what your rendered output will be, as well as searching and indexing.
Betsy Donohue
Really great answer. Thanks, Rich. Anybody else?
Tracy Denien
To add to what Rich is saying, what I mentioned earlier about customizations is really thinking about what customizations are needed and trying to keep them at a minimum, because customizations add cost, time, and scope to your migration. Being thoughtful about what you really need and, again, listening to what the vendors are recommending based on what other publishers are doing on the platform, that will definitely help with your timeline.
Betsy Donohue
For sure. I don't want to cut you off, but to add on that, because I think it's right in line with where you're going. We have another question coming in that said how do you make sure that you make the most of the opportunities of the migration while sticking to a tight deadline? It's great to focus on quality improvements, but the reality is there's a lot of pressure to complete a migration, and that's probably where you were heading.
Tracy Denien
Yes.
Betsy Donohue
Yeah.
Tracy Denien
Another thing, if you do need customizations, is really understanding when you need them. Are they something that you need at launch, drop dead, you can't launch without it, or is it something that you can live without and circle back to post-migration? I think that also controls that time. Also, having clear accountability. We went from one migration where we had one person making decisions and, whether they were for good or for bad, it definitely streamlined things. With this migration, we did open it up to other stakeholders to contribute information. I think that was very good in making sure that we met everybody's needs, but in hindsight, I think too, just having clearly accountable decision makers, because otherwise some decisions tend to swirl a little longer than they should. Those are some things that I would recommend.
Betsy Donohue
Those are excellent.
Richard O'Keeffe
The circle-back point is very important because, working with DCL, our primary responsibility was they would highlight a discrepancy or disparity in the data and ask us, what do you want us to do? We would be able to tell them what to do, and they would go ahead and they would do it, which would mean that the content would be fine on the platform, but we had the benefit of them providing us with a listing of everything that was problematic. We know the decision, based on the ticketing system history, and we can go back and update our archive. In parallel, we can update our transform so anything new coming out will abide by that new compliant rule, but yet – maybe for 40 years' worth of data that might need to be changed, we can go back and do that on our own schedule and not disrupt the schedule for the launch.
Betsy Donohue
Great. Excellent add-on, Rich. We have one more follow-up.
52:00
Actually in the chat, a question, going back very quickly to the full text and header-only question. The question is are there any case studies comparing the monetization and profitability in that previous point? I think, if there are, the unspoken part of this question is can we direct folks to those case studies?
David Turner
I'll have to think about that. I bet I could find some information on that. Is this also Guy? Oh, yeah. Guy, contact me afterwards and let me see what I can find for you.
Betsy Donohue
Yeah. Wonderful. Thanks, David. Then, finally, a brief question, but we could probably talk about it for hours. There's lots of excellent answers and specific technical detail that was shared on this webinar, but if we could really boil it down organizationally, starting with AIPP, what have you gained from this migration? The features and the benefits covered, it's the next step. What does this allow AIP Publishing to do that they weren't able to do before? Tracy, do you want to start with that one?
Tracy Denien
Sure. As I said, we have more control, which is a big thing for us to be able to continue to move the platform ahead. I think also, for us, it's getting more access to our customer data. What we're looking forward to taking advantage of is some of the robust tools and data capabilities and analytics capabilities that the Silverchair platform offers. Our first challenge was getting migrated and having a very professional-looking site that was researcher-friendly. Now, it's going to be about us taking advantage of more of what Silverchair has to offer moving forward to continue to grow the platform.
Betsy Donohue
Wonderful. Great. Rich, your perspective? Anything to add there?
Richard O'Keeffe
No, I would just add on to Tracy's comments, it gives us a foundation on which to grow and evolve with our material. One of the offerings that Silverchair has is a tool known as Radiate that does content delivery. We only engaged with it for launch for industry standard delivery such as Crossref and PubMed, because it would've been too big a lift to deal with three dozen other customers and changing all the formats simultaneously with the launch and all the correspondence that's required for that. It gives us the opportunity to maybe explore and use that tool going forward as we evolve as an organization. Having that palette of tools available to us certainly increases our potential capability moving forward.
Betsy Donohue
Wonderful. Excellent. We're coming close to the end. We have two minutes to go, but I know that Stephanie wants to close up and spend a little time talking about some closing comments. I want to hand things back over to Stephanie. Thanks, everybody, for joining today. This was great.
Stephanie Lovegrove Hansen
Yeah. Thank you so much. It was so hard to stay in the background and not want to jump in. You guys really nailed it. This is a great discussion. Thank you so much, all, for joining. Thank you to all the attendees for participating in the Q&A and chat. I'm just here, again, to remind you that the recording of this will be available on the website later this week, along with the transcript. I'll also be emailing it to all the attendees and registrants. As well, one more plug. We'd love to see you in DC in September. The link for the registration and discount code are in the chat and will also be sent to you in a follow-up email.
Otherwise, thank you so much to our speakers and to all the attendees who've joined us for these virtual sessions. It's been really nice to connect with people from all over the globe throughout the year, at multiple points throughout the year, which has kind of been the benefit of having more virtual events in our lives in the last few years. Thank you again, everyone. Have a great rest of your day and week.
Tracy Denien
Thank you.
David Turner
Thanks for having us.
Betsy Donohue
Thanks.