
DCL Learning Series
DCL's Content Crystallizer - 2025 SSP Innovation Showcase
Dave Myers (CEO, Data Licensing Alliance)
Now, I'm pleased to introduce our first panelist, David Turner, Publishing Automation and Digital Transformation Specialist, Data Conversion Laboratory.
David Turner
All right. Howdy, everybody. And thanks so much to SSP for giving me this opportunity. As Dave said, I'm David Turner, and I'm a digital transformation consultant and head of partner relationships at Data Conversion Laboratory, also known as DCL. I am proud to present our latest solution, Content Crystallizer, which essentially helps a scholarly publication, whether it's a journal article, a book chapter, a conference proceeding, or what have you, to come together, to take shape, to crystallize into a publisher's desired form and format, with the goal of letting you go from Word to XML in three easy steps.
So before I get too much further, I do have a quick poll, and I'm not exactly sure how this whole poll thing works, but I think Susan pushes it out there. Yes, we want to know: does the eXtyles "end of life" announcement affect you? "Yes," "No," or "What the heck is eXtyles?" The reason that we're asking this is that Content Crystallizer does a lot of things that are similar to what eXtyles does. It actually came about in response to the "end of life" announcement. So we're just interested to see what we're working with here. And Susan, I'll just wait for you to post the results whenever you want to say, "Yep, this is done."
2:09
All right. We got a whole bunch of people that don't know what eXtyles is. That's great. And others that yes, have been affected by this. A small percentage of "No"s. Okay, well, let's just dig in. So for those of you who don't know, eXtyles is a Word plugin that was developed by a fantastic person in the industry, Bruce Rosenblum, who sadly passed away a couple of years ago. And it's a tool that's been used successfully in the industry for many different things, right? Auto-styling documents, cleaning them up, fixing reference formats, and then taking that and converting the content into XML. Bruce's company was sold to Atypon a couple of years ago. And just recently, it was announced that they're going to be retiring this tool.
So one of our partners came to us and said, "Hey, do you guys have any kind of tool that could help some of these people out that are looking for help?" And we don't have any kind of Word plugin, but I got to looking around and thinking, you know, we actually do have a lot of software in our existing processes. And lo and behold, it does a lot of the same things. It's not a Word plugin, but it is commercial-quality software that we use. And it's not a single piece of software. It's software that we've used internally, and, you know, it's not really been configured for external use. So we got to thinking, you know, is there a way that we could do this in a self-service manner? Because obviously we could just tell people, "Oh, send it to us, and we'll do it as an outsourced service." I mean, DCL does that kind of thing all the time. But this is not what that's about.
4:00
We're looking to try to replicate something where someone has a do-it-yourself kind of solution, something that is fully automated, at least from the DCL part of the process, where they can get results in minutes as opposed to days or hours or whatever it is that we do in another process. So we came up with this idea of a content vending machine, if you will, right? Where the software exists on our servers. You put in your content. Some automation runs, and a few minutes later, you get back results. You can make some changes, run it again, get results. So it wasn't a very heavy lift for us because we weren't really creating new software. We have just been putting existing tools together in a new way, and that's what we've got. And it's resulted in this three-step process on this next slide, where basically you come in and you start by getting the document prepped, and that sort of kicks off the crystallization process.
Then step two, you let the editor make some fixes, work with the authors to make some edits, and then ultimately load it up again. And step three then takes you actually from the Word document into XML. So at a little deeper level, we start with a Word document, a raw Word document. It doesn't have to be styled or anything. It could be a journal article, book chapter, what have you. But the idea is that the author then loads that as a zip file with a manuscript, supplemental files, and some basic metadata, and that kicks off this automated doc prep, which does a ton of different things, right? So it's intended to really start by applying styles automatically to all the different elements of the document: tables, headings, author names, affiliations. It goes through and it cleans up spacing problems. Removes blank paragraphs.
6:00
Puts metadata in the right places, checks to make sure the metadata is there. Looks up things on PubMed, Crossref, or a peer review system. It checks citations for completeness, checks the order of citations, identifies unrecognized styles, does all of these things, and then ultimately delivers back a styled Word document. And we've tried to match the styles, look, and feel, also with comments and error reporting integrated into that Word document. So the document is starting to crystallize. It's gone from raw into something that is styled, cleaned up, and ready to work with. At this point, the editor then starts to work with it. They can incorporate whatever comments they like, they can override elements as they like, and they can share back and forth with an author.
And when they're done, they can either run it back through auto prep again and have it do some additional checks, or they can move along and select the actual conversion option and take this thing on to XML. Now, before it does the full XML, the Word-to-XML process starts by doing a check to make sure no errors have been introduced. If errors have been introduced, we send back the document with a report. But once it does pass that step, then it moves into an automated conversion process. It converts the document to XML. It takes that XML, it parses it against the DTD, and then it runs a series of additional checks on the XML to make sure that all of these things, all of the styles, everything that the client has set up, does match and that they're getting really useful XML. The whole process takes, you know, minutes. It's, you know, five to ten minutes for part one, and it's like five to ten minutes for part two.
7:57
If anything fails along the way, we send it back to the client with suggestions and comments, but ultimately the result is you get this valid and useful XML. Now, I can't go into all the details in the time that we have here today, but I did go ahead and put together a list. I violated my PowerPoint rule about the amount of text on a slide just so I could help you. Anyway, this is by no means a comprehensive list, but here are some of the things the auto doc prep is doing. It's using AI to auto-apply these Word styles to all the different elements of the article. It's doing things like verifying that citations have a match.
If there's a citation listed at the end, is there a matching citation in the article? Are there titles where there are supposed to be titles? Do we have the author's name? Do we have, you know, the affiliations? And then, depending on what the client wants us to do, we can either fix certain things automatically or we can suggest the fix in the comments. Finally, it does things like look for things that are out of place. So for example, we found an author name in the middle of a table, or an abstract that was in the completely wrong place in the document, or an image floating somewhere. So it does all of those. And then on the XML side, it goes through and, like I mentioned before, checks for that suitability first. Then it goes through and actually creates the XML and parses it to make sure that we've got valid XML, right?
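[Editor's illustration] The citation-matching check described above can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not DCL's implementation: it assumes numbered in-text citations of the form [3], and the function name `check_citations` and the report keys are hypothetical.

```python
import re


def check_citations(body_text, reference_numbers):
    """Compare numbered in-text citations such as [3] with the reference list.

    Assumption: citations are written as a single number in square brackets.
    """
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body_text)}
    listed = set(reference_numbers)
    return {
        # References listed at the end that are never cited in the article
        "uncited_references": sorted(listed - cited),
        # Citations in the article that have no entry in the reference list
        "missing_references": sorted(cited - listed),
    }


report = check_citations("Prior work [1] was extended in [3].", [1, 2, 3])
print(report)
```

A report like this could either drive an automatic fix or be written back as a comment in the Word document, depending on what the client has asked for.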
But more importantly, it goes in and it actually checks to make sure we don't just have valid XML, but we have good XML, right? So you could technically have an opening abstract tag, then the text "insert abstract here," and then a closing abstract tag. That would parse, but that wouldn't be useful XML, so we can check all of those things. There are a ton of checks I didn't list here, and we can also customize checks too. So anyway, I'm pushing up against my 10 minutes. So that's going to take me to my last slide. And if I can actually click the button correctly... While you're scanning the QR code, let me just quickly summarize.
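[Editor's illustration] The difference between XML that parses and XML that is useful can be shown with a small sketch. It is an assumption-laden example rather than DCL's actual check: the tag names, the placeholder phrases, and the function `find_unhelpful_elements` are hypothetical, and Python's standard parser only confirms well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical phrases that signal placeholder content
PLACEHOLDER_PHRASES = ("insert abstract here", "lorem ipsum")


def find_unhelpful_elements(xml_text, tags=("abstract", "title")):
    """Flag elements that parse correctly but carry no real content."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    for tag in tags:
        for elem in root.iter(tag):
            # Collapse all nested text and whitespace into one lowercase string
            text = " ".join("".join(elem.itertext()).split()).lower()
            if not text:
                problems.append(f"<{tag}> is empty")
            elif any(phrase in text for phrase in PLACEHOLDER_PHRASES):
                problems.append(f"<{tag}> contains placeholder text")
    return problems


sample = (
    "<article><title>A real title</title>"
    "<abstract>Insert abstract here</abstract></article>"
)
print(find_unhelpful_elements(sample))
```

The sample document parses without error, yet the check still reports the abstract, which is the kind of gap a parse against the DTD alone would not catch.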
10:02
We provide similar functionality to eXtyles, but it's not a Word plugin like eXtyles users are used to using. It's a fully automated solution, and it's really geared to help that unstructured Word document crystallize into a structured Word document, and then that structured Word document crystallize into useful XML. It can be customized in all sorts of ways: how things are ingested, what the styles are, what automations you want, what checks you want, what XML model you need, how you want things reported. And it doesn't just have to be customized at the publisher level. It can also be customized at the publication level. So if you have, you know, a couple of journals that you'd like treated different ways, we can accommodate that. Or if you have, you know, books that are maybe in different product sets, we can accommodate that as well. In any case, that's our presentation. Thanks so much for listening. And thank you, SSP, for this opportunity.
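[Editor's illustration] Customization at both the publisher level and the publication level can be pictured as defaults plus overrides. The sketch below is a hypothetical layout, not the product's configuration format; every setting name, value, and journal identifier is made up for illustration.

```python
def resolve_settings(publisher_defaults, publication_overrides, publication_id):
    """Start from publisher-level defaults, then apply publication overrides."""
    settings = dict(publisher_defaults)  # copy so the defaults stay untouched
    settings.update(publication_overrides.get(publication_id, {}))
    return settings


# Hypothetical publisher-wide defaults
defaults = {
    "xml_model": "publisher-dtd",
    "auto_fix": False,
    "reference_style": "numbered",
}

# Hypothetical per-journal overrides
overrides = {
    "journal-a": {"auto_fix": True},
    "journal-b": {"reference_style": "author-date"},
}

print(resolve_settings(defaults, overrides, "journal-a"))
print(resolve_settings(defaults, overrides, "journal-b"))
```

Keeping the overrides small means each journal only states where it differs from the publisher-wide setup.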
Dave Myers
This is for DCL. What should current eXtyles customers do to prepare for a transition to Content Crystallizer in the next year?
David Turner
So I think what most people need to do on the front end is really try to establish how long they want to try to continue using the eXtyles tool. Number two, I think that they need to consider whether they want to keep doing this themselves, based on their volume. For some of the customers we've talked to, it actually makes more sense to move to a services model. And then number three, I think it's exploring options with some of the providers that are out there that are doing similar things to DCL. We'd be happy to talk to anybody about what they're doing and their process. It's a robust setup process that we can do with you, and we can really turn this into something that's customized at a high level for your team.
12:07
Dave Myers
The question is: Did DCL say that styles, spellings, et cetera can be customized for each publisher? If a publisher has multiple journals, can different styles be applied across journals?
David Turner
Yes, I did say that we can customize those things, and yes, they can be customized individually by publication. It does not have to be just all one type for a publisher.
This video is part of the Society for Scholarly Publishing (SSP)'s Summer 2025 Innovation Showcase.
