Case Studies
Delivering customer success since 1981
Project Statistics
- 1,104,773 files analyzed for platform migration
- 29,464,902 MathML formulae analyzed and updated
- 36,036,951 reference citations analyzed, parsed, and structured
- 47.3 GB XML journal content and 1.7 TB assets
- 142 MB XML book content and 25.7 GB assets
Technologies Used
- DOM
- XPath
- Regular Expressions
- Custom Algorithms
- Exif tools
- PDF tools
- Ghostscript
- XML
What our customers are saying
“Modernizing our content platform was critical to maintaining the pace and professional publishing environment at AIPP. Additionally, we recognized that this massive undertaking would provide deep insight into issues in our content structure that impact downstream discovery. What we did not anticipate was the amount of content and assets that required XML updates that DCL demonstrated would improve our content performance.”
Ann Michael
Chief Digital Transformation Officer
AIPP
AIP Publishing
Platform Migration to Silverchair
Keywords: content structure analysis, Content Clarity, conversion, platform migration, Silverchair, XML, JATS
Background
AIP Publishing (AIPP) is the publishing subsidiary of the American Institute of Physics with a portfolio of highly regarded, peer-reviewed journals, including a growing collection of Open Access titles, that cover all areas of the physical sciences. AIPP publishes 30 journals including the most influential physics magazine in the world—Physics Today. The research AIPP publishes in its extensive collection of books, journals, and conference proceedings paves the way for new fields of study, gives rise to new techniques, and provides inspiration to contemporary researchers.
To better support its mission, AIPP has made significant investments in technology, including an upgrade to a powerful platform that can properly support its massive content collection.
Solution
AIPP selected the Silverchair Platform as the core technology for its growing digital library collection of journal articles, books, and conference proceedings. Faced with what could have been the daunting task of transferring, restructuring, and checking over a million files of complex scientific content, AIPP selected DCL to manage the migration to Silverchair.
The scope of the project was massive due to the amount of content that had to be analyzed, converted, and delivered to Silverchair: 47.4 gigabytes of XML and 1.73 terabytes of assets in total, counting both journal and book content!
Fortunately, DCL has developed tools specifically to handle very large collections, starting with Content Clarity, which audits content collections and provides detailed analytics on all the key components. No link could be overlooked, and no asset could go missing! Content Clarity validated the XML files and health-checked the corresponding digital assets (e.g., verifying that every image has at least one callout in the XML and that every callout has an image). Findings from the analysis were grouped into categories: Summary Analytics, and Errors and Warnings. The issues identified in the XML structure provided the roadmap for converting AIPP’s entire collection to Silverchair.
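The image and callout cross-check described above can be illustrated with a short Python sketch. This is not Content Clarity itself, only a minimal illustration under stated assumptions: the XML is assumed to be JATS-style, with callouts expressed as graphic or inline-graphic elements carrying an xlink:href attribute, and the function name check_callouts and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute (assumed JATS-style callouts).
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def check_callouts(xml_text, asset_names):
    """Return (callouts with no matching asset, assets with no callout)."""
    root = ET.fromstring(xml_text)
    # Collect every graphic callout that carries an xlink:href value.
    callouts = {
        el.get(XLINK_HREF)
        for el in root.iter()
        if el.tag in ("graphic", "inline-graphic") and el.get(XLINK_HREF)
    }
    assets = set(asset_names)
    return sorted(callouts - assets), sorted(assets - callouts)


# Hypothetical article with two figure callouts.
sample = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="f1"><graphic xlink:href="fig1.tif"/></fig>
    <fig id="f2"><graphic xlink:href="fig2.tif"/></fig>
  </body>
</article>"""

# Hypothetical asset folder: fig2.tif is missing and fig3.tif is never called out.
missing_assets, orphan_assets = check_callouts(sample, ["fig1.tif", "fig3.tif"])
print(missing_assets)  # ['fig2.tif']
print(orphan_assets)   # ['fig3.tif']
```

Both directions of the check fall out of two set differences, which is why the sketch stays small even though a production audit would also need to handle file-extension variants and supplementary material.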
Examples of the structural issues that the pre-conversion analysis revealed included the following:
- MathML: Physics content contains many complex equations. AIPP had more than 29 million math formulae in its content. DCL updated both invalid and valid MathML to the Silverchair MathML Schema to ensure proper rendering and discovery.
- Citations: Content included multiple citations in a single reference. DCL’s conversion process identified and broke up more than 18 million citations into individual references, updating all xrefs accordingly.
- Subject Headers: DCL normalized subject headers and contrib-types as provided by AIPP.
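As a rough illustration of the citation fix listed above, the following Python sketch splits a reference that holds several citations into one reference per citation and repoints the xrefs. It is not DCL’s conversion process. It assumes JATS-style ref-list, ref, mixed-citation, and xref elements; the function name, the "-1", "-2" ID suffix scheme, and the sample citations are invented for the example, and details such as reference labels are ignored.

```python
import xml.etree.ElementTree as ET


def split_multi_citation_refs(root):
    """Split each <ref> holding several citations into one <ref> per citation."""
    id_map = {}  # old ref id -> list of new ref ids
    for ref_list in list(root.iter("ref-list")):
        new_children = []
        for child in list(ref_list):
            citations = child.findall("mixed-citation") if child.tag == "ref" else []
            if len(citations) < 2:
                new_children.append(child)  # nothing to split; keep as-is
                continue
            old_id = child.get("id")
            new_ids = []
            for n, citation in enumerate(citations, start=1):
                # Hypothetical ID scheme: r1 becomes r1-1, r1-2, ...
                new_ref = ET.Element("ref", {"id": f"{old_id}-{n}"})
                new_ref.append(citation)
                new_children.append(new_ref)
                new_ids.append(new_ref.get("id"))
            id_map[old_id] = new_ids
        ref_list[:] = new_children
    # Point each xref at every reference its old target was split into.
    for xref in root.iter("xref"):
        rids = xref.get("rid", "").split()
        if any(r in id_map for r in rids):
            xref.set("rid", " ".join(i for r in rids for i in id_map.get(r, [r])))
    return id_map


# Invented sample data: r1 bundles two citations, r2 holds one.
sample = """<article>
  <body><p>See <xref ref-type="bibr" rid="r1">1</xref>.</p></body>
  <back><ref-list>
    <ref id="r1">
      <mixed-citation>A. Author, J. Phys. 1, 1 (2001)</mixed-citation>
      <mixed-citation>B. Author, J. Phys. 2, 2 (2002)</mixed-citation>
    </ref>
    <ref id="r2"><mixed-citation>C. Author, J. Phys. 3, 3 (2003)</mixed-citation></ref>
  </ref-list></back>
</article>"""

root = ET.fromstring(sample)
print(split_multi_citation_refs(root))          # {'r1': ['r1-1', 'r1-2']}
print(root.find(".//xref").get("rid"))          # r1-1 r1-2
print([r.get("id") for r in root.iter("ref")])  # ['r1-1', 'r1-2', 'r2']
```

Keeping a map from old IDs to new IDs is the design point worth noting: the split and the xref update are separate passes, so no in-text pointer is left dangling after a reference is broken up.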
Result
In less than one year, DCL converted and structured this entire content collection for a frictionless flow into the Silverchair Platform. This project was the largest set of content ever uploaded to Silverchair. AIPP will house its entire library of physics content on a state-of-the-art technology platform with the additional benefits of content structured to an updated version of the JATS XML standard. Gathering everything in one place also allowed AIPP to work with DCL to review content metadata (as reported by Content Clarity) and update to the latest taxonomies and ontologies.
Most importantly, AIPP’s commitment to the scientific community is fortified and architected for decades of future success. AIPP now ensures that all findings to advance the physical sciences are presented, promoted, and permanently available as the building blocks of future discoveries.