Case Studies

Delivering customer success since 1981

Technologies Used

Pattern detection software
Automated QC software
XML

Project highlights

6,000,000 articles processed
100,000,000+ XML references delivered ahead of schedule
Fully automated 24/7 conversion process

“Scopus is adding cited references to more than 10 million pre-1996 articles, resulting in a wealth of historic citation data which is of great use to our users. At this scale, capabilities to process data in an automated way and at high precision are essential. DCL has proven that it has these capabilities and we are happy to continue our collaboration to convert the digital archives of different publishers for Scopus.”

-Wim Meester, Head of Product Management for Content Strategy, Elsevier

Elsevier

Content Standardization for Elsevier's Scopus Repository

Keywords: Scopus, data enrichment, XML, pattern detection, automated QC, CrossRef, PubMed

Background

Elsevier’s Scopus database is the largest abstract and citation database of peer-reviewed literature, comprising scientific journals, books, and conference proceedings.

Elsevier wanted to improve search capabilities by increasing the content coverage and link density of the Scopus database, which included archived content from 1996 onward. Classifying content and establishing meaningful, contextual links within and across content sets improves search precision, recall, and navigation.

Solution

DCL onboarded, inventoried, and processed millions of bibliographic records with widely varying levels of structure and quality, using a fully automated custom technology stack that ran 24/7.

DCL worked with Elsevier to plan the normalization of the structured and unstructured data. Once the data was normalized, the automated process transformed and enriched XML records from various archives, in various formats, to match Elsevier’s XML schema. Workflows were created to match, verify, and overlay related content from the CrossRef and PubMed databases.
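
As a rough illustration of the match-and-overlay idea described above, the sketch below normalizes a reference's title, looks it up in an index of authority records, and copies identifiers onto the reference when a match is found. It is not DCL's actual process, which the case study does not detail. The field names (title, year, doi, pmid), the title-plus-year matching key, and the sample records are all assumptions made for the example.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Hypothetical matching key: normalized title plus publication year."""
    return (normalize(record["title"]), str(record["year"]))


def build_index(authority_records):
    """Index authority records (e.g. from an external database) by key."""
    return {match_key(rec): rec for rec in authority_records}


def overlay(reference, index):
    """Return a copy of the reference, enriched if a match is found."""
    hit = index.get(match_key(reference))
    if hit is None:
        return dict(reference, matched=False)
    enriched = dict(reference, matched=True)
    # Only fill identifiers the reference does not already carry.
    for field in ("doi", "pmid"):
        if field in hit and not enriched.get(field):
            enriched[field] = hit[field]
    return enriched


# Made-up sample data, for illustration only.
authority = [
    {"title": "A Sample Study of Étude Methods", "year": 1987,
     "doi": "10.9999/hypothetical.1"},
]
index = build_index(authority)

raw_reference = {"title": "a sample study of etude methods.", "year": "1987"}
result = overlay(raw_reference, index)
unmatched = overlay({"title": "Some Other Paper", "year": 1990}, index)
```

An exact key lookup like this only catches references that differ in case, accents, or punctuation. A production workflow would likely need fuzzier comparison and a separate verification step before overlaying data.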

Result

The unique reference matching process developed by DCL improved the quality of the final XML data while still supporting validation against the Elsevier schema and business rules. The conversion process for the DCL Scopus Cited Reference Expansion Program ran 24/7 across 30 processing servers and delivered high-quality output quickly and efficiently.