Case Studies

Delivering customer success since 1981

Technologies Used

  • MathPix

  • Python

  • LaTeX

  • MathJax

  • ImageMagick

  • Java

  • JAPE

  • Perl

Project highlights

  • complex equations in image form automatically converted to MathML

  • improved equation discoverability for end users

“Go where your organization has not gone before with automated machine learning solutions to complex data problems. Our methodology combined with new AI techniques provide capabilities that were previously impossible just a few years ago!”

-Tammy Bilitzky, CIO, DCL

Supervised Machine Learning

A Deep Learning System to Transform Images Into Markup 

Keywords: machine learning, datasets, artificial intelligence, natural language processing

Background

Complex equations are challenging to typeset for print and digital publications. Many organizations fall back on setting equations as images in common graphics applications such as Adobe Illustrator or Photoshop. Although the resulting equation looks correct to a human reader, software cannot interpret the mathematical meaning behind the image. Equations set as images are also inaccessible to visually impaired readers who rely on assistive technology such as screen readers.

Solution

DCL built its training data on decades of XML content creation. From highly varied content, DCL extracted approximately 100K equation images, matched each one to its corresponding XML markup, and visually verified the pairs using a custom DCL verification tool. DCL engineers selected the Harvard image-to-markup framework, which is designed to recognize mathematical equations, and applied its statistical model to transform images of equations into LaTeX markup. The equations included both complex display math and inline math. DCL then trained the image-to-markup model on the verified training data. After extensive training, DCL raised equation transformation accuracy from 4% to 70%.
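The case study does not say how the 4% and 70% accuracy figures were measured. One common way to score image-to-markup models is exact-match accuracy: the share of test equations whose predicted LaTeX matches the reference token for token. The Python sketch below assumes that metric with a simple whitespace-insensitive tokenizer; the function names are hypothetical and are not part of DCL's tooling or the Harvard framework. Under a metric like this, 70% would mean that 7 of every 10 test equations were reproduced exactly.

```python
import re
from typing import List, Sequence

# Hypothetical tokenizer: a LaTeX command (\frac, \alpha, ...) is one token,
# every other non-space character is its own token, and whitespace is ignored.
_TOKEN_RE = re.compile(r"\\[A-Za-z]+|\S")


def tokenize_latex(latex: str) -> List[str]:
    """Split a LaTeX string into tokens, ignoring whitespace differences."""
    return _TOKEN_RE.findall(latex)


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose token sequence equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one reference equation")
    correct = sum(
        tokenize_latex(pred) == tokenize_latex(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Toy data: only the first pair matches after tokenization.
    preds = ["x^{2} + 1", r"\frac{a}{b}", "y_1"]
    refs = ["x^{ 2 }+1", r"\frac{a}{c}", "y_{1}"]
    print(f"{exact_match_accuracy(preds, refs):.2%}")  # 33.33%
```

This strict metric counts `y_1` and `y_{1}` as different even though they render identically, which is one reason comparisons of the rendered images are sometimes used alongside it.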

Results

The resulting model combines a convolutional neural network, which extracts visual features of the text and layout, with an attention-based neural machine translation system that generates the markup. DCL uses this model to convert equation images in complex technical journal manuscripts into LaTeX math. DCL also converted the LaTeX math to MathML, making the equations fully discoverable and accessible to visually impaired readers.
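In an encoder-decoder of this kind, the convolutional network turns the equation image into a grid of feature vectors, and the decoder emits LaTeX tokens one at a time, at each step weighting the image regions most relevant to the next token. The plain-Python sketch below shows only that attention step, using simple dot-product scoring. This scoring choice is an assumption made for illustration, not the exact formulation used by the Harvard framework or DCL's trained model, and the names and toy values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_grid(grid: Sequence[Sequence[Sequence[float]]]) -> List[Sequence[float]]:
    """Turn an H x W grid of CNN feature vectors into a list of H*W positions."""
    return [vec for row in grid for vec in row]


def attend(query: Sequence[float], features: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """One attention step (hypothetical dot-product variant).

    Scores each image position by its dot product with the decoder state
    (`query`), normalizes the scores with softmax, and returns the weights
    plus the weighted average of the feature vectors (the context vector).
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy 2 x 2 "image" with a 2-dimensional feature vector per position.
    grid = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 0.0], [-1.0, 0.0]]]
    decoder_state = [10.0, 0.0]  # hypothetical state that "looks for" feature [1, 0]
    weights, context = attend(decoder_state, flatten_grid(grid))
    print([round(w, 4) for w in weights])
    print([round(c, 4) for c in context])
```

Nearly all of the weight lands on the position whose features align with the decoder state, so the context vector passed to the decoder is dominated by that region of the image.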

The following examples show original equation images alongside the AI-predicted images:

[Figure: three examples of image-based equations transformed to MathML]
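The case study says DCL converted the predicted LaTeX math to MathML but does not say which tool performed that step (MathJax appears in the technologies list, but its role is not described). To show what the step produces, here is a deliberately tiny, hypothetical Python converter built on the standard library's `xml.etree.ElementTree`. It handles only letters, numbers, a few operators and Greek letters, `{}` groups, `\frac`, and `^`/`_` scripts; a production pipeline would rely on a full converter rather than anything like this.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical toy converter: supports letters, numbers, a few operators,
# a few Greek letters, {groups}, \frac{..}{..}, and ^ / _ scripts only.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+(?:\.\d+)?|\S")
GREEK = {r"\alpha": "α", r"\beta": "β", r"\theta": "θ", r"\pi": "π"}
OPERATORS = set("+-=<>(),")
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def leaf(tag: str, text: str) -> ET.Element:
    el = ET.Element(tag)
    el.text = text
    return el


class MiniLatexParser:
    def __init__(self, latex: str) -> None:
        self.tokens = TOKEN_RE.findall(latex)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_row(self, stop: Optional[str] = None) -> ET.Element:
        """Parse items until `stop` (or end of input) into an <mrow>."""
        row = ET.Element("mrow")
        while self.peek() is not None and self.peek() != stop:
            row.append(self.parse_scripted())
        return row

    def parse_group(self) -> ET.Element:
        self.take("{")
        row = self.parse_row(stop="}")
        self.take("}")
        return row

    def parse_scripted(self) -> ET.Element:
        """An atom followed by an optional _subscript and/or ^superscript."""
        base = self.parse_atom()
        sub = sup = None
        while self.peek() in ("_", "^"):
            if self.take() == "_":
                sub = self.parse_atom()
            else:
                sup = self.parse_atom()
        if sub is not None and sup is not None:
            tag, children = "msubsup", [base, sub, sup]
        elif sub is not None:
            tag, children = "msub", [base, sub]
        elif sup is not None:
            tag, children = "msup", [base, sup]
        else:
            return base
        node = ET.Element(tag)
        node.extend(children)
        return node

    def parse_atom(self) -> ET.Element:
        tok = self.peek()
        if tok == "{":
            return self.parse_group()
        if tok == r"\frac":
            self.take()
            frac = ET.Element("mfrac")
            frac.append(self.parse_group())  # numerator
            frac.append(self.parse_group())  # denominator
            return frac
        tok = self.take()
        if tok in GREEK:
            return leaf("mi", GREEK[tok])
        if tok.isalpha():
            return leaf("mi", tok)
        if tok[0].isdigit():
            return leaf("mn", tok)
        if tok in OPERATORS:
            return leaf("mo", tok)
        raise ValueError(f"unsupported token {tok!r}")


def latex_to_mathml(latex: str) -> str:
    """Convert a (tiny) LaTeX subset to a MathML string."""
    math = ET.Element("math", xmlns=MATHML_NS)
    math.append(MiniLatexParser(latex).parse_row())
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(latex_to_mathml(r"\frac{\alpha}{2} + x_1^2"))
```

For example, `x^2 + y_1` becomes an `<mrow>` containing `<msup>`, `<mo>+</mo>`, and `<msub>` elements. Unlike the original image, this markup exposes the structure of the equation (which part is a fraction, which is a superscript) to search indexes and screen readers.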

61-18 190th Street, Suite 205

Fresh Meadows, NY 11365

+1 718.357.8700

info@dclab.com

© 2021 Copyright Data Conversion Laboratory, All Rights Reserved.