DCL Learning Series

Illuminate the Blind Spots in Your Content Strategy with Harmonizer

Patricia Grindereng

Stay informed, as always. And today's webinar is "Illuminate the Blind Spots in Your Content Strategy with Harmonizer," presented by Christopher Hill with DCL. Welcome.

Christopher Hill

Thank you, Trish. Let me just share my screen here. So I think you're seeing it. So today I wanted to share with you one of the tools we use at DCL in order to illuminate blind spots in your content strategies. So we have a number of techniques for doing this, but some of them we've made available outside of our organization, so they aren't just part of our own services. And I'll talk a little bit more about that later. But a little bit about me. I'm Christopher Hill, I'm a Technical Product Manager as well as Project Manager at Data Conversion Laboratory. And again, I'll share a little more about my company a little bit later on. My background really goes back to the early days of XML. I left public school teaching and went into corporate training in the late nineties and ended up in a company that did XML training courses. So for many years I developed and was on the ground floor of XML and the rise of XML. And things have changed a lot since then but I'm still talking about the same old themes. So here we are today.

So let's talk a little bit about what I'm going to be focused on today, and that's really a tool called Harmonizer. And one of the things we do is content conversion and projects around updating content. And over the years we developed tools to help us do this internally. So things like being able to automate these conversions or find things in the content that might be hard for a human to find. And Harmonizer really emerged from that effort and was productized a number of years ago. And then I have carried on with that product going forward. And it's offered in a number of ways that we'll talk about a little bit later. But it's software that basically takes any document collection.

It can work with multiple formats at one time. So that means things like office documents, PDF files, DITA source files, HTML. You can feed it text documents if you want. And then because we're a conversion company, we can support all kinds of other formats, as well as any XML format that's out there or really any markup language. But what you do is you feed all this stuff into Harmonizer and Harmonizer will take and identify all the text blocks and then it does a massive many-to-many comparison. So it says, "I'm going to look at all those text blocks and group them together by which ones are exact matches and also which ones are closely matching. So I will give you a whole bunch of match groups that give you groups of text that occur throughout your content that are either duplicate or near-duplicate." So what we use this for is we can identify redundant content. That can help us with migrations. Sometimes we can create some business rules around ways to, say, automatically create some topics during the migration, if you were going from something like Word to DITA, for example.
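To make the idea of match groups a little more concrete, here is a minimal sketch in Python of grouping text blocks into exact and near-duplicate groups. This is not Harmonizer's actual algorithm, which the talk does not describe. The function names, the whitespace and case normalization, the 0.85 similarity threshold, and the use of the standard library's `difflib.SequenceMatcher` are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def match_groups(blocks, threshold=0.85):
    """Return groups of block indexes that are duplicates or near-duplicates.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    # Step 1: exact matches. Blocks with identical normalized text share a bucket.
    exact = defaultdict(list)
    for index, block in enumerate(blocks):
        exact[normalize(block)].append(index)

    # Step 2: near matches. Compare every distinct text with every other one
    # (the many-to-many comparison) and merge similar texts with union-find.
    keys = list(exact)
    parent = list(range(len(keys)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            if SequenceMatcher(None, keys[a], keys[b]).ratio() >= threshold:
                parent[find(b)] = find(a)

    # Step 3: collect the original block indexes under each merged root.
    groups = defaultdict(list)
    for key_index, key in enumerate(keys):
        groups[find(key_index)].extend(exact[key])

    # Only groups with more than one member represent redundant content.
    return [sorted(members) for members in groups.values() if len(members) > 1]


blocks = [
    "Press the power button to start the device.",
    "Press the power button to start the device.",
    "Press the power button to start the unit.",
    "Store in a cool, dry place.",
]
print(match_groups(blocks))
```

In this sketch the first two blocks are an exact match, the third is a close match to them, and the fourth stands alone, so the three similar blocks end up in a single match group. The pairwise loop compares every distinct text with every other one, which is fine for a small example but would need indexing or hashing tricks to scale to a large document collection.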