DCL Learning Series

Publishing at the Speed of Science: A Lunchtime Conversation With Bill Kasdorf

Marianne Calilhanna (00:05)

Hello, and welcome to the Data Conversion Laboratory Learning Series. Today, we are thrilled to host a lunchtime conversation with publishing luminaries Mark Gross and Bill Kasdorf. My name is Marianne Calilhanna. I'm the Vice President of Marketing at Data Conversion Laboratory. And I will be your moderator today. Before I turn it over to Mark and Bill, I do want to encourage everyone to submit any questions or comments via the Q&A box. We are recording this webinar and it will be available in the on-demand webinar section of the DCL website.

Before we begin, I would like to provide a short introduction on Data Conversion Laboratory, or DCL, as we are also known. Our mission is to structure the world's content. Content can unlock new opportunities for innovation and monetization when it has the foundation of rich structure and metadata. For content to be easily discovered and used across various platforms, it should be converted into an XML format from which machines can extract pertinent information. DCL's services and solutions are all about converting, structuring, and enriching content and data. We are one of the leading providers of XML conversion services, DITA conversion, structured product labeling conversion, and S1000D conversion. Many people are well aware of our excellent content conversion and transformation services, but we also do a lot of work in the other areas listed on this slide: entity extraction, third party validation of previously converted content, semantic and metadata enrichment, data harvesting, or web scraping, content reuse analysis, and structured content delivery to industry platforms. If you have complex content and data challenges, we can help.

(02:15)

Now, before we begin, I would like to launch a quick poll just so we understand what your role is in the industry. So I'm going to invite you to simply answer the best you can from the, from the options listed below. What is your role in the industry?

(02:42)

And I'll give just a minute for that. And while you're submitting your answer, I do want to let everyone know that Bill and Mark have not scripted their chat, because we really wanted to have a natural conversation about publishing: about publishing not only during a pandemic, but also during a time of great change for our industry. All right, I'm going to close the poll and just share the results. So everyone knows who we have with us today. So almost half of you are from the scholarly publishing industry. Looks like we have a, some educational publishers, legal trade publishers, and some service providers. So welcome, everyone.

All right. It really feels funny for me to introduce two gentlemen that truly need no introduction. Mark Gross is President and Founder of DCL. He established Data Conversion Laboratory 39 years ago and is an industry leader in structuring content and digitization practices. Bill Kasdorf is Principal of Kasdorf & Associates and a Founding Partner with Publishing Technology Partners. He's an expert on accessibility, XML, HTML, EPUB modeling and specification, standards alignment, and workflow. Thank you both for joining me for lunch. Mark, I'm going to turn it over to you.

Mark Gross (04:29)

Okay. Thank you, Marianne. I'm really excited about this and I'm really, it's really wonderful to welcome Bill to our, to our webinar series. I, I, and it's unscripted, all the ins and outs are because of that. I want to start by, Bill, we actually hadn't seen each other in a long time. In the last few days we've only seen each other on video, which is interesting because you know, being able to be on these video conferences means you can see people you don't see otherwise for, except for conferences every once in a while. But tell me, Bill, tell us a little about what you're up to these days and you know, what's, what's, what's going on in your life.

Bill Kasdorf (05:24)

Sure. Thanks. Nice to be here with you, Mark. It's been, we've known each other for a long time. I will, I can't resist mentioning that I did a book many, many years ago called The Columbia Guide to Digital Publishing. Guess who I got to write the chapter on data conversion: Mark Gross. So we go way back. So in terms of what I'm doing recently, as Marianne mentioned, you know, most of my consulting work centers on things like editorial and production workflows, and content modeling, metadata, all those kinds of geeky things. I was talking to my PTP colleagues the other day. And as we got an inquiry on a publisher that wants our help, and one of my colleagues mentioned, boy, it maybe sounds like they need, maybe want to get into the weeds. And I said, well, that's good. That's what I do. I do weeds. Right.

But the other thing I always say I do is, I always used to say "and accessibility," and I've tried to stop saying that because accessibility is really part of all of that. So any of those engagements, I always factor accessibility in. And then one other comment I might make, Mark, that I think might be relevant is that, you know, why do, why have clients come to me? What are they looking for? Fundamentally, I think the main driver is that they realized that they're kind of stuck in an old print-centric workflow, a book-centric workflow, a journal article-centric workflow. And they realized that they really need to be thinking digital first. And that may sound obvious now, and we've been talking about this for years, but frankly it isn't obvious because, you know, editors, production people, et cetera, it's really hard for them to envision their content in a different way than this kind of chapter-based or article-based format. So, so anyway, so that's what I do. It's very fun. I'm a busy guy.

Mark Gross (07:36)

A bunch of things there. I guess one, you know, we sort of said, when you said, you know, the 700-pound gorilla today is COVID and that's what everybody's talking about. And we agreed that that wasn't going to be the centerpiece of our conversation, but it's probably the backdrop to a bunch of things that are going on over here. So I think just in terms of publishing and, and the, the title of this webinar, "Publishing at the Speed of Science," one of the things that's going on right now, and COVID has exacerbated it, is this whole idea of getting science out as soon as you know anything, preprints, even earlier than preprints, first concept thoughts. And it's become, you know, in the last few months it's been like, how come we don't know any more? How come, well, I just looked at it yesterday. So how is that affecting, how is that affecting what's going on? What are you seeing?

Bill Kasdorf (08:32)

Well, yeah, that's, you're absolutely right. And basically, I think one thing that's notable about that is that, you know, the, particularly the journal publishing process hasn't suddenly changed. So, you know, preprints have been around since arXiv, you know, which was originally at Los Alamos for particle science, right. Particle scientists. So they've been depending on preprints for decades, really, but, you know, particle science to most people, that's not the real world, right? That's not, that's not where I live, but now it's, all of science is doing this and has been for a while, but suddenly the public is paying attention to the damn preprints. And, you know, I read a comment recently, sorry about my language there, but you know, preprints are for professionals. They're designed for your professional colleagues, so they know what you're working on and they can comment on it, et cetera.

(09:37)

And this is a very good development, but the big problem is the public doesn't understand how to interpret those, those preprints. And doesn't realize that, that the results have not been vetted and verified like a peer-reviewed published article. So that's a, that's the, he's making the point that, you know, the, the, the current journal publishing workflow hasn't actually changed, although it's accelerating. I've got, one of my, well, actually a former client that I've done some work for, not currently, is The New England Journal of Medicine. And a current client is Annals of Internal Medicine. You can just imagine what those people are going through right now. They're just unbelievably, simultaneously, they've got a fire hose of content coming at them, and they've got incredible pressure to get stuff out right now. Annals, you know, I was just blown away when I first started working with Annals, they have a database of 25,000 peer reviewers.

(10:39)

And the process they use to vet an article is incredible: how many layers of analysis and how many steps they go through. And at the end of it, it just is a kind of provisionally accepted manuscript. And they also have a staff of statisticians, actual in-house statisticians. I think they've got seven people that validate the statistics in every article. So, I mean, this is really, really thoroughly vetted content. How do you do that in two days when you've got pressure to get something up, right? It's crazy. And they've got hundreds and hundreds of things coming in every day.

Mark Gross (11:17)

And yeah, and also, I mean, a few of our clients we've, we've helped them identify COVID-related articles that they had in their banks. I mean, this is –

Bill Kasdorf

Yeah.

Mark Gross

...this coronavirus is not a new concept. It's been around for, for years. It's just this is a novel virus. So, so we help them identify it and we set it up so they can go right online with them. This is, could never have done it 10, 15 years ago because you didn't have the right metadata on it. It all depends on these new publishing techniques and being able to –

Bill Kasdorf

Right.

Mark Gross

...get ready for that. So I, you know, how soon can an article really get out there? It used to be years. Now it's been condensed, but there's still things that need to happen. I mean, can you really speed up publishing that much?

Bill Kasdorf (12:08)

Well, I think it actually is getting sped up, you know, this, this coronavirus, COVID situation has just made something that was kind of about to happen, have to happen. Right? And so you know, I read recently that, I can't quote the name of the initiative, but it's a group of basically scientific experts in a variety of fields that are being assembled to be able to provide a kind of, I don't want to say instant or immediate, but incredibly rapid vetting of papers for the news industry so that the news industry can, can say, well, here's this preprint that came up; instead of just jumping to some conclusion about the preprint, run it by these folks and, you know, in a day or two, they're going to get back to you about, you know, what can you say about that?

(13:09)

Or what can't you? That's a pretty cool thing. It's not, that's not still the full traditional peer review that presumably that article will still go through. It's not a substitute for it, but I think, I think that's really interesting. One of the other, you know, you may know that I'm involved in lots of different standards organizations, and the one that most people probably on this call haven't heard of is IPTC, which is the International Press Telecommunications Council. It's all the big news agencies and news organizations around the world. And, you know, they're dealing with this problem from the, from the other side of the screen, as you, as you might say, that, you know, they're getting deluged with this stuff and they've got to sort out, you know, what can they say that's responsible, and they still need to get the news out, right?

Mark Gross (13:56)

Right. Right. And, and, you know, it's, it's a, it almost sounds like what you're describing here is like taking crowdsourcing to a much more serious level. I mean, it's almost like –

Bill Kasdorf (14:08)

It's a validated crowd, right.

Mark Gross (14:11)

It's a validated crowd, but, you know, we've seen that even unvalidated crowds can get very accurate. But I think it's gotten gamed a lot, but I remember when they did some analysis of what was going on at Wikipedia, they found a lot of accuracy based on just editors who were almost self-selecting. But, but now it's gotten a little gamed, there are organizations like going out there and doing all kinds of stuff, but, but you know, the, the communications capability you've got here, and the fact you've got everybody on computers, everybody on broadband, you've got, you know, billions of people out there. It means that even if you select carefully, you still have millions of people out there. So that really could change things a lot, I would guess.

Bill Kasdorf (15:00)

Yeah. You bet. Of course another thing that surfaces is, think of all the millions of people who don't have access, you know, all the people in the digital deserts and the, you know, the global south that, you know, it's, it's not as easy as it is for you and me.

Mark Gross (15:19)

Right. So how big is that global desert? I mean, we, we've, we hear about the, you know, the, the people who have iPhones, it's like a huge percentage of the world population.

Bill Kasdorf

That's right.

Mark Gross

What, how, how big is that desert? Do you have any sense of that?

Bill Kasdorf (15:36)

I think the number is in the billions. I can't quote the number. I bet you some of our attendees could probably write chapter and verse on that. But it's having the device, it's having the broadband access that enables you to get the content. And frankly, another thing that both you and I are passionately committed to is standards. And underneath all of this is standards that actually makes it work. Right? So, that, you know, the example I've been giving people, also a COVID-related thing, is, you know, everybody's on video conferences. This one happens to be GoToMeeting, but everybody's on Zoom or they're on Teams or they're on blah, blah, blah. And I was at a Teams meeting the other day with a client, and for some reason Teams, I, you know, I have Teams installed in my computer, but it didn't come up, and I could just click the button and it said, well, use your browser. And it worked fine.

(16:31)

You know, why did that work fine? It's because there's a standard called WebRTC, which is Web Real-Time Communication. And it's what enables that to just work without you even being aware of it, you don't have to know which browser you're using. And, you know, people actually create these standards. What? Thousands of people on working groups, working on these standards, but you know who really drove the WebRTC working group and is driving that standard? Microsoft, Apple, Google, Mozilla, and Opera. The browsers. They're fierce competitors, right? But they recognize that it's in their interest. If this doesn't work on all of them, it's not going to work. So they kind of have to put their competitive impulses aside and sit down at the table and work together to get this to work. And that's why it works for us. And we don't even have to know about it anymore. Right? It just works.

Mark Gross (17:34)

And I'm glad that also, there are sort of standards in the human engineering part of it, because, like –

Bill Kasdorf

Yeah!

Mark Gross

The other day I was on five different technologies in five different meetings in the course of six hours. And just keeping track of, I seem to be able to figure out how to turn on my camera in each one of those, which, because they're doing things in similar ways. It's a good thing because nobody has manuals anymore, so there's nothing to read. So if it doesn't work the way you think it would, you'd never get anything working. You're talking about accessibility, you know, a large part of the world does have access, but just in terms of accessibility, we usually think of as people who have disabilities or people who are, and I know you've been actively working on that, it's been in the news lately because for people who are hard of hearing, people with masks, people wearing masks has been difficult. So that came up in the news recently. But certainly it's been a bigger issue for, for many years; we're both involved in that.

Bill Kasdorf (18:42)

Yep. And of course, what I think is the best news on the accessibility front is that, you know, making, making content accessible to people with particular disabilities is still a challenge. And you do sometimes need to do a specific kind of remediation of a, of an asset for a particular user. But, having said that, it used to involve special technologies and special XML formats and special products, basically. And accessibility standards have really all converged around web accessibility. WCAG is the foundation for web accessibility. It's the foundation for accessibility specifications around the world. So, you know, US's Section 508 was, quote, "refreshed" a couple of years ago to align it with WCAG and the EU requirements align with WCAG, et cetera. And so, you know, now to get a website accessible or an EPUB accessible, it's just getting your HTML right. And you know, it's, yes, you maybe have to add a few things like some ARIA markup, et cetera, but I shouldn't be getting into the weeds. I told you I was a weeds guy, right? But –

Mark Gross

Well, the weeds are important to this kind of stuff.

Bill Kasdorf

Oftentimes the only missing piece is image descriptions. That's, that's still a challenge for people.

Mark Gross (20:12)

Right. Right. I mean, there are things that you need to do –

Bill Kasdorf (20:15)

Tools you're using and the technologies you're using, and the markup you're using, you know, not only do you probably already know that, but the vendors that are creating your products know how to do those things. And that's just a huge benefit.

Mark Gross (20:29)

Right. But it's also taking care on best practices. We said image descriptions need to be there because if somebody's not sighted, they won't be able to see the image; they need some description of it. But also, a thing that people forget about is that colorblindness is a much bigger problem than non-sightedness. And people are careless with, with colors of maps and lines and all kinds of things like that. And so with the web, you've got imagery, images, and moving images that may disturb people with certain kinds of disabilities. So there's a whole, as you know, there's a whole bunch of things going on over there. But –

Bill Kasdorf (21:11)

Yeah. In addition, I should, I just want to interject, in addition to colorblind people, low-vision people, you know, there's 10 times as many low-vision people as blind people. And, you know, they need to enlarge the fonts, which you can do with an EPUB, but it, you know, try doing that on a phone with a PDF. It's, you know, it doesn't work. And dyslexic, you know, you need to substitute the fonts to put a dyslexic font in, and, you know, I know I'm sounding like an evangelist for EPUB, but that's because I am. But that's why EPUB is so much more accessible.

Mark Gross (21:44)

You just mentioned something: what is a dyslexic font? A font; you want to just talk about that?

Bill Kasdorf (21:53)

Yes. Dyslexic people, for example, have a problem with letters that are actually the same thing, but rotated, right? So a "p" and a "q," ah, or ah, and a "d," a lowercase "d," lowercase "p," a lowercase "q," you know, it's really pretty much the same shape in most fonts. Well, a dyslexic font makes those three things distinctly different, and it makes a huge difference to a dyslexic person to be able to, to read that. That's not the only thing they need; they need spacing and adjustments and stuff like that, too. But just being able to, I got a demo of a, of a new technology that's just coming out yesterday and they've got a dyslexic font built right into their platform. So that's pretty cool.

Mark Gross (22:42)

That's, that's amazing. I didn't know that, but it's amazing how something like that, I know people have been designing fonts for generations and fonts have always done, I mean, the Museum of Modern Art has a collection of thousands of fonts, and people don't think of it as an art form, but there are also some practical implications to something as simple as a font. I mean, look at the ones that we have available when we load up Microsoft Word, but there are many fonts that are very specialized, very, so that, that's, that's a very interesting point. So, so yeah, I think the point is that once things go into, into an XML or into something like that, that separates out the content from how it appears, it makes, it makes it very feasible to make things accessible that weren't before.

(23:32)

And then it becomes the instruments that are out there and the computers are out there. And the ability, it's just a matter of enlarging and doing things like that, which, which, you know, will make things a lot better over time, as long as people apply those kinds of standards. Standards again. Let's talk a little about, you know, you're, you're doing a lot of work in education in K through eight. I mean, so, you know, what, what and certainly, the 700-pound gorilla again, COVID has brought things up, people have to learn at home, all kinds of things like that are going on. And I think what we've discussed, I mean, it's been said that COVID isn't really changing anything that wouldn't have changed otherwise, but it's, it's expediting, it's accelerating change; what might have taken 10 or 20 years now might be happening in a space of six months or a year. I mean, we, we've moved our office offsite in the space of two days, had everybody working off site. I wouldn't have thought you could do that in, in six months or a year. So we've been doing amazing things just because we have to. But what about education? How's that? What are the changes that are going to be long-lasting?

Bill Kasdorf (24:50)

Well, it's still hardest in elementary school. Most of my work in education is in higher ed. And there, you know, certainly the big publishers and lots of medium-sized and even smaller publishers for years have been trying to move, again, as I mentioned, getting away from their print paradigm and moving to a digital paradigm. Well, you know, you can do a lot more teaching in a platform that can be responsive to the students' progress, and can be interactive, et cetera, than a paper book, but, you know, people like their paper books. And so it's been hard for publishers to get folks to get, you know, to get off the paper book that they don't want to keep having to make and move on to the platform. And there's two sides to the platform. There's a publisher's platform, but there's also providing the content to the learning management system that the school uses.

(25:50)

Right? So again, it's standards that make these things work, you know, because there's a bunch of different LMSs out there. So that's, you're accelerating that, because now it's like, you know, you need, you need to do it online, but I've got a client right now that, you know, the reason that they're doing this is that they've been publishing, they're a credentialing organization, but it's, you know, they have a curriculum. It's educational and it's in fact very complex. And they publish a giant amount of content every year in these giant print books that, you know, these people have to study in order to pass the exams to get their credentials. And what they're finding is that, because now they've also got that content online, is that people actually prefer to use the online, 'cause it's, it actually teaches better than the, than the print. And so...

Mark Gross (26:46)

Right. Now there's been a movement in other areas that have been going on for years and sometimes happened very quickly. I think, you know, in the history of where we've both been working, I mean, I guess it first happened with legal books and legal publishing, where in the nineties, stuff started moving off where, and part of it was because, oh, exactly that, because you had all, you have all this material that needs to be updated all the time and, and in particular big law firms didn't want to spend money on all that real estate, because –

Bill Kasdorf (27:18)

Well, and it's also a giant amount of content, right?

Mark Gross

A giant amount of content.

Bill Kasdorf

And you need to be able to access all of it, kind of all at once. So –

Mark Gross (27:25)

Right. And now it's moved, then it moved to scholarly publishing and medical publishing is also, because there's just huge amounts of that. And I, it sounds like education is, is the next frontier over here. I'm still wondering why it hasn't happened faster. And is that and, and, you know, you, you made a point that publishers want, want to do that, but is part of it the economics of it? Is it hard to figure out how to monetize that? I mean, that seems to be an issue; when people go online, so many people think everything should be free. Is that something that you've been seeing?

Bill Kasdorf (28:05)

Yeah, and actually that's, that's an issue that's actually a, quite a challenge for commercial educational publishers, because there's what's called OER, open educational resources out there. So, you know, I was talking to one of my clients just a couple of days ago and he's at a major university and they basically ask all faculty when they get their curriculum together in their, you know, their, the reading list for a course, they have to justify now, if, if the student has to spend more than $25 to get what they're assigning, they have to justify that. Like, what? The university is mandating that on all, and it's, and you would know, you would know the university. It's a really big one. And they're not alone in doing that. So, wow.

Mark Gross (28:57)

So what does it do to the economics of the industry and what, what does the publisher, I guess, why not, why not that $25 book or why not those free resources?

Bill Kasdorf (29:09)

Well, yeah, it's all free resources. You know, you mentioned that, you know, these things come in waves and it hits scholarly publishing first. So, you know, you'll remember when open access first came along and people just freaked out, you know, about open access. And justifiably so. You know, commercial publishers, society publishers, particularly because they were dependent on the revenue for their, from their journals to help keep the society alive. But you know, now it's mainstream and it's taken for granted. And, you know, I have to say, I, I tend to be a, you know, a positive-oriented person, let me just say. But I'm not, I'm not cynical, but I'm about to say something cynical, because, frankly, it was when the giant publishers discovered that they could just get paid up front for doing the work, and then they made it open and free, they thought, well, okay, that works for us.

(30:03)

Now I'm oversimplifying dramatically here. And we've still got a huge uproar with Plan S and, you know, different kinds of commercial deals that are, it's a lot of experimentation and, and turmoil is going on in that area. But at least, you know, the kind of expectation that, you know what? This content needs to be open and free, and the funders are kind of pushing to make that happen, is completely restructuring the whole commercial realm in, in, in STM, journal publishing, certainly. It's a problem for humanities and social sciences, but education hasn't figured how, figured out how to deal with that yet. So it's, it's a huge problem for education.

Mark Gross (30:48)

Right. I mean, I think you're, you've got a point, it's a very important point you're making over here, that they figured out a whole new, a whole new sales model. And it actually might be better than the other one. Getting paid up front is not a bad thing.

Bill Kasdorf

Right.

Mark Gross

It's an, an, but even though you're not getting that long-term income on an ongoing basis, those materials are not the kind of materials that would have long term necessarily, I mean, there's a long tail on that.

Bill Kasdorf (31:20)

And that mostly works in the sciences because, you know, they have the funding to pay those article processing charges up front, but it's a huge problem for humanities scholars, for example, that, you know, they spend, they spend 10 years working on a book and they're not gonna make any money on the book and they're supposed to pay thousands of dollars to get that book published, just so that it can be open access? So how that gets funded is still being worked out, right? That's not going to happen, it'll happen, but that problem hasn't been solved yet.

Mark Gross (31:53)

Right. I mean, well, I mean the same issue came up with, with libraries and, and eBooks at libraries, on how to, how do publishers monetize that. And so the, the model now sounds a little kludgy to me, but apparently they get a certain, publishers license an ebook for a certain number of, of uses, which is modeled on how many times they think a print book would be used. So if they think a book is good for 20 readings, then the ebook is sold for 20, 20 loans, which is, sounds a little kludgy, but it's really, I think that's gotta be a temporary kind of thing. Somebody's gotta –

Bill Kasdorf (32:40)

Yeah, that's a case where the print model is still driving that because, you know, in print, the model is for a bestseller, a given library, depending on how big the library is, might buy 10 copies of that book, or they might buy 50 copies of that book, right? Because they can only give it to one person at a time. But you know, you, you brought up earlier, you know, like, the kind of impact this whole COVID year is having on everything. Here we go again, right. Is that, you know, there were, there were major trade publishers that had policies that the librarians were just flipping out about because they were so restrictive. And suddenly, you know, kind of the, the publishers were looking really bad by saying, you know, no, you can't have access to this content. So a lot of publishers kind of eased up on those kind of draconian requirements, to the libraries' delight. But, but on the other hand, there's a financial impact on the publisher.

Mark Gross

Right. I mean, and books weren't being loaned out. The only books that were being loaned out of libraries were eBooks for the last four months, right? So...

Bill Kasdorf

That's right.

Mark Gross

So the same kind of thing is going on with movies. No, no movies at movie theaters; suddenly everything is being downloaded to the streaming services, where before, you wouldn't do that for two months or whatever, till the things became available.

Bill Kasdorf

Yup. I see Marianne. I wonder if she's got a question for us.

Marianne Calilhanna (34:11)

I have a number of questions for both of you.

Mark Gross

Really?

Marianne Calilhanna

Yeah.

Bill Kasdorf

Well, I'll just slow down and shut up for a while. [Laughs]

Marianne Calilhanna

[Laughs] No, I'm just going to interject for a moment here. What with the impact of the pandemic on events, many publishers, including those outside of scholarly publishing, are turning back to their publications to try to find revenue. So how could using XML help publishers to create new products and try to find new revenue streams? Wondering if you could speak to that for a moment.

Bill Kasdorf (34:46)

You want to take that, Mark, or you want me to go ahead first?

Mark Gross

No, you can – you're our guest. You start.

Bill Kasdorf

Yeah. Well, you know, another major trend, I mentioned thinking digital first, instead of thinking print first in terms of how you, how you create your content, how you manage your content, how you deliver your content. But another aspect of that is people are consuming content in more granular bits, right? They aren't necessarily getting the large publication chunks that they used to. So, you know, if you've got XML, and particularly if you've got some semantics associated with that XML so you know, What in all this XML that I've got here is about X? You know, you can go into that repository of content, let me just characterize it that way, and say, you know what? I've got all these, let me just give you, a mundane example is, I published a whole bunch of cookbooks, right? So I've got Italian cookbooks. I got French cookbooks and I've got Mexican cookbooks, et cetera. And it's like, well, you know what? I can put together a bread cookbook because everybody's baking bread all of a sudden, right. So I could look at all that XML I got and say, give me all the recipes that are about bread. And now I got a new book to sell, right. So that's just one example.
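As a rough illustration of the repurposing Bill describes here, the following is a minimal sketch in Python. It assumes each recipe is stored as XML with a subject tag; the element names (cookbook, recipe, title, subject), the sample titles, and the helper functions are all hypothetical and are not taken from any real publisher's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical source cookbooks. The markup and the subject tags are
# assumptions made for illustration only.
COOKBOOKS = {
    "italian": """<cookbook title="Italian Classics">
  <recipe id="it-01"><title>Focaccia</title><subject>bread</subject></recipe>
  <recipe id="it-02"><title>Risotto</title><subject>rice</subject></recipe>
</cookbook>""",
    "french": """<cookbook title="French Favorites">
  <recipe id="fr-01"><title>Baguette</title><subject>bread</subject></recipe>
  <recipe id="fr-02"><title>Ratatouille</title><subject>vegetables</subject></recipe>
</cookbook>""",
    "mexican": """<cookbook title="Mexican Kitchen">
  <recipe id="mx-01"><title>Bolillo</title><subject>bread</subject></recipe>
  <recipe id="mx-02"><title>Mole</title><subject>sauce</subject></recipe>
</cookbook>""",
}


def select_recipes(sources, subject):
    """Return every recipe element tagged with the given subject."""
    wanted = subject.lower()
    selected = []
    for xml_text in sources.values():
        root = ET.fromstring(xml_text)
        for recipe in root.iter("recipe"):
            subjects = {
                s.text.strip().lower()
                for s in recipe.findall("subject")
                if s.text
            }
            if wanted in subjects:
                selected.append(recipe)
    return selected


def build_cookbook(title, recipes):
    """Assemble the selected recipes into a new cookbook element."""
    book = ET.Element("cookbook", {"title": title})
    book.extend(recipes)
    return book


bread_book = build_cookbook(
    "Bread From Around the World", select_recipes(COOKBOOKS, "bread")
)
titles = [r.findtext("title") for r in bread_book.iter("recipe")]
print(titles)
```

The selection only works because each recipe carries a subject tag, which is the "semantics associated with that XML" mentioned above; without that tagging, someone would have to read every cookbook to find the bread recipes.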

Mark Gross (36:06)

Right. And, and, and I think well, first of all, the recipe example reminds me. You could do more on that. I have a, I had a client who took all these that, they took all their, their cookbooks, turned them into XML, and were able to sell a version of it that would go to a Nestlé chocolate cookbook. They would go to Nestlé and have all Nestlé products in there. Same thing can go to a competitor that had their products and they could do all those kinds of things with them. But also, I mean, I think you make a very good point. I mean, as people are, as the market widens there's a much, there's that long tail there of people who would never have looked at certain books because it wasn't available. They didn't know where to find them. And now they could find more, the, a very esoteric book might now have a much larger audience than they've ever had before, but it has to be delivered in different ways.

(37:00)

It has to be delivered so somebody could instantly get it on their phone. And the only way you can do that is if it's in XML. So it can be reformed and reorganized and taken in little bits as you go along. And, you know, going back to what's been going on the last four, four months, I mean, those publishers who have been very digital, I think have been able to take advantage of that even now...

Bill Kasdorf

Exactly.

Mark Gross

...and be able to distribute information that, if they weren't ready for that, they would never have been able to do. It's just that there wasn't time to set it up. So you know, those who thought ahead, I think, are doing well with that. I don't know if "well" is the right word. It's still difficult times for many, but certainly...

Bill Kasdorf (37:44)

Yeah, they, they're, they're in a transition. Transitions are always hard, but, you know, you can come out better on the other end. You know, one of the ironic things is that books don't need, don't need to go out of print because they don't use print.

Mark Gross (38:00)

Right. Ever. They're never out of print anymore.

Bill Kasdorf

Right.

Marianne Calilhanna

Mark, I've –

Mark Gross

We've got an inventory of millions of books. Go ahead. Sorry.

Marianne Calilhanna (38:11)

Yeah, Mark, I have a couple of questions for you. So, it was asked, has DCL been busier since the pandemic? And tacked on to that, what services is DCL delivering due to COVID-19 that they did not see before and were not needed before? And of course, I think back to the second, third week of March, that weekend, and maybe you could share some of that story.

Mark Gross

Right. So I, you know, I think we're very lucky to be in an industry that's all digital. And so, and many have not been that lucky. We've been able to pretty much keep our regular business going. And many of our customers are all digital. I mean, they're getting content and data from all kinds of places and, and that's continued. And, and even, you know, I was worried about, like, the conferences business, and actually we're going to talk about, I was going to ask you about that, where that's going, but, but conference proceedings seem to have still been going on. People have made material available. So even though the conferences haven't physically taken place they, they're still, the, the proceedings have really been done and, and distributed. And so those are things that have been going on in general.

(39:35)

I mean, specifically with COVID, I mean, I mentioned, two of our clients have asked us to go and take all their COVID-related papers so they can issue them right away. And some of those were done over a weekend. I mean, it became like, okay, we gotta get this rolled out by Monday, and because they were digital, they were able to gather, they were able to find them, and we were able to publish things very, very, very quickly. So I, you know, it's, it's, this didn't happen by accident. In terms of DCL, I think we've been, I didn't know we were preparing for this, but for the last two, three years, we've been really upgrading, been upgrading phone lines and upgrading equipment so that you can actually start working off-site. We never knew what that would be for. Never in my wildest dreams did anybody think of this. I think this is truly a black swan, right?

Bill Kasdorf (40:41)

Wow. Yeah, very good point. You know, you were mentioning the conference business, and there's a, an interesting parallel to what we were talking about with education. And this is actually a point you made to me in our conversation. So I shouldn't be taking words out of your mouth, but, you know, it's, it's, that: is being physically present the best way to consume a presentation or not? And it's like, you know, I think I'm a firm believer that the conference business will come back, but in a different way, because I think that, you know, the, the networking, the hallway conversations, you know, the ability to meet new people. It's not just seeing old friends, it's meeting new people. It's, you know, being, you know, I, I, people, people often say that I, I seem to know a lot about stuff, right? How do I, how did I learn all that stuff? It was going to conferences all these years, right? I just paid attention to what people were doing. But you don't actually have to be physically present to consume the presentation. You have to be physically present to interact with that person. Right. And the same thing's happening in education. I'll let you take that because that's actually a point you made to me the other day.

Mark Gross (41:55)

Yeah. The point I was making is that maybe education is being done all backwards. And, and actually some elementary schools have gone to a model where the class presentation by a teacher is done over video and done in the evening, while the homework is being done in the classroom, which is totally backwards to what we've been thinking about, but this way the classwork is done under supervision so that a child can get instruction where he's having problems. And he can be listening, listening to a presentation, can be done on video. So that really turns things on its head, but it's, it sounds like a great model. It seems to be working well. And the same thing might be happening in conferences. I think, Bill, you made the point that there are conferences where the materials are sent in advance and the conferences, the intention is to be networking and talking to those people.

Bill Kasdorf (42:48)

That's right. That's what the W3C did. They have a, they have a semiannual confab that gets lots of people together. And they basically just recorded all the presentations that would have been made at that conference and said, here, watch these in advance, and then we're going to have two two-hour conversations as web call-ins, and no agenda, all Q and A. And it was fantastic. I think, you know, certainly the largest Zoom call I've been on. I think they had 400 and some people on a Zoom call. It was, it was amazing. But it really worked.

Mark Gross (43:27)

How do you coordinate with 400 people on asking questions? It's like, I don't know how you do that.

Bill Kasdorf (43:34)

You know, this, this is the W3C. And that group also uses what's called IRC, which is internet chat, for almost all of its meetings. So basically, you just basically, in effect, put your hand up in the internet chat and you get in a queue, and then they are watching that and they answer the questions. Or with Zoom, you can, you know, you can raise your hand.

Mark Gross (43:58)

Right.

Marianne Calilhanna (44:00)

Or you have someone who pokes in with a question like I'm doing now. So just to shift a little bit, could you speak a little bit about multichannel dissemination, in particular workflows that are digital first and print second? Share, share some knowledge on that.

Bill Kasdorf (44:24)

Yeah, yeah, exactly. And frankly, that's probably why people are feeling the pressure to make this kind of paradigm change in their workflows to be digital-centric because they realize, well, you know, it wasn't long ago that the EPUBs came out weeks and weeks after the book was, after books, book was published. And now the expectation is, well, you should have it at the same time. And by the way, that EPUB ought to be accessible. There shouldn't be an accessible version that has to be made later. It's like, no, it should be accessible in the first place. But in terms of workflows, the tools are basically there already, right? I mean, we have the, the models, we have the standards; they're always in, in evolution. So I wouldn't want to give anybody the impression that the job is done. Right. But it's become mainstream to, to work in a digital way, as Mark was saying.

(45:25)

And because of that, if you get the, if you get the right kind of modeling in your content, so you're not thinking of chapters, you're thinking of topics. And you can, I think, Marianne, you were asking, when you were talking about multichannel, you were asking for a given publication, not creating variant publications, but the same publication being published in a variety of ways. The fact that so many of the standards are really XML- and HTML-based really helps or, yeah, in scholarly publishing it's JATS and BITS XML. Completely. It's the, those are the lingua franca of scholarly publishing. But I was just working with a client, a current client that I have, where, you know, I basically pointed out to them that everything in their JATS XML, this is a major medical journal, everything, everything in the JATS XML model, the tag library, provides the ability to tag everything you need for accessibility. And then they need a specification. I'm a big believer in "a model is not a spec," right? In other words, you still need a specification for: how are you going to use JATS XML? But if you do that right, then you've got everything you need in the JATS XML to just transform it to the HTML that you need for your website or that you need for your eBooks. Your EPUBs.
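
To make the JATS-to-HTML step concrete, here is a rough sketch in Python. It assumes a tiny, simplified JATS fragment (real articles carry far more metadata and inline markup), and the function name `jats_to_html` is invented for illustration. Production workflows typically use XSLT or a vendor toolchain for this; the sketch only shows that the structure already in the XML is enough to drive the output.

```python
import xml.etree.ElementTree as ET
from html import escape

# A deliberately small JATS-like sample; the content is made up.
JATS_SAMPLE = """<article>
  <front><article-meta><title-group>
    <article-title>Sample Study</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Methods</title><p>We measured things.</p></sec>
    <sec><title>Results</title><p>Values were &lt; 5.</p></sec>
  </body>
</article>"""


def jats_to_html(jats_xml):
    """Map article title, sections, and paragraphs to simple HTML."""
    root = ET.fromstring(jats_xml)
    title = root.findtext(".//article-title", default="Untitled")
    parts = [f"<article><h1>{escape(title)}</h1>"]
    for sec in root.findall("./body/sec"):
        parts.append("<section>")
        heading = sec.findtext("title")
        if heading:
            parts.append(f"<h2>{escape(heading)}</h2>")
        for p in sec.findall("p"):
            # itertext() flattens inline markup; a real transform would map it.
            parts.append(f"<p>{escape(''.join(p.itertext()))}</p>")
        parts.append("</section>")
    parts.append("</article>")
    return "\n".join(parts)


html_out = jats_to_html(JATS_SAMPLE)
print(html_out)
```

The same source could feed a second function that emits EPUB content documents, which is the single-source, multichannel idea under discussion.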

Mark Gross (47:03)

Well, it's all available in JATS, but you have to have the policies and procedures and whatever in place that you're putting all the things in and not taking shortcuts around that.

Bill Kasdorf

Yeah, exactly. I get so many, you know, so many people still say, well, wait a minute, I'm using JATS. Why is my JATS XML for Atypon not the same as my JATS XML for Silverchair? It's like, well, those are specifications, right? They don't, they don't ask for it to be done in exactly the same way. So you still have to work with who's the recipient of this XML and what do they need you to do so that their systems work properly with it?

Mark Gross (47:43)

Right. Right. And don't forget, I mean, we're talking about JATS for scholarly publishing, but there are other forms of XML that are used for other specific purposes for various reasons. So, like, DITA is used not for scholarly publishing but in technical publications, where you want to be able to modularize things. So the, the, the, what, you know, I, I, you know, when I first saw XML, or rather, when I first saw SGML, I said, wow, this is a language for really describing what text looks like. And that's what XML is: you can define the purposes for it, depending on what you need it for. So there are many forms of it being used out there. Nice thing about standards is there are so many of them, right. But, but you know, in any event it allows you to be able to communicate between the various people that are out there to make sure you're doing things in the same way, and that it works the same way.

Bill Kasdorf (48:44)

Yeah. And that it's optimized for a particular purpose. So, you know, JATS and its counterpart BITS have for years and years been optimized to represent scholarly content. And it's, it's, there is no better way to represent scholarly content than JATS and BITS. But guess what, when you're looking at a website, you're not looking at JATS and BITS, you're either looking at PDF or you're looking at HTML. But there was always JATS and BITS. Oh, I'll give you an accessibility anecdote. I was in a working group. This is, goes back a few years; so many of these working groups are just really, really technical people. And this guy was just a total expert tech geek. Right. And he said, he just kind of posted it to the list. "Why do we need to be paying attention to MathML? Nobody uses MathML."

And I went "Wait a minute!" You know, he's thinking of EPUBs in trade publishing and they've never seen MathML. And even when there are equations, the MathML isn't in the EPUB because it doesn't work right. So the image of the equation is in there. And he's saying "Why do we even need to pay attention to this?" I said "You have no idea that millions of equations are being created in MathML all the time." And any math equation that you see in a scholarly journal or a scholarly book, I can almost guarantee you that was MathML at some point in the workflow. Right.

Mark Gross (50:21)

Right. Or, and, and I think that holds true for any specialized kind of area. I mean, I, you know, I don't know what happens in, like, music publishing, but it's probably some version of that that you'd use in music publishing and in voice.

Bill Kasdorf

There are music markup languages. That's true.

Mark Gross

So, you know, so if you've never looked at music, you'd say, why would anybody ever need that? But I think it holds true for any specialized field. And I think the beauty of XML has been that it allows that ability to be able to apply the specialized, the additions that you need in order to be able to work in many different areas.

Bill Kasdorf (50:58)

Yeah. And it's really, it's really in layers. Right. So you've got that underlying layer of XML mach – you know, a lot of people are shocked when I remind them that we are still using XML 1.0, that is the governing XML spec right now is XML 1.0, right. So it's just radically useful. Right. But then it gets specified for DITA for topic-based authoring and JATS for scholarly journals, et cetera, et cetera. But then there's a layer on top of that that says, well, this is JATS that is going to HighWire. JATS that's going to Silverchair. JATS that's going to Atypon. But then you've also got, the publisher is saying, but here's what I need to do with this content. Right. So they've got their own specification on top of that. But so it's, you know, it's, it's like layers, but it's not, you throw this one out and do this one instead. They build on each other.
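
One way to picture those layers is as stacked sets of checks run against the same JATS file, sketched below in Python. The rule sets are hypothetical: the "platform" and "publisher" rules are invented for illustration and do not describe what HighWire, Silverchair, Atypon, or any publisher actually requires.

```python
import xml.etree.ElementTree as ET


def has_article_title(root):
    return root.find(".//article-title") is not None


def has_doi(root):
    return root.find(".//article-id[@pub-id-type='doi']") is not None


def figures_have_alt_text(root):
    return all(fig.find("alt-text") is not None for fig in root.iter("fig"))


# Each layer adds rules on top of the previous one; none replaces another.
BASE_RULES = {"article title present": has_article_title}
PLATFORM_RULES = {"DOI present (hypothetical platform rule)": has_doi}
PUBLISHER_RULES = {
    "every figure has alt text (hypothetical house rule)": figures_have_alt_text
}


def check(jats_xml, *layers):
    """Return the names of all rules, across the given layers, that fail."""
    root = ET.fromstring(jats_xml)
    failures = []
    for layer in layers:
        for name, rule in layer.items():
            if not rule(root):
                failures.append(name)
    return failures


# Made-up sample: has a title, but no DOI and no alt text on the figure.
SAMPLE = """<article>
  <front><article-meta><title-group>
    <article-title>Sample Study</article-title>
  </title-group></article-meta></front>
  <body><sec><fig id="f1"><caption><p>A chart.</p></caption></fig></sec></body>
</article>"""

print(check(SAMPLE, BASE_RULES))
print(check(SAMPLE, BASE_RULES, PLATFORM_RULES, PUBLISHER_RULES))
```

The same file passes the base layer yet fails the stricter layers, which is the "a model is not a spec" point: valid JATS is not automatically what a given recipient needs.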

Mark Gross (51:58)

That's right. And it's moved, moved a lot over the years; it's really been pretty interesting. So, so Marianne, any other questions you've got over there? It's –

Bill Kasdorf (52:14)

Because we only got five minutes left before we have to wrap up.

Marianne Calilhanna (52:17)

Well, I will share one thing. You, you do have someone who disagrees with you about the use of MathML in that IEEE processes and displays millions of equations without any MathML. They can change that for you if you want, IEEE, but –

Bill Kasdorf (52:40)

I have several very good friends at IEEE. So I'm going to dig into that because I would be surprised if, if the equations aren't MathML upstream in that workflow. Well, I used to be working with, I shouldn't mention this 'cause it's a competitor of DCL, but I was with a conversion and prepress house for, for many years. And every equation done by that firm is done in MathML. The publisher isn't asking for it. And the publisher usually wasn't getting the MathML, but the vendor was using MathML to create the equations in the first place. So –

Mark Gross (53:15)

I'm not sure, though. But no, I, I think there are, there are groups that are using TeX/LaTeX is really –

Bill Kasdorf (53:20)

That's right. Oh yeah. Particularly I think the way I've ventured in particularly 'cause in engineering, TeX/LaTeX is dominant, so yeah. That's right.

Mark Gross (53:28)

Right. And one for precision that's available there, when you're doing MathML, you're really doing very fine print. So it's really been a yin/yang kind of thing over the years when we got specialized over there, that's one thing that happens over there, but that's probably true in other places also, but the general trend in places has been to try to use XML because it is, it's a language that lets you define lots of things. I mean, there's also other areas, also is an area that you may not be able to do everything that's there, but I think there's a large group, certainly large groups that use MathML until they get to those really, you know, page-long equations, which may be difficult to do.

Marianne Calilhanna

And that's exactly the point.

Bill Kasdorf (54:22)

Yeah. And I take that correction. So thanks to the IEEE people that said that, but you're right. What I should say is that the math is, MathML or TeX/LaTeX, very likely it's one or the other of those formats. But I can't resist pointing out that for accessibility, you want MathML. So even if you're creating your content in LaTeX – so Word, for example, you know, you can put TeX equations in a Word document or MathML equations in a Word document, but when that document gets made accessible, you actually need to convert that TeX into MathML to produce the accessible content. Yeah. MathML, they probably are going to get MathML pretty soon because they're going to be under pressure for those equations to be accessible.

Mark Gross (55:19)

Right. I think this development needs to be done. I think just the visualization of MathML, it's taken a long time to catch up when you get to really fine, fine work.

Bill Kasdorf (55:29)

Yeah. Actually there's some very significant work being done on MathML right now to make it easier to use and to enter, make it easier to kind of embed it in, in workflows. I'd say it's a MathML refresh that's being done. Right. Coming out with kind of a simplified subset that's what's mostly needed. And then, you know, kind of the more extensive version that can basically express anything you want to express in math.

Mark Gross

I don't think it's something that DCL is qualified to discuss. It sort of takes me out of my area of expertise.

Bill Kasdorf

That's very active, ongoing work.

Marianne Calilhanna (56:24)

We're coming to the top of the hour. And I think people are finishing up their lunch. I wanted to thank you both. But one of our participants, I thought, wrote an elegant and lovely thanks to both of you. And that is "Thanks to the work that Bill, Mark have done in the industry. We would never have been able to produce articles in 24 hours. That is now a requirement for COVID-19 articles and content." So kudos to both of you for your work on XML and conversion workflow. And I think that's a lovely way to end and I want to thank everyone who's taken time out of their day to join us for this webinar. And I'd like to remind everyone that the DCL Learning Series comprises webinars, a monthly newsletter, and blog articles. So we hope that you take advantage of this. You can access all this information from our website at dataconversionlaboratory.com. This now concludes today's webinar. I hope everyone has a great day. Thank you.

Mark Gross

Okay. Bye, everyone. Good to see you, Bill and Marianne.