This article (http://www.DCLab.com/stm_xml.asp) is provided through the Data Conversion Laboratory website. To subscribe to DCLnews go to http://www.dclab.com/request_subscription.asp.
Why Publishers Should Use XML...
Every moment publishers put off embracing XML, they're missing a powerful opportunity to reduce composition costs and increase revenues, says David Skurnik, VP of Sales at Data Conversion Laboratory.
[Thanks to Lori Barber, Business Development Manager, and Beth Friedman, Senior Project Manager, for their input.]
It's 3pm. You've just had a meeting with your Executive Editor. He told you to cut the cost of production and reduce the time to market of your books and journals.
You now go to another meeting, this time with the VP of Marketing. She tells you that, in an effort to increase revenue from existing customers, her team is planning to add dozens of separately purchasable, value-added features to the online products. What's more, to attract new buyers, they are targeting smaller markets with virtual niche journals built from existing journal content. They are also building up extra distribution channels through content aggregators and abstracting and indexing services.
The upshot is you will have to produce a lot more products ... and they will have to be easily adaptable for use in multiple vendors' systems.
You then make a beeline for your Executive Editor's office and tell him you can't decrease overall production costs because you now have to support more platforms and produce more products. The Editor smiles sagely at you and wishes you good luck in realizing organizational goals.
Sound Familiar?
In today's high-tech and ever more demanding publishing climate, many production managers yearn for the "good old days" when all they had to produce was paper. But, as we quickly learned, those days are only a pleasant memory. To keep the Executive Editor and VP of Marketing happy, we have to embrace technology and re-engineer our production process to exploit it.
Before we discuss technology, let's look at what the perfect production environment would be. From my conversations with numerous publishing professionals I came up with the following list:
Are these goals attainable?
Yes. Technology has made these goals fully attainable. But before the mansion can be built, you have to dig the foundation. In publishing, the foundation is the data. That is why data needs to be in "smart formats" like SGML and XML. With intelligent data, technology can be exploited to accomplish your goals.
Let's examine the tasks you will ask of your data:
Not surprisingly, the benefits of SGML and XML are exactly those mentioned above.
What are SGML and XML?
Definition and History of SGML, HTML and XML
SGML (Standard Generalized Markup Language) was developed in the 1980s as a non-proprietary, platform-independent method of describing the structure of a document rather than its appearance.
What does it mean to tag a document based on structure?
In a bibliographic reference, for example, there is a title, author details, a publisher, a year of publication, and perhaps page citations. A reader can distinguish between the different types of information from both experience and the appearance of the data (the title may be in italics, for example). But a reference containing SGML or XML mark-up will have tags that explicitly identify the structures within the reference.
A reference could be tagged as simply as this:
<citation citation_type="journal" id="A123">Abdel Malek, Z, Swope, VB, Pallas, J, Krug, K &amp; Nordlund, JJ. <it>Mitogenic, melanogenic and cAMP responses of cultured neonatal human melanocytes to commonly used mitogens</it>. J Cell Physiol, 150, 416–425, 1992.</citation>
The more the data is tagged, the more "intelligence" is added to the document. This intelligence makes it easier to "slice and dice" the data for creating new products and to transform the data for other systems. The following example shows every reference element tagged separately.
<jnlref><au><snm>Abdel Malek</snm>, <fnms>Z</fnms></au>, <au><snm>Swope</snm>, <fnms>VB</fnms></au>, <au><snm>Pallas</snm>, <fnms>J</fnms></au>, <au><snm>Krug</snm>, <fnms>K</fnms></au> &amp; <au><snm>Nordlund</snm>, <fnms>JJ.</fnms></au> <tl>Mitogenic, melanogenic and cAMP responses of cultured neonatal human melanocytes to commonly used mitogens</tl>. <pubtl>J Cell Physiol</pubtl>, <vid>150</vid>, <ppf>416</ppf>–<ppl>425</ppl>, <cd year="1992">1992</cd></jnlref>.
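To make the "slice and dice" point concrete, here is a minimal sketch in Python of pulling named fields out of the fully tagged reference and handing them to another system as JSON. It uses only the standard library. The function name `reference_to_record` and the field names in the output are assumptions made for this illustration, and the ampersand is written as `&amp;` so that a standard XML parser will accept the text.

```python
import json
import xml.etree.ElementTree as ET

# The fully tagged reference from the article. The ampersand is escaped
# as &amp; because a standard XML parser rejects a bare "&".
REFERENCE = (
    '<jnlref>'
    '<au><snm>Abdel Malek</snm>, <fnms>Z</fnms></au>, '
    '<au><snm>Swope</snm>, <fnms>VB</fnms></au>, '
    '<au><snm>Pallas</snm>, <fnms>J</fnms></au>, '
    '<au><snm>Krug</snm>, <fnms>K</fnms></au> &amp; '
    '<au><snm>Nordlund</snm>, <fnms>JJ.</fnms></au> '
    '<tl>Mitogenic, melanogenic and cAMP responses of cultured neonatal '
    'human melanocytes to commonly used mitogens</tl>. '
    '<pubtl>J Cell Physiol</pubtl>, <vid>150</vid>, '
    '<ppf>416</ppf>\u2013<ppl>425</ppl>, <cd year="1992">1992</cd>'
    '</jnlref>'
)


def reference_to_record(xml_text):
    """Turn one tagged <jnlref> into a plain dictionary of named fields.

    The output field names are hypothetical; they are not part of any DTD.
    """
    ref = ET.fromstring(xml_text)
    authors = [
        {
            "surname": au.findtext("snm"),
            # Drop the trailing period that belongs to the printed layout.
            "initials": au.findtext("fnms").rstrip("."),
        }
        for au in ref.findall("au")
    ]
    return {
        "authors": authors,
        "title": ref.findtext("tl"),
        "journal": ref.findtext("pubtl"),
        "volume": ref.findtext("vid"),
        "first_page": int(ref.findtext("ppf")),
        "last_page": int(ref.findtext("ppl")),
        # The year is read from the attribute, not from the displayed text.
        "year": int(ref.find("cd").get("year")),
    }


record = reference_to_record(REFERENCE)
print(json.dumps(record, indent=2))
```

The same lookup is not possible with the simpler `<citation>` form, where authors, journal and page numbers are only distinguishable by punctuation.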
Prior to applying tags to a document, you have to define some basic rules that determine:
Document Type Definition
These rules are collected in a document called a Document Type Definition (DTD). The DTD has to be developed before any conversion, because it supplies the basic rules that guide the conversion.
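A real DTD is written in its own declaration syntax and is enforced by a validating parser. Purely as an illustration of the kind of rule it states, the sketch below uses Python to check which elements may appear inside which. The rule table `ALLOWED_CHILDREN` and the function `check_structure` are hypothetical, built around the tag names of the reference example above. An actual DTD would also cover element order, repetition and attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: for each element, the child elements it may contain.
# This stands in for the content models a real DTD would declare.
ALLOWED_CHILDREN = {
    "jnlref": {"au", "tl", "pubtl", "vid", "ppf", "ppl", "cd"},
    "au": {"snm", "fnms"},
    "snm": set(), "fnms": set(), "tl": set(), "pubtl": set(),
    "vid": set(), "ppf": set(), "ppl": set(), "cd": set(),
}


def check_structure(element):
    """Return a list of rule violations found in element and its descendants."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in allowed:
            problems.append(
                f"<{child.tag}> is not allowed inside <{element.tag}>"
            )
        else:
            problems.extend(check_structure(child))
    return problems


# A reference that follows the rules, and one with an author inside the title.
good = ET.fromstring(
    "<jnlref><au><snm>Swope</snm>, <fnms>VB</fnms></au> <tl>A title</tl></jnlref>"
)
bad = ET.fromstring(
    "<jnlref><tl>A title <au><snm>Swope</snm></au></tl></jnlref>"
)

print(check_structure(good))
print(check_structure(bad))
```

Agreeing on rules like these before conversion starts is what lets every converted document be checked the same way.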
In the early days, the biggest obstacles to implementing an SGML solution were its complexity and the shortage of tools on the market to support it.
In the infancy of the Internet, a universal DTD for tagging documents designed to be viewed on the Internet was developed. This DTD came to be known as Hypertext Markup Language (HTML). Since HTML was focused on presentation and not on structure, the HTML tag set was very limited, and was therefore much easier to implement.
But its simplicity was also its biggest drawback, since HTML's ability to support complex searching, linking and document maintenance was very limited.
The challenge was to find a way of marking up documents that was not as complex as SGML but was more powerful than HTML. The solution was XML. XML is an acronym for eXtensible Markup Language and is a derivative of SGML. Since its introduction, many corporations and organizations like IBM, Microsoft and General Electric have been converting their documentation to XML. XML has also become the de facto standard for data transfer.
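The searching point can be sketched briefly. When a surname carries its own tag, a search can target that field alone instead of matching the word anywhere in the text, which is all that presentation-only markup allows. In the Python sketch below, the `<reflist>` wrapper, the `id` attributes on `<jnlref>`, and both sample references are invented placeholders for illustration; only the tag names come from the reference example earlier in this article.

```python
import xml.etree.ElementTree as ET

# Invented placeholder data: two short references in a hypothetical <reflist>.
# The second one mentions "Krug" in its title, not as an author.
REFERENCES = """
<reflist>
  <jnlref id="r1"><au><snm>Krug</snm>, <fnms>K</fnms></au> <tl>First placeholder title</tl></jnlref>
  <jnlref id="r2"><au><snm>Nordlund</snm>, <fnms>JJ</fnms></au> <tl>Placeholder title that mentions Krug</tl></jnlref>
</reflist>
"""


def ids_with_text(root, word):
    """Plain text search: match the word anywhere in the reference."""
    return [
        ref.get("id")
        for ref in root.iter("jnlref")
        if word in "".join(ref.itertext())
    ]


def ids_with_author(root, surname):
    """Field search: match only the <snm> element inside <au>."""
    return [
        ref.get("id")
        for ref in root.iter("jnlref")
        if any(snm.text == surname for snm in ref.findall("au/snm"))
    ]


root = ET.fromstring(REFERENCES)
print(ids_with_text(root, "Krug"))    # both references match
print(ids_with_author(root, "Krug"))  # only the reference written by Krug
```

The text search returns both references, while the field search returns only the one in which Krug is an author.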
Let's discuss how a combination of good data and technology can help you achieve your publishing goals. We will start at the beginning of the production process and work forward.
In many instances, collaborative authoring may be involved in your production process. There are Web-based systems that allow manuscripts to be authored in Microsoft Word. Edits can either be made directly in the document, or the main author can receive an external document containing the edits. If there is only one author involved, the manuscript is usually created in Word, but a manuscript full of equations might be created in TeX or LaTeX.
Author Templates
In either case, developing Author Templates is important, since they produce a consistently structured and well-styled manuscript.
The main challenge is making sure the author actually uses the template. Depending on the situation, you will have to decide between a "carrot", a "stick", or a combination of both. In any event, even if you are only mildly successful in getting authors to comply, you will achieve a greater degree of consistency in your author manuscripts. This will help you reduce the time and cost of converting your manuscripts into your target format.
Pre-Composition Conversion
Once you have the first cut of your "template driven" manuscript, your copyeditors can work with the author to get the manuscript into its final form in whatever format they are familiar with. Another option is to convert to SGML or XML right away and have your copyeditors work within an SGML/XML editing environment until the manuscript reaches its final form. There are several excellent commercially available SGML/XML editors. Converting the manuscripts prior to composition is known as "Up Front Conversion" or "Pre-Composition Conversion".
Finding the best point in the process to convert the manuscript to SGML/XML is tricky. It depends on how your copyeditors and authors collaborate, especially if the process involves multiple page proof passes. We typically discuss this issue in great detail before we recommend an approach.
If you are able to use an up-front, "pre-composition" SGML/XML process, you can realize many benefits:
Content Management System (CMS)
If you would rather wait to re-engineer your production process and prefer to convert to SGML/XML post-composition, you will still realize many benefits, since your data will be in SGML/XML. But before we discuss these benefits, we need to mention another piece of technology that you can leverage to improve the process: a Content Management System (CMS). As its name suggests, the CMS is designed to manage the content contained within it. The basic features of an SGML/XML-based CMS are:
Some of the benefits that you will receive from an SGML/XML-based CMS are:
Other benefits of SGML/XML data are:
In conclusion, although the demands on publishers are increasing, technology is evolving ... and understanding exactly how all the pieces fit together will enable you to meet the current challenges.
DCLnews Editorial
Data Conversion Laboratory, Inc. 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365 718-357-8700 convert@dclab.com
Copyright © 1997-2010 Data Conversion Laboratory, Inc. All rights reserved.