National Library of Medicine
Making Medical Information Available On-Line
ABSTRACT: In the medical industry, the accuracy of data is a matter of
life and death. Therefore, accurate data conversion is a necessity whenever a
new system is implemented. This article describes the conversion process of the
National Library of Medicine as it moved its materials into an on-line
system, and will be of interest to anyone concerned with the accuracy and
quality of data.
The New Library
There's a maxim in the conversion business: "Information is an asset." Nowhere
is this more true than in the medical industry, where information can mean the
difference between life and death. In such a milieu, the importance of the
National Library of Medicine (NLM) can hardly be exaggerated. It is, after all,
the largest medical research library in the world. In fact, it's the world's
largest research library in a single scientific and professional field, with a
collection of 5 million items: books, journals, technical reports, manuscripts,
microfilms, and pictorial materials.
But what does it mean to be a library today? New information technology is
expanding the possibilities of how much information can be stored and how it can
be disseminated. In the field of health care, new opportunities mean new
obligations. To quote the Hippocratic Oath, "into whatsoever house you
shall enter, it shall be for the good of the sick to the utmost of your
power." High technology has expanded "utmost" to new levels and
has made it possible to enter more houses than ever before without even getting
out of one's chair.
NLM's Electronic Resource
The NLM's latest response to this challenge is HSTAT (Health
Services/Technology Assessment Text), an electronic resource that includes the
full text of clinical practice guidelines, quick-reference guides for
clinicians, and consumer brochures. The materials come from three sources: the
Agency for Health Care Policy and Research (AHCPR), the National Institutes of
Health (NIH) consensus development conference and technology assessment
reports, and the U.S. Preventive Services Task Force Guide to Clinical
Preventive Services (1989 edition).
HSTAT is part of an initiative called the Health Services Research Information Program
coordinated by NLM's National Information Center on Health Services Research and
Health Care Technology (NICHSR). The actual development of HSTAT was left to the
Information Technology Branch of the Lister Hill Center, also part of NLM. It
was the Lister Hill Center that called DCL.
The Importance of Access
"HSTAT can be accessed several different ways," explains Maureen
Prettyman at the Center. "In fact, it's currently in three different
databases. Users can do full-text search and retrieval on character-based
terminals, they can download over the Internet with gopher or ftp, and then
there's the World Wide Web. But when we started, all we had were WordPerfect
documents, ASCII, and the books themselves. We knew we needed to go to
SGML."
SGML tagging would provide the cues needed for search engines and could
readily be converted to the SGML-based HTML, the accepted format for World Wide
Web access. But once a Document Type Definition (DTD) had been developed to
define the structural rules for the SGML documents, hundreds of pages of
material still had to be converted from the first set of books, with more sets
to follow.
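To make the SGML-to-HTML step concrete, here is a minimal sketch in Python of how structural tags might be mapped onto HTML tags. The element names and the mapping are hypothetical, chosen only for illustration; they are not taken from the NLM DTD. The sketch also ignores attributes, entity references, and SGML tag minimization, all of which a real conversion would need a DTD-aware parser to handle.

```python
import re

# Hypothetical mapping from SGML element names to HTML tags (an assumption,
# not the actual NLM DTD).
TAG_MAP = {"chapter-title": "h1", "section-title": "h2", "para": "p"}

# Matches simple start and end tags without attributes, e.g. <para> or </para>.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")

def sgml_to_html(text: str) -> str:
    """Rename mapped tags; drop unmapped tags but keep the text inside them."""
    def swap(match):
        slash, name = match.group(1), match.group(2).lower()
        html_name = TAG_MAP.get(name)
        if html_name is None:
            return ""  # no HTML counterpart: remove the tag itself
        return f"<{slash}{html_name}>"
    return TAG_RE.sub(swap, text)

sample = "<chapter-title>Asthma</chapter-title><para>Use <drug>X</drug>.</para>"
print(sgml_to_html(sample))
```

Because the structure is already explicit in the tags, the conversion is a mechanical renaming. That is why the hard part of the project was adding the tags in the first place.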
Don't Try This At Home
Norman Barth, DCL Project Manager for this conversion, talked about the
difficulty of such a conversion. "Some companies think they can save money
by converting files in-house, but with a complex conversion like this one, a
company will find itself draining more and more of its resources as the project
continues. NLM came to us right away and we were able to give them a cost
estimate that allowed them to make a realistic budget."
But why is the conversion so difficult? Norman continues, "In this
conversion, we are adding information. There are no tags in the original
material. Where does this information come from? Three places: appearance,
context, and content. If all chapter titles are the same font size, then we can
use appearance cues to tag chapters.
"But because SGML is so concerned with how a document is structured,
context becomes important, too. A blank line in a sample form, for example,
might be considered different than a blank line after a study question in the
back of a chapter. The only information source we didn't use for the NLM job was
content. In this case, it was more cost-effective to have their own people do
that tagging, since they were subject-matter experts. Still, most of the tagging
was accomplished by appearance and context clues only.
"Your original question was about difficulty. Let me just say that we've
had to expand our development and editorial departments twice as we've increased
the number of SGML conversions we do. Most of our editors are trained
specifically for SGML. My advice: Don't try this at home!"
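The appearance and context cues Barth describes can be illustrated with a small sketch in Python. Everything in it is hypothetical: the Line record, the 18-point size assumed for chapter titles, the section labels, and the tag names are illustrative assumptions, not DCL's tools or the NLM DTD. Content-based tagging, which NLM's own experts handled, is left out.

```python
from dataclasses import dataclass

@dataclass
class Line:
    text: str
    font_size: int  # appearance cue, in points (hypothetical input field)
    section: str    # context cue: "body", "form", or "questions"

CHAPTER_TITLE_SIZE = 18  # assumption: every chapter title uses this size

def tag_line(line: Line) -> str:
    """Choose an SGML-style tag for one line from appearance and context."""
    if line.text.strip() == "":
        # Context decides what a blank line means.
        if line.section == "form":
            return "<blank-field>"
        if line.section == "questions":
            return "<answer-space>"
        return ""  # an ordinary blank line carries no structure
    if line.font_size == CHAPTER_TITLE_SIZE:
        # Appearance decides that this line is a chapter title.
        return f"<chapter-title>{line.text}</chapter-title>"
    return f"<para>{line.text}</para>"

def tag_document(lines):
    """Tag every line and drop the ones that produced no markup."""
    return [tagged for tagged in (tag_line(l) for l in lines) if tagged]
```

Even in this toy form, the same blank line yields different markup depending on where it sits, which is the kind of judgment that makes such conversions labor-intensive.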
The Importance of Communication
DCL is able to offer any combination of manual and automated processes to
most cost-effectively convert legacy documents. In this case, a manual approach
was chosen. Jennifer Ruckdeschel was put in charge of the editing process.
"Maureen [Prettyman] sent the DTD and narrowed down what she wanted us
to do. From that information, I came up with keying specs for outside editors,
who tagged the WordPerfect files. When they came back, we parsed them and had
in-house editors do the final cleanup.
"Communication with the client was very good on this project. Whenever
Maureen had a question or concern about our tagging, she didn't hesitate to
call. Our priority is always to create a good feedback cycle with our clients.
By sending them materials early and often, and then making them feel comfortable
when they call, we stay in touch with what the clients want and the clients
don't get any surprises."
Even though most, if not all, of DCL's employees have not taken the
Hippocratic Oath, they worked to the utmost of their power for the NLM, which has
already begun to make HSTAT information available. For more information on how
you can access this information, please call the NICHSR at (301) 496-0176 or
E-mail them at NICHSR@NLM.NIH.GOV.