Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998
The Integrated Taxonomic Information System
It's a pleasure to be here to represent a project that I'm excited about, and in so doing, I want to acknowledge the contributions of Barbara Lamborne to this talk and the entire project, and of John Maret, who helped with the research for this presentation.
Figure 1. The Integrated Taxonomic Information System.
What I'd like to do this morning is to introduce you to the Integrated Taxonomic Information System [Fig. 1]. In this generalized overview, I'm going to briefly cover the components of the project, review the history of the project, examine some of the relationships within the ITIS program, and describe some of our current and future directions and activities.
ITIS is a database built through a partnership among the taxonomic community and sponsoring agencies and organizations. Our goals are to provide quality taxonomic information about all organisms, from terrestrial and aquatic habitats, primarily with a North American focus, but ultimately with a global scope. Several Federal agencies are responsible for managing the natural heritage of this country. It is our opinion that good management requires knowledge of the resource. In other words, if you don't know what you have, it's hard to manage it. These agencies identified a scientifically credible list of taxonomic names, with a unique identifier for each name, as a critical need. (People in the various agencies suggested to us that unique identifiers were a critical part of the representation.) The National Research Council suggested to us that as Federal agencies we should avoid duplication and try to get together and coordinate efforts in managing taxonomic information.
It's obvious from the earlier presentations today that there are a lot of reasons why no scientifically credible list of names is available today. When we give this presentation to some audiences, we often get the reaction: "I don't understand why this hasn't been done." Some of you may understand the current situation very well, but others may not, so I should describe it at least briefly here. Taxonomists have been working on describing the biota of the globe for roughly 240 years. There are thousands and thousands of people who have made contributions. We don't know how many; there's no good way to count. It starts way back with Linnaeus and continues today. This means that there is a massive amount of information out there, presenting a wide diversity of perspectives, scattered through a multitude of scientific publications that are distributed in a variety of holdings. Some of the publications are extremely rare, so you simply can't find all of that information in one place. The names are in Latin, which is good because the language doesn't change, but bad when you have to assess the accuracy of a particular name. As you know, there is currently no tool or list to help you determine whether or not a taxonomic name you've used is in fact spelled correctly. The rules of nomenclature, as set forth in the different international codes for zoology, botany, and microbiology, are unfortunately very technical and legalistic, and it takes somebody with some experience and knowledge to interpret them. Finally, classification is constantly changing; this whole arena is dynamic. We have new discoveries, new approaches, new data sets, and new analytical techniques that give us new, more current, and we hope, ever more accurate views of the world's biota. Adapting new technologies to meet this dynamic reality is the challenge we're discussing today.
Important events in the history of ITIS are shown in Figure 2. It began in the university environment in the early seventies as an attempt by some people to put together a list of names. In 1976 that project moved to the National Oceanographic Data Center (NODC), which is part of NOAA. The NODC database was constructed as a flat file, and used a single intelligent number both to identify names and to hold the taxonomic classification, or relationships among organisms. Because of the way it was built, with information being received from sources all around the world, the content of the list was effectively unreviewed. The first tape, which was the way these data were distributed, was produced in the subsequent year, 1977. In 1985 the EPA, having similar needs for taxonomic information, joined with NODC and brought an aquatic freshwater component to what was primarily a marine data file. In 1992 the original partners, plus members of what is now USGS, made a decision to try to upgrade the NODC Code. That effort became known as ITIS in 1993. The idea was to provide scientifically credible information, in a state-of-the-art database, and to make it available over the Net. ITIS came on line in 1996 after a three-year development period. In 1997 we established the data development team, and we've been in a production mode ever since.
Figure 2. History of ITIS.
The partners include a series of agencies within the Federal government, whose responsibilities are for the management of resources, or who have relevant expertise or information in the scope of their responsibility. The partnership includes the original group from the Departments of Commerce (NOAA), Interior (USGS), Agriculture (USDA), and the Environmental Protection Agency (EPA); it was expanded to include the Smithsonian Institution, National Museum of Natural History, and most recently Agriculture and Agri-Food Canada, and the University of Kansas Natural History Museum and Biodiversity Research Center.
There are four components of the ITIS system [Fig. 3]: the database of information; an on-line system, which allows access to and distribution of that information; a workbench, which is effectively the way we control the quality of the data and works as the interface between data development and the database; and then an organization and management infrastructure. What I'm going to do is take you through each of these pieces very briefly, but spend a bit more time on the database because it's the focus of what we're trying to do today.
Figure 3. Components of ITIS.
The data model itself is relational and consists of 20 tables. I grouped the tables into subject areas to make it easier to understand [Fig. 4]. The basic unit is the scientific name. Each name in the file has a unique taxonomic serial number. It's comparable, if you like, to a social security number. All scientific names are in this table, including synonyms, and through a series of connected tables, we associate each species and generic name with an author and date of publication. We have a separate table of vernacular names. There are tables for managing the peer review process and a table for geographic distribution, which includes biogeographic areas as well as jurisdictional areas for Federal holdings. We have a module, called change tracking, that allows us to track the history of changes within the database, which then provides access and review points for people who are interested in understanding what was there, say, 5 years ago, versus what's there today. There are comment fields scattered throughout the database. Systematists, and I think many other people, realize the value of this. We don't necessarily know the status of a particular data item when it's first entered, and comments are very useful for documenting the checks and outstanding questions. Finally, there are the tables for sources or publications, and these are the original descriptions or monographic treatments. We also have experts that serve as sources, and several other types, such as CDs or Bill Eschmeyer's three-volume set on fishes. More detailed information about the data model is available on the Web site, including database diagrams, business rules, and data dictionaries.
Figure 4. Simplified ITIS data model.
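As a rough illustration of the name-centered design described above, the following Python sketch builds a tiny two-table database in which every scientific name, synonyms included, carries its own serial number, and vernacular names sit in a separate table. It is only a sketch under stated assumptions: the table and column names, the placeholder names "Aus bus" and "Xus bus", their authors and dates, and the serial numbers are all hypothetical and are not taken from the actual 20-table ITIS schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE scientific_name (
            tsn          INTEGER PRIMARY KEY,  -- unique serial number per name
            name         TEXT NOT NULL,
            author       TEXT,
            year         INTEGER,
            accepted_tsn INTEGER REFERENCES scientific_name(tsn)  -- NULL if accepted
        );
        CREATE TABLE vernacular_name (
            tsn        INTEGER NOT NULL REFERENCES scientific_name(tsn),
            vernacular TEXT NOT NULL
        );
        """
    )
    # Placeholder data only; none of these rows describe real taxa.
    conn.executemany(
        "INSERT INTO scientific_name VALUES (?, ?, ?, ?, ?)",
        [
            (1001, "Aus bus", "Smith", 1850, None),  # accepted name
            (1002, "Xus bus", "Jones", 1900, 1001),  # synonym of 1001
        ],
    )
    conn.execute(
        "INSERT INTO vernacular_name VALUES (?, ?)", (1001, "example frog")
    )
    conn.commit()
    return conn


def resolve(conn, name):
    """Return (tsn, name) of the accepted name for any stored name, or None."""
    return conn.execute(
        """
        SELECT a.tsn, a.name
        FROM scientific_name AS s
        JOIN scientific_name AS a
          ON a.tsn = COALESCE(s.accepted_tsn, s.tsn)
        WHERE s.name = ?
        """,
        (name,),
    ).fetchone()
```

Calling `resolve(conn, "Xus bus")` on the demo database follows the synonym link to the accepted name. Keeping synonyms in the same table as accepted names is what lets a single serial number identify any name a user might hold.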
There are various query and report capabilities provided by the on-line system. One of these I think you'll find very interesting and useful is the ability to compare your list, in terms of its content, to information within the system. We have an automated ability here, which is called "Compare taxonomic nomenclature", which allows you to do that on your own, from the on-line system.
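The general idea of comparing a user's list against a reference list can be sketched in a few lines of Python. This is not the implementation of the on-line "Compare taxonomic nomenclature" feature, whose internals are not described here; the function name, the dictionary layout, and the placeholder names are assumptions made for illustration, and the matching is exact apart from whitespace cleanup.

```python
def compare_names(user_names, reference):
    """Sort user-supplied names into accepted, synonym, and not-found groups.

    `reference` is a hypothetical mapping from every known name to its
    accepted name (an accepted name maps to itself).
    """
    report = {"accepted": [], "synonym": [], "not_found": []}
    for raw in user_names:
        name = " ".join(raw.split())  # collapse stray whitespace
        accepted = reference.get(name)
        if accepted is None:
            report["not_found"].append(name)
        elif accepted == name:
            report["accepted"].append(name)
        else:
            report["synonym"].append((name, accepted))
    return report


# Placeholder reference data, not real taxa.
reference = {"Aus bus": "Aus bus", "Xus bus": "Aus bus"}
report = compare_names(["Aus  bus", "Xus bus", "Aus buss"], reference)
```

In this example the misspelled "Aus buss" lands in the not-found group, which is the kind of spelling check a reference list makes possible.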
The Taxonomic Workbench [Fig. 5] is a Windows program that was designed to interface between data development, the on-line system, and the database system. This is the tool that we use to get information into the system. You can see a series of tabs and menu-driven formats, which allow and constrain, in many ways, the quality of information that goes into the file. It's through this workbench that we control the quality, in the sense of structure and format, of the data. This is a stand-alone program that's available for your own use and we encourage people to use it at whatever level is appropriate for them.
Figure 5. The ITIS Workbench.
The organization itself has evolved into a three-ring circus: there's a systems development team, a data development team, and a management team. Currently, we are trying to formalize this in terms of additional structure. This was a sort of bottom-up development, a relatively uncommon event in the Federal government, and some of us think that's why it's been as successful as it has.
Data acquisition and development is a major component of this operation [Fig. 6]. We have basically four ways of generating information. Sometimes our partner organizations within the original group bring data to the table. An example of that would be the PLANTS database, which came from the Department of Agriculture. In other instances we're able to support our contacts and help them design databases and put information into an electronic format that can be imported into ITIS. We have a system of data stewards, that is, people with particular expertise who are responsible for a particular taxonomic group. Many times those stewards have data in hand that we then help them develop or they can provide to us for migration into ITIS. In some instances, when the need is established by one of the participating agencies, we may in fact develop our own data, that is, go out into the basic literature and go through the process that others have gone through to get information into the system.
Figure 6. Data acquisition and development: a multi-faceted approach.
The basic process then, with data coming in through an acquisition process or in through the workbench, is to associate various quality control attributes with each piece of data, to provide a review system for those data, to go through a certification process, and move the data then into the on-line system [Fig. 7]. It's that process of moving data from the workbench into the on-line system where the taxonomic serial number is assigned and all the format and data quality verification is accomplished. Verified data are then made available for use across the Internet.
Figure 7. The ITIS data roadmap.
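The flow just described, in which a record is reviewed and certified before it moves to the on-line system and receives its serial number only at that point, can be sketched as follows. The class names, fields, and checks are hypothetical and are meant only to show the ordering of the steps, not how ITIS itself is implemented.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class NameRecord:
    """A name moving through the (hypothetical) review pipeline."""
    name: str
    reviewed: bool = False
    certified: bool = False
    tsn: Optional[int] = None  # assigned only when published


class OnlineSystem:
    """Accepts certified records and hands out serial numbers."""

    def __init__(self, first_tsn=1):
        self._next_tsn = count(first_tsn)
        self.published = {}

    def publish(self, record):
        # Format check: an empty name never enters the system.
        if not record.name.strip():
            raise ValueError("empty name")
        # Quality gate: review and certification must both be complete.
        if not (record.reviewed and record.certified):
            raise ValueError(f"{record.name!r} is not reviewed and certified")
        record.tsn = next(self._next_tsn)
        self.published[record.tsn] = record
        return record.tsn
```

Assigning the number at publication time, rather than at data entry, means that only records that have passed verification ever hold a serial number.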
I want to talk about the users of ITIS [Fig. 8]. There are many other classifications of users that we could have used, but this is at least my perspective on that. The public is a sizable proportion of the ITIS user community (for example, people who simply need to find out how to spell the species that Johnny is working on for his 5th-grade term report), and we consider them an important piece of this whole program. We started out to provide information to government agencies, to help them track their biological information. So these data get incorporated as some sort of authority file into the National Biological Information Infrastructure and other programs. Science writers, editors, and similar sorts of people use these data, and we feel very gratified when we get some comment back about the success people have had using ITIS for this purpose. The uses of ITIS in museum and library communities are ones we hope to explore here today. We're beginning to develop these relationships, and I think there are a lot of opportunities for additional exchange.
Figure 8. The ITIS interactive environment.
I'd like to spend just a second on the relationship between ITIS and the systematics community [Fig. 9]. This endeavor is only going to be successful if we are able to involve the systematics community in this process. There are not enough people like Bill Eschmeyer out there, who make these data available in electronic format, or some other format, to give us the support, coverage, credibility, and currency that we need for ITIS. Right now there are basically four components that define the nature of the interaction between the systematics community and the ITIS endeavor. We are able to provide, through NSF and our own resources, limited funding to assist systematists, for example with the PEET program and others, to develop information and to provide that information back to ITIS. We interact with the systematics community through data stewards. Data stewards use this format, this presentation, as a way of getting their information, views, and labors out to the rest of the scientific community and the public. We interact with the systematics community in terms of sources. For example, right now the data development team is reviewing all of the fish names in ITIS using Bill Eschmeyer's Catalog, both the books and the CD-ROM. This is an extremely important piece of ITIS: to provide that standardization, that credibility required by the initial agreement. And certainly peer review is another component of that. Now these are just pieces of the interaction, but I think they illustrate the exchange between ITIS and the rest of the community.
Figure 9. ITIS and the systematics community.
There are many projects, organizations, and collaborations that we are involved with [Fig. 10]. Rather than going through the whole list, let me mention some key examples. The NBII, the effort to link information for all biological databases and information within the government, is an important component. Last week we were pursuing interactions at the international level with Agriculture Canada and the Canada NBII program, and we look forward to expanding that operation. I think it's going to be a nice addition, a nice collaborative addition, that will give us better North American coverage. We hope to establish a similar partnership with Mexico. There are others, and some of you can find your favorite projects and organizations listed. We're going to hear from Frank Bisby a little bit later about Species 2000, and how ITIS and other efforts like ITIS interact with Species 2000.
Figure 10. Databases, projects, and organizations.
The major accomplishments of the project are listed in Figure 11. We have: an on-line database; a data development office, with people working right now, cleaning up data, reviewing data, adding data, and interacting with data providers; and we are in the process of developing data standards. There are on-going discussions with groups to adopt the ITIS view, the ITIS information, as a standard for taxonomic information. We have over 260,000 names in the file. About 75% of these are species-level names, and by the calculations we did at the end of last week, we had touched about 30% of those names, which means that we have reviewed and looked at these names, and are trying to improve the quality of the database.
A couple of points about where we are now and what we're looking at down the road. We are continuing to review the legacy data, paying particular attention to updating (that is, bringing those data up to current levels) and to improving the quality of those data. We're continuing to expand the geographic and taxonomic coverage, with a complete treatment of North America as our immediate goal, but we operate in that global environment and respond as best we can whenever an opportunity or need arises. We are in the process of establishing a senior management team and improving the infrastructure, in terms of budgetary support and other things that are important for the on-going operation. As I mentioned, we are in the process of expanding partnerships. There are other agencies, such as Fish and Wildlife Service, National Park Service, the Bureau of Land Management in the Department of Interior, and other groups, such as Agriculture Canada, with whom we are carrying on active discussions for expansion.
We have begun to refine the web interface to enable better data access. We're talking with people about incorporating Z39.50 capabilities and moving in that direction. We're very interested in developing relationships with the Library and Museum communities. This is where much of the basic information is maintained, either specimens or literature, and it's essential that we have on-going, collaborative efforts at that level. Finally, we are beginning to consider some of the challenges that were presented early, but set aside for practical reasons, such as the ability to accommodate multiple classifications. Those in the systematics community may appreciate what this means. We made a decision early on to present a single view, a single classification, because that's what most of the agencies wanted, knowing all along that that was not going to satisfy everybody. Now we're beginning to investigate how to handle multiple classifications and to address some of the other challenges that were deferred early on.
So I hope this brief tour gives you some idea of what the ITIS enterprise is all about. If there's time, I'd be pleased to try and answer some of your questions. We'll be around here for the rest of the two days to interact.
Roy McDiarmid received his Ph.D. from the University of Southern California in 1969. He then spent two years at the University of Chicago and 10 years at the University of South Florida. In 1978 he moved to the Smithsonian, where he has served as a Curator of Herpetology and Research Zoologist for the National Fish & Wildlife Service, the National Biological Survey, and most recently, the Biological Resources Division of the US Geological Survey (all while occupying the same office). His research interests are in the systematics and ecology of both amphibians and reptiles, with a geographic emphasis on the Neotropics. He is chair of the checklist committee for the Herpetologists' League, which is charged to continue the work on catalogs of amphibians and reptiles of the world. With co-authors, he has just completed a checklist of snakes of the world. He has been involved with ITIS since its inception in 1992.