Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998
The Economics of Authority File Compilation: A perspective from a self-sustaining literature indexing enterprise
There is a true story about a very senior member of the British Civil Service, who was asked to give evidence at a trial in whose outcome the Government had a strong interest. When it was subsequently discovered that his evidence was best described as less than helpful, his explanation was that he had not deliberately misled the Court, but that he had been "economical with the truth": he'd answered what he was asked, but not volunteered anything.
I was asked to talk about Zoological Record's activities in authority file compilation as an economically self-sustaining activity. A slightly different use of "economic," but not wanting to be caught in the same trap as that civil servant, I think I should make clear at the beginning that our taxonomic authority files cannot really be called self-sustaining, though we do have a somewhat different funding model from some of the other speakers.
I'd like to briefly outline what BIOSIS sees as its role and how that is financed, move on to Zoological Record in general, and then to our taxonomic authority files, returning to the question of economics from time to time.
BIOSIS is a not-for-profit organisation based in Philadelphia and governed by a Board of twelve Trustees, probably better known to many as Biological Abstracts. And since 1980, we've been jointly responsible with the Zoological Society of London for the compilation of Zoological Record.
My primary role within BIOSIS is to manage the UK operation, which is solely responsible for the compilation of Zoological Record and where most of our taxonomic activities are based. While I'm also a member of the management team that runs BIOSIS as a whole, we have completely separate production systems for Biological Abstracts and Zoological Record.
Figure 1. The mission and role of BIOSIS.
BIOSIS sees its mission as helping to advance the knowledge of biology by organising information to support the needs of life science researchers in all settings. In other words, our role is to act as an interface between the producers and users of biological information, not to actually create that information, unlike many of the participants in this workshop. We sit across the boundary between the two communities mentioned in the overall introduction to the workshop.
We try to support these needs by collecting, analysing and filtering the primary literature of the life sciences so that it is manageable by the individual researcher. Primary literature to us largely consists of the traditional printed books and journals; although we are attempting to cover the newly emerging purely electronic publications, these present some unique problems.
It's important to understand that, as I just said, we regard our role as recording and filtering the data. Unlike some of the other speakers, it is most definitely not our role to sit in judgement on the value of that literature: we are not interested in control of information or of the decision making process, just in making everything that exists easily available to those who wish to consult it.
BIOSIS is indeed, as the title of this presentation suggests, an entirely self-supporting activity. We maintain our existence by providing services, usually to individual researchers, which are paid for usually by someone other than the user, often a librarian. We do not operate with grants; although we have sometimes had some funding in the shape of contracts to provide information to various agencies, we have regarded those contracts as we would any other sale.
We would agree with much of the comment made by the preceding speaker about the audience for the information we analyse and produce: its size and nature, and its suspicion about anything it sees as "commercial" in particular. While we do charge for our services, we have no shareholders and do not generate a profit. Any surplus is invested in improving the database and the products we can produce from it.
Overall we employ 230 staff in Philadelphia, where we produce Biological Abstracts, and 45 in the UK, where we produce Zoological Record. We have a budget of about 22 million US dollars a year, of which roughly 10% goes towards Zoological Record, to pay those staff, provide the infrastructure, and make and distribute the products. Overall we break even, most of the time.
Both Biological Abstracts and Zoological Record in their present form are essentially traditional abstracting and indexing (A&I) journals: they provide pointers to "real" information rather than direct access to the information itself. Our view is that this traditional approach will not be appropriate for much longer.
We see the future as moving towards a system which provides integrated, seamless access to information, including the contents of the primary literature and the underlying data itself. We believe that BIOSIS has much to offer this process. While we agree with many of the comments which suggest that the future lies in distributed data, we also agree with what I think is the underlying theme of this workshop: that there must be a common structure and terminology to allow access to this data. That's where we believe our expertise lies, and where we can help at a practical, as well as a theoretical, level.
Figure 2. A future role for BIOSIS.
As part of our role of collecting, recording, and filtering the data, we have developed a number of tools which we believe might be of use to the community. We have done much to develop our own knowledge base for biological information retrieval, drawing on the information we gain from handling something like 625,000 original items of literature each year. While it's the taxonomic information I would like to talk about today, we do believe that we've also made real progress on the subject side.
So, turning to taxonomy and Zoological Record, I should remind you that Zoological Record is an index to the world literature in zoology; it has been published continuously since 1864, and throughout the past 130 years it has served both a current awareness and archival function.
Like everything we do, Zoological Record reports what is said, but doesn't comment on the value of the work. If an author describes a new species then, provided they've met the requirements of the International Code of Zoological Nomenclature, we record it. We now index about 70,000 original works each year, of which perhaps a quarter to a third are truly "taxonomic," and these generate about 30,000 new and changed names.
Because we've acted for many years as an unofficial register of zoological names, and unlike some current awareness services which can limit their indexing to a fixed number of items for each citation and to a fixed number of terms for each item, we try to include an entry for every nomenclatural act in every paper published world-wide. We know we miss some, but it's not for want of trying.
This requirement to be both comprehensive and complete is an important factor in our economic model. While typically we include perhaps five index sets for an original item, taxonomic papers push that number up very rapidly. The highest number we've had in recent years was over twelve thousand entries for a single work; that took a significant amount of extra effort and is the sort of thing which has to be factored into our budget process.
The concept of registration and registers leads into how and why the Zoological Record Taxonomic Authority Files have become more widely available.
A number of years ago the International Union of Biological Sciences set in train a process which, it hoped, would lead to a form of registration being introduced for all organisms, in the interests of stability and improved dissemination of information.
In some groups such as Bacteria this has been an attainable goal, realised through the Approved List of Bacterial Names. Botanists seemed, after initial hesitation, to have become more comfortable with the concept and a trial system of registration is now in place with some possibility of formal adoption at the Botanical Congress in St Louis in 1999. However there are clearly strong reservations in some quarters about this.
Zoologists, having included proposals in the draft of the new International Code of Zoological Nomenclature to require registration of all new names, rejected these proposals as a result of strong opposition to some aspects of the scheme, which had perhaps not been fully thought through by the International Commission on Zoological Nomenclature (ICZN). Part of the proposals included the adoption of Zoological Record as the Official Register; while we believe that was not the reason the proposals fell, it did lead us to develop public access to some of our taxonomic authority files as a demonstration of how taxonomists would be able to verify that their names were included in the Register, without being required to purchase Zoological Record to do so, which we understand was a considerable concern amongst many zoologists. This demonstration was an Internet-based system which was the prototype of our ION / TRITON system, described in more detail later.
The Zoological Record database holds a significant resource of animal name and bibliographic data, which we believe is crucial to the work of taxonomists, but which is only usable on payment of a charge of some sort. The community would clearly like this information to be available "free." Unfortunately they would also like it to continue to be available in future, which with the present economic model is a contradiction in terms.
Figure 3. ZR name resources.
We could, of course, make all our data freely available on the web tomorrow (we have the technology in place to do so, as I will describe), but if we did, compilation of the database would stop very quickly as there would be no way to pay the staff.
In the past few years we have been actively investigating ways in which the information we collect might be made available in a more flexible manner, while ensuring that the resources to maintain it continue to be available. These investigations have resulted in our active participation in the Species 2000 program, in discussions with ITIS, and with similar European agencies on how we can collaborate, and in our own work in developing our in-house authority files.
Although we don't judge the value of names and their relationships, we do need to organise the information so that it is accessible to the user, who may be a taxonomist or someone not trained in systematics and nomenclature. To assist in doing this we have a number of authority files, closely integrated into our production system, the first of which is our hierarchy.
This is a fairly high level taxonomic hierarchy. It's not the result of original taxonomic work by our staff, but a consensus based on what we see happening in the literature. In other words, it reflects thought rather than leads it. And as it is intended for recording and retrieval rather than judgement, we only go down to a level that is generally accepted by the community; this might be to, say, family in the vertebrates, but only to order in some other groups. Within the hierarchy, we need to decide where to place names so that retrieval is accurate and efficient.
We have a system to do this called TITAN, which we developed from earlier manual card-indexes. This is a database of about 300,000 generic and sub-generic names, linked to our hierarchy and used as part of our semi-automated indexing process. This database is updated automatically as new taxa are added to the main index database. New taxa and proposals to change the relationships of taxa are always linked to the hierarchy used by the author; where this doesn't correspond to ours we add cross-references.
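As a rough illustration of the idea described above (a name linked to a place in the house hierarchy, with a cross-reference kept when an author uses a different placement), here is a minimal Python sketch. The class names, field names and the "Example author group" value are all hypothetical; this is not the actual TITAN schema, which is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class GenusEntry:
    """One generic or sub-generic name and where it sits in the house hierarchy."""
    name: str
    house_placement: str
    cross_references: list = field(default_factory=list)


class GenusIndex:
    """Toy name-to-hierarchy index; an assumption for illustration, not TITAN itself."""

    def __init__(self):
        self._entries = {}

    def record(self, name, house_placement, author_placement=None):
        """Add or update a name; keep the author's placement as a cross-reference if it differs."""
        entry = self._entries.setdefault(name, GenusEntry(name, house_placement))
        differs = author_placement is not None and author_placement != entry.house_placement
        if differs and author_placement not in entry.cross_references:
            entry.cross_references.append(author_placement)
        return entry

    def lookup(self, name):
        """Return the entry for a name, or None if it has not been indexed."""
        return self._entries.get(name)


index = GenusIndex()
# Placing Carcinus under Brachyura follows Figure 6; the author's group is invented.
index.record("Carcinus", "Brachyura")
index.record("Carcinus", "Brachyura", author_placement="Example author group")
print(index.lookup("Carcinus"))
```

The point of the sketch is only that the house placement stays fixed for retrieval, while differing author placements accumulate alongside it rather than replacing it.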
The Species 2000 program described earlier in the workshop sets out to provide the best possible tool both for taxonomists and for those working with biodiversity who need to know what the correct name for an organism is. BIOSIS and Zoological Record are enthusiastic and committed supporters of Species 2000; I am a member of its management team. We recognise, however, that Species 2000 is a long term program, dependent both on the availability of substantial funding and significant input from taxonomists for it to reach anything near completion. In the meantime, there is a need for information about taxonomic names and in particular a straightforward way of finding out what they are, where they belong, and where to find out more.
As I mentioned earlier, in working with ICZN we had come up with a system to search the Zoological Record database and retrieve information about newly indexed names. We decided to see if we could extend that, firstly to all names in our database, and secondly to cover other groups of organisms.
Our partners in this project, who are all also active participants in Species 2000, include: that part of CABI BioScience previously known as IMI; the USDA Systematic Botany and Mycology Laboratory, who between them have provided a complete list of fungal names; the Missouri Botanical Garden, who have provided the mosses; and BIOSIS' own Register of Bacterial Nomenclature. Discussions are taking place with other organisations to see how the gaps which remain (the algae and the vascular plants) might be included.
One of the problems we faced in developing this project is that of economics. Some of our partners are happy to have their data available to anyone free of charge; others, like ourselves, need to take care not to eliminate use of the existing paid-for services, which provide the revenue supporting the ongoing database production. As a result we came up with what are, in effect, two views of the same information.
The first of these is called ION, the Index to Organism Names. This provides a freely available, simple look-up to determine which group an organism belongs to, though in the case of names from our own database we've been able to add some information not readily obtainable by users from our main index database using other tools.
For example, a search of the system for Carcinus maenas returns the major group, the author of the name, and where the name fits into our hierarchy. In addition we include a table that shows the number of papers in which the name has been used since volume 115 of Zoological Record (corresponding to the literature of 1978, when the ZR production system was computerised), and also a link to the full ZR hierarchy.
Figure 4. ION search form.
Figure 5. ION search results for "Carcinus maenas".
Figure 6. ZR taxonomic hierarchy for Carcinus maenas.
* Crustacea
** Malacostraca
*** Eumalacostraca
**** Eucarida
***** Decapoda $Crustacea$
****** Reptantia
******* Brachyura
******** Carcinus maenas (Linnaeus 1758)
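The asterisk-marked hierarchy in Figure 6 can be read mechanically: the number of asterisks gives the depth of the taxon that follows. The Python sketch below is one way to do that, offered as an illustration only; the function names are hypothetical and nothing here is claimed to reflect how the ION server builds or stores the display.

```python
import re

# Text of Figure 6: each run of asterisks gives the depth of the taxon that follows.
FIGURE_6 = (
    "* Crustacea ** Malacostraca *** Eumalacostraca **** Eucarida "
    "***** Decapoda $Crustacea$ ****** Reptantia ******* Brachyura "
    "******** Carcinus maenas (Linnaeus 1758)"
)


def parse_lineage(text):
    """Return a list of (depth, name) pairs from asterisk-marked hierarchy text."""
    pairs = []
    for match in re.finditer(r"(\*+)\s*([^*]+)", text):
        depth = len(match.group(1))
        name = " ".join(match.group(2).split())  # collapse spaces and line breaks
        pairs.append((depth, name))
    return pairs


def parent_of(pairs, name):
    """Return the name one level above `name`, or None for the top level."""
    for position, (depth, taxon) in enumerate(pairs):
        if taxon == name:
            # Walk backwards to the nearest earlier entry with a smaller depth.
            for previous_depth, previous_taxon in reversed(pairs[:position]):
                if previous_depth < depth:
                    return previous_taxon
            return None
    raise KeyError(name)


lineage = parse_lineage(FIGURE_6)
print(" > ".join(name for _, name in lineage))
print(parent_of(lineage, "Brachyura"))
```

Because the parser keys only on the runs of asterisks, it works the same whether the hierarchy is laid out one taxon per line or run together on a single line.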
In addition to zoological names, as I've already mentioned, we have a number of other groups included, such as mosses from the Missouri Botanical Garden database.
Figure 7. Search results for "Mnium curvulum".
Index To Organism Names: Search Results
Found in the Missouri Botanical Gardens Moss Names Database
Name: Mnium curvulum
Even using this limited (since 1978) database can throw up some interesting results. A search for the name Mirabella shows it has been used by three different zoologists, all naming a new genus and all within a six-year period.
Figure 8. ION results Mirabella summary.
Index To Organism Names: Search Results
More than 1 matching name has been found (note that all names represented in the database with differing forms of author and/or date are currently treated as different names). For more information, check the boxes next to the name(s) of interest and select 'More Data'.
Figure 9. ION results Mirabella details.
Index To Organism Names: Search Results
Found in the Zoological Record Animal Names Database
Name: Mirabella
If we had a system of registration in place, or even if taxonomists made more use of existing resources, this sort of mistake could be prevented. Unfortunately at present it happens all too regularly.
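The kind of check that would catch a case like Mirabella is simple to state: group the records by name and flag any name that appears with more than one distinct author and date, which is also how the note in Figure 8 says the system distinguishes names. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, and the Mirabella authors and years are placeholders, since the real ones are not reproduced here. Only the Filenchus entry (Andrassy, 1954) is taken from the citation shown in Figure 11.

```python
from collections import defaultdict


def find_homonyms(records):
    """Return names that occur with more than one distinct (author, year) pair."""
    uses_by_name = defaultdict(set)
    for name, author, year in records:
        uses_by_name[name].add((author, year))
    return {
        name: sorted(uses, key=lambda use: (use[1], use[0]))
        for name, uses in uses_by_name.items()
        if len(uses) > 1
    }


# Placeholder records: the Mirabella authors and years are invented for illustration.
records = [
    ("Mirabella", "Author A", 1990),
    ("Mirabella", "Author B", 1993),
    ("Mirabella", "Author C", 1995),
    ("Mirabella", "Author A", 1990),  # a repeated use by the same author is not a new name
    ("Filenchus", "Andrassy", 1954),
]

homonyms = find_homonyms(records)
for name, uses in homonyms.items():
    print(name, uses)
```

A check like this only flags candidates; as the note in Figure 8 implies, differing forms of the same author or date would also show up as separate names and would need a human eye.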
Figure 10. Index to Organism Names (ION).
The ION system was made publicly available without charge in April 1997 and has been steadily used since then. It now contains about one and a quarter million names and searches are typically running at about 250 names, from around 70 unique hosts, each day.
ION is one of two views of the database. The other is TRITON, the Taxonomic Resource and Index to Taxonomic Organism Names, which provides access to the same information as ION, amplified by the full bibliographic data in the case of Zoological Record, and any other data the other participating organisations may want to make available. This system is functional now in the sense that it is in place on our server, but not yet accessible to anyone other than BIOSIS and our partners; it could very easily be made widely available if we were sure that doing so would not destroy our ability to continue production of the database from which it is derived.
Figure 11. TRITON results for "Filenchus". (In the on-line version, values for "Entry type", shown here in blue, would hyperlink to citations below. Only the first of nine citations is shown here.)
TRITON: Search Results
Zoological Record Animal Names Database nomenclatural data for: Filenchus
Citation 01
Serial Title: BULLETIN OF ZOOLOGICAL NOMENCLATURE
Vol: 44 Iss: 1 Year(s): Publ: 1987
Article Authors: Brzeski-M-W; Geraert-E; Raski-D-J
Title: Case 2582. Filenchus Andrassy, 1954 (Nematoda): proposed designation of Tylenchus vulgaris Brzeski, 1963 as type species.
Pages: 23-24
Language: In English
Indexing
Nematoda; Filenchus; Andrassy 1954; Proposal to Commission; Placement on Official List & designation of type species; Tylenchus vulgaris Brzeski 1963, p. 24
One comment that should be made is that name data presented by the ION / TRITON systems reflects, at least for zoology, the actual usage of names in the literature, including use of alternative spellings, synonyms and so on. It does not comment on the validity of those names except where that is the specific purpose of the work, and there of course the opinion expressed is that of the author, not of Zoological Record. In other words it's a nomenclator, not a taxonomic authority in its own right.
Figure 12. ION / TRITON.
We take the view that, while this service is usable by non-taxonomists in the absence of other, more comprehensive directories, it is most useful to taxonomists to help them build better databases for more widespread use, such as through the Species 2000 project. Zoological Record is not in competition with Species 2000; we are trying to help support the infrastructure needed to build that system. Information about both ION and TRITON, and access to ION, is available at www.york.biosis.org/triton.
The TRITON taxonomic authority file, of which ION is a subset, and the other activities I have been describing are all based on the manipulation of data which has been drawn together as part of the process of producing Zoological Record. None have any funding in their own right, and indeed none exist as a separate entity: they are all views of a common database that could not exist without the overall ZR production operation, including the substantial costs of literature acquisition and control, indexing, thesaurus maintenance, and many other functions. Putting together the physical product that the user sees is only the tip of the iceberg in terms of cost.
Figure 13. Economic models.
The title of this presentation includes the phrase "a self-sustaining literature indexing enterprise." I have to say that Zoological Record has never, since it began in 1864, been self-supporting, and the cost of producing it continues to mount. Initially the losses were met by the group of 19th century scientists who set it up; when in the 1880s they were no longer able to continue, the Zoological Society of London accepted the responsibility and covered the costs as part of its contribution to science for almost a century.
When in 1980 the Society in turn was unable to bear the burden alone, BIOSIS stepped in with an offer of partnership, knowing full well that this was an offer that required a relatively deep pocket. Fortunately, despite the present climate (which has not been kind to many, if any, of the secondary services), BIOSIS has been able and willing up till now to continue to pick up the deficit, which amounts to around half a million dollars each year when all costs are taken into account.
How long that can continue I don't know (perhaps not indefinitely), but I have to say that I am not sure in any case that I understand why one organisation of our nature should carry this financial burden. If this aggregation of taxonomic information, both as taxonomic authority files and the raw data that support those files, is as essential as we are regularly told by users that it is, then I believe that fact has to be recognized and the information paid for, by someone. Whether that should be through the existing economic model and through BIOSIS is another matter; my own feeling is that it needs to be seen as part of global infrastructure and funded at government or international level. While I include the activities of Zoological Record in that statement, I also intend it to include all taxonomic activity.
As Stan Blum said in his original proposal to hold this workshop, our ability to communicate precisely about the biological world, regardless of discipline, depends on common definitions of taxa and the names we use to designate them.
To use an analogy: taxonomic information is the telephone directory of biological information; without it you can't even start out to make the call. And yet people don't expect to pay for telephone directories: they assume that the telephone companies provide them free to encourage use of their networks. In fact, of course, they are not really free; they're paid for out of charges the customer has to meet, whether as part of line rental, call charges, or whatever. What taxonomists need to do is to find closer links to the network of life science information and persuade the owners of that network to support their activities.
And as a concluding remark, I don't know who owns the life science information telephone network, but I'm trying very hard to find out, and in the spirit of collaboration that I've tried to emphasize, will happily share the information if I succeed!
Michael Dadd has been with Zoological Record for more than thirty years. He is involved with a number of organizations that have interests in the communication of scientific information and taxonomy. The most relevant to this workshop are ICSTI (the International Council of Scientific and Technical Information) and CODATA. He is a member of the Management Committee of the International Trust for Zoological Nomenclature, and is also a member of the Project Management Team of the Species 2000 Project.