Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998

The Economics of Authority File Compilation: A perspective from a self-sustaining literature indexing enterprise


Michael N. Dadd

BIOSIS-UK
54 Micklegate
York, YO1 6WF, UK
mndadd@york.biosis.org


Introduction

There is a true story about a very senior member of the British Civil Service, who was asked to give evidence at a trial in whose outcome the Government had a strong interest. When it was subsequently discovered that his evidence was best described as less than helpful, his explanation was that he had not deliberately misled the Court, but that he had been "economical with the truth"—he'd answered what he was asked, but not volunteered anything.

I was asked to talk about Zoological Record's activities in authority file compilation as an economically self-sustaining activity. A slightly different use of "economic," but not wanting to be caught in the same trap as that civil servant, I think I should make clear at the beginning that our taxonomic authority files cannot really be called self-sustaining, though we do have a somewhat different funding model from some of the other speakers.

I'd like to briefly outline what BIOSIS sees as its role and how that is financed, move on to Zoological Record in general, and then to our taxonomic authority files, returning to the question of economics from time to time.

BIOSIS

BIOSIS, probably better known to many as Biological Abstracts, is a not-for-profit organisation based in Philadelphia and governed by a Board of twelve Trustees. Since 1980, we have been jointly responsible with the Zoological Society of London for the compilation of Zoological Record.

My primary role within BIOSIS is to manage the UK operation, which is solely responsible for the compilation of Zoological Record and where most of our taxonomic activities are based. While I'm also a member of the management team that runs BIOSIS as a whole, we have completely separate production systems for Biological Abstracts and Zoological Record.

Figure 1. The mission and role of BIOSIS.
  • Mission: To advance the knowledge base of biology by organising information to support the needs of life science researchers in all settings
  • Role: To collect, analyse, and filter the primary literature so that it is manageable by the individual researcher

BIOSIS sees its mission as helping to advance the knowledge of biology by organising information to support the needs of life science researchers in all settings. In other words, our role is to act as an interface between the producers and users of biological information, not to actually create that information—unlike many of the participants in this workshop. We sit across the boundary between the two communities mentioned in the overall introduction to the workshop.

We try to support these needs by collecting, analysing and filtering the primary literature of the life sciences so that it is manageable by the individual researcher. Primary literature to us largely consists of the traditional printed books and journals; although we are attempting to cover the newly emerging purely electronic publications, these present some unique problems.

It's important to understand that, as I just said, we regard our role as recording and filtering the data. Unlike some of the other speakers, it is most definitely not our role to sit in judgement on the value of that literature—we are not interested in control of information or of the decision making process, just in making everything that exists easily available to those who wish to consult it.

BIOSIS is indeed, as the title of this presentation suggests, an entirely self-supporting activity. We maintain our existence by providing services, usually to individual researchers, which are usually paid for by someone other than the user, often a librarian. We do not operate with grants; although we have sometimes had some funding in the shape of contracts to provide information to various agencies, we have regarded those contracts as we would any other sale.

We would agree with much of the comment made by the preceding speaker about the audience for the information we analyse and produce—its size and nature, and its suspicion about anything it sees as "commercial" in particular. While we do charge for our services we have no shareholders and do not generate a profit. Any surplus is invested in improving the database and the products we can produce from it.

Overall we employ 230 staff in Philadelphia, where we produce Biological Abstracts, and 45 in the UK, where we produce Zoological Record. We have a budget of about 22 million US dollars a year—of which roughly 10% goes towards Zoological Record—to pay those staff, provide the infrastructure, make and distribute the products. Overall we break even, most of the time.

Both Biological Abstracts and Zoological Record in their present form are essentially traditional abstracting and indexing (A&I) journals—they provide pointers to "real" information rather than direct access to the information itself. Our view is that this traditional approach will not be appropriate for much longer.

We see the future as moving towards a system which provides integrated, seamless access to information, including the contents of the primary literature and to the underlying data itself. We believe that BIOSIS has much to offer this process—while we agree with many of the comments which suggest that the future lies in distributed data, we also agree with what I think is the underlying theme of this workshop—that there must be a common structure and terminology to allow access to this data. That's where we believe our expertise lies, and where we can help at a practical, as well as a theoretical level.

Figure 2. A future role for BIOSIS.
  • Existing traditional A&I role
  • Move to future collaborative systems integrating publishing and other forms of information
  • Need for common structure and terminology to allow easy access
  • BIOSIS has knowledge-base and tools

As part of our role of collecting, recording, and filtering the data, we have developed a number of tools which we believe might be of use to the community. We have done much to develop our own knowledge base for biological information retrieval, drawing on the information we gain from handling something like 625,000 original items of literature each year. While it's the taxonomic information I would like to talk about today, we do believe that we've also made real progress on the subject side.

Zoological Record

So, turning to taxonomy and Zoological Record, I should remind you that Zoological Record is an index to the world literature in zoology; it has been published continuously since 1864, and throughout the past 130 years it has served both a current awareness and archival function.

Like everything we do, Zoological Record reports what is said, but doesn't comment on the value of the work. If an author describes a new species then, provided they've met the requirements of the International Code of Zoological Nomenclature, we record it. We now index about 70,000 original works each year, of which perhaps a quarter to a third are truly "taxonomic," and these generate about 30,000 new and changed names.

Because we've acted for many years as an unofficial register of zoological names, and unlike some current awareness services which can limit their indexing to a fixed number of items for each citation and to a fixed number of terms for each item, we try to include an entry for every nomenclatural act in every paper published world-wide. We know we miss some, but it's not for want of trying.

This requirement to be both comprehensive and complete is an important factor in our economic model. While typically we include perhaps five index sets for an original item, taxonomic papers push that number up very rapidly. The highest number we've had in recent years was over twelve thousand entries for a single work—that took a significant amount of extra effort and is the sort of thing which has to be factored into our budget process.

Registration

The concept of registration and registers leads into how and why the Zoological Record Taxonomic Authority Files have become more widely available.

A number of years ago the International Union of Biological Sciences set in train a process which, it hoped, would lead to a form of registration being introduced for all organisms, in the interests of stability and improved dissemination of information.

In some groups, such as Bacteria, this has been an attainable goal, realised through the Approved Lists of Bacterial Names. Botanists seem, after initial hesitation, to have become more comfortable with the concept, and a trial system of registration is now in place, with some possibility of formal adoption at the Botanical Congress in St Louis in 1999. However, there are clearly strong reservations about this in some quarters.

Zoologists, having included proposals in the draft of the new International Code of Zoological Nomenclature to require registration of all new names, rejected these proposals as a result of strong opposition to some aspects of the scheme—which had perhaps not been fully thought through by the International Commission on Zoological Nomenclature (ICZN). Part of the proposals included the adoption of Zoological Record as the Official Register; while we believe that was not the reason the proposals fell, it did lead us to develop public access to some of our taxonomic authority files as a demonstration of how taxonomists would be able to verify that their names were included in the Register, without being required to purchase Zoological Record to do so—which we understand was a considerable concern amongst many zoologists. This demonstration was an Internet-based system which was the prototype of our ION / TRITON system, described in more detail later.

Zoological Record Resources

The Zoological Record database holds a significant resource of animal name and bibliographic data, which we believe is crucial to the work of taxonomists, but which is only usable on payment of a charge of some sort. The community would clearly like this information to be available "free." Unfortunately they would also like it to continue to be available in future—which with the present economic model is a contradiction in terms.

Figure 3. ZR name resources.
  • Index database
  • Taxonomic hierarchy
  • TITAN – The Index to Taxonomic Animal Names
  • TRITON – Taxonomy Resource & Index To Organism Names
    • Links with Species 2000
    • Project Partners
  • ION – Index to Organism Names

We could, of course, make all our data freely available on the web tomorrow—we have the technology in place to do so, as I will describe—but if we did, compilation of the database would stop very quickly as there would be no way to pay the staff.

In the past few years we have been actively investigating ways in which the information we collect might be made available in a more flexible manner, while ensuring that the resources to maintain it continue to be available. These investigations have resulted in our active participation in the Species 2000 program, in discussions with ITIS, and with similar European agencies on how we can collaborate, and in our own work in developing our in-house authority files.

Zoological Record Authority Files

Although we don't judge the value of names and their relationships, we do need to organise the information so that it is accessible to the user—who may be a taxonomist or someone not trained in systematics and nomenclature. To assist in doing this we have a number of authority files, closely integrated into our production system, the first of which is our hierarchy.

This is a fairly high-level taxonomic hierarchy. It's not the result of original taxonomic work by our staff, but a consensus based on what we see happening in the literature. In other words, it reflects thought rather than leads it. And as it is intended for recording and retrieval rather than judgement, we only go down to a level that is generally accepted by the community: this might be to, say, family in the vertebrates, but only to order in some other groups. Within the hierarchy, we need to decide where to place names so that retrieval is accurate and efficient.

We have a system to do this called TITAN, which we developed from earlier manual card-indexes. This is a database of about 300,000 generic and sub-generic names, linked to our hierarchy and used as part of our semi-automated indexing process. This database is updated automatically as new taxa are added to the main index database. New taxa and proposals to change the relationships of taxa are always linked to the hierarchy used by the author; where this doesn't correspond to ours we add cross-references.

Species 2000

The Species 2000 program described earlier in the workshop sets out to provide the best possible tool both for taxonomists and for those working with biodiversity who need to know what the correct name for an organism is. BIOSIS and Zoological Record are enthusiastic and committed supporters of Species 2000; I am a member of its management team. We recognise, however, that Species 2000 is a long term program, dependent both on the availability of substantial funding and significant input from taxonomists for it to reach anything near completion. In the meantime, there is a need for information about taxonomic names and in particular a straightforward way of finding out what they are, where they belong, and where to find out more.

Index to Organism Names (ION)

As I mentioned earlier, in working with ICZN we had come up with a system to search the Zoological Record database and retrieve information about newly indexed names. We decided to see if we could extend that, firstly to all names in our database, and secondly to cover other groups of organisms.

Our partners in this project, who are all also active participants in Species 2000, include: that part of CABI BioScience previously known as IMI and the USDA Systematic Botany and Mycology Laboratory, who between them have provided a complete list of fungal names; the Missouri Botanical Garden, who have provided the mosses; and BIOSIS' own Register of Bacterial Nomenclature. Discussions are taking place with other organisations to see how the gaps which remain (the algae and the vascular plants) might be included.

One of the problems we faced in developing this project is that of economics. Some of our partners are happy to have their data available to anyone free of charge; others, like ourselves, need to take care not to eliminate use of the existing paid-for services, which provide the revenue supporting the ongoing database production. As a result we came up with what are, in effect, two views of the same information.

The first of these is called ION, the Index to Organism Names. This provides a freely available, simple look-up to determine which group an organism belongs to, though in the case of names from our own database we've been able to add some information not readily obtainable by users from our main index database using other tools.

For example, a search of the system for Carcinus maenas returns the major group, the author of the name, and where the name fits into our hierarchy. In addition we include a table that shows the number of papers in which the name has been used since volume 115 of Zoological Record (corresponding to the literature of 1978, when the ZR production system was computerised), and also a link to the full ZR hierarchy.

Figure 4. ION search form.

Figure 5. ION search results for "Carcinus maenas".

Figure 6. ZR taxonomic hierarchy for Carcinus maenas.
* Crustacea
** Malacostraca
*** Eumalacostraca
**** Eucarida
***** Decapoda $Crustacea$
****** Reptantia
******* Brachyura
******** Carcinus maenas (Linnaeus 1758)
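
The listing in Figure 6 encodes depth by the number of leading asterisks. As a minimal sketch only, the following shows how such a listing could be turned into an ordered lineage. The function name `parse_lineage` is hypothetical, and treating the trailing `$Crustacea$` as a qualifier to be dropped is an assumption made for illustration; neither is taken from the ZR production system.

```python
import re

# The hierarchy exactly as listed in Figure 6.
FIGURE_6 = """\
* Crustacea
** Malacostraca
*** Eumalacostraca
**** Eucarida
***** Decapoda $Crustacea$
****** Reptantia
******* Brachyura
******** Carcinus maenas (Linnaeus 1758)
"""


def parse_lineage(listing):
    """Return (depth, name) pairs, one per asterisk-prefixed line."""
    lineage = []
    for line in listing.splitlines():
        match = re.match(r"(\*+)\s+(.*)", line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        stars, name = match.groups()
        # Assumption: a trailing $...$ is a qualifier, dropped here.
        name = re.sub(r"\s*\$[^$]*\$\s*$", "", name)
        lineage.append((len(stars), name))
    return lineage


lineage = parse_lineage(FIGURE_6)
print(" > ".join(name for _, name in lineage))
```

Keeping the depth alongside each name means the same function would also cope with a listing that contains sibling branches, not just a single path.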

In addition to zoological names, as I've already mentioned, we include a number of other groups, such as mosses from the Missouri Botanical Garden database.

Figure 7. Search results for "Mnium curvulum".

Index To Organism Names: Search Results

Request Details

  • Searching for : Mnium curvulum

Search Results Found in the Missouri Botanical Gardens Moss Names Database
Name : Mnium curvulum
Group : Moss
Author : C. Müller 1896
For more information search W3MOST the moss database at the Missouri Botanical Garden

Even using this limited (since 1978) database can throw up some interesting results. A search for the name Mirabella shows it has been used by three different zoologists, all naming a new genus and all within a six-year period.

Figure 8. ION results – Mirabella summary.

Index To Organism Names: Search Results

Request Details

  • Searching for : Mirabella

Search Results

More than 1 matching name has been found (note that all names represented in the database with differing forms of author and/or date are currently treated as different names). For more information, check the boxes next to the name(s) of interest and select 'More Data'.


Found in the Zoological Record Animal Names Database
Organism Name   Author Details                             Group
Mirabella       Barskova M. I. 1988                        Mollusca
Mirabella       Emeljanov A. F. 1982                       Hemiptera
Mirabella       de Bruijn, Unay, Sarac & Hofmeijer 1987    Mammalia

 
Figure 9. ION results – Mirabella details.

Index To Organism Names: Search Results

Request Details

  • Searching for : Mirabella

Search Results

Found in the Zoological Record Animal Names Database
Name : Mirabella
Group : Mollusca
Author : Barskova M. I. 1988
Classification : Gastropoda
Newly reported in ZR 125, as : Gen. nov. [full citation & ZR index entry data to be available through a proposed TRITON subscription service (in development)]

Occurrence In ZR : 1
ZR Volume : 125
Occurrence : 1


Name : Mirabella
Group : Hemiptera
Author : Emeljanov A. F. 1982
Classification : Delphacidae
Newly reported in ZR 125, as : Gen. nov. [full citation & ZR index entry data to be available through a proposed TRITON subscription service (in development)]

Occurrence In ZR : 1
ZR Volume : 125
Occurrence : 1


Name : Mirabella
Group : Mammalia
Author : de Bruijn, Unay, Sarac & Hofmeijer 1987
Classification : Muridae
Newly reported in ZR 124, as : Gen. nov. [full citation & ZR index entry data to be available through a proposed TRITON subscription service (in development)]

Occurrence In ZR : 2
ZR Volume : 124, 129
Occurrence : 1, 1

If we had a system of registration in place, or even if taxonomists made more use of existing resources, this sort of mistake could be prevented. Unfortunately at present it happens all too regularly.
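
The kind of check that would catch the Mirabella case can be sketched in a few lines. This is an illustration only: the function name `find_homonyms` and the record layout are hypothetical, and the data are simply the rows shown in Figures 8 and 11. Because ION currently treats differing forms of author or date as different names, a naive check like this one would also flag mere variants of the same author citation, so its output would still need review.

```python
from collections import defaultdict

# (name, author and date, group) rows taken from Figures 8 and 11.
RECORDS = [
    ("Mirabella", "Barskova M. I. 1988", "Mollusca"),
    ("Mirabella", "Emeljanov A. F. 1982", "Hemiptera"),
    ("Mirabella", "de Bruijn, Unay, Sarac & Hofmeijer 1987", "Mammalia"),
    ("Filenchus", "Andrassy 1954", "Nematoda"),
]


def find_homonyms(records):
    """Map each name used with more than one author/group to its uses."""
    uses_by_name = defaultdict(set)
    for name, author, group in records:
        uses_by_name[name].add((author, group))
    return {
        name: sorted(uses)
        for name, uses in uses_by_name.items()
        if len(uses) > 1
    }


homonyms = find_homonyms(RECORDS)
for name, uses in homonyms.items():
    print(name)
    for author, group in uses:
        print(f"  {author} ({group})")
```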

Figure 10. Index to Organism Names (ION).
  • ION released April 1997
  • limited information
  • 250 searches a day
  • Taxonomy Resource & Index To Organism Names (TRITON) provides more detailed information

The ION system was made publicly available without charge in April 1997 and has been steadily used since then. It now contains about one and a quarter million names, and searches are typically running at about 250 names, from around 70 unique hosts, each day.

ION is one of two views of the database. The other is TRITON, the Taxonomy Resource and Index To Organism Names, which provides access to the same information as ION, amplified by the full bibliographic data in the case of Zoological Record, and any other data the other participating organisations may want to make available. This system is functional now in the sense that it is in place on our server, but not yet accessible to anyone other than BIOSIS and our partners; it could very easily be made widely available if we were sure that doing so would not destroy our ability to continue production of the database from which it is derived.
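
The idea of two views over one underlying record can be sketched as follows. This is not a description of how ION or TRITON is implemented; the class `NameRecord`, the functions `ion_view` and `triton_view`, and the field names are all hypothetical, and the sample values are taken from the Filenchus entry in Figure 11.

```python
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    """One name in the shared database (hypothetical layout)."""
    name: str
    group: str
    author: str
    citations: list = field(default_factory=list)


def ion_view(record):
    """Free look-up view: name, group and author only."""
    return {
        "name": record.name,
        "group": record.group,
        "author": record.author,
    }


def triton_view(record):
    """Fuller view: the ION fields plus the bibliographic data."""
    view = ion_view(record)
    view["citations"] = list(record.citations)
    return view


record = NameRecord(
    name="Filenchus",
    group="Nematoda",
    author="Andrassy 1954",
    citations=["Bulletin of Zoological Nomenclature 44(1): 23-24 (1987)"],
)

print(ion_view(record))
print(triton_view(record))
```

Building the fuller view on top of the free one keeps the two from drifting apart: anything shown in the free view is, by construction, also present in the fuller one.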

Figure 11. TRITON results for "Filenchus". (In the on-line version, values for "Entry type", shown here in blue, would hyperlink to citations below. Only the first of nine citations is shown here.)

TRITON: Search Results

Zoological Record Animal Names Database nomenclatural data for : Filenchus

ZR Volume  Entry Type                          Year        Author
123        Proposal To Commission              Publ: 1987  Brzeski-M-W; Geraert-E; Raski-D-J
123        Syn Nov                             Publ: 1986  Siddiqi-M-R
123        Syn Nov                             Publ: 1986  Siddiqi-M-R
124        Syn Nov                             Publ: 1986  Raski-D-J; Geraert-E
124        Syn Nov                             Publ: 1986  Raski-D-J; Geraert-E
124        Syn Nov                             Publ: 1986  Raski-D-J; Geraert-E
124        Syn Nov                             Publ: 1986  Raski-D-J; Geraert-E
124        Comments On Proposal To Commission  Publ: 1988  Fortuner-R; Maggenti-A-R; Loof-P-A-A
124        Comments On Proposal To Commission  Publ: 1988  Siddiqi-M-R; Hunt-D-J

Citation 01
Serial Title: BULLETIN OF ZOOLOGICAL NOMENCLATURE Vol: 44 Iss: 1 Year(s): Publ: 1987
Article Authors: Brzeski-M-W; Geraert-E; Raski-D-J Title: Case 2582. Filenchus Andrassy, 1954 (Nematoda): proposed designation of Tylenchus vulgaris Brzeski, 1963 as type species. Pages: 23-24 Language: In English
Indexing Nematoda; Filenchus; Andrassy 1954; Proposal to Commission; Placement on Official List & designation of type species; Tylenchus vulgaris Brzeski 1963, p. 24

One comment that should be made is that name data presented by the ION / TRITON systems reflects, at least for zoology, the actual usage of names in the literature, including use of alternative spellings, synonyms and so on—it does not comment on the validity of those names except where that is the specific purpose of the work—and there of course the opinion expressed is that of the author, not of Zoological Record. In other words it's a nomenclator, not a taxonomic authority in its own right.

Figure 12. ION / TRITON.
  • ION and TRITON are nomenclators
  • widely usable
  • most useful in building better taxonomic databases
  • aim to support Species 2000

We take the view that, while this service is usable by non-taxonomists in the absence of other, more comprehensive directories, it is most useful to taxonomists to help them build better databases for more widespread use—such as through the Species 2000 project. Zoological Record is not in competition with Species 2000; we are trying to help support the infrastructure needed to build that system. Information about both ION and TRITON, and access to ION, is available at www.york.biosis.org/triton.

Economic Models

The TRITON taxonomic authority file, of which ION is a subset, and the other activities I have been describing are all based on the manipulation of data which has been drawn together as part of the process of producing Zoological Record. None have any funding in their own right, and indeed none exist as a separate entity—they are all views of a common database that could not exist without the overall ZR production operation, including the substantial costs of literature acquisition and control, indexing, thesaurus maintenance, and many other functions. Putting together the physical product that the user sees is only the tip of the iceberg in terms of cost.

Figure 13. Economic models.
  • All TAFs derived from existing ZR database
  • ZR is not self-sustaining
    • supported by scientists
    • supported by Zoological Society of London
    • supported by BIOSIS
  • Future support?

The title of this presentation includes the phrase "a self-sustaining literature indexing enterprise." I have to say that Zoological Record has never, since it began in 1864, been self-supporting, and the cost of producing it continues to mount. Initially the losses were met by the group of 19th century scientists who set it up; when in the 1880s they were no longer able to continue, the Zoological Society of London accepted the responsibility and covered the costs as part of its contribution to science for almost a century.

When in 1980 the Society in turn was unable to bear the burden alone, BIOSIS stepped in with an offer of partnership—knowing full well that this was an offer that required a relatively deep pocket. Fortunately, despite the present climate—which has not been kind to many, if any, of the secondary services—BIOSIS has been able and willing up till now to continue to pick up the deficit, which amounts to around half a million dollars each year when all costs are taken into account.

How long that can continue I don't know—perhaps not indefinitely—but I have to say that I am not sure in any case that I understand why one organisation of our nature should carry this financial burden. If this aggregation of taxonomic information, both as taxonomic authority files and the raw data that support those files, is as essential as we are regularly told by users that it is, then I believe that fact has to be recognized and the information paid for, by someone. Whether that should be through the existing economic model and through BIOSIS is another matter; my own feeling is that it needs to be seen as part of global infrastructure and funded at government or international level. While I include the activities of Zoological Record in that statement, I also intend it to include all taxonomic activity.

Conclusions

As Stan Blum said in his original proposal to hold this workshop, our ability to communicate precisely about the biological world, regardless of discipline, depends on common definitions of taxa and the names we use to designate them.

To use an analogy: taxonomic information is the telephone directory of biological information; without it you can't even start out to make the call. And yet people don't expect to pay for telephone directories; they assume that the telephone companies provide them free to encourage use of their networks. In fact, of course, they are not really free: they're paid for out of charges the customer has to meet, whether as part of line rental, call charges, or whatever. What taxonomists need to do is to find closer links to the network of life science information and persuade the owners of that network to support their activities.

And as a concluding remark, I don't know who owns the life science information telephone network, but I'm trying very hard to find out, and in the spirit of collaboration that I've tried to emphasize, will happily share the information if I succeed!




Biographical Information

Michael Dadd has been with Zoological Record for more than thirty years. He is involved with a number of organizations that have interests in the communication of scientific information and taxonomy. The most relevant to this workshop are ICSTI (the International Council of Scientific and Technical Information) and CODATA. He is a member of the Management Committee of the International Trust for Zoological Nomenclature, and is also a member of the Project Management Team of the Species 2000 Project.