Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998
Names, Taxa, and Information
Taxonomic Authority Files have been identified as a central part of biodiversity information. In this talk, I will show the problems caused by using scientific names as an index to biodiversity information, and I will suggest solutions. Since this audience does not consist solely of systematists, I will begin by circumscribing the subject rather broadly.
Biodiversity has been defined as:
The variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems, and the ecological complexes of which they are a part; this includes diversity among genes, individuals, and populations within and between species, and of ecosystems.
-- Subgroup on Biodiversity Informatics, Working Group on Biological Informatics of the OECD Megascience Forum [see: http://www.oecd.org/dsti/sti/s_t/ms/prod/BIREPFIN.pdf, or http://www.bgbm.fu-berlin.de/biodivinf/docs/oecdmswgbi.pdf ]
Biodiversity information may be roughly subdivided into three levels, corresponding to the different scientific and user communities that deal with them (Fig. 1). This partitioning is also reflected in the emerging field of biodiversity informatics: biodiversity information at the ecosystem level is often seen as a (more or less important) part of environmental information systems, the organism level is the domain of taxonomic information systems, and for the molecular level the term "Bioinformatics" has been coined.
Figure 1. Biodiversity information components
Ecosystem level:
- Habitats
- Biotopes
- Associations of organisms
Organism level:
- Species
- Infraspecific taxa
Molecular level:
- Genes
- Metabolic products
This division somewhat hinders information access, because it tends to obscure what the levels have in common: access to biodiversity information at every level is based on classification systems. The drive towards Authority Files is largely an attempt to harmonize and qualify these classification systems, whether with respect to the terminology describing functions or processes, to descriptive terminology, or to the concepts used in the classification systems themselves.
Geographic information and the scientific names for groups of organisms (taxa) are the key access points as well as the most important information output provided by biodiversity information systems. Designations for taxa form the principal index for large parts of the existing scientific information about life on earth (past and present).
Of course there are other major biological fields which use ecosystems, taxa and/or molecules as their objects of research, analyzing their composition, structure, processes and organization. However, all scientific fields develop classification systems and terminologies to express their results, so the principal problems encountered in the indexing of biodiversity information are by no means restricted to that subject. The interdisciplinary approach of this workshop is based on the realization that these problems exist in information retrieval for any field to which human descriptive and research activities are directed.
Figure 2. Biodiversity information
To be able to access and communicate information in a meaningful way, we have to classify, and we have to name the resulting classes. If we can define the classes exactly, we have created an index system that allows us to access the underlying information very precisely. For biodiversity information at the organismic level, however, this is exactly where we encounter a major problem, which I will illustrate in the following.
The first example illustrates an exact classification and naming process (Fig. 3). Many secondary metabolites in plants are small biomolecules for which the entire chemical structure is known. We can classify the individual molecule according to a single criterion, the structure. Any new compound found and isolated can be compared to known structures and either be grouped with an existing one or form a new class of its own. A name is assigned, which applies to an exactly circumscribed set of compounds.
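To make the contrast with organisms concrete, exact classification can be sketched as grouping isolates on a single key: their structure. This is a minimal Python sketch that assumes each structure has already been reduced to a canonical string; the keys STRUCT-A and STRUCT-B and the isolate IDs are hypothetical placeholders, not real structure encodings.

```python
from collections import defaultdict


def classify_by_structure(isolates: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group isolate IDs by an exact structure key: one class per structure."""
    classes: dict[str, list[str]] = defaultdict(list)
    for isolate_id, structure_key in isolates:
        classes[structure_key].append(isolate_id)
    return dict(classes)


# Hypothetical isolates; "STRUCT-A"/"STRUCT-B" stand in for canonical structure keys.
isolates = [
    ("isolate-1", "STRUCT-A"),
    ("isolate-2", "STRUCT-B"),
    ("isolate-3", "STRUCT-A"),  # same structure as isolate-1, so same class
]
print(classify_by_structure(isolates))
# {'STRUCT-A': ['isolate-1', 'isolate-3'], 'STRUCT-B': ['isolate-2']}
```

Membership is decided by one criterion with no room for interpretation, which is precisely what a taxon name cannot offer.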
Figure 3. Exact classification and naming
Example: (smaller) biomolecules. At the basic level, biomolecules can be exactly classified by means of a single criterion: their structure.
Trivial name: Isoatripiciolideangelate
Search and indexing: done on structure or parts of structure, rarely on names
(Although every chemical compound can in theory be named according to a defined set of rules, these names become so complicated that the graphical representation itself is often used instead. Chemical databases store these representations, and queries can be made by drawing a compound or part of one.)
My point is that the basic class is exact, consisting of uniform elements. The technical index item is an exact description, be it the technical name or a graphical representation of the structure.
Classification of organisms has the same aim as that of molecules. To be able to comprehend the organismic world we need to subdivide it, distribute the knowledge into drawers (classes = taxa), and we have to name these classes to be able to talk about them.
In contrast to a molecular structure (or its descriptive name), however, the name of a taxon denotes a class of items that are not exactly alike (Fig. 4). The classification process in this case is not the determination of a single, overriding property; rather, the class represents a taxonomic concept, which in turn determines the criteria used to define the class. So even at the lowest level of classification, a taxon is a class defined by a selection of criteria. This constitutes a fundamental difference from the classification used for biomolecules. Because of the inherent complexity of organisms, the number of morphological, anatomical, physiological, chemical, genetic, etc. characters that can be used is very high. Fortunately, over the past centuries systematists have reached a certain consensus that the taxonomic system should not only serve to identify organisms, but should also try to mirror the genetic relationships between groups of organisms (although these two aims are sometimes contradictory).
Figure 4. Taxa: the classification of concepts
As soon as a class of organisms has been formed, the international codes of nomenclature provide clear rules on how to name the taxon. Separate codes exist for Animals, Plants, Bacteria, and Viruses. The names formed in accordance with the codes are not descriptive, which has proven to be a practical requirement of the system.
The circumscription of a taxon, e.g. that of a species, and the assignment of a name to the taxon are laid down in a published treatment (Fig. 5). The circumscription may be made by describing the characters that differentiate the taxon from others and/or by describing the taxon itself, as well as by referring to preserved specimens held in long-term storage in natural history collections. The latter has proven to be the most important means of reproducing an author's ideas about the circumscription of a taxon, hence the paramount importance of collections in systematics.
Figure 5. Taxon circumscription vs. name
Generally, a new species name has to be explicitly tied to exactly one specimen, the so-called type specimen. The correct name for a species therefore follows from the type concept: it depends on which type specimens fall within the circumscription of the taxon. The rule is that the oldest name whose type falls within that species circumscription has to be used. This priority rule has led to a rather stable nomenclature of organisms, although it introduces a grave complication in the use of names as an index for biodiversity information access.
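The priority rule can be sketched as a small selection function: among all names whose type specimens fall within a given circumscription, take the oldest. This is a deliberately simplified Python sketch; the specimen identifiers and both circumscriptions are hypothetical, and real nomenclature adds conservation, homonymy, legitimacy, and rank-specific rules that are ignored here. It also returns the original name rather than the combination in the currently accepted genus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypifiedName:
    name: str      # the species name as originally published
    year: int      # year of publication
    type_id: str   # identifier of its type specimen (hypothetical IDs below)


def correct_name(circumscription: set[str], names: list[TypifiedName]) -> Optional[TypifiedName]:
    """Return the oldest name whose type specimen lies inside the circumscription.

    `circumscription` is the set of specimen IDs a treatment includes in the species.
    Simplified: ignores conservation, homonymy, legitimacy and same-year ties.
    """
    candidates = [n for n in names if n.type_id in circumscription]
    if not candidates:
        return None  # no type included: the species would need a new name
    return min(candidates, key=lambda n: n.year)


names = [
    TypifiedName("Hypnum fluitans", 1801, "TYPE-fluitans"),
    TypifiedName("Hypnum exannulatum", 1854, "TYPE-exannulatum"),
]
broad = {"TYPE-fluitans", "TYPE-exannulatum", "SPEC-17"}  # hypothetical wide concept
narrow = {"TYPE-exannulatum", "SPEC-42"}                  # hypothetical narrow concept
print(correct_name(broad, names).name)   # Hypnum fluitans
print(correct_name(narrow, names).name)  # Hypnum exannulatum
```

The same specimen (here "TYPE-exannulatum") ends up under different correct names depending solely on where the circumscription is drawn.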
Another problem imposed by the rules of nomenclature is that for taxon names below the rank of genus, the boundary between classification and nomenclature is obscured. For example, a species name consists of the genus name plus an additional word, the species epithet. This has proven to be very practical, but it also means that a new system of definitions or circumscriptions at the generic level affects the names of the included species.
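This coupling can be illustrated with the botanical author citation convention: when a species is moved to another genus, the epithet and type stay the same, the original author is cited in parentheses, and the author of the new combination follows. A minimal Python sketch using the Hypnum fluitans transfers discussed with Fig. 6; the helper names are hypothetical, and details such as gender agreement of adjectival epithets are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesName:
    genus: str
    epithet: str
    authors: str  # author citation, e.g. "Hedw." or "(Hedw.) Warnst."

    def __str__(self) -> str:
        return f"{self.genus} {self.epithet} {self.authors}"


def new_combination(basionym: SpeciesName, new_genus: str, combining_author: str) -> SpeciesName:
    """Transfer a species to another genus; epithet and type stay the same.

    The basionym author goes in parentheses, followed by the author of the
    new combination. Assumes `basionym` is an original name, not a combination.
    """
    if basionym.authors.startswith("("):
        raise ValueError("pass the basionym, not an earlier combination")
    return SpeciesName(new_genus, basionym.epithet, f"({basionym.authors}) {combining_author}")


hypnum = SpeciesName("Hypnum", "fluitans", "Hedw.")
print(new_combination(hypnum, "Drepanocladus", "Warnst."))  # Drepanocladus fluitans (Hedw.) Warnst.
print(new_combination(hypnum, "Warnstorfia", "Loeske"))     # Warnstorfia fluitans (Hedw.) Loeske
```

A purely generic decision thus produces new species names, even though nothing about the species concept has changed.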
Let me try to illustrate the implications with an example from the world of mosses (Fig. 6). The plane of the illustration represents the multidimensional character space used to circumscribe taxa. The boxes represent certain concepts, i.e. circumscriptions of species in recent treatments. The white dots represent the position of designated type specimens within the character space.
Figure 6. Circumscriptions, types, and names.
How taxonomic and nomenclatural decisions at the genus level overlay the naming of species is illustrated by the use of different generic names (Drepanocladus, Hypnum) within the same character space. Hypnum fluitans was described by Hedwig in 1801. It was transferred to the genus Drepanocladus by Warnstorf in 1903; the resulting name, Drepanocladus fluitans, is based on the same type specimen. (In 1907, Loeske placed the type of the species in yet another genus, Warnstorfia.) Hypnum exannulatum was described by Schimper in 1854 and underwent the same generic transfers as Hypnum fluitans.
This should suffice to illustrate that several names may exist for a given point in the character space, depending on the generic circumscription. The more important fact, however, is that different taxonomic circumscriptions may exist under the same name. This is exemplified by the differing treatments of Drepanocladus fluitans by Touv & Rubers (1989) and Frey & al. (1995). From the nomenclatural point of view, Hypnum fluitans, Drepanocladus fluitans, and Warnstorfia fluitans are synonyms. However, the circumscription provided by Touv & Rubers differs substantially from that given by Ludwig & al.
Figure 7. Nomenclature
As a consequence, we encounter problems with information linked to such names. For example, if Drepanocladus fluitans is protected by law, which concept do we refer to? Do we include all the organisms circumscribed by the black box, or those in the yellow box? We cannot solve this question by referring just to the name.
Should we abandon the type concept and the other rules that cause these problems? I do not think so. By no means do all taxa present the problems described. The current system represents an international consensus and provides a certain stability of names. In particular, it has helped to avoid an inflation of names; imagine if every (supposed) change in the concept of a taxon had led to a new name. Finally, no reasonable alternative to the current system has been put forward.
On the other hand, we need a unique index to the concepts represented by names in cases where this is vital - for example for legal issues, in medicine (poisonous organisms, pharmaceutical properties), and in environmental management (endemism, habitat change, environmental impact assessment, invasive species, etc.).
Figure 8. A potential solution.
A rather simple and practical solution is to replace names with potential taxon names [Berendsohn 1995], a combination of a name with an additional data item, a circumscription reference (Fig. 8). When different concepts are in use under the same name, or different names for the same concept, then we can actually verify to a certain extent how the concepts relate to each other, as long as we keep them separate. An information item linked to a potential taxon name, e.g. information on a certain property of the organism, can thus be followed up. As a matter of scientific progress, new concepts will arise, but the older concepts can be mapped into the new ones, thus providing an indication of the extent to which information attached to the old concept can be transferred to the new one. This mapping of potential taxa to a current consensus view (or several of them) is a task for the specialist, of course. However, any thorough treatment of a taxonomic group includes this scrutiny of concepts, so it is only a new task for information storage, not a new task for the taxonomist.
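The core of the idea is that the identity of a potential taxon is the pair (name, circumscription reference), not the name alone. The following minimal Python sketch uses hypothetical record labels to show how an index keyed on names merges information attached to two different concepts of Drepanocladus fluitans, while an index keyed on potential taxa keeps them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialTaxon:
    """A name plus a circumscription reference: the name "sec." a given treatment."""
    name: str
    sec: str

    def __str__(self) -> str:
        return f"{self.name} sec. {self.sec}"


touv = PotentialTaxon("Drepanocladus fluitans", "Touv & Rubers 1989")
frey = PotentialTaxon("Drepanocladus fluitans", "Frey & al. 1995")

# Hypothetical information items attached to each concept.
records = [(touv, "record-1"), (frey, "record-2")]

by_name: dict[str, list[str]] = {}
by_potential_taxon: dict[PotentialTaxon, list[str]] = {}
for taxon, record in records:
    by_name.setdefault(taxon.name, []).append(record)
    by_potential_taxon.setdefault(taxon, []).append(record)

print(len(by_name))              # 1: both records merged under one name
print(len(by_potential_taxon))   # 2: each record stays with its own concept
print(by_potential_taxon[frey])  # ['record-2']
```

Because the dataclass is frozen, equal (name, sec) pairs hash identically, so a potential taxon can serve directly as an index key.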
We have implemented a crude version of the potential taxon concept in the Global Plant Checklist (GPC) database of the International Organization for Plant Information (IOPI) (see http://www.bgbm.fu-berlin.de/IOPI/GPC/). Although the database is restricted to vascular plants, I have taken the liberty of introducing a few mosses to serve as an example of how the concept is used. (The GPC project itself is still awaiting funding.) If you search for Warnstorfia fluitans, you are presented with an arbitrarily assigned consensus view (Fig. 9), that of Ludwig & al. 1996. Other potential taxa are included in the database and have been related to the consensus view.
Figure 9. Implementation in the IOPI GPC
Warnstorfia fluitans (Hedw.) Loeske
The example is taken from a project on mosses in Germany carried out at the University of Göttingen in collaboration with the Federal Office for Nature Protection [Gradstein & al. 1998]. This TAXLINK project group is building a database implementing the full IOPI model [Berendsohn 1997] and is in the process of entering data from various important historical treatments, linking them into a current view of the taxa. (Please note that the treatment by Ludwig & al. is not necessarily their consensus concept.)
The entry depicted above cites the consensus view first, including its source (the "sec.", i.e. the reference in whose sense the name is used), followed by the nomenclatural synonyms of the name. This is the part that all traditional taxonomic authority files contain.
Under the heading "concept synonyms", a list of previously used potential taxa is given, together with their relationship (in brackets) to the consensus view. From this we can conclude, for example, that any information attached to Drepanocladus fluitans as treated by Smith (1978) can be transferred directly to Warnstorfia fluitans as accepted in the current view. In contrast, caution is necessary with information attached to the same name as used by Touv & Rubers (1989).
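The decision described here can be sketched as a rule on the relationship between a concept synonym and the consensus concept: only congruence licenses direct transfer. In this Python sketch the relation labels and symbols are hypothetical (they are not the GPC's bracketed notation), and the overlap assigned to the Touv & Rubers concept is an assumed stand-in for "caution is necessary".

```python
from enum import Enum


class Relation(Enum):
    """Relation of an older concept to the consensus concept (labels are hypothetical)."""
    CONGRUENT = "="      # same circumscription
    INCLUDED_IN = "<"    # older concept is narrower
    INCLUDES = ">"       # older concept is broader
    OVERLAPS = "><"      # partial overlap


def transfer_advice(relation: Relation) -> str:
    """Only congruent concepts allow information to be carried over unchecked."""
    if relation is Relation.CONGRUENT:
        return "transfer directly"
    return "caution: concepts differ, check each information item"


# Concept synonyms of Warnstorfia fluitans sec. Ludwig & al. 1996.
# ASSUMPTION: OVERLAPS for Touv & Rubers is illustrative; the text only says "caution".
concept_synonyms = {
    ("Drepanocladus fluitans", "Smith 1978"): Relation.CONGRUENT,
    ("Drepanocladus fluitans", "Touv & Rubers 1989"): Relation.OVERLAPS,
}
for (name, sec), relation in concept_synonyms.items():
    print(f"{name} sec. {sec} [{relation.value}]: {transfer_advice(relation)}")
```

Treating inclusion as "caution" is a conservative choice: statements about the narrower concept may hold for only part of the broader one.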
The GPC database is implemented using standard relational database technology. The structure can be found at http://www.bgbm.fu-berlin.de/IOPI/Gpc/CurrentModel/ImportTables.gif.
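The actual GPC structure is only available as the diagram linked above. As a rough illustration of how potential taxa map onto relational tables, here is a sketch using Python's built-in sqlite3 module. The table and column names (taxon_name, publication, potential_taxon, concept_relation), the relation symbols, and the '><' value for Touv & Rubers are all assumptions, not the GPC schema or data.

```python
import sqlite3

# Hypothetical schema: a potential taxon is a (name, publication) pair, and
# concept_relation records how one potential taxon relates to another.
SCHEMA = """
CREATE TABLE taxon_name (
    name_id   INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE publication (
    pub_id    INTEGER PRIMARY KEY,
    citation  TEXT NOT NULL
);
CREATE TABLE potential_taxon (
    pt_id     INTEGER PRIMARY KEY,
    name_id   INTEGER NOT NULL REFERENCES taxon_name(name_id),
    sec_id    INTEGER NOT NULL REFERENCES publication(pub_id),
    UNIQUE (name_id, sec_id)  -- the same name in the same sense only once
);
CREATE TABLE concept_relation (
    from_pt   INTEGER NOT NULL REFERENCES potential_taxon(pt_id),
    to_pt     INTEGER NOT NULL REFERENCES potential_taxon(pt_id),
    relation  TEXT NOT NULL CHECK (relation IN ('=', '<', '>', '><')),
    PRIMARY KEY (from_pt, to_pt)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO taxon_name VALUES (?, ?)",
                 [(1, "Warnstorfia fluitans"), (2, "Drepanocladus fluitans")])
conn.executemany("INSERT INTO publication VALUES (?, ?)",
                 [(1, "Ludwig & al. 1996"), (2, "Smith 1978"), (3, "Touv & Rubers 1989")])
# pt 1 = consensus view; pts 2 and 3 = the same name in two different senses
conn.executemany("INSERT INTO potential_taxon VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 2, 2), (3, 2, 3)])
conn.executemany("INSERT INTO concept_relation VALUES (?, ?, ?)",
                 [(2, 1, "="), (3, 1, "><")])  # '><' for Touv & Rubers is assumed

# Concept synonyms of the consensus potential taxon, with their relation.
QUERY = """
SELECT n.full_name, p.citation, cr.relation
FROM concept_relation AS cr
JOIN potential_taxon AS pt ON pt.pt_id = cr.from_pt
JOIN taxon_name AS n ON n.name_id = pt.name_id
JOIN publication AS p ON p.pub_id = pt.sec_id
WHERE cr.to_pt = ?
ORDER BY p.citation
"""
for row in conn.execute(QUERY, (1,)):
    print(row)
```

The UNIQUE constraint on (name_id, sec_id) is what makes a name in the sense of a given reference, rather than the name alone, the unit of the authority file.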
Figure 10. Potential taxa, implementation
Advantages
+ Provides an Authority File without being in the way of scientific progress
+ Links unambiguous information to Authority File entries, alerts user to inconsistencies
+ Storage of taxonomic and nomenclatural decision process, duplicate efforts avoided
Caveats
- Constant maintenance by systematists needed
- Comparatively complex structure
Implementing a database that uses potential taxon names as the taxonomic authority file has a series of advantages compared to simple checklists. First of all, input is very flexible: new records can be added without having to decide immediately how they relate to the current consensus view. They become accessible by way of the name alone, while coordination can be done at a later stage. The consensus view may change; the current consensus view then simply becomes another potential taxon in the list. Where there are no conflicts, users can be led directly to other information linked to the taxon; where conflicts exist, they can be alerted. The system can store large parts of the decision process leading to the current taxonomic view, information which in traditional checklists is largely lost or supported only by more or less unstructured notes. Thus the system clearly supports the taxonomic process and in no way imposes an authoritative treatment on the community (which taxonomists sometimes fear when they hear the term authority file). On the other hand, a view of the database can be implemented that shields the user community outside systematics from all unneeded information, presenting them only with the latest consensus checklist.
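Two of these properties, deferred coordination and a shielded public view, can be sketched in a few lines of Python. The class ConceptStore and its methods are hypothetical illustrations, not part of the GPC or the IOPI model; relations are stored as plain strings, with None meaning "not yet coordinated".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptStore:
    """Potential taxa keyed by (name, sec), each with an optional relation to the consensus."""
    consensus_sec: str
    entries: dict[tuple[str, str], Optional[str]] = field(default_factory=dict)

    def add(self, name: str, sec: str, relation: Optional[str] = None) -> None:
        # A record can be entered before anyone decides how it relates to the consensus.
        self.entries[(name, sec)] = relation

    def find(self, name: str) -> list[str]:
        """Access by name alone: every treatment ("sec.") in which the name is used."""
        return sorted(sec for (n, sec) in self.entries if n == name)

    def uncoordinated(self) -> list[tuple[str, str]]:
        """Non-consensus records still awaiting a specialist's decision."""
        return sorted(key for key, rel in self.entries.items()
                      if rel is None and key[1] != self.consensus_sec)

    def public_view(self) -> list[str]:
        """What users outside systematics see: only the consensus checklist."""
        return sorted(n for (n, sec) in self.entries if sec == self.consensus_sec)


store = ConceptStore(consensus_sec="Ludwig & al. 1996")
store.add("Warnstorfia fluitans", "Ludwig & al. 1996")
store.add("Drepanocladus fluitans", "Smith 1978", "=")
store.add("Drepanocladus fluitans", "Touv & Rubers 1989")  # not yet coordinated
print(store.find("Drepanocladus fluitans"))  # ['Smith 1978', 'Touv & Rubers 1989']
print(store.uncoordinated())  # [('Drepanocladus fluitans', 'Touv & Rubers 1989')]
print(store.public_view())    # ['Warnstorfia fluitans']
```

The uncoordinated record is already findable by name; only the specialist's relation decision is postponed.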
I think that these advantages outweigh the caveats, which lie largely in the more complex structure of such a system.
Figure 11. Accessing biodiversity
Coming back to the question of biodiversity information access at the level of organisms, access via names may reveal a large amount of information (e.g. using a full-text query on the WWW). However, an access system using potential taxa will provide information of superior quality. Even so, the highest-quality information should be falsifiable, which can only be achieved by linking it to specimens.
Taxon- and name-based access is the central component of the Global Biodiversity Information Facility (GBIF) proposed by the Subgroup on Biodiversity Informatics of the Working Group on Biological Informatics, OECD Megascience Forum. In this workshop we have seen that substantial efforts are under way to lay the foundations for this component. The technical conditions exist to implement an access system based on taxonomic concepts, but to achieve this aim several prerequisites have to be met.
Figure 12. Actions required.
Name access forms the core of the proposed Global Biodiversity Information Facility (GBIF).
Full access to published names is of course a basic component of any solution envisioned, and projects such as the International Plant Names Index introduced earlier in the workshop show that this aim is within reach. In the realm of software development, we need data capture software for potential taxon information. Such software does not yet exist, but, as I mentioned, efforts to develop it are under way. We also need software components able to handle this type of information in a networked environment. There are projects developing the logic for programs that can link existing checklist-based databases and make intelligent guesses as to the relationships between the different potential taxa they contain (see http://sobs.soton.ac.uk/litchi/).
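One simple way to make such a guess, shown here only as an illustration and not as the logic used by the project cited, is to compare the sets of type-based names that each checklist includes in a concept. A minimal Python sketch under that assumption; the two example treatments are hypothetical. Matching type sets do not prove congruence, since concepts can include the same types and still differ elsewhere, so every result is only a suggestion for a specialist.

```python
def guess_relation(included_a: set[str], included_b: set[str]) -> str:
    """Guess how concept A relates to concept B from the type-based names each includes.

    Only a first guess for a specialist to confirm: two concepts can include the
    same types and still differ in the rest of their circumscription.
    """
    if not included_a or not included_b:
        return "unknown"
    if included_a == included_b:
        return "probably congruent"
    if included_a < included_b:      # proper subset
        return "probably included in"
    if included_a > included_b:      # proper superset
        return "probably includes"
    if included_a & included_b:      # shared names, but neither contains the other
        return "probably overlaps"
    return "probably disjoint"


# Hypothetical treatments of the same group in two checklists.
checklist_1 = {"Hypnum fluitans"}
checklist_2 = {"Hypnum fluitans", "Hypnum exannulatum"}
print(guess_relation(checklist_1, checklist_2))  # probably included in
```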
A major challenge is to gain the support of the taxonomic community in providing the information needed for a concept-based authority system, which would provide the basis for linking content information as well as specimen information to potential taxa.
Berendsohn, W. G. 1995. The concept of "potential taxa" in databases. Taxon 44: 207-212.
Berendsohn, W. G. 1997. A taxonomic information model for botanical databases: the IOPI Model. Taxon 46: 283-309.
Gradstein, S. R., Braun, W., Koperski, M. & Sauer, M. 1998. Taxlink. Datenbank zur Verwaltung unterschiedlicher taxonomischer Auffassungen. Information Leaflet, Albrecht-von-Haller-Institut für Pflanzenwissenschaften der Universität, Untere Klarspüle 2, D-3703 Göttingen.