California Wild Winter 2006 - Today Is Too Soon

The Magazine of the CALIFORNIA ACADEMY OF SCIENCES

feature

DNA Barcoding
Today is Too Soon

Peter Roopnarine

A set of DNA and data analytic techniques, known collectively as DNA barcoding, has been proposed recently as a solution to the problem of enumerating the Earth’s biodiversity. Estimates of the number of species on Earth range broadly from fewer than 10 million to upward of 50 million. With fewer than 2 million species actually identified by scientists, these estimates are the results of very different interpretations of the uncertainty and incompleteness of our biodiversity surveys.

Given the logistical difficulties of sampling Earth’s myriad corners, and the time and expertise normally required for the identification of species, any proposed revolution in methodology must be considered seriously. But is DNA barcoding the revolutionary solution that it purports to be? To answer this question, we need to examine the goals of DNA barcoding, how scientists go about barcoding organisms, and then decide if barcoding is really up to the job.

The basic assumption behind DNA barcoding is that every biological species has a short sequence of DNA that, like a fingerprint, is unique to that species. The sequences should come from parts of the genome that evolve quickly enough to separate species that share a recent common ancestor, but slowly enough to minimize differences among members of the same species. This is a tall order, but supporters of DNA barcoding have identified several candidates. Foremost among these is the mitochondrial gene, cytochrome c oxidase I (cox1). This gene codes for an enzyme so critical to metabolism that apparently every creature in the animal kingdom has it. Once a cox1 sequence has been obtained from an animal, that sequence is then compared to a database of already established sequences for identification.

Proponents of barcoding claim that it will help biologists more rapidly identify species, provide a better way to classify them, and serve as the basis for phylogenies (family trees) of groups of species. The claim has also been made that barcoding will allow the efficient identification of previously undescribed species. Sadly, these claims are exaggerated.

Imagine that you are a biologist on a field expedition to a remote location. Few of the animals you will see have been previously described. You have just collected an unusual animal, one you have never seen before. Without barcoding, you might be flummoxed. But with this technique, proponents say, your problem is easily solved. By taking a small sample of tissue from this animal, you could read a section of its cox1 gene right there in the field! Your handheld sequencer then uploads the data, via satellite, to a master database, where it is compared to the sequences already on file. Minutes later, you will receive an identification of your species, or learn that the sequence is not currently in the database. You have either added another species to your list for the area, or just discovered a new species.

Such scenarios, if feasible, would revolutionize the documentation of Earth’s biodiversity. The technology isn’t so far off, but first let’s consider how your DNA sequence will be interpreted.

First, suppose that an exact match to the sequence was not found in the database. Do you have a new species? Maybe, maybe not. To arrive at that conclusion, you would have to assume that all animals belonging to the same species have the same cox1 sequence. This would mean, for example, that all humans on Earth have identical cox1 sequences. Now, one could argue that we cannot possibly draw that conclusion, since we have no practical way of sampling cox1 from every human on the planet. But we could sample many humans, and then estimate how variable cox1 actually is within our species. If cox1 is variable in humans or other animals, then how do we know that the cox1 that we sampled and sequenced from our specimen is indeed unique to a new species, and not simply a variant within a known one?
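To make the idea of estimating variability from a sample concrete, here is a minimal Python sketch. It assumes the sequences are already aligned to the same length and simply counts mismatched positions; the function names and the short sequences are invented for illustration and are not real cox1 data. Real analyses would first align the sequences and might use a more sophisticated distance measure.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def within_species_variation(sequences):
    """Return (mean, max) pairwise percent difference across a sample."""
    distances = [percent_difference(a, b) for a, b in combinations(sequences, 2)]
    if not distances:
        raise ValueError("need at least two sequences")
    return sum(distances) / len(distances), max(distances)


# Hypothetical 20-base fragments from four individuals of one species.
sample = [
    "ATGGCACTTAGCCTTCTAAT",
    "ATGGCACTTAGCCTTCTAAT",
    "ATGGCACTAAGCCTTCTAAT",  # differs from the first at one position
    "ATGGCGCTTAGCCTTCTAAT",  # differs from the first at another position
]

mean_diff, max_diff = within_species_variation(sample)
print(f"mean pairwise difference: {mean_diff:.1f}%")
print(f"largest pairwise difference: {max_diff:.1f}%")
```

With only four individuals, the estimate says little about the species as a whole, which is exactly the sampling problem described above.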

This question is made all the more difficult when you realize that there are actually very few individuals of any single species that have been sequenced and are represented in the database. Barcoding supporters address this problem by pointing out that even though members of the same species will vary in their sequences, the variation among those individuals is far smaller than the differences between species. For example, that would mean that human sequences, though not necessarily identical, will be far more similar to each other than to sequences from, say, chimpanzees. So-called “thresholds” could be established, meaning that two sequences differing by more than, say, a few percent, must come from different species. Where to draw this line is where barcoding runs into its first serious obstacle.
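The threshold rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any barcoding project's actual method: the sequences are artificial, already aligned, and compared by simple mismatch counting, and the names `different_species` and `threshold_percent` are hypothetical.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def different_species(seq_a, seq_b, threshold_percent):
    """Apply the threshold rule: a difference above the threshold means two species."""
    return percent_difference(seq_a, seq_b) > threshold_percent


# Artificial 100-base reference and a specimen differing at three positions.
reference = "ACGT" * 25
specimen_bases = list(reference)
for position in (10, 40, 70):
    specimen_bases[position] = "A" if specimen_bases[position] != "A" else "C"
specimen = "".join(specimen_bases)

difference = percent_difference(reference, specimen)
verdict_strict = different_species(reference, specimen, threshold_percent=2.0)
verdict_loose = different_species(reference, specimen, threshold_percent=5.0)
print(difference, verdict_strict, verdict_loose)
```

The same pair of sequences is called two species under a 2 percent threshold and one species under a 5 percent threshold. The data have not changed; only the choice of line has.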

In the old days, taxonomists relied almost exclusively on examinations of morphology (skeletal characters, soft-tissue anatomy) to establish species identities (paleontologists in fact still operate in this manner, since fossils almost always show only morphology). This approach requires a great deal of time and expertise, quite in contrast to DNA barcoding. A set of methods known as numerical taxonomy was developed in the 1950s and 1960s to quantify the process of comparing morphologies and assist taxonomists in their decisions. Numerical taxonomy can actually work quite well for discriminating species.

However, one of the caveats is that individuals of the same species can vary, sometimes dramatically, in their morphologies. When faced with this variation, how does one decide if the collection of specimens belongs to a single species? The notion of establishing thresholds of morphological differences was suggested. But as scientists developed ways to analyze the genetic blueprints of organisms, it became clear that you first have to understand how variable morphology is within a species.

As it turns out, there is no simple correspondence between morphological variation and genomic variation. Chimpanzees and humans are quite clearly different species, and we know this from morphology alone, but we share more than 99 percent of our genomes. Different breeds of domestic dog, all belonging to a single species but morphologically very different, also share more than 99 percent of their genomes. Yet other species might have significantly different segments of their genomes, but be morphologically nearly identical. So there is a classic Catch-22 problem—how can one understand variability within a species if one is not even certain what belongs to the same species? Today, taxonomists generally attack the problem with a multitude of complementary tools, including detailed examinations of morphology, life-histories, behavior, ecology, and, of course, genomic structure.

Barcoding shortcuts this process. In doing so, it will generate tremendous, and unmeasurable, uncertainty in species identifications. The counter-argument is that the differences between individuals of the same species will always fall below established thresholds. We simply do not know that.

Back to you sitting in the wilderness with your barcoder. If your specimen’s sequence did indeed differ from every sequence in the database by more than the threshold, should you conclude that the species is new to science? That could be a valid conclusion, provided that further verification, probably by expert taxonomists, is performed at some point in the future. This approach could in fact accelerate biodiversity description and documentation. Given our current biodiversity crisis, this sort of acceleration is something that we absolutely need.
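The field scenario boils down to a nearest-match lookup followed by a threshold test. The Python sketch below shows one way that logic might look. Everything in it is an assumption made for illustration: the `identify` and `mutate` helpers, the 3 percent default threshold, the tiny two-species "database," and the artificial sequences, which are treated as already aligned.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def identify(query, database, threshold_percent=3.0):
    """Find the closest reference sequence and apply the threshold rule.

    `database` maps a species name to a list of reference sequences.
    Returns (status, closest_species, percent_difference_to_closest).
    """
    best_name, best_diff = None, float("inf")
    for name, references in database.items():
        for reference in references:
            diff = percent_difference(query, reference)
            if diff < best_diff:
                best_name, best_diff = name, diff
    if best_name is None:
        raise ValueError("database contains no sequences")
    if best_diff <= threshold_percent:
        return ("match", best_name, best_diff)
    # Above the threshold: only a candidate until taxonomists verify it.
    return ("candidate new species", best_name, best_diff)


def mutate(sequence, positions):
    """Hypothetical helper: change the base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for position in positions:
        bases[position] = swap[bases[position]]
    return "".join(bases)


base = "ACGT" * 25  # artificial 100-base sequence
database = {
    "species A": [base],
    "species B": [mutate(base, range(0, 100, 5))],  # 20% different from A
}

known = identify(mutate(base, [3]), database)
unknown = identify(mutate(base, range(1, 100, 10)), database)
print(known)
print(unknown)
```

The first query sits 1 percent away from species A and is reported as a match. The second sits 10 percent away from its closest reference and is flagged as a candidate, which is a prompt for expert verification and not a verdict.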

But why are we so concerned with counting all the species on the planet? If barcoding worked, and we were able to tally all the species over the next ten years (an impossible feat given available funding, the number of biologists, and our mortal limitations), how would we use that information? Would humans be more conscious of our impact on ecosystems and biological communities if there were 20 million, instead of 5 million, species on Earth? Would there be greater awareness and increased conservation efforts if 10 million out of 20 million species were threatened, instead of 2.5 million out of 5 million?

It is important for all of us to understand that diversity is measured in different ways. While the total number of species is important, I argue that ecological diversity is of greater importance. We need to know what these species are, how many of them are out there, and their functions in ecosystems. Barcoding can help, by suggesting species identities and numbers, but it tells us nothing else about the nature of these species.

Barcoding’s statistical approach to describing diversity is also potentially dangerous. It is unlikely that we will be able to count all the species on the planet, accurately, in time to address catastrophic habitat destruction and climate change. Scientists working on these problems are faced constantly with poorly known factors, and while great effort goes into accumulating increasing quantities of data of known accuracy, the ticking clock forces us to incorporate uncertainty into our predictions of the future.

The uncertainties inherent in the barcoding scheme are no worse than those generated by more traditional approaches. Whether or not they are significantly better remains unanswered, and we cannot risk assuming that they are better. Verifying species with complementary approaches will, and must, take time. Dedicating more scientists, staff, and materials to barcoding efforts could inadvertently detract from the overall goal of biodiversity documentation.

DNA barcoding is an innovative addition to the taxonomist’s toolbox, and will speed up the discovery of new species, but it is not the panacea some of its supporters claim. We may have only one chance to truly understand the nature of Earth’s biodiversity, so let’s get it right.

Peter Roopnarine is Associate Curator of Invertebrate Zoology and Geology at the California Academy of Sciences.