The Magazine of the CALIFORNIA ACADEMY OF SCIENCES

Feature

Captain Genome
An interview with Craig Venter

Kathleen M. Wong

Image: Michael Scholz

J. Craig Venter is a genetic buccaneer. He made a name for himself by showing up the most hallowed biological research program in decades - the Human Genome Sequencing Project. Venter's damn-the-torpedoes personal style led critics to claim he was in it for the money. But instead of founding a pharmaceutical empire on a trove of patented human genes, he resigned from Celera Genomics to found an entirely new field of scientific study: environmental genomics. This new suite of projects carries the classic marks of a Venter venture: wide scope, great ambition, and idiosyncratic genius. Venter is currently circumnavigating the world on a yacht specially outfitted to fish for ocean microorganisms. He wants to sequence their DNA, analyze their metabolisms, and see if they hold the answers to our future energy needs. He is interviewed by California Wild's Senior Editor, Kathleen M. Wong.

California Wild: You started your career researching pharmacology and biochemistry for the National Institutes of Health (NIH). Yet today, you're famous for revolutionizing genome sequencing. How did you make the leap from what you were studying to genetics?

Craig Venter: I spent ten years studying the structure and function of the adrenaline receptor, trying to isolate that protein from the heart and brain. It's such a rare protein; we could never get enough to work on. With the molecular biology revolution, it became clear that to get a rare protein you had to isolate the gene and then sequence the gene to get the protein structure.

When I moved to the NIH in 1984, I switched from being a biochemist to spending a year teaching myself and the lab about molecular biology. I felt it was essential for moving forward. We isolated the human brain adrenaline receptor gene and sequenced its DNA. The techniques at that time used a lot of radioactivity and gels that were hard to read, and the method was very slow and cumbersome.

In the mid-80s, when the first discussions of the Human Genome Project began, I got very excited about the potential of this new field of genomics, which looks at all the genetic information in the cells. There was a paper in Nature by Lee Hood and colleagues that demonstrated you could attach four different colored dyes to the four bases of DNA (T for thymine, A for adenine, C for cytosine, and G for guanine), activate those dye tags by a laser, and read them all optically into a computer.

After spending ten years working on one protein, I saw that we could sequence tens of thousands of proteins in a few years. I contacted Applied Biosystems, the fledgling company that made the first automated sequencing machines and, in 1987, my lab became their first test site. I also went to see James Watson [co-discoverer of the structure of DNA] about using the tools I had to sequence the human genome. He got very excited about it, and the next day told Congress I had the best lab for sequencing DNA in the world.

The genetic instruction booklet of a species, its genome, is not an easy read. For one thing, genomes tend to be very long. In the case of human beings, the message can be billions of Ts and As and Cs and Gs long. You've also got to wade through a lot of what scientists consider "genetic junk," as the genes the body actually uses lie amidst huge segments of filler DNA.

CW: To cut down on the amount of sequencing you'd have to do, and focus only on real genes, you took the relatively new approach of using complementary DNA. So instead of tagging, or reading, every sequence in the genome, you only tagged the ones that are read, or "expressed." This approach (described in a 1991 paper in Science, "Complementary DNA Sequencing: 'Expressed Sequence Tags' and the Human Genome Project.") transformed the monumental problem of deciphering whole genomes into a logistically feasible task. Can you explain the advantages of looking at complementary DNA versus all DNA?

Venter: In each of our tissues, heart or brain or muscle, our cells interpret our genetic code. Only about one percent of the human genome actually codes for genes. Enzymes go and copy those regions and make what's called messenger RNA (mRNA) from the DNA. We use those as templates to produce the proteins that give our body structure. mRNA directs the formation of proteins, but is very unstable. So we copy and convert the mRNA back to cDNA. The cDNA is very stable. Also, this approach sequences only the small percentage of our entire genetic sequence that actually codes for genes.

It basically uses our cells as supercomputers to sort out not only what is biologically useful in the genome but also specifically which genes are essential for becoming a heart cell or brain cell or skin cell. It gave us functional and spatial information about each gene, not just linear information about where it is located in the sequence, and it sped up the process of identifying gene function by orders of magnitude.

I was doing the first test human genome sequencing project at NIH in the late 1980s. It was very slow to sequence even small stretches. Each stretch would take maybe a month or two of work, and you'd find maybe one gene. Using cDNA allowed me to identify more new genes per day than I did per decade previously, then more new genes per hour than I had done in a decade.
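The copying step Venter describes, from unstable mRNA back to stable cDNA, can be pictured with a toy example. The Python sketch below is only an illustration under simplifying assumptions: real cDNA synthesis is an enzymatic laboratory procedure, not a string operation, and the function name first_strand_cdna is hypothetical, not taken from any sequencing software. It shows the base-pairing rule involved: each mRNA letter pairs with its DNA complement, and the new strand reads in the opposite direction.

```python
# Toy illustration only: real cDNA is made in the lab by an enzyme
# (reverse transcriptase); this just applies the base-pairing rule.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the complementary DNA strand for an mRNA sequence.

    Both the input and the result are written in the conventional
    5'-to-3' direction, so the result is the reverse complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    # A made-up six-letter message, not a real gene.
    print(first_strand_cdna("AUGGCC"))
```

Because only expressed genes are copied into mRNA, a collection of such cDNA strands covers just the part of the genome a given tissue actually uses.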

CW: Several other scientists were exploring the possibilities of cDNA at the time. Yet you were the one who took it to the next level by applying computer technology.

Venter: There were several early proponents of sequencing the genome, but all those using cDNA were shouted down by those who wanted to look for disease genes. My idea was to use new automated technology and supercomputing to interpret the data, but I also had a philosophical difference on how to interpret what the genome was telling us. Putting all that together made it successful.

CW: Your ideas were the ones ultimately used to sequence the human genome. Can you describe how you developed your methods?

Venter: My colleague Hamilton Smith, the Nobel laureate who discovered restriction enzymes, and I were looking to apply computational tools to genomics. He suggested sequencing Haemophilus influenzae (a bacterium), and we designed a technique called whole genome shotgun sequencing. It was extremely efficient.

I left NIH in 1992 and formed a nonprofit, The Institute for Genomic Research (TIGR). I started that to expand the expressed sequence tag (EST) approach. We published a paper in 1995 containing about half of all the human genes. There was a complication with the method because we didn't have good computer algorithms. The existing algorithms could deal with only a few thousand sequences, but we had millions. So we hired a new set of people with computational skills to design new computer algorithms that enabled us to put together tens of thousands of sequences.

For example, it took 13 years of government funding to sequence the E. coli genome. We turned that into a four-month process. As a result, we sequenced the first genome of a free-living organism in history and published it in Science in 1995. In that article, I predicted we would use this technique eventually to sequence the human genome.

So after sequencing the genomes of many key pathogens, and working on the first plant genome, Arabidopsis thaliana, and doing some sequencing on the human genome, in 1998 Perkin Elmer offered to fund the human genome effort using this new technique. It was part of the stipulation that I had to form a new company to work with them; and I felt I either had to do that or be a witness and watch other people do it over the next 20 years. So I formed the for-profit company Celera.
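The idea behind whole genome shotgun sequencing, breaking DNA into many short overlapping reads and letting a computer stitch them back together, can be sketched in a few lines of Python. This is a deliberately naive greedy merger written for illustration; it is not the algorithm used at TIGR or Celera, whose assemblers had to cope with sequencing errors, repeated regions, and millions of reads. The function names overlap and greedy_assemble and the min_len parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Assumes error-free reads from one strand; returns the remaining
    contigs once no pair overlaps by at least min_len letters.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three made-up overlapping reads from a made-up 14-letter "genome".
    print(greedy_assemble(["TGCAATCC", "ATGGCGT", "GCGTGCA"]))
```

Comparing every fragment against every other, as this sketch does, becomes impractical as the number of reads grows, which gives a feel for why existing software stalled at a few thousand sequences.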

CW: You upped the ante on the Human Genome Sequencing Project by claiming Celera could slice years off the target finish date and hundreds of millions of dollars off the cost. There was a lot of tension between Celera and the public sequencing effort, because people claimed Celera would get a head start on patenting many genes.

Venter: There were a lot of different claims made. Numerous scientists working on the public genome effort didn't like competition, and were quite vicious in their attacks. I'd respond, if they were so sure it wouldn't work, why were they attacking it so viciously? It was almost like a presidential campaign, with each side trying to smear the other. The sequencing of the human genome was supposed to be a 15- to 20-year project. We did it in nine months. The sequencing of the human genome was originally expected to cost around $5 billion to complete. With our participation, that dropped to something around $100 million of private money. That's a pretty dramatic change. The reality is, despite the claims and attacks, we sequenced the human genome very rapidly and accurately, and gave it to the public for free.

As a result, we had the human genome more than five years before the first draft was supposed to be published. We announced in 2000 the first assembly of the human genome at the White House.

Many in the public effort, because it's such a large project, thought that getting the sequence was the goal. What I have always argued is getting the human genome sequence would be a starting point in science, not an ending point. To me, that's when we started the analysis. From my experience working on the adrenaline receptor, I knew we could only start the interpretation when we had the sequence. We set up a large bioinformatics team to develop new software sequencing tools to make it available to the public.

At that time, two companies were each claiming to patent hundreds of thousands of genes. In the end, we found a total of only 25,000 to 26,000 genes. It shows how little we understood about the genome.

CW: Does that mean our instruction booklet is a lot simpler than expected?

Venter: If you were of the school that wanted to have one gene for every human trait and thought process, then you were surprised. But in fact our complexity is extremely high. We have over a hundred trillion cells, each has the same exact genetic code, but only a subset of the code is used for each of those cells. The control, regulation, and complexity of those hundred trillion cells only begin with the genetic code. As soon as you make proteins from those genes, the level of complexity goes up again. How do you regulate that and each time end up with a functioning, walking, talking human? I'd say our genetic code is pretty complex.

CW: Celera originally announced that it would sequence the genomes of several different people chosen at random, but the sequence ultimately published was dominated by your genome. Why did you decide to use your own DNA, and how much did that affect the final product?

Venter: The sequence we did at Celera was derived from five people, three women and two men. They were self-described as African American, Hispanic, Chinese, and Caucasian. There were several reasons I was one of the two male donors. We had to make the libraries quickly. Instead of waiting months to go through the complete informed review process with another human subject, I figured me donating my own DNA would provide as much of an informed consent as you could get. I understood the implications better than just about anyone else.

Most people at that time were very afraid of the notion of having their DNA sequenced and exposed to the public. They thought it would define who they were. I felt that was not the correct view of biology and genetics. I believe in leading by actually doing, and I wanted to set an example of a leading researcher in the field not being afraid of having his genome revealed, that there is nothing to be feared by looking at one's genetic code. But from the sequence we published, you can't just pull mine out.

It's a consensus sequence. If you looked at yours and compared it to mine, we would differ only by roughly one out of one thousand letters. So in a stretch of 500 letters, it's very easy to match both up. Take two stretches of newspaper with 500 words exactly the same and three words out of that 500 having one letter changed. In a place where a letter was different, we would use the majority rule. If three out of five had a T, we'd use a T.

But all of us share a virtually identical genetic code. I had done earlier studies that showed our genome was remarkably similar over small areas. I'd be very surprised if you and I had any different genes; what we differ in is the spelling of our genes.
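The majority rule Venter describes is simple enough to write down. The Python sketch below assumes the five donors' sequences have already been lined up letter for letter, which is the hard part in practice; the function name consensus and the seven-letter example sequences are invented for illustration and are not Celera's software or data.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Build a consensus by taking the most common letter at each position.

    Assumes the sequences are already aligned and of equal length.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be non-empty in number and equal in length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # majority letter in this column
        for column in zip(*aligned)
    )


if __name__ == "__main__":
    # Five made-up donor sequences; three carry a T at the third letter.
    donors = ["GATTACA", "GATTACA", "GATTACA", "GACTACA", "GACTACA"]
    print(consensus(donors))
```

In this example three of the five sequences carry a T at the third position, so the consensus carries a T there, and no single donor's sequence can be read straight off the result.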

CW: You've managed to combine work and play, which for you are genetics and sailing, in your latest project. Can you describe the Sorcerer II Expedition, and discuss how you came up with the idea for the project?

Venter: The expedition is based on the experiment we did on the Sargasso Sea, which was published last April in Science. We wanted to take what we know about the human genome and apply it to the environment. We wanted to know who's there and more about their biology. We only understand about 5,000 microbial species currently from the oceans, and yet we're dependent totally on the oceans for the carbon cycle, which we're distorting quite substantially right now.

By doing shotgun sequencing, we're testing to see whether we can collect the DNA of all species and reassemble their genomes. Off Bermuda, at sites just a mile apart, we found totally different organisms. And in the Sargasso Sea, which is a low nutrient environment, we discovered 1.3 million new genes and somewhere between 1,800 and 47,000 new microbial species. It was such an exciting discovery that we decided to mount a global expedition. We realized that we currently understand very little about our oceans and/or our environment.

Now Sorcerer II is on a complete circumnavigation of the globe, doing shotgun sequencing of the oceans around the world. Every two hundred miles, we take 200 liters of seawater, and sequence all the DNA that's in the water.

As the expedition has continued from the east coast of the United States, through the Caribbean, Panama, Cocos Island, Galápagos Islands, and through the Pacific to Australia, we are seeing the same level of diversity and discovering a staggering number of new genes and identifying new species. This is rapidly changing our understanding of life in our oceans at least by an order of magnitude.

CW: While formulating the Sorcerer II project, you came up with a whole new field of scientific inquiry: environmental genomics. Can you explain what that is?

Venter: The unseen world of microbes makes up the majority of life on our planet, yet we know virtually nothing about these extremely important forms of life. The oceans and the soils are home to billions of microorganisms, yet we've been unable to characterize and study them because many can't be grown in the lab. Now, using the same genomic tools and techniques used to sequence the human genome, we can characterize who is in various environments by sequencing their unique DNA.

With a better understanding of marine and terrestrial microbial biodiversity, scientists will be able to understand how ecosystems function and discover new genes of ecological importance. Environmental genomics could potentially redefine the origin of species as well as alert us to the damaging effects we have on the planet. More than 3.5 billion tons of carbon dioxide is released into the atmosphere each year as we continue our dependence on fossil fuels. The resulting environmental problems are the most important issues facing us as a species.

CW: Among your many projects has been researching how to make a completely synthetic form of life, the so-called "designer microbe." Why are you pursuing this project?

Venter: We're trying to build the minimal microbial cell, as a means to try to understand the genetic basis of cellular life. It's extremely hard to do, and it's slow going. We have a team dedicated to this at the J. Craig Venter Science Foundation, and we'll see what happens in the next year or two.

CW: In your Sorcerer II work, you've found a wide variety of the genes organisms use to convert sunlight into usable energy. We hear you have plans to use some of these phytochrome genes to produce clean energy. Can you elaborate?

Venter: Bacteria and other microorganisms form the bottom of the food chain and orchestrate the cycling of carbon, nitrogen and other nutrients through the ecosystem. They may hold the key to generating a near-infinite amount of clean energy. All organisms that use sunlight as a primary energy source use a particular wavelength of light. By looking at the wavelength that these newly discovered phytochromes absorb, we could take advantage of the different absorbed wavelengths of light and utilize more of the sun's energy.

We could create a single microbe that combines these phytochromes and use the energy that is harvested, [as well as] other novel pathways that have yet to be characterized, to produce clean hydrogen energy. The microbe would have to be engineered because existing species do not naturally produce hydrogen in the abundance and efficiency that we need.

Our biological energy team at the Venter Institute is looking at ways to modify bacteria and their photosynthetic systems so that hydrogen could be produced more efficiently from direct sunlight.


Kathleen M. Wong is Senior Editor of California Wild.