Discussion at the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998

General discussion following Sessions III & IV

"Authorities, Gazetteers, and Thesauri" and

"Data Structures for Taxonomic Names and Classifications"


Stan Blum:
We're into the section now where we can talk about all of the issues that we've heard about up to this point. There was a whole broad array of issues, from representations of classifications to possible community architectures. The ADL gazetteer work, for example, might raise the question of whether a single large repository, with distributed contributors, might work for the taxonomic community... Is anyone ready to ask a question or make a comment? No? Jim, how about some rabble-rousing?
 
Jim Beach:
Well, throughout the morning Allen Allison and I have been having a side discussion about an issue that I think Walter raised, as well as several others. That is the community overhead associated with maintaining (in this case I think you'd call it a global classification space or concept space) a structure to organize all of the classification trees and subtrees that have been published, and to maintain that multidimensional tree structure on an ongoing basis, without a professional infrastructure for career credit, etcetera. It seems that this is a fascinating information engineering problem that has some interesting identified solutions, both conceptually and in terms of implementation, but Allen has been sort of poking me and raising the question: "Can we afford to do this?" I guess, looking at the geographic efforts and the libraries' subject term efforts, how do we evaluate whether this is economical, supportable, and sustainable, now and into the future? This is likely to require sources and kinds of funding that are different from those we would get for supporting analogous print publications. So that's my rabble-rousing comment.
 
Frank Bisby:
It's a simple fact that, world-wide, there is enormous pressure for one element of that space, and that is to get to the position where we have at least one responsible checklist of those species presently known. So I think that's the bottom-line; the minimum. Whether there are enough funds to fill out all the space, all the alternatives, all the hierarchies, all the potential taxa, and so on, is not clear to me. The minimum is that. Throughout the library community, the biodiversity community, and the biotechnology community, they're just astounded that we don't have a first try at the list of what is presently recorded in one dimension.
 
Stan Blum:
I will take this opportunity, then, to say that what seems evident to me is the need for greater collaboration between different kinds of projects. We've been discussing, and Frank mentioned in his talk earlier, this hierarchical notion of data components in taxonomy, from nomenclature, to taxonomy, to classification. We've heard from a number of projects that are making, or have already made, great progress in the compilation of basic nomenclatural data; the nomenclatural acts that have occurred in the literature. These are the simple facts. The checklist-oriented work could easily draw upon these. They need the basic nomenclatural data as building blocks. Checklists could be built as references to authoritative records on nomenclature, without duplicating the indexing and verification work. So it seems to me that we should be working toward some sort of collaborative architecture in our various projects. That's a recommendation.
 
Nancy Morin:
I don't know if this is too simple-minded a way to look at it, but one of the messages we had from the libraries is that they looked at the work they were doing, they identified the fact that there was too much work to be done, and the way that they were doing it was taking too long. They needed to have a collective way to streamline it and just simply make that work go faster. The critical piece was having the authority files as a way of sharing information, so you only had to put the information in once and other people could use it; you figured stuff out once and everybody could use it. Surely, as we're looking at the demands we have for information on biodiversity, the pressure that Frank was talking about, if we only looked at capturing data from our collections, which every institution has (it's this enormous thing), we should figure out what the cost is now, if each institution had to put in all the names, and the geographical names, time after time... Could we get the job done faster if we put in some up-front financing to get these authority files up and available in a way that we could all share them better?
 
Scott Peterson:
Adding to what Nancy just said, the thing that seems obvious to me is that the medical industry has found a way to gain support and put in some of their infrastructure at NIH, whereas the biological community really doesn't have, or hasn't applied the influence (in the U.S.) on Congress, to get funding into a federal agency that could assist us in this effort.
 
Anon. (F.):
Do you need to come up with some kind of uniform place to put things? To use the Field of Dreams model, "if you build it, they will come"; something like the Species 2000 project, for example. If we could advertise that further, on some of the e-mail lists and things, people would realize this resource is there, would go and use it, and would start to put their data into it. On the issue of feedback, though, I think that in order to address the issue of paper publication (you know, "publish or perish"), we could get feedback on how many times people are hitting these sites, be able to cite that to the bosses, and say "Look, we've put our data there, and they're getting a thousand hits a day." That's more people than are seeing your paper publications. Maybe that will give the necessary feedback to help justify the web site.
 
Laurel Jizba:
There is some funding now, because the agency (I don't know the name of it, but it's the group that used to give libraries money under Title 2C, which was disbanded) now requires that federal money only be given out to joint projects of libraries and museums. I guess Adam knows which group this is, but it's not NSF. [The Institute of Museum and Library Services.] So now libraries and museums could do really well to cooperate with each other, and certainly work on authority files of any kind would be of interest to both communities.
 
Gary Rosenberg:
I've looked into that, and the way the guidelines are written it sounds like they're looking for collaboration between institutions, but if you've got your library and museum in the same institution are you eligible? [A small chorus of "Yes" was heard from the audience.]
 
Laurel Jizba:
The institution that Oregon State went in with was the Oregon Museum of Science and Industry. On the books there's just a very minimal relationship, but it doesn't make any difference.
 
Karen Calhoun:
Another opportunity not to overlook is one that identifies host organizations who have money coming in from serving your community, your biodiversity community. Not simply grant funding organizations, but those who are receiving an income from serving your constituency, much like OCLC and RLG receive incomes from serving the library community. The point has been made many times that these kinds of files are not profit centers. If you look at the economics of it—where the money is coming from—it's coming from deploying those files, in ways that make them available to the information community and the end-users of the information. If you think back to the model I presented of why the cooperative authority control model worked, there was a publicly funded agency at the center and two host organizations that serve the library community that could recover the costs of mounting the file from other operations that were in service to the library community. So that is another model to look at, besides simply grant money.
 
Allen Allison:
Just trying to get a little bit of discussion going here between the library community and the systematics community, it seems to me that the library community is in contact with the user, that the motivation for building these systems was to better serve the user. I'm not so sure, in the systematics community, that we know who that user is. Certainly it's ourselves, but we can't charge ourselves, I mean, it's almost like chasing your tail. We've got to connect to some outside group. So much of what I hear here strikes me as what I just learned is being called supply-side science. It follows the supply-side economics—that "build it and they will come" kind of model—and it seems to me that if we're really to succeed here we've got to somehow connect to a much broader array of users than just ourselves. It seems to me that the library community has accomplished that, so it would be useful to hear how they have connected the efforts to develop more sophisticated search and retrieval systems to user demand from the general public.
 
Laurel Jizba:
There are certainly many databases that the libraries use that we pay for. There is no reason why, collectively, the libraries could not be a user of the files you build and pay you for it.
 
Stuart Nelson:
I just thought I'd share that the National Library of Medicine, when it made Medline free, went from a rate of 7 million search sessions per year, to 70 million. I think if you put it out there, and it's good, that it will generate the kinds of numbers that will command respect and will generate income.
 
Stephanie Haas:
I just have to say this: it's just a quirk of fate that the Library of Congress decided to use common names instead of genus-species names, because if they had used scientific names, a lot of the authority work that needs to be created now would probably already be there. So it might be advisable to connect with the Library of Congress and the work they're doing, because they've created the connection with the public by using common names, and that's the way a lot of people approach information. I realize you're dealing with a very sophisticated group of people, but for people who aren't scientists or taxonomists, the common name is how they use the literature and how they get into it. So I think keeping that in mind, when you create whatever it is that you want to create, would be a good idea.
 
Gail Hodge:
To respond to the earlier comment about libraries being in touch with their users: we'd like to think that we are, but if you go to any library meeting, there's an awful lot of discussion about how to stay in touch and become more in touch with our users. So it's very nice that you think we have those answers, but I really don't think that we do. I think in many cases we're struggling with some of the same issues that you are: How do we make ourselves relevant now, in the electronic environment? How do we reach new groups of end-users, where before we were serving ourselves in many cases? It was interlibrary loan, libraries serving libraries, and those libraries served end-users. So it's nice to get some pats on the back, but we certainly have a long way to go on this, together.
 
Allen Allison:
Just to follow up on that comment about Medline going from 7 to 70 million hits. What was the mechanism for continuing the funding, if you will? I mean, obviously, if you've got 70 million hits you can justify the service, but how do you connect the service to the revenue stream? How does your agency get the extra money to do that? Through Congress, or something like that?
 
Stuart Nelson:
Well, our mandate says (the law says) that we're supposed to recoup the cost of providing the information, not recoup the cost of developing the information. So our indexing, and the dollars spent on my section to develop the indexing vocabulary and so forth, are part of our federal budget. But as I like to point out to publishers, the marginal cost of providing our information over the Internet is essentially zero. The reason I pointed that out is this. What was the number of hits you had to have to get an advertiser, half a million a month? There are a lot of people out there who are interested in biology, and there are a lot of companies out there that would like to be seen as being "green", environmentally friendly, and supportive of species diversity. Now, can you figure out a way to put them together, and make money out of it to support your activity? I think so.