"Data Structures for Taxonomic Names and Classifications"
Stan Blum:
We're into the section now where we can talk about all of the
issues that we've heard about up to this point. There were a whole broad
array of issues, from representations of classifications, to possible
community architectures. The ADL gazetteer work, for
example, might raise the question of whether a single large
repository, with distributed contributors, might work for the
taxonomic community... Is anyone ready to ask a question or make a
comment? No? Jim, how about some rabble-rousing?
Jim Beach:
Well, throughout the morning Allen Allison and I have been having a
side discussion about an issue that I think Walter raised as well as
several others, and that is the community overhead associated with
maintaining (in this case I think you'd call it a global
classification space or concept space) to organize all of the
classification trees and subtrees that have been published, to
maintain that multidimensional tree structure, on an on-going basis,
without a professional infrastructure for career credit, etcetera. It
seems that this is a fascinating information engineering problem that
has some interesting identified solutions, both conceptually and in
terms of implementation, but Allen has been sort of poking me and
raising the question: "Can we afford to do this?" I guess looking at
the geographic efforts and the libraries' subject term efforts, how
do we evaluate whether this is economical, and supportable and
sustainable, now and into the future, because this is likely to
represent a need to identify sources and kinds of funding that are
different than we would get for supporting analogous print
publications. So that's my rabble-rousing comment.
Frank Bisby:
It's a simple fact that, world-wide, there is enormous pressure for
one element of that space, and that is to get to the position where
we have at least one responsible checklist of those species presently
known. So I think that's the bottom-line; the minimum. Whether
there are enough funds to fill out all the space, all the
alternatives, all the hierarchies, all the potential taxa, and so on,
is not clear to me. The minimum is that. Throughout the library
community, the biodiversity community, and the biotechnology
community, they're just astounded that we don't have a first try at
the list of what is presently recorded in one dimension.
Stan Blum:
I will take this opportunity, then, to say that what seems
evident to me is the need for greater collaboration between different
kinds of projects. We've been discussing, and Frank mentioned in his
talk earlier, this hierarchical notion of data components in
taxonomy, from nomenclature, to taxonomy, to classification. We've
heard from a number of projects that are making, or have already
made, great progress in the compilation of basic nomenclatural data;
the nomenclatural acts that have occurred in the literature. These
are the simple facts. The checklist-oriented work could easily
draw upon these. They need the basic nomenclatural data as building
blocks. Checklists could be built as references to authoritative
records on nomenclature, without duplicating the indexing and
verification work. So it seems to me that we should be working
toward some sort of collaborative architecture in our various
projects. That's a recommendation.
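
[Editorial sketch, not part of the spoken discussion: the idea that checklists could be built as references to authoritative nomenclatural records, rather than as copies of them, can be illustrated roughly as follows. The class names, field names, and record identifiers are hypothetical and do not describe any project mentioned here.]

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    """One authoritative nomenclatural record (hypothetical fields)."""
    record_id: str
    scientific_name: str
    authorship: str


@dataclass
class Checklist:
    """A checklist that stores only references (record ids), not name data."""
    title: str
    name_ids: list[str]

    def resolve(self, authority: dict[str, NameRecord]) -> list[str]:
        # Look each reference up in the shared authority file at read time,
        # so indexing and verification work is done once, in one place.
        return [
            f"{authority[i].scientific_name} {authority[i].authorship}"
            for i in self.name_ids
        ]


# Illustrative authority file, keyed by made-up record ids.
authority = {
    "n1": NameRecord("n1", "Quercus alba", "L."),
    "n2": NameRecord("n2", "Quercus rubra", "L."),
}

checklist = Checklist("Example oak checklist", ["n1", "n2"])
resolved = checklist.resolve(authority)
```

[Under this assumption, a correction made to a record in the authority file appears in every checklist that references it, without anyone re-entering the name.]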
Nancy Morin:
I don't know if this is too simple-minded a way to look at it,
but one of the messages we had from the libraries is that they looked
at the work they were doing, they identified the fact that there was
too much work to be done, and the way that they were doing it was
taking too long. They needed to have a collective way to streamline
it and just simply make that work go faster. The critical piece was
having the authority files as a way of sharing information, so you
only had to put the information in once and other people could use
it; you figured stuff out once and everybody could use it. Surely,
as we're looking at the demands we have for information on
biodiversity, the pressure that Frank was talking about, if we only
looked at capturing data from our collections, which every
institution has (it's this enormous thing), we should figure out what
the cost is now, if each institution had to put in all the names, and
the geographical names, time after time... Could we get the job done
faster if we put in some up-front financing to get these authority
files up and available in a way that we could all share them
better?
Scott Peterson:
Adding to what Nancy just said, the thing that seems obvious to
me is that the medical industry has found a way to gain support and
put in some of their infrastructure at NIH, whereas the biological
community really doesn't have, or hasn't applied the influence (in the
U.S.) on Congress, to get funding into a federal agency that could
assist us in this effort.
Anon. (F.):
Do you need to come up with some kind of uniform place to put
things? To use the Field of Dreams, "if you build it, they will come"
model; something like the Species 2000 project, for example. If we
could advertise that further, on some of the e-mail lists and things;
so people would realize this resource is there, would go and use it, and
start to put their data into it. On the issue of feedback though, I
think that in order to get to the issue of paper publication (you
know, "publish or perish"), if we could get feedback on how many
times these people are hitting these sites, and be able to cite that
to the bosses, and say "Look, we've put our data there, and they're
getting a thousand hits a day." That's more than people are seeing
your paper publications. Maybe that will give the necessary feedback
to help justify the web site.
Laurel Jizba:
There is some funding now, because the agency... I don't know
the name of it, but it's the group that used to give libraries money
under title 2C, which was disbanded, and now the requirement is that
federal money only be given out to joint projects to libraries and
museums. I guess Adam knows which group this is, but it's not NSF.
The Institute for Libraries and Museum Services. So now
libraries and museums could do really well to cooperate with each
other, and certainly work on authority files of any kind would be of
interest to both communities.
Gary Rosenberg:
I've looked into that, and the way the guidelines are written it
sounds like they're looking for collaboration between institutions,
but if you've got your library and museum in the same institution are
you eligible? [A small chorus of "Yes" was heard from the
audience.]
Laurel Jizba:
The institution that Oregon State went in with was the Oregon Museum of
Science and Industry. On the books there's just a very minimal
relationship, but it doesn't make any difference.
Karen Calhoun:
Another opportunity not to overlook is one that identifies host
organizations who have money coming in from serving your community,
your biodiversity community. Not simply grant funding organizations,
but those who are receiving an income from serving your constituency,
much like OCLC and RLG receive incomes from serving the library
community. The point has been made many times that these kinds of
files are not profit centers. If you look at the economics of
it (where the money is coming from), it's coming from
deploying those files, in ways that make them available to the
information community and the end-users of the information. If you
think back to the model I presented of why the cooperative
authority control model worked, there was a publicly funded agency at
the center and two host organizations that serve the library
community that could recover the costs of mounting the file from
other operations that were in service to the library community. So
that is another model to look at, besides simply grant
money.
Allen Allison:
Just trying to get a little bit of discussion going here between
the library community and the systematics community, it seems to me
that the library community is in contact with the user, that the
motivation for building these systems was to better serve the user.
I'm not so sure, in the systematics community, that we know who that
user is. Certainly it's ourselves, but we can't charge ourselves, I
mean, it's almost like chasing your tail. We've got to connect to
some outside group. So much of what I hear here strikes me as what I
just learned is being called supply-side science. It follows the
supply-side economics (that "build it and they will come" kind of
model), and it seems to me that if we're really to succeed here
we've got to somehow connect to a much broader array of users than
just ourselves. It seems to me that the library community has
accomplished that, so it would be useful to hear how they have
connected the efforts to develop more sophisticated search and
retrieval systems to user demand from the general public.
Laurel Jizba:
There are certainly many databases that the libraries use that we
pay for. There is no reason why, collectively, the libraries could
not be a user of the files you build and pay you for it.
Stuart Nelson:
I just thought I'd share that the National Library of Medicine,
when it made Medline free, went from a rate of 7 million search
sessions per year, to 70 million. I think if you put it out there,
and it's good, that it will generate the kinds of numbers that will
command respect and will generate income.
Stephanie Haas:
I just have to say this: It's just a quirk of fate that the
Library of Congress decided to use common names instead of
genus-species names, because if that had happened probably a lot of
the authority work that needs to be created now would already be
there. So it might be advisable to connect in with the Library of
Congress and the work they're doing, because they've created the
connection with the public by using common names, and that's the way
a lot of people approach information. I realize you're dealing with
a very sophisticated group of people, but from the people who aren't
the scientists, who aren't the taxonomists, it's the common
name: how they use the literature and how they get into it. So I
think keeping that in mind, when you create whatever it is that you
want to create, would be a good idea.
Gail Hodge:
To respond to the earlier comment about libraries being in touch
with their users, we'd like to think that we are, but if you go to
any library meeting, there's an awful lot of discussion about how do we
stay in touch and become more in touch with our users. So it's very
nice that you think that we have those answers, but I really don't
think that we do. I think in many cases that we're struggling with
some of the same issues that you are: How do we make ourselves
relevant now, in the electronic environment? How do we make ourselves
relevant to new groups of end-users, where before we were serving ourselves in
many cases? It was interlibrary loan, libraries serving libraries,
and they served end-users. So it's nice to get some pats on the
back, but we certainly have a long way to go on this,
together.
Allen Allison:
Just to follow up on that comment about Medline going from 7 to
70 million hits. What was the mechanism for continuing the funding,
if you will? I mean, obviously, if you've got 70 million hits you
can justify the service, but how do you connect the service to the
revenue stream? How does your agency get the extra money to do that?
Through Congress, or something like that?
Stuart Nelson:
Well, our mandate says (the law says) that we're supposed
to recoup the cost of providing the information, not recoup the cost
of developing the information. So our indexing, and the dollars
spent on my section to develop the indexing vocabulary and so forth,
are part of our federal budget. But as I like to point out to
publishers, the marginal cost of providing our information over the
Internet is essentially zero. The reason I pointed that out is that
(what was the number of hits you had to have to get an
advertiser, half a million a month?) there are a lot of people
out there who are interested in biology, and there are a lot of
companies out there that would like to be seen as being "green", and
environmentally friendly, and supporting species diversity. Now, can
you figure out a way to put them together, and make money out of it
to support your activity? I think so.