Discussion at the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998

Transcript of Questions for Linda Hill


John Attig:
You mentioned that some of the data elements in your gazetteer have time specifications. Does your service interface already make use of this? Do you have thoughts about how you would make use of it?
 
Linda Hill:
First of all, we haven't implemented that part, but we're talking to the Getty Museum because they have a lot of historical place names and we're hoping to be able to move theirs into our gazetteer. However, they haven't made such a point of having a spatial footprint, so we're going to have to work that out. We do something kind of interesting, in the sense of using the date: we treat the gazetteer as a collection -- it's a collection just like a catalog is a collection, and since we can now search the catalog and use a date constraint, we can use a date constraint on the gazetteer also.
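Linda Hill notes that date constraints on the gazetteer had not been implemented yet. The sketch below shows one way such a constraint could work if gazetteer entries are treated like catalog records: each name carries a period of validity, and a date-range query keeps the names whose period overlaps the query. The `GazetteerEntry` fields, the year granularity, and the rule that an open or unknown bound matches everything are all hypothetical assumptions, not ADL's design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GazetteerEntry:
    """Hypothetical gazetteer record with a period of validity for the name."""
    name: str
    valid_from: Optional[int]  # first year the name applied; None = unknown/open
    valid_to: Optional[int]    # last year the name applied; None = still current


def in_date_range(entry: GazetteerEntry, start: int, end: int) -> bool:
    """True if the entry's validity period overlaps [start, end] (inclusive years)."""
    begins = entry.valid_from if entry.valid_from is not None else float("-inf")
    ends = entry.valid_to if entry.valid_to is not None else float("inf")
    return begins <= end and start <= ends


def search_by_date(entries: List[GazetteerEntry], start: int, end: int) -> List[str]:
    """Apply a date constraint to the gazetteer, as one would to a catalog."""
    return [e.name for e in entries if in_date_range(e, start, end)]
```

Undated entries match every query here. A real service might prefer to exclude them or rank them lower.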
 
Randy Ballew:
Are there any plans to make the gazetteer available as a distributed object, say maybe with CORBA or something like that?
 
Linda Hill:
This is the direction we're going. I didn't talk about that because we haven't really sat down and made any progress on that, but the idea is to have gazetteer services that have known behaviors and that are distributed objects, so that you will know these gazetteer services are available on the Net and how to use them. Now I obviously don't know a whole lot about how one makes that work in CORBA and so forth, but we're actually organizing a workshop [see NKOS], Gail Hodge, another colleague, and I, in Pittsburgh next Saturday, the [June] 27th [1998], on what we need to do to network what we're calling terminology tools or reference tools, of which gazetteers, thesauri, and classification systems are some examples. How do we put those out there as interactive, accessible services on the Net?
 
Anon. (M.):
Are your point data associated with latitude longitude information?
 
Linda Hill:
Yes, all of the footprints are represented in latitude longitude, as decimal degrees, so a point just has one latitude and one longitude, a bounding box has two latitudes and two longitudes, and a polygon, or a string, or linear feature would have more than that.
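The footprint shapes described in this answer can be sketched as simple data types. This is a minimal illustration, assuming decimal-degree coordinates and hypothetical class names (`Point`, `BoundingBox`, `Polygon`). It is not ADL's actual schema. It shows that a point needs two values, a bounding box four, and a polygon or line two per vertex.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical footprint types; all coordinates are decimal degrees.
@dataclass
class Point:
    lat: float
    lon: float


@dataclass
class BoundingBox:
    south: float  # minimum latitude
    west: float   # minimum longitude
    north: float  # maximum latitude
    east: float   # maximum longitude


@dataclass
class Polygon:
    vertices: List[Tuple[float, float]]  # (lat, lon) pairs; a line string looks the same

    def bounding_box(self) -> BoundingBox:
        """Smallest box enclosing all vertices (ignores the antimeridian)."""
        lats = [lat for lat, _ in self.vertices]
        lons = [lon for _, lon in self.vertices]
        return BoundingBox(min(lats), min(lons), max(lats), max(lons))


def coordinate_count(fp: Union[Point, BoundingBox, Polygon]) -> int:
    """Number of coordinate values needed to describe a footprint."""
    if isinstance(fp, Point):
        return 2  # one latitude, one longitude
    if isinstance(fp, BoundingBox):
        return 4  # two latitudes, two longitudes
    return 2 * len(fp.vertices)  # one lat/lon pair per vertex
```

Deriving a bounding box from a polygon is a common way to get a cheap first-pass footprint for searching.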
 
Stephanie Haas:
With all the work that's going on with FGDC and the geospatial community, with all the data sets that are being captured in electronic directories, how is this [ADL] interfacing with that?
 
Linda Hill:
We interface quite a bit with FGDC, and for those of you who are not aware of it, that's the Federal Geographic Data Committee, which created the first metadata standard that is designed to represent geospatial digital data. The thing about that, and the clearinghouse, and the whole impetus and design, is that they have never really paid attention to the gazetteer component. We just had a meeting last week on geolibraries in which we were talking about what are the core components of a geolibrary, which is the same kind of thing as a clearinghouse, but with more library services, and at that meeting we were making a big point about the gazetteer being a key component. The other thing that they have not done much about is the whole idea of how do you categorize, or what kind of terminology do you use to categorize, the spatial objects you're entering into the clearinghouse, and they're having some problems along that line.
 
Karen Calhoun:
Linda, it's a wonderful project. You've been working on it for four years. What would you say are the most important lessons you have learned for people who are working with information retrieval systems and controlled vocabularies or classification systems?
 
Linda Hill:
Wow, well... To get to the controlled vocabularies, one thing that was very interesting to me, and turned out to be extremely important, is that you need a controlled vocabulary (a limited domain set of terms - an organized set of terms) for the "types" of information you're dealing with. Every time we came up with a set or collection to add to ADL, the key thing we wanted to know is what "type" the objects in the set are in order to provide the user a way of limiting their retrieval to a certain type of information. Well, that turned out to be very interesting because I hadn't anticipated that. For an ADL collection, we have to have the type domain and the "available as" domain -- as we call it, which is essentially the MIME types -- to provide the user the opportunity to say "I only want TIFF images", or "I only want types of data that will work with the software package I have." But we haven't done much about subjects, or concept-based controlled vocabularies. We've done nothing about it, because we're concentrating more on the spatial representation, and as far as the spatial representation, I think that this is such a powerful idea, that as a user you can describe your query area without having to resort to terms, by using a map, and then making that match -- which is contains, overlaps, nearby, west of; all those kinds of relationships -- to objects in information stores. So, I think it's high time that libraries and other kinds of information systems start thinking about ways to incorporate the spatial representation of their objects so they can use that power. Does that address your question? We're really excited about the concept, as you can probably tell.
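The retrieval pattern in this answer combines a map-drawn query area, a spatial relationship, and filters on the "type" and "available as" (MIME type) domains. It can be sketched as below. The sketch uses bounding boxes only, ignores the antimeridian, and implements just "contains" and "overlaps". The `Item` fields, the type values, and the `search` function are hypothetical illustrations, not ADL's interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Box:
    south: float
    west: float
    north: float
    east: float


def contains(query: Box, fp: Box) -> bool:
    """True if the query area fully encloses the item's footprint."""
    return (query.south <= fp.south and fp.north <= query.north
            and query.west <= fp.west and fp.east <= query.east)


def overlaps(query: Box, fp: Box) -> bool:
    """True if the query area and the footprint share any area (edges count)."""
    return (query.south <= fp.north and fp.south <= query.north
            and query.west <= fp.east and fp.west <= query.east)


@dataclass
class Item:
    title: str
    footprint: Box
    item_type: str     # term from a controlled "type" vocabulary (hypothetical values)
    available_as: str  # MIME type, e.g. "image/tiff"


def search(items: List[Item], query: Box,
           relation: Callable[[Box, Box], bool] = overlaps,
           item_type: Optional[str] = None,
           available_as: Optional[str] = None) -> List[Item]:
    """Spatial match first, then optional type and format constraints."""
    results = []
    for it in items:
        if not relation(query, it.footprint):
            continue
        if item_type is not None and it.item_type != item_type:
            continue
        if available_as is not None and it.available_as != available_as:
            continue
        results.append(it)
    return results
```

Other relationships mentioned in the answer, such as "nearby" or "west of", would be further predicates with the same `(query, footprint)` signature.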
 
Anon. (F.):
This is a really simple question. Do you include special places like Yellowstone Park, or wild and scenic river designated areas?
 
Linda Hill:
Yes, absolutely. This is absolutely open and I'm glad you asked me that question because one thing I forgot to make a point of is that we see this open metadata standard for gazetteers to be very accommodating, even for personal gazetteers. The example that I use is that if I'm researching the Red-legged frog and I know what the habitats are for the Red-legged frog, I can actually create that as a named geographic place, with non-contiguous, perhaps, footprints that represent it, and I can put that in my personal gazetteer. Certainly Yellowstone Park, oil fields, geologic provinces, river valleys... This is not limited to administrative areas at all. Any kind of place can be represented; my backyard, if I want to, my school, whatever... Named geographic footprints and a system whereby people can find out where that is -- with their GPS units, or map reading or whatever -- and put that in the system so we can all share it, or at least we have the potential of sharing it. None of this exists now.
 
Adam Schiff:
In the course of cataloging taxonomic literature, I've often come across descriptions of where things were collected; often they're national parks in other countries, or mountain ranges in other countries, etcetera; and I've often felt, particularly for man-made features like national parks, that the U.S. Defense Mapping Agency gazetteers do not include those types of place names. But I have been able to go in some cases to other gazetteers on the web -- the Canadians have their place names up, the Australians have their place names up -- and I can get that information to set up authority records. Are you able to incorporate other sources, other than the federal?
 
Linda Hill:
Oh absolutely, in fact we have one from the National Park Service -- they have a gazetteer-like file, and we have it. That's one of the sets that is waiting to be loaded in. And there are some very interesting aspects to that particular one. They use lots of different category schemes: Bailey's ecoregions, for example. They will describe a park in those terms. So this is going to be a good set for us to test with this content standard. It's open to everything. But the major gazetteers we have to deal with now are mainly administrative areas, not the parks and so forth.
 
John Mitchell:
To piggy-back on that, I noticed that you used the USGS GNIS database. Do you also use GEONET for the international names?
 
Linda Hill:
Yes, the two that we put together to get to our 6-million-item gazetteer were the GNPS and the GNIS, and these both operate under the aegis of the Board on Geographic Names -- the GNIS are the domestic names and the GNPS are the non-domestic names -- and we combined these, but interestingly enough that was very difficult. They used two different category schemes. So we actually took the classes from one system and the categories or subclasses from the other, and managed to make them work. It was that experience that led us to believe that we needed to establish a thesaurus, which would give us more flexibility than a two-level class system.
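The contrast between a two-level class system and a thesaurus can be illustrated briefly. A fixed class/subclass scheme allows exactly two levels. Thesaurus-style broader/narrower links allow any depth, and a search on a broad term can be expanded to every narrower term beneath it. In the sketch below, the feature-type terms and the `BROADER` mapping are purely illustrative assumptions and are not taken from ADL's actual thesaurus.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Illustrative feature-type terms; each maps to its broader term (None = top term).
# Note the three levels: hydrographic features > streams > rivers.
BROADER: Dict[str, Optional[str]] = {
    "hydrographic features": None,
    "streams": "hydrographic features",
    "rivers": "streams",
    "creeks": "streams",
    "lakes": "hydrographic features",
    "administrative areas": None,
    "countries": "administrative areas",
}


def narrower_index(broader: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert broader-term links into a term -> narrower-terms index."""
    index: Dict[str, List[str]] = defaultdict(list)
    for term, parent in broader.items():
        if parent is not None:
            index[parent].append(term)
    return index


def expand(term: str, broader: Dict[str, Optional[str]]) -> Set[str]:
    """Return the term plus every narrower term beneath it, at any depth."""
    index = narrower_index(broader)
    result: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(index.get(t, []))
    return result
```

Under this model, merging two category schemes means mapping each source category onto a thesaurus term rather than forcing both schemes into one fixed class/subclass grid.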
 
Gail Hodge:
Linda, can you talk a little about the economics? Obviously the digital library projects were funded, but now what will happen and what do you see in the future for continuing to provide gazetteer type services?
 
Linda Hill:
Yes, I can talk a very little bit about it. I think that our contribution has been to set up the content standard and the method by which you would go at it. But what we would hope is that other organizations would pick that up. I would love to see the federal government pick this up and do something where the result is share-able, interoperable, integrate-able, with more fully described place name information.