Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998

Accessing And Distributing Authority Files


Gary L. Strawn

Northwestern University Library

mrsmith@nwu.edu


What I’d like to talk about today are the mechanisms people in the library community have developed for creating, accessing and distributing authority records and authority files. In other words, I’m going to talk about how people in the library community build authority files and make information available. To do this, I’m going to have to touch on a couple of areas about which I believe you’ve already heard from others; but of course I’ll stress the things that bear especially on my topic, the mechanics of authority record exchange.

The Program for Cooperative Cataloging, the PCC, is one of the major cooperative endeavors of the library community in the last few years. The primary goal of the PCC is to enhance the timely availability of bibliographic and authority records that have been produced under commonly-accepted standards. Two programs of the PCC are of special interest today, the Name Authority Cooperative, or NACO, and the Subject Authority Cooperative, or SACO. I’m going to describe each from my own point of view.

NACO is the program for the creation and distribution of authority records for "names," that is, all types of authority records except those for topical headings. This includes personal names, corporate and conference names, individual works, and jurisdictions.

Figure 1. Organizations with copies of the name authority file.

The authority file created by NACO members resides at the Library of Congress, with copies at OCLC and RLIN (sometimes called the "bibliographic utilities") and also at the British Library, symbolized here by the "National Bibliographic Service" icon [Fig. 1]. This file was once called the "national" authority file but, because of its expanded scope, is now called the "international" authority file. The copies of the authority file at the bibliographic utilities and the British Library are automatically synchronized with the file housed at the Library of Congress. Authority records created at the Library of Congress go directly into the authority file at the Library of Congress; authority records created by other libraries go into the authority file maintained by their bibliographic utility.

To create an authority record, operators outside LC work within the client provided by the bibliographic utility. To work in OCLC, for example, one uses "Passport for Windows," not only for authority-related tasks, but for all cataloging tasks. (This is a Windows 3.1 program; it’s shown here [Fig. 2] running under Windows NT.) The operator requests an authority workform from the utility, and fills in the workform with the appropriate information. Here, the operator has entered a basic name authority record, and is examining it for correctness.

Now, Passport for Windows has a built-in programming language, similar in many respects to Microsoft’s Visual Basic language. Libraries have used this language to build fairly complicated macros for doing things of interest to them. (I can offer as just one example Northwestern’s extensive macro for searching OCLC from provisional records for materials in our cataloging backlog, adding holdings to OCLC and downloading the records so that they can be added to NOTIS.) An ambitious person at OCLC devised an elaborate macro for creating a basic authority record for certain types of headings from information in a bibliographic record. For example, to create an authority record for the heading given here as the main entry, the operator simply clicks the cursor on the heading, and then starts up the macro [Fig. 3]. The macro extracts information from the bibliographic record, calls up an authority workform, and fills it in; all this happens in a second or so. Here [Fig. 4] the macro has finished its work for the heading we were just looking at. It is now the operator’s responsibility to inspect the record for correctness and completeness, changing things that are wrong, and supplying things that are missing.
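As a rough illustration of what a macro of this kind does, here is a short Python sketch that builds a skeleton name authority record from a bibliographic record. It is not the OCLC macro, whose internals are not described here; the nested-dictionary layout and the function name `propose_name_authority` are assumptions made for the example. The tags are ordinary USMARC ones: 100 (main entry), 245 (title) and 260 (imprint) in the bibliographic record, and 100 (heading) and 670 (source found) in the authority record.

```python
def propose_name_authority(bib):
    """Build a skeleton authority record from a bibliographic record.

    `bib` is a hypothetical dict: tag -> {subfield code -> text}.
    """
    # Title proper, without the ISBD punctuation that follows it.
    title = bib["245"]["a"].rstrip(" /:")
    # Imprint date, without brackets, copyright "c" or final period.
    date = bib["260"]["c"].strip(" .[]c")
    return {
        # The heading is copied as found; the operator corrects it if needed.
        "100": dict(bib["100"]),
        # The work being cataloged is cited as the source for the heading.
        "670": {"a": f"{title}, {date}"},
    }


# Made-up bibliographic record, for illustration only.
bib_record = {
    "100": {"a": "Doe, Jane,", "d": "1950-"},
    "245": {"a": "Field notes /"},
    "260": {"c": "c1998."},
}
proposal = propose_name_authority(bib_record)
```

The result is only a proposal. As with the macro, anything that cannot be derived mechanically (variant forms of the name, for instance) still has to be supplied by the operator.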

Some libraries find it more efficient to create authority records in the local system and upload them to the utility, rather than create them directly in the utility. The reason is typically that the local system has some capability, such as enhanced record editing, that makes it easier to create records in this manner. At Northwestern, our current NOTIS system provides a quick and efficient full-screen record editor; and we also have a more sophisticated ability to create authority records automatically than even that available via the OCLC macro [Fig. 5].

The ability to create a new authority record automatically from a heading in a bibliographic record is becoming part of standard integrated library systems. For example, it’s built into version 97-2 of the Voyager system (the system that Northwestern is moving to in about a month), and I understand it’s part of release 6.9 of the GEAC system. This ability, which some of us have had at this level for about four years, has made a significant difference in libraries’ approach to authority control. I can certainly say that at my own library, where increases in the staffing of public services areas are normally made by cutting personnel from technical services, this ability has made it possible to participate in cooperative programs. Without it, we’d probably not even be creating many authority records for our own use.

I’d like to describe briefly how this authority-record creation business will work for us in Voyager; this is similar to the technique we use today with NOTIS, and I believe it’s similar to the method that will be used in the future in other client/server systems. In Voyager, when the operator saves a bibliographic record, the system prepares a report listing all headings in the bibliographic record, and describing the manner in which each heading compares to authority records in the local file [Fig. 6].
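A report of this kind can be sketched in a few lines of Python. This is not Voyager's code: the three categories, the record layout (`heading`, `see_from`) and the crude `normalize` function are all assumptions for illustration, and real systems use a far more careful normalization than simply dropping punctuation.

```python
import string


def normalize(heading):
    """Crude comparison key: lowercase, punctuation removed, spaces collapsed."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(heading.lower().translate(table).split())


def heading_report(bib_headings, authority_file):
    """Say how each heading compares with the local authority file."""
    established = {}   # normalized established heading -> heading
    variants = {}      # normalized variant form -> established heading
    for record in authority_file:
        established[normalize(record["heading"])] = record["heading"]
        for variant in record.get("see_from", []):
            variants[normalize(variant)] = record["heading"]

    report = []
    for heading in bib_headings:
        key = normalize(heading)
        if key in established:
            report.append((heading, "authorized"))
        elif key in variants:
            report.append((heading, "variant of: " + variants[key]))
        else:
            report.append((heading, "no authority record"))
    return report


# Made-up data, for illustration only.
local_file = [
    {"heading": "Doe, Jane, 1950-", "see_from": ["Doe, J. (Jane), 1950-"]},
]
report = heading_report(
    ["Doe, Jane, 1950-", "Doe, J. (Jane), 1950-", "Roe, Richard"],
    local_file,
)
```

The headings reported as having no authority record are the candidates for the "Create authority" step.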

If the operator wants an authority record for a heading without one, the operator clicks on the heading to highlight it, and then clicks the "Create authority" button [Fig. 7]. The system takes the heading, massages it together with information from the bibliographic record, and proposes an authority record. Again, it is the operator’s responsibility to make sure this record is correct and complete. Once the operator is satisfied with a record and approves it, the record is added to the local system. There’s more work to do if the operator wants to contribute the record to NACO; this is a separate, non-automatic step. The operator executes a command in the library system’s client to save the record in the current workspace to a file on the local hard drive in the USMARC format [Fig. 8]. Then, the operator shifts to Passport and starts a macro that reads the authority record in the disk file, calls up a workform, and fills it in [Fig. 9]. This procedure sounds cumbersome, but it really takes just seconds, and can be accomplished largely with mouse clicks. The main thing to note in all of this is that the authority record is being created wherever it can be done most efficiently, and that records are moved from one place to another without re-keying.
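What makes this hand-off possible is that both ends agree on the USMARC record structure: a 24-byte leader, a directory of 12-byte entries (tag, field length, starting position), and then the fields themselves. The Python sketch below writes and reads that structure. It is not the code used by the library system or by the Passport macro; the function names are invented for the example, the leader values are merely plausible ones for an authority record, and it handles plain ASCII only, ignoring the ALA character set.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"   # field end, record end, subfield mark


def build_marc(fields):
    """Serialize (tag, field_bytes) pairs into one USMARC record."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FT
        # Directory entry: 3-byte tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    directory += FT
    base = 24 + len(directory)             # where the field data begins
    total = base + len(body) + 1           # +1 for the record terminator
    leader = b"%05dnz   22%05dn  4500" % (total, base)
    return leader + directory + body + RT


def parse_marc(raw):
    """Return the (tag, field_bytes) pairs held in one USMARC record."""
    base = int(raw[12:17])                 # leader positions 12-16
    directory = raw[24:base - 1]           # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        # The stored length includes the field terminator; leave it off.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


def subfields(data):
    """Split a variable data field into indicators and (code, value) pairs."""
    parts = data.split(SF)
    pairs = [(p[:1].decode("ascii"), p[1:].decode("ascii")) for p in parts[1:]]
    return parts[0].decode("ascii"), pairs


# Made-up record, for illustration only.
record = [
    ("001", b"n 00000000"),
    ("100", b"1 " + SF + b"aDoe, Jane," + SF + b"d1950-"),
]
raw = build_marc(record)
```

A macro on the receiving side would do the equivalent of `parse_marc` and `subfields`, and then type each field into the workform.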

Regardless of the manner in which the workform has been filled in—direct keyboard input, or uploading from another system—the operator eventually sends a command to the utility that means "I’m all done with this one." The utility adds the record to its own copy of the international authority file. This record is immediately available to other users of that utility searching that particular copy of the authority file, but it’s not available yet to those searching other copies. (For example, a record contributed to OCLC is available immediately to OCLC users, but not to those searching RLIN or LC.)

Once each day, all of the authority records created or modified at each remote site are written to a file. Very early each morning, a program at the Library of Congress connects to each site and pulls the file from the remote location to LC’s own computer. (The program uses FTP to do this.) LC loads these new records (and updates, too) into its own copy of the file; the LC file now contains a complete set of all authority records. The computer at LC prepares what are called "response" records, which describe the fate of each record. The utilities fetch the response records and examine them, to make sure that nothing was lost. What happens next, now that there is one complete copy of the authority file, I’ll come to after I talk about the Subject Authority Cooperative.
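The check that "nothing was lost" amounts to comparing the records sent with the responses received. The Python sketch below shows one way to do that. The actual content of a response record is not described here, so the shapes used (a list of record identifiers, and a mapping from identifier to a status string such as "loaded") are assumptions for illustration.

```python
def check_responses(sent_ids, responses):
    """Compare the records sent with the response records received.

    `sent_ids`: identifiers of the records written to the daily file.
    `responses`: hypothetical mapping of identifier -> status string.
    """
    sent = set(sent_ids)
    answered = set(responses)
    return {
        # Sent, but no response came back: possibly lost in transit.
        "missing": sorted(sent - answered),
        # Answered, but with some fate other than being loaded.
        "rejected": sorted(i for i in sent & answered if responses[i] != "loaded"),
        # A response for something this site never sent.
        "unexpected": sorted(answered - sent),
    }


# Made-up identifiers and statuses, for illustration only.
result = check_responses(
    ["n1", "n2", "n3"],
    {"n1": "loaded", "n2": "rejected", "n9": "loaded"},
)
```

Anything listed under "missing" or "rejected" would need a person's attention before the next day's exchange.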

SACO is a program that is somewhat similar to NACO, but there are differences. One important thing to note is that the name may be misleading: the SACO program does not comprehend all subject vocabularies, but only the Library of Congress subject heading system, or LCSH.

Figure 10. All subject heading proposals are reviewed at the Library of Congress.

The flow for SACO contributions is markedly different from that used for NACO contributions [Fig. 10]. NACO is decentralized; records are added to the international authority file without any review beyond that provided by the local library. For SACO, on the other hand, all suggestions come to LC. These may come directly from a SACO member, or through a SACO funnel, or in fact from any library that feels like making a suggestion. All such suggestions—and they are no more than suggestions—are thoroughly reviewed before being approved. Those suggestions that do get approved are entered directly into the authority file at the Library of Congress. (It so happens that two other vocabularies often used in libraries, the Medical Subject Heading system maintained by the National Library of Medicine, and the Art and Architecture thesaurus maintained by the Getty Information Institute, follow a similar centralized model.)

The procedure for suggesting changes to LCSH is simplicity itself. There is a Web page that lists the elements likely to be needed in a new subject authority record. This page is not in any sense interactive; you copy the form into an e-mail message, fill in the blanks, and send it off. Figure 11 shows one such proposal, recently sent by a funnel project to the Library of Congress together with several other proposals. One of the problems with using e-mail for these proposals is that e-mail clients often don’t have the ALA character set built into them, so we have to resort to expedients when authority information contains diacritics and special characters [Fig. 12].

To give you a more complete picture of something I talked about earlier, I should mention that the automatic generation of subject authority records is nowhere near as advanced as the automatic generation of name authority records. All we’re getting right now is the heading and the immediate source information [Fig. 13]. This is principally because name authority records can in large part be generated by manipulating the heading and the source bibliographic record. Subject records on the other hand contain principally information—variant forms of the heading, and relationships to other headings—that isn’t explicit in the bibliographic record, so it can’t be pulled in automatically or created through a stereotyped manipulation.

Figure 14. New records are merged at the Library of Congress and a new master file is distributed daily.

So now, by various means, we’ve got the authority file at the Library of Congress, containing all of the name and subject authority records contributed from any source. The remaining part of the work is the synchronization of the authority file at LC with the other copies. Once a day, files containing new and modified name and subject authority records, from whatever source, are prepared by a computer program at the Library of Congress. These files contain records uploaded from the bibliographic utilities and records from the Library of Congress: new records, modified records, and instructions to delete records. (Deleting records can only be done at the Library of Congress.) These files (one for names, one for subjects) are fetched in their turn by computers at the remote authority locations and processed as appropriate [Fig. 14]. Response records are prepared by the utilities and used to make sure that all records sent out were received and handled as appropriate. So, it is possible to say that, within the span of at most one day, all of the files have the same records in them. Immediately after a file has been synchronized, of course, it’s out of synch again; but the amount of duplication encountered appears to be within tolerable limits. (Most of the duplicates encountered arise not because records for the same heading were created at different locations on the same day, but simply because someone wasn’t quite thorough in searching before adding a new record.) In any case, records contributed by any means are eventually, and automatically, available to all.
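For a copy of the file that is never changed locally, processing the daily file is a matter of replaying its transactions in order. The Python sketch below assumes that each record carries an identifier and a status letter; the letters mirror the USMARC record status codes n (new), c (corrected) and d (deleted), but the dictionary layout and the function name are inventions for the example.

```python
def apply_daily_file(copy, transactions):
    """Bring one copy of the authority file up to date.

    `copy` maps record identifier -> record; `transactions` is the day's
    list of records, each a dict with "id" and "status" keys (assumed shape).
    """
    for record in transactions:
        status = record["status"]
        if status in ("n", "c"):
            # New and corrected records are handled alike: the incoming
            # record simply becomes the current one.
            copy[record["id"]] = record
        elif status == "d":
            # A delete for a record this copy never held is harmless.
            copy.pop(record["id"], None)
        else:
            raise ValueError(f"unknown record status: {status!r}")
    return copy


# Made-up records, for illustration only.
site_copy = {"n1": {"id": "n1", "status": "n", "heading": "Doe, Jane"}}
apply_daily_file(site_copy, [
    {"id": "n1", "status": "c", "heading": "Doe, Jane, 1950-"},
    {"id": "n2", "status": "n", "heading": "Roe, Richard"},
    {"id": "n3", "status": "d", "heading": "Poe, Edgar"},
])
```

Because every site replays the same transactions, the copies converge on the same contents once each has processed the day's file.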

The Library of Congress also cumulates the daily changes into a weekly subscription service, symbolized here by the little icon of a tape reel. A small number of libraries subscribe to this service, and also some vendors; this is done either directly, or through a third-party reseller. Given the ever-decreasing cost of disk storage, additional libraries may find it economical to maintain their own copy of the international authority file. In doing this, they may realize substantial efficiencies in their cataloging operations, and may also save themselves the costs of searching for and downloading those particular records they need from their bibliographic utility.

Maintaining a local copy of the international authority file is not without problems, however. I’d like to finish with a brief sketch of what can happen to these records at the local level. There are really two basic possibilities. At one extreme, the records are maintained as a separate resource file and never modified except when they’re modified through the contribution mechanism I’ve already described. In this case, loading updates is simple: you replace the existing record with the new one. This is the model that is followed at the bibliographic utilities, for example; all changes to this file are made with the intention that they will be shared with everyone. At the other extreme, authority records that arrive from outside are modified to work in the local environment; this is the model we follow at Northwestern. This makes the loading of updates more complex, because in most cases you want to preserve the local changes.

Examples of changes that wouldn’t be appropriate to make in the official copy of an authority record are: local series treatment information (especially when the local treatment varies from that of the Library of Congress) [Fig. 15]; a record of local variants to name headings (useful for ongoing maintenance at the local level) [Fig. 16]; and other enhancements.

Here, for example [Fig. 17], is a MeSH authority record. Now, MeSH authority records contain very few reference tracings for broader and narrower terms. Instead, they contain, in coded form, information about how a heading fits into the MeSH subject hierarchies. Unfortunately, our NOTIS system can’t do anything with this coded information; so we have a program convert it into the reference tracings that our system knows about.
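The idea behind such a conversion can be sketched in Python, assuming the coded information is a dotted hierarchy number in which dropping the last segment gives the position one level up. This is not the Northwestern program; the function names are invented, and the headings and numbers below are made up rather than taken from MeSH.

```python
def broader_tracings(numbers_by_heading):
    """Map each heading to the headings one level above it.

    `numbers_by_heading` maps a heading to its list of hierarchy numbers
    (a heading may sit at more than one place in the hierarchies).
    """
    heading_for = {}
    for heading, numbers in numbers_by_heading.items():
        for number in numbers:
            heading_for[number] = heading

    broader = {}
    for heading, numbers in numbers_by_heading.items():
        parents = set()
        for number in numbers:
            if "." in number:
                parent = number.rsplit(".", 1)[0]   # drop the last segment
                if parent in heading_for:
                    parents.add(heading_for[parent])
        broader[heading] = sorted(parents)
    return broader


def narrower_tracings(broader):
    """Invert the broader-term map to get narrower terms."""
    narrower = {heading: [] for heading in broader}
    for heading, parents in broader.items():
        for parent in parents:
            narrower[parent].append(heading)
    return {heading: sorted(terms) for heading, terms in narrower.items()}


# Made-up headings and hierarchy numbers, for illustration only.
sample = {
    "Alpha": ["X01"],
    "Beta": ["X01.100", "X02.200"],
    "Gamma": ["X02"],
    "Delta": ["X01.100.050"],
}
broader = broader_tracings(sample)
narrower = narrower_tracings(broader)
```

Each entry in `broader` or `narrower` would then be written into the authority record as an ordinary reference tracing that the local system already knows how to display.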

  • New records
    • New—Add (duplicate to locally-created record?)
    • New—Delete
       
  • Updates
    • Heading has changed
    • Heading has not changed
    • Local information not present
    • Local information present
       
  • Deletes of existing records
    • Heading used in local records
    • Heading not used locally
Figure 18. Simplified logic for processing incoming authority records.

For a library making these kinds of changes at the local level, simply loading in replacement records as updates are received would obliterate the local work. At Northwestern, we’ve developed a protocol for making changes to authority records, and for marking those changes as local. Our loader program has been tailored to recognize the locally-added information, and to act as appropriate [Fig. 18]. This is in itself a complicated topic; I’ll just point out that the program contains separate and rather elaborate logic for handling new records, changes to existing records, and deletes of records. In this manner, we’re able to get authority records from the international authority file to work in the local environment, in as automated a way as possible.
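To make the three branches concrete, here is a much-reduced Python sketch of a loader that preserves local work. It is not the Northwestern loader, whose logic is far more elaborate. The `local` flag standing in for the marking protocol, the record layout, and the action taken in each case are all assumptions for illustration, and the check for duplicates of locally-created records is left out.

```python
def process(incoming, local_file, headings_in_use):
    """Load one incoming record and return a note on what was done.

    `local_file` maps record identifier -> record; `headings_in_use` is the
    set of headings found in local bibliographic records (assumed shapes).
    """
    key = incoming["id"]
    existing = local_file.get(key)

    if incoming["status"] == "d":                       # deletes
        if existing is None:
            return "nothing to delete"
        if existing["heading"] in headings_in_use:
            return "hold delete for review"             # heading still used locally
        del local_file[key]
        return "deleted"

    if existing is None:                                # new records
        local_file[key] = incoming
        return "added"

    # Updates: take the incoming record, but carry forward any field
    # that was marked as local in the record being replaced.
    local_fields = [f for f in existing["fields"] if f.get("local")]
    heading_changed = incoming["heading"] != existing["heading"]
    local_file[key] = {**incoming, "fields": incoming["fields"] + local_fields}
    return "replaced; heading changed" if heading_changed else "replaced"


# Made-up records; tag 999 is just a placeholder for a local field.
catalog = {
    "n1": {
        "id": "n1", "status": "n", "heading": "Doe, Jane",
        "fields": [
            {"tag": "670", "text": "source"},
            {"tag": "999", "text": "local note", "local": True},
        ],
    },
}
update = {
    "id": "n1", "status": "c", "heading": "Doe, Jane, 1950-",
    "fields": [{"tag": "670", "text": "new source"}],
}
outcome = process(update, catalog, set())
```

After the update, the record carries the new heading and source from outside together with the local note, which a simple replacement would have lost.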

 




Biographical Information

Gary Strawn splits his time between the technical services and information technology departments at Northwestern University Library. He holds a dual post as the authorities librarian and a library systems analyst.