Library Access to External Electronic Resources
Computers in Libraries '99
URLs, PURLs & TRULs : Link Maintenance in the Web-accessible OPAC
Tom Tyler
Abstract
A Web-accessible OPAC is one that permits library catalog users to use common web browsers to interact with the underlying bibliographic database. Libraries have generally welcomed the opportunity occasioned by the browser client to provide live hyperlinks from records in their catalogs to external electronic resources. As the number of hyperlinks in individual OPACs increases, librarians and their parent institutions have started to develop and implement policies and procedures to deal with a variety of problems related to URLs.
Some problems are vendor-related and can only be resolved by changes to vendor-supported servers. The problem of URLs that no longer work is one that has received special attention recently because of its impact on library staff and resources. While some libraries have successfully implemented URL maintenance procedures, many have not.
Requests by librarians to ILS vendors for new tools to make URL maintenance more efficient and effective have yet to elicit a significant response, but recent announcements suggest that some of the environmental changes needed may soon be a reality.
The following highlights some of the issues to be discussed in the CIL'99 presentation.
Introduction
The need for frequent and ongoing maintenance of bibliographic records in a library catalog is a somewhat new phenomenon -- one that has emerged in the past few years. Catalog records, which once represented materials held locally by the library, now may link to resources held anywhere in the world that are accessible by a variety of telecommunications protocols.
Continuing maintenance of such records is needed if the link from record to resource is to remain valid. The resource may disappear altogether or its electronic address might simply change. Or it is possible that the technical or administrative requirements for electronic access might change as would be the case if a previously "free" resource instituted a policy of fee for use. There is also the possibility that the linked resource changes in content and thus alters the degree of "relatedness" to the bibliographic record that provides the link in the first place. Before we see how some libraries and the automated systems they use are dealing with this new phenomenon it might be helpful to look at the context in which this activity takes place.
Web-accessible OPACs
Web-accessible OPACs are relatively new, yet the library community has moved with unaccustomed speed to embrace this new technology. While it is easy for those of us who have had a web-based catalog for one, two, or more years to think "everybody's doing it," there are still many libraries dependent on text-based systems, though their numbers are declining. Peter Scott's WebCATS site currently lists nearly 2,000 web-based catalogs, of which more than 1,500 are found in North America. Almost 100 library catalogs were added to this site in the last quarter of 1998.
In early January, an informal survey of depository librarians on the GOVDOC-L list, conducted to determine the availability of web access to their OPACs and their involvement with URL maintenance, showed that 75% of those responding had current web access to their catalogs. Of the remaining 25%, half reported plans to move to new library systems with web interfaces in the current year.
The Library and Access to External Electronic Resources
The library "homepage" as the gateway to a wide range of information resources is almost a universal model these days. Via the homepage, the user may select the library's catalog, obtain administrative information about the library, search selected commercial databases for articles on a variety of subjects, initiate document delivery, and connect to a variety of other external electronic services and resources.
While there are as many variations to this basic model as there are library homepages, one element that seems to be unstated but almost universally understood is that the library is committed to providing access, not only to local collections but to resources beyond the library and its parent institution. The combinations of technologies that support this access from the homepage are essentially the same as those that allow access from records within the library's web-based OPAC, yet a combination of constraints - some system related, others organizational, and yet others more particularly bibliographic - have limited the use of records in the catalog database as vehicles for access to external electronic resources. Before looking at these constraints and how some libraries have overcome them, it would be helpful to find out a little more about hyperlink data in the webpac and how this data is being used in public displays.
Hyperlinks in the Web OPAC
One of the characteristics of Web-based OPACs is that there is no longer the need to work only within a somewhat rigid hierarchy of menus and displays. Hypertext links are used extensively in web-OPAC implementations to enable the user to move among a variety of service or display options. While individual ILS / library variations may exist, it is more common than not to link to index browse displays from titles, authors, and subjects. Less universal, but quite common, are links to series titles and classification numbers. Unless local parameters have been set to suppress links to external resources, URLs from the Marc 856 fields (Electronic Location and Access) are usually hyperlinked.
What is not uniform is how different implementations of webpacs use the information available in the Marc 856 field to impart useful information to the catalog user.
The URL and Marc
Linked examples: Full Marc 856 Definition; 856 - Current Practice; OPAC Labels for the 856; SROS Test Record; Texas St Library.
System summaries and examples: CARL; DRA; Endeavor (Examples #1 and #2); Innovative (Examples #1 and #2); InterCat; Notis; PALS; Sirsi (Examples #1 and #2); VTLS (Examples #1 and #2).
"Starr" record displays: Starr-TSEL; Starr-TSEL 404; Starr-Endeavor; Starr-Innopac; Starr-Notis; Starr-InterCat; Starr-Melvyl; Starr-Melvyl 404; Starr-OhioLink via Innopac; Starr-OhioLink via Z39.50.
In 1993 Marc field 856 was defined to provide a link from the bibliographic record to an electronic resource. This was done initially to accommodate email, ftp, and telnet access methods. There was no provision for the URL - Uniform Resource Locator - that is so familiar today. Mosaic made its appearance in the same year, to be followed in 1994 by the first version of Netscape.
Responding to the growing phenomenon of the World Wide Web, with its use of the URL, a new subfield definition was approved for Marc 856 in the following year. Subfield "u" [$u] was added to the definition. In January 1995 the new subfield was validated for use by OCLC for its Online Union Catalog (OLUC).
Marc 856 fields created in the pre-URL era (i.e. no $u) are invisible to users of most library systems - DRA, Endeavor, Innopac, and others. Sirsi implementations will generally label and display information from the 856 field but will not translate the information into a useable electronic address. Library systems at the present time require a $u in the 856 to generate a live hyperlink to the URL. How this information displays to the user varies from system to system.
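The dependence on $u can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the function name `render_856`, the representation of a field as a list of (subfield code, value) pairs, and the example addresses are all assumptions made for illustration. The fallback branch imitates the "label and display, but no link" behavior described for Sirsi; most of the other systems named would show nothing at all.

```python
from html import escape


def render_856(subfields):
    """Return an HTML fragment for one 856 field.

    `subfields` is a hypothetical representation: a list of
    (subfield_code, value) pairs such as [("u", "http://...")].
    """
    urls = [value for code, value in subfields if code == "u"]
    if not urls:
        # Pre-URL record (no $u): show the data as plain text, with no link.
        return escape("; ".join(value for _, value in subfields))
    # A $u is present, so a live hyperlink can be generated.
    # Here the URL itself serves as the caption.
    url = urls[0]
    return '<a href="{0}">{1}</a>'.format(escape(url), escape(url))


# A pre-URL field (host and path only) versus a field carrying a $u.
print(render_856([("a", "ftp.example.gov"), ("d", "/pub/docs")]))
print(render_856([("u", "http://www.example.gov/report.html")]))
```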
System vendors and libraries are far from being in agreement on what to call the external resources link as illustrated by the following table.
Some Labels used to Describe Marc 856 Data
How the field is labeled in the public display does not generally affect cataloging practices, but how the hyperlink is generated by the library system does. While the majority of vendors seem to use the URL itself as the URL caption (the underlined text visible to the user), several, including DRA, Innovative and Endeavor, generate the caption from information carried in either the Notes ($z) or the Materials Specified ($3) subfields. Innovative's unique "Click on the following to: " label for its display of electronic links, and its practice of preceding the URL with the phrase "Connect to" if there is no subfield z ($z), place an additional burden on catalogers concerned with grammar, syntax and the ability to convey meaning to the user. The following table shows the display characteristics for selected systems that support web-based OPACs.
Display Characteristics for Marc 856 Field Elements in Selected Web OPAC Implementations
Other practices that are vendor-specific present additional problems. These generally arise because the system does not deal adequately with multiple URLs or notes in a single Marc 856 field. Auto-Graphics, Endeavor and Innovative simply ignore any additional 856 note subfields after the first one. DRA ignores the note subfield altogether. The four systems just mentioned also generally use only the first URL when more than one exists in an 856 field.
Two systems (Melvyl and CARL) that display multiple URLs from a single 856 field often concatenate all URLs into a single (and singularly unusable) link.
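As a contrast to the behaviors just described, the following Python sketch shows one way a display routine could treat an 856 field that carries several URLs or notes: every $u becomes its own link, and no note is silently dropped. It is a sketch under stated assumptions, not a description of any system named here. The function name `links_for_856`, the (code, value) pair representation, and the caption policy (all $z notes, then $3, then the URL itself, with the URL always used when there are several) are choices made for illustration.

```python
from html import escape


def links_for_856(subfields):
    """Return one HTML anchor per $u in a single 856 field.

    `subfields` is a hypothetical list of (subfield_code, value) pairs.
    """
    urls = [value for code, value in subfields if code == "u"]
    notes = [value for code, value in subfields if code == "z"]
    materials = [value for code, value in subfields if code == "3"]

    # Assumed caption policy: all notes, else Materials Specified, else none.
    if notes:
        caption = "; ".join(notes)
    elif materials:
        caption = materials[0]
    else:
        caption = None

    links = []
    for url in urls:
        # With several URLs, a shared caption would be ambiguous,
        # so each URL is shown as its own caption.
        text = caption if caption and len(urls) == 1 else url
        links.append('<a href="{0}">{1}</a>'.format(escape(url), escape(text)))
    return links


field = [
    ("3", "Table of Contents"),
    ("u", "http://www.example.gov/toc.html"),
]
for link in links_for_856(field):
    print(link)
```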
Linked Records in the OPAC
Apart from the relatively few libraries in the United States which have embarked on ambitious digitization projects related to book or photograph collections and have added conventional cataloging records to their catalogs for these resources, it would appear that most libraries' direct experience with using the Marc 856 field for providing access to electronic resources has come from dealing with two different bodies of materials - journal articles in full text and U.S. government publications.
In the case of providing catalog information for journals for which libraries have purchased access to full-text articles, the initial problem is that of revising existing records or creating new records to adequately convey information to the catalog user that articles are available. It is also generally necessary to let the user know that such resources are available only if validation criteria are met. Once 856 linkage has been established however, the records and their URLs are subject to the same maintenance requirements as any other records with links to electronic resources.
For many libraries - especially those among the 1,350 libraries that are depositories for federal documents - the recognition of the need for link maintenance in their library catalogs has been the combined result of using catalog records produced by the Government Printing Office (GPO) and observing the web management practices of government agencies over a period of several years.
GPO, which administers the depository program, started cataloging government publications on OCLC in 1976 in order to produce its Monthly Catalog of U.S. Government Publications which is required by statute. The records cataloged by GPO are acquired by many depositories, either directly from the Library of Congress which acts as GPO's agent for the sale of these records, or from one of several commercial vendors such as Marcive, Auto-Graphics, and OCLC.
Since April 1995 GPO has issued approximately 8,000 records with the Marc 856 field. While this is less than ten percent of GPO's total output for this period, the number has significance for libraries using these records in their catalogs. In December 1998, a test of the last-use URLs issued by GPO determined that more than 20% were either bad or no longer useful.
While ILS vendors were quick to enable electronic links to external resources, they have been slow to provide effective tools to assist with link maintenance. The lack of tools has not been an insurmountable obstacle to some, however.
Identification of URLs in the Catalog Database
Libraries involved in systematic link maintenance are generally agreed on the broad outline of how this should be done. The process can be conceptualized in three steps:
The general inability of current library systems to assist in steps one and two above has resulted in libraries using a variety of methods to identify URLs in their catalogs, convert them to hypertext links in an external HTML file, and then subject the file to link validation.
A number of libraries routinely test URLs at the point of cataloging. Others not only test the URL at this point, but save it for subsequent testing as a browser bookmark or as a hyperlink in a word-processing, spreadsheet or database file. A few libraries have access to specialized software that periodically extracts URLs from catalog records and generates an HTML encoded file for the next process - link validation.
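The extraction step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the specialized software the libraries in question use: the function name `build_link_page`, the record structure (a record identifier paired with its 856 fields, each a list of (code, value) pairs), and the example addresses are assumptions.

```python
from html import escape


def build_link_page(records):
    """Build an HTML page listing every $u found in the given records.

    `records` is a hypothetical list of (record_id, fields) pairs, where
    `fields` is a list of 856 fields and each field is a list of
    (subfield_code, value) pairs.
    """
    rows = []
    for record_id, fields in records:
        for subfields in fields:
            for code, value in subfields:
                if code == "u":
                    # Keep the record identifier next to the link so that a
                    # bad URL can be traced back to its catalog record.
                    rows.append(
                        '<li>{0}: <a href="{1}">{1}</a></li>'.format(
                            escape(str(record_id)), escape(value)
                        )
                    )
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>\n"


catalog = [
    ("rec0001", [[("u", "http://www.example.gov/a.html"), ("z", "Full text")]]),
    ("rec0002", [[("a", "ftp.example.gov")]]),  # no $u, so nothing to test
]
print(build_link_page(catalog))
```

The resulting page can be saved to a file and handed to whichever link validator the library uses.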
Link Validation
In the absence of link validation as an option in current library systems, libraries have turned to third-party freeware, commercial software, or online link validators to test their URLs. Link-checking software mentioned by librarians includes the following: LinkBot, Xenu's Link Sleuth, InContext WebAnalyzer, LinkCop, LinkScan, NetMechanic, Cyber Spyder, and Momspider. Software validators generally require Unix or a 32-bit Windows operating system (Windows NT or Windows 95) to run.
Link validators usually generate rather detailed reports on the results of their testing. Because of transitory technical problems, some valid links may be reported as bad or broken. For this reason, further manual or machine testing of problem URLs is suggested.
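The retesting advice can be illustrated with a small Python sketch that tries each URL more than once before reporting it as a problem. It does not reproduce the behavior of any of the products listed; the function names `fetch_status` and `check_links`, the retry count, and the rule that any status below 400 counts as good are assumptions. The calls to `urllib.request` are from the Python standard library.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if no response came back."""
    # HEAD avoids downloading the resource; some servers may reject it.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404: the server answered, the resource is gone
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, refused connection, and so on


def check_links(urls, fetch=fetch_status, attempts=3):
    """Test each URL up to `attempts` times and return a {url: verdict} report."""
    report = {}
    for url in urls:
        status = None
        for _ in range(attempts):
            status = fetch(url)
            if status is not None and status < 400:
                break  # a good answer ends the retries early
        if status is not None and status < 400:
            report[url] = "ok"
        else:
            report[url] = "problem ({0})".format(status)
    return report
```

In practice the retries would be spread over hours or days rather than run back to back, since a server that is down now is likely to be down a second later. URLs still marked as problems after that are the ones worth a manual check.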
Bibliographic & OPAC Maintenance
Dealing with problem URLs in the library catalog is far more complex than in the more familiar "homepage" environment. The best-case scenario would be to simply replace a bad URL with a good one - assuming it can be found. If a replacement URL is unavailable, then the cataloger's task can become more complicated because "relatedness" issues come into play. If the record can stand on its own without the URL, then the URL and/or its corresponding 856 field is removed. It also might be necessary to delete or change note fields if they apply to the URL being deleted. Deletion (or possibly suppression) of the full bibliographic record and any attached records might be indicated if the resource addressed by the bad URL was represented solely and entirely by its associated bib record.
The fact that bibliographic maintenance modules in some library systems operate in text mode, with sometimes eccentric editors that have limited or no capability for "cut and paste," makes the task of URL correction or replacement more difficult than it should be. In such situations, creating a new problem is a distinct possibility - especially if the URL exceeds 100 characters in length, as do many found in records cataloged by GPO.
Staff
As with any new process or procedure in libraries, the question of staff has to be addressed. Respondents to the survey of depository librarians indicated rather widespread, multi-unit involvement in the link validation and maintenance process. While several reported that currently, only documents staff were involved, a number indicated there was divided or shared responsibility between documents and cataloging or bibliographic maintenance units in their libraries. Several responded that systems personnel participated at some point in the process.
PURLs - Promise and Reality
The PURL (Persistent URL) is a URL which, instead of addressing an actual resource, addresses a PURL server or resolver where a connection is attempted to a reference URL maintained in the resolver's database. The PURL resolver software was created by OCLC and released in 1996. OCLC offers the software to interested institutions without charge.
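The indirection a PURL provides can be sketched as a lookup table that answers each request with a redirect. The Python below is a toy model, not OCLC's resolver software: the class name `PurlResolver`, its methods, the use of status 302 for the redirect, and the example paths and addresses are assumptions made for illustration.

```python
class PurlResolver:
    """Toy model of a PURL resolver: a table from PURL paths to current URLs."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, target_url):
        """Create a PURL, or point an existing one at a new location."""
        self._targets[purl_path] = target_url

    def resolve(self, purl_path):
        """Return (status, location) as a resolver might answer a browser."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None  # no such PURL on this resolver
        return 302, target  # redirect the browser to the current URL


resolver = PurlResolver()
resolver.register("/GPO/LPS1234", "http://www.agency.example.gov/old/report.html")

# The agency reorganizes its site; the maintainer updates the resolver only.
resolver.register("/GPO/LPS1234", "http://www.agency.example.gov/pubs/report.html")

print(resolver.resolve("/GPO/LPS1234"))
```

The point of the design is that the PURL stored in the catalog record never changes. When the resource moves, one entry in the resolver's table is updated, rather than the 856 field in every library that holds the record.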
OCLC's intention was to "... help ensure reliable, long-term access to Internet resources with minimal maintenance." It was envisioned that there would be a number of PURL sites where responsible and conscientious "maintainers" would create PURLs and ensure that the URLs associated with them were kept current.
For the PURL concept to be useful in the library environment, it is necessary to know who is maintaining PURLs, for what resources, and on which PURL resolver. Until early 1998 OCLC's "mother PURL" site was the only broad-scale national installation in the United States. Apart from the machine generated/updated PURLs created by OCLC from OLUC cataloging, there was no clear understanding of who else maintained URLs on the OCLC resolver. In March 1998 GPO installed its PURL resolver and began to include GPO PURLs in its cataloging. By the end of the year GPO had created more than 2,100 PURLs and included almost as many in its cataloging. The result was the widespread introduction of PURLs into the catalogs of American libraries.
Recipients of records with PURLs in lieu of URLs were not uniformly pleased. The electronic provenance or origin of the electronic resource was no longer visibly apparent, and some librarians believed this would affect future maintenance work. Another concern was GPO's decision to use accession numbering for its PURLs (compared to OCLC's use of the OCLC record number), which made it difficult to ensure a precise connection between a bibliographic record and the PURL associated with it.
Judging from communications on several ListServs where cataloging and link maintenance issues have been discussed, it would seem that as yet few libraries are investing heavily in "preventive maintenance" through the use of PURLs. There seems to be increasing understanding of the purpose and nature of the Persistent URL, but how this translates to benefits for an individual library is still less than clear. Still, some list messages suggest that the PURL Resolver (server) is a wonderful black box where you feed in your bad URLs and out pop good ones. If GPO goes beyond PURL creation and demonstrates the ability to maintain them over time, the promise of the PURL relative to link maintenance will be reality. In the depository survey of January, only 1 of 3 respondents indicated an institutional preference for PURLs over URLs in the Marc 856 field.
The Role of TRULs in Link Maintenance
Thirty years ago - long before the internet, the world wide web and even the integrated library system - my cataloging professor at Berkeley attempted to explain the role of rules in the library setting. She made special note of three collections of rules that need to be understood by any librarian
I would like to offer the acronym TRULs to represent this class of operative rules.
Before OCLC and the automated library systems that followed, TRULs were uniquely associated with the work that went on within individual libraries. Cataloging departments from library to library would have subtle but locally important differences in how cards were typed, or how long one had to file cards in the catalog "above the rod". Even reference departments had their differences. One library's practice might be to never suggest the MLA bibliography to an undergraduate while just across town, another library would immediately go to the MLA when a freshman English student mentioned Faulkner, Fitzgerald, or Hemingway.
With OCLC, TRULs had to be expanded to include Dublin's tribal rules. With the move to Integrated Library Systems, the TRULs really became important as libraries responded to the new environment by establishing "workarounds" to make up for deficiencies in their systems. My first experience with an automated library catalog required creating a way to explain to the public the benefits and innovative genius of a system that ignored filing indicators and initial articles in its title indexing. With this same system, we developed other TRULs to live with call number displays limited to just 24 characters.
At the present time TRULs seem to be unduly important to those libraries currently attempting URL link maintenance in their catalogs. Cumbersome processes are required to identify and capture the 856 field data that is needed for validation. Often a variety of utility and applications software is used just to prepare a group of URLs for link testing. The limitations of even good link validators require further file manipulation.
For many libraries, dealing with system-imposed constraints is very time consuming and sometimes in direct conflict with established cataloging rules. For example, if the cataloger in an Innopac library wants to display information from the Materials Specified subfield ($3) - say "Vol. 3" or "Table of Contents" - this information has to be added to the notes subfield ($z) to display to the public. The opposite is the case for DRA libraries: if the cataloger wants to display a note that belongs in the note subfield ($z), such as "Adobe Acrobat required to view this document", then this text has to be placed in the Materials Specified subfield ($3).
Conclusion
The fact that most libraries have probably yet to develop systematic procedures and practices for ongoing link maintenance is probably a TRUL waiting to happen once the number of linked URLs, or unlinkable URLs, reaches criticality. This need not happen if our system vendors develop the tools required for the task, such as link-validating robots that operate on library-determined schedules, reporting capability linked to review files of records with problem URLs, global find/replace capability, and state-of-the-art graphically based editors.
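Of the tools listed, global find/replace is perhaps the simplest to picture. The Python sketch below rewrites every $u that begins with an old address prefix and reports which records it touched. It is a hypothetical illustration, not a feature of any existing library system: the function name `replace_url_prefix`, the record structure, and the example addresses are assumptions.

```python
def replace_url_prefix(records, old_prefix, new_prefix):
    """Rewrite every $u that starts with `old_prefix`; return changed record ids.

    `records` is a hypothetical list of (record_id, fields) pairs, where each
    field is a list of (subfield_code, value) pairs. Fields are edited in place.
    """
    changed = []
    for record_id, fields in records:
        for subfields in fields:
            for index, (code, value) in enumerate(subfields):
                if code == "u" and value.startswith(old_prefix):
                    subfields[index] = (code, new_prefix + value[len(old_prefix):])
                    changed.append(record_id)
    return changed


catalog = [
    ("rec0001", [[("u", "http://old.example.gov/docs/report.pdf")]]),
    ("rec0002", [[("u", "http://other.example.org/index.html")]]),
]
print(replace_url_prefix(catalog, "http://old.example.gov/", "http://new.example.gov/"))
```

Returning the list of changed records gives the cataloger a review file to check, which also touches on the reporting capability mentioned above.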