Classification Theory and the Internet: A Move Toward Multidimensional Classification

Susan Irwin

University of Denver

March 6, 2001

Human beings seem to have an innate need to organize (Taylor, 1999). The need to organize large amounts of knowledge and information led to the development of classification schemes and other organizational tools. Rapid changes in the way information is generated and accessed lead to changes in the way information is organized for retrieval (Chan, 1995).

The first edition of the Dewey Decimal Classification (DDC) was published in 1876, when knowledge was expanding at a fast rate and the public library system was becoming a more open system. The DDC provided a new organizational scheme for dealing with these changes. Today, the Internet allows information to be generated and packaged in digital format. Can library classification improve access to and retrieval of Internet resources? This paper examines the purpose of library classification and explores how classification schemes can add value to Internet collections.

“Regardless of the nature of the information resource, the need to express its content, describe its format, facilitate its access, and enable its use remains constant” (Dillon & Jul, 1996, pp. 212-13). Library classification schemes have four main purposes. First, they order the fields of knowledge in a systematic way. Second, they bring related items together in a helpful sequence. Third, they provide orderly access to the shelves either for browsing or via the catalog. Finally, they provide an exact location for an item on the shelves (Dittmann & Hardy, 2000, p. 8).

A library is primarily a store of information packages (Taylor, 1999).  Taylor defines an “information package” as an instance of recorded information such as a book, article, video, Internet document or electronic journal. Classification schemes bring systematic order and control to the collection so that an information package can be retrieved according to a particular aspect of its character. Library classification schemes also improve retrieval through enhanced subject access via Online Public Access Catalogs. A successful classification scheme is one that saves the time of the user by creating an order convenient to the user (Mortimer, 2000).

Like a library, the Internet is a store of information packages, but without systematic order. Lynch (1997) claims that the Internet is merely a huge amount of disorganized data, a "chaotic repository for the collective output of the world's digital printing presses." Search engines furnish access and retrieval points for Internet resources, but not all of them allow users to navigate using subject categories. Since each search engine has its own vocabulary and organizational scheme, a search using one search engine may turn up too many hits while the same search using a different search engine may result in no hits at all. The advantage of library classification in this situation is that it can create cohesion across diverse information stores by establishing a shared conceptual context (Albrechtsen & Jacob, 1998, p. 310).

Classification systems increase access by grouping, or collocating, information packages together according to their subject content. The major classification systems are founded on General Classification Theory (Marcella & Newton, 1994). General Classification Theory is based on four principles:

(1)     Every thing, object etc. has to have a distinct and unambiguous description of its unique qualities.

(2)     Principles involving likeness and distinctness must be used in creating classes.

(3)     Hierarchies and other relational methods are necessary in order to group fundamental characteristics and to identify fundamental differences clearly.

(4)     The final system should appear as a logical progression from general to particular (Richmond, 1990).

 

As a result, classification schemes define subject content through schedules of classes, divisions and sections that move from the general to the specific (Mortimer, 2000). By establishing relationships between information packages, classification systems supply context and furnish end-users with a way to discover potentially helpful items.
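The general-to-specific movement of a schedule can be sketched in a few lines of Python. This is only an illustration, not a description of how any library system is implemented: the function names are invented for the example, and the three DDC numbers are used simply because their captions are widely known. It assumes the familiar DDC pattern in which the first digit names the main class, the second the division, and the third the section.

```python
# Illustrative sketch: three DDC numbers arranged from general to specific.
# The helper names below are hypothetical, invented for this example.
DDC_CAPTIONS = {
    "500": "Natural sciences and mathematics",  # main class
    "510": "Mathematics",                       # division
    "516": "Geometry",                          # section
}


def broader_classes(number: str) -> list[str]:
    """Return the chain main class -> division -> section for a DDC number."""
    digits = number[:3]  # ignore any decimal extension such as ".3"
    chain = []
    for depth in (1, 2, 3):
        candidate = digits[:depth].ljust(3, "0")  # "5" -> "500", "51" -> "510"
        if candidate not in chain:
            chain.append(candidate)
    return chain


def describe(number: str) -> list[str]:
    """Pair each class in the chain with its caption, where one is listed."""
    return [
        f"{cls} {DDC_CAPTIONS.get(cls, '(caption not listed)')}"
        for cls in broader_classes(number)
    ]


for line in describe("516"):
    print(line)
```

Because the hierarchy is carried by the notation itself, a user who finds one item can move up the chain to discover broader, related material.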

Internet search engines do not all use the same controlled vocabulary or organizational scheme. Some of the organizational schemes have foundations in library classification schemes such as DDC, while others "have adopted or invented rough-and-ready subject classification schemes" (Dillon & Jul, 1996, p. 199). The Online Computer Library Center (OCLC) compared 50 subject categories from Yahoo to the DDC and to the Library of Congress Classification (LCC) to determine how well library classification schemes compare to Internet classifications in terms of general topic coverage (Vizine-Goetz, n.d.). Both the DDC and the LCC have “sufficiently wide topic coverage for classifying Internet resources” (MacLennan, 2000).

Once the relationships between the classes, subclasses, divisions and sections are established, library classification systems assign notation to each. Notation is used to assign call numbers to individual information packages.  The call number assigned works as a collocation device.  For a library, this means that information packages are ordered according to call number. Information packages that are related in some manner are shelved together using this number.  This allows patrons to browse related items in a specific area of the stacks.
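How a call number works as a collocation device can be shown with a small Python sketch. The titles and the class numbers assigned to them are hypothetical, chosen only for illustration, and the function names are invented here. The sketch assumes DDC-style numbers with a three-digit integer part, which is why a plain string sort gives the shelf order.

```python
from itertools import groupby

# Hypothetical items: (call number, title). Not real catalog records.
items = [
    ("516.3", "Hypothetical geometry text"),
    ("025.431", "Hypothetical guide to Dewey"),
    ("004.678", "Hypothetical Internet handbook"),
    ("025.04", "Hypothetical book on retrieval systems"),
]


def shelf_order(records):
    """Order items by call number, as they would sit on the shelf."""
    return sorted(records, key=lambda record: record[0])


def shelf_sections(records):
    """Group the shelf-ordered items by their three-digit class."""
    ordered = shelf_order(records)
    return {
        cls: [title for _, title in group]
        for cls, group in groupby(ordered, key=lambda record: record[0][:3])
    }


for cls, titles in shelf_sections(items).items():
    print(cls, titles)
```

Sorting alone is enough to bring the two items classed in 025 next to each other, which is what lets a patron browsing one of them notice the other.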

Although browsing is a popular way to use a library, its effectiveness in the stacks is limited. Non-book items that are shelved separately, items in circulation, and Internet resources would not be brought to the attention of the browser. Instead, the patron would have to browse each collection's location. Patrons may miss relevant information because they browsed only one section.

In an online catalog, browsing becomes more powerful because the catalog shows the whole collection (Chan, 1995) and allows users to browse the classification numbers of all information packages. “Classification browsing in the online catalogue [sic] is … the only way that users can bring together all of the various formats now available in our libraries” (Elrod, 2000, p. 23).

The call number assigned to an information package also works as a location device. “Ultimately, in a physical world, each object must rest in its place-both absolutely and, in a classified collection” (Dillon & Jul, 1996, p. 204). The assignment of an exact location allows for easier retrieval and re-shelving. However, the current practice of assigning a single location to an information package is somewhat limiting. If the subject of an information package falls into two different classes, it must nevertheless be located in a single place by class. In a library, materials can only be organized in a sequential or linear arrangement.

Internet resources do not require a physical location in the stacks and are freed from the need for a linear arrangement.  A resource can be stored in a place remote from the place of access. Physical collocation is not necessary because the resources can be grouped "on-the-fly" as a result of searching or browsing, regardless of their actual storage place (Dillon & Jul, 1996).

Without the need for linear arrangements, subject classification can be taken a step further.  Instead of making the difficult choice between two classes, multidimensional classification can be contemplated.  Metadata may be one solution to providing multidimensional classification.  A defined set of metadata elements, such as the Dublin Core, with repeatable elements would allow for simultaneous subject categories.
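A minimal Python sketch can make the idea of repeatable elements concrete. The element names title, subject and identifier come from the Dublin Core element set; everything else (the records, the example.org addresses, the function name) is hypothetical and invented for illustration. Each record carries several subject values at once, and an index built from them gathers resources under every subject they belong to, regardless of where they are stored.

```python
from collections import defaultdict

# Hypothetical records using three Dublin Core element names.
# "subject" and "identifier" are lists because the elements are repeatable.
records = [
    {
        "title": "Hypothetical history of map making",
        "subject": ["Cartography", "History of science"],
        "identifier": ["http://example.org/maps"],
    },
    {
        "title": "Hypothetical atlas of trade routes",
        "subject": ["Cartography", "Economics"],
        "identifier": [
            "http://example.org/atlas",
            "http://mirror.example.org/atlas",
        ],
    },
]


def build_subject_index(recs):
    """Map each subject to the titles of all records that carry it."""
    index = defaultdict(list)
    for rec in recs:
        for subject in rec["subject"]:
            index[subject].append(rec["title"])
    return dict(index)


index = build_subject_index(records)
for subject in sorted(index):
    print(subject, "->", index[subject])
```

Neither record has to be forced into a single class: both appear under Cartography, and each also appears under its second subject. The repeated identifier values show, in the same way, how one record could list more than one address.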

A commonly cited problem in classifying Internet resources is that they are not stored in a physical setting. Rather, they are stored in cyberspace, and their locations may change. Uniform Resource Locators (URLs) provide the addresses of resources on the Internet, but the longevity of an average URL can be measured in weeks (Younger, 1997). Again, library classification offers a solution. Weinberg (1998) suggests that changes in location are similar to the reclassification of a book in a library. Repeatable metadata elements would also allow a record to carry multiple Internet addresses.

The task of evaluating Internet resources for classification and cataloguing may seem overwhelming. More than 7 million sites already exist on the Internet, and the number grows daily (OCLC, n.d.). Yet the number of sites seems relatively small when compared to the more than 43 million unique bibliographic records in OCLC's WorldCat database (OCLC, 2001), and if the Internet is to thrive as a new means of information communication, something like library classification will be needed to organize and retrieve information (Lynch, 1997).

“A good library does not provide haphazard access to an information jungle” (Toub, 1997, p. 148). Classification schemes are used to provide systematic order, bring related items together, provide access through browsing and provide an exact location for each information package. Library classification schemes also add value to Internet collections. They supply a shared context and create cohesion across diverse information technologies through established systems. The notation devices, combined with online technology, allow browsing of an entire collection regardless of format. In addition, freed from the need to provide a single physical location, multidimensional classification can be approached through the use of metadata. Multiple subject classifications create more points of intellectual access for the end-user.

 

 

Bibliography

 

Albrechtsen, Hanne & Jacob, Elin. (1998). The Dynamics of Classification Systems as Boundary

Objects for Cooperation in the Electronic Library. Library Trends, 47 (2), 293 – 312.

 

Chan, Lois M. (1995). Classification, Present and Future. Cataloging and Classification

Quarterly, 21(2), 5-17.

 

Dillon, Martin & Jul, Erik. (1996). Cataloging Internet Resources: The Convergence of Libraries

and Internet Resources. Cataloging and Classification Quarterly, 22 (3/4), 197 – 238.

 

Dittmann, Helena & Hardy, Jane. (2000). Learn Library of Congress Classification. Lanham,

Maryland: The Scarecrow Press.

 

Elrod, J. McRee. (2000). Classification of Internet Resources: An AUTOCAT Discussion.

Cataloging and Classification Quarterly, 29 (4), 19 – 38.

 

Lynch, Clifford. (1997). Searching the Internet. Scientific American, 276 (3), 49-56.

 

MacLennan, Alan. (2000). Classification and the Internet. In Rita Marcella & Arthur Maltby

(Eds.) The Future of Classification, (pp. 59-68).  Hampshire, England: Gower.


Marcella, Rita & Newton, Robert. (1994). A New Manual of Classification. Hampshire, England:

Gower.

 

Mortimer, Mary. (2000). Learn Dewey Decimal Classification (Edition 21). Lanham, Maryland:

The Scarecrow Press.

 

OCLC. (n.d.). Web Statistics. Retrieved March 5, 2001, from

http://www.oclc.org/news/research/webstatistics.shtm.

 

OCLC. (2001). Product Statistics. Retrieved March 5, 2001, from

http://www.oclc.org/news/product/statistics.shtm.

 

Richmond, Phyllis. (1990). General Theory of Classification. In Betty G. Bengtson & Janet S.

Hill (Eds.), Classification of Library Materials (pp. 16-26). New York: Neal-Schuman Publishers, Inc.

 

Taylor, Arlene G. (1999). The Organization of Information. Englewood, Colorado: Libraries

Unlimited.

 

Toub, Stephen E. (1997). Adding Value to Internet Collections. Library Hi Tech, 15 (3/4).

Retrieved February 3, 2001, from Emerald database, http://dandini.emerald-library.com.

 

Vizine-Goetz, Diane.  (n.d.) Using Library Classification Schemes for Internet Resources. OCLC

Internet Cataloging Project Colloquium, Position Paper.  Retrieved February 5, 2001, from the World Wide Web: http://www.oclc.org/home.

 

Vizine-Goetz, Diane. (1997). From book classification to knowledge organization: Improving

Internet resource description and discovery. American Society for Information Science, 24 (1) 24-27. Retrieved from OCLC First Search database, http://newfirstsearch.oclc.org.

 

Weinberg, Bella Hass. (1998). Improved Internet access: Guidance from research on indexing and

classification. American Society for Information Science, 25 (2) 26-29. Retrieved from OCLC First Search database, http://newfirstsearch.oclc.org.

 

Younger, Jennifer A. (1997). Resources Description in the Digital Age. Library Trends, 45 (3),

462-487.