ICPSR – Reviewed Fall 1997

Reviewed by Royce Kurtz, University of Mississippi Libraries, September 23, 1997

Introduction:

The Inter-university Consortium for Political and Social Research
(ICPSR) is a nonprofit archive for machine-readable data related to
social science research. Founded in 1962, ICPSR is the largest archive
of such data in the United States. As a nonprofit consortium, it
provides data and services to over 350 academic institutions. The
archive is a subdivision of the Institute for Social Research (ISR) at
the University of
Michigan. ISR is nationally known for conducting extensive,
long-running surveys, such as the Panel Study of Income Dynamics,
Monitoring the Future, and the Study of American Families.
Data sets compiled by ISR as well as those acquired from U.S.
government agencies, private foundations, international organizations,
and individual researchers form the basis of ICPSR’s collection of
4,200 data sets. The archive adds about 200 titles a year. Access to
the titles in the archive has been through an annually published Guide to
Resources and Services, now 915 pages (1996-97, the last paper edition).
The Guide is updated by a quarterly bulletin. In 1993 ICPSR launched a Web
site that essentially functions as an electronic version of the Guide and
the quarterly bulletins and also allows users to download code books and
data sets.

Scope and Coverage:

The archive has few guidelines governing the scope and coverage of
its collection. Guidelines for depositing data deal not with subject
appropriateness but technical issues. Besides the statement in the Guide
that its “holdings cover a broad range of disciplines, including
political science, sociology, demography, economics, history,
education, gerontology, criminal justice, public health, foreign policy,
and law,” and that “ICPSR encourages social scientists in all fields to
contribute,” no other statement defines the archive’s collection policy.

Individual scholars contribute data sets to the ICPSR archives.
Foundations and agencies contract with ICPSR to archive and distribute
their data sets, e.g., the National Archive of Computerized Data on Aging
funded by the National Institute on Aging. Public health care data
collected under grants from the Robert Wood Johnson Foundation Archive
are made available through ICPSR. In 1997 ICPSR was awarded a contract
by the Substance Abuse and Mental Health Services Administration to
establish a National Archive and Analytical Center for Alcohol, Drug Use,
and Mental Health Data. It is also creating an International Archive of
Education Data sponsored by the National Center for Education Statistics.
ICPSR also purchases data sets from a number of sources.

ICPSR is committed to acquiring and maintaining new versions of
around one hundred serial data sets. These data sets include those
produced by private agencies or individuals, such as the ABC
News/Washington Post polls; U.S. government agencies, such as the
Census Bureau’s Current Population Surveys; international
organizations, such as the International Monetary Fund’s Government
Finance Statistics; and other organizations, such as the Center for
Political Studies’ National Election Studies.

The overwhelming majority of data sets focus on U.S. social issues.
Fewer than 15% of the archive’s holdings are titles concerned with
international or foreign (mostly European) studies.

Format:

The ICPSR home page has links to a “Table of Contents” and ICPSR’s
parent organizations. There is also a header, which is repeated on
successive Web pages for easy maneuverability, with links to the
archive, major subsets of the archive, ICPSR’s summer programs, other
Web sites, and contacts for help.

The Table of Contents screen provides an outline of ICPSR’s Web
site; it has links to six major sections, each with numerous
subsections. The first three sections, “About ICPSR,” “Membership,” and
“Governance,” describe ICPSR’s organization. “About ICPSR” includes a
link to contact people categorized by their job description with phone
number and e-mail address and a link to the full text of recent ICPSR
newsletters and bulletins.

The other three major sections are “Archive,” “Other Resources,” and
“Recent Developments.” “Recent Developments” has links to the General
Social Survey and American National Election Survey sites,
Eurobarometer mail list, a list of available CD-ROMs, and recent policy
statements. “Other Resources” has more policy statements, summer
program schedules and course descriptions, a link to the Publication
Related Archive, and “Other Data Sites.”

The key section, “Archive,” subsection “Access Holdings,” is the
electronic version of the Guide and the main access to the data
collections. “Archive” also links to three subsets of the archive:
“Recent Additions-Updates,” “Data on Aging,” and “Criminal Justice
Data,” as well as a “Documentation List” showing price and availability
of code books in electronic or paper format, and more policy
statements.

Record Structure of Data Archive:

The first link under “Archive,” labeled “Access Holdings,” leads to
a page titled “ICPSR Data Archive.” This page allows a researcher to
keyword search the archive, browse using eighteen main subject
headings, or search by Title,
Principal Investigator (PI), and Study Number indexes. A main subject
division link leads in turn to subheadings. The final result of a
keyword or browse search is a neatly framed alphabetic list of
appropriate titles prefaced by AB for “Abstract” and DA for “Data.” AB
links to a long citation format that includes a detailed abstract. DA
links to a page that represents a powerful research feature of this Web
site, the ability to immediately download code books and data sets. If
available electronically, the page displays the phrase “Codebooks and
Documentation Freely Available.” The code book can often be keyword
searched from this screen. The search results display the line number
and the phrase on that line in which the keyword is found. The ability
to freely download or search the code book on the Web is a powerful
research tool. Researchers may easily determine if data sets contain
answers to certain questions and how those
questions and responses are framed, thus determining quickly and
accurately the usefulness of a data set. Below the code book
information, the “Data Files” are neatly framed and presented. In most
instances the message “Access restricted to authorized users” is
displayed, but for member institutions or for those publicly accessible
data sets retrieval can proceed.
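
The code book keyword search described above, which returns each matching line number along with the text of that line, can be illustrated with a minimal Python sketch. This is not ICPSR's implementation: the function name `search_codebook`, the sample code book lines, and the case-insensitive substring matching are all assumptions made for illustration.

```python
def search_codebook(lines, keyword):
    """Return (line_number, line_text) pairs for lines containing keyword.

    Assumption: matching is a case-insensitive substring test; the review
    does not say how the real code book search matches keywords.
    """
    keyword = keyword.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if keyword in line.lower()
    ]


# Hypothetical code book excerpt, invented for this example.
codebook = [
    "VAR 0001  RESPONDENT ID",
    "VAR 0002  DID R VOTE IN 1992 ELECTION",
    "          1. YES   5. NO",
    "VAR 0003  R PARTY IDENTIFICATION",
]

for number, text in search_codebook(codebook, "vote"):
    print(f"{number}: {text}")
```

A researcher scanning such output can see at once which variable asks the question of interest and how it is worded, which is the benefit the review attributes to the feature.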

The format and arrangement of a data set citation on the AB Web page
follows the paper Guide. Each data set entry has a field for principal
investigator or investigators, a descriptive title, a unique ICPSR
study number, and a summary or abstract of the data set that may be
several paragraphs long. Other descriptive fields in the entry include
technical features, such as universe, sample size, data format, number
of cases, and records per case. Each of these fields, prefaced by a
word or phrase “field identifier,” links to a full explanation of that
identifier. Another detail coded into each record is the amount of
processing or editing done to the data and code books by the ICPSR
staff. Data sets may be reorganized and the code books fully revised,
or no editing may take place at all. Finally, a list of “related
publications” cites important publications based on the data set. This
standard format has been modified
slightly through the years, but older entries have generally not been
changed to reflect new standards or editing styles.

ICPSR has separate Web pages for “Data on Aging” and “Criminal
Justice Data.” These pages are arranged like the main “ICPSR Data
Archive” page and have the same keyword searching features.

Subject Access to Data Archive:

On the “ICPSR Data Archive” page, data set citations are organized
into eighteen broad subject categories, such as “Census Enumerations,”
“Health Care and Health Facilities,” and “Geography and Environment.”
Major subject headings may have several subheadings. A common
subheading identifies data sets from “Nations Other than the United
States.” Data set citations appear under only one heading.

A simple keyword search engine is found on the “ICPSR Data Archive”
page. The search engine allows a search via pull-down menu bar in
title, investigator, or abstract fields, but not in combination. A
second pull-down menu allows the selection of a “word” or “string”
search. A “Help” link produces a screen with five short instructions:
searches are not case sensitive; the boolean operator “and” is assumed
between words; quoted queries are
treated as single terms; word searching, which requires an exact match,
works only for abstract searching; and string searching is a truncation
device. For example, the string search “vot” gets vote, voting, votes,
and other variations. The search engine does not support the boolean
operator “or” or nesting of terms with parentheses. This presents minor
problems in searching because there is no thesaurus or controlled
vocabulary. One cannot use “or” for near-synonymous words like voting
and election. The string search “minorit vot” produced eighteen hits;
the search “minorit election” produced fifteen hits.
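
The search rules listed on the Help screen can be sketched in a few lines of Python. This is only an illustration of the described behavior, not ICPSR's actual engine: the function names, the sample abstracts, and the choice to model a “string” search as a match on the beginning of a word are assumptions. The last lines show the workaround for the missing “or”: run two searches and merge the results by hand.

```python
import re


def parse_terms(query):
    """Split a query into terms; a double-quoted phrase is a single term."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', query)
    return [(quoted or bare).lower() for quoted, bare in pairs]


def matches(text, query, mode="string"):
    """True when every term matches (implicit "and", case-insensitive).

    mode="word"   requires an exact whole-word match.
    mode="string" treats each term as truncated, so "vot" matches "voting"
                  (assumed here to mean a word-prefix match).
    """
    text = text.lower()
    for term in parse_terms(query):
        pattern = r"\b" + re.escape(term)
        if mode == "word":
            pattern += r"\b"
        if not re.search(pattern, text):
            return False
    return True


def search(records, query, mode="string"):
    """Return the set of study numbers whose abstract matches the query."""
    return {num for num, abstract in records.items()
            if matches(abstract, query, mode)}


# Hypothetical abstracts keyed by made-up study numbers.
records = {
    1: "Minority voting patterns in urban precincts",
    2: "Minority representation in state elections",
    3: "Votes cast by county, 1970-1990",
}

# No "or" operator: run both searches and take the union manually.
combined = search(records, "minorit vot") | search(records, "minorit election")
print(sorted(combined))
```

Because each query is an implicit “and,” the two searches return different studies, and only the manual union recovers everything a single “vot or election” query would have found.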

Time Lag:

Updates of old titles and announcements of new titles are accessible
through a link on the “ICPSR Data Archive” page. As of the date for
this review (September 23, 1997), “Recent Additions” listed 22 titles
added between July 29 and September 19, 1997. The data sets were
compiled between 1970 and 1996, with 1994 being the median. Researchers
and organizations gathering data want to make use of their information
before passing it on for general use, thus delaying release dates.
ICPSR then performs various checks on the data and code books,
reformatting many of them for general use, which also delays the
release of data. As the editing
process can be lengthy, ICPSR has established a FastTrack service,
linked under “Electronic Services” on the “Contents” page, in which a
select number of new studies are made available only through anonymous
FTP before ICPSR has edited them. These studies are not cross-listed in
the “ICPSR Data Archive.” As of September 23, 1997, twelve studies were
available through the FastTrack service.

Editing:

ICPSR has kept its Web page clean and simple. Bold type,
underlining, and type size are used effectively to highlight and
organize each page. The utilization of framing to present lists of data
set titles, downloadable data sets, and code book options is simple and
effective. As much of the text was part of the paper Guide, years of
editing have minimized errors. The layering of screens follows a
reasonable hierarchy from general to specific. Headers with links allow
the easy return to various levels of the Web site.

Document Availability:

Most data sets are available only to paying members of the
consortium, but a growing number are available through anonymous FTP.
The Criminal Justice Data, the Publication Related Archive, the
FastTrack, and a small part of the Aging
Data are available to nonmembers. However, code books are not always
available electronically, and a data set without a code book is
useless. Conversely, a growing number of electronic code books are
available for downloading or for searching over the Web even when the data
themselves are not freely available. The options available to members and
nonmembers in terms of data and code book retrieval are always clearly
presented.

Cost:

All data sets in the archive are available to members upon the
payment of annual dues which are based on the institution’s enrollment
size and the highest relevant degree awarded. Institutional annual dues
for 1997-1998 ranged from $2,000 to $10,350 depending on the membership
category. Nonmembers may buy data sets, but they are relatively
expensive.

Comparison with Other Web Sites:

The ability to keyword search a
large data archive and then immediately retrieve the code book and the
related data set is a major strength of the ICPSR. Several other Web
pages provide similar or complementary features. The University of
California at San Diego’s Data on the Net (odwin.ucsd.edu/idata/)
is a well-organized index to Web sites worldwide that provide access
to catalogues and/or data sets.
Columbia University’s Electronic Data Service (www.columbia.edu/acis/eds/)
actually provides a detailed, menu-driven search engine for its data
library which includes a search of the ICPSR Archive. The national data
archives of several European countries have placed their catalogues on
the Web, and the Council of European Social Science Data Archives
(CESSDA) provides a Web site (www.nsd.uib.no/cessda/home.html) that searches ten of these catalogues using a detailed menu-driven search engine.

Positive Aspects:

The ICPSR Web page truly represents the potential of the Web to
facilitate quantitative social science research. Structurally the site
is simple with an intuitive hierarchical layering of screens from
general subject categories to subsets of an individual data set. The
ability, without an intermediary, to order any
one of over four thousand data sets for immediate use is indeed a major
advance. The accessibility of code books for retrieval through
downloading or interactive Web searching increases the researcher’s
ability to easily and quickly determine the utility of any data set.

Recommendations for Improvement:

The ICPSR needs to reorganize its Web “Table of Contents” to better
align sections and subsections. The section, “Other Resources,” is
really a miscellaneous category; its subsections, “Web Services
Policy,” “Computer Assistance,” and “Electronic Services,” are policy
statements not “Other Resources.” The American National Election
Study’s “ANES Subsets and Stats” site should be put under the
“Archive,” and “FastTrack” should also be a featured item under
“Archive.” In other words, pages that access data should all be under
one section, and policy statements should be under another. The keyword
search engine is relatively simple. With only 4,200 data sets this is
currently not a serious issue, but the ability to “or” and nest using
parentheses will quickly become desirable search tools as the archive
grows.
