Quantitative social science data are tools for research. The analysis of such data appears in professional journals, in scholarly books, and, increasingly, in popular media. For the scholar, the connection between text and data is natural: we analyze data and publish results; we read the results of others' analyses, learn from them, and move forward with our own research. But these connections are sometimes difficult to make. The data behind an article are often hard to find and even harder to analyze, which diminishes our ability to replicate the work of others and to build on it. We sometimes chase down the author of an article to find the data, and often the data are not there to be found. A similar problem exists for scholars who move from data to the published work based on those data. It may not be easy to trace the publications that emerge from a data set so that we can build on, rather than duplicate, what has come before. Scholarship would be greatly enhanced if one could move easily from data to text and from text to data.
The Virtual Data Center (VDC) Project is an operational, open-source digital library that enables the sharing of quantitative research data and the development of distributed virtual collections of data and documentation. The VDC is being developed cooperatively by the Harvard University Library and the Harvard-MIT Data Center (HMDC), and is supported by a research grant primarily from the National Science Foundation. In this paper, we discuss the prototype of the VDC and plans for its first releases. These releases extend the current system operating at HMDC in a number of ways. We show how the public release generalizes the software infrastructure and interfaces of the prototypes to enable the linking together of multiple distributed collections of social science data. We outline the major features of the alpha software and the results from analyzing use of our system.