Seamless Searching of Numeric and Textual Resources

Michael Buckland, Fredric Gey, Ray Larson
University of California, Berkeley

The dream for the use of new technology in libraries is to support seamless searching across an increasing range of resources on a growing digital landscape. The reality is that network-accessible digital resources, like the contents of a well-stocked reference library, are quite heterogeneous, especially in the variety of indexing, classification, categorization, and other forms of "metadata." The contribution of this project, funded by a 1999 National Leadership Grant from the Institute of Museum and Library Services, is to demonstrate improved access to written material and numerical data on the same topic when searching two quite different kinds of database: text databases (books, articles, and their bibliographic records) and numerical data (socio-economic databases).

Example: From numbers to text

The Government Information Site at Oregon State University (http://govinfo.kerr.orst.edu/import/import.html) has U.S. annual import totals extracted from CD-ROM. A search on that source can yield some surprising numbers. For example, a searcher might notice a sudden increase in imports of shrimp and prawn from Vietnam into the USA through Los Angeles, and become curious about the political background and economic consequences. The numbers show:

- - - U.S. Imports of Merchandise - - -
General Imports: Imports for Consumption
SHRIMP/PRAWN SHELL-ON COUNT SIZE 33-45 PER KG FRZN (HS: 0306130006) (SIC: 0913)
Unit of Quantity -- Kilograms
FROM: Vietnam   THRU: LOS ANG

Year    Quantity    Customs Value
1993           0                0
1994      48,782          676,930
1995     247,707        3,520,806
1996     562,427        7,864,052

Taking the keywords "Import" and "Vietnam" over to an online bibliographic database of newspaper articles retrieves, among others,

Iritani, Evelyn. "Normalizing ties to Vietnam important steps for U.S.
firms; California stands to profit handsomely when barriers fall to trade with fast-growing country." Los Angeles Times v114 (July 12, 1995):D1.

"Hanoi's trade deficit." New York Times v143 (July 15, 1994):D15(L). (Vietnam imports increasing faster than exports).

The problem with doing this type of search is that there has, until now, been no easy path to integrate numeric databases with bibliographic and textual databases which might contain knowledge about cause and effect. The vocabulary which classifies the numeric data may be quite different from the subject headings used for books, magazine articles, and newspaper stories about the same topic of interest. Also, there needs to be an environment of search support that facilitates such transverse searching, establishing connections, transferring data and invoking appropriate utilities in a helpful way. Assistance in selecting the best search terms in the target database is supported by the use of "Entry Vocabulary Modules," which resemble Melville Dewey's "Relativ Index," but are created using statistical association techniques developed at Berkeley.

Two significant problems are encountered when seeking to extend a search from a library's textual resources to numeric databases or vice versa:

1. The vocabulary which classifies the numeric data may be quite different from the subject headings used for books, magazine articles, and newspaper stories about the same topic of interest. For example, searching for Federal imports data for "automobiles" returns no results, even though billions of dollars of U.S. foreign exchange go into auto imports. Searching under "car" yields data on railroad and tramway rolling stock.
To get automobiles, the searcher needs to know to search under "P" for "Passenger Motor Vehicle."

2. There needs to be an environment of search support that facilitates such transverse searching, establishing connections, transferring data and invoking appropriate utilities in a helpful way.

This project addresses both problems. It is research and demonstration designed to improve access to library and information resources. It extends the most recent research in library science to demonstrate and test a potential solution to a neglected real-world problem: how to support searches of both numeric and textual databases for information on the same topic. Through demonstration, evaluation, description, and a web-accessible prototype, any librarian with access to the Internet will be able to try out this solution. Through the free distribution of software, all library system developers will be encouraged to adopt and adapt what is developed. The research and the prototype made available to librarians and library system developers to test-drive will necessarily focus on a few carefully chosen resources, but it will be clear that the same techniques are generally applicable to other online resources.
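
The vocabulary mismatch described in the first problem (a searcher types "automobiles" or "car", while the data is filed under "Passenger Motor Vehicle") can be illustrated with a small sketch of an entry-vocabulary lookup. This is not the project's software: the training pairs are invented, the headings other than "Passenger Motor Vehicle" are hypothetical, and the Dice coefficient is used only as a simple stand-in, since the text does not say which statistical association measure was used.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (free-text description, classification heading).
# Invented for illustration; only "Passenger Motor Vehicle" comes from the text.
TRAINING = [
    ("new automobiles imported from Japan", "Passenger Motor Vehicle"),
    ("automobiles and car sales rise", "Passenger Motor Vehicle"),
    ("rail car and tramway orders", "Railway Rolling Stock"),
    ("frozen shrimp imports from Vietnam", "Shrimp and Prawn"),
]


def build_entry_vocabulary(pairs):
    """Count how often each word co-occurs with each heading."""
    word_count = Counter()
    heading_count = Counter()
    joint = defaultdict(Counter)
    for text, heading in pairs:
        heading_count[heading] += 1
        # Count each word once per record.
        for word in set(text.lower().split()):
            word_count[word] += 1
            joint[word][heading] += 1
    return word_count, heading_count, joint


def suggest_headings(term, model, top_n=3):
    """Rank headings for a term by Dice coefficient (an assumed measure)."""
    word_count, heading_count, joint = model
    term = term.lower()
    scores = []
    for heading, both in joint.get(term, {}).items():
        dice = 2 * both / (word_count[term] + heading_count[heading])
        scores.append((heading, dice))
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


model = build_entry_vocabulary(TRAINING)
print(suggest_headings("automobiles", model))
print(suggest_headings("car", model))
```

With this toy data, "automobiles" leads only to "Passenger Motor Vehicle", while "car" ranks the rolling-stock heading first and the motor-vehicle heading second, which mirrors the ambiguity described above. A real module would be trained on a large set of records that carry both free text and assigned headings.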