SRC Data Library
1155 E. 60th Street, Room 041, Chicago, IL 60637
Voice: 773.702.8256 ---- Fax: 773.702.2101
E-mail: datahelp@src.uchicago.edu

NEWS: USING GZIP'D (COMPRESSED) DATA
Updated: 12-OCT-98


As many of you know by now, most data files on our /afs and /DL data servers have been compressed using "gzip" to save disk space. In the past we have advised users to read these data files into SAS by uncompressing the data into a "named pipe" created at the Unix prompt.
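For reference, the older named-pipe procedure can be sketched at the Unix prompt roughly as follows. This is only an illustration under stated assumptions: the sample file, the pipe name "datapipe", and the temporary directory are hypothetical, "wc" stands in for the SAS job that would normally read from the pipe, and "gzip -dc" is used in place of "gzcat" (the two are equivalent) because "gzcat" is not installed under that name on every system.

```shell
# Hypothetical sample data standing in for a real compressed data file.
workdir=$(mktemp -d)
printf 'record one\nrecord two\n' | gzip -c > "$workdir/sample.gz"

# Step 1: create the named pipe.
mkfifo "$workdir/datapipe"

# Step 2: uncompress into the pipe in the background.
# The writer blocks until some program opens the pipe for reading.
gzip -dc "$workdir/sample.gz" > "$workdir/datapipe" &

# Step 3: the reader consumes the data from the pipe.
# (In the original procedure the reader was SAS; wc is a stand-in here.)
records=$(wc -l < "$workdir/datapipe" | tr -d ' ')
wait
echo "records read: $records"

# Clean up the pipe and the sample file.
rm -rf "$workdir"
```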

We have found a simpler way to do this from within the SAS command file using the following FILENAME syntax:

  FILENAME pipedata PIPE 'gzcat /DL/soc/crime/s6068/da6068.gz' ;

  DATA ;
  INFILE pipedata  LRECL=113  MISSOVER ;
  /* The INPUT statement for your variables goes here */
  RUN ;

This eliminates the need to create the pipe and run gzcat on the data file at the Unix prompt. With this FILENAME syntax, both steps are handled internally, within the SAS command file.

Note: In this example, "pipedata" is an arbitrarily chosen nickname used within the SAS job to refer to the data. Replace the "/DL" path and filename in the example above with those of the data file you want to use in your SAS run. As always, change the LRECL specification on the INFILE statement to match the true LRECL (logical record length) of your data file. MISSOVER is an option that tells SAS not to move on to the next record when an INPUT statement reaches the end of the current record before finding values for all of its variables; the remaining variables are set to missing instead.
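If you are unsure of the LRECL, one way to check it from the Unix prompt is to measure the longest record in the compressed file without uncompressing it to disk. The sketch below assumes the records are newline-delimited; the sample file and its contents are hypothetical, and "gzip -dc" again stands in for "gzcat". The codebook for your data file remains the authoritative source for the record length.

```shell
# Hypothetical sample file with three 5-character records.
workdir=$(mktemp -d)
printf 'AAAAA\nBBBBB\nCCCCC\n' | gzip -c > "$workdir/sample.gz"

# Stream the uncompressed records through awk and keep the longest
# record length seen (the newline itself is not counted).
maxlen=$(gzip -dc "$workdir/sample.gz" |
  awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
echo "longest record: $maxlen characters"

rm -rf "$workdir"
```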

HOW THIS WORKS

"PIPE" tells SAS that your FILENAME refers to input that will be piped to SAS from the Unix operating system by the Unix command that follows in quotes. As SAS needs data, it reads it from the pipe, which is fed by the "gzcat" command.

While the "gunzip" command uncompresses the data and stores the fully uncompressed file on disk (sometimes using as much as 10 times the disk space of the compressed file), the "gzcat" command reads the compressed data file in small pieces, sends the uncompressed characters through the pipe, and discards them once they have been used. The uncompressed data is never stored on disk as a complete file.
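The difference can be seen with a small experiment at the Unix prompt. This is a sketch under stated assumptions: the sample data is hypothetical and deliberately repetitive so that it compresses well, "wc" stands in for SAS as the program reading the pipe, and "gzip -dc" is used as the equivalent of "gzcat".

```shell
workdir=$(mktemp -d)

# Build a hypothetical data file of 10,000 identical 14-byte records.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "sample record" }' > "$workdir/data"

# Compress it; gzip replaces "data" with "data.gz".
gzip "$workdir/data"
compressed_bytes=$(wc -c < "$workdir/data.gz" | tr -d ' ')

# Stream the uncompressed characters through a pipe to a reader.
# Nothing uncompressed is written to disk along the way.
streamed_bytes=$(gzip -dc "$workdir/data.gz" | wc -c | tr -d ' ')

# Confirm that no uncompressed copy was left behind.
if [ -e "$workdir/data" ]; then on_disk=yes; else on_disk=no; fi

echo "compressed file on disk: $compressed_bytes bytes"
echo "characters sent through the pipe: $streamed_bytes bytes"
echo "uncompressed copy on disk: $on_disk"

rm -rf "$workdir"
```

Running "gunzip" on the same file would instead leave the full uncompressed file on disk in place of the compressed one.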