NEWS: USING GZIP'D (COMPRESSED) DATA
Updated: 12-OCT-98
We have found a simpler way to do this from within the SAS command file using the following FILENAME syntax:
FILENAME pipedata PIPE 'gzcat /DL/soc/crime/s6068/da6068.gz' ;
DATA ;
INFILE pipedata LRECL=113 MISSOVER ;
This eliminates the need to create the pipe and gzcat the data file at the Unix prompt. With this FILENAME syntax, those steps are handled internally when the SAS command file runs.
Note: In this example, "pipedata" is an arbitrarily chosen nickname used within the SAS job to refer to the data. Replace the "/DL" path and filename in the above example with the specifications for the data file to be used in your SAS run. As always, change the LRECL specification on the INFILE statement to the true LRECL of your data file. MISSOVER is an option that tells SAS not to move on to the next record when an INPUT statement reaches the end of the current record before it has read all of its variables; any variables left unread are set to missing.
"PIPE" tells SAS that your FILENAME refers to input that will be piped to SAS from the Unix operating system by the Unix command that follows in quotes. As SAS needs data, it reads it through the pipe, which is fed by the "gzcat" command.
While the "gunzip" command uncompresses the data and stores the fully uncompressed file on disk (sometimes using as much as 10 times the disk space of the compressed file), the "gzcat" command reads characters from the compressed data file in small packets, sends them through the pipe, and discards them once they have been used. The uncompressed characters are never stored as a fully uncompressed file.
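The streaming behavior described above can be tried at the Unix prompt without SAS. The following is only a sketch under stated assumptions: the scratch directory and the file name "sample.dat" are hypothetical, "wc" stands in for SAS as the program reading from the pipe, and "gzip -dc" is used in place of "gzcat" because it does the same job (uncompress to standard output) and is available wherever gzip is installed.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small, repetitive sample data file, then compress it.
i=0
while [ "$i" -lt 2000 ]; do
    echo "0000000000 1111111111 2222222222 3333333333"
    i=$((i + 1))
done > sample.dat
gzip sample.dat          # replaces sample.dat with sample.dat.gz

# Size of the compressed file on disk (tr strips padding some wc versions add).
compressed_bytes=$(wc -c < sample.dat.gz | tr -d ' ')

# Stream the uncompressed characters through a pipe to a reader.
# Nothing uncompressed is written to disk; the reader consumes the
# characters as they arrive and they are then discarded.
streamed_lines=$(gzip -dc sample.dat.gz | wc -l | tr -d ' ')
streamed_bytes=$(gzip -dc sample.dat.gz | wc -c | tr -d ' ')

# Only the compressed file should exist after streaming.
files_after_stream=$(ls | wc -l | tr -d ' ')

echo "compressed size on disk: $compressed_bytes bytes"
echo "bytes sent through pipe: $streamed_bytes"
echo "lines read by reader:    $streamed_lines"
echo "files in directory:      $files_after_stream"
```

Running "gunzip sample.dat.gz" instead would leave the full uncompressed file on disk, which is the cost the PIPE approach avoids.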