This document is currently under construction.
The most frequently used UNIX commands to manipulate tape are mt, dd, tar/gnutar, and cpio/gnucpio. mt is used to advance or rewind the tape so that a particular file can be read from or written to tape; the others are used to transfer data from disk to tape or tape to disk.
The key features of dd are:
The key features of tar/gnutar are:
The key features of cpio/gnucpio are:
The strategy you will use is clearly one of preference. Most users find tar/gnutar the easiest to use, and it is by far the fastest in terms of time taken to create and restore archives (approximately 8 minutes to write/read 100 MB of data). For long-term storage, cpio/gnucpio is probably the best strategy (it's used by the data library and by SRC system administrators), although restores are slower (8 minutes to write 100 MB of data, 30 minutes to restore 100 MB).
This handout will only make reference to the DAT drive attached to oedipus. Refer to the table of device names in the companion article, "Tape Media at SRC", if you intend to use other devices.
Formally, the difference between an "archive" and a "backup" is based upon the intended use of the files. "Archives" are files--data, output, command syntax, etc.-- that you are no longer using but that you want to preserve for possible use in the future. "Backups" are typically copies of files that are not actively being used and probably are also too large to be stored on hard disk for extended periods of time. For purposes of the discussion on tape usage, the two are not different.
However, the difference is important for preparing data to be written to tape. If you do not plan to use files written to tape for more than a year, you should consider whether your files will be readable by your favorite application(s) in the future. While SPSS and SAS have built-in features that make them backward compatible (meaning that newer versions of the software can usually read files created by older versions), you should NOT assume that this will always be true. As a standard rule, you should create either PORTABLE/TRANSPORT files or ASCII versions of your important datasets BEFORE writing them to tape. Examples for creating a SAS transport file and an SPSS portable file follow.
Example SAS commands to create a SAS TRANSPORT File.
options replace;
libname mydata '/dir1';
libname trandata xport '/dir2/filename';
proc copy in=mydata out=trandata;
select sasfile1;
run;
NOTE that the libname "trandata" refers to a file rather than a directory. The select statement is used to specify the names of SAS datasets you want included in the transport file. Excluding the select statement causes all SAS datasets to be copied into the transport file.
Example SPSS commands to create an SPSS Portable File.
GET FILE = '/dirname/filename.sys'.
EXPORT OUTFILE = '/dirname/filename.por'.
Where '/dirname/filename.sys' is the name of an SPSS system file and '/dirname/filename.por' is the name of the SPSS portable file.
In tape handling as in real estate, the critical factors are location, location, location. Unlike a floppy diskette, a tape does not let you access a file directly; instead, you must position the tape heads at the appropriate location to read the data. If you know you only have one file or archive on a tape, it's not a problem; however, if you have received a "standard label tape" or have multiple files/archives on the tape, you will need to use the mt command. A "standard label" tape is one that contains special files on the tape indicating what data was written to the tape and how it was written. The format is most commonly seen with 9-track tapes written on a mainframe or VAX.
In short, you can do two things with "mt"--move forward and rewind to the beginning.
Assume that the file you need to extract from tape is the second file on the tape. To position the tape head correctly to read the data, issue the following command:
% mt -t /dev/nrst1 fsf 1
% rsh oedipus mt -t /dev/nrst1 fsf 1
% remsh oedipus mt -t /dev/nrst1 fsf 1
where "/dev/nrst1" is the no-rewind tape device name for the DAT device attached to oedipus, "fsf" means forward space file and the "1" is the number of files to forward space through. Note the "no rewind" device name. Without it, the tape will rewind to the beginning after it reaches the second file on the tape.
To rewind a tape to the beginning, issue the following command noting that the "n" or no rewind option is excluded:
% mt -t /dev/rst1 rew
dd is the recommended strategy for transferring data to/from other institutions and for the highly risk-averse user who lives in perpetual fear of tape failure. If you are archiving SAS or SPSS datasets, you must prepare them as transport files before writing them to tape (SAS proc copy, SPSS export command with the tape option).
Before getting started, you need to determine several things. First, what type of system will be used to read/write the tape. If it's a mainframe, you should read/write the tape with EBCDIC conversion. Second, you need to determine the blocksize of the dataset to be written. This is typically a multiple of the record length (=single line length) of your file. For example, SPSS portable files have a record length of 80 and are typically written with a blocksize of 5200, 65 records per block. Third, you need to keep a written copy of this information.
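The blocksize arithmetic above can be checked in the shell before a tape is written. The following is a minimal sketch for a POSIX sh (not the csh-style prompt shown elsewhere in this handout); the variable and function names are illustrative only.

```shell
#!/bin/sh
# Record length and records per block for an SPSS portable file,
# using the figures quoted in the text above.
reclen=80
records_per_block=65

# The blocksize is the record length times the records per block.
blocksize=$((reclen * records_per_block))
echo "blocksize=$blocksize"

# A candidate blocksize is usable only if it is a whole multiple
# of the record length.
is_multiple() {
    [ $(( $1 % $2 )) -eq 0 ]
}

if is_multiple "$blocksize" "$reclen"; then
    echo "blocksize is a multiple of the record length"
fi
```

The same check can be applied to any blocksize noted in the written documentation that accompanies a tape.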
The general syntax for writing a file to tape for use on a mainframe is:
% cat filename | dd of=/dev/rst1 cbs=xxxx obs=yyyy conv=ebcdic
% cat filename | rsh oedipus dd of=/dev/rst1 cbs=xxxx obs=yyyy conv=ebcdic
where "cat" is the UNIX cat command; filename is the name of the file to be written to tape; "of" is the output file, here the tape drive device name; "cbs=xxxx" is the conversion blocksize, where xxxx is the length of a single line of your file; "obs=yyyy" is the output block size, where yyyy is a multiple of the cbs and must be less than 32760 when using the 9-track tape drives [default obs=512]; and "conv=ebcdic" specifies conversion to EBCDIC. The "rsh machine" prefix in the second example means "remote shell", where machine is the UNIX workstation where the tape device is located (use remsh if you're working on an HP workstation: paideia, psyche, oedipus, polis, mazel, or zapata).
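The conversion options can be tried on ordinary disk files before anything is committed to tape. The sketch below is for a POSIX sh and assumes a dd that supports conv=ebcdic and conv=ascii (GNU dd does). The file names sample.txt, sample.ebc, and sample.back are made up for illustration, and the record length of 80 with 10 records per block is an arbitrary choice.

```shell
#!/bin/sh
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small ASCII file with short lines and no trailing spaces.
printf 'first record\nsecond record\n' > sample.txt

# ASCII -> EBCDIC: cbs is the record length, obs a multiple of it.
dd if=sample.txt of=sample.ebc cbs=80 obs=800 conv=ebcdic 2>/dev/null

# EBCDIC -> ASCII, reading with the same record length.
dd if=sample.ebc of=sample.back ibs=800 cbs=80 conv=ascii 2>/dev/null

# The round trip should reproduce the original text.
if cmp -s sample.txt sample.back; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```

Replacing of=sample.ebc with the tape device name gives the form of the commands shown above.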
If you simply want to write a file in ASCII you can issue the following:
% dd if=filename of=/dev/rst1
% cat filename | rsh oedipus dd of=/dev/rst1 obs=xxxxx
In the first example, the obs option takes the system default of 512-byte blocks. In the second example, the obs should be a multiple of the record length of the file written to tape; since no conversion is specified, you don't need the cbs parameter.
Since dd only copies file contents and because record lengths and blocksizes vary from file to file, most tapes created using dd are written in a standard label format. The standard label format is characterized by one and usually two additional identifier files per data file: the header file preceding the data file of interest, and the trailer file following the data file of interest. Both files contain the original name of the file written to tape, its record length and blocksize. All header and trailer files are written with a blocksize (ibs) and record length (cbs) of 80 characters. To read a header file from a 9-track tape written on the mainframe, issue the following command:
% rsh oedipus dd if=/dev/nrst1 ibs=80 cbs=80 conv=ascii | cat > head1
% dd if=/dev/rst1 of=head1 ibs=80 cbs=80 conv=ascii
where "if" means input file, "/dev...." is the tape device, and "conv=ascii" specifies a conversion from EBCDIC to ASCII.
An example header file is displayed below:
The name of the original file written to tape is on the line that begins with HDR1. The Block size (ibs) and record length (cbs) used to write the file to tape appear at columns 6-10 and 11-15 of the line that starts with HDR2. In this case, the original file name was age1.data, the block size (ibs) is 3000 and the cbs is 200.
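The two fields can be pulled out of a saved header file with cut. The sketch below is for a POSIX sh. Because the example header is not reproduced here, the HDR2 line is a made-up sample constructed only from the values quoted above (block size 3000, record length 200), and the character in column 5 is a placeholder.

```shell
#!/bin/sh
# Hypothetical HDR2 line: columns 6-10 hold the block size and
# columns 11-15 the record length; the "F" in column 5 is a placeholder.
hdr2='HDR2F0300000200'

# Extract the fixed-width fields by column position.
ibs_field=$(printf '%s\n' "$hdr2" | cut -c6-10)
cbs_field=$(printf '%s\n' "$hdr2" | cut -c11-15)

# Strip the leading zeros so the values can be passed to dd.
ibs=$(expr "$ibs_field" + 0)
cbs=$(expr "$cbs_field" + 0)

echo "ibs=$ibs cbs=$cbs"
```

With a real header saved as head1, the hdr2 value would come from the line that begins with HDR2, for example via grep '^HDR2' head1.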
The standard label (SL) format has implications for reading data from tape. Most importantly, you will need to use mt to position the tape correctly for reading the data file you are interested in. The format of a SL tape is:
[HDR1] FILE1 [EOF1] [HDR2] FILE2 [EOF2].....
In order to read the second "real" file--FILE2--on tape, you will need to issue an mt command of the following form:
% mt -t /dev/nrst1 fsf 4
In general, the number of files to space through is 3*(file number) -2. For example, if you want to read the 4th data file on a standard label tape, you would fsf 3*4 -2 or 10 files.
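The rule above is easy to wrap in a small helper so that the count does not have to be worked out by hand. This is a sketch for a POSIX sh; the function names sl_skip_count and sl_mt_command are hypothetical, and the second function only prints the mt command rather than running it.

```shell
#!/bin/sh
# Number of tape files to skip before data file N on a standard label
# tape, using the rule from the text: 3 * N - 2.
sl_skip_count() {
    echo $(( 3 * $1 - 2 ))
}

# Print the mt command for a given data file without running it.
# The device name is the no-rewind DAT device used in this handout.
sl_mt_command() {
    echo "mt -t /dev/nrst1 fsf $(sl_skip_count "$1")"
}

sl_mt_command 2
sl_mt_command 4
```

The two calls correspond to the second and fourth data files discussed above.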
Assume that you have received a standard label, 9-track tape written in EBCDIC (the language of the mainframe/MVS computer).
To read a file from tape, issue the command:
% rsh oedipus dd if=/dev/nrst1 ibs=yyyy cbs=zzzz conv=ascii | gzip -c > /dir/filename.gz
where the ibs is the blocksize and cbs is the record length obtained from the header file or from written documentation accompanying the tape; "gzip -c > /dir/filename.gz" causes the file you are reading from tape to be compressed and directed to /dir/filename.gz. We encourage users to use the compress option when the file being read from tape is larger than 15 MB (filesize can roughly be estimated by multiplying the record length by the total number of records in the file).
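The size estimate described above can be done with shell arithmetic. In this sketch for a POSIX sh, the record length and record count are hypothetical figures, and 1 MB is taken to be 1048576 bytes.

```shell
#!/bin/sh
# Hypothetical figures: a 200-byte record length and 100000 records.
reclen=200
nrecords=100000

# Rough file size in bytes: record length times number of records.
bytes=$((reclen * nrecords))

# Compare against the 15 MB guideline (taking 1 MB as 1048576 bytes).
limit=$((15 * 1048576))
if [ "$bytes" -gt "$limit" ]; then
    advice="compress"
else
    advice="no need to compress"
fi
echo "$bytes bytes: $advice"
```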
Tapes written on a VAX or on other UNIX systems in ASCII can be read as follows:
% rsh oedipus dd if=/dev/nrst1 ibs=yyyy | fold -w xxxx | gzip -c > /dir/filename.gz
Since the cbs option only works when converting ASCII to EBCDIC or vice versa, you can obtain the appropriate records by folding the file into lines of length xxxx, where xxxx is the record length.
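The effect of folding can be seen on a short string before it is applied to a tape read. This sketch for a POSIX sh uses the POSIX -w option of fold; the 4-character record length and the sample data are made up for illustration.

```shell
#!/bin/sh
# Three hypothetical 4-character records run together with no newlines,
# as they would arrive from an unconverted dd read.
stream='AAAABBBBCCCC'

# fold -w 4 starts a new line after every 4 characters.
records=$(printf '%s' "$stream" | fold -w 4)
printf '%s\n' "$records"
```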
tar and gnutar are recommended for those creating a personal backup of important files for use on UNIX machines. tar and gnutar have the advantage of allowing the user to archive multiple files at once-- including full directory structures--as well as retaining information about filenames, dates of creation, etc. NOTE: If your tape deteriorates or exhibits a read error, there is no error correction mechanism. You will lose the entire archive!!
Gnutar is the more robust of the two methods and is preferred to tar when archiving data across machines, e.g. creating a backup of your files on johnjohn to the DAT drive attached to oedipus. Note that tar can read files written by gnutar and vice versa.
The general syntax for creating an archive is:
% gnutar -cvf /dev/rst1 . > tapelog &
% gnutar -cvf oedipus:/dev/rst1 . > tapelog &
% gnutar -cvf oedipus:/dev/rst1 directory-name > tapelog &
where "c" means create, "v" stands for verbose and will list each file as it is being archived to tape, and "f" denotes file, here the tape drive device that you are writing to. The "." means your current working directory and all subdirectories, if there are any. If you would like to archive a particular subdirectory or a file, replace the . with the directory name or filename. The " > tapelog" redirects the verbose listing of file names from your terminal screen to a file "tapelog".
With tar and gnutar, you can obtain a "table of contents" for the file archive by issuing the command:
% gnutar -tvf /dev/rst1
% gnutar -tvf oedipus:/dev/rst1
where "t" stands for table, "v" stands for verbose. By default, the listing of files will scroll on your terminal. If you'd like to redirect the output to a file, add " > filename" to the end of the command.
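The create and table-of-contents steps can be rehearsed against an ordinary file instead of a tape device. This sketch for a POSIX sh assumes a tar command is on the path (gnutar is the local name used in this handout); the names mydir, demo.tar, and the file contents are made up.

```shell
#!/bin/sh
# Build a small directory tree in a scratch area.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir mydir
echo "some data" > mydir/file1
echo "more data" > mydir/file2

# "c" creates the archive; "f" names an ordinary file instead of a tape.
tar -cf demo.tar mydir

# "t" lists the table of contents; redirect it to a log file.
tar -tf demo.tar > tapelog
cat tapelog
```

The names recorded in tapelog are the ones to use later when extracting individual files.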
If you have multiple tar archives on your tape, you should use the "no rewind" device name. You will also need to issue an mt command with fsf 1 to space past the end of file marker, e.g.
% mt -t /dev/nrst1 fsf 1
Otherwise, you will see the error: tar:blocksize=0
By default, tar/gnutar will restore files to the same directory that they were originally archived from, creating the directory as needed. For example, if I had created a (gnu)tar archive that included the directory, mydir, the restore process will copy files to a directory mydir, creating it if it doesn't already exist.
To restore the entire contents of a tape archive:
% gnutar -xvf /dev/rst1
% gnutar -xvf oedipus:/dev/rst1
Where "x" means extract, "v" means verbose and will list the name of each file as it is read from tape, and "f" refers to the device name as above.
You can also extract specific files or directories from a tape archive. Note that the UNIX wildcards (* and ?) will NOT work with either tar or gnutar. You must specify the file or directory names as they appear in the "table of contents".
% gnutar -xvf /dev/rst1 dirname
% gnutar -xvf /dev/rst1 dirname/filename
% gnutar -xvf /dev/rst1 filename1 filename2 filename3
While tar does not allow you any flexibility in file location, you can extract files to a new location with gnutar. For example, suppose you have an archived file /dir/file1 listed in your gnutar table and you'd like to copy it to the /tmp directory. The following command works!
% gnutar -xvOf /dev/rst1 /dir/file1 > /tmp/file1
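The same redirection can be tried on an ordinary archive file. This sketch for a POSIX sh assumes a tar that accepts the O (extract to standard output) option, as GNU tar does; the names dir, newplace, and demo.tar are made up, and the member is stored under the relative name dir/file1.

```shell
#!/bin/sh
# Build a one-file archive in a scratch area.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir dir newplace
echo "archived contents" > dir/file1
tar -cf demo.tar dir

# "x" extracts and "O" writes the member to standard output instead of
# to dir/file1, so the shell redirection decides where the copy lands.
tar -xOf demo.tar dir/file1 > newplace/file1
cat newplace/file1
```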
Cpio is perhaps the best strategy for backing up data to tape--it preserves directory and file information as well as supporting error recovery in the event of tape failure. On the down side, these features also make it slow and incredibly cumbersome for backing up data across the network. Once SRC has made all tape devices accessible from all machines, we will provide a detailed discourse on cpio; however, here are the basics for those who are anxious to get started.
cpio requires you to supply a list of the files, with their relative directory locations, that you want to archive. This can be done in two ways, both of which involve use of the UNIX find command.
If you want to write all of the files in your user directory and subdirectories, issue the command:
% find . -print | gnucpio -ov -O /dev/rst1
% find dirname -print | gnucpio -ov -O /dev/rst1
where "find . -print |" produces a complete list of files and directories and "find dirname -print |" produces a list of files in the specified directory and pipes the result to cpio. The "o" in "ov" means output and "v" means verbose. [This option is very useful if you are planning to send the tape to another institution]. Finally, the "-O /dev/rst1" redirects the output to a tape device.
Alternatively, if you only want to back up a subset of all of your files, you can use the find command to produce a listing of files which you can then edit (using vi or emacs) to include only those files you want to back up. For example,
% find . -print > myfiles
% cat myfiles | gnucpio -ov -O /dev/rst1
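As an alternative to editing the list by hand, the listing can be narrowed with grep before it is handed to cpio. This is a sketch for a POSIX sh, not part of the procedure above; the directory, file names, and pattern are made up, and the cpio step appears only as a comment because it needs a tape device.

```shell
#!/bin/sh
# Build a small set of files in a scratch area.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
touch project/a.ssd01 project/b.ssd01 project/notes.txt

# List everything, then keep only the names ending in .ssd01.
find . -print | grep '\.ssd01$' | sort > myfiles
cat myfiles

# The resulting list would then be fed to cpio as in the text, e.g.:
#   cat myfiles | gnucpio -ov -O /dev/rst1
```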
As with tar and gnutar, you can obtain a "table of contents" of your archive. The command to do so is:
% gnucpio -itv -I /dev/rst1 > mycpio.log
where "i" of "itv" means input, "t" means table, and "v" means verbose, "-I /dev/rst1" specifies where to find the cpio archive, and "> mycpio.log" will redirect the table of contents to a file called "mycpio.log".
Unlike tar/gnutar, cpio accepts the UNIX wildcard characters (* and ?) for restoring files from tape. To restore files from a cpio archive:
% gnucpio -ivd -I /dev/rst1 > myout.log &
% gnucpio -ivd "*" -I /dev/rst1 > myout.log &
% gnucpio -ivd "*.ssd01" -I /dev/rst1 > myout.log &
% gnucpio -ivd "/dirname/file1" -I /dev/rst1 > myout.log &
where "i" means input, "v" means verbose and will list the names of files as they are restored, "d" tells cpio to create directories as needed, "-I /dev/rst1" is the input tape device, and "myout.log" is the output listing of restored files including all error information if cpio encounters an error on the tape. The first and second examples are equivalent and restore the entire archive. The third example restores all files with an extension ssd01, e.g. all SAS data sets. The final example restores a single file, called "file1", from the directory, dirname. NOTE that the patterns and file references are QUOTED. This is necessary so that the shell passes them to cpio unchanged; otherwise, the files will not be restored.
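The reason quoting matters can be seen without a tape. In this sketch for a POSIX sh, count_args is a hypothetical helper that reports how many arguments a command receives, and the two .ssd01 files are made up.

```shell
#!/bin/sh
# Create two matching files in a scratch area.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch one.ssd01 two.ssd01

# Report how many arguments a command would actually receive.
count_args() {
    echo "$#"
}

# Unquoted: the shell replaces the pattern with the matching file names
# in the current directory before the command ever sees it.
unquoted=$(count_args *.ssd01)

# Quoted: the pattern is passed through as a single literal argument,
# leaving the matching to the command itself.
quoted=$(count_args "*.ssd01")

echo "unquoted=$unquoted quoted=$quoted"
```

When the pattern is quoted, cpio receives it intact and matches it against the names stored in the archive rather than the files that happen to be in the current directory.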