Chapter 3 Importing from other statistical systems
In this chapter we consider the problem of reading a binary data file written by another statistical system. This is often best avoided, but may be unavoidable if the originating system is not available.
In all cases the facilities described were written for data files from specific versions of the other system (often in the early 2000s), and have not necessarily been updated for the most recent versions of the other system.
3.1 EpiInfo, Minitab, S-PLUS, SAS, SPSS, Stata, Systat
The recommended package foreign provides import facilities for files produced by these statistical systems, and for export to Stata. In some cases these functions may require substantially less memory than
write.foreign (See Export to text files) provides an export mechanism with support currently for
EpiInfo versions 5 and 6 stored data in a self-describing fixed-width text format.
read.epiinfo will read these .REC files into an R data frame. EpiData also produces data in this format.
read.mtp imports a ‘Minitab Portable Worksheet’. This returns the components of the worksheet as an R list.
read.xport reads a file in SAS Transport (XPORT) format and return a list of data frames. If SAS is available on your system, function
read.ssd can be used to create and run a SAS script that saves a SAS permanent dataset (.ssd or .sas7bdat) in Transport format. It then calls
read.xport to read the resulting file. (Package Hmisc has a similar function
sas.get, also running SAS.) For those without access to SAS but running on Windows, the SAS System Viewer (a zero-cost download) can be used to open SAS datasets and export them to e.g. .csv format.
read.S which can read binary objects produced by S-PLUS 3.x, 4.x or 2000 on (32-bit) Unix or Windows (and can read them on a different OS). This is able to read many but not all S objects: in particular it can read vectors, matrices and data frames and lists containing those.
data.restore reads S-PLUS data dumps (created by
data.dump) with the same restrictions (except that dumps from the Alpha platform can also be read). It should be possible to read data dumps from S-PLUS 5.x and later written with
If you have access to S-PLUS, it is usually more reliable to
dump the object(s) in S-PLUS and
source the dump file in R. For S-PLUS 5.x and later you may need to use
dump(…, oldStyle=T), and to read in very large objects it may be preferable to use the dump file as a batch script rather than use the
read.spss can read files created by the ‘save’ and ‘export’ commands in SPSS. It returns a list with one component for each variable in the saved data set. SPSS variables with value labels are optionally converted to R factors.
SPSS Data Entry is an application for creating data entry forms. By default it creates data files with extra formatting information that
read.spss cannot handle, but it is possible to export the data in an ordinary SPSS format.
Some third-party applications claim to produce data ‘in SPSS format’ but with differences in the formats:
read.spss may or may not be able to handle these.
Stata .dta files are a binary file format. Files from versions 5 up to 12 of Stata can be read and written by functions
write.dta. Stata variables with value labels are optionally converted to (and from) R factors. For Stata versions 13 and later see CRAN packages readstata13 and haven.
read.systat reads those Systat
SAVE files that are rectangular data files (
mtype = 1) written on little-endian machines (such as from Windows). These have extension .sys or (more recently) .syd.
Octave is a numerical linear algebra system (http://www.octave.org), and function
read.octave in package foreign can read in files in Octave text data format created using the Octave command
save -ascii, with support for most of the common types of variables, including the standard atomic (real and complex scalars, matrices, and N-d arrays, strings, ranges, and boolean scalars and matrices) and recursive (structs, cells, and lists) ones.