Chapter 14 OS facilities
R has quite extensive facilities to access the OS under which it is running: this allows it to be used as a scripting language and that ability is much used by R itself, for example to install packages.
Because R’s own scripts need to work across all platforms, considerable effort has gone into making the scripting facilities as platform-independent as is feasible.
14.1 Files and directories
There are many functions to manipulate files and directories. Here are pointers to some of the more commonly used ones.
To create an (empty) file or directory, use file.create or dir.create. (These are the analogues of the POSIX utilities touch and mkdir.) For temporary files and directories in the R session directory see tempfile and tempdir.
Files can be removed by either file.remove or unlink: the latter can remove directory trees.
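As a minimal sketch of the create and remove functions named above (the variable names and the "os-demo-" prefix are illustrative only), the following works entirely inside the session temporary directory:

```r
# Work inside a fresh scratch directory under the session temporary directory.
scratch <- tempfile(pattern = "os-demo-")  # a unique path; nothing is created yet
dir.create(scratch)                        # analogue of mkdir

# Create an empty file (analogue of touch); file.create returns TRUE on success.
f <- file.path(scratch, "empty.txt")
created <- file.create(f)

# Remove the single file, then the whole directory tree.
removed <- file.remove(f)
unlink(scratch, recursive = TRUE)          # recursive = TRUE is needed for directories
gone <- !dir.exists(scratch)
```

Note that unlink does nothing to a directory unless recursive = TRUE is given, whereas file.remove is intended for single files.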
For directory listings use list.files (also available as dir) or list.dirs. These can select files using a regular expression: to select by wildcards use Sys.glob.
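The difference between regular-expression and wildcard selection can be sketched as follows. The file names are made up for the illustration, and glob2rx (from the utils package) is shown as one possible way to reuse a wildcard where a regular expression is expected:

```r
# Set up a scratch directory with a few illustrative files.
scratch <- tempfile("listing-")
dir.create(file.path(scratch, "sub"), recursive = TRUE)
file.create(file.path(scratch, c("a.csv", "b.csv", "notes.txt")))

# Regular expression: names ending in ".csv".
csv_by_regex <- list.files(scratch, pattern = "\\.csv$")

# Wildcard: Sys.glob expands shell-style patterns and returns the matched paths.
csv_by_glob <- basename(Sys.glob(file.path(scratch, "*.csv")))

# glob2rx converts a wildcard into an equivalent regular expression.
csv_by_glob2rx <- list.files(scratch, pattern = utils::glob2rx("*.csv"))

# Immediate subdirectories only, reported relative to scratch.
subdirs <- list.dirs(scratch, full.names = FALSE, recursive = FALSE)

unlink(scratch, recursive = TRUE)
```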
Many types of information on a filepath (including for example if it is a file or directory) can be found by file.info.
There are several ways to find out if a file ‘exists’ (a file can exist on the filesystem and not be visible to the current user). There are functions file.exists, file.access and file_test with various versions of this test: file_test is a version of the POSIX test command for those familiar with shell scripting.
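The different notions of ‘exists’ can be compared on a single temporary file. This is only a sketch: the variable names are illustrative, and the results for file.access depend on the account running the R session.

```r
f <- tempfile(fileext = ".txt")
writeLines("hello", f)

exists_now <- file.exists(f)                # TRUE for files and directories alike
readable   <- file.access(f, mode = 4) == 0 # file.access returns 0 when access is permitted
is_regular <- utils::file_test("-f", f)     # like `test -f` in a shell
is_dir     <- utils::file_test("-d", f)     # like `test -d` in a shell

# file.info returns a data frame with one row per path.
info <- file.info(f)
size_bytes <- info$size
info_isdir <- info$isdir

unlink(f)
missing_after <- !file.exists(f)
```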
Function file.copy is the R analogue of the POSIX command cp.
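One difference from cp worth illustrating is that file.copy does not overwrite an existing destination unless asked to. A short sketch, using temporary files with illustrative names:

```r
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2"), src)

first  <- file.copy(src, dst)                   # TRUE: dst did not exist yet
second <- file.copy(src, dst)                   # FALSE: overwrite defaults to FALSE
third  <- file.copy(src, dst, overwrite = TRUE) # TRUE: existing file replaced

same <- identical(readLines(src), readLines(dst))
unlink(c(src, dst))
```

The return value is a logical vector with one element per source file, so it is worth checking rather than assuming the copy succeeded.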
Choosing files can be done interactively by file.choose: the Windows port has the more versatile functions choose.files and choose.dir and there are similar functions in the tcltk package: tk_choose.files and tk_choose.dir.
Functions file.show and file.edit will display and edit one or more files in a way appropriate to the R port, using the facilities of a console (such as RGui on Windows or R.app on macOS) if one is in use.
There is some support for links in the filesystem: see functions file.link and Sys.readlink.
14.2 Filepaths
With a few exceptions, R relies on the underlying OS functions to manipulate filepaths. Some aspects of this are allowed to depend on the OS, and do, even down to the version of the OS. There are POSIX standards for how OSes should interpret filepaths and many R users assume POSIX compliance: but Windows does not claim to be compliant and other OSes may be less than completely compliant.
The following are some issues which have been encountered with filepaths.
- POSIX filesystems are case-sensitive, so foo.png and Foo.PNG are different files. However, the defaults on Windows and macOS are to be case-insensitive, and FAT filesystems (commonly used on removable storage) are not normally case-sensitive (and all filepaths may be mapped to lower case).
- Almost all of Windows’ OS services support the use of slash or backslash as the filepath separator, and R converts the known exceptions to the form required by Windows.
- The behaviour of filepaths with a trailing slash is OS-dependent. Such paths are not valid on Windows and should not be expected to work. POSIX-2008 requires such paths to match only directories, but earlier versions allowed them to also match files. So they are best avoided.
- Multiple slashes in filepaths such as /abc//def are valid on POSIX filesystems and treated as if there was only one slash. They are usually accepted by Windows’ OS functions. However, leading double slashes may have a different meaning.
- Windows’ UNC filepaths (such as \\server\dir1\dir2\file and \\?\UNC\server\dir1\dir2\file) are not supported, but they may work in some R functions. POSIX filesystems are allowed to treat a leading double slash specially.
- Windows allows filepaths containing drives and relative to the current directory on a drive, e.g. d:foo/bar refers to d:/a/b/c/foo/bar if the current directory on drive d: is /a/b/c. It is intended that these work, but the use of absolute paths is safer.
Functions basename and dirname select parts of a file path: the recommended way to assemble a file path from components is file.path. Function path.expand does ‘tilde expansion’, substituting values for home directories (the current user’s, and perhaps those of other users).
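A short sketch of assembling and taking apart a path; the path components are made up for the illustration and no file needs to exist:

```r
# file.path joins components with "/" on all platforms.
p <- file.path("data", "raw", "survey.csv")  # "data/raw/survey.csv"

leaf   <- basename(p)                        # "survey.csv"
parent <- dirname(p)                         # "data/raw"

# Tilde expansion: the result depends on the platform and the current user.
home <- path.expand("~")
```

Building paths with file.path rather than paste avoids hard-coding a separator.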
On filesystems with links, a single file can be referred to by many filepaths. Function
normalizePath will find a canonical filepath.
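As an illustration, two different spellings of the same directory should normalize to the same canonical path. This sketch uses a ".." component rather than a link, and winslash = "/" is set only so that the result looks the same on Windows:

```r
d <- tempfile("norm-")
dir.create(file.path(d, "sub"), recursive = TRUE)

# A roundabout spelling of the same directory.
roundabout <- file.path(d, "sub", "..", "sub")

canonical <- normalizePath(roundabout, winslash = "/")
direct    <- normalizePath(file.path(d, "sub"), winslash = "/")
same_dir  <- identical(canonical, direct)

unlink(d, recursive = TRUE)
```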
Windows has the concepts of short (‘8.3’) and long file names:
normalizePath will return an absolute path using long file names and
shortPathName will return a version using short names. The latter does not contain spaces and uses backslash as the separator, so is sometimes useful for exporting names from R.
File permissions are a related topic. R has support for the POSIX concepts of read/write/execute permission for owner/group/all but this may be only partially supported on the filesystem, so for example on Windows only read-only files (for the account running the R session) are recognized. Access Control Lists (ACLs) are employed on several filesystems, but do not have an agreed standard and R has no facilities to control them. Use
Sys.chmod to change permissions.
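A minimal sketch of changing and then inspecting permissions on a temporary file. It assumes a filesystem that supports POSIX-style modes; use_umask = FALSE is chosen here so that the requested mode is applied as given, and the write test can still succeed for a privileged account such as root.

```r
f <- tempfile()
file.create(f)

# Make the file read-only for owner, group and others.
Sys.chmod(f, mode = "0444", use_umask = FALSE)

mode_after <- file.info(f)$mode           # an "octmode" object
mode_text  <- format(mode_after)          # the mode as an octal string
writable   <- file.access(f, mode = 2) == 0

# Restore write permission before cleaning up.
Sys.chmod(f, mode = "0644", use_umask = FALSE)
unlink(f)
```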
14.3 System commands
Functions system and system2 are used to invoke a system command and optionally collect its output.
system2 is a little more general but its main advantage is that it is easier to write cross-platform code using it.
system behaves differently on Windows from other OSes (because the API C call of that name does). Elsewhere it invokes a shell to run the command: the Windows port of R has a function
shell to do that.
To find out if the OS includes a command, use
Sys.which, which attempts to do this in a cross-platform way (unfortunately it is not a standard OS service).
shQuote will quote filepaths as needed for commands in the current OS.
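The functions in this section can be combined as in the following sketch. It runs the Rscript executable of the current R installation so that it does not depend on any particular OS command being present; the command name passed to Sys.which is a deliberately made-up one.

```r
# Locate the Rscript executable belonging to the running installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Run a one-line R program and capture its standard output as a character vector.
# shQuote protects the expression, which contains spaces and parentheses.
out <- system2(rscript,
               args = c("--vanilla", "-e", shQuote("cat(1 + 1)")),
               stdout = TRUE)

# Sys.which returns "" for commands that cannot be found.
has_cmd <- unname(nzchar(Sys.which("no-such-command-for-demo")))

# Quoting style can be requested explicitly; type = "sh" is for POSIX shells.
quoted <- shQuote("my file.txt", type = "sh")  # "'my file.txt'"
```

With system2 the command and its arguments are given separately, which is part of what makes cross-platform code easier to write than with a single command string passed to system.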
14.4 Compression and Archives
Recent versions of R have extensive facilities to read and write compressed files, often transparently. Reading of files in R is to a very large extent done by connections, and the file function, which is used to open a connection to a file (or a URL), is able to identify the compression used from the ‘magic’ header of the file.
The type of compression which has been supported for longest is
gzip compression, and that remains a good general compromise. Files compressed by the earlier Unix
compress utility can also be read, but these are becoming rare. Two other forms of compression, those of the bzip2 and xz utilities, are also available. These generally achieve higher rates of compression (depending on the file, much higher) at the expense of slower decompression and much slower compression.
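The transparent handling of compression can be sketched by writing the same text through the gzfile, bzfile and xzfile connections and reading it back by file name alone. The helper write_with and the sample text are hypothetical, introduced only for this illustration; actual size ratios depend heavily on the data.

```r
txt <- rep("the quick brown fox jumps over the lazy dog", 200)

# Hypothetical helper for this sketch: write `txt` through a given
# connection constructor and return the path of the file written.
write_with <- function(open_con, ext) {
  path <- tempfile(fileext = ext)
  con <- open_con(path, "w")
  on.exit(close(con))
  writeLines(txt, con)
  path
}

paths <- c(
  plain = write_with(file,   ".txt"),
  gzip  = write_with(gzfile, ".gz"),
  bzip2 = write_with(bzfile, ".bz2"),
  xz    = write_with(xzfile, ".xz")
)

sizes <- file.size(paths)
names(sizes) <- names(paths)

# readLines opens each path with file(), which detects the compression
# from the file's header, so no format needs to be specified.
roundtrip <- vapply(paths, function(p) identical(readLines(p), txt), logical(1))

unlink(paths)
```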
There is some confusion between xz and lzma compression (see https://en.wikipedia.org/wiki/Xz and https://en.wikipedia.org/wiki/LZMA): R can read files compressed by most versions of either.
File archives are single files which contain a collection of files, the most common ones being ‘tarballs’ and zip files as used to distribute R packages. R can list and unpack both (see functions untar and unzip) and create both (see functions tar and zip, the latter with the help of an external program).
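A sketch of creating, listing and unpacking a gzip-compressed tarball. It passes tar = "internal" so that R's own implementation is used rather than an external tar program; the directory and file names are illustrative.

```r
# Build a small directory to archive.
src <- tempfile("pkgsrc-")
dir.create(src)
writeLines("alpha", file.path(src, "a.txt"))
writeLines("beta",  file.path(src, "b.txt"))

# Archive the directory by its name relative to its parent directory.
tarball <- tempfile(fileext = ".tar.gz")
old <- setwd(dirname(src))
utils::tar(tarball, files = basename(src),
           compression = "gzip", tar = "internal")
setwd(old)

# List the contents without unpacking.
listed <- utils::untar(tarball, list = TRUE, tar = "internal")

# Unpack into a separate directory and read one file back.
dest <- tempfile("unpacked-")
utils::untar(tarball, exdir = dest, tar = "internal")
restored <- readLines(file.path(dest, basename(src), "a.txt"))

unlink(c(src, dest), recursive = TRUE)
unlink(tarball)
```

Creating zip files is not shown here because, as noted above, it relies on an external program that may not be installed.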