Chapter 10 Appendix A Essential and useful other programs under a Unix-alike

This appendix gives details of programs you will need to build R on Unix-like platforms, or which will be used by R if found by configure.

Remember that some package management systems (such as RPM and Debian/Ubuntu’s) make a distinction between the user version of a package and the development version. The latter usually has the same name but with the extension ‘-devel’ or ‘-dev’: you need both versions installed.


A.1 Essential programs and libraries

You need a means of compiling C and FORTRAN 77 (see Using FORTRAN). Your C compiler should be ISO/IEC 60559, POSIX 1003.1 and C99-compliant. R tries to choose suitable flags for the C compilers it knows about, but you may have to set CC or CFLAGS suitably. For versions of gcc prior to 5.1 with glibc this means including -std=gnu99. (Note that options essential to run the compiler even for linking, such as those to set the architecture, should be specified as part of CC rather than in CFLAGS.)
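As a minimal sketch of the advice above, the following assembles a configure command line that keeps an architecture flag in CC and the language-standard flag in CFLAGS. The compiler name, the -m64 flag and the optimization flags are illustrative assumptions rather than recommended values, and the command is only printed, not run.

```shell
# Assumed values for illustration: adjust for your compiler and platform.
CC="gcc -m64"                # flags needed even for linking go in CC
CFLAGS="-std=gnu99 -g -O2"   # -std=gnu99 matters for gcc older than 5.1 with glibc

# Build the argument string and show the command instead of executing it.
configure_args="CC=\"$CC\" CFLAGS=\"$CFLAGS\""
echo "./configure $configure_args"
```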

Unless you do not want to view graphs on-screen (or use macOS) you need ‘X11’ installed, including its headers and client libraries. For recent Fedora/RedHat distributions it means (at least) RPMs ‘libX11’, ‘libX11-devel’, ‘libXt’ and ‘libXt-devel’. On Debian/Ubuntu we recommend the meta-package ‘xorg-dev’. If you really do not want these you will need to explicitly configure R without X11, using --with-x=no.

The command-line editing (and command completion) depends on the GNU readline library (including its headers): version 4.2 or later is needed for all the features to be enabled. Otherwise you will need to configure with --with-readline=no (or equivalent).

A suitably comprehensive iconv function is essential. The R usage requires iconv to be able to translate between “latin1” and “UTF-8”, to recognize “” (as the current encoding) and “ASCII”, and to translate to and from the Unicode wide-character formats “UCS-[24][BL]E”: this is true by default for glibc but not for most commercial Unixes. However, you can make use of GNU libiconv (as used on macOS: see https://www.gnu.org/software/libiconv/).

The OS needs to have enough support for wide-character types: this is checked at configuration. A small number of POSIX functions are essential, and others will be used if available.

Installations of zlib (version 1.2.5 or later), libbz2 (version 1.0.6 or later: called bzip2-libs/bzip2-devel or libbz2-1.0/libbz2-dev by some Linux distributions) and liblzma (version 5.0.3 or later) are required.

PCRE (version 8.32 or later, although versions 8.20–8.31 will be accepted with a deprecation warning) is required (or just its library and headers if packaged separately). Only the ‘8-bit’ interface is used (and only that is built by default when installing from sources). PCRE must be built with UTF-8 support (not the default, and checked by configure) and support for Unicode properties is assumed by some R packages. JIT support (optionally available) is desirable for the best performance: support for this and Unicode properties can be checked at run-time by calling pcre_config(). If building PCRE for use with R a suitable configure command might be

./configure --enable-utf --enable-unicode-properties --enable-jit --disable-cpp

The --enable-jit flag is supported for most common CPUs. (See also the comments for Solaris.)

Library libcurl (version 7.22.0 or later) is required, with at least 7.28.0 being desirable. Information on libcurl is found from the curl-config script: if that is missing or needs to be overridden there are macros to do so described in file config.site.
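One way to check an installed libcurl against these version numbers is sketched below. The helper version_ge is a hypothetical name introduced here; it compares purely numeric dotted versions, and the sed expression assumes that curl-config --version prints a line of the form "libcurl X.Y.Z".

```shell
# version_ge A B: succeed if dotted numeric version A is >= B.
# (Hypothetical helper; plain POSIX sh, so its variables are global.)
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
        if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
        # Drop the leading component, or finish if none is left.
        case $a in *.*) a=${a#*.} ;; *) a="" ;; esac
        case $b in *.*) b=${b#*.} ;; *) b="" ;; esac
    done
    return 0
}

if command -v curl-config >/dev/null 2>&1; then
    # Strip the "libcurl " prefix and any non-numeric suffix such as "-DEV".
    v=$(curl-config --version | sed 's/^libcurl //; s/[^0-9.].*$//')
    if version_ge "$v" 7.28.0; then
        echo "libcurl $v: meets the desirable version"
    elif version_ge "$v" 7.22.0; then
        echo "libcurl $v: meets the minimum only"
    else
        echo "libcurl $v: too old"
    fi
else
    echo "curl-config not found"
fi
```

Missing trailing components are treated as zero, so 7.22 and 7.22.0 compare as equal.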

A tar program is needed to unpack the sources and packages (including the recommended packages). A version that can automagically detect compressed archives is preferred for use with untar(): the configure script looks for gtar and gnutar before tar – use environment variable TAR to override this.

There need to be suitable versions of the tools grep and sed: the problems are usually with old AT&T and BSD variants. configure will try to find suitable versions (including looking in /usr/xpg4/bin which is used on some commercial Unixes).

You will not be able to build most of the manuals unless you have texi2any version 5.1 or later installed, and if not most of the HTML manuals will be linked to a version on CRAN. To make PDF versions of the manuals you will also need file texinfo.tex installed (which is part of the GNU texinfo distribution but is often made part of the TeX package in re-distributions) as well as texi2dvi. Further, the versions of texi2dvi and texinfo.tex need to be compatible: we have seen problems with older TeX distributions.

If you want to build from the R Subversion repository then texi2any is highly recommended as it is used to create files which are in the tarball but not stored in the Subversion repository.

The PDF documentation (including doc/NEWS.pdf) and building vignettes needs pdftex and pdflatex. We require LaTeX version 2005/12/01 or later (for UTF-8 support). Building PDF package manuals (including the R reference manual) and vignettes is sensitive to the version of the LaTeX package hyperref and we recommend that the TeX distribution used is kept up-to-date. A number of standard LaTeX packages are required (including url and some of the font packages such as times, helvetic, ec and cm-super) and others such as hyperref and inconsolata are desirable (and without them you may need to change R’s defaults: see Making the manuals). Note that package hyperref (currently) requires packages kvoptions, ltxcmds and refcount. For distributions based on TeX Live the simplest approach may be to install collections collection-latex, collection-fontsrecommended, collection-latexrecommended, collection-fontsextra and collection-latexextra (assuming they are not installed by default): Fedora uses names like texlive-collection-fontsextra and Debian/Ubuntu like texlive-fonts-extra.

The essential programs should be in your PATH at the time configure is run: this will capture the full paths.
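Before running configure it can help to confirm which of these programs are actually on the PATH. The sketch below is one possible check; check_tools is a hypothetical helper name, and the list of tools is simply drawn from the programs mentioned in this section.

```shell
# check_tools NAME...: report where each tool is found; fail if any is missing.
# The names of missing tools are left in the variable "missing".
check_tools() {
    missing=""
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%-12s %s\n' "$tool" "$path"
        else
            printf '%-12s NOT FOUND\n' "$tool"
            missing="$missing $tool"
        fi
    done
    [ -z "$missing" ]
}

# texi2any and the TeX tools only matter if the manuals are to be built.
check_tools tar grep sed curl-config texi2any texi2dvi pdftex pdflatex ||
    echo "missing:$missing"
```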

Those distributing binary versions of R may need to be aware of the licences of the external libraries it is linked to (including ‘useful’ libraries from the next section). The liblzma library is in the public domain and X11, libbzip2, libcurl and zlib have MIT-style licences. PCRE has a BSD-style licence which requires distribution of the licence (included in R’s COPYRIGHTS file) in binary distributions. GNU readline is licensed under GPL (which version(s) depending on the readline version).


A.2 Useful libraries and programs

The ability to use translated messages makes use of gettext and most likely needs GNU gettext: you do need this to work with new translations, but otherwise the version contained in the R sources will be used if no suitable external gettext is found.

The ‘modern’ version of the X11(), jpeg(), png() and tiff() graphics devices uses the cairo and (optionally) Pango libraries. Cairo version 1.2.0 or later is required. Pango needs to be at least version 1.10, and 1.12 is the earliest version we have tested. (For Fedora users we believe the pango-devel RPM and its dependencies suffice.) R checks for pkg-config, and uses that to check first that the ‘pangocairo’ package is installed (and if not, ‘cairo’) and if additional flags are needed for the ‘cairo-xlib’ package, then if suitable code can be compiled. These tests will fail if pkg-config is not installed, and are likely to fail if cairo was built statically (unusual). Most systems with Gtk+ 2.8 or later installed will have suitable libraries.
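The order of these checks can be reproduced by hand with pkg-config, which is useful when diagnosing why configure did not enable cairo. The following is a rough sketch of that order only, not the code configure itself runs; the variable cairo_status is introduced here for illustration.

```shell
# Mirror the order described above: pkg-config itself, then pangocairo, then cairo.
if ! command -v pkg-config >/dev/null 2>&1; then
    cairo_status="pkg-config not installed: the cairo tests will fail"
elif pkg-config --exists pangocairo; then
    cairo_status="pangocairo found"
elif pkg-config --atleast-version=1.2.0 cairo; then
    cairo_status="cairo >= 1.2.0 found, without Pango"
else
    cairo_status="no suitable cairo found"
fi
echo "$cairo_status"
```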

For the best font experience with these devices you need suitable fonts installed: Linux users will want the urw-fonts package. On platforms which have it available, the msttcorefonts package provides TrueType versions of Monotype fonts such as Arial and Times New Roman. Another useful set of fonts is the ‘liberation’ TrueType fonts available at https://fedorahosted.org/liberation-fonts/, which cover the Latin, Greek and Cyrillic alphabets plus a fair range of signs. These share metrics with Arial, Times New Roman and Courier New, and contain fonts rather similar to the first two (https://en.wikipedia.org/wiki/Liberation_fonts). Then there is the ‘Free UCS Outline Fonts’ project (https://www.gnu.org/software/freefont/) which are OpenType/TrueType fonts based on the URW fonts but with extended Unicode coverage. See the R help on X11 on selecting such fonts.

The bitmapped graphics devices jpeg(), png() and tiff() need the appropriate headers and libraries installed: jpeg (version 6b or later, or libjpeg-turbo) or libpng (version 1.2.7 or later) and zlib or libtiff (any recent version – 3.9.[4567] and 4.0.[23] have been tested) respectively. They also need support for either X11 or cairo (see above). Should support for these devices not be required or broken system libraries need to be avoided there are configure options --without-libpng, --without-jpeglib and --without-libtiff. For most system installations the TIFF libraries will require JPEG libraries to be present and perhaps linked explicitly, so --without-jpeglib may also disable the tiff() device. The tiff() devices only require a basic build of libtiff (not even JPEG support is needed). Recent versions allow several other libraries to be linked into libtiff such as lzma, jbig and jpeg12, and these may need also to be present.

Option --with-system-tre is also available: it needs a recent version of TRE. (The current sources are in the git repository at https://github.com/laurikari/tre/, but at the time of writing the resulting build will not pass its checks.)

An implementation of XDR is required, and the R sources contain one which is likely to suffice (although a system version may have higher performance). XDR is part of RPC and historically has been part of libc on a Unix-alike. However some builds of glibc hide it with the intention that the TI-RPC library be used instead, in which case libtirpc (and its development version) needs to be installed, and its headers need to be on the C include path or in /usr/include/tirpc.

Use of the X11 clipboard selection requires the Xmu headers and libraries. These are normally part of an X11 installation (e.g. the Debian meta-package ‘xorg-dev’), but some distributions have split this into smaller parts, so for example recent versions of Fedora require the ‘libXmu’ and ‘libXmu-devel’ RPMs.

Some systems (notably macOS and at least some FreeBSD systems) have inadequate support for collation in multibyte locales. It is possible to replace the OS’s collation support by that from ICU (International Components for Unicode, http://site.icu-project.org/), and this provides much more precise control over collation on all systems. ICU is available as sources and as binary distributions for (at least) most Linux distributions, Solaris, FreeBSD and AIX, usually as libicu or icu4c. It will be used by default where available: should a very old or broken version of ICU be found this can be suppressed by --without-ICU.

The bitmap and dev2bitmap devices and function embedFonts() use ghostscript (http://www.ghostscript.com/). This should either be in your path when the command is run, or its full path specified by the environment variable R_GSCMD at that time.


A.2.1 Tcl/Tk

The tcltk package needs Tcl/Tk ≥ 8.4 installed: the sources are available at https://www.tcl.tk/. To specify the locations of the Tcl/Tk files you may need the configuration options

--with-tcltk

use Tcl/Tk, or specify its library directory

--with-tcl-config=TCL_CONFIG

specify location of tclConfig.sh

--with-tk-config=TK_CONFIG

specify location of tkConfig.sh

or use the configure variables TCLTK_LIBS and TCLTK_CPPFLAGS to specify the flags needed for linking against the Tcl and Tk libraries and for finding the tcl.h and tk.h headers, respectively. If you have both 32- and 64-bit versions of Tcl/Tk installed, specifying the paths to the correct config files may be necessary to avoid confusion between them.
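For instance, on a system where the 64-bit config scripts live in a non-default directory, the options might be combined as in the sketch below. The two paths are hypothetical and need to be replaced by the real locations of tclConfig.sh and tkConfig.sh; the command is printed rather than run.

```shell
# Hypothetical locations: substitute the directories used on your system.
TCL_CONFIG=/usr/lib64/tclConfig.sh
TK_CONFIG=/usr/lib64/tkConfig.sh

tcltk_args="--with-tcltk --with-tcl-config=$TCL_CONFIG --with-tk-config=$TK_CONFIG"
echo "./configure $tcltk_args"
```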

Versions of Tcl/Tk up to 8.5.19 and 8.6.4 have been tested (including most versions of 8.4.x, but not recently).

Note that the tk.h header includes X11 headers, so you will need X11 and its development files installed.


A.2.2 Java support

The build process looks for Java support on the host system, and if it finds it sets some settings which are useful for Java-using packages (such as rJava and JavaGD). This check can be suppressed by configure option --disable-java. Configure variable JAVA_HOME can be set to point to a specific JRE/JDK, on the configure command line or in the environment.

Principal amongst these settings are some library paths to the Java libraries and JVM, which are stored in environment variable R_JAVA_LD_LIBRARY_PATH in file R_HOME/etc/ldpaths (or a sub-architecture-specific version). A typical setting for ‘x86_64’ Linux is

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-1.b15.fc22.x86_64/jre
R_JAVA_LD_LIBRARY_PATH=${JAVA_HOME}/lib/amd64/server

Unfortunately this depends on the exact version of the JRE/JDK installed, and so may need updating if the Java installation is updated. This can be done by running R CMD javareconf which updates settings in both R_HOME/etc/Makeconf and R_HOME/etc/ldpaths. See R CMD javareconf --help for details: note that this needs to be done by the account owning the R installation.

Another way of overriding those settings is to set the environment variable R_JAVA_LD_LIBRARY_PATH (before R is started, hence not in ~/.Renviron), which suffices to run already-installed Java-using packages. For example

R_JAVA_LD_LIBRARY_PATH=/usr/lib/jvm/java-1.8.0/jre/lib/amd64/server

It may be possible to avoid this by specifying an invariant link as the path when configuring. For example, on that system any of

JAVA_HOME=/usr/lib/jvm/java
JAVA_HOME=/usr/lib/jvm/java-1.8.0
JAVA_HOME=/usr/lib/jvm/java-1.8.0/jre

worked.


A.2.3 Other compiled languages

Some add-on packages need a C++ compiler. This is specified by the configure variables CXX, CXXFLAGS and similar. configure will normally find a suitable compiler. However, in many cases this will be a C++98 compiler, and it is possible to specify an alternative compiler for use with C++11 by the configure variables CXX11, CXX11STD, CXX11FLAGS and similar (see C++ Support). Again, configure will normally find a suitable value for CXX11STD if the compiler given by CXX is capable of compiling C++11 code, but it is possible that a completely different compiler will be needed.

Other packages need full Fortran 90 (or later) support. For source files with extension .f90 or .f95, the compiler defined by the macro FC is used by R CMD INSTALL. This is found when R is configured and is often the same as F77: note that it is detected by the name of the command without a test that it can actually compile Fortran 90 code. Set the configure variable FC to override this if necessary: variables FCFLAGS, FCPICFLAGS, FCLIBS, SHLIB_FCLD and SHLIB_FCLDFLAGS might also need to be set.

See file config.site in the R source for more details about these variables.


A.3 Linear algebra


A.3.1 BLAS

The linear algebra routines in R can make use of enhanced BLAS (Basic Linear Algebra Subprograms, http://www.netlib.org/blas/faq.html) routines. However, these have to be explicitly requested at configure time: R provides an internal BLAS which is well-tested and will be adequate for most uses of R.

You can specify a particular BLAS library via a value for the configuration option --with-blas, and request that no external BLAS library be used by --without-blas (the default). If --with-blas is given with no =, its value is taken from the environment variable BLAS_LIBS, set for example in config.site. If neither the option nor the environment variable supplies a value, a search is made for a suitable BLAS. If the value is not obviously a linker command (starting with a dash or giving the path to a library), it is prefixed by ‘-l’, so

--with-blas="foo"

is an instruction to link against ‘-lfoo’ to find an external BLAS (which needs to be found both at link time and run time).
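The prefixing rule can be pictured with the small function below. It is an illustrative approximation of the rule as described here, not the code used by configure, and the name blas_link_flags is hypothetical.

```shell
# blas_link_flags VALUE: show how a --with-blas value would be interpreted.
# Values starting with a dash or containing a path separator are used as given;
# anything else is treated as a bare library name and gets a -l prefix.
blas_link_flags() {
    case $1 in
        -*|*/*) printf '%s\n' "$1" ;;
        *)      printf '%s\n' "-l$1" ;;
    esac
}

blas_link_flags foo
blas_link_flags "-L/path/to/ATLAS/libs -lf77blas -latlas"
```

The first call prints -lfoo, and the second prints its argument unchanged because it already starts with a dash.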

The configure code checks that the external BLAS is complete (it must include all double precision and double complex routines, as well as LSAME), and appears to be usable. However, an external BLAS has to be usable from a shared object (so must contain position-independent code), and that is not checked.

Some enhanced BLASes are compiler-system-specific (sunperf on Solaris, libessl on IBM, Accelerate on macOS). The correct incantation for these is often found via --with-blas with no value on the appropriate platforms.

Some of the external BLASes are multi-threaded. One issue is that R profiling (which uses the SIGPROF signal) may cause problems, and you may want to disable profiling if you use a multi-threaded BLAS. Note that using a multi-threaded BLAS can result in taking more CPU time and even more elapsed time (occasionally dramatically so) than using a similar single-threaded BLAS. On a machine running other tasks, there can be contention for CPU caches that reduces the effectiveness of the optimization of cache use by a BLAS implementation.

Note that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used, then all BLAS-using packages must also be compiled against that BLAS. So if R is re-built to use an enhanced BLAS then packages such as quantreg will need to be re-installed.

R relies on ISO/IEC 60559 compliance of an external BLAS. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed—whereas x*0 can be NaN. This is checked in the test suite.

External BLAS implementations often make less use of extended-precision floating-point registers and will almost certainly re-order computations. This can result in less accuracy than using the internal BLAS, and may result in different solutions, e.g. different signs in SVD and eigendecompositions.

The URIs for several of these BLAS are subject to frequent gratuitous changes, so you will need to search for their current locations.


A.3.1.1 ATLAS

ATLAS (http://math-atlas.sourceforge.net/) is a “tuned” BLAS that runs on a wide range of Unix-alike platforms. Unfortunately it is built by default as a static library that on some platforms cannot be used with shared objects such as are used in R packages. Be careful when using pre-built versions of ATLAS (they seem to work on ‘ix86’ platforms, but not always on ‘x86_64’ ones).

The usual way to specify ATLAS will be via

--with-blas="-lf77blas -latlas"

if the libraries are in the library path, otherwise by

--with-blas="-L/path/to/ATLAS/libs -lf77blas -latlas"

For example, ‘x86_64’ Fedora needs

--with-blas="-L/usr/lib64/atlas -lf77blas -latlas"

For systems with multiple CPU cores it is possible to use a multi-threaded version of ATLAS, by specifying

--with-blas="-lptf77blas -lpthread -latlas"

Consult its installation guide for how to build ATLAS with position-independent code, and as a shared library.


A.3.1.2 ACML

For ‘x86_64’ processors under Linux there is the AMD Core Math Library (ACML). For the gcc version we could use

--with-blas="-lacml"

if the appropriate library directory (such as /opt/acml5.1.0/gfortran64/lib) is in the LD_LIBRARY_PATH. For other compilers, see the ACML documentation. There is a multithreaded Linux version of ACML available for recent versions of gfortran. To make use of this you will need something like

--with-blas="-L/opt/acml5.1.0/gfortran64_mp/lib -lacml_mp"

(and you may need to arrange for the directory to be in ld.so cache).

See Shared BLAS for an alternative (and in many ways preferable) way to use ACML.

The version last tested (5.1.0) failed the reg-BLAS.R test in its handling of NAs.


A.3.1.3 Goto and OpenBLAS

Dr Kazushige Goto wrote a tuned BLAS for several processors and OSes, which was frozen in mid-2010. The final version is known as GotoBLAS2, and was re-released under a much less restrictive licence. Once it is built and installed, it can be used by configuring R with

--with-blas="-lgoto2"

See Shared BLAS for an alternative (and in many ways preferable) way to use it.

OpenBLAS (http://www.openblas.net/) is a descendant project with support for some later CPUs (e.g. Intel Sandy Bridge). Once installed it can be used by something like

--with-blas="-lopenblas"

or as a shared BLAS.


A.3.1.4 Intel MKL

For Intel processors (and perhaps others) and some distributions of Linux, there is Intel’s Math Kernel Library. You are strongly encouraged to read the MKL User’s Guide, which is installed with the library, before attempting to link to MKL. This includes a ‘link line advisor’ which will suggest appropriate incantations: its use is recommended. Or see https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor.

There are also versions of MKL for macOS and Windows, but at the time these were tried they did not work with the standard compilers used for R on those platforms.

The MKL interface has changed several times and may change again: the following examples have been used with versions 10.3 to 11.3, for GCC compilers on ‘x86_64’.

To use a sequential version of MKL we used

MKL_LIB_PATH=/path/to/intel_mkl/lib/intel64
export LD_LIBRARY_PATH=$MKL_LIB_PATH
MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core -lmkl_sequential"
./configure --with-blas="$MKL" --with-lapack

The option --with-lapack is used since MKL contains a tuned copy of LAPACK as well as BLAS (see LAPACK), although this can be omitted.

Threaded MKL may be used by replacing the line defining the variable MKL by

MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_core \
     -lmkl_gnu_thread -ldl -lpthread"

The default number of threads will be chosen by the OpenMP software, but can be controlled by setting OMP_NUM_THREADS or MKL_NUM_THREADS, and in recent versions seems to default to a sensible value for sole use of the machine.

It has been reported that

--with-blas='-mkl=parallel' --with-lapack

worked with the Intel 2015.3 compilers on Centos 6.


A.3.1.5 Shared BLAS

The BLAS library will be used for many of the add-on packages as well as for R itself. This means that it is better to use a shared/dynamic BLAS library, as most of a static library will be compiled into the R executable and each BLAS-using package.

R offers the option of compiling the BLAS into a dynamic library libRblas stored in R_HOME/lib and linking both R itself and all the add-on packages against that library.

This is the default on all platforms except AIX unless an external BLAS is specified and found: for the latter it can be used by specifying the option --enable-BLAS-shlib, and it can always be disabled via --disable-BLAS-shlib.

This has both advantages and disadvantages.

  • It saves space by having only a single copy of the BLAS routines, which is helpful if there is an external static BLAS such as used to be standard for ATLAS.
  • There may be performance disadvantages in using a shared BLAS. Probably the most likely is when R’s internal BLAS is used and R is not built as a shared library, when it is possible to build the BLAS into R.bin (and libR.a) without using position-independent code. However, experiments showed that in many cases using a shared BLAS was as fast, provided high levels of compiler optimization are used.
  • It is easy to change the BLAS without needing to re-install R and all the add-on packages, since all references to the BLAS go through libRblas, and that can be replaced. Note though that any dynamic libraries the replacement links to will need to be found by the linker: this may need the library path to be changed in R_HOME/etc/ldpaths.

Another option to change the BLAS in use is to symlink a dynamic BLAS library (such as ACML or Goto’s) to R_HOME/lib/libRblas.so. For example, just

mv R_HOME/lib/libRblas.so R_HOME/lib/libRblas.so.keep
ln -s /opt/acml5.1.0/gfortran64_mp/lib/libacml_mp.so R_HOME/lib/libRblas.so

will change the BLAS in use to multithreaded ACML. A similar link works for some versions of Goto BLAS, OpenBLAS and MKL (provided the appropriate lib directory is in the run-time library path or ld.so cache).
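A slightly more defensive version of the same swap is sketched below. The function name swap_rblas is hypothetical, and its arguments are the R home directory and the full path of the replacement library; it refuses to overwrite an existing backup so that the original libRblas.so is never lost.

```shell
# swap_rblas R_HOME NEW_BLAS: point R_HOME/lib/libRblas.so at NEW_BLAS,
# keeping the original library as libRblas.so.keep (hypothetical helper).
swap_rblas() {
    lib="$1/lib/libRblas.so"
    new_blas=$2
    [ -e "$new_blas" ] || { echo "no such library: $new_blas" >&2; return 1; }
    # Back up a real (non-symlink) libRblas.so exactly once.
    if [ -e "$lib" ] && [ ! -L "$lib" ]; then
        [ ! -e "$lib.keep" ] || { echo "backup already exists: $lib.keep" >&2; return 1; }
        mv "$lib" "$lib.keep" || return 1
    fi
    ln -sf "$new_blas" "$lib"
}

# Example call (paths as in the text above), commented out so nothing is changed:
# swap_rblas "$R_HOME" /opt/acml5.1.0/gfortran64_mp/lib/libacml_mp.so
```

To revert, remove the symlink and move libRblas.so.keep back to libRblas.so.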


A.3.2 LAPACK

Provision is made for using an external LAPACK library, principally to cope with BLAS libraries which contain a copy of LAPACK (such as sunperf on Solaris, Accelerate on macOS and ACML and MKL on ‘ix86’/‘x86_64’ Linux). At least LAPACK version 3.2 is required. This can only be done if --with-blas has been used.

However, the likely performance gains are thought to be small (and may be negative). The default is not to search for a suitable LAPACK library, and using an external one is definitely not recommended. You can specify a specific LAPACK library or a search for a generic library by the configuration option --with-lapack. The default for --with-lapack is to check the BLAS library and then look for an external library ‘-llapack’. Sites searching for the fastest possible linear algebra may want to build a LAPACK library using the ATLAS-optimized subset of LAPACK. To do so specify something like

--with-lapack="-L/path/to/ATLAS/libs -llapack -lcblas"

since the ATLAS subset of LAPACK depends on libcblas. A value for --with-lapack can be set via the environment variable LAPACK_LIBS, but this will only be used if --with-lapack is specified (as the default value is no) and the BLAS library does not contain LAPACK.

Since ACML contains a full LAPACK, if selected as the BLAS it can be used as the LAPACK via --with-lapack.

If you do use --with-lapack, be aware of potential problems with bugs in the LAPACK sources (or in the posted corrections to those sources). In particular, bugs in DGEEV and DGESDD have resulted in error messages such as

DGEBRD gave error code -10

Other potential problems are incomplete versions of the libraries, seen several times in Linux distributions over the years.

Please do bear in mind that using --with-lapack is ‘definitely not recommended’: it is provided only because it is necessary on some platforms and because some users want to experiment with claimed performance improvements. Reporting problems where it is used unnecessarily will simply irritate the R helpers.

Note too the comments about ISO/IEC 60559 compliance in the section on external BLAS: these apply equally to an external LAPACK, and for example the Intel MKL documentation says

LAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or NaN values. Using these special values may cause LAPACK to return unexpected results or become unstable.

We rely on limited support in LAPACK for matrices with 2^{31} or more elements: it is quite possible that an external LAPACK will not have that support.

If you have a pure FORTRAN 77 compiler which cannot compile LAPACK it may be possible to use CLAPACK from http://www.netlib.org/clapack/ by something like

--with-lapack="-lclapack -lf2c"

provided these were built with position-independent code and the calling conventions for double complex function return values match those in the BLAS used, so it may be simpler to use CLAPACK built to use CBLAS and

--with-lapack="-lclapack -lcblas -lf2c"

A.3.3 Caveats

As with all libraries, you need to ensure that they and R were compiled with compatible compilers and flags. For example, this has meant that on Sun Sparc using the native compilers the flag -dalign is needed if sunperf is to be used.

On some systems it has been necessary that an external BLAS/LAPACK was built with the same FORTRAN compiler used to build R.