Chapter 6 The R API: entry points for C code

There are a large number of entry points in the R executable/DLL that can be called from C code (and some that can be called from FORTRAN code). Only those documented here are stable enough that they will only be changed with considerable notice.

The recommended procedure to use these is to include the header file R.h in your C code by

#include <R.h>

This will include several other header files from the directory R_INCLUDE_DIR/R_ext, and there are other header files there that can be included too, but many of the features they contain should be regarded as undocumented and unstable.

Most of these header files, including all those included by R.h, can be used from C++ code.

Note: Because R re-maps many of its external names to avoid clashes with user code, it is essential to include the appropriate header files when using these entry points.

This remapping can cause problems¹³⁷, and can be eliminated by defining R_NO_REMAP and prepending ‘Rf_’ to all the function names used from Rinternals.h and R_ext/Error.h. These problems can usually be avoided by including other headers (such as system headers and those for external software used by the package) before R.h.

We can classify the entry points as

API: Entry points which are documented in this manual and declared in an installed header file. These can be used in distributed packages and will only be changed after deprecation.
public: Entry points declared in an installed header file that are exported on all R platforms but are not documented and subject to change without notice.
private: Entry points that are used when building R and exported on all R platforms but are not declared in the installed header files. Do not use these in distributed code.
hidden: Entry points that are not exported where this is possible (Windows and some modern Unix-alike compilers/loaders when using R as a shared library).

6.1 Memory allocation

There are two types of memory allocation available to the C programmer, one in which R manages the clean-up and the other in which the user has full control (and responsibility).

6.1.1 Transient storage allocation

Here R will reclaim the memory at the end of the call to .C, .Call or .External. Use

char *R_alloc(size_t n, int size)

which allocates n units of size bytes each. A typical usage (from package stats) is

x = (int *) R_alloc(nrows(merge)+2, sizeof(int));

(size_t is defined in stddef.h which the header defining R_alloc includes.)

There is a similar call, S_alloc (for compatibility with older versions of S) which zeroes the memory allocated,

char *S_alloc(long n, int size)

and

char *S_realloc(char *p, long new, long old, int size)

which changes the allocation size from old to new units, and zeroes the additional units.

For compatibility with current versions of S, header S.h (only) defines wrapper macros equivalent to

type* Salloc(long n, int type)
type* Srealloc(char *p, long new, long old, int type)

This memory is taken from the heap, and released at the end of the .C, .Call or .External call. Users can also manage it, by noting the current position with a call to vmaxget and subsequently clearing memory allocated by a call to vmaxset. An example might be

void *vmax = vmaxget();
// a loop involving the use of R_alloc at each iteration
vmaxset(vmax);

This is only recommended for experts.

Note that this memory will be freed on error or user interrupt (if allowed: see Allowing interrupts).

The memory returned is only guaranteed to be aligned as required for double pointers: take precautions if casting to a pointer which needs more. There is also

long double *R_allocLD(size_t n)

which is guaranteed to have the 16-byte alignment needed for long double pointers on some platforms.

These functions should only be used in code called by .C etc, never from front-ends. They are not thread-safe.

6.1.2 User-controlled memory

The other form of memory allocation is an interface to malloc, the interface providing R error handling. This memory lasts until freed by the user and is additional to the memory allocated for the R workspace.

The interface functions are

type* Calloc(size_t n, type)
type* Realloc(any *p, size_t n, type)
void Free(any *p)

providing analogues of calloc, realloc and free. If there is an error during allocation it is handled by R, so if these routines return the memory has been successfully allocated or freed. Free will set the pointer p to NULL. (Some but not all versions of S do so.)

Users should arrange to Free this memory when no longer needed, including on error or user interrupt. This can often be done most conveniently from an on.exit action in the calling R function – see pwilcox for an example.

Do not assume that memory allocated by Calloc/Realloc comes from the same pool as used by malloc: in particular do not use free or strdup with it.

Memory obtained by these functions should be aligned in the same way as malloc, that is ‘suitably aligned for any kind of variable’.

These entry points need to be prefixed by R_ if STRICT_R_HEADERS has been defined.

6.2 Error handling

The basic error handling routines are the equivalents of stop and warning in R code, and use the same interface.

void error(const char * format, ...);
void warning(const char * format, ...);

These have the same call sequences as calls to printf, but in the simplest case can be called with a single character string argument giving the error message. (Don’t do this if the string contains ‘%’ or might otherwise be interpreted as a format.)
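The caution about ‘%’ can be illustrated without the R headers. The following is a minimal sketch in which the standard snprintf stands in for the printf-style interface of error and warning; the helper name format_message is hypothetical. Passing the text through a fixed "%s" format keeps any ‘%’ in the message from being read as a conversion specification.

```c
#include <stdio.h>

/* Hypothetical helper: copy msg into buf through a fixed "%s" format.
   snprintf is only a stand-in here for R's printf-style error() and
   warning(); it is used so that the sketch compiles without R. */
static int format_message(char *buf, size_t bufsize, const char *msg)
{
    /* Unsafe alternative (do not use): snprintf(buf, bufsize, msg);
       a '%' inside msg would then be treated as a conversion. */
    return snprintf(buf, bufsize, "%s", msg);
}
```

In package code the same pattern would be written as error("%s", msg) rather than error(msg) whenever msg is not a fixed literal.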

If STRICT_R_HEADERS is not defined there is also an S-compatibility interface which uses calls of the form

PROBLEM ...... ERROR
MESSAGE ...... WARN
PROBLEM ...... RECOVER(NULL_ENTRY)
MESSAGE ...... WARNING(NULL_ENTRY)

the last two being the forms available in all S versions. Here ‘……’ is a set of arguments to printf, so can be a string or a format string followed by arguments separated by commas.

6.2.1 Error handling from FORTRAN

There are two interface functions provided to call error and warning from FORTRAN code, in each case with a simple character string argument. They are defined as

subroutine rexit(message)
subroutine rwarn(message)

Messages of more than 255 characters are truncated, with a warning.

6.3 Random number generation

The interface to R’s internal random number generation routines is

double unif_rand();
double norm_rand();
double exp_rand();

giving one uniform, normal or exponential pseudo-random variate. However, before these are used, the user must call

GetRNGstate();

and after all the required variates have been generated, call

PutRNGstate();

These essentially read in (or create) .Random.seed and write it out after use.

File S.h defines seed_in and seed_out for S-compatibility rather than GetRNGstate and PutRNGstate. These take a long * argument which is ignored.

The random number generator is private to R; there is no way to select the kind of RNG or set the seed except by evaluating calls to the R functions.

The C code behind R’s rxxx functions can be accessed by including the header file Rmath.h; see Distribution functions. Those calls generate a single variate and should also be enclosed in calls to GetRNGstate and PutRNGstate.

6.4 Missing and IEEE special values

A set of functions is provided to test for NA, Inf, -Inf and NaN. These functions are accessed via macros:

ISNA(x)        True for R’s NA only
ISNAN(x)       True for R’s NA and IEEE NaN
R_FINITE(x)    False for Inf, -Inf, NA, NaN

and via function R_IsNaN which is true for NaN but not NA.

Do use R_FINITE rather than isfinite or finite; the latter is often mendacious and isfinite is only available on some platforms, on which R_FINITE is a macro expanding to isfinite.

Currently in C code ISNAN is a macro calling isnan. (Since this gives problems on some C++ systems, if the R headers are included from C++ code a function call is used.)

You can check for Inf or -Inf by testing equality to R_PosInf or R_NegInf, and set (but not test) an NA as NA_REAL.

All of the above apply to double variables only. For integer variables there is a variable accessed by the macro NA_INTEGER which can be used to set or test for missingness.
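The reason a double NA can be set but not tested by equality is that R’s NA is itself a NaN, and under IEEE 754 arithmetic every comparison involving a NaN is false. The following plain C sketch illustrates this; it assumes IEEE arithmetic, uses NAN from math.h as a stand-in for NA_REAL, and the function names are hypothetical.

```c
#include <math.h>

/* Broken test: == never detects a NaN, because any comparison
   involving a NaN is false.  `na` stands in for NA_REAL. */
static int is_missing_by_equality(double x, double na)
{
    return x == na;
}

/* Working test in plain C; code using the R API would use
   ISNA(x) or ISNAN(x) instead. */
static int is_missing_by_isnan(double x)
{
    return isnan(x) != 0;
}
```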

6.5 Printing

The most useful function for printing from a C routine compiled into R is Rprintf. This is used in exactly the same way as printf, but is guaranteed to write to R’s output (which might be a GUI console rather than a file, and can be re-directed by sink). It is wise to write complete lines (including the “\n”) before returning to R. It is defined in R_ext/Print.h.

The function REprintf is similar but writes on the error stream (stderr) which may or may not be different from the standard output stream.

Functions Rvprintf and REvprintf are analogues using the vprintf interface. Because that is a C99¹³⁸ interface, they are only defined by R_ext/Print.h in C++ code if the macro R_USE_C99_IN_CXX is defined when it is included.

Another circumstance when it may be important to use these functions is when using parallel computation on a cluster of computational nodes, as their output will be re-directed/logged appropriately.

6.5.1 Printing from FORTRAN

On many systems FORTRAN write and print statements can be used, but the output may not interleave well with that of C, and will be invisible on GUI interfaces. They are not portable and best avoided.

Three subroutines are provided to ease the output of information from FORTRAN code.

subroutine dblepr(label, nchar, data, ndata)
subroutine realpr(label, nchar, data, ndata)
subroutine intpr (label, nchar, data, ndata)

Here label is a character label of up to 255 characters, nchar is its length (which can be -1 if the whole label is to be used), and data is an array of length at least ndata of the appropriate type (double precision, real and integer respectively). These routines print the label on one line and then print data as if it were an R vector on subsequent line(s). They work with zero ndata, and so can be used to print a label alone.

6.6 Calling C from FORTRAN and vice versa

Naming conventions for symbols generated by FORTRAN differ by platform: it is not safe to assume that FORTRAN names appear to C with a trailing underscore. To help cover up the platform-specific differences there is a set of macros that should be used.

F77_SUB(name): to define a function in C to be called from FORTRAN
F77_NAME(name): to declare a FORTRAN routine in C before use
F77_CALL(name): to call a FORTRAN routine from C
F77_COMDECL(name): to declare a FORTRAN common block in C
F77_COM(name): to access a FORTRAN common block from C

On most current platforms these are all the same, but it is unwise to rely on this. Note that names with underscores are not legal in FORTRAN 77, and are not portably handled by the above macros. (Also, all FORTRAN names for use by R are lower case, but this is not enforced by the macros.)

For example, suppose we want to call R’s normal random numbers from FORTRAN. We need a C wrapper along the lines of

#include <R.h>

void F77_SUB(rndstart)(void) { GetRNGstate(); }
void F77_SUB(rndend)(void) { PutRNGstate(); }
double F77_SUB(normrnd)(void) { return norm_rand(); }

to be called from FORTRAN as in

      subroutine testit()
      double precision normrnd, x
      call rndstart()
      x = normrnd()
      call dblepr("X was", 5, x, 1)
      call rndend()
      end

Note that this is not guaranteed to be portable, for the return conventions might not be compatible between the C and FORTRAN compilers used. (Passing values via arguments is safer.)

The standard packages, for example stats, are a rich source of further examples.

Passing character strings from C to FORTRAN 77 or vice versa is not portable (and to Fortran 90 or later is even less so). We have found that it helps to ensure that a C string to be passed is followed by several nuls (and not just the one needed as a C terminator). But for maximal portability character strings in FORTRAN should be avoided.

6.7 Numerical analysis subroutines

R contains a large number of mathematical functions for its own use, for example numerical linear algebra computations and special functions.

The header files R_ext/BLAS.h, R_ext/Lapack.h and R_ext/Linpack.h contain declarations of the BLAS, LAPACK and LINPACK linear algebra functions included in R. These are expressed as calls to FORTRAN subroutines, and they will also be usable from users’ FORTRAN code. Although not part of the official API, this set of subroutines is unlikely to change (but might be supplemented).

The header file Rmath.h lists many other functions that are available and documented in the following subsections. Many of these are C interfaces to the code behind R functions, so the R function documentation may give further details.

6.7.1 Distribution functions

The routines used to calculate densities, cumulative distribution functions and quantile functions for the standard statistical distributions are available as entry points.

The arguments for the entry points follow the pattern of those for the normal distribution:

double dnorm(double x, double mu, double sigma, int give_log);
double pnorm(double x, double mu, double sigma, int lower_tail,
             int give_log);
double qnorm(double p, double mu, double sigma, int lower_tail,
             int log_p);
double rnorm(double mu, double sigma);

That is, the first argument gives the position for the density and CDF and probability for the quantile function, followed by the distribution’s parameters. Argument lower_tail should be TRUE (or 1) for normal use, but can be FALSE (or 0) if the probability of the upper tail is desired or specified.

Finally, give_log should be non-zero if the result is required on log scale, and log_p should be non-zero if p has been specified on log scale.

Note that you directly get the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by using

- pdist(t, ..., /*lower_tail = */ FALSE, /* give_log = */ TRUE)

or shorter (and more cryptic) - pdist(t, …, 0, 1).

The random-variate generation routine rnorm returns one normal variate. See Random numbers, for the protocol in using the random-variate routines.

Note that these argument sequences are (apart from the names and that rnorm has no n) mainly the same as the corresponding R functions of the same name, so the documentation of the R functions can be used. Note that the exponential and gamma distributions are parametrized by scale rather than rate.

For reference, the following table gives the basic name (to be prefixed by ‘d’, ‘p’, ‘q’ or ‘r’ apart from the exceptions noted) and distribution-specific arguments for the complete set of distributions.

Distribution               Name        Arguments
beta                       beta        a, b
non-central beta           nbeta       a, b, ncp
binomial                   binom       n, p
Cauchy                     cauchy      location, scale
chi-squared                chisq       df
non-central chi-squared    nchisq      df, ncp
exponential                exp         scale (and not rate)
F                          f           n1, n2
non-central F              nf          n1, n2, ncp
gamma                      gamma       shape, scale
geometric                  geom        p
hypergeometric             hyper       NR, NB, n
logistic                   logis       location, scale
lognormal                  lnorm       logmean, logsd
negative binomial          nbinom      size, prob
normal                     norm        mu, sigma
Poisson                    pois        lambda
Student’s t                t           n
non-central t              nt          df, delta
Studentized range          tukey (*)   rr, cc, df
uniform                    unif        a, b
Weibull                    weibull     shape, scale
Wilcoxon rank sum          wilcox      m, n
Wilcoxon signed rank       signrank    n

Entries marked with an asterisk only have ‘p’ and ‘q’ functions available, and none of the non-central distributions have ‘r’ functions. After a call to dwilcox, pwilcox or qwilcox the function wilcox_free() should be called, and similarly for the signed rank functions.

(If remapping is suppressed, the Normal distribution names are Rf_dnorm4, Rf_pnorm5 and Rf_qnorm5.)

For the negative binomial distribution (‘nbinom’), in addition to the (size, prob) parametrization, the alternative (size, mu) parametrization is provided as well by functions ‘[dpqr]nbinom_mu()’, see ?NegBinomial in R.

Functions dpois_raw(x, ) and dbinom_raw(x, ) are versions of the Poisson and binomial probability mass functions which work continuously in x, whereas dbinom(x,) and dpois(x,) only return non-zero values for integer x.

double dbinom_raw(double x, double n, double p, double q, int give_log)
double dpois_raw (double x, double lambda, int give_log)

Note that dbinom_raw() gets both p and q = 1-p which may be advantageous when one of them is close to 1.

6.7.2 Mathematical functions

Function: double gammafn (double x)
Function: double lgammafn (double x)
Function: double digamma (double x)
Function: double trigamma (double x)
Function: double tetragamma (double x)
Function: double pentagamma (double x)
Function: double psigamma (double x, double deriv): The Gamma function, the natural logarithm of its absolute value and first four derivatives and the n-th derivative of Psi, the digamma function, which is the derivative of lgammafn. In other words, digamma(x) is the same as psigamma(x,0), trigamma(x) == psigamma(x,1), etc.

Function: double beta (double a, double b)
Function: double lbeta (double a, double b): The (complete) Beta function and its natural logarithm.

Function: double choose (double n, double k)
Function: double lchoose (double n, double k): The number of combinations of k items chosen from n and the natural logarithm of its absolute value, generalized to arbitrary real n. k is rounded to the nearest integer (with a warning if needed).

Function: double bessel_i (double x, double nu, double expo)
Function: double bessel_j (double x, double nu)
Function: double bessel_k (double x, double nu, double expo)
Function: double bessel_y (double x, double nu): Bessel functions of types I, J, K and Y with index nu. For bessel_i and bessel_k there is the option to return exp(-x) I(x; nu) or exp(x) K(x; nu) if expo is 2. (Use expo == 1 for unscaled values.)

6.7.3 Numerical Utilities

There are a few other numerical utility functions available as entry points.

Function: double R_pow (double x, double y)
Function: double R_pow_di (double x, int i): R_pow(x, y) and R_pow_di(x, i) compute x^y and x^i, respectively using R_FINITE checks and returning the proper result (the same as R) for the cases where x, y or i are 0 or missing or infinite or NaN.

Function: double log1p (double x)

Computes log(1 + x) (log 1 plus x), accurately even for small x, i.e., |x| << 1.

This should be provided by your platform, in which case it is not included in Rmath.h, but is (probably) in math.h which Rmath.h includes (except under C++, so it may not be declared for C++98).

Function: double log1pmx (double x): Computes log(1 + x) - x (log 1 plus x minus x), accurately even for small x, i.e., |x| << 1.

Function: double log1pexp (double x): Computes log(1 + exp(x)) (log 1 plus exp), accurately, notably for large x, e.g., x > 720.

Function: double expm1 (double x)

Computes exp(x) - 1 (exp x minus 1), accurately even for small x, i.e., |x| << 1.

This should be provided by your platform, in which case it is not included in Rmath.h, but is (probably) in math.h which Rmath.h includes (except under C++, so it may not be declared for C++98).

Function: double lgamma1p (double x): Computes log(gamma(x + 1)) (log(gamma(1 plus x))), accurately even for small x, i.e., 0 < x < 0.5.

Function: double cospi (double x)

Computes cos(pi * x) (where pi is 3.14159…), accurately, notably for half integer x.

This might be provided by your platform¹³⁹, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes. (Ensure that neither math.h nor cmath is included before Rmath.h or define

#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1

before the first inclusion.)

Function: double sinpi (double x)

Computes sin(pi * x) accurately, notably for (half) integer x.

This might be provided by your platform, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes (but see the comments for cospi).

Function: double tanpi (double x)

Computes tan(pi * x) accurately, notably for (half) integer x.

This might be provided by your platform, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes (but see the comments for cospi).

Function: double logspace_add (double logx, double logy)
Function: double logspace_sub (double logx, double logy)
Function: double logspace_sum (const double* logx, int n): Compute the log of a sum or difference from logs of terms, i.e., “x + y” as log (exp(logx) + exp(logy)) and “x - y” as log (exp(logx) - exp(logy)), and “sum_i x[i]” as log (sum[i = 1:n] exp(logx[i])), without causing unnecessary overflows or throwing away too much accuracy.

Function: int imax2 (int x, int y)
Function: int imin2 (int x, int y)
Function: double fmax2 (double x, double y)
Function: double fmin2 (double x, double y): Return the larger (max) or smaller (min) of two integer or double numbers, respectively. Note that fmax2 and fmin2 differ from C99/C++11’s fmax and fmin when one of the arguments is a NaN: these versions return NaN.
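The difference from C99’s fmax in the presence of a NaN can be shown with a plain C sketch. The function below is a hypothetical illustration of the NaN-propagating behaviour described for fmax2, not a copy of R’s source.

```c
#include <math.h>

/* NaN-propagating maximum: if either argument is a NaN the sum x + y
   is a NaN as well, so a NaN is returned. */
static double fmax2_sketch(double x, double y)
{
    if (isnan(x) || isnan(y))
        return x + y;
    return x < y ? y : x;
}
```

By contrast, C99’s fmax(NAN, 1.0) returns 1.0, treating the NaN as missing data.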

Function: double sign (double x): Compute the signum function, where sign(x) is 1, 0, or -1, when x is positive, 0, or negative, respectively, and NaN if x is a NaN.

Function: double fsign (double x, double y): Performs “transfer of sign” and is defined as |x| * sign(y).

Function: double fprec (double x, double digits)

Returns the value of x rounded to digits significant decimal digits.

This is the function used by R’s signif().

Function: double fround (double x, double digits)

Returns the value of x rounded to digits decimal digits (after the decimal point).

This is the function used by R’s round(). (Note that C99/C++11 provide a round function but C++98 need not.)

Function: double ftrunc (double x)

Returns the value of x truncated (to an integer value) towards zero.

(Note that C99/C++11 provide a trunc function but C++98 need not.)

6.7.4 Mathematical constants

R has a set of commonly used mathematical constants encompassing constants defined by POSIX and usually¹⁴⁰ found in math.h (but maybe not in the C++ header cmath) and contains further ones that are used in statistical computations. These are defined to (at least) 30 digits accuracy in Rmath.h. The following definitions use ln(x) for the natural logarithm (log(x) in R).

Name              Definition (ln = log)   round(value, 7)
M_E               e                       2.7182818
M_LOG2E           log2(e)                 1.4426950
M_LOG10E          log10(e)                0.4342945
M_LN2             ln(2)                   0.6931472
M_LN10            ln(10)                  2.3025851
M_PI              pi                      3.1415927
M_PI_2            pi/2                    1.5707963
M_PI_4            pi/4                    0.7853982
M_1_PI            1/pi                    0.3183099
M_2_PI            2/pi                    0.6366198
M_2_SQRTPI        2/sqrt(pi)              1.1283792
M_SQRT2           sqrt(2)                 1.4142136
M_SQRT1_2         1/sqrt(2)               0.7071068
M_SQRT_3          sqrt(3)                 1.7320508
M_SQRT_32         sqrt(32)                5.6568542
M_LOG10_2         log10(2)                0.3010300
M_2PI             2*pi                    6.2831853
M_SQRT_PI         sqrt(pi)                1.7724539
M_1_SQRT_2PI      1/sqrt(2*pi)            0.3989423
M_SQRT_2dPI       sqrt(2/pi)              0.7978846
M_LN_SQRT_PI      ln(sqrt(pi))            0.5723649
M_LN_SQRT_2PI     ln(sqrt(2*pi))          0.9189385
M_LN_SQRT_PId2    ln(sqrt(pi/2))          0.2257914

There is a set of constants (PI, DOUBLE_EPS and so on) defined (unless STRICT_R_HEADERS is defined) in the included header R_ext/Constants.h, mainly for compatibility with S.

Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using “logical” variables in C consistently. This can conflict with other software: for example it conflicts with the headers in IJG’s jpeg-9 (but not earlier versions).

6.8 Optimization

The C code underlying optim can be accessed directly. The user needs to supply a function to compute the function to be minimized, of the type

typedef double optimfn(int n, double *par, void *ex);

where the first argument is the number of parameters in the second argument. The third argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.

Some of the methods also require a gradient function

typedef void optimgr(int n, double *par, double *gr, void *ex);

which passes back the gradient in the gr argument. No function is provided for finite-differencing, nor for approximating the Hessian at the result.
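As an illustration of these two types, the following sketch defines an objective and its gradient for the Rosenbrock function. The typedefs are repeated from the text so that the sketch compiles without the R headers (package code would include R_ext/Applic.h instead); the struct rosen_params and the function names are hypothetical.

```c
/* Typedefs copied from the text above. */
typedef double optimfn(int n, double *par, void *ex);
typedef void optimgr(int n, double *par, double *gr, void *ex);

/* Hypothetical auxiliary data carried through the `ex` pointer. */
typedef struct { double a, b; } rosen_params;

/* f(x, y) = (a - x)^2 + b (y - x^2)^2; n is expected to be 2. */
static double rosen_fn(int n, double *par, void *ex)
{
    const rosen_params *p = ex;
    double x = par[0], y = par[1];
    (void) n;
    return (p->a - x) * (p->a - x) + p->b * (y - x * x) * (y - x * x);
}

/* Analytic gradient, passed back in gr[0] and gr[1]. */
static void rosen_gr(int n, double *par, double *gr, void *ex)
{
    const rosen_params *p = ex;
    double x = par[0], y = par[1];
    (void) n;
    gr[0] = -2.0 * (p->a - x) - 4.0 * p->b * x * (y - x * x);
    gr[1] = 2.0 * p->b * (y - x * x);
}

/* These assignments compile only if the definitions match the types. */
static optimfn *const objective = rosen_fn;
static optimgr *const gradient = rosen_gr;
```

Carrying the constants a and b through ex avoids global variables, which is the usual purpose of that argument.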

The interfaces (defined in header R_ext/Applic.h) are

Nelder Mead:

void nmmin(int n, double *xin, double *x, double *Fmin, optimfn fn,
           int *fail, double abstol, double intol, void *ex,
           double alpha, double beta, double gamma, int trace,
           int *fncount, int maxit);

BFGS:

void vmmin(int n, double *x, double *Fmin,
           optimfn fn, optimgr gr, int maxit, int trace,
           int *mask, double abstol, double reltol, int nREPORT,
           void *ex, int *fncount, int *grcount, int *fail);

Conjugate gradients:

void cgmin(int n, double *xin, double *x, double *Fmin,
           optimfn fn, optimgr gr, int *fail, double abstol,
           double intol, void *ex, int type, int trace,
           int *fncount, int *grcount, int maxit);

Limited-memory BFGS with bounds:

void lbfgsb(int n, int lmm, double *x, double *lower,
            double *upper, int *nbd, double *Fmin, optimfn fn,
            optimgr gr, int *fail, void *ex, double factr,
            double pgtol, int *fncount, int *grcount,
            int maxit, char *msg, int trace, int nREPORT);

Simulated annealing:

void samin(int n, double *x, double *Fmin, optimfn fn, int maxit,
           int tmax, double temp, int trace, void *ex);

Many of the arguments are common to the various methods. n is the number of parameters, x or xin is the starting parameters on entry and x the final parameters on exit, with final value returned in Fmin. Most of the other parameters can be found from the help page for optim: see the source code src/appl/lbfgsb.c for the values of nbd, which specifies which bounds are to be used.

6.9 Integration

The C code underlying integrate can be accessed directly. The user needs to supply a vectorizing C function to compute the function to be integrated, of the type

typedef void integr_fn(double *x, int n, void *ex);

where x[] is both input and output and has length n, i.e., a C function, say fn, of type integr_fn must basically do for(i in 1:n) x[i] := f(x[i], ex). The vectorization requirement can be used to speed up the integrand instead of calling it n times. Note that in the current implementation built on QUADPACK, n will be either 15 or 21. The ex argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
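A vectorizing integrand of this type might look like the following sketch. The typedef is repeated from the text so that the code compiles without the R headers; the integrand f(x) = 1/(1 + (x/s)^2), its name, and the use of ex to carry the scale s are assumptions made for illustration.

```c
/* Typedef copied from the text above. */
typedef void integr_fn(double *x, int n, void *ex);

/* Overwrite each abscissa x[i] by the integrand value f(x[i]);
   ex is assumed to point to a double holding the scale s. */
static void cauchy_kernel(double *x, int n, void *ex)
{
    double s = *(const double *) ex;
    for (int i = 0; i < n; i++) {
        double z = x[i] / s;
        x[i] = 1.0 / (1.0 + z * z);
    }
}

/* This assignment compiles only if the definition matches the type. */
static integr_fn *const integrand = cauchy_kernel;
```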

There are interfaces (defined in header R_ext/Applic.h) for integrals over finite and infinite intervals (or “ranges” or “integration boundaries”).

Finite:

void Rdqags(integr_fn f, void *ex, double *a, double *b,
            double *epsabs, double *epsrel,
            double *result, double *abserr, int *neval, int *ier,
            int *limit, int *lenw, int *last,
            int *iwork, double *work);

Infinite:

void Rdqagi(integr_fn f, void *ex, double *bound, int *inf,
            double *epsabs, double *epsrel,
            double *result, double *abserr, int *neval, int *ier,
            int *limit, int *lenw, int *last,
            int *iwork, double *work);

Only the 3rd and 4th arguments differ for the two integrators; for the finite range integral using Rdqags, a and b are the integration interval bounds, whereas for an infinite range integral using Rdqagi, bound is the finite bound of the integration (if the integral is not doubly-infinite) and inf is a code indicating the kind of integration range,

inf = 1: corresponds to (bound, +Inf),
inf = -1: corresponds to (-Inf, bound),
inf = 2: corresponds to (-Inf, +Inf),

f and ex define the integrand function, see above; epsabs and epsrel specify the absolute and relative accuracy requested. result, abserr and last are the output components value, abs.err and subdivisions of the R function integrate, neval gives the number of integrand function evaluations, and the error code ier is translated to the $message component of R’s integrate() (look at that function’s definition). limit corresponds to integrate(…, subdivisions = *). It seems you should always define the two work arrays and the length of the second one as

    lenw = 4 * limit;
    iwork =   (int *) R_alloc(limit, sizeof(int));
    work = (double *) R_alloc(lenw,  sizeof(double));

The comments in the source code in src/appl/integrate.c give more details, particularly about reasons for failure (ier >= 1).

6.10 Utility functions

R has a fairly comprehensive set of sort routines which are made available to users’ C code. The following are declared in header file Rinternals.h.

Function: void R_orderVector (int* indx, int n, SEXP arglist, Rboolean nalast, Rboolean decreasing)
Function: void R_orderVector1 (int* indx, int n, SEXP x, Rboolean nalast, Rboolean decreasing)

R_orderVector() corresponds to R’s order(…, na.last, decreasing). More specifically, indx <- order(x, y, na.last, decreasing) corresponds to R_orderVector(indx, n, Rf_lang2(x, y), nalast, decreasing) and for three vectors, Rf_lang3(x,y,z) is used as arglist.

Both R_orderVector and R_orderVector1 assume the vector indx to be allocated to length >= n. On return, indx[] contains a permutation of 0:(n-1), i.e., 0-based C indices (and not 1-based R indices, as R’s order()).

When ordering only one vector, R_orderVector1 is faster and corresponds (but is 0-based) to R’s indx <- order(x, na.last, decreasing). It was added in R 3.3.0.

All other sort routines are declared in header file R_ext/Utils.h (included by R.h) and include the following.

Function: void R_isort (int* x, int n)
Function: void R_rsort (double* x, int n)
Function: void R_csort (Rcomplex* x, int n)
Function: void rsort_with_index (double* x, int* index, int n)

The first three sort integer, real (double) and complex data respectively. (Complex numbers are sorted by the real part first then the imaginary part.) NAs are sorted last.

rsort_with_index sorts on x, and applies the same permutation to index. NAs are sorted last.

Function: void revsort (double* x, int* index, int n): Is similar to rsort_with_index but sorts into decreasing order, and NAs are not handled.

Function: void iPsort (int* x, int n, int k)
Function: void rPsort (double* x, int n, int k)
Function: void cPsort (Rcomplex* x, int n, int k): These all provide (very) partial sorting: they permute x so that x[k] is in the correct place with smaller values to the left, larger ones to the right.

Function: void R_qsort (double* v, size_t i, size_t j)
Function: void R_qsort_I (double* v, int* I, int i, int j)
Function: void R_qsort_int (int* iv, size_t i, size_t j)
Function: void R_qsort_int_I (int* iv, int* I, int i, int j)

These routines sort v[i:j] or iv[i:j] (using 1-indexing, i.e., v[1] is the first element) calling the quicksort algorithm as used by R’s sort(v, method = “quick”) and documented on the help page for the R function sort. The …_I() versions also return the sort.index() vector in I. Note that the ordering is not stable, so tied values may be permuted.

Note that NAs are not handled (explicitly) and you should use different sorting functions if NAs can be present.

Function: subroutine qsort4 (double precision v, integer indx, integer ii, integer jj)
Function: subroutine qsort3 (double precision v, integer ii, integer jj): The FORTRAN interface routines for sorting double precision vectors are qsort3 and qsort4, equivalent to R_qsort and R_qsort_I, respectively.

Function: void R_max_col (double* matrix, int* nr, int* nc, int* maxes, int* ties_meth): Given the nr by nc matrix matrix in column-major (“FORTRAN”) order, R_max_col() returns in maxes[i-1] the column number of the maximal element in the i-th row (the same as R’s max.col() function). In the case of ties (multiple maxima), *ties_meth is an integer code in 1:3 determining the method: 1 = “random”, 2 = “first” and 3 = “last”. See R’s help page ?max.col.
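
To make the column-major layout and the 1-based result concrete, here is a naive sketch of the “first” tie rule only. The name max_col_first is hypothetical and not part of the R API, NAs are not considered, and nc is assumed to be at least 1.

```c
/* Naive sketch of the "first" tie rule (ties_meth code 2 in R_max_col).
   max_col_first is a hypothetical name; NA handling is omitted. */
void max_col_first(const double *matrix, int nr, int nc, int *maxes)
{
    for (int i = 0; i < nr; i++) {
        int best = 0;
        for (int j = 1; j < nc; j++) {
            /* element (i, j) of a column-major matrix is matrix[i + nr * j];
               strict '>' keeps the first of several tied maxima */
            if (matrix[i + nr * j] > matrix[i + nr * best])
                best = j;
        }
        maxes[i] = best + 1;                 /* column numbers are 1-based */
    }
}
```

Unlike R_max_col(), this sketch takes nr and nc by value rather than through pointers.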

Function: int findInterval (double* xt, int n, double x, Rboolean rightmost_closed, Rboolean all_inside, int ilo, int* mflag)
Function: int findInterval2 (double* xt, int n, double x, Rboolean rightmost_closed, Rboolean all_inside, Rboolean left_open, int ilo, int* mflag)

Given the ordered vector xt of length n, return the interval or index of x in xt[], typically max(i; 1 <= i <= n & xt[i] <= x) where we use 1-indexing as in R and FORTRAN (but not C). If rightmost_closed is true, also returns n-1 if x equals xt[n]. If all_inside is not 0, the result is coerced to lie in 1:(n-1) even when x is outside the xt[] range. On return, *mflag equals -1 if x < xt[1], +1 if x >= xt[n], and 0 otherwise.

The algorithm is particularly fast when ilo is set to the last result of findInterval() and x is a value of a sequence which is increasing or decreasing for subsequent calls.

findInterval2() is a generalization of findInterval(), with an extra Rboolean argument left_open. Setting left_open = TRUE basically replaces all left-closed right-open intervals [s, t) by left-open ones (s, t], see the help page of R function findInterval for details.

There is also an F77_CALL(interv)() version of findInterval() with the same arguments, but all pointers.
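
The return value and *mflag conventions of findInterval() can be summarized by a naive linear scan. This is only a sketch of the documented semantics, not the search that R performs: find_interval_naive is a hypothetical name, the ilo hint and left_open are omitted, n is assumed to be at least 1 (at least 2 if all_inside is set), and NaN inputs are not considered.

```c
/* Naive reference for the documented results of findInterval().
   xt[] is a 0-based C array; the returned index is 1-based. */
int find_interval_naive(const double *xt, int n, double x,
                        int rightmost_closed, int all_inside, int *mflag)
{
    if (x < xt[0]) {                         /* left of the whole range */
        *mflag = -1;
        return all_inside ? 1 : 0;
    }
    if (x >= xt[n - 1]) {                    /* at or beyond the last knot */
        *mflag = 1;
        if (all_inside || (rightmost_closed && x == xt[n - 1]))
            return n - 1;
        return n;
    }
    *mflag = 0;
    int i = 1;                               /* 1-based candidate index */
    while (i < n && xt[i] <= x)              /* C xt[i] is 1-based xt[i+1] */
        i++;
    return i;                                /* largest i with xt[i] <= x */
}
```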

A system-independent interface to produce the name of a temporary file is provided as

Function: char * R_tmpnam (const char *prefix, const char *tmpdir)
Function: char * R_tmpnam2 (const char *prefix, const char *tmpdir, const char *fileext): Return a pathname for a temporary file with name beginning with prefix and ending with fileext in directory tmpdir. A NULL prefix or extension is replaced by "". Note that the return value is malloced and should be freed when no longer needed (unlike the system call tmpnam).

There is also the internal function used to expand file names in several R functions, and called directly by path.expand.

Function: const char * R_ExpandFileName (const char *fn): Expand a path name fn by replacing a leading tilde by the user’s home directory (if defined). The precise meaning is platform-specific; it will usually be taken from the environment variable HOME if this is defined.

For historical reasons there are FORTRAN interfaces to functions D1MACH and I1MACH. These can be called from C code as e.g. F77_CALL(d1mach)(4). Note that these are emulations of the original functions by Fox, Hall and Schryer on NetLib at http://www.netlib.org/slatec/src/ for IEC 60559 arithmetic (required by R).

6.11 Re-encoding

R has its own C-level interface to the encoding conversion capabilities provided by iconv because there are incompatibilities between the declarations in different implementations of iconv.

These are declared in header file R_ext/Riconv.h.

Function: void * Riconv_open (const char *to, const char *from)

Set up a pointer to an encoding object to be used to convert between two encodings: "" indicates the current locale.

Function: size_t Riconv (void *cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)

Convert as much as possible of inbuf to outbuf. Initially the size_t variables indicate the number of bytes available in the buffers, and they are updated (and the char pointers are updated to point to the next free byte in the buffer). The return value is the number of characters converted, or (size_t)-1 (beware: size_t is usually an unsigned type). It should be safe to assume that an error condition sets errno to one of E2BIG (the output buffer is full), EILSEQ (the input cannot be converted, and might be invalid in the encoding specified) or EINVAL (the input does not end with a complete multi-byte character).

Function: int Riconv_close (void * cd)

Free the resources of an encoding object.
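
A typical caller has to retry with a larger output buffer when E2BIG is reported. The sketch below shows one way such a loop might look. To keep it compilable without R it takes the converter as a function pointer with the same argument list as Riconv; in package code one would pass Riconv together with the object returned by Riconv_open, and call Riconv_close afterwards. The names conv_fn, convert_all and dup_conv are hypothetical, and dup_conv is merely a stand-in converter (it writes every input byte twice) for exercising the loop.

```c
#include <errno.h>
#include <stdlib.h>

/* Same argument list as Riconv(). */
typedef size_t (*conv_fn)(void *cd, const char **inbuf, size_t *inbytesleft,
                          char **outbuf, size_t *outbytesleft);

/* Convert len bytes of in; returns a malloc-ed, NUL-terminated result,
   or NULL on allocation failure or on any error other than E2BIG. */
char *convert_all(conv_fn conv, void *cd, const char *in, size_t len)
{
    size_t cap = len + 1, used = 0, inleft = len;
    const char *ip = in;
    char *out = malloc(cap);
    if (!out) return NULL;
    for (;;) {
        char *op = out + used;
        size_t outleft = cap - used - 1;     /* keep room for the NUL */
        size_t res = conv(cd, &ip, &inleft, &op, &outleft);
        used = (size_t)(op - out);           /* bytes written so far */
        if (res != (size_t)-1) break;        /* all input consumed */
        if (errno != E2BIG) { free(out); return NULL; }
        cap *= 2;                            /* output full: grow and retry */
        char *tmp = realloc(out, cap);
        if (!tmp) { free(out); return NULL; }
        out = tmp;
    }
    out[used] = '\0';
    return out;
}

/* Stand-in converter for trying the loop without R: doubles each byte. */
static size_t dup_conv(void *cd, const char **inbuf, size_t *inbytesleft,
                       char **outbuf, size_t *outbytesleft)
{
    (void)cd;
    while (*inbytesleft > 0) {
        if (*outbytesleft < 2) { errno = E2BIG; return (size_t)-1; }
        *(*outbuf)++ = **inbuf;
        *(*outbuf)++ = **inbuf;
        (*inbuf)++; (*inbytesleft)--; *outbytesleft -= 2;
    }
    return 0;
}
```

The sketch treats EILSEQ and EINVAL as fatal; a caller that wants to substitute or skip unconvertible input would need to handle those cases separately.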

6.12 Allowing interrupts

No port of R can be interrupted whilst running long computations in compiled code, so programmers should make provision for the code to be interrupted at suitable points by calling from C

#include <R_ext/Utils.h>

void R_CheckUserInterrupt(void);

and from FORTRAN

subroutine rchkusr()

These check if the user has requested an interrupt, and if so branch to R’s error handling functions.

Note that it is possible that the code behind one of the entry points defined here if called from your C or FORTRAN code could be interruptible or generate an error and so not return to your code.
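
Checking on every iteration of a tight loop can be wasteful, so a common pattern is to poll only every so many iterations. A minimal sketch of that pattern follows. So that it compiles without R, the check is passed in as a function pointer; in package code this would be R_CheckUserInterrupt. The names sum_with_checks, n_checks and count_check are hypothetical, and count_check is only a stand-in that counts how often it was called.

```c
#include <stddef.h>

/* Sums x[0..n-1], calling check() once every `every` iterations.
   In package code check would be R_CheckUserInterrupt. */
double sum_with_checks(const double *x, size_t n, size_t every,
                       void (*check)(void))
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (check && every > 0 && i % every == 0)
            check();       /* with R, may not return if interrupted */
        total += x[i];
    }
    return total;
}

/* Stand-in for trying the sketch outside R: counts the polls. */
static int n_checks = 0;
static void count_check(void) { n_checks++; }
```

Because the real check may branch to R’s error handling and never return, anything acquired before the loop that R does not manage itself would not be released by code placed after the loop.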

6.13 Platform and version information

The header files define USING_R, which can be used to test if the code is indeed being used with R.

Header file Rconfig.h (included by R.h) is used to define platform-specific macros that are mainly for use in other header files. The macro WORDS_BIGENDIAN is defined on big-endian¹⁴¹ systems (e.g. most OSes on Sparc and PowerPC hardware) and not on little-endian systems (nowadays all the commoner R platforms). It can be useful when manipulating binary files. NB: these macros apply only to the C compiler used to build R, not necessarily to another C or C++ compiler.

Header file Rversion.h (not included by R.h) defines a macro R_VERSION giving the version number encoded as an integer, plus a macro R_Version to do the encoding. This can be used to test if the version of R is late enough, or to include back-compatibility features. For protection against very old versions of R which did not have this macro, use a construction such as

#if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
  ...
#endif

More detailed information is available in the macros R_MAJOR, R_MINOR, R_YEAR, R_MONTH and R_DAY: see the header file Rversion.h for their format. Note that the minor version includes the patchlevel (as in ‘2.2’).
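
The integer encoding makes versions comparable with ordinary relational operators. As a sketch, and assuming the arithmetic used by the R_Version macro in current versions of Rversion.h (major times 65536, plus minor times 256, plus patch level), the same computation can be written as a plain function. The name encode_r_version is hypothetical and exists only so that the example compiles without R.

```c
/* Assumed to mirror the R_Version(v, p, s) macro from Rversion.h:
   v * 65536 + p * 256 + s.  encode_r_version is a hypothetical name. */
int encode_r_version(int major, int minor, int patch)
{
    return major * 65536 + minor * 256 + patch;
}
```

Under this assumption, version 3.1.0 encodes as 196864, and a later patch level or minor version always yields a larger integer as long as each component stays below 256.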

Packages which use alloca need to ensure it is defined: as it is part of neither C nor POSIX there is no standard way to do so. One can use

#include <Rconfig.h> // for HAVE_ALLOCA_H
#ifdef __GNUC__
// this covers gcc, clang, icc
# undef alloca
# define alloca(x) __builtin_alloca((x))
#elif defined(HAVE_ALLOCA_H)
// needed for native compilers on Solaris and AIX
# include <alloca.h>
#endif

(and this should be included before standard C headers such as stdlib.h, since on some platforms these include malloc.h which may have a conflicting definition), which suffices for known R platforms.

6.14 Inlining C functions

The C99 keyword inline should be recognized by all compilers nowadays used to build R. Portable code which might be used with earlier versions of R can be written using the macro R_INLINE (defined in file Rconfig.h included by R.h), as for example from package cluster

#include <R.h>

static R_INLINE int ind_2(int l, int j)
{
...
}

Be aware that using inlining with functions in more than one compilation unit is almost impossible to do portably, see http://www.greenend.org.uk/rjk/2003/03/inline.html, so this usage is for static functions as in the example. All the R configure code has checked is that R_INLINE can be used in a single C file with the compiler used to build R. We recommend that packages making extensive use of inlining include their own configure code.

6.15 Controlling visibility

Header R_ext/Visibility.h has some definitions for controlling the visibility of entry points. These are only effective when ‘HAVE_VISIBILITY_ATTRIBUTE’ is defined – this is checked when R is configured and recorded in header Rconfig.h (included by R_ext/Visibility.h). It is often defined on modern Unix-alikes with a recent compiler¹⁴², but not supported on macOS nor Windows. Minimizing the visibility of symbols in a shared library will both speed up its loading (unlikely to be significant) and reduce the possibility of linking to other entry points of the same name.

C/C++ entry points prefixed by attribute_hidden will not be visible in the shared object. There is no comparable mechanism for FORTRAN entry points, but there is a more comprehensive scheme used by, for example, package stats. Most compilers which allow control of visibility will allow control of visibility for all symbols via a flag, and where known the flag is encapsulated in the macros ‘C_VISIBILITY’ and ‘F77_VISIBILITY’ for C and FORTRAN compilers. These are defined in etc/Makeconf and so available for normal compilation of package code. For example, src/Makevars could include

PKG_CFLAGS=$(C_VISIBILITY)
PKG_FFLAGS=$(F77_VISIBILITY)

This would end up with no visible entry points, which would be pointless. However, the effect of the flags can be overridden by using the attribute_visible prefix. A shared object which registers its entry points needs only to have one visible entry point, its initializer, so for example package stats has

void attribute_visible R_init_stats(DllInfo *dll)
{
    R_registerRoutines(dll, CEntries, CallEntries, FortEntries, NULL);
    R_useDynamicSymbols(dll, FALSE);
...
}

The visibility mechanism is not available on Windows, but there is an equally effective way to control which entry points are visible, by supplying a definitions file pkgname/src/pkgname-win.def: only entry points listed in that file will be visible. Again using stats as an example, it has

LIBRARY stats.dll
EXPORTS
 R_init_stats

6.16 Using these functions in your own C code

It is possible to build Mathlib, the R set of mathematical functions documented in Rmath.h, as a standalone library libRmath under both Unix-alikes and Windows. (This includes the functions documented in Numerical analysis subroutines as from that header file.)

The library is not built automatically when R is installed, but can be built in the directory src/nmath/standalone in the R sources: see the file README there. To use the code in your own C program include

#define MATHLIB_STANDALONE
#include <Rmath.h>

and link against ‘-lRmath’ (and perhaps ‘-lm’). There is an example file test.c.

A little care is needed to use the random-number routines. You will need to supply the uniform random number generator

double unif_rand(void)

or use the one supplied (and with a dynamic library or DLL you will have to use the one supplied, which is the Marsaglia-multicarry with an entry point

set_seed(unsigned int, unsigned int)

to set its seeds and

get_seed(unsigned int *, unsigned int *)

to read the seeds).
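
For the first option, supplying your own generator amounts to defining a function with exactly the prototype shown above. The following is a toy sketch only: it uses a 32-bit xorshift generator, which is not the Marsaglia-multicarry generator supplied with the library and is not recommended for serious work. The variable rng_state is a hypothetical name, and the sketch assumes that values strictly between 0 and 1 are what the random-number routines expect.

```c
#include <stdint.h>

/* State of the toy generator; any non-zero seed will do. */
static uint32_t rng_state = 2463534242u;

/* User-supplied uniform generator (illustration only). */
double unif_rand(void)
{
    /* one step of a 32-bit xorshift generator; the state never becomes 0 */
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    /* map 1..2^32-1 into the open interval (0, 1) */
    return ((double)rng_state + 0.5) / 4294967296.0;
}
```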

6.17 Organization of header files

The header files which R installs are in directory R_INCLUDE_DIR (default R_HOME/include). This currently includes

R.h includes many other files

S.h different version for code ported from S

Rinternals.h definitions for using R’s internal structures

Rdefines.h macros for an S-like interface to the above (no longer maintained)

Rmath.h standalone math library

Rversion.h R version information

Rinterface.h for add-on front-ends (Unix-alikes only)

Rembedded.h for add-on front-ends

R_ext/Applic.h optimization and integration

R_ext/BLAS.h C definitions for BLAS routines

R_ext/Callbacks.h C (and R function) top-level task handlers

R_ext/GetX11Image.h X11Image interface used by package trkplot

R_ext/Lapack.h C definitions for some LAPACK routines

R_ext/Linpack.h C definitions for some LINPACK routines, not all of which are included in R

R_ext/Parse.h a small part of R’s parse interface: not part of the stable API.

R_ext/RStartup.h for add-on front-ends

R_ext/Rdynload.h needed to register compiled code in packages

R_ext/R-ftp-http.h interface to internal method of download.file

R_ext/Riconv.h interface to iconv

R_ext/Visibility.h definitions controlling visibility

R_ext/eventloop.h for add-on front-ends and for packages that need to share in the R event loops (not Windows)

The following headers are included by R.h:

Rconfig.h configuration info that is made available

R_ext/Arith.h handling for NAs, NaNs, Inf/-Inf

R_ext/Boolean.h TRUE/FALSE type

R_ext/Complex.h C typedefs for R’s complex

R_ext/Constants.h constants

R_ext/Error.h error handling

R_ext/Memory.h memory allocation

R_ext/Print.h Rprintf and variations.

R_ext/RS.h definitions common to R.h and S.h, including F77_CALL etc.

R_ext/Random.h random number generation

R_ext/Utils.h sorting and other utilities

R_ext/libextern.h definitions for exports from R.dll on Windows.

The graphics systems are exposed in headers R_ext/GraphicsEngine.h, R_ext/GraphicsDevice.h (which it includes) and R_ext/QuartzDevice.h. Facilities for defining custom connection implementations are provided in R_ext/Connections.h, but make sure you consult the file before use.

Let us re-iterate the advice to include system headers before the R header files, especially Rinternals.h (included by Rdefines.h) and Rmath.h, which redefine names which may be used in system headers (fewer if ‘R_NO_REMAP’ is defined, or ‘R_NO_REMAP_RMATH’ for Rmath.h).

