Chapter 6 The R API: entry points for C code
There are a large number of entry points in the R executable/DLL that can be called from C code (and some that can be called from FORTRAN code). Only those documented here are stable enough that they will only be changed with considerable notice.
The recommended procedure to use these is to include the header file R.h in your C code by
#include <R.h>
This will include several other header files from the directory R_INCLUDE_DIR/R_ext, and there are other header files there that can be included too, but many of the features they contain should be regarded as undocumented and unstable.
Most of these header files, including all those included by R.h, can be used from C++ code.
Note: Because R re-maps many of its external names to avoid clashes with user code, it is essential to include the appropriate header files when using these entry points.
This remapping can cause problems, and can be eliminated by defining R_NO_REMAP and prepending ‘Rf_’ to all the function names used from Rinternals.h and R_ext/Error.h. These problems can usually be avoided by including other headers (such as system headers and those for external software used by the package) before R.h.
We can classify the entry points as
- API: Entry points which are documented in this manual and declared in an installed header file. These can be used in distributed packages and will only be changed after deprecation.
- public: Entry points declared in an installed header file that are exported on all R platforms but are not documented and subject to change without notice.
- private: Entry points that are used when building R and exported on all R platforms but are not declared in the installed header files. Do not use these in distributed code.
- hidden: Entry points that are, where possible (Windows and some modern Unix-alike compilers/loaders when using R as a shared library), not exported.
6.1 Memory allocation
There are two types of memory allocation available to the C programmer, one in which R manages the clean-up and the other in which the user has full control (and responsibility).
6.1.1 Transient storage allocation
Here R will reclaim the memory at the end of the call to .C, .Call or .External. Use
char *R_alloc(size_t n, int size)
which allocates n units of size bytes each. A typical usage (from package stats) is
x = (int *) R_alloc(nrows(merge)+2, sizeof(int));
(size_t is defined in stddef.h which the header defining R_alloc includes.)
There is a similar call, S_alloc (for compatibility with older versions of S) which zeroes the memory allocated,
char *S_alloc(long n, int size)
and
char *S_realloc(char *p, long new, long old, int size)
which changes the allocation size from old to new units, and zeroes the additional units.
For compatibility with current versions of S, header S.h (only) defines wrapper macros equivalent to
type* Salloc(long n, int type)
type* Srealloc(char *p, long new, long old, int type)
This memory is taken from the heap, and released at the end of the .C, .Call or .External call. Users can also manage it, by noting the current position with a call to vmaxget and subsequently clearing memory allocated by a call to vmaxset. An example might be
void *vmax = vmaxget();
// a loop involving the use of R_alloc at each iteration
vmaxset(vmax);
This is only recommended for experts.
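The mark-and-release idea behind vmaxget and vmaxset can be illustrated without R. The following is only a toy sketch in standard C: the arena, its size and every toy_* name are hypothetical stand-ins, and none of it reflects how R actually implements R_alloc.

```c
#include <stddef.h>

/* Toy arena: every name here is hypothetical and none is part of R. */
#define TOY_ARENA_DOUBLES 512
static double toy_arena[TOY_ARENA_DOUBLES];  /* storage aligned for double */
static size_t toy_top = 0;                   /* doubles currently in use */

/* Analogue of vmaxget(): note the current position. */
size_t toy_mark(void) { return toy_top; }

/* Analogue of vmaxset(): release everything allocated since the mark. */
void toy_release(size_t mark) { toy_top = mark; }

/* Analogue of R_alloc(n, size); returns NULL when the toy arena is full. */
void *toy_alloc(size_t n, size_t size)
{
    /* number of doubles needed to hold n * size bytes, rounded up */
    size_t need = (n * size + sizeof(double) - 1) / sizeof(double);
    if (need > TOY_ARENA_DOUBLES - toy_top) return NULL;
    void *p = &toy_arena[toy_top];
    toy_top += need;
    return p;
}

/* A loop that allocates at each iteration, bracketed by mark and release. */
int toy_loop(int iterations)
{
    int status = 0;
    size_t mark = toy_mark();
    for (int i = 0; i < iterations; i++) {
        int *scratch = (int *) toy_alloc(100, sizeof(int));
        if (scratch == NULL) { status = -1; break; }  /* arena exhausted */
        scratch[0] = i;
    }
    toy_release(mark);   /* everything allocated in the loop is given back */
    return status;
}
```

Whatever the loop allocated is returned in one step by the release, which is the effect the vmaxset call has in the example above.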
Note that this memory will be freed on error or user interrupt (if allowed: see Allowing interrupts).
The memory returned is only guaranteed to be aligned as required for double pointers: take precautions if casting to a pointer which needs more. There is also
long double *R_allocLD(size_t n)
which is guaranteed to have the 16-byte alignment needed for long double pointers on some platforms.
These functions should only be used in code called by .C etc, never from front-ends. They are not thread-safe.
6.1.2 User-controlled memory
The other form of memory allocation is an interface to malloc, the interface providing R error handling. This memory lasts until freed by the user and is additional to the memory allocated for the R workspace.
The interface functions are
type* Calloc(size_t n, type)
type* Realloc(any *p, size_t n, type)
void Free(any *p)
providing analogues of calloc, realloc and free. If there is an error during allocation it is handled by R, so if these routines return the memory has been successfully allocated or freed. Free will set the pointer p to NULL. (Some but not all versions of S do so.)
Users should arrange to Free this memory when no longer needed, including on error or user interrupt. This can often be done most conveniently from an on.exit action in the calling R function – see pwilcox for an example.
Do not assume that memory allocated by Calloc/Realloc comes from the same pool as used by malloc: in particular do not use free or strdup with it.
Memory obtained by these functions should be aligned in the same way as malloc, that is ‘suitably aligned for any kind of variable’.
These entry points need to be prefixed by R_ if STRICT_R_HEADERS has been defined.
6.2 Error handling
The basic error handling routines are the equivalents of stop and warning in R code, and use the same interface.
void error(const char * format, ...);
void warning(const char * format, ...);
These have the same call sequences as calls to printf, but in the simplest case can be called with a single character string argument giving the error message. (Don’t do this if the string contains ‘%’ or might otherwise be interpreted as a format.)
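The caveat about ‘%’ is the usual printf one, and the usual remedy is to pass the text as an argument to a "%s" format rather than as the format itself. The sketch below shows this with snprintf from standard C. The helper name format_user_message is hypothetical, and since error and warning have the same call sequence as printf, the same pattern is expected to apply to them.

```c
#include <stdio.h>

/* Copy a caller-supplied message into buf.  The text is passed as an
   argument to "%s", never as the format, so any '%' in it is copied
   literally.  Returns the value of snprintf(). */
int format_user_message(char *buf, size_t buflen, const char *user_text)
{
    return snprintf(buf, buflen, "%s", user_text);
}
```

A message such as "100% done: %s" passes through this helper unchanged, whereas using it directly as a format would make printf look for an argument that does not exist.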
If STRICT_R_HEADERS is not defined there is also an S-compatibility interface which uses calls of the form
PROBLEM ...... ERROR
MESSAGE ...... WARN
PROBLEM ...... RECOVER(NULL_ENTRY)
MESSAGE ...... WARNING(NULL_ENTRY)
the last two being the forms available in all S versions. Here ‘……’ is a set of arguments to printf, so can be a string or a format string followed by arguments separated by commas.
6.2.1 Error handling from FORTRAN
There are two interface functions provided to call error and warning from FORTRAN code, in each case with a simple character string argument. They are defined as
subroutine rexit(message)
subroutine rwarn(message)
Messages of more than 255 characters are truncated, with a warning.
6.3 Random number generation
The interface to R’s internal random number generation routines is
double unif_rand();
double norm_rand();
double exp_rand();
giving one uniform, normal or exponential pseudo-random variate. However, before these are used, the user must call
GetRNGstate();
and after all the required variates have been generated, call
PutRNGstate();
These essentially read in (or create) .Random.seed and write it out after use.
File S.h defines seed_in and seed_out for S-compatibility rather than GetRNGstate and PutRNGstate. These take a long * argument which is ignored.
The random number generator is private to R; there is no way to select the kind of RNG or set the seed except by evaluating calls to the R functions.
The C code behind R’s rxxx functions can be accessed by including the header file Rmath.h; see Distribution functions. Those calls generate a single variate and should also be enclosed in calls to GetRNGstate and PutRNGstate.
6.4 Missing and IEEE special values
A set of functions is provided to test for NA, Inf, -Inf and NaN. These functions are accessed via macros:
ISNA(x) True for R’s NA only
ISNAN(x) True for R’s NA and IEEE NaN
R_FINITE(x) False for Inf, -Inf, NA, NaN
and via function R_IsNaN which is true for NaN but not NA.
Do use R_FINITE rather than isfinite or finite; the latter is often mendacious and isfinite is only available on some platforms, on which R_FINITE is a macro expanding to isfinite.
Currently in C code ISNAN is a macro calling isnan. (Since this gives problems on some C++ systems, if the R headers are included from C++ code a function call is used.)
You can check for Inf or -Inf by testing equality to R_PosInf or R_NegInf, and set (but not test) an NA as NA_REAL.
All of the above apply to double variables only. For integer variables there is a variable accessed by the macro NA_INTEGER which can be used to set or test for missingness.
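Why ISNA and ISNAN can disagree, and why NA can be set but not tested by equality, is easier to see with a toy model. The sketch below assumes 64-bit IEEE doubles and assumes that NA is represented as a NaN whose low 32 bits hold the value 1954. That is how current R sources are understood to work, but it is an implementation detail rather than part of the API, and the toy_* names are hypothetical. Real code should use the macros.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a NaN whose low 32 bits are 1954: a toy stand-in for NA_REAL
   (assumed representation, see the text above). */
double toy_na(void)
{
    uint64_t bits = UINT64_C(0x7FF0000000000000) | UINT64_C(1954);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Toy analogue of ISNA(x): a NaN carrying the 1954 payload. */
int toy_is_na(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == 1954;
}

/* Toy analogue of R_IsNaN(x): a NaN that is not NA. */
int toy_is_nan_not_na(double x)
{
    return isnan(x) && !toy_is_na(x);
}
```

Because the toy NA is itself a NaN, it compares unequal to everything including itself, so a test has to inspect the bit pattern rather than use ==.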
6.5 Printing
The most useful function for printing from a C routine compiled into R is Rprintf. This is used in exactly the same way as printf, but is guaranteed to write to R’s output (which might be a GUI console rather than a file, and can be re-directed by sink). It is wise to write complete lines (including the “\n”) before returning to R. It is defined in R_ext/Print.h.
The function REprintf is similar but writes on the error stream (stderr) which may or may not be different from the standard output stream.
Functions Rvprintf and REvprintf are analogues using the vprintf interface. Because that is a C99 interface, they are only defined by R_ext/Print.h in C++ code if the macro R_USE_C99_IN_CXX is defined when it is included.
Another circumstance when it may be important to use these functions is when using parallel computation on a cluster of computational nodes, as their output will be re-directed/logged appropriately.
6.5.1 Printing from FORTRAN
On many systems FORTRAN write and print statements can be used, but the output may not interleave well with that of C, and will be invisible on GUI interfaces. They are not portable and best avoided.
Three subroutines are provided to ease the output of information from FORTRAN code.
subroutine dblepr(label, nchar, data, ndata)
subroutine realpr(label, nchar, data, ndata)
subroutine intpr (label, nchar, data, ndata)
Here label is a character label of up to 255 characters, nchar is its length (which can be -1 if the whole label is to be used), and data is an array of length at least ndata of the appropriate type (double precision, real and integer respectively). These routines print the label on one line and then print data as if it were an R vector on subsequent line(s). They work with zero ndata, and so can be used to print a label alone.
6.6 Calling C from FORTRAN and vice versa
Naming conventions for symbols generated by FORTRAN differ by platform: it is not safe to assume that FORTRAN names appear to C with a trailing underscore. To help cover up the platform-specific differences there is a set of macros that should be used.
- F77_SUB(name): to define a function in C to be called from FORTRAN
- F77_NAME(name): to declare a FORTRAN routine in C before use
- F77_CALL(name): to call a FORTRAN routine from C
- F77_COMDECL(name): to declare a FORTRAN common block in C
- F77_COM(name): to access a FORTRAN common block from C
On most current platforms these are all the same, but it is unwise to rely on this. Note that names with underscores are not legal in FORTRAN 77, and are not portably handled by the above macros. (Also, all FORTRAN names for use by R are lower case, but this is not enforced by the macros.)
For example, suppose we want to call R’s normal random numbers from FORTRAN. We need a C wrapper along the lines of
#include <R.h>
void F77_SUB(rndstart)(void) { GetRNGstate(); }
void F77_SUB(rndend)(void) { PutRNGstate(); }
double F77_SUB(normrnd)(void) { return norm_rand(); }
to be called from FORTRAN as in
subroutine testit()
double precision normrnd, x
call rndstart()
x = normrnd()
call dblepr("X was", 5, x, 1)
call rndend()
end
Note that this is not guaranteed to be portable, for the return conventions might not be compatible between the C and FORTRAN compilers used. (Passing values via arguments is safer.)
The standard packages, for example stats, are a rich source of further examples.
Passing character strings from C to FORTRAN 77 or vice versa is not portable (and to Fortran 90 or later is even less so). We have found that it helps to ensure that a C string to be passed is followed by several nuls (and not just the one needed as a C terminator). But for maximal portability character strings in FORTRAN should be avoided.
6.7 Numerical analysis subroutines
R contains a large number of mathematical functions for its own use, for example numerical linear algebra computations and special functions.
The header files R_ext/BLAS.h, R_ext/Lapack.h and R_ext/Linpack.h contain declarations of the BLAS, LAPACK and LINPACK linear algebra functions included in R. These are expressed as calls to FORTRAN subroutines, and they will also be usable from users’ FORTRAN code. Although not part of the official API, this set of subroutines is unlikely to change (but might be supplemented).
The header file Rmath.h lists many other functions that are available and documented in the following subsections. Many of these are C interfaces to the code behind R functions, so the R function documentation may give further details.
6.7.1 Distribution functions
The routines used to calculate densities, cumulative distribution functions and quantile functions for the standard statistical distributions are available as entry points.
The arguments for the entry points follow the pattern of those for the normal distribution:
double dnorm(double x, double mu, double sigma, int give_log);
double pnorm(double x, double mu, double sigma, int lower_tail,
int give_log);
double qnorm(double p, double mu, double sigma, int lower_tail,
int log_p);
double rnorm(double mu, double sigma);
That is, the first argument gives the position for the density and CDF and probability for the quantile function, followed by the distribution’s parameters. Argument lower_tail should be TRUE (or 1) for normal use, but can be FALSE (or 0) if the probability of the upper tail is desired or specified.
Finally, give_log should be non-zero if the result is required on log scale, and log_p should be non-zero if p has been specified on log scale.
Note that you directly get the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by using
- pdist(t, ..., /*lower_tail = */ FALSE, /* give_log = */ TRUE)
or shorter (and more cryptic) - pdist(t, …, 0, 1).
The random-variate generation routine rnorm returns one normal variate. See Random numbers, for the protocol in using the random-variate routines.
Note that these argument sequences are (apart from the names and that rnorm has no n) mainly the same as the corresponding R functions of the same name, so the documentation of the R functions can be used. Note that the exponential and gamma distributions are parametrized by scale rather than rate.
For reference, the following table gives the basic name (to be prefixed by ‘d’, ‘p’, ‘q’ or ‘r’ apart from the exceptions noted) and distribution-specific arguments for the complete set of distributions.
Distribution               Name        Arguments
beta                       beta        a, b
non-central beta           nbeta       a, b, ncp
binomial                   binom       n, p
Cauchy                     cauchy      location, scale
chi-squared                chisq       df
non-central chi-squared    nchisq      df, ncp
exponential                exp         scale (and not rate)
F                          f           n1, n2
non-central F              nf          n1, n2, ncp
gamma                      gamma       shape, scale
geometric                  geom        p
hypergeometric             hyper       NR, NB, n
logistic                   logis       location, scale
lognormal                  lnorm       logmean, logsd
negative binomial          nbinom      size, prob
normal                     norm        mu, sigma
Poisson                    pois        lambda
Student’s t                t           n
non-central t              nt          df, delta
Studentized range          tukey (*)   rr, cc, df
uniform                    unif        a, b
Weibull                    weibull     shape, scale
Wilcoxon rank sum          wilcox      m, n
Wilcoxon signed rank       signrank    n
Entries marked with an asterisk only have ‘p’ and ‘q’ functions available, and none of the non-central distributions have ‘r’ functions. After a call to dwilcox, pwilcox or qwilcox the function wilcox_free() should be called, and similarly for the signed rank functions.
(If remapping is suppressed, the Normal distribution names are Rf_dnorm4, Rf_pnorm5 and Rf_qnorm5.)
For the negative binomial distribution (‘nbinom’), in addition to the (size, prob) parametrization, the alternative (size, mu) parametrization is provided as well by functions ‘[dpqr]nbinom_mu()’, see ?NegBinomial in R.
Functions dpois_raw(x, ) and dbinom_raw(x, ) are versions of the Poisson and binomial probability mass functions which work continuously in x, whereas dbinom(x,) and dpois(x,) only return non-zero values for integer x.
double dbinom_raw(double x, double n, double p, double q, int give_log)
double dpois_raw (double x, double lambda, int give_log)
Note that dbinom_raw() gets both p and q = 1-p which may be advantageous when one of them is close to 1.
6.7.2 Mathematical functions
- Function: double gammafn (double x)
  Function: double lgammafn (double x)
  Function: double digamma (double x)
  Function: double trigamma (double x)
  Function: double tetragamma (double x)
  Function: double pentagamma (double x)
  Function: double psigamma (double x, double deriv)
  The Gamma function, the natural logarithm of its absolute value and first four derivatives and the n-th derivative of Psi, the digamma function, which is the derivative of lgammafn. In other words, digamma(x) is the same as psigamma(x,0), trigamma(x) == psigamma(x,1), etc.
- Function: double beta (double a, double b)
  Function: double lbeta (double a, double b)
  The (complete) Beta function and its natural logarithm.
- Function: double choose (double n, double k)
  Function: double lchoose (double n, double k)
  The number of combinations of k items chosen from n and the natural logarithm of its absolute value, generalized to arbitrary real n. k is rounded to the nearest integer (with a warning if needed).
- Function: double bessel_i (double x, double nu, double expo)
  Function: double bessel_j (double x, double nu)
  Function: double bessel_k (double x, double nu, double expo)
  Function: double bessel_y (double x, double nu)
  Bessel functions of types I, J, K and Y with index nu. For bessel_i and bessel_k there is the option to return exp(-x) I(x; nu) or exp(x) K(x; nu) if expo is 2. (Use expo == 1 for unscaled values.)
6.7.3 Numerical Utilities
There are a few other numerical utility functions available as entry points.
- Function: double R_pow (double x, double y)
  Function: double R_pow_di (double x, int i)
  R_pow(x, y) and R_pow_di(x, i) compute x^y and x^i, respectively, using R_FINITE checks and returning the proper result (the same as R) for the cases where x, y or i are 0 or missing or infinite or NaN.
- Function: double log1p (double x)
  Computes log(1 + x) (log 1 plus x), accurately even for small x, i.e., |x| << 1.
  This should be provided by your platform, in which case it is not included in Rmath.h, but is (probably) in math.h which Rmath.h includes (except under C++, so it may not be declared for C++98).
- Function: double log1pmx (double x)
  Computes log(1 + x) - x (log 1 plus x minus x), accurately even for small x, i.e., |x| << 1.
- Function: double log1pexp (double x)
  Computes log(1 + exp(x)) (log 1 plus exp), accurately, notably for large x, e.g., x > 720.
- Function: double expm1 (double x)
  Computes exp(x) - 1 (exp x minus 1), accurately even for small x, i.e., |x| << 1.
  This should be provided by your platform, in which case it is not included in Rmath.h, but is (probably) in math.h which Rmath.h includes (except under C++, so it may not be declared for C++98).
- Function: double lgamma1p (double x)
  Computes log(gamma(x + 1)) (log(gamma(1 plus x))), accurately even for small x, i.e., 0 < x < 0.5.
- Function: double cospi (double x)
  Computes cos(pi * x) (where pi is 3.14159…), accurately, notably for half integer x.
  This might be provided by your platform, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes. (Ensure that neither math.h nor cmath is included before Rmath.h or define
  #define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1
  before the first inclusion.)
- Function: double sinpi (double x)
  Computes sin(pi * x) accurately, notably for (half) integer x.
  This might be provided by your platform, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes (but see the comments for cospi).
- Function: double tanpi (double x)
  Computes tan(pi * x) accurately, notably for (half) integer x.
  This might be provided by your platform, in which case it is not included in Rmath.h, but is in math.h which Rmath.h includes (but see the comments for cospi).
- Function: double logspace_add (double logx, double logy)
  Function: double logspace_sub (double logx, double logy)
  Function: double logspace_sum (const double* logx, int n)
  Compute the log of a sum or difference from logs of terms, i.e., “x + y” as log (exp(logx) + exp(logy)) and “x - y” as log (exp(logx) - exp(logy)), and “sum_i x[i]” as log (sum[i = 1:n exp(logx[i])] ) without causing unnecessary overflows or throwing away too much accuracy.
- Function: int imax2 (int x, int y)
  Function: int imin2 (int x, int y)
  Function: double fmax2 (double x, double y)
  Function: double fmin2 (double x, double y)
  Return the larger (max) or smaller (min) of two integer or double numbers, respectively. Note that fmax2 and fmin2 differ from C99/C++11’s fmax and fmin when one of the arguments is a NaN: these versions return NaN.
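The NaN difference can be seen with a small stand-in. The function below is a hypothetical NaN-propagating maximum written in the spirit of fmax2. It is a sketch, not R’s code, and is contrasted with the C99 fmax from math.h.

```c
#include <math.h>

/* NaN-propagating maximum: if either argument is a NaN the result is
   a NaN, otherwise the larger argument is returned. */
double toy_nan_max(double x, double y)
{
    if (isnan(x) || isnan(y))
        return x + y;          /* x + y is a NaN in this branch */
    return x < y ? y : x;
}
```

C99’s fmax(NAN, 1.0) returns 1.0, silently dropping the missing input, whereas toy_nan_max(NAN, 1.0) returns a NaN.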
- Function: double sign (double x)
  Compute the signum function, where sign(x) is 1, 0, or -1, when x is positive, 0, or negative, respectively, and NaN if x is a NaN.
- Function: double fsign (double x, double y)
  Performs “transfer of sign” and is defined as |x| * sign(y).
- Function: double fprec (double x, double digits)
  Returns the value of x rounded to digits significant decimal digits.
  This is the function used by R’s signif().
- Function: double fround (double x, double digits)
  Returns the value of x rounded to digits decimal digits (after the decimal point).
  This is the function used by R’s round(). (Note that C99/C++11 provide a round function but C++98 need not.)
- Function: double ftrunc (double x)
  Returns the value of x truncated (to an integer value) towards zero.
  (Note that C99/C++11 provide a trunc function but C++98 need not.)
6.7.4 Mathematical constants
R has a set of commonly used mathematical constants encompassing constants defined by POSIX and usually found in math.h (but maybe not in the C++ header cmath) and contains further ones that are used in statistical computations. These are defined to (at least) 30 digits accuracy in Rmath.h. The following definitions use ln(x) for the natural logarithm (log(x) in R).
Name              Definition (ln = log)   round(value, 7)
M_E               e                       2.7182818
M_LOG2E           log2(e)                 1.4426950
M_LOG10E          log10(e)                0.4342945
M_LN2             ln(2)                   0.6931472
M_LN10            ln(10)                  2.3025851
M_PI              pi                      3.1415927
M_PI_2            pi/2                    1.5707963
M_PI_4            pi/4                    0.7853982
M_1_PI            1/pi                    0.3183099
M_2_PI            2/pi                    0.6366198
M_2_SQRTPI        2/sqrt(pi)              1.1283792
M_SQRT2           sqrt(2)                 1.4142136
M_SQRT1_2         1/sqrt(2)               0.7071068
M_SQRT_3          sqrt(3)                 1.7320508
M_SQRT_32         sqrt(32)                5.6568542
M_LOG10_2         log10(2)                0.3010300
M_2PI             2*pi                    6.2831853
M_SQRT_PI         sqrt(pi)                1.7724539
M_1_SQRT_2PI      1/sqrt(2*pi)            0.3989423
M_SQRT_2dPI       sqrt(2/pi)              0.7978846
M_LN_SQRT_PI      ln(sqrt(pi))            0.5723649
M_LN_SQRT_2PI     ln(sqrt(2*pi))          0.9189385
M_LN_SQRT_PId2    ln(sqrt(pi/2))          0.2257914
There is a set of constants (PI, DOUBLE_EPS, and so on) defined (unless STRICT_R_HEADERS is defined) in the included header R_ext/Constants.h, mainly for compatibility with S.
Further, the included header R_ext/Boolean.h has enumeration constants TRUE and FALSE of type Rboolean in order to provide a way of using “logical” variables in C consistently. This can conflict with other software: for example it conflicts with the headers in IJG’s jpeg-9 (but not earlier versions).
6.8 Optimization
The C code underlying optim can be accessed directly. The user needs to supply a function to compute the function to be minimized, of the type
typedef double optimfn(int n, double *par, void *ex);
where the first argument is the number of parameters in the second argument. The third argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
Some of the methods also require a gradient function
typedef void optimgr(int n, double *par, double *gr, void *ex);
which passes back the gradient in the gr argument. No function is provided for finite-differencing, nor for approximating the Hessian at the result.
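As an illustration of the two callback types, here is a sketch of an objective and gradient for a two-parameter Rosenbrock-type function. The typedefs are copied from the text so that the sketch compiles without R headers (in a package they would come from R_ext/Applic.h); the struct rosen_params and the function names are hypothetical.

```c
#include <stddef.h>

/* Typedefs copied from the text so the sketch is self-contained. */
typedef double optimfn(int n, double *par, void *ex);
typedef void optimgr(int n, double *par, double *gr, void *ex);

/* Hypothetical auxiliary data carried through the ex pointer. */
typedef struct { double a; double b; } rosen_params;

/* Declaring through the typedefs makes the compiler check signatures. */
optimfn rosen_fn;
optimgr rosen_gr;

/* f(x1, x2) = b * (x2 - x1^2)^2 + (a - x1)^2 */
double rosen_fn(int n, double *par, void *ex)
{
    const rosen_params *p = (const rosen_params *) ex;
    double x1 = par[0], x2 = par[1];
    double t1 = x2 - x1 * x1, t2 = p->a - x1;
    (void) n;                       /* always 2 in this sketch */
    return p->b * t1 * t1 + t2 * t2;
}

/* Gradient of the function above, written into gr[0] and gr[1]. */
void rosen_gr(int n, double *par, double *gr, void *ex)
{
    const rosen_params *p = (const rosen_params *) ex;
    double x1 = par[0], x2 = par[1];
    double t1 = x2 - x1 * x1;
    (void) n;
    gr[0] = -4.0 * p->b * x1 * t1 - 2.0 * (p->a - x1);
    gr[1] = 2.0 * p->b * t1;
}
```

A caller would pass the address of a rosen_params value as the ex argument of the optimizer, which hands it back unchanged on every evaluation.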
The interfaces (defined in header R_ext/Applic.h) are
- Nelder Mead:
  void nmmin(int n, double *xin, double *x, double *Fmin, optimfn fn, int *fail, double abstol, double intol, void *ex, double alpha, double beta, double gamma, int trace, int *fncount, int maxit);
- BFGS:
  void vmmin(int n, double *x, double *Fmin, optimfn fn, optimgr gr, int maxit, int trace, int *mask, double abstol, double reltol, int nREPORT, void *ex, int *fncount, int *grcount, int *fail);
- Conjugate gradients:
  void cgmin(int n, double *xin, double *x, double *Fmin, optimfn fn, optimgr gr, int *fail, double abstol, double intol, void *ex, int type, int trace, int *fncount, int *grcount, int maxit);
- Limited-memory BFGS with bounds:
  void lbfgsb(int n, int lmm, double *x, double *lower, double *upper, int *nbd, double *Fmin, optimfn fn, optimgr gr, int *fail, void *ex, double factr, double pgtol, int *fncount, int *grcount, int maxit, char *msg, int trace, int nREPORT);
- Simulated annealing:
  void samin(int n, double *x, double *Fmin, optimfn fn, int maxit, int tmax, double temp, int trace, void *ex);
Many of the arguments are common to the various methods. n is the number of parameters, x or xin is the starting parameters on entry and x the final parameters on exit, with final value returned in Fmin. Most of the other parameters can be found from the help page for optim: see the source code src/appl/lbfgsb.c for the values of nbd, which specifies which bounds are to be used.
6.9 Integration
The C code underlying integrate can be accessed directly. The user needs to supply a vectorizing C function to compute the function to be integrated, of the type
typedef void integr_fn(double *x, int n, void *ex);
where x[] is both input and output and has length n, i.e., a C function, say fn, of type integr_fn must basically do for(i in 1:n) x[i] := f(x[i], ex). The vectorization requirement can be used to speed up the integrand instead of calling it n times. Note that in the current implementation built on QUADPACK, n will be either 15 or 21. The ex argument is a pointer passed down from the calling routine, normally used to carry auxiliary information.
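A vectorizing integrand in the sense just described might look like the following sketch. The typedef is copied from the text so that the code compiles without R headers; the parameter struct and the function name are hypothetical.

```c
/* Typedef copied from the text so the sketch is self-contained. */
typedef void integr_fn(double *x, int n, void *ex);

/* Hypothetical auxiliary data passed through ex. */
typedef struct { double scale; } toy_integrand_params;

/* Declaring through the typedef makes the compiler check the signature. */
integr_fn toy_integrand;

/* Overwrites each abscissa x[i] with f(x[i]) = 1 / (1 + scale * x[i]^2). */
void toy_integrand(double *x, int n, void *ex)
{
    const toy_integrand_params *p = (const toy_integrand_params *) ex;
    for (int i = 0; i < n; i++)
        x[i] = 1.0 / (1.0 + p->scale * x[i] * x[i]);
}
```

Note that the abscissae are overwritten in place: after the call, x[] holds function values and the original evaluation points are gone.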
There are interfaces (defined in header R_ext/Applic.h) for integrals over finite and infinite intervals (or “ranges” or “integration boundaries”).
- Finite:
  void Rdqags(integr_fn f, void *ex, double *a, double *b, double *epsabs, double *epsrel, double *result, double *abserr, int *neval, int *ier, int *limit, int *lenw, int *last, int *iwork, double *work);
- Infinite:
  void Rdqagi(integr_fn f, void *ex, double *bound, int *inf, double *epsabs, double *epsrel, double *result, double *abserr, int *neval, int *ier, int *limit, int *lenw, int *last, int *iwork, double *work);
Only the 3rd and 4th argument differ for the two integrators; for the finite range integral using Rdqags, a and b are the integration interval bounds, whereas for an infinite range integral using Rdqagi, bound is the finite bound of the integration (if the integral is not doubly-infinite) and inf is a code indicating the kind of integration range,
- inf = 1: corresponds to (bound, +Inf),
- inf = -1: corresponds to (-Inf, bound),
- inf = 2: corresponds to (-Inf, +Inf),
f and ex define the integrand function, see above; epsabs and epsrel specify the absolute and relative accuracy requested, result, abserr and last are the output components value, abs.err and subdivisions of the R function integrate, where neval gives the number of integrand function evaluations, and the error code ier is translated to R’s integrate() $ message, look at that function definition. limit corresponds to integrate(…, subdivisions = *). It seems you should always define the two work arrays and the length of the second one as
lenw = 4 * limit;
iwork = (int *) R_alloc(limit, sizeof(int));
work = (double *) R_alloc(lenw, sizeof(double));
The comments in the source code in src/appl/integrate.c give more details, particularly about reasons for failure (ier >= 1).
6.10 Utility functions
R has a fairly comprehensive set of sort routines which are made available to users’ C code. The following are declared in header file Rinternals.h.
- Function: void R_orderVector (int* indx, int n, SEXP arglist, Rboolean nalast, Rboolean decreasing)
  Function: void R_orderVector1 (int* indx, int n, SEXP x, Rboolean nalast, Rboolean decreasing)
  R_orderVector() corresponds to R’s order(…, na.last, decreasing). More specifically, indx <- order(x, y, na.last, decreasing) corresponds to R_orderVector(indx, n, Rf_lang2(x, y), nalast, decreasing) and for three vectors, Rf_lang3(x,y,z) is used as arglist.
  Both R_orderVector and R_orderVector1 assume the vector indx to be allocated to length >= n. On return, indx[] contains a permutation of 0:(n-1), i.e., 0-based C indices (and not 1-based R indices, as R’s order()).
  When ordering only one vector, R_orderVector1 is faster and corresponds (but is 0-based) to R’s indx <- order(x, na.last, decreasing). It was added in R 3.3.0.
All other sort routines are declared in header file R_ext/Utils.h (included by R.h) and include the following.
- Function: void R_isort (int* x, int n)
  Function: void R_rsort (double* x, int n)
  Function: void R_csort (Rcomplex* x, int n)
  Function: void rsort_with_index (double* x, int* index, int n)
  The first three sort integer, real (double) and complex data respectively. (Complex numbers are sorted by the real part first then the imaginary part.) NAs are sorted last.
  rsort_with_index sorts on x, and applies the same permutation to index. NAs are sorted last.
- Function: void revsort (double* x, int* index, int n)
  Is similar to rsort_with_index but sorts into decreasing order, and NAs are not handled.
- Function: void iPsort (int* x, int n, int k)
  Function: void rPsort (double* x, int n, int k)
  Function: void cPsort (Rcomplex* x, int n, int k)
  These all provide (very) partial sorting: they permute x so that x[k] is in the correct place with smaller values to the left, larger ones to the right.
- Function: void R_qsort (double *v, size_t i, size_t j)
  Function: void R_qsort_I (double *v, int *I, int i, int j)
  Function: void R_qsort_int (int *iv, size_t i, size_t j)
  Function: void R_qsort_int_I (int *iv, int *I, int i, int j)
  These routines sort v[i:j] or iv[i:j] (using 1-indexing, i.e., v[1] is the first element) calling the quicksort algorithm as used by R’s sort(v, method = “quick”) and documented on the help page for the R function sort. The …_I() versions also return the sort.index() vector in I. Note that the ordering is not stable, so tied values may be permuted.
  Note that NAs are not handled (explicitly) and you should use different sorting functions if NAs can be present.
- Function: subroutine qsort4 (double precision v, integer indx, integer ii, integer jj)
  Function: subroutine qsort3 (double precision v, integer ii, integer jj)
  The FORTRAN interface routines for sorting double precision vectors are qsort3 and qsort4, equivalent to R_qsort and R_qsort_I, respectively.
- Function: void R_max_col (double* matrix, int* nr, int* nc, int* maxes, int* ties_meth)
  Given the nr by nc matrix matrix in column-major (“FORTRAN”) order, R_max_col() returns in maxes[i-1] the column number of the maximal element in the i-th row (the same as R’s max.col() function). In the case of ties (multiple maxima), *ties_meth is an integer code in 1:3 determining the method: 1 = “random”, 2 = “first” and 3 = “last”. See R’s help page ?max.col.
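Column-major storage means that element (i, j) of an nr by nc matrix, counting from zero, sits at offset i + nr * j. The sketch below uses that layout to find the row-wise maximum with the “first” tie rule. It is a hypothetical stand-in written in standard C, not R_max_col itself, and it ignores NaN handling.

```c
#include <stddef.h>

/* For each row i (0-based) of a column-major nr-by-nc matrix, store in
   maxes[i] the 1-based column number of the first maximal element.
   Assumes nc >= 1 and no NaN entries. */
void toy_max_col_first(const double *matrix, int nr, int nc, int *maxes)
{
    for (int i = 0; i < nr; i++) {
        int best = 0;
        for (int j = 1; j < nc; j++) {
            /* element (i, j) is at offset i + nr * j */
            if (matrix[i + (size_t) nr * j] > matrix[i + (size_t) nr * best])
                best = j;
        }
        maxes[i] = best + 1;    /* column numbers are reported 1-based */
    }
}
```

For the 2 by 3 matrix with rows (1, 5, 3) and (7, 2, 7), stored as {1, 7, 5, 2, 3, 7}, the results are column 2 for the first row and column 1 for the second, where the tie is resolved in favour of the first maximum.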
-
Function: int findInterval (double* xt, int n, double x, Rboolean rightmost_closed, Rboolean all_inside, int ilo, int* mflag)
Function: int findInterval2(double xt, int n, double x, Rboolean rightmost_closed, Rboolean all_inside, Rboolean left_open, int ilo, int mflag) -
Given the ordered vector xt of length n, return the interval or index of x in xt[], typically max(i; 1 <= i <= n & xt[i] <= x) where we use 1-indexing as in R and FORTRAN (but not C). If rightmost_closed is true, also returns n-1 if x equals xt[n]. If all_inside is not 0, the result is coerced to lie in 1:(n-1) even when x is outside the xt[] range. On return, *mflag equals -1 if x < xt[1], +1 if x >= xt[n], and 0 otherwise.
The algorithm is particularly fast when ilo is set to the last result of findInterval() and x is a value of a sequence which is increasing or decreasing for subsequent calls.
findInterval2() is a generalization of findInterval(), with an extra Rboolean argument left_open. Setting left_open = TRUE basically replaces all left-closed right-open intervals [s, t) by left-open ones (s, t], see the help page of R function findInterval for details.
There is also an F77_CALL(interv)() version of findInterval() with the same arguments, but all pointers.
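The return-value rule for findInterval() can be restated as a short reference sketch. The function sketch_find_interval is hypothetical: it is a linear scan written from the description above, not the algorithm R uses, it omits the ilo hint, and it assumes xt[] is sorted increasingly with n >= 2.

```c
#include <assert.h>

/* Hypothetical reference version of the rule described in the text.
   The result is a 1-based position; 0 means x lies left of xt[1]. */
int sketch_find_interval(const double *xt, int n, double x,
                         int rightmost_closed, int all_inside, int *mflag)
{
    assert(n >= 2);
    if (x < xt[0]) {                    /* left of the 1-based xt[1] */
        *mflag = -1;
        return all_inside ? 1 : 0;
    }
    if (x >= xt[n - 1]) {               /* at or right of the 1-based xt[n] */
        *mflag = 1;
        if (all_inside || (rightmost_closed && x == xt[n - 1]))
            return n - 1;
        return n;
    }
    *mflag = 0;
    int i = 1;
    while (x >= xt[i])                  /* C element i is 1-based xt[i + 1] */
        i++;
    return i;                           /* 1-based: xt[i] <= x < xt[i + 1] */
}
```

With xt = {1, 2, 4, 8}, the value 3 gives 2 (mflag 0), the value 0.5 gives 0 (mflag -1), and the value 8 gives 4, or 3 when rightmost_closed is set (mflag +1 in both cases).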
A system-independent interface to produce the name of a temporary file is provided as
-
Function: char * R_tmpnam (const char *prefix, const char *tmpdir)
Function: char * R_tmpnam2 (const char *prefix, const char *tmpdir, const char *fileext) -
Return a pathname for a temporary file with name beginning with prefix and ending with fileext in directory tmpdir. A NULL prefix or extension is replaced by “”. Note that the return value is malloced and should be freed when no longer needed (unlike the system call tmpnam).
There is also the internal function used to expand file names in several R functions, and called directly by path.expand.
- Function: const char * R_ExpandFileName (const char *fn)
-
Expand a path name fn by replacing a leading tilde by the user’s home directory (if defined). The precise meaning is platform-specific; it will usually be taken from the environment variable HOME if this is defined.
For historical reasons there are FORTRAN interfaces to functions D1MACH and I1MACH. These can be called from C code as e.g. F77_CALL(d1mach)(4). Note that these are emulations of the original functions by Fox, Hall and Schryer on NetLib at http://www.netlib.org/slatec/src/ for IEC 60559 arithmetic (required by R).
6.11 Re-encoding
R has its own C-level interface to the encoding conversion capabilities provided by iconv because there are incompatibilities between the declarations in different implementations of iconv.
These are declared in header file R_ext/Riconv.h.
Function: void * Riconv_open (const char *to, const char *from)
Set up a pointer to an encoding object to be used to convert between two encodings: “” indicates the current locale.
Function: size_t Riconv (void *cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft)
Convert as much as possible of inbuf to outbuf. Initially the size_t variables indicate the number of bytes available in the buffers, and they are updated (and the char pointers are updated to point to the next free byte in the buffer). The return value is the number of characters converted, or (size_t)-1 (beware: size_t is usually an unsigned type). It should be safe to assume that an error condition sets errno to one of E2BIG (the output buffer is full), EILSEQ (the input cannot be converted, and might be invalid in the encoding specified) or EINVAL (the input does not end with a complete multi-byte character).
Function: int Riconv_close (void * cd)
Free the resources of an encoding object.
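Because the error value of Riconv() is (size_t)-1 and size_t is unsigned, a test such as res < 0 can never succeed. The following sketch shows one way to interpret a result together with errno. The type conv_status and the function sketch_classify_conversion are hypothetical names introduced here, not part of R_ext/Riconv.h.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes for the outcomes described in the text. */
typedef enum {
    CONV_OK,            /* no error reported */
    CONV_OUTPUT_FULL,   /* E2BIG: the output buffer is full */
    CONV_BAD_INPUT,     /* EILSEQ: the input cannot be converted */
    CONV_INCOMPLETE,    /* EINVAL: input ends inside a multi-byte character */
    CONV_OTHER          /* any other errno value */
} conv_status;

/* res is the value returned by the conversion call and err is the value
   of errno read immediately afterwards. */
conv_status sketch_classify_conversion(size_t res, int err)
{
    if (res != (size_t)-1)      /* compare with the cast, never with < 0 */
        return CONV_OK;
    switch (err) {
    case E2BIG:  return CONV_OUTPUT_FULL;
    case EILSEQ: return CONV_BAD_INPUT;
    case EINVAL: return CONV_INCOMPLETE;
    default:     return CONV_OTHER;
    }
}
```

In package code the result of Riconv() would be stored in a variable first and errno read straight afterwards, before any other call can change it. On CONV_OUTPUT_FULL a caller would typically enlarge or flush the output buffer and call Riconv() again with the updated pointers.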
6.12 Allowing interrupts
No port of R can be interrupted whilst running long computations in compiled code, so programmers should make provision for the code to be interrupted at suitable points by calling from C
#include <R_ext/Utils.h>
void R_CheckUserInterrupt(void);
and from FORTRAN
subroutine rchkusr()
These check if the user has requested an interrupt, and if so branch to R’s error handling functions.
Note that the code behind any of the entry points defined here, if called from your C or FORTRAN code, could itself be interruptible or generate an error, and so might not return to your code.
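A common pattern is to poll for interrupts every so many iterations rather than on every pass through a loop. The sketch below shows the idea in a form that runs outside R: the check parameter stands in for R_CheckUserInterrupt(), and sketch_long_sum, sketch_count_check and the interval of 1000 are hypothetical choices made for this illustration.

```c
#include <stddef.h>

/* Counter used by the stand-in callback below (illustration only). */
int sketch_checks = 0;

/* Stand-in for R_CheckUserInterrupt(): it just counts its calls. */
void sketch_count_check(void)
{
    sketch_checks++;
}

/* Sum x[0..n-1], calling check() once every 1000 iterations. In package
   code, R_CheckUserInterrupt would be called directly at that point. */
double sketch_long_sum(const double *x, size_t n, void (*check)(void))
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % 1000 == 0)      /* a suitable point: cheap and regular */
            check();
        total += x[i];
    }
    return total;
}
```

Since the real check may branch to R's error handling and never return, any resources held at that point should be ones that R cleans up itself.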
6.13 Platform and version information
The header files define USING_R, which can be used to test if the code is indeed being used with R.
Header file Rconfig.h (included by R.h) is used to define platform-specific macros that are mainly for use in other header files. The macro WORDS_BIGENDIAN is defined on big-endian systems (e.g. most OSes on Sparc and PowerPC hardware) and not on little-endian systems (nowadays all the commoner R platforms). It can be useful when manipulating binary files. NB: these macros apply only to the C compiler used to build R, not necessarily to another C or C++ compiler.
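As a sketch of how WORDS_BIGENDIAN might be used when reading binary files, the following converts a 32-bit value stored in little-endian order to host order. The names sketch_swap32 and sketch_from_le32 are hypothetical. In package code the macro would come from Rconfig.h; compiled without that header, the macro is undefined and the value is passed through unchanged.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t sketch_swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Convert a value read from a little-endian file to host byte order.
   Assumes WORDS_BIGENDIAN reflects the compiler in use (see the NB in
   the text about compilers other than the one used to build R). */
uint32_t sketch_from_le32(uint32_t x)
{
#ifdef WORDS_BIGENDIAN
    return sketch_swap32(x);    /* big-endian host: bytes must be swapped */
#else
    return x;                   /* little-endian host: already in order */
#endif
}
```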
Header file Rversion.h (not included by R.h) defines a macro R_VERSION giving the version number encoded as an integer, plus a macro R_Version to do the encoding. This can be used to test if the version of R is late enough, or to include back-compatibility features. For protection against very old versions of R which did not have this macro, use a construction such as
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
...
#endif
More detailed information is available in the macros R_MAJOR, R_MINOR, R_YEAR, R_MONTH and R_DAY: see the header file Rversion.h for their format. Note that the minor version includes the patchlevel (as in ‘2.2’).
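To show why encoded versions can be compared as plain integers, here is a sketch that assumes the encoding major * 65536 + minor * 256 + patchlevel, which is what current copies of Rversion.h use; check the installed header before relying on it. SKETCH_R_VERSION and sketch_version_at_least are hypothetical names chosen so the block compiles without R.

```c
/* Assumed to mirror the encoding done by R_Version() in Rversion.h. */
#define SKETCH_R_VERSION(v, p, s) (((v) * 65536) + ((p) * 256) + (s))

/* Return 1 if the encoded version `have` is at least
   major.minor.patch, and 0 otherwise. */
int sketch_version_at_least(int have, int major, int minor, int patch)
{
    return have >= SKETCH_R_VERSION(major, minor, patch);
}
```

Under this assumption version 3.1.0 encodes as 196864, and any later release encodes as a larger integer, so a single >= comparison is enough.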
Packages which use alloca need to ensure it is defined: as it is part of neither C nor POSIX there is no standard way to do so. One can use
#include <Rconfig.h> // for HAVE_ALLOCA_H
#ifdef __GNUC__
// this covers gcc, clang, icc
# undef alloca
# define alloca(x) __builtin_alloca((x))
#elif defined(HAVE_ALLOCA_H)
// needed for native compilers on Solaris and AIX
# include <alloca.h>
#endif
(and this should be included before standard C headers such as stdlib.h, since on some platforms these include malloc.h which may have a conflicting definition), which suffices for known R platforms.
6.14 Inlining C functions
The C99 keyword inline should be recognized by all compilers nowadays used to build R. Portable code which might be used with earlier versions of R can be written using the macro R_INLINE (defined in file Rconfig.h included by R.h), as in this example from package cluster
#include <R.h>
static R_INLINE int ind_2(int l, int j)
{
...
}
Be aware that using inlining with functions in more than one compilation unit is almost impossible to do portably, see http://www.greenend.org.uk/rjk/2003/03/inline.html, so this usage is for static functions as in the example. All the R configure code has checked is that R_INLINE can be used in a single C file with the compiler used to build R. We recommend that packages making extensive use of inlining include their own configure code.
6.15 Controlling visibility
Header R_ext/Visibility.h has some definitions for controlling the visibility of entry points. These are only effective when ‘HAVE_VISIBILITY_ATTRIBUTE’ is defined – this is checked when R is configured and recorded in header Rconfig.h (included by R_ext/Visibility.h). It is often defined on modern Unix-alikes with a recent compiler, but is not supported on macOS or Windows. Minimizing the visibility of symbols in a shared library will both speed up its loading (unlikely to be significant) and reduce the possibility of linking to other entry points of the same name.
C/C++ entry points prefixed by attribute_hidden will not be visible in the shared object. There is no comparable mechanism for FORTRAN entry points, but there is a more comprehensive scheme used by, for example, package stats. Most compilers which allow control of visibility will allow control of visibility for all symbols via a flag, and where known the flag is encapsulated in the macros ‘C_VISIBILITY’ and ‘F77_VISIBILITY’ for C and FORTRAN compilers. These are defined in etc/Makeconf and so available for normal compilation of package code. For example, src/Makevars could include
PKG_CFLAGS=$(C_VISIBILITY)
PKG_FFLAGS=$(F77_VISIBILITY)
This would end up with no visible entry points, which would be pointless. However, the effect of the flags can be overridden by using the attribute_visible prefix. A shared object which registers its entry points needs only to have one visible entry point, its initializer, so for example package stats has
void attribute_visible R_init_stats(DllInfo *dll)
{
R_registerRoutines(dll, CEntries, CallEntries, FortEntries, NULL);
R_useDynamicSymbols(dll, FALSE);
...
}
The visibility mechanism is not available on Windows, but there is an equally effective way to control which entry points are visible, by supplying a definitions file pkgname/src/pkgname-win.def: only entry points listed in that file will be visible. Again using stats as an example, it has
LIBRARY stats.dll
EXPORTS
R_init_stats
6.16 Using these functions in your own C code
It is possible to build Mathlib, the R set of mathematical functions documented in Rmath.h, as a standalone library libRmath under both Unix-alikes and Windows. (This includes the functions documented in Numerical analysis subroutines as from that header file.)
The library is not built automatically when R is installed, but can be built in the directory src/nmath/standalone in the R sources: see the file README there. To use the code in your own C program include
#define MATHLIB_STANDALONE
#include <Rmath.h>
and link against ‘-lRmath’ (and perhaps ‘-lm’). There is an example file test.c.
A little care is needed to use the random-number routines. You will need to supply the uniform random number generator
double unif_rand(void)
or use the one supplied (and with a dynamic library or DLL you will have to use the one supplied, which is the Marsaglia-multicarry with entry points
set_seed(unsigned int, unsigned int)
to set its seeds and
get_seed(unsigned int *, unsigned int *)
to read the seeds).
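When supplying your own generator, the function only needs the signature given above. The following is a toy sketch using a xorshift32 step, chosen here purely for brevity and not a recommendation; the state variable sketch_state and its seed are hypothetical, and the mapping keeps results strictly inside (0, 1) on the assumption that the routines consuming them should never see exactly 0 or 1.

```c
#include <stdint.h>

/* Generator state; any nonzero seed works for xorshift32. */
uint32_t sketch_state = 2463534242u;

/* Toy user-supplied uniform generator, for illustration only. */
double unif_rand(void)
{
    sketch_state ^= sketch_state << 13;
    sketch_state ^= sketch_state >> 17;
    sketch_state ^= sketch_state << 5;
    /* the state is in 1 .. 2^32 - 1; map it into the open interval (0, 1) */
    return (sketch_state + 0.5) / 4294967296.0;
}
```

For real work a generator of known statistical quality should be substituted.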
6.17 Organization of header files
The header files which R installs are in directory R_INCLUDE_DIR (default R_HOME/include). This currently includes
R.h                    includes many other files
S.h                    different version for code ported from S
Rinternals.h           definitions for using R’s internal structures
Rdefines.h             macros for an S-like interface to the above (no longer maintained)
Rmath.h                standalone math library
Rversion.h             R version information
Rinterface.h           for add-on front-ends (Unix-alikes only)
Rembedded.h            for add-on front-ends
R_ext/Applic.h         optimization and integration
R_ext/BLAS.h           C definitions for BLAS routines
R_ext/Callbacks.h      C (and R function) top-level task handlers
R_ext/GetX11Image.h    X11Image interface used by package trkplot
R_ext/Lapack.h         C definitions for some LAPACK routines
R_ext/Linpack.h        C definitions for some LINPACK routines, not all of which are included in R
R_ext/Parse.h          a small part of R’s parse interface: not part of the stable API.
R_ext/RStartup.h       for add-on front-ends
R_ext/Rdynload.h       needed to register compiled code in packages
R_ext/R-ftp-http.h     interface to internal method of download.file
R_ext/Riconv.h         interface to iconv
R_ext/Visibility.h     definitions controlling visibility
R_ext/eventloop.h      for add-on front-ends and for packages that need to share in the R event loops (not Windows)
The following headers are included by R.h:
Rconfig.h              configuration info that is made available
R_ext/Arith.h          handling for NAs, NaNs, Inf/-Inf
R_ext/Boolean.h        TRUE/FALSE type
R_ext/Complex.h        C typedefs for R’s complex
R_ext/Constants.h      constants
R_ext/Error.h          error handling
R_ext/Memory.h         memory allocation
R_ext/Print.h          Rprintf and variations.
R_ext/RS.h             definitions common to R.h and S.h, including F77_CALL etc.
R_ext/Random.h         random number generation
R_ext/Utils.h          sorting and other utilities
R_ext/libextern.h      definitions for exports from R.dll on Windows.
The graphics systems are exposed in headers R_ext/GraphicsEngine.h, R_ext/GraphicsDevice.h (which it includes) and R_ext/QuartzDevice.h. Facilities for defining custom connection implementations are provided in R_ext/Connections.h, but make sure you consult the file before use.
Let us re-iterate the advice to include system headers before the R header files, especially Rinternals.h (included by Rdefines.h) and Rmath.h, which redefine names which may be used in system headers (fewer if ‘R_NO_REMAP’ is defined, or ‘R_NO_REMAP_RMATH’ for Rmath.h).