About lazy evaluation


A follow-up on Thomas Lumley’s follow-up post on Miles McBain’s post about quotation.

In his post, Thomas continues Miles’s exploration of quoting and evaluation in R. Thomas talks a little about lazy evaluation, and I decided to explore this concept further. Notably, I want to start from this quote from his blog post:

“In reality, to allow for lazy evaluation, R has a special data structure called a promise, which stores the expression until you look at it then evaluates it. R also has substitute() to get the expression out of the promise.”

Lazy Eval: a starting point

I’ve already written about lazy evaluation, but here is more about that concept.

A quick definition

Lazy evaluation is a programming strategy where a symbol is evaluated only when it is needed. In other words, a symbol can be defined (e.g. as a function argument), and it will only be evaluated when its value is actually needed (and that moment may never come). This is why you can do:

plop <- function(a, b){
  a * 10
}
plop(4)
## [1] 40

Here, b is defined as a function argument but never evaluated, so there is no error. This strategy is called “lazy” as it does the strict minimum of evaluation (remember that evaluating a symbol means looking up its value).

Lazy evaluation means you can also do:

plop(a = 4, b = non_existing_variable)
## [1] 40

As b is never evaluated, there is no problem: R never tries to look up the value of non_existing_variable.
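To make this visible, here is a minimal sketch where the argument has a side effect: it increments a counter each time it is evaluated. The names tracker, side_effect, ignores_b and uses_b are made up for this example. If b is never used, the counter never moves.

```r
# Hypothetical helper: an environment used as a mutable counter
tracker <- new.env()
tracker$evaluations <- 0

# Returns a value and records that it has been evaluated
side_effect <- function() {
  tracker$evaluations <- tracker$evaluations + 1
  "some value"
}

# Same shape as plop(): `b` is accepted but never used
ignores_b <- function(a, b) {
  a * 10
}

# Touches `b` before returning, which forces its evaluation
uses_b <- function(a, b) {
  b
  a * 10
}

ignores_b(4, side_effect())  # 40
tracker$evaluations          # 0: side_effect() never ran
uses_b(4, side_effect())     # 40
tracker$evaluations          # 1: side_effect() ran once
```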

We can also find it in control structures:

if (TRUE){
  12
} else {
  no_variable
}
## [1] 12

And of course, this works the other way around:

if (FALSE){
  no_variable
} else {
  12
}
## [1] 12

Only the branch selected by the condition is evaluated. You can also find it with ||:

if (TRUE || no_variable) {
  12
}
## [1] 12

Note that this won’t work with |, as:

The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. (from ?base::Logic)

if (TRUE | no_variable) {
  12
}
## Error in eval(expr, envir, enclos): object 'no_variable' not found
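Here is a small sketch contrasting the short-circuiting operators with their vectorised counterpart. It uses stop() instead of an undefined variable, so any evaluation of the right-hand side shows up as an error (the variable names are arbitrary):

```r
# `&&` and `||` only evaluate their right-hand side when it is needed
and_result <- FALSE && stop("right-hand side evaluated")
or_result  <- TRUE  || stop("right-hand side evaluated")

# `|` evaluates both sides, so stop() is reached and caught here
vectorised_result <- tryCatch(
  TRUE | stop("right-hand side evaluated"),
  error = function(e) conditionMessage(e)
)

and_result         # FALSE
or_result          # TRUE
vectorised_result  # "right-hand side evaluated"
```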

Why lazy eval

Lazy evaluation is not restricted to R: it is also found in other languages (mainly functional languages). Its opposite is strict (or eager) evaluation, which is the default in most programming languages.

R uses lazy evaluation because it can make a program more efficient when used interactively: only the necessary symbols are evaluated, which means that only the needed objects are looked up and/or loaded into memory. The downside is that it can make a program less predictable, as you are never 100% sure that a symbol will be evaluated (but this only matters in more advanced use cases).

It’s a typical mechanism in functional languages, as it allows functions to be defined without their arguments having any value. That means you can create the following object without a and b having a value:

ping <- function(a,b){
  a + b
}

To quote R: a language for data analysis and graphics:

The expression given as function arguments are not evaluated before the function is called. Instead, the expressions are packaged together with the environment in which they should be evaluated and it is this package that is passed to the function. Evaluation only takes place when the argument is required.
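The “environment in which they should be evaluated” part is easy to check. In this sketch (show_arg and caller are hypothetical names), both functions define their own x, and the argument is evaluated in the caller’s environment, not inside show_arg():

```r
show_arg <- function(arg) {
  x <- "x from show_arg()"
  # Forcing `arg` evaluates the expression `x` in the caller's environment
  arg
}

caller <- function() {
  x <- "x from caller()"
  show_arg(x)
}

caller()  # "x from caller()"
```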

In fact, you’re already familiar with it, as I’m sure you can predict the output of this function:

mean_of_that <- function(x, mean_of = mean(x)){
  # Of course I could use na.rm, it's an example ;)
  x <- x[!is.na(x)]
  print(x)
  cat("The mean of x is", mean_of)
}
mean_of_that(c(1,2,3,4,NA))
## [1] 1 2 3 4
## The mean of x is 2.5

Here, if the output does not surprise you, it’s because you already understand what lazy eval is (good news, right?): when R accesses the value of mean_of, it evaluates mean(x), and so looks up the value of x. At that exact moment, x has already changed (it no longer contains NA), so you get the mean of the new x. If mean_of had been evaluated as soon as the function was called, its value would have been NA.
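A related detail: default values are evaluated in the function’s own environment, at the moment they are first accessed. So a default can even refer to a variable that is only created later in the function body. A minimal sketch (count_items is a made-up name):

```r
count_items <- function(n = length(items)) {
  # `items` does not exist yet when the function is called...
  items <- c("a", "b", "c")
  # ...but the default for `n` is only evaluated here, in the function's
  # own environment, where `items` now exists
  n
}

count_items()    # 3
count_items(10)  # 10: a supplied argument replaces the default
```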

Here is another example:

ping <- function(a = Sys.time(), b = Sys.time(), c = Sys.time()){
  print(a)
  Sys.sleep(1)
  print(b)
  Sys.sleep(1)
  print(c)
}
ping()
## [1] "2018-09-04 08:27:00 CEST"
## [1] "2018-09-04 08:27:01 CEST"
## [1] "2018-09-04 08:27:02 CEST"

You can see that each element has a different value. If the arguments had been evaluated at the moment the function was called, they would all have the same value (i.e. the Sys.time() at the moment the function was called).
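Another way to see it: arguments are evaluated in the order in which they are used, not in the order in which they are declared. A sketch with a hypothetical note() helper that logs its label when it is evaluated:

```r
evaluation_log <- new.env()
evaluation_log$order <- character(0)

# Hypothetical helper: appends `label` to the log when it is evaluated
note <- function(label) {
  evaluation_log$order <- c(evaluation_log$order, label)
  label
}

# Uses its arguments in the reverse order of their declaration
reverse_use <- function(first, second) {
  second
  first
  invisible(NULL)
}

reverse_use(note("first"), note("second"))
evaluation_log$order  # "second" "first"
```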

LazyData, and promises

If LazyData: true is specified in the DESCRIPTION file, the datasets of a package are lazily loaded. This means two things:

  • When you call library(pkg), the datasets are not loaded into memory (definitely more efficient)
  • You can “preload” them with data("dataset"), and get a promise back

If you run this in a fresh R session:

library(ggplot2)
data("diamonds")

This is what you’re going to get in your environment: diamonds is listed as a <Promise>.

At this point, as I still haven’t called the dataset, the symbol diamonds holds a promise to this dataset, which is still not in memory:

library(pryr)
mem_used()
## 44.6 MB
#Now I need diamonds
nrow(diamonds)
## [1] 53940
mem_used()
## 48.1 MB

As you can see, the memory used by my R session increased when I actually needed diamonds. It is no longer a promise, but a dataset loaded in my environment.

Note that substitute doesn’t “break the promise”:

data("txhousing")
mem_used()
## 48.1 MB
substitute(txhousing)
## txhousing
mem_used()
## 48.1 MB
nrow(txhousing)
## [1] 8602
mem_used()
## 48.6 MB

Here is an example of non-standard evaluation with substitute: even though I’m passing txhousing as a symbol, substitute(txhousing) does not behave like nrow(txhousing). The symbol is not evaluated in the standard way: the promise is still a promise, and the symbol txhousing does not bring the object into memory.

Let’s just put it into a function:

substiplop <- function(dataset){
  # deparse turns a symbol into a character
  name <- deparse(substitute(dataset))
  paste("You called", name)
}

library(ggplot2)
mem_used()
## 48.6 MB
substiplop(dataset = economics_long)
## [1] "You called economics_long"
mem_used()
## 48.6 MB

As you can see, economics_long has not been evaluated. Now compare:

nrowplop <- function(dataset){
  paste("You called a dataset with", nrow(dataset))
}

mem_used()
## 48.6 MB
nrowplop(dataset = economics_long)
## [1] "You called a dataset with 2870"
mem_used()
## 48.7 MB
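Datasets are not the only way to get a promise: delayedAssign(), discussed further down, creates one directly. Here is a sketch that mimics a lazily loaded dataset with a counter instead of mem_used(), assuming the hypothetical names load_log, promised_df, name_of and rows_of. It is not how R actually implements LazyData, just an illustration of the same behaviour: substitute() leaves the promise alone, nrow() forces it, and a forced promise is not evaluated again.

```r
# Hypothetical stand-in for a lazily loaded dataset
load_log <- new.env()
load_log$times_loaded <- 0
delayedAssign("promised_df", {
  load_log$times_loaded <- load_log$times_loaded + 1
  data.frame(x = 1:3)
})

name_of <- function(dataset) deparse(substitute(dataset))
rows_of <- function(dataset) nrow(dataset)

name_of(promised_df)   # "promised_df"
load_log$times_loaded  # 0: substitute() did not force the promise
rows_of(promised_df)   # 3
load_log$times_loaded  # 1: nrow() needed the value
rows_of(promised_df)   # 3
load_log$times_loaded  # still 1: a promise is only forced once
```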

Keep all this in mind; we’ll come back to it shortly.

More about lazy evaluation

OK, now let’s dig deeper into lazy evaluation.

RTFM

Let’s start at the beginning: the R manuals. Promises and lazy evaluation are mentioned several times in the R Language Definition.

If we go to the “Promise objects” section, we learn that:

Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment. When a function is called the arguments are matched and then each of the formal arguments is bound to a promise. The expression that was given for that formal argument and a pointer to the environment the function was called from are stored in the promise.

What that means is that when a function is called, its arguments are turned into promises. These promises contain an expression and an environment (and no value at first). In a sense, what this object holds is not a value but a recipe for a value, saying “evaluate this expression in this environment”, and this recipe is only followed when we need the value.

Until that argument is accessed there is no value associated with the promise. When the argument is accessed, the stored expression is evaluated in the stored environment, and the result is returned. The result is also saved by the promise. The substitute function will extract the content of the expression slot. This allows the programmer to access either the value or the expression associated with the promise.

So here’s a clear definition of the substitute function: an “expression slot content extractor” :) In other words, when arguments are passed to a function, they are immediately turned into promises: data structures holding an expression and an environment, i.e. a recipe for a value. But here’s the thing: thanks to lazy evaluation, you can access this expression without the argument ever being given a value (i.e. without R having to look for its value).
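This can be checked in code. In the following sketch (expensive, computation_log and inspect_promise are hypothetical names), substitute() returns the expression slot without running the computation, and accessing x twice only runs it once, since the value is stored in the promise after the first access:

```r
computation_log <- new.env()
computation_log$runs <- 0

# Hypothetical "expensive" computation that records each run
expensive <- function() {
  computation_log$runs <- computation_log$runs + 1
  42
}

inspect_promise <- function(x) {
  # substitute() reads the expression slot without forcing the promise
  expr <- substitute(x)
  # The first access forces the promise; the second reuses the stored value
  values <- c(x, x)
  list(expr = expr, values = values)
}

res <- inspect_promise(expensive())
res$expr              # expensive()
res$values            # 42 42
computation_log$runs  # 1
```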

Remember our plop function, and:

plop(a = 4, b = non_existing_variable)
## [1] 40

With our newly acquired knowledge, we can tell what’s happening here: b is created as a promise, containing the expression non_existing_variable. It contains no value, but as we never try to actually evaluate it (i.e. try to access its value), there is no error.

Let’s continue on that note: b is created as a promise (expression + environment), and substitute lets us get the expression out of a promise. So we could modify our function to play with the expression contained in b:

plop <- function(a, b) {
  cat("You entered", deparse(substitute(b)), "as `b` \n")
  a * 10
}
plop(a = 4, b = non_existing_variable)
## You entered non_existing_variable as `b`

## [1] 40

But that also means we can evaluate b the way we want (for example, to create a dplyr::pull-like function):

plop <- function(a, b) {
  eval(substitute(b), envir = a)
}
plop(iris, Species)[1:10]
##  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
plop(iris, Sepal.Length)[1:10]
##  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9

(More about environment evaluation here)
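One caveat with plop(): when it is defined at the top level, symbols that are not columns of the data frame are looked up in plop()’s own environment and then in the global environment, so the local variables of a function calling plop() are not visible. A possible fix, sketched below with hypothetical names, is to pass the caller’s environment as the enclos argument of eval():

```r
# plop()-like helper that also passes the caller's environment as `enclos`
pull_like <- function(data, expr) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

# Hypothetical wrapper: `min_length` only exists in this function's environment
keep_long_sepals <- function(data, min_length) {
  keep <- pull_like(data, Sepal.Length > min_length)
  data[keep, ]
}

long_sepals <- keep_long_sepals(iris, 7)
nrow(long_sepals)
```

With the original plop() in place of pull_like(), the same call would typically fail with an “object 'min_length' not found” error.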

Or even write a dplyr::mutate-like function:

mutator <- function(a, col_name_computation){
  # In three steps here to detail the process, could be one line of code
  col_name_computation_sub <- substitute(col_name_computation)
  res <- eval(col_name_computation_sub, envir = a)
  a$new_col <- res
  a
}
mutator(head(iris), Sepal.Length * 10)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
## 1          5.1         3.5          1.4         0.2  setosa      51
## 2          4.9         3.0          1.4         0.2  setosa      49
## 3          4.7         3.2          1.3         0.2  setosa      47
## 4          4.6         3.1          1.5         0.2  setosa      46
## 5          5.0         3.6          1.4         0.2  setosa      50
## 6          5.4         3.9          1.7         0.4  setosa      54

(Of course, the real dplyr::mutate does A LOT more, it’s just for the example)

Let’s sum up what is happening here:

  • I give a and col_name_computation as inputs
  • Both a and col_name_computation become promises, linked to the expressions given as inputs. Neither is evaluated at this point, thanks to lazy evaluation
  • R extracts the expression contained in col_name_computation and puts it in col_name_computation_sub, which is at that stage a call
  • I have defined a custom rule for evaluation: this call is evaluated in the context of the given data frame (remember that data frames are lists, and you can eval a call inside a list)
  • This newly created vector is put inside the data frame as a column
  • The modified data frame is returned
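As a variation, the captured expression can also be used as text. This sketch (mutator_named is a hypothetical name, and naming the column after the call is just one possible choice) names the new column after the deparsed call instead of the fixed new_col:

```r
mutator_named <- function(a, col_name_computation) {
  expr <- substitute(col_name_computation)
  # deparse() turns the captured call back into text, used as the column name
  a[[deparse(expr)]] <- eval(expr, envir = a, enclos = parent.frame())
  a
}

named_res <- mutator_named(head(iris), Sepal.Length * 10)
names(named_res)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" "Sepal.Length * 10"
```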

To dissect what is happening a little further:

mutator <- function(a, col_name_computation){
  col_name_computation_sub <- substitute(col_name_computation)
  cat("`col_name_computation_sub` is: ")
  print(col_name_computation_sub)
  cat("its class is: ")
  print(class(col_name_computation_sub))
  cat("it is evaluated in: ")
  print(substitute(a))
  
  res <- eval(col_name_computation_sub, envir = a)
  cat("`res` is: ")
  print(res)
  
  a$new_col <- res
  invisible(a)
}
mutator(head(iris), Sepal.Length * 10)
## `col_name_computation_sub` is: Sepal.Length * 10
## its class is: [1] "call"
## it is evaluated in: head(iris)
## `res` is: [1] 51 49 47 46 50 54
mutator(head(mtcars), mpg * disp)
## `col_name_computation_sub` is: mpg * disp
## its class is: [1] "call"
## it is evaluated in: head(mtcars)
## `res` is: [1] 3360.0 3360.0 2462.4 5521.2 6732.0 4072.5

Detecting promises

In case you were wondering how to check if something is a promise… let’s continue from the manual:

Within the R language, promise objects are almost only seen implicitly: actual function arguments are of this type. There is also a delayedAssign function that will make a promise out of an expression. There is generally no way in R code to check whether an object is a promise or not, nor is there a way to use R code to determine the environment of a promise.

There is a way to create a promise: the delayedAssign function. At the time of writing, I haven’t found a use case for it, but I’ll be glad to hear about one in the comments!

delayedAssign("a", this_var)
a
## Error in eval(expr, envir, enclos): object 'this_var' not found
this_var <- 12
a
## Warning: restarting interrupted promise evaluation

## [1] 12
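As for use cases, one possibility is deferring an expensive setup step until the value is first needed. The sketch below (settings_store and settings are hypothetical names) uses the assign.env argument of delayedAssign() to put the promise in a dedicated environment; the expression runs only on first access, and its result is reused afterwards:

```r
# Hypothetical settings store
settings_store <- new.env()
settings_store$builds <- 0

delayedAssign(
  "settings",
  {
    settings_store$builds <- settings_store$builds + 1
    list(threshold = 5)
  },
  assign.env = settings_store
)

settings_store$builds                              # 0: nothing computed yet
get("settings", envir = settings_store)$threshold  # 5
get("settings", envir = settings_store)$threshold  # 5
settings_store$builds                              # 1
```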

Evaluation, and force()ing evaluation

From the “Argument evaluation” section:

The process of filling the value slot of a promise by evaluating the contents of the expression slot in the promise’s environment is called forcing the promise. A promise will only be forced once, the value slot content being used directly later on. A promise is forced when its value is needed.

Forcing is “filling” the value slot of a promise. This can be done by simply accessing the object, or by using the force function (note that force is just syntactic sugar: it simply returns its argument). Let’s see how this can be useful with a plot (example from the “Substitutions” section):

logplot <- function(y, ylab = deparse(substitute(y))) {
    y <- log(y)
    plot(y, ylab = ylab)
}
logplot(1:10)

Here, as ylab is forced after y has changed, the label is built from the modified y: instead of 1:10, you get the deparsed log values, an ugly c(...) expression. This can be changed by forcing ylab first:

logplot <- function(y, ylab = deparse(substitute(y))) {
    force(ylab)
    y <- log(y)
    plot(y, ylab = ylab)
}
logplot(1:10)

As said before, a promise is only forced once, so ylab gets its value on the first line of the function body, before y is modified.
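The same mechanism can be checked without drawing anything. This sketch keeps the default ylab of logplot() but returns the label instead of plotting (label_lazy and label_forced are hypothetical names):

```r
# Same default as logplot(), but returning the label instead of plotting
label_lazy <- function(y, ylab = deparse(substitute(y))) {
  y <- log(y)
  ylab  # forced after `y` has become an ordinary variable holding log values
}

label_forced <- function(y, ylab = deparse(substitute(y))) {
  force(ylab)  # forced while `y` is still an unevaluated promise
  y <- log(y)
  ylab
}

label_forced(1:10)  # "1:10"
label_lazy(1:10)    # deparsed log values, starting with "c(0, 0.693147180559945"
```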

Remember our mean_of_that function from before. Look at how it changes if I force the evaluation of mean_of before changing x:

mean_of_that <- function(x, mean_of = mean(x)){
  force(mean_of)
  x <- x[!is.na(x)]
  print(x)
  cat("The mean of x is", mean_of)
}
mean_of_that(c(1,2,3,4,NA))
## [1] 1 2 3 4
## The mean of x is NA
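A classic case where force() matters is a function factory. In this sketch (the names are hypothetical), the lazy version keeps a promise to i, which is only forced after the loop has finished, so every multiplier ends up using the last value of i:

```r
# Without force(): `factor` stays a promise holding the expression `i`
make_multiplier_lazy <- function(factor) {
  function(x) x * factor
}

# With force(): `factor` is evaluated when the factory is called
make_multiplier <- function(factor) {
  force(factor)
  function(x) x * factor
}

lazy_fns <- list()
forced_fns <- list()
for (i in 1:3) {
  lazy_fns[[i]] <- make_multiplier_lazy(i)
  forced_fns[[i]] <- make_multiplier(i)
}

lazy_fns[[1]](10)    # 30: the promise was forced after the loop, when i was 3
forced_fns[[1]](10)  # 10
```

Since R 3.2.0, higher-order functions such as lapply() force the arguments of the functions they apply, so this surprise mostly shows up with explicit loops like this one.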

More about lazy evaluation

Here are some random quotes and elements found on the internet, not necessarily linked to R:

Lazy evaluation : Waiting until the last possible moment to evaluate an expression, especially for the purpose of optimizing an algorithm that may not use the value of the expression.

Since this method of evaluation runs f as little as possible, it is called “lazy evaluation”. It makes it practical to modularize a program as a generator that constructs a large number of possible answers, and a selector that chooses the appropriate one. While some other systems allow programs to be run together in this manner, only functional languages (and not even all of them) use lazy evaluation uniformly for every function call, allowing any part of a program to be modularized in this way. Lazy evaluation is perhaps the most powerful tool for modularization in the functional programmer’s repertoire.

Lazy evaluation (or call-by-need) delays evaluating an expression until it is actually needed; when it is evaluated, the result is saved so repeated evaluation is not needed. Lazy evaluation is a technique that can make some algorithms easier to express compactly or much more efficiently, or both. It is the normal evaluation mechanism for strict functional (side-effect-free) languages such as Haskell. However, automatic lazy evaluation is awkward to combine with side-effects such as input-output. It can also be difficult to implement lazy evaluation efficiently, as it requires more book-keeping.
