# Down the rabbit hole with tidy eval — Part 1

7 minute(s) read

Some random explanations about programming with tidy eval.

## What on earth is evaluation?

So, let’s start with a simple question: what is evaluation? Evaluation is the process of analyzing an expression in order to give the user something back. For example, in R, standard evaluation goes like this:

• you type/send something to the console (called an expression)
• press enter
• R does some magic stuff
• R returns the value associated with the expression

For example :

``````
# You type 1, the expression
1
# R evaluates 1, and returns
[1] 1

a <- 1
# Here, the expression is a (a is the symbol)
# Standard eval: when a symbol is evaluated, it returns its value
a
[1] 1
``````

Pretty clear, isn’t it?

Spoiler: the part about R doing magic stuff wasn’t quite true. In fact, R takes the symbol you’ve entered (here `a`), turns it into an internal representation, then looks in the environment the expression is evaluated in to return the value associated with it. If R doesn’t find the value there, it goes up to the parent env, then to that env’s parent, and so on.

This is R standard evaluation. The returned object is the value the symbol is linked to. Keep this in mind, you’ll need this later.
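To make the lookup a bit more concrete, here is a minimal base R sketch. The names `parent_env`, `child_env` and `sym_a` are made up for the example; the point is just to watch R walk up from one environment to its parent.

```r
# Build a small chain of environments: child_env -> parent_env -> (caller env)
parent_env <- new.env()
child_env  <- new.env(parent = parent_env)

# `a` is only defined in the parent environment
assign("a", 1, envir = parent_env)

# quote() captures the symbol without evaluating it
sym_a <- quote(a)
is.symbol(sym_a)

# eval() looks for `a` in child_env first, does not find it there,
# then moves up to parent_env, where the value lives
eval(sym_a, envir = child_env)

# With inherits = FALSE, exists() only checks the environment itself
exists("a", envir = child_env, inherits = FALSE)   # FALSE
exists("a", envir = parent_env, inherits = FALSE)  # TRUE
```

The symbol is the same in both cases; only the environment you start from changes where the value is found.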

## Aside: about lazy evaluation

One of R’s strengths is lazy evaluation. These strange words mean that R only evaluates a function argument if that argument is actually used. That’s why this kind of function works:

``````
lazy <- function(a, b){
  print("please take a nap")
}
lazy()
[1] "please take a nap"

lazy <- function(a, b){
  print(a)
}
lazy("please take a nap")
[1] "please take a nap"
``````

In function 1, `a` and `b` are never used in the body of the function, so they are never evaluated and no error is thrown. In function 2, `b` is never used, so it’s not evaluated, and no error is thrown either. On the other hand, this doesn’t work:

``````
lazy <- function(a, b){
  print(a)
  print(b)
}
lazy("please take a nap")
[1] "please take a nap"
Error in print(b) :
  argument "b" is missing, with no default
``````

Here, you can see that it throws an error: `b` is needed. You can also notice that `a` is evaluated first, the string is printed, and only then does the missing `b` throw an error.
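If you want to see laziness happening, one way is to pass arguments that announce themselves when they are evaluated. This is only a sketch: `noisy()` and `eager()` are hypothetical helpers written for the example, while `force()` is a real base R function.

```r
# Hypothetical helper: tells you when it actually runs
noisy <- function(label) {
  message("evaluating ", label)
  label
}

lazy <- function(a, b) {
  message("entering lazy()")
  a  # `a` is evaluated only here, on first use
}

# "evaluating b" never shows up, because `b` is never used in the body
res <- lazy(noisy("a"), noisy("b"))

# force() evaluates an argument right away, if that is what you need
eager <- function(a, b) {
  force(b)
  a
}
res2 <- eager(noisy("a"), noisy("b"))
```

In the first call, the message from `lazy()` comes before the one from `noisy("a")`: the argument is only computed when the body reaches it.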

## About scoping

Quick thing to keep in mind here: the notion of environment. Each expression is by default evaluated in its own environment. If a symbol is missing there, R goes up to the parent env, then to that env’s parent, etc.

Each function call creates its own environment, which can have its own rules (so basically its own rules for evaluating a symbol). This env is opened when the function is called and closed when the function finishes. That’s why you can’t directly access an object created inside a function:

``````
create <- function(){
  a <- 1
}

create()
a
Error: object 'a' not found

# Special operator to override this

create <- function(){
  a <<- 1
}
create()
a
[1] 1

# But please DON'T do that.
``````
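A small sketch of the two ideas above, with made-up names (`outer_fun`, `inner_fun`, `a_value`): a function can read symbols from the environment that encloses it, and the usual alternative to `<<-` is simply to return the value.

```r
outer_fun <- function() {
  a <- 1
  inner_fun <- function() {
    # `a` is not defined in inner_fun, so R goes up to the
    # enclosing environment (the call to outer_fun) and finds it there
    a + 1
  }
  inner_fun()
}
outer_fun()  # 2

# Instead of `<<-`, return the value and assign it where you need it
create <- function() {
  a <- 1
  a
}
a_value <- create()
```

Nothing leaks out of `create()` here: the caller decides where the result is stored.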

## Let’s focus: what about tidy eval?

So, back to our original point. I’ve been diving into tidy eval lately as I’ve been contributing to {narnia}, a package designed to analyse missing data, the tidy way. The whole philosophy of the package being the tidyverse, I needed to contribute with the same philosophy in mind.

So basically, I needed to create a function that takes a `df` and the unquoted name `x` of a column, calls `dplyr::group_by` on this column, then calls `ggplot2::ggplot` with `aes(x)`, `x` being the name of the column previously specified. Thing is, you can’t simply do:

``````
# Note : this is obviously not the function I was working on. This is an example.
#
# So you want to turn this into a function :
library(tidyverse)
iris %>%
  group_by(Species) %>%
  slice(5:10) %>%
  ggplot(aes(Species, Sepal.Length)) +
  geom_point()

# Let's try the simple way

gg_top <- function(df, col_group, col_plot){
  df %>%
    group_by(col_group) %>%
    slice(5:10) %>%
    ggplot(aes(col_group, col_plot)) +
    geom_point()
}

gg_top(df = iris, col_group = Species, col_plot = Sepal.Length)

Error in grouped_df_impl(data, unname(vars), drop) :
  Column `col_group` is unknown
``````

OK. Here R simply can’t find `col_group`. But where is this coming from? I did specify that `col_group` was equal to `Species`. Why is it looking for `col_group`?

Let’s try something else.

``````
# This works
select(iris, Species)

# So what if I want to reproduce it?
# I can think of

select_custom <- function(df, col){
  df[, col]
}
select_custom(df = iris, col = Species)
Error in `[.data.frame`(df, , col) : object 'Species' not found

# But this works:
select_custom(df = iris, col = "Species")
``````

God damn, how is it that `dplyr::select` works with an unquoted element, while `select_custom` needs a quoted string? That’s because:

• `select_custom` uses standard evaluation: R sees the symbol `Species` and tries to evaluate it the standard way, i.e. by looking for the value of `Species` in the surrounding environments. It doesn’t find it, so it throws an error.
• When `"Species"` is quoted, R evaluates it for what it is: a string. So R doesn’t try to look up a value for it.
• `dplyr::select` creates an environment which has a custom method of evaluation. This is why you can pass an unquoted name there: R will not evaluate the symbol by looking for its value in the env.

In each `dplyr::function(df, var)`, every `var` is evaluated in the environment of the function, which has a special way of computing symbols. In the case of `filter`, R looks for a column named `var` in `df` (in practice, that’s not exactly how it works, but you get the point).

This explains the error being thrown earlier: `group_by` was looking for the `col_group` column inside our data.frame.
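To get a feel for what “custom evaluation” can look like, here is a base R sketch. It is not how dplyr is implemented; `select_nse()` and `pull_nse()` are hypothetical functions that only rely on `substitute()`, `deparse()` and `eval()`.

```r
# Accept an unquoted column name and turn it into a string
select_nse <- function(df, col) {
  # substitute() captures what the caller typed, without evaluating it
  col_expr <- substitute(col)
  # deparse() turns that expression into a string such as "Species"
  col_name <- deparse(col_expr)
  df[, col_name, drop = FALSE]
}

head(select_nse(iris, Species))

# Evaluate the captured expression with the data frame acting as an
# environment: symbols are first looked up among the columns of df
pull_nse <- function(df, col) {
  eval(substitute(col), envir = df, enclos = parent.frame())
}

head(pull_nse(iris, Sepal.Length))
```

In both cases the function decides what to do with the symbol before R gets a chance to evaluate it the standard way.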

## Getting started

Then, the big question: how can we program with dplyr? How can we pass the unquoted `Species` arg from the function `gg_top` to our `group_by`, and `Sepal.Length` to the `ggplot`? Let’s start by breaking our problem into two parts: the `dplyr`, then the `ggplot`.

So first, we need to create a function that takes a data.frame, makes a `group_by` on a column, then returns the `slice(5:10)`. Basically something doing:

``````
iris %>%
  group_by(Species) %>%
  slice(5:10)

# We could think of
slicer <- function(df, var){
  df %>%
    group_by(var) %>%
    slice(5:10)
}
slicer(df = iris, var = Species)

> Error in grouped_df_impl(data, unname(vars), drop) :
  Column `var` is unknown
``````

Here, you can see that R is looking for a `var` column. That’s because `var` is evaluated in the environment created by `group_by`, which looks for a column named `var` in the `iris` df. So how to prevent that?

We could think of:

``````
slicer(df = iris, var = "Species")

> Error in grouped_df_impl(data, unname(vars), drop) :
  Column `var` is unknown
``````

But 1: that’s not working (`group_by` still sees the symbol `var`, whatever value we give it), 2: we don’t want to quote.

So the thing is: `dplyr` functions work with a special type of object called a `quosure`, which stores an expression together with its environment. You can create one with `quo()`.

``````
quo(Species)
<quosure: global>
~Species

# So is this going to work?
slicer <- function(df, var){
  df %>%
    group_by(quo(var)) %>%
    slice(5:10)
}
slicer(df = iris, var = Species)
Error in mutate_impl(.data, dots) :
  Column `quo(var)` is of unsupported type quoted call
``````

Nope! Obviously here, `group_by(quo(var))` computes `quo(var)` as a quosure, so it does:

``````
quo(quo(var))
<quosure: frame>
~quo(var)
``````

Not what we’ve been looking for either. We need a way to prevent the symbol `var` from being evaluated the standard way, and have it evaluated with tidy eval instead. Good news, there’s a function for that: `enquo()`. This function:

• takes a symbol
• quotes the R code supplied
• captures the environment
• returns a quosure
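The difference between `quo()` and `enquo()` inside a function is easier to see side by side. A minimal sketch, assuming the rlang package is installed; `with_quo()` and `with_enquo()` are hypothetical names, while `quo_get_expr()` and `quo_get_env()` are rlang accessors.

```r
library(rlang)

# quo() quotes exactly what is written inside it: the symbol `var`
with_quo <- function(var) quo(var)

# enquo() quotes what the caller supplied for `var`
with_enquo <- function(var) enquo(var)

q1 <- with_quo(Species)
q2 <- with_enquo(Species)

quo_get_expr(q1)  # the symbol `var`
quo_get_expr(q2)  # the symbol `Species`

# The quosure also remembers the environment the code was typed in
quo_get_env(q2)
```

So `enquo()` is the one to reach for when the thing to quote arrives through a function argument.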

Then, we need a way to tell `group_by` that we’ve taken care of the “quosurisation” (that’s not the real word, you know!). So… here comes `!!` (to be pronounced “Bang Bang” :) )

``````
slicer <- function(df, var){
  enquo_var <- enquo(var)
  df %>%
    # !! unquotes enquo_var: group_by uses the quosure it holds
    # instead of quoting the name `enquo_var` itself
    group_by(!!enquo_var) %>%
    slice(5:10)
}
# That works!
slicer(df = iris, var = Species)
``````

[emoji party]
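If you’re curious about what `!!` changes, you can build the call without running it. This is just a sketch, assuming rlang is installed: `with_bang()` and `without_bang()` are hypothetical helpers, and `expr()` is rlang’s function for quoting an expression while honouring `!!`. Since nothing is evaluated, `group_by` and `df` don’t even need to exist here.

```r
library(rlang)

with_bang <- function(var) {
  enquo_var <- enquo(var)
  # expr() builds the call but does not run it
  expr(group_by(df, !!enquo_var))
}

without_bang <- function(var) {
  enquo_var <- enquo(var)
  expr(group_by(df, enquo_var))
}

built <- with_bang(Species)
plain <- without_bang(Species)

# With !!, the third element of the call is the quosure for `Species`
quo_get_expr(built[[3]])

# Without !!, it is just the name of our local variable
plain[[3]]
```

That second case is the one that sends `group_by` looking for a column literally called `enquo_var`.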

## The ggplot part

So now, we need to pass `col_group` and `col_plot` into the ggplot call. We may be tempted to pass `!!enquo_col_plot` the same way we passed it to `group_by`. Thing is: at the time of writing, tidy eval is not yet implemented in `ggplot2`, so you can’t pass the `enquo(var)` to it.

``````
gg_top <- function(df, col_group, col_plot){
  enquo_col_group <- enquo(col_group)
  enquo_col_plot <- enquo(col_plot)
  df %>%
    group_by(!!enquo_col_group) %>%
    slice(5:10) %>%
    ggplot(aes(!!enquo_col_group, !!enquo_col_plot)) +
    geom_point()
}

gg_top(df = iris, col_group = Species, col_plot = Sepal.Length)

Error in (function (x)  : could not find 'enquo_var'
``````

The trick is: you can use `quo_name`, which returns a character string with the name of the expression you’ve typed. Pass it to `ggplot2::aes_string`… and Voilà!

``````
gg_top <- function(df, col_group, col_plot){
  enquo_col_group <- enquo(col_group)
  enquo_col_plot <- enquo(col_plot)
  df %>%
    group_by(!!enquo_col_group) %>%
    slice(5:10) %>%
    ggplot(aes_string(quo_name(enquo_col_group), quo_name(enquo_col_plot))) +
    geom_point()
}

gg_top(df = iris, col_group = Species, col_plot = Sepal.Length)
``````

[emoji party]^2
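For the record, `aes()` accepts `!!` directly from ggplot2 3.0.0 onwards, so on a recent install the first attempt should work as written. A sketch under that assumption (ggplot2 >= 3.0.0 and dplyr installed); `gg_top_tidy()` is a hypothetical name used to avoid clashing with the function above.

```r
library(dplyr)
library(ggplot2)

# Assumes ggplot2 >= 3.0.0, where aes() supports unquoting with !!
gg_top_tidy <- function(df, col_group, col_plot) {
  enquo_col_group <- enquo(col_group)
  enquo_col_plot <- enquo(col_plot)
  df %>%
    group_by(!!enquo_col_group) %>%
    slice(5:10) %>%
    ggplot(aes(!!enquo_col_group, !!enquo_col_plot)) +
    geom_point()
}

p <- gg_top_tidy(df = iris, col_group = Species, col_plot = Sepal.Length)
p
```

On older versions of ggplot2, the `quo_name()` plus `aes_string()` route stays the one to use.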

Sorry, that was quite a long post. I hope it has shed some light on a dark side of the tidyverse :)

Coming soon: more on tidy eval, environment, and computing on the R language.
