A Crazy Little Thing Called {purrr} - Part 5: code optimization

4 minute(s) read

This might not be the last time I make this Queen reference.

There’s a general saying in programming that you should try to write as less code as possible. Spoiler: this is not because developers are lazy. On the contrary, concise code takes a lot of time to write. But it can save a massive amount of time on the long run, as it creates fewer bugs, and make the code easier to maintain.

As you know, I love {purrr}. Most tutorials on the web focus on the iteration part (don’t make me say what I didn’t say: the iteration tools are incredibly poweful), yet forgetting the “programming” part of {purrr} would make you miss a lot of cool features from this package. I hope my series of posts has convince you to dive into this “programming with {purrr}” world :)

So, today, we’re gonna have a look at what you can do with {purrr} to write more efficient R.

Repeat after me: don’t repeat yourself

It has been said countless time: if you copy and paste a piece of code more than two times, you need a function. Well, I know, it’s not always that easy to find the perfect function and RStudio makes it so easy to just copy down the line you’ve just typed. But everytime you wish to do that, remember that this is just “adding more fuel to the bug fire”.

(Ok, that might not be important if you’re just randomly trying stuffs, but let’s stay focused and imagine we’re responsable adults building packages, or doing reproducible research…)

Functions returning functions

R is an amazingly flexible language. Everything that exists in R is an object, and everything that happens is due to a function. So yes, functions are objects that can turn objects into other objects.

And if a function can take objects and return an object, there’s no reason these objects can’t be functions. That’s exactly what safely (seen in the first part of this series) and friends do. But today I’m here to talk about some other functions, compose and partial.

Compose

library(tidyverse)
library(broom)

Have you ever been in a situation when you wrote this kind of code?

lm(Sepal.Length ~ Species, data = iris) %>% tidy() %>% filter(p.value < 0.05)
lm(Pepal.Length ~ Species, data = iris) %>% tidy() %>% filter(p.value < 0.05)
lm(Sepal.Width ~ Species, data = iris) %>% tidy() %>% filter(p.value < 0.05)
lm(Sepal.Length ~ Species, data = iris) %>% tidy() %>% ilter(p.value < 0.05)

Yes, we’ve all done that. The thing is, with this kind of code, our eyes are focused on what stays the same, instead of focusing of what’s changing. And, most of all, there’s a high probability you didn’t noticed I made a typo on line 2 (example inspired by http://r4ds.had.co.nz/iteration.html).

And there’s also one on line 4.

So here comes compose(). This function takes as arguments a series of functions, and returns a function that will perform the series of function successively. So here, we can turn :

tidy(lm(Sepal.Length ~ Species, data = iris))

into

tidy_lm <- compose(tidy, lm)
tidy_lm(Sepal.Length ~ Species, data = iris)
##                term estimate  std.error statistic       p.value
## 1       (Intercept)    5.006 0.07280222 68.761639 1.134286e-113
## 2 Speciesversicolor    0.930 0.10295789  9.032819  8.770194e-16
## 3  Speciesvirginica    1.582 0.10295789 15.365506  2.214821e-32

So yes, this is a small improvement you could say. But the thing is, on this kind of code :

tidy_fancy_calculus <- compose(tidy, lm)
tidy_fancy_calculus(Sepal.Length ~ Species, data = iris)
tidy_fancy_calculus(Petal.Length ~ Species, data = iris)
tidy_fancy_calculus(Sepal.Width ~ Species, data = iris)
tidy_fancy_calculus(Petal.Width ~ Species, data = iris)

I’ll just have to make a change at one spot if I need to use another lm-like function.

*Note that the order of the function is not the a magrittr order, but a “classic” order, i.e. functions passed to compose are executed from right to left. *

And of course, you can always pass mappers in a compose :

clean_lm <- compose(as_mapper(~ arrange(.x, desc(p.value))), 
                    as_mapper(~ filter(.x, p.value < 0.05)),
                    tidy, 
                    lm)
clean_lm(Sepal.Length ~ Sepal.Width, data = iris)
##          term estimate std.error statistic      p.value
## 1 (Intercept) 6.526223 0.4788963  13.62763 6.469702e-28
# and etc
#clean_lm(Sepal.Length ~ Sepal.Width, data = iris)
#clean_lm(Sepal.Length ~ Sepal.Width, data = iris)
#clean_lm(Sepal.Length ~ Sepal.Width, data = iris)

For now formulas are not accepted as such (you need to call as_mapper), but you now, this is just a PR away ;)

Less code, more rokk

(Guitar Hero 2 reference)

So now, we need even less code. Yes, that’s possible, thanks to partial. As its name states, this function returns a partially filled function. That is to say that :

mean_na_rm <- partial(mean, na.rm = TRUE)
mean_na_rm(airquality$Ozone)
## [1] 42.12931

is the same as:

mean(airquality$Ozone, na.rm = TRUE)
## [1] 42.12931

Ok, that’s not that usefull on that example, but you get the idea :)

Let’s wrap up

To sum up, we need a function that will do a lm on iris, broom::tidy the result, filter the p.value, and arrange following the p.value column. So, here it is!

clean_lm_iris <- compose(as_mapper(~ arrange(.x, desc(p.value))), 
                    as_mapper(~ filter(.x, p.value < 0.05)),
                    tidy, 
                    partial(lm, data = iris))

clean_lm_iris(Sepal.Length ~ Sepal.Width)
##          term estimate std.error statistic      p.value
## 1 (Intercept) 6.526223 0.4788963  13.62763 6.469702e-28

What do you think?