Chapter 7 The mtcars example and drake plan generation

This chapter walks through drake’s main functionality using the mtcars example. It sets up the project and runs it repeatedly to demonstrate drake’s most important features.

7.1 Get the code

Write the code files to your workspace.

drake_example("mtcars")

The new mtcars folder now includes a file structure of a serious drake project, plus an interactive-tutorial.R to narrate the example. The code is also online here.

7.2 Quick examples

Inspect and run your project.

library(drake)
load_mtcars_example()            # Get the code with drake_example("mtcars").
config <- drake_config(my_plan) # Master configuration list
vis_drake_graph(config)         # Hover, click, drag, zoom, pan.
make(my_plan)                   # Run the workflow.
outdated(config)                # Everything is up to date.

Debug errors.

failed()                   # Targets that failed in the most recent `make()`
context <- diagnose(large) # Diagnostic metadata: errors, warnings, etc.
error <- context$error
str(error)                 # Object of class "error"
error$message
error$call
error$calls                # Full traceback of nested calls leading up to the error. # nolint
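
For readers unfamiliar with R condition objects, the sketch below uses only base R to show the kind of object that holds a message and a call. The function risky_step() is a hypothetical stand-in for a failing command. A plain base R condition has no calls field, so the traceback element shown above does not appear here.

```r
# Hypothetical stand-in for a command that fails.
risky_step <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}

# Capture the error condition instead of letting it halt the session.
err <- tryCatch(risky_step("a"), error = function(e) e)

inherits(err, "error")  # TRUE for any error condition
conditionMessage(err)   # Same text as err$message
conditionCall(err)      # The call that raised the error, same as err$call
```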

Dive deeper into the built-in examples.

drake_example("mtcars") # Write the code files.
drake_examples()        # List the other examples.

7.3 The motivation of the mtcars example

Is there an association between the weight and the fuel efficiency of cars? To find out, we use the mtcars dataset from the datasets package. The mtcars dataset originally came from the 1974 Motor Trend US magazine, and it contains design and performance data on 32 models of automobile.

# ?mtcars # more info
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Here, wt is weight in units of 1000 lbs (see ?mtcars), and mpg is fuel efficiency in miles per gallon. We want to figure out if there is an association between wt and mpg. The mtcars dataset itself only has 32 rows, so we generate two larger bootstrapped datasets and then analyze them with regression models. We summarize the regression models to see if there is an association.
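
Before any bootstrapping, a quick look at the raw data already hints at the answer. The sketch below uses only base R and is not part of the mtcars example itself.

```r
# Correlation between weight and fuel efficiency in the raw data.
r <- cor(mtcars$wt, mtcars$mpg)

# Simple linear regression of mpg on wt using all 32 cars.
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]

r      # Negative: heavier cars tend to have lower mpg.
slope  # Negative slope, in miles per gallon per unit of wt.
```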

7.4 Set up the mtcars example

Before you run your project, you need to set up the workspace. In other words, you need to gather the “imports”: functions, pre-loaded data objects, and saved files that you want to be available before the real work begins.

library(knitr) # drake knows which packages you load.
library(drake)

We need a function to bootstrap larger datasets from mtcars.

# Sample n rows from a dataset, with replacement.
random_rows <- function(data, n){
  data[sample.int(n = nrow(data), size = n, replace = TRUE), ]
}

# Bootstrapped datasets from mtcars.
simulate <- function(n){
  # Pick a random set of cars to bootstrap from the mtcars data.
  data <- random_rows(data = mtcars, n = n)

  # x is the car's weight, and y is the fuel efficiency.
  data.frame(
    x = data$wt,
    y = data$mpg
  )
}
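
As a quick check of these helpers, the sketch below repeats the two definitions so that it runs on its own and confirms the shape of a simulated dataset. The seed is an assumption made here for repeatability; the example itself does not set one.

```r
random_rows <- function(data, n) {
  data[sample.int(n = nrow(data), size = n, replace = TRUE), ]
}
simulate <- function(n) {
  data <- random_rows(data = mtcars, n = n)
  data.frame(x = data$wt, y = data$mpg)
}

set.seed(1)  # Assumption: seed chosen only to make this sketch repeatable.
small <- simulate(48)

dim(small)    # 48 rows, 2 columns
names(small)  # "x" "y"

# With 48 draws from only 32 cars, at least one car must repeat.
any(duplicated(small))
```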

We also need functions to apply the regression models we need for detecting associations.

# Is fuel efficiency linearly related to weight?
reg1 <- function(d){
  lm(y ~ x, data = d)
}

# Is fuel efficiency related to the SQUARE of the weight?
reg2 <- function(d){
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}
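
To see what the two model functions return, the sketch below applies them to one data frame and inspects the coefficient names. As a simplifying assumption it uses the raw mtcars columns rather than a bootstrapped sample.

```r
reg1 <- function(d) {
  lm(y ~ x, data = d)
}
reg2 <- function(d) {
  d$x2 <- d$x ^ 2
  lm(y ~ x2, data = d)
}

# Assumption: use the raw mtcars columns instead of a bootstrapped sample.
d <- data.frame(x = mtcars$wt, y = mtcars$mpg)

fit1 <- reg1(d)
fit2 <- reg2(d)

names(coef(fit1))  # "(Intercept)" "x"
names(coef(fit2))  # "(Intercept)" "x2"
```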

We want to summarize the final results in an R Markdown report, so we need the report.Rmd source file. You can get it with drake_example("mtcars") or load_mtcars_example().

drake_example("mtcars", overwrite = TRUE)
file.copy("mtcars/report.Rmd", ".", overwrite = TRUE)
#> [1] TRUE

Here are the contents of the report. It will serve as a final summary of our work, and we will process it at the very end. Admittedly, some of the text spoils the punch line.

cat(readLines("report.Rmd"), sep = "\n")
#> ---
#> title: "Final results report for the mtcars example"
#> author: You
#> output: html_document
#> ---
#> 
#> # The weight and fuel efficiency of cars
#> 
#> Is there an association between the weight and the fuel efficiency of cars? To find out, we use the `mtcars` dataset from the `datasets` package. The `mtcars` data originally came from the 1974 Motor Trend US magazine, and it contains design and performance data on 32 models of automobile.
#> 
#> ```{r showmtcars}
#> # ?mtcars # more info
#> head(mtcars)
#> ```
#> 
#> Here, `wt` is weight in tons, and `mpg` is fuel efficiency in miles per gallon. We want to figure out if there is an association between `wt` and `mpg`. The `mtcars` dataset itself only has 32 rows, so we generated two larger bootstrapped datasets. We called them `small` and `large`.
#> 
#> ```{r example_chunk}
#> library(drake)
#> head(readd(small)) # 48 rows
#> loadd(large)       # 64 rows
#> head(large)
#> ```
#> 
#> Then, we fit a couple regression models to the `small` and `large` to try to detect an association between `wt` and `mpg`. Here are the coefficients and p-values from one of the model fits.
#> 
#> ```{r second_example_chunk}
#> readd(coef_regression2_small)
#> ```
#> 
#> Since the p-value on `x2` is so small, there may be an association between weight and fuel efficiency after all.
#> 
#> # A note on knitr reports in drake projects.
#> 
#> Because of the calls to `readd()` and `loadd()`, `drake` knows that `small`, `large`, and `coef_regression2_small` are dependencies of this R Markdown report. This dependency relationship is what causes the report to be processed at the very end.

Now, all our imports are set up. When the real work begins, drake will import functions and data objects from your R session environment

ls()
#>  [1] "as_chr"          "bad_config"      "bad_plan"       
#>  [4] "col"             "combos"          "config"         
#>  [7] "config1"         "config2"         "create_plot"    
#> [10] "get_logs"        "get_rmspe"       "good_config"    
#> [13] "good_plan"       "hist"            "latest_log_date"
#> [16] "lots_of_sds"     "make_my_plot"    "make_my_table"  
#> [19] "my_grid"         "plan"            "plan1"          
#> [22] "plan2"           "plot_rmspe"      "predictors"     
#> [25] "Produc"          "random_rows"     "reg1"           
#> [28] "reg2"            "results"         "simulate"       
#> [31] "small_config"    "small_plan"      "tmp"            
#> [34] "url"             "vals"            "y_vals"

and saved files from your file system.

list.files()
#> [1] "mtcars"     "report.Rmd"

7.5 The drake plan

Now that your workspace of imports is prepared, we can outline the real work step by step in a drake plan.

load_mtcars_example() # Get the code with drake_example("mtcars").
my_plan
#> # A tibble: 15 x 2
#>    target              command                                             
#>    <chr>               <expr>                                              
#>  1 report              knit(knitr_in("report.Rmd"), file_out("report.md"),…
#>  2 small               simulate(48)                                       …
#>  3 large               simulate(64)                                       …
#>  4 regression1_small   reg1(small)                                        …
#>  5 regression1_large   reg1(large)                                        …
#>  6 regression2_small   reg2(small)                                        …
#>  7 regression2_large   reg2(large)                                        …
#>  8 summ_regression1_s… suppressWarnings(summary(regression1_small$residual…
#>  9 summ_regression1_l… suppressWarnings(summary(regression1_large$residual…
#> 10 summ_regression2_s… suppressWarnings(summary(regression2_small$residual…
#> 11 summ_regression2_l… suppressWarnings(summary(regression2_large$residual…
#> 12 coef_regression1_s… suppressWarnings(summary(regression1_small))$coeffi…
#> 13 coef_regression1_l… suppressWarnings(summary(regression1_large))$coeffi…
#> 14 coef_regression2_s… suppressWarnings(summary(regression2_small))$coeffi…
#> 15 coef_regression2_l… suppressWarnings(summary(regression2_large))$coeffi…

Each row is an intermediate step, and each command generates a single target. A target is an output R object (cached when generated) or an output file (declared with file_out()), and a command is just an ordinary piece of R code (not necessarily a single function call). Commands make use of targets generated by other commands, objects in your environment, input files, and namespaced objects/functions from packages (referenced with :: or :::). These dependencies give your project an underlying network representation.

# Hover, click, drag, zoom, and pan.
config <- drake_config(my_plan)
vis_drake_graph(config, width = "100%", height = "500px") # Also drake_graph()

You can also check the dependencies of individual targets and imported functions.

deps_code(reg2)
#> # A tibble: 4 x 2
#>   name  type   
#>   <chr> <chr>  
#> 1 x     globals
#> 2 y     globals
#> 3 lm    globals
#> 4 x2    globals

deps_code(my_plan$command[[1]])
#> # A tibble: 6 x 2
#>   name                   type    
#>   <chr>                  <chr>   
#> 1 knit                   globals 
#> 2 large                  loadd   
#> 3 small                  readd   
#> 4 coef_regression2_small readd   
#> 5 report.md              file_out
#> 6 report.Rmd             knitr_in

deps_code(my_plan$command[[nrow(my_plan)]])
#> # A tibble: 4 x 2
#>   name              type   
#>   <chr>             <chr>  
#> 1 summary           globals
#> 2 suppressWarnings  globals
#> 3 regression2_large globals
#> 4 coefficients      globals
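
A rough base R analogue of this kind of dependency detection is sketched below. It is a simplification for illustration only, not drake's actual code analysis: the mini plan is hypothetical, and all.vars() simply lists the variable names a command mentions.

```r
# Hypothetical mini plan: target names mapped to unevaluated commands.
commands <- list(
  small = quote(simulate(48)),
  regression1_small = quote(reg1(small)),
  coef_regression1_small = quote(
    suppressWarnings(summary(regression1_small))$coefficients
  )
)

# all.vars() lists the variables a command mentions (function names excluded).
# Keeping only names that are also targets gives target-to-target dependencies.
target_deps <- lapply(commands, function(cmd) {
  intersect(all.vars(cmd), names(commands))
})

target_deps$small                   # character(0): no upstream targets
target_deps$regression1_small       # "small"
target_deps$coef_regression1_small  # "regression1_small"
```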

List all the reproducibly-tracked objects and files.

tracked(config)
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "datasets::mtcars"      
#>  [7] "file report.Rmd"        "file report.md"        
#>  [9] "random_rows"            "reg1"                  
#> [11] "reg2"                   "regression1_large"     
#> [13] "regression1_small"      "regression2_large"     
#> [15] "regression2_small"      "report"                
#> [17] "simulate"               "small"                 
#> [19] "summ_regression1_large" "summ_regression1_small"
#> [21] "summ_regression2_large" "summ_regression2_small"

7.6 Generate the plan

7.6.1 The easy way

drake version 7.0.0 will support new special syntax to create complicated drake plans from boilerplate code. See the chapter on plans for more details. To get the functionality early, install development drake.

install.packages("remotes")
library(remotes)
install_github("ropensci/drake")

Then, use transformations to generate the plan.

my_plan <- drake_plan(
  report = knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE),
  small = simulate(48),
  large = simulate(64),
  regression1 = target(
    reg1(data),
    transform = map(data = c(small, large), .tag_out = reg)
  ),
  regression2 = target(
    reg2(data),
    transform = map(data, .tag_out = reg)
  ),
  summ = target(
    suppressWarnings(summary(reg$residuals)),
    transform = map(reg)
  ),
  coef = target(
    suppressWarnings(summary(reg))$coefficients,
    transform = map(reg)
  )
)

my_plan
#> # A tibble: 15 x 2
#>    target              command                                             
#>    <chr>               <expr>                                              
#>  1 report              knit(knitr_in("report.Rmd"), file_out("report.md"),…
#>  2 small               simulate(48)                                       …
#>  3 large               simulate(64)                                       …
#>  4 regression1_small   reg1(small)                                        …
#>  5 regression1_large   reg1(large)                                        …
#>  6 regression2_small   reg2(small)                                        …
#>  7 regression2_large   reg2(large)                                        …
#>  8 summ_regression1_s… suppressWarnings(summary(regression1_small$residual…
#>  9 summ_regression1_l… suppressWarnings(summary(regression1_large$residual…
#> 10 summ_regression2_s… suppressWarnings(summary(regression2_small$residual…
#> 11 summ_regression2_l… suppressWarnings(summary(regression2_large$residual…
#> 12 coef_regression1_s… suppressWarnings(summary(regression1_small))$coeffi…
#> 13 coef_regression1_l… suppressWarnings(summary(regression1_large))$coeffi…
#> 14 coef_regression2_s… suppressWarnings(summary(regression2_small))$coeffi…
#> 15 coef_regression2_l… suppressWarnings(summary(regression2_large))$coeffi…
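
To see roughly what map() does with reg1(data) and data = c(small, large), the base R sketch below splices each dataset name into a command template. It only illustrates the substitution idea and does not use or reproduce drake's transformation code.

```r
datasets <- c("small", "large")

# bquote() splices each dataset name into the command template as a symbol.
regression1 <- lapply(datasets, function(d) bquote(reg1(.(as.name(d)))))
names(regression1) <- paste0("regression1_", datasets)

# Deparse each command to text for display.
vapply(regression1, deparse, character(1))
```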

In the first row of the plan, knitr_in() indicates that report.Rmd is a dependency and that targets loaded with loadd() and readd() in active code chunks are also dependencies. Use file_out() to tell drake that the target is a file output.

7.6.2 The old way

drake has old wildcard templating functions to help generate plans. It is more difficult to adapt them to practical use cases, but they have been around since the early days of drake.

Here are the commands to generate the bootstrapped datasets.

my_datasets <- drake_plan(
  small = simulate(48),
  large = simulate(64))
my_datasets
#> # A tibble: 2 x 2
#>   target command     
#>   <chr>  <expr>      
#> 1 small  simulate(48)
#> 2 large  simulate(64)

For multiple replicates:

expand_plan(my_datasets, values = c("rep1", "rep2"))
#> The interface at https://ropenscilabs.github.io/drake-manual/plans.html#large-plans is better than evaluate_plan(), map_plan(), gather_by(), etc.
#> # A tibble: 4 x 2
#>   target     command     
#>   <chr>      <expr>      
#> 1 small_rep1 simulate(48)
#> 2 small_rep2 simulate(48)
#> 3 large_rep1 simulate(64)
#> 4 large_rep2 simulate(64)

Here is a template for applying our regression models to our bootstrapped datasets.

methods <- drake_plan(
  regression1 = reg1(dataset__),
  regression2 = reg2(dataset__))
methods
#> # A tibble: 2 x 2
#>   target      command        
#>   <chr>       <expr>         
#> 1 regression1 reg1(dataset__)
#> 2 regression2 reg2(dataset__)

We evaluate the dataset__ wildcard to generate all the regression commands we need.

my_analyses <- evaluate_plan(
  methods, wildcard = "dataset__",
  values = my_datasets$target
)
my_analyses
#> # A tibble: 4 x 2
#>   target            command    
#>   <chr>             <expr>     
#> 1 regression1_small reg1(small)
#> 2 regression1_large reg1(large)
#> 3 regression2_small reg2(small)
#> 4 regression2_large reg2(large)
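
The wildcard expansion above amounts to text substitution. The base R sketch below mimics it with gsub() on character strings; it is an illustration under that simplifying assumption and not how evaluate_plan() is implemented.

```r
templates <- c(
  regression1 = "reg1(dataset__)",
  regression2 = "reg2(dataset__)"
)
values <- c("small", "large")

# For each template, substitute every value for the wildcard.
expanded <- unlist(lapply(names(templates), function(target) {
  commands <- vapply(
    values,
    function(v) gsub("dataset__", v, templates[[target]], fixed = TRUE),
    character(1)
  )
  names(commands) <- paste(target, values, sep = "_")
  commands
}))

expanded  # Names are the new targets, values are the new commands.
```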

Next, we summarize each analysis of each dataset. We calculate descriptive statistics on the residuals, and we collect the regression coefficients and their p-values.

summary_types <- drake_plan(
  summ = suppressWarnings(summary(analysis__$residuals)),
  coef = suppressWarnings(summary(analysis__))$coefficients
)
summary_types
#> # A tibble: 2 x 2
#>   target command                                           
#>   <chr>  <expr>                                            
#> 1 summ   suppressWarnings(summary(analysis__$residuals))   
#> 2 coef   suppressWarnings(summary(analysis__))$coefficients

my_summaries <- evaluate_plan(
  summary_types,
  wildcard = "analysis__",
  values = my_analyses$target
)
my_summaries
#> # A tibble: 8 x 2
#>   target                command                                            
#>   <chr>                 <expr>                                             
#> 1 summ_regression1_sma… suppressWarnings(summary(regression1_small$residua…
#> 2 summ_regression1_lar… suppressWarnings(summary(regression1_large$residua…
#> 3 summ_regression2_sma… suppressWarnings(summary(regression2_small$residua…
#> 4 summ_regression2_lar… suppressWarnings(summary(regression2_large$residua…
#> 5 coef_regression1_sma… suppressWarnings(summary(regression1_small))$coeff…
#> 6 coef_regression1_lar… suppressWarnings(summary(regression1_large))$coeff…
#> 7 coef_regression2_sma… suppressWarnings(summary(regression2_small))$coeff…
#> 8 coef_regression2_lar… suppressWarnings(summary(regression2_large))$coeff…

For your knitr reports, use knitr_in() in your commands so that report.Rmd is a dependency and targets loaded with loadd() and readd() in active code chunks are also dependencies. Use file_out() to tell drake that the target is a file output.

report <- drake_plan(
  report = knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE)
)
report
#> # A tibble: 1 x 2
#>   target command                                                          
#>   <chr>  <expr>                                                           
#> 1 report knit(knitr_in("report.Rmd"), file_out("report.md"), quiet = TRUE)
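
The idea of finding loadd() and readd() calls in a report can be sketched with a regular expression over the chunk code. This is a deliberately naive illustration: the pattern and variable names are assumptions made here, and it is not claimed to be how drake analyzes knitr reports.

```r
# Code lines taken from the active chunks of report.Rmd.
chunk_code <- c(
  "library(drake)",
  "head(readd(small)) # 48 rows",
  "loadd(large)       # 64 rows",
  "head(large)",
  "readd(coef_regression2_small)"
)

# Naive pattern: readd(<name>) or loadd(<name>) with a single bare name.
pattern <- "(readd|loadd)\\(([A-Za-z0-9_.]+)\\)"
matches <- regmatches(chunk_code, regexpr(pattern, chunk_code))

# Keep only the target name inside the parentheses.
targets <- sub(pattern, "\\2", matches)
targets
```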

Finally, consolidate your workflow using rbind(). Row order does not matter.

my_plan <- rbind(report, my_datasets, my_analyses, my_summaries)
my_plan
#> # A tibble: 15 x 2
#>    target              command                                             
#>    <chr>               <expr>                                              
#>  1 report              knit(knitr_in("report.Rmd"), file_out("report.md"),…
#>  2 small               simulate(48)                                       …
#>  3 large               simulate(64)                                       …
#>  4 regression1_small   reg1(small)                                        …
#>  5 regression1_large   reg1(large)                                        …
#>  6 regression2_small   reg2(small)                                        …
#>  7 regression2_large   reg2(large)                                        …
#>  8 summ_regression1_s… suppressWarnings(summary(regression1_small$residual…
#>  9 summ_regression1_l… suppressWarnings(summary(regression1_large$residual…
#> 10 summ_regression2_s… suppressWarnings(summary(regression2_small$residual…
#> 11 summ_regression2_l… suppressWarnings(summary(regression2_large$residual…
#> 12 coef_regression1_s… suppressWarnings(summary(regression1_small))$coeffi…
#> 13 coef_regression1_l… suppressWarnings(summary(regression1_large))$coeffi…
#> 14 coef_regression2_s… suppressWarnings(summary(regression2_small))$coeffi…
#> 15 coef_regression2_l… suppressWarnings(summary(regression2_large))$coeffi…

7.7 Run the workflow

You may want to check for outdated or missing targets/imports first.

config <- drake_config(my_plan, verbose = 0L)
outdated(config) # Targets that need to be (re)built.
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "regression1_large"     
#>  [7] "regression1_small"      "regression2_large"     
#>  [9] "regression2_small"      "report"                
#> [11] "small"                  "summ_regression1_large"
#> [13] "summ_regression1_small" "summ_regression2_large"
#> [15] "summ_regression2_small"

missed(config) # Checks your workspace.
#> character(0)

Then just make(my_plan).

make(my_plan)
#> target large
#> target small
#> target regression1_large
#> target regression2_large
#> target regression1_small
#> target regression2_small
#> target summ_regression1_large
#> target coef_regression1_large
#> target summ_regression2_large
#> target coef_regression2_large
#> target summ_regression1_small
#> target coef_regression1_small
#> target coef_regression2_small
#> target summ_regression2_small
#> target report

For the reg2() model on the small dataset, the p-value on x2 is so small that there may be an association between weight and fuel efficiency after all.

readd(coef_regression2_small)
#>               Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept) 27.8369441 1.07966324 25.782988 5.443436e-29
#> x2          -0.6359335 0.07138918 -8.907981 1.408597e-11

The non-file dependencies of your last target are already loaded in your workspace.

ls()
#>  [1] "as_chr"          "bad_config"      "bad_plan"       
#>  [4] "col"             "combos"          "config"         
#>  [7] "config1"         "config2"         "create_plot"    
#> [10] "get_logs"        "get_rmspe"       "good_config"    
#> [13] "good_plan"       "hist"            "latest_log_date"
#> [16] "lots_of_sds"     "make_my_plot"    "make_my_table"  
#> [19] "methods"         "my_analyses"     "my_datasets"    
#> [22] "my_grid"         "my_plan"         "my_summaries"   
#> [25] "plan"            "plan1"           "plan2"          
#> [28] "plot_rmspe"      "predictors"      "Produc"         
#> [31] "random_rows"     "reg1"            "reg2"           
#> [34] "results"         "simulate"        "small_config"   
#> [37] "small_plan"      "summary_types"   "tmp"            
#> [40] "url"             "vals"            "y_vals"

outdated(config) # Everything is up to date.
#> character(0)

build_times(digits = 4) # How long did it take to make each target?
#> # A tibble: 15 x 4
#>    target                 elapsed        user           system        
#>    <chr>                  <S4: Duration> <S4: Duration> <S4: Duration>
#>  1 coef_regression1_large 0.004s         0.004s         0s            
#>  2 coef_regression1_small 0.003s         0s             0s            
#>  3 coef_regression2_large 0.003s         0.004s         0s            
#>  4 coef_regression2_small 0.003s         0s             0s            
#>  5 large                  0.005s         0.004s         0s            
#>  6 regression1_large      0.005s         0.004s         0s            
#>  7 regression1_small      0.006s         0.008s         0s            
#>  8 regression2_large      0.005s         0.004s         0s            
#>  9 regression2_small      0.009s         0.008s         0s            
#> 10 report                 0.053s         0.052s         0s            
#> 11 small                  0.011s         0.012s         0s            
#> 12 summ_regression1_large 0.004s         0.008s         0s            
#> 13 summ_regression1_small 0.003s         0.004s         0s            
#> 14 summ_regression2_large 0.003s         0.004s         0s            
#> 15 summ_regression2_small 0.003s         0s             0s

See also predict_runtime() and rate_limiting_times().

In the new graph, the black nodes from before are now green.

# Hover, click, drag, zoom, and explore.
vis_drake_graph(config, width = "100%", height = "500px")

Optionally, get visNetwork nodes and edges so you can make your own plot with visNetwork() or render_drake_graph().

drake_graph_info(config)

Use readd() and loadd() to load targets into your workspace. (They are cached in the hidden .drake/ folder using storr.) There are many more functions for interacting with the cache.

readd(coef_regression2_large)
#>               Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept) 27.7764504 0.85871209  32.34664 1.569624e-40
#> x2          -0.7056179 0.06942808 -10.16329 7.950479e-15

loadd(small)

head(small)
#>       x    y
#> 1 5.424 10.4
#> 2 3.440 17.8
#> 3 3.440 19.2
#> 4 3.170 15.8
#> 5 3.730 17.3
#> 6 3.845 19.2

rm(small)

cached()
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "regression1_large"     
#>  [7] "regression1_small"      "regression2_large"     
#>  [9] "regression2_small"      "report"                
#> [11] "small"                  "summ_regression1_large"
#> [13] "summ_regression1_small" "summ_regression2_large"
#> [15] "summ_regression2_small"

drake::progress()
#> # A tibble: 15 x 2
#>    target                 progress
#>    <chr>                  <chr>   
#>  1 coef_regression1_large done    
#>  2 coef_regression1_small done    
#>  3 coef_regression2_large done    
#>  4 coef_regression2_small done    
#>  5 large                  done    
#>  6 regression1_large      done    
#>  7 regression1_small      done    
#>  8 regression2_large      done    
#>  9 regression2_small      done    
#> 10 report                 done    
#> 11 small                  done    
#> 12 summ_regression1_large done    
#> 13 summ_regression1_small done    
#> 14 summ_regression2_large done    
#> 15 summ_regression2_small done
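
The caching idea behind readd(), loadd(), and cached() can be sketched as a tiny key-value store on disk. The helpers cache_set(), cache_get(), and cache_list() are hypothetical names invented for this illustration, and the one-file-per-target layout is an assumption, not a description of how storr organizes its files.

```r
# Hypothetical helpers: one .rds file per target in a cache directory.
cache_dir <- file.path(tempdir(), "toy_cache")
dir.create(cache_dir, showWarnings = FALSE)

cache_set <- function(name, value) {
  saveRDS(value, file.path(cache_dir, paste0(name, ".rds")))
}
cache_get <- function(name) {
  readRDS(file.path(cache_dir, paste0(name, ".rds")))
}
cache_list <- function() {
  sort(sub("\\.rds$", "", list.files(cache_dir, pattern = "\\.rds$")))
}

cache_set("small", head(mtcars[, c("wt", "mpg")]))
cache_set("large", mtcars[, c("wt", "mpg")])

cache_list()              # "large" "small"
nrow(cache_get("large"))  # 32
```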

The next time you run make(my_plan), nothing will build because drake knows everything is already up to date.

make(my_plan)
#> All targets are already up to date.

But if you change one of your functions, commands, or other dependencies, drake will update the affected targets. Suppose we change the quadratic term to a cubic term in reg2(). We might want to do this if we suspect a cubic relationship between weight and miles per gallon.

reg2 <- function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}

The targets that depend on reg2() need to be rebuilt.

config <- drake_config(my_plan)
outdated(config)
#> [1] "coef_regression2_large" "coef_regression2_small"
#> [3] "regression2_large"      "regression2_small"     
#> [5] "report"                 "summ_regression2_large"
#> [7] "summ_regression2_small"

Advanced: To get a rough idea of why a target is out of date, you can use dependency_profile(), which newer versions of drake deprecate in favor of deps_profile(). It tells you whether any of the following changed since the last make():

  • The command in the drake plan.
  • At least one non-file dependency. (For this, the imports have to be up to date and cached, either with make(), make(skip_targets = TRUE), outdated(), or similar.)
  • At least one input file declared with file_in() or knitr_in().
  • At least one output file declared with file_out().

dependency_profile(target = regression2_small, config = config)
#> Warning: dependency_profile() in drake is deprecated. Use deps_profile()
#> instead.
#> # A tibble: 4 x 4
#>   hash     changed old_hash         new_hash        
#>   <chr>    <lgl>   <chr>            <chr>           
#> 1 command  FALSE   39e1321ff2265cac 39e1321ff2265cac
#> 2 depend   TRUE    ab16d2a0b3c4d844 4d6e115872246cea
#> 3 file_in  FALSE   ""               ""              
#> 4 file_out FALSE   ""               ""

# Hover, click, drag, zoom, and explore.
vis_drake_graph(config, width = "100%", height = "500px")
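
The old_hash and new_hash comparison in the profile can be illustrated for a file with base R tooling. MD5 is used in this sketch only because tools::md5sum() ships with R; it is not claimed to be the hash drake uses, and the temporary file is a hypothetical input.

```r
# Hypothetical input file in a temporary location.
path <- tempfile(fileext = ".Rmd")

writeLines("First draft of the report.", path)
old_hash <- unname(tools::md5sum(path))

writeLines("Second draft of the report.", path)
new_hash <- unname(tools::md5sum(path))

# One row of a profile-like table: did the file fingerprint change?
data.frame(
  name = "file_in",
  changed = !identical(old_hash, new_hash),
  stringsAsFactors = FALSE
)
```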

The next make() will rebuild the targets depending on reg2() and leave everything else alone.

make(my_plan)
#> target regression2_small
#> target regression2_large
#> target coef_regression2_small
#> target summ_regression2_small
#> target summ_regression2_large
#> target coef_regression2_large
#> target report

Trivial changes to whitespace and comments are totally ignored.

reg2 <- function(d) {
  d$x3 <- d$x ^ 3
    lm(y ~ x3, data = d) # I indented here.
}
outdated(config) # Everything is up to date.
#> character(0)
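
One way to see why whitespace and comments need not matter is to compare parsed code rather than source text. The sketch below is an illustration of that idea in base R, with a hypothetical helper parse_fun(); it does not reproduce drake's own fingerprinting of functions.

```r
parse_fun <- function(text) {
  # keep.source = FALSE drops the original text, so only parsed code remains.
  eval(parse(text = text, keep.source = FALSE))
}

original <- parse_fun("function(d) {
  d$x3 <- d$x ^ 3
  lm(y ~ x3, data = d)
}")
reindented <- parse_fun("function(d) {
  d$x3 <- d$x ^ 3
    lm(y ~ x3, data = d) # I indented here.
}")
changed <- parse_fun("function(d) {
  d$x4 <- d$x ^ 4
  lm(y ~ x4, data = d)
}")

identical(deparse(original), deparse(reindented))  # TRUE: same parsed code
identical(deparse(original), deparse(changed))     # FALSE: real code change
```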

drake cares about nested functions too: nontrivial changes to random_rows() will propagate to simulate() and all the downstream targets.

random_rows <- function(data, n){
  n <- n + 1
  data[sample.int(n = nrow(data), size = n, replace = TRUE), ]
}

outdated(config)
#>  [1] "coef_regression1_large" "coef_regression1_small"
#>  [3] "coef_regression2_large" "coef_regression2_small"
#>  [5] "large"                  "regression1_large"     
#>  [7] "regression1_small"      "regression2_large"     
#>  [9] "regression2_small"      "report"                
#> [11] "small"                  "summ_regression1_large"
#> [13] "summ_regression1_small" "summ_regression2_large"
#> [15] "summ_regression2_small"

make(my_plan)
#> target large
#> target small
#> target regression1_large
#> target regression2_large
#> target regression1_small
#> target regression2_small
#> target summ_regression1_large
#> target coef_regression1_large
#> target summ_regression2_large
#> target coef_regression2_large
#> target summ_regression1_small
#> target coef_regression1_small
#> target coef_regression2_small
#> target summ_regression2_small
#> target report
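
The way a change spreads to downstream targets can be sketched as a walk over a dependency list. The list below is written by hand from the plan and report shown in this chapter, and downstream() is a hypothetical helper for illustration, not a drake function.

```r
# Hand-written dependency list: each name maps to the things it uses.
deps <- list(
  simulate = "random_rows",
  small = "simulate",
  large = "simulate",
  regression1_small = c("reg1", "small"),
  regression1_large = c("reg1", "large"),
  regression2_small = c("reg2", "small"),
  regression2_large = c("reg2", "large"),
  summ_regression1_small = "regression1_small",
  summ_regression1_large = "regression1_large",
  summ_regression2_small = "regression2_small",
  summ_regression2_large = "regression2_large",
  coef_regression1_small = "regression1_small",
  coef_regression1_large = "regression1_large",
  coef_regression2_small = "regression2_small",
  coef_regression2_large = "regression2_large",
  report = c("small", "large", "coef_regression2_small")
)

# Hypothetical helper: everything that directly or indirectly uses `changed`.
downstream <- function(changed, deps) {
  affected <- character(0)
  frontier <- changed
  while (length(frontier) > 0) {
    uses <- vapply(deps, function(d) any(d %in% frontier), logical(1))
    frontier <- setdiff(names(deps)[uses], affected)
    affected <- union(affected, frontier)
  }
  affected
}

sort(downstream("reg2", deps))            # The 7 targets that use reg2()
length(downstream("random_rows", deps))   # simulate() plus all 15 targets
```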

Need to add new work on the fly? Just append rows to the drake plan. If the rest of your workflow is up to date, only the new work is run.

new_simulation <- function(n){
  data.frame(x = rnorm(n), y = rnorm(n))
}

additions <- drake_plan(
  new_data = new_simulation(36) + sqrt(10))
additions
#> # A tibble: 1 x 2
#>   target   command                      
#>   <chr>    <expr>                       
#> 1 new_data new_simulation(36) + sqrt(10)

my_plan <- rbind(my_plan, additions)
my_plan
#> # A tibble: 16 x 2
#>    target              command                                             
#>    <chr>               <expr>                                              
#>  1 report              knit(knitr_in("report.Rmd"), file_out("report.md"),…
#>  2 small               simulate(48)                                       …
#>  3 large               simulate(64)                                       …
#>  4 regression1_small   reg1(small)                                        …
#>  5 regression1_large   reg1(large)                                        …
#>  6 regression2_small   reg2(small)                                        …
#>  7 regression2_large   reg2(large)                                        …
#>  8 summ_regression1_s… suppressWarnings(summary(regression1_small$residual…
#>  9 summ_regression1_l… suppressWarnings(summary(regression1_large$residual…
#> 10 summ_regression2_s… suppressWarnings(summary(regression2_small$residual…
#> 11 summ_regression2_l… suppressWarnings(summary(regression2_large$residual…
#> 12 coef_regression1_s… suppressWarnings(summary(regression1_small))$coeffi…
#> 13 coef_regression1_l… suppressWarnings(summary(regression1_large))$coeffi…
#> 14 coef_regression2_s… suppressWarnings(summary(regression2_small))$coeffi…
#> 15 coef_regression2_l… suppressWarnings(summary(regression2_large))$coeffi…
#> 16 new_data            new_simulation(36) + sqrt(10)                      …

make(my_plan)
#> target new_data

If you ever need to erase your work, use clean(). The next make() will rebuild any cleaned targets, so be careful. You may notice that by default, the size of the cache does not go down very much. To purge old data, you could use clean(garbage_collection = TRUE, purge = TRUE). To do garbage collection without removing any important targets, use drake_gc().

# Uncaches individual targets and imported objects.
clean(small, reg1, verbose = 0L)
clean(verbose = 0L) # Cleans all targets out of the cache.
drake_gc(verbose = 0L) # Just garbage collection.
clean(destroy = TRUE, verbose = 0L) # removes the cache entirely

Copyright Eli Lilly and Company