---
title: "Modifying existing pipelines"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
description: >
  Shows how to insert, replace, and remove steps in a pipeline.
vignette: >
  %\VignetteIndexEntry{Modifying existing pipelines}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr-setup, include = FALSE}
knitr::opts_chunk$set(
  comment = "#",
  prompt = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE
)

old <- options(width = 100L)
```

### Existing pipeline

```{r define-pipeline, include = FALSE, echo = FALSE}
library(pipeflow)

pip <- pip_new("my-pip") |>
    pip_add(
        "data",
        function(data = airquality[1:10, ]) data
    ) |>
    pip_add(
        "data_prep",
        function(x = ~data) {
            replace(x, "Temp.Celsius", (x[, "Temp"] - 32) * 5 / 9)
        }
    ) |>
    pip_add(
        "model_fit",
        function(
            data = ~data_prep,
            xVar = "Temp.Celsius"
        ) {
            lm(paste("Ozone ~", xVar), data = data)
        }
    ) |>
    pip_add(
        "model_plot",
        function(
            model = ~model_fit,
            data = ~data_prep,
            xVar = "Temp.Celsius",
            xLab = "Temperature in degrees Celsius",
            title = "Linear model fit"
        ) {
            require(ggplot2, quietly = TRUE)
            coeffs <- coefficients(model)
            ggplot(data) +
                geom_point(aes(.data[[xVar]], .data[["Ozone"]])) +
                geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
                labs(title = title, x = xLab)
        }
    )

pip |>
    pip_set_params(
        list(
            xVar = "Solar.R",
            xLab = "Solar radiation in Langleys",
            title = "Some new title"
        )
    )

pip_run(pip, lgr = NULL)
```

Let's start where we left off in the
[Get started with pipeflow](v01-get-started.html) vignette. We have the
following pipeline

```{r show-pipeline}
pip
```

with the following data set as its `data` parameter:

```{r show-data}
pip_get_params(pip)[["data"]] |> head(3)
```

### Insert new step

Let's say we want to insert a new step after the `data_prep` step that
standardizes the y-variable.

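
Standardizing here means centering a variable at zero and rescaling it to unit
standard deviation, which is what base R's `scale()` does. The following
minimal sketch is not part of the pipeline; the names `y`, `z`, and `z_manual`
are purely illustrative. It compares `scale()` with the manual computation on
the `Ozone` column of the first ten `airquality` rows, the same data the
pipeline uses.

```{r standardize-sketch}
# Ozone values of the first ten rows (contains missing values)
y <- airquality[1:10, "Ozone"]

# scale() returns a one-column matrix; flatten it to a plain numeric vector
z <- as.numeric(scale(y))

# Manual standardization, ignoring missing values for mean and sd
z_manual <- (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)

# Check that both approaches agree
all.equal(z, z_manual)
```

Missing values stay missing in the standardized result. With that in mind, the
step itself can be added to the pipeline.
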
```{r insert-step}
pip |>
    pip_add(
        "standardize",
        function(
            data = ~data_prep,
            yVar = "Ozone"
        ) {
            data[, yVar] <- scale(data[, yVar])
            data
        },
        after = "data_prep"
    )
```

```{r}
pip
```

```{r, eval = FALSE, echo = nzchar(Sys.getenv("IN_PKGDOWN"))}
library(visNetwork)
do.call(visNetwork, args = pip_get_graph(pip)) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")
```

```{r, echo = FALSE, eval = nzchar(Sys.getenv("IN_PKGDOWN"))}
library(visNetwork)
do.call(visNetwork, args = c(pip_get_graph(pip), list(height = 300))) |>
    visHierarchicalLayout(direction = "LR", sortMethod = "directed")
```

The `standardize` step is now part of the pipeline, but so far it is not used by
any other step.

### Replace existing steps

Let's revisit the function definition of the `model_fit` step

```{r}
pip[["model_fit", "fun"]]
```

To use the standardized data, we need to change the data dependency such that
it refers to the `standardize` step. Also instead of a fixed y-variable
in the model, let's pass it as a parameter.

```{r replace-model-fit-step}
pip |>
    pip_replace(
        "model_fit",
        function(
            data = ~standardize,      # <- changed data reference
            xVar = "Temp.Celsius",
            yVar = "Ozone"            # <- new y-variable
        ) {
            lm(paste(yVar, "~", xVar), data = data)
        }
    )
```

The `model_plot` step needs to be updated in a similar way.

```{r replace-model-plot-step}
pip |>
    pip_replace(
        "model_plot",
        function(
            model = ~model_fit,
            data = ~standardize,      # <- changed data reference
            xVar = "Temp.Celsius",
            yVar = "Ozone",           # <- new y-variable
            title = "Linear model fit"
        ) {
            coeffs <- coefficients(model)
            ggplot(data) +
                geom_point(aes(.data[[xVar]], .data[[yVar]])) +
                geom_abline(intercept = coeffs[1], slope = coeffs[2]) +
                labs(title = title)
        }
    )
```

The updated pipeline now looks as follows.

```{r}
pip
```

```{r, echo = FALSE, eval = nzchar(Sys.getenv("IN_PKGDOWN"))}
do.call(visNetwork, args = c(pip_get_graph(pip), list(height = 100))) |>
    visHierarchicalLayout(direction = "LR")
```

We see that the `model_fit` and `model_plot` steps now use (i.e., depend on)
the standardized data. Let's re-run the pipeline and inspect the output.

```{r}
pip_set_params(pip, params = list(xVar = "Solar.R", yVar = "Wind"))
pip_run(pip)
```

```{r}
pip[["model_fit", "out"]] |> coefficients()
```

```{r, fig.alt = "model-plot", warning = FALSE, message = FALSE}
pip[["model_plot", "out"]]
```

### Removing steps

Let's see the pipeline again.

```{r}
pip
```

When you try to remove a step, {pipeflow} by default checks whether the step is
used by any other step, and raises an error if removing it would violate the
integrity of the pipeline.

```{r try-remove-step}
try(pip_remove(pip, "standardize"))
```

To force the removal of a step together with all its downstream dependencies,
use the `recursive` argument.

```{r remove-steps-recursively}
pip_remove(pip, "standardize", recursive = TRUE)
```

```{r}
pip
```

Naturally, the last step never has any downstream dependencies, so it can be
removed without any issues.

```{r}
last_step <- tail(pip[["step"]], 1)
pip_remove(pip, last_step)
```

```{r}
pip
```

Inserting, replacing, and removing steps as shown in this vignette lets you
reuse existing pipelines and adapt them programmatically to new requirements.
Another way of reusing pipelines is to combine them, which is shown in the
[Combining pipelines](v03-combine-pipelines.html) vignette.

```{r, include = FALSE}
options(old)
```