---
title: "roperators: a gentle tour"
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
vignette: >
  %\VignetteIndexEntry{roperators: a gentle tour}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
library(roperators)
```

`roperators` adds the small things you keep wishing base R had: string arithmetic, in-place modifiers, comparisons that don't flinch at `NA` or floating point, and a little drawer of everyday helpers. It's pure base R, with nothing heavy underneath, and it tries to be especially kind to people arriving from Python and other languages.

Think of this as an unhurried tour, so pour a coffee. But if you only want the highlights, here they are:

```{r}
"foo" %+% "bar"                      # string addition
(0.1 + 0.1 + 0.1) %~=% 0.3           # floating-point equality that just works
c(1, NA) %==% c(1, NA)               # NA == NA is treated as TRUE here
name <- "you"
f("hello {name}, 2 + 2 = {2 + 2}")   # f-strings!
```

# String arithmetic

Let's start with the one nearly everyone misses coming from other languages: gluing strings together with a `+`. So we added it:

```{r}
my_string <- "using infix (%) operators " %+% "lets R do string addition"
my_string

# subtraction removes a pattern
my_string %-% "lets R do string addition"

# multiplication repeats (%*% was already taken, so it's %s*%)
"ha" %s*% 3
```

And something you *can't* do in Python: **string division**, which simply counts how many times a pattern turns up (regular expressions are welcome):

```{r}
"an apple a day keeps the malignant spirit of Steve Jobs at bay" %s/% "a"

# with a regular expression
"an apple a day keeps the malignant spirit of Steve Jobs at bay" %s/% "Steve Jobs|apple"
```

# In-place modifiers (*à la* `+=`)

How many times have you written something like `df$x[long$condition] <- df$x[long$condition] + 1`? The line barely fits on the page. Let's make it kinder:

```{r}
x <- 1
x %+=% 2
x

d <- iris
# add 1 to setosa sepal lengths, in place
d$Sepal.Length[d$Species == "setosa"] %+=% 1
```

The full set is `%+=%`, `%-=%`, `%*=%`, `%/=%`, `%^=%`, `%root=%`, and `%log=%`. `%+=%` and `%-=%` are happy with strings, too:

```{r}
x <- "ab"
x %+=% "c"
x
```

## Filling in missing values and regex matches

`%na<-%` gently fills the `NA`s, and `%regex=%` / `%regex<-%` edit in place:

```{r}
x <- c(NA, 1, 2, 3)
x %na<-% 0
x

x <- c("a1b", "b1", "c", "d0")
x %regex=% c("\\d+", "#")   # replace just the matched part
x
```

# Comparisons that behave

## When `NA == NA` ought to be `TRUE`

An `NA` doesn't technically equal another `NA`, but most of the time, for what you're actually doing, you'd like it to. How many `if` statements have quietly broken on exactly this?

```{r}
a <- c(NA, "foo", "foo", NA)
b <- c(NA, "foo", "bar", "bar")

a == b     # base R: the NA leaks through
a %==% b   # roperators: NA == NA is treated as TRUE
```

`%>=%` and `%<=%` carry the same gentle NA-handling.

## When `0.1 + 0.1 + 0.1` ought to equal `0.3`

This one catches almost everyone, and it really isn't your fault; it's just how computers hold decimals in binary floating point:

```{r}
(0.1 + 0.1 + 0.1) == 0.3     # FALSE (!)
(0.1 + 0.1 + 0.1) %~=% 0.3   # TRUE

# greater/less-than-or-approximately-equal
(0.1 + 0.1 + 0.1) %>~% 0.3
(0.1 + 0.1 + 0.1) %<~% 0.3
```

## Between, and strict equality

```{r}
5 %><% c(1, 10)    # strictly between
1 %>=<% c(1, 10)   # inclusive
5 %><% c(10, 1)    # reversed bounds are fine too: no need to worry about order

# %===% is strict value-AND-class equality, like JavaScript's ===
x <- int(2)
x == 2           # TRUE
x %===% 2        # FALSE (different class)
x %===% int(2)
```

# Logical and SQL-style operators

```{r}
"z" %ni% c("a", "b", "c")   # not in
TRUE %xor% FALSE            # exclusive or
TRUE %aon% TRUE             # all-or-nothing: both TRUE, or both FALSE

# SQL-style LIKE
c("FOO", "bar", "fizz") %rlike% "foo"   # case-insensitive
c("dOe", "doe") %perl% "[a-z]O"         # case-sensitive, Perl regex
```

# ✨ New in 1.4

A few new friends, added in this release.

**`f()`: string interpolation (R's f-strings).** Anything inside `{ }` is evaluated right where you call it:

```{r}
who <- "Ben"; n <- 2
f("Hi {who}, you have {n} new message{if (n != 1) 's'}")
f("today's first letters: {head(LETTERS, n)}")   # vectors are tidied up for you
```

**`%else%`: a calm fallback** for when an expression might error (the fallback only runs if it's actually needed):

```{r}
sqrt("not a number") %else% NA_real_
(1:3)[[99]] %else% "out of range"
```

**`%/0%`: safe division** that returns `NA` rather than letting an `Inf` or `NaN` wander into your next `sum()` or `mean()`:

```{r}
c(10, 20, 30) %/0% c(2, 0, 5)
```

**`%+-%`: a tolerance interval** that drops straight into the `between` operators:

```{r}
5 %+-% 0.5
4.9 %><% (5 %+-% 0.5)
```

**`%~%`: forgiving string equality** that ignores case and stray whitespace, the string cousin of `%~=%`:

```{r}
" Yes " %~% "yes"
c("Apple", "PEAR") %~% c("apple", "pear")
```

**`as.percent()`, for proportions, dressed up:**

```{r}
as.percent(c(0.1, 0.005, 2 / 3))
as.percent(2 / 3, digits = 0)
```

# Shorter type conversions

R's conversion syntax is a touch wordy. These trim it down:

```{r}
chr(42)        # as.character()
int(42.9)      # as.integer()
num("4.2")     # as.numeric()
bool("TRUE")   # as.logical()

# the famous factor-to-number stumble, smoothed over:
fac <- factor(c(11, 22, 33))
as.numeric(fac)     # 1 2 3 -- almost never what you wanted
f.as.numeric(fac)   # 11 22 33

# and convert to a class chosen at run time
as.class(255, "roman")
```

# Gentle checks

Rather than chaining five conditions, you can ask one calm question:

```{r}
# would any of these break a calculation?
is.bad_for_calcs(c(1, NA, Inf, NaN, 5))

is.scalar(1)
is.constant(c(1, 1, 1))
is.binary(c("a", "b", "a"))
```

There's a whole family of `is.*_or_null()` predicates too, lovely for checking optional function arguments without fuss.

# A drawer of everyday helpers

```{r}
# pulling pieces out of vectors and strings
get_1st_word("Ada Lovelace")
get_last_word("Ada Lovelace")
get_most_frequent(c("a", "b", "b", "c", "b"))

# Oxford-comma joining, done for you
paste_oxford("Tom", "Dick", "Harry")

# complete-cases stats: just add _cc for na.rm = TRUE
mean_cc(c(1, 2, NA))
sd_cc(c(1, 2, 3, NA))

# little environment checks
get_os()
get_R_version()

# and file-extension checks
is_csv_file(c("a.csv", "b.txt"))
```

# Cheat sheet

| You want…                          | Reach for                            |
|------------------------------------|--------------------------------------|
| String concat / subtract           | `%+%` / `%-%`                        |
| String repeat / count              | `%s*%` / `%s/%`                      |
| In-place maths                     | `%+=%` `%-=%` `%*=%` `%/=%` `%^=%`   |
| Fill NAs / regex edit in place     | `%na<-%` / `%regex=%` / `%regex<-%`  |
| NA-aware (in)equality              | `%==%` `%>=%` `%<=%`                 |
| Floating-point equality            | `%~=%` `%>~%` `%<~%`                 |
| Strict (value + class) equality    | `%===%`                              |
| Between (excl / incl)              | `%><%` / `%>=<%`                     |
| Not-in / xor / all-or-nothing      | `%ni%` / `%xor%` / `%aon%`           |
| SQL-style LIKE                     | `%rlike%` / `%perl%`                 |
| String interpolation               | `f()`                                |
| Inline error fallback              | `%else%`                             |
| Safe divide / tolerance            | `%/0%` / `%+-%`                      |
| Fuzzy string match                 | `%~%`                                |

# A gentle word on names

A few names are shared on purpose with the wider world: `%+%` with ggplot2, and `%like%`-style matching with data.table. If you've got those loaded as well, just reach for the namespaced form where it matters, quoted with backticks and called like a function (`` roperators::`%+%`(x, y) ``), and everyone gets along fine.
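
Here is one way that can look in practice. It is a small sketch that assumes `roperators` is installed; the alias `%cat%` is a made-up name for illustration, not something the package exports. An operator referenced by name has to be wrapped in backticks, and a namespaced operator cannot sit in infix position, so it is either called like an ordinary function or bound to a local name first:

```{r}
# call the operator through its namespace, function-style
roperators::`%+%`("foo", "bar")

# or bind it to a local alias that another package's %+% cannot mask
# (`%cat%` is a hypothetical name chosen for this example)
`%cat%` <- roperators::`%+%`
"foo" %cat% "bar"
```

The alias is handy in scripts that also attach ggplot2, since the string operator keeps working whatever order the packages were loaded in.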