themis: Extra Steps for tidymodels + recipes

R tidymodels Packages

themis contains extra steps for the recipes package for dealing with unbalanced data. The name themis is that of the ancient Greek goddess who is typically depicted with a balance.

Javier Orraca (Scatter Podcast)

Working with unbalanced data sets? Remember that accuracy alone is not the best performance metric, especially when the classes are unbalanced: a model that always predicts the majority class can still post a high accuracy. Instead, place more importance on Cohen’s kappa coefficient or the F1 score (the harmonic mean of precision and recall), or focus on improving your model’s sensitivity or specificity.
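To see how accuracy can mislead, the base-R sketch below scores a hypothetical "model" that labels every profile as "other", using the okc class counts (9539 "stem" vs. 50316 "other"). The helper name `class_metrics()` is made up for this illustration and treats "stem" as the event class; in a tidymodels workflow, yardstick's `accuracy()`, `kap()`, `f_meas()`, `sens()` and `spec()` cover the same metrics.

```r
# Hypothetical baseline (assumption): predict "other" for every profile,
# using the okc class counts (9539 stem, 50316 other).
truth <- factor(c(rep("stem", 9539), rep("other", 50316)),
                levels = c("stem", "other"))
pred <- factor(rep("other", length(truth)), levels = levels(truth))

# Hypothetical helper: binary metrics from a confusion matrix,
# treating `positive` as the event class.
class_metrics <- function(truth, pred, positive = levels(truth)[1]) {
  cm <- table(pred = pred, truth = truth)  # rows = predicted, columns = truth
  negative <- setdiff(levels(truth), positive)
  tp <- cm[positive, positive]
  fn <- cm[negative, positive]
  fp <- cm[positive, negative]
  tn <- cm[negative, negative]
  n <- sum(cm)

  # Cohen's kappa: observed agreement corrected for agreement expected by chance
  p_o <- (tp + tn) / n
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2

  c(
    accuracy    = p_o,
    kappa       = (p_o - p_e) / (1 - p_e),
    sensitivity = tp / (tp + fn),              # recall of the event class
    specificity = tn / (tn + fp),
    f1          = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
  )
}

# Accuracy is about 0.84, yet kappa, sensitivity and F1 are all 0
round(class_metrics(truth, pred), 3)
```

The do-nothing classifier looks respectable on accuracy alone, while kappa and F1 expose that it never finds a single "stem" profile.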

I’ve been transitioning a lot of my workflows to the tidymodels framework, and I am super excited about the future of tidymodels (recipes + parsnip + dials + tune + workflows + more 😭✊🙌). If you use recipes as often as I do, a new library called {themis}, by Emil Hvitfeldt, expands the {recipes} pre-processing steps for working with unbalanced data sets (it adds steps for over-, under-, and hybrid-sampling techniques). I love me some SMOTE, and now I can incorporate this sampling technique into my recipes with themis::step_smote()!
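To make step_smote() a little less magic, here is a minimal base-R sketch of the interpolation idea behind SMOTE (Synthetic Minority Over-sampling Technique). It is a simplified illustration, not the themis implementation: it assumes an all-numeric set of minority-class rows and plain Euclidean distance, and the function name `smote_sketch()` and the toy data are hypothetical.

```r
# Hypothetical helper: a simplified sketch of SMOTE's interpolation step.
# Assumes `x` holds only numeric predictors for minority-class rows.
smote_sketch <- function(x, n_new, k = 5) {
  x <- as.matrix(x)
  k <- min(k, nrow(x) - 1)         # cannot use more neighbours than exist
  d <- as.matrix(dist(x))          # Euclidean distances between minority rows
  diag(d) <- Inf                   # a row is never its own neighbour
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(x),
                      dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n_new)) {
    base <- sample.int(nrow(x), 1)                # random minority row
    neighbours <- order(d[base, ])[seq_len(k)]    # its k nearest minority rows
    nb <- neighbours[sample.int(k, 1)]            # pick one neighbour at random
    gap <- runif(1)                               # random position on the segment
    synthetic[i, ] <- x[base, ] + gap * (x[nb, ] - x[base, ])
  }
  synthetic
}

# Toy minority-class data (made up for illustration)
set.seed(2020)
minority <- data.frame(age    = c(22, 25, 31, 28, 35, 40),
                       height = c(65, 70, 68, 72, 66, 69))
new_rows <- smote_sketch(minority, n_new = 10, k = 3)
head(new_rows)
```

Each synthetic row lies on the line segment between two real minority rows, so SMOTE adds new, plausible points instead of exact duplicates, which is what plain random up-sampling produces.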

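# Installation

The released version of themis can be installed from CRAN with `install.packages("themis")`, and the development version from GitHub with `remotes::install_github("tidymodels/themis")`.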

# Example workflow


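```r
library(recipes)
library(modeldata)  # provides the okc data set
library(themis)

data(okc)

# Class counts before balancing: "stem" is the minority class
sort(table(okc$Class, useNA = "always"))
#>  <NA>  stem other 
#>     0  9539 50316

ds_rec <- recipe(Class ~ age + height, data = okc) %>%
  step_meanimpute(all_predictors()) %>%  # SMOTE needs complete numeric predictors
  step_smote(Class) %>%                   # synthesize new "stem" rows with SMOTE
  prep()                                  # estimate the steps; bake() needs a prepped recipe

# Class counts after SMOTE: the minority class now matches the majority
sort(table(bake(ds_rec, new_data = NULL)$Class, useNA = "always"))
#>  <NA>  stem other 
#>     0 50316 50316
```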

* themis
* themis on GitHub


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Orraca (2020, Feb. 20). Javier Orraca: themis: Extra Steps for tidymodels + recipes. Retrieved from

BibTeX citation

  author = {Orraca, Javier},
  title = {Javier Orraca: themis: Extra Steps for tidymodels + recipes},
  url = {},
  year = {2020}