themis: Extra Steps for tidymodels + recipes

R tidymodels Packages

themis contains extra steps for the recipes package for dealing with unbalanced data. The name themis comes from Themis, the ancient Greek goddess who is typically depicted holding a balance.

Javier Orraca (Scatter Podcast)
02-20-2020

Working with unbalanced data sets? Remember that accuracy alone is a poor performance metric when classes are unbalanced, because a model can score well simply by always predicting the majority class. Instead, place more weight on Cohen’s kappa coefficient or the F1 score (the harmonic mean of precision and recall), or focus on improving your model’s sensitivity or specificity.
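To make the point about accuracy concrete, here is a minimal base R sketch. The labels are made up for illustration (10 "stem" and 90 "other"), and the "model" is a deliberately useless one that always predicts the majority class.

```r
# Toy, made-up labels: "stem" is the minority class of interest
truth <- factor(c(rep("stem", 10), rep("other", 90)), levels = c("stem", "other"))

# A useless model that always predicts the majority class
pred <- factor(rep("other", 100), levels = c("stem", "other"))

# Confusion matrix: rows are predictions, columns are the truth
cm <- table(pred = pred, truth = truth)

tp <- cm["stem", "stem"]    # true positives
fp <- cm["stem", "other"]   # false positives
fn <- cm["other", "stem"]   # false negatives
tn <- cm["other", "other"]  # true negatives
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

# Cohen's kappa: observed agreement corrected for the agreement expected by chance
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - p_expected) / (1 - p_expected)

# F1 is the harmonic mean of precision and recall; this form avoids 0/0 when tp is 0
f1 <- 2 * tp / (2 * tp + fp + fn)

round(c(accuracy = accuracy, kappa = kappa, f1 = f1,
        sensitivity = sensitivity, specificity = specificity), 3)
```

Accuracy comes out at 0.9 even though the model never finds a single "stem" case, while kappa, F1, and sensitivity are all 0. In a tidymodels workflow, the yardstick package provides these metrics as kap(), f_meas(), sens(), and spec(), so there is no need to compute them by hand.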

I’ve been transitioning a lot of my workflows to the tidymodels framework and I am super excited about the future of tidymodels (recipes + parsnip + dials + tune + workflows + more 😭✊🙌). If you use recipes as often as I do, a new library called {themis}, by Emil Hvitfeldt, expands the {recipes} pre-processing steps for working with unbalanced data sets (it adds functionality for over-, under-, and hybrid-sampling techniques). I love me some SMOTE, and now I can incorporate this sampling technique into my recipes with themis::step_smote()!
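For readers new to SMOTE, the core idea is to create synthetic minority-class rows by interpolating between a minority point and one of its nearest minority-class neighbours. The base R sketch below only illustrates that idea. It is not how themis implements step_smote(), and the function name smote_sketch, its arguments, and the random data are all hypothetical.

```r
set.seed(123)

# Hypothetical minority-class rows with two numeric predictors (made-up data)
minority <- cbind(x1 = rnorm(20), x2 = rnorm(20))

# Illustrative helper (not part of themis): build n_new synthetic rows
smote_sketch <- function(x, n_new, k = 5) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  diag(d) <- Inf           # a point is not its own neighbour
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(x),
                      dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n_new)) {
    base <- sample(nrow(x), 1)                  # pick a minority point at random
    neighbours <- order(d[base, ])[seq_len(k)]  # its k nearest minority neighbours
    nb <- neighbours[sample.int(k, 1)]          # pick one of those neighbours
    gap <- runif(1)                             # random position along the segment
    synthetic[i, ] <- x[base, ] + gap * (x[nb, ] - x[base, ])
  }
  synthetic
}

new_points <- smote_sketch(minority, n_new = 30)
dim(new_points)
```

Because every synthetic row lies on a line segment between two real minority rows, the new points stay inside the region the minority class already occupies instead of being exact duplicates, which is what plain up-sampling would give you.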

# Installation
install.packages("themis")

# Example workflow
library(recipes)
library(modeldata)
library(themis)

data(okc)

sort(table(okc$Class, useNA = "always"))
#> 
#>  <NA>  stem other 
#>     0  9539 50316

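# Impute missing predictor values, then balance the classes with SMOTE.
# Note: in newer recipes releases step_meanimpute() is named step_impute_mean().
ds_rec <- recipe(Class ~ age + height, data = okc) %>%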
  step_meanimpute(all_predictors()) %>%
  step_smote(Class) %>%
  prep()

sort(table(bake(ds_rec, new_data = NULL)$Class, useNA = "always"))
#> 
#>  <NA>  stem other 
#>     0 50316 50316
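The same pattern works for the other sampling steps in themis. As a rough sketch, the example below swaps in step_downsample() to under-sample the majority class. The toy data frame and its values are made up so that the example does not depend on any packaged data set.

```r
library(recipes)
library(themis)

set.seed(42)

# Made-up unbalanced data: 50 "stem" rows versus 450 "other" rows
toy <- data.frame(
  Class  = factor(c(rep("stem", 50), rep("other", 450))),
  age    = rnorm(500, mean = 32, sd = 9),
  height = rnorm(500, mean = 68, sd = 4)
)

# under_ratio = 1 asks for the majority class to be sampled down
# to the same number of rows as the minority class
down_rec <- recipe(Class ~ age + height, data = toy) %>%
  step_downsample(Class, under_ratio = 1) %>%
  prep()

# Class counts in the processed training data
table(bake(down_rec, new_data = NULL)$Class)

# Sampling steps are skipped by default when baking new data,
# so a data set passed via new_data keeps all of its rows
nrow(bake(down_rec, new_data = toy))
```

The last call is worth noting: sampling steps default to skip = TRUE, so they only affect the training data retained by prep(). That is usually what you want, since a test set should keep its original class distribution.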

Source:
* themis
* themis on GitHub

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Orraca (2020, Feb. 20). Javier Orraca: themis: Extra Steps for tidymodels + recipes. Retrieved from https://www.javierorraca.com/posts/2020-02-20-Themis/

BibTeX citation

@misc{orraca2020themis:,
  author = {Orraca, Javier},
  title = {Javier Orraca: themis: Extra Steps for tidymodels + recipes},
  url = {https://www.javierorraca.com/posts/2020-02-20-Themis/},
  year = {2020}
}