Downsample an imbalanced dataset to get a balanced dataset

DownSampleDataframe(input.data, name.of.feature.to.balance.on, seedVal)

Arguments

input.data

A dataframe.

name.of.feature.to.balance.on

Name of a column in the input dataframe. The column must be a factor. It holds the class membership whose levels you want to balance.

seedVal

Numeric value used to seed the random number generator, so that the same observations are selected on every run.

Value

List with two objects:

  1. ordered_balanced_data: Dataframe with its columns in the same order as in the original data. It is a subset of the original dataframe, because some observations are left out to balance the values in the selected column.

  2. left_out_data: Dataframe of the observations that were left out.
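
To make the shape of the returned list concrete, here is a minimal base R sketch of the same idea. It is not the package's implementation: downsample_sketch is a hypothetical stand-in, and it assumes that every class is reduced to the size of the smallest class and that the original row order is preserved, neither of which is stated above. The toy dataframe is invented for illustration.

```r
# Hypothetical stand-in for DownSampleDataframe (NOT the package's code).
# Assumption: every class is downsampled to the size of the smallest class.
downsample_sketch <- function(input.data, name.of.feature.to.balance.on, seedVal) {
  stopifnot(is.data.frame(input.data),
            is.factor(input.data[[name.of.feature.to.balance.on]]))
  set.seed(seedVal)  # makes the random selection reproducible

  groups <- droplevels(input.data[[name.of.feature.to.balance.on]])
  n_min <- min(table(groups))  # size of the smallest class

  # Row indices per class, then n_min randomly chosen indices from each.
  rows_by_class <- split(seq_len(nrow(input.data)), groups)
  keep <- unlist(
    lapply(rows_by_class, function(idx) idx[sample.int(length(idx), n_min)]),
    use.names = FALSE
  )
  keep <- sort(keep)  # assumption: keep the original row order
  left_out <- setdiff(seq_len(nrow(input.data)), keep)

  list(
    ordered_balanced_data = input.data[keep, , drop = FALSE],
    left_out_data = input.data[left_out, , drop = FALSE]
  )
}

# Toy data: 7 rows of class "a" and 3 rows of class "b".
toy <- data.frame(
  value = 1:10,
  group = factor(c(rep("a", 7), rep("b", 3)))
)

result <- downsample_sketch(toy, "group", seedVal = 42)
table(result$ordered_balanced_data$group)  # class counts after balancing
nrow(result$left_out_data)                 # number of rows left out
```

With this toy input, the balanced dataframe holds three rows of each class and the remaining four rows of class "a" end up in left_out_data. The two dataframes together contain every row of the input exactly once.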

See also