DownSampleDataframe.Rd
Down sample an imbalanced dataset to get a balanced dataset
DownSampleDataframe(input.data, name.of.feature.to.balance.on, seedVal)
| Argument | Description |
|---|---|
| input.data | A dataframe. |
| name.of.feature.to.balance.on | Name of a column in the input dataframe. The column has to be a factor. This is the membership column that you want to balance on. |
| seedVal | Numeric value used to set the seed for the random number generator. |
List with two objects:
ordered_balanced_data: Dataframe with the same column order as the original data. It is a subset of the original dataframe, because some observations are left out in order to balance the values in the selected column.
left_out_data: Dataframe of the left-out observations.
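The arguments and return value described above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper name `down_sample_sketch` and the toy data are hypothetical, and the sketch assumes that "balanced" means every factor level is reduced to the size of the smallest level, which this page does not state explicitly.

```r
# Hypothetical helper for illustration only; not the package's DownSampleDataframe().
# Assumption: each factor level is down-sampled to the size of the smallest level.
down_sample_sketch <- function(input.data, name.of.feature.to.balance.on, seedVal) {
  stopifnot(
    is.data.frame(input.data),
    is.factor(input.data[[name.of.feature.to.balance.on]])
  )
  set.seed(seedVal)

  # Drop unused levels so an empty level does not force the target size to zero.
  groups <- droplevels(input.data[[name.of.feature.to.balance.on]])

  # Number of rows to keep per level: the size of the smallest level.
  n.keep <- min(table(groups))

  # Row indices grouped by level, then a random draw of n.keep rows from each.
  idx.by.level <- split(seq_len(nrow(input.data)), groups)
  keep.idx <- unlist(
    lapply(idx.by.level, function(idx) idx[sample.int(length(idx), n.keep)]),
    use.names = FALSE
  )

  # Restore the original row order; everything not kept is left out.
  keep.idx <- sort(keep.idx)
  left.idx <- setdiff(seq_len(nrow(input.data)), keep.idx)

  list(
    ordered_balanced_data = input.data[keep.idx, , drop = FALSE],
    left_out_data = input.data[left.idx, , drop = FALSE]
  )
}

# Toy data: 7 rows in level "a" and 3 rows in level "b".
toy <- data.frame(
  value = 1:10,
  group = factor(c(rep("a", 7), rep("b", 3)))
)

result <- down_sample_sketch(toy, "group", seedVal = 42)

# Both levels now have 3 rows; the remaining 4 rows of "a" are in left_out_data.
table(result$ordered_balanced_data$group)
nrow(result$left_out_data)
```

Fixing `seedVal` makes the random draw reproducible, so the same rows are kept and left out on every run with the same input.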
Other Preprocessing functions:
AddColBinnedToBinary(),
AddColBinnedToQuartiles(),
AddPCsToEnd(),
ConvertDataToPercentiles(),
CorAssoTestMultipleWithErrorHandling(),
GenerateElbowPlotPCA(),
GeneratePC1andPC2PlotsWithAndWithoutOutliers(),
Log2TargetDensityPlotComparison(),
LookAtPCFeatureLoadings(),
MultipleColumnsNormalCheckThenBoxCox(),
NormalCheckThenBoxCoxTransform(),
RanomlySelectOneRowForEach(),
RecodeIdentifier(),
RemoveColWithAllZeros(),
RemoveRowsBasedOnCol(),
RemoveSamplesWithInstability(),
SplitIntoTrainTest(),
StabilityTestingAcrossVisits(),
SubsetDataByContinuousCol(),
TwoSampleTTest(),
ZScoreChallengeOutliers(),
captureSessionInfo(),
correlation.association.test(),
describeNumericalColumnsWithLevels(),
describeNumericalColumns(),
generate.descriptive.plots.save.pdf(),
generate.descriptive.plots()