DownSampleDataframe.Rd
Down sample an imbalanced dataset to get a balanced dataset
DownSampleDataframe(input.data, name.of.feature.to.balance.on, seedVal)
Argument | Description |
---|---|
input.data | A dataframe. |
name.of.feature.to.balance.on | Name of a column in the input dataframe. The column must be a factor. It is the membership column whose values you want to balance. |
seedVal | Numeric value to set seed for random number generator. |
List with two objects:
ordered_balanced_data: Dataframe with the same column order as the original data. It is a subset of the original dataframe, because some observations are left out to balance the values in the selected column.
left_out_data: Dataframe of the left-out observations.
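The arguments and return value described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `downsample_sketch` and the `toy` data are hypothetical, and the sketch assumes that balancing means randomly keeping as many rows per factor level as the smallest level has.

```r
# Hypothetical helper that mimics the documented interface; it is an
# assumption about the behaviour, not the code behind DownSampleDataframe().
downsample_sketch <- function(input.data, name.of.feature.to.balance.on, seedVal) {
  membership <- input.data[[name.of.feature.to.balance.on]]
  stopifnot(is.factor(membership))

  # Fix the random number generator so the selection is reproducible.
  set.seed(seedVal)

  # Group row indices by the factor levels that actually occur.
  rows.by.level <- split(seq_len(nrow(input.data)), droplevels(membership))

  # Every level is reduced to the size of the smallest level.
  n.keep <- min(lengths(rows.by.level))

  # Randomly choose n.keep rows within each level.
  kept.rows <- unlist(lapply(rows.by.level, function(idx) {
    idx[sample.int(length(idx), n.keep)]
  }), use.names = FALSE)

  # Sort so the kept rows stay in their original order.
  kept.rows <- sort(kept.rows)
  left.out.rows <- setdiff(seq_len(nrow(input.data)), kept.rows)

  list(
    ordered_balanced_data = input.data[kept.rows, , drop = FALSE],
    left_out_data = input.data[left.out.rows, , drop = FALSE]
  )
}

# Toy data: level "a" has 7 rows, level "b" has 3 rows.
toy <- data.frame(
  value = 1:10,
  group = factor(c(rep("a", 7), rep("b", 3)))
)

result <- downsample_sketch(toy, "group", seedVal = 42)
table(result$ordered_balanced_data$group)  # 3 rows for each level
nrow(result$left_out_data)                 # the 4 rows of "a" that were dropped
```

Under these assumptions, rerunning the sketch with the same `seedVal` selects the same rows, which is the usual reason for exposing a seed argument.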
Other Preprocessing functions:
AddColBinnedToBinary(), AddColBinnedToQuartiles(), AddPCsToEnd(), ConvertDataToPercentiles(), CorAssoTestMultipleWithErrorHandling(), GenerateElbowPlotPCA(), GeneratePC1andPC2PlotsWithAndWithoutOutliers(), Log2TargetDensityPlotComparison(), LookAtPCFeatureLoadings(), MultipleColumnsNormalCheckThenBoxCox(), NormalCheckThenBoxCoxTransform(), RanomlySelectOneRowForEach(), RecodeIdentifier(), RemoveColWithAllZeros(), RemoveRowsBasedOnCol(), RemoveSamplesWithInstability(), SplitIntoTrainTest(), StabilityTestingAcrossVisits(), SubsetDataByContinuousCol(), TwoSampleTTest(), ZScoreChallengeOutliers(), captureSessionInfo(), correlation.association.test(), describeNumericalColumnsWithLevels(), describeNumericalColumns(), generate.descriptive.plots.save.pdf(), generate.descriptive.plots()