If multiple rows share the same identifier in the specified column, randomly select just one of those rows. This is done for every identifier, and the output is a new dataframe in which each identifier has exactly one row.
RanomlySelectOneRowForEach(inputted.data, col.name.of.unique.identifier, seed)
| Argument | Description |
|---|---|
| inputted.data | A dataframe. |
| col.name.of.unique.identifier | Name of column in inputted.data containing identifiers. |
| seed | Number indicating the seed to set for random number generation. |
A dataframe where a single row remains for each identifier.
Other Preprocessing functions:
AddColBinnedToBinary(),
AddColBinnedToQuartiles(),
AddPCsToEnd(),
ConvertDataToPercentiles(),
CorAssoTestMultipleWithErrorHandling(),
DownSampleDataframe(),
GenerateElbowPlotPCA(),
GeneratePC1andPC2PlotsWithAndWithoutOutliers(),
Log2TargetDensityPlotComparison(),
LookAtPCFeatureLoadings(),
MultipleColumnsNormalCheckThenBoxCox(),
NormalCheckThenBoxCoxTransform(),
RecodeIdentifier(),
RemoveColWithAllZeros(),
RemoveRowsBasedOnCol(),
RemoveSamplesWithInstability(),
SplitIntoTrainTest(),
StabilityTestingAcrossVisits(),
SubsetDataByContinuousCol(),
TwoSampleTTest(),
ZScoreChallengeOutliers(),
captureSessionInfo(),
correlation.association.test(),
describeNumericalColumnsWithLevels(),
describeNumericalColumns(),
generate.descriptive.plots.save.pdf(),
generate.descriptive.plots()
identifier.col <- c("a", "a", "a", "b", "b", "b", "c")
value.col <- c(1, 2, 3, 1, 1, 1, 5)
input.data.frame <- as.data.frame(cbind(identifier.col, value.col))
results <- RanomlySelectOneRowForEach(input.data.frame, "identifier.col", 1)
results
#>   identifier.col value.col
#> 1              a         1
#> 4              b         1
#> 7              c         5
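For readers who want to see the idea behind the description, the following is a minimal base R sketch of one way to keep a single random row per identifier. It is not the package's implementation: the helper name `select_one_row_per_id` and its argument names are hypothetical, and it assumes the identifier column contains no missing values (rows with `NA` identifiers would be dropped by `split()`).

```r
# Hypothetical helper for illustration only; not the package's own code.
select_one_row_per_id <- function(data, id_col, seed) {
  # Fix the random number generator so the selection is reproducible.
  set.seed(seed)
  # Group row positions by identifier.
  row_groups <- split(seq_len(nrow(data)), data[[id_col]])
  # Draw one row position from each group. sample.int() on the group
  # length avoids the sample(n, 1) pitfall for single-row groups.
  picked <- vapply(
    row_groups,
    function(rows) rows[sample.int(length(rows), 1)],
    integer(1)
  )
  # Return the selected rows in their original order.
  data[sort(unname(picked)), , drop = FALSE]
}

df <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "c"),
  value = c(1, 2, 3, 1, 1, 1, 5)
)
out <- select_one_row_per_id(df, "id", seed = 1)
```

Under these assumptions, `out` has one row per identifier, and calling the helper again with the same seed returns the same rows. Which row is chosen for a given seed may differ from what RanomlySelectOneRowForEach() returns.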