Randomly select one row

If multiple rows share the same identifier in a given column, randomly select just one of those rows. This is done for every identifier, and the output is a new dataframe in which each identifier has exactly one row.

RanomlySelectOneRowForEach(inputted.data, col.name.of.unique.identifier, seed)

Arguments

inputted.data	A dataframe.
col.name.of.unique.identifier	Name of column in inputted.data containing identifiers.
seed	Number indicating the seed to set for random number generation.

Value

A dataframe where a single row remains for each identifier.

See also

Other Preprocessing functions: AddColBinnedToBinary(), AddColBinnedToQuartiles(), AddPCsToEnd(), ConvertDataToPercentiles(), CorAssoTestMultipleWithErrorHandling(), DownSampleDataframe(), GenerateElbowPlotPCA(), GeneratePC1andPC2PlotsWithAndWithoutOutliers(), Log2TargetDensityPlotComparison(), LookAtPCFeatureLoadings(), MultipleColumnsNormalCheckThenBoxCox(), NormalCheckThenBoxCoxTransform(), RecodeIdentifier(), RemoveColWithAllZeros(), RemoveRowsBasedOnCol(), RemoveSamplesWithInstability(), SplitIntoTrainTest(), StabilityTestingAcrossVisits(), SubsetDataByContinuousCol(), TwoSampleTTest(), ZScoreChallengeOutliers(), captureSessionInfo(), correlation.association.test(), describeNumericalColumnsWithLevels(), describeNumericalColumns(), generate.descriptive.plots.save.pdf(), generate.descriptive.plots()

Examples

identifier.col <- c("a", "a", "a", "b", "b", "b", "c")
value.col <- c(1, 2, 3, 1, 1, 1, 5)
# cbind() builds a character matrix, so value.col is stored as character here.
input.data.frame <- as.data.frame(cbind(identifier.col, value.col))

results <- RanomlySelectOneRowForEach(input.data.frame, "identifier.col", 1)

results
#>   identifier.col value.col
#> 1              a         1
#> 4              b         1
#> 7              c         5
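
For readers who want to see the idea behind the function, the behaviour described above can be approximated in base R. This is only a sketch under stated assumptions: select_one_row_sketch is a hypothetical helper, not part of the package, and the package's actual implementation may differ, so the rows it picks for a given seed are not guaranteed to match those of RanomlySelectOneRowForEach().

```r
# Hypothetical helper (not the package's implementation): keep one randomly
# chosen row per identifier, with a seed for reproducibility.
select_one_row_sketch <- function(df, id.col, seed) {
  set.seed(seed)
  # Group row positions by identifier value.
  rows.by.id <- split(seq_len(nrow(df)), df[[id.col]])
  # Pick one position per group. Indexing with sample.int() avoids the
  # sample(x, 1) pitfall when a group holds a single row number.
  keep <- vapply(
    rows.by.id,
    function(idx) idx[sample.int(length(idx), 1)],
    integer(1)
  )
  # Return the kept rows in their original order.
  df[sort(keep), , drop = FALSE]
}

df <- data.frame(
  identifier.col = c("a", "a", "a", "b", "b", "b", "c"),
  value.col = c(1, 2, 3, 1, 1, 1, 5)
)
sketch.result <- select_one_row_sketch(df, "identifier.col", 1)
```

Calling the helper again with the same seed returns the same rows, while a different seed may select different rows for identifiers that appear more than once. An identifier with a single row, such as "c", is always kept.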
