Use percentiles to assess for outliers in multidimensional data

Takes a dataframe of values where the columns are continuous features to be assessed for outliers and the rows are observations. For each feature, each observation receives a percentile rank for that feature. A column is then added to the data that tallies, for every observation, how many of its features have percentile ranks in the top or bottom specified percentiles. The tally column can be used to locate observations that have many features with extreme values, which are considered potential outliers.
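The procedure described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package's source: the function name sketch_percentile_tally, the column name extreme_feature_count, and the percentile rank formula (rank - 1) / (n - 1) are all assumptions, and the sketch assumes numeric columns with no missing values.

```r
# Hypothetical sketch of the described behaviour; not the package's own code.
sketch_percentile_tally <- function(input.data, upper_lower_bound_threshold) {
  # Assumption: the threshold and its complement define the two tails,
  # so 0.05 (or 0.95) means ranks below 0.05 or above 0.95 count as extreme.
  lower <- min(upper_lower_bound_threshold, 1 - upper_lower_bound_threshold)
  upper <- 1 - lower

  # Percentile rank of every observation within each feature (column).
  # Assumption: ranks are rescaled to [0, 1]; ties share their average rank.
  ranks <- as.data.frame(lapply(input.data, function(x) {
    (rank(x) - 1) / (length(x) - 1)
  }))

  # Tally, per observation, how many features fall in either tail.
  ranks$extreme_feature_count <- rowSums(ranks < lower | ranks > upper)
  ranks
}

# Toy data: row 1 is the most extreme observation in all three features.
toy <- data.frame(
  a = c(100, 1:10),
  b = c(-100, 1:10),
  c = c(100, 10:1)
)
result <- sketch_percentile_tally(toy, 0.05)
result$extreme_feature_count[1]  # 3: row 1 is in a tail for every feature
```

Because percentile ranks are computed one feature at a time, the tally does not depend on the scale or units of any individual column.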

ConvertDataToPercentiles(input.data, upper_lower_bound_threshold)

Arguments

input.data	A dataframe whose columns are continuous values to be converted to percentile ranks
upper_lower_bound_threshold	A number from 0 to 1 that sets the tail cutoffs. Percentile ranks beyond the two cutoffs (upper_lower_bound_threshold, 1-upper_lower_bound_threshold) are considered too large or too small

Value

The original dataframe with its values converted to percentile ranks, plus an additional column that tallies how many features in each observation have an unusually large or small percentile rank.
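As a rough illustration of working with the returned value, the snippet below builds a small stand-in dataframe by hand and uses the tally column to pick out candidate outliers. The column name extreme_feature_count, the feature names, and the cutoff of 2 are hypothetical; the real tally column name should be checked on the actual output.

```r
# Hand-made stand-in for a returned dataframe: percentile ranks plus a tally.
# All column names here are hypothetical.
converted <- data.frame(
  feature_1 = c(0.99, 0.40, 0.02, 0.55),
  feature_2 = c(0.01, 0.52, 0.47, 0.60),
  feature_3 = c(0.98, 0.35, 0.50, 0.97),
  extreme_feature_count = c(3, 0, 1, 1)
)

# Keep observations with at least 2 extreme features (the cutoff is a judgment call).
flagged <- converted[converted$extreme_feature_count >= 2, ]

# Alternatively, order all observations from most to fewest extreme features.
by_tally <- converted[order(-converted$extreme_feature_count), ]
```

How many extreme features make an observation worth inspecting depends on the number of features and on the chosen threshold, so the cutoff is best set after looking at the distribution of the tally.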

See also

Other Preprocessing functions: AddColBinnedToBinary(), AddColBinnedToQuartiles(), AddPCsToEnd(), CorAssoTestMultipleWithErrorHandling(), DownSampleDataframe(), GenerateElbowPlotPCA(), GeneratePC1andPC2PlotsWithAndWithoutOutliers(), Log2TargetDensityPlotComparison(), LookAtPCFeatureLoadings(), MultipleColumnsNormalCheckThenBoxCox(), NormalCheckThenBoxCoxTransform(), RanomlySelectOneRowForEach(), RecodeIdentifier(), RemoveColWithAllZeros(), RemoveRowsBasedOnCol(), RemoveSamplesWithInstability(), SplitIntoTrainTest(), StabilityTestingAcrossVisits(), SubsetDataByContinuousCol(), TwoSampleTTest(), ZScoreChallengeOutliers(), captureSessionInfo(), correlation.association.test(), describeNumericalColumnsWithLevels(), describeNumericalColumns(), generate.descriptive.plots.save.pdf(), generate.descriptive.plots()
