| Function | Description |
| --- | --- |
| AddColBinnedToBinary() | Bin the values of a selected continuous column into 2 bins (halves) and add the bin assignments as a new column |
| AddColBinnedToQuartiles() | Bin the values of a selected continuous column into 4 bins (quartiles) and add the bin assignments as a new column |
| AddPCsToEnd() | Perform PCA and add the principal components (PCs) as new columns at the end of the dataframe |
| captureSessionInfo() | Capture session info |
| ConvertDataToPercentiles() | Convert data to percentiles to check for outliers in multidimensional data |
| CorAssoTestMultipleWithErrorHandling() | Take multiple vectors and run correlation/association tests on all of them, with error handling |
| correlation.association.test() | Given two numerical data vectors, determine their correlation |
| describeNumericalColumns() | Describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (tailedness), and whether it passes a normality test |
| describeNumericalColumnsWithLevels() | For each level, describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (tailedness), and whether it passes a normality test |
| DownSampleDataframe() | Downsample an imbalanced dataset to obtain a balanced dataset |
| generate.descriptive.plots() | Use histograms and boxplots to get a general idea of what the data looks like |
| generate.descriptive.plots.save.pdf() | Use histograms and boxplots to get a general idea of what the data looks like, and save the plots to a PDF |
| GenerateElbowPlotPCA() | Create an elbow plot showing how much of the total variance is explained by each principal component |
| GeneratePC1andPC2PlotsWithAndWithoutOutliers() | Generate PC1 vs. PC2 plots with and without outliers |
| Log2TargetDensityPlotComparison() | Apply a log2 transformation to a column, then compare density plots with and without the transformation |
| LookAtPCFeatureLoadings() | Look at the feature loadings of the principal components |
| MultipleColumnsNormalCheckThenBoxCox() | Check whether each of multiple columns in a dataframe is normally distributed, and Box-Cox transform each column that is not |
| NormalCheckThenBoxCoxTransform() | Check whether the data is normally distributed using the Shapiro-Wilk test; if not, apply a Box-Cox transformation |
| RanomlySelectOneRowForEach() | Randomly select one row |
| RecodeIdentifier() | Recode the identifier column of a dataset |
| RemoveColWithAllZeros() | Remove columns with all zeros |
| RemoveRowsBasedOnCol() | Remove rows from the dataframe if the row contains a value in the specified columns |
| RemoveSamplesWithInstability() | Remove samples that have multiple values for a single column when those values are unstable |
| SplitIntoTrainTest() | Split the data into training and test sets |
| StabilityTestingAcrossVisits() | Assess the stability of values that correspond to a single identifier across visits |
| SubsetDataByContinuousCol() | Subset data using lower and upper bounds on a continuous column |
| TwoSampleTTest() | Perform two-sample t-tests on multiple features |
| ZScoreChallengeOutliers() | Remove outliers based on the Z-score of a particular variable |
| CVPredictionsRandomForest() | Create a cross-validated random forest model |
| CVRandomForestClassificationMatrixForPheatmap() | Generate a random forest model under cross-validation (CV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
| eval.classification.results() | Evaluate classification performance |
| find.best.number.of.trees() | Using the classification error rate for each number of trees, find the optimal number of trees to use for a random forest classifier |
| GenerateExampleDataMachinelearnr() | Produce an example dataset for demonstrating the package functions |
| LOOCVPredictionsRandomForestAutomaticMtryAndNtree() | Create a leave-one-out cross-validated random forest model, with mtry and ntree chosen automatically |
| LOOCVRandomForestClassificationMatrixForPheatmap() | Generate a random forest model under leave-one-out cross-validation (LOOCV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
| RandomForestAutomaticMtryAndNtree() | Create a random forest classification model after optimizing mtry and ntree |
| RandomForestClassificationGiniMatrixForPheatmap() | Generate a random forest model for different subsets of the data and display the results in a matrix |
| RandomForestClassificationPercentileMatrixForPheatmap() | Generate a random forest model for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
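
The quartile binning described for AddColBinnedToQuartiles() can be illustrated in a few lines of base R. This is a minimal sketch, not the package's implementation: it assumes quartile boundaries come from `quantile()` with R's default sample-quantile definition, and the helper `add_quartile_bins()` and its `_quartile` column suffix are hypothetical names.

```r
# Hypothetical helper, NOT the package's AddColBinnedToQuartiles():
# bin a continuous column into quartiles and append the labels as a new column.
add_quartile_bins <- function(df, col, new_col = paste0(col, "_quartile")) {
  # Quartile boundaries at 0%, 25%, 50%, 75%, 100% (quantile() default type 7)
  breaks <- quantile(df[[col]], probs = seq(0, 1, 0.25), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum in the first bin;
  # cut() stops with an error if boundaries repeat (e.g. heavily tied data)
  df[[new_col]] <- cut(df[[col]], breaks = breaks, include.lowest = TRUE,
                       labels = c("Q1", "Q2", "Q3", "Q4"))
  df
}

example_df <- data.frame(value = 1:8)
binned <- add_quartile_bins(example_df, "value")
table(binned$value_quartile)  # each of Q1..Q4 receives 2 of the 8 values
```

The same pattern with `probs = c(0, 0.5, 1)` and two labels gives the median split that AddColBinnedToBinary() describes.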
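
Z-score based outlier removal, as listed for ZScoreChallengeOutliers(), standardizes one variable and drops rows whose absolute Z-score exceeds a cutoff. The sketch below is an illustration under assumptions, not the package's code: the helper `remove_zscore_outliers()` is hypothetical, and the cutoff of 3 is an assumed default.

```r
# Hypothetical helper, NOT the package's ZScoreChallengeOutliers():
# drop rows whose |z| for one column exceeds a threshold (assumed default 3).
remove_zscore_outliers <- function(df, col, threshold = 3) {
  x <- df[[col]]
  # z = (x - mean) / sd, using the sample standard deviation
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  # Keep rows within the threshold; rows with a missing z-score are dropped too
  df[!is.na(z) & abs(z) <= threshold, , drop = FALSE]
}

scores <- data.frame(value = c(1:19, 200))
kept <- remove_zscore_outliers(scores, "value")
nrow(kept)  # 19: only the value 200 (z of about 4.2) is removed
```

Because the mean and standard deviation include the outlier itself, extreme values inflate the standard deviation; in a sample of size n, no |z| can exceed (n - 1) / sqrt(n), so a fixed cutoff of 3 cannot flag anything when n is 10 or fewer.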
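
Downsampling, as listed for DownSampleDataframe(), typically means randomly discarding rows from the larger classes until every class matches the smallest one. The sketch below assumes that strategy; the helper `downsample_to_smallest_class()` and its `seed` argument are hypothetical and not the package's API.

```r
# Hypothetical helper, NOT the package's DownSampleDataframe():
# randomly downsample every class to the size of the smallest class.
downsample_to_smallest_class <- function(df, class_col, seed = 1) {
  set.seed(seed)  # reproducible draw (note: this resets the global RNG state)
  # Row indices grouped by class; drop = TRUE ignores unused factor levels
  groups <- split(seq_len(nrow(df)), df[[class_col]], drop = TRUE)
  n_min <- min(lengths(groups))
  # Sample n_min row indices from each class without replacement;
  # sample.int() avoids sample()'s special case for length-one input
  keep <- unlist(lapply(groups, function(idx) idx[sample.int(length(idx), n_min)]))
  df[sort(keep), , drop = FALSE]  # sort() preserves the original row order
}

labelled <- data.frame(label = c(rep("a", 8), rep("b", 3)), x = 1:11)
balanced <- downsample_to_smallest_class(labelled, "label")
table(balanced$label)  # a: 3, b: 3
```

Rows with a missing class label are dropped by `split()`, which is usually what a balancing step wants.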
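
The quantity behind an elbow plot such as the one GenerateElbowPlotPCA() creates is the proportion of total variance explained by each principal component: each component's variance divided by the sum of all component variances. A minimal sketch using `stats::prcomp()` on R's built-in `iris` measurements (the dataset and the choice to standardize columns are assumptions for illustration, not the package's defaults):

```r
# PCA on the four numeric iris columns, centering and scaling each column
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variance of each component is sdev^2; divide by the total to get proportions
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(var_explained)  # running total, ends at 1
round(var_explained, 3)

# The elbow plot itself: look for where the curve flattens out
if (interactive()) {
  plot(var_explained, type = "b", xlab = "Principal component",
       ylab = "Proportion of variance explained")
}
```

Components after the "elbow" each add little explained variance, which is the usual (informal) argument for keeping only the components before it.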