|
AddColBinnedToBinary()
|
Bin the values of a selected continuous column into 2 bins (halves) and add the bin assignments as a new column |
|
AddColBinnedToQuartiles()
|
Bin the values of a selected continuous column into 4 bins (quartiles) and add the bin assignments as a new column |
|
AddPCsToEnd()
|
Perform PCA and append the principal components to the end of the data |
|
captureSessionInfo()
|
Capture session info |
|
ConvertDataToPercentiles()
|
Use percentiles to check for outliers in multidimensional data |
|
CorAssoTestMultipleWithErrorHandling()
|
Take multiple vectors and run correlation/association tests on all of them |
|
correlation.association.test()
|
Given two numerical data vectors, determine their correlation |
|
describeNumericalColumns()
|
Describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (flatness), and whether it passes a normality test |
|
describeNumericalColumnsWithLevels()
|
For each level, describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (flatness), and whether it passes a normality test |
|
DownSampleDataframe()
|
Downsample an imbalanced dataset to obtain a balanced dataset |
|
generate.descriptive.plots()
|
Use histograms and boxplots to get a general idea of what the data look like |
|
generate.descriptive.plots.save.pdf()
|
Use histograms and boxplots to get a general idea of what the data look like, and save the plots to a PDF |
|
GenerateElbowPlotPCA()
|
Create an elbow plot showing how much of the total variance is explained by the components |
|
GeneratePC1andPC2PlotsWithAndWithoutOutliers()
|
Generate PC1 vs PC2 plots with and without outliers. |
|
Log2TargetDensityPlotComparison()
|
Apply a log2 transformation to a column, then compare density plots with and without the transformation |
|
LookAtPCFeatureLoadings()
|
Principal component feature loadings |
|
MultipleColumnsNormalCheckThenBoxCox()
|
Check multiple columns in a dataframe to see whether each is normally distributed. If a column is not, apply a Box-Cox transformation |
|
NormalCheckThenBoxCoxTransform()
|
Check whether the data are normally distributed using the Shapiro-Wilk test. If not, apply a Box-Cox transformation |
|
RanomlySelectOneRowForEach()
|
Randomly select one row |
|
RecodeIdentifier()
|
Recode the identifier column of a dataset |
|
RemoveColWithAllZeros()
|
Remove columns with all zeros |
|
RemoveRowsBasedOnCol()
|
Remove rows from the dataframe if the row contains a value in the specified columns |
|
RemoveSamplesWithInstability()
|
Remove samples that have multiple values for a single column when those values are unstable |
|
SplitIntoTrainTest()
|
Split into train and test |
|
StabilityTestingAcrossVisits()
|
Assess stability of values that correspond to a single identifier |
|
SubsetDataByContinuousCol()
|
Subset data by two bounds on a continuous column |
|
TwoSampleTTest()
|
Performs two sample t-test on multiple features |
|
ZScoreChallengeOutliers()
|
Remove outliers based on Z score of a particular variable |
|
CVPredictionsRandomForest()
|
Create a cross-validated random forest model |
|
CVRandomForestClassificationMatrixForPheatmap()
|
Generate a random forest model under cross-validation (CV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
|
eval.classification.results()
|
Determine the performance of classification |
|
find.best.number.of.trees()
|
Using the classification error rate for each number of trees, find the optimal number of trees to use for a random forest classifier |
|
GenerateExampleDataMachinelearnr()
|
Produce example data set for demonstrating package functions |
|
LOOCVPredictionsRandomForestAutomaticMtryAndNtree()
|
Create a leave-one-out cross-validated random forest model |
|
LOOCVRandomForestClassificationMatrixForPheatmap()
|
Generate a random forest model under leave-one-out cross-validation (LOOCV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
|
RandomForestAutomaticMtryAndNtree()
|
Create a random forest classification model after optimizing mtry and ntree |
|
RandomForestClassificationGiniMatrixForPheatmap()
|
Generate a random forest model for different subsets of the data and display the results in a matrix |
|
RandomForestClassificationPercentileMatrixForPheatmap()
|
Generate a random forest model for different subsets of the data and display the results in a pheatmap to easily compare the subsets |
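
Two of the preprocessing ideas listed above, binning a continuous column into quartiles and removing outliers by Z score, can be sketched in base R. This is only an illustration of the techniques under stated assumptions: it does not call `AddColBinnedToQuartiles()` or `ZScoreChallengeOutliers()`, whose arguments are not documented in this index. The data frame, the column names, and the cutoff of 3 standard deviations are all hypothetical.

```r
# Hypothetical example data: 19 evenly spaced values plus one extreme value.
df <- data.frame(id = 1:20, value = c(1:19, 200))

# Quartile binning: cut the column at its 0/25/50/75/100% quantiles
# and store the bin assignment (1 to 4) as a new column.
breaks <- quantile(df$value, probs = seq(0, 1, by = 0.25))
df$value_quartile <- cut(df$value, breaks = breaks,
                         include.lowest = TRUE, labels = 1:4)

# Z-score filter: keep rows whose value lies within 3 standard
# deviations of the column mean (the cutoff of 3 is an assumption).
z <- (df$value - mean(df$value)) / sd(df$value)
df_no_outliers <- df[abs(z) <= 3, ]

print(table(df$value_quartile))
print(nrow(df_no_outliers))
```

The quantile breaks come from the data, so each bin holds roughly a quarter of the rows. The Z-score cutoff is a convention, not a rule, and a single very large value inflates the standard deviation it is measured against, so a fixed cutoff can miss outliers in small samples.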