Function reference • machinelearnr

Preprocessing Functions
`AddColBinnedToBinary()`	Bin the values of a selected continuous column into 2 bins (halves) and add the bin assignments as a new column
`AddColBinnedToQuartiles()`	Bin the values of a selected continuous column into 4 bins (quartiles) and add the bin assignments as a new column
`AddPCsToEnd()`	Perform PCA and add the principal components as new columns at the end of the dataframe
`captureSessionInfo()`	Capture session info
`ConvertDataToPercentiles()`	Convert data to percentiles to check for outliers in multidimensional data
`CorAssoTestMultipleWithErrorHandling()`	Take multiple vectors and run correlation/association tests on all of them
`correlation.association.test()`	Given two numerical data vectors, determine the correlation
`describeNumericalColumns()`	Describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (tail heaviness), and whether it passes a normality test
`describeNumericalColumnsWithLevels()`	For each level, describe each numerical feature: mean, standard deviation, median, skewness (symmetry), kurtosis (tail heaviness), and whether it passes a normality test
`DownSampleDataframe()`	Downsample an imbalanced dataset to obtain a balanced dataset
`generate.descriptive.plots()`	Use histograms and boxplots to get a general idea of what the data look like
`generate.descriptive.plots.save.pdf()`	Use histograms and boxplots to get a general idea of what the data look like, and save the plots to a PDF
`GenerateElbowPlotPCA()`	Create an elbow plot showing how much of the total variance is explained by each component
`GeneratePC1andPC2PlotsWithAndWithoutOutliers()`	Generate PC1 vs PC2 plots with and without outliers.
`Log2TargetDensityPlotComparison()`	Apply a log2 transformation to a column and compare its density plots with and without the transformation
`LookAtPCFeatureLoadings()`	Principal component feature loadings
`MultipleColumnsNormalCheckThenBoxCox()`	Check whether each of multiple columns in a dataframe is normally distributed; Box-Cox transform any column that is not
`NormalCheckThenBoxCoxTransform()`	Check whether the data are normally distributed using the Shapiro-Wilk test; if not, apply a Box-Cox transformation
`RanomlySelectOneRowForEach()`	Randomly select one row
`RecodeIdentifier()`	Recode the identifier column of a dataset
`RemoveColWithAllZeros()`	Remove columns with all zeros
`RemoveRowsBasedOnCol()`	Remove rows from the dataframe if the row contains a value in the specified columns
`RemoveSamplesWithInstability()`	Remove samples that have multiple values for a single column when those values are unstable
`SplitIntoTrainTest()`	Split data into training and test sets
`StabilityTestingAcrossVisits()`	Assess stability of values that correspond to a single identifier
`SubsetDataByContinuousCol()`	Subset data by two bounds on a continuous column
`TwoSampleTTest()`	Perform a two-sample t-test on each of multiple features
`ZScoreChallengeOutliers()`	Remove outliers based on the Z-score of a particular variable
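The "normality check, then Box-Cox" idea behind `NormalCheckThenBoxCoxTransform()` and `MultipleColumnsNormalCheckThenBoxCox()` can be sketched in base R as follows. This is a hedged illustration, not the package's implementation. The helper names `boxcox_loglik()` and `normal_check_then_boxcox()` are hypothetical, and the 0.05 significance level and the [-2, 2] search range for lambda are also assumptions. The sketch picks lambda by maximizing the Box-Cox profile log-likelihood of an intercept-only model. Up to a constant, this is the criterion `MASS::boxcox()` evaluates on a grid.

```r
# Hypothetical helpers, not machinelearnr code.
# Assumptions: alpha = 0.05, strictly positive input, lambda in [-2, 2].

boxcox_transform <- function(x, lambda) {
  # Box-Cox transform; lambda near 0 is the log limit
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

boxcox_loglik <- function(lambda, x) {
  # Profile log-likelihood of an intercept-only model, up to a constant
  n <- length(x)
  y <- boxcox_transform(x, lambda)
  rss <- sum((y - mean(y))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(x))
}

normal_check_then_boxcox <- function(x, alpha = 0.05) {
  # Shapiro-Wilk test; shapiro.test() accepts 3 to 5000 values
  if (shapiro.test(x)$p.value >= alpha) {
    return(list(transformed = FALSE, lambda = NA_real_, x = x))
  }
  stopifnot(all(x > 0))  # Box-Cox is defined only for positive values
  lambda <- optimize(boxcox_loglik, interval = c(-2, 2), x = x,
                     maximum = TRUE)$maximum
  list(transformed = TRUE, lambda = lambda, x = boxcox_transform(x, lambda))
}

# Usage: a strongly right-skewed (log-normal) sample
set.seed(1)
res <- normal_check_then_boxcox(rlnorm(200))
res$transformed
res$lambda  # expected to be near 0 for log-normal data, i.e. roughly a log transform
```

Applying the same helper column by column (for example with `lapply()` over the numeric columns of a dataframe) gives the multi-column variant.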
Clustering Functions
`CalcOptimalNumClustersForKMeans()`	Generate plots to help decide the optimal number of clusters for k-means
`generate.2D.clustering.with.labeled.subgroup()`	Make a 2D scatter plot that shows the data as represented by PC1 and PC2
`generate.3D.clustering.with.labeled.subgroup()`	Make a 3D scatter plot that shows the data as represented by PC1, PC2, and PC3, with clusters labeled by color
`generate.plots.comparing.clusters()`	Generate plots comparing clusters
`GenerateParcoordForClusters()`	Generate a parallel coordinates plot showing each observation and the cluster it belongs to
`HierarchicalClustering()`	Automated hierarchical clustering with labeling of observations and groups
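A common way to decide the number of k-means clusters, as `CalcOptimalNumClustersForKMeans()` is described as helping to do, is the elbow heuristic. A minimal base-R sketch follows. The standardized `iris` measurements, the range k = 1 to 8, and `nstart = 25` are illustrative assumptions, and the package's own plots may use other criteria.

```r
# Elbow heuristic for k-means using base R's stats::kmeans().
# Assumptions: standardized features, k from 1 to 8, nstart = 25.
set.seed(42)
X <- scale(iris[, 1:4])  # built-in iris measurements, centered and scaled

ks <- 1:8
wss <- sapply(ks, function(k) {
  # tot.withinss = total within-cluster sum of squares for this k
  kmeans(X, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is where adding clusters stops reducing WSS sharply
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With k = 1 the within-cluster sum of squares equals the total sum of squares, so the curve starts at its maximum and generally falls as k grows.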
Classification Functions
`CVPredictionsRandomForest()`	Create a cross-validated random forest model
`CVRandomForestClassificationMatrixForPheatmap()`	Generate a random forest model under cross-validation (CV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets
`eval.classification.results()`	Evaluate classification performance
`find.best.number.of.trees()`	Using the classification error rate for each number of trees, find the optimal number of trees for a random forest classifier
`GenerateExampleDataMachinelearnr()`	Produce an example dataset for demonstrating package functions
`LOOCVPredictionsRandomForestAutomaticMtryAndNtree()`	Create a leave-one-out cross-validated random forest model
`LOOCVRandomForestClassificationMatrixForPheatmap()`	Generate a random forest model under leave-one-out cross-validation (LOOCV) for different subsets of the data and display the results in a pheatmap to easily compare the subsets
`RandomForestAutomaticMtryAndNtree()`	Create a random forest classification model after optimizing mtry and ntree
`RandomForestClassificationGiniMatrixForPheatmap()`	Generate a random forest model for different subsets of the data and display the results in a matrix
`RandomForestClassificationPercentileMatrixForPheatmap()`	Generate a random forest model for different subsets of the data and display results in a pheatmap to easily compare the different subsets
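`find.best.number.of.trees()` is described as choosing `ntree` from the classification error rate at each number of trees. The sketch below illustrates that idea with the CRAN `randomForest` package, whose parameters are named `mtry` and `ntree`. Whether machinelearnr uses that package is an assumption, since this page does not say. The sketch reads the out-of-bag (OOB) error curve and takes its minimum. The dataset, seed, and minimum-error rule are illustrative and may differ from what the package does.

```r
# Choose ntree from the out-of-bag error curve.
# Assumes the CRAN randomForest package is installed.
library(randomForest)

set.seed(7)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# err.rate has one row per number of trees; the "OOB" column is the
# overall out-of-bag classification error after that many trees
oob <- rf$err.rate[, "OOB"]
best_ntree <- which.min(oob)
best_ntree
```

Because the OOB curve is noisy, a smoother alternative is to take the smallest `ntree` whose error is within a tolerance of the minimum. This is a design choice, not something this page specifies.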
