CVRandomForestClassificationMatrixForPheatmap() applies CVPredictionsRandomForest() to each subset of the data. For each subset, a random forest model is generated using the specified number of cross-validation (CV) folds, and that model is used to produce a single column of the final matrix, which can then be passed to pheatmap. The input data should already have its rows shuffled. To run leave-one-out cross-validation (LOOCV) on each subset, set number.of.folds to -1.

CVRandomForestClassificationMatrixForPheatmap(
  input.data,
  factor.name.for.subsetting,
  name.of.predictors.to.use,
  target.column.name,
  seed,
  percentile.threshold.to.keep,
  number.of.folds
)

Arguments

input.data

A dataframe which should already have the rows shuffled.
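
The shuffling requirement can be met with base R before calling the function. The snippet below is a minimal sketch on a hypothetical toy data frame (the column names and values are illustrative, not from the package):

```r
# Hypothetical toy data frame; names and values are illustrative only.
toy.data <- data.frame(
  x = c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1),
  actual = factor(c("a", "a", "b", "b", "a", "b"))
)

set.seed(1)  # make the shuffle reproducible

# sample(n) returns a random permutation of 1..n, used here as row indices.
toy.data.shuffled <- toy.data[sample(nrow(toy.data)), ]

# Shuffling reorders rows but keeps every row exactly once.
stopifnot(nrow(toy.data.shuffled) == nrow(toy.data))
stopifnot(setequal(rownames(toy.data.shuffled), rownames(toy.data)))
```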

factor.name.for.subsetting

A string specifying the name of the column to use for subsetting. The column should be a factor. Each column of the generated pheatmap corresponds to one level of the factor.
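
To illustrate how one subset per factor level arises, the following sketch splits a hypothetical data frame with base R's split(). It only demonstrates the idea; how the package performs the subsetting internally is not shown on this page.

```r
# Hypothetical data; "group" stands in for the subsetting column.
toy.data <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  group = factor(c("g1", "g2", "g1", "g2", "g1", "g2"))
)
factor.name.for.subsetting <- "group"

# The subsetting column is expected to be a factor.
stopifnot(is.factor(toy.data[[factor.name.for.subsetting]]))

# One data frame per factor level; each would map to one heatmap column.
subsets <- split(toy.data, toy.data[[factor.name.for.subsetting]])
names(subsets)
```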

name.of.predictors.to.use

A vector of strings specifying the names of the columns to use as predictors for the random forest model. Each column should be numeric.

target.column.name

A string specifying the column containing the values the random forest model is trying to predict. The column should be a factor.

seed

A number used as the seed for random number generation.

percentile.threshold.to.keep

A number from 0 to 1 indicating the percentile to use for feature selection. This value is used by the CVPredictionsRandomForest() function.
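
One plausible reading of a percentile threshold is "keep the features whose importance is at or above that percentile". The sketch below illustrates this reading with made-up importance scores; it is an assumption for illustration and may differ from what CVPredictionsRandomForest() actually does.

```r
# Made-up importance scores for four predictors (illustrative only).
importance <- c(x = 0.40, y = 0.10, a = 0.30, b = 0.20)
percentile.threshold.to.keep <- 0.5

# Assumption: the cutoff is the given quantile of the importance scores.
cutoff <- quantile(importance, probs = percentile.threshold.to.keep)

# Keep the predictors scoring at or above the cutoff.
kept <- names(importance)[importance >= cutoff]
kept
```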

number.of.folds

An integer specifying the number of folds for CV. If this number is set to -1, the function uses LOOCV for each subset.
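
LOOCV is equivalent to k-fold CV with as many folds as there are rows. The sketch below shows how a -1 sentinel could be resolved to a fold count; ResolveNumberOfFolds() is a hypothetical helper written for this illustration, not a function from the package.

```r
# Hypothetical helper: map the -1 sentinel to "one fold per row" (LOOCV).
ResolveNumberOfFolds <- function(number.of.folds, n.rows) {
  if (number.of.folds == -1) n.rows else number.of.folds
}

n.rows <- 6  # size of an imaginary subset

# LOOCV: six folds, so every row is held out exactly once.
k.loocv <- ResolveNumberOfFolds(-1, n.rows)
fold.id.loocv <- rep(seq_len(k.loocv), length.out = n.rows)

# Ordinary 2-fold CV on the same subset: rows alternate between folds.
k.two <- ResolveNumberOfFolds(2, n.rows)
fold.id.two <- rep(seq_len(k.two), length.out = n.rows)
```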

Value

A list with three objects:

  1. A numerical matrix that can be used for pheatmap generation.

  2. A list of subset data sets for each column in the heatmap matrix.

  3. A list of vectors containing the predictions for each column in the heatmap matrix. These values are used to calculate the MCC.
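
For reference, the Matthews correlation coefficient (MCC) for a binary problem can be computed from predictions as sketched below. BinaryMCC() and the example vectors are hypothetical and written only for this illustration; the package's own MCC calculation may be implemented differently.

```r
# Hypothetical helper: binary MCC from actual and predicted labels.
BinaryMCC <- function(actual, predicted, positive) {
  tp <- as.numeric(sum(actual == positive & predicted == positive))
  tn <- as.numeric(sum(actual != positive & predicted != positive))
  fp <- as.numeric(sum(actual != positive & predicted == positive))
  fn <- as.numeric(sum(actual == positive & predicted != positive))
  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  # By convention, return 0 when the denominator is 0.
  if (denominator == 0) return(0)
  (tp * tn - fp * fn) / denominator
}

# Made-up labels: TP = 2, TN = 2, FP = 1, FN = 1.
actual    <- c("pos", "pos", "pos", "neg", "neg", "neg")
predicted <- c("pos", "pos", "neg", "neg", "neg", "pos")

mcc <- BinaryMCC(actual, predicted, positive = "pos")
mcc  # (2 * 2 - 1 * 1) / sqrt(3 * 3 * 3 * 3) = 1/3
```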

See also

Examples

example.data <- GenerateExampleDataMachinelearnr()

set.seed(1)
example.data.shuffled <- example.data[sample(nrow(example.data)), ]

result.two.fold.CV <- CVRandomForestClassificationMatrixForPheatmap(
  input.data = example.data.shuffled,
  factor.name.for.subsetting = "sep.xy.ab",
  name.of.predictors.to.use = c("x", "y", "a", "b"),
  target.column.name = "actual",
  seed = 2,
  percentile.threshold.to.keep = 0.5,
  number.of.folds = 2
)
#> [1] 1
#> [1] 2
#> [1] 1
#> [1] 2

matrix.for.pheatmap <- result.two.fold.CV[[1]]
pheatmap_RF <- pheatmap::pheatmap(matrix.for.pheatmap, fontsize_col = 12, fontsize_row = 12)