This function tunes mtry with randomForest::tuneRF() and tunes ntree with find.best.number.of.trees(), using the out-of-bag error. The optimal mtry and ntree values are then used to fit a new random forest model, which is returned.

RandomForestAutomaticMtryAndNtree(
  inputted.data,
  name.of.predictors.to.use,
  target.column.name,
  seed
)

Arguments

inputted.data

A dataframe.

name.of.predictors.to.use

A vector of strings that specifies the columns with values that we want to use for prediction.

target.column.name

A string that specifies the column with values that we want to predict for. This column should be a factor.

seed

An integer that specifies the seed to use for random number generation.
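
The argument requirements above can be checked before calling the function. The following is a minimal sketch in base R; check.rf.inputs() and toy.data are hypothetical names used only for illustration and are not part of this package.

```r
# Hypothetical helper (not part of the package): validates the inputs
# described in the Arguments section and returns the checked data frame.
check.rf.inputs <- function(inputted.data,
                            name.of.predictors.to.use,
                            target.column.name) {
  stopifnot(is.data.frame(inputted.data))

  # Every predictor and the target must be a column of the data frame.
  needed.columns <- c(name.of.predictors.to.use, target.column.name)
  missing.columns <- setdiff(needed.columns, colnames(inputted.data))
  if (length(missing.columns) > 0) {
    stop("Columns not found: ", paste(missing.columns, collapse = ", "))
  }

  # The target column should be a factor; convert it if it is not.
  if (!is.factor(inputted.data[[target.column.name]])) {
    inputted.data[[target.column.name]] <-
      as.factor(inputted.data[[target.column.name]])
  }

  inputted.data
}

# Toy data made up for this sketch.
toy.data <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(4, 3, 2, 1),
  label = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

checked.data <- check.rf.inputs(toy.data, c("x", "y"), "label")
is.factor(checked.data$label)
```

Converting the target to a factor matters because randomForest() fits a classification forest only when the response is a factor.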

Value

A randomForest object fitted with the optimized mtry and ntree values.
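
As a rough illustration of what a randomForest object contains, the sketch below fits a forest directly with randomForest::randomForest() on the built-in iris data (an assumption made for this sketch; it does not call RandomForestAutomaticMtryAndNtree()) and inspects the components that are most relevant here.

```r
library(randomForest)

set.seed(2)

# Fit a classification forest with default settings on iris
# (stand-in data for this sketch).
rf.model <- randomForest(x = iris[, 1:4], y = iris$Species)

rf.model$mtry              # number of predictors tried at each split
rf.model$ntree             # number of trees grown
head(rf.model$predicted)   # out-of-bag predictions, one per training row
rf.model$confusion         # out-of-bag confusion matrix

# Out-of-bag error rate after the final tree.
tail(rf.model$err.rate[, "OOB"], 1)
```

Because the predicted component holds out-of-bag predictions, comparing it with the true labels gives an error estimate that is not inflated by fitting to the training rows.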

Details

In most cases, the default values of mtry and ntree from randomForest() are preferred over tuned values.
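
The sketch below shows one way the tuning could be done by hand and compared with the defaults. It is not this function's actual implementation: the internals of find.best.number.of.trees() are not shown here, so choosing the tree count with the lowest out-of-bag error is an assumption, as are the iris data and the specific tuneRF() settings.

```r
library(randomForest)

set.seed(2)
predictors <- iris[, 1:4]
target <- iris$Species

# Step 1: search mtry with tuneRF(). It returns a matrix whose first
# column is the mtry value tried and whose second is the OOB error.
tune.result <- tuneRF(predictors, target,
                      ntreeTry = 100, stepFactor = 2, improve = 0.01,
                      trace = FALSE, plot = FALSE)
best.row <- which.min(tune.result[, 2])
best.mtry <- as.numeric(tune.result[best.row, 1])

# Step 2 (assumed approach): grow a large forest and keep the number of
# trees at which the OOB error is lowest.
big.forest <- randomForest(predictors, target, mtry = best.mtry, ntree = 500)
best.ntree <- as.integer(which.min(big.forest$err.rate[, "OOB"]))

# Step 3: refit with the chosen values and compare against the defaults.
tuned.forest <- randomForest(predictors, target,
                             mtry = best.mtry, ntree = best.ntree)
default.forest <- randomForest(predictors, target)

c(tuned = unname(tail(tuned.forest$err.rate[, "OOB"], 1)),
  default = unname(tail(default.forest$err.rate[, "OOB"], 1)))
```

Selecting mtry and ntree on the same out-of-bag error that is later reported can make the tuned model look better than it is, which is one reason the defaults are often a reasonable choice.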

See also

Examples

id = c("1a", "1b", "1c", "1d", "1e", "1f", "1g", "2a", "2b", "2c", "2d", "2e", "2f",
       "3a", "3b", "3c", "3d", "3e", "3f", "3g", "3h", "3i")

x = c(18, 21, 22, 24, 26, 26, 27, 30, 31, 35, 39, 35, 30,
      40, 41, 42, 44, 46, 47, 48, 49, 54)

y = c(10, 11, 22, 15, 12, 13, 14, 33, 39, 37, 44, 40, 45,
      27, 29, 20, 28, 21, 30, 31, 23, 24)

a = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

actual = as.factor(c("1", "1", "1", "1", "1", "1", "1",
                     "2", "2", "2", "2", "2", "2",
                     "3", "3", "3", "3", "3", "3", "3", "3", "3"))

example.data <- data.frame(id, x, y, a, b, actual)

rf.result <- RandomForestAutomaticMtryAndNtree(example.data, c("x", "y", "a", "b"), "actual", seed=2)

predicted <- rf.result$predicted
actual <- example.data[,"actual"]

# Result is not perfect because RF model does not over fit to the training data.
eval.classification.results(as.character(actual), as.character(predicted), "Example")
#> [[1]]
#> [1] "Example"
#>
#> [[2]]
#>       predicted
#> actual 1 2 3
#>      1 7 0 0
#>      2 0 5 1
#>      3 0 0 9
#>
#> [[3]]
#> [[3]]$accuracy
#> [1] 0.9545455
#>
#> [[3]]$macro_prf
#> # A tibble: 3 x 3
#>   precision recall    f1
#>       <dbl>  <dbl> <dbl>
#> 1       1    1     1
#> 2       1    0.833 0.909
#> 3       0.9  1     0.947
#>
#> [[3]]$macro_avg
#> # A tibble: 1 x 3
#>   avg_precision avg_recall avg_f1
#>           <dbl>      <dbl>  <dbl>
#> 1         0.967      0.944  0.952
#>
#> [[3]]$ova
#> [[3]]$ova$`1`
#>         classified
#> actual    1 others
#>   1       7      0
#>   others  0     15
#>
#> [[3]]$ova$`2`
#>         classified
#> actual    2 others
#>   2       5      1
#>   others  0     16
#>
#> [[3]]$ova$`3`
#>         classified
#> actual    3 others
#>   3       9      0
#>   others  1     12
#>
#>
#> [[3]]$ova_sum
#>           classified
#> actual     relevant others
#>   relevant       21      1
#>   others          1     43
#>
#> [[3]]$kappa
#> [1] 0.9301587
#>
#>
#> [[4]]
#> [1] 0.9331967
#>