Train a random forest model for classification or regression tasks.
    cuda_ml_rand_forest(x, ...)

    # S3 method for default
    cuda_ml_rand_forest(x, ...)

    # S3 method for data.frame
    cuda_ml_rand_forest(
      x,
      y,
      mtry = NULL,
      trees = NULL,
      min_n = 2L,
      bootstrap = TRUE,
      max_depth = 16L,
      max_leaves = Inf,
      max_predictors_per_note_split = NULL,
      n_bins = 128L,
      min_samples_leaf = 1L,
      split_criterion = NULL,
      min_impurity_decrease = 0,
      max_batch_size = 128L,
      n_streams = 8L,
      cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
      ...
    )

    # S3 method for matrix
    cuda_ml_rand_forest(
      x,
      y,
      mtry = NULL,
      trees = NULL,
      min_n = 2L,
      bootstrap = TRUE,
      max_depth = 16L,
      max_leaves = Inf,
      max_predictors_per_note_split = NULL,
      n_bins = 128L,
      min_samples_leaf = 1L,
      split_criterion = NULL,
      min_impurity_decrease = 0,
      max_batch_size = 128L,
      n_streams = 8L,
      cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
      ...
    )

    # S3 method for formula
    cuda_ml_rand_forest(
      formula,
      data,
      mtry = NULL,
      trees = NULL,
      min_n = 2L,
      bootstrap = TRUE,
      max_depth = 16L,
      max_leaves = Inf,
      max_predictors_per_note_split = NULL,
      n_bins = 128L,
      min_samples_leaf = 1L,
      split_criterion = NULL,
      min_impurity_decrease = 0,
      max_batch_size = 128L,
      n_streams = 8L,
      cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
      ...
    )

    # S3 method for recipe
    cuda_ml_rand_forest(
      x,
      data,
      mtry = NULL,
      trees = NULL,
      min_n = 2L,
      bootstrap = TRUE,
      max_depth = 16L,
      max_leaves = Inf,
      max_predictors_per_note_split = NULL,
      n_bins = 128L,
      min_samples_leaf = 1L,
      split_criterion = NULL,
      min_impurity_decrease = 0,
      max_batch_size = 128L,
      n_streams = 8L,
      cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace"),
      ...
    )
Argument | Description |
---|---|
x | Depending on the context: a __data frame__ of predictors; a __matrix__ of predictors; a __recipe__ specifying a set of preprocessing steps created from [recipes::recipe()]; or a __formula__ specifying the predictors and the outcome. |
... | Optional arguments; currently unused. |
y | A numeric vector (for regression) or factor (for classification) of desired responses. |
mtry | The number of predictors that will be randomly sampled at each split when creating the tree models. Default: the square root of the total number of predictors. |
trees | An integer for the number of trees contained in the ensemble. Default: 100L. |
min_n | An integer for the minimum number of data points in a node that are required for the node to be split further. Default: 2L. |
bootstrap | Whether to perform bootstrapping. If TRUE, each tree in the forest is built on a sample drawn with replacement. If FALSE, the whole dataset is used to build each tree. Default: TRUE. |
max_depth | Maximum tree depth. Default: 16L. |
max_leaves | Maximum leaf nodes per tree. Soft constraint. Default: Inf (unlimited). |
max_predictors_per_note_split | Number of predictors to consider per node split. Default: the square root of the total number of predictors. |
n_bins | Number of bins used by the split algorithm. Default: 128L. |
min_samples_leaf | The minimum number of data points in each leaf node. Default: 1L. |
split_criterion | The criterion used to split nodes: "gini" or "entropy" for classification, and "mse" or "mae" for regression. Default: "gini" for classification; "mse" for regression. |
min_impurity_decrease | Minimum decrease in impurity required for a node to be split. Default: 0. |
max_batch_size | Maximum number of nodes that can be processed in a given batch. Default: 128L. |
n_streams | Number of CUDA streams to use for building trees. Default: 8L. |
cuML_log_level | Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off". |
formula | A formula specifying the outcome terms on the left-hand side, and the predictor terms on the right-hand side. |
data | When a __recipe__ or __formula__ is used, `data` is a __data frame__ containing both the predictors and the outcome. |
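
The arguments above can also be supplied through the data frame interface, with predictors and outcome passed separately. The following is a minimal sketch rather than a recommended configuration: the hyperparameter values are arbitrary, rounding the square root down for `mtry` is an assumption made for illustration, and the `requireNamespace()` guard only skips the fit when cuda.ml is not installed (a working CUDA/cuML setup is assumed otherwise).

```r
# Predictors and outcome passed separately (data frame interface).
x <- iris[, names(iris) != "Species"]
y <- iris$Species  # a factor, so a classifier is trained

# The documented default for mtry is the square root of the number of
# predictors; rounding down here is an assumption for illustration.
mtry_value <- as.integer(floor(sqrt(ncol(x))))

# Skip the fit on machines where cuda.ml is not installed.
if (requireNamespace("cuda.ml", quietly = TRUE)) {
  model <- cuda.ml::cuda_ml_rand_forest(
    x, y,
    mtry = mtry_value,
    trees = 50L,               # arbitrary, smaller than the default of 100
    max_depth = 8L,            # arbitrary, shallower than the default of 16
    split_criterion = "gini",  # one of the documented classification criteria
    cuML_log_level = "warn"
  )
}
```

For a regression task, `y` would be a numeric vector and `split_criterion` would be "mse" or "mae" instead.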
A random forest classifier / regressor object that can be used with the 'predict' S3 generic to make predictions on new data points.
    library(cuda.ml)

    # Classification
    model <- cuda_ml_rand_forest(
      formula = Species ~ .,
      data = iris,
      trees = 100
    )
    predictions <- predict(model, iris[names(iris) != "Species"])

    # Regression
    model <- cuda_ml_rand_forest(
      formula = mpg ~ .,
      data = mtcars,
      trees = 100
    )
    predictions <- predict(model, mtcars[names(mtcars) != "mpg"])
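
The examples above predict on the same rows the model was trained on. A hedged sketch of evaluating on held-out rows might look like the following. The 80/20 split and the seed are arbitrary choices, and the `.pred_class` column name is an assumption about the shape of the value returned by `predict()`; inspect the returned object before relying on it.

```r
set.seed(42)  # arbitrary seed, only to make the split reproducible

# Hold out 20% of the rows for evaluation.
n <- nrow(iris)
train_idx <- sample(n, size = floor(0.8 * n))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Skip the fit on machines where cuda.ml is not installed.
if (requireNamespace("cuda.ml", quietly = TRUE)) {
  model <- cuda.ml::cuda_ml_rand_forest(
    formula = Species ~ .,
    data = train,
    trees = 100L
  )
  predictions <- predict(model, test[names(test) != "Species"])

  # Assumption: `predictions` is a data frame with a `.pred_class` column.
  accuracy <- mean(predictions$.pred_class == test$Species)
  cat("Held-out accuracy:", accuracy, "\n")
}
```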