Load an XGBoost or LightGBM model file using Treelite. The resulting model object can be used to perform high-throughput batch inference on new data points using the GPU acceleration functionality of the CuML Forest Inference Library (FIL).

cuda_ml_fil_load_model(
  filename,
  mode = c("classification", "regression"),
  model_type = c("xgboost", "lightgbm"),
  algo = c("auto", "naive", "tree_reorg", "batch_tree_reorg"),
  threshold = 0.5,
  storage_type = c("auto", "dense", "sparse"),
  threads_per_tree = 1L,
  n_items = 0L,
  blocks_per_sm = 0L
)



Path to the saved model file.


Type of task to be performed by the model. Must be one of "classification", "regression".


Format of the saved model file. Note that if filename ends with ".json" and model_type is "xgboost", cuda.ml assumes the model file is in XGBoost JSON (rather than binary) format. Default: "xgboost".
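The file-name rule described above can be sketched in plain R. The helper `guess_model_format()` and the labels it returns are hypothetical and only illustrate the documented behaviour; they are not part of cuda.ml, and whether the real check is case-sensitive is an assumption.

```r
# Hypothetical helper (not part of cuda.ml): mirrors the documented rule that
# an XGBoost model whose file name ends in ".json" is treated as XGBoost JSON.
guess_model_format <- function(filename,
                               model_type = c("xgboost", "lightgbm")) {
  model_type <- match.arg(model_type)
  # Assumption: the suffix check is case-sensitive.
  if (model_type == "xgboost" && endsWith(filename, ".json")) {
    "xgboost_json"
  } else {
    model_type
  }
}

guess_model_format("model.json")               # "xgboost_json"
guess_model_format("xgboost.model")            # "xgboost"
guess_model_format("model.json", "lightgbm")   # "lightgbm"
```

In practice this means a model saved by XGBoost under a ".json" file name should be loaded with model_type = "xgboost"; no separate JSON model type needs to be passed.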


Type of the algorithm for inference. Must be one of the following:
- "auto": Choose the algorithm automatically. Currently 'batch_tree_reorg' is used for dense storage, and 'naive' for sparse storage.
- "naive": Simple inference using shared memory.
- "tree_reorg": Similar to 'naive' but with trees rearranged to be more coalescing-friendly.
- "batch_tree_reorg": Similar to 'tree_reorg' but predicting multiple rows per thread block.
Default: "auto".


Class probability threshold for classification. Ignored for regression tasks. Default: 0.5.
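As a rough illustration of what a class probability threshold does, the sketch below turns predicted probabilities into 0/1 class labels. The helper `apply_threshold()` is hypothetical and not part of cuda.ml; in particular, treating the comparison as strict (`>`) is an assumption, so the sample values deliberately avoid the boundary case.

```r
# Hypothetical helper (not part of cuda.ml): converts positive-class
# probabilities into 0/1 labels using a probability threshold.
apply_threshold <- function(prob, threshold = 0.5) {
  stopifnot(is.numeric(prob), threshold >= 0, threshold <= 1)
  # Assumption: a probability strictly above the threshold maps to class 1.
  as.integer(prob > threshold)
}

probs <- c(0.10, 0.49, 0.51, 0.90)
apply_threshold(probs)        # 0 0 1 1
apply_threshold(probs, 0.8)   # 0 0 0 1
```

Raising the threshold makes the positive class harder to predict, which trades recall for precision.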


In-memory storage format of the FIL model. Must be one of the following:
- "auto": Choose the storage type automatically.
- "dense": Create a dense forest.
- "sparse": Create a sparse forest. Requires algo to be 'naive' or 'auto'.
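The constraint between storage_type and algo can be checked before loading a model. The validator below is a minimal sketch; `check_fil_storage()` is a hypothetical name and not a cuda.ml function, and the error message is illustrative only.

```r
# Hypothetical validator (not part of cuda.ml): enforces the documented rule
# that sparse storage requires algo to be "naive" or "auto".
check_fil_storage <- function(
    algo = c("auto", "naive", "tree_reorg", "batch_tree_reorg"),
    storage_type = c("auto", "dense", "sparse")) {
  algo <- match.arg(algo)
  storage_type <- match.arg(storage_type)
  if (storage_type == "sparse" && !algo %in% c("naive", "auto")) {
    stop("storage_type = \"sparse\" requires algo to be \"naive\" or \"auto\"")
  }
  invisible(TRUE)
}

check_fil_storage("naive", "sparse")             # passes silently
check_fil_storage("batch_tree_reorg", "dense")   # passes silently
# check_fil_storage("tree_reorg", "sparse")      # would signal an error
```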


If > 1, multiple (neighboring) threads infer on the same tree within a block, which improves memory bandwidth near the tree root at the cost of more shared memory. Default: 1L.


Number of input samples each thread processes. If 0, a value (up to 4) that fits into shared memory is chosen automatically. Default: 0L.


Indicates how CuML should determine the number of thread blocks to launch for the inference kernel.
- 0: Launches a number of blocks proportional to the number of data points.
- >= 1: Attempts to launch blocks_per_sm blocks for each streaming multiprocessor (SM). This will fail if blocks_per_sm blocks result in more threads than the maximum supported number of threads per GPU. Even if successful, it is not guaranteed that blocks_per_sm blocks will run on an SM concurrently.
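A call that sets the tuning arguments explicitly might look like the sketch below. The specific values (threads_per_tree = 2L, blocks_per_sm = 4L) are illustrative assumptions rather than recommendations, and the load is skipped unless cuda.ml is installed, reports a usable cuML library through `has_cuML()`, and a saved model exists at the assumed path.

```r
# Collect the arguments in a list so they can be inspected before loading.
# All values below are illustrative assumptions, not tuned recommendations.
fil_args <- list(
  filename = file.path(tempdir(), "xgboost.model"),  # assumed model location
  mode = "regression",
  model_type = "xgboost",
  algo = "batch_tree_reorg",
  storage_type = "dense",
  threads_per_tree = 2L,   # two neighboring threads cooperate on each tree
  n_items = 0L,            # let FIL pick the number of samples per thread
  blocks_per_sm = 4L       # attempt 4 blocks per streaming multiprocessor
)

model <- NULL
# Only attempt the load when the package, cuML, and the model file are present.
if (requireNamespace("cuda.ml", quietly = TRUE) &&
    cuda.ml::has_cuML() &&
    file.exists(fil_args$filename)) {
  model <- do.call(cuda.ml::cuda_ml_fil_load_model, fil_args)
}
```

Because a non-zero blocks_per_sm can fail on GPUs with a lower thread limit, it is reasonable to fall back to blocks_per_sm = 0L if the load signals an error.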


A GPU-accelerated FIL model that can be used with the 'predict' S3 generic to make predictions on new data points.


library(cuda.ml)
library(xgboost)

model_path <- file.path(tempdir(), "xgboost.model")

# Train a small XGBoost regression model that predicts mpg on mtcars.
model <- xgboost(
  data = as.matrix(mtcars[names(mtcars) != "mpg"]),
  label = as.matrix(mtcars["mpg"]),
  max.depth = 6,
  eta = 1,
  nthread = 2,
  nrounds = 20,
  objective = "reg:squarederror"
)
#> [1] train-rmse:4.134045
#> [2] train-rmse:1.328784
#> [3] train-rmse:0.500717
#> [4] train-rmse:0.248922
#> [5] train-rmse:0.129600
#> [6] train-rmse:0.063306
#> [7] train-rmse:0.030865
#> [8] train-rmse:0.015847
#> [9] train-rmse:0.008145
#> [10] train-rmse:0.004665
#> [11] train-rmse:0.002660
#> [12] train-rmse:0.001517
#> [13] train-rmse:0.000806
#> [14] train-rmse:0.000455
#> [15] train-rmse:0.000454
#> [16] train-rmse:0.000454
#> [17] train-rmse:0.000454
#> [18] train-rmse:0.000454
#> [19] train-rmse:0.000454
#> [20] train-rmse:0.000454

# Save the trained model to disk in XGBoost's format.
xgb.save(model, model_path)
#> [1] TRUE

# Load the saved model into FIL for GPU-accelerated inference.
model <- cuda_ml_fil_load_model(
  model_path,
  mode = "regression",
  model_type = "xgboost"
)

# Predict on the same predictors used for training.
preds <- predict(model, mtcars[names(mtcars) != "mpg"])
print(preds)
#> # A tibble: 32 × 1
#>    .pred
#>    <dbl>
#>  1     1
#>  2     1
#>  3     1
#>  4     1
#>  5     1
#>  6     1
#>  7     1
#>  8     1
#>  9     1
#> 10     1
#> # … with 22 more rows