Load an XGBoost or LightGBM model file using Treelite. The resulting model object can be used to perform high-throughput batch inference on new data points using the GPU-acceleration functionality of the cuML Forest Inference Library (FIL).

cuda_ml_fil_load_model(
  filename,
  mode = c("classification", "regression"),
  model_type = c("xgboost", "lightgbm"),
  algo = c("auto", "naive", "tree_reorg", "batch_tree_reorg"),
  threshold = 0.5,
  storage_type = c("auto", "dense", "sparse"),
  threads_per_tree = 1L,
  n_items = 0L,
  blocks_per_sm = 0L
)

Arguments

filename

Path to the saved model file.

mode

Type of task to be performed by the model. Must be one of "classification", "regression".

model_type

Format of the saved model file. Note that if filename ends with ".json" and model_type is "xgboost", then cuda.ml will assume the model file is in XGBoost JSON (instead of binary) format. Default: "xgboost".
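For example, an XGBoost model can be stored in JSON form and then loaded through this code path. The sketch below is illustrative: it assumes a fitted xgboost model object named model (such as the one from the Examples section below) and that xgb.save() writes the JSON format when the file name ends with ".json" (an assumption based on recent xgboost behavior).

json_path <- file.path(tempdir(), "xgboost.json")
xgboost::xgb.save(model, json_path)

# The ".json" suffix causes cuda.ml to parse the file as XGBoost JSON.
fil_model <- cuda_ml_fil_load_model(
  json_path,
  mode = "regression",
  model_type = "xgboost"
)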

algo

Type of algorithm to use for inference. Must be one of the following:

- "auto": Choose the algorithm automatically. Currently "batch_tree_reorg" is used for dense storage, and "naive" for sparse storage.
- "naive": Simple inference using shared memory.
- "tree_reorg": Similar to "naive" but with trees rearranged to be more coalescing-friendly.
- "batch_tree_reorg": Similar to "tree_reorg" but predicting multiple rows per thread block.

Default: "auto".
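For instance, the batched reordered layout can be requested explicitly. A minimal sketch, assuming model_path points to a saved XGBoost regression model (as in the Examples section below):

fil_model <- cuda_ml_fil_load_model(
  model_path,
  mode = "regression",
  model_type = "xgboost",
  algo = "batch_tree_reorg"  # needs dense storage; see storage_type below
)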

threshold

Class probability threshold for classification. Ignored for regression tasks. Default: 0.5.
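As an illustration, a stricter cutoff for the positive class could be set as follows. A hedged sketch: classifier_path and the 0.75 cutoff are purely illustrative, not values from this package's documentation.

fil_model <- cuda_ml_fil_load_model(
  classifier_path,
  mode = "classification",
  model_type = "xgboost",
  threshold = 0.75  # illustrative: demand higher confidence for the positive class
)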

storage_type

In-memory storage format of the FIL model. Must be one of the following:

- "auto": Choose the storage type automatically.
- "dense": Create a dense forest.
- "sparse": Create a sparse forest. Requires algo to be "naive" or "auto".
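A sparse forest therefore has to be paired with a compatible algo value. A minimal sketch, reusing the hypothetical model_path from above:

fil_model <- cuda_ml_fil_load_model(
  model_path,
  mode = "regression",
  storage_type = "sparse",
  algo = "naive"  # "sparse" storage requires algo "naive" or "auto"
)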

threads_per_tree

If >1, then multiple (neighboring) threads within a block infer on the same tree, which improves memory bandwidth near the tree root (but consumes more shared memory). Default: 1L.

n_items

Number of input samples each thread processes. If 0, the number is chosen automatically (up to 4, as many as fit into shared memory). Default: 0L.

blocks_per_sm

Indicates how cuML should determine the number of thread blocks to launch for the inference kernel.

- 0: Launch a number of blocks proportional to the number of data points.
- >= 1: Attempt to launch blocks_per_sm blocks for each streaming multiprocessor. This will fail if blocks_per_sm blocks result in more threads than the maximum supported number of threads per GPU. Even if successful, it is not guaranteed that blocks_per_sm blocks will run on an SM concurrently.

Default: 0L.
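The last three arguments (threads_per_tree, n_items, blocks_per_sm) are performance-tuning knobs whose best values depend on the GPU and on the model. The sketch below is illustrative only; the specific numbers are not recommendations.

fil_model <- cuda_ml_fil_load_model(
  model_path,
  mode = "regression",
  threads_per_tree = 4L,  # 4 neighboring threads cooperate on each tree
  n_items = 0L,           # let FIL choose how many rows each thread handles
  blocks_per_sm = 2L      # attempt 2 thread blocks per streaming multiprocessor
)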

Value

A GPU-accelerated FIL model that can be used with the 'predict' S3 generic to make predictions on new data points.

Examples

library(cuda.ml)
library(xgboost)

model_path <- file.path(tempdir(), "xgboost.model")

model <- xgboost(
  data = as.matrix(mtcars[names(mtcars) != "mpg"]),
  label = as.matrix(mtcars["mpg"]),
  max.depth = 6,
  eta = 1,
  nthread = 2,
  nrounds = 20,
  objective = "reg:squarederror"
)
#> [1] train-rmse:4.134045
#> [2] train-rmse:1.328784
#> [3] train-rmse:0.500717
#> [4] train-rmse:0.248922
#> [5] train-rmse:0.129600
#> [6] train-rmse:0.063306
#> [7] train-rmse:0.030865
#> [8] train-rmse:0.015847
#> [9] train-rmse:0.008145
#> [10] train-rmse:0.004665
#> [11] train-rmse:0.002660
#> [12] train-rmse:0.001517
#> [13] train-rmse:0.000806
#> [14] train-rmse:0.000455
#> [15] train-rmse:0.000454
#> [16] train-rmse:0.000454
#> [17] train-rmse:0.000454
#> [18] train-rmse:0.000454
#> [19] train-rmse:0.000454
#> [20] train-rmse:0.000454
xgb.save(model, model_path)
#> [1] TRUE
model <- cuda_ml_fil_load_model(
  model_path,
  mode = "regression",
  model_type = "xgboost"
)

preds <- predict(model, mtcars[names(mtcars) != "mpg"])
print(preds)
#> # A tibble: 32 × 1
#>    .pred
#>    <dbl>
#>  1     1
#>  2     1
#>  3     1
#>  4     1
#>  5     1
#>  6     1
#>  7     1
#>  8     1
#>  9     1
#> 10     1
#> # … with 22 more rows