NEWS.md
Fixed issue with CUDA architecture string being empty when building {cuda.ml}
{cuda.ml} source code was revised to be compatible with libcuml++ versions 21.06, 21.08, and 21.10.
Added support for automatically downloading a pre-built version of libcuml++, then bundling and linking it with the rest of the {cuda.ml} installation when no pre-existing copy of libcuml++ is found. This lets new users try out {cuda.ml} quickly without having to install Conda or build libcuml++ from source manually.
Re-wrote R interfaces of all supervised ML algorithms using {hardhat} to support data-frame, matrix, formula, and recipe inputs, per suggestion from @topepo in https://github.com/mlverse/cuml/issues/78 and https://github.com/mlverse/cuml/issues/77.
Added {parsnip} bindings for random forest, SVM, and KNN models.
Improved warning message for missing linkage to the RAPIDS CuML shared library. If the C++/CUDA source code of this package was not linked with a valid version of the RAPIDS CuML shared library when the package was installed, then a warning will be emitted whenever the package is loaded.
Added support for K-Means initialization options (namely, “kmeans++”, “random”, and “array”) and other configuration parameters for K-Means clustering in cuML.
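As a rough illustration of what these three initialization names conventionally mean, here is a CPU-only base R sketch. It is not the cuML implementation and does not use this package's API: `kmeanspp_init()` is a hypothetical helper written for this note, and `stats::kmeans()` stands in for the GPU routine. The sketch assumes the data contain at least `k` distinct points.

```r
# Hypothetical helper (not part of this package): k-means++ style seeding.
# Each new center is a data point sampled with probability proportional to
# its squared distance from the nearest center chosen so far.
kmeanspp_init <- function(x, k) {
  n <- nrow(x)
  centers <- matrix(NA_real_, nrow = k, ncol = ncol(x))
  centers[1, ] <- x[sample.int(n, 1), ]
  d2 <- rowSums(sweep(x, 2, centers[1, ])^2)
  for (j in seq_len(k)[-1]) {
    idx <- sample.int(n, 1, prob = d2)
    centers[j, ] <- x[idx, ]
    d2 <- pmin(d2, rowSums(sweep(x, 2, centers[j, ])^2))
  }
  centers
}

set.seed(1)
# Three synthetic 2-D blobs, 50 points each.
blob_centers <- rbind(c(0, 0), c(10, 0), c(5, 10))
pts <- blob_centers[rep(1:3, each = 50), ] + matrix(rnorm(300), ncol = 2)

# "random": pick k data points uniformly at random as starting centers.
random_init <- pts[sample.int(nrow(pts), 3), ]

# "kmeans++": distance-weighted seeding.
init <- kmeanspp_init(pts, 3)

# "array": the caller supplies the starting centers explicitly, here the
# k-means++ seeds computed above.
km <- stats::kmeans(pts, centers = init)
```

Passing a matrix as `centers` is how `stats::kmeans()` accepts caller-supplied starting points, which is the closest base R analogue to an "array" style initialization.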
Added ‘cuml_log_level’ option to cuml_dbscan().
Implemented R interface for single-linkage agglomerative clustering.
cuML (including inverse transformations from lower-dimensional representation to the original feature space when applicable).
Added R interface for the cuML Forest Inference Library (FIL). Users can load any existing XGBoost or LightGBM model using Treelite and use the model to perform high-throughput batch inference using GPU acceleration provided by FIL.
Implemented R interface for K-Nearest Neighbor (KNN) classification and regression.
Added ellipsis::check_dots_used() checks for all ... parameters in R.
Renamed this package from {cuml4r} to {cuml} per suggestion from @lorenzwalthert (context: https://github.com/mlverse/cuml/issues/75). The new name is shorter and, more importantly, consistent with the mlverse naming convention for R packages (e.g., {keras}, {tensorflow}, {torch}, {tabnet}, etc.).
cuML.
Implemented R interfaces for cuML Random Forest classification and regression routines.
Implemented R interfaces for cuML Support Vector Machine classifier and regressor.
Support for SVM multi-class classification was implemented using the one-vs-rest strategy (as the SVM classifier from cuML currently only supports binary classification).
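The one-vs-rest idea can be sketched in base R as follows. This is a minimal illustration of the general strategy, not this package's code: the function names are hypothetical, and a least-squares linear scorer stands in for the binary SVM purely to keep the example dependency-free.

```r
# Binary scorer (stand-in for a binary SVM): least-squares fit to +1 / -1.
fit_binary <- function(x, y_pm1) {
  qr.solve(cbind(1, x), y_pm1)  # intercept plus one coefficient per feature
}

# One-vs-rest training: one binary scorer per class, where the target class
# is labeled +1 and every other class is labeled -1.
fit_one_vs_rest <- function(x, y) {
  y <- as.factor(y)
  coefs <- sapply(levels(y), function(cls) {
    fit_binary(x, ifelse(y == cls, 1, -1))
  })
  list(coefs = coefs, classes = levels(y))  # coefs: one column per class
}

# One-vs-rest prediction: score each row with every binary scorer and
# return the class whose scorer gives the highest score.
predict_one_vs_rest <- function(fit, x) {
  scores <- cbind(1, x) %*% fit$coefs
  winner <- max.col(scores, ties.method = "first")
  factor(fit$classes[winner], levels = fit$classes)
}

set.seed(42)
# Three well-separated synthetic classes in 2-D, 50 points each.
class_centers <- rbind(c(0, 0), c(10, 0), c(5, 10))
labels <- rep(c("a", "b", "c"), each = 50)
features <- class_centers[rep(1:3, each = 50), ] + matrix(rnorm(300), ncol = 2)

ovr_fit <- fit_one_vs_rest(features, labels)
ovr_pred <- predict_one_vs_rest(ovr_fit, features)
ovr_accuracy <- mean(ovr_pred == labels)
```

With k classes this trains k binary models, so the training cost grows linearly with the number of classes.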
Included suggestions on how to build and install cuML libraries from source with or without multi-GPU support in https://github.com/yitao-li/cuml-installation-notes. All suggestions are known to work with RAPIDS cuML version 21.08. Please note that building from source is intended for advanced use cases that require customizing the build parameters, compilers, etc. of the RAPIDS cuML libraries. It is somewhat time-consuming and not as beginner-friendly as installing cuML directly from Conda.
Found and fixed a few typos and inconsistencies.
Some examples were simplified.
Added documentation for predict() functions per suggestion from @topepo in https://github.com/mlverse/cuml/issues/80.
Configuration script was revised to work with RAPIDS cuML libraries installed via Conda or built from source. If the RAPIDS cuML libraries cannot be located during the configuration process, a warning message is emitted.
Improved on the initial prototype of {cuml} by utilizing modern C++ constructs from thrust (https://github.com/NVIDIA/thrust), making the C++ source code of this project more readable and maintainable.
Formatted all human-written C++ source code with clang-format and all human-written R source code with styler. Rcpp-generated C++ and R source files will not be formatted.
Caching of build artifacts using ccache can be enabled by setting the env variable CUML4R_ENABLE_CCACHE (e.g., one can run R CMD build cuml followed by CUML4R_ENABLE_CCACHE=1 R CMD INSTALL cuml_0.1.0.tar.gz to avoid re-compiling the same artifacts across builds). Note that this feature is intended for {cuml} contributors or advanced users who need to build {cuml} frequently, and is not enabled by default for other users.
Some larger cpp files were split into more granular ones for faster build speed (if parallel build is enabled) and also greater maintainability.