R/umap.R
cuda_ml_umap.Rd
Run the Uniform Manifold Approximation and Projection (UMAP) algorithm to find a low-dimensional embedding of the input data that approximates an underlying manifold.
cuda_ml_umap(
  x,
  y = NULL,
  n_components = 2L,
  n_neighbors = 15L,
  n_epochs = 500L,
  learning_rate = 1,
  init = c("spectral", "random"),
  min_dist = 0.1,
  spread = 1,
  set_op_mix_ratio = 1,
  local_connectivity = 1L,
  repulsion_strength = 1,
  negative_sample_rate = 5L,
  transform_queue_size = 4,
  a = NULL,
  b = NULL,
  target_n_neighbors = n_neighbors,
  target_metric = c("categorical", "euclidean"),
  target_weight = 0.5,
  transform_input = TRUE,
  seed = NULL,
  cuML_log_level = c("off", "critical", "error", "warn", "info", "debug", "trace")
)
Argument | Description |
---|---|
x | The input matrix or data frame. Each data point should be a row and should consist of numeric values only. |
y | An optional numeric vector of target values for supervised dimension reduction. Default: NULL. |
n_components | The dimension of the space to embed into. Default: 2. |
n_neighbors | The size of the local neighborhood (in terms of the number of neighboring sample points) used for manifold approximation. Default: 15. |
n_epochs | The number of training epochs to be used in optimizing the low-dimensional embedding. Default: 500. |
learning_rate | The initial learning rate for the embedding optimization. Default: 1.0. |
init | Initialization mode of the low dimensional embedding. Must be one of "spectral", "random". Default: "spectral". |
min_dist | The effective minimum distance between embedded points. Default: 0.1. |
spread | The effective scale of embedded points. In combination with min_dist, this determines how clustered/clumped the embedded points are. Default: 1.0. |
set_op_mix_ratio | Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection. Default: 1.0. |
local_connectivity | The local connectivity required -- i.e. the number of nearest neighbors that should be assumed to be connected at a local level. Default: 1. |
repulsion_strength | Weighting applied to negative samples in low-dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples. Default: 1.0. |
negative_sample_rate | The number of negative samples to select per positive sample in the optimization process. Default: 5. |
transform_queue_size | For transform operations (embedding new points using a trained model), this controls how aggressively to search for nearest neighbors. Default: 4.0. |
a, b | More specific parameters controlling the embedding. If not set, then these values are set automatically as determined by min_dist and spread. Default: NULL. |
target_n_neighbors | The number of nearest neighbors to use to construct the target simplicial set. Default: n_neighbors. |
target_metric | The metric for measuring distance between the actual and the target values (y). Must be one of "categorical", "euclidean". Default: "categorical". |
target_weight | Weighting factor between data topology and target topology. A value of 0.0 weights entirely on data, a value of 1.0 weights entirely on target. The default of 0.5 balances the weighting equally between data and target. |
transform_input | If TRUE, then also compute the (approximate) low-dimensional embedding of the input data and store it in the "transformed_data" attribute of the returned model. Default: TRUE. |
seed | Optional seed for pseudo random number generator. Default: NULL. Setting a PRNG seed will enable consistency of trained embeddings, allowing for reproducible results to 3 digits of precision, but at the expense of potentially slower training and increased memory usage. If the PRNG seed is not set, then the trained embeddings will not be deterministic. |
cuML_log_level | Log level within cuML library functions. Must be one of "off", "critical", "error", "warn", "info", "debug", "trace". Default: "off". |
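To make the set_op_mix_ratio description concrete, here is a minimal base-R sketch of the interpolation as the reference Python implementation (umap-learn) writes it: a directed fuzzy membership matrix A is combined with its transpose using the product t-norm. It assumes cuML follows the same formula, and `combine_fuzzy_sets()` is a hypothetical helper for illustration, not part of cuda.ml.

```r
# Hypothetical helper (not part of cuda.ml), following umap-learn's formula.
# With the product t-norm:
#   intersection = A * t(A)             (elementwise product)
#   union        = A + t(A) - A * t(A)  (probabilistic sum)
combine_fuzzy_sets <- function(A, set_op_mix_ratio = 1) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A),
            set_op_mix_ratio >= 0, set_op_mix_ratio <= 1)
  intersection <- A * t(A)
  union <- A + t(A) - intersection
  # Linear interpolation: 1 -> pure fuzzy union, 0 -> pure fuzzy intersection
  set_op_mix_ratio * union + (1 - set_op_mix_ratio) * intersection
}

# Toy directed membership matrix for 3 points
# (row i holds the membership strengths of point i's neighbors)
A <- matrix(c(0,   1, 0.5,
              0.2, 0, 1,
              1,   0, 0), nrow = 3, byrow = TRUE)

combine_fuzzy_sets(A, set_op_mix_ratio = 1)    # fuzzy union
combine_fuzzy_sets(A, set_op_mix_ratio = 0)    # fuzzy intersection
combine_fuzzy_sets(A, set_op_mix_ratio = 0.5)  # halfway between the two
```

Both terms are symmetric in A and t(A), so every ratio in [0, 1] yields a symmetric graph; the ratio only controls how much weight one-directional neighbor relationships keep.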
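When a and b are left NULL they are derived from min_dist and spread. The sketch below mirrors the approach of the reference Python implementation (umap-learn): fit the curve 1 / (1 + a * d^(2b)) by least squares to a target that equals 1 for d < min_dist and decays as exp(-(d - min_dist) / spread) beyond it. `find_ab_params()` is a hypothetical name borrowed from umap-learn; cuML is assumed to compute comparable values, possibly with a different optimizer, so its exact a and b may differ slightly.

```r
# Hypothetical helper (not part of cuda.ml), modeled on umap-learn's
# find_ab_params(): nonlinear least-squares fit of
#   phi(d) = 1 / (1 + a * d^(2 * b))
# to a target that is 1 for d < min_dist and exp(-(d - min_dist) / spread)
# otherwise, sampled on 300 points in [0, 3 * spread].
find_ab_params <- function(spread = 1, min_dist = 0.1) {
  d <- seq(0, 3 * spread, length.out = 300)
  target <- ifelse(d < min_dist, 1, exp(-(d - min_dist) / spread))
  # stats::nls() performs the Gauss-Newton least-squares fit
  fit <- stats::nls(target ~ 1 / (1 + a * d^(2 * b)),
                    start = list(a = 1, b = 1))
  stats::coef(fit)  # named numeric vector c(a = ..., b = ...)
}

# a and b implied by the cuda_ml_umap() defaults (spread = 1, min_dist = 0.1)
find_ab_params()
```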
A UMAP model object that can be used as input to the cuda_ml_transform() function. If transform_input is set to TRUE, then the model object will contain a "transformed_data" attribute containing the lower-dimensional embedding of the input data.
library(cuda.ml)

# Supervised UMAP: embed the four iris measurements into 2 dimensions,
# using the species labels as targets.
model <- cuda_ml_umap(
  x = iris[1:4],
  y = iris[[5]],
  n_components = 2,
  n_epochs = 200,
  transform_input = TRUE
)

# Cluster the 2-D embedding of the input data into 3 groups.
set.seed(0L)
print(kmeans(model$transformed_data, iter.max = 100, centers = 3))