Normalizing sparse transform (a la softmax).

Usage

sparsemax(dim = -1L)

sparsemax15(dim = -1L, k = NULL)

Arguments

dim

The dimension along which to apply sparsemax.

k

The number of largest elements to partial-sort input over. For optimal performance, k should be slightly bigger than the expected number of non-zeros in the solution. If the solution is more than k-sparse, this function is recursively called with a 2*k schedule. If NULL, full sorting is performed from the beginning.
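For instance (a usage sketch; the tensor shape and the choice of k = 8 are illustrative assumptions, not recommendations from this package), when roughly three non-zeros per row are expected, a small k avoids sorting all of the columns while yielding the same result as a full sort:

library(torch)
input <- torch_randn(128, 1000)

# expect roughly 3 non-zeros per row: partial-sort the top 8 candidates only
fast <- sparsemax15(dim = -1, k = 8)(input)
# k = NULL sorts all 1000 entries up front
exact <- sparsemax15(dim = -1, k = NULL)(input)

torch_allclose(fast, exact)  # k affects speed only; the values are identical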

Value

The projection result \(P\), of the same shape as the input, such that \(\sum_{\text{dim}} P = 1\) elementwise (i.e., every slice along dim sums to 1).

Details

Solves the projection:

\(\min_P \|\text{input} - P\|_2 \text{ s.t. } P \geq 0, \; \sum P = 1\)
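The projection above (the sparsemax of Martins & Astudillo, 2016) has a well-known closed form: sort the entries, find the support size, and subtract a threshold. Below is a minimal sketch along the last dimension, assuming only the torch package; sparsemax_ref is a hypothetical helper, not part of this API, and it does not cover the alpha = 1.5 variant solved by sparsemax15().

library(torch)

sparsemax_ref <- function(input) {
  d <- tail(dim(input), 1)
  # sort entries in decreasing order along the last dimension
  z <- torch_sort(input, dim = -1, descending = TRUE)[[1]]
  cz <- torch_cumsum(z, dim = -1)
  rho <- torch_arange(1, d, dtype = input$dtype, device = input$device)
  # support size: the largest k with 1 + k * z_(k) > sum of the top-k entries
  k <- torch_sum(1 + rho * z > cz, dim = -1, keepdim = TRUE)
  # threshold tau = (sum of the k largest entries - 1) / k
  tau <- (torch_gather(cz, dim = -1, index = k) - 1) / k
  torch_clamp(input - tau, min = 0)
}

p <- sparsemax_ref(torch_randn(4, 6))
p$sum(dim = -1)  # each row sums to 1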

Examples

input <- torch::torch_randn(10, 5, requires_grad = TRUE)
# create a top-3 alpha=1.5 sparsemax along the first input dimension
nn_sparsemax <- sparsemax15(dim = 1, k = 3)
result <- nn_sparsemax(input)
print(result)
#> torch_tensor
#>  0.0000  0.0000  0.0000  0.0000  1.0000
#>  0.0000  0.2258  0.0000  0.0000  0.0000
#>  0.0000  0.0000  0.0000  0.0460  0.0000
#>  0.0000  0.0000  0.0000  0.8466  0.0000
#>  0.0000  0.0000  0.0000  0.0000  0.0000
#>  0.0000  0.0000  0.6394  0.1074  0.0000
#>  1.0000  0.0623  0.0000  0.0000  0.0000
#>  0.0000  0.0432  0.0000  0.0000  0.0000
#>  0.0000  0.0000  0.0000  0.0000  0.0000
#>  0.0000  0.6688  0.3606  0.0000  0.0000
#> [ CPUFloatType{10,5} ][ grad_fn = <torch::autograd::LanternNode> ]
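As a quick sanity check building on the example above, the result sums to 1 along the dimension the module was constructed with, here dim = 1:

# every column sums to 1 because the projection was applied over dim 1
result$sum(dim = 1)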