
Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature) (see the sketch after this argument list). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
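
As a quick illustration of the batch_first and bidirectional arguments above, the sketch below builds two variants of the module; the sizes are arbitrary and chosen only for this example.

library(torch)

# With batch_first = TRUE the input is laid out as (batch, seq, feature).
rnn_bf <- nn_rnn(input_size = 10, hidden_size = 20, batch_first = TRUE)
x_bf   <- torch_randn(3, 5, 10)    # (batch = 3, seq_len = 5, input_size = 10)
out_bf <- rnn_bf(x_bf)[[1]]        # shape (3, 5, 20): (batch, seq_len, hidden_size)

# With bidirectional = TRUE the output feature dimension doubles.
rnn_bi <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
x      <- torch_randn(5, 3, 10)    # (seq_len, batch, input_size)
out_bi <- rnn_bi(x)[[1]]           # shape (5, 3, 40): (seq_len, batch, 2 * hidden_size)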

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
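
As a rough illustration of this recurrence, the sketch below recomputes a single step of a one-layer RNN directly from the layer's parameters. Looking parameters up by name prefix, and the variable names x_t and h_prev, are choices made for this example only.

library(torch)

# A minimal sketch: recompute one Elman step by hand (one layer, one direction).
rnn    <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
x_t    <- torch_randn(1, 1, 4)   # (seq_len = 1, batch = 1, input_size)
h_prev <- torch_zeros(1, 1, 3)   # (num_layers, batch, hidden_size)
out    <- rnn(x_t, h_prev)

p    <- rnn$parameters
W_ih <- p[[grep("^weight_ih", names(p))[1]]]   # (hidden_size, input_size)
W_hh <- p[[grep("^weight_hh", names(p))[1]]]   # (hidden_size, hidden_size)
b_ih <- p[[grep("^bias_ih",  names(p))[1]]]
b_hh <- p[[grep("^bias_hh",  names(p))[1]]]

h_manual <- torch_tanh(
  torch_matmul(x_t[1, , ], W_ih$t()) + b_ih +
  torch_matmul(h_prev[1, , ], W_hh$t()) + b_hh
)
torch_allclose(h_manual, out[[2]][1, , ])   # expected to be TRUE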

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively (1 and 2 when indexing from R); see the sketch after this list. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
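
To make the shape bookkeeping concrete, here is a short sketch with a single-layer bidirectional RNN; the sizes are arbitrary, and the direction indices are 1 (forward) and 2 (backward) because R indexing is 1-based.

library(torch)

# A minimal sketch of separating the directions in a bidirectional RNN's output.
birnn  <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 1,
                 bidirectional = TRUE)
x      <- torch_randn(5, 3, 10)      # (seq_len, batch, input_size)
res    <- birnn(x)                   # h_0 defaults to zeros
output <- res[[1]]                   # shape (5, 3, 2 * 20)
h_n    <- res[[2]]                   # shape (1 * 2, 3, 20)

dirs <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
forward_out  <- dirs[ , , 1, ]       # direction 1: forward
backward_out <- dirs[ , , 2, ]       # direction 2: backward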

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size). Concrete shapes for all parameters are shown in the sketch after this list.

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
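
The snippet below is a quick way to list these attributes and their shapes for a concrete module; it assumes only that $parameters returns a named list of tensors, with the layer and direction suffixes following the naming above.

library(torch)

# Print each learnable parameter of a 2-layer RNN together with its shape.
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
for (nm in names(rnn$parameters)) {
  cat(nm, ":", paste(dim(rnn$parameters[[nm]]), collapse = " x "), "\n")
}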

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
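
A quick sanity check of this range (a sketch; with hidden_size = 20 the bound is \(1/\sqrt{20}\)):

library(torch)

# Every parameter should lie within [-1/sqrt(hidden_size), 1/sqrt(hidden_size)].
rnn   <- nn_rnn(input_size = 10, hidden_size = 20)
bound <- 1 / sqrt(20)
all(vapply(rnn$parameters,
           function(p) as.numeric(p$abs()$max()) <= bound,
           logical(1)))   # expected to return TRUE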

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.9622  0.8418 -0.0502  0.1875 -0.2933 -0.9444 -0.0126 -0.4873  0.7122
#>   0.8677 -0.3529 -0.2450  0.6374  0.0679 -0.7167 -0.6897 -0.6467  0.3032
#>   0.1811  0.3920 -0.5167 -0.4076 -0.3437  0.2018 -0.2177 -0.6838 -0.3162
#> 
#> Columns 10 to 18  0.7549  0.5399 -0.1854 -0.4826  0.4033 -0.8416 -0.7491 -0.2169  0.7528
#>  -0.1120  0.7855 -0.4133 -0.1124  0.0044 -0.2194 -0.8202  0.3156  0.4540
#>  -0.6610 -0.1306 -0.4471  0.1243  0.7613 -0.4575 -0.5173 -0.7225  0.1277
#> 
#> Columns 19 to 20  0.8361  0.4677
#>  -0.4312  0.5808
#>   0.6034 -0.1632
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.3081 -0.0639  0.2172 -0.0604  0.1254  0.4370 -0.3234  0.2159 -0.1419
#>   0.3041 -0.0074 -0.2877 -0.1937  0.1851 -0.0323 -0.1800  0.0110  0.1459
#>  -0.3075  0.8545 -0.0911 -0.4127 -0.4484 -0.0903  0.1828 -0.0184 -0.1684
#> 
#> Columns 10 to 18 -0.1438 -0.5281  0.0915  0.4408 -0.2445  0.3916 -0.4612 -0.5501 -0.0287
#>  -0.2157 -0.2639 -0.4763  0.6271  0.2091 -0.3575 -0.4149 -0.0133  0.6744
#>  -0.4989 -0.0275 -0.2810 -0.4277  0.5102 -0.1220 -0.1063 -0.6250 -0.0097
#> 
#> Columns 19 to 20 -0.5657  0.2921
#>  -0.1760  0.3062
#>   0.7336  0.4270
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.3951  0.2865 -0.4982 -0.5852  0.4242  0.2952  0.0952  0.1432 -0.0250
#>  -0.2299  0.5802 -0.1146 -0.0046  0.1156  0.0213  0.4462 -0.0442  0.1141
#>  -0.1593  0.5002  0.0338 -0.2100 -0.2869 -0.0104  0.0684 -0.1977 -0.1143
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.0630 -0.8164  0.2896  0.2990  0.6140  0.4442 -0.2577  0.2354 -0.6512
#>   0.3258  0.7673  0.2476 -0.0734 -0.3150 -0.7168 -0.3780 -0.0354  0.4539
#>   0.6156 -0.2728 -0.2687 -0.2571  0.0563  0.3960  0.1600 -0.3323 -0.2503
#> 
#> Columns 10 to 18  0.3313  0.0636 -0.6123  0.4605  0.5237 -0.5697 -0.3941  0.2849  0.5401
#>  -0.0454 -0.2743 -0.2354 -0.2107 -0.4200  0.4798  0.3181  0.0149 -0.3661
#>   0.0462  0.2313 -0.6927  0.2438  0.3224 -0.6745 -0.2389 -0.3397  0.3211
#> 
#> Columns 19 to 20 -0.2524 -0.4662
#>  -0.1121  0.5733
#>   0.1792 -0.4667
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.4231 -0.0168 -0.2824 -0.2403 -0.0204  0.1226 -0.6130 -0.0832 -0.1784
#>  -0.1863  0.2507 -0.1469 -0.0242 -0.3112 -0.3717  0.1528  0.0188  0.0560
#>  -0.4433  0.3292 -0.0548 -0.2575 -0.2851  0.3427 -0.1712 -0.0420 -0.1880
#> 
#> Columns 10 to 18  0.5409 -0.3738  0.2143  0.5125 -0.0781 -0.2669 -0.6297 -0.7925  0.2595
#>  -0.4808 -0.0326 -0.0531 -0.2579  0.4998 -0.0980 -0.0159  0.0305  0.1732
#>   0.4075 -0.0128  0.0704  0.0968  0.0571  0.1456 -0.5200 -0.5549  0.1160
#> 
#> Columns 19 to 20 -0.1831 -0.1266
#>   0.5855  0.1915
#>  -0.2988  0.1237
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>