Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
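
The batch_first argument only changes the expected tensor layout, not the module itself. A minimal sketch (the dimension values below are arbitrary illustrations, not defaults):

if (torch_is_installed()) {
  # batch-first layout: input is (batch, seq, feature) instead of (seq, batch, feature)
  rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 1, batch_first = TRUE)
  x <- torch_randn(3, 5, 10)   # (batch = 3, seq_len = 5, input_size = 10)
  out <- rnn(x)                # initial hidden state defaults to zero
  out[[1]]$shape               # (3, 5, 20): batch, seq_len, hidden_size
  out[[2]]$shape               # (1, 3, 20): num_layers * num_directions, batch, hidden_size
}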

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
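
A rough sketch of this recurrence for a single layer with tanh and a zero initial hidden state; the weights here are stand-ins chosen for illustration, not the module's own parameters:

if (torch_is_installed()) {
  input_size <- 4; hidden_size <- 3; n_steps <- 5; batch <- 2
  x <- torch_randn(n_steps, batch, input_size)
  w_ih <- torch_randn(hidden_size, input_size)   # stand-in for W_ih
  w_hh <- torch_randn(hidden_size, hidden_size)  # stand-in for W_hh
  b_ih <- torch_zeros(hidden_size)
  b_hh <- torch_zeros(hidden_size)
  h <- torch_zeros(batch, hidden_size)           # initial hidden state h_0
  for (t in 1:n_steps) {
    h <- torch_tanh(x[t, , ]$matmul(w_ih$t()) + b_ih + h$matmul(w_hh$t()) + b_hh)
  }
  h  # hidden state after the last time step
}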

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively; see the sketch after this list. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
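
A hedged sketch of these shapes for a bidirectional module, including the direction separation mentioned above (the sizes are arbitrary):

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2, bidirectional = TRUE)
  input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
  h0 <- torch_zeros(2 * 2, 3, 20)  # (num_layers * num_directions, batch, hidden_size)
  out <- rnn(input, h0)
  out[[1]]$shape                   # (5, 3, 40): seq_len, batch, num_directions * hidden_size
  out[[2]]$shape                   # (4, 3, 20): num_layers * num_directions, batch, hidden_size
  # the third dimension of dirs now indexes the two directions
  dirs <- out[[1]]$view(c(5, 3, 2, 20))
  dirs$shape
}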

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features where \(H_{in}=\mbox{input\_size}\) and L represents a sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
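
These attributes and their shapes can be inspected through the module's parameter list; a small sketch (the names are assumed to follow the weight_ih_l[k] convention listed above):

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2)
  # named list of learnable parameters and their shapes
  str(lapply(rnn$parameters, function(p) p$shape))
}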

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
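
A quick sketch checking this range: with \(k = 1/\mbox{hidden\_size}\), every weight and bias should lie within \(\pm\sqrt{k}\).

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20)
  k <- 1 / 20
  # TRUE if every parameter lies within (-sqrt(k), sqrt(k))
  all(sapply(rnn$parameters, function(p) (p$abs()$max() <= sqrt(k))$item()))
}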

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.1462  0.2567  0.8638 -0.4793 -0.1115 -0.2104 -0.7485  0.4740 -0.1531
#>   0.3957  0.5775  0.0635  0.3065 -0.7018  0.6755 -0.5393 -0.1958 -0.4268
#>   0.6308  0.8125 -0.1272  0.3501  0.4815  0.0798  0.0806  0.8838 -0.3797
#> 
#> Columns 10 to 18 -0.2030 -0.1653  0.2036 -0.7162  0.2247  0.5438 -0.0156 -0.7146  0.4829
#>  -0.4866 -0.4701 -0.6116  0.4002 -0.1761 -0.4387  0.2493 -0.2555  0.2485
#>   0.7117  0.6465  0.4741 -0.2180  0.4336 -0.1333 -0.6637 -0.5552 -0.2279
#> 
#> Columns 19 to 20 -0.8386 -0.5859
#>  -0.8517  0.5394
#>  -0.5549  0.0157
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3733  0.2442 -0.0802  0.5218  0.2237  0.6097  0.0147  0.1574 -0.1451
#>   0.1650  0.3261 -0.3883  0.2830  0.2901 -0.0222 -0.3750  0.4487  0.3132
#>   0.2860  0.3143  0.2914  0.0066  0.4841  0.6879 -0.3368  0.0161  0.1735
#> 
#> Columns 10 to 18 -0.5410 -0.0555 -0.0629  0.6935 -0.0771  0.2098 -0.4356  0.1178  0.6063
#>   0.5324  0.1588  0.4329  0.2361  0.6784  0.2739  0.2718 -0.1892  0.0013
#>  -0.5777 -0.2206 -0.2697  0.1029  0.1815  0.1486  0.1102  0.1201  0.2854
#> 
#> Columns 19 to 20 -0.1709  0.4600
#>  -0.2949 -0.0989
#>  -0.4996  0.2651
#> 
#> (3,.,.) = 
#>  Columns 1 to 9  0.0559 -0.0247 -0.3546 -0.1171  0.1397  0.2934 -0.6387  0.5587  0.0797
#>   0.4747  0.3670  0.0611 -0.1523  0.1568  0.7565 -0.1505 -0.0636 -0.4648
#>   0.3566  0.5020  0.0323 -0.0766  0.0874 -0.0811 -0.5456  0.2729  0.0150
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.1006 -0.0912 -0.0105 -0.0207 -0.0004 -0.4845 -0.1874  0.4146  0.5107
#>  -0.5146  0.0357 -0.0140 -0.3100  0.2991  0.1250  0.5280  0.1639 -0.2374
#>  -0.3175 -0.1751 -0.3540 -0.0616 -0.3054  0.1587  0.3484 -0.1559  0.1439
#> 
#> Columns 10 to 18  0.1129  0.0397  0.2820 -0.1373  0.7131 -0.0931  0.1651  0.3733 -0.7387
#>   0.6729 -0.7223 -0.1651  0.0861  0.2632  0.8154 -0.6576 -0.6319  0.0571
#>   0.2528 -0.1375  0.0198 -0.0576  0.4953  0.5505 -0.3547  0.1049 -0.3637
#> 
#> Columns 19 to 20 -0.3648  0.6372
#>   0.5050 -0.2551
#>  -0.1905  0.3245
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.1943 -0.3759 -0.1204  0.0610  0.1332  0.4891  0.0670  0.3168 -0.0102
#>   0.5831  0.2041 -0.3626 -0.0618  0.3794  0.2439 -0.1293 -0.3310 -0.0012
#>   0.3991 -0.0834  0.1830 -0.1325  0.2608  0.1700 -0.5141  0.2352  0.1447
#> 
#> Columns 10 to 18  0.4251 -0.4894  0.3166 -0.3336  0.0494 -0.1528 -0.1164 -0.4695  0.1988
#>  -0.2418 -0.3025 -0.4445  0.4926  0.1747 -0.1367  0.0196  0.3205 -0.1316
#>   0.1748 -0.4457  0.1024 -0.0685  0.1443 -0.0405  0.1734 -0.3839 -0.0344
#> 
#> Columns 19 to 20 -0.0321 -0.1163
#>  -0.1435  0.3021
#>  -0.1448  0.0648
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>