
Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

other arguments that can be passed to the super class.
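
A minimal construction sketch combining several of the arguments above (the specific values are illustrative, not recommendations):

if (torch_is_installed()) {
rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", dropout = 0.2,
  bidirectional = TRUE, batch_first = TRUE
)
rnn
}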

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
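
A minimal sketch of this recurrence for a single time step, using ad-hoc weights drawn at random purely for illustration (they are not taken from an nn_rnn module):

if (torch_is_installed()) {
input_size <- 10
hidden_size <- 20
W_ih <- torch_randn(hidden_size, input_size)
W_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_randn(hidden_size)
b_hh <- torch_randn(hidden_size)
x_t <- torch_randn(input_size)      # input at time t
h_prev <- torch_zeros(hidden_size)  # hidden state at time t - 1 (zero at time 0)
# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t <- torch_tanh(torch_matmul(W_ih, x_t) + b_ih + torch_matmul(W_hh, h_prev) + b_hh)
h_t
}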

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with the forward and backward directions being slices 1 and 2 of the direction dimension, respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
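
A minimal sketch, assuming the shapes documented above, of separating the two directions of a bidirectional module (R indexing is 1-based, so forward is slice 1 and backward is slice 2):

if (torch_is_installed()) {
brnn <- nn_rnn(10, 20, bidirectional = TRUE)
x <- torch_randn(5, 3, 10)                 # (seq_len = 5, batch = 3, input_size = 10)
res <- brnn(x)                             # h_0 defaults to zero
output <- res[[1]]                         # (5, 3, 2 * 20)
directions <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
forward <- directions[, , 1, ]             # forward direction
backward <- directions[, , 2, ]            # backward direction
dim(forward)
}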

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
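
A minimal sketch checking these shapes for a unidirectional, single-layer module:

if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 10, hidden_size = 20)
x <- torch_randn(5, 3, 10)    # (L = 5, N = 3, H_in = 10)
h0 <- torch_zeros(1, 3, 20)   # (S = 1, N = 3, H_out = 20)
res <- rnn(x, h0)
dim(res[[1]])                 # (5, 3, 20): (L, N, H_all) with H_all = 1 * 20
dim(res[[2]])                 # (1, 3, 20): (S, N, H_out)
}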

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
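
A minimal sketch for inspecting these parameters; an nn_module exposes them as a named list, with names following the patterns above:

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
names(rnn$parameters)     # weight/bias names for each layer
dim(rnn$parameters[[1]])  # compare with the shapes listed above
}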

Note

All weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\mbox{hidden\_size}}\).
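
A minimal sketch checking this initialization range on whichever parameter is listed first:

if (torch_is_installed()) {
hidden_size <- 20
k <- 1 / hidden_size
rnn <- nn_rnn(10, hidden_size)
w <- rnn$parameters[[1]]
torch_all(w$abs() <= sqrt(k))  # all values lie within [-sqrt(k), sqrt(k)]
}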

Examples

if (torch_is_installed()) {
# a two-layer Elman RNN with 10 input features and a 20-dimensional hidden state
rnn <- nn_rnn(10, 20, 2)
# input of shape (seq_len = 5, batch = 3, input_size = 10)
input <- torch_randn(5, 3, 10)
# initial hidden state of shape (num_layers * num_directions = 2, batch = 3, hidden_size = 20)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.4927  0.4901 -0.4282 -0.0831 -0.2977  0.1164  0.8763 -0.9533 -0.4647
#>  -0.6632 -0.7458  0.7455  0.3813  0.3598  0.2851 -0.6641  0.4961  0.8769
#>   0.4221  0.7821  0.0039 -0.7736 -0.8827 -0.0752  0.1781  0.8693 -0.1321
#> 
#> Columns 10 to 18  0.9011 -0.5297 -0.7668 -0.0333  0.6147 -0.0687 -0.6040  0.5936  0.8512
#>   0.9241  0.0130  0.0572 -0.0071 -0.4765  0.8244 -0.0804 -0.8762  0.0557
#>   0.6619  0.3819 -0.5202  0.8035  0.9756 -0.2049  0.7426 -0.8909 -0.2870
#> 
#> Columns 19 to 20 -0.0176 -0.1480
#>   0.1718  0.2521
#>   0.4839  0.3082
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3949 -0.0434  0.4409  0.0937 -0.1094  0.1320  0.1151 -0.3794 -0.4731
#>   0.3683  0.3524  0.2392 -0.2336 -0.2641 -0.0583  0.2366 -0.4079  0.1609
#>  -0.5180  0.8713  0.5828  0.4722 -0.4412  0.0112  0.0054  0.7250  0.5119
#> 
#> Columns 10 to 18 -0.3147 -0.7491  0.4564 -0.1294 -0.2644 -0.3022  0.0187 -0.1442 -0.3023
#>   0.4521 -0.4673 -0.0466  0.5337 -0.1050  0.2740  0.3496  0.0997  0.4321
#>   0.0863  0.0953  0.5917 -0.3298  0.2592  0.0548 -0.0745 -0.5867 -0.3719
#> 
#> Columns 19 to 20 -0.5365 -0.2288
#>   0.1866  0.1656
#>  -0.3659 -0.4206
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.2300  0.4053  0.0905 -0.0741 -0.5414  0.4193  0.3445 -0.2394 -0.1182
#>   0.2911  0.1463  0.2752 -0.0373 -0.4252  0.2677 -0.1027 -0.1861 -0.4730
#>   0.0394  0.7819 -0.1252 -0.3251 -0.2505  0.1518  0.4218  0.4204  0.0591
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.0431  0.6544 -0.2914 -0.0505 -0.6008 -0.0628  0.7554  0.6150 -0.1684
#>  -0.3466  0.4796 -0.3319 -0.3516 -0.2889  0.0550  0.6131  0.4393 -0.0448
#>  -0.2081  0.3180 -0.6217  0.1575  0.6654 -0.1015  0.4661  0.6445  0.1089
#> 
#> Columns 10 to 18 -0.7627 -0.1551 -0.4251 -0.2694  0.1820 -0.1036 -0.2315 -0.0280 -0.0879
#>  -0.5182  0.1173 -0.5221 -0.0291  0.1116 -0.5802  0.2164 -0.5016  0.2070
#>  -0.7195 -0.1991 -0.4307  0.5682 -0.6656 -0.0712  0.4629  0.3143  0.1695
#> 
#> Columns 19 to 20 -0.6286 -0.5400
#>  -0.4962 -0.1000
#>   0.4738  0.2069
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3072  0.0778  0.0435 -0.4139 -0.2365  0.2873  0.0941 -0.1795 -0.3111
#>   0.1994  0.3400  0.1679 -0.1093 -0.3871  0.1592  0.4745 -0.1481 -0.1743
#>   0.1889  0.3443  0.0876 -0.1907 -0.4394  0.1575  0.0250 -0.1228 -0.2674
#> 
#> Columns 10 to 18  0.1840 -0.5982 -0.0230  0.1299 -0.2965 -0.1785 -0.3880  0.4689  0.1553
#>   0.3883 -0.4451  0.3667  0.1422  0.0032 -0.1569 -0.3590  0.2696  0.0381
#>   0.4934 -0.0373  0.1592  0.6636  0.2052 -0.1687  0.0998 -0.4514  0.5805
#> 
#> Columns 19 to 20 -0.1476  0.2681
#>  -0.5902  0.0265
#>  -0.4543  0.0861
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>