Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
other arguments that can be passed to the super class.
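As an illustration of how these arguments combine (a usage sketch only, not additional API), a two-layer bidirectional RNN with ReLU non-linearity, batch-first input, and inter-layer dropout could be declared like this:
if (torch_is_installed()) {
  rnn <- nn_rnn(
    input_size    = 10,
    hidden_size   = 20,
    num_layers    = 2,
    nonlinearity  = "relu",
    batch_first   = TRUE,
    dropout       = 0.25,
    bidirectional = TRUE
  )
  x <- torch_randn(8, 5, 10)   # (batch, seq, feature) because batch_first = TRUE
  out <- rnn(x)                # h_0 defaults to zeros when omitted
  out[[1]]$shape               # 8, 5, 40: last dim is num_directions * hidden_size
}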
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time t, \(x_t\) is
the input at time t, and \(h_{(t-1)}\) is the hidden state of the
same layer at time t-1, or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of
\(\tanh\).
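To connect the formula to the module, the sketch below recomputes the first step \(h_1\) by hand for a single-layer RNN and compares it with the module's output. It assumes the zero-based parameter names (weight_ih_l0, bias_ih_l0, ...) listed under Attributes below; treat it as an illustration, not part of the package API.
if (torch_is_installed()) {
  rnn <- nn_rnn(input_size = 4, hidden_size = 6, num_layers = 1)
  x  <- torch_randn(3, 2, 4)   # (seq_len, batch, input_size)
  h0 <- torch_zeros(1, 2, 6)   # (num_layers * num_directions, batch, hidden_size)
  out <- rnn(x, h0)

  # h_1 = tanh(W_ih x_1 + b_ih + W_hh h_0 + b_hh), computed manually
  p  <- rnn$parameters
  h1 <- torch_tanh(
    torch_matmul(x[1, , ], p$weight_ih_l0$t()) + p$bias_ih_l0 +
      torch_matmul(h0[1, , ], p$weight_hh_l0$t()) + p$bias_hh_l0
  )
  torch_allclose(out[[1]][1, , ], h1, atol = 1e-6)   # expected TRUE
}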
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
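The following sketch makes these shapes concrete for a two-layer bidirectional RNN; the sizes simply follow the description above.
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2, bidirectional = TRUE)
  input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
  h0    <- torch_randn(4, 3, 20)  # (num_layers * num_directions, batch, hidden_size)

  res    <- rnn(input, h0)
  output <- res[[1]]
  h_n    <- res[[2]]

  output$shape                    # 5, 3, 40 -> (seq_len, batch, num_directions * hidden_size)
  h_n$shape                       # 4, 3, 20 -> (num_layers * num_directions, batch, hidden_size)

  # separate forward and backward outputs of the last layer
  output$view(c(5, 3, 2, 20))$shape   # 5, 3, 2, 20
}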
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L represents a sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
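Listing the parameter shapes of a small module is an easy way to see this layout (a sketch that assumes the parameters are exposed through $parameters with the names above):
if (torch_is_installed()) {
  rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
  lapply(rnn$parameters, function(p) p$shape)
  # Per the list above one would expect, e.g.:
  #   weight_ih_l0: (20, 10)   weight_hh_l0: (20, 20)
  #   weight_ih_l1: (20, 20)   (its input is the previous layer's hidden state)
  #   bias_ih_l0, bias_hh_l0, ...: (20)
}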
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
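A quick empirical check of this bound (illustrative sketch only; it compares the largest absolute parameter value with \(\sqrt{k}\)):
if (torch_is_installed()) {
  hidden_size <- 20
  rnn <- nn_rnn(10, hidden_size)
  bound   <- sqrt(1 / hidden_size)
  max_abs <- max(sapply(rnn$parameters, function(p) p$abs()$max()$item()))
  max_abs <= bound   # expected TRUE
}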
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.4927 0.4901 -0.4282 -0.0831 -0.2977 0.1164 0.8763 -0.9533 -0.4647
#> -0.6632 -0.7458 0.7455 0.3813 0.3598 0.2851 -0.6641 0.4961 0.8769
#> 0.4221 0.7821 0.0039 -0.7736 -0.8827 -0.0752 0.1781 0.8693 -0.1321
#>
#> Columns 10 to 18 0.9011 -0.5297 -0.7668 -0.0333 0.6147 -0.0687 -0.6040 0.5936 0.8512
#> 0.9241 0.0130 0.0572 -0.0071 -0.4765 0.8244 -0.0804 -0.8762 0.0557
#> 0.6619 0.3819 -0.5202 0.8035 0.9756 -0.2049 0.7426 -0.8909 -0.2870
#>
#> Columns 19 to 20 -0.0176 -0.1480
#> 0.1718 0.2521
#> 0.4839 0.3082
#>
#> (2,.,.) =
#> Columns 1 to 9 0.3949 -0.0434 0.4409 0.0937 -0.1094 0.1320 0.1151 -0.3794 -0.4731
#> 0.3683 0.3524 0.2392 -0.2336 -0.2641 -0.0583 0.2366 -0.4079 0.1609
#> -0.5180 0.8713 0.5828 0.4722 -0.4412 0.0112 0.0054 0.7250 0.5119
#>
#> Columns 10 to 18 -0.3147 -0.7491 0.4564 -0.1294 -0.2644 -0.3022 0.0187 -0.1442 -0.3023
#> 0.4521 -0.4673 -0.0466 0.5337 -0.1050 0.2740 0.3496 0.0997 0.4321
#> 0.0863 0.0953 0.5917 -0.3298 0.2592 0.0548 -0.0745 -0.5867 -0.3719
#>
#> Columns 19 to 20 -0.5365 -0.2288
#> 0.1866 0.1656
#> -0.3659 -0.4206
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.2300 0.4053 0.0905 -0.0741 -0.5414 0.4193 0.3445 -0.2394 -0.1182
#> 0.2911 0.1463 0.2752 -0.0373 -0.4252 0.2677 -0.1027 -0.1861 -0.4730
#> 0.0394 0.7819 -0.1252 -0.3251 -0.2505 0.1518 0.4218 0.4204 0.0591
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.0431 0.6544 -0.2914 -0.0505 -0.6008 -0.0628 0.7554 0.6150 -0.1684
#> -0.3466 0.4796 -0.3319 -0.3516 -0.2889 0.0550 0.6131 0.4393 -0.0448
#> -0.2081 0.3180 -0.6217 0.1575 0.6654 -0.1015 0.4661 0.6445 0.1089
#>
#> Columns 10 to 18 -0.7627 -0.1551 -0.4251 -0.2694 0.1820 -0.1036 -0.2315 -0.0280 -0.0879
#> -0.5182 0.1173 -0.5221 -0.0291 0.1116 -0.5802 0.2164 -0.5016 0.2070
#> -0.7195 -0.1991 -0.4307 0.5682 -0.6656 -0.0712 0.4629 0.3143 0.1695
#>
#> Columns 19 to 20 -0.6286 -0.5400
#> -0.4962 -0.1000
#> 0.4738 0.2069
#>
#> (2,.,.) =
#> Columns 1 to 9 0.3072 0.0778 0.0435 -0.4139 -0.2365 0.2873 0.0941 -0.1795 -0.3111
#> 0.1994 0.3400 0.1679 -0.1093 -0.3871 0.1592 0.4745 -0.1481 -0.1743
#> 0.1889 0.3443 0.0876 -0.1907 -0.4394 0.1575 0.0250 -0.1228 -0.2674
#>
#> Columns 10 to 18 0.1840 -0.5982 -0.0230 0.1299 -0.2965 -0.1785 -0.3880 0.4689 0.1553
#> 0.3883 -0.4451 0.3667 0.1422 0.0032 -0.1569 -0.3590 0.2696 0.0381
#> 0.4934 -0.0373 0.1592 0.6636 0.2052 -0.1687 0.0998 -0.4514 0.5805
#>
#> Columns 19 to 20 -0.1476 0.2681
#> -0.5902 0.0265
#> -0.4543 0.0861
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>