Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size: The number of expected features in the input x
- hidden_size: The number of features in the hidden state h
- num_layers: Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias: If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first: If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional: If TRUE, becomes a bidirectional RNN. Default: FALSE (see the shape sketch after this list)
- ...: other arguments that can be passed to the super class.
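The following is a minimal sketch (not part of the original reference) showing how batch_first, bidirectional, and num_layers affect the shapes of the returned tensors; the sizes used are arbitrary.

library(torch)

# bidirectional, two-layer RNN that expects (batch, seq, feature) input
rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", batch_first = TRUE, bidirectional = TRUE
)

x <- torch_randn(3, 5, 10)  # (batch = 3, seq = 5, feature = 10)
res <- rnn(x)               # h_0 defaults to zeros when omitted
res[[1]]$shape              # output: (3, 5, 40) since num_directions * hidden_size = 40
res[[2]]$shape              # h_n:    (4, 3, 20) since num_layers * num_directions = 4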
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
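As an illustration of this recurrence, the sketch below (an addition, not from the original documentation) computes one step by hand for a single-layer RNN and compares it to the module's output. The parameter ordering in rnn$parameters is assumed to be weight_ih, weight_hh, bias_ih, bias_hh; check names(rnn$parameters) on your installation.

library(torch)

rnn    <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
x      <- torch_randn(1, 1, 4)  # (seq_len = 1, batch = 1, input_size = 4)
h_prev <- torch_zeros(1, 1, 3)  # initial hidden state

res <- rnn(x, h_prev)

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh), computed by hand.
# Assumed parameter order for a single layer: weight_ih, weight_hh, bias_ih, bias_hh.
p <- rnn$parameters
h_manual <- torch_tanh(
  torch_matmul(p[[1]], x[1, 1, ]) + p[[3]] +
    torch_matmul(p[[2]], h_prev[1, 1, ]) + p[[4]]
)

torch_allclose(res[[1]][1, 1, ], h_manual)  # TRUE (up to numerical tolerance)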
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case (see the sketch after this list).
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
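The sketch below (an addition, using arbitrary sizes) shows how the two returned tensors can be reshaped to separate directions and layers in the unpacked bidirectional case; note that R tensor indexing is 1-based, so the forward direction is index 1.

library(torch)

rnn    <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2, bidirectional = TRUE)
input  <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
res    <- rnn(input)
output <- res[[1]]                # (5, 3, 2 * 20)
h_n    <- res[[2]]                # (2 * 2, 3, 20)

# separate the directions in output: (seq_len, batch, num_directions, hidden_size)
out_dirs     <- output$view(c(5, 3, 2, 20))
forward_out  <- out_dirs[ , , 1, ]   # forward direction
backward_out <- out_dirs[ , , 2, ]   # backward direction

# separate the layers in h_n: (num_layers, num_directions, batch, hidden_size)
h_n_layers <- h_n$view(c(2, 2, 3, 20))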
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in} = \mbox{input\_size}\) and L represents the sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S = \mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out} = \mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all} = \mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size); a short inspection sketch follows this list
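A short sketch (added here for illustration) for inspecting these parameters; the exact layer index used in the names may vary, so they are listed via names() rather than hard-coded.

library(torch)

rnn    <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
params <- rnn$parameters

names(params)                                       # weight_ih_l*, weight_hh_l*, bias_ih_l*, bias_hh_l*
sapply(params, function(p) paste(dim(p), collapse = " x "))
# the first layer's weight_ih is 20 x 10 (hidden_size x input_size);
# deeper layers' weight_ih is 20 x 20, since their input is the previous layer's hidden state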
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
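As a quick check of this initialization (a sketch added here, not part of the reference), the parameter values of a freshly created module should all lie within the stated bound:

library(torch)

rnn   <- nn_rnn(input_size = 10, hidden_size = 20)
bound <- sqrt(1 / 20)                        # sqrt(k) with k = 1 / hidden_size
w     <- rnn$parameters[[1]]                 # any weight or bias tensor
c(as.numeric(w$min()), as.numeric(w$max()))  # both within (-bound, bound)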
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.5944 -0.6401 -0.4884 0.2165 -0.4695 -0.5841 0.4003 0.0293 0.1361
#> 0.3141 -0.4003 0.6153 0.1675 -0.1991 0.1980 -0.0638 0.2009 0.3664
#> -0.3311 -0.2844 0.2501 -0.5501 -0.0930 -0.1435 -0.4153 0.1443 0.3987
#>
#> Columns 10 to 18 -0.8953 -0.3160 0.7690 0.4353 -0.9125 0.1048 -0.3679 -0.2261 -0.7134
#> 0.8789 0.2743 0.3330 -0.9160 -0.2329 -0.4732 0.7987 0.2498 -0.3513
#> -0.0595 -0.0414 0.2853 0.5087 -0.1204 -0.6069 -0.4815 0.3294 0.2926
#>
#> Columns 19 to 20 0.6519 -0.8058
#> -0.1518 0.2662
#> 0.5490 0.2870
#>
#> (2,.,.) =
#> Columns 1 to 9 0.6038 0.5381 -0.2702 -0.2611 0.1526 0.5815 0.5129 -0.1028 0.0631
#> 0.6053 0.2879 0.2676 -0.1108 -0.5120 -0.1488 0.2754 -0.0545 0.2809
#> 0.4840 -0.0389 0.4519 0.0532 0.3017 0.4180 0.0665 -0.3966 0.2519
#>
#> Columns 10 to 18 -0.0173 -0.4471 0.4763 0.6337 -0.3888 -0.1050 -0.5589 0.2354 -0.5651
#> 0.0250 -0.3052 -0.2256 -0.0450 -0.1452 0.3206 -0.0882 -0.1725 -0.1897
#> -0.2431 -0.5869 0.4712 0.0990 -0.1732 0.0419 0.4712 0.5383 -0.7632
#>
#> Columns 19 to 20 -0.0893 -0.5099
#> 0.4132 0.1224
#> 0.1380 -0.1980
#>
#> (3,.,.) =
#> Columns 1 to 9 0.1520 0.5244 0.0603 -0.0252 0.1342 0.4345 0.4929 -0.0837 0.2292
#> 0.2401 0.0731 -0.0629 -0.2905 -0.0231 0.3164 0.2400 0.1131 0.3396
#> 0.4581 0.1719 0.1856 -0.4919 -0.2999 0.4888 0.1800 -0.3973 -0.1134
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.7433 0.0132 0.7771 -0.5761 -0.0948 -0.3255 0.7292 -0.4665 -0.1679
#> -0.2671 -0.0385 0.1691 -0.5765 -0.0204 0.3695 0.0650 -0.3395 0.2541
#> -0.1996 -0.3506 0.4058 -0.1777 0.5691 0.4692 0.8625 0.7024 0.0454
#>
#> Columns 10 to 18 -0.5392 0.6396 0.7567 0.7827 -0.4873 0.1251 -0.3834 -0.0474 0.0489
#> -0.1094 -0.2379 -0.7107 -0.1123 -0.3464 -0.3640 -0.1792 -0.3552 0.5378
#> -0.5916 -0.0385 0.8658 0.1396 0.5103 0.5543 0.5276 0.5941 -0.6288
#>
#> Columns 19 to 20 -0.4282 -0.1346
#> -0.3171 -0.1174
#> 0.6226 -0.1535
#>
#> (2,.,.) =
#> Columns 1 to 9 0.4440 -0.2710 0.1350 -0.3705 -0.0772 -0.4557 0.0953 0.2506 0.6398
#> 0.0045 0.3164 0.4202 -0.4949 -0.2211 0.4044 -0.0044 -0.2685 0.6013
#> 0.7687 -0.2897 0.5142 -0.2122 -0.1092 0.0634 0.4387 0.3466 0.2665
#>
#> Columns 10 to 18 0.2733 0.1802 -0.6125 -0.2482 0.4468 0.4122 -0.1445 -0.1921 -0.0106
#> 0.1850 -0.0958 -0.4403 -0.3507 0.6460 -0.3670 -0.1135 0.2356 0.3084
#> -0.0798 -0.4872 -0.2650 0.1576 0.3333 0.2148 -0.6969 0.5199 -0.4006
#>
#> Columns 19 to 20 -0.2628 0.2681
#> -0.4175 -0.3963
#> -0.3259 -0.2066
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>