Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
Other arguments that can be passed to the super class.
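A minimal sketch of how several of these arguments combine (all sizes and hyper-parameter values below are arbitrary):
library(torch)

# A 2-layer bidirectional RNN with ReLU non-linearity, batch-first tensors,
# and dropout between the stacked layers.
rnn <- nn_rnn(
  input_size = 8,
  hidden_size = 16,
  num_layers = 2,
  nonlinearity = "relu",
  batch_first = TRUE,
  dropout = 0.2,
  bidirectional = TRUE
)

x <- torch_randn(5, 7, 8)  # (batch, seq, feature) because batch_first = TRUE
out <- rnn(x)
out[[1]]$shape  # 5 7 32 -> (batch, seq, num_directions * hidden_size)
out[[2]]$shape  # 4 5 16 -> (num_layers * num_directions, batch, hidden_size)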
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time \(t\), \(x_t\) is the input at time \(t\), and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
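The recurrence can be replayed by hand for a single-layer, unidirectional module. The sketch below assumes the parameters are exposed through $parameters under the names listed in the Attributes section (weight_ih_l0, bias_ih_l0, ...); all sizes are arbitrary:
library(torch)

rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)

x   <- torch_randn(6, 1, 4)   # (seq_len, batch, input_size)
h0  <- torch_zeros(1, 1, 3)   # (num_layers * num_directions, batch, hidden_size)
res <- rnn(x, h0)

p <- rnn$parameters
h <- h0[1, , ]                # (batch, hidden_size)
for (t in 1:6) {
  # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
  h <- torch_tanh(
    x[t, , ]$matmul(p$weight_ih_l0$t()) + p$bias_ih_l0 +
    h$matmul(p$weight_hh_l0$t()) + p$bias_hh_l0
  )
}

# Should match the final hidden state returned by the module, up to tolerance.
torch_allclose(h, res[[2]][1, , ], atol = 1e-6)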
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
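A sketch of the direction-splitting recipe above for the unpacked, bidirectional case (sizes are arbitrary; note that R indexing is 1-based, so direction 0 is the first slice):
library(torch)

seq_len <- 5; batch <- 3; hidden_size <- 20
rnn <- nn_rnn(10, hidden_size, num_layers = 1, bidirectional = TRUE)

res    <- rnn(torch_randn(seq_len, batch, 10))
output <- res[[1]]   # (seq_len, batch, num_directions * hidden_size) = (5, 3, 40)
h_n    <- res[[2]]   # (num_layers * num_directions, batch, hidden_size) = (2, 3, 20)

dirs <- output$view(c(seq_len, batch, 2, hidden_size))
forward_out  <- dirs[, , 1, ]  # direction 0 (forward)
backward_out <- dirs[, , 2, ]  # direction 1 (backward)

h_n_split <- h_n$view(c(1, 2, batch, hidden_size))  # (num_layers, num_directions, batch, hidden_size)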
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and \(L\) represents a sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) tensor where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
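These parameters can be inspected through $parameters; the names and example output below assume the weight_ih_l[k] / bias_ih_l[k] convention listed in this section:
library(torch)

rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

names(rnn$parameters)
#> e.g. "weight_ih_l0" "weight_hh_l0" "bias_ih_l0" "bias_hh_l0"
#>      "weight_ih_l1" "weight_hh_l1" "bias_ih_l1" "bias_hh_l1"

rnn$parameters$weight_ih_l0$shape  # 20 10 -> (hidden_size, input_size) for k = 0
rnn$parameters$weight_ih_l1$shape  # 20 20 -> (hidden_size, num_directions * hidden_size) for k > 0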
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
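A quick sanity check of this range (a sketch, using $parameters as above): with hidden_size = 20, \(k = 1/20\), so every weight and bias should lie within \((-\sqrt{k}, \sqrt{k})\).
library(torch)

rnn   <- nn_rnn(input_size = 10, hidden_size = 20)
bound <- sqrt(1 / 20)

max_abs <- max(sapply(rnn$parameters, function(p) p$abs()$max()$item()))
max_abs <= bound
#> TRUE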
Examples
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
  input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
  h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
  rnn(input, h0)                  # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.1462 0.2567 0.8638 -0.4793 -0.1115 -0.2104 -0.7485 0.4740 -0.1531
#> 0.3957 0.5775 0.0635 0.3065 -0.7018 0.6755 -0.5393 -0.1958 -0.4268
#> 0.6308 0.8125 -0.1272 0.3501 0.4815 0.0798 0.0806 0.8838 -0.3797
#>
#> Columns 10 to 18 -0.2030 -0.1653 0.2036 -0.7162 0.2247 0.5438 -0.0156 -0.7146 0.4829
#> -0.4866 -0.4701 -0.6116 0.4002 -0.1761 -0.4387 0.2493 -0.2555 0.2485
#> 0.7117 0.6465 0.4741 -0.2180 0.4336 -0.1333 -0.6637 -0.5552 -0.2279
#>
#> Columns 19 to 20 -0.8386 -0.5859
#> -0.8517 0.5394
#> -0.5549 0.0157
#>
#> (2,.,.) =
#> Columns 1 to 9 0.3733 0.2442 -0.0802 0.5218 0.2237 0.6097 0.0147 0.1574 -0.1451
#> 0.1650 0.3261 -0.3883 0.2830 0.2901 -0.0222 -0.3750 0.4487 0.3132
#> 0.2860 0.3143 0.2914 0.0066 0.4841 0.6879 -0.3368 0.0161 0.1735
#>
#> Columns 10 to 18 -0.5410 -0.0555 -0.0629 0.6935 -0.0771 0.2098 -0.4356 0.1178 0.6063
#> 0.5324 0.1588 0.4329 0.2361 0.6784 0.2739 0.2718 -0.1892 0.0013
#> -0.5777 -0.2206 -0.2697 0.1029 0.1815 0.1486 0.1102 0.1201 0.2854
#>
#> Columns 19 to 20 -0.1709 0.4600
#> -0.2949 -0.0989
#> -0.4996 0.2651
#>
#> (3,.,.) =
#> Columns 1 to 9 0.0559 -0.0247 -0.3546 -0.1171 0.1397 0.2934 -0.6387 0.5587 0.0797
#> 0.4747 0.3670 0.0611 -0.1523 0.1568 0.7565 -0.1505 -0.0636 -0.4648
#> 0.3566 0.5020 0.0323 -0.0766 0.0874 -0.0811 -0.5456 0.2729 0.0150
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.1006 -0.0912 -0.0105 -0.0207 -0.0004 -0.4845 -0.1874 0.4146 0.5107
#> -0.5146 0.0357 -0.0140 -0.3100 0.2991 0.1250 0.5280 0.1639 -0.2374
#> -0.3175 -0.1751 -0.3540 -0.0616 -0.3054 0.1587 0.3484 -0.1559 0.1439
#>
#> Columns 10 to 18 0.1129 0.0397 0.2820 -0.1373 0.7131 -0.0931 0.1651 0.3733 -0.7387
#> 0.6729 -0.7223 -0.1651 0.0861 0.2632 0.8154 -0.6576 -0.6319 0.0571
#> 0.2528 -0.1375 0.0198 -0.0576 0.4953 0.5505 -0.3547 0.1049 -0.3637
#>
#> Columns 19 to 20 -0.3648 0.6372
#> 0.5050 -0.2551
#> -0.1905 0.3245
#>
#> (2,.,.) =
#> Columns 1 to 9 0.1943 -0.3759 -0.1204 0.0610 0.1332 0.4891 0.0670 0.3168 -0.0102
#> 0.5831 0.2041 -0.3626 -0.0618 0.3794 0.2439 -0.1293 -0.3310 -0.0012
#> 0.3991 -0.0834 0.1830 -0.1325 0.2608 0.1700 -0.5141 0.2352 0.1447
#>
#> Columns 10 to 18 0.4251 -0.4894 0.3166 -0.3336 0.0494 -0.1528 -0.1164 -0.4695 0.1988
#> -0.2418 -0.3025 -0.4445 0.4926 0.1747 -0.1367 0.0196 0.3205 -0.1316
#> 0.1748 -0.4457 0.1024 -0.0685 0.1443 -0.0405 0.1734 -0.3839 -0.0344
#>
#> Columns 19 to 20 -0.0321 -0.1163
#> -0.1435 0.3021
#> -0.1448 0.0648
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>