model_wavernn.Rd
WaveRNN model based on the implementation from fatchord. The original implementation was introduced in "Efficient Neural Audio Synthesis". Passes the input through the WaveRNN model.
model_wavernn(
upsample_scales,
n_classes,
hop_length,
n_res_block = 10,
n_rnn = 512,
n_fc = 512,
kernel_size = 5,
n_freq = 128,
n_hidden = 128,
n_output = 128
)
upsample_scales: the list of upsample scales.
n_classes: the number of output classes.
hop_length: the number of samples between the starts of consecutive frames.
n_res_block: the number of ResBlocks in the stack. (Default: 10)
n_rnn: the dimension of the RNN layer. (Default: 512)
n_fc: the dimension of the fully connected layer. (Default: 512)
kernel_size: the kernel size of the first Conv1d layer. (Default: 5)
n_freq: the number of bins in a spectrogram. (Default: 128)
n_hidden: the number of hidden dimensions of ResBlock. (Default: 128)
n_output: the number of output dimensions of MelResNet. (Default: 128)
Value: tensor of shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
Forward parameters:
waveform: the input waveform to the WaveRNN layer, of shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length)
specgram: the input spectrogram to the WaveRNN layer, of shape (n_batch, 1, n_freq, n_time)
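The time dimension of the output follows directly from the input shapes above. A minimal sketch of that shape arithmetic in plain R (no torch required; the values mirror the example on this page and are otherwise illustrative):

```r
# Output length along time is (n_time - kernel_size + 1) * hop_length.
n_time <- 10        # number of spectrogram frames (illustrative value)
kernel_size <- 5    # kernel size of the first Conv1d layer
hop_length <- 12    # samples between the starts of consecutive frames
out_len <- (n_time - kernel_size + 1) * hop_length
out_len             # length the input waveform must match
```

This is why the example waveform below is created with length (10 - 5 + 1) * 12.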
The input channels of waveform and spectrogram have to be 1. The product of upsample_scales must equal hop_length.
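The constraint on upsample_scales can be checked before constructing the model. A quick sanity check in plain R (values taken from the example on this page):

```r
# The product of upsample_scales must equal hop_length.
upsample_scales <- c(2, 2, 3)
hop_length <- 12
stopifnot(prod(upsample_scales) == hop_length)  # 2 * 2 * 3 == 12
```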
if (torch::torch_is_installed()) {
  wavernn <- model_wavernn(upsample_scales = c(2, 2, 3), n_classes = 5, hop_length = 12)

  # waveform shape: (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length)
  waveform <- torch::torch_rand(3, 1, (10 - 5 + 1) * 12)
  spectrogram <- torch::torch_rand(3, 1, 128, 10)

  output <- wavernn(waveform, spectrogram)
}