WaveRNN model based on the implementation from fatchord. The original architecture was introduced in "Efficient Neural Audio Synthesis" (Kalchbrenner et al., 2018).

model_wavernn(
  upsample_scales,
  n_classes,
  hop_length,
  n_res_block = 10,
  n_rnn = 512,
  n_fc = 512,
  kernel_size = 5,
  n_freq = 128,
  n_hidden = 128,
  n_output = 128
)

Arguments

upsample_scales

the list of upsample scales.

n_classes

the number of output classes.

hop_length

the number of samples between the starts of consecutive frames.

n_res_block

the number of ResBlocks in the stack. (Default: 10)

n_rnn

the dimension of the RNN layers. (Default: 512)

n_fc

the dimension of the fully connected layers. (Default: 512)

kernel_size

the kernel size of the first Conv1d layer. (Default: 5)

n_freq

the number of bins in a spectrogram. (Default: 128)

n_hidden

the hidden dimension of each ResBlock. (Default: 128)

n_output

the output dimension of the MelResNet. (Default: 128)

Value

Tensor shape: (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)

Details

The forward method takes two inputs:

waveform: the input waveform to the WaveRNN layer, of shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length)

specgram: the input spectrogram to the WaveRNN layer, of shape (n_batch, 1, n_freq, n_time)

Both waveform and specgram must have a single input channel. The product of upsample_scales must equal hop_length, since the upsampling network stretches each spectrogram frame to hop_length waveform samples.
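As a quick sanity check before constructing the model, the constraint above can be verified directly (a minimal sketch; the values are illustrative, matching the example below):

```r
# Illustrative check: prod(upsample_scales) must equal hop_length,
# otherwise the upsampled spectrogram will not align with the waveform.
upsample_scales <- c(2, 2, 3)
hop_length <- 12
stopifnot(prod(upsample_scales) == hop_length)  # 2 * 2 * 3 == 12
```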

Examples

if (torch::torch_is_installed()) {
  wavernn <- model_wavernn(upsample_scales = c(2, 2, 3), n_classes = 5, hop_length = 12)

  # waveform shape:  (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length)
  waveform <- torch::torch_rand(3, 1, (10 - 5 + 1) * 12)
  # specgram shape:  (n_batch, n_channel, n_freq, n_time)
  spectrogram <- torch::torch_rand(3, 1, 128, 10)

  # output shape: (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
  output <- wavernn(waveform, spectrogram)
}