Inverse Mel Scale — transform_inverse_mel_scale

Estimates a linear-frequency (normal) STFT from a mel-frequency STFT, using a conversion matrix built from triangular filter banks.

transform_inverse_mel_scale(
  n_stft,
  n_mels = 128,
  sample_rate = 16000,
  f_min = 0,
  f_max = NULL,
  max_iter = 1e+05,
  tolerance_loss = 1e-05,
  tolerance_change = 1e-08,
  ...
)

Arguments

n_stft: (int): Number of bins in STFT. See n_fft in transform_spectrogram.
n_mels: (int, optional): Number of mel filterbanks. (Default: 128)
sample_rate: (int, optional): Sample rate of audio signal. (Default: 16000)
f_min: (float, optional): Minimum frequency. (Default: 0.)
f_max: (float or NULL, optional): Maximum frequency. (Default: NULL, which uses sample_rate %/% 2)
max_iter: (int, optional): Maximum number of optimization iterations. (Default: 100000)
tolerance_loss: (float, optional): Value of loss to stop optimization at. (Default: 1e-5)
tolerance_change: (float, optional): Difference in losses to stop optimization at. (Default: 1e-8)
...: (optional): Arguments passed to the SGD optimizer. Argument lr defaults to 0.1 if not specified. (Default: NULL)

Value

Tensor: Linear scale spectrogram of size (..., freq, time)

Details

forward param: melspec (Tensor): A Mel frequency spectrogram of dimension (..., n_mels, time)
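
A minimal usage sketch, assuming the torch and torchaudio R packages are installed: the transform is constructed first and then called on a mel spectrogram. The input here is random placeholder data, the choice n_stft = n_fft %/% 2 + 1 assumes a one-sided STFT, and max_iter is lowered only so the example finishes quickly.

```r
library(torch)
library(torchaudio)

n_fft  <- 400
n_stft <- n_fft %/% 2 + 1  # assumption: a one-sided STFT has n_fft / 2 + 1 bins
n_mels <- 64

# Placeholder input: random non-negative values shaped (channel, n_mels, time).
melspec <- torch_rand(1, n_mels, 20)

# Build the transform; the arguments follow the signature documented above.
inverse_mel <- transform_inverse_mel_scale(
  n_stft = n_stft,
  n_mels = n_mels,
  sample_rate = 16000,
  max_iter = 200  # kept small for illustration; the default is 1e+05
)

# Calling the module runs the optimization and returns the estimate.
spec <- inverse_mel(melspec)
spec$shape  # last two dimensions are (freq, time)
```

With so few iterations the result is only a rough estimate; in practice the default stopping criteria would normally be left in place.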

It uses SGD to minimize the Euclidean norm of the difference between the input mel spectrogram and the product of the estimated spectrogram and the filter banks.
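
The optimization described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the filter bank fb is a hypothetical toy matrix rather than real mel-spaced filters, the helper invert_mel is an invented name, plain full-batch gradient descent stands in for the SGD optimizer, and clamping the estimate to non-negative values after each step is an assumption.

```r
# Hypothetical toy filter bank: 3 overlapping triangular bands over 6 STFT bins.
# Rows are mel bands, columns are STFT bins (not real mel-spaced filters).
fb <- rbind(
  c(0.5, 1.0, 0.5, 0.0, 0.0, 0.0),
  c(0.0, 0.0, 0.5, 1.0, 0.5, 0.0),
  c(0.0, 0.0, 0.0, 0.5, 1.0, 0.5)
)

set.seed(1)
true_spec <- matrix(runif(6 * 4), nrow = 6)  # (freq, time)
melspec   <- fb %*% true_spec                # (n_mels, time)

# invert_mel is an illustrative helper, not part of any package.
invert_mel <- function(melspec, fb, lr = 0.1, max_iter = 5000,
                       tolerance_loss = 1e-5, tolerance_change = 1e-8) {
  # Random non-negative starting estimate of shape (freq, time).
  spec <- matrix(runif(ncol(fb) * ncol(melspec)), nrow = ncol(fb))
  prev_loss <- Inf
  for (i in seq_len(max_iter)) {
    residual <- fb %*% spec - melspec
    loss <- mean(residual^2)  # mean squared error of the reconstruction
    # Stop when the loss is small enough or has stopped changing.
    if (loss < tolerance_loss || abs(prev_loss - loss) < tolerance_change) break
    grad <- 2 * t(fb) %*% residual / length(residual)
    spec <- pmax(spec - lr * grad, 0)  # gradient step, then clamp (assumption)
    prev_loss <- loss
  }
  list(spec = spec, loss = loss, iterations = i)
}

fit <- invert_mel(melspec, fb)
```

Because there are fewer mel bands than STFT bins, the problem is underdetermined: fit$spec reproduces melspec closely when passed through fb, but it need not match true_spec.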