Solve for a normal STFT from a mel-frequency STFT, using a conversion matrix built from triangular filter banks.

transform_inverse_mel_scale(
  n_stft,
  n_mels = 128,
  sample_rate = 16000,
  f_min = 0,
  f_max = NULL,
  max_iter = 1e+05,
  tolerance_loss = 1e-05,
  tolerance_change = 1e-08,
  ...
)
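
A minimal usage sketch (not taken from the package documentation): compute a mel spectrogram with transform_mel_spectrogram, then estimate the linear-scale spectrogram back from it. The waveform, n_fft value, and shapes below are illustrative assumptions.

library(torchaudio)

# illustrative 1-second waveform; replace with real audio
waveform <- torch::torch_rand(1, 16000)
n_fft <- 400

# mel spectrogram of dimension (..., n_mels, time)
mel_transform <- transform_mel_spectrogram(sample_rate = 16000, n_fft = n_fft, n_mels = 128)
mel_spec <- mel_transform(waveform)

# estimate the (..., freq, time) linear-scale spectrogram; n_stft = n_fft %/% 2 + 1
inverse_mel <- transform_inverse_mel_scale(n_stft = n_fft %/% 2 + 1, n_mels = 128, sample_rate = 16000)
spec_estimate <- inverse_mel(mel_spec)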

Arguments

n_stft

(int): Number of bins in STFT. See n_fft in transform_spectrogram.

n_mels

(int, optional): Number of mel filterbanks. (Default: 128)

sample_rate

(int, optional): Sample rate of audio signal. (Default: 16000)

f_min

(float, optional): Minimum frequency. (Default: 0.)

f_max

(float or NULL, optional): Maximum frequency. (Default: NULL, treated as sample_rate %/% 2)

max_iter

(int, optional): Maximum number of optimization iterations. (Default: 100000)

tolerance_loss

(float, optional): Value of loss to stop optimization at. (Default: 1e-5)

tolerance_change

(float, optional): Difference in losses to stop optimization at. (Default: 1e-8)

...

(optional): Arguments passed to the SGD optimizer. Argument lr will default to 0.1 if not specified. (Default: NULL)
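
For illustration only (assuming the dots are forwarded unchanged to torch's optim_sgd), a lower learning rate and a momentum term could be requested like this:

inverse_mel <- transform_inverse_mel_scale(n_stft = 201, n_mels = 128, lr = 0.05, momentum = 0.9)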

Value

Tensor: Linear scale spectrogram of size (..., freq, time)

Details

forward param: melspec (Tensor): A mel-frequency spectrogram of dimension (..., n_mels, time)

It minimizes the Euclidean norm between the input mel spectrogram and the product of the estimated spectrogram and the filter banks, using SGD.
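
As a rough sketch of this minimization, assuming a triangular filter-bank matrix fb of shape (n_stft, n_mels); the function name and loop below are illustrative, not the package's implementation:

library(torch)

invert_mel_sketch <- function(melspec, fb, max_iter = 100000, lr = 0.1,
                              tolerance_loss = 1e-5, tolerance_change = 1e-8) {
  n_stft <- fb$size(1)
  time <- melspec$size(2)
  # non-negative random initialization of the linear-scale estimate
  specgram <- torch_rand(n_stft, time, requires_grad = TRUE)
  optimizer <- optim_sgd(list(specgram), lr = lr)
  previous_loss <- Inf
  for (i in seq_len(max_iter)) {
    optimizer$zero_grad()
    # Euclidean norm between the estimate projected through the filter banks and the target
    loss <- torch_norm(fb$t()$matmul(specgram) - melspec)
    loss$backward()
    optimizer$step()
    with_no_grad(specgram$clamp_(min = 0))  # a spectrogram is non-negative
    current <- loss$item()
    if (current < tolerance_loss || abs(previous_loss - current) < tolerance_change) break
    previous_loss <- current
  }
  specgram$detach()
}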