Mel-frequency Cepstrum Coefficients — transform

Compute the Mel-frequency cepstral coefficients (MFCCs) of an audio signal.

transform_mfcc(
  sample_rate = 16000,
  n_mfcc = 40,
  dct_type = 2,
  norm = "ortho",
  log_mels = FALSE,
  ...
)

Arguments

sample_rate: (int, optional): Sample rate of the audio signal. (Default: 16000)
n_mfcc: (int, optional): Number of MFC coefficients to retain. (Default: 40)
dct_type: (int, optional): Type of DCT (discrete cosine transform) to use. (Default: 2)
norm: (str, optional): Normalization to apply to the DCT. (Default: 'ortho')
log_mels: (bool, optional): Whether to use log-mel spectrograms instead of dB-scaled ones. (Default: FALSE)
...: (optional): Additional arguments passed to transform_mel_spectrogram.
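
To make dct_type = 2 and norm = 'ortho' concrete, here is a minimal base R sketch of an orthonormal DCT-II basis. It is an illustration only and not necessarily how the package builds its matrix; the helper name dct_ii_ortho and the sizes n_mels = 8 and n_mfcc = 4 are hypothetical choices made for the example.

```r
# Orthonormal DCT-II basis written out in base R (no torch dependency).
# dct_ii_ortho is a hypothetical helper; the sizes below are illustrative.
dct_ii_ortho <- function(n_mfcc, n_mels) {
  n <- seq_len(n_mels) - 1            # mel band index 0..n_mels-1
  k <- seq_len(n_mfcc) - 1            # coefficient index 0..n_mfcc-1
  # basis[i, j] = cos(pi / n_mels * (n_i + 0.5) * k_j)
  basis <- cos(pi / n_mels * outer(n + 0.5, k))
  basis[, 1] <- basis[, 1] / sqrt(2)  # extra scaling of the k = 0 column
  basis * sqrt(2 / n_mels)            # result has shape n_mels x n_mfcc
}

dct_mat <- dct_ii_ortho(n_mfcc = 4, n_mels = 8)

# With orthonormal scaling, the columns form an orthonormal set,
# so t(dct_mat) %*% dct_mat should be close to the identity matrix.
gram <- crossprod(dct_mat)
max_dev <- max(abs(gram - diag(4)))
```

Under this scaling, applying t(dct_mat) to a column of mel-band values yields the first n_mfcc cepstral coefficients for that frame.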

Value

tensor: the MFCCs, of size (..., n_mfcc, time), where time is the number of frames.

Details

forward param: waveform (tensor): Tensor of audio of dimension (..., time)

By default, this calculates the MFCC on the dB-scaled Mel spectrogram. The dB-scaled output depends on the maximum value in the input spectrogram, so an audio clip split into snippets may return different values than the full clip.
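
The dependence on the maximum can be illustrated with a small base R sketch. It assumes the effect comes from a floor placed a fixed number of decibels below the loudest value of the input; the helper power_to_db and the values top_db = 80 and amin = 1e-10 are assumptions for illustration, not confirmed package internals.

```r
# Power-to-dB conversion with a floor relative to the input's maximum.
# power_to_db, top_db = 80 and amin = 1e-10 are illustrative assumptions.
power_to_db <- function(power, top_db = 80, amin = 1e-10) {
  db <- 10 * log10(pmax(power, amin))  # clamp tiny values, then convert to dB
  pmax(db, max(db) - top_db)           # floor sits top_db below this input's maximum
}

# Toy power values: a loud first half followed by a quiet second half
power <- c(1, 1e-3, 1e-9, 1e-10)

# Convert the whole clip at once, then convert each half separately
full_clip <- power_to_db(power)
snippets  <- c(power_to_db(power[1:2]), power_to_db(power[3:4]))

# The quiet values are floored in full_clip but not in snippets
differs <- !isTRUE(all.equal(full_clip, snippets))
```

In the full clip the quiet values are raised to the floor set by the loud first half, while the quiet snippet is scaled against its own, much lower, maximum. Any MFCCs computed from the two versions would therefore differ as well.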