Create the Mel-frequency cepstral coefficients (MFCCs) from an audio signal.

transform_mfcc(
  sample_rate = 16000,
  n_mfcc = 40,
  dct_type = 2,
  norm = "ortho",
  log_mels = FALSE,
  ...
)

Arguments

sample_rate

(int, optional): Sample rate of audio signal. (Default: 16000)

n_mfcc

(int, optional): Number of MFC coefficients to retain. (Default: 40)

dct_type

(int, optional): Type of DCT (discrete cosine transform) to use. (Default: 2)

norm

(str, optional): Normalization to use for the DCT. (Default: "ortho")

log_mels

(bool, optional): Whether to use log-mel spectrograms instead of dB-scaled ones. (Default: FALSE)

...

(optional): Additional arguments passed to transform_mel_spectrogram.
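
To make the dct_type and norm arguments more concrete, here is a minimal base R sketch of a type-2 DCT basis with orthonormal scaling, following the textbook definition. The helper dct_basis() and the small sizes used are hypothetical and chosen for illustration; this is not the package's own implementation.

```r
# dct_basis() is a hypothetical helper, not a function from the package.
# It builds an n_mfcc x n_mels DCT-II basis with orthonormal scaling.
dct_basis <- function(n_mfcc, n_mels) {
  n <- seq_len(n_mels) - 1            # mel-band index 0 .. n_mels - 1
  k <- seq_len(n_mfcc) - 1            # coefficient index 0 .. n_mfcc - 1
  basis <- cos(pi / n_mels * outer(k, n + 0.5))
  basis[1, ] <- basis[1, ] / sqrt(2)  # extra scaling for the k = 0 row
  basis * sqrt(2 / n_mels)
}

basis <- dct_basis(n_mfcc = 4, n_mels = 8)

# With orthonormal scaling the rows have unit length and are mutually
# orthogonal, so this product is numerically the identity matrix.
gram <- basis %*% t(basis)
```

Under these assumptions, multiplying such a basis by a matrix of mel-band values (bands in rows, frames in columns) yields one column of coefficients per frame.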

Value

tensor: specgram_mel_db of size (..., n_mfcc, time).

Details

forward param: waveform (tensor): Tensor of audio of dimension (..., time)

By default, this calculates the MFCC on the dB-scaled Mel spectrogram. This output depends on the maximum value in the input spectrogram, and so may return different values for an audio clip split into snippets vs. a full clip.
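
One way such a dependence on the maximum can arise is a dB conversion that clamps values lying more than a fixed range below the loudest value. The base R sketch below illustrates that effect on toy numbers. The helper power_to_db() and its top_db = 80 and amin = 1e-10 values are assumptions made for illustration; they are not taken from this page.

```r
# power_to_db() is a hypothetical helper; top_db and amin are assumed values.
power_to_db <- function(power, top_db = 80, amin = 1e-10) {
  db <- 10 * log10(pmax(power, amin))
  # Clamp everything more than top_db below the loudest value in this input.
  pmax(db, max(db) - top_db)
}

full_clip <- c(1, 1e-6, 1e-10)  # toy power values for a whole clip
snippet   <- full_clip[2:3]     # the same values without the loudest one

db_full    <- power_to_db(full_clip)
db_snippet <- power_to_db(snippet)

# The quietest value is clamped relative to a different maximum in each case,
# so db_full[3] and db_snippet[2] differ although the input power is the same.
```

In this toy case the last value is clamped when the loudest value is present but not when it is processed as part of the snippet. If identical results across snippets matter, one option is to compute the transform on the full clip and slice the output afterwards.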