Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

transform_sliding_window_cmn(
  cmn_window = 600,
  min_cmn_window = 100,
  center = FALSE,
  norm_vars = FALSE
)

Arguments

cmn_window

(int, optional): Window size, in frames, for the running-average CMN computation. Default: 600.

min_cmn_window

(int, optional): Minimum CMN window used at the start of decoding (adds latency only at the start). Applies only when center = FALSE; ignored when center = TRUE. Default: 100.

center

(bool, optional): If TRUE, use a window centered on the current frame (to the extent possible, modulo end effects). If FALSE, the window extends to the left of the current frame. Default: FALSE.

norm_vars

(bool, optional): If TRUE, normalize variance to one. Default: FALSE.

Value

Tensor: Tensor of audio of dimension (..., time).

Details

The forward method takes one parameter, waveform (Tensor): Tensor of audio of dimension (..., time).
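
To make the window arguments concrete, the following base-R sketch shows how cmn_window, min_cmn_window, center and norm_vars could interact. It is modeled on the Kaldi-style sliding-window procedure and is not the package's implementation. The helper name sliding_cmn_sketch and the frames-by-coefficients matrix layout are assumptions made for illustration, and no torch tensors are involved.

```r
# Hypothetical helper for illustration only; not part of the package.
# `feats` is assumed to be a numeric matrix with one row per frame and
# one column per cepstral coefficient.
sliding_cmn_sketch <- function(feats, cmn_window = 600, min_cmn_window = 100,
                               center = FALSE, norm_vars = FALSE) {
  num_frames <- nrow(feats)
  out <- feats
  for (t in seq_len(num_frames) - 1L) {  # t is a 0-based frame index
    # Window is the half-open, 0-based range [start, end).
    if (center) {
      start <- t - cmn_window %/% 2
      end <- start + cmn_window
    } else {
      start <- t - cmn_window
      end <- t + 1
    }
    # Shift the window right if it starts before the first frame.
    if (start < 0) {
      end <- end - start
      start <- 0
    }
    # Left window: never look past frame t, except to reach min_cmn_window.
    if (!center && end > t) end <- max(t + 1, min_cmn_window)
    # Shift the window left if it runs past the last frame.
    if (end > num_frames) {
      start <- max(0, start - (end - num_frames))
      end <- num_frames
    }
    win <- feats[(start + 1):end, , drop = FALSE]
    mu <- colMeans(win)
    out[t + 1, ] <- feats[t + 1, ] - mu
    if (norm_vars) {
      # Population standard deviation over the window; this is zero (and the
      # division undefined) if the window holds a single frame.
      out[t + 1, ] <- out[t + 1, ] / sqrt(colMeans(win^2) - mu^2)
    }
  }
  out
}

# Small example: six frames, one coefficient, left window.
x <- matrix(as.numeric(1:6), ncol = 1)
y <- sliding_cmn_sketch(x, cmn_window = 2, min_cmn_window = 1)
```

With these small settings, the first frame is normalized against itself only, and later frames against themselves plus up to two preceding frames. Raising min_cmn_window lets the earliest frames borrow statistics from frames that follow them, which is the start-up latency mentioned for that argument.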