It is implemented using the normalized cross-correlation function (NCCF) and median smoothing.
functional_detect_pitch_frequency( waveform, sample_rate, frame_time = 10^(-2), win_length = 30, freq_low = 85, freq_high = 3400 )
waveform (Tensor): Tensor of audio of dimension (..., freq, time)
sample_rate (int): The sample rate of the waveform (Hz)
frame_time (float, optional): Duration of a frame in seconds (Default: 10^(-2))
win_length (int, optional): The window length for median smoothing, in number of frames (Default: 30)
freq_low (int, optional): Lowest frequency that can be detected (Hz) (Default: 85)
freq_high (int, optional): Highest frequency that can be detected (Hz) (Default: 3400)
Tensor: Tensor of detected pitch frequencies (Hz) of dimension (..., frame)
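
The general idea of combining a normalized cross-correlation with median smoothing can be sketched in plain Python, as below. This is an illustrative simplification and not the library's implementation: the names `nccf` and `detect_pitch_sketch` are hypothetical, and the 0.99 tie-breaking threshold and the trailing median window are assumptions made for this example. It works on a flat list of samples rather than a Tensor.

```python
import math
from statistics import median


def nccf(frame, shifted):
    """Normalized cross-correlation between a frame and a lagged copy of it."""
    num = sum(a * b for a, b in zip(frame, shifted))
    den = math.sqrt(sum(a * a for a in frame) * sum(b * b for b in shifted))
    return num / den if den > 0 else 0.0


def detect_pitch_sketch(samples, sample_rate, frame_time=1e-2,
                        win_length=30, freq_low=85, freq_high=3400):
    """Hypothetical helper: one pitch estimate (Hz) per frame."""
    frame_size = math.ceil(sample_rate * frame_time)
    # The detectable frequency range bounds the lags that are searched.
    min_lag = max(1, int(sample_rate // freq_high))
    max_lag = math.ceil(sample_rate / freq_low)

    raw = []
    last_start = len(samples) - frame_size - max_lag
    for start in range(0, last_start + 1, frame_size):
        frame = samples[start:start + frame_size]
        scores = [
            nccf(frame, samples[start + lag:start + lag + frame_size])
            for lag in range(min_lag, max_lag + 1)
        ]
        best = max(scores)
        # Assumption: take the shortest lag scoring within 1% of the best,
        # so that multiples of the true period do not win (octave errors).
        threshold = best - 0.01 * abs(best)
        offset = next(i for i, s in enumerate(scores) if s >= threshold)
        raw.append(sample_rate / (min_lag + offset))

    # Median smoothing over a trailing window of up to win_length frames.
    return [
        median(raw[max(0, i - win_length + 1):i + 1])
        for i in range(len(raw))
    ]


# Usage: a synthetic 400 Hz tone sampled at 8 kHz for 0.2 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(1600)]
pitches = detect_pitch_sketch(tone, sample_rate)
print(pitches[:3])
```

For this pure tone every frame should report a value close to 400 Hz. On real speech the per-frame estimates are noisier, which is where the median smoothing controlled by win_length matters.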