Mostly used for feature extraction in speech recognition, it further analyzes the Fast Fourier Transform of a speech signal.
It uses a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.
The cepstrum is a common transform used to gain information from a person’s speech signal, and it can be used to separate the excitation signal (which contains the words and the pitch) from the transfer function (which contains the voice quality).
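As a rough illustration of the cepstrum described above, the sketch below computes the real cepstrum of a single frame with NumPy. The function name `real_cepstrum` and the small `eps` offset (added only to avoid taking the log of zero) are assumptions made for this example, not part of the original description.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Inverse Fourier transform of the log-magnitude spectrum of one frame.

    `eps` is an illustrative guard against log(0); it is an assumption,
    not part of the cepstrum definition.
    """
    spectrum = np.fft.fft(frame)                    # Fourier transform of the frame
    log_magnitude = np.log(np.abs(spectrum) + eps)  # log-spectrum
    return np.fft.ifft(log_magnitude).real          # back to the "quefrency" domain
```

For a scaled unit impulse the spectrum is flat, so all of the cepstral energy ends up in the first coefficient, which makes a convenient sanity check.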
The mel-frequency cepstrum differs from the standard cepstrum in its frequency bands, which are equally spaced on the mel scale; this approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum.
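To make the spacing concrete, here is a small sketch that converts between hertz and mel and places band centres at equal mel intervals. It assumes the commonly quoted formula m = 2595 log10(1 + f / 700); other mel formulas exist, and the 0 to 8000 Hz range and the ten points are arbitrary choices for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One widely used mel formula (an assumption; variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Ten points equally spaced in mel between 0 Hz and 8000 Hz
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 10)
hz_points = mel_to_hz(mel_points)

# The gaps in Hz grow with frequency: fine resolution at low
# frequencies, coarse resolution at high frequencies.
hz_gaps = np.diff(hz_points)
```

Printing `hz_gaps` shows each gap wider than the previous one, which is the non-linear spacing the mel scale is meant to provide.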
The algorithm to obtain the MFCCs is the following:
- Take the Fourier transform of a window of a signal;
- Map the powers of the spectrum obtained onto the mel scale, using triangular overlapping windows;
- Take the logs of the powers at each of the mel frequencies;
- Take the discrete cosine transform of the list of mel log powers;
- The MFCCs are the amplitudes of the resulting spectrum.
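
The steps above can be sketched for a single frame using only NumPy. This is a minimal sketch under stated assumptions, not a reference implementation: the Hamming window, the 26 filters, the 13 retained coefficients, the mel formula, the unnormalised DCT-II, and all function names are illustrative choices that the steps themselves do not specify.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Assumed mel formula; other variants exist
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular overlapping filters with edges equally spaced in mel."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        # Triangle: the lower of the two slopes, floored at zero
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(x):
    """Unnormalised type-II discrete cosine transform of a 1-D array."""
    n = x.size
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13, eps=1e-10):
    windowed = frame * np.hamming(frame.size)      # window the signal (assumed Hamming)
    power = np.abs(np.fft.rfft(windowed)) ** 2     # Fourier transform -> power spectrum
    bank = mel_filterbank(n_filters, frame.size, sample_rate)
    mel_power = bank @ power                       # map powers onto the mel scale
    log_mel = np.log(mel_power + eps)              # logs of the mel powers
    return dct_ii(log_mel)[:n_coeffs]              # DCT; keep the first amplitudes

# Example: one 512-sample frame of a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(512) / sample_rate
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sample_rate)
```

In practice a full signal would be split into overlapping frames and `mfcc_frame` applied to each; established libraries also differ in normalisation and filter details, so their numbers will not match this sketch exactly.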
tags: signal-processing