MFCC - Mel-Frequency Cepstrum Coefficients

Mostly used for feature extraction in speech recognition, MFCCs are obtained by further analyzing the Fast Fourier Transform (FFT) of a speech signal.

They rely on a mathematical transformation called the cepstrum, which computes the inverse Fourier transform of the log-spectrum of the speech signal.
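The definition above can be sketched in a few lines of NumPy. This computes the *real* cepstrum, which uses only the magnitude of the spectrum and ignores phase. The function name `real_cepstrum` and the small `eps` offset that keeps the logarithm finite are illustrative assumptions, not part of this note.

```python
import numpy as np


def real_cepstrum(x, eps=1e-12):
    """Inverse Fourier transform of the log magnitude spectrum of x."""
    spectrum = np.fft.fft(x)
    # eps keeps log() finite where the spectrum is (numerically) zero
    log_magnitude = np.log(np.abs(spectrum) + eps)
    # log|X| is real and even for a real x, so the inverse FFT is real
    # up to rounding error; keep only the real part
    return np.fft.ifft(log_magnitude).real


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)   # stand-in for one frame of speech
c = real_cepstrum(frame)
print(c[:5])
```

The index of the cepstrum is called *quefrency* and is measured in samples, like time.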

The cepstrum is a common tool for extracting information from a person’s speech signal. It can be used to separate the excitation signal (which carries the pitch) from the vocal-tract transfer function (which shapes the spectral envelope, and thus the phonetic content and the voice quality).
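The separation works because filtering an excitation multiplies their spectra, and the logarithm turns that product into a sum. The synthetic sketch below illustrates this. The impulse train (the excitation) and the decaying impulse response (standing in for the vocal tract) are assumptions chosen for illustration, as are the 8 kHz sample rate and the cutoff of 20 samples. No analysis window is applied, to keep the example minimal.

```python
import numpy as np


def real_cepstrum(x, eps=1e-12):
    """Inverse FFT of the log magnitude spectrum (eps keeps log finite)."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)).real


sample_rate = 8000   # Hz (assumed)
pitch_period = 80    # samples between pulses -> 100 Hz pitch (assumed)
n = 1024             # frame length in samples

# Excitation ("source"): a periodic impulse train that carries the pitch
excitation = np.zeros(n)
excitation[::pitch_period] = 1.0

# Transfer function ("filter"): a short decaying impulse response standing
# in for the vocal tract; it only shapes the smooth spectral envelope
impulse_response = 0.9 ** np.arange(50)
speech_like = np.convolve(excitation, impulse_response)[:n]

c = real_cepstrum(speech_like)

# Convolution becomes addition in the cepstrum: the filter's contribution
# sits at low quefrencies, and the excitation appears as a peak at the
# pitch period.
min_quefrency = 20   # skip the low-quefrency envelope region (assumed cutoff)
peak = min_quefrency + int(np.argmax(c[min_quefrency : n // 2]))
print(peak, sample_rate / peak)
```

Keeping only the low-quefrency coefficients (liftering) retains the envelope, and that envelope information is what MFCCs summarize.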

The mel-frequency cepstrum differs from the standard cepstrum in its frequency bands, which are equally spaced on the mel scale. This approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the standard cepstrum.
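The sketch below shows what "equally spaced on the mel scale" means in hertz. It assumes the common `2595 * log10(1 + f / 700)` mel formula. Other variants exist, and this note does not specify one. The 0 to 8000 Hz range and the choice of ten edges are also arbitrary.

```python
import numpy as np


def hz_to_mel(f_hz):
    """One common mel-scale formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


# Ten band edges equally spaced in mel between 0 Hz and 8 kHz
mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 10)
hz_edges = mel_to_hz(mel_edges)
print(np.round(hz_edges))   # edges crowd together at low frequencies
print(np.diff(hz_edges))    # bands get wider as frequency rises
```

With this formula, 1000 Hz maps to roughly 1000 mel. Above that point, equal steps in mel span more and more hertz, which mirrors the ear's coarser resolution at high frequencies.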

The algorithm to obtain the MFCCs is the following:

1. Take the Fourier transform of a window of a signal;
2. Map the powers of the spectrum obtained onto the mel scale, using triangular overlapping windows;
3. Take the logs of the powers at each of the mel frequencies;
4. Take the discrete cosine transform of the list of mel log powers;
5. The MFCCs are the amplitudes of the resulting spectrum.
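The steps above can be sketched for a single frame with NumPy and SciPy. Several parameters are assumptions rather than details from this note: the Hamming window, the 26 filters, the 13 kept coefficients, the `2595 * log10(1 + f / 700)` mel formula, and the orthonormal DCT-II. The helper names `mel_filterbank` and `mfcc` are hypothetical. Real feature extractors typically also add pre-emphasis, overlapping frames and liftering, and libraries differ in these details.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points: each triangle spans three consecutive points
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    n_fft = len(frame)
    # Step 1: Fourier transform of a windowed frame -> power spectrum
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    # Step 2: map the powers onto the mel scale with triangular filters
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    # Step 3: log of the energy in each mel band (floor avoids log(0))
    log_mel = np.log(np.maximum(mel_energies, 1e-10))
    # Steps 4-5: DCT of the log mel energies; keep the first coefficients
    return dct(log_mel, type=2, norm="ortho")[:n_coeffs]


sample_rate = 16000
t = np.arange(512) / sample_rate
coeffs = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(coeffs)
```

The DCT decorrelates the overlapping band energies. Keeping only the first coefficients retains the smooth spectral envelope and drops fine detail such as pitch harmonics.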

tags: signal-processing