Mel-frequency cepstral coefficients (MFCCs) are mostly used for feature extraction in speech recognition; they are obtained by further analyzing the Fast Fourier Transform (FFT) of a speech signal.

The method uses a mathematical transformation called the cepstrum, which is the inverse Fourier transform of the log-spectrum of the speech signal.
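
As a rough illustration of that definition, here is a minimal sketch of the real (magnitude) cepstrum using only the Python standard library. The helper names `dft` and `real_cepstrum`, the naive O(N^2) transform, and the small `eps` guard against `log(0)` are assumptions made for this example, not part of the original description; a practical implementation would use an FFT library.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), fine for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log-magnitude spectrum of a real signal x."""
    n = len(x)
    # Log-spectrum: log of the magnitude of each frequency bin.
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    # The log-magnitude of a real signal is symmetric, so the inverse
    # transform is real; only rounding noise is dropped with .real.
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

# An impulse followed by a half-amplitude echo 8 samples later.
x = [0.0] * 64
x[0], x[8] = 1.0, 0.5
c = real_cepstrum(x)
print(max(range(1, 33), key=lambda i: c[i]))  # 8
```

The echo delay shows up as a peak at index 8 of the cepstrum, which is the kind of periodic structure (for example, pitch) the cepstrum is used to expose.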

The cepstrum is a common transform for extracting information from a person’s speech signal. It can be used to separate the excitation signal (which carries the pitch) from the vocal-tract transfer function (which carries the spectral envelope, that is, the formants that distinguish speech sounds and much of the voice quality).

The mel-frequency cepstrum differs from the standard cepstrum in its frequency bands, which are equally spaced on the mel scale. This approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum.
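
To make "equally spaced on the mel scale" concrete, the sketch below converts between hertz and mel and lists band frequencies that are evenly spaced in mel. The text above does not say which mel formula is meant; this example assumes the widely used 2595 * log10(1 + f / 700) variant, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Hertz to mel, assuming the 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(f_min, f_max, count):
    """Return `count` frequencies in Hz that are equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]

# Ten points between 0 Hz and 4 kHz: dense at low frequencies, sparse at high.
for f in mel_spaced_frequencies(0.0, 4000.0, 10):
    print(round(f, 1))
```

The spacing in hertz widens as frequency rises, which mirrors the ear's coarser frequency resolution at high frequencies.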

The algorithm to obtain the MFCCs is the following:

  1. Take the Fourier transform of a windowed excerpt of the signal;
  2. Map the powers of the spectrum obtained onto the mel scale, using triangular overlapping windows;
  3. Take the logs of the powers at each of the mel frequencies;
  4. Take the discrete cosine transform of the list of mel log powers;
  5. The MFCCs are the amplitudes of the resulting spectrum.
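
The five steps above can be sketched for a single frame as follows. This is an illustrative, standard-library-only version under several assumptions that the steps do not specify: a Hamming window, a naive DFT instead of an FFT, the 2595 * log10(1 + f / 700) mel formula, 20 filters, 13 returned coefficients, an unnormalised DCT-II, and a small `eps` to avoid `log(0)`. All function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: Hamming-window the frame, return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * t / n)
                    for t, w in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Step 2: overlapping triangular filters, equally spaced in mel."""
    top = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points; filter j spans edges j-1, j, j+1.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct2(values):
    """Step 4: unnormalised type-II discrete cosine transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13, eps=1e-10):
    spectrum = power_spectrum(frame)                              # step 1
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    mel_energies = [sum(w * p for w, p in zip(weights, spectrum))
                    for weights in bank]                          # step 2
    log_energies = [math.log(e + eps) for e in mel_energies]      # step 3
    return dct2(log_energies)[:n_coeffs]                          # steps 4-5

# One 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc(frame, sample_rate=8000)
print(len(coeffs))  # 13
```

Keeping only the first few DCT coefficients is a common convention rather than something the steps require; the low-order coefficients summarise the smooth spectral envelope, while higher ones capture finer detail.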

tags: signal-processing