Speech Analysis - Linear Predictive Coding (LPC) vs. The Cepstrum

According to the basic model for speech synthesis, speech is composed of an excitation sequence linearly convolved with the impulse response of the vocal tract. Cepstral deconvolution attempts to separate the excitation from the vocal tract transfer function without making any of the assumptions that linear prediction requires. Because the deconvolution process makes no assumption about the statistics of the excitation, it should be possible to obtain a transfer function that shows the effects of both poles and zeros. We may therefore view the cepstrum as an alternative method of system modelling.
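The convolution model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the original article: the pulse-train excitation, the single-resonance impulse response, the frame length and the FFT size. It only shows that a frame built by linear convolution has a spectrum equal to the product of the excitation spectrum and the vocal tract spectrum, which is the property cepstral deconvolution relies on.

```python
import numpy as np

# Hypothetical excitation: a pulse train with an 80-sample pitch period.
excitation = np.zeros(400)
excitation[::80] = 1.0

# Hypothetical vocal tract impulse response: one decaying resonance.
n = np.arange(64)
impulse_response = (0.95 ** n) * np.cos(0.3 * np.pi * n)

# Speech frame as the linear convolution of excitation and impulse response.
speech = np.convolve(excitation, impulse_response)

# With an FFT long enough to hold the whole linear convolution
# (400 + 64 - 1 = 463 samples <= 512), the spectrum of the frame
# equals the product of the two individual spectra.
nfft = 512
X = np.fft.rfft(speech, nfft)
E = np.fft.rfft(excitation, nfft)
H = np.fft.rfft(impulse_response, nfft)

product_matches = np.allclose(X, E * H)
print("X == E * H:", product_matches)
```

If the FFT were shorter than the convolved frame, the frame would be truncated and the equality would no longer hold exactly, which is why the sketch pads to 512 points.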
The real cepstrum is defined as

$$c(n) = \mathcal{F}^{-1}\left\{ \log \left| \mathcal{F}\{x(n)\} \right| \right\}$$

where $x(n)$ is the speech frame and $\mathcal{F}$ denotes the Fourier transform. A natural or base 10 logarithm is typically used for most applications, but in principle any base may be used. The logarithm is the key step of the whole operation: the convolution of the excitation $e(n)$ with the vocal tract impulse response $h(n)$ is a product in the frequency domain, and taking the logarithm turns that product into a linear combination:

$$\log|X(\omega)| = \log|E(\omega)| + \log|H(\omega)|$$

Thus, the vocal tract spectrum, $\log|H(\omega)|$, appears as an additive term that can be separated from the excitation.

Figure 1 - 12th order LPC vs. the Real Cepstrum

Analyzing Figure 1, notice that both linear prediction (blue) and the cepstrum (red) model the original spectrum (green) reasonably well. On closer examination, however, the cepstral spectral envelope shows some detail in the minima (between the peaks around the 2 kHz region) that the linear prediction spectral envelope fails to reproduce. This is as expected, because cepstral deconvolution makes no assumption of an all-pole model. The processed frame of voiced speech contains a mixture of both poles and zeros, which the cepstrum can represent better than the industry standard linear prediction technique.
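A comparison of this kind can be sketched as follows. This is not the article's implementation: the low-quefrency cutoff `n_keep`, the FFT size, the Hamming window, the autocorrelation method for the LPC coefficients and the synthetic test frame are all assumptions chosen for illustration. The cepstral envelope is obtained by keeping only the first few cepstral coefficients (a common approach usually called liftering) and transforming back to the frequency domain, and the LPC envelope comes from a 12th order all-pole fit.

```python
import numpy as np

def real_cepstrum(frame, nfft=1024):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame, nfft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

def cepstral_envelope(frame, n_keep=30, nfft=1024):
    """Log-magnitude envelope from the low-quefrency cepstral coefficients."""
    c = real_cepstrum(frame, nfft)
    lifter = np.zeros(nfft)
    lifter[:n_keep] = 1.0
    lifter[nfft - n_keep + 1:] = 1.0  # mirror image, keeps the cepstrum even
    return np.fft.fft(c * lifter).real[: nfft // 2 + 1]

def lpc_envelope(frame, order=12, nfft=1024):
    """Log-magnitude envelope of an all-pole fit (autocorrelation method)."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])                  # predictor coefficients
    gain = np.sqrt(r[0] - a @ r[1:])               # sqrt of prediction error energy
    inverse_filter = np.concatenate(([1.0], -a))   # A(z) = 1 - sum_k a_k z^-k
    return np.log(gain / np.abs(np.fft.rfft(inverse_filter, nfft)))

# Hypothetical voiced frame: pulse train through one resonance, plus weak noise.
rng = np.random.default_rng(0)
excitation = np.zeros(400)
excitation[::80] = 1.0                             # 80-sample pitch period
n = np.arange(64)
h = (0.95 ** n) * np.cos(0.3 * np.pi * n)
frame = np.convolve(excitation, h)[:400] + 1e-3 * rng.standard_normal(400)
frame = frame * np.hamming(len(frame))

cep_env = cepstral_envelope(frame)   # natural-log magnitude, nfft // 2 + 1 bins
lpc_env = lpc_envelope(frame)        # same frequency grid, all-pole model
```

The cutoff `n_keep` is set below the assumed 80-sample pitch period so that the excitation peak in the cepstrum is excluded from the envelope. Both envelopes are returned on the same frequency grid in natural-log units, so they can be plotted on top of the log magnitude spectrum of the frame for a comparison of the kind shown in Figure 1.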

Copyright © Advanced Solutions Nederland B.V., 2006-2016. All Rights Reserved.