Consider a frame of speech data, s(n), formed by convolving a vocal tract impulse response, p(n), with an excitation, e(n).
The real cepstrum, c(n), may be calculated by taking the logarithm of the magnitude of the Fourier Transform of s(n), and then taking the inverse Fourier Transform of the resulting sequence:

    c(n) = IDFT{ log |DFT{ s(n) }| }
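The calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a definitive implementation: the function name `real_cepstrum`, the optional `n_fft` argument and the small `eps` added before the logarithm (to avoid taking the log of zero) are assumptions made for this example and do not come from the text.

```python
import numpy as np

def real_cepstrum(frame, n_fft=None, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude of the DFT."""
    frame = np.asarray(frame, dtype=float)
    if n_fft is None:
        n_fft = len(frame)
    spectrum = np.fft.fft(frame, n=n_fft)           # S(w)
    log_magnitude = np.log(np.abs(spectrum) + eps)  # log |S(w)|, natural log
    # log |S(w)| is real and even, so the inverse DFT is real up to rounding.
    return np.fft.ifft(log_magnitude).real          # c(n)

# Example: cepstrum of a 256-sample frame of placeholder data.
frame = np.random.default_rng(0).standard_normal(256)
c = real_cepstrum(frame)
```

Because the log-magnitude spectrum of a real frame is real and symmetric, the resulting c(n) is also real and symmetric about n = 0.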
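A natural or base-10 logarithm is typically used, but in principle any base may be used. The logarithm is the key step in the whole operation: convolution in the time domain becomes multiplication in the frequency domain, and taking the logarithm of the magnitude turns that product into a sum (a linear combination):

    s(n) = p(n) * e(n)                      (* denotes convolution)
    S(w) = P(w) E(w)
    log |S(w)| = log |P(w)| + log |E(w)|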
Thus, the log-magnitude spectra of the vocal tract, log |P(w)|, and of the excitation, log |E(w)|, are now additive (i.e. a linear combination). Researchers reasoned that if the log spectrum were itself analysed as a ‘time signal’, the excitation would manifest itself at large values of ‘frequency’ (a high-frequency ripple), whereas the vocal tract spectral envelope would appear as a low-frequency ripple. Hence, the effects of the vocal tract and excitation may be separated. Since the original cepstral formulation computed the spectrum of the log spectrum, the units of this ripple ‘frequency’ were actually units of time. Therefore, the word quefrency (an anagram of frequency) was coined to describe the ‘frequency’ of the ripples in this new pseudo-time domain.
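The additivity described above can be checked numerically. The sketch below uses random sequences as stand-ins for p(n) and e(n) (an assumption made purely for illustration, not real speech data) and zero-pads the DFT to at least the length of the convolved signal, so that S(w) = P(w)E(w) holds at every frequency bin.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.standard_normal(32)   # stand-in for the vocal tract impulse response
e = rng.standard_normal(64)   # stand-in for the excitation
s = np.convolve(p, e)         # s(n) = p(n) * e(n), length 32 + 64 - 1 = 95

n_fft = 128                   # >= len(s), so linear convolution is preserved

log_P = np.log(np.abs(np.fft.fft(p, n_fft)))
log_E = np.log(np.abs(np.fft.fft(e, n_fft)))
log_S = np.log(np.abs(np.fft.fft(s, n_fft)))

# The log-magnitude spectra add ...
spectra_add = np.allclose(log_S, log_P + log_E)

# ... and, because the inverse DFT is linear, so do the cepstra.
c_p = np.fft.ifft(log_P).real
c_e = np.fft.ifft(log_E).real
c_s = np.fft.ifft(log_S).real
cepstra_add = np.allclose(c_s, c_p + c_e)
```

Both checks should hold up to floating-point rounding. With real speech, the two cepstral contributions additionally tend to occupy different quefrency ranges, which is what makes the separation possible.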
The spectral envelope of the vocal tract may be obtained by first multiplying c(n) by a rectangular window (lifter) of unit height, long enough to contain all the low-quefrency information pertaining to the vocal tract alone. The exact length of the lifter depends on the amount of detail required for the application and is therefore chosen empirically.
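The liftering step can be sketched as follows. Several details here are assumptions for illustration only: the function name `spectral_envelope`, the `lifter_len` parameter, the small `eps` inside the logarithm, and the choice to keep the mirrored samples at the end of the DFT buffer (the real cepstrum is symmetric, so retaining both halves keeps the result real). Transforming the liftered cepstrum back to the frequency domain to view the envelope is the usual follow-on step and is likewise assumed here.

```python
import numpy as np

def spectral_envelope(frame, lifter_len, eps=1e-12):
    """Smoothed log-magnitude spectrum from a low-quefrency lifter."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame)) + eps)
    cep = np.fft.ifft(log_mag).real          # real cepstrum c(n)

    # Rectangular lifter of unit height: keep quefrencies 0 .. lifter_len-1
    # and their mirror images at the end of the buffer.
    lifter = np.zeros(n)
    lifter[:lifter_len] = 1.0
    if lifter_len > 1:
        lifter[n - lifter_len + 1:] = 1.0

    # Back to the frequency domain: a smoothed version of log |S(w)|.
    return np.fft.fft(cep * lifter).real

# Example: a short lifter gives a coarse envelope, a longer one more detail.
frame = np.random.default_rng(0).standard_normal(256)
coarse = spectral_envelope(frame, lifter_len=10)
fine = spectral_envelope(frame, lifter_len=40)
```

A lifter of length 1 keeps only c(0) and returns a flat envelope equal to the mean log magnitude, while a lifter covering the whole cepstrum reproduces log |S(w)| unchanged; useful lengths lie between these extremes and, as noted above, are chosen empirically.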