A REVIEW OF SUBBAND CODERS FROM A PRACTICAL PERSPECTIVE
INCLUDING A NEW METHOD FOR THE DESIGN OF THE ANALYSIS AND
SYNTHESIS FILTER AS A COSINE SUMMATION
by
Carl William Glasow, Jr.
B.S.E.E., Arizona State University, 1968
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering
1996
This thesis for the Master of Science
Electrical Engineering
degree by
Carl William Glasow, Jr.
has been approved
by
@4/xr/fÂ£
Date
Hamid Fardi
Glasow, Carl William Jr. (M. S., Electrical Engineering)
A Review of Subband Coders from a Practical Perspective Including a New Method for the
Design of the Analysis and Synthesis Filter Window as a Cosine Summation
Thesis directed by Associate Professor Mike Radenkovic
ABSTRACT
Subband coders are reviewed in terms of digital signal processing fundamentals. An
actual application, music compression, is used to help describe the digital processing
techniques used. The subband coder separates the audio signal into frequency bands
each with a sampled time domain signal of primarily that frequency. A detailed technique
for designing the analysis and synthesis filters for subband coding of audio signals is also
presented. The technique expresses the window in terms of a finite Fourier cosine series
summation and permits the specification of passband ripple and stopband rejection.
Design of a proper filter provides a means for nearperfect reconstruction of the original
audio signal from the subband coded signals. Due to characteristics of human audio
response, the subband coded signals are in a form which can be better compressed
without perceptual loss of signal content to the human ear.
This thesis accurately represents the content of the candidate's thesis. I recommend its
publication.
Mike Radenkovic
in
CONTENTS
CHAPTER
1. INTRODUCTION AND BACKGROUND................................. 1
General.............................................. 1
Technical Background Audio Compression Requirements ... 1
2. SUBBAND CODER REVIEW ....................................... 4
General.............................................. 4
Decimation and Interpolation ........................ 6
Modulation Methods................................... 9
Aliasing Cancellation ...............................12
Polyphase Filters....................................14
Continually Changing Waveforms and the Short Term Fourier
Transform ...........................................16
Block Transformation and Computational Efficiencies..17
3. ANALYSIS/SYNTHESlS FILTER DESIGN............................25
Filter Requirements .................................25
Cosine Summation Problem Statement...................27
Cosine Summation Solution ...........................30
Filter Design Results ...............................32
4. APPLICATION OF THE RESULTING SUBBAND CODER TO SINE
WAVE SAMPLES AND AN ACTUAL MUSIC SAMPLE..............36
5. SUPPLEMENTARY AND ALTERNATIVE COMPRESSION
TECHNIQUES...........................................40
IV
6. SUMMARY AND ADDITIONAL RESEARCH POSSIBILITIES..........41
APPENDIX
A. LISTING OF THE SUBBAND CODER C PROGRAM.................42
B. ADDITIONAL PROGRAM LISTINGS............................47
REFERENCES.........................................................59
v
CHAPTER 1
INTRODUCTION AND BACKGROUND
General
The intention of this thesis is to provide a thorough understanding of the signal processing
methods used to compress audio by means of subband coding. Subband coding
implements a unique digital filter, involves multirate sampling methods, requires specific
modulation methods for an efficient implementation, eliminates aliasing using digital
methods rather than by prefiltering, works with a continually timevarying waveform, and
uses a specific discrete digital transform algorithm that considers both compression and
efficient implementation.
Initial attempts to understand the requirements and design method for the unique filter
were not successful. An effort to model the filter used in the MPEG (Motion Picture
Experts Group) standard for audio subband coding resulted in the development of a design
technique that specified the filter in a concise form as a cosine series summation.
Organization is as follows. As background, the need for audio compression and
applicability of this technique to audio compression is reviewed. A detailed description of
the subband coder technique follows. The filter requirements are next discussed. A step
by step procedure for determining the analysis and synthesis filter windows is presented.
Application of the technique to a music compression requirement is presented, using the
technique on an actual musical sample to verify its correctness. Supplementary MPEG
techniques are briefly discussed. Copies of the programs to design the filter, to
demonstrate usage of the subband coder, and other programs used to verify the method
are included in appendices.
Technical Background Audio Compression Requirements
The compression of audio signals has a long history including mechanical coders in the
1940's that modelled the human voice, early digitized voice compression research by
Rabiner and Schafer [11], human psychoacoustic research, and the various researches
which resulted in the modern music compression techniques developed by the MPEG
Standards [5] committees during the last fifteen years.
The need for compression in music signals is examined by comparison of the bandwidth of
uncompressed music with popular modern day transmission bandwidths and digital storage
capacities. The human ear has a bandwidth of about 20 KHz and a dynamic range of
about 100 dB, and modern music playback and recording equipment is capable of meeting
these requirements. To digitize music at this bandwidth and dynamic range requires a
Nyquist sampling rate of double the bandwidth and digital resolution of a bit per each 6 dB
of dynamic range. Modern day CD players meet this requirement by sampling at 44.2K
1
with 16 bits of resolution for each of two channels (stereo) of music. The composite
required bit rate is (2)(44200)(16) or approximately 1.4 Mbits/second. Compared with the
4 KHz bandwidth of a phone line, 28.8K modem and 64K ISDN data communication
channels, and the 100K bandwidth of an FM radio station, the need for compressing
digitized music signals is quite obvious. To transfer 3 minutes of uncompressed stereo
music over a 28.8K modem would take two hours and twenty seven minutes. The
situation in storage is not much better. To store a 2 hour stereo music program digitally
requires 5.1 gigabytes, exceeding the capacity of modern low cost disk drives. Depending
on the desired quality of the reconstructed music signals, compression ratios ranging from
6 to 1 to approximately 90 to 1 are available.
Improvements in technology will relieve most present day problems. Hard drives will
become larger, the cost per byte of memory will continue to decrease, modems will
become slightly faster, and ISDN lines more prevalent among consumers. The real
breakthrough, though, will be fiber optic access for the consumer, either supplied by the
phone companies, or by home computer and internet access over already available cable
company fiber optic lines. As this is being written, Jones Intercable has announced 100
MBit internet cable service to its Washington D.C. customers by summer 1996 for $40 a
month. Even these advances will not alleviate the need for compression though. With a
100 Mbit data line, uncompressed audio will still require two minutes to deliver a two hour
audio program. The line also needs to have available bandwidth for video, and however
fast the transfer is, consumers will always be happier with faster delivery. In general, if the
communication network of a country is at capacity, that capacity can be increased by a
factor of ten if the information is compressed by a factor of ten. If delivery requires a
specified amount of time, that delivery time can be improved by a factor of ten if the
information is compressed by a factor of ten.
Some background in the technical aspects of compression follows. Most compression
techniques depend on the fact that not all codes are used equally, or that data is highly
correlated with adjacent data. Text data on a computer, for example, is encoded in 8 bit
ASCII, accommodating 256 different characters. In typical text, 15 to 20 letters comprise
an estimated 90% of the content, easily providing 10 to 12 compression even in the
absence of a lot of white space. Faxes take advantage of the large amount of white space
on most pages to effectively compress data. Videoconference equipment takes advantage
of the fact that people and surroundings do not move much when in a conference. Video
motion picture transmission also provides a high degree of correlation since it's highly
likely that a pixel is the same or nearly the same value as the pixels above, below, to the
right and left, and for several frames prior and following. Correlation in three dimensions is
thus possible. In fact, video requiring 24 bits per pixel to represent brightness and color,
and 525 by 400 pixels on a television screen updated at 30 frames per second requires an
uncompressed data rate of 150 Mbits per second. With correlation of data in three
dimensions, though, compression ratios between 100 and 1000 are achievable, depending
on the level of action.
Speech compression, a subset of the audio compression problem, also can provide a high
level of compression. It does not usually require stereo, the human voice is usually limited
to 4K bandwidth, and compression techniques that model the human vocal tract provide
reasonable quality of an actual human voice at 2500 bits per second, Rabiner and Schafer
2
[11]. For a simulated voice containing no emotional information, the limited number of
vowels and consonants in human speech can each be modelled with 6 bits and if typical
speaking rates occur at 3 sounds per second, speech can be compressed to 200 bits per
second.
The audio music signal, however, does not provide as high a level of correlation. Music
represents the full breadth of vocal and instrumental expression that can be heard by the
human ear. If sound occurs at up to 20 KHz and is sampled at 44 KHz, and the dynamic
range is 100 dB, there is little probability that the next sample can be accurately predicted.
The signal only occurs in one dimension, and that dimension is poorly correlated in time.
Neither differential encoding or adaptive differential coding are very effective in this signal
environment. Musics one correlative characteristic is that it typically consists of a limited
number of frequencies, and these frequencies change at a much slower rate that 44.2
KHz.
Slowly changing frequency content with a limited number of frequencies suggests the use
of an adaptive predictor. The frequency content and phase, however, change at a fast
enough rate, such that a predictor has a difficult time keeping up while maintaining
sufficient accuracy. For five to one compression, the filter would have to predict the next
16 bit value with 3 bit accuracy, an extremely difficult task. Additionally, depending on the
number of coefficients predicted, audible lower amplitudes of unpredicted frequencies
would be left out.
The subband coding technique, in contrast, takes advantage of the physical characteristics
of human hearing, by providing at least some information in all frequency bands. Studies
of human auditory response demonstrate that the human ear can hear multiple frequencies
of a wide dynamic range if the separation in frequency is high enough, Pan and Shlien, [9]
and [13]. However, the ear cannot hear two frequencies that are close together, if one is
higher in amplitude than the other. The ear also requires a higher level of signal for high
frequency signals than for low frequency signals. For example, a 20 KHz signal needs to
be 40 dB higher in amplitude than a 1 KHz signal at minimum hearing threshold levels.
The first hearing characteristic suggests that the signal can be coded into a finite number
of frequency bands without loss of audible information, as long as the bands are a width
that agrees with human frequency discrimination capability. The second hearing
characteristic suggests that once the information is divided into bands, the higher
frequencies can be coded with significantly less bits than the lower frequencies.
The subband coder is an extension of the Quadrature Mirror Filter originated by Croisier
and Esteban [3], a twochannel subband coder that has been adopted by the CCITT as
Recommendation G.722. A signal processing implementation of Recommendation G.722
as described by Analog Devices [4] is used to add an auxiliary channel to a voice channel
by dividing an 8 KHz channel into two 4 KHz channels using overlapping bands.
Rothweiler [12] and Chu [1] described the method by which the Quadrature Mirror Filter is
extended to multiband use.
3
CHAPTER 2
SUBBAND CODER REVIEW
General
For the subband coder to be useful it needs to have a flat frequency response, it must be
capable of operating on a constantly changing signal, and needs to provide near perfect
reconstruction of the original signal from the subband coded data. The ideal subband
coded information should also be less than or equal the bit rate of the original signal,
called the critical sampling frequency. At first look, these requirements would require a
perfect brickwall bandpass filter for each frequency band. If the filter is narrower than the
band, information is lost at the band borders and the frequency response is not flat. If the
filter is wider than the band, then the coded band information requires a higher bit rate
than the original signal. The solution is to have overlapping bands, where the combination
of information from two adjacent bands can be used to reconstruct the signals at the band
borders. This can be accomplished while sampling the subband coded signal at the same
frequency as the original signal. However, information outside the band, when sampled at
the critical Nyquist frequency of.the band will exhibit aliasing. Attempted recovery of an
aliased signal will result in interpretation as an incorrect frequency within the band instead
of the original out of band frequency. An additional requirement is to design a transform
process and a filter such that any aliased terms are cancelled.
The filter design should also have a finite number of coefficients. The ideal set of
coefficients for a brick wall filter can be determined by taking the Fourier transform of the 
ideal rectangular frequency response. The transform result defines the coefficients in the
time domain. A set of n successive coefficients define the multiplying factor for n
successive sampled values of the time domain signal. The Fourier transform of a
rectangular frequency function is the sin(co)/co function shown in Figure 2.1. The function is
plotted to 8rc with zero crossings at nji for n an integer not equal to 0. The actual
Fourier transform is infinite.
Although this function decreases in value the farther it is from its center maximum, the
signal is still not finite, meaning that the ideal filter has an infinite number of coefficients.
Simply zeroing the coefficients beyond a certain time results in an effect called the Gibbs
phenomena, where any ripple due to the filter truncation causes an oscillation in the
frequency domain near the cutoff frequency. This oscillation does not decrease in
amplitude, but increases in frequency as the cutoff frequency is approached.
This problem can be minimized by multiplying the truncated sin(
term by term by a windowing function which decreases the outside coefficients gradually to
zero as they approach the truncation boundary. Several popular window techniques are
described in digital signal processing textbooks, including the Hanning, Hamming,
Blackman, and Kaiser windows. These windows tradeoff complexity, passband and
stopband ripple characteristics, and rolloff characteristics to meet differing application
4
Figure 2.1 sin(co)/co
requirements, and approximate the ideal brickwall filter with a finite number of coefficients.
The window function therefore modifies the coefficients of an ideal filter to provide a filter
with desirable characteristics. The subband coder filter application requires in addition to
the application of a textbook window, the design of a window that both eliminates the
aliased terms in an overlapping filter, and permits, when the subband coder filter (the
analysis filter) is combined with the reconstruction filter (the synthesis filter), near perfect
reconstruction of the original signal.
Prior to discussing the filter design, an understanding of the subband coder is helpful. This
requires defining the method by which the signals are separated into bands so they can be
filtered, defining the number of bands, defining the width of the filter, discussing how a
timevarying signal is handled, and defining an efficient mathematical technique for
implementing the filters.
The first step is separating the input signal into different frequency bands. This is
accomplished by modulating the signal with equally spaced sine or cosine functions. This
is classic frequency modulation and for each modulating signal results in signals equal the
sum and the difference between the input signal and the modulating signal.
Mathematically, this is illustrated easily by the basic trigonometric identity
cos(a) cos(b) = .5 cos(a+b) + .5 cos(ab), (2.1)
5
where signal a is the input signal and signal b is the modulating signal.
Since several bands with finite spacing are provided, the resulting difference between the
signals can be filtered with a baseband filter equal the width of the band. (The sum term
is far outside this filter bandwidth, is of no interest, and is filtered out). If the bands are
equally spaced, the same baseband filter, called a prototype filter, can be shared
mathematically by all bands. The number of signals to be filtered equals the number of
bands. Additionally, the bandwidth of the narrow band filters is much less than the
bandwidth of the full original signal bandwidth. The reduction in filter bandwidth is by a
factor equal the number of bands. For example, using simple numbers, if a signal with 16
KHz bandwidth is sampled at 32 KHz and 32 bands are used, the modulating signals are
500 Hz apart and the baseband filter operating on each of the modulated signals has a
nominal bandwidth of 500 Hz.
Decimation and Interpolation
The reduction in bandwidth means also that the sample rate can be reduced accordingly.
For the example above, the Nyquist sampling rate is 32 KHz for the original signal and 1
KHz for each of the modulated signals. Since there are 32 modulated signals to be
sampled, the composite sampling rate is still 32 kHz, meeting the critical sampling
frequency requirement.
6
The technique of sampling at a lower rate is called downsampling. The downsampling
filter is called a decimation filter. Reconstruction of the signal involves the opposite of this,
upsampling, and the upsampling filter is called an interpolation filter. The downsampling
filter is called a decimation filter because it uses only every nth input sample (every 32nd
sample for the example above), decimating the filter. The interpolation filter, in contrast,
takes a decimated signal, zero fills the intermediate samples, and by filtering, interpolates
the signals between the decimated points to recover the original signal.
Figure 2.2 illustrates the decimation and interpolation of a sine wave by a factor of 4. The
original waveform was sampled four times more often than shown. The waveform can be
recovered by interpolation, setting the intermediate values to zero as shown and filtering
properly. A well designed filter can recover the original intermediate samples.
Decimation and interpolation filters introduce aliased terms due to characteristics of the
sampling action. These further complicate the requirement to cancel aliased terms. When
a signal is low pass filtered by some factor prior to decimating by that same factor, the
decimation creates duplicate images of that low pass frequency response across the full
frequency domain. This is easily demonstrated by plotting the frequency response of both
a low pass filter and of a decimated version of that low pass filter.
The plots in Figure 2.3 are generated by a program that creates a low pass filter with
bandwidth je/8.5 (a bandwidth of n provides an all pass filter), windows it with a Hanning
window, decimates the result by 8, and performs an FFT on both the original filter and the
decimated filter.
The notches are created by designing a filter with a bandwidth slightly less than the full
decimated bandwidth for the purpose of band border illustration.
Using the same low pass filter for interpolation as for decimation eliminates the other
images generated during the decimation operation. However, since the filter is not perfect,
the edges of adjacent images are not completely eliminated. A nonideal bandpass filter
attempting to capture one of the middle spectral images above would not be able to
eliminate the edges of the adjacent images. This is the aliasing effect of the decimation
and interpolation operation. The difference between a simple filter application and the
subband filtering application is that normally the out of band information would be
significantly attenuated prior to sampling. This is accomplished either by use of a good
attenuator, or by oversampling so that a less complex filter may be used. In the multiband
filter application, the adjacent bands contain necessary information, and with the
overlapping filters proposed, the adjacent band information can not be substantially
attenuated.
Another characteristic of decimation filters, a quite useful one, significantly reduces the
amount of computation in the subband filter application. If an input signal is multiplied by
another function prior to decimation, the computation can be performed on the decimated
signal after the decimation instead, and only on the decimated samples instead of all input
samples. This means that for a particular band, the modulation multiplication and the
analysis filter window coefficient multiplication can be performed after the decimation only
on the decimated samples. The interpolation filter exhibits similar features.
7
H(u) AFTER DECIHATION H(u)
U
Figure 2.3. Aliasing effect of decimation and interpolation.
8
Modulation Methods
The next topic of interest is the modulation technique used to extract the information in
each frequency band. There are several methods for spacing the modulation frequencies
in the frequency band. Depending on the method used, complex data may result in
doubling the amount of compressed data, the bottom and top bands may not be handled
ideally, the transform size may not be optimal, or an efficient transform may not be
realizable. Several modulation methods are briefly discussed to aid in understanding the
particular modulation method chosen for the subband coder application.
B B.BB2 B.BB4 B.BB6 B.BBB B.B1 B.B12 0.014 B.B16
t
Figure 2.4 Integer band modulation
The decimation and interpolation operation itself is a form of modulation and demodulation.
Sampling a frequency band at a decimated rate translates that frequency down to the
baseband (assuming the information was bandpass filtered prior to the decimation.)
Intuitively this can be grasped by imagining a high frequency sine wave being sampled at
a much lower rate than the Nyquist sampling frequency. If the sampling rate is not an
exact submultiple of the original frequency, the samples, although they skip several
periods, will create a lower frequency sine wave. The number of periods skipped is
related to the frequency band that is translated. Figure 2.4 shows an 1100 Hz sine wave
sampled at both 32 KHz and at 1 KHz. The points marked X show the 1 KHz sampling. A
baseband filter recovery using the X samples results in the 100 Hz modulated waveform
shown.
9
As seen in Figure 2.3 illustrating aliasing due to decimation and interpolation, interpolation
of this downsampled signal by zero filling to increase the sample rate, creates multiple
spectral images. Instead of low pass filtering the result, one of the higher frequency
images may be captured by bandpass filtering the result. The original information is thus
translated up to the original frequency band by simple interpolation (or to another band if
desired) without having to multiply the signal by a cosine or sine modulation term. A
digital bandpass filter on the input and output are all that are required. This simple
modulation technique is called integer band decimation and interpolation.
The limitations of this technique are the pre and post filtering requirement, and that the
bands are fixed as shown in Figure 2.3. If the desired center frequency of the bands is
not as shown, a different modulation technique is required. For example, having half
bands for the high and low bands may be undesirable.
Modulation using the Discrete Fourier Transform (DFT) provides a general purpose
modulator, and at the same time provides an efficient mathematical method for handling
multiple bands at the same time. The DFT equation is:
where a is the desired modulation frequency, x(n) are the M samples, X(co) is the
transform result, j is the imaginary operator, and per Eulers equation:
If M is a power of 2, the Fast Fourier Transform (FFT) may be used to further minimize
computation.
The values of to, or center frequency values, can be any equally spaced frequency values,
250 Hz, 750 Hz .... for example. The first value of to is not required to be 0. As long as
the frequencies are equally spaced, the DFT can be efficiently implemented.
As seen from equation (2.3) though, the general DFT creates both real and imaginary
terms. This technique is called quadrature modulation because both the cosine and sine
terms each create a sum and difference transform term, or four terms total. The result can
be low pass filtered eliminating the sum terms, but two terms are still required for each
frequency band. If these terms are kept as compression terms, twice as much information
must be transmitted or stored compared to a method that requires only real terms for each
band.
Single sideband (SSB) modulation provides this method. If the original time signal to be
modified is represented by a complex value and its conjugate value, multiplying the
complex value by exp(jcon) and its conjugate by exp (jeon) and summing them would result
in a real term. Creating the complex term and its conjugate can be accomplished by
multiplying the signal by cos(co,n) + j*sin(co1n), and cosfo^n) j*sin(co1n). With some
straightforward mathematical manipulation, the multiplication and summation result in the
(2.2)
n=0
e'in = cos(ajn) j*sin(ton).
(2.3)
10
input signal being multiplied, or modulated, by cos[(a>, +
to A(o/2, the width of the band divided by 2, the to, terms form a set of equally spaced
center modulation frequencies. A comparison of standard quadrature modulation and SSB
modulation with discrete Fourier transforms is instructive. A standard Discrete Fourier
Transform utilizes e'in and results in a complex frequency spectrum. A variation of the
DFT, the Discrete Cosine Transform, multiplies the series by a cosine term instead and
produces a real spectrum. Similarly, the SSB cosine multiplication produces a real
transform output.
SSB modulation also permits selection of an initial offset frequency, as with quadrature
modulation. Selection of the resulting sideband frequencies and the width of the subbands
still needs to be addressed. There are tradeoffs related to band alignment, number of
bands for a particular transform size, and computational efficiency. Figure 2.5 from
Crochiere and Rabiner [2] shows three cases. In each case, eight frequencies are equally
spaced around the unit circle and require the same transform size. In the first case the
first frequency is 0, and since the frequencies on the bottom of the circle are conjugate to
those on the top half, the resulting non duplicate bands are three full bands and a half
band at minimum and maximum frequency. In the second case, where the bottom band is
centered at 1/2 band from the origin (tc/8 for the example), the bands are also duplicated.
There are now at least four full bands and no half bands. In the third case, the first band
is offset by V* band. In this case, the frequencies on the bottom half of the circle are offset
from those on the top half. If alternating bands from the top and bottom half are used, the
bands may be reduced in half and eight non overlapping, non duplicated bands with
bandwidth rc/8 are available for the same transform size as for the other two cases. This
case, however, does not lead to a computational approach that lends itself to a fast
transform, although Crochiere and Rabiner [2] state that by using quadrature modulation
and proper handling of the imaginary terms an efficient implementation can be achieved.
The second implementation lends itself more readily to an efficient implementation, using
only real numbers, although it requires twice the transform size for the same number
bands. This is the approach used in the MPEG specification. The modulation frequencies
for an M band system are:
m[i] = 2rc(i + 1/2)/2M, i = 0 to M1 . (2.4)
For the four band system in Figure 2.5, for example, a transform size of 8 is required and
the first band
by the MPEG specification, requires a transform size of 64 and the first band is at re/64.
The resulting transform will be of the general form:
s[i] = x[n] cos[7t(2i+1 )(n+n0)/2M], (2.5)
n=0
where s[i] are the subband values, i is the band number from 0 to M1, and x[n]
are the input samples, and N is the number of samples transformed.
11
Ke.KjO
(4I) SSB CHANNELS
2 2
K* B.ko* V4
K SSB CHANNELS
0 7 16 2 5 3 4
w6 <**2 uij ug w4ir
Figure 2.5 SSB modulation options
The value of n0 will be addressed in the subsequent discussions on aliasing and block
transformations.
Aliasing Cancellation
A graphical explanation is helpful in describing aliasing cancellation in the multichannel
subband coder. Figure 2.6 below is from Chu[1].
For the bandpass filter output of a full spectrum signal, Figure 2.6 shows the spectrum of
the signal at the bandpass output, following decimation, and after the bandpass
interpolation filter. The mirror images shown as the conjugate frequency spectrum are
those that result from a Fourier transformation of the bandpass signal. For a cosine
waveform, an even function, the mirror images are the same polarity. For a sine function,
an odd function, they are of opposite polarity. These images of course need to be
considered because they generate the multiple frequency bands, and the aliased terms
that need to be cancelled. The fourth image set shows aliasing terms on both the lower
and upper end of the spectrum for both the frequency band and its image.
12
0RIG1**L SPECTPU1
ORIGINAL 5PEC1RLM
CONJUGATE FREO SPECTQLM FREQUENCY SRECTRLM CONJUGATE rUEO SPECTOLM  FREQUENCY S^CTRLM
J
TILTEO AND MODULATE MITH C05INE FLECTION
M______ /E\___________
i
'pi 0 pi
nCCfMnriON INTERPOLATION SPECTRAL IMAGES
t I
PI 0 PI
DEMODULATE AND TlTER WITH COSINE FLNCTICX
I
A & A A & P[
5UBTRACT ADJACENT B BAND FROM A BAND
____/")fi_______
I A* & B* I I A & B I
FILTER AND MODULATE WITH SINCFUCTION
/b\PHA5E +PI/P
W
\_ypH
10
/PWSC Pl/p
f*SE *PI/2
DECIMATION INTERPOLATION SPECTRR. IMAGES
01/7\ M M M M M M /M
P5C PI/2
DEMODULATE AND TILTCR WITH SINE rt/'CTION
BA AB
0^0
Pl**5C PI
Figure 2.6 Aliasing effects of decimation as a function of modulation phase.
The bands on the left are cosine modulated and the bands on the right are sine modulated
with a resulting id2 phase difference. Another je/2 phase shift in the interpolation filter for
the sine modulated band results in an out of phase negative frequency image, and the
spectral images shown on the right in Figure 2.6. The decimation and interpolation
generate multiple spectral images of the same phase as the original images in the third
image set. The sine function can also be considered a multiplication of the positive
frequencies by j and of the negative frequencies by j. The right side main positive and
negative frequencies are multiplied by j*j and (j)*(j) respectively which both equal 1. The
nonaliased frequencies are thus simply inverted by the procedure. The aliased terms, on
the other hand, in the regions of interest, are multiplied by j*(j) or (j)*j providing a gain of
1. The main frequency images are therefore inverted and the aliased images are not.
If the two figures shown represent adjacent bands (ie. one even bands and one odd
bands) the bottom left spectrum in Figure 2.6 reveals that subtracting the sine modulated
adjacent band from the cosine modulated band results in doubling the desired spectral
information and cancelling the aliased information. It should also be noted that the aliasing
is cancelled if the adjacent bands are id2 phase related regardless of the exact modulated
phases.
Chu [1] and Rothweiler [12] both present a mathematical derivation that demonstrates the
aliasing cancellation. The derivation is not repeated here but is briefly discussed. The
derivation describes the decimation in the frequency domain as a convolution of a periodic
train of delta functions with the filtered input. Expressions are derived for the frequency
13
response in adjacent bands, multiplied by cosine or sine terms as applicable, main
frequency and aliasing terms are identified, and the upper side and lower side of the two
adjacent bands combined to show the aliasing cancellation. The derivation also leads to
filter design requirements including a symmetry requirement and a constant square
summation for adjacent band contributions at all frequencies covered by the adjacent
bands.
Unfortunately multiplying one set of channels by a cosine function and the other by a sine
function requires two different kinds of filters. This means the efficient Discrete Transform
implementation hoped for is lost. However, Rothweiler [12] has shown that if a 45 degree
phase shift (or id A phase shift) is added (or subtracted) to both the cosine and sine terms,
the same filter can be used if some care is taken handling a polarity change for a part of
the term. The effect of subtracting the idA term is shown in the simple trigonometric
equations:
cos(x idA) = (1//2)*[cos(x) + sin(x)], and (2.6a)
sin(x id A) = (1A/2)*[cos(x) sin(x)]. (2.6b)
Defining n0 in equation (2.5) appropriately, the resulting subband transform is:
s[i] = ^ x[n] cos[jt(2i+1)(n/2M) id A)],
n=0
giving the following cosine terms for the first four subbands:
= 0: cos(1rcn/2M idA) =
= 1: cos(3rcn/2M idA) =
= 2: cos(5rcn/2M idA) 
= 3: cos(7jtn/2M idA) =
(1/V2)*[cos(jtn/2M) + sin(nn/2M)] = e^4,
(1//2)*[cos(7in/2M) sin(7tn/2M)] = e'31'4,
(1//2)[cos(nn/2M) + sin(wi/2M)] = e'51'4,
(1//2)*[cos(nn/2M) sin(nn/2M)] = e14.
(2.7)
(2.8a)
(2.8b)
(2.8c)
(2.8d)
The result of equation (2.8) demonstrates both the id2 phase shift required between bands
for aliasing cancellation, and a need to take care of a minus sign for certain bands. An
integer substitution for n that permits n to be specified in groups of 2M permits the
inversion to be handled by a partial filter inversion, and will be described in the block
transformation discussion.
A similar id2 phase shift in the synthesis process will provide an end to end transform
inversion of alternate bands where the aliased information is finally subtracted out.
Polyphase Filters
When, as in the MPEG subband coder example, there are 32 filters, an implementation
that decimates the input samples and handles them in blocks of 32 must be defined. If the
implementation is pictured as a one sample fixed phase delay, the delay can be
represented by the delay operator e'sT, where T is the single sample delay time. This
delay term is of the form of the Discrete Fourier Transform operator, and suggests that a
14
transform operation is applicable that uses all samples. The samples can then be handled
in blocks, where the block size is equal the decimation ratio. The selection of the sample
is often portrayed as a commutation operation, or a rotating selector, where each
successive sample is part of a different decimation group. The decimation property that
permits multiplication by a window function or modulation function after the decimation is
now utilized. The full convolution of an input signal with a filter window and modulator
would normally be shown as:
s[i]=ib x(Tn]*h[n]*cos[(2i+1)(njt/2M tc/4)]. (2.9)
n=0
This equation indicates that each input sample needs to be multiplied by each window and
modulation point. Shifting the samples by the window as a block permits samples to be
lined up with particular window coefficients, and multiplication by the modulation function
can now be defined by a transform size less than the full size of the window. This
capability permits a much more efficient implementation of both the windowing and
modulation. The one sample fixed phase delay gives the subband coder an alternate
name, the polyphase filter. The implementation also permits use of all input terms in a
multiband filter, as opposed to using one set of decimated terms. Use of the full set of
input samples intuitively provides more information about the signal when considered in its
original form, a single band full bandwidth signal. Figure 2.7 from Crochiere and Rabiner
[2] illustrates the polyphase filter implementation. Mitra and Kaiser [7] also describe the
polyphase filter in their Handbook of Signal Processing.
Figure 2.7 Polyphase filter block diagram.
15
The one sample difference between adjacent samples together with the cosine modulation
operation lends itself to a mathematical implementation of the composite filter using a
variation of the Fast Discrete Cosine Transform (FDCT). Its important to note the usage
of the FDCT in this application is different than the more common use. The FDCT, like the
Fast Fourier Transform (FFT) is typically used to transform timedomain data to the
frequency domain. The output in this typical application is a spectrum representing the
magnitude of each frequency band. In the subband filter application, the FDCT is used as
a mathematical construct to transform a single timedomain signal to multiple timedomain
signals, one for each frequency band. For the music compression example, a 4800 Hz
signal modulated by 4750 Hz in a 4500 Hz to 5000 Hz band, results in a 50 Hz signal in
that band sampled at a 1 KHz rate, rather than simply a magnitude signal indicating
presence of signal activity in that frequency band. The sampled timedomain signal also
provides phase information with non complex data, unlike the normal FFT frequency
domain output which requires complex data to maintain phase information, and unlike the
normal DFCT frequency domain output which, although all output is real numbers, contains
no phase information. The resulting subband signals are not pure decimated and
modulated time domain signals, however. Adjacent frames of subband data are shifted in
phase by n/2 radians, as required for aliasing cancellation.
The polyphase filter similarly results in a more efficient implementation of the synthesis
filter.
Continually Changing Waveforms and the Short Term Fourier Transform
The shortterm Fourier transform describes the method by which changing waveforms are
transformed on a sliding and overlapping basis to provide a smoothly changing and
accurate transformation of a time domain signal into the frequency domain, or separation
of the time domain signal into several time domain channels of different frequency content.
The first case applies to spectrum analysis of a changing signal, the second to the
subband coder application. Rabiner and Schafer [11] provided an early exploration of this
subject as it applies to speech waveforms.
The technique slides the waveform across a window in sample frames. The window is
wider than the sample frame. In the music application, the window contains 511 points, 32
samples are shifted in each frame time, a new transform is performed each 32 samples,
and 16 frames are always active. The 32 samples agree with the decimation factor and
with the number of subbands. Because of the midpoint maximum of the filter, the frame
halfway through the window is given the most weighting. Future and past frames are
given increasingly less weighting the farther they are from the middle frame. All frames
are added with their relative weighting. As new frames are shifted in and old frames are
shifted out, the effect is a continuous overlap and add of the changing input data coupled
with the filtering operation.
For the synthesis operation, the input subband information represents a decimated time
signal. The decimated time signal has rc/2 phase shifts incorporated into it for adjacent
16
bands to cancel aliasing effects. It also has a tc/2 phase shift for adjacent frames to better
capture frequencies close to the modulation frequency. Otherwise, if a 1250 Hz input were
modulated by 1250 Hz, the subband information would be stuck at a constant value,
providing no timing or amplitude information.
The synthesis filter, for each subband, multiplies the subband term by successive sections
of a fixed sine waveform as it moves from frame to frame. The frequency of the sine wave
section is related to the subband modulation frequency. This combined with the changing
subband signal provides 16 signals times 16 waveform sections of 32 samples each. The
sections are multiplied by the synthesis window and subsequently added to reconstruct the
waveform. The section in the middle is weighted more heavily by the middle maximum of
the synthesis filter. Sections prior to and after the center section are weighted less as in
the analysis filter. As the subband generated sinusoid sections are frame shifted by the
window, an overlap and add operation occurs reconstructing a time changing waveform.
In the next section, the movement of a sine wave through this analysis and synthesis
process is described including the phase shifts and partial window inversions required to
implement it.
The input samples are shifted 32 samples at a time by a 511 point window. They are
convolved with the analysis window multiplied by the cosine modulation terms. The
resulting convolution equation, implementing both the tc/2 phase shift for adjacent bands,
and the 7t/2 phase shift for successive frames is:
where x[Tn32m] are the n 512 input samples at frame time m,
h is the 512 point window,
s[m][i] are the 32 subband outputs at frame time m, and
i defines the 32 subbands from 0 to 31.
The 2*i+1 provides the n/2 radian shift between adjacent bands. The n16 term provides
the n/2 radian phase shift from frame to frame as n increases by 32 samples at a time.
The 16n/64 term provides the n/4 term required to implement the transform with a single
filter as described earlier in the discussion on aliasing cancellation.
The preceding implementation, while easy to understand, is restructured to reduce
computation, Pan [9]. The multiplications required for equation (2.10) are 512 for each of
32 bands for a total of 16384 multiplications. 511 additions are required for each band, or
16352 additions.
A simple substitution is used to reduce the computations. This substitution is made
possible per the discussion on decimation and efficient computation in polyphase filters.
Block Transformation and Computational Efficiencies
x[Tn32m] h[n] cos[(2i+1)(n16)n/64],
(2.10)
n=0
17
By setting n = k + 64j, where k and j are both integers, and by reversing the direction x is
slid across the window, equation (2.10) becomes:
s[m][i] = S i, x[k+64j+32m]*h[k+64j]*cos[(2i+1)*(k+64j16)jc/64]. (2.11)
k=0 j=0
The cosine term can be expanded to:
cos[(2i+1 )*(k16)re/64] cos[(2i+1 )*64jrc/64] 
sin[(2i+1)*(k16)rc/64] sin[(2i+1)*64jrc/64]. (2.12)
The term sin[(2i+1)*jn] equals zero for all integer combinations, and the term cos[(2i+1)*jjc]
equals +1 for j even and 1 for j odd. Equation (2.11) now becomes:
s[m][i] =
h
(1 y*x[k+64j+32m]*h[k+64j]*cos[(2*i+1 )*(k1 6)tc/64].
(2.13)
The term (1)1 can be included in the window h[k+64j] by inverting the window when j is
odd. This permits continued use of the Discrete Cosine transform. The cosine term is
independent of the j summation and can moved outside of it giving:
s[m][i] = ^ M[i][k] ^ x[k+64j+32m] C[k+64j], where (2.14a)
k=0 j=0
M[i][k] = cos[(2*i+1)*(k16)7c/64], and (2.14b)
C[k+64j] = h[k+64j] for j even, and h[k+64j] for j odd. (2.14c)
The resulting computation requires 512 + 32*64 = 2560 multiplications and 64*7 + 32*63 =
2464 additions, or about an 8 to 1 reduction in computation. Use of an FFT formulation
results in further reduction.
For the synthesis operation, a similar restructure can be made, although its not as simple
to show the derivation. The results will be shown and discussed briefly.
An inverse modified Discrete Cosine Transform is used to generate a vector V[i]:
U[32m+n] s[m][i]*cos[(2i+1)(n32m+16)7t/64], for n = 0 to 31, and (2.15)
k=0
y[32m+n]
.Â£
j=o
h[32j+n]*U[32j+32m+n].
(2.16)
18
s[m][i] are the subband samples at time frame m, and U[32m+n] is a vector of length 512,
representing the vectors from the most recent 16 frames. 9 is the estimated reconstructed
output. To provide a more effective implementation, the U vector is generated from an
intermediate V vector of length 1024, where 64 new values are generated each transform
frame. The 64 values are generated by creating 64 values during each transformation for
n = 0 to 63. Each of the 16 frames has 64 values to choose from, and only 32 values are
needed. By alternating between the top 32 and bottom 32 values of each 64 value vector,
and by inverting the synthesis window every 64 values, the same as for the analysis
window, the subband values are each multiplied by sine wave sections of the modulation
frequency. The sections are each (2i+1)rc/2 in length and increase their starting point by
(2i+1)jt/2 each frame. The subband information is changing from frame to frame at the
delta modulation frequency. When multiplied by the modulation frequency, the original
modulation frequency plus the delta modulation frequency is recovered. The window
averages sixteen frames of this giving heavier weighting to the middle frames and less to
the end frames per the shortterm Fourier transform requirement. A block diagram and a
set of equations describe this process in detail at the end of this subsection.
It is instructive at this point to see how a sine wave is handled by both the analysis and
synthesis transformation. We should expect to see nonzero values only in the two bands
where the frequency lies between two modulation frequencies. A frequency of 1350 Hz
will be used with a 32 KHz sampled system. The 32 bands provide bands of width 500
Hz. The two adjacent modulation frequencies are at i = 2 and i = 3, (2*i+1)*500 equals
1250 Hz for i = 2, and 1750 Hz for i = 3.
x[n] can be expressed as:
x[m][n] = cos[2ji1 350*(32mn+256)/32000], for n = 0 to 511, and
using convolution equation (2.10), (2.17)
s[m][i] = cos[2n1350*(32mn+256)/32000]*h[n]*cos[(2i+1 )(n1 6)je/64], (2.18)
n=0
where m is the frame number with an initial value of 0.
With a filter h[n] that attenuates all values outside the two adjacent bands, only s[2] and
s[3] will be nonzero.
An initial cosine phase of T = 256 for x[n] is chosen because the convolution of x[n] with
the filter h[n] will initially provide the peak value output. The peak point of the cosine
waveform aligns with the center peak value of the filter window for this case. The filter
design is also scaled such that the peak value of this convolution result will have a value
of 2 for a sine wave with a peak value of 1.
The frequency of 1350 Hz can be restated as (2i+1+.4)*250 and (2i+11.6)*250 for
subbands 2 and 3, and substituting tc/64 for 27c250/32000, s[2] and s[3] can be stated as:
19
s[32m+2] =% cos[(2i+1+.4)(32mn+256)ji/64]*h[n]*cos[(2i+1)(n16)jc/64], and (2.19a)
n=0
s[32m+3] =X cos[(2i+1 1.6)(32mn+256)7c/64]*h[n]*cos[(2i+1 )(n1 6)tc/64]. (2.19b)
n=0
Using the trigonometric identity in equation (2.1), assuming the sum of frequencies term is
attenuated by the filter, letting the Vz factor on the remaining frequency difference term be
cancelled by the normalized window gain of 2, and substituting i=2 and i=3, these
equations simplify to:
51f
s[32m+2] = L cos[(.4)(32mn+256)jt/64 + 57tm/2 3n/4]*h[n], and (2.20a)
n=0
s[32m+3] = L cos[(1.6)(32mn+256)rc/64 + 7;cm/2 n/4]h[n]. (2.20b)
n=0
For the filter that is designed in the next section the gain of h[n] at .4 and 1.6 from the
modulation frequency will be shown to be approximately 1.000 and .0337. Substituting
these values for h[n], the summation signs can be removed, and the first five values of
s[32m+2] and s[32m+3] will be:
m s[2] s[3]
0 .7071 .0238
1 .9877 .0053
2 .4540 .0300
3 .4540 .0300
4 .9877 .0053
Table 2.1
The subband signals are essentially a decimated sample of the modulated frequency
difference of 100 Hz from frame to frame with an additional tc/2 added per frame.
The above subband signals are then used by the synthesis transform to reconstruct the
original signal x[n]. To see how the synthesis process works, consider the operation for
only one nonzero band. The example above approximates this situation, since the
synthesis filter will multiply subband 3 by .0337 again, for a total transform gain of .0011.
The inverse transform formula in equation (2.15), when only s[2] is nonzero simplifies to:
U[32m+i] = s[32m+2]*cos[5*(16+i+32m)rc/64], for i = 0 to 31. (2.21)
20
For i = 0 to 31, cos[5*(16+i)rc/64 generates a 5/4 period cosine section from 5n/4 to 1 5tc/4
at time frame m = 0, a section from 15ji/4 to 25ji/4 at time frame m = 1, etc. The cosine
sections from four successive frames are shown in Figure 2.8. If sixteen sections are
created for the past sixteen frames and multiplied by the subband values for the most
recent sixteen time frames and summed, the 32 point value would still be a 1250 Hz
waveform section. The filtering operation by the baseband prototype filter, however, filters
16 sections of 1250 Hz waveform crunched closer together than normal and recreates the
original 1350 Hz waveform.
xie4
Figure 2.8 Four frames of subband 2 cosine synthesis sections.
The detailed process implementation is now shown in a form suitable for implementation
on a computer, MPEG specification [5]. Groups of 32 samples are shifted into the filter
each time frame. Groups of 64 samples are multiplied by the modified filter window. The
64 values of each of eight groups are added as described in the previous section where
an index substitution was made to improve computational efficiency. The 64 point vector
is then transformed with a modified Discrete Cosine Transform to separate the time
domain input signal into subbands of time domain signals. The modified Discrete Cosine
Transform provides a tc/2 phase difference between adjacent bands with the 2i+1 term in
the equation below, and with the inverted portions of the filter window. This provides the
aliasing cancellation required. The n + 16 term also phase shifts the decimated/modulated
values from frame to frame by jt/2 radians. Figure 2.9 from Shlien [13] shows a pictorial
implementation of the analysis filter process. The equation implementation is:
21
(2.22)
s[i] =
M[i][k] C[k+64j] x[k+64j] where,
i is the subband filter index from 0 to 31,
s[i] is the subband output sample for subband i and changes every 32 samples,
C[k+64j] = (1)' h[k+64j], where h is the 512 point filter window with the first value = 0,
x[n] is the 512 point audio input sample, and
M[ij[k] = cos[(2*i+1 )*(k16)n/64], the cosine transform coefficients.
32 samples
I
I I I I
X fi/o  I ~
I I I I l l
C Window______________________
I l I I I I
Figure 2.9 Analysis filter implementation
The synthesis process takes 32 subband samples at a time and performs an inverse
modified Discrete Cosine Transform on them. The 64 point result is shifted in parallel to a
16 deep 64 wide FIFO buffer, containing the results of the present and 15 previous inverse
transform results. The latest 32 sample output consists of taking alternating 32 sample
groups from the 16 buffers, multiplying them by the 512 point sample window and
summing them to provide 32 outputs per frame. The inverse modified Discrete Cosine
Transform again provides a rc/2 phase difference between adjacent bands with the 2k+1
term in the equation below, to provide the aliasing cancellation required. The 16+i term,
the alternating halves of the FIFO buffer, and the inverted portions of the synthesis filter
window provide id2 phase changes from frame to frame to compensate for the similar time
frame phase change in the analysis process. The implementation is shown pictorially in
Figure 2.10, again from Shlien [13]. The equation implementation is as follows:
22
Preshift the 1024 point 16 frame FIFO buffer, calculating V[i] for i = 1023 down to 64
using,
V[i] = V[i64]. (2.23)
Perform the inverse modified Discrete Cosine Transform for new V[0] to V[63].
V[i] = ^ N[i][k]*S[k], where S[k] are the 32 subband samples, (2.24)
k=0
N[i][k] = cos[(16+i)(2k+1)rt/64].
Build a 512 point U vector from alternating halves of the V vectors.
U[64*i + j] = V[128*i + j], U[64*i + 32 + j] = V[128*i + 96 + j],
for i = 0 to 7, j = 0 to 31.
Multiply the 512 point U vector by the 512 point D window.
W[i] = U[i]*D[i], for i = 0 to 511, where
D[i] = 32*h[i], for INT(i/64) even, and
D[i] = 32*h[i] for INT(i/64) odd.
Calculate the latest 32 output samples.
x[n] = if W[n+32i], for n = 0 to 31.
i=0
(2.25)
(2.26)
(227)
(2.28a)
(2.28b)
(2.29)
The D window values are 32 times the C window values to compensate for the division by
the interpolation ratio that occurs during the interpolation process, Crochiere and Rabiner
[2]
The above equations demonstrating the subband coder are implemented in a computer
program and the results discussed later.
23
subtend
sunples
vector
 ; ; U vector ;
l I 1 I 1 1 1 1 1
I D window
81 i i i ii i i rM
I ;  * W xteutr ; 
*111 1
*ohn
Reamjuuctcd nrapla
Figure 2.10 Synthesis filter implementation.
24
CHAPTER 3
ANALYS1S/SYNTHESIS FILTER DESIGN
Filter Requirements
The requirements for the combined analysis and synthesis filters in the subband coder, as
discussed before, are that the resulting frequency response be flat, and that any aliasing
effects generated by the decimation and interpolation operations be cancelled.
An ideal filter cannot be designed with a finite filter so the nonideal filter responses for
adjacent bands need to overlap to avoid gaps in the frequency response. The requirement
for a flat frequency response in the overlapping areas can be met by a fairly simple
requirement. Since there is a filter in both the decimation and interpolation operations, the
product of the gain of these two filters at a particular frequency plus the product of the gain
for the same frequency as contributed by the adjacent band must be a constant. For
example, if two adjacent modulation frequencies are specified as 1250 Hz and 1750 Hz,
for any frequency between, the sum of the squares of the gains for the two contributing
filters must be a constant. Outside adjacent bands, the frequency gain should be as close
to zero as possible.
25
Radians
Figure 3.1 Symmetrical overlapping filters with a flat combined frequency response.
The second requirement, cancellation of the effects of aliasing was illustrated earlier. If the
relative phasing between adjacent bands is defined properly, the aliasing terms may be
subtracted from each other. It is apparent from Figure 3.1 that for this cancellation to be
completely effective, the shape of the frequency response for the two adjacent bands must
be symmetrical with respect to the midpoint between bands. This does not necessarily
mean that each filter must be symmetrical about its center modulation frequency, but if the
same prototype filter is to be used for all bands, its evident that a symmetrical filter is the
most straightforward way of assuring that edges of adjacent filters are symmetrical with
respect to each other. Figure 3.1 illustrates a set of filters meeting this requirement.
The first step in the filter design is determining a filter length, the length being the number
of filter coefficients. The filter needs to be large enough to reduce the passband ripple and
attenuate the stopband frequencies to the desired level for each subband. The length
should be a power of two for computational efficiency. It should be short enough, on the
other hand, to represent a relatively constant frequency content per the requirements of a
shortterm Fourier transform.
For the music compression application, a 44.1 KHz or 48 KHz sampling rate has been
standardized as providing double the maximum human auditory bandwidth, and 32
subbands has been found to be a good compromise between complexity and adequate
frequency separation to meet human perceptual listening needs. Through evaluation of
26
filter lengths, 512 points has been shown to provide less than .07 dB of passband ripple
and greater than 100 dB of stopband attenuation. At 44.1 KHz, 512 points represent 11.6
milliseconds. With 32 to 1 decimation for the 32 bands, 32 sample points at a time are
slid by the filter window. These 32 samples represent 725.6 microseconds. With 32
samples and 512 points, the shortterm Fourier transform overlap and add operations
effectively averages 16 sets of 32 samples. This combination of times and overlap has
been shown to be sufficient to meet demanding listening requirements in most cases. The
final section in this paper summarizes some of the supplementary techniques that are
used to handle exceptional situations.
The symmetry requirements for a filter with an even number of coefficients can be stated
as:
h[N1m] = h[m], m = 0 ... N/2 1 (3.1)
For the 512 point filter, h[511] = h[0], h[510] = h[1],... h[256] = h[255].
Often its desired to have an odd number of coefficients. This is accomplished by setting
h[0] = 0., setting h[ N m ] = h[m] for m = 1 ... N/2 1, and making the maximum center
value h[N/2] unique.
The second case is chosen for this filter design for two reasons. It simplifies the discussion
of the development of the filter, and the existing MPEG standard filter has an odd number
of nonzero coefficients. The filter developed here can then be compared more easily with
the MPEG filter.
Also note than an FIR filter rather than an HR filter is used because standard FIR filter
design techniques provide symmetrical filters, FIR filters are linear phase, and they
guarantee stability.
Cosine Summation Problem Statement
The proposed design technique recognizes that standard FIR windows use a cosine or
sum of cosine terms to define the window. The following standard windows illustrate this.
Hanning: h[n] = 0.5 0.5*cos(2rcn/M), 0 < n < M, (3.2a)
h[n] = 0.0, otherwise.
Hamming: h[n] = 0.54 0.46*cos(2rcn/M), 0
h[n] = 0.0, otherwise.
Blackman: h[n] = 0.42 0.5*cos(27tn/M) + 0.08*cos(4jtn/M), 0 < n < M, (3.3b)
h[n] = 0.0, otherwise.
This idea is expanded on, recognizing that any finite symmetrical curve shape can be
represented, by use of a Fourier cosine series expansion, as a sum of cosine terms. For
example, the curve 1 x2 between 1 and 1 is shaped somewhat like the windows and can
27
be represented as a cosine series sum. The subband filter with a sum of squares
requirement seems a likely candidate for specification as a finite cosine series sum. The
subband ideal filter requirements are stated below.
G12(co) + G22(co) = k, a constant, for co, < co < coj,
G,(co) = 0., (o >= ofe,
G2(co) = 0., to < co,,
(3.3a)
(3.3b)
(3.3c)
where co, and (&, are two adjacent modulation frequencies, and G, and G2 are the gain in
the associated frequency bands.
Previously, the solution to this requirement by Johnston [6] minimized the following error
function.
The above method is a variant of the method of steepest descent and solves individually
for each filter coefficient. The value a is used to weight the importance of the passband
ripple and the stopband attenuation. The error function requires using different step sizes
to see which one optimizes the solution. The main drawback is that the filter coefficients
are not described concisely, consisting of N/2 non related points. If the design is to be
used by personnel other than the designer, each of these points need to be manually
typed into the design. The MPEG standard filters, for example, are not available
electronically in the public domain, and require purchase of a paper document and
subsequent typing of 250 10digit numbers into a design.
Other solutions to the problem include formulating a set of double poles and double zeroes
in the complex plane that meet the requirement, Wackersreuther [14], and formulating an
aliasing cancellation solution in the time domain rather than the frequency domain, Princen
and Bradley [10]. The latter solution is limited to filter lengths equal the number of bands,
however.
It should be noted that synthesis filters compatible with the MPEG standard must use the
MPEG standard filter. Otherwise, the synthesis filter will not be compatible with the
analysis filter used to create the compressed data. The design technique proposed here is
primarily useful for new designs and as an educational tool.
The proposed design technique defines the window as:
h(x) = a[n]*cos(2jtnx), where N is the number of cosine terms to be used. (3.5)
n=0
The a[n] terms are determined by convolving several frequencies in the passband and in
the stopband with each of the cosine terms multiplied by the ideal brickwall filter response
sin((o)/(o. The gain for each frequency tested is the sum of the peak values from the
convolutions with each cosine term. The convolution to find the peak can be replaced by
a simple sum of multiplications by aligning the peak value for each frequency with the
E(ffl) = [0/^11 G2(tc co) G2dcof + a*[^Mr G2(co)dcof .
(3.4)
28
peak value of the window. For a frequency f and a filter with 2K + 1 terms, this can be
expressed in discrete form for each cosine term / frequency combination as:
Z, [a[n]*cos(2nn*(k+K)/(2K+2))*[sin(a)ck)/coek]*cos(2itk*f/fs)]l (3.6)
k=K
where fs is the sampling frequency, and coc is the prototype low pass filter cutoff frequency.
The gain, or peak value, for each frequency is then expressed as:
G(f) = ^ ^ [a[n]*cos[2nn(k+K)/(2K+2)]*[sin(cock)/tock]*cos(2jrkf/fs)]. (3.7)
n=0 k=K
The symmetry of the above permits reducing the inner summation to 0 to K. The
summation is also normalized to 1 by replacing cock in the denominator with jtk, giving,
G(f) =
[c[k]*a[n]*cos[27tn(k+K)/(2K+2)]*[sin((ock)/jck]*cos(27ckf/fs)],
n=0 k=0
(3.8)
where c[k] = 2, k 0, c[0] = 1.
The c[0] term at one half value compensates for the single occurrence of the peak at k =
0.
The filter cutoff frequency coc, defines the reduction in bandwidth for multiple bands and the
amount of overlap. With the type of modulation chosen, the nominal bandwidth for M
brickwall filters would be rc/2M.
The amount of overlap is arbitrarily specified as 8/7 for now. Criteria for overlap selection
will be discussed at the end of this subsection.
Before specifying the equation for the music application, the ratio of the range of f to the
sampling frequency fs is first discussed. The ratio of f/f5 can be normalized as a function of
the number of subbands. For example, for 32 subbands, the distance between modulation
frequencies is 1/32 the Nyquist bandwidth of 1J2. For fs equal 32 KHz, the range of f to be
evaluated is from 0 to 500 Hz. For a brickwall filter, the gain would be one for f < 250 Hz
and 0 for f > 250 Hz. With overlapping filters, the lower sideband of the adjacent filter,
though, completely overlaps the full 500 Hz. If the frequency is specified in terms of the
delta from the midpoint, 250 Hz, the gain requirements can be specified as,
G2(250+Af) + G2(250Af) = 1, 0 < Af < 250 Hz, (3.9a)
G2(250+Af) = 0, Af >= 250 Hz. (3.9b)
29
Note that the ratio between fs and the subband width 500 Hz is 64 to one. If fs = 44.1 KHz
had been used, the subband width would have been 689.0625 Hz and the ratio still 64 to
1. fs equal 32 KHz is used in the example to avoid the use of unwieldy numbers.
Cosine Summation Solution
For the music application with a 511 point filter, 32 bands and an fs of 32 KHz, a solution
with 8 cosine terms is attempted. The resulting equation is:
G(f) = 1? c[k]*a[n]*cos(7cn(k+K)/256)*[sin(7tk)(8/7)/64)/7tk]*
n=0 k=0 cos(2jtk(250+Af)/32000)], (3.10)
where c[k] = 2, k 0, c[0] = 1.
The only unknown term, a[n], can be taken out of the inner summation, and the remaining
inner summation can be calculated for several frequencies.
For f, = 250 Af, define,
l[i][n] = f? [c[k]*cos[jtn(k+K)/256]*[sin(7ck/56)/jtk]*cos(27ckf/32000), (3.11a)
k=0
for n = 0 to 7.
For f, = 250 + Af, define,
25Â£
m[i][n] = X [c[k]*cos[jtn(k+K)/256]*[sin(kji/56)/7tk]*cos(2jikf/32000), (3.11 b)
k=0
for n = 0 to 7.
The sum of square gain criteria, and the stopband attenuation requirement can now be
stated as:
(a[0]*l[i][0] + a[1]*l[i][1] + ... + a[7]*l(i][7]p +
{a[0]*m[i][0] + a[1]*m[i][1] + ... + a[7]*m[i][7]}* = 1, (3.12a)
for Af < 250 Hz, and
{a[0]*m[i][0] + a[1]*m[i][1] + ... + a[7]*m[il[71} = 0, for Af >= 250 Hz, (3.12b)
for a set of delta frequencies, Af,.
30
The C program in Appendix B calculates the coefficients I[i][n] and m[i][n] for 13
frequencies below 250 Hz and 9 frequencies equal or greater than 250 Hz. Twenty two
equations with 8 unknowns are formulated.
One further constraint is that the window should go to 0 at the window endpoints and be
normalized to 1 at the window midpoint. The Hamming, Hanning, and Blackman windows
above all meet these requirements.
For the endpoints this requires,
(a[n]*cos(jcn*
(a[n]*cos(jcn*(256256)/256) = 0, or,
n=0
a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] = 0.
For the midpoint this requires,
^ (arnl*cos(nn*
n=0
(a[n]*cos(jtn*(256+0)/256) = 1, or,
(3.13)
(3.14)
(3.15)
a[0] a[1] + a[2] a[3] + a[4] a[5] + a[6] a[7] = 1, (3.16)
increasing the number of equations to 24.
A least squares solution for 24 equations and 8 unknowns is easily solved with a MATLAB
matrix solution capability and an iterative technique.
The iterative technique recognizes that at Af, = 0, the I term summation and m term
summation are equal to each other, and therefore equal 1//2, and that furthermore the m
term summations for Af, > 0 decrease rapidly to 0 as Af, becomes larger. The sum of
square equations for Af, 0 can be reformulated to take advantage of this:
a[0]*l[i][0] + a[1]*l[i][1] + ... + a[7]*l[i][7] =
V 1. {a[0]*m[i][0] + a[1]*m[i][1] + ... + a[7]*m[i][7]}2. (3.17)
The right side of the equation can be initialized with the coefficients arbitrarily set to the
Blackman window coefficient values a[0] to a[2] for example, with a[3] through a[7]
initialized at 0. The left side of the equations are now a set of linear equations.
MATLABs standard matrix solutions provide a little known additional capability when the
number of equations exceed the number of unknowns. The right side of the equation is
subtracted from the left side solution, which ideally should equal 0., and the root sum
square of those differences for the full set of equations is minimized.
31
This also means that the equations can be scaled to weight the importance of the
equations. The stopband equations can be weighted differently than the passband v
equations to control the relative importance of the passband ripple compared to the
stopband attenuation.
The MATLAB program in Appendix B is solved assuming a passband ripple of .04 dB and
a stopband attenuation Of 100 dB. For .04 dB ripple, the passband gain error should equal
1004C0, or approximately .005. For 100 dB stopband attenuation, the m coefficient
summation should be less than .00001. If the passband equation set remains normalized
at 1, and there are 13 passband equations and 9 stopband equations, the stopband
equations should be scaled at (.005/.00001)*(13/9) to give equal weighting to the
passband ripple specification of .04 dB and the stopband attenuation specification of 100
dB.
Finally, the two window endpoint and midpoint equations need to be scaled properly. The
ideal a[n] window coefficient summations of 0. and 1. are arbitrarily specified for an error of
.0002. Since there are two equations, they are scaled by (.0002/.00001)*(13/2) relative to
the passband equations.
Filter Design Results
The MATLAB program in Appendix B implements the above as described converging on a
set of a[n] coefficients that meet the requirements in five iterations. The program takes
less than 30 seconds to execute with a 33 MHz 486 PC compatible without floating point
capability, and is near instantaneous on a 90 MHz Pentium with a floating point capability.
The resulting a[n] coefficients a[0] through a[7] are:
3.7929857e001, 4.9704195e001, 1.2037640e001, 3.0637723e003, 2.9509270e004,
9.9179934e005, 2.9924455e005, 6.6170774e006
Figure 3.3 shows the resulting subband coder filter defined by the coefficients a[n] and the
result of inverting the filter for alternating groups of 64 coefficients as described in the
previous section.
The passband ripple values and the stopband attenuation values are shown in Table 3.1.
The desired window midpoint value of 1.00000, and endpoint values of .00000 are
accurate to 5 decimal places.
32
cln] htn]
n
Figure 3.3. Resulting subband coder filter before and after partial inversion.
33
Table 3.1. Window passband ripple and stopband attenuation results.
gain ripple(f<250)
250f 250+f sum of or
freq calc X(w) calc X(w) square atten(f>250)
0.0 0.70734 0.70734 0.7073405 0.7073405 1.000661 0.002870
5.0 0.73330 0.6803619 1.000628 0.002728
10.0 0.75815 0.6524831 1.000533 0.002312
15.0 0.78180 0.6238303 1.000381 0.001653
25.0 0.82523 0.5647582 0.999957 0.000185
50.0 0.90974 0.4139222 0.998957 0.004530
62.5 0.93931 0.93931 0.3414722 0.3414722 0.998908 0.004747
75.0 0.96134 0.2740052 0.999252 0.003249
100.0 0.98730 0.1605719 1.000546 0.002372
125.0 0.99736 0.99736 0.0809895 0.0809895 1.001294 0.005617
150.0 0.99989 0.0336023 1.000901 0.003909
187.5 0.99994 0.99994 0.0049925 0.0049925 0.999906 0.000408
200.0 0.99991 0.0019734 0.999823 0.000770
250.0 1.00004 1.00004 0.0000000 0.0000000 1.000076 0.000330
312.5 0.0000000 0.0000000 169.14
375.0 0.0000000 0.0000000 156.79
437.5 0.0000001 0.0000001 144.55
500.0 0.0000006 0.0000006 125.06
562.5 0.0000003 0.0000003 130.03
625.0 0.0000005 0.0000005 126.89
750.0 0.0000004 0.0000004 128.61
875.0 0.0000003 0.0000003 131.85
The results are corroborated by performing a Fourier transform on the resulting window
and comparing the gain at several frequencies. The results are shown in Table 3.1 in the
X(w) column and agree exactly. The Fourier transform frequency response is plotted in
Figure 3.4 showing more than 100 dB attenuation for frequencies greater than 500 Hz.
The results are further verified by convolving several selected frequencies with the window,
and comparing the peak value, when the waveform is completely in the window. These
results also agreed.
The primary drawback of the above design technique is that an overlap value must be
determined by trial and error. The I and m coefficients need to be recalculated for each
overlap value attempted. This takes about 30 seconds with the Pentium processor and
several minutes with the 486 PC compatible. The technique by Johnston supposedly
converges on an optimum overlap value, although this was not verified.
The value of 8/7 used is selected because it agrees closely with the MPEG filter overlap.
The window coefficients also compare reasonably well with the MPEG coefficients.
34
X(u)
Figure 3.4 Filter frequency response.
35
CHAPTER 4
APPLICATION OF THE RESULTING SUBBAND CODER TO SINE WAVE SAMPLES AND
AN ACTUAL MUSIC SAMPLE
The subband coder and filter were first tested on pure sine waves to verify the complete
algorithm, and to verify understanding of the phase and aliasing cancellation relationships.
For this evaluation a simple sine wave generation program was written to create a
specified frequency. Equation 2.20 (together with a calculation of the initial phase) is used
to calculate the band values at specified frames. The expected and actual results for
frequencies of 1250 Hz, 1350 Hz, 1500 Hz, 1510 Hz and 1800 Hz are shown in Table 4.1.
The predicted values take into account the 7c/2 phase shifts per band. The window
attenuation values are a function of the distance from the modulation frequency. A peak
sine wave value of 10000.0 is used.
36
Table 4.1.
1250 Hz subbands 2 and 3
ideal(2) 8577.71 5141.28 8577.71 5141.28 8577.71 5141.28 8577.71
actual(2) 8585.50 5145.61 8585.50 5145.61 8585.50 5145.61 8585.50
error(dB) 59.16 64.26 59.16 64.26 59.16 64.26 59.16
ideal(3) 0.00 0.00 0.00 0.00 0.00 0.00 0.00
actual(3) 1.76 10.39 1.76 10.39 1.76 10.39 1.76
error(dB) 72.08 56.66 72.08 56.66 72.08 56.66 72.08
1350 Hz subbands 2 and 3
ideal(2) 7409.36 1077.80 8676.40 9121.91 2047.06 6715.46 9941.55
actual(2) 7398.44 1083.86 8672.58 9111.38 2038.48 6715.00 9932.43
error(dB) 56.23 61.34 65.35 56.54 58.32 83.74 57.79
ideal(3) 224.10 331.76 165.90 136.72 326.63 247.26 35.97
actual(3) 224.93 321.61 153.15 141.57 319.58 234.11 44.36
error(dB) 78.61 56.86 54.88 63.27 60.03 54.61 58.51
1500 Hz subbands 2 and 3
ideal(2) 6237.30 6237.30 6237.30 6237.30 6237.30 6237.30 6237.30
actual(2) 6212.43 6212.43 6212.43 6212.43 6212.43 6212.43 6212.43
error(dB) 49.08 49.08 49.08 49.08 49.08 49.08 49.08
ideal (3) 3333.91 3333.91 3333.91 3333.91 3333.91 3333.91 3333.91
actual(3) 3294.80 3294.80 3294.80 3294.80 3294.80 3294.80 3294.80
error(dB) 45.14 45.14 45.14 45.14 45.14 45.14 45.14
1510 Hz subbands 2 and 3
ideal(2) 3571.06 3221.19 2858.61 2484.75 2101.08 1709.12 1310.42
actual(2) 3535.60 3184.75 2821.33 2446.78 2062.57 1670.21 1271.27
error(dB) 45.99 45.76 45.56 45.40 45.28 45.19 45.13
ideal(3) 6344.08 6592.11 6814.12 7009.24 7176.70 7315.84 7426.10
actual(3) 6312.63 6562.36 6786.21 6983.26 7152.76 7294.03 7406.51
error(dB) 47.04 47.52 48.07 48.70 49.41 50.22 51.15
ideal(2)
actual(2)
error(dB)
ideal(3)
actual(3)
error(dB)
1950.69 1800 Hz subbands 3 and 4 9929.59 4186.14 7342.42 8724.00 1950.69 '7 9929.59
1937.67 9922.22 4194.60 7329.82 8724.67 1937.67 9922.22
54.70 59.64 58.44 54.98 80.47 54.70 59.64
17.46 2.09 16.16 12.08 8.70 17.46 2.09
25.90 10.16 19.62 22.29 5.84 25.90 10.16
58.46 58.85 66.21 56.81 67.86 58.46 58.85
37
The resynthesized results are next compared with a delayed version of the original sine
wave. The output results for the first 31 samples are shown in Table 4.2.
Table 4.2. Subband coder applied to a 1350 Hz sine wave.
Subframe 0 Input
0.0 2619.8 5056.6 7140.1 8725.0 9700.3 9998.1 9597.4
8526.4 6859.8 4714.0 2238.9 392.6 2996.7 5391.4 7409.5
8910.1 9788.2 9982.7 9479.8 8314.7 6568.8 4364.1 1854.5
784.6 3368.9 5717.9 7667.5 9081.4 9861.0 9951.8 9347.5
Subframe 15 Output
3.7 22.3 2592.8 5027.6 7111.7 8699.4 9679.6 9983.5
9589.8 8525.8 6865.8 4725.8 2255.2 373.2 2975.6 5370.3
7389.8 8893.0 9774.7 9973.6 9475.8 8316.1 6575.7 4376.4
1871.8 762.9 3343.8 5690.5 7639.3 9054.1 9836.3 9931.4
Note that the output is delayed 15 subframes plus one sample from the input. Fifteen
subframes are required for the waveform to be completely shifted by the 511 point window
when shifted 32 samples per frame. The one sample additional delay is explained by the
first samples shift to window points 31, 63 ... 255 ... 479, 511, and a center maximum
window at window point 256.
The samples are further delayed by approximately .008 samples. The reason for this is
still being investigated. Even if this delay is taken into account, there are still some
residual errors. Reasons for these errors include the nonideal filter, and possibly that the
averaging of 64 sets of eight samples during the filtering operation do not have the same
attenuation as a 512 point filter. The error sources need to be further investigated.
38
The subband coder is next applied to an actual musical sample. About 10 seconds of the
Grateful Dead's "Tons Of Steel" were captured using a 16bit PCcompatible Soundblaster
unit sampling at 44.1 KHz. A program was written to extract one channel of data from the
stereo binary data. The input and resulting delayed output are compared in Table 4.3 for
the first 31 samples.
Table 4.3. Subband coder applied to an actual musical sample.
Subframe 0 Input
2569.0 2382.0
10042.0 9809.0
11.0 1239.0
7.0 646.0
3817.0 5703.0
9163.0 8989.0
2716.0 5018.0
1607.0 3916.0
5985.0 7155.0
7615.0 5809.0
5802.0 5075.0
5463.0 6584.0
9135.0 9567.0
4468.0 2205.0
4965.0 2970.0
6802.0 5994.0
Subframe 15 Output
5.7 2587.4
9575.2 10050.7
2207.9 16.2
2960.0 22.4
2396.7 3827.1
9817.6 9170.1
1228.6 2708.1
664.4 1634.6
5711.8 5995.7
8991.7 7616.3
5014.8 5796.6
3955.7 5502.9
7166.0 9143.5
5812.2 4471.8
5067.5 4957.6
6620.9 6836.2
Additional error results from signal in nonadjacent bands, where the ideal stopband
attenuation is not completely met.
An additional error possibility is that the original musical sample, since it was recorded
under less than ideal conditions, may contain frequency content outside the 22.05 KHz
Nyquist bandwidth, which shows up as aliasing. The analog filter, oversampling, and digital
input filter in the Soundblaster unit should eliminate most of this. Any noise following the
analog filter and in the A/D conversion process though, cannot be digitally filtered. The
short sample may also contain DC content. In the MPEG application, the samples are
usually passed through a digital 10 Hz DC blocking filter prior to analysis.
39
CHAPTER 5
SUPPLEMENTARY AND ALTERNATIVE COMPRESSION TECHNIQUES
The subband coder by itself provides only part of the compression ratio achieved. The
latest version of the MPEG audio standard provides varying levels of compression with
different levels of resulting quality. To provide musical quality which cannot be discerned
from the original by an experienced panel of listeners for a variety of musical test
passages, a compression ratio of only 6 to 1 can be achieved. To achieve this level of
compression, the subband coding technique discussed in this paper must be
supplemented by a parallel 1024 point FFT analysis of the music during recording to
optimize the amount of compression appropriate in each subband, particularly the lower
frequency bands. Variable recording rates with FIFO storage to accommodate artifacts of
certain digitally difficult musical passages (sudden percussive instruments, for example,
generate a preecho), are provided. Sophisticated scaling optimization, and Huffman bit
level compression are also used.
Six to one compression still does not permit the transmission of music over modems or
ISDN data channels at a reasonable rate. Additional compression with less than "perfect"
music quality is often utilized. Acceptable music quality for digital FM stations, can be
achieved, for example, with two 64K ISDN channels. This is the Digital Audio
Broadcasting (DAB) standard which implements layer 2 of the MPEG1 standard. The
newest version of the standard, MPEG2, provides a "low bit rate" layer consistent with
28.8K modem levels.
Less sophisticated techniques that do not take advantage of subband coding are also
used. Transmission of sound clips over the Internet often simply reduce the sampling rate
to 8K, use the ulaw or Alaw compression algorithm from the telephone communications
industry to reduce 12 bits of resolution to 8, and transmit mono only. This is somewhat
listenable, comparable to AM radio broadcasting. Adaptive Differential Pulse Code
Modulation (ADPCM) is also often utilized to provide additional improvement. These
techniques do not take advantage of the subband coding techniques discussed in this
paper.
40
CHAPTER 6
SUMMARY AND ADDITIONAL RESEARCH POSSIBILITIES
A detailed look at the subband coder has been presented. Although not as mathematically
complete as existing descriptions, it hopefully provides a more practical understanding of
the transform operation based on digital signal processing fundamentals. By using real
example bandwidths, sample rates, decimation and interpolation ratios, and relating them
to an actual application and filter, an understanding of filter and transform operation is
provided.
The most interesting and challenging aspect of the subband coder is the variety of signal
processing techniques used in its application. Prior to this research, it was believed to be
just another filter application. The combination of decimation and interpolation, modulation,
aliasing cancellation, transform of timevarying signals, block transformation techniques,
and special windowing requirements make the subject an extremely interesting area of
study.
The new filter design approach was undertaken originally because a clear requirements
statement and design description could not be found in the literature until late in the
research effort. In an attempt to understand the design and filter requirements a different
and possibly new design technique was developed. Although not as powerful as present
methods because the overlap factor needs to be determined by trial and error, the ability
to express the filter in a concise manner is certainly useful. The effort expended in
designing the filter also provided an improved understanding of its operation.
An area where more investigation would be useful is in better understanding the sources
of error in the subband coder. This would include exploration of whether and to what
extent smearing" occurs within a band, the effect of changing frequency content in the
sliding shortterm Fourier transform window, and comparing the effectiveness of this filter
with the Johnston steepest descent filter design method by implementing that method. It
would also include typing in the 512 point MPEG specification filter and comparing its
performance, directly acquiring a digital musical sample without analog processing and
determining for sure whether the analog processing contributes errors. Alternatives to
utilizing a actual digital sample may also be helpful. Filtering the sample to eliminate
aliasing and any DC component would be valuable. Determining how much noise is
present by recording the same sample several times, passing it through a higher sample
rate interpolation filter, correlating the experiments to align them, and computing the error
level would also be of interest.
41
APPENDIX A. LISTING OF THE SUBBAND CODER C PROGRAM
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIW
// Listing of the Subband Coder C Program
// Title mpeg.c Revision 33196 6:08pm 5351 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
#include
#include
#include
#include
#include
#include
#define PI 3.1415928
float in_data[16][32];
float c_window[16][32];
float d_window[16][32];
float c[513], d[513];
float z[65];
float m_cos[32][64];
float n_cos[32][64];
float s_chan[32];
float u[16][64];
float out_data[32];
FILE *fptr ;
FILE fptrl ;
FILE *fptr2 ;
main()
{
float k_float, angl, b, c_value;
float raw_data, in_float, y_in, alph;
int i, i1, j, j1, k, c1, c2;
int k1, k2, k3;
int isign; /* =1 for forward cos xform, 1 for inverse 7
int n; /* transform size 7
int band_no; /* subband number to print 7
printffEnter band number to print\n");
scanf("%i", &band_no);
42
/* calculate c window array, initialize in_data 7
for (i = 0; i < 16; i++)
{
for (j = 0; j < 32; j++)
{ k = 32*i + j;
k_float = (float) (k);
angl = PI (kjloat 255.00000001)/ 56.;
b = 3.7929857e001 4.9704195e001*cos(2.*PI*k_float/512.)
+ 1.2037640e001 cos(4.*PI*k_float/512.)
 3.0637723e003*cos(6.*PI*k_float/512.)
+ 2.9509270e004*cos(8.*PI*k_float/512.)
+ 9.9179934e005*cos(10.*PI*k_float/512.)
+ 2.9924455e005*cos( 12.*PI*k_float/512.)
+ 6.6170774e006*cos(14.*Prk_float/512.);
c_value = (1 ./28.) b sin(angl)/angl;
c[k] = c_value;
/* window is inverted from picture to compensate for nonfifoed
input 7
if (j == 2 II i = 3 II i == 6 II i = 7)
c_value = (float) 0.0 c_value;
if (i == 10 II i = 11 II i == 15 II i== 16)
c_value = (float) 0.0 c_value;
c_window[i][j] = c_value;
in_data[i][j] = 0.;
u[i][j] = 0.0;
}
}
/* define cosine array 7
for (i = 0; i < 32; i++)
{ for (j = 0; j < 64; j++)
{ m_cos[i][j] = cos( (float) ((2*i + 1 ) (j 16)) PI / 64.0);
n_cos[i][j] = cos( (float) ((2*i + 1 ) G + 16)) PI / 64.0);
}
}
fptrl = fopen("sine.fre","wr");
fptr2 = fopen("sine.sub",,'a");
fprintf(fptr1, "\r\n window freq sweep \r\n");
for (i = 0; i < 64; i++)
{ fprintf(fptr1, "\r\n");
for 0 = 0; j < 8; j++)
{ k = 8*i + j;
fprintf(fptr1, "%8.5f", c[k]);
}
43
/* calculate d analysis window 7
fprintf(fptr1, "\r\n");
for (i = 0; i < 512; i++) d[i] = 32.*c[i];
fprintf(fptr1, "\r\n d analysis window \r\n");
for (i = 0; i < 64; i++)
{ fprintf(fptr1, "\r\n");
for (j = 0; j < 8; j++)
{ k = 8*i + j;
fprintf(fptr1, "%8.5f", d[k]);
}
}
fprintf(fptr1, "\r\n");
fptr = fopen("sine.dat","r");
for (i1 = 0; i1 < 25; i1++)
{
fprintf(fptr1, "\niteration number %d\r\n", i1);
i = i1 % 16; /* % is the modulus operator 7
for (j = 0; j < 32; j++)
{ fscanf(fptr, "%f", &raw_data);
in_data[i]0] = raw_data;
}
fprintf(fptr1, "\r\n in_data \r\n");
fprintf(f3tr1,"");
for (j1 = 0; j1 < 32; j1 += 8)
{ for (j = 0; j < 8; j++) fprintf(fptr1, "%8.1f", in_data[i][j1+j]);
fprintf(fptr1, "\r\n");
}
for (j = 0; j < 65; j++) z[] = (float) 0.0;
k3 = 32;
for (k1 = 15; k1 >= 0; k1 )
{ k2 = k1 + i + (int) 1.0;
if (k2 > 15) k2 = 16;
k3 += 32;
if (k3 > 32) k3 = 64;
for (j = 0; j < 32; j++)
{
injloat = in_data[k2][j];
yjn = injloat c_window[k1][j];
z[31j+k3] += yjn;
}
44
}
fprintf(fptr1, "\r\n z array \r\n");
fprintf(fptr1,"");
for 01 = 0; j1 < 64; j1 += 8)
{ for 0 = 0; j < 8; j++) fprintf(fptr1, "%8.2f", z[j1+j]);
fprintf(fptr1, "\r\n");
}
for (k = 0; k < 32; k++)
{ s_chan[k] = 0.0;
for 0 = 0; j < 64; j++) s_chan[k] += (z[j] m_cos[k][j]);
}
fprintf(fptr1, "\r\n freq cos xform array \r\n");
fprintf(fptr1,"");
for 01 = 0; j1 < 32; j1 += 8)
{ for 0 = 0; j < 8; j++)
{
fprintf(fptr1, "%8.2f", s_chan[j1+j]);
if ((01+j)==band_no) && 01 >=15) && (i1 <=21))
fprintf(fptr2, "%8.2f", s_chan[j1+j]);
}
fprintf(fptr1, "\r\n");
for (k = 0; k < 64; k++)
{ z[k] = 0.0;
for 0 = 0; j < 32; j++)
{ z[k] += (s_chanB] n_cos[j][k]);
}
fprintf(fptr1, "\r\n inverse cos xform array \An");
fprintf(fjDtr1,"");
for 01 = 0; j1 < 64; j1 += 8)
{ for 0 = 0; j < 8; j++) fprintf(fptr1, "%8.2f", z[j1+j]);
fprintf(fptr1, "\r\n");
}
for 0 = 0; j < 32; j++) out_dataO] = 0.0;
for 0 = 0; j < 64; j++) u[i][j] = z[jj;
k3 = 32;
for (k1 = 15; k1 >= 0; k1)
{ k2 = k1 + i + (int) 1.0;
45
if (k2 > 15) k2 = 16;
k3 += 32;
if (k3 > 32) k3 = 64;
for (j = 0; j < 32; j++)
out_data[31j] += u[k2][31j+k3]*c_window[k1][j]*32.;
}
fprintf(fptr1, "\r\n output data array \r\n");
fprintf(fptr1,"");
for (j1 = 0; j1 < 32; j1 += 8)
{ for (j = 0; j < 8; j++) fprintf(fptr1, "%8.1f", out_data[j1+j]);
fprintf(fptr1, "\r\n");
}
}
fprintf(fptr2, "\n);
}
46
APPENDIX B. ADDITIONAL PROGRAM LISTINGS
lllllllllllllllllllllllllllllllllllllllllllllllllllllllim
// Lcoef and Mcoef Analysis Filter Coefficient Computation Listing
// Title window.c Revision 33096 8:47pm 1972 bytes
iiiiiiiiiiiiiiiiiiiiiiiiniiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiim
#include
#include
#include
#include
#include
#include
double l[30][32], m[30][32];
double a[32], frq[30];
FILE *fptr1 ;
FILE *fptr2 ;
main()
{
double pi, f_mpy, n1, ck, k1, k2, f_smpl, 11, ml, p1;
int i, k, n, nfreq;
nfreq = 22;
pi = 4.0 atan(1.0);
printf("pi = %16.12f\n", pi);
f_smpl = 32000.;
fptrl = fopen("lcoef.dat","wr");
fptr2 = fopen("mcoef.dat","wr");
/* put together a set of frequencies to use 7
frq[0] = 0.; frq[1] = 5.0; frq[2] = 10.0; frq[3] = 15.0; frq[4] = 25.0;
frq[5] = 50.;frq[6] = 62.5; frq[7] = 75.0; frq[8] = 100.; frq[9] = 125.;
frq[10] = 150.; frq[11] = 187.5; frq[12] = 200.; frq[13] = 250.;
frq[14] = 312.5; frq[15] = 375.; frq[16] = 437.5; frq[17] = 500.;
frq[18] = 562.5; frq[19] = 625.; frq[20] = 750.; frq[21] = 875.;
I* calculate the I and m coefficients for the matrix 7
/* i is the number of frequencies 7
I* n is the number of coefficients to be solved for 7
47
/* k is half the length of the filter 7
for (i = 0; i < nfreq; i++)
{
printf("%d\n,l1 i);
for (n = 0; n < 8; n++)
{
n1 = (double) n;
l[i][n] = 0.0; m[i][n] = 0.0;
for (k = 0; k < 256; k++)
{
if (k==0)ck= 1.0;
else ck = 2.0;
k1 = (double) k+ .000000001;
k2 = 256. + k1;
11 = ( ck cos(n1*pi*k2/256.) (sin(pi*k1/56.)/(pi*k1))
* cos(2.*pi*k1* (250.frq[i]) /f_smpl)) ;
l[i][n] += 11;
ml = ( ck cos(nrpi*k2/256.) (sin(pi*k1/56.)/(pi*k1))
* cos(2.*pi*kr (250.+frq[i]) /f_smpl)) ;
m[i][n] += ml;
}
}
}
for (i = 0; i < nfreq; i++)
{
for (n = 0; n < 8; n++)
{
fprintf(fptr1, "%16.7e", l[i][n]);
fprintf(fptr2, "%16.7e", m[i][n]);
}
fprintf(fptr1, \n");
fprintf(fptr2, "\n");
}
}
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
// Lcoef.dat and mcoef.dat partial results
// Title lcoef.dat and mcoef.dat Revision 33096 8:51pm 2860 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
9.7947056e001 5.5379333e001 4.8944199e001 5.0387345e001 4.9848963e001
5.0032781 e001 5.0031038e001 5.0010276e001
1.0156627e+0005.8010879e001 4.8232784e001 5.0890297e001 4.9380024e001
5.0539167e001 4.9407948e001 5.0943008e001
1.0431789e+0006.1080803e001 4.7519101 e001 5.1388330e001 4.8915572e001
5.1043636e001 4.8779843e001 5.1907420e001
48
1.0623050e+0006.4535467e001 4.6863816e001 5.1843255e001 4.8488732e001
5.1511850e001 4.8188067e001 5.2843100e001
9.7947056e001 5.5379333e001 4.8944199e001 5.0387345e001 4.9848963e001
5.0032781 e001 5.0031038e001 5.0010276e001
9.3461321 e001 5.3217830e001 4.9601966e001 4.9914382e001 5.0291277e001
4.9556871 e001 5.0610775e001 4.9162042e001
8.8141148e001 5.1534321 e001 5.0166231 e001 4.9500894e001 5.0679800e001
4.9139932e001 5.1113942e001 4.8441506e001
8.2049564e001 5.0312479e001 5.0610091 e001 4.9169305e001 5.0992899e001
4.8804947e001 5.1514139e001 4.7880797e001
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
II Window Coefficient Computation Listing MATLAB
// Title calcwind.m Revision 33096 8:28pm 2329 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
diary calcwind.dat
format compact
format long
nfreq = 22;
nfreql =14;
ncoef = 8;
qmax = 6;
load fval.dat
load lcoef.dat
load mcoef.dat
for i = 1 :ncoef
wcoef(1:1,i:i) = 0.;
end
vcoef(1:1,1:4)=[.3754 .4944 .1246 .0056];
wcoef(1:1,1:3)=[.42 .5 .08];
save wcoef.dat wcoef /ascii
for q =1:qmax
q
load wcoef.dat
k=20.*13./2.;
k1 = 500.*13./9.;
x = wcoef;
fprintf( gain ripple(f<250)\n);
fprintf( 250f 250+f sum of or\n);
fprintf( freq calc X(w) calc X(w) square atten(f>250)\n);
for i = 1 :nfreq
for j = 1 :ncoef
am(i:i,j:j)=xG)*mcoef(i:i,j:j);
49
al(i:i,j:j)=x(j)*lcoef(i:i,j:j);
end
ym(i)=sum(am(i:i,1 :ncoef));
ymsq(i) = ym(i)*ym(i);
yl(i)=sum(al(i:i,1 :ncoef));
ylsq(i) = yl(i)*yl(i);
ssq(i)=ymsq(i) + ylsq(i);
rip(i)=10*log10(ssq(i));
att(i)=20*log10(ym(i));
if (fval(i) <= 250)
fprintf(%6.1f%9.5f%9.5f,fval(i),yl(i),yl(i));
fprintf(%11.7f%11.7f%10.6f ,ym(i)lym(i)l88q(i));
fprintf(%10.6f\n\rip(i));
end
if (fval(i) > 250)
fprintf(%6.1f ,fval(i));
fprintf('%11.7f%11.7f %10.2f\n,,ym(i),ym(i),att(i));
end
end
if (q < 6)
fprintf(Vr);
ssq2 = ssq(1:nfreq1);
ssq1=1.*1.;
for i = 1 :nfreq
if (i < nfreql)
qcoef(i:i,1:ncoef) = lcoef(i:i,1:ncoef);
ydiff(i) = ssq1 ymsq(i);
y1(i) = sqrt(ydiff(i));
end
if (i >= nfreql)
qcoef(i:i,1:ncoef) = krmcoef(i:i,1:ncoef);
y1(i) = 0.;
end
end
f1 = nfreq + 1;
f2 = nfreq + 2;
qcoef(f1 :f1,1 :ncoef) = [k0k0k0k0];
qcoef(f2:f2,1 :ncoef) = [O k 0 k 0 k 0 kj;
y1(1)=sqrt(.5*ssq1);
y1 (f 1 )=.5*k;
y1 (f2)=.5*k;
x1=qcoefVy1;
wcoef = x1;
for i = 1 :ncoef
fprintf(%10.7f ,wcoef(i));
if (i == 6 I i == 12 I i = 18 I i = 24 I i = 30)
 fprintfOW);
end
50
end
fprintfCVT);
save wcoef.dat wcoef /ascii
e = qcoef*x1 y1;
for i = f1 :f2
fprintf(%13.9r, (e(i) + y1(i))/k);
end
fprintfCVn);
e1 = e(1:f2);
for i = 1 :f2
fprintf(%13.9f,e(i));
if (i == 6 I i = 12 I i = 18 I i 24 I i == 30)
fprintf(\n);
end
end
fprintf('\n%12.6f\n,norm(e1));
end
end
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
// Window frequency response for Figure 3.4 MATLAB
// Title windfreq.m Revision 33096 9:35pm 852 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
diary windfreq.dat
load wcoef.dat
ncoef = 8;
for i = 1:16
for j = 1:32
k = 32*(i1) + j 1;
if (k~=0)
kfloat = k+.000000001 ;
angl = 3.1415928 (kfloat 256)/ 56.;
b(k) = 0.;
for m = 1 :ncoef
b(k) = b(k) + wcoef(m)*cos(2.*(m1)*pi*kfloat/512.);
end
cvalue = b(k) sin(angl)/angl;
h(k) = cvalue;
c(k) = h(k);
if (rem((k1),128) >= 64)
c(k) = h(k);
end
f(k) = 32000.*(k1)/512;
end
end
end
h(512) = 0.;
51
c(512) = 0.;
plot(1:512, h, w)
xlabel(n)
ylabel('h[n]')
grid
pause
plot(1:512, c, V)
xlabel(n)
ylabel(c[n])
grid
pause
a4=(fft(h)/56.);
semilogy(f(1:50),abs(a4(1:50)),w)
xlabel(Hz)
ylabel(X(w)')
grid
pause
for i = 1:30
a3(i) = abs(a4(i));
fprintf(%7.1f%13.7f\n,If(i),a3(i));
end
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
// Program to generate sine wave data for testing mpeg.c
// Title sinegen.c Revision 22496 12:52pm 1115 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
#include
#include
#include
#include
#include
#include
FILE *fptr ;
main()
{
double pi, sam_rate, frequ, amplit, phase_deg, phase_rad;
int i, k, n_samples;
double indx, y;
printf("Sine wave generator program.\r\n");
printf("Enter sample and sine wave freq in Hz, zerotopeak amplitude,\r\n");
printff start phase in degrees, and number of samples.\r\n\r\n");
printffExample 44200. 2000. 1. 30. 512\r\n\r\nM);
52
printf("The results will go in file sine.dat 1 sample per line.\r\n");
scanf("%lf %lf %lf %lf %i", &sam_rate, &frequ, &lit, &phase_deg,
&n_samples);
printf("%f %f %f %f %d\r\n", sam_rate, frequ, amplit, phase_deg,
n_samples);
fptr = fopen("sine.dat","wr);
pi = 4.*atan(1.);
phase_rad = phase_deg*pi/180.;
k = 0;
for (i = 0; i < n_samples; i++)
{
indx = (double) i;
y = amplit sin( ( 2. pi frequ indx / sam_rate) + phase_rad);
k++;
printf("%9.4f", y);
if (k = 8) { k = 0; printf("\r\n");}
fprintf(fptr, "%15.7f\r\n", y);
}
}
/////////////////////////////////////////////////////////////////////////
// Program to generate ideal subband output for Table 4.1 C program
// Title subband.c Revision 33196 6:27pm 1394 bytes
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH
#include
#include
#include
#include
#include
#include
double att1[5], att2[5], freq[5];
double band[5], c1[7], c2[7];
FILE *fptr1 ;
main()
{
double angh, angl2, ml, r1, r2, delta;
double pi, fs, i1;
int f, m;
53
pi = 4.0 atan(1.0);
printffpi = %16.12f\n", pi);
fs = 32000.;
att1 [0]=1.00005;att1 [1 ]=.99998;att1 [2]=.70724;att1 [3]=.65239;att1 [4]=.99989;
att2[0]=0.;att2[1 ]=.03337;att2[2]=.70724;att2[3]=.75806;att2[4]=.00178;
freq[0]=1250. ;freq[1 ]=1350.;freq[2]=1500.;freq[3]=1510;freq[4]=1800.;
band[0]=2.0; band[1]=2.0; band[2]=2.0; band[3]=2.0; band[4]=3.0;
fptrl = fopen("subband.dat","wr");
for (f = 0; f < 5; f++)
{
i1 = 2.*band[f]+1.;
r1 = (freq[f] 250.*i1)/250.;
r2 = (freq[f] 250.*(i1+2.))/250.;
delta = 2.*freq[f]*255./fs .5;
for (m = 0; m < 7; m++)
{
ml = (double) m;
angll = r1*32.*m1/64 i1*m1/2 + i1*272./64 delta;
angl2 = r2*32.*m1/64 (i1+2.)*m1/2 + (i1+2.)*272./64 delta;
c1[m] = 10000.*att1[f]*cos(angl1*pi);
c2[m] = 10000.*att2[f]*cos(angl2*pi);
printf("%9.3f%4d%12.3f%12.3f\n", freq[f], m, c1[m], c2[m]);
}
for (m = 0; m < 7; m++) fprintf(fptr1, "%8.2f", c1[m]);
fprintf(fptr1, "\n");
for (m = 0; m < 7; m++) fprintf(fptr1, "%8.2f", c2[m]);
fprintf(fptr1, "\n");
}
}
/////////////////////////////////////////////////////////////////////////
// Program to extract mono decimal data from soundblaster binary stereo pern data
// Title dataextr.bas Revision 91795 10:19pm 276 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
10 OPEN "r", #1, "tons.raw", 1
20 OPEN "o", #2, "tons.dat"
30 FIELD 1, 1 AS A$
40 FOR I = 1 TO 20001 STEP 4
50 GET #1, I
60 XX = ASC(A$)
70 GET #1,1 + 1
80 X = ASC(A$)
90 IF X >= 128 THEN X = X 256
100 X = 256X + XX
54
110 PRINT #2, USING "#######";X
120 NEXT I
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
II Program to plot sin(w)/w curve for Figure 2.1 MATLAB
// Title sinc.m Revision 33096 4:20pm 168 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
c = 32.0;
for i = 1:512
angl(i) = pi (i 256.0000001 )/c;
h(i) = sin(angl(i))/(angl(i));
end
plot(angl, h, V)
xlabel(w);
ylabel('sin(w)/w')
grid
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
II Program to plot multiple images caused by decimation for Figure 2.3 MATLAB
// Title decimate.m Revision 33096 12:29pm 414 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
c = 8.5;
for i = 1:512
angl = pi (i 256.0000001);
han=.5.5*cos(2.*pi*i/512.);
h(i) = han sin(angl/c)/angl;
end
for i = 1:512
y(i)=o.;
if(rem(i,8)=0)
y(i) = h(i);
end
end
fh = fft(h,512);
fy = fft(y,512);
semilogy(1:512, abs(fh), V)
xlabel('w')
ylabel('H(w))
grid
pause
semilogy(1:512, abs(fy), w)
xlabel('w')
ylabel(H(w) AFTER DECIMATION')
grid
55
Illllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
II Program to show integer band modulation for Figure 2.4 MATLAB
// Title ibandmod.m Revision 33096 1:57pm 343 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
clear
m = 512;
for i = 1 :m
h(i) = .98*sin(2*pi*1100*i/32000. + .2*pi);
g(i) = .98*sin(2*pi*100*i/32000. + 1 26pi);
t(i) = (i1)/32000.;
end
for i = 1 :m
V(i)=0.;
if(rem(i,32)==1)
y(i) = h(i);
end
end
plot(t, h, V, t, g, 'w\ t, y, xw)
xlabel(t')
ylabel(ORIGINAL AND MODULATED SINE WAVE)
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
II Program to generate synthesis sine wave sections for Figure 2.8 C program
// Title sections.c Revision 33196 11:16pm 1049 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
#include
#include
#include
#include ,
#include
#include
#define PI 3.1415928
float z[64];
float n_cos[64];
float s_chan[5];
FILE *fptr1 ;
main()
{
int i, i1,j, j1,k, c1,c2;
int k1, k2, k3;
int isign; /* =1 for forward cos xform, 1 for inverse 7
56
int n; /* transform size 7
/* define cosine array 7
i = 2;
{ for (j = 0; j < 64; j++)
n_cos[j] = cos( (float) ((27 + 1") (j + 16)) PI / 64.0);
}
fptrl = fopenfsections.dat'V'wr");
s_chan[0] = 7409.36;
s_chan[1 ] = 1077.80;
s_chan[2] = 8676.40;
s_chan(3] = 9121.91;
s_chan[4] = 2047.06;
for (i1 = 0; i1 < 5; i1++)
{ if ((i1 =2) II (i1==3))
{ for (k = 0; k < 64; k++) z[k] = s_chan[i1] n_cos[k];
}
else
{ for (k = 0; k < 64; k++) z[k] = s_chan[i1] n_cos[k];
}
for (j1 = 0; j1 < 64; j1 += 8)
{ for (j = 0; j < 8; j++) fprintf(fptr1, "%8.2f", z[j1+j]);
fprintf(fptr1, "\r\n");
}
fprintf(fptr1, "\n");
}
}
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
// Program to plot synthesis sine wave sections for Figure 2.8 MATLAB
// Title sections.m Revision 4296 9:20pm 461 bytes
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH
load sect.dat
s1 = [sect(1:1,1:8) sect(2:2,1:8) sect(3:3,1:8) sect(4:4,1:8)];
s2 = [sect(5:5,1:8) sect(6:6,1:8) sect(7:7,1:8) sect(8:8,1:8)];
S3 = [sect(9:9,1:8) sect(10:10,1:8) sect(11:11,1:8) sect(12:12,1:8)];
s4 = [sect(13:13,1:8) sect(14:14,1:8) sect(15:15,1:8) sect(16:16,1:8)];
s5 = [sect(17:17,1:8) sect(18:18,1:8) sect(19:19,1:8) sect(20:20,1:8)];
axis([1 32 10000 10000]);
t = 1:32;
plot(t,s1,,wIt,s2,w,t,s3,w,,t,s4,,w)
grid
pause
57
Illllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
// Program to plot symmetrical overlap filter for Figure 3.1 MATLAB
// Title overlap.m Revision 33196 10:27pm 809 bytes
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
diary windfreq.dat
load wcoef.dat
ncoef = 8;
for i = 1:16
for j = 1:32
k = 32*(i1) + j1;
if (k~=0)
kfloat = k+.000000001 ;
angl = 3.1415928 (kfloat 256)/ 56.;
b(k) = 0.;
for m = 1 :ncoef
b(k) = b(k) + wcoef(m)*cos(2.*(m1)*pi*kfloat/512.);
end
cvalue = b(k) sin(angl)/angl;
h(k) = cvalue;
c(k) = h(k);
if (rem((k1),128) >= 64)
c(k) = h(k);
end
f(k) = 32000.*(k1)/512;
end
end
end
h(512) = 0.;
c(512) = 0.;
f(512) = 32000.;
a4=(fft(h)/56.);
a3 = fftshift(a4);
p = abs(a3(247:267));
for i = 1:33
q(i) = (i1 )*pi/33;
end
axis([0 pi 0 1 ]);
plot(q(1:15)Ip(7:21)Iw,(q(3:23)IpI,w,Iq(11:31)IpI,wlq(19:33)Ip(1:15)I,w')
xlabel(Radians')
ylabel(X(w)')
grid
58
REFERENCES
[1] P. L. Chu, "Quadrature Mirror Filter Design for an Arbitrary Number of Equal Bandwidth
Channels", IEEE Trans, on ASSP, vol. 33, no. 1, February, 1985.
[2] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, Englewood
Cliffs, N. J., PrenticeHall, 1983.
[3] A. Croisier, D. Esteban, and C. Galand, "Perfect Channel Splitting by Use of
Interpolation/ Decimation/Tree Decomposition Techniques," presented at the 1976 Int.
Conf. Inform. Sci. Syst., Patras, Greece.
[4] Digital Signal Processing Applications Using the ADSP2100 Family Volume 2, Analog
Devices, Inc., SubBand ADPCM, pp. 293300, PrenticeHall, Englewood Cliffs, N. J.,
1995.
[5] ISO/IEC JTCI/SC29, "Information Technology Coding of Moving Pictures and
Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s IS 11172 (Part
3, Audio)".
[6] J. D. Johnston, "A Filter Family Designed for Use in Quadrature Mirror Filter Banks",
1980 International Conference ASSP, Denver, CO.
[7] S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing, John Wiley &
Sons, 1993.
[8] P. Noll, "Wideband Speech and Audio Coding", IEEE Communications Magazine, vol.
31, No. 11, pp. 3444, November, 1993.
[9] D. Y. Pan, "An Overview of the MPEG/Audio Compression Algorithm", Proc. of the
SPIE, vol. 2187, pp. 260273, 1994.
[10] J. P. Princen and A. B. Bradley, "Analysis/Synthesis Filter Bank Design Based on
Time Domain Aliasing Cancellation", IEEE Trans. Acoust., Speech, Signal Processing,
vol. ASSP34, No. 5, pp. 11531160, October, 1986.
[11] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, PrenticeHall,
1978.
[12] J. H. Rothweiler, "Polyphase Quadrature Filters A New Subband Coding Technique"
in International Conf. on Acoust., Speech, and Signal Processing, pp. 12801283,
1983.
[13] S. Shlien, "Guide to MPEG1 Audio Standard", IEEE Trans, on Broadcasting, vol. 40,
No. 4, pp. 206218, December, 1994.
59
[14] G. Wackersreuther, "Some New Aspects for Filters in Filter Banks", IEEE Trans.
Acoust., Speech, Signal Processing, vol. ASSP34, No. 5, pp. 11821189, October,
1986.
60

PAGE 1
A REVIEW OF SUBBAND CODERS FROM A PRACTICAL PERSPECTIVE INCLUDING A NEW METHOD FOR THE DESIGN OF THE ANALYSIS AND SYNTHESIS FILTER AS A COSINE SUMMATION by Carl William Glasow, Jr. B.S.E.E., Arizona State University, 1968 A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science Electrical Engineering 1996
PAGE 2
This thesis for the Master of Science Electrical Engineering degree by Carl William Glasow, Jr. has been approved by Mike Radenkovic Tarnal Bose Hamid Fardi Date
PAGE 3
Glasow, Carl William Jr. (M. S., Electrical Engineering) A Review of Subband Coders from a Practical Perspective Including a New Method for the Design of the Analysis and Synthesis Filter Window as a Cosine Summation Thesis directed by Associate Professor Mike Radenkovic ABSTRACT Subband coders are reviewed in terms of digital signal processing fundamentals. An actual application, music compression, is used to help describe the digital processing techniques used. The subband coder separates the audio signal into frequency bands each with a sampled time domain signal of primarily that frequency. A detailed technique for designing the analysis and synthesis filters for subband coding of audio signals is also presented. The technique expresses the window in terms of a finite Fourier cosine series summation and permits the specification of passband ripple and stopband rejection. Design of a proper filter provides a means for nearperfect reconstruction of the original audio signal from the subband coded signals. Due to characteristics of human audio response, the subband coded signals are in a form which can be better compressed without perceptual loss of signal content to the human ear. This thesis accurately represents the content of the candidate's thesis. I recommend its publication. Mike Radenkovic iii
PAGE 4
CONTENTS CHAPTER 1. INTRODUCTION AND BACKGROUND . . . . . . . . . . . . 1 General......................................... 1 Technical Background Audio Compression Requirements ... 2. SUBBAND CODER REVIEW .............................. 4 General ......................................... 4 Decimation and Interpolation . . . . . . . . . . . . 6 Modulation Methods . . . . . . . . . . . . . . . . 9 Aliasing Cancellation . . . . . . . . . . . . . . . 12 Polyphase Filters . . . . . . . . . . . . . . . . . 14 Continually Changing Waveforms and the Short Term Fourier Transform .. ." .................................... 16 Block Transformation and Computational Efficiencies ........ 17 3. ANALYSIS/SYNTHESIS FIL TEA DESIGN ...................... 25 Filter Requirements . . . . . . . . . . . . . ...... 25 Cosine Summation Problem Statement .................. 27 Cosine Summation Solution .......................... 30 Filter Design Results ............................... 32 4. APPLICATION OF THE RESULTING SUBBAND CODER TO SINE WAVE SAMPLES AND AN ACTUAL MUSIC SAMPLE ....... 36 5. SUPPLEMENTARY AND ALTERNATIVE COMPRESSION TECHNIQUES .................................... 40 iv
PAGE 5
6. SUMMARY AND ADDITIONAL RESEARCH POSSIBILITIES ........ 41 APPENDIX A. LISTING OF THE SUBBAND CODER C PROGRAM .............. 42 B. ADDITIONAL PROGRAM LISTINGS .......................... 47 REFERENCES ..................................................... 59 v
PAGE 6
CHAPTER 1 INTRODUCTION AND BACKGROUND General The intention of this thesis is to provide a thorough understanding of the signal processing methods used to compress audio by means of subband coding. Subband coding implements a unique digital filter, involves multirate sampling methods, requires specific modulation methods for an efficient implementation, eliminates aliasing using digital methods rather than by prefiltering, works with a continually timevarying waveform, and uses a specific discrete digital transform algorithm that considers both compression and efficient implementation. Initial attempts to understand the requirements and design method for the unique filter were not successful. An effort to model the filter used in the MPEG (Motion Picture Experts Group) standard for audio subband coding resulted in the development of a design technique that specified the filter in a concise form as a cosine series summation. Organization is as follows. As background, the need for audio compression and applicability of this technique to audio compression is reviewed. A detailed description of the subband coder technique follows. The filter requirements are next discussed. A step by step procedure for determining the analysis and synthesis filter windows is presented. Application of the technique to a music compression requirement is presented, using the technique on an actual musical sample to verify it's correctness. Supplementary MPEG techniques are briefly discussed. Copies of the programs to design the filter, to demonstrate usage of the subband coder, and other programs used to verify the method are included in appendices. Technical Background Audio Compression Requirements The compression of audio signals has a long history including mechanical coders in the 1940's that modelled the human voice, early digitized voice compression research by Rabiner and Schafer [11 ], human psychoacoustic research, and the various researches which resulted in the modern music compression techniques developed by the MPEG Standards [5] committees during the last fifteen years. The need for compression in music signals is examined by comparison of the bandwidth of uncompressed music with popular modern day transmission bandwidths and digital storage capacities. The human ear has a bandwidth of about 20 KHz and a dynamic range of about 1 00 dB, and modern music playback and recording equipment is capable of meeting these requirements. To digitize music at this bandwidth and dynamic range requires a Nyquist sampling rate of double the bandwidth and digital resolution of a bit per each 6 dB of dynamic range. Modern day CD players meet this requirement by sampling at 44.2K 1
PAGE 7
with 16 bits of resolution for each of two channels (stereo) of music. The composite required bit rate is (2)(44200)(16) or approximately 1.4 Mbits/second. Compared with the 4 KHz bandwidth of a phone line, 28.8K modem and 64K ISDN data communication channels, and the 1 OOK bandwidth of an FM radio station, the need for compressing digitized music signals is quite obvious. To transfer 3 minutes of uncompressed stereo music over a 28.8K modem would take two hours and twenty seven minutes. The situation in storage is not much better. To store a 2 hour stereo music program digitally requires 5.1 gigabytes, exceeding the capacity of modern low cost disk drives. Depending on the desired quality of the reconstructed music signals, compression ratios ranging from 6 to 1 to approximately 90 to 1 are available. Improvements in technology will relieve most present day problems. Hard drives will become larger, the cost per byte of memory will continue to decrease, modems will become slightly faster, and ISDN lines more prevalent among consumers. The real breakthrough, though, will be fiber optic access for the consumer, either supplied by the phone companies, or by home computer and internet access over already available cable company fiber optic lines. As this is being written, Jones lntercable has announced 100 MBit internet cable service to its Washington D.C. customers by summer 1996 for $40 a month. Even these advances will not alleviate the need for compression though. With a 1 00 Mbit data line, uncompressed audio will still require two minutes to deliver a two hour audio program. The line also needs to have available bandwidth for video, and however fast the transfer is, consumers will always be happier with faster delivery. In general, if the communication network of a country is at capacity, that capacity can be increased by a factor of ten if the information is compressed by a factor of ten. If delivery requires a specified amount of time; that delivery time can be improved by a factor of ten if the information is compressed by a factor of ten. Some background in the technical aspects of compression follows. Most compression techniques depend on the fact that not all codes are used equally, or that data is highly correlated with adjacent data. Text data on a computer, for example, is encoded in 8 bit ASCII, accommodating 256 different characters. In typical text, 15 to 20 letters comprise an estimated 90% of the content, easily providing 1 0 to 12 compression even in the absence of a lot of white space. Faxes take advantage of the large amount of white space on most pages to effectively compress data. Videoconference equipment takes advantage of the fact that people and surroundings do not move much when in a conference. Video motion picture transmission also provides a high degree. of correlation since it's highly likely that a pixel is the same or nearly the same value as the pixels above, below, to the right and left, and for several frames prior and following. Correlation in three dimensions is thus possible. In fact, video requiring 24 bits per pixel to represent brightness and color, and 525 by 400 pixels on a television screen updated at 30 frames per second requires an uncompressed data rate of 150 Mbits per second. With correlation of data in three dimensions, though, compression ratios between 100 and 1 000 are achievable, depending on the level of action. Speech compression, a subset of the audio compression problem, also can provide a high level of compression. It does not usually require stereo, the human voice is usually limited to 4K bandwidth, and compression techniques that model the human vocal tract provide reasonable quality of an actual human voice at 2500 bits per second, Rabiner and Schafer 2
PAGE 8
[11 ]. For a simulated voice containing no emotional information, the limited number of vowels and consonants in human speech can each be modelled with 6 bits and if typical speaking rates occur at 3 sounds per second, speech can be compressed to 200 bits per second. The audio music signal, however, does not provide as high a level of correlation. Music represents the full breadth of vocal and instrumental expression that can be heard by the human ear. If sound occurs at up to 20KHz and is sampled at 44KHz, and the dynamic range is 1 00 dB, there is little probability that the next sample can be accurately predicted. The signal only occurs in one dimension, and that dimension is poorly correlated in time. Neither differential encoding or adaptive differential coding are very effective in this signal environment. Music's one correlative characteristic is that it typically consists of a limited number of frequencies, and these frequencies change at a much slower rate that 44.2 KHz. Slowly changing frequency content with a limited number of frequencies suggests the use of an adaptive predictor. The frequency content and phase, however, change at a fast enough rate, such that a predictor has a difficult time keeping up while maintaining sufficient accuracy. For five to one compression, the filter would have to predict the next 16 bit value with 3 bit accuracy, an extremely difficult task. Additionally, depending on the number of coefficients predicted, audible lower amplitudes of unpredicted frequencies would be left out. The subband coding technique, in contrast, takes advantage of the physical characteristics of human hearing, by providing at least some information in all frequency bands. Studies of human auditory response demonstrate that the human ear can hear multiple frequencies of a wide dynamic range if the separation in frequency is high enough, Pan and Shlien, [9] and [13]. However, the ear cannot hear two frequencies that are close together, if one is higher in amplitude than the other. The ear also requires a higher level of signal for high frequency signals than for low frequency signals. For example, a 20 KHz signal needs to be 40 dB higher in amplitude than a 1 KHz signal at minimum hearing threshold levels. The first hearing characteristic suggests that the signal can be coded into a finite number of frequency bands without loss of audible information, as long as the bands are a width that agrees with human frequency discrimination capability. The second hearing characteristic suggests that once the information is divided into bands, the higher frequencies can be coded with significantly less bits than the lower frequencies. The subband coder is an extension of the Quadrature Mirror Filter originated by Croisier and Esteban [3], a twochannel subband coder that has been adopted by the CCITT as Recommendation G.722. A signal processing implementation of Recommendation G.722 as described by Analog Devices [4] is used to add an auxiliary channel to a voice channel by dividing an 8 KHz channel into two 4 KHz channels using overlapping_ bands. Rothweiler [12] and Chu [1] described the method by which the Quadrature Mirror Filter is extended to multiband use. 3
PAGE 9
CHAPTER 2 SUBBAND CODER REVIEW General For the subband coder to be useful it needs to have a flat frequency response, it must be capable of operating on a constantly changing signal, and needs to provide near perfect reconstruction of the original signal from the subband coded data. The ideal subband coded information should also be less than or equal the bit rate of the original signal, called the critical sampling frequency. At first look, these requirements would require a perfect brickwall bandpass filter for each frequency band. If the filter is narrower than the band, information is lost at the band borders and the frequency response is not flat. If the filter is wider than the band, then the coded band information requires a higher bit rate than the original signal. The solution is to have overlapping bands, where the combination of information from two adjacent bands can be used to reconstruct the signals at the band borders. This can be accomplished while sampling the subband coded signal at the same frequency as the original signal. However, information outside the band, when sampled at the critical Nyquist frequency of. the band will exhibit aliasing. Attempted recovery of an aliased signal will result in interpretation as an incorrect frequency within the band instead of the original out of band frequency. An additional requirement is to design a transform process and a filter such that any aliased terms are cancelled. The filter design should also have a finite number of coefficients. The ideal set of coefficients for a brick wall filter can be determined by taking the Fourier transform of the ideal rectangular frequency response. The transform result defines the coefficients in the time domain. A set of n successive coefficients define the multiplying factor for n successive sampled values of the time domain signal. The Fourier transform of a rectangular frequency function is the sin(co)/co function shown in Figure 2.1. The function is plotted to t with zero crossings at n7t for n an integer not equal to 0. The actual Fourier transform is infinite. Although this function decreases in value the farther it is from its center maximum, the signal is still not finite, meaning that the ideal filter has an infinite number of coefficients. Simply zeroing the coefficients beyond a certain time results in an effect called the Gibbs phenomena, where any ripple due to the filter truncation causes an oscillation in the frequency domain near the cutoff frequency. This oscillation does not decrease in amplitude, but increases in frequency as the cutoff frequency is approached. This problem can be minimized by multiplying the truncated sin(co)/co function coefficients term by term by a windowing function which decreases the outside coefficients gradually to zero as they approach the truncation boundary. Several popular window techniques are described in digital signal processing textbooks, including the Hanning, Hamming, Blackman, and Kaiser windows. These windows tradeoff complexity, passband and stopband ripple characteristics, and rolloff characteristics to meet differing application 4
PAGE 10
3 ..... 3 ..... 01 8.8 .................................................................................. 8.6 ................................ ............... 8.4 ....................... ........................ 8.2 ................................................. ............... 8 .............. : ............. 8.2 :: ........ : ........... ..... .: ............... . . . . . . . . . . . . . . . . . 38 28 18 8 18 28 38 w Figure 2.1 sin(co)/co requirements, and approximate the ideal brickwall filter with a finite number of coefficients. The window function therefore modifies the coefficients of an ideal filter to provide a filter with desirable characteristics .. The subband coder filter application requires in addition to the application of a textbook window, the design of a window that both eliminates the aliased terms in an overlapping filter, and permits, when the subband coder filter (the analysis filter) is combined with the reconstruction filter (the synthesis filter), near perfect reconstruction of the original signal. Prior to discussing the filter design, an understanding of the subband coder is helpful. This requires defining the method by which the signals are separated into bands so they can be filtered, defining the number of bands, defining the width of the filter, discussing how a timevarying signal is handled, and defining an efficient mathematical technique for implementing the filters. The first step is separating the input signal into different frequency bands. This is accomplished by modulating the signal with equally spaced sine or cosine functions. This is classic frequency modulation and for each modulating signal results in signals equal the sum and the difference between the input signal and the modulating signal. Mathematically, this is illustrated easily by the basic trigonometric identity cos(a) cos(b) = .5 cos(a+b) + .5 cos(ab), (2.1) 5
PAGE 11
where signal a is the input signal and signal b is the modulating signal. Since several bands with finite spacing are provided, the resulting difference between the signals can be filtered with a baseband filter equal the width of the band. (The sum term is far outside this filter bandwidth, is of no interest, and is filtered out). If the bands are equally spaced, the same baseband filter, called a prototype filter, can be shared mathematically by all bands. The number of signals to be filtered equals the number of bands. Additionally, the bandwidth of the narrow band filters is much less than the bandwidth of the full original signal bandwidth. The reduction in filter bandwidth is by a factor equal the number of bands. For example, using simple numbers, if a signal with 16 KHz bandwidth is sampled at 32 KHz and 32 bands are used, the modulating signals are 500 Hz apart and the baseband filter operating on each of the modulated signals has a nominal bandwidth of 500 Hz. Decimation and Interpolation 8.8 1 : : : : : : . . . . . . . . . : : 0 : 0 : : 0 . . . . . . . . . . . . . . 8.6 . . : 0 : 0 : 0 . . . . . . . 0 0 . .  0 8.4 . . . . . . . B.Z &:&1 ::> a: : 8 .z ....... ..... 1""'"" .......... """"["""""" "("' ..... "T' ............ :_ ............ .. z . . 8.4 """"""']""'"" .......... """"["" ""'"'j"'" ..... '"!" """"""["""""'" . . . . 8.6 . . : ..... :: : : . . 8.8 . . ..... .... .... . . . . . . . ............. f ............. "1" . . . . . . "i" .............. f ............ : : : : 8 18 ZB 38 48 58 68 78 n Figure 2.2 Decimation and interpolation The reduction in bandwidth means also that the sample rate can be reduced accordingly. For the example above, the Nyquist sampling rate is 32 KHz for the original signal and 1 KHz for each of the modulated signals. Since there are 32 modulated signals to be sampled, the composite sampling rate is still 32 kHz, meeting the critical sampling frequency requirement. 6
PAGE 12
The technique of sampling at a lower rate is called downsampling. The downsampling filter is called a decimation filter. Reconstruction of the signal involves the opposite of this, upsampling, and the upsampling filter is called an interpolation filter. The downsampling filter is called a decimation filter because it uses only every nth input sample (every 32nd sample for the example above), decimating the filter. The interpolation filter, in contrast, takes a decimated signal, zero fills the intermediate samples, and by filtering, interpolates the signals between the decimated points to recover the original signal. Figure 2.2 illustrates the decimation and interpolation of a sine wave by a factor of 4. The original waveform was sampled four times more often than shown. The waveform can be recovered by interpolation, setting the intermediate values to zero as shown and filtering properly. A well designed filter can recover the original intermediate samples. Decimation and interpolation filters introduce aliased terms due to characteristics of the sampling action. These further complicate the requirement to cancel aliased terms. When a signal is low pass filtered by some factor prior to decimating by that same factor, the decimation creates duplicate images of that low pass frequency response across the full frequency domain. This is easily demonstrated by plotting the frequency response of both a low pass filter and of a decimated version of that low pass filter. The plots in Figure 2.3 are generated by a program that creates a low pass filter with bandwidth 1t/8.5 (a bandwidth of n provides an all pass filter), windows it with a Hanning window, decimates the result by 8, and performs an FFT on both the original filter and the decimated filter. The notches are created by designing a filter with a bandwidth slightly less than the full decimated bandwidth for the purpose of band border illustration. Using the same low pass filter for interpolation as for decimation eliminates the other images generated during the decimation operation. However, since the filter is not perfect, the edges of adjacent images are not completely eliminated. A nonideal bandpass filter attempting to capture one of the middle spectral images above would not be able to eliminate the edges of the adjacent images. This is the aliasing effect of the decimation and interpolation operation. The difference between a simple filter application and the subband filtering application is that normally the out of band information would be significantly attenuated prior to sampling. This is accomplished either by use of a good attenuator, or by oversampling so that a less complex filter may be used. In the multiband filter application, the adjacent bands contain necessary information, and with the overlapping filters proposed, the adjacent band information can not be substantially attenuated. Another characteristic of decimation filters, a quite useful one, significantly reduces the amount of computation in the subband filter application. If an input signal is multiplied by another function prior to decimation, the computation can be performed on the decimated signal after the decimation instead, and only on the decimated samples instead of all input samples. This means that for a particular band, the modulation multiplication and the analysis filter window coefficient multiplication can be performed after the decimation only on the .decimated samples. The interpolation filter exhibits similar features. 7
PAGE 13
3 .... ::c z: 0 ... G: I: u 1&1 Q II: 1&1 ... ... G: 18.1 18" 18.l 182 183 184 185 186 18? 188 18_, 8 188 Z88 388 488 588 688 w ::::::::::::::::.:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::!:::::::::::::::: : : : : : : : : : : : : : : : : i : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : l : : : : : : : : : : : : : : : : 0 : 0 0 : : : 0 0 0 0 0 : 0 0 0 0 0 0 0 : : : 0 0 0 0 0 0 0 0 : : : : : 0 0 0 :. :. .: 0 0 0 0 0 0 0 0 . . _,___ ...... : : : : 181 ::::: ::: ::::::::: :::: :. :::::::::: ::::::::::::::::::: ..................... ........... .......... .......... ...... ............... ......... : .................... ........... .......... :... t . . . .................. ,... 182 3 .......... ,.......... ............ ........... ........... .................. .......... ........... ............................. . . . . . . . . . . ,, . . . . ...... : . ....... ........ .......... ................... l . ........... : .......... ::\ 8 188 Z88 388 488 588 688 w Figure 2.3. Aliasing effect of decimation and interpolation. 8
PAGE 14
Modulation Methods The next topic of interest is the modulation technique used to extract the information in each frequency band. There are several methods for spacing the modulation frequencies in the frequency band. Depending on the method used, complex data may result in doubling the amount of compressed data, the bottom and top bands may not be handled ideally, the transform size may not be optimal, or an efficient transform may not be realizable. Several modulation methods are briefly discussed to aid in understanding the particular modulation method chosen for the subband coder application. The decimation and interpolation operation itself is a form of modulation and demodulation. Sampling a frequency band at a decimated rate translates that frequency down to the baseband (assuming the information was bandpass filtered prior to the decimation.) Intuitively this can be grasped by imagining a high frequency sine wave being sampled at a much lower rate than the Nyquist sampling frequency. If the sampling rate is not an exact submultiple of the original frequency, the samples, although they skip several periods, will create a lower frequency sine wave. The number of periods skipped is related to the frequency band that is translated. Figure 2.4 shows an 11 00 Hz sine wave sampled at both 32 KHz and at 1 KHz. The points marked X show the 1 KHz sampling. A baseband filter recovery using the X samples results in the 1 00 Hz modulated waveform shown. 9
PAGE 15
As seen in Figure 2.3 illustrating aliasing due to decimation. and interpolation, interpolation of this downsampled signal by zero filling to increase the sample rate, creates multiple spectral images. Instead of low pass filtering the result, one of the higher frequency images may be captured by bandpass filtering the result. The original information is thus translated up to the original frequency band by simple interpolation (or to another band if desired) without having to multiply the signal by a cosine or sine modulation term. A digital bandpass filter on the input and output are all that are required. This simple modulation technique is called integer band decimation and interpolation. The limitations of this technique are the pre and post filtering requirement, and that the bands are fixed as shown in Figure 2.3. If the desired center frequency of the bands is not as shown, a different modulation technique is required. For example, having half bands for the high and low bands may be undesirable. Modulation using the Discrete Fourier Transform (OFT} provides a general purpose modulator, and at the same time provides an efficient mathematical method for handling multiple bands at the same time. The OFT equation is: X(ro} = x(n}*eIOlll, n=O where ro is the desired modulation frequency, x(n} are the M samples, X(ro} is the transform result, j is the imaginary operator, and per Euler's equation: elOlll = cos(ron} j*sin(ron}. (2.2} (2.3} If M is a power of 2, the Fast Fourier Transform (FFT} may be used to further minimize computation. The values of ro, or center frequency values, can be any equally spaced frequency values, 250 Hz, 750 Hz ... for example. The first value of ro is not required to be 0. As long as the frequencies are equally spaced, the OFT can be efficiently implemented. As seen from equation (2.3} though, the general OFT creates both real and imaginary terms. This technique is called quadrature modulation because both the cosine and sine terms each create a sum and difference transform term, or four terms total. The result can be low pass filtered eliminating the sum terms, but two terms are still required for each frequency band. If these terms are kept as compression terms, twice as much information must be transmitted or stored compared to a method that requires only real terms for each band. Single sideband (SSB) modulation provides this method. If the original time signal to be modified is represented by a complex value and its conjugate value, multiplying the complex value by exp(jron} and its conjugate by exp(jron} and summing them would result in a real term. Creating the complex term and it's conjugate can be accomplished by multiplying the signal by cos(ro,n} + j*sin(ro,n}, and cos(ro,n} j*sin(ro,n}. With some straightforward mathematical manipulation, the multiplication and summation result in the 10
PAGE 16
input signal being multiplied, or modulated, by cos[(co1 + co)n]. If the co term used is equal to !lco/2, the width of the band divided by 2, the ro1 terms form a set of equally spaced center modulation frequencies. A comparison of standard quadrature modulation and SSB modulation with discrete Fourier transforms is instructive. A standard Discrete Fourier Transform utilizes e1oon and results in a complex frequency spectrum. A variation of the OFT, the Discrete Cosine Transform, multiplies the series by a cosine term instead and produces a real spectrum. Similarly, the SSB cosine multiplication produces a real transform output. SSB modulation also permits selection of an initial offset frequency, as with quadrature modulation. Selection of the resulting sideband frequencies and the width of the subbands still needs to be addressed. There are tradeoffs related to band alignment, number of bands for a particular transform size, and computational efficiency. Figure 2.5 from Crochiere and Rabiner [2] shows three cases. In each case, eight frequencies are equally spaced around the unit circle and require the same transform size. In the first case the first frequency is 0, and since the frequencies on the bottom of the circle are conjugate to those on the top half, the resulting non duplicate bands are three full bands and a half band at minimum and maximum frequency. In the second case, where the bottom band is centered at Y2 band from the origin (7t/8 for the example), the bands are also duplicated. There are now at least four full bands and no half bands. In the third case, the first band is offset by 1A band. In this case, the frequencies on the bottom half of the circle are offset from those on the top half. If alternating bands from the top and bottom half are used, the bands may be reduced in half and eight non overlapping, non duplicated bands with bandwidth 1t/8 are available for the same transform size as for the other two cases. This case, however, does not lead to a computational approach that lends itself to a fast transform, although Crochiere and Rabiner [2] state that by using quadrature modulation and proper handling of the imaginary terms an efficient implementation can be achieved. The second implementation lends itself more readily to an efficient implementation, using only real numbers, although it requires twice the transform size for the same number bands. This is the approach used in the MPEG specification. The modulation frequencies for an M band system are: co[i] = 27t(i + Y2}/2M, i = 0 to M1 (2.4) For the four band system in Figure 2.5, for example, a transform size of 8 is required and the first band co[O] is at 27t(Y2)/8, or rr/8 as shown. A 32 band system, such as that used by the MPEG specification, requires a transform size of 64 and the first band is at 7t/64. The resulting transform will be of the general form: s[i] = L x[n] cos[7t(2i+ 1 )(n+n0}/2M], n=O where s[i] are the subband values, i is the band number from 0 to M1, and x[n] are the input samples, and N is the number of samples transformed. 11 (2.5)
PAGE 17
. ( t ll SSB CHANNELS PL.US HAL.F CHANNEI..S AT w 0,,. 2 3 II. "'o'o ... 6 (0) I< e,k0 112 ffi' 1<12 SSB CHANNEL.S I 0 2 3 I I I 4 7 0 "'o ... '"'2 '"'3 ,. 5 6 lbl 1< e,k0 114 2 K SSB CHANNEL.S (C) Figure 2.5 SSB modulation options The value of n0 will be addressed in the subsequent discussions on aliasing and block transformations. Aliasing Cancellation A graphical is helpful in describing aliasing cancellation in the multichannel subband coder. Figure 2.6 below is from Chu[1 ]. For the bandpass filter output of a full spectrum signal, Figure 2.6 shows the spectrum of the signal at the bandpass output, following decimation, and after the bandpass interpolation filter. The mirror images shown as the conjugate frequency spectrum are those that result from a Fourier transformation of the bandpass signal. For a cosine waveform, an even function, the mirror images are the same polarity. For a sine function, an odd function, they are of opposite polarity. These images of course need to be considered because they generate the multiple frequency bands, and the aliased terms that need to be cancelled. The fourth image set shows aliasing terms on both the lower and upper end of the spectrum for both the frequency band and ifs image. 12
PAGE 18
I coNJuGnr[ rR[o sPCtTRU1 I rm;OU[NCY SP!:tTRI.rl I I I I PI PI It riLT[A AND MODULAT WITH COSIN[ rUNCTION A A I I 8 PI OCC I""'T ION / IHT(IIPOLAT ION SP!:CTlriAG 1 I I rtt II PI I _,.,, I PI rUNCTIIJ'O 1 fl jA&Aj PI SUBTAnCT II D A D ?if' I fA'j\ I PI I CONJLCAT!: rRE:o SP!:CTRU1 I rRE:OU:NCY I 1 I I PI f) PI I PI riLT[A AND WITH SINe rUNCTION PI/'2 I PI Figure 2.6 Aliasing effects of decimation as a function of modulation phase. The bands on the left are cosine modulated and the bands on the right are sine modulated with a resulting 7t/2 phase difference. Another 7t/2 phase shift in the interpolation filter for the sine modulated band results in an out of phase negative frequency image, and the spectral images shown on the right in Figure 2.6. The decimation and interpolation generate multiple spectral images of the same phase as the original images in the third image set. The sine function can also be considered a multiplication of the positive frequencies by j and of the negative frequencies by j. The right side main positive and negative frequencies are multiplied by j*j and (j)*(j) respectively which both equal 1. The nonaliased frequencies are thus simply inverted by the procedure. The aliased terms, on the other hand, in the regions of interest, are multiplied by j*(j) or (j)*j providing a gain of 1. The main frequency images are therefore inverted and the aliased images are not. If the two figures shown represent adjacent bands (ie. one even bands and one odd bands) the bottom spectrum in Figure 2.6 reveals that subtracting the sine modulated adjacent band from the cosine modulated band results in doubling the desired spectral information and cancelling the aliased information. It should also be noted that the aliasing is cancelled if the adjacent bands are 7t/2 phase related regardless of the exact modulated phases. Chu [1) and Rothweiler [12) both present a mathematical derivation that demonstrates the aliasing cancellation. The derivation is not repeated here but is briefly discussed. The derivation describes the decimation in the frequency domain as a convolution of a periodic train of delta functions with the filtered input. Expressions are derived for the frequency 13
PAGE 19
response in adjacent bands, multiplied by cosine or sine terms as applicable, main frequency and aliasing terms are identified, and the upper side and lower side of the two adjacent bands combined to show the aliasing cancellation. The derivation also leads to filter design requirements including a symmetry requirement and a constant square summation for adjacent band contributions at all frequencies covered by the adjacent bands. Unfortunately multiplying one set of channels by a cosine function and the other by a sine function requires two different kinds of filters. This means the efficient Discrete Transform implementation hoped for is lost However, Rothweiler [12] has shown that if a 45 degree phase shift (or 7r14 phase shift) is added (or subtracted) to both the cosine and sine terms, the same filter can be used if some care is taken handling a polarity change for a part of the term. The effect of subtracting the w4 term is shown in the simple trigonometric equations: cos(x 1ri4) = (1/J'2)*[cos(x) + sin(x)], and sin(x7r14) = (1/J'2)*[cos(x) sin(x)]. Defining n0 in equation (2.5) appropriately, the resulting subband transform is: s[i] = L x[n] cos[1t(2i+ 1 )(n/2M) 7r14 )], n=O giving the following cosine terms for the first four subbands: i = 0: cos(11tn/2M w4) = (1/J'2)*[cos(1tn/2M) + sin(1tn/2M)] = f!"4 i = 1: cos(31tn/2M w4) = (1/J'2)*[cos(1tn/2M) sin(1tn/2M)] = el3"'4 i = 2: cos(51tn/2M 1t/4) = (1/J'2)*[cos(1tn/2M) + sin(1tn/2M)] = eJ5"'4 i = 3: cos(71tn/2M 1t/4) = (1/J'2)*[cos(1tn/2M)sin(7tn/2M)] = el"'4 (2.6a) (2.6b) (2.7) (2.8a) (2.8b) (2.8c) (2.8d) The result of equation (2.8) demonstrates both the 1t/2 phase shift required between bands aliasing cancellation, and a need to take care of a minus sign for certain bands. An integer substitution for n that permits n to be specified in groups of 2M permits the inversion to be handled by a partial filter inversion, and will be described in the block transformation discussion. A similar w2 phase shift in the synthesis process will provide an end to end transform inversion of alternate bands where the aliased information is finally subtracted out. Polvphase Filters When, as in the MPEG subband coder example, there are 32 filters, an implementation that decimates the input samples and handles them in blocks of 32 must be defined. If the implementation is pictured as a one sample fixed phase delay, the delay can be represented by the delay operator esr, where T is the single sample delay time. This delay term is of the form of the Discrete Fourier Transform operator, and suggests tl)at a 14
PAGE 20
transform operation is applicable that uses all samples. The samples can then be handled in blocks, where the block size is equal the decimation ratio. The selection of the sample is often portrayed as a commutation operation, or a rotating selector, where each successive sample is part of a different decimation group. The decimation property that permits multiplication by a window function or modulation function after the decimation is now utilized. The full convolution of an input signal with a filter window and modulator would normally be shown as: s[i] = L x[Tn]h[n]cos[(2i+1)(mti2M 7t/4)]. n=O (2.9) This equation indicates that each input sample needs to be multiplied by each window and modulation point. Shifting the samples by the window as a block permits samples to be lined up with particular window coefficients, and multiplication by the modulation function can now be defined by a transform size less than the full size of the window. This capability permits a much more efficient implementation of both the windowing and modulation. The one sample fixed phase delay gives the subband coder an alternate name, the polyphase filter. The implementation also permits use of all input terms in a multiband filter, as opposed to using one set of decimated terms. Use of the full set of input samples intuitively provides more information about the signal when considered in its original form, a single band full bandwidth signal. Figure 2.7 from Crochiere and Rabiner [2] illustrates the polyphase filter implementation. Mitra and Kaiser [7] also describe the polyphase filter in their Handbook of Signal Processing. x0 (m) x, (ml x(n) Xp(m) XMt(m) Figure 2.7 Polyphase filter block diagram. 15 wktM0 M
PAGE 21
The one sample difference between adjacent samples together with the cosine modulation operation lends itself to a mathematical implementation of the composite filter using a variation of the Fast Discrete Cosine Transform (FDCT). It's important to note the usage of the FDCT in this application is different than the more common use. The FDCT, like the Fast Fourier Transform (FFT) is typically used to transform timedomain data to the frequency domain. The output in this typical application is a spectrum representing the magnitude of each frequency band. In the subband filter application, the FDCT is used as a mathematical construct to transform a single timedomain signal to multiple timedomain signals, one for each frequency band. For the music compression example, a 4800 Hz signal modulated by 4750 Hz in a 4500 Hz to 5000 Hz band, results in a 50 Hz signal in that band sampled at a 1 KHz rate, rather than simply a magnitude signal indicating presence of signal activity in that frequency band. The sampled timedomain signal also provides phase information with non complex data, unlike the normal FFT frequency domain output which requires complex data to maintain phase information, and unlike the normal DFCT frequency domain output which, although all output is real numbers, contains no phase information. The resulting subband signals are not pure decimated and modulated time domain signals, however. Adjacent frames of subband data are shifted in phase by rc/2 radians, as required for aliasing cancellation. The polyphase filter similarly results in a more efficient implementation of the synthesis filter. Continually Changing Waveforms and the Short Term Fourier Transform The shortterm Fourier transform describes the method by which changing waveforms are transformed on a sliding and overlapping basis to provide a smoothly changing and accurate transformation of a time domain signal into the frequency domain, or separation of the time domain signal into several time domain channels of different frequency content. The first case applies to spectrum analysis of a changing signal, the second to the subband coder application. Rabiner and Schafer [11] provided an early exploration of this subject as it applies to speech waveforms. The technique slides the waveform across a window in sample frames. The window is wider than the sample frame. In the music application, the window contains 511 points, 32 samples are shifted in each frame time, a new transform is performed each 32 samples, and 16 frames are always active. The 32 samples agree with the decimation factor and with the number of subbands. Because of the midpoint maximum of the filter, the frame halfway through the window is given the most weighting. Future and past frames are given increasingly less weighting the farther they are from the middle frame. All frames are added with their relative weighting. As new frames are shifted in and old frames are shifted out, the effect is a continuous overlap and add of the changing input data coupled with the filtering operation. For the synthesis operation, the input subband information represents a decimated time signal. The decimated time signal has rc12 phase shifts incorporated into it for adjacent 16
PAGE 22
bands to cancel aliasing effects. It also has a 7t/2 phase shift for adjacent frames to better capture frequencies close to the modulation frequency. Otherwise, if a 1250Hz input were modulated by 1250 Hz, the subband information would be stuck at a constant value, providing no timing or amplitude information. The synthesis filter, for each subband, multiplies the subband term by successive sections of a fixed sine waveform as it moves from frame to frame. The frequency of the sine wave section is related to the subband modulation frequency. This combined with the changing subband signal provides 16 signals times 16 waveform sections of 32 samples each. The sections are multiplied by the synthesis window and subsequently added to reconstruct the waveform. The section in the middle is weighted more heavily by the middle maximum of the synthesis filter. Sections prior to and after the center section are weighted less as in the analysis filter. As the subband generated sinusoid sections are frame shifted by the window, an overlap and add operation occurs reconstructing a time changing waveform. In the next section, the movement of a sine wave through this analysis and synthesis process is described including the phase shifts and partial window inversions required to implement it. Block Transformation and Computational Efficiencies The input samples are shifted 32 samples at a time by a 511 point window. They are convolved with the analysis window multiplied by the cosine modulation terms. The resulting convolution equation, implementing both the 7t/2 phase shift for adjacent bands, and the 7t/2 phase shift for successive frames is: s[m][i] = 5f x[Tn32m] h[n] cos[(2i+1){n16)7t/64], n=O where x[Tn32m] are the n 512 input samples at frame time m, his the 512 point window, s[m][i] are the 32 subband outputs at frame time m, and i defines the 32 subbands from 0 to 31. {2.10) The 2*i+1 provides the TC/2 radian shift between adjacent bands. The n16 term provides the rrf2 radian phase shift from frame to frame as n increases by 32 samples at a time. The 167t/64 term provides the rrf4 term required to implement the transform with a single filter as described earlier in the discussion on aliasing cancellation. The preceding implementation, while easy to understand, is restructured to reduce computation, Pan [9]. The multiplications required for equation {2.1 0) are 512 for each of 32 bands for a total of 16384 multiplications. 511 additions are required for each band, or 16352 additions. A simple substitution is used to reduce the computations. This substitution is made possible per the discussion on decimation and efficient computation in polyphase filters. 17
PAGE 23
By setting n = k + 64j, where k and j are both integers, and by reversing the direction xis slid across the window, equation (2.1 0) becomes: s[m][i] = t x[k+64j+32m]*h[k+64j]*cos[(2i+ 1 )*(k+64j16)7rl64]. k=O j=O The cosine term can be expanded to: cos[(2i+1)*(k16)7rl64] cos[(2i+1)*64j7t/64] sin[(2i+1)*(k16)7t/64] sin[(2i+1)*64j7rl64]. (2.11) (2.12) The term sin[(2i+ 1 )*j7t] equals zero for all integer combinations, and the term cos[(2i+ 1 )*j7t] equals + 1 for j even and 1 for j odd. Equation (2.11) now becomes: s[m][i] = f t (1Y*x[k+64j+32m]*h[k+64j]*cos[(2*i+1)*(k16)7rl64]. k=O j=O (2.13) The term (1Y can be included in the window h[k+64j] by inverting the window when j is odd. This permits continued use of the Discrete Cosine transform. The cosine term is independent of the j summation and can moved outside of it giving: s[m][i] = M[i][k] E x[k+64j+32m] C[k+64j], where k=O j=O M[i][k] = cos[(2*i+1)*(k16)7t/64],.and C[k+64j] = h[k+64j] for j even, and h[k+64j] for j odd. (2.14a) (2.14b) (2.14c) The resulting computation requires 512 + 32*64 = 2560 multiplications and 64*7 + 32*63 = 2464 additions, or about an 8 to 1 reduction in computation. Use of an FFT formulation results in further reduction. For the synthesis operation, a similar restructure can be made, although it's not as simple to show the derivation. The results will be shown and discussed briefly. An inverse modified Discrete Cosine Transform is used to generate a vector V[i]: U[32m+n] = 1: s[m][i]*cos[(2i+ 1 )(n32m+ 16)7t/64], for n = 0 to 31, and k=O y[32m+n] = f h[32j+n]*U[32j+32m+n]. j=O 18 (2.15) (2.16)
PAGE 24
s[m][i] are the subband samples at time frame m, and U(32m+n] is a vector of length 512, representing the vectors from the most recent 16 frames. y is the estimated reconstructed output. To provide a more effective implementation, the U vector is generated from an intermediate V vector of length 1 024, where 64 new values are generated each transform frame. The 64 values are generated by creating 64 values during each transformation for n = 0 to 63. Each of the 16 frames has 64 values to choose from, and only 32 values are needed. By alternating between the top 32 and bottom 32 values of each 64 value vector, and by inverting the synthesis window every 64 values, the same as for the analysis window, the subband values are each multiplied by sine wave sections of the modulation frequency. The sections are each (2i+1)7t/2 in length and increase their starting point by (2i+ 1 )7t/2 each frame. The subband information is changing from frame to frame at the delta modulation frequency. When multiplied by the modulation frequency, the original modulation frequency plus the delta modulation frequency is recovered. The window averages sixteen frames of this giving heavier weighting to the middle frames and less to the end frames per the shortterm Fourier transform requirement. A block diagram and a set of equations describe this process in detail at the end of this subsection. It is instructive at this point to see how a sine wave is handled by both the analysis and synthesis transformation. We should expect to see nonzero values only in the two bands where the frequency lies between two modulation frequencies. A frequency of 1350 Hz will be used with a 32 KHz sampled system. The 32 bands provide bands of width 500 Hz. The two adjacent modulation frequencies are at i = 2 and i = 3, (2*i+ 1 )*500 equals 1250Hz fori= 2, and 1750Hz fori= 3. x[n] can be expressed as: x[m][n] = cos[27t1350*(32mn+256)/32000], for n = o to 511, and using convolution equation (2.1 0), s[ m ][i] = cos[21t 1350* ( 32mn+256)/32000]*h[ n ]* cos[(2i+ 1 )(n16)7t/64], n=O where m is the frame number with an initial value of 0. (2.17) (2.18) With a filter h[n] that attenuates all values outside the two adjacent bands, only s[2] and s[3] will be nonzero. An initial cosine phase of T = 256 for x[n] is chosen because the convolution of x[n] with the filter h[n] will initially provide the peak value output. The peak point of the cosine waveform aligns with the center peak value of the filter window for this case. The filter design is also scaled such that the peak value of this convolution result will have a value of 2 for a sine wave with a peak value of The frequency of 1350Hz can be restated as (2i+1+.4)*250 and (2i+11.6)*250 for subbands 2 and 3, and substituting 1t/64 for 21t250/32000, s[2] and s[3] can be stated as: 19
PAGE 25
s[32m+2] = cos[(2i+1+.4)(32mn+256)7tl64]*h[n]*cos[(2i+1)(n16)7t/64], and n=O s[32m+3] = cos[(2i+ 11.6)(32mn+256)7tl64]*h[n]*cos[(2i+ 1 )(n16)1t/64]. n=O (2.1Sa) (2.1Sb) Using the trigonometric identity in equation (2.1 ), assuming the sum of frequencies term is attenuated by the filter, letting the Y2 factor on the remaining frequency difference term be cancelled by the normalized window gain of 2, and substituting i=2 and i=3, these equations simplify to: 51J.... s[32m+2] = L cos[(.4)(32mn+256)7tl64 + 57tm/2 37tl4]*h[n], and n=O 51J.... s[32m+3] = L cos[(1.6)(32mn+256)7tl64 + 71tm/2 1t/4]*h[n]. n=O (2.20a) (2.20b) For the filter that is designed in the next section the gain of h[n] at .4 and 1.6 from the modulation frequency will be shown to be approximately 1.000 and .0337. Substituting these values for h[n], .the summation signs can be removed, and the first five values of s[32m+2] and s[32m+3] will be: m s[2] s[3] 0 .7071 .023a 1 .san .0053 2 .4540 .0300 3 .4540 .0300 4 .san .0053 Table 2.1 The subband signals are essentially a decimated sample of the modulated frequency difference of 1 00 Hz from frame to frame with an additional 7tl2 added per frame. The above subband signals are then used by the synthesis transform to reconstruct the original signal x[n]. To see how the synthesis process works, consider the operation for only one nonzero band. The example above approximates this situation, since the synthesis filter will multiply subband 3 by .0337 again, for a total transform gain of .0011. The inverse transform formula in equation (2.15), when only s[2] is nonzero simplifies to: U[32m+i] = s[32m+2]*cos[5*(16+i+32m)1t/64], fori= 0 to 31. (2.21) 20
PAGE 26
For i = 0 to 31, cos[5*(16+i)7t/64 generates a 5/4 period cosine section from 57t/4 to 151t/4 at time frame m = 0, a section from 151t/4 to 251t/4 at time frame m = 1, etc. The cosine sections from four successive frames are shown in Figure 2.8. If sixteen sections are created for the past siXteen frames and multiplied by the subband values for the most recent sixteen time frames and summed, the 32 point value would still be a 125.0 Hz waveform section. The filtering operation by the baseband prototype filter, however, filters 16 sections of 1250 Hz waveform crunched closer together than normal and recreates the original 1350 Hz waveform. x184 8.8 e.G 8.4 8.2 8.2 8.4 8.6 8.8 ....................... .. 5 18 15 28 25 38 Figure 2.8 Four frames of subband 2 cosine synthesis sections. The detailed process implementation is now shown in a form suitable for implementation on a computer, MPEG specification [5]. Groups of 32 samples are shifted into the filter each time frame. Groups of 64 samples are multiplied by the modified filter window. The 64 values of each of eight groups are added as described in the previous section where an index substitution was made to improve computational efficiency. The 64 point vector is then transformed with a modified Discrete Cosine Transform to separate the time domain input signal into subbands of time domain signals. The modified Discrete Cosine Transform provides a 1t/2 phase difference between adjacent bands with the 2i+ 1 term in the equation below, andwith the inverted portions of the filter window. This provides the aliasing cancellation required. Then+ 16 term also phase shifts the decimated/modulated values from frame to frame by 1t/2 radians. Figure 2.9 from Shlien [13] shows a pictorial implementation of the analysis filter process. The equation implementation is: 21
PAGE 27
s[i) = E M[i][k) C[k+64j) x[k+64j] where, k=O j=O i is the subband filter index from 0 to 31, (2.22) s[i] is the subband output sample for subband i and changes every 32 samples, C[k+64j] = (1); h[k+64j], where his the 512 point filter window with the first value= 0, x[n) is the 512 point audio input sample, and M[i][k] = cos[(2*i+1 )*(k16)7t/64], the cosine transform coefficients. 0 32 IIUDplcs l 1111 X Mlo 11 l l 511 C W'Uidow 1 l 1 l l 511 Do subband samples Jl Figure 2.9 Analysis filter implementation The synthesis process takes 32 subband samples at a time and performs an inverse modified Discrete Cosine Transform on them. The 64 point result is shifted in parallel to a 16 deep 64 wide FIFO buffer, containing the results of the present and 15 previous inverse transform results. The latest 32 sample output consists of taking alternating 32 sample groups from the 16 buffers, multiplying them by the 512 point sample window and summing them to provide 32 outputs per frame. The inverse modified Discrete Cosine Transform again provides a 7tl2 phase difference between adjacent bands with the 2k+ 1 term in the equation below, to provide the aliasing cancellation required. The 16+i term, the alternating halves of the FIFO buffer, and the inverted portions of the synthesis filter window provide 7t/2 phase changes from frame to frame to compensate for the similar time frame phase change in the analysis process. The implementation is shown pictorially in Figure 2.1 0, again from Shlien [13]. The equation implementation is as follows: 22
PAGE 28
Preshift the 1024 point 16 frame FIFO buffer, calculating V[i] fori= 1023 down to 64 using, V[i] = V[i64]. Perform the inverse modified Discrete Cosine Transform for new V[O] to V[63]. V[i] = f N[i][k]*S[k], where S[k] are the 32 subband samples, k=O N[i][k] = cos[(16+i)*(2k+1)7t/64]. Build a 512 point U vector from alternating halves of the V vectors. U[64*i + j] = V[128*i + j], U[64*i + 32 + j] = V[128*i + 96 + j], for i = 0 to 7, j = 0 to 31. Multiply the 512 point U vector by the 512 point D window. W[i] = U[i]*D[i], for i = o to 511, where D[i] = 32*h[i], for INT(i/64) even, and D[i] = 32*h[i] for INT(i/64) odd. Calculate the latest 32 output samples. x[n] = f W[n+32i], for n = 0 to 31. i=O (2.23) (2.24) (2.25) (2.26) (2.27) (2.2Ba) (2.28b) (2.29) The D window values are 32 times the C window values to compensate for the division by the interpolation ratio that occurs during the interpolation process, Crochiere and Rabiner [2]. The above equations demonstrating the subband coder are implemented in a computer program and the results discussed later. 23
PAGE 29
lllbbaad o0 G =8 Vvll:lor amplcs ]I 6] I D wiadow Figure 2.10 Synthesis filter implementation. 16 Vvec:rarfi.fo I024umplcs .. .. 24
PAGE 30
CHAPTER 3 ANALYSIS/SYNTHESIS FIL TEA DESIGN Filter Requirements The requirements for the combined analysis and synthesis filters in the coder, as discussed before, are that the resulting frequency response be flat, and that any aliasing effects generated by the decimation and interpolation operations be cancelled. An ideal filter cannot be designed with a finite filter so the nonideal filter responses for adjacent bands need to overlap to avoid gaps in the frequency response. The requirement for a flat frequency response in the overlapping areas can be met by a fairly simple requirement. Since there is a filter in both the decimation and interpolation operations, the product of the gain of these two filters at a particular frequency plus the product of the gain for the same frequency as contributed by the adjacent band must be a constant. For example, if two adjacent modulation frequencies are specified as 1250 Hz and 1750 Hz, for any frequency between, the sum of the squares of the gains for the two contributing filters must be a constant. Outside adjacent bands, the frequency gain should be as close to zero as possible. 25
PAGE 31
1 8.9 8.8 0 8.7 ........ ................. ......................... 8.6 ...... 3 8.5 ..... )C 8.4 8.3 8.2 8.1 8 8 8.5 1 1.5 2 2.5 3 Radians Figure 3.1 Symmetrical overlapping filters with a flat combined frequency response. The second requirement, cancellation of the effects of aliasing was illustrated earlier. If the relative phasing between adjacent bands is defined properly, the aliasing terms may be subtracted from each other. It is apparent from Figure 3.1 that for this cancellation to be completely effective, the shape of the frequency response for the two adjacent bands must be symmetrical with respect to the midpoint between bands. This does not necessarily mean that each filter must be symmetrical about it's center modulation frequency, but if the same prototype filter is to be used for all bands, its evident that a symmetrical filter is the most straightforward way of assuring that edges of adjacent filters are symmetrical with respect to each other. Figure 3.1 illustrates a set of filters meeting this requirement. The first step in the filter design is determining a filter length, the length being the number of filter coefficients. The filter needs to be large enough to reduce the passband ripple and attenuate the stopband frequencies to the desired level for each subband. The length should be a power of two for computational efficiency. It should be short enough, on the other hand, to represent a relatively constant frequency content per the requirements of a shortterm Fourier transform. For the music compression application, a 44.1 KHz or 48KHz sampling rate has been standardized as providing double the maximum human auditory bandwidth, and 32 subbands has been found to be a good compromise between complexity and adequate frequency separation to meet human perceptual listening needs. Through evaluation of 26
PAGE 32
filter lengths, 512 points has been shown to provide less than .07 dB of passband ripple and greater than 100 dB of stopband attenuation. At 44.1 KHz, 512 points represent 11.6 milliseconds. With 32 to 1 decimation for the 32 bands, 32 sample points at a time are slid by the filter window. These 32 samples represent 725.6 microseconds. With 32 samples and 512 points, the shortterm Fourier transform overlap and add operations effectively averages 16 sets of 32 samples. This combination of times and overlap has been shown to be sufficient to meet demanding listening requirements in most cases. The final section in this paper summarizes some of the supplementary techniques that are used to handle exceptional situations. The symmetry requirements for a filter with an even number of coefficients can be stated as: h[N1m] = h[m], m=O ... N/21 (3.1) For the 512 point filter, h[511] = h[O], h[510] = h[1], ... h[256] = h[255]. Often it's desired to have an odd number of coefficients. This is accomplished by setting h[O] = 0., setting h[ N m ] = h[m] for m = 1 ... N/2 1, and making the maximum center value h[N/2] unique. The second case is chosen for this filter design for two reasons. It simplifies the discussion of the development of the filter, and the existing MPEG standard filter has an odd number of nonzero coefficients. The filter developed here can then be compared more easily with the MPEG filter. Also note than an FIR filter rather than an IIR filter is used because standard FIR filter design techniques provide symmetrical filters, FIR filters are linear phase, and they guarantee stability. Cosine Summation Problem Statement The proposed design technique recognizes that standard FIR windows use a cosine or sum of cosine terms to define the window. The following standard windows illustrate this. Hanning: h[n] = 0.5 0.5*cos(27tn/M), 0 n M, h[n] = 0.0, otherwise. Hamming: h[n] = 0.54 0.46*cos(27tn/M), 0 n M, h[n] = 0.0, otherwise. Blackman: h[n] = 0.42 0.5*cos(27tn/M) + 0.08*cos(47tn/M), 0 n M, h[n] = 0.0, otherwise. (3.2a) (3.2b) (3.3b) This idea is expanded on, recognizing that any finite symmetrical curve shape can be represented, by use of a Fourier cosine series expansion, as a sum of cosine terms. For example, the curve 1 x2 between 1 and 1 is shaped somewhat like the windows and can 27
PAGE 33
be represented as a cosine series sum. The subband filter with a sum of squares requirement seems a likely candidate for specification as a finite cosine series sum. The subband ideal filter requirements are stated below. G/(ro) + G/(ro) = k, a constant, for ro1 ro G1(ro) = 0., ro >= G2(ro) = 0., ro ro1 (3.3a) (3.3b) (3.3c) where ro1 are two adjacent modulation frequencies, and G, and G2 are the gain in the associated frequency bands. Previously, the solution to this requirement by Johnston [6] minimized the following error function. (3.4) The above method is a variant of the method of steepest descent and solves individually for each filter coefficient. The value a is used to weight the importance of the passband ripple and the stopband attenuation. The error function requires using different step sizes to see which one optimizes the solution. The main drawback is that the filter coefficients are not described concisely, consisting of N/2 non related points. If the design is to be used by personnel other than the designer, each of these points need to be manually typed into the design. The MPEG standard filters, for example, are not available electronically in the public domain, and require purchase of a paper document and subsequent typing of 250 1 0digit numbers into a design. Other solutions to the problem include formulating a set of double poles and double zeroes in the complex plane that meet the requirement, Wackersreuther [14], and formulating an aliasing cancellation solution in the time domain rather than the frequency domain, Princen and Bradley [10]. The latter solution is limited to filter lengths equal the number of bands, however. It should be noted that synthesis filters compatible with the MPEG standard must use the MPEG standard filter. Otherwise, the synthesis filter will not be compatible with the analysis filter used to create the compressed data. The design technique proposed here is primarily useful for new designs and as an educational tool. The proposed design technique defines the window as: h(x) = L a[n]*cos(21tnx), where N is the number of cosine terms to be used. (3.5) n=O The a[n] terms are determined by convolving several frequencies in the passband and in the stopband with each of the cosine terms multiplied by the ideal brickwall filter response sin(ro)/ro. The gain for each frequency tested is the sum of the peak values from the convolutions with each cosine term. The convolution to find the peak can be replaced by a simple sum of multiplications by aligning the peak value for each frequency with the 28
PAGE 34
peak value of the window. For a frequency f and a filter with 2K + 1 terms, this can be expressed in discrete form for each cosine term I frequency combination as: "t [a[n]*cos(27tn*(k+K)/(2K+2))*[sin(rock)/rockl*cos(27tk*f/fJ], k=K (3.6) where f5 is the sampling frequency, and roc is the prototype low pass filter cutoff frequency. The gain, or peak value, for each frequency is then expressed as: G(f) = 'f [a[n]*cos[27tn(k+K)/(2K+2)]*[sin(rock)/rock]*cos(27tkf/f5)]. n=O k=K (3.7) The symmetry of the above permits reducing the inner summation to 0 to K. The 'summation is also normalized to 1 by replacing rock in the denominator with 1tk, giving, G(f) = f [c[k]*a[n]*cos[27tn(k+K)/(2K+2)]*[sin(rock)htk]*cos(27tkf/f5)], n=O k=O where c[k] = 2, k '* o, c[O] = 1 (3.8) The c[O] term at one half value compensates for the single occurrence of the peak at k = 0. The filter cutoff frequency roc, defines the reduction in bandwidth for multiple bands and the amount of overlap. With the type of modulation chosen, the nominal bandwidth for M brickwall filters would be 7t/2M. The amount of overlap is arbitrarily specified as 8/7 for now. Criteria for overlap selection will be discussed at the end of this subsection. Before specifying the equation for the music application, the ratio of the range of f to the sampling frequency f5 is first discussed. The ratio of f/f5 can be normalized as a function of the number of subbands. For example, for 32 subbands, the distance between modulation frequencies is 1/32 the Nyquist bandwidth of f/2. For f5 equal 32 KHz, the range of f to be evaluated is from 0 to 500 Hz. For a brickwall filter, the gain would be one for f < 250 Hz and 0 for f >250Hz. With overlapping filters, the lower sideband of the adjacent filter, though, completely overlaps the full 500 Hz. If the frequency is specified in terms of the delta from the midpoint, 250 Hz, the gain requirements can be specified as, G2(250+Af) + G2(250Af) = 1, 0 S Af s 250 Hz, G2(250+Af) = 0, Af >= 250 Hz. 29 (3.9a) (3.9b)
PAGE 35
Note that the ratio between f5 and the subband width 500 Hz is 64 to one. If f5 = 44.1 KHz had been used, the subband width would have been 689.0625 Hz and the ratio still 64 to 1. f5 equal 32 KHz is used in the example to avoid the use of unwieldy numbers. Cosine Summation Solution For the music application with a 511 point filter, 32 bands and an f5 of 32 KHz, a solution with 8 cosine terms is attempted. The resulting equation is: G(f) = t c[k]*a[n]*cos(1tn(k+K)/256)*[sin(1tk)(8/7)/64)htk]* n=O k=O cos(21tk(250+L\f)/32000)], (3.10) where c[k] = 2, k o, c[O] = 1. The only unknown term, a[n], can be taken out of the inner summation, and the remaining inner summation can be calculated for several frequencies. For f, = 250 At, define, l[i][n] = f5 [c[k]*cos[1tn(k+K)/256]*[sin(1tk/56)/1tk]*cos(21tkf/32000), k=O for n = 0 to 7. For f, = 250 + at, define, m[i][n] = [c[k]*cos[1tn(k+K)/256]*[sin(lot/56)/1tk]*cos(21tkf/32000), k=O for n = 0 to 7. (3.11a) (3.11b) The sum of square gain criteria, and the stopband attenuation requirement can now be stated as: {a[O]*I[i][O] + a[1]*1[i][1] + ... + a[7J*I[i][7]}2 + {a[O]*m[i][O] + a[1 ]*m[i][1] + ... + a[7]*m[i][7]}2 = 1, for af < 250 Hz, and {a[O]*m[i][O] + a[1]*m[i][1] + ... + a[7]*m[i](7]}2 = 0, for Af >= 250 Hz, for a set of delta frequencies, Af,. 30 (3.12a) (3.12b)
PAGE 36
The C program in Appendix B calculates the coefficients l[i][n] and m[i][n] for 13 frequencies below 250 Hz and 9 frequencies equal or greater than 250 Hz. Twenty two equations with 8 unknowns are formulated. One further constraint is that the window should go to 0 at the window endpoints and be normalized to 1 at the window midpoint. The Hamming, Hanning, and Blackman windows above all meet these requirements. For the endpoints this requires, t (a[n]*cos(1tn*(25S)/256) = 0, or, n=O a[O] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] = 0. For the midpoint this requires, t {a[n]*cos(1tn*(256+0)/256) = 1, or, n=O a[O] a[1] + a[2] a[3] + a[4] a[5] + a[6] a[7] = 1, increasing the number of equations to 24. (3.13) (3.14) (3.15) (3.16) A least squares solution for 24 equations and 8 unknowns is easily solved with a MATLAB matrix solution capability and an iterative technique. The iterative technique recognizes that at Af1 = o,. the I term summation and m term summation are equal to each other, and therefore equal 1//2, and that furthermore them term summations for Af1 > 0 decrease rapidly to 0 as Af1 becomes larger. The sum of square equations for Af1 0 can be reformulated to take advantage of this: a[O]*I[i][O] + a[1]*1[i][1] + ... + a[7]*1[i][7] = J 1. {a[O]*m[i][O] + a[1]*m[i][1] + ... + a[7]*m[i][7]}2 (3.17) The right side of the equation can be initialized with the coefficients arbitrarily set to tile Blackman window coefficient values a[O] to a[2] for example, with a[3] through a[7] initialized at 0. The left side of the equations are now a set of linear equations. MATLAB's standard matrix solutions provide a little known additional capability when the number of equations exceed the number of unknowns. The right side of the equation is subtracted from the left side solution, which ideally should equal 0., and the root sum square of those differences for the full set of equations is minimized. 31
PAGE 37
This also means that the equations can be scaled to weight the importance of the equations. The stopband equations can be weighted differently than the passband equations to control the relative importance of the passband ripple compared to the stopband attenuation. The MATLAB program in Appendix B is solved assuming a passband ripple of .04 dB and a stopband attenuation Of 100 dB. For .04 dB ripple, the passband gain error should equal 1004120, or approximately .005. For 100 dB stopband attenuation, the m coefficient summation should be less than .00001. If the passband equation set remains normalized at 1, and there are 13 passband equations and 9 stopband equations, the stopband equations should be scaled at (.005/.00001 )*(13/9) to give equal weighting to the passband ripple specification of .04 dB and the stopband attenuation specification of 1 00 dB. Finally, the two window endpoint and midpoint equations need to be scaled properly. The ideal a[n] window coefficient summations of 0. and 1. are arbitrarily specified for an error of .0002. Since there are two equations, they are scaled by (.0002/.00001 )*(13/2) relative to the passband equations. Filter Design Results The MATLAB program in Appendix B implements the above as described converging on a set of a[n] coefficients that meet the requirements in five iterations. The program takes less than 30 seconds to execute with a 33 MHz 486 PC compatible without floating point capability, and is near instantaneous on a 90 MHz Pentium with a floating point capability. The resulting a[n] coefficients a[O] through a[7] are: 3.7929857e001, 4.9704195e001, 1.2037640e001, 3.0637723e003, 2.9509270e004, 9.9179934e005, 2.9924455e005, 6.6170n4e006 Figure 3.3 shows the resulting subband coder filter defined by the coefficients a[n] and the result of inverting the filter for alternating groups of 64 coefficients as described in the previous section. The passband ripple values and the stopband attenuation values are shown in Table 3.1. The desired window midpoint value of 1.00000, and endpoint values of .00000 are accurate to 5 decimal places. 32
PAGE 38
8.6 ............... 8.4 8.2 8 1r!'C :...:............ 8 188 288 388 488 588 688 n 1 8.8 8.6 8.4 8.2 ... = 8 .... u 8.2 ................ : .................. : ................ : .................. : .................. : ............... . . . . . . . . . 0 . 8.4 ............... ;. ................ : ............. : ................ . . . . . . . . 8.6 . :: .... : . . . . . . . 8.8 . :::. . . . . . 1 8 188 288 388 488 588 688 n Figure 3.3. Resulting subband coder filter before and after partial inversion. 33
PAGE 39
Table 3.1. Window passband ripple and stopband attenuation results. gain ripple(f<250) 25Qf 250+f sum of or freq calc X(w) calc X(w) square atten(f>250) 0.0 0.70734 0.70734 0.7073405 0.7073405 1.000661 0.002870 5.0 0.73330 0.6803619 1.000628 0.002728 10.0 0.75815 0.6524831 1.000533 0.002312 15.0 0.78180 0.6238303 1.000381 0.001653 25.0 0.82523 0.5647582 0.999957 .000185 50.0 0.90974 0.4139222 0.998957 .004530 62.5 0.93931 0.93931 0.3414722 0.3414722 0.998908 .004747 75.0 0.96134 0.2740052 0.999252 .003249 100.0 0.98730 0.1605719 1.000546 0.002372 125.0 0.99736 0.99736 0.0809895 0.0809895 1.001294 0.005617 150.0 0.99989 0.0336023 1.000901 0.003909 187.5 0.99994 0.99994 0.0049925 0.0049925 0.999906 .000408 200.0 0.99991 0.0019734 0.999823 .000770 250.0 1.00004 1.00004 0.0000000 0.0000000 1.000076 0.000330 312.5 0.0000000 0.0000000 169.14 375.0 0.0000000 0.0000000 .79 437.5 0.0000001 0.0000001 .55 500.0 .0000006 0.0000006 .06 562.5 .0000003 0.0000003 .03 625.0 0.0000005 0.0000005 .89 750.0 0.0000004 0.0000004 128.61 875.0 0.0000003 0.0000003 .85 The results are corroborated by performing a Fourier transform on the resulting window and comparing the gain at several frequencies. The results are shown in Table 3.1 in the X(w) column and agree exactly. The Fourier transform frequency response is plotted in Figure 3.4 showing more than 100 dB attenuation for frequencies greater than 500 Hz. The results are further verified by convolving several selected frequencies with the window, and comparing the peak value, when the waveform is completely in the window. These results also agreed. The primary drawback of the above design technique is that an overlap value must be determined by trial and error. The I and m coefficients need to be recalculated for each overlap value attempted. This takes about 30 seconds with the Pentium processor and several minutes with the 486 PC compatible. The technique by Johnston supposedly converges on an optimum overlap value, although this was not verified. The value of an used is selected because it agrees closely with the MPEG filter overlap. The window coefficients also compare reasonably well with the MPEG coefficients. 34
PAGE 40
: 00000000000000 ooooooooo OOOOOOOOOOOOO'oooooooooooooo 0000000000000 ooooooooooooooooooooooooo ooooo oooooooooo oooooooooooooooooooo 184 0 ...................... 0. 0 0 0 0 0........ 0 o 0 0. ,... 3 ..... 185 ooooooooooooooooooooooooo o,ooooo )( 186 18_., o: 188 0 : 0 0 : 0 0 0 . . . . . . . 189 . 0 : 0 0 0 0 : 0 0 0 0 0 . . . . . . . . . 18J.e 8 588 1888 1588 2888 2588 3888 3588 Hz Figure 3.4 Filter frequency response. 35
PAGE 41
CHAPTER 4 APPLICATION OF THE RESULTING SUBBAND CODER TO SINE WAVE SAMPLES AND AN ACTUAL MUSIC SAMPLE The subband coder and filter were first tested on pure sine waves to verity the complete algorithm, and to verity understanding of the phase and aliasing cancellation relationships. For this evaluation a simple sine wave generation program was written to create a specified frequency. Equation 2.20 (together with a calculation of the initial phase) is used to calculate the band values at specified frames. The expected and actual results for frequencies of 1250Hz, 1350Hz, 1500Hz, 1510Hz and 1800Hz are shown in Table 4.1. The predicted values take into account the 7tl2 phase shifts per band. The window attenuation values are a function of the distance from the modulation frequency. A peak sine wave value of 1 0000.0 is used. 36
PAGE 42
ideal(2) actual(2) error( dB) ideal(3) actual(3) error( dB) ideal(2) actual(2) error( dB) ideal(3) actual(3) error( dB) ideal(2) actual(2) error( dB) ideal(3) actual(3) error( dB) ideal(2) actual(2) error( dB) ideal(3) actual(3) error( dB) ideal(2) actual(2) error( dB) ideal(3) actual(3) error( dB) Table 4.1. 1250 Hz subbands 2 and 3 85n.71 5141.28 8577.71 5141.28 85n.71 5141.288577.71 8585.50 5145;61 8585.50 5145.61 8585.50 5145.61 8585.50 59.16 64.26 59.16 64.26 59.16 .64.26 59.16 0.00 0.00. 0.00 0.00 0.00 0.00 0.00 1.76 10.39 1.76 10.39 1.76 10.39 1.76 72.08 56.66 72.08 56.66 72.08 56.66 72.08 7409.36 7398.44 56.23 224.10 224.93 78.61 1350 Hz subbands 2 and 3 1on.8o 8676.40 9121.91 2047.06 6715.46 1 083.86 8672.58 9111 .38 2038.48 6715.00 61.34 65.35 56.54 :.58.32 83.74 331.76 165.90 136.72 326.63 247.26 321.61 153.15 141.57 319.58 234.11 56.86 54.88 63.27 60.03 54.61 1500 Hz subbands 2 and 3 9941.55 9932.43 57.79 35.97 44.36 58.51 6237.30 6237.30 6237.30 6237.30 6237.30 6237.30 6237.30 6212.43 6212.43 6212.43 6212.43 6212.43 6212.43 6212.43 49.08 49.08 49.08 49.08 49.08 49.08 49.08 3333.91 3333.91 3333.91 3333.91 3333.91 3333.91 3333.91 3294.80 3294.80 3294.80 3294.80 3294.80 3294.80 3294.80 45.14 45.14 45.14 45.14 45.14 45.14 45.14 1510Hzsubbands 2 and 3 3571.06 3221.19 2858.61 2484.75 2101.08 1709.12 3535.60 3184.75 2821.33 2446.78 2062.57 1670.21 45.99 45.76 45.56 45.40 45.28 45.19 6344.08 6592.11 6814.12 7009.24 7176.70 7315.84 6312.63 6562.36 6786.21 6983.26 7152.76 7294.03 47.04 47.52 48.07 48.70 49.41 50.22 1950.69 1937.67 54.70 17.46 25.90 58.46 1800 Hz subbands 3 and 4 9929.59 4186.14 7342.42 8724.00 9922.22 4194.60 7329.82 8724.67 59.64 58.44 54.98 80.47 2.09 16.16 12.08 8.70 10.16 19.62 22.29 5.84 58.85 66.21 56.81 67.86 37 1950.69 1937.67 54.70 17.46 25.90 58.46 1310.42 1271.27 45.13 7426.10 7406.51 51.15 9929.59 9922.22 59.64 2.09 10.16 58.85
PAGE 43
The resynthesized results are next compared with a delayed version of the original sine wave. The output results for the first 31 samples are shown in Table 4.2. Table 4.2. Subband coder applied to a 1350 Hz sine wave. Subframe 0 Input 0.0 2619.8 5056.6 7140.1 8725.0 9700.3 9998.1 9597.4 8526.4 6859.8 4714.0 2238.9 392.6 2996.7 5391.4 7409.5 8910.1 9788.2 9982.7 9479.8 8314.7 6568.8 4364.1 1854.5 784.6 3368.9 5717.9 7667.5 9081.4 9861.0 9951.8 9347.5 Subframe 15 Output 3.7 22.3 2592.8 5027.6 7111.7 8699.4 9679.6 9983.5 9589.8 8525.8 6865.8 4725.8 2255.2 373.2 2975.6 5370.3 7389.8 8893.0 9n4.7 9973.6 9475.8 8316.1 6575.7 4376.4 1871.8 762.9 3343.8 5690.5 7639.3 9054.1 9836.3 9931.4 Note that the output is delayed 15 subframes plus one sample from the input. Fifteen subframes are required for the waveform to be completely shifted by the 511 point window when shifted 32 samples per frame. The one sample additional delay is explained by the first samples shift to window points 31, 63 ... 255 ... 479, 511, and a center maximum window at window point 256. The samples are further delayed by approximately .008 samples. The reason for this is still being investigated. Even if this delay is taken into account, there are still some residual errors. Reasons for these errors include the nonideal filter, and possibly that the averaging of 64 sets of eight samples during the filtering operation do not have the same attenuation as a 512 point fiiter. The error sources need to be further investigated. 38
PAGE 44
The subband coder is next applied to an actual musical sample. About 1 0 seconds of the Grateful Dead's "Tons Of Steel" were captured using a 16bit PCcompatible Soundbla,ster unit sampling at 44.1 KHz. A program was written to extract one channel of data from the stereo binary data. The input and resulting delayed output are compared in Table 4.3 for the first 31 samples. Table 4.3. Subband coder applied to an actual musical sample. Subframe 0 Input 2569.0 2382.0 3817.0 5703.0 5985.0 7155.0 9135.0 9567.0 10042.0 9809.0 9163.0 8989.0 7615.0 5809.0 4468.0 2205.0 11.0 1239.0 2716.0 5018.0 5802.0 5075.0 4965.0 2970.0 7.0 646.0 1607.0 3916.0 5463.0 6584.0 6802.0 5994.0 Subtrame 15 Output 5.7 2587.4 2396.7 3827.1 5711.8 5995.7 7166.0 9143.5 9575.2 10050.7 9817.6 9170.1 8991.7 7616.3 5812.2 4471.8 2207.9 16.2 1228.6 2708.1 5014.8 5796.6 5067.5 4957.6 2960.0 22.4 664.4 1634.6 3955.7 5502.9 6620.9 6836.2 Additional error results from signal in nonadjacent bands, where the ideal stopband attenuation is not completely met. An additional error possibility is that the original musical sample, since it was recorded under less than ideal conditions, may contain frequency content outside the 22.05 KHz Nyquist bandwidth, which shows up as aliasing. The analog filter, oversampling, and digital input filter in the Soundblaster unit should eliminate most of this. Any noise following the analog filter and in the AID conversion process though, cannot be digitally filtered. The short sample may also contain DC content. In the MPEG application, the samples are usually passed through a digital 1 0 Hz DC blocking filter prior to analysis. 39
PAGE 45
CHAPTER 5 SUPPLEMENTARY AND ALTERNATIVE COMPRESSION TECHNIQUES The subband coder by itself provides only part of the compression ratio achieved. The latest version of the MPEG audio standard provides varying levels of compression with different levels of resulting quality. To provide musical quality which cannot be discerned from the original by an experienced panel of listeners for a variety of musical test passages, a compression ratio of only 6 to 1 can be achieved. To achieve this level of compression, the subband coding technique discussed in this paper must be supplemented by a parallel 1 024 point FFT analysis of the music during recording to optimize the amount of compression appropriate in each subband, particularly the lower frequency bands. Variable recording rates with FIFO storage to accommodate artifacts of certain digitally difficult musical passages (sudden percussive instruments, for example, generate a preecho}, are provided. Sophisticated scaling optimization, and Huffman bit. level compression are also used. Six to one compression still does not permit the transmission of music over modems or ISDN data channels at a reasonable rate. Additional compression with less than "perfect" music quality is often utilized Acceptable music quality for digital FM stations, can be achieved, for example, with two 64K ISDN channels. This is the Digital Audio Broadcasting (DAB) standard which implements layer 2 of the MPEG1 standard. The newest version of the standard, MPEG2, provides a "low bit rate" layer consistent with 28.8K modem levels. Less sophisticated techniques that do not take advantage of subband coding are also used. Transmission of sound clips over the Internet often simply reduce the sampling rate to BK, use the ulaw or Alaw compression algorithm from the telephone communications industry to reduce 12 bits of resolution to 8, and transmit mono only. This is somewhat listenable, comparable to AM radio broadcasting. Adaptive Differential Pulse Code Modulation (ADPCM) is also often utilized to provide additional improvement. These techniques do not take advantage of the subband coding techniques discussed in this paper. 40
PAGE 46
CHAPTER 6 SUMMARY AND ADDITIONAL RESEARCH POSSIBILITIES A detailed look at the subband coder has been presented. Although not as mathematically complete as existing descriptions, it hopefully provides a more practical understanding of the transform operation based on digital signal processing fundamentals. By using real example bandwidths, sample rates, decimation and interpolation ratios, and relating them to an actual application and filter, an understanding of filter and transform operation is provided. The most interesting and challenging aspect of the subband coder is the variety of signal processing techniques used in its application. Prior to this research, it was believed to be just another filter application. The combination of decimation and interpolation, modulation, aliasing cancellation, transform of timevarying signals, block transformation techniques, and special windowing requirements make the subject an extremely interesting area of study. The new filter design approach was undertaken originally because a clear requirements statement and design description could not be found in the literature until late in the research effort. In an attempt to understand the design and filter requirements a different and possibly new design technique was developed. Although not as powerful as present methods because the overlap factor needs to be determined by trial and error, the ability to express the filter in a concise manner is certainly useful. The effort expended in designing the filter also provided an improved understanding of its operation. An area where more investigation would be useful is in better understanding the sources of error in the subband coder. This would include exploration of whether and to what extent "smearing" occurs within a band, the effect of changing frequency content in the sliding shortterm Fourier transform window, and comparing the effectiveness of this filter with the Johnston steepest descent filter design method by implementing that method. It would also include typing in the 512 point MPEG specification filter and comparing its performance, directly acquiring a digital musical sample without analog processing and determining for sure whether the analog processing contributes errors. Alternatives to utilizing a actual digital sample may also be helpful. Filtering the sample to eliminate aliasing and any DC component would be valuable. Determining how much noise is present by recording the same sample several times, passing it through a higher sample rate interpolation filter, correlating the experiments to align them, and computing the error level would also be of interest. 41
PAGE 47
APPENDIX A. LISTING OF THE SUBBAND CODER C PROGRAM llllllllllllll/llll/llllllllllllllllllllllllllllllllllllllllllll/11/11/llllllll II Listing of the Subband Coder C Program II Titlempeg.c Revision 33196 6:08pm 5351 bytes lllll/lllllllllllllllllll/lllllllllllll/1/lllllllllllllllllllllllllllllll/lllll #include #include #include #include #include #include #define PI 3.1415928 float in_data[16][32]; float c_window[16][32]; float d_window[16][32]; float d[513]; float z[65]; float m_cos[32][64]; float n_cos[32][64]; float s_chan[32]; float u[16][64]; float out_data[32]; FILE *fptr ; FILE *fptr1 ; FILE *fptr2 ; main() { float k_floatl angll bl c_value; float raw_datal in_float 1 y_in 1 alph; int il i1 I jl j1 I k1 c1 I c2; int k1 I k21 k3; int isign; t =1 for forward cos xforml 1 for inverse */ int n; r transform size ., int band_no; r subband number to print*/ printf("Enter band number to print\n"); scanf("%i"l &band_no); 42
PAGE 48
/*calculate c window array, initialize in_data */ for (i = 0; i < 16; i++) { for (j = 0; j < 32; j++) { k = 32*i + j; k_float = (float) (k ); angl = PI (k_float 255.00000001 )/ 56.; b = 3.7929857e001 4.9704195e001*cos(2.*PI*k_float/512.) + 1.2037640e001*cos(4.*PI*k_float/512.) 3.0637723e003*cos(6.*PI*k_float/512.) + 2.9509270e004 *cos(8. *PI*k_float/512.) + 9.9179934eOOS*cos(1 0. *PI*k_float/512.) + 2.9924455eOOS*cos(12. *PI*k_float/512.) + 6.6170774e006*cos(14.*PI*k_float/512.); c_value = (1./28.) b sin(angl)/angl; c[k] = c_value; /* window is inverted from picture to compensate for nonfifoed input*/ if (i == 2 II i = 3 II i == 6 II i = 7) c_value = (float) 0.0 c_value; if (i == 10 II i = 11 II i == 15 II i == 16) c_value = (float) 0.0 c_value; c_window[i][j] = c_value; in_data[i]U] = 0.; u[i]U] = 0.0; I* define cosine array *I for (i = 0; i < 32; i++) { for (j = O; j < 64; j++) { m_cos[i]U] =cos( (float) ((2*i + 1 ) (j16)) Pl/64.0); n_cos[i]U] = cos( (float) ((2*i + 1 ) (j + 16)) Pl/64.0); } fptr1 = fopen("sine.fre","wr"); fptr2 = fopen("sine.sub" ,"a"); fprintf(fptr1, "\r\n window freq sweep \r\n"); for (i = 0; i < 64; i++) { fprintf(fptr1, "\r\n"); for (j = 0; j < 8; j++) { k = 8*i + j; fprintf(fptr1, "%8.5f ", c[k]); 43
PAGE 49
r calculated analysis window*/ fprintf(fptr1, ''\r\n"); for (i = 0; i < 512; i++) d[i] = 32:c[i]; fprintf(fptr1, "\r\n d analysis window \r\n"); for (i = 0; i < 64; i++) { fprintf(fptr1 I "\r\n"); } for 0 = 0; j < 8; j++) { k = 8*i + j; fprintf(fptr1 I "%8.5f d[k]); fprintf(fptr1, "\r\n"); fptr = for (i1 = 0; i1 < 25; i1 ++) { fprintf(fptr1, "\niteration number %d\r\n", i1 ); i = i1 % 16; /* % is the modulus operator *I for 0 = 0; j < 32; j++) { fscanf(fptr, "%f", &raw_data); in_data[i]Ul = raw_data; fprintf(fptr1 I 1'\r\n in_ data \r\n"); fprintf(fptr1, "); for 01 = 0; j1 < 32; j1 += 8) { for 0 = 0; j < 8; j++) fprintf(fptr1 I "%8.1f ", in_data[i]01 +j]); fprintf(fptr1 I "\r\n"); for 0 = 0; j < 65; j++) zU] = (float) 0.0; k3 = 32; for (k1 = 15; k1 >= 0; k1} { k2 = k1 + i + (int) 1.0; if (1<2 > 15) k2 = 16; k3 += 32; if (k3 > 32) k3 = 64; for 0 = 0; j < 32; j++) { in_float = in_data[k2]0]; y_in = in_float c_window[k1][j]; z[31j+k3] += y_in; 44
PAGE 50
fprintf(fptr1, "\r\n z array \r\n"); fprintf(fptr1 "); for 01 = 0; j1 < 64; j1 += 8) { for 0 = 0; j < 8; j++) fprintf(fptr1, "%8.2f ", z01 +j]); fprintf(fptr1, "\r\n"); for (k = 0; k < 32; k++) { s_chan[k] = 0.0; for 0 = 0; j < 64; j++) s_chan[k] += ( zO] m_cos[k]O] ); fprintf(fptr1, "\r\n freq cos xform array \r\n"); fprintf(fptr1 "); for 01 = 0; j1 < 32; j1 +;: 8) { for 0 = 0; j < 8; j++) { fprintf(fptr1, "%8.2f ", s_chanu1+j]); if ( (01+D==band_no) && (i1>=15) && (i1<=21)) fprintf(fptr2, "%8.2f ", s_chanu1 +j]); } fprintf(fptr1, "\r\n"); for (k = 0; k < 64; k++) { z[k] = 0.0; for 0 = 0; j < 32; j++) { z[k] += ( s_chan[j] n_cosO][k] ); } fprintf(fptr1, "\r\n inverse cos xform array \r\n"); fprintf(fptr1, "); for 01 = 0; j1 < 64; j1 += 8) { for 0 = 0; j < 8; j++) fprintf(fptr1, "%8.2f ", z01+j]); fprintf(fptr1, "\r\n"); for 0 = 0; j < 32; j++) out_dataO] = 0.0; for 0 = 0; j < 64; j++) u[i]Ol = zO]; k3 = 32; for (k1 = 15; k1 >= 0; k1) { k2 = k1 + i + (int) 1.0; 45
PAGE 51
} if (k2 > 15} k2 = 16; k3 += 32; if (k3 > 32) k3 = 64; for 0 = 0; j < 32; j++) out_data[31j] += u[k2][31j+k3]*c_window[k1 ][j]*32.; fprintf(fptr1, "\r\n output data array \r\n"}; fprintf(fptr1, "); for 01 = 0; j1 < 32; j1 += 8) { for 0 = 0; j < 8; j++} fprintf(fptr1, "%8.1f ", out_data[j1+j]}; fprintf(fptr1, "\r\n"); fprintf(fptr2, "\n"}; } 46
PAGE 52
APPENDIX B. ADDITIONAL PROGRAM LISTINGS l/l/ll/lll/l/1/ll/l/l/ll/llllllll/lllll/l//l//ll/l/ll/l/l/ll/l/llll II Lcoef and Mcoef Analysis Filter Coefficient Computation Listing II Titlewindow.c Revision 33096 8:47pm 1972 bytes lllll/l/l/ll////l/lll/ll/l/l/l/l/l/l/lllllll/l/l/lllll/lllll/1/1/11 #include #include #include #include #include #include double 1[30][32], m[30][32]; double a[32], frq[30]; FILE *fptr1 ; FILE *fptr2 ; main() { double pi, t_mpy, n1, ck, k1, k2, t_smpl, 11, m1, p1; int i, k, n, nfreq; nfreq = 22; pi = 4.0 atan{1.0); printf("pi = %16.12f\n", pi); f_smpl = 32000.; fptr1 = fopen{"lcoef.dat" ,"wr"); fptr2 = fopen("mcoef.dat","wr"); I* put together a set of frequencies to use *I frq[O] = 0.; frq[1] = 5.0; frq[2] = 1 0.0; frq[3] = 15.0; frq[4] = 25.0; frq[5] = 50.;frq[6] = 62.5; frq[7] = 75.0; frq[8] = 100.; frq[9] = 125.; frq[10] = 150.; frq[11] = 187.5; frq[12] = 200.; frq[13] = 250.; frq[14] = 312.5; frq[15] = 375.; frq[16] = 437.5; frq[17] = 500.; frq[18] = 562.5; frq[19] = 625.; frq[20] = 750.; frq[21] = 875.; I* calculate the I and m coefficients for the matrix *I I* i is the number of frequencies *I I* n is the number of coefficients to be solved for *I 47
PAGE 53
/* k is half the length of the filter *I for (i = 0; i < nfreq; i++) { printf("%d\n", i); for (n = 0; n < 8; n++) { n1 = (double) n; l[i][n] = 0.0; m[i][n] = 0.0; for (k = 0; k < 256; k++) { if (k == 0) ck = 1.0; else ck = 2.0; k1 =(double) k + .000000001; k2 = 256. + k1 ; 11 = ( ck cos(n1 *pi*k21256.) (sin(pi*k1/56.)/(pi*k1 )) cos(2. *pi*k1 (250.frq[i]) /f_smpl) ) ; l[i][n] += 11; m1 = ( ck cos(n1*pi*k21256.) (sin(pi*k1/56.)/(pi*k1)) cos(2. *pi*k1 (250.+frq[i]) /f_smpl) ) ; m[i][n] += m1; } for (i = 0; i < nfreq; i++) { for (n = 0; n < 8; n++) { } fprintf(fptr1, "%16.7e", l[i][n]); fprintf(fptr2, "% 16.7e", m[i][n]); fprintf(fptr1, "\n"); fprintf(fptr2, llllllll////////////llllll////////llllll//////////llll//llllllll//l II Lcoef.dat and mcoef.dat partial results II Title lcoef.dat and mcoef.dat Revision 33096 8:51pm 2860 bytes //llll////////llllllllllllllll//llllll//llllll////llll////ll//llll/ 9.7947056e001 5.5379333e001 4.8944199e001 5.0387345e001 4.9848963e001 5.0032781 e001 5.0031 038e001 5.001 0276e001 1.0156627e+OOO 5.801 0879e001 4.8232784e001 5.0890297e001 4.9380024e001 5.0539167e001 4.9407948e001 5.0943008e001 1.0431789e+000 6.1 080803e001 4.75191 01 e001 5.1388330e001 4.8915572e001 5.1043636e001 4.8779843eQ01 5.1907420e001 48
PAGE 54
1.06230508+000 6.4535467e001 4.6863816e001 5.1843255e001 4.8488732e001 5.15118508 4.8188067e001 5.28431 OOe001 9.7947056e001 5.5379333e001 4.8944199e001 5.0387345e001 4.9848963e001 5.00327818001 5.0031038e001 5.0010276e001 9.3461321 e001 5.3217830e001 4.9601966e001 4.9914382e001 5.0291277e001 4.9556871 e001 5.061 0775e001 4.9162042e001 8.8141148e001 5.1534321e001 5.0166231e001 4.9500894e001 5.0679800e001 4.9139932e001 5.1113942e001 4.8441506e001 8.2049564e001 5.0312479e001 5.0610091e001 4.9169305e001 5.0992899e001 4.880494 7e001 5.1514139e001 4. 7880797e001 11111/llllll/ll/lllll/ll/l/ll/l/llllll/lllllllllllllllllll/llll/111 II Window Coefficient Computation Listing MATLAB II Titlecalcwind.m Revision 33096 8:28pm 2329 bytes llllllllll/lllllllllllllll/l/l/llllllll/llllll/llllll/l/l/1/1111111 diary calcwind.dat format compact format long nfreq = 22; nfreq1 = 14; ncoef = 8; qmax = 6; load fval.dat load lcoef .dat load mcoef.dat fori = 1 :ncoef wcoef(1 :1 ,i:i) = 0.; end vcoef(1 :1,1 :4)=[.3754 .4944 .1246 .0056]; wcoef(1:1,1:3)=[.42 .5 .08]; save wcoef.dat wcoef /ascii for q =1 :qmax q load wcoef.dat k=20.*13./2.; k1 = 500.*1319.; x = wcoef; fprintf(' gain fprintf(' 250f 250+f ripple(f<250)\n'); sum of or\n'); fprintf(' freq calc X(w) calc X(w) square atten(f>250)\n'); for i = 1 :nfreq for j = 1 :ncoef am(i:i,j:j)=xU)*mcoef(i:i,j:j); 49
PAGE 55
al(i:i,j:j)=xO)*Icoef(i:i,j:j); end ym(i)=sum(am(i:i, 1 :ncoef)); ymsq(i) = ym(i)*ym(i); yl(i)=sum(al(i:i, 1 :ncoef)); ylsq(i) = yl(i)*yl(i); ssq(i)=ymsq(i) + ylsq(i); rip(i)=1 O*log1 O(ssq(i)); att(i)=20*1og1 O(ym(i)); if (fval(i) <= 250) fprintf('%6.1f%9.5f%9.5f ,fval(i),yl(i),yl(i)); fprintf('% 11. 7f% 11. 7f% 1 0.6f' ,ym(i),ym(i),ssq(i)); fprintf('% 1 0.6f\n' ,rip(i)); end if (fval(i) > 250) fprintf('%6.1f ',fval(i)); fprintf('% 11.7f% 11.7f %10.2f\n',ym(i),ym(i),att(i)); end end if (q < 6) fprintf('\n'); ssq2 = ssq(1 :nfreq1 ); ssq1 =1.*1.; for i = 1 :nfreq if (i < nfreq1) qcoef(i:i, 1 :ncoef) = lcoef(i:i, 1 :ncoef); ydiff(i) = ssq1 ymsq(i); y1 (i) = sqrt(ydiff(i)); end if (i >= nfreq 1 ) qcoef(i:i, 1 :ncoef) = k1 *mcoef(i:i, 1 :ncoef); y1(i) = 0.; end end f1 = nfreq + 1; f2 = nfreq + 2; qcoef(f1 :f1, 1 :ncoef) = (k 0 k 0 k 0 k 0]; qcoef(f2:f2, 1 :ncoef) = (0 k 0 k 0 k 0 k]; y1 (1 )=sqrt(.S*ssq1 ); y1 (f1 )=.S*k; y1 (f2)=.5*k; x1 =qcoef\y1 '; wcoef = x1'; for i = 1 :ncoef fprintf('% 10. 7f ,wcoef(i)); if (i == 6 I i == 12 I i = 18 I i = 24 I i = 30) fprintf('\n '); end 50
PAGE 56
end fprintf('\n'}; save wcoef.dat wcoef /ascii e = qcoef*x1 y1'; fori= f1 :f2 fprintf('% 13.9f, (e(i} + y1 (i}}/k}; end fprintf('\n'); e1 = e(1 :f2}; fori= 1 :f2 fprintf('% 13.9f ,e(i}); if (i == 6 I i = 12 I i = 18 I i = 24 I i == 30) fprintf('\n'); end end fprintf('\n% 12.6f\n',norm(e 1 )); end end 11111111111111111111111111111111111111111111111111/lllllllllllll/ll II Window frequency response for Figure 3.4MATLAB II Title windfreq.m Revision 33096 9:35pm 852 bytes llllll/llll//llllllllllllllllllllllllllllll/11/llllllllllllllllllll diary windfreq.dat load wcoef.dat ncoef = 8; fori= 1:16 for j = 1:32 k = 32*(i1) + j 1; if (k=0) kfloat = k+.000000001 ; angl = 3.1415928 (kfloat 256)/ 56.; b(k) = 0.; for m = 1 :ncoef b(k} = b(k) + wcoef(m}*cos(2.*(m1}*pi*kfloat/512.}; end cvalue = b(k) sin(angl)/angl; h(k} = cvalue; c(k} = h(k}; if (rem((k1},128} >= 64} c(k) = h(k); end f(k) = 32000. *(k1 )/512; end end end h(512} = 0.; 51
PAGE 57
c(512) = 0.; plot(1:512, h, 'w') xlabel('n') ylabel('h[n]') grid pause plot(1 :512, c, 'w') xlabel('n') ylabel('c[n]') grid pause a4=(fft(h )/56.); semilogy(f(1 :50),abs(a4(1 :50)),'w') xlabei('Hz') ylabei('X(w)') grid pause fori= 1:30 a3(i) = abs(a4(i)); fprintf('%7.1f% 13.7f\n' ,f(i),a3(i)); end lllllllllllllllllllllllllllllll/lllllllll/l/l/l/1/l/lllllllllllllll/11111 II Program to generate sine wave data for testing mpeg.c II Title sinegen.c Revision 22496 12:52pm 1115 bytes lllllll/l/lll/lllllll/l/l/l/lllllllllllllll/l/l/1/1/l/lllllllllllllllllll #include #include #include #include #include #include FILE *fptr main() { double pi, sam_rate, frequ, amplit, phase_deg, phase_rad; int i, k, n_samples; double indx, y; printf("Sine wave generator program.\r\n"); printf("Enter sample and sine wave freq in Hz, zerotopeak amplitude,\r\n"); printf(" start phase in degrees, and number of samples.\r\n\r\n"); prlntf("Example 44200. 2000. 1. 30. 512\r\n\r\n"); 52
PAGE 58
printf("The results will go in file sine.dat 1 sample per line.\r\n"); scanf("%1f %If %If %If %i", &sam_rate, &frequ, &lit, &phase_deg, &n_samples); printf("%f %f %f %f %d\r\n", sam_rate, frequ, amplit, phase_deg, n_samples); fptr = fopen("sine.dat","wr"); pi = 4.*atan(1.); phase_rad = phase_deg*pil180.; k = 0; for (i = 0; i < n_samples; i++) { indx = (double) i; y = amplit *sin( ( 2. pi* frequ indx I sam_rate) + phase_rad); } k++; printf("%9.4f", y); if (k = 8) { k = 0; printf("\r\n");} fprintf(fptr, "%15.71\r\n", y); } lllllllllllll//lll//llllllllllllll//l///1///llllllll///lll///111111111111 II Program to generate ideal subband output for Table 4.1 C program II Titlesubband.c Revision 33196 6:27pm 1394 bytes lllllllllllllllllllllllllllllllllllllllllllllllll/ll//lllllllllllll/11111 #include #include #include #include #include #include double att1 [5], att2[5], freq[5]; double band[5], c1 [7], c2[7]; FILE *fptr1 ; main() { double angl1, angl2, m1, r1, r2, delta; double pi, fs, i1; int f, m; 53
PAGE 59
pi = 4.0 atan(1.0); printf("pi = % 16.12f\n", pi); fs = 32000.; att1 [0]=1.00005;att1 [1 ]=.9999B;att1 [2]=.70724;att1 [3]=.65239;att1 [4]=.99989; att2[0]=0. ;att2[1]=.03337 ;att2[2]=. 70724;att2[3)=. 75806;att2[4]=.00 178; freq[0]=1250.;freq[1]=1350.;freq[2]=1500.;freq[3]=151 O;freq[4]=1 BOO.; band[0]=2.0; band[1]=2.0; band[2]=2.0; band[3]=2.0; band[4]=3.0; fptr1 = fopen("subband.dat", "wr"); for (f = 0; f < 5; f++) { i1 = 2.*band[f]+1.; r1 = (freq[f] 250.*i1 )/250.; r2 = (freq[f] 250.*(i1 +2.))/250.; delta = 2.*freq[f]*255./fs .5; for (m = 0; m < 7; m++) { m1 = (double) m; angl1 =r1*32.*m1/64i1*m112 + i1*272./64delta; angl2 =r2*32.*m1/64(i1+2.)*m1/2 + (i1+2.)*272./64delta; c1 [m] = 1 OOOO.*att1 [f]*cos(angl1 *pi); c2[m] = 1 OOOO.*att2[f]*cos(angl2*pi); printf("%9.3f%4d%12.3f%12.3f\n", freq[f], m, c1[m], c2[m]); } for (m = 0; m < 7; m++) fprintf(fptr1, "%B.2f ", c1[m]); fprintf(fptr1, "\n"); for (m = 0; m < 7; m++)"fprintf(fptr1, "%B.2f ", c2[m]); fprintf(fptr1, "\n"); } } lllllllllllllllllllllllllllllll/11111111111111111111111111111111111111111 II Program to extract mono decimal data from soundblaster binary stereo pcm data II Titledataextr.bas Revision 91795 10:19pm 276 bytes 1111111111111111111111111111111111111111111111111111111111111111111111111 1 0 OPEN "r", #1 "tons.raw", 1 20 OPEN "o", #2, "tons.dat" 30 FIELD 1, 1 AS A$ 40 FOR I = 1 TO 20001 STEP 4 50 GET #1, I 60 XX = ASC(A$) 70 GET #1, I + 1 80 X = ASC(A$) 90 IF X>= 128 THEN X= X256 100 X= 256*X +XX 54
PAGE 60
110 PRINT #2, USING "#######";X 120 NEXT I /IIIII/IIIII /Ill II /IIIII /IIIII /lllll/11111111111 /IIIII/IIIII/III II Ill /Ill II Program to plot sin(w)/w curve for Figure 2.1 MATLAB II Title sinc.m Revision 33096 4:20pm 168 bytes 1/lllllllllllllllllllllllllllllllllllllllll/lllllllllllllllllllllllllllll c = 32,0; fori= 1:512 angl(i) = pi (i 256.0000001 )/c; h(i) = sin(angl(i))/(angl(i)); end plot(angl, h, 'w') xlabel('w'); ylabel('sin(w)/w') grid lllllllllllllllllllllllllllllillllllllllllllllllllllllllllllllllllllll/11 II Program to plot multiple images caused by decimation for Figure 2.3MATLAB II Titledecimate.m Revision 33096 12:29pm 414 bytes lllllllllllllllllllllllllllllllll/1/lllllllllllllllllllllllllllllllllllll c = 8.5; fori= 1:512 angl = pi (i 256.0000001 ); han=.5.5*cos(2. *pi*i/512.); h(i) = han sin(angl/c)/angl; end fori= 1:512 y(i)=O.; if(rem(i,8)=0) y(i) = h(i); end end fh = fft(h,512); fy = fft(y,512); semilogy(1 :512, abs(fh), 'w') xlabel('w') ylabei('H(w)') grid pause semilogy(1 :512, abs(fy), 'w') xlabel('w') ylabei('H(w) AFTER DECIMATION') grid 55
PAGE 61
/llll//llll/l/llll/llll/lllllllllllll/l/l/llllllllllllll/lllll/11/lllllll II Program to show integer band modulation for Figure 2.4MATLAB II Title ibandmod.m Revision 33096 1:57pm 343 bytes llllllllllllll/lll/lll/llllllll/llllllllllllllll///l/lll/1/l//llllllllll/ clear m = 512; fori= 1:m h(i) = .98*sin(2*pi*11 OO*i/32000. + .2*pi); g(i) = .98*sin(2*pi*1 OO*i/32000. + 1.26*pi); t(i) = (i1)/32000.; end fori= 1:m y(i)=O.; if(rem(i,32)==1) y(i) = h(i); end end plot(t, h, 'w', t, g, 'w', t, y, 'xw') xlabel('t') ylabei('ORIGINAL AND MODULATED SINE WAVE') llllllllllllllll/lllllllll//llll/l/1/lllllllllll///l/lllll/l/lll/lllllll/ II Program to generate synthesis sine wave sections for Figure 2.8 C program II Title sections.c Revision 33196 11 : 16pm 1 049 bytes llllllllllllll/1/lllll/lllllllllll/1/lllllllllllll/lllllllllllllllllllll/ #include #include #include #include #include #include #define PI 3.1415928 float z[64]; float n_cos[64]; float s_chan[5]; FILE *fptr1 ; main() { int i, i1, j, j1, k, c1, c2; int k1 k2, k3; int isign; /* =1 for forward cos xform, 1 for inverse */ 56
PAGE 62
int n; /* transform size *I I* define cosine array *I i=2; { for 0 = 0; j < 64; j++) n_cosU] = cos( (float) ((2*i + 1 ) 0 + 16)) PI I 64.0); fptr1 = fopen("sections.dat" ,"wr"); s_chan[O] = 7409.36; s_chan[1] = 1077.80; s_chan[2] = 8676.40; s_chan[3] = 9121.91; s_chan[4] = 2047.06; for (i1 = 0; i1 < 5; i1++) { if ((i1=2) II (i1==3)) { for (k = 0; k < 64; k++) z[k] = s_chan[i1] n_cos[k] ; } else { for (k = 0; k < 64; k++) z[k] = s_chan[i1] n_cos[k] ; } for 01 = 0; j1 < 64; j1 += 8) { for 0 = 0; j < 8; j++) fprintf(fptr1, "%8.2f ", zU1+j]); fprintf(fptr1, "\r\n"); } fprintf(fptr1, "\n"); } } lllllllllll/l/lll/l/lllll/lll/1111111111111111111111111111111111111111111 II Program to plot synthesis sine wave sections for Figure 2.8 MATLAB II Titlesections.m Revision 4296 9:20pm 461 bytes lllllllll/lll/1111111/lllll/lll/l/l/l/lllllll/lll/lll/l/l/l/l/lllll/1///1 load sect.dat s1 = [sect(1 :1,1 :8) sect(2:2,1 :8) sect(3:3,1 :8) sect(4:4,1 :8)]; s2 = [sect(5:5,1 :8) sect(6:6, 1 :8) sect(7:7, 1 :8) sect(8:8, 1 :8)]; s3 = [sect(9:9,1:8) sect(10:10,1:8) sect(11:11,1:8) sect(12:12,1:8)]; s4 = [sect(13:13,1:8) sect(14:14,1:8) sect(15:15,1:8) sect(16:16,1:8)]; sS = [sect(17:17,1:8) sect(18:18,1:8) sect(19:19,1:8) sect(20:20,1:8)]; axis([1 32 10000 10000]); t = 1:32; plot(t,s1,'w' ,t,s2,'w',t,s3,'w' ,t,s4,'w') grid pause 57
PAGE 63
. llllll/lllllllllllll//llllllllllllllllllllllllllllll/lllllll/111111111111 II Program to plot symmetrical overlap filter for Figure 3.1 MATLAB II Titleoverlap.m Revision 33196 10:27pm 809 bytes lllllllllllll/11111111111111111111111111111111111111111111111111111111111 diary windfreq.dat load wcoef.dat ncoef = 8; fori= 1:16 for j = 1:32 k = 32*(i1) + j1; if (k=0) kfloat = k+.000000001 ; angl = 3.1415928 (kfloat 256)/ 56.; b(k) = 0.; for m = 1 :ncoef b(k) = b(k) + wcoef(m)*cos(2.*(m1 )*pi*kfloat/512.); end cvalue = b(k) sin(angl)/angl; h(k) = cvalue; c(k) = h(k); if (rem((k1),128) >= 64) c(k) = h(k); end f(k) = 32000.*(k1)/512; end end end h(512) = 0.; c(512) = 0.; f(512) = 32000.; a4=(fft(h)/56.); a3 = fftshift(a4); p = abs(a3(247:267)); fori= 1:33 q(i) = (i1 )*pi/33; end axis([O pi 0 1 ]); plot(q(1 :15),p(7:21 ),'w',q(3:23),p,'w' ,q(11 :31 ),p,'w' ,q(19:33),p(1 :15),'w') xlabel(' Radians') ylabei('X(w)') grid 58
PAGE 64
REFERENCES [1] P. L. Chu, "Quadrature Mirror Filter Design for an Arbitrary Number of Equal Bandwidth Channels", IEEE Trans. on ASSP, vol. 33, no. 1, February, 1985. [2] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing, Englewood Cliffs, N. J., PrenticeHall, 1983. [3] A. Croisier, D. Esteban, and C. Galand, "Perfect Channel Splitting by Use of Interpolation! Decimation/Tree Decomposition Techniques," presented at the 1976 Int. Cont. Inform. Sci. Syst., Patras, Greece. [4] Digital Signal Processing Applications Using the ADSP21 00 Family Volume 2, Analog Devices, Inc., SubBand ADPCM, pp. 293300, PrenticeHall, Englewood Cliffs, N. J., 1995. [5] ISO/IEC JTCI/SC29, "Information TechnologyCoding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s IS 11172 (Part 3, Audio)". [6] J. D. Johnston, "A Filter Family Designed for Use in Quadrature Mirror Filter Banks", 1980 International Conference ASSP, Denver, CO. [7] S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing, John Wiley & Sons, 1993. [8] P. Noii,"Wideband Speech and Audio Coding", IEEE Communications Magazine, vol. 31, No. 11, pp. 3444, November, 1993. [9] D. Y. Pan, "An Overview of the MPEG/Audio Compression Algorithm", Proc. of the SPIE, vol. 2187, pp. 260273, 1994. [10] J. P. Princen and A. B. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP34, No. 5, pp. 11531160, October, 1986. [11] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, PrenticeHall, 1978. [12] J. H. Rothweiler, "Polyphase Quadrature FiltersA New Subband Coding Technique" in International Cont. on Acoust., Speech, and Signal Processing, pp. 12801283, 1983. [13] S. Shlien, "Guide to MPEG1 Audio Standard", IEEE Trans. on Broadcasting, vol. 40, No.4, pp. 206218, December, 1994. 59
PAGE 65
[14] G. Wackersreuther, "Some New Aspects for Filters in Filter Banks", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP34, No. 5, pp. 11821189, October, 1986. 60
