Citation
Quality of compressed speech using trellis waveform coding

Material Information

Title:
Quality of compressed speech using trellis waveform coding
Creator:
McDonald, Michael Glenn
Publication Date:
Language:
English
Physical Description:
viii, 143 leaves : illustrations ; 29 cm

Subjects

Subjects / Keywords:
Compressed speech ( lcsh )
Compressed speech ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 142-143).
General Note:
Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Electrical Engineering, Department of Computer Science and Engineering.
Statement of Responsibility:
by Michael Glenn McDonald.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
22959055 ( OCLC )
ocm22959055
Classification:
LD1190.E54 1989m .M32 ( lcc )

Full Text
QUALITY OF COMPRESSED SPEECH USING
TRELLIS WAVEFORM CODING
by
Michael Glenn McDonald
B.S., Brigham Young University, 1982
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Electrical Engineering
and Computer Science
1989


This thesis for the Master of Science degree by
Michael Glenn McDonald
has been approved for the
Department of
Electrical Engineering and Computer Science
by
Date AJols,


McDonald, Michael Glenn (M.S., Electrical
Engineering)
Quality of Compressed Speech Using Trellis Waveform
Coding
Thesis directed by Professor Joe E. Thomas
Dynamic programming has recently been used
to achieve speech compression performance greater
than that achieved by techniques such as delta modu-
lation (DM) and differential pulse code modulation
(DPCM). DM and DPCM perform operations based on
current and previous symbols only. Multipath search
techniques employing dynamic programming consider
both previous and future symbols before producing a
(delayed) decision.
Multipath search coding techniques such as
trellis waveform coding have been used to produce
high-fidelity representations of speech signals at
the rate of 1 bit/symbol. Speech sampled at 8000
samples/second produces a transmission rate of '8000
bits/second. This rate is at the lower end of
waveform coders and approaches the rates of voice
vocoders. The advantage of waveform coding over
voice vocoders is that it preserves speaker recogni-
tion as well as intelligibility.
This thesis quantifies speech quality asso-
ciated with trellis waveform coding. This is done


IV
by computer simulation and subjective evaluation.
Simulation results are presented for a binary chan-
nel in the form of signal-to-quantized-noise ratio,
waveform comparisons, frequency spectrum comparisons
and error spectra for different trellis complexi-
ties. The subjective evaluation consists of scores
from subjective listeners evaluating several recon-
structed speech segments processed by trellises of
varying complexities.
Trellis waveform coding is shown to provide
good intelligibility and speaker recognition with
moderately complex trellises. However, recon-
structed speech contains noticeable background noise
indicating a need for enhancements to the basic
technique if toll-quality speech is desired.
The form and content of this abstract are approved.
I recommend its publication.
Signed
Joe E. Thomas


ACKNOWLEDGMENTS
I would like to thank my advisor and commit-
tee chairman, Joe Thomas, for his helpful sugges-
tions and approval of funds to support this work.
Also, Douglas Ross, Jay Rothman and Marvin Anderson
for reading this thesis and serving on my committee.
I appreciate John Clark's assistance in selecting a
topic. Daniel Michaels provided valuable advice re-
garding implementation of the experiment.
I would also like to thank Computing Ser-
vices for letting me use their equipment and for
directly assisting in numerous capacities. The
Electronics Maintenance and Calibration Laboratory
personnel provided the equipment necessary to per-
form the experiment. I appreciate the willingness
of the Stanford Information Systems Laboratory to
provide the speech used in this thesis. Their coop-
eration provided time to emphasize analysis rather
than speech source generation.
My work associates at Martin Marietta pro-
vided valuable assistance. Mike Meno helped with
the digital and analog designs for the experiment.
Steve Montague provided useful insight into the rep-


VI
lication algorithm. Dave Cattani, Tom Clebecek, Eon
Havermann and Mike Martinez provided software. The
Real-Time Distribution System Laboratory personnel
made a graphics display available. I thank the 24
subjective listeners, both work associates and
friends, for their careful evaluations. I also ap-
preciate the flexibility at work provided by my un-
derstanding and supportive supervisors, Vern Dorrell
and Brian Gallagher.
Finally, I thank my family for their con-
stant support and encouragement. Specifically, my
parents, Glenn and Carol McDonald, for teaching me
the value of education and providing me opportuni-
ties to obtain it. I express sincere thanks to my
wife and children, Melanie, Marie and Jordan, for
their patience and help in pursuing this degree.


CONTENTS
CHAPTER
I. INTRODUCTION................................ 1
Data Compression......................... 3
Code Books............................... 5
Tree Codes.............................. 10
Finite State Machines................... 15
State Diagrams.......................... 16
Trellis Codes........................... 17
Objective and Approach.................. 20
II. ALGORITHMS................................. 22
Viterbi Search Algorithm................ 23
Generalized Lloyd Algorithm............. 27
III. TRELLIS WAVEFORM CODING.................... 29
Implementing the Trellis................ 31
Trellis Encoding........................ 33
Populating the Trellis.................. 35
Extending the Trellis.................. 42
Encoding the Speech..................... 49
Decoding the Speech..................... 49
Effects of a Noisy Channel.............. 54
IV. EXPERIMENT CONFIGURATION................... 56
Processing the Speech.
57


VI11
Reconstructing the Speech................ 59
V. PERFORMANCE ANALYSIS....................... 62
Encoding SQNR............................ 62
Waveform Plots........................... 71
Spectrum Plots........................... 93
Subjective Evaluation................... 124
VI. SUMMARY AND TOPICS FOR FURTHER STUDY.... 138
BIBLIOGRAPHY
142


CHAPTER I
INTRODUCTION
This paper addresses the quality of com-
pressed speech associated with trellis waveform cod-
ing. The quality is analyzed using computer simula-
tion and subjective evaluation. The first chapter
provides motivation for studying this data compres-
sion technique and introduces the concepts of trel-
lis waveform coding with the associated terminology.
Today is the age of information. We have an
unprecedented amount of information available to us
in a variety of forms. An area that has made rapid
progress is that of electronic media. Messages can
be sent electronically between computer system users
providing an extremely convenient means of conveying
small pieces of information. Some computer systems
also support conversational modes where users can be
involved in interactive conferences. Each party has
access to an input device such as a keyboard and an
output device such as a display terminal. Conversa-
tions are initiated by typing on the keyboard and
all responses are displayed simultaneously on all
terminals. This is a significant capability consid-


2
ering the proliferation of computer networking,
telephone modem dial-in availability, and intercon-
tinental communications.
More recently, voice messaging has been
added to home telephones. Arriving calls are ac-
cepted by a computer when the resident is either on
the telephone or unavailable. The computer repro-
duces a short message from the resident and then
initiates a listening mode. This mode digitizes the
caller's message and stores it in memory for future
reproduction. Voice messaging for businesses sup-
ports message broadcasting to multiple parties and
re-transmittal of received messages to other inter-
ested users.
Education has also been greatly enhanced by
electronic media. Dictionaries, thesauri, encyclo-
pedias, literature, card catalogs, and canonical
works have all been made available on computer.
Having electronic media available provides quick and
easy access to all of these valuable resources. The
computer can aid in locating information efficiently
by performing automated searches and sorts as di-
rected .
A computer that has recently been introduced
into the marketplace offers storage formats which
include text, graphics, voice and music. It sup-


3
ports array processing, image processing, encryp-
tion, and facsimile transmission. These are the
essential elements to provide an integrated elec-
tronic media workstation. In the future, this sin-
gle workstation might support video telephone, tele-
vision programming, educational training, document
transmittal and data transfer.
There are few technological advances re-
quired for this vision to become a reality. The
overriding concern is the amount of data that must
be transferred from place to place and the amount of
storage that is required once it gets there. This
problem is being somewhat alleviated by increasing
the storage capability of various devices and by
ever striving to build higher-capacity communication
1 inks.
Data Compression
One element which aids the process of con-
veying information is that of data compression. Data
compression consists of reducing the data volume
while preserving the information content. There are
fundamentally two categories of data compression:
redundancy reduction and entropy reduction [1].
Redundancy reduction reduces data in such a
way as to preserve all information. A common exam-


4
pie is to represent frequently-occurring symbols
with a small amount of data and rarely-occurring
symbols with a larger amount, thus achieving less
data on the average. This process is termed loss-
less in that the original data can be completely
reconstructed from the reduced data set. There ex-
ists a one-to-one mapping between the original data
and the compressed data so that no information is
lost in the process. Compression is obtained by
removing the redundancy inherent in the data.
Entropy reduction consists of transforming
the data in such a way that the transformed data
have lower entropy than the original data. This
technique usually consists of some form of quantiza-
tion and is most often performed in the digital do-
main. One example is to represent the data by their
Fourier coefficients. These coefficients are gener-
ated by applying the Fast Fourier Transform and rep-
resent the amount of energy contained in each fre-
quency bin. For good compression, the compressed
data energy must be concentrated in a limited number
of frequency bins. Bins containing insignificant
amounts of energy can be omitted and the data can be
represented by storing only the Fourier coefficients
for each of the remaining bins. Clearly, this is a
lossy technique since some of the data information


5
has been discarded. The data will never be able to
be perfectly reconstructed, but will be a close ap-
proximation of the original.
Another entropy reduction technique involv-
ing quantization is multipath search coding. Sev-
eral data sets or vectors are available for encoding
the data. A block of data is compared to each vec-
tor, one block at a time and the vector which most
closely matches the data is used as the encoding.
The data block is quantized, since there are fewer
vectors than possible data sequences. These vectors
often take the form of a path through a certain
structure, so that the vector comparison is termed a
search. Due to the multiple vectors or paths avail-
able for encoding, this technique is referred to as
multipath search coding.
Code Books
Multipath search coding can be implemented
using a code book, a tree code or a trellis code. A
code book is simply a set of different vectors, each
vector containing symbols which are representative
of typical data sequences. The data input is broken
into blocks and compared to every vector in the code
book. The data block is then represented by the in-
dex of the vector which most closely matches it.


6
This representation is called an encoding, since the
original data are now represented by a different set
of data. Compression is achieved by virtue of fewer
bits being required to represent the number of vec-
tors in the code book than bits required to repre-
sent all possible data sequences. The data are
quantized in that they are now represented by the
symbols in the code book vector which are only ap-
proximately the same as the data. How well the code
book vector matches the original data determines the
quantization distortion. When designing the code
book, candidate vectors are chosen to match typical
sequences based on some error criterion. This error
criterion is selected to minimize distortions which
are most harmful to the reconstruction process.
A graphical representation of a code book
(often called a vector quantizer [2]) is provided in
Figure 1-1. It consists of four states represented
by the vertical dots and four nodes represented by
the horizontal dots. Nodes of the same state are
connected yielding three branches in each state. In
a code book, each branch is assigned a value repre-
senting a typical data symbol as shown.
Suppose the first block of data is 00 01 10,
consisting of three two-bit symbols. This block of
data is compared to all of the vectors and is found


7
_ 00 01 io
00#---------------------1--------------------
01#-------------------------------------------
10 ----------------------------------------
11*------ii-------------------- ------12------*
Figure 1-1 Codebook Quantizer
to perfectly match the first one. The first block
of data is therefore represented by 00, correspond-
ing to the index of the first vector. This has
resulted in a compression ratio of 3:1 since six
bits have been represented by two bits.
Now suppose the second block of data is 00
01 11. If the error criterion is the distance meas-
ure, or number of bits that differ, then the symbols
of vector 00 differ from the data block by one,
vector 01 differs by two, 10 differs by three, and
11 differs by three. Consequently, the second block
is also represented by the first vector, the vector
having the minimum distance, with a resulting dis-
tortion due to quantization of one bit error in six.
Producing more vectors in the code book pro-
vides more possible vector combinations, but also
increases the number of bits required to represent
the vectors. A code book having a k-bit index can


8
represent 2k different data sequences. However, the
number of possible data sequences is qL, where q=2r,
r is the number of bits per symbol, and L is the
number of symbols in a block (or elements in a vec-
tor) In the example, the number of sequences that
can be perfectly represented is four and the number
of possible data sequences is 64.
Further analysis of the encoding shows that
the 64 possible data sequences would be divided into
groups of 16 in four partitions corresponding to the
four code book vectors. The 16 6-bit vectors would
be represented by effectively four bits, the other
two bits being used to specify the partition. Ana-
lyzing possible differences of a four-bit pattern
from any other four-bit pattern is equivalent to
distances from the all-zero pattern, where distance
is defined as the number of bits that differ. One
pattern has a distance of zero, four have a distance
of one, six have two, four have three and one pat-
tern has a distance of four. With this perspective
and remembering that there are four partitions, the
total result is that four encodings will have no bit
errors, 16 encodings will have one error, 24 will
have two errors, 16 will have three, and four encod-
ings will have four bit errors. In summary, 60


9
vectors will have some degree of distortion ranging
from one bit error to four bit errors.
This scheme may appear useless, but there
are some applications in which the least significant
symbol bit is unimportant. The code book can be
chosen in such a way that errors are applied to the
least significant bit when possible. This approach
would yield a worst case situation of only one bit
error in the most significant bit given that there
are four errors in six bits. Furthermore, as evi-
denced by the error tally, this occurs in only four
out of the 64 cases (as does the all-correct case) ,
which is relatively infrequent. The most common
error will be two least significant bit errors.
Another aspect of data that aids the code
book or vector quantizer is that some data sequences
may occur less frequently than others. For example,
the all-zero or all-one case may never occur. The
code book can be set up to take advantage of this by
ensuring that these two sequences and any other in-
frequently occurring sequences are evenly divided
between the possible partitions. This reduces the
actual number of sequences that must be encoded by
any one vector, consequently reducing the average
distortion (error). In practice, systems that can
tolerate some distortion have been very successful


10
in compressing data with acceptable degradation by
using code book encoding techniques.
In summary, multipath search is accomplished
using a code book by successively calculating the
distortion between the original data and the code
book vectors (or paths). Once the minimum-distor-
tion path has been located, the original data are
encoded as the path index.
Tree Codes
The second form of multipath search coding
is tree codes [3]. A tree diagram is shown in Fig-
ure 1-2. Its name comes from the fact that there is
an initial state from which branches emerge, and
each of these branches enter another state from
which more branches emerge. This continues indefi-
nitely similar to the progression from the trunk of
Figure 1-2 Tree Branches


11
a tree to its branches and finally to the small
twigs.
A tree is simply a representation of an in-
finite state machine. Each node represents a dif-
ferent state and each branch is a transition from
one state to the next. The tree illustrates all
possible transitions. In this example, there are
only two branches emerging from each state, although
there could be several. Since there are only two in
this case, it is termed a binary tree. The upper
transition shown by the thicker line is due to a 0
and the lower transition is due to a 1. Further-
more, each branch has a symbol which is associated
with the transition. As a result, data sequences
can be represented by the different paths through
the tree.
As an example, the path 010 would start at
the leftmost node, transition along the upper path
as specified by the first 0, move along the lower
path as indicated by the 1, and then take the upper
path due to the last 0. This would result in a data
sequence of 01 10 00, which is just the symbols
along the path. An original data sequence of 01 10
00 would then result in an encoding of 010 corre-
sponding to the path specification. This results in


12
a 2:1 compression ratio since six bits are encoded
to three bits.
A tree contains pL different paths to repre-
sent qL possible combinations, where p=2b, b is the
number of bits for representing paths emerging from
a state, q=2r, r is the number of bits per symbol,
and L is the length of the tree. In the example,
there are eight different paths to represent 64 se-
quences. During encoding, possible sequences will
be separated into eight partitions corresponding to
each path, so that any eight sequences will be rep-
resented by one tree path or vector. It follows
that these sequences will differ in at most three
bits (the other three bits were used to specify the
path), so that the maximum distance will be three
bits resulting in a maximum of three bit errors out
of six. Since there are at most three bit errors,
these errors can all be contained in the least sig-
nificant bits. Specifically, there will be one zero
error, three single errors, three double errors, and
one triple error in each partition. Combining all
partitions results in eight perfect representations,
24 single and double error sequences, and eight tri-
ple error sequences.
It appears from this example that the tree
has far better performance than the code book, but


13
the compression ratios must be the same for a fair
comparison. A code book with a 2:1 compression ra-
tio might consist of two states and three nodes (two
branches). Now, 16 sequences are possible and there
are two code book vectors, so that eight sequences
will be encoded by each vector. One of the vector
bits is taken up in specifying the path leaving
three bits to represent the eight sequences in each
partition. This is identical to the case of the
tree where six bits were available in each tree
path, but three were occupied by the path. in both
cases, this leaves three bits to specify eight se-
quences in each partition. Performance is identi-
cal .
In the general case, the number of possible
sequences and available sequences is shown in Table
1-1 along with the corresponding compression ratios.
Table 1-1 Comparison of Code Book and
Tree Performance
Possible
Sequences
Available Compression
Sequences Ratio
Code Book 2rL
2k
rL: k
2 rL
2bL
r: b
Tree


14
For equal compression ratios, b=k/Lc, where b is the
number of bits used to specify the tree branches at
each node, k is the number of code book index bits,
and Lc is the code book length. For the same number
of available sequences, b=k/Lt, where Lt is the
length of the tree. This implies that Lc=Lt. This
seems reasonable, but for the same number of avail-
able sequences, k=bLt so that there must be 2k vec-
tors in the code book. The smallest value for b is
the binary case, b=l, and a practical Lt is 256 so
that the number of vectors is 2 256! It is now clear
that tree encoding ultimately out-performs code book
coding.
The previous example emphasizes the fact
that compression ratio and distortion are two funda-
mental parameters in analyzing the effectiveness of
various data compression schemes. Any data are eas-
ily compressed to any desired extent provided there
is no distortion requirement. One extreme is to
take all of the original data and discard them. The
compression ratio is infinite, but so is the distor-
tion. There is an entire discipline termed rate-
distortion theory which derives theoretical bounds
for the minimum distortion that can be achieved with
a given compression ratio and vice versa [4].


15
Finite State Machines
The previous tree code was generated with an
infinite state machine. Finite state machines also
produce tree codes. In this case, the tree branches
will begin to repeat when the state machine recy-
cles. For example, if the state machine is a two-
stage binary shift register with new bits shifting
in from the right as shown in Figure 1-3, there are
only four possible states. (A non-binary shift reg-
ister would shift symbolis rather than single bits
into each stage.) Starting in the 00 state, an
input sequence of 111 changes the contents of the
shift register from 00 to 01, from 01 to 11, and
finally from 11 to 11, which is a previously entered
state. This cyclic property is true for all initial
states and possible input sequences.
Figure 1-3 Finite State Machine Implemented as
a Two-Stage Binary Shift Register


16
State Diagrams
The state diagram of Figure 1-4 is a graphi-
cal representation of the two-bit shift register.
The bubbles represent states and the arrows repre-
sent transitions due to either an incoming 1 or 0.
The thick lines indicate a 0 transition and the thin
lines indicate a 1. Assuming that the shift regis-
ter is initialized to the 00 state, a 0 shifted in
01
Figure 1-4 State Diagram of a Two-Stage
Binary Shift Register


17
from the right will not change the contents of the
shift register. This is depicted by the upper loop
in the state diagram showing the 00 state tran-
sitioning back into the 00 state. On the other
hand, if a 1 is shifted in from the right, the con-
tents of the register change to 01. This is shown
by the arrow going from state 00 to state 01 and so
on.
The symbols along the paths represent the
symbols that are associated with that particular
transition. A transition from state 00 to state 01
produces a symbol 10. The same is true for each
path.
Trellis Codes
Taking advantage of the fact that the tree
branches begin to repeat for finite state machines,
the repeating branches can be connected into the
same previous states. This structure results in a
trellis. The trellis of Figure 1-5 is identical to
the tree of Figure 1-2 except that the branches are
merged when they begin to repeat. Nodes in each
column represent states and each column represents a
different transition. The branches of the trellis
are simply the state transitions so that branch sym-
bols are identical for the same state-to-state tran-


18
Figure 1-5 Trellis Structure
sitions. Comparing Figures 1-4 and 1-5, note that
the transition from the 00 state to the 00 state is
caused by an incoming 0 (as indicated by the thick
line) and results in the symbol 01. Likewise, the
second transition due to a 1 is-from 00 to 01 and
results in the symbol 10. Figures 1-4 and 1-5 are
perfectly analogous.
By adding one more stage to the shift regis-
ter, the contents of the full register can be used
to label the branches. The new shift register is as
shown in Figure 1-6. Each register contains a bit
that has been shifted in from the right. (In the
non-binary case, each symbol is more than one bit.)
The resultant K-stage register now contains K-l
state index bits as before, and K branch index bits
corresponding to the 2K branches. K is called the
constraint length, since it is the number of transi-


19
<------- State Index -------->
MSB LSB
Branch Index
Figure 1-6 Binary Shift Register Including
Extra Branch Index Bit
tions over which any incoming bit affects the con-
tents of the register.
The trellis diagram has 2K_1 states, 2K
branches, and the same number of paths (pL=2bL) as
the tree diagram, but is finite as opposed to the
tree. A larger constraint length results in more
states and branches, hence more paths. Constraint
length is an important parameter directly affecting
trellis performance. During encoding, more branches
results in finer quantization of the input data.
Longer trellises.increase performance due to reduc-
ing the effect of the initial start-state con-
straint. Paths through the trellis have many common
elements so that individual symbols may be used many


20
times. A trellis thus eliminates the complexity of
the infinite tree while providing more paths than
the code book.
A trellis works very similar to the tree. A
data sequence is compared to all possible paths
through the trellis. The data are then encoded as
the path through the trellis. Just like the tree,
six bits of data are represented by three bits of
path. Performance is identical to that of the fi-
nite tree. In summary, a trellis provides effective
quantization due to the many possible paths, yet is
easily realizable because it can be represented by a
finite state machine.
Objective and Approach
The primary objective of this thesis is to
quantify degradation to speech intelligibility and
speaker recognition when utilizing trellis waveform
coding techniques. In order to accomplish this ob-
jective, digitized speech was obtained and the trel-
lis was implemented in software. Speech data were
used to train the trellis and populate the branches
with symbols. The trellis training was accomplished
using the Generalized Lloyd Algorithm and the
Viterbi Algorithm with a mean square error crite-
rion. The digitized speech was then encoded, recon-


21
structed and analyzed. Analysis included both quan-
titative and qualitative results.
Chapter II outlines the algorithms that were
required for this analysis. Chapter III provides
details related to trellis waveform coding. Chapter
IV provides details of the experiment configuration.
Chapter V contains the results of the experiment.
The last chapter is a summary including suggestions
for further study.


CHAPTER II
ALGORITHMS
Previous discussions alluded to distortion
and comparison of data sequences to code vectors.
These are predicated on an error criterion. The
concept of distance was used previously as a concep-
tual error criterion, where distance was measured as
the number of bits that were different between the
data and code word. However, this may not be an
effective error criterion, since it weights all bits
equally. Error criteria must be selected based on
the data being encoded. For different kinds of
data, different kinds of distortion are more harm-
ful. For speech, many error criteria have been de-
veloped to minimize the distortion that is audible
to the human listener. These criteria are often
complicated and difficult to implement. An error
criterion that performs reasonably well and is
mathematically tractable is the mean square error
criterion expressed as


23
MSE = -jr'Z (xt xr)2 (2-1)
where x represents the original data sequence, x' is
the encoded data sequence, and n is the number of
elements. The difference between each element of
the two vectors is squared and summed over the en-
tire vector. The mean square error has the effect
of emphasizing errors in the most significant bit.
The mean square error criterion has been used in
producing the results of this thesis.
Viterbi Search Algorithm
The Viterbi Search Algorithm was originally-
developed for maximum likelihood decoding of con-
volutional codes used in communications for error
correction. Since a convolutional code can be rep-
resented by a trellis, the algorithm has direct ap-
plicability here.
The algorithm is really quite simple to im-
plement although it is computationally intensive and
requires a significant amount of storage. It util-
izes metrics which is a term referring to distortion
on a branch as calculated by the error criterion.
The algorithm consists of six steps.


24
1) For each state, calculate the metric for
each incoming branch.
2) Add each branch metric to the previous
state metric to form the cumulative
path metric.
3) Record the bit corresponding to the
branch transition with the smallest
path metric.
4) Save the surviving path metric as the
new state metric.
5) Repeat the previous steps for each
transition.
6) Select the path with the smallest
metric as the best path.
The first step consists of calculating the distor-
tion associated with all incoming branches. This
can be done by applying the mean square error crite-
rion so that a small metric corresponds to minimal
distortion. The second step requires adding 'the
branch metric to the previous state metric. Each
incoming branch emerges from a certain state. Pre-
viously, all paths into this state were eliminated
except the path of lowest distortion. The surviving
path metric was then associated with the state. The
current branch metrics are added to the surviving
path metrics to obtain a total path metric for each


25
incoming branch. At this point, the incoming path
with minimum distortion can be selected. The bit
causing the transition is recorded and the surviving
path metric is associated with the current state.
Conceptually, the Viterbi algorithm is
analagous to attempting to locate the shortest path
between two points along several paths. Multiple
paths merge and separate, but all paths begin and
end at a common point. The algorithm keeps track of
distances along multiple paths until they merge at a
common point, in this case a node. At this time,
the paths having the longer distances are elimi-
nated. Selecting longer paths to a certain point
can only make the overall journey longer. If the
shortest path is chosen, any path leading from this
point will be as short as possible. At the final
destination, only the shortest paths are remaining,
one for each node. It is then a simple process to
select the best path overall. The algorithm accom-
plishes a sequential maximum likelihood decision by
process of elimination at each transition in the
trellis.
The viterbi algorithm as outlined above re-
quires significant storage, since one entire path
through the trellis must be saved for each state.
The largest trellis depth in this thesis is 256, so


26
storage was not a problem. However, if storage is
limited, the Viterbi algorithm can be implemented as
a finite sliding window as shown in Figure 2-1. In
this case, only a fixed amount of previous path in-
formation is saved. After each transition, the best
path to each state is determined as before, but a
further step is to select the state having the best
overall path. The value at the beginning of the
window on this most likely path is used as the en-
coding. This is possible because all paths near the
beginning of the trellis begin to merge as the proc-
ess steps farther into the trellis. Simulations
have shown that if the window length is maintained
t t
Symbol Current
Release Transition
Figure 2-1 Finite Encoding Window


27
at 4 to 5 times the constraint length, then there is
only a small degradation in performance [5].
Generalized Lloyd Algorithm
The Lloyd Algorithm was originally discov-
ered as a solution of pulse code modulation quan-
tizer equations. It later became recognized as a
minimum distortion source code design. The Lloyd
algorithm consists of five steps [6].
1) Obtain initial trellis branch symbols.
2) Perform a trellis encoding of the
source using the search algorithm and
the error criterion.
3) Calculate the encoding distortion.
4) Refine the branch symbols based on the
trellis encoding.
5) Repeat steps 2-4 until the distortion
is acceptable.
The algorithm is generalized by adding a sixth step
which consists of extending the trellis and repeat-
ing steps 2-5 [7]. The process is completed when
the trellis has been extended to the maximum con-
straint length.
The initial trellis is based on some prior
knowledge of the data. If it is known to be Gauss-
ian, the initial trellis could consist of samples of


28
a Gaussian random variable. The second step uses
training data and requires searching the trellis for
the path which best represents the training data
based on the error criterion. This step is per-
formed for all blocks of data in the training set.
Upon completion, each data element has been encoded
to a certain branch in the trellis. This forms a
partitioning of the data, one partition for each
branch. All data values encoded to a given branch
can now be used to form a new symbol for the branch
which reduces the resulting distortion. In this
work the centroid is used which, in the case of mean
square error, reduces to the average of the ele-
ments. So, the average of all elements encoded to
each branch is used as the new branch symbol. In
the next step, the distortion associated with the
current encoding is evaluated. If the distortion is
acceptable, the existing branch symbols are used for
encoding data outside of the training set. If the
distortion is not acceptable, the existing branch
symbols are replaced by the new branch symbols and
the process is repeated. The sixth step of trellis
extension will be discussed in detail in the next
chapter.


CHAPTER III
TRELLIS WAVEFORM CODING
The trellis and algorithms required for
trellis waveform coding were previously introduced.
This chapter covers the specific implementation of
the trellis and application of the algorithms used
in this thesis.
The flowchart in Figure 3-1 shows the proc-
ess control functions. All of the calculations and
accompanying results are predicated upon the con-
straint length and the trellis depth. In the case
of populating the trellis, constraint length is used
as the constraint length for the final trellis.
Intermediate steps initialize the constraint length
to one and then extend the trellis at each iteration
by increasing the constraint length by one. Since
higher constraint-length trellises are generated by
extending lower-order trellises, all trellises hav-
ing constraint lengths less than the desired trellis
are generated as part of the process. So, when
constraint length is specified for populating the
trellis, what is actually being requested is that
all trellises having constraint lengths less than


30
CONTROL ]
PROCESSING J
POPULATE
TRELLIS
ENCODE
SPEECH
Figure 3-1 Controlling the Process


31
the desired constraint length be generated. The
branch symbols are available for storage after each
constraint-length iteration. The final iteration
provides a trellis having constraint length equal to
that input at the process control level.
In the case of encoding speech, the con-
straint length indicates the trellis which will be
used to encode the speech data. The branch symbols
corresponding to this trellis must have been previ-
ously generated. Larger constraint lengths result
in trellises with higher performance. Trellis depth
indicates the length of the trellis in both cases of
populating the trellis and encoding the speech.
Once the desired constraint length and trel-
lis depth have been defined, the trellis must be
populated with branch symbols representing typical
data sequences before initiating data encoding. The
entire process is to populate a trellis of the de-
sired constraint length and depth using training
data, and then start the process over again using
this trellis to encode speech data outside the
training set.
Implementing the Trellis
The trellis is implemented as a shift regis-
ter having length equal to the constraint length.


32
As was shown in Figure 1-6, the most significant bit
(MSB) is on the left and the least significant bit
(LSB) is on the right. Incoming bits are shifted in
from the right. The state is specified by the K-l
least significant bits. The branches are specified
by the entire contents of the shift register. Of
interest to the Viterbi algorithm is the index of
the incoming branches and previous states. These
operations are performed algebraically by the equa-
tions
i' = 1 + uqK_1, u=(0,l) (3-1)
1' = [if] + uqK2, u=(0,l) (3-2)
where i is the branch index, 1 is the state index, u
is the incoming transition symbol, and the prime
indicates previous. These operations are equivalent
to shifting the contents of the register back (to
the right) one stage and bringing in each of the
possible symbols from the left, since these were
lost in the transition to the current state. If the
current state is 00, then the two incoming branch
indices are 000 and 100 for the trellis of Figure
1-5. These correspond to the first and fifth
branches from the top that emerge from the third


33
column of nodes. The corresponding previous states
are 00 and 10.
The initial fanning out of the trellis re-
quires that the first K-l transitions be treated
separately. This is implemented by performing cal-
culations for only the first 2L states, where L is
the trellis depth, and for the zero-transition (up-
per) branches only, until the trellis is full.
In the current implementation, speech sam-
ples consisting of 12 bits/sample are stored as
16-bit integers. Since data are encoded as symbols,
the number of integers required for each encoding is
equal to the trellis length. The branch symbols are
stored as 32-bit floating point numbers. There is
one symbol for each branch with the number of
branches equal 2K. The path consists Of a single
bit at each node specifying one path for each state.
In this implementation, each bit is stored as a
16-bit integer. This results in 2K-1L integers to
represent the path. The path could be reduced as
suggested in the section on the Viterbi algorithm.
This was unnecessary in the current application.
Trellis Encoding
The technique for populating the trellis
branches with symbols will be discussed in the next


34.
section. It involves trellis encoding so that this
topic must be covered first. For purposes of this
discussion, assume that the trellis has been previ-
ously populated.
The key to trellis encoding is the search
algorithm. Several search algorithms exist, but the
optimal Viterbi algorithm has been used in this the-
sis. From the section on algorithms, it is under-
stood that the Viterbi algorithm selects the path
through the trellis which minimizes average distor-
tion based on some error criterion, in this case
mean square error. Symbols along the path then rep-
resent the encoded data. The path can be repre-
sented by a 1 or a 0 at every node. Since the
speech data in this thesis were digitized at twelve
bits per symbol, this represents a 12:1 compression
ratio.
To encode, input data are divided into
blocks equal to the depth of the trellis. The
Viterbi algorithm compares each block element to all
branch symbols at the corresponding trellis depth.
The best path through the trellis is selected based
on the accumulated error (distortion) for each path.
The encoded data become the respective branch sym-
bols along the path.


35
Populating the Trellis
In order to perform data encoding, the
branches of the trellis must be populated with sym-
bols that are representative of typical sequences.
An effective way of accomplishing this is by using
training data and the Generalized Lloyd Algorithm to
determine the branch symbols. This process is out-
lined in Figure 3-2 which spans three pages.
The first page contains initializations for
the trellis extension loop and the trellis refine-
ment loop. The first step is to set the constraint
length counter to zero, and set the distortion
threshold. The distortion threshold is used in de-
termining whether distortion imparted by the current
trellis to the training data is acceptable. The
constraint length counter is incremented before the
first operation so that the initial trellis is of
constraint length one. Each iteration thereafter in
the extend loop increases the constraint length by
one. Since the first trellis is of constraint
length one, this trellis consists of one state and
two branches as shown in Figure 3-3. Each transi-
tion begins in state 0 and ends in state 0, but one
transition follows the upper branch associated with
a 0 transition, and the other transition follows the


36
Extend
Loop
Figure 3-2 Populating the Trellis


37
Partition
Loop
Calculate current
block distortion
Sum with previous
block distortion
No
m=m+l
xj
Viterbi
Algorithm
Si= {x5 Ci;j}
Pi = 11 Si ||
L
Bi = Bi+ZXj GSi
j=i
A=A+A
m=M?
Figure 3-2 (continued)


38
Figure 3-2 (continued)


39
Figure 3-3 Constraint-Length-One Trellis
lower branch associated with a 1. The initial
branches are set to 1 and -1.
The next section is preparation for refining
the branch symbols. The current distortion is in-
itialized to infinity so that the first-time distor-
tion will always be less than the initial distor-
tion. The current distortion is moved to the previ-
ous distortion and the current distortion is set to
zero. Also set to zero are the block counter, the
number of symbols in a branch partition, and the sum
of the symbols in a branch partition.
Now that all parameters have been initial-
ized, the partition loop is entered on the second
page of Figure 3-2. The partition loop performs the
function of encoding the training sequence one block
at a time. This encoding finds the path through the
trellis which most closely represents the training
data. After encoding, each data sample is repre-
sented by a branch symbol. This branch information
is used to form several partitions, Si, of the
training data where a partition is all block ele-


40
ments that have the same branch index. (Cij indi-
cates that the jth element was encoded to the ith
branch, thus being associated with the partition
Si.) The number of elements per branch is incre-
mented and maintained as the number of elements in
the partition corresponding to that branch. A run-
ning sum is kept of all elements of the partition
for future calculation of the new branch symbols.
Lastly, the distortion for each encoding is calcu-
lated and accumulated for the entire data set.
Since this is a running sum, the final sum is repre-
sentative of the mean square error of the entire
encoded sequence with respect to the training se-
quence. After all blocks have been encoded, the
partition loop is complete and control passes to the
last page of Figure 3-2.
Here, the rate of change of distortion is
calculated by dividing the difference of the current
and previous distortions by the current distortion.
The current and previous distortions are the average
mean square error distortion for the entire training
sequence except that they have not been divided by
the total number of elements. However, the total
number of elements is identical in both current and
previous distortion measures, so the number of ele-
ments cancels out in the division. The subtraction


41
provides the amount that the distortion has de-
creased over the last iteration. Dividing by the
current distortion yields a ratio of distortion de-
crease to the current distortion. This provides a
means of evaluating relative gains in minimizing
distortion. If there is significant improvement, or
the trellis has not yet been refined, then new
branch symbols are generated by calculating the cen-
troid of each partition and the refine loop is re-
initiated. Once there is insignificant improvement
in distortion, the trellis is extended to higher-
level trellises as shown in Figures 3-4 and 3-5.
The refine loop is again initiated.
0
1
Figure 3-4 Constraint-Length-Two Trellis
Figure 3-5 Constraint-Length-Three Trellis


42
In summary, initially a constraint-length-
one trellis having two branches is populated with a
1 and a -1. The training data are divided into
blocks and each block is encoded using the con-
straint-length-one trellis. Upon completion, each
block element has been encoded to a branch parti-
tion. A new branch symbol is then calculated as the
centroid of the partition. This process is contin-
ued until there is little improvement in distortion.
At this point, the trellis is extended and the proc-
ess begins again. The process is complete when the
maximum constraint length has been reached and fur-
ther iterations yield little improvement in distor-
tion.
Extending the Trellis
After the trellis branch symbols have been
refined sufficiently for the K=1 case, the trellis
is extended to K=2. The number of branches is qK so
that the number of branches is doubled (in the bi-
nary case) each time K is increased by one. Exten-
sion continues until the trellis reaches the desired
constraint length.
A simple algorithm for extending the trellis
was provided in the flowchart of Figure 3-2. This
consists of forming a replica of every existing


43
branch, thus doubling the total number of branches.
The existing branch of index i is replicated as the
branch having index i+N, where N is the number of
branches. This extends each existing branch into a
branch that is incremented from the existing branch
by the number of states. Consequently, the higher-
order trellis now contains* all of the same branch
symbols as the previous trellis, but there are two
of each branch. This allows the same refined branch
symbols to be available for refinement of the
higher-order trellis.
A more complicated extension algorithm rep-
licates the branches in such a way as to not only
provide the same branch symbols, but also arrange
them in such a way as to preserve all paths that
exist in the current trellis. After the trellis of
constraint length K=1 is trained, there are two
branch symbols defined. These must now be repli-.
cated to four while preserving the paths contained
in the K=1 trellis. This is accomplished as shown
in Figure 3-6. The permutation diagram on the left
represents how branches should be replicated. For
instance, branch 1 of the trained trellis goes to
branch 1 of the new trellis, branch 2 goes to branch
2, branch 3 and 4 are reversed, 5 goes to 8 and so
on. This permutation diagram is applied until half


44
Permutation Diagram
Swap Diagram
Current
Trellis
Branch
Extended
Trellis
Branch
Current
Trellis
Branch
Extended
Trellis
Branch
Figure 3-6 Trellis Extension Branch
Replication Operations


45
of the new branches have been generated. The others
are obtained by reflecting the new branches about
the center of the trellis to generate a symmetric
trellis.
The permutation matrix is generated by ap-
plying each level of the permutation matrix to the
swapped matrix on the right. The first trellis has
two branches. To obtain the permutations for a
trellis having four branches, operations 1 and 2 of
the permutation diagram are applied to operations 3
and 4 of the swap diagram. This results in no
change since the indicated operation is to go from
the current branch to the same branch in both cases.
Since branches 3 and 4 are already swapped, they
remain swapped in the permutation diagram. For the
next level trellis, operations 1 through 4 of the
permutation diagram are applied to operations 5
through 8 of the swap diagram. As one example, the
permutation indicates a swapping of branches 3 and
4. Since this operation is overlaid onto branches 7
and 8 in the swap diagram, the replication to
branches 5 and 6 are swapped so that 7 goes to 5 and
8 goes to 6 instead. These results appear in the
permutation diagram. For the next level, operations
1 through 8 of the permutation diagram are applied
to operations 9 through 16 in order to obtain the


46
final permutation for a trellis having 16 branches.
This process continues until the desired number of
branches has been reached.
In order to extend a trellis, the existing
branches are permuted using the permutation diagram.
Then, the resultant branches are reflected about the
center of the trellis to provide twice the branches,
i.e. branch i goes to N-i + 1, where N is the number
of branches in the new trellis. When this permuta-
tion algorithm has been used, paths that existed in
the lower-order trellises now .exist in the higher-
order trellises. This insures that distortion
caused by the higher-order trellises will be at
least as low as that caused by the lower-order trel-
lis. However, this is only true in portions of the
full trellis. Due to the initial fan-out, the per-
muted branches may cause increased distortion, since
there are a limited number of paths returning to the
first state.
To demonstrate the preservation of paths
from lower-order trellises when extending to higher-
order, take as an example the extension from K=1 to
K=2 and finally to K=3 as depicted in Figure 3-7.
The K=1 trellis is initialized and the two branch
symbols are refined using training data. The K=1
trellis is then extended to K=2 by using the permu-


47
Figure 3-7 Extending to Higher-Order Trellises


48
tation diagram. The second trellis shows the re-
sult. The replicated branch indices of the K=1
trellis are shown in the upper K=2 trellis and the
branch indices of the new extended trellis are shown
in the lower K=2 trellis. Comparing the upper and
lower trellises, it is clear that the first two op-
erations of the permutation diagram have been ap-
plied: branch 1 went to branch 1 of the new trellis
and 2 went to branch 2. Upon reflecting the
branches, 1 also went to 4 and 2 went to 3. Note
that there are two nodes from which branches 1 and 2
emerge. This is the same as the K=1 trellis. Fur-
thermore, notice that the two branches entering the
same nodes are 1 and 2, just as in the smaller trel-
lis. In effect, the larger trellis contains two
smaller trellises yielding performance that is at
least as good as the smaller trellis (neglecting the
fan-out region). The K=2 branches are now replicated
as shown in the permutation diagram to generate the
K=3 trellis. Again, every node has the same
branches entering and the same branches emerging
from it as in the smaller trellis. This trellis is
also composed of two smaller trellises. The branch
indices for the new K=3 trellis are shown on the
bottom diagram of Figure 3-7.


49
Encoding the Speech
The speech data are encoded exactly as was
done during the trellis encoding required for popu-
lating the trellis. The trellis has been previously
populated with branch symbols and the data have been
divided into blocks. The path most closely corre-
sponding to the current data block as determined by
the search algorithm is used to represent the data.
As is shown in Figure 3-8, the block counter
is set to zero and incremented at the top of the
partition loop. The number of states and branches
are calculated based on the requested constraint
length. The trellis had to be previously trained
and populated with branch symbols, so there is only
a need during encoding to access the derived branch
symbols. Each block of data is provided to the
Viterbi algorithm which locates the best path for
that data block. The path is stored as a mapping
from input data to encoded symbols. This process is
continued until all blocks have been encoded. De-
tails of the Viterbi algorithm are provided in Fig-
ures 3-9 and 3-10.
Decoding the Speech
In an actual application, the digitized
speech must be transmitted over some channel. The


50
m=0
n=qk
N=qk
yi
m=m+l
xj
Viterbi
Algorithm
Pij
m=M?
Figure 3-8 Encoding the Speech


51
f LOCATE
V BEST PATH )
1=0
j=0
8X=0
j=j + l
1 = 1 + 1
i' = l+uqk_1
1 -=[4] +uqkJ
8i'=(xi'_xi'')2
8 j .=8 ^ + 81,
81=min (81J
Figure 3-9
The Viterbi Search Algorithm


52
Select state with
lowest distortion
Pi-u
l=n?
j=L?
l0=index{min(81)}
RETRACE
PATH
T
(^^RETURnJ^
Figure 3-9 (continued)


53
l = lo
j =L+1
j=j-l
i' = l+uql
-M
j=l?
Figure 3-10 Retracing the Path in the
Viterbi Search Algorithm
c-1
+uqk"2


54
compression is obtained by sending the bits repre-
senting the path rather than the actual symbols.
Both transmitter and receiver must have knowledge of
the trellis structure and its branch symbols. This
implies that training must be performed and the re-
sultant symbols provided to both the transmitter and
receiver. Once the path bits are received they are
used to trace out the path that was used in the
transmitter. Both trellises start in the first
state so that the start state is known. From the
first state, each incoming bit dictates whether the
next transition should be along the zero path or the
one path. As each branch is traversed, the symbol
associated with that branch is released. Upon re-
ceipt of all path bits, the encoded symbols have
been completely reconstructed (assuming a noiseless
channel).
Effects of a Noisy Channel
It was not the intent of this thesis to in-
vestigate performance of trellis coding in a noisy
channel. It appears that this scheme is sensitive
to noise. One channel bit error causes transitions
after the error to be relative to the wrong state.
However, the incorrect path that is traced out after
an error may closely resemble the intended path, in


55
fact merging with the correct path from time to time
so that one channel bit error does not imply that
all subsequent symbols will be in error. Also, a
symbol error does not imply that all symbol bits are
in error. A symbol error may actually only be wrong
in one bit. Finally, data are encoded in blocks so
that bit errors are constrained to having an effect
over one block only. Consequently, single channel
bit errors do produce multiple bit errors, but the
errors are not catastrophic.
Ayanaglu performed extensive analysis of
trellis coding in a noisy channel. His results show
graceful degradation of the trellis performance [8].


CHAPTER IV
EXPERIMENT CONFIGURATION
Digitized speech was obtained from the In-
formation Systems Laboratory at Stanford University
via Internet, a communication link between Stanford
and UCD. It consists of a training data set and a
larger sample set. The training set is a 40 second
segment consisting of one male and one female
speaker, whereas the sample set is six speakers of
20 seconds each, three males and three females. The
speakers contained in the training set are different
than the speakers in the sample set to prevent any
direct correlation between the training process and
the speakers that are actually encoded. Speakers
were randomly selected and are reading different
passages out of common literature.
The speech is bandlimited to 3600 Hz and
sampled at 8000 samples/second, 12 bits per sample.
This yields 320,000 samples for each training
speaker and 160,000 samples for each sample speaker.
Each 12-bit sample is stored as a two's complement
16-bit integer, so the data range from -2048 to
2047. Some of the data actually exceed this range,


57
but these samples were set to the extrema before
processing.
Processing the Speech
The digital processing is shown in Figure
4-1. The speech data were received as one training
file and one sample file. The sample file needed to
be divided into individual speakers to facilitate
further processing. After dividing the sample file,
both the training file and the speaker files were
converted to the VAX data format. Since the data
received from Stanford were in the Sun format, upper
and lower bytes of the 16-bit integers had to be
swapped to be compatible with the VAX.
The next step was to generate the branch
symbols. This was accomplished by using the train-
ing data. Once the branch symbols had been gener-
ated, these symbols were used to encode the sample
speakers. The result of encoding was a path that
represented the speaker. This path was converted to
the branch symbols specified by the path and then
truncated to return to a 12-bit representation.
When the speech had been encoded, both
original and reconstructed data were available to
generate performance measures. These measures con-
sisted of signal-to-quantized-noise ratio, time


58
(C Stanford
Disk Sun Microcomputer
Internet
Separate
Speakers
Swap
Bytes
UCD
VAX
Generate
Branch
Symbols
Encode
Waveform
Decode
Waveform
Sample
Data
Generate
Quality
Measures
Reconstructed
Data
Kermit


IBM PC

f
Graphic
Display

Plotter
Floppy disk
Compaq
Transfer
to DAC
Figure 4-1 Digital Processing


59
waveform comparisons, frequency spectrum comparisons
and error frequency spectra.
The original data and encoded data were also
converted to unipolar values ranging from 0 to 4095
in preparation for the 12-bit digital-to-analog con-
version. They were then sent from the VAX to the
IBM PC over a data link using the Kermit file trans-
fer software. Using the IBM PC, the data were
transferred to floppy disk and ported to the Compaq
computer which served as the host for data recon-
struction .
Reconstructing the Speech
i
Software resident on the Compaq read data of
one speaker from disk into RAM in preparation for
the direct memory access data transfer to the digi-
tal-to-analog converter (DAC). As shown in Figure
4-2 which illustrates the analog reconstruction, the
data were converted back to analog using a DAS-16
multiplying DAC manufactured by the Metrabyte Corpo-
ration. The DAC was programmed to run at 8000 con-
versions/second so that data were clocked out at the
same rate as the original sample rate. The data
were then passed through an active lowpass filter
having a 3-dB cutoff frequency of 3600 Hz to match
the original processing filter and to reject high-


60
Figure 4-2 Analog Reconstruction
frequency images induced by the analog reconstruc-
tion process. The filter circuit diagram and asso-
ciated parameters are shown in Figure 4-3 and the
pass band characteristics are plotted in Figure 4-4.
The final step was to pass the data to either an
analog reel-to-reel tape recorder for archiving, or
send it directly to a speaker for audio presenta-
tion.


61
22pF
Figure 4-3 Active Lowpass Filter. Source: Linear
Applications. vol. 1, National
Semiconductor Corp., 1973,
pp. AN5-9 to AN5-10.
Frequency (KHz)
Figure 4-4 Gain of Lowpass Filter


CHAPTER V
PERFORMANCE ANALYSIS
Signal to quantized noise ratio (SQNR) is a
measure of error in the encoded waveform. It is
defined as the signal power divided by the noise
power
SQNR
10 log.
£i Xi2
10
£.(Xi- x- )!
(5-1)
where x is the original waveform and x' is the en-
coded waveform. This quantity was used to evaluate
the relative performance of each of the different
trellis encoding schemes. The theoretical bound for
encoding a memoryless Gaussian source using mean
square error and 1 bit/symbol is 14.4 dB [9]. This
bound can be used as a benchmark for comparison of
the actual performance, since the memoryless Gauss-
ian source lower-bounds all other sources.
Encoding SQNR
Three trellis encoding schemes were evalu-
ated. The first two are standard trellises which


63
start in state 1, fan out to a full trellis and then
begin to repeat. The only difference is that the
first case was populated using the simple branch
replication technique, whereas the second case was
done using the more complicated replication tech-
nique which preserves paths through the trellis.
The third case was the full trellis where there is
no initial fan-out, but all states are available
from the start. This full trellis was populated
using the simpler replication technique.
The resulting SQNR for each of the cases is
shown in Tables 5-1 through 5-3. The cases are
plotted in Figures 5-1 to 5-3. The rows represent
constant constraint length and the columns are trel-
lis length. Constraint-length-one trellises in all
cases are flat at 2.6 dB. in a constraint-length-
one trellis, there is only one state and two paths,
so changing the length of the trellis has no effect
on its ability to encode data.
In all cases, performance continuously im-
proves from constraint lengths K=1 to K=8 for trel-
lis lengths L=7 and 8. This is as expected. In the
first case (Figure 5-1), the other trellis lengths
show the effect of replicating the trellis branches
without any concern for preserving paths from the
previous trellis. Often, the trellis extension re-


64
Table 5-1 SQNE for the Standard Trellis
Using Simple Extension
Trellis Depth
2 4 8 16 32 64 128 256
c 1 2.63 2.63 2.63 2.63 2.63 2.63 2.63 2.63
o n s 2 2.81 4.39 4.84 5.11 5.26 3.30 3.33 3.37
t r 3 2.82 4.59 5.29 5.76 6.12 6.02 6.22 6.37
a i n 4 2.82 1.67 3.52 5.08 6.21 6.76 7.13 7.34
t 5 2.83 2.12 2.98 5.13 6.55 7.13 7.63 7.89
L e n 6 2.83 2.13 O'. 59 4 85 4.81 6.45 7.59 8.45
g t 7 2.84 2.13 1.61 4.38 6.33 6.92 7.73 8.55
h 8 3.69 2.13 1.08 3.80 6.15 7.69 7.72 8.83
suits in worse performance with the more complex
trellis, since previously existing paths are de-
stroyed and refining the trellis must be re-initi-
ated. In fact, the refinement does not reach the
performance of the previous trellis before the mini-
mum rate of improvement is reached. This effect is
magnified in the shorter trellises where the fan-out
region occupies a larger percentage of the overall
trellis and thus becomes a dominant factor in deter-
mining branch symbols. The fan-out region affects
several samples after the full trellis has been


65
Table 5-2 SQNR for the Standard Trellis
Using Path-Preserving Extension
Trellis Depth
2 4 8 16 32 64 128 256
c 1 2.63 2.63 2.63 2.63 2.63 2.63 2.63 2.63
o n s 2 2.81 2.89 2.95 2.63 2.63 2.63 2.63 3.15
t r 3 2.82 4.14 4.41 4.69 4.87 4.75 4.68 3.15
a i n 4 2.82 1 19 2.89 4.20 5.09 5.78 5.30 5.20
t 5 2.83 2.13 2.22 4.05 5.30 5.77 7.24 6.08
L e n 6 2.83 2.13 2.18 3.69 5.24 7.19 7.31 7.44
g t 7 2.84 2.13 1.40 3.92 6.00 7.22 7.77 7.70
h 8 3.69 2.13 0.85 3.47 5.72 7.46 7.95 9.00
reached, because paths leaving the fan-out region
may continue along less desirable paths which are
required to minimize the error in the fan-out re-
gion, but are not optimal in the full trellis re-
gion. Longer trellises begin to suppress this ef-
fect as the fan-out region becomes small with re-
spect to the trellis. Notice that for constant con-
straint lengths and trellises longer than the fan-
out region, there is a continual increase in per-
formance as a function of trellis length, with the
exception of the two smallest constraint lengths.


66
Table 5-3 SQNR for the Full Trellis
Using Simple Extension
Trellis Depth
2 4 8 16 32 64 128 256
1 2.62 2.62 2.62 2.62 2.62 2.62 2.62 2.63
c
o 2 5.58 2.83 3.44 2.83 3.26 3.40 3.40 3.40
n s t 3 7.92 6.04 3.57 3.29 6.67 6.62 6.58 6.59
r
a i 4 11.83 8.88 8.49 7.24 8.06 8.00 7.73 7.93
n
t 5 12.86 9.62 9.95 7.72 8.97 8.84 8.69 8.75
L 6 17.52 13.23 11.55 9.14 10.12 9.94 9.47 10.00
e n g 7 20.35 13.89 12.63 10.31 10.88 10.65 10.45 10.77
t h 8 22.84 16.70 13.65 11.54 11.36 10.98 10.70 11.25
Surprisingly, the trellis extended using the
path-preserving replication technique (Figure 5-2)
is even more affected by the fan-out region, so that
only marginal performance is realized over the sim-
ple replication technique. All paths are preserved
only when all states in the trellis are available.
Since the initial paths emerge from the first state,
not all states are available until the full trellis
is reached at L=K-1. Unlike the simple replication
technique where previous branches are left in place,
the permutations associated with the path-preserving


CO <3 2 PS (T3 PQ) CO 67
K=8
K=7
K=6
K=5
K=4
K=3
K=2
K=1
Figure 5-1 SQNR for the Standard Trellis
Using Simple Extension
K=8
K=7
K=6
K=5
K=4
K=3
K=2
K=1
Figure 5-2 SQNR for the Standard Trellis
Using Path-Preserving Extension


W Of 2 02 (T3 m)
68
K=8
K=7
K=6
K=5
K=4
K=3
K=2
K= 1
2 3 4 5 6 7
Log2 of Trellis Depth
Figure 5-3 SQNR for the Full Trellis
Using Simple Extension
l
B


69
replication technique may move branch symbols into
the initial fan-out paths that produce more degrada-
tion than those that were in the previous trellis.
Performance does still improve as a function of
trellis depth with constraint lengths greater than
K=5. For the large constraint lengths, the longer
trellises provide significant increases in perform-
ance. This implies the value of making the trellis
even longer. Ultimately, larger constraint lengths
and longer trellis lengths yield the best perform-
ance .
For the full trellis, initial fan-out is
eliminated and side information is transmitted to
the decoder to enable proper first-state initializa-
tion. This side information consists of enough bits
to specify all of the states in the trellis. Conse-
quently, the number of bits is K-l. For the largest
trellis in this study, this amounts to 7 bits.
These extra bits have the effect of lowering the
compression ratio and increasing the complexity of
the decoder. However, the performance is signifi-
cantly increased. SQNR for the largest constraint
length and longest trellis is 11.25 dB. This is
2.42 dB higher than the standard trellis. Perform-
ance generally increases with increasing constraint


70
length and decreasing trellis depth with the excep-
tion of K=3, L=8 and 16.
The effective transmit rate due to the side
information for K=8 is increased to 8219 bps from
the original 8000 bps. The performance of the
shorter-length trellises comes at higher and higher
cost as the number of bits required to represent the
initial state constitutes more of the total transmit
packet. The transmit rate is almost doubled (15,000
bps) at trellis length L=8. At L=2, the effective
rate is 36,000 bps. So, high-fidelity signals are
obtained, but the rate has been increased suffi-
ciently that other more effective schemes such as
multiple-level (non-binary branch transitions) trel-
lises can be used in place of the shorter trellises.
The side information required for the full
trellis can be implemented using the standard trel-
lis. The initial path metric for each path in the
fan-out region can be set to zero, so that the fan-
out region does not affect the choice of the best
path. Incoming data are only compared to paths be-
yond the fan-out region. Once the best path has
been selected, bits corresponding to the path from
the first state to the chosen state through the fan-
out region must be multiplexed into the beginning of
the data stream. When these data are received, the


71
first few bits specify the artificial path through
! the fan-out region, and then operation is as with
/
X the full trellis.
The distortion, as a function of iterations
in the refinement/extension portion of the algorithm
is shown in Figure 5-4 to 5-6 for each of the trel-
lis cases. These are for the K=8, L=256 trellises.
Note that the full trellis continually decreases,
whereas the partial trellises increase at each ex-
tension point. In shorter trellises, this jump is
even more distinct and the distortion does not reach
the previous level before being extended again.
Since the path-preserving replication does not per-
form much better than the simple replication, no
more results will be presented.
Waveform Plots
A good visual demonstration of the perform-
ance of trellis waveform coding is provided by over-
laying the original and encoded waveforms. The ef-
fectiveness of the encoding can be determined by how
closely the encoded data follow the original
waveform. Several records of each file were se-
lected for plotting. One record contained some
rapid variations followed by a low-amplitude tail.
Another record contained very smooth, repeating


72
xlO4
Figure 5-4 Convergence for the Standard Trellis
Using Simple Extension
xlO*
Figure 5-5 Convergence for the Standard Trellis
Using Path-Preserving Extension


73
xlO4
Figure 5-6 Convergence for the Full Trellis
Using Simple Extension
160


74
waveforms that each got slightly lower in amplitude.
These two records were used to analyze the waveform-
following capabilities and shortfalls of trellis
coding. Generally, the technique had more diffi-
culty with the first record. Plots of this record
encoded by the standard trellis and the full trellis
are shown in Figures 5-7 through 5-36. The solid
line indicates the original waveform, the dashed
line represents the encoded waveform.
Some of the plots exhibit near-triangular
oscillations. This is due to the construction of
the trellis. At each node in the trellis, one of
two branches provides the transition to the next
state. This transition releases a branch symbol
used to encode the original waveform. After tran-
sitioning to a new node, the two branches emerging
from the new node are not necessarily the same two
branches as at the previous node. Consequently, the
transition produces a different branch symbol and
transitions to the next node. Depending on the
path, the new node could possibly be the same as the
first, so that the pattern can be repeated. A trel-
lis contains several different-length paths so that
oscillations may be anywhere from 1 sample/cycle
(constant) to the constraint length/cycle. The con-


(D ClC c+h-i-0 3 > ffi aC 3 >
75
Figure 5-7 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=256)
Figure 5-8 Original and Encoded Waveforms Using
Full Trellis (K=8, L=256)


o ac c+H-i-i'a 3> 3h'W ac ctH-h-'t? 3 >
76
Figure 5-9 Original and Encoded Waveforms Using
Standard Trellis (K=7, L=256)
Figure 5-10 Original and Encoded Waveforms Using
Full Trellis (K=7, L=256)


oaC(+H'Ht)3> , , D M-dd (D aC c+H-H-it) 3 >
77
Figure 5-11 Original and Encoded Waveforms Using
Standard Trellis (K=6, L=256)
Figure 5-12 Original and Encoded Waveforms Using
Full Trellis (K=6, L=256)


ffi ac d-H' i-'d 3 > 3H-W oaC(+H< -d 3 >
78
Figure 5-13 Original and Encoded Waveforms Using
Standard Trellis (K=5, L=256)
Figure 5-14 Original and Encoded Waveforms Using
Full Trellis (K=5, L=256)


ac c+ Hh-'t) 3 > 3 H'W dC c+H-h^tJ 3 >
79
Figure 5-15 Original and Encoded Waveforms Using
Standard Trellis (K=4, L=256)
Figure 5-16 Original and Encoded Waveforms Using
Full Trellis (K=4, L=256)


aCctH'Hl0 3> D H-Od OuC c+H-1-**0 3 >
80
Figure 5-17 Original and Encoded Waveforms Using
Standard Trellis (K=3, L=256)
Figure 5-18 Original and Encoded Waveforms Using
Full Trellis (K=3, L=256)


CD ac 3 > DH-Woac c+p-t-O 3 >
81
Figure 5-19 Original and Encoded Waveforms Using
Standard Trellis (K=2, L=256)
Figure 5-20 Original and Encoded Wavaeforms Using
Full Trellis (K=2, L=256)


(D ac ctH- (-O 3 > 3H-W(DaC ctH-i'd 3 >
82
Figure 5-21 Original and Encoded Waveforms Using
Standard Trellis (K=l, L=256)
0 50 100 150 200 250
Sample Number
Figure 5-22 Original and Encoded Waveforms Using
Full Trellis (K=l, L=256)


CD CLC c+ h> i'*d 3 > 3H-Wac c+H-i-'t} 3 >
83
Figure 5-23 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=128)
Figure 5-24 Original and Encoded Waveforms Using
Full Trellis (K=8, L=128)


(0 ac d-H-HO g > 3 H-W (D o.C <+ H(-'O 3 >
84
i
Figure 5-25 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=64)
Figure 5-26 Original and Encoded Waveforms Using
Full Trellis (K=8, L=64)


aCrtH.Hiog> d h.to (d ac c+g >
85
Figure 5-27 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=32)
Figure 5-28 Original and Encoded Waveforms Using
Full Trellis (K=8, L=32)


0&CrtP'H'03> D M-W aCftH'Hl03>
86
Figure 5-29 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=16)
Figure 5-30 Original and Encoded Wavaeforms Using
Full Trellis (K=8, L=16)


ac c+ h* i**o 3> 3H-W ac c+H-h-'o 3 >
87
Figure 5-31 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=8)
Figure 5-32 Original and Encoded Waveforms Using
Full Trellis (K=8, L=8)


3P'Cd ac c+H-p-i'd 3 >
88
Figure 5-33 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=4)


(D CL C C+ H- I'O 3 > p H-tjJ (D a C f+H-M'd 3 >
89
Figure 5-35 Original and Encoded Waveforms Using
Standard Trellis (K=8, L=2)
Figure 5-36 Original and Encoded Waveforms Using
Full Trellis (K=8, L=2)


90
stant path is used to generate the apparent square
wave at the top and bottom of several of the plots.
These oscillations are one of the sources of
noise heard in the reconstructed speech. Each os-
cillation corresponds to a different tone that is
presented to the listener. These discrete tones are
in addition to the information that is lost due to
the oscillations not accurately following the
waveform. In the constraint length K=1 case, there
are no oscillations other than the square wave it-
self. Consequently, the annoying discrete tones are
not audible in the reconstructed speech.
The partial and full trellises produce very
similar results for the maximum-length (L=256) trel-
lises (Figures 5-7 to 5-22). However, performance
diverges with decreasing trellis length as seen in
Figures 5-23 to 5-36. Partial trellis performance
decreases while full trellis performance increases
for the reasons stated earlier.
The reason for the spikes in Figure 5-31 is
that the short trellis is comprised mostly of the
fan-out region which has small values to minimize
distortion on merging paths. As a result, the first
few sample encodings are small and then change rap-
idly once out of the fan-out region.


91
A summary of the SQNR for each of the
waveform plots is provided in Tables 5-4 and 5-5.
These values represent SQNR for 250 samples, whereas
Tables 5-1 to 5-3 provided SQNR for the entire en-
coded file (160,000 samples).
Table 5-4 Summary of Waveform Plot SQNR
Using Standard Trellis
Trellis Depth
2 4 8 16 32 64 128 256
C
0
n
s
t
r
a
1
n
t
L
e
n
g
t
h
1
2
3
4
5
6
7
8
0.00 1.92 1.33 6.26 7.63 6.99 5.62 8
22
12
24
45
29
10
82
77


yc+oq p a> r
92
Table 5-5 Summary of Waveform Plot SQNR
Using Full Trellis
C
0
n
s
t
r
a
1
n
t
Trellis Depth
8 16 32 64
128 256
22.45 15.67 12.12 10.60 10.05 9.42 9.46