QUALITY OF COMPRESSED SPEECH USING
TRELLIS WAVEFORM CODING
by
Michael Glenn McDonald
B.S., Brigham Young University, 1982
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Electrical Engineering
and Computer Science
1989
This thesis for the Master of Science degree by
Michael Glenn McDonald
has been approved for the
Department of
Electrical Engineering and Computer Science
by
Date
McDonald, Michael Glenn (M.S., Electrical
Engineering)
Quality of Compressed Speech Using Trellis Waveform
Coding
Thesis directed by Professor Joe E. Thomas
Dynamic programming has recently been used to achieve speech compression performance greater than that achieved by techniques such as delta modulation (DM) and differential pulse code modulation (DPCM). DM and DPCM perform operations based on current and previous symbols only. Multipath search techniques employing dynamic programming consider both previous and future symbols before producing a (delayed) decision.

Multipath search coding techniques such as trellis waveform coding have been used to produce high-fidelity representations of speech signals at the rate of 1 bit/symbol. Speech sampled at 8000 samples/second produces a transmission rate of 8000 bits/second. This rate is at the lower end of waveform coders and approaches the rates of voice vocoders. The advantage of waveform coding over voice vocoders is that it preserves speaker recognition as well as intelligibility.
This thesis quantifies the speech quality associated with trellis waveform coding. This is done by computer simulation and subjective evaluation. Simulation results are presented for a binary channel in the form of signal-to-quantized-noise ratio, waveform comparisons, frequency spectrum comparisons, and error spectra for different trellis complexities. The subjective evaluation consists of scores from subjective listeners evaluating several reconstructed speech segments processed by trellises of varying complexities.
Trellis waveform coding is shown to provide good intelligibility and speaker recognition with moderately complex trellises. However, reconstructed speech contains noticeable background noise, indicating a need for enhancements to the basic technique if toll-quality speech is desired.
The form and content of this abstract are approved.
I recommend its publication.
Signed
Joe E. Thomas
ACKNOWLEDGMENTS
I would like to thank my advisor and commit
tee chairman, Joe Thomas, for his helpful sugges
tions and approval of funds to support this work.
Also, Douglas Ross, Jay Rothman and Marvin Anderson
for reading this thesis and serving on my committee.
I appreciate John Clark's assistance in selecting a
topic. Daniel Michaels provided valuable advice re
garding implementation of the experiment.
I would also like to thank Computing Services for letting me use their equipment and for directly assisting in numerous capacities. The Electronics Maintenance and Calibration Laboratory personnel provided the equipment necessary to perform the experiment. I appreciate the willingness of the Stanford Information Systems Laboratory to provide the speech used in this thesis. Their cooperation provided time to emphasize analysis rather than speech source generation.
My work associates at Martin Marietta provided valuable assistance. Mike Meno helped with the digital and analog designs for the experiment. Steve Montague provided useful insight into the replication algorithm. Dave Cattani, Tom Clebecek, Eon Havermann and Mike Martinez provided software. The Real-Time Distribution System Laboratory personnel made a graphics display available. I thank the 24 subjective listeners, both work associates and friends, for their careful evaluations. I also appreciate the flexibility at work provided by my understanding and supportive supervisors, Vern Dorrell and Brian Gallagher.
Finally, I thank my family for their constant support and encouragement. Specifically, my parents, Glenn and Carol McDonald, for teaching me the value of education and providing me opportunities to obtain it. I express sincere thanks to my wife and children, Melanie, Marie and Jordan, for their patience and help in pursuing this degree.
CONTENTS
CHAPTER
I. INTRODUCTION................................ 1
Data Compression......................... 3
Code Books............................... 5
Tree Codes.............................. 10
Finite State Machines................... 15
State Diagrams.......................... 16
Trellis Codes........................... 17
Objective and Approach.................. 20
II. ALGORITHMS................................. 22
Viterbi Search Algorithm................ 23
Generalized Lloyd Algorithm............. 27
III. TRELLIS WAVEFORM CODING.................... 29
Implementing the Trellis................ 31
Trellis Encoding........................ 33
Populating the Trellis.................. 35
Extending the Trellis.................. 42
Encoding the Speech..................... 49
Decoding the Speech..................... 49
Effects of a Noisy Channel.............. 54
IV. EXPERIMENT CONFIGURATION................... 56
Processing the Speech................... 57
Reconstructing the Speech................ 59
V. PERFORMANCE ANALYSIS....................... 62
Encoding SQNR............................ 62
Waveform Plots........................... 71
Spectrum Plots........................... 93
Subjective Evaluation................... 124
VI. SUMMARY AND TOPICS FOR FURTHER STUDY.... 138
BIBLIOGRAPHY............................... 142
CHAPTER I
INTRODUCTION
This paper addresses the quality of compressed speech associated with trellis waveform coding. The quality is analyzed using computer simulation and subjective evaluation. The first chapter provides motivation for studying this data compression technique and introduces the concepts of trellis waveform coding with the associated terminology.
Today is the age of information. We have an unprecedented amount of information available to us in a variety of forms. An area that has made rapid progress is that of electronic media. Messages can be sent electronically between computer system users, providing an extremely convenient means of conveying small pieces of information. Some computer systems also support conversational modes where users can be involved in interactive conferences. Each party has access to an input device such as a keyboard and an output device such as a display terminal. Conversations are initiated by typing on the keyboard and all responses are displayed simultaneously on all terminals. This is a significant capability considering the proliferation of computer networking, telephone modem dial-in availability, and intercontinental communications.
More recently, voice messaging has been added to home telephones. Arriving calls are accepted by a computer when the resident is either on the telephone or unavailable. The computer reproduces a short message from the resident and then initiates a listening mode. This mode digitizes the caller's message and stores it in memory for future reproduction. Voice messaging for businesses supports message broadcasting to multiple parties and retransmittal of received messages to other interested users.
Education has also been greatly enhanced by electronic media. Dictionaries, thesauri, encyclopedias, literature, card catalogs, and canonical works have all been made available on computer. Having electronic media available provides quick and easy access to all of these valuable resources. The computer can aid in locating information efficiently by performing automated searches and sorts as directed.
A computer that has recently been introduced into the marketplace offers storage formats which include text, graphics, voice and music. It supports array processing, image processing, encryption, and facsimile transmission. These are the essential elements to provide an integrated electronic media workstation. In the future, this single workstation might support video telephone, television programming, educational training, document transmittal and data transfer.
There are few technological advances required for this vision to become a reality. The overriding concern is the amount of data that must be transferred from place to place and the amount of storage that is required once it gets there. This problem is being somewhat alleviated by increasing the storage capability of various devices and by ever striving to build higher-capacity communication links.
Data Compression
One element which aids the process of conveying information is data compression. Data compression consists of reducing the data volume while preserving the information content. There are fundamentally two categories of data compression: redundancy reduction and entropy reduction [1].

Redundancy reduction reduces data in such a way as to preserve all information. A common example is to represent frequently-occurring symbols with a small amount of data and rarely-occurring symbols with a larger amount, thus achieving less data on the average. This process is termed lossless in that the original data can be completely reconstructed from the reduced data set. There exists a one-to-one mapping between the original data and the compressed data so that no information is lost in the process. Compression is obtained by removing the redundancy inherent in the data.
Entropy reduction consists of transforming the data in such a way that the transformed data have lower entropy than the original data. This technique usually consists of some form of quantization and is most often performed in the digital domain. One example is to represent the data by their Fourier coefficients. These coefficients are generated by applying the Fast Fourier Transform and represent the amount of energy contained in each frequency bin. For good compression, the compressed data energy must be concentrated in a limited number of frequency bins. Bins containing insignificant amounts of energy can be omitted and the data can be represented by storing only the Fourier coefficients for each of the remaining bins. Clearly, this is a lossy technique since some of the data information has been discarded. The data will never be able to be perfectly reconstructed, but will be a close approximation of the original.
Another entropy reduction technique involving quantization is multipath search coding. Several data sets or vectors are available for encoding the data. A block of data is compared to each vector, one block at a time, and the vector which most closely matches the data is used as the encoding. The data block is quantized, since there are fewer vectors than possible data sequences. These vectors often take the form of a path through a certain structure, so that the vector comparison is termed a search. Due to the multiple vectors or paths available for encoding, this technique is referred to as multipath search coding.
Code Books
Multipath search coding can be implemented using a code book, a tree code or a trellis code. A code book is simply a set of different vectors, each vector containing symbols which are representative of typical data sequences. The data input is broken into blocks and compared to every vector in the code book. The data block is then represented by the index of the vector which most closely matches it. This representation is called an encoding, since the original data are now represented by a different set of data. Compression is achieved by virtue of fewer bits being required to represent the number of vectors in the code book than bits required to represent all possible data sequences. The data are quantized in that they are now represented by the symbols in the code book vector, which are only approximately the same as the data. How well the code book vector matches the original data determines the quantization distortion. When designing the code book, candidate vectors are chosen to match typical sequences based on some error criterion. This error criterion is selected to minimize distortions which are most harmful to the reconstruction process.
A graphical representation of a code book (often called a vector quantizer [2]) is provided in Figure 1-1. It consists of four states represented by the vertical dots and four nodes represented by the horizontal dots. Nodes of the same state are connected, yielding three branches in each state. In a code book, each branch is assigned a value representing a typical data symbol as shown.

Figure 1-1 Code Book Quantizer

Suppose the first block of data is 00 01 10, consisting of three two-bit symbols. This block of data is compared to all of the vectors and is found to perfectly match the first one. The first block of data is therefore represented by 00, corresponding to the index of the first vector. This has resulted in a compression ratio of 3:1 since six bits have been represented by two bits.

Now suppose the second block of data is 00 01 11. If the error criterion is the distance measure, or number of bits that differ, then the symbols of vector 00 differ from the data block by one, vector 01 differs by two, 10 differs by three, and 11 differs by three. Consequently, the second block is also represented by the first vector, the vector having the minimum distance, with a resulting distortion due to quantization of one bit error in six.
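The code book search just described can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: vector 00 holds the symbols 00 01 10 from the text, while the other three vectors are assumptions chosen so that the distances to the block 00 01 11 match the example (one, two, three, and three bits).

```python
# Code book encoding by minimum bit distance (a sketch; only vector "00"
# comes from the text, the other vectors are illustrative assumptions).
CODE_BOOK = {
    "00": "000110",  # symbols 00 01 10, as in the text
    "01": "010110",  # hypothetical
    "10": "110110",  # hypothetical
    "11": "111111",  # hypothetical
}

def distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def encode_block(block: str) -> str:
    """Return the index of the code book vector closest to the data block."""
    return min(CODE_BOOK, key=lambda idx: distance(CODE_BOOK[idx], block))

print(encode_block("000110"))  # first block: perfect match -> "00"
print(encode_block("000111"))  # second block: minimum distance of 1 -> "00"
```

Both example blocks encode to index 00, reproducing the 3:1 compression with at most one bit of quantization error, as in the text.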
Producing more vectors in the code book provides more possible vector combinations, but also increases the number of bits required to represent the vectors. A code book having a k-bit index can represent 2^k different data sequences. However, the number of possible data sequences is q^L, where q = 2^r, r is the number of bits per symbol, and L is the number of symbols in a block (or elements in a vector). In the example, the number of sequences that can be perfectly represented is four and the number of possible data sequences is 64.
Further analysis of the encoding shows that the 64 possible data sequences would be divided into groups of 16 in four partitions corresponding to the four code book vectors. The 16 six-bit vectors would be represented by effectively four bits, the other two bits being used to specify the partition. Analyzing possible differences of a four-bit pattern from any other four-bit pattern is equivalent to distances from the all-zero pattern, where distance is defined as the number of bits that differ. One pattern has a distance of zero, four have a distance of one, six have two, four have three, and one pattern has a distance of four. With this perspective, and remembering that there are four partitions, the total result is that four encodings will have no bit errors, 16 encodings will have one error, 24 will have two errors, 16 will have three, and four encodings will have four bit errors. In summary, 60 vectors will have some degree of distortion, ranging from one bit error to four bit errors.
This scheme may appear useless, but there are some applications in which the least significant symbol bit is unimportant. The code book can be chosen in such a way that errors are applied to the least significant bit when possible. This approach would yield a worst case situation of only one bit error in the most significant bit given that there are four errors in six bits. Furthermore, as evidenced by the error tally, this occurs in only four out of the 64 cases (as does the all-correct case), which is relatively infrequent. The most common error will be two least significant bit errors.
Another aspect of data that aids the code book or vector quantizer is that some data sequences may occur less frequently than others. For example, the all-zero or all-one case may never occur. The code book can be set up to take advantage of this by ensuring that these two sequences and any other infrequently occurring sequences are evenly divided between the possible partitions. This reduces the actual number of sequences that must be encoded by any one vector, consequently reducing the average distortion (error). In practice, systems that can tolerate some distortion have been very successful in compressing data with acceptable degradation by using code book encoding techniques.
In summary, multipath search is accomplished using a code book by successively calculating the distortion between the original data and the code book vectors (or paths). Once the minimum-distortion path has been located, the original data are encoded as the path index.
Tree Codes
The second form of multipath search coding is tree codes [3]. A tree diagram is shown in Figure 1-2. Its name comes from the fact that there is an initial state from which branches emerge, and each of these branches enters another state from which more branches emerge. This continues indefinitely, similar to the progression from the trunk of a tree to its branches and finally to the small twigs.

Figure 1-2 Tree Branches
A tree is simply a representation of an infinite state machine. Each node represents a different state and each branch is a transition from one state to the next. The tree illustrates all possible transitions. In this example, there are only two branches emerging from each state, although there could be several. Since there are only two in this case, it is termed a binary tree. The upper transition shown by the thicker line is due to a 0 and the lower transition is due to a 1. Furthermore, each branch has a symbol which is associated with the transition. As a result, data sequences can be represented by the different paths through the tree.
As an example, the path 010 would start at the leftmost node, transition along the upper path as specified by the first 0, move along the lower path as indicated by the 1, and then take the upper path due to the last 0. This would result in a data sequence of 01 10 00, which is just the symbols along the path. An original data sequence of 01 10 00 would then result in an encoding of 010, corresponding to the path specification. This results in a 2:1 compression ratio since six bits are encoded to three bits.
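The path walk above can be sketched by storing one branch symbol per path prefix. Only the three symbols on path 010 (01, 10, 00) come from the text; the remaining branch symbols are hypothetical, since the tree figure is not reproduced here.

```python
# Decoding a path through a binary tree code (a sketch). Each path prefix
# maps to the symbol on its last branch. Only the symbols along "010" are
# from the text; all other entries are illustrative assumptions.
BRANCH_SYMBOLS = {
    "0": "01", "1": "11",                      # level 1
    "00": "00", "01": "10",                    # level 2
    "10": "01", "11": "00",
    "000": "10", "001": "11",                  # level 3
    "010": "00", "011": "01",
    "100": "11", "101": "00",
    "110": "10", "111": "01",
}

def decode_path(path: str) -> str:
    """Concatenate the branch symbols along a path from the root."""
    return " ".join(BRANCH_SYMBOLS[path[: i + 1]] for i in range(len(path)))

print(decode_path("010"))  # -> "01 10 00", matching the example in the text
```

Encoding runs the other way: every three-bit path is decoded, compared to the data block, and the path with minimum distortion becomes the three-bit encoding.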
A tree contains p^L different paths to represent q^L possible combinations, where p = 2^b, b is the number of bits for representing paths emerging from a state, q = 2^r, r is the number of bits per symbol, and L is the length of the tree. In the example, there are eight different paths to represent 64 sequences. During encoding, possible sequences will be separated into eight partitions corresponding to each path, so that any eight sequences will be represented by one tree path or vector. It follows that these sequences will differ in at most three bits (the other three bits were used to specify the path), so that the maximum distance will be three bits, resulting in a maximum of three bit errors out of six. Since there are at most three bit errors, these errors can all be contained in the least significant bits. Specifically, there will be one zero error, three single errors, three double errors, and one triple error in each partition. Combining all partitions results in eight perfect representations, 24 single and double error sequences, and eight triple error sequences.
It appears from this example that the tree has far better performance than the code book, but the compression ratios must be the same for a fair comparison. A code book with a 2:1 compression ratio might consist of two states and three nodes (two branches). Now, 16 sequences are possible and there are two code book vectors, so that eight sequences will be encoded by each vector. One of the vector bits is taken up in specifying the path, leaving three bits to represent the eight sequences in each partition. This is identical to the case of the tree, where six bits were available in each tree path but three were occupied by the path. In both cases, this leaves three bits to specify eight sequences in each partition. Performance is identical.

In the general case, the number of possible sequences and available sequences is shown in Table 1-1 along with the corresponding compression ratios.
Table 1-1 Comparison of Code Book and Tree Performance

             Possible     Available    Compression
             Sequences    Sequences    Ratio
  Code Book  2^rL         2^k          rL:k
  Tree       2^rL         2^bL         r:b
For equal compression ratios, b = k/Lc, where b is the number of bits used to specify the tree branches at each node, k is the number of code book index bits, and Lc is the code book length. For the same number of available sequences, b = k/Lt, where Lt is the length of the tree. This implies that Lc = Lt. This seems reasonable, but for the same number of available sequences, k = bLt, so that there must be 2^k vectors in the code book. The smallest value for b is the binary case, b = 1, and a practical Lt is 256, so that the number of vectors is 2^256! It is now clear that tree encoding ultimately outperforms code book coding.
The previous example emphasizes the fact that compression ratio and distortion are two fundamental parameters in analyzing the effectiveness of various data compression schemes. Any data are easily compressed to any desired extent provided there is no distortion requirement. One extreme is to take all of the original data and discard them. The compression ratio is infinite, but so is the distortion. There is an entire discipline termed rate-distortion theory which derives theoretical bounds for the minimum distortion that can be achieved with a given compression ratio and vice versa [4].
Finite State Machines
The previous tree code was generated with an infinite state machine. Finite state machines also produce tree codes. In this case, the tree branches will begin to repeat when the state machine recycles. For example, if the state machine is a two-stage binary shift register with new bits shifting in from the right as shown in Figure 1-3, there are only four possible states. (A nonbinary shift register would shift symbols rather than single bits into each stage.) Starting in the 00 state, an input sequence of 111 changes the contents of the shift register from 00 to 01, from 01 to 11, and finally from 11 to 11, which is a previously entered state. This cyclic property is true for all initial states and possible input sequences.

Figure 1-3 Finite State Machine Implemented as a Two-Stage Binary Shift Register
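The shift-register state machine above can be sketched directly; this is a minimal illustration of the transition rule, not code from the thesis.

```python
# Two-stage binary shift register: new bits enter from the right and the
# oldest bit falls off the left, giving four possible states.

def next_state(state: str, bit: str) -> str:
    """Shift one bit into a two-stage register from the right."""
    return state[1] + bit

# Starting in state 00, the input sequence 111 visits 01, 11, and then 11
# again, illustrating the cyclic property noted in the text.
state = "00"
for bit in "111":
    state = next_state(state, bit)
    print(state)  # prints 01, 11, 11
```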
State Diagrams
The state diagram of Figure 1-4 is a graphical representation of the two-bit shift register. The bubbles represent states and the arrows represent transitions due to either an incoming 1 or 0. The thick lines indicate a 0 transition and the thin lines indicate a 1.

Figure 1-4 State Diagram of a Two-Stage Binary Shift Register

Assuming that the shift register is initialized to the 00 state, a 0 shifted in from the right will not change the contents of the shift register. This is depicted by the upper loop in the state diagram showing the 00 state transitioning back into the 00 state. On the other hand, if a 1 is shifted in from the right, the contents of the register change to 01. This is shown by the arrow going from state 00 to state 01, and so on.

The symbols along the paths represent the symbols that are associated with that particular transition. A transition from state 00 to state 01 produces the symbol 10. The same is true for each path.
Trellis Codes
Taking advantage of the fact that the tree branches begin to repeat for finite state machines, the repeating branches can be connected into the same previous states. This structure results in a trellis. The trellis of Figure 1-5 is identical to the tree of Figure 1-2 except that the branches are merged when they begin to repeat. Nodes in each column represent states and each column represents a different transition. The branches of the trellis are simply the state transitions, so that branch symbols are identical for the same state-to-state transitions.

Figure 1-5 Trellis Structure

Comparing Figures 1-4 and 1-5, note that the transition from the 00 state to the 00 state is caused by an incoming 0 (as indicated by the thick line) and results in the symbol 01. Likewise, the second transition, due to a 1, is from 00 to 01 and results in the symbol 10. Figures 1-4 and 1-5 are perfectly analogous.
By adding one more stage to the shift register, the contents of the full register can be used to label the branches. The new shift register is as shown in Figure 1-6. Each register stage contains a bit that has been shifted in from the right. (In the nonbinary case, each symbol is more than one bit.) The resultant K-stage register now contains K-1 state index bits as before, and K branch index bits corresponding to the 2^K branches. K is called the constraint length, since it is the number of transitions over which any incoming bit affects the contents of the register.

Figure 1-6 Binary Shift Register Including Extra Branch Index Bit
The trellis diagram has 2^(K-1) states, 2^K branches, and the same number of paths (p^L = 2^(bL)) as the tree diagram, but is finite as opposed to the tree. A larger constraint length results in more states and branches, hence more paths. Constraint length is an important parameter directly affecting trellis performance. During encoding, more branches result in finer quantization of the input data. Longer trellises increase performance by reducing the effect of the initial start-state constraint. Paths through the trellis have many common elements, so that individual symbols may be used many times. A trellis thus eliminates the complexity of the infinite tree while providing more paths than the code book.
A trellis works much like the tree. A data sequence is compared to all possible paths through the trellis. The data are then encoded as the path through the trellis. Just like the tree, six bits of data are represented by three bits of path. Performance is identical to that of the finite tree. In summary, a trellis provides effective quantization due to the many possible paths, yet is easily realizable because it can be represented by a finite state machine.
Objective and Approach
The primary objective of this thesis is to quantify the degradation to speech intelligibility and speaker recognition when utilizing trellis waveform coding techniques. In order to accomplish this objective, digitized speech was obtained and the trellis was implemented in software. Speech data were used to train the trellis and populate the branches with symbols. The trellis training was accomplished using the Generalized Lloyd Algorithm and the Viterbi Algorithm with a mean square error criterion. The digitized speech was then encoded, reconstructed and analyzed. Analysis included both quantitative and qualitative results.
Chapter II outlines the algorithms that were
required for this analysis. Chapter III provides
details related to trellis waveform coding. Chapter
IV provides details of the experiment configuration.
Chapter V contains the results of the experiment.
The last chapter is a summary including suggestions
for further study.
CHAPTER II
ALGORITHMS
Previous discussions alluded to distortion and comparison of data sequences to code vectors. These are predicated on an error criterion. The concept of distance was used previously as a conceptual error criterion, where distance was measured as the number of bits that were different between the data and code word. However, this may not be an effective error criterion, since it weights all bits equally. Error criteria must be selected based on the data being encoded. For different kinds of data, different kinds of distortion are more harmful. For speech, many error criteria have been developed to minimize the distortion that is audible to the human listener. These criteria are often complicated and difficult to implement. An error criterion that performs reasonably well and is mathematically tractable is the mean square error criterion, expressed as
    MSE = (1/n) Σ (x_i - x'_i)²                    (2-1)

where x represents the original data sequence, x' is the encoded data sequence, and n is the number of elements. The difference between each element of the two vectors is squared and summed over the entire vector. The mean square error has the effect of emphasizing errors in the most significant bit. The mean square error criterion has been used in producing the results of this thesis.
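Equation (2-1) translates directly into code; the sequences below are arbitrary example values.

```python
# Mean square error criterion of equation (2-1): the average squared
# difference between the original sequence x and the encoded sequence x'.

def mse(x, x_encoded):
    """Mean square error between two equal-length sequences."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_encoded)) / n

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3
```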
Viterbi Search Algorithm
The Viterbi Search Algorithm was originally developed for maximum likelihood decoding of convolutional codes used in communications for error correction. Since a convolutional code can be represented by a trellis, the algorithm has direct applicability here.

The algorithm is quite simple to implement, although it is computationally intensive and requires a significant amount of storage. It utilizes metrics, a term referring to the distortion on a branch as calculated by the error criterion. The algorithm consists of six steps.
1) For each state, calculate the metric for each incoming branch.
2) Add each branch metric to the previous state metric to form the cumulative path metric.
3) Record the bit corresponding to the branch transition with the smallest path metric.
4) Save the surviving path metric as the new state metric.
5) Repeat the previous steps for each transition.
6) Select the path with the smallest metric as the best path.
The first step consists of calculating the distortion associated with all incoming branches. This can be done by applying the mean square error criterion so that a small metric corresponds to minimal distortion. The second step requires adding the branch metric to the previous state metric. Each incoming branch emerges from a certain state. Previously, all paths into this state were eliminated except the path of lowest distortion. The surviving path metric was then associated with the state. The current branch metrics are added to the surviving path metrics to obtain a total path metric for each incoming branch. At this point, the incoming path with minimum distortion can be selected. The bit causing the transition is recorded and the surviving path metric is associated with the current state.
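The six steps can be sketched as a small trellis search. This is a sketch under stated assumptions, not the thesis code: a two-state trellis (K = 2), scalar branch symbols chosen for illustration, and squared error as the branch metric.

```python
# Viterbi search over a shift-register trellis: keep one survivor path and
# metric per state, extend by one branch per input bit, then pick the path
# with the smallest final metric.

def viterbi_encode(data, symbols, n_states):
    """Return (best metric, bit sequence of the minimum-distortion path).

    symbols[(state, bit)] is the branch symbol emitted when `bit` is
    shifted into a register whose state index is `state`.
    """
    metrics = {0: 0.0}   # start in state 0 with zero accumulated distortion
    paths = {0: []}      # survivor bit history per state
    for sample in data:
        new_metrics, new_paths = {}, {}
        for state, metric in metrics.items():
            for bit in (0, 1):
                nxt = ((state << 1) | bit) % n_states   # shift-register step
                m = metric + (sample - symbols[(state, bit)]) ** 2
                if nxt not in new_metrics or m < new_metrics[nxt]:
                    new_metrics[nxt] = m                # surviving path metric
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(metrics, key=metrics.get)                # smallest final metric
    return metrics[best], paths[best]

# Hypothetical two-state trellis (K = 2) with scalar branch symbols.
SYMBOLS = {(0, 0): -1.0, (0, 1): 1.0, (1, 0): -0.5, (1, 1): 0.5}
metric, bits = viterbi_encode([0.9, -0.4, 0.6], SYMBOLS, n_states=2)
print(bits)  # -> [1, 0, 1]
```

Each input sample here produces one path bit, so three samples are encoded as three bits, mirroring the 1 bit/symbol rate discussed in Chapter I.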
Conceptually, the Viterbi algorithm is analogous to attempting to locate the shortest path between two points along several paths. Multiple paths merge and separate, but all paths begin and end at a common point. The algorithm keeps track of distances along multiple paths until they merge at a common point, in this case a node. At this time, the paths having the longer distances are eliminated. Selecting longer paths to a certain point can only make the overall journey longer. If the shortest path is chosen, any path leading from this point will be as short as possible. At the final destination, only the shortest paths remain, one for each node. It is then a simple process to select the best path overall. The algorithm accomplishes a sequential maximum likelihood decision by process of elimination at each transition in the trellis.
The Viterbi algorithm as outlined above requires significant storage, since one entire path through the trellis must be saved for each state. The largest trellis depth in this thesis is 256, so storage was not a problem. However, if storage is limited, the Viterbi algorithm can be implemented as a finite sliding window as shown in Figure 2-1. In this case, only a fixed amount of previous path information is saved. After each transition, the best path to each state is determined as before, but a further step is to select the state having the best overall path. The value at the beginning of the window on this most likely path is used as the encoding. This is possible because all paths near the beginning of the trellis begin to merge as the process steps farther into the trellis. Simulations have shown that if the window length is maintained at 4 to 5 times the constraint length, then there is only a small degradation in performance [5].

Figure 2-1 Finite Encoding Window
Generalized Lloyd Algorithm
The Lloyd algorithm was originally discovered
as a solution of pulse-code-modulation quantizer
equations. It later became recognized as a
minimum-distortion source code design. The Lloyd
algorithm consists of five steps [6].
1) Obtain initial trellis branch symbols.
2) Perform a trellis encoding of the
source using the search algorithm and
the error criterion.
3) Calculate the encoding distortion.
4) Refine the branch symbols based on the
trellis encoding.
5) Repeat steps 2-4 until the distortion
is acceptable.
The algorithm is generalized by adding a sixth step
which consists of extending the trellis and
repeating steps 2-5 [7]. The process is completed
when the trellis has been extended to the maximum
constraint length.
The initial trellis is based on some prior
knowledge of the data. If it is known to be Gauss
ian, the initial trellis could consist of samples of
a Gaussian random variable. The second step uses
training data and requires searching the trellis for
the path which best represents the training data
based on the error criterion. This step is per
formed for all blocks of data in the training set.
Upon completion, each data element has been encoded
to a certain branch in the trellis. This forms a
partitioning of the data, one partition for each
branch. All data values encoded to a given branch
can now be used to form a new symbol for the branch
which reduces the resulting distortion. In this
work the centroid is used which, in the case of mean
square error, reduces to the average of the ele
ments. So, the average of all elements encoded to
each branch is used as the new branch symbol. In
the next step, the distortion associated with the
current encoding is evaluated. If the distortion is
acceptable, the existing branch symbols are used for
encoding data outside of the training set. If the
distortion is not acceptable, the existing branch
symbols are replaced by the new branch symbols and
the process is repeated. The sixth step of trellis
extension will be discussed in detail in the next
chapter.
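The five steps can be sketched for the simplest case. With a constraint-length-one trellis the search reduces to nearest-symbol assignment, so the refinement loop below (an illustrative Python sketch with hypothetical names, not the thesis code) shows steps 2-5 in miniature:

```python
def lloyd_refine_k1(training, symbols=(-1.0, 1.0), threshold=0.01, max_iter=50):
    """Lloyd refinement for the constraint-length-one trellis (a sketch).

    With one state and two branches, the trellis search reduces to
    nearest-symbol assignment, so steps 2-4 become: partition the
    training data, measure distortion, and recompute each branch
    symbol as the centroid (mean) of its partition.
    """
    symbols = list(symbols)
    prev_dist = float("inf")
    dist = 0.0
    for _ in range(max_iter):
        sums, counts, dist = [0.0, 0.0], [0, 0], 0.0
        for x in training:                       # step 2: encode each element
            i = 0 if (x - symbols[0]) ** 2 <= (x - symbols[1]) ** 2 else 1
            sums[i] += x
            counts[i] += 1
            dist += (x - symbols[i]) ** 2        # step 3: accumulate distortion
        if dist == 0 or (prev_dist - dist) / dist < threshold:
            break                                # step 5: improvement is small
        prev_dist = dist
        for i in (0, 1):                         # step 4: centroid update
            if counts[i]:
                symbols[i] = sums[i] / counts[i]
    return symbols, dist
```

For example, training data [-3, -1, 2, 4] refines the initial symbols (-1, 1) to the partition centroids (-2, 3) in two iterations.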
CHAPTER III
TRELLIS WAVEFORM CODING
The trellis and algorithms required for
trellis waveform coding were previously introduced.
This chapter covers the specific implementation of
the trellis and application of the algorithms used
in this thesis.
The flowchart in Figure 3-1 shows the process
control functions. All of the calculations and
accompanying results are predicated upon the con
straint length and the trellis depth. In the case
of populating the trellis, constraint length is used
as the constraint length for the final trellis.
Intermediate steps initialize the constraint length
to one and then extend the trellis at each iteration
by increasing the constraint length by one. Since
higher constraintlength trellises are generated by
extending lowerorder trellises, all trellises hav
ing constraint lengths less than the desired trellis
are generated as part of the process. So, when
constraint length is specified for populating the
trellis, what is actually being requested is that
all trellises having constraint lengths less than
the desired constraint length be generated.

Figure 3-1  Controlling the Process

The
branch symbols are available for storage after each
constraint-length iteration. The final iteration
provides a trellis having constraint length equal to
that input at the process control level.
In the case of encoding speech, the con
straint length indicates the trellis which will be
used to encode the speech data. The branch symbols
corresponding to this trellis must have been previ
ously generated. Larger constraint lengths result
in trellises with higher performance. Trellis depth
indicates the length of the trellis in both cases of
populating the trellis and encoding the speech.
Once the desired constraint length and trel
lis depth have been defined, the trellis must be
populated with branch symbols representing typical
data sequences before initiating data encoding. The
entire process is to populate a trellis of the de
sired constraint length and depth using training
data, and then start the process over again using
this trellis to encode speech data outside the
training set.
Implementing the Trellis
The trellis is implemented as a shift regis
ter having length equal to the constraint length.
As was shown in Figure 1-6, the most significant bit
(MSB) is on the left and the least significant bit
(LSB) is on the right. Incoming bits are shifted in
from the right. The state is specified by the K-1
least significant bits. The branches are specified
by the entire contents of the shift register. Of
interest to the Viterbi algorithm is the index of
the incoming branches and previous states. These
operations are performed algebraically by the equa
tions
i' = l + u·q^(K-1),      u ∈ {0, 1}        (3-1)
l' = ⌊l/q⌋ + u·q^(K-2),  u ∈ {0, 1}        (3-2)

where i is the branch index, l is the state index, u
is the incoming transition symbol, and the prime
indicates previous. These operations are equivalent
to shifting the contents of the register back (to
the right) one stage and bringing in each of the
possible symbols from the left, since these were
lost in the transition to the current state. If the
current state is 00, then the two incoming branch
indices are 000 and 100 for the trellis of Figure
1-5. These correspond to the first and fifth
branches from the top that emerge from the third
column of nodes. The corresponding previous states
are 00 and 10.
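A few lines suffice to check equations (3-1) and (3-2) against this worked example (an illustrative sketch; the function name is hypothetical):

```python
def incoming(l, u, K, q=2):
    """Incoming branch index i' and previous state l' for state l,
    per equations (3-1) and (3-2)."""
    i_prev = l + u * q ** (K - 1)        # eq. (3-1): branch ending in state l
    l_prev = l // q + u * q ** (K - 2)   # eq. (3-2): state that branch left
    return i_prev, l_prev

# the worked example from the text: state 00 of the binary K=3 trellis
for u in (0, 1):
    i, lp = incoming(0, u, 3)
    print(f"u={u}: incoming branch {i:03b}, previous state {lp:02b}")
# u=0: incoming branch 000, previous state 00
# u=1: incoming branch 100, previous state 10
```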
The initial fanning out of the trellis requires
that the first K-1 transitions be treated
separately. This is implemented by performing
calculations for only the first 2^L states, where L
is the current trellis depth, and for the
zero-transition (upper) branches only, until the
trellis is full.
In the current implementation, speech samples
consisting of 12 bits/sample are stored as 16-bit
integers. Since data are encoded as symbols, the
number of integers required for each encoding is
equal to the trellis length. The branch symbols are
stored as 32-bit floating point numbers. There is
one symbol for each branch, with the number of
branches equal to 2^K. The path consists of a single
bit at each node, specifying one path for each
state. In this implementation, each bit is stored as
a 16-bit integer. This results in 2^(K-1)·L integers
to represent the path. The path storage could be
reduced as suggested in the section on the Viterbi
algorithm. This was unnecessary in the current
application.
Trellis Encoding
The technique for populating the trellis
branches with symbols will be discussed in the next
section. It involves trellis encoding, so this
topic must be covered first. For purposes of this
discussion, assume that the trellis has been previ
ously populated.
The key to trellis encoding is the search
algorithm. Several search algorithms exist, but the
optimal Viterbi algorithm has been used in this the
sis. From the section on algorithms, it is under
stood that the Viterbi algorithm selects the path
through the trellis which minimizes average distor
tion based on some error criterion, in this case
mean square error. Symbols along the path then rep
resent the encoded data. The path can be repre
sented by a 1 or a 0 at every node. Since the
speech data in this thesis were digitized at twelve
bits per symbol, this represents a 12:1 compression
ratio.
To encode, input data are divided into
blocks equal to the depth of the trellis. The
Viterbi algorithm compares each block element to all
branch symbols at the corresponding trellis depth.
The best path through the trellis is selected based
on the accumulated error (distortion) for each path.
The encoded data become the respective branch sym
bols along the path.
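The encoding loop just described can be sketched as a minimal Viterbi search (illustrative Python with hypothetical names, not the thesis code; it assumes K >= 2, squared error, the register convention of this chapter, and a tiny hand-built trellis rather than a trained one):

```python
def viterbi_encode(x, branch_symbols, K):
    """Minimal Viterbi search of a binary trellis (illustrative sketch).

    x              -- one source block, len(x) = trellis depth L
    branch_symbols -- 2**K reproduction values, one per branch
    K              -- constraint length (K >= 2); there are 2**(K-1) states
    Returns (path_bits, reconstruction).
    """
    n = 2 ** (K - 1)
    INF = float("inf")
    metric = [0.0] + [INF] * (n - 1)   # all paths start in state 0
    survivor = []                       # per stage: winning u for each state

    for xj in x:
        new_metric = [INF] * n
        u_best = [0] * n
        for l in range(n):
            # the two incoming branches of state l, eqs (3-1) and (3-2)
            for u in (0, 1):
                i_in = l + u * 2 ** (K - 1)          # incoming branch index
                l_prev = l // 2 + u * 2 ** (K - 2)   # previous state index
                d = metric[l_prev] + (xj - branch_symbols[i_in]) ** 2
                if d < new_metric[l]:
                    new_metric[l], u_best[l] = d, u
        metric = new_metric
        survivor.append(u_best)

    # retrace from the final state with the minimum total distortion
    l = metric.index(min(metric))
    bits, recon = [], []
    for u_best in reversed(survivor):
        bits.append(l % 2)              # transmitted bit = LSB of the state
        u = u_best[l]
        recon.append(branch_symbols[l + u * 2 ** (K - 1)])
        l = l // 2 + u * 2 ** (K - 2)
    bits.reverse()
    recon.reverse()
    return bits, recon
```

For K=2 with branch symbols [-1, 1, -1, 1] (the symbol depending only on the transition bit), encoding the block [1, 1, 1, -1] yields path bits [1, 1, 1, 0] and a zero-distortion reconstruction.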
Populating the Trellis
In order to perform data encoding, the
branches of the trellis must be populated with sym
bols that are representative of typical sequences.
An effective way of accomplishing this is by using
training data and the Generalized Lloyd Algorithm to
determine the branch symbols. This process is
outlined in Figure 3-2, which spans three pages.
The first page contains initializations for
the trellis extension loop and the trellis refine
ment loop. The first step is to set the constraint
length counter to zero, and set the distortion
threshold. The distortion threshold is used in de
termining whether distortion imparted by the current
trellis to the training data is acceptable. The
constraint length counter is incremented before the
first operation so that the initial trellis is of
constraint length one. Each iteration thereafter in
the extend loop increases the constraint length by
one. Since the first trellis is of constraint
length one, this trellis consists of one state and
two branches, as shown in Figure 3-3. Each
transition begins in state 0 and ends in state 0,
but one transition follows the upper branch
associated with a 0 transition, and the other
transition follows the
Figure 3-2  Populating the Trellis
Figure 3-2 (continued)
Figure 3-2 (continued)
Figure 3-3  Constraint-Length-One Trellis
lower branch associated with a 1. The initial
branch symbols are set to -1 and 1.
The next section is preparation for refining
the branch symbols. The current distortion is
initialized to infinity so that the first-time
distortion will always be less than the initial
distortion. The current distortion is moved to the
previous distortion and the current distortion is set to
zero. Also set to zero are the block counter, the
number of symbols in a branch partition, and the sum
of the symbols in a branch partition.
Now that all parameters have been initial
ized, the partition loop is entered on the second
page of Figure 32. The partition loop performs the
function of encoding the training sequence one block
at a time. This encoding finds the path through the
trellis which most closely represents the training
data. After encoding, each data sample is repre
sented by a branch symbol. This branch information
is used to form several partitions, Si, of the
training data where a partition is all block ele
ments that have the same branch index. (C_ij
indicates that the jth element was encoded to the
ith branch, thus being associated with the partition
S_i.) The number of elements per branch is incre
mented and maintained as the number of elements in
the partition corresponding to that branch. A run
ning sum is kept of all elements of the partition
for future calculation of the new branch symbols.
Lastly, the distortion for each encoding is calcu
lated and accumulated for the entire data set.
Since this is a running sum, the final sum is repre
sentative of the mean square error of the entire
encoded sequence with respect to the training se
quence. After all blocks have been encoded, the
partition loop is complete and control passes to the
last page of Figure 32.
Here, the rate of change of distortion is
calculated by dividing the difference of the current
and previous distortions by the current distortion.
The current and previous distortions are the average
mean square error distortion for the entire training
sequence except that they have not been divided by
the total number of elements. However, the total
number of elements is identical in both current and
previous distortion measures, so the number of ele
ments cancels out in the division. The subtraction
provides the amount that the distortion has de
creased over the last iteration. Dividing by the
current distortion yields a ratio of distortion de
crease to the current distortion. This provides a
means of evaluating relative gains in minimizing
distortion. If there is significant improvement, or
the trellis has not yet been refined, then new
branch symbols are generated by calculating the cen
troid of each partition and the refine loop is re
initiated. Once there is insignificant improvement
in distortion, the trellis is extended to
higher-level trellises, as shown in Figures 3-4 and 3-5.
The refine loop is again initiated.
Figure 3-4  Constraint-Length-Two Trellis
Figure 3-5  Constraint-Length-Three Trellis
In summary, a constraint-length-one trellis
having two branches is initially populated with a
-1 and a 1. The training data are divided into
blocks and each block is encoded using the
constraint-length-one trellis. Upon completion, each
block element has been encoded to a branch parti
tion. A new branch symbol is then calculated as the
centroid of the partition. This process is contin
ued until there is little improvement in distortion.
At this point, the trellis is extended and the proc
ess begins again. The process is complete when the
maximum constraint length has been reached and fur
ther iterations yield little improvement in distor
tion.
Extending the Trellis
After the trellis branch symbols have been
refined sufficiently for the K=1 case, the trellis
is extended to K=2. The number of branches is q^K,
so the number of branches is doubled (in the binary
case) each time K is increased by one. Exten
sion continues until the trellis reaches the desired
constraint length.
A simple algorithm for extending the trellis
was provided in the flowchart of Figure 3-2. This
consists of forming a replica of every existing
branch, thus doubling the total number of branches.
The existing branch of index i is replicated as the
branch having index i+N, where N is the number of
branches. This extends each existing branch into a
branch that is incremented from the existing branch
by the number of states. Consequently, the
higher-order trellis now contains all of the same branch
symbols as the previous trellis, but there are two
of each branch. This allows the same refined branch
symbols to be available for refinement of the
higherorder trellis.
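The simple replication step can be sketched in one line (an illustrative sketch; the function name is hypothetical):

```python
def extend_simple(branch_symbols):
    """Simple trellis extension (a sketch): branch i of the old trellis
    is replicated as branch i + N, where N is the old branch count, so
    the symbol list is simply doubled with indices 0..N-1 and N..2N-1."""
    return branch_symbols + branch_symbols

# extending a trained constraint-length-one trellis to constraint length two
print(extend_simple([-1.0, 1.0]))   # [-1.0, 1.0, -1.0, 1.0]
```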
A more complicated extension algorithm rep
licates the branches in such a way as to not only
provide the same branch symbols, but also arrange
them in such a way as to preserve all paths that
exist in the current trellis. After the trellis of
constraint length K=1 is trained, there are two
branch symbols defined. These must now be
replicated to four while preserving the paths contained
in the K=1 trellis. This is accomplished as shown
in Figure 3-6. The permutation diagram on the left
represents how branches should be replicated. For
instance, branch 1 of the trained trellis goes to
branch 1 of the new trellis, branch 2 goes to branch
2, branch 3 and 4 are reversed, 5 goes to 8 and so
on. This permutation diagram is applied until half
of the new branches have been generated. The others
are obtained by reflecting the new branches about
the center of the trellis to generate a symmetric
trellis.

Figure 3-6  Trellis Extension Branch Replication Operations
(permutation diagram and swap diagram, each mapping
current-trellis branches to extended-trellis branches)
The permutation diagram is generated by
applying each level of the permutation diagram to the
swap diagram on the right. The first trellis has
two branches. To obtain the permutations for a
trellis having four branches, operations 1 and 2 of
the permutation diagram are applied to operations 3
and 4 of the swap diagram. This results in no
change since the indicated operation is to go from
the current branch to the same branch in both cases.
Since branches 3 and 4 are already swapped, they
remain swapped in the permutation diagram. For the
next level trellis, operations 1 through 4 of the
permutation diagram are applied to operations 5
through 8 of the swap diagram. As one example, the
permutation indicates a swapping of branches 3 and
4. Since this operation is overlaid onto branches 7
and 8 in the swap diagram, the replication to
branches 5 and 6 are swapped so that 7 goes to 5 and
8 goes to 6 instead. These results appear in the
permutation diagram. For the next level, operations
1 through 8 of the permutation diagram are applied
to operations 9 through 16 in order to obtain the
final permutation for a trellis having 16 branches.
This process continues until the desired number of
branches has been reached.
In order to extend a trellis, the existing
branches are permuted using the permutation diagram.
Then, the resultant branches are reflected about the
center of the trellis to provide twice the branches,
i.e., branch i goes to N-i+1, where N is the number
of branches in the new trellis. When this permuta
tion algorithm has been used, paths that existed in
the lower-order trellises now exist in the
higher-order trellises. This ensures that distortion
caused by the higherorder trellises will be at
least as low as that caused by the lowerorder trel
lis. However, this is only true in portions of the
full trellis. Due to the initial fanout, the per
muted branches may cause increased distortion, since
there are a limited number of paths returning to the
first state.
To demonstrate the preservation of paths
from lowerorder trellises when extending to higher
order, take as an example the extension from K=1 to
K=2 and finally to K=3, as depicted in Figure 3-7.
The K=1 trellis is initialized and the two branch
symbols are refined using training data. The K=1
trellis is then extended to K=2 by using the
permutation diagram.

Figure 3-7  Extending to Higher-Order Trellises

The second trellis shows the re
sult. The replicated branch indices of the K=1
trellis are shown in the upper K=2 trellis and the
branch indices of the new extended trellis are shown
in the lower K=2 trellis. Comparing the upper and
lower trellises, it is clear that the first two op
erations of the permutation diagram have been ap
plied: branch 1 went to branch 1 of the new trellis
and 2 went to branch 2. Upon reflecting the
branches, 1 also went to 4 and 2 went to 3. Note
that there are two nodes from which branches 1 and 2
emerge. This is the same as the K=1 trellis. Fur
thermore, notice that the two branches entering the
same nodes are 1 and 2, just as in the smaller trel
lis. In effect, the larger trellis contains two
smaller trellises yielding performance that is at
least as good as the smaller trellis (neglecting the
fanout region). The K=2 branches are now replicated
as shown in the permutation diagram to generate the
K=3 trellis. Again, every node has the same
branches entering and the same branches emerging
from it as in the smaller trellis. This trellis is
also composed of two smaller trellises. The branch
indices for the new K=3 trellis are shown on the
bottom diagram of Figure 3-7.
Encoding the Speech
The speech data are encoded exactly as was
done during the trellis encoding required for popu
lating the trellis. The trellis has been previously
populated with branch symbols and the data have been
divided into blocks. The path most closely corre
sponding to the current data block as determined by
the search algorithm is used to represent the data.
As is shown in Figure 3-8, the block counter
is set to zero and incremented at the top of the
partition loop. The number of states and branches
are calculated based on the requested constraint
length. The trellis had to be previously trained
and populated with branch symbols, so there is only
a need during encoding to access the derived branch
symbols. Each block of data is provided to the
Viterbi algorithm which locates the best path for
that data block. The path is stored as a mapping
from input data to encoded symbols. This process is
continued until all blocks have been encoded.
Details of the Viterbi algorithm are provided in
Figures 3-9 and 3-10.
Decoding the Speech
In an actual application, the digitized
speech must be transmitted over some channel. The
Figure 3-8  Encoding the Speech
Figure 3-9  The Viterbi Search Algorithm
Figure 3-9 (continued)
Figure 3-10  Retracing the Path in the Viterbi Search Algorithm
compression is obtained by sending the bits repre
senting the path rather than the actual symbols.
Both transmitter and receiver must have knowledge of
the trellis structure and its branch symbols. This
implies that training must be performed and the re
sultant symbols provided to both the transmitter and
receiver. Once the path bits are received they are
used to trace out the path that was used in the
transmitter. Both trellises start in the first
state so that the start state is known. From the
first state, each incoming bit dictates whether the
next transition should be along the zero path or the
one path. As each branch is traversed, the symbol
associated with that branch is released. Upon re
ceipt of all path bits, the encoded symbols have
been completely reconstructed (assuming a noiseless
channel).
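The decoder walk described above can be sketched as follows (illustrative names, not the thesis code; it assumes the binary register convention of Chapter III, where each path bit enters as the least significant bit):

```python
def trellis_decode(bits, branch_symbols, K):
    """Reconstruct symbols from path bits (a sketch). Transmitter and
    receiver share the trellis and its branch symbols; decoding just
    walks the path the encoder chose."""
    state = 0                          # both trellises start in the first state
    out = []
    for b in bits:
        branch = state * 2 + b             # register shifts left, b enters as LSB
        out.append(branch_symbols[branch]) # release the branch symbol
        state = branch % 2 ** (K - 1)      # next state = K-1 least significant bits
    return out

# K=2, branch symbols [-1, 1, -1, 1]: bits [1, 1, 1, 0] trace out [1, 1, 1, -1]
print(trellis_decode([1, 1, 1, 0], [-1.0, 1.0, -1.0, 1.0], 2))
```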
Effects of a Noisy Channel
It was not the intent of this thesis to in
vestigate performance of trellis coding in a noisy
channel. It appears that this scheme is sensitive
to noise. One channel bit error causes transitions
after the error to be relative to the wrong state.
However, the incorrect path that is traced out after
an error may closely resemble the intended path, in
fact merging with the correct path from time to time
so that one channel bit error does not imply that
all subsequent symbols will be in error. Also, a
symbol error does not imply that all symbol bits are
in error. A symbol error may actually only be wrong
in one bit. Finally, data are encoded in blocks so
that bit errors are constrained to having an effect
over one block only. Consequently, single channel
bit errors do produce multiple bit errors, but the
errors are not catastrophic.
Ayanoglu performed extensive analysis of
trellis coding in a noisy channel. His results show
graceful degradation of the trellis performance [8].
CHAPTER IV
EXPERIMENT CONFIGURATION
Digitized speech was obtained from the In
formation Systems Laboratory at Stanford University
via Internet, a communication link between Stanford
and UCD. It consists of a training data set and a
larger sample set. The training set is a 40 second
segment consisting of one male and one female
speaker, whereas the sample set is six speakers of
20 seconds each, three males and three females. The
speakers contained in the training set are different
than the speakers in the sample set to prevent any
direct correlation between the training process and
the speakers that are actually encoded. Speakers
were randomly selected and are reading different
passages out of common literature.
The speech is bandlimited to 3600 Hz and
sampled at 8000 samples/second, 12 bits per sample.
This yields 320,000 samples for each training
speaker and 160,000 samples for each sample speaker.
Each 12-bit sample is stored as a two's-complement
16-bit integer, so the data range from -2048 to
2047. Some of the data actually exceed this range,
but these samples were set to the extrema before
processing.
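The clamping of out-of-range samples to the 12-bit extrema can be sketched as (illustrative; the function name is hypothetical):

```python
def clamp12(samples):
    """Force samples into the 12-bit two's-complement range; values
    outside it are set to the extrema, as was done before processing."""
    return [max(-2048, min(2047, s)) for s in samples]

print(clamp12([-3000, -2048, 0, 2047, 2500]))
# [-2048, -2048, 0, 2047, 2047]
```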
Processing the Speech
The digital processing is shown in Figure
4-1. The speech data were received as one training
file and one sample file. The sample file needed to
be divided into individual speakers to facilitate
further processing. After dividing the sample file,
both the training file and the speaker files were
converted to the VAX data format. Since the data
received from Stanford were in the Sun format, upper
and lower bytes of the 16bit integers had to be
swapped to be compatible with the VAX.
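The byte swap can be sketched with the standard struct module (an illustrative sketch of the conversion step, not the original VAX routine; the Sun stored the 16-bit samples big-endian, the VAX expects little-endian):

```python
import struct

def sun_to_vax(raw):
    """Swap upper and lower bytes of each 16-bit sample: big-endian
    (Sun) two's-complement integers to little-endian (VAX) order."""
    n = len(raw) // 2
    values = struct.unpack(f">{n}h", raw)   # interpret as big-endian
    return struct.pack(f"<{n}h", *values)   # re-emit as little-endian

# the sample 0x0102 has its bytes swapped on the way to the VAX
print(sun_to_vax(b"\x01\x02"))   # b'\x02\x01'
```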
The next step was to generate the branch
symbols. This was accomplished by using the train
ing data. Once the branch symbols had been gener
ated, these symbols were used to encode the sample
speakers. The result of encoding was a path that
represented the speaker. This path was converted to
the branch symbols specified by the path and then
truncated to return to a 12bit representation.
When the speech had been encoded, both
original and reconstructed data were available to
generate performance measures. These measures
consisted of signal-to-quantized-noise ratio, time
waveform comparisons, frequency spectrum
comparisons, and error frequency spectra.

Figure 4-1  Digital Processing (speech data flow from
the Stanford Sun disk over the Internet to the UCD
VAX for speaker separation, byte swapping, branch
symbol generation, waveform encoding and decoding,
and quality measures; then via Kermit to an IBM PC,
graphic display and plotter, and by floppy disk to
the Compaq for transfer to the DAC)
The original data and encoded data were also
converted to unipolar values ranging from 0 to 4095
in preparation for the 12-bit digital-to-analog
conversion. They were then sent from the VAX to the
IBM PC over a data link using the Kermit file
transfer software. Using the IBM PC, the data were
transferred to floppy disk and ported to the Compaq
computer, which served as the host for data
reconstruction.
Reconstructing the Speech
Software resident on the Compaq read data of
one speaker from disk into RAM in preparation for
the direct memory access data transfer to the
digital-to-analog converter (DAC). As shown in Figure
4-2, which illustrates the analog reconstruction, the
data were converted back to analog using a DAS-16
multiplying DAC manufactured by the Metrabyte Corpo
ration. The DAC was programmed to run at 8000 con
versions/second so that data were clocked out at the
same rate as the original sample rate. The data
were then passed through an active lowpass filter
having a 3dB cutoff frequency of 3600 Hz to match
the original processing filter and to reject
high-frequency images induced by the analog
reconstruction process.

Figure 4-2  Analog Reconstruction

The filter circuit diagram and associated parameters
are shown in Figure 4-3, and the passband
characteristics are plotted in Figure 4-4.
The final step was to pass the data to either an
analog reel-to-reel tape recorder for archiving, or
send them directly to a speaker for audio
presentation.
Figure 4-3  Active Lowpass Filter. Source: Linear
Applications, vol. 1, National Semiconductor Corp.,
1973, pp. AN5-9 to AN5-10.
Figure 4-4  Gain of Lowpass Filter (gain versus
frequency in kHz)
CHAPTER V
PERFORMANCE ANALYSIS
Signal to quantized noise ratio (SQNR) is a
measure of error in the encoded waveform. It is
defined as the signal power divided by the noise
power
SQNR = 10 log10 [ Σ_i x_i^2 / Σ_i (x_i - x_i')^2 ]        (5-1)
where x is the original waveform and x' is the
encoded waveform. This quantity was used to evaluate
the relative performance of each of the different
trellis encoding schemes. The theoretical bound for
encoding a memoryless Gaussian source using mean
square error and 1 bit/symbol is 14.4 dB [9]. This
bound can be used as a benchmark for comparison of
the actual performance, since the memoryless
Gaussian source lower-bounds all other sources.
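Equation (5-1) translates directly into code (an illustrative sketch; the function name is hypothetical):

```python
import math

def sqnr_db(x, x_enc):
    """Signal-to-quantized-noise ratio of equation (5-1), in dB."""
    signal = sum(v * v for v in x)
    noise = sum((v - w) ** 2 for v, w in zip(x, x_enc))
    return 10.0 * math.log10(signal / noise)

# an error of one tenth of the signal amplitude gives 20 dB
print(sqnr_db([10.0, -10.0], [9.0, -9.0]))   # 20.0
```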
Encoding SQNR
Three trellis encoding schemes were evalu
ated. The first two are standard trellises which
start in state 1, fan out to a full trellis and then
begin to repeat. The only difference is that the
first case was populated using the simple branch
replication technique, whereas the second case was
done using the more complicated replication tech
nique which preserves paths through the trellis.
The third case was the full trellis where there is
no initial fanout, but all states are available
from the start. This full trellis was populated
using the simpler replication technique.
The resulting SQNR for each of the cases is
shown in Tables 5-1 through 5-3. The cases are
plotted in Figures 5-1 to 5-3. The rows represent
constant constraint length and the columns are
trellis length. Constraint-length-one trellises in
all cases are flat at 2.6 dB. In a
constraint-length-one trellis, there is only one
state and two paths, so changing the length of the
trellis has no effect on its ability to encode data.
In all cases, performance continuously
improves from constraint length K=1 to K=8 for the
two longest trellis depths (128 and 256, i.e., log2
depths 7 and 8). This is as expected. In the
first case (Figure 5-1), the other trellis lengths
show the effect of replicating the trellis branches
without any concern for preserving paths from the
previous trellis.

Table 5-1  SQNR (dB) for the Standard Trellis
Using Simple Extension

Constraint                   Trellis Depth
Length         2      4      8     16     32     64    128    256
   1         2.63   2.63   2.63   2.63   2.63   2.63   2.63   2.63
   2         2.81   4.39   4.84   5.11   5.26   3.30   3.33   3.37
   3         2.82   4.59   5.29   5.76   6.12   6.02   6.22   6.37
   4         2.82   1.67   3.52   5.08   6.21   6.76   7.13   7.34
   5         2.83   2.12   2.98   5.13   6.55   7.13   7.63   7.89
   6         2.83   2.13   0.59   4.85   4.81   6.45   7.59   8.45
   7         2.84   2.13   1.61   4.38   6.33   6.92   7.73   8.55
   8         3.69   2.13   1.08   3.80   6.15   7.69   7.72   8.83

Often, the trellis extension results in worse
performance with the more complex
trellis, since previously existing paths are
destroyed and refining the trellis must be
reinitiated. In fact, the refinement does not reach
the performance of the previous trellis before the
minimum rate of improvement is reached.
magnified in the shorter trellises where the fanout
region occupies a larger percentage of the overall
trellis and thus becomes a dominant factor in deter
mining branch symbols.

Table 5-2  SQNR (dB) for the Standard Trellis
Using Path-Preserving Extension

Constraint                   Trellis Depth
Length         2      4      8     16     32     64    128    256
   1         2.63   2.63   2.63   2.63   2.63   2.63   2.63   2.63
   2         2.81   2.89   2.95   2.63   2.63   2.63   2.63   3.15
   3         2.82   4.14   4.41   4.69   4.87   4.75   4.68   3.15
   4         2.82   1.19   2.89   4.20   5.09   5.78   5.30   5.20
   5         2.83   2.13   2.22   4.05   5.30   5.77   7.24   6.08
   6         2.83   2.13   2.18   3.69   5.24   7.19   7.31   7.44
   7         2.84   2.13   1.40   3.92   6.00   7.22   7.77   7.70
   8         3.69   2.13   0.85   3.47   5.72   7.46   7.95   9.00

The fan-out region affects several samples after the
full trellis has been reached, because paths leaving
the fan-out region
may continue along less desirable paths which are
required to minimize the error in the fanout re
gion, but are not optimal in the full trellis re
gion. Longer trellises begin to suppress this ef
fect as the fanout region becomes small with re
spect to the trellis. Notice that for constant con
straint lengths and trellises longer than the fan
out region, there is a continual increase in per
formance as a function of trellis length, with the
exception of the two smallest constraint lengths.
Table 5-3  SQNR (dB) for the Full Trellis
Using Simple Extension

Constraint                   Trellis Depth
Length         2      4      8     16     32     64    128    256
   1         2.62   2.62   2.62   2.62   2.62   2.62   2.62   2.63
   2         5.58   2.83   3.44   2.83   3.26   3.40   3.40   3.40
   3         7.92   6.04   3.57   3.29   6.67   6.62   6.58   6.59
   4        11.83   8.88   8.49   7.24   8.06   8.00   7.73   7.93
   5        12.86   9.62   9.95   7.72   8.97   8.84   8.69   8.75
   6        17.52  13.23  11.55   9.14  10.12   9.94   9.47  10.00
   7        20.35  13.89  12.63  10.31  10.88  10.65  10.45  10.77
   8        22.84  16.70  13.65  11.54  11.36  10.98  10.70  11.25
Surprisingly, the trellis extended using the
path-preserving replication technique (Figure 5-2)
is even more affected by the fan-out region, so that
only a marginal gain is realized over the simple
replication technique. All paths are preserved
only when all states in the trellis are available.
Since the initial paths emerge from the first state,
not all states are available until the full trellis
is reached at L=K-1. Unlike the simple replication
technique, where previous branches are left in place,
the permutations associated with the pathpreserving
CO <3 2 PS (T3 PQ) CO
67
K=8
K=7
K=6
K=5
K=4
K=3
K=2
K=1
Figure 51 SQNR for the Standard Trellis
Using Simple Extension
K=8
K=7
K=6
K=5
K=4
K=3
K=2
K=1
Figure 52 SQNR for the Standard Trellis
Using PathPreserving Extension
[Figure 5-3  SQNR for the Full Trellis Using Simple Extension (curves for K=1 through K=8; x-axis: Log2 of Trellis Depth)]
replication technique may move branch symbols into the initial fan-out paths that produce more degradation than those in the previous trellis. Performance still improves as a function of trellis depth for constraint lengths greater than K=5. For the large constraint lengths, the longer trellises provide significant increases in performance, which suggests that making the trellis even longer would be worthwhile. Ultimately, larger constraint lengths and longer trellises yield the best performance.
For the full trellis, initial fan-out is eliminated and side information is transmitted to the decoder to enable proper first-state initialization. This side information consists of enough bits to specify any state in the trellis; consequently, the number of bits is K-1. For the largest trellis in this study, this amounts to 7 bits. These extra bits lower the compression ratio and increase the complexity of the decoder. However, performance is significantly increased: SQNR for the largest constraint length and longest trellis is 11.25 dB, which is 2.42 dB higher than for the standard trellis. Performance generally increases with increasing constraint length and decreasing trellis depth, with the exception of K=3 at L=8 and 16.
The side information increases the effective transmit rate for K=8 from the original 8000 bps to 8219 bps. The performance of the shorter trellises comes at higher and higher cost, as the bits required to represent the initial state constitute more of the total transmit packet. The transmit rate is nearly doubled (15,000 bps) at trellis length L=8; at L=2, the effective rate is 36,000 bps. High-fidelity signals are obtained, but the rate has been increased sufficiently that other, more effective schemes, such as multiple-level (nonbinary branch transition) trellises, could be used in place of the shorter trellises.
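The rates quoted above follow from adding K-1 side-information bits to each L-sample block at 1 bit/sample; a short sketch of the calculation (the function name is illustrative):

```python
def effective_rate(sample_rate_hz, trellis_depth, constraint_length):
    """Effective bit rate: each L-sample block carries L data bits
    plus K-1 initial-state (side-information) bits."""
    side_bits = constraint_length - 1
    return sample_rate_hz * (trellis_depth + side_bits) / trellis_depth
```

For K=8 this reproduces the rates in the text: 8218.75 (about 8219) bps at L=256, 15,000 bps at L=8, and 36,000 bps at L=2.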
The side information required for the full trellis can be implemented using the standard trellis. The initial path metric for each path in the fan-out region can be set to zero, so that the fan-out region does not affect the choice of the best path. Incoming data are compared only to paths beyond the fan-out region. Once the best path has been selected, bits corresponding to the path from the first state to the chosen state through the fan-out region must be multiplexed into the beginning of the data stream. When these data are received, the first few bits specify the artificial path through the fan-out region, and then operation proceeds as with the full trellis.
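At the decoder, this amounts to peeling the K-1 initial-state bits off the front of each block before branch symbols are released. A minimal sketch of that demultiplexing, assuming a shift-register state and a table-lookup codebook (the function name and codebook are illustrative, not the codebooks used in this study):

```python
def decode_with_side_info(bitstream, codebook, K):
    """Decode one block: the K-1 leading bits set the initial state,
    and each remaining bit releases one branch symbol."""
    side, data = bitstream[:K - 1], bitstream[K - 1:]
    state = 0
    for b in side:                        # build the (K-1)-bit initial state
        state = (state << 1) | b
    mask = (1 << (K - 1)) - 1
    symbols = []
    for b in data:                        # shift-register trellis transitions
        word = (state << 1) | b           # K-bit index into the codebook
        symbols.append(codebook[word])
        state = ((state << 1) | b) & mask
    return symbols
```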
The distortion as a function of iterations in the refinement/extension portion of the algorithm is shown in Figures 5-4 to 5-6 for each of the trellis cases. These are for the K=8, L=256 trellises. Note that the distortion for the full trellis decreases continually, whereas that for the partial trellises jumps upward at each extension point. In shorter trellises this jump is even more distinct, and the distortion does not return to its previous level before the trellis is extended again. Since path-preserving replication does not perform much better than simple replication, no further results for it are presented.
Waveform Plots
A good visual demonstration of the performance of trellis waveform coding is provided by overlaying the original and encoded waveforms. The effectiveness of the encoding can be judged by how closely the encoded data follow the original waveform. Several records of each file were selected for plotting. One record contained some rapid variations followed by a low-amplitude tail. Another record contained very smooth, repeating
[Figure 5-4  Convergence for the Standard Trellis Using Simple Extension (distortion scale ×10^4)]

[Figure 5-5  Convergence for the Standard Trellis Using Path-Preserving Extension (distortion scale ×10^4)]

[Figure 5-6  Convergence for the Full Trellis Using Simple Extension (distortion scale ×10^4)]
waveforms that each became slightly lower in amplitude. These two records were used to analyze the waveform-following capabilities and shortfalls of trellis coding. Generally, the technique had more difficulty with the first record. Plots of this record encoded by the standard trellis and the full trellis are shown in Figures 5-7 through 5-36. The solid line indicates the original waveform; the dashed line represents the encoded waveform.
Some of the plots exhibit near-triangular oscillations. This is due to the construction of the trellis. At each node in the trellis, one of two branches provides the transition to the next state, and this transition releases a branch symbol used to encode the original waveform. After transitioning to a new node, the two branches emerging from it are not necessarily the same two branches as at the previous node. Consequently, the next transition produces a different branch symbol. Depending on the path, the new node may be the same as the first, so that the pattern repeats. A trellis contains several different-length paths, so oscillations may range anywhere from 1 sample/cycle (constant) to the constraint length per cycle.
[Figures 5-7 through 5-36: each plot shows Amplitude vs. Sample Number (0 to 250).]

[Figure 5-7   Original and Encoded Waveforms Using Standard Trellis (K=8, L=256)]
[Figure 5-8   Original and Encoded Waveforms Using Full Trellis (K=8, L=256)]
[Figure 5-9   Original and Encoded Waveforms Using Standard Trellis (K=7, L=256)]
[Figure 5-10  Original and Encoded Waveforms Using Full Trellis (K=7, L=256)]
[Figure 5-11  Original and Encoded Waveforms Using Standard Trellis (K=6, L=256)]
[Figure 5-12  Original and Encoded Waveforms Using Full Trellis (K=6, L=256)]
[Figure 5-13  Original and Encoded Waveforms Using Standard Trellis (K=5, L=256)]
[Figure 5-14  Original and Encoded Waveforms Using Full Trellis (K=5, L=256)]
[Figure 5-15  Original and Encoded Waveforms Using Standard Trellis (K=4, L=256)]
[Figure 5-16  Original and Encoded Waveforms Using Full Trellis (K=4, L=256)]
[Figure 5-17  Original and Encoded Waveforms Using Standard Trellis (K=3, L=256)]
[Figure 5-18  Original and Encoded Waveforms Using Full Trellis (K=3, L=256)]
[Figure 5-19  Original and Encoded Waveforms Using Standard Trellis (K=2, L=256)]
[Figure 5-20  Original and Encoded Waveforms Using Full Trellis (K=2, L=256)]
[Figure 5-21  Original and Encoded Waveforms Using Standard Trellis (K=1, L=256)]
[Figure 5-22  Original and Encoded Waveforms Using Full Trellis (K=1, L=256)]
[Figure 5-23  Original and Encoded Waveforms Using Standard Trellis (K=8, L=128)]
[Figure 5-24  Original and Encoded Waveforms Using Full Trellis (K=8, L=128)]
[Figure 5-25  Original and Encoded Waveforms Using Standard Trellis (K=8, L=64)]
[Figure 5-26  Original and Encoded Waveforms Using Full Trellis (K=8, L=64)]
[Figure 5-27  Original and Encoded Waveforms Using Standard Trellis (K=8, L=32)]
[Figure 5-28  Original and Encoded Waveforms Using Full Trellis (K=8, L=32)]
[Figure 5-29  Original and Encoded Waveforms Using Standard Trellis (K=8, L=16)]
[Figure 5-30  Original and Encoded Waveforms Using Full Trellis (K=8, L=16)]
[Figure 5-31  Original and Encoded Waveforms Using Standard Trellis (K=8, L=8)]
[Figure 5-32  Original and Encoded Waveforms Using Full Trellis (K=8, L=8)]
[Figure 5-33  Original and Encoded Waveforms Using Standard Trellis (K=8, L=4)]
[Figure 5-34  Original and Encoded Waveforms Using Full Trellis (K=8, L=4)]
[Figure 5-35  Original and Encoded Waveforms Using Standard Trellis (K=8, L=2)]
[Figure 5-36  Original and Encoded Waveforms Using Full Trellis (K=8, L=2)]
The constant path generates the apparent square wave at the top and bottom of several of the plots.
These oscillations are one of the sources of noise heard in the reconstructed speech. Each oscillation corresponds to a distinct tone presented to the listener. These discrete tones are in addition to the information lost because the oscillations do not accurately follow the waveform. In the constraint length K=1 case, there are no oscillations other than the square wave itself; consequently, the annoying discrete tones are not audible in the reconstructed speech.
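The branch mechanics described above can be sketched as a minimal binary trellis waveform coder: a (K-1)-bit shift register defines the state, each input bit releases one branch symbol, and a Viterbi-style search over the block selects the 1-bit/sample path with least squared error. This is a sketch only; the codebook values are illustrative placeholders, not the trained codebooks used in this study:

```python
import numpy as np

def trellis_encode(x, codebook, K):
    """Search a binary trellis for the 1-bit/sample path that best
    matches x; codebook maps each K-bit word (state + new bit) to a
    branch symbol, states are the (K-1)-bit shift-register contents."""
    n_states = 1 << (K - 1)
    mask = n_states - 1
    L = len(x)
    INF = float("inf")
    cost = np.full(n_states, INF)
    cost[0] = 0.0                                   # start in the all-zeros state
    back = np.zeros((L, n_states), dtype=np.int64)  # predecessor state
    bit_in = np.zeros((L, n_states), dtype=np.int64)  # bit taken into state
    for t in range(L):
        new_cost = np.full(n_states, INF)
        for s in range(n_states):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                ns = ((s << 1) | b) & mask
                word = (s << 1) | b                 # K-bit codebook index
                c = cost[s] + (x[t] - codebook[word]) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    back[t, ns] = s
                    bit_in[t, ns] = b
        cost = new_cost
    s = int(np.argmin(cost))                        # trace back from best state
    bits = []
    for t in range(L - 1, -1, -1):
        bits.append(int(bit_in[t, s]))
        s = int(back[t, s])
    bits.reverse()
    return bits, float(np.min(cost))

def trellis_decode(bits, codebook, K):
    """Reproduce the branch symbols from the bit stream (the decoder's job)."""
    mask = (1 << (K - 1)) - 1
    state, out = 0, []
    for b in bits:
        word = (state << 1) | b
        out.append(codebook[word])
        state = ((state << 1) | b) & mask
    return out
```

With K=2 and a four-symbol codebook, the search picks whichever reachable branch symbol lies closest to each sample, and the decoder reconstructs exactly the symbols the encoder chose.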
The partial and full trellises produce very similar results for the maximum-length (L=256) trellises (Figures 5-7 to 5-22). However, performance diverges with decreasing trellis length, as seen in Figures 5-23 to 5-36: partial-trellis performance decreases while full-trellis performance increases, for the reasons stated earlier.
The spikes in Figure 5-31 occur because the short trellis is comprised mostly of the fan-out region, which has small branch values to minimize distortion on merging paths. As a result, the first few sample encodings are small and then change rapidly once out of the fan-out region.
A summary of the SQNR for each of the waveform plots is provided in Tables 5-4 and 5-5. These values represent SQNR over the 250 plotted samples, whereas Tables 5-1 to 5-3 provided SQNR for the entire encoded file (160,000 samples).
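The SQNR entries in these tables follow the usual ratio of signal power to quantization-noise power; a short sketch of the computation (the function name is illustrative):

```python
import numpy as np

def sqnr_db(original, encoded):
    """Signal-to-quantized-noise ratio in dB over the given samples."""
    x = np.asarray(original, dtype=float)
    noise = x - np.asarray(encoded, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))
```

Evaluated over 250 samples this corresponds to the table entries here; over the full 160,000-sample file it corresponds to Tables 5-1 to 5-3.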
Table 5-4  Summary of Waveform Plot SQNR Using Standard Trellis

                                Trellis Depth
Constraint
Length        2      4      8     16     32     64    128    256
   1        0.00   1.92   1.33   6.26   7.63   6.99   5.62   8.22
   [rows 2 through 8 are not legible in the source]
Table 5-5  Summary of Waveform Plot SQNR Using Full Trellis

                                Trellis Depth
Constraint
Length        2      4      8     16     32     64    128    256
   [only one partial row of values is legible in the source:
    22.45  15.67  12.12  10.60  10.05   9.42   9.46]