Citation
Blind clarification of single sideband suppressed carrier signals using vocal statistics

Material Information

Title:
Blind clarification of single sideband suppressed carrier signals using vocal statistics
Creator:
Geissinger, Gary A
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
152 leaves : illustrations ; 28 cm

Subjects

Subjects / Keywords:
Radio, Single-sideband ( lcsh )
Radio -- Transmitter-receivers ( lcsh )
Carrier waves ( lcsh )
Algorithms ( lcsh )
Algorithms ( fast )
Carrier waves ( fast )
Radio, Single-sideband ( fast )
Radio -- Transmitter-receivers ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 151-152).
Thesis:
Electrical engineering
General Note:
Department of Electrical Engineering
Statement of Responsibility:
by Gary A. Geissinger.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
54671069 ( OCLC )
ocm54671069
Classification:
LD1190.E54 2003m G44 ( lcc )


Full Text
BLIND CLARIFICATION OF SINGLE SIDEBAND
SUPPRESSED CARRIER SIGNALS USING
VOCAL STATISTICS
by
Gary A. Geissinger
B. Mus. Ed., University of Colorado, 1978
B.A., University of Colorado, 1984
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering
2003


© 2003 by Gary A. Geissinger
All rights reserved.


This thesis for the Master of Science
degree by
Gary A. Geissinger
has been approved
by
Dr. M. Radenkovic
Dr. J. Bialasiewicz
Date
Dr. T. L. Moore


Geissinger, Gary A. (M.S., Electrical Engineering)
Blind Clarification of Single Sideband Suppressed Carrier Signals using Vocal
Statistics
Thesis directed by Dr. Miloje Radenkovic
ABSTRACT
Single sideband suppressed carrier signals are used to convey voice information in
the high frequency radio spectrum. Due to the nature of this type of modulation, it is
necessary for the transmitter and receiver to be closely tuned to the same frequency.
This is a difficult requirement in low cost, commercial and amateur quality radio
transceivers.
This thesis presents and evaluates a blind adaptive algorithm for autonomously
clarifying a received single sideband signal without resorting to the usual methods
that involve either pilot carriers or a vestigial opposite sideband. At the conclusion of
the thesis simulations of this algorithm are presented along with the results that were
obtained.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
Dr. Miloje Radenkovic


CONTENTS
Figures....................................................................viii
Tables......................................................................xii
Chapter
1. Background Information.....................................................1
1.1 Relevant Characteristics of the Human Voice.............................1
1.1.1 Vowels The Desired Signal............................................1
1.1.2 Consonants A Possible Noise Source...................................4
1.2 Single Sideband Suppressed Carrier Communications......................6
1.2.1 Single Sideband (SSB) Modulation........................................6
1.2.2 Translating the SSB Signal to the Transmit Frequency...................9
1.2.3 Frequency Synthesizers.................................................11
1.2.3.1 Phase Locked Loop Frequency Synthesizers.............................11
1.2.3.2 Numerically Controlled Oscillators...................................12
1.2.4 SSB Demodulation Recovering the Original Audio Signal................14
1.2.5 Translating the Received SSB Signal to Baseband........................16
1.3 Conversion Chain Error Analysis.........................................16
1.3.1 Transmit Conversion Chain Frequency Errors.............................16
1.3.2 Receive Conversion Chain Frequency Errors..............................19
1.3.3 Combined Conversion Chain Frequency Errors.............................20
2. The Blind Clarification Algorithm.........................................22
2.1 The Effect of a Tuning Error on SSB Voice Modulation....................22
2.2 System Engineering Aspects of the Design................................24
2.3 Determination of Vowel Fundamental Frequency........................... 27


2.3.1 Origin of the Vowel Fundamental Determination Algorithms..............28
2.3.2 Analysis of Vowels in the Frequency Domain............................31
2.3.4 Initial Audio Sample Processing........................................37
2.3.5 Vowel Simulation.......................................................40
2.3.6 Determination of Vowel Fundamentals using Sliding Correlations........40
2.3.6.1 Algorithmic Details of Correlative Vowel Determination...............42
2.3.6.2 Reference Data Set Generation........................................44
2.3.6.3 Data Set Zero Padding................................................46
2.3.6.4 Data Set Comparison by Correlation....................................47
2.3.6.5 Scoring Correlations..................................................49
2.3.7 Vowel Fundamental Determination using Autocorrelations................51
2.3.7.1 Development of the Autocorrelation Based Algorithm...................52
2.3.7.2 Improved Vowel Simulation.............................................57
2.3.7.3 Product of Autocorrelation and Mask Sliding Dot Product............58
2.4 Determination of Tuning Error when Vowel Fundamental is Known...........60
2.4.1 Approximating the Tuning Error Based on Fundamental Calculation.......60
2.4.2 A Possible Improvement in the Approximated Tuning Error...............63
2.4.3 Determination of Tuning Error Using Time Domain Correlations..........68
2.4.3.1 Correlation of Sine Waves............................................69
2.4.3.2 Tuning Error Determination Algorithm................................. 71
2.4.3.3 Typical Operation of the Tuning Offset Algorithm.....................73
2.4.3.4 Algorithm Limitations.................................................74
3. Algorithm Performance Testing.............................................76
3.1 Testing using Simulated Data..............................................77
3.1.1 Vowel Fundamental Frequency Tests......................................77
3.1.2 Testing of Tuning Offset Determination Algorithm......................79
3.2 Testing using Audio Samples............................................80
3.2.1 Acquire the Target Signal..............................................82


3.2.2 Estimate Voice Classification.........................................84
3.2.3 Input Sample Processing...............................................85
3.2.4 Estimate the Vowel Fundamental Frequency..............................85
3.2.5 Estimate the Tuning Offset............................................86
3.2.6 System Calibration....................................................86
3.2.7 Test Results Using Local Speakers.....................................88
3.2.8 Tests Using MF and HF Reception......................................100
3.2.9 Conclusions Based on Test Results....................................106
Appendix
A Lagrange Interpolation....................................................107
B Acronym and Abbreviation List.............................................109
C Software Listing..........................................................111
References..................................................................151


FIGURES
Figure
1.1 Block Diagram of the Voice Mechanism..................................1
1.2 Harmonic Series for a 160 Hz Fundamental..............................2
1.3 155 Hertz ah Vowel..................................................3
1.4 Single Sideband Modulation............................................6
1.5 SSB Transmit Conversion Chain........................................10
1.6 Phase Locked Loop Synthesizer........................................11
1.7 Numerically Controlled Oscillator....................................13
1.8 Single Sideband Demodulation.........................................14
1.9 Receive Conversion Chain.............................................17
2.1 155 Hz ah Vowel with +20 Hz Tuning Error...........................24
2.2 Machine Epsilon Calculation Result...................................27
2.3 AFC Circuit for FSK Radio Channel....................................28
2.4 Comparison of Shifted and Un-shifted Spectra.........................30
2.5 Fourier Transform of ACos(at)........................................31
2.6 Fourier Transform of a Vowel.........................................32
2.7 Magnitude of the Sinc Function.......................................33
2.8 Fourier Transform of Vowel with Side Lobes...........................34
2.9 Vowel with Side Lobes Subject to a Tuning Error......................36
2.10 Vowel Spectra using Sampled Data and Discrete Fourier Transform......36
2.11 Typical Input Data Ensemble in Time Domain...........................37
2.12 Input Data Ensemble after Window is Applied..........................38
2.13 Simulated Data in the Frequency Domain; With Tuning Error............39
2.14 Correlative Fundamental Determination Algorithm......................43
2.15 Simulated Data with No Tuning Error..................................45


2.16 Data Sets Padded Prior to Correlation.................................46
2.17 Correlation Plot at Full Scale........................................47
2.18 Correlation Plot with No Tuning Error.................................48
2.19 Correlation Result with Input Shifted -20 Hz..........................48
2.20 Sliding Correlation of 107.666 Hz Fundamentals........................49
2.21 Two Radio Personalities Fundamental Vowel Frequencies Compared........50
2.22 Misidentified Vowel Fundamental Frequency.............................51
2.23 Example of Vowel Autocorrelation......................................53
2.24 123.45 Hz Vowel in Frequency Domain...................................54
2.25 Autocorrelation of 123.45 Hz Vowel in Frequency Domain................55
2.26 Modified Vowel Fundamental Algorithm Flow Chart.......................56
2.27 Correlation Mask for 107.666 Hz.......................................57
2.28 Fundamental Search using Autocorrelation Algorithm....................58
2.29 Search for 107.666 Hz Fundamental Using Original Algorithm............59
2.30 Sliding Dot Product Analysis of Pathologic Case.......................60
2.31 107.666 Hz Fundamental with No Offset.................................61
2.32 107.666 Hz Fundamental with -20 Hz Offset.............................62
2.33 First Order Approximation Error Due to One-Sided Finite Differences...65
2.34 Approximation of Tangent using Two-Sided Finite Differences...........66
2.35 Derivative of x as a Function of Iteration............................67
2.36 Improved Tuning Error Approximation...................................67
2.37 Incorrect Improvement in Tuning Error.................................68
2.38 Correlation of 128 Hz and Test Frequencies............................71
2.39 Tuning Offset Determination Algorithm.................................72
2.40 Offset Determination Correlation as a Function of Test Frequency......73
2.41 Correlation when -20 Hz Test Matches -20 Hz Offset....................74
3.1 Program Control Panel.................................................76
3.2 Performance About a 0.25 Hz Search Step...............................77


3.3 Noise Performance of Fundamental Estimation...........................78
3.4 Errors in Tuning Offset due to Errors in Fundamental Determination....79
3.5 Effect of Noise on Tuning Error Estimation............................80
3.6 Data Acquisition Setup................................................81
3.7 Spectrum Laboratory Display with WWV Input Signal (10 MHz)............83
3.8 Input Processing Panel................................................84
3.9 Voice Classification Panel............................................84
3.10 Audio Sample Processing and Windowing.................................85
3.11 Vowel Fundamental Determination.......................................85
3.12 Tuning Offset Estimation Panel........................................86
3.13 Calibration Setup.....................................................87
3.14 Baritone ah Vowel at Fixed Frequency................................89
3.15 Baritone Frequency Plot for Rising Pitch ah Vowel...................91
3.16 Baritone Offset Versus Vowel Frequency Rising Pitch ah Vowel........91
3.17 Baritone Frequency Plot for Falling Pitch ah Vowel..................92
3.18 Baritone Offset Versus Vowel Frequency Falling Pitch ah Vowel.......92
3.19 512 Bins Baritone Frequency Plot for Falling Pitch ah Vowel.........93
3.20 512 Bins Baritone Offset Versus Vowel Falling Pitch ah Vowel........94
3.21 Contralto ah" Vowel at Fixed Frequency...............................95
3.22 Contralto ah Vowel Spectrum Analysis Fundamental....................95
3.23 Contralto ah Vowel Spectrum Analysis of 6th Harmonic................96
3.24 Contralto Estimated Tuning Offset.....................................96
3.25 Contralto With Rising Pitch...........................................98
3.26 Contralto with Falling Pitch..........................................99
3.27 Contralto Voice with 7 Tuning Offsets; No Error Rejection............100
3.28 Contralto Voice with 7 Tuning Offsets; With Error Rejection..........101
3.29 Tuning Offset Determination with Resonant Womans Voice..............102
3.30 Female Voice with -25 Hz Offset......................................102


3.31 Male Voice with -25 Hz Offset...........................................103
3.32 Large Sample of Male Voices; -25 Hz Offset................................104
3.33 Time Evolution of Estimated Tuning Error..................................105


TABLES
Table
1.1 Spoken Vowel Fundamental Frequency Ranges................................4
1.2 Examples of Voiced and Unvoiced Consonants...............................5
1.3 SSB Modulator Outputs....................................................9
2.1 SSB Demodulation Output with +20 Hz Tuning Error........................23
2.2 Point Design Based on System Specifications.............................26
2.3 Typical Window Functions................................................35
2.4 Data Simulator Requirements.............................................40
2.5 Demonstration Simulation Run Specifications.............................42
2.6 Voice Classifications...................................................44
3.1 WWV Based Calibration Offsets...........................................88
3.2 Statistics for Baritone ah" Vowel at 111 Hz............................90
3.3 Statistics for Contralto ah Vowel at 214 Hz...........................97
3.4 Performance Comparison of Two FFT Lengths..............................104


1. Background Information
1.1 Relevant Characteristics of the Human Voice
The blind clarification algorithm functions due to the fundamental characteristics of
the human voice. Consequently, a thorough understanding of the relevant
characteristics of the human voice is required. Of primary importance to the blind
clarification algorithm are the vowel sounds.
1.1.1 Vowels - The Desired Signal
The model considered here for vowel generation is based upon source-filter theory.1
This theory divorces the source of vocal acoustic waves from the filtering that
transforms the initial acoustic waves into particular vowels. Figure 1.1 illustrates the
source-filter model as it applies to the human voice.
Figure 1.1 Block Diagram of the Voice Mechanism


The larynx acts as the source in this model and is responsible for generating the
fundamental pitch. The frequency of oscillation depends upon the dimensions of the
larynx and the mass and tension of the vocal cords. Given the conditions of high
breath pressure and close spacing of the vocal cords, the bursts of air from the
larynx will be regular; hence, the upper partials generated will be harmonically
related. Both even and odd harmonics are present at the larynx. The amplitude of
the harmonics decreases at a rate of about 12 dB per octave. The harmonics are
present up to 4000 Hz, after which their amplitudes are severely attenuated.2 Figure
1.2 gives an example of a male voice with a 160 Hz fundamental with no formant
filtering.
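The 12 dB per octave fall-off can be made concrete with a short calculation. The console-style sketch below uses hypothetical names and is not part of the thesis software; it lists the harmonics of a 160 Hz fundamental up to 4,000 Hz with amplitudes proportional to 1/k², which is equivalent to a 12 dB per octave decrease (SysUtils is assumed in the uses clause for Format).

procedure ListLarynxHarmonics(Fundamental: Double);
var
  k: Integer;
  Freq, Amplitude: Double;
begin
  k := 1;
  Freq := Fundamental;
  while Freq <= 4000.0 do
  begin
    { 1/k^2 amplitude law: each octave (doubling of k) drops the amplitude
      by a factor of 4, which is 12 dB }
    Amplitude := 1.0 / (k * k);
    WriteLn(Format('%8.1f Hz  %10.6f', [Freq, Amplitude]));
    Inc(k);
    Freq := k * Fundamental;
  end;
end;

Calling ListLarynxHarmonics(160.0) lists all 25 harmonics that fall at or below 4,000 Hz.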
Figure 1.2 Harmonic Series for a 160 Hz Fundamental
In this example all of the harmonics are present at the larynx with a smoothly
decreasing amplitude contour. The formant filters present in the human vocal tract
alter this contour to intone different vowel sounds. A better example of the input


signals of interest is shown in Figure 1.3, which gives the spectral analysis of a 155 Hz
"ah" vowel in a bass-baritone voice.
Figure 1.3 155 Hertz "ah" Vowel
The harmonic sequence of the 155 Hz "ah" vowel is very similar to the analytical
example in Figure 1.2; the primary difference between the two figures lies in the
relative amplitudes of the various harmonics. In this figure there is some extraneous
low amplitude noise from about 450 Hz to 1000 Hz. This was caused by a noise
source present in the room during data acquisition and therefore should be ignored.
The smooth curve drawn through the peaks of each harmonic is often called a vowel
contour. Unlike speech synthesis and voice recognition algorithms, the algorithm
developed here is not concerned with vowel contours. The absolute pitch of the
fundamentals and harmonics is of primary importance instead.


Any given individual with normal larynx function can generate a range of fundamental
vowel frequencies that can vary by at least an octave; this is the origin of singing.
When speaking, the usual pitch is about 50% higher in frequency than the lowest
pitch that can be generated while singing.3
The fundamental frequencies that are generated by an adult larynx primarily fall into
six overlapping ranges depending upon the physical properties of the larynx and
vocal cords. Using the 50% rule for the fundamental frequencies of speech
presented above, along with a tolerance of 50% to account for differences in
individual voices, the spoken vowel fundamental frequency ranges are listed below in
Table 1.1.
Voice Range  Classification  Frequency Range
Highest Female  Soprano  246.94-493.88 Hz
Most Probable Female  Mezzo-Soprano  207.65-415.30 Hz
Lowest Female  Contralto  185.00-370.00 Hz
Highest Male  Tenor  146.83-293.66 Hz
Most Probable Male  Baritone  92.50-185.00 Hz
Lowest Male  Bass  87.31-174.62 Hz
Table 1.1 Spoken Vowel Fundamental Frequency Ranges
1.1.2 Consonants - A Possible Noise Source
Unlike vowels, consonants generally have a fixed short duration and can therefore
be considered transient events. The group of consonant vocal sounds can be further
broken down into small groups based upon their acoustic properties1. Some
consonants still have merit as a desired signal input (see Table 1.2). This is due to
the presence of voiced sound generated by the vocal cords.


While vowels and voiced consonants will be processed as signals, unvoiced
consonants and other vocal sounds are not required by this algorithm and therefore
are considered to be noise. Fortunately, the effects of noise on the clarification
algorithm can be reduced by adaptive filtering.
Consonant Example Classification Voiced
b boy stop no
tʃ church affricative no
d dog stop no
f fog fricative no
g go stop yes
h hat fricative no
j join glide yes
k cake stop no
l like liquid yes
m may nasal yes
n nose nasal yes
p pop stop no
r river liquid yes
s saw fricative no
t top stop no
v valley fricative yes
w while glide yes
z zebra fricative yes
Table 1.2 Examples of Voiced and Unvoiced Consonants
The acoustic energy produced by the voice is transformed into electrical energy by a
transducer (microphone) and applied as input to a single sideband transmitter.
Microphones used for radio communication applications have a tailored frequency
response from 300 Hz to 3,000 Hz. Any acoustic energy outside of this range is
generally ignored.


1.2 Single Sideband Suppressed Carrier Communications
1.2.1 Single Sideband (SSB) Modulation
SSB is a modulation technique that is commonly used in the high frequency
spectrum. While it can be generated by phase techniques, in modern
communications equipment it is generated using narrow filters as illustrated in Figure
1.4.6
Figure 1.4 Single Sideband Modulation
The modulating signal, in this case voice, is applied to a doubly-balanced mixer. The
other input to the mixer originates from a carrier oscillator source. Due to its
balanced design with respect to both the signal and carrier inputs, the leakage of the
mixer inputs directly to the mixer outputs is suppressed and can be adjusted to
suppress the unwanted outputs to a point where they are negligible. The desired
output of the mixer is the product of the two input signals


$$m(f_1, t)\cos(2\pi f_2 t)$$    (1.1)
As previously discussed, the modulating signal of interest will consist of the vowels
and voiced consonants produced by a human speaker. Consider a voice signal that
consists of a vowel. As previously mentioned, a vowel consists of a fundamental
frequency and its harmonics within the frequency range of 300 to 3,000 Hz. This can
be represented as
$$\sum_{k=1}^{n} A_k\cos(2\pi k f_1 t)$$    (1.2)
The amplitude of the fundamental and harmonics present is specified by each of the
values that A_k can attain. This product will be
$$m(t) = \cos(2\pi f_2 t)\sum_{k=1}^{n} A_k\cos(2\pi k f_1 t)$$    (1.3)
which is the product of cosine functions. By use of the trigonometric formula
$$\cos(\alpha)\cos(\beta) = \frac{1}{2}\cos(\alpha - \beta) + \frac{1}{2}\cos(\alpha + \beta)$$    (1.4)
the product in (1.3) becomes a sum and difference mixer product
$$\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_2 - kf_1)t) + \sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_2 + kf_1)t)$$    (1.5)


Given that the carrier oscillator source is always much higher in frequency than the
modulating signal, the output of the mixer will be a grouping of signals above and
below the carrier frequency. The sideband filter only passes the signals from one
sideband. As upper sideband is generally the standard modulation technique used
in the high frequency spectrum, the output of the sideband filter would be
$$\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_2 + kf_1)t)$$    (1.6)
The undesired signal output from the mixer is commonly called the image frequency.
In this case the image consists of multiple frequencies based upon the original
modulating input
$$\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_2 - kf_1)t)$$    (1.7)
A numerical example given in Table 1.3 can be used to illustrate SSB modulation.
Consider a vowel that has a fundamental frequency of 256 Hz and harmonics that
reach up to 3,000 Hz. The carrier oscillator in this example will be at 455 KHz with
the sideband filter passing only the upper sideband information from 0 to 3,000 Hz
above the carrier frequency.
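Before tabulating the results, it may help to apply (1.4) to a single component. Taking the 512 Hz second harmonic of this example and setting its amplitude to 1 for clarity,

$$\cos(2\pi\cdot 455000\,t)\cos(2\pi\cdot 512\,t) = \tfrac{1}{2}\cos(2\pi\cdot 454488\,t) + \tfrac{1}{2}\cos(2\pi\cdot 455512\,t)$$

The upper sideband filter retains only the 455.512 KHz sum term, matching the second harmonic row of Table 1.3; the 454.488 KHz difference term lies in the rejected lower sideband.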
As mentioned previously, a single sideband communications link is generally limited
to signals in the audio band from 300 Hz to 3,000 Hz. Depending upon the shape
factor of the sideband filter and the responsivity of the microphone, it is entirely
possible that the 256 Hz fundamental will not be available at the receiver. The blind
clarification algorithm cannot count on energy at the fundamental frequency being
present.


Harmonic Frequency Carrier Oscillator Output Frequency
Fundamental 256 Hz 455.000 KHz 455.256 KHz
2nd 512 Hz 455.000 KHz 455.512 KHz
3rd 768 Hz 455.000 KHz 455.768 KHz
10th 2560 Hz 455.000 KHz 457.560 KHz
11th 2816 Hz 455.000 KHz 457.816 KHz
Table 1.3 SSB Modulator Outputs
1.2.2 Translating the SSB Signal to the Transmit Frequency
Single sideband transmissions are most commonly used in the High Frequency (HF)
radio spectrum. The HF spectrum can be considered to be from 1.6 MHz to 30.0
MHz. The 455 KHz SSB signal generated in 1.2.1 is outside of this frequency range.
As a result frequency conversion is required.
The frequency conversion schemes commonly used involve multiple oscillator
injection frequencies and multiple mixers. Figure 1.5 shows a frequency conversion
scheme appropriate for the translation of SSB signals from baseband audio to the
SSB signal at the desired transmit frequency7.
It is important to note that the phase locked loop (PLL) synthesizers are all locked to
the frequency standard. This is true in virtually all modern commercial and military
SSB equipment. In low cost amateur applications it is not unusual for some of the
oscillator injection frequencies to be generated by separate unlocked frequency
sources.
Figure 1.5 SSB Transmit Conversion Chain


The rationale for multiple conversions and injection frequencies involves practical
limitations in phase locked loop design and high rejection ratios for out of band
transmit products. While phase locked loop design and heterodyne frequency
product analysis are not germane to this thesis, the effect on transmit frequency
stability and accuracy due to the use of multiple synthesizers is a primary concern.
1.2.3 Frequency Synthesizers
As illustrated in Figures 1.4 and 1.5, frequency synthesizers are integral to SSB
communications systems. Two common schemes are used to synthesize the
frequencies used for frequency conversion.
1.2.3.1 Phase Locked Loop Frequency Synthesizers
The portion of PLL theory required for this thesis involves frequency accuracy.
Modern phase locked loops use a dual divider scheme to generate the output
frequency as shown in Figure 1.6.
Figure 1.6 Phase Locked Loop Synthesizer


The PLL acts as an analog servo control system. The tuning voltage will drive the
voltage-controlled oscillator until the reference frequency and the feedback
frequency (and phase) are matched. The divider m sets the tuning step size for the
synthesizer. The divider n sets the output frequency. The output frequency is simply
the reference frequency times n. The output frequency is therefore
$$f_{out} = f_{master\ oscillator}\cdot\frac{n}{m}$$    (1.8)
An implementation of the 455 KHz synthesized injection oscillator required in Figure
1.3 and Table 1.3 using a 5 KHz reference frequency could have m = 400 and n = 91
for a 2 MHz master oscillator input. Any errors in the master oscillator frequency will
be referred to the PLL output; i.e., if the master oscillator is specified to be within 1
part per million (ppm), then the output frequency will have that specification as well6.
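Substituting the example values into (1.8) confirms the intended injection frequency, with the phase detector reference set by the master oscillator and m:

$$f_{out} = 2\,\mathrm{MHz}\cdot\frac{91}{400} = 455\,\mathrm{KHz}, \qquad f_{reference} = \frac{2\,\mathrm{MHz}}{400} = 5\,\mathrm{KHz}$$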
1.2.3.2 Numerically Controlled Oscillators
A newer technique that uses direct digital synthesis techniques in place of PLL
sources is becoming common in transmitters and in the later oscillators in receivers.
Such oscillators are called numerically controlled oscillators (NCOs)9. The block
diagram for a typical NCO is given in Figure 1.7.
The key to understanding the operation of an NCO is to consider the behavior of the
accumulator. Consider an NCO that uses a 32-bit accumulator that is updated by a
master oscillator at a 100 MHz rate. Once every 10 ns the value programmed into
the increment word is added to the contents of the accumulator. Being a 32-bit value,
the accumulator must transition through a total of 4,294,967,296 counts before it rolls
over and returns to 0. If, for example, the increment register is programmed to
contain 4295, then the accumulator will roll over approximately 100 times per
second. This means that the count conveyed into the sine lookup table will outline
about 100 sawtooth count sequences per second. The sine lookup table will use the
most significant bit outputs from the accumulator to convert these counts to sine
values, which are then converted to an analog signal by the digital-to-analog
converter. The analog signal is then passed through a low pass filter in order to
remove any of the master oscillator frequency components. The size of the sine
lookup table and the number of bits in the digital-to-analog converter determine the
fidelity of the sine wave.

Figure 1.7 Numerically Controlled Oscillator
The frequency of the output is of primary concern. In this example, the exact output
frequency (assuming a perfect master oscillator) is
$$f_{out} = f_{reference}\cdot\frac{\mathrm{increment}}{2^{32}} = 100\,\mathrm{MHz}\cdot\frac{4295}{4294967296} \approx 100.0000761\,\mathrm{Hz}$$    (1.9)
The small setting error (76.1 uHz) is typical of a general purpose NCO. In SSB
equipment, the reference frequency is chosen so that the NCO synthesizer output is


exactly on the desired frequency, assuming that the master oscillator is absolutely
correct.
Unfortunately, the master oscillator is not perfect and in fact has an error just as in
the PLL synthesizer case. Equation (1.9) demonstrates that, just as in PLLs, the
frequency output of an NCO is directly proportional to the error in the master
oscillator. In further discussions involving oscillator injection signals, no distinction
will be made between those generated by NCOs and PLLs.
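The tuning relationship in (1.9) is compact enough to express directly in code. The sketch below uses hypothetical names and is not taken from the thesis software; it simply evaluates the average rollover (output) rate of a 32-bit accumulator for a given increment word.

function NcoOutputFrequency(FReference: Double; Increment: Cardinal): Double;
begin
  { A 32-bit accumulator has 2^32 = 4,294,967,296 states, so it rolls over,
    on average, FReference * Increment / 2^32 times per second. }
  Result := FReference * Increment / 4294967296.0;
end;

For example, NcoOutputFrequency(100.0E6, 4295) evaluates to approximately 100.0000761 Hz, reproducing the 76.1 uHz setting error noted above.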
1.2.4 SSB Demodulation - Recovering the Original Audio Signal
A configuration similar to Figure 1.4 is used to recover the original audio from a SSB
modulated signal and is shown in Figure 1.8.
Figure 1.8 Single Sideband Demodulation
The input to the SSB demodulator will be
$$\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_{2(\mathrm{mod})} + kf_1)t)$$    (1.10)


which is the same as (1.6). The mixer in the demodulator multiplies the input and the
local oscillator to yield
$$\cos(2\pi f_{2(\mathrm{demod})}t)\left(\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(f_{2(\mathrm{mod})} + kf_1)t)\right)$$    (1.11)
Again, using trigonometric identities, two mixer products will be present. The desired
output, assuming that the frequencies f2(mod) and f2(demod) for the modulation and demodulation processes
are identical, will be returned to baseband. This output, neglecting noise,
interference, and gain differences, will be identical to the original input
$$\sum_{k=1}^{n} A_k\cos(2\pi k f_1 t)$$    (1.12)
The significant difference between the modulation and demodulation processes is
that the image product from the mixer in the demodulator consists of a band of
frequencies above the carrier oscillator frequency at
$$\sum_{k=1}^{n}\frac{A_k}{2}\cos(2\pi(kf_1 + f_{2(\mathrm{mod})} + f_{2(\mathrm{demod})})t)$$    (1.13)
This places the image product at a frequency that is significantly above baseband.
As a result, a low pass filter is used to remove the image product.


1.2.5 Translating the Received SSB Signal to Baseband
When a SSB signal in the HF radio spectrum is received, it must be translated to
baseband. This is generally done using a conversion scheme shown in Figure 1.9
that is in opposite order to the transmit chain described in 1.2.2.
As with the transmit conversion chain, all of the synthesizers in the receive system in
commercial and military SSB receivers are locked to the frequency standard.
1.3 Conversion Chain Error Analysis
Each of the synthesizers in transmit and receive conversion chains will induce a
frequency error in the final product of the chain. In a transmitter, the error is at the
transmit frequency. At the receiver, the error is in the demodulation of the signal.
Taken together, these cause a clarification error in the received signal at baseband.
1.3.1 Transmit Conversion Chain Frequency Errors
Consider the transmit chain shown in Figure 1.4 on a channel frequency of 14.3 MHz
using upper sideband modulation (USB). The final transmit frequency, given a
modulation input at f1 will be
$$f_T = 57.21\,\mathrm{MHz} - (32.21\,\mathrm{MHz} + (11.155\,\mathrm{MHz} - (f_1 + 455\,\mathrm{KHz}))) = 14.3\,\mathrm{MHz} + f_1$$    (1.14)
Figure 1.9 Receive Conversion Chain


Since each of the conversion frequencies is locked to the same master oscillator, the
errors will sum in similar fashion. For the purpose of this analysis, assume that the
master oscillator is high by x parts per million. The transmit frequency error would
be
$$f_{T\,error} = 57.21x - (32.21x + (11.155x - 0.455x)) = 14.3x\ \mathrm{Hz}$$    (1.15)
The final result is simply the error multiplied by the channel frequency. This is the
case in all well designed transmitters and can never be less than this, since the
combination of the frequencies must result in the channel frequency. However, the
error can be more if the injection sources are not locked to a master oscillator.
Consider the same frequency chain with unlocked oscillators. Since the errors are
uncorrelated and can be either positive or negative, they directly sum together
$$f_{T\,error} = 57.21x_1 - (32.21x_2 + (11.155x_3 - 0.455x_4))$$
$$f_{T\,error} = 57.21x_1 + 32.21x_2 + 11.155x_3 + 0.455x_4\ \mathrm{Hz}$$    (1.16)
In order for a transmitter without a master oscillator to achieve the same transmit
frequency accuracy as a transmitter using a master oscillator, the high frequency
oscillators (HFO) would have to be more stable than a master oscillator. This is
unlikely since master oscillators use temperature compensated crystal oscillators.
The HFO in a transmitter sets the channel frequencies. Unless the transmitter is
designed for single channel operation, it is likely that the HFO is a variable frequency
oscillator. Using a master oscillator can improve the frequency accuracy of a
transmitter by orders of magnitude.


In low cost amateur SSB transmitters, the high frequency oscillators are typically
synthesized and locked to a master oscillator, while the lower frequency oscillators
are fixed independent crystal oscillators. While not as stable as a fully synthesized
commercial or military transmitter, low cost amateur SSB equipment has acceptable
frequency accuracy for amateur applications10.
1.3.2 Receive Conversion Chain Frequency Errors
The frequency errors in a SSB receiver add in a similar fashion, except that they
result in an error at baseband. Again using the example of a USB signal at 14.3
MHz with a modulation frequency of f1,
$$f_{R\,out} = 11.155\,\mathrm{MHz} - (57.21\,\mathrm{MHz} - (14.3\,\mathrm{MHz} + f_1) - 32.21\,\mathrm{MHz}) - 455\,\mathrm{KHz} = f_1$$    (1.17)
For a fully synthesized receiver with an error of x ppm in the master oscillator, the
frequency error in the final received audio signal would be
$$f_{R\,error} = 11.155x - (57.21x - 32.21x) - 0.455x = -14.3x\ \mathrm{Hz}$$    (1.18)
Note that this error is the same as the transmit error for a fully synthesized
transmitter, except opposite in sign. Again in a receiver without a master oscillator
the errors have the potential of being much larger. As with the transmitter, the errors
can be either positive or negative, so without a loss of generality:


$$f_{R\,error} = 11.155x_3 - (57.21x_1 - 32.21x_2) - 0.455x_4$$
$$f_{R\,error} = 57.21x_1 + 32.21x_2 + 11.155x_3 + 0.455x_4\ \mathrm{Hz}$$    (1.19)
which is the same as for the transmitter as shown in (1.16).
1.3.3 Combined Conversion Chain Frequency Errors
If the transmitter and receiver are taken together, then the errors add. For the
example of a commercial or military transmitter using USB modulation at 14.3 MHz
with master oscillator errors of x ppm,

$$f_{error} = 14.3x_{transmit} - 14.3x_{receive}\ \mathrm{Hz}$$    (1.20)
If the transmitter and receiver were locked to the same master oscillator, then the
errors would cancel. While some exotic military equipment does this using the
Global Positioning Satellite (GPS) constellation, this is not the usual practice. As a
result, using master oscillators that are accurate to 1 ppm, the maximum error on
either transmit or receive for well designed HF SSB transmitters and receivers is 30
Hz each. The total error could be as high as 60 Hz for the combined transmitter-to-
receiver link. The United States Government (USG) standard for HF transmitting
and receiving equipment is given in the National Telecommunication and Information
Administration (NTIA) "Red Book"11. For SSB signals in the HF spectrum, the
maximum frequency error allowed is 20 Hz. This implies that well designed, fully
synthesized SSB transmitter equipment used on USG nets must have master
oscillators with an accuracy of at least 0.67 ppm to be within the NTIA standard from
1.6 to 30 MHz.
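The 0.67 ppm figure is the 20 Hz NTIA limit referred to the top of the HF range:

$$x_{max} = \frac{20\ \mathrm{Hz}}{30\ \mathrm{MHz}} \approx 0.67\ \mathrm{ppm}$$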


Amateur equipment designed without injection sources locked to master oscillators
does much more poorly. Unlike (1.14) and (1.17), whose errors subtract from each
other, the errors can exceed 100 Hz for either the transmitter or receiver alone. This
is the rationale for why the USG does not allow amateur quality transmitters to be
used in their HF communication nets. Conversely, such low cost equipment is
satisfactory for amateur use. This is because, while the USG is allocated
communication channels, amateurs are allocated communication bands. An
amateur need only stay in a band to operate legally. Amateur HF bands are very
wide in terms of SSB signals; most band segments allocated to SSB transmissions
are over 100 KHz wide. Amateur radio operators are well aware of this. Only
amateur operators with well calibrated, fully synthesized transmitters operate near
the edge of band segments.
One final point concerning amateur SSB equipment: frequency errors evolve over
time. The oscillators may not be sufficiently temperature compensated; therefore, as
a transmission is being made, the heat dissipated by the final power amplifiers in the
transmitter will induce a frequency shift in the SSB transmit signal.


2. The Blind Clarification Algorithm
2.1 The Effect of a Tuning Error on SSB Voice Modulation
As shown in (1.18), if the transmitter and receiver conversion chains are aligned and
calibrated together, there will be no frequency errors in the recovered audio. For a
voice signal transmitted using SSB modulation, this means that any fundamentals
and harmonics present will be at the original frequencies. This implies that the voice
signal will be maximally intelligible to the listener if noise and interference are not
present. The only signals presented to the listener will be in the 300 to 3,000 Hz
passband. The end-to-end link bandwidth is determined by the sideband filters in the
transmitter and receiver as well as the tailored frequency responses of the
microphone, earphones, or loudspeaker.
The case of interest in this thesis is when there is an error between the frequencies
transmitted and those received. Equations (1.13) and (1.17) demonstrate that
frequency errors are simply added to modulating spectra when using USB
modulation. For a transmitted signal at 14.3 MHz with frequency errors of x ppm at
transmit and receive, combining (1.12), (1.14), and (1.18) yields
$$f_{output} = 14.3x_{transmit} - 14.3x_{receive} + \sum_{k=1}^{n} A_k\cos(2\pi k f_1 t)$$    (2.1)
By combining the transmit and receive errors while removing the constants, this
becomes
$$f_{output} = f_{error} + \sum_{k=1}^{n} A_k\cos(2\pi k f_1 t)$$    (2.2)
Using the same data as in Table 1.3, consider the result of demodulation with a
tuning error of +20 Hz. This can be caused by either the transmitter carrier oscillator
being 20 Hz high, the receiver carrier oscillator 20 Hz low, or a combination of errors.
Harmonic Input Frequency Carrier Oscillator Recovered Frequency Error
Fundamental 455.256 KHz 454.980 KHz 276 Hz +20 Hz
2nd 455.512 KHz 454.980 KHz 532 Hz +20 Hz
3rd 455.768 KHz 454.980 KHz 788 Hz +20 Hz
10th 457.560 KHz 454.980 KHz 2580 Hz +20 Hz
11th 457.816 KHz 454.980 KHz 2836 Hz +20 Hz
Table 2.1 SSB Demodulation Output with +20 Hz Tuning Error
Figure 2.1 shows the spectrum in Figure 1.3 with the 20 Hz shift in demodulation. A
total tuning error of +20 Hz would distort the recovered voice, but the signal would
still be intelligible. In commercial and military equipment, transmit and receive
frequency errors add and, as a result, a total error of 40 Hz could be present and the
equipment would still meet the USG frequency tolerance requirement. Unfortunately
errors of this magnitude are apparent to the listener and can impede intelligibility.


Figure 2.1 155 Hz "ah" Vowel with +20 Hz Tuning Error
2.2 System Engineering Aspects of the Design
The algorithms developed here are based on digital signal processing techniques
using sampled data. The selection of the sampling interval, number of samples per
ensemble, quantization level, and numerical precision are driven by a number of
factors. Each factor is described separately.
The length of vowels in time drives the total time span allowed for a data ensemble.
The algorithms here are batch processes; they assume that vowels are stationary for
the duration of an ensemble. The exact duration of vowels depends upon the
speaker, the language being spoken, and the dialog content. The assumption is that
vowels will have a duration of between 1/5th and 1/20th of a second. This limits the
time duration of a data ensemble to 1/5th of a second or less.


Another constraint on the number of samples in an ensemble is the frequency
resolution of the fast Fourier transform (FFT) algorithm itself. The goal is to keep the
combined tuning error under the 20 Hz USG requirement. This implies that the bin
spacing of the FFT should be less than 20 Hz with adequate margin.
Tight frequency resolution requirements tend to drive toward FFT based
computations with a large number of bins. The computational capability of a typical
personal computer (PC) must also be considered. Large numbers of complex
computations, such as FFTs, can seriously affect the throughput of a PC. This
limitation forces the FFT to have a minimum number of bins to meet the given
frequency resolution requirement.
Since the highest frequency present at the output of a SSB demodulator is about 3
KHz, by the Nyquist theorem, the sample rate must be greater than 6 KHz.
Unfortunately, the IF amplifiers, detectors, and audio amplification stages in an HF
receiver have significant noise contribution up to approximately 8 KHz. This would
imply that it would be prudent to sample the receiver audio at a rate higher than 16
KHz so that high frequency noise does not alias into the analysis passband.
Standard PC sound boards allow for sampling at 8 KHz, 11.025 KHz, 22.050 KHz,
and 44.100 KHz.
The linear quantization levels available on a PC are 8 and 16 bits. The hardware
floating point mathematics in a PC is based on the IEEE-754 standard. The
precision for a floating point computation ranges from single precision with a 23 bit
mantissa to extended precision with a 63-bit mantissa. The selected floating point
resolution must be capable of supporting the multiplication of quantized data without
a resultant loss of precision due to truncation.
The system level requirements led to a point design with characteristics as specified
in Table 2.2.


Sample Rate: 22,050 Hz
Ensemble Size: 4,096 Samples
Quantization Level: 16 Bits Linear
Mathematical Precision: IEEE-754 Extended Precision
Table 2.2 Point Design Based on System Specifications
The sample rate of 22,050 Hz along with an ensemble size of 4,096 indicates that an
ensemble will require 186 ms to collect, which is less than the 1/5th second assumed
duration of a vowel. The frequency resolution of FFTs using 4,096 points is 5.383
Hz, less than the requirement of 20 Hz.
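Both numbers follow directly from the point design values in Table 2.2:

$$T_{ensemble} = \frac{4096}{22050\ \mathrm{Hz}} \approx 186\ \mathrm{ms}, \qquad \Delta f_{bin} = \frac{22050\ \mathrm{Hz}}{4096} \approx 5.383\ \mathrm{Hz}$$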
Using the maximum available quantization level of 16 bits was a reasonable choice,
but it forced the requirement that the multiplication of two 16-bit values should not be
truncated. The use of IEEE-754 extended precision with its 63 bit mantissas easily
met this requirement.
The compiler chosen was Borland Delphi 6.0. A computer program was written to
verify that the precision was indeed 63 bits using a computation of machine epsilon
in Delphi 6.0:
function Emeps: extended;
var
  temp, one, two: extended;
begin
  one:= 1.0;
  two:= 2.0;
  temp:= 1.0;
  { Halve temp until adding it to 1.0 no longer changes the result; the last
    value that did change the sum (temp + temp) is the machine epsilon. }
  repeat
    temp:= temp / two
  until (temp + one) = one;
  Emeps:= temp + temp
end;


The result was as anticipated:
Machine Epsilon (Extended Precision): 1.08420217248550443E-0019    Significant Bits: 63.00
Figure 2.2 Machine Epsilon Calculation Result
The number of significant bits was computed using a library call to the Delphi log2
function. Printing was by conversion to strings:
procedure TMEPSForm.MEPSButtonClick(Sender: TObject);
var
  meps: extended;
  st: string[80];
  one, two: extended;
begin
  meps:= Emeps;
  str (meps:35, st);
  mepsEdit.Text:= st;
  { Log2 is supplied by the Math unit; -Log2(eps) gives the mantissa size in bits }
  str (-log2(meps):10:2, st);
  SigBitEdit.Text:= st
end;
2.3 Determination of Vowel Fundamental Frequency
Equation (2.2) contains two unknown frequencies, the tuning offset (or error) and the
fundamental frequency of the vowel being transmitted. The algorithms for
determination of the vowel fundamental frequency will be developed first.


2.3.1 Origin of the Vowel Fundamental Determination
Algorithms
The concepts presented here grew out of a discussion concerning automatic
frequency control (AFC) techniques in general. Since a continuous carrier is present
in many forms of communication, phase locked loops can be used to recover the
carrier and adjust receiver tuning.
For example, in binary frequency shift keying (FSK) transmission systems, the output
carrier alternates between two transmit frequencies. These are commonly called the
mark and space frequencies and are placed symmetrically above and below the
assigned channel frequency. In practice, the transmit data is randomized; the
transmitted signal has a probability of close to 0.5 that any transmitted bit will be a
mark or space. As a result, the density of mark and space tones will be
approximately equal over long periods of time. Figure 2.3 presents the block
diagram of a commonly used demodulator for FSK signals that can track and remove
errors due to frequency drift.
Figure 2.3 AFC Circuit for FSK Radio Channel


Unfortunately, the human voice does not have characteristics even remotely similar
to an FSK signal. There is no continuous carrier. The transmitted "tones" are
unknown as to their frequency, duration, and probability distribution.
Figure 2.4 compares two examples of recorded vowels. The second example is
shifted by +20 Hz. The plots in Figure 2.4 were generated by computing the discrete
Fourier transform of two sets of sampled audio data. This was accomplished using
an FFT algorithm on sampled and windowed data sets created at different times. It
is apparent that in the frequency domain the two are correlated even though one
spectra is shifted to simulate a +20 Hz SSB tuning error while the spectral line
spacing is left intact.
The first algorithm for determining vowel fundamental frequencies compares (by
correlation) the unknown audio input and a reference signal with a varying
fundamental frequency in the frequency domain. When the two fundamental
frequencies match, the correlation will be at a maximum.
The second algorithm is based on the use of autocorrelations. Similar techniques
have been developed over the years to detect periodic signals with an unknown
period12. If one compares a periodogram of a voice signal in the frequency domain
next to one of a periodic signal with an unknown period in the time domain, the
periodograms will appear similar. This algorithm compares the autocorrelation of the
incoming audio samples in the frequency domain with a reference spectral line
simulation.


Figure 2.4 Comparison of Shifted and Un-shifted Spectra


2.3.2 Analysis of Vowels in the Frequency Domain
Equation (2.2) demonstrates that the received vowel consists of a composite of a
finite number of periodic functions. Consider a single periodic function of the form
$$h(t) = A\cos(2\pi f_0 t)$$
The Fourier transform of this periodic function is given by
$$H(f) = \int_{-\infty}^{\infty} A\cos(2\pi f_0 t)\,e^{-j2\pi f t}\,dt$$    (2.3)
$$H(f) = \frac{A}{2}\left[\delta(f - f_0) + \delta(f + f_0)\right]$$    (2.4)
This can be represented graphically by
Figure 2.5 Fourier Transform of ACos(at)
In this application, the periodic function is not merely one sine or cosine wave, but
the sum of the fundamental and harmonics up to the limit of the SSB filter, which is
about 3,000 Hz. Since the Fourier transform is linear, the Fourier transform of the


sum of components is equal to the sum of the Fourier transforms of each
component. As a result the Fourier transform of the components of a vowel appears
as follows in Figure 2.6.
Figure 2.6 Fourier Transform of a Vowel
The periodic function available for processing does not extend to infinity on the time
axis. It is essentially multiplied by a window function that makes the waveform
length finite.
The natural approach is to use a rectangular window; i.e., simply truncate the
periodic function on both ends. This is the same as multiplying the periodic function
by a rectangular wave. Consider the Fourier transform pair of the rectangular
window by itself:


$$h(t) = A, \ |t| \le t_0; \qquad h(t) = 0, \ |t| > t_0$$    (2.5)
This generates the classic Sinc function; the amplitude of the Sinc function is shown
in Figure 2.7.
Figure 2.7 Magnitude of the Sinc Function
Due to the linearity of the Fourier transform, the frequency-convolution theorem
allows that

$$g(t)\,h(t) \Longleftrightarrow G(f) * H(f)$$    (2.6)


When the Fourier transform of the periodic function and the rectangular window are
convolved, each of the spectral lines displays the characteristic side lobes of the Sinc
function. In discrete Fourier transforms, the side lobes leak into adjacent bins;
hence the name spectral leakage. This means that each spectral line in Figure 2.6
will be spread and give the appearance of Figure 2.8.
Figure 2.8 Fourier Transform of Vowel with Side Lobes
As a result, different windowing functions have been developed to minimize the side
lobes. An example of an improved window function is the Hann window13:
$$w(t) = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi t}{T}\right)\right)$$    (2.7)
T is the total time duration of the window, with t being the time from 0 to T. The
window function is then multiplied with the periodic function, in this case a vowel,


being analyzed. While not perfect, the Hann window attenuates the highest side
lobe by 32 dB and has an asymptotic roll off of 18 dB per octave14.
The software developed here allows for operator selection of four different window
functions. They include the rectangular, Parzen, Hann, and Welch windows. The
default window is the Welch. Table 2.3 gives the discrete versions of the window
functions where N is the total number of samples in an ensemble and j is the index to
a particular data point.
Window  Mathematical Description
Rectangular  1.0
Parzen  1.0 - Abs((j - 0.5*(N - 1.0)) / (0.5*(N + 1.0)))
Hann  0.5*(1.0 - Cos(2.0*pi*j / (N - 1.0)))
Welch  1.0 - Sqr((j - 0.5*(N - 1.0)) / (0.5*(N + 1.0)))
Table 2.3 Typical Window Functions
In this application, Figure 2.8 would represent the case of no tuning error in a SSB
communication system. When the tuning errors are added, the spectral lines are
shifted as shown in Figure 2.9.
The analysis up to this point has concentrated on continuous functions. The
algorithms are actually implemented using sampled data and the discrete Fourier
transform. The fast Fourier transform (FFT) is an algorithm by which discrete Fourier
transforms (DFT) are efficiently calculated by computer. The FFT is an algorithm
and not an approximation; it provides a considerable improvement in efficiency when
compared to calculating a DFT directly. The detailed design of FFT algorithms is not
covered here; a particularly good algorithm description is given by Higgins in Digital
Signal Processing in VLSI15.


Figure 2.9 Vowel with Side Lobes Subject to a Tuning Error
From the point of view of Figure 2.9, the use of sampled data concentrates the
energy into bins as seen in Figure 2.10.
Figure 2.10 Vowel Spectra using Sampled Data and Discrete Fourier Transform
As with the continuous case, the spectral lines will be shifted if there is a SSB tuning
error. Fortunately the spacing between the spectral lines located in each half plane
remains identical with that of a signal with no tuning error.


2.3.4 Initial Audio Sample Processing
A PC sound card is used to acquire data sets at a 22,050 Hz sample rate. The 16 bit
linearly quantized samples are collected and stored in a Windows .wav file. A single
pass through the blind clarification algorithm requires that 4,096 real data samples
be read from the .wav file and processed.
Figure 2.11 Typical Input Data Ensemble in Time Domain


Except for the final tuning error frequency determination, all applications of the input
data are in the frequency domain. In order to prevent spectral leakage when the
input data is transformed from the time domain to the frequency domain, a window
function is applied. This window function will reduce the amplitudes to zero at the
ends of the ensemble. The default here is the Welch window function.
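A minimal sketch of this step, using a hypothetical procedure name and assuming the ensemble is held in an open array, applies the Welch window from Table 2.3 in place:

procedure ApplyWelchWindow(var Data: array of Double);
var
  j, N: Integer;
  x: Double;
begin
  N := Length(Data);
  for j := 0 to N - 1 do
  begin
    { Welch window: 1 - ((j - 0.5*(N-1)) / (0.5*(N+1)))^2 }
    x := (j - 0.5 * (N - 1.0)) / (0.5 * (N + 1.0));
    Data[j] := Data[j] * (1.0 - Sqr(x));
  end;
end;

The rectangular, Parzen, or Hann weightings from Table 2.3 could be substituted by changing the single weighting expression.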
Figure 2.12 Input Data Ensemble after Window is Applied
After windowing, the real data array is converted to a complex array simply by
inserting an imaginary value of 0 between each real value in the data array. The
windowed data is then transformed into the frequency domain using an FFT based
on the Danielson-Lanczos formula16. This approach is not as computationally
efficient as algorithms that are designed for processing real-valued inputs; however,


this same FFT algorithm is used elsewhere; in this way only one routine needs to be
coded and tested.
Figure 2.13 Simulated Data in the Frequency Domain; With Tuning Error


2.3.5 Vowel Simulation
A simulation of vowels was required both for testing the algorithms and for
comparison purposes in some of the determination algorithms. This
simulator was designed to generate data with the following characteristics:
(Simulated) Sample Frequency: 22,050 Hz
Data Set Length: 131,072 Samples
Added White Gaussian Noise: Adjustable from 0 to -100 dB SNR
Vowel Fundamental: Selectable from 87 Hz to 439 Hz
Harmonics: Up to 3,000 Hz
Total Spectral Range: 100-3,000 Hz
Amplitude Ratio: Fundamental, Harmonics Identical (1:1)
Table 2.4 Data Simulator Requirements
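The sketch below, with hypothetical names and with the added white Gaussian noise and amplitude scaling omitted for brevity, shows one way the Table 2.4 requirements could be met: equal-amplitude harmonics of a selectable fundamental, each optionally shifted by a simulated tuning offset.

procedure SimulateVowel(var Samples: array of Double;
                        Fundamental, TuningOffset, SampleRate: Double);
var
  i, k: Integer;
  t, Freq: Double;
begin
  for i := 0 to Length(Samples) - 1 do
    Samples[i] := 0.0;
  k := 1;
  Freq := Fundamental;
  { Fundamental and harmonics up to 3,000 Hz with identical (1:1) amplitudes }
  while Freq <= 3000.0 do
  begin
    for i := 0 to Length(Samples) - 1 do
    begin
      t := i / SampleRate;
      Samples[i] := Samples[i] + Cos(2.0 * Pi * (Freq + TuningOffset) * t);
    end;
    Inc(k);
    Freq := k * Fundamental;
  end;
end;

A 131,072-sample array filled at a (simulated) 22,050 Hz rate satisfies the data set length and sample frequency entries in the table.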
2.3.6 Determination of Vowel Fundamentals using
Sliding Correlations
Consider the application of the discrete correlation relationship on the two sets of
data in the frequency domain14.
$$z(k) = \sum_{i=1}^{N} h(i)\,x(k + i)$$    (2.8)
If the fundamental frequencies match, the spectral line spacing, in terms of FFT bins,
will be identical. Therefore the result, z(k), will be at a maximum (compared to other
cases where the fundamental frequencies of h(i) and x(k + i) differ).
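For reference, a direct implementation of (2.8) with hypothetical names might look like the following; the next paragraph explains why this brute-force form is normally avoided. H holds the reference spectrum, X the spectrum of the unknown input, and bins shifted past the end of X are treated as zero.

function SlidingCorrelation(const H, X: array of Double; k: Integer): Double;
var
  i, N: Integer;
begin
  N := Length(H);
  Result := 0.0;
  { z(k) = sum over i of h(i) * x(k + i), using zero-based array indices }
  for i := 0 to N - 1 do
    if (k + i >= 0) and (k + i < N) then
      Result := Result + H[i] * X[k + i];
end;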
Correlations can be computed using a brute force approach suggested by (2.8). This
is generally avoided because the operation requires n² computations in order to


generate each possible alignment. A better approach is to allow FFTs to compute
the correlation products.
For any given data set, the frequency components are band limited and do not
extend to the ends of the data ensemble. This makes each sample set of finite
length and therefore the complexities of overlap-add and overlap-save algorithms
can be avoided.
The standard approach to using FFTs to perform correlations uses both forward and
inverse FFTs:
$$\mathrm{Corr}(g, h)_j \Longleftrightarrow G_k H_k^{*}$$    (2.9)
This algorithm amounts to taking the FFT of both data sets, taking the conjugate of
the 2nd data set, multiplying the sets together, and then taking the inverse FFT of the
product. Brigham outlines a method of computing correlations using only forward
FFTs14:
$$\mathrm{Corr}(g, h)_t = \mathrm{FFT}(G_k H_k^{*})$$    (2.10)
This is the approach that was used here. While it is slightly less efficient than the
algorithm that uses the inverse FFT shown in (2.9), only the forward FFT routine
needs be coded and tested. In order to avoid interpretation problems with the
correlation result, it is necessary to carefully pad each input sequence with zeros so
that the active data spans do not overlap.


For the purpose of determining the best candidate simulated frequency, all of the
4,096 bins are summed using an root mean square (RMS) algorithm to yield a
score. The score for each frequency is compared; the highest score is assumed to
indicate the correct vowel fundamental frequency.
This approach has the distinct advantage of equally weighting all 4,096 samples in
the result. This reduces the effect of channel noise, which is assumed to be additive
white Gaussian noise, by a factor of 64, or 36 dB. Given that successful SSB
reception requires an SNR of at least 10 dB, the vowel fundamental frequency
determination algorithm should be essentially immune to channel noise in practice.
This hypothesis will be tested in simulation.
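The 64:1 (36 dB) figure follows directly from equal weighting of the 4,096 bins:

\sqrt{4096} = 64 , \qquad 20\log_{10}(64) \approx 36.1\ \text{dB} .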
2.3.6.1 Algorithmic Details of Correlative Vowel
Determination
Figure 2.14 gives the basic algorithmic flow used to determine vowel fundamental
frequencies.
In order to illustrate the nature of the algorithms while they are being described, a
simulation was performed with intermediate products saved for plotting. The
simulation conditions are given in Table 2.5.
Fundamental Frequency:    107.666 Hz
Tuning Error:             -20 Hz
Signal to Noise Ratio:    20 dB
Speaker Classification:   Bass-Baritone
Table 2.5 Demonstration Simulation Run Specifications
Figure 2.14 Correlative Fundamental Determination Algorithm
2.3.6.2 Reference Data Set Generation
The sliding correlation requires that simulated data be generated for comparison at
each candidate vowel fundamental frequency. Nominally each simulated data set is
produced at 0.5 Hz increments. For all possibilities of spoken vowels, this would
require over 700 correlation runs be performed in order to cover the 87 Hz to 439 Hz
range of possible vowel fundamentals. In order to reduce the number of correlations
required per pass, the frequency bounds were manually adjustable by voice
classification (see Table 2.6).
Classification Low Frequency High Frequency
Soprano 246.94 Hz 438.88 Hz
Mezzo-Soprano 207.65 Hz 415.30 Hz
Contralto 185.00 Hz 370.00 Hz
Tenor 146.83 Hz 293.66 Hz
Baritone 92.50 Hz 185.00 Hz
Bass 87.31 Hz 174.62 Hz
Table 2.6 Voice Classifications
The use of voice classification is not a big compromise. The vast majority of SSB
users in the HF spectrum are male; the large majority of them speak in the baritone
range. As a result the software is generally left in the baritone range unless the
target SSB signal has audibly different characteristics.
The simulated data is windowed and processed by the FFT algorithm into the
frequency domain in the same fashion as the audio sample data. Apart from the
offset frequency (and the noise), they appear very similar. With a fundamental
frequency of 107.666 Hz, the second harmonic is located exactly on the correct
frequency (see Figure 2.15). The offset samples in Figure 2.13 demonstrate that the
target (simulated) audio sample is 20 Hz low.
Figure 2.15 Simulated Data with No Tuning Error
2.3.6.3 Data Set Zero Padding
Prior to being processed by the correlation algorithm, both the input sample and the
simulation data sets in the frequency domain must be zero padded. In this case the
padding does not change the number of values in the data sets. This was desired so
the order of the FFT would not increase and to keep the data sets sized to contain an
integral power of 2 number of samples. While it is possible to derive an FFT
algorithm for any number of samples in an ensemble, algorithms with a power of two
number of samples are much more efficient17.
The zero padding was used to allow the first half of the Fourier transformed incoming
data to remain at the start of the array. The remaining locations in the array were
zeroed because they contain no useful data, since the FFT bins span frequencies
from 0 to 11,025 Hz and the data is only valid from 0 to 3,000 Hz.
The simulated data was also zero padded. In this case the FFT bins from 0 to
5,512.5 Hz were placed at the top of the array with all other elements set to zero.
Again, the upper bins were deleted since they contain no useful information. Padded
audio data and simulated data are shown in Figure 2.16.
Figure 2.16 Data Sets Padded Prior to Correlation
2.3.6.4 Data Set Comparison by Correlation
After the correlation is performed, the result is a typical correlation plot as shown in
Figures 2.17 and 2.18.
Figure 2.17 Correlation Plot at Full Scale
With identical data for both input arrays, zero padding in such a fashion will result in
the correlation maximum being at bin 2,049 (of bins 1 to 4,096), which is correct for
this zero padding arrangement.
Figure 2.18 Correlation Plot with No Tuning Error
With a -20 Hz tuning error, the peak moves left by about 4 bins (see Figure 2.19).
Figure 2.19 Correlation Result with Input Shifted -20 Hz
The exact location of the maximum amplitude bin can be used to determine the
tuning offset. In this case there is a tuning error, so the maximum is about 4 bins
lower than 2,049. Unfortunately this approach is weak statistically as it uses the
result of only 1 bin of the correlation. A stronger approach is presented later.
2.3.6.5 Scoring Correlations
As correlations are performed on simulated data nominally at 1/4 Hz increments,
the summation (using RMS) of each correlation is calculated. The set with the
largest correlation sum determines the vowel fundamental frequency. Figure 2.20
shows a definite correlation spike at the correct frequency.
Figure 2.20 Sliding Correlation of 107.666 Hz Fundamentals
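A sketch of the scoring loop in Python/NumPy; simulate_vowel_spectrum is a
hypothetical stand-in for the reference data set generation described in 2.3.6.2:

import numpy as np

def rms_score(correlation):
    # Equal-weight RMS over all 4,096 correlation bins.
    return float(np.sqrt(np.mean(np.abs(correlation) ** 2)))

def find_fundamental(audio_spectrum, candidates_hz, simulate_vowel_spectrum):
    # Sliding correlation search: the candidate whose reference spectrum
    # correlates most strongly with the zero-padded audio spectrum wins.
    scores = {}
    for f0 in candidates_hz:
        reference = simulate_vowel_spectrum(f0)
        corr = np.fft.fft(np.fft.fft(audio_spectrum) *
                          np.conj(np.fft.fft(reference)))
        scores[f0] = rms_score(corr)
    return max(scores, key=scores.get)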
The algorithm developed here finds vowel fundamental frequencies in actual audio
samples. In order to test the algorithm with real data, samples were taken of two
radio personalities, Paul Harvey and Dave Logan, for analysis. The deep, low voice
of Paul Harvey is very evident by the placement of the sliding correlation maximum.
Figure 2.21 shows the comparison; no tuning error was present for this analysis.
Figure 2.21 Two Radio Personalities Fundamental Vowel Frequencies Compared
The multiple fundamentals in the Dave Logan sample were thought to be an artifact,
perhaps from vocal inflection or multiple vowels during the data sample interval.
Analysis of other data ensembles shows that this is a fundamental characteristic of
Dave Logan's voice.
2.3.7 Vowel Fundamental Determination using
Autocorrelations
The method for finding vowel fundamental frequencies in 2.3.6 has some serious
deficiencies. Chief among these, as shown in Figure 2.22, is the tendency to find
spurious peaks in the result that are not due to the desired correlation. While this
occasionally happens in simulation, it almost always occurs when processing real
voice data.
Figure 2.22 Misidentified Vowel Fundamental Frequency
Another deficiency is that sometimes the correlation peak is biased away from the
actual peak. The amount of the bias changes as the phase of the harmonics of the
vowel fundamental change. While smoothing and range trimming algorithms helped
the algorithm to find the correct peak, the biasing tendency could not be reduced.
A new approach was developed that added autocorrelations to the processing. This
solved many of the problems encountered in the original algorithm.
2.3.7.1 Development of the Autocorrelation Based
Algorithm
The autocorrelation function can be very useful in identifying periodic signals12,18.
z(k) = \sum_{i=1}^{N} g(i)\, g(k + i) . \qquad (2.11)
As the spectral lines slide across each other, only their products are present in the
resulting autocorrelation product. Any signal that lacks the appropriate spectral line
characteristics tends to be concentrated at the center of the correlation. For the
auto-correlation performed here, this concentration occurs at bin 2,049 of 1 to 4,096.
As a result, this central product is omitted from further processing of the final
correlation result.
As with the previous cross correlation analyses, the autocorrelation is performed
using FFTs:
\mathrm{AutoCorr}(g,g)_j = \mathrm{FFT}\!\left(G_k G_k^*\right) . \qquad (2.12)
Figure 2.23 diagrams this process.
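A minimal Python/NumPy sketch of (2.12) (illustrative only; not the Appendix C
software):

import numpy as np

def autocorrelation_via_forward_ffts(g):
    # Autocorrelation per (2.12): only the forward FFT routine is needed.
    # The index reversal and scale factor introduced by using a forward FFT
    # on the product do not affect peak locations or relative amplitudes.
    G = np.fft.fft(g)
    return np.fft.fft(G * np.conj(G))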
Figure 2.23 Example of Vowel Autocorrelation
Consider a 123.45 Hz target vowel that has a -32.45 Hz offset as shown in Figure
2.24. The harmonic structure is evident in the sample, but the harmonics are shifted
by the offset frequency. In addition, the noise amplitude is significant. After the
sample is passed through the autocorrelation, the noise amplitude drops markedly.
As can be seen in Figure 2.25, in addition to the reduction in noise, the harmonics
start cleanly at the correct frequency; the autocorrelation as expected removes the
offset.
In the balance of the computations, the autocorrelation result is used in place of the
audio input signal after being transformed into the frequency domain. After the
autocorrelation, the algorithm is very similar to the original algorithm presented in
2.3.6. The other enhancements to the original algorithm will now be discussed.
Figure 2.24 123.45 Hz Vowel in Frequency Domain
Figure 2.25 Autocorrelation of 123.45 Hz Vowel in Frequency Domain
Figure 2.26 Modified Vowel Fundamental Algorithm Flow Chart
2.3.7.2 Improved Vowel Simulation
Since autocorrelation removes the effects of the tuning offset and references the
harmonic lines to direct current (DC), the opportunity to use an improved vowel
simulation became evident. This new simulation replaces the ensemble of sine
waves in the time domain (that was then transformed into the frequency domain
using FFTs) with a direct simulation in the frequency domain. This simulation
consists of Cos² peaks at each location where a spectral line would be anticipated, as
seen in Figure 2.27.
Figure 2.27 Correlation Mask for 107.666 Hz
As mentioned previously, the initial peak at bin 2,049 (as seen in Figure 2.25) should
not be used in the sliding correlation that follows the autocorrelation. Figure 2.27
shows that this initial peak is zeroed in the mask.
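A Python/NumPy sketch of the mask construction; the peak half-width and the
one-sided placement relative to the central bin are assumptions of this illustration,
not taken from the thesis software:

import numpy as np

def correlation_mask(f0_hz, n_bins=4096, bin_hz=22050.0 / 4096,
                     center_bin=2048, peak_halfwidth_bins=2):
    # Cos^2-shaped peaks at every bin where a harmonic of the trial
    # fundamental f0_hz is expected in the autocorrelation result; the
    # central (zero-lag) bin is forced to zero so it is excluded from scoring.
    mask = np.zeros(n_bins)
    k = 1
    while True:
        peak_bin = center_bin + k * f0_hz / bin_hz
        if peak_bin >= n_bins - peak_halfwidth_bins:
            break
        lo = max(int(np.floor(peak_bin - peak_halfwidth_bins)), 0)
        hi = min(int(np.ceil(peak_bin + peak_halfwidth_bins)), n_bins - 1)
        bins = np.arange(lo, hi + 1)
        phase = (bins - peak_bin) / (2.0 * peak_halfwidth_bins)  # about -0.5 ... +0.5
        mask[bins] = np.maximum(mask[bins], np.cos(np.pi * phase) ** 2)
        k += 1
    mask[center_bin] = 0.0
    return mask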
2.3.7.3 Product of Autocorrelation and Mask - Sliding Dot Product
The cross-correlation is no longer necessary to compute the score for a given trial
fundamental frequency. This is accomplished simply with an element-by-element
multiplication and a summation (dot product) of the result of the autocorrelation and
the computed mask for the trial frequency. As a result the final computation is very
rapid when performed on the 2.4 GHz Pentium IV PC. The term "sliding" refers to
the mask sliding across the autocorrelation product as the frequency of the mask is
changed.
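The scoring step itself then reduces to a few lines (a sketch continuing the
illustration above, reusing correlation_mask from 2.3.7.2):

import numpy as np

def dot_product_score(autocorr_magnitude, mask):
    # Element-by-element multiplication followed by a summation of the result.
    return float(np.dot(autocorr_magnitude, mask))

# The "sliding" search over trial fundamentals:
# best_f0 = max(candidates_hz,
#               key=lambda f0: dot_product_score(autocorr_magnitude,
#                                                correlation_mask(f0)))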
Figure 2.28 Fundamental Search using Autocorrelation Algorithm
It is worthwhile to compare the search result in Figure 2.28 with that of Figure 2.29.
Figure 2.29 Search for 107.666 Hz Fundamental Using Original Algorithm
The noise and ripple using the autocorrelation is significantly less than with the
original algorithm. Figure 2.22 showed a fundamental frequency misidentification
using the original algorithm. Figure 2.30 shows the frequency identification using the
new algorithm. The autocorrelation significantly reduced the spikes and noise. The
peak is smoother and easier to interpret.
Figure 2.30 Sliding Dot Product Analysis of Pathologic Case
2.4 Determination of Tuning Error when Vowel
Fundamental is Known
Two algorithms were developed to identify the tuning error. The first algorithm only
functions when the original sliding correlation algorithm (2.3.6) is used. The second
algorithm can be used with either fundamental estimation algorithm.
2.4.1 Approximating the Tuning Error Based on
Fundamental Calculation
Figure 2.19 illustrates how the tuning error can be approximated by examining the
location of spectral lines obtained from performing correlations in the frequency
domain.
The process is quite simple to perform. Using the zero padding algorithm developed
in 2.3.6.3, an autocorrelation would result in a spectral line peak being located in bin
2,049 of the 4,096 present in the correlation result.
If there is a tuning error present, the location of the peak will be offset by 5.383 Hz
per bin. The data used for Figure 2.32 was offset by -20 Hz. This is approximately 4
bins. As a result, the peak is at bin 2,045 instead of 2,049 as in Figure 2.31.
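The bin spacing quoted here follows from the 22,050 Hz sample rate and the
4,096-bin arrangement:

\Delta f = \frac{22{,}050\ \text{Hz}}{4096} \approx 5.383\ \text{Hz per bin} ,
\qquad \frac{-20\ \text{Hz}}{5.383\ \text{Hz/bin}} \approx -3.7\ \text{bins} .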
Figure 2.31 107.666 Hz Fundamental with No Offset
Figure 2.32 107.666 Hz Fundamental with -20 Hz Offset
This algorithm for determining the tuning error is inadequate on several counts. The
algorithm is statistically weak, because the predicted tuning offset is based on finding
the maximum bin over a range of about 20 bins. A better approach would be to use
the information in many bins, not just a local maximum bin.
Another weakness is the precision of the basic algorithm. Since it is using a local
maximum bin as the tuning offset frequency indicator, the bin spacing of about 5.383
Hz per bin limits the resolution to half that value in the best case. While the
dimension of the FFTs in the algorithm could be increased, the computational time
for the algorithm is already excessive at about 15 seconds per estimate.
Finally, the predicted tuning error is based on the determination of the vowel
fundamental frequency. As already demonstrated, the sliding correlation algorithm
does a poor job of finding the fundamental frequency to begin with. When the vowel
fundamental frequency is incorrect by more than a fraction of a Hertz, the predicted
tuning offset value is meaningless.
There is a mathematical approach to improve the resolution of the predicted tuning
offset to better than a bin width.
2.4.2 A Possible Improvement in the Approximated
Tuning Error
An examination of Figure 2.32 shows the peak is a relatively smooth function that is
several bins wide. This suggests that the curve could be interpolated to find the
actual peak; this peak should be the actual tuning error.
The actual peak is a local maximum of a curve where the contents of each bin are
points on the curve. As a result, the slope at the actual maximum of the curve is
zero. The gradient algorithm provides a means to find the maximum of the curve.
The gradient algorithm used to find a function maximum is:
XM =Xi +£f'(Xi)
(2.13)
where g is the gain or a constant used to prevent over-estimation of the function
maximum19. One difficulty is finding the derivative of the curve at the current point;
no equation is available for the curve itself. A second difficulty is that after the first
iteration the new estimate of x is likely not an existing point on the curve.
If some points on a curve are available, then function values on the curve can be
determined by interpolating a polynomial that passes through the available points.
The Lagrange interpolation function is efficient in this application since the
polynomial coefficients are never explicitly calculated. The Lagrange interpolation is
described in Appendix A.
To approximate the derivative, a secant approximation of the tangent at a point is
computed by applying two-sided finite differences. The traditional equation for the
derivative is one-sided.
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \qquad (2.14)
To estimate the derivative of f at x, h is replaced by a term based on the square root
of machine epsilon for the computer and compiler being used. The computation of
the machine epsilon itself is described in 2.2.
f'(x) \approx \frac{f\!\left(x(1.0 + \sqrt{\varepsilon})\right) - f(x)}{x\sqrt{\varepsilon}} \qquad (2.15)
When using finite differences, h is not vanishingly small, just small compared to the
current value of x. As a result, the estimate of the derivative of f at x using one-sided
finite differences suffers a first order error.
Figure 2.33 First Order Approximation Error Due to One-Sided Finite Differences
Two-sided derivatives are just an extension to the standard form of the derivative.
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x - h)}{2h} \qquad (2.16)
When approximating the derivative of f at x, the two-sided finite difference eliminates
the first order error in estimating the tangent.
f'(x) \approx \frac{f\!\left(x(1.0 + \sqrt{\varepsilon})\right) - f\!\left(x(1.0 - \sqrt{\varepsilon})\right)}{2x\sqrt{\varepsilon}} \qquad (2.17)
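A Python sketch of the gradient step (2.13) driven by the two-sided approximation
(2.17); the gain value and iteration count are assumptions, and f stands for the
Lagrange-interpolated correlation curve:

import numpy as np

def gradient_peak_search(f, x0, gain=0.5, iterations=20):
    # Gradient ascent toward a local maximum of f, using a two-sided finite
    # difference with a step based on the square root of machine epsilon.
    sqrt_eps = np.sqrt(np.finfo(float).eps)
    x = float(x0)
    for _ in range(iterations):
        h = abs(x) * sqrt_eps if x != 0.0 else sqrt_eps
        derivative = (f(x + h) - f(x - h)) / (2.0 * h)
        x = x + gain * derivative
    return x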
Figure 2.34 Approximation of Tangent using Two-Sided Finite Differences
In reality, two-sided finite differences are nothing more than the two-point slope
formula
m = \frac{y_2 - y_1}{x_2 - x_1} \qquad (2.18)
with the values for x1 and x2 being very close to a central point x.
Using the same simulated data that generated Figure 2.32, an improvement in the
estimated tuning error was performed using a 10th order Lagrange interpolator. Each
iteration of the gradient search monotonically moved closer to the peak in the
interpolated function. This can be seen as the derivative gradually converged on 0.
Figure 2.35 Derivative of x as a Function of Iteration
At the same time, the value for x converged on a value close to the correct result of
-20.0.
Figure 2.36 Improved Tuning Error Approximation
Relying on approximated derivatives of interpolated polynomials is a numerically
unsound process. Even with noiseless data it is possible that the errors could be
quite large. Figure 2.37 shows an example where the improved tuning error
estimate, again performed with a 10th order interpolated polynomial, is not as good
as the original estimate.
Figure 2.37 Incorrect Improvement in Tuning Error
A numerically sound approach to computing the tuning error is required.
2.4.3 Determination of Tuning Error Using Time
Domain Correlations
Consider the following hypothesis. Once the vowel fundamental frequency is
computed, a sliding correlation can be used to estimate the tuning offset. As
candidate values of the tuning estimate are computed, a simulated vowel can be
created using the previously computed vowel fundamental frequency. The correct
value of the tuning estimate should produce a correlation maximum when the
sampled audio data and the simulation are compared.
This hypothesis seems reasonable, but it relies on an assumption that needs to be
explored. This assumption revolves around the orthogonality of sine waves. While
two sine waves of the same frequency are clearly highly correlated, this hypothesis
hinges on the concept that sine waves of different frequencies are not highly
correlated.
2.4.3.1 Correlation of Sine Waves
In this application, correlations are with respect to time; as a result, the time cross-
correlation function is of primary interest20:

z(t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(\tau)\, h(t + \tau)\, d\tau . \qquad (2.19)

Consider two sine waves with different non-zero integer frequencies n and m.
Without loss of generality, let t = 0:

z(0) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin(n\tau)\, \sin(m\tau)\, d\tau . \qquad (2.20)

Using the standard trigonometric identity

\sin(a)\sin(b) = \frac{\cos(a - b)}{2} - \frac{\cos(a + b)}{2} , \qquad (2.21)
(2.20) becomes

z(0) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ \frac{\cos((n - m)\tau)}{2} - \frac{\cos((n + m)\tau)}{2} \right] d\tau .
Over the interval of (-π, π), this expression sums to 0. Due to the periodic nature of
the cosine, the sum will be 0 over every interval of 2π. As the limits of integration
expand, any portions of the sum that are not a full interval of 2π are divided by 2T
and get successively smaller. Hence, this integral at the limit is 0 when T
approaches infinity.
When the sum is 0 over the interval (-π, π), it is said that the sine waves are
orthogonal over the interval (-π, π)21.
Without constraining the sampling interval T, in the discrete case (2.20) becomes

z(0) = \sum_{k=0}^{\mathrm{Int}(1/T)} \sin(2\pi n k T)\, \sin(2\pi m k T) . \qquad (2.22)

A careful examination of this expression is important. The number of samples
Int(1/T) increases as the sampling interval T decreases. The arguments to the
Sin functions change over a range of 0 to 2π in a discrete sense. As mentioned
previously, m and n are different non-zero integers.
None of those conditions are present for either the sampled vowel data or the
simulated vowel data available for determining the tuning error. The data ensembles
are collected over an interval that is short; at a 22,050 Hz sample rate an ensemble
of 4,096 samples spans only 186 ms. This is far different from an infinitely long
observation interval.
In order to get a sense of the effect that short sample intervals have over relatively
realistic frequencies, simulations were designed (see Figure 2.38).
Figure 2.38 Correlation of 128 Hz and Test Frequencies
Based on the results of the simulations, the hypothesis was accepted and the tuning
error algorithm was designed.
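A small Python/NumPy experiment in the spirit of Figure 2.38; the particular test
frequencies chosen here are assumptions:

import numpy as np

# Correlation (at zero lag) of a 128 Hz sine with nearby test frequencies over a
# short 4,096-sample ensemble at 22,050 Hz -- roughly the conditions of Figure 2.38.
fs, n = 22050.0, 4096
t = np.arange(n) / fs
reference = np.sin(2 * np.pi * 128.0 * t)
for f_test in (120.0, 126.0, 128.0, 130.0, 136.0):
    corr = np.dot(reference, np.sin(2 * np.pi * f_test * t)) / n
    print(f"{f_test:6.1f} Hz: {corr:+.4f}")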
2.4.3.2 Tuning Error Determination Algorithm
This algorithm is very similar to the algorithms previously presented. The primary
difference is that the correlation is performed in the time domain by sliding the offset
frequencies. The vowel fundamental frequency used is the one determined from 2.3.
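A Python/NumPy sketch of this search; the simple equal-amplitude harmonic model
and the circular cross-correlation are assumptions of the illustration, not the thesis
implementation:

import numpy as np

def estimate_tuning_offset(audio, f0_hz, offsets_hz, fs=22050.0, n_harmonics=10):
    # For each trial offset, a vowel is re-simulated from the previously
    # estimated fundamental and cross-correlated with the sampled audio; the
    # trial with the strongest correlation peak is taken as the tuning offset.
    t = np.arange(len(audio)) / fs
    best_offset, best_score = None, -np.inf
    for offset in offsets_hz:
        simulated = sum(np.sin(2.0 * np.pi * (k * f0_hz + offset) * t)
                        for k in range(1, n_harmonics + 1))
        corr = np.fft.ifft(np.fft.fft(audio) * np.conj(np.fft.fft(simulated)))
        score = np.max(np.abs(corr))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset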
Figure 2.39 Tuning Offset Determination Algorithm
2.4.3.3 Typical Operation of the Tuning Offset Algorithm
When searching for the offset, a new correlation is attempted at each test offset
frequency.
Figure 2.40 Offset Determination Correlation as a Function of Test Frequency
When the test offset frequency matches the actual offset frequency, the correlation
function yields a maximum.
Figure 2.41 Correlation when -20 Hz Test Matches -20 Hz Offset
The final configuration of the blind clarification algorithm consists of the
autocorrelation based vowel fundamental determination algorithm followed by this
algorithm to determine the tuning offset frequency.
2.4.3.4 Algorithm Limitations
Since the fundamental and harmonics are being matched in the time domain, it is
possible to determine a tuning offset value that is biased by a multiple of the vowel
fundamental frequency. This makes perfect sense, since the spectral lines being
compared in the correlation are spaced by that amount.
This characteristic forces a limitation on the range of tuning offset that can be
accommodated. The limit consists of half of the vowel fundamental frequency. For a
bass voice the tuning offset to be determined must be less than about +/- 43 Hz. For
higher voices, the allowed range of offset increases. A soprano speaker would allow
for up to approximately 200 Hz of tuning offset to be determined.
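The bass figure follows from half of the lowest fundamental in Table 2.6:

\frac{87.31\ \text{Hz}}{2} \approx 43.7\ \text{Hz} .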
In normal operation this limitation is not a driver in system performance. The normal
operating scenario is that the receiver operator manually tunes in the signal for
maximum intelligibility. Operators generally can tune an SSB signal to within 20 Hz by
ear. After manual tuning, the blind clarification algorithm is enabled. From then on,
any receiver or transmitter drift would be detected and removed.
3. Algorithm Performance Testing
Algorithm testing was performed both on simulated signals and with actual received
signals. Both types of test results are discussed separately in the following sections.
Figure 3.1 Program Control Panel
3.1 Testing using Simulated Data
Except for testing as a function of SNR or as otherwise noted, the signal to noise
ratio was fixed at 20 dB. This value is typical of HF SSB operation.
3.1.1 Vowel Fundamental Frequency Tests
A number of different tests were performed in order to get a sense of algorithm
performance.
The fundamental frequency search algorithm can be set to search at a specified
frequency increment. This will limit the resolution in the output. The behavior of the
fundamental frequency determination algorithm with respect to step size required
testing. Given the nature of small errors involved, the SNR was set to 100 dB.
Figure 3.2 Performance About a 0.25 Hz Search Step
The behavior was as anticipated. Performance with respect to SNR is also
important. Testing was accomplished at a vowel fundamental frequency of 100.125
Hz with a step size of 0.25 Hz; however, the algorithm was placed in the 1,2,5 mode
so that the smallest effective step size is 0.001 Hz. Data was collected and plotted
as a function of the linear noise amplitude.
Figure 3.3 Noise Performance of Fundamental Estimation
Figure 3.3 shows that the noise performance is approximately linear and that the
fundamental determination algorithm is somewhat insensitive to noise. This plot
ranges from SNRs of 100 dB to -5 dB. At an SNR of -10 dB or poorer the algorithm
begins to break down and tends to misidentify the vowel fundamental.
3.1.2 Testing of Tuning Offset Determination Algorithm
It is expected that errors in determining the vowel fundamental frequency will induce
errors in the tuning offset determination. This characteristic was tested with a 100
Hz fundamental. Errors in the vowel fundamental were artificially induced in order to
see the effect in the tuning offset determination.
Figure 3.4 Errors in Tuning Offset due to Errors in Fundamental Determination
As shown in Figure 3.4, the effect is a linear one. Unfortunately the sensitivity is
about 16.25:1 at 100 Hz. This implies that if the tuning error is to be known within 10
Hz, then the error in determining the vowel fundamental frequency should be on the
order of 0.5 Hz.
A test was then made to see the effect of noise on the tuning offset determination
itself. The errors in vowel fundamental frequency were manually removed prior to
the tuning offset determination. Figure 3.5 shows that the tuning offset calculation is
also not overly affected by additive white Gaussian noise.
Figure 3.5 Effect of Noise on Tuning Error Estimation
3.2 Testing using Audio Samples
Testing with audio samples was performed using the setup shown in Figure 3.6.
Each time the tuning error was to be estimated, six processing steps were
performed; each is described in the following paragraphs.
Figure 3.6 Data Acquisition Setup
3.2.1 Acquire the Target Signal
The target signal was acquired by two different means. In early testing, a
microphone was used to collect audio samples directly. This was followed by the
analysis of signals collected off the air.
For most testing, the antenna used was a Barker and Williamson AC3.5-30
broadband folded dipole. This 90 foot long antenna is designed to receive signals
over the entire HF spectrum. While its performance at any given frequency is
marginal when compared with tuned antennas, the broadband coverage made this a
good choice. For medium frequency (MF) testing using radio station KOA at 850
KHz, a Sony AN-1 active antenna was used.
Except for MF testing with KOA, a Metropole bandpass filter was used. This filter
attenuates signals below 1.6 MHz and above 30 MHz and helped reduce spurious
responses due to strong local MF signals and cellular and pager sites.
The receiver used for all data analysis was a Watkins-Johnson HF-1000 with the
internal preselector option. With calibration on the order of 0.1 ppm, it contributed
virtually no errors in tuning at MF using KOA and only 2 Hz of error at 15 MHz.
In cases where interference was present, a Timewave DSP-599zx adaptive digital
signal processing (DSP) audio filter was used. This device can effectively reduce
random noise and can virtually eliminate heterodyne (fixed tone) interference.
A Dell 4550 PC with integral SoundBlaster compatible sound card was used for data
acquisition. The program used for .wav file acquisition was Spectrum Laboratory
version 2.01 b1.
Figure 3.7 Spectrum Laboratory Display with WWV Input Signal (10 MHz)
Signals were recorded as .wav files. The collections were accomplished at 16 bit
linear resolution at a nominal sample rate of 22,050 Hz.
The .wav files were then read into a buffer in the analysis program. The buffer
capacity was arbitrarily set at 131,072 samples. This is about six seconds of audio
capture although smaller captures were used in some cases.
Figure 3.8 Input Processing Panel
3.2.2 Estimate Voice Classification
While a target signal was being digitized, the frequency range of the speaker's voice
was noted. The voice classification system breaks the voice into six frequency
ranges. Except for rare cases, the baritone setting, which is the male mid range
classification, was used.
A commercial version of this algorithm would either perform this classification
automatically, or remove the requirement entirely. For the purpose of this thesis,
estimating the voice classification manually did not seem unreasonable. Based on
the classification, the software control panel was updated to reflect this classification.
Figure 3.9 Voice Classification Panel
Low and high frequencies of the classifications overlap; this aids in testing when the
target voice is not easily classified. For example, some radio personalities have
vocal mannerisms that alter their vowel fundamental frequencies artificially. A good
example of this is Scott Hastings at KOA. At times, for effect, he raises his vowel
fundamental frequency above his nominal baritone range.
3.2.3 Input Sample Processing
A new ensemble of 4,096 samples from the input buffer is obtained for each new
estimate. The data is windowed prior to being transformed into the frequency
domain; the Welch window is the default.
Figure 3.10 Audio Sample Processing and Windowing
3.2.4 Estimate the Vowel Fundamental Frequency
The estimation method based on the autocorrelation of the input signal was used in
all cases.
Figure 3.11 Vowel Fundamental Determination
The nominal step size setting of 0.25 Hz was used for all testing. The 1,2,5 check
option was used so that the effective step size was actually 0.001 Hz. The "Narrow
Search" and "LPF Corr." options were not used. The "1 Pass Search" option is sometimes
used with simulated data to force the fundamental estimation to a specified value. In
that case the vowel frequency value is set by the input processing panel.
Of particular interest is the "score." This numerical value indicates the degree of
correlation. A low value would indicate either low audio signal strength, or that no
vowel was present during the sampling interval.
3.2.5 Estimate the Tuning Offset
Once the vowel fundamental frequency has been determined, it is possible to
estimate the tuning offset.
Figure 3.12 Tuning Offset Estimation Panel
The step size was set to 0.50 Hz to improve the tuning offset resolution. Once again
the "score" indication gives the strength of the correlation and can be used to
indicate the quality of the offset estimation.
3.2.6 System Calibration
Prior to using the data acquisition setup shown in Figure 3.6, it was necessary to
calibrate the setup.
The HF-1000 receiver was tuned to a signal with a carrier on a known frequency.
The receiver tuning was offset 1 KHz low in USB mode. If the receiver and the
transmitter are perfectly calibrated, then a 1 KHz sine wave would result; however,
due to calibration errors, the sine wave is not exactly 1 KHz.
Figure 3.13 Calibration Setup
The IG-18 sine-square
wave generator is used as a transfer standard by adjusting its frequency until the two
sine waves displayed on the 7633 oscilloscope are synchronized. The frequency of
the sine-square wave generator is then read from the reciprocal counter.
Prior to performing this calibration, the Racal-Dana 1992 counter was verified against
a disciplined standard locked to the GPS constellation. After three days of data
acquisition and stabilization, the counter was verified to be within 0.01 ppm.
To verify receiver calibration, NIST radio station WWV was received at 2.5, 5.0, 10.0,
15.0, and 20.0 MHz and set 1 KHz low in USB mode. The IG-18 signal generator
was adjusted for a frequency match using observations of about 1 minute per
reading. The 1992 counter was used to read the matched sine wave frequency.
The audio frequencies obtained were:
WWV Frequency    Offset from 1 KHz
2.500 MHz        0.481 Hz
5.000 MHz        0.933 Hz
10.000 MHz       1.805 Hz
15.000 MHz       Excessive Interference
20.000 MHz       Low Signal Strength
Table 3.1 WWV Based Calibration Offsets
Based on this data, it appears that the Watkins-Johnson receiver calibration is off by
approximately 0.18 ppm. This is well within the receiver specification.
As radio station KOA at 0.850 MHz will be used for a test source, its center
frequency was measured in the same fashion. The total tuning error was 1.91 Hz.
Subtracting the 0.15 Hz contribution from the HF-1000 receiver at 0.850 MHz, the
error in the center frequency for KOA is approximately 1.76 Hz. This is well within
the 20 Hz FCC specification for commercial AM stations.
3.2.7 Test Results Using Local Speakers
A microphone was used to gather samples of vowels.
Except for a couple of obviously bad readings, the result centers around the 0 Hz
offset that was used. The statistics in Table 3.2 were compiled after the two bad
readings were removed.
During normal voice the pitch changes as a function of time. A test was run with the
vowel fundamental frequency increasing by an octave over about 5 seconds.
88


Full Text

PAGE 1

BLIND CLARIFICATION OF SINGLE SIDEBAND SUPPRESSED CARRIER SIGNALS USING VOCAL STATISTICS by Gary A. Geissinger B. Mus. Ed., University of Colorado, 1978 B.A., University of Colorado, 1984 A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science Electrical Engineering 2003

PAGE 2

by Gary A. Geissinger All rights reseiVed.

PAGE 3

This thesis for the Master of Science degree by Gary A. Geissinger has been approved by Dr. J. Bialasiewicz Date

PAGE 4

Geissinger, Gary A. (M.S., Electrical Engineering) Blind Clarification of Single Sideband Suppressed Carrier Signals using Vocal Statistics Thesis directed by Dr. Miloje Radenkovic ABSTRACT Single sideband suppressed carrier signals are used to convey voice information in the high frequency radio spectrum. Due to the nature of this type of modulation, it is necessary for the transmitter and receiver to be closely tuned to the same frequency. This is a difficult requirement in low cost, commercial and amateur quality radio transceivers. This thesis presents and evaluates a blind adaptive algorithm for autonomously clarifying a received single sideband signal without resorting to the usual methods that involve either pilot carriers or a vestigial opposite sideband. At the conclusion of the thesis simulations of this algorithm are presented along with the results that were obtained. This abstract accurately represents the content of the candidate's thesis. recommend its publication. Signe Dr. Miloje Radenkovic iv

PAGE 5

CONTENTS Figures .................................................................................................................... viii Tables ..................................................................................................................... xii Chapter 1. Background lnforrnation ........................................................................................ 1 1. 1 Relevant Characteristics of the Human Voice ..................................................... 1 1.1.1 Vowels -The Desired Signal ........................................................................... 1 1.1.2 Consonants -A Possible Noise Source .......................................................... .4 1.2 Single Sideband Suppressed Carrier Communications ....................................... 6 1.2.1 Single Sideband (SSB) Modulation .................................................................. 6 1.2.2 Translating the SSB Signal to the Transmit Frequency .................................... 9 1.2.3 Frequency Synthesizers ................................................................................ 11 1.2.3.1 Phase Locked Loop Frequency Synthesizers ............................................. 11 1.2.3.2 Numerically Controlled Oscillators .............................................................. 12 1.2.4 SSB Demodulation-Recovering the Original Audio Signal. .......................... 14 1.2.5 Translating the Received SSB Signal to Baseband ....................................... 16 1.3 Conversion Chain Error Analysis ...................................................................... 16 1.3.1 Transmit Conversion Chain Frequency Errors ................................................ 16 1.3.2 Receive Conversion Chain Frequency Errors ................................................ 19 1.3.3 Combined Conversion Chain Frequency Errors ............................................. 20 2. The Blind Clarification Algorithm ......................................................................... 22 2.1 The Effect of a Tuning Error on SSB Voice Modulation ..................................... 22 2.2 System Engineering Aspects of the Design ...................................................... 24 2.3 Determination of Vowel Fundamental Frequency ........................... .................. 27 v

PAGE 6

2.3.1 Origin of the Vowel Fundamental Determination Algorithms .......................... 28 2.3.2 Analysis of Vowels in the Frequency Domain ................................................ 31 2.3.4 Initial Audio Sample Processing ..................................................................... 37 2.3.5 Vowel Simulation ........................................................................................... 40 2.3.6 Determination of Vowel Fundamentals using Sliding Correlations ................ .40 2.3.6.1 Algorithmic Details of Correlative Vowel Determination .............................. 42 2.3.6.2 Reference Data Set Generation ................................................................... 44 2.3.6.3 Data Set Zero Padding .............................................................................. .46 2.3.6.4 Data Set Comparison by Correlation ............................................................ 47 2.3.6.5 Scoring Correlations ................................................................................... 49 2.3.7 Vowel Fundamental Determination using Autocorrelations ............................ 51 2.3. 7.1 Development of the Autocorrelation Based Algorithm ................................. 52 2.3.7.2 Improved Vowel Simulation ........................................................................ 51 2.3. 7.3 Product of Autocorrelation and Mask -Sliding Dot Product ........................ 58 2.4 Determination of Tuning Error when Vowel Fundamental is Known .................. 60 2.4.1 Approximating the Tuning Error Based on Fundamental Calculation ............. 60 2.4.2 A Possible Improvement in the Approximated Tuning Error ........................... 63 2.4.3 Determination of Tuning Error Using Time Domain Correlations .................... 68 2.4.3.1 Correlation of Sine Waves .......................................................................... 69 2.4.3.2 Tuning Error Determination Algorithm ......................................................... 71 2.4.3.3 Typical Operation of the Tuning Offset Algorithm ........................................ 73 2.4.3.4 Algorithm Limitations .................................................................................. 7 4 3. Algorithm Performance Testing .......................................................................... 76 3.1 Testing using Simulated Data ........................................................................... 77 3.1.1 Vowel Fundamental Frequency Tests ............................................................ 77 3.1.2 Testing of Tuning Offset Determination Algorithm .......................................... 79 3.2 Testing using Audio Samples ........................................................................... 80 3.2.1 Acquire the Target Signal .............................................................................. 82 vi

PAGE 7

3.2.2 Estimate Voice Classification ......................................................................... 84 3.2.3 Input Sample Processing ............................................................................... 85 3.2.4 Estimate the Vowel Fundamental Frequency ................................................. 85 3.2.5 Estimate the Tuning Offset ............................................................................ 86 3.2.6 System Calibration ......................................................................................... 86 3.2.7 Test Results Using Local Speakers ............................................................... 88 3.2.8 Tests Using MF and HF Reception .............................................................. 100 3.2.9 Conclusions Based on Test Results ............................................................ 106 Appendix A Lagrange Interpolation ...................................................................................... 107 8 Acronym and Abbreviation List. ......................................................................... 1 09 C Software Listing ................................................................................................ 111 References ............................................................................................................ 151 vii

PAGE 8

FIGURES Figure 1.1 Block Diagram of the Voice Mechanism ......................................................... 1 1.2 Harmonic Series for a 160 Hz Fundamental. .................................................. 2 1.3 155 Hertz "ah" Vowel ..................................................................................... 3 1.4 Single Sideband Modulation .......................................................................... 6 1.5 SSB Transmit Conversion Chain .................................................................. 10 1.6 Phase Locked Loop Synthesizer .................................................................. 11 1. 7 Numerically Controlled Oscillator ................................................................. 13 1.8 Single Sideband Demodulation .................................................................... 14 1.9 Receive Conversion Chain ........................................................................... 17 2.1 155Hz "ah" Vowel with +20Hz Tuning Error ............................................... 24 2.2 Machine Epsilon Calculation Result ............................................................. 27 2.3 AFC Circuit for FSK Radio Channel ............................................................. 28 2.4 Comparison of Shifted and Un-shifted Spectra ............................................ 30 2.5 Fourier Transform of ACos(at) ..................................................................... 31 2.6 Fourier Transform of a Vowel ....................................................................... 32 2. 7 Magnitude of the Sine Function .................................................................... 33 2.8 Fourier Transform of Vowel with Side Lobes ................................................ 34 2.9 Vowel with Side Lobes Subject to a Tuning Error ......................................... 36 2.1 0 Vowel Spectra using Sampled Data and Discrete Fourier Transform ........... 36 2.11 Typical Input Data Ensemble in Time Domain .............................................. 37 2.12 Input Data Ensemble after Window is Applied .............................................. 38 2.13 Simulated Data in the Frequency Domain; With Tuning Error ...................... 39 2.14 Correlative Fundamental Determination Algorithm ...................................... .43 2.15 Simulated Data with No Tuning Error ........................................................... 45 viii

PAGE 9

2. 16 Data Sets Padded Prior to Correlation ........................................................ .46 2.17 Correlation Plot at Full Scale ........................................................................ 47 2.18 Correlation Plot with No Tuning Error .......................................................... .48 2.19 Correlation Result with Input Shifted -20 Hz ............................................... .48 2.20 Sliding Correlation of 107.666 Hz Fundamentals ........................................ .49 2.21 Two Radio Personalities Fundamental Vowel Frequencies Compared ........ 50 2.22 Misidentified Vowel Fundamental Frequency ............................................... 51 2.23 Example of Vowel Autocorrelation ............................................................... 53 2.24 123.45 Hz Vowel in Frequency Domain ....................................................... 54 2.25 Autocorrelation of 123.45 Hz Vowel in Frequency Domain ........................... 55 2.26 Modified Vowel Fundamental Algorithm Flow Chart ..................................... 56 2.27 Correlation Mask for 1 07.666 Hz .................................................................. 57 2.28 Fundamental Search using Autocorrelation Algorithm .................................. 58 2.29 Search for 107.666 Hz Fundamental Using Original Algorithm ..................... 59 2.30 Sliding Dot Product Analysis of Pathologic Case ......................................... 60 2.31 107.666 Hz Fundamental with No Offset.. .................................................... 61 2.32 107.666 Hz Fundamental with-20Hz Offset.. .............................................. 62 2.33 First Order Approximation Error Due to One-Sided Finite Differences .......... 65 2.34 Approximation of Tangent using Two-Sided Finite Differences .................... 66 2.35 Derivative of x as a Function of Iteration ...................................................... 67 2.36 Improved Tuning Error Approximation .......................................................... 67 2.37 Incorrect Improvement in Tuning Error ......................................................... 68 2.38 Correlation of 128 Hz and Test Frequencies ................................................ 71 2.39 Tuning Offset Determination Algorithm ........................................................ 72 2.40 Offset Determination Correlation as a Function of Test Frequency .............. 73 2.41 Correlation when -20 Hz Test Matches -20 Hz Offset .................................. 7 4 3.1 Program Control Panel. ................................................................................ 76 3.2 Performance About a 0.25 Hz Search Step .................................................. 77 ix

PAGE 10

3.3 Noise Performance of Fundamental Estimation ........................................... 78 3.4 Errors in Tuning Offset due to Errors in Fundamental Determination ........... 79 3.5 Effect of Noise on Tuning Error Estimation ................................................... 80 3.6 Data Acquisition Setup ................................................................................. 81 3.7 Spectrum Laboratory Display with WWV Input Signal (10 MHz) ................... 83 3.8 Input Processing Panel ................................................................................ 84 3.9 Voice Classification Panel ............................................................................ 84 3.1 0 Audio Sample Processing and Windowing ................................................... 85 3.11 Vowel Fundamental Determination .............................................................. 85 3.12 Tuning Offset Estimation Panel. ................................................................... 86 3.13 Calibration Setup ......................................................................................... 87 3.14 Baritone "ah" Vowel at Fixed Frequency ...................................................... 89 3.15 Baritone Frequency Plot for Rising Pitch "ah" Vowel .................................... 91 3.16 Baritone Offset Versus Vowel Frequency Rising Pitch "ah" Vowel ............... 91 3.17 Baritone Frequency Plot for Falling Pitch "ah" Vowel ................................... 92 3.18 Baritone Offset Versus Vowel Frequency Falling Pitch "ah" Vowel ............... 92 3.19 512 Bins Baritone Frequency Plot for Falling Pitch "ah" Vowel. .................... 93 3.20 512 Bins Baritone Offset Versus Vowel Falling Pitch "ahn Vowel. ................. 94 3.21 Contralto "ah" Vowel at Fixed Frequency ..................................................... 95 3.22 Contralto "ah" Vowel Spectrum Analysis Fundamental ................................ 95 3.23 Contralto "ah" Vowel Spectrum Analysis of 6th Harmonic ............................. 96 3.24 Contralto Estimated Tuning Offset ............................................................... 96 3.25 Contralto With Rising Pitch .......................................................................... 98 3.26 Contralto with Falling Pitch ........................................................................... 99 3.27 Contralto Voice with 7 Tuning Offsets; No Error Rejection ......................... 100 3.28 Contralto Voice with 7 Tuning Offsets; With Error Rejection ....................... 101 3.29 Tuning Offset Determination with Resonant Woman's Voice ..................... 1 02 3.30 Female Voice with-25Hz Offset.. .............................................................. 102 X

PAGE 11

3.31 Male Voice with-25Hz Offset.. .................................................................. 103 3.32 Large Sample of Male Voices; -25Hz Offset.. ............................................ 104 3.33 Time Evolution of Estimated Tuning Error .................................................. 105 xi

PAGE 12

TABLES Table 1.1 Spoken Vowel Fundamental Frequency Ranges .......................................... .4 1.2 Examples of Voiced and Unvoiced Consonants ............................................. 5 1.3 SSB Modulator Outputs ................................................................................. 9 2.1 SSB Demodulation Output with +20 Hz Tuning Error ................................... 23 2.2 Point Design Based on System Specifications ............................................. 26 2.3 Typical Window Functions ........................................................................... 35 2.4 Data Simulator Requirements ...................................................................... 40 2.5 Demonstration Simulation Run Specifications .............................................. 42 2.6 Voice Classifications .................................................................................... 44 3.1 WWV Based Calibration Offsets .................................................................. 88 3.2 Statistics for Baritone "ahn Vowel at 111 Hz ................................................. 90 3.3 Statistics for Contralto "ahn Vowel at 214 Hz ................................................ 97 3.4 Performance Comparison of Two FFT Lengths .......................................... 104 xii

PAGE 13

1. Background Information 1.1 Relevant Characteristics of the Human Voice The blind clarification algorithm functions due to the fundamental characteristics of the human voice. Consequently, a thorough understanding of the relevant characteristics of the human voice is required. Of primary importance to the blind clarification algorithm are the vowel sounds. 1.1.1 VowelsThe Desired Signal The model considered here for vowel generation is based upon source-filter theory.1 This theory divorces thf3 source of vocal acoustic waves from the filtering that transforms the initial acoustic waves into particular vowels. Figure 1.1 illustrates the source-filter model as it applies to the human voice. Air Pump Lungs I Pathway of Interest Self-Sustained Oscillator (Simple Source) Larynx Adjustable Transmission Vocal Cavities / 1'Sibilants r--I Etc. Auxiliary Output Interface Output Interfaces Nose \ Aperture To Microphon e Mouth .. Aperture .. Figure 1.1 Block Diagram of the Voice Mechanism 1

PAGE 14

The larynx acts as the source in this model and is responsible for generating the fundamental pitch. The frequency of oscillation depends upon the dimensions of the larynx and the mass and tension of the vocal cords. Given the conditions of high breath pressure and close spacing of the vocal chords, the bursts of air from the larynx will be regular; hence, the upper partials generated will be harmonically related. Both even and odd harmonics are present at the larynx. The amplitude of the harmonics decreases at a rate of about 12 dB per octave. The harmonics are present up to 4000Hz, after which their amplitudes are severely attenuated.2 Figure 1.2 gives an example of a male voice with a 160 Hz fundamental with no formant filtering. 01) "'0 .f! 1 0.1 01) > 0.01 0.001 160Hz Fundamental Larynx Source with NO Filtering o:::-ro:::rrr lc "; 1:' -lnnnniiiL i)C 160 480 800 1120 1440 1760 2080 2400 2720 3040 3360 3680 4000 Frequency (Hertz) Figure 1.2 Harmonic Series for a 160 Hz Fundamental In this example all of the harmonics are present at the larynx with a smoothly decreasing amplitude contour. The formant filters present in the human vocal tract alter this contour to intone different vowels sounds. A better example of the input 2


A better example of the input signals of interest is shown in Figure 1.3, which gives the spectral analysis of a 155 Hz "ah" vowel in a bass-baritone voice.

Figure 1.3 155 Hertz "ah" Vowel

The harmonic sequence of the 155 Hz "ah" vowel is very similar to the analytical example in Figure 1.2; the primary difference between the two figures lies in the relative amplitudes of the various harmonics. In this figure there is some extraneous low amplitude noise from about 450 Hz to 1,000 Hz. This was caused by a noise source present in the room during data acquisition and therefore should be ignored. The smooth curve drawn through the peaks of each harmonic is often called a vowel contour. Unlike speech synthesis and voice recognition algorithms, the algorithm developed here is not concerned with vowel contours. The absolute pitch of the fundamentals and harmonics is of primary importance instead.


Any given individual with normal larynx function can generate a range of fundamental vowel frequencies that can vary by at least an octave; this is the origin of singing. When speaking, the usual pitch is about 50% higher in frequency than the lowest pitch that can be generated while singing [3]. The fundamental frequencies that are generated by an adult larynx primarily fall into six overlapping ranges depending upon the physical properties of the larynx and vocal cords. Using the 50% rule for the fundamental frequencies of speech presented above, along with a tolerance of 50% to account for differences in individual voices, the spoken vowel fundamental frequency ranges are listed below in Table 1.1.

Voice Range           Classification   Frequency Range
Highest Female        Soprano          246.94 - 493.88 Hz
Most Probable Female  Mezzo-Soprano    207.65 - 415.30 Hz
Lowest Female         Contralto        185.00 - 370.00 Hz
Highest Male          Tenor            146.93 - 293.66 Hz
Most Probable Male    Baritone         92.50 - 185.00 Hz
Lowest Male           Bass             87.31 - 174.62 Hz

Table 1.1 Spoken Vowel Fundamental Frequency Ranges

1.1.2 Consonants - A Possible Noise Source

Unlike vowels, consonants generally have a fixed short duration and can therefore be considered transient events. The group of consonant vocal sounds can be further broken down into small groups based upon their acoustic properties [1]. Some consonants still have merit as a desired signal input (see Table 1.2). This is due to the presence of voiced sound generated by the vocal cords.


While vowels and voiced consonants will be processed as signals, unvoiced consonants and other vocal sounds are not required by this algorithm and therefore are considered to be noise. Fortunately, the effects of noise on the clarification algorithm can be reduced by adaptive filtering.

Consonant  Example  Classification  Voiced
b          boy      stop            no
ch         church   affricative     no
d          dog      stop            no
f          fog      fricative       no
g          go       stop            yes
h          hat      fricative       no
j          join     glide           yes
k          cake     stop            no
l          like     liquid          yes
m          may      nasal           yes
n          nose     nasal           yes
p          pop      stop            no
r          river    liquid          yes
s          saw      fricative       no
t          top      stop            no
v          valley   fricative       yes
w          while    glide           yes
z          zebra    fricative       yes

Table 1.2 Examples of Voiced and Unvoiced Consonants

The acoustic energy produced by the voice is transformed into electrical energy by a transducer (microphone) and applied as input to a single sideband transmitter. Microphones used for radio communication applications have a tailored frequency response from 300 Hz to 3,000 Hz. Any acoustic energy outside of this range is generally ignored.


1.2 Single Sideband Suppressed Carrier Communications

1.2.1 Single Sideband (SSB) Modulation

SSB is a modulation technique that is commonly used in the high frequency spectrum. While it can be generated by phase techniques, in modern communications equipment it is generated using narrow filters as illustrated in Figure 1.4 [6].

Figure 1.4 Single Sideband Modulation

The modulating signal, in this case voice, is applied to a doubly balanced mixer. The other input to the mixer originates from a carrier oscillator source. Due to its balanced design with respect to both the signal and carrier inputs, the leakage of the mixer inputs directly to the mixer outputs is suppressed; the mixer can be adjusted so that the unwanted outputs are negligible. The desired output of the mixer is the product of the two input signals


m(f_1, t)\cos(2\pi f_2 t)                                                   (1.1)

As previously discussed, the modulating signal of interest will consist of the vowels and voiced consonants produced by a human speaker. Consider a voice signal that consists of a vowel. As previously mentioned, a vowel consists of a fundamental frequency and its harmonics within the frequency range of 300 to 3,000 Hz. This can be represented as

m(f_1, t) = \sum_{k=1}^{n} A_k \cos(2\pi k f_1 t).                          (1.2)

The amplitude of the fundamental and harmonics present is specified by each of the values that A_k can attain. This product will be

m(t) = \cos(2\pi f_2 t) \sum_{k=1}^{n} A_k \cos(2\pi k f_1 t),              (1.3)

which is the product of cosine functions. By use of the trigonometric formula

\cos(\alpha)\cos(\beta) = \frac{1}{2}\cos(\alpha - \beta) + \frac{1}{2}\cos(\alpha + \beta),   (1.4)

the product in (1.3) becomes a "sum and difference" mixer product

m(t) = \sum_{k=1}^{n} \frac{A_k}{2}\left[\cos(2\pi(f_2 - k f_1)t) + \cos(2\pi(f_2 + k f_1)t)\right].   (1.5)


Given that the carrier oscillator source is always much higher in frequency than the modulating signal, the output of the mixer will be a grouping of signals above and below the carrier frequency. The sideband filter only passes the signals from one sideband. As upper sideband is generally the standard modulation technique used in the high frequency spectrum, the output of the sideband filter would be

\sum_{k=1}^{n} \frac{A_k}{2}\cos(2\pi(k f_1 + f_2)t).                        (1.6)

The undesired signal output from the mixer is commonly called the image frequency. In this case the image consists of multiple frequencies based upon the original modulating input

\sum_{k=1}^{n} \frac{A_k}{2}\cos(2\pi(f_2 - k f_1)t).                        (1.7)

A numerical example given in Table 1.3 can be used to illustrate SSB modulation. Consider a vowel that has a fundamental frequency of 256 Hz and harmonics that reach up to 3,000 Hz. The carrier oscillator in this example will be at 455 kHz with the sideband filter passing only the upper sideband information from 0 to 3,000 Hz above the carrier frequency.

As mentioned previously, a single sideband communications link is generally limited to signals in the audio band from 300 Hz to 3,000 Hz. Depending upon the shape factor of the sideband filter and the responsivity of the microphone, it is entirely possible that the 256 Hz fundamental will not be available at the receiver. The blind clarification algorithm cannot count on energy at the fundamental frequency being present.


Harmonic            Fundamental   2nd           3rd           ...   10th          11th
Frequency           256 Hz        512 Hz        768 Hz        ...   2560 Hz       2816 Hz
Carrier Oscillator  455.000 kHz   455.000 kHz   455.000 kHz   ...   455.000 kHz   455.000 kHz
Output Frequency    455.256 kHz   455.512 kHz   455.768 kHz   ...   457.560 kHz   457.816 kHz

Table 1.3 SSB Modulator Outputs

1.2.2 Translating the SSB Signal to the Transmit Frequency

Single sideband transmissions are most commonly used in the High Frequency (HF) radio spectrum. The HF spectrum can be considered to be from 1.6 MHz to 30.0 MHz. The 455 kHz SSB signal generated in 1.2.1 is outside of this frequency range. As a result, frequency conversion is required.

The frequency conversion schemes commonly used involve multiple oscillator injection frequencies and multiple mixers. Figure 1.5 shows a frequency conversion scheme appropriate for the translation of SSB signals from baseband audio to the SSB signal at the desired transmit frequency. It is important to note that the phase locked loop (PLL) synthesizers are all locked to the frequency standard. This is true in virtually all modern commercial and military SSB equipment. In low cost amateur applications it is not unusual for some of the oscillator injection frequencies to be generated by separate unlocked frequency sources.


Figure 1.5 SSB Transmit Conversion Chain


The rationale for multiple conversions and injection frequencies involves practical limitations in phase locked loop design and high rejection ratios for out of band transmit products. While phase locked loop design and heterodyne frequency product analysis are not germane to this thesis, the effect on transmit frequency stability and accuracy due to the use of multiple synthesizers is a primary concern.

1.2.3 Frequency Synthesizers

As illustrated in Figures 1.4 and 1.5, frequency synthesizers are integral to SSB communications systems. Two common schemes are used to synthesize the frequencies used for frequency conversion.

1.2.3.1 Phase Locked Loop Frequency Synthesizers

The portion of PLL theory required for this thesis involves frequency accuracy. Modern phase locked loops use a dual divider scheme to generate the output frequency as shown in Figure 1.6.

Figure 1.6 Phase Locked Loop Synthesizer


The PLL acts as an analog servo control system. The tuning voltage will drive the voltage-controlled oscillator until the reference frequency and the feedback frequency (and phase) are matched. The divider m sets the tuning step size for the synthesizer. The divider n sets the output frequency. The output frequency is simply the reference frequency times n. The output frequency is therefore

f_{out} = \left(\frac{f_{master\,oscillator}}{m}\right) \cdot n.             (1.8)

An implementation of the 455 kHz synthesized injection oscillator required in Figure 1.4 and Table 1.3 using a 5 kHz reference frequency could have m = 400 and n = 91 for a 2 MHz master oscillator input. Any errors in the master oscillator frequency will be referred to the PLL output; i.e., if the master oscillator is specified to be within 1 part per million (ppm), then the output frequency will have that specification as well [8].

1.2.3.2 Numerically Controlled Oscillators

A newer technique that uses direct digital synthesis techniques in place of PLL sources is becoming common in transmitters and in the later oscillators in receivers. Such oscillators are called numerically controlled oscillators (NCOs) [9]. The block diagram for a typical NCO is given in Figure 1.7.

The key to understanding the operation of an NCO is to consider the behavior of the accumulator. Consider an NCO that uses a 32-bit accumulator that is updated by a master oscillator at a 100 MHz rate. Once every 10 ns the value programmed into the increment word is added to the contents of the accumulator. Being a 32-bit value, the accumulator must transition through a total of 4,294,967,296 counts before it rolls over and returns to 0. If, for example, the increment register is programmed to contain 4295, then the accumulator will roll over approximately 100 times per second.


Figure 1.7 Numerically Controlled Oscillator

This means that the count conveyed into the sine lookup table will outline about 100 sawtooth count sequences per second. The sine lookup table will use the most significant bit outputs from the accumulator to convert these to sine waves that are then converted to an analog signal by the digital-to-analog converter. The analog signal is then passed through a low pass filter in order to remove any of the master oscillator frequency components. The size of the sine lookup table and the number of bits in the digital-to-analog converter determine the fidelity of the sine wave.

The frequency of the output is of primary concern. In this example, the exact output frequency (assuming a perfect master oscillator) is

f_{out} = f_{reference} \cdot \frac{\mathrm{increment}}{2^{n}} = 100\ \mathrm{MHz} \cdot \frac{4295}{4294967296} = 100.0000761\ \mathrm{Hz}.   (1.9)

The small setting error (76.1 µHz) is typical of a general purpose NCO.
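As a quick numerical check of (1.9), the NCO output frequency is just the reference frequency scaled by the increment word over the accumulator span. The Delphi fragment below is a minimal sketch of that arithmetic; the function name and parameters are illustrative and are not part of the thesis software.

function NcoOutputFrequency(fReference, incrementWord: extended): extended;
const
  AccumulatorCounts = 4294967296.0;  { 2^32 counts in the 32-bit accumulator }
begin
  { f_out = f_reference * increment / 2^32, per equation (1.9) }
  NcoOutputFrequency := fReference * incrementWord / AccumulatorCounts;
end;

Evaluating NcoOutputFrequency(100.0e6, 4295.0) returns approximately 100.0000761 Hz, matching the example above.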


In SSB equipment, the reference frequency is chosen so that the NCO synthesizer output is exactly on the desired frequency, assuming that the master oscillator is absolutely correct. Unfortunately, the master oscillator is not perfect and in fact has an error just as in the PLL synthesizer case. Equation (1.9) demonstrates that, just as in PLLs, the frequency output of an NCO is directly proportional to the error in the master oscillator. In further discussions involving oscillator injection signals, no distinction will be made between those generated by NCOs and PLLs.

1.2.4 SSB Demodulation - Recovering the Original Audio Signal

A configuration similar to Figure 1.4 is used to recover the original audio from a SSB modulated signal and is shown in Figure 1.8.

Figure 1.8 Single Sideband Demodulation

The input to the SSB demodulator will be

\sum_{k=1}^{n} \frac{A_k}{2}\cos(2\pi(k f_1 + f_2)t),                        (1.10)


which is the same as (1.6). The mixer in the demodulator multiplies the input and the local oscillator to yield

\left[\sum_{k=1}^{n} \frac{A_k}{2}\cos(2\pi(k f_1 + f_2)t)\right]\cos(2\pi f_2 t).   (1.11)

Again, using trigonometric identities, two mixer products will be present. The desired output, assuming that the carrier frequency f_2 used for the modulation and demodulation processes is identical, will be returned to baseband. This output, neglecting noise, interference, and gain differences, will be identical to the original input

m(f_1, t) = \sum_{k=1}^{n} A_k \cos(2\pi k f_1 t).                           (1.12)

The significant difference between the modulation and demodulation processes is that the image product from the mixer in the demodulator consists of a band of frequencies above the carrier oscillator frequency at

\sum_{k=1}^{n} \frac{A_k}{4}\cos(2\pi(k f_1 + f_{2(\mathrm{mod})} + f_{2(\mathrm{demod})})t).   (1.13)

This places the image product at a frequency that is significantly above baseband. As a result, a low pass filter is used to remove the image product.


1.2.5 Translating the Received SSB Signal to Baseband

When a SSB signal in the HF radio spectrum is received, it must be translated to baseband. This is generally done using a conversion scheme, shown in Figure 1.9, that is in opposite order to the transmit chain described in 1.2.2. As with the transmit conversion chain, all of the synthesizers in the receive system in commercial and military SSB receivers are locked to the frequency standard.

1.3 Conversion Chain Error Analysis

Each of the synthesizers in the transmit and receive conversion chains will induce a frequency error in the final product of the chain. In a transmitter, the error is at the transmit frequency. At the receiver, the error is in the demodulation of the signal. Taken together, these cause a clarification error in the received signal at baseband.

1.3.1 Transmit Conversion Chain Frequency Errors

Consider the transmit chain shown in Figure 1.5 on a channel frequency of 14.3 MHz using upper sideband modulation (USB). The final transmit frequency, given a modulation input at f_1, will be

f_{out} = 57.21\ \mathrm{MHz} - (32.21\ \mathrm{MHz} + (11.155\ \mathrm{MHz} - (f_1 + 455\ \mathrm{kHz})))
f_{out} = 14.3\ \mathrm{MHz} + f_1.                                          (1.14)


Figure 1.9 Receive Conversion Chain


Since each of the conversion frequencies is locked to the same master oscillator, the errors will sum in similar fashion. For the purpose of this analysis, assume that the master oscillator is high by x parts per million. The transmit frequency error would be

f_{T\,error} = 57.21x - (32.21x + (11.155x - 0.455x))
f_{T\,error} = 14.3x\ \mathrm{Hz}.                                           (1.15)

The final result is simply the error multiplied by the channel frequency. This is the case in all well designed transmitters and can never be less than this, since the combination of the frequencies must result in the channel frequency. However, the error can be more if the injection sources are not locked to a master oscillator. Consider the same frequency chain with unlocked oscillators. Since the errors are uncorrelated and can be either positive or negative, they directly sum together:

f_{T\,error} = 57.21x_1 - (32.21x_2 + (11.155x_3 - 0.455x_4))
f_{T\,error} = 57.21x_1 + 32.21x_2 + 11.155x_3 + 0.455x_4\ \mathrm{Hz}.      (1.16)

In order for a transmitter without a master oscillator to achieve the same transmit frequency accuracy as a transmitter using a master oscillator, the high frequency oscillators (HFO) would have to be more stable than a master oscillator. This is unlikely since master oscillators use temperature compensated crystal oscillators. The HFO in a transmitter sets the channel frequencies. Unless the transmitter is designed for single channel operation, it is likely that the HFO is a variable frequency oscillator. Using a master oscillator can improve the frequency accuracy of a transmitter by orders of magnitude.


In low cost amateur SSB transmitters, the high frequency oscillators are typically synthesized and locked to a master oscillator, while the lower frequency oscillators are fixed independent crystal oscillators. While not as stable as a fully synthesized commercial or military transmitter, low cost amateur SSB equipment has acceptable frequency accuracy for amateur applications [10].

1.3.2 Receive Conversion Chain Frequency Errors

The frequency errors in a SSB receiver add in a similar fashion, except that they result in an error at baseband. Again using the example of a USB signal at 14.3 MHz with a modulation frequency of f_1,

f_{R\,out} = 11.155\ \mathrm{MHz} - (57.21\ \mathrm{MHz} - (14.3\ \mathrm{MHz} + f_1) - 32.21\ \mathrm{MHz}) - 455\ \mathrm{kHz}
f_{R\,out} = f_1.                                                            (1.17)

For a fully synthesized receiver with an error of x ppm in the master oscillator, the frequency error in the final received audio signal would be

f_{R\,error} = 11.155x - (57.21x - 32.21x) - 0.455x
f_{R\,error} = -14.3x\ \mathrm{Hz}.                                          (1.18)

Note that this error is the same as the transmit error for a fully synthesized transmitter, except opposite in sign. Again, in a receiver without a master oscillator the errors have the potential of being much larger. As with the transmitter, the errors can be either positive or negative, so without a loss of generality:


f_{R\,error} = 11.155x_3 - (57.21x_1 - 32.21x_2) - 0.455x_4
f_{R\,error} = 57.21x_1 + 32.21x_2 + 11.155x_3 + 0.455x_4\ \mathrm{Hz},      (1.19)

which is the same as for the transmitter as shown in (1.16).

1.3.3 Combined Conversion Chain Frequency Errors

If the transmitter and receiver are taken together, then the errors add. For the example of a commercial or military transmitter using USB modulation at 14.3 MHz with master oscillator errors of x ppm,

f_{error} = 14.3x_{transmit} - 14.3x_{receive}\ \mathrm{Hz}.                 (1.20)

If the transmitter and receiver were locked to the same master oscillator, then the errors would cancel. While some exotic military equipment does this using the Global Positioning System (GPS) satellite constellation, this is not the usual practice. As a result, using master oscillators that are accurate to 1 ppm, the maximum error on either transmit or receive for well designed HF SSB transmitters and receivers is 30 Hz each. The total error could be as high as 60 Hz for the combined transmitter-to-receiver link.

The United States Government (USG) standard for HF transmitting and receiving equipment is given in the National Telecommunication and Information Administration (NTIA) "Red Book" [11]. For SSB signals in the HF spectrum, the maximum frequency error allowed is 20 Hz. This implies that well designed, fully synthesized SSB transmitter equipment used on USG nets must have master oscillators with an accuracy of at least 0.67 ppm to be within the NTIA standard from 1.6 to 30 MHz.
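To make this error budget concrete, the worst-case link error is simply the channel frequency multiplied by the sum of the two oscillator tolerances, following (1.20) with the transmit and receive errors taken at opposite signs. The Delphi fragment below is a minimal sketch; the name and parameters are illustrative.

function WorstCaseLinkErrorHz(channelHz, txPpm, rxPpm: extended): extended;
begin
  { Each conversion chain contributes (channel frequency) x (master
    oscillator error in ppm); in the worst case the transmit and
    receive errors have opposite signs and therefore add }
  WorstCaseLinkErrorHz := channelHz * (txPpm + rxPpm) * 1.0e-6;
end;

For a 30 MHz channel with 1 ppm master oscillators on both ends this evaluates to 60 Hz, the worst-case figure quoted above; holding one chain to the 20 Hz NTIA limit at 30 MHz is what leads to the 0.67 ppm requirement.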


Amateur equipment designed without injection sources locked to master oscillators does much more poorly. Unlike (1.15) and (1.18), whose errors subtract from each other, the errors can exceed 100 Hz for either the transmitter or receiver alone. This is the rationale for why the USG does not allow amateur quality transmitters to be used in their HF communication nets.

Conversely, such low cost equipment is satisfactory for amateur use. This is because, while the USG is allocated communication channels, amateurs are allocated communication bands. An amateur need only stay in a band to operate legally. Amateur HF bands are very wide in terms of SSB signals; most band segments allocated to SSB transmissions are over 100 kHz wide. Amateur radio operators are well aware of this. Only amateur operators with well calibrated, fully synthesized transmitters operate near the edge of band segments.

One final point concerning amateur SSB equipment: frequency errors evolve over time. The oscillators may not be sufficiently temperature compensated; therefore, as a transmission is being made, the heat dissipated by the final power amplifiers in the transmitter will induce a frequency shift in the SSB transmit signal.


2. The Blind Clarification Algorithm

2.1 The Effect of a Tuning Error on SSB Voice Modulation

As shown in (1.18), if the transmitter and receiver conversion chains are aligned and calibrated together, there will be no frequency errors in the recovered audio. For a voice signal transmitted using SSB modulation, this means that any fundamentals and harmonics present will be at the original frequencies. This implies that the voice signal will be maximally intelligible by the listener if noise and interference are not present. The only signals presented to the listener will be in the 300 to 3,000 Hz passband. The end-to-end link bandwidth is determined by the sideband filters in the transmitter and receiver as well as the tailored frequency responses of the microphone, earphones, or loudspeaker.

The case of interest in this thesis is when there is an error between the frequencies transmitted and those received. Equations (1.13) and (1.17) demonstrate that frequency errors are simply added to the modulating spectra when using USB modulation. For a transmitted signal at 14.3 MHz with frequency errors of x ppm at transmit and receive, combining (1.12), (1.14), and (1.18) yields

f_{output} = 14.3x_{transmit} - 14.3x_{receive} + \sum_{k=1}^{n} A_k \cos(2\pi k f_1 t).   (2.1)

By combining the transmit and receive errors while removing the constants, this becomes


f_{output} = f_{error} + \sum_{k=1}^{n} A_k \cos(2\pi k f_1 t).              (2.2)

Using the same data as in Table 1.3, consider the result of demodulation with a tuning error of +20 Hz. This can be caused by either the transmitter carrier oscillator being 20 Hz high, the receiver carrier oscillator being 20 Hz low, or a combination of errors.

Harmonic             Fundamental   2nd           3rd           ...   10th          11th
Input Frequency      455.256 kHz   455.512 kHz   455.768 kHz   ...   457.560 kHz   457.816 kHz
Carrier Oscillator   454.980 kHz   454.980 kHz   454.980 kHz   ...   454.980 kHz   454.980 kHz
Recovered Frequency  276 Hz        532 Hz        788 Hz        ...   2580 Hz       2836 Hz
Error                +20 Hz        +20 Hz        +20 Hz        ...   +20 Hz        +20 Hz

Table 2.1 SSB Demodulation Output with +20 Hz Tuning Error

Figure 2.1 shows the spectrum in Figure 1.3 with the 20 Hz shift in demodulation. A total tuning error of +20 Hz would distort the recovered voice, but the signal would still be intelligible. In commercial and military equipment, transmit and receive frequency errors add and, as a result, a total error of 40 Hz could be present and the equipment would still meet the USG frequency tolerance requirement. Unfortunately, errors of this magnitude are apparent to the listener and can impede intelligibility.
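The Table 2.1 entries follow directly from (2.2): every harmonic of the vowel is recovered shifted by the same total tuning error. The Delphi fragment below is a minimal sketch of that arithmetic; the procedure name and parameters are illustrative.

procedure PrintRecoveredFrequencies(fundamentalHz, tuningErrorHz: extended;
                                    harmonicCount: integer);
var
  k: integer;
begin
  { Each harmonic k*f1 of the vowel is recovered at k*f1 + f_error, per (2.2) }
  for k := 1 to harmonicCount do
    WriteLn('Harmonic ', k, ': ', fundamentalHz * k + tuningErrorHz:8:1, ' Hz');
end;

Calling PrintRecoveredFrequencies(256.0, 20.0, 11) from a console program reproduces the 276 Hz through 2,836 Hz entries of Table 2.1.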


Figure 2.1 155 Hz "ah" Vowel with +20 Hz Tuning Error

2.2 System Engineering Aspects of the Design

The algorithms developed here are based on digital signal processing techniques using sampled data. The selection of the sampling interval, number of samples per ensemble, quantization level, and numerical precision are driven by a number of factors. Each factor is described separately.

The length of vowels in time drives the total time span allowed for a data ensemble. The algorithms here are batch processes; they assume that vowels are stationary for the duration of an ensemble. The exact duration of vowels depends upon the speaker, the language being spoken, and the dialog content. The assumption is that vowels will have a duration of between 1/5th and 1/20th of a second. This limits the time duration of a data ensemble to 1/5th of a second or less.


Another constraint on the number of samples in an ensemble is the frequency resolution of the fast Fourier transform (FFT) algorithm itself. The goal is to keep the combined tuning error under the 20 Hz USG requirement. This implies that the bin spacing of the FFT should be less than 20 Hz with adequate margin. Tight frequency resolution requirements tend to drive toward FFT based computations with a large number of bins.

The computational capability of a typical personal computer (PC) must also be considered. Large numbers of complex computations, such as FFTs, can seriously affect the throughput of a PC. This limitation forces the FFT to have a minimum number of bins to meet the given frequency resolution requirement.

Since the highest frequency present at the output of a SSB demodulator is about 3 kHz, by the Nyquist theorem, the sample rate must be greater than 6 kHz. Unfortunately, the IF amplifiers, detectors, and audio amplification stages in an HF receiver have significant noise contribution up to approximately 8 kHz. This implies that it would be prudent to sample the receiver audio at a rate higher than 16 kHz so that high frequency noise does not alias into the analysis passband. Standard PC sound boards allow for sampling at 8 kHz, 11.025 kHz, 22.050 kHz, and 44.100 kHz. The linear quantization levels available on a PC are 8 and 16 bits.

The hardware floating point mathematics in a PC is based on the IEEE-754 standard. The precision for a floating point computation ranges from single precision with a 23-bit mantissa to extended precision with a 63-bit mantissa. The selected floating point resolution must be capable of supporting the multiplication of quantized data without a resultant loss of precision due to truncation.

The system level requirements led to a point design with characteristics as specified in Table 2.2.


Sample Rate:             22,050 Hz
Ensemble Size:           4,096 samples
Quantization Level:      16 bits linear
Mathematical Precision:  IEEE-754 extended precision

Table 2.2 Point Design Based on System Specifications

The sample rate of 22,050 Hz along with an ensemble size of 4,096 indicates that an ensemble will require 186 ms to collect, which is less than the 1/5th second assumed duration of a vowel. The frequency resolution of FFTs using 4,096 points is 5.383 Hz, less than the requirement of 20 Hz. Using the maximum available quantization level of 16 bits was a reasonable choice, but it forced the requirement that the multiplication of two 16-bit values should not be truncated. The use of IEEE-754 extended precision with its 63-bit mantissas easily met this requirement.

The compiler chosen was Borland Delphi 6.0. A computer program was written to verify that the precision was indeed 63 bits using a computation of machine epsilon in Delphi 6.0:

function Emeps: extended;
var
  temp, one, two: extended;
begin
  one:= 1.0;
  two:= 2.0;
  temp:= 1.0;
  repeat
    temp:= temp / two
  until (temp + one) = one;
  Emeps:= temp + temp
end;


The result was as anticipated:

Figure 2.2 Machine Epsilon Calculation Result

The number of significant bits was computed using a library call to the Delphi Log2 function. Printing was by conversion to strings:

procedure TMEPSForm.MEPSButtonClick(Sender: TObject);
var
  meps: extended;
  st: string[80];
begin
  meps:= Emeps;
  str (meps:35, st);
  mepsEdit.Text:= st;
  str (-log2(meps):10:2, st);
  SigBitEdit.Text:= st
end;

2.3 Determination of Vowel Fundamental Frequency

Equation (2.2) contains two unknown frequencies, the tuning offset (or error) and the fundamental frequency of the vowel being transmitted. The algorithms for determination of the vowel fundamental frequency will be developed first.


2.3.1 Origin of the Vowel Fundamental Determination Algorithms

The concepts presented here grew out of a discussion concerning automatic frequency control (AFC) techniques in general. Since a continuous carrier is present in many forms of communication, phase locked loops can be used to recover the carrier and adjust receiver tuning.

For example, in binary frequency shift keying (FSK) transmission systems, the output carrier alternates between two transmit frequencies. These are commonly called the mark and space frequencies and are placed symmetrically above and below the assigned channel frequency. In practice, the transmit data is randomized; the transmitted signal has a probability of close to 0.5 that any transmitted bit will be a mark or space. As a result, the density of mark and space tones will be approximately equal over long periods of time. Figure 2.3 presents the block diagram of a commonly used demodulator for FSK signals that can track and remove errors due to frequency drift.

Figure 2.3 AFC Circuit for FSK Radio Channel


Unfortunately, the human voice does not have characteristics even remotely similar to an FSK signal. There is no continuous carrier. The transmitted "tones" are unknown as to their frequency, duration, and probability distribution.

Figure 2.4 compares two examples of recorded vowels. The second example is shifted by +20 Hz. The plots in Figure 2.4 were generated by computing the discrete Fourier transform of two sets of sampled audio data. This was accomplished using an FFT algorithm on sampled and windowed data sets created at different times. It is apparent that in the frequency domain the two are correlated even though one spectrum is shifted to simulate a +20 Hz SSB tuning error while the spectral line spacing is left intact.

The first algorithm for determining vowel fundamental frequencies compares (by correlation) the unknown audio input and a reference signal with a varying fundamental frequency in the frequency domain. When the two fundamental frequencies match, the correlation will be at a maximum.

The second algorithm is based on the use of autocorrelations. Similar techniques have been developed over the years to detect periodic signals with an unknown period [12]. If one compares a periodogram of a voice signal in the frequency domain next to one of a periodic signal with an unknown period in the time domain, the periodograms will appear similar. This algorithm compares the autocorrelation of the incoming audio samples in the frequency domain with a reference spectral line simulation.


Figure 2.4 Comparison of Shifted and Un-shifted Spectra (panels: Bass-Baritone 155 Hertz "ah" Vowel; Bass-Baritone 155 Hz "ah" Vowel with +20 Hz Tuning Error)


2.3.2 Analysis of Vowels in the Frequency Domain

Equation (2.2) demonstrates that the received vowel consists of a composite of a finite number of periodic functions. Consider a single periodic function of the form

h(t) = A\cos(2\pi f_0 t).                                                    (2.3)

The Fourier transform of this periodic function is given by

H(f) = \int_{-\infty}^{\infty} A\cos(2\pi f_0 t)\,e^{-j2\pi f t}\,dt = \frac{A}{2}\left(\delta(f - f_0) + \delta(f + f_0)\right).   (2.4)

This can be represented graphically by Figure 2.5.

Figure 2.5 Fourier Transform of A cos(2πf0t)

In this application, the periodic function is not merely one sine or cosine wave, but the sum of the fundamental and harmonics up to the limit of the SSB filter, which is about 3,000 Hz.


Since the Fourier transform is linear, the Fourier transform of the sum of components is equal to the sum of the Fourier transforms of each component. As a result, the Fourier transform of the components of a vowel appears as shown in Figure 2.6.

Figure 2.6 Fourier Transform of a Vowel

The periodic function available for processing does not extend to infinity on the time axis. It is essentially multiplied by a "window function" that makes the waveform length finite. The natural approach is to use a rectangular window; i.e., simply truncate the periodic function on both ends. This is the same as multiplying the periodic function by a rectangular wave. Consider the Fourier transform pair of the rectangular window by itself:


h(t) = A for |t| < T_0,   h(t) = A/2 for |t| = T_0,   h(t) = 0 for |t| > T_0

H(f) = 2AT_0\,\frac{\sin(2\pi T_0 f)}{2\pi T_0 f}.                           (2.5)

This generates the classic sinc function; the magnitude of the sinc function is shown in Figure 2.7.

Figure 2.7 Magnitude of the Sinc Function

The frequency-convolution theorem states that

h(t)\,x(t) \Leftrightarrow H(f) * X(f).                                      (2.6)


When the Fourier transform of the periodic function and the rectangular window are convolved, each of the spectral lines displays the characteristic side lobes of the sinc function. In discrete Fourier transforms, the side lobes "leak" into adjacent bins; hence the name spectral "leakage". This means that each spectral line in Figure 2.6 will be spread and give the appearance of Figure 2.8.

Figure 2.8 Fourier Transform of Vowel with Side Lobes

As a result, different windowing functions have been developed to minimize the side lobes. An example of an improved window function is the Hann window [13]:

w(t) = 0.5\left(1 - \cos\frac{2\pi t}{T}\right), \quad 0 \le t \le T.        (2.7)

T is the total time duration of the window, with t being the time from 0 to T.


The window function is then multiplied with the periodic function being analyzed, in this case a vowel. While not perfect, the Hann window attenuates the highest side lobe by 32 dB and has an asymptotic roll off of 18 dB per octave [14].

The software developed here allows for operator selection of four different window functions: the rectangular, Parzen, Hann, and Welch windows. The default window is the Welch. Table 2.3 gives the discrete versions of the window functions, where N is the total number of samples in an ensemble and j is the index to a particular data point. (A code sketch of these window functions is given at the end of this subsection, after the discussion of the FFT.)

Window       Mathematical Description
Rectangular  1.0
Parzen       1.0 - Abs((j - 0.5*(N - 1.0)) / (0.5*(N + 1.0)))
Hann         0.5*(1.0 - Cos(2.0*pi*j / (N - 1.0)))
Welch        1.0 - Sqr((j - 0.5*(N - 1.0)) / (0.5*(N + 1.0)))

Table 2.3 Typical Window Functions

In this application, Figure 2.8 would represent the case of no tuning error in a SSB communication system. When the tuning errors are added, the spectral lines are shifted as shown in Figure 2.9.

The analysis up to this point has concentrated on continuous functions. The algorithms are actually implemented using sampled data and the discrete Fourier transform. The fast Fourier transform (FFT) is an algorithm by which discrete Fourier transforms (DFT) are efficiently calculated by computer. The FFT is an algorithm and not an approximation; it provides a considerable improvement in efficiency when compared to calculating a DFT directly. The detailed design of FFT algorithms is not covered here; a particularly good algorithm description is given by Higgins in Digital Signal Processing in VLSI [15].
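As promised above, the discrete windows of Table 2.3 translate directly into code. The Delphi fragment below is a minimal sketch that returns the weight for sample j of an N-point ensemble; the enumeration and function name are illustrative and are not the interface of the thesis software.

type
  TWindowKind = (wkRectangular, wkParzen, wkHann, wkWelch);

function WindowWeight(kind: TWindowKind; j, N: integer): extended;
begin
  { Discrete window weights as listed in Table 2.3 }
  case kind of
    wkRectangular: WindowWeight := 1.0;
    wkParzen:      WindowWeight := 1.0 - Abs((j - 0.5 * (N - 1.0)) / (0.5 * (N + 1.0)));
    wkHann:        WindowWeight := 0.5 * (1.0 - Cos(2.0 * Pi * j / (N - 1.0)));
    wkWelch:       WindowWeight := 1.0 - Sqr((j - 0.5 * (N - 1.0)) / (0.5 * (N + 1.0)));
  end;
end;

Applying the window is then a single loop that multiplies each of the 4,096 ensemble samples by WindowWeight(wkWelch, j, 4096) before the FFT.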


Figure 2.9 Vowel with Side Lobes Subject to a Tuning Error

From the point of view of Figure 2.9, the use of sampled data concentrates the energy into bins as seen in Figure 2.10.

Figure 2.10 Vowel Spectra using Sampled Data and Discrete Fourier Transform

As with the continuous case, the spectral lines will be shifted if there is a SSB tuning error. Fortunately, the spacing between the spectral lines located in each half plane remains identical with that of a signal with no tuning error.


2.3.4 Initial Audio Sample Processing

A PC sound card is used to acquire data sets at a 22,050 Hz sample rate. The 16-bit linearly quantized samples are collected and stored in a Windows .wav file. A single pass through the blind clarification algorithm requires that 4,096 real data samples be read from the .wav file and processed.

Figure 2.11 Typical Input Data Ensemble in Time Domain


Except for the final tuning error frequency determination, all applications of the input data are in the frequency domain. In order to prevent spectral leakage when the input data is transformed from the time domain to the frequency domain, a window function is applied. This window function will reduce the amplitudes to zero at the ends of the ensemble. The default here is the Welch window function.

Figure 2.12 Input Data Ensemble after Window is Applied

After windowing, the real data array is converted to a complex array simply by inserting an imaginary value of 0 between each real value in the data array. The windowed data is then transformed into the frequency domain using an FFT based on the Danielson-Lanczos formula [16].


This approach is not as computationally efficient as algorithms that are designed for processing real-valued inputs; however, this same FFT algorithm is used elsewhere; in this way only one routine needs to be coded and tested.

Figure 2.13 Simulated Data in the Frequency Domain; With Tuning Error


2.3.5 Vowel Simulation

A simulation of vowels was required both for testing the algorithms and, in some cases, for comparison purposes in some of the determination algorithms. This simulator was designed to generate data with the following characteristics:

(Simulated) Sample Frequency:  22,050 Hz
Data Set Length:               131,072 samples
Added White Gaussian Noise:    adjustable from 0 to -100 dB SNR
Vowel Fundamental:             selectable from 87 Hz to 439 Hz
Harmonics:                     up to 3,000 Hz
Total Spectral Range:          100 - 3,000 Hz
Amplitude Ratio:               fundamental and harmonics identical (1:1)

Table 2.4 Data Simulator Requirements

2.3.6 Determination of Vowel Fundamentals using Sliding Correlations

Consider the application of the discrete correlation relationship on the two sets of data in the frequency domain [14]:

z(k) = \sum_{i=1}^{n} h(i)\,x(k + i).                                        (2.8)

If the fundamental frequencies match, the spectral line spacing, in terms of FFT bins, will be identical. Therefore the result, z(k), will be at a maximum (compared to other cases where the fundamental frequencies of h(i) and x(k + i) differ). Correlations can be computed using a brute force approach suggested by (2.8).
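Written out directly, (2.8) is just a shifted sum of products. The Delphi fragment below is a minimal sketch of that brute-force form; the array bounds and names are illustrative only.

function SlidingCorrelation(const h: array of extended;
                            const x: array of extended;
                            k: integer): extended;
var
  i: integer;
  sum: extended;
begin
  { z(k) = sum over i of h(i) * x(k + i), per (2.8);
    terms that would index past the end of x are simply skipped }
  sum := 0.0;
  for i := 0 to High(h) do
    if (k + i) <= High(x) then
      sum := sum + h[i] * x[k + i];
  SlidingCorrelation := sum;
end;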


This is generally avoided because the operation requires n² computations in order to generate each possible alignment. A better approach is to allow FFTs to compute the correlation products. For any given data set, the frequency components are band limited and do not extend to the ends of the data ensemble. This makes each sample set of finite length and therefore the complexities of overlap-add and overlap-save algorithms can be avoided.

The standard approach to using FFTs to perform correlations uses both forward and inverse FFTs:

z(k) = \mathrm{FFT}^{-1}\left[\,\mathrm{FFT}[h(i)] \cdot \mathrm{FFT}[x(i)]^{*}\,\right].   (2.9)

This algorithm amounts to taking the FFT of both data sets, taking the conjugate of the 2nd data set, multiplying the sets together, and then taking the inverse FFT of the product. Brigham outlines a method of computing correlations using only forward FFTs [14]:

z(k) = \frac{1}{N}\left(\mathrm{FFT}\!\left[\left(\mathrm{FFT}[h(i)] \cdot \mathrm{FFT}[x(i)]^{*}\right)^{*}\right]\right)^{*}.   (2.10)

This is the approach that was used here. While it is slightly less efficient than the algorithm that uses the inverse FFT shown in (2.9), only the forward FFT routine needs to be coded and tested. (The frequency-domain product common to both forms is sketched in code at the end of this discussion.) In order to avoid interpretation problems with the correlation result, it is necessary to carefully pad each input sequence with zeros so that the active data spans do not overlap.
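As noted above, whichever of (2.9) or (2.10) is used, the central step is the element-by-element product of one transform with the conjugate of the other. The Delphi fragment below is a minimal sketch of that step; the record type and names are assumptions for illustration and are not the thesis code.

type
  TComplexBin = record
    re, im: extended;
  end;

procedure ConjugateProduct(const a: array of TComplexBin;
                           const b: array of TComplexBin;
                           var product: array of TComplexBin);
var
  i: integer;
begin
  { product(i) = a(i) * conjugate(b(i)); transforming this product,
    per (2.9) or (2.10), yields the correlation result }
  for i := 0 to High(a) do
  begin
    product[i].re := a[i].re * b[i].re + a[i].im * b[i].im;
    product[i].im := a[i].im * b[i].re - a[i].re * b[i].im;
  end;
end;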


For the purpose of determining the best candidate simulated frequency, all of the 4,096 bins are summed using a root mean square (RMS) algorithm to yield a "score". The score for each frequency is compared; the highest score is assumed to indicate the correct vowel fundamental frequency.

This approach has the distinct advantage of equally weighting all 4,096 samples in the result. This reduces the effect of channel noise, which is assumed to be additive white Gaussian noise, by a factor of 64, or 36 dB. Given that successful SSB reception requires an SNR of at least 10 dB, the vowel fundamental frequency determination algorithm should be essentially immune to channel noise in practice. This hypothesis will be tested in simulation.

2.3.6.1 Algorithmic Details of Correlative Vowel Determination

Figure 2.14 gives the basic algorithmic flow used to determine vowel fundamental frequencies. In order to illustrate the nature of the algorithms while they are being described, a simulation was performed with intermediate products saved for plotting. The simulation conditions are given in Table 2.5.

Fundamental Frequency   107.666 Hz
Tuning Error            -20 Hz
Signal to Noise Ratio   20 dB
Speaker Classification  Bass-Baritone

Table 2.5 Demonstration Simulation Run Specifications


Figure 2.14 Correlative Fundamental Determination Algorithm


2.3.6.2 Reference Data Set Generation

The sliding correlation requires that simulated data be generated for comparison at each candidate vowel fundamental frequency. Nominally, each simulated data set is produced at 0.5 Hz increments. For all possibilities of spoken vowels, this would require that over 700 correlation runs be performed in order to cover the 87 Hz to 439 Hz range of possible vowel fundamentals. In order to reduce the number of correlations required per pass, the frequency bounds were made manually adjustable by voice classification (see Table 2.6).

Classification  Low Frequency  High Frequency
Soprano         246.94 Hz      438.88 Hz
Mezzo-Soprano   207.65 Hz      415.30 Hz
Contralto       185.00 Hz      370.00 Hz
Tenor           146.83 Hz      293.66 Hz
Baritone        92.50 Hz       185.00 Hz
Bass            87.31 Hz       174.62 Hz

Table 2.6 Voice Classifications

The use of voice classification is not a big compromise. The vast majority of SSB users in the HF spectrum are male; the large majority of them speak in the baritone range. As a result, the software is generally left in the baritone range unless the target SSB signal has audibly different characteristics.

The simulated data is windowed and processed by the FFT algorithm into the frequency domain in the same fashion as the audio sample data. Apart from the offset frequency (and the noise), the two appear very similar. With a fundamental frequency of 107.666 Hz, the second harmonic is exactly located on the correct frequency (see Figure 2.15). The offset samples in Figure 2.13 demonstrate that the target (simulated) audio sample is 20 Hz low.


Figure 2.15 Simulated Data with No Tuning Error


2.3.6.3 Data Set Zero Padding

Prior to being processed by the correlation algorithm, both the input sample and the simulation data sets in the frequency domain must be zero padded. In this case the padding does not change the number of values in the data sets. This was desired so the order of the FFT would not increase and to keep the data sets sized to contain an integral power of 2 number of samples. While it is possible to derive an FFT algorithm for any number of samples in an ensemble, algorithms with a power of two number of samples are much more efficient [17].

The zero padding was used to allow the first half of the Fourier transformed incoming data to remain at the start of the array. The remaining locations in the array were zeroed because they contain no useful data, since the FFT bins span frequencies from 0 to 11,025 Hz and the data is only valid from 0 to 3,000 Hz. The simulated data was also zero padded. In this case the FFT bins from 0 to 5,512.5 Hz were placed at the top of the array with all other elements set to zero. Again, the upper bins were deleted since they contain no useful information. Padded audio data and simulated data are shown in Figure 2.16.

Figure 2.16 Data Sets Padded Prior to Correlation
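For the incoming audio data, the arrangement above amounts to keeping the lower half of the transformed array and zeroing the rest. The Delphi fragment below is a minimal sketch of only that half of the padding scheme; the placement of the simulated data set is omitted, and the names are illustrative.

procedure ZeroPadUpperBins(var re: array of extended;
                           var im: array of extended);
var
  i: integer;
begin
  { Keep the first half of the transformed audio data (roughly 0 to
    5,512.5 Hz of the 0 to 11,025 Hz span) and zero the remaining bins,
    which carry nothing of interest for a 300 to 3,000 Hz SSB voice channel }
  for i := (Length(re) div 2) to High(re) do
  begin
    re[i] := 0.0;
    im[i] := 0.0;
  end;
end;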


2.3.6.4 Data Set Comparison by Correlation

After the correlation is performed, the result is a typical correlation plot as shown in Figures 2.17 and 2.18.

Figure 2.17 Correlation Plot at Full Scale

With identical data for both input arrays, zero padding in such a fashion will result in the correlation maximum being at bin 2,049 of 1 to 4,096, which is correct for this zero padding arrangement.


Figure 2.18 Correlation Plot with No Tuning Error

With a -20 Hz tuning error, the peak moves left by about 4 bins (see Figure 2.19).

Figure 2.19 Correlation Result with Input Shifted -20 Hz


The exact location of the maximum amplitude bin can be used to determine the tuning offset. In this case there is a tuning error, so the maximum is about 4 bins lower than 2,049. Unfortunately, this approach is weak statistically as it uses the result of only 1 bin of the correlation. A stronger approach is presented later.

2.3.6.5 Scoring Correlations

As correlations are performed on simulated data nominally at 1/4th Hz increments, the summation (using RMS) of each correlation is calculated. The set with the largest correlation sum determines the vowel fundamental frequency. Figure 2.20 shows a definite correlation spike at the correct frequency.

Figure 2.20 Sliding Correlation Score; Fundamental = 107.666 Hz

Figure 2.21 Two Radio Personalities Fundamental Vowel Frequencies Compared (panels: Paul Harvey, 0 tuning error; Dave Logan, 0 tuning error)


The multiple fundamentals in the Dave Logan sample were thought to be an artifact, perhaps from vocal inflection or multiple vowels during the data sample interval. Analysis of other data ensembles shows that this is a fundamental characteristic of Dave Logan's voice.

2.3.7 Vowel Fundamental Determination using Autocorrelations

The method for finding vowel fundamental frequencies in 2.3.6 has some serious deficiencies. Chief among these, as shown in Figure 2.22, is the tendency to find spurious peaks in the result that are not due to the desired correlation. While this occasionally happens in simulation, it almost always occurs when processing real voice data.

Figure 2.22 Misidentified Vowel Fundamental Frequency

Another deficiency is that sometimes the correlation peak is biased away from the actual peak.


The amount of the bias changes as the phase of the harmonics of the vowel fundamental change. While smoothing and range trimming algorithms helped the algorithm to find the correct peak, the biasing tendency could not be reduced. A new approach was developed that added autocorrelations to the processing. This solved many of the problems encountered in the original algorithm.

2.3.7.1 Development of the Autocorrelation Based Algorithm

The autocorrelation function can be very useful in identifying periodic signals [12][18]:

z(k) = \sum_{i=1}^{n} g(i)\,g(k + i).                                        (2.11)

As the spectral lines slide across each other, only their products are present in the resulting autocorrelation product. Any signal that lacks the appropriate spectral line characteristics tends to be concentrated at the center of the correlation. For the autocorrelation performed here, this concentration occurs at bin 2,049 of 1 to 4,096. As a result, this central product is omitted from further processing of the final correlation result. As with the previous cross correlation analyses, the autocorrelation is performed using FFTs:

z(k) = \mathrm{FFT}^{-1}\left[\,\left|\mathrm{FFT}[g(i)]\right|^{2}\,\right].   (2.12)

Figure 2.23 diagrams this process.
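Before the example, note that the route through (2.12) needs only the magnitude-squared spectrum ahead of the final transform. The Delphi fragment below is a minimal sketch of that one step; the surrounding FFT calls are omitted and the names are illustrative.

procedure MagnitudeSquared(const re: array of extended;
                           const im: array of extended;
                           var power: array of extended);
var
  i: integer;
begin
  { |FFT[g]|^2 for each bin; transforming this product per (2.12)
    yields the autocorrelation of the windowed ensemble }
  for i := 0 to High(re) do
    power[i] := Sqr(re[i]) + Sqr(im[i]);
end;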


Figure 2.23 Example of Vowel Autocorrelation

Consider a 123.45 Hz target vowel that has a -32.45 Hz offset as shown in Figure 2.24. The harmonic structure is evident in the sample, but the harmonics are shifted by the offset frequency. In addition, the noise amplitude is significant. After the sample is passed through the autocorrelation, the noise amplitude drops markedly. As can be seen in Figure 2.25, in addition to the reduction in noise, the harmonics start cleanly at the correct frequency; the autocorrelation, as expected, removes the offset.

In the balance of the computations, the autocorrelation result is used in place of the audio input signal after being transformed into the frequency domain. After the autocorrelation, the algorithm is very similar to the original algorithm presented in 2.3.6. The other enhancements to the original algorithm will now be discussed.


Figure 2.24 123.45 Hz Vowel in Frequency Domain


Figure 2.25 Autocorrelation of 123.45 Hz Vowel in Frequency Domain


Figure 2.26 Modified Vowel Fundamental Algorithm Flow Chart


2.3.7.2 Improved Vowel Simulation

Since autocorrelation removes the effects of the tuning offset and references the harmonic lines to direct current (DC), the opportunity to use an improved vowel simulation became evident. This new simulation replaces the ensemble of sine waves in the time domain (that was then transformed into the frequency domain using FFTs) with a direct simulation in the frequency domain. This simulation consists of cos² peaks at each location where a spectral line would be anticipated, as seen in Figure 2.27.

Figure 2.27 Correlation Mask for 107.666 Hz

As mentioned previously, the initial peak at bin 2,049 (as seen in Figure 2.25) should not be used in the sliding correlation that follows the autocorrelation. Figure 2.27 shows that this initial peak is zeroed in the mask.
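A mask of this kind can be generated by centering a cos² lobe on each expected harmonic bin while leaving the zero-lag bin at zero. The Delphi fragment below is a minimal sketch under those assumptions; the lobe half-width, the Round-based bin placement, and the names are illustrative choices, not the exact parameters of the thesis software.

procedure BuildCosSquaredMask(trialHz, binWidthHz: extended;
                              halfWidthBins: integer;
                              var mask: array of extended);
var
  i, k, center: integer;
  x: extended;
begin
  { Start with an all-zero mask; the bin for the large zero-lag term
    of the autocorrelation therefore stays zeroed }
  for i := 0 to High(mask) do
    mask[i] := 0.0;

  { Place a cos^2 lobe at every expected harmonic location k * trialHz }
  k := 1;
  center := Round(k * trialHz / binWidthHz);
  while center + halfWidthBins <= High(mask) do
  begin
    for i := center - halfWidthBins to center + halfWidthBins do
      if i >= 0 then
      begin
        x := Pi * (i - center) / (2.0 * halfWidthBins);
        mask[i] := Sqr(Cos(x));
      end;
    k := k + 1;
    center := Round(k * trialHz / binWidthHz);
  end;
end;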

PAGE 70
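A minimal sketch of how such a mask can be generated, written in the same Object Pascal as the thesis listing in Appendix C (it is not the thesis code). The one-bin peak half-width and the DC-at-bin-zero indexing are illustrative assumptions; the thesis mask is indexed with the lag-zero peak at bin 2,049.

program MaskSketch;
{ Build a Cos^2 correlation mask for a trial fundamental. }
const
  NumBins = 2048;              { real bins to the right of the lag-zero peak }
  BinHz   = 22050.0 / 4096.0;  { about 5.383 Hz per bin }
var
  mask: array[0..NumBins-1] of double;
  f0, harmonic, center, width, d: double;
  k, b: integer;
begin
  f0 := 107.666;
  width := 1.0;                          { assumed peak half-width, in bins }
  for b := 0 to NumBins - 1 do mask[b] := 0.0;
  harmonic := f0;
  k := 1;
  while harmonic <= 3000.0 do            { harmonics to 3 KHz, as in the text }
  begin
    center := harmonic / BinHz;          { harmonic location in bins from DC }
    for b := 0 to NumBins - 1 do
    begin
      d := Abs(b - center);
      if d <= width then
        mask[b] := Sqr(Cos(0.5 * Pi * d / width));   { Cos^2 peak }
    end;
    k := k + 1;
    harmonic := k * f0;
  end;
  mask[0] := 0.0;   { the lag-zero (DC) peak is zeroed, as in Figure 2.27 }
  WriteLn('Mask built for ', f0:0:3, ' Hz; first harmonic near bin ',
          Round(f0 / BinHz));
end.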

2.3.7.3 Product of Autocorrelation and Mask: Sliding Dot Product

The cross-correlation is no longer necessary to compute the score for a given trial fundamental frequency. This is accomplished simply with an element-by-element multiplication and a summation (dot product) of the result of the autocorrelation and the computed mask for the trial frequency. As a result, the final computation is very rapid when performed on the 2.4 GHz Pentium IV PC. The term "sliding" refers to the mask "sliding" across the autocorrelation product as the frequency of the mask is changed.

(Plot: sliding "dot product" fundamental search, amplitude versus frequency, 90 Hz to 180 Hz.)

Figure 2.28 Fundamental Search using Autocorrelation Algorithm

It is worthwhile to compare the search result in Figure 2.28 with that of Figure 2.29.
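A minimal, self-contained sketch of the scoring loop (not the thesis code from Appendix C): a synthetic autocorrelation spectrum with peaks at multiples of 123 Hz is scored against a Cos^2 mask built for each trial fundamental, and the trial with the largest dot product wins. The 0.25 Hz step, the skipped lag-zero bin, and the 90 Hz to 180 Hz search range follow the text; the one-bin peak width and the synthetic data are illustrative assumptions.

program SlidingDotSketch;
const
  NumBins = 2048;
  BinHz   = 22050.0 / 4096.0;       { about 5.383 Hz per bin }
  TrueF0  = 123.0;                  { fundamental used to fake the data }
var
  acorr: array[0..NumBins-1] of double;

function MaskValue(f0: double; b: integer): double;
{ Cos^2 peak of assumed one-bin half-width at every multiple of f0 }
var
  center, d: double;
  k: integer;
begin
  Result := 0.0;
  k := 1;
  center := f0 / BinHz;
  while center < NumBins do
  begin
    d := Abs(b - center);
    if d <= 1.0 then Result := Sqr(Cos(0.5 * Pi * d));
    k := k + 1;
    center := k * f0 / BinHz;
  end;
end;

procedure FakeAutocorrelation;
var b: integer;
begin
  for b := 0 to NumBins - 1 do
    acorr[b] := MaskValue(TrueF0, b);   { clean peaks at k*TrueF0 for the demo }
end;

var
  trial, score, bestScore, bestF0: double;
  b: integer;
begin
  FakeAutocorrelation;
  bestScore := -1.0;
  bestF0 := 0.0;
  trial := 90.0;
  while trial <= 180.0 do
  begin
    score := 0.0;
    for b := 1 to NumBins - 1 do          { bin 0, the lag-zero peak, is skipped }
      score := score + acorr[b] * MaskValue(trial, b);
    if score > bestScore then
    begin
      bestScore := score;
      bestF0 := trial;
    end;
    trial := trial + 0.25;                { 0.25 Hz search step }
  end;
  WriteLn('Best fundamental: ', bestF0:0:2, ' Hz  (score ', bestScore:0:2, ')');
end.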

(Plot: sliding correlation search for the 107.666 Hz fundamental, correlation amplitude versus frequency, 90 Hz to 180 Hz.)

Figure 2.29 Search for 107.666 Hz Fundamental Using Original Algorithm

The noise and ripple using the autocorrelation are significantly less than with the original algorithm. Figure 2.22 showed a fundamental frequency misidentification using the original algorithm. Figure 2.30 shows the frequency identification using the new algorithm. The autocorrelation significantly reduced spikes and noise. The peak is smoother and easier to interpret.

(Plot: pathologic case, fundamental = 123.45 Hz, offset = -32.45 Hz; score versus frequency, 50 Hz to 200 Hz.)

Figure 2.30 Sliding Dot Product Analysis of Pathologic Case

2.4 Determination of Tuning Error when Vowel Fundamental is Known

Two algorithms were developed to identify the tuning error. The first algorithm only functions when the original sliding correlation algorithm (2.3.6) is used. The second algorithm can be used with either fundamental estimation algorithm.

2.4.1 Approximating the Tuning Error Based on Fundamental Calculation

Figure 2.19 illustrates how the tuning error can be approximated by examining the location of spectral lines obtained from performing correlations in the frequency domain.

The process is quite simple to perform. Using the zero padding algorithm developed in 2.3.6.3, an autocorrelation would result in a spectral line peak being located in bin 2,049 of the 4,096 present in the correlation result. If there is a tuning error present, the location of the peak shifts at a rate of 5.383 Hz per bin. The data used for Figure 2.32 was offset by -20 Hz. This is approximately 4 bins. As a result, the peak is at bin 2,045 instead of 2,049 as in Figure 2.31.

(Plot: 107.666 Hz fundamental, 0 Hz tuning offset; correlation amplitude versus bin number, bins 2,044 through 2,054.)

Figure 2.31 107.666 Hz Fundamental with No Offset
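The 5.383 Hz figure is simply the bin spacing of the 4,096 point transform at the 22,050 Hz sample rate, so the crude offset estimate from the peak location is (a restatement of the text with the bin spacing made explicit):

\Delta f_{bin} = \frac{22050}{4096} \approx 5.383\ \text{Hz},
\qquad
\hat{\Delta} \approx (b_{peak} - 2049)\,\Delta f_{bin}.

For the -20 Hz case, -20 / 5.383 is about 3.7 bins, which rounds to the 4 bin shift seen in Figure 2.32.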

(Plot: 107.666 Hz fundamental, -20 Hz offset; correlation amplitude versus bin number, bins 2,040 through 2,050.)

Figure 2.32 107.666 Hz Fundamental with -20 Hz Offset

This algorithm for determining the tuning error is inadequate on several counts. The algorithm is statistically weak, because the predicted tuning offset is based on finding the maximum bin over a range of about 20 bins. A better approach would be to use the information in many bins, not just a local maximum bin. Another weakness is the precision of the basic algorithm. Since it is using a local maximum bin as the tuning offset frequency indicator, the bin spacing of about 5.383 Hz limits the resolution to half that value in the best case. While the dimension of the FFTs in the algorithm could be increased, the computational time for the algorithm is already excessive at about 15 seconds per estimate.

Finally, the predicted tuning error is based on the determination of the vowel fundamental frequency. As already demonstrated, the sliding correlation algorithm does a poor job of finding the fundamental frequency to begin with. When the vowel fundamental frequency is incorrect by more than a fraction of a Hertz, the predicted tuning offset value is meaningless. There is a mathematical approach to improve the resolution of the predicted tuning offset to better than a bin width.

2.4.2 A Possible Improvement in the Approximated Tuning Error

An examination of Figure 2.32 shows the peak is a relatively smooth function that is several bins wide. This suggests that the curve could be interpolated to find the actual peak; this peak should be the actual tuning error. The actual peak is a local maximum of a curve where the contents of each bin are points on the curve. As a result, the slope at the actual maximum of the curve is zero. The gradient algorithm provides a means to find the maximum of the curve. The gradient algorithm used to find a function maximum is:

x_{i+1} = x_i + g\,f'(x_i),     (2.13)

where g is the "gain", a constant used to prevent over-estimation of the function maximum [19]. One difficulty is finding the derivative of the curve at the current point; no equation is available for the curve itself. A second difficulty is that after the first iteration the new estimate of x is likely not an existing point on the curve. If some points on a curve are available, then function values on the curve can be determined by interpolating a polynomial that passes through the available points.
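A minimal Object Pascal sketch of the iteration in (2.13), not the thesis code: the interpolated correlation curve is stood in for by a smooth test function with a known peak at -20, the derivative is estimated with the two-sided finite difference of (2.17), and the gain, starting point, and stopping test are illustrative assumptions.

program GradientSketch;
const
  Gain    = 2.0;     { "g" in (2.13); illustrative value }
  MaxIter = 100;
var
  x, deriv, h: double;
  i: integer;

function CurveValue(p: double): double;
begin
  Result := Exp(-Sqr(p + 20.0) / 50.0);   { stand-in for the interpolated curve }
end;

begin
  h := Sqrt(2.2E-16);                     { square root of machine epsilon (assumed) }
  x := -15.0;                             { initial estimate from the peak bin }
  for i := 1 to MaxIter do
  begin
    { two-sided finite difference, as in (2.17) }
    deriv := (CurveValue(x * (1.0 + h)) - CurveValue(x * (1.0 - h))) / (2.0 * x * h);
    x := x + Gain * deriv;                { equation (2.13) }
    if Abs(deriv) < 1.0E-9 then Break;
  end;
  WriteLn('Converged near x = ', x:0:3);
end.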

The Lagrange interpolation function is efficient in this application since the polynomial coefficients are never explicitly calculated. The Lagrange interpolation is described in Appendix A.

To approximate the derivative, a secant approximation of the tangent at a point is computed by applying two-sided finite differences. The traditional equation for the derivative is one-sided.

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}     (2.14)

To estimate the derivative of f at x, h is replaced by a term based on the square root of machine epsilon for the computer and compiler being used. The computation of the machine epsilon itself is described in 2.2.

f'(x) \approx \frac{f(x(1.0 + \sqrt{\varepsilon})) - f(x)}{x\sqrt{\varepsilon}}     (2.15)

When using finite differences, h is not vanishingly small, just small compared to the current value of x. As a result, the estimate of the derivative of f at x using one-sided finite differences suffers a first order error.

Figure 2.33 First Order Approximation Error Due to One-Sided Finite Differences

Two-sided derivatives are just an extension to the standard form of the derivative.

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x-h)}{2h}     (2.16)

When approximating the derivative of f at x, the two-sided finite difference eliminates the first order error in estimating the tangent.

f'(x) \approx \frac{f(x(1.0 + \sqrt{\varepsilon})) - f(x(1.0 - \sqrt{\varepsilon}))}{2x\sqrt{\varepsilon}}     (2.17)
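A tiny Object Pascal sketch that makes the error orders visible by comparing the one-sided and two-sided estimates on a function with a known derivative. It uses a fixed step h rather than the x times square-root-of-epsilon scaling of (2.15) and (2.17), purely so the first order error is large enough to see; the test point and step are assumptions.

program DiffErrorSketch;
var
  x, h, oneSided, twoSided, exact: double;
begin
  x := 1.0;
  h := 1.0E-5;                     { deliberately coarse step }
  exact    := Cos(x);              { d/dx sin(x) }
  oneSided := (Sin(x + h) - Sin(x)) / h;
  twoSided := (Sin(x + h) - Sin(x - h)) / (2.0 * h);
  WriteLn('one-sided error: ', Abs(oneSided - exact):0:12);
  WriteLn('two-sided error: ', Abs(twoSided - exact):0:12);
end.

The one-sided error is proportional to h, while the two-sided error is proportional to h squared, which is the point made by Figures 2.33 and 2.34.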

(Sketch labeled "Actual tangent at x", comparing the tangent with the two-sided secant approximation.)

Figure 2.34 Approximation of Tangent using Two-Sided Finite Differences

In reality, two-sided finite differences are nothing more than the two-point slope formula

m = \frac{y_2 - y_1}{x_2 - x_1}     (2.18)

with the values for x_1 and x_2 being very close to a central point x. Using the same simulated data that generated Figure 2.32, an improvement in the estimated tuning error was performed using a 10th order Lagrange interpolator. Each iteration of the gradient search monotonically moved closer to the peak in the interpolated function. This can be seen as the derivative gradually converged on 0.

(Plot: approximated derivative as a function of iteration, decaying toward zero over about 35 iterations.)

Figure 2.35 Derivative of x as a Function of Iteration

At the same time, the value for x converged on a value close to the correct result of -20.0.

(Plot: gradient algorithm improvement on the estimate of tuning error; estimated tuning error versus iteration.)

Figure 2.36 Improved Tuning Error Approximation

Relying on approximated derivatives of interpolated polynomials is a numerically unsound process. Even with noiseless data it is possible that the errors could be quite large. Figure 2.37 shows an example where the improved tuning error estimate, again performed with a 10th order interpolated polynomial, is not as good as the original estimate.

(Plot: gradient improvement of tuning error, fundamental = 128.0 Hz, offset = -33.15 Hz; estimated tuning error versus iteration, 0 to 400 iterations.)

Figure 2.37 Incorrect Improvement in Tuning Error

A numerically sound approach to computing the tuning error is required.

2.4.3 Determination of Tuning Error Using Time Domain Correlations

Consider the following hypothesis. Once the vowel fundamental frequency is computed, a sliding correlation can be used to estimate the tuning offset. As candidate values of the tuning estimate are computed, a simulated vowel can be created using the previously computed vowel fundamental frequency. The correct value of the tuning estimate should produce a correlation maximum when the sampled audio data and the simulation are compared.

This hypothesis seems reasonable, but it relies on an assumption that needs to be explored. This assumption revolves around the orthogonality of sine waves. While two sine waves of the same frequency would clearly be highly correlated, this hypothesis hinges on the concept that sine waves of different frequencies are not highly correlated.

2.4.3.1 Correlation of Sine Waves

In this application, correlations are with respect to time; as a result, the time cross correlation function is of primary interest [20]:

R_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, y(t+\tau)\, dt     (2.19)

Consider two sine waves with different non-zero integer frequencies n and m. Without loss of generality, let \tau = 0:

R_{xy} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin(mt)\,\sin(nt)\, dt     (2.20)

Using the standard trigonometric identity

\sin(a)\sin(b) = \frac{\cos(a-b)}{2} - \frac{\cos(a+b)}{2}     (2.21)

(2.20) becomes

R_{xy} = \lim_{T \to \infty} \frac{1}{4T} \int_{-T}^{T} \left[ \cos((m-n)t) - \cos((m+n)t) \right] dt

Over the interval (-\pi, \pi), this expression sums to 0. Due to the periodic nature of the cosine, the sum will be 0 over every interval of 2\pi. As the limits of integration expand, any portions of the sum that are not a full interval of 2\pi are divided by 2T and get successively smaller. Hence, this integral at the limit is 0 when T approaches infinity. When the sum is 0 over the interval (-\pi, \pi), it is said that the sine waves are orthogonal over the interval (-\pi, \pi) [21].

Without constraining the sampling interval T, in the discrete case (2.20) becomes

R_{xy} = \lim_{T \to 0} \sum_{i=0}^{\mathrm{Int}(1/T)} \sin(2\pi m i T)\,\sin(2\pi n i T) = 0     (2.22)

A careful examination of this expression is important. The number of samples, \mathrm{Int}(1/T) + 1, increases as the sampling interval T decreases. The arguments to the sine functions change over a range of 0 to 2\pi in a discrete sense. As mentioned previously, m and n are different non-zero integers.

None of those conditions are present for either the sampled vowel data or the simulated vowel data available for determining the tuning error. The data ensembles

are collected over an interval that is short; at a 22,050 Hz sample rate an ensemble of 4,096 samples spans only 186 ms. This is far different from infinitely many seconds. In order to get a sense of the effect that short sample intervals have over relatively realistic frequencies, simulations were designed (see Figure 2.38).

(Plot: correlation of 128 Hz and test frequency, 0.05 Hz steps; correlation score versus frequency, 64 Hz to 192 Hz.)

Figure 2.38 Correlation of 128 Hz and Test Frequencies

Based on the results of the simulations, the hypothesis was accepted and the tuning error algorithm was designed.

2.4.3.2 Tuning Error Determination Algorithm

This algorithm is very similar to the algorithms previously presented. The primary difference is that the correlation is performed in the time domain by sliding the offset frequencies. The vowel fundamental frequency used is the one determined from 2.3.
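A minimal sketch of the kind of simulation behind Figure 2.38 (not the thesis code): the dot product of a 4,096 sample, 22,050 Hz ensemble of a 128 Hz sine wave against test sine waves at nearby frequencies, showing how quickly the correlation falls off away from 128 Hz even over a 186 ms window. The 1 Hz test step is an assumption made here to keep the output short; the thesis used 0.05 Hz steps.

program FiniteCorrSketch;
const
  N   = 4096;
  Fck = 22050.0;
var
  f, t, sum: double;
  i: integer;
begin
  f := 64.0;
  while f <= 192.0 do
  begin
    sum := 0.0;
    for i := 0 to N - 1 do
    begin
      t := i / Fck;
      sum := sum + Sin(2.0 * Pi * 128.0 * t) * Sin(2.0 * Pi * f * t);
    end;
    WriteLn(f:7:2, '  ', sum:10:1);   { peaks near N/2 = 2048 at 128 Hz }
    f := f + 1.0;
  end;
end.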

(Flow chart: 131,072 samples, 16 bit, 22,050 Hz Fck; start process; sample audio into a .wav file; extract the ensemble of 4,096 consecutive real samples (with padding) to be analyzed; set the simulator to the lowest candidate tuning offset frequency; generate the fundamental and harmonics to 3 KHz, using the vowel fundamental frequency from the previous algorithm; perform the correlation of the two data ensembles; 2,048 real bins summed to yield a "score"; if the "score" is greater than previous ones, note the offset frequency used; increment the simulator tuning offset frequency and repeat; the tuning offset frequency with the highest correlation or "score" is the one of interest.)

Figure 2.39 Tuning Offset Determination Algorithm
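A minimal, self-contained sketch of the search loop described by Figure 2.39 (not the thesis code): the "received" ensemble is faked as a vowel whose harmonics are shifted by -20 Hz, a simulated vowel is generated for each candidate offset, and the candidate with the largest time-domain correlation wins. The three-harmonic vowel model with equal amplitudes, the search range, and the 0.5 Hz step are illustrative assumptions.

program OffsetSearchSketch;
const
  N    = 4096;
  Fck  = 22050.0;
  F0   = 128.0;        { vowel fundamental from the previous algorithm }
  TrueOffset = -20.0;  { offset baked into the fake "received" ensemble }
var
  received: array[0..N-1] of double;

function Vowel(offset: double; i: integer): double;
{ Fundamental plus two harmonics, each shifted by the candidate offset. }
var
  t: double;
  k: integer;
begin
  t := i / Fck;
  Result := 0.0;
  for k := 1 to 3 do
    Result := Result + Sin(2.0 * Pi * (k * F0 + offset) * t);
end;

var
  cand, score, bestScore, bestOffset: double;
  i: integer;
begin
  for i := 0 to N - 1 do
    received[i] := Vowel(TrueOffset, i);
  bestScore := -1.0E30;
  bestOffset := 0.0;
  cand := -60.0;
  while cand <= 60.0 do
  begin
    score := 0.0;
    for i := 0 to N - 1 do
      score := score + received[i] * Vowel(cand, i);
    if score > bestScore then
    begin
      bestScore := score;
      bestOffset := cand;
    end;
    cand := cand + 0.5;          { 0.5 Hz candidate step }
  end;
  WriteLn('Estimated tuning offset: ', bestOffset:0:1, ' Hz');
end.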

2.4.3.3 Typical Operation of the Tuning Offset Algorithm

When searching for the offset, a new correlation is attempted at each test offset frequency.

(Plot: offset determination, 128 Hz fundamental, -20 Hz offset; amplitude of correlation versus offset frequency, -60 Hz to +60 Hz.)

Figure 2.40 Offset Determination Correlation as a Function of Test Frequency

When the test offset frequency matches the actual offset frequency, the correlation function yields a maximum.

(Plot: correlation for a -20 Hz offset with a -20 Hz test offset; amplitude versus bin number, 0 to 4,096.)

Figure 2.41 Correlation when -20 Hz Test Matches -20 Hz Offset

The final configuration of the blind clarification algorithm consists of the autocorrelation based vowel fundamental determination algorithm followed by this algorithm to determine the tuning offset frequency.

2.4.3.4 Algorithm Limitations

Since the fundamental and harmonics are being matched in the time domain, it is possible to determine a tuning offset value that is biased by a multiple of the vowel fundamental frequency. This makes perfect sense, since the spectral lines being compared in the correlation are spaced by that amount. This characteristic forces a limitation on the range of tuning offset that can be accommodated. The limit consists of half of the vowel fundamental frequency. For a bass voice the tuning offset to be determined must be less than about +/-43 Hz. For

higher voices, the allowed range of offset increases. A soprano speaker would allow for up to approximately 200 Hz of tuning offset to be determined.

In normal operation this limitation is not a driver in system performance. The normal operating scenario is that the receiver operator manually tunes in the signal for maximum intelligibility. Operators generally can tune an SSB signal to within 20 Hz by ear. After manual tuning, the blind clarification algorithm is enabled. From then on, any receiver or transmitter drift would be detected and removed.

3. Algorithm Performance Testing

Algorithm testing was performed both on simulated signals and with actual received signals. Both types of test results are discussed separately in the following sections.

(Screen capture: the program control panel, "Blind Clarification of SSB Voice Signals", with parameter entry fields, search options, and plot controls.)

Figure 3.1 Program Control Panel

3.1 Testing using Simulated Data

Except for testing as a function of SNR or as otherwise noted, the signal to noise ratio was fixed at 20 dB. This value is typical of HF SSB operation.

3.1.1 Vowel Fundamental Frequency Tests

A number of different tests were performed in order to get a sense of algorithm performance. The fundamental frequency search algorithm can be set to search at a specified frequency increment. This will limit the resolution in the output. The behavior of the fundamental frequency determination algorithm with respect to step size required testing. Given the nature of the small errors involved, the SNR was set to 100 dB.

(Plot: performance as a function of step size; measured versus actual fundamental frequency, 100.00 Hz to 100.50 Hz.)

Figure 3.2 Performance About a 0.25 Hz Search Step

The behavior was as anticipated. Performance with respect to SNR is also important. Testing was accomplished at a vowel fundamental frequency of 100.125 Hz with a step size of 0.25 Hz; however, the algorithm was placed in the 1,2,5 mode so that the smallest effective step size is 0.001 Hz. Data was collected and plotted as a function of the linear noise amplitude.

(Plot: standard deviation of the fundamental determination versus linear noise amplitude, with points labeled from 30 dB SNR down to -5 dB SNR.)

Figure 3.3 Noise Performance of Fundamental Estimation

Figure 3.3 shows that the noise performance is approximately linear and that the fundamental determination algorithm is somewhat insensitive to noise. This plot ranges from SNRs of 100 dB to -5 dB. At an SNR of -10 dB or poorer the algorithm begins to break down and tends to misidentify the vowel fundamental.

3.1.2 Testing of Tuning Offset Determination Algorithm

It is expected that errors in determining the vowel fundamental frequency will induce errors in the tuning offset determination. This characteristic was tested with a 100 Hz fundamental. Errors in the vowel fundamental were artificially induced in order to see the effect in the tuning offset determination.

(Plot: 100 Hz fundamental, 20 dB SNR, induced tuning errors; induced tuning error versus fundamental tuning error in Hz.)

Figure 3.4 Errors in Tuning Offset due to Errors in Fundamental Determination

As shown in Figure 3.4, the effect is a linear one. Unfortunately the sensitivity is about 16.25:1 at 100 Hz. This implies that if the tuning error is to be known within 10 Hz, then the error in determining the vowel fundamental frequency should be on the order of 0.5 Hz. A test was then made to see the effect of noise on the tuning offset determination itself. The errors in vowel fundamental frequency were manually removed prior to the tuning offset determination. Figure 3.5 shows that the tuning offset calculation is also not overly affected by additive white Gaussian noise.
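The 0.5 Hz requirement follows directly from the measured slope (a restatement of the numbers above, rounded in the same direction as the text):

\delta_{offset} \approx 16.25\,\delta_{F_0}
\quad\Rightarrow\quad
\delta_{F_0} \lesssim \frac{10\ \text{Hz}}{16.25} \approx 0.6\ \text{Hz}.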

(Plot: effect of noise on the tuning error determination; standard deviation of the tuning error versus linear noise amplitude, with points labeled from 20 dB SNR down to -10 dB SNR.)

Figure 3.5 Effect of Noise on Tuning Error Estimation

3.2 Testing using Audio Samples

Testing with audio samples was performed using the setup shown in Figure 3.6. Each time the tuning error was to be estimated, six processing steps were performed; each is described in the following paragraphs.

(Block diagram: a Barker & Williamson AC3.5-30 broadband folded dipole antenna (or, as an alternate source, an Electrovoice 664 dynamic microphone and low noise preamplifier) feeds low level 50 Ohm RF through a Metropole 1.6-30.0 MHz bandpass filter to the Watkins-Johnson HF-1000 communications receiver; 0 VU 600 Ohm audio passes through the Timewave DSP-599zx audio filter to the Sound Blaster sound card in the Pentium IV PC, producing a .wav file of audio samples on the PC hard drive for further processing.)

Figure 3.6 Data Acquisition Setup

3.2.1 Acquire the Target Signal

The target signal was acquired by two different means. In early testing, a microphone was used to collect audio samples directly. This was followed by the analysis of signals collected off the air.

For most testing, the antenna used was a Barker and Williamson AC3.5-30 broadband folded dipole. This 90 foot long antenna is designed to receive signals over the entire HF spectrum. While its performance at any given frequency is marginal when compared with tuned antennas, the broadband coverage made this a good choice. For medium frequency (MF) testing using radio station KOA at 850 KHz, a Sony AN-1 active antenna was used.

Except for MF testing with KOA, a Metropole bandpass filter was used. This filter attenuates signals below 1.6 MHz and above 30 MHz and helped reduce spurious responses due to strong local MF signals and cellular and pager sites. The receiver used for all data analysis was a Watkins-Johnson HF-1000 with the internal preselector option. With calibration on the order of 0.1 ppm, it contributed virtually no errors in tuning at MF using KOA and only 2 Hz of error at 15 MHz. In cases where interference was present, a Timewave DSP-599zx adaptive digital signal processing (DSP) audio filter was used. This device can effectively reduce random noise and can virtually eliminate heterodyne (fixed tone) interference.

A Dell 4550 PC with an integral SoundBlaster compatible sound card was used for data acquisition. The program used for .wav file acquisition was Spectrum Laboratory version 2.01b1.

(Screen capture: the Spectrum Laboratory waterfall and spectrum display while receiving WWV.)

Figure 3.7 Spectrum Laboratory Display with WWV Input Signal (10 MHz)

Signals were recorded as .wav files. The collections were accomplished at 16 bit linear resolution at a nominal sample rate of 22,050 Hz. The .wav files were then read into a buffer in the analysis program. The buffer capacity was arbitrarily set at 131,072 samples. This is about six seconds of audio capture, although smaller captures were used in some cases.

(Screen capture: the input processing panel, showing the 107.666 Hz simulation frequency and the 131,072 sample buffer.)

Low and high frequencies of the classifications overlap; this aids in testing when the target voice is not easily classified. For example, some radio personalities have vocal mannerisms that alter their vowel fundamental frequencies artificially. A good example of this is Scott Hastings at KOA. At times, for effect, he raises his vowel fundamental frequency above his nominal baritone range.

3.2.3 Input Sample Processing

A new ensemble of 4,096 samples from the input buffer is obtained for each new estimate. The data is windowed prior to being transformed into the frequency domain; the Welch window is the default.

(Screen capture: the audio sample processing and windowing panel.)

Figure 3.10 Audio Sample Processing and Windowing

3.2.4 Estimate the Vowel Fundamental Frequency

The estimation method based on the autocorrelation of the input signal was used in all cases.

(Screen capture: the vowel fundamental determination panel.)

Figure 3.11 Vowel Fundamental Determination

The nominal step size setting of 0.25 Hz was used for all testing. The 1,2,5 check option was used so that the effective step size was actually 0.001 Hz. The "Narrow Search" and "LPF Corr." options were not used. The "1 Pass Search" option is sometimes

used with simulated data to force the fundamental estimation to a specified value. In that case the vowel frequency value is set by the input processing panel. Of particular interest is the "score". This numerical value indicates the degree of correlation. A low value would indicate either low audio signal strength, or that no vowel was present during the sampling interval.

3.2.5 Estimate the Tuning Offset

Once the vowel fundamental frequency has been determined, it is possible to estimate the tuning offset.

(Screen capture: the tuning offset estimation panel.)

Figure 3.12 Tuning Offset Estimation Panel

The step size was set to 0.50 Hz to improve the tuning offset resolution. Once again the "score" indication gives the strength of the correlation and can be used to indicate the quality of the offset estimation.

3.2.6 System Calibration

Prior to using the data acquisition setup shown in Figure 3.6, it was necessary to calibrate the setup. The HF-1000 receiver was tuned to a signal with a carrier on a known frequency. The receiver tuning was offset 1 KHz low in USB mode. If the receiver and the transmitter are perfectly calibrated, then a 1 KHz sine wave would result; however, due to calibration errors, the sine wave is not exactly 1 KHz. The IG-18 sine-square

wave generator is used as a transfer standard by adjusting its frequency until the two sine waves displayed on the 7633 oscilloscope are synchronized. The frequency of the sine-square wave generator is then read from the reciprocal counter.

(Block diagram: the sample of received audio, a 1 KHz audio sine wave, drives channel A of a TEK 7633 oscilloscope; the Heath IG-18 sine-square generator drives channel B, and its 1 KHz square wave output is measured by a Racal-Dana 1992 reciprocal counter.)

Figure 3.13 Calibration Setup

Prior to performing this calibration, the Racal-Dana 1992 counter was verified against a disciplined standard locked to the GPS constellation. After three days of data acquisition and stabilization, the counter was verified to be within 0.01 ppm.

To verify receiver calibration, NIST radio station WWV was received at 2.5, 5.0, 10.0, 15.0, and 20.0 MHz and set 1 KHz low in USB mode. The IG-18 signal generator was adjusted for a frequency match using observations of about 1 minute per reading. The 1992 counter was used to read the matched sine wave frequency. The audio frequencies obtained were:

WWV Frequency    Offset from 1 KHz
2.500 MHz        0.481 Hz
5.000 MHz        0.933 Hz
10.000 MHz       1.805 Hz
15.000 MHz       Excessive Interference
20.000 MHz       Low Signal Strength

Table 3.1 WWV Based Calibration Offsets

Based on this data, it appears that the Watkins-Johnson receiver calibration is off by approximately 0.18 ppm. This is well within the receiver specification. As radio station KOA at 0.850 MHz will be used for a test source, its center frequency was measured in the same fashion. The total tuning error was 1.91 Hz. Subtracting the 0.15 Hz contribution from the HF-1000 receiver at 0.850 MHz, the error in the center frequency for KOA is approximately 1.76 Hz. This is well within the 20 Hz FCC specification for commercial AM stations.

3.2.7 Test Results Using Local Speakers

A microphone was used to gather samples of vowels. Except for a couple of obviously bad readings, the result centers around the 0 Hz offset that was used. The statistics in Table 3.2 were compiled after the two bad readings were removed. During normal voice the pitch changes as a function of time. A test was run with the vowel fundamental frequency increasing by an octave over about 5 seconds.
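As a cross-check on the numbers above (a restatement of the measurements, not new data), the fractional error implied by the 10 MHz reading and its contribution at the KOA frequency are:

\frac{1.805\ \text{Hz}}{10.000\ \text{MHz}} \approx 0.18\ \text{ppm},
\qquad
0.18\ \text{ppm} \times 0.850\ \text{MHz} \approx 0.15\ \text{Hz},
\qquad
1.91\ \text{Hz} - 0.15\ \text{Hz} \approx 1.76\ \text{Hz}.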

It appears that the estimated offset frequency was biased downward due to the rising pitch. It would be interesting to see if falling pitch also biased the estimated offset frequency.

(Scatter plot: baritone voice, "ah" vowel, 111 Hz fundamental, 0 Hz offset; estimated offset versus ensemble number.)

COLUMN NAME:                 OFFFREQ
Number of rows:              29
Number of valid points:      29
Number of missing points:    0
Number of negative values:   11
Number of positive values:   16
Number of zero values:       2
Minimum value:               -6.50000000
Maximum value:               6.00000000
Inter range value:           12.50000000
Median:                      0.50000000
Sum of row value:            -10.00000000
Sum of absolute value:       71.00000000
Arithmetric mean:            -0.34482759
Quadratic mean:              3.00860834
Absolute mean:               2.44827586
Sum of squares:              262.50000000
Variance:                    9.25184729
Standard deviation:          3.04168494
Absolute deviation:          2.53151011
Standard error:              0.56482671

Table 3.2 Statistics for Baritone "ah" Vowel at 111 Hz

(Plot: estimated fundamental frequency versus ensemble number for the rising pitch "ah" vowel, roughly 90 Hz to 180 Hz.)

Figure 3.15 Baritone Frequency Plot for Rising Pitch "ah" Vowel

(Scatter plot: estimated offset versus estimated fundamental frequency for the rising pitch "ah" vowel.)

Figure 3.16 Baritone Offset Versus Vowel Frequency Rising Pitch "ah" Vowel

(Plot: estimated fundamental frequency versus sample number for the falling pitch "ah" vowel.)

Figure 3.17 Baritone Frequency Plot for Falling Pitch "ah" Vowel

(Scatter plot: estimated offset versus estimated fundamental frequency for the falling pitch "ah" vowel.)

Figure 3.18 Baritone Offset Versus Vowel Frequency Falling Pitch "ah" Vowel

The bias reversed sign and became positive for the falling pitch. This could be problematic for accurate offset determination. The previous dataset was processed again; instead of the nominal 2,048 FFT bins, the number of bins was reduced to 512. In this way the amount of pitch change per ensemble collection time would be decreased by a factor of 4. Perhaps this could reduce the effect of changing pitch.

(Plot: 512 bins, baritone falling pitch, constant "ah" vowel; estimated fundamental frequency versus ensemble number.)

Figure 3.19 512 Bins Baritone Frequency Plot for Falling Pitch "ah" Vowel

While the bias is removed, the dispersion in the results increased; this is true even if the obviously bad offset estimates above 10 Hz are removed.

(Scatter plot: 512 bin falling pitch baritone; estimated offset versus estimated fundamental frequency.)

Figure 3.20 512 Bins Baritone Offset Versus Vowel Falling Pitch "ah" Vowel

A contralto female voice was also examined. It appears that in a few cases the vowel fundamental frequency was misidentified. Spectrum Laboratory was used to verify the actual spectral line. In Figures 3.21 and 3.22 it appears that in about the middle of the sample there is an excursion in frequency towards 208 Hz and at the end of the collection there is an excursion towards 220 Hz. Figure 3.22 only shows the region of the fundamental. The 6th harmonic is the largest amplitude of the harmonics in the fixed frequency contralto example. The negative excursion in the middle of the collection is obvious; the positive excursion at the end is as well.

(Plot: contralto voice fundamental frequency versus ensemble number, approximately 210 Hz to 218 Hz.)

Figure 3.21 Contralto "ah" Vowel at Fixed Frequency

(Screen capture: Spectrum Laboratory analysis of the contralto "ah" vowel in the region of the fundamental.)

Figure 3.22 Contralto "ah" Vowel Spectrum Analysis Fundamental

(Screen capture: Spectrum Laboratory analysis of the contralto "ah" vowel in the region of the 6th harmonic.)

Figure 3.23 Contralto "ah" Vowel Spectrum Analysis of 6th Harmonic

(Scatter plot: estimated tuning offset for the fixed frequency contralto "ah" vowel, versus ensemble number.)

There are no obviously bad data points here. Table 3.3 gives the statistics for this run.

COLUMN NAME:                 OFFFREQ
Number of rows:              31
Number of valid points:      31
Number of missing points:    0
Number of negative values:   16
Number of positive values:   13
Number of zero values:       2
Minimum value:               -9.50000000
Maximum value:               9.00000000
Inter range value:           18.50000000
Median:                      -0.50000000
Sum of row value:            -12.00000000
Sum of absolute value:       113.00000000
Arithmetric mean:            -0.38709677
Quadratic mean:              4.56317660
Absolute mean:               3.64516129
Sum of squares:              645.50000000
Variance:                    21.36182796
Standard deviation:          4.62188576
Absolute deviation:          3.63267430
Standard error:              0.83011519

Table 3.3 Statistics for Contralto "ah" Vowel at 214 Hz

With a mean of -0.38 and a standard deviation of 4.62, this appears to be a good data set.

Rising and falling pitch examples were also analyzed for a contralto voice. The bias observed in the male voice with rising frequency was not obvious.

(Plots: contralto rising pitch frequency plot and contralto estimated offset for rising pitch, each versus ensemble number.)

Figure 3.25 Contralto With Rising Pitch

(Plots: contralto with falling pitch, estimated fundamental frequency and estimated offset versus ensemble number.)

Figure 3.26 Contralto with Falling Pitch

What is obvious is that the quality of the tuning offset estimation worsened as the contralto's pitch dropped.

3.2.8 Tests Using MF and HF Reception

The first test was simply to see if a recorded example of a contralto could be used to test frequency offset estimation with spoken text. The text example, "The quick brown fox jumped over the lazy dog's back," was used. The recorded example was transmitted into a dummy load using an ICOM 756 PRO SSB transceiver at a frequency of 14.325 MHz in USB mode. A small amount of RF energy was coupled into the Watkins-Johnson HF-1000 receiver; the rest of the configuration is as shown in Figure 3.6. Seven different tuning offsets from -30 Hz to +30 Hz at 10 Hz increments were tried. The transmitter was off tuned by preset amounts to simulate the tuning errors.

(Plot: seven tunings, contralto offset estimation, no rejection; estimated offset versus ensemble number.)

Figure 3.27 Contralto Voice with 7 Tuning Offsets; No Error Rejection

Some trending is visible; however, some obviously bad points tend to obscure the plot.

(Plot: seven offset contralto voice with error rejection; estimated offset versus ensemble number.)

Figure 3.28 Contralto Voice with 7 Tuning Offsets; With Error Rejection

After points with offset scores of less than 1 are removed, as well as some very obviously bad points, the tuning offset estimation process seems to be reasonable. Clearly there are some other bad points that did not seem unreasonable during error rejection. A more sophisticated algorithm might be able to remove the other defective points as well. The error trend at point 6 is troubling.

The MF spectrum was then searched for various qualities of male and female voices. Captures of voice segments were accomplished with an assortment of tuning errors. Figure 3.29 shows a typical result of estimating the tuning error with a resonant woman's voice. It is expected that there will be a wide deviation; in addition to vowels there are consonants, spaces between words, and so forth. It was assumed that the offset determination estimates and score would be correlated. If the false readings at +/-40.5 Hz and readings with scores of less than 3.0 are omitted, then the mean is 20.00 with a standard deviation of 10.0.

(Scatter plot: female voice, +25 Hz tuning offset; offset score versus estimated tuning offset, with the actual offset marked.)

Figure 3.29 Tuning Offset Determination with Resonant Woman's Voice

Here the search limits were increased; the false readings moved to +/-83.0 Hz.

(Scatter plot: female voice, -25 Hz offset; offset score versus estimated offset frequency, with the actual offset marked.)

Figure 3.30 Female Voice with -25 Hz Offset

Again, if the false readings at the edges of the search range are eliminated, as well as those with scores lower than 4, the mean is -30.0 with a standard deviation of 18. In general male voices did more poorly than female voices. Figure 3.31 shows that there is a trend in the estimation, but it is not as strong.

(Scatter plot: male voice, -25 Hz offset; offset score versus estimated tuning offset, with the actual offset marked.)

Figure 3.31 Male Voice with -25 Hz Offset

A number of captures were taken and the results plotted together. The time evolution aspect of the tuning error estimate is interesting. Figure 3.33 shows a male voice that transitions from syllable to syllable. The FFT length was reduced from 2,048 to 256.

(Scatter plot: male voice, large data set, -25 Hz offset; offset score versus estimated offset.)

Figure 3.32 Large Sample of Male Voices; -25 Hz Offset

This test began to address which resolution FFT works best in this application. The tradeoff is between frequency accuracy and data set length. Of these, the 1,024 and 512 long FFTs appear to possibly be good candidates. When the +/-41.5 Hz anomalies were removed, as well as those points with scores of less than 1.0, the 512 long FFTs appeared slightly better than the 1,024 long FFTs.

FFT Size    Points    Mean       Standard Deviation
1,024       13        16.1 Hz    26.5
512         18        18 Hz      20.0

Table 3.4 Performance Comparison of Two FFT Lengths

(Plots: estimated tuning error versus ensemble number for a +30 Hz tuning error, computed with 2,048, 1,024, 512, and 256 long FFTs.)

Figure 3.33 Time Evolution of Estimated Tuning Error

Based on this test, 512 long FFTs appear to give the best performance. It is important to note that this test was performed using a male voice; the results with female voices have proven to be better. Based on this information, a number of runs with 512 long FFTs were made of female speakers. In general, they did not perform significantly better than the 2,048 long FFTs that were the baseline for this design.

3.2.9 Conclusions Based on Test Results

It is clear that some signal preconditioning is required prior to applying the algorithms developed here. The algorithms work quite well on simulated vowels and on pure vowels, but once consonants are introduced, the algorithm tends to break down. In addition, there is a tendency when processing spoken text for the algorithm to generate completely incorrect estimates of tuning errors. While these are easy to identify and remove, this tendency is unacceptable in a commercial product. While out of scope for this development activity, additional development on vowel peaking (and consonant notching) algorithms will be required. The algorithm developed here appears robust with respect to additive white Gaussian noise. Perhaps consonants, with their correlated noise, are part of the problem.

One motivation for developing this algorithm is based on the general availability of high performance computers for DSP applications. Unfortunately this algorithm, as it is currently implemented, is far too slow for a real time application. The current 5 minute processing time for each 6 second audio sample demonstrates that a speed improvement of at least 60:1 will be required. Some of this can be accomplished by using processors more suitable for DSP applications. No doubt considerable improvement can be made in the algorithm selection and coding as well.

Appendix A Lagrange Interpolation

A standard means of interpolating between data points involves fitting the points to a polynomial, then using that polynomial to interpolate between the measured points. One could perform a least squares regression of the points to determine the polynomial coefficients. Those coefficients could then be used to interpolate between the given points. A more efficient approach is to use the Lagrange form of the interpolating polynomial. This form does not require the explicit computation of the coefficients; hence, it is computationally more efficient and is commonly used in standard practice. An accurate interpolation must return the given ordinate value when the corresponding abscissa value is input to the algorithm.

Consider an interpolation between two points; i.e., linear interpolation. Two pairs of points would be used, (x_1, y_1) and (x_2, y_2), with the interpolation value of y to be determined for an x value between x_1 and x_2. Using two-point slope form, the equation for y = f(x) could be determined:

y = mx + b

m = \frac{y_2 - y_1}{x_2 - x_1}

With a small amount of algebraic manipulation, this expression becomes:

y = p(x) = \frac{x_2 y_1 - x y_1 + x y_2 - x_1 y_2}{x_2 - x_1}

y = p(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2

which is the Lagrange form of the first order interpolating polynomial. The fractions are in fact cardinal functions; they select a desired y_i value by returning 1 for an x_j value when i = j and returning 0 if i \neq j. A similar derivation, with linearly increasing algebraic complexity, can be performed for any number of x, y pairs:

p(x) = \sum_{j=1}^{n} y_j \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{x - x_k}{x_j - x_k}

This is the Lagrange form of the interpolating polynomial for any number of points [22].
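A minimal Object Pascal sketch of the formula above (an illustration, not the routine from the thesis listing in Appendix C): the value at x is accumulated directly from the cardinal functions, so no polynomial coefficients are ever formed. The sample data set is an assumption chosen so the answer can be checked by hand.

program LagrangeSketch;
{ Evaluate the Lagrange form of the interpolating polynomial at a point. }
const
  NPts = 4;
  xs: array[1..NPts] of double = (0.0, 1.0, 2.0, 3.0);
  ys: array[1..NPts] of double = (1.0, 2.0, 5.0, 10.0);  { y = x*x + 1 sampled }

function Lagrange(x: double): double;
var
  j, k: integer;
  term: double;
begin
  Result := 0.0;
  for j := 1 to NPts do
  begin
    term := ys[j];
    for k := 1 to NPts do
      if k <> j then
        term := term * (x - xs[k]) / (xs[j] - xs[k]);   { cardinal function }
    Result := Result + term;
  end;
end;

begin
  WriteLn('p(1.5) = ', Lagrange(1.5):0:4);   { expect 3.25 for y = x*x + 1 }
end.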

Appendix B Acronym and Abbreviation List

AFC        Automatic frequency control
AM         Amplitude modulation
dB         Decibel
DC         Direct current
DFT        Discrete Fourier Transform
DSP        Digital signal processing
F0         Vowel fundamental frequency
FCC        Federal Communications Commission
FFT        Fast Fourier Transform
FSK        Frequency shift keying
GHz        Gigahertz (10^9 Hertz)
GPS        Global Positioning System
HF         High frequency (3.0 - 30.0 MHz)
HFO        High frequency oscillator
Hz         Hertz
IEEE       Institute of Electrical and Electronics Engineers
KHz        Kilohertz (10^3 Hertz)
LPF        Low pass filter
LPF Corr.  Low pass filtered correlation function
MHz        Megahertz (10^6 Hertz)
mw         Milliwatt (10^-3 Watt)
NCO        Numerically controlled oscillator
NIST       National Institute of Standards and Technology
NTIA       National Telecommunication and Information Administration
PC         Personal computer

PCI        Peripheral Component Interconnect
PLL        Phase locked loop
ppm        Parts per million
RF         Radio frequency
RMS        Root mean squared
SNR        Signal to noise ratio
SSB        Single sideband (suppressed carrier)
USB        Upper sideband
USG        United States government
VU         Volume unit (0 VU = 1 mw into 600 Ohms)

Appendix C Software Listing unit ThesisUnit; interface uses Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms, Dialogs, StdCtrls, Comctrls, Extctrls, Math, Spin; type TThesisForm = class(TForm) InputProcessingGroup: TGroupBox; InputFileNameEdit: TEdit; InputFileLabel: TLabel; InputFileButton: TButton; FileLengthEdit: TEdit; FileLengthLabel: TLabel; InputStatusEdit: TEdit; InputStatusLabel: TLabel; AudioSampleGroup: TGroupBox; ParzenRadio: TRadioButton; HannRadio: TRadioButton; WelchRadio: TRadioButton; RectangularRadio: TRadioButton; AudioSampleButton: TButton; AudioSampleStatusEdit: TEdit; AudioSampleStatusLabel: TLabel; SamplePointerEdit: TEdit; PointerLabel: TLabel; FFTSizeUpDown: TUpDown; FFTSizeEdit: TEdit; FFTSizeLabel: TLabel; FakeDataButton: TButton; OffsetSearchGroup: TGroupBox; FundamentalGroup: TGroupBox; FundamentalSearchButton: TButton; SFreqEdit: TEdit; EstOffsetEdit: TEdit; FreqLabel: TLabel; EstOffsetLabel: TLabel; FundamentalStatusEdit: TEdit; FundamentalStatusLabel: TLabel; SFreqCorrEdit: TEdit; FreqCorrLabel: TLabel; FakeFreqEdit: TEdit; FakeFreqLabel: TLabel; FakeOffsetEdit: TEdit; FakeOffsetLabel: TLabel; SFreqincrementEdit: TEdit; FreqincrementLabel: TLabel; SNREdit: TEdit; SNRLabel: TLabel; NoteLabel: TLabel; OffsetSearchButton: TButton; 111


OffsetStatusEdit: TEdit; OffsetEdit: TEdit; OffsetLabel: TLabel; OffsetCorrEdit: TEdit; OffsetincrementEdit: TEdit; Labell: TLabel; OffsetincrementLabel: TLabel; OffsetStatusLabel: TLabel; FundTestButton: TButton; OffTestButton: TButton; Fundl25Check: TCheckBox; FreqNarrowCheck: TCheckBox; FreqLPFCheck: TCheckBox; AutoGroup: TGroupBox; FundAutoSearchButton: TButton; FreqAutoStatusEdit: TEdit; ConfigurationGroup: TGroupBox; VerboseCheck: TCheckBox; DiagCheck: TCheckBox; HighEdit: TEdit; HighLabel: TLabel; LowEdit: TEdit; LowLabel: TLabel; SopranoRadio: TRadioButton; MezzoSopranoRadio: TRadioButton; ContraltoRadio: TRadioButton; TenorRadio: TRadioButton; BaritoneRadio: TRadioButton; BassRadio: TRadioButton; ImproveOffsetEstimateGroup: TGroupBox; FineOffsetLabel: TLabel; OrderLabel: TLabel; ImproveOffsetEstimateButton: TButton; ImproveOffsetEdit: TEdit; OrderSpinEdit: TSpinEdit; AFreqEdit: TEdit; AFreqCorrEdit: TEdit; AFreqincrementEdit: TEdit; AFreqLabel: TLabel; AFreqCorrFLabel: TLabel; AFreqincrementLabel: TLabel; AFundamentalStatusLabel: TLabel; TargetLabel: TLabel; AFund125Check: TCheckBox; AFreqNarrowCheck: TCheckBox; AFreqLPFCheck: TCheckBox; AFundTestButton: TButton; BatchGroup: TGroupBox; BatchEdit: TEdit; BatchButton: TButton; BatchLabel: TLabel; ErrorEdit: TEdit; ErrorLabel: TLabel; Timerl: TTimer; procedure InputFileButtonClick(Sender: TObject); procedure FormCreate(Sender: TObject); procedure AudioSampleButtonClick(Sender: TObject); procedure FFTSizeUpDownClick(Sender: TObject; Button: TUDBtnType); procedure FakeDataButtonClick(Sender: TObject); procedure FundTestButtonClick(Sender: TObject); procedure FundamentalSearchButtonClick(Sender: TObject); procedure SopranoRadioClick(Sender: TObject); procedure MezzoSopranoRadioClick(Sender: TObject); 112


procedure ContraltoRadioClick(Sender: TObject); procedure TenorRadioClick(Sender: TObject); procedure BaritoneRadioClick(Sender: TObject); procedure BassRadioClick(Sender: TObject); procedure ImproveOffsetEstimateButtonClick(Sender: TObject); procedure ParzenRadioClick(Sender: TObject); procedure HannRadioClick(Sender: TObject); procedure WelchRadioClick(Sender: TObject); procedure RectangularRadioClick(Sender: TObject); procedure OffsetSearchButtonClick(Sender: TObject); procedure OffTestButtonClick(Sender: TObject); procedure FundAutoSearchButtonClick(Sender: TObject); procedure VerboseCheckClick(Sender: TObject); procedure DiagCheckClick(Sender: TObject); procedure AFundTestButtonClick(Sender: TObject); procedure BatchButtonClick(Sender: TObject); procedure Timer1Timer(Sender: TObject); private { Private declarations } public { Public declarations ) end; var ThesisFor.m: TThesisFor.m; implementation {$R *.drmJ (*****************************************************************************) II type declarations needed for constants type mystringtype =string [80]; II for wav file interface shortstring =string [4]; intarray =array [1 .. 2] of byte; longarray =array [1 .. 4] of byte; (*****************************************************************************) II constant declarations const (* sample rate of audio card *) fck = 22050.0; (* maximum size of wave file to be analyzed *) maxwavdata 131072; (* Reference WAV file header values *) RIFF ref: shortstring WAVE=ref: shortstring fmt_ref: shortstring fmt_len_ref: longint format ref: smallint num chan ref: smallint samps_ref: longint bytes ref: longint byte_samp_ref: smallint bits_samp_ref: sma11int 'RIFF'; 'WAVE'; 'fmt '; 16; 1; 1; 22050; 44100; 2; 16; 113


data_ref: shortstring 'data'; II message used to indicate a good read of a wav file ok: mystringtype = 'OK'; (* Const declarations for Numerical Recipes routines *) RealArrayMAX = 32768 + 32; II type declarations type II for wav file data storage wavdataarraytype =array [l .. maxwavdata] of extended; (* Type declarations for Numerical Recipes routines *) RealArray =ARRAY [l .. RealArrayMAX] OF extended; (* for window selection *) WindowSelectType = (Parzen, Hann, Welch, Rectangular); (*****************************************************************************) II variable declarations var (* wav file variables *) wavfile: file of byte; wavreadname, wavwritename: mystringtype; wavfilelen: longint; (* total len of WAV file in bytes *) datalen: longint; (* the number of data samples present *) wavdata: wavdataarraytype;(* the .WAV data*) point: longint; (* read pointer *) (* Header values read in from the .WAV file *) RIFF_inp: shortstring; WAVE inp: shortstring; fmt inp: shortstring; fmt=len_inp: longint; format_inp: smallint; num_chan_inp: smallint; samps inp: longint; bytes=inp: longint; byte samp inp: smallint; bits=samp=inp: smallint; data inp: shortstring; data=len_inp: longint; (* data variables needed for processing each completed FFT *) data: RealArray; f: RealArray; (* used by FFT routine *) norm: extended; (* FFT size variables *) nn, nn2, nn4, nn6, nnB: longint; (* general purpose vars provide a mechanism to tell operator of an error *) error, loc error: boolean; errmsg: mystringtype; (* For window selection *) WindowSelect: WindowSelectType; 114


(* variables needed for audio sample and correlation processing *) AudioSamples, dataiN, DataREF: RealArray; (* Correlation Frequency Limits *) low, high: extended; (* Single Pass maximum correlation and offset *) loc_max_corr: extended; loc max offset: extended; loc=max=index: longint; (* Multipass maximum correlation and offset *) max_freq: extended; max_corr: extended; max offset: extended; max=index: longint; sqrt_eps: extended; (* For improve estimate of offset from fundamental calculation *) FundData: RealArray; Fundindex: longint; (* Correlation Search Increment in Hz *) Increment: extended; BeforeMeasure, AfterMeasure: extended; measured: boolean; verbose, diag: boolean; (* For batch mode *) running: boolean; ensemble: longint; BatchOuts: textfile; (*****************************************************************************) (* General Purpose Subroutines *) (*****************************************************************************) procedure update_nns; (* Rev 23 Dec 2002 GAG *) begin nn2:= nn + nn; nn4:= nn2 + nn2; nn6:= nn4 + nn2; nn8:= nn4 + nn4 end; (*****************************************************************************) (* Routines for Reading and Writing .WAV files *) (*****************************************************************************) function readshortstring: shortstring; (* Rev 23 Dec 2002 GAG *) var b: byte; temp: shortstring; i: smallint; 115


begin temp:= 'xxxx'; for i:= 1 to 4 do begin read (wavfile, b); temp[i]:= chr (b) end; readshortstring:= temp end; (*****************************************************************************) function readlongint: longint; (* Rev 23 Dec 2002 GAG *) var i: smallint; loci: longint; loeb: b: byte; begin loeb:= @loci; for i:= 1 to 4 do begin read (wavfile, b); [i] := b end; readlongint:= loci end; (*****************************************************************************) function readint: smallint; (* Rev 23 Dec 2002 GAG *) var loci: smallint; loeb: i: smallint; b: byte; begin loeb:= @loci; for i:= 1 to 2 do begin read (wavfile, b); locb"[i] := b end; readint:= loci end; ( * * ** ** * ** ***** ***** * * *****"** * * * *** ** * * * * ************ *** * ***** *) procedure writeshortstring (str: shortstring); (* Rev 23 Dec 2002 GAG *) var 116


i: smallint; b: byte; begin for i:= 1 to 4 do begin b:= ord (str[i]); write (wavfile, b) end end; (*****************************************************************************) procedure writelongint (i: longint); (* Rev 23 Dec 2002 GAG *) var loci: longint; loeb: Alongarray; begin loci:= i; loeb:= @loci; write (wavfile, locbA[l], locbA[2], locbA[3], locbA[4]) end; (*****************************************************************************) procedure writeint (i: smallint); (* Rev 23 Dec 2002 GAG *) var loci: smallint; loeb: Aintarray; begin loci:= i; loeb:= @loci; write (wavfile, locbA[l], locbA[2]) end; (*****************************************************************************) procedure readwav; (* Rev 23 Dec 2002 GAG *) var i: longint; s: mystringtype; trash: byte; st: mystringtype; begin assignfile (wavfile, wavreadname); reset (wavfile); (* read in the header *) RIFF inp:= readshortstring; wavfllelen:= readlongint; WAVE_inp:= readshortstring; fmt_inp:= readshortstring; fmt_len_inp:= readlongint; format_inp:= readint; num_chan_inp:= readint; 117


samps inp:= readlongint; bytes=inp:= readlongint; byte samp inp:= readint; bits=samp=inp:= readint; (* read by trash added to format header *) while fmt_len_inp > 16 do begin read (wavfile, trash); fmt len inp:= fmt len inp -1; wavfilelen:= wavfilelen -1 end; data inp:= readshortstring; while ((data inp <>data ref) and (NOT EOF (wavfile))) do begin -(* get length of bogus info *) i:= readlongint; (* trash the garbage *) while i > 0 do begin read (wavfile, trash); i:= i -1; wavfilelen:= wavfilelen -1 end; (* get the next data_inp *) data inp:= readshortstring end; -(* get the data length *) data_len_inp:= readlongint; (* check the header *) if RIFF inp <> RIFF ref then s:= '"RIFF" incorrect' else if wavfilelen <= 44 then s:= 'File too short' else if WAVE_inp <> WAVE ref then s:= '"WAVE" incorrect' else if fmt inp <> fmt ref then s:= '"tmt incorrect' else if fmt_len_inp <> fmt_len_ref then s:= 'Format length incorrect' else if format inp <> format ref then s:= 'Format incorrect' else if num chan inp <> num chan ref then s:= INurn channels incorrect' else if samps_inp <> samps_ref then s:= 'Sample rate incorrect' else if bytes inp <> bytes ref then s:= 'Byte rate incorrect' else if byte_samp_inp <> byte_samp_ref then 118


s:= 'Bytes I sample incorrect' else if bits samp inp <> bits samp ref then s:= 'Bits 7 sample incorrect' else if data inp <> data ref then s:= '"data" incorrect' (* This is commented out because not all .wav files are created equal *l II else II if data_len_inp <> (wavfilelen -36) then II s:= 'lengths do not agree' else s:= ok; if s = ok then begin error:= false; datalen:= (data_len_inp) div 2; if datalen > maxwavdata then datalen:= maxwavdata; for i:= 1 to datalen do wavdata [i]:= readint I 32767.0 end else begin error:= true; datalen:= 0; end; errmsg:= s; closefile (wavfile); point:= 1; str (point:B, st); Thesisform.SamplePointerEdit.text:= st end; (*****************************************************************************) procedure writewav; (* Rev 23 Dec 2002 GAG *) var i: longint; begin assignfile (wavfile, wavwritename); rewrite (wavfile); wavfilelen:= datalen 2 + 36; writeshortstring (RIFF ref); writelongint (wavfilelen); writeshortstring (WAVE ref); writeshortstring (fmt ref); writelongint (fmt_len=ref); writeint (format_ref); writeint (num chan ref); writelongint (samps_ref); writelongint (bytes ref); writeint (byte samp-ref); writeint (bits-samp-ref); writeshortstring (data_ref); 119


writelongint (datalen 2); for i:= 1 to datalen do writeint (round (32767.0 wavdata[i))); closefile (wavfile) end; (*****************************************************************************) (* Fake Data Generation and Support Routines *) (*****************************************************************************) P.rocedure Fake Wave (numpoints: longint; fund: extended; offset: extended; doREF: boolean; ratio: extended); (* Rev 23 Dec 2002 GAG *) var i,j, limit: longint; accum, x, loc_f, advance: extended; noise: extended; begin { No harmonics above 3,000 will be present in a SSB signal due to the SSB filter. As a result, none are created. if (ratio > 0.0) then RandSeed:= 13579; limit:= trunc (3000.0 I fund); if doREF then begin if numpoints > RealArrayMAX then numpoints:= RealArrayMAX end else begin if numpoints > maxwavdata then numpoints:= maxwavdata end; for i:= 1 to numpoints do if doREF then dataREF [i):= 0.0 else wavdata [i) := 0.0; for j:= 1 to limit do begin loc f:= (fund j) + offset; if (loc f >= 300.0) AND (loc_f <= 3000.0) then begin -accum:= 0; advance:= loc f I 22050.0; for i:= 1 to numpoints do begin 120


accum:= accum + advance; if accum >= 1.0 then accum:= accum1.0; x:= 0.1 sin (2.0 *pi accum); if (ratio.> 0.0) then begin noise:= RandG (0.0, 0.1 *ratio); x:= x + noise end; if doREF then dataREF (i]:= dataREF [i] + x else end end end end; wavdata [i]:= wavdata [i] + x (*****************************************************************************) procedure dofakedata (name: mystringtype); (* Rev 23 Dec 2002 GAG *) var st: rnystringtype; code: integer; freq, offset: extended; SNR: extended; ratio: extended; begin val (ThesisForm.FakeFreqEdit.text, freq, code); str (freq:8:3, st); ThesisForrn.FakeFreqEdit.text:= st; val (ThesisForm.FakeOffsetEdit.text, offset, code); str (offset:8:3, st); ThesisForm.FakeOffsetEdit.text:= st; val (ThesisForm.SNREdit.text, SNR, code); if (code <> 0) then SNR:= 60.0; str (SNR:8:2, st); ThesisForrn.SNREdit.text:= st; ratio:= Power(10.0, SNR I (-20.0)); if verbose then begin ThesisForm.InputStatusEdit.text:= 'Generating Fake Data'; Application.ProcessMessages end; Fake Wave (maxwavdata, freq, offset, false, ratio); datalen:= rnaxwavdata; errrnsg:= 'Fake Data Generated'; point:= 1; str (point:B, st); ThesisForrn.SamplePointerEdit.text:= st; 121


error:= false; loc_error:= false; str (datalen:lO, st); ThesisForm.FileLengthEdit.text:= st; if (verbose AND (datalen > 0)) then begin wavwritename:= name; writewav end; ThesisForm.InputStatusEdit.text:= 'Fake Data Generated' end; (*****************************************************************************) (* Routines to write real and complex data arrays to files. *) (*****************************************************************************) procedure writerealdatafile (var data: RealArray; size: longint; (* Rev 23 Dec 2002 GAG *) var n: longint; outs: textfile; begin assignfile (outs, name); rewrite (outs); for n:= 1 to size do begin if FreqData then name: mystringtype; FreqData: boolean; scale: extended); writeln (outs, n:6, ',f[n] scale:26, ',data[n]:26) else writeln (outs, n:6, end; closefile (outs) end; ', data[n] :26) (*****************************************************************************) procedure writecomplexdatafile (var data: RealArray; size: longint; name: mystringtype; FreqData: boolean); (* Rev 23 Dec 2002 GAG *) var n: longint; outs: textfile; size_half: integer; begin assignfile (outs, name); rewrite (outs); size half:= size div 2; for n:= 1 to size do begin 122


      if n mod 2 = 0 then
        begin
          if FreqData then
            begin
              if n <= size_half then
                writeln (outs, n div 2:6, ', ', f[(n div 2)]:26, ', ',
                         data [n-1]:26, ', ', data [n]:26, ', ',
                         sqrt (sqr (data[n-1]) + sqr (data[n])):26)
              else
                writeln (outs, n div 2:6, ', ', f[size_half-(n div 2)+2]:26, ', ',
                         data [n-1]:26, ', ', data [n]:26, ', ',
                         sqrt (sqr (data[n-1]) + sqr (data[n])):26)
            end
          else
            writeln (outs, n div 2:6, ', ',
                     data [n-1]:26, ', ', data [n]:26, ', ',
                     sqrt (sqr (data[n-1]) + sqr (data[n])):26)
        end
    end;
  closefile (outs)
end;

(*****************************************************************************)
(* Windowing Routines *)

function ParzenWindow (j, N: longint): extended;
(* Rev 23 Dec 2002 GAG *)
(* From: Numerical Recipes, Page 425 *)
begin
  ParzenWindow:= 1.0 - abs ((j - 0.5 * (N - 1.0)) / (0.5 * (N + 1.0)))
end;

(*****************************************************************************)

function HannWindow (j, N: longint): extended;
(* Rev 23 Dec 2002 GAG *)
(* From: Numerical Recipes, Page 425 *)
begin
  HannWindow:= 0.5 * (1.0 - cos (2.0 * pi * j / (N - 1.0)))
end;

(*****************************************************************************)

function WelchWindow (j, N: longint): extended;
(* Rev 23 Dec 2002 GAG *)
(* From: Numerical Recipes, Page 425 *)
begin


  WelchWindow:= 1.0 - sqr ((j - 0.5 * (N - 1.0)) / (0.5 * (N + 1.0)))
end;

(*****************************************************************************)

function Window (j, N: longint): extended;
(* Rev 23 Dec 2002 GAG *)
var
  temp: extended;
begin
  case WindowSelect of
    Welch:  temp:= WelchWindow (j, N);
    Parzen: temp:= ParzenWindow (j, N);
    Hann:   temp:= HannWindow (j, N);
  else
    temp:= 1.0
  end;
  Window:= temp
end;

(*****************************************************************************)

Procedure PerformWindow (var data: RealArray; size: longint);
(* Rev 23 Dec 2002 GAG *)
var
  n: longint;
begin
  (* Window the data *)
  norm:= 0.0;
  for n:= 1 to size do
    begin
      data [n]:= data [n] * Window ((n-1), (size-1));
      norm:= norm + Window ((n-1), (size-1))
    end;
  norm:= norm / size
end;

(*****************************************************************************)

Procedure Normalize (var data: RealArray; numpoints: longint);
(* Rev 24 Dec 2002 GAG *)
var
  n: longint;
  term: extended;
begin
  term:= 1.0 / norm;
  for n:= 1 to numpoints do
    data [n]:= data [n] * term;
end;

(*****************************************************************************)

Function Measure (var data: RealArray; numpoints: longint): extended;
var
  temp: extended;


  i: longint;
begin
  temp:= 0.0;
  for i:= 1 to numpoints do
    temp:= temp + sqr (data [i]);
  Measure:= sqrt (temp)
end;

(*****************************************************************************)
(* FFT Routines from Numerical Recipes Pascal Source Disk *)
(*****************************************************************************)

{
  Replaces DATA by its discrete Fourier transform, if ISIGN is input as 1;
  or replaces DATA by size times its inverse discrete Fourier transform, if
  ISIGN is input as -1.  DATA is a complex array of length size or,
  equivalently, a real array of length 2*size.  size MUST be an integer
  power of 2 (this is not checked for!).
}
PROCEDURE four1(VAR data: RealArray; size,isign: longint);
(* Rev 24 Dec 2002 GAG *)
VAR
  ii,jj,n,mmax,m,j,istep,i: longint;
  (* Double precision for trigonometric recurrences *)
  wtemp,wr,wpr,wpi,wi,theta: double;
  tempr,tempi,wrs,wis: extended;
BEGIN
  if verbose then
    begin
      measured:= false;
      BeforeMeasure:= Measure (data, 2 * size)
    end;
  n := 2*size;
  j := 1;
  (* This is the bit-reversal section of the routine. *)
  FOR ii := 1 TO size DO
    BEGIN
      i := 2*ii-1;
      IF j > i THEN
        BEGIN
          (* Exchange the two complex numbers. *)
          tempr := data[j];
          tempi := data[j+1];
          data[j] := data[i];
          data[j+1] := data[i+1];
          data[i] := tempr;
          data[i+1] := tempi
        END;
      m := n DIV 2;
      WHILE (m >= 2) AND (j > m) DO
        BEGIN
          j := j-m;
          m := m DIV 2
        END;
      j := j+m
    END;
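  (* The loop above permutes the complex samples into bit-reversed index
     order.  For example, with size = 8 the zero-based sample indices 0..7
     are reordered to 0, 4, 2, 6, 1, 5, 3, 7, so that the in-place
     Danielson-Lanczos butterflies that follow can combine progressively
     larger sub-transforms without any auxiliary storage. *)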


  (* Here begins the Danielson-Lanczos section of the routine. *)
  mmax := 2;
  (* Outer loop executed log2 size times. *)
  WHILE n > mmax DO
    BEGIN
      istep := 2*mmax;
      (* Initialize the trigonometric recurrence. *)
      theta := 6.28318530717959/(isign*mmax);
      wpr := -2.0*sqr(sin(0.5*theta));
      wpi := sin(theta);
      wr := 1.0;
      wi := 0.0;
      (* Here are the two nested inner loops. *)
      FOR ii := 1 TO mmax DIV 2 DO
        BEGIN
          m := 2*ii-1;
          wrs := wr;
          wis := wi;
          (* This is the Danielson-Lanczos formula: *)
          FOR jj := 0 TO (n-m) DIV istep DO
            BEGIN
              i := m + jj*istep;
              j := i+mmax;
              tempr := wrs*data[j]-wis*data[j+1];
              tempi := wrs*data[j+1]+wis*data[j];
              data[j] := data[i]-tempr;
              data[j+1] := data[i+1]-tempi;
              data[i] := data[i]+tempr;
              data[i+1] := data[i+1]+tempi
            END;
          (* Trigonometric recurrence. *)
          wtemp := wr;
          wr := wr*wpr-wi*wpi+wr;
          wi := wi*wpr+wtemp*wpi+wi
        END;
      mmax := istep
      (* Not yet done. *)
    END;
  (* All done. *)
  (* One (added) task; normalize the output *)
  tempr:= 1.0 / sqrt(size);
  for i:= 1 to 2 * size do
    data [i]:= data [i] * tempr;
  if verbose then
    begin
      AfterMeasure:= Measure (data, 2 * size);
      measured:= true
    end
END;
(* (C) Copr. 1986-92 Numerical Recipes Software *)

(*****************************************************************************)
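(* Because four1 scales its output by 1/sqrt(size), the forward and inverse
   transforms share the same normalization and the transform preserves the
   Euclidean norm (Parseval's theorem): sum |x[n]|^2 = sum |X[k]|^2.  When
   verbose is set, the BeforeMeasure and AfterMeasure values recorded above
   should therefore agree to within rounding error, which appears to be the
   purpose of the Measure bookkeeping. *)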


(* Supporting routines for performing ffts *)
(*****************************************************************************)

{
  Procedure Euclidean converts the complex output of an fft to real
  (positive).  The number of samples is reduced from size complex pairs to
  size real samples.
}
procedure Euclidean (var data: RealArray; size: longint; full: boolean);
(* Rev 23 Dec 2002 GAG *)
var
  i: longint;
  size2: longint;
begin
  size2:= size + size;
  for i:= 1 to size do
    if full then
      data [i]:= sqrt (sqr (data [2*i-1]) + sqr (data [2*i]) +
                       sqr (data [2*(size2-i)+4]) + sqr (data [2*(size2-i)+3]))
    else
      data [i]:= sqrt (sqr (data [2*i-1]) + sqr (data [2*i]))
end;

(*****************************************************************************)
(* Lagrange Interpolation routines *)
(*****************************************************************************)

function lagrange (VAR data: RealArray; x: extended;
                   order, start: longint): extended;
(* Rev 26 Dec 2002 GAG *)
(* Find the interpolated value of array "data" at (real) index "x" *)
var
  i, j: longint;
  num, den, temp: extended;
begin
  temp:= 0.0;
  for i:= 1 to order do
    begin
      num:= 1.0;
      for j:= 1 to order do
        begin
          if j <> i then
            num:= num * (x - (start + j))
        end;
      den:= 1.0;
      for j:= 1 to order do
        begin
          if j <> i then
            den:= den * ((start + i) - (start + j))
        end;
      temp:= temp + num * data[start + i] / den
    end;
  lagrange:= temp
end;
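(* lagrange evaluates the standard Lagrange interpolating polynomial through
   the "order" samples data[start+1] .. data[start+order]:

     P(x) = sum over i of data[start+i] *
            product over j <> i of (x - (start+j)) / ((start+i) - (start+j)).

   For example, lagrange (FundData, x, 4, k) would return the cubic through
   FundData[k+1] .. FundData[k+4] evaluated at the fractional index x; this
   is what allows Gradient_Method, below, to locate a correlation peak that
   falls between FFT bins. *)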


(*****************************************************************************)

procedure Gradient_Method;
(* Rev 26 Dec 2002 GAG *)
const
  gain = 0.05;
var
  start: longint;
  slope: extended;
  x, old_x, y: extended;
  y2, y1, x2, x1: extended;
  hz_per_bin: extended;
  order: longint;
  mean: longint;
  outs: textfile;
  iter: longint;
begin
  hz_per_bin:= fck / nn2;
  order:= ThesisForm.OrderSpinEdit.Value;
  start:= Fundindex - (order div 2);
  mean:= nn + 1;
  x:= Fundindex;
  iter:= 0;
  if verbose then
    begin
      assignfile (outs, 'gradient.dat');
      rewrite (outs)
    end;
  repeat
    old_x:= x;
    x2:= x * (1.0 + sqrt_eps);
    y2:= lagrange (FundData, x2, order, start);
    x1:= x * (1.0 - sqrt_eps);
    y1:= lagrange (FundData, x1, order, start);
    slope:= (y2 - y1) / (x2 - x1);
    x:= x + gain * slope;
    loc_max_offset:= (x - mean) * hz_per_bin;
    if verbose then
      begin
        iter:= iter + 1;
        writeln (outs, iter:10, ', ', loc_max_offset:40, ', ', slope:40)
      end;
  until (abs (x - old_x) / (abs(x) + abs(old_x))) < sqrt_eps;
  if verbose then
    closefile (outs);
  y:= lagrange (FundData, x, order, start);
  loc_max_corr:= y;
  loc_max_index:= round (x)
end;
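(* Gradient_Method refines the coarse peak found by the correlation search
   using a fixed-step gradient ascent on the Lagrange-interpolated
   correlation curve.  Each pass estimates the slope by a central difference
   about the current fractional bin x (stepping by the relative amount
   sqrt_eps) and then moves x by gain * slope; the offset in Hz is recovered
   as (x - mean) * hz_per_bin.  With the values set in FormCreate
   (fck = 22050 Hz, nn = 2048, so nn2 = 4096), hz_per_bin = 22050 / 4096, or
   roughly 5.38 Hz, so this interpolation is what resolves the offset
   estimate below the raw FFT bin spacing. *)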


(*****************************************************************************) procedure find sqrt eps; (* Rev 23 Dec 2oo2 GAG *) begin sqrt_eps:= 1.0; repeat sqrt eps:= sqrt eps I 2.0 until-((sqrt_eps-+ 1.0) = 1.0); sqrt_eps:= sqrt (sqrt_eps 2.0) end; (*****************************************************************************) (* Routines to support File Readin *) (*****************************************************************************) procedure FileReadinWAVfile; (* Rev 26 Dec 2002 GAG *) var st: rnystringtype; begin ThesisForrn.InputStatusEdit.text:= 'Beginning File Readin'; Application.ProcessMessages; if verbose then begin ThesisForrn.InputStatusEdit.text:= 'Begin File Readin'; Application.ProcessMessages end; loc error:= false; wavreadnarne:= ThesisForrn.inputfilenarneedit.text; ThesisForrn.FileLengthEdit.text:= ''; readwav; ThesisForrn.InputStatusEdit.text:= errrnsg; if error then ThesisForm.Filelengthedit.text:= 'error' else begin str (datalen:10, st); ThesisForrn.FileLengthEdit.text:= st end; if (verbose AND (datalen > 0)) then begin wavwritenarne:= 'Testl.wav'; writewav end; ThesisForrn.InputStatusEdit.text:= 'File Readin Complete' end; (*****************************************************************************) (* Routines to support initial FFT buffer fill I process operations *) 129


procedure FillFrequencyVector; (* Rev 23 Dec 2002 GAG *) var n: longint; begin for n:= 1 to nn do f[n):= (n-1.0) Inn* fck 0.5 end; (*****************************************************************************) procedure SamplesProcessingTask (var data: RealArray; DoOffsetSearch: boolean); (* Rev 26 Dec 2002 GAG *) var i: longint; begin if verbose then begin ThesisForm.AudioSampleStatusEdit.text:= 'Running: SamplesProcessingTask'; Application.ProcessMessages end; if verbose then writerealdatafile (data, nn2, 'Incoming1.dat', false, 1.0); PerformWindow (data, nn2); if verbose then writerealdatafile (data, nn2, 'Incoming2.dat', false, 1.0); II nn2 real samples come in for i:= nn2 downto 1 do begin data [2*i-1]:= data [i); data [2*i ]:= 0.0 end; II nn2 complex samples come out if verbose then writerealdatafile (data, nn4, 'Incoming3.dat', false, 1.0); if NOT DoOffsetSearch then begin II nn2 complex pairs go in four1(data,nn2,1) II nn2 complex pairs come out end; FillFrequencyVector; if verbose then writecomplexdatafile (data, (2 nn2), 'Incoming4.dat', true); II nn2 complex pairs in and out Normalize (data, nn4); if verbose then 130


writecomplexdatafile (data, (2 nn2), 'IncomingS.dat', true); errmsg:= 'FFT Processing Complete' end; procedure GetAudioSamplesTask; (* Rev 306 Dec 2002 GAG *) var i: longint; offset: longint; code: integer; st: mystringtype; begin ThesisForm.AudioSampleStatusEdit.text:= 'Fetching Audio Samples'; Application.ProcessMessages; if NOT (error or loc_error) then begin val (Thesisform.SamplePointerEdit.text, offset, code); if code <> 0 then offset:= 1; str (offset:8, st); Thesisform.SamplePointerEdit.text:= st; if code = 0 then begin if offset <= 0 then begin loc error:= true; errmsg:= 'illegal offset value' end else if (offset + nn2 1) > datalen then begin loc_error:= true; errmsg:= 'insufficient data' end else begin point:= offset; for i:= 1 to nn2 do AudioSamples [i):= wavdata [i+point); errmsg:= 'FFT Buffer OK'; point:= point + nn2; if point > datalen then point:= datalen; str (point:a, st); Thesisform.SamplePointerEdit.text:= st end end else begin loc_error:= true; errmsg:= 'bad parse of offset value' end; if NOT loc error then begin DataiN:= AudioSamples; 131


SamplesProcessingTask (dataiN, false); err.msg:= 'Process Audio Samples Complete' end end else begin if error then _err.msg:= 'Error: GetAudioSamplesTask' else if loc error then err.msg:= 'Local Error: GetAudioSamplesTask' else err.msg:= 'Baffling: GetAudioSamplesTask' end; ThesisForm.AudioSampleStatusEdit.text:= errmsg end; (* Routines to Support Correlation Activities *) (*****************************************************************************) procedure Correlation (var datalP, data2P: RealArray); (* Rev 23 Dec 2002 GAG *) The data to be correlated is placed in datalp and data2p. Each set consists of nn2 complex pairs. var i: longint; Datal, Data2: RealArray; begin (* For diagnostic use; rectangular functions *) II data is nn2 complex pairs long ( for i:= 1 to nnB do if (i <= nn4) and (i mod 2 data [i] := 1.0 else data [i):= 0.0; datalp:= data; data2p:= data; II nn2 complex pairs long Datal:= datalp; if verbose then 1) then writecomplexdatafile (Datal, nn4, 'corrlin.dat', false); 30 Dec 2002 The data in datalp is nn2 complex pairs long. The lower half of this will be left in place; the upper half will be zeroed; it is empty anyway. The result is nn2 complex pairs long. for i:= (nn2 + 1) to nn4 do Datal [i] := 0.0; if verbose then writecomplexdatafile (Datal, nn4, 'corrlpad.dat', false); 132


II nn2 complex pairs go in fourl(datal, nn2, 1); II nn2 complex pairs come out if verbose then writecomplexdatafile (Datal, nn4, 'corrl.dat', false); II nn2 complex pairs long Data2:= data2p; if verbose then writecomplexdatafile (Data2, nn4, 'corr2in.dat', false); 30 Dec 2002 The lower half of the data is moved to the upper half; the lower half is zeroed; it is empty anyway. The result is nn2 complex pairs long. for i:= nn4 downto 1 do begin if i > nn2 then Data2 [i] := Data2 [i nn2] else Data2 [i] := 0.0 end; if verbose then writecomplexdatafile (Data2, nn4, 'corr2pad.dat', false); II nn2 complex pairs come in fourl(data2, nn2, 1); II nn2 complex pairs come out if verbose then writecomplexdatafile (Data2, nn4, 'corr2.dat', false); (* compute the complex conjugate of DataREF nn2 complex pairs long *) for i:= 1 to nn4 do if i mod 2 = 0 then Data2 [i]:= -Data2 [i]; if verbose then writecomplexdatafile (Data2, nn4, 'corr3.dat', false); (* complex multiply them together; both are nn2 complex pairs long *) for i:= 1 to nn4 do if i mod 2 = 1 then data [i] := Datal [i] Data2 [iJ -Datal [i+l] Data2 [i+l] else data [i]:= Datal [i) Data2 [i-1] +Datal [i-1) Data2 [i); if verbose then writecomplexdatafile (data, nn4, 'corr4.dat', false); (* compute the complex conjugate of data, which is nn2 complex pairs long. *) for i:= 1 to nn4 do if i mod 2 = 0 then data [i]:= -data [i]; if verbose then writecomplexdatafile (data, nn4, 'corr5.dat', false); (* take the forward fft again *l II nn2 complex pairs come in fourl(data, nn2, 1); II nn2 complex pairs come out 133


if verbose then writecomplexdatafile (data, nn4, 'corr6.dat', false); II nn2 complex pairs come in Euclidean (data, nn2, false); II nn2 reals come out if verbose then writerealdatafile (data, nn2, 'corr7.dat', false, 1.0) end; (*****************************************************************************) procedure OnePassCorrelation (freq, offset: extended; DoOffsetsearch: boolean); (* Rev 24 Dec 2002 GAG *) var i: longint; hz_per_bin: extended; score: extended; mean, diff, min, max: longint; begin hz_per_bin:= fck I nn2; loc_max_corr:= 0.0; loc max offset:= 0.0; loc-max-index:= 1; Fake_Wave (nn2, freq, offset, true, 0.0); llnn2 real samples go in SamplesProcessingTask (dataREF, DoOffsetSearch); llnn2 complex pairs come out if verbose then begin II nn2 complex pairs go in (each) Correlation (DataiN, DataiN); II nn2 reals come out writerealdatafi1e (data, nn2, 'CrossF.dat', false, 1.0); II nn2 complex pairs go in (each) Correlation (AudioSarnples, AudioSarnples); II nn2 reals come out writerealdatafile (data, nn2, 'CrossT.dat', false, 1.0) end; llnn2 complex pairs go in Correlation (DataiN, DataREF); llnn2 reals come out At this point the input data and the model data have been compared. If they are identical, then the peak is at nn + nnl2 + 1. If they are not, then the peak will be shifted. The region of interest is +143 Hz-. This is due to the lowest bass voice at about 86 Hz. mean:= nn + 1; II center of test range 134


loc max corr:= 0.0; for-i:=-1 to nn2 do loc_max_corr:= loc_max_corr + sqr (data [i]); loc_max_corr:= sqrt (loc_max_corr); diff:= trunc (low I (2.0 hz_per_bin)); min:= mean -diff; max:_= mean + diff; score:= 0.0; for i:= min to max do if data[i] > score then begin score:= data [i]; loc max index:= i end; loc max offset:= Cloc_max_index mean) hz_per_bin end; (*****************************************************************************) procedure PerforrnFundamentalSearch (increment: extended); (* Rev 26 Dec 2002 GAG *) var freq: extended; offset: extended; code: integer; func: textfile; st: mystringtype; done, loc diag: boolean; loc_low, loc_high: extended; mean: extended; a, b, c: extended; Do_LPF: boolean; old_max_index: longint; old_max_offset: extended; begin max_freq:= 0.0; max_corr:= 0.0; max offset:= 0.0; max::::index:= 1; loc_max_corr:= 0.0; old max index:= 0; old-max-offset:= 0; if diag then begin assignfile Cfunc, 'FuncFreq.dat'); rewrite (func) end; Do LPF:= ThesisForm.FreqLPFCheck.Checked; a:= 0.0; b:= 0.0; mean:= 0.0; loc_diag:= diag; 135


(* Hook for 1 pass search *) if increment = 0.0 then begin val (ThesisForm.FakeFreqEdit.Text, loc_low, code); loc_high:= loc low -1.0 end else begin If ThesisForm.FreqNarrowCheck.Checked then begin mean:= (high + low) I 2.0; loc low:= mean-(mean-low) 0.75; loc=high:= mean+ (high-mean) 0.75 end else begin loc_low:= low; loc_high:= high end end; if (loc_high < loc_low) or (NOT ThesisForm.Fund125Check.Checked) then done:= true else done:= false; repeat freq:= loc_low; repeat offset:= 0; if Do LPF then begin old max offset:= loc_max_offset; old max index:= loc max index end; OnePassCorrelation (freq, offset, false); if verbose then application.ProcessMessages; if loc diag OR Do LPF then begin -c:= b; b:= a; a:= loc_max_corr; mean:= (a+ b + c)/3.0 end; if loc_diag then writeln (func, freq:40, 1 1 loc_max_corr:40, (freq-increment) :40,1 1 mean:40); if Do LPF then begin-if mean > max corr then begin max_corr:= mean; max offset:= old_max_offset; max_freq:= freq -increment; max index:= old max index --136 I I


end end else begin if loc max corr > max corr then begin -max_corr:= loc_max_corr; max offset:= loc max offset; max-freq:= freq;--max-index:= loc max index end --end; freq:= freq + increment until freq > loc_high; if loc diag then begin -closefile (func); loc_diag:= false end; FundData:= data; Fundindex:= max_index; str (max corr:B:l, st); ThesisForm.SFreqcorrEdit.text:= st; str (max offset:8:2, st); ThesisForm.EstOffsetEdit.text:= st; str (max freq:8:2, st); ThesisForm.SFreqEdit.text:= st; Application.ProcessMessages; if NOT done then begin loc low:= max freq -2 increment; loc-high:= max freq + 2 increment; if Increment >-1.0 then increment:= 1.0 else if increment> 0.75 then increment:= 0.5 else if increment > 0.35 then increment:= 0.2 else if increment > 0.15 then increment:= 0.1 else if increment > 0.075 then increment:= 0.05 else if increment > 0.0375 then increment:= 0.02 else if increment > 0.015 then increment:= 0.01 else if increment > 0.0075 then increment:= 0.005 137


else done:-true end until done; end; (*****************************************************************************) procedure PerformAutocorrelationFundamentalSearch (var st: mystringtype; increment: extended); var fx: RealArray; i: longint; freq: extended; temp: extended; code: integer; inc: extended; limit, center: longint; hz_per_bin: extended; omega: extended; sum: extended; func: textfile; loc diag: boolean; loc-low, loc high: extended; mean: extended; a, b, c: extended; Do LPF: boolean; done: boolean; begin ThesisForm.FreqAutoStatusEdit.text:= st; Do_LPF:ThesisForm.AFreqLPFCheck.Checked; a:= 0.0; b:= 0.0; mean:= 0.0; loc diag:-diag; if loc diag then begin -assignfile (func, 'FuncAuto.dat'); rewrite (func) end; val (ThesisForm.HighEdit.text, temp, code); if (code <> 0) or (temp < 60) or (temp > 500) then 185.0 else High:-temp; val (ThesisForm.LowEdit.text, temp, code); if (code <> 0) or (temp < 60) or (temp > 500) then Low:= 92.50 else Low:= temp; if high <= low then begin High:= 185.00; Low:= 92.50 end; 138


str (High:7:2, st); ThesisForm.HighEdit.text:= st; str (Low:7:2, st); ThesisForm.LowEdit.text:= st; Application.ProcessMessages; II nn2 complex pairs go in (each) Correlation (DataiN, DataiN); II nn2 reals come out if verbose then writerealdatafile (data, nn2, 'CrossF.dat', false, 1.0); hz_per_bin:= fck I nn2; max corr:= 0; max_freq:= 0; (* Hook for 1 pass search *) if increment = 0.0 then begin val (ThesisForm.FakeFreqEdit.Text, loc_low, code); loc_high:= loc_low 1.0 end else begin If ThesisForm.AFreqNarrowCheck.Checked then begin mean:= (high + low) I 2.0; loc_low:= mean-(mean-low) 0.75; loc_high:= mean+ (high-mean) 0.75 end else begin loc_low:= low; loc_high:= high end end; if (loc_high < loc_low) or (NOT ThesisForm.AFundl25Check.Checked) then done:= true else done:= false; repeat freq:= loc_low; repeat inc:= pi hz_per_bin I freq; { fck = 22050 and nn 2048, each bin is fck I nn2 = 5.3833 Hz. let's say you want to generate 161.499 Hz. The period of this wave is 161.499 I 5.3833 = 30 columns. Therefore, the rotation per column is then pil30 col.no. (half-freq cosine wave) center:= nn + 1; limit:= round (freq I (2.0 hz_per_bin)) +center+ 1; sum:= 0; for i:= limit to nn2 do begin omega:= inc (i -center); 139


if omega >= pi then omega := omega -pi; temp:= cos (omega); fx [i] := sqr (temp); sum:= sum + fx [i] data [i] end; if sum > max carr then begin max carr:= sum; max_freq:= freq end; if loc diag OR Do LPF then begin -c:= b; b:= a; a:= loc_max_corr; mean:= (a+ b + c)/3.0 end; if verbose then begin for i:= 1 to (limit 1) do fx [i]:= 0.0; writerealdatafile (fx, nn2, 'Pattern.dat', false, 1.0) end; if loc diag then writeln (func, freq:40, 1 1,sum:40, (freq-increment):40, 1 ',mean:40); if Do LPF then begin-if mean > max carr then begin max carr:= mean; max_freq:= freq -increment end end else begin if sum > max carr then begin max carr:= sum; max_freq:= freq end end; freq:= freq + increment until freq > loc_high; str (max corr:8:2, st); ThesisForm.AFreqCorrEdit.text:= st; str (max_freq:9:3, st); ThesisForm.AFreqEdit.text:= st; Application.ProcessMessages; if NOT done then begin 140


loc low:= max freq -2 increment; loc-high:= max freq + 2 increment; if increment >-1.0 then increment:= 1. 0 else if increment> 0.75 then increment:= 0.5 else if increment > 0. 35 then increment:= 0.2 else if increment > 0.15 then increment:= 0.1 else if increment > 0.075 then increment:= 0.05 else if increment > 0.0375 then increment:= 0.02 else if increment > 0.015 then increment:= 0.01 else if increment > 0.0075 then increment:= 0.005 else done:= true end; if diag then begin closefile (func); loc_diag:= false end until done; st:= 'Fund. by Autocorrelation complete'; ThesisForm.FreqAutoStatusEdit.text:= st end; (*****************************************************************************) procedure PerformOffsetSearch (increment: extended); var st: mystringtype; offset: extended; highlimit: extended; func: textfile; code: integer; begin DataiN:= AudioSamples; SamplesProcessingTask (dataiN, true); max corr:= 0.0; max=offset:= 0.0; max_index:= 1; loc max corr:= 0.0; if diag then begin 141


assignfile (func, 'FuncOff.dat'); rewrite (func) end; if increment > 0.0 then begin offset:= round (-0.90 *low I (2.0 *increment)) *increment; (*low limit*) highlimit:= 0.90 low I 2.0 else begin val (ThesisForm.FakeOffsetEdit.text, offset, code); str (offset:B:2, st); (* offset is the low limit *) ThesisForm.FakeOffsetEdit.text:= st; ThesisForm.OffsetEdit.text:= st; highlimit:= offset -1 end; repeat OnePassCorrelation (max_freq, offset, true); if verbose then application.ProcessMessages; if diag then writeln (func, offset:40, ', loc_max_corr:40); if loc max corr > max corr then begin --max_corr:= loc_max_corr; max offset:= offset end; offset:= offset + increment until offset > highlimit; if diag then closefile (func); str (max corr:B:l, st); ThesisForm.OffsetCorrEdit.text:= st; str (max_offset:B:2, st); ThesisForm.OffsetEdit.text:= st; if error then ThesisForm.OffsetStatusEdit.text:= errmsg else ThesisForm.OffsetStatusEdit.text:= 'Offset Search Complete'; application.ProcessMessages end; (* Batch Mode Routines *l (*****************************************************************************) procedure StartBatchMode; var st: mystringtype; 142


begin st:= 'Batch Mode Started'; ThesisFor.m.BatchEdit.text:= st; Application.ProcessMessages; st:= 'File Readin Started'; ThesisFor.m.BatchEdit.text:= st; Application.ProcessMessages; FileReadinWAVfile; st:= 'Batch Output Opening'; ThesisFor.m.BatchEdit.Text:= st; Application.ProcessMessages; assignfile (BatchOuts, 'Batch.dat'); append (BatchOuts); ensemble:= 0; running:= true; st:= 'Batch Mode Started'; ThesisFor.m.BatchEdit.text:= st; ThesisFor.m.BatchButton.Caption:= 'Stop Batch'; Application.ProcessMessages end; (*****************************************************************************) Procedure StopBatchMode; var st: mystringtype; begin st:= 'Batch Output Closing'; ThesisForm.BatchEdit.Text:= st; Application.ProcessMessages; closefile (BatchOuts); running:= false; ThesisFor.m.BatchButton.Caption:= 'Run Batch'; st:= 'Batch Mode Finished'; ThesisFor.m.BatchEdit.text:= st end; (*****************************************************************************) Procedure BatchMode; var loc st, st, mini st: mystringtype; code: integer; increment: extended; Freq, FCorr, Off, OCorr: extended; Induced: extended; begin if (datalen >=(point+ nn2ll then GetAudioSamplesTask else StopBatchMode; if running then 143


begin ThesisForm.Timerl.Enabled:= false; ensemble:= ensemble + 1; str (ensemble:!, mini st); loc st:= 'Ensemble: ,-+mini st + '; ThesisForm.BatchEdit.text:= Ioc_st; Application.ProcessMessages; val (ThesisForm.AFreqincrementEdit.text, increment, code); if (code <> 0) or (increment < 0.0) then increment:= 1.0; str (increment:8:2, st); ThesisForm.AFreqincrementEdit.text:= st; st:= 'Batch Fund. Search'; PerformAutocorrelationFundamentalSearch (st, increment); loc st:= loc st + '*'; ThesisForm.BatchEdit.text:= loc st; Application.ProcessMessages; ThesisForm.OffsetStatusEdit.text:= 'Beginning Offset Search'; Application.ProcessMessages; val (ThesisForm.OffsetincrementEdit.text, increment, code); if (code <> 0) or (increment < 0.0) then increment:= 1.0; str (increment:8:2, st); ThesisForm.OffsetincrementEdit.text:= st; PerformOffsetSearch(increment); val (ThesisForm.ErrorEdit.text, Induced, code); val (ThesisForm.AFreqEdit.Text, Freq, code); val (ThesisForm.AFreqCorrEdit.text, FCorr, code); val (ThesisForm.OffsetEdit.text, off, code); val (ThesisForm.OffsetCorrEdit.text, OCorr, code); Writeln (BatchOuts, induced:40, ', Freq:40, ', FCorr:40, ', off:40, ', 0Corr:40,' '); ThesisForm.Timerl.Enabled:= true end end; (*****************************************************************************) (* GUI Support Routines *) (*****************************************************************************) procedure TThesisForm.InputFileButtonClick(Sender: TObject); (* Rev 02 Jan 2003 GAG *) begin FileReadinWAVfile end; (*****************************************************************************) procedure TThesisForm.FormCreate(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin 144


find sqrt eps; windowSelect:= welch; error:= true; (* no data at start of processing *l loc error:= false; nn:;; 2048; update nns; datalen:= 0; low:= 92.5; high:= 185.0; verbose:= false; diag:= false; running:= false end; (*****************************************************************************) procedure TThesisForm.AudioSarnpleButtonClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin GetAudioSarnplesTask end; (*****************************************************************************) procedure TThesisForm.FFTSizeUpDownClick(Sender: TObject; Button: TUDBtnType); (* Rev 23 Dec 2002 GAG *) var i: longint; st: mystringtype; begin nn:= 1; for i:= 1 to FFTSizeUpDown.Position do nn:= nn + nn; update_nns; str (nn:5, st); FFTSizeEdit.text:= st end; (*****************************************************************************) procedure TThesisForm.FakeDataButtonClick(Sender: TObject); (* Rev 23 Dec 2002 GAG "') begin InputStatusEdit.text:= 'Generating Fake Data'; Application.ProcessMessages; dofakedata ('Thesisl.wav'l end; (*****************************************************************************) procedure TThesisForm.FundTestButtonClick(Sender: TObject); ("' Rev 23 Dec 2002 GAG "'l var freq: extended; code: integer; st: mystringtype; 145


begin FundamentalStatusEdit.text:= 'Beginning 1 Pass Fund. Search'; Application.ProcessMessages; val (ThesisForm.FakeFreqEdit.Text, freq, code); str (freq:8:2, st); ThesisFor.m.FakeFreqEdit.text:= st; ThesisForm.SFreqEdit.text:= st; Perfor.mFundamentalSearch (0.0); if error then ThesisFor.m.FundamentalStatusEdit.text:= errmsg else FundamentalStatusEdit.text:= '1 Pass Fund. Search Complete' end; (*****************************************************************************) procedure TThesisFor.m.SopranoRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin LowEdit.text:= '246.94'; HighEdit.text:= '438.88'; //'329.63' end; (*****************************************************************************) procedure TThesisFor.m.MezzoSopranoRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin LowEdit.text:= '207.65'; HighEdit.text:= '415.30'; //'369.99' end; {*****************************************************************************) procedure TThesisForm.ContraltoRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin LowEdit.text:= '185.00'; HighEdit.text:= '370.00'; //'415.30' end; (*****************************************************************************) procedure TThesisFor.m.TenorRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin LowEdit.text:= '146.83'; HighEdit.text:= '293.66'; //'440.00' end; (*****************************************************************************) procedure TThesisForm.BaritoneRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) 146


begin LowEdit.text:= '92.50'; HighEdit.text:= '185.00'; //'293.66'; end; (*****************************************************************************) procedure TThesisForm.BassRadioC1ick(Sender: TObject); (* Rev 23 Dec 2002 GAG *l begin LowEdit.text:= '87.31'; HighEdit.text:= '174.62'; //'293.66' end; (*****************************************************************************) procedure TThesisForm.FundamentalSearchButtonClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) var st: mystringtype; code: integer; temp: extended; begin FundamentalStatusEdit.text:= 'Beginning Fundamental Search'; Application.ProcessMessages; val (ThesisForm.HighEdit.text, temp, code); if (code <> 0) or (temp < 60) or (temp > 500) then High:= 185.0 else High:= temp; val (ThesisForm.LowEdit.text, temp, code); if (code <> 0) or (temp < 60) or (temp > 500) then Low:= 92.50 else Low:= temp; if high <= low then begin High:= 185.00; Low:= 92.50 end; str (High:7:2, st); ThesisForm.HighEdit.text:= st; str (Low:7:2, st); ThesisForm.LowEdit.text:= st; val (SFreqincrementEdit.text, increment, code); if (code <> 0) or (increment < 0.0) then increment:= 1.0; str (increment:8:2, st); SFreqincrementEdit.text:= st; PerformFundamentalSearch (increment); if error then 147


ThesisForm.FundamentalStatusEdit.text:= errmsg else ThesisForm.FundamentalStatusEdit.text:= 'Fundamental Search Complete' end; (*****************************************************************************) procedure TThesisForm.ImproveOffsetEstimateButtonClick(Sender: TObject); (* Rev 24 Dec 2002 GAG *) var st: mystringtype; begin Gradient_Method; str (loc max offset:B:2, st); ImproveOffsetEdit.Text:= st end; (*****************************************************************************) procedure TThesisForm.ParzenRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin WindowSelect:= Parzen end; (*****************************************************************************) procedure TThesisForm.HannRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin WindowSelect:= Hann end; (*****************************************************************************) procedure TThesisForm.WelchRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin WindowSelect:= Welch end; (*****************************************************************************) procedure TThesisForm.RectangularRadioClick(Sender: TObject); (* Rev 23 Dec 2002 GAG *) begin WindowSelect:= Rectangular end; (*****************************************************************************) procedure TThesisFor.m.OffsetSearchButtonClick(Sender: TObject); var increment: extended; code: integer; st: mystringtype; 148


begin OffsetStatusEdit.text:= 'Beginning Offset Search'; Application.ProcessMessages; val (OffsetincrementEdit.text, increment, code); if (code <> 0) or (increment < 0.0) then increment:= 1.0; str (increment:8:2, st); OffsetincrementEdit.text:= st; PerformOffsetSearch(increment) end; (*****************************************************************************) procedure TThesisForm.OffTestButtonClick(Sender: TObject); begin OffsetStatusEdit.text:= 'Beginning 1 Pass Off. Search'; Application.ProcessMessages; PerformOffsetSearch (0.0) end; (*****************************************************************************) procedure TThesisForm.FundAutoSearchButtonClick(Sender: TObject); var st: mystringtype; code: integer; begin st:= 'Start Fund. Search by Autocorrelations'; val (ThesisForm.AFreqlncrementEdit.text, increment, code); if (code <> 0) or (increment < 0.0) then increment:= 1.0; str (increment:8:2, st); ThesisForm.AFreqincrementEdit.text:= st; PerformAutocorrelationFundamentalSearch (st, increment) end; (*****************************************************************************) procedure TThesisForm.VerboseCheckClick(Sender: TObject); begin if VerboseCheck.Checked then Verbose:= true else Verbose:= false end; (*****************************************************************************) procedure TThesisForm.DiagCheckClick(Sender: TObject); begin If DiagCheck.Checked then Diag:= true else Diag:= false 149


end; procedure TThesisForm.AFundTestButtonClick(Sender: TObject); var st: mystringtype; begin st:= 'Start 1 Pass Auto Search'; PerformAutocorrelationFundamentalSearch (st, 0.0) end; (*****************************************************************************) procedure TThesisForm.BatchButtonClick(Sender: TObject); begin if not running then StartBatchMode else StopBatchMode end; (*****************************************************************************) procedure TThesisForm.TimerlTimer(Sender: TObject); begin if running then BatchMode end; (*******************T*********************************************************) end. 150


References

1. R. D. Kent, C. Read. The Acoustic Analysis of Speech. San Diego: Singular Publishing Group, Inc., 1992.
2. A. H. Benade. Fundamentals of Musical Acoustics. New York: Oxford University Press, 1976.
3. P. F. Roe. Choral Music Education. Englewood Cliffs: Prentice-Hall, 1970.
4. B. Coffin. The Sounds of Singing. Boulder: University of Colorado, 1976.
5. R. C. Weast, ed. Handbook of Chemistry and Physics. Cleveland: The Chemical Rubber Co., 1965.
6. H. P. Hsu. Analog and Digital Communications. New York: McGraw-Hill, 1993.
7. Technical Staff, Watkins-Johnson Company. Instruction Manual for the WJ-8718 Series Receiver. Gaithersburg: Watkins-Johnson Company, 1985.
8. F. M. Gardner. Phaselock Techniques. New York: John Wiley & Sons, Inc., 1979.
9. Technical Staff, Intel Corporation. Introduction to Direct Digital Synthesis; Application Note 101. San Jose: Intel Corporation, 1991.
10. W. Burdick, E. Swartz. Elecraft K2 Transceiver Owner's Manual. Aptos: Elecraft LLC, 2000.
11. United States Government. Manual of Regulations and Procedures for Federal Radio Frequency Management. Washington, D.C.: US Government Printing Office, 2001.
12. M. J. Hinich. Detecting a Hidden Periodic Signal When Its Period Is Unknown. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, No. 5, October 1982.
13. W. H. Press et al. Numerical Recipes in C. Cambridge: Cambridge University Press, 1992.
14. E. O. Brigham. The Fast Fourier Transform and Its Applications. Englewood Cliffs: Prentice-Hall Inc., 1988.


15. R. J. Higgins. Digital Signal Processing in VLSI. Englewood Cliffs: Prentice-Hall Inc., 1990.
16. W. H. Press et al. Numerical Recipes. Cambridge: Cambridge University Press, 1986.
17. R. C. Singleton. An ALGOL Procedure for the Fast Fourier Transformation with Arbitrary Factors. CACM Collected Algorithms, Algorithm 339, Comm. ACM 11, November 1968.
18. S. D. Stearns, R. A. David. Signal Processing Algorithms. Englewood Cliffs: Prentice-Hall, Inc., 1988.
19. E. K. P. Chong, S. H. Zak. An Introduction to Optimization. New York: John Wiley and Sons, Inc., 1996.
20. G. R. Cooper, C. D. McGillem. Probabilistic Methods of Signal and System Analysis. Oxford: Oxford University Press, 1999.
21. G. P. Tolstov. Fourier Series. New York: Dover Press, 1962.
22. W. Cheney, D. Kincaid. Numerical Analysis and Computing. Pacific Grove: Brooks/Cole Publishing Co., 1999.