Citation
Digital correlator board for the IBM AT computer

Material Information

Title:
Digital correlator board for the IBM AT computer
Creator:
Meyer, David R
Publication Date:
Language:
English
Physical Description:
vii, 124 leaves : illustrations ; 29 cm

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Electrical Engineering, CU Denver
Degree Disciplines:
Electrical engineering

Subjects

Subjects / Keywords:
IBM Personal Computer AT ( lcsh )
Correlators ( lcsh )
Correlators ( fast )
IBM Personal Computer AT ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaf 64).
General Note:
Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Electrical Engineering.
Statement of Responsibility:
by David R. Meyer.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
22937101 ( OCLC )
ocm22937101
Classification:
LD1190.E54 1990m .M477 ( lcc )

Full Text
DIGITAL CORRELATOR BOARD
FOR THE
IBM AT COMPUTER
by
David R. Meyer
B.S.E.E., Lehigh University, 1984
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Electrical Engineering
1990


This thesis for the Master of Science degree by
David R. Meyer
has been approved for the
Department of
Electrical Engineering
by
Gita Alaghband
Marv Anderson
Date


Meyer, David R. (M.S., Electrical Engineering)
Digital Correlator Board for the IBM AT Computer
Thesis directed by Professor Douglas Ross
This thesis reports on a hardware design. The
purpose was to design and build dedicated hardware for
a correlation calculation that will be at least an
order of magnitude faster than a software
implementation. The design digitizes two analog
signals and stores the digital values in DRAM. The
correlation function is then performed on the stored
information. The computer is used only for high level
control and graphical display of the correlation
calculations. The major design task was to control the
DRAM and its refresh cycles and still provide
continuous sampling of the analog signals. This thesis
outlines the design, implementation and testing of the
hardware.
The form and content of this abstract are approved. I
recommend its publication.
Signed


CONTENTS
CHAPTER
I. INTRODUCTION.................................1
Theory and Application......................2
Estimation Errors...........................5
Synopsis....................................6
II. GENERAL DESIGN DISCUSSION....................7
Design Specification........................7
Design Tasks...............................11
III. FUNCTIONAL DESIGN DISCUSSION................15
Data Format and Manipulation...............15
ADC Architecture...........................20
RAM Architecture...........................24
MAC Architecture...........................27
System Controller..........................29
Address Generator..........................35
System Computer Interface..................40
Software...................................46
IV. TEST RESULTS................................48
Linearity..................................48
Autocorrelation............................51
Crosscorrelation...........................58
V. DESIGN IMPROVEMENTS..........................61
BIBLIOGRAPHY........................................64
APPENDIX
A. Schematics...................................65
B. Calibration Procedure........................79
C. Software Listing.............................81
D. DRAM Timing..................................86
E. State Machine Listing........................91
F. Integrated Circuit Specifications............97
G. Test Filter Characteristics.................116
H. Abbreviated Saicor Technical Note...........120


TABLES
Table
1. Eight Bit Conversion.........................19
2. MAC Conversion...............................19
3. I/O Mapping..................................43
4. Amplitude Linearity..........................49


FIGURES
Figure
1. Conceptual Block Diagram......................9
2. Final Block Diagram..........................12
3. ADC Schematic................................21
4. MAC Preload Timing...........................28
5. State Machine Block Diagram..................31
6. Address Generator Block Diagram..............36
7. Address Decode Block Diagram.................39
8. Address Load Timing..........................41
9. I/O Bus Signals..............................42
10. Square Wave Autocorrelation..................50
11. Sine Wave Autocorrelation....................52
12. Wideband Noise Autocorrelation...............53
13. Bandlimited Noise Autocorrelation............55
14. Simple LPF Autocorrelation...................56
15. Simple LPF Autocorrelation Data Points.......57
16. Maximally Flat Crosscorrelation..............59
17. Simple R.C. Crosscorrelation.................60


CHAPTER I
INTRODUCTION
The purpose of this project is to design a DSP
expansion board for an IBM AT or compatible computer.
The expansion board will perform either an
autocorrelation or a crosscorrelation function on
sampled waveforms. There are various methods of
implementing correlators; however, they can be divided
into two main categories: real time processing, which
computes correlation coefficients as the sampling
process takes place, and post time processing, which
stores the complete waveform in memory before
proceeding with coefficient calculations. The first
method has serious limitations on the number of
coefficients calculated and the maximum sampling rate
used. Either the sampling rate is very low in order to
process a large number of coefficients between the
samples or each coefficient has its own dedicated
hardware to reduce calculation time. The second method
allows a higher sampling rate to be used since there
are no functions performed between samples. In order
to sample at 5 million bytes per second and calculate
a 1024 point correlation function on an IBM AT
expansion board, the second method is used in this
thesis.
Although the project is design intensive, a
basic understanding of correlation theory is needed in
order to evaluate the performance of the design.
Theory and Application
Correlation functions measure the time domain
similarity between identical waveforms
(autocorrelation) and between different waveforms
(crosscorrelation). Autocorrelation is mathematically
described as:
$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t-\tau)\, dt$$
Autocorrelation is therefore a point by point
multiplication of a waveform with a shifted version of
itself, followed by a weighted summation over all
time. The Fourier transform of the autocorrelation
function yields the square of the magnitude of the
Fourier transform of the processed signal. This
function is called the auto power spectrum. The
ordinate axis of the autocorrelation plot has units of
volts² and the abscissa has units of time or sampled
intervals.


There are three principal properties of
autocorrelation functions as follows:
1. $R_x(\tau) = R_x(-\tau)$
2. $R_x(0) = \text{mean square value} \ge R_x(\tau)$
3. $R_x(\infty) = (\text{average value})^2$
The significance of the first property is that phase
information is lost in the function. The second states
that the value at $\tau = 0$ represents the total signal
power, both A.C. and D.C. The third property states
that for large values of $\tau$, the function
approaches the D.C. power of the signal. Two
important applications of autocorrelation are:
1. Recovery of periodic signal in noise.
2. Statistical parameter determination.
The autocorrelation function of a periodic time signal
is a periodic function of time, and the resulting
period is the same as that of the original time
signal. This property is the foundation for
application 1. Consider the situation where an unknown
periodic signal is buried in noise, but in which a
measurement of the period is desired. If the signal is
correlated with itself, the resulting autocorrelation
function will display the desired period. The
statistical parameters of mean value, mean square
value and variance can be derived. Property 2 is the
mean square value, and property 3 is the square of the
mean. The variance parameter is expressed as the mean
square minus the square of the mean.
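
The statistical relations above can be checked numerically. The following Python sketch is illustrative only: the test signal (a square wave riding on a DC level), the record length, and the helper name autocorr are assumptions made for the example, and the finite sum is normalized by the number of terms so that the zero lag value reads directly as the mean square.

```python
def autocorr(x, lag, n):
    """Normalized finite-sum autocorrelation: (1/n) * sum of x[k] * x[k + lag]."""
    return sum(x[k] * x[k + lag] for k in range(n)) / n

# Hypothetical test signal: square wave of amplitude 2 on a DC level of 1.
period, n, max_lag = 8, 800, 16
x = [1.0 + (2.0 if (k % period) < period // 2 else -2.0)
     for k in range(n + max_lag)]

r = [autocorr(x, lag, n) for lag in range(max_lag + 1)]

mean_square = r[0]                   # property 2: value at zero lag
mean = sum(x[:n]) / n                # D.C. level of the record
variance = mean_square - mean ** 2   # mean square minus square of the mean

print("mean square:", mean_square)
print("mean:", mean)
print("variance:", variance)
print("R at one full period:", r[period])
```

For a periodic input the function repeats at the signal period, so the value at a lag of one period equals the zero lag value. This is the behavior that application 1 relies on.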
Crosscorrelation is mathematically described
as follows:
$$R_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, y(t-\tau)\, dt$$
It is therefore a point by point multiplication of a
waveform by a shifted version of a second waveform,
followed by a summation over all time. The Fourier
transform of the crosscorrelation is called the cross
power spectrum. The crosscorrelation plot has the same
units as an autocorrelation plot.
There are two main properties of
crosscorrelation as follows:
1. $R_{xy}(-\tau) = R_{yx}(\tau)$
2. $|R_{xy}(\tau)| \le \frac{1}{2}\,[R_x(0) + R_y(0)]$
The first property states that it does display
symmetry about the ordinate when x and y are
interchanged. The second property presents a useful
bounding relationship for the magnitude of the cross
function. The magnitude is never greater than the
average of the power contained in the two signals. One
of the primary applications of crosscorrelation is in
determining the delay of a signal that has been hidden
in additive noise. For example, this operation arises
in radar and sonar systems where a known signal is
transmitted and reflected from a target at some later
time. Measurement of the exact delay from the peak of
the cross function will provide ranging information.
Other useful applications are:
1. Detection of differences between two data
sequences.
2. Correction of errors in expanded code data
streams.
3. Multiplexing of data among several users.
4. Recognition of specified patterns within a
data stream.
5. Diagnosis of medical disorders.
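
The delay measurement described above for radar and sonar can be sketched in a few lines of Python. This is a minimal illustration, not part of the design: the transmitted sequence (a Barker-13 code), the echo delay, the record length, and the sinusoidal disturbance standing in for noise are all assumptions chosen for the example.

```python
import math

# Barker-13 code, used here only as a convenient known "transmitted" signal.
tx = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

true_delay = 37   # hypothetical echo delay, in samples
n = 128           # hypothetical record length, in samples

# Received record: a deterministic disturbance plus the delayed echo.
rx = [0.3 * math.sin(k) for k in range(n)]
for i, v in enumerate(tx):
    rx[true_delay + i] += v

def crosscorr(x, y, lag):
    """Finite-sum crosscorrelation: sum over k of x[k] * y[k - lag]."""
    return sum(x[k] * y[k - lag] for k in range(lag, lag + len(y)))

# Only lags for which the whole code fits inside the record are evaluated.
lags = range(n - len(tx) + 1)
r = [crosscorr(rx, tx, lag) for lag in lags]

# The lag of the crosscorrelation peak is the delay estimate.
estimated_delay = max(lags, key=lambda lag: r[lag])
print("estimated delay:", estimated_delay)
```

The peak of the cross function falls at the lag where the stored code lines up with the echo, which is the ranging information referred to in the text.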
Estimation Errors
The above theoretical summary is based upon
infinite summation time. Practical applications are
limited by cost and size of hardware to a finite
summation period. The correlation equation becomes
$$R_x(r) = \sum_{k=1}^{N} x_k\, x_{k+r}$$
Because of the limited summation time, error is
introduced. Langenthal [1] explores this error. The
normalized standard error for random noise with a
bandwidth of B is
From this equation, an error of .01 is achieved at $\tau = 0$
with a bandwidth-time product of 10000. For a sampling
rate of 2B, this implies that 20000 sums per point are
required. For the design described in this thesis, the
record length is 1048576. The corresponding error at
$\tau = 0$, for a sampling rate twice the bandwidth of the
sampled noise, is .0014.
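
The two error figures quoted above can be reproduced with a short calculation. The error equation itself is not legible in this copy, so the sketch below ASSUMES that at zero lag it reduces to one over the square root of the bandwidth-time product. That form is an assumption, chosen because it matches both quoted numbers; the function name is likewise hypothetical.

```python
import math

def error_at_zero_lag(bt_product):
    """Normalized standard error at zero lag, ASSUMING the form 1/sqrt(B*T)."""
    return 1.0 / math.sqrt(bt_product)

# Bandwidth-time product of 10000 quoted in the text.
e1 = error_at_zero_lag(10_000)

# Sampling at 2B for N samples gives T = N / (2B), so B*T = N / 2.
record_length = 1_048_576
sums_for_e1 = 2 * 10_000
e2 = error_at_zero_lag(record_length / 2)

print("error for BT = 10000:", e1)
print("sums per point for that error:", sums_for_e1)
print("error for the 1048576 sample record:", round(e2, 4))
```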
Synopsis
The rest of this thesis details the design and
testing of a digital correlator. Chapter 2 explores
the several possible system computer interfaces and
describes the block architecture and design flow of
the method chosen. Chapter 3 describes each of the
main building blocks. For each block, the functions
that are performed will be documented along with
functional specifics of their implementation. In
chapter 4, the results and interpretation of system
test will be documented. And finally, chapter 5
suggests design improvements.


CHAPTER II
GENERAL DESIGN DISCUSSION
Most designs have various methods in which
they can be implemented. Certain methods are
eliminated from consideration because they do not
satisfy the general criteria of the overall design,
i.e., cost or size. In order to satisfy the end user
of the design, all the pertinent criteria need to be
specified before the detailed stage of the design
process can begin. The first effort in a design is
establishing a good block diagram and specification
that satisfies the desired functionality. This chapter
will present the original block diagram that was given
to me to implement along with desired specifications.
A discussion follows exploring the advantages and
disadvantages of a hardware based design versus a
software based design. A final block diagram is then
presented along with the design flow of the project.
Design Specification
The intent of this design was to implement a
hardware based correlation function on an extender
board that would insert into an IBM AT or compatible
computer's I/O slot. The design goals are as follows:
1. 1024 point correlation function in less
than 5 minutes. This is to be divided equally
between positive and negative coefficients.
2. Each correlation coefficient will have at
least 1 million product terms; therefore, 1
million bytes of storage is necessary for each
channel.
3. Analog to Digital converter (ADC) with 8
bit resolution.
4. Variable sample rate at binary increments
with maximum equal to 5 million bytes per
second.
5. Interface with IBM AT using Turbo Basic
software.
Because of size limitations, the following
considerations are not included in this design.
1. Automatic gain control at the input of the
ADC.
2. Antialiasing filter before the ADC.
3. Analog multiplexer to allow easy setup of
auto- or crosscorrelation functions.
It is left to the user to properly operate the
device. Figure 1 is the conceptual block diagram of
the design.


Figure 1. Conceptual Block Diagram


The ADC digitizes the data to be stored in the
RAM. The address counter provides addressing of the
RAM during write and read modes. The program timer
enables a variable sample rate from a fixed
oscillator. The MAC chip is a dedicated multiply and
accumulate integrated circuit that provides the
correlation coefficient to the computer.
An obvious question arises. Why do all of the
above functions need to be implemented in dedicated
hardware as opposed to a partial software
implementation? The answer is time and available
memory. An AT computer can only address 640 thousand
bytes of memory. This design needs 2 million bytes of
memory. A software implementation of the RAM with
existing computer hardware would store the data on a
hard disk, drastically increasing access time. This
would affect the coefficient calculation time along
with reducing the maximum sampling rate that could be
used. The sampling rate would also be reduced to allow
for overhead functions such as memory refresh cycles
and various interrupts. A software implementation of
just the MAC would still drastically reduce the
performance of the design. The multiply function is
one of the longest instructions for a microprocessor
to perform. The Intel 80286 used in the IBM AT takes
16 clock cycles to complete a multiply and 7 cycles
for an addition. If acquisition of data from the
extender board is done by memory cycles, then the MAC
function would take 23 cycles not including overhead
functions. On a 12 MHz machine, each cycle is 166 ns.
The MAC time would be 3.8 us. This is 19 times greater
than the 200 ns MAC time of dedicated hardware used in
this design. The 1024 point correlation calculation
would take at least 66 minutes as opposed to the 3.5
minutes that this design takes.
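
The timing comparison above is simple arithmetic and can be reproduced as follows. The sketch takes the cycle counts, the 166 ns cycle time, and the 200 ns hardware MAC time from the text as given; the variable names are illustrative, and overhead functions are ignored, as in the text.

```python
cycle_ns = 166                  # clock period quoted in the text
cycles_per_mac = 16 + 7         # multiply plus add on the 80286
software_mac_ns = cycles_per_mac * cycle_ns
hardware_mac_ns = 200           # dedicated MAC time quoted in the text

points = 1024
products_per_point = 2 ** 20    # "at least 1 million product terms"
total_macs = points * products_per_point

software_minutes = total_macs * software_mac_ns / 1e9 / 60
hardware_minutes = total_macs * hardware_mac_ns / 1e9 / 60
speedup = software_mac_ns / hardware_mac_ns

print(f"software MAC time: {software_mac_ns / 1000:.1f} us")
print(f"speed ratio: {speedup:.0f}")
print(f"software total: {software_minutes:.0f} minutes")
print(f"hardware total: {hardware_minutes:.1f} minutes")
```

With unrounded products, the software estimate comes out near 68 minutes and the hardware figure near 3.6 minutes, in line with the "at least 66 minutes" and 3.5 minutes quoted above.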
Design Tasks
Figure 2 is a block diagram of the design. It
exemplifies the partitioning of tasks between the
computer and the extender board. It also points out a
major change of scope between the conceptual block
diagram and the final design. DRAM was used instead of
SRAM. At the time of design, 2 million bytes of SRAM
was at least twice as expensive as DRAM and the
availability was on allocation only. Also, the
footprint of DRAM is smaller than SRAM due to the
multiplexing of the row and column address lines. The
breakthrough in this design was the ability to provide
refresh of the DRAM without sacrificing sampling rate
or calculation rate. Note that this could only be done
with dedicated hardware and not with a general
computer architecture.


Figure 2. Final Block Diagram


The successful design and implementation of a complex
design requires that the task be divided into small,
manageable assignments. This design was divided as
follows:
1. Data format and manipulation from ADC to
sequential file in computer.
2. ADC architecture.
3. RAM architecture.
4. MAC architecture.
5. System control of ADC to RAM and RAM to MAC
interfaces.
6. Address generation and control.
7. ALTERA gate array fit determination and
design.
8. System to computer interface function and
timing.
9. Software considerations.
10. Timing, loading, and signal transmission
requirements.
11. Mentor Logic Automation (LAI) simulation
of gate array.
12. Mentor LAI simulation of complete
digital portion of design.
13. System test.
Items 1 through 10 and 13 are covered in chapters 3
and 4. I had the advantage of a MENTOR GRAPHICS
workstation with LAI library for simulation. With
these two tools, I was able to completely simulate my
design before building the prototype. Simulation
proved to be invaluable in that I encountered no
design problems during system test.


CHAPTER III
FUNCTIONAL DESIGN DISCUSSION
This section will discuss each of the main
building blocks of the design. For each block
discussed, the functions that are performed will be
documented along with specifics of their
implementation. The discussion of each block assumes
knowledge of all other blocks.
Data Format and Manipulation
A method of handling and formatting the data
needed to be devised. The output data format of the
ADC is unsigned integer notation, the MAC chip
processes two's complement, and the computer uses
double precision floating point. The output of the ADC
ranges from 00H to FFH with 00H representing the most
negative voltage, FFH representing the most positive
voltage and 80H representing ground. The most
efficient data format to present to the MAC chip is
two's complement. In this mode, subtraction due to
negative voltages is accomplished without overhead
circuitry. The overhead circuitry would have to detect
a negative voltage representation and activate the
subtraction mode of the MAC. In order to retain as
much accuracy as possible, the data is stored and
manipulated within the computer in double precision
floating point format.
The conversion between ADC and MAC is easily
accomplished by inverting the MSB of each ADC output
before it is presented to the MAC. The following
mapping takes place:
1. 00H to 80H
2. 80H to 00H
3. FFH to 7FH
Complete analysis of this transformation proves that
it is linear between the three breakpoints listed
above. This conversion can be accomplished before or
after the DRAM. Timing analysis between the ADC/DRAM
and DRAM/MAC dictated that the inversion be performed
before the DRAM. Reference the 74AS04 device UX73 in
appendix A.
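
The MSB inversion can be illustrated in software. The Python sketch below is only a model of what the inverter does in hardware; the function name is hypothetical.

```python
def adc_to_twos_complement(code):
    """Invert bit 7 of an offset-binary ADC code and read the byte as signed."""
    flipped = (code ^ 0x80) & 0xFF
    return flipped - 256 if flipped & 0x80 else flipped

# The three breakpoints listed in the text.
for code in (0x00, 0x80, 0xFF):
    stored = (code ^ 0x80) & 0xFF
    print(f"{code:02X}H -> {stored:02X}H = {adc_to_twos_complement(code)}")

# Linearity: every ADC code maps to (code - 128).
is_linear = all(adc_to_twos_complement(c) == c - 128 for c in range(256))
print("linear over all 256 codes:", is_linear)
```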
The MAC chip consists of two 16 bit input
ports and a 35 bit output port. The 8 bit output of
the DRAM needs to be expanded to a 16 bit word. This
is easily accomplished in two's complement by
extending data bit 7 into the upper byte. This means
that bit 7 from the DRAM is applied to bits 7 through
15 of the MAC input port. An analysis of two's
complement would prove that FFH is equivalent to FFFFH
and 0FH is equivalent to 000FH.
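
The sign extension amounts to copying bit 7 into the upper byte. A minimal Python model of that wiring, with a hypothetical function name, is:

```python
def sign_extend_8_to_16(byte):
    """Copy bit 7 of an 8 bit value into bits 8 through 15."""
    upper = 0xFF00 if byte & 0x80 else 0x0000
    return upper | (byte & 0xFF)

# The two examples from the text.
for value in (0xFF, 0x0F):
    print(f"{value:02X}H -> {sign_extend_8_to_16(value):04X}H")
```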
The accumulator needs to be loaded into the
computer. Because of the size of the accumulator, data
only needs to be transferred once for each coefficient
calculated. The MAC accumulator has just enough
storage so that it will never overflow. The range of
the ADC is 127 to -128. The worst case coefficient
that would need to be passed is therefore
$(128)(128)(2^{20}) = 1.7179869 \times 10^{10}$
The 35 bit two's complement range is $2^{34}-1$ to $-2^{34}$
or +/- $1.71798692 \times 10^{10}$. The whole accumulator
therefore needs to be loaded into the computer.
In order to squeeze 35 bits through an eight
bit data bus, five I/O reads need to be executed. The
IBM AT does have a 16 bit data bus available, but
TBASIC software limits the I/O data acquisition to 8
bits. This is not a concern since the read time to
calculation time ratio is at least .01. TBASIC
represents integers in a 16 bit, 2 byte, two's
complement format. The read process needs to read a
byte and place it in the appropriate memory location
assigned to an integer. The integers then need to be
processed into a double precision number.
Four integers are cleared and assigned to the
read process. The output of the MAC consists of a
least significant product (LSP), a most significant
product (MSP), and an extended product (XTP). The LSP
and MSP are two bytes wide, lower byte(L) and upper
byte(U), and the XTP is 3 bits. A read of the XTP
brings in 8 bits so the upper 5 bits need to be masked
out. Because of the pullup resistors on the MACDATAB
data bus, the upper 5 bits are all logic high. The
read process maps the five I/O reads into four 16 bit
integers as follows:
ACCUM1% = 11111 XTP | MSP-U
ACCUM2% = 00000000  | MSP-L
ACCUM3% = 00000000  | LSP-U
ACCUM4% = 00000000  | LSP-L
The upper 5 bits of ACCUM1% are then assigned the same
value as the most significant bit of XTP. This extends
the appropriate two's complement representation into
the upper five bits. Each integer is then converted to
a double precision number and the MAC accumulator data
is converted by the following formula:
RESULT = ACCUM4# + 2^8 * ACCUM3# + 2^16 *
ACCUM2# + 2^24 * ACCUM1#
The proof of the conversion formula was done by
rigorous inspection. In Table 1 several eight bit
examples are given to introduce the mapping.


T.C. HEX   DEC     SEGMENTED DECIMAL
B7         -73     (-5*16) + 7 = -73
90         -112    (-7*16) + 0 = -112
89         -119    (-8*16) + 9 = -119
5D         93      (5*16) + 13 = 93
FC         -4      (-1*16) + 12 = -4
Table 1. Eight Bit Conversion
As an example of the conversion process used in this
design, assume the following MAC output, which
represents -2,323,666,158:
LSP-L = 12H
LSP-U = ABH
MSP-L = 7FH
MSP-U = 75H
XTP = 7H
Table 2 shows the conversion process.
T.C. HEX            SEGMENTED DECIMAL
  0012H               18
+ 00ABH * 2^8       + 171 * 2^8
+ 007FH * 2^16      + 127 * 2^16
+ FF75H * 2^24      - 139 * 2^24
                    = -2,323,666,158
Table 2. MAC Conversion
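
The read process and conversion formula can be modeled as follows. This Python sketch is not the Turbo Basic code of the design: the function names are hypothetical, and Python integers stand in for the 16 bit integers and double precision numbers used in TBASIC. The raw XTP read is given as FFH because the upper five bits are pulled high.

```python
def to_signed16(word):
    """Interpret a 16 bit pattern as a two's complement integer."""
    return word - 0x10000 if word & 0x8000 else word

def mac_result(xtp, msp_u, msp_l, lsp_u, lsp_l):
    """Combine five 8 bit I/O reads into the signed 35 bit accumulator value."""
    xtp &= 0x07              # mask out the five pulled-up bits
    if xtp & 0x04:           # most significant bit of XTP is the sign
        xtp |= 0xF8          # extend the sign into the upper five bits
    accum1 = to_signed16((xtp << 8) | msp_u)
    accum2, accum3, accum4 = msp_l, lsp_u, lsp_l
    return accum4 + 2**8 * accum3 + 2**16 * accum2 + 2**24 * accum1

# The example from the text: XTP = 7H, MSP = 757FH, LSP = AB12H.
result = mac_result(xtp=0xFF, msp_u=0x75, msp_l=0x7F, lsp_u=0xAB, lsp_l=0x12)
print(result)
```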


ADC Architecture
Besides the original ADC requirements, power
consumption and PWB manufacturing were a major
concern. In order to use the voltages provided by the
computer, I was constrained to +/- 12 v. and +/- 5 v.
There is also a very limited current available from
all the supplies except +5v. +12 v. has 700ma, -12 v.
has 300ma and -5 v. has 300ma. With respect to the
manufacturing concern, a 5MHz analog operation would
be difficult with wire wrap boards. A PWB would need
to be used for the analog circuitry in order to have a
reasonable chance at success. The above concerns were
met by using a Motorola MC10319 and its associated
evaluation board. Figure 3 shows the schematic of the
ADC board. The IC itself is rated for operation at a
sampling rate of 10 million bytes per second. Since
the ADC is not my design, I will not go into great
detail about its operation. One important point from a
user's standpoint is its input amplitude range. The
voltage references provide a full range of +/- 1 v. at
the ADC. For an eight bit converter, this makes the
LSB equal to 7.8 mV. The nominal gain of the input
buffer is 1.5. This makes the input amplitude range
+/- 660 mV.

Figure 3. ADC Schematic (for 24 pin DIP or 28 pin SOIC package)

Also note that the input is AC coupled
with a termination resistance of 75 ohms. The low end
cutoff frequency is approximately 20 Hz.
The cutoff frequency will distort the
correlation spectrum. If the distortion can not be
tolerated, the blocking capacitor can be removed. Care
must be taken to ensure that the source is not
affected by dc coupling. The distortion from the
blocking capacitor can be estimated for bandlimited
noise. The noise spectrum is the same as a low pass
filter spectrum. The blocking capacitor adds a high
pass filter section to the spectrum. Since the Fourier
transform of the autocorrelation function is pi times
the spectral density function, comparing the inverse
Fourier transforms of both spectra yields the
distortion of the autocorrelation function. The
spectral density function is equivalent to
$\frac{1}{\pi} F(\omega) F(-\omega)$. The spectrum of bandlimited noise is
given by:
$$F(\omega) = \frac{1}{a + j\omega}$$
The spectral density function is therefore:
$$\pi S(\omega) = \frac{1}{a + j\omega} \cdot \frac{1}{a - j\omega}$$
The inverse Fourier transform yields:
$$R_{xx}(\tau) = \frac{1}{2a} e^{-a\tau} u(\tau) + \frac{1}{2a} e^{a\tau} u(-\tau)$$
The spectrum of bandlimited noise with a high pass
cutoff is:
$$F(\omega) = \frac{1}{a + j\omega} \cdot \frac{j\omega}{b + j\omega}$$
The spectral density function is therefore:
$$\pi S(\omega) = \frac{1}{a + j\omega} \cdot \frac{j\omega}{b + j\omega} \cdot \frac{1}{a - j\omega} \cdot \frac{-j\omega}{b - j\omega}$$
The inverse Fourier transform yields:
$$R_{xx}(\tau) = \frac{a}{2(a^2 - b^2)} \left[ e^{-a\tau} u(\tau) + e^{a\tau} u(-\tau) \right] - \frac{b}{2(a^2 - b^2)} \left[ e^{-b\tau} u(\tau) + e^{b\tau} u(-\tau) \right]$$
Analyzing the positive abscissa of the correlation
function and assuming $b \ll a$ yields:
$$R_{xx}(\tau) \approx \frac{1}{2a} e^{-a\tau} - \frac{b}{2a^2} e^{-b\tau} \quad \text{for } \tau > 0$$
which yields a correction coefficient of
$$\frac{b}{2a^2}$$
with a long delay constant.
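
The size of the distortion can be checked numerically. The following Python sketch evaluates the expressions above for one set of corner frequencies. The 100 kHz noise bandwidth is a hypothetical value chosen for illustration; the 20 Hz high pass corner is the low end cutoff quoted earlier. The function names are assumptions, not part of the design.

```python
import math

def rxx_lowpass(tau, a):
    """Autocorrelation of bandlimited noise with no blocking capacitor."""
    return math.exp(-a * abs(tau)) / (2 * a)

def rxx_exact(tau, a, b):
    """Autocorrelation with the high pass section included."""
    t = abs(tau)
    k = 2 * (a * a - b * b)
    return (a / k) * math.exp(-a * t) - (b / k) * math.exp(-b * t)

def rxx_approx(tau, a, b):
    """Approximation for b much smaller than a."""
    t = abs(tau)
    return math.exp(-a * t) / (2 * a) - b / (2 * a * a) * math.exp(-b * t)

a = 2 * math.pi * 100e3   # hypothetical low pass corner, rad/s
b = 2 * math.pi * 20      # high pass corner from the 20 Hz cutoff, rad/s

correction = b / (2 * a * a)
relative_size = correction / rxx_lowpass(0.0, a)

for tau in (0.0, 1.0 / a):
    print(tau, rxx_exact(tau, a, b), rxx_approx(tau, a, b))
print("correction relative to zero lag value:", relative_size)
```

For these values the correction term is b/a of the zero lag value, about 0.02 percent, but it decays with the much longer time constant 1/b.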


RAM Architecture
The original intent of this design was to
implement the memory block with static RAM. Due to
pricing, availability, and footprint differences
between SRAM and DRAM, this design was implemented
with DRAM. The use of DRAM dictated a more complex
design with respect to the control signals needed and
their associated timing. The three main categories of
control are write, read, and idle.
The cost of DRAM and associated control
circuitry used in this design is approximately $500.
The leadtime was off the shelf and the footprint of
the DRAM and control circuitry is 13 square inches. To
implement the design in SRAM would cost from $2000 to
$4500 with leadtimes from 3 to 6 months. The footprint
of the SRAM design would occupy from 12 to 40 square
inches. The footprint of DRAM is smaller than SRAM
because the number of address pins is halved due to
the multiplexing of row and column address lines onto
the same pins. The multiplexing requires additional
circuitry to separate the address generator into row
and column; however, the increased footprint due to the
mux devices is outweighed by the reduction in memory
footprint. The disadvantage of DRAM is its increased
complexity of design. Whereas SRAM has only write and
read control signals with simple timing parameters,
DRAM has RAS, CAS, and WRITE signals with complex
timing. There are also refresh requirements that need
to be met to prevent loss of data.
A major accomplishment of this design is the
ability to fulfill refresh requirements without
interrupting the continuous acquisition of data.
Typical computer architectures that use DRAM have
refresh cycles that interrupt the normal process. This
design takes advantage of the built in row address
generator inside of the DRAM and also the automatic
refresh of any row being read or written to. The
multiplexing of the address generator is designed such
that subsequent write or read cycles of the DRAM occur
on subsequent row addresses, i.e., the row address is
least significant and the column address is most
significant. When the sampling rate is fast enough,
all rows are refreshed within the 8 ms refresh cycle
specification. This is called a fast sampling rate
write operation or rmwritea operation. Since read
(rmread) operation timing is constant, the refresh
cycle specification is always met when a coefficient
is being calculated. The idle operation timing
continuously forces the CAS before RAS refresh timing
of the DRAM. This timing uses the internal address
generator of the DRAM to refresh a row. Subsequent CAS
before RAS refresh cycles increment the row address
counter so 512 of these cycles would completely
refresh the DRAM. During slow sampling rate writes
(rmwriteb operation), hidden refresh is executed
continuously between the sample clock rising edge and
falling edge. Hidden refresh toggles the RAS line
after a write cycle is performed. This has the same
effect as a CAS before RAS except that the DRAM data
output is not tristated.
Since there are two modes of refresh that
access different row address generators, refresh
across operation boundaries needs to be evaluated. The
worst case scenario is that one operation stops at
address x and the next operation starts at x+2. If
both operations had full refresh times of 8 ms, then
address x+1 might not get refreshed for 16 ms. For
this reason, I set all full refresh times to 4 ms
maximum. With this in mind, the longest sampling period
that can be used during a rmwritea operation is
4 ms ÷ 512 rows = 7.812 us.
The sampling period for rmwriteb operation ranges from
7.812 us to the longest sample period allowed by the
8254 interval timer, which is 6.5536 ms. At 7.812 us,
there are 19 hidden refresh cycles per sampling
period. Each hidden refresh cycle is 200 ns. A full
512 row refresh occurs every 203 us. At the longest
sample period, there is a refresh gap of 3.3 ms after
which a 512 row hidden refresh occurs in
200 ns × 512 rows = 102 us.
During a rmread operation, the 512 row auto refresh
time is 102 us and during an idle operation the 512
row CAS before RAS time is also 102 us. Appendix D
contains the timing diagrams for the control of the
DRAM in the various modes discussed.
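
The refresh figures in this section follow from a few divisions, reproduced in the Python sketch below. Two points are assumptions rather than statements from the text: the 8254 is taken to count a 10 MHz clock (which matches the 6.5536 ms limit for a 16 bit counter), and the sample clock is taken to have a 50 percent duty cycle, so that hidden refresh runs for half of each sample period.

```python
ROWS = 512
REFRESH_BUDGET_NS = 4_000_000   # 4 ms full refresh limit chosen in the text
HIDDEN_REFRESH_NS = 200         # one hidden refresh cycle
TIMER_CLOCK_NS = 100            # ASSUMED 10 MHz clock into the 8254
TIMER_MAX_COUNT = 65536         # 16 bit counter

# Boundary between fast (rmwritea) and slow (rmwriteb) sampling.
boundary_period_ns = REFRESH_BUDGET_NS / ROWS

# Longest sample period available from the interval timer.
longest_period_ns = TIMER_CLOCK_NS * TIMER_MAX_COUNT

# Hidden refresh cycles that fit between rising and falling clock edges
# at the boundary period (ASSUMES a 50 percent duty cycle).
cycles_per_sample = int((boundary_period_ns / 2) // HIDDEN_REFRESH_NS)

# Time for 512 back-to-back refresh cycles, and the gap at the longest period.
full_refresh_ns = ROWS * HIDDEN_REFRESH_NS
refresh_gap_ns = longest_period_ns / 2

print(f"boundary period: {boundary_period_ns / 1000} us")
print(f"longest period: {longest_period_ns / 1e6} ms")
print(f"hidden refresh cycles per sample: {cycles_per_sample}")
print(f"512 row refresh: {full_refresh_ns / 1000} us")
print(f"refresh gap: {refresh_gap_ns / 1e6:.1f} ms")
```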
One last requirement of the DRAM that needed
to be met was initialization timing. Eight full CAS
before RAS cycles need to be done before normal
operation occurs. The software handles this by
commanding the idle state and staying in idle long
enough to meet this specification before any write or
read operation starts.
MAC Architecture
The IDT 7210 was selected because of its speed
and large accumulator. The operation of the MAC
consists of clearing the accumulator before the
calculation of a coefficient, writing data from the
DRAM into the X and Y registers, and reading data from
the accumulator onto the data bus so that the computer
can access the accumulator.


A preload of the accumulator is commanded by
the computer via the DECPRLD0 signal output of the
74ALS138, UJ25, decode chip. This command is issued
only when the system is in idle mode so that the
outputs of the DRAM are tristated. The DRAM outputs
are tristated when a CAS before RAS refresh cycle is
being performed. When the PREL input of the MAC is
high, the accumulator is loaded at the rising edge of
the CLKP input signal. Due to the pullup resistors on
the MACDATAB bus and the tristated DRAM, the
accumulator is loaded with the two's complement
equivalent of -1. This offset is nullified in the
software by adding 1 to the coefficient calculation
result. The circuitry to generate PRLDSB1 and MACOUCLK
are inside the ALTERA gate array. Figure 4 shows the
timing of a MAC preload.
Figure 4. MAC Preload Timing (signals shown: 10MHZ, IOW0, DECPRLD0, PRLDSB1, MACOUCLK)
Because pins 8 through 24 of the MAC chip serve
multiple purposes, a 74LS244 buffer, UT35, was needed
to tristate the drive to pins 17 through 24 when the
system is not reading from the DRAM to the MAC. The
tristate control signal is IDLE1. The MAC accumulator
is only read when the system is idle, i.e. a write to
or read from the DRAM is not being performed.
The accumulator data is byte selected and
driven onto the data bus via MACRGLSB0, MACRGEN0,
ENLSW0, ENMSW0, and ENXTP0 signals. All the signals
are activated by decode of the I/O address lines.
MACRGEN0 is activated whenever the computer wants to
access the accumulator. The ENxxx0 signals select
which word of the accumulator to put onto the MACDATAB
bus. The MACRGLSB0 selects which byte of the
accumulator word is driven onto the data bus. When the
XTP is accessed, the LSB is always driven onto the
data bus.
System Controller
The majority of the hardware implemented
control functions reside in the SYSTEM controller
which controls the DRAM, the ADC and the MAC chip. The
controller was implemented through a state machine.
This proved to be a very organized method of control.
This section will discuss the various states and I/O
of the state machine, implementation of the state
machine, and specifics of timing.


A block diagram of the state machine is shown
in figure 5. Appendix E contains the state definition
table and the state transition table. Appendix A
contains the Altera gate array schematics where the
state machine is implemented. Appendix D shows the
output timing of the state machine. These appendices
will be referenced throughout this discussion.
The CONVCLK output signal controls the
sampling of the ADC. This signal has the same period
as the sample clock from the interval timer. The state
machine synchronizes the CONVCLK to the DRAM timing so
that direct data acquisition between the ADC and the
DRAM occurs. The CONVCLK signal is driven to the ADC
through a 74ALS244 gate and terminated on each ADC.
The MACINCLK and MACOUCLK output signals are
used to control the MAC chip when data is read from
the DRAM into the MAC. MACOUCLK is derived from
MACINCLK through delay and inversion. It is not a
direct output of the state machine. MACINCLK clocks
the DRAM data into the MAC's x and y input registers.
MACOUCLK clocks the result of a multiplication and
addition into the accumulator. It is also used to
preload the accumulator before a coefficient is
calculated. An external delay is added to MACINCLK to
align it with DRAM control signals so that a direct
read from the DRAM to the MAC occurs.


Figure 5. State Machine Block Diagram
(Block diagram of the SYSTEM CONTROLLER STATE MACHINE.
Inputs: BIOW0, DECCLR0, DECWR0, DECRD0, TCI, SAMCLK, TRIGGER, TRIGMODE, RFSHMD, DECIDLE0, 10MHZ.
Input functions: asynchronous clear, initiate write cycle, initiate read cycle, terminate read/write, write cycle period, mode inputs, tristate command, machine clock.
Outputs: MACOUCLK, MACINCLK (MAC control); CONVCLK (ADC control); CNTCLKA, CNTCLKB, RCADDRA, RCADDRB (address generator control); RASA0, RASB0, CASA0, CASB0, WRITE0 (DRAM control); IDLE1 (status).)


The IDLE1 output signal is the only state
machine status signal. When at a logic high, it signifies
that a write to or read from the DRAM is not
occurring. This signal is output in a tristate form
and a constantly active form. The tristate output is
connected to the data bus so that the computer can
read the status. The status is put on the bus when an
active DECIDLE0 input is encountered. The other idle
output controls the 74LS244 tristate buffer UT35.
The rest of the outputs of the state machine
control the DRAM and its address counter. Because of
the product term limitation of the ALTERA device, most
of the DRAM control signals had to be divided into two
signals and recombined outside the gate array. The
signals are not recombined inside the array due to
large feedback path delay. The CNTCLKA and CNTCLKB
signals clock the DRAM address counter. The signals
are "ored" together outside of the gate array. CNTCLKA
is active during idle, read, and writea operation.
CNTCLKB is active during writeb. The multiplexing of
the row and column addresses is controlled by RCADDRA
and RCADDRB. These signals are "ored" together outside
of the array and delayed to produce ROWCOL0. RCADDRA
is active during idle, read, and writea while RCADDRB
is active during writeb. The DRAM is controlled by
RAS, CAS, and WRITE functions. RAS is the row address
select signal. RASA0 and RASB0 are "anded" together
outside of the array. CAS is the column address
select. CASA0 and CASB0 are also "anded" together.
RASA0 and CASA0 are active during idle and read
operations.
Idle, rmread, rmwritea, and rmwriteb are the
four operations of the state machine. The timing of
these operations is shown in appendix D. The idle
operation performs a CAS before RAS refresh of the
DRAM. No active control of the MAC or the ADC is
performed during an idle operation. The rmwritea
operation is the auto refresh write operation. The ADC
is sampling data and the DRAM is conducting normal
write cycle timing. The rmwriteb operation is the
hidden refresh write mode. The ADC is again sampling
data and the DRAM is conducting hidden refresh write
cycle timing. In the previous two operations, the MAC
chip is not actively controlled. The rmread operation
is the auto refresh read mode. The DRAM is conducting
normal read cycle timing. The MAC chip is reading data
into its x and y registers, processing the data, and
clocking the result into the accumulator. This
operation does not actively control the ADC.
The inputs of the state machine are shown on
sheets 2 and 5 of the DRAM CONTROLLER schematics in
appendix A. The start of a write cycle is determined
by the TRIGGER, TRIGMODE, and DECWR0 input signals.
TRIGMODE, which comes from the computer, dictates
whether to wait for a manual trigger, i.e. TRIGGER, or
to start a write cycle immediately upon receiving a
DECWR0 signal. An a or b write cycle is determined by
the state of the RFSHMD input signal. A logic high
initiates a writea cycle. A read cycle is initiated by
a DECRD0 signal. Both write and read cycles are
terminated by an active TCI input signal. TCI is
generated by a terminal count on either DRAM address
generator. The state machine is initialized by
forcing the all zero state. This is accomplished by
the DECCLR0 input on sheet 5. Whenever the address
counters are cleared the state machine is also
cleared. As previously mentioned, SAMCLK is an input
from the interval timer counter.
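The way these inputs select among the four operations can be summarized with a small behavioral sketch. It works at the level of whole operations, not the per-clock states of appendix E, and it treats every input as a boolean meaning "asserted" regardless of electrical polarity. The function and parameter names are hypothetical, and the SAMCLK coincidence and holding latch are not modeled (a caller would keep write_strobe asserted until the cycle starts).

```python
def next_operation(current, *, write_strobe=False, read_strobe=False,
                   trigmode_auto=True, trigger=False, rfshmd=True,
                   terminal_count=False, clear=False):
    """Return the operation that follows `current` for the given inputs.

    Operations: "idle", "rmread", "rmwritea", "rmwriteb".
    """
    if clear:                                  # clear strobe forces the idle state
        return "idle"
    if current in ("rmread", "rmwritea", "rmwriteb"):
        # Read and write cycles run until a terminal count is reached.
        return "idle" if terminal_count else current
    # From idle: a write needs the strobe and, in manual mode, the trigger.
    if write_strobe and (trigmode_auto or trigger):
        return "rmwritea" if rfshmd else "rmwriteb"
    if read_strobe:
        return "rmread"
    return "idle"
```

For example, next_operation("idle", write_strobe=True, rfshmd=False) selects the hidden refresh write, while the same call in manual mode without a trigger stays in idle.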
The state machine was implemented using D
registers. Another method would have been T (toggle)
registers. The D register method has a product term
applied to any register that goes to a logic high in
each state. The T method has a product term applied to
any register that changes logic level between states.
In this design, product terms were minimized by using
the D method. The product terms are shown on sheets 3
and 4 of the DRAM CONTROLLER schematic in Appendix A.
Each product term consists of inputs that identify a
previous state and inputs that define the next state.
On sheet 5 of the DRAM CONTROLLER schematic, the
product terms are summed together and presented to the
D registers. The outputs of the D registers become the
outputs of the state machine. No formal method was
used to determine the product terms needed. The
process was done entirely by inspection to minimize
the implementation size.
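The D versus T comparison described above can be illustrated with a short counting sketch. It counts one unminimized product term per transition, following the rule stated in the text (D: the register is high in the next state; T: the register changes level). The transition list is a hypothetical 2-bit counter, not the state table of appendix E, and logic minimization could merge terms in either method.

```python
def count_product_terms(transitions, bits):
    """Count unminimized product terms per register bit for D and T registers.

    transitions: list of (present_state, next_state) integer pairs.
    Returns (d_terms, t_terms), each a list indexed by bit number.
    """
    d_terms = [0] * bits
    t_terms = [0] * bits
    for present, nxt in transitions:
        for bit in range(bits):
            mask = 1 << bit
            if nxt & mask:                 # D input must be high for this transition
                d_terms[bit] += 1
            if (present ^ nxt) & mask:     # T input must be high when the bit toggles
                t_terms[bit] += 1
    return d_terms, t_terms


# Hypothetical example: a 2-bit binary counter 0 -> 1 -> 2 -> 3 -> 0.
counter = [(0, 1), (1, 2), (2, 3), (3, 0)]
d_terms, t_terms = count_product_terms(counter, bits=2)
```

Running the same count over a real state transition table gives a quick first estimate of which register type needs fewer terms before any minimization by inspection.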
Address Generator
The address generator section consists of two
independent 20 stage counters, one for each channel of
DRAM. Figure 6 shows the block diagram of the address
generators. The output of the counter must be
multiplexed into two sections for input to the DRAM.
The two sections become the row address and column
address of the DRAM. The generators provide addressing
during read and write operations. This section will
describe the functional specifics of the read and
write operations along with the circuitry to set up the
operations.
The generators are implemented using 74LS161A
devices. They are configured in a ripple carry out
mode. Through the mux devices, 74F257, the least
significant portion of the generator becomes the row
address and the most significant becomes the column
address. Pullup resistors are connected to the output
of the muxes to provide good high level drive. Series
damping resistors are used to damp overshoot and
undershoot at the DRAM input pins. This was done since
the inputs of the DRAM are mainly capacitive and act
like an open circuit in transmission line analysis.
The most significant carry out of each generator is
reclocked through a D flip flop by CNTCLK. The outputs
of the D flip flops are then "ored" together. The
resultant signal, TCI, signifies that one of the
generators is at terminal count. A read or write
operation will then be terminated.
Figure 6. Address Generator Block Diagram
(Block diagram: for each address bank, a 20 stage binary counter drives a multiplexer that outputs ADDRA (9:0) or ADDRB (9:0); the counters also produce TCI.)
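The address split described above can be written out as a small sketch. It assumes an even 10/10 split of the 20 counter stages, which is consistent with the 10-bit ADDRA and ADDRB buses in the block diagram; the names are illustrative only.

```python
ROW_BITS = 10      # least significant counter stages (assumed 10/10 split)
COL_BITS = 10      # most significant counter stages


def split_address(count):
    """Return (row, column) for a 20-bit address generator count."""
    count &= (1 << (ROW_BITS + COL_BITS)) - 1
    row = count & ((1 << ROW_BITS) - 1)    # low half goes out as the row address
    column = count >> ROW_BITS             # high half goes out as the column address
    return row, column


def at_terminal_count(count):
    """True when all 20 stages are high, i.e. the final carry out is active."""
    return count == (1 << (ROW_BITS + COL_BITS)) - 1
```

Because the row address comes from the low half of the counter, consecutive samples land in consecutive rows.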
To prepare for a write operation, both address
generators need to be cleared. This causes time
samples on each ADC channel to be written to the same
address in each DRAM bank. Upon entering a write
operation, the address generators are incremented one
address before a write takes place. Due to the
reclocking of the terminal count signal, the last
address written is 00000H; therefore, all 2^20
address locations are utilized.
In order to calculate coefficients during a
read operation, an offset must be introduced into one
of the generators. Before each coefficient calculation
starts, an offset is loaded into one of the generators
from the computer. If a positive delay is desired, the
bank A generator is loaded with an offset equal to the
delay. If a negative delay is desired, an offset is
loaded into bank B in the same manner. Since the read
cycle is terminated when a generator reaches its
maximum count, there will be fewer than 2^20
summations except for a zero delay calculation. The
number of summations will be (2^20 - delay), where
delay is in terms of absolute value. Since there is a
maximum delay of 511, a reduction from 2^20 is
insignificant. This reduction is also accounted for in
the software by making the summation normalization
factor a function of the delay calculated. Upon
entering a read cycle, the generators are incremented
one address before the first read occurs. This aligns
the read process with the write process.
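The bookkeeping for a single coefficient can be sketched as below. The bank selection and the summation count follow the text; the function names and the scale parameter (standing in for whatever calibration factor the software applies) are hypothetical.

```python
FULL_COUNT = 2 ** 20      # address locations per DRAM bank
MAX_DELAY = 511           # largest offset the hardware can load


def plan_read(delay):
    """Return (bank_to_load, offset, number_of_summations) for one delay."""
    if abs(delay) > MAX_DELAY:
        raise ValueError("delay must lie between -511 and +511")
    if delay > 0:
        bank = "A"            # positive delay: offset goes into bank A
    elif delay < 0:
        bank = "B"            # negative delay: offset goes into bank B
    else:
        bank = None           # zero delay: no offset is needed
    offset = abs(delay)
    return bank, offset, FULL_COUNT - offset


def normalize(raw_sum, delay, scale=1.0):
    """Divide a raw sum by the delay-dependent number of summations."""
    _, _, summations = plan_read(delay)
    return scale * raw_sum / summations
```

At the maximum delay the count drops from 1048576 to 1048065 summations, a change of about 0.05 percent.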
The clear and load circuitry mentioned above
is resident in the ALTERA gate array, reference DRAM
CONTROLLER, sheet 5, appendix A. A block diagram of
this circuitry is shown in figure 7. A DECCLR0 signal
causes all generator clear lines to be activated.
Since the 74LS161 clear function is asynchronous, a
clock is not needed. When the DECPRLD0 input is active,
the PRELD9 input signal is used to determine which
load lines are activated and which clear lines are
activated. Any generator stage that is not loaded with
an offset is cleared. The load operation is
synchronous; therefore, a clock edge is needed. The
clock edge is provided via the CNTCLKC output, sheet 2
of DRAM CONTROLLER. CNTCLKC is "ored" with CNTCLKA and
CNTCLKB to produce the CNTCLK signal that clocks both
address generators. There are 9 lines that contain the
offset information. The most significant bit, PRELD8,
is written into the 74LS377 I/O register UAN25. The
rest of the information is presented on the data bus
when a DECPRLD0 occurs. Figure 8 demonstrates the load
timing for a negative delay.
Figure 7. Address Decode Block Diagram
(Block diagram of the START ADDRESS DECODE logic. Inputs: BIOW0, DECCLR0, DECPRLD0, PRELD9, 10MHZ. Internal signals: CLRCNT0, PRLDSB1. Outputs: ADRCLRA0, ADRCLRB0, ADDRCLR0, ADDRLDA0, ADDRLDB0, CLKC.)
TRUTH TABLE
CLRCNT0  PRLDSB1  PRELD9  ADRCLRA0  ADRCLRB0  ADDRCLR0  ADDRLDA0  ADDRLDB0
   0        0       X        0         0         0         X         X
   1        1       1        0         1         0         1         0
   1        1       0        1         0         0         0         1
   1        0       X        1         1         1         1         1
NOTE: ONLY VALID INPUT STATES ARE SHOWN
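How a delay might be split across the nine offset lines and the bank-select bit can be sketched as follows. The split into a data bus byte plus PRELD8 follows the text. The polarity chosen for PRELD9 (1 selects bank B, i.e. a negative delay) is an assumption read from the truth table, and the bit positions of PRELD8 and PRELD9 inside the scratch register are not stated, so the sketch returns them as separate fields.

```python
def encode_preload(delay):
    """Split a delay into the values sent to the board before a read.

    Returns a dict with:
      data_byte: low 8 bits of the offset, presented on the data bus
      preld8:    most significant (ninth) offset bit
      preld9:    bank select (assumed: 1 = load bank B, negative delay)
    """
    if not -511 <= delay <= 511:
        raise ValueError("delay must lie between -511 and +511")
    offset = abs(delay)
    return {
        "data_byte": offset & 0xFF,
        "preld8": (offset >> 8) & 1,
        "preld9": 1 if delay < 0 else 0,
    }
```

For a delay of -300 this gives a data byte of 44 with PRELD8 = 1, since 300 = 256 + 44.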
System Computer Interface
Most of the previously mentioned processes
need information from or give information to the host
computer. This section will summarize the information
transfer. Since an IBM AT I/O timing specification is
not readily available, I needed to generate one. The
timing will be discussed along with synchronization of
signals between the computer and system.
Figure 9 shows the I/O signals used. The AT
bus specification sets aside I/O address space from
300H to 31FH for a prototype card. This is the space
that this design uses. TBASIC reads a byte of data
from I/O using the INP(portno) command, where portno is the
address of the desired port. TBASIC writes a byte of
data to I/O using the OUT portno, integer expression
command, where integer expression is the value of the byte
to be written. Table 3 is an I/O map of the system.


Figure 8. Address Load Timing
(Timing diagram showing the 10MHZ, BIOW0, DECPRLD0, PRELD9, ADRCLRA0, ADRCLRB0, ADDRCLR0, ADDRLDA0, ADDRLDB0, and CNTCLK waveforms.)


Figure 9. I/O Bus Signals
(Diagram of the signals between the AT I/O bus and the correlator board: bidirectional data, address bus, address enable, I/O write strobe, and I/O read strobe.)


ADDRESS   FUNCTION
310H      Interval timer control byte
300H      Interval timer counter 0
302H      Scratch register UAN25
303H      RAM read strobe
304H      RAM write strobe
305H      Clear strobe
306H      Preload counters
308H      Read idle status
309H      Read MAC LSP-L
30AH      Read MAC LSP-U
30BH      Read MAC MSB-L
30CH      Read MAC MSB-U
30DH      Read MAC XTP
Table 3. I/O mapping


Before writing a sampling rate to the interval
timer, a control byte of 36H must be sent. This puts
the timer into square wave mode and prepares it to
receive the LSB and then the MSB of the sampling
division ratio via address 300H. The scratch register
contains the TRIGMODE and RFSHMD commands, which are
written before a write operation, and PRELD8 and PRELD9,
which are written before a read operation. The RAM read
strobe initiates a read operation whereas the RAM write
strobe initiates a write operation. The clear strobe
is transmitted before a write operation; it clears the
address generators and resets the state machine. These
three strobe commands are address decodes only and no
data is sent on the data bus. The preload counter
command sends a preload strobe to load the address
generators with the offset present on the data bus and
PRELD8 and PRELD9 of the scratch register. It also
causes the MAC accumulator to be loaded with a -1. The
last 5 read commands are self-explanatory.
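The interval timer setup can be sketched as a calculation of the three bytes that are written. The 10 MHz master clock, the 36H control byte, and the LSB-then-MSB order come from the text; the function name, the returned dictionary, and the allowed ratio range of 2 to 32767 (5 MHz maximum rate, 16-bit signed ratio in the listing) are assumptions of this sketch.

```python
MASTER_KHZ = 10000        # on-board 10 MHz clock, expressed in kHz


def timer_setup(desired_khz):
    """Return the bytes for the interval timer and the rate actually obtained."""
    ratio = round(MASTER_KHZ / desired_khz)     # division ratio must be an integer
    if not 2 <= ratio <= 32767:
        raise ValueError("sampling rate outside the supported range")
    return {
        "control": 0x36,              # written to 310H first
        "lsb": ratio & 0xFF,          # then written to 300H
        "msb": ratio >> 8,            # then written to 300H
        "actual_khz": MASTER_KHZ / ratio,
    }
```

A request for 3000 kHz, for instance, rounds to a ratio of 3 and therefore yields an actual rate of about 3333 kHz.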
All the address decodes must be conditioned
with either the IOW0 or IOR0 signals. IOW0 signifies
that a valid I/O write is being performed and IOR0
signifies that a valid I/O read is being performed.
All the I/O read functions are conditioned by IOR0 at
the 74ALS138 decode chip UJ30. The interval timer and
scratch registers are conditioned by IOW0 at their
respective chips. The rest of the write decode
conditioning is performed inside of the gate array.
To generate an AT bus specification, I
analyzed an IBM AT schematic and the associated
integrated circuit data manuals to generate setup and
hold time specifications. I reinforced the
specification by testing the I/O timing of a Zenith AT
running TBASIC software. I also determined the read
and write cycle time when exploring the Zenith. For a
write cycle, the address is set up 42 ns before the
negative edge of IOW0 and held for 34 ns after the
rising edge. The associated data setup and hold times are 60 ns
and 37 ns, respectively. The IOW0 active low time is 750 ns.
For a read cycle, the address and cycle times are the
same as above. The system must have the data valid 20
ns before the rising edge of IOR0 and must hold it for
3 ns after that edge. The system must get off the host
computer's data bus within 62 ns after the rising edge of
IOR0.
The RAM write and RAM read strobes are
sensitive to synchronization between the computer and
the system. The state machine operates from an
on-board 10 MHz crystal. Both of these strobes are
reclocked twice by the 10 MHz signal before being
presented to the state machine as inputs, reference
appendix A, sheet 2 of DRAM CONTROLLER. This practically
eliminates metastability problems that could send the
state machine to invalid states. The RAM write strobe
has an additional latch in the gate array to hold the
active strobe indication until a SAMCLK is coincident
and the write cycle is initiated. Upon cycle
initiation, the IDLSTAT1F0 signal resets the holding
latch.
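The double reclocking can be pictured with a purely logical model of two flip flops in series. A model like this cannot show metastability itself; it only shows the delay the synchronizer adds. The function name is hypothetical, and each list element represents the strobe level seen at one 10 MHz clock edge.

```python
def double_reclock(samples):
    """Pass a sampled input through two flip flops clocked together.

    samples: input level (0 or 1) present at each clock edge.
    Returns the second flip flop's output after each clock edge.
    """
    first = second = 0
    out = []
    for level in samples:
        # Both flops update on the same edge: the second takes the old
        # value of the first, the first takes the new input level.
        second, first = first, level
        out.append(second)
    return out


synchronized = double_reclock([0, 1, 1, 1, 0, 0, 0])
```

In this model the state machine sees a level change one clock edge after the edge that first captured it.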
Software
The host computer, through TBASIC software,
needs to provide command and control as discussed in
the previous section. It also reads the MAC
accumulator and converts the data for user output. The
software program developed for this thesis provides
graphical output to the CRT and storage of the
coefficients in a sequential file. The self-explanatory
software listing is given in appendix C.
The main software design task was to minimize overall
coefficient calculation time.
The software is structured so that the system
is not idle while the computer is doing a task. The
calculation of a coefficient takes about 200 ms, during
which time the computer does not command the system.
As long as all the software overhead functions for each
coefficient can be performed in this time, a simple poll of
the idle status is sufficient. If the overhead
functions took longer than 200 ms, then a more complex
interrupt scheme would have to be used. The overhead
functions that are performed are:
1) Conversion of MAC accumulator
2) Normalization of accumulator result
3) Storage of coefficient to sequential file
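The 200 ms figure can be cross-checked with a short calculation. It assumes one summation per 200 ns DRAM cycle, the cycle time quoted in the design improvements chapter; the function names are illustrative.

```python
SAMPLES = 2 ** 20          # samples per DRAM bank
DRAM_CYCLE_S = 200e-9      # assumed time per summation (one DRAM read cycle)


def coefficient_time(delay=0):
    """Approximate hardware time, in seconds, to compute one coefficient."""
    return (SAMPLES - abs(delay)) * DRAM_CYCLE_S


def polling_is_sufficient(overhead_s, delay=0):
    """True if the software overhead fits inside one coefficient calculation."""
    return overhead_s <= coefficient_time(delay)
```

Under this assumption a zero delay coefficient takes about 0.21 s, in line with the figure quoted above.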


CHAPTER IV
TEST RESULTS
This chapter presents the results of tests
performed on the correlator. The system is tested for
amplitude linearity with respect to input amplitude.
The linearity test also demonstrates the dynamic range
of the system. Simple examples of autocorrelation and
crosscorrelation are then demonstrated. Before these
tests were performed, the system was calibrated using
the procedure in Appendix B.
Linearity
Linearity is readily determined by
autocorrelating a square wave of known amplitude. For
a square wave, the peaks of the autocorrelation
function are equal to the square of the peak voltage.
My test setup consisted of a function generator and a
true rms meter to monitor the amplitude output of the
generator. The frequency was set to 50 kHz and the
amplitude was varied. The input was sampled at 5 MHz.
The test results are tabulated in table 4 and figure
10 shows the output waveform.
IN VOLT (mVrms)   OUT VOLT (mVrms)   % DEVIATION
700               679                N/A
675               675                0
648               652                0.6
600               603                0.5
500               501                0.2
400               401                0.25
300               299                0.33
200               203                1.5
100               100                0
50                50.6               1.2
40                40.8               2
30                29.9               0.33
19.7              19.4               1.5
9.7               9.5                2
4.9               4.8                2
Table 4. Amplitude Linearity
The test demonstrates a dynamic range of 135:1, i.e. a 5 mV
to 675 mV input range. The best case dynamic range for
8 bits is 256:1. The top end of the range yields
a deviation from linearity of less than 1% whereas the
bottom end increases to 2%.
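The basis of the linearity test, that the autocorrelation peaks of a square wave equal the square of its peak voltage, can be checked numerically with an idealized sketch. The 100 samples per period correspond to a 50 kHz input sampled at 5 MHz; the 300 mV amplitude and the function names are arbitrary choices for illustration.

```python
def autocorrelation(samples, lag):
    """Average product of the signal with itself shifted by `lag` samples."""
    n = len(samples) - lag
    return sum(samples[i] * samples[i + lag] for i in range(n)) / n


def square_wave(amplitude, period, count):
    """Ideal square wave with `period` samples per cycle."""
    half = period // 2
    return [amplitude if (i % period) < half else -amplitude
            for i in range(count)]


wave = square_wave(amplitude=300.0, period=100, count=10000)
peak = autocorrelation(wave, 0)       # positive peak at zero delay
trough = autocorrelation(wave, 50)    # negative peak half a period later
```

The square root of the positive peak recovers the 300 mV input, which is how the OUT VOLT column can be compared directly with the meter reading.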


Autocorrelation
The autocorrelation of a periodic signal
results in a periodic signal of the same frequency as
the input. This is easily demonstrated by a sine wave.
Again, the input is sampled at 5 MHz and is set at
10 kHz, 800 mVpp or 280 mVrms. Figure 11 shows the
output. Given a peak sine wave amplitude of A, the
autocorrelation peak should be A^2/2. In this example
the correlation peak should be 80000 mV^2.
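The expected peak can be confirmed with an idealized numerical check. A 10 kHz sine sampled at 5 MHz has 500 samples per period, and 800 mVpp corresponds to a peak amplitude of 400 mV; the function names are illustrative.

```python
import math


def sine_wave(amplitude, samples_per_period, periods):
    """Ideal sampled sine wave covering a whole number of periods."""
    n = samples_per_period * periods
    return [amplitude * math.sin(2.0 * math.pi * i / samples_per_period)
            for i in range(n)]


def zero_lag_autocorrelation(samples):
    """Autocorrelation at zero delay, i.e. the mean square value."""
    return sum(s * s for s in samples) / len(samples)


wave = sine_wave(amplitude=400.0, samples_per_period=500, periods=4)
peak = zero_lag_autocorrelation(wave)   # approximately 400^2 / 2
```

The result agrees with A^2/2 = 80000 mV^2 to within floating point rounding.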
Another useful autocorrelation is that of
pseudorandom noise. Noise is generated by passing the
output of a 2^15 - 1 pseudorandom number pattern
through a low pass filter. The pseudorandom generator
is capable of generating an AC signal. Following the
guidelines for noise generation by Horowitz [2], I
clocked the pseudorandom generator at 2 MHz and set
the 4 stage low pass filter to 200 kHz. Appendix G
gives the magnitude and phase characteristics of the
filter. The expected wideband noise amplitude is:
Vrms = a (2/Fclock)^(1/2) V/Hz^(1/2)
where a is the peak output amplitude of the
pseudorandom generator. For a 200 kHz bandwidth, the
expected output is 111 mVrms. The true rms voltmeter
read 106 mVrms. Figure 12 shows the autocorrelation
output with the correlator sampling at 5 MHz. Note
that the square root of the correlation peak is 106
mVrms. The filter was then set to 20 kHz. The expected
filtered output is 35.3 mVrms. The true rms meter read
36 mVrms. Figure 13 shows the autocorrelation output
with the input sampled at 5 MHz. Note that the square
root of the peak is 36 mVrms.
Figure 11. Sine Wave Autocorrelation
(Plot: MAX = +8.05E+04 mV^2, MIN = -8.85E+04 mV^2.)
Figure 12. Wideband Noise Autocorrelation
(Plot: MAX = +1.13E+04 mV^2, MIN = -3.86E+02 mV^2.)
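The noise amplitude relationship can be evaluated with a short sketch. It multiplies the noise density a (2/Fclock)^(1/2) by the square root of an ideal rectangular bandwidth, which is an approximation for a real 4 stage filter. The peak amplitude a is not stated in the text, so the example only scales the quoted 111 mVrms figure to the narrower bandwidth; the function names are illustrative.

```python
import math


def filtered_noise_rms(peak_amplitude, clock_hz, bandwidth_hz):
    """Rms noise after ideal low pass filtering of a pseudorandom sequence."""
    return peak_amplitude * math.sqrt(2.0 * bandwidth_hz / clock_hz)


def scale_to_bandwidth(rms, old_bandwidth_hz, new_bandwidth_hz):
    """Rescale an rms value when only the filter bandwidth changes."""
    return rms * math.sqrt(new_bandwidth_hz / old_bandwidth_hz)


# Going from 200 kHz to 20 kHz reduces the rms value by sqrt(10).
narrowband = scale_to_bandwidth(111.0, 200e3, 20e3)
```

The scaled value is about 35 mVrms, close to the 35.3 mVrms expected value quoted above.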
The slope of decay of the noise autocorrelation
yields information about the bandwidth. To demonstrate
this, I inserted a simple R-C lowpass filter between
the output of the filtered pseudorandom generator and
the input of the correlator. The filtered pseudorandom
output bandwidth was set at 200 kHz and the simple
filter at 2 kHz. The 2 kHz cutoff should yield a decay
constant of 7.9 x 10^-5. Figure 14 shows the
autocorrelation graphically and figure 15 shows it
numerically. The numerical data yields a decay
constant of 1.04 x 10^-4. The difference between
measured and theory is due to component tolerance
since the correlation measurement is far more accurate
than the circuit components specified. The difference
is not due to the 20 Hz filter in the ADC since the
lowest frequency of the noise spectrum is 61 Hz [2].
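The theoretical decay constant follows from the R-C time constant 1/(2 pi fc), and a measured value can be estimated from correlation data with a log-linear least squares fit. The sketch below shows both; the fit is one reasonable approach and not necessarily the method used for the thesis, and the lag spacing is left as a parameter because it depends on the sample rate of the run.

```python
import math


def rc_time_constant(cutoff_hz):
    """Time constant, in seconds, of a single pole R-C filter."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def estimate_decay_constant(values, lag_spacing_s):
    """Fit values[k] ~ exp(-k * lag_spacing_s / tau) and return tau."""
    times = [k * lag_spacing_s for k in range(len(values))]
    logs = [math.log(v) for v in values]          # values must be positive
    mean_t = sum(times) / len(times)
    mean_y = sum(logs) / len(logs)
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    return -1.0 / slope


theory = rc_time_constant(2000.0)   # about 7.96e-5 s for a 2 kHz cutoff
```

Applied to a list of correlation values such as those in figure 15, together with the lag spacing of that run, estimate_decay_constant gives the measured constant for comparison with the theoretical one.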


Figure 13. Bandlimited Noise Autocorrelation
(Plot: MAX = +1.30E+03 mV^2.)
Figure 14. Simple LPF Autocorrelation
(Plot: MAX = +6.80E+03 mV^2, MIN = +0.00E+00 mV^2.)


6799.136967312782
6688.394505299579
6539.823726149729
6456.821020874103
6398.269637439907
6352.732056038223
6299.107307673601
6199.726928058076
6111.211381342347
6046.425036192286
5982.800267985294
5927.120209639087
5865.149629063776
5767.266460161631
5679.020518119594
5609.561696040284
5538.224103269344
5467.420465057727
5399.926888094782
5328.966995563423
5258.131752268599
5190.565020971738
5125.500054360628
5064.891588113895
5002.624759310705
4927.002019455438
4854.281922447115
4790.438058629647
4726.640904985661
4662.913999958037
4600.595766991402
4538.68637367626
4476.602933785262
4415.764736176519
4356.582239035002
4298.55051214069
4241.477771928369
4184.504268324085
4127.647342916845
4072.131271601547
4017.660983658152
3964.163528793535
3911.569190955363
3858.610529292382
Figure 15. Simple LPF Autocorrelation Data Points


Crosscorrelation
If wideband noise is used as an input to a
filter network, the crosscorrelation between the input
and the output yields the impulse response of the
filter. For this demonstration, a KROHN-HITE 3200 four
pole filter is used for a network. The filter has the
capability of a maximally flat or simple R-C response.
I used 200 kHz bandlimited noise into the KROHN-HITE
filter and selected a 2 kHz cutoff frequency. The
input was sampled at 2 MHz. Figures 16 and 17 show the
results. The impulse response of the simple R-C filter
has no overshoot or undershoot whereas the impulse
response of the maximally flat filter does. This
indicates that the simple R-C network has a more
linear phase response than the maximally flat filter.
The peak shift is the same for both filters indicating
that both filters are of the same order.
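The principle behind this measurement can be demonstrated with a small simulation. A discrete one pole low pass filter stands in for the network (it is not a model of the KROHN-HITE filter), unit variance white noise from a seeded generator stands in for the noise source, and all names are illustrative.

```python
import random


def one_pole_lowpass(x, alpha):
    """Simple recursive filter: y[n] = alpha*y[n-1] + (1-alpha)*x[n]."""
    y, state = [], 0.0
    for sample in x:
        state = alpha * state + (1.0 - alpha) * sample
        y.append(state)
    return y


def crosscorrelation(x, y, lag):
    """Average product of the input with the output delayed by `lag`."""
    n = len(x) - lag
    return sum(x[i] * y[i + lag] for i in range(n)) / n


rng = random.Random(0)                              # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
output = one_pole_lowpass(noise, alpha=0.9)

estimate = [crosscorrelation(noise, output, k) for k in range(5)]
ideal = [(1.0 - 0.9) * 0.9 ** k for k in range(5)]  # filter impulse response
```

The estimated crosscorrelation values track the impulse response of the filter, with a statistical error that shrinks as more samples are averaged.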


Figure 16. Maximally Flat Crosscorrelation
(Plot: MAX = +4.58E+03 mV^2, MIN = -3.99E+02 mV^2.)
Figure 17. Simple R.C. Crosscorrelation
(Plot: MAX = +2.64E+03 mV^2, MIN = +0.00E+00 mV^2.)


CHAPTER V
DESIGN IMPROVEMENTS
As with every design, there is room for
improvement. This chapter will briefly discuss
hardware and software improvements that increase
performance, i.e. execution time or accuracy, or
provide additional features.
The accuracy of the correlation function could
be improved if several spectra were averaged together
[3]. This could be accomplished in software by
commanding several capture calculate cycles and
keeping a running average. Another software
improvement would be to allow an autocorrelation's
abscissa to span from 0 to 1023. Since an
autocorrelation spectrum has even symmetry, the
negative abscissa does not yield additional
information. This improvement would also require a
hardware change to allow the address counters to load
a maximum count of 1023 instead of 511.
To improve the acquisition of data, both a
variable antialiasing filter and an automatic gain
control circuit are needed in front of the ADC.
Currently, the operator must provide these functions
externally.
The limiting speed factor in this circuit is
the 200 ns DRAM cycle time. The MAC circuit can
operate at 65 ns. By using 100 ns DRAM and a faster
ALTERA gate array, i.e. 20 MHz design instead of 10
MHz, the calculation time could be halved and the
sample rate could be doubled to 10 MHz. 10 MHz is the
maximum sample rate of the ADC.
The last improvement that I will discuss is
the ability to download data from the computer into
RAM bank B instead of from the ADC. This would
facilitate correlation between a test signal and a
known or computer generated signal. The following
outline describes a possible implementation.
1) Acquire real time data on channel A at the desired
sample rate. Note that channel B will be writing data
to the DRAM but will be overwritten later.
2) Disconnect the write signal to bank A DRAM via
hardware change. This will preserve the data in bank A
while the computer data is written to bank B.
3) Change sample rate to download data from computer to
bank B. This rate needs to be determined, but should
be in the 10 us range. This would allow complete
download in 10 s.
4) Download data from computer in the following manner:


a) Add an 8 bit ramwrite register to the DATA
bus. Connect the output of the register to the
ADDATAB bus.
b) Tristate output of the ADC by either control
signal to pin 20 of ADC or by disconnecting
cable to ADC.
c) Tristate CONVCLK signal onto DATA bus. Have
computer constantly query this signal during
the bank B write operation. When signal goes
low, write new data to the register. This
could also be done in an interrupt mode with
CONVCLK signal providing the interrupt. The
sample rate is determined by this step. The
maximum computer time between either query or
interrupt and register write needs to be less
than sample period.
d) Initiate a correlator write cycle at the
proper sample rate and download data.


BIBLIOGRAPHY
[1] I.M. Langenthal, "Correlation and Probability
Analysis," SAICOR bulletin TB14, Signal
Analysis Corp., 1970, appendix H.
[2] P. Horowitz and W. Hill, The Art of
Electronics. Cambridge: Cambridge University
Press, 1980, pp. 437-442.
[3] W. D. Stanley, G. R. Dougherty and R. Dougherty,
Digital Signal Processing. Reston, Virginia:
Reston Publishing Co., 1984, p. 309.


APPENDIX A
Schematics


The next five pages schematically describe how
the Altera gate array is programmed. The Altera device
used is the EP1800 device. Altera's technology partner
is Intel, which markets an identical device, the 5C180. The
EP1800 is an erasable programmable logic device
similar to an EPROM memory device. It contains 48
macrocells with user-configurable I/O architecture,
allowing up to 64 inputs and 48 outputs. Each of the
48 macrocells contains a programmable AND and fixed OR
PLA structure with a maximum of eight product terms
for logic implementation. In addition, single product
terms control Output Enable/Asynchronous Clock and
Asynchronous Clear functions. The EP1800 also includes
programmable registers. Each of the 48 internal
registers may be programmed to be a D, T, SR, or JK
flipflop. In addition, each register may be clocked
asynchronously on an individual basis or synchronously
on a banked register basis.




(Schematic sheets 2 through 5 of 5: DRAM CONTROLLER, EPLD 5C180, TURBO ON, 000-0000-000 REV 1A, dated 6-10-89, PLD OPERATION, 1900 PRAIRIE CITY ROAD, FOLSOM, CA 95630. Sheets 3 and 4 show numbered product terms, PT1 through PT25, built from signals such as MACINCLK, CONVCLK, CNTCLKA, RCADDRB, RASA0, RASB0, CASA0, WRITE0, IDLSTAT1, RFSHMD, TCI, and SAMCLK.)


POWER REQUIRED:
+5 Volts @ 120mA
-5.2 Volts @ 10mA
+15 Volts @ 30mA
-15 Volts @ 27mA
4) Pin numbers in parentheses are
for the 28 pin SOIC package.
5) Two locations are provided for
this resistor on the board. The
location used depends on which
style package is used.
6) Use cermet multi-turn pots for
Gain and Offset.
(FOR 24 PIN DIP OR 28 PIN SOIC PACKAGE)


APPENDIX B
Calibration Procedure


1) Install system in host computer and power up. Wait
10 minutes before proceeding to allow system to warm
up.
2) With the flat ribbon cable disconnected from both ADCs,
inject a 200 mVpp 500 kHz AC sinewave into both ADCs.
For each ADC, place a scope probe at the input and
output of the input amplifier. Adjust the offset pot
for 0 dc offset at the output. Adjust the gain pot for
an output amplitude of 300 mVpp.
3) Install the autocorrelation plug on the wire wrap
side of J1. This connects the two channels of ADC data
together. Reconnect only the flat ribbon cable to ADC
channel A. Leave ADC channel B disconnected.
4) Inject a 50 mVpp sine wave at 50 kHz. Set the
sample rate for 1 MHz. Run the correlator. If the
positive and negative peak of the autocorrelation
output is not of equal amplitude, adjust the offset
and rerun test. Continue this step until peaks are
equal.
5) Disconnect cable from ADC channel A and connect ADC
channel B. Repeat step 4 for channel B.
6) Set the input amplitude to 1200 mVpp. Autocorrelate
channel B. The positive and negative peaks should be
the same. Note the value of the peaks.
7) Disconnect channel B and reconnect channel A. Keep
the function generator at the same setting as step 6.
Adjust the gain so that the correlation peaks are the
same as in step 6.
8) Disconnect the plug from the wire wrap side of J1.
Connect the flat ribbon cable to both ADC channels.
Readjust the generator for 1200 mVpp output with the output
connected to both ADC inputs. Correlate and ensure
that the peaks are approximately equal to the value in
step 6.
9) Determine if any adjustments are needed to the
normalization factor and edit the NORM variable in the
software listing.


APPENDIX C
Software Listing


CORR.BAS page 1 of 4. Printed on 03/26/90 at 18:02:35.
' 9-19-89 PROGRAM "CORR.BAS"
' This program controls the correlation board. Upon execution
' a master reset is transmitted. The program then prompts for the
' desired sampling rate. The maximum sample rate is 5 MHz and the
' minimum is 306 Hz. A sampling rate is then calculated that is
' 10 MHz divided by an integer and the result is displayed.
' The program then prompts for the trigger mode. The automatic trigger
' mode will start acquiring data as soon as any key is pressed after the
' acquisition prompt. The manual mode waits for a high level at the
' trigger BNC located on the correlation board. After the memory is
' filled, the program prompts for the most negative and most positive
' delay coefficients to be calculated and for the name of the
' sequential file that the coefficients should be written to.
' After all the calculations are completed, the data is displayed
' in graphic form on the screen.
'
NORM = 28.5 ' normalization factor
DIM COEFF#(1022)
OUT &H305,0 ' clear strobe transmitted
' The desired sampling rate in KHz is entered and its ratio
' to 10MHz is determined. The ratio sent to the system must be
' an integer so the desired ratio is rounded off and the resultant
' sample rate is displayed. The contents of the variable RATIO% are then
' sent to the interval timer. Note that since the ratio is in twos
' complement, the MSBit is not used and the ratio is limited
' to 32767 or 306 Hz.
'
INPUT "ENTER THE DESIRED SAMPLING RATE IN KHz" ; DESSMRT
RATIO% = CINT(10000/DESSMRT)
SMRT = 10000/RATIO%
PRINT "SAMPLE RATE IS" SMRT "KHZ"
DEF SEG = VARSEG(RATIO%)
ADDRESS = VARPTR(RATIO%)
LSB = PEEK(ADDRESS)
MSB = PEEK(ADDRESS+1)
OUT &H310 , &H36 ' send interval timer control word
OUT &H300 , LSB ' send interval timer LSB
OUT &H300 , MSB ' send interval timer MSB

' The user determines whether to use auto or manual
' trigger for data acquisition.
'
TRIG$ = "NULL"
WHILE TRIG$ <> "A" AND TRIG$ <> "M"
  INPUT "ENTER TRIGGER MODE, A=AUTO, M=MANUAL" ; TRIG$
WEND
'
' Before a write operation, the refresh mode and trigger
' mode must be sent to the scratch register. The trigger
' mode is bit0 and the refresh mode is bit1.
'
IF TRIG$ = "A" THEN BIT0% = 1 ELSE BIT0% = 0
IF RATIO% < 79 THEN BIT1% = 1 ELSE BIT1% = 0
WRREG% = 2*(BIT1%) + BIT0%
OUT &H302, WRREG% ' write rfsh and trig modes to register
' Initiate a write operation. If manual trigger was selected,
' a write strobe is sent immediately. The program then waits for
' a non-idle status before continuing. This signifies that the
' system has triggered. If auto trigger was selected, the program
' waits for the user to strike a key.
'
IF TRIG$ = "M" THEN
  OUT &H304, 0 ' send write strobe
  PRINT "WAITING FOR MANUAL TRIGGER"
  WAIT &H308, 255, 255 ' wait for active status
ELSE
  PRINT "HIT RETURN KEY TO START AUTOMATIC ACQUISITION"
  LENGTH = 0
  WHILE LENGTH = 0
    S$ = INKEY$
    LENGTH = LEN(S$)
  WEND
  OUT &H304, 0 ' send write strobe
  PRINT "ACQUIRING DATA"
END IF
'
' Wait for system to acquire all data.
'
IDLE% = 0
WHILE IDLE% = 0
  IDST% = INP(&H308) ' poll idle status
  IDLE% = 127-IDST%
  PRINT "ACQUIRING DATA"
WEND
PRINT "DATA IS ACQUIRED"
'
' The user enters the limits of the correlation delay and
' names a sequential file to store the data. Four integer
' variables are also declared and cleared for reading
' the MAC accumulator.
'
INPUT "ENTER THE MOST NEGATIVE DELAY NEEDED (MAX= -511)" ; NEGDLY%
INPUT "ENTER THE MOST POSITIVE DELAY NEEDED" ; POSDLY%
INPUT "ENTER NAME OF FILE TO STORE COEFFICIENTS" ; FILE$
OPEN FILE$ FOR OUTPUT AS #1
ACCUM1%=0
ACCUM2%=0
ACCUM3%=0
ACCUM4%=0
'
' Coefficient calculation. A delay is first sent to the
' system through the WRDELAY subroutine. The program then
' commands the system to calculate coefficients by sending
' a read strobe. The results are read in and reassembled.
' The reassembled result is both stored in a sequential file
' and printed to the screen.
'
COFDLY% = NEGDLY%
GOSUB WRDELAY ' send delay info
OUT &H303, 0 ' send read strobe
PRINT "CALCULATING COEFFICIENT" COFDLY%
'
DO
  '
  WAIT &H308, 255, 127 ' wait for inactive status
  '
  MACB5% = INP(&H309) ' read MAC LSP-L byte
  MACB4% = INP(&H30A) ' read MAC LSP-U byte
  MACB3% = INP(&H30B) ' read MAC MSP-L byte
  MACB2% = INP(&H30C) ' read MAC MSP-U byte
  MACB1% = INP(&H30D) ' read MAC XTP byte
  '
  INCR COFDLY%
  IF COFDLY% <= POSDLY% THEN
    GOSUB WRDELAY ' send delay info
    OUT &H303, 0 ' send read strobe
    PRINT "CALCULATING COEFFICIENT" COFDLY%
  END IF
  '
  ' Reassemble accumulator data
  '
  IF (MACB1% AND 4) = 0 THEN MACB1% = MACB1% - 248
  DEF SEG = VARSEG(ACCUM1%)
  ADDRESS = VARPTR(ACCUM1%)
  POKE ADDRESS+1, MACB1%
  POKE ADDRESS, MACB2%
  DEF SEG = VARSEG(ACCUM2%)
  ADDRESS = VARPTR(ACCUM2%)
  POKE ADDRESS, MACB3%
  DEF SEG = VARSEG(ACCUM3%)
  ADDRESS = VARPTR(ACCUM3%)
  POKE ADDRESS, MACB4%
  DEF SEG = VARSEG(ACCUM4%)
  ADDRESS = VARPTR(ACCUM4%)
  POKE ADDRESS, MACB5%
  ACC1# = CDBL(ACCUM1%)
  ACC2# = CDBL(ACCUM2%)
  ACC3# = CDBL(ACCUM3%)
  ACC4# = CDBL(ACCUM4%)
  RESULT# = (ACC1# * 2^24 + ACC2# * 2^16 + ACC3# * 256 + ACC4#)
  COEFF#(COFDLY%+511) = RESULT# * NORM / (1048576 - ABS(COFDLY%))
  WRITE #1, COEFF#(COFDLY%+511)
  PRINT COEFF#(COFDLY%+511)
LOOP UNTIL COFDLY% > POSDLY%
' FROM THE LIMIT LABEL TO THE PLOT LABEL IS THE PLOTTING ROUTINE.
'
LIMIT:
MINY=0 : MAXY=0
FOR COFDLY% = NEGDLY% TO POSDLY%
  IF COEFF#(COFDLY%+511) < 0 THEN
    IF COEFF#(COFDLY%+511) < MINY THEN MINY = COEFF#(COFDLY%+511)
  ELSE
    IF COEFF#(COFDLY%+511) > MAXY THEN MAXY = COEFF#(COFDLY%+511)
  END IF
NEXT
IF MAXY > ABS(MINY) THEN ABSMAX = MAXY ELSE ABSMAX=ABS(MINY)
SCREEN 2
CLS
WINDOW (-319,-99)-(320,100)
DEFINECAP:
DIM NEGX(50), POSX(50), GRMINY(200), GRMAXY(200)
CLS
PRINT "-511"
GET (-319,90)-(-290,100), NEGX
CLS
PRINT "511"
GET (-319,90)-(-295,100), POSX
CLS
PRINT "MAX="; : PRINT USING "+#.##^^^^"; MAXY ; : PRINT "mV^2"
GET (-319,90)-(-150,100), GRMAXY
CLS
PRINT "MIN="; : PRINT USING "+#.##^^^^"; MINY ; : PRINT "mV^2"
GET (-319,90)-(-150,100), GRMINY
CLS
AXIS:
LINE (-319,0)-(320,0)
FOR X= -250 TO 250 STEP 50
  LINE (X,-1)-(X,1)
NEXT
LINE (0,-99)-(0,100)
FOR Y= -98 TO 100 STEP 14
  LINE (1,Y)-(-1,Y)
NEXT
CAPTIONS:
PUT (-280, -5), NEGX
PUT (250, -5), POSX
PUT (-319, 100), GRMAXY
PUT (-319, -85), GRMINY
PLOT:
PSET(0,0)
FOR COFDLY% = 0 TO 511 STEP 2
  YY = INT(COEFF#(COFDLY%+511) * 99 / ABSMAX)
  LINE -(COFDLY%/2,YY)
NEXT
PSET(0,0)
FOR COFDLY% = 0 TO -511 STEP -2
  YY = INT(COEFF#(COFDLY%+511) * 99 / ABSMAX)
  LINE -(COFDLY%/2,YY)
NEXT
CLOSE #1
END
'
' The two most significant bits of the delay are sent
' to the scratch register. The least significant bits are
' sent to the system with the preload strobe.
'
WRDELAY:
IF COFDLY% < 0 THEN BYTE7% = 1 ELSE BYTE7% = 0
IF ABS(COFDLY%) > 255 THEN BYTE6% = 1 ELSE BYTE6% = 0
WRREG% = 128 * BYTE7% + 64 * BYTE6%
LSBDLY% = ABS(COFDLY%) AND &H00FF
OUT &H302, WRREG% ' write to scratch reg
OUT &H306, LSBDLY% ' send preload strobe
RETURN
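
The arithmetic in the CORR.BAS listing can be checked without the board. The Python sketch below restates three of its calculations: the interval timer ratio and its two bytes, the scratch-register and preload bytes written by WRDELAY, and the reassembly and normalization of the five MAC bytes. It is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, no port I/O is performed, the XTP byte is assumed to carry three significant bits with bit 2 as the sign (as the listing's `AND 4` test suggests), and rounding uses Python's `round`, which may differ from CINT on exact halves.

```python
SAMPLES = 1048576   # divisor base used in the listing's normalization
CLOCK_KHZ = 10000   # 10 MHz reference clock, expressed in kHz


def timer_ratio(desired_khz):
    """Return (ratio, actual_khz, lsb, msb) for a requested sample rate in kHz."""
    ratio = round(CLOCK_KHZ / desired_khz)
    if not 2 <= ratio <= 32767:
        raise ValueError("ratio outside the 2..32767 range described in the listing")
    return ratio, CLOCK_KHZ / ratio, ratio & 0xFF, ratio >> 8


def encode_delay(delay):
    """Return (scratch_byte, low_byte) for a delay in -511..511, as in WRDELAY."""
    if not -511 <= delay <= 511:
        raise ValueError("delay must lie in -511..511")
    magnitude = abs(delay)
    bit7 = 1 if delay < 0 else 0          # sign of the delay
    bit6 = 1 if magnitude > 255 else 0    # ninth bit of the magnitude
    return 128 * bit7 + 64 * bit6, magnitude & 0xFF


def assemble_accumulator(xtp, msp_u, msp_l, lsp_u, lsp_l):
    """Combine the five MAC bytes into one signed 35-bit integer."""
    value = ((xtp & 0x07) << 32) | (msp_u << 24) | (msp_l << 16) | (lsp_u << 8) | lsp_l
    if value & (1 << 34):                 # bit 2 of XTP is the sign bit
        value -= 1 << 35
    return value


def normalize(result, delay, norm=28.5):
    """Scale a raw accumulator value the way the listing computes a coefficient."""
    return result * norm / (SAMPLES - abs(delay))


# Example usage with arbitrary inputs.
print(timer_ratio(1))
print(encode_delay(-300))
print(normalize(assemble_accumulator(0, 0, 0, 1, 0), 0))
```

Masking the XTP byte to three bits before sign extension keeps the result independent of whatever the unused upper bits of that port read back as.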


APPENDIX D
DRAM Timing


[Timing diagram: IDLE CYCLE TIMING, CAS BEFORE RAS. State labels: IDLE1, IDLE2, IDLE3, RMWRITE1, RMREAD1. Signals: BRAS0, BCAS0, BWRITE0. Time scale: 100 ns/state.]


[Timing diagram: READ CYCLE TIMING, AUTOREFRESH. State labels include IDLE3 and RMREAD1 through RMREAD3. Signals: BRAS0, BCAS0, BWRITE0, CNTCLK, ROWCOL0, ADDRESS, DATA VALID, BMACINCLK. Time scale: 100 ns/state.]


[Timing diagram: WRITE 1 CYCLE TIMING, AUTOREFRESH. State labels: IDLE3, RMWR1A, RMWR2A. Signals: BRAS0, BCAS0, BWRITE0, CNTCLK, ROWCOL0, ADDRESS, CONVCLKB, ADC DATA. Time scale: 100 ns/state.]


[Timing diagram: WRITE 2 CYCLE TIMING, HIDDEN REFRESH. State labels: IDLE3, RMWR1B, RMWR2B, RMWR3B, RMWR4B. Signals: BRAS0, BCAS0, BWRITE0, CNTCLK, ROWCOL0, ADDRESS, CONVCLKB, ADC DATA. Time scale: 100 ns/state.]


APPENDIX E
State Machine Listing


OUTPUTS
STATES     MACINCLK  CONVCLK  CNTCLKA  CNTCLKB  RCADDRA  RCADDRB  RASA0  RASB0  CASA0  CASB0  WRITE0  IDLE1
IDLE1      L         H        L        L        L        L        H      H      H      H      H       H
IDLE2      L         H        L        L        L        L        H      H      L      H      H       H
IDLE3      L         H        L        L        L        L        L      H      L      H      H       H
RMREAD1    L         H        H        L        H        L        H      H      H      H      H       L
RMREAD2    L         H        L        L        L        L        L      H      L      H      H       L
RMREAD3    H         H        H        L        H        L        H      H      H      H      H       L
RMREAD4    H         H        L        L        H        L        H      H      H      H      H       L
RMWRITE1A  L         L        H        L        H        L        H      H      H      H      H       L
RMWRITE2A  L         H        L        L        L        L        H      L      H      L      L       L
RMWRITE1B  L         L        L        H        L        H        H      H      H      H      H       L
RMWRITE2B  L         H        L        H        L        L        H      L      H      L      L       L
RMWRITE3B  L         H        L        H        L        L        H      H      H      L      L       L
RMWRITE4B  L         H        L        L        L        H        H      H      H      H      H       L
ALLZERO    L         L        L        L        L        L        L      L      L      L      L       L
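
For readers who want to query the listing programmatically, the table can be held as a small lookup structure. The Python sketch below transcribes the rows above and adds two hypothetical helpers, `level` and `states_where`. The transcription assumes that the first read state is RMREAD1 and that the trailing character of names such as RASA0 is a zero, matching the signal names in the timing diagrams.

```python
# Output columns, in the order printed in the state machine listing.
OUTPUTS = ["MACINCLK", "CONVCLK", "CNTCLKA", "CNTCLKB", "RCADDRA", "RCADDRB",
           "RASA0", "RASB0", "CASA0", "CASB0", "WRITE0", "IDLE1"]

# One string of H/L levels per state, one character per output column.
STATES = {
    "IDLE1":     "LHLLLLHHHHHH",
    "IDLE2":     "LHLLLLHHLHHH",
    "IDLE3":     "LHLLLLLHLHHH",
    "RMREAD1":   "LHHLHLHHHHHL",
    "RMREAD2":   "LHLLLLLHLHHL",
    "RMREAD3":   "HHHLHLHHHHHL",
    "RMREAD4":   "HHLLHLHHHHHL",
    "RMWRITE1A": "LLHLHLHHHHHL",
    "RMWRITE2A": "LHLLLLHLHLLL",
    "RMWRITE1B": "LLLHLHHHHHHL",
    "RMWRITE2B": "LHLHLLHLHLLL",
    "RMWRITE3B": "LHLHLLHHHLLL",
    "RMWRITE4B": "LHLLLHHHHHHL",
    "ALLZERO":   "LLLLLLLLLLLL",
}


def level(state, output):
    """Return 'H' or 'L' for one output in one state."""
    return STATES[state][OUTPUTS.index(output)]


def states_where(**levels):
    """List the states in which every named output has the requested level."""
    return [s for s in STATES
            if all(level(s, name) == want for name, want in levels.items())]


# Example query: states with CASA0 low while RASA0 is still high.
print(states_where(RASA0="H", CASA0="L"))
```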