Citation
Fast Fourier transforms for direct solution of Poisson's equation

Material Information

Title:
Fast Fourier transforms for direct solution of Poisson's equation
Creator:
Bradford, Bert Larue
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
xi, 310 leaves : illustrations ; 28 cm

Subjects

Subjects / Keywords:
Poisson's equation ( lcsh )
Fourier transformations ( lcsh )
Boundary value problems ( lcsh )
Algorithms ( lcsh )
Algorithms ( fast )
Boundary value problems ( fast )
Fourier transformations ( fast )
Poisson's equation ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references.
Thesis:
Submitted in partial fulfillment of the requirements of the degree, Doctor of Philosophy, Department of Mathematical and Statistical Sciences.
Statement of Responsibility:
by Bert Larue Bradford.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
25562325 ( OCLC )
ocm25562325
Classification:
LD1190.L622 1991d .B72 ( lcc )

Full Text
FAST FOURIER TRANSFORMS
FOR
DIRECT SOLUTION OF POISSON'S EQUATION
by
Bert Larue Bradford
B.A., The University of North Texas, 1976
i-MJL, The University of Texas at Austin, 1979
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics
1991


© 1991 by Bert Larue Bradford
All rights reserved.


This thesis for the Doctor of Philosophy
degree by
Bert Larue Bradford
has been approved for the
Department of
Mathematics
by
Roland A. Sweet
Thomas F. Russell


Bradford, Bert Larue (Ph.D., Mathematics)
Fast Fourier Transforms for Direct Solution of Poisson's Equation
Thesis directed by Professor Roland A. Sweet
This thesis presents compact algorithms used to incorporate the Cooley-
Tukey fast Fourier transform (FFT) into the solution of finite difference
approximations to the multi-dimensional Poisson equation. In each spatial
dimension, we must specify boundary conditions at both the left and right
endpoint. Boundary conditions we consider include cyclic, Dirichlet, and
Neumann. Furthermore, there is often a need to orient the grid such that
one or both of the endpoints of the computational domain are staggered
at half of a grid spacing. This leads to staggered Dirichlet and staggered
Neumann boundary conditions. When the Poisson equation is discretized,
these boundary conditions are approximated by requiring the real sequence
which represents the approximate solution to satisfy discrete analogs. The
discretized boundary value problem is solved by the Fourier analysis method
(also referred to as the eigenvector expansion method or as a fast Poisson
solver). This method requires finding the eigenvalues and eigenvectors cor-
responding to the discretized boundary value problem. The discrete solution
is expanded in terms of these eigenvectors. The efficiency of this algorithm
results from the ability to calculate the coefficients in such eigenvector ex-
pansions using an FFT algorithm. For each of the boundary conditions
discussed above, an FFT algorithm has been developed which computes the
coefficients in the corresponding eigenvector expansion as efficiently as pos-
sible by eliminating all redundant computations which would occur in the
full complex FFT, and without pre- or post-processing. Such FFT algo-
rithms are referred to as compact symmetric FFTs. The elimination of pre-
and post-processing improves performance by reducing both the number of
operations and data accesses. These FFT algorithms are all general mixed
radix, in-place algorithms which accept the input sequence in natural order.
The inverse algorithms accept the input sequence in permuted order. Thus,
reordering of data is never required.
The form and content of this abstract are approved. I recommend its
publication.
Signed
Roland A. Sweet


Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
1.1 The Fourier Analysis Method
1.2 The New FFT and FST Algorithms
2 Fast Fourier Transforms
2.1 Complex (C)
2.2 Real (R)
2.3 Real Even (RE)
2.4 Real Odd (RO)
2.5 Real Composite Even-Even (RE-E)
2.6 Real Composite Even-Odd (RE-O)
2.7 Real Composite Odd-Even (RO-E)
2.8 Real Composite Odd-Odd (RO-O)
2.9 Real Staggered Even (RSE)
2.10 Real Staggered Odd (RSO)
2.11 Tables of Symmetries
3 Fast Staggered Transforms
3.1 Complex (C)
3.2 Real (R)
3.3 Real Staggered Even (RSE)
3.4 Real Staggered Odd (RSO)
3.5 Real Composite Staggered Even-Staggered Even (RSE-SE)
3.6 Real Composite Staggered Even-Staggered Odd (RSE-SO)
3.7 Real Composite Staggered Odd-Staggered Even (RSO-SE)
3.8 Real Composite Staggered Odd-Staggered Odd (RSO-SO)
3.9 Tables of Symmetries
4 Software Implementation and Performance
4.1 Introduction
4.2 The Radix-2 RO FFT
4.3 The Radix-4 RO FFT
4.4 The Radix-3 RO FFT
4.5 The Mixed Radix RO FFT
4.6 Performance of the RO FFT
4.7 Automating Implementation of the RO FFT
A Eigenstructure of the Discrete Poisson Equation
B Software for the RO FFT
C FORTRAN Skeleton for Combine Equations
D Mathematica Scripts
E Automatically Generated Subroutines for the RO FFT
Bibliography


List of Figures
2.1 Splitting tree for complex FFT
2.2 Splitting tree for R symmetric FFT
2.3 Splitting tree for RE symmetric FFT
2.4 Splitting tree for RO symmetric FFT
2.5 Splitting tree for RE-E symmetric FFT
2.6 Splitting tree for RE-O symmetric FFT
2.7 Splitting tree for RO-E symmetric FFT
2.8 Splitting tree for RO-O symmetric FFT
3.1 Splitting tree for R symmetric FST
3.2 Splitting tree for RSE symmetric FST
3.3 Splitting tree for RSO symmetric FST
3.4 Splitting tree for RSE-SE symmetric FST
3.5 Splitting tree for RSE-SO symmetric FST
3.6 Splitting tree for RSO-SE symmetric FST
3.7 Splitting tree for RSO-SO symmetric FST
4.1 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = N/4
4.2 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = 1
4.3 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 0
4.4 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = N/4
4.5 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 1
4.6 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 0
4.7 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = N/4
4.8 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 1
4.9 Splitting tree for the radix-2 RO FFT for N = 16
4.10 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 0
4.11 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = N/8
4.12 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 1
4.13 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 0
4.14 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = N/8
4.15 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 1
4.16 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 0
4.17 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = N/8
4.18 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 1
4.19 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 0
4.20 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = N/6
4.21 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 1
4.22 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 0
4.23 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = N/6
4.24 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 1
4.25 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 0
4.26 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = N/6
4.27 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 1
4.28 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = 0
4.29 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = N/6
4.30 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = 1
4.31 Initialization subroutine hierarchy for the RO FFT
4.32 Forward transform subroutine hierarchy for the RO FFT


List of Tables
1.1 Discrete Homogeneous Boundary Conditions
1.2 Eigenstructure for the Standard Grid
1.3 Eigenstructure for the Staggered Grid
1.4 Eigenstructure for the Mixed Grid
1.5 Operation Counts for 2D Poisson Solvers
2.1 Symmetries in the IDFT
2.2 Symmetries in the DFT
3.1 Symmetries in the IDST
3.2 Symmetries in the DST
4.1 Splitting Tree for the Radix-2 RO FFT for N = 16
4.2 Splitting Tree for the Radix-4 RO FFT for N = 64
4.3 Splitting Tree for the Radix-3 RO FFT for N = 27
4.4 Splitting Tree for the Mixed Radix RO FFT for N = 72
4.5 Timing Data for 1024 Sequences on the IBM 3090J
4.6 Timing Data for 1024 Sequences on the Cray Y-MP8/864
4.7 Timing Model for 1024 Sequences on the IBM 3090J
4.8 Comparison of Timing Data for Handwritten Code and Automated Code for 1024 Sequences on the IBM 3090J

Acknowledgements
This work was generously supported by the IBM Federal Sector Division
Resident Study Program.


Chapter 1
Introduction
1.1 The Fourier Analysis Method
We begin with a brief overview of the Fourier analysis method. We first
present the method in one spatial dimension and then extend it to a
two-dimensional rectangle. The extension to higher dimensional rectangular
regions is analogous, but we will not pursue it. Finally, we discuss
operation counts for the Fourier analysis method and compare it to other
methods.
In one spatial dimension, the discretized Poisson equation is:
$$u_{n-1} - 2u_n + u_{n+1} = f_n$$
for $1 \le n \le M$. We must specify boundary conditions at both the left
and right endpoint. We may assume, without loss of generality, that the
boundary conditions are homogeneous, since inhomogeneous boundary values
may be absorbed into $f_1$ and $f_M$. The discrete, homogeneous boundary
conditions we consider, specified for $n = 1$, are shown in Table 1.1. Note
that we consider two variants of Dirichlet and Neumann boundary conditions,
depending upon whether the boundary coincides with a grid point or is
staggered at a half grid spacing. The notation D-N indicates a homogeneous
Dirichlet boundary condition at the left endpoint, and a homogeneous
Neumann boundary condition at the right endpoint. Similar notation will
be used for other combinations. Combinations which involve only C, D, or N
are referred to as standard grid boundary conditions. Combinations which
involve only DS or NS are referred to as staggered grid boundary conditions.
Other combinations are referred to as mixed grid boundary conditions.


The discretized boundary value problem may be written in matrix form
as:
$$A u = f \tag{1.1}$$
where $A$ is a matrix of dimension $M$, and $u$, $f$ are vectors of length $M$.
The boundary conditions have been used to eliminate $u_0$ and $u_{M+1}$. $A$ is
tridiagonal, and in one spatial dimension we would simply solve this linear
system by Gaussian elimination. However, in anticipation of extensions to
higher dimensions, we proceed as follows. First, we find the eigenvalues and
eigenvectors of $A$. These are summarized in Tables 1.2, 1.3, and 1.4. Note
that $A$ always has a full set of linearly independent eigenvectors whose
components are trigonometric expressions. Note also that in these tables the
computational domain is different for each boundary condition. The reason
for this will become clear after studying the corresponding symmetric FFT.
Appendix A provides an example of one technique for finding these
eigenvalues and eigenvectors. For this general discussion, we denote the
eigenvalues by $\lambda_k$ (repeated according to multiplicity) and the
corresponding eigenvectors by $\phi_k$ for $1 \le k \le M$. We now seek a
solution for $u$ in the form of an eigenvector expansion:
$$u = \sum_{k=1}^{M} \hat{u}_k \phi_k \tag{1.2}$$
This requires that we also express $f$ as an eigenvector expansion:
$$f = \sum_{k=1}^{M} \hat{f}_k \phi_k \tag{1.3}$$
Since $f$ is known and the vectors $\phi_k$ are linearly independent, we may
compute $\hat{f}_k$. Because the components of $\phi_k$ are trigonometric
expressions, $\hat{f}_k$ may be computed most efficiently by means of a
symmetric FFT. Thus, this step is referred to as Fourier analysis.

Table 1.1: Discrete Homogeneous Boundary Conditions
Acronym | Boundary Condition  | Discrete Analog
C       | Cyclic              | $u_0 = u_M$
D       | Dirichlet           | $u_0 = 0$
N       | Neumann             | $u_2 - u_0 = 0$
DS      | Dirichlet-Staggered | $u_1 + u_0 = 0$
NS      | Neumann-Staggered   | $u_1 - u_0 = 0$

Substituting equations (1.2) and (1.3) into equation (1.1) yields:
$$\sum_{k=1}^{M} \hat{f}_k \phi_k = A\Big[\sum_{k=1}^{M} \hat{u}_k \phi_k\Big] = \sum_{k=1}^{M} \hat{u}_k \lambda_k \phi_k$$
Since the vectors $\phi_k$ are linearly independent, we conclude:
$$\hat{u}_k \lambda_k = \hat{f}_k$$
for $1 \le k \le M$. We may now compute $\hat{u}_k$, unless $\lambda_k = 0$. In
this case, the compatibility condition $\hat{f}_k = 0$ must hold, and
$\hat{u}_k$ is arbitrary. Thus, the solution for $u$ is not unique in this case.
This occurs for C-C, N-N, NS-NS, N-NS, and NS-N boundary conditions, and
corresponds to the fact that the solutions to these problems are unique only
up to an additive constant. Having determined $\hat{u}_k$, $u$ may now be
computed using the inverse of the corresponding symmetric FFT. This step is
called Fourier synthesis.
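The three steps just described (Fourier analysis, division by the eigenvalues, Fourier synthesis) can be illustrated by a minimal sketch for D-D boundary conditions. This is not part of the thesis software. The program name FA1D, the size M = 7, and the right-hand side are assumptions chosen for illustration, and the expansions are evaluated by direct sums costing O(M^2) operations rather than by a symmetric FFT. With N = 2(M+1), the D-D eigenvectors have nth component sin(pi k n/(M+1)), the eigenvalues are -4 sin^2(pi k/(2(M+1))), and the factor 2/(M+1) comes from the orthogonality of these sine vectors.

```fortran
      PROGRAM FA1D
!     Sketch only: Fourier analysis method in one dimension for
!     D-D boundary conditions, with direct sums standing in for
!     the symmetric FFT.  N is the grid index, K the eigenvector
!     index.  M and the right-hand side are arbitrary choices.
      IMPLICIT NONE
      INTEGER M
      PARAMETER (M=7)
      INTEGER N, K
      DOUBLE PRECISION PI, U(0:M+1), F(M), FH(M), UH(M)
      DOUBLE PRECISION LAM, S, R, RMAX
      PI = 4.0D0*ATAN(1.0D0)
      DO N = 1, M
         F(N) = DBLE(N*N) - 3.0D0*DBLE(N)
      END DO
!     Fourier analysis: coefficients of F in the eigenvectors.
      DO K = 1, M
         S = 0.0D0
         DO N = 1, M
            S = S + F(N)*SIN(PI*K*N/(M+1))
         END DO
         FH(K) = 2.0D0*S/(M+1)
      END DO
!     Divide by the eigenvalues (all nonzero for D-D).
      DO K = 1, M
         LAM = -4.0D0*SIN(PI*K/(2*(M+1)))**2
         UH(K) = FH(K)/LAM
      END DO
!     Fourier synthesis: sum the eigenvector expansion.
      U(0) = 0.0D0
      U(M+1) = 0.0D0
      DO N = 1, M
         S = 0.0D0
         DO K = 1, M
            S = S + UH(K)*SIN(PI*K*N/(M+1))
         END DO
         U(N) = S
      END DO
!     Check the residual of the difference equation.
      RMAX = 0.0D0
      DO N = 1, M
         R = U(N-1) - 2.0D0*U(N) + U(N+1) - F(N)
         RMAX = MAX(RMAX, ABS(R))
      END DO
      PRINT *, 'MAX RESIDUAL = ', RMAX
      END
```

The printed residual should be at the level of rounding error. In one dimension this is more work than Gaussian elimination on the tridiagonal system; the structure only pays off in two or more dimensions, where the sums are replaced by symmetric FFTs.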
We now indicate how to extend the Fourier analysis method to a
two-dimensional rectangle. For simplicity, we assume that the number of
unknowns in each dimension is equal. In two spatial dimensions, the
discretized Poisson equation is:
$$u_{n-1,m} - 2u_{n,m} + u_{n+1,m} + \rho^2 [u_{n,m-1} - 2u_{n,m} + u_{n,m+1}] = f_{n,m}$$
for $1 \le n, m \le M$, where $\rho = \Delta x/\Delta y$. We assume that
homogeneous boundary conditions of the same type considered previously are
specified on all four sides of the rectangle. The discretized boundary value
problem may be written in matrix form as:
$$A u_m + \rho^2 [u_{m-1} - 2u_m + u_{m+1}] = f_m \tag{1.4}$$
for $1 \le m \le M$. $u_m$ is a vector of length $M$ with $n$th component
$u_{n,m}$, and likewise for $f_m$. $A$ is the same $M$-dimensional matrix as in
the corresponding one-dimensional problem. As before, we seek a solution for
$u_m$ in the form of an eigenvector expansion:
$$u_m = \sum_{k=1}^{M} \hat{u}_{k,m} \phi_k \tag{1.5}$$
Table 1.2: Eigenstructure for the Standard Grid
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
C-C | R FFT | $\cos(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N-1$ | $0 \le k \le N/2$ or $0 \le k \le (N-1)/2$
C-C | R FFT | $\sin(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N-1$ | $1 \le k \le N/2-1$ or $1 \le k \le (N-1)/2$
N-N | RE FFT | $\cos(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N/2$ | $0 \le k \le N/2$
D-D | RO FFT | $\sin(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $1 \le n \le N/2-1$ | (illegible in source)
N-D | RE-O FFT | $\cos[2\pi n(2k+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$
D-N | RO-E FFT | $\sin[2\pi n(2k-1)/N]$ | $-4\sin^2[\pi(2k-1)/N]$ | $1 \le n \le N/4$ | $1 \le k \le N/4$

Table 1.3: Eigenstructure for the Staggered Grid
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
NS-NS | RSE FST | $\cos[\pi k(2n+1)/N]$ | $-4\sin^2(\pi k/N)$ | (illegible in source) | (illegible in source)
DS-DS | RSO FST | $\sin[\pi k(2n+1)/N]$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N/2-1$ | $1 \le k \le N/2$
NS-DS | RSE-SO FST | $\cos[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$
DS-NS | RSO-SE FST | $\sin[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$

Table 1.4: Eigenstructure for the Mixed Grid, with $N = 2(2M+1)$
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
N-NS | RE-E FFT | $\cos(4\pi kn/N)$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M$ | $0 \le k \le M$
N-DS | RE-O FFT | $\cos[2\pi n(2k+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | (illegible in source) | (illegible in source)
D-NS | RO-E FFT | $\sin[2\pi n(2k-1)/N]$ | $-4\sin^2[\pi(2k-1)/N]$ | $1 \le n \le M$ | (illegible in source)
D-DS | RO-O FFT | $\sin(4\pi kn/N)$ | $-4\sin^2(2\pi k/N)$ | $1 \le n \le M$ | $1 \le k \le M$
NS-N | RSE-SE FST | $\cos[2\pi k(2n+1)/N]$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M$ | (illegible in source)
NS-D | RSE-SO FST | $\cos[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le M-1$ | $0 \le k \le M-1$
DS-N | RSO-SE FST | $\sin[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le M$ | $0 \le k \le M$
DS-D | RSO-SO FST | $\sin[2\pi k(2n+1)/N]$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M-1$ | $1 \le k \le M$

This requires that we also express $f_m$ as an eigenvector expansion:
$$f_m = \sum_{k=1}^{M} \hat{f}_{k,m} \phi_k \tag{1.6}$$
$\hat{f}_{k,m}$ may be computed most efficiently by performing $M$ symmetric
FFTs of length $M$. Substituting equations (1.5) and (1.6) into equation (1.4)
yields:
$$\sum_{k=1}^{M} \hat{f}_{k,m} \phi_k = A \sum_{k=1}^{M} \hat{u}_{k,m} \phi_k + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \phi_k$$
$$= \sum_{k=1}^{M} \lambda_k \hat{u}_{k,m} \phi_k + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \phi_k$$
Since the vectors $\phi_k$ are linearly independent, we conclude:
$$\rho^2 \hat{u}_{k,m-1} + (\lambda_k - 2\rho^2) \hat{u}_{k,m} + \rho^2 \hat{u}_{k,m+1} = \hat{f}_{k,m}$$
for $1 \le k, m \le M$. We now obtain $\hat{u}_{k,m}$ by solving $M$ tridiagonal
linear systems of dimension $M$ by Gaussian elimination. For C-C, N-N, NS-NS,
N-NS, or NS-N boundary conditions, one of these linear systems is singular.
In this case, $\hat{f}_{k,m}$ must satisfy a compatibility condition, and the
solution for $\hat{u}_{k,m}$ is not unique. Having determined $\hat{u}_{k,m}$,
$u_m$ may be computed by performing $M$ symmetric FFTs of length $M$.
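The two-dimensional procedure can be sketched in the same way for D-D boundary conditions on all four sides. Again this is an illustration under stated assumptions, not the thesis software: the program name FA2D, the size M = 6, the ratio rho = 1.5, and the right-hand side are arbitrary; J plays the role of the index m; direct sums replace the symmetric FFTs; and the tridiagonal systems are solved by Gaussian elimination without pivoting, which is safe here because each system is strictly diagonally dominant.

```fortran
      PROGRAM FA2D
!     Sketch only: Fourier analysis method on an M x M grid with
!     D-D boundary conditions in both directions.  J is the index
!     m of the text.  Sizes and data are arbitrary choices.
      IMPLICIT NONE
      INTEGER M
      PARAMETER (M=6)
      INTEGER N, K, J
      DOUBLE PRECISION PI, RHO2, B, W, S, R, RMAX
      DOUBLE PRECISION F(M,M), FH(M,M), UH(M,M)
      DOUBLE PRECISION U(0:M+1,0:M+1), CP(M), DP(M)
      PI = 4.0D0*ATAN(1.0D0)
      RHO2 = 1.5D0**2
      DO J = 1, M
         DO N = 1, M
            F(N,J) = DBLE(N) - 0.5D0*DBLE(J*J)
         END DO
      END DO
!     Fourier analysis in n for each fixed m.
      DO J = 1, M
         DO K = 1, M
            S = 0.0D0
            DO N = 1, M
               S = S + F(N,J)*SIN(PI*K*N/(M+1))
            END DO
            FH(K,J) = 2.0D0*S/(M+1)
         END DO
      END DO
!     One tridiagonal solve in m for each k.
      DO K = 1, M
         B = -4.0D0*SIN(PI*K/(2*(M+1)))**2 - 2.0D0*RHO2
         CP(1) = RHO2/B
         DP(1) = FH(K,1)/B
         DO J = 2, M
            W = B - RHO2*CP(J-1)
            CP(J) = RHO2/W
            DP(J) = (FH(K,J) - RHO2*DP(J-1))/W
         END DO
         UH(K,M) = DP(M)
         DO J = M-1, 1, -1
            UH(K,J) = DP(J) - CP(J)*UH(K,J+1)
         END DO
      END DO
!     Fourier synthesis in n for each fixed m.
      DO J = 0, M+1
         DO N = 0, M+1
            U(N,J) = 0.0D0
         END DO
      END DO
      DO J = 1, M
         DO N = 1, M
            S = 0.0D0
            DO K = 1, M
               S = S + UH(K,J)*SIN(PI*K*N/(M+1))
            END DO
            U(N,J) = S
         END DO
      END DO
!     Residual of the two-dimensional difference equation.
      RMAX = 0.0D0
      DO J = 1, M
         DO N = 1, M
            R = U(N-1,J) - 2.0D0*U(N,J) + U(N+1,J)
            R = R + RHO2*(U(N,J-1) - 2.0D0*U(N,J) + U(N,J+1))
            R = R - F(N,J)
            RMAX = MAX(RMAX, ABS(R))
         END DO
      END DO
      PRINT *, 'MAX RESIDUAL = ', RMAX
      END
```

The printed residual should again be at rounding level. The analysis and synthesis loops are the parts that the compact symmetric FFTs are designed to replace.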
We conclude this section with a discussion of operation counts for the
Fourier analysis method, and a comparison of it to other methods for solving
the discrete Poisson equation. The Fourier analysis method is efficient
only for two or more dimensions. As before, we restrict our discussion to
two dimensions. The operation count for an $M \times M$ grid, where $M$ is a
power of two, is easily obtained from the description of the algorithm
above. We performed $2M$ symmetric FFTs of length $M$, each of which requires
$O(M \log M)$ operations. We solved $M$ tridiagonal linear systems of
dimension $M$ by Gaussian elimination, each of which requires $O(M)$
operations. Thus, the asymptotic operation count for the entire algorithm is
$O(M^2 \log M)$. The operation counts for other methods of solving the
discrete Poisson equation are summarized in Table 1.5. The source of this
information is [8]. The FACR($l$) method combines the cyclic reduction and
Fourier analysis methods.

Table 1.5: Operation Counts for 2D Poisson Solvers
Method                         | Operation Count
Gaussian Elimination           | $O(M^4)$
Successive Over-Relaxation     | $O(M^2 \log M)$
Alternating Direction Implicit | $O(M^2 \log^2 M)$
Cyclic Reduction               | $O(M^2 \log M)$
Fourier Analysis               | $O(M^2 \log M)$
FACR($l$)                      | $O(M^2 \log \log M)$
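The count for the Fourier analysis method can be tallied term by term from the two stages of the algorithm. The display below only restates that arithmetic; no particular constants are implied.

```latex
\[
\underbrace{2M \cdot O(M \log M)}_{\text{analysis and synthesis}}
\;+\;
\underbrace{M \cdot O(M)}_{\text{tridiagonal solves}}
\;=\; O(M^2 \log M) + O(M^2)
\;=\; O(M^2 \log M).
\]
```

As a rough illustration, assuming unit constants and base-2 logarithms, $M = 1024$ gives $2M^2 \log_2 M = 20{,}971{,}520$ for the transforms against $M^2 = 1{,}048{,}576$ for the tridiagonal solves, so the transforms dominate the cost.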
1.2 The New FFT and FST Algorithms
From the discussion of the Fourier analysis method in Section 1.1, it is
evident that FFT algorithms form the core of this method. Our goal is to
provide the best possible FFT algorithms for this purpose, and to address
all of the boundary conditions in Tables 1.2, 1.3, and 1.4. In this section,
we summarize the new contributions to FFT literature contained herein.
For each of the boundary conditions in Tables 1.2, 1.3, and 1.4 an FFT
algorithm has been developed which computes the coefficients in the corre-
sponding eigenvector expansion as efficiently as possible by eliminating all
redundant computations which would occur in the full complex FFT, and
without pre- or post-processing. Such FFT algorithms are referred to as
compact symmetric FFTs. The older pre- and post-processing algorithms
are described in detail in [2, 10]. Pre- and post-processing steps contribute
only low order terms to operation counts. However, for sequences of prac-
tical length these low order terms may be significant. Furthermore, these
algorithms require additional data accesses which also contribute to the to-
tal execution time. Thus, compact symmetric FFTs eliminate the additional
operations and data accesses associated with pre- and post-processing algo-
rithms. Pre- and post-processing algorithms also have the restriction that
the length of the sequence must be even. A compact symmetric FFT has
long been available for real sequences, known as Edson's algorithm. In
[4], a compact symmetric FFT for real even sequences is introduced, but
in the context of Clenshaw-Curtis quadrature. In [10], in-place compact
symmetric FFTs are developed for real, even, odd, quarterwave even, and
quarterwave odd symmetries. All in-place algorithms based on the splitting
method require either the input or output sequence to be in a permuted
order, referred to as bit-reversed order. These in-place algorithms require
the input sequence in physical space to be in bit-reversed order, and produce
the forward transform in natural order. From our discussion of the Fourier
analysis method, it is clear that this is the opposite of what is desired. In
[1], analogous algorithms are developed which accept the input sequence
in physical space in natural order, and produce the forward transform in
bit-reversed order. We follow the general approach set forth in [1].
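For readers unfamiliar with bit-reversed order, the following minimal sketch prints each index k alongside its bit-reversed counterpart for a radix-2 sequence of length N = 8. It is an illustration only: the program name BITREV and the choice LOGN = 3 are assumptions, and the permuted orders used by the mixed radix algorithms of this thesis generalize this radix-2 case.

```fortran
      PROGRAM BITREV
!     Sketch only: bit-reversed index R of each K for N = 2**LOGN.
      IMPLICIT NONE
      INTEGER LOGN, N, K, J, I, R
      PARAMETER (LOGN=3, N=2**LOGN)
      DO K = 0, N-1
         R = 0
         J = K
!        Peel off the low-order bits of K and push them onto R.
         DO I = 1, LOGN
            R = 2*R + MOD(J, 2)
            J = J/2
         END DO
         PRINT *, K, R
      END DO
      END
```

For N = 8 the natural order 0, 1, 2, 3, 4, 5, 6, 7 is mapped to 0, 4, 2, 6, 1, 5, 3, 7. The permutation is its own inverse, which is why a forward transform that emits bit-reversed output pairs naturally with an inverse transform that accepts bit-reversed input.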
With this background, we may now summarize our new contributions
to FFT literature. The algorithms in [1] were developed for radix-2 only.
We have generalized all of these to radix-p, for a general factor p. This
has resulted in a number of new intermediate symmetries which occur in
the course of the splitting method. After obtaining the combine equations
for the inverse transform, they must be inverted to obtain those for the
forward transform. For the radix-p algorithms, this requires the inversion of
many systems of p equations in p unknowns. We have exploited the special
nature of these systems of equations to invert them in closed form. The real
quarterwave even and quarterwave odd transforms, which we refer to as the
real staggered even (RSE) and real staggered odd (RSO) FFTs, have been
used for N-D and D-N boundary conditions respectively. We have shown
that the algorithms for these symmetries in [1] are not in-place. We have
developed two new compact symmetric FFTs, called real composite even-
odd (RE-O) and composite odd-even (RO-E) for these boundary conditions.
We have shown that these new algorithms are in-place and obtain the goal of
eliminating all redundant operations which would occur in the full complex
FFT.
For staggered grid boundary conditions, we have developed new algo-
rithms based on a variant of the DFT which we refer to as the discrete stag-
gered transform (DST). In analogy to the FFT, we have developed efficient
algorithms for computing the DST, which we refer to as the fast staggered
transform (FST). Previously, the only known algorithms for staggered grid
boundary conditions were the real quarterwave even and quarterwave odd
FFTs, and the pre- and post-processing algorithms in [6]. The real quarter-
wave even and quarterwave odd FFTs have been used for NS-NS and DS-DS
boundary conditions respectively, but the algorithms for these symmetries
in [1] are not in-place. The pre- and post-processing algorithms for NS-DS
and DS-NS boundary conditions are less efficient than the new compact
symmetric FSTs for the same general reasons discussed previously.
For mixed grid boundary conditions, we have developed new algorithms
based on superimposing two symmetries. We refer to the resulting sym-
metries as composite symmetries. Previously, the only known algorithms
for mixed grid boundary conditions were the pre- and post-processing al-
gorithms in [6] for NS-D and D-NS boundary conditions. Again, the pre-
and post-processing algorithms are less efficient than the new compact algo-
rithms. Furthermore, we have developed compact algorithms for six mixed
grid boundary conditions which previously could not be solved by Fourier
methods.


Chapter 2
Fast Fourier Transforms
2.1 Complex (C)
We begin by reviewing the fast Fourier transform, and establishing
notation which will be used throughout.

Definition 2.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the forward
discrete Fourier transform (DFT) is defined by:
$$X_k = 1/N \sum_{n=0}^{N-1} x_n \omega_N^{-kn} \tag{2.1}$$
for $0 \le k \le N-1$, where:
$$\omega_N = e^{i2\pi/N}$$
For convenience, we will often suppress the constant $1/N$.

The following theorem provides the inverse discrete Fourier transform
(IDFT). We omit the proof of this result because it is well known.

Theorem 2.1 A C sequence $x_n$ may be recovered from its DFT $X_k$ by the
inverse discrete Fourier transform (IDFT) which is given by:
$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{kn} \tag{2.2}$$
for $0 \le n \le N-1$.

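Definition 2.1 and Theorem 2.1 can be checked numerically by evaluating both sums directly. The sketch below is an illustration under stated assumptions, not one of the thesis codes: the program name DFTCHK, the length N = 6, and the test sequence are arbitrary. It places the constant 1/N and the negative exponent in the forward transform, matching the sign convention of the combine equations later in this section.

```fortran
      PROGRAM DFTCHK
!     Sketch only: direct evaluation of the DFT (2.1) followed by
!     the IDFT (2.2), with a check that the input is recovered.
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N=6)
      INTEGER J, K
      REAL TWOPI, ERR
      COMPLEX X(0:N-1), XK(0:N-1), Y(0:N-1), W, S
      TWOPI = 8.0*ATAN(1.0)
!     W is omega_N = exp(i*2*pi/N).
      W = CMPLX(COS(TWOPI/N), SIN(TWOPI/N))
      DO J = 0, N-1
         X(J) = CMPLX(REAL(J), REAL(J*J) - 2.0)
      END DO
!     Forward transform, including the constant 1/N.
      DO K = 0, N-1
         S = (0.0, 0.0)
         DO J = 0, N-1
            S = S + X(J)*W**(-K*J)
         END DO
         XK(K) = S/N
      END DO
!     Inverse transform.
      DO J = 0, N-1
         S = (0.0, 0.0)
         DO K = 0, N-1
            S = S + XK(K)*W**(K*J)
         END DO
         Y(J) = S
      END DO
      ERR = 0.0
      DO J = 0, N-1
         ERR = MAX(ERR, ABS(Y(J) - X(J)))
      END DO
      PRINT *, 'MAX ROUND-TRIP ERROR = ', ERR
      END
```

The reported error should be at the level of single precision rounding. Each direct sum costs O(N) operations per output term, which is the O(N^2) cost that the FFT avoids.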
By Definition 2.1, the sequences $x_n$ and $X_k$ are of length $N$. These
sequences can be extended to all integral values of $n$ and $k$ using the
periodicity properties provided by the following corollary.

Corollary 2.1 Equations (2.1) and (2.2) imply that the sequences $x_n$ and
$X_k$ may be extended periodically to all integral values of $n$ and $k$ by:
$$x_{N+n} = x_n$$
$$X_{N+k} = X_k$$
We will develop fast algorithms for computing the DFT and IDFT which
are based on the Cooley-Tukey fast Fourier transform (FFT). Following the
general approach in [1], we will develop algorithms for the IDFT given $X_k$
in bit-reversed order. Inverting these yields algorithms for the DFT given
$x_n$ in natural order. We begin by defining notation which will be needed in
the development of these algorithms.

Definition 2.2 Given a C sequence $X_k$ of length $N$, and a factor $p$ of $N$,
we define a splitting of $X_k$ consisting of the following $p$ subsequences,
each of length $N/p$:
$$X_{k,q} = X_{pk+q}$$
for $0 \le k \le N/p-1$, $0 \le q \le p-1$. We denote the IDFT of these by
$y_{n,q}$. That is:
$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q} \omega_{N/p}^{kn}$$
for $0 \le n \le N/p-1$, $0 \le q \le p-1$.
Given a C sequence $x_n$ of length $N$, and a factor $p$ of $N$, we define the
following $p$ subsequences, each of length $N/p$:
$$x_{n,l} = x_{lN/p+n}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

The inverse fast Fourier transform (IFFT) is based on the principle of
computing the quantities $y_{n,q}$, and then combining these in the appropriate
fashion to obtain $x_n$. The precise equation for performing this combining
operation is provided by the next theorem.

Theorem 2.2 The inverse combine equation for C sequences is:
$$x_{n,l} = \sum_{q=0}^{p-1} \omega_p^{lq} \omega_N^{nq} y_{n,q} \tag{2.3}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

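We now prove Theorem 2.2.
$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{kn}$$
$$= \sum_{q=0}^{p-1} \sum_{k=0}^{N/p-1} X_{pk+q} \omega_N^{(pk+q)n}$$
$$= \sum_{q=0}^{p-1} \omega_N^{nq} \sum_{k=0}^{N/p-1} X_{k,q} \omega_{N/p}^{kn}$$
$$= \sum_{q=0}^{p-1} \omega_N^{nq} y_{n,q}$$
In terms of the subsequence notation defined previously, this result is:
$$x_{n,l} = x_{lN/p+n}$$
$$= \sum_{q=0}^{p-1} \omega_N^{q(lN/p+n)} y_{lN/p+n,q}$$
$$= \sum_{q=0}^{p-1} \omega_p^{lq} \omega_N^{nq} y_{n,q}$$
The last step uses $\omega_N^{qlN/p} = \omega_p^{lq}$ and the periodicity of
$y_{n,q}$ in $n$ with period $N/p$. This completes the proof of Theorem 2.2.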
The following corollary provides an important special case of this result.
This is the same as equation (2) in [1], except that we are working with the
IDFT.

Corollary 2.2 Assume $p = 2$. The inverse combine equation for C
sequences is:
$$x_{n,0} = y_{n,0} + \omega_N^{n} y_{n,1}$$
$$x_{n,1} = y_{n,0} - \omega_N^{n} y_{n,1}$$
for $0 \le n \le N/2-1$.

We may now describe the IFFT algorithm for a C sequence with length
a power of two. Figure 2.1 is a splitting tree diagram which represents this
algorithm for a C sequence of length eight. The original sequence is split
into two subsequences, one consisting of the even numbered terms, and the
other consisting of the odd numbered terms. Assume, for the moment, that
the IDFT of each subsequence is known. Then the IDFT of the original
sequence may be obtained by applying Corollary 2.2. The algorithm now
continues recursively. That is, the IDFT of each subsequence is computed
by splitting them and repeating the steps above. Eventually, subsequences
of length one will be obtained. Since a sequence of length one is its own
IDFT, the recursive process terminates at this point.
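The recursive description above can be written down almost verbatim. The sketch below is an illustration under stated assumptions and is not the in-place algorithm of this thesis: it works out of place with temporary arrays, takes $X_k$ in natural rather than bit-reversed order, and relies on the Fortran 90 RECURSIVE attribute. The names IFFT2 and TIFFT and the test data are hypothetical. The combine step is Corollary 2.2, and the result is compared with a direct evaluation of the IDFT sum.

```fortran
      PROGRAM TIFFT
!     Sketch only: compare the recursive radix-2 IFFT with a
!     direct evaluation of the IDFT for N = 8.
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N=8)
      INTEGER J, K
      REAL TWOPI, ERR
      COMPLEX XK(0:N-1), X(0:N-1), S, W
      TWOPI = 8.0*ATAN(1.0)
      W = CMPLX(COS(TWOPI/N), SIN(TWOPI/N))
      DO K = 0, N-1
         XK(K) = CMPLX(REAL(K+1), REAL(K*K) - 3.0)
      END DO
      CALL IFFT2(N, XK, X)
      ERR = 0.0
      DO J = 0, N-1
         S = (0.0, 0.0)
         DO K = 0, N-1
            S = S + XK(K)*W**(K*J)
         END DO
         ERR = MAX(ERR, ABS(S - X(J)))
      END DO
      PRINT *, 'MAX ERROR VS DIRECT IDFT = ', ERR
      END

      RECURSIVE SUBROUTINE IFFT2(N, XK, X)
!     IDFT of XK (length N, a power of two) returned in X.
      IMPLICIT NONE
      INTEGER N, K
      COMPLEX XK(0:N-1), X(0:N-1)
      COMPLEX E(0:N/2), O(0:N/2), YE(0:N/2), YO(0:N/2), W
      REAL TWOPI
!     A sequence of length one is its own IDFT.
      IF (N .EQ. 1) THEN
         X(0) = XK(0)
         RETURN
      END IF
      TWOPI = 8.0*ATAN(1.0)
!     Split into even and odd numbered terms.
      DO K = 0, N/2-1
         E(K) = XK(2*K)
         O(K) = XK(2*K+1)
      END DO
      CALL IFFT2(N/2, E, YE)
      CALL IFFT2(N/2, O, YO)
!     Combine the two half-length IDFTs (Corollary 2.2).
      DO K = 0, N/2-1
         W = CMPLX(COS(TWOPI*K/N), SIN(TWOPI*K/N))
         X(K) = YE(K) + W*YO(K)
         X(K+N/2) = YE(K) - W*YO(K)
      END DO
      END
```

The temporary arrays make the recursion easy to read but cost extra storage and data movement. Avoiding exactly that overhead is the motivation for the in-place formulation with bit-reversed input.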
We now begin the development of the FFT algorithm. We will obtain
the forward combine equation for the FFT by inverting the inverse combine
equation. For this, we will need the following orthogonality property.

Lemma 2.1 If $N$ is a positive integer, and $0 \le j, n \le N-1$, then:
$$\sum_{k=0}^{N-1} \omega_N^{k(j-n)} = \begin{cases} N & \text{if } j = n \\ 0 & \text{otherwise} \end{cases}$$
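As a small concrete check of Lemma 2.1, take $N = 4$, so that $\omega_4 = e^{i\pi/2} = i$. The two cases of the lemma then read as follows (the choice $j - n = 1$ is only an example):

```latex
\[
\sum_{k=0}^{3} \omega_4^{k \cdot 1} = 1 + i + i^2 + i^3 = 1 + i - 1 - i = 0
\quad (j - n = 1),
\qquad
\sum_{k=0}^{3} \omega_4^{k \cdot 0} = 1 + 1 + 1 + 1 = 4
\quad (j = n).
\]
```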
We now prove Lemma 2.1. The case $j = n$ is obvious. For $j \ne n$, define:
$$y = \omega_N^{j-n}$$
Summing the finite geometric series yields:
$$\sum_{k=0}^{N-1} \omega_N^{k(j-n)} = \sum_{k=0}^{N-1} y^k = (1 - y^N)/(1 - y) = (1 - 1)/(1 - y) = 0$$
This completes the proof of Lemma 2.1. The forward combine equation for
the FFT is now provided by the following theorem.

Theorem 2.3 The forward combine equation for C sequences is:
$$y_{n,q} = 1/p \; \omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq} x_{n,l} \tag{2.4}$$
for $0 \le n \le N/p-1$, $0 \le q \le p-1$.

Figure 2.1: Splitting tree for complex FFT

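We now prove Theorem 2.3.
$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q} \omega_{N/p}^{kn}$$
$$= \sum_{k=0}^{N/p-1} X_{pk+q} \omega_{N/p}^{kn}$$
$$= \sum_{k=0}^{N/p-1} \Big[ 1/N \sum_{j=0}^{N-1} x_j \omega_N^{-(pk+q)j} \Big] \omega_{N/p}^{kn}$$
$$= 1/N \sum_{j=0}^{N-1} x_j \omega_N^{-qj} \sum_{k=0}^{N/p-1} \omega_{N/p}^{k(n-j)}$$
$$= 1/p \sum_{l=0}^{p-1} x_{lN/p+n} \omega_N^{-q(lN/p+n)}$$
$$= 1/p \; \omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq} x_{n,l}$$
The next to last step follows from Lemma 2.1 applied with $N/p$ in place of
$N$: the inner sum equals $N/p$ when $j = lN/p + n$ for some $0 \le l \le p-1$,
and vanishes otherwise. This completes the proof of Theorem 2.3.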
The following corollary provides an important special case of this result.
This is the same as equation (13) in [1], except that we are working with
the IDFT.

Corollary 2.3 Assume $p = 2$. The forward combine equation for C sequences is:
$$y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$y_{n,1} = \omega_N^{-n}\,(x_{n,0} - x_{n,1})/2$$
for $0 \le n \le N/2-1$.
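
As a hedged numerical illustration of Corollary 2.3, the Python sketch below applies the $p = 2$ forward combine step to an arbitrary complex sequence and compares the result with the IDFTs of the even and odd numbered terms of its DFT. The helper names are hypothetical, and the conventions assumed are $X_k = \frac{1}{N}\sum_n x_n \omega_N^{-kn}$, $x_n = \sum_k X_k \omega_N^{kn}$, $x_{n,l} = x_{lN/2+n}$ and $X_{k,q} = X_{2k+q}$.

```python
import cmath

def dft(x):
    """Forward transform with 1/N scaling (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    """Inverse transform without scaling (assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def forward_combine_p2(x):
    """Radix-2 forward combine: returns (y_{n,0}, y_{n,1}) for n = 0..N/2-1."""
    N = len(x)
    h = N // 2
    y0 = [(x[n] + x[n + h]) / 2 for n in range(h)]
    y1 = [cmath.exp(-2j * cmath.pi * n / N) * (x[n] - x[n + h]) / 2 for n in range(h)]
    return y0, y1

x = [complex(n * n, 1 - n) for n in range(8)]
X = dft(x)
y0, y1 = forward_combine_p2(x)
err = max(abs(a - b) for a, b in zip(y0 + y1, idft(X[0::2]) + idft(X[1::2])))
```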
We close this section by presenting the FFT and IFFT algorithms for
complex sequences with length a power of two. We emphasize that this
FFT is an in-place algorithm which accepts the input sequence $x_n$ in natural
order, and produces the forward transform $X_k$ in bit-reversed order. The
IFFT is an in-place algorithm which accepts the sequence $X_k$ in bit-reversed
order, and produces the inverse transform $x_n$ in natural order. These
algorithms may be used together in such a way that reordering of the data
is never required. We will not include complete algorithm specifications
such as these for all of the symmetric FFTs presented later. However, the
algorithms presented here should provide a guideline for developing complete
algorithms from forward and inverse combine equations. The codes
are written in FORTRAN, and are patterned after similar codes found in
[9].
C
C     TEST DRIVER FOR COMPLEX FFT
C
      PARAMETER (LOGN=3,N=2**LOGN)
      COMPLEX X(0:N-1)
      COMPLEX OMEGA(0:N-1)
      COMMON /FCCOM/ L,OMEGA
      DO 100 I=0,N-1
         X(I) = CMPLX(1.0,0.0)
  100 CONTINUE
      WRITE(6,1) (X(I),I=0,N-1)
    1 FORMAT(1H ,'COMPLEX SEQUENCE = ',4(/,4E13.4))
      CALL FCI(LOGN)
      CALL FFC(LOGN,X)
      WRITE(6,2) (X(I),I=0,N-1)
    2 FORMAT(1H1,'FORWARD TRANSFORM = ',4(/,4E13.4))
      CALL FIC(LOGN,X)
      WRITE(6,3) (X(I),I=0,N-1)
    3 FORMAT(1H ,'INVERSE TRANSFORM = ',4(/,4E13.4))
      END
C
C     FOURIER TRANSFORM
C     COMPLEX SEQUENCE
C     INITIALIZATION
C
C     STORES OMEGA(I) = EXP(2*PI*SQRT(-1)*I/L) FOR I = 0,...,L-1.
C     OMEGA IS DECLARED (0:0) HERE; STORAGE IS PROVIDED BY THE DRIVER.
C
      SUBROUTINE FCI(LOGN)
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      L = 2**LOGN
      OMEGA(0) = 1.0
      TPIDL = 8.0*ATAN(1.0)/L
      OMEGA(1) = CMPLX(COS(TPIDL),SIN(TPIDL))
      DO 100 I=2,L-1
         OMEGA(I) = OMEGA(I-1)*OMEGA(1)
  100 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     FORWARD DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FFC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         NS = 2**(I-1)
         LS = N/NS
         CALL CF(NS,LS,X)
  100 CONTINUE
C     APPLY THE 1/N SCALING OF THE FORWARD TRANSFORM
      DO 200 I=0,N-1
         X(I) = X(I)/N
  200 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     FORWARD COMBINE
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CF(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = X(I,0,J) + X(I,1,J)
            X(I,1,J) = CONJG(OMEGA(I*L/LS))*(X(I,0,J) - X(I,1,J))
            X(I,0,J) = TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     INVERSE DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FIC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         LS = 2**I
         NS = N/LS
         CALL CI(NS,LS,X)
  100 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     INVERSE COMBINE
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CI(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = OMEGA(I*L/LS)*X(I,1,J)
            X(I,1,J) = X(I,0,J) - TMP1
            X(I,0,J) = X(I,0,J) + TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END
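
For readers who wish to experiment without a FORTRAN compiler, the following Python sketch mirrors the structure of the forward and inverse routines above. It is an informal port under stated assumptions, not part of the thesis codes: the function names are hypothetical, the twiddle factors are recomputed instead of being read from a table, and lists are copied rather than overwritten. It checks that the forward output appears in bit-reversed order and that the inverse recovers the input with no reordering step.

```python
import cmath

def forward_fft(x):
    """Natural-order input, bit-reversed output, scaled by 1/N."""
    x = list(x)
    N = len(x)
    ls = N                                  # current subsequence length
    while ls >= 2:
        half = ls // 2
        for start in range(0, N, ls):       # loop over the subsequences
            for i in range(half):
                a, b = x[start + i], x[start + half + i]
                x[start + i] = a + b
                x[start + half + i] = cmath.exp(-2j * cmath.pi * i / ls) * (a - b)
        ls = half
    return [v / N for v in x]

def inverse_fft(X):
    """Bit-reversed input, natural-order output."""
    x = list(X)
    N = len(x)
    ls = 2
    while ls <= N:
        half = ls // 2
        for start in range(0, N, ls):
            for i in range(half):
                t = cmath.exp(2j * cmath.pi * i / ls) * x[start + half + i]
                x[start + half + i] = x[start + i] - t
                x[start + i] = x[start + i] + t
        ls *= 2
    return x

def bit_reverse(j, bits):
    """Reverse the lowest `bits` binary digits of j."""
    return int(format(j, "0{}b".format(bits))[::-1], 2)

x = [complex(n, 1 - 2 * n) for n in range(8)]
X = forward_fft(x)
reference = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 8) for n in range(8)) / 8
             for k in range(8)]
order_err = max(abs(X[j] - reference[bit_reverse(j, 3)]) for j in range(8))
round_trip_err = max(abs(a - b) for a, b in zip(inverse_fft(X), x))
```

Each inverse stage undoes the matching forward stage up to a factor of two, which is why the single division by $N$ in the forward direction is the only scaling required.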

2.2 Real (R)
In this section we will be concerned with the following symmetries:

Definition 2.3 A real (R) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n$$
A conjugate symmetric (CS) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k} = \bar{X}_k$$
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.

Lemma 2.2 If $x_n$ is an R sequence of length $N$, then its DFT $X_k$ is a CS
sequence of length $N$. If $X_k$ is a CS sequence of length $N$, then its IDFT
$x_n$ is an R sequence of length $N$.

The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for C-C boundary con-
ditions. Since an R sequence is also periodic with length N, it satisfies C-C
boundary conditions for the computational domain $0 \le n \le N-1$.

Theorem 2.4 Let $x_n$ be an R sequence and let $X_k$ be its CS symmetric
DFT, both of length $N$. The real form of the DFT is:
$$\mathrm{Re}(X_k) = \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos(2\pi kn/N)$$
$$\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin(2\pi kn/N)$$
for $0 \le k \le N-1$. If $N$ is even, then the real form of the IDFT is:
$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1}
\{2\,\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k) \sin(2\pi kn/N)\}$$
for $0 \le n \le N-1$. If $N$ is odd, we obtain instead:
$$x_n = X_0 + \sum_{k=1}^{(N-1)/2}
\{2\,\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k) \sin(2\pi kn/N)\}$$
for $0 \le n \le N-1$.
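
The real form of the IDFT for even $N$ can be checked with a short Python sketch. The function names are hypothetical and the DFT convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$ is assumed; only the coefficients $X_0, \ldots, X_{N/2}$ are passed to the reconstruction, which reflects the remark that half of the CS sequence suffices.

```python
import cmath
import math

def dft(x):
    """Forward transform with 1/N scaling (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def real_form_idft(X_half, N):
    """Rebuild a real sequence of even length N from X_0..X_{N/2}."""
    out = []
    for n in range(N):
        s = X_half[0].real + (-1) ** n * X_half[N // 2].real
        for k in range(1, N // 2):
            a = 2 * math.pi * k * n / N
            s += 2 * X_half[k].real * math.cos(a) - 2 * X_half[k].imag * math.sin(a)
        out.append(s)
    return out

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
X = dft(x)
err = max(abs(a - b) for a, b in zip(real_form_idft(X[:5], 8), x))
```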
We now prove Theorem 2.4. The result for the DFT follows immediately
from Definition 2.1 and the R symmetry of $x_n$. Note that only half of the
CS sequence $X_k$ needs to be specified. We prove the result for the IDFT for
the case of even $N$ only, since the proof for odd $N$ is similar. Using the CS
symmetry of $X_k$ yields:
$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k\,\omega_N^{kn} \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn}
 + \sum_{k=1}^{N/2-1} X_{N-k}\,\omega_N^{(N-k)n} \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn}
 + \sum_{k=1}^{N/2-1} \bar{X}_k\,\omega_N^{-kn} \\
&= X_0 + (-1)^n X_{N/2} + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn}\Big] \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1}
 \{2\,\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k) \sin(2\pi kn/N)\}
\end{aligned}$$
This completes the proof of Theorem 2.4.
We now develop a fast, mixed radix algorithm for computing the R
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an R
sequence of length $N$ may be stored in $N$ real storage locations, compared to
$2N$ real storage locations for a C sequence of length $N$. Also, a CS sequence
of length $N$ may be stored in $N$ real storage locations because half of the
sequence is redundant and need not be stored. Our goal is to exploit these
symmetries in the data in order to obtain a reduction by half in both storage
requirements and number of operations compared to that for C sequences.
This algorithm is based on the symmetries which occur in the splittings of
the CS sequence $X_k$. We begin developing this algorithm by defining all of
the intermediate symmetries involved.

Definition 2.4 Let $X_k$ be a CS sequence of length $N$ with factor $p$. For
$q \ne 0$, we define CS induced intersequence symmetry (CSIS) by:
$$X_{k,p-q} = \bar{X}_{N/p-k-1,q}$$
For $q \ne 0$, we denote subsequence $X_{k,q}$ by CSIS($q$). Subsequence $p-q$
is a redundant copy of subsequence $q$, which we denote by CSIS($p-q$) =
$\overline{\mathrm{CSIS}}$($q$). We also say that subsequence $p-q$ is the dual of subsequence $q$.
A staggered conjugate symmetric (SCS) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k-1} = \bar{X}_k$$
Let $N$ have factor $p$. For $0 \le q \le p-1$, we define SCS induced intersequence
symmetry (SCSIS) by:
$$X_{k,p-q-1} = \bar{X}_{N/p-k-1,q}$$
For $0 \le q \le p-1$, we denote subsequence $X_{k,q}$ by SCSIS($q$). Subsequence
$p-q-1$ is a redundant copy of subsequence $q$, which we denote by
SCSIS($p-q-1$) = $\overline{\mathrm{SCSIS}}$($q$). We also say that subsequence $p-q-1$ is
the dual of subsequence $q$.

The following lemma establishes the relationship between these symmetries.

Lemma 2.3 Let $X_k$ be a CS sequence of length $N$ with factor $p$. Then
the subsequence $X_{k,0}$ is CS symmetric, and the remaining subsequences $X_{k,q}$
are CSIS symmetric. If $p$ is even, then the CSIS symmetry of subsequence
$X_{k,p/2}$ reduces to SCS symmetry.
Let $X_k$ be an SCS sequence of length $N$ with factor $p$. Then the subsequences
$X_{k,q}$ are SCSIS symmetric. If $p$ is odd, then the SCSIS symmetry
of subsequence $X_{k,(p-1)/2}$ reduces to SCS symmetry.

We now prove Lemma 2.3. Let $X_k$ be a CS sequence of length $N$ with
factor $p$. The subsequence $X_{k,0}$ satisfies:
$$X_{N/p-k,0} = X_{N-pk} = \bar{X}_{pk} = \bar{X}_{k,0}$$
That is, subsequence $X_{k,0}$ is CS symmetric. The remaining subsequences
$X_{k,q}$ satisfy:
$$\begin{aligned}
X_{k,p-q} &= X_{pk+p-q} \\
&= \bar{X}_{N-pk-p+q} \\
&= \bar{X}_{p(N/p-k-1)+q} \\
&= \bar{X}_{N/p-k-1,q}
\end{aligned}$$
That is, for $q \ne 0$, the subsequences $X_{k,q}$ are CSIS symmetric. If $p$ is even,
then the CSIS symmetry of $X_{k,p/2}$ reduces to:
$$X_{k,p/2} = \bar{X}_{N/p-k-1,p/2}$$
That is, subsequence $X_{k,p/2}$ is SCS symmetric.
Let $X_k$ be an SCS sequence of length $N$ with factor $p$. The subsequences
$X_{k,q}$ satisfy:
$$\begin{aligned}
X_{k,p-q-1} &= X_{pk+p-q-1} \\
&= \bar{X}_{N-pk-p+q+1-1} \\
&= \bar{X}_{p(N/p-k-1)+q} \\
&= \bar{X}_{N/p-k-1,q}
\end{aligned}$$
That is, the subsequences $X_{k,q}$ are SCSIS symmetric. If $p$ is odd, then the
SCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:
$$X_{k,(p-1)/2} = \bar{X}_{N/p-k-1,(p-1)/2}$$
That is, subsequence $X_{k,(p-1)/2}$ is SCS symmetric. This completes the proof
of Lemma 2.3.
A mixed radix splitting tree diagram for a CS sequence is shown in
Figure 2.2. The acronyms representing the symmetries are summarized in
Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant.

Figure 2.2: Splitting tree for R symmetric FFT

The next lemma provides the intermediate symmetries in the IDFT induced
by the intermediate symmetries in the DFT.

Lemma 2.4 The intersequence symmetry CSIS induces the following intersequence
symmetry in the IDFT:
$$y_{n,p-q} = \omega_{N/p}^{-n}\,\bar{y}_{n,q}$$
Let $X_k$ be an SCS sequence of length $N$. Its IDFT $x_n$ satisfies:
$$x_n = \omega_N^{-n}\,\bar{x}_n$$
$$x_n = \omega_N^{-n/2}\,\tilde{x}_n$$
where $\tilde{x}_n$ is the magnitude of $x_n$, and hence is real. The intersequence
symmetry SCSIS induces the following intersequence symmetry in the IDFT:
$$y_{n,p-q-1} = \omega_{N/p}^{-n}\,\bar{y}_{n,q}$$

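We now prove Lemma 2.4. Let $X_{k,q}$ be CSIS symmetric. Then the IDFT
of $X_{k,p-q}$ is:
$$\begin{aligned}
y_{n,p-q} &= \sum_{k=0}^{N/p-1} X_{k,p-q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{n(N/p-k-1)} \\
&= \omega_{N/p}^{-n} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{-kn} \\
&= \omega_{N/p}^{-n}\,\bar{y}_{n,q}
\end{aligned}$$
Let $X_k$ be an SCS sequence of length $N$. Its IDFT $x_n$ satisfies:
$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k\,\omega_N^{kn} \\
&= \sum_{k=0}^{N-1} X_{N-k-1}\,\omega_N^{n(N-k-1)} \\
&= \omega_N^{-n} \sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= \omega_N^{-n}\,\bar{x}_n
\end{aligned}$$
We express $x_n$ in polar form as follows:
$$x_n = \tilde{x}_n e^{i\theta}$$
Substituting this into the preceding symmetry for $x_n$ and solving for $\theta$ leads
to:
$$x_n = \omega_N^{-n/2}\,\tilde{x}_n$$
Let $X_{k,q}$ be SCSIS symmetric. Then the IDFT of $X_{k,p-q-1}$ is:
$$\begin{aligned}
y_{n,p-q-1} &= \sum_{k=0}^{N/p-1} X_{k,p-q-1}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{n(N/p-k-1)} \\
&= \omega_{N/p}^{-n} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{-kn} \\
&= \omega_{N/p}^{-n}\,\bar{y}_{n,q}
\end{aligned}$$
This completes the proof of Lemma 2.4.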
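
The SCS part of this lemma can be illustrated numerically. The Python sketch below builds an SCS sequence from four arbitrary complex values (the test data and function names are hypothetical), computes its IDFT with the assumed convention $x_n = \sum_k X_k e^{2\pi i kn/N}$, and measures how far $\omega_N^{n/2} x_n$ is from being real.

```python
import cmath

def idft(X):
    """Inverse transform without scaling (assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def make_scs(first_half):
    """Extend X_0..X_{N/2-1} to an SCS sequence: X_{N-k-1} = conj(X_k)."""
    return list(first_half) + [v.conjugate() for v in reversed(first_half)]

N = 8
X = make_scs([1 + 2j, -0.5j, 3.0 + 0j, 2 - 1j])
x = idft(X)
# x_tilde[n] = w_N^(n/2) * x_n should have a vanishing imaginary part
x_tilde = [cmath.exp(1j * cmath.pi * n / N) * x[n] for n in range(N)]
imag_leak = max(abs(v.imag) for v in x_tilde)
```

In this sketch the real quantity carries a sign, so it equals the magnitude of $x_n$ only up to that sign.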
The preceding lemma shows that each symmetry appearing in Figure 2.2
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the R symmetric IFFT.
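
Theorem 2.5 Assume that $p$ is even. The inverse combine equation for
CS, SCS, and CSIS sequences is:
$$x_{n,l} = y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} +
2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.5}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$. Note that $x_{n,l}$ is real. The inverse
combine equation for SCSIS sequences is:
$$\tilde{x}_{n,l} = 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}
\sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.6}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.
Next, assume that $p$ is odd. The inverse combine equation for CS and
CSIS sequences is:
$$x_{n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}
\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.7}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$. The inverse combine equation for SCS
and SCSIS sequences is:
$$\tilde{x}_{n,l} = (-1)^l\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}
\sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.8}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.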
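
We now prove Theorem 2.5. First, assume that $p$ is even. Consider the
combining of CS, SCS, and CSIS sequences. Substituting the symmetries
found earlier into the inverse combine equation (2.3) yields:
$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= y_{n,0} + \omega_p^{lp/2}\,\omega_N^{np/2}\,y_{n,p/2}
 + \sum_{q=1}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=1}^{p/2-1} \omega_p^{l(p-q)}\,\omega_N^{n(p-q)}\,y_{n,p-q} \\
&= y_{n,0} + (-1)^l\,\omega_{N/p}^{n/2}\big[\omega_{N/p}^{-n/2}\,\tilde{y}_{n,p/2}\big]
 + \sum_{q=1}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=1}^{p/2-1} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2}
 + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Consider the combining of SCSIS sequences. Substituting the symmetries
found earlier into the inverse combine equation (2.3) yields:
$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= \sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=0}^{p/2-1} \omega_p^{l(p-q-1)}\,\omega_N^{n(p-q-1)}\,y_{n,p-q-1} \\
&= \sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \omega_p^{-l}\,\omega_N^{-n} \sum_{q=0}^{p/2-1} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}$$
Using SCS symmetry yields:
$$\begin{aligned}
x_{n,l} &= x_{lN/p+n} \\
&= \omega_N^{-(lN/p+n)/2}\,\tilde{x}_{lN/p+n} \\
&= \omega_p^{-l/2}\,\omega_N^{-n/2}\,\tilde{x}_{n,l}
\end{aligned} \tag{2.9}$$
Substituting this into the combine equation above yields:
$$\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{l/2}\,\omega_N^{n/2} \sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \omega_p^{-l/2}\,\omega_N^{-n/2} \sum_{q=0}^{p/2-1} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2} \sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Next, assume that $p$ is odd. Consider the combining of CS and CSIS sequences.
Substituting the symmetries found earlier into the inverse combine
equation (2.3) yields:
$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=1}^{(p-1)/2} \omega_p^{l(p-q)}\,\omega_N^{n(p-q)}\,y_{n,p-q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=1}^{(p-1)/2} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Consider the combining of SCS and SCSIS sequences. Substituting the
symmetries found earlier into the inverse combine equation (2.3) yields:
$$\begin{aligned}
x_{n,l} &= \omega_p^{l(p-1)/2}\,\omega_N^{n(p-1)/2}\,y_{n,(p-1)/2}
 + \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \sum_{q=0}^{(p-3)/2} \omega_p^{l(p-q-1)}\,\omega_N^{n(p-q-1)}\,y_{n,p-q-1} \\
&= \omega_p^{l(p-1)/2}\,\omega_N^{n(p-1)/2}\,y_{n,(p-1)/2}
 + \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \omega_p^{-l}\,\omega_N^{-n} \sum_{q=0}^{(p-3)/2} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}$$
Combining this with equation (2.9) yields:
$$\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{lp/2}\,\omega_N^{np/2}\,y_{n,(p-1)/2}
 + \omega_p^{l/2}\,\omega_N^{n/2} \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}
 + \omega_p^{-l/2}\,\omega_N^{-n/2} \sum_{q=0}^{(p-3)/2} \omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= (-1)^l\,\tilde{y}_{n,(p-1)/2}
 + 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2} \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
This completes the proof of Theorem 2.5.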
The following corollary provides an important special case of this result.
These are the same as equations (6) and (7) in [1], except that we are working
with the IDFT.

Corollary 2.4 Assume $p = 2$. The inverse combine equation for CS and
SCS sequences is:
$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}$$
$$x_{n,1} = y_{n,0} - \tilde{y}_{n,1}$$
for $0 \le n \le N/2-1$. The inverse combine equation for SCSIS sequences is:
$$\tilde{x}_{n,0} = 2\,\mathrm{Re}\big[\omega_N^{n/2}\,y_{n,0}\big]$$
$$\tilde{x}_{n,1} = -2\,\mathrm{Im}\big[\omega_N^{n/2}\,y_{n,0}\big]$$
for $0 \le n \le N/2-1$.
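
The first pair of equations in this corollary can be tried out on a small real sequence. In the Python sketch below (hypothetical names; conventions $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$, $x_{n,l} = x_{lN/2+n}$, $X_{k,q} = X_{2k+q}$ assumed), the even numbered terms of the CS sequence form a CS subsequence with real IDFT $y_{n,0}$, the odd numbered terms form an SCS subsequence, and $\tilde{y}_{n,1} = \omega_N^{n}\,y_{n,1}$ is the real quantity attached to it.

```python
import cmath

def dft(x):
    """Forward transform with 1/N scaling (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    """Inverse transform without scaling (assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def inverse_combine_cs_p2(X):
    """Rebuild a real sequence from its CS transform with one radix-2 combine."""
    N = len(X)
    y0 = idft(X[0::2])                       # IDFT of the CS subsequence
    y1 = idft(X[1::2])                       # IDFT of the SCS subsequence
    yt = [cmath.exp(2j * cmath.pi * n / N) * y1[n] for n in range(N // 2)]
    imag_leak = max(abs(v.imag) for v in y0 + yt)
    x = [y0[n].real + yt[n].real for n in range(N // 2)]
    x += [y0[n].real - yt[n].real for n in range(N // 2)]
    return x, imag_leak

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
x_back, imag_leak = inverse_combine_cs_p2(dft(x))
err = max(abs(a - b) for a, b in zip(x_back, x))
```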
The next theorem provides all of the forward combine equations for the
R symmetric FFT.

Theorem 2.6 Assume that $p$ is even. The forward combine equation for
CS, SCS, and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$,
$0 \le q \le p/2-1$ and:
$$\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,x_{n,l} \tag{2.10}$$
for $0 \le n \le N/p-1$. The forward combine equation for SCSIS sequences
is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)} \sum_{l=0}^{p-1} \omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} \tag{2.11}$$
for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$.
Next, assume that $p$ is odd. The forward combine equation for CS and
CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$,
$0 \le q \le (p-1)/2$. The forward combine equation for SCS and SCSIS sequences is
given by equation (2.11) for $0 \le n \le N/p-1$, $0 \le q \le (p-3)/2$ and:
$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,\tilde{x}_{n,l} \tag{2.12}$$
for $0 \le n \le N/p-1$.

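We now prove Theorem 2.6. First, assume that $p$ is even. The forward
combining of CS, SCS, and CSIS sequences requires one new equation:
$$\begin{aligned}
\tilde{y}_{n,p/2} &= \omega_{N/p}^{n/2}\,y_{n,p/2} \\
&= \frac{1}{p} \sum_{l=0}^{p-1} \omega_p^{-lp/2}\,x_{n,l} \\
&= \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,x_{n,l}
\end{aligned}$$
The forward combine equation for SCSIS sequences is obtained by substituting
equation (2.9) into equation (2.4):
$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq}\big[\omega_p^{-l/2}\,\omega_N^{-n/2}\,\tilde{x}_{n,l}\big] \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)} \sum_{l=0}^{p-1} \omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l}
\end{aligned}$$
Next, assume that $p$ is odd. The forward combining of CS and CSIS
sequences does not require any new equations. The forward combining of
SCS and SCSIS sequences requires one new equation:
$$\begin{aligned}
\tilde{y}_{n,(p-1)/2} &= \omega_{N/p}^{n/2}\,y_{n,(p-1)/2} \\
&= \frac{1}{p} \sum_{l=0}^{p-1} \omega_p^{-lp/2}\,\tilde{x}_{n,l} \\
&= \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,\tilde{x}_{n,l}
\end{aligned}$$
This completes the proof of Theorem 2.6.
The following corollary provides an important special case of this result.
These are the same as equations (11) and (12) in [1], except that we are
working with the IDFT.

Corollary 2.5 Assume $p = 2$. The forward combine equation for CS and
SCS sequences is:
$$y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$\tilde{y}_{n,1} = (x_{n,0} - x_{n,1})/2$$
for $0 \le n \le N/2-1$. The forward combine equation for SCSIS sequences
is:
$$y_{n,0} = \omega_N^{-n/2}\,(\tilde{x}_{n,0} - i\,\tilde{x}_{n,1})/2$$
for $0 \le n \le N/2-1$.
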
2.3 Real Even (RE)
In this section, we will be concerned with the following symmetries:

Definition 2.5 A real even (RE) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n$$
$$x_{N-n} = x_n$$
Note that an RE sequence may also be viewed as having both R and CS
symmetry, which we denote by RCS.

The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.

Lemma 2.5 If $x_n$ is an RE sequence of length $N$, then its DFT $X_k$ is an
RCS sequence of length $N$. If $X_k$ is an RCS sequence of length $N$, then its
IDFT $x_n$ is an RE sequence of length $N$.

The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for N-N boundary conditions.
Note that if $N$ is even, then an RE sequence satisfies N-N boundary
conditions for the computational domain $0 \le n \le N/2$. That is:
$$x_{N-1} = x_1$$
$$x_{N/2-1} = x_{N/2+1}$$

Theorem 2.7 Let $x_n$ be an RE sequence and let $X_k$ be its RCS symmetric
DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
$$X_k = \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big]$$
for $0 \le k \le N/2$. The real form of the IDFT is:
$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} 2X_k \cos(2\pi kn/N)$$
for $0 \le n \le N/2$. Note that the results for the DFT and IDFT are identical
except for scaling.
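
Because the two real forms share one kernel, a single routine can serve both directions. The Python sketch below is an illustration only (the name `cosine_sum` and the test data are hypothetical): it applies the cosine sum once with the $1/N$ scaling to obtain $X_0, \ldots, X_{N/2}$, applies it again without scaling to recover $x_0, \ldots, x_{N/2}$, and also compares against a direct DFT of the full even extension.

```python
import cmath
import math

def cosine_sum(a, N):
    """Given a_0..a_{N/2}, return s_m = a_0 + (-1)^m a_{N/2}
    + sum_{j=1}^{N/2-1} 2 a_j cos(2 pi m j / N) for m = 0..N/2."""
    h = N // 2
    return [a[0] + (-1) ** m * a[h]
            + sum(2 * a[j] * math.cos(2 * math.pi * m * j / N) for j in range(1, h))
            for m in range(h + 1)]

N = 8
x_half = [3.0, -1.0, 4.0, 1.0, -5.0]             # x_0 .. x_{N/2}
X_half = [s / N for s in cosine_sum(x_half, N)]   # real form of the DFT
x_back = cosine_sum(X_half, N)                    # real form of the IDFT
round_trip_err = max(abs(a - b) for a, b in zip(x_back, x_half))

# Cross-check against the complex DFT of the even extension x_{N-n} = x_n.
x_full = x_half + x_half[-2:0:-1]
X_full = [sum(x_full[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
          for k in range(N)]
dft_err = max(abs(X_full[k] - X_half[k]) for k in range(N // 2 + 1))
```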
We now prove Theorem 2.7. The result for the DFT follows from Theorem 2.4,
the RCS symmetry of $X_k$, and the RE symmetry of $x_n$ as follows:
$$\begin{aligned}
X_k &= \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos(2\pi kn/N) \\
&= \frac{1}{N}\Big\{x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} x_n \cos(2\pi kn/N)
 + \sum_{n=1}^{N/2-1} x_{N-n} \cos[2\pi k(N-n)/N]\Big\} \\
&= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big]
\end{aligned}$$
The result for the IDFT follows immediately from Theorem 2.4 and the RCS
symmetry of $X_k$. Note that only half of the RE sequence $x_n$ needs to be
specified. This completes the proof of Theorem 2.7.
We now develop a fast, mixed radix algorithm for computing the RE
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RE
sequence of length $N$ may be stored in $N/2$ real storage locations, compared
to $2N$ real storage locations for a C sequence of length $N$. Similarly, an RCS
sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal
is to exploit these symmetries in the data in order to reduce both the storage
requirements and the number of operations to one fourth of those
for C sequences. This algorithm is based on the symmetries which
occur in the splittings of the RCS sequence $X_k$. We begin developing this
algorithm by defining all of the intermediate symmetries involved.

Definition 2.6 Let $X_k$ be an RCS sequence of length $N$ with factor $p$. The
intermediate symmetries which occur in the splittings of $X_k$ are identical to
those in Definition 2.4, with the addition that all sequences are real as well.
We indicate this by preceding the acronym for each symmetry with an R.

The relationships between the symmetries recorded in Lemma 2.3 are not
affected by the fact that all sequences have R symmetry as well. A mixed
radix splitting tree diagram for an RCS sequence is shown in Figure 2.3. The
acronyms representing the symmetries are summarized in Table 2.2 for ease
of reference. Note that a branch of the splitting tree corresponding to a dual
sequence terminates because it is redundant. Note also that at the deepest
level of the splitting tree we find R sequences rather than C sequences.
The next lemma provides the intermediate symmetries in the IDFT induced
by the intermediate symmetries in the DFT.

Figure 2.3: Splitting tree for RE symmetric FFT

Lemma 2.6 The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemma 2.4, with
the following addition. Let $X_k$ be an R sequence of length $N$. Its IDFT $x_n$
satisfies:
$$x_{N-n} = \bar{x}_n$$
Since all sequences have R symmetry, only half of the IDFT of any sequence
needs to be computed.

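We now prove Lemma 2.6. Let $X_k$ be an R sequence of length $N$. Its
IDFT $x_n$ satisfies:
$$\begin{aligned}
x_{N-n} &= \sum_{k=0}^{N-1} X_k\,\omega_N^{k(N-n)} \\
&= \sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= \bar{x}_n
\end{aligned}$$
This completes the proof of Lemma 2.6.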
The preceding lemma shows that each symmetry appearing in Figure 2.3
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the RE symmetric IFFT.

Theorem 2.8 Assume that $p$ is even. The inverse combine equation for
RCS, RSCS, and RCSIS sequences is given by equation (2.5) for the lower
half-range of $n$ and $0 \le l \le p/2-1$. We also need the companion equation:
$$x_{N/p-n,l} = y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} +
2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.13}$$
for the lower half-range of $n$ and $0 \le l \le p/2-1$. The inverse combine
equation for RSCSIS sequences is given by equation (2.6) for the lower
half-range of $n$ and $0 \le l \le p/2-1$. We also need the companion equation:
$$\tilde{x}_{N/p-n,l} = 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}
\sum_{q=0}^{p/2-1} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.14}$$
for the lower half-range of $n$ and $0 \le l \le p/2-1$. The inverse combine
equation for R sequences is given by equation (2.3) for the lower half-range
of $n$ and $0 \le l \le p/2-1$. We also need the companion equation:
$$x_{N/p-n,l} = \sum_{q=0}^{p-1} \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q} \tag{2.15}$$
for the lower half-range of $n$ and $0 \le l \le p/2-1$.
Next, assume that $p$ is odd. The inverse combine equation for RCS and
RCSIS sequences is given by equation (2.7) for the lower half-range of $n$ and
$0 \le l \le (p-1)/2$. We also need the companion equation:
$$x_{N/p-n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}
\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.16}$$
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine
equation for RSCS and RSCSIS sequences is given by equation (2.8) for the
lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion
equation:
$$\tilde{x}_{N/p-n,l} = (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} +
2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}
\sum_{q=0}^{(p-3)/2} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.17}$$
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine
equation for R sequences is given by equation (2.3) for the lower half-range
of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation (2.15) for
the lower half-range of $n$ and $0 \le l \le (p-3)/2$.

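We now prove Theorem 2.8. First, assume that $p$ is even. Consider the
combining of RCS, RSCS, and RCSIS sequences. Since we will compute
only half of each sequence $y_{n,q}$ on the right hand side of equation (2.5), we
need the following companion equation:
$$x_{N/p-n,l} = y_{N/p-n,0} + (-1)^l\,\tilde{y}_{N/p-n,p/2} +
2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{lq}\,\omega_N^{(N/p-n)q}\,y_{N/p-n,q}\Big]$$
Using RSCS symmetry yields:
$$\begin{aligned}
\tilde{y}_{N/p-n,q} &= \omega_{N/p}^{(N/p-n)/2}\,y_{N/p-n,q} \\
&= -\omega_{N/p}^{-n/2}\,\bar{y}_{n,q} \\
&= -\tilde{y}_{n,q}
\end{aligned} \tag{2.18}$$
Substituting this into the companion equation above yields:
$$\begin{aligned}
x_{N/p-n,l} &= y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} +
2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} +
2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Consider the combining of RSCSIS sequences. Since we will compute
only half of each sequence $y_{n,q}$ on the right hand side of equation (2.6), we
need the following companion equation:
$$\begin{aligned}
\tilde{x}_{N/p-n,l} &= 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{(N/p-n)/2}
 \sum_{q=0}^{p/2-1} \omega_p^{lq}\,\omega_N^{(N/p-n)q}\,y_{N/p-n,q}\Big] \\
&= 2\,\mathrm{Re}\Big[\omega_p^{(l+1)/2}\,\omega_N^{-n/2}
 \sum_{q=0}^{p/2-1} \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}
 \sum_{q=0}^{p/2-1} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Consider the combining of R sequences. Since we will compute only half
of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the
following companion equation:
$$\begin{aligned}
x_{N/p-n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{(N/p-n)q}\,y_{N/p-n,q} \\
&= \sum_{q=0}^{p-1} \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}$$
Next, assume that $p$ is odd. Consider the combining of RCS and RCSIS
sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right
hand side of equation (2.7), we need the following companion equation:
$$\begin{aligned}
x_{N/p-n,l} &= y_{N/p-n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}
 \omega_p^{lq}\,\omega_N^{(N/p-n)q}\,y_{N/p-n,q}\Big] \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}
 \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}
 \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
Consider the combining of RSCS and RSCSIS sequences. Since we will
compute only half of each sequence $y_{n,q}$ on the right hand side of
equation (2.8), we need the following companion equation:
$$\tilde{x}_{N/p-n,l} = (-1)^l\,\tilde{y}_{N/p-n,(p-1)/2} +
2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{(N/p-n)/2}
\sum_{q=0}^{(p-3)/2} \omega_p^{lq}\,\omega_N^{(N/p-n)q}\,y_{N/p-n,q}\Big]$$
Substituting equation (2.18) into the companion equation above yields:
$$\begin{aligned}
\tilde{x}_{N/p-n,l} &= (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} +
 2\,\mathrm{Re}\Big[\omega_p^{(l+1)/2}\,\omega_N^{-n/2}
 \sum_{q=0}^{(p-3)/2} \omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} +
 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}
 \sum_{q=0}^{(p-3)/2} \omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}$$
The companion equation for R sequences is identical to the even $p$ case.
This completes the proof of Theorem 2.8. The following corollary provides
an important special case of this result.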

Corollary 2.6 Assume $p = 2$. The inverse combine equation for RCS and
RSCS sequences is:
$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}$$
$$x_{N/2-n,0} = y_{n,0} - \tilde{y}_{n,1}$$
for the lower half-range of $n$. The inverse combine equation for RSCSIS
sequences is:
$$\tilde{x}_{n,0} = 2\,\mathrm{Re}\big[\omega_N^{n/2}\,y_{n,0}\big]$$
$$\tilde{x}_{N/2-n,0} = 2\,\mathrm{Im}\big[\omega_N^{n/2}\,y_{n,0}\big]$$
for the lower half-range of $n$. The inverse combine equation for R sequences
is:
$$x_{n,0} = y_{n,0} + \omega_N^{n}\,y_{n,1}$$
$$x_{N/2-n,0} = \bar{y}_{n,0} - \omega_N^{-n}\,\bar{y}_{n,1}$$
for the lower half-range of $n$.
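
The last pair of equations in this corollary can be tested directly. In the Python sketch below, the test data and function name are hypothetical, and the lower half-range is taken to be $0 \le n \le N/4$, which is an assumption made for this illustration. For a real sequence $X_k$ of length $N = 8$, the values $x_n$ and $x_{N/2-n}$ are formed from $y_{n,0}$ and $y_{n,1}$ alone and compared with a direct IDFT.

```python
import cmath

def idft(X):
    """Inverse transform without scaling (assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

N = 8
X = [2.0, -1.0, 0.5, 3.0, -4.0, 1.5, 0.0, 2.5]   # an R sequence (real, no symmetry)
x = idft(X)
y0 = idft(X[0::2])                                # IDFT of X_{k,0} = X_{2k}
y1 = idft(X[1::2])                                # IDFT of X_{k,1} = X_{2k+1}

err = 0.0
for n in range(N // 4 + 1):                       # assumed lower half-range of n
    w = cmath.exp(2j * cmath.pi * n / N)
    direct = y0[n] + w * y1[n]                                         # x_{n,0}
    companion = y0[n].conjugate() - w.conjugate() * y1[n].conjugate()  # x_{N/2-n,0}
    err = max(err, abs(direct - x[n]), abs(companion - x[N // 2 - n]))
```

Together with the symmetry $x_{N-n} = \bar{x}_n$ of Lemma 2.6, the values produced here determine the whole sequence.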
The next theorem provides all of the forward combine equations for the
RE symmetric FFT.

Theorem 2.9 Assume that $p$ is even. The forward combine equation for R
sequences is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2} +
\sum_{l=1}^{p/2-1} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \tag{2.19}$$
for the lower half-range of $n$ and $0 \le q \le p-1$. Note that $y_{0,q}$ is real because
$x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both real. This ensures that the final output
is real because $n = 0$ in the last stage of the algorithm. The forward combine
equation for RCS, RSCS, and RCSIS sequences is given by equation (2.19)
for the lower half-range of $n$ and $0 \le q \le p/2-1$ with the exception that all
sequences $x_{n,l}$ are real. In addition:
$$\tilde{y}_{n,p/2} = \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\,x_{-n,p/2} +
\sum_{l=1}^{p/2-1} (-1)^l \big[x_{n,l} + x_{-n,l}\big]\Big\} \tag{2.20}$$
for the lower half-range of $n$. The forward combine equation for RSCSIS
sequences is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i(-1)^q\,\tilde{x}_{-n,p/2} +
\sum_{l=1}^{p/2-1} \big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\} \tag{2.21}$$
for the lower half-range of $n$ and $0 \le q \le p/2-1$. Note that $y_{0,q}$ is real
because $\tilde{x}_{0,p/2} = 0$.
Next, assume that $p$ is odd. The forward combine equation for R sequences
is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} +
\sum_{l=1}^{(p-1)/2} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \tag{2.22}$$
for the lower half-range of $n$ and $0 \le q \le p-1$. The forward combine
equation for RCS and RCSIS sequences is given by equation (2.22) for the
lower half-range of $n$ and $0 \le q \le (p-1)/2$ with the exception that all
sequences $x_{n,l}$ are real. The forward combine equation for RSCS and RSCSIS
sequences is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} +
\sum_{l=1}^{(p-1)/2} \big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\} \tag{2.23}$$
for the lower half-range of $n$ and $0 \le q \le (p-3)/2$. In addition:
$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p}\Big\{\tilde{x}_{n,0} +
\sum_{l=1}^{(p-1)/2} (-1)^l \big[\tilde{x}_{n,l} + \tilde{x}_{-n,l}\big]\Big\} \tag{2.24}$$
for the lower half-range of $n$.

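We now prove Theorem 2.9. First, assume that $p$ is even. The forward
combine equation for R sequences is obtained by developing a compact form
of equation (2.4) which eliminates all redundant data. For this purpose, we
will need the following result which is valid for all R sequences:
$$\begin{aligned}
x_{n,p-l-1} &= x_{(p-l-1)N/p+n} \\
&= x_{N-(l+1)N/p+n} \\
&= \bar{x}_{(l+1)N/p-n} \\
&= \bar{x}_{-n,l+1}
\end{aligned}$$
Using this result, we obtain:
$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=0}^{p/2-1} \omega_p^{-(p-l-1)q}\,x_{n,p-l-1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=0}^{p/2-1} \omega_p^{(l+1)q}\,\bar{x}_{-n,l+1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=1}^{p/2} \omega_p^{lq}\,\bar{x}_{-n,l}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2}
 + \sum_{l=1}^{p/2-1} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\}
\end{aligned}$$
The forward combining of RCS, RSCS, and RCSIS sequences requires
one new equation:
$$\begin{aligned}
\tilde{y}_{n,p/2} &= \omega_{N/p}^{n/2}\,y_{n,p/2} \\
&= \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\,x_{-n,p/2}
 + \sum_{l=1}^{p/2-1} (-1)^l \big[x_{n,l} + x_{-n,l}\big]\Big\}
\end{aligned}$$
The forward combine equation for RSCSIS sequences is obtained by substituting
equation (2.9) into equation (2.19):
$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2}
 + \sum_{l=1}^{p/2-1} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i(-1)^q\,\tilde{x}_{-n,p/2}
 + \sum_{l=1}^{p/2-1} \big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\}
\end{aligned}$$
Next, assume that $p$ is odd. The forward combine equation for R sequences
is obtained by developing a compact form of equation (2.4) which
eliminates all redundant data.
$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2}
 + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=0}^{(p-3)/2} \omega_p^{-(p-l-1)q}\,x_{n,p-l-1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2}
 + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=0}^{(p-3)/2} \omega_p^{(l+1)q}\,\bar{x}_{-n,l+1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2}
 + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq}\,x_{n,l}
 + \sum_{l=1}^{(p-1)/2} \omega_p^{lq}\,\bar{x}_{-n,l}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0}
 + \sum_{l=1}^{(p-1)/2} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\}
\end{aligned}$$
The forward combining of RCS and RCSIS sequences does not require
any new equations. The forward combine equation for RSCS and RSCSIS
sequences is obtained by substituting equation (2.9) into equation (2.22):
$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0}
 + \sum_{l=1}^{(p-1)/2} \big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0}
 + \sum_{l=1}^{(p-1)/2} \big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\}
\end{aligned}$$
For $q = (p-1)/2$ this reduces to:
$$\begin{aligned}
\tilde{y}_{n,(p-1)/2} &= \omega_{N/p}^{n/2}\,y_{n,(p-1)/2} \\
&= \frac{1}{p}\Big\{\tilde{x}_{n,0}
 + \sum_{l=1}^{(p-1)/2} (-1)^l \big[\tilde{x}_{n,l} + \tilde{x}_{-n,l}\big]\Big\}
\end{aligned}$$
This completes the proof of Theorem 2.9. The following corollary provides
an important special case of this result.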

Corollary 2.7 Assume $p = 2$. The forward combine equation for R sequences
is:
$$y_{n,0} = (x_{n,0} + \bar{x}_{-n,1})/2$$
$$y_{n,1} = \omega_N^{-n}\,(x_{n,0} - \bar{x}_{-n,1})/2$$
for the lower half-range of $n$. The forward combine equation for RCS and
RSCS sequences is:
$$y_{n,0} = (x_{n,0} + x_{-n,1})/2$$
$$\tilde{y}_{n,1} = (x_{n,0} - x_{-n,1})/2$$
for the lower half-range of $n$. The forward combine equation for RSCSIS
sequences is:
$$y_{n,0} = \omega_N^{-n/2}\,(\tilde{x}_{n,0} + i\,\tilde{x}_{-n,1})/2$$
for the lower half-range of $n$.


2.4 Real Odd (RO)
In this section, we will be concerned with the following symmetries:

Definition 2.7 A real odd (RO) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n$$
$$x_{N-n} = -x_n$$
An imaginary odd (IO) sequence, or equivalently an imaginary conjugate
symmetric (ICS) sequence, $X_k$ of length $N$ is defined by:
$$\bar{X}_k = -X_k$$
$$X_{N-k} = \bar{X}_k$$
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.

Lemma 2.7 If $x_n$ is an RO sequence of length $N$, then its DFT $X_k$ is an
ICS sequence of length $N$. If $X_k$ is an ICS sequence of length $N$, then its
IDFT $x_n$ is an RO sequence of length $N$.

The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for D-D boundary conditions.
Note that if $N$ is even, then an RO sequence satisfies D-D boundary
conditions for the computational domain $1 \le n \le N/2-1$. That is:
$$x_0 = 0$$
$$x_{N/2} = 0$$

Theorem 2.10 Let $x_n$ be an RO sequence and let $X_k$ be its ICS symmetric
DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
$$\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)$$
for $1 \le k \le N/2-1$. The real form of the IDFT is:
$$x_n = -\sum_{k=1}^{N/2-1} 2\,\mathrm{Im}(X_k) \sin(2\pi kn/N)$$
for $1 \le n \le N/2-1$. Note that the results for the DFT and IDFT are
identical except for scaling.
We now prove Theorem 2.10. The result for the DFT follows from Theorem 2.4,
the ICS symmetry of $X_k$, and the RO symmetry of $x_n$ as follows:
$$\begin{aligned}
\mathrm{Im}(X_k) &= -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin(2\pi kn/N) \\
&= -\frac{1}{N} \sum_{n=1}^{N/2-1} \{x_n \sin(2\pi kn/N) + x_{N-n} \sin[2\pi k(N-n)/N]\} \\
&= -\frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)
\end{aligned}$$
The result for the IDFT follows immediately from Theorem 2.4 and the ICS
symmetry of $X_k$. Note that only half of the RO sequence $x_n$ needs to be
specified. This completes the proof of Theorem 2.10.
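
A short Python sketch makes the pairing of the two real forms concrete. It is an illustration under assumptions (the name `sine_sum` and the test data are hypothetical): one sine sum with the factor $-1/N$ produces $\mathrm{Im}(X_k)$, a second one with the factor $-1$ recovers $x_n$, and a direct DFT of the odd extension confirms that the transform is pure imaginary.

```python
import cmath
import math

def sine_sum(a, N):
    """Given a_0..a_{N/2-1} (a_0 is ignored), return
    t_m = sum_{j=1}^{N/2-1} 2 a_j sin(2 pi m j / N) for m = 0..N/2-1."""
    h = N // 2
    return [sum(2 * a[j] * math.sin(2 * math.pi * m * j / N) for j in range(1, h))
            for m in range(h)]

N = 8
x_half = [0.0, 2.0, -1.0, 5.0]                    # x_0 = 0, then x_1 .. x_{N/2-1}
im_X = [-t / N for t in sine_sum(x_half, N)]      # real form of the DFT
x_back = [-t for t in sine_sum(im_X, N)]          # real form of the IDFT
round_trip_err = max(abs(a - b) for a, b in zip(x_back, x_half))

# Cross-check against the complex DFT of the odd extension x_{N-n} = -x_n.
x_full = x_half + [0.0] + [-v for v in x_half[:0:-1]]
X_full = [sum(x_full[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
          for k in range(N)]
dft_err = max(abs(X_full[k] - 1j * im_X[k]) for k in range(N // 2))
```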
We now develop a fast, mixed radix algorithm for computing the RO
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RO
sequence of length $N$ may be stored in $N/2$ real storage locations, compared
to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICS
sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal
is to exploit these symmetries in the data in order to reduce both the storage
requirements and the number of operations to one fourth of those
for C sequences. This algorithm is based on the symmetries which
occur in the splittings of the ICS sequence $X_k$. We begin developing this
algorithm by defining all of the intermediate symmetries involved.

Definition 2.8 Let $X_k$ be an ICS sequence of length $N$ with factor $p$. The
intermediate symmetries which occur in the splittings of $X_k$ are identical to
those in Definition 2.4, with the addition that all sequences are pure imaginary
as well. We indicate this by preceding the acronym for each symmetry
with an I.

The relationships between the symmetries recorded in Lemma 2.3 are not
affected by the fact that all sequences have I symmetry as well. A mixed
radix splitting tree diagram for an ICS sequence is shown in Figure 2.4. The
acronyms representing the symmetries are summarized in Table 2.2 for ease
of reference. Note that a branch of the splitting tree corresponding to a dual
sequence terminates because it is redundant. Note also that at the deepest
level of the splitting tree we find I sequences rather than C sequences.

Figure 2.4: Splitting tree for RO symmetric FFT

The next lemma provides the intermediate symmetries in the IDFT in-
duced by the intermediate symmetries in the DFT.

Lemma 2.8 The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemma 2.4, with
the following addition. Let $X_k$ be an I sequence of length $N$. Its IDFT $x_n$
satisfies:
$$x_{N-n} = -\bar{x}_n$$
Since all sequences have I symmetry, only half of the IDFT of any sequence
needs to be computed.

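We now prove Lemma 2.8. Let $X_k$ be an I sequence of length $N$. Its
IDFT $x_n$ satisfies:
$$\begin{aligned}
x_{N-n} &= \sum_{k=0}^{N-1} X_k\,\omega_N^{k(N-n)} \\
&= -\sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= -\bar{x}_n
\end{aligned}$$
This completes the proof of Lemma 2.8.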
The preceding lemma shows that each symmetry appearing in Figure 2.4
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the RO symmetric IFFT.
'' 1.1
Theorem 2.11 Assume that p is even. The inverse combine equation for
ICS, ISCS, and ICSIS sequences is given by equation (2.5) for the lower
half-range of n and 0 < l < p/2 1. We also need the companion equation:
XN/p-ti,l ~ y-nfi + ( 1) Vn,p/2 ~
' ' P/21
, . 2£e[ £ (2-25)
47


for the lower half-range of n and 0 < l < pj2 1. The inverse combine
equation for ISCSIS sequences is given by equation (2.6) for the lower half-
range ofn and 0 < l < pj2 1. We also need the companion equation:
p/21
JV/pn,/ = 2i£e[u>2 UP Vn,q] (2-26)
9=0
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. The inverse combine
equation for I sequences is given by equation (2.3) for the lower half-range
of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
x_{N/p-n,l} = -\sum_{q=0}^{p-1} \omega_p^{(l+1)q} \omega_N^{-nq} \bar{y}_{n,q}
\tag{2.27}
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$.
Next, assume that p is odd. The inverse combine equation for ICS and
ICSIS sequences is given by equation (2.7) for the lower half-range of $n$ and
$0 \le l \le (p-1)/2$. We also need the companion equation:
\[
x_{N/p-n,l} = -y_{n,0}
  - 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{(p-1)/2} \omega_p^{-(l+1)q} \omega_N^{nq} y_{n,q}\Bigr]
\tag{2.28}
\]
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine
equation for ISCS and ISCSIS sequences is given by equation (2.8) for the
lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion
equation:
*JV/p-n,Z = -&i,(p-l)/2 -
i, (P 3)/2
2Re[u>^l+1V2unN/2 J2 ;q(l+l)%yn,9] (2-29)
9=0
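for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine
equation for I sequences is given by equation (2.3) for the lower half-range
of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation (2.27) for
the lower half-range of $n$ and $0 \le l \le (p-3)/2$.
We now prove Theorem 2.11. First, assume that p is even. Consider the
combining of ICS, ISCS, and ICSIS sequences. Since we will compute only
half of each sequence $y_{n,q}$ on the right hand side of equation (2.5), we need
the following companion equation:
\[
x_{N/p-n,l} = y_{N/p-n,0} + (-1)^l \tilde{y}_{N/p-n,p/2}
  + 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{p/2-1} \omega_p^{lq} \omega_N^{q(N/p-n)} y_{N/p-n,q}\Bigr]
\]
Using ISCS symmetry yields:
\[
\begin{aligned}
\tilde{y}_{N/p-n,q} &= \omega_{N/p}^{(N/p-n)/2}\, y_{N/p-n,q} \\
                    &= +\bar{\omega}_{N/p}^{n/2}\, \bar{y}_{n,q} \\
                    &= +\bar{\tilde{y}}_{n,q}
\end{aligned}
\tag{2.30}
\]
Substituting this into the companion equation above yields:
\[
\begin{aligned}
x_{N/p-n,l} &= -\bar{y}_{n,0} + (-1)^l \bar{\tilde{y}}_{n,p/2}
  - 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{p/2-1} \omega_p^{(l+1)q} \omega_N^{-nq} \bar{y}_{n,q}\Bigr] \\
 &= -y_{n,0} + (-1)^l \tilde{y}_{n,p/2}
  - 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{p/2-1} \omega_p^{-(l+1)q} \omega_N^{nq} y_{n,q}\Bigr]
\end{aligned}
\]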
Consider the combining of ISCSIS sequences. Since we will compute only
half of each sequence $y_{n,q}$ on the right hand side of equation (2.6), we need
the following companion equation:
: p/21
*iJ* '= m£ J<4N^yN,r^,}
;, 9=
p/2-1
= -2Re[

V+Wu ~n/2
uf+1)UNnqT>
^71,9]
VN ]C
9=0
[ ; P/2"1
I 9=0
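Consider the combining of I sequences. Since we will compute only half
of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the
following companion equation:
\[
\begin{aligned}
x_{N/p-n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq} \omega_N^{q(N/p-n)} y_{N/p-n,q} \\
            &= -\sum_{q=0}^{p-1} \omega_p^{(l+1)q} \omega_N^{-nq} \bar{y}_{n,q}
\end{aligned}
\]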
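Next, assume that p is odd. Consider the combining of ICS and ICSIS
sequences. Since we will compute only half of each sequence $y_{n,q}$ on the
right hand side of equation (2.7), we need the following companion equation:
\[
\begin{aligned}
x_{N/p-n,l} &= y_{N/p-n,0}
  + 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{(p-1)/2} \omega_p^{lq} \omega_N^{q(N/p-n)} y_{N/p-n,q}\Bigr] \\
 &= -\bar{y}_{n,0}
  - 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{(p-1)/2} \omega_p^{(l+1)q} \omega_N^{-nq} \bar{y}_{n,q}\Bigr] \\
 &= -y_{n,0}
  - 2\,\mathrm{Re}\Bigl[\sum_{q=1}^{(p-1)/2} \omega_p^{-q(l+1)} \omega_N^{nq} y_{n,q}\Bigr]
\end{aligned}
\]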
Consider the combining of ISCS and ISCSIS sequences. Since we will
compute only half of each sequence $y_{n,q}$ on the right hand side of
equation (2.8), we need the following companion equation:
xN/p-n,l = -&V/p-n,(p-l)/2 +
(p-3)/2
1 2M^lp-")n E
I 9=0
Substituting equation (2.30) into the companion equation above yields:
xN/p~n,l ~ Vn,(p-1)/2
(p3)/2
9=0
. = -n,(p-1)/2
(p3)/2
9=0
2Re[o;;(/+1)/2a;/2 £
The companion equation for I sequences is identical to the even p case.
This completes the proof of Theorem 2.11. The following corollary provides
an important special case of this result.
Corollary 2.8 Assume p = 2. The inverse combine equation for ICS and
ISCS sequences is:
\[
\begin{aligned}
x_{n,0}     &= y_{n,0} + \tilde{y}_{n,1} \\
x_{N/2-n,0} &= -y_{n,0} + \tilde{y}_{n,1}
\end{aligned}
\]
for the lower half-range of $n$. The inverse combine equation for ISCSIS
sequences is:
xnfl = 2Re[u^2ynfi]
^ jv/2-ti,o = 2Im[u>jJ yn>o]
for the lower half-range of $n$. The inverse combine equation for I sequences
is:
\[
\begin{aligned}
x_{n,0}     &= y_{n,0} + \omega_N^{n} y_{n,1} \\
x_{N/2-n,0} &= -\bar{y}_{n,0} + \overline{\omega_N^{n} y_{n,1}}
\end{aligned}
\]
for the lower half-range of $n$.
The next theorem provides all of the forward combine equations for the
RO symmetric FFT.
Theorem 2.12 Assume that p is even. The forward combine equation for
I sequences is:
\[
y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Bigl\{x_{n,0} + (-1)^{q+1}\bar{x}_{-n,p/2}
  + \sum_{l=1}^{p/2-1}\bigl[\omega_p^{-lq} x_{n,l} - \omega_p^{lq}\bar{x}_{-n,l}\bigr]\Bigr\}
\tag{2.31}
\]
for the lower half-range of $n$ and $0 \le q \le p-1$. Note that $y_{0,q}$ is pure
imaginary because $x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both pure imaginary.
This ensures that the final output is pure imaginary because $n = 0$ in the
last stage of the algorithm. The forward combine equation for ICS, ISCS,
and ICSIS sequences is given by equation (2.31) for the lower half-range of
$n$ and $0 \le q \le p/2 - 1$, with the following addition:
\[
\tilde{y}_{n,p/2} = \frac{1}{p}\Bigl\{x_{n,0} + (-1)^{p/2+1}\bar{x}_{-n,p/2}
  + \sum_{l=1}^{p/2-1} (-1)^l \bigl[x_{n,l} - \bar{x}_{-n,l}\bigr]\Bigr\}
\tag{2.32}
\]
for the lower half-range of $n$. Note that $\tilde{y}_{0,p/2} = 0$ because
$x_{0,0} = x_0 = 0$ and $x_{0,p/2} = x_{N/2} = 0$. The forward combine equation
for ISCSIS sequences is:
yn,q 5= l/pu>xn{q+1/2){xni0 + i(-l)q+1 x-n,p/2 +
p/2-1
;! £[u.p-^+1/2)^-^1/2)i_7l,]} (2.33)
for the lower half-range of $n$ and $0 \le q \le p/2 - 1$. Note that $y_{0,q}$ is
pure imaginary because $x_{0,0} = x_0 = 0$.
Next, assume that p is odd. The forward combine equation for I sequences
is:
\[
y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Bigl\{x_{n,0}
  + \sum_{l=1}^{(p-1)/2}\bigl[\omega_p^{-lq} x_{n,l} - \omega_p^{lq}\bar{x}_{-n,l}\bigr]\Bigr\}
\tag{2.34}
\]
for the lower half-range of $n$ and $0 \le q \le p-1$. The forward combine
equation for ICS and ICSIS sequences is given by equation (2.34) for the
lower half-range of $n$ and $0 \le q \le (p-1)/2$ with the exception that all
sequences $x_{n,l}$ are real. The forward combine equation for ISCS and ISCSIS
sequences is:
(P-1)/2
Vn,g = l/pa;iVn(9+1/2){inio + £ [u Z(9+1/2)in,i a;J9+1/2^_n,z]} (2.35)
l=i
for the lower half-range ofn and 0 1 (p-i)/2
yh,(pja)/2 = l/p{n,0 + (~ 1) (2.36)
Z=1
for the lower half-range of $n$. Note that $\tilde{y}_{0,(p-1)/2} = 0$.
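We now prove Theorem 2.12. First, assume that p is even. The forward
combine equation for I sequences is obtained by developing a compact form
of equation (2.4) which eliminates all redundant data. For this purpose, we
will need the following result which is valid for all I sequences:
\[
\begin{aligned}
x_{n,p-l-1} &= x_{(p-l-1)N/p+n} \\
            &= x_{N-(l+1)N/p+n} \\
            &= -\bar{x}_{(l+1)N/p-n} \\
            &= -\bar{x}_{-n,l+1}
\end{aligned}
\]
Using this result, we obtain:
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq} x_{n,l} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\sum_{l=0}^{p/2-1} \omega_p^{-lq} x_{n,l}
    + \sum_{l=0}^{p/2-1} \omega_p^{-(p-l-1)q} x_{n,p-l-1}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\sum_{l=0}^{p/2-1} \omega_p^{-lq} x_{n,l}
    - \sum_{l=0}^{p/2-1} \omega_p^{(l+1)q} \bar{x}_{-n,l+1}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\sum_{l=0}^{p/2-1} \omega_p^{-lq} x_{n,l}
    - \sum_{l=1}^{p/2} \omega_p^{lq} \bar{x}_{-n,l}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{x_{n,0} + (-1)^{q+1}\bar{x}_{-n,p/2}
    + \sum_{l=1}^{p/2-1}\bigl[\omega_p^{-lq} x_{n,l} - \omega_p^{lq}\bar{x}_{-n,l}\bigr]\Bigr\}
\end{aligned}
\]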
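The compact form in equation (2.31) can be compared numerically with a direct IDFT of each subsequence. The sketch below rests on several assumptions that are not stated in this section: $\omega_N = e^{2\pi i/N}$, subsequence terms $x_{n,l} = x_{n+lN/p}$ and $X_{k,q} = X_{pk+q}$, and $y_{n,q}$ taken to be the unscaled IDFT of $X_{k,q}$. The helper `xs` is hypothetical.

```python
import numpy as np

N, p = 24, 4
L = N // p
rng = np.random.default_rng(4)

X = 1j * rng.standard_normal(N)   # I sequence (purely imaginary terms)
x = np.fft.ifft(X) * N            # its IDFT satisfies x_{N-n} = -conj(x_n)

w_N = np.exp(2j * np.pi / N)
w_p = np.exp(2j * np.pi / p)

def xs(n, l):
    """Assumed subsequence term x_{n,l} = x_{n + l*N/p}, index modulo N."""
    return x[(n + l * L) % N]

max_err = 0.0
for q in range(p):
    y_ref = np.fft.ifft(X[q::p]) * L      # y_{n,q}: IDFT of X_{pk+q}
    for n in range(L // 2 + 1):           # lower half-range of n
        # Compact form: x_{n,0} + (-1)^(q+1) conj(x_{-n,p/2}) + paired terms.
        s = xs(n, 0) + (-1) ** (q + 1) * np.conj(xs(-n, p // 2))
        for l in range(1, p // 2):
            s += w_p ** (-l * q) * xs(n, l) - w_p ** (l * q) * np.conj(xs(-n, l))
        y = w_N ** (-n * q) * s / p
        max_err = max(max_err, abs(y - y_ref[n]))
print(max_err)
```

The inner sum touches only the terms $x_{n,l}$ and $x_{-n,l}$ with $0 \le l \le p/2$, which is the point of the compact form: the upper half of the subsequences is never read.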
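The forward combining of ICS, ISCS, and ICSIS sequences requires one
new equation:
\[
\begin{aligned}
\tilde{y}_{n,p/2} &= \omega_{N/p}^{n/2}\, y_{n,p/2} \\
 &= \frac{1}{p}\Bigl\{x_{n,0} + (-1)^{p/2+1}\bar{x}_{-n,p/2}
    + \sum_{l=1}^{p/2-1} (-1)^l \bigl[x_{n,l} - \bar{x}_{-n,l}\bigr]\Bigr\}
\end{aligned}
\]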
The forward combine equation for ISCSIS sequences is obtained by
substituting equation (2.9) into equation (2.31):
Vn,g .,= l/pUNnq{vn,0 + (-l)q+1X-n,p/2 +
p/2-1
Y ["p - WP 'S-n.l3}
2=1
' = l/p^n(?+1/2){ini0 + *(-1)fl+1a_niP/2 +
p/2-1
EK,(s+1/2)^-^1/2)M}
2=1
1 _ I
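Next, assume that p is odd. The forward combine equation for I
sequences is obtained by developing a compact form of equation (2.4) which
eliminates all redundant data.
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq} x_{n,l} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\omega_p^{-q(p-1)/2} x_{n,(p-1)/2}
    + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq} x_{n,l}
    + \sum_{l=0}^{(p-3)/2} \omega_p^{-q(p-l-1)} x_{n,p-l-1}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\omega_p^{-q(p-1)/2} x_{n,(p-1)/2}
    + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq} x_{n,l}
    - \sum_{l=0}^{(p-3)/2} \omega_p^{(l+1)q} \bar{x}_{-n,l+1}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{\omega_p^{-q(p-1)/2} x_{n,(p-1)/2}
    + \sum_{l=0}^{(p-3)/2} \omega_p^{-lq} x_{n,l}
    - \sum_{l=1}^{(p-1)/2} \omega_p^{lq} \bar{x}_{-n,l}\Bigr\} \\
 &= \frac{1}{p}\,\omega_N^{-nq}\Bigl\{x_{n,0}
    + \sum_{l=1}^{(p-1)/2}\bigl[\omega_p^{-lq} x_{n,l} - \omega_p^{lq}\bar{x}_{-n,l}\bigr]\Bigr\}
\end{aligned}
\]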
The forward combining of ICS and ICSIS sequences does not require
any new equations. The forward combine equation for ISCS and ISCSIS
sequences is obtained by substituting equation (2.9) into equation (2.34):
, (P-l)/2
Vn, q = l/pw/5W,0+ Yj iup kxnl ~ UpX-n,l}}
i1 2=1
(p-l)/2
= 1 */l3^n(9+l/2){*n>0 + [>p*(9+1/2)Xn,! -£4(9+1/2)*-n,i]}
i=l
For $q = (p-1)/2$ this reduces to:
&i>(p-1)/2 = WJV/p^,(p-1)/2
I ; (P-l)/2
= lM*n,0 + X) ~
: z=i
This completes the proof of Theorem 2.12. The following corollary provides
an important special case of this result.
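Corollary 2.9 Assume p = 2. The forward combine equation for I
sequences is:
\[
\begin{aligned}
y_{n,0} &= (x_{n,0} - \bar{x}_{-n,1})/2 \\
y_{n,1} &= \omega_N^{-n}(x_{n,0} + \bar{x}_{-n,1})/2
\end{aligned}
\]
for the lower half-range of $n$. The forward combine equation for ICS and
ISCS sequences is:
\[
\begin{aligned}
y_{n,0}         &= (x_{n,0} - x_{-n,1})/2 \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{-n,1})/2
\end{aligned}
\]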
for the lower half-range of n. The forward combine equation for ISCSIS
sequences is:
Vv.fi = ^"/2(in|0 **_n,i)/2
for the lower half-range of $n$.

2.5 Real Composite Even-Even (RE-E)

In this section, we will be concerned with the following symmetries:

Definition 2.9 A real composite even-even (RE-E) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{x}_n &= x_n \\
x_{N-n}   &= x_n \\
x_{N/2-n} &= x_n
\end{aligned}
\]
Note that an RE-E sequence of length $N$ is also an RE sequence of length $N$.
A real conjugate symmetric zero odd term (RCSZO) sequence $X_k$ of length
$N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{X}_k &= X_k \\
X_{N-k}   &= \bar{X}_k \\
X_k       &= (-1)^k X_k
\end{aligned}
\]
The following lemma establishes the relationship between these symme-
tries.
Lemma 2.9 If $x_n$ is an RE-E sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an RCSZO sequence of length $N$. If $X_k$ is an RCSZO sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an RE-E sequence of
length $N$.
We now prove Lemma 2.9. We will only prove the first assertion. Assume
$x_n$ is an RE-E sequence of length $N$, where $N$ is even. Since $x_n$ is also an
RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS
sequence of length $N$. Thus, we have only to prove the third property in
the definition of an RCSZO sequence. For this, we use the representation of
$X_k$ provided by Theorem 2.7 and the RE-E symmetry of $x_n$ as follows:
\[
\begin{aligned}
X_k &= \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Bigr] \\
    &= \frac{1}{N}\Bigl[x_0 + (-1)^k x_0
       + (-1)^k \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^k \frac{1}{N}\Bigl[(-1)^k x_0 + x_0
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^k \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^k X_k
\end{aligned}
\]
This completes the proof of Lemma 2.9.
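The first assertion of Lemma 2.9 can be illustrated with a short numerical experiment. This is a sketch under the assumption that the DFT carries the $1/N$ scaling and the kernel $e^{-2\pi i kn/N}$ (NumPy's `fft` divided by $N$); the helper `re_e_sequence` is a hypothetical name introduced here.

```python
import numpy as np

def re_e_sequence(N, rng):
    """Random real sequence with x_{N-n} = x_n and x_{N/2-n} = x_n."""
    n = np.arange(N)
    v = rng.standard_normal(N)
    x = 0.5 * (v + v[(-n) % N])              # even about n = 0
    return 0.5 * (x + x[(N // 2 - n) % N])   # even about n = N/4

rng = np.random.default_rng(1)
N = 20
x = re_e_sequence(N, rng)
X = np.fft.fft(x) / N        # DFT with 1/N scaling (assumed convention)

k = np.arange(N)
is_real = np.max(np.abs(X.imag)) < 1e-12                 # conj(X_k) = X_k
is_symmetric = np.allclose(X[(N - k) % N], np.conj(X))   # X_{N-k} = conj(X_k)
odd_terms_zero = np.max(np.abs(X[1::2])) < 1e-12         # X_k = (-1)^k X_k
print(is_real, is_symmetric, odd_terms_zero)
```

The third property, $X_k = (-1)^k X_k$, is equivalent to saying that every odd-indexed term vanishes, which is what the last check tests.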
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for N-NS boundary
conditions. Note that if $N = 2(2M+1)$, then an RE-E sequence satisfies N-NS
boundary conditions for the computational domain $0 \le n \le M$. That is:
\[
\begin{aligned}
x_{N-1} &= x_1 \\
x_M     &= x_{M+1}
\end{aligned}
\]
Theorem 2.13 Let $x_n$ be an RE-E sequence and let $X_k$ be its RCSZO
symmetric DFT, both of length $N$ where $N$ is even. Assume that
$N = 2(2M+1)$. The real form of the DFT is:
\[ X_{2k} = \frac{2}{N}\Bigl[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Bigr] \]
for $0 \le k \le M$. The real form of the IDFT is:
\[ x_n = X_0 + \sum_{k=1}^{M} 2X_{2k} \cos(4\pi kn/N) \]
for $0 \le n \le M$. Note that the results for the DFT and IDFT are identical
except for scaling.
We now prove Theorem 2.13. The result for the DFT follows from
Theorem 2.7, the RCSZO symmetry of $X_k$, and the RE-E symmetry of $x_n$ as
follows:
\[
\begin{aligned}
X_{2k} &= \frac{1}{N}\Bigl[x_0 + x_{N/2}
          + \sum_{n=1}^{N/2-1} 2x_n \cos(4\pi kn/N)\Bigr] \\
       &= \frac{1}{N}\Bigl\{x_0 + x_{N/2}
          + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)
          + \sum_{n=1}^{M} 2x_{N/2-n} \cos[4\pi k(N/2-n)/N]\Bigr\} \\
       &= \frac{2}{N}\Bigl[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Bigr]
\end{aligned}
\]
The result for the IDFT follows immediately from Theorem 2.7 and the
RCSZO symmetry of $X_k$. Note that only one fourth of the RE-E sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.13.
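The real forms in Theorem 2.13 can be evaluated directly from the $M+1$ specified terms and compared with a full-length complex FFT. The following is a minimal sketch, again assuming the DFT is NumPy's `fft` divided by $N$; the array name `quarter` (holding $x_0, \ldots, x_M$) is illustrative.

```python
import numpy as np

M = 4
N = 2 * (2 * M + 1)
rng = np.random.default_rng(2)

# Only x_0..x_M are specified; RE-E symmetry determines the other terms.
quarter = rng.standard_normal(M + 1)
x = np.empty(N)
for n in range(N):
    m = min(n, N - n)                    # x_{N-n} = x_n
    x[n] = quarter[min(m, N // 2 - m)]   # x_{N/2-n} = x_n

k = np.arange(M + 1)                     # 0 <= k <= M (same range as n)
C = np.cos(4 * np.pi * np.outer(k, k[1:]) / N)   # C[k, n-1] for n = 1..M

# Real form of the DFT: X_{2k} = 2/N [x_0 + sum_n 2 x_n cos(4 pi k n / N)]
X_even = 2.0 / N * (quarter[0] + 2.0 * C @ quarter[1:])
# Real form of the IDFT: x_n = X_0 + sum_k 2 X_{2k} cos(4 pi k n / N)
x_back = X_even[0] + 2.0 * C @ X_even[1:]

X_full = np.fft.fft(x) / N               # reference DFT with 1/N scaling
dft_err = np.max(np.abs(X_even - X_full[2 * k]))
idft_err = np.max(np.abs(x_back - quarter))
print(dft_err, idft_err)
```

The same cosine matrix `C` serves both directions, which reflects the remark that the DFT and IDFT are identical except for scaling. The matrix products here cost $O(M^2)$ operations and only illustrate the formulas; they are not the fast algorithm developed in the text.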
A fast, mixed radix algorithm for computing the RE-E symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RE symmetric FFT. Note that an RE-E sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an RCSZO sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction to one eighth in
both storage requirements and number of operations compared to that for
C sequences. This algorithm is based on the symmetries which occur in the
splittings of the RCSZO sequence $X_k$. We begin developing this algorithm
by defining one new intermediate symmetry involved.
Definition 2.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:
\[ X_k = 0 \]
for $0 \le k \le N-1$.
The following lemma establishes the relationship between the symmetries
which occur in the splittings of the RCSZO sequence $X_k$. We omit the proof
of this result because it is trivial.
Lemma 2.10 Let $X_k$ be an RCSZO sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is RCS symmetric, and subsequence $X_{k,1}$ is Z
symmetric. The symmetries which occur in the splittings of the RCS sequence
$X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences
have R symmetry as well.
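The first splitting in Lemma 2.10 can be observed on a small example. The sketch below assumes that the subsequences of a factor-2 splitting are $X_{k,l} = X_{2k+l}$, which is inferred from the "zero odd term" property rather than quoted from an earlier definition, and that the DFT is NumPy's `fft` divided by $N$.

```python
import numpy as np

N = 20
n = np.arange(N)
rng = np.random.default_rng(3)

# Build an RE-E sequence, so that its DFT is RCSZO by Lemma 2.9.
v = rng.standard_normal(N)
x = 0.5 * (v + v[(-n) % N])
x = 0.5 * (x + x[(N // 2 - n) % N])
X = np.fft.fft(x) / N

# Assumed splitting with factor 2: X_{k,0} = X_{2k}, X_{k,1} = X_{2k+1}.
X0, X1 = X[0::2], X[1::2]
half = N // 2
j = np.arange(half)

# X_{k,0} is real and conjugate symmetric over its own length N/2 (RCS).
x0_is_rcs = bool(np.max(np.abs(X0.imag)) < 1e-12
                 and np.allclose(X0[(half - j) % half], np.conj(X0)))
# X_{k,1} is a zero (Z) sequence.
x1_is_zero = bool(np.max(np.abs(X1)) < 1e-12)
print(x0_is_rcs, x1_is_zero)
```

Since the Z branch carries no data, the first stage leaves a single RCS sequence of length $N/2$ to process.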
A mixed radix splitting tree diagram for an RCSZO sequence is shown
in Figure 2.5. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find R sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4 and 2.6, with
the addition provided by the following lemma. We omit the proof of this
result because it is trivial.
Lemma 2.11 Let $X_k$ be a Z sequence of length $N$. Its IDFT $x_n$ is also a
Z sequence of length $N$.
These results show that each symmetry appearing in Figure 2.5 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the RE-E symmetric IFFT, obtained as a special case
of that for the RE symmetric IFFT.
Corollary 2.10 Assume p = 2. The inverse combine equation for RCS and
Z sequences is:
\[ x_{n,0} = y_{n,0} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.8 for arbitrary factors p.
We now prove Corollary 2.10. The inverse combine equation for RCS
and Z sequences may be regarded as a special case of that for RCS and
RSCS sequences, where p = 2. Thus, we apply Corollary 2.6 and use the
Z symmetry of $y_{n,1}$. Note that the companion equation is not needed
because only one fourth of the RE-E sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.10.
The next corollary provides all of the forward combine equations for
the RE-E symmetric FFT, obtained as a special case of that for the RE
symmetric FFT.
Corollary 2.11 Assume p = 2. The forward combine equation for RCS
and Z sequences is:
\[
\begin{aligned}
y_{n,0}         &= x_{n,0} \\
\tilde{y}_{n,1} &= 0
\end{aligned}
\]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.9 for arbitrary factors p.

Figure 2.5: Splitting tree for RE-E symmetric FFT

We now prove Corollary 2.11. The forward combine equation for RCS
and Z sequences may be regarded as a special case of that for RCS and
RSCS sequences, where p = 2. Thus, we apply Corollary 2.7 and use the
RE-E symmetry of $x_n$ as follows:
\[
\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0} \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0
\end{aligned}
\]
This completes the proof of Corollary 2.11.

2.6 Real Composite Even-Odd (RE-O)

In this section, we will be concerned with the following symmetries:

Definition 2.11 A real composite even-odd (RE-O) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{x}_n &= x_n \\
x_{N-n}   &= x_n \\
x_{N/2-n} &= -x_n
\end{aligned}
\]
Note that an RE-O sequence of length $N$ is also an RE sequence of length
$N$. A real conjugate symmetric zero even term (RCSZE) sequence $X_k$ of
length $N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{X}_k &= X_k \\
X_{N-k}   &= \bar{X}_k \\
X_k       &= (-1)^{k+1} X_k
\end{aligned}
\]
The following lemma establishes the relationship between these symme-
tries.
Lemma 2.12 If $x_n$ is an RE-O sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an RCSZE sequence of length $N$. If $X_k$ is an RCSZE sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an RE-O sequence of
length $N$.
We now prove Lemma 2.12. We will only prove the first assertion.
Assume $x_n$ is an RE-O sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an
RCS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an RCSZE sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.7 and the RE-O symmetry of $x_n$ as follows:
\[
\begin{aligned}
X_k &= \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Bigr] \\
    &= \frac{1}{N}\Bigl[x_0 + (-1)^{k+1} x_0
       + (-1)^{k+1} \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^{k+1} \frac{1}{N}\Bigl[(-1)^{k+1} x_0 + x_0
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^{k+1} \frac{1}{N}\Bigl[x_0 + (-1)^k x_{N/2}
       + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Bigr] \\
    &= (-1)^{k+1} X_k
\end{aligned}
\]
This completes the proof of Lemma 2.12.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for N-D or N-DS
boundary conditions, depending on the length of the sequence $N$. Note that if
$N = 4M$, then an RE-O sequence satisfies N-D boundary conditions for the
computational domain $0 \le n \le N/4 - 1$. That is:
\[
\begin{aligned}
x_{N-1} &= x_1 \\
x_{N/4} &= 0
\end{aligned}
\]
Similarly, if $N = 2(2M+1)$, then an RE-O sequence satisfies N-DS boundary
conditions for the computational domain $0 \le n \le M$. That is:
\[
\begin{aligned}
x_{N-1} &= x_1 \\
x_M     &= -x_{M+1}
\end{aligned}
\]
Theorem 2.14 Let $x_n$ be an RE-O sequence and let $X_k$ be its RCSZE
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The
real form of the DFT is:
\[ X_{2k+1} = \frac{2}{N}\Bigl\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Bigr\} \]
for $0 \le k \le N/4 - 1$. The real form of the IDFT is:
\[ x_n = \sum_{k=0}^{N/4-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N] \]
for $0 \le n \le N/4 - 1$. Next, assume that $N = 2(2M+1)$. The real form of
the DFT is:
\[ X_{2k+1} = \frac{2}{N}\Bigl\{x_0 + \sum_{n=1}^{M} 2x_n \cos[2\pi n(2k+1)/N]\Bigr\} \]
for $0 \le k \le M$. The real form of the IDFT is:
\[ x_n = (-1)^n X_{N/2} + \sum_{k=0}^{M-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N] \]
for $0 \le n \le M$.

We now prove Theorem 2.14. We prove the result for the DFT for the
case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This
result follows from Theorem 2.7, the RCSZE symmetry of $X_k$, and the RE-O
symmetry of $x_n$ as follows (the $n = N/4$ term vanishes because $x_{N/4} = 0$):
\[
\begin{aligned}
X_{2k+1} &= \frac{1}{N}\Bigl\{x_0 - x_{N/2}
            + \sum_{n=1}^{N/2-1} 2x_n \cos[2\pi n(2k+1)/N]\Bigr\} \\
         &= \frac{1}{N}\Bigl\{x_0 - x_{N/2}
            + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]
            + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \cos[2\pi (N/2-n)(2k+1)/N]\Bigr\} \\
         &= \frac{2}{N}\Bigl\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Bigr\}
\end{aligned}
\]
The results for the IDFT follow immediately from Theorem 2.7 and the
RCSZE symmetry of $X_k$. Note that only one fourth of the RE-O sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.14.
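For the case $N = 4M$, the real forms in Theorem 2.14 can be checked against a full complex FFT. This is a sketch under the same assumed DFT convention as before (NumPy's `fft` divided by $N$); `quarter` is an illustrative name for the specified terms $x_0, \ldots, x_{N/4-1}$, padded with the forced value $x_{N/4} = 0$.

```python
import numpy as np

M = 5
N = 4 * M
rng = np.random.default_rng(5)

# Specify x_0..x_{N/4-1}; RE-O symmetry forces x_{N/4} = 0 (stored last).
quarter = np.append(rng.standard_normal(M), 0.0)
x = np.empty(N)
for n in range(N):
    m = min(n, N - n)                                        # x_{N-n} = x_n
    x[n] = quarter[m] if m <= M else -quarter[N // 2 - m]    # x_{N/2-n} = -x_n

k = np.arange(M)                  # 0 <= k <= N/4 - 1 (same range as n)
C = np.cos(2 * np.pi * np.outer(2 * k + 1, k) / N)   # C[k, n]

# X_{2k+1} = 2/N {x_0 + sum_{n=1}^{N/4-1} 2 x_n cos[2 pi n (2k+1)/N]}
X_odd = 2.0 / N * (quarter[0] + 2.0 * C[:, 1:] @ quarter[1:M])
# x_n = sum_{k=0}^{N/4-1} 2 X_{2k+1} cos[2 pi n (2k+1)/N]
x_back = 2.0 * C.T @ X_odd

X_full = np.fft.fft(x) / N
dft_err = np.max(np.abs(X_odd - X_full[2 * k + 1]))
even_err = np.max(np.abs(X_full[0::2]))      # even-indexed terms vanish
idft_err = np.max(np.abs(x_back - quarter[:M]))
print(dft_err, even_err, idft_err)
```

The zero at $x_{N/4}$ is the Dirichlet condition of the N-D problem, and the even-indexed check confirms the RCSZE property from Lemma 2.12.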
A fast, mixed radix algorithm for computing the RE-O symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RE symmetric FFT. Note that an RE-O sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an RCSZE sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction to one eighth
in both storage requirements and number of operations compared to that
for C sequences. This algorithm is based on the symmetries which occur in
the splittings of the RCSZE sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.13 Let $X_k$ be an RCSZE sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSCS
symmetric. The symmetries which occur in the splittings of the RSCS sequence
$X_{k,1}$ are identical to those in Lemma 2.3, with the addition that all sequences
have R symmetry as well.
A mixed radix splitting tree diagram for an RCSZE sequence is shown
in Figure 2.6. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find R sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.6, and 2.11.
These results show that each symmetry appearing in Figure 2.6 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the RE-O symmetric IFFT, obtained as a special case
of that for the RE symmetric IFFT.
Corollary 2.12 Assume p = 2. The inverse combine equation for Z and
RSCS sequences is:
\[ x_{n,0} = \tilde{y}_{n,1} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.8 for arbitrary factors p.
We now prove Corollary 2.12. The inverse combine equation for Z and
RSCS sequences may be regarded as a special case of that for RCS and
RSCS sequences, where p = 2. Thus, we apply Corollary 2.6 and use the Z
symmetry of $y_{n,0}$. Note that the companion equation is not needed because
only one fourth of the RE-O sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.12.

Figure 2.6: Splitting tree for RE-O symmetric FFT

The next corollary provides all of the forward combine equations for
the RE-O symmetric FFT, obtained as a special case of that for the RE
symmetric FFT.
Corollary 2.13 Assume p = 2. The forward combine equation for Z and
RSCS sequences is:
\[
\begin{aligned}
y_{n,0}         &= 0 \\
\tilde{y}_{n,1} &= x_{n,0}
\end{aligned}
\]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.9 for arbitrary factors p.
We now prove Corollary 2.13. The forward combine equation for Z and
RSCS sequences may be regarded as a special case of that for RCS and
RSCS sequences, where p = 2. Thus, we apply Corollary 2.7 and use the
RE-O symmetry of $x_n$ as follows:
\[
\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0 \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0}
\end{aligned}
\]
This completes the proof of Corollary 2.13.

2.7 Real Composite Odd-Even (RO-E)

In this section, we will be concerned with the following symmetries:

Definition 2.12 A real composite odd-even (RO-E) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{x}_n &= x_n \\
x_{N-n}   &= -x_n \\
x_{N/2-n} &= x_n
\end{aligned}
\]
Note that an RO-E sequence of length $N$ is also an RO sequence of length
$N$. An imaginary conjugate symmetric zero even term (ICSZE) sequence
$X_k$ of length $N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{X}_k &= -X_k \\
X_{N-k}   &= \bar{X}_k \\
X_k       &= (-1)^{k+1} X_k
\end{aligned}
\]
The following lemma establishes the relationship between these symme-
tries.
Lemma 2.14 If $x_n$ is an RO-E sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an ICSZE sequence of length $N$. If $X_k$ is an ICSZE sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an RO-E sequence of
length $N$.
We now prove Lemma 2.14. We will only prove the first assertion.
Assume $x_n$ is an RO-E sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an
ICS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an ICSZE sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.10 and the RO-E symmetry of $x_n$ as follows:
\[
\begin{aligned}
X_k &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
    &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N] \\
    &= (-1)^{k+1}\Bigl[-\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Bigr] \\
    &= (-1)^{k+1} X_k
\end{aligned}
\]
This completes the proof of Lemma 2.14.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for D-N or D-NS
boundary conditions, depending on the length of the sequence $N$. Note that if
$N = 4M$, then an RO-E sequence satisfies D-N boundary conditions for the
computational domain $1 \le n \le N/4$. That is:
\[
\begin{aligned}
x_0       &= 0 \\
x_{N/4-1} &= x_{N/4+1}
\end{aligned}
\]
Similarly, if $N = 2(2M+1)$, then an RO-E sequence satisfies D-NS boundary
conditions for the computational domain $1 \le n \le M$. That is:
\[
\begin{aligned}
x_0 &= 0 \\
x_M &= x_{M+1}
\end{aligned}
\]
Theorem 2.15 Let $x_n$ be an RO-E sequence and let $X_k$ be its ICSZE
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The
real form of the DFT is:
\[
\mathrm{Im}(X_{2k-1}) = -\frac{2}{N}\Bigl\{(-1)^{k+1} x_{N/4}
  + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Bigr\}
\]
for $1 \le k \le N/4$. The real form of the IDFT is:
\[ x_n = -\sum_{k=1}^{N/4} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N] \]
for $1 \le n \le N/4$. Next, assume that $N = 2(2M+1)$. The real form of the
DFT is:
\[ \mathrm{Im}(X_{2k-1}) = -\frac{2}{N} \sum_{n=1}^{M} 2x_n \sin[2\pi n(2k-1)/N] \]
for $1 \le k \le M$. The real form of the IDFT is:
\[ x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N] \]
for $1 \le n \le M$.
We now prove Theorem 2.15. We prove the result for the DFT for the
case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This
result follows from Theorem 2.10, the ICSZE symmetry of $X_k$, and the RO-E
symmetry of $x_n$ as follows:
\[
\begin{aligned}
\mathrm{Im}(X_{2k-1}) &= -\frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin[2\pi n(2k-1)/N] \\
 &= -\frac{1}{N}\Bigl\{(-1)^{k+1} 2x_{N/4}
    + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]
    + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \sin[2\pi (N/2-n)(2k-1)/N]\Bigr\} \\
 &= -\frac{2}{N}\Bigl\{(-1)^{k+1} x_{N/4}
    + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Bigr\}
\end{aligned}
\]
The results for the IDFT follow immediately from Theorem 2.10 and the
ICSZE symmetry of $X_k$. Note that only one fourth of the RO-E sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.15.
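The sine expansions of Theorem 2.15 for $N = 4M$ can be checked in the same way as the cosine forms. The sketch below assumes the DFT is NumPy's `fft` divided by $N$, so that $X_k = -\frac{i}{N}\sum 2x_n \sin(2\pi kn/N)$ for a real odd sequence, and it writes the IDFT with the leading minus sign that this convention implies. The name `quarter` (holding $x_0 = 0$ followed by $x_1, \ldots, x_{N/4}$) is illustrative.

```python
import numpy as np

M = 5
N = 4 * M
rng = np.random.default_rng(7)

# Specify x_1..x_{N/4}; RO-E symmetry forces x_0 = 0 (stored first).
quarter = np.concatenate(([0.0], rng.standard_normal(M)))
x = np.empty(N)
for n in range(N):
    m = min(n, N - n)
    sign = 1.0 if n <= N // 2 else -1.0           # x_{N-n} = -x_n
    x[n] = sign * quarter[min(m, N // 2 - m)]     # x_{N/2-n} = x_n

k = np.arange(1, M + 1)           # 1 <= k <= N/4 (same range as n)
S = np.sin(2 * np.pi * np.outer(2 * k - 1, k) / N)   # S[k-1, n-1]

# Im(X_{2k-1}) = -2/N {(-1)^(k+1) x_{N/4} + sum_{n<N/4} 2 x_n sin[...]}
im_X = -2.0 / N * ((-1.0) ** (k + 1) * quarter[M]
                   + 2.0 * S[:, :M - 1] @ quarter[1:M])
# x_n = -sum_k 2 Im(X_{2k-1}) sin[2 pi n (2k-1)/N]
x_back = -2.0 * S.T @ im_X

X_full = np.fft.fft(x) / N
dft_err = np.max(np.abs(im_X - X_full[2 * k - 1].imag))
idft_err = np.max(np.abs(x_back - quarter[1:]))
print(dft_err, idft_err)
```

The term $(-1)^{k+1} x_{N/4}$ is the unpaired midpoint of the half-range sum, since $\sin[\pi(2k-1)/2] = (-1)^{k+1}$.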
A fast, mixed radix algorithm for computing the RO-E symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RO symmetric FFT. Note that an RO-E sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an ICSZE sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction to one eighth
in both storage requirements and number of operations compared to that
for C sequences. This algorithm is based on the symmetries which occur in
the splittings of the ICSZE sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.15 Let $X_k$ be an ICSZE sequence of length $N$ with factor 2. Then
subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is ISCS symmetric.
The symmetries which occur in the splittings of the ISCS sequence $X_{k,1}$ are
identical to those in Lemma 2.3, with the addition that all sequences have I
symmetry as well.
A mixed radix splitting tree diagram for an ICSZE sequence is shown
in Figure 2.7. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find I sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11.
These results show that each symmetry appearing in Figure 2.7 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the RO-E symmetric IFFT, obtained as a special case
of that for the RO symmetric IFFT.
Corollary 2.14 Assume p = 2. The inverse combine equation for Z and
ISCS sequences is:
\[ x_{n,0} = \tilde{y}_{n,1} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.11 for arbitrary factors p.
We now prove Corollary 2.14. The inverse combine equation for Z and
ISCS sequences may be regarded as a special case of that for ICS and ISCS
sequences, where p = 2. Thus, we apply Corollary 2.8 and use the Z
symmetry of $y_{n,0}$. Note that the companion equation is not needed because only
one fourth of the RO-E sequence $x_n$ needs to be computed. This completes
the proof of Corollary 2.14.
The next corollary provides all of the forward combine equations for
the RO-E symmetric FFT, obtained as a special case of that for the RO
symmetric FFT.
Corollary 2.15 Assume p = 2. The forward combine equation for Z and
ISCS sequences is:
\[
\begin{aligned}
y_{n,0}         &= 0 \\
\tilde{y}_{n,1} &= x_{n,0}
\end{aligned}
\]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.12 for arbitrary factors p.

Figure 2.7: Splitting tree for RO-E symmetric FFT

We now prove Corollary 2.15. The forward combine equation for Z and
ISCS sequences may be regarded as a special case of that for ICS and ISCS
sequences, where p = 2. Thus, we apply Corollary 2.9 and use the RO-E
symmetry of $x_n$ as follows:
\[
\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0 \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0}
\end{aligned}
\]
This completes the proof of Corollary 2.15.

2.8 Real Composite Odd-Odd (RO-O)
In this section, we will be concerned with the following symmetries:

Definition 2.13 A real composite odd-odd (RO-O) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{x}_n &= x_n \\
x_{N-n}   &= -x_n \\
x_{N/2-n} &= -x_n
\end{aligned}
\]
Note that an RO-O sequence of length $N$ is also an RO sequence of length
$N$. An imaginary conjugate symmetric zero odd term (ICSZO) sequence $X_k$
of length $N$, where $N$ is even, is defined by:
\[
\begin{aligned}
\bar{X}_k &= -X_k \\
X_{N-k}   &= \bar{X}_k \\
X_k       &= (-1)^k X_k
\end{aligned}
\]
The following lemma establishes the relationship between these symme-
tries.
Lemma 2.16 If $x_n$ is an RO-O sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an ICSZO sequence of length $N$. If $X_k$ is an ICSZO sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an RO-O sequence of
length $N$.
We now prove Lemma 2.16. We will only prove the first assertion.
Assume $x_n$ is an RO-O sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an
ICS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an ICSZO sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.10 and the RO-O symmetry of $x_n$ as follows:
\[
\begin{aligned}
X_k &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
    &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N] \\
    &= (-1)^k\Bigl[-\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Bigr] \\
    &= (-1)^k X_k
\end{aligned}
\]
This completes the proof of Lemma 2.16.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for D-DS boundary
conditions. Note that if $N = 2(2M+1)$, then an RO-O sequence satisfies D-DS
boundary conditions for the computational domain $1 \le n \le M$. That is:
\[
\begin{aligned}
x_0 &= 0 \\
x_M &= -x_{M+1}
\end{aligned}
\]
Theorem 2.16 Let $x_n$ be an RO-O sequence and let $X_k$ be its ICSZO
symmetric DFT, both of length $N$ where $N$ is even. Assume that
$N = 2(2M+1)$. The real form of the DFT is:
\[ \mathrm{Im}(X_{2k}) = -\frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N) \]
for $1 \le k \le M$. The real form of the IDFT is:
\[ x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin(4\pi kn/N) \]
for $1 \le n \le M$. Note that the results for the DFT and IDFT are identical
except for scaling.

We now prove Theorem 2.16. The result for the DFT follows from
Theorem 2.10, the ICSZO symmetry of $X_k$, and the RO-O symmetry of $x_n$ as
follows:
\[
\begin{aligned}
\mathrm{Im}(X_{2k}) &= -\frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(4\pi kn/N) \\
 &= -\frac{1}{N}\Bigl\{\sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)
    + \sum_{n=1}^{M} 2x_{N/2-n} \sin[4\pi k(N/2-n)/N]\Bigr\} \\
 &= -\frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)
\end{aligned}
\]
The result for the IDFT follows immediately from Theorem 2.10 and the
ICSZO symmetry of $X_k$. Note that only one fourth of the RO-O sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.16.
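The pair of sine sums in Theorem 2.16 can be verified numerically for $N = 2(2M+1)$. As before, this is only a sketch: it assumes the DFT is NumPy's `fft` divided by $N$, uses the sign of the IDFT implied by that convention, and the name `quarter` (holding $x_0 = 0$ followed by $x_1, \ldots, x_M$) is illustrative.

```python
import numpy as np

M = 4
N = 2 * (2 * M + 1)
rng = np.random.default_rng(8)

# Specify x_1..x_M; the RO-O symmetries determine every other term.
quarter = np.concatenate(([0.0], rng.standard_normal(M)))
x = np.empty(N)
for n in range(N):
    m = min(n, N - n)
    s = 1.0 if n <= N // 2 else -1.0      # x_{N-n} = -x_n
    if m > M:                             # x_{N/2-n} = -x_n
        m, s = N // 2 - m, -s
    x[n] = s * quarter[m]

k = np.arange(1, M + 1)                   # 1 <= k <= M (same range as n)
S = np.sin(4 * np.pi * np.outer(k, k) / N)    # S[k-1, n-1]

# Im(X_{2k}) = -2/N sum_{n=1}^{M} 2 x_n sin(4 pi k n / N)
im_X = -2.0 / N * (2.0 * S @ quarter[1:])
# x_n = -sum_{k=1}^{M} 2 Im(X_{2k}) sin(4 pi k n / N)
x_back = -2.0 * S.T @ im_X

X_full = np.fft.fft(x) / N
dft_err = np.max(np.abs(im_X - X_full[2 * k].imag))
idft_err = np.max(np.abs(x_back - quarter[1:]))
print(dft_err, idft_err)
```

Here `S` is symmetric, so the forward and inverse sums differ only by the factor $2/N$, in line with the closing remark of the theorem.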
A fast, mixed radix algorithm for computing the RO-O symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RO symmetric FFT. Note that an RO-O sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an ICSZO sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction to one eighth
in both storage requirements and number of operations compared to that
for C sequences. This algorithm is based on the symmetries which occur in
the splittings of the ICSZO sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.17 Let $X_k$ be an ICSZO sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is ICS symmetric, and subsequence $X_{k,1}$ is Z
symmetric. The symmetries which occur in the splittings of the ICS sequence
$X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences
have I symmetry as well.
A mixed radix splitting tree diagram for an ICSZO sequence is shown
in Figure 2.8. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find I sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11.
These results show that each symmetry appearing in Figure 2.8 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the RO-O symmetric IFFT, obtained as a special case
of that for the RO symmetric IFFT.

Figure 2.8: Splitting tree for RO-O symmetric FFT. [Tree diagram with root node ICSZO.]

Corollary 2.16 Assume $p = 2$. The inverse combine equation for ICS and
Z sequences is:
\[ x_{n,0} = y_{n,0} \]
for the lower half-range of $n$. The inverse combine equations for the remaining
symmetries are provided by Theorem 2.11 for arbitrary factors $p$.
We now prove Corollary 2.16. The inverse combine equation for ICS
and Z sequences may be regarded as a special case of that for ICS and
ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z
symmetry of $y_{n,1}$. Note that the companion equation is not needed because
only one fourth of the RO-O sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.16.
The next corollary provides all of the forward combine equations for
the RO-O symmetric FFT, obtained as a special case of that for the RO
symmetric FFT.
Corollary 2.17 Assume $p = 2$. The forward combine equation for ICS and
Z sequences is:
\[ y_{n,0} = x_{n,0}, \qquad y_{n,1} = 0 \]
for the lower half-range of $n$. The forward combine equations for the remaining
symmetries are provided by Theorem 2.12 for arbitrary factors $p$.
We now prove Corollary 2.17. The forward combine equation for ICS
and Z sequences may be regarded as a special case of that for ICS and ISCS
sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the RO-O
symmetry of $x_n$ as follows:
\[
\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0} \\
y_{n,1} &= (x_{n,0} + x_{n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0
\end{aligned}
\]
This completes the proof of Corollary 2.17.

2.9 Real Staggered Even (RSE)
In this section, we will be concerned with the following symmetries:
Definition 2.14 A real staggered even (RSE) sequence $x_n$ of length $N$ is
defined by:
\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = x_n \]
An $\omega$-even ($\omega$E) sequence $X_k$ of length $N$ is defined by:
\[ X_{N-k} = \bar{X}_k \tag{2.37} \]
\[ X_k = \omega_N^{k} \bar{X}_k \tag{2.38} \]
The following lemma establishes the relationship between these symme-
tries. We omit the proof of this result because it is well known.
Lemma 2.18 If $x_n$ is an RSE sequence of length $N$, then its DFT $X_k$ is
an $\omega$E sequence of length $N$. If $X_k$ is an $\omega$E sequence of length $N$, then its
IDFT $x_n$ is an RSE sequence of length $N$.
The next lemma will be needed to obtain the real form of the DFT and
IDFT.
Lemma 2.19 Let $X_k$ be an $\omega$E sequence of length $N$, and let $\tilde{X}_k$ denote
the magnitude of $X_k$. Then:
\[ X_k = \omega_N^{k/2} \tilde{X}_k \tag{2.39} \]
\[ \tilde{X}_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.40} \]
\[ x_n = \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.41} \]
\[ \tilde{X}_{N+k} = -\tilde{X}_k \tag{2.42} \]
\[ \tilde{X}_{N-k} = -\tilde{X}_k \tag{2.43} \]

We now prove Lemma 2.19. We express $X_k$ in polar form as follows:
\[ X_k = \tilde{X}_k e^{i\theta} \]
Substituting this into equation (2.38) and solving for $\theta$ leads to equation
(2.39). Combining equations (2.1) and (2.39) leads to equation (2.40), while
combining equations (2.2) and (2.39) leads to equation (2.41). Equation
(2.42) is obtained from equation (2.40) as follows:
\[
\begin{aligned}
\tilde{X}_{N+k} &= \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} \\
&= -\frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= -\tilde{X}_k
\end{aligned}
\]
Equation (2.43) is obtained by combining equations (2.37) and (2.39) as
follows:
\[
\begin{aligned}
\tilde{X}_{N-k} &= \omega_N^{-(N-k)/2} X_{N-k} \\
&= -\omega_N^{k/2} \bar{X}_k \\
&= -\tilde{X}_k
\end{aligned}
\]
This completes the proof of Lemma 2.19.
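The identities of Lemma 2.19 can be spot-checked on a small RSE sequence. This is a minimal sketch that assumes the convention $\omega_N = e^{2\pi i/N}$ and evaluates equation (2.40) by direct summation; the helper names `dft_coefficient` and `staggered_coefficient` are illustrative and not taken from the thesis.

```python
import cmath
import math


def dft_coefficient(x, k):
    """X_k = (1/N) sum_n x_n omega_N^(-k n)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N


def staggered_coefficient(x, k):
    """Equation (2.40): (1/N) sum_n x_n omega_N^(-k (n + 1/2))."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
               for n in range(N)) / N


half = [3.0, 1.0, -2.0, 0.5]
x = half + half[::-1]                    # RSE symmetry x_{N-n-1} = x_n, N = 8
N = len(x)

# Evaluate the staggered coefficients over two periods to test (2.42).
Xt = [staggered_coefficient(x, k) for k in range(2 * N)]

max_imag = max(abs(v.imag) for v in Xt)  # the staggered coefficients are real
err_239 = max(abs(dft_coefficient(x, k) - cmath.exp(1j * math.pi * k / N) * Xt[k])
              for k in range(N))         # X_k = omega_N^(k/2) * Xt_k
err_242 = max(abs(Xt[N + k] + Xt[k]) for k in range(N))
err_243 = max(abs(Xt[N - k] + Xt[k]) for k in range(1, N))
```

Equation (2.43) with $k = N/2$ forces the coefficient at $N/2$ to vanish, which is the Dirichlet condition used in the discussion that follows.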
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the DFT is the eigenvector
expansion required by the Fourier analysis method for N-D boundary
conditions. Note that if $N$ is even, then an $\omega$E sequence (represented by
$\tilde{X}_k$) satisfies N-D boundary conditions for the computational domain
$0 \le k \le N/2 - 1$. That is:
\[ \tilde{X}_{N/2} = 0 \]
Observe that the result for the IDFT is the eigenvector expansion required
by the Fourier analysis method for NS-NS boundary conditions. Note that
if $N$ is even, then an RSE sequence satisfies NS-NS boundary conditions for
the computational domain $0 \le n \le N/2 - 1$. That is:
\[ x_{N-1} = x_0, \qquad x_{N/2-1} = x_{N/2} \]

Theorem 2.17 Let $x_n$ be an RSE sequence and let $X_k$ be its $\omega$E symmetric
DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
\[ \tilde{X}_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n + 1)/N] \]
for $0 \le k \le N/2 - 1$. The real form of the IDFT is:
\[ x_n = \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2 \tilde{X}_k \cos[\pi k(2n + 1)/N] \]
for $0 \le n \le N/2 - 1$.
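We now prove Theorem 2.17. The result for the DFT follows from equation
(2.40) and the RSE symmetry of $x_n$ as follows:
\[
\begin{aligned}
\tilde{X}_k &= \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= \frac{1}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}
   + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)} \Big\} \\
&= \frac{1}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}
   + \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)} \Big\} \\
&= \frac{2}{N} \mathrm{Re}\Big[ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} \Big] \\
&= \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n + 1)/N]
\end{aligned}
\]
Note that only half of the $\omega$E sequence $\tilde{X}_k$ needs to be specified. The result
for the IDFT follows from equations (2.41) and (2.43) as follows:
\[
\begin{aligned}
x_n &= \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \\
&= \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}
   + \sum_{k=1}^{N/2-1} \tilde{X}_{N-k} \omega_N^{(N-k)(n+1/2)} \\
&= \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}
   + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)} \\
&= \tilde{X}_0 + 2\,\mathrm{Re}\Big[ \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} \Big] \\
&= \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2 \tilde{X}_k \cos[\pi k(2n + 1)/N]
\end{aligned}
\]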
Note that only half of the RSE sequence xn needs to be specified. This
completes the proof of Theorem 2.17.
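As an illustration of Theorem 2.17, the two real forms can be evaluated directly on the stored half of an RSE sequence. The sketch below is a slow $O(N^2)$ reference under the same conventions as the theorem, not the fast staggered transform developed in Chapter 3; the function names `rse_forward` and `rse_inverse` are assumptions made for this example.

```python
import cmath
import math


def rse_forward(half):
    """Real form of the DFT: returns the staggered coefficients for k = 0..N/2-1."""
    N = 2 * len(half)
    return [sum(2.0 * half[n] * math.cos(math.pi * k * (2 * n + 1) / N)
                for n in range(N // 2)) / N
            for k in range(N // 2)]


def rse_inverse(coeffs):
    """Real form of the IDFT: returns x_n for n = 0..N/2-1."""
    N = 2 * len(coeffs)
    return [coeffs[0] + sum(2.0 * coeffs[k] * math.cos(math.pi * k * (2 * n + 1) / N)
                            for k in range(1, N // 2))
            for n in range(N // 2)]


half = [3.0, 1.0, -2.0, 0.5, 4.0]        # x_0..x_{N/2-1}, so N = 10
coeffs = rse_forward(half)
recovered = rse_inverse(coeffs)

# Cross-check against the complex DFT of the full RSE sequence, using
# X_k = omega_N^(k/2) * (staggered coefficient) from equation (2.39).
x = half + half[::-1]
N = len(x)
dft_err = max(
    abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
        - cmath.exp(1j * math.pi * k / N) * coeffs[k])
    for k in range(N // 2))
```

Both directions touch only $N/2$ real values, which matches the storage count discussed next.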
We now develop a fast, mixed radix algorithm for computing the RSE
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an
RSE sequence of length $N$ may be stored in $N/2$ real storage locations,
compared to $2N$ real storage locations for a C sequence of length $N$. However,
an $\omega$E sequence of length $N$ requires $N$ real storage locations. Thus,
in order to obtain an in-place algorithm, we must use a more compact
representation of an $\omega$E sequence. Such a compact representation is provided
by the real magnitudes $\tilde{X}_k$ in Lemma 2.19. Using this representation, an
$\omega$E sequence of length $N$ may be stored in $N/2$ real storage locations. Our
goal is to exploit these symmetries in the data in order to reduce both the
storage requirements and the number of operations to one fourth of those
for C sequences. The procedure for developing this algorithm will be
different from the other algorithms in this chapter for the following reason.
Equation (2.40) shows that when we replace the complex quantity $X_k$ by
the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we
call the discrete staggered transform (DST). Equation (2.41) provides the
inverse discrete staggered transform (IDST). Note that the DST is a constant
multiple of the DFT, whereas the IDST is not related to the IDFT in
any simple way. We have found that the applications of the DST include
the boundary conditions considered in this section, as well as others. Thus,
we have devoted all of Chapter 3 to the development of fast, mixed radix
algorithms for computing the DST and IDST. These algorithms are called
the fast staggered transform (FST) and inverse fast staggered transform
(IFST). The FST for RSE sequences is developed in Section 3.3.

2.10 Real Staggered Odd (RSO)
In this section, we will be concerned with the following symmetries:
Definition 2.15 A real staggered odd (RSO) sequence $x_n$ of length $N$ is
defined by:
\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n \]
An $\omega$-odd ($\omega$O) sequence $X_k$ of length $N$ is defined by:
\[ X_{N-k} = \bar{X}_k \tag{2.44} \]
\[ X_k = -\omega_N^{k} \bar{X}_k \tag{2.45} \]
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.20 If $x_n$ is an RSO sequence of length $N$, then its DFT $X_k$ is
an $\omega$O sequence of length $N$. If $X_k$ is an $\omega$O sequence of length $N$, then its
IDFT $x_n$ is an RSO sequence of length $N$.
The next lemma will be needed to obtain the real form of the DFT and
IDFT.
Lemma 2.21 Let $X_k$ be an $\omega$O sequence of length $N$, and let $\tilde{X}_k$ denote
the magnitude of $X_k$. Then:
\[ X_k = i \omega_N^{k/2} \tilde{X}_k \tag{2.46} \]
\[ \tilde{X}_k = -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.47} \]
\[ x_n = i \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.48} \]
\[ \tilde{X}_{N+k} = -\tilde{X}_k \tag{2.49} \]
\[ \tilde{X}_{N-k} = \tilde{X}_k \tag{2.50} \]

We now prove Lemma 2.21. We express $X_k$ in polar form as follows:
\[ X_k = \tilde{X}_k e^{i\theta} \]
Substituting this into equation (2.45) and solving for $\theta$ leads to equation
(2.46). Combining equations (2.1) and (2.46) leads to equation (2.47), while
combining equations (2.2) and (2.46) leads to equation (2.48). Equation
(2.49) is obtained from equation (2.47) as follows:
\[
\begin{aligned}
\tilde{X}_{N+k} &= -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} \\
&= \frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= -\tilde{X}_k
\end{aligned}
\]
Equation (2.50) is obtained by combining equations (2.44) and (2.46) as
follows:
\[
\begin{aligned}
\tilde{X}_{N-k} &= -i \omega_N^{-(N-k)/2} X_{N-k} \\
&= i \omega_N^{k/2} \bar{X}_k \\
&= \tilde{X}_k
\end{aligned}
\]
This completes the proof of Lemma 2.21.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the DFT is the eigenvector
expansion required by the Fourier analysis method for D-N boundary
conditions. Note that if $N$ is even, then an $\omega$O sequence (represented by
$\tilde{X}_k$) satisfies D-N boundary conditions for the computational domain
$1 \le k \le N/2$. That is:
\[ \tilde{X}_0 = 0, \qquad \tilde{X}_{N/2-1} = \tilde{X}_{N/2+1} \]
Observe that the result for the IDFT is the eigenvector expansion required
by the Fourier analysis method for DS-DS boundary conditions. Note that
if $N$ is even, then an RSO sequence satisfies DS-DS boundary conditions for
the computational domain $0 \le n \le N/2 - 1$. That is:
\[ x_{N-1} = -x_0, \qquad x_{N/2-1} = -x_{N/2} \]

Theorem 2.18 Let $x_n$ be an RSO sequence and let $X_k$ be its $\omega$O symmetric
DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
\[ \tilde{X}_k = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n + 1)/N] \]
for $1 \le k \le N/2$. The real form of the IDFT is:
\[ x_n = (-1)^{n+1} \tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2 \tilde{X}_k \sin[\pi k(2n + 1)/N] \]
for $0 \le n \le N/2 - 1$.
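We now prove Theorem 2.18. The result for the DFT follows from equation
(2.47) and the RSO symmetry of $x_n$ as follows:
\[
\begin{aligned}
\tilde{X}_k &= -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= -\frac{i}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}
   + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)} \Big\} \\
&= -\frac{i}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}
   - \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)} \Big\} \\
&= \frac{2}{N} \mathrm{Im}\Big[ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} \Big] \\
&= -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n + 1)/N]
\end{aligned}
\]
Note that only half of the $\omega$O sequence $\tilde{X}_k$ needs to be specified. The result
for the IDFT follows from equations (2.48) and (2.50) as follows:
\[
\begin{aligned}
x_n &= i \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \\
&= i \Big\{ \tilde{X}_{N/2} \omega_N^{(N/2)(n+1/2)}
   + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}
   + \sum_{k=1}^{N/2-1} \tilde{X}_{N-k} \omega_N^{(N-k)(n+1/2)} \Big\} \\
&= i \Big\{ i(-1)^n \tilde{X}_{N/2}
   + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}
   - \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)} \Big\} \\
&= i \Big\{ i(-1)^n \tilde{X}_{N/2}
   + 2i\,\mathrm{Im}\Big[ \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} \Big] \Big\} \\
&= (-1)^{n+1} \tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2 \tilde{X}_k \sin[\pi k(2n + 1)/N]
\end{aligned}
\]
Note that only half of the RSO sequence $x_n$ needs to be specified. This
completes the proof of Theorem 2.18.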
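The odd staggered case can be exercised in the same way. The following sketch evaluates the real forms of Theorem 2.18 by direct summation and compares them with the complex DFT through equation (2.46); it is a slow reference under the stated conventions, and `rso_forward` and `rso_inverse` are hypothetical names chosen for this example.

```python
import cmath
import math


def rso_forward(half):
    """Real form of the DFT: returns coefficients for k = 1..N/2 (index k - 1)."""
    N = 2 * len(half)
    return [-sum(2.0 * half[n] * math.sin(math.pi * k * (2 * n + 1) / N)
                 for n in range(N // 2)) / N
            for k in range(1, N // 2 + 1)]


def rso_inverse(coeffs):
    """Real form of the IDFT: returns x_n for n = 0..N/2-1."""
    N = 2 * len(coeffs)
    last = coeffs[N // 2 - 1]            # the coefficient with k = N/2
    return [(-1) ** (n + 1) * last
            - sum(2.0 * coeffs[k - 1] * math.sin(math.pi * k * (2 * n + 1) / N)
                  for k in range(1, N // 2))
            for n in range(N // 2)]


half = [3.0, 1.0, -2.0, 0.5]             # x_0..x_{N/2-1}, so N = 8
coeffs = rso_forward(half)
recovered = rso_inverse(coeffs)

# Cross-check against the complex DFT of the full RSO sequence, using
# X_k = i * omega_N^(k/2) * (staggered coefficient) from equation (2.46).
x = half + [-v for v in reversed(half)]  # x_{N-n-1} = -x_n
N = len(x)
dft_err = max(
    abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
        - 1j * cmath.exp(1j * math.pi * k / N) * coeffs[k - 1])
    for k in range(1, N // 2 + 1))
```

Note the index shift relative to the even case: the stored coefficients run over $1 \le k \le N/2$ because the coefficient with $k = 0$ vanishes.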
We now develop a fast, mixed radix algorithm for computing the RSO
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an
RSO sequence of length $N$ may be stored in $N/2$ real storage locations,
compared to $2N$ real storage locations for a C sequence of length $N$. However,
an $\omega$O sequence of length $N$ requires $N$ real storage locations. Thus,
in order to obtain an in-place algorithm, we must use a more compact
representation of an $\omega$O sequence. Such a compact representation is provided
by the real magnitudes $\tilde{X}_k$ in Lemma 2.21. Using this representation, an
$\omega$O sequence of length $N$ may be stored in $N/2$ real storage locations. Our
goal is to exploit these symmetries in the data in order to reduce both the
storage requirements and the number of operations to one fourth of those
for C sequences. The procedure for developing this algorithm will be
different from the other algorithms in this chapter for the following reason.
Equation (2.47) shows that when we replace the complex quantity $X_k$ by
the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we
call the discrete staggered transform (DST). Equation (2.48) provides the
inverse discrete staggered transform (IDST). Note that the DST is a constant
multiple of the DFT, whereas the IDST is not related to the IDFT in
any simple way. We have found that the applications of the DST include
the boundary conditions considered in this section, as well as others. Thus,
we have devoted all of Chapter 3 to the development of fast, mixed radix
algorithms for computing the DST and IDST. These algorithms are called
the fast staggered transform (FST) and inverse fast staggered transform
(IFST). The FST for RSO sequences is developed in Section 3.4.

2.11 Tables of Symmetries
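
Table 2.1: Symmetries in the IDFT

| Acronym | Symmetry | Sequence | DFT |
|---|---|---|---|
| | Periodic | $x_{N+n} = x_n$ | $X_{N+k} = X_k$ |
| R | Real | $\bar{x}_n = x_n$ | $X_{N-k} = \bar{X}_k$ |
| RE | Real Even | $\bar{x}_n = x_n$, $x_{N-n} = x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$ |
| RO | Real Odd | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$ |
| RE-E | Real Composite Even-Even ($N$ even) | $\bar{x}_n = x_n$, $x_{N-n} = x_n$, $x_{N/2-n} = x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$, $X_k = (-1)^k X_k$ |
| RE-O | Real Composite Even-Odd ($N$ even) | $\bar{x}_n = x_n$, $x_{N-n} = x_n$, $x_{N/2-n} = -x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$, $X_k = (-1)^{k+1} X_k$ |
| RO-E | Real Composite Odd-Even ($N$ even) | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$, $x_{N/2-n} = x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$, $X_k = (-1)^{k+1} X_k$ |
| RO-O | Real Composite Odd-Odd ($N$ even) | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$, $x_{N/2-n} = -x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$, $X_k = (-1)^k X_k$ |
| RSE | Real Staggered Even | $\bar{x}_n = x_n$, $x_{N-n-1} = x_n$ | $X_{N-k} = \bar{X}_k$, $X_k = \omega_N^{k} \bar{X}_k$ |
| RSO | Real Staggered Odd | $\bar{x}_n = x_n$, $x_{N-n-1} = -x_n$ | $X_{N-k} = \bar{X}_k$, $X_k = -\omega_N^{k} \bar{X}_k$ |
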
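Table 2.2: Symmetries in the DFT

| Acronym | Symmetry | Sequence | IDFT |
|---|---|---|---|
| | Periodic | $X_{N+k} = X_k$ | $x_{N+n} = x_n$ |
| CS | Conjugate Symmetric | $X_{N-k} = \bar{X}_k$ | $\bar{x}_n = x_n$ |
| SCS | Staggered Conjugate Symmetric | $X_{N-k-1} = \bar{X}_k$ | $\bar{x}_n = \omega_N^{n} x_n$, $x_n = \omega_N^{-n} \bar{x}_n$ |
| CSIS | CS Induced Intersequence Symmetry | $X_{k,p-q} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q} = \omega_{N/p}^{-n} \bar{y}_{n,q}$ |
| SCSIS | SCS Induced Intersequence Symmetry | $X_{k,p-q-1} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q-1} = \omega_{N/p}^{-n} \bar{y}_{n,q}$ |
| R | Real | $\bar{X}_k = X_k$ | $x_{N-n} = \bar{x}_n$ |
| I | Imaginary | $\bar{X}_k = -X_k$ | $x_{N-n} = -\bar{x}_n$ |
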
FAST FOURIER TRANSFORMS FOR DIRECT SOLUTION OF POISSON'S EQUATION
by Bert Larue Bradford
B.A., The University of North Texas, 1976
M.A., The University of Texas at Austin, 1979
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Mathematics, 1991

© 1991 by Bert Larue Bradford. All rights reserved.

This thesis for the Doctor of Philosophy degree by Bert Larue Bradford has been approved for the Department of Mathematics by Roland A. Sweet, William L. Briggs, William Clohessy, and Thomas F. Russell.

Bradford, Bert Larue (Ph.D., Mathematics)
Fast Fourier Transforms for Direct Solution of Poisson's Equation
Thesis directed by Professor Roland A. Sweet

This thesis presents compact algorithms used to incorporate the Cooley-Tukey fast Fourier transform (FFT) into the solution of finite difference approximations to the multi-dimensional Poisson equation. In each spatial dimension, we must specify boundary conditions at both the left and right endpoint. Boundary conditions we consider include cyclic, Dirichlet, and Neumann. Furthermore, there is often a need to orient the grid such that one or both of the endpoints of the computational domain are staggered at half of a grid spacing. This leads to staggered Dirichlet and staggered Neumann boundary conditions. When the Poisson equation is discretized, these boundary conditions are approximated by requiring the real sequence which represents the approximate solution to satisfy discrete analogs.

The discretized boundary value problem is solved by the Fourier analysis method (also referred to as the eigenvector expansion method or as a fast Poisson solver). This method requires finding the eigenvalues and eigenvectors corresponding to the discretized boundary value problem. The discrete solution is expanded in terms of these eigenvectors. The efficiency of this algorithm results from the ability to calculate the coefficients in such eigenvector expansions using an FFT algorithm. For each of the boundary conditions discussed above, an FFT algorithm has been developed which computes the coefficients in the corresponding eigenvector expansion as efficiently as possible by eliminating all redundant computations which would occur in the full complex FFT, and without pre- or post-processing. Such FFT algorithms are referred to as compact symmetric FFTs. The elimination of pre- and post-processing improves performance by reducing both the number of operations and data accesses.

These FFT algorithms are all general mixed radix, in-place algorithms which accept the input sequence in natural order. The inverse algorithms accept the input sequence in permuted order. Thus, reordering of data is never required.

The form and content of this abstract are approved. I recommend its publication.
Signed: Roland A. Sweet

Contents

List of Figures
List of Tables
Acknowledgements
1 Introduction
  1.1 The Fourier Analysis Method
  1.2 The New FFT and FST Algorithms
2 Fast Fourier Transforms
  2.1 Complex (C)
  2.2 Real (R)
  2.3 Real Even (RE)
  2.4 Real Odd (RO)
  2.5 Real Composite Even-Even (RE-E)
  2.6 Real Composite Even-Odd (RE-O)
  2.7 Real Composite Odd-Even (RO-E)
  2.8 Real Composite Odd-Odd (RO-O)
  2.9 Real Staggered Even (RSE)
  2.10 Real Staggered Odd (RSO)
  2.11 Tables of Symmetries
3 Fast Staggered Transforms
  3.1 Complex (C)
  3.2 Real (R)
  3.3 Real Staggered Even (RSE)
  3.4 Real Staggered Odd (RSO)
  3.5 Real Composite Staggered Even-Staggered Even (RSE-SE)
  3.6 Real Composite Staggered Even-Staggered Odd (RSE-SO)
  3.7 Real Composite Staggered Odd-Staggered Even (RSO-SE)
  3.8 Real Composite Staggered Odd-Staggered Odd (RSO-SO)
  3.9 Tables of Symmetries
4 Software Implementation and Performance
  4.1 Introduction
  4.2 The Radix-2 RO FFT
  4.3 The Radix-4 RO FFT
  4.4 The Radix-3 RO FFT
  4.5 The Mixed Radix RO FFT
  4.6 Performance of the RO FFT
  4.7 Automating Implementation of the RO FFT
A Eigenstructure of the Discrete Poisson Equation
B Software for the RO FFT
C FORTRAN Skeleton for Combine Equations
D Mathematica Scripts
E Automatically Generated Subroutines for the RO FFT
Bibliography

List of Figures

2.1 Splitting tree for complex FFT
2.2 Splitting tree for R symmetric FFT
2.3 Splitting tree for RE symmetric FFT
2.4 Splitting tree for RO symmetric FFT
2.5 Splitting tree for RE-E symmetric FFT
2.6 Splitting tree for RE-O symmetric FFT
2.7 Splitting tree for RO-E symmetric FFT
2.8 Splitting tree for RO-O symmetric FFT
3.1 Splitting tree for R symmetric FST
3.2 Splitting tree for RSE symmetric FST
3.3 Splitting tree for RSO symmetric FST
3.4 Splitting tree for RSE-SE symmetric FST
3.5 Splitting tree for RSE-SO symmetric FST
3.6 Splitting tree for RSO-SE symmetric FST
3.7 Splitting tree for RSO-SO symmetric FST
4.1 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = N/4
4.2 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = 1
4.3 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 0
4.4 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = N/4
4.5 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 1
4.6 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 0
4.7 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = N/4
4.8 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 1
4.9 Splitting tree for the radix-2 RO FFT for N = 16
4.10 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 0
4.11 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = N/8
4.12 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 1
4.13 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 0
4.14 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = N/8
4.15 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 1
4.16 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 0
4.17 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = N/8
4.18 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 1
4.19 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 0
4.20 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = N/6
4.21 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 1
4.22 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 0
4.23 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = N/6
4.24 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 1
4.25 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 0
4.26 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = N/6
4.27 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 1
4.28 Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = 0
4.29 Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = N/6
4.30 Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = 1
4.31 Initialization subroutine hierarchy for the RO FFT
4.32 Forward transform subroutine hierarchy for the RO FFT

List of Tables

1.1 Discrete Homogeneous Boundary Conditions
1.2 Eigenstructure for the Standard Grid
1.3 Eigenstructure for the Staggered Grid
1.4 Eigenstructure for the Mixed Grid
1.5 Operation Counts for 2D Poisson Solvers
2.1 Symmetries in the IDFT
2.2 Symmetries in the DFT
3.1 Symmetries in the IDST
3.2 Symmetries in the DST
4.1 Splitting Tree for the Radix-2 RO FFT for N = 16
4.2 Splitting Tree for the Radix-4 RO FFT for N = 64
4.3 Splitting Tree for the Radix-3 RO FFT for N = 27
4.4 Splitting Tree for the Mixed Radix RO FFT for N = 72
4.5 Timing Data for 1024 Sequences on the IBM 3090J
4.6 Timing Data for 1024 Sequences on the Cray Y-MP8/864
4.7 Timing Model for 1024 Sequences on the IBM 3090J
4.8 Comparison of Timing Data for Handwritten Code and Automated Code for 1024 Sequences on the IBM 3090J

Acknowledgements

This work was generously supported by the IBM Federal Sector Division Resident Study Program.

Chapter 1 Introduction

1.1 The Fourier Analysis Method

We begin with a brief overview of the Fourier analysis method. We will first present the Fourier analysis method in one spatial dimension. We will then extend the method to a two-dimensional rectangle. The extension to higher dimensional rectangular regions is analogous, but we will not pursue this. Finally, we will discuss operation counts for the Fourier analysis method, and compare it to other methods.

In one spatial dimension, the discretized Poisson equation is:
\[ u_{n-1} - 2u_n + u_{n+1} = f_n \]
for $1 \le n \le M$. We must specify boundary conditions at both the left and right endpoint. We may assume, without loss of generality, that the boundary conditions are homogeneous, since inhomogeneous boundary values may be absorbed into $f_1$ and $f_M$. The discrete, homogeneous boundary conditions we consider, specified for $n = 1$, are shown in Table 1.1. Note that we consider two variants of Dirichlet and Neumann boundary conditions, depending upon whether the boundary coincides with a grid point or is staggered at a half grid spacing. The notation D-N indicates a homogeneous Dirichlet boundary condition at the left endpoint, and a homogeneous Neumann boundary condition at the right endpoint. Similar notation will be used for other combinations. Combinations which involve only C, D, or N are referred to as standard grid boundary conditions. Combinations which involve only DS or NS are referred to as staggered grid boundary conditions. Other combinations are referred to as mixed grid boundary conditions.

The discretized boundary value problem may be written in matrix form as:
\[ A u = f \tag{1.1} \]
where $A$ is a matrix of dimension $M$, and $u$, $f$ are vectors of length $M$. The boundary conditions have been used to eliminate $u_0$ and $u_{M+1}$. $A$ is tridiagonal, and in one spatial dimension we would simply solve this linear system by Gaussian elimination.

However, in anticipation of extensions to higher dimensions, we proceed as follows. First, we find the eigenvalues and eigenvectors of $A$. These are summarized in Tables 1.2, 1.3, and 1.4. Note that $A$ always has a full set of linearly independent eigenvectors whose components are trigonometric expressions. Note also that in these tables the computational domain is different for each boundary condition. The reason for this will become clear after studying the corresponding symmetric FFT. Appendix A provides an example of one technique for finding these eigenvalues and eigenvectors. For this general discussion, we denote the eigenvalues by $\lambda_k$ (repeated to multiplicity) and the corresponding eigenvectors by $\psi_k$ for $1 \le k \le M$. We now seek a solution for $u$ in the form of an eigenvector expansion:
\[ u = \sum_{k=1}^{M} \hat{u}_k \psi_k \tag{1.2} \]
This requires that we also express $f$ as an eigenvector expansion:
\[ f = \sum_{k=1}^{M} \hat{f}_k \psi_k \tag{1.3} \]
Since $f$ is known and the vectors $\psi_k$ are linearly independent, we may compute $\hat{f}_k$. Because the components of $\psi_k$ are trigonometric expressions, $\hat{f}_k$ may be computed most efficiently by means of a symmetric FFT. Thus, this step is referred to as Fourier analysis.

Table 1.1: Discrete Homogeneous Boundary Conditions

| Acronym | Boundary Condition | Discrete Analog |
|---|---|---|
| C | Cyclic | $u_0 = u_M$ |
| D | Dirichlet | $u_0 = 0$ |
| N | Neumann | $u_2 - u_0 = 0$ |
| DS | Dirichlet-Staggered | $u_1 + u_0 = 0$ |
| NS | Neumann-Staggered | $u_1 - u_0 = 0$ |

Substituting equations (1.2) and (1.3) into equation (1.1) yields:
\[ \sum_{k=1}^{M} \hat{f}_k \psi_k = A\Big[ \sum_{k=1}^{M} \hat{u}_k \psi_k \Big] = \sum_{k=1}^{M} \hat{u}_k \lambda_k \psi_k \]
Since the vectors $\psi_k$ are linearly independent, we conclude:
\[ \lambda_k \hat{u}_k = \hat{f}_k \]
for $1 \le k \le M$. We may now compute $\hat{u}_k$, unless $\lambda_k = 0$. In this case, the compatibility condition $\hat{f}_k = 0$ must hold, and $\hat{u}_k$ is arbitrary. Thus, the solution for $u$ is not unique in this case. This occurs for C-C, N-N, NS-NS, N-NS, and NS-N boundary conditions, and corresponds to the fact that the solutions to these problems are unique only up to an additive constant. Having determined $\hat{u}_k$, $u$ may now be computed using the inverse of the corresponding symmetric FFT. This step is called Fourier synthesis.

We now indicate how to extend the Fourier analysis method to a two-dimensional rectangle. For simplicity, we assume that the number of unknowns in each dimension are equal. In two spatial dimensions, the discretized Poisson equation is:
\[ u_{n-1,m} - 2u_{n,m} + u_{n+1,m} + \rho^2 (u_{n,m-1} - 2u_{n,m} + u_{n,m+1}) = f_{n,m} \]
for $1 \le n, m \le M$, where $\rho = \Delta x / \Delta y$. We assume that homogeneous boundary conditions are specified on all four sides of the rectangle of the same type considered previously. The discretized boundary value problem may be written in matrix form as:
\[ A u_m + \rho^2 (u_{m-1} - 2u_m + u_{m+1}) = f_m \tag{1.4} \]
for $1 \le m \le M$. $u_m$ is a vector of length $M$ with $n$'th component $u_{n,m}$, and likewise for $f_m$. $A$ is the same $M$-dimensional matrix as in the corresponding one-dimensional problem. As before, we seek a solution for $u_m$ in the form of an eigenvector expansion:
\[ u_m = \sum_{k=1}^{M} \hat{u}_{k,m} \psi_k \tag{1.5} \]

Table 1.2: Eigenstructure for the Standard Grid

| Bnd Cnd | n'th Comp of Eigenvec | Comp Domain | Transform | Associated Eigenvalue | Eigenvec Indx |
|---|---|---|---|---|---|
| C-C | $\cos(2\pi kn/N)$ | $0 \le n \le N-1$ | R FFT | $-4\sin^2(\pi k/N)$ | $0 \le k \le N/2$ or $0 \le k \le (N-1)/2$ |
| | $\sin(2\pi kn/N)$ | $0 \le n \le N-1$ | | $-4\sin^2(\pi k/N)$ | $1 \le k \le N/2-1$ or $1 \le k \le (N-1)/2$ |
| N-N | $\cos(2\pi kn/N)$ | $0 \le n \le N/2$ | RE FFT | $-4\sin^2(\pi k/N)$ | $0 \le k \le N/2$ |
| D-D | $\sin(2\pi kn/N)$ | $1 \le n \le N/2-1$ | RO FFT | $-4\sin^2(\pi k/N)$ | $1 \le k \le N/2-1$ |
| N-D | $\cos[2\pi n(2k+1)/N]$ | $0 \le n \le N/4-1$ | RE-O FFT | $-4\sin^2[\pi(2k+1)/N]$ | $0$ … |

Table 1.4: Eigenstructure for the Mixed Grid, $N = 2(2M + 1)$

| Bnd Cnd | n'th Comp of Eigenvec | Comp Domain | Transform | Associated Eigenvalue | Eigenvec Indx |
|---|---|---|---|---|---|
| N-NS | $\cos(4\pi kn/N)$ | $0 \le n \le M$ | RE-E FFT | $-4\sin^2(2\pi k/N)$ | $0 \le k \le M$ |
| N-DS | $\cos[2\pi n(2k+1)/N]$ | $0 \le n \le M$ | RE-O FFT | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le k \le M$ |
| D-NS | $\sin[2\pi n(2k-1)/N]$ | $1 \le n \le M$ | RO-E FFT | $-4\sin^2[\pi(2k-1)/N]$ | $1$ … |

This requires that we also express $f_m$ as an eigenvector expansion:
\[ f_m = \sum_{k=1}^{M} \hat{f}_{k,m} \psi_k \tag{1.6} \]
$\hat{f}_{k,m}$ may be computed most efficiently by performing $M$ symmetric FFTs of length $M$. Substituting equations (1.5) and (1.6) into equation (1.4) yields:
\[
\begin{aligned}
\sum_{k=1}^{M} \hat{f}_{k,m} \psi_k
&= A\Big[ \sum_{k=1}^{M} \hat{u}_{k,m} \psi_k \Big]
 + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \psi_k \\
&= \sum_{k=1}^{M} \hat{u}_{k,m} \lambda_k \psi_k
 + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \psi_k
\end{aligned}
\]
Since the vectors $\psi_k$ are linearly independent, we conclude:
\[ \rho^2 \hat{u}_{k,m-1} + (\lambda_k - 2\rho^2) \hat{u}_{k,m} + \rho^2 \hat{u}_{k,m+1} = \hat{f}_{k,m} \]
for $1 \le k, m \le M$. We now obtain $\hat{u}_{k,m}$ by solving $M$ tridiagonal linear systems of dimension $M$ by Gaussian elimination. For C-C, N-N, NS-NS, N-NS, or NS-N boundary conditions, one of these linear systems is singular. In this case, $\hat{f}_{k,m}$ must satisfy a compatibility condition, and the solution for $\hat{u}_{k,m}$ is not unique. Having determined $\hat{u}_{k,m}$, $u_m$ may be computed by performing $M$ symmetric FFTs of length $M$.

We conclude this section with a discussion of operation counts for the Fourier analysis method, and a comparison of it to other methods for solving the discrete Poisson equation. The Fourier analysis method is efficient only for two or more dimensions. As before, we will restrict our discussion to two dimensions. The operation count for an $M \times M$ grid, where $M$ is a power of two, is easily obtained from the description of the algorithm above. We performed $2M$ symmetric FFTs of length $M$, each of which requires $O(M \log M)$ operations. We solved $M$ tridiagonal linear systems of dimension $M$ by Gaussian elimination, each of which requires $O(M)$ operations. Thus, the asymptotic operation count for the entire algorithm is $O(M^2 \log M)$. The operation counts for other methods of solving the discrete Poisson equation are summarized in Table 1.5. The source of this information is [8]. The FACR(l) method combines the cyclic reduction and Fourier analysis methods.

Table 1.5: Operation Counts for 2D Poisson Solvers

| Method | Operation Count |
|---|---|
| Gaussian Elimination | $O(M^4)$ |
| Successive Over-Relaxation | $O(M^3 \log M)$ |
| Alternating Direction Implicit | $O(M^2 \log^2 M)$ |
| Cyclic Reduction | $O(M^2 \log M)$ |
| Fourier Analysis | $O(M^2 \log M)$ |
| FACR(l) | $O(M^2 \log \log M)$ |
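
The three steps of the two-dimensional method (Fourier analysis, tridiagonal solves, Fourier synthesis) can be sketched for the simplest case of D-D boundary conditions in both directions. This is an illustrative sketch only: it assumes the D-D eigenvectors $\sin(\pi kn/(M+1))$ with eigenvalues $-4\sin^2(\pi k/(2(M+1)))$, replaces the symmetric FFTs by direct $O(M^2)$ sums, and uses hypothetical function names that are not part of the thesis software.

```python
import math


def sine_analysis(v):
    """Coefficients of v in the D-D eigenvectors sin(pi*k*n/(M+1)), k = 1..M."""
    M = len(v)
    return [2.0 / (M + 1) * sum(v[n - 1] * math.sin(math.pi * k * n / (M + 1))
                                for n in range(1, M + 1))
            for k in range(1, M + 1)]


def sine_synthesis(c):
    """Sum of c_k * sin(pi*k*n/(M+1)) over k, for n = 1..M."""
    M = len(c)
    return [sum(c[k - 1] * math.sin(math.pi * k * n / (M + 1))
                for k in range(1, M + 1))
            for n in range(1, M + 1)]


def solve_tridiagonal(off, diag, rhs):
    """Gaussian elimination for a tridiagonal system with constant bands."""
    M = len(rhs)
    c, d = [0.0] * M, [0.0] * M
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, M):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * M
    x[-1] = d[-1]
    for i in range(M - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def poisson_2d_dd(f, rho):
    """Solve the discretized equation with zero boundary values; f[m][n] holds f_{n+1,m+1}."""
    M = len(f)
    f_hat = [sine_analysis(row) for row in f]            # Fourier analysis in n
    u_hat = [[0.0] * M for _ in range(M)]
    for k in range(1, M + 1):                            # one tridiagonal solve per k
        lam = -4.0 * math.sin(math.pi * k / (2 * (M + 1))) ** 2
        column = solve_tridiagonal(rho ** 2, lam - 2.0 * rho ** 2,
                                   [f_hat[m][k - 1] for m in range(M)])
        for m in range(M):
            u_hat[m][k - 1] = column[m]
    return [sine_synthesis(row) for row in u_hat]        # Fourier synthesis in n


def apply_operator(u, rho):
    """Left-hand side of the discretized 2D Poisson equation with zero boundary values."""
    M = len(u)

    def at(m, n):
        return u[m][n] if 0 <= m < M and 0 <= n < M else 0.0

    return [[at(m, n - 1) - 2.0 * at(m, n) + at(m, n + 1)
             + rho ** 2 * (at(m - 1, n) - 2.0 * at(m, n) + at(m + 1, n))
             for n in range(M)] for m in range(M)]


M, rho = 6, 0.5
u_exact = [[math.cos(1.3 * m) + 0.1 * n * n for n in range(M)] for m in range(M)]
u = poisson_2d_dd(apply_operator(u_exact, rho), rho)
max_err = max(abs(u[m][n] - u_exact[m][n]) for m in range(M) for n in range(M))
```

With the direct sums replaced by a fast transform of cost $O(M \log M)$, the $2M$ transforms and $M$ tridiagonal solves give the $O(M^2 \log M)$ count listed for Fourier analysis in Table 1.5.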

PAGE 19

1.2 The New FFT and FST Algorithms

From the discussion of the Fourier analysis method in Section 1.1, it is evident that FFT algorithms form the core of this method. Our goal is to provide the best possible FFT algorithms for this purpose, and to address all of the boundary conditions in Tables 1.2, 1.3, and 1.4. In this section, we summarize the new contributions to the FFT literature contained herein.

For each of the boundary conditions in Tables 1.2, 1.3, and 1.4, an FFT algorithm has been developed which computes the coefficients in the corresponding eigenvector expansion as efficiently as possible by eliminating all redundant computations which would occur in the full complex FFT, and without pre- or post-processing. Such FFT algorithms are referred to as compact symmetric FFTs. The older pre- and post-processing algorithms are described in detail in [2, 10]. Pre- and post-processing steps contribute only low order terms to operation counts. However, for sequences of practical length these low order terms may be significant. Furthermore, these algorithms require additional data accesses which also contribute to the total execution time. Thus, compact symmetric FFTs eliminate the additional operations and data accesses associated with pre- and post-processing algorithms. Pre- and post-processing algorithms also have the restriction that the length of the sequence must be even.

A compact symmetric FFT has long been available for real sequences, known as Edson's algorithm. In [4], a compact symmetric FFT for real even sequences is introduced, but in the context of Clenshaw-Curtis quadrature. In [10], in-place compact symmetric FFTs are developed for real, even, odd, quarterwave even, and quarterwave odd symmetries. All in-place algorithms based on the splitting method require either the input or output sequence to be in a permuted order, referred to as bit-reversed order. These in-place algorithms require the input sequence in physical space to be in bit-reversed order, and produce the forward transform in natural order. From our discussion of the Fourier analysis method, it is clear that this is the opposite of what is desired. In [1], analogous algorithms are developed which accept the input sequence in physical space in natural order, and produce the forward transform in bit-reversed order. We follow the general approach set forth in [1].

With this background, we may now summarize our new contributions to the FFT literature. The algorithms in [1] were developed for radix-2 only. We have generalized all of these to radix-p, for a general factor p. This has resulted in a number of new intermediate symmetries which occur in the course of the splitting method. After obtaining the combine equations for the inverse transform, they must be inverted to obtain those for the forward transform. For the radix-p algorithms, this requires the inversion of many systems of p equations in p unknowns. We have exploited the special nature of these systems of equations to invert them in closed form.

The real quarterwave even and quarterwave odd transforms, which we refer to as the real staggered even (RSE) and real staggered odd (RSO) FFTs, have been used for N-D and D-N boundary conditions respectively. We have shown that the algorithms for these symmetries in [1] are not in-place. We have developed two new compact symmetric FFTs, called real composite even-odd (RE-O) and real composite odd-even (RO-E), for these boundary conditions. We have shown that these new algorithms are in-place and achieve the goal of eliminating all redundant operations which would occur in the full complex FFT.

For staggered grid boundary conditions, we have developed new algorithms based on a variant of the DFT which we refer to as the discrete staggered transform (DST). In analogy to the FFT, we have developed efficient algorithms for computing the DST, which we refer to as the fast staggered transform (FST). Previously, the only known algorithms for staggered grid boundary conditions were the real quarterwave even and quarterwave odd FFTs, and the pre- and post-processing algorithms in [6]. The real quarterwave even and quarterwave odd FFTs have been used for NS-NS and DS-DS boundary conditions respectively, but the algorithms for these symmetries in [1] are not in-place. The pre- and post-processing algorithms for NS-DS and DS-NS boundary conditions are less efficient than the new compact symmetric FSTs for the same general reasons discussed previously.

For mixed grid boundary conditions, we have developed new algorithms based on superimposing two symmetries. We refer to the resulting symmetries as composite symmetries. Previously, the only known algorithms for mixed grid boundary conditions were the pre- and post-processing algorithms in [6] for NS-D and D-NS boundary conditions. Again, the pre- and post-processing algorithms are less efficient than the new compact algorithms. Furthermore, we have developed compact algorithms for six mixed grid boundary conditions which previously could not be solved by Fourier methods.


Chapter 2

Fast Fourier Transforms

2.1 Complex (C)

We begin by reviewing the fast Fourier transform, and establishing notation which will be used throughout.

Definition 2.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the forward discrete Fourier transform (DFT) is defined by:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n w_N^{-kn} \tag{2.1}$$

for $0 \le k \le N-1$, where:

$$w_N = e^{2\pi i/N}$$

For convenience, we will often suppress the constant $1/N$.

The following theorem provides the inverse discrete Fourier transform (IDFT). We omit the proof of this result because it is well known.

Theorem 2.1 A C sequence $x_n$ may be recovered from its DFT $X_k$ by the inverse discrete Fourier transform (IDFT), which is given by:

$$x_n = \sum_{k=0}^{N-1} X_k w_N^{kn} \tag{2.2}$$

for $0 \le n \le N-1$.
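As a concrete reference point for the notation, equations (2.1) and (2.2) can be evaluated directly in $O(N^2)$ operations. The following is a minimal Python sketch, not part of the original codes; the helper names `w`, `dft`, and `idft` are chosen here for illustration only, and the scaling convention ($1/N$ on the forward transform, none on the inverse) is the one stated in Definition 2.1.

```python
import cmath

def w(N, e):
    """Return w_N**e with w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    """Forward DFT of equation (2.1), including the 1/N factor."""
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    """Inverse DFT of equation (2.2), which carries no scaling."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

# Arbitrary complex test data; the length 6 is deliberately not a power of two.
x = [1 + 2j, -0.5j, 3.0, 2 - 1j, 0.25, -1 + 1j]
roundtrip_error = max(abs(a - b) for a, b in zip(x, idft(dft(x))))
```

Applying `idft` to the output of `dft` should reproduce the input up to rounding error, which is what `roundtrip_error` measures.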


By Definition 2.1, the sequences $x_n$ and $X_k$ are of length N. These sequences can be extended to all integral values of n and k using the periodicity properties provided by the following corollary.

Corollary 2.1 Equations (2.1) and (2.2) imply that the sequences $x_n$ and $X_k$ may be extended periodically to all integral values of n and k by:

$$x_{n+N} = x_n, \qquad X_{k+N} = X_k$$

We will develop fast algorithms for computing the DFT and IDFT which are based on the Cooley-Tukey fast Fourier transform (FFT). Following the general approach in [1], we will develop algorithms for the IDFT given $X_k$ in bit-reversed order. Inverting these yields algorithms for the DFT given $x_n$ in natural order. We begin by defining notation which will be needed in the development of these algorithms.

Definition 2.2 Given a C sequence $X_k$ of length N, and a factor p of N, we define a splitting of $X_k$ consisting of the following p subsequences, each of length $N/p$:

$$X_{k,q} = X_{pk+q}$$

for $0 \le k \le N/p-1$, $0 \le q \le p-1$. We denote the IDFT of these by $Y_{n,q}$. That is:

$$Y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q} w_{N/p}^{kn}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$. Given a C sequence $x_n$ of length N, and a factor p of N, we define the following p subsequences, each of length $N/p$:

$$x_{n,l} = x_{lN/p+n}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

The inverse fast Fourier transform (IFFT) is based on the principle of computing the quantities $Y_{n,q}$, and then combining these in the appropriate fashion to obtain $x_{n,l}$. The precise equation for performing this combining operation is provided by the next theorem.
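The two index conventions of Definition 2.2 differ in an important way: the transform is split into interleaved subsequences, while the physical sequence is split into contiguous blocks. The short Python sketch below illustrates this on a list of index labels; the function names `split_transform` and `split_physical` are hypothetical and used only for this illustration.

```python
def split_transform(X, p):
    """Return the p subsequences X_{k,q} = X[p*k + q], each of length N/p."""
    N = len(X)
    assert N % p == 0, "p must be a factor of N"
    return [[X[p * k + q] for k in range(N // p)] for q in range(p)]

def split_physical(x, p):
    """Return the p subsequences x_{n,l} = x[l*N/p + n], each of length N/p."""
    N = len(x)
    assert N % p == 0, "p must be a factor of N"
    M = N // p
    return [[x[l * M + n] for n in range(M)] for l in range(p)]

labels = list(range(12))
by_residue = split_transform(labels, 3)  # indices congruent to q modulo 3
by_block = split_physical(labels, 3)     # three contiguous blocks of length 4
```

For N = 12 and p = 3, subsequence q of the transform holds the indices q, q+3, q+6, q+9, whereas block l of the physical sequence holds the indices 4l through 4l+3.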


Theorem 2.2 The inverse combine equation for C sequences is:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{nq} Y_{n,q} \tag{2.3}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 2.2.

$$x_n = \sum_{k=0}^{N-1} X_k w_N^{kn} = \sum_{q=0}^{p-1} \sum_{k=0}^{N/p-1} X_{pk+q} w_N^{n(pk+q)} = \sum_{q=0}^{p-1} w_N^{nq} \sum_{k=0}^{N/p-1} X_{k,q} w_{N/p}^{kn} = \sum_{q=0}^{p-1} w_N^{nq} Y_{n,q}$$

In terms of the subsequence notation defined previously, and using the periodicity of $Y_{n,q}$ in n with period $N/p$, this result is:

$$x_{lN/p+n} = \sum_{q=0}^{p-1} w_N^{q(lN/p+n)} Y_{lN/p+n,q} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{nq} Y_{n,q}$$

This completes the proof of Theorem 2.2.

The following corollary provides an important special case of this result. This is the same as equation (2) in [1], except that we are working with the IDFT.

Corollary 2.2 Assume p = 2. The inverse combine equation for C sequences is:

$$x_{n,0} = Y_{n,0} + w_N^{n} Y_{n,1}$$
$$x_{n,1} = Y_{n,0} - w_N^{n} Y_{n,1}$$

for $0 \le n \le N/2-1$.
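Equation (2.3) can be checked numerically for any factor p by computing the subsequence transforms $Y_{n,q}$ directly and comparing the combined result with a direct evaluation of (2.2). The following Python sketch does this for p = 3 and p = 2; it is an illustration only, and the names `w`, `idft`, and `inverse_combine` are not taken from the thesis codes.

```python
import cmath

def w(N, e):
    """Return w_N**e with w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Inverse DFT of equation (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def inverse_combine(X, p):
    """One radix-p inverse combine step, equation (2.3)."""
    N = len(X)
    M = N // p
    # Y[q][n] is the IDFT of the subsequence X_{k,q} = X[p*k + q].
    Y = [idft([X[p * k + q] for k in range(M)]) for q in range(p)]
    x = [0j] * N
    for l in range(p):
        for n in range(M):
            # x_{n,l} is stored at position l*N/p + n.
            x[l * M + n] = sum(w(p, l * q) * w(N, n * q) * Y[q][n]
                               for q in range(p))
    return x

X = [complex(k % 5 - 2, (3 * k) % 7 - 3) for k in range(12)]
err_p3 = max(abs(a - b) for a, b in zip(inverse_combine(X, 3), idft(X)))
err_p2 = max(abs(a - b) for a, b in zip(inverse_combine(X[:8], 2), idft(X[:8])))
```

Both errors should be at the level of rounding error. In an actual IFFT the transforms $Y_{n,q}$ would themselves be obtained recursively rather than by direct summation.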


We may now describe the IFFT algorithm for a C sequence with length a power of two. Figure 2.1 is a 'splitting tree' diagram which represents this algorithm for a C sequence of length eight. The original sequence is split into two subsequences, one consisting of the even numbered terms, and the other consisting of the odd numbered terms. Assume, for the moment, that the IDFT of each subsequence is known. Then the IDFT of the original sequence may be obtained by applying Corollary 2.2. The algorithm now continues recursively. That is, the IDFT of each subsequence is computed by splitting it and repeating the steps above. Eventually, subsequences of length one will be obtained. Since a sequence of length one is its own IDFT, the recursive process terminates at this point.

We now begin the development of the FFT algorithm. We will obtain the forward combine equation for the FFT by inverting the inverse combine equation. For this, we will need the following 'orthogonality property.'

Lemma 2.1 If N is a positive integer, and $0 \le j, n \le N-1$, then:

$$\sum_{k=0}^{N-1} w_N^{k(n-j)} = \begin{cases} N & \text{if } j = n \\ 0 & \text{otherwise} \end{cases}$$

We now prove Lemma 2.1. The case $j = n$ is obvious. For $j \ne n$, define:

$$y = w_N^{n-j} \ne 1$$

Summing the finite geometric series yields:

$$\sum_{k=0}^{N-1} w_N^{k(n-j)} = \sum_{k=0}^{N-1} y^k = \frac{1 - y^N}{1 - y} = \frac{1 - 1}{1 - y} = 0$$

This completes the proof of Lemma 2.1.

The forward combine equation for the FFT is now provided by the following theorem.

Theorem 2.3 The forward combine equation for C sequences is:

$$Y_{n,q} = \frac{1}{p} w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \tag{2.4}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$.
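Both the orthogonality property and equation (2.4) are easy to confirm numerically. In the Python sketch below, which is illustrative only, the reference values of $Y_{n,q}$ are obtained by applying the IDFT to the subsequences of a directly computed DFT (with the $1/N$ scaling of Definition 2.1), and are then compared with the forward combine of the physical data. The helper names are assumptions of this sketch.

```python
import cmath

def w(N, e):
    """Return w_N**e with w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    """Forward DFT (2.1), including the 1/N factor."""
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    """Inverse DFT (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def orthogonality_sum(N, j, n):
    """Left-hand side of Lemma 2.1."""
    return sum(w(N, k * (n - j)) for k in range(N))

def forward_combine(x, p):
    """Equation (2.4): returns Y[q][n] computed from x_{n,l} = x[l*N/p + n]."""
    N = len(x)
    M = N // p
    return [[w(N, -n * q) / p * sum(w(p, -l * q) * x[l * M + n] for l in range(p))
             for n in range(M)] for q in range(p)]

x = [complex((n * n) % 5, n % 4 - 1.5) for n in range(12)]
p = 3
M = len(x) // p
X = dft(x)
Y_ref = [idft([X[p * k + q] for k in range(M)]) for q in range(p)]
Y = forward_combine(x, p)
combine_error = max(abs(Y[q][n] - Y_ref[q][n]) for q in range(p) for n in range(M))
```

The factor $1/p$ applied at each stage accumulates to the overall $1/N$ of the forward transform.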

Figure 2.1: Splitting tree for complex FFT

We now prove Theorem 2.3.

$$Y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q} w_{N/p}^{kn} = \sum_{k=0}^{N/p-1} X_{pk+q} w_{N/p}^{kn}$$
$$= \sum_{k=0}^{N/p-1} \Big[ \frac{1}{N} \sum_{j=0}^{N-1} x_j w_N^{-j(pk+q)} \Big] w_{N/p}^{kn}$$
$$= \frac{1}{N} \sum_{j=0}^{N-1} x_j w_N^{-jq} \Big[ \sum_{k=0}^{N/p-1} w_{N/p}^{k(n-j)} \Big]$$
$$= \frac{1}{p} \sum_{l=0}^{p-1} x_{lN/p+n} w_N^{-q(lN/p+n)}$$
$$= \frac{1}{p} w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l}$$

The fourth line follows from Lemma 2.1 applied with length $N/p$: the inner sum equals $N/p$ when $j = lN/p + n$ for some l, and vanishes otherwise. This completes the proof of Theorem 2.3.

The following corollary provides an important special case of this result. This is the same as equation (13) in [1], except that we are working with the IDFT.

Corollary 2.3 Assume p = 2. The forward combine equation for C sequences is:

$$Y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$Y_{n,1} = w_N^{-n} (x_{n,0} - x_{n,1})/2$$

for $0 \le n \le N/2-1$.

We close this section by presenting the FFT and IFFT algorithms for complex sequences with length a power of two. We emphasize that this FFT is an in-place algorithm which accepts the input sequence $x_n$ in natural order, and produces the forward transform $X_k$ in bit-reversed order. The IFFT is an in-place algorithm which accepts the sequence $X_k$ in bit-reversed order, and produces the inverse transform $x_n$ in natural order. These algorithms may be used together in such a way that reordering of the data is never required. We will not include complete algorithm specifications such as these for all of the symmetric FFTs presented later. However, the algorithms presented here should provide a guideline for developing complete algorithms from forward and inverse combine equations. The codes are written in FORTRAN, and are patterned after similar codes found in [9].


C
C     TEST DRIVER FOR COMPLEX FFT
C
      PARAMETER (LOGN=3,N=2**LOGN)
      COMPLEX X(0:N-1)
      COMPLEX OMEGA(0:N-1)
      COMMON /FCCOM/ L,OMEGA
      DO 100 I=0,N-1
         X(I) = CMPLX(1.0,0.0)
  100 CONTINUE
      WRITE(6,1) (X(I),I=0,N-1)
    1 FORMAT(1H ,'COMPLEX SEQUENCE = ',4(/,4E13.4))
      CALL FCI(LOGN)
      CALL FFC(LOGN,X)
      WRITE(6,2) (X(I),I=0,N-1)
    2 FORMAT(1H ,'TRANSFORM = ',4(/,4E13.4))
      CALL FIC(LOGN,X)
      WRITE(6,3) (X(I),I=0,N-1)
    3 FORMAT(1H ,'INVERSE TRANSFORM = ',4(/,4E13.4))
      END
C
C     FOURIER TRANSFORM
C     COMPLEX SEQUENCE
C     INITIALIZATION
C
      SUBROUTINE FCI(LOGN)
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      L = 2**LOGN
      OMEGA(0) = 1.0
      TPIDL = 8.0*ATAN(1.0)/L
      OMEGA(1) = CMPLX(COS(TPIDL),SIN(TPIDL))
      DO 100 I=2,L-1
         OMEGA(I) = OMEGA(I-1)*OMEGA(1)
  100 CONTINUE
      RETURN
      END


C
C     FOURIER TRANSFORM
C     FORWARD DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FFC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         NS = 2**(I-1)
         LS = N/NS
         CALL CF(NS,LS,X)
  100 CONTINUE
      DO 200 I=0,N-1
         X(I) = X(I)/N
  200 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     FORWARD COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CF(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = X(I,0,J) + X(I,1,J)
            X(I,1,J) = CONJG(OMEGA(I*L/LS))*(X(I,0,J) - X(I,1,J))
            X(I,0,J) = TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END


C
C     FOURIER TRANSFORM
C     INVERSE DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FIC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         LS = 2**I
         NS = N/LS
         CALL CI(NS,LS,X)
  100 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     INVERSE COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CI(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = OMEGA(I*L/LS)*X(I,1,J)
            X(I,1,J) = X(I,0,J) - TMP1
            X(I,0,J) = X(I,0,J) + TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END
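For readers who wish to experiment without a FORTRAN compiler, the loop structure of the FFC/CF and FIC/CI routines can be mirrored in a few lines of Python. The sketch below is an informal transliteration, not the thesis code: it recomputes the twiddle factors on the fly instead of using a precomputed table, assumes the length is a power of two, and the function names are chosen here for illustration. The helper `bit_reverse` is used only to compare the in-place result with a direct DFT.

```python
import cmath

def fft_natural_to_bitrev(x):
    """In-place forward transform: natural-order input, bit-reversed output."""
    N = len(x)
    ls = N                      # current sequence length (LS in the FORTRAN)
    while ls >= 2:
        half = ls // 2
        for start in range(0, N, ls):
            for i in range(half):
                a, b = x[start + i], x[start + half + i]
                x[start + i] = a + b
                x[start + half + i] = cmath.exp(-2j * cmath.pi * i / ls) * (a - b)
        ls = half
    for i in range(N):          # apply the 1/N scaling once at the end
        x[i] /= N

def ifft_bitrev_to_natural(X):
    """In-place inverse transform: bit-reversed input, natural-order output."""
    N = len(X)
    ls = 2
    while ls <= N:
        half = ls // 2
        for start in range(0, N, ls):
            for i in range(half):
                t = cmath.exp(2j * cmath.pi * i / ls) * X[start + half + i]
                X[start + half + i] = X[start + i] - t
                X[start + i] = X[start + i] + t
        ls *= 2

def bit_reverse(k, bits):
    """Reverse the lowest `bits` binary digits of k."""
    r = 0
    for _ in range(bits):
        r = 2 * r + k % 2
        k //= 2
    return r

def dft(x):
    """Direct O(N^2) reference for equation (2.1)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

x = [complex((n * n) % 7, n % 3) for n in range(8)]
data = list(x)
fft_natural_to_bitrev(data)
ref = dft(x)
forward_error = max(abs(data[bit_reverse(k, 3)] - ref[k]) for k in range(8))
ifft_bitrev_to_natural(data)
inverse_error = max(abs(a - b) for a, b in zip(data, x))
```

After the forward pass, $X_k$ is found at the position obtained by reversing the bits of k. The inverse pass consumes exactly that ordering, so a forward and inverse transform pair needs no explicit reordering step.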


2.2 Real (R)

In this section, we will be concerned with the following symmetries:

Definition 2.3 A real (R) sequence $x_n$ of length N is defined by:

$$\bar{x}_n = x_n$$

A conjugate symmetric (CS) sequence $X_k$ of length N is defined by:

$$\bar{X}_{N-k} = X_k$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.2 If $x_n$ is an R sequence of length N, then its DFT $X_k$ is a CS sequence of length N. If $X_k$ is a CS sequence of length N, then its IDFT $x_n$ is an R sequence of length N.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for C-C boundary conditions. Since an R sequence is also periodic with length N, it satisfies C-C boundary conditions for the computational domain $0 \le n \le N-1$.

Theorem 2.4 Let $x_n$ be an R sequence and let $X_k$ be its CS symmetric DFT, both of length N. The real form of the DFT is:

$$\mathrm{Re}(X_k) = \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos(2\pi kn/N)$$
$$\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin(2\pi kn/N)$$

for $0 \le k \le N/2$ if N is even, and $0 \le k \le (N-1)/2$ if N is odd. If N is even, then the real form of the IDFT is:

$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} \{ 2\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\mathrm{Im}(X_k) \sin(2\pi kn/N) \}$$

for $0 \le n \le N-1$. If N is odd, we obtain instead:

$$x_n = X_0 + \sum_{k=1}^{(N-1)/2} \{ 2\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\mathrm{Im}(X_k) \sin(2\pi kn/N) \}$$

for $0 \le n \le N-1$.

We now prove Theorem 2.4. The result for the DFT follows immediately from Definition 2.1 and the R symmetry of $x_n$. Note that only half of the CS sequence $X_k$ needs to be specified. We prove the result for the IDFT for the case of even N only, since the proof for odd N is similar. Using the CS symmetry of $X_k$ yields:

$$x_n = \sum_{k=0}^{N-1} X_k w_N^{kn}$$
$$= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k w_N^{kn} + \sum_{k=1}^{N/2-1} X_{N-k} w_N^{n(N-k)}$$
$$= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k w_N^{kn} + \sum_{k=1}^{N/2-1} \bar{X}_k w_N^{-kn}$$
$$= X_0 + (-1)^n X_{N/2} + 2\mathrm{Re}\Big[ \sum_{k=1}^{N/2-1} X_k w_N^{kn} \Big]$$
$$= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} \{ 2\mathrm{Re}(X_k) \cos(2\pi kn/N) - 2\mathrm{Im}(X_k) \sin(2\pi kn/N) \}$$

This completes the proof of Theorem 2.4.

We now develop a fast, mixed radix algorithm for computing the R symmetric DFT and its inverse, given $x_n$ in natural order. Note that an R sequence of length N may be stored in N real storage locations, compared to 2N real storage locations for a C sequence of length N. Also, a CS sequence of length N may be stored in N real storage locations because half of the sequence is redundant and need not be stored. Our goal is to exploit these symmetries in the data in order to obtain a reduction by half in both storage


requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the CS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.4 Let $X_k$ be a CS sequence of length N with factor p. For $q \ne 0$, we define CS induced intersequence symmetry (CSIS) by:

$$\bar{X}_{k,p-q} = X_{N/p-k-1,q}$$

For $q \ne 0$, we denote subsequence $X_{k,q}$ by CSIS(q). Subsequence $p-q$ is a redundant copy of subsequence q, which we denote by CSIS(p-q) = CSIS*(q). We also say that subsequence $p-q$ is the dual of subsequence q.

A staggered conjugate symmetric (SCS) sequence $X_k$ of length N is defined by:

$$\bar{X}_{N-k-1} = X_k$$

Let N have factor p. For $0 \le q \le p-1$, we define SCS induced intersequence symmetry (SCSIS) by:

$$\bar{X}_{k,p-q-1} = X_{N/p-k-1,q}$$

For $0 \le q \le p-1$, we denote subsequence $X_{k,q}$ by SCSIS(q). Subsequence $p-q-1$ is a redundant copy of subsequence q, which we denote by SCSIS(p-q-1) = SCSIS*(q). We also say that subsequence $p-q-1$ is the dual of subsequence q.

The following lemma establishes the relationship between these symmetries.

Lemma 2.3 Let $X_k$ be a CS sequence of length N with factor p. Then the subsequence $X_{k,0}$ is CS symmetric, and the remaining subsequences $X_{k,q}$ are CSIS symmetric. If p is even, then the CSIS symmetry of subsequence $X_{k,p/2}$ reduces to SCS symmetry.

Let $X_k$ be an SCS sequence of length N with factor p. Then the subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the SCSIS symmetry of subsequence $X_{k,(p-1)/2}$ reduces to SCS symmetry.

We now prove Lemma 2.3. Let $X_k$ be a CS sequence of length N with factor p. The subsequence $X_{k,0}$ satisfies:

$$\bar{X}_{N/p-k,0} = \bar{X}_{N-pk} = X_{pk} = X_{k,0}$$


That is, subsequence $X_{k,0}$ is CS symmetric. The remaining subsequences $X_{k,q}$ satisfy:

$$\bar{X}_{k,p-q} = \bar{X}_{pk+p-q} = X_{N-pk-p+q} = X_{p(N/p-k-1)+q} = X_{N/p-k-1,q}$$

That is, for $q \ne 0$ the subsequences $X_{k,q}$ are CSIS symmetric. If p is even, then the CSIS symmetry of $X_{k,p/2}$ reduces to:

$$\bar{X}_{k,p/2} = X_{N/p-k-1,p/2}$$

That is, subsequence $X_{k,p/2}$ is SCS symmetric.

Let $X_k$ be an SCS sequence of length N with factor p. The subsequences $X_{k,q}$ satisfy:

$$\bar{X}_{k,p-q-1} = \bar{X}_{pk+p-q-1} = X_{N-pk-p+q+1-1} = X_{p(N/p-k-1)+q} = X_{N/p-k-1,q}$$

That is, the subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the SCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:

$$\bar{X}_{k,(p-1)/2} = X_{N/p-k-1,(p-1)/2}$$

That is, subsequence $X_{k,(p-1)/2}$ is SCS symmetric. This completes the proof of Lemma 2.3.

A mixed radix splitting tree diagram for a CS sequence is shown in Figure 2.2. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant.

The next lemma provides the intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT.

Lemma 2.4 The intersequence symmetry CSIS induces the following intersequence symmetry in the IDFT:

$$Y_{n,p-q} = w_{N/p}^{-n} \bar{Y}_{n,q}$$

Figure 2.2: Splitting tree for R symmetric FFT

Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:

$$x_n = w_N^{-n} \bar{x}_n$$
$$x_n = w_N^{-n/2} \tilde{x}_n$$

where $\tilde{x}_n$ is the (signed) magnitude of $x_n$, and hence is real. The intersequence symmetry SCSIS induces the following intersequence symmetry in the IDFT:

$$Y_{n,p-q-1} = w_{N/p}^{-n} \bar{Y}_{n,q}$$

We now prove Lemma 2.4. Let $X_{k,q}$ be CSIS symmetric. Then the IDFT of $X_{k,p-q}$ is:

$$Y_{n,p-q} = \sum_{k=0}^{N/p-1} X_{k,p-q} w_{N/p}^{kn} = \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q} w_{N/p}^{kn} = \sum_{k=0}^{N/p-1} \bar{X}_{k,q} w_{N/p}^{n(N/p-k-1)} = w_{N/p}^{-n} \sum_{k=0}^{N/p-1} \bar{X}_{k,q} w_{N/p}^{-kn} = w_{N/p}^{-n} \bar{Y}_{n,q}$$

Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:

$$x_n = \sum_{k=0}^{N-1} X_k w_N^{kn} = \sum_{k=0}^{N-1} \bar{X}_{N-k-1} w_N^{kn} = \sum_{k=0}^{N-1} \bar{X}_k w_N^{n(N-k-1)} = w_N^{-n} \sum_{k=0}^{N-1} \bar{X}_k w_N^{-kn} = w_N^{-n} \bar{x}_n$$

We express $x_n$ in polar form as follows:

$$x_n = \tilde{x}_n e^{i\theta}$$

Substituting this into the preceding symmetry for $x_n$ and solving for $\theta$ leads to:

$$x_n = w_N^{-n/2} \tilde{x}_n$$

Let $X_{k,q}$ be SCSIS symmetric. Then the IDFT of $X_{k,p-q-1}$ is:

$$Y_{n,p-q-1} = \sum_{k=0}^{N/p-1} X_{k,p-q-1} w_{N/p}^{kn} = \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q} w_{N/p}^{kn} = \sum_{k=0}^{N/p-1} \bar{X}_{k,q} w_{N/p}^{n(N/p-k-1)} = w_{N/p}^{-n} \sum_{k=0}^{N/p-1} \bar{X}_{k,q} w_{N/p}^{-kn} = w_{N/p}^{-n} \bar{Y}_{n,q}$$

This completes the proof of Lemma 2.4.

The preceding lemma shows that each symmetry appearing in Figure 2.2 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the R symmetric IFFT.

Theorem 2.5 Assume that p is even. The inverse combine equation for CS, SCS, and CSIS sequences is:

$$x_{n,l} = Y_{n,0} + (-1)^l \tilde{Y}_{n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} \Big] \tag{2.5}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. Note that $x_{n,l}$ is real because $Y_{n,0}$ is real. The inverse combine equation for SCSIS sequences is:

$$\tilde{x}_{n,l} = 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} \Big] \tag{2.6}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.


Next, assume that p is odd. The inverse combine equation for CS and CSIS sequences is:

$$x_{n,l} = Y_{n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{nq} Y_{n,q} \Big] \tag{2.7}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. The inverse combine equation for SCS and SCSIS sequences is:

$$\tilde{x}_{n,l} = (-1)^l \tilde{Y}_{n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{nq} Y_{n,q} \Big] \tag{2.8}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 2.5. First, assume that p is even. Consider the combining of CS, SCS, and CSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{nq} Y_{n,q}$$
$$= Y_{n,0} + w_p^{lp/2} w_N^{np/2} Y_{n,p/2} + \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{l(p-q)} w_N^{n(p-q)} Y_{n,p-q}$$
$$= Y_{n,0} + (-1)^l w_N^{np/2} \big[ w_{N/p}^{-n/2} \tilde{Y}_{n,p/2} \big] + \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$
$$= Y_{n,0} + (-1)^l \tilde{Y}_{n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} \Big]$$

Consider the combining of SCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{nq} Y_{n,q}$$
$$= \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=0}^{p/2-1} w_p^{l(p-q-1)} w_N^{n(p-q-1)} Y_{n,p-q-1}$$
$$= \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} + w_p^{-l} w_N^{-n} \sum_{q=0}^{p/2-1} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$

Using SCS symmetry yields:

$$x_{n,l} = x_{lN/p+n} = w_N^{-(lN/p+n)/2} \tilde{x}_{lN/p+n} = w_p^{-l/2} w_N^{-n/2} \tilde{x}_{n,l} \tag{2.9}$$

Substituting this into the combine equation above yields:

$$\tilde{x}_{n,l} = w_p^{l/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} + w_p^{-l/2} w_N^{-n/2} \sum_{q=0}^{p/2-1} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$
$$= 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{nq} Y_{n,q} \Big]$$

Next, assume that p is odd. Consider the combining of CS and CSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = Y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{l(p-q)} w_N^{n(p-q)} Y_{n,p-q}$$
$$= Y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$
$$= Y_{n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{nq} Y_{n,q} \Big]$$

Consider the combining of SCS and SCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{nq} Y_{n,q}$$
$$= w_p^{l(p-1)/2} w_N^{n(p-1)/2} Y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{nq} Y_{n,q} + \sum_{q=0}^{(p-3)/2} w_p^{l(p-q-1)} w_N^{n(p-q-1)} Y_{n,p-q-1}$$
$$= w_p^{l(p-1)/2} w_N^{n(p-1)/2} Y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{nq} Y_{n,q} + w_p^{-l} w_N^{-n} \sum_{q=0}^{(p-3)/2} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$

Combining this with equation (2.9) yields:

$$\tilde{x}_{n,l} = w_p^{lp/2} w_N^{np/2} Y_{n,(p-1)/2} + w_p^{l/2} w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{nq} Y_{n,q} + w_p^{-l/2} w_N^{-n/2} \sum_{q=0}^{(p-3)/2} w_p^{-lq} w_N^{-nq} \bar{Y}_{n,q}$$
$$= (-1)^l \tilde{Y}_{n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{nq} Y_{n,q} \Big]$$

Here we have used $w_p^{lp/2} = (-1)^l$ and $w_N^{np/2} Y_{n,(p-1)/2} = w_{N/p}^{n/2} Y_{n,(p-1)/2} = \tilde{Y}_{n,(p-1)/2}$. This completes the proof of Theorem 2.5.

The following corollary provides an important special case of this result. These are the same as equations (6) and (7) in [1], except that we are working with the IDFT.

Corollary 2.4 Assume p = 2. The inverse combine equation for CS and SCS sequences is:

$$x_{n,0} = Y_{n,0} + \tilde{Y}_{n,1}$$
$$x_{n,1} = Y_{n,0} - \tilde{Y}_{n,1}$$

for $0 \le n \le N/2-1$. The inverse combine equation for SCSIS sequences is:

$$\tilde{x}_{n,0} = 2\mathrm{Re}[w_N^{n/2} Y_{n,0}]$$
$$\tilde{x}_{n,1} = -2\mathrm{Im}[w_N^{n/2} Y_{n,0}]$$

for $0 \le n \le N/2-1$.
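The two cases of Corollary 2.4 can be verified numerically on real data. In the Python sketch below, which is illustrative only, the CS sequence is obtained from a direct DFT of a real sequence of length 16; its odd-indexed subsequence is SCS, and the even-indexed half of that subsequence plays the role of SCSIS(0). All helper and variable names are assumptions of this sketch.

```python
import cmath

def w(N, e):
    """Return w_N**e with w_N = exp(2*pi*i/N); e may be a half-integer."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

N = 16
M = N // 2
x = [float((5 * n * n + 3 * n) % 11) - 4.0 for n in range(N)]  # real data
X = dft(x)                                  # CS sequence

# CS/SCS case: combine the IDFTs of the even and odd subsequences of X.
Y0 = idft(X[0::2])                          # IDFT of the CS subsequence
Y1 = idft(X[1::2])                          # IDFT of the SCS subsequence
Yt1 = [w(N, n) * Y1[n] for n in range(M)]   # tilde Y_{n,1} = w_{N/2}^{n/2} Y_{n,1}
imag_leak = max(max(abs(Y0[n].imag), abs(Yt1[n].imag)) for n in range(M))
cs_error = max(max(abs(x[n] - (Y0[n] + Yt1[n]).real),
                   abs(x[M + n] - (Y0[n] - Yt1[n]).real)) for n in range(M))

# SCSIS case: Z is an SCS sequence of length M; only Z[0::2] is transformed.
Z = X[1::2]
H = M // 2
z = idft(Z)
zt = [w(M, m / 2) * z[m] for m in range(M)]  # tilde z_m, expected to be real
V0 = idft(Z[0::2])                           # IDFT of SCSIS(0)
scsis_error = max(max(abs(zt[n] - 2 * (w(M, n / 2) * V0[n]).real),
                      abs(zt[H + n] + 2 * (w(M, n / 2) * V0[n]).imag))
                  for n in range(H))
```

The quantity `imag_leak` confirms that $Y_{n,0}$ and $\tilde{Y}_{n,1}$ are real, so the CS/SCS combine needs only real additions, while the SCSIS combine needs one complex multiplication per pair of real outputs.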


The next theorem provides all of the forward combine equations for the R symmetric FFT.

Theorem 2.6 Assume that p is even. The forward combine equation for CS, SCS, and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$ and:

$$\tilde{Y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l x_{n,l} \tag{2.10}$$

for $0 \le n \le N/p-1$. The forward combine equation for SCSIS sequences is:

$$Y_{n,q} = \frac{1}{p} w_N^{-n(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} \tag{2.11}$$

for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$.

Next, assume that p is odd. The forward combine equation for CS and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$, $0 \le q \le (p-1)/2$. The forward combine equation for SCS and SCSIS sequences is given by equation (2.11) for $0 \le n \le N/p-1$, $0 \le q \le (p-3)/2$ and:

$$\tilde{Y}_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l \tilde{x}_{n,l} \tag{2.12}$$

for $0 \le n \le N/p-1$.

We now prove Theorem 2.6. First, assume that p is even. The forward combining of CS, SCS, and CSIS sequences requires one new equation:

$$\tilde{Y}_{n,p/2} = w_{N/p}^{n/2} Y_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2} x_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l x_{n,l}$$

The forward combine equation for SCSIS sequences is obtained by substituting equation (2.9) into equation (2.4):

$$Y_{n,q} = \frac{1}{p} w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} = \frac{1}{p} w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq} \big[ w_p^{-l/2} w_N^{-n/2} \tilde{x}_{n,l} \big] = \frac{1}{p} w_N^{-n(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l}$$

Next, assume that p is odd. The forward combining of CS and CSIS sequences does not require any new equations. The forward combining of SCS and SCSIS sequences requires one new equation:

$$\tilde{Y}_{n,(p-1)/2} = w_{N/p}^{n/2} Y_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2} \tilde{x}_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l \tilde{x}_{n,l}$$

This completes the proof of Theorem 2.6.

The following corollary provides an important special case of this result. These are the same as equations (11) and (12) in [1], except that we are working with the IDFT.

Corollary 2.5 Assume p = 2. The forward combine equation for CS and SCS sequences is:

$$Y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$\tilde{Y}_{n,1} = (x_{n,0} - x_{n,1})/2$$

for $0 \le n \le N/2-1$. The forward combine equation for SCSIS sequences is:

$$Y_{n,0} = w_N^{-n/2} (\tilde{x}_{n,0} - i \tilde{x}_{n,1})/2$$

for $0 \le n \le N/2-1$.
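The forward equations of Corollary 2.5 can be exercised in the same way. The Python sketch below, an illustration under the same assumptions as before (direct $O(N^2)$ reference transforms, hypothetical function names), applies the CS/SCS stage to real data of length 16 and then the SCSIS stage to the resulting real sequence $\tilde{Y}_{n,1}$, comparing each result with the IDFT of the appropriate subsequence of the DFT.

```python
import cmath

def w(N, e):
    """Return w_N**e with w_N = exp(2*pi*i/N); e may be a half-integer."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def forward_cs_stage(x):
    """Corollary 2.5, CS/SCS case: real x of even length -> real (Y0, Yt1)."""
    M = len(x) // 2
    Y0 = [(x[n] + x[M + n]) / 2 for n in range(M)]
    Yt1 = [(x[n] - x[M + n]) / 2 for n in range(M)]
    return Y0, Yt1

def forward_scsis_stage(xt):
    """Corollary 2.5, SCSIS case: real tilde-x of even length -> complex Y0."""
    N = len(xt)
    M = N // 2
    return [w(N, -n / 2) * (xt[n] - 1j * xt[M + n]) / 2 for n in range(M)]

N = 16
x = [float((7 * n * n + n) % 13) - 6.0 for n in range(N)]  # real data
X = dft(x)

Y0, Yt1 = forward_cs_stage(x)               # real arithmetic only
ref_Y0 = idft(X[0::2])
ref_Y1 = idft(X[1::2])
cs_error = max(max(abs(Y0[n] - ref_Y0[n]),
                   abs(w(N, -n) * Yt1[n] - ref_Y1[n])) for n in range(N // 2))

V0 = forward_scsis_stage(Yt1)               # IDFT of X_{4k+1}, length N/4
ref_V0 = idft(X[1::4])
scsis_error = max(abs(a - b) for a, b in zip(V0, ref_V0))
```

In this sketch the first stage involves no complex arithmetic at all, which is the source of the savings over the complex algorithm of Section 2.1.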


2.3 Real Even (RE)

In this section, we will be concerned with the following symmetries:

Definition 2.5 A real even (RE) sequence $x_n$ of length N is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = x_n$$

Note that an RE sequence may also be viewed as having both R and CS symmetry, which we denote by RCS.

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.5 If $x_n$ is an RE sequence of length N, then its DFT $X_k$ is an RCS sequence of length N. If $X_k$ is an RCS sequence of length N, then its IDFT $x_n$ is an RE sequence of length N.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for N-N boundary conditions. Note that if N is even, then an RE sequence satisfies N-N boundary conditions for the computational domain $0 \le n \le N/2$. That is:

$$x_{-1} = x_1, \qquad x_{N/2+1} = x_{N/2-1}$$

Theorem 2.7 Let $x_n$ be an RE sequence and let $X_k$ be its RCS symmetric DFT, both of length N where N is even. The real form of the DFT is:

$$X_k = \frac{1}{N} \Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big]$$

for $0 \le k \le N/2$. The real form of the IDFT is:

$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} 2 X_k \cos(2\pi kn/N)$$

for $0 \le n \le N/2$. Note that the results for the DFT and IDFT are identical except for scaling.
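Theorem 2.7 states that only the $N/2+1$ values $x_0, \ldots, x_{N/2}$ are needed, and that the transform is a purely real cosine sum. The Python sketch below illustrates this for N = 12 by comparing the real form with a direct complex DFT of the even extension. It is an illustration only; the names `re_dft` and `re_idft` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct complex DFT (2.1), used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def re_dft(x_half, N):
    """Real form of the DFT in Theorem 2.7; x_half holds x_0 .. x_{N/2}."""
    h = N // 2
    return [(x_half[0] + (-1) ** k * x_half[h]
             + sum(2 * x_half[n] * math.cos(2 * math.pi * k * n / N)
                   for n in range(1, h))) / N
            for k in range(h + 1)]

def re_idft(X_half, N):
    """Real form of the IDFT in Theorem 2.7; X_half holds X_0 .. X_{N/2}."""
    h = N // 2
    return [X_half[0] + (-1) ** n * X_half[h]
            + sum(2 * X_half[k] * math.cos(2 * math.pi * k * n / N)
                  for k in range(1, h))
            for n in range(h + 1)]

N = 12
x_half = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0]   # x_0 .. x_6
x_full = x_half + x_half[-2:0:-1]                # even extension x_{N-n} = x_n
X_full = dft(x_full)
X_half = re_dft(x_half, N)

forward_error = max(abs(X_full[k] - X_half[k]) for k in range(N // 2 + 1))
roundtrip_error = max(abs(a - b) for a, b in zip(re_idft(X_half, N), x_half))
```

The two functions differ only in the factor $1/N$, in agreement with the closing remark of the theorem.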


We now prove Theorem 2.7. The result for the DFT follows from Theorem 2.4, the RCS symmetry of $X_k$, and the RE symmetry of $x_n$ as follows:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos(2\pi kn/N)$$
$$= \frac{1}{N} \Big\{ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} x_n \cos(2\pi kn/N) + \sum_{n=1}^{N/2-1} x_{N-n} \cos[2\pi k(N-n)/N] \Big\}$$
$$= \frac{1}{N} \Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big]$$

The result for the IDFT follows immediately from Theorem 2.4 and the RCS symmetry of $X_k$. Note that only half of the RE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.7.

We now develop a fast, mixed radix algorithm for computing the RE symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RE sequence of length N may be stored in N/2 real storage locations, compared to 2N real storage locations for a C sequence of length N. Similarly, an RCS sequence of length N may be stored in N/2 real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.6 Let $X_k$ be an RCS sequence of length N with factor p. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 2.4, with the addition that all sequences are real as well. We indicate this by preceding the acronym for each symmetry with an R.

The relationships between the symmetries recorded in Lemma 2.3 are not affected by the fact that all sequences have R symmetry as well. A mixed radix splitting tree diagram for an RCS sequence is shown in Figure 2.3. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT.

Figure 2.3: Splitting tree for RE symmetric FFT

Lemma 2.6 The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemma 2.4, with the following addition. Let $X_k$ be an R sequence of length N. Its IDFT $x_n$ satisfies:

$$x_{N-n} = \bar{x}_n$$

Since all sequences have R symmetry, only half of the IDFT of any sequence needs to be computed.

We now prove Lemma 2.6. Let $X_k$ be an R sequence of length N. Its IDFT $x_n$ satisfies:

$$x_{N-n} = \sum_{k=0}^{N-1} X_k w_N^{k(N-n)} = \sum_{k=0}^{N-1} X_k w_N^{-kn} = \bar{x}_n$$

This completes the proof of Lemma 2.6.

The preceding lemma shows that each symmetry appearing in Figure 2.3 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RE symmetric IFFT.

Theorem 2.8 Assume that p is even. The inverse combine equation for RCS, RSCS, and RCSIS sequences is given by equation (2.5) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n,l} = Y_{n,0} + (-1)^{l+1} \tilde{Y}_{n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big] \tag{2.13}$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for RSCSIS sequences is given by equation (2.6) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$\tilde{x}_{N/p-n,l} = 2\mathrm{Re}\Big[ w_p^{-(l+1)/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big] \tag{2.14}$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for R sequences is given by equation (2.3) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n,l} = \sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q} \tag{2.15}$$

for the lower half-range of n and $0 \le l \le p/2-1$.

Next, assume that p is odd. The inverse combine equation for RCS and RCSIS sequences is given by equation (2.7) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$x_{N/p-n,l} = Y_{n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big] \tag{2.16}$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for RSCS and RSCSIS sequences is given by equation (2.8) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$\tilde{x}_{N/p-n,l} = (-1)^{l+1} \tilde{Y}_{n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{-(l+1)/2} w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big] \tag{2.17}$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for R sequences is given by equation (2.3) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation (2.15) for the lower half-range of n and $0 \le l \le (p-3)/2$.

We now prove Theorem 2.8. First, assume that p is even. Consider the combining of RCS, RSCS, and RCSIS sequences. Since we will compute only half of each sequence $Y_{n,q}$ on the right hand side of equation (2.5), we need the following companion equation:

$$x_{N/p-n,l} = Y_{N/p-n,0} + (-1)^l \tilde{Y}_{N/p-n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{q(N/p-n)} Y_{N/p-n,q} \Big]$$


Using RSCS symmetry yields:

$$\tilde{Y}_{N/p-n,q} = w_{N/p}^{(N/p-n)/2} Y_{N/p-n,q} = -w_{N/p}^{-n/2} \bar{Y}_{n,q} = -\tilde{Y}_{n,q} \tag{2.18}$$

Substituting this into the companion equation above yields:

$$x_{N/p-n,l} = Y_{n,0} + (-1)^{l+1} \tilde{Y}_{n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q} \Big]$$
$$= Y_{n,0} + (-1)^{l+1} \tilde{Y}_{n,p/2} + 2\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big]$$

Consider the combining of RSCSIS sequences. Since we will compute only half of each sequence $Y_{n,q}$ on the right hand side of equation (2.6), we need the following companion equation:

$$\tilde{x}_{N/p-n,l} = 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{(N/p-n)/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{q(N/p-n)} Y_{N/p-n,q} \Big]$$
$$= 2\mathrm{Re}\Big[ w_p^{(l+1)/2} w_N^{-n/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q} \Big]$$
$$= 2\mathrm{Re}\Big[ w_p^{-(l+1)/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big]$$

Consider the combining of R sequences. Since we will compute only half of each sequence $Y_{n,q}$ on the right hand side of equation (2.3), we need the following companion equation:

$$x_{N/p-n,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{q(N/p-n)} Y_{N/p-n,q} = \sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q}$$

Next, assume that p is odd. Consider the combining of RCS and RCSIS sequences. Since we will compute only half of each sequence $Y_{n,q}$ on the right hand side of equation (2.7), we need the following companion equation:

$$x_{N/p-n,l} = Y_{N/p-n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{q(N/p-n)} Y_{N/p-n,q} \Big]$$
$$= Y_{n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q} \Big]$$
$$= Y_{n,0} + 2\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big]$$

Consider the combining of RSCS and RSCSIS sequences. Since we will compute only half of each sequence $Y_{n,q}$ on the right hand side of equation (2.8), we need the following companion equation:

$$\tilde{x}_{N/p-n,l} = (-1)^l \tilde{Y}_{N/p-n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{l/2} w_N^{(N/p-n)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{q(N/p-n)} Y_{N/p-n,q} \Big]$$

Substituting equation (2.18) into the companion equation above yields:

$$\tilde{x}_{N/p-n,l} = (-1)^{l+1} \tilde{Y}_{n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{(l+1)/2} w_N^{-n/2} \sum_{q=0}^{(p-3)/2} w_p^{q(l+1)} w_N^{-nq} \bar{Y}_{n,q} \Big]$$
$$= (-1)^{l+1} \tilde{Y}_{n,(p-1)/2} + 2\mathrm{Re}\Big[ w_p^{-(l+1)/2} w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{nq} Y_{n,q} \Big]$$

The companion equation for R sequences is identical to the even p case. This completes the proof of Theorem 2.8.

The following corollary provides an important special case of this result.

Corollary 2.6 Assume p = 2. The inverse combine equation for RCS and RSCS sequences is:

$$x_{n,0} = Y_{n,0} + \tilde{Y}_{n,1}$$
$$x_{N/2-n,0} = Y_{n,0} - \tilde{Y}_{n,1}$$


for the lower half-range of $n$. The inverse combine equation for RSCSIS sequences is:

$$\tilde{x}_{n,0} = 2\,\mathrm{Re}\big[w_N^{n/2}\, y_{n,0}\big], \qquad \tilde{x}_{N/2-n,0} = 2\,\mathrm{Im}\big[w_N^{n/2}\, y_{n,0}\big]$$

for the lower half-range of $n$. The inverse combine equation for R sequences is:

$$x_{n,0} = y_{n,0} + w_N^{n}\, y_{n,1}, \qquad x_{N/2-n,0} = \bar{y}_{n,0} - w_N^{-n}\, \bar{y}_{n,1}$$

for the lower half-range of $n$.

The next theorem provides all of the forward combine equations for the RE symmetric FFT.

Theorem 2.9 Assume that $p$ is even. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^q\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \tag{2.19}$$

for the lower half-range of $n$ and $0 \le q \le p-1$. Note that $y_{0,q}$ is real because $x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both real. This ensures that the final output is real because $n = 0$ in the last stage of the algorithm. The forward combine equation for RCS, RSCS, and RCSIS sequences is given by equation (2.19) for the lower half-range of $n$ and $0 \le q \le p/2-1$, with the exception that all sequences $x_{n,l}$ are real. In addition:

$$\tilde{y}_{n,p/2} = \frac{1}{p} \Big\{ x_{n,0} + (-1)^{p/2}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l \big[ x_{n,l} + x_{-n,l} \big] \Big\} \tag{2.20}$$

for the lower half-range of $n$. The forward combine equation for RSCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + i(-1)^q\, \bar{\tilde{x}}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \tag{2.21}$$


for the lower half-range of $n$ and $0 \le q \le p/2-1$. Note that $y_{0,q}$ is real because $\tilde{x}_{0,p/2} = \tilde{x}_{N/2} = 0$.

Next, assume that $p$ is odd. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \tag{2.22}$$

for the lower half-range of $n$ and $0 \le q \le p-1$. The forward combine equation for RCS and RCSIS sequences is given by equation (2.22) for the lower half-range of $n$ and $0 \le q \le (p-1)/2$, with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for RSCS and RSCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \tag{2.23}$$

for the lower half-range of $n$ and $0 \le q \le (p-3)/2$. In addition:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} (-1)^l \big[ \tilde{x}_{n,l} + \tilde{x}_{-n,l} \big] \Big\} \tag{2.24}$$

for the lower half-range of $n$.

We now prove Theorem 2.9. First, assume that $p$ is even. The forward combine equation for R sequences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data. For this purpose, we will need the following result, which is valid for all R sequences:

$$x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = \bar{x}_{(l+1)N/p-n} = \bar{x}_{-n,l+1}$$

Using this result, we obtain:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$


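$$\begin{aligned} &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{-q(p-l-1)}\, x_{n,p-l-1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{q(l+1)}\, \bar{x}_{-n,l+1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=1}^{p/2} w_p^{lq}\, \bar{x}_{-n,l} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^q\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \end{aligned}$$

The forward combining of RCS, RSCS, and RCSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2} = \frac{1}{p} \Big\{ x_{n,0} + (-1)^{p/2}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l \big[ x_{n,l} + x_{-n,l} \big] \Big\}$$

The forward combine equation for RSCSIS sequences is obtained by substituting equation (2.9) into equation (2.19):

$$\begin{aligned} y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^q\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \\ &= \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + i(-1)^q\, \bar{\tilde{x}}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \end{aligned}$$

Next, assume that $p$ is odd. The forward combine equation for R sequences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data.

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$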


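$$\begin{aligned} &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{(p-3)/2} w_p^{-q(p-l-1)}\, x_{n,p-l-1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{(p-3)/2} w_p^{q(l+1)}\, \bar{x}_{-n,l+1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} + \sum_{l=1}^{(p-1)/2} w_p^{lq}\, \bar{x}_{-n,l} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \end{aligned}$$

The forward combining of RCS and RCSIS sequences does not require any new equations. The forward combine equation for RSCS and RSCSIS sequences is obtained by substituting equation (2.9) into equation (2.22):

$$\begin{aligned} y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \\ &= \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \end{aligned}$$

For $q = (p-1)/2$ this reduces to:

$$\tilde{y}_{n,(p-1)/2} = w_{N/p}^{n/2}\, y_{n,(p-1)/2} = \frac{1}{p} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} (-1)^l \big[ \tilde{x}_{n,l} + \tilde{x}_{-n,l} \big] \Big\}$$

This completes the proof of Theorem 2.9. The following corollary provides an important special case of this result.

Corollary 2.7 Assume $p = 2$. The forward combine equation for R sequences is:

$$y_{n,0} = (x_{n,0} + \bar{x}_{-n,1})/2, \qquad y_{n,1} = w_N^{-n}\,(x_{n,0} - \bar{x}_{-n,1})/2$$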


for the lower half-range of $n$. The forward combine equation for RCS and RSCS sequences is:

$$y_{n,0} = (x_{n,0} + x_{-n,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} - x_{-n,1})/2$$

for the lower half-range of $n$. The forward combine equation for RSCSIS sequences is:

$$y_{n,0} = w_N^{-n/2}\,(\tilde{x}_{n,0} + i\,\bar{\tilde{x}}_{-n,1})/2$$

for the lower half-range of $n$.
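The compact forward combine equation (2.19) can be checked numerically against the full equation (2.4). The sketch below is illustrative only: it assumes the conventions $w_N = e^{2\pi i/N}$, $x_{n,l} = x_{lN/p+n}$ and $x_{-n,l} = x_{lN/p-n}$, and the function names are hypothetical helpers, not routines from the thesis. An R-type input is generated as the IDFT of a real sequence, so that $x_{N-n} = \bar{x}_n$.

```python
import cmath
import random

def w(N, e):
    """Twiddle factor w_N^e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Unscaled IDFT (assumed convention): x_n = sum_k X_k * w_N^(n*k)."""
    N = len(X)
    return [sum(X[k] * w(N, n * k) for k in range(N)) for n in range(N)]

def forward_combine_full(x, p):
    """Equation (2.4): y[n][q] = (1/p) w_N^(-nq) sum_{l=0}^{p-1} w_p^(-lq) x_{n,l}."""
    N = len(x)
    M = N // p
    return [[w(N, -n * q) / p * sum(w(p, -l * q) * x[l * M + n] for l in range(p))
             for q in range(p)] for n in range(M)]

def forward_combine_compact(x, p):
    """Equation (2.19), even p: uses only x_{n,l} and conj(x_{-n,l}) for l <= p/2."""
    N = len(x)
    M = N // p
    h = p // 2
    y = []
    for n in range(M):
        row = []
        for q in range(p):
            s = x[n] + (-1) ** q * x[(h * M - n) % N].conjugate()
            for l in range(1, h):
                s += w(p, -l * q) * x[l * M + n]
                s += w(p, l * q) * x[(l * M - n) % N].conjugate()
            row.append(w(N, -n * q) / p * s)
        y.append(row)
    return y

random.seed(0)
X = [random.uniform(-1.0, 1.0) for _ in range(12)]  # real DFT-side data
x = idft(X)                                         # satisfies x_{N-n} = conj(x_n)
full = forward_combine_full(x, 4)
compact = forward_combine_compact(x, 4)
max_err = max(abs(full[n][q] - compact[n][q]) for n in range(3) for q in range(4))
print("max |(2.4) - (2.19)| =", max_err)
```

With $p = 2$ the inner loop is empty and the compact form reduces to the R-sequence case of Corollary 2.7.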


2.4 Real Odd (RO)

In this section, we will be concerned with the following symmetries:

Definition 2.7 A real odd (RO) sequence $x_n$ of length $N$ is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n$$

An imaginary odd (IO) sequence, or equivalently an imaginary conjugate symmetric (ICS) sequence, $X_k$ of length $N$ is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.7 If $x_n$ is an RO sequence of length $N$, then its DFT $X_k$ is an ICS sequence of length $N$. If $X_k$ is an ICS sequence of length $N$, then its IDFT $x_n$ is an RO sequence of length $N$.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for D-D boundary conditions. Note that if $N$ is even, then an RO sequence satisfies D-D boundary conditions for the computational domain $1 \le n \le N/2-1$. That is:

$$x_0 = 0, \qquad x_{N/2} = 0$$

Theorem 2.10 Let $x_n$ be an RO sequence and let $X_k$ be its ICS symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:

$$\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=1}^{N/2-1} 2 x_n \sin(2\pi kn/N)$$

for $1 \le k \le N/2-1$. The real form of the IDFT is:

$$x_n = -\sum_{k=1}^{N/2-1} 2\,\mathrm{Im}(X_k) \sin(2\pi kn/N)$$


for $1 \le n \le N/2-1$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.10. The result for the DFT follows from Theorem 2.4, the ICS symmetry of $X_k$, and the RO symmetry of $x_n$ as follows:

$$\begin{aligned} \mathrm{Im}(X_k) &= -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin(2\pi kn/N) \\ &= -\frac{1}{N} \sum_{n=1}^{N/2-1} \big\{ x_n \sin(2\pi kn/N) + x_{N-n} \sin[2\pi k(N-n)/N] \big\} \\ &= -\frac{1}{N} \sum_{n=1}^{N/2-1} 2 x_n \sin(2\pi kn/N) \end{aligned}$$

The result for the IDFT follows immediately from Theorem 2.4 and the ICS symmetry of $X_k$. Note that only half of the RO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.10.

We now develop a fast, mixed radix algorithm for computing the RO symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RO sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICS sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.8 Let $X_k$ be an ICS sequence of length $N$ with factor $p$. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 2.4, with the addition that all sequences are pure imaginary as well. We indicate this by preceding the acronym for each symmetry with an I.

The relationships between the symmetries recorded in Lemma 2.3 are not affected by the fact that all sequences have I symmetry as well. A mixed radix splitting tree diagram for an ICS sequence is shown in Figure 2.4. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual


[Figure 2.4: Splitting tree for RO symmetric FFT]


sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT.

Lemma 2.8 The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemma 2.4, with the following addition. Let $X_k$ be an I sequence of length $N$. Its IDFT $x_n$ satisfies:

$$x_{N-n} = -\bar{x}_n$$

Since all sequences have I symmetry, only half of the IDFT of any sequence needs to be computed.

We now prove Lemma 2.8. Let $X_k$ be an I sequence of length $N$. Its IDFT $x_n$ satisfies:

$$x_{N-n} = \sum_{k=0}^{N-1} X_k\, w_N^{-kn} = -\sum_{k=0}^{N-1} \bar{X}_k\, w_N^{-kn} = -\bar{x}_n$$

This completes the proof of Lemma 2.8.

The preceding lemma shows that each symmetry appearing in Figure 2.4 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next theorem provides all of the inverse combine equations for the RO symmetric IFFT.

Theorem 2.11 Assume that $p$ is even. The inverse combine equation for ICS, ISCS, and ICSIS sequences is given by equation (2.5) for the lower half-range of $n$ and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n,l} = -y_{n,0} + (-1)^l\, \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{nq}\, y_{n,q}\Big] \tag{2.25}$$


for the lower half-range of n and 0 :0: l :0: p/2 1. The inverse combine equation for ISCSIS sequences is given by equation {2. 6) for the lower half range of n and 0 :0: l :0: p/2 1. We also need the companion equation: p/2-1 2R [ -(1+1)/2 n/2 ,, .--q(h 1) nq 1 ZNjp-n,l-ewp WN L wp WNYn,qJ q:::O (2.26) for the lower half-range of n and 0 :0: l :0: p/2 1. The inverse combine equation for I sequences is given by equation {2. 3) for the lower half-range of n and 0 :0: l :0: p/21. We also need the companion equation: (2.27) for the lower half-range of n and 0 :0: l :0: p/21. Next, assume that p is odd. The inverse combine equation for ICS and ICSIS sequences is given by equation {2. 7) for the lower half-range of n and 0 :0: l :0: (p-1)/2. We also need the companion equation: (p-1)/2 2 R "" -q(l+l) nq i 'J'.Njp-n,l = -Yn,O-l L...,; wp WN Yn,q.J (2.28) q:::::l for the lower half-range of n and 0 :0: l :0: (p-3)/2. The inverse combine equation for ISCS and ISCSIS sequences is given by equation {2. 8) for the lower half-range of n and 0 :0: l :0: (p-1)/2. We also need the companion equation: ii;Nfp-n,l -fin,(p-1)/2(2.29) for the lower half-range of n and 0 :0: l :0: (p-3)/2. The inverse combine equation for I sequences is given by equation {2. 3) for the lower half-range ofn and 0:0: l :0: (p-1)/2. We also need the companion equation {2.27} for the lower half-range ofn and 0:0: l :0: (p3)/2. We now prove Theorem 2.11. First, assume that pis even. Consider the combining of ICS, ISCS, and ICSIS sequences. Since we will compute only 48


half of each sequence $y_{n,q}$ on the right hand side of equation (2.5), we need the following companion equation:

$$x_{N/p-n,l} = y_{N/p-n,0} + (-1)^l\, \tilde{y}_{N/p-n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq} w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big]$$

Using ISCS symmetry yields:

$$\tilde{y}_{N/p-n,q} = w_{N/p}^{(N/p-n)/2}\, y_{N/p-n,q} = w_{N/p}^{-n/2}\, \bar{y}_{n,q} = +\tilde{y}_{n,q} \tag{2.30}$$

Substituting this into the companion equation above yields:

$$\begin{aligned} x_{N/p-n,l} &= -y_{n,0} + (-1)^l\, \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{q(l+1)} w_N^{-nq}\, \bar{y}_{n,q}\Big] \\ &= -y_{n,0} + (-1)^l\, \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{nq}\, y_{n,q}\Big] \end{aligned}$$

Consider the combining of ISCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.6), we need the following companion equation:

$$\begin{aligned} \tilde{x}_{N/p-n,l} &= 2\,\mathrm{Re}\Big[w_p^{l/2} w_N^{(N/p-n)/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big] \\ &= -2\,\mathrm{Re}\Big[w_p^{(l+1)/2} w_N^{-n/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)} w_N^{-nq}\, \bar{y}_{n,q}\Big] \\ &= -2\,\mathrm{Re}\Big[w_p^{-(l+1)/2} w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{nq}\, y_{n,q}\Big] \end{aligned}$$

Consider the combining of I sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the


following companion equation: '"Nfp-n,l = p-1 "' lq q(Njp-n) L.. WP WN YNjp-n,q q:::::O p-1 "'wq(l+1 lw -nq_y L p N n,q q=O Next, assume that pis odd. Consider the combining of ICS and ICSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equation (2. 7), we need the following companion equation: 3! Njp-n,l (p-1)/2 2 R r "' lq q(Njp-n) ] l eL L wp WN YNjp-n,q q=l (p-1)/2 -Yn,O -2Re[ w;(l+1 )WNnqYn,q] q=l (p-1)/2 -Yn,o -2Re[ q=l Consider the combining of ISCS and ISCSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equa tion (2.8), we need the following companion equation: itNjp-n,l YN jp-n,(p-1 )/2 + (p-3)/2 2R [ 1/2 (Njp-n)/2 "' lq q(N/p-n) ] e WP wN L WP WN YNjp-n,q q::.:O Substituting equation (2.30) into the companion equation above yields: -ffn,(p-1)/2--ffn,(p-1)/2 50


The companion equation for I sequences is identical to the even $p$ case. This completes the proof of Theorem 2.11. The following corollary provides an important special case of this result.

Corollary 2.8 Assume $p = 2$. The inverse combine equation for ICS and ISCS sequences is:

$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{N/2-n,0} = -y_{n,0} + \tilde{y}_{n,1}$$

for the lower half-range of $n$. The inverse combine equation for ISCSIS sequences is:

$$\tilde{x}_{n,0} = 2\,\mathrm{Re}\big[w_N^{n/2}\, y_{n,0}\big], \qquad \tilde{x}_{N/2-n,0} = -2\,\mathrm{Im}\big[w_N^{n/2}\, y_{n,0}\big]$$

for the lower half-range of $n$. The inverse combine equation for I sequences is:

$$x_{n,0} = y_{n,0} + w_N^{n}\, y_{n,1}, \qquad x_{N/2-n,0} = -\bar{y}_{n,0} + w_N^{-n}\, \bar{y}_{n,1}$$

for the lower half-range of $n$.

The next theorem provides all of the forward combine equations for the RO symmetric FFT.

Theorem 2.12 Assume that $p$ is even. The forward combine equation for I sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^{q+1}\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \tag{2.31}$$

for the lower half-range of $n$ and $0 \le q \le p-1$. Note that $y_{0,q}$ is pure imaginary because $x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both pure imaginary. This ensures that the final output is pure imaginary because $n = 0$ in the last stage of the algorithm. The forward combine equation for ICS, ISCS, and ICSIS sequences is given by equation (2.31) for the lower half-range of


$n$ and $0 \le q \le p/2-1$, with the exception that all sequences $x_{n,l}$ are real. In addition:

$$\tilde{y}_{n,p/2} = \frac{1}{p} \Big\{ x_{n,0} + (-1)^{p/2+1}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l \big[ x_{n,l} - x_{-n,l} \big] \Big\} \tag{2.32}$$

for the lower half-range of $n$. Note that $\tilde{y}_{0,p/2} = 0$ because $x_{0,0} = x_0 = 0$ and $x_{0,p/2} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + i(-1)^{q+1}\, \bar{\tilde{x}}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} - w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \tag{2.33}$$

for the lower half-range of $n$ and $0 \le q \le p/2-1$. Note that $y_{0,q}$ is pure imaginary because $\tilde{x}_{0,0} = \tilde{x}_0 = 0$.

Next, assume that $p$ is odd. The forward combine equation for I sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \tag{2.34}$$

for the lower half-range of $n$ and $0 \le q \le p-1$. The forward combine equation for ICS and ICSIS sequences is given by equation (2.34) for the lower half-range of $n$ and $0 \le q \le (p-1)/2$, with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for ISCS and ISCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} - w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \tag{2.35}$$

for the lower half-range of $n$ and $0 \le q \le (p-3)/2$. In addition:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} (-1)^l \big[ \tilde{x}_{n,l} - \tilde{x}_{-n,l} \big] \Big\} \tag{2.36}$$

for the lower half-range of $n$. Note that $\tilde{y}_{0,(p-1)/2} = 0$.

We now prove Theorem 2.12. First, assume that $p$ is even. The forward combine equation for I sequences is obtained by developing a compact form


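of equation (2.4) which eliminates all redundant data. For this purpose, we will need the following result, which is valid for all I sequences:

$$x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = -\bar{x}_{(l+1)N/p-n} = -\bar{x}_{-n,l+1}$$

Using this result, we obtain:

$$\begin{aligned} y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{-q(p-l-1)}\, x_{n,p-l-1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} - \sum_{l=0}^{p/2-1} w_p^{q(l+1)}\, \bar{x}_{-n,l+1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} - \sum_{l=1}^{p/2} w_p^{lq}\, \bar{x}_{-n,l} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^{q+1}\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \end{aligned}$$

The forward combining of ICS, ISCS, and ICSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2} = \frac{1}{p} \Big\{ x_{n,0} + (-1)^{p/2+1}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l \big[ x_{n,l} - x_{-n,l} \big] \Big\}$$

The forward combine equation for ISCSIS sequences is obtained by substituting equation (2.9) into equation (2.31):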


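$$\begin{aligned} y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^{q+1}\, \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \\ &= \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + i(-1)^{q+1}\, \bar{\tilde{x}}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} - w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\} \end{aligned}$$

Next, assume that $p$ is odd. The forward combine equation for I sequences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data.

$$\begin{aligned} y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{(p-3)/2} w_p^{-q(p-l-1)}\, x_{n,p-l-1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} - \sum_{l=0}^{(p-3)/2} w_p^{q(l+1)}\, \bar{x}_{-n,l+1} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ w_p^{-q(p-1)/2}\, x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} w_p^{-lq}\, x_{n,l} - \sum_{l=1}^{(p-1)/2} w_p^{lq}\, \bar{x}_{-n,l} \Big\} \\ &= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\} \end{aligned}$$

The forward combining of ICS and ICSIS sequences does not require any new equations. The forward combine equation for ISCS and ISCSIS sequences is obtained by substituting equation (2.9) into equation (2.34):

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-lq}\, x_{n,l} - w_p^{lq}\, \bar{x}_{-n,l} \big] \Big\}$$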


$$= \frac{1}{p}\, w_N^{-n(q+1/2)} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} \big[ w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} - w_p^{l(q+1/2)}\, \bar{\tilde{x}}_{-n,l} \big] \Big\}$$

For $q = (p-1)/2$ this reduces to:

$$\tilde{y}_{n,(p-1)/2} = w_{N/p}^{n/2}\, y_{n,(p-1)/2} = \frac{1}{p} \Big\{ \tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} (-1)^l \big[ \tilde{x}_{n,l} - \tilde{x}_{-n,l} \big] \Big\}$$

This completes the proof of Theorem 2.12. The following corollary provides an important special case of this result.

Corollary 2.9 Assume $p = 2$. The forward combine equation for I sequences is:

$$y_{n,0} = (x_{n,0} - \bar{x}_{-n,1})/2, \qquad y_{n,1} = w_N^{-n}\,(x_{n,0} + \bar{x}_{-n,1})/2$$

for the lower half-range of $n$. The forward combine equation for ICS and ISCS sequences is:

$$y_{n,0} = (x_{n,0} - x_{-n,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} + x_{-n,1})/2$$

for the lower half-range of $n$. The forward combine equation for ISCSIS sequences is:

$$y_{n,0} = w_N^{-n/2}\,(\tilde{x}_{n,0} - i\,\bar{\tilde{x}}_{-n,1})/2$$

for the lower half-range of $n$.
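The real-form transform pair of Theorem 2.10 can be illustrated with a direct (slow) evaluation. The sketch below is not the fast algorithm of this section; it assumes the DFT convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$, and the helper names are hypothetical. It stores only the $N/2-1$ values $x_1, \ldots, x_{N/2-1}$ and compares the sine sums with a full complex DFT of the RO extension.

```python
import cmath
import math

def dft(x):
    """Scaled DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def ro_extend(half):
    """Build the full RO sequence of length N from x_1 .. x_{N/2-1}."""
    N = 2 * (len(half) + 1)
    x = [0.0] * N                      # x_0 = x_{N/2} = 0 (D-D boundary values)
    for n, v in enumerate(half, start=1):
        x[n] = v
        x[N - n] = -v                  # odd symmetry x_{N-n} = -x_n
    return x

def sine_dft(half):
    """Im(X_k) for 1 <= k <= N/2-1 from the real form of the DFT."""
    N = 2 * (len(half) + 1)
    return [-sum(2 * half[n - 1] * math.sin(2 * math.pi * k * n / N)
                 for n in range(1, N // 2)) / N
            for k in range(1, N // 2)]

def sine_idft(im_half):
    """x_n for 1 <= n <= N/2-1 from the real form of the IDFT."""
    N = 2 * (len(im_half) + 1)
    return [-sum(2 * im_half[k - 1] * math.sin(2 * math.pi * k * n / N)
                 for k in range(1, N // 2))
            for n in range(1, N // 2)]

half = [1.0, -2.0, 0.5, 3.0]           # N = 10
X = dft(ro_extend(half))               # reference: full complex DFT
im = sine_dft(half)
back = sine_idft(im)
print(max(abs(X[k].imag - im[k - 1]) for k in range(1, 5)))
print(max(abs(a - b) for a, b in zip(half, back)))
```

Both sums have the same form, so a single sine-transform routine with different scaling serves for the DFT and the IDFT.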


2.5 Real Composite Even-Even (RE-E)

In this section, we will be concerned with the following symmetries:

Definition 2.9 A real composite even-even (RE-E) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = x_n, \qquad x_{N/2-n} = x_n$$

Note that an RE-E sequence of length $N$ is also an RE sequence of length $N$. A real conjugate symmetric zero odd term (RCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^k X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.9 If $x_n$ is an RE-E sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an RCSZO sequence of length $N$. If $X_k$ is an RCSZO sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an RE-E sequence of length $N$.

We now prove Lemma 2.9. We will only prove the first assertion. Assume $x_n$ is an RE-E sequence of length $N$, where $N$ is even. Since $x_n$ is also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an RCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 2.7 and the RE-E symmetry of $x_n$ as follows:

$$\begin{aligned} X_k &= \frac{1}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{1}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_{N/2-n} \cos[2\pi k(N/2-n)/N] \Big] \end{aligned}$$


$$\begin{aligned} &= \frac{1}{N}\Big[ x_0 + (-1)^k x_0 + (-1)^k \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{(-1)^k}{N}\Big[ (-1)^k x_0 + x_0 + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{(-1)^k}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \end{aligned}$$

This completes the proof of Lemma 2.9.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for N-NS boundary conditions. Note that if $N = 2(2M+1)$, then an RE-E sequence satisfies N-NS boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = x_1, \qquad x_{M+1} = x_M$$

Theorem 2.13 Let $x_n$ be an RE-E sequence and let $X_k$ be its RCSZO symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DFT is:

$$X_{2k} = \frac{2}{N}\Big[ x_0 + \sum_{n=1}^{M} 2 x_n \cos(4\pi kn/N) \Big]$$

for $0 \le k \le M$. The real form of the IDFT is:

$$x_n = X_0 + \sum_{k=1}^{M} 2 X_{2k} \cos(4\pi kn/N)$$

for $0 \le n \le M$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.13. The result for the DFT follows from Theorem 2.7, the RCSZO symmetry of $X_k$, and the RE-E symmetry of $x_n$ as follows:

$$X_{2k} = \frac{1}{N}\Big[ x_0 + x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(4\pi kn/N) \Big]$$


$$\begin{aligned} &= \frac{1}{N}\Big\{ x_0 + x_{N/2} + \sum_{n=1}^{M} 2 x_n \cos(4\pi kn/N) + \sum_{n=1}^{M} 2 x_{N/2-n} \cos[4\pi k(N/2-n)/N] \Big\} \\ &= \frac{2}{N}\Big[ x_0 + \sum_{n=1}^{M} 2 x_n \cos(4\pi kn/N) \Big] \end{aligned}$$

The result for the IDFT follows immediately from Theorem 2.7 and the RCSZO symmetry of $X_k$. Note that only one fourth of the RE-E sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.13.

A fast, mixed radix algorithm for computing the RE-E symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RE symmetric FFT. Note that an RE-E sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an RCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCSZO sequence $X_k$. We begin developing this algorithm by defining the one new intermediate symmetry involved.

Definition 2.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:

$$X_k = 0$$

for $0 \le k \le N-1$.

The following lemma establishes the relationship between the symmetries which occur in the splittings of the RCSZO sequence $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.10 Let $X_k$ be an RCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is RCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the RCS sequence $X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences have R symmetry as well.
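As an illustration of Theorem 2.13 and Lemma 2.10, the following sketch evaluates the real-form cosine sums directly (without any fast algorithm) and compares them with a full complex DFT. It assumes the convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$; the function names are hypothetical. Only the $M+1$ values $x_0, \ldots, x_M$ are stored, with $N = 2(2M+1)$.

```python
import cmath
import math

def dft(x):
    """Scaled DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def ree_extend(quarter):
    """Build the RE-E sequence of length N = 2(2M+1) from x_0 .. x_M."""
    M = len(quarter) - 1
    N = 2 * (2 * M + 1)
    x = [0.0] * N
    for n, v in enumerate(quarter):
        x[n] = v
        x[N // 2 - n] = v              # x_{N/2-n} = x_n
    for n in range(1, N // 2):
        x[N - n] = x[n]                # x_{N-n} = x_n
    return x

def cos_dft(quarter):
    """X_{2k} for 0 <= k <= M from the real form of the DFT."""
    M = len(quarter) - 1
    N = 2 * (2 * M + 1)
    return [2.0 / N * (quarter[0] + sum(2 * quarter[n] * math.cos(4 * math.pi * k * n / N)
                                        for n in range(1, M + 1)))
            for k in range(M + 1)]

def cos_idft(even_terms):
    """x_n for 0 <= n <= M from the real form of the IDFT."""
    M = len(even_terms) - 1
    N = 2 * (2 * M + 1)
    return [even_terms[0] + sum(2 * even_terms[k] * math.cos(4 * math.pi * k * n / N)
                                for k in range(1, M + 1))
            for n in range(M + 1)]

quarter = [1.0, 2.0, -1.0, 0.5]        # M = 3, N = 14
X = dft(ree_extend(quarter))
even_terms = cos_dft(quarter)
print(max(abs(X[k]) for k in range(1, 14, 2)))          # odd-indexed terms vanish
print(max(abs(X[2 * k] - even_terms[k]) for k in range(4)))
```

The first printed value reflects the Z symmetry of the odd-indexed subsequence $X_{k,1}$; the second compares the stored quarter-length transform with the reference DFT.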


A mixed radix splitting tree diagram for an RCSZO sequence is shown in Figure 2.5. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4 and 2.6, with the addition provided by the following lemma. We omit the proof of this result because it is trivial.

Lemma 2.11 Let $X_k$ be a Z sequence of length $N$. Its IDFT $x_n$ is also a Z sequence of length $N$.

These results show that each symmetry appearing in Figure 2.5 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the RE-E symmetric IFFT, obtained as a special case of that for the RE symmetric IFFT.

Corollary 2.10 Assume $p = 2$. The inverse combine equation for RCS and Z sequences is:

$$x_{n,0} = y_{n,0}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.

We now prove Corollary 2.10. The inverse combine equation for RCS and Z sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the Z symmetry of $\tilde{y}_{n,1}$. Note that the companion equation is not needed because only one fourth of the RE-E sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.10.

The next corollary provides all of the forward combine equations for the RE-E symmetric FFT, obtained as a special case of that for the RE symmetric FFT.

Corollary 2.11 Assume $p = 2$. The forward combine equation for RCS and Z sequences is:

$$y_{n,0} = x_{n,0}, \qquad \tilde{y}_{n,1} = 0$$


[Figure 2.5: Splitting tree for RE-E symmetric FFT]


for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.

We now prove Corollary 2.11. The forward combine equation for RCS and Z sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the RE-E symmetry of $x_n$ as follows:

$$\begin{aligned} y_{n,0} &= (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0} \\ \tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n - x_n)/2 = 0 \end{aligned}$$

This completes the proof of Corollary 2.11.


2.6 Real Composite Even-Odd (RE-O)

In this section, we will be concerned with the following symmetries:

Definition 2.11 A real composite even-odd (RE-O) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = x_n, \qquad x_{N/2-n} = -x_n$$

Note that an RE-O sequence of length $N$ is also an RE sequence of length $N$. A real conjugate symmetric zero even term (RCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^{k+1} X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.12 If $x_n$ is an RE-O sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an RCSZE sequence of length $N$. If $X_k$ is an RCSZE sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an RE-O sequence of length $N$.

We now prove Lemma 2.12. We will only prove the first assertion. Assume $x_n$ is an RE-O sequence of length $N$, where $N$ is even. Since $x_n$ is also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an RCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 2.7 and the RE-O symmetry of $x_n$ as follows:

$$\begin{aligned} X_k &= \frac{1}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{1}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_{N/2-n} \cos[2\pi k(N/2-n)/N] \Big] \end{aligned}$$


$$\begin{aligned} &= \frac{1}{N}\Big[ x_0 + (-1)^{k+1} x_0 + (-1)^{k+1} \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{(-1)^{k+1}}{N}\Big[ (-1)^{k+1} x_0 + x_0 + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \\ &= \frac{(-1)^{k+1}}{N}\Big[ x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos(2\pi kn/N) \Big] \end{aligned}$$

This completes the proof of Lemma 2.12.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for N-D or N-DS boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RE-O sequence satisfies N-D boundary conditions for the computational domain $0 \le n \le N/4-1$. That is:

$$x_{N-1} = x_1, \qquad x_{N/4} = 0$$

Similarly, if $N = 2(2M+1)$, then an RE-O sequence satisfies N-DS boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = x_1, \qquad x_{M+1} = -x_M$$

Theorem 2.14 Let $x_n$ be an RE-O sequence and let $X_k$ be its RCSZE symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DFT is:

$$X_{2k+1} = \frac{2}{N}\Big\{ x_0 + \sum_{n=1}^{N/4-1} 2 x_n \cos[2\pi n(2k+1)/N] \Big\}$$

for $0 \le k \le N/4-1$. The real form of the IDFT is:

$$x_n = \sum_{k=0}^{N/4-1} 2 X_{2k+1} \cos[2\pi n(2k+1)/N]$$


for $0 \le n \le N/4-1$. Next, assume that $N = 2(2M+1)$. The real form of the DFT is:

$$X_{2k+1} = \frac{2}{N}\Big\{ x_0 + \sum_{n=1}^{M} 2 x_n \cos[2\pi n(2k+1)/N] \Big\}$$

for $0 \le k \le M$. The real form of the IDFT is:

$$x_n = (-1)^n X_{N/2} + \sum_{k=0}^{M-1} 2 X_{2k+1} \cos[2\pi n(2k+1)/N]$$

for $0 \le n \le M$.

We now prove Theorem 2.14. We prove the result for the DFT for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 2.7, the RCSZE symmetry of $X_k$, and the RE-O symmetry of $x_n$ as follows:

$$\begin{aligned} X_{2k+1} &= \frac{1}{N}\Big\{ x_0 - x_{N/2} + \sum_{n=1}^{N/2-1} 2 x_n \cos[2\pi n(2k+1)/N] \Big\} \\ &= \frac{1}{N}\Big\{ x_0 - x_{N/2} + \sum_{n=1}^{N/4-1} 2 x_n \cos[2\pi n(2k+1)/N] + \sum_{n=1}^{N/4-1} 2 x_{N/2-n} \cos[2\pi (N/2-n)(2k+1)/N] \Big\} \\ &= \frac{2}{N}\Big\{ x_0 + \sum_{n=1}^{N/4-1} 2 x_n \cos[2\pi n(2k+1)/N] \Big\} \end{aligned}$$

The results for the IDFT follow immediately from Theorem 2.7 and the RCSZE symmetry of $X_k$. Note that only one fourth of the RE-O sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.14.

A fast, mixed radix algorithm for computing the RE-O symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RE symmetric FFT. Note that an RE-O sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an RCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to obtain a reduction to one eighth


in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCSZE sequence $X_k$. This does not introduce any new intermediate symmetries.

The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.13 Let $X_k$ be an RCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSCS symmetric. The symmetries which occur in the splittings of the RSCS sequence $X_{k,1}$ are identical to those in Lemma 2.3, with the addition that all sequences have R symmetry as well.

A mixed radix splitting tree diagram for an RCSZE sequence is shown in Figure 2.6. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.6, and 2.11. These results show that each symmetry appearing in Figure 2.6 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the RE-O symmetric IFFT, obtained as a special case of that for the RE symmetric IFFT.

Corollary 2.12 Assume $p = 2$. The inverse combine equation for Z and RSCS sequences is:

$$x_{n,0} = \tilde{y}_{n,1}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.

We now prove Corollary 2.12. The inverse combine equation for Z and RSCS sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RE-O sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.12.
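The $N = 4M$ case of Theorem 2.14 can be illustrated in the same direct manner. The sketch below assumes the convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$ and uses hypothetical helper names; it stores only $x_0, \ldots, x_{N/4-1}$ and is a slow reference evaluation, not the fast algorithm.

```python
import cmath
import math

def dft(x):
    """Scaled DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def reo_extend(quarter):
    """Build the RE-O sequence of length N = 4M from x_0 .. x_{M-1}."""
    M = len(quarter)
    N = 4 * M
    x = [0.0] * N                      # x_{N/4} stays 0
    for n, v in enumerate(quarter):
        x[n] = v
        x[N // 2 - n] = -v             # x_{N/2-n} = -x_n
    for n in range(1, N // 2):
        x[N - n] = x[n]                # x_{N-n} = x_n
    return x

def odd_cos_dft(quarter):
    """X_{2k+1} for 0 <= k <= N/4-1 from the real form of the DFT."""
    M = len(quarter)
    N = 4 * M
    return [2.0 / N * (quarter[0] + sum(2 * quarter[n]
                                        * math.cos(2 * math.pi * n * (2 * k + 1) / N)
                                        for n in range(1, M)))
            for k in range(M)]

def odd_cos_idft(odd_terms):
    """x_n for 0 <= n <= N/4-1 from the real form of the IDFT."""
    M = len(odd_terms)
    N = 4 * M
    return [sum(2 * odd_terms[k] * math.cos(2 * math.pi * n * (2 * k + 1) / N)
                for k in range(M))
            for n in range(M)]

quarter = [2.0, -1.0, 0.5]             # M = 3, N = 12
X = dft(reo_extend(quarter))
odd_terms = odd_cos_dft(quarter)
print(max(abs(X[k]) for k in range(0, 12, 2)))           # even-indexed terms vanish
print(max(abs(X[2 * k + 1] - odd_terms[k]) for k in range(3)))
```

The vanishing even-indexed terms correspond to the Z symmetric subsequence $X_{k,0}$ of Lemma 2.13.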


[Figure 2.6: Splitting tree for RE-O symmetric FFT]


The next corollary provides all of the forward combine equations for the RE-O symmetric FFT, obtained as a special case of that for the RE symmetric FFT.

Corollary 2.13 Assume $p = 2$. The forward combine equation for Z and RSCS sequences is:

$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.

We now prove Corollary 2.13. The forward combine equation for Z and RSCS sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the RE-O symmetry of $x_n$ as follows:

$$\begin{aligned} y_{n,0} &= (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n - x_n)/2 = 0 \\ \tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0} \end{aligned}$$

This completes the proof of Corollary 2.13.
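Corollary 2.13 can be checked against the definition of the radix-2 splitting. The sketch below assumes that $y_{n,q}$ is the IDFT of the subsequence $X_{k,q} = X_{2k+q}$, that $\tilde{y}_{n,1} = w_{N/2}^{n/2}\, y_{n,1} = w_N^{n}\, y_{n,1}$, and that $w_N = e^{2\pi i/N}$; these conventions are inferred from the surrounding equations, and the helper names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Scaled DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    """Unscaled IDFT (assumed convention): x_n = sum_k X_k exp(+2*pi*i*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N))
            for n in range(N)]

def reo_extend(quarter):
    """Build the RE-O sequence of length N = 4M from x_0 .. x_{M-1}."""
    M = len(quarter)
    N = 4 * M
    x = [0.0] * N
    for n, v in enumerate(quarter):
        x[n] = v
        x[N // 2 - n] = -v             # x_{N/2-n} = -x_n
    for n in range(1, N // 2):
        x[N - n] = x[n]                # x_{N-n} = x_n
    return x

x = reo_extend([2.0, -1.0, 0.5, 1.5])  # N = 16
N = len(x)
X = dft(x)
y0 = idft(X[0::2])                     # IDFT of X_{k,0} (Z symmetric)
y1 = idft(X[1::2])                     # IDFT of X_{k,1} (RSCS symmetric)
# Shifted sequence y~_{n,1} = w_N^n * y_{n,1}
y1_tilde = [cmath.exp(2j * math.pi * n / N) * y1[n] for n in range(N // 2)]

print(max(abs(v) for v in y0))                                   # y_{n,0} = 0
print(max(abs(y1_tilde[n] - x[n]) for n in range(N // 2)))       # y~_{n,1} = x_{n,0}
```

Under these assumptions the forward combine stage for an RE-O sequence with $p = 2$ requires no arithmetic: the stored values of $x_n$ are reinterpreted as $\tilde{y}_{n,1}$.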


2.7 Real Composite Odd-Even (RO-E)

In this section, we will be concerned with the following symmetries:

Definition 2.12 A real composite odd-even (RO-E) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n, \qquad x_{N/2-n} = x_n$$

Note that an RO-E sequence of length $N$ is also an RO sequence of length $N$. An imaginary conjugate symmetric zero even term (ICSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^{k+1} X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.14 If $x_n$ is an RO-E sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an ICSZE sequence of length $N$. If $X_k$ is an ICSZE sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an RO-E sequence of length $N$.

We now prove Lemma 2.14. We will only prove the first assertion. Assume $x_n$ is an RO-E sequence of length $N$, where $N$ is even. Since $x_n$ is also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an ICS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ICSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 2.10 and the RO-E symmetry of $x_n$ as follows:

$$\begin{aligned} X_k &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2 x_n \sin(2\pi kn/N) \\ &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2 x_{N/2-n} \sin[2\pi k(N/2-n)/N] \end{aligned}$$


$$\phantom{X_k} = (-1)^{k+1}\Big[-\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Big] = (-1)^{k+1} X_k$$
This completes the proof of Lemma 2.14.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for D-N or D-NS boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RO-E sequence satisfies D-N boundary conditions for the computational domain $1 \le n \le N/4$. That is:
$$x_0 = 0, \qquad x_{N/4-1} = x_{N/4+1}$$
Similarly, if $N = 2(2M+1)$, then an RO-E sequence satisfies D-NS boundary conditions for the computational domain $1 \le n \le M$. That is:
$$x_0 = 0, \qquad x_M = x_{M+1}$$

Theorem 2.15 Let $x_n$ be an RO-E sequence and let $X_k$ be its ICSZE symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DFT is:
$$\mathrm{Im}(X_{2k-1}) = -\frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\}$$
for $1 \le k \le N/4$. The real form of the IDFT is:
$$x_n = -\sum_{k=1}^{N/4} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N]$$
for $1 \le n \le N/4$. Next, assume that $N = 2(2M+1)$. The real form of the DFT is:
$$\mathrm{Im}(X_{2k-1}) = -\frac{2}{N}\sum_{n=1}^{M} 2x_n \sin[2\pi n(2k-1)/N]$$


for $1 \le k \le M$. The real form of the IDFT is:
$$x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N]$$
for $1 \le n \le M$.

We now prove Theorem 2.15. We prove the result for the DFT for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 2.10, the ICSZE symmetry of $X_k$, and the RO-E symmetry of $x_n$ as follows:
$$\mathrm{Im}(X_{2k-1}) = -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n \sin[2\pi n(2k-1)/N]$$
$$\phantom{\mathrm{Im}(X_{2k-1})} = -\frac{1}{N}\Big\{(-1)^{k+1} 2x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N] + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \sin[2\pi (N/2-n)(2k-1)/N]\Big\}$$
$$\phantom{\mathrm{Im}(X_{2k-1})} = -\frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\}$$
The results for the IDFT follow immediately from Theorem 2.10 and the ICSZE symmetry of $X_k$. Note that only one fourth of the RO-E sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.15.

A fast, mixed radix algorithm for computing the RO-E symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RO symmetric FFT. Note that an RO-E sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICSZE sequence $X_k$. This does not introduce any new intermediate symmetries.

The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.


Lemma 2.15 Let $X_k$ be an ICSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is ISCS symmetric. The symmetries which occur in the splittings of the ISCS sequence $X_{k,1}$ are identical to those in Lemma 2.3, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an ICSZE sequence is shown in Figure 2.7. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11. These results show that each symmetry appearing in Figure 2.7 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RO-E symmetric IFFT, obtained as a special case of that for the RO symmetric IFFT.

Corollary 2.14 Assume $p = 2$. The inverse combine equation for Z and ISCS sequences is:
$$x_{n,0} = \tilde{y}_{n,1}$$
for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.11 for arbitrary factors $p$.

We now prove Corollary 2.14. The inverse combine equation for Z and ISCS sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RO-E sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.14.

The next corollary provides all of the forward combine equations for the RO-E symmetric FFT, obtained as a special case of that for the RO symmetric FFT.

Corollary 2.15 Assume $p = 2$.
The forward combine equation for Z and ISCS sequences is:
$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$


Figure 2.7: Splitting tree for RO-E symmetric FFT (root node: ICSZE)


for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.

We now prove Corollary 2.15. The forward combine equation for Z and ISCS sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the RO-E symmetry of $x_n$ as follows:
$$y_{n,0} = (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n - x_n)/2 = 0$$
$$\tilde{y}_{n,1} = (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0}$$
This completes the proof of Corollary 2.15.
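The real forms in Theorem 2.15 can be checked numerically. The following is a minimal sketch for the case N = 4M, assuming the convention X_k = (1/N) Σ x_n exp(-2πikn/N) used throughout this chapter; the function names (`roe_forward`, `roe_inverse`, `roe_full_sequence`, `dft`) are illustrative and do not come from the thesis. It evaluates the sums directly in O(M²) operations and is not the fast algorithm.

```python
import cmath
import math


def roe_forward(x_quarter):
    """Im(X_{2k-1}) for k = 1..N/4, given x_1..x_{N/4} (case N = 4M)."""
    m = len(x_quarter)               # M = N/4
    n_len = 4 * m                    # N
    x = [0.0] + list(x_quarter)      # x[0] = 0 is the Dirichlet end
    out = []
    for k in range(1, m + 1):
        s = (-1) ** (k + 1) * x[m]   # contribution of the midpoint n = N/4
        for n in range(1, m):
            s += 2.0 * x[n] * math.sin(2.0 * math.pi * n * (2 * k - 1) / n_len)
        out.append(-2.0 / n_len * s)
    return out


def roe_inverse(im_x_odd):
    """x_1..x_{N/4} from Im(X_{2k-1}), k = 1..N/4 (case N = 4M)."""
    m = len(im_x_odd)
    n_len = 4 * m
    out = []
    for n in range(1, m + 1):
        s = 0.0
        for k in range(1, m + 1):
            s += 2.0 * im_x_odd[k - 1] * math.sin(2.0 * math.pi * n * (2 * k - 1) / n_len)
        out.append(-s)
    return out


def roe_full_sequence(x_quarter):
    """Extend x_1..x_{N/4} to a full RO-E sequence of length N = 4M."""
    m = len(x_quarter)
    n_len = 4 * m
    x = [0.0] * n_len
    for n in range(1, m + 1):
        x[n] = x_quarter[n - 1]
    for n in range(1, m):            # even about N/4: x_{N/2-n} = x_n
        x[2 * m - n] = x[n]
    for n in range(1, 2 * m):        # odd about 0: x_{N-n} = -x_n
        x[n_len - n] = -x[n]
    return x


def dft(x):
    """Direct DFT with the 1/N factor on the forward transform."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len)) / n_len
            for k in range(n_len)]
```

Only the N/4 values x_1, ..., x_{N/4} are passed in, which mirrors the storage count quoted in the text; `roe_full_sequence` exists only so the result can be compared against a full-length DFT.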


2.8 Real Composite Odd-Odd (RO-O)

In this section, we will be concerned with the following symmetries:

Definition 2.13 A real composite odd-odd (RO-O) sequence $x_n$ of length $N$, where $N$ is even, is defined by:
$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n, \qquad x_{N/2-n} = -x_n$$
Note that an RO-O sequence of length $N$ is also an RO sequence of length $N$. An imaginary conjugate symmetric zero odd term (ICSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:
$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^{k} X_k$$
The following lemma establishes the relationship between these symmetries.

Lemma 2.16 If $x_n$ is an RO-O sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an ICSZO sequence of length $N$. If $X_k$ is an ICSZO sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an RO-O sequence of length $N$.

We now prove Lemma 2.16. We will only prove the first assertion. Assume $x_n$ is an RO-O sequence of length $N$, where $N$ is even. Since $x_n$ is also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an ICS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ICSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 2.10 and the RO-O symmetry of $x_n$ as follows:
$$X_k = -\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)$$
$$\phantom{X_k} = -\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N]$$


$$\phantom{X_k} = (-1)^{k}\Big[-\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Big] = (-1)^{k} X_k$$
This completes the proof of Lemma 2.16.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for D-DS boundary conditions. Note that if $N = 2(2M+1)$, then an RO-O sequence satisfies D-DS boundary conditions for the computational domain $1 \le n \le M$. That is:
$$x_0 = 0, \qquad x_{M+1} = -x_M$$

Theorem 2.16 Let $x_n$ be an RO-O sequence and let $X_k$ be its ICSZO symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DFT is:
$$\mathrm{Im}(X_{2k}) = -\frac{2}{N}\sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)$$
for $1 \le k \le M$. The real form of the IDFT is:
$$x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin(4\pi kn/N)$$
for $1 \le n \le M$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.16. The result for the DFT follows from Theorem 2.10, the ICSZO symmetry of $X_k$, and the RO-O symmetry of $x_n$ as follows:
$$\mathrm{Im}(X_{2k}) = -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(4\pi kn/N)$$
$$\phantom{\mathrm{Im}(X_{2k})} = -\frac{1}{N}\Big\{\sum_{n=1}^{M} 2x_n \sin(4\pi kn/N) + \sum_{n=1}^{M} 2x_{N/2-n} \sin[4\pi k(N/2-n)/N]\Big\}$$


$$\phantom{\mathrm{Im}(X_{2k})} = -\frac{2}{N}\sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)$$
The result for the IDFT follows immediately from Theorem 2.10 and the ICSZO symmetry of $X_k$. Note that only one fourth of the RO-O sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.16.

A fast, mixed radix algorithm for computing the RO-O symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RO symmetric FFT. Note that an RO-O sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICSZO sequence $X_k$. This does not introduce any new intermediate symmetries.

The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.17 Let $X_k$ be an ICSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is ICS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the ICS sequence $X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an ICSZO sequence is shown in Figure 2.8. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11. These results show that each symmetry appearing in Figure 2.8 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RO-O symmetric IFFT, obtained as a special case of that for the RO symmetric IFFT.


Figure 2.8: Splitting tree for RO-O symmetric FFT (root node: ICSZO)


Corollary 2.16 Assume $p = 2$. The inverse combine equation for ICS and Z sequences is:
$$x_{n,0} = y_{n,0}$$
for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.11 for arbitrary factors $p$.

We now prove Corollary 2.16. The inverse combine equation for ICS and Z sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the RO-O sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.16.

The next corollary provides all of the forward combine equations for the RO-O symmetric FFT, obtained as a special case of that for the RO symmetric FFT.

Corollary 2.17 Assume $p = 2$. The forward combine equation for ICS and Z sequences is:
$$y_{n,0} = x_{n,0}, \qquad y_{n,1} = 0$$
for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.

We now prove Corollary 2.17. The forward combine equation for ICS and Z sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the RO-O symmetry of $x_n$ as follows:
$$y_{n,0} = (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0}$$
$$y_{n,1} = (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n - x_n)/2 = 0$$
This completes the proof of Corollary 2.17.
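The remark in Theorem 2.16 that the DFT and IDFT are "identical except for scaling" can be made concrete: both are multiples of the same M-by-M sine matrix S with entries sin(4πkn/N), where N = 2(2M + 1), and S applied twice gives (N/8) times the identity. The sketch below is a direct O(M²) check under the convention X_k = (1/N) Σ x_n exp(-2πikn/N); the function names are illustrative, not from the thesis.

```python
import math


def roo_sine_sum(v):
    """Apply the M-by-M matrix S[k][n] = sin(4*pi*k*n/N), N = 2(2M+1), k, n = 1..M."""
    m = len(v)
    n_len = 2 * (2 * m + 1)
    return [sum(v[n - 1] * math.sin(4.0 * math.pi * k * n / n_len) for n in range(1, m + 1))
            for k in range(1, m + 1)]


def roo_forward(x_part):
    """Im(X_{2k}), k = 1..M, from x_1..x_M: Im(X_{2k}) = -(2/N) * sum of 2 x_n sin(...)."""
    n_len = 2 * (2 * len(x_part) + 1)
    return [-4.0 / n_len * s for s in roo_sine_sum(x_part)]


def roo_inverse(im_x_even):
    """x_1..x_M from Im(X_{2k}), k = 1..M: x_n = -sum of 2 Im(X_{2k}) sin(...)."""
    return [-2.0 * s for s in roo_sine_sum(im_x_even)]
```

Because the forward and inverse maps differ only by the constants -4/N and -2, a single routine for S serves both directions.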


2.9 Real Staggered Even (RSE)

In this section, we will be concerned with the following symmetries:

Definition 2.14 A real staggered even (RSE) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n, \qquad x_{N-n-1} = x_n \tag{2.37}$$
An $\omega$-even ($\omega$E) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k} = \bar{X}_k, \qquad X_k = \omega_N^{k}\bar{X}_k \tag{2.38}$$
The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.18 If $x_n$ is an RSE sequence of length $N$, then its DFT $X_k$ is an $\omega$E sequence of length $N$. If $X_k$ is an $\omega$E sequence of length $N$, then its IDFT $x_n$ is an RSE sequence of length $N$.

The next lemma will be needed to obtain the real form of the DFT and IDFT.

Lemma 2.19 Let $X_k$ be an $\omega$E sequence of length $N$, and let $\tilde{X}_k$ denote the magnitude of $X_k$. Then:
$$X_k = \omega_N^{k/2}\tilde{X}_k \tag{2.39}$$
$$\tilde{X}_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.40}$$
$$x_n = \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.41}$$
$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.42}$$
$$\tilde{X}_{N-k} = -\tilde{X}_k \tag{2.43}$$


We now prove Lemma 2.19. We express $X_k$ in polar form as follows:
$$X_k = \tilde{X}_k e^{i\theta}$$
Substituting this into equation (2.38) and solving for $\theta$ leads to equation (2.39). Combining equations (2.1) and (2.39) leads to equation (2.40), while combining equations (2.2) and (2.39) leads to equation (2.41). Equation (2.42) is obtained from equation (2.40) as follows:
$$\tilde{X}_{N+k} = \frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} = -\frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} = -\tilde{X}_k$$
Equation (2.43) is obtained by combining equations (2.37) and (2.39) as follows:
$$\tilde{X}_{N-k} = \omega_N^{-(N-k)/2} X_{N-k} = -\omega_N^{k/2}\bar{X}_k = -\tilde{X}_k$$
This completes the proof of Lemma 2.19.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the DFT is the eigenvector expansion required by the Fourier analysis method for N-D boundary conditions. Note that if $N$ is even, then an $\omega$E sequence (represented by $\tilde{X}_k$) satisfies N-D boundary conditions for the computational domain $0 \le k \le N/2-1$. That is:
$$\tilde{X}_{-1} = -\tilde{X}_{N-1} = \tilde{X}_1, \qquad \tilde{X}_{N/2} = 0$$
Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for NS-NS boundary conditions. Note that if $N$ is even, then an RSE sequence satisfies NS-NS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:
$$x_{-1} = x_{N-1} = x_0, \qquad x_{N/2-1} = x_{N/2}$$


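Theorem 2.17 Let $x_n$ be an RSE sequence and let $X_k$ be its $\omega$E symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
$$\tilde{X}_k = \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$
for $0 \le k \le N/2-1$. The real form of the IDFT is:
$$x_n = \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]$$
for $0 \le n \le N/2-1$.

We now prove Theorem 2.17. The result for the DFT follows from equation (2.40) and the RSE symmetry of $x_n$ as follows:
$$\tilde{X}_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)}$$
$$\phantom{\tilde{X}_k} = \frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)}\Big\}$$
$$\phantom{\tilde{X}_k} = \frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)}\Big\}$$
$$\phantom{\tilde{X}_k} = \frac{2}{N}\,\mathrm{Re}\Big[\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}\Big]$$
$$\phantom{\tilde{X}_k} = \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$
Note that only half of the $\omega$E sequence $X_k$ needs to be specified. The result for the IDFT follows from equations (2.41) and (2.43) as follows:
$$x_n = \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)}$$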


$$\phantom{x_n} = \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)}$$
$$\phantom{x_n} = \tilde{X}_0 + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}\Big]$$
$$\phantom{x_n} = \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]$$
Note that only half of the RSE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.17.

We now develop a fast, mixed radix algorithm for computing the RSE symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RSE sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. However, an $\omega$E sequence of length $N$ requires $N$ real storage locations. Thus, in order to obtain an in-place algorithm, we must use a more compact representation of an $\omega$E sequence. Such a compact representation is provided by the quantities $\tilde{X}_k$ in Lemma 2.19. Using this representation, an $\omega$E sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences.

The procedure for developing this algorithm will be different from the other algorithms in this chapter for the following reason. Equation (2.40) shows that when we replace the complex quantity $X_k$ by the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we call the discrete staggered transform (DST). Equation (2.41) provides the inverse discrete staggered transform (IDST). Note that the DST is a constant multiple of the DFT, whereas the IDST is not related to the IDFT in any simple way. We have found that the applications of the DST include the boundary conditions considered in this section, as well as others. Thus, we have devoted all of Chapter 3 to the development of fast, mixed radix algorithms for computing the DST and IDST. These algorithms are called the fast staggered transform (FST) and inverse fast staggered transform (IFST). The FST for RSE sequences is developed in Section 3.3.
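As a numerical illustration of Lemma 2.19 and Theorem 2.17, the sketch below computes the real quantities from half of an RSE sequence, inverts them, and compares against a full complex DFT using the relation (2.39), X_k = ω_N^{k/2} X̃_k with ω_N = exp(2πi/N). It assumes the forward DFT carries the 1/N factor; the function names are illustrative and the sums are evaluated directly rather than by a fast algorithm.

```python
import cmath
import math


def rse_forward(x_half):
    """Real DST coefficients X~_k, k = 0..N/2-1, from x_0..x_{N/2-1}."""
    h = len(x_half)
    n_len = 2 * h
    return [sum(2.0 * x_half[n] * math.cos(math.pi * k * (2 * n + 1) / n_len)
                for n in range(h)) / n_len
            for k in range(h)]


def rse_inverse(xt_half):
    """x_0..x_{N/2-1} from X~_0..X~_{N/2-1}."""
    h = len(xt_half)
    n_len = 2 * h
    return [xt_half[0] + sum(2.0 * xt_half[k] * math.cos(math.pi * k * (2 * n + 1) / n_len)
                             for k in range(1, h))
            for n in range(h)]


def dft(x):
    """Direct DFT with the 1/N factor on the forward transform."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len)) / n_len
            for k in range(n_len)]
```

The full RSE sequence is the half sequence followed by its mirror image, since x_{N-n-1} = x_n; only the first half is ever stored by `rse_forward` and `rse_inverse`.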


2.10 Real Staggered Odd (RSO)

In this section, we will be concerned with the following symmetries:

Definition 2.15 A real staggered odd (RSO) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n \tag{2.44}$$
An $\omega$-odd ($\omega$O) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k} = \bar{X}_k, \qquad X_k = -\omega_N^{k}\bar{X}_k \tag{2.45}$$
The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.20 If $x_n$ is an RSO sequence of length $N$, then its DFT $X_k$ is an $\omega$O sequence of length $N$. If $X_k$ is an $\omega$O sequence of length $N$, then its IDFT $x_n$ is an RSO sequence of length $N$.

The next lemma will be needed to obtain the real form of the DFT and IDFT.

Lemma 2.21 Let $X_k$ be an $\omega$O sequence of length $N$, and let $\tilde{X}_k$ denote the magnitude of $X_k$. Then:
$$X_k = i\,\omega_N^{k/2}\tilde{X}_k \tag{2.46}$$
$$\tilde{X}_k = -\frac{i}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.47}$$
$$x_n = i\sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.48}$$
$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.49}$$
$$\tilde{X}_{N-k} = \tilde{X}_k \tag{2.50}$$


We now prove Lemma 2.21. We express $X_k$ in polar form as follows:
$$X_k = \tilde{X}_k e^{i\theta}$$
Substituting this into equation (2.45) and solving for $\theta$ leads to equation (2.46). Combining equations (2.1) and (2.46) leads to equation (2.47), while combining equations (2.2) and (2.46) leads to equation (2.48). Equation (2.49) is obtained from equation (2.47) as follows:
$$\tilde{X}_{N+k} = -\frac{i}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} = \frac{i}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} = -\tilde{X}_k$$
Equation (2.50) is obtained by combining equations (2.44) and (2.46) as follows:
$$\tilde{X}_{N-k} = -i\,\omega_N^{-(N-k)/2} X_{N-k} = i\,\omega_N^{k/2}\bar{X}_k = \tilde{X}_k$$
This completes the proof of Lemma 2.21.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the DFT is the eigenvector expansion required by the Fourier analysis method for D-N boundary conditions. Note that if $N$ is even, then an $\omega$O sequence (represented by $\tilde{X}_k$) satisfies D-N boundary conditions for the computational domain $1 \le k \le N/2$. That is:
$$\tilde{X}_0 = 0, \qquad \tilde{X}_{N/2-1} = \tilde{X}_{N/2+1}$$
Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DS-DS boundary conditions. Note that if $N$ is even, then an RSO sequence satisfies DS-DS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:
$$x_{-1} = x_{N-1} = -x_0, \qquad x_{N/2-1} = -x_{N/2}$$


Theorem 2.18 Let $x_n$ be an RSO sequence and let $X_k$ be its $\omega$O symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:
$$\tilde{X}_k = -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$
for $1 \le k \le N/2$. The real form of the IDFT is:
$$x_n = (-1)^{n+1}\tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]$$
for $0 \le n \le N/2-1$.

We now prove Theorem 2.18. The result for the DFT follows from equation (2.47) and the RSO symmetry of $x_n$ as follows:
$$\tilde{X}_k = -\frac{i}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)}$$
$$\phantom{\tilde{X}_k} = -\frac{i}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)}\Big\}$$
$$\phantom{\tilde{X}_k} = -\frac{i}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} - \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)}\Big\}$$
$$\phantom{\tilde{X}_k} = \frac{2}{N}\,\mathrm{Im}\Big[\sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)}\Big]$$
$$\phantom{\tilde{X}_k} = -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$
Note that only half of the $\omega$O sequence $X_k$ needs to be specified. The result for the IDFT follows from equations (2.48) and (2.50) as follows:
$$x_n = i\sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)}$$


$$\phantom{x_n} = i\Big\{i(-1)^{n}\tilde{X}_{N/2} + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} - \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)}\Big\}$$
$$\phantom{x_n} = i\Big\{i(-1)^{n}\tilde{X}_{N/2} + 2i\,\mathrm{Im}\Big[\sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)}\Big]\Big\}$$
$$\phantom{x_n} = (-1)^{n+1}\tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]$$
Note that only half of the RSO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.18.

We now develop a fast, mixed radix algorithm for computing the RSO symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RSO sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. However, an $\omega$O sequence of length $N$ requires $N$ real storage locations. Thus, in order to obtain an in-place algorithm, we must use a more compact representation of an $\omega$O sequence. Such a compact representation is provided by the quantities $\tilde{X}_k$ in Lemma 2.21. Using this representation, an $\omega$O sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences.

The procedure for developing this algorithm will be different from the other algorithms in this chapter for the following reason. Equation (2.47) shows that when we replace the complex quantity $X_k$ by the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we call the discrete staggered transform (DST). Equation (2.48) provides the inverse discrete staggered transform (IDST). Note that the DST is a constant multiple of the DFT, whereas the IDST is not related to the IDFT in any simple way. We have found that the applications of the DST include the boundary conditions considered in this section, as well as others. Thus, we have devoted all of Chapter 3 to the development of fast, mixed radix algorithms for computing the DST and IDST. These algorithms are called the fast staggered transform (FST) and inverse fast staggered transform (IFST). The FST for RSO sequences is developed in Section 3.4.
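The index ranges in Theorem 2.18 differ from the RSE case: the coefficients run over 1 ≤ k ≤ N/2, and the k = N/2 term enters the inverse with the alternating sign (-1)^(n+1). A small round-trip sketch, with illustrative function names and direct O(N²) sums, shows how the two formulas pair up:

```python
import math


def rso_forward(x_half):
    """Real coefficients X~_k, k = 1..N/2, from x_0..x_{N/2-1}."""
    h = len(x_half)
    n_len = 2 * h
    return [-sum(2.0 * x_half[n] * math.sin(math.pi * k * (2 * n + 1) / n_len)
                 for n in range(h)) / n_len
            for k in range(1, h + 1)]


def rso_inverse(xt):
    """x_0..x_{N/2-1} from X~_1..X~_{N/2} (xt[k-1] holds X~_k)."""
    h = len(xt)
    n_len = 2 * h
    out = []
    for n in range(h):
        s = (-1) ** (n + 1) * xt[h - 1]          # the k = N/2 term
        for k in range(1, h):
            s -= 2.0 * xt[k - 1] * math.sin(math.pi * k * (2 * n + 1) / n_len)
        out.append(s)
    return out
```

For N = 2 and x_0 = a, the forward formula gives X̃_1 = -a and the inverse returns (-1)·(-a) = a, which is a quick hand check of the signs.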


2.11 Tables of Symmetries


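Table 2.1: Symmetries in the IDFT

| Acronym | Symmetry | Sequence | DFT |
|---|---|---|---|
| | Periodic | $x_{N+n} = x_n$ | $X_{N+k} = X_k$ |
| R | Real | $\bar{x}_n = x_n$ | $X_{N-k} = \bar{X}_k$ |
| RE | Real Even | $\bar{x}_n = x_n$; $x_{N-n} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$ |
| RO | Real Odd | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$ |
| RE-E | Real Composite Even-Even ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k} X_k$ |
| RE-O | Real Composite Even-Odd ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = -x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ |
| RO-E | Real Composite Odd-Even ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ |
| RO-O | Real Composite Odd-Odd ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k} X_k$ |
| RSE | Real Staggered Even | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$ | $X_{N-k} = \bar{X}_k$; $X_k = \omega_N^{k}\bar{X}_k$ |
| RSO | Real Staggered Odd | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$ | $X_{N-k} = \bar{X}_k$; $X_k = -\omega_N^{k}\bar{X}_k$ |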


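Table 2.2: Symmetries in the DFT

| Acronym | Symmetry | Sequence | IDFT |
|---|---|---|---|
| | Periodic | $X_{N+k} = X_k$ | $x_{N+n} = x_n$ |
| CS | Conjugate Symmetric | $X_{N-k} = \bar{X}_k$ | $\bar{x}_n = x_n$ |
| SCS | Staggered Conjugate Symmetric | $X_{N-k-1} = \bar{X}_k$ | $x_n = \omega_N^{-n}\bar{x}_n$; $x_n = \omega_N^{-n/2}\tilde{x}_n$ |
| CSIS | CS Induced Intersequence Symmetry | $X_{k,p-q} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q} = \omega_{N/p}^{-n}\bar{y}_{n,q}$ |
| SCSIS | SCS Induced Intersequence Symmetry | $X_{k,p-q-1} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q-1} = \omega_{N/p}^{-n}\bar{y}_{n,q}$ |
| R | Real | $\bar{X}_k = X_k$ | $x_{N-n} = \bar{x}_n$ |
| I | Imaginary | $\bar{X}_k = -X_k$ | $x_{N-n} = -\bar{x}_n$ |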


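Table 2.2: (contd.)

| Acronym | Symmetry | Sequence | IDFT |
|---|---|---|---|
| RCSZO | RCS & Zero Odd Terms ($N$ even) | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = x_n$ |
| RCSZE | RCS & Zero Even Terms ($N$ even) | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = -x_n$ |
| ICSZE | ICS & Zero Even Terms ($N$ even) | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = x_n$ |
| ICSZO | ICS & Zero Odd Terms ($N$ even) | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = -x_n$ |
| Z | Zero | $X_k = 0$ | $x_n = 0$ |
| $\omega$E | $\omega$-Even | $X_{N-k} = \bar{X}_k$; $X_k = \omega_N^{k}\bar{X}_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$ |
| $\omega$O | $\omega$-Odd | $X_{N-k} = \bar{X}_k$; $X_k = -\omega_N^{k}\bar{X}_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$ |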


Chapter 3 Fast Staggered Transforms

3.1 Complex (C)

We begin by defining the fast staggered transform, and establishing notation which will be used throughout.

Definition 3.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the forward discrete staggered transform (DST) is defined by:
$$X_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{3.1}$$
for $0 \le k \le N-1$, where:
$$\omega_N = e^{2\pi i/N}$$
For convenience, we will often suppress the constant $1/N$.

The following theorem provides the inverse discrete staggered transform (IDST).

Theorem 3.1 A C sequence $x_n$ may be recovered from its DST $X_k$ by the inverse discrete staggered transform (IDST), which is given by:
$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} \tag{3.2}$$
for $0 \le n \le N-1$.

We now prove Theorem 3.1 using Lemma 2.1 as follows:
$$\sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} = \sum_{k=0}^{N-1}\Big[\frac{1}{N}\sum_{j=0}^{N-1} x_j \omega_N^{-k(j+1/2)}\Big]\omega_N^{k(n+1/2)}$$
$$= \frac{1}{N}\sum_{j=0}^{N-1} x_j\Big[\sum_{k=0}^{N-1} \omega_N^{k(n-j)}\Big] = \frac{1}{N}\sum_{j=0}^{N-1} x_j\,[N\delta_n(j)] = x_n$$
This completes the proof of Theorem 3.1.

By Definition 3.1, the sequences $x_n$ and $X_k$ are of length $N$. These sequences can be extended to all integral values of $n$ and $k$ using the periodicity properties provided by the following corollary. Carefully note the unusual periodicity property satisfied by $X_k$.

Corollary 3.1 Equations (3.1) and (3.2) imply that the sequences $x_n$ and $X_k$ may be extended periodically to all integral values of $n$ and $k$ by:
$$x_{N+n} = x_n, \qquad X_{N+k} = -X_k$$
We will refer to the periodicity property of $X_k$ as odd periodicity.

We will develop fast algorithms for computing the DST and IDST which are based on a variant of the Cooley-Tukey fast Fourier transform (FFT). Following the general approach in [1], we will develop algorithms for the IDST given $X_k$ in bit-reversed order. Inverting these yields algorithms for the DST given $x_n$ in natural order. We begin by defining notation which will be needed in the development of these algorithms.

Definition 3.2 Given a C sequence $X_k$ of length $N$, and a factor $p$ of $N$, we define a splitting of $X_k$ consisting of the following $p$ subsequences, each of length $N/p$:
$$X_{k,q} = X_{pk+q}$$
for $0 \le k \le N/p-1$, $0 \le q \le p-1$. We denote the IDST of these by $y_{n,q}$. That is:
$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\,\omega_{N/p}^{k(n+1/2)}$$


for $0 \le n \le N/p-1$, $0 \le q \le p-1$. Given a C sequence $x_n$ of length $N$, and a factor $p$ of $N$, we define the following $p$ subsequences, each of length $N/p$:
$$x_{n,l} = x_{lN/p+n}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

The inverse fast staggered transform (IFST) is based on the principle of computing the quantities $y_{n,q}$, and then combining these in the appropriate fashion to obtain $x_{n,l}$. The precise equation for performing this combining operation is provided by the next theorem.

Theorem 3.2 The inverse combine equation for C sequences is:
$$x_{n,l} = \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{q(n+1/2)}\,y_{n,q} \tag{3.3}$$
for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 3.2.
$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} = \sum_{q=0}^{p-1}\sum_{k=0}^{N/p-1} X_{pk+q}\,\omega_N^{(pk+q)(n+1/2)}$$
$$= \sum_{q=0}^{p-1} \omega_N^{q(n+1/2)}\sum_{k=0}^{N/p-1} X_{k,q}\,\omega_{N/p}^{k(n+1/2)} = \sum_{q=0}^{p-1} \omega_N^{q(n+1/2)}\,y_{n,q}$$
In terms of the subsequence notation defined previously, this result is:
$$x_{n,l} = x_{lN/p+n} = \sum_{q=0}^{p-1} \omega_N^{q(lN/p+n+1/2)}\,y_{lN/p+n,q} = \sum_{q=0}^{p-1} \omega_p^{lq}\,\omega_N^{q(n+1/2)}\,y_{n,q}$$


This completes the proof of Theorem 3.2. The following corollary provides an important special case of this result.

Corollary 3.2 Assume $p = 2$. The inverse combine equation for C sequences is:
$$x_{n,0} = y_{n,0} + \omega_N^{n+1/2}\,y_{n,1}$$
$$x_{n,1} = y_{n,0} - \omega_N^{n+1/2}\,y_{n,1}$$
for $0 \le n \le N/2-1$.

We now begin the development of the FST algorithm. We will obtain the forward combine equation for the FST by inverting the inverse combine equation. For this, we will use the "orthogonality property" provided by Lemma 2.1. The result is summarized in the following theorem.

Theorem 3.3 The forward combine equation for C sequences is:
$$y_{n,q} = \frac{1}{p}\,\omega_N^{-q(n+1/2)}\sum_{l=0}^{p-1} \omega_p^{-lq}\,x_{n,l} \tag{3.4}$$
for $0 \le n \le N/p-1$, $0 \le q \le p-1$.

We now prove Theorem 3.3.
$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\,\omega_{N/p}^{k(n+1/2)} = \sum_{k=0}^{N/p-1} X_{pk+q}\,\omega_{N/p}^{k(n+1/2)}$$
$$= \sum_{k=0}^{N/p-1}\Big[\frac{1}{N}\sum_{j=0}^{N-1} x_j \omega_N^{-(pk+q)(j+1/2)}\Big]\omega_{N/p}^{k(n+1/2)}$$
$$= \frac{1}{N}\sum_{j=0}^{N-1} x_j \omega_N^{-q(j+1/2)}\Big[\sum_{k=0}^{N/p-1} \omega_{N/p}^{k(n-j)}\Big]$$
$$= \frac{1}{p}\sum_{l=0}^{p-1} x_{lN/p+n}\,\omega_N^{-q(lN/p+n+1/2)} = \frac{1}{p}\,\omega_N^{-q(n+1/2)}\sum_{l=0}^{p-1} \omega_p^{-lq}\,x_{n,l}$$


This completes the proof of Theorem 3.3. The following corollary provides an important special case of this result.

Corollary 3.3 Assume $p = 2$. The forward combine equation for C sequences is:
$$y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$y_{n,1} = \omega_N^{-(n+1/2)}(x_{n,0} - x_{n,1})/2$$
for $0 \le n \le N/2-1$.
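The p = 2 combine equations of Corollaries 3.2 and 3.3 are enough to build a recursive radix-2 transform pair when N is a power of two. The sketch below is only an illustration under simplifying assumptions: it recurses on natural-order subsequences with Python lists (rather than the in-place, bit-reversed, mixed radix organization developed in the thesis), keeps the factor 1/N on the forward transform, and uses illustrative names (`omega`, `dst_direct`, `idst_direct`, `fst`, `ifst`). The direct O(N²) transforms from Definition 3.1 and Theorem 3.1 are included as a reference.

```python
import cmath


def omega(n_len, exponent):
    """omega_N ** exponent with omega_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * exponent / n_len)


def dst_direct(x):
    """Definition 3.1: X_k = (1/N) * sum_n x_n * omega_N^(-k(n+1/2))."""
    n_len = len(x)
    return [sum(x[n] * omega(n_len, -k * (n + 0.5)) for n in range(n_len)) / n_len
            for k in range(n_len)]


def idst_direct(big_x):
    """Theorem 3.1: x_n = sum_k X_k * omega_N^(k(n+1/2))."""
    n_len = len(big_x)
    return [sum(big_x[k] * omega(n_len, k * (n + 0.5)) for k in range(n_len))
            for n in range(n_len)]


def ifst(big_x):
    """Radix-2 inverse transform via Corollary 3.2 (N must be a power of two)."""
    n_len = len(big_x)
    if n_len == 1:
        return [big_x[0]]
    y0 = ifst(big_x[0::2])           # IDST of X_{k,0} = X_{2k}
    y1 = ifst(big_x[1::2])           # IDST of X_{k,1} = X_{2k+1}
    half = n_len // 2
    x = [0j] * n_len
    for n in range(half):
        t = omega(n_len, n + 0.5) * y1[n]
        x[n] = y0[n] + t             # x_{n,0} = x_n
        x[half + n] = y0[n] - t      # x_{n,1} = x_{N/2+n}
    return x


def fst(x):
    """Radix-2 forward transform via Corollary 3.3 (N must be a power of two)."""
    n_len = len(x)
    if n_len == 1:
        return [x[0]]
    half = n_len // 2
    y0 = [(x[n] + x[half + n]) / 2 for n in range(half)]
    y1 = [omega(n_len, -(n + 0.5)) * (x[n] - x[half + n]) / 2 for n in range(half)]
    big_x = [0j] * n_len
    big_x[0::2] = fst(y0)            # X_{2k}   = DST of y_{n,0}
    big_x[1::2] = fst(y1)            # X_{2k+1} = DST of y_{n,1}
    return big_x
```

Each level of `fst` halves the sequences and applies one twiddle factor per pair, so the operation count has the familiar N log N form; the factor 1/2 at each of the log₂ N levels accumulates to the 1/N of Definition 3.1.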


3.2 Real (R)

In this section, we will be concerned with the following symmetries:

Definition 3.3 A real (R) sequence $x_n$ of length $N$ is defined by:
$$\bar{x}_n = x_n$$
An odd conjugate symmetric (OCS) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k} = -\bar{X}_k$$
The following lemma establishes the relationship between these symmetries.

Lemma 3.1 If $x_n$ is an R sequence of length $N$, then its DST $X_k$ is an OCS sequence of length $N$. If $X_k$ is an OCS sequence of length $N$, then its IDST $x_n$ is an R sequence of length $N$.

We now prove Lemma 3.1. We will only prove the first assertion.
$$X_{N-k} = \frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{-(N-k)(n+1/2)} = -\frac{1}{N}\sum_{n=0}^{N-1} x_n \omega_N^{k(n+1/2)} = -\bar{X}_k$$
This completes the proof of Lemma 3.1.

The next theorem uses the previous lemma to find the real form of the DST and IDST. These results will be used in subsequent sections in order to relate the DST to fast Poisson solvers.

Theorem 3.4 Let $x_n$ be an R sequence and let $X_k$ be its OCS symmetric DST, both of length $N$. The real form of the DST is:
$$\mathrm{Re}(X_k) = \frac{1}{N}\sum_{n=0}^{N-1} x_n \cos[\pi k(2n+1)/N]$$
$$\mathrm{Im}(X_k) = -\frac{1}{N}\sum_{n=0}^{N-1} x_n \sin[\pi k(2n+1)/N]$$


for $0 \le k \le N/2$ if $N$ is even, and $0 \le k \le (N-1)/2$ if $N$ is odd. If $N$ is even, then the real form of the IDST is:
$$x_n = X_0 + (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1}\{2\,\mathrm{Re}(X_k)\cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k)\sin[\pi k(2n+1)/N]\}$$
for $0 \le n \le N-1$. If $N$ is odd, we obtain instead:
$$x_n = X_0 + \sum_{k=1}^{(N-1)/2}\{2\,\mathrm{Re}(X_k)\cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k)\sin[\pi k(2n+1)/N]\}$$
for $0 \le n \le N-1$.

We now prove Theorem 3.4. The result for the DST follows immediately from Definition 3.1 and the R symmetry of $x_n$. Note that only half of the OCS sequence $X_k$ needs to be specified. We prove the result for the IDST for the case of even $N$ only, since the proof for odd $N$ is similar. The OCS symmetry and odd periodicity of $X_k$ imply:
$$X_0 = -X_N = \bar{X}_0, \qquad X_{N/2} = -\bar{X}_{N/2}$$
Thus, $X_0$ is real and $X_{N/2}$ is pure imaginary. Using this and the OCS symmetry of $X_k$ yields:
$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)}$$
$$\phantom{x_n} = X_0 + (-1)^{n} i X_{N/2} + \sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} X_{N-k}\,\omega_N^{(N-k)(n+1/2)}$$
$$\phantom{x_n} = X_0 + (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \bar{X}_k\,\omega_N^{-k(n+1/2)}$$
$$\phantom{x_n} = X_0 + (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)}\Big]$$


$$\phantom{x_n} = X_0 + (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1}\{2\,\mathrm{Re}(X_k)\cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k)\sin[\pi k(2n+1)/N]\}$$
This completes the proof of Theorem 3.4.

We now develop a fast, mixed radix algorithm for computing the R symmetric DST and its inverse, given $x_n$ in natural order. Note that an R sequence of length $N$ may be stored in $N$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Also, an OCS sequence of length $N$ may be stored in $N$ real storage locations because half of the sequence is redundant and need not be stored. Our goal is to exploit these symmetries in the data in order to obtain a reduction by half in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the OCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.4 Let $X_k$ be an OCS sequence of length $N$ with factor $p$. For $q \ne 0$, we define OCS induced intersequence symmetry (OCSIS) by:
$$X_{k,p-q} = -\bar{X}_{N/p-k-1,q}$$
For $q \ne 0$, we denote subsequence $X_{k,q}$ by OCSIS($q$). Subsequence $p-q$ is a redundant copy of subsequence $q$, which we denote by OCSIS($p-q$) = OCSIS*($q$). We also say that subsequence $p-q$ is the dual of subsequence $q$.

A staggered odd conjugate symmetric (SOCS) sequence $X_k$ of length $N$ is defined by:
$$X_{N-k-1} = -\bar{X}_k$$
Let $N$ have factor $p$. For $0 \le q \le p-1$, we define SOCS induced intersequence symmetry (SOCSIS) by:
$$X_{k,p-q-1} = -\bar{X}_{N/p-k-1,q}$$
For $0 \le q \le p-1$, we denote subsequence $X_{k,q}$ by SOCSIS($q$). Subsequence $p-q-1$ is a redundant copy of subsequence $q$, which we denote by SOCSIS($p-q-1$) = SOCSIS*($q$). We also say that subsequence $p-q-1$ is the dual of subsequence $q$.
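The OCS symmetry of Lemma 3.1 and the real-form inverse of Theorem 3.4 (even N) can be checked together on a small real sequence. This is a direct O(N²) sketch with illustrative names (`dst_direct`, `real_idst_even`), assuming ω_N = exp(2πi/N) and the 1/N factor on the forward transform; it reads only X_0, ..., X_{N/2}, which is the non-redundant half referred to in the storage count above.

```python
import cmath
import math


def dst_direct(x):
    """Definition 3.1 evaluated directly: X_k = (1/N) sum_n x_n omega_N^(-k(n+1/2))."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * (n + 0.5) / n_len)
                for n in range(n_len)) / n_len
            for k in range(n_len)]


def real_idst_even(big_x_half, n_len):
    """Theorem 3.4, even N: rebuild real x_0..x_{N-1} from X_0..X_{N/2} only."""
    half = n_len // 2
    out = []
    for n in range(n_len):
        s = big_x_half[0].real + (-1) ** (n + 1) * big_x_half[half].imag
        for k in range(1, half):
            angle = math.pi * k * (2 * n + 1) / n_len
            s += 2.0 * big_x_half[k].real * math.cos(angle)
            s -= 2.0 * big_x_half[k].imag * math.sin(angle)
        out.append(s)
    return out
```

A typical use is `real_idst_even(dst_direct(x)[: len(x) // 2 + 1], len(x))`, which should reproduce a real input `x` of even length.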


The following lemma establishes the relationship between these symmetries.

Lemma 3.2 Let $X_k$ be an OCS sequence of length $N$ with factor $p$. Then the subsequence $X_{k,0}$ is OCS symmetric, and the remaining subsequences $X_{k,q}$ are OCSIS symmetric. If $p$ is even, then the OCSIS symmetry of subsequence $X_{k,p/2}$ reduces to SOCS symmetry.

Let $X_k$ be an SOCS sequence of length $N$ with factor $p$. Then the subsequences $X_{k,q}$ are SOCSIS symmetric. If $p$ is odd, then the SOCSIS symmetry of subsequence $X_{k,(p-1)/2}$ reduces to SOCS symmetry.

We now prove Lemma 3.2. Let $X_k$ be an OCS sequence of length $N$ with factor $p$. The subsequence $X_{k,0}$ satisfies:
$$X_{N/p-k,0} = X_{N-pk} = -\bar{X}_{pk} = -\bar{X}_{k,0}$$
That is, subsequence $X_{k,0}$ is OCS symmetric. The remaining subsequences $X_{k,q}$ satisfy:
$$X_{k,p-q} = X_{pk+p-q} = -\bar{X}_{N-pk-p+q} = -\bar{X}_{p(N/p-k-1)+q} = -\bar{X}_{N/p-k-1,q}$$
That is, for $q \ne 0$ the subsequences $X_{k,q}$ are OCSIS symmetric. If $p$ is even, then the OCSIS symmetry of $X_{k,p/2}$ reduces to:
$$X_{k,p/2} = -\bar{X}_{N/p-k-1,p/2}$$
That is, subsequence $X_{k,p/2}$ is SOCS symmetric.

Let $X_k$ be an SOCS sequence of length $N$ with factor $p$. The subsequences $X_{k,q}$ satisfy:
$$X_{k,p-q-1} = X_{pk+p-q-1} = -\bar{X}_{N-pk-p+q+1-1} = -\bar{X}_{p(N/p-k-1)+q} = -\bar{X}_{N/p-k-1,q}$$
That is, the subsequences $X_{k,q}$ are SOCSIS symmetric. If $p$ is odd, then the SOCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:
$$X_{k,(p-1)/2} = -\bar{X}_{N/p-k-1,(p-1)/2}$$


That is, subsequence $X_{k,(p-1)/2}$ is SOCS symmetric. This completes the proof of Lemma 3.2.

A mixed radix splitting tree diagram for an OCS sequence is shown in Figure 3.1. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.

Lemma 3.3 The intersequence symmetry OCSIS induces the following intersequence symmetry in the IDST:

$$y_{n,p-q} = w_{N/p}^{-(n+1/2)}\,\bar{y}_{n,q}$$

Let $X_k$ be an SOCS sequence of length N. Its IDST $x_n$ satisfies:

$$x_n = w_N^{-(n+1/2)}\,\bar{x}_n, \qquad x_n = w_N^{-(n+1/2)/2}\,\tilde{x}_n$$

where $\tilde{x}_n$ is the magnitude of $x_n$, and hence is real. The intersequence symmetry SOCSIS induces the following intersequence symmetry in the IDST:

$$y_{n,p-q-1} = w_{N/p}^{-(n+1/2)}\,\bar{y}_{n,q}$$

We now prove Lemma 3.3. Let $X_{k,q}$ be OCSIS symmetric. Then the IDST of $X_{k,p-q}$ is:

$$y_{n,p-q} = \sum_{k=0}^{N/p-1} X_{k,p-q}\,w_{N/p}^{k(n+1/2)} = -\sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,w_{N/p}^{k(n+1/2)} = -\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,w_{N/p}^{(N/p-k-1)(n+1/2)}$$

$$= w_{N/p}^{-(n+1/2)} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,w_{N/p}^{-k(n+1/2)} = w_{N/p}^{-(n+1/2)}\,\bar{y}_{n,q}$$


Figure 3.1: Splitting tree for R symmetric FST


Let $X_k$ be an SOCS sequence of length N. Its IDST $x_n$ satisfies:

$$x_n = \sum_{k=0}^{N-1} X_k\,w_N^{k(n+1/2)} = -\sum_{k=0}^{N-1} \bar{X}_k\,w_N^{(N-k-1)(n+1/2)} = w_N^{-(n+1/2)} \sum_{k=0}^{N-1} \bar{X}_k\,w_N^{-k(n+1/2)} = w_N^{-(n+1/2)}\,\bar{x}_n$$

We express $x_n$ in polar form as follows:

$$x_n = \tilde{x}_n e^{i\theta}$$

Substituting this into the preceding symmetry for $x_n$ and solving for $\theta$ leads to:

$$x_n = w_N^{-(n+1/2)/2}\,\tilde{x}_n$$

Let $X_{k,q}$ be SOCSIS symmetric. Then the IDST of $X_{k,p-q-1}$ is:

$$y_{n,p-q-1} = \sum_{k=0}^{N/p-1} X_{k,p-q-1}\,w_{N/p}^{k(n+1/2)} = -\sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,w_{N/p}^{k(n+1/2)} = -\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,w_{N/p}^{(N/p-k-1)(n+1/2)}$$

$$= w_{N/p}^{-(n+1/2)} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,w_{N/p}^{-k(n+1/2)} = w_{N/p}^{-(n+1/2)}\,\bar{y}_{n,q}$$

This completes the proof of Lemma 3.3. The preceding lemma shows that each symmetry appearing in Figure 3.1 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the R symmetric IFST.
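The induced symmetries of Lemma 3.3 can be observed on a small example. The sketch below assumes the transform pair $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$ and $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$, and takes $y_{n,q}$ to be the length-N/p IDST of subsequence $X_{k,q} = X_{pk+q}$. The helper names `dst`, `idst`, and `w` are hypothetical, and the transforms are evaluated by direct summation.

```python
import numpy as np

def w(M, t):
    """w_M**t with w_M = exp(2*pi*i/M); t may be fractional."""
    return np.exp(2j * np.pi * np.asarray(t) / M)

def dst(x):
    """ASSUMED forward convention: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return w(N, -n.reshape(-1, 1) * (n + 0.5)) @ x / N

def idst(X):
    """ASSUMED inverse convention: x_n = sum_k X_k w_N^{k(n+1/2)}."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    return w(N, k * (k.reshape(-1, 1) + 0.5)) @ X

rng = np.random.default_rng(1)
N, p = 12, 4
M = N // p
X = dst(rng.standard_normal(N))            # OCS sequence
y = [idst(X[q::p]) for q in range(p)]      # IDSTs of the subsequences
n = np.arange(M)

# OCSIS-induced symmetry: y_{n,p-q} = w_{N/p}^{-(n+1/2)} * conj(y_{n,q})
lemma_ok = all(
    np.allclose(y[p - q], w(M, -(n + 0.5)) * np.conj(y[q]))
    for q in range(1, p)
)

# The OCS subsequence q = 0 has a real IDST.
y0_real = np.allclose(y[0].imag, 0.0)

# The SOCS subsequence q = p/2 has y = w_{N/p}^{-(n+1/2)/2} * y_tilde, y_tilde real.
y_tilde = w(M, (n + 0.5) / 2) * y[p // 2]
tilde_real = np.allclose(y_tilde.imag, 0.0)
print(lemma_ok, y0_real, tilde_real)
```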


Theorem 3.5 Assume that p is even. The inverse combine equation for OCS, SOCS, and OCSIS sequences is:

$$x_{n,l} = y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.5)$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. Note that $x_{n,l}$ is real because $y_{n,0}$ is real. The inverse combine equation for SOCSIS sequences is:

$$\tilde{x}_{n,l} = 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.6)$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

Next, assume that p is odd. The inverse combine equation for OCS and OCSIS sequences is:

$$x_{n,l} = y_{n,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.7)$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. The inverse combine equation for SOCS and SOCSIS sequences is:

$$\tilde{x}_{n,l} = (-1)^l\,\tilde{y}_{n,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.8)$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 3.5. First, assume that p is even. Consider the combining of OCS, SOCS, and OCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}$$

$$= y_{n,0} + w_p^{lp/2}\,w_N^{p(n+1/2)/2}\,y_{n,p/2} + \sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{l(p-q)}\,w_N^{(p-q)(n+1/2)}\,y_{n,p-q}$$


$$= y_{n,0} + (-1)^l\,w_N^{p(n+1/2)/2}\big[w_{N/p}^{-(n+1/2)/2}\,\tilde{y}_{n,p/2}\big] + \sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

$$= y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of SOCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} = \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=0}^{p/2-1} w_p^{l(p-q-1)}\,w_N^{(p-q-1)(n+1/2)}\,y_{n,p-q-1}$$

$$= \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + w_p^{-l}\,w_N^{-(n+1/2)} \sum_{q=0}^{p/2-1} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

Using SOCS symmetry yields:

$$x_{n,l} = x_{lN/p+n} = w_N^{-(lN/p+n+1/2)/2}\,\tilde{x}_{lN/p+n} = w_p^{-l/2}\,w_N^{-(n+1/2)/2}\,\tilde{x}_{n,l} \qquad (3.9)$$

Substituting this into the combine equation above yields:

$$\tilde{x}_{n,l} = w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + w_p^{-l/2}\,w_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

$$= 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Next, assume that p is odd. Consider the combining of OCS and OCSIS sequences. Substituting the symmetries found earlier into the inverse


combine equation (3.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} = y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{l(p-q)}\,w_N^{(p-q)(n+1/2)}\,y_{n,p-q}$$

$$= y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q} = y_{n,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of SOCS and SOCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}$$

$$= w_p^{l(p-1)/2}\,w_N^{(n+1/2)(p-1)/2}\,y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + \sum_{q=0}^{(p-3)/2} w_p^{l(p-q-1)}\,w_N^{(p-q-1)(n+1/2)}\,y_{n,p-q-1}$$

$$= w_p^{l(p-1)/2}\,w_N^{(n+1/2)(p-1)/2}\,y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + w_p^{-l}\,w_N^{-(n+1/2)} \sum_{q=0}^{(p-3)/2} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

Combining this with equation (3.9) yields:

$$\tilde{x}_{n,l} = w_p^{lp/2}\,w_N^{p(n+1/2)/2}\,y_{n,(p-1)/2} + w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q} + w_p^{-l/2}\,w_N^{-(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-lq}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

$$= (-1)^l\,\tilde{y}_{n,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$


This completes the proof of Theorem 3.5. The following corollary provides an important special case of this result.

Corollary 3.4 Assume p = 2. The inverse combine equation for OCS and SOCS sequences is:

$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{n,1} = y_{n,0} - \tilde{y}_{n,1}$$

for $0 \le n \le N/2-1$. The inverse combine equation for SOCSIS sequences is:

$$\tilde{x}_{n,0} = 2\operatorname{Re}\big[w_N^{(n+1/2)/2}\,y_{n,0}\big], \qquad \tilde{x}_{n,1} = -2\operatorname{Im}\big[w_N^{(n+1/2)/2}\,y_{n,0}\big]$$

for $0 \le n \le N/2-1$.

The next theorem provides all of the forward combine equations for the R symmetric FST.

Theorem 3.6 Assume that p is even. The forward combine equation for OCS, SOCS, and OCSIS sequences is given by equation (3.4) for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$ and:

$$\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,x_{n,l} \qquad (3.10)$$

for $0 \le n \le N/p-1$. The forward combine equation for SOCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)}\,\tilde{x}_{n,l} \qquad (3.11)$$

for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$.

Next, assume that p is odd. The forward combine equation for OCS and OCSIS sequences is given by equation (3.4) for $0 \le n \le N/p-1$, $0 \le q \le (p-1)/2$. The forward combine equation for SOCS and SOCSIS sequences is given by equation (3.11) for $0 \le n \le N/p-1$, $0 \le q \le (p-3)/2$ and:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,\tilde{x}_{n,l} \qquad (3.12)$$

for $0 \le n \le N/p-1$.


We now prove Theorem 3.6. First, assume that p is even. The forward combining of OCS, SOCS, and OCSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{(n+1/2)/2}\,y_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2}\,x_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,x_{n,l}$$

The forward combine equation for SOCSIS sequences is obtained by substituting equation (3.9) into equation (3.4):

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq}\,x_{n,l} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq}\big[w_p^{-l/2}\,w_N^{-(n+1/2)/2}\,\tilde{x}_{n,l}\big]$$

$$= \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)}\,\tilde{x}_{n,l}$$

Next, assume that p is odd. The forward combining of OCS and OCSIS sequences does not require any new equations. The forward combining of SOCS and SOCSIS sequences requires one new equation:

$$\tilde{y}_{n,(p-1)/2} = w_{N/p}^{(n+1/2)/2}\,y_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2}\,\tilde{x}_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\,\tilde{x}_{n,l}$$

This completes the proof of Theorem 3.6. The following corollary provides an important special case of this result.

Corollary 3.5 Assume p = 2. The forward combine equation for OCS and SOCS sequences is:

$$y_{n,0} = (x_{n,0} + x_{n,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} - x_{n,1})/2$$


for $0 \le n \le N/2-1$. The forward combine equation for SOCSIS sequences is:

$$y_{n,0} = w_N^{-(n+1/2)/2}\,(\tilde{x}_{n,0} - i\,\tilde{x}_{n,1})/2$$

for $0 \le n \le N/2-1$.
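As a concrete illustration of the radix 2 case for OCS and SOCS sequences, the following Python sketch performs one inverse and one forward combine step and compares them against directly computed transforms. It is a sketch under assumptions rather than the thesis algorithm: the transform pair $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$, $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$ is assumed, the half-length IDSTs are evaluated by direct summation, and the names `dst`, `idst`, and the variable names are hypothetical.

```python
import numpy as np

def dst(x):
    """ASSUMED forward convention: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * n.reshape(-1, 1) * (n + 0.5) / N) @ x / N

def idst(X):
    """ASSUMED inverse convention: x_n = sum_k X_k w_N^{k(n+1/2)}."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    return np.exp(2j * np.pi * k * (k.reshape(-1, 1) + 0.5) / N) @ X

rng = np.random.default_rng(2)
N = 8
half = N // 2
x = rng.standard_normal(N)        # real sequence
X = dst(x)                        # its OCS transform
n = np.arange(half)

# Half-length IDSTs of the even (OCS) and odd (SOCS) subsequences.
y0 = idst(X[0::2]).real                                  # y_{n,0} is real
y1_tilde_c = np.exp(2j * np.pi * (n + 0.5) / N) * idst(X[1::2])
tilde_is_real = np.allclose(y1_tilde_c.imag, 0.0)        # y~_{n,1} is real
y1_tilde = y1_tilde_c.real

# Inverse combine: x_{n,0} = y0 + y~1, x_{n,1} = y0 - y~1,
# where x_{n,l} denotes x at index n + l*N/2.
x_rec = np.concatenate([y0 + y1_tilde, y0 - y1_tilde])

# Forward combine: y0 = (x_{n,0} + x_{n,1})/2, y~1 = (x_{n,0} - x_{n,1})/2
y0_fwd = (x[:half] + x[half:]) / 2
y1t_fwd = (x[:half] - x[half:]) / 2
print(np.allclose(x_rec, x))
```

Both combine steps here use only real additions and a halving, which reflects why real-valued intermediate sequences are preferred over complex ones in this setting.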


3.3 Real Staggered Even (RSE)

In this section, we will be concerned with the following symmetries:

Definition 3.5 A real staggered even (RSE) sequence $x_n$ of length N is defined by:

$$x_{N-n-1} = x_n$$

A real odd conjugate symmetric (ROCS) sequence $X_k$ of length N is defined by:

$$X_{N-k} = -\bar{X}_k = -X_k$$

Note that an ROCS sequence may be viewed as having both R and OCS symmetry, or equivalently as an RO sequence.

The following lemma establishes the relationship between these symmetries.

Lemma 3.4 If $x_n$ is an RSE sequence of length N, then its DST $X_k$ is an ROCS sequence of length N. If $X_k$ is an ROCS sequence of length N, then its IDST $x_n$ is an RSE sequence of length N.

We now prove Lemma 3.4. We will only prove the first assertion. Let $x_n$ be an RSE sequence of length N. Since $x_n$ is also R symmetric, Lemma 3.1 implies that its DST $X_k$ is an OCS sequence of length N. Thus, we have only to prove that $X_k$ is R symmetric as well:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n\,w_N^{-k(n+1/2)} = \frac{1}{N} \sum_{n=0}^{N-1} x_{N-n-1}\,w_N^{-k(N-n-1/2)} = \frac{1}{N} \sum_{n=0}^{N-1} x_n\,w_N^{k(n+1/2)} = \bar{X}_k$$


This completes the proof of Lemma 3.4.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NS-NS boundary conditions. Note that if N is even, then an RSE sequence satisfies NS-NS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:

$$x_{N-1} = x_0, \qquad x_{N/2-1} = x_{N/2}$$

Theorem 3.7 Let $x_n$ be an RSE sequence and let $X_k$ be its ROCS symmetric DST, both of length N where N is even. The real form of the DST is:

$$X_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$

for $0 \le k \le N/2-1$. The real form of the IDST is:

$$x_n = X_0 + \sum_{k=1}^{N/2-1} 2X_k \cos[\pi k(2n+1)/N]$$

for $0 \le n \le N/2-1$.

We now prove Theorem 3.7. The result for the DST follows from Theorem 3.4, the ROCS symmetry of $X_k$, and the RSE symmetry of $x_n$ as follows:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos[\pi k(2n+1)/N]$$

$$= \frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \cos[\pi k(2n+1)/N] + \sum_{n=0}^{N/2-1} x_{N-n-1} \cos[\pi k(2N-2n-1)/N]\Big\}$$

$$= \frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$


The result for the IDST follows immediately from Theorem 3.4 and the ROCS symmetry of $X_k$. Note that only half of the RSE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.7.

We now develop a fast, mixed radix algorithm for computing the RSE symmetric DST and its inverse, given $x_n$ in natural order. Note that an RSE sequence of length N may be stored in N/2 real storage locations, compared to 2N real storage locations for a C sequence of length N. Similarly, an ROCS sequence of length N may be stored in N/2 real storage locations. Our goal is to exploit these symmetries in the data in order to obtain a reduction to one fourth in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ROCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.6 Let $X_k$ be an ROCS sequence of length N with factor p. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 3.4, with the addition that all sequences are real as well. We indicate this by preceding the acronym for each symmetry with an R.

The relationships between the symmetries recorded in Lemma 3.2 are not affected by the fact that all sequences have R symmetry as well. A mixed radix splitting tree diagram for an ROCS sequence is shown in Figure 3.2. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.
Lemma 3.5 The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemma 3.3, with the following addition. Let $X_k$ be an R sequence of length N. Its IDST $x_n$ satisfies:

$$x_{N-n-1} = \bar{x}_n$$

Since all sequences have R symmetry, only half of the IDST of any sequence needs to be computed.
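The real forms in Theorem 3.7 and the symmetry in Lemma 3.5 can be checked against a direct evaluation of the complex transforms. The Python sketch below is illustrative only and assumes the transform pair $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$, $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$; the names `dst`, `idst`, `h`, and `C` are hypothetical.

```python
import numpy as np

def dst(x):
    """ASSUMED forward convention: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * n.reshape(-1, 1) * (n + 0.5) / N) @ x / N

def idst(X):
    """ASSUMED inverse convention: x_n = sum_k X_k w_N^{k(n+1/2)}."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    return np.exp(2j * np.pi * k * (k.reshape(-1, 1) + 0.5) / N) @ X

rng = np.random.default_rng(3)
N = 10
half = N // 2
h = rng.standard_normal(half)             # the stored half of the data
x = np.concatenate([h, h[::-1]])          # RSE: x_{N-n-1} = x_n
X = dst(x)                                # ROCS: real, X_{N-k} = -X_k

# C[k, n] = cos(pi*k*(2n+1)/N) for 0 <= k, n <= N/2-1
idx = np.arange(half)
C = np.cos(np.pi * np.outer(idx, 2 * idx + 1) / N)

# Real form of the DST: X_k = (1/N) sum_{n < N/2} 2 x_n cos[pi k (2n+1)/N]
X_real = (2.0 / N) * (C @ h)

# Real form of the IDST: x_n = X_0 + sum_{k=1}^{N/2-1} 2 X_k cos[pi k (2n+1)/N]
x_real = X_real[0] + 2.0 * (C[1:].T @ X_real[1:])

# Lemma 3.5: for any real sequence R_k, its IDST r_n has r_{N-n-1} = conj(r_n)
r = idst(rng.standard_normal(N))
lemma_3_5_ok = np.allclose(r[::-1], np.conj(r))
print(np.allclose(x_real, h), lemma_3_5_ok)
```

Only the N/2 values in `h` and the N/2 values in `X_real` are ever stored by the real forms, matching the storage count stated above.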


Figure 3.2: Splitting tree for RSE symmetric FST


We now prove Lemma 3.5. Let $X_k$ be an R sequence of length N. Its IDST $x_n$ satisfies:

$$x_{N-n-1} = \sum_{k=0}^{N-1} X_k\,w_N^{k(N-n-1/2)} = \sum_{k=0}^{N-1} X_k\,w_N^{-k(n+1/2)} = \bar{x}_n$$

This completes the proof of Lemma 3.5. The preceding lemma shows that each symmetry appearing in Figure 3.2 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RSE symmetric IFST.

Theorem 3.8 Assume that p is even. The inverse combine equation for ROCS, RSOCS, and ROCSIS sequences is given by equation (3.5) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n-1,l} = y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.13)$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for RSOCSIS sequences is given by equation (3.6) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$\tilde{x}_{N/p-n-1,l} = 2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.14)$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for R sequences is given by equation (3.3) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q} \qquad (3.15)$$


for the lower half-range of n and $0 \le l \le p/2-1$.

Next, assume that p is odd. The inverse combine equation for ROCS and ROCSIS sequences is given by equation (3.7) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$x_{N/p-n-1,l} = y_{n,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.16)$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for RSOCS and RSOCSIS sequences is given by equation (3.8) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$\tilde{x}_{N/p-n-1,l} = (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.17)$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for R sequences is given by equation (3.3) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation (3.15) for the lower half-range of n and $0 \le l \le (p-3)/2$.

We now prove Theorem 3.8. First, assume that p is even. Consider the combining of ROCS, RSOCS, and ROCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.5), we need the following companion equation:

$$x_{N/p-n-1,l} = y_{N/p-n-1,0} + (-1)^l\,\tilde{y}_{N/p-n-1,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

Using RSOCS symmetry yields:

$$\tilde{y}_{N/p-n-1,q} = w_{N/p}^{(N/p-n-1/2)/2}\,y_{N/p-n-1,q} = -w_{N/p}^{-(n+1/2)/2}\,\bar{y}_{n,q} = -\tilde{y}_{n,q} \qquad (3.18)$$


Substituting this into the companion equation above yields:

$$x_{N/p-n-1,l} = y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big]$$

$$= y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of RSOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.6), we need the following companion equation:

$$\tilde{x}_{N/p-n-1,l} = 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

$$= 2\operatorname{Re}\Big[w_p^{(l+1)/2}\,w_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big]$$

$$= 2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of R sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.3), we need the following companion equation:

$$x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q} = \sum_{q=0}^{p-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

Next, assume that p is odd. Consider the combining of ROCS and ROCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.7), we need the following companion equation:

$$x_{N/p-n-1,l} = y_{N/p-n-1,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$


$$= y_{n,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big] = y_{n,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of RSOCS and RSOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.8), we need the following companion equation:

$$\tilde{x}_{N/p-n-1,l} = (-1)^l\,\tilde{y}_{N/p-n-1,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

Substituting equation (3.18) into the companion equation above yields:

$$\tilde{x}_{N/p-n-1,l} = (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{(l+1)/2}\,w_N^{-(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big]$$

$$= (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

The companion equation for R sequences is identical to the even p case. This completes the proof of Theorem 3.8. The following corollary provides an important special case of this result.

Corollary 3.6 Assume p = 2. The inverse combine equation for ROCS and RSOCS sequences is:

$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{N/2-n-1,0} = y_{n,0} - \tilde{y}_{n,1}$$

for the lower half-range of n. The inverse combine equation for RSOCSIS sequences is:

$$\tilde{x}_{n,0} = 2\operatorname{Re}\big[w_N^{(n+1/2)/2}\,y_{n,0}\big], \qquad \tilde{x}_{N/2-n-1,0} = 2\operatorname{Im}\big[w_N^{(n+1/2)/2}\,y_{n,0}\big]$$


for the lower half-range of n. The inverse combine equation for R sequences is:

$$x_{n,0} = y_{n,0} + w_N^{n+1/2}\,y_{n,1}, \qquad x_{N/2-n-1,0} = \bar{y}_{n,0} - w_N^{-(n+1/2)}\,\bar{y}_{n,1}$$

for the lower half-range of n.

The next theorem provides all of the forward combine equations for the RSE symmetric FST.

Theorem 3.9 Assume that p is even. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big] \qquad (3.19)$$

for the lower half-range of n and $0 \le q \le p-1$. For N = p, n = 0 this reduces to:

$$y_{0,q} = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)}\,x_{0,l}\Big] \qquad (3.20)$$

This ensures that the final output is real because N = p, n = 0 in the last stage of the algorithm. The forward combine equation for ROCS, RSOCS, and ROCSIS sequences is given by equation (3.19) for the lower half-range of n and $0 \le q \le p/2-1$ with the exception that all sequences $x_{n,l}$ are real. In addition:

$$\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l\big[x_{n,l} - x_{-n-1,l+1}\big] \qquad (3.21)$$

for the lower half-range of n. For N = p, n = 0 this reduces to $\tilde{y}_{0,p/2} = 0$. The forward combine equation for RSOCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)}\Big[\sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)}\,\tilde{x}_{-n-1,l+1}\Big] \qquad (3.22)$$


for the lower half-range of n and $0 \le q \le p/2-1$. For N = p, n = 0 this reduces to:

$$y_{0,q} = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)}\,\tilde{x}_{0,l}\Big] \qquad (3.23)$$

Next, assume that p is odd. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)}\Big\{(-1)^q\,w_p^{q/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big]\Big\} \qquad (3.24)$$

for the lower half-range of n and $0 \le q \le p-1$. For N = p, n = 0 this reduces to:

$$y_{0,q} = \frac{1}{p}\Big\{(-1)^q\,x_{0,(N-1)/2} + 2\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)}\,x_{0,l}\Big]\Big\} \qquad (3.25)$$

Note that $y_{0,q}$ is real because $x_{0,(N-1)/2} = x_{(N-1)/2}$ is real. The forward combine equation for ROCS and ROCSIS sequences is given by equation (3.24) for the lower half-range of n and $0 \le q \le (p-1)/2$ with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for RSOCS and RSOCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)}\Big\{i(-1)^{q+1}\,w_p^{(q+1/2)/2}\,\tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + w_p^{(l+1)(q+1/2)}\,\tilde{x}_{-n-1,l+1}\big]\Big\} \qquad (3.26)$$

for the lower half-range of n and $0 \le q \le (p-3)/2$. For N = p, n = 0 this reduces to:

$$y_{0,q} = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)}\,\tilde{x}_{0,l}\Big] \qquad (3.27)$$

In addition:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p}\Big\{(-1)^{(p-1)/2}\,\tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l\big[\tilde{x}_{n,l} - \tilde{x}_{-n-1,l+1}\big]\Big\} \qquad (3.28)$$

for the lower half-range of n. For N = p, n = 0 this reduces to $\tilde{y}_{0,(p-1)/2} = 0$ because $\tilde{x}_{0,(N-1)/2} = \tilde{x}_{(N-1)/2} = 0$.


We now prove Theorem 3.9. First, assume that p is even. The forward combine equation for R sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data. For this purpose, we will need the following result which is valid for all R sequences:

$$x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = \bar{x}_{(l+1)N/p-n-1} = \bar{x}_{-n-1,l+1}$$

Using this result, we obtain:

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq}\,x_{n,l} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \big[w_p^{-lq}\,x_{n,l} + w_p^{-q(p-l-1)}\,x_{n,p-l-1}\big]$$

$$= \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big]$$

For N = p, n = 0 this reduces to:

$$y_{0,q} = \frac{1}{p}\,w_p^{-q/2} \sum_{l=0}^{p/2-1} \big[w_p^{-lq}\,x_{0,l} + w_p^{q(l+1)}\,\bar{x}_{0,l}\big] = \frac{1}{p} \sum_{l=0}^{p/2-1} \big[w_p^{-q(l+1/2)}\,x_{0,l} + w_p^{q(l+1/2)}\,\bar{x}_{0,l}\big] = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)}\,x_{0,l}\Big]$$

The forward combining of ROCS, RSOCS, and ROCSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{(n+1/2)/2}\,y_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l\big[x_{n,l} - x_{-n-1,l+1}\big]$$


The forward combine equation for RSOCSIS sequences is obtained by substituting equation (3.9) into equation (3.19):

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big]$$

$$= \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)}\Big[\sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)}\,\tilde{x}_{-n-1,l+1}\Big]$$

For N = p, n = 0 we obtain a simplified form by substituting equation (3.9) into equation (3.20):

$$y_{0,q} = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)}\,x_{0,l}\Big] = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)}\,\tilde{x}_{0,l}\Big]$$

Next, assume that p is odd. The forward combine equation for R sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data.

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq}\,x_{n,l}$$

$$= \frac{1}{p}\,w_N^{-q(n+1/2)}\Big\{w_p^{-q(p-1)/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-lq}\,x_{n,l} + w_p^{-q(p-l-1)}\,x_{n,p-l-1}\big]\Big\}$$

$$= \frac{1}{p}\,w_N^{-q(n+1/2)}\Big\{(-1)^q\,w_p^{q/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big]\Big\}$$

For N = p, n = 0 this reduces to:


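$$y_{0,q} = \frac{1}{p}\,w_p^{-q/2}\Big\{(-1)^q\,w_p^{q/2}\,x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-lq}\,x_{0,l} + w_p^{q(l+1)}\,\bar{x}_{0,l}\big]\Big\}$$

$$= \frac{1}{p}\Big\{(-1)^q\,x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-q(l+1/2)}\,x_{0,l} + w_p^{q(l+1/2)}\,\bar{x}_{0,l}\big]\Big\}$$

$$= \frac{1}{p}\Big\{(-1)^q\,x_{0,(N-1)/2} + 2\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)}\,x_{0,l}\Big]\Big\}$$

The forward combining of ROCS and ROCSIS sequences does not require any new equations. The forward combine equation for RSOCS and RSOCSIS sequences is obtained by substituting equation (3.9) into equation (3.24):

$$y_{n,q} = \frac{1}{p}\,w_N^{-q(n+1/2)}\Big\{(-1)^q\,w_p^{q/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-lq}\,x_{n,l} + w_p^{q(l+1)}\,\bar{x}_{-n-1,l+1}\big]\Big\}$$

$$= \frac{1}{p}\,w_N^{-(n+1/2)(q+1/2)}\Big\{i(-1)^{q+1}\,w_p^{(q+1/2)/2}\,\tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \big[w_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + w_p^{(l+1)(q+1/2)}\,\tilde{x}_{-n-1,l+1}\big]\Big\}$$

For N = p, n = 0 we obtain a simplified form by substituting equation (3.9) into equation (3.25). We will also use the fact that $\tilde{x}_{0,(N-1)/2} = \tilde{x}_{(N-1)/2} = 0$.

$$y_{0,q} = \frac{1}{p}\Big\{(-1)^q\,x_{0,(N-1)/2} + 2\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)}\,x_{0,l}\Big]\Big\}$$

$$= \frac{1}{p}\Big\{i(-1)^{q+1}\,\tilde{x}_{0,(N-1)/2} + 2\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)}\,\tilde{x}_{0,l}\Big]\Big\} = \frac{2}{p}\operatorname{Re}\Big[\sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)}\,\tilde{x}_{0,l}\Big]$$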


For q = (p-1)/2 we obtain:

$$\tilde{y}_{n,(p-1)/2} = w_{N/p}^{(n+1/2)/2}\,y_{n,(p-1)/2} = \frac{1}{p}\Big\{(-1)^{(p-1)/2}\,\tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l\big[\tilde{x}_{n,l} - \tilde{x}_{-n-1,l+1}\big]\Big\}$$

This completes the proof of Theorem 3.9. The following corollary provides an important special case of this result.

Corollary 3.7 Assume p = 2. The forward combine equation for R sequences is:

$$y_{n,0} = (x_{n,0} + \bar{x}_{-n-1,1})/2, \qquad y_{n,1} = w_N^{-(n+1/2)}\,(x_{n,0} - \bar{x}_{-n-1,1})/2$$

for the lower half-range of n. The forward combine equation for ROCS and RSOCS sequences is:

$$y_{n,0} = (x_{n,0} + x_{-n-1,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} - x_{-n-1,1})/2$$

for the lower half-range of n. The forward combine equation for RSOCSIS sequences is:

$$y_{n,0} = w_N^{-(n+1/2)/2}\,(\tilde{x}_{n,0} + i\,\tilde{x}_{-n-1,1})/2$$

for the lower half-range of n.
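The radix 2 forward combine for R sequences in Corollary 3.7 touches only the first half of $x_n$, because the second half is recovered through $x_{N-n-1} = \bar{x}_n$. The Python sketch below illustrates this under assumptions: the IDST is taken as $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$, the index $x_{-n-1,1}$ is read as $x_{N/2-n-1}$ (the reading suggested by the proof of Theorem 3.9), and the names `idst`, `x_n0`, and `x_m1` are hypothetical. For simplicity the check runs over all n in $0 \le n \le N/2-1$ rather than only the lower half-range.

```python
import numpy as np

def idst(X):
    """ASSUMED inverse convention: x_n = sum_k X_k w_N^{k(n+1/2)}."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    return np.exp(2j * np.pi * k * (k.reshape(-1, 1) + 0.5) / N) @ X

rng = np.random.default_rng(4)
N = 8
half = N // 2
X = rng.standard_normal(N)        # R sequence: real, no further symmetry
x = idst(X)                       # complex, with x_{N-n-1} = conj(x_n)

# Reference values: IDSTs of the even and odd subsequences.
y0 = idst(X[0::2])
y1 = idst(X[1::2])

n = np.arange(half)
x_n0 = x[:half]                   # x_{n,0} = x_n
x_m1 = x[half - 1 - n]            # x_{-n-1,1} = x_{N/2-n-1}

# Corollary 3.7, R sequences:
#   y_{n,0} = (x_{n,0} + conj(x_{-n-1,1}))/2
#   y_{n,1} = w_N^{-(n+1/2)} (x_{n,0} - conj(x_{-n-1,1}))/2
y0_fwd = (x_n0 + np.conj(x_m1)) / 2
y1_fwd = np.exp(-2j * np.pi * (n + 0.5) / N) * (x_n0 - np.conj(x_m1)) / 2
print(np.allclose(y0_fwd, y0), np.allclose(y1_fwd, y1))
```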


3.4 Real Staggered Odd (RSO)

In this section, we will be concerned with the following symmetries:

Definition 3.7 A real staggered odd (RSO) sequence $x_n$ of length N is defined by:

$$x_{N-n-1} = -x_n$$

An imaginary odd conjugate symmetric (IOCS) sequence $X_k$ of length N is defined by:

$$X_{N-k} = -\bar{X}_k = X_k$$

Note that an IOCS sequence may be viewed as having both I and OCS symmetry, or equivalently as an imaginary even (IE) sequence.

The following lemma establishes the relationship between these symmetries.

Lemma 3.6 If $x_n$ is an RSO sequence of length N, then its DST $X_k$ is an IOCS sequence of length N. If $X_k$ is an IOCS sequence of length N, then its IDST $x_n$ is an RSO sequence of length N.

We now prove Lemma 3.6. We will only prove the first assertion. Let $x_n$ be an RSO sequence of length N. Since $x_n$ is also R symmetric, Lemma 3.1 implies that its DST $X_k$ is an OCS sequence of length N. Thus, we have only to prove that $X_k$ is I symmetric as well:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n\,w_N^{-k(n+1/2)} = \frac{1}{N} \sum_{n=0}^{N-1} x_{N-n-1}\,w_N^{-k(N-n-1/2)} = -\frac{1}{N} \sum_{n=0}^{N-1} x_n\,w_N^{k(n+1/2)} = -\bar{X}_k$$


This completes the proof of Lemma 3.6.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DS-DS boundary conditions. Note that if N is even, then an RSO sequence satisfies DS-DS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:

$$x_{N-1} = -x_0, \qquad x_{N/2-1} = -x_{N/2}$$

Theorem 3.10 Let $x_n$ be an RSO sequence and let $X_k$ be its IOCS symmetric DST, both of length N where N is even. The real form of the DST is:

$$\operatorname{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$

for $1 \le k \le N/2$. The real form of the IDST is:

$$x_n = (-1)^{n+1}\operatorname{Im}(X_{N/2}) - \sum_{k=1}^{N/2-1} 2\operatorname{Im}(X_k) \sin[\pi k(2n+1)/N]$$

for $0 \le n \le N/2-1$.

We now prove Theorem 3.10. The result for the DST follows from Theorem 3.4, the IOCS symmetry of $X_k$, and the RSO symmetry of $x_n$ as follows:

$$\operatorname{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin[\pi k(2n+1)/N]$$

$$= -\frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n \sin[\pi k(2n+1)/N] + \sum_{n=0}^{N/2-1} x_{N-n-1} \sin[\pi k(2N-2n-1)/N]\Big\}$$

$$= -\frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$

Note that the range for k reflects the fact that $X_0 = 0$. The result for the IDST follows immediately from Theorem 3.4 and the IOCS symmetry of


$X_k$. Note that only half of the RSO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.10.

We now develop a fast, mixed radix algorithm for computing the RSO symmetric DST and its inverse, given $x_n$ in natural order. Note that an RSO sequence of length N may be stored in N/2 real storage locations, compared to 2N real storage locations for a C sequence of length N. Similarly, an IOCS sequence of length N may be stored in N/2 real storage locations. Our goal is to exploit these symmetries in the data in order to obtain a reduction to one fourth in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the IOCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.8 Let $X_k$ be an IOCS sequence of length N with factor p. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 3.4, with the addition that all sequences are imaginary as well. We indicate this by preceding the acronym for each symmetry with an I.

The relationships between the symmetries recorded in Lemma 3.2 are not affected by the fact that all sequences have I symmetry as well. A mixed radix splitting tree diagram for an IOCS sequence is shown in Figure 3.3. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.

Lemma 3.7 The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemma 3.3, with the following addition. Let $X_k$ be an I sequence of length N.
Its IDST $x_n$ satisfies:

$$x_{N-n-1} = -\bar{x}_n$$

Since all sequences have I symmetry, only half of the IDST of any sequence needs to be computed.
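The sine forms of Theorem 3.10 and the symmetry of Lemma 3.7 can be checked in the same way as their RSE counterparts. The following Python sketch is illustrative and assumes the transform pair $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$, $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$; the names `dst`, `idst`, `h`, and `S` are hypothetical.

```python
import numpy as np

def dst(x):
    """ASSUMED forward convention: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * n.reshape(-1, 1) * (n + 0.5) / N) @ x / N

def idst(X):
    """ASSUMED inverse convention: x_n = sum_k X_k w_N^{k(n+1/2)}."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    return np.exp(2j * np.pi * k * (k.reshape(-1, 1) + 0.5) / N) @ X

rng = np.random.default_rng(5)
N = 10
half = N // 2
h = rng.standard_normal(half)             # the stored half of the data
x = np.concatenate([h, -h[::-1]])         # RSO: x_{N-n-1} = -x_n
X = dst(x)                                # IOCS: purely imaginary, X_0 = 0

# S[k-1, n] = sin(pi*k*(2n+1)/N) for 1 <= k <= N/2, 0 <= n <= N/2-1
k = np.arange(1, half + 1)
n = np.arange(half)
S = np.sin(np.pi * np.outer(k, 2 * n + 1) / N)

# Real form of the DST: Im(X_k) = -(1/N) sum_{n < N/2} 2 x_n sin[pi k (2n+1)/N]
im_X = -(2.0 / N) * (S @ h)

# Real form of the IDST:
# x_n = (-1)^{n+1} Im(X_{N/2}) - sum_{k=1}^{N/2-1} 2 Im(X_k) sin[pi k (2n+1)/N]
x_real = (-1.0) ** (n + 1) * im_X[-1] - 2.0 * (S[:-1].T @ im_X[:-1])

# Lemma 3.7: for any imaginary sequence, its IDST r_n has r_{N-n-1} = -conj(r_n)
r = idst(1j * rng.standard_normal(N))
lemma_3_7_ok = np.allclose(r[::-1], -np.conj(r))
print(np.allclose(x_real, h), lemma_3_7_ok)
```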


Figure 3.3: Splitting tree for RSO symmetric FST


We now prove Lemma 3.7. Let $X_k$ be an I sequence of length N. Its IDST $x_n$ satisfies:

$$x_{N-n-1} = \sum_{k=0}^{N-1} X_k\,w_N^{k(N-n-1/2)} = -\sum_{k=0}^{N-1} \bar{X}_k\,w_N^{-k(n+1/2)} = -\bar{x}_n$$

This completes the proof of Lemma 3.7. The preceding lemma shows that each symmetry appearing in Figure 3.3 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RSO symmetric IFST.

Theorem 3.11 Assume that p is even. The inverse combine equation for IOCS, ISOCS, and IOCSIS sequences is given by equation (3.5) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n-1,l} = -y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} - 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.29)$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for ISOCSIS sequences is given by equation (3.6) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$\tilde{x}_{N/p-n-1,l} = -2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.30)$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for I sequences is given by equation (3.3) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n-1,l} = -\sum_{q=0}^{p-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q} \qquad (3.31)$$

for the lower half-range of n and $0 \le l \le p/2-1$.


Next, assume that p is odd. The inverse combine equation for IOCS and IOCSIS sequences is given by equation (3.7) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$x_{N/p-n-1,l} = -y_{n,0} - 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.32)$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for ISOCS and ISOCSIS sequences is given by equation (3.8) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$\tilde{x}_{N/p-n-1,l} = (-1)^l\,\tilde{y}_{n,(p-1)/2} - 2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big] \qquad (3.33)$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for I sequences is given by equation (3.3) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation (3.31) for the lower half-range of n and $0 \le l \le (p-3)/2$.

We now prove Theorem 3.11. First, assume that p is even. Consider the combining of IOCS, ISOCS, and IOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.5), we need the following companion equation:

$$x_{N/p-n-1,l} = y_{N/p-n-1,0} + (-1)^l\,\tilde{y}_{N/p-n-1,p/2} + 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

Using ISOCS symmetry yields:

$$\tilde{y}_{N/p-n-1,q} = w_{N/p}^{(N/p-n-1/2)/2}\,y_{N/p-n-1,q} = w_{N/p}^{-(n+1/2)/2}\,\bar{y}_{n,q} = \tilde{y}_{n,q} \qquad (3.34)$$

Substituting this into the companion equation above yields:


$$x_{N/p-n-1,l} = -y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} - 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big]$$

$$= -y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} - 2\operatorname{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of ISOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.6), we need the following companion equation:

$$\tilde{x}_{N/p-n-1,l} = 2\operatorname{Re}\Big[w_p^{l/2}\,w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

$$= -2\operatorname{Re}\Big[w_p^{(l+1)/2}\,w_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}\Big]$$

$$= -2\operatorname{Re}\Big[w_p^{-(l+1)/2}\,w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\,w_N^{q(n+1/2)}\,y_{n,q}\Big]$$

Consider the combining of I sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.3), we need the following companion equation:

$$x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q} = -\sum_{q=0}^{p-1} w_p^{q(l+1)}\,w_N^{-q(n+1/2)}\,\bar{y}_{n,q}$$

Next, assume that p is odd. Consider the combining of IOCS and IOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.7), we need the following companion equation:

$$x_{N/p-n-1,l} = y_{N/p-n-1,0} + 2\operatorname{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\,w_N^{q(N/p-n-1/2)}\,y_{N/p-n-1,q}\Big]$$

(p-1)/2 -y 2Ref wq(l+l)w -q(n+1 / 2)-y ] n,O L p N n,q q:::::l = (p-1)/2 -y 2Re[ w-q(l+l)"'q(n+l/2)y ] n,O L......J p N n,q q=l Consider the combining of ISOCS and ISOCSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equa tion (3.8), we need the following companion equation: ;;Nfp-n-1,1 = -YN/p-n-l,(p-1)/2 + Substituting equation (3.34) into the companion equation above yields: ZNjp-n-1,1 = -fin,(p-1)/2-= -fin,(p-1)/2 The companion equation for I sequences is identical to the even p case. This completes the proof of Theorem 3.1 L The following corollary provides an important special case of this result. Corollary 3.8 Assume p = 2. The inverse combine equation for JOGS and ISOCS sequences is: :Z:n,O "'N/2-n-1,0 Yn,O + fJn,l -Yn,O + Yn,l for the lower half-range of n. The inverse combine equation for ISOCSIS sequences zs: ZN/2-n-1,0 2R c (n+l/2)/2 ] eLWN Yn,O 21 c, ,(n_ -1 -mLwN Yn,O 130

for the lower half-range of n. The inverse combine equation for I sequences zs: n-Ll/2 Yn,O + wN' Yn,l "'N/2-n-1,0 -L -(n+1/2)-Yn,O 1 W N Yn,l for the lower half-range of n. The next theorem provides all of the forv;ard combine equations for the RSO symmetric FST. Theorem 3.12 Assume that p is even. The forward combine equation for I sequences is: p/2-1 1 / -q(n+1/2) '\"' [ -lq q(l+l)' Yn,q = PWN L wp Xn,[wp :V-n-l,l+lJ l=O for the lower half-range of n and 0 ::; q ::; p -1. For N = p, n reduces to: p/2-1 YO,q = 2ijplm[ L w;q(l+l/2)xo.zl bO (3.35) 0 this (3.36) This ensures that the final output is imaginary because N = p, n = 0 in the last stage of the algorithm. The forward combine equation for JOGS, ISOCS, and IOCSIS sequences is given by equation (3.35) for the lower half-range of n and 0 :S q :S p/2-1 with the exception that all sequences Xn,t are real. In addition: p/2-1 Yn,p/2 = 1/p L (-1)1\xn,l + X-n-1,/H] (3.37) l=O for the lower half-range of n. For N = p, n = 0 note that Yo,N/2 = -ifJo,Nj2 zs zmagznary. The forward combine equation for ISOCSIS sequences is: p/2-1 1 / -(n+l/2)(q+l/2)l ,-l(q+l/2) _ Yn,q PWN L Wp Xn,l l=O p/2-1 '\"' w(l+l)(q+1/2)a; 1 L p -n-l,l-t-lj l=O 131 (3.38)

for the lower half-range of n and 0 :'0 q :'0 p12 L For N = p, n = 0 this reduces to: p/2-1 Yo,q = 2ijplm[ I: (3.39) Next, assume that pis odd. The forward combine equation for I sequences zs: Yn,q 1 j p wjVq(n+ 1 /2 ) { ( -1 + (p-3)/2 "' i -lq q(l+l)-1} LWp Xn,l -Wp X-n-l,l+lJ (3.40) for the lower half-range of n and 0 < q 5 p 1. For N = p, n = 0 this reduces to: (p-3)/2 Yo,q = 1/p{( -1)qxo,(N-1);2 + 2iJm[ I: w;q(Z+l/2 lxo,z]} (3.41) Note that Yo,q is imaginary because xo,(N-1);2 = '"(N-l)/2 is imaginary. The forward combine equation for JOGS and IOCSIS sequences is given by equa tion {3.40) for the lower half-range of n and 0 :'0 q 5 (p-1)/2 with the exception that all sequences :l:n,l are real. The forward combine equation for ISOCS and ISOCSIS sequences is: Yn,q = 1 / -(n+l/2)(q+l/2){ '( 1)q+1w(q+1/2)/2;;; + PWN Z p (p-3)/2 "' [w-l(q+l/2).;; (l-tl)(q+l/2).;; ]} L p "-'n,l Wp ...,-n-1,[11 (3.42) for the lower half-range of n and 0 :'0 q :'0 (p-3)/2. For N = p, n = 0 this reduces to: In addition: Yo,q 1/p{i( -1)q+l iiio,(N-1)/2 + Yn,(p-1)/2 (p-3)/2 2ilm[ L w;(Z+l/2)(q+l/2);;;o,zl} l::::O 1/p{( -1)(p-1)/2 iiin,(p-1)/2 + (p-3)/2 I: ( -1)1r:i:n,t + l=O 132 (3.43) (3.44)

for the lower half-range of n. For N = p,n = 0 note that Yo,(N-1);2 = -ifJo,(N -1)/2 IS 1magmary. We now prove Theorem 3.12. First, assume that pis even. The forward combine equation for I sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data. For this purpose, we will need the following result which is valid for all I sequences: Zn,p-l-1 '"(p-l-1)Njp+n '"N-(l+l)Nfp+n -a;(l+l)N/p-n-1 -'X-n-l,l-.rl Using this result, we obtain: Yn,q = = p-1 1 / -q(n+1/2)"' -lq pwN Zn,l l=O p/2-1 1 / -q(n+l/2) "' r -lq + -q(p-1-1) 1 pwN L..i LWp Xn,l WP Xn,p-l-11 l=O p/2-1 1 / -q(n+l/2) "' -lq q(l-t-1)--r PW_rv L., [Wp "r,,l-WP '"-n-1,1+1] l=O For N = p, n = 0 this reduces to: p/2-1 Yo,q 1/pw;/2 2:: [w;'xo,l-l=O p/2-1 1/p "' [w-q(l+l/2), w(h-1/2);;; ] L..,_; p O,l p O,l 1::::0 p/2-1 2i/plm[ 2:: w;('+1 / 2)xo,d l=O The forward combining ofiOCS, ISOCS, and IOCSIS sequences requires one new equation: Yn,p/2 (n+l/2)/2 WN/p Yn,p/2 p/2-1 1/p 2:: (-1)1[xn,l + '"-n-1,1+,] l=O 133

The forward combine equation for ISOCSIS sequences is obtained by substituting equation (3.9) into equation (3.35): p/2-1 = 1 / -q(n+l/2) "I<' [ -lq q(l-:-1)J Yn,q pwN L...wp Xn,l wp ;.e-n-1,1+1 l=O p/2-1 = 1/ -(n+l/2)(q+1/2)r "I<' ,-l(q+l/2)_ pwN l (.;.;P Xn,l l=O p/2-1 "I<' w(l+1)(q+1/2);;; 1 p -n-l,l-i-1 .. l=O For N = p, n = 0 we obtain a simplified form by substituting equation (3.9) into equation (3.36): p/2-1 YO,q 2ijpimf w-q(l+l/2)x0 ,1 L p ,.; l=O p/2-1 2i/pim[ L w;(l+l/2)(q+1/2);;o,d 1::::::0 Next, assume that p is odd. The forward combine equation for I se quences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data. p-1 1 / -q(n+l/2) "I<' ,-lq Yn,q pwN L WP ;r.n,l l=O 1 / -q(n+l/2){ -q(p-1)/2 + P WN WP "'n,(p-1)/2 (p-3)/2 "' [ -lq + -qlp-l-1) ]} L WP Zn,l WP Xn,p-l-1 l=O 1/p w -q(n+1/2){(-1)qwq/Zx ( )/ ..L N p n, p-l 2 (p-3)/2 w(l+;)o;;]} L p ""n,lp "'-n-l,l+l l=O For N = p, n = 0 this reduces to: 134

Yo,q = + (p-3)/2 "' wq(l+l),.'} L...,; p wO,l p wO,lj l=O 1/p{( -1)xo,(N-1)/2 + (p-3)/2 "' [w-q(l+l/2), wq(l+l/2);;; 'i} L....J p O,l p O,l., l=O (p-)/2 1/p{(-1)qxo,(N-1);2 + 2ilm[ L w;U+1 /2lxo,z]} kO The forward combining of IOCS and IOCSIS >equences does not require any new equations. The forward combine equation for ISOCS and ISOCSIS sequences is obtained by substituting equation (3.9) into equation (3.40): Yn,q = 1 'pw-q(n+1/2){( + I N -p (p-3)/2 "' [ -lq q(l+l)-'} L....i WP Zn,l WP X-n-l,l+lJ l=O = 1/ -(n+l/2)(q+l/2){ '( 1)q+l .(q+l/2)/2+ pwN -WP Xn,(p-1)/2 (p-3)/2 "' [w-l(q+l/2);; w(l+l)(q+l/2);; ]} L....t p n,l p -n-l,l,-1 l=O For N = p, n = 0 we obtain a simplified form by substituting equation (3.9) into equation (3.41). (p-3)/2 Yo,q = 1/p{( -1)qxo,(N-1)/2 + 2ilm[ L w;q(!+l/2 lxo,z]} l=O (p-3)/2 = 1/p{i( -l)q+l,;;O,(N-1)/2 + 2ilm[ L w;U+1/2)(q+l/2 );;o,l]} l=O For q = (p-1)/2 we obtain: (n+l/2)/2 Yn,(p-1)/2 = WN/p Yn,(p-1)/2 (p-3)/2 1/p{( -1)(p-1)/2xn,(p-1)/2 + L ( -l)1ixn,l + '"-n-1,!+1]} l=O 135

This completes the proof of Theorem 3.12.

The following corollary provides an important special case of this result.

Corollary 3.9 Assume $p = 2$. The forward combine equation for I sequences is:

$$y_{n,0} = (x_{n,0} - \bar{x}_{-n-1,1})/2$$
$$y_{n,1} = \omega_N^{-(n+1/2)} (x_{n,0} + \bar{x}_{-n-1,1})/2$$

for the lower half-range of $n$. The forward combine equation for IOCS and ISOCS sequences is:

$$y_{n,0} = (x_{n,0} - x_{-n-1,1})/2$$
$$\tilde{y}_{n,1} = (x_{n,0} + x_{-n-1,1})/2$$

for the lower half-range of $n$. The forward combine equation for ISOCSIS sequences is:

$$y_{n,0} = \omega_N^{-(n+1/2)/2} (\tilde{x}_{n,0} - i\,\tilde{x}_{-n-1,1})/2$$

for the lower half-range of $n$.

3.5 Real Composite Staggered Even - Staggered Even (RSE-SE)

In this section, we will be concerned with the following symmetries:

Definition 3.9 A real composite staggered even - staggered even (RSE-SE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = x_n, \qquad x_{N/2-n-1} = x_n$$

Note that an RSE-SE sequence of length $N$ is also an RSE sequence of length $N$. A real odd conjugate symmetric zero odd term (ROCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^k X_k$$

The following lemma establishes the relationship between these symmetries:

Lemma 3.8 If $x_n$ is an RSE-SE sequence of length $N$, where $N$ is even, then its DST $X_k$ is an ROCSZO sequence of length $N$. If $X_k$ is an ROCSZO sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSE-SE sequence of length $N$.

We now prove Lemma 3.8. We will only prove the first assertion. Assume $x_n$ is an RSE-SE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSE sequence of length $N$, Lemma 3.4 implies that its DST $X_k$ is an ROCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ROCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 3.7 and the RSE-SE symmetry of $x_n$ as follows:

$$X_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]$$

$$= \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_{N/2-n-1} \cos[\pi k(N-2n-1)/N]$$
$$= (-1)^k \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]$$

This completes the proof of Lemma 3.8.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NS-N boundary conditions. Note that if $N = 2(2M+1)$, then an RSE-SE sequence satisfies NS-N boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = x_0, \qquad x_{M-1} = x_{M+1}$$

Theorem 3.13 Let $x_n$ be an RSE-SE sequence and let $X_k$ be its ROCSZO symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DST is:

$$X_{2k} = \frac{2}{N} \Big\{ (-1)^k x_M + \sum_{n=0}^{M-1} 2 x_n \cos[2\pi k(2n+1)/N] \Big\}$$

for $0 \le k \le M$. The real form of the IDST is:

$$x_n = X_0 + \sum_{k=1}^{M} 2 X_{2k} \cos[2\pi k(2n+1)/N]$$

for $0 \le n \le M$.

We now prove Theorem 3.13. The result for the DST follows from Theorem 3.7, the ROCSZO symmetry of $X_k$, and the RSE-SE symmetry of $x_n$ as follows:

$$X_{2k} = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[2\pi k(2n+1)/N]$$

$$= \frac{1}{N} \Big\{ (-1)^k 2 x_M + \sum_{n=0}^{M-1} 2 x_n \cos[2\pi k(2n+1)/N] + \sum_{n=0}^{M-1} 2 x_{N/2-n-1} \cos[2\pi k(N-2n-1)/N] \Big\}$$
$$= \frac{2}{N} \Big\{ (-1)^k x_M + \sum_{n=0}^{M-1} 2 x_n \cos[2\pi k(2n+1)/N] \Big\}$$

The result for the IDST follows immediately from Theorem 3.7 and the ROCSZO symmetry of $X_k$. Note that only one fourth of the RSE-SE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.13.

A fast, mixed radix algorithm for computing the RSE-SE symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSE symmetric FST. Note that an RSE-SE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ROCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the ROCSZO sequence $X_k$. We begin developing this algorithm by defining the one new intermediate symmetry involved.

Definition 3.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:

$$X_k = 0$$

for $0 \le k \le N-1$.

The following lemma establishes the relationship between the symmetries which occur in the splittings of the ROCSZO sequence $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.9 Let $X_k$ be an ROCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is ROCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the ROCS sequence $X_{k,0}$ are identical to those in Lemma 3.2, with the addition that all sequences have R symmetry as well.
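The real forms in Theorem 3.13 can be checked numerically against a brute-force transform. The sketch below is illustrative only and is not the fast algorithm of the thesis. It assumes the staggered transform convention $X_k = (1/N) \sum_n x_n \omega_N^{-k(n+1/2)}$ with $\omega_N = e^{2\pi i/N}$, which is consistent with the cosine form quoted from Theorem 3.7; the function names (`staggered_dst`, `extend_rse_se`, `real_dst`, `real_idst`) are hypothetical.

```python
import cmath
import math


def staggered_dst(x):
    """Brute-force X_k = (1/N) sum_n x_n w^(-k(n+1/2)), w = exp(2*pi*i/N).

    The convention is an assumption inferred from the cosine form of Theorem 3.7.
    """
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N) for n in range(N)) / N
        for k in range(N)
    ]


def extend_rse_se(quarter):
    """Extend x_0..x_M to a full RSE-SE sequence of length N = 2(2M+1)."""
    M = len(quarter) - 1
    half = list(quarter) + list(reversed(quarter[:M]))  # x_{N/2-n-1} = x_n
    return half + list(reversed(half))                  # x_{N-n-1} = x_n


def real_dst(quarter):
    """Real form of the DST in Theorem 3.13: returns X_0, X_2, ..., X_{2M}."""
    M = len(quarter) - 1
    N = 2 * (2 * M + 1)
    return [
        2.0 / N * ((-1) ** k * quarter[M]
                   + sum(2 * quarter[n] * math.cos(2 * math.pi * k * (2 * n + 1) / N)
                         for n in range(M)))
        for k in range(M + 1)
    ]


def real_idst(X_even):
    """Real form of the IDST in Theorem 3.13: returns x_0, ..., x_M."""
    M = len(X_even) - 1
    N = 2 * (2 * M + 1)
    return [
        X_even[0] + sum(2 * X_even[k] * math.cos(2 * math.pi * k * (2 * n + 1) / N)
                        for k in range(1, M + 1))
        for n in range(M + 1)
    ]


quarter = [1.0, -2.0, 0.5, 3.0]          # arbitrary test data: M = 3, so N = 14
X = staggered_dst(extend_rse_se(quarter))
X_even = real_dst(quarter)

# Lemma 3.8: the odd-indexed terms of the DST vanish.
odd_terms_vanish = all(abs(X[k]) < 1e-12 for k in range(1, len(X), 2))
# Theorem 3.13: the real form reproduces the even-indexed terms.
matches_full_dst = all(abs(X[2 * k] - X_even[k]) < 1e-12 for k in range(len(X_even)))
# The real IDST recovers the stored quarter of the sequence.
round_trip = all(abs(a - b) < 1e-12 for a, b in zip(real_idst(X_even), quarter))
print(odd_terms_vanish, matches_full_dst, round_trip)
```

Only the $M+1$ values $x_0, \ldots, x_M$ are passed to the real forms, which mirrors the remark that one fourth of the sequence suffices.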

A mixed radix splitting tree diagram for an ROCSZO sequence is shown in Figure 3.4. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3 and 3.5, with the addition provided by the following lemma. We omit the proof of this result because it is trivial.

Lemma 3.10 Let $X_k$ be a Z sequence of length $N$. Its IDST $x_n$ is also a Z sequence of length $N$.

These results show that each symmetry appearing in Figure 3.4 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSE-SE symmetric IFST, obtained as a special case of that for the RSE symmetric IFST.

Corollary 3.10 Assume $p = 2$. The inverse combine equation for ROCS and Z sequences is:

$$x_{n,0} = y_{n,0}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.8 for arbitrary factors $p$.

We now prove Corollary 3.10. The inverse combine equation for ROCS and Z sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.6 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the RSE-SE sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.10.

The next corollary provides all of the forward combine equations for the RSE-SE symmetric FST, obtained as a special case of that for the RSE symmetric FST.

Corollary 3.11 Assume $p = 2$. The forward combine equation for ROCS and Z sequences is:

$$y_{n,0} = x_{n,0}, \qquad y_{n,1} = 0$$

[Figure 3.4: Splitting tree for RSE-SE symmetric FST. The root of the tree is the ROCSZO sequence.]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.9 for arbitrary factors $p$.

We now prove Corollary 3.11. The forward combine equation for ROCS and Z sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.7 and use the RSE-SE symmetry of $x_n$ as follows:

$$y_{n,0} = (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}$$
$$\tilde{y}_{n,1} = (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0$$

This completes the proof of Corollary 3.11.

3.6 Real Composite Staggered Even - Staggered Odd (RSE-SO)

In this section, we will be concerned with the following symmetries:

Definition 3.11 A real composite staggered even - staggered odd (RSE-SO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = x_n, \qquad x_{N/2-n-1} = -x_n$$

Note that an RSE-SO sequence of length $N$ is also an RSE sequence of length $N$. A real odd conjugate symmetric zero even term (ROCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^{k+1} X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 3.11 If $x_n$ is an RSE-SO sequence of length $N$, where $N$ is even, then its DST $X_k$ is an ROCSZE sequence of length $N$. If $X_k$ is an ROCSZE sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSE-SO sequence of length $N$.

We now prove Lemma 3.11. We will only prove the first assertion. Assume $x_n$ is an RSE-SO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSE sequence of length $N$, Lemma 3.4 implies that its DST $X_k$ is an ROCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ROCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 3.7 and the RSE-SO symmetry of $x_n$ as follows:

$$X_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]$$

$$= \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_{N/2-n-1} \cos[\pi k(N-2n-1)/N]$$
$$= (-1)^{k+1} \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]$$

This completes the proof of Lemma 3.11.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NS-DS or NS-D boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RSE-SO sequence satisfies NS-DS boundary conditions for the computational domain $0 \le n \le N/4 - 1$. That is:

$$x_{N-1} = x_0, \qquad x_{N/4-1} = -x_{N/4}$$

Similarly, if $N = 2(2M+1)$, then an RSE-SO sequence satisfies NS-D boundary conditions for the computational domain $0 \le n \le M-1$. That is:

$$x_{N-1} = x_0, \qquad x_M = 0$$

Theorem 3.14 Let $x_n$ be an RSE-SO sequence and let $X_k$ be its ROCSZE symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DST is:

$$X_{2k+1} = \frac{2}{N} \sum_{n=0}^{N/4-1} 2 x_n \cos[\pi(2k+1)(2n+1)/N]$$

for $0 \le k \le N/4 - 1$. The real form of the IDST is:

$$x_n = \sum_{k=0}^{N/4-1} 2 X_{2k+1} \cos[\pi(2k+1)(2n+1)/N]$$

for $0 \le n \le N/4 - 1$. Next, assume that $N = 2(2M+1)$. The real form of the DST is:

$$X_{2k+1} = \frac{2}{N} \sum_{n=0}^{M-1} 2 x_n \cos[\pi(2k+1)(2n+1)/N]$$

for $0 \le k \le M-1$. The real form of the IDST is:

$$x_n = \sum_{k=0}^{M-1} 2 X_{2k+1} \cos[\pi(2k+1)(2n+1)/N]$$

for $0 \le n \le M-1$. Note that the results for the DST and IDST are identical except for scaling.

We now prove Theorem 3.14. We prove the result for the DST for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 3.7, the ROCSZE symmetry of $X_k$, and the RSE-SO symmetry of $x_n$ as follows:

$$X_{2k+1} = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi(2k+1)(2n+1)/N]$$
$$= \frac{1}{N} \Big\{ \sum_{n=0}^{N/4-1} 2 x_n \cos[\pi(2k+1)(2n+1)/N] + \sum_{n=0}^{N/4-1} 2 x_{N/2-n-1} \cos[\pi(2k+1)(N-2n-1)/N] \Big\}$$
$$= \frac{2}{N} \sum_{n=0}^{N/4-1} 2 x_n \cos[\pi(2k+1)(2n+1)/N]$$

The results for the IDST follow immediately from Theorem 3.7 and the ROCSZE symmetry of $X_k$. Note that only one fourth of the RSE-SO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.14.

A fast, mixed radix algorithm for computing the RSE-SO symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSE symmetric FST. Note that an RSE-SO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ROCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the ROCSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.12 Let $X_k$ be an ROCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSOCS symmetric. The symmetries which occur in the splittings of the RSOCS sequence $X_{k,1}$ are identical to those in Lemma 3.2, with the addition that all sequences have R symmetry as well.

A mixed radix splitting tree diagram for an ROCSZE sequence is shown in Figure 3.5. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.5, and 3.10. These results show that each symmetry appearing in Figure 3.5 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSE-SO symmetric IFST, obtained as a special case of that for the RSE symmetric IFST.

Corollary 3.12 Assume $p = 2$. The inverse combine equation for Z and RSOCS sequences is:

$$x_{n,0} = \tilde{y}_{n,1}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.8 for arbitrary factors $p$.

We now prove Corollary 3.12. The inverse combine equation for Z and RSOCS sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.6 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RSE-SO sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.12.

The next corollary provides all of the forward combine equations for the RSE-SO symmetric FST, obtained as a special case of that for the RSE symmetric FST.

Corollary 3.13 Assume $p = 2$. The forward combine equation for Z and RSOCS sequences is:

$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$

[Figure 3.5: Splitting tree for RSE-SO symmetric FST. The root of the tree is the ROCSZE sequence.]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.9 for arbitrary factors $p$.

We now prove Corollary 3.13. The forward combine equation for Z and RSOCS sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.7 and use the RSE-SO symmetry of $x_n$ as follows:

$$y_{n,0} = (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0$$
$$\tilde{y}_{n,1} = (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}$$

This completes the proof of Corollary 3.13.
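Theorem 3.14 notes that the DST and IDST of an RSE-SO sequence are identical except for scaling. For $N = 4M$ the common kernel $\cos[\pi(2k+1)(2n+1)/N]$ is the kernel of the type-IV discrete cosine transform of length $M$. The following minimal sketch evaluates both real forms with one direct $O(M^2)$ summation routine and confirms the round trip; it is not the fast algorithm, and the names `quarter_cosine_sum`, `dst_rse_so`, and `idst_rse_so` are hypothetical.

```python
import math


def quarter_cosine_sum(v, N):
    """Return s_j = sum_m 2 v_m cos[pi (2j+1)(2m+1)/N] for j = 0..N/4-1."""
    Q = N // 4
    return [
        sum(2 * v[m] * math.cos(math.pi * (2 * j + 1) * (2 * m + 1) / N) for m in range(Q))
        for j in range(Q)
    ]


def dst_rse_so(x_quarter):
    """Real form of the DST for N = 4M: returns X_1, X_3, ..., X_{N/2-1}."""
    N = 4 * len(x_quarter)
    return [2.0 / N * s for s in quarter_cosine_sum(x_quarter, N)]


def idst_rse_so(X_odd):
    """Real form of the IDST for N = 4M: returns x_0, ..., x_{N/4-1}."""
    N = 4 * len(X_odd)
    return quarter_cosine_sum(X_odd, N)


x_quarter = [0.3, -1.2, 2.0, 0.7, -0.4]   # arbitrary test data: M = 5, N = 20
X_odd = dst_rse_so(x_quarter)
recovered = idst_rse_so(X_odd)
max_error = max(abs(a - b) for a, b in zip(recovered, x_quarter))
print(max_error)
```

Because both directions share `quarter_cosine_sum`, the only difference between the forward and inverse routines is the factor $2/N$.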

3.7 Real Composite Staggered Odd - Staggered Even (RSO-SE)

In this section, we will be concerned with the following symmetries:

Definition 3.12 A real composite staggered odd - staggered even (RSO-SE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n, \qquad x_{N/2-n-1} = x_n$$

Note that an RSO-SE sequence of length $N$ is also an RSO sequence of length $N$. An imaginary odd conjugate symmetric zero even term (IOCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^{k+1} X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 3.13 If $x_n$ is an RSO-SE sequence of length $N$, where $N$ is even, then its DST $X_k$ is an IOCSZE sequence of length $N$. If $X_k$ is an IOCSZE sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSO-SE sequence of length $N$.

We now prove Lemma 3.13. We will only prove the first assertion. Assume $x_n$ is an RSO-SE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSO sequence of length $N$, Lemma 3.6 implies that its DST $X_k$ is an IOCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an IOCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 3.10 and the RSO-SE symmetry of $x_n$ as follows:

$$X_k = -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N]$$

$$= -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_{N/2-n-1} \sin[\pi k(N-2n-1)/N]$$
$$= (-1)^{k+1} \Big\{ -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N] \Big\}$$

This completes the proof of Lemma 3.13.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DS-NS or DS-N boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RSO-SE sequence satisfies DS-NS boundary conditions for the computational domain $0 \le n \le N/4 - 1$. That is:

$$x_{N-1} = -x_0, \qquad x_{N/4-1} = x_{N/4}$$

Similarly, if $N = 2(2M+1)$, then an RSO-SE sequence satisfies DS-N boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = -x_0, \qquad x_{M-1} = x_{M+1}$$

Theorem 3.15 Let $x_n$ be an RSO-SE sequence and let $X_k$ be its IOCSZE symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DST is:

$$\mathrm{Im}(X_{2k+1}) = -\frac{2}{N} \sum_{n=0}^{N/4-1} 2 x_n \sin[\pi(2k+1)(2n+1)/N]$$

for $0 \le k \le N/4 - 1$. The real form of the IDST is:

$$x_n = -\sum_{k=0}^{N/4-1} 2\,\mathrm{Im}(X_{2k+1}) \sin[\pi(2k+1)(2n+1)/N]$$

for $0 \le n \le N/4 - 1$. Next, assume that $N = 2(2M+1)$. The real form of the DST is:

$$\mathrm{Im}(X_{2k+1}) = -\frac{2}{N} \Big\{ (-1)^k x_M + \sum_{n=0}^{M-1} 2 x_n \sin[\pi(2k+1)(2n+1)/N] \Big\}$$

for $0 \le k \le M$. The real form of the IDST is:

$$x_n = (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) - \sum_{k=0}^{M-1} 2\,\mathrm{Im}(X_{2k+1}) \sin[\pi(2k+1)(2n+1)/N]$$

for $0 \le n \le M$. Note that the results for the DST and IDST are identical except for scaling.

We now prove Theorem 3.15. We prove the result for the DST for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 3.10, the IOCSZE symmetry of $X_k$, and the RSO-SE symmetry of $x_n$ as follows:

$$\mathrm{Im}(X_{2k+1}) = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi(2k+1)(2n+1)/N]$$
$$= -\frac{1}{N} \Big\{ \sum_{n=0}^{N/4-1} 2 x_n \sin[\pi(2k+1)(2n+1)/N] + \sum_{n=0}^{N/4-1} 2 x_{N/2-n-1} \sin[\pi(2k+1)(N-2n-1)/N] \Big\}$$
$$= -\frac{2}{N} \sum_{n=0}^{N/4-1} 2 x_n \sin[\pi(2k+1)(2n+1)/N]$$

The results for the IDST follow immediately from Theorem 3.10 and the IOCSZE symmetry of $X_k$. Note that only one fourth of the RSO-SE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.15.

A fast, mixed radix algorithm for computing the RSO-SE symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSO symmetric FST. Note that an RSO-SE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an IOCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the IOCSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.14 Let $X_k$ be an IOCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is ISOCS symmetric. The symmetries which occur in the splittings of the ISOCS sequence $X_{k,1}$ are identical to those in Lemma 3.2, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an IOCSZE sequence is shown in Figure 3.6. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.7, and 3.10. These results show that each symmetry appearing in Figure 3.6 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSO-SE symmetric IFST, obtained as a special case of that for the RSO symmetric IFST.

Corollary 3.14 Assume $p = 2$. The inverse combine equation for Z and ISOCS sequences is:

$$x_{n,0} = \tilde{y}_{n,1}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.11 for arbitrary factors $p$.

We now prove Corollary 3.14. The inverse combine equation for Z and ISOCS sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.8 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RSO-SE sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.14.

The next corollary provides all of the forward combine equations for the RSO-SE symmetric FST, obtained as a special case of that for the RSO symmetric FST.

Corollary 3.15 Assume $p = 2$. The forward combine equation for Z and ISOCS sequences is:

$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$

[Figure 3.6: Splitting tree for RSO-SE symmetric FST. The root of the tree is the IOCSZE sequence.]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.12 for arbitrary factors $p$.

We now prove Corollary 3.15. The forward combine equation for Z and ISOCS sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.9 and use the RSO-SE symmetry of $x_n$ as follows:

$$y_{n,0} = (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0$$
$$\tilde{y}_{n,1} = (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}$$

This completes the proof of Corollary 3.15.
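Corollary 3.15 can be illustrated by applying the unreduced forward combine to an RSO-SE sequence and observing that half of the output vanishes. The sketch below rests on two assumptions that come from my reading of the surrounding text rather than from a definition visible in this section: that equation (3.4) has the form $y_{n,q} = (1/p)\,\omega_N^{-q(n+1/2)} \sum_{l=0}^{p-1} \omega_p^{-lq} x_{n,l}$ with $x_{n,l} = x_{lN/p+n}$, and that $\tilde{y}_{n,p/2} = \omega_{N/p}^{(n+1/2)/2}\, y_{n,p/2}$. The function names are hypothetical.

```python
import cmath
import math


def forward_combine(x, p):
    """Unreduced forward combine (assumed form of equation (3.4)).

    Returns y[n][q] = (1/p) w_N^(-q(n+1/2)) sum_l w_p^(-l q) x_{l N/p + n}.
    """
    N = len(x)
    L = N // p

    def w(base, exponent):
        return cmath.exp(2j * math.pi * exponent / base)

    return [
        [w(N, -q * (n + 0.5)) * sum(w(p, -l * q) * x[l * L + n] for l in range(p)) / p
         for q in range(p)]
        for n in range(L)
    ]


def extend_rso_se(quarter):
    """Extend x_0..x_{M-1} to a full RSO-SE sequence of length N = 4M."""
    half = list(quarter) + list(reversed(quarter))   # x_{N/2-n-1} = x_n
    return half + [-v for v in reversed(half)]       # x_{N-n-1} = -x_n


quarter = [0.5, -1.0, 2.5]               # arbitrary test data: M = 3, N = 12
x = extend_rso_se(quarter)
N = len(x)
y = forward_combine(x, 2)

# Assumed definition of the tilde: y~_{n,1} = w_{N/2}^((n+1/2)/2) * y_{n,1}.
y_tilde_1 = [cmath.exp(2j * math.pi * (n + 0.5) / 2 / (N // 2)) * y[n][1]
             for n in range(N // 2)]

y0_is_zero = all(abs(y[n][0]) < 1e-12 for n in range(N // 2))
y1_matches_x = all(abs(y_tilde_1[n] - x[n]) < 1e-12 for n in range(N // 2))
print(y0_is_zero, y1_matches_x)
```

Under these assumptions the Z branch carries no data and the ISOCS branch is a copy of the input, so a radix-2 step at the root of the splitting tree costs no arithmetic.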

3.8 Real Composite Staggered Odd - Staggered Odd (RSO-SO)

In this section, we will be concerned with the following symmetries:

Definition 3.13 A real composite staggered odd - staggered odd (RSO-SO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n, \qquad x_{N/2-n-1} = -x_n$$

Note that an RSO-SO sequence of length $N$ is also an RSO sequence of length $N$. An imaginary odd conjugate symmetric zero odd term (IOCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^k X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 3.15 If $x_n$ is an RSO-SO sequence of length $N$, where $N$ is even, then its DST $X_k$ is an IOCSZO sequence of length $N$. If $X_k$ is an IOCSZO sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSO-SO sequence of length $N$.

We now prove Lemma 3.15. We will only prove the first assertion. Assume $x_n$ is an RSO-SO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSO sequence of length $N$, Lemma 3.6 implies that its DST $X_k$ is an IOCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an IOCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 3.10 and the RSO-SO symmetry of $x_n$ as follows:

$$X_k = -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N]$$

$$= -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_{N/2-n-1} \sin[\pi k(N-2n-1)/N]$$
$$= (-1)^k \Big\{ -\frac{i}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N] \Big\}$$

This completes the proof of Lemma 3.15.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DS-D boundary conditions. Note that if $N = 2(2M+1)$, then an RSO-SO sequence satisfies DS-D boundary conditions for the computational domain $0 \le n \le M-1$. That is:

$$x_{N-1} = -x_0, \qquad x_M = 0$$

Theorem 3.16 Let $x_n$ be an RSO-SO sequence and let $X_k$ be its IOCSZO symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DST is:

$$\mathrm{Im}(X_{2k}) = -\frac{2}{N} \sum_{n=0}^{M-1} 2 x_n \sin[2\pi k(2n+1)/N]$$

for $1 \le k \le M$. The real form of the IDST is:

$$x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin[2\pi k(2n+1)/N]$$

for $0 \le n \le M-1$.

We now prove Theorem 3.16. The result for the DST follows from Theorem 3.10, the IOCSZO symmetry of $X_k$, and the RSO-SO symmetry of $x_n$ as follows:

$$\mathrm{Im}(X_{2k}) = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[2\pi k(2n+1)/N]$$

$$= -\frac{1}{N} \Big\{ \sum_{n=0}^{M-1} 2 x_n \sin[2\pi k(2n+1)/N] + \sum_{n=0}^{M-1} 2 x_{N/2-n-1} \sin[2\pi k(N-2n-1)/N] \Big\}$$
$$= -\frac{2}{N} \sum_{n=0}^{M-1} 2 x_n \sin[2\pi k(2n+1)/N]$$

The result for the IDST follows immediately from Theorem 3.10 and the IOCSZO symmetry of $X_k$. Note that only one fourth of the RSO-SO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.16.

A fast, mixed radix algorithm for computing the RSO-SO symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSO symmetric FST. Note that an RSO-SO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an IOCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the IOCSZO sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.16 Let $X_k$ be an IOCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is IOCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the IOCS sequence $X_{k,0}$ are identical to those in Lemma 3.2, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an IOCSZO sequence is shown in Figure 3.7. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.7, and 3.10.


Figure 3.7: Splitting tree for RSO-SO symmetric FST


These results show that each symmetry appearing in Figure 3.7 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference. The next corollary provides all of the inverse combine equations for the RSO-SO symmetric IFST, obtained as a special case of that for the RSO symmetric IFST.

Corollary 3.16 Assume p = 2. The inverse combine equation for IOCS and Z sequences is:

$$x_{n,0} = y_{n,0}$$

for the lower half-range of n. The inverse combine equations for the remaining symmetries are provided by Theorem 3.11 for arbitrary factors p.

We now prove Corollary 3.16. The inverse combine equation for IOCS and Z sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where p = 2. Thus, we apply Corollary 3.8 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the RSO-SO sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.16.

The next corollary provides all of the forward combine equations for the RSO-SO symmetric FST, obtained as a special case of that for the RSO symmetric FST.

Corollary 3.17 Assume p = 2. The forward combine equation for IOCS and Z sequences is:

$$y_{n,0} = x_{n,0}, \qquad y_{n,1} = 0$$

for the lower half-range of n. The forward combine equations for the remaining symmetries are provided by Theorem 3.12 for arbitrary factors p.

We now prove Corollary 3.17. The forward combine equation for IOCS and Z sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where p = 2. Thus, we apply Corollary 3.9 and use the RSO-SO symmetry of $x_n$ as follows:


$$y_{n,0} = (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}$$

$$y_{n,1} = (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0$$

This completes the proof of Corollary 3.17.


3.9 Tables of Symmetries

Table 3.1: Symmetries in the IDST

Acro     Symmetry                  Sequence                   DST
-------  ------------------------  -------------------------  ---------------------------
         Periodic                  $x_{N+n} = x_n$            $X_{N+k} = -X_k$
R        Real                      $\bar{x}_n = x_n$          $X_{N-k} = -\bar{X}_k$
RSE      Real                      $\bar{x}_n = x_n$          $\bar{X}_k = X_k$
         Staggered Even            $x_{N-n-1} = x_n$          $X_{N-k} = -X_k$
RSO      Real                      $\bar{x}_n = x_n$          $\bar{X}_k = -X_k$
         Staggered Odd             $x_{N-n-1} = -x_n$         $X_{N-k} = X_k$
RSE-SE   Real Composite            $\bar{x}_n = x_n$          $\bar{X}_k = X_k$
         S.Even-S.Even             $x_{N-n-1} = x_n$          $X_{N-k} = -X_k$
         (N even)                  $x_{N/2-n-1} = x_n$        $X_k = (-1)^k X_k$
RSE-SO   Real Composite            $\bar{x}_n = x_n$          $\bar{X}_k = X_k$
         S.Even-S.Odd              $x_{N-n-1} = x_n$          $X_{N-k} = -X_k$
         (N even)                  $x_{N/2-n-1} = -x_n$       $X_k = (-1)^{k+1} X_k$
RSO-SE   Real Composite            $\bar{x}_n = x_n$          $\bar{X}_k = -X_k$
         S.Odd-S.Even              $x_{N-n-1} = -x_n$         $X_{N-k} = X_k$
         (N even)                  $x_{N/2-n-1} = x_n$        $X_k = (-1)^{k+1} X_k$
RSO-SO   Real Composite            $\bar{x}_n = x_n$          $\bar{X}_k = -X_k$
         S.Odd-S.Odd               $x_{N-n-1} = -x_n$         $X_{N-k} = X_k$
         (N even)                  $x_{N/2-n-1} = -x_n$       $X_k = (-1)^k X_k$


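Table 3.2: Symmetries in the DST

Acro     Sym                       Sequence                                 IDST
-------  ------------------------  ---------------------------------------  ------------------------------------------------
         Periodic                  $X_{N+k} = -X_k$                         $x_{N+n} = x_n$
OCS      Odd Conj Sym              $X_{N-k} = -\bar{X}_k$                   $\bar{x}_n = x_n$
SOCS     Stag Odd Conj Sym         $X_{N-k-1} = -\bar{X}_k$                 $\bar{x}_n = w_N^{-(n+1/2)} x_n$
                                                                            $\tilde{x}_n = w_N^{-(n+1/2)/2} x_n$
OCSIS    OCS Indcd Interseq Sym    $X_{k,p-q} = -\bar{X}_{N/p-k-1,q}$       $\bar{y}_{n,p-q} = w_{N/p}^{-(n+1/2)} y_{n,q}$
SOCSIS   SOCS Indcd Interseq Sym   $X_{k,p-q-1} = -\bar{X}_{N/p-k-1,q}$     $\bar{y}_{n,p-q-1} = w_{N/p}^{-(n+1/2)} y_{n,q}$
R        Real                      $\bar{X}_k = X_k$                        $\bar{x}_{N-n-1} = x_n$
I        Imag                      $\bar{X}_k = -X_k$                       $\bar{x}_{N-n-1} = -x_n$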


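Table 3.2: (contd.)

Acro     Sym                       Sequence                   IDST
-------  ------------------------  -------------------------  -------------------------
ROCSZO   ROCS & Zero               $\bar{X}_k = X_k$          $\bar{x}_n = x_n$
         Odd Terms                 $X_{N-k} = -X_k$           $x_{N-n-1} = x_n$
         (N even)                  $X_k = (-1)^k X_k$         $x_{N/2-n-1} = x_n$
ROCSZE   ROCS & Zero               $\bar{X}_k = X_k$          $\bar{x}_n = x_n$
         Even Terms                $X_{N-k} = -X_k$           $x_{N-n-1} = x_n$
         (N even)                  $X_k = (-1)^{k+1} X_k$     $x_{N/2-n-1} = -x_n$
IOCSZE   IOCS & Zero               $\bar{X}_k = -X_k$         $\bar{x}_n = x_n$
         Even Terms                $X_{N-k} = X_k$            $x_{N-n-1} = -x_n$
         (N even)                  $X_k = (-1)^{k+1} X_k$     $x_{N/2-n-1} = x_n$
IOCSZO   IOCS & Zero               $\bar{X}_k = -X_k$         $\bar{x}_n = x_n$
         Odd Terms                 $X_{N-k} = X_k$            $x_{N-n-1} = -x_n$
         (N even)                  $X_k = (-1)^k X_k$         $x_{N/2-n-1} = -x_n$
Z        Zero                      $X_k = 0$                  $x_n = 0$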


Chapter 4

Software Implementation and Performance

4.1 Introduction

We begin this chapter by estimating the number of lines of FORTRAN code required to implement all of the FFT and FST algorithms presented in the preceding chapters. There are 5 basic transforms required to address all boundary conditions. These are the R, RE, RO FFTs and the RSE, RSO FSTs. We have excluded all of the composite symmetries because they are special cases of the 5 basic transforms listed above. Note also that by a basic transform we mean both the forward and inverse directions, since one direction is seldom useful without the other. For each basic transform, we have identified a need for 4 values of the radix p, namely p = 2, 3, 4, 5. As will be explained in Section 4.6, we have found that larger values of p lead to inefficient implementations on most vector computers. Each basic transform may be implemented in-place, producing the forward transform in a permuted order, or it may utilize additional storage, producing the forward transform in natural order. In either case, we require that the inverse transform be produced in natural order because the original data is provided in natural order. On the other hand, the forward transform may be produced in a permuted order because it is usually followed by an inverse transform which accepts its input in that same order.

We will be focusing our attention on serial vector processors. However, these algorithms have excellent potential for parallelization. Shared memory machine architectures generally require only simple modifications to the serial code. Distributed memory machine architectures, on the other hand, require significantly different data management techniques in order to minimize interprocessor communication [11]. Thus, we have identified a need for at least 2 variations of these codes corresponding to these broad classes of machine architectures.

If we combine the independent options discussed above, we obtain at least 5 x 4 x 2 x 2 = 80 variations of the basic transforms. From prototype software, which will be described in Section 4.5, we estimate that each variation requires approximately 750 lines of FORTRAN code. Thus, the entire package of 80 variations requires approximately 60K lines of FORTRAN code.

In view of the estimated size of the complete software package, we have selected just one of the basic transforms to implement and test in detail. The transform we have selected is the RO FFT. There are two reasons for this selection. The first reason is that if $x_n$ is an RO sequence of length N, where N is even, then:

$$x_0 = 0, \qquad x_{N/2} = 0$$

Thus, the RO FFT presents the additional problem of eliminating all computations involving zeros in the data. The RO FFT is unique with this problem. The second reason is that there is a well known implementation of the prepost-processing algorithm for the RO FFT: VFFTPK [5, 7]. One of our goals is to compare the performance of the compact algorithms with their prepost-processing counterparts. Thus, we would want to avoid selecting the RSE-SO FST, for example.

The remainder of this chapter will be concerned with the implementation and performance of software for the RO FFT. Furthermore, we have restricted our attention to the forward transform, since there are no new issues involved in the inverse transform. The implementation process begins by developing simplified forms of the forward combine equations for a specific value of the radix p. We have restricted our attention to p = 2, 3, 4 because this is sufficient to illustrate the breadth of difficulties involved in implementing this algorithm, and also provides enough flexibility for conducting a thorough performance comparison to VFFTPK. We have also restricted our attention to an in-place algorithm for a serial vector processor. Thus, for each value of p we find storage patterns for the data which allow the combine equations to be executed in-place on this machine architecture. The general design of the software is then described, with emphasis on the automated generation of splitting trees. Next, we present the results of performance tests of this software, using VFFTPK as a baseline. Results are presented for both an IBM 3090J and a Cray Y-MP8/864. These results are analyzed in detail, and a timing model is presented. Finally, we wish to automate as much of the implementation process as possible, in view of the estimated size of the entire package. We describe how Mathematica [12] can be used to automate most of the steps described above.
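The two structural zeros of an RO sequence, and the ICS symmetry of its transform that the following sections rely on, can be checked with a short Python sketch. This is an illustration only: the DFT normalization (1/N) and sign convention, the helper names, and the sample values are assumptions, and the O(N^2) sum is used purely for clarity.

```python
import cmath

def dft(x):
    """Plain O(N^2) DFT, X_k = (1/N) * sum_n x_n exp(-2 pi i n k / N) (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def make_ro(values, N):
    """Build a real odd sequence of even length N from x_1 .. x_{N/2-1}."""
    x = [0.0] * N
    for n, v in enumerate(values, start=1):
        x[n] = v
        x[N - n] = -v          # oddness: x_{N-n} = -x_n
    return x

N = 16
x = make_ro([1.0, -2.0, 0.5, 3.0, -1.5, 0.25, 2.0], N)
X = dft(x)
```

With this construction `x[0]` and `x[N // 2]` are never written and remain zero, and every `X[k]` is pure imaginary to rounding error, which is why only N/2 - 1 real values need to be stored.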


4.2 The Radix-2 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-2 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.1 Assume p = 2. The forward combine equation for I sequences is:

$$y_{n,q} = w_N^{-nq}\left(x_{n,0} + (-1)^{q+1}\,\bar{x}_{-n,1}\right)/2 \qquad (4.1)$$

for $0 \le n \le N/4$ if N/2 is even, or $0 \le n \le (N-2)/4$ if N/2 is odd, and $0 \le q \le 1$. For n = 0 and q = 0, 1 equation (4.1) reduces to:

$$y_{0,0} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,1}))/2$$

$$y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,1}))/2$$

If N/2 is even, then for n = N/4 and q = 0, 1 equation (4.1) reduces to:

$$y_{N/4,0} = i(2\,\mathrm{Im}(x_{N/4,0}))/2$$

$$y_{N/4,1} = i(-2\,\mathrm{Re}(x_{N/4,0}))/2$$

For the remaining values of n and q = 0, 1, equation (4.1) reduces to:

$$y_{n,0} = \{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,1})]\}/2$$

$$y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,1})]\}/2$$

The forward combine equation for ICS and ISCS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and q = 0. In addition:

$$\tilde{y}_{n,1} = (x_{n,0} + x_{-n,1})/2 \qquad (4.2)$$

for $0 \le n \le N/4$ if N/2 is even, or $0 \le n \le (N-2)/4$ if N/2 is odd. For n = 0 equation (4.2) reduces to:

$$\tilde{y}_{0,1} = 0$$

If N/2 is even, then for n = N/4 equation (4.2) reduces to:

$$\tilde{y}_{N/4,1} = (2x_{N/4,0})/2$$


The forward combine equation for ISCSIS sequences is:

$$y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} - i\,\tilde{x}_{-n,1}\right)/2 \qquad (4.3)$$

for $0 \le n \le N/4$ if N/2 is even, or $0 \le n \le (N-2)/4$ if N/2 is odd. For n = 0 equation (4.3) reduces to:

$$y_{0,0} = i(-\tilde{x}_{0,1})/2$$

If N/2 is even, then for n = N/4 equation (4.3) reduces to:

$$y_{N/4,0} = i(-\sqrt{2}\,\tilde{x}_{N/4,0})/2$$

We now prove Corollary 4.1. We will only provide the key steps of selected results which may not be obvious. In most of these cases, this involves the application of one or more of the symmetries summarized in Table 2.2. The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{0,1} = x_{N/2}$ are both pure imaginary. The forward combine equation for ICS and ISCS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{0,1} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $x_{0,0} = x_0 = 0$. For n = N/4 we proceed as follows:

$$y_{N/4,0} = w_8^{-1}\left(\tilde{x}_{N/4,0} - w_8^{2}\,\tilde{x}_{-N/4,1}\right)/2 = \left(w_8^{-1}\tilde{x}_{N/4,0} - w_8\,\tilde{x}_{-N/4,1}\right)/2 = i(-\sqrt{2}\,\tilde{x}_{N/4,0})/2$$

This completes the proof of Corollary 4.1.

Data storage patterns for all of the combine equations in Corollary 4.1 are shown in Figures 4.1 through 4.8. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/2 being odd because it is identical except for the absence of the variables corresponding to n = N/4.
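The radix-2 combine equation for I sequences can be checked numerically. The sketch below is a hedged illustration in Python, not the thesis implementation. It assumes the conventions $w_N = e^{2\pi i/N}$, $x_n = \sum_k X_k w_N^{nk}$, $X_{k,q} = X_{2k+q}$, $x_{n,q} = x_{n+qN/2}$, and takes $y_{n,q}$ to be the IDFT of $X_{k,q}$; under those assumptions the combine formula $y_{n,q} = w_N^{-nq}(x_{n,0} + (-1)^{q+1}\bar{x}_{-n,1})/2$ should reproduce the directly computed half-length transforms while reading only $x_n$ for $N/4 \le n \le N/2$ and $0 \le n \le N/4$. All function names are hypothetical.

```python
import cmath
import random

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (assumed sign convention)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Unnormalised inverse DFT: x_n = sum_k X_k w_N^{nk}."""
    N = len(X)
    return [sum(X[k] * w(N, n * k) for k in range(N)) for n in range(N)]

def combine_radix2(x, n, q):
    """Combine formula with x_{n,0} = x_n and x_{-n,1} = x_{N/2-n}."""
    N = len(x)
    return w(N, -n * q) * (x[n] + (-1) ** (q + 1) * x[N // 2 - n].conjugate()) / 2

random.seed(1)
N = 16
X = [1j * random.uniform(-1, 1) for _ in range(N)]   # an I (pure imaginary) sequence
x = idft(X)
y_direct = [idft(X[q::2]) for q in range(2)]         # IDFT of X_{2k+q}
max_err = max(abs(combine_radix2(x, n, q) - y_direct[q][n])
              for q in range(2) for n in range(N // 4 + 1))
```

The special case n = N/4, q = 0 collapses to $i\,\mathrm{Im}(x_{N/4,0})$, matching the reduced form stated in Corollary 4.1.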


The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-2 RO FFT in Table 4.1. Table 4.1 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-2 RO FFT, and is analogous to Figure 4.9. Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.1 through 4.8. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.1 shows the storage pattern for each factor of 2 in the RO FFT for N = 16. Each radix-2 splitting of an ICS sequence results in an ICS sequence followed by an ISCS sequence. However, the IDFT of an ICS sequence of length 2 is identically zero and is not stored. Each radix-2 splitting of an ISCS sequence results in a dual pair of I sequences, and the IDFT of only one of these is stored. Finally, each radix-2 splitting of an I sequence results in two I sequences.
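The three splitting rules just stated are enough to generate the radix-2 splitting tree mechanically. The following Python sketch is illustrative only (the thesis software is FORTRAN, and the function name and return layout here are hypothetical); it applies the rules level by level and drops the ICS sequence of length 2, which is not stored.

```python
def radix2_splitting_tree(N):
    """Return [(length, [(symmetry, repetitions), ...]), ...] for N a power of 2."""
    counts = {"ICS": 1, "ISCS": 0, "I": 0}
    rows = []
    length = N
    while length >= 2:
        # An ICS sequence of length 2 is identically zero and is not stored.
        stored = {s: c for s, c in counts.items()
                  if c > 0 and not (s == "ICS" and length == 2)}
        rows.append((length, list(stored.items())))
        counts = {
            "ICS": counts["ICS"],                       # ICS -> ICS followed by ISCS
            "ISCS": counts["ICS"],
            "I": counts["ISCS"] + 2 * counts["I"],      # ISCS -> one stored I; I -> two I
        }
        length //= 2
    return rows

tree = radix2_splitting_tree(16)
```

For N = 16 the generated rows agree with the entries of Table 4.1: one ICS at length 16, ICS and ISCS at length 8, ICS, ISCS and one I at length 4, and ISCS with three I sequences at length 2.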


Figure 4.1: Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = N/4

Figure 4.2: Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = 1


Figure 4.3: Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 0

Figure 4.4: Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = N/4


Figure 4.5: Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 1


Figure 4.6: Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 0


Figure 4.7: Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = N/4


Figure 4.8: Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 1


Figure 4.9: Splitting tree for the radix-2 RO FFT for N = 16


Table 4.1: Splitting Tree for the Radix-2 RO FFT for N = 16

Length  Factor  Symmetry  Repetitions
------  ------  --------  -----------
16      2       ICS       1
8       2       ICS       1
                ISCS      1
4       2       ICS       1
                ISCS      1
                I         1
2       2       ISCS      1
                I         3


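4.3 The Radix-4 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-4 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.2 Assume p = 4. The forward combine equation for I sequences is:

$$y_{n,q} = w_N^{-nq}\left(x_{n,0} + (-1)^{q+1}\,\bar{x}_{-n,2} + w_4^{-q}\,x_{n,1} - w_4^{q}\,\bar{x}_{-n,1}\right)/4 \qquad (4.4)$$

for $0 \le n \le N/8$ if N/4 is even, or $0 \le n \le (N-4)/8$ if N/4 is odd, and $0 \le q \le 3$. For n = 0 and q = 0, 1, 2, 3 equation (4.4) reduces to:

$$y_{0,0} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,2}) + 2\,\mathrm{Im}(x_{0,1}))/4$$

$$y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,2}) - 2\,\mathrm{Re}(x_{0,1}))/4$$

$$y_{0,2} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,2}) - 2\,\mathrm{Im}(x_{0,1}))/4$$

$$y_{0,3} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,2}) + 2\,\mathrm{Re}(x_{0,1}))/4$$

If N/4 is even, then for n = N/8 and q = 0, 1, 2, 3 equation (4.4) reduces to:

$$y_{N/8,0} = i(2\,\mathrm{Im}(x_{N/8,0}) + 2\,\mathrm{Im}(x_{N/8,1}))/4$$

$$y_{N/8,1} = -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) - \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) + \mathrm{Im}(x_{N/8,1}))/4$$

$$y_{N/8,2} = i(-2\,\mathrm{Re}(x_{N/8,0}) + 2\,\mathrm{Re}(x_{N/8,1}))/4$$

$$y_{N/8,3} = -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) + \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) - \mathrm{Im}(x_{N/8,1}))/4$$

For the remaining values of n and q = 0, 1, 2, 3, equation (4.4) reduces to:

$$y_{n,0} = \{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,2}) + \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,2}) + \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})]\}/4$$

$$y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,2}) + \mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,2}) - \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1})]\}/4$$

$$y_{n,2} = w_N^{-2n}\{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,2}) - \mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,2}) - \mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1})]\}/4$$

$$y_{n,3} = w_N^{-3n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,2}) - \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,2}) + \mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1})]\}/4$$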


The forward combine equation for ICS, ISCS, and ICSIS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and $0 \le q \le 1$. In addition:

$$\tilde{y}_{n,2} = (x_{n,0} - x_{-n,2} - x_{n,1} + x_{-n,1})/4 \qquad (4.5)$$

for $0 \le n \le N/8$ if N/4 is even, or $0 \le n \le (N-4)/8$ if N/4 is odd. For n = 0 equation (4.5) reduces to:

$$\tilde{y}_{0,2} = 0$$

If N/4 is even, then for n = N/8 equation (4.5) reduces to:

$$\tilde{y}_{N/8,2} = (2(x_{N/8,0} - x_{N/8,1}))/4$$

The forward combine equation for ISCSIS sequences is:

$$y_{n,q} = w_N^{-n(q+1/2)}\left(\tilde{x}_{n,0} + i(-1)^{q+1}\,\tilde{x}_{-n,2} + w_4^{-q-1/2}\,\tilde{x}_{n,1} - w_4^{q+1/2}\,\tilde{x}_{-n,1}\right)/4 \qquad (4.6)$$

for $0 \le n \le N/8$ if N/4 is even, or $0 \le n \le (N-4)/8$ if N/4 is odd, and $0 \le q \le 1$. For n = 0 and q = 0, 1 equation (4.6) reduces to:

$$y_{0,0} = i(-\tilde{x}_{0,2} - \sqrt{2}\,\tilde{x}_{0,1})/4$$

y_{0,1} = i(

Table 2.2. The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{0,2} = x_{N/2}$ are both pure imaginary. For n = N/8 and q = 1 we proceed as follows:

$$y_{N/8,1} = w_8^{-1}\left(x_{N/8,0} + \bar{x}_{-N/8,2} + w_4^{-1}x_{N/8,1} - w_4^{1}\bar{x}_{-N/8,1}\right)/4$$

$$= \left(w_8^{-1}x_{N/8,0} - w_8^{1}\bar{x}_{-N/8,1} + w_4^{-1}\left(w_8^{-1}x_{N/8,1} + w_8^{1}\bar{x}_{-N/8,2}\right)\right)/4$$

$$= i\left(2\,\mathrm{Im}(w_8^{-1}x_{N/8,0}) - 2\,\mathrm{Re}(w_8^{-1}x_{N/8,1})\right)/4$$

$$= i\left(2\,\mathrm{Im}(w_8^{-1})\mathrm{Re}(x_{N/8,0}) + 2\,\mathrm{Re}(w_8^{-1})\mathrm{Im}(x_{N/8,0}) - 2\,\mathrm{Re}(w_8^{-1})\mathrm{Re}(x_{N/8,1}) + 2\,\mathrm{Im}(w_8^{-1})\mathrm{Im}(x_{N/8,1})\right)/4$$

$$= -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) - \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) + \mathrm{Im}(x_{N/8,1}))/4$$

The forward combine equation for ICS, ISCS, and ICSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{0,2} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $x_{0,0} = x_0 = 0$. For n = N/8 and q = 1 we proceed as follows:

$$y_{N/8,1} = w_{16}^{-3}\left(\tilde{x}_{N/8,0} + i\,\tilde{x}_{-N/8,2} + w_8^{-3}\tilde{x}_{N/8,1} - w_8^{3}\tilde{x}_{-N/8,1}\right)/4$$

$$= \left(w_{16}^{-3}\tilde{x}_{N/8,0} - w_{16}^{3}\tilde{x}_{-N/8,1} + w_{16}^{-9}\tilde{x}_{N/8,1} + w_{16}^{1}\tilde{x}_{-N/8,2}\right)/4$$

$$= \left(\tilde{x}_{N/8,0}(w_{16}^{-3} - w_{16}^{3}) + \tilde{x}_{N/8,1}(-w_{16}^{-1} + w_{16}^{1})\right)/4$$

$$= i\left(\tilde{x}_{N/8,0}(-2\,\mathrm{Im}(w_{16}^{3})) - \tilde{x}_{N/8,1}(-2\,\mathrm{Im}(w_{16}^{1}))\right)/4$$

$$= i\left((-2\sin(3\pi/8))\,\tilde{x}_{N/8,0} + (2\sin(\pi/8))\,\tilde{x}_{N/8,1}\right)/4$$

This completes the proof of Corollary 4.2.

Data storage patterns for all of the combine equations in Corollary 4.2 are shown in Figures 4.10 through 4.18. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/4 being odd because it is identical except for the absence of the variables corresponding to n = N/8.

The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-4 RO FFT in Table 4.2. Table 4.2 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-4 RO FFT, and is analogous to Figure 2.4. Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.10 through 4.18. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.2 shows the storage pattern for each factor of 4 in the RO FFT for N = 64.
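The n = N/8 reductions for radix-4 I sequences are pure algebra and can be cross-checked against the general combine formula $y_{n,q} = w_N^{-nq}(x_{n,0} + (-1)^{q+1}\bar{x}_{-n,2} + w_4^{-q}x_{n,1} - w_4^{q}\bar{x}_{-n,1})/4$. The Python sketch below is a hedged illustration under two assumptions: $w_N = e^{2\pi i/N}$, and the index convention $x_{n,q} = x_{n+qN/4}$, so that at n = N/8 one has $x_{-n,1} = x_{n,0}$ and $x_{-n,2} = x_{n,1}$. Function names and sample values are hypothetical.

```python
import cmath
import math
import random

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (assumed sign convention)."""
    return cmath.exp(2j * cmath.pi * e / N)

def combine_radix4(N, n, q, xn0, xn1, xm1, xm2):
    """General radix-4 combine for I sequences; xm1, xm2 stand for x_{-n,1}, x_{-n,2}."""
    return w(N, -n * q) * (xn0 + (-1) ** (q + 1) * xm2.conjugate()
                           + w(4, -q) * xn1 - w(4, q) * xm1.conjugate()) / 4

def reduced_n_eighth(a, b):
    """Closed forms for n = N/8, with a = x_{N/8,0} and b = x_{N/8,1}."""
    r2 = math.sqrt(2)
    return [1j * (2 * a.imag + 2 * b.imag) / 4,
            -1j * r2 * (a.real - a.imag + b.real + b.imag) / 4,
            1j * (-2 * a.real + 2 * b.real) / 4,
            -1j * r2 * (a.real + a.imag + b.real - b.imag) / 4]

random.seed(2)
N = 32
a = complex(random.uniform(-1, 1), random.uniform(-1, 1))
b = complex(random.uniform(-1, 1), random.uniform(-1, 1))
# At n = N/8 the assumed index convention gives x_{-n,1} = a and x_{-n,2} = b.
general = [combine_radix4(N, N // 8, q, a, b, a, b) for q in range(4)]
closed = reduced_n_eighth(a, b)
max_err = max(abs(g - c) for g, c in zip(general, closed))
```

The closed forms use only real additions and a single multiplication by $\sqrt{2}$ for q = 1 and q = 3, which is the practical reason for treating n = N/8 separately.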


Figure 4.10: Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 0

Figure 4.11: Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = N/8


Figure 4.12: Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 1


Figure 4.13: Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 0

Figure 4.14: Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = N/8


Figure 4.15: Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 1


Figure 4.16: Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 0


Figure 4.17: Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = N/8


Figure 4.18: Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 1


Table 4.2: Splitting Tree for the Radix-4 RO FFT for N = 64

Length  Factor  Symmetry  Repetitions
------  ------  --------  -----------
64      4       ICS       1
16      4       ICS       1
                ISCS      1
                I         1
4       4       ICS       1
                ISCS      1
                I         7


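4.4 The Radix-3 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-3 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.3 Assume p = 3. The forward combine equation for I sequences is:

$$y_{n,q} = w_N^{-nq}\left(x_{n,0} + w_3^{-q}\,x_{n,1} - w_3^{q}\,\bar{x}_{-n,1}\right)/3 \qquad (4.7)$$

for $0 \le n \le N/6$ if N/3 is even, or $0 \le n \le (N-3)/6$ if N/3 is odd, and $0 \le q \le 2$. For n = 0 and q = 0, 1, 2 equation (4.7) reduces to:

$$y_{0,0} = i(\mathrm{Im}(x_{0,0}) + 2\,\mathrm{Im}(x_{0,1}))/3$$

$$y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \sqrt{3}\,\mathrm{Re}(x_{0,1}) - \mathrm{Im}(x_{0,1}))/3$$

$$y_{0,2} = i(\mathrm{Im}(x_{0,0}) + \sqrt{3}\,\mathrm{Re}(x_{0,1}) - \mathrm{Im}(x_{0,1}))/3$$

If N/3 is even, then for n = N/6 and q = 0, 1, 2 equation (4.7) reduces to:

$$y_{N/6,0} = i(2\,\mathrm{Im}(x_{N/6,0}) + \mathrm{Im}(x_{N/6,1}))/3$$

$$y_{N/6,1} = i(\mathrm{Im}(x_{N/6,0}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1}))/3$$

$$y_{N/6,2} = i(\mathrm{Im}(x_{N/6,1}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,0}))/3$$

For the remaining values of n and q = 0, 1, 2, equation (4.7) reduces to:

$$y_{n,0} = \{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})]\}/3$$

$$y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + (1/2)(\mathrm{Re}(x_{-n,1}) - \mathrm{Re}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1})) + i[\mathrm{Im}(x_{n,0}) - (1/2)(\mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})) - (\sqrt{3}/2)(\mathrm{Re}(x_{-n,1}) + \mathrm{Re}(x_{n,1}))]\}/3$$

$$y_{n,2} = w_N^{-2n}\{\mathrm{Re}(x_{n,0}) + (1/2)(\mathrm{Re}(x_{-n,1}) - \mathrm{Re}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Im}(x_{-n,1}) - \mathrm{Im}(x_{n,1})) + i[\mathrm{Im}(x_{n,0}) - (1/2)(\mathrm{Im}(x_{-n,1}) + \mathrm{Im}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Re}(x_{-n,1}) + \mathrm{Re}(x_{n,1}))]\}/3$$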


The forward combine equation for ICS and ICSIS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and $0 \le q \le 1$. The forward combine equation for ISCS and ISCSIS sequences is:

$$y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} + w_3^{-1/2}\,\tilde{x}_{n,1} - w_3^{1/2}\,\tilde{x}_{-n,1}\right)/3 \qquad (4.8)$$

for $0 \le n \le N/6$ if N/3 is even, or $0 \le n \le (N-3)/6$ if N/3 is odd. For n = 0 equation (4.8) reduces to:

$$y_{0,0} = i(-\sqrt{3}\,\tilde{x}_{0,1})/3$$

If N/3 is even, then for n = N/6 equation (4.8) reduces to:

$$y_{N/6,0} = i(-\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1})/3$$

For the remaining values of n equation (4.8) reduces to:

$$y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} + (1/2)(\tilde{x}_{n,1} - \tilde{x}_{-n,1}) - i(\sqrt{3}/2)(\tilde{x}_{n,1} + \tilde{x}_{-n,1})\right)/3$$

In addition:

$$\tilde{y}_{n,1} = (\tilde{x}_{n,0} - \tilde{x}_{n,1} + \tilde{x}_{-n,1})/3 \qquad (4.9)$$

for $0 \le n \le N/6$ if N/3 is even, or $0 \le n \le (N-3)/6$ if N/3 is odd. For n = 0 equation (4.9) reduces to:

$$\tilde{y}_{0,1} = 0$$

If N/3 is even, then for n = N/6 equation (4.9) reduces to:

$$\tilde{y}_{N/6,1} = (2\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1})/3$$

We now prove Corollary 4.3. We will only provide the key steps of selected results which may not be obvious. In most of these cases, this involves the application of one or more of the symmetries summarized in Table 2.2. The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{N/6,1} = x_{N/2}$ are both pure imaginary. For n = N/6 and q = 1 we proceed as follows:

$$y_{N/6,1} = w_6^{-1}\left(x_{N/6,0} + w_3^{-1}x_{N/6,1} - w_3^{1}\bar{x}_{-N/6,1}\right)/3$$

$$= \left(w_6^{-1}x_{N/6,0} - w_6^{1}\bar{x}_{-N/6,1} + w_2^{-1}x_{N/6,1}\right)/3$$

$$= i\left(2\,\mathrm{Im}(w_6^{-1}x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3$$

$$= i\left(2\,\mathrm{Im}(w_6^{-1})\mathrm{Re}(x_{N/6,0}) + 2\,\mathrm{Re}(w_6^{-1})\mathrm{Im}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3$$

$$= i\left(\mathrm{Im}(x_{N/6,0}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3$$


The forward combine equation for ICS and ICSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{N/6,1} = x_{N/2} = 0$. The forward combine equation for ISCS and ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $x_{0,0} = x_0 = 0$. For n = N/6 and q = 0 we proceed as follows:

$$y_{N/6,0} = w_{12}^{-1}\left(\tilde{x}_{N/6,0} + w_6^{-1}\tilde{x}_{N/6,1} - w_6^{1}\tilde{x}_{-N/6,1}\right)/3$$

$$= \left(w_{12}^{-1}\tilde{x}_{N/6,0} - w_{12}^{1}\tilde{x}_{-N/6,1} + w_4^{-1}\tilde{x}_{N/6,1}\right)/3$$

$$= i\left(\tilde{x}_{N/6,0}(2\,\mathrm{Im}(w_{12}^{-1})) - \tilde{x}_{N/6,1}\right)/3$$

$$= i(-\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1})/3$$

This completes the proof of Corollary 4.3.

Data storage patterns for all of the combine equations in Corollary 4.3 are shown in Figures 4.19 through 4.30. In anticipation of the development of the general mixed radix RO FFT, we have included two storage patterns for the combine equations for I sequences. The first storage pattern is compatible with that for radix-2 and 4. The second storage pattern, which we refer to as I2 sequences, is compatible with the other storage patterns for radix-3. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/3 being odd because it is identical except for the absence of the variables corresponding to n = N/6.

The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-3 RO FFT in Table 4.3. Table 4.3 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-3 RO FFT, and is analogous to Figure 2.4. Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.19 through 4.30. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.3 shows the storage pattern for each factor of 3 in the RO FFT for N = 27.
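The three reductions of the radix-3 combine equation for ISCS and ISCSIS sequences can be confirmed with a few lines of arithmetic. The Python sketch below is illustrative only and assumes $w_N = e^{2\pi i/N}$ and the formula $y_{n,0} = w_N^{-n/2}(\tilde{x}_{n,0} + w_3^{-1/2}\tilde{x}_{n,1} - w_3^{1/2}\tilde{x}_{-n,1})/3$ with real tilde quantities; the sample values `a`, `b`, `c` and the function name are hypothetical.

```python
import cmath
import math

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (assumed sign convention)."""
    return cmath.exp(2j * cmath.pi * e / N)

def combine_iscs_radix3(N, n, t_n0, t_n1, t_m1):
    """Radix-3 ISCS combine; arguments are the real x~_{n,0}, x~_{n,1}, x~_{-n,1}."""
    return w(N, -n / 2) * (t_n0 + w(3, -0.5) * t_n1 - w(3, 0.5) * t_m1) / 3

N = 18
a, b, c = 0.7, -1.3, 2.1
r3 = math.sqrt(3)

# n = 0: x~_{0,0} = 0 and x~_{-0,1} = x~_{0,1}
err_n0 = abs(combine_iscs_radix3(N, 0, 0.0, b, b) - 1j * (-r3 * b) / 3)

# n = N/6: x~_{-n,1} = x~_{n,0}
err_n6 = abs(combine_iscs_radix3(N, N // 6, a, b, a) - 1j * (-a - b) / 3)

# a remaining value of n, compared with the expanded real/imaginary form
n = 1
expected = w(N, -n / 2) * (a + 0.5 * (b - c) - 1j * (r3 / 2) * (b + c)) / 3
err_gen = abs(combine_iscs_radix3(N, n, a, b, c) - expected)
```

At n = N/6 the twiddle factors collapse so that the result needs no multiplications at all, only one addition and a division by 3.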


Figure 4.19: Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 0

Figure 4.20: Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = N/6


Figure 4.21: Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 1


Figure 4.22: Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 0

Figure 4.23: Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = N/6


Figure 4.24: Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 1


Figure 4.25: Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 0


Figure 4.26: Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = N/6


Figure 4.27: Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 1


Figure 4.28: Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = 0


Figure 4.29: Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = N/6


Figure 4.30: Radix-3 storage pattern for I2 sequences for N = 18 highlighting the case n = 1


Table 4.3: Splitting Tree for the Radix-3 RO FFT for N = 27

Length  Factor  Symmetry  Repetitions
------  ------  --------  -----------
27      3       ICS       1
9       3       ICS       1
                I2        1
3       3       ICS       1
                I2        4


4.5 The Mixed Radix RO FFT

Up to this point, we have been developing the RO FFT for p = 2, 3, 4 as three unrelated algorithms. We now begin combining these into a single mixed radix RO FFT algorithm for sequence lengths comprised of these factors. The first issue we must address is compatibility of storage patterns. That is, when we compare the storage patterns for p = 2, 3, 4, are sequences with the same symmetries stored in the same pattern? The answer is yes, with one exception. The storage pattern for I sequences for p = 3 is different from p = 2, 4. We deal with this as follows. We will process all even factors first, followed by all odd factors. Thus, when processing a factor of 3, we assume that there are already I sequences present stored in the pattern corresponding to p = 2, 4. Processing a factor of 3 will introduce I sequences stored in a new pattern. To distinguish these two storage patterns for I sequences, we refer to the latter as I2 sequences. When processing additional factors of 3, we will encounter both I and I2 sequences. Thus, for p = 3 we have developed two storage patterns for the combine equation for I sequences corresponding to these two possibilities. These are shown in Figures 4.25 through 4.30.

The second issue we must address is generating splitting trees for the mixed radix RO FFT. The splitting tree guides the entire algorithm by indicating which combine equations to apply to the data, and in what order. Table 4.4 shows the splitting tree for the mixed radix RO FFT for N = 4 x 2 x 3 x 3 = 72, using the same compressed format introduced earlier. Because the splitting tree is different for each value of N, it is necessary to develop software which generates it automatically. The first step in this process is to develop a representation of the splitting tree in standard FORTRAN. Closely related to this is the factorization of the length of the sequence, N.
We have made an important decision to process all even factors first, followed by all odd factors. More explicitly, we will first process all factors of 4, followed by at most one factor of 2, followed by all factors of 3. We represent this factorization in FORTRAN by:

    NFAC(1)    = N
    NFAC(2)    = number of factors
    NFAC(3:12) = list of factors (10 maximum)
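The factor ordering just described can be illustrated with a short sketch. It is written in Python rather than FORTRAN purely for illustration, and the function names `factor_ro_length` and `nfac_array` are hypothetical; they are not taken from the thesis software. The only rules assumed are the ones stated above: all factors of 4 first, then at most one factor of 2, then all factors of 3, with at most 10 factors.

```python
def factor_ro_length(n):
    """Factor n into 4s, then at most one 2, then 3s (the processing order)."""
    if n < 2:
        raise ValueError("sequence length must be at least 2")
    factors = []
    m = n
    while m % 4 == 0:          # all factors of 4 first
        factors.append(4)
        m //= 4
    if m % 2 == 0:             # at most one factor of 2 can remain
        factors.append(2)
        m //= 2
    while m % 3 == 0:          # all factors of 3 last
        factors.append(3)
        m //= 3
    if m != 1:
        raise ValueError("length must be a product of 2, 3 and 4")
    if len(factors) > 10:      # NFAC(3:12) holds at most 10 factors
        raise ValueError("more than 10 factors")
    return factors


def nfac_array(n):
    """Mimic the NFAC layout: [N, number of factors, factor list...]."""
    factors = factor_ro_length(n)
    return [n, len(factors)] + factors
```

For N = 72 this yields the factor list 4, 2, 3, 3, which matches the Factor column of Table 4.4.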


When the factors are processed in a known order, the amount of memory required to represent the splitting tree can be minimized. This is important, since otherwise the splitting tree could require almost as much memory as the data. In Table 4.4 we observe that each factor has associated with it 1-4 splitting tree entries. With the restriction that the factors are processed in the order specified above, it is easily seen that there are at most 4 splitting tree entries per factor for any relevant value of N. We represent the number of splitting tree entries for the J'th factor in FORTRAN by NE(J), and it satisfies $1 \le \mathrm{NE}(J) \le 4$. We represent the I'th splitting tree entry for the J'th factor in FORTRAN by:

    TREE(1,I,J) = symmetry of DFT of sequence, where:
                      1 = ICS
                      2 = ISCS
                      3 = I
                      4 = I2
    TREE(2,I,J) = associated repetition count

The arrays NFAC, NE, and TREE are constant for a fixed value of N. Because they are referenced throughout the code for the RO FFT, they have been placed in a common block.

At this point, our discussion of the design of the software for the RO FFT will be facilitated by a description of each subroutine and of the relationships among them. Figure 4.31 provides the subroutine hierarchy for the initialization processing, which is defined as all processing which depends only on the length of the sequence, N. Figure 4.32 provides the subroutine hierarchy for the transform processing, which is defined as all processing which depends on the data itself. Since the initialization processing is executed just once for each value of N, its execution time is not critical. In contrast, the execution time of the transform processing is critical. Note that many subroutines are used in both the initialization and transform processing. For a description of each subroutine, refer to the prologues contained with the code in Appendix B. Note that this software is designed to transform multiple sequences on vector processors.
In the remainder of this section, we highlight features of the software which are worthy of special attention.


Table 4.4: Splitting Tree for the Mixed Radix RO FFT for N = 72

    Length  Factor  Symmetry  Repetitions
      72      4     ICS            1
      18      2     ICS            1
                    ISCS           1
                    I              1
       9      3     ICS            1
                    ISCS           1
                    I              3
       3      3     ICS            1
                    I2             2
                    ISCS           1
                    I              9


    VFFROI
        VFFR04I
            VICSF4
            VISCSF4
            VIF4
        VFFR02I
            VICSF2
            VISCSF2
            VIF2
        VFFR03I
            VICSF3
            VISCSF3
            VIF3
            VI2F3

Figure 4.31: Initialization subroutine hierarchy for the RO FFT

    VFFRO
        VFFR04
            VICSF4
            VISCSF4
            VIF4
        VFFR02
            VICSF2
            VISCSF2
            VIF2
        VFFR03
            VICSF3
            VISCSF3
            VIF3
            VI2F3

Figure 4.32: Forward transform subroutine hierarchy for the RO FFT


The most significant feature of the design of this software is the automated generation of the splitting tree. Because this depends only on the length of the sequence N, it is performed with initialization processing, for which execution time is not critical. From this, we obtain two important by-products as well.

The first of these is the array INDX, which provides the permuted indices of the forward transform. The ordering of the forward transform depends on the storage pattern used for each combine equation in the algorithm. As a result, this ordering is rather complex, and we must provide the user with an explicit description of it. Thus, the array INDX provides a list of the indices of the forward transform in the correct permuted order.

The second by-product is for applications to fast Poisson solvers. Recall from Section 1.1 that the RO FFT has associated with it a set of eigenvalues. The computations in the spectral domain involve both the forward transform and these eigenvalues. It is essential that the eigenvalues be provided in the same permuted order as the forward transform. Thus, the initialization processing includes the computation of these eigenvalues, and they are stored in the array EIGENV in the same permuted order indicated by INDX. We note that these eigenvalues are scaled by N rather than scaling the forward transform by 1/N. This is far more efficient. INDX itself is not needed for applications to fast Poisson solvers, but may be useful for other applications.

INDX is obtained by computing the forward transform of a special sequence. The splitting tree is generated as this transform is computed. Having obtained INDX, EIGENV is easily computed. In the following paragraphs we describe this process in more detail.

The sequence which we transform to obtain INDX is derived as follows. We set $\operatorname{Im}(X_k) = k$ for the lower half-range of k.
From Theorem 2.10, the IDFT of this for even values of N is given by:

$$x_n = -\sum_{k=1}^{N/2-1} 2k \sin(2\pi k n/N)$$

for $1 \le n \le N/2 - 1$. Of course, there is an analogous result for odd values of N. When we compute the DFT of the sequence $x_n$ we recover the indices k, to within rounding error, in permuted order. We note that some of the indices will be negative. This is because the DFT $X_k$ is ICS symmetric.


Only non-redundant terms of $X_k$ are stored, but these do not necessarily belong to the lower half of the sequence. Thus, INDX will contain index values belonging to both the lower and upper halves of the sequence $X_k$. Note that the sign of the index k has no effect on the associated eigenvalue. The processing described in this paragraph may be found near the end of subroutine VFFROI.

Next, we describe how the splitting tree is generated and used. Recall that the splitting tree is generated as the sequence $x_n$ in the preceding paragraph is transformed. Subroutine VFFROI supervises this forward transform and generation of the splitting tree by processing the list of factors of the sequence length N in forward order. Before processing the first factor, the initial splitting tree entry is stored. This always has the form:

    NE(1)       = 1    one entry for the first factor
    TREE(1,1,1) = 1    DFT of sequence is ICS symmetric
    TREE(2,1,1) = 1    repetition count is 1

As a specific example, let us assume that the first factor of N is 4. In this case, we next call subroutine VFFR04I. Subroutine VFFR04I uses the initial splitting tree entry to supervise the application of the radix-4 forward combine equations. It also adds new splitting tree entries which reflect the changes made by this radix-4 splitting. These changes are obtained by studying the appropriate storage pattern diagrams. For this example, these are Figures 4.10 through 4.12. First, we note that the splitting tree changes depend on the length of the subsequence being split, LS. The subsequence length which establishes the boundaries between these options is always 2p, where p is the radix. For this example, the critical subsequence length is 8; let us assume that LS > 8. Then the output of this forward combine operation is one ICS sequence, followed by one ISCS sequence, followed by one I sequence. At this point, we add a new set of splitting tree entries which describe this output.
Recall that the FORTRAN structure of the splitting tree is a three-dimensional integer array. The third index indicates the factor number being processed. In this example, the initial splitting tree entry has a third index of 1, and these new splitting tree entries have a third index of 2. The new splitting tree entries will be used to supervise the processing for the second factor of N.


By continuing in this fashion, we generate the complete splitting tree. To facilitate this, subroutine VFFR04I has an input parameter IC which is used as the third index into the splitting tree for obtaining inputs, and IN = IC + 1 is used as the third index into the splitting tree for storing outputs. Also note that new splitting tree entries are added in a manner which keeps the splitting tree as compressed as possible. That is, we never make two consecutive splitting tree entries with the same symmetry. We increment the corresponding repetition count instead. For the example we have been discussing, the code fragment in subroutine VFFR04I which adds the new splitting tree entries is shown below.

      IF ((NE(IN) .GE. 1) .AND.
     &    (TREE(1,NE(IN),IN) .EQ. 1)) THEN
         TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1
      ELSE
         NE(IN) = NE(IN) + 1
         TREE(1,NE(IN),IN) = 1
         TREE(2,NE(IN),IN) = 1
      END IF
      NE(IN) = NE(IN) + 1
      TREE(1,NE(IN),IN) = 2
      TREE(2,NE(IN),IN) = 1
      NE(IN) = NE(IN) + 1
      TREE(1,NE(IN),IN) = 3
      TREE(2,NE(IN),IN) = 1

Note that the code fragment above is a knowledge base, or set of rules, which describes the results of a radix-4 splitting of an ICS sequence for LS > 8. Subroutines VFFR04I, VFFR02I, and VFFR03I collectively contain all of the rules required to generate splitting trees for the RO FFT for sequences of length N which are products of 2,3,4. Note that we never explicitly generated the splitting tree for a specific value of N. Instead, this is done automatically by the software during initialization processing.

Since the initialization processing actually computes a forward transform, the forward transform processing outlined in Figure 4.32 is completely analogous to it, except that the splitting tree is not generated because it is already available. Consequently, we will not describe the subroutines involved in the forward transform in detail.
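The compression rule used when adding splitting tree entries (never store two consecutive entries with the same symmetry, increment the repetition count instead) can be sketched as follows. This is an illustrative Python sketch, not the thesis code: the list-of-pairs representation and the names `add_entry` and `split_ics_radix4` are assumptions made here, and only the single rule quoted in the text (radix-4 splitting of an ICS sequence with LS > 8) is encoded.

```python
# Symmetry codes, following the TREE(1,I,J) convention in the text.
ICS, ISCS, I_SEQ, I2_SEQ = 1, 2, 3, 4


def add_entry(entries, symmetry, count=1):
    """Add (symmetry, count) to one factor's entry list, keeping it compressed.

    If the last entry already has this symmetry, its repetition count is
    incremented; otherwise a new entry is appended.
    """
    if entries and entries[-1][0] == symmetry:
        entries[-1][1] += count
    else:
        entries.append([symmetry, count])
    return entries


def split_ics_radix4(entries):
    """Rule for a radix-4 split of one ICS sequence with LS > 8.

    The output is one ICS, one ISCS and one I sequence, in that order.
    """
    for symmetry in (ICS, ISCS, I_SEQ):
        add_entry(entries, symmetry)
    return entries
```

The FORTRAN fragment tests for a merge only on the first (ICS) entry, since the ISCS and I entries that follow can never match their predecessor; the sketch applies the same test uniformly, which gives the same result. The limit of four entries per factor is not enforced here.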
We only note that the use of the splitting tree to guide the forward transform does not add significantly to the execution time because this merely involves testing splitting tree entries and branching to call the appropriate subroutines.

We have not implemented the inverse transform because it does not introduce any significant new issues. However, we will briefly outline what is involved. First, one must develop corollaries for the inverse combine equations analogous to those for the forward combine equations. The storage patterns for the inverse combine equations are the same as for the forward combine equations, except that they are traversed in the inverse direction. The input to the inverse transform is in the permuted order output by the forward transform. The inverse transform uses the same initialization routine as the forward transform because the splitting tree must be generated in the forward direction. The splitting tree cannot be generated in the inverse direction because, for example, there is more than one way that I sequences can be produced. Thus, it is essential that we save the state of the splitting tree for each factor of N. The software for the inverse transform itself is analogous to that for the forward transform, with the following exceptions. The factors of N are processed in inverse order, and consequently the splitting tree is traversed in the inverse direction. Finally, the inverse combine equations are used instead of the forward combine equations.

One of the key objectives of the prototype software for the RO FFT is efficiency. Consequently, in this paragraph we discuss coding techniques utilized to obtain optimum performance. These techniques are discussed in order from most significant to least significant. This software is written in FORTRAN for serial vector processors. It is designed to compute multiple transforms. Each sequence is stored in a row of a two-dimensional array. We vectorize by column, resulting in a vector stride of 1 and a vector length equal to the number of sequences. All complex arithmetic is expanded into real and imaginary parts.
The equations have been optimized by hand, storing all intermediate results in local scalar variables. Within vectorized loops, local scalar variables are implemented as vector registers when possible and do not necessarily result in additional memory accesses. As much as possible, computations involving the same vectors are placed in adjacent FORTRAN statements to avoid multiple memory accesses for the same data. We have avoided coding the unary operator -. Where this operator must be used, it is used as a scalar operator if possible, rather than as a vector operator. CASE structures have been implemented with the computed GOTO statement to avoid sequences of test and branch instructions. The most common case is coded last to avoid an additional branch instruction. Finally, we have avoided coding DO loops which are known a priori to have only one iteration.


We conclude this section with a discussion of the test driver used in the development of the prototype software for the RO FFT. We do this because it has played a crucial role in the development of this software, and also because it provides an explicit example of the correct user interface. It should be clear that there are a large number of logic paths through this software. For this reason, this test driver has been designed to test exhaustively with sequences of all lengths within a practical range. For applications to fast Poisson solvers, the length of an RO sequence represents twice the number of grid points in one dimension of a two or three dimensional grid. Thus, 1024 is a reasonable upper bound on the sequence length. Again for fast Poisson solvers, the number of sequences is the product of the number of grid points in the remaining dimensions, and therefore is fairly large. We have selected 1024 as the number of sequences because this is the vector length, and is sufficient for most vector processors to attain full vector speed.

We assign the same test data to all sequences. This test data has been selected to facilitate automated verification of the software output, and is derived as follows. We set $\operatorname{Im}(X_k) = 1$ for the lower half-range of k. From Theorem 2.10, the IDFT of this for even values of N is given by:

$$x_n = -\sum_{k=1}^{N/2-1} 2 \sin(2\pi k n/N)$$

for $1 \le n \le N/2 - 1$. Of course, there is an analogous result for odd values of N. When we compute the DFT of the sequence $x_n$ we recover the sequence with values $\pm 1$ to within rounding error. The value -1 occurs because of the ICS symmetry and permuted ordering of the DFT $X_k$. In spite of this, it is easy to automate the verification of the software output because it is constant, up to a sign, and does not require sorting.

At this point, a word about scaling is in order. Because we will be making performance comparisons to VFFTPK, we have scaled the output as VFFTPK does.
Thus, we have scaled the output by $1/\sqrt{N}$ instead of 1/N. As a result, the correct output will be $\pm\sqrt{N}$ rather than $\pm 1$. For the final version of this software, it is recommended that no scaling be done because this can be accomplished more efficiently by the user.

The test driver is designed to produce a concise report of the test results. If the automated verification of the software output is successful, then only timing data is provided. Otherwise, additional debug data is output.
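The construction of the test data, and of the closely related sequence used earlier to obtain INDX, can be checked numerically with a small sketch. It is written in Python with a naive O(N^2) DFT in natural (unpermuted) order, so it illustrates only the values that are recovered, not the permuted storage order of the RO FFT. The function names are hypothetical, only even N is handled, and the formula for $x_n$ is simply evaluated for all n from 0 to N-1, which yields the odd extension automatically.

```python
import cmath
import math


def ro_sequence(n_len, amplitude):
    """Real odd sequence whose DFT has Im(X_k) = amplitude(k) for 1 <= k < N/2."""
    if n_len % 2 != 0:
        raise ValueError("this sketch covers even N only")
    return [
        -sum(2.0 * amplitude(k) * math.sin(2.0 * math.pi * k * n / n_len)
             for k in range(1, n_len // 2))
        for n in range(n_len)
    ]


def dft(x, scale):
    """Naive forward DFT, multiplied by the given scale factor."""
    n_len = len(x)
    return [
        scale * sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n in range(n_len))
        for k in range(n_len)
    ]


N = 12

# Test-driver data: Im(X_k) = 1, output scaled by 1/sqrt(N) as VFFTPK does.
X = dft(ro_sequence(N, lambda k: 1.0), 1.0 / math.sqrt(N))
signs = [round(v.imag / math.sqrt(N)) for v in X]

# INDX-style data: Im(X_k) = k, output scaled by 1/N.
K = dft(ro_sequence(N, lambda k: float(k)), 1.0 / N)
indices = [round(v.imag) for v in K]
```

With the $1/\sqrt{N}$ scaling the imaginary parts are $\pm\sqrt{N}$, positive in the lower half-range and negative in the upper half-range; with $\operatorname{Im}(X_k) = k$ and 1/N scaling the upper half-range returns negative indices, for example -5 at position 7 when N = 12.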


The prototype software for the RO FFT has passed the tests described in the preceding paragraph. Thus, we are ready to proceed with performance comparisons to VFFTPK. The test driver has been designed to apply the same tests to VFFTPK, time both algorithms, and compare their performance. The results are presented in Section 4.6.


4.6 Performance of the RO FFT

The tests described at the end of Section 4.5 have been executed on both an IBM 3090J, located at the IBM Federal Sector Division in Houston, Texas, and a Cray Y-MP8/864, located at the National Center for Atmospheric Research in Boulder, Colorado. The results are shown in Tables 4.5 and 4.6. The column headings in these tables are as follows:

    N        length of the RO symmetric sequence (products of 2,3,4 only)
    CINIT    compact algorithm initialization time (seconds)
    CTRAN    compact algorithm transform time (seconds)
    PINIT    prepost-processing algorithm (VFFTPK) initialization time (seconds),
             not available for odd values of N
    PTRAN    prepost-processing algorithm (VFFTPK) transform time (seconds),
             not available for odd values of N
    DELTIM   100(PTRAN-CTRAN)/PTRAN

Table 4.5: Timing Data for 1024 Sequences on the IBM 3090J

       N     CINIT     CTRAN     PINIT     PTRAN   DELTIM
       3  0.000215  0.000098  0.000000  0.000000      0.0
       4  0.000089  0.000105  0.000007  0.000011   -854.5
       6  0.000129  0.000264  0.000053  0.000130   -103.1
       8  0.000165  0.000437  0.000016  0.001221     64.2
       9  0.000200  0.000821  0.000000  0.000000      0.0
      12  0.000224  0.000837  0.000028  0.001867     55.2
      16  0.000295  0.001157  0.000027  0.002360     51.0
      18  0.000338  0.001987  0.000032  0.003009     34.0
      24  0.000493  0.002503  0.000037  0.004024     37.8
      27  0.000543  0.004127  0.000000  0.000000      0.0
      32  0.000636  0.003313  0.000037  0.005254     36.9
      36  0.000772  0.004646  0.000057  0.007280     36.2
      48  0.001095  0.005909  0.000074  0.009791     39.6
      54  0.001264  0.009420  0.000088  0.011954     21.2
      64  0.001488  0.007822  0.000089  0.012861     39.2
      72  0.001982  0.011617  0.000104  0.016385     29.1
      81  0.002303  0.017147  0.000000  0.000000      0.0
      96  0.003018  0.015135  0.000125  0.023006     34.2
     108  0.003514  0.020407  0.000145  0.029910     31.8
     128  0.004367  0.019962  0.000154  0.030695     35.0
     144  0.005475  0.026458  0.000186  0.040155     34.1
     162  0.006480  0.039680  0.000205  0.048162     17.6
     192  0.008622  0.035250  0.000226  0.053534     34.2
     216  0.010670  0.049952  0.000253  0.064079     22.0
     243  0.012850  0.068479  0.000000  0.000000      0.0
     256  0.013318  0.046786  0.000281  0.070952     34.1
     288  0.017432  0.064724  0.000318  0.084894     23.8
     324  0.020924  0.083660  0.000360  0.109690     23.7
     384  0.028521  0.084783  0.000403  0.113559     25.3
     432  0.034990  0.106282  0.000460  0.145865     27.1
     486  0.043096  0.149356  0.000512  0.172023     13.2
     512  0.046345  0.111392  0.000519  0.149192     25.3
     576  0.063103  0.138339  0.000593  0.193386     28.5
     648  0.078003  0.187007  0.000665  0.228665     18.2
     729  0.096424  0.247511  0.000000  0.000000      0.0
     768  0.105239  0.181592  0.000769  0.258849     29.8
     864  0.131837  0.243505  0.000867  0.303848     19.9
     972  0.162945  0.306870  0.000976  0.378660     19.0
    1024  0.176456  0.240520  0.001008  0.341197     29.5

Table 4.6: Timing Data for 1024 Sequences on the Cray Y-MP8/864

       N     CINIT     CTRAN     PINIT     PTRAN   DELTIM
       3  0.000038  0.000029  0.000000  0.000000      0.0
       4  0.000036  0.000030  0.000005  0.000005   -522.9
       6  0.000056  0.000078  0.000017  0.000031   -156.7
       8  0.000064  0.000116  0.000014  0.000181     36.0
       9  0.000075  0.000170  0.000000  0.000000      0.0
      12  0.000087  0.000199  0.000027  0.000361     44.8
      16  0.000104  0.000274  0.000026  0.000485     43.5
      18  0.000131  0.000420  0.000034  0.000606     30.6
      24  0.000167  0.000588  0.000042  0.000834     29.4
      27  0.000192  0.000783  0.000000  0.000000      0.0
      32  0.000222  0.000790  0.000040  0.001089     27.4
      36  0.000249  0.000989  0.000045  0.001395     29.1
      48  0.000335  0.001330  0.000054  0.001895     29.8
      54  0.000412  0.001850  0.000054  0.002305     19.8
      64  0.000465  0.001774  0.000053  0.002490     28.8
      72  0.000579  0.002511  0.000063  0.003142     20.1
      81  0.000690  0.003141  0.000000  0.000000      0.0
      96  0.000851  0.003345  0.000072  0.004224     20.8
     108  0.001011  0.004115  0.000068  0.005372     23.4
     128  0.001301  0.004517  0.000072  0.005636     19.9
     144  0.001772  0.005576  0.000079  0.007230     22.9
     162  0.002139  0.007283  0.000081  0.008711     16.4
     192  0.002659  0.007424  0.000091  0.009838     24.5
     216  0.003269  0.009854  0.000093  0.011822     16.7
     243  0.003993  0.012057  0.000000  0.000000      0.0
     256  0.004179  0.009865  0.000096  0.013037     24.3
     288  0.005616  0.013014  0.000111  0.015986     18.6
     324  0.006756  0.015806  0.000111  0.019110     17.3
     384  0.008940  0.017537  0.000128  0.021576     18.7
     432  0.011495  0.021370  0.000130  0.025842     17.3
     486  0.014145  0.027055  0.000136  0.030702     11.9
     512  0.015338  0.023642  0.000139  0.028688     17.6
     576  0.019435  0.028664  0.000162  0.035000     18.1
     648  0.025037  0.036605  0.000163  0.041512     11.8
     729  0.030515  0.043868  0.000000  0.000000      0.0
     768  0.032927  0.038324  0.000190  0.047187     18.8
     864  0.041872  0.048680  0.000199  0.056344     13.6
     972  0.052831  0.057831  0.000212  0.068120     15.1
    1024  0.057313  0.050686  0.000220  0.063066     19.6

The remainder of this section is devoted to a careful analysis of the data in Tables 4.5 and 4.6. First, we emphasize that it is not our intent to compare the performance of the IBM 3090J and the Cray Y-MP8/864. Rather, we are comparing the performance of the compact algorithm to the prepost-processing algorithm (VFFTPK). We regard the performance of VFFTPK as the baseline, and compare the performance of the compact algorithm to it. We will refer to this as the relative performance of the compact algorithm, and it is expressed quantitatively in the column labeled DELTIM. Note that we are not concerned with the performance of initialization processing because in applications it is executed only once for each value of N. Also, the functionality of the initialization processing is quite different for the two algorithms.

Note that the relative performance of the compact algorithm is significantly higher on the IBM 3090J than on the Cray Y-MP8/864. The reason for this involves an analysis of the assembly language generated from the FORTRAN source code on both machines. To simplify this analysis, we have restricted our attention to the most computationally intensive vectorized loops involved in radix-4 processing. We have selected radix-4 because the number of vector registers required for efficient implementation of these algorithms increases with the value of the radix, p. Thus, for our test cases radix-4 represents the worst case. For the compact algorithm, the vectorized loop which we analyze is the DO 101 loop from subroutine VIF4 (see Appendix B). For VFFTPK, the vectorized loop which we analyze is the following code segment from subroutine VRADF4:

      DO 1003 M=1,MP
      CH(M,I-1,1,K) = ((WA1(I-2)*CC(M,I-1,K,2)+WA1(I-1)*
     1 CC(M,I,K,2))+(WA3(I-2)*CC(M,I-1,K,4)+WA3(I-1)*
     1 CC(M,I,K,4)))+(CC(M,I-1,K,1)+(WA2(I-2)*CC(M,I-1,K,3)+
     1 WA2(I-1)*CC(M,I,K,3)))
      CH(M,IC-1,4,K) = (CC(M,I-1,K,1)+(WA2(I-2)*CC(M,I-1,K,3)+
     1 WA2(I-1)*CC(M,I,K,3)))-((WA1(I-2)*CC(M,I-1,K,2)+
     1 WA1(I-1)*CC(M,I,K,2))+(WA3(I-2)*CC(M,I-1,K,4)+
     1 WA3(I-1)*CC(M,I,K,4)))
 1003 CONTINUE

We are interested in the number of memory accesses (vector loads and stores) required to implement these vectorized loops. First, we consider the IBM 3090J. This machine has 16 single precision vector registers of length 256, which may be concatenated in pairs to form 8 double precision vector registers. Since both algorithms are coded in single precision, we have 16 vector registers available. This machine also has a 256K byte cache. Vector instructions may operate on two vector registers, or one vector register and the address of a vector in memory. The latter type of vector instruction is used extensively by the VS FORTRAN compiler, and it complicates the process of counting memory accesses. For the first such instruction issued to a particular address, we assume that the memory operand is not in cache and count this as a memory access.
For subsequent instructions issued to this same address, we assume that the memory operand is in cache, and do not count this as a memory access. With these assumptions, the number of memory accesses required to implement the segments from VIF4 and VRADF4 is 16 in both cases. This is optimal, because both code segments involve 8 real input and 8 real output vectors.

Next, we consider the Cray Y-MP8/864. This machine has 8 vector registers of length 64 for both single and double precision, and there is no cache. Vector instructions may operate on two vector registers only. Thus, the number of memory accesses is simply the number of vector loads and stores. The FORTRAN compiler used was CFT77 with full optimization. The number of memory accesses required to implement the segment from VIF4 is 28. Of these, 12 involve temporary storage locations due to an insufficient number of vector registers. The number of memory accesses required to implement the segment from VRADF4 is 26. Of these, 6 involve temporary storage locations and 4 involve reloading vectors a second time due to the complexity of the code. Both code segments required more than the optimum number of memory accesses. However, the segment from VIF4 required more than the segment from VRADF4 because the former is part of an in-place algorithm which requires a larger number of temporary storage locations. Thus, the relative performance of the compact algorithm is constrained on the Cray Y-MP8/864 by an insufficient number of vector registers.

In view of the analysis above, we will focus our attention on the IBM 3090J for the remainder of this section. Our next goal is to develop analytic timing models for both algorithms. First, we develop the timing model for the compact algorithm. As usual, N will denote the length of the RO symmetric sequence, and we express N as follows:

$$N = 2^p 3^q 4^r$$

To simplify the model, we have excluded odd values of N. Thus, for each factor of N there are $N/2 - 1$ real quantities to be processed. We seek a least squares fit to the timing data using the sum of the following terms:

    $c_1 (N/2 - 1)$      time for scaling, and adjustments to the other terms
    $c_2 p (N/2 - 1)$    time for processing all factors of 2
    $c_3 q (N/2 - 1)$    time for processing all factors of 3
    $c_4 r (N/2 - 1)$    time for processing all factors of 4


Note that the time required for processing a given factor is not uniform because the computations involved depend on the length of the subsequence being split. For example, the computations required for the last factor of N do not involve multiplications by powers of $\omega$. Thus, the constants $c_2, c_3, c_4$ represent averages, and the time required for processing the last factor of N will be overestimated. The constant $c_1$ is used to adjust for this, and therefore it may be negative. The least squares solution was computed using Mathematica [12], and yielded the following results:

    $c_1$ = -0.000058986
    $c_2$ = +0.000069355
    $c_3$ = +0.000117047
    $c_4$ = +0.000105037

Next, we develop the timing model for the prepost-processing algorithm. This algorithm is restricted to even values of N, and it ultimately transforms a real sequence of length N/2. Thus, for this model we express N/2 as follows:

$$N/2 = 2^p 3^q 4^r$$

We seek a least squares fit to the timing data using the sum of the following terms:

    $c_1 (N/2)$      time for pre-processing, post-processing, scaling, and adjustments to the other terms
    $c_2 p (N/2)$    time for processing all factors of 2
    $c_3 q (N/2)$    time for processing all factors of 3
    $c_4 r (N/2)$    time for processing all factors of 4

The least squares solution was again computed using Mathematica, and yielded the following results:

    $c_1$ = +0.000147671
    $c_2$ = +0.0000781027
    $c_3$ = +0.000111397
    $c_4$ = +0.000110474
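The two fitted models can be evaluated directly, which may help when reading the model table that follows. The sketch below is in Python and is illustrative only: the function names are hypothetical, the constants are the fitted values quoted above, and the exponents are obtained by taking as many factors of 4 as possible, then at most one factor of 2, then factors of 3. Under these assumptions the models give approximately 0.238228 and 0.341847 seconds for N = 1024, in agreement with the N = 1024 row of Table 4.7.

```python
# Fitted constants (c1, c2, c3, c4) quoted in the text, IBM 3090J.
COMPACT = (-0.000058986, 0.000069355, 0.000117047, 0.000105037)
PREPOST = (0.000147671, 0.0000781027, 0.000111397, 0.000110474)


def exponents_2_3_4(n):
    """Return (p, q, r) with n == 2**p * 3**q * 4**r, using as many 4s as possible."""
    r = 0
    while n % 4 == 0:
        n //= 4
        r += 1
    p = 0
    if n % 2 == 0:
        n //= 2
        p = 1
    q = 0
    while n % 3 == 0:
        n //= 3
        q += 1
    if n != 1:
        raise ValueError("length must be a product of 2, 3 and 4")
    return p, q, r


def compact_time(n):
    """Model transform time for the compact algorithm (even n); N is factored."""
    p, q, r = exponents_2_3_4(n)
    c1, c2, c3, c4 = COMPACT
    return (c1 + c2 * p + c3 * q + c4 * r) * (n / 2 - 1)


def prepost_time(n):
    """Model transform time for the prepost-processing algorithm; N/2 is factored."""
    p, q, r = exponents_2_3_4(n // 2)
    c1, c2, c3, c4 = PREPOST
    return (c1 + c2 * p + c3 * q + c4 * r) * (n / 2)
```

Note that the two models factor different lengths (N for the compact algorithm, N/2 for the prepost-processing algorithm), which is why a single factor of 2 in N affects the two algorithms differently.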


Note that $c_1$ is positive in this case due to a large contribution from pre- and post-processing. Table 4.7 is analogous to Table 4.5 except that all timing data has been computed from the timing models. Timing data not relevant to the timing models have been omitted or set to zero. A comparison of these tables shows that we have obtained an excellent fit.

We now focus our attention on the columns labeled DELTIM. Recall that this is a measure of the relative performance of the compact algorithm, which is our primary interest. Both tables show that DELTIM is a fairly complicated function of N. We will now summarize the reasons for this.

In Table 4.5 note that for N = 4, 6, the compact algorithm is slower than VFFTPK. This is because VFFTPK contains simplified code for these special cases. Such small values of N are of no practical importance, so we ignore this. In Table 4.7, the timing model has been used to extrapolate timing data for VFFTPK for N = 4, 6.

Swarztrauber [10] has shown that the compact and prepost-processing algorithms have the same asymptotic operation counts, but the compact algorithm has smaller low order terms. For values of N within the practical range shown in Tables 4.5 and 4.7, these low order terms make a significant contribution. Closely related to this is the number of factors of N. Both algorithms must access all of the data for each factor of N, while VFFTPK accesses all of the data two additional times for pre- and post-processing. These additional data accesses are most significant when the number of factors of N is small. Thus, DELTIM generally decreases as N increases. This can be seen by comparing the timing data for N = 64, 256, 1024.

However, DELTIM is not a simple monotonically decreasing function of N. An interesting phenomenon occurs when N includes an odd power of 2. Recall that both algorithms use as many factors of 4 as possible, resulting in at most one factor of 2.
Moreover, the prepost-processing algorithm actually works with sequences of length N/2. This eliminates the factor of 2 for the prepost-processing algorithm. With one less factor to process, the performance of the prepost-processing algorithm improves relative to the compact algorithm. This can be seen by comparing the timing data for N = 256, 512, 1024.

There is an additional phenomenon which creates irregularities in the timing data. Recall from Theorem 2.12 that the forward combine equations involve factors of $\omega_p$, where p is the radix. For p = 2, 4 this reduces to -1, i respectively. Of course, we do not need to perform multiplications by these values. On the other hand, for p = 3 this reduces to $-1/2 + i\sqrt{3}/2$, and complex multiplications by this value are required. Similar considerations apply to the prepost-processing algorithm. The timing data for both algorithms exhibit local maxima at values of N which include many factors of 3. For examples, see the timing data for N = 486, 972. These local maxima in the timing data create corresponding irregularities in DELTIM. This phenomenon is also reflected in the constants $c_2, c_3, c_4$ in the timing models. Although we have been analyzing the RO FFT, this phenomenon occurs in the complex FFT as well, and contradicts a statement made in the paper by Cooley and Tukey [3] which introduced the complex FFT. There it was asserted that the operation count for the complex FFT, when normalized by N log(N), attains a minimum for radix-3. This erroneous conclusion was based on oversimplified operation counts which included multiplications by 1 and i.

The complex pattern of DELTIM in Tables 4.5 and 4.7 is the result of superimposing the phenomena discussed in the preceding paragraphs. As an example, for N = 486 we have both an odd power of 2 and many factors of 3. Although DELTIM is a fairly complicated function of N, we generalize that for most values of N within a practical range, DELTIM is approximately 25-30%. Thus, for applications which make extensive use of symmetric FFTs, it is well worth the effort to implement the compact algorithms.

Table 4.7: Timing Model for 1024 Sequences on the IBM 3090J

       N  Factorization     CTRAN     PTRAN   DELTIM
       3  2^0 3^1 4^0    0.000000  0.000000      0.0
       4  2^0 3^0 4^1    0.000046  0.000452     89.8
       6  2^1 3^1 4^0    0.000255  0.000777     67.2
       8  2^1 3^0 4^1    0.000346  0.001033     66.5
       9  2^0 3^2 4^0    0.000000  0.000000      0.0
      12  2^0 3^1 4^1    0.000815  0.002023     59.7
      16  2^0 3^0 4^2    0.001058  0.002690     60.7
      18  2^1 3^2 4^0    0.001956  0.003334     41.3
      24  2^1 3^1 4^1    0.002557  0.004435     42.3
      27  2^0 3^3 4^0    0.000000  0.000000      0.0
      32  2^1 3^0 4^2    0.003307  0.005898     43.9
      36  2^0 3^2 4^1    0.004762  0.008074     41.0
      48  2^0 3^1 4^2    0.006167  0.010743     42.6
      54  2^1 3^3 4^0    0.009399  0.013010     27.8
      64  2^0 3^0 4^3    0.007940  0.014295     44.5
      72  2^1 3^2 4^1    0.012233  0.017314     29.3
      81  2^0 3^4 4^0    0.000000  0.000000      0.0
      96  2^1 3^1 4^2    0.015862  0.023041     31.2
     108  2^0 3^3 4^1    0.021051  0.030238     30.4
     128  2^1 3^0 4^3    0.020505  0.030662     33.1
     144  2^0 3^2 4^2    0.027348  0.040251     32.1
     162  2^1 3^4 4^0    0.038285  0.048054     20.3
     192  2^0 3^1 4^3    0.035451  0.053579     33.8
     216  2^1 3^3 4^1    0.049921  0.063972     22.0
     243  2^0 3^5 4^0    0.000000  0.000000      0.0
     256  2^0 3^0 4^4    0.045868  0.071321     35.7
     288  2^1 3^2 4^2    0.064999  0.085163     23.7
     324  2^0 3^4 4^1    0.082792  0.108761     23.9
     384  2^1 3^1 4^3    0.084523  0.113374     25.4
     432  2^0 3^3 4^2    0.107979  0.144815     25.4
     486  2^1 3^5 4^0    0.144136  0.171231     15.8
     512  2^1 3^0 4^4    0.109782  0.150929     27.3
     576  2^0 3^2 4^3    0.140693  0.192821     27.0
     648  2^1 3^4 4^1    0.188501  0.228009     17.3
     729  2^0 3^6 4^0    0.000000  0.000000      0.0
     768  2^0 3^1 4^4    0.183154  0.256740     28.7
     864  2^1 3^3 4^2    0.246353  0.303614     18.9
     972  2^0 3^5 4^1    0.306174  0.380421     19.5
    1024  2^0 3^0 4^5    0.238228  0.341847     30.3
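The relative performance measure used throughout this section, DELTIM = 100(PTRAN-CTRAN)/PTRAN, is simple to recompute from the timing columns. The following Python sketch is illustrative; the function name is hypothetical, and the handling of a zero PTRAN reflects only what the tables appear to show (DELTIM is listed as 0.0 for odd N, where no VFFTPK timing is available).

```python
def deltim(ptran, ctran):
    """Relative performance of the compact algorithm, in percent of PTRAN."""
    if ptran == 0.0:
        # Assumption from the tables: odd N has no VFFTPK timing and shows 0.0.
        return 0.0
    return 100.0 * (ptran - ctran) / ptran


# Measured IBM 3090J values for N = 1024 and N = 4 (Table 4.5).
large_n = round(deltim(0.341197, 0.240520), 1)
small_n = round(deltim(0.000011, 0.000105), 1)
```

A positive value means the compact algorithm is faster than VFFTPK; the large negative value for N = 4 corresponds to the special-case code in VFFTPK discussed above. Because the printed times are rounded to six decimals, values recomputed from very small entries need not reproduce the printed DELTIM exactly.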


4.7 Automating Implementation of the RO FFT

Recall from Section 4.1 that the entire package of compact algorithms requires a significant quantity of FORTRAN code. Thus, we wish to automate as much of the implementation process as possible. We have focused our attention on the most labor intensive steps in this process. These are obtaining the simplified form of the combine equations contained in Corollaries 4.1 through 4.3, and writing FORTRAN code for these which maps each algebraic quantity to the correct storage location as specified by the associated data storage patterns. Thus, we need a software tool which is capable of performing symbolic algebra and outputting results in FORTRAN syntax. A number of such tools are available, and we have selected Mathematica [12].

In this section, we describe how Mathematica can be used to automate the steps described above. As a specific example, we will automate the generation of code for the radix-4 RO FFT. The subroutines which we will generate automatically are those which implement the radix-4 combine equations, namely VICSF4, VISCSF4, and VIF4. The remaining subroutines associated with the radix-4 RO FFT are relatively easy to code. Our overall strategy is to develop a FORTRAN skeleton for the subroutines which implement combine equations, generate FORTRAN code for the combine equations using Mathematica, and insert these equations into the skeleton. The FORTRAN skeleton is contained in Appendix C. Although we are focusing on the radix-4 RO FFT, this skeleton may be used for any of the compact symmetric FFTs with only minor modifications. The command files for generating FORTRAN code for the combine equations with Mathematica are contained in Appendix D. There are three command files corresponding to the forward combine equations for ICS induced symmetries, ISCS induced symmetries, and I sequences.
Although we are focusing on the radix-4 RO FFT, these command files are valid for any even value of the radix p, and could easily be modified for odd values of p. These command files are quite involved. However, they contain extensive comments, so we will not elaborate on them further here. It is assumed that the reader has a general familiarity with Mathematica. Appendix E contains the results of inserting the Mathematica output into the FORTRAN skeleton, yielding new versions of subroutines VICSF4, VISCSF4, and VIF4. Note that the FORTRAN skeleton and any other handwritten code is in uppercase, while the Mathematica output is in lower case. We emphasize that no attempt was made to manually optimize 223

PAGE 235

Table 4.8: Comparison of Timing Data for Handwritten Code and Auto mated Code for 1024 Sequences on the IBM 3090J Hand. Auto. Hand. Auto. N PTRAN CTRAN CTRAN DEL TIM DEL TIM 16 0.002360 0.001157 0.001276 51.0 45.9 64 0.012861 0.007822 0.008753 39.2 31.9 256 0.070952 0.046786 0.052649 34.1 25.8 1024 0.341197 0.240520 0.273806 29.5 19.8 this code. We have combined the new versions of these subroutines with the remainder of the code, and executed the tests described in Section 4.5. The results are shown in Table 4.8. This table is restricted to powers of 4, since only the radix-4 subroutines have been automated. By comparing the timing data for the handwritten and automated versions, we see that DELTIM has decreased by 5-10%. This decrease in performance could be recovered by manually optimizing the code. The procedures described in this section have been highly successful for automating the generation of code for the radix-4 RO FFT, and could easily be extended to generate code for the other compact symmetric FFTs. 224
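As a cross-check on Table 4.8, the DELTIM columns can be recomputed from the timing columns. The sketch below is in Python rather than the FORTRAN of the thesis. It assumes the definition used in the test driver of Appendix B, DELTIM = 100 (PTRAN - CTRAN) / PTRAN; the names `TABLE_4_8` and `deltim` are hypothetical.

```python
# Rows of Table 4.8: (N, PTRAN, handwritten CTRAN, automated CTRAN),
# with times exactly as reported in the table.
TABLE_4_8 = [
    (16, 0.002360, 0.001157, 0.001276),
    (64, 0.012861, 0.007822, 0.008753),
    (256, 0.070952, 0.046786, 0.052649),
    (1024, 0.341197, 0.240520, 0.273806),
]


def deltim(ptran, ctran):
    """Percent by which the compact transform time undercuts PTRAN."""
    return 100.0 * (ptran - ctran) / ptran


for n, ptran, hand, auto in TABLE_4_8:
    d_hand = deltim(ptran, hand)
    d_auto = deltim(ptran, auto)
    # Last column: loss in DELTIM (percentage points) from automation.
    print(f"{n:5d}  {d_hand:5.1f}  {d_auto:5.1f}  {d_hand - d_auto:4.1f}")
```

The last printed column is the difference between the handwritten and automated DELTIM values, which is the quantity the text describes as a decrease of about 5 to 10 points.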


Appendix A Eigenstructure of the Discrete Poisson Equation


In this appendix, we provide an example of the technique used to prove the eigenstructures for the discrete Poisson equation with various boundary conditions, which are summarized in Tables 1.2 through 1.4. The example we will present is that for D-DS boundary conditions. From Table 1.4, we see that the associated transform is the RO-O FFT. Theorem 2.16 in Section 2.8 provided the real form of the DFT and IDFT associated with an RO-O sequence of length $N = 2(2M + 1)$. We asserted that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for D-DS boundary conditions. Although we did not prove this, we made it seem plausible by observing that an RO-O sequence satisfies D-DS boundary conditions for the computational domain $1 \le n \le M$. We now verify that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for D-DS boundary conditions. In the process, we will determine the corresponding eigenvalues as well.

The matrix, of dimension $M$, corresponding to the discretized Poisson equation satisfying D-DS boundary conditions is shown below. We hypothesize that the $n$'th component of the $k$'th eigenvector is $\sin(4\pi kn/N)$, for $1 \le n \le M$, $1 \le k \le M$. We test this hypothesis by computing the following matrix vector product:

\[
\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_{M-1} \\ r_M \end{pmatrix}
=
\begin{pmatrix}
-2 & 1 & 0 & \cdots & 0 \\
1 & -2 & 1 & & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & & 1 & -2 & 1 \\
0 & \cdots & 0 & 1 & -3
\end{pmatrix}
\begin{pmatrix}
\sin(4\pi k/N) \\ \sin(8\pi k/N) \\ \vdots \\ \sin(4\pi k(M-1)/N) \\ \sin(4\pi kM/N)
\end{pmatrix}
\]

We will compute the product for the first row, the last row, and the second row, which represents a typical interior row. For the first row,

\[
\begin{aligned}
r_1 &= -2\sin(4\pi k/N) + \sin(8\pi k/N) \\
    &= \sin(4\pi k/N)\,[-2 + 2\cos(4\pi k/N)] \\
    &= \sin(4\pi k/N)\,[-4\sin^2(2\pi k/N)].
\end{aligned}
\]

For the last row, using $M = N/4 - 1/2$,

\[
\begin{aligned}
r_M &= \sin(4\pi k(M-1)/N) - 3\sin(4\pi kM/N) \\
    &= \sin(4\pi k(N/4 - 3/2)/N) - 3\sin(4\pi k(N/4 - 1/2)/N) \\
    &= \sin(\pi k(N-6)/N) - 3\sin(\pi k(N-2)/N) \\
    &= (-1)^{k+1}\sin(6\pi k/N) - 3(-1)^{k+1}\sin(2\pi k/N) \\
    &= (-1)^{k+1}\sin(2\pi k/N)\,[\cos(4\pi k/N) + 2\cos^2(2\pi k/N) - 3] \\
    &= \sin(4\pi kM/N)\,[\cos(4\pi k/N) - 1 + 2\cos^2(2\pi k/N) - 2] \\
    &= \sin(4\pi kM/N)\,[-4\sin^2(2\pi k/N)].
\end{aligned}
\]

For the second row,

\[
\begin{aligned}
r_2 &= \sin(4\pi k/N) - 2\sin(8\pi k/N) + \sin(12\pi k/N) \\
    &= \sin(8\pi k/N - 4\pi k/N) - 2\sin(8\pi k/N) + \sin(8\pi k/N + 4\pi k/N) \\
    &= 2\sin(8\pi k/N)\cos(4\pi k/N) - 2\sin(8\pi k/N) \\
    &= \sin(8\pi k/N)\,[2\cos(4\pi k/N) - 2] \\
    &= \sin(8\pi k/N)\,[-4\sin^2(2\pi k/N)].
\end{aligned}
\]

We have shown that the $n$'th component of the $k$'th eigenvector is $\sin(4\pi kn/N)$, and that the associated eigenvalue is $\lambda_k = -4\sin^2(2\pi k/N)$, for $1 \le n \le M$, $1 \le k \le M$, where $N = 2(2M + 1)$.
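The result of this appendix can also be checked numerically. The following Python sketch (not part of the thesis; the function name `check_eigenpairs` is hypothetical) builds the tridiagonal D-DS matrix implicitly, with -2 on the diagonal except for -3 in the last row, applies it to each hypothesized eigenvector, and reports the largest deviation from the claimed eigenvalue relation.

```python
import math


def check_eigenpairs(M):
    """Return the max residual |(A v_k)_n - lambda_k (v_k)_n| over all k, n.

    A is the M x M tridiagonal matrix with 1 on the off-diagonals and
    -2 on the diagonal, except for -3 in the last diagonal entry.
    """
    N = 2 * (2 * M + 1)
    worst = 0.0
    for k in range(1, M + 1):
        # Hypothesized eigenvector and eigenvalue for index k.
        v = [math.sin(4 * math.pi * k * n / N) for n in range(1, M + 1)]
        lam = -4.0 * math.sin(2 * math.pi * k / N) ** 2
        for i in range(M):
            diag = -3.0 if i == M - 1 else -2.0
            r = diag * v[i]
            if i > 0:
                r += v[i - 1]
            if i < M - 1:
                r += v[i + 1]
            worst = max(worst, abs(r - lam * v[i]))
    return worst


for M in (1, 2, 5, 20):
    print(M, check_eigenpairs(M))
```

A residual at the level of floating-point rounding for several values of M is consistent with the derivation, though it is of course not a substitute for it.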


Appendix B Software for the RO FFT


c C TEST DRIVER FOR THE RO FFT c c C NOTES COJIICER!HNG PERFORMAlfCE MONITORS c C COMMENTS BEGINNING lJITH CI ARE USED FOR PERFORNANCE C MONITORING DE THE IBM 3090, WHILE THOSE BEGIIHJUG \HTH CC C ARE FOR THE CRAY YI'iP, SUBROUTINE IBMTIME PROVIDES TI11E C STAMPS ON THE IBM 3090 USING THE STCK IESTRUCTIOi, WHILE C SUBROUTINE SECOND PERFORMS A SIMILAR FUNCTION ON THE CRAY C YMP. THE FOLLOWING VARIABLES ARE USED BY SUBROUTINE C IBMTIME. c CI REA.L*S !START, !STOP c c C ALLOCATE STORAGE FOR THE PREPOST-PROCESSING ALGORITHH c c PARAMETER (M=1024,HAXLEN=1024) REAL X(1:M,1:MAXLEN/2),XT(1:M,1:MAXLEN/2) REAL WSAVE(3*(MAXLEN/2-1)+1S) C ALLOCJ.TE STORAGE FOR THE COMPACT ALGORITHH c c REAL Y(l :M,1 :MAXLEIU2-1) ,EIGElliV (1 :lUXLEN/2-1), IfORDS (9) INTEGER INDX(l:MAXLEN/2-1) COMPLEX OMEGA(0:2MAXLEN-1) COMMON /VFROCOM1/ WORDS,OHEGA C PRINT COLUMN HEADINGS c WRITE(6,1) 1 FOR.MAT(lH ,1SX,'N',4X,'CH!'IT',7X,'CTRAN',7X, .t I PI NIT' ?X. 1PTRAnl'. sx. 'DELTIH' '/) c C LOOP THROUGH VALUES OF N c i = 3 1001 IF (E .GT. HAXLEN) GOTO 1002 c C CALL COMPACT INITIALIZATION c CI CALL IBf1TIIiE(TSTART) CC CINIT = SECOND() CALL VFROI(N,INDX,EIGENV,IRC) CI CALL IBMTIME(TSTOP) CI CINIT = 1, OE-6 (TSTOP TSTil.RT) CC CINIT = SECOND() CINIT IF (IRC .EQ. 0) THEN c C GENERATE TEST DATA c PI = 4.0 ATAN(l.O) TPIDN = 2.0PI/N 229


c IF (2(N/2) .EQ. N) THEN H5 = N/2-1 ELSE MS = (N-1) /2 END IF DO 200 I=1,M5 DO 201 K=1,H X(K,I) = 0.0 201 CONTINUE DO 100 J=l,MS 51 = 2.0 SIN(TPIDNIJ) DO 101 K=l,M X(K,I) = X(K,I) 51 101 CONTINUE 100 CONTINUE DO 202 K=l,M Y(K,I) = X(K,I) 202 CONTINUE 200 CONTINUE C C.A.LL COMPACT ALGORITHM c CI CALL IBMTIME(T5URT) CC CTRAN = SECOND () CALL VFFRO (H, Y) CI C!LL IBMTIME(TSTOP) CI CTRAI = 1.0E-6 (TSTOP TSTART) CC CTRAN = SECOND () CTRAN c C VERIFY COMPACT ALGORITHM OUTPUT c reo = o 5QRTN = 5QRT(FLOAT(N)) DO 300 I=l,HS RELERR = AB5(SQRTN-AB5(Y(1,I)))/SQRTN IF (RELERR .GT. l.OE-3) ICO = 1 300 CONTINUE IF (2(N/2) .BE. N) THEN c C IF ll IS ODD, THEN SET DEFAULT OUTPUT Pii.RAI1ETERS FOR PRE C POST-PROCESSING ALGORITHM c IPO = 2 PINIT 0.0 PTRAN 0.0 ELSE c C IF N IS EVEN, THEN CALL PREPOST-PROCESSIIJG ALGORITHH c CI CALL IBMTIME(TSTART) CC PII'iiiT = SECONDO CALL VSI!Il'TI01S,lrlSAVE) CI CALL IBl'ITIHE(TSTOP) CI PINIT 1. OE-6 (TSTOP TST ART) CC PINIT = SECONDO PINIT 230


CI CALL IBMTIME(TSTART) CC PTRAN = SECOND () CALL VSINT(M,MS,X,XT,M,WSAVE) CI CALL IBMTIME(TSTOP) CI PTRAN l.OE-6 (!STOP -!START) CC PTRAlll' = SECOlli'D () -PTRA.llr c C VERIFY PRE-POST-PROCESSING ALGORITHM OUTPUT c c IPO = 0 DO 400 I=1,MS RELERR = ABS(SQRTN+X(1,I))/SQRTN IF (RELERR .GT. 1.0E-3) IPO = 1 400 CONTINUE EJDIF C COMPUTE PERCENT DIFFERENCE IN TRANSFORM TIMES c c IF ((ICO .EQ. O) .AND. (IPO .EQ. 0)) THEN DELTIM = 100.0 (PTRAN-CTRAN)/PTRAN ELSE DELTIM = 0.0 END IF C OUTPUT TIMING DATA FOR I c c WRITE(6,2) N,CINIT,CTRAN,PINIT,PTRAN,DELTIN 2 FORMAT(iH ,12X,I4,2X,F10.6,2X,F10.6,2X, t F10.6,2X,F10.6,2X,F6.1) C IF VERIFICATION OF COMPACT ALGORITHM OUTPUT FAILED, THEH C OUTPUT DEBUG INFORMATION c c IF (ICO .EQ. 1) THEN VRITE(6,3) (IIDX(I),I=1,MS) 3 FORHAT(1H ,'INDX: ',128(/,416)) WRITE(6,4) (EIGENV(I),I=1,MS) 4 FORHAT(lH 'EIGEIJV: 1 ,128(/ ,4E13.4)) VRITE(6,S) (Y(1,I),I=1,MS) S FORMAT(1H ,'VFFRO OUTPUT:',128(/,4E13.4)) END IF C IF VERIFICATION OF PRE-POST-PROCESSING ALGORITHH OUTPUT C FAILED, THEN OUTPUT DEBUG INFORMATION c c IF (IPO .EQ. 1) THEN WRITE(6,6) (X(1,I),I=l,MS) 6 FORMAT(1H ,'VSINT OUTPUT:',128(/,4E13 4)) END IF END IF C INCREMEE'T N (Ill SOME FASHION) AND REITERATE LOOP UIJTIL DONE c 11 = 1+1 GOTO 1001 231


1002 CONTINUE END 232


c C SUBROUTINE: VFROI c c C DJ.ME c C VECTORIZED FOURIER TRANSFORM FOR RO SEQUENCES, C INITIALIZATION ROUTINE c c C FUNCTION c C ALL PROCESSING WHICH DEPENDS ONLY 01 THE SEQUENCE LENGTH N c c C INPUT PARAMETERS c C N: LENGTH OF RO SYMMETRIC SEQUENCE c c C OUTPUT P A.RAMETERS c C ISDX: PERMUTED INDICES OF FORWARD TRANSFORM c C EIGENV: ASSOCIATED EIGENVALUES IN ORDER SPECIFIED BY UDX C ADD SCALED BY N c C IRC: INITIALIZATION RETURN CODE c 0 c 1 c c INITIALIZATION SUCCESSFUL N NOT A PRODUCT OF 2,3,4 OR MORE THAN 10 FACTORS C OUTPUT TO COMMON (INTERNAL USE ONLY) c C MISCELLANEOUS CONSTANTS AND POWERS OF QlLIEGA USED IN C COMBINE EQUATIONS. COMI10N AREA SIZE NUST BE ESTABLISHED C BY USER. SEE TEST DRIVER FOR DETAILS. c C LIST OF FACTORS OF N c c SUBROUTINE VFROI(N,INDX,EIGENV,IRC) INTEGER IMDX(1:1) REAL EIGENV(1:1) COMPLEX 0!-tEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, t L,OMEGA INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10) COMMON /VFROCOM2/ NFAC,NE,TREE INTEGER P(3) C LIST OF FACTORS FOR FACTORIZATION OF N (ORDER IS VERY C IMPORTANT) c 233


DJ.TJ. P/4.2.3/ c C MISCELLJ.NEOUS CONSTJ.NTS c CSQRT2 = -SQRT(2.0) SQRT2D2 = -CSQRT2/2.0 SQRT3 = SQRT(3.0) CSQRT3 = -SQRT3 SQRT3D2 = SQRT3/2.0 CSQRT3D2 = -SQRT3D2 PI= 4.0+J.TAN(1.0) CTIW16 = -2.0 SIN(PI/8.0) CTIW16E3 = -2.0 SIN(3.0+PI/8.0) c C POWERS OF OMEGA c L = 2+]! OMEGA(O) = 1. 0 TPIDL = 2.0+PI/L 0MEGi(1) = CMPLX(COS(TPIDL).SIN(TPIDL)) DO 100 I=2.L-1 c OMEGA(!) = OMEGA(I-1)+0MEGA(1) 100 CONTINUE C FACTORIZATION OF N c c D'FAC(1) N lllFAC(2) = 0 I = 1 LS = N C llHILE ((NFAC(2) .LT. 10) .AND. c c c c (I .LE. 3) .AND. (LS .Gr. 1)) DO 1 IF ((NFAC(2) .GE. 10) .OR. l (I .GT. 3) .OR. i (LS .LE. 1)) GOTO 2 IQ = LS/P(I) IR = LS IQ+P(I) IF (IR .EQ. 0) THEN NFAC(2) = NFAC(2) + 1 &FJ.C(NFAC(2)+2) = P(I) LS = IQ ELSE I = I+1 EJJDIF C EllDDO c c GOTO 1 2 CONTINUE IF (LS .EQ. 1) THEN IRe = 0 234


c C GENERATE SPLITTING TREE c TPIDN = 2.0*PI/N IF (2(N/2) .EQ. N) THEN :MS = N/2-1 ELSE HS = (N-1)/2 ElirDIF DO 300 !=1, MS EIGENV(I) = 0.0 DO 200 J=l,MS Sl = 2.0 J SIN(TPIDN*IJ) EIGENV(I) = EIGENV(I) Sl :wo CONTINUE 300 CONTINUE CALL VFFROI(l,EIGENV) C PERMUTED INDEX ARRAY c c DO 400 I=l,MS IF (EIGENV(I) .GT. 0) THEN IIDX(I) = EIGENV(I) + 0.1 ELSE INDX(I) = N + EIGENV(I) + 0.1 END IF 400 CONTINUE C PERMUTED SCALED EIGENVALUE ARRAY c DO 600 I=l,MS E!GENV(I) = -4.0 Ilr (SIN(PIEIGENV(I)/N))**2 590 CONTINUE c C FACTORIZATION OF N FAILED c ELSE rae = 1 Elii'DIF RETURN END 235


C
C SUBROUTINE: VFFROI
C
C
C NAME
C
C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C INITIALIZATION ROUTINE
C
C
C FUNCTION
C
C THIS SUBROUTINE SUPERVISES THE FORWARD TRANSFORM AND
C GENERATION OF THE SPLITTING TREE BY PROCESSING THE LIST OF
C FACTORS OF THE SEQUENCE LENGTH N IN FORWARD ORDER.
C
C
C INPUT PARAMETERS
C
C M: NUMBER OF SEQUENCES TO TRANSFORM
C
C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C FIRST HALF OF A RO SEQUENCE OF LENGTH N (ELEMENTS 1
C THROUGH N/2-1 IF N IS EVEN, OR ELEMENTS 1 THROUGH (N-1)/2
C IF N IS ODD, ELEMENTS 0 AND N/2 ARE NOT INCLUDED BECAUSE
C THEY ARE ZERO)
C
C
C OUTPUT PARAMETERS
C
C X: FORWARD TRANSFORM IN PERMUTED ORDER, SCALED BY 1/N
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C INITIAL SPLITTING TREE ENTRY FOR FIRST FACTOR OF N
C
      SUBROUTINE VFFROI(M,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
      LS = NFAC(1)
      NE(1) = 1
      TREE(1,1,1) = 1
      TREE(2,1,1) = 1
      DO 100 I=1,NFAC(2)
         IP2 = I+2
C        WRITE(6,1001) NFAC(IP2)
 1001    FORMAT(1H ,'PROCESSING FACTOR ',I1)
         GOTO (1,2,3,4),NFAC(IP2)
    1    CONTINUE
    2    CONTINUE
            CALL VFFRO2I(M,LS,I,X)
            GOTO 99
    3    CONTINUE
            CALL VFFRO3I(M,LS,I,X)
            GOTO 99
    4    CONTINUE
            CALL VFFRO4I(M,LS,I,X)
   99    CONTINUE
         LS = LS/NFAC(IP2)
  100 CONTINUE
C
C SCALING IS REQUIRED IN INITIALIZATION PROCESSING FOR
C COMPUTING THE PERMUTED INDEX AND EIGENVALUE ARRAYS.
C
      SCALE = 1.0/NFAC(1)
      IF (2*(NFAC(1)/2) .EQ. NFAC(1)) THEN
         MS = NFAC(1)/2-1
      ELSE
         MS = (NFAC(1)-1)/2
      ENDIF
      DO 200 I=1,MS
         DO 201 J=1,M
            X(J,I) = X(J,I)*SCALE
  201    CONTINUE
  200 CONTINUE
      RETURN
      END


c C SUBROUTINE: VFFR04I c c C i.A.ME c C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES, C RADIX-4 INITIALIZATION ROUTINE c c C FUNCTION c C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED C BY AH INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION C OF THE FORWARD COMBINE EQUATIDrl"S FOR RADIX-4. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES YHICH REFLECT THE CHANGES C MADE BY THIS RADIX-4 SPLITTING. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURREin C STATE OF THE DATA IN THE ARRAY X (THAT IS lli'I.E lJO'.t C PROCESSING FACTOR NUMBER !C OF THE SEQUENCE LENGTH N) c C X: TiD DIMENSIONAL ARRAY, EACH ROY OF YHICH C IRTERHEDIATE RESULTS AS SPECIFIED BY THE SPLI7TING TREE C ENTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FOR'I-JARD COMBINE EQUATIDrJS FOR RADIX-4 c c C OUTPUT TO COMMON (UTERNAL USE ONLY) c C NElJ SPLITTING TREE ENTRIES WHICH REFLECT THE CHAIIIGES MADE C BY THIS RADIX-4 SPLITTING c SUBROUTINE VFFR04I 0-!,LS, IC, X) REAL X(l:M,l:l) INTEGER YFAC(1:12),NE(1:10),TREE(1:2,1:4,1 10) COMJ'!ON /VFROCOM2/ NFAC ,liTE, TREE LSD2 = LS/2 IX = 1 IN = IC+1 NE(IN) = 0 DO 1000 I=l,NE(IC) C VRITE(6,1001) TREE(1,I,IC),TREE(2,I,IC) 1001 F0RMAT(1H ,'SPLITTING TREE ENTRY= 1,2I5) 238


I c GOTO (100,200,300),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CALL VICSF4(M,LS,X(1,IX)) IX = IX+LSD2-1 IF (LS .LT. 8) THEN IF ((Il!E(U') .GE. 1) .AND .t (TREE(i,NE(IN) ,IE) ,EQ. 3)) THEn TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(Ii) + 1 TREE(l,NE(IN),IE) = 3 TREE(2,NE(IN),IM) = 1 END IF ELSEIF (LS .EQ. 8) THEN IF ((HE(IIi) .GE. 1) .AND. t (TREE(i,lJE(UJ),IN) .EQ. 2)) THEN 200 TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN) ,IN) 2 TREE(2,NE(IM),IN) 1 END IF NE(IN) = NE(IN) + TREE(1,NE(IN),IN) 3 TREE(2,NE(IN),IN) ELSE IF ((NE(IN) .GE. 1) .AND. i: (TR.EE(l,NE(IIrl) ,IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) TREE(2,NE(IN),IN) ElliDIF NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 2 TREE(2,NE(IN),IN) 1 NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IU) 1 END IF GOTO 1000 CONTIIWE C ISCS SYMMETRY OCCURS AT MOST ONCE -NO LOOP REQUIRED c CALL VISCSF4(H,LS,X(1,IX)) IX = IX+LSD2 IF ((NE(IN) .GE. 1) .A.JW. t (TREE(1,NE(IN),IN) .EQ. 3)) THEN TREE(2,ll!E(IE),IN) = TREE(2,NE(IN),IN) + 2 ELSE 239


BE(IN) = NE(IN) + TREE(1,NE(IN),IN) 3 TREE(2,NE(IN) ,IN) 2 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,-1 CALL VIF4(M,LS,X(1,IX)) IX = IX+LS IF ((liE(IIO .GE. 1) .AUD. & (TREE(1,NE(IN),IN) .EQ. 3)) THEN TREE(2,NE(IJJ) ,IN) TREE(2,NE(Illr) ,IN) + 4 ELSE IE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IN) 4 END IF 301 CONTINUE 1000 CONTINUE RETURN EHD 240


c C SUBROUTINE: VFFR02I c c C NAME c C VECTOR! ZED FORWARD FOURIER TR.ArJSFORH FOR RO SEQUENCES, C RADIX-2 INITIALIZATION ROUTINE c c C FUIIICTIOII c C THIS SUBROUTIIJE USES THE SPLITTING TREE ENTRIES SPECIFIED C BY AN INPUT PARAl'IETER (IC) TO SUPERVISE THE APPLICATIDrf C OF THE FORWARD COI1BINE EQUATIONS FOR RADIX-2. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES WHICH REFLECT THE CHANGES C HADE BY THIS RADIX-2 SPLITTING. c c C INPUT PARAMETERS c C M: l'lUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: UDEX INTO SPLITTING TREE '1-JHICH SPECIFIES THE CURRENT C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE YOU C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH 10 c C X: TilO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAII[S C INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTIIW TREE C ENTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-2 c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C lEW SPLITTING TREE ENTRIES WHICH REFLECT THE Cli.ANGES MADE C BY THIS RADIX-2 SPLITTING c SUBROUTINE VFFR02I(M,LS,IC,X) REAL X(1:M,1:1) INTEGER EFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10) COMMON /VFROCOM2/ NFAC,NE,TREE LSD2 = LS/2 IX = 1 IN = IC+1 EE(IN) = 0 DO 1000 I=1,NE(IC) C WRITE(6,1001) TREE(l,I,IC) 1001 FORMAT(1H 1SPLITTING TREE ENTRY = 1 241


I c GOTO (100,200,300),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CALL VICSF2(M,LS,X(1,IX)) IX = IX+LSD2-1 IF ((NE(IN) .GE. 1) .AND. t (TftEE(l,NE(II),IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 1 TREE(2,NE(IN),IN) 1 END IF NE(IN) = NE(IN) + 1 TREE(l,l'lE(Ilii'),IN) 2 TREE(2,NE(IN),IN) = 1 GOTO 1000 200 CONTINUE C ISCS SYMMETRY OCCURS AT MOST ONCE -NO LOOP REQUIRED c CALL VISCSF2(M,LS,X(1,IX)) IX = IX+LSD2 IF ((NE(IN) .GE. 1) . ND. i (TREE(l,NE(IN) ,IN) .EQ. 3)) TEEm TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE IE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IN) = 1 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,-1 CALL VIF2(H,LS,X(1,IX)) IX = IX+LS IF ((NE(II) .GE. 1) .AND. t (TREE(i,RE(Illl') ,nl') .EQ. 3)) THEN TREE(2,]!'E(I!J) ,BJ) = TREE(2,fJE(IN) ,I!J) + 2 ELSE NE(Illl') = lli'E(Illl') + 1 TREE(1,NE(IN) ,IN) 3 TREE(2,NE(IN),IN) 2 ENDIF 301 CONTINUE 1000 CONTINUE RETURN END 242


c C SUBROUTINE: VFFR03I c c C NAME c C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES, C RADIX-3 INITIALIZATION ROUTINE c c C FUNCTION c C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES 3PECIFIED C BY AN IflPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION C OF THE FORWARD COMBIIE EQUATIONS FOR RADIX-3. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES YHICH REFLECT CHANGES C HlDE BY THIS RADIX-3 SPLITTING. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LErJGTH ll!) c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH C IITERMEDIATE RESULTS AS SPECIFIED BY THE TREE C EBTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-3 c c C OUTPUT TO COHHON (INTERNAL USE ONLY) c C JJEW SPLITTING TREE ENTRIES WHICH REFLECT THE CHANGES NADE C BY THIS RADIX-3 SPLITTING c SUBROUTINE VFFR03I(M,LS,IC,X) REAL X(1:M,1:1) INTEGER NFAC(1:12),ME(1:10),TREE(1:2,1:4,1 10) COMMON /VFROCOM2/ EFAC,NE,TREE LSM1D2 = (LS-1)/2 IX = 1 IE' = IC+l NE(IE) = 0 DO 1000 I:l,NE(IC) C WRITE(6,1001) TREE(1,I,IC),TREE(2,I,IC) 1001 FORHAT(1H ,'SPLITTING TREE ENTRY= ',2I5) 243


c GOTO (100,200,300,400),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CiLL VICSF3(M,LS,X(1,IX)) IX = IX+LSM102 IF (LS .LT. 6) THEN IF ((NE(IliJ) .GE. 1) .AND. & (TREE(1,NE(IN),IN) .EQ. 4)) THEN TREE(2,NE(IN) ,IN) = TREE(2,1l!E(IN) ,HI) + 1 ELSE NE(IM) = NE(IN) + 1 TREE(1,NE(IN),I]J) 4 TREE(2,NE(IN),IN) = 1 END IF ELSE IF ((NE(IN) .GE. 1) .AND. t (TREE(1,NE(IN),IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN),IN) TREE(2,NE(IN),IN) = EJI'DIF NE(IN) = NE(IN) + 1 TREE(1,NE(IN),IN) = 4 TREE(2,NE(IN),IN) = 1 END IF GOTO 1000 200 CONTINUE C ISCS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c CALL VISCSF3(M,LS,X(1,IX)) IX = IX+LSM1D2 IF (LS .LT. 6) THEN IF ((NE(IN) .GE. 1) .AND. l (TREE(1,NE(IJ0,IN) .EQ. 4)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,llrE(IN),IN) 4 TREE(2,NE(IN),IN) = 1 END IF ELSE IF ((NE(IN) .GE. 1) .AND. i: (TREE(l,NE(IN) ,IN) .EQ. 4)) THEN TREE(2,NE(IN),IN): TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = .E(IN) + 1 TREE(l,NE';(IN),IN) = 4 TREE(2,NE'(IN),IN) = 1 END IF NE(IN) = NE(IN) + 1 244


TREE(1,NE(IN),IN) 2 TREE(2,NE(IN),IE) 1 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,-1 CALL VIF3(M,LS,X(l,IX)) IX = IX+LS IF ((NE(IN) .GE. 1) .AND. (TREE(l,NE(IN),IN) .EQ. 3)) THEN TREE(2,NE(IN),Ii) TREE(2,NE(IN),IN) + 3 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN),Ii) 3 TREE(2,NE(IN),IE) 3 END IF 301 CONTINUE GOTO 1000 40o comrnrUE DO 401 J=TREE(2,I,IC),1,-1 CALL VI2F3(M,LS,XC1,IX)) IX = IX+LS IF ((NE(II) .GE. 1) .AND. t (TREE(1,EE(IM),IN) .EQ. 4)) THEN TREE(2,NE(IN),IN) TREE(2,NE(IN),IN) + 3 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN) ,IN) 4 TREE(2,NE(IJJ) ,IN) 3 END IF 401 CONTINUE 1000 COE"TiliJUE RETURN END 245


C
C SUBROUTINE: VFFRO
C
C
C NAME
C
C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES
C
C
C FUNCTION
C
C THIS SUBROUTINE SUPERVISES THE FORWARD TRANSFORM BY
C PROCESSING THE LIST OF FACTORS OF THE SEQUENCE LENGTH N IN
C FORWARD ORDER.
C
C
C INPUT PARAMETERS
C
C M: NUMBER OF SEQUENCES TO TRANSFORM
C
C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C FIRST HALF OF A RO SEQUENCE OF LENGTH N (ELEMENTS 1
C THROUGH N/2-1 IF N IS EVEN, OR ELEMENTS 1 THROUGH (N-1)/2
C IF N IS ODD, ELEMENTS 0 AND N/2 ARE NOT INCLUDED BECAUSE
C THEY ARE ZERO)
C
C
C OUTPUT PARAMETERS
C
C X: FORWARD TRANSFORM IN PERMUTED ORDER, SCALED BY 1/N.
C SCALING SHOULD BE DELETED IN THE FINAL VERSION OF THIS
C CODE. IT IS INCLUDED ONLY FOR PERFORMANCE COMPARISONS
C WITH VFFTPK.
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C NONE
C
      SUBROUTINE VFFRO(M,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
      LS = NFAC(1)
      DO 100 I=1,NFAC(2)
         IP2 = I+2
         GOTO (1,2,3,4),NFAC(IP2)
    1    CONTINUE
    2    CONTINUE
            CALL VFFRO2(M,LS,I,X)
            GOTO 99
    3    CONTINUE
            CALL VFFRO3(M,LS,I,X)
            GOTO 99
    4    CONTINUE
            CALL VFFRO4(M,LS,I,X)
   99    CONTINUE
         LS = LS/NFAC(IP2)
  100 CONTINUE
C
C SCALING SHOULD BE DELETED IN THE FINAL VERSION OF THIS
C CODE. IT IS INCLUDED ONLY FOR PERFORMANCE COMPARISONS WITH
C VFFTPK.
C
      SCALE = 1.0/SQRT(FLOAT(NFAC(1)))
      IF (2*(NFAC(1)/2) .EQ. NFAC(1)) THEN
         MS = NFAC(1)/2-1
      ELSE
         MS = (NFAC(1)-1)/2
      ENDIF
      DO 200 I=1,MS
         DO 201 J=1,M
            X(J,I) = X(J,I)*SCALE
  201    CONTINUE
  200 CONTINUE
      RETURN
      END


C
C SUBROUTINE: VFFRO4
C
C
C NAME
C
C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C RADIX-4 SUPERVISOR ROUTINE
C
C
C FUNCTION
C
C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C THE FORWARD COMBINE EQUATIONS FOR RADIX-4.
C
C
C INPUT PARAMETERS
C
C M: NUMBER OF SEQUENCES TO TRANSFORM
C
C LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C ENTRIES CORRESPONDING TO IC
C
C
C OUTPUT PARAMETERS
C
C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-4
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C NONE
C
      SUBROUTINE VFFRO4(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSD2 = LS/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300),TREE(1,I,IC)
  100    CONTINUE
C
C ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF4(M,LS,X(1,IX))
            IX = IX+LSD2-1
            GOTO 1000
  200    CONTINUE
C
C ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF4(M,LS,X(1,IX))
            IX = IX+LSD2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF4(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
 1000 CONTINUE
      RETURN
      END


C
C SUBROUTINE: VFFRO2
C
C
C NAME
C
C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C RADIX-2 SUPERVISOR ROUTINE
C
C
C FUNCTION
C
C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C THE FORWARD COMBINE EQUATIONS FOR RADIX-2.
C
C
C INPUT PARAMETERS
C
C M: NUMBER OF SEQUENCES TO TRANSFORM
C
C LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C ENTRIES CORRESPONDING TO IC
C
C
C OUTPUT PARAMETERS
C
C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-2
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C NONE
C
      SUBROUTINE VFFRO2(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSD2 = LS/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300),TREE(1,I,IC)
  100    CONTINUE
C
C ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF2(M,LS,X(1,IX))
            IX = IX+LSD2-1
            GOTO 1000
  200    CONTINUE
C
C ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF2(M,LS,X(1,IX))
            IX = IX+LSD2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF2(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
 1000 CONTINUE
      RETURN
      END


C
C SUBROUTINE: VFFRO3
C
C
C NAME
C
C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C RADIX-3 SUPERVISOR ROUTINE
C
C
C FUNCTION
C
C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C THE FORWARD COMBINE EQUATIONS FOR RADIX-3.
C
C
C INPUT PARAMETERS
C
C M: NUMBER OF SEQUENCES TO TRANSFORM
C
C LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C ENTRIES CORRESPONDING TO IC
C
C
C OUTPUT PARAMETERS
C
C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-3
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C NONE
C
      SUBROUTINE VFFRO3(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSM1D2 = (LS-1)/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300,400),TREE(1,I,IC)
  100    CONTINUE
C
C ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF3(M,LS,X(1,IX))
            IX = IX+LSM1D2
            GOTO 1000
  200    CONTINUE
C
C ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF3(M,LS,X(1,IX))
            IX = IX+LSM1D2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF3(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
            GOTO 1000
  400    CONTINUE
            DO 401 J=TREE(2,I,IC),1,-1
               CALL VI2F3(M,LS,X(1,IX))
               IX = IX+LS
  401       CONTINUE
 1000 CONTINUE
      RETURN
      END


c C SUBROUTINE: VICSF4 c c C DAME c C VECTORIZED ICS INDUCED SYMMETRIES FORWARD FOR C RADIX-4 c c C FUNCTIOE' c C THIS SUBROUTINE EXECUTES THE RJI.DIX-4 FORWARD CGr1BINE C EQUATIONS FOR ICS INDUCED SYNMETRIES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TTJO DIHENSIDiililL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX-4 FORRARD COMBINE EQUATIONS FOR ICS C INDUCED SYMMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C NOME c SUBROUTIDE VICSF4(M,LS,X) REAL X(l:M,l:LS/2-1) COMPLEX OMEGA(O:O) COMMON /VFROCOMl/ CSQRT2,SQRT2D2, LSD4 = LS/4 DO 1 J=l,M SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, L,OMEGA X(J,LSD4) = (-2.0) X(J,LSD4) 1 CONTINUE IF (8(15/8) .EQ. LS) THEN LSDS = LS/8 I3LSD8 = 3LSD8 DO 2 J=1,M V1 = CSQRT2 (X(J,LSD8) + X(J,I3LSD8)) X(J,LSD8) = 2.0 (X(J,LSD8) X(J,I3LSD8)) X(J,I3LSD8) = V1 2 CONTINUE 254


T c -------------......... HS = LSDB-1 ELSE MS = (LS-4)/8 END IF IF (LS .GT. 8) THEE LSD2 = LS/2 LDLS = L/LS DO 100 I=l,MS LSD4MI = LSD4I LSD4PI = LSD4+I LSD2Ml = LSD2-I ILDLS = ILDLS 51 = REAL(DHEGA(ILDLS)) 52 = AIMAG(OMEGA(ILDLS)) DO 101 J=1,M Vl = X(J,I) + X(J,LSD2MI) V2 = X(J,I) X(J,LSD2MI) V3 = X(J,LSD4PI) X(J,LSD4MI) V4 = -X(J,LSD4PI) X(J,LSD4MI) X(J,I) = V2 + V3 X(J,LSD4MI) = V2 -V3 C Ct = COIJG((S1,S2)) CHPLX(V1,V4) C X(J,LSD4PI) = AIMAG(C1) C X(J,LSD2MI) = REAL(C1) c X(J,LSD4PI) = S1V4 S2*V1 X(J,LSD2MI) = StVt + S2V4 101 CONTINUE 100 CONTINUE EIDIF RETURN END 255


c C SUBROUTINE: VISCSF4 c c C NAME c C VECTORIZED ISCS INDUCED SYNI>IETRIES FORWARD COHBnED FOR C li.A.DIX-4 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX-4 FORWARD CGNB:i:NE C EQUATIONS FOR ISCS INDUCED SYIDiETRIES. c c C INPUT PARAMETERS c C H: NUMBER OF SEQUENCES TO TRANSFORJ{ c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX-4 FORWARD COMBINE EQUATIONS FOR ISCS C INDUCED SYMMETRIES c c C OUTPUT TO COHMOi (INTERNAL USE ONLY) c C NONE c SUBROUTINE VISCSF4(M,LS,X) REAL X(l:M,O:LS/2-1) COMPLEX OMEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, LSD4 = LS/4 DO 1 J=1,M L,OI1EGA Vi= CSQRT2 X(J,LSD4) V2 = Vl + X(J,O) X(J,O) = Vl X(J,O) X(J,LSD4) = V2 i COE!TINUE IF (B(LS/8) .EQ. LS) THEN LSDB = LS/8 I3LSD8 = 3LSD8 DO 2 J=l,H Vi = CTIW1SX(J,I3LSD8) + CTIWi6E3X(J,LSD8) 256


X(J,I3LSD8) = CTIW16E3*X(J,I3LSD8) CTIW16Y(J,LSD8) X(J,LSDS) =Vi 2 CONTINUE MS = LSDB-1 ELSE MS = (LS-4)/8 END IF IF (LS .GT. 8) THEN LSD2 = LS/2 LD2LS = L/(2LS) DO 100 I=l,MS LSD4MI = LSD4 -I LSD4PI = LSD4 + I LSD2MI = LSD2 -I ILD2LS = ILD2LS I3ILD2LS = 3ILD2LS 51 -REAL(OMEGA(ILD2LS)) 82 -AIMAG(OMEGA(ILD2LS)) 53 -REAL(OHEGA(I3ILD2LS)) 54 -AIMAG(OMEGA(I3ILD2LS)) DO 101 J=1,M V1 = SQRT2D2 (X(J,LSD4HI) + X(J,LSD4PI)) V2 SQRT2D2 (X(J,LSD4MI) X(J,LSD4PI)) V3 V1 X(J ,I) V4 V1 + X(J ,I) c VS V2 X(J,LSD2MI) V6 = -V2 X(J,LSD2MI) C Cl = CONJG((Sl,S2)) CMPLX(V6,V4) C X(J,I) = AIMAG(Cl) C X(J,LSD4MI) = REAL(C1) c X(J,I) = S1V4 S2V6 l(J,LSD4MI) = S1*V6 + S2V4 c C C2 = CONJG((S3,S4)) CMPLX(VS,V3) C X(J,LSD4PI) AIMAG(C2) C l(J,LSD2MI) = REiL(C2) c X(J,LSD4PI) X(J,LSD2MI) 101 CONTINUE 100 CONTINUE END IF RETURN END S3*V3 S4VS S3VS + S4V3 257


c C SUBROUTINE: VIF4 c c C IU.HE c C VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADII-4 c c C FUJJCTIOM c C THIS SUBROUTINE EXECUTES THE RADIX-4 FORWARD COMBnJE C EQUATIONS FOR I SEQUENCES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TVO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF U I SYmiETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX-4 FORWARD COMBINE EQUATIONS FOR I C SEQUENCES c c C OUTPUT TO COMI'!:ON (IIrTERN.li.L USE ONLY) c C NOllE c SUBROUTINE VIF4(M,LS,X) REAL X(l:M,O:LS-1) COMPLEX OMEGA(O:O) COMMOE /VFROCOM1/ CSQRT2,SQRT2D2, i SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, i: CTIW16,CTHJ16E3, & L,ONEGA LSD4 = LS/4 LSD2 = LS/2 I3LSD4 = 3LSD4 DO 1 J=l,H Vi X(J,O) + X(J,LSD2) V2 X(J,O) X(J,LSD2) V3 2.0*X(J,LSD4) V4 2.0*X(J,I3LSD4) X(J,O) = V1 + V3 X(J,LSD4) = V2-V4 X(J ,LSD2) = V1 -V3 X(J,I3LSD4) = V2 + V4 1 CONTIJIIUE 258


2 c IF (8(15/8) .EQ. LS) THEN LSD8 = LS/8 I3LSD8 3LSD8 I5LSD8 = 5LSD8 I7LSD8 = 7LSD8 DO 2 J=l,M Vl = X(J,LSDS) + X(J,I3LSD8) V2 X(J,LSD8) X(J,I3LSD8) V3 = X(J,I5LSD8) + X(J,I7L5D8) V4 = X(J,I5LSD8) X(J,I7LSD8) X(J,L5D8) = 2.0 Vl X(J,I3L5D8) = C5QRT2 (V3 V2) X(J,I5L5D8) 2.0 V4 X(J,I7LSD8) = C5QRT2 (V3 + V2) CONTINUE MS = 1508-1 ELSE MS = (15-4)/8 END IF IF (LS .GT. 8) THEN LDL5 = 1/LS DO 100 I=l,MS L5D4MI = LSD4 -I L5D4PI = LSD4 + I LSD2MI = 1502 I LSD2PI = LSD2 + I I3LSD4MI = I3LSD4 I I3LSD4PI = I3LSD4 + I LSMI=1S-I ILDLS = ILDLS I2ILDLS = 2*ILDLS I3ILDLS = 3ILDLS 51 = REAL(OMEGA(ILDLS)) 52 = AIMAG(OMEGA(ILDLS)) 53= REAL(OMEGA(I2ILDLS)) 54 = AIMAG(OMEGA(I2ILDLS)) 55 = REAL(OMEGA(I3ILDLS)) 56 = A!MAG(OMEGA(I3ILDLS)) DO 101 J=l,H Vl = X(J,I) + X(J,LSD2MI) V2 = X(J,I) X(J,LSD2MI) V3 = X(J,LSD4PI) + X(J,LSD4MI) V4 X(J,LSD4PI) X(J,LSD4MI) VS = X(J,I3LSD4MI) + X(J,I3LSD4PI) V6 = X(J,I3LSD4MI) X(J,I3LSD4PI) V7 = X(J,LSMI) + X(J,LSD2PI) VB= X(J,LSMI) XCJ,LSD2PI) C Cl = CMPLX(V8+V6,V1+V3) C X(J,I) = AIMAG(Cl) C X(J,LSD4MI) = REAL(Cl) c c X(J,I) = Vl + V3 X(J,LSD4MI) VS + V6 259


C C2 = CONJG((S1,52)) CMPLX(V7+V4.V2-V5) C X(J ,LSD4PI) .A.I11AG(C2) C X(J,LSD2HI) = REAL(C2) c c V9 = V7 + V4 V10 = V2 VS X(J,LSD4PI) S1V10-S2V9 X(J ,LSD2IU) = S1V9 + S2*V10 C C3 = CONJG((S3,S4)) CMPLX(V8-V6,V1-V3) C X(J,LSD2PI) = AIHAG(C3) C X(J,I3LSD4HI) = REAL(C3) c c V9 = V8 V6 V10 = Vi V3 X(J,LSD2PI) = S3V10-S4*V9 X(J,I3LSD4MI) S3V9 + S4V10 C C4 = COJJJG((S5,S6)) CNPLX(V7-V4, V2+VS) C X(J,I3LSD4PI) = AIHAG(C4) C X(J,LSMI) = REAL(C4) c V9 = V7 V4 V10 = V2 + VS X(J,I3LSD4PI) = SS*V10-S6V9 X(J,LSHI) = SSV9 + S6V10 101 CONTINUE 100 CONTINUE END IF RETURN END 260

PAGE 272

C
C     SUBROUTINE: VICSF2
C
C     NAME
C
C     VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-2
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-2 FORWARD COMBINE
C     EQUATIONS FOR ICS INDUCED SYI
PAGE 273

C
C     SUBROUTINE: VISCSF2
C
C     NAME
C
C     VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-2
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-2 FORWARD COMBINE
C     EQUATIONS FOR ISCS INDUCED SYMMETRIES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-2 FORWARD COMBINE EQUATIONS FOR ISCS
C         INDUCED SYMMETRIES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VISCSF2(M,LS,X)
      REAL X(1:M,0:LS/2-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      DO 1 J=1,M
         X(J,0) = -X(J,0)
    1 CONTINUE
      IF (LS .GT. 4) THEN
         LSD2 = LS/2
         LD2LS = L/(2*LS)
         DO 100 I=1,(LS-2)/4
            LSD2MI = LSD2-I
            ILD2LS = I*LD2LS
            S1 = -REAL(OMEGA(ILD2LS))
            S2 = -AIMAG(OMEGA(ILD2LS))
            DO 101 J=1,M

PAGE 274

C
C              C1 = CONJG((S1,S2))*CMPLX(-X(J,LSD2MI),X(J,I))
C              X(J,I) = AIMAG(C1)
C              X(J,LSD2MI) = REAL(C1)
C
               V1 = S2*X(J,I) - S1*X(J,LSD2MI)
               X(J,I) = S1*X(J,I) + S2*X(J,LSD2MI)
               X(J,LSD2MI) = V1
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 275

C
C     SUBROUTINE: VIF2
C
C     NAME
C
C     VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADIX-2
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-2 FORWARD COMBINE
C     EQUATIONS FOR I SEQUENCES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-2 FORWARD COMBINE EQUATIONS FOR I
C         SEQUENCES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VIF2(M,LS,X)
      REAL X(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      LSD2 = LS/2
      DO 1 J=1,M
         V1 = X(J,0) - X(J,LSD2)
         X(J,0) = X(J,0) + X(J,LSD2)
         X(J,LSD2) = V1
    1 CONTINUE
      IF (LS .GT. 4) THEN
         LDLS = L/LS
         DO 100 I=1,(LS-2)/4
            LSD2MI = LSD2-I
            LSD2PI = LSD2+I
            LSMI = LS-I
            ILDLS = I*LDLS

PAGE 276

            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))
            DO 101 J=1,M
               V1 = X(J,I) + X(J,LSD2MI)
               V2 = X(J,I) - X(J,LSD2MI)
               V3 = X(J,LSMI) + X(J,LSD2PI)
               V4 = X(J,LSMI) - X(J,LSD2PI)
C              C1 = CMPLX(V4,V1)
C              X(J,I) = AIMAG(C1)
C              X(J,LSD2MI) = REAL(C1)
C
               X(J,I) = V1
               X(J,LSD2MI) = V4
C              C2 = CONJG((S1,S2))*CMPLX(V3,V2)
C              X(J,LSD2PI) = AIMAG(C2)
C              X(J,LSMI) = REAL(C2)
C
               X(J,LSD2PI) = S1*V2 - S2*V3
               X(J,LSMI) = S1*V3 + S2*V2
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 277

C
C     SUBROUTINE: VICSF3
C
C     NAME
C
C     VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR ICS INDUCED SYMMETRIES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR ICS
C         INDUCED SYMMETRIES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VICSF3(M,LS,X)
      REAL X(1:M,1:(LS-1)/2)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      LSD3 = LS/3
      DO 1 J=1,M
         X(J,LSD3) = CSQRT3 * X(J,LSD3)
    1 CONTINUE
      IF (LS .GT. 6) THEN
         LDLS = L/LS
         DO 100 I=1,(LS-3)/6
            LSD3MI = LSD3-I
            LSD3PI = LSD3+I
            ILDLS = I*LDLS
            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))

PAGE 278

C
            DO 101 J=1,M
               V1 = CSQRT3D2 * (X(J,LSD3PI) + X(J,LSD3MI))
               V2 = X(J,LSD3PI) - X(J,LSD3MI)
C              C1 = CONJG((S1,S2))*CMPLX((-0.5)*V2+X(J,I),V1)
C              X(J,LSD3MI) = REAL(C1)
C              X(J,LSD3PI) = AIMAG(C1)
C
               V3 = (-0.5)*V2 + X(J,I)
               X(J,LSD3MI) = S1*V3 + S2*V1
               X(J,LSD3PI) = S1*V1 - S2*V3
               X(J,I) = V2 + X(J,I)
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 279

C
C     SUBROUTINE: VISCSF3
C
C     NAME
C
C     VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR ISCS INDUCED SYMMETRIES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR ISCS
C         INDUCED SYMMETRIES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VISCSF3(M,LS,X)
      REAL X(1:M,0:(LS-3)/2)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      LSM3D6 = (LS-3)/6
      DO 1 J=1,M
         X(J,LSM3D6) = CSQRT3 * X(J,LSM3D6)
    1 CONTINUE
      IF (LS .GT. 6) THEN
         LSM1D2 = (LS-1)/2
         LD2LS = L/(2*LS)
         DO 100 I=1,LSM3D6
            LSM3D6MI = LSM3D6 - I
            LSM3D6PI = LSM3D6 + I
            LSM1D2MI = LSM1D2 - I
            ILD2LS = I*LD2LS

PAGE 280

C
            S1 = REAL(OMEGA(ILD2LS))
            S2 = AIMAG(OMEGA(ILD2LS))
            DO 101 J=1,M
               V1 = CSQRT3D2 * (X(J,LSM3D6PI) + X(J,LSM3D6MI))
               V2 = X(J,LSM3D6PI) - X(J,LSM3D6MI)
C              C1 = CONJG((S1,S2))*CMPLX((-0.5)*V2+X(J,LSM1D2MI),V1)
C              X(J,LSM3D6MI) = REAL(C1)
C              X(J,LSM3D6PI) = AIMAG(C1)
C
               V3 = (-0.5)*V2 + X(J,LSM1D2MI)
               X(J,LSM3D6MI) = S1*V3 + S2*V1
               X(J,LSM3D6PI) = S1*V1 - S2*V3
               X(J,LSM1D2MI) = V2 + X(J,LSM1D2MI)
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 281

C
C     SUBROUTINE: VIF3
C
C     NAME
C
C     VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR I SEQUENCES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR I
C         SEQUENCES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VIF3(M,LS,X)
      REAL X(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      LSD3 = LS/3
      I2LSD3 = 2*LSD3
      DO 1 J=1,M
         V1 = SQRT3 * X(J,I2LSD3)
         V2 = X(J,0) - X(J,LSD3)
         X(J,0) = X(J,0) + 2.0*X(J,LSD3)
         X(J,LSD3) = V2 - V1
         X(J,I2LSD3) = V2 + V1
    1 CONTINUE
      IF (LS .GT. 6) THEN
         LDLS = L/LS
         DO 100 I=1,(LS-3)/6
            LSD3MI = LSD3 - I

PAGE 282

            LSD3PI = LSD3 + I
            I2LSD3MI = I2LSD3 - I
            I2LSD3PI = I2LSD3 + I
            LSMI = LS - I
            ILDLS = I*LDLS
            I2ILDLS = 2*ILDLS
            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))
            S3 = REAL(OMEGA(I2ILDLS))
            S4 = AIMAG(OMEGA(I2ILDLS))
            DO 101 J=1,M
               V1 = X(J,LSD3MI) + X(J,LSD3PI)
               V2 = SQRT3D2 * (X(J,LSD3MI) - X(J,LSD3PI))
               V3 = SQRT3D2 * (X(J,I2LSD3MI) + X(J,I2LSD3PI))
               V4 = X(J,I2LSD3MI) - X(J,I2LSD3PI)
C
               V5 = (-0.5)*V1 + X(J,I)
               V6 = (-0.5)*V4 + X(J,LSMI)
C              C1 = CMPLX(V4+X(J,LSMI),V1+X(J,I))
C              X(J,I) = AIMAG(C1)
C              X(J,LSD3MI) = REAL(C1)
C
               X(J,I) = V1 + X(J,I)
               X(J,LSD3MI) = V4 + X(J,LSMI)
C              C2 = CONJG((S1,S2))*CMPLX(V6-V2,V5-V3)
C              X(J,LSD3PI) = AIMAG(C2)
C              X(J,I2LSD3MI) = REAL(C2)
C
               V7 = V6 - V2
               V8 = V5 - V3
               X(J,LSD3PI) = S1*V8 - S2*V7
               X(J,I2LSD3MI) = S1*V7 + S2*V8
C              C3 = CONJG((S3,S4))*CMPLX(V6+V2,V5+V3)
C              X(J,I2LSD3PI) = AIMAG(C3)
C              X(J,LSMI) = REAL(C3)
C
               V7 = V6 + V2
               V8 = V5 + V3
               X(J,I2LSD3PI) = S3*V8 - S4*V7
               X(J,LSMI) = S3*V7 + S4*V8
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 283

C
C     SUBROUTINE: VI2F3
C
C     NAME
C
C     VECTORIZED I2 SEQUENCES FORWARD COMBINED FOR RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR I2 SEQUENCES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN I2 SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR I2
C         SEQUENCES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VI2F3(M,LS,X)
      REAL X(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      EQUIVALENCE (IR0,LSM3D6)
      IR0 = (LS-3)/6
      IR1 = (LS-1)/2
      IR2 = (Ei>t
PAGE 284

C
         DO 100 I=1,LSM3D6
            IR0MI = IR0 - I
            IR0PI = IR0 + I
            IR1MI = IR1 - I
            IR1PI = IR1 + I
            IR2MI = IR2 - I
            IR2PI = IR2 + I
            ILDLS = I*LDLS
            I2ILDLS = 2*ILDLS
            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))
            S3 = REAL(OMEGA(I2ILDLS))
            S4 = AIMAG(OMEGA(I2ILDLS))
            DO 101 J=1,M
               V1 = SQRT3D2 * (X(J,IR0MI) + X(J,IR0PI))
               V2 = X(J,IR0MI) - X(J,IR0PI)
               V3 = X(J,IR2MI) + X(J,IR2PI)
               V4 = SQRT3D2 * (X(J,IR2MI) - X(J,IR2PI))
               V5 = (-0.5)*V2 + X(J,IR1MI)
               V6 = (-0.5)*V3 + X(J,IR1PI)
C              C1 = CMPLX(V2+X(J,IR1MI),V3+X(J,IR1PI))
C              X(J,IR0MI) = REAL(C1)
C              X(J,IR0PI) = AIMAG(C1)
C
               X(J,IR0MI) = V2 + X(J,IR1MI)
               X(J,IR0PI) = V3 + X(J,IR1PI)
C              C2 = CONJG((S1,S2))*CMPLX(V5-V4,V6-V1)
C              X(J,IR1MI) = REAL(C2)
C              X(J,IR1PI) = AIMAG(C2)
C
               V7 = V5 - V4
               V8 = V6 - V1
               X(J,IR1MI) = S1*V7 + S2*V8
               X(J,IR1PI) = S1*V8 - S2*V7
C              C3 = CONJG((S3,S4))*CMPLX(V5+V4,V6+V1)
C              X(J,IR2MI) = REAL(C3)
C              X(J,IR2PI) = AIMAG(C3)
C
               V7 = V5 + V4
               V8 = V6 + V1
               X(J,IR2MI) = S3*V7 + S4*V8
               X(J,IR2PI) = S3*V8 - S4*V7
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END

PAGE 285

Appendix C FORTRAN Skeleton for Combine Equations

PAGE 286

C
C     SUBROUTINE: V_acronym_Fori_p
C
C     NAME
C
C     VECTORIZED acronym INDUCED SYMMETRIES forward or inverse
C     COMBINED FOR RADIX-p
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-p forward or inverse
C     COMBINE EQUATIONS FOR acronym INDUCED SYMMETRIES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C     A:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         idft or idst OF AN acronym SYMMETRIC SEQUENCE OF LENGTH
C         LS
C
C     OUTPUT PARAMETERS
C
C     A:  UPDATED BY RADIX-p forward or inverse COMBINE
C         EQUATIONS FOR acronym INDUCED SYMMETRIES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE V_acronym_Fori_p(M,LS,A)
      REAL A(1:M,0or1:function_of_LS)
      COMPLEX OMEGA(0:0)
      COMMON /V_ForS_symmetry_COM1/ miscellaneous constants,
     &                              L,OMEGA
      INTEGER P,TWOP
      PARAMETER (P=p,TWOP=2*P)
C
C     COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
    1 CONTINUE
      IF (TWOP*(LS/TWOP) .EQ. LS) THEN
C
C        COMPUTATIONS FOR I = LS/TWOP
C
         DO 2 J=1,M

PAGE 287

C
    2    CONTINUE
         MS = LS/TWOP-1
      ELSE
         MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C        COMPUTATIONS FOR I = 1,MS
C
         DO 100 I=1,MS
            DO 101 J=1,M
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END
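The skeleton above splits the index range into three cases: I = 0, the midpoint I = LS/TWOP (handled separately only when 2p divides LS), and the general indices I = 1,...,MS. As a hedged illustration of how the upper bound MS is chosen, the following Python sketch mirrors the integer arithmetic of the skeleton. The function name `general_index_bound` is hypothetical and does not appear in the thesis.

```python
def general_index_bound(ls, p):
    """Mirror the skeleton's choice of MS for subsequence length ls, radix p.

    Returns (ms, has_midpoint), where has_midpoint says whether the
    separate "I = LS/TWOP" block of the skeleton is executed.
    Python's // matches FORTRAN integer division for these positive values.
    """
    twop = 2 * p
    if twop * (ls // twop) == ls:
        # 2p divides LS: the midpoint index is treated on its own,
        # so the general loop stops one short of it.
        return ls // twop - 1, True
    # Otherwise there is no midpoint index.
    return (ls - p) // twop, False


if __name__ == "__main__":
    for ls, p in [(32, 4), (12, 4), (9, 3), (8, 4)]:
        print(ls, p, general_index_bound(ls, p))
```

For example, with LS = 32 and p = 4 the general loop runs over I = 1, 2, 3 and the midpoint I = 4 is handled separately. In the skeleton, the `IF (LS .GT. TWOP)` guard additionally skips the general loop for the shortest subsequences.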

PAGE 288

Appendix D Mathematica Scripts

PAGE 289

(* VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED EVEN RADIX *)

(* BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES MSG.M AND REIM.M.
   MSG.M PROVIDES TEXT FOR ERROR MESSAGES, WHILE REIM.M REDEFINES THE
   FUNCTIONS RE AND IM FOR PERFORMING SYMBOLIC RATHER THAN NUMERIC
   COMPUTATIONS. *)

(* SPECIFY THE RADIX P (EVEN VALUES ONLY) *)

p := 4

(* THE CONJUGATE FUNCTION MUST BE REDEFINED FOR PERFORMING
   SYMBOLIC RATHER THAN NUMERIC COMPUTATIONS. *)

Unprotect[Conjugate]
{"Conjugate"}

Conjugate[expr_] := Re[expr] - Im[expr] I

Protect[Conjugate]
{"Conjugate"}

(* REX REPRESENTS THE REAL PART OF X. AT THIS POINT, IT IS AN
   UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT ON X.
   WE WILL LATER DEFINE REX IN TERMS OF ITS LOCATION WITHIN A
   FORTRAN ARRAY. MATHEMATICA MAKES NO ASSUMPTIONS ABOUT THE DATA
   TYPE OF REX. THE FOLLOWING STATEMENTS INFORM MATHEMATICA THAT
   REX IS REAL VALUED. IMX REPRESENTS THE IMAGINARY PART OF X, AND
   IS ANALOGOUS TO REX. *)

rex/: Re[rex[n_]] := rex[n]
rex/: Im[rex[n_]] := 0
imx/: Re[imx[n_]] := imx[n]
imx/: Im[imx[n_]] := 0

(* THE FOLLOWING STATEMENTS ARE VALID FOR THE IDFT OF ANY ICS

PAGE 290

   SYMMETRIC SEQUENCE. THESE STATEMENTS PLAY A CRUCIAL ROLE IN
   SIMPLIFYING THE FORWARD COMBINE EQUATIONS FOR ICS SEQUENCES.
   LS REPRESENTS THE LENGTH OF THE SUBSEQUENCE BEING SPLIT BOTH
   HERE AND IN THE FORTRAN CODE. *)

rex[0] := 0
rex[ls/2] := 0
imx[n_] := 0

(* THE FUNCTIONS X, W (OMEGA), AND Y CORRESPOND TO NOTATION USED
   IN THE COMBINE EQUATIONS. THE ARGUMENTS TO THESE FUNCTIONS
   REPRESENT SUBSCRIPTS. *)

x[n_,l_] := rex[l*ls/p+n] + I*imx[l*ls/p+n]

w[l_] := Exp[I*2*Pi/l]

y[n_,q_] := w[ls]^(-n*q) *
   ( x[n,0] + (-1)^(q+1)*Conjugate[x[-n,p/2]] +
     Sum[ Conjugate[w[p]^(l*q)]*x[n,l] - w[p]^(l*q)*Conjugate[x[-n,l]],
          {l,1,p/2-1} ]
   )

(* THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[0,Q] FOR THE
   APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND THEREFORE
   WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsry0q = Table[ Factor[Simplify[Re[y[0,q]]]], {q,0,p/2-1} ]
{0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[0,Q] FOR
   THE APPROPRIATE RANGE OF Q. THE NON-ZERO RESULTS WILL LATER BE
   STORED IN THE FORTRAN ARRAY. THE LABEL RHSIY0Q PROVIDES A
   CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN
   ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF
   Y[0,Q]. *)

rhsiy0q = Table[ Factor[Simplify[Im[y[0,q]]]], {q,0,p/2-1} ]
{0, -2*rex[ls/4]}

(* THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[LS/(2*P),Q]

PAGE 291

   FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND
   THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsrymq = Table[ Factor[Simplify[Re[y[ls/(2*p),q]]]], {q,0,p/2-1} ]
{0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF
   Y[LS/(2*P),Q] FOR THE APPROPRIATE RANGE OF Q. THE NON-ZERO
   RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL
   RHSIYMQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE
   RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE
   IMAGINARY PART OF Y[M,Q] WHERE M = LS/(2*P). *)

rhsiymq = Table[ Factor[Simplify[Im[y[ls/(2*p),q]]]], {q,0,p/2-1} ]

(* THE FOLLOWING RESULT IS THE REAL PART OF Y[I,0] FOR A GENERAL
   INDEX I. A NON-ZERO RESULT WILL LATER BE STORED IN THE FORTRAN
   ARRAY. THE LABEL RHSRYI0 PROVIDES A CONVENIENT MEANS FOR
   REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE RIGHT
   HAND SIDE OF THE REAL PART OF Y[I,0]. *)

rhsryi0 = Factor[Simplify[Re[y[i,0]]]]
rex[i] - rex[-i + ls/4] - rex[-i + ls/2] + rex[i + ls/4]

(* THE FOLLOWING RESULT IS THE IMAGINARY PART OF Y[I,0] FOR A
   GENERAL INDEX I. A NON-ZERO RESULT WILL LATER BE STORED IN THE
   FORTRAN ARRAY. THE LABEL RHSIYI0 PROVIDES A CONVENIENT MEANS
   FOR REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE
   RIGHT HAND SIDE OF THE IMAGINARY PART OF Y[I,0]. *)

rhsiyi0 = Factor[Simplify[Im[y[i,0]]]]
0

(* THE REMAINING EQUATIONS INVOLVE COMPLEX MULTIPLICATION BY A
   POWER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF OMEGA
   REQUIRED HAVE BEEN PRE-COMPUTED AND STORED IN THE COMPLEX ARRAY
   OMEGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, WHERE L IS A
   FIXED CONSTANT WHICH IS DIVISIBLE BY 2*LS FOR ALL VALUES OF LS.
   I AND Q ARE INDEX VALUES USED IN THE COMBINE

PAGE 292

   EQUATIONS. MATHEMATICA REGARDS OMEGA AS AN UNDEFINED FUNCTION,
   BUT THE SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A
   FORTRAN ARRAY. THE FUNCTIONS OR AND OI REPRESENT THE REAL AND
   IMAGINARY PARTS, RESPECTIVELY, OF THE POWERS OF OMEGA USED IN
   THE COMBINE EQUATIONS. *)

or[q_] := Re[omega[q*i*l/ls]]
oi[q_] := Im[omega[q*i*l/ls]]

rhsorq = Table[ or[q], {q,1,p/2-1} ]
{Re[omega[(i*l)/ls]]}

rhsoiq = Table[ oi[q], {q,1,p/2-1} ]

(* THE FUNCTIONS FR AND FI ARE OBTAINED FROM Y BY OMITTING THE
   FIRST FACTOR (A POWER OF OMEGA) AND TAKING REAL AND IMAGINARY
   PARTS. *)

fr[n_,q_] := Re[ x[n,0] + (-1)^(q+1)*Conjugate[x[-n,p/2]] + Sum[ Conjugate(;; (p](l>l
PAGE 293

Clear[fr]
Clear[fi]

(* THE FUNCTIONS RYI[Q] AND IYI[Q] REPRESENT THE REAL AND
   IMAGINARY PARTS, RESPECTIVELY, OF Y[I,Q]. *)

ryi[q_] := or[q]*fr[q] + oi[q]*fi[q]
iyi[q_] := or[q]*fi[q] - oi[q]*fr[q]

rhsryiq = Table[ ryi[q], {q,1,p/2-1} ]
{fi[1]*oi[1] + fr[1]*or[1]}

rhsiyiq = Table[ iyi[q], {q,1,p/2-1} ]
{-(fr[1]*oi[1]) + fi[1]*or[1]}

(* THE FUNCTION YT CORRESPONDS TO THE NOTATION Y TILDE USED IN THE
   COMBINE EQUATIONS. THE ARGUMENT TO THIS FUNCTION REPRESENTS THE
   FIRST SUBSCRIPT, WHILE THE SECOND SUBSCRIPT HAS THE IMPLICIT
   VALUE P/2. *)

yt[n_] := ( x[n,0] + + Sum[ (-1)^l*(x[n,l]-x[-n,l]), {l,1,p/2-1} ] )

(* THE FOLLOWING RESULTS ARE OBTAINED BY EVALUATING YT AT SPECIFIC
   VALUES OF ITS ARGUMENT. *)

rhsyt0 = Factor[Simplify[yt[0]]]
0

rhsytm = Factor[Simplify[yt[ls/(2*p)]]]
2*(rex[ls/8] - rex[(3*ls)/8])

rhsyti = Factor[Simplify[yt[i]]]
rex[i] + rex[-i + ls/4] - rex[-i + ls/2] - rex[i + ls/4]

(* BEFORE EXECUTING THE REMAINDER OF THIS FILE, ONE MUST DETERMINE
   STORAGE PATTERNS FOR THE INPUT AND OUTPUT DATA WHICH ALLOW THE
   COMBINE EQUATIONS TO BE EXECUTED IN-PLACE.

PAGE 294

   THE DATA IS CONTAINED IN A TWO DIMENSIONAL FORTRAN ARRAY NAMED
   A. EACH SEQUENCE IS STORED IN A ROW OF A, SO THE FIRST INDEX
   SIMPLY IDENTIFIES THE SEQUENCE NUMBER. THE STORAGE PATTERN FOR
   THE INPUT DATA IS SPECIFIED BY DEFINING REX IN TERMS OF ITS
   LOCATION WITHIN THE FORTRAN ARRAY A. MATHEMATICA REGARDS A AS
   AN UNDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT WILL
   BE IDENTICAL TO A FORTRAN ARRAY. *)

rex[n_] := a[j,n]

(* THE STORAGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE
   FOLLOWING FUNCTIONS. THE FUNCTION NAMES ARE ACRONYMS FOR OUTPUT
   QUANTITIES. FOR EXAMPLE, Y0[N] MEANS Y[N,0], IY[N,Q] MEANS THE
   IMAGINARY PART OF Y[N,Q], ETC... *)

y0[n_] := a[j,n]
yt[n_] := a[j,ls/p-n]
iy[n_,q_] := a[j,q*ls/p+n]
ry[n_,q_] := a[j,q*ls/p+ls/p-n]

(* WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FORTRAN ARRAY
   A, AND USING FORTRAN SYNTAX. IN THIS WAY, THESE RESULTS MAY BE
   INSERTED INTO FORTRAN CODE. THE RESULTS ARE OUTPUT IN PAIRS OF
   TABLES. THE FIRST TABLE CONTAINS EXPRESSIONS INVOLVING THE
   INPUT DATA. THE SECOND TABLE IS A CORRESPONDING LIST OF STORAGE
   LOCATIONS FOR THE EXPRESSIONS IN THE FIRST TABLE. IF THE FIRST
   TABLE CONTAINS A ZERO, THEN THERE IS NO CORRESPONDING OUTPUT
   LOCATION IN THE SECOND TABLE. THE PAIR OF TABLES REPRESENT A
   COMBINE EQUATION. BY USING LOCAL SCALAR VARIABLES AS TEMPORARY
   STORAGE LOCATIONS, THESE COMBINE EQUATIONS CAN BE EXECUTED IN
   PLACE. THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS
   FOR I = 0. *)

FortranForm[rhsiy0q]
List(0,-2*a(j,ls/4))

lhsiy0q = Table[ FortranForm[iy[0,q]], {q,1,p/2-1} ]
{a(j,ls/4)}

(* THE FOLLOWING 2 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = LS/(2*P). *)

PAGE 295

2*(a(j,ls/8) - a(j,3*ls/8))

lhsytm = FortranForm[yt[ls/(2*p)]]
a(j,ls/8)

FortranForm[rhsiymq]
List(0,-(Sqrt(2)*(a(j,ls/8) + a(j,3*ls/8))))

lhsiymq = Table[ FortranForm[iy[ls/(2*p),q]], {q,1,p/2-1} ]

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS OR AND OI. *)

FortranForm[rhsorq]
List(Re(omega(i*l/ls)))

FortranForm[rhsoiq]
List(Im(omega(i*l/ls)))

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS FR AND FI. *)

FortranForm[rhsfrq]
List(a(j,i) + a(j,-i + ls/2))

FortranForm[rhsfiq]
List(-(a(j,-i + ls/4) + a(j,i + ls/4)))

(* THE FOLLOWING 4 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR
   THE GENERAL INDEX I. *)

FortranForm[rhsryi0]
a(j,i) - a(j,-i + ls/4) - a(j,-i + ls/2) + a(j,i + ls/4)

lhsryi0 = FortranForm[y0[i]]

PAGE 296

a(j,i)

FortranForm[rhsyti]
a(j,i) + a(j,-i + ls/4) - a(j,-i + ls/2) - a(j,i + ls/4)

lhsyti = FortranForm[yt[i]]
a(j,-i + ls/4)

FortranForm[rhsiyiq]
List(-(fr(1)*oi(1)) + fi(1)*or(1))

lhsiyiq = Table[ FortranForm[iy[i,q]], {q,1,p/2-1} ]
{a(j,i + ls/4)}

FortranForm[rhsryiq]

lhsryiq = Table[ FortranForm[ry[i,q]], {q,1,p/2-1} ]
{a(j,-i + ls/2)}
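The script above refers to a precomputed table OMEGA holding the L'th roots of unity, indexed in the generated code as omega(q*i*l/ls). The following Python sketch illustrates that lookup under stated assumptions: entry k of the table is taken to be exp(2*pi*i*k/L), which follows the script's definition of W as Exp[I*2*Pi/l], and the function names `build_omega_table` and `twiddle` are hypothetical. The actual initialization of OMEGA in the FORTRAN package is not shown in this part of the listing.

```python
import cmath


def build_omega_table(l):
    """Assumed layout: entry k holds the L'th root of unity exp(2*pi*i*k/L)."""
    return [cmath.exp(2j * cmath.pi * k / l) for k in range(l)]


def twiddle(omega, l, ls, i, q):
    """Look up w[ls]**(q*i) from the table, as in omega(q*i*l/ls).

    Requires that ls divides l, so the index q*i*(l/ls) is an integer;
    this mirrors ILDLS = I*LDLS with LDLS = L/LS in the FORTRAN code.
    """
    if l % ls != 0:
        raise ValueError("l must be divisible by ls")
    return omega[q * i * (l // ls)]


if __name__ == "__main__":
    l, ls = 64, 16
    table = build_omega_table(l)
    direct = cmath.exp(2j * cmath.pi * 2 * 3 / ls)
    print(abs(twiddle(table, l, ls, i=3, q=2) - direct) < 1e-12)
```

Because L is a fixed multiple of every subsequence length, a single table can serve all stages of the transform; only the stride L/LS changes from stage to stage.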

PAGE 297

(* VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED EVEN RADIX *)

(* BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES MSG.M AND REIM.M.
   MSG.M PROVIDES TEXT FOR ERROR MESSAGES, WHILE REIM.M REDEFINES THE
   FUNCTIONS RE AND IM FOR PERFORMING SYMBOLIC RATHER THAN NUMERIC
   COMPUTATIONS. *)

(* SPECIFY THE RADIX P (EVEN VALUES ONLY) *)

p := 4

(* THE CONJUGATE FUNCTION MUST BE REDEFINED FOR PERFORMING
   SYMBOLIC RATHER THAN NUMERIC COMPUTATIONS. *)

Unprotect[Conjugate]
{"Conjugate"}

Conjugate[expr_] := Re[expr] - Im[expr] I

Protect[Conjugate]
{"Conjugate"}

(* THE EVEN AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WILL
   BE NEEDED TO SIMPLIFY THE COMBINE EQUATIONS. THESE PROPERTIES
   ARE NOT BUILT INTO MATHEMATICA, AND MUST BE SPECIFIED AS
   FOLLOWS. *)

EvenOdd = { }
Sin[(n_?Negative) x_.] :> -Sin[-n x],
Cos[(n_?Negative) x_.] :> Cos[-n x] /; n -Sin[-(n x)],
Cos[(x_.)(n_)?Negative] :> Cos[-(n x)] /; n < 0}

(* ABSX REPRESENTS THE ABSOLUTE VALUE OF X. AT THIS POINT, IT IS
   AN UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT
   ON X. WE WILL LATER DEFINE ABSX IN TERMS OF ITS LOCATION WITHIN
   A FORTRAN ARRAY. MATHEMATICA MAKES NO

PAGE 298

   ASSUMPTIONS ABOUT THE DATA TYPE OF ABSX. THE FOLLOWING
   STATEMENTS INFORM MATHEMATICA THAT ABSX IS REAL VALUED. *)

absx/: Re[absx[n_]] := absx[n]
absx/: Im[absx[n_]] := 0

(* THE FOLLOWING STATEMENT IS VALID FOR THE IDFT OF ANY ISCS
   SYMMETRIC SEQUENCE. THIS STATEMENT PLAYS A CRUCIAL ROLE IN
   SIMPLIFYING THE FORWARD COMBINE EQUATIONS FOR ISCS SEQUENCES. *)

absx[0] := 0

(* THE FUNCTIONS XT (X TILDE), W (OMEGA), AND Y CORRESPOND TO
   NOTATION USED IN THE COMBINE EQUATIONS. THE ARGUMENTS TO THESE
   FUNCTIONS REPRESENT SUBSCRIPTS. LS REPRESENTS THE LENGTH OF THE
   SUBSEQUENCE BEING SPLIT BOTH HERE AND IN THE FORTRAN CODE. *)

xt[n_,l_] := absx[l*ls/p+n]

y(n_,q_] := ( w[2*ls]-(-n(2q+1)) xt[n,O] + * xt[-n,p/2] + w[2ls*p]-((l*ls-n*p)*(2*q+1)) xt(-n,l] {l,i,p/2-1} l )

(* THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[0,Q] FOR THE
   APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND THEREFORE
   WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsry0q = Table[ Factor[Simplify[Re[y[0,q]]]], {q,0,p/2-1} ]
{0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[0,Q] FOR
   THE APPROPRIATE RANGE OF Q. THE NON-ZERO RESULTS WILL LATER BE
   STORED IN THE FORTRAN ARRAY. THE LABEL RHSIY0Q PROVIDES A
   CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN
   ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF
   Y[0,Q]. *)

PAGE 299

rhsiy0q = Table[ Factor[Simplify[Im[y[0,q]]]], {q,0,p/2-1} ]
{-(2^(1/2)*absx[ls/4] + absx[ls/2]),
 -(2^(1/2)*absx[ls/4] - absx[ls/2])}

(* THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[LS/(2*P),Q]
   FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND
   THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. NOTE THAT
   THE EVEN AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WERE
   REQUIRED TO OBTAIN THESE RESULTS. *)

rhsrymq = Table[ Re[y[ls/(2*p),q]] /. EvenOdd, {q,0,p/2-1} ]
{0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF
   Y[LS/(2*P),Q] FOR THE APPROPRIATE RANGE OF Q. THE NON-ZERO
   RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL
   RHSIYMQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE
   RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE
   IMAGINARY PART OF Y[M,Q] WHERE M = LS/(2*P). AGAIN, THE EVEN
   AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WERE REQUIRED
   TO OBTAIN THESE RESULTS. *)

rhsiymq = Table[ Im[y[ls/(2*p),q]] /. EvenOdd, {q,0,p/2-1} ]
{-2*Sin[Pi/8]*absx[ls/8] - 2*Sin[(3*Pi)/8]*absx[(3*ls)/8],
 -2*Sin[(3*Pi)/8]*absx[ls/8] + 2*Sin[Pi/8]*absx[(3*ls)/8]}

(* THE REMAINING EQUATIONS INVOLVE COMPLEX MULTIPLICATION BY A
   POWER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF OMEGA
   REQUIRED HAVE BEEN PRE-COMPUTED AND STORED IN THE COMPLEX ARRAY
   OMEGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, WHERE L IS A
   FIXED CONSTANT WHICH IS DIVISIBLE BY 2*LS FOR ALL VALUES OF LS.
   I AND Q ARE INDEX VALUES USED IN THE COMBINE EQUATIONS.
   MATHEMATICA REGARDS OMEGA AS AN UNDEFINED FUNCTION, BUT THE
   SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A FORTRAN
   ARRAY. THE FUNCTIONS OR AND OI REPRESENT THE REAL AND IMAGINARY
   PARTS, RESPECTIVELY, OF THE POWERS OF OMEGA USED IN THE COMBINE
   EQUATIONS. *)

or[q_] := Re[omega[(2*q+1)*i*l/(2*ls)]]
oi[q_] := Im[omega[(2*q+1)*i*l/(2*ls)]]

PAGE 300

rhsorq = Table[ or[q], {q,0,p/2-1} ]

rhsoiq = Table[ oi[q], {q,0,p/2-1} ]

(* THE FUNCTIONS FR AND FI ARE OBTAINED FROM Y BY OMITTING THE
   FIRST FACTOR (A POWER OF OMEGA) AND TAKING REAL AND IMAGINARY
   PARTS. *)

fr[n_,q_) := Re[ xt[n,O) + + Sum[Conjugate (y [2*p] (1(2q+1))] xt [n, 1] xt[-n,l] {1,1,p/2-1} l l

fi[n_,q_] := Im[ x.t[n,O] + I(-1)-(q+l)xt[-n,p/2] + Sum[Conjugate [v[2p] (1( 2q+1)) Jxt (n,l] xt[-n,l] {1,1,p/2-1} l l

rhsfrq = Table[ Factor[fr[i,q]], {q,0,p/2-1} ]
{(2*absx[i] - 2^(1/2)*absx[-i + ls/4] + 2^(1/2)*absx[i + ls/4])/2,
 (2*absx[i] + 2^(1/2)*absx[-i + ls/4] - 2^(1/2)*absx[i + ls/4])/2}

rhsfiq = Table[ Factor[fi[i,q]], {q,0,p/2-1} ]
{-(2^(1/2)*absx[-i + ls/4] + 2*absx[-i + ls/2] + 2^(1/2)*absx[i + ls/4])/2,
 -(2^(1/2)*absx[-i + ls/4] - 2*absx[-i + ls/2] + 2^(1/2)*absx[i + ls/4])/2}

(* WE NOW CLEAR THE DEFINITIONS OF THE FUNCTIONS OR,OI,FR,FI SO
   THAT MATHEMATICA WILL LEAVE THEM IN SYMBOLIC FORM RATHER THAN
   EXPANDING THEM. *)

Clear[or]
Clear[oi]
Clear[fr]
Clear[fi]

PAGE 301

(* THE FUNCTIONS RYI[Q] AND IYI[Q] REPRESENT THE REAL AND
   IMAGINARY PARTS, RESPECTIVELY, OF Y[I,Q]. *)

ryi[q_] := or[q]*fr[q] + oi[q]*fi[q]
iyi[q_] := or[q]*fi[q] - oi[q]*fr[q]

rhsryiq = Table[ ryi[q], {q,0,p/2-1} ]
{fi[0]*oi[0] + fr[0]*or[0], fi[1]*oi[1] + fr[1]*or[1]}

rhsiyiq = Table[ iyi[q], {q,0,p/2-1} ]
{-(fr[0]*oi[0]) + fi[0]*or[0], -(fr[1]*oi[1]) + fi[1]*or[1]}

(* BEFORE EXECUTING THE REMAINDER OF THIS FILE, ONE MUST DETERMINE
   STORAGE PATTERNS FOR THE INPUT AND OUTPUT DATA WHICH ALLOW THE
   COMBINE EQUATIONS TO BE EXECUTED IN-PLACE.
   THE DATA IS CONTAINED IN A TWO DIMENSIONAL FORTRAN ARRAY NAMED
   A. EACH SEQUENCE IS STORED IN A ROW OF A, SO THE FIRST INDEX
   SIMPLY IDENTIFIES THE SEQUENCE NUMBER. THE STORAGE PATTERN FOR
   THE INPUT DATA IS SPECIFIED BY DEFINING ABSX IN TERMS OF ITS
   LOCATION WITHIN THE FORTRAN ARRAY A. MATHEMATICA REGARDS A AS
   AN UNDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT WILL
   BE IDENTICAL TO A FORTRAN ARRAY. *)

absx[n_] := a[j,ls/2-n]

(* THE STORAGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE
   FOLLOWING FUNCTIONS. THE FUNCTION NAMES ARE ACRONYMS FOR OUTPUT
   QUANTITIES. FOR EXAMPLE, IY[N,Q] MEANS THE IMAGINARY PART OF
   Y[N,Q], ETC... *)

iy[n_,q_] := a[j,q*ls/p+n]
ry[n_,q_] := a[j,q*ls/p+ls/p-n]

(* WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FORTRAN ARRAY
   A, AND USING FORTRAN SYNTAX. IN THIS WAY, THESE RESULTS MAY BE
   INSERTED INTO FORTRAN CODE. THE RESULTS ARE OUTPUT IN PAIRS OF
   TABLES. THE FIRST TABLE CONTAINS EXPRESSIONS INVOLVING THE
   INPUT DATA. THE SECOND TABLE IS A CORRESPONDING LIST OF STORAGE
   LOCATIONS FOR THE EXPRESSIONS IN THE FIRST TABLE. IF THE FIRST
   TABLE CONTAINS A ZERO, THEN THERE IS NO CORRESPONDING OUTPUT
   LOCATION IN THE SECOND TABLE. THE PAIR OF TABLES REPRESENT A
   COMBINE

PAGE 302

   EQUATION. BY USING LOCAL SCALAR VARIABLES AS TEMPORARY STORAGE
   LOCATIONS, THESE COMBINE EQUATIONS CAN BE EXECUTED IN-PLACE.
   THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = 0. *)

FortranForm[rhsiy0q]
List(-(a(j,0) + Sqrt(2)*a(j,ls/4)), -(-a(j,0) + Sqrt(2)*a(j,ls/4)))

lhsiy0q = Table[ FortranForm[iy[0,q]], {q,0,p/2-1} ]
{a(j,0), a(j,ls/4)}

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = LS/(2*P). *)

FortranForm[rhsiymq]
List(-2*Sin(3*Pi/8)*a(j,ls/8) - 2*Sin(Pi/8)*a(j,3*ls/8),
     2*Sin(Pi/8)*a(j,ls/8) - 2*Sin(3*Pi/8)*a(j,3*ls/8))

lhsiymq = Table[ FortranForm[iy[ls/(2*p),q]], {q,0,p/2-1} ]
{a(j,ls/8), a(j,3*ls/8)}

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS OR AND OI. *)

FortranForm[rhsorq]
List(Re(omega(i*l/(2*ls))),Re(omega(3*i*l/(2*ls))))

FortranForm[rhsoiq]

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS FR AND FI. *)

FortranForm[rhsfrq]
List((Sqrt(2)*a(j,-i + ls/4) + 2*a(j,-i + ls/2) - Sqrt(2)*a(j,i + ls/4))/2,
     (-(Sqrt(2)*a(j,-i + ls/4)) + 2*a(j,-i + ls/2) + Sqrt(2)*a(j,i + ls/4))/2)

PAGE 303

FortranForm[rhsfiq]
List(-(2*a(j,i) + Sqrt(2)*a(j,-i + ls/4) + Sqrt(2)*a(j,i + ls/4))/2,
     -(-2*a(j,i) + Sqrt(2)*a(j,-i + ls/4) + Sqrt(2)*a(j,i + ls/4))/2)

(* THE FOLLOWING 2 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR
   THE GENERAL INDEX I. *)

FortranForm[rhsiyiq]
List(-(fr(0)*oi(0)) + fi(0)*or(0), -(fr(1)*oi(1)) + fi(1)*or(1))

lhsiyiq = Table[ FortranForm[iy[i,q]], {q,0,p/2-1} ]
{a(j,i), a(j,i + ls/4)}

FortranForm[rhsryiq]
List(fi(0)*oi(0) + fr(0)*or(0), fi(1)*oi(1) + fr(1)*or(1))

lhsryiq = Table[ FortranForm[ry[i,q]], {q,0,p/2-1} ]
{a(j,-i + ls/4), a(j,-i + ls/2)}
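The script above states that the storage patterns must allow the combine equations to be executed in place. One way to see that the chosen patterns do so for the ISCS radix-4 case is to check that, for a general index I, the array locations read (through ABSX) coincide with the locations written (through IY and RY). The Python sketch below performs that check; it is an illustration only, the function name `iscs_radix4_locations` is hypothetical, and LS is assumed to be a multiple of 4.

```python
def iscs_radix4_locations(ls, i, p=4):
    """Return (inputs, outputs) as sets of column indices of the array A.

    Input pattern from the script:   absx[n]  -> a[j, ls/2 - n]
    Output patterns from the script: iy[n,q]  -> a[j, q*ls/p + n]
                                     ry[n,q]  -> a[j, q*ls/p + ls/p - n]
    """
    def absx_loc(n):
        return ls // 2 - n

    # Subscripts of ABSX that appear in rhsfrq and rhsfiq for index i.
    inputs = {absx_loc(n)
              for n in (i, ls // 4 - i, ls // 4 + i, ls // 2 - i)}

    outputs = set()
    for q in range(p // 2):
        outputs.add(q * ls // p + i)            # iy[i,q]
        outputs.add(q * ls // p + ls // p - i)  # ry[i,q]
    return inputs, outputs


if __name__ == "__main__":
    ls = 32
    for i in range(1, ls // 8):
        inputs, outputs = iscs_radix4_locations(ls, i)
        print(i, sorted(inputs), inputs == outputs)
```

Since the four values read and the four values written occupy the same four locations, loading the inputs into local scalars first (as the FORTRAN routines do with V1, V2, ...) is enough to update the array without auxiliary storage.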

PAGE 304

(* VECTORIZED I SEQUENCES FORWARD COMBINED EVEN RADIX *)

(* BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES MSG.M AND REIM.M.
   MSG.M PROVIDES TEXT FOR ERROR MESSAGES, WHILE REIM.M REDEFINES THE
   FUNCTIONS RE AND IM FOR PERFORMING SYMBOLIC RATHER THAN NUMERIC
   COMPUTATIONS. *)

(* SPECIFY THE RADIX P (EVEN VALUES ONLY) *)

p := 4

(* THE CONJUGATE FUNCTION MUST BE REDEFINED FOR PERFORMING
   SYMBOLIC RATHER THAN NUMERIC COMPUTATIONS. *)

Unprotect[Conjugate]
{"Conjugate"}

Conjugate[expr_] := Re[expr] - Im[expr] I

Protect[Conjugate]
{"Conjugate"}

(* REX REPRESENTS THE REAL PART OF X. AT THIS POINT, IT IS AN
   UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT ON X.
   WE WILL LATER DEFINE REX IN TERMS OF ITS LOCATION WITHIN A
   FORTRAN ARRAY. MATHEMATICA MAKES NO ASSUMPTIONS ABOUT THE DATA
   TYPE OF REX. THE FOLLOWING STATEMENTS INFORM MATHEMATICA THAT
   REX IS REAL VALUED. IMX REPRESENTS THE IMAGINARY PART OF X, AND
   IS ANALOGOUS TO REX. *)

rex/: Re[rex[n_]] := rex[n]
rex/: Im[rex[n_]] := 0
imx/: Re[imx[n_]] := imx[n]
imx/: Im[imx[n_]] := 0

(* THE FOLLOWING STATEMENTS ARE VALID FOR THE IDFT OF ANY I

PAGE 305

   SYMMETRIC SEQUENCE. THESE STATEMENTS PLAY A CRUCIAL ROLE IN
   SIMPLIFYING THE FORWARD COMBINE EQUATIONS FOR I SEQUENCES.
   LS REPRESENTS THE LENGTH OF THE SUBSEQUENCE BEING SPLIT BOTH
   HERE AND IN THE FORTRAN CODE. *)

rex[0] := 0
rex[ls/2] := 0

(* THE FUNCTIONS X, W (OMEGA), AND Y CORRESPOND TO NOTATION USED
   IN THE COMBINE EQUATIONS. THE ARGUMENTS TO THESE FUNCTIONS
   REPRESENT SUBSCRIPTS. *)

x[n_,l_] := rex[l*ls/p+n] + I*imx[l*ls/p+n]

w[l_] := Exp[I*2*Pi/l]

y[n_,q_] := w[ls]^(-n*q) *
   ( x[n,0] + (-1)^(q+1)*Conjugate[x[-n,p/2]] +
     Sum[ Conjugate[w[p]^(l*q)]*x[n,l] - w[p]^(l*q)*Conjugate[x[-n,l]],
          {l,1,p/2-1} ]
   )

(* THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[0,Q] FOR THE
   APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND THEREFORE
   WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsry0q = Table[ Factor[Simplify[Re[y[0,q]]]], {q,0,p-1} ]
{0, 0, 0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[0,Q] FOR
   THE APPROPRIATE RANGE OF Q. THE NON-ZERO RESULTS WILL LATER BE
   STORED IN THE FORTRAN ARRAY. THE LABEL RHSIY0Q PROVIDES A
   CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN
   ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF
   Y[0,Q]. *)

rhsiy0q = Table[ Factor[Simplify[Im[y[0,q]]]], {q,0,p-1} ]
{imx[0] + 2*imx[ls/4] + imx[ls/2],
 imx[0] - imx[ls/2] - 2*rex[ls/4],
 imx[0] - 2*imx[ls/4] + imx[ls/2],
 imx[0] - imx[ls/2] + 2*rex[ls/4]}

(*

PAGE 306

   THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[LS/(2*P),Q]
   FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND
   THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsrymq = Table[ Factor[Simplify[Re[y[ls/(2*p),q]]]], {q,0,p-1} ]
{0, 0, 0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF
   Y[LS/(2*P),Q] FOR THE APPROPRIATE RANGE OF Q. THE NON-ZERO
   RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL
   RHSIYMQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE
   RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE
   IMAGINARY PART OF Y[M,Q] WHERE M = LS/(2*P). *)

rhsiymq = Table[ Factor[Simplify[Im[y[ls/(2*p),q]]]], {q,0,p-1} ]
{2*(imx[ls/8] + imx[(3*ls)/8]),
 2^(1/2)*(imx[ls/8] - imx[(3*ls)/8] - rex[ls/8] - rex[(3*ls)/8]),
 -2*(rex[ls/8] - rex[(3*ls)/8]),
 -(2^(1/2)*(imx[ls/8] - imx[(3*ls)/8] + rex[ls/8] + rex[(3*ls)/8]))}

(* THE FOLLOWING RESULT IS THE REAL PART OF Y[I,0] FOR A GENERAL
   INDEX I. A NON-ZERO RESULT WILL LATER BE STORED IN THE FORTRAN
   ARRAY. THE LABEL RHSRYI0 PROVIDES A CONVENIENT MEANS FOR
   REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE RIGHT
   HAND SIDE OF THE REAL PART OF Y[I,0]. *)

rhsryi0 = Factor[Simplify[Re[y[i,0]]]]
rex[i] - rex[-i + ls/4] - rex[-i + ls/2] + rex[i + ls/4]

(* THE FOLLOWING RESULT IS THE IMAGINARY PART OF Y[I,0] FOR A
   GENERAL INDEX I. A NON-ZERO RESULT WILL LATER BE STORED IN THE
   FORTRAN ARRAY. THE LABEL RHSIYI0 PROVIDES A CONVENIENT MEANS
   FOR REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE
   RIGHT HAND SIDE OF THE IMAGINARY PART OF Y[I,0]. *)

rhsiyi0 = Factor[Simplify[Im[y[i,0]]]]
imx[i] + imx[-i + ls/4] + imx[-i + ls/2] + imx[i + ls/4]

PAGE 307

(* THE REMAINING EQUATIONS INVOLVE COMPLEX MULTIPLICATION BY A
   POWER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF OMEGA
   REQUIRED HAVE BEEN PRE-COMPUTED AND STORED IN THE COMPLEX ARRAY
   OMEGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, WHERE L IS A
   FIXED CONSTANT WHICH IS DIVISIBLE BY 2*LS FOR ALL VALUES OF LS.
   I AND Q ARE INDEX VALUES USED IN THE COMBINE EQUATIONS.
   MATHEMATICA REGARDS OMEGA AS AN UNDEFINED FUNCTION, BUT THE
   SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A FORTRAN
   ARRAY. THE FUNCTIONS OR AND OI REPRESENT THE REAL AND IMAGINARY
   PARTS, RESPECTIVELY, OF THE POWERS OF OMEGA USED IN THE COMBINE
   EQUATIONS. *)

or[q_] := Re[omega[q*i*l/ls]]
oi[q_] := Im[omega[q*i*l/ls]]

rhsorq = Table[ or[q], {q,1,p-1} ]
{Re[omega[(i*l)/ls]], Re[omega[(2*i*l)/ls]], Re[omega[(3*i*l)/ls]]}

rhsoiq = Table[ oi[q], {q,1,p-1} ]
{Im[omega[(i*l)/ls]], Im[omega[(2*i*l)/ls]], Im[omega[(3*i*l)/ls]]}

(* THE FUNCTIONS FR AND FI ARE OBTAINED FROM Y BY OMITTING THE
   FIRST FACTOR (A POWER OF OMEGA) AND TAKING REAL AND IMAGINARY
   PARTS. *)

fr[n_,q_] := Re[
   x[n,0] + (-1)^(q+1)*Conjugate[x[-n,p/2]] +
   Sum[ Conjugate[w[p]^(l*q)]*x[n,l] - w[p]^(l*q)*Conjugate[x[-n,l]],
        {l,1,p/2-1} ]
   ]

fi[n_,q_] := Im[
   x[n,0] + (-1)^(q+1)*Conjugate[x[-n,p/2]] +
   Sum[ Conjugate[w[p]^(l*q)]*x[n,l] - w[p]^(l*q)*Conjugate[x[-n,l]],
        {l,1,p/2-1} ]
   ]

rhsfrq = Table[ Factor[fr[i,q]], {q,1,p-1} ]
{-(imx[-i + ls/4] - imx[i + ls/4] - rex[i] - rex[-i + ls/2]),
 rex[i] + rex[-i + ls/4] - rex[-i + ls/2] - rex[i + ls/4],
 imx[-i + ls/4] - imx[i + ls/4] + rex[i] + rex[-i + ls/2]}

rhsfiq = Table[ Factor[fi[i,q]], {q,1,p-1} ]


{imx[i] - imx[-i + ls/2] - rex[-i + ls/4] - rex[i + ls/4],
 imx[i] - imx[-i + ls/4] + imx[-i + ls/2] - imx[i + ls/4],
 imx[i] - imx[-i + ls/2] + rex[-i + ls/4] + rex[i + ls/4]}

(* WE NOW CLEAR THE DEFINITIONS OF THE FUNCTIONS OR,OI,FR,FI SO THAT
   MATHEMATICA WILL LEAVE THEM IN SYMBOLIC FORM RATHER THAN EXPANDING
   THEM. *)

Clear[or]
Clear[oi]
Clear[fr]
Clear[fi]

(* THE FUNCTIONS RYI[Q] AND IYI[Q] REPRESENT THE REAL AND IMAGINARY
   PARTS, RESPECTIVELY, OF Y[I,Q]. *)

ryi[q_] := or[q]*fr[q] + oi[q]*fi[q]
iyi[q_] := or[q]*fi[q] - oi[q]*fr[q]

rhsryiq = Table[ ryi[q], {q,1,p-1} ]

{fi[1]*oi[1] + fr[1]*or[1], fi[2]*oi[2] + fr[2]*or[2],
 fi[3]*oi[3] + fr[3]*or[3]}

rhsiyiq = Table[ iyi[q], {q,1,p-1} ]

{-(fr[1]*oi[1]) + fi[1]*or[1], -(fr[2]*oi[2]) + fi[2]*or[2],
 -(fr[3]*oi[3]) + fi[3]*or[3]}

(* BEFORE EXECUTING THE REMAINDER OF THIS FILE, ONE MUST DETERMINE
   STORAGE PATTERNS FOR THE INPUT AND OUTPUT DATA WHICH ALLOW THE COMBINE
   EQUATIONS TO BE EXECUTED IN-PLACE. THE DATA IS CONTAINED IN A TWO
   DIMENSIONAL FORTRAN ARRAY NAMED A. EACH SEQUENCE IS STORED IN A ROW OF
   A, SO THE FIRST INDEX SIMPLY IDENTIFIES THE SEQUENCE NUMBER. THE
   STORAGE PATTERN FOR THE INPUT DATA IS SPECIFIED BY DEFINING REX AND
   IMX IN TERMS OF THEIR LOCATION WITHIN THE FORTRAN ARRAY A. MATHEMATICA
   REGARDS A AS AN UNDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT
   WILL BE IDENTICAL TO A FORTRAN ARRAY. *)


imx[n_] := a[j,n]
rex[n_] := a[j,ls-n]

(* THE STORAGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE FOLLOWING
   FUNCTIONS. THE FUNCTION NAMES ARE ACRONYMS FOR OUTPUT QUANTITIES. FOR
   EXAMPLE, IY[N,Q] MEANS THE IMAGINARY PART OF Y[N,Q], ETC... *)

iy[n_,q_] := a[j,q*ls/p+n]
ry[n_,q_] := a[j,q*ls/p+ls/p-n]

(* WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FORTRAN ARRAY A, AND
   USING FORTRAN SYNTAX. IN THIS WAY, THESE RESULTS MAY BE INSERTED INTO
   FORTRAN CODE. THE RESULTS ARE OUTPUT IN PAIRS OF TABLES. THE FIRST
   TABLE CONTAINS EXPRESSIONS INVOLVING THE INPUT DATA. THE SECOND TABLE
   IS A CORRESPONDING LIST OF STORAGE LOCATIONS FOR THE EXPRESSIONS IN
   THE FIRST TABLE. IF THE FIRST TABLE CONTAINS A ZERO, THEN THERE IS NO
   CORRESPONDING OUTPUT LOCATION IN THE SECOND TABLE. THE PAIR OF TABLES
   REPRESENT A COMBINE EQUATION. BY USING LOCAL SCALAR VARIABLES AS
   TEMPORARY STORAGE LOCATIONS, THESE COMBINE EQUATIONS CAN BE EXECUTED
   IN-PLACE. THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = 0. *)

List(a(j,0) + 2*a(j,ls/4) + a(j,ls/2),
     a(j,0) - a(j,ls/2) - 2*a(j,3*ls/4),
     a(j,0) - 2*a(j,ls/4) + a(j,ls/2),
     a(j,0) - a(j,ls/2) + 2*a(j,3*ls/4))

lhsiy0q = Table[ FortranForm[iy[0,q]], {q,0,p-1} ]

{a(j,0), a(j,ls/4), a(j,ls/2), a(j,3*ls/4)}

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = LS/(2*P). *)

FortranForm[rhsiymq]

List(2*(a(j,ls/8) + a(j,3*ls/8)),
     Sqrt(2)*(a(j,ls/8) - a(j,3*ls/8) - a(j,5*ls/8) - a(j,7*ls/8)),
     -2*(-a(j,5*ls/8) + a(j,7*ls/8)),
     -(Sqrt(2)*(a(j,ls/8) - a(j,3*ls/8) + a(j,5*ls/8) + a(j,7*ls/8))))


lhsiymq = Table[ FortranForm[iy[ls/(2*p),q]], {q,0,p-1} ]

{a(j,ls/8), a(j,3*ls/8), a(j,5*ls/8), a(j,7*ls/8)}

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS OR AND OI. *)

FortranForm[rhsorq]

List(Re(omega(i*l/ls)),Re(omega(2*i*l/ls)),Re(omega(3*i*l/ls)))

FortranForm[rhsoiq]

List(Im(omega(i*l/ls)),Im(omega(2*i*l/ls)),Im(omega(3*i*l/ls)))

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS FR AND FI. *)

FortranForm[rhsfrq]

List(-(a(j,-i + ls/4) - a(j,-i + ls) - a(j,i + ls/4) - a(j,i + ls/2)),
     -a(j,-i + 3*ls/4) + a(j,-i + ls) - a(j,i + ls/2) + a(j,i + 3*ls/4),
     a(j,-i + ls/4) + a(j,-i + ls) - a(j,i + ls/4) + a(j,i + ls/2))

FortranForm[rhsfiq]

List(a(j,i) - a(j,-i + ls/2) - a(j,-i + 3*ls/4) - a(j,i + 3*ls/4),
     a(j,i) - a(j,-i + ls/4) + a(j,-i + ls/2) - a(j,i + ls/4),
     a(j,i) - a(j,-i + ls/2) + a(j,-i + 3*ls/4) + a(j,i + 3*ls/4))

(* THE FOLLOWING 4 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   GENERAL INDEX I. *)

FortranForm[rhsiyi0]

a(j,i) + a(j,-i + ls/4) + a(j,-i + ls/2) + a(j,i + ls/4)

lhsiyi0 = FortranForm[iy[i,0]]

a(j,i)


FortranForm[rhsryi0]

a(j,-i + 3*ls/4) + a(j,-i + ls) - a(j,i + ls/2) - a(j,i + 3*ls/4)

lhsryi0 = FortranForm[ry[i,0]]

a(j,-i + ls/4)

FortranForm[rhsiyiq]

List(-(fr(1)*oi(1)) + fi(1)*or(1), -(fr(2)*oi(2)) + fi(2)*or(2),
     -(fr(3)*oi(3)) + fi(3)*or(3))

lhsiyiq = Table[ FortranForm[iy[i,q]], {q,1,p-1} ]

{a(j,i + ls/4), a(j,i + ls/2), a(j,i + 3*ls/4)}

FortranForm[rhsryiq]

List(fi(1)*oi(1) + fr(1)*or(1), fi(2)*oi(2) + fr(2)*or(2),
     fi(3)*oi(3) + fr(3)*or(3))

lhsryiq = Table[ FortranForm[ry[i,q]], {q,1,p-1} ]

{a(j,-i + ls/2), a(j,-i + 3*ls/4), a(j,-i + ls)}
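The in-place property claimed for these storage patterns can be checked mechanically. The following Python sketch is not part of the original Mathematica session. It assumes p = 4 and the mappings imx[n] -> a(j,n), rex[n] -> a(j,ls-n), iy[n,q] -> a(j,q*ls/p+n) and ry[n,q] -> a(j,q*ls/p+ls/p-n); the helper names input_locations and output_locations are hypothetical. For each general index i it compares the second indices of A read by the right-hand sides with those written by the left-hand sides, and then checks that the cases I = 0, I = LS/(2*P) and I = 1,...,LS/(2*P)-1 together touch every location exactly once.

```python
P = 4  # radix, matching p = 4 in the session above (assumption)

def input_locations(i, ls):
    """Second indices of A read for a general index i."""
    q = ls // P
    ns = [i, q - i, q + i, 2 * q - i]          # arguments of imx[...] and rex[...]
    return sorted(ns + [ls - n for n in ns])   # imx[n] -> n, rex[n] -> ls - n

def output_locations(i, ls):
    """Second indices of A written: iy[i,q] and ry[i,q] for q = 0..p-1."""
    q = ls // P
    iy = [k * q + i for k in range(P)]
    ry = [k * q + q - i for k in range(P)]
    return sorted(iy + ry)

ls = 32                      # example subsequence length, divisible by 2*P
ms = ls // (2 * P) - 1       # last general index

# Each general index reads and writes the same set of locations.
in_place = all(input_locations(i, ls) == output_locations(i, ls)
               for i in range(1, ms + 1))

# The special cases plus the general indices cover 0..ls-1.
covered = set(k * ls // P for k in range(P))                    # I = 0
covered |= set((2 * k + 1) * ls // (2 * P) for k in range(P))   # I = LS/(2*P)
for i in range(1, ms + 1):
    covered |= set(output_locations(i, ls))
full_cover = covered == set(range(ls))

print(in_place, full_cover)
```

Because the read set and the write set coincide for each i, a combine step only needs a few scalar temporaries per index, which is how the generated FORTRAN uses V0, V1, FR and FI.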


Appendix E Automatically Generated Subroutines for the RO FFT


C
C SUBROUTINE: VICSF4
C
C
C NAME
C
C     VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-4
C
C
C FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-4 FORWARD COMBINE
C     EQUATIONS FOR ICS INDUCED SYMMETRIES.
C
C
C INPUT PARAMETERS
C
C     M: NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     A: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C        IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH LS
C
C
C OUTPUT PARAMETERS
C
C     A: UPDATED BY RADIX-4 FORWARD COMBINE EQUATIONS FOR ICS
C        INDUCED SYMMETRIES
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VICSF4(M,LS,A)
      REAL A(1:M,1:LS/2-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ SQRT2,SQRT2D2,
     &                  SQRT3,SQRT3D2,
     &                  TSINPID8,TSIN3PID8,
     &                  L,OMEGA
      INTEGER P, TWOP
      PARAMETER (P=4,TWOP=2*P)
C
C COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
        a(j,ls/4) = (-2)*a(j,ls/4)
    1 CONTINUE
      IF (TWOP*(LS/TWOP) .EQ. LS) THEN
C
C COMPUTATIONS FOR I = LS/TWOP
C


      DO 2 J=1,M
        V0 = 2*(a(j,ls/8) - a(j,3*ls/8))
        V1 = (-Sqrt2)*(a(j,ls/8) + a(j,3*ls/8))
        a(j,ls/8) = V0
        a(j,3*ls/8) = V1
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
        OR1 = ReAL(omega(i*l/ls))
        OI1 = AimAG(omega(i*l/ls))
        DO 101 J=1,M
          FR1 = a(j,i) + a(j,-i + ls/2)
          FI1 = -a(j,-i + ls/4) - a(j,i + ls/4)
          V0 = a(j,i) - a(j,-i + ls/4) - a(j,-i + ls/2) +
     &         a(j,i + ls/4)
          V1 = a(j,i) + a(j,-i + ls/4) - a(j,-i + ls/2) -
     &         a(j,i + ls/4)
          a(j,i) = v0
          a(j,-i + ls/4) = V1
          a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
          a(j,-i + ls/2) = fi1*oi1 + fr1*or1
  101   CONTINUE
  100 CONTINUE
      END IF
      RETURN
      END


C
C SUBROUTINE: VISCSF4
C
C
C NAME
C
C     VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED FOR
C     RADIX-4
C
C
C FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-4 FORWARD COMBINE
C     EQUATIONS FOR ISCS INDUCED SYMMETRIES.
C
C
C INPUT PARAMETERS
C
C     M: NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     A: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C        IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS
C
C
C OUTPUT PARAMETERS
C
C     A: UPDATED BY RADIX-4 FORWARD COMBINE EQUATIONS FOR ISCS
C        INDUCED SYMMETRIES
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VISCSF4(M,LS,A)
      REAL A(1:M,0:LS/2-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ SQRT2,SQRT2D2,
     &                  SQRT3,SQRT3D2,
     &                  TSINPID8,TSIN3PID8,
     &                  L,OMEGA
C
      INTEGER P, TWOP
      PARAMETER (P=4,TWOP=2*P)
C
C COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
        V0 = -a(j,0) - Sqrt2*a(j,ls/4)
        V1 = a(j,0) - Sqrt

C
C COMPUTATIONS FOR I = LS/TWOP
C
      DO 2 J=1,M
        V0 = (-TSin3PiD8)*a(j,ls/8) - TSinPiD8*a(j,3*ls/8)
        V1 = TSinPiD8*a(j,ls/8) - TSin3PiD8*a(j,3*ls/8)
        a(j,ls/8) = V0
        a(j,3*ls/8) = V1
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
        OR0 = ReAL(omega(i*l/(2*ls)))
        OR1 = ReAL(omega(3*i*l/(2*ls)))
        OI0 = AimAG(omega(i*l/(2*ls)))
        OI1 = AimAG(omega(3*i*l/(2*ls)))
        DO 101 J=1,M
          FR0 = Sqrt2D2*a(j,-i + ls/4) + a(j,-i + ls/2) -
     &          Sqrt2D2*a(j,i + ls/4)
          FR1 = (-Sqrt2D2)*a(j,-i + ls/4) + a(j,-i + ls/2) +
     &          Sqrt2D2*a(j,i + ls/4)
          FI0 = -a(j,i) - Sqrt2D2*a(j,-i + ls/4) -
     &          Sqrt2D2*a(j,i + ls/4)
          FI1 = a(j,i) - Sqrt2D2*a(j,-i + ls/4) -
     &          Sqrt2D2*a(j,i + ls/4)
          a(j,i) = fr0*(-oi0) + fi0*or0
          a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
          a(j,-i + ls/4) = fi0*oi0 + fr0*or0
          a(j,-i + ls/2) = fi1*oi1 + fr1*or1
  101   CONTINUE
  100 CONTINUE
      END IF
      RETURN
      END
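The special case I = LS/TWOP of VISCSF4 can be related to the general-index statements. The sketch below is written in Python rather than FORTRAN and is not part of the generated code. It assumes OMEGA(K) = exp(2*pi*sqrt(-1)*K/L) and SQRT2D2 = sqrt(2)/2, reads the general-index statements with subtraction between terms where no operator is printed, and uses the hypothetical helper names general_at and coefficients. It evaluates the general-index statements at I = LS/8, where the locations I and -I + LS/4 coincide (as do I + LS/4 and -I + LS/2), and sums the contributions that land on each location. Under these assumptions the coefficients come out as plus or minus 2*sin(pi/8) and 2*sin(3*pi/8), which is presumably what the COMMON constants TSINPID8 and TSIN3PID8 hold.

```python
import math

SQRT2D2 = math.sqrt(2) / 2   # assumed value of the COMMON constant SQRT2D2

def general_at(i, ls, a):
    """General-index VISCSF4 statements; returns (location, value) pairs."""
    q = ls // 4
    # Assumed twiddles: omega(i*l/(2*ls)) has angle pi*i/ls, and
    # omega(3*i*l/(2*ls)) has angle 3*pi*i/ls.
    or0, oi0 = math.cos(math.pi * i / ls), math.sin(math.pi * i / ls)
    or1, oi1 = math.cos(3 * math.pi * i / ls), math.sin(3 * math.pi * i / ls)
    fr0 = SQRT2D2 * a[q - i] + a[2 * q - i] - SQRT2D2 * a[q + i]
    fr1 = -SQRT2D2 * a[q - i] + a[2 * q - i] + SQRT2D2 * a[q + i]
    fi0 = -a[i] - SQRT2D2 * a[q - i] - SQRT2D2 * a[q + i]
    fi1 = a[i] - SQRT2D2 * a[q - i] - SQRT2D2 * a[q + i]
    return [(i, fr0 * (-oi0) + fi0 * or0),
            (q + i, fr1 * (-oi1) + fi1 * or1),
            (q - i, fi0 * oi0 + fr0 * or0),
            (2 * q - i, fi1 * oi1 + fr1 * or1)]

def coefficients(ls):
    """2x2 matrix mapping (a[ls/8], a[3*ls/8]) to the values stored there."""
    i, q = ls // 8, ls // 4
    rows = []
    for target in (i, q + i):
        row = []
        for source in (i, q + i):
            a = [0.0] * (ls // 2)      # unit vector in the source location
            a[source] = 1.0
            row.append(sum(v for loc, v in general_at(i, ls, a)
                           if loc == target))
        rows.append(row)
    return rows

coeff = coefficients(32)
tsinpid8 = 2 * math.sin(math.pi / 8)       # assumed meaning of TSINPID8
tsin3pid8 = 2 * math.sin(3 * math.pi / 8)  # assumed meaning of TSIN3PID8
print(coeff)
```

In this reading the second contribution to each coinciding location cancels, so only one nonzero value has to be stored per location, as in the I = LS/TWOP loop.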


C
C SUBROUTINE: VIF4
C
C
C NAME
C
C     VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADIX-4
C
C
C FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-4 FORWARD COMBINE
C     EQUATIONS FOR I SEQUENCES.
C
C
C INPUT PARAMETERS
C
C     M: NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     A: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C        IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS
C
C
C OUTPUT PARAMETERS
C
C     A: UPDATED BY RADIX-4 FORWARD COMBINE EQUATIONS FOR I
C        SEQUENCES
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
C
      SUBROUTINE VIF4(M,LS,A)
      REAL A(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ SQRT2,SQRT2D2,
     &                  SQRT3,SQRT3D2,
     &                  TSINPID8,TSIN3PID8,
     &                  L,OMEGA
      INTEGER P, TWOP
      PARAMETER (P=4,TWOP=2*P)
C
C COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
        V0 = a(j,0) + 2*a(j,ls/4) + a(j,ls/2)
        V1 = a(j,0) - a(j,ls/2) - 2*a(j,3*ls/4)
        V2 = a(j,0) - 2*a(j,ls/4) + a(j,ls/2)
        V3 = a(j,0) - a(j,ls/2) + 2*a(j,3*ls/4)

        a(j,3*ls/4) = V3
    1 CONTINUE
      IF (TWOP*(LS/TWOP) .EQ. LS) THEN
C
C COMPUTATIONS FOR I = LS/TWOP
C
      DO 2 J=1,M
        V0 = 2*(a(j,ls/8) + a(j,3*ls/8))
        V1 = Sqrt2*(a(j,ls/8) - a(j,3*ls/8) - a(j,5*ls/8) -
     &       a(j,7*ls/8))
        V2 = 2*(a(j,5*ls/8) - a(j,7*ls/8))
        V3 = (-Sqrt2)*(a(j,ls/8) - a(j,3*ls/8) + a(j,5*ls/8) +
     &       a(j,7*ls/8))
        a(j,ls/8) = V0
        a(j,3*ls/8) = V1
        a(j,5*ls/8) = V2
        a(j,7*ls/8) = V3
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      ENDIF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
        OR1 = ReAL(omega(i*l/ls))
        OR2 = ReAL(omega(2*i*l/ls))
        OR3 = ReAL(omega(3*i*l/ls))
        OI1 = AimAG(omega(i*l/ls))
        OI2 = AimAG(omega(2*i*l/ls))
        OI3 = AimAG(omega(3*i*l/ls))
        DO 101 J=1,M
          FR1 = -a(j,-i + ls/4) + a(j,-i + ls) + a(j,i + ls/4) +
     &          a(j,i + ls/2)
          FR2 = -a(j,-i + 3*ls/4) + a(j,-i + ls) - a(j,i + ls/2) +
     &          a(j,i + 3*ls/4)
          FR3 = a(j,-i + ls/4) + a(j,-i + ls) - a(j,i + ls/4) +
     &          a(j,i + ls/2)
          FI1 = a(j,i) - a(j,-i + ls/2) - a(j,-i + 3*ls/4) -
     &          a(j,i + 3*ls/4)
          FI2 = a(j,i) - a(j,-i + ls/4) + a(j,-i + ls/2) -
     &          a(j,i + ls/4)
          FI3 = a(j,i) - a(j,-i + ls/2) + a(j,-i + 3*ls/4) +
     &          a(j,i + 3*ls/4)
          V0 = a(j,i) + a(j,-i + ls/4) + a(j,-i + ls/2) +
     &         a(j,i + ls/4)
          V1 = a(j,-i + 3*ls/4) + a(j,-i + ls) - a(j,i + ls/2) -
     &         a(j,i + 3*ls/4)
          a(j,i) = v0
          a(j,-i + ls/4) = V1
          a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
          a(j,i + ls/2) = fr2*(-oi2) + fi2*or2
          a(j,i + 3*ls/4) = fr3*(-oi3) + fi3*or3


          a(j,-i + ls/2) = fi1*oi1 + fr1*or1
          a(j,-i + 3*ls/4) = fi2*oi2 + fr2*or2
          a(j,-i + ls) = fi3*oi3 + fr3*or3
  101   CONTINUE
  100 CONTINUE
      ENDIF
      RETURN
      END
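The FR and FI expressions used in VIF4 can be checked numerically against a direct sum. The Python sketch below is not part of the generated code and rests on several assumptions that are not stated in this listing: the input row holds the DFT X of a purely imaginary sequence (so that X[-n] = -conj(X[n])), the storage is a(j,n) = Im X[n] and a(j,ls-n) = Re X[n], and the quantity FRq + sqrt(-1)*FIq equals the sum over l = 0,...,3 of conj(v)^(l*q) * X[i + l*ls/4] with v = exp(2*pi*sqrt(-1)/4). The helper names dft, pack, f_closed and f_direct are hypothetical.

```python
import cmath
import random

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_m x[m] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def pack(X):
    """Real storage: a[n] = Im X[n] (0 <= n <= ls/2), a[ls-n] = Re X[n]."""
    ls = len(X)
    a = [0.0] * ls
    for n in range(ls // 2 + 1):
        a[n] = X[n].imag
    for n in range(1, ls // 2):
        a[ls - n] = X[n].real
    return a

def f_closed(a, i, ls):
    """FR1..FR3 and FI1..FI3 as written in VIF4, returned as FR + i*FI."""
    q = ls // 4
    fr = [-a[q - i] + a[ls - i] + a[q + i] + a[2 * q + i],
          -a[3 * q - i] + a[ls - i] - a[2 * q + i] + a[3 * q + i],
          a[q - i] + a[ls - i] - a[q + i] + a[2 * q + i]]
    fi = [a[i] - a[2 * q - i] - a[3 * q - i] - a[3 * q + i],
          a[i] - a[q - i] + a[2 * q - i] - a[q + i],
          a[i] - a[2 * q - i] + a[3 * q - i] + a[3 * q + i]]
    return [complex(r, s) for r, s in zip(fr, fi)]

def f_direct(X, i, ls):
    """Direct sum over l of conj(v)^(l*q) * X[i + l*ls/4], v = exp(2*pi*i/4)."""
    v = 1j
    return [sum(v.conjugate() ** (l * q) * X[(i + l * ls // 4) % ls]
                for l in range(4))
            for q in (1, 2, 3)]

random.seed(0)
ls = 32
x = [1j * random.uniform(-1.0, 1.0) for _ in range(ls)]  # purely imaginary
X = dft(x)
a = pack(X)

# Compare over the general indices I = 1,...,LS/8 - 1.
max_err = max(abs(c - d)
              for i in range(1, ls // 8)
              for c, d in zip(f_closed(a, i, ls), f_direct(X, i, ls)))
print(max_err)
```

Agreement to rounding error supports the reading of the listing in which every term without a printed operator is subtracted; it says nothing about the twiddle factors OR and OI, which are not exercised here.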


Bibliography

[1] W. L. Briggs, Further symmetries of in-place FFTs, SIAM J. Sci. Stat. Comput., Vol. 8, No. 4 (1987), pp. 644-654.
[2] J. W. Cooley, P. A. W. Lewis, and P. D. Welch, The fast Fourier transform algorithm: Programming considerations in the calculation of sine, cosine and Laplace transforms, J. Sound Vibration, 12 (1970), pp. 315-337.
[3] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp., 19 (1965), pp. 297-301.
[4] W. M. Gentleman, Implementing Clenshaw-Curtis quadrature, II Computing the cosine transformation, Comm. ACM, Vol. 15, No. 5 (1972), pp. 343-346.
[5] E. Grosse, Netlib news: Greetings, SIAM News, Vol. 23, No. 6 (1990), pp. 14-16.
[6] U. Schumann and R. A. Sweet, Fast Fourier transforms for direct solution of Poisson's equation with staggered boundary conditions, J. Comput. Phys., Vol. 75, No. 1 (1988), pp. 123-137.
[7] P. N. Swarztrauber, Vectorizing the FFTs, in Parallel Computations (G. Rodrigue, ed.), Academic Press, New York, 1982, pp. 490-501.
[8] --, Fast Poisson solvers, MAA Studies in Numerical Analysis, Vol. 24 (1984, G. H. Golub, ed.), pp. 319-370.
[9] --, FFT algorithms for vector computers, Parallel Comput., 1 (1984), pp. 45-63.
[10] --, Symmetric FFTs, Math. Comp., Vol. 47, No. 175 (1986), pp. 323-346.


[11] --, Multiprocessor FFTs, Parallel Comput., 5 (1987), pp. 197-210.
[12] S. Wolfram, Mathematica: A system for doing mathematics by computer, Addison-Wesley, New York, 1988.