FAST FOURIER TRANSFORMS
FOR
DIRECT SOLUTION OF POISSON'S EQUATION
by
Bert Larue Bradford
B.A., The University of North Texas, 1976
M.A., The University of Texas at Austin, 1979
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics
1991
1991 by Bert Larue Bradford
All rights reserved.
This thesis for the Doctor of Philosophy
degree by
Bert Larue Bradford
has been approved for the
Department of
Mathematics
by
Roland A. Sweet
Thomas F. Russell
Bradford, Bert Larue (Ph.D., Mathematics)
Fast Fourier Transforms for Direct Solution of Poisson's Equation
Thesis directed by Professor Roland A. Sweet

This thesis presents compact algorithms used to incorporate the
Cooley-Tukey fast Fourier transform (FFT) into the solution of finite
difference approximations to the multidimensional Poisson equation. In
each spatial dimension, we must specify boundary conditions at both the
left and right endpoints. Boundary conditions we consider include cyclic,
Dirichlet, and Neumann. Furthermore, there is often a need to orient the
grid such that one or both of the endpoints of the computational domain
are staggered at half of a grid spacing. This leads to staggered Dirichlet
and staggered Neumann boundary conditions. When the Poisson equation is
discretized, these boundary conditions are approximated by requiring the
real sequence which represents the approximate solution to satisfy
discrete analogs. The discretized boundary value problem is solved by the
Fourier analysis method (also referred to as the eigenvector expansion
method or as a fast Poisson solver). This method requires finding the
eigenvalues and eigenvectors corresponding to the discretized boundary
value problem. The discrete solution is expanded in terms of these
eigenvectors. The efficiency of this algorithm results from the ability to
calculate the coefficients in such eigenvector expansions using an FFT
algorithm. For each of the boundary conditions discussed above, an FFT
algorithm has been developed which computes the coefficients in the
corresponding eigenvector expansion as efficiently as possible by
eliminating all redundant computations which would occur in the full
complex FFT, and without pre- or postprocessing. Such FFT algorithms are
referred to as compact symmetric FFTs. The elimination of pre- and
postprocessing improves performance by reducing both the number of
operations and data accesses. These FFT algorithms are all general mixed
radix, in-place algorithms which accept the input sequence in natural
order. The inverse algorithms accept the input sequence in permuted order.
Thus, reordering of data is never required.

The form and content of this abstract are approved. I recommend its
publication.
Signed.
Roland A. Sweet
Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
  1.1 The Fourier Analysis Method
  1.2 The New FFT and FST Algorithms
2 Fast Fourier Transforms
  2.1 Complex (C)
  2.2 Real (R)
  2.3 Real Even (RE)
  2.4 Real Odd (RO)
  2.5 Real Composite Even-Even (REE)
  2.6 Real Composite Even-Odd (REO)
  2.7 Real Composite Odd-Even (ROE)
  2.8 Real Composite Odd-Odd (ROO)
  2.9 Real Staggered Even (RSE)
  2.10 Real Staggered Odd (RSO)
  2.11 Tables of Symmetries
3 Fast Staggered Transforms
  3.1 Complex (C)
  3.2 Real (R)
  3.3 Real Staggered Even (RSE)
  3.4 Real Staggered Odd (RSO)
  3.5 Real Composite Staggered Even Staggered Even (RSESE)
  3.6 Real Composite Staggered Even Staggered Odd (RSESO)
  3.7 Real Composite Staggered Odd Staggered Even (RSOSE)
  3.8 Real Composite Staggered Odd Staggered Odd (RSOSO)
  3.9 Tables of Symmetries
4 Software Implementation and Performance
  4.1 Introduction
  4.2 The Radix-2 RO FFT
  4.3 The Radix-4 RO FFT
  4.4 The Radix-3 RO FFT
  4.5 The Mixed Radix RO FFT
  4.6 Performance of the RO FFT
  4.7 Automating Implementation of the RO FFT
A Eigenstructure of the Discrete Poisson Equation
B Software for the RO FFT
C FORTRAN Skeleton for Combine Equations
D Mathematica Scripts
E Automatically Generated Subroutines for the RO FFT
Bibliography
List of Figures
2.1 Splitting tree for complex FFT
2.2 Splitting tree for R symmetric FFT
2.3 Splitting tree for RE symmetric FFT
2.4 Splitting tree for RO symmetric FFT
2.5 Splitting tree for REE symmetric FFT
2.6 Splitting tree for REO symmetric FFT
2.7 Splitting tree for ROE symmetric FFT
2.8 Splitting tree for ROO symmetric FFT
3.1 Splitting tree for R symmetric FST
3.2 Splitting tree for RSE symmetric FST
3.3 Splitting tree for RSO symmetric FST
3.4 Splitting tree for RSESE symmetric FST
3.5 Splitting tree for RSESO symmetric FST
3.6 Splitting tree for RSOSE symmetric FST
3.7 Splitting tree for RSOSO symmetric FST
4.1 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = N/4
4.2 Radix-2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = 1
4.3 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 0
4.4 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = N/4
4.5 Radix-2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 1
4.6 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 0
4.7 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = N/4
4.8 Radix-2 storage pattern for I sequences for N = 16 highlighting the case n = 1
4.9 Splitting tree for the radix-2 RO FFT for N = 16
4.10 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 0
4.11 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = N/8
4.12 Radix-4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 1
4.13 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 0
4.14 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = N/8
4.15 Radix-4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 1
4.16 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 0
4.17 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = N/8
4.18 Radix-4 storage pattern for I sequences for N = 24 highlighting the case n = 1
4.19 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 0
4.20 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = N/6
4.21 Radix-3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 1
4.22 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 0
4.23 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = N/6
4.24 Radix-3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 1
4.25 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 0
4.26 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = N/6
4.27 Radix-3 storage pattern for I sequences for N = 18 highlighting the case n = 1
4.28 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = 0
4.29 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = N/6
4.30 Radix-3 storage pattern for 12 sequences for N = 18 highlighting the case n = 1
4.31 Initialization subroutine hierarchy for the RO FFT
4.32 Forward transform subroutine hierarchy for the RO FFT
List of Tables
1.1 Discrete Homogeneous Boundary Conditions
1.2 Eigenstructure for the Standard Grid
1.3 Eigenstructure for the Staggered Grid
1.4 Eigenstructure for the Mixed Grid
1.5 Operation Counts for 2D Poisson Solvers
2.1 Symmetries in the IDFT
2.2 Symmetries in the DFT
3.1 Symmetries in the IDST
3.2 Symmetries in the DST
4.1 Splitting Tree for the Radix-2 RO FFT for N = 16
4.2 Splitting Tree for the Radix-4 RO FFT for N = 64
4.3 Splitting Tree for the Radix-3 RO FFT for N = 27
4.4 Splitting Tree for the Mixed Radix RO FFT for N = 72
4.5 Timing Data for 1024 Sequences on the IBM 3090J
4.6 Timing Data for 1024 Sequences on the Cray Y-MP8/864
4.7 Timing Model for 1024 Sequences on the IBM 3090J
4.8 Comparison of Timing Data for Handwritten Code and Automated Code for 1024 Sequences on the IBM 3090J
Acknowledgements
This work was generously supported by the IBM Federal Sector Division
Resident Study Program.
Chapter 1
Introduction
1.1 The Fourier Analysis Method
We begin with a brief overview of the Fourier analysis method. We first
present the method in one spatial dimension, and then extend it to a
two-dimensional rectangle. The extension to higher-dimensional rectangular
regions is analogous, but we will not pursue this. Finally, we discuss
operation counts for the Fourier analysis method, and compare it to other
methods.
In one spatial dimension, the discretized Poisson equation is:
\[ u_{n-1} - 2u_n + u_{n+1} = f_n \]
for $1 \le n \le M$. We must specify boundary conditions at both the left
and right endpoints. We may assume, without loss of generality, that the
boundary conditions are homogeneous, since inhomogeneous boundary values
may be absorbed into $f_1$ and $f_M$. The discrete, homogeneous boundary
conditions we consider, specified for $n = 1$, are shown in Table 1.1.
Note that we consider two variants of Dirichlet and Neumann boundary
conditions, depending upon whether the boundary coincides with a grid
point or is staggered at a half grid spacing. The notation DN indicates a
homogeneous Dirichlet boundary condition at the left endpoint, and a
homogeneous Neumann boundary condition at the right endpoint. Similar
notation will be used for other combinations. Combinations which involve
only C, D, or N are referred to as standard grid boundary conditions.
Combinations which involve only DS or NS are referred to as staggered grid
boundary conditions. Other combinations are referred to as mixed grid
boundary conditions.
The discretized boundary value problem may be written in matrix form as:
\[ Au = f \tag{1.1} \]
where $A$ is a matrix of dimension $M$, and $u$, $f$ are vectors of length
$M$. The boundary conditions have been used to eliminate $u_0$ and
$u_{M+1}$. $A$ is tridiagonal, and in one spatial dimension we would
simply solve this linear system by Gaussian elimination. However, in
anticipation of extensions to higher dimensions, we proceed as follows.
First, we find the eigenvalues and eigenvectors of $A$. These are
summarized in Tables 1.2, 1.3, and 1.4. Note that $A$ always has a full
set of linearly independent eigenvectors whose components are
trigonometric expressions. Note also that in these tables the
computational domain is different for each boundary condition. The reason
for this will become clear after studying the corresponding symmetric FFT.
Appendix A provides an example of one technique for finding these
eigenvalues and eigenvectors. For this general discussion, we denote the
eigenvalues by $\lambda_k$ (repeated to multiplicity) and the
corresponding eigenvectors by $\phi_k$ for $1 \le k \le M$. We now seek a
solution for $u$ in the form of an eigenvector expansion:
\[ u = \sum_{k=1}^{M} \hat{u}_k \phi_k \tag{1.2} \]
This requires that we also express $f$ as an eigenvector expansion:
\[ f = \sum_{k=1}^{M} \hat{f}_k \phi_k \tag{1.3} \]
Table 1.1: Discrete Homogeneous Boundary Conditions
Acronym | Boundary Condition | Discrete Analog
C | Cyclic | $u_0 = u_M$
D | Dirichlet | $u_0 = 0$
N | Neumann | $u_2 - u_0 = 0$
DS | Dirichlet-Staggered | $u_1 + u_0 = 0$
NS | Neumann-Staggered | $u_1 - u_0 = 0$

Since $f$ is known and the vectors $\phi_k$ are linearly independent, we
may compute $\hat{f}_k$. Because the components of $\phi_k$ are
trigonometric expressions, $\hat{f}_k$ may be computed most efficiently by
means of a symmetric FFT. Thus, this step is referred to as Fourier
analysis. Substituting equations (1.2) and (1.3) into equation (1.1)
yields:
\[ \sum_{k=1}^{M} \hat{f}_k \phi_k
   = A\left[\sum_{k=1}^{M} \hat{u}_k \phi_k\right]
   = \sum_{k=1}^{M} \hat{u}_k \lambda_k \phi_k \]
Since the vectors $\phi_k$ are linearly independent, we conclude:
\[ \hat{u}_k \lambda_k = \hat{f}_k \]
for $1 \le k \le M$. We may now compute $\hat{u}_k$, unless
$\lambda_k = 0$. In this case, the compatibility condition
$\hat{f}_k = 0$ must hold, and $\hat{u}_k$ is arbitrary. Thus, the
solution for $u$ is not unique in this case. This occurs for CC, NN, NSNS,
NNS, and NSN boundary conditions, and corresponds to the fact that the
solutions to these problems are unique only up to an additive constant.
Having determined $\hat{u}_k$, $u$ may now be computed using the inverse
of the corresponding symmetric FFT. This step is called Fourier synthesis.
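As an illustration of the three steps just described (analysis, division
by the eigenvalues, synthesis), the following Python sketch solves the
one-dimensional problem with DD boundary conditions. It is only a sketch
under stated assumptions: the function names are hypothetical, the
eigenvectors $\sin(2\pi kn/N)$ with $N = 2(M+1)$ and eigenvalues
$-4\sin^2(\pi k/N)$ are taken from the DD row of Table 1.2, and the
transforms are computed as direct $O(M^2)$ sums rather than with the
symmetric FFTs developed in this thesis.

```python
import math

def solve_poisson_1d_dd(f):
    """Solve u[n-1] - 2*u[n] + u[n+1] = f[n], n = 1..M, with u[0] = u[M+1] = 0."""
    M = len(f)
    N = 2 * (M + 1)  # transform length assumed for DD boundary conditions
    # phi[k-1][n-1] is the nth component of the kth eigenvector.
    phi = [[math.sin(2.0 * math.pi * k * n / N) for n in range(1, M + 1)]
           for k in range(1, M + 1)]
    lam = [-4.0 * math.sin(math.pi * k / N) ** 2 for k in range(1, M + 1)]

    # Fourier analysis: the sine eigenvectors are mutually orthogonal with
    # squared norm N/4, so each coefficient is a scaled inner product.
    f_hat = [(4.0 / N) * sum(p * fn for p, fn in zip(phi[k], f))
             for k in range(M)]
    # Divide by the eigenvalues (none vanishes for DD boundary conditions).
    u_hat = [f_hat[k] / lam[k] for k in range(M)]
    # Fourier synthesis: sum the eigenvector expansion of u.
    return [sum(u_hat[k] * phi[k][n] for k in range(M)) for n in range(M)]

def residual_1d(u, f):
    """Largest violation of the difference equation, with zero boundary values."""
    padded = [0.0] + list(u) + [0.0]
    return max(abs(padded[n - 1] - 2.0 * padded[n] + padded[n + 1] - f[n - 1])
               for n in range(1, len(u) + 1))

f = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.0]
u = solve_poisson_1d_dd(f)
print(residual_1d(u, f))
```

The printed residual should be at the level of rounding error. For the
boundary conditions with a zero eigenvalue, the division step would need
the compatibility check described above.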
We now indicate how to extend the Fourier analysis method to a
two-dimensional rectangle. For simplicity, we assume that the number of
unknowns is the same in each dimension. In two spatial dimensions, the
discretized Poisson equation is:
\[ u_{n-1,m} - 2u_{n,m} + u_{n+1,m}
   + \rho^2 \left[u_{n,m-1} - 2u_{n,m} + u_{n,m+1}\right] = f_{n,m} \]
for $1 \le n, m \le M$, where $\rho = \Delta x/\Delta y$. We assume that
homogeneous boundary conditions of the same type considered previously are
specified on all four sides of the rectangle. The discretized boundary
value problem may be written in matrix form as:
\[ A u_m + \rho^2 \left[u_{m-1} - 2u_m + u_{m+1}\right] = f_m \tag{1.4} \]
for $1 \le m \le M$. Here $u_m$ is a vector of length $M$ with $n$th
component $u_{n,m}$, and likewise for $f_m$. $A$ is the same
$M$-dimensional matrix as in the corresponding one-dimensional problem. As
before, we seek a solution for $u_m$ in the form of an eigenvector
expansion:
\[ u_m = \sum_{k=1}^{M} \hat{u}_{k,m} \phi_k \tag{1.5} \]
Table 1.2: Eigenstructure for the Standard Grid
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
CC | R FFT | $\cos(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N-1$ | $0 \le k \le N/2$ or $0 \le k \le (N-1)/2$
CC | R FFT | $\sin(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N-1$ | $1 \le k \le N/2-1$ or $1 \le k \le (N-1)/2$
NN | RE FFT | $\cos(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N/2$ | $0 \le k \le N/2$
DD | RO FFT | $\sin(2\pi kn/N)$ | $-4\sin^2(\pi k/N)$ | $1 \le n \le N/2-1$ | $1 \le k \le N/2-1$
ND | REO FFT | $\cos[2\pi n(2k+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$
DN | ROE FFT | $\sin[2\pi n(2k-1)/N]$ | $-4\sin^2[\pi(2k-1)/N]$ | $1 \le n \le N/4$ | $1 \le k \le N/4$
Table 1.3: Eigenstructure for the Staggered Grid
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
NSNS | RSE FST | $\cos[\pi k(2n+1)/N]$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N/2-1$ | $0 \le k \le N/2-1$
DSDS | RSO FST | $\sin[\pi k(2n+1)/N]$ | $-4\sin^2(\pi k/N)$ | $0 \le n \le N/2-1$ | $1 \le k \le N/2$
NSDS | RSESO FST | $\cos[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$
DSNS | RSOSE FST | $\sin[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le N/4-1$ | $0 \le k \le N/4-1$
Table 1.4: Eigenstructure for the Mixed Grid, where $N = 2(2M+1)$
Bnd Cnd | Transform | nth Comp of Eigenvec | Associated Eigenvalue | Comp Domain | Eigenvec Indx
NNS | REE FFT | $\cos(4\pi kn/N)$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M$ | $0 \le k \le M$
NDS | REO FFT | $\cos[2\pi n(2k+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le M$ | $0 \le k \le M$
DNS | ROE FFT | $\sin[2\pi n(2k-1)/N]$ | $-4\sin^2[\pi(2k-1)/N]$ | $1 \le n \le M$ | $1 \le k \le M$
DDS | ROO FFT | $\sin(4\pi kn/N)$ | $-4\sin^2(2\pi k/N)$ | $1 \le n \le M$ | $1 \le k \le M$
NSN | RSESE FST | $\cos[2\pi k(2n+1)/N]$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M$ | $0 \le k \le M$
NSD | RSESO FST | $\cos[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le M-1$ | $0 \le k \le M-1$
DSN | RSOSE FST | $\sin[\pi(2k+1)(2n+1)/N]$ | $-4\sin^2[\pi(2k+1)/N]$ | $0 \le n \le M$ | $0 \le k \le M$
DSD | RSOSO FST | $\sin[2\pi k(2n+1)/N]$ | $-4\sin^2(2\pi k/N)$ | $0 \le n \le M-1$ | $1 \le k \le M$
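This requires that we also express $f_m$ as an eigenvector expansion:
\[ f_m = \sum_{k=1}^{M} \hat{f}_{k,m} \phi_k \tag{1.6} \]
$\hat{f}_{k,m}$ may be computed most efficiently by performing $M$
symmetric FFTs of length $M$. Substituting equations (1.5) and (1.6) into
equation (1.4) yields:
\[ \begin{aligned}
\sum_{k=1}^{M} \hat{f}_{k,m} \phi_k
 &= A\left[\sum_{k=1}^{M} \hat{u}_{k,m} \phi_k\right]
  + \rho^2 \sum_{k=1}^{M}
    \left[\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}\right] \phi_k \\
 &= \sum_{k=1}^{M} \lambda_k \hat{u}_{k,m} \phi_k
  + \rho^2 \sum_{k=1}^{M}
    \left[\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}\right] \phi_k
\end{aligned} \]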
Since the vectors $\phi_k$ are linearly independent, we conclude:
\[ \rho^2 \hat{u}_{k,m-1} + (\lambda_k - 2\rho^2)\,\hat{u}_{k,m}
   + \rho^2 \hat{u}_{k,m+1} = \hat{f}_{k,m} \]
for $1 \le k, m \le M$. We now obtain $\hat{u}_{k,m}$ by solving $M$
tridiagonal linear systems of dimension $M$ by Gaussian elimination. For
CC, NN, NSNS, NNS, or NSN boundary conditions, one of these linear systems
is singular. In this case, $\hat{f}_{k,m}$ must satisfy a compatibility
condition, and the solution for $\hat{u}_{k,m}$ is not unique. Having
determined $\hat{u}_{k,m}$, $u_m$ may be computed by performing $M$
symmetric FFTs of length $M$.
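The two-dimensional procedure can be sketched in the same way. The Python
code below is an illustrative sketch, not the software of this thesis: it
assumes DD boundary conditions in both directions, uses hypothetical
function names, computes the transforms as direct sums instead of
symmetric FFTs, and solves each tridiagonal system with the standard
Thomas algorithm (Gaussian elimination without pivoting).

```python
import math

def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for off*x[m-1] + diag*x[m] + off*x[m+1] = rhs[m]."""
    M = len(rhs)
    c = [0.0] * M
    d = [0.0] * M
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for m in range(1, M):                 # forward elimination
        denom = diag - off * c[m - 1]
        c[m] = off / denom
        d[m] = (rhs[m] - off * d[m - 1]) / denom
    x = [0.0] * M
    x[M - 1] = d[M - 1]
    for m in range(M - 2, -1, -1):        # back substitution
        x[m] = d[m] - c[m] * x[m + 1]
    return x

def solve_poisson_2d_dd(f, rho):
    """f[m-1][n-1] holds f_{n,m}; the solution is returned in the same layout."""
    M = len(f)
    N = 2 * (M + 1)
    phi = [[math.sin(2.0 * math.pi * k * n / N) for n in range(1, M + 1)]
           for k in range(1, M + 1)]
    lam = [-4.0 * math.sin(math.pi * k / N) ** 2 for k in range(1, M + 1)]
    # Fourier analysis: one transform of length M for each m.
    f_hat = [[(4.0 / N) * sum(phi[k][n] * f[m][n] for n in range(M))
              for k in range(M)] for m in range(M)]
    # One tridiagonal system in m for each k.
    u_hat = [[0.0] * M for _ in range(M)]
    for k in range(M):
        column = solve_tridiagonal(rho ** 2, lam[k] - 2.0 * rho ** 2,
                                   [f_hat[m][k] for m in range(M)])
        for m in range(M):
            u_hat[m][k] = column[m]
    # Fourier synthesis: one inverse transform of length M for each m.
    return [[sum(u_hat[m][k] * phi[k][n] for k in range(M)) for n in range(M)]
            for m in range(M)]

def residual_2d(u, f, rho):
    """Largest violation of the 2D difference equation, with zero boundary values."""
    M = len(u)
    def at(m, n):
        return u[m][n] if 0 <= m < M and 0 <= n < M else 0.0
    return max(abs(at(m, n - 1) - 2.0 * at(m, n) + at(m, n + 1)
                   + rho ** 2 * (at(m - 1, n) - 2.0 * at(m, n) + at(m + 1, n))
                   - f[m][n])
               for m in range(M) for n in range(M))

M = 6
f = [[math.cos(1.3 * m) + 0.1 * n * n for n in range(M)] for m in range(M)]
u = solve_poisson_2d_dd(f, 0.75)
print(residual_2d(u, f, 0.75))
```

Each tridiagonal matrix here has diagonal $\lambda_k - 2\rho^2$ and
off-diagonal $\rho^2$, so it is strictly diagonally dominant and
elimination without pivoting is safe for this choice of boundary
conditions.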
We conclude this section with a discussion of operation counts for the
Fourier analysis method, and a comparison of it to other methods for
solving the discrete Poisson equation. The Fourier analysis method is
efficient only for two or more dimensions. As before, we will restrict our
discussion to two dimensions. The operation count for an $M \times M$
grid, where $M$ is a power of two, is easily obtained from the description
of the algorithm above. We performed $2M$ symmetric FFTs of length $M$,
each of which requires $O(M \log M)$ operations. We solved $M$ tridiagonal
linear systems of dimension $M$ by Gaussian elimination, each of which
requires $O(M)$ operations. Thus, the asymptotic operation count for the
entire algorithm is $O(M^2 \log M)$. The operation counts for other
methods of solving the discrete Poisson equation are summarized in
Table 1.5. The source of this information is [8]. The FACR($\ell$) method
combines the cyclic reduction and Fourier analysis methods.

Table 1.5: Operation Counts for 2D Poisson Solvers
Method | Operation Count
Gaussian Elimination | $O(M^4)$
Successive Over-Relaxation | $O(M^2 \log M)$
Alternating Direction Implicit | $O(M^2 \log^2 M)$
Cyclic Reduction | $O(M^2 \log M)$
Fourier Analysis | $O(M^2 \log M)$
FACR($\ell$) | $O(M^2 \log\log M)$
1.2 The New FFT and FST Algorithms
From the discussion of the Fourier analysis method in Section 1.1, it is
evident that FFT algorithms form the core of this method. Our goal is to
provide the best possible FFT algorithms for this purpose, and to address
all of the boundary conditions in Tables 1.2, 1.3, and 1.4. In this
section, we summarize the new contributions to FFT literature contained
herein.
For each of the boundary conditions in Tables 1.2, 1.3, and 1.4, an FFT
algorithm has been developed which computes the coefficients in the
corresponding eigenvector expansion as efficiently as possible by
eliminating all redundant computations which would occur in the full
complex FFT, and without pre- or postprocessing. Such FFT algorithms are
referred to as compact symmetric FFTs. The older pre- and postprocessing
algorithms are described in detail in [2, 10]. Pre- and postprocessing
steps contribute only low order terms to operation counts. However, for
sequences of practical length these low order terms may be significant.
Furthermore, these algorithms require additional data accesses which also
contribute to the total execution time. Thus, compact symmetric FFTs
eliminate the additional operations and data accesses associated with pre-
and postprocessing algorithms. Pre- and postprocessing algorithms also
have the restriction that the length of the sequence must be even. A
compact symmetric FFT has long been available for real sequences, known as
Edson's algorithm. In [4], a compact symmetric FFT for real even sequences
is introduced, but in the context of Clenshaw-Curtis quadrature. In [10],
in-place compact symmetric FFTs are developed for real, even, odd,
quarter-wave even, and quarter-wave odd symmetries. All in-place
algorithms based on the splitting method require either the input or
output sequence to be in a permuted order, referred to as bit-reversed
order. These in-place algorithms require the input sequence in physical
space to be in bit-reversed order, and produce the forward transform in
natural order. From our discussion of the Fourier analysis method, it is
clear that this is the opposite of what is desired. In [1], analogous
algorithms are developed which accept the input sequence in physical space
in natural order, and produce the forward transform in bit-reversed order.
We follow the general approach set forth in [1].
With this background, we may now summarize our new contributions to FFT
literature. The algorithms in [1] were developed for radix-2 only. We have
generalized all of these to radix-$p$, for a general factor $p$. This has
resulted in a number of new intermediate symmetries which occur in the
course of the splitting method. After obtaining the combine equations for
the inverse transform, they must be inverted to obtain those for the
forward transform. For the radix-$p$ algorithms, this requires the
inversion of many systems of $p$ equations in $p$ unknowns. We have
exploited the special nature of these systems of equations to invert them
in closed form. The real quarter-wave even and quarter-wave odd
transforms, which we refer to as the real staggered even (RSE) and real
staggered odd (RSO) FFTs, have been used for ND and DN boundary conditions
respectively. We have shown that the algorithms for these symmetries in
[1] are not in-place. We have developed two new compact symmetric FFTs,
called real composite even-odd (REO) and real composite odd-even (ROE),
for these boundary conditions. We have shown that these new algorithms are
in-place and attain the goal of eliminating all redundant operations which
would occur in the full complex FFT.
For staggered grid boundary conditions, we have developed new algorithms
based on a variant of the DFT which we refer to as the discrete staggered
transform (DST). In analogy to the FFT, we have developed efficient
algorithms for computing the DST, which we refer to as the fast staggered
transform (FST). Previously, the only known algorithms for staggered grid
boundary conditions were the real quarter-wave even and quarter-wave odd
FFTs, and the pre- and postprocessing algorithms in [6]. The real
quarter-wave even and quarter-wave odd FFTs have been used for NSNS and
DSDS boundary conditions respectively, but the algorithms for these
symmetries in [1] are not in-place. The pre- and postprocessing algorithms
for NSDS and DSNS boundary conditions are less efficient than the new
compact symmetric FSTs for the same general reasons discussed previously.
For mixed grid boundary conditions, we have developed new algorithms based
on superimposing two symmetries. We refer to the resulting symmetries as
composite symmetries. Previously, the only known algorithms for mixed grid
boundary conditions were the pre- and postprocessing algorithms in [6] for
NSD and DNS boundary conditions. Again, the pre- and postprocessing
algorithms are less efficient than the new compact algorithms.
Furthermore, we have developed compact algorithms for six mixed grid
boundary conditions which previously could not be solved by Fourier
methods.
Chapter 2
Fast Fourier Transforms
2.1 Complex (C)
We begin by reviewing the fast Fourier transform, and establishing
notation which will be used throughout.
Definition 2.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the
forward discrete Fourier transform (DFT) is defined by:
\[ X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n\, \omega_N^{-kn} \tag{2.1} \]
for $0 \le k \le N-1$, where:
\[ \omega_N = e^{i 2\pi/N} \]
For convenience, we will often suppress the constant $1/N$.
The following theorem provides the inverse discrete Fourier transform
(IDFT). We omit the proof of this result because it is well known.
Theorem 2.1 A C sequence $x_n$ may be recovered from its DFT $X_k$ by the
inverse discrete Fourier transform (IDFT), which is given by:
\[ x_n = \sum_{k=0}^{N-1} X_k\, \omega_N^{kn} \tag{2.2} \]
for $0 \le n \le N-1$.
By Definition 2.1, the sequences $x_n$ and $X_k$ are of length $N$. These
sequences can be extended to all integral values of $n$ and $k$ using the
periodicity properties provided by the following corollary.
Corollary 2.1 Equations (2.1) and (2.2) imply that the sequences $x_n$ and
$X_k$ may be extended periodically to all integral values of $n$ and $k$
by:
\[ x_{N+n} = x_n \]
\[ X_{N+k} = X_k \]
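To make the sign and scaling conventions of Definition 2.1 and Theorem 2.1
concrete, the following Python sketch evaluates equations (2.1) and (2.2)
directly. The function names are hypothetical, and the direct $O(N^2)$
sums are used only to illustrate the definitions: the factor $1/N$ sits on
the forward transform, the forward exponent is negative, and evaluating
(2.2) outside $0 \le n \le N-1$ reproduces the periodic extension of
Corollary 2.1.

```python
import cmath

def dft(x):
    """Forward DFT of equation (2.1), including the factor 1/N."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) / N
            for k in range(N)]

def idft_at(X, n):
    """Evaluate the sum in equation (2.2) at any integer n."""
    N = len(X)
    return sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))

def idft(X):
    """Inverse DFT of equation (2.2) for 0 <= n <= N-1."""
    return [idft_at(X, n) for n in range(len(X))]

x = [1 + 2j, -0.5j, 3.0, 2 - 1j, 0.25, -1 + 1j]
X = dft(x)
round_trip = idft(X)

# Largest difference between x and the IDFT of its DFT.
print(max(abs(a - b) for a, b in zip(x, round_trip)))
# Periodic extension: the sum evaluated at n = N + 2 reproduces x[2].
print(abs(idft_at(X, len(x) + 2) - x[2]))
```

Both printed values should be at the level of rounding error.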
We will develop fast algorithms for computing the DFT and IDFT which are
based on the Cooley-Tukey fast Fourier transform (FFT). Following the
general approach in [1], we will develop algorithms for the IDFT given
$X_k$ in bit-reversed order. Inverting these yields algorithms for the DFT
given $x_n$ in natural order. We begin by defining notation which will be
needed in the development of these algorithms.
Definition 2.2 Given a C sequence $X_k$ of length $N$, and a factor $p$ of
$N$, we define a splitting of $X_k$ consisting of the following $p$
subsequences, each of length $N/p$:
\[ X_{k,q} = X_{pk+q} \]
for $0 \le k \le N/p - 1$, $0 \le q \le p - 1$. We denote the IDFT of
these by $y_{n,q}$. That is:
\[ y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\, \omega_{N/p}^{kn} \]
for $0 \le n \le N/p - 1$, $0 \le q \le p - 1$.
Given a C sequence $x_n$ of length $N$, and a factor $p$ of $N$, we define
the following $p$ subsequences, each of length $N/p$:
\[ x_{n,l} = x_{lN/p+n} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$.
The inverse fast Fourier transform (IFFT) is based on the principle of
computing the quantities $y_{n,q}$, and then combining these in the
appropriate fashion to obtain $x_n$. The precise equation for performing
this combining operation is provided by the next theorem.
Theorem 2.2 The inverse combine equation for C sequences is:
\[ x_{n,l} = \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{nq}\, y_{n,q}
   \tag{2.3} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$.
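We now prove Theorem 2.2.
\[ \begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k\, \omega_N^{kn} \\
    &= \sum_{q=0}^{p-1} \sum_{k=0}^{N/p-1} X_{pk+q}\, \omega_N^{(pk+q)n} \\
    &= \sum_{q=0}^{p-1} \omega_N^{qn} \sum_{k=0}^{N/p-1}
       X_{k,q}\, \omega_{N/p}^{kn} \\
    &= \sum_{q=0}^{p-1} \omega_N^{qn}\, y_{n,q}
\end{aligned} \]
In terms of the subsequence notation defined previously, this result is:
\[ \begin{aligned}
x_{n,l} &= x_{lN/p+n} \\
        &= \sum_{q=0}^{p-1} \omega_N^{q(lN/p+n)}\, y_{lN/p+n,q} \\
        &= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{nq}\, y_{n,q}
\end{aligned} \]
The last step uses $\omega_N^{qlN/p} = \omega_p^{lq}$ and the periodicity
of $y_{n,q}$ in $n$ with period $N/p$. This completes the proof of
Theorem 2.2.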
The following corollary provides an important special case of this result.
This is the same as equation (2) in [1], except that we are working with
the IDFT.
Corollary 2.2 Assume $p = 2$. The inverse combine equation for C sequences
is:
\[ \begin{aligned}
x_{n,0} &= y_{n,0} + \omega_N^{n}\, y_{n,1} \\
x_{n,1} &= y_{n,0} - \omega_N^{n}\, y_{n,1}
\end{aligned} \]
for $0 \le n \le N/2 - 1$.
We may now describe the IFFT algorithm for a C sequence with length a
power of two. Figure 2.1 is a splitting tree diagram which represents this
algorithm for a C sequence of length eight. The original sequence is split
into two subsequences, one consisting of the even numbered terms, and the
other consisting of the odd numbered terms. Assume, for the moment, that
the IDFT of each subsequence is known. Then the IDFT of the original
sequence may be obtained by applying Corollary 2.2. The algorithm now
continues recursively. That is, the IDFT of each subsequence is computed
by splitting it and repeating the steps above. Eventually, subsequences of
length one will be obtained. Since a sequence of length one is its own
IDFT, the recursive process terminates at this point.
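The recursion just described can be sketched directly in Python. This is
an illustrative, out-of-place version with hypothetical function names: it
takes $X_k$ in natural order and copies the even and odd numbered terms
into new lists, whereas the in-place algorithms of this thesis keep the
data in bit-reversed order. The combine step is Corollary 2.2, and a
direct evaluation of equation (2.2) is included only for comparison.

```python
import cmath

def idft_direct(X):
    """Direct O(N^2) evaluation of the IDFT, equation (2.2)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def ifft_radix2(X):
    """Recursive radix-2 IFFT for a sequence whose length is a power of two."""
    N = len(X)
    if N == 1:
        return [X[0]]             # a sequence of length one is its own IDFT
    y0 = ifft_radix2(X[0::2])     # IDFT of the even numbered terms
    y1 = ifft_radix2(X[1::2])     # IDFT of the odd numbered terms
    x = [0j] * N
    for n in range(N // 2):
        t = cmath.exp(2j * cmath.pi * n / N) * y1[n]   # omega_N^n * y_{n,1}
        x[n] = y0[n] + t                               # x_{n,0} = x_n
        x[N // 2 + n] = y0[n] - t                      # x_{n,1} = x_{N/2+n}
    return x

X = [complex(k * k - 3, 2 - k) for k in range(8)]
fast = ifft_radix2(X)
slow = idft_direct(X)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The printed difference between the recursive and direct results should be
at the level of rounding error.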
We now begin the development of the FFT algorithm. We will obtain the
forward combine equation for the FFT by inverting the inverse combine
equation. For this, we will need the following orthogonality property.
Lemma 2.1 If $N$ is a positive integer, and $0 \le j, n \le N-1$, then:
\[ \sum_{k=0}^{N-1} \omega_N^{(j-n)k} = N\,\delta(j-n) =
   \begin{cases} N & \text{if } j = n \\ 0 & \text{otherwise} \end{cases} \]
We now prove Lemma 2.1. The case $j = n$ is obvious. For $j \ne n$,
define:
\[ y = \omega_N^{j-n} \]
Summing the finite geometric series yields:
\[ \begin{aligned}
\sum_{k=0}^{N-1} \omega_N^{(j-n)k} &= \sum_{k=0}^{N-1} y^k \\
  &= (1 - y^N)/(1 - y) \\
  &= (1 - 1)/(1 - y) \\
  &= 0
\end{aligned} \]
This completes the proof of Lemma 2.1. The forward combine equation for
the FFT is now provided by the following theorem.
Theorem 2.3 The forward combine equation for C sequences is:
\[ y_{n,q} = \frac{1}{p}\, \omega_N^{-nq}
   \sum_{l=0}^{p-1} \omega_p^{-lq}\, x_{n,l} \tag{2.4} \]
for $0 \le n \le N/p - 1$, $0 \le q \le p - 1$.
Figure 2.1: Splitting tree for complex FFT
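We now prove Theorem 2.3.
\[ \begin{aligned}
y_{n,q} &= \sum_{k=0}^{N/p-1} X_{k,q}\, \omega_{N/p}^{kn} \\
  &= \sum_{k=0}^{N/p-1} X_{pk+q}\, \omega_{N/p}^{kn} \\
  &= \sum_{k=0}^{N/p-1} \omega_{N/p}^{kn}\, \frac{1}{N}
     \sum_{j=0}^{N-1} x_j\, \omega_N^{-(pk+q)j} \\
  &= \frac{1}{N} \sum_{j=0}^{N-1} x_j\, \omega_N^{-qj}
     \sum_{k=0}^{N/p-1} \omega_{N/p}^{k(n-j)} \\
  &= \frac{1}{p} \sum_{l=0}^{p-1} x_{lN/p+n}\, \omega_N^{-q(lN/p+n)} \\
  &= \frac{1}{p}\, \omega_N^{-nq} \sum_{l=0}^{p-1} \omega_p^{-lq}\, x_{n,l}
\end{aligned} \]
The fifth line follows from Lemma 2.1 applied with $N/p$ in place of $N$:
the inner sum equals $N/p$ when $j = lN/p + n$ for some
$0 \le l \le p - 1$, and vanishes otherwise. This completes the proof of
Theorem 2.3.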
The following corollary provides an important special case of this result.
This is the same as equation (13) in [1], except that we are working with
the IDFT.
Corollary 2.3 Assume $p = 2$. The forward combine equation for C sequences
is:
\[ \begin{aligned}
y_{n,0} &= (x_{n,0} + x_{n,1})/2 \\
y_{n,1} &= \omega_N^{-n}\, (x_{n,0} - x_{n,1})/2
\end{aligned} \]
for $0 \le n \le N/2 - 1$.
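The relationship between Theorems 2.2 and 2.3 can be checked numerically
for a general factor $p$. The Python sketch below uses hypothetical
function names and direct sums. It forms $y_{n,q}$ as the IDFT of each
subsequence $X_{pk+q}$, applies the inverse combine equation (2.3),
compares the result with the IDFT of the full sequence, and then applies
the forward combine equation (2.4) to recover $y_{n,q}$.

```python
import cmath

def w(N, e):
    """omega_N raised to the integer power e, with omega_N = exp(i*2*pi/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Direct evaluation of the IDFT, equation (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def inverse_combine(y, N, p):
    """Equation (2.3): maps y[q][n] to x[l][n] for 0 <= n <= N/p - 1."""
    L = N // p
    return [[sum(w(p, l * q) * w(N, n * q) * y[q][n] for q in range(p))
             for n in range(L)] for l in range(p)]

def forward_combine(x, N, p):
    """Equation (2.4): maps x[l][n] back to y[q][n]."""
    L = N // p
    return [[w(N, -n * q) * sum(w(p, -l * q) * x[l][n] for l in range(p)) / p
             for n in range(L)] for q in range(p)]

N, p = 12, 3
L = N // p
X = [complex(k + 1, (k * k) % 5) for k in range(N)]
y = [idft(X[q::p]) for q in range(p)]      # IDFTs of the p subsequences
x_sub = inverse_combine(y, N, p)           # x_sub[l][n] should equal x_{lN/p+n}
x = idft(X)

combine_error = max(abs(x_sub[l][n] - x[l * L + n])
                    for l in range(p) for n in range(L))
y_back = forward_combine(x_sub, N, p)
inversion_error = max(abs(y_back[q][n] - y[q][n])
                      for q in range(p) for n in range(L))
print(combine_error, inversion_error)
```

Both printed errors should be at the level of rounding error. For each
fixed $n$, equation (2.3) is a DFT of length $p$ applied to the values
$\omega_N^{nq} y_{n,q}$, which is why its inverse can be written in closed
form.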
We close this section by presenting the FFT and IFFT algorithms for
complex sequences with length a power of two. We emphasize that this FFT
is an in-place algorithm which accepts the input sequence $x_n$ in natural
order, and produces the forward transform $X_k$ in bit-reversed order. The
IFFT is an in-place algorithm which accepts the sequence $X_k$ in
bit-reversed order, and produces the inverse transform $x_n$ in natural
order. These algorithms may be used together in such a way that reordering
of the data is never required. We will not include complete algorithm
specifications such as these for all of the symmetric FFTs presented
later. However, the algorithms presented here should provide a guideline
for developing complete algorithms from forward and inverse combine
equations. The codes are written in FORTRAN, and are patterned after
similar codes found in [9].
C
C TEST DRIVER FOR COMPLEX FFT
C
      PARAMETER (LOGN=3,N=2**LOGN)
      COMPLEX X(0:N-1)
      COMPLEX OMEGA(0:N-1)
      COMMON /FCCOM/ L,OMEGA
      DO 100 I=0,N-1
         X(I) = CMPLX(1.0,0.0)
  100 CONTINUE
      WRITE(6,1) (X(I),I=0,N-1)
    1 FORMAT(1H ,'COMPLEX SEQUENCE = ',4(/,4E13.4))
      CALL FCI(LOGN)
      CALL FFC(LOGN,X)
      WRITE(6,2) (X(I),I=0,N-1)
    2 FORMAT(1H1,'FORWARD TRANSFORM = ',4(/,4E13.4))
      CALL FIC(LOGN,X)
      WRITE(6,3) (X(I),I=0,N-1)
    3 FORMAT(1H ,'INVERSE TRANSFORM = ',4(/,4E13.4))
      END
C
C FOURIER TRANSFORM
C COMPLEX SEQUENCE
C INITIALIZATION
C
      SUBROUTINE FCI(LOGN)
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      L = 2**LOGN
      OMEGA(0) = 1.0
      TPIDL = 8.0*ATAN(1.0)/L
      OMEGA(1) = CMPLX(COS(TPIDL),SIN(TPIDL))
      DO 100 I=2,L-1
         OMEGA(I) = OMEGA(I-1)*OMEGA(1)
  100 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     FORWARD DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FFC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
C     STAGE I COMBINES NS SEQUENCES OF LENGTH LS
      DO 100 I=1,LOGN
         NS = 2**(I-1)
         LS = N/NS
         CALL CF(NS,LS,X)
  100 CONTINUE
C     APPLY THE 1/N SCALING OF THE FORWARD TRANSFORM
      DO 200 I=0,N-1
         X(I) = X(I)/N
  200 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     FORWARD COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CF(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
      DO 100 I=0,LS/2-1
         TMP1 = X(I,0,J) + X(I,1,J)
         X(I,1,J) = CONJG(OMEGA(I*L/LS))*(X(I,0,J) - X(I,1,J))
         X(I,0,J) = TMP1
  100 CONTINUE
  200 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     INVERSE DIRECTION
C     COMPLEX SEQUENCE
C
      SUBROUTINE FIC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         LS = 2**I
         NS = N/LS
         CALL CI(NS,LS,X)
  100 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     INVERSE COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CI(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
      DO 100 I=0,LS/2-1
         TMP1 = OMEGA(I*L/LS)*X(I,1,J)
         X(I,1,J) = X(I,0,J) - TMP1
         X(I,0,J) = X(I,0,J) + TMP1
  100 CONTINUE
  200 CONTINUE
      RETURN
      END
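The bit-reversed ordering mentioned above can be made concrete with a small index map. The following is a minimal sketch added for illustration, not one of the thesis codes: the function IBTREV and the program BRDEMO are hypothetical names. It assumes that, after the forward transform, coefficient $X_k$ sits at the position obtained by reversing the LOGN low-order bits of k, which is what repeated splitting into even and odd subsequences would suggest.

```fortran
C     SKETCH: INDEX MAP BETWEEN NATURAL AND BIT REVERSED ORDER.
C     IBTREV AND BRDEMO ARE HYPOTHETICAL NAMES, NOT THESIS CODES.
      PROGRAM BRDEMO
      INTEGER LOGN,N,K,IBTREV
      PARAMETER (LOGN=3,N=2**LOGN)
C     PRINT EACH INDEX K WITH ITS ASSUMED STORAGE POSITION
      DO 10 K=0,N-1
         WRITE(6,*) K,IBTREV(K,LOGN)
   10 CONTINUE
      END
C
      INTEGER FUNCTION IBTREV(K,LOGN)
C     REVERSES THE LOWEST LOGN BITS OF K
      INTEGER K,LOGN,I,M
      M = K
      IBTREV = 0
      DO 10 I=1,LOGN
C        SHIFT THE RESULT LEFT AND APPEND THE LOW BIT OF M
         IBTREV = 2*IBTREV + MOD(M,2)
         M = M/2
   10 CONTINUE
      RETURN
      END
```

Because the inverse routine accepts bit-reversed input directly, a map of this kind would only be needed when individual coefficients have to be inspected or compared with a direct DFT.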
2.2 Real (R)
In this section we will be concerned with the following symmetries:
Definition 2.3 A real (R) sequence $x_n$ of length N is defined by:
\[ \bar{x}_n = x_n \]
A conjugate symmetric (CS) sequence $X_k$ of length N is defined by:
\[ X_{N-k} = \bar{X}_k \]
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.2 If $x_n$ is an R sequence of length N, then its DFT $X_k$ is a CS
sequence of length N. If $X_k$ is a CS sequence of length N, then its IDFT
$x_n$ is an R sequence of length N.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for CC boundary
conditions. Since an R sequence is also periodic with length N, it satisfies
CC boundary conditions for the computational domain $0 \le n \le N - 1$.
Theorem 2.4 Let $x_n$ be an R sequence and let $X_k$ be its CS symmetric
DFT, both of length N. The real form of the DFT is:
\[ \mathrm{Re}(X_k) = \frac{1}{N}\sum_{n=0}^{N-1} x_n \cos(2\pi kn/N) \]
\[ \mathrm{Im}(X_k) = -\frac{1}{N}\sum_{n=0}^{N-1} x_n \sin(2\pi kn/N) \]
for $0 \le k \le N - 1$.
If N is even, then the real form of the IDFT is:
\[ x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1}\{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\} \]
for $0 \le n \le N - 1$. If N is odd, we obtain instead:
\[ x_n = X_0 + \sum_{k=1}^{(N-1)/2}\{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\} \]
for $0 \le n \le N - 1$.
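The even-N case of Theorem 2.4 can be checked numerically with direct sums. The program below is a minimal sketch added for illustration. The program name RFORM, the arrays A and B (holding the real and imaginary parts of $X_k$ for $0 \le k \le N/2$), and the sample data are all assumptions made here. It uses O(N**2) sums rather than the fast algorithm developed in this section.

```fortran
C     SKETCH: REAL FORM OF THE DFT AND IDFT (THEOREM 2.4), N EVEN.
C     DIRECT O(N**2) SUMS, FOR CHECKING ONLY. RFORM, A, B AND THE
C     SAMPLE DATA ARE ILLUSTRATIVE AND NOT PART OF THE THESIS CODES.
      PROGRAM RFORM
      INTEGER N
      PARAMETER (N=8)
      REAL X(0:N-1),A(0:N/2),B(0:N/2),Y,ARG,TPI,ERR
      INTEGER K,M
      TPI = 8.0*ATAN(1.0)
C     SAMPLE REAL SEQUENCE
      DO 10 M=0,N-1
         X(M) = 1.0 + 0.5*M - 0.1*M*M
   10 CONTINUE
C     A(K) = RE(X_K), B(K) = IM(X_K) FOR 0 .LE. K .LE. N/2
      DO 30 K=0,N/2
         A(K) = 0.0
         B(K) = 0.0
         DO 20 M=0,N-1
            ARG = TPI*K*M/N
            A(K) = A(K) + X(M)*COS(ARG)/N
            B(K) = B(K) - X(M)*SIN(ARG)/N
   20    CONTINUE
   30 CONTINUE
C     REBUILD X FROM THE HALF SPECTRUM AND RECORD THE LARGEST ERROR
      ERR = 0.0
      DO 50 M=0,N-1
         Y = A(0) + (-1)**M*A(N/2)
         DO 40 K=1,N/2-1
            ARG = TPI*K*M/N
            Y = Y + 2.0*A(K)*COS(ARG) - 2.0*B(K)*SIN(ARG)
   40    CONTINUE
         ERR = MAX(ERR,ABS(Y-X(M)))
   50 CONTINUE
      WRITE(6,*) 'MAX RECONSTRUCTION ERROR = ',ERR
      END
```

Only the coefficients with $0 \le k \le N/2$ are stored, which reflects the remark that half of the CS sequence is redundant. The printed error should be at the level of single precision rounding.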
We now prove Theorem 2.4. The result for the DFT follows immediately
from Definition 2.1 and the R symmetry of $x_n$. Note that only half of the
CS sequence $X_k$ needs to be specified. We prove the result for the IDFT for
the case of even N only, since the proof for odd N is similar. Using the CS
symmetry of $X_k$ yields:
\[
\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k\,\omega_N^{kn} \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn} + \sum_{k=1}^{N/2-1} X_{N-k}\,\omega_N^{n(N-k)} \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn} + \sum_{k=1}^{N/2-1} \bar{X}_k\,\omega_N^{-kn} \\
&= X_0 + (-1)^n X_{N/2} + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} X_k\,\omega_N^{kn}\Big] \\
&= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1}\{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\}
\end{aligned}
\]
This completes the proof of Theorem 2.4.
We now develop a fast, mixed radix algorithm for computing the R
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an R
sequence of length N may be stored in N real storage locations, compared to
2N real storage locations for a C sequence of length N. Also, a CS sequence
of length N may be stored in N real storage locations because half of the
sequence is redundant and need not be stored. Our goal is to exploit these
symmetries in the data in order to obtain a reduction by half in both storage
requirements and number of operations compared to that for C sequences.
This algorithm is based on the symmetries which occur in the splittings of
the CS sequence $X_k$. We begin developing this algorithm by defining all of
the intermediate symmetries involved.
Definition 2.4 Let $X_k$ be a CS sequence of length N with factor p. For
$q \ne 0$, we define CS induced intersequence symmetry (CSIS) by:
\[ X_{k,p-q} = \bar{X}_{N/p-k-1,q} \]
For $q \ne 0$, we denote subsequence $X_{k,q}$ by CSIS(q). Subsequence $p - q$
is a redundant copy of subsequence q, which we denote by
$\mathrm{CSIS}(p-q) = \overline{\mathrm{CSIS}}(q)$. We also say that subsequence
$p - q$ is the dual of subsequence q.
A staggered conjugate symmetric (SCS) sequence $X_k$ of length N is defined by:
\[ X_{N-k-1} = \bar{X}_k \]
Let N have factor p. For $0 \le q \le p - 1$, we define SCS induced intersequence
symmetry (SCSIS) by:
\[ X_{k,p-q-1} = \bar{X}_{N/p-k-1,q} \]
For $0 \le q \le p - 1$, we denote subsequence $X_{k,q}$ by SCSIS(q). Subsequence
$p - q - 1$ is a redundant copy of subsequence q, which we denote by
$\mathrm{SCSIS}(p-q-1) = \overline{\mathrm{SCSIS}}(q)$. We also say that
subsequence $p - q - 1$ is the dual of subsequence q.
The following lemma establishes the relationship between these symmetries.
Lemma 2.3 Let $X_k$ be a CS sequence of length N with factor p. Then
the subsequence $X_{k,0}$ is CS symmetric, and the remaining subsequences
$X_{k,q}$ are CSIS symmetric. If p is even, then the CSIS symmetry of
subsequence $X_{k,p/2}$ reduces to SCS symmetry.
Let $X_k$ be an SCS sequence of length N with factor p. Then the
subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the SCSIS
symmetry of subsequence $X_{k,(p-1)/2}$ reduces to SCS symmetry.
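Two small cases may help fix the index bookkeeping in Lemma 2.3. They are illustrations added here and assume the subsequence convention $X_{k,q} = X_{pk+q}$ used throughout this chapter.

\[
\begin{aligned}
&N = 6,\ p = 3,\ X_{6-k} = \bar{X}_k: \\
&\quad X_{k,0} = (X_0, X_3), \qquad X_{k,1} = (X_1, X_4), \qquad X_{k,2} = (X_2, X_5) = (\bar{X}_4, \bar{X}_1), \\
&\quad\text{so } X_{k,2} = \bar{X}_{1-k,1}, \text{ which is the CSIS relation with } N/p - k - 1 = 1 - k . \\[4pt]
&N = 8,\ p = 2,\ X_{8-k} = \bar{X}_k: \\
&\quad X_{k,1} = (X_1, X_3, X_5, X_7) = (X_1, X_3, \bar{X}_3, \bar{X}_1), \\
&\quad\text{so } X_{3-k,1} = \bar{X}_{k,1}, \text{ which is SCS symmetry for a sequence of length } 4 .
\end{aligned}
\]

In the first case subsequence 2 is the dual of subsequence 1 and need not be stored. In the second case the subsequence with q = p/2 is its own dual, which is why its CSIS symmetry collapses to SCS symmetry.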
We now prove Lemma 2.3. Let $X_k$ be a CS sequence of length N with
factor p. The subsequence $X_{k,0}$ satisfies:
\[ X_{N/p-k,0} = X_{N-pk} = \bar{X}_{pk} = \bar{X}_{k,0} \]
That is, subsequence $X_{k,0}$ is CS symmetric. The remaining subsequences
$X_{k,q}$ satisfy:
\[
\begin{aligned}
X_{k,p-q} &= X_{pk+p-q} \\
&= \bar{X}_{N-pk-p+q} \\
&= \bar{X}_{p(N/p-k-1)+q} \\
&= \bar{X}_{N/p-k-1,q}
\end{aligned}
\]
That is, for $q \ne 0$, the subsequences $X_{k,q}$ are CSIS symmetric. If p is
even, then the CSIS symmetry of $X_{k,p/2}$ reduces to:
\[ X_{k,p/2} = \bar{X}_{N/p-k-1,p/2} \]
That is, subsequence $X_{k,p/2}$ is SCS symmetric.
Let $X_k$ be an SCS sequence of length N with factor p. The subsequences
$X_{k,q}$ satisfy:
\[
\begin{aligned}
X_{k,p-q-1} &= X_{pk+p-q-1} \\
&= \bar{X}_{N-pk-p+q+1-1} \\
&= \bar{X}_{p(N/p-k-1)+q} \\
&= \bar{X}_{N/p-k-1,q}
\end{aligned}
\]
That is, the subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the
SCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:
\[ X_{k,(p-1)/2} = \bar{X}_{N/p-k-1,(p-1)/2} \]
That is, subsequence $X_{k,(p-1)/2}$ is SCS symmetric. This completes the proof
of Lemma 2.3.
A mixed radix splitting tree diagram for a CS sequence is shown in
Figure 2.2. The acronyms representing the symmetries are summarized in
Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant.
Figure 2.2: Splitting tree for R symmetric FFT
The next lemma provides the intermediate symmetries in the IDFT
induced by the intermediate symmetries in the DFT.
Lemma 2.4 The intersequence symmetry CSIS induces the following
intersequence symmetry in the IDFT:
\[ y_{n,p-q} = \omega_{N/p}^{-n}\,\bar{y}_{n,q} \]
Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:
\[ x_n = \omega_N^{-n}\,\bar{x}_n \]
\[ x_n = \omega_N^{-n/2}\,\tilde{x}_n \]
where $\tilde{x}_n$ is the magnitude of $x_n$, and hence is real. The intersequence
symmetry SCSIS induces the following intersequence symmetry in the IDFT:
\[ y_{n,p-q-1} = \omega_{N/p}^{-n}\,\bar{y}_{n,q} \]
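The smallest SCS sequence shows how the real quantity $\tilde{x}_n$ arises. This is an illustration added here, assuming $\omega_N = e^{2\pi i/N}$ and the unscaled IDFT $x_n = \sum_k X_k\,\omega_N^{kn}$. Take N = 2, so that the SCS condition reads $X_1 = \bar{X}_0$:

\[
\begin{aligned}
x_0 &= X_0 + \bar{X}_0 = 2\,\mathrm{Re}(X_0), \\
x_1 &= X_0 - \bar{X}_0 = 2i\,\mathrm{Im}(X_0), \\
\tilde{x}_0 &= \omega_2^{0}\,x_0 = 2\,\mathrm{Re}(X_0), \\
\tilde{x}_1 &= \omega_2^{1/2}\,x_1 = i \cdot 2i\,\mathrm{Im}(X_0) = -2\,\mathrm{Im}(X_0).
\end{aligned}
\]

Both $\tilde{x}_0$ and $\tilde{x}_1$ are real, although $x_1$ itself is pure imaginary. Note that $\tilde{x}_1$ carries a sign, so "magnitude" is to be read as a signed real amplitude.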
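We now prove Lemma 2.4. Let $X_{k,q}$ be CSIS symmetric. Then the IDFT
of $X_{k,p-q}$ is:
\[
\begin{aligned}
y_{n,p-q} &= \sum_{k=0}^{N/p-1} X_{k,p-q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{n(N/p-k-1)} \\
&= \omega_{N/p}^{-n}\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{-kn} \\
&= \omega_{N/p}^{-n}\,\bar{y}_{n,q}
\end{aligned}
\]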
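Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:
\[
\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k\,\omega_N^{kn} \\
&= \sum_{k=0}^{N-1} \bar{X}_{N-k-1}\,\omega_N^{kn} \\
&= \sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{n(N-k-1)} \\
&= \omega_N^{-n}\sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= \omega_N^{-n}\,\bar{x}_n
\end{aligned}
\]
We express $x_n$ in polar form as follows:
\[ x_n = e^{i\theta}\,\tilde{x}_n \]
Substituting this into the preceding symmetry for $x_n$ and solving for
$\theta$ leads to:
\[ x_n = \omega_N^{-n/2}\,\tilde{x}_n \]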
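Let $X_{k,q}$ be SCSIS symmetric. Then the IDFT of $X_{k,p-q-1}$ is:
\[
\begin{aligned}
y_{n,p-q-1} &= \sum_{k=0}^{N/p-1} X_{k,p-q-1}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\,\omega_{N/p}^{kn} \\
&= \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{n(N/p-k-1)} \\
&= \omega_{N/p}^{-n}\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\,\omega_{N/p}^{-kn} \\
&= \omega_{N/p}^{-n}\,\bar{y}_{n,q}
\end{aligned}
\]
This completes the proof of Lemma 2.4.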
The preceding lemma shows that each symmetry appearing in Figure 2.2
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the R symmetric IFFT.
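Theorem 2.5 Assume that p is even. The inverse combine equation for
CS, SCS, and CSIS sequences is:
\[ x_{n,l} = y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.5} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$. Note that $x_{n,l}$ is
real. The inverse combine equation for SCSIS sequences is:
\[ \tilde{x}_{n,l} = 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.6} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$.
Next, assume that p is odd. The inverse combine equation for CS and
CSIS sequences is:
\[ x_{n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.7} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$. The inverse combine equation for SCS
and SCSIS sequences is:
\[ \tilde{x}_{n,l} = (-1)^l\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.8} \]
for $0 \le n \le N/p - 1$, $0 \le l \le p - 1$.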
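We now prove Theorem 2.5. First, assume that p is even. Consider the
combining of CS, SCS, and CSIS sequences. Substituting the symmetries
found earlier into the inverse combine equation (2.3) yields:
\[
\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= y_{n,0} + \omega_p^{lp/2}\,\omega_N^{np/2}\,y_{n,p/2} + \sum_{q=1}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=1}^{p/2-1}\omega_p^{l(p-q)}\,\omega_N^{n(p-q)}\,y_{n,p-q} \\
&= y_{n,0} + (-1)^l\,\omega_{N/p}^{n/2}\big[\omega_{N/p}^{-n/2}\,\tilde{y}_{n,p/2}\big] + \sum_{q=1}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=1}^{p/2-1}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]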
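Consider the combining of SCSIS sequences. Substituting the symmetries
found earlier into the inverse combine equation (2.3) yields:
\[
\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= \sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=0}^{p/2-1}\omega_p^{l(p-q-1)}\,\omega_N^{n(p-q-1)}\,y_{n,p-q-1} \\
&= \sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \omega_p^{-l}\,\omega_N^{-n}\sum_{q=0}^{p/2-1}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}
\]
Using SCS symmetry yields:
\[
\begin{aligned}
x_{n,l} &= x_{lN/p+n} \\
&= \omega_N^{-(lN/p+n)/2}\,\tilde{x}_{lN/p+n} \\
&= \omega_p^{-l/2}\,\omega_N^{-n/2}\,\tilde{x}_{n,l}
\end{aligned}
\tag{2.9}
\]
Substituting this into the combine equation above yields:
\[
\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \omega_p^{-l/2}\,\omega_N^{-n/2}\sum_{q=0}^{p/2-1}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]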
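Next, assume that p is odd. Consider the combining of CS and CSIS
sequences. Substituting the symmetries found earlier into the inverse combine
equation (2.3) yields:
\[
\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=1}^{(p-1)/2}\omega_p^{l(p-q)}\,\omega_N^{n(p-q)}\,y_{n,p-q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=1}^{(p-1)/2}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]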
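Consider the combining of SCS and SCSIS sequences. Substituting the
symmetries found earlier into the inverse combine equation (2.3) yields:
\[
\begin{aligned}
x_{n,l} &= \omega_p^{l(p-1)/2}\,\omega_N^{n(p-1)/2}\,y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \sum_{q=0}^{(p-3)/2}\omega_p^{l(p-q-1)}\,\omega_N^{n(p-q-1)}\,y_{n,p-q-1} \\
&= \omega_p^{l(p-1)/2}\,\omega_N^{n(p-1)/2}\,y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \omega_p^{-l}\,\omega_N^{-n}\sum_{q=0}^{(p-3)/2}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}
\]
Combining this with equation (2.9) yields:
\[
\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{lp/2}\,\omega_N^{np/2}\,y_{n,(p-1)/2} + \omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q} + \omega_p^{-l/2}\,\omega_N^{-n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{-lq}\,\omega_N^{-nq}\,\bar{y}_{n,q} \\
&= (-1)^l\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]
where the last step uses $\omega_p^{lp/2} = (-1)^l$ and
$\omega_N^{np/2}\,y_{n,(p-1)/2} = \tilde{y}_{n,(p-1)/2}$.
This completes the proof of Theorem 2.5.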
The following corollary provides an important special case of this result.
These are the same as equations (6) and (7) in [1], except that we are working
with the IDFT.
Corollary 2.4 Assume p = 2. The inverse combine equation for CS and
SCS sequences is:
\[ x_{n,0} = y_{n,0} + \tilde{y}_{n,1} \]
\[ x_{n,1} = y_{n,0} - \tilde{y}_{n,1} \]
for $0 \le n \le N/2 - 1$. The inverse combine equation for SCSIS sequences is:
\[ \tilde{x}_{n,0} = 2\,\mathrm{Re}[\omega_N^{n/2}\,y_{n,0}] \]
\[ \tilde{x}_{n,1} = -2\,\mathrm{Im}[\omega_N^{n/2}\,y_{n,0}] \]
for $0 \le n \le N/2 - 1$.
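The SCSIS case of Corollary 2.4 can be compared with a direct IDFT. The program below is a minimal sketch added for illustration. The names SCSCHK and XTIL and the test data are assumptions made here, and the sums are evaluated directly rather than recursively. It assumes $\tilde{x}_m = \omega_N^{m/2} x_m$ and $\tilde{x}_{n,l} = \tilde{x}_{lN/2+n}$, as in equation (2.9).

```fortran
C     SKETCH: RADIX 2 INVERSE COMBINE FOR SCSIS SEQUENCES
C     (COROLLARY 2.4), COMPARED WITH A DIRECT IDFT FOR N = 8.
C     SCSCHK, XTIL AND THE TEST DATA ARE ILLUSTRATIVE ASSUMPTIONS.
      PROGRAM SCSCHK
      INTEGER N,M
      PARAMETER (N=8,M=N/2)
      COMPLEX X(0:N-1),Y,W
      REAL XT0,XT1,TPI,ERR,XTIL
      INTEGER K,J
      TPI = 8.0*ATAN(1.0)
C     BUILD AN SCS SEQUENCE: X(N-K-1) = CONJG(X(K))
      DO 10 K=0,M-1
         X(K) = CMPLX(1.0+K,0.5-0.25*K)
         X(N-K-1) = CONJG(X(K))
   10 CONTINUE
      ERR = 0.0
      DO 30 J=0,M-1
C        Y = IDFT OF THE SUBSEQUENCE X(2K), EVALUATED AT INDEX J
         Y = (0.0,0.0)
         DO 20 K=0,M-1
            W = CMPLX(COS(TPI*K*J/M),SIN(TPI*K*J/M))
            Y = Y + X(2*K)*W
   20    CONTINUE
C        COMBINE WITH W = OMEGA_N**(J/2)
         W = CMPLX(COS(TPI*J/(2*N)),SIN(TPI*J/(2*N)))
         XT0 = 2.0*REAL(W*Y)
         XT1 = -2.0*AIMAG(W*Y)
C        COMPARE WITH THE DIRECT VALUES AT INDICES J AND J+N/2
         ERR = MAX(ERR,ABS(XT0-XTIL(X,N,J)))
         ERR = MAX(ERR,ABS(XT1-XTIL(X,N,J+M)))
   30 CONTINUE
      WRITE(6,*) 'MAX DIFFERENCE = ',ERR
      END
C
      REAL FUNCTION XTIL(X,N,MM)
C     REAL PART OF OMEGA_N**(MM/2) TIMES THE DIRECT IDFT VALUE X(MM)
      INTEGER N,MM,K
      COMPLEX X(0:N-1),S
      REAL TPI
      TPI = 8.0*ATAN(1.0)
      S = (0.0,0.0)
      DO 10 K=0,N-1
         S = S + X(K)*CMPLX(COS(TPI*K*MM/N),SIN(TPI*K*MM/N))
   10 CONTINUE
      S = S*CMPLX(COS(TPI*MM/(2*N)),SIN(TPI*MM/(2*N)))
      XTIL = REAL(S)
      RETURN
      END
```

Only the even subsequence is transformed, since the odd subsequence is its dual. The printed difference should be at the level of single precision rounding.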
The next theorem provides all of the forward combine equations for the
R symmetric FFT.
Theorem 2.6 Assume that p is even. The forward combine equation for
CS, SCS, and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p - 1$,
$0 \le q \le p/2 - 1$ and:
\[ \tilde{y}_{n,p/2} = \frac{1}{p}\sum_{l=0}^{p-1}(-1)^l\,x_{n,l} \tag{2.10} \]
for $0 \le n \le N/p - 1$. The forward combine equation for SCSIS sequences
is:
\[ y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)}\sum_{l=0}^{p-1}\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} \tag{2.11} \]
for $0 \le n \le N/p - 1$, $0 \le q \le p/2 - 1$.
Next, assume that p is odd. The forward combine equation for CS and
CSIS sequences is given by equation (2.4) for $0 \le n \le N/p - 1$,
$0 \le q \le (p-1)/2$. The forward combine equation for SCS and SCSIS
sequences is given by equation (2.11) for $0 \le n \le N/p - 1$,
$0 \le q \le (p-3)/2$ and:
\[ \tilde{y}_{n,(p-1)/2} = \frac{1}{p}\sum_{l=0}^{p-1}(-1)^l\,\tilde{x}_{n,l} \tag{2.12} \]
for $0 \le n \le N/p - 1$.
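We now prove Theorem 2.6. First, assume that p is even. The forward
combining of CS, SCS, and CSIS sequences requires one new equation:
\[
\begin{aligned}
\tilde{y}_{n,p/2} &= \omega_{N/p}^{n/2}\,y_{n,p/2} \\
&= \frac{1}{p}\sum_{l=0}^{p-1}\omega_p^{-lp/2}\,x_{n,l} \\
&= \frac{1}{p}\sum_{l=0}^{p-1}(-1)^l\,x_{n,l}
\end{aligned}
\]
The forward combine equation for SCSIS sequences is obtained by
substituting equation (2.9) into equation (2.4):
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\sum_{l=0}^{p-1}\omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq}\sum_{l=0}^{p-1}\omega_p^{-lq}\big[\omega_p^{-l/2}\,\omega_N^{-n/2}\,\tilde{x}_{n,l}\big] \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)}\sum_{l=0}^{p-1}\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l}
\end{aligned}
\]
Next, assume that p is odd. The forward combining of CS and CSIS
sequences does not require any new equations. The forward combining of
SCS and SCSIS sequences requires one new equation:
\[
\begin{aligned}
\tilde{y}_{n,(p-1)/2} &= \omega_{N/p}^{n/2}\,y_{n,(p-1)/2} \\
&= \frac{1}{p}\sum_{l=0}^{p-1}\omega_p^{-lp/2}\,\tilde{x}_{n,l} \\
&= \frac{1}{p}\sum_{l=0}^{p-1}(-1)^l\,\tilde{x}_{n,l}
\end{aligned}
\]
This completes the proof of Theorem 2.6.
The following corollary provides an important special case of this result.
These are the same as equations (11) and (12) in [1], except that we are
working with the IDFT.
Corollary 2.5 Assume p = 2. The forward combine equation for CS and
SCS sequences is:
\[ y_{n,0} = (x_{n,0} + x_{n,1})/2 \]
\[ \tilde{y}_{n,1} = (x_{n,0} - x_{n,1})/2 \]
for $0 \le n \le N/2 - 1$. The forward combine equation for SCSIS sequences
is:
\[ y_{n,0} = \omega_N^{-n/2}\,(\tilde{x}_{n,0} - i\,\tilde{x}_{n,1})/2 \]
for $0 \le n \le N/2 - 1$.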
2.3 Real Even (RE)
In this section, we will be concerned with the following symmetries:
Definition 2.5 A real even (RE) sequence $x_n$ of length N is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = x_n \]
Note that an RE sequence may also be viewed as having both R and CS
symmetry, which we denote by RCS.
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.5 If $x_n$ is an RE sequence of length N, then its DFT $X_k$ is an
RCS sequence of length N. If $X_k$ is an RCS sequence of length N, then its
IDFT $x_n$ is an RE sequence of length N.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for NN boundary
conditions. Note that if N is even, then an RE sequence satisfies NN boundary
conditions for the computational domain $0 \le n \le N/2$. That is:
\[ x_{N-1} = x_1 \]
\[ x_{N/2-1} = x_{N/2+1} \]
Theorem 2.7 Let $x_n$ be an RE sequence and let $X_k$ be its RCS symmetric
DFT, both of length N where N is even. The real form of the DFT is:
\[ X_k = \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n\cos(2\pi kn/N)\Big] \]
for $0 \le k \le N/2$. The real form of the IDFT is:
\[ x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} 2X_k\cos(2\pi kn/N) \]
for $0 \le n \le N/2$. Note that the results for the DFT and IDFT are identical
except for scaling.
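For a concrete instance of Theorem 2.7 (an illustration added here, not part of the original text), take N = 4, so that the RE sequence is $(x_0, x_1, x_2, x_1)$ and only $x_0, x_1, x_2$ need to be stored:

\[
\begin{aligned}
X_0 &= (x_0 + x_2 + 2x_1)/4, \qquad X_1 = (x_0 - x_2)/4, \qquad X_2 = (x_0 + x_2 - 2x_1)/4, \\
X_0 + X_2 + 2X_1 &= (x_0 + x_2)/2 + (x_0 - x_2)/2 = x_0, \\
X_0 - X_2 + 2X_1\cos(\pi/2) &= x_1, \\
X_0 + X_2 + 2X_1\cos(\pi) &= (x_0 + x_2)/2 - (x_0 - x_2)/2 = x_2 .
\end{aligned}
\]

The three lines after the first reproduce the IDFT formula for n = 0, 1, 2, and the forward and inverse sums differ only by the factor 1/N.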
We now prove Theorem 2.7. The result for the DFT follows from
Theorem 2.4, the RCS symmetry of $X_k$, and the RE symmetry of $x_n$ as follows:
\[
\begin{aligned}
X_k &= \frac{1}{N}\sum_{n=0}^{N-1} x_n\cos(2\pi kn/N) \\
&= \frac{1}{N}\Big\{x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} x_n\cos(2\pi kn/N) + \sum_{n=1}^{N/2-1} x_{N-n}\cos[2\pi k(N-n)/N]\Big\} \\
&= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n\cos(2\pi kn/N)\Big]
\end{aligned}
\]
The result for the IDFT follows immediately from Theorem 2.4 and the RCS
symmetry of $X_k$. Note that only half of the RE sequence $x_n$ needs to be
specified. This completes the proof of Theorem 2.7.
We now develop a fast, mixed radix algorithm for computing the RE
symmetric DFT and its inverse, given xn in natural order. Note that an RE
sequence of length N may be stored in N/2 real storage locations, compared
to 2N real storage locations for a C sequence of length N. Similarly, an RCS
sequence of length N may be stored in N/2 real storage locations. Our goal
is to exploit these symmetries in the data in order to obtain a reduction to
one fourth in both storage requirements and number of operations compared
to that for C sequences. This algorithm is based on the symmetries which
occur in the splittings of the RCS sequence $X_k$. We begin developing this
algorithm by defining all of the intermediate symmetries involved.
Definition 2.6 Let $X_k$ be an RCS sequence of length N with factor p. The
intermediate symmetries which occur in the splittings of $X_k$ are identical to
those in Definition 2.4, with the addition that all sequences are real as well.
We indicate this by preceding the acronym for each symmetry with an R.
The relationships between the symmetries recorded in Lemma 2.3 are not
affected by the fact that all sequences have R symmetry as well. A mixed
radix splitting tree diagram for an RCS sequence is shown in Figure 2.3. The
acronyms representing the symmetries are summarized in Table 2.2 for ease
of reference. Note that a branch of the splitting tree corresponding to a dual
sequence terminates because it is redundant. Note also that at the deepest
level of the splitting tree we find R sequences rather than C sequences.
Figure 2.3: Splitting tree for RE symmetric FFT
The next lemma provides the intermediate symmetries in the IDFT
induced by the intermediate symmetries in the DFT.
Lemma 2.6 The intermediate symmetries in the IDFT induced by the
intermediate symmetries in the DFT are identical to those in Lemma 2.4, with
the following addition. Let $X_k$ be an R sequence of length N. Its IDFT $x_n$
satisfies:
\[ x_{N-n} = \bar{x}_n \]
Since all sequences have R symmetry, only half of the IDFT of any sequence
needs to be computed.
We now prove Lemma 2.6. Let $X_k$ be an R sequence of length N. Its
IDFT $x_n$ satisfies:
\[
\begin{aligned}
x_{N-n} &= \sum_{k=0}^{N-1} X_k\,\omega_N^{k(N-n)} \\
&= \sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= \bar{x}_n
\end{aligned}
\]
This completes the proof of Lemma 2.6.
The preceding lemma shows that each symmetry appearing in Figure 2.3
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the RE symmetric IFFT.
Theorem 2.8 Assume that p is even. The inverse combine equation for
RCS, RSCS, and RCSIS sequences is given by equation (2.5) for the lower
half-range of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ x_{N/p-n,l} = y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.13} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$. The inverse combine
equation for RSCSIS sequences is given by equation (2.6) for the lower
half-range of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ \tilde{x}_{N/p-n,l} = 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.14} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$. The inverse combine
equation for R sequences is given by equation (2.3) for the lower half-range
of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ x_{N/p-n,l} = \sum_{q=0}^{p-1}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q} \tag{2.15} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$.
Next, assume that p is odd. The inverse combine equation for RCS and
RCSIS sequences is given by equation (2.7) for the lower half-range of n and
$0 \le l \le (p-1)/2$. We also need the companion equation:
\[ x_{N/p-n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.16} \]
for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine
equation for RSCS and RSCSIS sequences is given by equation (2.8) for the
lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion
equation:
\[ \tilde{x}_{N/p-n,l} = (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.17} \]
for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine
equation for R sequences is given by equation (2.3) for the lower half-range
of n and $0 \le l \le (p-1)/2$. We also need the companion equation (2.15) for
the lower half-range of n and $0 \le l \le (p-3)/2$.
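We now prove Theorem 2.8. First, assume that p is even. Consider the
combining of RCS, RSCS, and RCSIS sequences. Since we will compute
only half of each sequence $y_{n,q}$ on the right hand side of equation (2.5), we
need the following companion equation:
\[ x_{N/p-n,l} = y_{N/p-n,0} + (-1)^l\,\tilde{y}_{N/p-n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{lq}\,\omega_N^{q(N/p-n)}\,y_{N/p-n,q}\Big] \]
Using RSCS symmetry yields:
\[
\begin{aligned}
\tilde{y}_{N/p-n,q} &= \omega_{N/p}^{(N/p-n)/2}\,y_{N/p-n,q} \\
&= -\omega_{N/p}^{-n/2}\,\bar{y}_{n,q} \\
&= -\tilde{y}_{n,q}
\end{aligned}
\tag{2.18}
\]
Substituting this into the companion equation above, together with
$y_{N/p-n,q} = \bar{y}_{n,q}$ from Lemma 2.6, yields:
\[
\begin{aligned}
x_{N/p-n,l} &= y_{n,0} - (-1)^l\,\tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= y_{n,0} + (-1)^{l+1}\,\tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]
Consider the combining of RSCSIS sequences. Since we will compute
only half of each sequence $y_{n,q}$ on the right hand side of equation (2.6), we
need the following companion equation:
\[
\begin{aligned}
\tilde{x}_{N/p-n,l} &= 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{(N/p-n)/2}\sum_{q=0}^{p/2-1}\omega_p^{lq}\,\omega_N^{q(N/p-n)}\,y_{N/p-n,q}\Big] \\
&= 2\,\mathrm{Re}\Big[\omega_p^{(l+1)/2}\,\omega_N^{-n/2}\sum_{q=0}^{p/2-1}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]
Consider the combining of R sequences. Since we will compute only half
of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the
following companion equation:
\[
\begin{aligned}
x_{N/p-n,l} &= \sum_{q=0}^{p-1}\omega_p^{lq}\,\omega_N^{q(N/p-n)}\,y_{N/p-n,q} \\
&= \sum_{q=0}^{p-1}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}
\end{aligned}
\]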
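Next, assume that p is odd. Consider the combining of RCS and RCSIS
sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right
hand side of equation (2.7), we need the following companion equation:
\[
\begin{aligned}
x_{N/p-n,l} &= y_{N/p-n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{lq}\,\omega_N^{q(N/p-n)}\,y_{N/p-n,q}\Big] \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]
Consider the combining of RSCS and RSCSIS sequences. Since we will
compute only half of each sequence $y_{n,q}$ on the right hand side of
equation (2.8), we need the following companion equation:
\[ \tilde{x}_{N/p-n,l} = (-1)^l\,\tilde{y}_{N/p-n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{l/2}\,\omega_N^{(N/p-n)/2}\sum_{q=0}^{(p-3)/2}\omega_p^{lq}\,\omega_N^{q(N/p-n)}\,y_{N/p-n,q}\Big] \]
Substituting equation (2.18) into the companion equation above, together with
$y_{N/p-n,q} = \bar{y}_{n,q}$ from Lemma 2.6, yields:
\[
\begin{aligned}
\tilde{x}_{N/p-n,l} &= (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{(l+1)/2}\,\omega_N^{-n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q}\Big] \\
&= (-1)^{l+1}\,\tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big]
\end{aligned}
\]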
The companion equation for R sequences is identical to the even p case.
This completes the proof of Theorem 2.8. The following corollary provides
an important special case of this result.
Corollary 2.6 Assume p = 2. The inverse combine equation for RCS and
RSCS sequences is:
\[ x_{n,0} = y_{n,0} + \tilde{y}_{n,1} \]
\[ x_{N/2-n,0} = y_{n,0} - \tilde{y}_{n,1} \]
for the lower half-range of n. The inverse combine equation for RSCSIS
sequences is:
\[ \tilde{x}_{n,0} = 2\,\mathrm{Re}[\omega_N^{n/2}\,y_{n,0}] \]
\[ \tilde{x}_{N/2-n,0} = 2\,\mathrm{Im}[\omega_N^{n/2}\,y_{n,0}] \]
for the lower half-range of n. The inverse combine equation for R sequences
is:
\[ x_{n,0} = y_{n,0} + \omega_N^{n}\,y_{n,1} \]
\[ x_{N/2-n,0} = \bar{y}_{n,0} - \omega_N^{-n}\,\bar{y}_{n,1} \]
for the lower half-range of n.
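The R sequence case of Corollary 2.6 can be verified by hand for N = 4. This is an illustration added here, assuming real coefficients $X_0, \ldots, X_3$, the unscaled IDFT, and $y_{n,q}$ defined as the IDFT of $X_{k,q} = X_{2k+q}$:

\[
\begin{aligned}
y_{n,0} &= X_0 + (-1)^n X_2, \qquad y_{n,1} = X_1 + (-1)^n X_3, \qquad n = 0, 1, \\
x_{1,0} &= y_{1,0} + \omega_4\,y_{1,1} = (X_0 - X_2) + i\,(X_1 - X_3) = x_1, \\
x_{N/2-0,0} &= \bar{y}_{0,0} - \bar{y}_{0,1} = X_0 - X_1 + X_2 - X_3 = x_2, \\
x_{N/2-1,0} &= \bar{y}_{1,0} - \omega_4^{-1}\,\bar{y}_{1,1} = (X_0 - X_2) + i\,(X_1 - X_3) = x_1 .
\end{aligned}
\]

Each right hand side matches the direct sum $x_m = \sum_k X_k\,\omega_4^{km}$. The companion equation at n = 1 returns $x_1$ again because n = 1 is the midpoint of the range for a subsequence of length 2.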
The next theorem provides all of the forward combine equations for the
RE symmetric FFT.
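Theorem 2.9 Assume that p is even. The forward combine equation for R
sequences is:
\[ y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \tag{2.19} \]
for the lower half-range of n and $0 \le q \le p - 1$. Note that $y_{0,q}$ is real because
$x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both real. This ensures that the final output
is real because n = 0 in the last stage of the algorithm. The forward combine
equation for RCS, RSCS, and RCSIS sequences is given by equation (2.19)
for the lower half-range of n and $0 \le q \le p/2 - 1$ with the exception that all
sequences $x_{n,l}$ are real. In addition:
\[ \tilde{y}_{n,p/2} = \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\,x_{-n,p/2} + \sum_{l=1}^{p/2-1}(-1)^l\big[x_{n,l} + x_{-n,l}\big]\Big\} \tag{2.20} \]
for the lower half-range of n. The forward combine equation for RSCSIS
sequences is:
\[ y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i\,(-1)^q\,\tilde{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}\big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\} \tag{2.21} \]
for the lower half-range of n and $0 \le q \le p/2 - 1$. Note that $y_{0,q}$ is real
because $\tilde{x}_{0,p/2} = 0$.
Next, assume that p is odd. The forward combine equation for R
sequences is:
\[ y_{n,q} = \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + \sum_{l=1}^{(p-1)/2}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \tag{2.22} \]
for the lower half-range of n and $0 \le q \le p - 1$. The forward combine
equation for RCS and RCSIS sequences is given by equation (2.22) for the
lower half-range of n and $0 \le q \le (p-1)/2$ with the exception that all
sequences $x_{n,l}$ are real. The forward combine equation for RSCS and RSCSIS
sequences is:
\[ y_{n,q} = \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2}\big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\} \tag{2.23} \]
for the lower half-range of n and $0 \le q \le (p-3)/2$. In addition:
\[ \tilde{y}_{n,(p-1)/2} = \frac{1}{p}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2}(-1)^l\big[\tilde{x}_{n,l} + \tilde{x}_{-n,l}\big]\Big\} \tag{2.24} \]
for the lower half-range of n.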
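We now prove Theorem 2.9. First, assume that p is even. The forward
combine equation for R sequences is obtained by developing a compact form
of equation (2.4) which eliminates all redundant data. For this purpose, we
will need the following result which is valid for all R sequences:
\[
\begin{aligned}
x_{n,p-l-1} &= x_{(p-l-1)N/p+n} \\
&= x_{N-(l+1)N/p+n} \\
&= \bar{x}_{(l+1)N/p-n} \\
&= \bar{x}_{-n,l+1}
\end{aligned}
\]
Using this result, we obtain:
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\sum_{l=0}^{p-1}\omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1}\omega_p^{-lq}\,x_{n,l} + \sum_{l=0}^{p/2-1}\omega_p^{-(p-l-1)q}\,x_{n,p-l-1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1}\omega_p^{-lq}\,x_{n,l} + \sum_{l=0}^{p/2-1}\omega_p^{(l+1)q}\,\bar{x}_{-n,l+1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\sum_{l=0}^{p/2-1}\omega_p^{-lq}\,x_{n,l} + \sum_{l=1}^{p/2}\omega_p^{lq}\,\bar{x}_{-n,l}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\}
\end{aligned}
\]
The forward combining of RCS, RSCS, and RCSIS sequences requires
one new equation:
\[
\begin{aligned}
\tilde{y}_{n,p/2} &= \omega_{N/p}^{n/2}\,y_{n,p/2} \\
&= \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\,x_{-n,p/2} + \sum_{l=1}^{p/2-1}(-1)^l\big[x_{n,l} + x_{-n,l}\big]\Big\}
\end{aligned}
\]
The forward combine equation for RSCSIS sequences is obtained by
substituting equation (2.9) into equation (2.19):
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + (-1)^q\,\bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i\,(-1)^q\,\tilde{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}\big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\}
\end{aligned}
\]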
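Next, assume that p is odd. The forward combine equation for R
sequences is obtained by developing a compact form of equation (2.4) which
eliminates all redundant data.
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\sum_{l=0}^{p-1}\omega_p^{-lq}\,x_{n,l} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2}\omega_p^{-lq}\,x_{n,l} + \sum_{l=0}^{(p-3)/2}\omega_p^{-(p-l-1)q}\,x_{n,p-l-1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2}\omega_p^{-lq}\,x_{n,l} + \sum_{l=0}^{(p-3)/2}\omega_p^{(l+1)q}\,\bar{x}_{-n,l+1}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{\omega_p^{-q(p-1)/2}\,x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2}\omega_p^{-lq}\,x_{n,l} + \sum_{l=1}^{(p-1)/2}\omega_p^{lq}\,\bar{x}_{-n,l}\Big\} \\
&= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + \sum_{l=1}^{(p-1)/2}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\}
\end{aligned}
\]
The forward combining of RCS and RCSIS sequences does not require
any new equations. The forward combine equation for RSCS and RSCSIS
sequences is obtained by substituting equation (2.9) into equation (2.22):
\[
\begin{aligned}
y_{n,q} &= \frac{1}{p}\,\omega_N^{-nq}\Big\{x_{n,0} + \sum_{l=1}^{(p-1)/2}\big[\omega_p^{-lq}\,x_{n,l} + \omega_p^{lq}\,\bar{x}_{-n,l}\big]\Big\} \\
&= \frac{1}{p}\,\omega_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2}\big[\omega_p^{-l(q+1/2)}\,\tilde{x}_{n,l} + \omega_p^{l(q+1/2)}\,\tilde{x}_{-n,l}\big]\Big\}
\end{aligned}
\]
For $q = (p-1)/2$ this reduces to:
\[
\begin{aligned}
\tilde{y}_{n,(p-1)/2} &= \omega_{N/p}^{n/2}\,y_{n,(p-1)/2} \\
&= \frac{1}{p}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2}(-1)^l\big[\tilde{x}_{n,l} + \tilde{x}_{-n,l}\big]\Big\}
\end{aligned}
\]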
This completes the proof of Theorem 2.9. The following corollary provides
an important special case of this result.
Corollary 2.7 Assume p = 2. The forward combine equation for R
sequences is:
\[ y_{n,0} = (x_{n,0} + \bar{x}_{-n,1})/2 \]
\[ y_{n,1} = \omega_N^{-n}\,(x_{n,0} - \bar{x}_{-n,1})/2 \]
for the lower half-range of n. The forward combine equation for RCS and
RSCS sequences is:
\[ y_{n,0} = (x_{n,0} + x_{-n,1})/2 \]
\[ \tilde{y}_{n,1} = (x_{n,0} - x_{-n,1})/2 \]
for the lower half-range of n. The forward combine equation for RSCSIS
sequences is:
\[ y_{n,0} = \omega_N^{-n/2}\,(\tilde{x}_{n,0} + i\,\tilde{x}_{-n,1})/2 \]
for the lower half-range of n.
2.4 Real Odd (RO)
In this section, we will be concerned with the following symmetries:
Definition 2.7 A real odd (RO) sequence $x_n$ of length N is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = -x_n \]
An imaginary odd (IO) sequence, or equivalently an imaginary conjugate
symmetric (ICS) sequence, $X_k$ of length N is defined by:
\[ \bar{X}_k = -X_k \]
\[ X_{N-k} = \bar{X}_k \]
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.7 If $x_n$ is an RO sequence of length N, then its DFT $X_k$ is an
ICS sequence of length N. If $X_k$ is an ICS sequence of length N, then its
IDFT $x_n$ is an RO sequence of length N.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for DD boundary
conditions. Note that if N is even, then an RO sequence satisfies DD boundary
conditions for the computational domain $1 \le n \le N/2 - 1$. That is:
\[ x_0 = 0 \]
\[ x_{N/2} = 0 \]
Theorem 2.10 Let $x_n$ be an RO sequence and let $X_k$ be its ICS symmetric
DFT, both of length N where N is even. The real form of the DFT is:
\[ \mathrm{Im}(X_k) = -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n\sin(2\pi kn/N) \]
for $1 \le k \le N/2 - 1$. The real form of the IDFT is:
\[ x_n = -\sum_{k=1}^{N/2-1} 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N) \]
for $1 \le n \le N/2 - 1$. Note that the results for the DFT and IDFT are
identical except for scaling.
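A small case makes the round trip in Theorem 2.10 explicit. This is an illustration added here, not part of the original text. Take N = 6, so that the interior unknowns are $x_1$ and $x_2$, and note that $\sin(\pi/3) = \sin(2\pi/3) = \sqrt{3}/2$ and $\sin(4\pi/3) = -\sqrt{3}/2$:

\[
\begin{aligned}
\mathrm{Im}(X_1) &= -\tfrac{1}{6}\cdot 2\cdot\tfrac{\sqrt{3}}{2}\,(x_1 + x_2) = -\tfrac{\sqrt{3}}{6}\,(x_1 + x_2), \\
\mathrm{Im}(X_2) &= -\tfrac{1}{6}\cdot 2\cdot\tfrac{\sqrt{3}}{2}\,(x_1 - x_2) = -\tfrac{\sqrt{3}}{6}\,(x_1 - x_2), \\
-2\big[\mathrm{Im}(X_1) + \mathrm{Im}(X_2)\big]\tfrac{\sqrt{3}}{2} &= \sqrt{3}\cdot\tfrac{\sqrt{3}}{6}\cdot 2x_1 = x_1, \\
-2\big[\mathrm{Im}(X_1) - \mathrm{Im}(X_2)\big]\tfrac{\sqrt{3}}{2} &= \sqrt{3}\cdot\tfrac{\sqrt{3}}{6}\cdot 2x_2 = x_2 .
\end{aligned}
\]

The last two lines are the IDFT sum evaluated at n = 1 and n = 2, so the sine expansion recovers the interior values exactly.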
We now prove Theorem 2.10. The result for the DFT follows from
Theorem 2.4, the ICS symmetry of $X_k$, and the RO symmetry of $x_n$ as follows:
\[
\begin{aligned}
\mathrm{Im}(X_k) &= -\frac{1}{N}\sum_{n=0}^{N-1} x_n\sin(2\pi kn/N) \\
&= -\frac{1}{N}\sum_{n=1}^{N/2-1}\{x_n\sin(2\pi kn/N) + x_{N-n}\sin[2\pi k(N-n)/N]\} \\
&= -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n\sin(2\pi kn/N)
\end{aligned}
\]
The result for the IDFT follows immediately from Theorem 2.4 and the ICS
symmetry of $X_k$. Note that only half of the RO sequence $x_n$ needs to be
specified. This completes the proof of Theorem 2.10.
We now develop a fast, mixed radix algorithm for computing the RO
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RO
sequence of length N may be stored in N/2 real storage locations, compared
to 2N real storage locations for a C sequence of length N. Similarly, an ICS
sequence of length N may be stored in N/2 real storage locations. Our goal
is to exploit these symmetries in the data in order to obtain a reduction to
one fourth in both storage requirements and number of operations compared
to that for C sequences. This algorithm is based on the symmetries which
occur in the splittings of the ICS sequence $X_k$. We begin developing this
algorithm by defining all of the intermediate symmetries involved.
Definition 2.8 Let $X_k$ be an ICS sequence of length N with factor p. The
intermediate symmetries which occur in the splittings of $X_k$ are identical to
those in Definition 2.4, with the addition that all sequences are pure
imaginary as well. We indicate this by preceding the acronym for each symmetry
with an I.
The relationships between the symmetries recorded in Lemma 2.3 are not
affected by the fact that all sequences have I symmetry as well. A mixed
radix splitting tree diagram for an ICS sequence is shown in Figure 2.4. The
acronyms representing the symmetries are summarized in Table 2.2 for ease
of reference. Note that a branch of the splitting tree corresponding to a dual
sequence terminates because it is redundant. Note also that at the deepest
level of the splitting tree we find I sequences rather than C sequences.
Figure 2.4: Splitting tree for RO symmetric FFT
The next lemma provides the intermediate symmetries in the IDFT
induced by the intermediate symmetries in the DFT.
Lemma 2.8 The intermediate symmetries in the IDFT induced by the
intermediate symmetries in the DFT are identical to those in Lemma 2.4, with
the following addition. Let $X_k$ be an I sequence of length N. Its IDFT $x_n$
satisfies:
\[ x_{N-n} = -\bar{x}_n \]
Since all sequences have I symmetry, only half of the IDFT of any sequence
needs to be computed.
We now prove Lemma 2.8. Let $X_k$ be an I sequence of length N. Its
IDFT $x_n$ satisfies:
\[
\begin{aligned}
x_{N-n} &= \sum_{k=0}^{N-1} X_k\,\omega_N^{k(N-n)} \\
&= -\sum_{k=0}^{N-1} \bar{X}_k\,\omega_N^{-kn} \\
&= -\bar{x}_n
\end{aligned}
\]
This completes the proof of Lemma 2.8.
The preceding lemma shows that each symmetry appearing in Figure 2.4
induces a symmetry in the IDFT. These induced symmetries are summarized
in Table 2.2 for ease of reference. The next theorem provides all of the inverse
combine equations for the RO symmetric IFFT.
Theorem 2.11 Assume that p is even. The inverse combine equation for
ICS, ISCS, and ICSIS sequences is given by equation (2.5) for the lower
half-range of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ x_{N/p-n,l} = -y_{n,0} + (-1)^l\,\tilde{y}_{n,p/2} - 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.25} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$. The inverse combine
equation for ISCSIS sequences is given by equation (2.6) for the lower
half-range of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ \tilde{x}_{N/p-n,l} = -2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{p/2-1}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.26} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$. The inverse combine
equation for I sequences is given by equation (2.3) for the lower half-range
of n and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[ x_{N/p-n,l} = -\sum_{q=0}^{p-1}\omega_p^{q(l+1)}\,\omega_N^{-nq}\,\bar{y}_{n,q} \tag{2.27} \]
for the lower half-range of n and $0 \le l \le p/2 - 1$.
Next, assume that p is odd. The inverse combine equation for ICS and
ICSIS sequences is given by equation (2.7) for the lower half-range of n and
$0 \le l \le (p-1)/2$. We also need the companion equation:
\[ x_{N/p-n,l} = -y_{n,0} - 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.28} \]
for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine
equation for ISCS and ISCSIS sequences is given by equation (2.8) for the
lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion
equation:
\[ \tilde{x}_{N/p-n,l} = (-1)^l\,\tilde{y}_{n,(p-1)/2} - 2\,\mathrm{Re}\Big[\omega_p^{-(l+1)/2}\,\omega_N^{n/2}\sum_{q=0}^{(p-3)/2}\omega_p^{-q(l+1)}\,\omega_N^{nq}\,y_{n,q}\Big] \tag{2.29} \]
for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine
equation for I sequences is given by equation (2.3) for the lower half-range
of n and $0 \le l \le (p-1)/2$. We also need the companion equation (2.27) for
the lower half-range of n and $0 \le l \le (p-3)/2$.
We now prove Theorem 2.11. First, assume that p is even. Consider the
combining of'ICS', ISCS, and ICSIS sequences. Since we will compute only
I
48
half of each sequence yUiq on the right hand side of equation (2.5), we need
the following companion equation:
^N/pn,l VN/pnfl "I" ( 1) VN/pn,p/2 "i"
p/21
2Re[ ulp9VNN/P~n)yN/pn,g\
9=1
Using ISCS symmetry yields:
(N/pn)/2
yN/pn,q ~ WN/p yN/pn,q
. n/2_
= +lV/p yn,q
I ! = + yn,q
Substituting this into the companion equation above yields:
(2.30)
I !*fNjpn,l ~ yn,0 + ( 1) 2/n,p/2
p/2^1
2B*[ Â£
9=1
i
= ~yn,0 + (l)^n,P/2 
I . P/21
2Re[ Â£
 ; *=1
Consider the combining of ISCSIS sequences. Since we will compute only
half of each sequence yn>q on the right hand side of equation (2.6), we need
the following companion equation:
: p/21
*iJ* '= mÂ£ J<4N^yN,r^,}
;, 9=
p/21
= 2Re[
V+Wu ~n/2
uf+1)UNnqT>
^71,9]
VN ]C
9=0
[ ; P/2"1
I 9=0
Consider the combining of I sequences. Since we will compute only half
of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the
following companion equation:
pi
Elq qCN/pn)
UPUN yN/pn,q
9=0
: = E^,+,WX.
9=0
Next, assume that $p$ is odd. Consider the combining of ICS and ICSIS
sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right
hand side of equation (2.7), we need the following companion equation:
(pl)/2
N/pntl ~~ VN/p7i,0 2Re[ ^ ^ UN/pntf]
Q=1
(pl)/2
= ~Vn,0 2Re[
! 9=1
(p1)/2
,.= IM2Re[ 53 Wp9(i+1)^yn,g]
g=l
Consider the combining of ISCS and ISCSIS sequences. Since we will
compute only half of each sequence $y_{n,q}$ on the right hand side of
equation (2.8), we need the following companion equation:
xN/pn,l = &V/pn,(pl)/2 +
(p3)/2
1 2M^lp")n E
I 9=0
Substituting equation (2.30) into the companion equation above yields:
xN/p~n,l ~ Vn,(p1)/2
(p3)/2
9=0
. = n,(p1)/2
(p3)/2
9=0
2Re[o;;(/+1)/2a;/2 Â£
The companion equation for I sequences is identical to the even p case.
This completes the proof of Theorem 2.11. The following corollary provides
an important special case of this result.

Corollary 2.8 Assume $p = 2$. The inverse combine equation for ICS and
ISCS sequences is:
\[ x_{n,0} = y_{n,0} + y_{n,1} \]
\[ x_{N/2-n,0} = -y_{n,0} + y_{n,1} \]
for the lower half-range of $n$. The inverse combine equation for ISCSIS
sequences is:
xnfl = 2Re[u^2ynfi]
^ jv/2ti,o = 2Im[u>jJ yn>o]
for the lower half-range of $n$. The inverse combine equation for I sequences
is:
1 7i,0 2/71,0 H" ^NVn,l
xN/2~n,0 = 2/n,0 + 2/n,l
for the lower half-range of $n$.
The next theorem provides all of the forward combine equations for the
RO symmetric FFT.
Theorem 2.12 Assume that $p$ is even. The forward combine equation for
I sequences is:
 2/71,9 = l/pv]fnq{xnt0 + (I)q+lxn,p/2 +
' ' P/21
I Â£ 1lqxn,i Jpqx_nti}} (2.31)
': z=i
for the lower half-range of $n$ and $0 \le q \le p - 1$. Note that $y_{0,q}$ is pure
imaginary because $x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both pure imaginary.
This ensures that the final output is pure imaginary because $n = 0$ in the
last stage of the algorithm. The forward combine equation for ICS, ISCS,
and ICSIS sequences is given by equation (2.31) for the lower half-range of
n and 0 < q
addition.
fln,p/2 = lM*n,0 + (~l)p/a+1*_1p/a +
p/21
Â£
(2.32)
i=i
for the lower half-range of $n$. Note that $y_{0,p/2} = 0$ because $x_{0,0} = x_0 = 0$ and
$x_{0,p/2} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is:
yn,q 5= l/pu>xn{q+1/2){xni0 + i(l)q+1 xn,p/2 +
p/21
;! Â£[u.p^+1/2)^^1/2)i_7l,]} (2.33)
for the lower half-range of $n$ and $0 \le q \le p/2 - 1$. Note that $y_{0,q}$ is pure
imaginary because $x_{0,0} = x_0 = 0$.
Next, assume that $p$ is odd. The forward combine equation for I sequences
is:
Vn,q = l/pw/5{zn,o
(?l)/2
+ E iuplqxn,l UPXn,l\}
1=1
(2.34)
for the lower half-range of $n$ and $0 \le q \le p - 1$. The forward combine
equation for ICS and ICSIS sequences is given by equation (2.34) for the
lower half-range of $n$ and $0 \le q \le (p-1)/2$ with the exception that all
sequences $x_{n,l}$ are real. The forward combine equation for ISCS and ISCSIS
sequences is:
(P1)/2
Vn,g = l/pa;iVn(9+1/2){inio + Â£ [u Z(9+1/2)in,i a;J9+1/2^_n,z]} (2.35)
l=i
for the lower halfrange ofn and 0
1 (pi)/2
yh,(pja)/2 = l/p{n,0 + (~ 1) (2.36)
Z=1
for the lower half-range of $n$. Note that $\hat{y}_{0,(p-1)/2} = 0$.
We now prove Theorem 2.12. First, assume that $p$ is even. The forward
combine equation for I sequences is obtained by developing a compact form
of equation (2.4) which eliminates all redundant data. For this purpose, we
will need the following result which is valid for all I sequences:
2n,pZ1 (pZl)iV/p+n
,  = *JV(Z+l)ZV/p+n
(Z+l)JV/p n
; i = n,Z+l
Using this result, we obtain:
pi
z=o
1: p/2i p/21
= E upl
: !, z=o 1=0
p/21 p/21
= \(puNnq{ E wp lQx,i  E wI('+l)*n,Z+l}
: Z=0 z=o
p/21 p/2
= E ***.  E^p^n.z}
' Z=0 Z=1
= Vi>"jv"9{n,0 + (l)9+1* n,p/2 +
p/21
E K /]} iZ=l
The forward combining of ICS, ISCS, and ICSIS sequences requires one
new equation: :,
 n/2
Vn,p/2 ~ uN/pyn,p/2
' = 1/P{*n,0 + (l)P/2+1Zn,p/2 +
: p/21
E (l)W *n,z]}
1=1
The forward combine equation for ISCSIS sequences is obtained by
substituting equation (2.9) into equation (2.31):
Vn,g .,= l/pUNnq{vn,0 + (l)q+1Xn,p/2 +
p/21
Y ["p  WP 'Sn.l3}
2=1
' = l/p^n(?+1/2){ini0 + *(1)fl+1a_niP/2 +
p/21
EK,(s+1/2)^^1/2)M}
2=1
Next, assume that $p$ is odd. The forward combine equation for I
sequences is obtained by developing a compact form of equation (2.4) which
eliminates all redundant data.
Vn,q = 1/PuNnq^2^plqxn,i
: i=o
=, 1/PUNlq{Upq^~1)/2xnl{pl)!2 +
(p3)/2 (p3)/2
Y UplqXn,l+ Y u;qLP'l~l]Xn,pl1}
2=0 1=0
= VPy>P'5(p'1)/2a:n,(pi)/2 +
: (P~3)/2 (p3)/2
! ' Y Y u>f+1h_n
1=0 l=o
' l/P^n9{^p9(P_1)/2*n,(pl)/2 +
i i (p3)/2 (pl)/2
Y V;lqXn,l~ Y ^n,i}
1 Z=0 2=1
(pl)/2
= l/pUNnq{xn,0+ Yj ~ uj*n,l]}
The forward combining of ICS and ICSIS sequences does not require
any new equations. The forward combine equation for ISCS and ISCSIS
sequences is obtained by substituting equation (2.9) into equation (2.34):
, (Pl)/2
Vn, q = l/pw/5W,0+ Yj iup kxnl ~ UpXn,l}}
i1 2=1
(pl)/2
= 1 */l3^n(9+l/2){*n>0 + [>p*(9+1/2)Xn,! Â£4(9+1/2)*n,i]}
i=l
For $q = (p-1)/2$ this reduces to:
&i>(p1)/2 = WJV/p^,(p1)/2
I ; (Pl)/2
= lM*n,0 + X) ~
: z=i
This completes the proof of Theorem 2.12. The following corollary provides
an important special case of this result.
Corollary 2.9 Assume $p = 2$. The forward combine equation for I
sequences is:
' Vnfl (*7i,0 n,l)/2
 Vn, 1 = W;T(a:nio + _n,l)/2
for the lower half-range of $n$. The forward combine equation for ICS and
ISCS sequences is:
\[ y_{n,0} = (x_{n,0} - x_{-n,1})/2 \]
\[ \hat{y}_{n,1} = (x_{n,0} + x_{-n,1})/2 \]
for the lower half-range of $n$. The forward combine equation for ISCSIS
sequences is:
Vv.fi = ^"/2(in0 **_n,i)/2
for the lower half-range of $n$.
2.5 Real Composite Even-Even (REE)

In this section, we will be concerned with the following symmetries:

Definition 2.9 A real composite even-even (REE) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = x_n \]
\[ x_{N/2-n} = x_n \]
Note that an REE sequence of length $N$ is also an RE sequence of length $N$.
A real conjugate symmetric zero odd term (RCSZO) sequence $X_k$ of length
$N$, where $N$ is even, is defined by:
\[ \bar{X}_k = X_k \]
\[ X_{N-k} = \bar{X}_k \]
\[ X_k = (-1)^k X_k \]
The following lemma establishes the relationship between these
symmetries.
Lemma 2.9 If $x_n$ is an REE sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an RCSZO sequence of length $N$. If $X_k$ is an RCSZO sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an REE sequence of
length $N$.
We now prove Lemma 2.9. We will only prove the first assertion. Assume
$x_n$ is an REE sequence of length $N$, where $N$ is even. Since $x_n$ is also an
RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS
sequence of length $N$. Thus, we have only to prove the third property in
the definition of an RCSZO sequence. For this, we use the representation of
$X_k$ provided by Theorem 2.7 and the REE symmetry of $x_n$ as follows:
\begin{align*}
X_k &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Big] \\
    &= \frac{1}{N}\Big[x_0 + (-1)^k x_0 + (-1)^k \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{(-1)^k}{N}\Big[(-1)^k x_0 + x_0 + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{(-1)^k}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= (-1)^k X_k
\end{align*}
This completes the proof of Lemma 2.9.
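As a numerical illustration of Lemma 2.9, the following sketch builds an REE sequence from its first quarter and checks the three RCSZO properties of its DFT. The helper names `dft`, `make_ree`, and `is_rcszo` are hypothetical, and the scaled convention $X_k = (1/N)\sum_n x_n e^{2\pi i nk/N}$ is an assumption inferred from the $1/N$ factors in the text; for real even sequences the sign of the exponent does not affect the result.

```python
import cmath

def dft(x):
    """Scaled DFT X_k = (1/N) sum_n x_n exp(2*pi*i*n*k/N) (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

def make_ree(first_quarter, N):
    """Extend x_0..x_{floor(N/4)} to a full REE sequence of even length N."""
    assert N % 2 == 0 and len(first_quarter) == N // 4 + 1
    x = [0.0] * N
    for n, v in enumerate(first_quarter):
        x[n] = v
        x[N // 2 - n] = v        # x_{N/2-n} = x_n
    for n in range(1, N // 2):
        x[N - n] = x[n]          # x_{N-n} = x_n
    return x

def is_rcszo(X, tol=1e-12):
    """Check: X_k real, X_{N-k} = conj(X_k), and all odd-indexed terms zero."""
    N = len(X)
    real = all(abs(Xk.imag) < tol for Xk in X)
    conj_sym = all(abs(X[(N - k) % N] - X[k].conjugate()) < tol for k in range(N))
    zero_odd = all(abs(X[k]) < tol for k in range(1, N, 2))
    return real and conj_sym and zero_odd

x = make_ree([1.0, 2.0, -0.5, 3.0], 14)   # N = 2(2M+1) with M = 3
print(is_rcszo(dft(x)))
```

The direct $O(N^2)$ sum is used only for checking; it is not the fast algorithm developed in this chapter.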
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for NNS boundary
conditions. Note that if $N = 2(2M+1)$, then an REE sequence satisfies NNS
boundary conditions for the computational domain $0 \le n \le M$. That is:
\[ x_{N-1} = x_1 \]
\[ x_M = x_{M+1} \]
Theorem 2.13 Let $x_n$ be an REE sequence and let $X_k$ be its RCSZO
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$.
The real form of the DFT is:
\[ X_{2k} = \frac{2}{N}\Big[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Big] \]
for $0 \le k \le M$. The real form of the IDFT is:
\[ x_n = X_0 + \sum_{k=1}^{M} 2X_{2k} \cos(4\pi kn/N) \]
for $0 \le n \le M$. Note that the results for the DFT and IDFT are identical
except for scaling.
We now prove Theorem 2.13. The result for the DFT follows from
Theorem 2.7, the RCSZO symmetry of $X_k$, and the REE symmetry of $x_n$ as
follows:
\begin{align*}
X_{2k} &= \frac{1}{N}\Big[x_0 + x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(4\pi kn/N)\Big] \\
       &= \frac{1}{N}\Big\{x_0 + x_{N/2} + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)
          + \sum_{n=1}^{M} 2x_{N/2-n} \cos[4\pi k(N/2-n)/N]\Big\} \\
       &= \frac{2}{N}\Big[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Big]
\end{align*}
The result for the IDFT follows immediately from Theorem 2.7 and the
RCSZO symmetry of $X_k$. Note that only one fourth of the REE sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.13.
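The real forms in Theorem 2.13 can be evaluated directly as cosine sums over the computational domain. The sketch below does this with plain $O(M^2)$ loops, purely to illustrate the formulas; the function names `ree_dft` and `ree_idft` are hypothetical, and the fast mixed radix algorithm described next is not implemented here.

```python
import math

def ree_dft(x_dom):
    """Given x_0..x_M (N = 2(2M+1)), return [X_0, X_2, ..., X_{2M}]."""
    M = len(x_dom) - 1
    N = 2 * (2 * M + 1)
    return [(2.0 / N) * (x_dom[0] + sum(2.0 * x_dom[n] * math.cos(4 * math.pi * k * n / N)
                                         for n in range(1, M + 1)))
            for k in range(M + 1)]

def ree_idft(X_even):
    """Given [X_0, X_2, ..., X_{2M}], return x_0..x_M."""
    M = len(X_even) - 1
    N = 2 * (2 * M + 1)
    return [X_even[0] + sum(2.0 * X_even[k] * math.cos(4 * math.pi * k * n / N)
                            for k in range(1, M + 1))
            for n in range(M + 1)]

x = [1.0, 2.0, -0.5, 3.0]          # x_0..x_M with M = 3, so N = 14
roundtrip = ree_idft(ree_dft(x))
print(max(abs(a - b) for a, b in zip(x, roundtrip)))   # should be at rounding level
```

Only the $M + 1$ values on the computational domain are stored, in line with the observation that one fourth of the REE sequence determines the rest.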
A fast, mixed radix algorithm for computing the REE symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RE symmetric FFT. Note that an REE sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an RCSZO sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction by a factor of
eight in both storage requirements and number of operations compared to
that for C sequences. This algorithm is based on the symmetries which occur
in the splittings of the RCSZO sequence $X_k$. We begin developing this
algorithm by defining one new intermediate symmetry involved.
Definition 2.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:
\[ X_k = 0 \]
for $0 \le k \le N - 1$.
The following lemma establishes the relationship between the symmetries
which occur in the splittings of the RCSZO sequence $X_k$. We omit the proof
of this result because it is trivial.
Lemma 2.10 Let $X_k$ be an RCSZO sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is RCS symmetric, and subsequence $X_{k,1}$ is Z
symmetric. The symmetries which occur in the splittings of the RCS sequence
$X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences
have R symmetry as well.
A mixed radix splitting tree diagram for an RCSZO sequence is shown
in Figure 2.5. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find R sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4 and 2.6, with
the addition provided by the following lemma. We omit the proof of this
result because it is trivial.
Lemma 2.11 Let $X_k$ be a Z sequence of length $N$. Its IDFT $x_n$ is also a
Z sequence of length $N$.
These results show that each symmetry appearing in Figure 2.5 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the REE symmetric IFFT, obtained as a special case
of that for the RE symmetric IFFT.
Corollary 2.10 Assume $p = 2$. The inverse combine equation for RCS and
Z sequences is:
\[ x_{n,0} = y_{n,0} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.
We now prove Corollary 2.10. The inverse combine equation for RCS
and Z sequences may be regarded as a special case of that for RCS and
RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the
Z symmetry of $y_{n,1}$. Note that the companion equation is not needed
because only one fourth of the REE sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.10.
The next corollary provides all of the forward combine equations for
the REE symmetric FFT, obtained as a special case of that for the RE
symmetric FFT.
Corollary 2.11 Assume $p = 2$. The forward combine equation for RCS
and Z sequences is:
\[ y_{n,0} = x_{n,0} \]
\[ \hat{y}_{n,1} = 0 \]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.

Figure 2.5: Splitting tree for REE symmetric FFT

We now prove Corollary 2.11. The forward combine equation for RCS
and Z sequences may be regarded as a special case of that for RCS and
RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the
REE symmetry of $x_n$ as follows:
\begin{align*}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0} \\
\hat{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0
\end{align*}
This completes the proof of Corollary 2.11.
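The arithmetic in this proof can be checked on a concrete REE sequence. The sketch below applies the two $p = 2$ combine formulas with $x_{-n,1} = x_{N/2-n}$, as used in the proof; the function name `forward_combine_re_p2` is hypothetical, and taking the lower half-range to be $0 \le n \le \lfloor N/4 \rfloor$ is an assumption made for illustration.

```python
def forward_combine_re_p2(x):
    """Return (y0, y1) with y0[n] = (x_n + x_{N/2-n})/2 and y1[n] = (x_n - x_{N/2-n})/2."""
    N = len(x)
    ns = range(N // 4 + 1)           # assumed lower half-range of n
    y0 = [(x[n] + x[N // 2 - n]) / 2 for n in ns]
    y1 = [(x[n] - x[N // 2 - n]) / 2 for n in ns]
    return y0, y1

# REE sequence of length N = 14 generated by x_0..x_3 = 1, 2, -0.5, 3
x = [1.0, 2.0, -0.5, 3.0, 3.0, -0.5, 2.0, 1.0, 2.0, -0.5, 3.0, 3.0, -0.5, 2.0]
y0, y1 = forward_combine_re_p2(x)
print(y0)   # reproduces x_0..x_3
print(y1)   # all zeros for an REE sequence
```

For an REE input the second output carries no information, which is why that branch of the splitting tree needs no storage or arithmetic.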
2.6 Real Composite Even-Odd (REO)

In this section, we will be concerned with the following symmetries:

Definition 2.11 A real composite even-odd (REO) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = x_n \]
\[ x_{N/2-n} = -x_n \]
Note that an REO sequence of length $N$ is also an RE sequence of length
$N$. A real conjugate symmetric zero even term (RCSZE) sequence $X_k$ of
length $N$, where $N$ is even, is defined by:
\[ \bar{X}_k = X_k \]
\[ X_{N-k} = \bar{X}_k \]
\[ X_k = (-1)^{k+1} X_k \]
The following lemma establishes the relationship between these
symmetries.
Lemma 2.12 If $x_n$ is an REO sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an RCSZE sequence of length $N$. If $X_k$ is an RCSZE sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an REO sequence of
length $N$.
We now prove Lemma 2.12. We will only prove the first assertion.
Assume $x_n$ is an REO sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an
RCS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an RCSZE sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.7 and the REO symmetry of $x_n$ as follows:
\begin{align*}
X_k &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Big] \\
    &= \frac{1}{N}\Big[x_0 + (-1)^{k+1} x_0 + (-1)^{k+1} \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{(-1)^{k+1}}{N}\Big[(-1)^{k+1} x_0 + x_0 + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= \frac{(-1)^{k+1}}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
    &= (-1)^{k+1} X_k
\end{align*}
This completes the proof of Lemma 2.12.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for ND or NDS
boundary conditions, depending on the length of the sequence $N$. Note that if
$N = 4M$, then an REO sequence satisfies ND boundary conditions for the
computational domain $0 \le n \le N/4 - 1$. That is:
\[ x_{N-1} = x_1 \]
\[ x_{N/4} = 0 \]
Similarly, if $N = 2(2M+1)$, then an REO sequence satisfies NDS boundary
conditions for the computational domain $0 \le n \le M$. That is:
\[ x_{N-1} = x_1 \]
\[ x_M = -x_{M+1} \]
Theorem 2.14 Let $x_n$ be an REO sequence and let $X_k$ be its RCSZE
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The
real form of the DFT is:
\[ X_{2k+1} = \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\} \]
for $0 \le k \le N/4 - 1$. The real form of the IDFT is:
\[ x_n = \sum_{k=0}^{N/4-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N] \]
for $0 \le n \le N/4 - 1$. Next, assume that $N = 2(2M+1)$. The real form of
the DFT is:
\[ X_{2k+1} = \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{M} 2x_n \cos[2\pi n(2k+1)/N]\Big\} \]
for $0 \le k \le M - 1$. The real form of the IDFT is:
\[ x_n = (-1)^n X_{N/2} + \sum_{k=0}^{M-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N] \]
for $0 \le n \le M$.
We now prove Theorem 2.14. We prove the result for the DFT for the
case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This
result follows from Theorem 2.7, the RCSZE symmetry of $X_k$, and the REO
symmetry of $x_n$ as follows:
\begin{align*}
X_{2k+1} &= \frac{1}{N}\Big\{x_0 - x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\} \\
         &= \frac{1}{N}\Big\{x_0 - x_{N/2} + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]
            + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \cos[2\pi (N/2-n)(2k+1)/N]\Big\} \\
         &= \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\}
\end{align*}
where the $n = N/4$ term has been dropped because $x_{N/4} = 0$.
The results for the IDFT follow immediately from Theorem 2.7 and the
RCSZE symmetry of $X_k$. Note that only one fourth of the REO sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.14.
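For the case $N = 4M$, the real forms in Theorem 2.14 amount to a pair of cosine sums over odd frequencies. The sketch below evaluates them directly and confirms that they invert one another; the names `reo_dft` and `reo_idft` are hypothetical, and the $O(M^2)$ loops are for illustration only rather than the fast algorithm.

```python
import math

def reo_dft(x_dom):
    """Given x_0..x_{N/4-1} with N = 4M, return [X_1, X_3, ..., X_{N/2-1}]."""
    M = len(x_dom)
    N = 4 * M
    return [(2.0 / N) * (x_dom[0] + sum(2.0 * x_dom[n] * math.cos(2 * math.pi * n * (2 * k + 1) / N)
                                         for n in range(1, M)))
            for k in range(M)]

def reo_idft(X_odd):
    """Given [X_1, X_3, ..., X_{N/2-1}], return x_0..x_{N/4-1}."""
    M = len(X_odd)
    N = 4 * M
    return [sum(2.0 * X_odd[k] * math.cos(2 * math.pi * n * (2 * k + 1) / N)
                for k in range(M))
            for n in range(M)]

x = [1.0, 2.0, -0.5]               # x_0..x_2, so M = 3 and N = 12
roundtrip = reo_idft(reo_dft(x))
print(max(abs(a - b) for a, b in zip(x, roundtrip)))   # should be at rounding level
```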
A fast, mixed radix algorithm for computing the REO symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RE symmetric FFT. Note that an REO sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an RCSZE sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction by a factor of
eight in both storage requirements and number of operations compared to
that for C sequences. This algorithm is based on the symmetries which occur
in the splittings of the RCSZE sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.13 Let $X_k$ be an RCSZE sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSCS
symmetric. The symmetries which occur in the splittings of the RSCS sequence
$X_{k,1}$ are identical to those in Lemma 2.3, with the addition that all sequences
have R symmetry as well.
A mixed radix splitting tree diagram for an RCSZE sequence is shown
in Figure 2.6. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find R sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.6, and 2.11.
These results show that each symmetry appearing in Figure 2.6 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the REO symmetric IFFT, obtained as a special case
of that for the RE symmetric IFFT.
Corollary 2.12 Assume $p = 2$. The inverse combine equation for Z and
RSCS sequences is:
\[ x_{n,0} = y_{n,1} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.
We now prove Corollary 2.12. The inverse combine equation for Z and
RSCS sequences may be regarded as a special case of that for RCS and
RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the Z
symmetry of $y_{n,0}$. Note that the companion equation is not needed because
only one fourth of the REO sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.12.

Figure 2.6: Splitting tree for REO symmetric FFT

The next corollary provides all of the forward combine equations for
the REO symmetric FFT, obtained as a special case of that for the RE
symmetric FFT.
Corollary 2.13 Assume $p = 2$. The forward combine equation for Z and
RSCS sequences is:
\[ y_{n,0} = 0 \]
\[ \hat{y}_{n,1} = x_{n,0} \]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.
We now prove Corollary 2.13. The forward combine equation for Z and
RSCS sequences may be regarded as a special case of that for RCS and
RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the
REO symmetry of $x_n$ as follows:
\begin{align*}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0 \\
\hat{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0}
\end{align*}
This completes the proof of Corollary 2.13.
2.7 Real Composite Odd-Even (ROE)

In this section, we will be concerned with the following symmetries:

Definition 2.12 A real composite odd-even (ROE) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = -x_n \]
\[ x_{N/2-n} = x_n \]
Note that an ROE sequence of length $N$ is also an RO sequence of length
$N$. An imaginary conjugate symmetric zero even term (ICSZE) sequence
$X_k$ of length $N$, where $N$ is even, is defined by:
\[ \bar{X}_k = -X_k \]
\[ X_{N-k} = \bar{X}_k \]
\[ X_k = (-1)^{k+1} X_k \]
The following lemma establishes the relationship between these
symmetries.
Lemma 2.14 If $x_n$ is an ROE sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an ICSZE sequence of length $N$. If $X_k$ is an ICSZE sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an ROE sequence of
length $N$.
We now prove Lemma 2.14. We will only prove the first assertion.
Assume $x_n$ is an ROE sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an
ICS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an ICSZE sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.10 and the ROE symmetry of $x_n$ as follows:
\begin{align*}
X_k &= \frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
    &= \frac{i}{N} \sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N] \\
    &= (-1)^{k+1}\Big[\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Big] \\
    &= (-1)^{k+1} X_k
\end{align*}
This completes the proof of Lemma 2.14.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for DN or DNS
boundary conditions, depending on the length of the sequence $N$. Note that if
$N = 4M$, then an ROE sequence satisfies DN boundary conditions for the
computational domain $1 \le n \le N/4$. That is:
\[ x_0 = 0 \]
\[ x_{N/4-1} = x_{N/4+1} \]
Similarly, if $N = 2(2M+1)$, then an ROE sequence satisfies DNS boundary
conditions for the computational domain $1 \le n \le M$. That is:
\[ x_0 = 0 \]
\[ x_M = x_{M+1} \]
Theorem 2.15 Let $x_n$ be an ROE sequence and let $X_k$ be its ICSZE
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The
real form of the DFT is:
\[ \mathrm{Im}(X_{2k-1}) = \frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\} \]
for $1 \le k \le N/4$. The real form of the IDFT is:
\[ x_n = \sum_{k=1}^{N/4} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N] \]
for $1 \le n \le N/4$. Next, assume that $N = 2(2M+1)$. The real form of the
DFT is:
\[ \mathrm{Im}(X_{2k-1}) = \frac{2}{N} \sum_{n=1}^{M} 2x_n \sin[2\pi n(2k-1)/N] \]
for $1 \le k \le M$. The real form of the IDFT is:
\[ x_n = \sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N] \]
for $1 \le n \le M$.
We now prove Theorem 2.15. We prove the result for the DFT for the
case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This
result follows from Theorem 2.10, the ICSZE symmetry of $X_k$, and the ROE
symmetry of $x_n$ as follows:
\begin{align*}
\mathrm{Im}(X_{2k-1}) &= \frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin[2\pi n(2k-1)/N] \\
  &= \frac{1}{N}\Big\{(-1)^{k+1} 2x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]
     + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \sin[2\pi (N/2-n)(2k-1)/N]\Big\} \\
  &= \frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\}
\end{align*}
The results for the IDFT follow immediately from Theorem 2.10 and the
ICSZE symmetry of $X_k$. Note that only one fourth of the ROE sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.15.
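For $N = 4M$, the real forms in Theorem 2.15 are sine sums over odd frequencies, with a separate endpoint term for $x_{N/4}$. The following sketch evaluates both directions and checks that they are mutually inverse. The names `roe_dft` and `roe_idft` are hypothetical; the positive sign of $\mathrm{Im}(X_k)$ follows the representation quoted from Theorem 2.10, which corresponds to a forward kernel $e^{+2\pi i nk/N}$ (an inference from the text, not something stated in this section).

```python
import math

def roe_dft(x_dom):
    """Given x_1..x_{N/4} with N = 4M, return [Im X_1, Im X_3, ..., Im X_{N/2-1}]."""
    M = len(x_dom)
    N = 4 * M
    x = [0.0] + list(x_dom)        # x[0] = 0 is the Dirichlet endpoint
    return [(2.0 / N) * ((-1) ** (k + 1) * x[M]
                         + sum(2.0 * x[n] * math.sin(2 * math.pi * n * (2 * k - 1) / N)
                               for n in range(1, M)))
            for k in range(1, M + 1)]

def roe_idft(im_X_odd):
    """Given [Im X_1, Im X_3, ..., Im X_{N/2-1}], return x_1..x_{N/4}."""
    M = len(im_X_odd)
    N = 4 * M
    return [sum(2.0 * im_X_odd[k - 1] * math.sin(2 * math.pi * n * (2 * k - 1) / N)
                for k in range(1, M + 1))
            for n in range(1, M + 1)]

x = [1.0, 2.0, -0.5]               # x_1..x_3, so M = 3 and N = 12
roundtrip = roe_idft(roe_dft(x))
print(max(abs(a - b) for a, b in zip(x, roundtrip)))   # should be at rounding level
```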
A fast, mixed radix algorithm for computing the ROE symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RO symmetric FFT. Note that an ROE sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an ICSZE sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction by a factor of
eight in both storage requirements and number of operations compared to
that for C sequences. This algorithm is based on the symmetries which occur
in the splittings of the ICSZE sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.15 Let $X_k$ be an ICSZE sequence of length $N$ with factor 2. Then
subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is ISCS symmetric.
The symmetries which occur in the splittings of the ISCS sequence $X_{k,1}$ are
identical to those in Lemma 2.3, with the addition that all sequences have I
symmetry as well.
A mixed radix splitting tree diagram for an ICSZE sequence is shown
in Figure 2.7. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find I sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11.
These results show that each symmetry appearing in Figure 2.7 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the ROE symmetric IFFT, obtained as a special case
of that for the RO symmetric IFFT.
Corollary 2.14 Assume $p = 2$. The inverse combine equation for Z and
ISCS sequences is:
\[ x_{n,0} = y_{n,1} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.11 for arbitrary factors $p$.
We now prove Corollary 2.14. The inverse combine equation for Z and
ISCS sequences may be regarded as a special case of that for ICS and ISCS
sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z
symmetry of $y_{n,0}$. Note that the companion equation is not needed because only
one fourth of the ROE sequence $x_n$ needs to be computed. This completes
the proof of Corollary 2.14.
The next corollary provides all of the forward combine equations for
the ROE symmetric FFT, obtained as a special case of that for the RO
symmetric FFT.
Corollary 2.15 Assume $p = 2$. The forward combine equation for Z and
ISCS sequences is:
\[ y_{n,0} = 0 \]
\[ \hat{y}_{n,1} = x_{n,0} \]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.

Figure 2.7: Splitting tree for ROE symmetric FFT

We now prove Corollary 2.15. The forward combine equation for Z and
ISCS sequences may be regarded as a special case of that for ICS and ISCS
sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the ROE
symmetry of $x_n$ as follows:
\begin{align*}
y_{n,0} &= (x_{n,0} - x_{-n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0 \\
\hat{y}_{n,1} &= (x_{n,0} + x_{-n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0}
\end{align*}
This completes the proof of Corollary 2.15.
2.8 Real Composite Odd-Odd (ROO)

In this section, we will be concerned with the following symmetries:

Definition 2.13 A real composite odd-odd (ROO) sequence $x_n$ of length
$N$, where $N$ is even, is defined by:
\[ \bar{x}_n = x_n \]
\[ x_{N-n} = -x_n \]
\[ x_{N/2-n} = -x_n \]
Note that an ROO sequence of length $N$ is also an RO sequence of length
$N$. An imaginary conjugate symmetric zero odd term (ICSZO) sequence $X_k$
of length $N$, where $N$ is even, is defined by:
\[ \bar{X}_k = -X_k \]
\[ X_{N-k} = \bar{X}_k \]
\[ X_k = (-1)^k X_k \]
The following lemma establishes the relationship between these
symmetries.
Lemma 2.16 If $x_n$ is an ROO sequence of length $N$, where $N$ is even, then
its DFT $X_k$ is an ICSZO sequence of length $N$. If $X_k$ is an ICSZO sequence
of length $N$, where $N$ is even, then its IDFT $x_n$ is an ROO sequence of
length $N$.
We now prove Lemma 2.16. We will only prove the first assertion.
Assume $x_n$ is an ROO sequence of length $N$, where $N$ is even. Since $x_n$ is
also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an
ICS sequence of length $N$. Thus, we have only to prove the third property
in the definition of an ICSZO sequence. For this, we use the representation
of $X_k$ provided by Theorem 2.10 and the ROO symmetry of $x_n$ as follows:
\begin{align*}
X_k &= \frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
    &= \frac{i}{N} \sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N] \\
    &= (-1)^k\Big[\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Big] \\
    &= (-1)^k X_k
\end{align*}
This completes the proof of Lemma 2.16.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the IDFT is the eigenvector
expansion required by the Fourier analysis method for DDS boundary
conditions. Note that if $N = 2(2M+1)$, then an ROO sequence satisfies DDS
boundary conditions for the computational domain $1 \le n \le M$. That is:
\[ x_0 = 0 \]
\[ x_M = -x_{M+1} \]
Theorem 2.16 Let $x_n$ be an ROO sequence and let $X_k$ be its ICSZO
symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$.
The real form of the DFT is:
\[ \mathrm{Im}(X_{2k}) = \frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N) \]
for $1 \le k \le M$. The real form of the IDFT is:
\[ x_n = \sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin(4\pi kn/N) \]
for $1 \le n \le M$. Note that the results for the DFT and IDFT are identical
except for scaling.
We now prove Theorem 2.16. The result for the DFT follows from
Theorem 2.10, the ICSZO symmetry of $X_k$, and the ROO symmetry of $x_n$ as
follows:
\begin{align*}
\mathrm{Im}(X_{2k}) &= \frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(4\pi kn/N) \\
  &= \frac{1}{N}\Big\{\sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)
     + \sum_{n=1}^{M} 2x_{N/2-n} \sin[4\pi k(N/2-n)/N]\Big\} \\
  &= \frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)
\end{align*}
The result for the IDFT follows immediately from Theorem 2.10 and the
ICSZO symmetry of $X_k$. Note that only one fourth of the ROO sequence
$x_n$ needs to be specified. This completes the proof of Theorem 2.16.
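The real forms in Theorem 2.16 are a pair of sine sums that differ only in scaling. A minimal sketch, using direct $O(M^2)$ sums and the hypothetical names `roo_dft` and `roo_idft`, illustrates this and checks the round trip; it assumes the same sign convention for $\mathrm{Im}(X_k)$ as the representation quoted from Theorem 2.10.

```python
import math

def roo_dft(x_dom):
    """Given x_1..x_M with N = 2(2M+1), return [Im X_2, Im X_4, ..., Im X_{2M}]."""
    M = len(x_dom)
    N = 2 * (2 * M + 1)
    return [(2.0 / N) * sum(2.0 * x_dom[n - 1] * math.sin(4 * math.pi * k * n / N)
                            for n in range(1, M + 1))
            for k in range(1, M + 1)]

def roo_idft(im_X_even):
    """Given [Im X_2, Im X_4, ..., Im X_{2M}], return x_1..x_M."""
    M = len(im_X_even)
    N = 2 * (2 * M + 1)
    return [sum(2.0 * im_X_even[k - 1] * math.sin(4 * math.pi * k * n / N)
                for k in range(1, M + 1))
            for n in range(1, M + 1)]

x = [1.0, 2.0, -0.5]               # x_1..x_3, so M = 3 and N = 14
roundtrip = roo_idft(roo_dft(x))
print(max(abs(a - b) for a, b in zip(x, roundtrip)))   # should be at rounding level
```

Because the two sums share the same kernel $\sin(4\pi kn/N)$, a single routine with a scale argument would serve for both directions.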
A fast, mixed radix algorithm for computing the ROO symmetric DFT
and its inverse, given $x_n$ in natural order, may be obtained as a special case
of that for the RO symmetric FFT. Note that an ROO sequence of length
$N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage
locations for a C sequence of length $N$. Similarly, an ICSZO sequence of
length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit
these symmetries in the data in order to obtain a reduction by a factor of
eight in both storage requirements and number of operations compared to
that for C sequences. This algorithm is based on the symmetries which occur
in the splittings of the ICSZO sequence $X_k$. This does not introduce any new
intermediate symmetries. The following lemma establishes the relationship
between the symmetries which occur in the splittings of $X_k$. We omit the
proof of this result because it is trivial.
Lemma 2.17 Let $X_k$ be an ICSZO sequence of length $N$ with factor 2.
Then subsequence $X_{k,0}$ is ICS symmetric, and subsequence $X_{k,1}$ is Z
symmetric. The symmetries which occur in the splittings of the ICS sequence
$X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences
have I symmetry as well.
A mixed radix splitting tree diagram for an ICSZO sequence is shown
in Figure 2.8. The acronyms representing the symmetries are summarized
in Table 2.2 for ease of reference. Note that a branch of the splitting tree
corresponding to a dual sequence terminates because it is redundant. Note
also that at the deepest level of the splitting tree we find I sequences rather
than C sequences.
The intermediate symmetries in the IDFT induced by the intermediate
symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11.
These results show that each symmetry appearing in Figure 2.8 induces
a symmetry in the IDFT. These induced symmetries are summarized in
Table 2.2 for ease of reference. The next corollary provides all of the inverse
combine equations for the ROO symmetric IFFT, obtained as a special case
of that for the RO symmetric IFFT.

Figure 2.8: Splitting tree for ROO symmetric FFT

Corollary 2.16 Assume $p = 2$. The inverse combine equation for ICS and
Z sequences is:
\[ x_{n,0} = y_{n,0} \]
for the lower half-range of $n$. The inverse combine equations for the
remaining symmetries are provided by Theorem 2.11 for arbitrary factors $p$.
We now prove Corollary 2.16. The inverse combine equation for ICS
and Z sequences may be regarded as a special case of that for ICS and
ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z
symmetry of $y_{n,1}$. Note that the companion equation is not needed because
only one fourth of the ROO sequence $x_n$ needs to be computed. This
completes the proof of Corollary 2.16.
The next corollary provides all of the forward combine equations for
the ROO symmetric FFT, obtained as a special case of that for the RO
symmetric FFT.
Corollary 2.17 Assume $p = 2$. The forward combine equation for ICS and
Z sequences is:
\[ y_{n,0} = x_{n,0} \]
\[ \hat{y}_{n,1} = 0 \]
for the lower half-range of $n$. The forward combine equations for the
remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.
We now prove Corollary 2.17. The forward combine equation for ICS
and Z sequences may be regarded as a special case of that for ICS and ISCS
sequences, where p = 2. Thus, we apply Corollary 2.9 and use the ROO
symmetry of $x_n$ as follows:
$$\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{n,1})/2 \\
        &= (x_n - x_{N/2-n})/2 \\
        &= (x_n + x_n)/2 \\
        &= x_{n,0}
\end{aligned}$$
$$\begin{aligned}
y_{n,1} &= (x_{n,0} + x_{n,1})/2 \\
        &= (x_n + x_{N/2-n})/2 \\
        &= (x_n - x_n)/2 \\
        &= 0
\end{aligned}$$
This completes the proof of Corollary 2.17.
2.9 Real Staggered Even (RSE)
In this section, we will be concerned with the following symmetries:

Definition 2.14 A real staggered even (RSE) sequence $x_n$ of length N is
defined by:
$$\bar{x}_n = x_n$$
$$x_{N-n-1} = x_n$$
An ω-even (ωE) sequence $X_k$ of length N is defined by:
$$X_{N-k} = \bar{X}_k \tag{2.37}$$
$$X_k = \omega_N^{k}\,\bar{X}_k \tag{2.38}$$
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.18 If $x_n$ is an RSE sequence of length N, then its DFT $X_k$ is
an ωE sequence of length N. If $X_k$ is an ωE sequence of length N, then its
IDFT $x_n$ is an RSE sequence of length N.
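
The forward direction of Lemma 2.18 can be checked numerically. The sketch below is illustrative only and is not part of the thesis software: it assumes the convention $\omega_N = e^{2\pi i/N}$ with the forward DFT $X_k = (1/N)\sum_n x_n \omega_N^{-kn}$, and the helper names `dft` and `make_rse` are hypothetical.

```python
import cmath
import random

def dft(x):
    """Forward DFT with the 1/N factor: X_k = (1/N) sum_n x_n w^(-k n),
    where w = exp(2*pi*i/N) (assumed sign convention)."""
    N = len(x)
    w = cmath.exp(2j * cmath.pi / N)
    return [sum(x[n] * w ** (-k * n) for n in range(N)) / N for k in range(N)]

def make_rse(N, seed=0):
    """Random real sequence with x[N-n-1] == x[n] (hypothetical helper, N even)."""
    rng = random.Random(seed)
    half = [rng.uniform(-1.0, 1.0) for _ in range(N // 2)]
    return half + half[::-1]

N = 12
x = make_rse(N)
X = dft(x)
w = cmath.exp(2j * cmath.pi / N)

# (2.37): X_{N-k} = conj(X_k), with indices taken modulo N
err_237 = max(abs(X[(N - k) % N] - X[k].conjugate()) for k in range(N))
# (2.38): X_k = w^k * conj(X_k)
err_238 = max(abs(X[k] - w ** k * X[k].conjugate()) for k in range(N))
```

Both residuals should be at the level of rounding error. The direct $O(N^2)$ sum is used only to keep the check short; it plays no role in the fast algorithms developed later.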
The next lemma will be needed to obtain the real form of the DFT and
IDFT.
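Lemma 2.19 Let $X_k$ be an ωE sequence of length N, and let $\tilde{X}_k$ denote
the magnitude of $X_k$. Then:
$$X_k = \omega_N^{k/2}\,\tilde{X}_k \tag{2.39}$$
$$\tilde{X}_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \tag{2.40}$$
$$x_n = \sum_{k=0}^{N-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} \tag{2.41}$$
$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.42}$$
$$\tilde{X}_{N-k} = -\tilde{X}_k \tag{2.43}$$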
We now prove Lemma 2.19. We express $X_k$ in polar form as follows:
$$X_k = \tilde{X}_k e^{i\theta}$$
Substituting this into equation (2.38) and solving for $\theta$ leads to equation
(2.39). Combining equations (2.1) and (2.39) leads to equation (2.40), while
combining equations (2.2) and (2.39) leads to equation (2.41). Equation
(2.42) is obtained from equation (2.40) as follows:
$$\begin{aligned}
\tilde{X}_{N+k} &= \frac{1}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-(N+k)(n+1/2)} \\
                &= -\frac{1}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \\
                &= -\tilde{X}_k
\end{aligned}$$
Equation (2.43) is obtained by combining equations (2.37) and (2.39) as
follows:
$$\begin{aligned}
\tilde{X}_{N-k} &= \omega_N^{-(N-k)/2}\,X_{N-k} \\
                &= -\omega_N^{k/2}\,\bar{X}_k \\
                &= -\tilde{X}_k
\end{aligned}$$
This completes the proof of Lemma 2.19.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the DFT is the eigenvector
expansion required by the Fourier analysis method for ND boundary
conditions. Note that if N is even, then an ωE sequence (represented by
$\tilde{X}_k$) satisfies ND boundary conditions for the computational domain
$0 \le k \le N/2-1$. That is:
$$\tilde{X}_{N/2} = 0$$
Observe that the result for the IDFT is the eigenvector expansion required
by the Fourier analysis method for NSNS boundary conditions. Note that
if N is even, then an RSE sequence satisfies NSNS boundary conditions for
the computational domain $0 \le n \le N/2-1$. That is:
$$x_{N-1} = x_0$$
$$x_{N/2-1} = x_{N/2}$$
Theorem 2.17 Let $x_n$ be an RSE sequence and let $X_k$ be its ωE symmetric
DFT, both of length N where N is even. The real form of the DFT is:
$$\tilde{X}_k = \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$
for $0 \le k \le N/2-1$. The real form of the IDFT is:
$$x_n = \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]$$
for $0 \le n \le N/2-1$.
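We now prove Theorem 2.17. The result for the DFT follows from
equation (2.40) and the RSE symmetry of $x_n$ as follows:
$$\begin{aligned}
\tilde{X}_k &= \frac{1}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \\
  &= \frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n\,\omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1}\,\omega_N^{-k(N-n-1/2)}\Big\} \\
  &= \frac{1}{N}\Big\{\sum_{n=0}^{N/2-1} x_n\,\omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_n\,\omega_N^{k(n+1/2)}\Big\} \\
  &= \frac{2}{N}\,\mathrm{Re}\Big[\sum_{n=0}^{N/2-1} x_n\,\omega_N^{k(n+1/2)}\Big] \\
  &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]
\end{aligned}$$
Note that only half of the ωE sequence $\tilde{X}_k$ needs to be specified. The result
for the IDFT follows from equations (2.41) and (2.43) as follows, where the
term for k = N/2 drops out because equation (2.43) gives $\tilde{X}_{N/2} = 0$:
$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} \\
  &= \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_{N-k}\,\omega_N^{(N-k)(n+1/2)} \\
  &= \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{-k(n+1/2)} \\
  &= \tilde{X}_0 + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)}\Big] \\
  &= \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]
\end{aligned}$$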
Note that only half of the RSE sequence xn needs to be specified. This
completes the proof of Theorem 2.17.
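
The real forms in Theorem 2.17 can be exercised directly. The following is a minimal sketch rather than the fast algorithm of Chapter 3: it evaluates both sums in $O(N^2)$ operations, stores only the N/2 independent values of each sequence, and cross-checks the result against equation (2.39) under the assumed convention $\omega_N = e^{2\pi i/N}$. The function names are hypothetical.

```python
import cmath
import math

def rse_forward(half_x):
    """Real form of the DFT: x_0..x_{N/2-1} -> Xt_0..Xt_{N/2-1}."""
    N = 2 * len(half_x)
    return [sum(2.0 * half_x[n] * math.cos(math.pi * k * (2 * n + 1) / N)
                for n in range(N // 2)) / N
            for k in range(N // 2)]

def rse_inverse(half_xt):
    """Real form of the IDFT: Xt_0..Xt_{N/2-1} -> x_0..x_{N/2-1}."""
    N = 2 * len(half_xt)
    return [half_xt[0]
            + sum(2.0 * half_xt[k] * math.cos(math.pi * k * (2 * n + 1) / N)
                  for k in range(1, N // 2))
            for n in range(N // 2)]

half_x = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5]    # x_0 .. x_{N/2-1}, so N = 12
xt = rse_forward(half_x)
roundtrip_err = max(abs(a - b) for a, b in zip(rse_inverse(xt), half_x))

# Cross-check against the complex DFT of the full RSE sequence:
# equation (2.39) says X_k = w^(k/2) * Xt_k with w = exp(2*pi*i/N).
N = 2 * len(half_x)
full_x = half_x + half_x[::-1]                 # enforce x[N-n-1] = x[n]
w = cmath.exp(2j * cmath.pi / N)
X = [sum(full_x[n] * w ** (-k * n) for n in range(N)) / N for k in range(N)]
dft_err = max(abs(X[k] - cmath.exp(1j * cmath.pi * k / N) * xt[k])
              for k in range(N // 2))
```

Both `roundtrip_err` and `dft_err` should be at rounding level, which illustrates that N/2 real values on each side carry all of the information in the length-N transform pair.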
We now develop a fast, mixed radix algorithm for computing the RSE
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an
RSE sequence of length N may be stored in N/2 real storage locations,
compared to 2N real storage locations for a C sequence of length N.
However, an ωE sequence of length N requires N real storage locations. Thus,
in order to obtain an in-place algorithm, we must use a more compact
representation of an ωE sequence. Such a compact representation is provided
by the quantities $\tilde{X}_k$ in Lemma 2.19. Using this representation, an ωE
sequence of length N may be stored in N/2 real storage locations. Our goal is
to exploit these symmetries in the data in order to reduce both the storage
requirements and the number of operations to one fourth of those for
C sequences. The procedure for developing this algorithm will be
different from the other algorithms in this chapter for the following reason.
Equation (2.40) shows that when we replace the complex quantity $X_k$ by
the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we
call the discrete staggered transform (DST). Equation (2.41) provides the
inverse discrete staggered transform (IDST). Note that the DST is a
constant multiple of the DFT, whereas the IDST is not related to the IDFT in
any simple way. We have found that the applications of the DST include
the boundary conditions considered in this section, as well as others. Thus,
we have devoted all of Chapter 3 to the development of fast, mixed radix
algorithms for computing the DST and IDST. These algorithms are called
the fast staggered transform (FST) and inverse fast staggered transform
(IFST). The FST for RSE sequences is developed in Section 3.3.
2.10 Real Staggered Odd (RSO)
In this section, we will be concerned with the following symmetries:

Definition 2.15 A real staggered odd (RSO) sequence $x_n$ of length N is
defined by:
$$\bar{x}_n = x_n$$
$$x_{N-n-1} = -x_n$$
An ω-odd (ωO) sequence $X_k$ of length N is defined by:
$$X_{N-k} = \bar{X}_k \tag{2.44}$$
$$X_k = -\omega_N^{k}\,\bar{X}_k \tag{2.45}$$
The following lemma establishes the relationship between these symmetries.
We omit the proof of this result because it is well known.
Lemma 2.20 If $x_n$ is an RSO sequence of length N, then its DFT $X_k$ is
an ωO sequence of length N. If $X_k$ is an ωO sequence of length N, then its
IDFT $x_n$ is an RSO sequence of length N.
The next lemma will be needed to obtain the real form of the DFT and
IDFT.
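Lemma 2.21 Let $X_k$ be an ωO sequence of length N, and let $\tilde{X}_k$ denote
the magnitude of $X_k$. Then:
$$X_k = i\,\omega_N^{k/2}\,\tilde{X}_k \tag{2.46}$$
$$\tilde{X}_k = -\frac{i}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \tag{2.47}$$
$$x_n = i\sum_{k=0}^{N-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} \tag{2.48}$$
$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.49}$$
$$\tilde{X}_{N-k} = \tilde{X}_k \tag{2.50}$$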
We now prove Lemma 2.21. We express $X_k$ in polar form as follows:
$$X_k = \tilde{X}_k e^{i\theta}$$
Substituting this into equation (2.45) and solving for $\theta$ leads to equation
(2.46). Combining equations (2.1) and (2.46) leads to equation (2.47), while
combining equations (2.2) and (2.46) leads to equation (2.48). Equation
(2.49) is obtained from equation (2.47) as follows:
$$\begin{aligned}
\tilde{X}_{N+k} &= -\frac{i}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-(N+k)(n+1/2)} \\
                &= \frac{i}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \\
                &= -\tilde{X}_k
\end{aligned}$$
Equation (2.50) is obtained by combining equations (2.44) and (2.46) as
follows:
$$\begin{aligned}
\tilde{X}_{N-k} &= -i\,\omega_N^{-(N-k)/2}\,X_{N-k} \\
                &= i\,\omega_N^{k/2}\,\bar{X}_k \\
                &= \tilde{X}_k
\end{aligned}$$
This completes the proof of Lemma 2.21.
The next theorem uses the previous lemma to find the real form of the
DFT and IDFT. Observe that the result for the DFT is the eigenvector
expansion required by the Fourier analysis method for DN boundary
conditions. Note that if N is even, then an ωO sequence (represented by
$\tilde{X}_k$) satisfies DN boundary conditions for the computational domain
$1 \le k \le N/2$. That is:
$$\tilde{X}_0 = 0$$
$$\tilde{X}_{N/2-1} = \tilde{X}_{N/2+1}$$
Observe that the result for the IDFT is the eigenvector expansion required
by the Fourier analysis method for DSDS boundary conditions. Note that
if N is even, then an RSO sequence satisfies DSDS boundary conditions for
the computational domain $0 \le n \le N/2-1$. That is:
$$x_{N-1} = -x_0$$
$$x_{N/2-1} = -x_{N/2}$$
Theorem 2.18 Let $x_n$ be an RSO sequence and let $X_k$ be its ωO symmetric
DFT, both of length N where N is even. The real form of the DFT is:
$$\tilde{X}_k = -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$
for $1 \le k \le N/2$. The real form of the IDFT is:
$$x_n = (-1)^{n+1}\tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]$$
for $0 \le n \le N/2-1$.
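We now prove Theorem 2.18. The result for the DFT follows from
equation (2.47) and the RSO symmetry of $x_n$ as follows:
$$\begin{aligned}
\tilde{X}_k &= -\frac{i}{N}\sum_{n=0}^{N-1} x_n\,\omega_N^{-k(n+1/2)} \\
  &= -\frac{i}{N}\Big\{\sum_{n=0}^{N/2-1} x_n\,\omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1}\,\omega_N^{-k(N-n-1/2)}\Big\} \\
  &= -\frac{i}{N}\Big\{\sum_{n=0}^{N/2-1} x_n\,\omega_N^{-k(n+1/2)} - \sum_{n=0}^{N/2-1} x_n\,\omega_N^{k(n+1/2)}\Big\} \\
  &= -\frac{2}{N}\,\mathrm{Im}\Big[\sum_{n=0}^{N/2-1} x_n\,\omega_N^{k(n+1/2)}\Big] \\
  &= -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]
\end{aligned}$$
Note that only half of the ωO sequence $\tilde{X}_k$ needs to be specified. The result
for the IDFT follows from equations (2.48) and (2.50) as follows, where the
term for k = 0 drops out because equations (2.49) and (2.50) give $\tilde{X}_0 = 0$:
$$\begin{aligned}
x_n &= i\sum_{k=0}^{N-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} \\
  &= i\Big\{\tilde{X}_{N/2}\,\omega_N^{(N/2)(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_{N-k}\,\omega_N^{(N-k)(n+1/2)}\Big\} \\
  &= i\Big\{i(-1)^n\tilde{X}_{N/2} + \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)} - \sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{-k(n+1/2)}\Big\} \\
  &= i\Big\{i(-1)^n\tilde{X}_{N/2} + 2i\,\mathrm{Im}\Big[\sum_{k=1}^{N/2-1} \tilde{X}_k\,\omega_N^{k(n+1/2)}\Big]\Big\} \\
  &= (-1)^{n+1}\tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]
\end{aligned}$$
Note that only half of the RSO sequence $x_n$ needs to be specified. This
completes the proof of Theorem 2.18.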
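
As with the RSE case, the sine forms can be checked by a direct round trip. The sketch below is illustrative only and assumes the sign convention $\tilde{X}_k = -(1/N)\sum 2x_n \sin[\pi k(2n+1)/N]$ together with $x_n = (-1)^{n+1}\tilde{X}_{N/2} - \sum 2\tilde{X}_k \sin[\pi k(2n+1)/N]$; flipping the overall sign of $\tilde{X}_k$ in both formulas gives an equally consistent pair. The function names are hypothetical, and the sums are evaluated in $O(N^2)$ operations rather than by a fast algorithm.

```python
import math

def rso_forward(half_x):
    """x_0..x_{N/2-1} -> Xt_1..Xt_{N/2}; list index j holds Xt_{j+1}."""
    N = 2 * len(half_x)
    return [-sum(2.0 * half_x[n] * math.sin(math.pi * k * (2 * n + 1) / N)
                 for n in range(N // 2)) / N
            for k in range(1, N // 2 + 1)]

def rso_inverse(xt):
    """Xt_1..Xt_{N/2} -> x_0..x_{N/2-1}, using the assumed sign convention."""
    N = 2 * len(xt)
    xt_half = xt[N // 2 - 1]                   # Xt_{N/2}
    return [(-1) ** (n + 1) * xt_half
            - sum(2.0 * xt[k - 1] * math.sin(math.pi * k * (2 * n + 1) / N)
                  for k in range(1, N // 2))
            for n in range(N // 2)]

half_x = [1.0, -0.5, 0.25, 2.0, -1.5, 0.75]    # x_0 .. x_{N/2-1}, so N = 12
xt = rso_forward(half_x)
rso_roundtrip_err = max(abs(a - b) for a, b in zip(rso_inverse(xt), half_x))
```

Note the index range: the forward transform produces coefficients for $1 \le k \le N/2$, so the coefficient $\tilde{X}_{N/2}$ occupies the storage location that $\tilde{X}_0$ occupies in the RSE case.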
We now develop a fast, mixed radix algorithm for computing the RSO
symmetric DFT and its inverse, given $x_n$ in natural order. Note that an
RSO sequence of length N may be stored in N/2 real storage locations,
compared to 2N real storage locations for a C sequence of length N.
However, an ωO sequence of length N requires N real storage locations. Thus,
in order to obtain an in-place algorithm, we must use a more compact
representation of an ωO sequence. Such a compact representation is provided
by the quantities $\tilde{X}_k$ in Lemma 2.21. Using this representation, an ωO
sequence of length N may be stored in N/2 real storage locations. Our goal is
to exploit these symmetries in the data in order to reduce both the storage
requirements and the number of operations to one fourth of those for
C sequences. The procedure for developing this algorithm will be
different from the other algorithms in this chapter for the following reason.
Equation (2.47) shows that when we replace the complex quantity $X_k$ by
the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we
call the discrete staggered transform (DST). Equation (2.48) provides the
inverse discrete staggered transform (IDST). Note that the DST is a
constant multiple of the DFT, whereas the IDST is not related to the IDFT in
any simple way. We have found that the applications of the DST include
the boundary conditions considered in this section, as well as others. Thus,
we have devoted all of Chapter 3 to the development of fast, mixed radix
algorithms for computing the DST and IDST. These algorithms are called
the fast staggered transform (FST) and inverse fast staggered transform
(IFST). The FST for RSO sequences is developed in Section 3.4.
2.11 Tables of Symmetries
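Table 2.1: Symmetries in the IDFT

| Acronym | Symmetry | Sequence | DFT |
|---|---|---|---|
| | Periodic | $x_{N+n} = x_n$ | $X_{N+k} = X_k$ |
| R | Real | $\bar{x}_n = x_n$ | $X_{N-k} = \bar{X}_k$ |
| RE | Real Even | $\bar{x}_n = x_n$, $x_{N-n} = x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$ |
| RO | Real Odd | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$ |
| REE | Real Composite Even-Even (N even) | $\bar{x}_n = x_n$, $x_{N-n} = x_n$, $x_{N/2-n} = x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$, $X_k = (-1)^k X_k$ |
| REO | Real Composite Even-Odd (N even) | $\bar{x}_n = x_n$, $x_{N-n} = x_n$, $x_{N/2-n} = -x_n$ | $\bar{X}_k = X_k$, $X_{N-k} = X_k$, $X_k = (-1)^{k+1} X_k$ |
| ROE | Real Composite Odd-Even (N even) | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$, $x_{N/2-n} = x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$, $X_k = (-1)^{k+1} X_k$ |
| ROO | Real Composite Odd-Odd (N even) | $\bar{x}_n = x_n$, $x_{N-n} = -x_n$, $x_{N/2-n} = -x_n$ | $\bar{X}_k = -X_k$, $X_{N-k} = -X_k$, $X_k = (-1)^k X_k$ |
| RSE | Real Staggered Even | $\bar{x}_n = x_n$, $x_{N-n-1} = x_n$ | $X_{N-k} = \bar{X}_k$, $X_k = \omega_N^{k}\bar{X}_k$ |
| RSO | Real Staggered Odd | $\bar{x}_n = x_n$, $x_{N-n-1} = -x_n$ | $X_{N-k} = \bar{X}_k$, $X_k = -\omega_N^{k}\bar{X}_k$ |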
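Table 2.2: Symmetries in the DFT

| Acronym | Symmetry | Sequence | IDFT |
|---|---|---|---|
| | Periodic | $X_{N+k} = X_k$ | $x_{N+n} = x_n$ |
| CS | Conjugate Symmetric | $X_{N-k} = \bar{X}_k$ | $\bar{x}_n = x_n$ |
| SCS | Staggered Conjugate Symmetric | $X_{N-k-1} = \bar{X}_k$ | $\bar{x}_n = \omega_N^{n} x_n$ |
| CSIS | CS Induced Intersequence Symmetry | $X_{k,p-q} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q} = \omega_{N/p}^{-n}\,\bar{y}_{n,q}$ |
| SCSIS | SCS Induced Intersequence Symmetry | $X_{k,p-q-1} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q-1} = \omega_{N/p}^{-n}\,\bar{y}_{n,q}$ |
| R | Real | $\bar{X}_k = X_k$ | $x_{N-n} = \bar{x}_n$ |
| I | Imaginary | $\bar{X}_k = -X_k$ | $x_{N-n} = -\bar{x}_n$ |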

FAST FOURIER TRANSFORMS FOR DIRECT SOLUTION OF POISSON'S EQUATION
by
Bert Larue Bradford
B.A., The University of North Texas, 1976
M.A., The University of Texas at Austin, 1979
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Mathematics
1991
© 1991 by Bert Larue Bradford
All rights reserved.
This thesis for the Doctor of Philosophy degree by Bert Larue Bradford has been approved for the Department of Mathematics by Roland A. Sweet William L. Briggs William Clohessy Thomas F. Russell lf/9! Date
Bradford, Bert Larue (Ph.D., Mathematics)
Fast Fourier Transforms for Direct Solution of Poisson's Equation
Thesis directed by Professor Roland A. Sweet

This thesis presents compact algorithms used to incorporate the Cooley-Tukey fast Fourier transform (FFT) into the solution of finite difference approximations to the multidimensional Poisson equation. In each spatial dimension, we must specify boundary conditions at both the left and right endpoint. Boundary conditions we consider include cyclic, Dirichlet, and Neumann. Furthermore, there is often a need to orient the grid such that one or both of the endpoints of the computational domain are staggered at half of a grid spacing. This leads to staggered Dirichlet and staggered Neumann boundary conditions. When the Poisson equation is discretized, these boundary conditions are approximated by requiring the real sequence which represents the approximate solution to satisfy discrete analogs. The discretized boundary value problem is solved by the Fourier analysis method (also referred to as the eigenvector expansion method or as a fast Poisson solver). This method requires finding the eigenvalues and eigenvectors corresponding to the discretized boundary value problem. The discrete solution is expanded in terms of these eigenvectors. The efficiency of this algorithm results from the ability to calculate the coefficients in such eigenvector expansions using an FFT algorithm. For each of the boundary conditions discussed above, an FFT algorithm has been developed which computes the coefficients in the corresponding eigenvector expansion as efficiently as possible by eliminating all redundant computations which would occur in the full complex FFT, and without pre- or postprocessing. Such FFT algorithms are referred to as compact symmetric FFTs. The elimination of pre- and postprocessing improves performance by reducing both the number of operations and data accesses. These FFT algorithms are all general mixed radix, in-place algorithms which accept the input sequence in natural order. The inverse algorithms accept the input sequence in permuted order. Thus, reordering of data is never required.

The form and content of this abstract are approved. I recommend its publication.
Signed Roland A. Sweet
Contents List of Figures vii List of Tables X Acknowledgements xi 1 Introduction 1 1.1 The Fourier Analysis Method 1 1.2 The New FFT and FST Algorithms 8 2 Fast Fourier Transforms 10 2.1 Complex (C) .. 10 2.2 Real (R) ..... 20 2.3 Real Even (RE) 32 2.4 Real Odd (RO) 44 2.5 Real Composite EvenEven (REE) 56 2.6 Real Composite EvenOdd (REO) 62 2.7 Real Composite OddEven (ROE) 68 2.8 Real Composite OddOdd (ROO) 74 2.9 Real Staggered Even (RSE) 79 2.10 Real Staggered Odd (RSO) 83 2.11 Tables of Symmetries ... 87 3 Fast Staggered Transforms 91 3.1 Complex (C) .. .... 91 3.2 Real (R) .......... 96 3.3 Real Staggered Even (RSE) 109 3.4 Real Staggered Odd (RSO) 123 3.5 Real Composite Staggered Even. Staggered Even (RSESE) 137
3.6 Real Composite Staggered EvenStaggered Odd (RSESO) 143 3.7 Real Composite Staggered Odd Staggered Even (RSOSE) 149 3.8 Real Composite Staggered Odd Staggered Odd (RSOSO) 155 3.9 Tables of Symmetries . . . . . . . 161 4 Software Implementation and Performance 164 4.1 Introduction . . . 164 4.2 The Radix2 RO FFT 167 4.3 The Radix4 RO FFT 178 4.4 The Rad.ix3 RO FFT 190 4.5 The Mixed Radix RO FFT 204 4.6 Performance of the RO FFT 214 4.7 Automating Implementation of the RO FFT. 223 A Eigenstructure of the Discrete Poisson Equation 225 B Software for the RO FFT 228 C FORTRAN Skeleton for Combine Equations 274 D Mathematica Scripts 277 E Automatically Generated Subroutines for the RO FFT 301 Bibliography 309 vi
List of Figures 2.1 Splitting tree for complex FFT 14 2.2 Splitting tree for R symmetric FFT 24 2.3 Splitting tree for RE symmetric FFT 34 2.4 Splitting tree for RO symmetric FFT 46 2.5 Splitting tree for REE symmetric FFT 60 2.6 Splitting tree for REO symmetric FFT 66 2.7 Splitting tree for ROE symmetric FFT 72 2.8 Splitting tree for ROO symmetric FFT 77 3.1 Splitting tree for R symmetric FST 101 3.2 Splitting tree for RSE symmetric FST 112 3.3 Splitting tree for RSO symmetric FST 126 3.4 Splitting tree for RSESE symmetric FST 141 3.5 Splitting tree for RSESO symmetric FST 147 3.6 Splitting tree for RSOSE symmetric FST 153 3.7 Splitting tree for RSOSO symmetric FST 158 4.1 Radix2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = N I 4 170 4.2 Radix2 storage pattern for ICS induced symmetries for N = 16 highlighting the case n = 1 . . . 170 4.3 Radix2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 0 . . 171 4.4 Radix2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = N I 4 . . 171 4.5 Radix2 storage pattern for ISCS induced symmetries for N = 16 highlighting the case n = 1 . . . . 172 4.6 Radix2 storage pattern for I sequences for N = 16 highlighting the case n = 0 . 173
4.7 Radix2 storage pattern for I sequences for N = 16 highlighting the case n = N /4 174 4.8 Radix2 storage pattern for I sequences for N = 16 highlighting the case n = 1 175 4.9 Splitting tree for the radix2 RO FFT for N = 16 176 4.10 Radix4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 0 182 4.11 Radix4 storage pattern for ICS induced sym.'Iletries for N = 24 highlighting the case n = N j 8 182 4.12 Radix4 storage pattern for ICS induced symmetries for N = 24 highlighting the case n = 1 183 4.13 Radix4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 0 184 4.14 Radix4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = N /8 184 4.15 Radix4 storage pattern for ISCS induced symmetries for N = 24 highlighting the case n = 1 185 4.16 Radix4 storage pattern for I sequences for N = 24 highlighting the case n = 0 186 4.17 Radix4 storage pattern for I sequences for N = 24 highlighting the case n = N /8 187 4.18 Radix4 storage pattern for I sequences for N = 24 highlighting the case n = 1 188 4.19 Radix3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 0 193 4.20 Radix3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = N /6 193 4.21 Radix3 storage pattern for ICS induced symmetries for N = 18 highlighting the case n = 1 194 4.22 Radix3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 0 195 4.23 Radix3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = N /6 195 4.24 Radix3 storage pattern for ISCS induced symmetries for N = 18 highlighting the case n = 1 196 4.25 Radix3 storage pattern for I sequences for N = 18 highlighting the case n = 0 197 4.26 Radix3 storage pattern for I sequences for N = 18 highlighting the case n = N /6 198 viii
4.27 4.28 4.29 4.30 4.31 4.32 Radix3 storage pattern for I sequences for N = 18 highlighting the case n = 1 . . Radix3 storage pattern for I2 sequences for N = 18 highlighting the case n = 0 . . . Radix3 storage pattern for I2 sequences for N = 18 highlighting the case n = N /6 . Radix3 storage pattern for I2 sequences for N = 18 high lighting the case n = 1 Initialization subroutine hierarchy for the RO FFT Forward transform subroutine hierarchy for the RO FFT IX 199 200 201 202 207 207
List of Tables 1.1 Discrete Homogeneous Boundary Conditions 2 1.2 Eigenstructure for the Standard Grid 4 1.3 Eigenstructure for the Staggered Grid , 4 1.4 Eigenstructure for the Mixed Grid , , 5 1.5 Operation Counts for 2D Poisson Solvers 7 2.1 Symmetries in the IDFT 88 2.2 Symmetries in the DFT 89 3.1 Symmetries in the IDST 161 3.2 Symmetries in the DST 162 4.1 Splitting Tree for the Radix2 RO FFT for N = 16 177 4.2 Splitting Tree for the Radix4 RO FFT for N = 64 189 4.3 Splitting Tree for the Radix3 RO FFT for N = 27 203 4.4 Splitting Tree for the Mixed Radix RO FFT for N = 72 206 4.5 Timing Data for 1024 Sequences on the IBM 3090J 215 4.6 Timing Data for 1024 Sequences on the Cray YMP8/864 216 4. 7 Timing Model for 1024 Sequences on the IBM 3090J , 221 4.8 Comparison of Timing Data for Handwritten Code and Automated Code for 1024 Sequences on the IBM 3090J , , 224 X
Acknowledgements
This work was generously supported by the IBM Federal Sector Division Resident Study Program.
Chapter 1
Introduction
1.1 The Fourier Analysis Method
We begin with a brief overview of the Fourier analysis method. We will first present the Fourier analysis method in one spatial dimension. We will then extend the method to a two-dimensional rectangle. The extension to higher dimensional rectangular regions is analogous, but we will not pursue this. Finally, we will discuss operation counts for the Fourier analysis method, and compare it to other methods.
In one spatial dimension, the discretized Poisson equation is:
$$u_{n-1} - 2u_n + u_{n+1} = f_n$$
for $1 \le n \le M$. We must specify boundary conditions at both the left and right endpoint. We may assume, without loss of generality, that the boundary conditions are homogeneous, since inhomogeneous boundary values may be absorbed into $f_1$ and $f_M$. The discrete, homogeneous boundary conditions we consider, specified for n = 1, are shown in Table 1.1. Note that we consider two variants of Dirichlet and Neumann boundary conditions, depending upon whether the boundary coincides with a grid point or is staggered at a half grid spacing. The notation DN indicates a homogeneous Dirichlet boundary condition at the left endpoint, and a homogeneous Neumann boundary condition at the right endpoint. Similar notation will be used for other combinations. Combinations which involve only C, D, or N are referred to as standard grid boundary conditions. Combinations which involve only DS or NS are referred to as staggered grid boundary conditions. Other combinations are referred to as mixed grid boundary conditions.
The discretized boundary value problem may be written in matrix form as:
$$Au = f \tag{1.1}$$
where A is a matrix of dimension M, and u, f are vectors of length M. The boundary conditions have been used to eliminate $u_0$ and $u_{M+1}$. A is tridiagonal, and in one spatial dimension we would simply solve this linear system by Gaussian elimination.
However, in anticipation of extensions to higher dimensions, we proceed as follows. First, we find the eigenvalues and eigenvectors of A. These are summarized in Tables 1.2, 1.3, and 1.4. Note that A always has a full set of linearly independent eigenvectors whose components are trigonometric expressions. Note also that in these tables the computational domain is different for each boundary condition. The reason for this will become clear after studying the corresponding symmetric FFT. Appendix A provides an example of one technique for finding these eigenvalues and eigenvectors. For this general discussion, we denote the eigenvalues by $\lambda_k$ (repeated to multiplicity) and the corresponding eigenvectors by $\phi_k$ for $1 \le k \le M$. We now seek a solution for u in the form of an eigenvector expansion:
$$u = \sum_{k=1}^{M} \hat{u}_k \phi_k \tag{1.2}$$
This requires that we also express f as an eigenvector expansion:
$$f = \sum_{k=1}^{M} \hat{f}_k \phi_k \tag{1.3}$$
Since f is known and the vectors $\phi_k$ are linearly independent, we may compute $\hat{f}_k$.

Table 1.1: Discrete Homogeneous Boundary Conditions

| Acronym | Boundary Condition | Discrete Analog |
|---|---|---|
| C | Cyclic | $u_0 = u_M$ |
| D | Dirichlet | $u_0 = 0$ |
| N | Neumann | $u_2 - u_0 = 0$ |
| DS | Dirichlet-Staggered | $u_1 + u_0 = 0$ |
| NS | Neumann-Staggered | $u_1 - u_0 = 0$ |

Because the components of $\phi_k$ are trigonometric expressions, $\hat{f}_k$
may be computed most efficiently by means of a symmetric FFT. Thus, this step is referred to as Fourier analysis. Substituting equations (1.2) and (1.3) into equation (1.1) yields:
$$A\Big[\sum_{k=1}^{M} \hat{u}_k \phi_k\Big] = \sum_{k=1}^{M} \hat{f}_k \phi_k$$
$$\sum_{k=1}^{M} \hat{u}_k \lambda_k \phi_k = \sum_{k=1}^{M} \hat{f}_k \phi_k$$
Since the vectors $\phi_k$ are linearly independent, we conclude:
$$\lambda_k \hat{u}_k = \hat{f}_k$$
for $1 \le k \le M$. We may now compute $\hat{u}_k$, unless $\lambda_k = 0$. In this case, the compatibility condition $\hat{f}_k = 0$ must hold, and $\hat{u}_k$ is arbitrary. Thus, the solution for u is not unique in this case. This occurs for CC, NN, NSNS, NNS, and NSN boundary conditions, and corresponds to the fact that the solutions to these problems are unique only up to an additive constant. Having determined $\hat{u}_k$, u may now be computed using the inverse of the corresponding symmetric FFT. This step is called Fourier synthesis.
We now indicate how to extend the Fourier analysis method to a two-dimensional rectangle. For simplicity, we assume that the number of unknowns in each dimension are equal. In two spatial dimensions, the discretized Poisson equation is:
$$(u_{n-1,m} - 2u_{n,m} + u_{n+1,m}) + \rho^2 (u_{n,m-1} - 2u_{n,m} + u_{n,m+1}) = f_{n,m}$$
for $1 \le n, m \le M$, where $\rho = \Delta x / \Delta y$. We assume that homogeneous boundary conditions are specified on all four sides of the rectangle of the same type considered previously. The discretized boundary value problem may be written in matrix form as:
$$A u_m + \rho^2 [u_{m-1} - 2u_m + u_{m+1}] = f_m \tag{1.4}$$
for $1 \le m \le M$. $u_m$ is a vector of length M with n'th component $u_{n,m}$, and likewise for $f_m$. A is the same M dimensional matrix as in the corresponding one-dimensional problem. As before, we seek a solution for $u_m$ in the form of an eigenvector expansion:
$$u_m = \sum_{k=1}^{M} \hat{u}_{k,m} \phi_k \tag{1.5}$$
Table 1.2: Eigenstructure for the Standard Grid Bnd Cnd n'th Comp of Eigenvec Comp Domain Transform Associated Eigenvalue Eigen vee Indx CC cos(2dn/ N) OSnSN1 RFFT 4sin2(d/N) 0 S k S N /2 or 0 S k S (N1)/2 sin(27rkn/ N) OSnSN1 4sin2(d/N) 1 S k S N /2 1 or 1 < k < (N1)/2 NN cos(2dn/ N) 0 S n S N/2 REFFT 4 sin2 ( d/ N) 0 < k < N/2 DD sin(2dn/ N) 1 S n S N /21 RO FFT 4 sin2 ( d/ N) 1 < k < N /21 ND cos[27rn(2k + 1)/ N] OSnSN/41 REO FFT 4sin2[7r(2k + 1)/N] O
Table 1.4: Eigenstructure for the Mixed Grid Bnd Cnd n'th Camp of Eigenvec Comp Domain Transform Associated Eigenvalue Eigenvec Indx N = 2(2M + 1) NNS cos( 41rkn IN) O:<;n:;M REE FFT 4 sin2(21rkl N) 0 < k < M NDS cos[27rn(2k + 1)1 N] O:<;n:;M REO FFT 4sin2[7r(2k + 1)INJ O< k < M DNS sin[27rn(2k1)1 N] 1:5cn:5cM ROE FFT 4 sin2[7r(2k 1 )IN] 1
This requires that we also express $f_m$ as an eigenvector expansion:
$$f_m = \sum_{k=1}^{M} \hat{f}_{k,m} \phi_k \tag{1.6}$$
$\hat{f}_{k,m}$ may be computed most efficiently by performing M symmetric FFTs of length M. Substituting equations (1.5) and (1.6) into equation (1.4) yields:
$$A\Big[\sum_{k=1}^{M} \hat{u}_{k,m} \phi_k\Big] + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \phi_k = \sum_{k=1}^{M} \hat{f}_{k,m} \phi_k$$
$$\sum_{k=1}^{M} \hat{u}_{k,m} \lambda_k \phi_k + \rho^2 \sum_{k=1}^{M} [\hat{u}_{k,m-1} - 2\hat{u}_{k,m} + \hat{u}_{k,m+1}] \phi_k = \sum_{k=1}^{M} \hat{f}_{k,m} \phi_k$$
Since the vectors $\phi_k$ are linearly independent, we conclude:
$$\rho^2 \hat{u}_{k,m-1} + (\lambda_k - 2\rho^2)\hat{u}_{k,m} + \rho^2 \hat{u}_{k,m+1} = \hat{f}_{k,m}$$
for $1 \le k, m \le M$. We now obtain $\hat{u}_{k,m}$ by solving M tridiagonal linear systems of dimension M by Gaussian elimination. For CC, NN, NSNS, NNS, or NSN boundary conditions, one of these linear systems is singular. In this case, $\hat{f}_{k,m}$ must satisfy a compatibility condition, and the solution for $\hat{u}_{k,m}$ is not unique. Having determined $\hat{u}_{k,m}$, $u_m$ may be computed by performing M symmetric FFTs of length M.
We conclude this section with a discussion of operation counts for the Fourier analysis method, and a comparison of it to other methods for solving the discrete Poisson equation. The Fourier analysis method is efficient only for two or more dimensions. As before, we will restrict our discussion to two dimensions. The operation count for an M × M grid, where M is a power of two, is easily obtained from the description of the algorithm above. We performed 2M symmetric FFTs of length M, each of which requires $O(M \log M)$ operations. We solved M tridiagonal linear systems of dimension M by Gaussian elimination, each of which requires $O(M)$ operations. Thus, the asymptotic operation count for the entire algorithm is
$O(M^2 \log M)$. The operation counts for other methods of solving the discrete Poisson equation are summarized in Table 1.5. The source of this information is [8]. The FACR(l) method combines the cyclic reduction and Fourier analysis methods.

Table 1.5: Operation Counts for 2D Poisson Solvers

| Method | Operation Count |
|---|---|
| Gaussian Elimination | $O(M^4)$ |
| Successive Over-Relaxation | $O(M^3 \log M)$ |
| Alternating Direction Implicit | $O(M^2 \log^2 M)$ |
| Cyclic Reduction | $O(M^2 \log M)$ |
| Fourier Analysis | $O(M^2 \log M)$ |
| FACR(l) | $O(M^2 \log\log M)$ |
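
The three steps of the one-dimensional Fourier analysis method in Section 1.1 (analysis, division by the eigenvalues, synthesis) can be sketched for DD boundary conditions. This is a minimal illustration under stated assumptions, not the thesis software: A is taken to be the tridiagonal matrix with stencil (1, -2, 1), whose eigenvectors have components $\sin(\pi k n/(M+1))$ and eigenvalues $-4\sin^2(\pi k/(2(M+1)))$, and the expansions are evaluated by direct $O(M^2)$ sums where the thesis would use a symmetric FFT. The names `solve_poisson_dd` and `apply_a` are hypothetical.

```python
import math

def solve_poisson_dd(f):
    """Solve u[n-1] - 2u[n] + u[n+1] = f[n], 1 <= n <= M, with u[0] = u[M+1] = 0."""
    M = len(f)
    L = M + 1

    def phi(k, n):
        # n'th component of the k'th eigenvector of A (assumed DD eigenstructure)
        return math.sin(math.pi * k * n / L)

    # Fourier analysis: coefficients of f in the eigenvector basis
    f_hat = [2.0 / L * sum(f[n - 1] * phi(k, n) for n in range(1, M + 1))
             for k in range(1, M + 1)]
    # Divide by the eigenvalues, which are nonzero for DD boundary conditions
    lam = [-4.0 * math.sin(math.pi * k / (2 * L)) ** 2 for k in range(1, M + 1)]
    u_hat = [fh / lk for fh, lk in zip(f_hat, lam)]
    # Fourier synthesis: sum the eigenvector expansion of u
    return [sum(u_hat[k - 1] * phi(k, n) for k in range(1, M + 1))
            for n in range(1, M + 1)]

def apply_a(u):
    """Apply the (1, -2, 1) stencil with homogeneous Dirichlet end values."""
    padded = [0.0] + list(u) + [0.0]
    return [padded[n - 1] - 2.0 * padded[n] + padded[n + 1]
            for n in range(1, len(u) + 1)]

f = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0]
u = solve_poisson_dd(f)
residual = max(abs(a - b) for a, b in zip(apply_a(u), f))
```

In one dimension this offers no advantage over Gaussian elimination, as the text notes; the point of the sketch is only to show the analysis, solve, and synthesis steps that the two-dimensional method applies along one coordinate direction.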
PAGE 19
1.2 The New FFT and FST Algorithms From the discussion of the Fourier analysis method in Section 1.1, it is evident that FFT algorithms form the core of this method. Our goal is to provide the best possible FFT algorithms for this purpose, and to address all of the boundary conditions in Tables 1.2, 1.3, and 1.4. In this section, we summarize the new contributions to FFT literature contained herein. For each of the boundary conditions in Tables 1.2, 1.3, and 1.4 an FFT algorithm has been developed which computes the coefficients in the corre sponding eigenvector expansion as efficiently as possible by eliminating all redundant computations which would occur in the full complex FFT, and without preor postprocessing. Such FFT algorithms are referred to as compact symmetric FFTs. The older preand postprocessing algorithms are described in detail in [2, 10]. Preand postprocessing steps contribute only low order terms to operation counts. However, for sequences of prac tical length these low order terms may be significant. Furthermore, these algorithms require additional data accesses which also contribute to the to tal execution time. Thus, compact symmetric FFTs eliminate the additional operations and data accesses associated with preand postprocessing algo rithms. Preand postprocessing algorithms also have the restriction that the length of the sequence must be even. A compact symmetric FFT has long been available for real sequences, known as Edson's algorithm. In [4], a compact symmetric FFT for real even sequences is introduced, but in the context of GlenshawCurtis quadrature. In [10], inplace compact symmetric FFTs are developed for real, even, odd, quarterwave even, and quarterwave odd symmetries. All inplace algorithms based on the splitting method require either the input or output sequence to be in a permuted order, referred to as bitreversed order. 
These inplace algorithms require the input sequence in physical space to be in bitreversed order, and produce the forward transform in natural order. From our discussion of the Fourier analysis method, it is clear that this is the opposite of what is desired. In [1], analogous algorithms are developed which accept the input sequence in physical space in natural order, and produce the forward transform in bitreversed order. We follow the general approach set forth in [1]. With this background, we may now summarize our new contributions to FFT literature. The algorithms in [1] were developed for radix2 only. We have generalized all of these to radixp, for a general factor p. This has resulted in a number of new intermediate symmetries which occur in 8
PAGE 20
the course of the splitting method. After obtaining the combine equations for the inverse transform, they must be inverted to obtain those for the forward transform. For the radixp algorithms, this reqnires the inversion of many systems of p equations in p unknowns. We have exploited the special nature of these systems of equations to in vert them in closed form. The real quarterwave even and quarterwave odd transforms, which we refer to as the real staggered even (RSE) and real staggered odd (RSO) FFTs, have been used for ND and DN boundary conditions respectively. We have shown that the algorithms for these symmetries in [1] are not inplace. We have developed two new compact symmetric FFTs, called real composite evenodd (REO) and composite oddeven (ROE) for these boundary conditions. We have shown that these new algorithms are inplace and obtain the goal of eliminating all redundant operations which would occur in the full complex FFT. For staggered grid boundary conditions, we have developed new algorithms based on a variant of the DFT which we refer to as the discrete stag gered transform (DST). In analogy to the FFT, we have developed efficient algorithms for computing the DST, which we refer to as the fast staggered transform (FST). Previously, the only known algorithms for staggered grid boundary conditions were the real quarterwave even and quarterwave odd FFTs, and the preand postprocessing algorithms in [6]. The real quarterwave even and quarterwave odd FFTs have been used for NSNS and DSDS boundary conditions respectively, but the algorithms for these symmetries in [1] are not inplace. The preand postprocessing algorithms for NSDS and DSNS boundary conditions are less efficient than the new compact symmetric FSTs for the same general reasons discussed previously. For mixed grid boundary conditions, we have developed new algorithms based on superimposing two symmetries. We refer to the resulting sym metries as composite symmetries. 
Previously, the only known algorithms for mixed grid boundary conditions were the pre- and postprocessing algorithms in [6] for NSD and DNS boundary conditions. Again, the pre- and postprocessing algorithms are less efficient than the new compact algorithms. Furthermore, we have developed compact algorithms for six mixed grid boundary conditions which previously could not be solved by Fourier methods.
Chapter 2

Fast Fourier Transforms

2.1 Complex (C)

We begin by reviewing the fast Fourier transform, and establishing notation which will be used throughout.

Definition 2.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the forward discrete Fourier transform (DFT) is defined by:

$$X_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n\, w_N^{-kn} \tag{2.1}$$

for $0 \le k \le N-1$, where:

$$w_N = e^{2\pi i/N}$$

For convenience, we will often suppress the constant $1/N$.

The following theorem provides the inverse discrete Fourier transform (IDFT). We omit the proof of this result because it is well known.

Theorem 2.1 A C sequence $x_n$ may be recovered from its DFT $X_k$ by the inverse discrete Fourier transform (IDFT), which is given by:

$$x_n = \sum_{k=0}^{N-1} X_k\, w_N^{kn} \tag{2.2}$$

for $0 \le n \le N-1$.
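The pair (2.1) and (2.2) can be checked numerically with a direct $O(N^2)$ evaluation. The following is a minimal sketch in Python, not one of the thesis codes; it assumes $w_N = e^{2\pi i/N}$ with the factor $1/N$ placed on the forward transform, and the names `w`, `dft`, and `idft` are illustrative only.

```python
import cmath

def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    """Forward DFT, equation (2.1): X_k = (1/N) * sum_n x_n * w_N**(-k*n)."""
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    """Inverse DFT, equation (2.2): x_n = sum_k X_k * w_N**(k*n)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

# Round trip on an arbitrary complex sequence of length 6.
x = [1 + 2j, -0.5j, 3.0, 0.25 - 1j, 2.0, -1 + 1j]
roundtrip = idft(dft(x))
```

With this scaling, a constant sequence of ones transforms to $X_0 = 1$ and $X_k = 0$ otherwise, which is the case exercised by the test driver later in this section.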
By Definition 2.1, the sequences $x_n$ and $X_k$ are of length N. These sequences can be extended to all integral values of n and k using the periodicity properties provided by the following corollary.

Corollary 2.1 Equations (2.1) and (2.2) imply that the sequences $x_n$ and $X_k$ may be extended periodically to all integral values of n and k by:

$$x_{n+N} = x_n, \qquad X_{k+N} = X_k$$

We will develop fast algorithms for computing the DFT and IDFT which are based on the Cooley-Tukey fast Fourier transform (FFT). Following the general approach in [1], we will develop algorithms for the IDFT given $X_k$ in bit-reversed order. Inverting these yields algorithms for the DFT given $x_n$ in natural order. We begin by defining notation which will be needed in the development of these algorithms.

Definition 2.2 Given a C sequence $X_k$ of length N, and a factor p of N, we define a splitting of $X_k$ consisting of the following p subsequences, each of length N/p:

$$X_{k,q} = X_{pk+q}$$

for $0 \le k \le N/p-1$, $0 \le q \le p-1$. We denote the IDFT of these by $y_{n,q}$. That is:

$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\, w_{N/p}^{kn}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$.

Given a C sequence $x_n$ of length N, and a factor p of N, we define the following p subsequences, each of length N/p:

$$x_{n,l} = x_{lN/p+n}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

The inverse fast Fourier transform (IFFT) is based on the principle of computing the quantities $y_{n,q}$, and then combining these in the appropriate fashion to obtain $x_{n,l}$. The precise equation for performing this combining operation is provided by the next theorem.
Theorem 2.2 The inverse combine equation for C sequences is:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} \tag{2.3}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 2.2.

$$x_n = \sum_{k=0}^{N-1} X_k\, w_N^{kn}
= \sum_{q=0}^{p-1}\sum_{k=0}^{N/p-1} X_{pk+q}\, w_N^{n(pk+q)}
= \sum_{q=0}^{p-1} w_N^{nq} \sum_{k=0}^{N/p-1} X_{k,q}\, w_{N/p}^{kn}
= \sum_{q=0}^{p-1} w_N^{nq}\, y_{n,q}$$

In terms of the subsequence notation defined previously, and using the periodicity of $y_{n,q}$ with period N/p, this result is:

$$x_{n,l} = x_{lN/p+n}
= \sum_{q=0}^{p-1} w_N^{q(lN/p+n)}\, y_{lN/p+n,q}
= \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}$$

This completes the proof of Theorem 2.2.

The following corollary provides an important special case of this result. This is the same as equation (2) in [1], except that we are working with the IDFT.

Corollary 2.2 Assume p = 2. The inverse combine equation for C sequences is:

$$x_{n,0} = y_{n,0} + w_N^{n}\, y_{n,1}$$
$$x_{n,1} = y_{n,0} - w_N^{n}\, y_{n,1}$$

for $0 \le n \le N/2-1$.
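One splitting step of Theorem 2.2 can be illustrated for a general factor p as follows. This is a hedged sketch rather than the thesis algorithm: the subsequence IDFTs $y_{n,q}$ are computed by direct summation instead of recursively, and the helper names (`w`, `idft`, `inverse_combine`) are assumptions made for the example.

```python
import cmath

def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Direct inverse DFT, equation (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def inverse_combine(X, p):
    """Apply equation (2.3) once: split X into p subsequences, combine their IDFTs."""
    N = len(X)
    M = N // p                                  # length of each subsequence
    # y[q][n] is the IDFT of X_{k,q} = X_{pk+q} (Definition 2.2).
    y = [idft(X[q::p]) for q in range(p)]
    x = [0j] * N
    for l in range(p):
        for n in range(M):
            # x_{n,l} = x_{l*N/p + n} = sum_q w_p**(l*q) * w_N**(n*q) * y_{n,q}
            x[l * M + n] = sum(w(p, l * q) * w(N, n * q) * y[q][n] for q in range(p))
    return x

X = [complex(k + 1, (-1) ** k * 0.5 * k) for k in range(12)]
x_direct = idft(X)
```

For p = 2 the inner sum reduces to the two-term butterfly of Corollary 2.2.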
We may now describe the IFFT algorithm for a C sequence with length a power of two. Figure 2.1 is a 'splitting tree' diagram which represents this algorithm for a C sequence of length eight. The original sequence is split into two subsequences, one consisting of the even-numbered terms, and the other consisting of the odd-numbered terms. Assume, for the moment, that the IDFT of each subsequence is known. Then the IDFT of the original sequence may be obtained by applying Corollary 2.2. The algorithm now continues recursively. That is, the IDFT of each subsequence is computed by splitting it and repeating the steps above. Eventually, subsequences of length one will be obtained. Since a sequence of length one is its own IDFT, the recursive process terminates at this point.

We now begin the development of the FFT algorithm. We will obtain the forward combine equation for the FFT by inverting the inverse combine equation. For this, we will need the following 'orthogonality property.'

Lemma 2.1 If N is a positive integer, and $0 \le j, n \le N-1$, then:

$$\sum_{k=0}^{N-1} w_N^{k(n-j)} = \begin{cases} N & \text{if } j = n \\ 0 & \text{otherwise} \end{cases}$$

We now prove Lemma 2.1. The case j = n is obvious. For $j \ne n$, define:

$$y = w_N^{n-j} \ne 1$$

Summing the finite geometric series yields:

$$\sum_{k=0}^{N-1} w_N^{k(n-j)} = \sum_{k=0}^{N-1} y^k = \frac{1-y^N}{1-y} = \frac{1-1}{1-y} = 0$$

This completes the proof of Lemma 2.1.

The forward combine equation for the FFT is now provided by the following theorem.

Theorem 2.3 The forward combine equation for C sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l} \tag{2.4}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$.
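Lemma 2.1 and equation (2.4) can both be checked numerically. The sketch below is illustrative only and assumes the conventions of (2.1) and (2.2), with $1/N$ on the forward transform; `orthogonality_sum` and `forward_combine` are hypothetical names. It compares the output of (2.4) with the IDFTs of the subsequences $X_{pk+q}$ of the directly computed DFT.

```python
import cmath

def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    """Direct forward DFT, equation (2.1), including the factor 1/N."""
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    """Direct inverse DFT, equation (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def orthogonality_sum(N, j, n):
    """Left-hand side of Lemma 2.1: sum_k w_N**(k*(n-j))."""
    return sum(w(N, k * (n - j)) for k in range(N))

def forward_combine(x, p):
    """Equation (2.4): y[q][n] = (1/p) w_N**(-n*q) sum_l w_p**(-l*q) x_{n,l}."""
    N = len(x)
    M = N // p
    return [[w(N, -n * q) * sum(w(p, -l * q) * x[l * M + n] for l in range(p)) / p
             for n in range(M)]
            for q in range(p)]

x = [complex(0.5 * n, 1.0 - n % 3) for n in range(12)]
X = dft(x)
```

The forward FFT repeats this step on each $y_{n,q}$ until sequences of length one remain, at which point each value is an entry of $X_k$.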
Figure 2.1: Splitting tree for complex FFT
We now prove Theorem 2.3. Using Definition 2.1 and Lemma 2.1 (applied with N replaced by N/p):

$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} X_{pk+q}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} \Big[\frac{1}{N}\sum_{j=0}^{N-1} x_j\, w_N^{-j(pk+q)}\Big] w_{N/p}^{kn}$$

$$= \frac{1}{N}\sum_{j=0}^{N-1} x_j\, w_N^{-jq} \Big[\sum_{k=0}^{N/p-1} w_{N/p}^{k(n-j)}\Big]
= \frac{1}{p}\sum_{l=0}^{p-1} x_{lN/p+n}\, w_N^{-q(lN/p+n)}
= \frac{1}{p}\, w_N^{-nq}\sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$

This completes the proof of Theorem 2.3.

The following corollary provides an important special case of this result. This is the same as equation (13) in [1], except that we are working with the IDFT.

Corollary 2.3 Assume p = 2. The forward combine equation for C sequences is:

$$y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$y_{n,1} = w_N^{-n}\,(x_{n,0} - x_{n,1})/2$$

for $0 \le n \le N/2-1$.

We close this section by presenting the FFT and IFFT algorithms for complex sequences with length a power of two. We emphasize that this FFT is an in-place algorithm which accepts the input sequence $x_n$ in natural order, and produces the forward transform $X_k$ in bit-reversed order. The IFFT is an in-place algorithm which accepts the sequence $X_k$ in bit-reversed order, and produces the inverse transform $x_n$ in natural order. These algorithms may be used together in such a way that reordering of the data
is never required. We will not include complete algorithm specifications such as these for all of the symmetric FFTs presented later. However, the algorithms presented here should provide a guideline for developing complete algorithms from forward and inverse combine equations. The codes are written in FORTRAN, and are patterned after similar codes found in [9].
C
C     TEST DRIVER FOR COMPLEX FFT
C
      PARAMETER (LOGN=3,N=2**LOGN)
      COMPLEX X(0:N-1)
      COMPLEX OMEGA(0:N-1)
      COMMON /FCCOM/ L,OMEGA
C     INPUT SEQUENCE OF ALL ONES
      DO 100 I=0,N-1
         X(I) = CMPLX(1.0,0.0)
  100 CONTINUE
      WRITE(6,1) (X(I),I=0,N-1)
    1 FORMAT(1H ,'COMPLEX SEQUENCE = ',4(/,4E13.4))
      CALL FCI(LOGN)
      CALL FFC(LOGN,X)
      WRITE(6,2) (X(I),I=0,N-1)
    2 FORMAT(1H ,'TRANSFORM = ',4(/,4E13.4))
      CALL FIC(LOGN,X)
      WRITE(6,3) (X(I),I=0,N-1)
    3 FORMAT(1H ,'INVERSE TRANSFORM = ',4(/,4E13.4))
      END
C
C     FOURIER TRANSFORM
C     COMPLEX SEQUENCE
C     INITIALIZATION
C
C     FILLS OMEGA(I) = W**I, W = EXP(2*PI*I/L), FOR I = 0,...,L-1.
C     OMEGA IS DIMENSIONED IN THE DRIVER; (0:0) HERE IS A PLACEHOLDER.
C
      SUBROUTINE FCI(LOGN)
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      L = 2**LOGN
      OMEGA(0) = 1.0
      TPIDL = 8.0*ATAN(1.0)/L
      OMEGA(1) = CMPLX(COS(TPIDL),SIN(TPIDL))
      DO 100 I=2,L-1
         OMEGA(I) = OMEGA(I-1)*OMEGA(1)
  100 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     FORWARD DIRECTION
C     COMPLEX SEQUENCE
C
C     INPUT IN NATURAL ORDER, OUTPUT IN BIT-REVERSED ORDER
C
      SUBROUTINE FFC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         NS = 2**(I-1)
         LS = N/NS
         CALL CF(NS,LS,X)
  100 CONTINUE
C     APPLY THE SCALING 1/N
      DO 200 I=0,N-1
         X(I) = X(I)/N
  200 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     FORWARD COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
C     X(I,0,J) AND X(I,1,J) ARE THE LOWER AND UPPER HALVES OF
C     SEQUENCE J.  OMEGA(I*L/LS) IS THE I-TH POWER OF THE ROOT
C     OF UNITY OF ORDER LS.
C
      SUBROUTINE CF(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = X(I,0,J) + X(I,1,J)
            X(I,1,J) = CONJG(OMEGA(I*L/LS))*(X(I,0,J) - X(I,1,J))
            X(I,0,J) = TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END
C
C     FOURIER TRANSFORM
C     INVERSE DIRECTION
C     COMPLEX SEQUENCE
C
C     INPUT IN BIT-REVERSED ORDER, OUTPUT IN NATURAL ORDER
C
      SUBROUTINE FIC(LOGN,X)
      COMPLEX X(0:2**LOGN-1)
      N = 2**LOGN
      DO 100 I=1,LOGN
         LS = 2**I
         NS = N/LS
         CALL CI(NS,LS,X)
  100 CONTINUE
      RETURN
      END
C
C     COMPLEX SEQUENCES
C     INVERSE COMBINED
C
C     NS = NUMBER OF SEQUENCES
C     LS = LENGTH OF SEQUENCES
C
      SUBROUTINE CI(NS,LS,X)
      COMPLEX X(0:LS/2-1,0:1,NS),TMP1
      COMPLEX OMEGA(0:0)
      COMMON /FCCOM/ L,OMEGA
      DO 200 J=1,NS
         DO 100 I=0,LS/2-1
            TMP1 = OMEGA(I*L/LS)*X(I,1,J)
            X(I,1,J) = X(I,0,J) - TMP1
            X(I,0,J) = X(I,0,J) + TMP1
  100    CONTINUE
  200 CONTINUE
      RETURN
      END
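For readers who wish to experiment without a FORTRAN compiler, the two listings above can be mirrored in Python. The following is a sketch under stated assumptions, not a translation endorsed by the thesis: it recomputes the roots of unity on the fly instead of using a table, the function names are hypothetical, and the direct `dft` is included only as a reference for checking the bit-reversed output order.

```python
import cmath

def dft(x):
    """Direct forward DFT (2.1) with the 1/N scaling, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def bit_reverse(k, bits):
    """Reverse the lowest `bits` binary digits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

def fft_natural_to_bitrev(x):
    """In-place forward transform (cf. FFC/CF): natural order in, bit-reversed out."""
    N = len(x)
    ls = N                                   # current sequence length
    while ls >= 2:
        half = ls // 2
        for start in range(0, N, ls):        # one pass per sequence of length ls
            for i in range(half):
                a, b = x[start + i], x[start + half + i]
                x[start + i] = a + b
                x[start + half + i] = cmath.exp(-2j * cmath.pi * i / ls) * (a - b)
        ls = half
    for i in range(N):                       # scaling by 1/N
        x[i] /= N
    return x

def ifft_bitrev_to_natural(X):
    """In-place inverse transform (cf. FIC/CI): bit-reversed order in, natural out."""
    N = len(X)
    ls = 2
    while ls <= N:
        half = ls // 2
        for start in range(0, N, ls):
            for i in range(half):
                t = cmath.exp(2j * cmath.pi * i / ls) * X[start + half + i]
                X[start + half + i] = X[start + i] - t
                X[start + i] = X[start + i] + t
        ls *= 2
    return X

x = [complex(n, 2 - n * n % 5) for n in range(8)]
out = fft_natural_to_bitrev(list(x))
back = ifft_bitrev_to_natural(list(out))
```

Because the forward pass leaves its output in exactly the order the inverse pass expects, the pair can be applied back to back without any permutation step, which is the property emphasized in the text.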
2.2 Real (R)

In this section, we will be concerned with the following symmetries:

Definition 2.3 A real (R) sequence $x_n$ of length N is defined by:

$$\overline{x}_n = x_n$$

A conjugate symmetric (CS) sequence $X_k$ of length N is defined by:

$$X_{N-k} = \overline{X}_k$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.2 If $x_n$ is an R sequence of length N, then its DFT $X_k$ is a CS sequence of length N. If $X_k$ is a CS sequence of length N, then its IDFT $x_n$ is an R sequence of length N.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for CC boundary conditions. Since an R sequence is also periodic with length N, it satisfies CC boundary conditions for the computational domain $0 \le n \le N-1$.

Theorem 2.4 Let $x_n$ be an R sequence and let $X_k$ be its CS symmetric DFT, both of length N. The real form of the DFT is:

$$\mathrm{Re}(X_k) = \frac{1}{N}\sum_{n=0}^{N-1} x_n \cos(2\pi kn/N)$$
$$\mathrm{Im}(X_k) = -\frac{1}{N}\sum_{n=0}^{N-1} x_n \sin(2\pi kn/N)$$

for $0 \le k \le N/2$ if N is even, and $0 \le k \le (N-1)/2$ if N is odd. If N is even, then the real form of the IDFT is:

$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} \{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\}$$
for $0 \le n \le N-1$. If N is odd, we obtain instead:

$$x_n = X_0 + \sum_{k=1}^{(N-1)/2} \{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\}$$

for $0 \le n \le N-1$.

We now prove Theorem 2.4. The result for the DFT follows immediately from Definition 2.1 and the R symmetry of $x_n$. Note that only half of the CS sequence $X_k$ needs to be specified. We prove the result for the IDFT for the case of even N only, since the proof for odd N is similar. Using the CS symmetry of $X_k$ yields:

$$x_n = \sum_{k=0}^{N-1} X_k\, w_N^{kn}
= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\, w_N^{kn} + \sum_{k=1}^{N/2-1} X_{N-k}\, w_N^{n(N-k)}$$

$$= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} X_k\, w_N^{kn} + \sum_{k=1}^{N/2-1} \overline{X}_k\, w_N^{-kn}
= X_0 + (-1)^n X_{N/2} + 2\,\mathrm{Re}\Big[\sum_{k=1}^{N/2-1} X_k\, w_N^{kn}\Big]$$

$$= X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} \{2\,\mathrm{Re}(X_k)\cos(2\pi kn/N) - 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)\}$$

This completes the proof of Theorem 2.4.

We now develop a fast, mixed radix algorithm for computing the R symmetric DFT and its inverse, given $x_n$ in natural order. Note that an R sequence of length N may be stored in N real storage locations, compared to 2N real storage locations for a C sequence of length N. Also, a CS sequence of length N may be stored in N real storage locations because half of the sequence is redundant and need not be stored. Our goal is to exploit these symmetries in the data in order to obtain a reduction by half in both storage
requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the CS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.4 Let $X_k$ be a CS sequence of length N with factor p. For $q \ne 0$, we define CS induced intersequence symmetry (CSIS) by:

$$X_{k,p-q} = \overline{X}_{N/p-k-1,q}$$

For $q \ne 0$, we denote subsequence $X_{k,q}$ by CSIS(q). Subsequence p-q is a redundant copy of subsequence q, which we denote by CSIS(p-q) = CSIS*(q). We also say that subsequence p-q is the dual of subsequence q.

A staggered conjugate symmetric (SCS) sequence $X_k$ of length N is defined by:

$$X_{N-k-1} = \overline{X}_k$$

Let N have factor p. For $0 \le q \le p-1$, we define SCS induced intersequence symmetry (SCSIS) by:

$$X_{k,p-q-1} = \overline{X}_{N/p-k-1,q}$$

For $0 \le q \le p-1$, we denote subsequence $X_{k,q}$ by SCSIS(q). Subsequence p-q-1 is a redundant copy of subsequence q, which we denote by SCSIS(p-q-1) = SCSIS*(q). We also say that subsequence p-q-1 is the dual of subsequence q.

The following lemma establishes the relationship between these symmetries.

Lemma 2.3 Let $X_k$ be a CS sequence of length N with factor p. Then the subsequence $X_{k,0}$ is CS symmetric, and the remaining subsequences $X_{k,q}$ are CSIS symmetric. If p is even, then the CSIS symmetry of subsequence $X_{k,p/2}$ reduces to SCS symmetry.

Let $X_k$ be an SCS sequence of length N with factor p. Then the subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the SCSIS symmetry of subsequence $X_{k,(p-1)/2}$ reduces to SCS symmetry.

We now prove Lemma 2.3. Let $X_k$ be a CS sequence of length N with factor p. The subsequence $X_{k,0}$ satisfies:

$$X_{N/p-k,0} = X_{N-pk} = \overline{X}_{pk} = \overline{X}_{k,0}$$
That is, subsequence $X_{k,0}$ is CS symmetric. The remaining subsequences $X_{k,q}$ satisfy:

$$X_{k,p-q} = X_{pk+p-q} = \overline{X}_{N-pk-p+q} = \overline{X}_{p(N/p-k-1)+q} = \overline{X}_{N/p-k-1,q}$$

That is, for $q \ne 0$ the subsequences $X_{k,q}$ are CSIS symmetric. If p is even, then the CSIS symmetry of $X_{k,p/2}$ reduces to:

$$X_{k,p/2} = \overline{X}_{N/p-k-1,p/2}$$

That is, subsequence $X_{k,p/2}$ is SCS symmetric.

Let $X_k$ be an SCS sequence of length N with factor p. The subsequences $X_{k,q}$ satisfy:

$$X_{k,p-q-1} = X_{pk+p-q-1} = \overline{X}_{N-pk-p+q+1-1} = \overline{X}_{p(N/p-k-1)+q} = \overline{X}_{N/p-k-1,q}$$

That is, the subsequences $X_{k,q}$ are SCSIS symmetric. If p is odd, then the SCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:

$$X_{k,(p-1)/2} = \overline{X}_{N/p-k-1,(p-1)/2}$$

That is, subsequence $X_{k,(p-1)/2}$ is SCS symmetric. This completes the proof of Lemma 2.3.

A mixed radix splitting tree diagram for a CS sequence is shown in Figure 2.2. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant.

The next lemma provides the intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT.

Lemma 2.4 The intersequence symmetry CSIS induces the following intersequence symmetry in the IDFT:

$$y_{n,p-q} = w_{N/p}^{-n}\, \overline{y}_{n,q}$$
Figure 2.2: Splitting tree for R symmetric FFT
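Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:

$$x_n = w_N^{-n}\, \overline{x}_n$$
$$x_n = w_N^{-n/2}\, \tilde{x}_n$$

where $\tilde{x}_n$ is the magnitude of $x_n$, and hence is real. The intersequence symmetry SCSIS induces the following intersequence symmetry in the IDFT:

$$y_{n,p-q-1} = w_{N/p}^{-n}\, \overline{y}_{n,q}$$

We now prove Lemma 2.4. Let $X_{k,q}$ be CSIS symmetric. Then the IDFT of $X_{k,p-q}$ is:

$$y_{n,p-q} = \sum_{k=0}^{N/p-1} X_{k,p-q}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} \overline{X}_{N/p-k-1,q}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} \overline{X}_{k,q}\, w_{N/p}^{n(N/p-k-1)}
= w_{N/p}^{-n} \sum_{k=0}^{N/p-1} \overline{X}_{k,q}\, w_{N/p}^{-kn}
= w_{N/p}^{-n}\, \overline{y}_{n,q}$$

Let $X_k$ be an SCS sequence of length N. Its IDFT $x_n$ satisfies:

$$x_n = \sum_{k=0}^{N-1} X_k\, w_N^{kn}
= \sum_{k=0}^{N-1} X_{N-k-1}\, w_N^{n(N-k-1)}
= w_N^{-n} \sum_{k=0}^{N-1} \overline{X}_k\, w_N^{-kn}
= w_N^{-n}\, \overline{x}_n$$

We express $x_n$ in polar form as follows:

$$x_n = \tilde{x}_n\, e^{i\theta}$$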
Substituting this into the preceding symmetry for $x_n$ and solving for $\theta$ leads to:

$$x_n = w_N^{-n/2}\, \tilde{x}_n$$

Let $X_{k,q}$ be SCSIS symmetric. Then the IDFT of $X_{k,p-q-1}$ is:

$$y_{n,p-q-1} = \sum_{k=0}^{N/p-1} X_{k,p-q-1}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} \overline{X}_{N/p-k-1,q}\, w_{N/p}^{kn}
= \sum_{k=0}^{N/p-1} \overline{X}_{k,q}\, w_{N/p}^{n(N/p-k-1)}
= w_{N/p}^{-n} \sum_{k=0}^{N/p-1} \overline{X}_{k,q}\, w_{N/p}^{-kn}
= w_{N/p}^{-n}\, \overline{y}_{n,q}$$

This completes the proof of Lemma 2.4.

The preceding lemma shows that each symmetry appearing in Figure 2.2 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the R symmetric IFFT.

Theorem 2.5 Assume that p is even. The inverse combine equation for CS, SCS, and CSIS sequences is:

$$x_{n,l} = y_{n,0} + (-1)^l\, \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.5}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. Note that $x_{n,l}$ is real because $y_{n,0}$ is real. The inverse combine equation for SCSIS sequences is:

$$\tilde{x}_{n,l} = 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.6}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.
Next, assume that p is odd. The inverse combine equation for CS and CSIS sequences is:

$$x_{n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.7}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. The inverse combine equation for SCS and SCSIS sequences is:

$$\tilde{x}_{n,l} = (-1)^l\, \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.8}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 2.5. First, assume that p is even. Consider the combining of CS, SCS, and CSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}
= y_{n,0} + w_p^{lp/2}\, w_N^{np/2}\, y_{n,p/2} + \sum_{q=1}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{l(p-q)}\, w_N^{n(p-q)}\, y_{n,p-q}$$

$$= y_{n,0} + (-1)^l\, w_N^{np/2}\,\big[w_{N/p}^{-n/2}\, \tilde{y}_{n,p/2}\big] + \sum_{q=1}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=1}^{p/2-1} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}$$

$$= y_{n,0} + (-1)^l\, \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big]$$

Consider the combining of SCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}$$
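$$= \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=0}^{p/2-1} w_p^{l(p-q-1)}\, w_N^{n(p-q-1)}\, y_{n,p-q-1}
= \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} + w_p^{-l}\, w_N^{-n} \sum_{q=0}^{p/2-1} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}$$

Using SCS symmetry yields:

$$x_{n,l} = x_{lN/p+n} = w_N^{-(lN/p+n)/2}\, \tilde{x}_{lN/p+n} = w_p^{-l/2}\, w_N^{-n/2}\, \tilde{x}_{n,l} \tag{2.9}$$

Substituting this into the combine equation above yields:

$$\tilde{x}_{n,l} = w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q} + w_p^{-l/2}\, w_N^{-n/2} \sum_{q=0}^{p/2-1} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}
= 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big]$$

Next, assume that p is odd. Consider the combining of CS and CSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{l(p-q)}\, w_N^{n(p-q)}\, y_{n,p-q}$$

$$= y_{n,0} + \sum_{q=1}^{(p-1)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=1}^{(p-1)/2} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}
= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big]$$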
Consider the combining of SCS and SCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (2.3) yields:

$$x_{n,l} = \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{nq}\, y_{n,q}
= w_p^{l(p-1)/2}\, w_N^{n(p-1)/2}\, y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q} + \sum_{q=0}^{(p-3)/2} w_p^{l(p-q-1)}\, w_N^{n(p-q-1)}\, y_{n,p-q-1}$$

$$= w_p^{l(p-1)/2}\, w_N^{n(p-1)/2}\, y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q} + w_p^{-l}\, w_N^{-n} \sum_{q=0}^{(p-3)/2} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}$$

Combining this with equation (2.9) yields:

$$\tilde{x}_{n,l} = w_p^{lp/2}\, w_N^{np/2}\, y_{n,(p-1)/2} + w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q} + w_p^{-l/2}\, w_N^{-n/2} \sum_{q=0}^{(p-3)/2} w_p^{-lq}\, w_N^{-nq}\, \overline{y}_{n,q}$$

$$= (-1)^l\, \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{nq}\, y_{n,q}\Big]$$

This completes the proof of Theorem 2.5.

The following corollary provides an important special case of this result. These are the same as equations (6) and (7) in [1], except that we are working with the IDFT.

Corollary 2.4 Assume p = 2. The inverse combine equation for CS and SCS sequences is:

$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}$$
$$x_{n,1} = y_{n,0} - \tilde{y}_{n,1}$$

for $0 \le n \le N/2-1$. The inverse combine equation for SCSIS sequences is:

$$\tilde{x}_{n,0} = 2\,\mathrm{Re}[w_N^{n/2}\, y_{n,0}]$$
$$\tilde{x}_{n,1} = -2\,\mathrm{Im}[w_N^{n/2}\, y_{n,0}]$$

for $0 \le n \le N/2-1$.
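The first level of the R symmetric IFFT, equations (2.5) and (2.7), can be illustrated as follows. This is a sketch under stated assumptions rather than the compact algorithm itself: the subsequence IDFTs are obtained by direct summation, the tilde quantity is formed as $\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2}$, and all function names are hypothetical. Only the subsequences $q \le p/2$ are used; the dual subsequences are never touched.

```python
import cmath

def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N); e may be a half-integer."""
    return cmath.exp(2j * cmath.pi * e / N)

def dft(x):
    """Direct forward DFT (2.1) with the 1/N scaling."""
    N = len(x)
    return [sum(x[n] * w(N, -k * n) for n in range(N)) / N for k in range(N)]

def idft(X):
    """Direct inverse DFT (2.2)."""
    N = len(X)
    return [sum(X[k] * w(N, k * n) for k in range(N)) for n in range(N)]

def real_inverse_combine(X, p):
    """Rebuild the real sequence x from a CS sequence X via (2.5) or (2.7)."""
    N = len(X)
    M = N // p
    half = p // 2
    y = [idft(X[q::p]) for q in range(half + 1)]     # subsequences 0..floor(p/2) only
    x = [0.0] * N
    for l in range(p):
        for n in range(M):
            total = y[0][n].real                     # y_{n,0} is real (CS subsequence)
            if p % 2 == 0:
                # SCS subsequence q = p/2: w_{N/p}**(n/2) * y is real
                y_tilde = (w(N, n * p / 2) * y[half][n]).real
                total += (-1) ** l * y_tilde
                top = half - 1                       # CSIS subsequences 1..p/2-1
            else:
                top = half                           # CSIS subsequences 1..(p-1)/2
            s = sum(w(p, l * q) * w(N, n * q) * y[q][n] for q in range(1, top + 1))
            x[l * M + n] = total + 2 * s.real
    return x

x = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5, 2.0, 0.0, -3.0, 1.0, 2.5, -0.5]
X = dft(x)
```

Roughly half of the subsequence transforms are skipped relative to equation (2.3), which is the source of the savings the text describes.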
The next theorem provides all of the forward combine equations for the R symmetric FFT.

Theorem 2.6 Assume that p is even. The forward combine equation for CS, SCS, and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$ and:

$$\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\, x_{n,l} \tag{2.10}$$

for $0 \le n \le N/p-1$. The forward combine equation for SCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} \tag{2.11}$$

for $0 \le n \le N/p-1$, $0 \le q \le p/2-1$.

Next, assume that p is odd. The forward combine equation for CS and CSIS sequences is given by equation (2.4) for $0 \le n \le N/p-1$, $0 \le q \le (p-1)/2$. The forward combine equation for SCS and SCSIS sequences is given by equation (2.11) for $0 \le n \le N/p-1$, $0 \le q \le (p-3)/2$ and:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\, \tilde{x}_{n,l} \tag{2.12}$$

for $0 \le n \le N/p-1$.

We now prove Theorem 2.6. First, assume that p is even. The forward combining of CS, SCS, and CSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2}\, x_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\, x_{n,l}$$

The forward combine equation for SCSIS sequences is obtained by substituting equation (2.9) into equation (2.4):

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$
$$= \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\,\big[w_p^{-l/2}\, w_N^{-n/2}\, \tilde{x}_{n,l}\big]
= \frac{1}{p}\, w_N^{-n(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)}\, \tilde{x}_{n,l}$$

Next, assume that p is odd. The forward combining of CS and CSIS sequences does not require any new equations. The forward combining of SCS and SCSIS sequences requires one new equation:

$$\tilde{y}_{n,(p-1)/2} = w_{N/p}^{n/2}\, y_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2}\, \tilde{x}_{n,l} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l\, \tilde{x}_{n,l}$$

This completes the proof of Theorem 2.6.

The following corollary provides an important special case of this result. These are the same as equations (11) and (12) in [1], except that we are working with the IDFT.

Corollary 2.5 Assume p = 2. The forward combine equation for CS and SCS sequences is:

$$y_{n,0} = (x_{n,0} + x_{n,1})/2$$
$$\tilde{y}_{n,1} = (x_{n,0} - x_{n,1})/2$$

for $0 \le n \le N/2-1$. The forward combine equation for SCSIS sequences is:

$$y_{n,0} = w_N^{-n/2}\,(\tilde{x}_{n,0} - i\,\tilde{x}_{n,1})/2$$

for $0 \le n \le N/2-1$.
2.3 Real Even (RE)

In this section, we will be concerned with the following symmetries:

Definition 2.5 A real even (RE) sequence $x_n$ of length N is defined by:

$$\overline{x}_n = x_n$$
$$x_{N-n} = x_n$$

Note that an RE sequence may also be viewed as having both R and CS symmetry, which we denote by RCS.

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.5 If $x_n$ is an RE sequence of length N, then its DFT $X_k$ is an RCS sequence of length N. If $X_k$ is an RCS sequence of length N, then its IDFT $x_n$ is an RE sequence of length N.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for NN boundary conditions. Note that if N is even, then an RE sequence satisfies NN boundary conditions for the computational domain $0 \le n \le N/2$. That is:

$$x_{-1} = x_1$$
$$x_{N/2+1} = x_{N/2-1}$$

Theorem 2.7 Let $x_n$ be an RE sequence and let $X_k$ be its RCS symmetric DFT, both of length N where N is even. The real form of the DFT is:

$$X_k = \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big]$$

for $0 \le k \le N/2$. The real form of the IDFT is:

$$x_n = X_0 + (-1)^n X_{N/2} + \sum_{k=1}^{N/2-1} 2X_k \cos(2\pi kn/N)$$

for $0 \le n \le N/2$. Note that the results for the DFT and IDFT are identical except for scaling.
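The two real forms in Theorem 2.7 can be evaluated directly on the stored half of the data. The following Python sketch is illustrative only; the function names are hypothetical, and the $O(N^2)$ summations stand in for the fast algorithm developed in this section. The input to each function is the list of values with indices 0 through N/2.

```python
import math

def re_dft(x_half):
    """Real form of the DFT for an RE sequence, given x_0..x_{N/2}."""
    h = len(x_half) - 1                  # h = N/2
    N = 2 * h
    return [(x_half[0] + (-1) ** k * x_half[h]
             + sum(2 * x_half[n] * math.cos(2 * math.pi * k * n / N)
                   for n in range(1, h))) / N
            for k in range(h + 1)]

def re_idft(X_half):
    """Real form of the IDFT for an RCS sequence, given X_0..X_{N/2}."""
    h = len(X_half) - 1
    N = 2 * h
    return [X_half[0] + (-1) ** n * X_half[h]
            + sum(2 * X_half[k] * math.cos(2 * math.pi * k * n / N)
                  for k in range(1, h))
            for n in range(h + 1)]

x_half = [1.0, 2.0, 3.0, 4.0, 5.0]       # x_0..x_4, so N = 8
X_half = re_dft(x_half)
x_back = re_idft(X_half)
```

As the theorem notes, the two functions differ only by the factor $1/N$, so a single routine with an optional scaling would serve for both directions.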
We now prove Theorem 2.7. The result for the DFT follows from Theorem 2.4, the RCS symmetry of $X_k$, and the RE symmetry of $x_n$ as follows:

$$X_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n \cos(2\pi kn/N)
= \frac{1}{N}\Big\{x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} x_n \cos(2\pi kn/N) + \sum_{n=1}^{N/2-1} x_{N-n} \cos[2\pi k(N-n)/N]\Big\}$$

$$= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big]$$

The result for the IDFT follows immediately from Theorem 2.4 and the RCS symmetry of $X_k$. Note that only half of the RE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.7.

We now develop a fast, mixed radix algorithm for computing the RE symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RE sequence of length N may be stored in N/2 real storage locations, compared to 2N real storage locations for a C sequence of length N. Similarly, an RCS sequence of length N may be stored in N/2 real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both storage requirements and number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.6 Let $X_k$ be an RCS sequence of length N with factor p. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 2.4, with the addition that all sequences are real as well. We indicate this by preceding the acronym for each symmetry with an R.

The relationships between the symmetries recorded in Lemma 2.3 are not affected by the fact that all sequences have R symmetry as well. A mixed radix splitting tree diagram for an RCS sequence is shown in Figure 2.3. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT.
Figure 2.3: Splitting tree for RE symmetric FFT
Lemma 2.6 The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemma 2.4, with the following addition. Let $X_k$ be an R sequence of length N. Its IDFT $x_n$ satisfies:

$$x_{N-n} = \overline{x}_n$$

Since all sequences have R symmetry, only half of the IDFT of any sequence needs to be computed.

We now prove Lemma 2.6. Let $X_k$ be an R sequence of length N. Its IDFT $x_n$ satisfies:

$$x_{N-n} = \sum_{k=0}^{N-1} X_k\, w_N^{k(N-n)} = \sum_{k=0}^{N-1} X_k\, w_N^{-kn} = \overline{x}_n$$

This completes the proof of Lemma 2.6.

The preceding lemma shows that each symmetry appearing in Figure 2.3 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RE symmetric IFFT.

Theorem 2.8 Assume that p is even. The inverse combine equation for RCS, RSCS, and RCSIS sequences is given by equation (2.5) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n,l} = y_{n,0} + (-1)^{l+1}\, \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.13}$$

for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for RSCSIS sequences is given by equation (2.6) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$\tilde{x}_{N/p-n,l} = 2\,\mathrm{Re}\Big[w_p^{-(l+1)/2}\, w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.14}$$
for the lower half-range of n and $0 \le l \le p/2-1$. The inverse combine equation for R sequences is given by equation (2.3) for the lower half-range of n and $0 \le l \le p/2-1$. We also need the companion equation:

$$x_{N/p-n,l} = \sum_{q=0}^{p-1} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q} \tag{2.15}$$

for the lower half-range of n and $0 \le l \le p/2-1$.

Next, assume that p is odd. The inverse combine equation for RCS and RCSIS sequences is given by equation (2.7) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$x_{N/p-n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.16}$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for RSCS and RSCSIS sequences is given by equation (2.8) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation:

$$\tilde{x}_{N/p-n,l} = (-1)^{l+1}\, \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{-(l+1)/2}\, w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big] \tag{2.17}$$

for the lower half-range of n and $0 \le l \le (p-3)/2$. The inverse combine equation for R sequences is given by equation (2.3) for the lower half-range of n and $0 \le l \le (p-1)/2$. We also need the companion equation (2.15) for the lower half-range of n and $0 \le l \le (p-3)/2$.

We now prove Theorem 2.8. First, assume that p is even. Consider the combining of RCS, RSCS, and RCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.5), we need the following companion equation:

$$x_{N/p-n,l} = y_{N/p-n,0} + (-1)^l\, \tilde{y}_{N/p-n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{lq}\, w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big]$$
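Using RSCS symmetry yields:

$$\tilde{y}_{N/p-n,q} = w_{N/p}^{(N/p-n)/2}\, y_{N/p-n,q} = -w_{N/p}^{-n/2}\, \overline{y}_{n,q} = -\tilde{y}_{n,q} \tag{2.18}$$

Substituting this into the companion equation above, together with the R symmetry relation $y_{N/p-n,q} = \overline{y}_{n,q}$ of Lemma 2.6, yields:

$$x_{N/p-n,l} = y_{n,0} + (-1)^{l+1}\, \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q}\Big]
= y_{n,0} + (-1)^{l+1}\, \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{p/2-1} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big]$$

Consider the combining of RSCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.6), we need the following companion equation:

$$\tilde{x}_{N/p-n,l} = 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{(N/p-n)/2} \sum_{q=0}^{p/2-1} w_p^{lq}\, w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big]
= 2\,\mathrm{Re}\Big[w_p^{(l+1)/2}\, w_N^{-n/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q}\Big]$$

$$= 2\,\mathrm{Re}\Big[w_p^{-(l+1)/2}\, w_N^{n/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big]$$

Consider the combining of R sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.3), we need the following companion equation:

$$x_{N/p-n,l} = \sum_{q=0}^{p-1} w_p^{lq}\, w_N^{q(N/p-n)}\, y_{N/p-n,q} = \sum_{q=0}^{p-1} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q}$$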
Next, assume that p is odd. Consider the combining of RCS and RCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.7), we need the following companion equation:

$$x_{N/p-n,l} = y_{N/p-n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{lq}\, w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big]
= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q}\Big]$$

$$= y_{n,0} + 2\,\mathrm{Re}\Big[\sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big]$$

Consider the combining of RSCS and RSCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (2.8), we need the following companion equation:

$$\tilde{x}_{N/p-n,l} = (-1)^l\, \tilde{y}_{N/p-n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{l/2}\, w_N^{(N/p-n)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq}\, w_N^{q(N/p-n)}\, y_{N/p-n,q}\Big]$$

Substituting equation (2.18) into the companion equation above yields:

$$\tilde{x}_{N/p-n,l} = (-1)^{l+1}\, \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{(l+1)/2}\, w_N^{-n/2} \sum_{q=0}^{(p-3)/2} w_p^{q(l+1)}\, w_N^{-nq}\, \overline{y}_{n,q}\Big]$$

$$= (-1)^{l+1}\, \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[w_p^{-(l+1)/2}\, w_N^{n/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)}\, w_N^{nq}\, y_{n,q}\Big]$$

The companion equation for R sequences is identical to the even p case. This completes the proof of Theorem 2.8.

The following corollary provides an important special case of this result.

Corollary 2.6 Assume p = 2. The inverse combine equation for RCS and RSCS sequences is:

$$x_{n,0} = y_{n,0} + \tilde{y}_{n,1}$$
$$x_{N/2-n,0} = y_{n,0} - \tilde{y}_{n,1}$$
for the lower half-range of n. The inverse combine equation for RSCSIS sequences is:

$$\tilde{x}_{n,0} = 2\,\mathrm{Re}[w_N^{n/2}\, y_{n,0}]$$
$$\tilde{x}_{N/2-n,0} = 2\,\mathrm{Im}[w_N^{n/2}\, y_{n,0}]$$

for the lower half-range of n. The inverse combine equation for R sequences is:

$$x_{n,0} = y_{n,0} + w_N^{n}\, y_{n,1}$$
$$x_{N/2-n,0} = \overline{y}_{n,0} - w_N^{-n}\, \overline{y}_{n,1}$$

for the lower half-range of n.

The next theorem provides all of the forward combine equations for the RE symmetric FFT.

Theorem 2.9 Assume that p is even. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq}\Big\{x_{n,0} + (-1)^q\, \overline{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \overline{x}_{-n,l}\big]\Big\} \tag{2.19}$$

for the lower half-range of n and $0 \le q \le p-1$. Note that $y_{0,q}$ is real because $x_{0,0} = x_0$ and $x_{0,p/2} = x_{N/2}$ are both real. This ensures that the final output is real because n = 0 in the last stage of the algorithm. The forward combine equation for RCS, RSCS, and RCSIS sequences is given by equation (2.19) for the lower half-range of n and $0 \le q \le p/2-1$, with the exception that all sequences $x_{n,l}$ are real. In addition:

$$\tilde{y}_{n,p/2} = \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l\,\big[x_{n,l} + x_{-n,l}\big]\Big\} \tag{2.20}$$

for the lower half-range of n. The forward combine equation for RSCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i\,(-1)^q\, \tilde{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \tilde{x}_{-n,l}\big]\Big\} \tag{2.21}$$
for the lower half-range of n and $0 \le q \le p/2-1$. Note that $y_{0,q}$ is real because $\tilde{x}_{0,p/2} = \tilde{x}_{N/2} = 0$.

Next, assume that p is odd. The forward combine equation for R sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq}\Big\{x_{n,0} + \sum_{l=1}^{(p-1)/2} \big[w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \overline{x}_{-n,l}\big]\Big\} \tag{2.22}$$

for the lower half-range of n and $0 \le q \le p-1$. The forward combine equation for RCS and RCSIS sequences is given by equation (2.22) for the lower half-range of n and $0 \le q \le (p-1)/2$, with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for RSCS and RSCSIS sequences is:

$$y_{n,q} = \frac{1}{p}\, w_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} \big[w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \tilde{x}_{-n,l}\big]\Big\} \tag{2.23}$$

for the lower half-range of n and $0 \le q \le (p-3)/2$. In addition:

$$\tilde{y}_{n,(p-1)/2} = \frac{1}{p}\Big\{\tilde{x}_{n,0} + \sum_{l=1}^{(p-1)/2} (-1)^l\,\big[\tilde{x}_{n,l} + \tilde{x}_{-n,l}\big]\Big\} \tag{2.24}$$

for the lower half-range of n.

We now prove Theorem 2.9. First, assume that p is even. The forward combine equation for R sequences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data. For this purpose, we will need the following result, which is valid for all R sequences:

$$x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = \overline{x}_{(l+1)N/p-n} = \overline{x}_{-n,l+1}$$

Using this result, we obtain:

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$
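$$= \frac{1}{p}\, w_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{-q(p-l-1)}\, x_{n,p-l-1}\Big\}
= \frac{1}{p}\, w_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{q(l+1)}\, \overline{x}_{-n,l+1}\Big\}$$

$$= \frac{1}{p}\, w_N^{-nq}\Big\{\sum_{l=0}^{p/2-1} w_p^{-lq}\, x_{n,l} + \sum_{l=1}^{p/2} w_p^{lq}\, \overline{x}_{-n,l}\Big\}
= \frac{1}{p}\, w_N^{-nq}\Big\{x_{n,0} + (-1)^q\, \overline{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \overline{x}_{-n,l}\big]\Big\}$$

The forward combining of RCS, RSCS, and RCSIS sequences requires one new equation:

$$\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2} = \frac{1}{p}\Big\{x_{n,0} + (-1)^{p/2}\, x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l\,\big[x_{n,l} + x_{-n,l}\big]\Big\}$$

The forward combine equation for RSCSIS sequences is obtained by substituting equation (2.9) into equation (2.19):

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq}\Big\{x_{n,0} + (-1)^q\, \overline{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[w_p^{-lq}\, x_{n,l} + w_p^{lq}\, \overline{x}_{-n,l}\big]\Big\}$$

$$= \frac{1}{p}\, w_N^{-n(q+1/2)}\Big\{\tilde{x}_{n,0} + i\,(-1)^q\, \tilde{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[w_p^{-l(q+1/2)}\, \tilde{x}_{n,l} + w_p^{l(q+1/2)}\, \tilde{x}_{-n,l}\big]\Big\}$$

Next, assume that p is odd. The forward combine equation for R sequences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data.

$$y_{n,q} = \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq}\, x_{n,l}$$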
= 1/pw!{nq{w;(p1)12xn,(p1)/2 + (p3)/2 (p3)/2 ""' lq + ""' q(pl1) } L wp Zn,l L.....J wp Xn,pl1 l=O l=O = 1/p w!{nq{w;q(p1)!2xn,(p1)/2 + (p3)/2 (p3)/2 L w;1 qxn,l + L 1=0 1=0 = 1jpw!{nq{w;q(p1)!2 a:n,(p1)/2 + (p3)/2 (p1 )/2 ""' lq + ""' lq} L.t wp Zn,l L wp Zn,l 1=0 1=1 (p1)/2 1/pw!{nq{xn,o + L [w;1xn,l + l=l The forward combining of RCS and RCSIS sequences does not require any new equations. The forward combine equation for RSCS and RSCSIS sequences is obtained by substituting equation (2.9) into equation (2.22): (p1)/2 Yn,q = 1/pw!{nq{xn,o + L [w;1xn,l + l:::::l (p1)/2 1/pwn(q+l/2){.;; + [wl(q+l/2).;; ..._ wl(q+l/2);;, ]} N ""'n,O L.....J p .vn,l p n,l 1=1 For q = (p 1) /2 this reduces to: Yn,(p1)/2 = n/2 WNjpYn,(p1)/2 (p1)/2 1/p{:i:n,o + L ( 1)1[in,z+ '"n,zl} 1=1 This completes the proof of Theorem 2. 9. The following corollary provides an important special case of this result. Corollary 2. 7 Assume p = 2. The forward combine equation for R sequences is: Yn,O (xn,O + Xn,1)/2 Yn,1 = wNn(xn,OXn,l)/2 42
for the lower halfrange of n. The forward combine equation for RCS and RSCS sequences is: Yn,O = (xn,O + '"n,l)/2 Yn,l = (xn,O'"n,l)/2 for the lower halfrange of n. The forward combine equation for RSCSIS sequences 'LS.' n/2(+ ._ )/2 Yn,O = W N Xn,O taLn,l for the lower halfrange of n. 43
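The radix-2 forward combine for R sequences can be checked numerically. The following is a minimal sketch, not the thesis code. It assumes NumPy, the convention $x_n = \sum_k X_k e^{2\pi i kn/N}$ for the IDFT, and reads the corollary as $y_{n,0} = (x_{n,0} + \bar{x}_{-n,1})/2$ and $y_{n,1} = w_N^{-n}(x_{n,0} - \bar{x}_{-n,1})/2$ with $x_{-n,1} = x_{N/2-n}$. The helper `idft` and all variable names are illustrative.

```python
import numpy as np

def idft(X):
    """Unscaled inverse DFT, x_n = sum_k X_k exp(2*pi*i*k*n/N) (assumed convention)."""
    return np.fft.ifft(X) * len(X)

rng = np.random.default_rng(0)
N = 16
half = N // 2
X = rng.standard_normal(N)               # R sequence: real coefficients X_k
x = idft(X)                              # satisfies x_{N-n} = conj(x_n)

n = np.arange(half)
x_n0 = x[:half]                          # x_{n,0} = x_n
x_n1 = x[half:]                          # x_{n,1} = x_{n+N/2}
x_mn1_bar = np.conj(x[(half - n) % N])   # conj(x_{-n,1}) = conj(x_{N/2-n})

w_N = np.exp(2j * np.pi / N)
y0 = (x_n0 + x_mn1_bar) / 2
y1 = w_N ** (-n) * (x_n0 - x_mn1_bar) / 2

y0_ref = idft(X[0::2])                   # IDFT of the even-indexed coefficients
y1_ref = idft(X[1::2])                   # IDFT of the odd-indexed coefficients
print(np.allclose(x_n1, x_mn1_bar), np.allclose(y0, y0_ref), np.allclose(y1, y1_ref))
```

The first comparison confirms that the upper half of $x_n$ is redundant, which is why only $x_{n,0}$ and the reflected values $x_{-n,1}$ enter the compact equations.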
2.4 Real Odd (RO)

In this section, we will be concerned with the following symmetries:

Definition 2.7 A real odd (RO) sequence $x_n$ of length $N$ is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n.$$

An imaginary odd (IO) sequence, or equivalently an imaginary conjugate symmetric (ICS) sequence, $X_k$ of length $N$ is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k.$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.7 If $x_n$ is an RO sequence of length $N$, then its DFT $X_k$ is an ICS sequence of length $N$. If $X_k$ is an ICS sequence of length $N$, then its IDFT $x_n$ is an RO sequence of length $N$.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DD boundary conditions. Note that if $N$ is even, then an RO sequence satisfies DD boundary conditions for the computational domain $1 \le n \le N/2-1$. That is:

$$x_0 = 0, \qquad x_{N/2} = 0.$$

Theorem 2.10 Let $x_n$ be an RO sequence and let $X_k$ be its ICS symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:

$$\mathrm{Im}(X_k) = -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)$$

for $1 \le k \le N/2-1$. The real form of the IDFT is:

$$x_n = -\sum_{k=1}^{N/2-1} 2\,\mathrm{Im}(X_k)\sin(2\pi kn/N)$$
for $1 \le n \le N/2-1$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.10. The result for the DFT follows from Theorem 2.4, the ICS symmetry of $X_k$, and the RO symmetry of $x_n$ as follows:

$$\begin{aligned}
\mathrm{Im}(X_k) &= -\frac{1}{N}\sum_{n=0}^{N-1} x_n \sin(2\pi kn/N) \\
&= -\frac{1}{N}\sum_{n=1}^{N/2-1} \{x_n \sin(2\pi kn/N) + x_{N-n}\sin[2\pi k(N-n)/N]\} \\
&= -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)
\end{aligned}$$

The result for the IDFT follows immediately from Theorem 2.4 and the ICS symmetry of $X_k$. Note that only half of the RO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.10.

We now develop a fast, mixed radix algorithm for computing the RO symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RO sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICS sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 2.8 Let $X_k$ be an ICS sequence of length $N$ with factor $p$. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 2.4, with the addition that all sequences are pure imaginary as well. We indicate this by preceding the acronym for each symmetry with an I.

The relationships between the symmetries recorded in Lemma 2.3 are not affected by the fact that all sequences have I symmetry as well. A mixed radix splitting tree diagram for an ICS sequence is shown in Figure 2.4. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual
Figure 2.4: Splitting tree for RO symmetric FFT
sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences. The next lemma provides the intermediate symmetries in the IDFT in duced by the intermediate symmetries in the DFT. Lemma 2.8 The intermediate symmetries in the IDFT induced by the in termediate symmetries in the DFT are identical to those in Lemma 2.4, with the following addition. Let Xk be an I sequence of length N. Its IDFT Xn satisfies: Since all sequences have I symmetry, only half of the IDFT of any sequence needs to be computed. We now prove Lemma 2.8. Let Xk be an I sequence of length N. Its IDFT Xn satisfies: N1 '"Nn L k=O N1 I.: xkw].r k=O Xn This completes the proof of Lemma 2.8. The preceding lemma shows that each symmetry appearing in Figure 2.4 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next theorem provides all of the inverse combine equations for the RO symmetric IFFT. Theorem 2.11 Assume that p is even. The inverse combine equation for ICS, ISCS, and ICSIS sequences is given by equation (2.5) for the lower halfrange of n and 0 ::; l ::; p/21. We also need the companion equation: "'Nfpn,l (2.25) q=1 47
for the lower halfrange of n and 0 :0: l :0: p/2 1. The inverse combine equation for ISCSIS sequences is given by equation {2. 6) for the lower half range of n and 0 :0: l :0: p/2 1. We also need the companion equation: p/21 2R [ (1+1)/2 n/2 ,, .q(h 1) nq 1 ZNjpn,lewp WN L wp WNYn,qJ q:::O (2.26) for the lower halfrange of n and 0 :0: l :0: p/2 1. The inverse combine equation for I sequences is given by equation {2. 3) for the lower halfrange of n and 0 :0: l :0: p/21. We also need the companion equation: (2.27) for the lower halfrange of n and 0 :0: l :0: p/21. Next, assume that p is odd. The inverse combine equation for ICS and ICSIS sequences is given by equation {2. 7) for the lower halfrange of n and 0 :0: l :0: (p1)/2. We also need the companion equation: (p1)/2 2 R "" q(l+l) nq i 'J'.Njpn,l = Yn,Ol L...,; wp WN Yn,q.J (2.28) q:::::l for the lower halfrange of n and 0 :0: l :0: (p3)/2. The inverse combine equation for ISCS and ISCSIS sequences is given by equation {2. 8) for the lower halfrange of n and 0 :0: l :0: (p1)/2. We also need the companion equation: ii;Nfpn,l fin,(p1)/2(2.29) for the lower halfrange of n and 0 :0: l :0: (p3)/2. The inverse combine equation for I sequences is given by equation {2. 3) for the lower halfrange ofn and 0:0: l :0: (p1)/2. We also need the companion equation {2.27} for the lower halfrange ofn and 0:0: l :0: (p3)/2. We now prove Theorem 2.11. First, assume that pis even. Consider the combining of ICS, ISCS, and ICSIS sequences. Since we will compute only 48
half of each sequence Yn,q on the right hand side of equation (2.5), we need the following companion equation: "'Nfpn,l YN/pn,O + ( 1)1YN/pn,p/2 + p/21 2R [ lq q(Njpn) ] e L.WP WN YNjpn,q q=l Using ISCS symmetry yields: YNjpn,q (Njpn)/2 WN/p YNjpn,q n/2tWNjp Yn,q = +fJn,q Substituting this into the companion equation above yields: "'N/pn,l Yn,O + (1)1fJn,p/2p/21 2Re[ wq(l+l)wnqy l L p N n,q. q=1 = Yn,o + ( 1)1Yn,pf2p/22Ref wq(l+ l)wnqy ] .. L P N n,q q=l (2.30) Consider the combining ofiSCSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equation (2.6), we need the following companion equation: p/21 2 R [ 1/2 (Njpn)/2 lq q(Njpn) ] "'Nfpn,l e wp WN LWP wN YNjpn,q q=O p/21 = 2 R [ (1+1)/2 n/2 q(l+l) .nq] ewp WN L wP wN Yn,q q::::O p/21 2Re[w(l+1)12wn/2 wq(l+1)wnqy 1 p N L p N n,q.J q=O Consider the combining of I sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equation (2.3), we need the 49
following companion equation: '"Nfpn,l = p1 "' lq q(Njpn) L.. WP WN YNjpn,q q:::::O p1 "'wq(l+1 lw nq_y L p N n,q q=O Next, assume that pis odd. Consider the combining of ICS and ICSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equation (2. 7), we need the following companion equation: 3! Njpn,l (p1)/2 2 R r "' lq q(Njpn) ] l eL L wp WN YNjpn,q q=l (p1)/2 Yn,O 2Re[ w;(l+1 )WNnqYn,q] q=l (p1)/2 Yn,o 2Re[ q=l Consider the combining of ISCS and ISCSIS sequences. Since we will compute only half of each sequence Yn,q on the right hand side of equa tion (2.8), we need the following companion equation: itNjpn,l YN jpn,(p1 )/2 + (p3)/2 2R [ 1/2 (Njpn)/2 "' lq q(N/pn) ] e WP wN L WP WN YNjpn,q q::.:O Substituting equation (2.30) into the companion equation above yields: ffn,(p1)/2ffn,(p1)/2 50
The companion equation for I sequences is identical to the even p case. This completes the proof of Theorem 2 .11. The following corollary provides an important special case of this result. Corollary 2.8 Assume p = 2. The inverse combine equation for ICS and ISCS sequences is: Xn,O Yn,O + f/n,l XN/2n,O Yn,fJ + fJn,l for the lower halfrange of n. The inverse combine equation for ISCSIS sequences zs: = for the lower halfrange of n. The inverse combine equation for I sequences ts: Yn,o + wl\rYn,1 X N/2n,O 1 Yn,O T i.I:N Yn,l for the lower halfrange of n. The next theorem provides all of the forward combine equations for the RO symmetric FFT. Theorem 2.12 Assume that p is even. The forward combine equation for I sequences is: Yn,q 1/pwj\,.nq{xn,O + ( l)q+l;;;_n,p/2 + p/21 "' [ lq ,lq'} L WP Xn,l wp Xn,lJ (2.31) l=l for the lower halfrange of n and 0 :S q :S p 1. Note that Yo,q is pure imaginary because xo,o = x0 and '"o,pj2 = x N/2 are both pure imaginary. This ensures that the final output is pure imaginary because n = 0 in the last stage of the algorithm. The forward combine equation for ICS, ISCS, and ICSIS sequences is given by equation {2.31} for the lower halfrange of 51
n and 0 C: q C: p/21 with the exception that all sequences Xn,l are real. In addition: Yn,p/2 1/p{xn,O + ( 1)P/2+1:z:_n,p/2 + p/21 L ( 1)1[xn,l"'n,l]} 1=1 (2.32) for the lower halfrange of n. Note that Yo,p;2 = 0 because xo,o = xo = 0 and Xo,pj2 = "'N/2 = 0. The forward combine equation for ISCSIS sequences is: 1 / n(q+l/2){. "( 1)q+l+ Yn,q = PWN Xn,O T 't Xn,p/2 P/21 "' [wl(q+l/2);; Iwl(q+1/2);, __ tl} L..J p n, p n, (2.33) 1=1 for the lower halfrange of n and 0 S q S p /2 l. Note that Yo,q zs pure imaginary because xo,o = :co = 0. Next, assume that p is odd. The forward combine equation for I sequences 'tS.' (p1)/2 1 / nq{ + "' r lq lql} Yn,qpwN Xn,O lwp Xn,lWp Zn,l_, (2.34) 1=1 for the lower halfrange of n and 0 C: q C: p 1. The forward combine equation for ICS and ICSIS sequences is given by equation (2.34} for the lower halfrange of n and 0 C: q C: (p 1) /2 with the exception that all sequences Xn,l are real. The forward combine equation for ISCS and ISCSIS sequences zs: (p1)/2 Yn,q = 1/pwjVn(q+1 /2){:iin,O + L [w;l(q+l/ 2):iin,lw;<+1/');,_n,d} (2.35) l::::l for the lower halfrange of n and 0 C: q S (p3)/2. In addition: (p1)/2 Yn,(p1)/2 = 1/p{:i:n,O + L ( 1)1[in,t Xn,l]} 1=1 for the lower halfrange of n. Note that Yo,(p1 ); 2 = 0. (2.36) We now prove Theorem 2.12. First, assume that pis even. The forward combine equation for I sequences is obtained by developing a compact form 52
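of equation (2.4) which eliminates all redundant data. For this purpose, we will need the following result which is valid for all I sequences:

$$x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = -\bar{x}_{(l+1)N/p-n} = -\bar{x}_{-n,l+1}$$

Using this result, we obtain:

$$\begin{aligned}
y_{n,q} &= \frac{1}{p}\, w_N^{-nq} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
&= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq} x_{n,l} + \sum_{l=0}^{p/2-1} w_p^{-q(p-l-1)} x_{n,p-l-1} \Big\} \\
&= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq} x_{n,l} - \sum_{l=0}^{p/2-1} w_p^{q(l+1)} \bar{x}_{-n,l+1} \Big\} \\
&= \frac{1}{p}\, w_N^{-nq} \Big\{ \sum_{l=0}^{p/2-1} w_p^{-lq} x_{n,l} - \sum_{l=1}^{p/2} w_p^{lq} \bar{x}_{-n,l} \Big\} \\
&= \frac{1}{p}\, w_N^{-nq} \Big\{ x_{n,0} + (-1)^{q+1} \bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1} \big[ w_p^{-lq} x_{n,l} - w_p^{lq} \bar{x}_{-n,l} \big] \Big\}
\end{aligned}$$

The forward combining of ICS, ISCS, and ICSIS sequences requires one new equation. Since all sequences $x_{n,l}$ are real in this case:

$$\tilde{y}_{n,p/2} = w_{N/p}^{n/2}\, y_{n,p/2} = \frac{1}{p}\Big\{ x_{n,0} + (-1)^{p/2+1} x_{-n,p/2} + \sum_{l=1}^{p/2-1} (-1)^l \big[ x_{n,l} - x_{-n,l} \big] \Big\}$$

The forward combine equation for ISCSIS sequences is obtained by substituting equation (2.9) into equation (2.31):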
Yn,q = wl(q+l/2);; j'} p n,l Next, assume that p is odd. The forward combine equation for I se quences is obtained by developing a compact form of equation (2.4) which eliminates all redundant data. p1 Yn,q = 1 / nq """ lq p WN L..J Wp Xn,l l:::::O = 1 / nq{ q(p1)/2 .J.. p WN Wp '"n,(pl)/2 (p3)/2 (p3)/2 """ lq + """ q(pl1) } L..J WP Xn,l L..J WP Xn,pl1 l=O l=O = 1/pw}Fq{w;q(pl)/2,n,(p1)/2 + (p3)/2 (p3)/2 L w;lq'"n,lL l=O l=O 1 / nq{ q(p1)/2 .J.. pwN wp "n,(p1)/2 (p3)/2 (p1)/2 """ lq """ lq} L..J WP Xn,l L WP Xn,l The forward combining of ICS and ICSIS sequences does not require any new equations. The forward combine equation for ISCS and ISCSIS sequences is obtained by substituting equation (2.9) into equation (2.34): (p1)/2 1 / nq{ + """ lq lq'} Yn,q pwN Xn,O L LWp Xn,lwp Xn,ll l=1 54
(p1)/2 = 1/pwn(q+l/2){;; + [wl(q+l/2);; 1 wl(q+1/2);; z]} N n,O L p n, p n, 1=1 For q = (p 1) j 2 this reduces to: n/2 Yn,(p1)/2 = WNjpYn,(p1)/2 (p1)/2 = 1/p{xn,o + L (1)1[xn,z'"n,z]} 1=1 This completes the proof of Theorem 2.12. The following corollary provides an important special case of this result. Corollary 2.9 Assume p = 2. The forward combine equation for I sequences ts: Yn,O = (zn,O "ifn,1)/2 Yn,1 wjlt(zn,O + '"n,1)/2 for the lower halfrange of n. The forward combine equation for ICS and ISCS sequences is: Yn,O ( Zn,O Z n,1) /2 Yn,1 (zn,O + "'n,1)/2 for the lower halfrange of n. The forward combine equation for ISCSIS sequences ts: n/2()/2 Yn,O = WN Zn,OtX'n,l for the lower halfrange of n. 55
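The compact forward combine equation for I sequences with an even factor $p$ can be checked against a direct computation. The sketch below is illustrative only. It assumes NumPy, the IDFT convention $x_n = \sum_k X_k e^{2\pi i kn/N}$, the splitting $X_{k,q} = X_{pk+q}$ with $x_{n,l} = x_{n+lN/p}$, and reads equation (2.31) as $y_{n,q} = \frac{1}{p} w_N^{-nq}\{x_{n,0} + (-1)^{q+1}\bar{x}_{-n,p/2} + \sum_{l=1}^{p/2-1}[w_p^{-lq}x_{n,l} - w_p^{lq}\bar{x}_{-n,l}]\}$. The names `idft`, `x_sub`, and `y_compact` are hypothetical.

```python
import numpy as np

def idft(X):
    """Unscaled inverse DFT, x_n = sum_k X_k exp(2*pi*i*k*n/N) (assumed convention)."""
    return np.fft.ifft(X) * len(X)

rng = np.random.default_rng(1)
N, p = 24, 4                             # p is an even factor of N
M = N // p
X = 1j * rng.standard_normal(N)          # I sequence: pure imaginary coefficients
x = idft(X)                              # satisfies x_{N-n} = -conj(x_n)

w_N = np.exp(2j * np.pi / N)
w_p = np.exp(2j * np.pi / p)
n = np.arange(M)

def x_sub(sign, l):
    """x_{n,l} for sign=+1, x_{-n,l} for sign=-1: x at index l*N/p +/- n (mod N)."""
    return x[(l * M + sign * n) % N]

def y_compact(q):
    """Compact forward combine: uses only subsequences l = 0, ..., p/2."""
    acc = x_sub(+1, 0) + (-1) ** (q + 1) * np.conj(x_sub(-1, p // 2))
    for l in range(1, p // 2):
        acc = acc + w_p ** (-l * q) * x_sub(+1, l) - w_p ** (l * q) * np.conj(x_sub(-1, l))
    return w_N ** (-n * q) * acc / p

# Reference: y_{n,q} is the IDFT of the subsequence X_{pk+q} of length N/p.
ok = all(np.allclose(y_compact(q), idft(X[q::p])) for q in range(p))
print(ok)
```

Only the subsequences with $0 \le l \le p/2$ are read, which is the redundancy the compact form removes.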
2.5 Real Composite Even-Even (REE)

In this section, we will be concerned with the following symmetries:

Definition 2.9 A real composite even-even (REE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = x_n, \qquad x_{N/2-n} = x_n.$$

Note that an REE sequence of length $N$ is also an RE sequence of length $N$. A real conjugate symmetric zero odd term (RCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = \bar{X}_k, \qquad X_k = (-1)^k X_k.$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.9 If $x_n$ is an REE sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an RCSZO sequence of length $N$. If $X_k$ is an RCSZO sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an REE sequence of length $N$.

We now prove Lemma 2.9. We will only prove the first assertion. Assume $x_n$ is an REE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an RCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 2.7 and the REE symmetry of $x_n$ as follows:

$$\begin{aligned}
X_k &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Big]
\end{aligned}$$
$$\begin{aligned}
&= \frac{1}{N}\Big[x_0 + (-1)^k x_0 + (-1)^k \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= (-1)^k \frac{1}{N}\Big[(-1)^k x_0 + x_0 + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= (-1)^k \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] = (-1)^k X_k
\end{aligned}$$

This completes the proof of Lemma 2.9.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for NNS boundary conditions. Note that if $N = 2(2M+1)$, then an REE sequence satisfies NNS boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = x_1, \qquad x_{M+1} = x_M.$$

Theorem 2.13 Let $x_n$ be an REE sequence and let $X_k$ be its RCSZO symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DFT is:

$$X_{2k} = \frac{2}{N}\Big[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Big]$$

for $0 \le k \le M$. The real form of the IDFT is:

$$x_n = X_0 + \sum_{k=1}^{M} 2X_{2k} \cos(4\pi kn/N)$$

for $0 \le n \le M$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.13. The result for the DFT follows from Theorem 2.7, the RCSZO symmetry of $X_k$, and the REE symmetry of $x_n$ as follows:

$$X_{2k} = \frac{1}{N}\Big[x_0 + x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(4\pi kn/N)\Big]$$
$$\begin{aligned}
&= \frac{1}{N}\Big\{x_0 + x_{N/2} + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N) + \sum_{n=1}^{M} 2x_{N/2-n} \cos[4\pi k(N/2-n)/N]\Big\} \\
&= \frac{2}{N}\Big[x_0 + \sum_{n=1}^{M} 2x_n \cos(4\pi kn/N)\Big]
\end{aligned}$$

The result for the IDFT follows immediately from Theorem 2.7 and the RCSZO symmetry of $X_k$. Note that only one fourth of the REE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.13.

A fast, mixed radix algorithm for computing the REE symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RE symmetric FFT. Note that an REE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an RCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCSZO sequence $X_k$. We begin developing this algorithm by defining one new intermediate symmetry involved.

Definition 2.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:

$$X_k = 0$$

for $0 \le k \le N-1$.

The following lemma establishes the relationship between the symmetries which occur in the splittings of the RCSZO sequence $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.10 Let $X_k$ be an RCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is RCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the RCS sequence $X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences have R symmetry as well.
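The real forms in Theorem 2.13 and the zero odd terms asserted by Lemma 2.9 can be illustrated numerically. This is a minimal sketch under stated assumptions: NumPy, the forward convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$ (inferred from the $1/N$ scaling in the theorem), and the REE symmetries $x_{N-n} = x_n$, $x_{N/2-n} = x_n$. The helper `fold` and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
N = 2 * (2 * M + 1)                      # N = 14
base = rng.standard_normal(M + 1)        # independent values x_0, ..., x_M

def fold(n):
    """Map any index to 0..M using x_{N-n} = x_n and x_{N/2-n} = x_n."""
    m = n % N
    if m > N // 2:
        m = N - m
    if m > M:
        m = N // 2 - m
    return m

x = np.array([base[fold(n)] for n in range(N)])   # full REE sequence
X = np.fft.fft(x) / N                    # assumed convention: 1/N on the forward DFT

# C[j, m-1] = cos(4*pi*j*m/N) for j = 0..M and m = 1..M; the same matrix
# serves the DFT (j = k, m = n) and the IDFT (j = n, m = k).
C = np.cos(4 * np.pi * np.outer(np.arange(M + 1), np.arange(1, M + 1)) / N)
X_even = (2 / N) * (base[0] + 2 * C @ base[1:])   # real form of the DFT
x_back = X_even[0] + 2 * C @ X_even[1:]           # real form of the IDFT
print(np.allclose(X[1::2], 0), np.allclose(X[0:2 * M + 1:2], X_even), np.allclose(x_back, base))
```

Only the $M+1$ values $x_0, \ldots, x_M$ are stored, which is roughly the $N/4$ real locations quoted in the text.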
A mixed radix splitting tree diagram for an RCSZO sequence is shown in Figure 2.5. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4 and 2.6, with the addition provided by the following lemma. We omit the proof of this result because it is trivial.

Lemma 2.11 Let $X_k$ be a Z sequence of length $N$. Its IDFT $x_n$ is also a Z sequence of length $N$.

These results show that each symmetry appearing in Figure 2.5 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the REE symmetric IFFT, obtained as a special case of that for the RE symmetric IFFT.

Corollary 2.10 Assume $p = 2$. The inverse combine equation for RCS and Z sequences is:

$$x_{n,0} = y_{n,0}$$

for the lower halfrange of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.

We now prove Corollary 2.10. The inverse combine equation for RCS and Z sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the REE sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.10.

The next corollary provides all of the forward combine equations for the REE symmetric FFT, obtained as a special case of that for the RE symmetric FFT.

Corollary 2.11 Assume $p = 2$. The forward combine equation for RCS and Z sequences is:

$$y_{n,0} = x_{n,0}, \qquad \tilde{y}_{n,1} = 0$$
Figure 2.5: Splitting tree for REE symmetric FFT
for the lower halfrange of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.

We now prove Corollary 2.11. The forward combine equation for RCS and Z sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the REE symmetry of $x_n$ as follows:

$$\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0} \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n - x_n)/2 = 0
\end{aligned}$$

This completes the proof of Corollary 2.11.
2.6 Real Composite Even-Odd (REO)

In this section, we will be concerned with the following symmetries:

Definition 2.11 A real composite even-odd (REO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = x_n, \qquad x_{N/2-n} = -x_n.$$

Note that an REO sequence of length $N$ is also an RE sequence of length $N$. A real conjugate symmetric zero even term (RCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = X_k, \qquad X_{N-k} = \bar{X}_k, \qquad X_k = (-1)^{k+1} X_k.$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.12 If $x_n$ is an REO sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an RCSZE sequence of length $N$. If $X_k$ is an RCSZE sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an REO sequence of length $N$.

We now prove Lemma 2.12. We will only prove the first assertion. Assume $x_n$ is an REO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RE sequence of length $N$, Lemma 2.5 implies that its DFT $X_k$ is an RCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an RCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 2.7 and the REO symmetry of $x_n$ as follows:

$$\begin{aligned}
X_k &= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_{N/2-n} \cos[2\pi k(N/2-n)/N]\Big]
\end{aligned}$$
$$\begin{aligned}
&= \frac{1}{N}\Big[x_0 + (-1)^{k+1} x_0 + (-1)^{k+1} \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= (-1)^{k+1} \frac{1}{N}\Big[(-1)^{k+1} x_0 + x_0 + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] \\
&= (-1)^{k+1} \frac{1}{N}\Big[x_0 + (-1)^k x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos(2\pi kn/N)\Big] = (-1)^{k+1} X_k
\end{aligned}$$

This completes the proof of Lemma 2.12.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for ND or NDS boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an REO sequence satisfies ND boundary conditions for the computational domain $0 \le n \le N/4-1$. That is:

$$x_{N-1} = x_1, \qquad x_{N/4} = 0.$$

Similarly, if $N = 2(2M+1)$, then an REO sequence satisfies NDS boundary conditions for the computational domain $0 \le n \le M$. That is:

$$x_{N-1} = x_1, \qquad x_{M+1} = -x_M.$$

Theorem 2.14 Let $x_n$ be an REO sequence and let $X_k$ be its RCSZE symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DFT is:

$$X_{2k+1} = \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\}$$

for $0 \le k \le N/4-1$. The real form of the IDFT is:

$$x_n = \sum_{k=0}^{N/4-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N]$$
for $0 \le n \le N/4-1$. Next, assume that $N = 2(2M+1)$. The real form of the DFT is:

$$X_{2k+1} = \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{M} 2x_n \cos[2\pi n(2k+1)/N]\Big\}$$

for $0 \le k \le M$. The real form of the IDFT is:

$$x_n = (-1)^n X_{N/2} + \sum_{k=0}^{M-1} 2X_{2k+1} \cos[2\pi n(2k+1)/N]$$

for $0 \le n \le M$.

We now prove Theorem 2.14. We prove the result for the DFT for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 2.7, the RCSZE symmetry of $X_k$, and the REO symmetry of $x_n$ as follows:

$$\begin{aligned}
X_{2k+1} &= \frac{1}{N}\Big\{x_0 - x_{N/2} + \sum_{n=1}^{N/2-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\} \\
&= \frac{1}{N}\Big\{x_0 - x_{N/2} + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N] + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \cos[2\pi (N/2-n)(2k+1)/N]\Big\} \\
&= \frac{2}{N}\Big\{x_0 + \sum_{n=1}^{N/4-1} 2x_n \cos[2\pi n(2k+1)/N]\Big\}
\end{aligned}$$

The results for the IDFT follow immediately from Theorem 2.7 and the RCSZE symmetry of $X_k$. Note that only one fourth of the REO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.14.

A fast, mixed radix algorithm for computing the REO symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RE symmetric FFT. Note that an REO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an RCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to obtain a reduction to one eighth
in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the RCSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.13 Let $X_k$ be an RCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSCS symmetric. The symmetries which occur in the splittings of the RSCS sequence $X_{k,1}$ are identical to those in Lemma 2.3, with the addition that all sequences have R symmetry as well.

A mixed radix splitting tree diagram for an RCSZE sequence is shown in Figure 2.6. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.6, and 2.11. These results show that each symmetry appearing in Figure 2.6 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the REO symmetric IFFT, obtained as a special case of that for the RE symmetric IFFT.

Corollary 2.12 Assume $p = 2$. The inverse combine equation for Z and RSCS sequences is:

$$x_{n,0} = \tilde{y}_{n,1}$$

for the lower halfrange of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.8 for arbitrary factors $p$.

We now prove Corollary 2.12. The inverse combine equation for Z and RSCS sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.6 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the REO sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.12.
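The $N = 4M$ case of Theorem 2.14 (ND boundary conditions) can be illustrated numerically. The sketch assumes NumPy, the forward convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$, and the REO symmetries $x_{N-n} = x_n$, $x_{N/2-n} = -x_n$. The helper `reo` and the variable names are illustrative, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4
N = 4 * M                                # N = 16
Q = N // 4
base = rng.standard_normal(Q)            # independent values x_0, ..., x_{N/4-1}

def reo(n):
    """Value of the REO extension at any index n."""
    m = n % N
    if m > N // 2:
        m = N - m                        # x_{N-n} = x_n
    if m == Q:
        return 0.0                       # x_{N/4} = 0 (Dirichlet end)
    if m > Q:
        return -base[N // 2 - m]         # x_{N/2-n} = -x_n
    return base[m]

x = np.array([reo(n) for n in range(N)])
X = np.fft.fft(x) / N                    # assumed convention: 1/N on the forward DFT

odd = 2 * np.arange(Q) + 1               # indices 2k+1 for k = 0..N/4-1
n_in = np.arange(1, Q)                   # n = 1..N/4-1
X_odd = (2 / N) * (base[0] + 2 * np.cos(2 * np.pi * np.outer(odd, n_in) / N) @ base[1:])
x_back = 2 * np.cos(2 * np.pi * np.outer(np.arange(Q), odd) / N) @ X_odd
print(np.allclose(X[0::2], 0), np.allclose(X[1:N // 2:2], X_odd), np.allclose(x_back, base))
```

The vanishing even terms are the Z subsequence $X_{k,0}$ of Lemma 2.13, so the splitting tree only needs to follow the odd-indexed branch.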
Figure 2.6: Splitting tree for REO symmetric FFT
The next corollary provides all of the forward combine equations for the REO symmetric FFT, obtained as a special case of that for the RE symmetric FFT.

Corollary 2.13 Assume $p = 2$. The forward combine equation for Z and RSCS sequences is:

$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$

for the lower halfrange of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.9 for arbitrary factors $p$.

We now prove Corollary 2.13. The forward combine equation for Z and RSCS sequences may be regarded as a special case of that for RCS and RSCS sequences, where $p = 2$. Thus, we apply Corollary 2.7 and use the REO symmetry of $x_n$ as follows:

$$\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n - x_n)/2 = 0 \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0}
\end{aligned}$$

This completes the proof of Corollary 2.13.
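The staggered case $N = 2(2M+1)$ of Theorem 2.14 (NDS boundary conditions) differs from the $N = 4M$ case by the extra term $(-1)^n X_{N/2}$ in the IDFT. The sketch below checks this reading numerically. It assumes NumPy and the forward convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$; `reo` and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 3
N = 2 * (2 * M + 1)                      # N = 14, so N/2 = 2M+1 is odd
base = rng.standard_normal(M + 1)        # independent values x_0, ..., x_M

def reo(n):
    """Value of the REO extension at any index n."""
    m = n % N
    if m > N // 2:
        m = N - m                        # x_{N-n} = x_n
    if m > M:
        return -base[N // 2 - m]         # x_{N/2-n} = -x_n
    return base[m]

x = np.array([reo(n) for n in range(N)])
X = np.fft.fft(x) / N                    # assumed convention: 1/N on the forward DFT

k = np.arange(M + 1)
n_in = np.arange(1, M + 1)
X_odd = (2 / N) * (base[0] + 2 * np.cos(2 * np.pi * np.outer(2 * k + 1, n_in) / N) @ base[1:])

n_all = np.arange(M + 1)
cos_nk = np.cos(2 * np.pi * np.outer(n_all, 2 * k[:M] + 1) / N)
x_back = (-1.0) ** n_all * X_odd[M] + 2 * cos_nk @ X_odd[:M]   # X_odd[M] is X_{N/2}
print(np.allclose(X[0::2], 0), np.allclose(X[1:N // 2 + 1:2], X_odd), np.allclose(x_back, base))
```

Here $X_{N/2} = X_{2M+1}$ is an odd-indexed coefficient and is its own mirror image, which is why it appears once rather than doubled.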
2.7 Real Composite Odd-Even (ROE)

In this section, we will be concerned with the following symmetries:

Definition 2.12 A real composite odd-even (ROE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n, \qquad x_{N/2-n} = x_n.$$

Note that an ROE sequence of length $N$ is also an RO sequence of length $N$. An imaginary conjugate symmetric zero even term (ICSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^{k+1} X_k.$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.14 If $x_n$ is an ROE sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an ICSZE sequence of length $N$. If $X_k$ is an ICSZE sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an ROE sequence of length $N$.

We now prove Lemma 2.14. We will only prove the first assertion. Assume $x_n$ is an ROE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an ICS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ICSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 2.10 and the ROE symmetry of $x_n$ as follows:

$$\begin{aligned}
X_k &= -\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
&= -\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N]
\end{aligned}$$
$$= (-1)^{k+1}\Big[-\frac{i}{N}\sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N)\Big] = (-1)^{k+1} X_k$$

This completes the proof of Lemma 2.14.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DN or DNS boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an ROE sequence satisfies DN boundary conditions for the computational domain $1 \le n \le N/4$. That is:

$$x_0 = 0, \qquad x_{N/4+1} = x_{N/4-1}.$$

Similarly, if $N = 2(2M+1)$, then an ROE sequence satisfies DNS boundary conditions for the computational domain $1 \le n \le M$. That is:

$$x_0 = 0, \qquad x_{M+1} = x_M.$$

Theorem 2.15 Let $x_n$ be an ROE sequence and let $X_k$ be its ICSZE symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DFT is:

$$\mathrm{Im}(X_{2k-1}) = -\frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\}$$

for $1 \le k \le N/4$. The real form of the IDFT is:

$$x_n = -\sum_{k=1}^{N/4} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N]$$

for $1 \le n \le N/4$. Next, assume that $N = 2(2M+1)$. The real form of the DFT is:

$$\mathrm{Im}(X_{2k-1}) = -\frac{2}{N}\sum_{n=1}^{M} 2x_n \sin[2\pi n(2k-1)/N]$$
for $1 \le k \le M$. The real form of the IDFT is:

$$x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k-1}) \sin[2\pi n(2k-1)/N]$$

for $1 \le n \le M$.

We now prove Theorem 2.15. We prove the result for the DFT for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 2.10, the ICSZE symmetry of $X_k$, and the ROE symmetry of $x_n$ as follows:

$$\begin{aligned}
\mathrm{Im}(X_{2k-1}) &= -\frac{1}{N}\sum_{n=1}^{N/2-1} 2x_n \sin[2\pi n(2k-1)/N] \\
&= -\frac{1}{N}\Big\{(-1)^{k+1} 2x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N] + \sum_{n=1}^{N/4-1} 2x_{N/2-n} \sin[2\pi (N/2-n)(2k-1)/N]\Big\} \\
&= -\frac{2}{N}\Big\{(-1)^{k+1} x_{N/4} + \sum_{n=1}^{N/4-1} 2x_n \sin[2\pi n(2k-1)/N]\Big\}
\end{aligned}$$

The results for the IDFT follow immediately from Theorem 2.10 and the ICSZE symmetry of $X_k$. Note that only one fourth of the ROE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.15.

A fast, mixed radix algorithm for computing the ROE symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RO symmetric FFT. Note that an ROE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.
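The $N = 4M$ case of Theorem 2.15 (DN boundary conditions) can be illustrated with a short numerical sketch. It assumes NumPy, the forward convention $X_k = \frac{1}{N}\sum_n x_n e^{-2\pi i kn/N}$ (which fixes the leading minus signs), and the ROE symmetries $x_{N-n} = -x_n$, $x_{N/2-n} = x_n$. The helper `roe` and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 4
N = 4 * M                                # N = 16
Q = N // 4
base = rng.standard_normal(Q)            # independent values x_1, ..., x_{N/4}

def roe(n):
    """Value of the ROE extension at any index n."""
    m = n % N
    sign = 1.0
    if m > N // 2:
        m, sign = N - m, -1.0            # x_{N-n} = -x_n
    if m > Q:
        m = N // 2 - m                   # x_{N/2-n} = x_n
    return 0.0 if m == 0 else sign * base[m - 1]

x = np.array([roe(n) for n in range(N)])
X = np.fft.fft(x) / N                    # assumed convention: 1/N on the forward DFT

k = np.arange(1, Q + 1)
odd = 2 * k - 1                          # indices 2k-1 for k = 1..N/4
n_in = np.arange(1, Q)                   # n = 1..N/4-1
S = np.sin(2 * np.pi * np.outer(odd, n_in) / N)
im_X_odd = -(2 / N) * ((-1.0) ** (k + 1) * base[-1] + 2 * S @ base[:-1])
x_back = -2 * np.sin(2 * np.pi * np.outer(np.arange(1, Q + 1), odd) / N) @ im_X_odd
print(np.allclose(X.real, 0), np.allclose(X[0::2], 0),
      np.allclose(X[1:N // 2:2].imag, im_X_odd), np.allclose(x_back, base))
```

The isolated term $(-1)^{k+1} x_{N/4}$ comes from the midpoint $n = N/4$, which is its own reflection under $n \to N/2 - n$ and is therefore not doubled.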
Lemma 2.15 LetXk be an ICSZE sequence of length N withfactor2. Then subsequence Xk,O is Z symmetric, and subsequence Xk,l is ISCS symmetric. The symmetries which occur in the splittings oj the ISCS sequence Xk,1 are identical to those in Lemma 2.3, with the addition that all sequences have I symmetry as well. A mixed radix splitting tree diagram for an ICSZE sequence is shown in Figure 2.7. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences. The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11. These results show that each symmetry appearing in Figure 2.7 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the ROE symmetric IFFT, obtained as a special case of that for the RO symmetric IFFT. Corollary 2.14 Assume p = 2. The inverse combine equation for Z and ISCS sequences is: Xn,O = fln,l for the lower halfrange of n. The inverse combine equations for the remain ing symmetries are provided by Theorem 2.11 jor arbitrary factors p. We now prove Corollary 2.14. The inverse combine equation for Z and ISCS sequences may be regarded as a special case of that for ICS and ISCS sequences, where p = 2. Thus, we apply Corollary 2.8 and use the Z sym metry of Yn,O Note that the companion equation is not needed because only one fourth of the ROE sequence Zn needs to be computed. This completes the proof of Corollary 2.14. The next corollary provides all of the forward combine equations for the ROE symmetric FFT, obtained as a special case of that for the RO symmetric FFT. Corollary 2.15 Assume p = 2. 
The forward combine equation for Z and ISCS sequences is:

$$y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0}$$

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.

We now prove Corollary 2.15. The forward combine equation for Z and ISCS sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the ROE symmetry of $x_n$ as follows:

$$\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n - x_n)/2 = 0 \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0}
\end{aligned}$$

This completes the proof of Corollary 2.15.

Figure 2.7: Splitting tree for ROE symmetric FFT. [Diagram not reproduced; the root node is the ICSZE sequence.]
2.8 Real Composite Odd-Odd (ROO)

In this section, we will be concerned with the following symmetries:

Definition 2.13 A real composite odd-odd (ROO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n} = -x_n, \qquad x_{N/2-n} = -x_n$$

Note that an ROO sequence of length $N$ is also an RO sequence of length $N$. An imaginary conjugate symmetric zero odd term (ICSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

$$\bar{X}_k = -X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^k X_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 2.16 If $x_n$ is an ROO sequence of length $N$, where $N$ is even, then its DFT $X_k$ is an ICSZO sequence of length $N$. If $X_k$ is an ICSZO sequence of length $N$, where $N$ is even, then its IDFT $x_n$ is an ROO sequence of length $N$.

We now prove Lemma 2.16. We will only prove the first assertion. Assume $x_n$ is an ROO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RO sequence of length $N$, Lemma 2.7 implies that its DFT $X_k$ is an ICS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ICSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 2.10 and the ROO symmetry of $x_n$ as follows:

$$\begin{aligned}
X_k &= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \\
&= -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_{N/2-n} \sin[2\pi k(N/2-n)/N] \\
&= (-1)^k \Big[ -\frac{i}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(2\pi kn/N) \Big]
\end{aligned}$$

The last expression is $(-1)^k X_k$. This completes the proof of Lemma 2.16.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DDS boundary conditions. Note that if $N = 2(2M+1)$, then an ROO sequence satisfies DDS boundary conditions for the computational domain $1 \le n \le M$. That is:

$$x_0 = 0, \qquad x_{M+1} = -x_M$$

Theorem 2.16 Let $x_n$ be an ROO sequence and let $X_k$ be its ICSZO symmetric DFT, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DFT is:

$$-\mathrm{Im}(X_{2k}) = \frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)$$

for $1 \le k \le M$. The real form of the IDFT is:

$$x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin(4\pi kn/N)$$

for $1 \le n \le M$. Note that the results for the DFT and IDFT are identical except for scaling.

We now prove Theorem 2.16. The result for the DFT follows from Theorem 2.10, the ICSZO symmetry of $X_k$, and the ROO symmetry of $x_n$ as follows:

$$\begin{aligned}
-\mathrm{Im}(X_{2k}) &= \frac{1}{N} \sum_{n=1}^{N/2-1} 2x_n \sin(4\pi kn/N) \\
&= \frac{1}{N} \Big\{ \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N) + \sum_{n=1}^{M} 2x_{N/2-n} \sin[4\pi k(N/2-n)/N] \Big\} \\
&= \frac{2}{N} \sum_{n=1}^{M} 2x_n \sin(4\pi kn/N)
\end{aligned}$$

The result for the IDFT follows immediately from Theorem 2.10 and the ICSZO symmetry of $X_k$. Note that only one fourth of the ROO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.16.

A fast, mixed radix algorithm for computing the ROO symmetric DFT and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RO symmetric FFT. Note that an ROO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ICSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those required for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ICSZO sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 2.17 Let $X_k$ be an ICSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is ICS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the ICS sequence $X_{k,0}$ are identical to those in Lemma 2.3, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an ICSZO sequence is shown in Figure 2.8. The acronyms representing the symmetries are summarized in Table 2.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.
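The real forms in Theorem 2.16 can be checked numerically. The following is a minimal sketch, not part of the algorithm developed in this thesis: it assumes the scaled DFT convention $X_k = (1/N)\sum_n x_n w_N^{-kn}$ with $w_N = e^{2\pi i/N}$, and the helper names (`dft`, `roo_extend`, `roo_forward`, `roo_inverse`) are hypothetical.

```python
import cmath
import math

def dft(x):
    """Scaled DFT X_k = (1/N) sum_n x_n w_N^{-kn}, w_N = exp(2*pi*i/N) (assumed convention)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def roo_extend(quarter):
    """Extend x_1..x_M to a full ROO sequence of length N = 2(2M+1)."""
    M = len(quarter)
    N = 2 * (2 * M + 1)
    x = [0.0] * N
    for n in range(1, M + 1):
        x[n] = quarter[n - 1]
        x[N // 2 - n] = -quarter[n - 1]   # x_{N/2-n} = -x_n
    for n in range(1, N // 2):
        x[N - n] = -x[n]                  # x_{N-n} = -x_n
    return x

def roo_forward(quarter):
    """Return -Im(X_{2k}) for k = 1..M using the real form of the DFT."""
    M = len(quarter)
    N = 2 * (2 * M + 1)
    return [(2.0 / N) * sum(2.0 * quarter[n - 1] * math.sin(4 * math.pi * k * n / N)
                            for n in range(1, M + 1))
            for k in range(1, M + 1)]

def roo_inverse(neg_im):
    """Recover x_1..x_M from the values -Im(X_{2k}), k = 1..M."""
    M = len(neg_im)
    N = 2 * (2 * M + 1)
    return [sum(2.0 * neg_im[k - 1] * math.sin(4 * math.pi * k * n / N)
                for k in range(1, M + 1))
            for n in range(1, M + 1)]

quarter = [1.0, -2.0, 0.5]           # arbitrary test data, M = 3, N = 14
coeffs = roo_forward(quarter)        # M real numbers
X = dft(roo_extend(quarter))         # full complex DFT for comparison
recovered = roo_inverse(coeffs)
```

Only the $M$ values `quarter` and `coeffs` are ever stored by the real forms, which is the one-fourth storage noted above; the full-length `dft` is used here purely as a reference.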
The intermediate symmetries in the IDFT induced by the intermediate symmetries in the DFT are identical to those in Lemmas 2.4, 2.8, and 2.11. These results show that each symmetry appearing in Figure 2.8 induces a symmetry in the IDFT. These induced symmetries are summarized in Table 2.2 for ease of reference. The next corollary provides all of the inverse combine equations for the ROO symmetric IFFT, obtained as a special case of that for the RO symmetric IFFT.
Figure 2.8: Splitting tree for ROO symmetric FFT. [Diagram not reproduced; the root node is the ICSZO sequence.]
Corollary 2.16 Assume $p = 2$. The inverse combine equation for ICS and Z sequences is:

$$x_{n,0} = y_{n,0}$$

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 2.11 for arbitrary factors $p$.

We now prove Corollary 2.16. The inverse combine equation for ICS and Z sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.8 and use the Z symmetry of $\tilde{y}_{n,1}$. Note that the companion equation is not needed because only one fourth of the ROO sequence $x_n$ needs to be computed. This completes the proof of Corollary 2.16.

The next corollary provides all of the forward combine equations for the ROO symmetric FFT, obtained as a special case of that for the RO symmetric FFT.

Corollary 2.17 Assume $p = 2$. The forward combine equation for ICS and Z sequences is:

$$y_{n,0} = x_{n,0}, \qquad \tilde{y}_{n,1} = 0$$

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 2.12 for arbitrary factors $p$.

We now prove Corollary 2.17. The forward combine equation for ICS and Z sequences may be regarded as a special case of that for ICS and ISCS sequences, where $p = 2$. Thus, we apply Corollary 2.9 and use the ROO symmetry of $x_n$ as follows:

$$\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{n,1})/2 = (x_n - x_{N/2-n})/2 = (x_n + x_n)/2 = x_{n,0} \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{n,1})/2 = (x_n + x_{N/2-n})/2 = (x_n - x_n)/2 = 0
\end{aligned}$$

This completes the proof of Corollary 2.17.
2.9 Real Staggered Even (RSE)

In this section, we will be concerned with the following symmetries:

Definition 2.14 A real staggered even (RSE) sequence $x_n$ of length $N$ is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = x_n$$

An $\omega$-even ($\omega$E) sequence $X_k$ of length $N$ is defined by:

$$X_{N-k} = \bar{X}_k \tag{2.37}$$

$$\bar{X}_k = \omega_N^{-k} X_k \tag{2.38}$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.18 If $x_n$ is an RSE sequence of length $N$, then its DFT $X_k$ is an $\omega$E sequence of length $N$. If $X_k$ is an $\omega$E sequence of length $N$, then its IDFT $x_n$ is an RSE sequence of length $N$.

The next lemma will be needed to obtain the real form of the DFT and IDFT.

Lemma 2.19 Let $X_k$ be an $\omega$E sequence of length $N$, and let $\tilde{X}_k$ denote the magnitude of $X_k$. Then:

$$X_k = \omega_N^{k/2} \tilde{X}_k \tag{2.39}$$

$$\tilde{X}_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.40}$$

$$x_n = \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.41}$$

$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.42}$$

$$\tilde{X}_{N-k} = -\tilde{X}_k \tag{2.43}$$
We now prove Lemma 2.19. We express $X_k$ in polar form as follows:

$$X_k = \tilde{X}_k e^{i\theta}$$

Substituting this into equation (2.38) and solving for $\theta$ leads to equation (2.39). Combining equations (2.1) and (2.39) leads to equation (2.40), while combining equations (2.2) and (2.39) leads to equation (2.41). Equation (2.42) is obtained from equation (2.40) as follows:

$$\tilde{X}_{N+k} = \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} = -\tilde{X}_k$$

Equation (2.43) is obtained by combining equations (2.37) and (2.39) as follows:

$$\tilde{X}_{N-k} = \omega_N^{-(N-k)/2} X_{N-k} = -\omega_N^{k/2} \bar{X}_k = -\tilde{X}_k$$

This completes the proof of Lemma 2.19.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the DFT is the eigenvector expansion required by the Fourier analysis method for ND boundary conditions. Note that if $N$ is even, then an $\omega$E sequence (represented by $\tilde{X}_k$) satisfies ND boundary conditions for the computational domain $0 \le k \le N/2-1$. That is:

$$\tilde{X}_{-1} = -\tilde{X}_{N-1} = \tilde{X}_1, \qquad \tilde{X}_{N/2} = 0$$

Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for NSNS boundary conditions. Note that if $N$ is even, then an RSE sequence satisfies NSNS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:

$$x_{-1} = x_{N-1} = x_0, \qquad x_{N/2-1} = x_{N/2}$$
Theorem 2.17 Let $x_n$ be an RSE sequence and let $X_k$ be its $\omega$E symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:

$$\tilde{X}_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]$$

for $0 \le k \le N/2-1$. The real form of the IDFT is:

$$x_n = \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]$$

for $0 \le n \le N/2-1$.

We now prove Theorem 2.17. The result for the DFT follows from equation (2.40) and the RSE symmetry of $x_n$ as follows:

$$\begin{aligned}
\tilde{X}_k &= \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= \frac{1}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)} \Big\} \\
&= \frac{1}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)} \Big\} \\
&= \frac{2}{N} \mathrm{Re}\Big[ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} \Big] \\
&= \frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]
\end{aligned}$$

Note that only half of the $\omega$E sequence $\tilde{X}_k$ needs to be specified. The result for the IDFT follows from equations (2.41) and (2.43) as follows:

$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \\
&= \tilde{X}_0 + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)} \\
&= \tilde{X}_0 + 2\,\mathrm{Re}\Big[ \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} \Big] \\
&= \tilde{X}_0 + \sum_{k=1}^{N/2-1} 2\tilde{X}_k \cos[\pi k(2n+1)/N]
\end{aligned}$$

Note that only half of the RSE sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.17.

We now develop a fast, mixed radix algorithm for computing the RSE symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RSE sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. However, an $\omega$E sequence of length $N$ requires $N$ real storage locations. Thus, in order to obtain an in-place algorithm, we must use a more compact representation of an $\omega$E sequence. Such a compact representation is provided by the quantities $\tilde{X}_k$ in Lemma 2.19. Using this representation, an $\omega$E sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those required for C sequences.

The procedure for developing this algorithm will be different from the other algorithms in this chapter for the following reason. Equation (2.40) shows that when we replace the complex quantity $X_k$ by the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we call the discrete staggered transform (DST). Equation (2.41) provides the inverse discrete staggered transform (IDST). Note that the DST is a constant multiple of the DFT, whereas the IDST is not related to the IDFT in any simple way. We have found that the applications of the DST include the boundary conditions considered in this section, as well as others. Thus, we have devoted all of Chapter 3 to the development of fast, mixed radix algorithms for computing the DST and IDST. These algorithms are called the fast staggered transform (FST) and inverse fast staggered transform (IFST). The FST for RSE sequences is developed in Section 3.3.
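The transform pair of Theorem 2.17 can be exercised directly. The sketch below is a plain $O(N^2)$ evaluation of the two real forms, not the fast algorithm of Chapter 3; the function names `rse_forward` and `rse_inverse` are hypothetical, and the input holds only the stored half $x_0, \ldots, x_{N/2-1}$ of the RSE sequence.

```python
import math

def rse_forward(half):
    """X~_k = (1/N) sum_{n=0}^{N/2-1} 2 x_n cos[pi k (2n+1)/N], 0 <= k <= N/2-1."""
    N = 2 * len(half)
    return [sum(2.0 * half[n] * math.cos(math.pi * k * (2 * n + 1) / N)
                for n in range(N // 2)) / N
            for k in range(N // 2)]

def rse_inverse(coeffs):
    """x_n = X~_0 + sum_{k=1}^{N/2-1} 2 X~_k cos[pi k (2n+1)/N], 0 <= n <= N/2-1."""
    N = 2 * len(coeffs)
    return [coeffs[0] + sum(2.0 * coeffs[k] * math.cos(math.pi * k * (2 * n + 1) / N)
                            for k in range(1, N // 2))
            for n in range(N // 2)]

half = [3.0, -1.0, 4.0, 1.5]      # arbitrary test data, N = 8
coeffs = rse_forward(half)        # N/2 real coefficients
recovered = rse_inverse(coeffs)   # should reproduce `half`
```

Both directions read and write $N/2$ real numbers, which matches the storage count given above. The $k = 0$ coefficient is simply the mean of the stored half.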
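2.10 Real Staggered Odd (RSO)

In this section, we will be concerned with the following symmetries:

Definition 2.15 A real staggered odd (RSO) sequence $x_n$ of length $N$ is defined by:

$$\bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n$$

An $\omega$-odd ($\omega$O) sequence $X_k$ of length $N$ is defined by:

$$X_{N-k} = \bar{X}_k \tag{2.44}$$

$$\bar{X}_k = -\omega_N^{-k} X_k \tag{2.45}$$

The following lemma establishes the relationship between these symmetries. We omit the proof of this result because it is well known.

Lemma 2.20 If $x_n$ is an RSO sequence of length $N$, then its DFT $X_k$ is an $\omega$O sequence of length $N$. If $X_k$ is an $\omega$O sequence of length $N$, then its IDFT $x_n$ is an RSO sequence of length $N$.

The next lemma will be needed to obtain the real form of the DFT and IDFT.

Lemma 2.21 Let $X_k$ be an $\omega$O sequence of length $N$, and let $\tilde{X}_k$ denote the magnitude of $X_k$. Then:

$$X_k = i\,\omega_N^{k/2} \tilde{X}_k \tag{2.46}$$

$$\tilde{X}_k = -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{2.47}$$

$$x_n = i \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \tag{2.48}$$

$$\tilde{X}_{N+k} = -\tilde{X}_k \tag{2.49}$$

$$\tilde{X}_{N-k} = \tilde{X}_k \tag{2.50}$$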
We now prove Lemma 2.21. We express $X_k$ in polar form as follows:

$$X_k = \tilde{X}_k e^{i\theta}$$

Substituting this into equation (2.45) and solving for $\theta$ leads to equation (2.46). Combining equations (2.1) and (2.46) leads to equation (2.47), while combining equations (2.2) and (2.46) leads to equation (2.48). Equation (2.49) is obtained from equation (2.47) as follows:

$$\tilde{X}_{N+k} = -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-(N+k)(n+1/2)} = \frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} = -\tilde{X}_k$$

Equation (2.50) is obtained by combining equations (2.44) and (2.46) as follows:

$$\tilde{X}_{N-k} = -i\,\omega_N^{-(N-k)/2} X_{N-k} = i\,\omega_N^{k/2} \bar{X}_k = \tilde{X}_k$$

This completes the proof of Lemma 2.21.

The next theorem uses the previous lemma to find the real form of the DFT and IDFT. Observe that the result for the DFT is the eigenvector expansion required by the Fourier analysis method for DN boundary conditions. Note that if $N$ is even, then an $\omega$O sequence (represented by $\tilde{X}_k$) satisfies DN boundary conditions for the computational domain $1 \le k \le N/2$. That is:

$$\tilde{X}_0 = 0, \qquad \tilde{X}_{N/2-1} = \tilde{X}_{N/2+1}$$

Observe that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DSDS boundary conditions. Note that if $N$ is even, then an RSO sequence satisfies DSDS boundary conditions for the computational domain $0 \le n \le N/2-1$. That is:

$$x_{-1} = x_{N-1} = -x_0, \qquad x_{N/2-1} = -x_{N/2}$$
Theorem 2.18 Let $x_n$ be an RSO sequence and let $X_k$ be its $\omega$O symmetric DFT, both of length $N$ where $N$ is even. The real form of the DFT is:

$$\tilde{X}_k = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]$$

for $1 \le k \le N/2$. The real form of the IDFT is:

$$x_n = (-1)^{n+1} \tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]$$

for $0 \le n \le N/2-1$.

We now prove Theorem 2.18. The result for the DFT follows from equation (2.47) and the RSO symmetry of $x_n$ as follows:

$$\begin{aligned}
\tilde{X}_k &= -\frac{i}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \\
&= -\frac{i}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} + \sum_{n=0}^{N/2-1} x_{N-n-1} \omega_N^{-k(N-n-1/2)} \Big\} \\
&= -\frac{i}{N} \Big\{ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} - \sum_{n=0}^{N/2-1} x_n \omega_N^{k(n+1/2)} \Big\} \\
&= \frac{2}{N} \mathrm{Im}\Big[ \sum_{n=0}^{N/2-1} x_n \omega_N^{-k(n+1/2)} \Big] \\
&= -\frac{1}{N} \sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]
\end{aligned}$$

Note that only half of the $\omega$O sequence $\tilde{X}_k$ needs to be specified. The result for the IDFT follows from equations (2.48) and (2.50) as follows:

$$\begin{aligned}
x_n &= i \sum_{k=0}^{N-1} \tilde{X}_k \omega_N^{k(n+1/2)} \\
&= i \Big\{ i(-1)^n \tilde{X}_{N/2} + \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} - \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{-k(n+1/2)} \Big\} \\
&= i \Big\{ i(-1)^n \tilde{X}_{N/2} + 2i\,\mathrm{Im}\Big[ \sum_{k=1}^{N/2-1} \tilde{X}_k \omega_N^{k(n+1/2)} \Big] \Big\} \\
&= (-1)^{n+1} \tilde{X}_{N/2} - \sum_{k=1}^{N/2-1} 2\tilde{X}_k \sin[\pi k(2n+1)/N]
\end{aligned}$$

Note that only half of the RSO sequence $x_n$ needs to be specified. This completes the proof of Theorem 2.18.

We now develop a fast, mixed radix algorithm for computing the RSO symmetric DFT and its inverse, given $x_n$ in natural order. Note that an RSO sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. However, an $\omega$O sequence of length $N$ requires $N$ real storage locations. Thus, in order to obtain an in-place algorithm, we must use a more compact representation of an $\omega$O sequence. Such a compact representation is provided by the quantities $\tilde{X}_k$ in Lemma 2.21. Using this representation, an $\omega$O sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those required for C sequences.

The procedure for developing this algorithm will be different from the other algorithms in this chapter for the following reason. Equation (2.47) shows that when we replace the complex quantity $X_k$ by the real quantity $\tilde{X}_k$, the DFT is changed to a new transform, which we call the discrete staggered transform (DST). Equation (2.48) provides the inverse discrete staggered transform (IDST). Note that the DST is a constant multiple of the DFT, whereas the IDST is not related to the IDFT in any simple way. We have found that the applications of the DST include the boundary conditions considered in this section, as well as others. Thus, we have devoted all of Chapter 3 to the development of fast, mixed radix algorithms for computing the DST and IDST. These algorithms are called the fast staggered transform (FST) and inverse fast staggered transform (IFST). The FST for RSO sequences is developed in Section 3.4.
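As with the RSE case, the real forms of Theorem 2.18 can be verified by a round trip. This is a minimal $O(N^2)$ sketch under the sign convention $X_k = i\,w_N^{k/2}\tilde{X}_k$ assumed for equation (2.46); with the opposite sign choice both formulas change sign together and the round trip is unaffected. The names `rso_forward` and `rso_inverse` are hypothetical.

```python
import math

def rso_forward(half):
    """X~_k = -(1/N) sum_{n=0}^{N/2-1} 2 x_n sin[pi k (2n+1)/N], 1 <= k <= N/2."""
    N = 2 * len(half)
    return [-sum(2.0 * half[n] * math.sin(math.pi * k * (2 * n + 1) / N)
                 for n in range(N // 2)) / N
            for k in range(1, N // 2 + 1)]

def rso_inverse(coeffs):
    """x_n = (-1)^(n+1) X~_{N/2} - sum_{k=1}^{N/2-1} 2 X~_k sin[pi k (2n+1)/N]."""
    H = len(coeffs)              # H = N/2; coeffs[k-1] holds X~_k
    N = 2 * H
    return [(-1) ** (n + 1) * coeffs[H - 1]
            - sum(2.0 * coeffs[k - 1] * math.sin(math.pi * k * (2 * n + 1) / N)
                  for k in range(1, H))
            for n in range(H)]

half = [2.0, -1.0, 0.5]          # arbitrary test data, N = 6
coeffs = rso_forward(half)       # X~_1 .. X~_{N/2}
recovered = rso_inverse(coeffs)  # should reproduce `half`
```

Note the index ranges: the coefficients run over $1 \le k \le N/2$ (the DN computational domain), while the data run over $0 \le n \le N/2-1$.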
2.11 Tables of Symmetries
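Table 2.1: Symmetries in the IDFT

| Acronym | Symmetry | Sequence | DFT |
|---|---|---|---|
| | Periodic | $x_{N+n} = x_n$ | $X_{N+k} = X_k$ |
| R | Real | $\bar{x}_n = x_n$ | $X_{N-k} = \bar{X}_k$ |
| RE | Real Even | $\bar{x}_n = x_n$; $x_{N-n} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$ |
| RO | Real Odd | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$ |
| REE | Real Composite Even-Even ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^k X_k$ |
| REO | Real Composite Even-Odd ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = -x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ |
| ROE | Real Composite Odd-Even ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ |
| ROO | Real Composite Odd-Odd ($N$ even) | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^k X_k$ |
| RSE | Real Staggered Even | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$ | $X_{N-k} = \bar{X}_k$; $\bar{X}_k = \omega_N^{-k} X_k$ |
| RSO | Real Staggered Odd | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$ | $X_{N-k} = \bar{X}_k$; $\bar{X}_k = -\omega_N^{-k} X_k$ |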
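Table 2.2: Symmetries in the DFT

| Acronym | Symmetry | Sequence | IDFT |
|---|---|---|---|
| | Periodic | $X_{N+k} = X_k$ | $x_{N+n} = x_n$ |
| CS | Conjugate Symmetric | $X_{N-k} = \bar{X}_k$ | $\bar{x}_n = x_n$ |
| SCS | Staggered Conjugate Symmetric | $X_{N-k-1} = \bar{X}_k$ | $x_n = \omega_N^{-n} \bar{x}_n$; $x_n = \omega_N^{-n/2} \tilde{x}_n$ |
| CSIS | CS Induced Intersequence Symmetry | $X_{k,p-q} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q} = \omega_{N/p}^{-n} \bar{y}_{n,q}$ |
| SCSIS | SCS Induced Intersequence Symmetry | $X_{k,p-q-1} = \bar{X}_{N/p-k-1,q}$ | $y_{n,p-q-1} = \omega_{N/p}^{-n} \bar{y}_{n,q}$ |
| R | Real | $\bar{X}_k = X_k$ | $x_{N-n} = \bar{x}_n$ |
| I | Imaginary | $\bar{X}_k = -X_k$ | $x_{N-n} = -\bar{x}_n$ |
| RCSZO | RCS & Zero Odd Terms ($N$ even) | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^k X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = x_n$ |
| RCSZE | RCS & Zero Even Terms ($N$ even) | $\bar{X}_k = X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = x_n$; $x_{N/2-n} = -x_n$ |
| ICSZE | ICS & Zero Even Terms ($N$ even) | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = x_n$ |
| ICSZO | ICS & Zero Odd Terms ($N$ even) | $\bar{X}_k = -X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^k X_k$ | $\bar{x}_n = x_n$; $x_{N-n} = -x_n$; $x_{N/2-n} = -x_n$ |
| Z | Zero | $X_k = 0$ | $x_n = 0$ |
| $\omega$E | $\omega$-Even | $X_{N-k} = \bar{X}_k$; $\bar{X}_k = \omega_N^{-k} X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$ |
| $\omega$O | $\omega$-Odd | $X_{N-k} = \bar{X}_k$; $\bar{X}_k = -\omega_N^{-k} X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$ |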
Chapter 3

Fast Staggered Transforms

3.1 Complex (C)

We begin by defining the discrete staggered transform, and establishing notation which will be used throughout.

Definition 3.1 Given a C sequence $x_n$, for $0 \le n \le N-1$, the forward discrete staggered transform (DST) is defined by:

$$X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-k(n+1/2)} \tag{3.1}$$

for $0 \le k \le N-1$, where:

$$\omega_N = e^{2\pi i/N}$$

For convenience, we will often suppress the constant $1/N$. The following theorem provides the inverse discrete staggered transform (IDST).

Theorem 3.1 A C sequence $x_n$ may be recovered from its DST $X_k$ by the inverse discrete staggered transform (IDST) which is given by:

$$x_n = \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} \tag{3.2}$$

for $0 \le n \le N-1$.

We now prove Theorem 3.1 using Lemma 2.1 as follows:

$$\begin{aligned}
\sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} &= \sum_{k=0}^{N-1} \Big[ \frac{1}{N} \sum_{j=0}^{N-1} x_j \omega_N^{-k(j+1/2)} \Big] \omega_N^{k(n+1/2)} \\
&= \frac{1}{N} \sum_{j=0}^{N-1} x_j \Big[ \sum_{k=0}^{N-1} \omega_N^{k(n-j)} \Big] \\
&= \frac{1}{N} \sum_{j=0}^{N-1} x_j [N \delta_n(j)] \\
&= x_n
\end{aligned}$$

This completes the proof of Theorem 3.1.

By Definition 3.1, the sequences $x_n$ and $X_k$ are of length $N$. These sequences can be extended to all integral values of $n$ and $k$ using the periodicity properties provided by the following corollary. Carefully note the unusual periodicity property satisfied by $X_k$.

Corollary 3.1 Equations (3.1) and (3.2) imply that the sequences $x_n$ and $X_k$ may be extended periodically to all integral values of $n$ and $k$ by:

$$x_{N+n} = x_n, \qquad X_{N+k} = -X_k$$

We will refer to the periodicity property of $X_k$ as odd periodicity.

We will develop fast algorithms for computing the DST and IDST which are based on a variant of the Cooley-Tukey fast Fourier transform (FFT). Following the general approach in [1], we will develop algorithms for the IDST given $X_k$ in bit-reversed order. Inverting these yields algorithms for the DST given $x_n$ in natural order. We begin by defining notation which will be needed in the development of these algorithms.

Definition 3.2 Given a C sequence $X_k$ of length $N$, and a factor $p$ of $N$, we define a splitting of $X_k$ consisting of the following $p$ subsequences, each of length $N/p$:

$$X_{k,q} = X_{pk+q}$$

for $0 \le k \le N/p-1$, $0 \le q \le p-1$. We denote the IDST of these by $y_{n,q}$. That is:

$$y_{n,q} = \sum_{k=0}^{N/p-1} X_{k,q}\, \omega_{N/p}^{k(n+1/2)}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$. Given a C sequence $x_n$ of length $N$, and a factor $p$ of $N$, we define the following $p$ subsequences, each of length $N/p$:

$$x_{n,l} = x_{lN/p+n}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

The inverse fast staggered transform (IFST) is based on the principle of computing the quantities $y_{n,q}$, and then combining these in the appropriate fashion to obtain $x_{n,l}$. The precise equation for performing this combining operation is provided by the next theorem.

Theorem 3.2 The inverse combine equation for C sequences is:

$$x_{n,l} = \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \tag{3.3}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 3.2.

$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} \\
&= \sum_{q=0}^{p-1} \sum_{k=0}^{N/p-1} X_{pk+q}\, \omega_N^{(pk+q)(n+1/2)} \\
&= \sum_{q=0}^{p-1} \omega_N^{q(n+1/2)} \sum_{k=0}^{N/p-1} X_{k,q}\, \omega_{N/p}^{k(n+1/2)} \\
&= \sum_{q=0}^{p-1} \omega_N^{q(n+1/2)}\, y_{n,q}
\end{aligned}$$

In terms of the subsequence notation defined previously, this result is:

$$\begin{aligned}
x_{n,l} &= x_{lN/p+n} \\
&= \sum_{q=0}^{p-1} \omega_N^{q(lN/p+n+1/2)}\, y_{lN/p+n,q} \\
&= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q}
\end{aligned}$$
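This completes the proof of Theorem 3.2. The following corollary provides an important special case of this result.

Corollary 3.2 Assume $p = 2$. The inverse combine equation for C sequences is:

$$\begin{aligned}
x_{n,0} &= y_{n,0} + \omega_N^{n+1/2}\, y_{n,1} \\
x_{n,1} &= y_{n,0} - \omega_N^{n+1/2}\, y_{n,1}
\end{aligned}$$

for $0 \le n \le N/2-1$.

We now begin the development of the FST algorithm. We will obtain the forward combine equation for the FST by inverting the inverse combine equation. For this, we will use the 'orthogonality property' provided by Lemma 2.1. The result is summarized in the following theorem.

Theorem 3.3 The forward combine equation for C sequences is:

$$y_{n,q} = \frac{1}{p}\, \omega_N^{-q(n+1/2)} \sum_{l=0}^{p-1} \omega_p^{-lq}\, x_{n,l} \tag{3.4}$$

for $0 \le n \le N/p-1$, $0 \le q \le p-1$.

We now prove Theorem 3.3.

$$\begin{aligned}
y_{n,q} &= \sum_{k=0}^{N/p-1} X_{k,q}\, \omega_{N/p}^{k(n+1/2)} \\
&= \sum_{k=0}^{N/p-1} X_{pk+q}\, \omega_{N/p}^{k(n+1/2)} \\
&= \sum_{k=0}^{N/p-1} \Big[ \frac{1}{N} \sum_{j=0}^{N-1} x_j \omega_N^{-(pk+q)(j+1/2)} \Big] \omega_{N/p}^{k(n+1/2)} \\
&= \frac{1}{N} \sum_{j=0}^{N-1} x_j \omega_N^{-q(j+1/2)} \Big[ \sum_{k=0}^{N/p-1} \omega_{N/p}^{k(n-j)} \Big] \\
&= \frac{1}{p} \sum_{l=0}^{p-1} x_{lN/p+n}\, \omega_N^{-q(lN/p+n+1/2)} \\
&= \frac{1}{p}\, \omega_N^{-q(n+1/2)} \sum_{l=0}^{p-1} \omega_p^{-lq}\, x_{n,l}
\end{aligned}$$

This completes the proof of Theorem 3.3. The following corollary provides an important special case of this result.

Corollary 3.3 Assume $p = 2$. The forward combine equation for C sequences is:

$$\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{n,1})/2 \\
y_{n,1} &= \omega_N^{-(n+1/2)} (x_{n,0} - x_{n,1})/2
\end{aligned}$$

for $0 \le n \le N/2-1$.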
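Corollaries 3.2 and 3.3 are already enough to build a radix-2 transform pair for $N$ a power of two. The following is an illustrative recursive sketch, assuming $w_N = e^{2\pi i/N}$ and keeping the $1/N$ scaling of Definition 3.1. It is not the in-place, bit-reversed, mixed radix algorithm developed in this chapter, and all function names are hypothetical.

```python
import cmath
import math

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (convention assumed in this sketch)."""
    return cmath.exp(2j * math.pi * e / N)

def dst(x):
    """Direct O(N^2) DST: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}."""
    N = len(x)
    return [sum(x[n] * w(N, -k * (n + 0.5)) for n in range(N)) / N for k in range(N)]

def ifst(X):
    """Recursive radix-2 IDST using the p = 2 inverse combine equations."""
    N = len(X)
    if N == 1:
        return [X[0]]
    y0 = ifst(X[0::2])             # IDST of X_{k,0} = X_{2k}
    y1 = ifst(X[1::2])             # IDST of X_{k,1} = X_{2k+1}
    half = N // 2
    x = [0j] * N
    for n in range(half):
        t = w(N, n + 0.5) * y1[n]
        x[n] = y0[n] + t           # x_{n,0} = x_n
        x[half + n] = y0[n] - t    # x_{n,1} = x_{N/2+n}
    return x

def fst(x):
    """Recursive radix-2 DST using the p = 2 forward combine equations."""
    N = len(x)
    if N == 1:
        return [x[0]]
    half = N // 2
    y0 = [(x[n] + x[half + n]) / 2 for n in range(half)]
    y1 = [w(N, -(n + 0.5)) * (x[n] - x[half + n]) / 2 for n in range(half)]
    X = [0j] * N
    X[0::2] = fst(y0)              # X_{2k}   = DST of y_{n,0}
    X[1::2] = fst(y1)              # X_{2k+1} = DST of y_{n,1}
    return X

x = [complex(n * n - 3, 2 - n) for n in range(8)]   # arbitrary complex test data
X_fast = fst(x)
X_direct = dst(x)
x_back = ifst(X_fast)
```

Each level of the recursion halves the transform length, so the operation count is $O(N \log N)$ rather than the $O(N^2)$ of the direct sums. The recursion allocates temporary lists, which a production version would avoid by working in place.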
3.2 Real (R)

In this section, we will be concerned with the following symmetries:

Definition 3.3 A real (R) sequence $x_n$ of length $N$ is defined by:

$$\bar{x}_n = x_n$$

An odd conjugate symmetric (OCS) sequence $X_k$ of length $N$ is defined by:

$$X_{N-k} = -\bar{X}_k$$

The following lemma establishes the relationship between these symmetries.

Lemma 3.1 If $x_n$ is an R sequence of length $N$, then its DST $X_k$ is an OCS sequence of length $N$. If $X_k$ is an OCS sequence of length $N$, then its IDST $x_n$ is an R sequence of length $N$.

We now prove Lemma 3.1. We will only prove the first assertion.

$$X_{N-k} = \frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{-(N-k)(n+1/2)} = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \omega_N^{k(n+1/2)} = -\bar{X}_k$$

This completes the proof of Lemma 3.1.

The next theorem uses the previous lemma to find the real form of the DST and IDST. These results will be used in subsequent sections in order to relate the DST to fast Poisson solvers.

Theorem 3.4 Let $x_n$ be an R sequence and let $X_k$ be its OCS symmetric DST, both of length $N$. The real form of the DST is:

$$\begin{aligned}
\mathrm{Re}(X_k) &= \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos[\pi k(2n+1)/N] \\
\mathrm{Im}(X_k) &= -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin[\pi k(2n+1)/N]
\end{aligned}$$

for $0 \le k \le N/2$ if $N$ is even, and $0 \le k \le (N-1)/2$ if $N$ is odd. If $N$ is even, then the real form of the IDST is:

$$x_n = X_0 + (-1)^{n+1} \mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1} \{ 2\,\mathrm{Re}(X_k) \cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k) \sin[\pi k(2n+1)/N] \}$$

for $0 \le n \le N-1$. If $N$ is odd, we obtain instead:

$$x_n = X_0 + \sum_{k=1}^{(N-1)/2} \{ 2\,\mathrm{Re}(X_k) \cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k) \sin[\pi k(2n+1)/N] \}$$

for $0 \le n \le N-1$.

We now prove Theorem 3.4. The result for the DST follows immediately from Definition 3.1 and the R symmetry of $x_n$. Note that only half of the OCS sequence $X_k$ needs to be specified. We prove the result for the IDST for the case of even $N$ only, since the proof for odd $N$ is similar. The OCS symmetry and odd periodicity of $X_k$ imply:

$$\bar{X}_0 = -X_N = X_0, \qquad \bar{X}_{N/2} = -X_{N/2}$$

Thus, $X_0$ is real and $X_{N/2}$ is pure imaginary. Using this and the OCS symmetry of $X_k$ yields:

$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} \\
&= X_0 + (-1)^n i X_{N/2} + \sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} X_{N-k}\, \omega_N^{(N-k)(n+1/2)} \\
&= X_0 + (-1)^{n+1} \mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)} + \sum_{k=1}^{N/2-1} \bar{X}_k \omega_N^{-k(n+1/2)} \\
&= X_0 + (-1)^{n+1} \mathrm{Im}(X_{N/2}) + 2\,\mathrm{Re}\Big[ \sum_{k=1}^{N/2-1} X_k \omega_N^{k(n+1/2)} \Big] \\
&= X_0 + (-1)^{n+1} \mathrm{Im}(X_{N/2}) + \sum_{k=1}^{N/2-1} \{ 2\,\mathrm{Re}(X_k) \cos[\pi k(2n+1)/N] - 2\,\mathrm{Im}(X_k) \sin[\pi k(2n+1)/N] \}
\end{aligned}$$

This completes the proof of Theorem 3.4.

We now develop a fast, mixed radix algorithm for computing the R symmetric DST and its inverse, given $x_n$ in natural order. Note that an R sequence of length $N$ may be stored in $N$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Also, an OCS sequence of length $N$ may be stored in $N$ real storage locations because half of the sequence is redundant and need not be stored. Our goal is to exploit these symmetries in the data in order to obtain a reduction by half in both storage requirements and number of operations compared to that for C sequences. This algorithm is based on the symmetries which occur in the splittings of the OCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.4 Let $X_k$ be an OCS sequence of length $N$ with factor $p$. For $q \ne 0$, we define OCS induced intersequence symmetry (OCSIS) by:

$$X_{k,p-q} = -\bar{X}_{N/p-k-1,q}$$

For $q \ne 0$, we denote subsequence $X_{k,q}$ by OCSIS($q$). Subsequence $p-q$ is a redundant copy of subsequence $q$, which we denote by OCSIS($p-q$) = OCSIS*($q$). We also say that subsequence $p-q$ is the dual of subsequence $q$.

A staggered odd conjugate symmetric (SOCS) sequence $X_k$ of length $N$ is defined by:

$$X_{N-k-1} = -\bar{X}_k$$

Let $N$ have factor $p$. For $0 \le q \le p-1$, we define SOCS induced intersequence symmetry (SOCSIS) by:

$$X_{k,p-q-1} = -\bar{X}_{N/p-k-1,q}$$

For $0 \le q \le p-1$, we denote subsequence $X_{k,q}$ by SOCSIS($q$). Subsequence $p-q-1$ is a redundant copy of subsequence $q$, which we denote by SOCSIS($p-q-1$) = SOCSIS*($q$). We also say that subsequence $p-q-1$ is the dual of subsequence $q$.
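Lemma 3.1 and the even-$N$ real form of the IDST in Theorem 3.4 can be checked on a small real sequence. The sketch below assumes $w_N = e^{2\pi i/N}$ and the $1/N$ scaling of Definition 3.1; `dst` and `real_idst` are hypothetical helper names, and `real_idst` reads only $X_0, \ldots, X_{N/2}$ from its argument.

```python
import cmath
import math

def dst(x):
    """Direct DST: X_k = (1/N) sum_n x_n w_N^{-k(n+1/2)}, w_N = exp(2*pi*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N) for n in range(N)) / N
            for k in range(N)]

def real_idst(X, N):
    """Real form of the IDST for even N; uses only X_0 .. X_{N/2}."""
    out = []
    for n in range(N):
        s = X[0].real + (-1) ** (n + 1) * X[N // 2].imag
        for k in range(1, N // 2):
            theta = math.pi * k * (2 * n + 1) / N
            s += 2.0 * X[k].real * math.cos(theta) - 2.0 * X[k].imag * math.sin(theta)
        out.append(s)
    return out

x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0]   # arbitrary real test data, N = 6
N = len(x)
X = dst(x)
ocs_error = max(abs(X[N - k] + X[k].conjugate()) for k in range(1, N))   # X_{N-k} = -conj(X_k)
x_back = real_idst(X, N)
```

The stored half consists of one real number ($X_0$), one pure imaginary number ($X_{N/2}$), and $N/2-1$ complex numbers, for a total of $N$ real values, which matches the storage count stated above.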
The following lemma establishes the relationship between these symmetries.

Lemma 3.2 Let $X_k$ be an OCS sequence of length $N$ with factor $p$. Then the subsequence $X_{k,0}$ is OCS symmetric, and the remaining subsequences $X_{k,q}$ are OCSIS symmetric. If $p$ is even, then the OCSIS symmetry of subsequence $X_{k,p/2}$ reduces to SOCS symmetry.

Let $X_k$ be an SOCS sequence of length $N$ with factor $p$. Then the subsequences $X_{k,q}$ are SOCSIS symmetric. If $p$ is odd, then the SOCSIS symmetry of subsequence $X_{k,(p-1)/2}$ reduces to SOCS symmetry.

We now prove Lemma 3.2. Let $X_k$ be an OCS sequence of length $N$ with factor $p$. The subsequence $X_{k,0}$ satisfies:

$$X_{N/p-k,0} = X_{N-pk} = -\bar{X}_{pk} = -\bar{X}_{k,0}$$

That is, subsequence $X_{k,0}$ is OCS symmetric. The remaining subsequences $X_{k,q}$ satisfy:

$$X_{k,p-q} = X_{pk+p-q} = -\bar{X}_{N-pk-p+q} = -\bar{X}_{p(N/p-k-1)+q} = -\bar{X}_{N/p-k-1,q}$$

That is, for $q \ne 0$ the subsequences $X_{k,q}$ are OCSIS symmetric. If $p$ is even, then the OCSIS symmetry of $X_{k,p/2}$ reduces to:

$$X_{k,p/2} = -\bar{X}_{N/p-k-1,p/2}$$

That is, subsequence $X_{k,p/2}$ is SOCS symmetric.

Let $X_k$ be an SOCS sequence of length $N$ with factor $p$. The subsequences $X_{k,q}$ satisfy:

$$X_{k,p-q-1} = X_{pk+p-q-1} = -\bar{X}_{N-pk-p+q+1-1} = -\bar{X}_{p(N/p-k-1)+q} = -\bar{X}_{N/p-k-1,q}$$

That is, the subsequences $X_{k,q}$ are SOCSIS symmetric. If $p$ is odd, then the SOCSIS symmetry of $X_{k,(p-1)/2}$ reduces to:

$$X_{k,(p-1)/2} = -\bar{X}_{N/p-k-1,(p-1)/2}$$

That is, subsequence $X_{k,(p-1)/2}$ is SOCS symmetric. This completes the proof of Lemma 3.2.

A mixed radix splitting tree diagram for an OCS sequence is shown in Figure 3.1. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.

Lemma 3.3 The intersequence symmetry OCSIS induces the following intersequence symmetry in the IDST:

$$y_{n,p-q} = \omega_{N/p}^{-(n+1/2)}\, \bar{y}_{n,q}$$

Let $X_k$ be an SOCS sequence of length $N$. Its IDST $x_n$ satisfies:

$$x_n = \omega_N^{-(n+1/2)}\, \bar{x}_n, \qquad x_n = \omega_N^{-(n+1/2)/2}\, \tilde{x}_n$$

where $\tilde{x}_n$ is the magnitude of $x_n$, and hence is real. The intersequence symmetry SOCSIS induces the following intersequence symmetry in the IDST:

$$y_{n,p-q-1} = \omega_{N/p}^{-(n+1/2)}\, \bar{y}_{n,q}$$

We now prove Lemma 3.3. Let $X_{k,q}$ be OCSIS symmetric. Then the IDST of $X_{k,p-q}$ is:

$$\begin{aligned}
y_{n,p-q} &= \sum_{k=0}^{N/p-1} X_{k,p-q}\, \omega_{N/p}^{k(n+1/2)} \\
&= -\sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\, \omega_{N/p}^{k(n+1/2)} \\
&= -\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\, \omega_{N/p}^{(N/p-k-1)(n+1/2)} \\
&= \omega_{N/p}^{-(n+1/2)} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\, \omega_{N/p}^{-k(n+1/2)} \\
&= \omega_{N/p}^{-(n+1/2)}\, \bar{y}_{n,q}
\end{aligned}$$
Figure 3.1: Splitting tree for R symmetric FST. [Diagram not reproduced; recoverable node labels: OCS, OCSIS(1), OCSIS*(1), C.]
Let $X_k$ be an SOCS sequence of length $N$. Its IDST $x_n$ satisfies:

$$\begin{aligned}
x_n &= \sum_{k=0}^{N-1} X_k \omega_N^{k(n+1/2)} \\
&= -\sum_{k=0}^{N-1} \bar{X}_k\, \omega_N^{(N-k-1)(n+1/2)} \\
&= \omega_N^{-(n+1/2)} \sum_{k=0}^{N-1} \bar{X}_k\, \omega_N^{-k(n+1/2)} \\
&= \omega_N^{-(n+1/2)}\, \bar{x}_n
\end{aligned}$$

We express $x_n$ in polar form as follows:

$$x_n = \tilde{x}_n e^{i\theta}$$

Substituting this into the preceding symmetry for $x_n$ and solving for $\theta$ leads to:

$$x_n = \omega_N^{-(n+1/2)/2}\, \tilde{x}_n$$

Let $X_{k,q}$ be SOCSIS symmetric. Then the IDST of $X_{k,p-q-1}$ is:

$$\begin{aligned}
y_{n,p-q-1} &= \sum_{k=0}^{N/p-1} X_{k,p-q-1}\, \omega_{N/p}^{k(n+1/2)} \\
&= -\sum_{k=0}^{N/p-1} \bar{X}_{N/p-k-1,q}\, \omega_{N/p}^{k(n+1/2)} \\
&= -\sum_{k=0}^{N/p-1} \bar{X}_{k,q}\, \omega_{N/p}^{(N/p-k-1)(n+1/2)} \\
&= \omega_{N/p}^{-(n+1/2)} \sum_{k=0}^{N/p-1} \bar{X}_{k,q}\, \omega_{N/p}^{-k(n+1/2)} \\
&= \omega_{N/p}^{-(n+1/2)}\, \bar{y}_{n,q}
\end{aligned}$$

This completes the proof of Lemma 3.3.

The preceding lemma shows that each symmetry appearing in Figure 3.1 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference. The next theorem provides all of the inverse combine equations for the R symmetric IFST.
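The symmetries in Lemmas 3.2 and 3.3 are easy to observe numerically. The sketch below splits the DST of a real sequence with $N = 12$ and $p = 4$, then measures how far each stated symmetry is from holding exactly. It assumes $w_N = e^{2\pi i/N}$; the helper names and the test data are hypothetical.

```python
import cmath
import math

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (assumed convention)."""
    return cmath.exp(2j * math.pi * e / N)

def dst(x):
    N = len(x)
    return [sum(x[n] * w(N, -k * (n + 0.5)) for n in range(N)) / N for k in range(N)]

def idst(X):
    N = len(X)
    return [sum(X[k] * w(N, k * (n + 0.5)) for k in range(N)) for n in range(N)]

N, p = 12, 4
L = N // p                                            # subsequence length N/p
x = [math.cos(0.7 * n) + 0.1 * n for n in range(N)]   # arbitrary real data
X = dst(x)                                            # OCS sequence
sub = [X[q::p] for q in range(p)]                     # X_{k,q} = X_{pk+q}
y = [idst(s) for s in sub]                            # y_{n,q}

# Lemma 3.2: subsequence p-1 is the dual of subsequence 1 (OCSIS),
# and subsequence p/2 is SOCS symmetric.
ocsis_err = max(abs(sub[p - 1][k] + sub[1][L - k - 1].conjugate()) for k in range(L))
socs_err = max(abs(sub[p // 2][L - k - 1] + sub[p // 2][k].conjugate()) for k in range(L))

# Lemma 3.3: induced symmetry y_{n,p-q} = w_{N/p}^{-(n+1/2)} conj(y_{n,q}) for q = 1.
induced_err = max(abs(y[p - 1][n] - w(L, -(n + 0.5)) * y[1][n].conjugate()) for n in range(L))

# y_{n,0} is real, and w_{N/p}^{(n+1/2)/2} y_{n,p/2} (the "magnitude") is real.
real_err = max(max(abs(y[0][n].imag), abs((w(L, (n + 0.5) / 2) * y[p // 2][n]).imag))
               for n in range(L))
```

All four error measures should be at rounding level, which is what allows the dual branch to be dropped from the splitting tree.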
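Theorem 3.5 Assume that $p$ is even. The inverse combine equation for OCS, SOCS, and OCSIS sequences is:

$$x_{n,l} = y_{n,0} + (-1)^l \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big] \tag{3.5}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. Note that $x_{n,l}$ is real because $y_{n,0}$ is real. The inverse combine equation for SOCSIS sequences is:

$$\tilde{x}_{n,l} = 2\,\mathrm{Re}\Big[ \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big] \tag{3.6}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

Next, assume that $p$ is odd. The inverse combine equation for OCS and OCSIS sequences is:

$$x_{n,l} = y_{n,0} + 2\,\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big] \tag{3.7}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$. The inverse combine equation for SOCS and SOCSIS sequences is:

$$\tilde{x}_{n,l} = (-1)^l \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[ \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big] \tag{3.8}$$

for $0 \le n \le N/p-1$, $0 \le l \le p-1$.

We now prove Theorem 3.5. First, assume that $p$ is even. Consider the combining of OCS, SOCS, and OCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \\
&= y_{n,0} + \omega_p^{lp/2}\, \omega_N^{p(n+1/2)/2}\, y_{n,p/2} + \sum_{q=1}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=1}^{p/2-1} \omega_p^{l(p-q)}\, \omega_N^{(p-q)(n+1/2)}\, y_{n,p-q} \\
&= y_{n,0} + (-1)^l \omega_N^{p(n+1/2)/2} \big[ \omega_{N/p}^{-(n+1/2)/2}\, \tilde{y}_{n,p/2} \big] + \sum_{q=1}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=1}^{p/2-1} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q} \\
&= y_{n,0} + (-1)^l \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\Big[ \sum_{q=1}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big]
\end{aligned}$$

Consider the combining of SOCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \\
&= \sum_{q=0}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=0}^{p/2-1} \omega_p^{l(p-q-1)}\, \omega_N^{(p-q-1)(n+1/2)}\, y_{n,p-q-1} \\
&= \sum_{q=0}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \omega_p^{-l}\, \omega_N^{-(n+1/2)} \sum_{q=0}^{p/2-1} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q}
\end{aligned}$$

Using SOCS symmetry yields:

$$x_{n,l} = x_{lN/p+n} = \omega_N^{-(lN/p+n+1/2)/2}\, \tilde{x}_{lN/p+n} = \omega_p^{-l/2}\, \omega_N^{-(n+1/2)/2}\, \tilde{x}_{n,l} \tag{3.9}$$

Substituting this into the combine equation above yields:

$$\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \omega_p^{-l/2}\, \omega_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q} \\
&= 2\,\mathrm{Re}\Big[ \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big]
\end{aligned}$$

Next, assume that $p$ is odd. Consider the combining of OCS and OCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=1}^{(p-1)/2} \omega_p^{l(p-q)}\, \omega_N^{(p-q)(n+1/2)}\, y_{n,p-q} \\
&= y_{n,0} + \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=1}^{(p-1)/2} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q} \\
&= y_{n,0} + 2\,\mathrm{Re}\Big[ \sum_{q=1}^{(p-1)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big]
\end{aligned}$$

Consider the combining of SOCS and SOCSIS sequences. Substituting the symmetries found earlier into the inverse combine equation (3.3) yields:

$$\begin{aligned}
x_{n,l} &= \sum_{q=0}^{p-1} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \\
&= \omega_p^{l(p-1)/2}\, \omega_N^{(n+1/2)(p-1)/2}\, y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \sum_{q=0}^{(p-3)/2} \omega_p^{l(p-q-1)}\, \omega_N^{(p-q-1)(n+1/2)}\, y_{n,p-q-1} \\
&= \omega_p^{l(p-1)/2}\, \omega_N^{(n+1/2)(p-1)/2}\, y_{n,(p-1)/2} + \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \omega_p^{-l}\, \omega_N^{-(n+1/2)} \sum_{q=0}^{(p-3)/2} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q}
\end{aligned}$$

Combining this with equation (3.9) yields:

$$\begin{aligned}
\tilde{x}_{n,l} &= \omega_p^{lp/2}\, \omega_N^{p(n+1/2)/2}\, y_{n,(p-1)/2} + \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} + \omega_p^{-l/2}\, \omega_N^{-(n+1/2)/2} \sum_{q=0}^{(p-3)/2} \omega_p^{-lq}\, \omega_N^{-q(n+1/2)}\, \bar{y}_{n,q} \\
&= (-1)^l \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\Big[ \omega_p^{l/2}\, \omega_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} \omega_p^{lq}\, \omega_N^{q(n+1/2)}\, y_{n,q} \Big]
\end{aligned}$$

Here we have used $\omega_p^{lp/2} = (-1)^l$ and the SOCS symmetry of subsequence $(p-1)/2$, which gives $\omega_N^{p(n+1/2)/2}\, y_{n,(p-1)/2} = \tilde{y}_{n,(p-1)/2}$.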
This completes the proof of Theorem 3.5. The following corollary provides an important special case of this result.

Corollary 3.4 Assume $p = 2$. The inverse combine equation for OCS and SOCS sequences is:
\[
x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{n,1} = y_{n,0} - \tilde{y}_{n,1}
\]
for $0 \le n \le N/2 - 1$. The inverse combine equation for SOCSIS sequences is:
\[
\tilde{x}_{n,0} = 2\,\mathrm{Re}\left[ w_N^{(n+1/2)/2} y_{n,0} \right], \qquad \tilde{x}_{n,1} = -2\,\mathrm{Im}\left[ w_N^{(n+1/2)/2} y_{n,0} \right]
\]
for $0 \le n \le N/2 - 1$.

The next theorem provides all of the forward combine equations for the R symmetric FST.

Theorem 3.6 Assume that $p$ is even. The forward combine equation for OCS, SOCS, and OCSIS sequences is given by equation (3.4) for $0 \le n \le N/p - 1$, $0 \le q \le p/2 - 1$ and:
\[
\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l x_{n,l} \qquad (3.10)
\]
for $0 \le n \le N/p - 1$. The forward combine equation for SOCSIS sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} \qquad (3.11)
\]
for $0 \le n \le N/p - 1$, $0 \le q \le p/2 - 1$.

Next, assume that $p$ is odd. The forward combine equation for OCS and OCSIS sequences is given by equation (3.4) for $0 \le n \le N/p - 1$, $0 \le q \le (p-1)/2$. The forward combine equation for SOCS and SOCSIS sequences is given by equation (3.11) for $0 \le n \le N/p - 1$, $0 \le q \le (p-3)/2$ and:
\[
\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l \tilde{x}_{n,l} \qquad (3.12)
\]
for $0 \le n \le N/p - 1$.
We now prove Theorem 3.6. First, assume that $p$ is even. The forward combining of OCS, SOCS, and OCSIS sequences requires one new equation:
\[
\begin{gathered}
\tilde{y}_{n,p/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,p/2} \\
= \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2} x_{n,l} \\
= \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l x_{n,l}
\end{gathered}
\]
The forward combine equation for SOCSIS sequences is obtained by substituting equation (3.9) into equation (3.4):
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} \left[ w_p^{-l/2} w_N^{-(n+1/2)/2}\, \tilde{x}_{n,l} \right] \\
= \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \sum_{l=0}^{p-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l}
\end{gathered}
\]

Next, assume that $p$ is odd. The forward combining of OCS and OCSIS sequences does not require any new equations. The forward combining of SOCS and SOCSIS sequences requires one new equation:
\[
\begin{gathered}
\tilde{y}_{n,(p-1)/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,(p-1)/2} \\
= \frac{1}{p} \sum_{l=0}^{p-1} w_p^{-lp/2} \tilde{x}_{n,l} \\
= \frac{1}{p} \sum_{l=0}^{p-1} (-1)^l \tilde{x}_{n,l}
\end{gathered}
\]
This completes the proof of Theorem 3.6. The following corollary provides an important special case of this result.

Corollary 3.5 Assume $p = 2$. The forward combine equation for OCS and SOCS sequences is:
\[
y_{n,0} = (x_{n,0} + x_{n,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} - x_{n,1})/2
\]
for $0 \le n \le N/2 - 1$. The forward combine equation for SOCSIS sequences is:
\[
y_{n,0} = w_N^{-(n+1/2)/2} \left( \tilde{x}_{n,0} - i\, \tilde{x}_{n,1} \right)/2
\]
for $0 \le n \le N/2 - 1$.
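The radix-2 combine equations of Corollaries 3.4 and 3.5 for OCS and SOCS sequences can be spot-checked numerically. The sketch below is illustrative only: it assumes $w_N = e^{2\pi i/N}$, a forward transform $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$, and its inverse $x_n = \sum_k X_k w_N^{k(n+1/2)}$, and it uses direct $O(N^2)$ sums rather than the fast algorithm. The helper names `w`, `dst`, and `idst` are hypothetical.

```python
import cmath
import math
import random


def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * math.pi * e / N)


def dst(x):
    """Assumed forward transform: X_k = (1/N) sum_n x_n w_N^(-k(n+1/2))."""
    N = len(x)
    return [sum(x[n] * w(N, -k * (n + 0.5)) for n in range(N)) / N
            for k in range(N)]


def idst(X):
    """Assumed inverse transform: x_n = sum_k X_k w_N^(k(n+1/2))."""
    N = len(X)
    return [sum(X[k] * w(N, k * (n + 0.5)) for k in range(N))
            for n in range(N)]


random.seed(0)
N = 8
M = N // 2
x = [random.uniform(-1.0, 1.0) for _ in range(N)]  # real data, so X is OCS
X = dst(x)

y0 = idst(X[0::2])  # y_{n,0}: IDST of the even-indexed terms (OCS child)
y1 = idst(X[1::2])  # y_{n,1}: IDST of the odd-indexed terms (SOCS child)
yt1 = [w(M, (n + 0.5) / 2) * y1[n] for n in range(M)]  # tilde y_{n,1}

# Corollary 3.4: x_{n,0} = y_{n,0} + yt_{n,1}, x_{n,1} = y_{n,0} - yt_{n,1}
inverse_error = max(
    max(abs(x[n] - (y0[n] + yt1[n])), abs(x[M + n] - (y0[n] - yt1[n])))
    for n in range(M)
)
# Corollary 3.5: y_{n,0} = (x_{n,0} + x_{n,1})/2, yt_{n,1} = (x_{n,0} - x_{n,1})/2
forward_error = max(
    max(abs(y0[n] - (x[n] + x[M + n]) / 2),
        abs(yt1[n] - (x[n] - x[M + n]) / 2))
    for n in range(M)
)
# Both y_{n,0} and tilde y_{n,1} should be real up to rounding.
largest_imag = max(abs(v.imag) for v in y0 + yt1)
```

Under these assumptions all three quantities should be at rounding level, which illustrates why the tilde sequence, rather than $y_{n,1}$ itself, is the natural quantity to store.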
3.3 Real Staggered Even (RSE)

In this section, we will be concerned with the following symmetries:

Definition 3.5 A real staggered even (RSE) sequence $x_n$ of length $N$ is defined by:
\[
\bar{x}_n = x_n, \qquad x_{N-n-1} = x_n
\]
A real odd conjugate symmetric (ROCS) sequence $X_k$ of length $N$ is defined by:
\[
\bar{X}_k = X_k, \qquad X_{N-k} = -\bar{X}_k
\]
Note that an ROCS sequence may be viewed as having both R and OCS symmetry, or equivalently as an RO sequence.

The following lemma establishes the relationship between these symmetries.

Lemma 3.4 If $x_n$ is an RSE sequence of length $N$, then its DST $X_k$ is an ROCS sequence of length $N$. If $X_k$ is an ROCS sequence of length $N$, then its IDST $x_n$ is an RSE sequence of length $N$.

We now prove Lemma 3.4. We will only prove the first assertion. Let $x_n$ be an RSE sequence of length $N$. Since $x_n$ is also R symmetric, Lemma 3.1 implies that its DST $X_k$ is an OCS sequence of length $N$. Thus, we have only to prove that $X_k$ is R symmetric as well:
\[
\begin{gathered}
X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n w_N^{-k(n+1/2)} \\
= \frac{1}{N} \sum_{n=0}^{N-1} x_{N-n-1}\, w_N^{-k(N-n-1/2)} \\
= \frac{1}{N} \sum_{n=0}^{N-1} x_n w_N^{k(n+1/2)} \\
= \bar{X}_k
\end{gathered}
\]
This completes the proof of Lemma 3.4.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NSNS boundary conditions. Note that if $N$ is even, then an RSE sequence satisfies NSNS boundary conditions for the computational domain $0 \le n \le N/2 - 1$. That is:
\[
x_{N-1} = x_0, \qquad x_{N/2-1} = x_{N/2}
\]

Theorem 3.7 Let $x_n$ be an RSE sequence and let $X_k$ be its ROCS symmetric DST, both of length $N$ where $N$ is even. The real form of the DST is:
\[
X_k = \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]
\]
for $0 \le k \le N/2 - 1$. The real form of the IDST is:
\[
x_n = X_0 + \sum_{k=1}^{N/2-1} 2 X_k \cos[\pi k(2n+1)/N]
\]
for $0 \le n \le N/2 - 1$.

We now prove Theorem 3.7. The result for the DST follows from Theorem 3.4, the ROCS symmetry of $X_k$, and the RSE symmetry of $x_n$ as follows:
\[
\begin{gathered}
X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \cos[\pi k(2n+1)/N] \\
= \frac{1}{N} \left\{ \sum_{n=0}^{N/2-1} x_n \cos[\pi k(2n+1)/N] + \sum_{n=0}^{N/2-1} x_{N-n-1} \cos[\pi k(2N-2n-1)/N] \right\} \\
= \frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \cos[\pi k(2n+1)/N]
\end{gathered}
\]
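The real forms in Theorem 3.7 can be checked against a direct complex evaluation. The sketch below is a minimal illustration, assuming the complex DST is $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$ with $w_N = e^{2\pi i/N}$; the function name `dst` and the sample data are hypothetical, and the sums are evaluated directly rather than by the fast algorithm.

```python
import cmath
import math


def dst(x):
    """Assumed complex DST: X_k = (1/N) sum_n x_n exp(-2*pi*i*k*(n+1/2)/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
                for n in range(N)) / N
            for k in range(N)]


half = [0.3, -1.2, 2.0, 0.7, -0.4, 1.5]  # x_0 .. x_{N/2-1} (sample values)
x = half + half[::-1]                    # RSE extension: x_{N-n-1} = x_n
N = len(x)
H = N // 2

X_complex = dst(x)

# Real form of the DST: only the stored half of x is used.
X_real = [sum(2 * half[n] * math.cos(math.pi * k * (2 * n + 1) / N)
              for n in range(H)) / N
          for k in range(H)]

# Real form of the IDST: rebuild the stored half from X_0 .. X_{N/2-1}.
x_back = [X_real[0] + sum(2 * X_real[k] * math.cos(math.pi * k * (2 * n + 1) / N)
                          for k in range(1, H))
          for n in range(H)]

dst_error = max(abs(X_complex[k] - X_real[k]) for k in range(H))
idst_error = max(abs(x_back[n] - half[n]) for n in range(H))
# ROCS symmetry of the full transform: X_{N-k} = -X_k for 1 <= k <= N-1.
rocs_error = max(abs(X_complex[N - k] + X_complex[k]) for k in range(1, N))
```

Only $N/2$ real inputs and $N/2$ real outputs appear in the real forms, which is the storage saving the fast algorithm is designed to exploit.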
The result for the IDST follows immediately from Theorem 3.4 and the ROCS symmetry of $X_k$. Note that only half of the RSE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.7.

We now develop a fast, mixed radix algorithm for computing the RSE symmetric DST and its inverse, given $x_n$ in natural order. Note that an RSE sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ROCS sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the ROCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.6 Let $X_k$ be an ROCS sequence of length $N$ with factor $p$. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 3.4, with the addition that all sequences are real as well. We indicate this by preceding the acronym for each symmetry with an R.

The relationships between the symmetries recorded in Lemma 3.2 are not affected by the fact that all sequences have R symmetry as well. A mixed radix splitting tree diagram for an ROCS sequence is shown in Figure 3.2. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.
Lemma 3.5 The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemma 3.3, with the following addition. Let $X_k$ be an R sequence of length $N$. Its IDST $x_n$ satisfies:
\[
x_{N-n-1} = \bar{x}_n
\]
Since all sequences have R symmetry, only half of the IDST of any sequence needs to be computed.
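The symmetry stated in Lemma 3.5 is easy to confirm numerically. The following sketch assumes the IDST is $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$; the function name `idst` is hypothetical and the transform is evaluated by a direct sum.

```python
import cmath
import math
import random


def idst(X):
    """Assumed IDST: x_n = sum_k X_k exp(2*pi*i*k*(n+1/2)/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * (n + 0.5) / N)
                for k in range(N))
            for n in range(N)]


random.seed(2)
N = 9  # the symmetry does not require N to be even
X = [random.uniform(-1.0, 1.0) for _ in range(N)]  # an arbitrary real (R) sequence
x = idst(X)

# Lemma 3.5: x_{N-n-1} equals the complex conjugate of x_n.
symmetry_error = max(abs(x[N - n - 1] - x[n].conjugate()) for n in range(N))
```

Because the upper half of $x_n$ is the conjugate mirror image of the lower half, storing $x_n$ for the lower half-range of $n$ loses no information.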
Figure 3.2: Splitting tree for RSE symmetric FST
We now prove Lemma 3.5. Let $X_k$ be an R sequence of length $N$. Its IDST $x_n$ satisfies:
\[
\begin{gathered}
x_{N-n-1} = \sum_{k=0}^{N-1} X_k w_N^{k(N-n-1/2)} \\
= \sum_{k=0}^{N-1} X_k w_N^{-k(n+1/2)} \\
= \bar{x}_n
\end{gathered}
\]
This completes the proof of Lemma 3.5.

The preceding lemma shows that each symmetry appearing in Figure 3.2 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RSE symmetric IFST.

Theorem 3.8 Assume that $p$ is even. The inverse combine equation for ROCS, RSOCS, and ROCSIS sequences is given by equation (3.5) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
x_{N/p-n-1,l} = y_{n,0} + (-1)^{l+1} \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.13)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. The inverse combine equation for RSOCSIS sequences is given by equation (3.6) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
\tilde{x}_{N/p-n-1,l} = 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.14)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. The inverse combine equation for R sequences is given by equation (3.3) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \qquad (3.15)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$.

Next, assume that $p$ is odd. The inverse combine equation for ROCS and ROCSIS sequences is given by equation (3.7) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation:
\[
x_{N/p-n-1,l} = y_{n,0} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.16)
\]
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine equation for RSOCS and RSOCSIS sequences is given by equation (3.8) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation:
\[
\tilde{x}_{N/p-n-1,l} = (-1)^{l+1} \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.17)
\]
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine equation for R sequences is given by equation (3.3) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation (3.15) for the lower half-range of $n$ and $0 \le l \le (p-3)/2$.

We now prove Theorem 3.8. First, assume that $p$ is even. Consider the combining of ROCS, RSOCS, and ROCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.5), we need the following companion equation:
\[
x_{N/p-n-1,l} = y_{N/p-n-1,0} + (-1)^l \tilde{y}_{N/p-n-1,p/2} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]
Using RSOCS symmetry yields:
\[
\begin{gathered}
\tilde{y}_{N/p-n-1,q} = w_{N/p}^{(N/p-n-1/2)/2}\, y_{N/p-n-1,q} \\
= -w_{N/p}^{-(n+1/2)/2}\, \bar{y}_{n,q} \\
= -\tilde{y}_{n,q}
\end{gathered}
\qquad (3.18)
\]
Substituting this into the companion equation above yields:
\[
\begin{gathered}
x_{N/p-n-1,l} = y_{n,0} + (-1)^{l+1} \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= y_{n,0} + (-1)^{l+1} \tilde{y}_{n,p/2} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of RSOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.6), we need the following companion equation:
\[
\begin{gathered}
\tilde{x}_{N/p-n-1,l} = 2\,\mathrm{Re}\left[ w_p^{l/2} w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right] \\
= 2\,\mathrm{Re}\left[ w_p^{(l+1)/2} w_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of R sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.3), we need the following companion equation:
\[
\begin{gathered}
x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \\
= \sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q}
\end{gathered}
\]

Next, assume that $p$ is odd. Consider the combining of ROCS and ROCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.7), we need the following companion equation:
\[
x_{N/p-n-1,l} = y_{N/p-n-1,0} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]
\[
\begin{gathered}
= y_{n,0} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= y_{n,0} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of RSOCS and RSOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.8), we need the following companion equation:
\[
\tilde{x}_{N/p-n-1,l} = (-1)^l \tilde{y}_{N/p-n-1,(p-1)/2} + 2\,\mathrm{Re}\left[ w_p^{l/2} w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]
Substituting equation (3.18) into the companion equation above yields:
\[
\begin{gathered}
\tilde{x}_{N/p-n-1,l} = (-1)^{l+1} \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\left[ w_p^{(l+1)/2} w_N^{-(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= (-1)^{l+1} \tilde{y}_{n,(p-1)/2} + 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]
The companion equation for R sequences is identical to the even $p$ case. This completes the proof of Theorem 3.8. The following corollary provides an important special case of this result.

Corollary 3.6 Assume $p = 2$. The inverse combine equation for ROCS and RSOCS sequences is:
\[
x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{N/2-n-1,0} = y_{n,0} - \tilde{y}_{n,1}
\]
for the lower half-range of $n$. The inverse combine equation for RSOCSIS sequences is:
\[
\tilde{x}_{n,0} = 2\,\mathrm{Re}\left[ w_N^{(n+1/2)/2} y_{n,0} \right], \qquad \tilde{x}_{N/2-n-1,0} = 2\,\mathrm{Im}\left[ w_N^{(n+1/2)/2} y_{n,0} \right]
\]
for the lower half-range of $n$. The inverse combine equation for R sequences is:
\[
x_{n,0} = y_{n,0} + w_N^{n+1/2}\, y_{n,1}, \qquad x_{N/2-n-1,0} = \bar{y}_{n,0} - w_N^{-(n+1/2)}\, \bar{y}_{n,1}
\]
for the lower half-range of $n$.

The next theorem provides all of the forward combine equations for the RSE symmetric FST.

Theorem 3.9 Assume that $p$ is even. The forward combine equation for R sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \qquad (3.19)
\]
for the lower half-range of $n$ and $0 \le q \le p - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right] \qquad (3.20)
\]
This ensures that the final output is real because $N = p$, $n = 0$ in the last stage of the algorithm. The forward combine equation for ROCS, RSOCS, and ROCSIS sequences is given by equation (3.19) for the lower half-range of $n$ and $0 \le q \le p/2 - 1$ with the exception that all sequences $x_{n,l}$ are real. In addition:
\[
\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l \left[ x_{n,l} - x_{-n-1,l+1} \right] \qquad (3.21)
\]
for the lower half-range of $n$. For $N = p$, $n = 0$ this reduces to $\tilde{y}_{0,p/2} = 0$. The forward combine equation for RSOCSIS sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left[ \sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} + \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \qquad (3.22)
\]
for the lower half-range of $n$ and $0 \le q \le p/2 - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \qquad (3.23)
\]

Next, assume that $p$ is odd. The forward combine equation for R sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\} \qquad (3.24)
\]
for the lower half-range of $n$ and $0 \le q \le p - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2\,\mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\} \qquad (3.25)
\]
Note that $y_{0,q}$ is real because $x_{0,(N-1)/2} = x_{(N-1)/2}$ is real. The forward combine equation for ROCS and ROCSIS sequences is given by equation (3.24) for the lower half-range of $n$ and $0 \le q \le (p-1)/2$ with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for RSOCS and RSOCSIS sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left\{ i(-1)^{q+1} w_p^{(q+1/2)/2}\, \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-l(q+1/2)} \tilde{x}_{n,l} + w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \right\} \qquad (3.26)
\]
for the lower half-range of $n$ and $0 \le q \le (p-3)/2$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \qquad (3.27)
\]
In addition:
\[
\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \left\{ (-1)^{(p-1)/2} \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l \left[ \tilde{x}_{n,l} - \tilde{x}_{-n-1,l+1} \right] \right\} \qquad (3.28)
\]
for the lower half-range of $n$. For $N = p$, $n = 0$ this reduces to $\tilde{y}_{0,(p-1)/2} = 0$ because $\tilde{x}_{0,(N-1)/2} = \tilde{x}_{(N-1)/2} = 0$.
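Equations (3.19) and (3.20) state that, for R sequences, the forward combine can be evaluated from the first half of the data only. The sketch below compares the compact form with the full sum of equation (3.4). It is an illustration under stated assumptions: $w_N = e^{2\pi i/N}$, $x_{n,l} = x_{lN/p+n}$ (so that $x_{-n-1,l+1} = x_{(l+1)N/p-n-1}$), and an IDST of the form $x_n = \sum_k X_k w_N^{k(n+1/2)}$. The helper names are hypothetical.

```python
import cmath
import math
import random


def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * math.pi * e / N)


def idst(X):
    """Assumed IDST: x_n = sum_k X_k w_N^(k(n+1/2))."""
    N = len(X)
    return [sum(X[k] * w(N, k * (n + 0.5)) for k in range(N))
            for n in range(N)]


random.seed(1)
N, p = 12, 4
M = N // p
X = [random.uniform(-1.0, 1.0) for _ in range(N)]  # real X_k (an R sequence)
x = idst(X)                                         # so x_{N-n-1} = conj(x_n)


def y_full(n, q):
    """Equation (3.4): uses all p blocks x_{n,l}, l = 0 .. p-1."""
    return w(N, -q * (n + 0.5)) * sum(
        w(p, -l * q) * x[l * M + n] for l in range(p)) / p


def y_compact(n, q):
    """Equation (3.19): uses only x_0 .. x_{N/2-1}."""
    return w(N, -q * (n + 0.5)) * sum(
        w(p, -l * q) * x[l * M + n]
        + w(p, q * (l + 1)) * x[(l + 1) * M - n - 1].conjugate()
        for l in range(p // 2)) / p


compact_error = max(abs(y_full(n, q) - y_compact(n, q))
                    for n in range(M) for q in range(p))

# Last stage (N = p, n = 0): equation (3.20) should return the real X_q.
P = 6
XP = [random.uniform(-1.0, 1.0) for _ in range(P)]
xp = idst(XP)
last_stage = [2 / P * sum(w(P, -q * (l + 0.5)) * xp[l]
                          for l in range(P // 2)).real
              for q in range(P)]
last_stage_error = max(abs(last_stage[q] - XP[q]) for q in range(P))
```

The compact form touches only indices below $N/2$, which is what allows the algorithm to store half of each R-symmetric sequence.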
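We now prove Theorem 3.9. First, assume that $p$ is even. The forward combine equation for R sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data. For this purpose, we will need the following result which is valid for all R sequences:
\[
x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = \bar{x}_{(l+1)N/p-n-1} = \bar{x}_{-n-1,l+1}
\]
Using this result, we obtain:
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} + w_p^{-q(p-l-1)} x_{n,p-l-1} \right] \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right]
\end{gathered}
\]
For $N = p$, $n = 0$ this reduces to:
\[
\begin{gathered}
y_{0,q} = \frac{1}{p}\, w_p^{-q/2} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{0,l} + w_p^{q(l+1)} \bar{x}_{0,l} \right] \\
= \frac{1}{p} \sum_{l=0}^{p/2-1} \left[ w_p^{-q(l+1/2)} x_{0,l} + w_p^{q(l+1/2)} \bar{x}_{0,l} \right] \\
= \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right]
\end{gathered}
\]
The forward combining of ROCS, RSOCS, and ROCSIS sequences requires one new equation:
\[
\begin{gathered}
\tilde{y}_{n,p/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,p/2} \\
= \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l \left[ x_{n,l} - x_{-n-1,l+1} \right]
\end{gathered}
\]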
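The forward combine equation for RSOCSIS sequences is obtained by substituting equation (3.9) into equation (3.19):
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \\
= \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left[ \sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} + \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right]
\end{gathered}
\]
For $N = p$, $n = 0$ we obtain a simplified form by substituting equation (3.9) into equation (3.20):
\[
\begin{gathered}
y_{0,q} = \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right] \\
= \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right]
\end{gathered}
\]

Next, assume that $p$ is odd. The forward combine equation for R sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data.
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ w_p^{-q(p-1)/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} + w_p^{-q(p-l-1)} x_{n,p-l-1} \right] \right\} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]
For $N = p$, $n = 0$ this reduces to: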
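\[
\begin{gathered}
y_{0,q} = \frac{1}{p}\, w_p^{-q/2} \left\{ (-1)^q w_p^{q/2} x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{0,l} + w_p^{q(l+1)} \bar{x}_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-q(l+1/2)} x_{0,l} + w_p^{q(l+1/2)} \bar{x}_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2\,\mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\}
\end{gathered}
\]
The forward combining of ROCS and ROCSIS sequences does not require any new equations. The forward combine equation for RSOCS and RSOCSIS sequences is obtained by substituting equation (3.9) into equation (3.24):
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} + w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\} \\
= \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left\{ i(-1)^{q+1} w_p^{(q+1/2)/2}\, \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-l(q+1/2)} \tilde{x}_{n,l} + w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]
For $N = p$, $n = 0$ we obtain a simplified form by substituting equation (3.9) into equation (3.25). We will also use the fact that $\tilde{x}_{0,(N-1)/2} = \tilde{x}_{(N-1)/2} = 0$.
\[
\begin{gathered}
y_{0,q} = \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2\,\mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ i(-1)^{q+1} \tilde{x}_{0,(N-1)/2} + 2\,\mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \right\} \\
= \frac{2}{p}\, \mathrm{Re}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right]
\end{gathered}
\]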
For $q = (p-1)/2$ we obtain:
\[
\begin{gathered}
\tilde{y}_{n,(p-1)/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,(p-1)/2} \\
= \frac{1}{p} \left\{ (-1)^{(p-1)/2} \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l \left[ \tilde{x}_{n,l} - \tilde{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]
This completes the proof of Theorem 3.9. The following corollary provides an important special case of this result.

Corollary 3.7 Assume $p = 2$. The forward combine equation for R sequences is:
\[
y_{n,0} = (x_{n,0} + \bar{x}_{-n-1,1})/2, \qquad y_{n,1} = w_N^{-(n+1/2)} (x_{n,0} - \bar{x}_{-n-1,1})/2
\]
for the lower half-range of $n$. The forward combine equation for ROCS and RSOCS sequences is:
\[
y_{n,0} = (x_{n,0} + x_{-n-1,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} - x_{-n-1,1})/2
\]
for the lower half-range of $n$. The forward combine equation for RSOCSIS sequences is:
\[
y_{n,0} = w_N^{-(n+1/2)/2} \left( \tilde{x}_{n,0} + i\, \tilde{x}_{-n-1,1} \right)/2
\]
for the lower half-range of $n$.
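For $p = 2$, the R-sequence equations of Corollaries 3.6 and 3.7 can be exercised as a round trip that touches only the lower half-range of $n$. The sketch below is illustrative: it assumes $w_N = e^{2\pi i/N}$, an IDST $x_n = \sum_k X_k w_N^{k(n+1/2)}$, and the indexing $x_{-n-1,1} = x_{N/2-n-1}$; the helper names are hypothetical.

```python
import cmath
import math
import random


def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * math.pi * e / N)


def idst(X):
    """Assumed IDST: x_n = sum_k X_k w_N^(k(n+1/2))."""
    N = len(X)
    return [sum(X[k] * w(N, k * (n + 0.5)) for k in range(N))
            for n in range(N)]


random.seed(3)
N = 10
M = N // 2
X = [random.uniform(-1.0, 1.0) for _ in range(N)]  # real X_k (an R sequence)
x = idst(X)
y0 = idst(X[0::2])  # y_{n,0}, itself the IDST of an R sequence
y1 = idst(X[1::2])  # y_{n,1}, itself the IDST of an R sequence

lower_half = range((M + 1) // 2)  # lower half-range of n
forward_error = 0.0
rebuilt = {}
for n in lower_half:
    t = w(N, n + 0.5)
    a = x[n]                      # x_{n,0}
    b = x[M - n - 1].conjugate()  # conj of x_{-n-1,1}
    # Corollary 3.7 (forward combine for R sequences)
    forward_error = max(forward_error,
                        abs(y0[n] - (a + b) / 2),
                        abs(y1[n] - (a - b) / (2 * t)))
    # Corollary 3.6 (inverse combine and its companion equation)
    rebuilt[n] = y0[n] + t * y1[n]
    rebuilt[M - n - 1] = y0[n].conjugate() - y1[n].conjugate() / t

inverse_error = max(abs(rebuilt[m] - x[m]) for m in range(M))
```

Each pass through the loop reads and writes one mirrored pair of entries, which is the access pattern that lets the staged algorithm work on half-length storage.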
3.4 Real Staggered Odd (RSO)

In this section, we will be concerned with the following symmetries:

Definition 3.7 A real staggered odd (RSO) sequence $x_n$ of length $N$ is defined by:
\[
\bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n
\]
An imaginary odd conjugate symmetric (IOCS) sequence $X_k$ of length $N$ is defined by:
\[
\bar{X}_k = -X_k, \qquad X_{N-k} = -\bar{X}_k
\]
Note that an IOCS sequence may be viewed as having both I and OCS symmetry, or equivalently as an imaginary even (IE) sequence.

The following lemma establishes the relationship between these symmetries.

Lemma 3.6 If $x_n$ is an RSO sequence of length $N$, then its DST $X_k$ is an IOCS sequence of length $N$. If $X_k$ is an IOCS sequence of length $N$, then its IDST $x_n$ is an RSO sequence of length $N$.

We now prove Lemma 3.6. We will only prove the first assertion. Let $x_n$ be an RSO sequence of length $N$. Since $x_n$ is also R symmetric, Lemma 3.1 implies that its DST $X_k$ is an OCS sequence of length $N$. Thus, we have only to prove that $X_k$ is I symmetric as well:
\[
\begin{gathered}
X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n w_N^{-k(n+1/2)} \\
= \frac{1}{N} \sum_{n=0}^{N-1} x_{N-n-1}\, w_N^{-k(N-n-1/2)} \\
= -\frac{1}{N} \sum_{n=0}^{N-1} x_n w_N^{k(n+1/2)} \\
= -\bar{X}_k
\end{gathered}
\]
This completes the proof of Lemma 3.6.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DSDS boundary conditions. Note that if $N$ is even, then an RSO sequence satisfies DSDS boundary conditions for the computational domain $0 \le n \le N/2 - 1$. That is:
\[
x_{N-1} = -x_0, \qquad x_{N/2-1} = -x_{N/2}
\]

Theorem 3.10 Let $x_n$ be an RSO sequence and let $X_k$ be its IOCS symmetric DST, both of length $N$ where $N$ is even. The real form of the DST is:
\[
\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N]
\]
for $1 \le k \le N/2$. The real form of the IDST is:
\[
x_n = (-1)^{n+1}\, \mathrm{Im}(X_{N/2}) - \sum_{k=1}^{N/2-1} 2\, \mathrm{Im}(X_k) \sin[\pi k(2n+1)/N]
\]
for $0 \le n \le N/2 - 1$.

We now prove Theorem 3.10. The result for the DST follows from Theorem 3.4, the IOCS symmetry of $X_k$, and the RSO symmetry of $x_n$ as follows:
\[
\begin{gathered}
\mathrm{Im}(X_k) = -\frac{1}{N} \sum_{n=0}^{N-1} x_n \sin[\pi k(2n+1)/N] \\
= -\frac{1}{N} \left\{ \sum_{n=0}^{N/2-1} x_n \sin[\pi k(2n+1)/N] + \sum_{n=0}^{N/2-1} x_{N-n-1} \sin[\pi k(2N-2n-1)/N] \right\} \\
= -\frac{1}{N} \sum_{n=0}^{N/2-1} 2 x_n \sin[\pi k(2n+1)/N]
\end{gathered}
\]
Note that the range for $k$ reflects the fact that $X_0 = 0$. The result for the IDST follows immediately from Theorem 3.4 and the IOCS symmetry of
$X_k$. Note that only half of the RSO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.10.

We now develop a fast, mixed radix algorithm for computing the RSO symmetric DST and its inverse, given $x_n$ in natural order. Note that an RSO sequence of length $N$ may be stored in $N/2$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an IOCS sequence of length $N$ may be stored in $N/2$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one fourth of those for C sequences. This algorithm is based on the symmetries which occur in the splittings of the IOCS sequence $X_k$. We begin developing this algorithm by defining all of the intermediate symmetries involved.

Definition 3.8 Let $X_k$ be an IOCS sequence of length $N$ with factor $p$. The intermediate symmetries which occur in the splittings of $X_k$ are identical to those in Definition 3.4, with the addition that all sequences are imaginary as well. We indicate this by preceding the acronym for each symmetry with an I.

The relationships between the symmetries recorded in Lemma 3.2 are not affected by the fact that all sequences have I symmetry as well. A mixed radix splitting tree diagram for an IOCS sequence is shown in Figure 3.3. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

The next lemma provides the intermediate symmetries in the IDST induced by the intermediate symmetries in the DST.

Lemma 3.7 The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemma 3.3, with the following addition. Let $X_k$ be an I sequence of length $N$.
Its IDST $x_n$ satisfies:
\[
x_{N-n-1} = -\bar{x}_n
\]
Since all sequences have I symmetry, only half of the IDST of any sequence needs to be computed.
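The real forms of Theorem 3.10 and the symmetry of Lemma 3.7 can both be checked with direct sums. The sketch below is a minimal illustration, assuming $X_k = \frac{1}{N}\sum_n x_n w_N^{-k(n+1/2)}$ and $x_n = \sum_k X_k w_N^{k(n+1/2)}$ with $w_N = e^{2\pi i/N}$; the names `dst` and `idst` and the sample data are hypothetical.

```python
import cmath
import math


def dst(x):
    """Assumed complex DST: X_k = (1/N) sum_n x_n exp(-2*pi*i*k*(n+1/2)/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
                for n in range(N)) / N
            for k in range(N)]


def idst(X):
    """Assumed IDST: x_n = sum_k X_k exp(2*pi*i*k*(n+1/2)/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * (n + 0.5) / N)
                for k in range(N))
            for n in range(N)]


half = [0.9, -0.1, 1.4, 0.6, -2.2, 0.8]   # x_0 .. x_{N/2-1} (sample values)
x = half + [-v for v in reversed(half)]   # RSO extension: x_{N-n-1} = -x_n
N = len(x)
H = N // 2
X = dst(x)

# Real form of the DST, for 1 <= k <= N/2 (index 0 is unused because X_0 = 0).
im_X = [0.0] + [-sum(2 * half[n] * math.sin(math.pi * k * (2 * n + 1) / N)
                     for n in range(H)) / N
                for k in range(1, H + 1)]
dst_error = max(abs(X[k] - 1j * im_X[k]) for k in range(1, H + 1))
x0_size = abs(X[0])

# Real form of the IDST.
x_back = [(-1) ** (n + 1) * im_X[H]
          - sum(2 * im_X[k] * math.sin(math.pi * k * (2 * n + 1) / N)
                for k in range(1, H))
          for n in range(H)]
idst_error = max(abs(x_back[n] - half[n]) for n in range(H))

# Lemma 3.7: the IDST of any imaginary sequence satisfies x_{N-n-1} = -conj(x_n).
Z = [1j * v for v in (0.5, -1.0, 0.25, 2.0, -0.75)]
z = idst(Z)
lemma_error = max(abs(z[len(z) - n - 1] + z[n].conjugate())
                  for n in range(len(z)))
```

The separate term in $\mathrm{Im}(X_{N/2})$ has no counterpart in the RSE case, where $X_{N/2} = 0$ and the constant term $X_0$ appears instead.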
Figure 3.3: Splitting tree for RSO symmetric FST
We now prove Lemma 3.7. Let $X_k$ be an I sequence of length $N$. Its IDST $x_n$ satisfies:
\[
\begin{gathered}
x_{N-n-1} = \sum_{k=0}^{N-1} X_k w_N^{k(N-n-1/2)} \\
= \sum_{k=0}^{N-1} X_k w_N^{-k(n+1/2)} \\
= -\bar{x}_n
\end{gathered}
\]
This completes the proof of Lemma 3.7.

The preceding lemma shows that each symmetry appearing in Figure 3.3 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next theorem provides all of the inverse combine equations for the RSO symmetric IFST.

Theorem 3.11 Assume that $p$ is even. The inverse combine equation for IOCS, ISOCS, and IOCSIS sequences is given by equation (3.5) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
x_{N/p-n-1,l} = -y_{n,0} + (-1)^l \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.29)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. The inverse combine equation for ISOCSIS sequences is given by equation (3.6) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
\tilde{x}_{N/p-n-1,l} = -2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.30)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. The inverse combine equation for I sequences is given by equation (3.3) for the lower half-range of $n$ and $0 \le l \le p/2 - 1$. We also need the companion equation:
\[
x_{N/p-n-1,l} = -\sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \qquad (3.31)
\]
for the lower half-range of $n$ and $0 \le l \le p/2 - 1$.
Next, assume that $p$ is odd. The inverse combine equation for IOCS and IOCSIS sequences is given by equation (3.7) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation:
\[
x_{N/p-n-1,l} = -y_{n,0} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.32)
\]
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine equation for ISOCS and ISOCSIS sequences is given by equation (3.8) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation:
\[
\tilde{x}_{N/p-n-1,l} = (-1)^l \tilde{y}_{n,(p-1)/2} - 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right] \qquad (3.33)
\]
for the lower half-range of $n$ and $0 \le l \le (p-3)/2$. The inverse combine equation for I sequences is given by equation (3.3) for the lower half-range of $n$ and $0 \le l \le (p-1)/2$. We also need the companion equation (3.31) for the lower half-range of $n$ and $0 \le l \le (p-3)/2$.

We now prove Theorem 3.11. First, assume that $p$ is even. Consider the combining of IOCS, ISOCS, and IOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.5), we need the following companion equation:
\[
x_{N/p-n-1,l} = y_{N/p-n-1,0} + (-1)^l \tilde{y}_{N/p-n-1,p/2} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]
Using ISOCS symmetry yields:
\[
\begin{gathered}
\tilde{y}_{N/p-n-1,q} = w_{N/p}^{(N/p-n-1/2)/2}\, y_{N/p-n-1,q} \\
= w_{N/p}^{-(n+1/2)/2}\, \bar{y}_{n,q} \\
= \tilde{y}_{n,q}
\end{gathered}
\qquad (3.34)
\]
Substituting this into the companion equation above yields:
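\[
\begin{gathered}
x_{N/p-n-1,l} = -y_{n,0} + (-1)^l \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= -y_{n,0} + (-1)^l \tilde{y}_{n,p/2} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of ISOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.6), we need the following companion equation:
\[
\begin{gathered}
\tilde{x}_{N/p-n-1,l} = 2\,\mathrm{Re}\left[ w_p^{l/2} w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{p/2-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right] \\
= -2\,\mathrm{Re}\left[ w_p^{(l+1)/2} w_N^{-(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= -2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{p/2-1} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of I sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.3), we need the following companion equation:
\[
\begin{gathered}
x_{N/p-n-1,l} = \sum_{q=0}^{p-1} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \\
= -\sum_{q=0}^{p-1} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q}
\end{gathered}
\]

Next, assume that $p$ is odd. Consider the combining of IOCS and IOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.7), we need the following companion equation:
\[
x_{N/p-n-1,l} = y_{N/p-n-1,0} + 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]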
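\[
\begin{gathered}
= -y_{n,0} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= -y_{n,0} - 2\,\mathrm{Re}\left[ \sum_{q=1}^{(p-1)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]

Consider the combining of ISOCS and ISOCSIS sequences. Since we will compute only half of each sequence $y_{n,q}$ on the right hand side of equation (3.8), we need the following companion equation:
\[
\tilde{x}_{N/p-n-1,l} = (-1)^l \tilde{y}_{N/p-n-1,(p-1)/2} + 2\,\mathrm{Re}\left[ w_p^{l/2} w_N^{(N/p-n-1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{lq} w_N^{q(N/p-n-1/2)} y_{N/p-n-1,q} \right]
\]
Substituting equation (3.34) into the companion equation above yields:
\[
\begin{gathered}
\tilde{x}_{N/p-n-1,l} = (-1)^l \tilde{y}_{n,(p-1)/2} - 2\,\mathrm{Re}\left[ w_p^{(l+1)/2} w_N^{-(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{q(l+1)} w_N^{-q(n+1/2)} \bar{y}_{n,q} \right] \\
= (-1)^l \tilde{y}_{n,(p-1)/2} - 2\,\mathrm{Re}\left[ w_p^{-(l+1)/2} w_N^{(n+1/2)/2} \sum_{q=0}^{(p-3)/2} w_p^{-q(l+1)} w_N^{q(n+1/2)} y_{n,q} \right]
\end{gathered}
\]
The companion equation for I sequences is identical to the even $p$ case. This completes the proof of Theorem 3.11. The following corollary provides an important special case of this result.

Corollary 3.8 Assume $p = 2$. The inverse combine equation for IOCS and ISOCS sequences is:
\[
x_{n,0} = y_{n,0} + \tilde{y}_{n,1}, \qquad x_{N/2-n-1,0} = -y_{n,0} + \tilde{y}_{n,1}
\]
for the lower half-range of $n$. The inverse combine equation for ISOCSIS sequences is:
\[
\tilde{x}_{n,0} = 2\,\mathrm{Re}\left[ w_N^{(n+1/2)/2} y_{n,0} \right], \qquad \tilde{x}_{N/2-n-1,0} = -2\,\mathrm{Im}\left[ w_N^{(n+1/2)/2} y_{n,0} \right]
\]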
for the lower half-range of $n$. The inverse combine equation for I sequences is:
\[
x_{n,0} = y_{n,0} + w_N^{n+1/2}\, y_{n,1}, \qquad x_{N/2-n-1,0} = -\bar{y}_{n,0} + w_N^{-(n+1/2)}\, \bar{y}_{n,1}
\]
for the lower half-range of $n$.

The next theorem provides all of the forward combine equations for the RSO symmetric FST.

Theorem 3.12 Assume that $p$ is even. The forward combine equation for I sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \qquad (3.35)
\]
for the lower half-range of $n$ and $0 \le q \le p - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{2i}{p}\, \mathrm{Im}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right] \qquad (3.36)
\]
This ensures that the final output is imaginary because $N = p$, $n = 0$ in the last stage of the algorithm. The forward combine equation for IOCS, ISOCS, and IOCSIS sequences is given by equation (3.35) for the lower half-range of $n$ and $0 \le q \le p/2 - 1$ with the exception that all sequences $x_{n,l}$ are real. In addition:
\[
\tilde{y}_{n,p/2} = \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l \left[ x_{n,l} + x_{-n-1,l+1} \right] \qquad (3.37)
\]
for the lower half-range of $n$. For $N = p$, $n = 0$ note that $y_{0,N/2} = -i\, \tilde{y}_{0,N/2}$ is imaginary. The forward combine equation for ISOCSIS sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left[ \sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} - \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \qquad (3.38)
\]
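for the lower half-range of $n$ and $0 \le q \le p/2 - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{2i}{p}\, \mathrm{Im}\left[ \sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \qquad (3.39)
\]

Next, assume that $p$ is odd. The forward combine equation for I sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\} \qquad (3.40)
\]
for the lower half-range of $n$ and $0 \le q \le p - 1$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2i\, \mathrm{Im}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\} \qquad (3.41)
\]
Note that $y_{0,q}$ is imaginary because $x_{0,(N-1)/2} = x_{(N-1)/2}$ is imaginary. The forward combine equation for IOCS and IOCSIS sequences is given by equation (3.40) for the lower half-range of $n$ and $0 \le q \le (p-1)/2$ with the exception that all sequences $x_{n,l}$ are real. The forward combine equation for ISOCS and ISOCSIS sequences is:
\[
y_{n,q} = \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left\{ i(-1)^{q+1} w_p^{(q+1/2)/2}\, \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-l(q+1/2)} \tilde{x}_{n,l} - w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \right\} \qquad (3.42)
\]
for the lower half-range of $n$ and $0 \le q \le (p-3)/2$. For $N = p$, $n = 0$ this reduces to:
\[
y_{0,q} = \frac{1}{p} \left\{ i(-1)^{q+1} \tilde{x}_{0,(N-1)/2} + 2i\, \mathrm{Im}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \right\} \qquad (3.43)
\]
In addition:
\[
\tilde{y}_{n,(p-1)/2} = \frac{1}{p} \left\{ (-1)^{(p-1)/2} \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l \left[ \tilde{x}_{n,l} + \tilde{x}_{-n-1,l+1} \right] \right\} \qquad (3.44)
\]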
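for the lower half-range of $n$. For $N = p$, $n = 0$ note that $y_{0,(N-1)/2} = -i\, \tilde{y}_{0,(N-1)/2}$ is imaginary.

We now prove Theorem 3.12. First, assume that $p$ is even. The forward combine equation for I sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data. For this purpose, we will need the following result which is valid for all I sequences:
\[
x_{n,p-l-1} = x_{(p-l-1)N/p+n} = x_{N-(l+1)N/p+n} = -\bar{x}_{(l+1)N/p-n-1} = -\bar{x}_{-n-1,l+1}
\]
Using this result, we obtain:
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} + w_p^{-q(p-l-1)} x_{n,p-l-1} \right] \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right]
\end{gathered}
\]
For $N = p$, $n = 0$ this reduces to:
\[
\begin{gathered}
y_{0,q} = \frac{1}{p}\, w_p^{-q/2} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{0,l} - w_p^{q(l+1)} \bar{x}_{0,l} \right] \\
= \frac{1}{p} \sum_{l=0}^{p/2-1} \left[ w_p^{-q(l+1/2)} x_{0,l} - w_p^{q(l+1/2)} \bar{x}_{0,l} \right] \\
= \frac{2i}{p}\, \mathrm{Im}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right]
\end{gathered}
\]
The forward combining of IOCS, ISOCS, and IOCSIS sequences requires one new equation:
\[
\begin{gathered}
\tilde{y}_{n,p/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,p/2} \\
= \frac{1}{p} \sum_{l=0}^{p/2-1} (-1)^l \left[ x_{n,l} + x_{-n-1,l+1} \right]
\end{gathered}
\]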
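The forward combine equation for ISOCSIS sequences is obtained by substituting equation (3.9) into equation (3.35):
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p/2-1} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \\
= \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left[ \sum_{l=0}^{p/2-1} w_p^{-l(q+1/2)} \tilde{x}_{n,l} - \sum_{l=0}^{p/2-1} w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right]
\end{gathered}
\]
For $N = p$, $n = 0$ we obtain a simplified form by substituting equation (3.9) into equation (3.36):
\[
\begin{gathered}
y_{0,q} = \frac{2i}{p}\, \mathrm{Im}\left[ \sum_{l=0}^{p/2-1} w_p^{-q(l+1/2)} x_{0,l} \right] \\
= \frac{2i}{p}\, \mathrm{Im}\left[ \sum_{l=0}^{p/2-1} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right]
\end{gathered}
\]

Next, assume that $p$ is odd. The forward combine equation for I sequences is obtained by developing a compact form of equation (3.4) which eliminates all redundant data.
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \sum_{l=0}^{p-1} w_p^{-lq} x_{n,l} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ w_p^{-q(p-1)/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} + w_p^{-q(p-l-1)} x_{n,p-l-1} \right] \right\} \\
= \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]
For $N = p$, $n = 0$ this reduces to: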
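\[
\begin{gathered}
y_{0,q} = \frac{1}{p}\, w_p^{-q/2} \left\{ (-1)^q w_p^{q/2} x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{0,l} - w_p^{q(l+1)} \bar{x}_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-q(l+1/2)} x_{0,l} - w_p^{q(l+1/2)} \bar{x}_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2i\, \mathrm{Im}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\}
\end{gathered}
\]
The forward combining of IOCS and IOCSIS sequences does not require any new equations. The forward combine equation for ISOCS and ISOCSIS sequences is obtained by substituting equation (3.9) into equation (3.40):
\[
\begin{gathered}
y_{n,q} = \frac{1}{p}\, w_N^{-q(n+1/2)} \left\{ (-1)^q w_p^{q/2} x_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-lq} x_{n,l} - w_p^{q(l+1)} \bar{x}_{-n-1,l+1} \right] \right\} \\
= \frac{1}{p}\, w_N^{-(n+1/2)(q+1/2)} \left\{ i(-1)^{q+1} w_p^{(q+1/2)/2}\, \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} \left[ w_p^{-l(q+1/2)} \tilde{x}_{n,l} - w_p^{(l+1)(q+1/2)} \tilde{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]
For $N = p$, $n = 0$ we obtain a simplified form by substituting equation (3.9) into equation (3.41).
\[
\begin{gathered}
y_{0,q} = \frac{1}{p} \left\{ (-1)^q x_{0,(N-1)/2} + 2i\, \mathrm{Im}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-q(l+1/2)} x_{0,l} \right] \right\} \\
= \frac{1}{p} \left\{ i(-1)^{q+1} \tilde{x}_{0,(N-1)/2} + 2i\, \mathrm{Im}\left[ \sum_{l=0}^{(p-3)/2} w_p^{-(l+1/2)(q+1/2)} \tilde{x}_{0,l} \right] \right\}
\end{gathered}
\]
For $q = (p-1)/2$ we obtain:
\[
\begin{gathered}
\tilde{y}_{n,(p-1)/2} = w_{N/p}^{(n+1/2)/2}\, y_{n,(p-1)/2} \\
= \frac{1}{p} \left\{ (-1)^{(p-1)/2} \tilde{x}_{n,(p-1)/2} + \sum_{l=0}^{(p-3)/2} (-1)^l \left[ \tilde{x}_{n,l} + \tilde{x}_{-n-1,l+1} \right] \right\}
\end{gathered}
\]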
This completes the proof of Theorem 3.12. The following corollary provides an important special case of this result.

Corollary 3.9 Assume $p = 2$. The forward combine equation for I sequences is:
\[
y_{n,0} = (x_{n,0} - \bar{x}_{-n-1,1})/2, \qquad y_{n,1} = w_N^{-(n+1/2)} (x_{n,0} + \bar{x}_{-n-1,1})/2
\]
for the lower half-range of $n$. The forward combine equation for IOCS and ISOCS sequences is:
\[
y_{n,0} = (x_{n,0} - x_{-n-1,1})/2, \qquad \tilde{y}_{n,1} = (x_{n,0} + x_{-n-1,1})/2
\]
for the lower half-range of $n$. The forward combine equation for ISOCSIS sequences is:
\[
y_{n,0} = w_N^{-(n+1/2)/2} \left( \tilde{x}_{n,0} - i\, \tilde{x}_{-n-1,1} \right)/2
\]
for the lower half-range of $n$.
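The I-sequence equations of Corollaries 3.8 and 3.9 differ from their R-sequence counterparts only in sign, and a numerical round trip makes those signs easy to verify. The sketch below assumes $w_N = e^{2\pi i/N}$, an IDST $x_n = \sum_k X_k w_N^{k(n+1/2)}$, and the indexing $x_{-n-1,1} = x_{N/2-n-1}$; the helper names are hypothetical and the transforms are evaluated by direct sums.

```python
import cmath
import math
import random


def w(N, e):
    """Return w_N**e, assuming w_N = exp(2*pi*i/N)."""
    return cmath.exp(2j * math.pi * e / N)


def idst(X):
    """Assumed IDST: x_n = sum_k X_k w_N^(k(n+1/2))."""
    N = len(X)
    return [sum(X[k] * w(N, k * (n + 0.5)) for k in range(N))
            for n in range(N)]


random.seed(4)
N = 10
M = N // 2
X = [1j * random.uniform(-1.0, 1.0) for _ in range(N)]  # imaginary X_k (I sequence)
x = idst(X)                                              # x_{N-n-1} = -conj(x_n)
y0 = idst(X[0::2])
y1 = idst(X[1::2])

forward_error = 0.0
rebuilt = {}
for n in range((M + 1) // 2):     # lower half-range of n
    t = w(N, n + 0.5)
    a = x[n]                      # x_{n,0}
    b = x[M - n - 1].conjugate()  # conj of x_{-n-1,1}
    # Corollary 3.9 (forward combine for I sequences)
    forward_error = max(forward_error,
                        abs(y0[n] - (a - b) / 2),
                        abs(y1[n] - (a + b) / (2 * t)))
    # Corollary 3.8 (inverse combine and its companion equation)
    rebuilt[n] = y0[n] + t * y1[n]
    rebuilt[M - n - 1] = -y0[n].conjugate() + y1[n].conjugate() / t

inverse_error = max(abs(rebuilt[m] - x[m]) for m in range(M))
```

Replacing the imaginary input by a real one and flipping the two signs marked in the comments recovers the RSE case, which is one way to cross-check an implementation of both sections.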
3.5 Real Composite Staggered Even Staggered Even (RSESE)

In this section, we will be concerned with the following symmetries:

Definition 3.9 A real composite staggered even staggered even (RSESE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = x_n, \qquad x_{N/2-n-1} = x_n \]

Note that an RSESE sequence of length $N$ is also an RSE sequence of length $N$. A real odd conjugate symmetric zero odd term (ROCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

\[ \bar{X}_k = X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^k X_k \]

The following lemma establishes the relationship between these symmetries.

Lemma 3.8 If $x_n$ is an RSESE sequence of length $N$, where $N$ is even, then its DST $X_k$ is an ROCSZO sequence of length $N$. If $X_k$ is an ROCSZO sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSESE sequence of length $N$.

We now prove Lemma 3.8. We will only prove the first assertion. Assume $x_n$ is an RSESE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSE sequence of length $N$, Lemma 3.4 implies that its DST $X_k$ is an ROCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ROCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 3.7 and the RSESE symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_k &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N] \\
    &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_{N/2-n-1} \cos[\pi k(N-2n-1)/N] \\
    &= (-1)^k \, \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]
\end{aligned}
\]

Hence $X_k = (-1)^k X_k$, so that $X_k = 0$ whenever $k$ is odd. This completes the proof of Lemma 3.8.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NSN boundary conditions. Note that if $N = 2(2M+1)$, then an RSESE sequence satisfies NSN boundary conditions for the computational domain $0 \le n \le M$. That is:

\[ x_{N-1} = x_0, \qquad x_{M-1} = x_{M+1} \]

Theorem 3.13 Let $x_n$ be an RSESE sequence and let $X_k$ be its ROCSZO symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DST is:

\[ X_{2k} = \frac{2}{N}\Big\{(-1)^k x_M + \sum_{n=0}^{M-1} 2x_n \cos[2\pi k(2n+1)/N]\Big\} \]

for $0 \le k \le M$. The real form of the IDST is:

\[ x_n = X_0 + \sum_{k=1}^{M} 2X_{2k} \cos[2\pi k(2n+1)/N] \]

for $0 \le n \le M$.

We now prove Theorem 3.13. The result for the DST follows from Theorem 3.7, the ROCSZO symmetry of $X_k$, and the RSESE symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_{2k} &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[2\pi k(2n+1)/N] \\
       &= \frac{1}{N}\Big\{(-1)^k 2x_M + \sum_{n=0}^{M-1} 2x_n \cos[2\pi k(2n+1)/N] + \sum_{n=0}^{M-1} 2x_{N/2-n-1} \cos[2\pi k(N-2n-1)/N]\Big\} \\
       &= \frac{2}{N}\Big\{(-1)^k x_M + \sum_{n=0}^{M-1} 2x_n \cos[2\pi k(2n+1)/N]\Big\}
\end{aligned}
\]

The result for the IDST follows immediately from Theorem 3.7 and the ROCSZO symmetry of $X_k$. Note that only one fourth of the RSESE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.13.

A fast, mixed radix algorithm for computing the RSESE symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSE symmetric FST. Note that an RSESE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ROCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the ROCSZO sequence $X_k$. We begin developing this algorithm by defining the one new intermediate symmetry involved.

Definition 3.10 A zero (Z) sequence $X_k$ of length $N$ is defined by:

\[ X_k = 0 \]

for $0 \le k \le N-1$.

The following lemma establishes the relationship between the symmetries which occur in the splittings of the ROCSZO sequence $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.9 Let $X_k$ be an ROCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is ROCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the ROCS sequence $X_{k,0}$ are identical to those in Lemma 3.2, with the addition that all sequences have R symmetry as well.
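The real forms in Theorem 3.13 can be checked numerically. The sketch below is only an illustration, not the fast algorithm: it assumes the staggered transform has the form $X_k = (1/N)\sum_n x_n e^{-2\pi i k(n+1/2)/N}$ (a definition that is not restated in this section), and the helper names `dst_full`, `extend_rsese`, `dst_rsese`, and `idst_rsese` are hypothetical.

```python
import cmath
import math

def dst_full(x):
    # ASSUMED definition: X_k = (1/N) * sum_n x_n * exp(-2*pi*i*k*(n + 1/2)/N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
                for n in range(N)) / N
            for k in range(N)]

def extend_rsese(q):
    # q = [x_0, ..., x_M]; build the RSESE sequence of length N = 2(2M+1)
    M = len(q) - 1
    half = q + q[:M][::-1]      # x_{N/2-n-1} = x_n
    return half + half[::-1]    # x_{N-n-1} = x_n

def dst_rsese(q):
    # Real form of the DST: returns X_0, X_2, ..., X_{2M}
    M = len(q) - 1
    N = 2 * (2 * M + 1)
    return [(2.0 / N) * ((-1) ** k * q[M]
                         + sum(2 * q[n] * math.cos(2 * math.pi * k * (2 * n + 1) / N)
                               for n in range(M)))
            for k in range(M + 1)]

def idst_rsese(X_even):
    # Real form of the IDST: returns x_0, ..., x_M
    M = len(X_even) - 1
    N = 2 * (2 * M + 1)
    return [X_even[0]
            + sum(2 * X_even[k] * math.cos(2 * math.pi * k * (2 * n + 1) / N)
                  for k in range(1, M + 1))
            for n in range(M + 1)]

q = [1.0, -2.0, 0.5, 3.0]          # M = 3, so N = 14
X = dst_full(extend_rsese(q))      # full complex transform, for comparison
X_even = dst_rsese(q)              # M + 1 real values only
x_back = idst_rsese(X_even)
```

Under these assumptions the even-indexed terms of `X` agree with `X_even`, the odd-indexed terms vanish, and `x_back` reproduces `q`, so only $M+1$ real values are needed on either side of the transform.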
A mixed radix splitting tree diagram for an ROCSZO sequence is shown in Figure 3.4. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

Figure 3.4: Splitting tree for RSESE symmetric FST (root of the tree: ROCSZO)

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3 and 3.5, with the addition provided by the following lemma. We omit the proof of this result because it is trivial.

Lemma 3.10 Let $X_k$ be a Z sequence of length $N$. Its IDST $x_n$ is also a Z sequence of length $N$.

These results show that each symmetry appearing in Figure 3.4 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSESE symmetric IFST, obtained as a special case of that for the RSE symmetric IFST.

Corollary 3.10 Assume $p = 2$. The inverse combine equation for ROCS and Z sequences is:

\[ x_{n,0} = y_{n,0} \]

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.8 for arbitrary factors $p$.

We now prove Corollary 3.10. The inverse combine equation for ROCS and Z sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.6 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the RSESE sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.10.

The next corollary provides all of the forward combine equations for the RSESE symmetric FST, obtained as a special case of that for the RSE symmetric FST.

Corollary 3.11 Assume $p = 2$. The forward combine equation for ROCS and Z sequences is:

\[ y_{n,0} = x_{n,0}, \qquad \tilde{y}_{n,1} = 0 \]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.9 for arbitrary factors $p$.

We now prove Corollary 3.11. The forward combine equation for ROCS and Z sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.7 and use the RSESE symmetry of $x_n$ as follows:

\[
\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0} \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0
\end{aligned}
\]

This completes the proof of Corollary 3.11.
3.6 Real Composite Staggered Even Staggered Odd (RSESO)

In this section, we will be concerned with the following symmetries:

Definition 3.11 A real composite staggered even staggered odd (RSESO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = x_n, \qquad x_{N/2-n-1} = -x_n \]

Note that an RSESO sequence of length $N$ is also an RSE sequence of length $N$. A real odd conjugate symmetric zero even term (ROCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

\[ \bar{X}_k = X_k, \qquad X_{N-k} = -X_k, \qquad X_k = (-1)^{k+1} X_k \]

The following lemma establishes the relationship between these symmetries.

Lemma 3.11 If $x_n$ is an RSESO sequence of length $N$, where $N$ is even, then its DST $X_k$ is an ROCSZE sequence of length $N$. If $X_k$ is an ROCSZE sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSESO sequence of length $N$.

We now prove Lemma 3.11. We will only prove the first assertion. Assume $x_n$ is an RSESO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSE sequence of length $N$, Lemma 3.4 implies that its DST $X_k$ is an ROCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an ROCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 3.7 and the RSESO symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_k &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N] \\
    &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_{N/2-n-1} \cos[\pi k(N-2n-1)/N] \\
    &= (-1)^{k+1} \, \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi k(2n+1)/N]
\end{aligned}
\]

Hence $X_k = (-1)^{k+1} X_k$, so that $X_k = 0$ whenever $k$ is even. This completes the proof of Lemma 3.11.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for NSDS or NSD boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RSESO sequence satisfies NSDS boundary conditions for the computational domain $0 \le n \le N/4-1$. That is:

\[ x_{N-1} = x_0, \qquad x_{N/4-1} = -x_{N/4} \]

Similarly, if $N = 2(2M+1)$, then an RSESO sequence satisfies NSD boundary conditions for the computational domain $0 \le n \le M-1$. That is:

\[ x_{N-1} = x_0, \qquad x_M = 0 \]

Theorem 3.14 Let $x_n$ be an RSESO sequence and let $X_k$ be its ROCSZE symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DST is:

\[ X_{2k+1} = \frac{2}{N}\sum_{n=0}^{N/4-1} 2x_n \cos[\pi(2k+1)(2n+1)/N] \]

for $0 \le k \le N/4-1$. The real form of the IDST is:

\[ x_n = \sum_{k=0}^{N/4-1} 2X_{2k+1} \cos[\pi(2k+1)(2n+1)/N] \]

for $0 \le n \le N/4-1$. Next, assume that $N = 2(2M+1)$. The real form of the DST is:

\[ X_{2k+1} = \frac{2}{N}\sum_{n=0}^{M-1} 2x_n \cos[\pi(2k+1)(2n+1)/N] \]

for $0 \le k \le M-1$. The real form of the IDST is:

\[ x_n = \sum_{k=0}^{M-1} 2X_{2k+1} \cos[\pi(2k+1)(2n+1)/N] \]

for $0 \le n \le M-1$. Note that the results for the DST and IDST are identical except for scaling.

We now prove Theorem 3.14. We prove the result for the DST for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 3.7, the ROCSZE symmetry of $X_k$, and the RSESO symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_{2k+1} &= \frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \cos[\pi(2k+1)(2n+1)/N] \\
         &= \frac{1}{N}\Big\{\sum_{n=0}^{N/4-1} 2x_n \cos[\pi(2k+1)(2n+1)/N] + \sum_{n=0}^{N/4-1} 2x_{N/2-n-1} \cos[\pi(2k+1)(N-2n-1)/N]\Big\} \\
         &= \frac{2}{N}\sum_{n=0}^{N/4-1} 2x_n \cos[\pi(2k+1)(2n+1)/N]
\end{aligned}
\]

The results for the IDST follow immediately from Theorem 3.7 and the ROCSZE symmetry of $X_k$. Note that only one fourth of the RSESO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.14.

A fast, mixed radix algorithm for computing the RSESO symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSE symmetric FST. Note that an RSESO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an ROCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the ROCSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.
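The remark that the DST and IDST of Theorem 3.14 are identical except for scaling can be made concrete with a short sketch for the case $N = 4M$. This is a direct $O(M^2)$ evaluation of the real forms, not the fast algorithm, and the function names `cos_kernel`, `dst_rseso`, and `idst_rseso` are hypothetical.

```python
import math

def cos_kernel(v):
    # Evaluates sum_m 2*v[m]*cos[pi*(2j+1)*(2m+1)/N] for j = 0..N/4-1,
    # where N = 4*len(v). Both directions of the transform share this kernel.
    M = len(v)
    N = 4 * M
    return [sum(2 * v[m] * math.cos(math.pi * (2 * j + 1) * (2 * m + 1) / N)
                for m in range(M))
            for j in range(M)]

def dst_rseso(q):
    # q = [x_0, ..., x_{N/4-1}]; returns X_1, X_3, ..., X_{N/2-1}
    N = 4 * len(q)
    return [(2.0 / N) * s for s in cos_kernel(q)]

def idst_rseso(X_odd):
    # Same kernel as the DST, without the 2/N scaling
    return cos_kernel(X_odd)

q = [0.25, -1.0, 2.0, 0.75, -0.5]   # N/4 = 5, so N = 20
x_back = idst_rseso(dst_rseso(q))
```

Applying the kernel twice with the single factor $2/N$ returns the original quarter-length data, which is the round trip that the Fourier analysis method relies on for NSDS boundary conditions.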
Lemma 3.12 Let $X_k$ be an ROCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is RSOCS symmetric. The symmetries which occur in the splittings of the RSOCS sequence $X_{k,1}$ are identical to those in Lemma 3.2, with the addition that all sequences have R symmetry as well.

A mixed radix splitting tree diagram for an ROCSZE sequence is shown in Figure 3.5. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find R sequences rather than C sequences.

Figure 3.5: Splitting tree for RSESO symmetric FST (root of the tree: ROCSZE)

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.5, and 3.10. These results show that each symmetry appearing in Figure 3.5 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSESO symmetric IFST, obtained as a special case of that for the RSE symmetric IFST.

Corollary 3.12 Assume $p = 2$. The inverse combine equation for Z and RSOCS sequences is:

\[ x_{n,0} = \tilde{y}_{n,1} \]

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.8 for arbitrary factors $p$.

We now prove Corollary 3.12. The inverse combine equation for Z and RSOCS sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.6 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RSESO sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.12.

The next corollary provides all of the forward combine equations for the RSESO symmetric FST, obtained as a special case of that for the RSE symmetric FST.

Corollary 3.13 Assume $p = 2$. The forward combine equation for Z and RSOCS sequences is:

\[ y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0} \]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.9 for arbitrary factors $p$.

We now prove Corollary 3.13. The forward combine equation for Z and RSOCS sequences may be regarded as a special case of that for ROCS and RSOCS sequences, where $p = 2$. Thus, we apply Corollary 3.7 and use the RSESO symmetry of $x_n$ as follows:

\[
\begin{aligned}
y_{n,0} &= (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0 \\
\tilde{y}_{n,1} &= (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}
\end{aligned}
\]

This completes the proof of Corollary 3.13.
3.7 Real Composite Staggered Odd Staggered Even (RSOSE)

In this section, we will be concerned with the following symmetries:

Definition 3.12 A real composite staggered odd staggered even (RSOSE) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n, \qquad x_{N/2-n-1} = x_n \]

Note that an RSOSE sequence of length $N$ is also an RSO sequence of length $N$. An imaginary odd conjugate symmetric zero even term (IOCSZE) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

\[ \bar{X}_k = -X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^{k+1} X_k \]

The following lemma establishes the relationship between these symmetries.

Lemma 3.13 If $x_n$ is an RSOSE sequence of length $N$, where $N$ is even, then its DST $X_k$ is an IOCSZE sequence of length $N$. If $X_k$ is an IOCSZE sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSOSE sequence of length $N$.

We now prove Lemma 3.13. We will only prove the first assertion. Assume $x_n$ is an RSOSE sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSO sequence of length $N$, Lemma 3.6 implies that its DST $X_k$ is an IOCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an IOCSZE sequence. For this, we use the representation of $X_k$ provided by Theorem 3.10 and the RSOSE symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_k &= -\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N] \\
    &= -\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_{N/2-n-1} \sin[\pi k(N-2n-1)/N] \\
    &= (-1)^{k+1}\Big\{-\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]\Big\}
\end{aligned}
\]

Hence $X_k = (-1)^{k+1} X_k$, so that $X_k = 0$ whenever $k$ is even. This completes the proof of Lemma 3.13.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DSNS or DSN boundary conditions, depending on the length of the sequence $N$. Note that if $N = 4M$, then an RSOSE sequence satisfies DSNS boundary conditions for the computational domain $0 \le n \le N/4-1$. That is:

\[ x_{N-1} = -x_0, \qquad x_{N/4-1} = x_{N/4} \]

Similarly, if $N = 2(2M+1)$, then an RSOSE sequence satisfies DSN boundary conditions for the computational domain $0 \le n \le M$. That is:

\[ x_{N-1} = -x_0, \qquad x_{M-1} = x_{M+1} \]

Theorem 3.15 Let $x_n$ be an RSOSE sequence and let $X_k$ be its IOCSZE symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 4M$. The real form of the DST is:

\[ \mathrm{Im}(X_{2k+1}) = -\frac{2}{N}\sum_{n=0}^{N/4-1} 2x_n \sin[\pi(2k+1)(2n+1)/N] \]

for $0 \le k \le N/4-1$. The real form of the IDST is:

\[ x_n = -\sum_{k=0}^{N/4-1} 2\,\mathrm{Im}(X_{2k+1}) \sin[\pi(2k+1)(2n+1)/N] \]

for $0 \le n \le N/4-1$. Next, assume that $N = 2(2M+1)$. The real form of the DST is:

\[ \mathrm{Im}(X_{2k+1}) = -\frac{2}{N}\Big\{(-1)^k x_M + \sum_{n=0}^{M-1} 2x_n \sin[\pi(2k+1)(2n+1)/N]\Big\} \]

for $0 \le k \le M$. The real form of the IDST is:

\[ x_n = (-1)^{n+1}\,\mathrm{Im}(X_{N/2}) - \sum_{k=0}^{M-1} 2\,\mathrm{Im}(X_{2k+1}) \sin[\pi(2k+1)(2n+1)/N] \]

for $0 \le n \le M$. Note that the results for the DST and IDST are identical except for scaling.

We now prove Theorem 3.15. We prove the result for the DST for the case of $N = 4M$ only, since the proof for $N = 2(2M+1)$ is similar. This result follows from Theorem 3.10, the IOCSZE symmetry of $X_k$, and the RSOSE symmetry of $x_n$ as follows:

\[
\begin{aligned}
\mathrm{Im}(X_{2k+1}) &= -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi(2k+1)(2n+1)/N] \\
  &= -\frac{1}{N}\Big\{\sum_{n=0}^{N/4-1} 2x_n \sin[\pi(2k+1)(2n+1)/N] + \sum_{n=0}^{N/4-1} 2x_{N/2-n-1} \sin[\pi(2k+1)(N-2n-1)/N]\Big\} \\
  &= -\frac{2}{N}\sum_{n=0}^{N/4-1} 2x_n \sin[\pi(2k+1)(2n+1)/N]
\end{aligned}
\]

The results for the IDST follow immediately from Theorem 3.10 and the IOCSZE symmetry of $X_k$. Note that only one fourth of the RSOSE sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.15.

A fast, mixed radix algorithm for computing the RSOSE symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSO symmetric FST. Note that an RSOSE sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an IOCSZE sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the IOCSZE sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.
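The case $N = 2(2M+1)$ of Theorem 3.15, with its extra boundary term, can be checked against a direct complex evaluation. The sketch below assumes the staggered transform $X_k = (1/N)\sum_n x_n e^{-2\pi i k(n+1/2)/N}$ (not restated in this section); the helper names are hypothetical, and the evaluation is the slow $O(M^2)$ form rather than the fast algorithm.

```python
import cmath
import math

def dst_full(x):
    # ASSUMED definition: X_k = (1/N) * sum_n x_n * exp(-2*pi*i*k*(n + 1/2)/N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
                for n in range(N)) / N
            for k in range(N)]

def extend_rsose(q):
    # q = [x_0, ..., x_M]; build the RSOSE sequence of length N = 2(2M+1)
    M = len(q) - 1
    half = q + q[:M][::-1]                       # x_{N/2-n-1} = x_n
    return half + [-v for v in reversed(half)]   # x_{N-n-1} = -x_n

def dst_rsose(q):
    # Real form of the DST: returns Im(X_1), Im(X_3), ..., Im(X_{2M+1})
    M = len(q) - 1
    N = 2 * (2 * M + 1)
    return [-(2.0 / N) * ((-1) ** k * q[M]
                          + sum(2 * q[n] * math.sin(math.pi * (2 * k + 1) * (2 * n + 1) / N)
                                for n in range(M)))
            for k in range(M + 1)]

def idst_rsose(im_odd):
    # Real form of the IDST: im_odd[M] is Im(X_{N/2}); returns x_0, ..., x_M
    M = len(im_odd) - 1
    N = 2 * (2 * M + 1)
    return [(-1) ** (n + 1) * im_odd[M]
            - sum(2 * im_odd[k] * math.sin(math.pi * (2 * k + 1) * (2 * n + 1) / N)
                  for k in range(M))
            for n in range(M + 1)]

q = [0.5, 2.0, -1.5, 1.0]          # M = 3, so N = 14
X = dst_full(extend_rsose(q))      # full complex transform, for comparison
im_odd = dst_rsose(q)
x_back = idst_rsose(im_odd)
```

With these assumptions the odd-indexed terms of `X` are pure imaginary and match `im_odd`, the even-indexed terms vanish, and `x_back` recovers `q`.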
Lemma 3.14 Let $X_k$ be an IOCSZE sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is Z symmetric, and subsequence $X_{k,1}$ is ISOCS symmetric. The symmetries which occur in the splittings of the ISOCS sequence $X_{k,1}$ are identical to those in Lemma 3.2, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an IOCSZE sequence is shown in Figure 3.6. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.

Figure 3.6: Splitting tree for RSOSE symmetric FST (root of the tree: IOCSZE)

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.7, and 3.10. These results show that each symmetry appearing in Figure 3.6 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSOSE symmetric IFST, obtained as a special case of that for the RSO symmetric IFST.

Corollary 3.14 Assume $p = 2$. The inverse combine equation for Z and ISOCS sequences is:

\[ x_{n,0} = \tilde{y}_{n,1} \]

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.11 for arbitrary factors $p$.

We now prove Corollary 3.14. The inverse combine equation for Z and ISOCS sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.8 and use the Z symmetry of $y_{n,0}$. Note that the companion equation is not needed because only one fourth of the RSOSE sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.14.

The next corollary provides all of the forward combine equations for the RSOSE symmetric FST, obtained as a special case of that for the RSO symmetric FST.

Corollary 3.15 Assume $p = 2$. The forward combine equation for Z and ISOCS sequences is:

\[ y_{n,0} = 0, \qquad \tilde{y}_{n,1} = x_{n,0} \]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.12 for arbitrary factors $p$.

We now prove Corollary 3.15. The forward combine equation for Z and ISOCS sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.9 and use the RSOSE symmetry of $x_n$ as follows:

\[
\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0 \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0}
\end{aligned}
\]

This completes the proof of Corollary 3.15.
3.8 Real Composite Staggered Odd Staggered Odd (RSOSO)

In this section, we will be concerned with the following symmetries:

Definition 3.13 A real composite staggered odd staggered odd (RSOSO) sequence $x_n$ of length $N$, where $N$ is even, is defined by:

\[ \bar{x}_n = x_n, \qquad x_{N-n-1} = -x_n, \qquad x_{N/2-n-1} = -x_n \]

Note that an RSOSO sequence of length $N$ is also an RSO sequence of length $N$. An imaginary odd conjugate symmetric zero odd term (IOCSZO) sequence $X_k$ of length $N$, where $N$ is even, is defined by:

\[ \bar{X}_k = -X_k, \qquad X_{N-k} = X_k, \qquad X_k = (-1)^k X_k \]

The following lemma establishes the relationship between these symmetries.

Lemma 3.15 If $x_n$ is an RSOSO sequence of length $N$, where $N$ is even, then its DST $X_k$ is an IOCSZO sequence of length $N$. If $X_k$ is an IOCSZO sequence of length $N$, where $N$ is even, then its IDST $x_n$ is an RSOSO sequence of length $N$.

We now prove Lemma 3.15. We will only prove the first assertion. Assume $x_n$ is an RSOSO sequence of length $N$, where $N$ is even. Since $x_n$ is also an RSO sequence of length $N$, Lemma 3.6 implies that its DST $X_k$ is an IOCS sequence of length $N$. Thus, we have only to prove the third property in the definition of an IOCSZO sequence. For this, we use the representation of $X_k$ provided by Theorem 3.10 and the RSOSO symmetry of $x_n$ as follows:

\[
\begin{aligned}
X_k &= -\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N] \\
    &= -\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_{N/2-n-1} \sin[\pi k(N-2n-1)/N] \\
    &= (-1)^{k}\Big\{-\frac{i}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[\pi k(2n+1)/N]\Big\}
\end{aligned}
\]

Hence $X_k = (-1)^k X_k$, so that $X_k = 0$ whenever $k$ is odd. This completes the proof of Lemma 3.15.

The next theorem uses the previous lemma to find the real form of the DST and IDST. Observe that the result for the IDST is the eigenvector expansion required by the Fourier analysis method for DSD boundary conditions. Note that if $N = 2(2M+1)$, then an RSOSO sequence satisfies DSD boundary conditions for the computational domain $0 \le n \le M-1$. That is:

\[ x_{N-1} = -x_0, \qquad x_M = 0 \]

Theorem 3.16 Let $x_n$ be an RSOSO sequence and let $X_k$ be its IOCSZO symmetric DST, both of length $N$ where $N$ is even. Assume that $N = 2(2M+1)$. The real form of the DST is:

\[ \mathrm{Im}(X_{2k}) = -\frac{2}{N}\sum_{n=0}^{M-1} 2x_n \sin[2\pi k(2n+1)/N] \]

for $1 \le k \le M$. The real form of the IDST is:

\[ x_n = -\sum_{k=1}^{M} 2\,\mathrm{Im}(X_{2k}) \sin[2\pi k(2n+1)/N] \]

for $0 \le n \le M-1$.

We now prove Theorem 3.16. The result for the DST follows from Theorem 3.10, the IOCSZO symmetry of $X_k$, and the RSOSO symmetry of $x_n$ as follows:

\[
\begin{aligned}
\mathrm{Im}(X_{2k}) &= -\frac{1}{N}\sum_{n=0}^{N/2-1} 2x_n \sin[2\pi k(2n+1)/N] \\
  &= -\frac{1}{N}\Big\{\sum_{n=0}^{M-1} 2x_n \sin[2\pi k(2n+1)/N] + \sum_{n=0}^{M-1} 2x_{N/2-n-1} \sin[2\pi k(N-2n-1)/N]\Big\} \\
  &= -\frac{2}{N}\sum_{n=0}^{M-1} 2x_n \sin[2\pi k(2n+1)/N]
\end{aligned}
\]

The term $n = M$ does not appear in the second line because $x_M = 0$. The result for the IDST follows immediately from Theorem 3.10 and the IOCSZO symmetry of $X_k$. Note that only one fourth of the RSOSO sequence $x_n$ needs to be specified. This completes the proof of Theorem 3.16.

A fast, mixed radix algorithm for computing the RSOSO symmetric DST and its inverse, given $x_n$ in natural order, may be obtained as a special case of that for the RSO symmetric FST. Note that an RSOSO sequence of length $N$ may be stored in $N/4$ real storage locations, compared to $2N$ real storage locations for a C sequence of length $N$. Similarly, an IOCSZO sequence of length $N$ may be stored in $N/4$ real storage locations. Our goal is to exploit these symmetries in the data in order to reduce both the storage requirements and the number of operations to one eighth of those for C sequences.

This algorithm is based on the symmetries which occur in the splittings of the IOCSZO sequence $X_k$. This does not introduce any new intermediate symmetries. The following lemma establishes the relationship between the symmetries which occur in the splittings of $X_k$. We omit the proof of this result because it is trivial.

Lemma 3.16 Let $X_k$ be an IOCSZO sequence of length $N$ with factor 2. Then subsequence $X_{k,0}$ is IOCS symmetric, and subsequence $X_{k,1}$ is Z symmetric. The symmetries which occur in the splittings of the IOCS sequence $X_{k,0}$ are identical to those in Lemma 3.2, with the addition that all sequences have I symmetry as well.

A mixed radix splitting tree diagram for an IOCSZO sequence is shown in Figure 3.7. The acronyms representing the symmetries are summarized in Table 3.2 for ease of reference. Note that a branch of the splitting tree corresponding to a dual sequence terminates because it is redundant. Note also that at the deepest level of the splitting tree we find I sequences rather than C sequences.
Figure 3.7: Splitting tree for RSOSO symmetric FST (root of the tree: IOCSZO)

The intermediate symmetries in the IDST induced by the intermediate symmetries in the DST are identical to those in Lemmas 3.3, 3.7, and 3.10. These results show that each symmetry appearing in Figure 3.7 induces a symmetry in the IDST. These induced symmetries are summarized in Table 3.2 for ease of reference.

The next corollary provides all of the inverse combine equations for the RSOSO symmetric IFST, obtained as a special case of that for the RSO symmetric IFST.

Corollary 3.16 Assume $p = 2$. The inverse combine equation for IOCS and Z sequences is:

\[ x_{n,0} = y_{n,0} \]

for the lower half-range of $n$. The inverse combine equations for the remaining symmetries are provided by Theorem 3.11 for arbitrary factors $p$.

We now prove Corollary 3.16. The inverse combine equation for IOCS and Z sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.8 and use the Z symmetry of $y_{n,1}$. Note that the companion equation is not needed because only one fourth of the RSOSO sequence $x_n$ needs to be computed. This completes the proof of Corollary 3.16.

The next corollary provides all of the forward combine equations for the RSOSO symmetric FST, obtained as a special case of that for the RSO symmetric FST.

Corollary 3.17 Assume $p = 2$. The forward combine equation for IOCS and Z sequences is:

\[ y_{n,0} = x_{n,0}, \qquad \tilde{y}_{n,1} = 0 \]

for the lower half-range of $n$. The forward combine equations for the remaining symmetries are provided by Theorem 3.12 for arbitrary factors $p$.

We now prove Corollary 3.17. The forward combine equation for IOCS and Z sequences may be regarded as a special case of that for IOCS and ISOCS sequences, where $p = 2$. Thus, we apply Corollary 3.9 and use the RSOSO symmetry of $x_n$ as follows:

\[
\begin{aligned}
y_{n,0} &= (x_{n,0} - x_{-n-1,1})/2 = (x_n - x_{N/2-n-1})/2 = (x_n + x_n)/2 = x_{n,0} \\
\tilde{y}_{n,1} &= (x_{n,0} + x_{-n-1,1})/2 = (x_n + x_{N/2-n-1})/2 = (x_n - x_n)/2 = 0
\end{aligned}
\]

This completes the proof of Corollary 3.17.
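The radix-2 forward combine steps of Corollaries 3.11, 3.13, 3.15, and 3.17 all reduce to a half-sum or a half-difference of $x_n$ and $x_{N/2-n-1}$, one of which vanishes. The following sketch, with hypothetical helper names `extend` and `half_sum_and_difference`, illustrates this on a small example; it loops over the whole half-length rather than only the lower half-range used in the corollaries.

```python
def extend(q, inner_sign, outer_sign):
    # Build a composite staggered sequence of length N = 4*len(q) from its
    # first quarter q. inner_sign = +1/-1 gives x_{N/2-n-1} = +/- x_n;
    # outer_sign = +1/-1 gives x_{N-n-1} = +/- x_n.
    half = q + [inner_sign * v for v in reversed(q)]
    return half + [outer_sign * v for v in reversed(half)]

def half_sum_and_difference(x):
    # s_n = (x_n + x_{N/2-n-1})/2 and d_n = (x_n - x_{N/2-n-1})/2
    half = len(x) // 2
    s = [(x[n] + x[half - n - 1]) / 2 for n in range(half)]
    d = [(x[n] - x[half - n - 1]) / 2 for n in range(half)]
    return s, d

q = [3.0, -1.0]                                     # N = 8
s_even, d_even = half_sum_and_difference(extend(q, +1, +1))   # RSESE
s_odd, d_odd = half_sum_and_difference(extend(q, -1, +1))     # RSESO

# RSE family (Corollary 3.7 as used in the proofs): y_{n,0} = s_n, y~_{n,1} = d_n
# RSO family (Corollary 3.9 as used in the proofs): y_{n,0} = d_n, y~_{n,1} = s_n
```

For a staggered even second symmetry the half-difference is identically zero and the half-sum returns $x_n$; for a staggered odd second symmetry the roles are exchanged. The outer sign only decides which of the two is labeled $y_{n,0}$, so in all four cases the combine step involves no arithmetic on the stored quarter of the data.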
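3.9 Tables of Symmetries

Table 3.1: Symmetries in the IDST

| Acronym | Symmetry | Sequence | DST |
|---|---|---|---|
| | Periodic | $x_{N+n} = x_n$ | $X_{N+k} = -X_k$ |
| R | Real | $\bar{x}_n = x_n$ | $X_{N-k} = -\bar{X}_k$ |
| RSE | Real Staggered Even | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = -X_k$ |
| RSO | Real Staggered Odd | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = X_k$ |
| RSESE | Real Composite S.Even S.Even (N even) | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$; $x_{N/2-n-1} = x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^k X_k$ |
| RSESO | Real Composite S.Even S.Odd (N even) | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$; $x_{N/2-n-1} = -x_n$ | $\bar{X}_k = X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ |
| RSOSE | Real Composite S.Odd S.Even (N even) | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$; $x_{N/2-n-1} = x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ |
| RSOSO | Real Composite S.Odd S.Odd (N even) | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$; $x_{N/2-n-1} = -x_n$ | $\bar{X}_k = -X_k$; $X_{N-k} = X_k$; $X_k = (-1)^k X_k$ |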
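The DST column of Table 3.1 for the four composite symmetries can be confirmed by brute force. The sketch below assumes the staggered transform $X_k = (1/N)\sum_n x_n e^{-2\pi i k(n+1/2)/N}$ (defined earlier in the thesis, not in this section), and the names `dst_full`, `extend`, and `pattern` are hypothetical.

```python
import cmath
import math

def dst_full(x):
    # ASSUMED definition: X_k = (1/N) * sum_n x_n * exp(-2*pi*i*k*(n + 1/2)/N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * (n + 0.5) / N)
                for n in range(N)) / N
            for k in range(N)]

def extend(q, inner_sign, outer_sign):
    # inner_sign: x_{N/2-n-1} = inner_sign * x_n; outer_sign: x_{N-n-1} = outer_sign * x_n
    half = q + [inner_sign * v for v in reversed(q)]
    return half + [outer_sign * v for v in reversed(half)]

def pattern(x, tol=1e-12):
    # Returns (kind, zero_parity): kind is "real" or "imaginary";
    # zero_parity is "odd" or "even" according to which terms of X_k vanish.
    X = dst_full(x)
    kind = "real" if all(abs(v.imag) < tol for v in X) else "imaginary"
    odd_zero = all(abs(X[k]) < tol for k in range(1, len(X), 2))
    return kind, ("odd" if odd_zero else "even")

q = [1.0, 2.5, -0.5]                       # quarter of a length-12 sequence
results = {
    "RSESE": pattern(extend(q, +1, +1)),   # expected ROCSZO
    "RSESO": pattern(extend(q, -1, +1)),   # expected ROCSZE
    "RSOSE": pattern(extend(q, +1, -1)),   # expected IOCSZE
    "RSOSO": pattern(extend(q, -1, -1)),   # expected IOCSZO
}
```

Each entry of `results` records whether the transform is real or pure imaginary and whether its odd or even terms vanish, which are the properties named by the acronyms ROCSZO, ROCSZE, IOCSZE, and IOCSZO.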
Table 3.2: Symmetries in the DST Aero Sym Sequence IDST Periodic XN+kXk ZN+n = Zn ocs Odd XNk = Xk Xn = Zn Conj Sym socs Stag XNk1 = Xk (n+1/2)Zn = WN :Vn Odd (n+l/2)/2::Cn = WN Zn Conj Sym OCSIS ocs Xk,pq = XN/pk1,q (n+l/2)Yn,pq = WN/p Yn,q Indcd Interseq Sym SOC SIS socs Xk,pq1 = X N/pk1,q (n+l/2)Yn,pq1 = WNjp Yn,q Indcd Interseq Sym R Real xk xk "'N n1 = Z'n I !mag xkXk XN n1 = Xn 162
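Table 3.2: (contd.)

| Acronym | Symmetry | Sequence | IDST |
|---|---|---|---|
| ROCSZO | ROCS & Zero Odd Terms (N even) | $\bar{X}_k = X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^k X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$; $x_{N/2-n-1} = x_n$ |
| ROCSZE | ROCS & Zero Even Terms (N even) | $\bar{X}_k = X_k$; $X_{N-k} = -X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = x_n$; $x_{N/2-n-1} = -x_n$ |
| IOCSZE | IOCS & Zero Even Terms (N even) | $\bar{X}_k = -X_k$; $X_{N-k} = X_k$; $X_k = (-1)^{k+1} X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$; $x_{N/2-n-1} = x_n$ |
| IOCSZO | IOCS & Zero Odd Terms (N even) | $\bar{X}_k = -X_k$; $X_{N-k} = X_k$; $X_k = (-1)^k X_k$ | $\bar{x}_n = x_n$; $x_{N-n-1} = -x_n$; $x_{N/2-n-1} = -x_n$ |
| Z | Zero | $X_k = 0$ | $x_n = 0$ |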
PAGE 175
Chapter 4 Software Implementation and Performance 4.1 Introduction We begin this chapter by estimating the mmber of lines of FORTRAN code required to implement all of the FFT and FS T algorithms presented in the preceding chapters. There are 5 basic transforms required to address all boundary conditions. These are the R, RE, RO FFTs and the RSE, RSO FSTs. We have excluded all of the composite symmetries because they are special cases of the 5 basic transforms listed above. Note also that by a basic transform we mean both the forward and inverse directions, since one direc tion is seldom useful without the other. For each basic transform, we have identified a need for 4 values of the radix p, namely p = 2, 3, 4, 5. As will be explained in Section 4.6, we have found that larger values of plead to ineffi cient implementations on most vector computers. Each basic transform may be implemented inplace, producing the forward transform in a permuted order, or they may utilize additional storage, producing the forward trans form in natural order. In either case, we require that the inverse transform be produced in natural order because the original data is provided in natu ral order. On the other hand, the forward transform may be produced in a permuted order because it is usually followed by an inverse transform which accepts its input in that same order. We will be focusing our attention on serial vector processors. However, these algorithms have excellent potential for parallelization. Shared memory machine architectures generally require
PAGE 176
only simple modifications to the serial code. Distributed memory machine architectures, on the other hand, require significantly different data man agement techniques in order to minimize interprocessor communication [llj. Thus, we have identified a need for at least 2 variations of these codes cor responding to these broad classes of machine architectures. If we combine the independent options discussed above, we obtain at least 5 x 4 x 2 x 2 = 80 variations of the basic transforms. From prototype software, which will be described in Section 4.5, we estimate that each variation requires approximately 750 lines of FORTRAN code. Thus, the entire package of 80 variations requires approximately 60K lines of FORTRAN code. In view of the estimated size of the complete software package, we have selected just one of the basic transforms to implement and test in detail. The transform we have selected is the RO FFT. There are two reasons for this selection. The first reason is that if xn is an RO sequence of length N, where N is even, then: xo = 0 '"N/2 0 Thus, the RO FFT presents the additional problem of eliminating all com putations involving zeros in the data. The RO FFT is unique with this problem. The second reason is that there is a well known implementation of the prepostprocessing algorithm for the RO FFT: VFFTPK [5, 7]. One of our goals is to compare the performance of the compact algorithms with their prepostprocessing counterparts. Thus, we would want to avoid selecting the RSESO FST, for example. The remainder of this chapter will be concerned with the implementa tion and performance of software for the RO FFT. Furthermore, we have restricted our attention to the forward transform, since there are no new issues involved in the inverse transform. The implementation process begins by developing simplified forms of the forward combine equations for a specific value of the radix p. 
We have restricted our attention to p = 2, 3, 4 because this is sufficient to illustrate the breadth of difficulties involved in imple menting this algorithm, and also provides enough flexibility for conducting a thorough performance comparison to VFFTPK. We have also restricted our attention to an inplace algorithm for a serial vector processor. Thus, for each value of p we find storage patterns for the data which allow the combine equations to be executed inplace on this machine architecture. The general design of the software is then described, with emphasis on the automated 165
generation of splitting trees. Next, we present the results of performance tests of this software, using VFFTPK as a baseline. Results are presented for both an IBM 3090J and a Cray Y-MP8/864. These results are analyzed in detail, and a timing model is presented. Finally, we wish to automate as much of the implementation process as possible, in view of the estimated size of the entire package. We describe how Mathematica [12] can be used to automate most of the steps described above.
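The zero property that motivates the choice of the RO FFT can be illustrated with a short numerical sketch. It is not taken from the thesis software: the helper `dft`, the sample data, and the sign convention of the forward DFT (exponent with a minus sign, scaled by 1/N) are assumptions made only for this illustration.

```python
import cmath

def dft(x):
    """Scaled forward DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

N = 16
# Free data of a real odd (RO) sequence: x_1 .. x_{N/2-1} (arbitrary values).
free = [3.0, -1.0, 4.0, 1.5, -2.0, 0.5, 2.5]

# Build the full sequence from the odd symmetry x_{N-n} = -x_n.
x = [0.0] * N
for n, value in enumerate(free, start=1):
    x[n] = value
    x[N - n] = -value

# Oddness forces x_0 = -x_0 and x_{N/2} = -x_{N/2}, so both entries are zero.
zeros_ok = (x[0] == 0.0 and x[N // 2] == 0.0)

X = dft(x)
max_real = max(abs(Xk.real) for Xk in X)                      # DFT is pure imaginary
max_sym = max(abs(X[(N - k) % N] + X[k]) for k in range(N))   # X_{N-k} = -X_k

print(zeros_ok, max_real < 1e-12, max_sym < 1e-12)
```

Only the N/2 - 1 free values carry information, which is why a compact algorithm tries to avoid storing or touching the two structural zeros.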
4.2 The Radix-2 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-2 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.1 Assume p = 2. The forward combine equation for I sequences is:

$$ y_{n,q} = w_N^{-nq}\left(x_{n,0} + (-1)^{q+1}\,\bar{x}_{-n,1}\right)/2 \tag{4.1} $$

for $0 \le n \le N/4$ if $N/2$ is even, or $0 \le n \le (N-2)/4$ if $N/2$ is odd, and $0 \le q \le 1$. Here the overbar denotes complex conjugation. For $n = 0$ and $q = 0, 1$ equation (4.1) reduces to:

$$ y_{0,0} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,1}))/2 $$
$$ y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,1}))/2 $$

If $N/2$ is even, then for $n = N/4$ and $q = 0, 1$ equation (4.1) reduces to:

$$ y_{N/4,0} = i(2\,\mathrm{Im}(x_{N/4,0}))/2 $$
$$ y_{N/4,1} = i(-2\,\mathrm{Re}(x_{N/4,0}))/2 $$

For the remaining values of $n$ and $q = 0, 1$, equation (4.1) reduces to:

$$ y_{n,0} = \{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,1})]\}/2 $$
$$ y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,1})]\}/2 $$

The forward combine equation for ICS and ISCS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and $q = 0$. In addition:

$$ \tilde{y}_{n,1} = (x_{n,0} + x_{-n,1})/2 \tag{4.2} $$

for $0 \le n \le N/4$ if $N/2$ is even, or $0 \le n \le (N-2)/4$ if $N/2$ is odd. For $n = 0$ equation (4.2) reduces to:

$$ \tilde{y}_{0,1} = 0 $$

If $N/2$ is even, then for $n = N/4$ equation (4.2) reduces to:

$$ \tilde{y}_{N/4,1} = (2x_{N/4,0})/2 $$
The forward combine equation for ISCSIS sequences is:

$$ y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} - i\,\tilde{x}_{-n,1}\right)/2 \tag{4.3} $$

for $0 \le n \le N/4$ if $N/2$ is even, or $0 \le n \le (N-2)/4$ if $N/2$ is odd. For $n = 0$ equation (4.3) reduces to:

$$ y_{0,0} = i(-\tilde{x}_{0,1})/2 $$

If $N/2$ is even, then for $n = N/4$ equation (4.3) reduces to:

$$ y_{N/4,0} = i(-\sqrt{2}\,\tilde{x}_{N/4,0})/2 $$

We now prove Corollary 4.1. We will only provide the key steps of selected results which may not be obvious. In most of these cases, this involves the application of one or more of the symmetries summarized in Table 2.2.

The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{0,1} = x_{N/2}$ are both pure imaginary. The forward combine equation for ICS and ISCS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{0,1} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $\tilde{x}_{0,0} = \tilde{x}_0 = 0$. For $n = N/4$ we proceed as follows:

$$ y_{N/4,0} = w_8^{-1}\left(\tilde{x}_{N/4,0} - w_8^{2}\,\tilde{x}_{-N/4,1}\right)/2 = \left(w_8^{-1}\tilde{x}_{N/4,0} - w_8\,\tilde{x}_{N/4,0}\right)/2 = i(-\sqrt{2}\,\tilde{x}_{N/4,0})/2 $$

This completes the proof of Corollary 4.1.

Data storage patterns for all of the combine equations in Corollary 4.1 are shown in Figures 4.1 through 4.8. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/2 being odd because it is identical except for the absence of the variables corresponding to n = N/4.
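The radix-2 combine equation for I sequences can be checked numerically. The sketch below is an illustration only, under stated assumptions: it takes $w_N = e^{2\pi i/N}$, an unscaled IDFT $x_n = \sum_k X_k w_N^{nk}$, and equation (4.1) in the form $y_{n,q} = w_N^{-nq}(x_{n,0} + (-1)^{q+1}\bar{x}_{-n,1})/2$ with $x_{n,0} = x_n$ and $x_{-n,1} = x_{N/2-n}$. The function names are hypothetical and do not come from the thesis software.

```python
import cmath
import random

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (sign convention assumed for this sketch)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Unscaled IDFT: x_n = sum_k X_k * w_N**(n*k)."""
    N = len(X)
    return [sum(X[k] * w(N, n * k) for k in range(N)) for n in range(N)]

def combine_radix2(x, n, q):
    """Assumed form of (4.1), using x_{n,0} = x_n and x_{-n,1} = x_{N/2-n}."""
    N = len(x)
    x_n0 = x[n]
    x_mn1 = x[(N // 2 - n) % N]
    return w(N, -n * q) * (x_n0 + (-1) ** (q + 1) * x_mn1.conjugate()) / 2

random.seed(1)
N = 16
X = [1j * random.uniform(-1.0, 1.0) for _ in range(N)]   # an I sequence (pure imaginary)
x = idft(X)

max_err = 0.0
for q in (0, 1):
    y_ref = idft(X[q::2])            # IDFT of the subsequence X_{2k+q}
    for n in range(N // 4 + 1):      # the range 0 <= n <= N/4 of the corollary
        max_err = max(max_err, abs(combine_radix2(x, n, q) - y_ref[n]))

print(max_err < 1e-12)
```

The check only reads $x_n$ for $0 \le n \le N/2$, which is consistent with storing half of the sequence and recovering the rest from symmetry.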
The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-2 RO FFT in Table 4.1. Table 4.1 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-2 RO FFT, and is analogous to Figure 4.9. Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.1 through 4.8. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.1 shows the storage pattern for each factor of 2 in the RO FFT for N = 16. Each radix-2 splitting of an ICS sequence results in an ICS sequence followed by an ISCS sequence. However, the IDFT of an ICS sequence of length 2 is identically zero and is not stored. Each radix-2 splitting of an ISCS sequence results in a dual pair of I sequences, and the IDFT of only one of these is stored. Finally, each radix-2 splitting of an I sequence results in two I sequences.
[Figure 4.1: Radix-2 storage pattern for ICS induced symmetries for N = 16, highlighting the case n = N/4]

[Figure 4.2: Radix-2 storage pattern for ICS induced symmetries for N = 16, highlighting the case n = 1]
[Figure 4.3: Radix-2 storage pattern for ISCS induced symmetries for N = 16, highlighting the case n = 0]

[Figure 4.4: Radix-2 storage pattern for ISCS induced symmetries for N = 16, highlighting the case n = N/4]
[Figure 4.5: Radix-2 storage pattern for ISCS induced symmetries for N = 16, highlighting the case n = 1]
[Figure 4.6: Radix-2 storage pattern for I sequences for N = 16, highlighting the case n = 0]
[Figure 4.7: Radix-2 storage pattern for I sequences for N = 16, highlighting the case n = N/4]
[Figure 4.8: Radix-2 storage pattern for I sequences for N = 16, highlighting the case n = 1]
[Figure 4.9: Splitting tree for the radix-2 RO FFT for N = 16, with one column of symmetries for each of N = 16, N = 8, N = 4, and N = 2]
Table 4.1: Splitting Tree for the Radix-2 RO FFT for N = 16

Length   Factor   Symmetry   Repetitions
16       2        ICS        1
8        2        ICS        1
                  ISCS       1
4        2        ICS        1
                  ISCS       1
                  I          1
2        2        ISCS       1
                  I          3
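4.3 The Radix-4 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-4 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.2 Assume p = 4. The forward combine equation for I sequences is:

$$ y_{n,q} = w_N^{-nq}\left(x_{n,0} + (-1)^{q+1}\,\bar{x}_{-n,2} + w_4^{-q}\,x_{n,1} - w_4^{q}\,\bar{x}_{-n,1}\right)/4 \tag{4.4} $$

for $0 \le n \le N/8$ if $N/4$ is even, or $0 \le n \le (N-4)/8$ if $N/4$ is odd, and $0 \le q \le 3$. For $n = 0$ and $q = 0, 1, 2, 3$ equation (4.4) reduces to:

$$ y_{0,0} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,2}) + 2\,\mathrm{Im}(x_{0,1}))/4 $$
$$ y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,2}) - 2\,\mathrm{Re}(x_{0,1}))/4 $$
$$ y_{0,2} = i(\mathrm{Im}(x_{0,0}) + \mathrm{Im}(x_{0,2}) - 2\,\mathrm{Im}(x_{0,1}))/4 $$
$$ y_{0,3} = i(\mathrm{Im}(x_{0,0}) - \mathrm{Im}(x_{0,2}) + 2\,\mathrm{Re}(x_{0,1}))/4 $$

If $N/4$ is even, then for $n = N/8$ and $q = 0, 1, 2, 3$ equation (4.4) reduces to:

$$ y_{N/8,0} = i(2\,\mathrm{Im}(x_{N/8,0}) + 2\,\mathrm{Im}(x_{N/8,1}))/4 $$
$$ y_{N/8,1} = -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) - \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) + \mathrm{Im}(x_{N/8,1}))/4 $$
$$ y_{N/8,2} = i(-2\,\mathrm{Re}(x_{N/8,0}) + 2\,\mathrm{Re}(x_{N/8,1}))/4 $$
$$ y_{N/8,3} = -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) + \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) - \mathrm{Im}(x_{N/8,1}))/4 $$

For the remaining values of $n$ and $q = 0, 1, 2, 3$, equation (4.4) reduces to:

$$ y_{n,0} = \{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,2}) + \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,2}) + \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})]\}/4 $$
$$ y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,2}) + \mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,2}) - \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1})]\}/4 $$
$$ y_{n,2} = w_N^{-2n}\{\mathrm{Re}(x_{n,0}) - \mathrm{Re}(x_{-n,2}) - \mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{-n,2}) - \mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1})]\}/4 $$
$$ y_{n,3} = w_N^{-3n}\{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{-n,2}) - \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) - \mathrm{Im}(x_{-n,2}) + \mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1})]\}/4 $$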
The forward combine equation for ICS, ISCS, and ICSIS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and $0 \le q \le 1$. In addition:

$$ \tilde{y}_{n,2} = (x_{n,0} - x_{-n,2} - x_{n,1} + x_{-n,1})/4 \tag{4.5} $$

for $0 \le n \le N/8$ if $N/4$ is even, or $0 \le n \le (N-4)/8$ if $N/4$ is odd. For $n = 0$ equation (4.5) reduces to:

$$ \tilde{y}_{0,2} = 0 $$

If $N/4$ is even, then for $n = N/8$ equation (4.5) reduces to:

$$ \tilde{y}_{N/8,2} = (2(x_{N/8,0} - x_{N/8,1}))/4 $$

The forward combine equation for ISCSIS sequences is:

$$ y_{n,q} = w_N^{-n(q+1/2)}\left(\tilde{x}_{n,0} + i(-1)^{q+1}\,\tilde{x}_{-n,2} + w_4^{-q-1/2}\,\tilde{x}_{n,1} - w_4^{q+1/2}\,\tilde{x}_{-n,1}\right)/4 \tag{4.6} $$

for $0 \le n \le N/8$ if $N/4$ is even, or $0 \le n \le (N-4)/8$ if $N/4$ is odd, and $0 \le q \le 1$. For $n = 0$ and $q = 0, 1$ equation (4.6) reduces to:

$$ y_{0,0} = i(-\tilde{x}_{0,2} - \sqrt{2}\,\tilde{x}_{0,1})/4 $$
$$ y_{0,1} = i(\tilde{x}_{0,2} - \sqrt{2}\,\tilde{x}_{0,1})/4 $$
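We now prove Corollary 4.2. We will only provide the key steps of selected results which may not be obvious. In most of these cases, this involves the application of one or more of the symmetries summarized in Table 2.2.

The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{0,2} = x_{N/2}$ are both pure imaginary. For $n = N/8$ and $q = 1$ we proceed as follows:

$$ y_{N/8,1} = w_8^{-1}\left(x_{N/8,0} + \bar{x}_{-N/8,2} + w_4^{-1}x_{N/8,1} - w_4\,\bar{x}_{-N/8,1}\right)/4 $$
$$ = \left(w_8^{-1}x_{N/8,0} - w_8\,\bar{x}_{-N/8,1} + w_4^{-1}\left(w_8^{-1}x_{N/8,1} + w_8\,\bar{x}_{-N/8,2}\right)\right)/4 $$
$$ = i\left(2\,\mathrm{Im}(w_8^{-1}x_{N/8,0}) - 2\,\mathrm{Re}(w_8^{-1}x_{N/8,1})\right)/4 $$
$$ = i\left(2\,\mathrm{Im}(w_8^{-1})\mathrm{Re}(x_{N/8,0}) + 2\,\mathrm{Re}(w_8^{-1})\mathrm{Im}(x_{N/8,0}) - 2\,\mathrm{Re}(w_8^{-1})\mathrm{Re}(x_{N/8,1}) + 2\,\mathrm{Im}(w_8^{-1})\mathrm{Im}(x_{N/8,1})\right)/4 $$
$$ = -i\sqrt{2}\,(\mathrm{Re}(x_{N/8,0}) - \mathrm{Im}(x_{N/8,0}) + \mathrm{Re}(x_{N/8,1}) + \mathrm{Im}(x_{N/8,1}))/4 $$

The forward combine equation for ICS, ISCS, and ICSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{0,2} = x_{N/2} = 0$. The forward combine equation for ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $\tilde{x}_{0,0} = \tilde{x}_0 = 0$. For $n = N/8$ and $q = 1$ we proceed as follows:

$$ y_{N/8,1} = w_{16}^{-3}\left(\tilde{x}_{N/8,0} + i\,\tilde{x}_{-N/8,2} + w_8^{-3}\,\tilde{x}_{N/8,1} - w_8^{3}\,\tilde{x}_{-N/8,1}\right)/4 $$
$$ = \left(w_{16}^{-3}\,\tilde{x}_{N/8,0} - w_{16}^{3}\,\tilde{x}_{-N/8,1} + w_{16}^{-9}\,\tilde{x}_{N/8,1} + w_{16}\,\tilde{x}_{-N/8,2}\right)/4 $$
$$ = \left(\tilde{x}_{N/8,0}\,(w_{16}^{-3} - w_{16}^{3}) + \tilde{x}_{N/8,1}\,(-w_{16}^{-1} + w_{16})\right)/4 $$
$$ = i\left(\tilde{x}_{N/8,0}\,(-2\,\mathrm{Im}(w_{16}^{3})) - \tilde{x}_{N/8,1}\,(-2\,\mathrm{Im}(w_{16}))\right)/4 $$
$$ = i\left((-2\sin(3\pi/8))\,\tilde{x}_{N/8,0} + (2\sin(\pi/8))\,\tilde{x}_{N/8,1}\right)/4 $$

This completes the proof of Corollary 4.2.

Data storage patterns for all of the combine equations in Corollary 4.2 are shown in Figures 4.10 through 4.18. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/4 being odd because it is identical except for the absence of the variables corresponding to n = N/8.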
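The closed forms for the case n = N/8 can be spot-checked numerically. The following sketch is illustrative only and rests on assumptions not stated in this section: $w_N = e^{2\pi i/N}$, an unscaled IDFT $x_n = \sum_k X_k w_N^{nk}$, and the reading $y_{N/8,1} = -i\sqrt{2}(\mathrm{Re}\,x_{N/8,0} - \mathrm{Im}\,x_{N/8,0} + \mathrm{Re}\,x_{N/8,1} + \mathrm{Im}\,x_{N/8,1})/4$ together with the analogous expression for q = 3. The reference values are the IDFTs of the subsequences $X_{4k+q}$.

```python
import cmath
import math
import random

def w(N, e):
    """w_N**e with w_N = exp(2*pi*i/N) (sign convention assumed for this sketch)."""
    return cmath.exp(2j * cmath.pi * e / N)

def idft(X):
    """Unscaled IDFT: x_n = sum_k X_k * w_N**(n*k)."""
    N = len(X)
    return [sum(X[k] * w(N, n * k) for k in range(N)) for n in range(N)]

random.seed(2)
N = 16                                   # N/4 = 4 is even, so n = N/8 exists
X = [1j * random.uniform(-1.0, 1.0) for _ in range(N)]   # an I sequence
x = idft(X)

n = N // 8
x0 = x[n]              # x_{N/8,0} = x_{N/8}
x1 = x[n + N // 4]     # x_{N/8,1} = x_{3N/8}
r2 = math.sqrt(2.0)

# Closed forms for q = 1 and q = 3 (assumed reading of the corollary).
closed_q1 = -1j * r2 * (x0.real - x0.imag + x1.real + x1.imag) / 4
closed_q3 = -1j * r2 * (x0.real + x0.imag + x1.real - x1.imag) / 4

# Reference: IDFT of the subsequences X_{4k+1} and X_{4k+3}, evaluated at n.
ref_q1 = idft(X[1::4])[n]
ref_q3 = idft(X[3::4])[n]

err = max(abs(closed_q1 - ref_q1), abs(closed_q3 - ref_q3))
print(err < 1e-12)
```

Both closed forms are pure imaginary and use only two stored complex inputs, which is the saving the special case is meant to capture.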
The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-4 RO FFT in Table 4.2.
Table 4.2 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-4 RO FFT, and is analogous to Figure 2.4. Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.10 through 4.18. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.2 shows the storage pattern for each factor of 4 in the RO FFT for N = 64.
[Figure 4.10: Radix-4 storage pattern for ICS induced symmetries for N = 24, highlighting the case n = 0]

[Figure 4.11: Radix-4 storage pattern for ICS induced symmetries for N = 24, highlighting the case n = N/8]
[Figure 4.12: Radix-4 storage pattern for ICS induced symmetries for N = 24, highlighting the case n = 1]
[Figure 4.13: Radix-4 storage pattern for ISCS induced symmetries for N = 24, highlighting the case n = 0]

[Figure 4.14: Radix-4 storage pattern for ISCS induced symmetries for N = 24, highlighting the case n = N/8]
[Figure 4.15: Radix-4 storage pattern for ISCS induced symmetries for N = 24, highlighting the case n = 1]
[Figure 4.16: Radix-4 storage pattern for I sequences for N = 24, highlighting the case n = 0]
[Figure 4.17: Radix-4 storage pattern for I sequences for N = 24, highlighting the case n = N/8]
[Figure 4.18: Radix-4 storage pattern for I sequences for N = 24, highlighting the case n = 1]
Table 4.2: Splitting Tree for the Radix-4 RO FFT for N = 64

Length   Factor   Symmetry   Repetitions
64       4        ICS        1
16       4        ICS        1
                  ISCS       1
                  I          1
4        4        ICS        1
                  ISCS       1
                  I          7
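4.4 The Radix-3 RO FFT

In this section, we will develop the forward combine equations and associated data storage patterns for the radix-3 RO FFT. We will address the general mixed radix RO FFT in Section 4.5. The following corollary is obtained from Theorem 2.12.

Corollary 4.3 Assume p = 3. The forward combine equation for I sequences is:

$$ y_{n,q} = w_N^{-nq}\left(x_{n,0} + w_3^{-q}\,x_{n,1} - w_3^{q}\,\bar{x}_{-n,1}\right)/3 \tag{4.7} $$

for $0 \le n \le N/6$ if $N/3$ is even, or $0 \le n \le (N-3)/6$ if $N/3$ is odd, and $0 \le q \le 2$. For $n = 0$ and $q = 0, 1, 2$ equation (4.7) reduces to:

$$ y_{0,0} = i(\mathrm{Im}(x_{0,0}) + 2\,\mathrm{Im}(x_{0,1}))/3 $$
$$ y_{0,1} = i(\mathrm{Im}(x_{0,0}) - \sqrt{3}\,\mathrm{Re}(x_{0,1}) - \mathrm{Im}(x_{0,1}))/3 $$
$$ y_{0,2} = i(\mathrm{Im}(x_{0,0}) + \sqrt{3}\,\mathrm{Re}(x_{0,1}) - \mathrm{Im}(x_{0,1}))/3 $$

If $N/3$ is even, then for $n = N/6$ and $q = 0, 1, 2$ equation (4.7) reduces to:

$$ y_{N/6,0} = i(2\,\mathrm{Im}(x_{N/6,0}) + \mathrm{Im}(x_{N/6,1}))/3 $$
$$ y_{N/6,1} = i(\mathrm{Im}(x_{N/6,0}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1}))/3 $$
$$ y_{N/6,2} = i(\mathrm{Im}(x_{N/6,1}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,0}))/3 $$

For the remaining values of $n$ and $q = 0, 1, 2$, equation (4.7) reduces to:

$$ y_{n,0} = \{\mathrm{Re}(x_{n,0}) + \mathrm{Re}(x_{n,1}) - \mathrm{Re}(x_{-n,1}) + i[\mathrm{Im}(x_{n,0}) + \mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})]\}/3 $$
$$ y_{n,1} = w_N^{-n}\{\mathrm{Re}(x_{n,0}) + (1/2)(\mathrm{Re}(x_{-n,1}) - \mathrm{Re}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Im}(x_{n,1}) - \mathrm{Im}(x_{-n,1})) + i[\mathrm{Im}(x_{n,0}) - (1/2)(\mathrm{Im}(x_{n,1}) + \mathrm{Im}(x_{-n,1})) - (\sqrt{3}/2)(\mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1}))]\}/3 $$
$$ y_{n,2} = w_N^{-2n}\{\mathrm{Re}(x_{n,0}) + (1/2)(\mathrm{Re}(x_{-n,1}) - \mathrm{Re}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Im}(x_{-n,1}) - \mathrm{Im}(x_{n,1})) + i[\mathrm{Im}(x_{n,0}) - (1/2)(\mathrm{Im}(x_{-n,1}) + \mathrm{Im}(x_{n,1})) + (\sqrt{3}/2)(\mathrm{Re}(x_{n,1}) + \mathrm{Re}(x_{-n,1}))]\}/3 $$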
The forward combine equation for ICS and ICSIS sequences is identical to that for I sequences with the exception that all sequences $x_{n,l}$ are real and $0 \le q \le 1$. The forward combine equation for ISCS and ISCSIS sequences is:

$$ y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} + w_3^{-1/2}\,\tilde{x}_{n,1} - w_3^{1/2}\,\tilde{x}_{-n,1}\right)/3 \tag{4.8} $$

for $0 \le n \le N/6$ if $N/3$ is even, or $0 \le n \le (N-3)/6$ if $N/3$ is odd. For $n = 0$ equation (4.8) reduces to:

$$ y_{0,0} = i(-\sqrt{3}\,\tilde{x}_{0,1})/3 $$

If $N/3$ is even, then for $n = N/6$ equation (4.8) reduces to:

$$ y_{N/6,0} = i(-\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1})/3 $$

For the remaining values of $n$ equation (4.8) reduces to:

$$ y_{n,0} = w_N^{-n/2}\left(\tilde{x}_{n,0} + (1/2)(\tilde{x}_{n,1} - \tilde{x}_{-n,1}) - i(\sqrt{3}/2)(\tilde{x}_{n,1} + \tilde{x}_{-n,1})\right)/3 $$

In addition:

$$ \tilde{y}_{n,1} = (\tilde{x}_{n,0} - \tilde{x}_{n,1} + \tilde{x}_{-n,1})/3 \tag{4.9} $$

for $0 \le n \le N/6$ if $N/3$ is even, or $0 \le n \le (N-3)/6$ if $N/3$ is odd. For $n = 0$ equation (4.9) reduces to:

$$ \tilde{y}_{0,1} = 0 $$

If $N/3$ is even, then for $n = N/6$ equation (4.9) reduces to:

$$ \tilde{y}_{N/6,1} = (2\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1})/3 $$

We now prove Corollary 4.3. We will only provide the key steps of selected results which may not be obvious. In most of these cases, this involves the application of one or more of the symmetries summarized in Table 2.2.

The forward combine equation for I sequences is simplified as follows. Since $x_n$ is the IDFT of an I sequence, it follows from Lemma 2.8 that $x_{0,0} = x_0$ and $x_{N/6,1} = x_{N/2}$ are both pure imaginary. For $n = N/6$ and $q = 1$ we proceed as follows:

$$ y_{N/6,1} = w_6^{-1}\left(x_{N/6,0} + w_3^{-1}x_{N/6,1} - w_3\,\bar{x}_{-N/6,1}\right)/3 $$
$$ = \left(w_6^{-1}x_{N/6,0} - w_6\,\bar{x}_{-N/6,1} + w_2^{-1}x_{N/6,1}\right)/3 $$
$$ = i\left(2\,\mathrm{Im}(w_6^{-1}x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3 $$
$$ = i\left(2\,\mathrm{Im}(w_6^{-1})\mathrm{Re}(x_{N/6,0}) + 2\,\mathrm{Re}(w_6^{-1})\mathrm{Im}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3 $$
$$ = i\left(\mathrm{Im}(x_{N/6,0}) - \sqrt{3}\,\mathrm{Re}(x_{N/6,0}) - \mathrm{Im}(x_{N/6,1})\right)/3 $$
The forward combine equation for ICS and ICSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ICS sequence, it follows from Lemmas 2.2 and 2.8 that $x_{0,0} = x_0 = 0$ and $x_{N/6,1} = x_{N/2} = 0$. The forward combine equation for ISCS and ISCSIS sequences is simplified as follows. Since $x_n$ is the IDFT of an ISCS sequence, it follows from Lemmas 2.4 and 2.8 that $\tilde{x}_{0,0} = \tilde{x}_0 = 0$. For $n = N/6$ and $q = 0$ we proceed as follows:

$$ y_{N/6,0} = w_{12}^{-1}\left(\tilde{x}_{N/6,0} + w_6^{-1}\,\tilde{x}_{N/6,1} - w_6\,\tilde{x}_{-N/6,1}\right)/3 $$
$$ = \left(w_{12}^{-1}\,\tilde{x}_{N/6,0} - w_{12}\,\tilde{x}_{-N/6,1} + w_4^{-1}\,\tilde{x}_{N/6,1}\right)/3 $$
$$ = i\left(\tilde{x}_{N/6,0}\,(2\,\mathrm{Im}(w_{12}^{-1})) - \tilde{x}_{N/6,1}\right)/3 $$
$$ = i\left(-\tilde{x}_{N/6,0} - \tilde{x}_{N/6,1}\right)/3 $$

This completes the proof of Corollary 4.3.

Data storage patterns for all of the combine equations in Corollary 4.3 are shown in Figures 4.19 through 4.30. In anticipation of the development of the general mixed radix RO FFT, we have included two storage patterns for the combine equations for I sequences. The first storage pattern is compatible with that for radix-2 and radix-4. The second storage pattern, which we refer to as I2 sequences, is compatible with the other storage patterns for radix-3. In each figure, the input quantities are on the left, the output quantities are on the right, and the arrows indicate particular input quantities required to produce particular output quantities. These storage patterns have been designed so that these combine equations can be executed in-place. In order to accomplish this, we never store redundant data or variables which always have the value of zero. We have not illustrated the case of N/3 being odd because it is identical except for the absence of the variables corresponding to n = N/6.

The storage patterns for the individual combining operations are assembled into the overall storage pattern for the radix-3 RO FFT in Table 4.3. Table 4.3 utilizes a simple, compressed format suitable for automated machine generation and interpretation. It is actually a representation of the splitting tree for the radix-3 RO FFT, and is analogous to Figure 2.4.
Each symmetry name represents a sequence whose detailed storage pattern may be found in Figures 4.19 through 4.30. The corresponding repetition count represents contiguous repetitions of the same symmetric sequence. Table 4.3 shows the storage pattern for each factor of 3 in the RO FFT for N = 27.
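One way to see why the radix-3 splitting of an ICS sequence needs only $0 \le q \le 1$ is that the subsequences $X_{3k+1}$ and $X_{3k+2}$ carry the same information. The sketch below checks this numerically; it is an illustration under assumptions, not part of the thesis software. The helper `dft`, the sample data, and the forward DFT sign convention are chosen only for this example, and the identity tested, $X_{3k+2} = -X_{3(N/3-1-k)+1}$, follows from $X_{N-k} = -X_k$.

```python
import cmath
import math

def dft(x):
    """Scaled forward DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

N = 27
# Real odd sequence with arbitrary free data x_1 .. x_{(N-1)/2}.
x = [0.0] * N
for n in range(1, (N - 1) // 2 + 1):
    value = math.sin(1.3 * n) + 0.1 * n
    x[n] = value
    x[N - n] = -value

X = dft(x)            # ICS: pure imaginary with X_{N-k} = -X_k
M = N // 3
Y1 = X[1::3]          # subsequence X_{3k+1}
Y2 = X[2::3]          # subsequence X_{3k+2}

# X_{3k+2} = -X_{N-3k-2} = -X_{3(M-1-k)+1}: the q = 2 subsequence is the
# negated reversal of the q = 1 subsequence, so only one needs to be stored.
err = max(abs(Y2[k] + Y1[M - 1 - k]) for k in range(M))
print(err < 1e-12)
```

This matches the compressed tree in Table 4.3, where each ICS entry contributes a single I-type entry rather than two.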
[Figure 4.19: Radix-3 storage pattern for ICS induced symmetries for N = 18, highlighting the case n = 0]

[Figure 4.20: Radix-3 storage pattern for ICS induced symmetries for N = 18, highlighting the case n = N/6]
[Figure 4.21: Radix-3 storage pattern for ICS induced symmetries for N = 18, highlighting the case n = 1]
[Figure 4.22: Radix-3 storage pattern for ISCS induced symmetries for N = 18, highlighting the case n = 0]

[Figure 4.23: Radix-3 storage pattern for ISCS induced symmetries for N = 18, highlighting the case n = N/6]
[Figure 4.24: Radix-3 storage pattern for ISCS induced symmetries for N = 18, highlighting the case n = 1]
[Figure 4.25: Radix-3 storage pattern for I sequences for N = 18, highlighting the case n = 0]
[Figure 4.26: Radix-3 storage pattern for I sequences for N = 18, highlighting the case n = N/6]
[Figure 4.27: Radix-3 storage pattern for I sequences for N = 18, highlighting the case n = 1]
[Figure 4.28: Radix-3 storage pattern for I2 sequences for N = 18, highlighting the case n = 0]
[Figure 4.29: Radix-3 storage pattern for I2 sequences for N = 18, highlighting the case n = N/6]
[Figure 4.30: Radix-3 storage pattern for I2 sequences for N = 18, highlighting the case n = 1]
Table 4.3: Splitting Tree for the Radix-3 RO FFT for N = 27

Length   Factor   Symmetry   Repetitions
27       3        ICS        1
9        3        ICS        1
                  I2         1
3        3        ICS        1
                  I2         4
4.5 The Mixed Radix RO FFT

Up to this point, we have been developing the RO FFT for p = 2, 3, 4 as three unrelated algorithms. We now begin combining these into a single mixed radix RO FFT algorithm for sequence lengths comprised of these factors. The first issue we must address is compatibility of storage patterns. That is, when we compare the storage patterns for p = 2, 3, 4, are sequences with the same symmetries stored in the same pattern? The answer is yes, with one exception. The storage pattern for I sequences for p = 3 is different from that for p = 2, 4. We deal with this as follows. We will process all even factors first, followed by all odd factors. Thus, when processing a factor of 3, we assume that there are already I sequences present, stored in the pattern corresponding to p = 2, 4. Processing a factor of 3 will introduce I sequences stored in a new pattern. To distinguish these two storage patterns for I sequences, we refer to the latter as I2 sequences. When processing additional factors of 3, we will encounter both I and I2 sequences. Thus, for p = 3 we have developed two storage patterns for the combine equation for I sequences corresponding to these two possibilities. These are shown in Figures 4.25 through 4.30.

The second issue we must address is generating splitting trees for the mixed radix RO FFT. The splitting tree guides the entire algorithm by indicating which combine equations to apply to the data, and in what order. Table 4.4 shows the splitting tree for the mixed radix RO FFT for N = 4 x 2 x 3 x 3 = 72, using the same compressed format introduced earlier. Because the splitting tree is different for each value of N, it is necessary to develop software which generates it automatically. The first step in this process is to develop a representation of the splitting tree in standard FORTRAN. Closely related to this is the factorization of the length of the sequence, N.
We have made an important decision to process all even factors first, followed by all odd factors. More explicitly, we will first process all factors of 4, followed by at most one factor of 2, followed by all factors of 3. We represent this factorization in FORTRAN by:

    NFAC(1)    = N
    NFAC(2)    = number of factors
    NFAC(3:12) = list of factors (10 maximum)
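The factor ordering described above can be sketched in a few lines. This is an illustrative Python version, not the FORTRAN of the thesis: the function names `factor_ro` and `make_nfac` are hypothetical, and the zero padding of unused factor slots is an assumption.

```python
def factor_ro(n):
    """Factor n as all 4s, then at most one 2, then all 3s (hypothetical helper)."""
    factors = []
    m = n
    while m % 4 == 0:
        factors.append(4)
        m //= 4
    if m % 2 == 0:
        factors.append(2)
        m //= 2
    while m % 3 == 0:
        factors.append(3)
        m //= 3
    if m != 1:
        raise ValueError("length must be a product of the factors 2, 3, 4")
    if len(factors) > 10:
        raise ValueError("at most 10 factors are supported")
    return factors

def make_nfac(n):
    """Mimic the NFAC layout: [N, number of factors, up to 10 factors].

    Python lists are 0-based, so element 0 plays the role of NFAC(1).
    Unused factor slots are padded with 0 (an assumption of this sketch).
    """
    factors = factor_ro(n)
    return [n, len(factors)] + factors + [0] * (10 - len(factors))

print(factor_ro(72))      # factors in processing order
print(make_nfac(72)[:6])  # N, count, then the factors
```

Because 4s are removed before the single 2, a length such as 72 is processed as 4, 2, 3, 3, which is the ordering used for Table 4.4.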
When the factors are processed in a known order, the amount of memory required to represent the splitting tree can be minimized. This is important, since otherwise the splitting tree could require almost as much memory as the data. In Table 4.4 we observe that each factor has associated with it 1 to 4 splitting tree entries. With the restriction that the factors are processed in the order specified above, it is easily seen that there are at most 4 splitting tree entries per factor for any relevant value of N. We represent the number of splitting tree entries for the J'th factor in FORTRAN by NE(J), and it satisfies 1 <= NE(J) <= 4. We represent the I'th splitting tree entry for the J'th factor in FORTRAN by:

    TREE(1,I,J) = symmetry of DFT of sequence, where:
                  1 = ICS
                  2 = ISCS
                  3 = I
                  4 = I2
    TREE(2,I,J) = associated repetition count

The arrays NFAC, NE, and TREE are constant for a fixed value of N. Because they are referenced throughout the code for the RO FFT, they have been placed in a common block.

At this point, our discussion of the design of the software for the RO FFT will be facilitated by a description of the subroutines and their relationships. Figure 4.31 provides the subroutine hierarchy for the initialization processing, which is defined as all processing which depends only on the length of the sequence, N. Figure 4.32 provides the subroutine hierarchy for the transform processing, which is defined as all processing which depends on the data itself. Since the initialization processing is executed just once for each value of N, its execution time is not critical. In contrast, the execution time of the transform processing is critical. Note that many subroutines are used in both the initialization and transform processing. For a description of each subroutine, refer to the prologues contained with the code in Appendix B. Note that this software is designed to transform multiple sequences on vector processors.
In the remainder of this section, we highlight features of the software which are worthy of special attention.
Table 4.4: Splitting Tree for the Mixed Radix RO FFT for N = 72

Length   Factor   Symmetry   Repetitions
72       4        ICS        1
18       2        ICS        1
                  ISCS       1
                  I          1
9        3        ICS        1
                  ISCS       1
                  I          3
3        3        ICS        1
                  I2         2
                  ISCS       1
                  I          9
    VFFROI
        VFFR04I:  VICSF4, VISCSF4, VIF4
        VFFR02I:  VICSF2, VISCSF2, VIF2
        VFFR03I:  VICSF3, VISCSF3, VIF3, VI2F3

Figure 4.31: Initialization subroutine hierarchy for the RO FFT

    VFFRO
        VFFR04:   VICSF4, VISCSF4, VIF4
        VFFR02:   VICSF2, VISCSF2, VIF2
        VFFR03:   VICSF3, VISCSF3, VIF3, VI2F3

Figure 4.32: Forward transform subroutine hierarchy for the RO FFT
The most significant feature of the design of this software is the automated generation of the splitting tree. Because this depends only on the length of the sequence N, it is performed with initialization processing for which execution time is not critical. From this, we obtain two important byproducts as well. The first of these is the array INDX, which provides the permuted indices of the forward transform. The ordering of the forward transform depends on the storage pattern used for each combine equation in the algorithm. As a result, this ordering is rather complex, and we must provide the user with an explicit description of it. Thus, the array INDX provides a list of the indices of the forward transform in the correct permuted order. The second byproduct is for applications to fast Poisson solvers. Recall from Section 1.1 that the RO FFT has associated with it a set of eigenvalues. The computations in the spectral domain involve both the forward transform and these eigenvalues. It is essential that the eigenvalues be provided in the same permuted order as the forward transform. Thus, the initialization processing includes the computation of these eigenvalues, and they are stored in the array EIGENV in the same permuted order indicated by INDX. We note that these eigenvalues are scaled by N rather than scaling the forward transform by 1/N. This is far more efficient. INDX itself is not needed for applications to fast Poisson solvers, but may be useful for other applications. INDX is obtained by computing the forward transform of a special sequence. The splitting tree is generated as this transform is computed. Having obtained INDX, EIGENV is easily computed. In the following paragraphs we describe this process in more detail.

The sequence which we transform to obtain INDX is derived as follows. We set $\mathrm{Im}(X_k) = k$ for the lower half-range of k.
From Theorem 2.10, the IDFT of this for even values of N is given by:

$$ x_n = -\sum_{k=1}^{N/2-1} 2k \sin(2\pi kn/N) $$

for $1 \le n \le N/2 - 1$. Of course, there is an analogous result for odd values of N. When we compute the DFT of the sequence $x_n$ we recover the indices k, to within rounding error, in permuted order. We note that some of the indices will be negative. This is because the DFT $X_k$ is ICS symmetric. That is, $X_{N-k} = -X_k$.
Only nonredundant terms of $X_k$ are stored, but these do not necessarily belong to the lower half of the sequence. Thus, INDX will contain index values belonging to both the lower and upper halves of the sequence $X_k$. Note that the sign of the index k has no effect on the associated eigenvalue. The processing described in this paragraph may be found near the end of subroutine VFFROI.

Next, we describe how the splitting tree is generated and used. Recall that the splitting tree is generated as the sequence $x_n$ in the preceding paragraph is transformed. Subroutine VFFROI supervises this forward transform and generation of the splitting tree by processing the list of factors of the sequence length N in forward order. Before processing the first factor, the initial splitting tree entry is stored. This always has the form:

    NE(1)       = 1    one entry for the first factor
    TREE(1,1,1) = 1    DFT of sequence is ICS symmetric
    TREE(2,1,1) = 1    repetition count is 1

As a specific example, let us assume that the first factor of N is 4. In this case, we next call subroutine VFFR04I. Subroutine VFFR04I uses the initial splitting tree entry to supervise the application of the radix-4 forward combine equations. It also adds new splitting tree entries which reflect the changes made by this radix-4 splitting. These changes are obtained by studying the appropriate storage pattern diagrams. For this example, these are Figures 4.10 through 4.12. First, we note that the splitting tree changes depend on the length of the subsequence being split, LS. The subsequence length which establishes the boundaries between these options is always 2p, where p is the radix. For this example, this critical subsequence length is 8, and let us assume that LS > 8. Then the output of this forward combine operation is one ICS sequence, followed by one ISCS sequence, followed by one I sequence. At this point, we add a new set of splitting tree entries which describe this output.
Recall that the FORTRAN structure of the splitting tree is a three dimensional integer array. The third index indicates the factor number being processed. In this example, the initial splitting tree entry has a third index of 1, and these new splitting tree entries have a third index of 2. The new splitting tree entries will be used to supervise the processing for the second factor of N.
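The special sequence used earlier in this section to obtain INDX can be checked numerically. The sketch below is an illustration under stated assumptions, not the initialization code itself: it assumes the scaled forward DFT $X_k = (1/N)\sum_n x_n e^{-2\pi i nk/N}$, builds $x_n = -\sum_{k=1}^{N/2-1} 2k\sin(2\pi kn/N)$ for even N, and recovers the indices in natural (unpermuted) order. The permutation itself depends on the storage patterns and is not reproduced here.

```python
import cmath
import math

def dft(x):
    """Scaled forward DFT (assumed convention): X_k = (1/N) sum_n x_n exp(-2*pi*i*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]

N = 16
# Special real odd sequence whose DFT has Im(X_k) = k on the lower half-range.
x = [-sum(2 * k * math.sin(2 * math.pi * k * n / N) for k in range(1, N // 2))
     for n in range(N)]

X = dft(x)
# Rounding removes the floating point error mentioned in the text.
indices = [round(Xk.imag) for Xk in X]
max_real = max(abs(Xk.real) for Xk in X)

print(indices)
```

The upper half of the result holds the negated indices, which is the ICS symmetry $X_{N-k} = -X_k$ and the reason some stored entries of INDX are negative.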
By continuing in this fashion, we generate the complete splitting tree. To facilitate this, subroutine VFFR04I has an input parameter IC which is used as the third index into the splitting tree for obtaining inputs, and IN = IC + 1 is used as the third index into the splitting tree for storing outputs. Also note that new splitting tree entries are added in a manner which keeps the splitting tree as compressed as possible. That is, we never make two consecutive splitting tree entries with the same symmetry. We increment the corresponding repetition count instead. For the example we have been discussing, the code fragment in subroutine VFFR04I which adds the new splitting tree entries is shown below.

      IF ((NE(IN) .GE. 1) .AND.
     &    (TREE(1,NE(IN),IN) .EQ. 1)) THEN
         TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1
      ELSE
         NE(IN) = NE(IN) + 1
         TREE(1,NE(IN),IN) = 1
         TREE(2,NE(IN),IN) = 1
      END IF
      NE(IN) = NE(IN) + 1
      TREE(1,NE(IN),IN) = 2
      TREE(2,NE(IN),IN) = 1
      NE(IN) = NE(IN) + 1
      TREE(1,NE(IN),IN) = 3
      TREE(2,NE(IN),IN) = 1

Note that the code fragment above is a knowledge base, or set of rules, which describes the results of a radix-4 splitting of an ICS sequence for LS > 8. Subroutines VFFR04I, VFFR02I, and VFFR03I collectively contain all of the rules required to generate splitting trees for the RO FFT for sequences of length N which are products of 2, 3, 4. Note that we never explicitly generated the splitting tree for a specific value of N. Instead, this is done automatically by the software during initialization processing.

Since the initialization processing actually computes a forward transform, the forward transform processing outlined in Figure 4.32 is completely analogous to it, except that the splitting tree is not generated because it is already available. Consequently, we will not describe the subroutines involved in the forward transform in detail.
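The compression rule described above can be illustrated outside of FORTRAN. The following Python fragment is only a sketch: the helper names `add_entry` and `split_ics_radix4` are hypothetical and do not appear in the thesis software, and a list of `[symmetry, count]` pairs stands in for one slice `TREE(:,:,IN)` of the splitting tree. The symmetry codes 1, 2, 3 are taken from the FORTRAN fragment.

```python
# Symmetry codes follow the FORTRAN fragment: 1 = ICS, 2 = ISCS, 3 = I.
ICS, ISCS, I_SEQ = 1, 2, 3


def add_entry(entries, symmetry):
    """Append a [symmetry, repetition count] pair, merging with the last
    entry when it has the same symmetry so the list stays compressed."""
    if entries and entries[-1][0] == symmetry:
        entries[-1][1] += 1
    else:
        entries.append([symmetry, 1])


def split_ics_radix4(entries):
    """Rule for a radix-4 splitting of one ICS sequence with LS > 8:
    the output is one ICS, one ISCS and one I sequence, in that order."""
    for symmetry in (ICS, ISCS, I_SEQ):
        add_entry(entries, symmetry)


# Entries for the next factor when nothing has been stored yet.
fresh = []
split_ics_radix4(fresh)

# Entries when the previously stored output already ended with an ICS sequence.
merged = [[ICS, 1]]
split_ics_radix4(merged)
```

In the second case the leading ICS entry is not duplicated. Its repetition count becomes 2, which mirrors the first branch of the IF statement in the FORTRAN fragment.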
We only note that the use of the splitting tree to guide the forward transform does not add significantly to the execution time because this merely involves testing splitting tree entries and branching to call the appropriate subroutines.

We have not implemented the inverse transform because it does not introduce any new significant issues. However, we will briefly outline what is involved. First, one must develop corollaries for the inverse combine equations analogous to those for the forward combine equations. The storage patterns for the inverse combine equations are the same as for the forward combine equations, except that they are traversed in the inverse direction. The input to the inverse transform is in the permuted order output by the forward transform. The inverse transform uses the same initialization routine as the forward transform because the splitting tree must be generated in the forward direction. The splitting tree cannot be generated in the inverse direction because, for example, there is more than one way that I sequences can be produced. Thus, it was essential that we saved the state of the splitting tree for each factor of N. The software for the inverse transform itself is analogous to that for the forward transform, with the following exceptions. The factors of N are processed in inverse order, and consequently the splitting tree is traversed in the inverse direction. Finally, the inverse combine equations are used instead of the forward combine equations.

One of the key objectives of the prototype software for the RO FFT is efficiency. Consequently, in this paragraph we discuss coding techniques utilized to obtain optimum performance. These techniques are discussed in order from most significant to least significant. This software is written in FORTRAN for serial vector processors. It is designed to compute multiple transforms. Each sequence is stored in a row of a two-dimensional array. We vectorize by column, resulting in a vector stride of 1 and a vector length equal to the number of sequences. All complex arithmetic is expanded into real and imaginary parts. The equations have been optimized by hand, storing all intermediate results in local scalar variables. Within vectorized loops, local scalar variables are implemented as vector registers when possible and do not necessarily result in additional memory accesses. Insofar as possible, computations involving the same vectors are in adjacent FORTRAN statements to avoid multiple memory accesses for the same data. We have avoided coding the unary minus operator. Where this operator must be used, it is used as a scalar operator if possible, rather than as a vector operator. CASE structures have been implemented with the computed GOTO statement to avoid sequences of test and branch instructions. The most common case is coded last to avoid an additional branch instruction. Finally, we have avoided coding DO loops which are known a priori to have only one iteration.
We conclude this section with a discussion of the test driver used in the development of the prototype software for the RO FFT. We do this because it has played a crucial role in the development of this software, and also because it provides an explicit example of the correct user interface. It should be clear that there are a large number of logic paths through this software. For this reason, this test driver has been designed to exhaustively test with sequences of all lengths within a practical range. For applications to fast Poisson solvers, the length of an RO sequence represents twice the number of grid points in one dimension of a two- or three-dimensional grid. Thus, 1024 is a reasonable upper bound on the sequence length. Again for fast Poisson solvers, the number of sequences is the product of the number of grid points in the remaining dimensions, and therefore is fairly large. We have selected 1024 as the number of sequences because this is the vector length, and is sufficient for most vector processors to attain full vector speed.

We assign the same test data to all sequences. This test data has been selected to facilitate automated verification of the software output, and is derived as follows. We set $\mathrm{Im}(X_k) = 1$ for the lower half-range of k. From Theorem 2.10, the IDFT of this for even values of N is given by

$$x_n = -\sum_{k=1}^{N/2-1} 2\sin(2\pi kn/N)$$

for $1 \le n \le N/2-1$. Of course, there is an analogous result for odd values of N. When we compute the DFT of the sequence $x_n$ we recover the sequence with values $\pm 1$ to within rounding error. The value $-1$ occurs because of the ICS symmetry and permuted ordering of the DFT $X_k$. In spite of this, it is easy to automate the verification of the software output because it is constant, up to a sign, and does not require sorting.

At this point, a word about scaling is in order. Because we will be making performance comparisons to VFFTPK, we have scaled the output as VFFTPK does. Thus, we have scaled the output by $1/\sqrt{N}$ instead of $1/N$. As a result, the correct output will be $\pm\sqrt{N}$ rather than $\pm 1$. For the final version of this software, it is recommended that no scaling be done because this can be accomplished more efficiently by the user.

The test driver is designed to produce a concise report of the test results. If the automated verification of the software output is successful, then only timing data is provided. Otherwise, additional debug data is output.
The prototype software for the RO FFT has passed the tests described in the preceding paragraph. Thus, we are ready to proceed with performance comparisons to VFFTPK. The test driver has been designed to apply the same tests to VFFTPK, time both algorithms, and compare their performance. The results are presented in Section 4.6.
4.6 Performance of the RO FFT

The tests described at the end of Section 4.5 have been executed on both an IBM 3090J, located at the IBM Federal Sector Division in Houston, Texas, and a Cray Y-MP8/864, located at the National Center for Atmospheric Research in Boulder, Colorado. The results are shown in Tables 4.5 and 4.6. The column headings in these tables are as follows:

      N        length of the RO symmetric sequence (products of 2, 3, 4 only)
      CINIT    compact algorithm initialization time (seconds)
      CTRAN    compact algorithm transform time (seconds)
      PINIT    prepostprocessing algorithm (VFFTPK) initialization time (seconds), not available for odd values of N
      PTRAN    prepostprocessing algorithm (VFFTPK) transform time (seconds), not available for odd values of N
      DELTIM   100(PTRAN - CTRAN)/PTRAN

The remainder of this section is devoted to a careful analysis of the data in Tables 4.5 and 4.6. First, we emphasize that it is not our intent to compare the performance of the IBM 3090J and the Cray Y-MP8/864. Rather, we are comparing the performance of the compact algorithm to the prepostprocessing algorithm (VFFTPK). We regard the performance of VFFTPK as the baseline, and compare the performance of the compact algorithm to it. We will refer to this as the relative performance of the compact algorithm, and it is expressed quantitatively in the column labeled DELTIM. Note that we are not concerned with the performance of initialization processing because in applications it is executed only once for each value of N. Also, the functionality of the initialization processing is quite different for the two algorithms.

Note that the relative performance of the compact algorithm is significantly higher on the IBM 3090J than on the Cray Y-MP8/864. The reason for this involves an analysis of the assembly language generated from the
Table 4.5: Timing Data for 1024 Sequences on the IBM 3090J

         N     CINIT      CTRAN      PINIT      PTRAN     DELTIM
         3   0.000215   0.000098   0.000000   0.000000      0.0
         4   0.000089   0.000105   0.000007   0.000011   -854.5
         6   0.000129   0.000264   0.000053   0.000130   -103.1
         8   0.000165   0.000437   0.000016   0.001221     64.2
         9   0.000200   0.000821   0.000000   0.000000      0.0
        12   0.000224   0.000837   0.000028   0.001867     55.2
        16   0.000295   0.001157   0.000027   0.002360     51.0
        18   0.000338   0.001987   0.000032   0.003009     34.0
        24   0.000493   0.002503   0.000037   0.004024     37.8
        27   0.000543   0.004127   0.000000   0.000000      0.0
        32   0.000636   0.003313   0.000037   0.005254     36.9
        36   0.000772   0.004646   0.000057   0.007280     36.2
        48   0.001095   0.005909   0.000074   0.009791     39.6
        54   0.001264   0.009420   0.000088   0.011954     21.2
        64   0.001488   0.007822   0.000089   0.012861     39.2
        72   0.001982   0.011617   0.000104   0.016385     29.1
        81   0.002303   0.017147   0.000000   0.000000      0.0
        96   0.003018   0.015135   0.000125   0.023006     34.2
       108   0.003514   0.020407   0.000145   0.029910     31.8
       128   0.004367   0.019962   0.000154   0.030695     35.0
       144   0.005475   0.026458   0.000186   0.040155     34.1
       162   0.006480   0.039680   0.000205   0.048162     17.6
       192   0.008622   0.035250   0.000226   0.053534     34.2
       216   0.010670   0.049952   0.000253   0.064079     22.0
       243   0.012850   0.068479   0.000000   0.000000      0.0
       256   0.013318   0.046786   0.000281   0.070952     34.1
       288   0.017432   0.064724   0.000318   0.084894     23.8
       324   0.020924   0.083660   0.000360   0.109690     23.7
       384   0.028521   0.084783   0.000403   0.113559     25.3
       432   0.034990   0.106282   0.000460   0.145865     27.1
       486   0.043096   0.149356   0.000512   0.172023     13.2
       512   0.046345   0.111392   0.000519   0.149192     25.3
       576   0.063103   0.138339   0.000593   0.193386     28.5
       648   0.078003   0.187007   0.000665   0.228665     18.2
       729   0.096424   0.247511   0.000000   0.000000      0.0
       768   0.105239   0.181592   0.000769   0.258849     29.8
       864   0.131837   0.243505   0.000867   0.303848     19.9
       972   0.162945   0.306870   0.000976   0.378660     19.0
      1024   0.176456   0.240520   0.001008   0.341197     29.5
Table 4.6: Timing Data for 1024 Sequences on the Cray Y-MP8/864

         N     CINIT      CTRAN      PINIT      PTRAN     DELTIM
         3   0.000038   0.000029   0.000000   0.000000      0.0
         4   0.000036   0.000030   0.000005   0.000005   -522.9
         6   0.000056   0.000078   0.000017   0.000031   -156.7
         8   0.000064   0.000116   0.000014   0.000181     36.0
         9   0.000075   0.000170   0.000000   0.000000      0.0
        12   0.000087   0.000199   0.000027   0.000361     44.8
        16   0.000104   0.000274   0.000026   0.000485     43.5
        18   0.000131   0.000420   0.000034   0.000606     30.6
        24   0.000167   0.000588   0.000042   0.000834     29.4
        27   0.000192   0.000783   0.000000   0.000000      0.0
        32   0.000222   0.000790   0.000040   0.001089     27.4
        36   0.000249   0.000989   0.000045   0.001395     29.1
        48   0.000335   0.001330   0.000054   0.001895     29.8
        54   0.000412   0.001850   0.000054   0.002305     19.8
        64   0.000465   0.001774   0.000053   0.002490     28.8
        72   0.000579   0.002511   0.000063   0.003142     20.1
        81   0.000690   0.003141   0.000000   0.000000      0.0
        96   0.000851   0.003345   0.000072   0.004224     20.8
       108   0.001011   0.004115   0.000068   0.005372     23.4
       128   0.001301   0.004517   0.000072   0.005636     19.9
       144   0.001772   0.005576   0.000079   0.007230     22.9
       162   0.002139   0.007283   0.000081   0.008711     16.4
       192   0.002659   0.007424   0.000091   0.009838     24.5
       216   0.003269   0.009854   0.000093   0.011822     16.7
       243   0.003993   0.012057   0.000000   0.000000      0.0
       256   0.004179   0.009865   0.000096   0.013037     24.3
       288   0.005616   0.013014   0.000111   0.015986     18.6
       324   0.006756   0.015806   0.000111   0.019110     17.3
       384   0.008940   0.017537   0.000128   0.021576     18.7
       432   0.011495   0.021370   0.000130   0.025842     17.3
       486   0.014145   0.027055   0.000136   0.030702     11.9
       512   0.015338   0.023642   0.000139   0.028688     17.6
       576   0.019435   0.028664   0.000162   0.035000     18.1
       648   0.025037   0.036605   0.000163   0.041512     11.8
       729   0.030515   0.043868   0.000000   0.000000      0.0
       768   0.032927   0.038324   0.000190   0.047187     18.8
       864   0.041872   0.048680   0.000199   0.056344     13.6
       972   0.052831   0.057831   0.000212   0.068120     15.1
      1024   0.057313   0.050686   0.000220   0.063066     19.6
FORTRAN source code on both machines. To simplify this analysis, we have restricted our attention to the most computationally intensive vectorized loops involved in radix-4 processing. We have selected radix 4 because the number of vector registers required for efficient implementation of these algorithms increases with the value of the radix, p. Thus, for our test cases radix 4 represents the worst case. For the compact algorithm, the vectorized loop which we analyze is the DO 101 loop from subroutine VIF4 (see Appendix B). For VFFTPK, the vectorized loop which we analyze is the following code segment from subroutine VRADF4:

      DO 1003 M=1,MP
         CH(M,I-1,1,K) = ((WA1(I-2)*CC(M,I-1,K,2)+WA1(I-1)*
     1      CC(M,I,K,2))+(WA3(I-2)*CC(M,I-1,K,4)+WA3(I-1)*
     1      CC(M,I,K,4)))+(CC(M,I-1,K,1)+(WA2(I-2)*CC(M,I-1,K,3)+
     1      WA2(I-1)*CC(M,I,K,3)))
         CH(M,IC-1,4,K) = (CC(M,I-1,K,1)+(WA2(I-2)*CC(M,I-1,K,3)+
     1      WA2(I-1)*CC(M,I,K,3)))-((WA1(I-2)*CC(M,I-1,K,2)+
     1      WA1(I-1)*CC(M,I,K,2))+(WA3(I-2)*CC(M,I-1,K,4)+
     1      WA3(I-1)*CC(M,I,K,4)))
 1003 CONTINUE

We are interested in the number of memory accesses (vector loads and stores) required to implement these vectorized loops. First, we consider the IBM 3090J. This machine has 16 single precision vector registers of length 256, which may be concatenated in pairs to form 8 double precision vector registers. Since both algorithms are coded in single precision, we have 16 vector registers available. This machine also has a 256K byte cache. Vector instructions may operate on two vector registers, or one vector register and the address of a vector in memory. The latter type of vector instruction is used extensively by the VS FORTRAN compiler, and it complicates the process of counting memory accesses. For the first such instruction issued to a particular address, we assume that the memory operand is not in cache and count this as a memory access. For subsequent instructions issued to this same address, we assume that the memory operand is in cache, and do not count this as a memory access.
With these assumptions, the number of memory accesses required to implement each of the segments from VIF4 and VRADF4 is 16. This is optimal, because both code segments involve 8 real input and 8 real output vectors.

Next, we consider the Cray Y-MP8/864. This machine has 8 vector registers of length 64 for both single and double precision, and there is no cache. Vector instructions may operate on two vector registers only. Thus, the number of memory accesses is simply the number of vector loads and stores. The FORTRAN compiler used was CFT77 with full optimization. The number of memory accesses required to implement the segment from VIF4 is 28. Of these, 12 involve temporary storage locations due to an insufficient number of vector registers. The number of memory accesses required to implement the segment from VRADF4 is 26. Of these, 6 involve temporary storage locations and 4 involve reloading vectors a second time due to the complexity of the code. Both code segments required more than the optimum number of memory accesses. However, the segment from VIF4 required more than the segment from VRADF4 because the former is part of an in-place algorithm which requires a larger number of temporary storage locations. Thus, the relative performance of the compact algorithm is constrained on the Cray Y-MP8/864 by an insufficient number of vector registers.

In view of the analysis above, we will focus our attention on the IBM 3090J for the remainder of this section. Our next goal is to develop analytic timing models for both algorithms. First, we develop the timing model for the compact algorithm. As usual, N will denote the length of the RO symmetric sequence, and we express N as follows:

$$N = 2^p 3^q 4^r$$

To simplify the model, we have excluded odd values of N. Thus, for each factor of N there are $N/2 - 1$ real quantities to be processed. We seek a least squares fit to the timing data using the sum of the following terms:

      $c_1 (N/2-1)$      time for scaling, and adjustments to the other terms
      $c_2\, p\, (N/2-1)$   time for processing all factors of 2
      $c_3\, q\, (N/2-1)$   time for processing all factors of 3
      $c_4\, r\, (N/2-1)$   time for processing all factors of 4
Note that the time required for processing a given factor is not uniform because the computations involved depend on the length of the subsequence being split. For example, the computations required for the last factor of N do not involve multiplications by powers of $\omega$. Thus, the constants $c_2$, $c_3$, $c_4$ represent averages, and the time required for processing the last factor of N will be overestimated. The constant $c_1$ is used to adjust for this, and therefore it may be negative. The least squares solution was computed using Mathematica [12], and yielded the following results:

      $c_1 = -0.000058986$
      $c_2 = +0.000069355$
      $c_3 = +0.000117047$
      $c_4 = +0.000105037$

Next, we develop the timing model for the prepostprocessing algorithm. This algorithm is restricted to even values of N, and it ultimately transforms a real sequence of length $N/2$. Thus, for this model we express $N/2$ as follows:

$$N/2 = 2^p 3^q 4^r$$

We seek a least squares fit to the timing data using the sum of the following terms:

      $c_1 (N/2)$      time for preprocessing, postprocessing, scaling, and adjustments to the other terms
      $c_2\, p\, (N/2)$   time for processing all factors of 2
      $c_3\, q\, (N/2)$   time for processing all factors of 3
      $c_4\, r\, (N/2)$   time for processing all factors of 4

The least squares solution was again computed using Mathematica, and yielded the following results:

      $c_1 = +0.000147671$
      $c_2 = +0.0000781027$
      $c_3 = +0.000111397$
      $c_4 = +0.000110474$
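The two timing models can be evaluated directly from the fitted constants. The Python fragment below is a sketch of that evaluation, not part of the thesis software. The function names are hypothetical, the factorization rule (as many factors of 4 as possible, hence at most one factor of 2) is the one used by both algorithms, and the models apply to even values of N only.

```python
def factorize(n):
    """Return (p, q, r) with n = 2**p * 3**q * 4**r, using as many factors
    of 4 as possible so that p is 0 or 1; return None for any other n."""
    q = 0
    while n % 3 == 0:
        n //= 3
        q += 1
    r = 0
    while n % 4 == 0:
        n //= 4
        r += 1
    p = 0
    if n % 2 == 0:
        n //= 2
        p = 1
    return (p, q, r) if n == 1 else None


# Fitted constants (c1, c2, c3, c4) quoted in the text for the IBM 3090J.
COMPACT = (-0.000058986, 0.000069355, 0.000117047, 0.000105037)
PREPOST = (0.000147671, 0.0000781027, 0.000111397, 0.000110474)


def model_ctran(N):
    """Compact algorithm: every term is proportional to N/2 - 1."""
    p, q, r = factorize(N)
    c1, c2, c3, c4 = COMPACT
    return (c1 + c2 * p + c3 * q + c4 * r) * (N // 2 - 1)


def model_ptran(N):
    """Prepostprocessing algorithm: factors of N/2, terms proportional to N/2."""
    p, q, r = factorize(N // 2)
    c1, c2, c3, c4 = PREPOST
    return (c1 + c2 * p + c3 * q + c4 * r) * (N // 2)


def model_deltim(N):
    """Relative performance predicted by the two models, in percent."""
    return 100.0 * (model_ptran(N) - model_ctran(N)) / model_ptran(N)
```

For example, `model_ctran(1024)` and `model_ptran(1024)` evaluate to approximately 0.238228 and 0.341847 seconds, which agree with the model entries for N = 1024.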
Note that $c_1$ is positive for the prepostprocessing model due to a large contribution from pre- and postprocessing. Table 4.7 is analogous to Table 4.5 except that all timing data has been computed from the timing models. Timing data not relevant to the timing models have been omitted or set to zero. A comparison of these tables shows that we have obtained an excellent fit.

We now focus our attention on the columns labeled DELTIM. Recall that this is a measure of the relative performance of the compact algorithm, which is our primary interest. Both tables show that DELTIM is a fairly complicated function of N. We will now summarize the reasons for this.

In Table 4.5 note that for N = 4, 6, the compact algorithm is slower than VFFTPK. This is because VFFTPK contains simplified code for these special cases. Such small values of N are of no practical importance, so we ignore this. In Table 4.7, the timing model has been used to extrapolate timing data for VFFTPK for N = 4, 6.

Swarztrauber [10] has shown that the compact and prepostprocessing algorithms have the same asymptotic operation counts, but the compact algorithm has smaller low order terms. For values of N within the practical range shown in Tables 4.5 and 4.7, these low order terms make a significant contribution. Closely related to this is the number of factors of N. Both algorithms must access all of the data for each factor of N, while VFFTPK accesses all of the data two additional times for pre- and postprocessing. These additional data accesses are most significant when the number of factors of N is small. Thus, DELTIM generally decreases as N increases. This can be seen by comparing the timing data for N = 64, 256, 1024. However, DELTIM is not a simple monotonically decreasing function of N.

An interesting phenomenon occurs when N includes an odd power of 2. Recall that both algorithms use as many factors of 4 as possible, resulting in at most one factor of 2. Moreover, the prepostprocessing algorithm actually works with sequences of length $N/2$. This eliminates the factor of 2 for the prepostprocessing algorithm. With one less factor to process, the performance of the prepostprocessing algorithm improves relative to the compact algorithm. This can be seen by comparing the timing data for N = 256, 512, 1024.

There is an additional phenomenon which creates irregularities in the timing data. Recall from Theorem 2.12 that the forward combine equations involve factors of $\omega_p$, where p is the radix. For p = 2, 4 this reduces to $-1$, $i$ respectively. Of course, we do not need to perform multiplications by these values. On the other hand, for p = 3 this reduces to $-1/2 + i\sqrt{3}/2$, and
Table 4.7: Timing Model for 1024 Sequences on the IBM 3090J

         N   Factorization     CTRAN      PTRAN     DELTIM
         3   2^0 3^1 4^0     0.000000   0.000000      0.0
         4   2^0 3^0 4^1     0.000046   0.000452     89.8
         6   2^1 3^1 4^0     0.000255   0.000777     67.2
         8   2^1 3^0 4^1     0.000346   0.001033     66.5
         9   2^0 3^2 4^0     0.000000   0.000000      0.0
        12   2^0 3^1 4^1     0.000815   0.002023     59.7
        16   2^0 3^0 4^2     0.001058   0.002690     60.7
        18   2^1 3^2 4^0     0.001956   0.003334     41.3
        24   2^1 3^1 4^1     0.002557   0.004435     42.3
        27   2^0 3^3 4^0     0.000000   0.000000      0.0
        32   2^1 3^0 4^2     0.003307   0.005898     43.9
        36   2^0 3^2 4^1     0.004762   0.008074     41.0
        48   2^0 3^1 4^2     0.006167   0.010743     42.6
        54   2^1 3^3 4^0     0.009399   0.013010     27.8
        64   2^0 3^0 4^3     0.007940   0.014295     44.5
        72   2^1 3^2 4^1     0.012233   0.017314     29.3
        81   2^0 3^4 4^0     0.000000   0.000000      0.0
        96   2^1 3^1 4^2     0.015862   0.023041     31.2
       108   2^0 3^3 4^1     0.021051   0.030238     30.4
       128   2^1 3^0 4^3     0.020505   0.030662     33.1
       144   2^0 3^2 4^2     0.027348   0.040251     32.1
       162   2^1 3^4 4^0     0.038285   0.048054     20.3
       192   2^0 3^1 4^3     0.035451   0.053579     33.8
       216   2^1 3^3 4^1     0.049921   0.063972     22.0
       243   2^0 3^5 4^0     0.000000   0.000000      0.0
       256   2^0 3^0 4^4     0.045868   0.071321     35.7
       288   2^1 3^2 4^2     0.064999   0.085163     23.7
       324   2^0 3^4 4^1     0.082792   0.108761     23.9
       384   2^1 3^1 4^3     0.084523   0.113374     25.4
       432   2^0 3^3 4^2     0.107979   0.144815     25.4
       486   2^1 3^5 4^0     0.144136   0.171231     15.8
       512   2^1 3^0 4^4     0.109782   0.150929     27.3
       576   2^0 3^2 4^3     0.140693   0.192821     27.0
       648   2^1 3^4 4^1     0.188501   0.228009     17.3
       729   2^0 3^6 4^0     0.000000   0.000000      0.0
       768   2^0 3^1 4^4     0.183154   0.256740     28.7
       864   2^1 3^3 4^2     0.246353   0.303614     18.9
       972   2^0 3^5 4^1     0.306174   0.380421     19.5
      1024   2^0 3^0 4^5     0.238228   0.341847     30.3
complex multiplications by this value are required. Similar considerations apply to the prepostprocessing algorithm. The timing data for both algorithms exhibit local maxima at values of N which include many factors of 3. For example, see the timing data for N = 486, 972. These local maxima in the timing data create corresponding irregularities in DELTIM. This phenomenon is also reflected in the constants $c_2$, $c_3$, $c_4$ in the timing models. Although we have been analyzing the RO FFT, this phenomenon occurs in the complex FFT as well, and contradicts a statement made in the paper by Cooley and Tukey [3] which introduced the complex FFT. There it was asserted that the operation count for the complex FFT, when normalized by $N \log N$, attains a minimum for radix 3. This erroneous conclusion was based on oversimplified operation counts which included multiplications by $\pm 1$ and $\pm i$.

The complex pattern of DELTIM in Tables 4.5 and 4.7 is the result of superimposing the phenomena discussed in the preceding paragraphs. As an example, for N = 486 we have both an odd power of 2 and many factors of 3. Although DELTIM is a fairly complicated function of N, we generalize that for most values of N within a practical range, DELTIM is approximately 25 to 30%. Thus, for applications which make extensive use of symmetric FFTs, it is well worth the effort to implement the compact algorithms.
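The DELTIM values quoted in this section can be recomputed from the measured transform times. The Python fragment below is a small sketch that does this for a few values of N, with the (CTRAN, PTRAN) pairs copied from Table 4.5. The dictionary and function names are hypothetical.

```python
# (CTRAN, PTRAN) in seconds, copied from Table 4.5 for selected N.
measured = {
    4: (0.000105, 0.000011),
    256: (0.046786, 0.070952),
    486: (0.149356, 0.172023),
    512: (0.111392, 0.149192),
    1024: (0.240520, 0.341197),
}


def deltim(ctran, ptran):
    """Percent by which the compact transform is faster than VFFTPK;
    a negative value means the compact transform is slower."""
    return 100.0 * (ptran - ctran) / ptran


relative = {n: round(deltim(c, p), 1) for n, (c, p) in measured.items()}
```

The value for N = 4 is negative because the compact algorithm is slower there. The value for N = 512 lies below those for N = 256 and N = 1024, which is the odd power of 2 effect, and the value for N = 486 is lower still because it combines an odd power of 2 with five factors of 3.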
4.7 Automating Implementation of the RO FFT

Recall from Section 4.1 that the entire package of compact algorithms requires a significant quantity of FORTRAN code. Thus, we wish to automate as much of the implementation process as possible. We have focused our attention on the most labor-intensive steps in this process. These are obtaining the simplified form of the combine equations contained in Corollaries 4.1 through 4.3, and writing FORTRAN code for these which maps each algebraic quantity to the correct storage location as specified by the associated data storage patterns. Thus, we need a software tool which is capable of performing symbolic algebra and outputting results in FORTRAN syntax. A number of such tools are available, and we have selected Mathematica [12]. In this section, we describe how Mathematica can be used to automate the steps described above.

As a specific example, we will automate the generation of code for the radix-4 RO FFT. The subroutines which we will generate automatically are those which implement the radix-4 combine equations, namely VICSF4, VISCSF4, and VIF4. The remaining subroutines associated with the radix-4 RO FFT are relatively easy to code. Our overall strategy is to develop a FORTRAN skeleton for the subroutines which implement combine equations, generate FORTRAN code for the combine equations using Mathematica, and insert these equations into the skeleton. The FORTRAN skeleton is contained in Appendix C. Although we are focusing on the radix-4 RO FFT, this skeleton may be used for any of the compact symmetric FFTs with only minor modifications.

The command files for generating FORTRAN code for the combine equations with Mathematica are contained in Appendix D. There are three command files corresponding to the forward combine equations for ICS induced symmetries, ISCS induced symmetries, and I sequences. Although we are focusing on the radix-4 RO FFT, these command files are valid for any even value of the radix p, and could easily be modified for odd values of p. These command files are quite involved. However, they contain extensive comments, so we will not elaborate on them further here. It is assumed that the reader has a general familiarity with Mathematica.

Appendix E contains the results of inserting the Mathematica output into the FORTRAN skeleton, yielding new versions of subroutines VICSF4, VISCSF4, and VIF4. Note that the FORTRAN skeleton and any other handwritten code is in upper case, while the Mathematica output is in lower case. We emphasize that no attempt was made to manually optimize
this code. We have combined the new versions of these subroutines with the remainder of the code, and executed the tests described in Section 4.5. The results are shown in Table 4.8. This table is restricted to powers of 4, since only the radix-4 subroutines have been automated. By comparing the timing data for the handwritten and automated versions, we see that DELTIM has decreased by 5 to 10 percentage points. This decrease in performance could be recovered by manually optimizing the code. The procedures described in this section have been highly successful for automating the generation of code for the radix-4 RO FFT, and could easily be extended to generate code for the other compact symmetric FFTs.

Table 4.8: Comparison of Timing Data for Handwritten Code and Automated Code for 1024 Sequences on the IBM 3090J

                            Hand.      Auto.      Hand.    Auto.
         N     PTRAN      CTRAN      CTRAN     DELTIM   DELTIM
        16   0.002360   0.001157   0.001276     51.0     45.9
        64   0.012861   0.007822   0.008753     39.2     31.9
       256   0.070952   0.046786   0.052649     34.1     25.8
      1024   0.341197   0.240520   0.273806     29.5     19.8
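The overall strategy of this section, a handwritten skeleton plus machine-generated equations, can be illustrated with a toy generator. The Python fragment below is only a sketch and is not the Mathematica command files of Appendix D. The names `real_imag_product`, `emit_loop`, and `SKELETON`, and the variable names `ar`, `ai`, `cr`, `ci`, are hypothetical. It expands the complex product $w(a_r + i a_i)$ into real and imaginary parts, drops the multiplications when w is one of $\pm 1$, $\pm i$, and inserts the lower case result into an upper case loop skeleton.

```python
def real_imag_product(w, ar="ar", ai="ai"):
    """Return FORTRAN-style expressions (lower case, like generated code)
    for the real and imaginary parts of w*(ar + i*ai), dropping the
    multiplications when w is one of 1, -1, i, -i."""
    trivial = {
        (1, 0): (ar, ai),
        (-1, 0): ("-" + ar, "-" + ai),
        (0, 1): ("-" + ai, ar),
        (0, -1): (ai, "-" + ar),
    }
    key = (w.real, w.imag)
    if key in trivial:
        return trivial[key]
    wr, wi = repr(w.real), repr(w.imag)
    return (f"{wr}*{ar} - ({wi})*{ai}", f"{wr}*{ai} + ({wi})*{ar}")


# Upper case skeleton standing in for the handwritten FORTRAN loop.
SKELETON = """      DO 101 K=1,M
         {re_stmt}
         {im_stmt}
  101 CONTINUE"""


def emit_loop(w):
    """Insert the generated statements for the factor w into the skeleton."""
    re_expr, im_expr = real_imag_product(w)
    return SKELETON.format(re_stmt="cr = " + re_expr,
                           im_stmt="ci = " + im_expr)


radix4_loop = emit_loop(1j)                  # trivial factor: no multiplications
radix3_loop = emit_loop(complex(-0.5, 0.75))  # general factor: four multiplications
```

A symbolic algebra system performs far more simplification than this, but the division of labor is the same: the skeleton is written once by hand, and only the statements inside the loop are regenerated.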
Appendix A Eigenstructure of the Discrete Poisson Equation
In this appendix, we provide an example of the technique used to prove the eigenstructures for the discrete Poisson equation with various boundary conditions which are summarized in Tables 1.2 through 1.4. The example we will present is that for DDS boundary conditions. From Table 1.4, we see that the associated transform is the ROO FFT. Theorem 2.16 in Section 2.8 provided the real form of the DFT and IDFT associated with an ROO sequence of length $N = 2(2M+1)$. We asserted that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DDS boundary conditions. Although we did not prove this, we made it seem plausible by observing that an ROO sequence satisfies DDS boundary conditions for the computational domain $1 \le n \le M$. We now verify that the result for the IDFT is the eigenvector expansion required by the Fourier analysis method for DDS boundary conditions. In the process, we will determine the corresponding eigenvalues as well.

The matrix, of dimension M, corresponding to the discretized Poisson equation satisfying DDS boundary conditions is shown below. We hypothesize that the n'th component of the k'th eigenvector is $\sin(4\pi kn/N)$, for $1 \le n \le M$, $1 \le k \le M$. We test this hypothesis by computing the following matrix vector product:

$$
\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_{M-1} \\ r_M \end{pmatrix}
=
\begin{pmatrix}
-2 & 1 & & & 0 \\
1 & -2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2 & 1 \\
0 & & & 1 & -3
\end{pmatrix}
\begin{pmatrix}
\sin(4\pi k/N) \\ \sin(8\pi k/N) \\ \vdots \\ \sin(4\pi k(M-1)/N) \\ \sin(4\pi kM/N)
\end{pmatrix}
$$

We will compute the product for the first row, the last row, and the second row which represents a typical interior row.

$$
\begin{aligned}
r_1 &= -2\sin(4\pi k/N) + \sin(8\pi k/N) \\
    &= \sin(4\pi k/N)\,[-2 + 2\cos(4\pi k/N)] \\
    &= \sin(4\pi k/N)\,[-4\sin^2(2\pi k/N)]
\end{aligned}
$$

$$
\begin{aligned}
r_M &= \sin(4\pi k(M-1)/N) - 3\sin(4\pi kM/N) \\
    &= \sin(4\pi k(N/4-3/2)/N) - 3\sin(4\pi k(N/4-1/2)/N) \\
    &= \sin(\pi k(N-6)/N) - 3\sin(\pi k(N-2)/N) \\
    &= (-1)^{k+1}\sin(6\pi k/N) - 3(-1)^{k+1}\sin(2\pi k/N) \\
    &= (-1)^{k+1}\sin(2\pi k/N)\,[\cos(4\pi k/N) + 2\cos^2(2\pi k/N) - 3] \\
    &= \sin(4\pi kM/N)\,[\cos(4\pi k/N) - 1 + 2\cos^2(2\pi k/N) - 2] \\
    &= \sin(4\pi kM/N)\,[-4\sin^2(2\pi k/N)]
\end{aligned}
$$

$$
\begin{aligned}
r_2 &= \sin(4\pi k/N) - 2\sin(8\pi k/N) + \sin(12\pi k/N) \\
    &= \sin(8\pi k/N - 4\pi k/N) - 2\sin(8\pi k/N) + \sin(8\pi k/N + 4\pi k/N) \\
    &= 2\sin(8\pi k/N)\cos(4\pi k/N) - 2\sin(8\pi k/N) \\
    &= \sin(8\pi k/N)\,[2\cos(4\pi k/N) - 2] \\
    &= \sin(8\pi k/N)\,[-4\sin^2(2\pi k/N)]
\end{aligned}
$$

We have shown that the n'th component of the k'th eigenvector is $\sin(4\pi kn/N)$, and that the associated eigenvalue is $\lambda_k = -4\sin^2(2\pi k/N)$ for $1 \le n \le M$, $1 \le k \le M$, where $N = 2(2M+1)$.
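The result of this appendix can also be checked numerically. The Python fragment below is a sketch, with hypothetical function names. It builds the tridiagonal matrix with $-2$ on the diagonal, $1$ on the off-diagonals, and $-3$ in the last diagonal entry, and it measures how closely the vectors $\sin(4\pi kn/N)$ satisfy the eigenvalue equation with $\lambda_k = -4\sin^2(2\pi k/N)$, where $N = 2(2M+1)$.

```python
import math


def dds_matrix(M):
    """Tridiagonal matrix with -2 on the diagonal, 1 off the diagonal,
    and -3 in the last diagonal entry."""
    A = [[0.0] * M for _ in range(M)]
    for i in range(M):
        A[i][i] = -2.0
        if i > 0:
            A[i][i - 1] = 1.0
        if i < M - 1:
            A[i][i + 1] = 1.0
    A[M - 1][M - 1] = -3.0
    return A


def max_residual(M):
    """Largest entry of |A v_k - lambda_k v_k| over k = 1..M."""
    N = 2 * (2 * M + 1)
    A = dds_matrix(M)
    worst = 0.0
    for k in range(1, M + 1):
        v = [math.sin(4.0 * math.pi * k * n / N) for n in range(1, M + 1)]
        lam = -4.0 * math.sin(2.0 * math.pi * k / N) ** 2
        for i in range(M):
            r_i = sum(A[i][j] * v[j] for j in range(M))
            worst = max(worst, abs(r_i - lam * v[i]))
    return worst
```

For any M the residual returned by `max_residual(M)` should be at the level of rounding error, which covers the first row, the last row, and all interior rows at once.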
Appendix B Software for the RO FFT
c C TEST DRIVER FOR THE RO FFT c c C NOTES COJIICER!HNG PERFORMAlfCE MONITORS c C COMMENTS BEGINNING lJITH CI ARE USED FOR PERFORNANCE C MONITORING DE THE IBM 3090, WHILE THOSE BEGIIHJUG \HTH CC C ARE FOR THE CRAY YI'iP, SUBROUTINE IBMTIME PROVIDES TI11E C STAMPS ON THE IBM 3090 USING THE STCK IESTRUCTIOi, WHILE C SUBROUTINE SECOND PERFORMS A SIMILAR FUNCTION ON THE CRAY C YMP. THE FOLLOWING VARIABLES ARE USED BY SUBROUTINE C IBMTIME. c CI REA.L*S !START, !STOP c c C ALLOCATE STORAGE FOR THE PREPOSTPROCESSING ALGORITHH c c PARAMETER (M=1024,HAXLEN=1024) REAL X(1:M,1:MAXLEN/2),XT(1:M,1:MAXLEN/2) REAL WSAVE(3*(MAXLEN/21)+1S) C ALLOCJ.TE STORAGE FOR THE COMPACT ALGORITHH c c REAL Y(l :M,1 :MAXLEIU21) ,EIGElliV (1 :lUXLEN/21), IfORDS (9) INTEGER INDX(l:MAXLEN/21) COMPLEX OMEGA(0:2MAXLEN1) COMMON /VFROCOM1/ WORDS,OHEGA C PRINT COLUMN HEADINGS c WRITE(6,1) 1 FOR.MAT(lH ,1SX,'N',4X,'CH!'IT',7X,'CTRAN',7X, .t I PI NIT' ?X. 1PTRAnl'. sx. 'DELTIH' '/) c C LOOP THROUGH VALUES OF N c i = 3 1001 IF (E .GT. HAXLEN) GOTO 1002 c C CALL COMPACT INITIALIZATION c CI CALL IBf1TIIiE(TSTART) CC CINIT = SECOND() CALL VFROI(N,INDX,EIGENV,IRC) CI CALL IBMTIME(TSTOP) CI CINIT = 1, OE6 (TSTOP TSTil.RT) CC CINIT = SECOND() CINIT IF (IRC .EQ. 0) THEN c C GENERATE TEST DATA c PI = 4.0 ATAN(l.O) TPIDN = 2.0PI/N 229
PAGE 241
c IF (2(N/2) .EQ. N) THEN H5 = N/21 ELSE MS = (N1) /2 END IF DO 200 I=1,M5 DO 201 K=1,H X(K,I) = 0.0 201 CONTINUE DO 100 J=l,MS 51 = 2.0 SIN(TPIDNIJ) DO 101 K=l,M X(K,I) = X(K,I) 51 101 CONTINUE 100 CONTINUE DO 202 K=l,M Y(K,I) = X(K,I) 202 CONTINUE 200 CONTINUE C C.A.LL COMPACT ALGORITHM c CI CALL IBMTIME(T5URT) CC CTRAN = SECOND () CALL VFFRO (H, Y) CI C!LL IBMTIME(TSTOP) CI CTRAI = 1.0E6 (TSTOP TSTART) CC CTRAN = SECOND () CTRAN c C VERIFY COMPACT ALGORITHM OUTPUT c reo = o 5QRTN = 5QRT(FLOAT(N)) DO 300 I=l,HS RELERR = AB5(SQRTNAB5(Y(1,I)))/SQRTN IF (RELERR .GT. l.OE3) ICO = 1 300 CONTINUE IF (2(N/2) .BE. N) THEN c C IF ll IS ODD, THEN SET DEFAULT OUTPUT Pii.RAI1ETERS FOR PRE C POSTPROCESSING ALGORITHM c IPO = 2 PINIT 0.0 PTRAN 0.0 ELSE c C IF N IS EVEN, THEN CALL PREPOSTPROCESSIIJG ALGORITHH c CI CALL IBMTIME(TSTART) CC PII'iiiT = SECONDO CALL VSI!Il'TI01S,lrlSAVE) CI CALL IBl'ITIHE(TSTOP) CI PINIT 1. OE6 (TSTOP TST ART) CC PINIT = SECONDO PINIT 230
CI CALL IBMTIME(TSTART) CC PTRAN = SECOND () CALL VSINT(M,MS,X,XT,M,WSAVE) CI CALL IBMTIME(TSTOP) CI PTRAN l.OE6 (!STOP !START) CC PTRAlll' = SECOlli'D () PTRA.llr c C VERIFY PREPOSTPROCESSING ALGORITHM OUTPUT c c IPO = 0 DO 400 I=1,MS RELERR = ABS(SQRTN+X(1,I))/SQRTN IF (RELERR .GT. 1.0E3) IPO = 1 400 CONTINUE EJDIF C COMPUTE PERCENT DIFFERENCE IN TRANSFORM TIMES c c IF ((ICO .EQ. O) .AND. (IPO .EQ. 0)) THEN DELTIM = 100.0 (PTRANCTRAN)/PTRAN ELSE DELTIM = 0.0 END IF C OUTPUT TIMING DATA FOR I c c WRITE(6,2) N,CINIT,CTRAN,PINIT,PTRAN,DELTIN 2 FORMAT(iH ,12X,I4,2X,F10.6,2X,F10.6,2X, t F10.6,2X,F10.6,2X,F6.1) C IF VERIFICATION OF COMPACT ALGORITHM OUTPUT FAILED, THEH C OUTPUT DEBUG INFORMATION c c IF (ICO .EQ. 1) THEN VRITE(6,3) (IIDX(I),I=1,MS) 3 FORHAT(1H ,'INDX: ',128(/,416)) WRITE(6,4) (EIGENV(I),I=1,MS) 4 FORHAT(lH 'EIGEIJV: 1 ,128(/ ,4E13.4)) VRITE(6,S) (Y(1,I),I=1,MS) S FORMAT(1H ,'VFFRO OUTPUT:',128(/,4E13.4)) END IF C IF VERIFICATION OF PREPOSTPROCESSING ALGORITHH OUTPUT C FAILED, THEN OUTPUT DEBUG INFORMATION c c IF (IPO .EQ. 1) THEN WRITE(6,6) (X(1,I),I=l,MS) 6 FORMAT(1H ,'VSINT OUTPUT:',128(/,4E13 4)) END IF END IF C INCREMEE'T N (Ill SOME FASHION) AND REITERATE LOOP UIJTIL DONE c 11 = 1+1 GOTO 1001 231
 1002 CONTINUE
      END
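The verification in the driver relies on the test data having sine coefficients of equal size, so that every element of the scaled transform has magnitude SQRT(N). The sketch below illustrates this with a plain O(N**2) sine sum standing in for VFFRO. The routine name ROCHK, the fixed work array, and the sign convention of the test data are assumptions made for illustration and are not taken from the thesis code.

```fortran
      SUBROUTINE ROCHK(N, MS, ERRMAX)
C     Hypothetical helper, not part of the thesis code.  Builds a
C     driver-style test sequence, transforms it with a plain
C     O(N**2) sine sum (standing in for VFFRO), and returns the
C     largest relative deviation of the output magnitude from
C     SQRT(N).  The work array limits N to at most 1025.
      INTEGER N, MS, I, J, K
      REAL ERRMAX, PI, TPIDN, SQRTN, YK
      REAL X(512)
      PI = 4.0*ATAN(1.0)
      TPIDN = 2.0*PI/N
      SQRTN = SQRT(FLOAT(N))
C     Number of stored elements: N/2-1 (N even), (N-1)/2 (N odd).
      IF (2*(N/2) .EQ. N) THEN
         MS = N/2 - 1
      ELSE
         MS = (N-1)/2
      END IF
C     Test data: every sine coefficient has the same size.
      DO 20 I = 1, MS
         X(I) = 0.0
         DO 10 J = 1, MS
            X(I) = X(I) + 2.0*SIN(TPIDN*I*J)
   10    CONTINUE
   20 CONTINUE
C     Naive sine sum over the stored half, scaled by 1/SQRT(N).
      ERRMAX = 0.0
      DO 40 K = 1, MS
         YK = 0.0
         DO 30 I = 1, MS
            YK = YK + 2.0*X(I)*SIN(TPIDN*I*K)
   30    CONTINUE
         YK = YK/SQRTN
         ERRMAX = MAX(ERRMAX, ABS(SQRTN - ABS(YK))/SQRTN)
   40 CONTINUE
      RETURN
      END
```

Under these assumptions ERRMAX should stay well below the driver's tolerance for both even and odd N, since the discrete sine vectors are orthogonal over the stored half of the sequence.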
c C SUBROUTINE: VFROI c c C DJ.ME c C VECTORIZED FOURIER TRANSFORM FOR RO SEQUENCES, C INITIALIZATION ROUTINE c c C FUNCTION c C ALL PROCESSING WHICH DEPENDS ONLY 01 THE SEQUENCE LENGTH N c c C INPUT PARAMETERS c C N: LENGTH OF RO SYMMETRIC SEQUENCE c c C OUTPUT P A.RAMETERS c C ISDX: PERMUTED INDICES OF FORWARD TRANSFORM c C EIGENV: ASSOCIATED EIGENVALUES IN ORDER SPECIFIED BY UDX C ADD SCALED BY N c C IRC: INITIALIZATION RETURN CODE c 0 c 1 c c INITIALIZATION SUCCESSFUL N NOT A PRODUCT OF 2,3,4 OR MORE THAN 10 FACTORS C OUTPUT TO COMMON (INTERNAL USE ONLY) c C MISCELLANEOUS CONSTANTS AND POWERS OF QlLIEGA USED IN C COMBINE EQUATIONS. COMI10N AREA SIZE NUST BE ESTABLISHED C BY USER. SEE TEST DRIVER FOR DETAILS. c C LIST OF FACTORS OF N c c SUBROUTINE VFROI(N,INDX,EIGENV,IRC) INTEGER IMDX(1:1) REAL EIGENV(1:1) COMPLEX 0!tEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, t L,OMEGA INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10) COMMON /VFROCOM2/ NFAC,NE,TREE INTEGER P(3) C LIST OF FACTORS FOR FACTORIZATION OF N (ORDER IS VERY C IMPORTANT) c 233
DJ.TJ. P/4.2.3/ c C MISCELLJ.NEOUS CONSTJ.NTS c CSQRT2 = SQRT(2.0) SQRT2D2 = CSQRT2/2.0 SQRT3 = SQRT(3.0) CSQRT3 = SQRT3 SQRT3D2 = SQRT3/2.0 CSQRT3D2 = SQRT3D2 PI= 4.0+J.TAN(1.0) CTIW16 = 2.0 SIN(PI/8.0) CTIW16E3 = 2.0 SIN(3.0+PI/8.0) c C POWERS OF OMEGA c L = 2+]! OMEGA(O) = 1. 0 TPIDL = 2.0+PI/L 0MEGi(1) = CMPLX(COS(TPIDL).SIN(TPIDL)) DO 100 I=2.L1 c OMEGA(!) = OMEGA(I1)+0MEGA(1) 100 CONTINUE C FACTORIZATION OF N c c D'FAC(1) N lllFAC(2) = 0 I = 1 LS = N C llHILE ((NFAC(2) .LT. 10) .AND. c c c c (I .LE. 3) .AND. (LS .Gr. 1)) DO 1 IF ((NFAC(2) .GE. 10) .OR. l (I .GT. 3) .OR. i (LS .LE. 1)) GOTO 2 IQ = LS/P(I) IR = LS IQ+P(I) IF (IR .EQ. 0) THEN NFAC(2) = NFAC(2) + 1 &FJ.C(NFAC(2)+2) = P(I) LS = IQ ELSE I = I+1 EJJDIF C EllDDO c c GOTO 1 2 CONTINUE IF (LS .EQ. 1) THEN IRe = 0 234
c C GENERATE SPLITTING TREE c TPIDN = 2.0*PI/N IF (2(N/2) .EQ. N) THEN :MS = N/21 ELSE HS = (N1)/2 ElirDIF DO 300 !=1, MS EIGENV(I) = 0.0 DO 200 J=l,MS Sl = 2.0 J SIN(TPIDN*IJ) EIGENV(I) = EIGENV(I) Sl :wo CONTINUE 300 CONTINUE CALL VFFROI(l,EIGENV) C PERMUTED INDEX ARRAY c c DO 400 I=l,MS IF (EIGENV(I) .GT. 0) THEN IIDX(I) = EIGENV(I) + 0.1 ELSE INDX(I) = N + EIGENV(I) + 0.1 END IF 400 CONTINUE C PERMUTED SCALED EIGENVALUE ARRAY c DO 600 I=l,MS E!GENV(I) = 4.0 Ilr (SIN(PIEIGENV(I)/N))**2 590 CONTINUE c C FACTORIZATION OF N FAILED c ELSE rae = 1 Elii'DIF RETURN END 235
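The factorization step in VFROI tries the radices in the order 4, 2, 3 and stops after ten factors or when a residual other than 1 remains. A minimal sketch of that loop follows, assuming a stand-alone helper FACTN with its own factor array FAC and count NF. These names are hypothetical; the thesis code instead keeps the count in NFAC(2) and the factors from NFAC(3) onward.

```fortran
      SUBROUTINE FACTN(N, NF, FAC, IRC)
C     Hypothetical helper, not part of the thesis code.  Factors N
C     into the radices 4, 2, 3, tried in that order.
C       NF  : number of factors found (at most 10)
C       FAC : the factors, in the order they were removed
C       IRC : 0 if N was factored completely, 1 otherwise
      INTEGER N, NF, FAC(10), IRC
      INTEGER P(3), I, LS, IQ
      DATA P /4, 2, 3/
      NF = 0
      I = 1
      LS = N
C     WHILE loop written with GOTO, as in the listing.
   10 IF (NF .GE. 10 .OR. I .GT. 3 .OR. LS .LE. 1) GOTO 20
         IQ = LS/P(I)
         IF (LS - IQ*P(I) .EQ. 0) THEN
C           P(I) divides the residual: record it and reduce LS.
            NF = NF + 1
            FAC(NF) = P(I)
            LS = IQ
         ELSE
C           Move on to the next candidate radix.
            I = I + 1
         END IF
      GOTO 10
   20 CONTINUE
      IRC = 1
      IF (LS .EQ. 1) IRC = 0
      RETURN
      END
```

For N = 48 the sketch returns the factors 4, 4, 3 with IRC = 0. For N = 10 the residual 5 cannot be factored, so IRC = 1.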
C
C  SUBROUTINE:  VFFROI
C
C  NAME
C
C    VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C    INITIALIZATION ROUTINE
C
C  FUNCTION
C
C    THIS SUBROUTINE SUPERVISES THE FORWARD TRANSFORM AND
C    GENERATION OF THE SPLITTING TREE BY PROCESSING THE LIST OF
C    FACTORS OF THE SEQUENCE LENGTH N IN FORWARD ORDER.
C
C  INPUT PARAMETERS
C
C    M: NUMBER OF SEQUENCES TO TRANSFORM
C
C    X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C       FIRST HALF OF A RO SEQUENCE OF LENGTH N (ELEMENTS 1
C       THROUGH N/2-1 IF N IS EVEN, OR ELEMENTS 1 THROUGH (N-1)/2
C       IF N IS ODD; ELEMENTS 0 AND N/2 ARE NOT INCLUDED BECAUSE
C       THEY ARE ZERO)
C
C  OUTPUT PARAMETERS
C
C    X: FORWARD TRANSFORM IN PERMUTED ORDER, SCALED BY 1/N
C
C  OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C    INITIAL SPLITTING TREE ENTRY FOR FIRST FACTOR OF N
C
      SUBROUTINE VFFROI(M,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
      LS = NFAC(1)
      NE(1) = 1
      TREE(1,1,1) = 1
      TREE(2,1,1) = 1
      DO 100 I=1,NFAC(2)
         IP2 = I+2
C        WRITE(6,1001) NFAC(IP2)
 1001    FORMAT(1H ,'PROCESSING FACTOR ',I1)
         GOTO (1,2,3,4),NFAC(IP2)
    1    CONTINUE
    2    CONTINUE
            CALL VFFRO2I(M,LS,I,X)
            GOTO 99
    3    CONTINUE
            CALL VFFRO3I(M,LS,I,X)
            GOTO 99
    4    CONTINUE
            CALL VFFRO4I(M,LS,I,X)
   99    CONTINUE
         LS = LS/NFAC(IP2)
  100 CONTINUE
C
C  SCALING IS REQUIRED IN INITIALIZATION PROCESSING FOR
C  COMPUTING THE PERMUTED INDEX AND EIGENVALUE ARRAYS.
C
      SCALE = 1.0/NFAC(1)
      IF (2*(NFAC(1)/2) .EQ. NFAC(1)) THEN
         MS = NFAC(1)/2-1
      ELSE
         MS = (NFAC(1)-1)/2
      ENDIF
      DO 200 I=1,MS
         DO 201 J=1,M
            X(J,I) = X(J,I)*SCALE
  201    CONTINUE
  200 CONTINUE
      RETURN
      END
c C SUBROUTINE: VFFR04I c c C i.A.ME c C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES, C RADIX4 INITIALIZATION ROUTINE c c C FUNCTION c C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED C BY AH INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION C OF THE FORWARD COMBINE EQUATIDrl"S FOR RADIX4. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES YHICH REFLECT THE CHANGES C MADE BY THIS RADIX4 SPLITTING. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURREin C STATE OF THE DATA IN THE ARRAY X (THAT IS lli'I.E lJO'.t C PROCESSING FACTOR NUMBER !C OF THE SEQUENCE LENGTH N) c C X: TiD DIMENSIONAL ARRAY, EACH ROY OF YHICH C IRTERHEDIATE RESULTS AS SPECIFIED BY THE SPLI7TING TREE C ENTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FOR'IJARD COMBINE EQUATIDrJS FOR RADIX4 c c C OUTPUT TO COMMON (UTERNAL USE ONLY) c C NElJ SPLITTING TREE ENTRIES WHICH REFLECT THE CHAIIIGES MADE C BY THIS RADIX4 SPLITTING c SUBROUTINE VFFR04I 0!,LS, IC, X) REAL X(l:M,l:l) INTEGER YFAC(1:12),NE(1:10),TREE(1:2,1:4,1 10) COMJ'!ON /VFROCOM2/ NFAC ,liTE, TREE LSD2 = LS/2 IX = 1 IN = IC+1 NE(IN) = 0 DO 1000 I=l,NE(IC) C VRITE(6,1001) TREE(1,I,IC),TREE(2,I,IC) 1001 F0RMAT(1H ,'SPLITTING TREE ENTRY= 1,2I5) 238
I c GOTO (100,200,300),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CALL VICSF4(M,LS,X(1,IX)) IX = IX+LSD21 IF (LS .LT. 8) THEN IF ((Il!E(U') .GE. 1) .AND .t (TREE(i,NE(IN) ,IE) ,EQ. 3)) THEn TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(Ii) + 1 TREE(l,NE(IN),IE) = 3 TREE(2,NE(IN),IM) = 1 END IF ELSEIF (LS .EQ. 8) THEN IF ((HE(IIi) .GE. 1) .AND. t (TREE(i,lJE(UJ),IN) .EQ. 2)) THEN 200 TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN) ,IN) 2 TREE(2,NE(IM),IN) 1 END IF NE(IN) = NE(IN) + TREE(1,NE(IN),IN) 3 TREE(2,NE(IN),IN) ELSE IF ((NE(IN) .GE. 1) .AND. i: (TR.EE(l,NE(IIrl) ,IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) TREE(2,NE(IN),IN) ElliDIF NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 2 TREE(2,NE(IN),IN) 1 NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IU) 1 END IF GOTO 1000 CONTIIWE C ISCS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c CALL VISCSF4(H,LS,X(1,IX)) IX = IX+LSD2 IF ((NE(IN) .GE. 1) .A.JW. t (TREE(1,NE(IN),IN) .EQ. 3)) THEN TREE(2,ll!E(IE),IN) = TREE(2,NE(IN),IN) + 2 ELSE 239
BE(IN) = NE(IN) + TREE(1,NE(IN),IN) 3 TREE(2,NE(IN) ,IN) 2 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,1 CALL VIF4(M,LS,X(1,IX)) IX = IX+LS IF ((liE(IIO .GE. 1) .AUD. & (TREE(1,NE(IN),IN) .EQ. 3)) THEN TREE(2,NE(IJJ) ,IN) TREE(2,NE(Illr) ,IN) + 4 ELSE IE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IN) 4 END IF 301 CONTINUE 1000 CONTINUE RETURN EHD 240
c C SUBROUTINE: VFFR02I c c C NAME c C VECTOR! ZED FORWARD FOURIER TR.ArJSFORH FOR RO SEQUENCES, C RADIX2 INITIALIZATION ROUTINE c c C FUIIICTIOII c C THIS SUBROUTIIJE USES THE SPLITTING TREE ENTRIES SPECIFIED C BY AN INPUT PARAl'IETER (IC) TO SUPERVISE THE APPLICATIDrf C OF THE FORWARD COI1BINE EQUATIONS FOR RADIX2. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES WHICH REFLECT THE CHANGES C HADE BY THIS RADIX2 SPLITTING. c c C INPUT PARAMETERS c C M: l'lUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: UDEX INTO SPLITTING TREE '1JHICH SPECIFIES THE CURRENT C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE YOU C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH 10 c C X: TilO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAII[S C INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTIIW TREE C ENTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX2 c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C lEW SPLITTING TREE ENTRIES WHICH REFLECT THE Cli.ANGES MADE C BY THIS RADIX2 SPLITTING c SUBROUTINE VFFR02I(M,LS,IC,X) REAL X(1:M,1:1) INTEGER EFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10) COMMON /VFROCOM2/ NFAC,NE,TREE LSD2 = LS/2 IX = 1 IN = IC+1 EE(IN) = 0 DO 1000 I=1,NE(IC) C WRITE(6,1001) TREE(l,I,IC) 1001 FORMAT(1H 1SPLITTING TREE ENTRY = 1 241
I c GOTO (100,200,300),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CALL VICSF2(M,LS,X(1,IX)) IX = IX+LSD21 IF ((NE(IN) .GE. 1) .AND. t (TftEE(l,NE(II),IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 1 TREE(2,NE(IN),IN) 1 END IF NE(IN) = NE(IN) + 1 TREE(l,l'lE(Ilii'),IN) 2 TREE(2,NE(IN),IN) = 1 GOTO 1000 200 CONTINUE C ISCS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c CALL VISCSF2(M,LS,X(1,IX)) IX = IX+LSD2 IF ((NE(IN) .GE. 1) . ND. i (TREE(l,NE(IN) ,IN) .EQ. 3)) TEEm TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE IE(IN) = NE(IN) + 1 TREE(l,NE(IN),IN) 3 TREE(2,NE(IN),IN) = 1 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,1 CALL VIF2(H,LS,X(1,IX)) IX = IX+LS IF ((NE(II) .GE. 1) .AND. t (TREE(i,RE(Illl') ,nl') .EQ. 3)) THEN TREE(2,]!'E(I!J) ,BJ) = TREE(2,fJE(IN) ,I!J) + 2 ELSE NE(Illl') = lli'E(Illl') + 1 TREE(1,NE(IN) ,IN) 3 TREE(2,NE(IN),IN) 2 ENDIF 301 CONTINUE 1000 CONTINUE RETURN END 242
c C SUBROUTINE: VFFR03I c c C NAME c C VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES, C RADIX3 INITIALIZATION ROUTINE c c C FUNCTION c C THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES 3PECIFIED C BY AN IflPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION C OF THE FORWARD COMBIIE EQUATIONS FOR RADIX3. IT ALSO C ADDS NEW SPLITTING TREE ENTRIES YHICH REFLECT CHANGES C HlDE BY THIS RADIX3 SPLITTING. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT C STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW C PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LErJGTH ll!) c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH C IITERMEDIATE RESULTS AS SPECIFIED BY THE TREE C EBTRIES CORRESPONDING TO IC c c C OUTPUT PARAMETERS c C X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX3 c c C OUTPUT TO COHHON (INTERNAL USE ONLY) c C JJEW SPLITTING TREE ENTRIES WHICH REFLECT THE CHANGES NADE C BY THIS RADIX3 SPLITTING c SUBROUTINE VFFR03I(M,LS,IC,X) REAL X(1:M,1:1) INTEGER NFAC(1:12),ME(1:10),TREE(1:2,1:4,1 10) COMMON /VFROCOM2/ EFAC,NE,TREE LSM1D2 = (LS1)/2 IX = 1 IE' = IC+l NE(IE) = 0 DO 1000 I:l,NE(IC) C WRITE(6,1001) TREE(1,I,IC),TREE(2,I,IC) 1001 FORHAT(1H ,'SPLITTING TREE ENTRY= ',2I5) 243
c GOTO (100,200,300,400),TREE(1,I,IC) 100 CONTINUE C ICS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c c CiLL VICSF3(M,LS,X(1,IX)) IX = IX+LSM102 IF (LS .LT. 6) THEN IF ((NE(IliJ) .GE. 1) .AND. & (TREE(1,NE(IN),IN) .EQ. 4)) THEN TREE(2,NE(IN) ,IN) = TREE(2,1l!E(IN) ,HI) + 1 ELSE NE(IM) = NE(IN) + 1 TREE(1,NE(IN),I]J) 4 TREE(2,NE(IN),IN) = 1 END IF ELSE IF ((NE(IN) .GE. 1) .AND. t (TREE(1,NE(IN),IN) .EQ. 1)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN),IN) TREE(2,NE(IN),IN) = EJI'DIF NE(IN) = NE(IN) + 1 TREE(1,NE(IN),IN) = 4 TREE(2,NE(IN),IN) = 1 END IF GOTO 1000 200 CONTINUE C ISCS SYMMETRY OCCURS AT MOST ONCE NO LOOP REQUIRED c CALL VISCSF3(M,LS,X(1,IX)) IX = IX+LSM1D2 IF (LS .LT. 6) THEN IF ((NE(IN) .GE. 1) .AND. l (TREE(1,NE(IJ0,IN) .EQ. 4)) THEN TREE(2,NE(IN),IN) = TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = NE(IN) + 1 TREE(1,llrE(IN),IN) 4 TREE(2,NE(IN),IN) = 1 END IF ELSE IF ((NE(IN) .GE. 1) .AND. i: (TREE(l,NE(IN) ,IN) .EQ. 4)) THEN TREE(2,NE(IN),IN): TREE(2,NE(IN),IN) + 1 ELSE NE(IN) = .E(IN) + 1 TREE(l,NE';(IN),IN) = 4 TREE(2,NE'(IN),IN) = 1 END IF NE(IN) = NE(IN) + 1 244
TREE(1,NE(IN),IN) 2 TREE(2,NE(IN),IE) 1 END IF GOTO 1000 300 CONTINUE DO 301 J=TREE(2,I,IC),1,1 CALL VIF3(M,LS,X(l,IX)) IX = IX+LS IF ((NE(IN) .GE. 1) .AND. (TREE(l,NE(IN),IN) .EQ. 3)) THEN TREE(2,NE(IN),Ii) TREE(2,NE(IN),IN) + 3 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN),Ii) 3 TREE(2,NE(IN),IE) 3 END IF 301 CONTINUE GOTO 1000 40o comrnrUE DO 401 J=TREE(2,I,IC),1,1 CALL VI2F3(M,LS,XC1,IX)) IX = IX+LS IF ((NE(II) .GE. 1) .AND. t (TREE(1,EE(IM),IN) .EQ. 4)) THEN TREE(2,NE(IN),IN) TREE(2,NE(IN),IN) + 3 ELSE NE(IN) = NE(IN) + 1 TREE(1,NE(IN) ,IN) 4 TREE(2,NE(IJJ) ,IN) 3 END IF 401 CONTINUE 1000 COE"TiliJUE RETURN END 245
C
C  SUBROUTINE:  VFFRO
C
C  NAME
C
C    VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES
C
C  FUNCTION
C
C    THIS SUBROUTINE SUPERVISES THE FORWARD TRANSFORM BY
C    PROCESSING THE LIST OF FACTORS OF THE SEQUENCE LENGTH N IN
C    FORWARD ORDER.
C
C  INPUT PARAMETERS
C
C    M: NUMBER OF SEQUENCES TO TRANSFORM
C
C    X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C       FIRST HALF OF A RO SEQUENCE OF LENGTH N (ELEMENTS 1
C       THROUGH N/2-1 IF N IS EVEN, OR ELEMENTS 1 THROUGH (N-1)/2
C       IF N IS ODD; ELEMENTS 0 AND N/2 ARE NOT INCLUDED BECAUSE
C       THEY ARE ZERO)
C
C  OUTPUT PARAMETERS
C
C    X: FORWARD TRANSFORM IN PERMUTED ORDER, SCALED BY 1/SQRT(N).
C       SCALING SHOULD BE DELETED IN THE FINAL VERSION OF THIS
C       CODE.  IT IS INCLUDED ONLY FOR PERFORMANCE COMPARISONS
C       WITH VFFTPK.
C
C  OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C    NONE
C
      SUBROUTINE VFFRO(M,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
      LS = NFAC(1)
      DO 100 I=1,NFAC(2)
         IP2 = I+2
         GOTO (1,2,3,4),NFAC(IP2)
    1    CONTINUE
    2    CONTINUE
            CALL VFFRO2(M,LS,I,X)
            GOTO 99
    3    CONTINUE
            CALL VFFRO3(M,LS,I,X)
            GOTO 99
    4    CONTINUE
            CALL VFFRO4(M,LS,I,X)
   99    CONTINUE
         LS = LS/NFAC(IP2)
  100 CONTINUE
C
C  SCALING SHOULD BE DELETED IN THE FINAL VERSION OF THIS
C  CODE.  IT IS INCLUDED ONLY FOR PERFORMANCE COMPARISONS WITH
C  VFFTPK.
C
      SCALE = 1.0/SQRT(FLOAT(NFAC(1)))
      IF (2*(NFAC(1)/2) .EQ. NFAC(1)) THEN
         MS = NFAC(1)/2-1
      ELSE
         MS = (NFAC(1)-1)/2
      ENDIF
      DO 200 I=1,MS
         DO 201 J=1,M
            X(J,I) = X(J,I)*SCALE
  201    CONTINUE
  200 CONTINUE
      RETURN
      END
C
C  SUBROUTINE:  VFFRO4
C
C  NAME
C
C    VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C    RADIX-4 SUPERVISOR ROUTINE
C
C  FUNCTION
C
C    THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C    BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C    THE FORWARD COMBINE EQUATIONS FOR RADIX-4.
C
C  INPUT PARAMETERS
C
C    M: NUMBER OF SEQUENCES TO TRANSFORM
C
C    LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C    IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C        STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C        PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C    X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C       INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C       ENTRIES CORRESPONDING TO IC
C
C  OUTPUT PARAMETERS
C
C    X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-4
C
C  OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C    NONE
C
      SUBROUTINE VFFRO4(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSD2 = LS/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300),TREE(1,I,IC)
  100    CONTINUE
C
C  ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF4(M,LS,X(1,IX))
            IX = IX+LSD2-1
            GOTO 1000
  200    CONTINUE
C
C  ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF4(M,LS,X(1,IX))
            IX = IX+LSD2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF4(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
 1000 CONTINUE
      RETURN
      END
C
C  SUBROUTINE:  VFFRO2
C
C  NAME
C
C    VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C    RADIX-2 SUPERVISOR ROUTINE
C
C  FUNCTION
C
C    THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C    BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C    THE FORWARD COMBINE EQUATIONS FOR RADIX-2.
C
C  INPUT PARAMETERS
C
C    M: NUMBER OF SEQUENCES TO TRANSFORM
C
C    LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C    IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C        STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C        PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C    X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C       INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C       ENTRIES CORRESPONDING TO IC
C
C  OUTPUT PARAMETERS
C
C    X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-2
C
C  OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C    NONE
C
      SUBROUTINE VFFRO2(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSD2 = LS/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300),TREE(1,I,IC)
  100    CONTINUE
C
C  ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF2(M,LS,X(1,IX))
            IX = IX+LSD2-1
            GOTO 1000
  200    CONTINUE
C
C  ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF2(M,LS,X(1,IX))
            IX = IX+LSD2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF2(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
 1000 CONTINUE
      RETURN
      END
C
C  SUBROUTINE:  VFFRO3
C
C  NAME
C
C    VECTORIZED FORWARD FOURIER TRANSFORM FOR RO SEQUENCES,
C    RADIX-3 SUPERVISOR ROUTINE
C
C  FUNCTION
C
C    THIS SUBROUTINE USES THE SPLITTING TREE ENTRIES SPECIFIED
C    BY AN INPUT PARAMETER (IC) TO SUPERVISE THE APPLICATION OF
C    THE FORWARD COMBINE EQUATIONS FOR RADIX-3.
C
C  INPUT PARAMETERS
C
C    M: NUMBER OF SEQUENCES TO TRANSFORM
C
C    LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C    IC: INDEX INTO SPLITTING TREE WHICH SPECIFIES THE CURRENT
C        STATE OF THE DATA IN THE ARRAY X (THAT IS, WE ARE NOW
C        PROCESSING FACTOR NUMBER IC OF THE SEQUENCE LENGTH N)
C
C    X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS
C       INTERMEDIATE RESULTS AS SPECIFIED BY THE SPLITTING TREE
C       ENTRIES CORRESPONDING TO IC
C
C  OUTPUT PARAMETERS
C
C    X: UPDATED BY FORWARD COMBINE EQUATIONS FOR RADIX-3
C
C  OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C    NONE
C
      SUBROUTINE VFFRO3(M,LS,IC,X)
      REAL X(1:M,1:1)
      INTEGER NFAC(1:12),NE(1:10),TREE(1:2,1:4,1:10)
      COMMON /VFROCOM2/ NFAC,NE,TREE
C
      LSM1D2 = (LS-1)/2
      IX = 1
      DO 1000 I=1,NE(IC)
         GOTO (100,200,300,400),TREE(1,I,IC)
  100    CONTINUE
C
C  ICS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VICSF3(M,LS,X(1,IX))
            IX = IX+LSM1D2
            GOTO 1000
  200    CONTINUE
C
C  ISCS SYMMETRY OCCURS AT MOST ONCE - NO LOOP REQUIRED
C
            CALL VISCSF3(M,LS,X(1,IX))
            IX = IX+LSM1D2
            GOTO 1000
  300    CONTINUE
            DO 301 J=TREE(2,I,IC),1,-1
               CALL VIF3(M,LS,X(1,IX))
               IX = IX+LS
  301       CONTINUE
            GOTO 1000
  400    CONTINUE
            DO 401 J=TREE(2,I,IC),1,-1
               CALL VI2F3(M,LS,X(1,IX))
               IX = IX+LS
  401       CONTINUE
 1000 CONTINUE
      RETURN
      END
c C SUBROUTINE: VICSF4 c c C DAME c C VECTORIZED ICS INDUCED SYMMETRIES FORWARD FOR C RADIX4 c c C FUNCTIOE' c C THIS SUBROUTINE EXECUTES THE RJI.DIX4 FORWARD CGr1BINE C EQUATIONS FOR ICS INDUCED SYNMETRIES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TTJO DIHENSIDiililL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX4 FORRARD COMBINE EQUATIONS FOR ICS C INDUCED SYMMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C NOME c SUBROUTIDE VICSF4(M,LS,X) REAL X(l:M,l:LS/21) COMPLEX OMEGA(O:O) COMMON /VFROCOMl/ CSQRT2,SQRT2D2, LSD4 = LS/4 DO 1 J=l,M SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, L,OMEGA X(J,LSD4) = (2.0) X(J,LSD4) 1 CONTINUE IF (8(15/8) .EQ. LS) THEN LSDS = LS/8 I3LSD8 = 3LSD8 DO 2 J=1,M V1 = CSQRT2 (X(J,LSD8) + X(J,I3LSD8)) X(J,LSD8) = 2.0 (X(J,LSD8) X(J,I3LSD8)) X(J,I3LSD8) = V1 2 CONTINUE 254
T c ......... HS = LSDB1 ELSE MS = (LS4)/8 END IF IF (LS .GT. 8) THEE LSD2 = LS/2 LDLS = L/LS DO 100 I=l,MS LSD4MI = LSD4I LSD4PI = LSD4+I LSD2Ml = LSD2I ILDLS = ILDLS 51 = REAL(DHEGA(ILDLS)) 52 = AIMAG(OMEGA(ILDLS)) DO 101 J=1,M Vl = X(J,I) + X(J,LSD2MI) V2 = X(J,I) X(J,LSD2MI) V3 = X(J,LSD4PI) X(J,LSD4MI) V4 = X(J,LSD4PI) X(J,LSD4MI) X(J,I) = V2 + V3 X(J,LSD4MI) = V2 V3 C Ct = COIJG((S1,S2)) CHPLX(V1,V4) C X(J,LSD4PI) = AIMAG(C1) C X(J,LSD2MI) = REAL(C1) c X(J,LSD4PI) = S1V4 S2*V1 X(J,LSD2MI) = StVt + S2V4 101 CONTINUE 100 CONTINUE EIDIF RETURN END 255
c C SUBROUTINE: VISCSF4 c c C NAME c C VECTORIZED ISCS INDUCED SYNI>IETRIES FORWARD COHBnED FOR C li.A.DIX4 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX4 FORWARD CGNB:i:NE C EQUATIONS FOR ISCS INDUCED SYIDiETRIES. c c C INPUT PARAMETERS c C H: NUMBER OF SEQUENCES TO TRANSFORJ{ c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX4 FORWARD COMBINE EQUATIONS FOR ISCS C INDUCED SYMMETRIES c c C OUTPUT TO COHMOi (INTERNAL USE ONLY) c C NONE c SUBROUTINE VISCSF4(M,LS,X) REAL X(l:M,O:LS/21) COMPLEX OMEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, LSD4 = LS/4 DO 1 J=1,M L,OI1EGA Vi= CSQRT2 X(J,LSD4) V2 = Vl + X(J,O) X(J,O) = Vl X(J,O) X(J,LSD4) = V2 i COE!TINUE IF (B(LS/8) .EQ. LS) THEN LSDB = LS/8 I3LSD8 = 3LSD8 DO 2 J=l,H Vi = CTIW1SX(J,I3LSD8) + CTIWi6E3X(J,LSD8) 256
X(J,I3LSD8) = CTIW16E3*X(J,I3LSD8) CTIW16Y(J,LSD8) X(J,LSDS) =Vi 2 CONTINUE MS = LSDB1 ELSE MS = (LS4)/8 END IF IF (LS .GT. 8) THEN LSD2 = LS/2 LD2LS = L/(2LS) DO 100 I=l,MS LSD4MI = LSD4 I LSD4PI = LSD4 + I LSD2MI = LSD2 I ILD2LS = ILD2LS I3ILD2LS = 3ILD2LS 51 REAL(OMEGA(ILD2LS)) 82 AIMAG(OMEGA(ILD2LS)) 53 REAL(OHEGA(I3ILD2LS)) 54 AIMAG(OMEGA(I3ILD2LS)) DO 101 J=1,M V1 = SQRT2D2 (X(J,LSD4HI) + X(J,LSD4PI)) V2 SQRT2D2 (X(J,LSD4MI) X(J,LSD4PI)) V3 V1 X(J ,I) V4 V1 + X(J ,I) c VS V2 X(J,LSD2MI) V6 = V2 X(J,LSD2MI) C Cl = CONJG((Sl,S2)) CMPLX(V6,V4) C X(J,I) = AIMAG(Cl) C X(J,LSD4MI) = REAL(C1) c X(J,I) = S1V4 S2V6 l(J,LSD4MI) = S1*V6 + S2V4 c C C2 = CONJG((S3,S4)) CMPLX(VS,V3) C X(J,LSD4PI) AIMAG(C2) C l(J,LSD2MI) = REiL(C2) c X(J,LSD4PI) X(J,LSD2MI) 101 CONTINUE 100 CONTINUE END IF RETURN END S3*V3 S4VS S3VS + S4V3 257
c C SUBROUTINE: VIF4 c c C IU.HE c C VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADII4 c c C FUJJCTIOM c C THIS SUBROUTINE EXECUTES THE RADIX4 FORWARD COMBnJE C EQUATIONS FOR I SEQUENCES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TVO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF U I SYmiETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX4 FORWARD COMBINE EQUATIONS FOR I C SEQUENCES c c C OUTPUT TO COMI'!:ON (IIrTERN.li.L USE ONLY) c C NOllE c SUBROUTINE VIF4(M,LS,X) REAL X(l:M,O:LS1) COMPLEX OMEGA(O:O) COMMOE /VFROCOM1/ CSQRT2,SQRT2D2, i SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, i: CTIW16,CTHJ16E3, & L,ONEGA LSD4 = LS/4 LSD2 = LS/2 I3LSD4 = 3LSD4 DO 1 J=l,H Vi X(J,O) + X(J,LSD2) V2 X(J,O) X(J,LSD2) V3 2.0*X(J,LSD4) V4 2.0*X(J,I3LSD4) X(J,O) = V1 + V3 X(J,LSD4) = V2V4 X(J ,LSD2) = V1 V3 X(J,I3LSD4) = V2 + V4 1 CONTIJIIUE 258
2 c IF (8(15/8) .EQ. LS) THEN LSD8 = LS/8 I3LSD8 3LSD8 I5LSD8 = 5LSD8 I7LSD8 = 7LSD8 DO 2 J=l,M Vl = X(J,LSDS) + X(J,I3LSD8) V2 X(J,LSD8) X(J,I3LSD8) V3 = X(J,I5LSD8) + X(J,I7L5D8) V4 = X(J,I5LSD8) X(J,I7LSD8) X(J,L5D8) = 2.0 Vl X(J,I3L5D8) = C5QRT2 (V3 V2) X(J,I5L5D8) 2.0 V4 X(J,I7LSD8) = C5QRT2 (V3 + V2) CONTINUE MS = 15081 ELSE MS = (154)/8 END IF IF (LS .GT. 8) THEN LDL5 = 1/LS DO 100 I=l,MS L5D4MI = LSD4 I L5D4PI = LSD4 + I LSD2MI = 1502 I LSD2PI = LSD2 + I I3LSD4MI = I3LSD4 I I3LSD4PI = I3LSD4 + I LSMI=1SI ILDLS = ILDLS I2ILDLS = 2*ILDLS I3ILDLS = 3ILDLS 51 = REAL(OMEGA(ILDLS)) 52 = AIMAG(OMEGA(ILDLS)) 53= REAL(OMEGA(I2ILDLS)) 54 = AIMAG(OMEGA(I2ILDLS)) 55 = REAL(OMEGA(I3ILDLS)) 56 = A!MAG(OMEGA(I3ILDLS)) DO 101 J=l,H Vl = X(J,I) + X(J,LSD2MI) V2 = X(J,I) X(J,LSD2MI) V3 = X(J,LSD4PI) + X(J,LSD4MI) V4 X(J,LSD4PI) X(J,LSD4MI) VS = X(J,I3LSD4MI) + X(J,I3LSD4PI) V6 = X(J,I3LSD4MI) X(J,I3LSD4PI) V7 = X(J,LSMI) + X(J,LSD2PI) VB= X(J,LSMI) XCJ,LSD2PI) C Cl = CMPLX(V8+V6,V1+V3) C X(J,I) = AIMAG(Cl) C X(J,LSD4MI) = REAL(Cl) c c X(J,I) = Vl + V3 X(J,LSD4MI) VS + V6 259
C C2 = CONJG((S1,52)) CMPLX(V7+V4.V2V5) C X(J ,LSD4PI) .A.I11AG(C2) C X(J,LSD2HI) = REAL(C2) c c V9 = V7 + V4 V10 = V2 VS X(J,LSD4PI) S1V10S2V9 X(J ,LSD2IU) = S1V9 + S2*V10 C C3 = CONJG((S3,S4)) CMPLX(V8V6,V1V3) C X(J,LSD2PI) = AIHAG(C3) C X(J,I3LSD4HI) = REAL(C3) c c V9 = V8 V6 V10 = Vi V3 X(J,LSD2PI) = S3V10S4*V9 X(J,I3LSD4MI) S3V9 + S4V10 C C4 = COJJJG((S5,S6)) CNPLX(V7V4, V2+VS) C X(J,I3LSD4PI) = AIHAG(C4) C X(J,LSMI) = REAL(C4) c V9 = V7 V4 V10 = V2 + VS X(J,I3LSD4PI) = SS*V10S6V9 X(J,LSHI) = SSV9 + S6V10 101 CONTINUE 100 CONTINUE END IF RETURN END 260
c C SUBROUTINE: VICSF2 c c C E'.A.ME c C VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED FOR C RADIX2 c c C FUE'CTION c C THIS SUBROUTINE EXECUTES THE RADIX2 FORWARD COMBINE C EQUATIONS FOR ICS INDUCED SYI
c C SUBROUTINE: VISCSF2 c c C UME c C VECTORIZED ISCS INDUCED SYMMETRIES FORlrJARD CONBII!ED FOR C RJ.DIX2 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX2 FORWARD COMBINE C EQUATIONS FOR ISCS INDUCED SYMMETRIES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRJI.NSFOIUI c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARAMETERS c C X: UPDATED BY RJI.DIX2 FORWARD COMBINE EQUATIONS FOR ISCS C INDUCED SYMMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C DONE c SUBROUTINE VISCSF2(M,LS,X) REAL X(l:M,O:LS/21) COMPLEX OMEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, & SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, DO 1 J=l,M X(J,O) =X(J,O) CTIW16,CTIM16E3, L,OMEGA 1 COflTUJUE IF (LS .GT. 4) THEN LSD2 = LS/2 LD2LS = L/(2*LS) DO 100 I=1.(LS2)/4 LSD2MI = LSD2I ILD2LS = I*LD2LS S1 = REAL(OHEGA(ILD2LS)) 52= .II.HU.G(OHEG.A(ILD2LS)) DO 101 J=1,M 262
c C Cl = CONJG((S1,S2)) CMPLX(X(J,LSD2MI),X(J,I)) C X(J,I) = AIMAG(Cl) C X(J,LSD2MI) = REAL(Cl) c Vl = S2X(J,I) S1XCJ,LSD2NI) X(J,I) = SiX(J,I) + S2XCJ,LSD2MI) X(J,LSD2MI) =Vi 101 CONTINUE 100 CONTINUE END IF RETUR.E END 263
I I I c C SUBROUTINE: VIF2 c c C NAME c C VECTORIZED I SEQUEillCES FORYJl.RD CONBINED FOR RADIX2 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX2 FORWARD COMB!llE C EQUATIONS FOR I SEQUENCES. c c C Il!lPUT P.ARll!ETERS c C H: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PAR.AI'ffiTERS c C X: UPDATED BY RADIX2 FORWARD COMBINE EQUATIONS FOR I C SEQUENCES c c C OUTPUT TO COMMON (UTERl\!Jl.L USE ONLY) c C NONE c SUBROUTINE VIF2(M,LS,X) REAL X(l:M,O:LS1) COMPLEX OMEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, i SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, & CTIW18,CTHF16E3, i 1, OJliEGA LSD2 = LS/2 DO 1 J=l,M Vi= X(J,O) X(J,LSD2) X(J,O) = X(J,O) + X(J,LSD2) XCJ,LSD2) = V1 1 CONTINUE IF (LS .GT. 4) THEE LDLS = 1/LS DO 100 I=i,(LS2)/4 LSD2MI = LSD2! LSD2PI = LSD2+I LSIH = LSI ILDLS = I*LDLS 264
51 = REAL(OMEGA(ILDL5)) 52 = AIMAG(OMEGA(ILDLS)) DO 101 J=i,M Vi X(J,I) + X(J,LSD2MI) V2 = X(J,I) X(J,LSD2MI) c V3 = X(J,LS!U) + X(J,LSD2PI) V4 = X(J,LSMI) X(J,LSD2PI) C Cl = CMPLX(V4,V1) C X(J,I) = AIMAG(C1) C X(J,LSD2MI) = REAL(C1) c c X(J,I) = Vl X(J,LSD2IU) = V4 C C2 = CONJG((S1,S2)) CMPLX(V3,V2) C X(J ,LSD2PI) = AiriAG(C2) C X(J,LSMI) = REAL(C2) c X(J,LSD2PI) X(J ,LSHI) 101 CONTINUE 100 CONTINUE END IF RETURN END = Si*V2 S2*V3 S1*V3 + S2*V2 265
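The combine routines keep each complex form in a comment, for example C2 = CONJG((S1,S2)) CMPLX(V3,V2), and then assign its real and imaginary parts through explicit real arithmetic. These comments appear to denote the product of a conjugated power of OMEGA with a complex pair. The sketch below writes out that expansion once, assuming a stand-alone helper named CJMUL; the helper and its argument names are hypothetical and do not occur in the thesis code.

```fortran
      SUBROUTINE CJMUL(S1, S2, A, B, RE, AIM)
C     Hypothetical helper, not part of the thesis code.  Returns
C     the real part RE and imaginary part AIM of
C        CONJG(CMPLX(S1,S2)) * CMPLX(A,B)
C     using real arithmetic only:
C        (S1 - i*S2)*(A + i*B) = (S1*A + S2*B) + i*(S1*B - S2*A)
      REAL S1, S2, A, B, RE, AIM
      RE  = S1*A + S2*B
      AIM = S1*B - S2*A
      RETURN
      END
```

With S1 = REAL(OMEGA(k)) and S2 = AIMAG(OMEGA(k)), each product costs four real multiplications and two additions, and no COMPLEX temporaries are needed inside the vectorized inner loop over J.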
c C SUBROUTINE: VICSF3 c c C NAME c C VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED FOR C RADIX3 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX3 FORWARD CONBIIJE C EQUUIONS FOR ICS INDUCED SYNHETRIES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRANSFORM c C LS: LENGTH OF SUBSEQUENCES BEIIW SPLIT c C X: TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH L5 c c C OUTPUT PARAMETERS c C X: UPDATED BY RADIX3 FORYARD COMBINE EQUATIONS FOR ICS C INDUCED SY!iMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C NONE c SUBROUTINE VICSF3(M,LS,X) REAL X(1:M,1:(LS1)/2) COMPLEX OMEGA(O:O) CaMMON /VFROCOM1/ CSQRT2,SQRT202, SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, CTIW16,CTIW16E3, LSD3 = LS/3 DO 1 J=1,H L,ONEGA X(J,LS03) = CSQRT3 + X(J,LSD3) 1 CONTINUE IF (LS .GT. 6) THEN LDLS = L/LS DO 100 I=1.(LS3)/6 LSD3MI = LSD3I LSD3PI = LSD3+I ILDLS = I+LDLS 51 REAL(OMEGA(ILDLS)) 52 = AIMAG(OMEGA(ILDLS)) 266
c DO 101 J=1,M Vl = CSQRT3D2 (X(J,LSD3PI) + X(J,LSD3MI)) V2 = X(J,LSD3PI) X(J,LSD3IU) C C1 = COJJG((S1,S2)) CHPLX((O.S)V2+X(J,I),V1) C X(J,LSD3MI) = REAL(Cl) C X(J,LSD3PI) = AIMAG(C1) c c V3 = (O.S)V2 + X(J,I) X(J ,LSD3IU) S1V3 + S2*V1 X(J,LSD3PI) SlVl S2V3 X(J,I) = V2 + X(J,I) 101 CONTINUE 100 COllJTINUE END IF RETURN END 267
c C SUBROUTINE: VISCSF3 c c C NAME c C VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED FOR C RA.DIX3 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE RADIX3 FORWARD COiiBINE C EQUATIONS FOR ISCS INDUCED SYMMETRIES. c c C INPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRJI.NSFORH c C LS: LENGTH OF SUBSEQUENCES BEING SPLIT c C X: TWO DIHENSIOHAL ARRAY, EACH ROW OF WHICH CONTAINS THE C IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT P AR.AMETERS c C X: UPDATED BY RADIX3 FORWARD COMBINE EQUATIONS FOR ISCS C INDUCED SYMMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C NONE c SUBROUTINE VISCSF3(M,LS,X) REAL X(hM,O' (LS3)/2) COMPLEX OMEGA(O:O) COMMON /VFROCOM1/ CSQRT2,SQRT2D2, i SQRT3,CSQRT3,SQRT3D2,CSQRT3D2, a: CTIW16,CTH!'16E3, 1:: L,OMEGA LSH3D6 (LS3)/6 DO 1 J=1,M X(J,LSM3D6) = CSQRT3 X(J,LSM3D6) 1 CONTINUE IF (LS .GT. 6) THEE LSM1D2 = (LS1)/2 LD2LS = L/(2LS) DO 100 I=1,LSM3D6 = LSM3D6 I LSM3D6PI = LSM3D6 + I LSH1D2MI = LSM1D2 I ILD2LS = ILD2LS 268
c 51 = REALCOMEGA(ILD2LS)) 52 = AIMAG(OMEGA(ILD2LS)) DO 101 J=i,M Vi CSQRT302 (X(J,LSM306PI) + X(J,LSM3D6MI)) V2 = X(J ,LSM306PI) X(J ,LSM306I'II) C C1 = CONJG((S1,S2)) CMPLX((O.S)V2+X(J,LSMlD2NI) ,V1) C X(J,LSM3D6MI) REAL(C1) C X(J ,LSM3D6PI) = AII1AG(C1) c c V3 = (0.5)*V2 + X(J,LSM1D2MI) X(J,LSM3D6MI) S1V3 + S2V1 X(J ,LSI'i3D6PI) S1V1 S2V3 X(J,LSH1D2MI) 101 CONTINUE 100 CONTINUE END IF RETURN n V2 + X(J,LSM1D2MI) 269
C
C     SUBROUTINE: VIF3
C
C     NAME
C
C     VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR I SEQUENCES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR I
C         SEQUENCES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VIF3(M,LS,X)
      REAL X(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      LSD3 = LS/3
      I2LSD3 = 2*LSD3
      DO 1 J=1,M
         V1 = SQRT3*X(J,I2LSD3)
         V2 = X(J,0) - X(J,LSD3)
         X(J,0) = X(J,0) + 2.0*X(J,LSD3)
         X(J,LSD3) = V2 - V1
         X(J,I2LSD3) = V2 + V1
    1 CONTINUE
      IF (LS .GT. 6) THEN
         LDLS = L/LS
         DO 100 I=1,(LS-3)/6
            LSD3MI = LSD3 - I
            LSD3PI = LSD3 + I
            I2LSD3MI = I2LSD3 - I
            I2LSD3PI = I2LSD3 + I
            LSMI = LS - I
            ILDLS = I*LDLS
            I2ILDLS = 2*ILDLS
            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))
            S3 = REAL(OMEGA(I2ILDLS))
            S4 = AIMAG(OMEGA(I2ILDLS))
            DO 101 J=1,M
               V1 = X(J,LSD3MI) + X(J,LSD3PI)
               V2 = SQRT3D2*(X(J,LSD3MI) - X(J,LSD3PI))
               V3 = SQRT3D2*(X(J,I2LSD3MI) + X(J,I2LSD3PI))
               V4 = X(J,I2LSD3MI) - X(J,I2LSD3PI)
C
               V5 = (-0.5)*V1 + X(J,I)
               V6 = (-0.5)*V4 + X(J,LSMI)
C              C1 = CMPLX(V4+X(J,LSMI),V1+X(J,I))
C              X(J,I) = AIMAG(C1)
C              X(J,LSD3MI) = REAL(C1)
C
               X(J,I) = V1 + X(J,I)
               X(J,LSD3MI) = V4 + X(J,LSMI)
C              C2 = CONJG((S1,S2))*CMPLX(V6-V2,V5-V3)
C              X(J,LSD3PI) = AIMAG(C2)
C              X(J,I2LSD3MI) = REAL(C2)
C
               V7 = V6 - V2
               V8 = V5 - V3
               X(J,LSD3PI) = S1*V8 - S2*V7
               X(J,I2LSD3MI) = S1*V7 + S2*V8
C              C3 = CONJG((S3,S4))*CMPLX(V6+V2,V5+V3)
C              X(J,I2LSD3PI) = AIMAG(C3)
C              X(J,LSMI) = REAL(C3)
C
               V7 = V6 + V2
               V8 = V5 + V3
               X(J,I2LSD3PI) = S3*V8 - S4*V7
               X(J,LSMI) = S3*V7 + S4*V8
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END
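Each commented-out complex statement in the listing, such as C2 = CONJG((S1,S2))*CMPLX(V7,V8), is replaced in the executable code by two real statements. The Python sketch below checks that identity numerically. The function name conj_mul_real is hypothetical, and the argument names simply follow the scalars used in the listing.

```python
def conj_mul_real(s1, s2, v7, v8):
    # Real and imaginary parts of conj(s1 + i*s2) * (v7 + i*v8),
    # written with real arithmetic only, as in the FORTRAN loop body.
    re = s1 * v7 + s2 * v8
    im = s1 * v8 - s2 * v7
    return re, im
```

For example, conj_mul_real(0.6, 0.8, 2.0, -1.0) agrees with complex(0.6, 0.8).conjugate() * complex(2.0, -1.0) to rounding error. Writing the product this way avoids COMPLEX temporaries and keeps the inner loop over J in purely real arithmetic.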
C
C     SUBROUTINE: VI2F3
C
C     NAME
C
C     VECTORIZED I2 SEQUENCES FORWARD COMBINED FOR RADIX-3
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-3 FORWARD COMBINE
C     EQUATIONS FOR I2 SEQUENCES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     X:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         IDFT OF AN I2 SYMMETRIC SEQUENCE OF LENGTH LS
C
C     OUTPUT PARAMETERS
C
C     X:  UPDATED BY RADIX-3 FORWARD COMBINE EQUATIONS FOR I2
C         SEQUENCES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE VI2F3(M,LS,X)
      REAL X(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ CSQRT2,SQRT2D2,
     &                  SQRT3,CSQRT3,SQRT3D2,CSQRT3D2,
     &                  CTIW16,CTIW16E3,
     &                  L,OMEGA
      EQUIVALENCE (IR0,LSM3D6)
      IR0 = (LS-3)/6
      IR1 = (LS-1)/2
      IR2 = (Ei>t
C
         DO 100 I=1,LSM3D6
            IR0MI = IR0 - I
            IR0PI = IR0 + I
            IR1MI = IR1 - I
            IR1PI = IR1 + I
            IR2MI = IR2 - I
            IR2PI = IR2 + I
            ILDLS = I*LDLS
            I2ILDLS = 2*ILDLS
            S1 = REAL(OMEGA(ILDLS))
            S2 = AIMAG(OMEGA(ILDLS))
            S3 = REAL(OMEGA(I2ILDLS))
            S4 = AIMAG(OMEGA(I2ILDLS))
            DO 101 J=1,M
               V1 = SQRT3D2*(X(J,IR0MI) + X(J,IR0PI))
               V2 = X(J,IR0MI) - X(J,IR0PI)
               V3 = X(J,IR2MI) + X(J,IR2PI)
               V4 = SQRT3D2*(X(J,IR2MI) - X(J,IR2PI))
               V5 = (-0.5)*V2 + X(J,IR1MI)
               V6 = (-0.5)*V3 + X(J,IR1PI)
C              C1 = CMPLX(V2+X(J,IR1MI),V3+X(J,IR1PI))
C              X(J,IR0MI) = REAL(C1)
C              X(J,IR0PI) = AIMAG(C1)
C
               X(J,IR0MI) = V2 + X(J,IR1MI)
               X(J,IR0PI) = V3 + X(J,IR1PI)
C              C2 = CONJG((S1,S2))*CMPLX(V5-V4,V6-V1)
C              X(J,IR1MI) = REAL(C2)
C              X(J,IR1PI) = AIMAG(C2)
C
               V7 = V5 - V4
               V8 = V6 - V1
               X(J,IR1MI) = S1*V7 + S2*V8
               X(J,IR1PI) = S1*V8 - S2*V7
C              C3 = CONJG((S3,S4))*CMPLX(V5+V4,V6+V1)
C              X(J,IR2MI) = REAL(C3)
C              X(J,IR2PI) = AIMAG(C3)
C
               V7 = V5 + V4
               V8 = V6 + V1
               X(J,IR2MI) = S3*V7 + S4*V8
               X(J,IR2PI) = S3*V8 - S4*V7
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END
Appendix C FORTRAN Skeleton for Combine Equations
C
C     SUBROUTINE: V_acronym_Fori_p
C
C     NAME
C
C     VECTORIZED acronym INDUCED SYMMETRIES forward or inverse
C     COMBINED FOR RADIX-p
C
C     FUNCTION
C
C     THIS SUBROUTINE EXECUTES THE RADIX-p forward or inverse
C     COMBINE EQUATIONS FOR acronym INDUCED SYMMETRIES.
C
C     INPUT PARAMETERS
C
C     M:  NUMBER OF SEQUENCES TO TRANSFORM
C
C     LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C     A:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C         idft or idst OF AN acronym SYMMETRIC SEQUENCE OF LENGTH
C         LS
C
C     OUTPUT PARAMETERS
C
C     A:  UPDATED BY RADIX-p forward or inverse COMBINE
C         EQUATIONS FOR acronym INDUCED SYMMETRIES
C
C     OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C     NONE
C
      SUBROUTINE V_acronym_Fori_p(M,LS,A)
      REAL A(1:M,0or1:function_of_LS)
      COMPLEX OMEGA(0:0)
      COMMON /V_ForS_symmetry_COM1/ miscellaneous constants,
     &                              L,OMEGA
      INTEGER P,TWOP
      PARAMETER (P=p,TWOP=2*P)
C
C     COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
    1 CONTINUE
      IF (TWOP*(LS/TWOP) .EQ. LS) THEN
C
C     COMPUTATIONS FOR I = LS/TWOP
C
         DO 2 J=1,M
    2    CONTINUE
         MS = LS/TWOP-1
      ELSE
         MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C     COMPUTATIONS FOR I = 1,MS
C
         DO 100 I=1,MS
            DO 101 J=1,M
  101       CONTINUE
  100    CONTINUE
      END IF
      RETURN
      END
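The skeleton splits the index I into three cases: I = 0, the midpoint I = LS/(2P) when 2P divides LS, and the general range I = 1,...,MS. The Python sketch below reproduces only that control flow. It assumes the two MS assignments read LS/TWOP-1 and (LS-P)/TWOP, with FORTRAN integer division, and the function name index_ranges is hypothetical.

```python
def index_ranges(LS, P):
    # Mirrors the IF/ELSE structure of the skeleton using integer division.
    TWOP = 2 * P
    has_mid = (TWOP * (LS // TWOP) == LS)   # is I = LS/TWOP a valid index?
    if has_mid:
        MS = LS // TWOP - 1
    else:
        MS = (LS - P) // TWOP
    # The general loop DO 100 I=1,MS runs only when LS > TWOP.
    general = list(range(1, MS + 1)) if LS > TWOP else []
    return has_mid, MS, general
```

With P = 4, index_ranges(16, 4) gives a midpoint case and the single general index 1, while index_ranges(12, 4) has no midpoint case and again the single general index 1.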
Appendix D Mathematica Scripts
(* VECTORIZED ICS INDUCED SYMMETRIES FORWARD COMBINED EVEN RADIX *)

(* BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES MSG.M AND REIM.M.
   MSG.M PROVIDES TEXT FOR ERROR MESSAGES, WHILE REIM.M REDEFINES
   THE FUNCTIONS RE AND IM FOR PERFORMING SYMBOLIC RATHER THAN
   NUMERIC COMPUTATIONS. *)

(* SPECIFY THE RADIX P (EVEN VALUES ONLY) *)

p := 4

(* THE CONJUGATE FUNCTION MUST BE REDEFINED FOR PERFORMING
   SYMBOLIC RATHER THAN NUMERIC COMPUTATIONS. *)

Unprotect[Conjugate]
{"Conjugate"}
Conjugate[expr_] := Re[expr] - Im[expr] I
Protect[Conjugate]
{"Conjugate"}

(* REX REPRESENTS THE REAL PART OF X. AT THIS POINT, IT IS AN
   UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT
   ON X. WE WILL LATER DEFINE REX IN TERMS OF ITS LOCATION
   WITHIN A FORTRAN ARRAY. MATHEMATICA MAKES NO ASSUMPTIONS
   ABOUT THE DATA TYPE OF REX. THE FOLLOWING STATEMENTS INFORM
   MATHEMATICA THAT REX IS REAL VALUED. IMX REPRESENTS THE
   IMAGINARY PART OF X, AND IS ANALOGOUS TO REX. *)

rex/: Re[rex[n_]] := rex[n]
rex/: Im[rex[n_]] := 0
imx/: Re[imx[n_]] := imx[n]
imx/: Im[imx[n_]] := 0

(* THE FOLLOWING STATEMENTS ARE VALID FOR THE IDFT OF ANY ICS
SYMMETRIC SEQUENCE. THESE STATEf!ENTS PLAY 11. CRUCIAL ROLE IN SIMPLIFYING THE FOR&!A.RO COMBINE EQUATIONS FOR ICS SEQUENCES. LS REPRESENTS THE LENGTH OF THE SUBSEQUEI>!CE BEING SPLIT BOTH HERE AND IN THE FORTRAN CODE. ) rex[O] := 0 rex[ls/2] := 0 imx[n_] := 0 < THE FUJJCTIONS X, W (OMEGA), AND Y CORRESPOND TO IWTATION USED IN THE COMBINE EQUATIONS. THE ARGUMENTS TO THESE FUNCTIONS REPRESENT SUBSCRIPTS. ) x:[n_,l_] := rex:[hls/p+n] + himx[Hls/p+n] v[l_] := Exp[h2+Pi/l] y[n_,q_] := ( x[n,O] + ( 1) (q+1 )+Conjugate [x [ n ,p/2]] + Sum[ Conjugate[v[p](l+q)]+x[n,l]v[p] (l+q)+Conjugate [x [ n,l]] {l,1,p/21} J ) < THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, Aim THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. ) rhsryOq = Table( Factor[Simpliy[Re(y[O,q]]]] {q,O,p/21} ] {0, O} < THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE NONZERO RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSIYOQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF Y(O.Q]. ) rhsiyOq = Table[ Factor[Simplify[Im[y[O,q]]]) {q,O,p/21} ] {0, 2+rez[ls/4]} ( THE FOLLOWING TABLE COETAH'S THE REAL PART OF ,Q] 279
   FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND
   THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. *)

rhsrymq = Table[ Factor[Simplify[Re[y[ls/(2*p),q]]]],
                 {q,0,p/2-1} ]
{0, 0}

(* THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF
   Y[LS/(2*P),Q] FOR THE APPROPRIATE RANGE OF Q. THE NONZERO
   RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL
   RHSIYMQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE
   RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE
   IMAGINARY PART OF Y[M,Q] WHERE M = LS/(2*P). *)

rhsiymq = Table[ Factor[Simplify[Im[y[ls/(2*p),q]]]],
                 {q,0,p/2-1} ]

(* THE FOLLOWING RESULT IS THE REAL PART OF Y[I,0] FOR A
   GENERAL INDEX I. A NONZERO RESULT WILL LATER BE STORED IN THE
   FORTRAN ARRAY. THE LABEL RHSRYI0 PROVIDES A CONVENIENT MEANS
   FOR REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE
   RIGHT HAND SIDE OF THE REAL PART OF Y[I,0]. *)

rhsryi0 = Factor[Simplify[Re[y[i,0]]]]
rex[i] - rex[-i + ls/4] - rex[-i + ls/2] + rex[i + ls/4]

(* THE FOLLOWING RESULT IS THE IMAGINARY PART OF Y[I,0] FOR A
   GENERAL INDEX I. A NONZERO RESULT WILL LATER BE STORED IN THE
   FORTRAN ARRAY. THE LABEL RHSIYI0 PROVIDES A CONVENIENT MEANS
   FOR REFERRING TO THIS RESULT, AND IS AN ABBREVIATION FOR THE
   RIGHT HAND SIDE OF THE IMAGINARY PART OF Y[I,0]. *)

rhsiyi0 = Factor[Simplify[Im[y[i,0]]]]
0

(* THE REMAINING EQUATIONS INVOLVE COMPLEX MULTIPLICATION BY A
   POWER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF OMEGA
   REQUIRED HAVE BEEN PRECOMPUTED AND STORED IN THE COMPLEX ARRAY
   OMEGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, WHERE L IS A
   FIXED CONSTANT WHICH IS DIVISIBLE BY 2*LS FOR ALL VALUES OF
   LS. I AND Q ARE INDEX VALUES USED IN THE COMBINE
EQUATIONS. MATHEMATIC! REGARDS OMEGA AS AN UHDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT YILL BE IDENTICAL TO A FORTRAN ARRAY. THE FUNCTIONS OR AND OI REPRESENT THE REAL AND IMAGINARY PARTS, RESPECTIVELY, OF THE POWERS OF OMEGA USED IN THE COMBINE EQUATIONS. ) or(q_] := Re(omega(q*il/ls]] oi(q_] = Im(omega[q*i*l/ls]] rhsorq =Table[ or(q] {q,l,p/21} {Re [Ol!lega [ /ls]]} rhsoiq = Table[ oi[q] {q,l,p/21} ] < THE FUNCTIONS FR AID FI ARE OBTAINED FROM Y BY OMITTING THE FIRST FACTOR (A POWER OF OMEGA) AND TAKING REAL AND IMAGINARY PARTS. ) fr[n_,q_] := Re[ x[n,O] + {1)(q+i)*Conjugate[x[n,p/2]] + Sum[ Conjugate(;; (p](l>l
Clear[:fr] Clear[fi] < THE FUNCTIONS RYI [Q] AND IYI [Q] REPRESENT THE REil.L U:D IMAGINARY PARTS, RESPECTIVELY, OF Y[I,Q]. l ryi[q_] := or(q]fr[q] + oi[q]:fi[q] iyi[q_] := or[q]*fi[q) oi[q]fr[q] rhsryiq = Table[ ryi[q] {q,1,p/21} ] {fi [1] oi[1] + fr [1] ox[1]} rhsiyiq =Table[ iyi[q] {q,l,p/21}] { (fr [1) oi[1]) + :fi [1] or[1]} < THE FUNCTION YT CORRESPONDS TO THE NOTATION Y TILDE USED IN THE COMBINE EQUATIONS. THE ARGUMENT TO THIS FUNCTION REPRESENTS THE FIRST SUBSCRIPT, WHILE THE SECOND SUBSCRIPT HAS THE IMPLICIT VALUE P/2. ) yt [n_] := x[n,O] + + Sum( (1)1 (x[n,l)x[n,l]) {l,i,p/21} ] ) < THE FOLLOWING RESULTS ARE OBTAINED BY EVALUATING Y! AT SPECIFIC VALUES OF ITS ARGUMENT. ) rhsytO = Factor(Simplify[yt[O]]] 0 rhsytm = Factor[Simplify[yt [ls/(2p)]]] 2(rex [ls/8] rex [ (3ls) /8]) rhsyti = Factor[Simplify[yt[i]]] rex[i] + rex[i + ls/4] rex[i + ls/2] rex[i + ls/4] < BEFORE EXECUTiliiG THE REMAINDER OF THIS FILE, If'JST DETERMINE STORAGE PATTERNS FOR THE INPUT AND OUTPUT DATA WHICH ALLOV THE COMBINE EQUATIONS TO BE EXECUTED INPLACE. 282
   THE DATA IS CONTAINED IN A TWO DIMENSIONAL FORTRAN ARRAY
   NAMED A. EACH SEQUENCE IS STORED IN A ROW OF A, SO THE FIRST
   INDEX SIMPLY IDENTIFIES THE SEQUENCE NUMBER.

   THE STORAGE PATTERN FOR THE INPUT DATA IS SPECIFIED BY
   DEFINING REX IN TERMS OF ITS LOCATION WITHIN THE FORTRAN
   ARRAY A. MATHEMATICA REGARDS A AS AN UNDEFINED FUNCTION, BUT
   THE SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A FORTRAN
   ARRAY. *)

rex[n_] := a[j,n]

(* THE STORAGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE
   FOLLOWING FUNCTIONS. THE FUNCTION NAMES ARE ACRONYMS FOR
   OUTPUT QUANTITIES. FOR EXAMPLE, Y0[N] MEANS Y[N,0], IY[N,Q]
   MEANS THE IMAGINARY PART OF Y[N,Q], ETC ... *)

y0[n_] := a[j,n]
yt[n_] := a[j,ls/p-n]
iy[n_,q_] := a[j,q*ls/p+n]
ry[n_,q_] := a[j,q*ls/p+ls/p-n]

(* WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FORTRAN
   ARRAY A, AND USING FORTRAN SYNTAX. IN THIS WAY, THESE RESULTS
   MAY BE INSERTED INTO FORTRAN CODE. THE RESULTS ARE OUTPUT IN
   PAIRS OF TABLES. THE FIRST TABLE CONTAINS EXPRESSIONS
   INVOLVING THE INPUT DATA. THE SECOND TABLE IS A CORRESPONDING
   LIST OF STORAGE LOCATIONS FOR THE EXPRESSIONS IN THE FIRST
   TABLE. IF THE FIRST TABLE CONTAINS A ZERO, THEN THERE IS NO
   CORRESPONDING OUTPUT LOCATION IN THE SECOND TABLE. THE PAIR
   OF TABLES REPRESENT A COMBINE EQUATION. BY USING LOCAL SCALAR
   VARIABLES AS TEMPORARY STORAGE LOCATIONS, THESE COMBINE
   EQUATIONS CAN BE EXECUTED IN PLACE.

   THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = 0. *)

FortranForm[rhsiy0q]
List(0,-2*a(j,ls/4))

lhsiy0q = Table[ FortranForm[iy[0,q]],
                 {q,1,p/2-1} ]
{a(j,ls/4)}

(* THE FOLLOWING 2 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR
   I = LS/(2P). *)
2*(a(j,ls/8) - a(j,3*ls/8))

lhsytm = FortranForm[yt[ls/(2*p)]]
a(j,ls/8)

FortranForm[rhsiymq]
List(0,-(Sqrt(2)*(a(j,ls/8) + a(j,3*ls/8))))

lhsiymq = Table[ FortranForm[iy[ls/(2*p),q]],
                 {q,1,p/2-1} ]

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS OR AND OI. *)

FortranForm[rhsorq]
List(Re(omega(i*l/ls)))

FortranForm[rhsoiq]
List(Im(omega(i*l/ls)))

(* THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE
   FUNCTIONS FR AND FI. *)

FortranForm[rhsfrq]
List(a(j,i) + a(j,-i + ls/2))

FortranForm[rhsfiq]
List(-(a(j,-i + ls/4) + a(j,i + ls/4)))

(* THE FOLLOWING 4 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR
   THE GENERAL INDEX I. *)

FortranForm[rhsryi0]
a(j,i) - a(j,-i + ls/4) - a(j,-i + ls/2) + a(j,i + ls/4)

lhsryi0 = FortranForm[y0[i]]
a(j,i)

FortranForm[rhsyti]
a(j,i) + a(j,-i + ls/4) - a(j,-i + ls/2) - a(j,i + ls/4)

lhsyti = FortranForm[yt[i]]
a(j,-i + ls/4)

FortranForm[rhsiyiq]
List(-(fr(1)*oi(1)) + fi(1)*or(1))

lhsiyiq = Table[ FortranForm[iy[i,q]],
                 {q,1,p/2-1} ]
{a(j,i + ls/4)}

FortranForm[rhsryiq]

lhsryiq = Table[ FortranForm[ry[i,q]],
                 {q,1,p/2-1} ]
{a(j,-i + ls/2)}
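The storage functions Y0, YT, IY and RY are what allow the combine equations to run in place: for each index I the outputs go back into the same locations that supplied the inputs, and together those locations fill the array. The Python sketch below checks the second property for the radix-4 ICS case. It assumes the storage functions read y0[n] = a[j,n], yt[n] = a[j,ls/p-n], iy[n,q] = a[j,q*ls/p+n] and ry[n,q] = a[j,q*ls/p+ls/p-n], with the minus signs restored, and that the row of A is indexed 1 to LS/2-1. The function name output_slots is hypothetical.

```python
def output_slots(ls, p=4):
    # Second array index written by each combine equation (assumed forms).
    y0 = lambda n: n
    yt = lambda n: ls // p - n
    iy = lambda n, q: q * ls // p + n
    ry = lambda n, q: q * ls // p + ls // p - n
    qs = range(1, p // 2)               # q = 1,...,p/2-1
    slots = [iy(0, q) for q in qs]      # I = 0: only Im y[0,q] is stored
    if ls % (2 * p) == 0:               # I = LS/(2P) exists
        m = ls // (2 * p)
        slots += [yt(m)] + [iy(m, q) for q in qs]
        ms = m - 1
    else:
        ms = (ls - p) // (2 * p)
    for i in range(1, ms + 1):          # general index I = 1,...,MS
        slots += [y0(i), yt(i)]
        slots += [iy(i, q) for q in qs] + [ry(i, q) for q in qs]
    return slots
```

For LS = 16 the slots are 4, 2, 6, 1, 3, 5, 7, which is every index from 1 to LS/2-1 exactly once, so no output overwrites another and no location is left unused.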
< VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED EVEN R.ADIX ) < BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES NSG. I'l AND REil'LM. MSG.M PROVIDES TEXT FOR ERROR MESSAGES, iJHILE REil'LM REDEFINES THE FUNCTIONS RE AHD IM FOR PERFOR1>11NG SYMBOLIC RATHER THAN NUI1ERIC COMPUTATIONS. ) < SPECIFY THE RADIX P (EVEN VALUES ONLY) ) p := 4 < THE CONJUGATE FUNCTION :MUST BE REDEFINED FOR PERFORIUJiV; SYMBOLIC RATHER THAN NUMERIC CnMPUTATIONS. ) Unprotect[Conjugate] {''Conjugate"} Conjugate[expr_] := Re[expr] Im[expr] I Protect[Conjugate] {"Conjugate"} < THE EVEN AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WILL BE NEEDED TO SIMPLIFY THE COMBINE EQUATIONS. THESE PROPERTIES ARE NOT BUILT INTO MATHEMATICA, AND }lliST BE SPECIFIED AS FOLLOWS. ) EvenOdd = { } Sin[(n_?Uegative) x_.] :> Sin[n x], Cos[(n_?Negative) x_.] :> Cos[n x] /; n Sin[(nx)], Cos[(x_.)(n_)?Negative] :> Cos[Cnx)] /; n < 0} < ABSX REPRESENTS THE ABSOLUTE VALUE OF X. AT THIS POINT, IT IS AN UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT ON X. ME WILL LATER DEFillE ABSX IN TERNS OF ITS LOCATION TJITHIN A FORTRAN .ARRAY. MATHEMATIC.A HAKES NO 286
ASSUMPTIONS ABOUT THE DATA TYPE OF ABSX. THE FOLLOWING SUTEMENTS INFORM M.ii.THEMATICA THAT ABSX IS REAL VP.LGED. ) absx/: Re[absx(n_]] := absx[n] absx/: Im(absx[n_]] := 0 < THE FOLLOt.TlNG STATEMENT IS VALID FOR THE IDFT OF AMY ISCS SYMMETRIC SEQUENCE. THIS STATEMENT PLAYS A CRUCihL hOLE IN SIHPLIFYIDG THE FORWARD COMBINE EQUATIONS FOR ISCS SEQUENCES. J absx[O] := 0 < THE FUNCTIONS XT (X TILDE), lJ (OI1EGA), AND Y CORRESPOND TO NOTATION USED IN THE COMBINE EQUATIONS. THE ARGffi1ENTS TO THESE FUNCTIONS REPRESEIT SUBSCRIPTS. LS REPRESENTS THE LEIGTH OF THE SUBSEQUENCE BEING SPLIT BOTH HERE AND IN THE FORTRAN CODE. ) xt(n_,l_] := absx[l*ls/p+n] y(n_,q_] := ( w[2*ls](n(2q+1)) xt[n,O] + * xt[n,p/2] + w[2ls*p]((l*lsn*p)*(2*q+1)) xt(n,l] {l,i,p/21} l ) ( THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AIJD THEREFORE YILL NOT BE STORED IN THE FORTRAN ARRAY. J rhsryOq =Table[ Factor[Simplify[Re[y[O,q]]]] {q,O,p/21}) {0, 0} ( THE FOLLOlHDG TABLE CONTAINS THE II>IJI.GIIJARY PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE NONZERO RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSIYOQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF Y[O,Q]. J 287
rhsiyOq = Table( Factor[Simpli:fy(Im[y(O,q]]]] {q,O,p/21} ] {(2(1/2)absx[ls/4] + absx(ls/2]), (2"(1/2)absx[ls/4] abs:x:[ls/2])} < THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[LS/(2*P) ,Q] FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE O, AND THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. NOTE THAT THE EVEN AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WERE REQUIRED TO OBTAIN THESE RESULTS. ) rhsrymq =Table[ Re(y[ls/(2p),q]] /. EvenOdd {q,O ,p/21} ] {0, 0} < THE FOLLOWING TA.BLE CONTA.INS THE IJUGnJARY PART OF Y[LS/(2P),Q] FOR THE APPROPRIATE RANGE OF Q. THE NONZERO RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSIYHQ PROVIDES A CONVENIENT MEANS FOR REFERRI!IJG TO THESE RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IIUGINARY PART OF Y[M,Q] WHERE M = LS/(2>0?), AGAIN, THE EVEN AND ODD PROPERTIES OF THE COS AND SIN FUNCTIONS WERE REQUIRED TO OBTAIN THESE RESULTS. ) rhsiymq =Table[ Irn[y[ls/(2p),q]] /. EvenOdd {q, 0 ,p/21} l {2Sin[Pi/8]absx[ls/8] 2Sin[(3Pi)/8]absx[(3*ls)/8], 2Sin[(3Pi)/8]absx[ls/8] + 2Sin[Pi/8]absx((3ls)/8]} ( THE REMAINIEG EQUATIOlfS INVOLVE COMPLEX MULTIPLICATIOIJ BY A POYER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF UEEGA REQUIRED HAVE BEEN PRECOMPUTED AND STORED IN THE COHPLEX ARRAY OMEGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, WHERE L IS A FIXED CONSTANT WHICH IS DIVISIBLE BY 2LS FCR ALL VALUES OF LS. I AND Q ARE INDEX VALUES USED Hl' THE COl'lBifiJE EQUATIONS. MATHEI1ATICA REGARDS OMEGA AS Ail! UNDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A FORTRAJl' ARRAY. THE FUNCTIONS OR AIW OI REPRESENT THE REAL AND IMAGINARY PARTS, OF THE POWERS OF OHEGA USED IN THE COMBINE EQUATIONS. ) or[q_] := Re[ornega[(2q+1)*i*l/(2ls)]] oi[q_J := Irn[omega[(2>0q+1)*il/(2*ls)]] 288
rhsorq =Table[ or[q] {q,O,p/21}] rhsoiq =Table( oi[q] {q,O,p/21} ] < THE FUNCTIONS FR AHD FI ARE OBTAINED FROM Y BY OHITTING THE FIRST FACTOR (A POWER OF Of1EG.A) AND TAKING REAL AND IMAGINARY PARTS. ) fr[n_,q_) := Re[ xt[n,O) + + Sum[Conjugate (y [2*p] (1(2q+1))] xt [n, 1] xt[n,l] {1,1,p/21} l l fi[n_,q_] := Im[ x.t[n,O] + I(1)(q+l)xt[n,p/2] + Sum[Conjugate [v[2p] (1( 2q+1)) Jxt (n,l] xt[n,l] {1,1,p/21} l l rhsfrq =Table[ Factor[fr(i,q]] {q,O,p/21} ] {(2*absx[i] (1/2)absx[i + ls/4) + 2(1/2)absx(i + ls/4))/2, (2absx[i] + + ls/4] + ls/4])/2} rhsfiq = Table[ Factor(fi[i,q)] {q,O,p/21} ] {(2(1/2)absr[i + ls/4] + 2absx[i + ls/2] + 2(1/2)absx[i + ls/4])/2, + ls/4] 2*absx[i + ls/2] + 2(1/2)absx[i + ls/4])/2} < WE ROW CLEAR THE DEFINITIONS OF THE FUNCTIONS OR,OI,FR,FI SO THAT MATHEMATICA 'WILL LEAVE THEl'l IN SYMBOLIC FORl'I RATHER THAN EXPANDING THEM. ) Clear[or] Clear[oi] Clear[fr] Clear[fi] 289
< THE FUlll'CTIONS RYI (Q] AND IYI [Q] REPRESENT THE REAL l;.IJD IMAGINARY PARTS, RESPECTIVELY, OF Y[I,Q]. l ryi[q_] := or[q]fr[q] + oi[q]fi[q] iyi[q_] := or[q)fi[q] oi[q]fr[q] rhsryiq = Table[ ryi[qJ {q,O,p/21} ] {fi[O]*oi[O] + fr[O]or[O], fi[1)oi[1] + rhsiyiq = Table[ iyi[q] {q,O,p/2l} {(fr[O]oi[O]) + fi[O]*or(O], (fr(l]*oi[i]) + fi[1]or[1]} < BEFORE EXECUTUG THE REI1.HNDER OF THIS FILE, ONE MUST DETERMINE STORAGE PATTERNS FOR THE INPUT AND OUTPUL DATA WHICH ALLOW THE COMBINE EQUATIONS TO BE EXECUTED INPLACE: THE DATA IS CONTAINED IN A TWO DIMENSIONAL FORTRAN ARRAY NAMED A. EACH SEQUENCE IS STORED IN A ROW OF A, SO THE FIRST INDEX SIMPLY IDENTIFIES THE SEQUENCE NUMBER. THE STORAGE PATTERN FOR THE INPUT DATA IS SPECIFIED BY DEFINING ABSX IN TERMS OF ITS LOCATION WITHIN THE FORTRAN ARRAY A. MATHEMATICA REGARDS A AS AN UNDEFillED FUNCTION BUT TffE SYNTAX OF THE FINAL OUTPUT WILL BE IDENTICAL TO A FORTRAN ARRAY. l absx[n_] := a[j,ls/2n] < THE STORAGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE FOLLOWING FUNCTIONS. THE FUNCTION NAMES ARE ACRONYHS FOR OUTPUT QUANTITIES. FOR EXAMPLE, IY[N,Q] MEANS THE INi!.GlNAR.Y PART OF Y[N,Q], ETC ... J iy[n_,q_] := a[j,qls/p+n] ry[n_,q_] := a[j,qls/p+ls/pn] < WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FORTRAN ARRAY A, AND USING FORTRAN SYNTAX. IN THIS HAY, THESE RESULTS MAY BE INSERTED Illl'TO FORTRAN CODE. THE RESULTS ARE OUTPUT IN PAIRS OF TABLES. THE FIRST TABLE CONTAINS EXPRESSIONS INVOLVING THE INPUT DATA. THE SECOND IS A CORRESPONDING LIST OF STORAGE LOCATIONS FOR THE EXPRESSIONS IN THE FIRST TABLE. IF THE FIRST TABLE CONTAINS A ZERU, THEN THERE IS NO CORRESPONDING OUTPUT LOCATION IU THE SECOND TABLE. THE PAIR OF TABLES REPRESENT A CONBHiE 290
EQUATIOE. BY USING LOCAL SCALAR VARIABLES AS TENPORARY STORAGE LOCATIONS, THESE COMBINE EQUATIONS CAN BE EXECUTED IHPLACE. THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR I = 0. ) F ortra.nF orm[rhs iyOq] List((a(j,O) + Sqrt(2)a(j,ls/4)), (a(j,O) + Sqrt(2)a(j,ls/4))) lhsiyOq = Table[ FortranForm[iy[O,q]] {q,O,p/21} ] {a(j,O), a(j,ls/4)} ( THE FOLLOWIJJG PUR OF TA.BLES SPECIFY THE CONFUTATIONS FOR I = LS/(2P). ) FortranForm[rhsiymqJ List(2*Sin(3Pi/8)a(j,ls/8) 2Sin(Pi/8)a(j,3*ls/8) ,2*Sin(Pi/8)a(j,ls/8) 2Sin(3Pi/8)a(j,3ls/8)) lhsiymq =Table[ FortranForm(iy[ls/(2*p),q]] {q,O,p/21} ] {a(j ,ls/8), a(j ,3ls/8)} ( THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE FUNCTIONS OR AND OI. ) FortranForm[rhsorq] List(Re(omega(il/(2s))),Re(omega(3il/(2ls)))) FortranForm[rhsoiq) ( THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE FUNCTIONS FR AND FI. ) FortranForm[rhsfrq] List((Sqrt(2)a(j,i + ls/4) + 2*a(j,i + ls/2) Sqrt(2)a(j,i + ls/4))/2, ((Sqrt(2)a(j,i + ls/4)) + 2a(j,i + ls/2) + Sqrt(2)a(j,i + ls/4))/2) 291
FortranForm[rhsfiq] List((2a(j,i) + Sqrt(2)a(j,i + ls/4) + Sqrt(2)a(j,i + ls/4))/2, (2*a(j,i) + Sqrt(2)a(j,i + ls/4) + Sqrt(2)a(j,i + ls/4))/2) < THE FOLLOWING 2 PAIRS OF TABLES SPECIFY THE COMPUTATIOIITS FOR THE GENERAL INDEX I. ) FortranForm[rhsiyiq] List((fr(O)*oi(O)) + fi(O)or(O), (fr(1)oi(1)) + fi(l)or(l)) lhsiyiq =Table[ FortranFor.m[iy[i,q]] {q,O,p/21} {a(j,i), a(j,i + ls/4)} FortranForm[rhsryiqJ List(fi(O)oi(O) + r(O)*or(O), fi(1)oi(1) + fr(1)or{1)) lhsryiq = Table[ FortranFor.m[ry[i,q]] {q,O,p/21} ] {a(j,i + ls/4), a{j,i + ls/2)} 292
(* VECTORIZED I SEQUENCES FORWARD COMBINED EVEN RADIX *)

(* BEFORE EXECUTING THIS FILE, OPEN THE PACKAGES MSG.M AND REIM.M.
   MSG.M PROVIDES TEXT FOR ERROR MESSAGES, WHILE REIM.M REDEFINES
   THE FUNCTIONS RE AND IM FOR PERFORMING SYMBOLIC RATHER THAN
   NUMERIC COMPUTATIONS. *)

(* SPECIFY THE RADIX P (EVEN VALUES ONLY) *)

p := 4

(* THE CONJUGATE FUNCTION MUST BE REDEFINED FOR PERFORMING
   SYMBOLIC RATHER THAN NUMERIC COMPUTATIONS. *)

Unprotect[Conjugate]
{"Conjugate"}
Conjugate[expr_] := Re[expr] - Im[expr] I
Protect[Conjugate]
{"Conjugate"}

(* REX REPRESENTS THE REAL PART OF X. AT THIS POINT, IT IS AN
   UNDEFINED FUNCTION WHOSE ARGUMENT REPRESENTS THE SUBSCRIPT
   ON X. WE WILL LATER DEFINE REX IN TERMS OF ITS LOCATION
   WITHIN A FORTRAN ARRAY. MATHEMATICA MAKES NO ASSUMPTIONS
   ABOUT THE DATA TYPE OF REX. THE FOLLOWING STATEMENTS INFORM
   MATHEMATICA THAT REX IS REAL VALUED. IMX REPRESENTS THE
   IMAGINARY PART OF X, AND IS ANALOGOUS TO REX. *)

rex/: Re[rex[n_]] := rex[n]
rex/: Im[rex[n_]] := 0
imx/: Re[imx[n_]] := imx[n]
imx/: Im[imx[n_]] := 0

(* THE FOLLOWING STATEMENTS ARE VALID FOR THE IDFT OF ANY I
SYMMETRIC SEQUENCE. THESE STATEMENTS PLAY A CRUCIAL ROLE IN SIMPLIFYING THE FORRARD COMBINE EQUATIONS FOR I SEQUENCES. LS REPRESENTS THE LENGTH OF THE SUBSEQUENCE BEING SPLIT BOTH HERE AND IN THE FORTRAN CODE. ) :rex[O] := 0 :rex[ls/:2] := 0 ( THE FUNCTIONS X, W (OMEGA), ANDY CORRESPOND TO NOTATION USED IN THE COMBINE EQUATIONS. THE ARGUMENTS TO THESE FUNCTIONS REPRESENT SUBSCRIPTS. ) x[n_,l_] := :rex[lls/p+n] + Iimx[l*ls/p+n] w[l_] := Exp[H2Pi/l] y(n_,q_] := "' ( x[n,O] + (1)(q+l)Conjugate[x[n,p/2]] + Sum( Conjugate[w[p](lq)]x(n,l]w[p](l*q)*Conjugate[x[n,l]] {1, 1 ,p/21} ] ) ( THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND THEREFORE NOT BE STORED IN THE FORTRAN ARRAY. ) rhsryOq = Table[ Factor[Sirnpliy[Re[y(O,q)]]) {q,O,p1} ] {0, o, o, 0} ( THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[O,Q] FOR THE APPROPRIATE RANGE OF Q. THE NONZERO RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSIYOQ PROVIDES A CONVENIENT MEANS FOR REFERRIUG TO THESE RESULTS, AID IS AN ABBREVIATION FOR THE RIGHT HAED SIDE OF THE IMAGINARY PART OF Y[O.Q]. ) rhsiyOq = Table[ Factor[Simplify[Irn[y[O,q]]J] {q,O ,p1} ] {imx[O] + imx[O] imx[O] imx[O] ( 2irnx[ls/4] + irnx(ls/2], irnx[ls/2] 2irnx[ls/4] + imx[ls/2] irnx[ls/2] + 294
THE FOLLOWING TABLE CONTAINS THE REAL PART OF Y[LS/(2*P),Q] FOR THE APPROPRIATE RANGE OF Q. THE RESULTS SHOULD BE 0, AND THEREFORE WILL NOT BE STORED IN THE FORTRAN ARRAY. ) rhsrymq =Table[ Factor[Sirnplify[Re[y[ls/(2p),q]]j] {q,O ,p1} ) {0, 0, 0, 0} ( THE FOLLOWING TABLE CONTAINS THE IMAGINARY PART OF Y[LS/(2P) ,Q] FOR THE APPROPRIATE RAJliGE OF Q. THE ItTOI!ZERO RESULTS WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RBSIYMQ PROVIDES A CONVENIENT MEANS FOR REFERRING TO THESE RESULTS, AND IS AN ABBREVIATION FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF Y(M,Q] WHERE M = LS/(2*P). ) rhsiyrnq =Table[ Faetor[Sirnplify[Irn[y(ls/(2*p),q]]]] {q,O,p1} ] {2(imx[ls/8] + irnx[(3ls)/8]), 2(1/2)(im:x[ls/8] irnx[(3ls)/B] rex[ls/8] rex[(3*ls)/8]), 2(rex[ls/8] rex((3*ls)/8]), (2(1/2)(imx[ls/8] irnx[(3Hs)/8] + rex[ls/8] + xex((3ls)/8]))} < THE FOLLOVING RESULT IS THE REAL PART OF Y[I,O] FOR A GENERAL INDEX I. A NONZERO RESULT WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSRYIO PROVIDES A COI1VENIENT MEANS FOR REFERRING TO THIS RESULT, AND IS AN JlBBREV!ATIOllT FOR THE RIGHT HAND SIDE OF THE REAL PART OF Y(I,O]. ) xhsryiO = Factor(Simplify[Re[y(i,O]]]] xex(i] rex[i + ls/4] rex(i + ls/2] + xex[i + ls/4] < THE FOLLOYING RESULT IS THE IMAGINARY PART OF Y[I,O] FOR A GENERAL INDEX I. A NONZERO RESULT WILL LATER BE STORED IN THE FORTRAN ARRAY. THE LABEL RHSIYIO PROVIDES A CONVENIEHT MEANS FOR REFERRING TO THIS RESULT, AND IS AN ABBREVIATIOI FOR THE RIGHT HAND SIDE OF THE IMAGINARY PART OF Y[I,O]. ) rhsiyiO = Factor[Simplify[Im[y(i,O]]J] imx[i] + irnx[i + ls/4] + im.x(i + ls/2] + irnx[i + ls/4] 295
( THE REMAINING EQUATIONS INVOLVE COMPLEX BY A POWER OF OMEGA. IN THE FORTRAN CODE, ALL POWERS OF OllE;J.A REQUIRED HAVE BEEN PRECOMPUTED AND STORED IN THE CGr1PLEX ARRAY 011EGA. OMEGA CONTAINS THE L'TH ROOTS OF UNITY, ti'HERE L IS A FIXED CONSTANT WHICH IS DIVISIBLE BY 2*15 FOR ALL VALUES OF LS. I AND Q ARE INDEX VALUES USED IN THE COHBil'JE EQUATIOli!S. MATHEMATICA REGARDS OJ.1EGA Jl.S AIJ UNDEFINED FUNCTION, BUT THE SYNTAX OF THE FINAL OUTPUT 'IJILL BE IDENTICAL TO A FORTRAN ARRAY. THE OR li.ND DI REPRESENT THE REAL AmD Il1AGINARY PARTS, RESPECTIVELY, OF THE POWERS OF OMEGA USED IN THE COMBINE EQUATIONS. ) or[q_] := Re[omega[qil/ls]] oi[q_] := Im[omega[qil/ls]] rhsorq = Table[ or[q] {q,l,p1} {Re[omega[(il)/ls]], Re[omega[(2*i*l)/ls]], Re[omega[(3*i*l)/ls]]} rhsoiq =Table[ oi[q) {q,l,p1} {Im[omega[(il)/ls]]. Im[omega[(2il)/ls]], Im[omega[ (3i*l) /ls] ]} ( THE FUNCTIONS FR AND FI ARE OBTJI.UED FROM Y BY OHITTJ'JJG THE FIRST FACTOR (11. POWER OF OMEGA) AND TAKING REAL AND IMAGINARY PARTS. ) fr[n_.q_] := Re[ x[n,O) + ( 1)(q+l )Conjugate (:x [ n,p/2]] + Sum[ Conjugate [v [p](hq)] *x[n,l]v[p](lq)Conjugate[x[n,l]] {l,1,p/21} l J fi(n_,q_] := Im[ :x[n,O] + ( 1)(q+l )Conjugate [x [ n,p/2]] + Sum[ Conjugate[v[p](lq)]x[n,l]v (p](hq) *Conjugate [x [ n,l]] {1, 1 ,p/21} l l rhsfrq = Table[ Factor(fr[i,q]) {q,1,p1} l {(im.x[i + ls/4] imx[i + ls/4] rex[i] rex[i + ls/2]), rex[i] + rex[i + ls/4] rex[i + ls/2] rex[i + ls/4] imx[i + ls/4] imx[i + ls/4] + rex[i] + rex[i + ls/2]} rhsfiq = Table[ Factor[fi[i,q]] {q,l,p1} ] 296
{im.x[i] imx[i + ls/2] rex[i + ls/4] :rex[i + ls/4], imx[i] imx[i + ls/4] + imx[i + ls/2) imx[i + ls/4), imx[i] iml::[i + ls/2] + rex[i + ls/4] + rex[i + ls/4]} < VE NOW CLEAR THE DEFINITIONS OF THE FUNCTIONS OR,OI,FR,FI SO THAT H!THEM!TIC! WILL LEAVE THEN IN SYI'IBOLIC FORH R1l.THER THAN EXPANDING THEM. l Clear[or] Clear[oi] Clear[fr] Clea:r[fi] < THE FUNCTIONS RYI[Q] AND IYI[Q] REPRESENT THE REAL AND IMAGINARY PARTS, RESPECTIVELY, OF Y[I,Q]. l ryi[q_] := or[q]fr[q] + oi[q]fi[q] iyi[q_] := or[q]*fi[q] oi[q]fr[q] rhsryiq =Table[ ryi(q] {q,l,p1}] {fi [1] oi (1] + fr [1] or[l], fi(2] oi[2] + "fr [2] or [2] fi[3]oi[3] + fr[3]or[3]} rhsiyiq = Table[ iyi[q] {q,l,p1} {(fr[l]oi[l]) + fi[1]or[1], (fr[2]*oi[2]) + fi[2]or[2], (fr[3]oi[3]) + fi[3]or[3]} < BEFORE EXECUTIIG THE REMAINDER OF THIS FILE, ONE MUST DETERMINE STORAGE PA.TTERMS FOR THE IlfPUT AND OUTPUT DATA WHICH ALLOW THE COMBINE EQUATIONS TO BE EXECUTED INPLACE. THE DATA IS CONTAINED IN A TWO DIMENSIONAL FORTRAN 1RRAY NAMED A. EACH SEQUENCE IS STORED IN A ROM OF A, SO THE FIRST INDEX SIMPLY IDENTIFIES THE SEQUENCE JliUl'!lBER. THE STORAGE PATTERN FOR THE INPUT DATA IS SPECIFIED BY DEFINING REX AND IMX IN TEru1S OF THEIR LOCATION WITHIN THE FORTaAE ARRAY A. HATHEMATICA REGARDS A AS AN UNDEFINED FUNCTIOH, BUT THE SYNTAX OF THE FINAL OUTPUT VILL BE IDENTICAL TO A FORTRAN ARRAY. l 297
imx[n_] := a[j ,n] rex [n_] : = a[j ,lsn] < THE STORiGE PATTERN FOR THE OUTPUT DATA IS SPECIFIED BY THE FOLLOlJI:NG FUNCTIONS. THE FUNCTION NAMES ARE ACRONYHS FOR OUTPUT QUUTITIES. FOR EXAMPLE, IY[N ,Q] I1EANS THE I!U.GINii.RY PART OF Y[N ,Q], ETC ... ) iy[n_,q_] := a(j,qls/p+n] ry[n_,q_] := a(j,qls/p+ls/pn] < WE NOW OUTPUT ALL OF OUR RESULTS IN TERMS OF THE FCRTRH ARRAY A, AND USING FORTRAN SYNTAX. IN THIS WAY, THESE RESULTS MAY BE INSERTED INTO FORTRAN CODE. THE RESULTS ARE OUTPUT IN PAIRS OF TABLES. THE FIRST TABLE CONTAIIIS EXPRESSIONS INVOLVING THE INPUT DATA. THE SECOJJD TABLE IS Jl. CORRESPONDING LIST OF STORAGE LOCATIONS FOR THE EXPRESSIONS Ii THE FIRST TABLE. IF THE FIRST TABLE CONTAINS A ZERO, THEN THERE IS DO CORRESPONDING OUTPUT LOCATION IN THE SECOED TABLE. THE PAIR OF TABLES REPRESENT A COMBINE EQUATION. BY USING LOCAL SCALAR VARIABLES AS TEMPORARY STORAGE LOCATIONS, THESE COMBINE EQUATIONS CAN BE EXECUTED INPLACE. THE FOLLOMING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR I = 0. l List(a(j,O) + 2a(j,ls/4) + a(j,ls/2), a(j,O) a(j,ls/2) 2a(j,3ls/4), a(j,O) 2a(j,ls/4) + a(j,ls/2), a(j,O) a(j,ls/2) + 2a(j,3ls/4)) lhsiyOq =Table[ FortranForm[iy[O,q]] {q,O,p1}] {a(j ,0), a(j ,ls/4), a(j,ls/2), a(j ,3ls/4)} < THE FOLLOWimG PAIR OF TABLES SPECIFY THE COMPUTATIOIE FOR I = LS/(2P). ) FortranForm[rhsiymq] List(2(a(j,ls/8) + a(j,3*ls/8)), Sqrt(2)(a(j,ls/8) a(j,3s/8) a(j,S*ls/8) a(j,7Hs/8)), 2(a(j,Sls/8) + a(j,7ls/8)), (Sqrt(2)*(a(j,ls/8) a(j,3ls/8) + a(j,Sls/8) + a(j, 7ls/8)))) 298
lhsiymq =Table[ FortranFor.m[iy[ls/(2p),q]] {q,O,p1} ] {a(j,ls/8), a(j,3ls/8), a(j,Sls/8), a(j,7ls/8)} ( THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE FUNCTIONS OR AND DI. l FortranForm[rhsorq] List(Re(omega(il/ls)),Re(omega(2il/ls)), Re(omega(3il/ls))) FortranForm[rhsoiq) List(Im(omega(il/ls)),Im(omega(2il/ls)), Im(omega(3*i*l/ls))) ( THE FOLLOWING PAIR OF TABLES SPECIFY THE COMPUTATIONS FOR THE FUNCTIONS FR AND FI. l FortranForm[rhsfrq] List((a(j,i + ls/4) a(j,i + ls) a(j,i + ls/4) a(j,i + ls/2)), a(j,i + 3ls/4) + a(j,i + ls) a(j,i + ls/2) + a(j,i + 3*ls/4),a(j,i + ls/4) + a(j,i + ls) a(j,i + ls/4) + a(j,i + ls/2)) FortranForm[rhsfiq] List (a(j, i) a(j ,i + ls/2) a(j ,i + 3*ls/4) a(j,i + 3s/4),a(j,i) a(j,i + ls/4) + a(j,i + ls/2) a(j,i + ls/4), a(j,i) a(j,i + ls/2) + a(j,i + 3*ls/4) + a(j,i + 3*ls/4)) ( THE FOLLOWING 4 PAIRS OF TABLES SPECIFY THE COMPUTATIONS FOR THE GENERAL INDEX I. l FortranForm[rhsiyiO] a(j,i) + a(j,i + ls/4) + a(j,i + ls/2) + a(j, i + ls/4) lhsiyiO = FortranForm(iy[i,O]] a(j,i) 299
FortranForm[rhsryiO] a(j,i + 3ls/4) + a(j,i + ls) a(j,i + ls/2) a(j,i + 3ls/4) lhsryiO = FortranFor.m[ry[i,O]] a(j,i + ls/4) FortranForm[rhsiyiq] List((fr(1)oi(1)) + fi(1)*or(1), {fr(2)*oi(2)) + fi(2)or(2), (fr(3)*oi(3)) + fi(3)or(3)) lhsiyiq = Table[ FortranForm(iy[i,q]] {q,1,p1} l {a(j,i + ls/4), a(j,i + ls/2), a(j,i + 3*ls/4)} FortranForm[rhsryiq] List(fi(1)*oi(1) + fr(1)or(1), fi(2)oi(2) + fr(2)or(2),fi(3)oi(3) + fr(3)or(3)) lhsryiq = Table[ FortranForm[ry[i,qJ] {q,1,p1} l {a(j,i + ls/2), a(j,i + 3*ls/4), a(j,i + ls)} 300
Appendix E Automatically Generated Subroutines for the RO FFT
c C SUBROUTINE: VICSF4 c c C NAME c C VECTORIZED ICS INDUCED SYMJ>IETRIES FORtiFARD COMBII!ED FOR. C RADIX4 c c C FUNCTION c C THIS SUBROUTINE EXECUTES THE R.li.DIX4 FORWARD COl'!BINE C EQUATIONS FOR ICS INDUCED SYMMETRIES. c c C IBPUT PARAMETERS c C M: NUMBER OF SEQUENCES TO TRAD!SFORH c C LS; LENGTH OF SUBSEQUENCES BEING SPLIT c C A: TWO DIMENSIONAL ARRAY, EACH ROW OF YHICH CONTAINS THE C IDFT OF AN ICS SYMMETRIC SEQUENCE OF LENGTH LS c c C OUTPUT PARli.METERS c C A: UPDATED BY RADIX4 FORMARD COMBINE EQUATIONS FOR ICS C IIDUCED SYMMETRIES c c C OUTPUT TO COMMON (INTERNAL USE ONLY) c C lONE c SUBROUTINE VICSF4(M,LS,A) REAL A(l:M,l:LS/21) COMPLEX OMEGA(O:O) COMMON /VFROCOH1/ SQRT2,SQRT2D2, t SQRT3,SQRT3D2, t TSINPID8,TS!N3PI08, It L,OMEGA INTEGER P, TWOP PARANETER (P=4,TWOP=2P) c C COMPUTATIONS FOR I = 0 c c DO 1 J=1,M a(j,ls/4) = (2)a(j,ls/4) 1 COiiTHIUE IF (TVOP(LS/TWOP) .EQ. LS) THEN C COMPUTATIONS FOR I = LS/TWOP c 302
C
      DO 2 J=1,M
         V0 = 2*(a(j,ls/8) - a(j,3*ls/8))
         V1 = (-Sqrt2)*(a(j,ls/8) + a(j,3*ls/8))
         a(j,ls/8) = V0
         a(j,3*ls/8) = V1
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
         OR1 = ReAL(omega(i*l/ls))
         OI1 = AimAG(omega(i*l/ls))
         DO 101 J=1,M
            FR1 = a(j,i) + a(j,i + ls/2)
            FI1 = a(j,i + ls/4) a(j,i + ls/4)
            V0 = a(j,i) a(j,i + ls/4) a(j,i + ls/2) +
     &           a(j,i + ls/4)
            V1 = a(j,i) + a(j,i + ls/4) a(j,i + ls/2)
     &           a(j,i + ls/4)
            a(j,i) = V0
            a(j,i + ls/4) = V1
            a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
            a(j,i + ls/2) = fi1*oi1 + fr1*or1
  101    CONTINUE
  100 CONTINUE
      END IF
      RETURN
      END
C
C SUBROUTINE: VISCSF4
C
C
C NAME
C
C   VECTORIZED ISCS INDUCED SYMMETRIES FORWARD COMBINED FOR
C   RADIX4
C
C
C FUNCTION
C
C   THIS SUBROUTINE EXECUTES THE RADIX4 FORWARD COMBINE
C   EQUATIONS FOR ISCS INDUCED SYMMETRIES.
C
C
C INPUT PARAMETERS
C
C   M:  NUMBER OF SEQUENCES TO TRANSFORM
C
C   LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C   A:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C       IDFT OF AN ISCS SYMMETRIC SEQUENCE OF LENGTH LS
C
C
C OUTPUT PARAMETERS
C
C   A:  UPDATED BY RADIX4 FORWARD COMBINE EQUATIONS FOR ISCS
C       INDUCED SYMMETRIES
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C   NONE
C
      SUBROUTINE VISCSF4(M,LS,A)
      REAL A(1:M,0:LS/2-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ SQRT2,SQRT2D2,
     &                  SQRT3,SQRT3D2,
     &                  TSINPID8,TSIN3PID8,
     &                  L,OMEGA
C
      INTEGER P, TWOP
      PARAMETER (P=4,TWOP=2*P)
C COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
         V0 = a(j,0) Sqrt2*a(j,ls/4)
         V1 = a(j,0) Sqrt:
C
C COMPUTATIONS FOR I = LS/TWOP
C
C
      DO 2 J=1,M
         V0 = (-TSin3PiD8)*a(j,ls/8)
         V1 = TSinPiD8*a(j,ls/8) - TSin3PiD8*a(j,3*ls/8)
         a(j,ls/8) = V0
         a(j,3*ls/8) = V1
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      END IF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
         OR0 = ReAL(omega(i*l/(2*ls)))
         OR1 = ReAL(omega(3*i*l/(2*ls)))
         OI0 = AimAG(omega(i*l/(2*ls)))
         OI1 = AimAG(omega(3*i*l/(2*ls)))
         DO 101 J=1,M
            FR0 = Sqrt2D2*a(j,i + ls/4) + a(j,i + ls/2)
     &            Sqrt2D2*a(j,i + ls/4)
            FR1 = (-Sqrt2D2)*a(j,i + ls/4) + a(j,i + ls/2) +
     &            Sqrt2D2*a(j,i + ls/4)
            FI0 = a(j,i) Sqrt2D2*a(j,i + ls/4)
     &            Sqrt2D2*a(j,i + ls/4)
            FI1 = a(j,i) Sqrt2D2*a(j,i + ls/4)
     &            Sqrt2D2*a(j,i + ls/4)
            a(j,i) = fr0*(-oi0) + fi0*or0
            a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
            a(j,i + ls/4) = fi0*oi0 + fr0*or0
            a(j,i + ls/2) = fi1*oi1 + fr1*or1
  101    CONTINUE
  100 CONTINUE
      END IF
      RETURN
      END
C
C SUBROUTINE: VIF4
C
C
C NAME
C
C   VECTORIZED I SEQUENCES FORWARD COMBINED FOR RADIX4
C
C
C FUNCTION
C
C   THIS SUBROUTINE EXECUTES THE RADIX4 FORWARD COMBINE
C   EQUATIONS FOR I SEQUENCES.
C
C
C INPUT PARAMETERS
C
C   M:  NUMBER OF SEQUENCES TO TRANSFORM
C
C   LS: LENGTH OF SUBSEQUENCES BEING SPLIT
C
C   A:  TWO DIMENSIONAL ARRAY, EACH ROW OF WHICH CONTAINS THE
C       IDFT OF AN I SYMMETRIC SEQUENCE OF LENGTH LS
C
C
C OUTPUT PARAMETERS
C
C   A:  UPDATED BY RADIX4 FORWARD COMBINE EQUATIONS FOR I
C       SEQUENCES
C
C
C OUTPUT TO COMMON (INTERNAL USE ONLY)
C
C   NONE
C
C
      SUBROUTINE VIF4(M,LS,A)
      REAL A(1:M,0:LS-1)
      COMPLEX OMEGA(0:0)
      COMMON /VFROCOM1/ SQRT2,SQRT2D2,
     &                  SQRT3,SQRT3D2,
     &                  TSINPID8,TSIN3PID8,
     &                  L,OMEGA
      INTEGER P, TWOP
      PARAMETER (P=4,TWOP=2*P)
C COMPUTATIONS FOR I = 0
C
      DO 1 J=1,M
         V0 = a(j,0) + 2*a(j,ls/4) + a(j,ls/2)
         V1 = a(j,0) - a(j,ls/2) - 2*a(j,3*ls/4)
         V2 = a(j,0) - 2*a(j,ls/4) + a(j,ls/2)
         V3 = a(j,0) - a(j,ls/2) + 2*a(j,3>i
         a(j,3*ls/4) = V3
    1 CONTINUE
      IF (TWOP*(LS/TWOP) .EQ. LS) THEN
C
C COMPUTATIONS FOR I = LS/TWOP
C
C
      DO 2 J=1,M
         V0 = 2*(a(j,ls/8) + a(j,3*ls/8))
         V1 = Sqrt2*(a(j,ls/8) - a(j,3*ls/8) - a(j,5*ls/8) -
     &        a(j,7*ls/8))
         V2 = 2*(a(j,5*ls/8) - a(j,7*ls/8))
         V3 = (-Sqrt2)*(a(j,ls/8) - a(j,3*ls/8) + a(j,5*ls/8) +
     &        a(j,7*ls/8))
         a(j,ls/8) = V0
         a(j,3*ls/8) = V1
         a(j,5*ls/8) = V2
         a(j,7*ls/8) = V3
    2 CONTINUE
      MS = LS/TWOP-1
      ELSE
      MS = (LS-P)/TWOP
      ENDIF
      IF (LS .GT. TWOP) THEN
C
C COMPUTATIONS FOR I = 1,MS
C
      DO 100 I=1,MS
         OR1 = ReAL(omega(i*l/ls))
         OR2 = ReAL(omega(2*i*l/ls))
         OR3 = ReAL(omega(3*i*l/ls))
         OI1 = AimAG(omega(i*l/ls))
         OI2 = AimAG(omega(2*i*l/ls))
         OI3 = AimAG(omega(3*i*l/ls))
         DO 101 J=1,M
            FR1 = a(j,i + ls/4) + a(j,i + ls) + a(j,i + ls/4) +
     &            a(j,i + ls/2)
            FR2 = a(j,i + 3*ls/4) + a(j,i + ls) a(j,i + ls/2) +
     &            a(j,i + 3*ls/4)
            FR3 = a(j,i + ls/4) + a(j,i + ls) a(j,i + ls/4) +
     &            a(j,i + ls/2)
            FI1 = a(j,i) a(j,i + ls/2) a(j,i + 3*ls/4)
     &            a(j,i + 3*ls/4)
            FI2 = a(j,i) a(j,i + ls/4) + a(j,i + ls/2)
     &            a(j,i + ls/4)
            FI3 = a(j,i) a(j,i + ls/2) + a(j,i + 3*ls/4) +
     &            a(j,i + 3*ls/4)
            V0 = a(j,i) + a(j,i + ls/4) + a(j,i + ls/2) +
     &           a(j,i + ls/4)
            V1 = a(j,i + 3*ls/4) + a(j,i + ls) a(j,i + ls/2)
     &           a(j,i + 3*ls/4)
            a(j,i) = V0
            a(j,i + ls/4) = V1
            a(j,i + ls/4) = fr1*(-oi1) + fi1*or1
            a(j,i + ls/2) = fr2*(-oi2) + fi2*or2
            a(j,i + 3*ls/4) = fr3*(-oi3) + fi3*or3
            a(j,i + ls/2) = fi1*oi1 + fr1*or1
            a(j,i + 3*ls/4) = fi2*oi2 + fr2*or2
            a(j,i + ls) = fi3*oi3 + fr3*or3
  101    CONTINUE
  100 CONTINUE
      ENDIF
      RETURN
      END
Bibliography

[1] W. L. Briggs, Further symmetries of in-place FFTs, SIAM J. Sci. Stat. Comput., Vol. 8, No. 4 (1987), pp. 644-654.

[2] J. W. Cooley, P. A. W. Lewis, and P. D. Welch, The fast Fourier transform algorithm: Programming considerations in the calculation of sine, cosine and Laplace transforms, J. Sound Vibration, 12 (1970), pp. 315-337.

[3] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp., 19 (1965), pp. 297-301.

[4] W. M. Gentleman, Implementing Clenshaw-Curtis quadrature, II Computing the cosine transformation, Comm. ACM, Vol. 15, No. 5 (1972), pp. 343-346.

[5] E. Grosse, Netlib news: Greetings, SIAM News, Vol. 23, No. 6 (1990), pp. 14-16.

[6] U. Schumann and R. A. Sweet, Fast Fourier transforms for direct solution of Poisson's equation with staggered boundary conditions, J. Comput. Phys., Vol. 75, No. 1 (1988), pp. 123-137.

[7] P. N. Swarztrauber, Vectorizing the FFTs, in Parallel Computations (G. Rodrigue, ed.), Academic Press, New York, 1982, pp. 490-501.

[8] ______, Fast Poisson solvers, MAA Studies in Numerical Analysis, Vol. 24 (1984, G. H. Golub, ed.), pp. 319-370.

[9] ______, FFT algorithms for vector computers, Parallel Comput., 1 (1984), pp. 45-63.

[10] ______, Symmetric FFTs, Math. Comp., Vol. 47, No. 175 (1986), pp. 323-346.
[11] ______, Multiprocessor FFTs, Parallel Comput., 5 (1987), pp. 197-210.

[12] S. Wolfram, Mathematica: A system for doing mathematics by computer, Addison-Wesley, New York, 1988.
