LASER STATISTICS THROUGH ATMOSPHERIC TURBULENCE:
A SIMULATION APPROACH
by
Tze Kuan Yan
B.S., University of Colorado, 1989
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Electrical Engineering
1991
This thesis for the Master of Science
degree by
Tze Kuan Yan
has been approved for the
Department of
Electrical Engineering
by
Arun K. Majumdar
Marvin F. Anderson
Date
ABSTRACT
Yan, Tze Kuan (M.S. Electrical Engineering)
Laser Statistics Through Atmospheric Turbulence: A
Simulation Approach
Thesis directed by Professor Arun K. Majumdar
This thesis is based on the maximum entropy method of constructing an unknown probability density function (PDF) from its higher-order moments. The maximum entropy method is very useful in data reduction and in solving systems of equations. This thesis examines the classical solution of the moment problem and compares it with a new algorithm, developed here, for calculating the Lagrange multipliers. The new algorithm shows how the Lagrange multipliers are actually related to the moments of the distribution, and it applies to the infinite domain, the semi-infinite domain, and the finite domain.
The form and content of this abstract are approved. I
recommend its publication.
Signed
Arun K. Majumdar
ACKNOWLEDGMENTS
I wish to acknowledge the following people who made this study possible: my mentor, Dr. Arun K. Majumdar, of the Faculty of the Graduate School, Department of Electrical Engineering, University of Colorado at Denver, who was the source of my inspiration for this study; and Dr. James Baker-Jarvis of NIST, who helped me to discover the new algorithm. Last, but not least, I am very grateful to my wife and friends who assisted in typing this paper.
CONTENTS
CHAPTER
1. INTRODUCTION
2. LASER STATISTICS IN ATMOSPHERIC TURBULENCE
   Definition of Atmospheric Turbulence
   Variance of Intensity Fluctuations
   Relationship between PDF and Higher Order Moments
3. THE INFORMATION ENTROPY
   Definition of Entropy
   Example: Information Content of a Morse Code
   The Linear Problem with One Constraint
   The Linear Problem with Many Constraints
4. THE MOMENT PROBLEM
   History of the Moment Problem
   Bayesian Theorem
   Derivation of Maximum Entropy Methods
   Moments and Cumulants
   Analysis of the MAXENT Algorithm
   Application in Infinite Domain
   Application in Semi-Infinite Domain
   Application in Finite Domain
   Constructing PDF from Experimental Data
5. DISCUSSION AND CONCLUSION
APPENDIX
BIBLIOGRAPHY
CHAPTER 1
INTRODUCTION
Laser irradiance fluctuations in real atmospheric turbulence, as well as in laboratory-simulated turbulence, have been studied for a long time. Nevertheless, it remains a puzzle to understand how the statistics of irradiance fluctuations behave in a turbulent medium. In order to understand the nature of irradiance fluctuations, we have to know the form of the probability density function (PDF), because the probability density function describes the statistics of a fluctuating phenomenon such as scintillation caused by atmospheric turbulence [10]. There are many ways to calculate the probability density function. One of these methods is polynomial expansion. It is a very popular method because it has been examined and studied by many mathematicians and scientists for more than a century. In recent years, a method called maximum entropy has been under intensive study. For problems outside the traditional domain of thermodynamics, the conceptual foundations of maximum entropy have also been debated for some time. The method has become more meaningful in practical applications because computers are far more powerful today than before. From a practical point of view, the maximum entropy method is more useful because it uses moments to describe the probability distribution, and it is very easy to obtain the moments of the distribution from experimental data.
The purpose of this thesis is to use the technique of maximum entropy to reconstruct the probability density function from a finite set of moments. It is assumed that we know the physical signal data from which the lower-order and higher-order moments can be calculated. The method of maximum entropy is a consequence of Bayes' theorem, with P(Ai/A) being the probability calculated by maximum entropy (MAXENT); we will examine P(Ai/A) very carefully later. The use of maximum entropy allows us to obtain and update the probability distributions as more information becomes available.
In 1948, Shannon [1] introduced the idea of maximum entropy to describe the information content of a signal. This idea was used by Jaynes [2] to analyze statistical physics in 1957. Since then, there have been many outstanding applications of the method of maximum entropy. Mead and Papanicolaou [3] used MAXENT to solve integral equations. James Baker-Jarvis [4] used it to solve differential equations with boundary-condition constraints. Arun K. Majumdar has studied the general inverse scattering problem in the context of maximum entropy [5].
The uncertainty of a probability distribution can be measured by the information entropy. By maximizing the entropy subject to the available information, we obtain the least biased probability distribution function consistent with the knowledge of the data. Thus a probability distribution for the current state of a system, consistent with prior information, can be obtained by MAXENT [6].
There are some restrictions that must be satisfied in Shannon's derivation of the maximum entropy theory [1]:
1. The uncertainty of a probability distribution must be positive.
2. It must be an increasing function of the number of degrees of freedom, N.
3. Independent sources of uncertainty must be additive.
There are many areas of research that require the estimation of a probability density function (PDF) from the moments of a distribution, such as mine detection problems, moment problems, and optical propagation problems. The PDF is crucial in estimating the probability of successful detection in mine detection problems, where the probability of detection is quantified by a PDF for the noise level. Some of the research in the past has assumed a Gaussian noise distribution superimposed on a sinusoidal signal. This is not always true, because Gaussian noise is not the only kind of noise that exists. This is why we need methods to calculate the probability density function from the moments obtained from the available data. The maximum entropy techniques for the moment problem developed in this paper are very useful for this kind of problem.
CHAPTER 2
LASER STATISTICS IN ATMOSPHERIC TURBULENCE
Definition of Atmospheric Turbulence
Most of the energy of an optical wave is redistributed in atmospheric turbulence through small-scale intensity scintillations. The log-intensity variance is used to describe the fluctuations of the logarithm of the intensity of the wave. The variance usually increases with the turbulence's strength, and the turbulence saturates when the log-intensity variance is equal to about 2.4. We can measure the intensity scintillations in the saturation region through an aperture as small as √(λL) in diameter, where λ is the optical wavelength and L is the path length of propagation. The optical wavelength varies inversely as the variance of the log-intensity scintillations.
Because the atmosphere is non-stationary, random fluctuations in density and refractive index are usually caused by atmospheric turbulence, and these random fluctuations change from place to place and from time to time. Since the variation of pressure is relatively small, we define the turbulence mainly by the fluctuations of temperature. In fact, the fluctuation of temperature is the prime cause of the fluctuation of the refractive index. The refractive index fluctuation is defined as

δn = -79 P δT / [(θ - 1) T²]          (2.1)

where T is the temperature, P is the pressure, and θ is the ratio of specific heats, which is approximately 1.4 for air.
Velocity fluctuation is another factor, arising from large-scale phenomena that affect the equilibrium of the fluctuations. We define the large scale size L₀, the outer scale of turbulence, by L₀ = 2π/K₀, where K₀ is a wave number. Similarly, l₀ is the inner scale of turbulence, where l₀ = 2π/Kₘ and Kₘ is also a wave number.
The formula of the spectrum is the same for velocity, temperature, and refractive index. It is defined by Tatarski as [10]

Φₙ(K) = 0.033 Cₙ² K^(-11/3) exp(-K²/Kₘ²)          (2.2)

where K is the spatial wave number and Kₘ = 5.92/l₀. For small scales the equation is a good estimate if K >= Kₘ; for large scales the equation is a poor estimate if K <= K₀.
L₀, l₀, and Cₙ² are the three main parameters used to describe atmospheric turbulence. Cₙ² is defined as the structure parameter of the refractive index; the intensity of the refractive index fluctuations can be measured by Cₙ², where 10⁻¹⁷ <= Cₙ² <= 10⁻¹³ in units of meters^(-2/3). The outer scale (large size) of atmospheric turbulence is L₀, above which energy is introduced into the turbulence; it varies with the strength of the turbulence and is about 100 meters above the ground. The inner scale (small size) of atmospheric turbulence is l₀, below which viscous dissipation dominates the spectrum. Turbulence energy is usually dissipated below a scale of about 1 mm near the ground.
Variance of Intensity Fluctuations
The variance of the log-intensity, σ²_lnI, is usually measured from experiment, so we can calculate the log-amplitude variance

σ²_χ = σ²_lnI / 4          (2.3)

σ²_lnI has to be small for the spherical wave and beam wave results to be valid.
The value of σ²_lnI can be larger than 2.5 if a high value of Cₙ² is obtained over a 500 meter horizontal propagation path close to the ground. It is very difficult for σ²_lnI to be larger than 2.5 in vertical propagation paths because Cₙ² is constantly changing along the propagation path. From the experimental results, we can show that the spherical wave has a smaller variance than other kinds of waves.
Let W₀ represent the radius of the optical beam at the transmitter and K the optical wave number, 2π/λ. If L << K W₀²/2 then σ²_lnI starts out equal to the variance of a plane wave. On the other hand, if L >> K W₀²/2 the variance will be approximately equal to the value for a spherical wave [10].
Relationship between the PDF and the Higher Order Moments
To study phenomena like laser propagation through the turbulent atmosphere, it is necessary to know the probability distribution function. However, it is difficult to measure the PDF because of many technical problems such as random spikes and saturation of the detector. As a result, many models have been proposed to obtain the probability density function of laser scintillation, such as the K distribution, the I-K distribution, the log-normal distribution, the exponential Bessel distribution, and the Furutsu distribution [9]. No matter which model is used to calculate the PDF, we need to know the higher-order moments of the random variable. The characteristic function of the random variable contains a series of higher-order moments, which can be used to calculate the probability distribution. The higher-order moments contain the information needed to reconstruct the tail of the probability density function. Information concerning the width and the lack of symmetry of the distribution can be obtained from the even- and odd-order moments: skewness measures the asymmetric nature of the distribution, and flatness (excess) measures the weight in the tail. It is very easy to obtain this information for a Gaussian-type distribution, but that is not the case when dealing with non-Gaussian distributions. In order to understand the physical nature, we need to know the moments beyond 4th order.
skewness:        Γ₃ = ⟨x³⟩          (2.4)
excess:          Γ₄ = ⟨x⁴⟩ - 3⟨x²⟩²          (2.5)
superskewness:   Γ₅ = ⟨x⁵⟩ - 10⟨x³⟩⟨x²⟩          (2.6)
super excess:    Γ₆ = ⟨x⁶⟩ - 15⟨x²⟩³          (2.7)
hyperskewness:   Γ₇ = ⟨x⁷⟩ - 105⟨x³⟩⟨x²⟩²          (2.8)
hyperexcess:     Γ₈ = ⟨x⁸⟩ - 105⟨x²⟩⁴          (2.9)

where ⟨xⁿ⟩ denotes the nth central moment.
For a Gaussian probability distribution, the odd-order central moments vanish, so Γ₃, Γ₅ and Γ₇ are zero, and from the above definitions Γ₄ = Γ₆ = Γ₈ = 0 as well. Remember, a Gaussian distribution is described by its first two moments only [5].
In this paper, it is assumed that we are able to obtain all of the higher-order moments that we need. Therefore, we can use this information to reconstruct the probability density function.
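The statistics of Equations (2.4)-(2.9), as reconstructed here in central-moment form (an interpretation of the source, not its exact notation), can be checked numerically: for a unit-variance Gaussian, whose central moments are ⟨x²ⁿ⟩ = (2n-1)!! with odd moments zero, all six quantities must vanish.

```python
def higher_order_stats(m):
    """Statistics of Eqs. (2.4)-(2.9) from a list m of central moments,
    m[n] = <x^n> with m[0] = 1. The central-moment reading of the
    definitions is an assumption of this sketch."""
    return {
        "skewness":      m[3],                            # (2.4)
        "excess":        m[4] - 3 * m[2] ** 2,            # (2.5)
        "superskewness": m[5] - 10 * m[3] * m[2],         # (2.6)
        "superexcess":   m[6] - 15 * m[2] ** 3,           # (2.7)
        "hyperskewness": m[7] - 105 * m[3] * m[2] ** 2,   # (2.8)
        "hyperexcess":   m[8] - 105 * m[2] ** 4,          # (2.9)
    }

# Central moments of a unit-variance Gaussian: 1, 0, 1, 0, 3, 0, 15, 0, 105.
gauss = [1, 0, 1, 0, 3, 0, 15, 0, 105]
stats = higher_order_stats(gauss)  # every entry is 0 for the Gaussian
```

Any non-zero value of these statistics on measured data therefore signals a departure from Gaussian behavior.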
CHAPTER 3
THE INFORMATION ENTROPY
Definition of Entropy
Let us use a very simple example to induce the result, which will help us understand the formalism of entropy. Consider the problem of tossing two coins, having R₁ and R₂ possible outcomes respectively; the total number of possible outcomes is R = R₁R₂, or in this particular case, 2 * 2 = 4. Nevertheless, the information contained in the two coins should be additive:

I(R₁, R₂) = I(R₁) + I(R₂) = Ln R          (3.1)

So we know the information must satisfy a logarithmic type of function.
Example: Information Content of a Morse Code
Consider the example below. Assume

N₁ = number of dashes
N₂ = number of dots

therefore

N = N₁ + N₂   (total number of dashes and dots)

R = N! / (N₁! N₂!)          (3.2)

I = Ln R = Ln(N!) - Ln(N₁!) - Ln(N₂!)          (3.3)

Stirling's approximation:

N! = √(2πN) N^N e^(-N)          (3.4)

Ln N! = Ln√(2π) + 0.5 Ln N + N Ln N - N          (3.5)

also, for large N,

Ln√(2π) + 0.5 Ln N << N Ln N - N          (3.6)

Ln N! ≈ N(Ln N - 1)          (3.7)

therefore

I = N(LnN - 1) - N₁(LnN₁ - 1) - N₂(LnN₂ - 1)          (3.8)

Remember the normalized information is

i = I / N          (3.9)
  = LnN - 1 - (N₁/N) LnN₁ + N₁/N - (N₂/N) LnN₂ + N₂/N
  = LnN - (N₁/N) LnN₁ - (N₂/N) LnN₂
  = -(N₁/N) LnN₁ - (N₂/N) LnN₂ + (N₁/N) LnN + (N₂/N) LnN          (3.10)

As a result

i = -(N₁/N) Ln(N₁/N) - (N₂/N) Ln(N₂/N)          (3.11)

Letting Pᵢ = Nᵢ/N, we can generalize the information entropy formula

S = -Σᵢ₌₁ᵏ Pᵢ LnPᵢ          (3.12)

and an estimate of the Pᵢ can be obtained if we maximize the entropy [7].
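The Stirling shortcut of Equations (3.4)-(3.12) can be checked numerically: the exact normalized information (1/N) Ln[N!/(N₁! N₂!)] approaches the entropy -Σ Pᵢ Ln Pᵢ as N grows. A minimal sketch (the counts 30/70 are illustrative, not from the thesis):

```python
import math

def normalized_information(N1, N2):
    """Exact i = I/N = (1/N) Ln[ N! / (N1! N2!) ], computed via log-gamma
    to avoid overflowing the factorials."""
    N = N1 + N2
    logR = math.lgamma(N + 1) - math.lgamma(N1 + 1) - math.lgamma(N2 + 1)
    return logR / N

def entropy(probs):
    """S = -sum P_i Ln P_i, Equation (3.12)."""
    return -sum(p * math.log(p) for p in probs)

i_exact = normalized_information(30, 70)   # exact count of arrangements
S = entropy([0.3, 0.7])                    # Stirling limit, P_i = N_i / N
# i_exact is slightly below S and converges to it as N increases.
```

The gap between the two numbers is the Ln√(2πN)-type correction dropped in Equation (3.6), which vanishes relative to N.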
The Linear Problem with One Constraint
Given

S = -Σᵢ₌₁ᵏ Pᵢ LnPᵢ          (3.13)

where

Σᵢ₌₁ᵏ Pᵢ = 1          (3.14)

is the normalization constraint for k points, we carry the constraint into the entropy with a Lagrange multiplier λ:

S̃ = -Σᵢ₌₁ᵏ Pᵢ LnPᵢ - λ(Σᵢ₌₁ᵏ Pᵢ - 1)          (3.15)

Taking the variation of S̃ with respect to Pᵢ,

δS̃/δPᵢ = -[LnPᵢ + 1 + λ]          (3.16)

then set Equation (3.16) equal to zero:

LnPᵢ + 1 + λ = 0          (3.17)

therefore

Pᵢ = exp(-λ - 1)          (3.18)

To find λ we need to use the constraint condition,

Σᵢ Pᵢ = k exp(-λ - 1) = 1          (3.19)

where

exp(-λ - 1) = 1/k          (3.20)

or

Pᵢ = 1/k          (3.21)
The Linear Problem with Many Constraints
The constraints from an experiment are usually given in terms of expectation values:

⟨f_k⟩ = Σᵢ Pᵢ f_ki          (3.22)

where

Σᵢ Pᵢ = 1          (3.23)

therefore

S̃ = -Σᵢ Pᵢ LnPᵢ - λ₀(Σᵢ Pᵢ - 1) - Σ_k λ_k (Σᵢ f_ki Pᵢ - ⟨f_k⟩)          (3.24)

δS̃/δPᵢ = -[LnPᵢ + 1 + λ₀ + Σ_k λ_k f_ki]          (3.25)

Setting Equation (3.25) equal to zero and absorbing the constant 1 into λ₀, as a result

Pᵢ = exp(-λ₀ - Σ_k λ_k f_ki)          (3.26)

From the normalization constraint condition we obtain

exp(λ₀) = Σᵢ exp(-Σ_k λ_k f_ki)          (3.27)

then the probability is

Pᵢ = exp(-Σ_k λ_k f_ki) / Σᵢ exp(-Σ_k λ_k f_ki)          (3.28)
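Equation (3.28) can be exercised on a small discrete example (my own illustration, not from the thesis): find the maximum entropy distribution on the six faces of a die constrained to have mean 4.5, with f_ki = i and the single multiplier λ found by bisection.

```python
import math

def maxent_probs(lam, values):
    """P_i of Eq. (3.28) for a single constraint f(i) = i with multiplier lam."""
    w = [math.exp(-lam * v) for v in values]
    Z = sum(w)              # the normalization sum, exp(lambda_0) of Eq. (3.27)
    return [x / Z for x in w]

def solve_multiplier(values, target, lo=-10.0, hi=10.0, iters=200):
    """Bisect for lam so that the constrained mean equals the target.
    The mean is monotonically decreasing in lam, so bisection is safe."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(v * p for v, p in zip(values, maxent_probs(mid, values)))
        if mean > target:
            lo = mid        # need a larger multiplier to pull the mean down
        else:
            hi = mid
    return 0.5 * (lo + hi)

faces = [1, 2, 3, 4, 5, 6]
lam = solve_multiplier(faces, 4.5)
p = maxent_probs(lam, faces)   # exponential-family weights, mean 4.5
```

The resulting Pᵢ are exponential in i, exactly the form (3.26), and reproduce both constraints (3.22) and (3.23) to machine precision.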
CHAPTER 4
THE MOMENT PROBLEM
History of the Moment Problem
From the classical point of view, the estimation of a function associated with the moment problem can be calculated from the moments of the function. Maximum entropy is one of the techniques used for this type of problem. In order to obtain the least biased estimate of a probability density function, we need to maximize the entropy of the probability distribution subject to the constraints given by the prior information. This process can be done with the maximum entropy method by the use of Lagrange multipliers. It is not an easy job to calculate the Lagrange multipliers because they always involve a system of simultaneous non-linear equations. Since solutions of non-linear equations are not unique, the calculation can give multiple solutions to the same system of equations. Therefore, this kind of process requires us to understand the physical behavior of the system, and to generate a method to guess the initial conditions. Such a step is very important because the Lagrange multipliers are just a set of numbers from non-linear mathematical equations and they have no meaning by themselves. As a result, we need to understand the relationship between the Lagrange multipliers and the physical nature of the problem. In this paper, the main object is to find an algorithm that calculates the Lagrange multipliers explicitly.
Let us define a positive, square-integrable function, v(x), so that the moments of the function v(x) are given by

⟨xⁿ⟩ = ∫ v(x) xⁿ dx,  n = 0, 1, 2, 3, 4, ...          (4.1)

Ideally we should have an infinite number of moments, but in real life only finitely many moments can be considered. As mentioned above, such a system of equations may not be unique; it can give multiple solutions to the same system of equations. Therefore, we have to introduce another constraint: the function must decay at least as fast as exp(-|x|) for large x.
Maximum entropy is a very powerful alternative method for solving the moment problem, and it has been used successfully in optics problems, data reduction applications and physics problems.
In this paper, I am going to take a closer look at the classical method of calculating moment problems by solving a system of simultaneous non-linear equations, a job that is not always easy, and at the new iterative technique developed in this paper, which is easier to handle. I also compare the moment expansion technique and the cumulant expansion technique.
Bayesian Theorem
Suppose that an experiment is repeated a large number of times, N, resulting in: both A and B, A ∩ B, n₁₁ times; A and not B, A ∩ B̄, n₂₁ times; B and not A, Ā ∩ B, n₁₂ times; and neither A nor B, n₂₂ times. These results are presented in Table 4.1.

Table 4.1 Table for Events A and B

        A      Ā
B      n₁₁    n₁₂
B̄      n₂₁    n₂₂

Note that N = n₁₁ + n₁₂ + n₂₁ + n₂₂. Then it follows that:

P(A) = (n₁₁ + n₂₁) / (n₁₁ + n₁₂ + n₂₁ + n₂₂)          (4.2)

P(B) = (n₁₁ + n₁₂) / (n₁₁ + n₁₂ + n₂₁ + n₂₂)          (4.3)

P(A/B) = n₁₁ / (n₁₁ + n₁₂)          (4.4)

P(B/A) = n₁₁ / (n₁₁ + n₂₁)          (4.5)

P(B ∩ A) = n₁₁ / (n₁₁ + n₁₂ + n₂₁ + n₂₂)          (4.6)

P(A/B) = P(A ∩ B) / P(B) = n₁₁ / (n₁₁ + n₁₂)          (4.7)

or, conditioning everything on background information C,

P(AB/C) = P(A/BC) P(B/C)          (4.8)

also

P(A/B) + P(Ā/B) = 1          (4.9)

We can prove

P(A/BC) P(B/C) = P(B/AC) P(A/C)          (4.10)

and as a result we have Bayes' theorem:

P(A/BC) = P(B/AC) P(A/C) / P(B/C)          (4.11)
Let us consider C as our prior information about A before we receive information B. We treat P(A/C) as our prior probability of A given that we know C. If we also learn the new information B, we can use it to update to the posterior probability P(A/BC). We can see now how powerful Bayes' theorem is: it allows us to study and forecast future events from our limited information, and it also allows us to add more information, as it becomes available, in the learning process.
Sometimes we might encounter a multivariate sample space, and the Bayesian rule can be rewritten in another form. Assume the sample space S is a union of mutually exclusive subsets, that is, S = B₁ ∪ B₂ ∪ ... ∪ B_k, where Bᵢ ∩ B_k = ∅ for i ≠ k. Then any subset A of S can be written as:

A = A ∩ S
  = A ∩ (B₁ ∪ B₂ ∪ ... ∪ B_k)
  = (A ∩ B₁) ∪ (A ∩ B₂) ∪ ... ∪ (A ∩ B_k)          (4.12)

so

P(A) = P(A ∩ B₁) + P(A ∩ B₂) + ... + P(A ∩ B_k)
     = P(B₁) P(A/B₁) + P(B₂) P(A/B₂) + ... + P(B_k) P(A/B_k)
     = Σᵢ₌₁ᵏ P(Bᵢ) P(A/Bᵢ)          (4.13)

A conditional probability of the form P(Bᵢ/A) can then be evaluated as:

P(Bᵢ/A) = P(A ∩ Bᵢ) / P(A)
        = P(Bᵢ) P(A/Bᵢ) / Σᵢ P(Bᵢ) P(A/Bᵢ)          (4.14)
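A tiny numeric sketch of Table 4.1 and Equations (4.2)-(4.11); the counts below are invented for illustration:

```python
# Hypothetical counts for Table 4.1: rows are B / not-B, columns A / not-A.
n11, n12 = 40, 10   # A and B;     B and not A
n21, n22 = 20, 30   # A and not B; neither A nor B
N = n11 + n12 + n21 + n22

P_A = (n11 + n21) / N            # Eq. (4.2)
P_B = (n11 + n12) / N            # Eq. (4.3)
P_A_given_B = n11 / (n11 + n12)  # Eq. (4.4)
P_B_given_A = n11 / (n11 + n21)  # Eq. (4.5)

# Bayes' theorem, Eq. (4.11), with C taken as the implicit background:
P_A_given_B_bayes = P_B_given_A * P_A / P_B
# The directly counted conditional and the Bayes-updated one agree.
```

This is exactly the update described in the text: P(A) is the prior, and observing B converts it into the posterior P(A/B).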
Derivation of Maximum Entropy Methods
If we define the information entropy

S = -∫ v(x) Ln v(x) dx          (4.15)

where S is a functional of a continuously varying random variable x [1], then, as in Chapter 3, we use the variational method to maximize S. In order to obtain the most probable distribution function, the constraints can be carried into the entropy formalism by Lagrange multipliers:

S̃ = -∫ v(x) Ln v(x) dx - (λ₀ - 1)[∫ v(x) dx - ⟨x⁰⟩] - Σₙ₌₁ᵏ λₙ [∫ xⁿ v(x) dx - ⟨xⁿ⟩]          (4.16)

where k is an even integer and the λₙ are the Lagrange multipliers.
If we take δS̃ with respect to δv and set it equal to zero, then we have an expression for the probability density function:

v(x) = exp[-Σₙ₌₀ᵏ λₙ xⁿ]          (4.17)

Let

φ(k) = ∫ exp(ikx) v(x) dx = ∫ exp(ikx) exp[-Σₙ₌₀ᵏ λₙ xⁿ] dx          (4.18)

where we have assumed λ_k > 0 so that the integral converges. Differentiating v(x) gives v'(x) = -Σₙ₌₁ᵏ n λₙ x^(n-1) v(x); if we take the Fourier transform of this relation, integrate by parts, and use the boundary conditions at infinity (v → 0), then we have a differential equation for the Fourier transform of the probability density function:

ik φ(k) = Σₙ₌₁ᵏ n λₙ (-i ∂/∂k)^(n-1) φ(k)          (4.19)

Assume the moment expansion

φ(k) = Σₙ₌₀ (ik)ⁿ ⟨xⁿ⟩ / n!          (4.20)

and substitute φ(k) back into Equation (4.19), then arrange all coefficients according to their powers of (ik). As a result, we have a system of linear equations:

Σₙ₌₁ᵏ n λₙ ⟨x^(n+j-2)⟩ = (j - 1) ⟨x^(j-2)⟩          (4.21)

for j = 1, 2, 3, 4, ..., m (the j = 1 equation has right-hand side zero, and m must be an even integer). The reason why m must be an even integer will be explained later.
Put into matrix form, we have

| ⟨x⁰⟩ ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ... |  | λ₁  |   | 0       |
| ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ... |  | 2λ₂ |   | ⟨x⁰⟩   |
| ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ... |  | 3λ₃ | = | 2⟨x¹⟩  |
| ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ⟨x⁶⟩ ... |  | 4λ₄ |   | 3⟨x²⟩  |
| ...                          |  | ... |   | ...     |          (4.22)

Let us define the above equation as

X p = d          (4.23)

where X is the matrix of moments,

X_jn = ⟨x^(n+j-2)⟩          (4.24)

p = (λ₁, 2λ₂, 3λ₃, 4λ₄, ..., kλ_k)ᵗ   (t denotes transpose)          (4.25)

and

d = (0, ⟨x⁰⟩, 2⟨x¹⟩, 3⟨x²⟩, ..., (k-1)⟨x^(k-2)⟩)ᵗ          (4.26)

Ideally, we could solve the infinite system of linear equations (4.22) to obtain the Lagrange multipliers λₙ. If we know all the λₙ, then we can calculate λ₀ by using the following formula:

v(0) = exp(-λ₀) = 1/A          (4.27)

where

A = ∫ exp[-Σᵢ₌₁ᵏ λᵢ xⁱ] dx          (4.28)

so we have finally completed the solution of v(x):

v(x) = exp[-Σᵢ₌₀ᵏ λᵢ xⁱ]          (4.29)

Here we have assumed that ⟨x⁰⟩ = 1.
Now, let us look at another alternative. If we define φ(k) in terms of cumulants instead of moments and use it to expand the Fourier transform of the probability distribution, we will have a different result. Let

φ(k) = exp[Σₙ₌₀ᵏ (ik)ⁿ cₙ / n!]          (4.30)

If we substitute this into Equation (4.19), expand the equation, and identify the coefficients of the powers of k, we will have another form of linear system of equations:

| 1  c₁  c₁² + c₂            c₁³ + 3c₁c₂ + c₃               ... |  | λ₁  |   | 0 |
| 0  c₂  2c₁c₂ + c₃          3c₁²c₂ + 3c₁c₃ + 3c₂² + c₄      ... |  | 2λ₂ |   | 1 |
| 0  c₃  2c₁c₃ + 2c₂² + c₄   3c₁²c₃ + .... + 9c₂c₃ + c₅      ... |  | 3λ₃ | = | 0 |
| 0  c₄  2c₁c₄ + 6c₂c₃ + c₅  3c₁²c₄ + ... + 9c₃² + c₆        ... |  | 4λ₄ |   | 0 |
| ...                                                            |  | ... |   | ...|          (4.31)

If we define it as

C β = a          (4.32)

where C is the matrix of cumulants written above,

β = (λ₁, 2λ₂, 3λ₃, 4λ₄, ..., kλ_k)ᵗ          (4.33)

and

a = (0, 1, 0, 0, ...)ᵗ          (4.34)
Moments and Cumulants
Since we can use either the moment method or the cumulant method to expand the Fourier transform, we should know the relationship between them in order to understand how they affect each other. From Equations (4.20) and (4.30),

Σₙ₌₀ (ik)ⁿ ⟨xⁿ⟩ / n! = exp[Σₙ₌₀ᵏ (ik)ⁿ cₙ / n!]          (4.35)

Writing t = ik, the left-hand side is

lhs = 1 + ⟨x⟩t + ⟨x²⟩t²/2! + ... + ⟨xʳ⟩tʳ/r! + ...          (4.36)

and the right-hand side is

rhs = exp(c₁t + c₂t²/2! + ... + cᵣtʳ/r! + ...)
    = exp(c₁t) exp(c₂t²/2!) exp(c₃t³/3!) ...
    = {1 + c₁t/1! + (1/2!)(c₁t/1!)² + ...}{1 + c₂t²/2! + (1/2!)(c₂t²/2!)² + ...}{1 + c₃t³/3! + ...} ...          (4.37)

It is worth noting that the very tedious process of writing down the explicit relations for particular values of n may be shortened considerably. In fact, differentiating Equation (4.35) with respect to cⱼ we have

∂/∂cⱼ (1 + ⟨x⟩t + ... + ⟨xʳ⟩tʳ/r! + ...) = (tʲ/j!)(1 + ⟨x⟩t + ... + ⟨xʳ⟩tʳ/r! + ...)          (4.38)

hence, identifying the powers of t,

∂⟨xʳ⟩/∂cⱼ = C(r, j) ⟨x^(r-j)⟩          (4.39)

where C(r, j) is the binomial coefficient. In particular, for j = 1,

∂⟨xʳ⟩/∂c₁ = r ⟨x^(r-1)⟩          (4.40)

Therefore, given any ⟨xʳ⟩ in terms of the c's, we can write down successively those of lower orders by differentiation:

⟨x⟩ = c₁          (4.41)

⟨x²⟩ = c₂ + c₁²          (4.42)

⟨x³⟩ = c₃ + 3c₂c₁ + c₁³          (4.43)

⟨x⁴⟩ = c₄ + 4c₃c₁ + 3c₂² + 6c₂c₁² + c₁⁴          (4.44)

⟨x⁵⟩ = c₅ + 5c₄c₁ + 10c₃c₂ + 10c₃c₁² + 15c₂²c₁ + 10c₂c₁³ + c₁⁵          (4.45)

⟨x⁶⟩ = c₆ + 6c₅c₁ + 15c₄c₂ + 15c₄c₁² + 10c₃² + 60c₃c₂c₁ + 20c₃c₁³ + 15c₂³ + 45c₂²c₁² + 15c₂c₁⁴ + c₁⁶          (4.46)

The above moments are the regular (raw) moments. Conversely, from

c₁t/1! + ... + cᵣtʳ/r! + ... = Ln(1 + ⟨x⟩t/1! + ... + ⟨xʳ⟩tʳ/r! + ...)          (4.47)

we will have the first six terms as follows:

c₁ = ⟨x⟩          (4.48)

c₂ = ⟨x²⟩ - ⟨x⟩²          (4.49)

c₃ = ⟨x³⟩ - 3⟨x²⟩⟨x⟩ + 2⟨x⟩³          (4.50)

c₄ = ⟨x⁴⟩ - 4⟨x³⟩⟨x⟩ - 3⟨x²⟩² + 12⟨x²⟩⟨x⟩² - 6⟨x⟩⁴          (4.51)

c₅ = ⟨x⁵⟩ - 5⟨x⁴⟩⟨x⟩ - 10⟨x³⟩⟨x²⟩ + 20⟨x³⟩⟨x⟩² + 30⟨x²⟩²⟨x⟩ - 60⟨x²⟩⟨x⟩³ + 24⟨x⟩⁵          (4.52)

c₆ = ⟨x⁶⟩ - 6⟨x⁵⟩⟨x⟩ - 15⟨x⁴⟩⟨x²⟩ + 30⟨x⁴⟩⟨x⟩² - 10⟨x³⟩² + 120⟨x³⟩⟨x²⟩⟨x⟩ - 120⟨x³⟩⟨x⟩³ + 30⟨x²⟩³ - 270⟨x²⟩²⟨x⟩² + 360⟨x²⟩⟨x⟩⁴ - 120⟨x⟩⁶          (4.53)
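The conversions (4.41)-(4.44) and (4.48)-(4.51) can be checked by a round trip. For the exponential density exp(-x), whose raw moments are n!, the cumulants come out as (n-1)! = 1, 1, 2, 6. A sketch of the first four orders:

```python
def cumulants_from_moments(m):
    """c1..c4 from raw moments m[n] = <x^n>, Eqs. (4.48)-(4.51)."""
    m1, m2, m3, m4 = m[1], m[2], m[3], m[4]
    c1 = m1
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    c4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return [c1, c2, c3, c4]

def moments_from_cumulants(c):
    """<x^1>..<x^4> from cumulants, Eqs. (4.41)-(4.44)."""
    c1, c2, c3, c4 = c
    m1 = c1
    m2 = c2 + c1 ** 2
    m3 = c3 + 3 * c2 * c1 + c1 ** 3
    m4 = c4 + 4 * c3 * c1 + 3 * c2 ** 2 + 6 * c2 * c1 ** 2 + c1 ** 4
    return [m1, m2, m3, m4]

expo = [1, 1, 2, 6, 24]   # raw moments n! of exp(-x), n = 0..4
```

Feeding the exponential moments through both directions recovers the starting values exactly, which is a quick consistency check on the coefficient tables above.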
Analysis of the MAXENT Algorithm
From Equation (4.22), we can see the system of linear equations is infinite, and it is impossible to solve such a system exactly. There are many ways to approximate the solution to this kind of system of equations. One of these ways is to truncate the system of linear equations to an even size K (K must be an even number). The truncated system will include the higher order moments, which is not the case for a Gaussian distribution: a Gaussian requires only two moments to describe its characteristics. Therefore, maximum entropy becomes a problem of a non-linear nature. In this case, we need to intelligently guess the higher moments in order to solve the system for the Lagrange multipliers, except for the Gaussian distribution. We can obtain the Lagrange multipliers for a Gaussian distribution by using two linear equations; higher order moments are not required here because they do not influence the solution.
The algorithm I am developing here can improve the approximation to maximum entropy by partitioning the equations into two matrix systems,

A p = d          (4.54)

B β = b          (4.55)

The second system has H equations (H is an even number). Assume ⟨xᵏ⟩ is the highest moment that is going to be used, and all the higher moments after ⟨x^(k+2)⟩ are set to zero. The second system is rearranged completely so that we solve for moments instead of Lagrange multipliers: B is an H by H matrix built from the Lagrange multipliers, and β is an H by 1 vector of moments. From the first system, we approximate the values of the moments in A and use these values to solve for the Lagrange multipliers. Then the calculated Lagrange multipliers are inserted into the second system of equations to calculate new moments. After that, the new moment values from Equation (4.55) are substituted back to update the matrix A in Equation (4.54). The moments and the Lagrange multipliers finally converge in a few iterations. The above method will only work under two assumptions: the first is an educated initial guess; the second requires that the Lagrange multipliers be consistent with the integrability of the distribution.
There is no difference between Equation (4.22) and the system of linear equations obtained from a non-linear least squares approximation. Nevertheless, in the least squares approach, the development of the moment matrix is based on the assumption that the underlying probability distribution is constant, and we also need a series expression of v(x) in terms of expansion functions. In the MAXENT approach, no such assumptions are made. In addition, the maximum entropy method is more flexible, and we can feed additional constraint information into the system of equations without any trouble.
When I used the maximum entropy algorithm, I found some interesting points. If I used ten or more moments in calculating the Lagrange multipliers, the numerical solutions for the Lagrange multipliers were not as good as I wanted them to be; they were very noisy. This might be due to an ill-conditioned matrix, because the values of the elements in the first column and the last column are too different: the values of the first column are too small, but the values of the last column are too big. The highest moment I ever used in my algorithm is the sixth moment, which seemed to give good results. Finally, the numerical values of the Lagrange multipliers calculated from the moment method and the cumulant method were very close to each other.
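The noisiness reported above for ten or more moments is consistent with the conditioning of the moment matrix in (4.22), whose rows are shifted moment sequences (a Hankel structure). A quick numerical check of that remark (my own illustration, using unit-variance Gaussian moments):

```python
import numpy as np

def gaussian_moments(kmax):
    """Raw moments <x^n> of a zero-mean, unit-variance Gaussian:
    (n-1)!! for even n, 0 for odd n, via <x^n> = (n-1) <x^(n-2)>."""
    m = [1.0, 0.0]
    for n in range(2, kmax + 1):
        m.append((n - 1) * m[n - 2])
    return m

def moment_matrix(size):
    """Hankel-structured matrix with entries <x^(i+j)>, as in Eq. (4.22)."""
    m = gaussian_moments(2 * size)
    return np.array([[m[i + j] for j in range(size)] for i in range(size)])

cond4 = np.linalg.cond(moment_matrix(4))
cond10 = np.linalg.cond(moment_matrix(10))
# cond10 is orders of magnitude larger than cond4: the first and last
# columns differ wildly in scale, exactly the ill-conditioning noted above.
```

This is why truncating at around the sixth moment, as the text reports, tends to behave better numerically than using ten or more.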
Application in Infinite Domain
Let us consider the classical problem. As we mentioned earlier, the Gaussian distribution is one of the special functions for which only two moments are needed to describe it. So the Lagrange multipliers can be calculated from two linear equations. Put into matrix form (with ⟨x⁰⟩ = 1),

| ⟨x⁰⟩ ⟨x¹⟩ |  | λ₁  |   | 0     |
| ⟨x¹⟩ ⟨x²⟩ |  | 2λ₂ | = | ⟨x⁰⟩ |          (4.56)

Using Cramer's rule, we obtain

λ₁ = -⟨x⟩ / (⟨x²⟩ - ⟨x⟩²)          (4.57)

λ₂ = 1 / [2(⟨x²⟩ - ⟨x⟩²)]          (4.58)

exp(-λ₀) = 1 / ∫ exp(-Σₙ₌₁² λₙ xⁿ) dx          (4.59)

Example 1: a distribution of Gaussian type. Let

f(x) = exp(-x²)          (4.60)

The normalized moments are

⟨xⁿ⟩ = ∫ exp(-x²) xⁿ dx / ∫ exp(-x²) dx = Γ((n+1)/2) / Γ(1/2)          (4.61)

for n = 0, 2, 4, ... (even numbers); the moments are equal to zero for n = 1, 3, 5, ... (odd numbers):

⟨x⁰⟩ = 1
⟨x¹⟩ = ⟨x³⟩ = ⟨x⁵⟩ = 0
⟨x²⟩ = 0.5
⟨x⁴⟩ = 0.75
⟨x⁶⟩ = 1.875

Therefore the Lagrange multipliers are calculated from the following equation:

| 1    0     0.5   0     |  | λ₁  |   | 0   |
| 0    0.5   0     0.75  |  | 2λ₂ |   | 1   |
| 0.5  0     0.75  0     |  | 3λ₃ | = | 0   |
| 0    0.75  0     1.875 |  | 4λ₄ |   | 1.5 |          (4.62)

We get the solutions for this system:

λ₂ = 1
λ₁ = λ₃ = λ₄ = 0

In this case, the solution is exact. In fact, if we set the higher moments equal to zero, we can still get the same result, because a Gaussian distribution is completely determined by its first and second moments. Therefore

v(x) = exp(-x²)          (4.63)

up to the normalization constant exp(-λ₀). This expression is plotted in Figure 4.1.
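The 4x4 system (4.62) can be solved numerically; with the normalized Gaussian moments above, the solution vector p = (λ₁, 2λ₂, 3λ₃, 4λ₄) should come out as (0, 2, 0, 0), i.e. λ₂ = 1. A sketch, with the matrix built exactly as in (4.22) and right-hand side entries (j-1)⟨x^(j-2)⟩:

```python
import numpy as np

# Normalized moments <x^n> of exp(-x^2), n = 0..7 (odd moments vanish).
m = [1.0, 0.0, 0.5, 0.0, 0.75, 0.0, 1.875, 0.0]

# Matrix of Eq. (4.62): row j, column n holds <x^(n+j)>, j, n = 0..3.
X = np.array([[m[n + j] for n in range(4)] for j in range(4)])

# Right-hand side d = (0, <x^0>, 2<x^1>, 3<x^2>).
d = np.array([0.0, m[0], 2 * m[1], 3 * m[2]])

p = np.linalg.solve(X, d)   # p = (lam1, 2*lam2, 3*lam3, 4*lam4)
lam2 = p[1] / 2             # recovers lambda_2 = 1, i.e. v(x) = exp(-x^2)
```

As the text notes, the higher moments drop out: the solver returns zero for λ₁, λ₃ and λ₄, so only the second moment shapes the answer.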
Application in Semi-Infinite Domain
We define the moments

⟨xⁿ⟩ = ∫₀^∞ v(x) xⁿ dx          (4.64)

and the Fourier transform of the probability density function (equivalently, a Laplace transform with s = -ik),

φ(k) = ∫₀^∞ v(x) exp(ikx) dx = Σₙ₌₀ (ik)ⁿ ⟨xⁿ⟩ / n!          (4.65)

In the semi-infinite domain, integration by parts leaves a boundary term v(0), so we can calculate the Lagrange multipliers from the following equation:
Figure 4.1  Comparison of f[x] and v[x]
| ⟨x⁰⟩ ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ... |  | λ₁  |   | v(0)   |
| ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ... |  | 2λ₂ |   | ⟨x⁰⟩  |
| ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ... |  | 3λ₃ | = | 2⟨x¹⟩ |
| ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ⟨x⁶⟩ ... |  | 4λ₄ |   | 3⟨x²⟩ |
| ...                          |  | ... |   | ...    |          (4.66)
Example 1: a distribution function with initial conditions. Let

v(x) = exp(-x)          (4.67)

The moments are given as

⟨xⁿ⟩ = ∫₀^∞ exp(-x) xⁿ dx = n!          (4.68)

where

⟨x⁰⟩ = 1
⟨x¹⟩ = 1
⟨x²⟩ = 2
⟨x³⟩ = 6
⟨x⁴⟩ = 24

Substituting these values into Equation (4.66) (with v(0) = 1),

| 1  1  2  |  | λ₁  |   | 1 |
| 1  2  6  |  | 2λ₂ | = | 1 |
| 2  6  24 |  | 3λ₃ |   | 2 |          (4.69)

we get

λ₁ = 1
λ₂ = λ₃ = 0

Again the maximum entropy solution is exact:

v(x) = exp(-x)          (4.70)

This function is plotted in Figure 4.2.
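The same linear solve reproduces Example 1: building the 3x3 system (4.69) with moments n! and boundary value v(0) = 1 returns p = (λ₁, 2λ₂, 3λ₃) = (1, 0, 0), i.e. the exact exponential. A sketch:

```python
import numpy as np

m = [1.0, 1.0, 2.0, 6.0, 24.0]   # <x^n> = n! for v(x) = exp(-x)
v0 = 1.0                          # boundary value v(0)

# Matrix of Eq. (4.69): row j, column n holds <x^(n+j)>.
X = np.array([[m[n + j] for n in range(3)] for j in range(3)])

# Right-hand side (v(0), <x^0>, 2<x^1>) from the semi-infinite system (4.66).
d = np.array([v0, m[0], 2 * m[1]])

p = np.linalg.solve(X, d)   # (lam1, 2*lam2, 3*lam3) -> (1, 0, 0)
```

The first row is the one the boundary term contributes; without it the system would be underdetermined for the semi-infinite domain.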
Now let us look at another application of Equation (4.66), which yields an inexact solution.

Example 2: Let

v(x) = x exp(-x)          (4.71)

The moments are given as

⟨xⁿ⟩ = ∫₀^∞ x exp(-x) xⁿ dx = (n+1)!          (4.72)

where

⟨x⁰⟩ = 1
⟨x¹⟩ = 2
⟨x²⟩ = 6
⟨x³⟩ = 24
⟨x⁴⟩ = 120
⟨x⁵⟩ = 720
⟨x⁶⟩ = 5040
Figure 4.2  Comparison of f[x] and v[x]
Substituting these values into Equation (4.66) (here v(0) = 0),

| 1   2    6    24   |  | λ₁  |   | 0  |
| 2   6    24   120  |  | 2λ₂ | = | 1  |
| 6   24   120  720  |  | 3λ₃ |   | 4  |
| 24  120  720  5040 |  | 4λ₄ |   | 18 |          (4.73)

The solutions are

λ₁ = -3
λ₂ = 1.5
λ₃ = -0.22222
λ₄ = 0.0104167

therefore the function can be approximated as

v(x) = exp(-2.75 + 3x - 1.5x² + 0.22222x³ - 0.0104167x⁴)          (4.74)

This expression is plotted in Figure 4.3.
Application in Finite Domain
The domain of the probability density function is now a finite region a <= x <= b. In this case, we define the moments
Figure 4.3  Comparison of f[x] and v[x]
⟨xⁿ⟩ = ∫ₐᵇ v(x) xⁿ dx          (4.75)

and the Fourier transform of the probability function

φ(k) = ∫ₐᵇ v(x) exp(ikx) dx = Σₙ₌₀ (ik)ⁿ ⟨xⁿ⟩ / n!          (4.76)

If we require that the probability distribution is not zero at the initial and end points, then we must use the boundary conditions to calculate the Lagrange multipliers. Substituting these conditions into Equation (4.22), we have

| ⟨x⁰⟩ ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ... |  | λ₁  |   | v(a) - v(b)               |
| ⟨x¹⟩ ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ... |  | 2λ₂ |   | ⟨x⁰⟩ + a v(a) - b v(b)   |
| ⟨x²⟩ ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ... |  | 3λ₃ | = | 2⟨x¹⟩ + a² v(a) - b² v(b) |
| ⟨x³⟩ ⟨x⁴⟩ ⟨x⁵⟩ ⟨x⁶⟩ ... |  | 4λ₄ |   | 3⟨x²⟩ + a³ v(a) - b³ v(b) |
| ...                          |  | ... |   | ...                        |          (4.77)

Example 1: Let

v(x) = x exp(-x²)          (4.78)
The moments are defined as

⟨xⁿ⟩ = ∫₀^∞ x exp(-x²) xⁿ dx = Γ((n+2)/2) / 2          (4.79)

where

⟨x⁰⟩ = 0.5
⟨x¹⟩ = 0.44311134
⟨x²⟩ = 0.5
⟨x³⟩ = 0.66467
⟨x⁴⟩ = 1
⟨x⁵⟩ = 1.6616755
⟨x⁶⟩ = 3

Using Equation (4.77), we can approximate the Lagrange multipliers:

λ₁ = -10.2
λ₂ = 12.415
λ₃ = -5.653
λ₄ = 0.969

therefore the function can be approximated as

v(x) = exp(-3.5 + 10.2x - 12.415x² + 5.653x³ - 0.969x⁴)          (4.80)

This function is plotted in Figure 4.4.
Figure 4.4  Comparison of f[x] and v[x]
Example 2: Gamma distribution.
The gamma distribution is defined as

p(y) = y^(α-1) exp(-y/β) / (β^α Γ(α)),  y > 0    (4.81)

The moment generating function m(t) of a continuous random variable y is defined as E[exp(ty)]; therefore

m(t) = E[exp(ty)] = ∫_0^∞ exp(ty) [y^(α-1) / (β^α Γ(α))] exp(-y/β) dy = (1 - βt)^(-α)    (4.82)
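The closed form in Equation (4.82) can be confirmed by integrating the gamma density directly. A small Python sketch (valid only for t < 1/β; the cutoff and step count are illustrative):

```python
import math

def gamma_pdf(y, alpha, beta):
    # gamma density of Eq. (4.81)
    return y**(alpha - 1) * math.exp(-y / beta) / (beta**alpha * math.gamma(alpha))

def mgf_numeric(t, alpha, beta, upper=120.0, steps=60_000):
    # E[exp(t*y)] by trapezoidal integration of exp(t*y) * p(y)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(t * y) * gamma_pdf(y, alpha, beta)
    return total * h

t, alpha, beta = 0.1, 2.5, 2.0
print(mgf_numeric(t, alpha, beta), (1 - beta * t)**(-alpha))
```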
Using a Taylor series expansion, we can expand Equation (4.82):

m(t) = (1 - βt)^(-α)
     = 1 + αβt + [α(α+1)/2!] (βt)^2 + [α(α+1)(α+2)/3!] (βt)^3 + ...    (4.83)
therefore, in general,

<y^k> = α(α+1)(α+2)...(α+k-1) β^k    (4.84)

If we choose α = 2.5, β = 2, then

<y^0> = 1
<y^1> = 5
<y^2> = 35
<y^3> = 315
<y^4> = 3465
<y^5> = 45045
<y^6> = 675675
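Equation (4.84) reproduces the listed moments for α = 2.5, β = 2; a one-function Python check (illustrative):

```python
def gamma_moment(k, alpha, beta):
    # E[y^k] = alpha*(alpha+1)*...*(alpha+k-1) * beta**k, per Eq. (4.84)
    prod = 1.0
    for j in range(k):
        prod *= alpha + j
    return prod * beta**k

moments = [gamma_moment(k, 2.5, 2.0) for k in range(7)]
print(moments)  # 1, 5, 35, 315, 3465, 45045, 675675
```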
Using Equation (4.76), we can approximate the Lagrange multipliers:

λ1 = -1.50001
λ2 = 0.300003
λ3 = -0.0190478
λ4 = 0.000396831
therefore, the function can be approximated as

V(x) = exp(-4.1 + 1.5x - 0.3x^2 + 0.019x^3 - 0.000397x^4)    (4.85)

This function is plotted in Figure 4.5.
Figure 4.5. Comparison of f[x] and v[x]
Constructing PDF from the Experimental Data
Experiment 1. This experiment is plotted in Figure 4.6.
Experiment 2. This experiment is plotted in Figure 4.7.
Experiment 3. This experiment is plotted in Figure 4.8.
Experiment 4. This experiment is plotted in Figure 4.9.
Experiment 5. This experiment is plotted in Figure 4.10.

All of these experimental data were obtained from the laboratory at the University of Colorado at Denver; the underlying probability density functions are unknown.
Figure 4.6. v[x] is an unknown probability density function
Figure 4.7. v[x] is an unknown probability density function
Figure 4.8. v[x] is an unknown probability density function
Figure 4.9. v[x] is an unknown probability density function
Figure 4.10. v[x] is an unknown probability density function
CHAPTER 5
DISCUSSION AND CONCLUSION
In this paper, we have carefully examined the classical moment problems using maximum entropy. An explicit procedure for approximating the Lagrange multipliers has been developed for infinite and finite domains for the classical problems. This method is based on developing a differential equation for the Fourier transform of the probability distribution. The new method allows a physical interpretation of the various Lagrange multipliers in terms of the cumulants and moments of the distribution. The algorithm is also well suited to solving certain types of non-linear differential equations. The difference between this algorithm and the old classical algorithm is that the new method eliminates the tedious solution of systems of non-linear equations, replacing it with the solution of two linear systems of equations. Also, the new algorithm does not assume, as the maximum likelihood approach does, that the underlying probability distribution is Gaussian, an assumption that is not always true in the real world.
The MAXENT algorithm allows us to truncate the infinite sum in the expansion of the Fourier transform of the probability density function. In fact, the algorithm uses only the first few moments to approximate the Lagrange multipliers, yet it gives good results.
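The appendix programs fix the remaining multiplier by normalization, e = Log[NIntegrate[g[x], {x, 0, 10}]]. The same step can be sketched in Python for Example 2 of the semi-infinite case (the multipliers come from Eq. (4.73); the trapezoidal rule and the [0, 10] cutoff mirror the appendix programs and are otherwise arbitrary):

```python
import math

# lambda1..lambda4 fitted to f(x) = x*exp(-x), from Eq. (4.73)
lam = [-3.0, 1.5, -0.22222, 0.0104167]

def g(x):
    # un-normalized maximum entropy density exp(-sum_k lambda_k x^k)
    return math.exp(-sum(l * x**(k + 1) for k, l in enumerate(lam)))

# lambda0 = log of the normalizing integral (trapezoidal rule on [0, 10])
steps, upper = 100_000, 10.0
h = upper / steps
z = h * sum(g(i * h) * (0.5 if i in (0, steps) else 1.0)
            for i in range(steps + 1))
lam0 = math.log(z)
print("lambda0 =", lam0)          # the thesis reports lambda0 = 2.75

def v(x):
    return math.exp(-lam0) * g(x)

# v(x) should track f(x) = x*exp(-x) closely
for x in (0.5, 1.0, 2.0):
    print(x, v(x), x * math.exp(-x))
```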
APPENDIX
(* This program is used to plot Figure 4.1. *)
(* Programming language: Mathematica. *)
a = 0
b = 1
c = 0
d = 0
g[x_] = Exp[-a*x - b*x^2 - c*x^3 - d*x^4]
e = 0
v[x_] = Exp[-e - a*x - b*x^2 - c*x^3 - d*x^4]
Plot[ v[x], {x, -3, 3}, AxesLabel -> {"x", "v[x]"}]
f[x_] = Exp[-x^2]
Plot[ { v[x], f[x] }, {x, -3, 3}, AxesLabel -> {"x", "f[x] and v[x]"} ]
(* This program is used to plot Figure 4.2. *)
(* Programming language: Mathematica. *)
a = 1
b = 0
c = 2.9*10^-8
d = 3.7*10^-9
g[x_] = Exp[-a*x - b*x^2 - c*x^3 - d*x^4]
init = NIntegrate[ g[x], { x, 0, 10} ]
e = Log[ init ]
v[x_] = Exp[-e - a*x - b*x^2 - c*x^3 - d*x^4]
Plot[ v[x], {x, 0, 3}, AxesLabel -> {"x", "v[x]"}]
f[x_] = Exp[ -x ]
Plot[ { v[x], f[x] }, {x, 0, 3}, AxesLabel -> {"x", "f[x] and v[x]"} ]
(* This program is used to plot Figure 4.3. *)
(* Programming language: Mathematica. *)
a = -3
b = 1.5
c = -0.22222
d = 0.0104167
g[x_] = Exp[-a*x - b*x^2 - c*x^3 - d*x^4]
init = NIntegrate[ g[x], { x, 0, 10} ]
e = Log[ init ]
v[x_] = Exp[-e - a*x - b*x^2 - c*x^3 - d*x^4]
Plot[ v[x], {x, 0, 5}, AxesLabel -> {"x", "v[x]"}]
f[x_] = x*Exp[ -x ]
Plot[ { v[x], f[x] }, {x, 0, 3}, AxesLabel -> {"x", "f[x] and v[x]"} ]
(* This program is used to plot Figure 4.4. *)
(* Programming language: Mathematica. *)
a = -10.2
b = 12.415
c = -5.653
d = 0.969
g[x_] = Exp[-a*x - b*x^2 - c*x^3 - d*x^4]
init = NIntegrate[ g[x], { x, 0, 10} ]
e = Log[ init ]
v[x_] = Exp[-e - a*x - b*x^2 - c*x^3 - d*x^4]
Plot[ v[x], {x, 0, 5}, AxesLabel -> {"x", "v[x]"}]
f[x_] = x*Exp[ -x ]
Plot[ { v[x], f[x] }, {x, 0, 3}, AxesLabel -> {"x", "f[x] and v[x]"} ]
(* This program is used to plot Figure 4.5. *)
(* Programming language: Mathematica. *)
a = -1.5
b = 0.3
c = -2/105
d = 1/2520
g[x_] = Exp[-a*x - b*x^2 - c*x^3 - d*x^4]
init = NIntegrate[ g[x], { x, 0, 10} ]
e = Log[ init ]
v[x_] = Exp[-e - a*x - b*x^2 - c*x^3 - d*x^4]
Plot[ v[x], {x, 0, 5}, AxesLabel -> {"x", "v[x]"}]
f[x_] = x*Exp[ -x ]
Plot[ { v[x], f[x] }, {x, 0, 3}, AxesLabel -> {"x", "f[x] and v[x]"} ]
! program: MAXENT.BAS
! programming language: VAX Basic
print chr$(30); chr$(92); chr$(50); chr$(74);
print chr$(27); chr$(92); chr$(74);
main:
y$ = "y"
until ( y$ <> "y" and y$ <> "Y" )
    input "input (size) n1 "; n1%
    input "input (size) n "; n%
    input "iteration time 'd' "; d
    length1 = 2 * n1% - 1
    length = 2 * n% - 1
    dim m(n1%,n1%), inv_m(n1%,n1%), b(n1%,1), lamda(n1%,1), g(length1,1)
    dim ma(n%,n%), inv_ma(n1%,n1%), c(n1%,1), x(n1%,1), h(length,1)
    print "input matrix element g[i], size is n1"
    mat input g
    when error in
        for d1 = 1 to d
            mat b = zer
            for l = 2 to n1%
                b(l,1) = (l-1) * g(l-1,1)
            next l
            k = 0
            for i = 1 to n%
                for j = 1 to n1%
                    m(i,j) = j * g(j+k,1)
                next j
                k = k + 1
            next i
            call print_matrix ("matrix m", m(,))
            call print_matrix ("matrix b", b(,))
            mat inv_m = inv(m)
            call print_matrix ("inversed matrix inv_m", inv_m(,))
            call calculate_solution_x ( lamda(,), inv_m(,), b(,) )
            call print_matrix ("lamda", lamda(,))
            mat c = zer
            c(1,1) = -g(5,1) * lamda(1,1) + 4 * g(4,1)
            c(2,1) = 5 * g(5,1)
            mat h = zer
            h(1,1) = -6
            for l = 2 to 5
                h(l,1) = lamda(l-1,1)
            next l
            mat ma = zer
            k = 3
            for i = 1 to n%
                b0 = 0
                for j = 1 to n%
                    if ( (b0+k) >= 1 ) then
                        if ( (b0+k-1) = 0 ) then
                            ma(i,j) = h(1,1)
                            h(1,1) = h(1,1) - 1
                        else
                            ma(i,j) = (b0+k-1) * h(b0+k,1)
                        end if
                    end if
                    b0 = b0 + 1
                next j
                k = k - 1
            next i
            call print_matrix ("matrix ma", ma(,))
            call print_matrix ("matrix c", c(,))
            call print_matrix ("matrix h", h(,))
            mat inv_ma = inv(ma)
            call print_matrix ("inversed matrix ma", inv_ma(,))
            call calculate_solution_x ( x(,), inv_ma(,), c(,) )
            call print_matrix ("x", x(,))
            for l = length-1 to length
                g(l,1) = x(l-5,1)
            next l
            call print_matrix ("solution matrix g", g(,))
        next d1
        call print_matrix ("solution matrix x", x(,))
    use
        print "unresolvable matrix, try again"
    end when
    input "continue? "; y$
next
end

sub initial_lamda ( hfloat m(,), b(,) )
    mat b = zer
    for l = 2 to n1%
        b(l,1) = (l-1) * g(l-1,1)
    next l
    k = 0
    for i = 1 to n1%
        for j = 1 to n1%
            m(i,j) = j * g(j+k,1)
        next j
        k = k + 1
    next i
end sub

sub initial_ma ( ma(,), c(,), h(,), lamda(,) )
    mat c = zer
    c(1,1) = -g(5,1) * lamda(1,1) + 4 * g(4,1)
    c(2,1) = 5 * g(5,1)
    mat h = zer
    h(1,1) = -6
    for l = 2 to 5
        h(l,1) = lamda(l-1,1)
    next l
    mat ma = zer
    k = 3
    for i = 1 to n%
        b0 = 0
        for j = 1 to n%
            if ( (b0+k) >= 1 ) then
                if ( (b0+k-1) = 0 ) then
                    ma(i,j) = h(1,1)
                    h(1,1) = h(1,1) - 1
                else
                    ma(i,j) = (b0+k-1) * h(b0+k,1)
                end if
            end if
            b0 = b0 + 1
        next j
        k = k - 1
    next i
end sub

sub calculate_solution_x ( hfloat x(,), inv_a(,), b(,) )
    mat x = inv_a * b
end sub

sub print_matrix ( a$, hfloat out_matrix(,) )
    print a$
    mat print out_matrix;
end sub