A DERIVATION OF A
GENERALIZED CAUCHY DISTRIBUTION
FOR
FLOOD FREQUENCY APPLICATIONS
by
Stephen Rocky Durrans
B.S.C.E., University of Colorado at Denver, 1985
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Civil Engineering
1988
This thesis for the Master of Science degree by
Stephen Rocky Durrans
has been approved for the
Department of
Civil Engineering
by
Date
Durrans, Stephen Rocky (M.S., Civil Engineering)
A Derivation of a Generalized Cauchy Distribution for Flood
Frequency Applications
Thesis directed by Assistant Professor Chwen-Yuan Guo
Commonly used probability distributions for flood frequency
analyses have both theoretical and practical strengths and
weaknesses. Based upon a knowledge of these attributes, and an
enumeration of desirable characteristics, this thesis derives a
new probability distribution which simultaneously satisfies the
desired properties. Termed here as the "generalized Cauchy," the
derived distribution is a 3-parameter version of the well-known
Cauchy distribution. Following the derivation, methods of
estimation of the parameters of the distribution are discussed.
Comparisons of fits of the distribution with those obtainable by
applying the Pearson Type III are also made for a number of
actually observed data series and an assessment of the ability of
the derived distribution to explain the separation effect is made.
Results and conclusions of the study identify a new direction for
future flood frequency research. Knowledge of the derived
distribution also contributes to the repertoire of tools available
to the statistician.
The form and content of this abstract are approved. I recommend
its publication.
Signed_
CONTENTS
CHAPTER
I. INTRODUCTION ....................................... 1
Literature Review ................................ 2
Flood Frequency Distributions .................... 9
Gumbel Distribution .......................... 9
Pearson Type III Distribution................... 10
Lognormal Distribution ........................ 12
Wakeby Distribution ............................ 12
A Point of Divergence ............................ 14
Scope and Presentation of Study................... 15
II. REVIEW OF STATISTICAL CONCEPTS ..................... 18
Types of Probability Distributions ............... 18
Properties of Probability Distributions .......... 20
Descriptors of Probability Distributions ......... 24
Overview........................................ 24
Descriptors of Location ............................ 26
Descriptors of Dispersion .......................... 28
Descriptors of Symmetry ............................ 29
Parameter Estimation .................................. 30
Graphical Presentation ................................ 31
III. A GENERALIZED CAUCHY DISTRIBUTION ....................... 33
Derivation of the Generalized Cauchy
Distribution ........................................ 33
Characteristics of the Generalized Cauchy
Distribution ........................................ 36
Probability Paper ..................................... 41
IV. PARAMETER ESTIMATION .................................... 44
Estimation from Observed Quantiles ................ 46
Approximate Method of Moments ......................... 49
Comparison of Estimation Techniques ................... 60
V. APPLICATIONS TO REAL DATA............................... 64
Site-Specific Comparisons ............................. 65
Regional Skews and the Separation Effect .............. 76
VI. SUMMARY AND CONCLUSIONS.................................. 83
CITED REFERENCES............................................ 86
APPENDIX
A. MONTE CARLO EXPERIMENT RESULTS ......................... 89
B. STREAMFLOW DATA ........................................ 93
TABLES
Table
5.1 Summary of Stations for Distribution Comparisons . 66
5.2 Summary of Generalized Cauchy Distribution
Parameter Estimates ....................................... 74
A.1 Monte Carlo Experiment Results ............................ 90
B.1 Annual Runoff, Cave Creek near Fort Spring,
Kentucky................................................... 94
B.2 Annual Runoff, Spray River at Banff, Canada ........... 95
B.3 Annual Runoff, Green River at Munfordville,
Kentucky .................................................. 96
B.4 Annual Runoff, Weldon River at Mill Grove,
Missouri................................................... 97
B.5 Annual Snowmelt Peaks, San Juan River at
Pagosa Springs, Colorado .................................. 98
B.6 Annual Peaks, Fishkill Creek at Beacon, New
York....................................................... 99
FIGURES
Figure
2.1 Definition Sketch of Probability Density
and Cumulative Distribution Functions ............... 23
3.1 Cumulative Generalized Cauchy Distribution
Functions............................................ 37
3.2 Generalized Cauchy Distribution Probability
Density Functions ....................................... 37
3.3 Cauchy Probability Paper .............................. 42
4.1 Definition Sketch for Approximate Method of
Moments Estimation Technique ............................ 31
4.2 Approximate Relationship Between α, β, γ and μ . . 36
4.3 Approximate Relationship Between α, γ and σ² ... . 57
4.4 Approximate Relationship Between γ and Cs.......... 58
5.1 Distributions of Annual Runoff, Cave Creek
near Fort Spring, Kentucky............................... 67
5.2 Distributions of Annual Runoff, Spray River
at Banff, Canada......................................... 68
5.3 Distributions of Annual Runoff, Green River
at Munfordville, Kentucky . ..................... 69
5.4 Distributions of Annual Runoff, Weldon River
at Mill Grove, Missouri.................................. 70
5.5 Distributions of Annual Snowmelt Peak Flows,
San Juan River at Pagosa Springs, Colorado .......... 71
5.6 Distributions of Annual Peak Flows, Fishkill
Creek at Beacon, New York................................ 72
5.7 Statistics of Observed and Synthesized Skews .... 79
5.8 Statistics of Observed and Synthesized Skews .... 80
5.9 Statistics of Observed and Synthesized Skews .... 81
CHAPTER I
INTRODUCTION
In sciences such as hydrology where the phenomena being
studied exhibit considerable degrees of stochasticity (randomness),
some of the most powerful tools available to investigators for
mathematical modeling of the processes are those afforded by the
mathematics of probability theory and statistical inference. Such
has long been recognized and innumerable research papers may be
found in the literature on applications of these theories.
Specific applications of probability theory and statistical
inference are too numerous to elaborate on in any detail but are
the foundations of many of the day-to-day techniques utilized in
practice. As an example, consider the vast number of empirically
derived relationships that are commonly applied in all disciplines
of engineering and the applied sciences. How are relationships of
this type developed and with what reliabilities may they be used?
Probability theory and mathematical statistics provide many of the
answers.
Within the realm of possible applications of these
theories, the one that almost immediately comes to mind amongst
civil engineers is that of flood frequency analysis, the intent of
which is to assign a flood magnitude to a specified annual
exceedance probability. Coupled with the concepts of risk, flood
frequency analyses permit engineers to establish design criteria
for many types of hydraulic improvements based upon justifiable
economic considerations. Flood frequency analysis is nothing new;
a considerable amount of research has been devoted to it by many
investigators over the last 70 to 80 years. Even considering
the amount of work performed to date, however, research on the
subject seems to be increasing in intensity. The work presented
here identifies some theoretical shortfalls in some commonly used
distributions and derives a new distribution that eliminates some
of these shortfalls. It is hoped that it will prepare a foundation
for continuing research on the topic.
Literature Review
As mentioned above, flood frequency analysis has a fairly
long history. The earliest work is generally credited to Hazen
[1914]; however, isolated instances of work performed prior to
that time are cited by Hall [1921] and Foster [1934]. A number of
probability distributions have been proposed for modeling flood
extremes and consequently the literature is quite extensive.
Potter [1987], for example, cited over 100 references that appeared
in the period from 1983 to 1986 alone. Because of the extent of
the literature, the review presented here is general in nature and
is intended to briefly summarize some conceptual developments that
have occurred over the years. The following section is more
specific in nature and briefly discusses some of the most commonly
applied probability distributions for flood frequency analyses.
The earliest works dealing with the subject of flood
frequency analysis appeared almost exclusively in the annually
published volumes of the Transactions of the American Society of
Civil Engineers. During the early 20th century, the Transactions
contained complete papers, often well over 100 pages in length
each, as opposed to the compendiums of abstracts that are currently
published. The early works by Hazen [1914], Hall [1921] and Foster
[1924] are highly recommended reading for others interested in
flood frequency analysis. While they are sometimes rather verbose,
and some of the terminology is different from that currently used,
they demonstrate considerable degrees of ingenuity and foresight,
if not just plain common sense, on the parts of the various
writers.
Based upon these early works, the writer has been left with
the impression that prior to about 1915 or 1920 the normal
distribution, or "Gaussian law of error" as it was termed, was used
almost exclusively in statistical analyses regardless of the
phenomenon being studied. Early flood frequency investigators such
as Hazen [1914] and Hall [1921], however, recognizing that flood
magnitudes are positively valued only, dismissed the normal
distribution as being inappropriate. Based upon collected
streamflow records, empirical distributions were devised that had
a lower bound of zero. Within about a quarter century thereafter,
the relatively recent works of statisticians by the names of
Pearson, Fisher and Tippett were picked up on and "skew frequency
curves" became rather well (and analytically) defined. In
particular, distributions such as the Pearson Type III [Foster,
1924], the Gumbel [Gumbel, 1945] and the lognormal [Harza, 1921;
Beard, 1943] began to appear in the water resources literature and
became firmly entrenched. Even to this writing, these distributions
are some of the most commonly recognized and applied.
Most of the early flood frequency analyses were directed
towards modeling of actual streamflow extremes as opposed to their
logarithms or otherwise transformed variates. The earliest use of
logarithmic transforms appears to be obscure; however, as noted
above, Harza [1921] and Beard [1943] both utilized such a technique
in the case of the lognormal distribution. Beard also called
attention to the need for a procedure for handling zero flow years
when transformations of that type are used. Jennings and Benson
[1969] would later develop such a technique based upon the theory
of conditional probabilities.
Investigators realized early on that flood frequency
analyses were valid only at the gaging sites where the streamflow
measurements were taken. Since one rarely encounters the luxury
of having a gaging station at a given project site, it became
desirable to develop methodologies by which records from a number of
stations in the general area of the project could be combined to
yield estimates of flood frequencies at the project site. Dennis
published one of the earliest works [Dennis, 1921] dealing
specifically with this topic, but limited his study to records
obtained at different locations on the same stream. Other early
researchers such as Hall [1921] and Foster [1924] also made mention
of this topic and made their analyses with annual streamflow
extremes expressed as ratios of the mean annual flood. Through such
analyses, streamflow magnitudes could be expressed as dimensionless
ratios and therefore could be more easily extrapolated to ungaged
sites. Later works such as that by Lane and Lei [1950] explicitly
accounted for the variance (or standard deviation) of the series. In
the case of Lane and Lei, the use of a variability index, which is
the standard deviation of the logarithms of an observed series, was
proposed to permit quantile estimations from regionally developed
estimates of the mean and variability index.
Relatively recent work related to ungaged sites involves
quantile estimation based upon regressions on various watershed
characteristics such as area, slope, imperviousness, and so on.
Bodhaine [1961] developed such a technique for a substantial area
of the northwest United States and included basin, geologic and
meteorologic characteristics in his formulation. Other works
have addressed the synthesis of frequency curves through the use of
catchment models and synthetic or known precipitation events. As
pointed out by Hughes [1977], however, there is no reason to expect
that, for example, the 0.01 probability flood event occurs due to
the 0.01 probability precipitation event. Other watershed
characteristics such as soil moisture conditions are themselves
stochastic in nature and have effects on a watershed's response to
a given precipitation input. The method developed by Hughes
explicitly accounted for the probabilities of other influential
factors, and a synthetic flood frequency curve was formulated in
terms of mutually exclusive joint probabilities where each joint
probability represented the simultaneous occurrence of given
watershed variables.
Fundamental to the development of extrapolated or
synthesized frequency curves is a regional analysis of gaging
stations in an assumed homogeneous region. A considerable amount of
research has been devoted to regional analyses since it is generally
believed that they may be applied to temper site-specific estimates
which often exhibit high degrees of variability due to sampling
errors. An example of this is given by the recommendation of the
U.S. Water Resources Council [USWRC, 1981] to utilize coefficients
of skewness that are weighted combinations of site-specific and
regional estimates.
Since available data series are typically rather short, at
least in the United States, techniques have been proposed for the
incorporation of additional data based upon estimates of ungaged
historical floods as well as geological evidence of paleofloods.
Synthesis of additional data is also sometimes performed; however,
this procedure often requires a priori knowledge of the
distribution and is thus of limited use. Techniques of estimation
when historical and/or paleoflood data are available are presented
in a recent paper by Stedinger and Cohn [1986] for the case of the
lognormal distribution. In addition, Hosking and Wallis [1986] make
an assessment of the benefit of such additional data and conclude
that it is quite valuable when only single-site gage data is used.
In contrast, when regional analyses are made, marginal gains from
using historical and/or paleoflood data may be negligible.
Once parameter estimates are made and a distribution
function is fitted to an observed data series, quantile estimates
for any desired probability level may be made. As was mentioned by
Foster [1924] and popularized by Chow [1951], an often convenient
method involves the use of a function of the form

    x_p = \bar{x} + K s_x                                    (1.1)

where x_p is a value of the variate X associated with probability
p; \bar{x} and s_x are, respectively, estimates of the mean and
standard deviation of X; and K is a standard deviate which is a
function of the probability p and the distribution of X. For most
specific distributions, K is actually a function of the probability
level and the coefficient of skewness. Values of K are usually
presented in tabular form such as by the USWRC [1981]; however,
Mavis [1970] has also presented them in nomographical form for a
number of commonly used distributions. Use of equations of the type
of (1.1) is particularly convenient when one is working with
distributions that are not easily expressible analytically.
However, when one is working with a mathematically tractable
distribution, analytical determinations are probably just as easy
from the cumulative distribution function.
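As a minimal sketch of equation (1.1), the following Python fragment
evaluates x_p for the special case of a normally distributed variate,
for which the frequency factor K reduces to the standard normal
deviate. The function name, the sample values and the choice of the
normal distribution are illustrative assumptions and are not taken
from this thesis; for skewed distributions K would instead be read
from published tables.

```python
from statistics import NormalDist, mean, stdev

def chow_quantile(sample, exceedance_prob):
    """Estimate x_p = xbar + K * s_x (equation 1.1) for a normal variate.

    Assumption: X is normal, so K is the standard normal deviate whose
    non-exceedance probability is 1 - exceedance_prob.
    """
    xbar = mean(sample)    # estimate of the mean of X
    s_x = stdev(sample)    # estimate of the standard deviation (n - 1 divisor)
    k = NormalDist().inv_cdf(1.0 - exceedance_prob)  # frequency factor K
    return xbar + k * s_x

# Hypothetical annual peak flows (arbitrary units), for illustration only.
flows = [120.0, 95.0, 143.0, 110.0, 88.0, 160.0, 132.0, 101.0]
q100 = chow_quantile(flows, 0.01)  # 0.01 annual exceedance probability
q2 = chow_quantile(flows, 0.5)     # K = 0 here, so this equals the sample mean
```

With a symmetric distribution the 0.5 probability quantile coincides
with the mean, which is a convenient check on any tabulated or
computed set of K values.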
When one reviews the literature it becomes evident that a
considerable lag between mathematical developments and their
applications to water resources problems has existed. However, in
the writer's investigation, it appears as though this lag is
decreasing as time progresses. In some fairly recent cases,
particularly with respect to the development of skewed
distributions such as the Wakeby [Houghton, 1978], it seems as
though statistical investigations of water resources phenomena
might be pacing some aspects
of statistical theory. Regardless of any trends that might exist,
however, the writer encountered a statement by Evans [1930] that
summarizes much of the writer's own philosophy and is repeated as
follows:
To the engineer the solution of any problem is interesting
and welcome . but the time is long past when statistical
methods should have been accepted and earnestly put to use by
the engineer.
One of the trends in flood frequency analysis at present
seems to be in the direction of identification of robust
distributions; i.e., distributions that model data reasonably well
even if the data come from a different distribution. A recent paper
by Wallis and Wood [1985] is a very good example of this. Early
researchers noted that true flood distributions could not be
determined with a great degree of reliability due to the short data
series that were available; today, 70 to 80 years later, this topic
is still of concern. Analysis of flood data by causative factor
is also becoming popular and receiving a great deal of attention.
Jarrett and Costa [1983] and Costa and Jarrett [1981] presented
interesting discussions of this topic in the context of rainfall
and snowmelt floods, directed specifically towards mountain and
foothill streams in Colorado.
Flood Frequency Distributions
As might be imagined, and as stated earlier, a number of
different probability distributions have been introduced for flood
frequency applications. Extreme flood series observed in
different geographical areas seem to emanate from different
populations (or parent distributions) and there is no reason to
suspect that this should not be so. This section briefly summarizes
a few of the most commonly applied distributions and identifies
some of their theoretical and practical strengths and weaknesses.
The discussions are intended as objective expositions of some of
the characteristics of the distributions. The interested reader is
referred to Haan [1982] and Yevjevich [1972] for thorough
presentations of these distributions and their application techniques.
Also, Kite [1985] provides computer program listings for many of
them.
Gumbel Distribution
The Gumbel is an extreme value distribution. In developing
the theory of extremes, two English statisticians, Fisher and
Tippett, showed that the distribution of the largest values within
each of n samples approaches a limiting form as the size m of each
of the n samples approaches infinity. Gumbel, in turn, reasoned
that the limiting distribution was therefore applicable to flood
frequency analysis since one desires to fit a distribution to n
annual flood observations where each of the n observations is the
maximum of m = 365 daily observations [Linsley et al., 1982].
Gumbel's reasoning is rather convincing and effectively couples
theoretical considerations and physical reality; however, the
limiting distribution suffers from the fact that it has a constant
coefficient of skewness. Actual flood distributions (apparently)
have varying degrees of skewness. The Gumbel distribution then can
only reasonably be applied when an observed data series exhibits
a coefficient of skewness not significantly different from the
Gumbel population value of 1.14.
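The constant quoted above can be checked numerically. The sketch
below relies on the known closed form for the skewness of the Gumbel
(extreme value Type I) distribution, 12*sqrt(6)*zeta(3)/pi^3, which
is a standard result and not a formula stated in this thesis; the
function names are illustrative.

```python
import math

def zeta3(terms=100_000):
    """Partial sum of the series defining the Riemann zeta function at 3."""
    return sum(1.0 / k**3 for k in range(1, terms + 1))

def gumbel_skewness():
    """Population coefficient of skewness of the Gumbel distribution.

    The closed form 12 * sqrt(6) * zeta(3) / pi**3 involves neither the
    location nor the scale parameter, which is why the skewness is fixed.
    """
    return 12.0 * math.sqrt(6.0) * zeta3() / math.pi**3

skew = gumbel_skewness()  # rounds to the 1.14 quoted in the text
```

Because neither parameter enters the expression, no choice of
parameter values can make a fitted Gumbel curve match a sample whose
skewness differs appreciably from this value.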
Other theoretical problems with the Gumbel distribution
pertain to the question as to whether m = 365 is sufficiently large
as a size for each of the n samples and to the fact that daily
streamflows are serially correlated. Fisher and Tippett's
derivation was based upon the premise that the m observations in each
sample are independent of one another [Kite, 1985]. Studies of
the effects of finite sample sizes and serially correlated data
would be both interesting and informative.
Pearson Type III Distribution
Introduced for flood frequency analysis by Hall [1921] and
Foster [1924], the Pearson Type III distribution is the third of
seven types of probability distributions derived from a single
differential equation by the English statistician, Karl Pearson.
The Pearson Type III is a rather generalized distribution in that,
depending on its parameter values, it becomes a normal
distribution, an exponential distribution or even any one of the
family of gamma distributions which in turn include not only the
1-, 2- and 3-parameter gamma distributions, but also the
chi-squared distribution and the Rayleigh and Maxwell distributions.
Early investigators such as Hall and Foster, recognizing
that flood magnitudes are positively valued only, proposed the use
of the Pearson Type III since it is a singly bounded distribution
which may take on a lower bound of zero. The Pearson Type III,
with this constraint, is the 2-parameter gamma distribution. It
must be noted that invocation of a lower bound on the Pearson Type
III necessitates positive skewness and further that the Pearson
Type III was originally intended to be applied to actually
observed, non-transformed flood magnitudes. As currently recommended
by the U.S. Water Resources Council [USWRC, 1981], however, the
Pearson Type III should be fitted to the base 10 logarithms of an
observed annual flood series and may have either positive or
negative skewness. As a result, either a nonzero lower bound or a
noninfinite upper bound is invoked on the real-space counterpart
of the distribution; i.e., on the theoretical magnitudes that flood
flows may assume. Whatever physical basis the distribution may
have once had is lost. Critiques of the USWRC procedure, which
are quite ubiquitous in the literature, cite and are based upon
these inferred bounds (see, e.g., Kite [1985] and Dawdy and
Lettenmaier [1987]).
While the density function of the Pearson Type III is
expressible in analytical form, the cumulative distribution
function, which is of primary interest in application, is not. In
practice then it is necessary to resort to tabulations of standard
deviates K that may be substituted into Chow's generalized formula
(equation (1.1)).
Lognormal Distribution
Log-transformation of skewed data series often yields new
series with coefficients of skewness lower than those exhibited by
the original series. If the skewness of a transformed series is
essentially zero, the lognormal distribution, which is merely a
normal distribution fitted to the logarithms, is often applied. If
the log-skew is not essentially zero, use of the lognormal
distribution cannot be justified. The normal distribution fitted to
the logarithms of an observed data series is of course a doubly
unbounded distribution. Some physical basis therefore exists in the
case of this distribution since its real-space counterpart has a
lower bound of zero and no upper bound.
Like the Pearson Type III, the lognormal cumulative
distribution function is not expressible analytically. One must
resort to tabulations of standard normal deviates Z (which are the
same as values of K in Chow's generalized formula) in order to
construct the curve.
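The lognormal procedure described above can be sketched as follows,
with a library routine standing in for the tabulated standard normal
deviates Z. The function name and the flow values are hypothetical;
base 10 logarithms are assumed only for consistency with the USWRC
convention mentioned earlier.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_quantile(flows, nonexceedance_prob):
    """Quantile from a normal distribution fitted to log10 of the flows."""
    logs = [math.log10(q) for q in flows]         # flows must be positive
    z = NormalDist().inv_cdf(nonexceedance_prob)  # standard normal deviate Z
    log_xp = mean(logs) + z * stdev(logs)         # x_p = xbar + K*s_x in log space
    return 10.0 ** log_xp                         # back-transform to real space

# Hypothetical annual peak flows (arbitrary units), for illustration only.
flows = [120.0, 95.0, 143.0, 110.0, 88.0, 160.0, 132.0, 101.0]
median_flow = lognormal_quantile(flows, 0.5)  # equals the geometric mean
tiny_flow = lognormal_quantile(flows, 1e-9)   # very rare low value, still > 0
```

The back-transformation 10**y is positive for every real y, which is
the real-space lower bound of zero noted in the text.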
Wakeby Distribution
Proposed for flood frequency applications by Houghton
[1978], the Wakeby distribution is a fairly recent development.
Because of its five parameters, the Wakeby can approximate nearly
any distribution currently in use and thus has been termed a
"grandparent distribution." Furthermore, the left- and right-hand
tails of the distribution are separable from one another. In
practice, one can concentrate interest in either tail without
concern for the opposite tail.
Because of its five parameters, the Wakeby distribution
seems to violate the principle of parsimony. The five parameters
also render fitting of the distribution rather tedious and
time-consuming. While the Wakeby has been subjected to some
criticism due to these characteristics, it has on the other hand
received a fair amount of interest due to its ability to explain
the "separation effect" discussed by Matalas et al. [1975].
The separation effect refers to regional analyses of the
coefficient of skewness and is manifested by a difference between
statistics observed in nature and those which are attainable with
commonly used probability distributions. More particularly, a
cartesian plot of observed means and standard deviations of
skewness for various regions displays data points that are
consistently and significantly above those attainable by synthesis
of random variates from commonly applied distributions. As noted by
Houghton, the separation effect can be accounted for by a
distribution having one very thick tail and with the opposite tail
being thick enough to decrease average skews. The Wakeby has this
property whereas other commonly applied distributions do not since
they lack enough kurtosis for any given skew.
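The synthesized side of such a plot can be illustrated with a small
Monte Carlo sketch: many short records are drawn from an assumed
parent distribution, and the mean and standard deviation of the
resulting sample skews give one point on the plot. The lognormal
parent, the record length of 30, the number of synthetic sites and
the function names are all assumptions made for illustration; this
is not the experiment reported in this thesis or by Matalas et al.

```python
import random
from statistics import mean, stdev

def sample_skew(xs):
    """Sample coefficient of skewness in moment form, m3 / m2**1.5."""
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((x - xbar) ** 2 for x in xs) / n
    m3 = sum((x - xbar) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def skew_statistics(draw, n_sites=200, record_length=30, seed=1):
    """Mean and standard deviation of sample skews over synthetic sites."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    skews = [sample_skew([draw(rng) for _ in range(record_length)])
             for _ in range(n_sites)]
    return mean(skews), stdev(skews)

def lognormal_draw(rng):
    """Assumed parent: lognormal with log-space standard deviation 0.5."""
    return rng.lognormvariate(0.0, 0.5)

mean_skew, sd_skew = skew_statistics(lognormal_draw)
```

Repeating the calculation for a range of parent distributions and
parameter values traces out the region attainable by synthesis,
against which observed regional statistics can then be compared.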
Definition of the Wakeby distribution is most easily stated
in inverse form; i.e., the variate X is expressed as a function of
F(x) which is a probability. This is particularly convenient in
application since it enables one to determine a flood magnitude
directly from a given probability level.
A Point of Divergence
Based upon the foregoing discussions, one should come away
with the impression that each distribution presented has both
benefits and shortfalls. In particular, it seems that a distribution
should be developed that exhibits skewness of varying degrees, that
is fairly parsimonious and that is easily expressible in cumulative
distribution function form. Recalling that log-skews are often
lower than corresponding real-space skews, the distribution should
also be able to be applied to log-transformed data without
accompanying and undesirable problems in boundedness.
In order to simultaneously accomplish these objectives,
what is needed is a doubly unbounded distribution of logarithms
that exhibits skewness of varying degrees. Double unboundedness
in log-space implies a lower bound of zero and no upper bound in
real-space. This distribution should also be easily expressible
in cumulative distribution function form for practical reasons.
This thesis presents the derivation of such a distribution which
turns out to be a generalized form of the Cauchy distribution.
Heretofore, the Cauchy distribution has not found application in
hydrologic problems [Yevjevich, 1972]. The derived distribution
is termed here as the "generalized Cauchy distribution."
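To illustrate what "easily expressible in cumulative distribution
function form" means, the sketch below uses the ordinary 2-parameter
Cauchy distribution, whose cumulative distribution function and
inverse are both closed-form. This is the standard symmetric Cauchy,
not the 3-parameter generalized form derived in Chapter III; the
parameter names and numerical values are assumptions.

```python
import math

def cauchy_cdf(x, location=0.0, scale=1.0):
    """Closed-form CDF of the ordinary 2-parameter Cauchy distribution."""
    return 0.5 + math.atan((x - location) / scale) / math.pi

def cauchy_quantile(p, location=0.0, scale=1.0):
    """Inverse CDF: the variate with non-exceedance probability p (0 < p < 1)."""
    return location + scale * math.tan(math.pi * (p - 0.5))

# Treat y as the base 10 logarithm of a flow. The log-space variate is
# doubly unbounded, yet 10**y is positive for any real y.
y_99 = cauchy_quantile(0.99, location=2.0, scale=0.1)
flow_99 = 10.0 ** y_99
```

A quantile is obtained here by direct evaluation, with no table of
standard deviates required, which is the practical advantage sought
in the text.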
Following the derivation, methods of parameter estimation
for the generalized Cauchy distribution are evaluated and the fit
of the distribution to actually observed data series is compared
with those obtained using the Pearson Type III distribution. An
assessment is also made of the ability of the generalized Cauchy
distribution to explain the separation effect mentioned earlier.
The writer's review of the literature on flood frequency
analysis involved the consultation of well over 100 individual
research papers and spanned a period of about 80 years. In none of
the works consulted was an approach of the type presented here
discussed. This seems to indicate that the work presented here is
entirely original.
Scope and Presentation of Study
The presentation, by means of this thesis, of the study
performed is arranged into chapters dealing with specific portions
of the underlying theory and applications procedure. The
following paragraphs briefly summarize the scopes and topics of each
chapter and should help to clarify the interrelationships that
exist between chapters.
Chapter II digresses somewhat from the main thrust of the
work but is included for the benefit of the relatively uninitiated
reader. The writer believes that since most civil engineers have
little to no formal training in probability theory and statistics,
a cursory review of some basic concepts may be warranted. The
review is not intended to be comprehensive but should give the
reader a fairly good understanding of many of the ideas presented.
The reader is referred to Guttman et al. [1982] for a more thorough
and exceptionally readable treatment.
Based upon the objectives developed and discussed in the
previous section, Chapter III presents a derivation of the
generalized Cauchy distribution. Some basic characteristics of the
distribution are also presented and a probability paper is
developed that may be used in conjunction with the distribution to
construct linear (or nearly so) plots of the cumulative
distribution function curve.
A discussion of parameter estimation techniques is
presented in Chapter IV. Since the generalized Cauchy distribution
exhibits some problems in this area, two rather unconventional
techniques are formulated and compared against each other using a
Monte Carlo experimental approach. Based upon the results of the
experiments, a recommendation is made as to which of the two
techniques should be applied in practice.
Chapter V provides a comparison of the fit of the derived
distribution to that attainable with the Pearson Type III
distribution for a number of actually observed flood series. The Pearson
Type III is selected for comparison purposes since it is probably
the most commonly applied distribution in the U.S. and since it
may be applied regardless of observed sample skewnesses. Chapter
V also presents the results of an assessment of the ability of the
generalized Cauchy distribution to explain the separation effect.
As a summary, Chapter VI reiterates the theory underlying
the derived distribution. Limitations of the study are also
summarized, and a synopsis of future research directions that might
be pursued in an effort to extend or improve the results of this
work is included.
Following the text, a listing of cited references is
provided and appendices are included containing partial tabulations
of Monte Carlo experiment results as well as data used for the fit
comparisons made in Chapter V.
CHAPTER II
REVIEW OF STATISTICAL CONCEPTS
As mentioned in the Introduction, the topic of this chapter
is a brief review of some basic concepts of probability theory and
mathematical statistics. The discussions of this chapter are
nonparametric in that assumptions pertaining to the distribution of a
variate are not made. There are a few cases where a comment is
made in the context of flood frequency applications; however, the
majority of the concepts may be applied to any probability
distribution and for any application.
Types of Probability Distributions
Bound closely to statistics lies a larger and more general
field of mathematics known as probability theory. In fact,
statistics should be viewed as but one component of probability
theory. A result of the application of statistical methods is the
inference of a probability distribution which may be used to
quantify the variate of interest in a convenient and parsimonious
manner. Because of this interrelationship, a basic understanding of
probability theory is necessary to effectively apply statistical
concepts and methods.
There are basically two different types of probability
distributions. The first comprises the class of discrete
distributions and the second the class of continuous distributions.
Other distributions consisting of a combination of the two basic
types find some application in special circumstances but are not
addressed here.
The basic characteristic of a discrete distribution is that
the variate being modeled can take on only well defined, isolated
values. Usually, these types of distributions are applied where
the outcome of a process assumes integral values only in terms of
the variate of interest. As an example, consider a model describing
the number of faulty parts which might be produced during a
time interval on an assembly line. Continuous distributions, in
contrast, permit the variate of interest to assume any real
magnitude within the defined range of the distribution. There is
then an infinite number of possible values that the variate may
take on. Cases exist where continuous distributions may be utilized
to approximate discrete distributions, and vice versa.
Since hydrologic variables such as annual flood extremes
do not generally assume discrete values only, continuous
distributions are nearly always used to model them. Discrete
distributions find some application in cases such as risk analysis;
however, the remainder of this thesis is concerned with continuous
distributions only.
Properties of Probability Distributions
There are a number of commonly recognized and applied
probability distributions, but nearly any mathematical equation
may be adopted for use as a probability distribution provided
certain criteria are satisfied. Mathematical equations applied to
represent distributions are usually descriptors of the density
function, but exceptions exist where the cumulative distribution
function is modeled instead. This discussion defines and describes
the density function, the cumulative distribution function
and their relationships to one another.
Conditions that must be satisfied in order for a mathematical
function f(x) to qualify as a density function are:
1. Ordinates of the function must be nonnegative, at least
within the defined range of the variate.
2. The range of the function or its parameter values, or both,
must be defined such that the definite integral of the
function, when evaluated over the defined range, is unity.
The first criterion in effect states that the cumulative
distribution function must be nondecreasing. In aggregate, the two
criteria imply that the cumulative distribution function must
approach zero at the lower end of the defined range and unity at
the upper end of the defined range. Depending on whether the
distribution is bounded, the cumulative distribution function may
reach zero or unity exactly or may approach them asymptotically.
This will become more clear once the cumulative distribution
function is formally defined.
A theoretically simple relationship exists between the
density and cumulative distribution functions and may be expressed
as
$$ f(x) = \frac{dF(x)}{dx} \tag{2.1} $$
or
$$ F(x) = \int f(x)\,dx \tag{2.2} $$
where f(x) denotes the density function, F(x) denotes the
cumulative distribution function and the integral is indefinite
with respect to the variate X. Notation adopted here is consistent
with common statistical practice: Upper case letters are used when
reference is made to all or a portion of the population space of
the variate and corresponding lower case letters are used when
reference is made to a particular value of the variate.
The probability that the variate X is less than or equal
to some specified value x may be determined from
$$ \Pr(X \le x) = \int_{x_0}^{x} f(x)\,dx \tag{2.3} $$
where $x_0$ denotes the lower end of the defined range of X. When
distributions are used that are unbounded to the left, the integral
should be evaluated over the interval from minus infinity to x.
An illustration of this is provided in Figure 2.1 where both the
density and cumulative distribution functions are shown for some
hypothetical and arbitrary probability distribution. In a manner
similar to that shown by equation (2.3), the probability that X
lies between $x_1$ and $x_2$, where $x_1 \le x_2$, may be determined
by integration of the density function over the interval
$(x_1, x_2)$. It should be apparent then that the probability of X
being exactly equal to a specified value x is zero.
Probabilities have no dimension. It may be seen that F(x)
is dimensionless and that f(x) has the inverse dimension of the
variate X.
The preceding discussion is presented from a mathematical
perspective. Common practice in hydrology is to adopt the notation
of a recurrence interval; e.g., the 100-year flood. A
relationship exists between the two terminologies and may be
written as
$$ T_r = \frac{1}{P_e} \tag{2.4} $$
where $T_r$ denotes the recurrence interval in years and $P_e$ is the
probability that a flood of the corresponding magnitude will be
equalled or exceeded in any given year. The fact that an
exceedance probability is used here should be noted; the previous,
mathematical discussion defines nonexceedance probabilities.
Figure 2.1
Definition Sketch of Probability Density and
Cumulative Distribution Functions
Recalling that the total area under the density function curve must
be unity, however, a relationship between exceedance and
nonexceedance probabilities may be written as
$$ P_e = \Pr(X > x) = 1 - \Pr(X \le x) = 1 - F(x). \tag{2.5} $$
The concept of a recurrence interval should not be
construed to imply that a flood of the corresponding magnitude or
greater will occur only once in every $T_r$ years. Instead, it
should be understood that a flood of that magnitude or greater will
occur on the average once every $T_r$ years. It is possible, for
example, to experience 100-year floods in successive years.
Evaluations of the latter constitute what is known as risk
analysis.
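The relationship between recurrence interval and risk may be illustrated with a short Python sketch. It assumes, in addition to equation (2.4), that annual floods are independent from year to year, so that the probability of at least one exceedance in N years is 1 - (1 - P_e)^N; this formula and the function names are illustrative assumptions and do not appear in the text above.

```python
def exceedance_probability(recurrence_interval_years):
    """Annual exceedance probability P_e = 1 / T_r, equation (2.4)."""
    return 1.0 / recurrence_interval_years


def risk_of_at_least_one(recurrence_interval_years, n_years):
    """Probability of one or more exceedances in n_years.

    Assumes independent years (an assumption of this sketch).
    """
    p_e = exceedance_probability(recurrence_interval_years)
    return 1.0 - (1.0 - p_e) ** n_years


# Chance of 100-year floods in two successive years (independence assumed).
p_two_in_a_row = exceedance_probability(100.0) ** 2

# Chance of at least one 100-year flood during a 100-year period.
risk_century = risk_of_at_least_one(100.0, 100)
```

Under these assumptions the chance of experiencing at least one 100-year flood in a 100-year period is roughly 63 percent, which underlines that the recurrence interval is an average and not a schedule.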
Descriptors of Probability Distributions
Properties of various probability distributions may be
characterized by statistical measures describing their location,
dispersion, symmetry (or lack of it) and other features. The
following paragraphs provide discussions of these characteristics
and present quantitative expressions through which they may be
numerically evaluated.
Overview
Properties of probability distributions are usually quantified
in terms of moments of the area under the density function
curve about selected axes. Mathematically, these moments are often
expressed in terms of the expectation of various functions of
X. Axes selected for moment determinations are usually taken as
either the origin (x = 0) or as that axis passing through the
mean of X. Corresponding moments are termed as either noncentral
or central moments, respectively. The rth noncentral moment
$\mu'_r$, where $r \ge 1$, is defined as
$$ \mu'_r = E(X^r) = \int_x x^r f(x)\,dx \tag{2.6} $$
where the integral is evaluated over the defined range of X.
The rth central moment $\mu_r$, where $r \ge 2$, is similarly
defined as
$$ \mu_r = E[(X - \mu)^r] = \int_x (x - \mu)^r f(x)\,dx \tag{2.7} $$
where $\mu$ denotes the mean of X and the integral is again
evaluated over the defined range of X. Note that r is defined as
being greater than or equal to 2 for the case of central moments;
the first central moment is necessarily equal to zero. Also note
that since the integrals are definite, noncentral moments $\mu'_r$
and central moments $\mu_r$ are not functions of X, but rather are
functions of parameters and/or constants only. This of course is
because X is "integrated out."
Moments defined by equations (2.6) and (2.7) do not always
exist, or may only exist up to some order r. Specifically, if the
rth moment does not exist, all moments of higher order similarly
do not exist. If the rth moment does exist, all moments of lower
order also exist. When circumstances arise where moments do not
exist, alternative descriptors must be resorted to; a few of these
are mentioned in subsequent subsections.
Convention is such that noncentral moments are usually
presented up to order r = 1 and central moments, or functions of
moments, are presented for higher orders. As might be imagined
from inspection of equations (2.6) and (2.7), noncentral moments
are usually easier to derive and, as such, it is desirable to have
a relationship between $\mu'_r$ and $\mu_r$. Such a relationship
may be developed by expanding $(X - \mu)^r$ and taking the expected
value of each term. Resulting relationships for the second and
third central moments are then [Yevjevich, 1972]
$$ \mu_2 = \mu'_2 - \mu^2 \tag{2.8} $$
and
$$ \mu_3 = \mu'_3 - 3\mu\mu'_2 + 2\mu^3. \tag{2.9} $$
These relationships are utilized in Chapter IV.
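Equations (2.8) and (2.9) are algebraic identities and therefore hold for sample moments as well as for population moments. The following Python sketch checks them numerically on an arbitrary synthetic sample; the function names and the normally distributed test data are assumptions made for illustration only.

```python
import random


def noncentral_moment(values, r):
    """r-th sample moment about the origin."""
    return sum(v ** r for v in values) / len(values)


def central_moment(values, r):
    """r-th sample moment about the sample mean."""
    mean = noncentral_moment(values, 1)
    return sum((v - mean) ** r for v in values) / len(values)


random.seed(1)
# Arbitrary illustrative data; any sample would serve.
sample = [random.gauss(10.0, 2.0) for _ in range(1000)]

m1 = noncentral_moment(sample, 1)
m2 = noncentral_moment(sample, 2)
m3 = noncentral_moment(sample, 3)

# Equation (2.8): second central moment from noncentral moments.
mu2_from_28 = m2 - m1 ** 2
# Equation (2.9): third central moment from noncentral moments.
mu3_from_29 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
```

The two computed values agree with the directly evaluated central moments to within floating point rounding.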
Descriptors of Location
The most commonly known descriptor of location of a
probability distribution is the mean; however, locations may be
represented by other statistics also. The mean value of a
probability distribution is the first noncentral moment and thus
may be evaluated from
$$ \mu = E(X) = \int_x x f(x)\,dx. \tag{2.10} $$
Inspection of this equation reveals that the mean $\mu$ is the
x-coordinate of the centroid of the area under the density function
curve.
For cases where the mean does not exist, or can not be
evaluated analytically, other descriptors of location are the mode
and the median. The mode is that value $x_{mo}$ for which
$$ \left.\frac{df(x)}{dx}\right|_{x_{mo}} = 0 \tag{2.11} $$
and
$$ \left.\frac{d^2 f(x)}{dx^2}\right|_{x_{mo}} < 0. \tag{2.12} $$
The mode is then that value of X for which the density function
curve is at a maximum and is concave downward. A weakness of the
mode as a descriptor of location lies in the fact that it does not
always exist. The Pearson Type III distribution, for example, is
modeless in some cases. The median is that value $x_{50}$ which
splits the area under the density function curve into two equal
halves. The probability that X is greater than the median is equal
to the probability that X is less than the median.
Means, modes and medians all have the same dimension as
the variate X.
Descriptors of Dispersion
The two most commonly used descriptors of dispersion are
the variance and the standard deviation. The variance, denoted
Var(X) or $\sigma^2$, is defined by
$$ \sigma^2 = \mathrm{Var}(X) = \int_x (x - \mu)^2 f(x)\,dx. \tag{2.13} $$
The standard deviation is simply the positive square root of the
variance. Similar to the mean, an analogy to mechanics may be made
for the variance. Equation (2.13) reveals that the variance is
the moment of inertia of the area under the density function curve
with respect to its centroidal axis.
An alternative descriptor of dispersion is the interquartile
range which is here denoted by R and is defined as
$$ R = x_{75} - x_{25}. \tag{2.14} $$
The terms $x_{75}$ and $x_{25}$, respectively, denote values of X
where the corresponding values of F(x) are 0.75 and 0.25. The
selection of the quartiles $x_{25}$ and $x_{75}$ is rather
arbitrary; any two quantiles may be used in practice.
Standard deviations and ranges, like the mean, have the
same dimension as the variate X; the variance has the dimension of
the variate squared.
Descriptors of Symmetry
Many probability distributions are skewed in that they do
not exhibit symmetry about any axis. Symmetry of a distribution
is measured by a statistic $C_s$ known as the coefficient of
skewness and which is defined as
$$ C_s = \frac{\mu_3}{\sigma^3} = \frac{1}{\sigma^3}\int_x (x - \mu)^3 f(x)\,dx. \tag{2.15} $$
Other definitions for the coefficient of skewness exist, but that
given by equation (2.15) is used here. Alternative expressions of
symmetry for cases when the integrals can not be evaluated are not
in evidence in the literature as far as the writer has been able
to determine; however, it is likely that one could develop an
expression containing various quantiles similar to the definition
of the interquartile range.
A probability distribution is said to be positively or
negatively skewed, respectively, depending on whether $C_s$ is
greater than or less than zero. When $C_s = 0$, the density
function curve is symmetrical about the mean and the mean, mode and
median are identical.
Inspection of equation (2.15) reveals that the coefficient
of skewness is a dimensionless quantity.
Parameter Estimation
Assuming that a process is stationary (it does not change
with respect to time), the probability distribution from which
realizations occur has constant parameter values. These parameter
values are never known with certainty; statistical methods are used
to estimate them from an observed data series.
In order to obtain reasonable parameter estimates, the
observed data from which they are derived must be representative
of the phenomenon being studied. One rarely knows this to be the
case; however, it is generally assumed to be unless there are
reasons to suspect otherwise. In the context of annual flood
extremes, for example, it is possible that a data series being
modeled might be biased due to systematic measurement errors, due
to short sequences of data obtained during relatively wet or dry
periods, or other similar causes.
In addition to being representative, data values should
also be identically distributed and should be independent of one
another. Where observations are a result of different physical
phenomena such as rainfall and snowmelt, for example, they should
not be considered to be identically distributed. An effort should
be made to separate the data values into homogeneous groups if at
all possible. With respect to independence, there should be no
significant serial correlations between observations.
There are a number of methods by which estimates of
parameters may be made from an observed data series. The most
commonly known techniques are the method of moments and the method
of maximum likelihood; however, a number of other techniques exist
also and consist of graphical methods, least squares methods and
maximum entropy methods to name but a few. Since the generalized
Cauchy distribution derived here exhibits some distinct problems
with respect to parameter estimation, a further discussion of
techniques is contained in Chapter IV.
A property of parameter estimates that must be noted is
that they have probability distributions of their own. Since
estimates are functions of random sample observations, they too
are random variables. This is a very important concept but is
often ignored since only one set of data is usually available.
It is desirable then to utilize estimation techniques that provide
estimates with a minimal variance and which are unbiased.
Graphical Presentation
It is often desired to present the results of frequency
analyses in graphical form by plotting the cumulative distribution
function curve. Results presented in this manner allow one to
assess various quantiles at a glance and also provide a visual
means of assessment of the goodness of fit of the distribution
function to the observed data. In some cases, parameter estimates
themselves may be made from graphical presentations.
Most commonly, graphical presentations are prepared on
special probability papers which are designed so as to yield linear
plots of the cumulative distribution function curve. The type of
graphic paper utilized varies from one probability distribution to
another. Some distributions such as the Pearson Type III require
many different probability papers if linear plots are desired for
all possible combinations of parameter values. This, of course,
would be a practical nightmare so one type of paper is generally
adopted and slightly curvilinear plots are accepted in most cases.
The benefit of linear plots lies in the ease with which
extrapolations may be made. The reader is cautioned, however, that
extrapolation is risky; confidence intervals become rather wide
near the extremes of the graph.
Since it is desirable to illustrate data points as well as
the cumulative distribution function in graphical presentations,
some means of assigning probability levels to the data points is
in order. Numerous methods of accomplishing this have been
proposed; the most commonly used method, at least in the U.S., is
the Weibull method which, mathematically, is defined by
$$ P_e = \frac{m}{n + 1} \tag{2.16} $$
where n is the number of observations in a data series, m is the
rank of an individual data value when the n values are listed in
descending order of magnitude and $P_e$ is as previously defined.
All points will be assigned probability levels between zero and
unity and thus all may be plotted on probability paper. A
discussion of various plotting techniques is not given here; it is
assumed that the Weibull method may be accepted.
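Equation (2.16) may be applied to a data series as in the following Python sketch. The function name is hypothetical, and tied values are simply given consecutive ranks, which is an assumption of this sketch and is not addressed in the text.

```python
def weibull_exceedance_positions(values):
    """Pair each observation with its Weibull exceedance probability.

    Values are ranked in descending order (m = 1 for the largest)
    and assigned P_e = m / (n + 1), equation (2.16).
    """
    n = len(values)
    ranked = sorted(values, reverse=True)
    return [(x, m / (n + 1)) for m, x in enumerate(ranked, start=1)]


# Three illustrative observations (arbitrary numbers).
positions = weibull_exceedance_positions([3.0, 1.0, 2.0])
```

For the three values shown, the largest receives an exceedance probability of 1/4 and the smallest 3/4, so no point is assigned a probability of exactly zero or unity.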
CHAPTER III
A GENERALIZED CAUCHY DISTRIBUTION
Recalling the discussion in Chapter I related to desirable
characteristics of a probability distribution for flood frequency
applications, this chapter presents a derivation of a distribution
that simultaneously satisfies all of the stated objectives. As
mentioned earlier, the derived distribution turns out to be a
generalized form of the Cauchy distribution and, for this reason,
is termed here as the generalized Cauchy distribution. Following
the derivation, some characteristics of the distribution are
briefly presented and, concluding this chapter, a probability paper
is developed that, when used in conjunction with the distribution,
provides linear (or nearly so) graphical presentations of the
cumulative distribution function.
Derivation of the Generalized
Cauchy Distribution
Derivation of a probability distribution is often easier in
terms of the cumulative distribution function F(x) than in terms of
the density function f(x). In fact, the derivation following
proceeds in this manner and, once F(x) is defined, f(x) is also
defined through equation (2.1).
Considering that one of the objectives is to develop a
doubly unbounded distribution, and recognizing that cumulative
distribution functions often resemble an S-curve, one can begin the
derivation by considering the inverse tangent function
$$ y = \tan^{-1} x. \tag{3.1} $$
This function has an S-curve shape and is asymptotic to
$y = \pi/2$ radians and to $y = -\pi/2$ radians as x approaches,
respectively, positive and negative infinity. Equation (3.1) of
course is not a cumulative probability distribution function
because y does not lie between zero and unity for all x; however,
by shifting equation (3.1) with respect to y, and scaling the
shifted expression, one may write
$$ g(x) = \frac{y + \pi/2}{\pi} = \frac{\tan^{-1} x}{\pi} + \frac{1}{2}; \quad -\infty < x < \infty \tag{3.2} $$
where g(x) has the bounds $0 < g(x) < 1$ and satisfies the criteria
of a cumulative distribution function noted in Chapter II.
Note that it is assumed in equation (3.2) that $\tan^{-1} x$ is
expressed in radians; one may use degrees instead if $\pi$ is
replaced with 180.
Even though equation (3.2) is a cumulative distribution
function, it is not a very useful one. The corresponding density
function is symmetrical about x = 0 and has a constant magnitude of
dispersion (variance). Since it is desirable to obtain a
distribution that has a variable location with respect to x, as
well as variable dispersion, one must modify equation (3.2).
Resorting again to shifting and scaling, one may write
$$ F(x) = \frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}; \quad \alpha > 0; \; -\infty < x < \infty \tag{3.3} $$
where $\alpha$ and $\beta$ are, respectively, scale and location
parameters. Changing the value of $\alpha$ and holding $\beta$
constant will affect the dispersion of the distribution and will
also change its location since the density function is symmetrical
about $x = \beta/\alpha$. Changing $\beta$ and holding $\alpha$
constant will affect the point of symmetry only; i.e., $\beta$ has
no effect on dispersion.
Equation (3.3) defines the cumulative Cauchy distribution
function and is given in many probability theory and statistics
texts. The corresponding density function is
$$ f(x) = \frac{\alpha}{\pi[1 + (\alpha x - \beta)^2]}. \tag{3.4} $$
Note that equations (3.3) and (3.4) define symmetrical, and
therefore nonskewed, probability distributions. Since it is
desirable for flood frequency applications, however, to use a
distribution capable of describing various degrees of skewness,
both positive and negative as well as zero, some modification of
equations (3.3) and (3.4) needs to be made. This modification may
be made by raising the entire right-hand side of equation (3.3) to
a power and rewriting it in a more generalized form as
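$$ F(x) = \left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{\gamma}; \quad \alpha > 0; \; \gamma > 0; \; -\infty < x < \infty. \tag{3.5} $$
This equation then defines the cumulative generalized Cauchy
distribution function and the derivation is complete. The
corresponding density function is
$$ f(x) = \frac{\alpha\gamma}{\pi[1 + (\alpha x - \beta)^2]}\left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{\gamma - 1}; \quad \alpha > 0; \; \gamma > 0; \; -\infty < x < \infty. \tag{3.6} $$
When $\gamma = 1$, equations (3.5) and (3.6) reduce to equations
(3.3) and (3.4) and the skewness is zero. When $\gamma > 1$, the
skewness is positive since the left-hand tail is thinned and the
right-hand tail is thickened and, when $\gamma < 1$, the skewness
is negative for the opposite reasons.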
Figures 3.1 and 3.2 illustrate cumulative distribution
function and density function curves of the generalized Cauchy
distribution for various values of $\gamma$ and for $\alpha = 5$
and $\beta = 12.5$. The behavior of the tails, and thus of the
skewness, is rather evident.
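As an illustration of equations (3.5) and (3.6), the following Python sketch evaluates the cumulative distribution and density functions of the generalized Cauchy distribution. The function names gc_cdf and gc_pdf are hypothetical; the parameter values are those quoted for Figures 3.1 and 3.2, with the shape parameter chosen arbitrarily.

```python
import math


def gc_cdf(x, alpha, beta, gamma):
    """Cumulative generalized Cauchy distribution, equation (3.5)."""
    base = math.atan(alpha * x - beta) / math.pi + 0.5
    return base ** gamma


def gc_pdf(x, alpha, beta, gamma):
    """Generalized Cauchy density, equation (3.6).

    Note: for extremely negative arguments the bracketed term can
    round to zero, which this sketch does not guard against.
    """
    u = alpha * x - beta
    base = math.atan(u) / math.pi + 0.5
    return alpha * gamma / (math.pi * (1.0 + u * u)) * base ** (gamma - 1.0)


alpha, beta = 5.0, 12.5

# With gamma = 1 the distribution is the ordinary Cauchy, and the
# point of symmetry x = beta / alpha is also the median.
median_probability = gc_cdf(beta / alpha, alpha, beta, 1.0)

# Consistency check of (3.6) against (3.5) by central differences.
x, gamma, h = 2.6, 2.0, 1.0e-6
slope = (gc_cdf(x + h, alpha, beta, gamma) - gc_cdf(x - h, alpha, beta, gamma)) / (2.0 * h)
density = gc_pdf(x, alpha, beta, gamma)
```

The finite difference slope of F(x) agrees closely with f(x), which is a convenient check that the density has been transcribed consistently with the cumulative function.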
Characteristics of the Generalized
Cauchy Distribution
Characteristics of the generalized Cauchy distribution are
such that all of the objectives defined in Chapter I are
simultaneously attained. The following paragraphs reiterate the
objectives and discuss the properties of the distribution with
respect to each defined objective.
Figure 3.1
Cumulative Generalized Cauchy
Distribution Functions
Figure 3.2
Generalized Cauchy Distribution
Probability Density Functions
The first objective stated deals with skewness. Some
probability distributions applied for flood frequency
determinations, namely the Gumbel and lognormal distributions, have
constant skewness. Since annual extreme flood populations, however,
have (apparently) varying degrees of skewness, it is desirable to
use a distribution that can model any degree of skewness. The
generalized Cauchy distribution then satisfies this objective; as
mentioned in the previous section, it can, depending on the value
of $\gamma$, model zero, negative or positive skewness.
With respect to the second objective of parsimony, the
generalized Cauchy distribution is again acceptable. Being a
3-parameter distribution, it is essentially the same as a Pearson
Type III in this respect. In particular, it contains a location
parameter $\beta$, a scale parameter $\alpha$ and a shape parameter
$\gamma$.
The form of the cumulative generalized Cauchy distribution
function may be expressed analytically and thus the third
objective is satisfied. This fact of course arises due to the
derivation of the distribution. Perhaps more important, however, is
the fact that the cumulative distribution function, equation (3.5),
can be expressed in inverse form as
$$ x = \frac{\beta + \tan\{\pi[F(x)^{1/\gamma} - 1/2]\}}{\alpha}. \tag{3.7} $$
This latter trait is quite desirable in application as it permits
convenient determinations of quantiles from specified probability
levels without resorting to tables or iterative numerical
procedures.
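The convenience of the inverse form may be seen in a short Python sketch. The function names are hypothetical, and the parameter values are arbitrary illustrative choices; since the distribution is intended for log-transformed data, the returned quantile is a logarithm of discharge.

```python
import math


def gc_cdf(x, alpha, beta, gamma):
    """Cumulative generalized Cauchy distribution, equation (3.5)."""
    return (math.atan(alpha * x - beta) / math.pi + 0.5) ** gamma


def gc_quantile(p, alpha, beta, gamma):
    """Quantile for nonexceedance probability p, from inverting (3.5)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (beta + math.tan(math.pi * (p ** (1.0 / gamma) - 0.5))) / alpha


alpha, beta, gamma = 5.0, 12.5, 1.5

# Quantile with a nonexceedance probability of 0.99, i.e. the
# "100-year" value in the transformed (logarithmic) space.
x_99 = gc_quantile(0.99, alpha, beta, gamma)

# Round trip back through the cumulative distribution function.
p_back = gc_cdf(x_99, alpha, beta, gamma)
```

No table lookup or iteration is required; a single evaluation of the tangent function yields the quantile, and substituting it back into equation (3.5) recovers the specified probability.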
The final objective stated deals with boundedness and the
desire to model logarithms as opposed to real-space data. With
respect to this objective, the generalized Cauchy distribution
is again acceptable and has some particularly desirable attributes.
Since the generalized Cauchy distribution should be fitted to
log-transformed data series, and since it is doubly unbounded, its
real-space counterpart has a single, lower bound of zero and no
upper bound. The early, physically based arguments of Hazen [1914],
Hall [1921] and Foster [1924] are revived and satisfied. A final
point worth noting with respect to boundedness addresses the
Pearson Type III distribution. This latter distribution, because
of its properties, invokes physically unjustifiable bounds on the
theoretical magnitudes that floods may assume. While it appears,
based upon the critiques of the distribution, that hydrologists do
not favor this, the distribution continues to be utilized for
reasons of expediency and its ability to fit observed data series.
If one is to accept the Pearson Type III based upon goodness of fit
criteria, yet at the same time abhor its boundedness, it seems that
one should believe that the theoretically bounded tail of the
Pearson Type III should in reality be a very thin tail. Inspection
of the curves illustrated in Figures 3.1 and 3.2 provides the
conclusion that the generalized Cauchy distribution has this
property. In effect then application of the generalized Cauchy
distribution permits determinations of probability levels for flood
magnitudes outside the theoretical bounds imposed by the Pearson
Type III.
While the generalized Cauchy distribution satisfies the
stated objectives, and is thus of interest, it does present some
problems with respect to estimation of its parameters. Since, like
the Cauchy distribution, its moments can not be analytically
evaluated, alternative techniques to moment estimation, or an
approximate method of moments, must be utilized. A further
discussion of estimation techniques is contained in Chapter IV.
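The location of the mode of the generalized Cauchy
distribution is also somewhat difficult to evaluate and requires
the implementation of an iterative numerical procedure, at least
when $\gamma \ne 1$. When $\gamma = 1$, of course, the mode is
located at the point of symmetry and is
$$ x_{mo}\big|_{\gamma = 1} = \frac{\beta}{\alpha}. \tag{3.8} $$
For cases when $\gamma \ne 1$, differentiation of equation (3.6)
with respect to x yields the expression
$$ \frac{df(x)}{dx} = \frac{\alpha^2\gamma}{\pi[1 + (\alpha x - \beta)^2]^2}\left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{\gamma - 1}\left\{\frac{\gamma - 1}{\pi}\left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{-1} - 2(\alpha x - \beta)\right\} \tag{3.9} $$
which is equal to zero when x is either plus or minus infinity or
when
$$ \frac{\gamma - 1}{\pi}\left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{-1} = 2(\alpha x - \beta). \tag{3.10} $$
The location of the mode then is that value of $x_{mo}$ which
satisfies the equation
$$ \frac{\gamma - 1}{\alpha x_{mo} - \beta} - 2\tan^{-1}(\alpha x_{mo} - \beta) - \pi = 0. \tag{3.11} $$
Since the mode location is seldom determined in flood frequency
analyses, further discussion of this statistic is not given
here.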
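Although the mode is seldom needed, the iterative solution of equation (3.11) is not burdensome. The following Python sketch is one possible approach, assuming simple bisection; the text does not prescribe a particular numerical method, and the function names are hypothetical. Writing $u = \alpha x - \beta$, equation (3.11) is solved in the equivalent form $(\gamma - 1) - u(2\tan^{-1}u + \pi) = 0$, whose left side decreases steadily with u, so the root is positive when $\gamma > 1$ and negative when $\gamma < 1$.

```python
import math


def gc_pdf(x, alpha, beta, gamma):
    """Generalized Cauchy density, equation (3.6)."""
    u = alpha * x - beta
    base = math.atan(u) / math.pi + 0.5
    return alpha * gamma / (math.pi * (1.0 + u * u)) * base ** (gamma - 1.0)


def gc_mode(alpha, beta, gamma, tol=1.0e-12):
    """Mode of the generalized Cauchy distribution by bisection."""
    if gamma == 1.0:
        return beta / alpha  # equation (3.8)

    def g(u):
        # Equation (3.11) multiplied through by u = alpha * x - beta.
        return (gamma - 1.0) - u * (2.0 * math.atan(u) + math.pi)

    # Build a bracket [lo, hi] with g(lo) >= 0 >= g(hi).
    if gamma > 1.0:
        lo, hi = 0.0, 1.0
        while g(hi) > 0.0:
            hi *= 2.0
    else:
        lo, hi = -1.0, 0.0
        while g(lo) < 0.0:
            lo *= 2.0

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    u_mode = 0.5 * (lo + hi)
    return (u_mode + beta) / alpha


mode_positive_skew = gc_mode(5.0, 12.5, 2.0)
mode_negative_skew = gc_mode(5.0, 12.5, 0.5)
```

For the parameter values of Figures 3.1 and 3.2, the computed mode lies to the right of $\beta/\alpha = 2.5$ when $\gamma > 1$ and to the left when $\gamma < 1$.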
Probability Paper
Since probability paper types vary from one distribution
to another, and since Cauchy distribution paper does not appear to
be commercially available, a paper for use with that distribution
has been developed and is presented in Figure 3.3. The method by
which probability paper may be developed for any distribution is
presented by Haan [1982].
The paper presented in Figure 3.3 has probability levels
as abscissas and discharges as ordinates to be consistent with
other papers commonly used in flood frequency analysis.
Mathematicians, on the other hand, often plot the variates as
abscissas and probabilities as ordinates similar to the
presentation of Figure 3.1. Note that the discharge (ordinate)
scale in Figure 3.3 is logarithmic; this, of course, is because the
generalized Cauchy distribution should be fitted to the logarithms
of observed annual flood series.
Figure 3.3
Cauchy Probability Paper
(abscissa: Exceedance Probability (%); ordinate: discharge,
logarithmic scale)
Plots of the cumulative distribution function on the
developed paper will be linear only if $\gamma = 1$ and the
skewness is thus zero. When $\gamma > 1$, the plots will be concave
upward and, when $\gamma < 1$, the plots will be concave downward.
CHAPTER IV
PARAMETER ESTIMATION
Short of choosing a probability distribution to apply in
given circumstances, the single most important aspect of frequency
analysis is arguably that of estimation of the parameters of the
distribution. Seemingly small errors, or differences, in parameter
estimates can have rather dramatic influences on quantile estimates
made in later phases of the analysis. Every effort should be made
to utilize the best estimation techniques available where the best
technique is defined here as that yielding estimates with a minimum
variance and which are unbiased.
There are a number of commonly recognized and applied
parameter estimation techniques. In order of efficiency, these
techniques are [Yevjevich, 1972]:
1. Graphical
2. Least Squares
3. Method of Moments
4. Method of Maximum Likelihood
Graphical estimation is very subjective, and with a 3-parameter
distribution such as that derived here, can be quite tedious; ten
different investigators will most likely come up with ten different
sets of parameter estimates. The method of moments, which is
probably the most well known method, can not be applied in the case
of the generalized Cauchy distribution, at least not in the
strictest sense. As mentioned earlier, population moments can not
be derived since the integrals can not be evaluated. Least squares
and maximum likelihood techniques also present problems in the case
of the generalized Cauchy distribution. Since each of these methods
requires the simultaneous solution of, in the case of the derived
distribution, highly nonlinear equations, the mathematics become
quite unwieldy and essentially intractable. Individual parameters
can not be isolated from one another to facilitate the solution.
Considering these problems, it is apparent that other,
alternative estimation techniques should be investigated. As
discussed in the following sections, a first alternative technique
examined here involves equating the cumulative distribution
function to three observed quantiles and solving the resulting
system of equations. The equations are of course again nonlinear,
but are much more tractable than those encountered with either the
least squares or maximum likelihood techniques. A second technique
examined here involves an approximation of population moments which
may be subsequently equated to sample moments. In particular,
empirical, graphical figures and regression equations are presented
which may be utilized to estimate values of the parameters from
sample means, variances and coefficients of skewness.
Following the development of each of the two alternative
estimation techniques, a Monte Carlo approach of randomly
generating observations from known populations is utilized to
compare subsequently determined parameter estimates with the known
population values. Based upon the results of these experiments, a
recommendation is made as to which of the two techniques should be
applied in practice.
Estimation from Observed Quantiles
The generalized Cauchy distribution derived here has three
parameters, namely $\alpha$, $\beta$ and $\gamma$, and thus three
independent equations are necessary to estimate them from an
observed data series. In order to illustrate this estimation
technique, consider a sample of n log-transformed observations
which have been ranked in order of increasing magnitude and
assigned nonexceedance probability levels with a plotting position
formula. Consider also the estimation of three quantiles, say
$x_{25}$, $x_{50}$ and $x_{75}$, which may be performed from the
ranked data. Now, the cumulative generalized Cauchy distribution
function is
$$ F(x) = \left[\frac{\tan^{-1}(\alpha x - \beta)}{\pi} + \frac{1}{2}\right]^{\gamma}; \quad \alpha > 0; \; \gamma > 0; \; -\infty < x < \infty \tag{4.1} $$
which may be reformulated as
$$ \alpha x - \beta = \tan\{\pi[F(x)^{1/\gamma} - 1/2]\}; \quad \alpha > 0; \; \gamma > 0; \; -\infty < x < \infty. \tag{4.2} $$
Since the value of F(x) corresponding to $x_{25}$ is 0.25, the
value of F(x) corresponding to $x_{50}$ is 0.50, and so on, one may
then write the simultaneous equations
$$ \alpha x_{25} - \beta = A = \tan[\pi(0.25^{1/\gamma} - 1/2)], \tag{4.3} $$
$$ \alpha x_{50} - \beta = B = \tan[\pi(0.50^{1/\gamma} - 1/2)] \tag{4.4} $$
and
$$ \alpha x_{75} - \beta = C = \tan[\pi(0.75^{1/\gamma} - 1/2)] \tag{4.5} $$
which, when satisfied, yield estimates of the unknowns $\alpha$,
$\beta$ and $\gamma$. This system of equations is of course
nonlinear, but may be solved without too much difficulty. If one
makes an initial estimate of $\gamma$, values of A, B and C may be
easily determined and initial estimates of $\alpha$ and $\beta$ may
be found by applying Cramer's rule to equations (4.3) and (4.4) to
obtain
$$ \hat{\alpha} = \frac{B - A}{x_{50} - x_{25}} \tag{4.6} $$
and
$$ \hat{\beta} = \frac{B x_{25} - A x_{50}}{x_{50} - x_{25}}. \tag{4.7} $$
Calculated values of $\hat{\alpha}$, $\hat{\beta}$ and C may then
be substituted to determine whether equation (4.5) is satisfied. If
not, a new value of $\gamma$ should be assumed and the procedure
repeated. Newton's method or other numerical root-finding methods
may be adopted to speed convergence.
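The procedure described above may be sketched in Python as follows. Bisection on $\gamma$ is used here in place of Newton's method purely for simplicity; the function names, the search limits g_min and g_max, and the tolerance are assumptions of this sketch and are not taken from the text. A solution exists only if the residual of equation (4.5) changes sign between the two limits.

```python
import math


def gc_quantile(p, alpha, beta, gamma):
    """Quantile of the generalized Cauchy distribution (inverse of 4.1)."""
    return (beta + math.tan(math.pi * (p ** (1.0 / gamma) - 0.5))) / alpha


def fit_from_quantiles(x_lo, x_mid, x_hi, probs=(0.25, 0.50, 0.75),
                       g_min=0.1, g_max=20.0, tol=1.0e-12):
    """Estimate (alpha, beta, gamma) from three observed quantiles."""

    def alpha_beta_c(gamma):
        # A, B and C of equations (4.3) through (4.5) for a trial gamma.
        a, b, c = (math.tan(math.pi * (p ** (1.0 / gamma) - 0.5)) for p in probs)
        alpha = (b - a) / (x_mid - x_lo)                 # equation (4.6)
        beta = (b * x_lo - a * x_mid) / (x_mid - x_lo)   # equation (4.7)
        return alpha, beta, c

    def residual(gamma):
        # Amount by which equation (4.5) fails to be satisfied.
        alpha, beta, c = alpha_beta_c(gamma)
        return alpha * x_hi - beta - c

    lo, hi = g_min, g_max
    r_lo, r_hi = residual(lo), residual(hi)
    if r_lo * r_hi > 0.0:
        raise ValueError("no sign change of the residual within the search limits")

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        r_mid = residual(mid)
        if r_lo * r_mid <= 0.0:
            hi = mid
        else:
            lo, r_lo = mid, r_mid
        if hi - lo < tol:
            break

    gamma = 0.5 * (lo + hi)
    alpha, beta, _ = alpha_beta_c(gamma)
    return alpha, beta, gamma


# Quantiles computed from known parameters, then recovered by the fit.
true_params = (5.0, 12.5, 2.0)
x25, x50, x75 = (gc_quantile(p, *true_params) for p in (0.25, 0.50, 0.75))
alpha_hat, beta_hat, gamma_hat = fit_from_quantiles(x25, x50, x75)
```

When exact population quantiles are supplied, the known parameters are recovered; with sample quantiles, the estimates will of course carry sampling error. Other quantile triplets may be used by passing a different probs argument.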
It should be noted that this estimation technique is somewhat
subjective. As presented in the foregoing development, the
quantiles x25, x50 and x75 are utilized; in reality, any three
quantiles could be adopted and, because of sampling errors in the
basic data, different estimates of parameters would most likely
result from the use of different quantiles. A method of
eliminating this subjectivity used here involves the estimation of
numerous quantiles from an observed data series and estimating
different sets of parameters from specified combinations of
quantiles. There is of course an infinite number of possible
combinations that one might adopt; however, in the interest of
practicality, five combinations are adopted here and are:
1. x5; x50; x95
2. x10; x50; x90
3. x15; x50; x85
4. x20; x50; x80
5. x25; x50; x75
Note that the first combination of the quantiles x5, x50 and x95
can not be estimated when the sample size n is less than 19. This
is because the plotting position formula will assign probability
levels such that the smallest value will have a nonexceedance
probability greater than 0.05 and the largest sample value will
have a nonexceedance probability less than 0.95.
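The sample size limit of 19 can be checked with a short sketch. It assumes the Weibull plotting position p = i/(n + 1), which is the formula cited for the plotted data in Chapter V, and it assumes linear interpolation between ranked values when a quantile is read from the sample; the function names are hypothetical.

```python
def weibull_positions(n):
    """Nonexceedance probabilities i / (n + 1) for ranks i = 1 ... n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def quantile_pair_available(n, p_lower, p_upper, eps=1e-12):
    """True when both probability levels lie inside the plotted range."""
    probs = weibull_positions(n)
    return probs[0] <= p_lower + eps and probs[-1] >= p_upper - eps


def estimate_quantile(ranked, p):
    """Linearly interpolate a ranked (ascending) sample at probability p."""
    probs = weibull_positions(len(ranked))
    if not probs[0] <= p <= probs[-1]:
        raise ValueError("p lies outside the plotted probability range")
    for k in range(1, len(ranked)):
        if p <= probs[k]:
            w = (p - probs[k - 1]) / (probs[k] - probs[k - 1])
            return ranked[k - 1] + w * (ranked[k] - ranked[k - 1])
    return ranked[-1]
```

With n = 18 the smallest plotting position is 1/19, which exceeds 0.05, so `quantile_pair_available(18, 0.05, 0.95)` is false; with n = 19 the extreme positions are exactly 0.05 and 0.95.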
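Since for each combination of quantiles one can develop
equations similar to (4.3) through (4.7), one can then develop
j = 1,2 ... 5 estimates α̂_j, β̂_j and γ̂_j. Selection of a
particular set j of parameter estimates may be made based upon a
root mean square error (rmse) criterion defined by

$$\mathrm{rmse}_j = \left[\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_{ij}\right)^2\right]^{1/2} \tag{4.8}$$

where x_i is the ith sample value, i = 1,2 ... n, with
nonexceedance probability p_i and x̂_ij is defined as

$$\hat{x}_{ij} = \frac{1}{\hat{\alpha}_j}\left[\tan\left[\pi\left(p_i^{1/\hat{\gamma}_j} - \tfrac{1}{2}\right)\right] + \hat{\beta}_j\right]; \quad i = 1,2 \ldots n;\ j = 1,2 \ldots 5. \tag{4.9}$$

That set j of parameter estimates yielding the smallest rmse_j
should be selected.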
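A minimal sketch of this selection step follows. It assumes that the rmse is the square root of the mean squared difference between each sample value and the fitted quantile at the same nonexceedance probability, with divisor n; the divisor and the function names `rmse` and `select_parameters` are assumptions made for illustration.

```python
import math


def gc_quantile(p, alpha, beta, gamma):
    """Fitted quantile at nonexceedance probability p."""
    return (math.tan(math.pi * (p ** (1.0 / gamma) - 0.5)) + beta) / alpha


def rmse(sample, probs, params):
    """Root mean square error between ranked data and fitted quantiles."""
    alpha, beta, gamma = params
    squared = [
        (x - gc_quantile(p, alpha, beta, gamma)) ** 2
        for x, p in zip(sample, probs)
    ]
    return math.sqrt(sum(squared) / len(squared))


def select_parameters(sample, probs, candidates):
    """Return the candidate (alpha, beta, gamma) with the smallest rmse."""
    return min(candidates, key=lambda params: rmse(sample, probs, params))
```

Here `candidates` would hold the parameter sets obtained from the five quantile combinations that can be estimated for the sample at hand.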
The adoption of an rmse criterion might lead one to confuse
this estimation procedure with the least squares procedure. While
there is certainly some similarity, a fundamental difference
exists. The technique discussed here forces the cumulative
distribution function to pass through three observed quantiles;
least squares methods do not impose this constraint.
Approximate Method of Moments
Strictly speaking, the method of moments can not be used
in the case of the generalized Cauchy distribution because
evaluations of the integrals to derive population moments can not
be made. One can, however, numerically integrate the density
function to obtain approximations to population moments. Resulting
approximations may be presented in graphical form as functions of
parameter values and thus may be used to determine parameter
estimates from known sample moments. In addition, the formulation
of regression equations relating approximate population moments to
parameter values may also be performed.
Recalling that the generalized Cauchy distribution is
doubly unbounded, and recognizing that numerical integration can
not be performed over an infinite range, the methodology used here
for the development of the moment approximations involves the
adoption of cutoff points for the numerical procedure and the
approximation of the tails of the distribution with triangles. A
definition sketch of this is provided by Figure 4.1. The cutoff
points x0.5 and x99.5 are selected such that the areas under the
density function curve in each tail are each equal to 0.005.
Applying equation (3.7) then, values of x0.5 and x99.5 may be
determined from

$$x_{0.5} = \frac{1}{\alpha}\left[\tan\left[\pi\left(0.005^{1/\gamma} - \tfrac{1}{2}\right)\right] + \beta\right] \tag{4.10}$$

and

$$x_{99.5} = \frac{1}{\alpha}\left[\tan\left[\pi\left(0.995^{1/\gamma} - \tfrac{1}{2}\right)\right] + \beta\right]. \tag{4.11}$$

[Figure 4.1: definition sketch for the approximate method of
moments estimation technique]
Since for x0.5 and x99.5 there are corresponding values of f(x0.5)
and f(x99.5), and since the areas of the triangles must each be
equal to 0.005, the base lengths b_i of the triangles may be
determined to be

$$b_0 = \frac{0.010}{f(x_{0.5})} \tag{4.12}$$

and

$$b_1 = \frac{0.010}{f(x_{99.5})}. \tag{4.13}$$

Also, since the centroid of a triangle lies one-third of its height
from its base, moment arms a_i of the triangles with respect to the
origin are

$$a_0 = x_{0.5} - \frac{b_0}{3} \tag{4.14}$$

and

$$a_1 = x_{99.5} + \frac{b_1}{3} \tag{4.15}$$

so the rth noncentral moments of the combined tail areas become

$$0.005\left\{\left[x_{0.5} - \frac{0.010}{3f(x_{0.5})}\right]^r + \left[x_{99.5} + \frac{0.010}{3f(x_{99.5})}\right]^r\right\}. \tag{4.16}$$
Note that the moments expressed by equation (4.16) include the
tails of the distribution only. Using a trapezoidal rule
integration technique with an increment of Δx then, the complete
rth noncentral moments may be defined as

$$\mu'_r = \sum_i \left[x_i + \frac{\Delta x}{2}\right]^r \left[\frac{f(x_i) + f(x_{i+1})}{2}\right]\Delta x + 0.005\left\{\left[x_{0.5} - \frac{0.010}{3f(x_{0.5})}\right]^r + \left[x_{99.5} + \frac{0.010}{3f(x_{99.5})}\right]^r\right\} \tag{4.17}$$

where the summation is performed over values of x lying between
x0.5 and x99.5. Noncentral moments thus determined may be
converted to central moments through the application of equations
(2.8) and (2.9).
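The approximation can be sketched in Python as below. Several details are assumptions made for illustration: the density is obtained here by differentiating the cumulative distribution function F(x) = [1/2 + arctan(αx − β)/π]^γ, each trapezoidal strip is given a moment arm at its midpoint, the number of strips is arbitrary, and the function names are hypothetical.

```python
import math


def gc_pdf(x, alpha, beta, gamma):
    """Density obtained by differentiating the cumulative distribution."""
    z = alpha * x - beta
    g = 0.5 + math.atan(z) / math.pi
    return gamma * g ** (gamma - 1.0) * alpha / (math.pi * (1.0 + z * z))


def gc_quantile(p, alpha, beta, gamma):
    """Inverse of the cumulative distribution function."""
    return (math.tan(math.pi * (p ** (1.0 / gamma) - 0.5)) + beta) / alpha


def approx_moment(r, alpha, beta, gamma, steps=20000, tail=0.005):
    """Approximate rth noncentral moment (r a nonnegative integer)."""
    x_lo = gc_quantile(tail, alpha, beta, gamma)
    x_hi = gc_quantile(1.0 - tail, alpha, beta, gamma)
    # Triangle bases from area = base * height / 2 = tail.
    base_lo = 2.0 * tail / gc_pdf(x_lo, alpha, beta, gamma)
    base_hi = 2.0 * tail / gc_pdf(x_hi, alpha, beta, gamma)
    # Tail contribution, with centroids one-third of a base beyond the cutoffs.
    total = tail * ((x_lo - base_lo / 3.0) ** r + (x_hi + base_hi / 3.0) ** r)
    # Trapezoidal strips between the cutoff points.
    dx = (x_hi - x_lo) / steps
    f_left = gc_pdf(x_lo, alpha, beta, gamma)
    for i in range(steps):
        x_left = x_lo + i * dx
        f_right = gc_pdf(x_left + dx, alpha, beta, gamma)
        total += (x_left + 0.5 * dx) ** r * 0.5 * (f_left + f_right) * dx
        f_left = f_right
    return total
```

For the symmetric case α = 1, β = 0, γ = 1 the zeroth moment should be close to one and the first moment close to zero, which gives a quick check on the cutoff and triangle bookkeeping.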
Since it is desired to plot graphical curves of various
statistics, namely the mean, variance and coefficient of skewness,
as functions of parameter values, it is helpful and informative to
obtain some knowledge of the particular parameters influencing the
various statistics. A manner in which this may be accomplished is
to express the statistics in terms of their corresponding
population moments; i.e., in terms of the integrals defining those
moments. These integrals of course can not be analytically
evaluated, but insight as to influential parameters may be gleaned.
As an illustration of this, consider the mean which is defined by

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx. \tag{4.18}$$

Since the cumulative generalized Cauchy distribution function may
be written in inverse form, and since f(x)dx = dF(x), equation
(4.18) may be rewritten as

$$\mu = \int_0^1 F^{-1}(x)\,dF(x). \tag{4.19}$$

Substituting equation (3.7) then,

$$\mu = \int_0^1 \frac{1}{\alpha}\left[\tan\left[\pi\left(F(x)^{1/\gamma} - \tfrac{1}{2}\right)\right] + \beta\right]dF(x) \tag{4.20}$$

or

$$\mu = \frac{1}{\alpha}\int_0^1 \tan\left[\pi\left(F(x)^{1/\gamma} - \tfrac{1}{2}\right)\right]dF(x) + \frac{\beta}{\alpha}\int_0^1 dF(x). \tag{4.21}$$

Since the first integral in equation (4.21) contains only the
parameter γ, that equation may be written more compactly as

$$\mu = \frac{1}{\alpha}\left[\beta + \psi_1(\gamma)\right] \tag{4.22}$$

where ψ1(γ) denotes an unknown function of γ. The mean μ may then
be seen to be a function of all three parameters α, β and γ.
Proceeding in a similar fashion for the variance and coefficient of
skewness, one can derive

$$\sigma^2 = \frac{\psi_2(\gamma)}{\alpha^2} \tag{4.23}$$

and

$$C_s = \frac{\mu_3}{\sigma^3} = \psi_3(\gamma) \tag{4.24}$$

where the terms ψ2(γ) and ψ3(γ) again denote unknown functions of
γ. The variance is then a function of the parameters α and γ and
the coefficient of skewness is a function of γ only.
With knowledge of these influential parameters, one may
vary them one at a time and approximately determine population
moments using equation (4.17). For the purposes of this study,
values of α have been taken in increments of 10 from 10 to 60,
values of β in increments of 40 from 0 to 320 and values of γ in
increments of 0.01 from 0.95 to 1.05. The results of these analyses
are shown in Figures 4.2, 4.3 and 4.4.
Based upon the data values developed to generate these
figures, one may perform regression analyses to develop
mathematical relationships between parameter values and the
approximate statistics. While the details of their derivations are
not presented, these regression equations, along with their
corresponding coefficients of determination r², are
E(X) = fj. = P + y ~ 1 (r2 = 1.000), (4.25)
$$\mathrm{Var}(X) = \sigma^2 = \frac{5520\gamma^2 - 11{,}770\gamma + 6402}{\alpha^2} \quad (r^2 = 0.998), \tag{4.26}$$

and

$$C_s = -146.1\gamma^2 + 362.1\gamma - 215.4; \quad \gamma > 1 \quad (r^2 = 0.999); \tag{4.27}$$

$$C_s = 69.8(\gamma - 1); \quad \gamma \le 1 \quad (r^2 = 0.999). \tag{4.28}$$

These equations should only be considered valid within the ranges
of the parameters α, β and γ used in their developments and as
shown in Figures 4.2 through 4.4. Beyond these ranges, incorrect
estimates may result.

[Figure 4.2: Approximate Relationship Between α, β, γ and μ
(vertical axis: mean, μ)]

[Figure 4.3: Approximate Relationship Between α, γ and σ²
(vertical axis: variance, σ²)]

[Figure 4.4: Approximate Relationship Between γ and C_s
(horizontal axis: γ from 0.95 to 1.05; vertical axis: coefficient
of skewness, C_s)]
Application of this approximate method of moments involves
calculating sample statistics from the log-transformed values x_i,
i = 1,2 ... n, of an observed data series using

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{4.29}$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \tag{4.30}$$

and

$$G = \frac{n}{(n-1)(n-2)s^3}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^3 \tag{4.31}$$

and equating them, respectively, to μ, σ² and C_s. Knowledge of G
allows one to determine γ from Figure 4.4 and, in turn, knowledge
of γ and s² allows one to determine α from Figure 4.3. The value
of β may be finally estimated using Figure 4.2 with knowledge of x̄,
α and γ. As an alternative to using Figures 4.2 through 4.4, one
may rearrange equations (4.25) through (4.28) to yield estimates
of α, β and γ from known values of x̄, s² and G.
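The sample statistics entering this procedure can be computed with a short sketch. It assumes the conventional estimators, namely a variance with divisor n − 1 and a skew coefficient with the n/[(n − 1)(n − 2)] adjustment, and it assumes base 10 logarithms; the function name `log_sample_statistics` is hypothetical.

```python
import math


def sample_statistics(values):
    """Return the mean, variance (divisor n - 1) and skew coefficient G."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    s_cubed = variance ** 1.5
    skew = n * sum((v - mean) ** 3 for v in values) / ((n - 1) * (n - 2) * s_cubed)
    return mean, variance, skew


def log_sample_statistics(flows):
    """Statistics of the base 10 logarithms of an observed flow series."""
    return sample_statistics([math.log10(q) for q in flows])
```

At least three distinct observations are needed, since the skew coefficient divides by n − 2 and by the cube of the standard deviation.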
Comparison of Estimation Techniques

Parameter estimation techniques discussed in the previous
sections are unconventional and thus some knowledge of their
properties is desirable. One particularly wishes to know which of
the two techniques is more reliable in terms of parameter
estimation. In order to make this assessment, a Monte Carlo
experimental approach has been adopted for the case of the
approximate method of moments technique. Results of the
experiments, in conjunction with a process of elimination, lead to
a recommendation as to which of the two techniques should be
utilized.

An illustration of the Monte Carlo approach may be made by
considering the adoption of some known population distribution;
i.e., "true" values of the parameters α, β and γ are specified.
With this known population, one can randomly generate j = 1,2 ... m
samples of x_i where each sample is of size i = 1,2 ... n.
Estimates α̂_j, β̂_j and γ̂_j can then be computed for each of the m
samples and one can determine the statistics

$$E(\hat{\alpha}) = \frac{1}{m}\sum_{j=1}^{m}\hat{\alpha}_j, \tag{4.32}$$

$$E(\hat{\beta}) = \frac{1}{m}\sum_{j=1}^{m}\hat{\beta}_j, \tag{4.33}$$

$$E(\hat{\gamma}) = \frac{1}{m}\sum_{j=1}^{m}\hat{\gamma}_j, \tag{4.34}$$

$$\mathrm{Var}(\hat{\alpha}) = \frac{1}{m-1}\sum_{j=1}^{m}\left[\hat{\alpha}_j - E(\hat{\alpha})\right]^2, \tag{4.35}$$

$$\mathrm{Var}(\hat{\beta}) = \frac{1}{m-1}\sum_{j=1}^{m}\left[\hat{\beta}_j - E(\hat{\beta})\right]^2 \tag{4.36}$$

and

$$\mathrm{Var}(\hat{\gamma}) = \frac{1}{m-1}\sum_{j=1}^{m}\left[\hat{\gamma}_j - E(\hat{\gamma})\right]^2. \tag{4.37}$$

Biases B(·) may also be estimated from

$$B(\hat{\alpha}) = E(\hat{\alpha}) - \alpha, \tag{4.38}$$

$$B(\hat{\beta}) = E(\hat{\beta}) - \beta \tag{4.39}$$

and

$$B(\hat{\gamma}) = E(\hat{\gamma}) - \gamma. \tag{4.40}$$

Repeating this entire process for different populations and for
various sample sizes n enables one to make a fairly well qualified
determination as to the acceptability of an estimation technique.
If a technique is desirable, biases will be either small or
predictable and should decrease as n increases. Similarly,
variances should be relatively small and should also decrease as n
increases.
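The bookkeeping of such an experiment can be sketched as follows. Variates are drawn here by inverse transformation of uniform random numbers, which is an assumption about the generation method, and the variance uses a divisor of m − 1, which is also an assumption; the estimator applied to each sample is left to the caller, and the function names are hypothetical.

```python
import math
import random
import statistics


def gc_sample(n, alpha, beta, gamma, rng):
    """Draw n generalized Cauchy variates by inverse transformation."""
    return [
        (math.tan(math.pi * (rng.random() ** (1.0 / gamma) - 0.5)) + beta) / alpha
        for _ in range(n)
    ]


def summarize(estimates, true_value):
    """Return the mean, variance and bias of m parameter estimates."""
    mean = statistics.mean(estimates)
    return mean, statistics.variance(estimates), mean - true_value


def monte_carlo(estimator, n, alpha, beta, gamma, m=50, seed=0):
    """Apply `estimator` (sample -> (a, b, g)) to m samples of size n."""
    rng = random.Random(seed)
    fits = [estimator(gc_sample(n, alpha, beta, gamma, rng)) for _ in range(m)]
    truths = (alpha, beta, gamma)
    return [summarize([f[k] for f in fits], truths[k]) for k in range(3)]
```

Calling `monte_carlo` for n = 15, 30, 45 and 60 with the same estimator would show whether biases and variances shrink as the sample size grows.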
For the purposes of this study, m is taken as m = 50 and
sample sizes n are taken as n = 15, 30, 45, 60. Values of α and γ
are taken in discrete increments and within the ranges shown by
Figures 4.2 through 4.4. Values of β have been held constant at
β = 0 since this parameter has no influence other than to change
the absolute magnitudes of the generated variates. A partial
listing of the experimental results is tabulated in Appendix A for
the case of the approximate method of moments technique.
Observations which may be made from those results are:
1. The variance exhibited by estimates of α is quite large, even
for sample sizes as large as n = 60. Since observed annual
flood series are typically rather short, relatively inefficient
estimates would be expected.
2. Biases in estimates of α decrease with increasing n, but seem
to be a function of the population value of α. Bias correction
equations would thus be implicit in terms of α.
3. Variances exhibited by estimates of β decrease with increasing
n, but biases in estimates of β increase with increasing n.
Estimates of β are therefore inconsistent in that they do not
converge to the population value.
4. Variances and biases exhibited by estimates of γ both increase
with increasing n. Estimates of γ are therefore neither
consistent nor efficient.
Monte Carlo experiments have not been performed for the
observed quantile estimation technique due to the amount of
computer time that would be required; however, based upon the
results summarized above, one is left with no recourse but to
consider the approximate method of moments as unacceptable. By
elimination then, the observed quantile estimation technique should
be utilized in practice. While this latter technique may not yield
the most efficient estimates imaginable, it at least yields
consistent estimates since, as n increases, observed quantiles
approach their population values. This of course is one of the
fundamental tenets of probability theory and is a vast improvement
over the inconsistent estimates exhibited by the approximate method
of moments.

A drawback to the observed quantile estimation technique
lies in its application. Since a number of combinations of
quantiles should be estimated from an observed data series, and
since estimation of sets of parameters for each combination of
quantiles involves an iterative numerical technique, fitting of a
distribution function can be quite tedious. The development of a
computer program to facilitate fitting would have easily
recognizable benefits.
CHAPTER V
APPLICATIONS TO REAL DATA
Chapter III has provided a derivation of a generalized
Cauchy distribution and Chapter IV has made a recommendation as to
a technique by which that distribution may be fitted to an observed
data series. One can now proceed to fit the distribution to real,
observed data series and to make a comparison of its fit with fits
obtained using other commonly applied distributions. This chapter
provides an illustration of fit comparisons using a few actually
observed annual peak and runoff series. The Pearson Type III
distribution is also fitted to these series and is used for
comparison purposes since it is probably the most commonly applied
distribution in the United States. Unlike the lognormal and Gumbel
distributions, application of the Pearson Type III does not require
that the sample skewness be insignificantly different from a
constant population value.

Following the presentation of site-specific analyses, an
assessment of the ability of the generalized Cauchy distribution
to model or explain the separation effect [Matalas et al., 1975]
is made. This phenomenon was briefly mentioned in Chapter I and
is discussed in more detail here.
Site-Specific Comparisons

The purpose of this section is to summarize and discuss
comparisons of the fits of the generalized Cauchy distribution and
the Pearson Type III distribution to some actually observed data
series. A summary of the data series used is given in Table 5.1;
complete tabulations of the series are contained in Appendix B.
While the main thrust of the work presented by this thesis is
related to annual extreme series, many of the data series modeled
here are actually total annual runoff series. Since the same
distributions are often used for both types of series, however, and
since the basic objectives underlying the development of the
generalized Cauchy distribution are still valid for total annual
runoff series, this should not present any problems. Indeed, the
inclusion of both types of series should broaden the base of the
comparisons and may lend a better understanding as to the diversity
of applications of the derived distribution. In addition to the
inclusion of different types of data series, an attempt is made to
include data series from a diverse group of geographical locations.
This will hopefully eliminate any biases that might be perceived
in the comparisons if all data series were taken from one region
only.
Table 5.1
Summary of Stations for Distribution Comparisons

                                                                            Sample Statistics of Logs
Station Description                          Series Type     Years of Record   x̄       s²       G
Cave Creek near Fort Spring, Kentucky        Annual Runoff   18                3.14    0.037    2.09
Spray River at Banff, Canada                 Annual Runoff   45                2.64    0.031    1.93
Green River at Munfordville, Kentucky        Annual Runoff   21                5.94    0.025    0.60
Weldon River at Mill Grove, Missouri         Annual Runoff   31                2.27    0.137    0.20
San Juan River at Pagosa Springs, Colorado   Annual Peaks    42                3.34    0.033    0.27
Fishkill Creek at Beacon, New York           Annual Peaks    24                3.37    0.060    0.73

Graphical illustrations of fitted distribution functions
for each of the sites listed in Table 5.1 are provided by Figures
5.1 through 5.6. The illustrations have been prepared using
cartesian paper as opposed to the probability paper developed in
Chapter III since the latter does not permit the illustration of
the theoretical bounds invoked by the Pearson Type III distribution
when those bounds fall within the ranges of the figures.

[Figure 5.1: Distributions of Annual Runoff, Cave Creek near Fort
Spring, Kentucky (axes: X and F(x))]

[Figure 5.2: Distributions of Annual Runoff, Spray River at Banff,
Canada (axes: X and F(x))]

[Figure 5.3: Distributions of Annual Runoff, Green River at
Munfordville, Kentucky (axes: X and F(x))]

[Figure 5.4: Distributions of Annual Runoff, Weldon River at Mill
Grove, Missouri (axes: X and F(x))]

[Figure 5.5: Distributions of Annual Snowmelt Peak Flows, San Juan
River at Pagosa Springs, Colorado (axes: X and F(x))]

[Figure 5.6: Distributions of Annual Peak Flows, Fishkill Creek at
Beacon, New York (axes: X and F(x))]
Data points shown in Figures 5.1 through 5.6 have been
plotted using the Weibull plotting position formula (equation
(2.16)). Fitting of the generalized Cauchy distribution has been
performed using the observed quantile estimation technique
recommended in Chapter IV. Parameter estimates for that
distribution, as well as the corresponding solution basis
quantiles, are summarized in Table 5.2. Fitting of the Pearson
Type III distribution has been performed based upon the Bulletin
17B guidelines [USWRC, 1981] with two exceptions: First, outlier
testing and adjustment techniques have not been applied; and
second, regional skew adjustments have not been made. Since
techniques comparable to these have not been developed for the
generalized Cauchy distribution, their application to the Pearson
Type III might tend to bias the comparisons.
Table 5.2
Summary of Generalized Cauchy Distribution Parameter Estimates

                                             Parameter Estimates
Station Description                          α        β         γ        Solution Basis Quantiles
Cave Creek near Fort Spring, Kentucky        17.33    55.41     0.787    x15; x50; x85
Spray River at Banff, Canada                 50.64    137.53    0.664    x5; x50; x95
Green River at Munfordville, Kentucky        28.71    171.79    0.842    x5; x50; x95
Weldon River at Mill Grove, Missouri         11.95    27.81     0.913    x5; x50; x95
San Juan River at Pagosa Springs, Colorado   20.44    68.16     1.063    x5; x50; x95
Fishkill Creek at Beacon, New York           12.19    40.72     1.155    x5; x50; x95

An observation which may be made from these figures is that
the generalized Cauchy distribution has much heavier (thicker)
tails than the Pearson Type III does. In the cases of Figures 5.1
and 5.2, the Pearson Type III has no upper tail at all since the
skewness is sufficiently negative. A point also worth noting with
respect to these two figures is that the upper theoretical bounds
of the Pearson Type III are actually lower than observed data
values. The heaviness of the tails of the generalized Cauchy
distribution is of some concern; values of X corresponding to, say,
the 100-year flood are significantly higher than those inferred by
the Pearson Type III, often by as much as an order of magnitude or
more. Tails of the generalized Cauchy distribution are much like
inverse power functions as opposed to the exponential type tails
of the Pearson Type III. This explains their heaviness.
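The inverse power behavior of the tails can be made explicit with a short worked expansion. This is a sketch derived from the cumulative distribution function in the form F(x) = [1/2 + (1/π)arctan(αx − β)]^γ implied by equation (4.2); the shorthand z = αx − β is introduced only for this illustration.

```latex
\begin{align*}
\tfrac{1}{2} + \tfrac{1}{\pi}\tan^{-1} z
  &= 1 - \tfrac{1}{\pi}\tan^{-1}(1/z) \approx 1 - \frac{1}{\pi z},
  & z &\to +\infty, \\
1 - F(x) &\approx \frac{\gamma}{\pi(\alpha x - \beta)},
  & x &\to +\infty, \\
F(x) &\approx \left[\frac{1}{\pi(\beta - \alpha x)}\right]^{\gamma},
  & x &\to -\infty.
\end{align*}
```

Under this expansion the exceedance probability in the upper tail falls only in proportion to 1/x, while the lower tail falls as the γth power of 1/|x|, so a larger γ thins the lower tail.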
An additional observation, which is linked to the heavy
tails, is that the generalized Cauchy distribution is more kurtotic
(peaked) than the Pearson Type III. This is manifested by the
steeper slopes of the cumulative distribution function in the
middle of the range.

A third observation pertains to skewness. Regardless of
the heavy tails, and of the steep slopes of the cumulative
distribution function, the fit of the generalized Cauchy
distribution appears to improve as the absolute value of the sample
skewness increases. This is particularly evident in the upper tail
of the distribution.

A final observation made here is that the solution basis
quantiles for final parameter estimates are nearly always x5, x50
and x95. In only one case of the six evaluated is a varying
result obtained and, in that case, the quantiles x5, x50 and x95
can not be estimated from the sample because the sample size is too
small. The fact, however, that the solution basis in that one case
is the quantiles x15, x50 and x85, as opposed to x10, x50 and x90,
reinforces the rmse criterion upon which the parameter estimates
are based.
Regional Skews and the Separation Effect
Based upon the foregoing conclusions, and recalling the
discussions of Houghton [1978] mentioned earlier with respect to
the behaviors of distribution tails and kurtosis necessary to
explain the separation effect, it seems appropriate to make an
assessment of the generalized Cauchy distribution in this regard.
Specific attributes of the generalized Cauchy distribution leading
to this are its kurtosis as well as the fact that it has a very
thick upper tail and a relatively (with respect to other commonly
used flood frequency distributions) thick lower tail when its
skewness is positive.

In an exposition of the separation phenomenon, Matalas et
al. [1975] showed that seven commonly used probability
distributions, namely the normal, lognormal, Gumbel, Pearson Type
III, Weibull, Pareto and uniform, are incapable of modeling
statistics of the coefficient of skewness observed in nature. In
particular, standard deviations of the coefficient of skewness,
when plotted against means of the coefficient of skewness for each
of the 14 regions in the U.S., are consistently and significantly
higher for actually observed, historical data than those obtainable
by synthesis of random observations from the seven hypothetical
distributions. As noted by Matalas et al., this seems to imply that
none of the seven considered is the true distribution of floods
but does not imply that one of them might not be a good
approximation to the true underlying distribution.
A Monte Carlo approach similar to that used by Matalas et
al. is used here to make an assessment of the ability of the
generalized Cauchy distribution to explain the separation effect.
The approach involves the generation of j = 1,2 ... 500 samples
of generalized Cauchy-distributed variates x_ij where each sample
is of size i = 1,2 ... n, n = 10, 20, 30. The sample coefficient
of skewness G_j is then calculated for each sample from

$$G_j = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^3}{s_j^3} \tag{5.1}$$

where

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} \tag{5.2}$$

and

$$s_j = \left[\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2\right]^{1/2}. \tag{5.3}$$

Finally, the mean skew Ḡ and the standard skew s(G) are computed
from

$$\bar{G} = \frac{1}{500}\sum_{j=1}^{500} G_j \tag{5.4}$$

and

$$s(G) = \left[\frac{1}{499}\sum_{j=1}^{500}\left(G_j - \bar{G}\right)^2\right]^{1/2} \tag{5.5}$$
and plotted against each other for various sample sizes n.
Population values of γ for purposes of the random variate
generation are varied in increments of 0.03 from 0.82 to 1.39 but
the values of α and β are taken as constants of 10.0 and zero,
respectively. As shown earlier, these latter parameters have no
influence on skewness.
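A sketch of one point of this experiment is given below. It assumes that the sample skew is formed from moments with divisor n, that the standard skew uses a divisor of one less than the number of samples, and that variates are generated by inverse transformation; these are assumptions, and the function names and the seed are hypothetical.

```python
import math
import random
import statistics


def sample_skew(xs):
    """Skew coefficient from second and third moments with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skew_statistics(n, gamma, alpha=10.0, beta=0.0, m=500, seed=0):
    """Mean skew and standard skew over m synthetic samples of size n."""
    rng = random.Random(seed)
    skews = []
    for _ in range(m):
        xs = [
            (math.tan(math.pi * (rng.random() ** (1.0 / gamma) - 0.5)) + beta) / alpha
            for _ in range(n)
        ]
        skews.append(sample_skew(xs))
    return statistics.mean(skews), statistics.stdev(skews)
```

Repeating the call for γ = 0.82, 0.85 ... 1.39 and n = 10, 20, 30 would trace out one curve of mean skew against standard skew per sample size.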
Results of the Monte Carlo experiments are shown in Figures
5.7 through 5.9. The darkened circles represent historically
observed values for each of the 14 regions, the shaded bands
represent the ranges of results obtained by Matalas et al. and the
open circles and dashed lines represent results obtained for the
generalized Cauchy distribution. The fact that the mean and
standard skews, as well as the shaded band width, increase with n
is attributable to the Kirby bounds [Kirby, 1974] which state that
sample skews are bounded and depend only on n as

$$|G| \le \frac{n - 2}{(n - 1)^{1/2}}. \tag{5.6}$$

As n increases, so does the range of sample skews (and their
variance) which may be realized in samples of size n.
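The bound can be illustrated numerically. The sketch below assumes that the skew coefficient is formed from moments with divisor n, the form for which the limit (n − 2)/(n − 1)^(1/2) is attained by a sample of n − 1 equal values and one larger value; the function names are hypothetical.

```python
import math


def kirby_bound(n):
    """Largest attainable magnitude of the sample skew for sample size n."""
    return (n - 2) / math.sqrt(n - 1)


def sample_skew(xs):
    """Skew coefficient from second and third moments with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    for n in (10, 20, 30):
        extreme = [0.0] * (n - 1) + [1.0]  # one observation far above the rest
        print(n, kirby_bound(n), sample_skew(extreme))
```

The printed pairs agree for each n, and the bound itself grows with n, which is the behavior the text attributes to the widening of the plotted bands.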
The conclusion that the generalized Cauchy distribution
overcompensates for the separation effect supports the belief that
its tails are too heavy but is a significant conclusion nonetheless
for it implies that the most basic of the objectives underlying
the derivation may be correct. The objective referred to is that
of a doubly unbounded distribution. The existence of a significant
lower tail enables the realization of extremely low observations
which can have significant effects on variances and means exhibited
by coefficients of skewness. The tails must apparently be thinner
than those of the generalized Cauchy distribution however.

[Figure 5.7: Statistics of Observed and Synthesized Skews, n = 10
(horizontal axis: mean skew; vertical axis: standard skew; series:
historical data, Matalas et al. results, generalized Cauchy data)]

[Figure 5.8: Statistics of Observed and Synthesized Skews, n = 20
(horizontal axis: mean skew; vertical axis: standard skew; series:
historical data, Matalas et al. results, generalized Cauchy data)]

[Figure 5.9: Statistics of Observed and Synthesized Skews
(vertical axis: standard skew)]
An additional observation that may be made from these
results is that there is a marked tendency for the standard skew to
decrease as the mean skew increases for the generalized Cauchy
distribution. This is the opposite of usually experienced trends
(refer to the shaded bands of Figures 5.7 through 5.9). While the
writer has no conclusive evidence as to the reason for this
behavior, it is believed that it may be due to the progressive
thinning of the lower tail of the generalized Cauchy distribution
as the magnitude of positive skewness increases. In effect, very
low magnitudes of X become highly improbable and, as a result,
variances exhibited by the coefficient of skewness decrease.
CHAPTER VI
SUMMARY AND CONCLUSIONS
The previous pages have briefly discussed some theoretical
and practical strengths and weaknesses in a number of probability
distributions commonly used for flood frequency analyses.
Knowledge of these various attributes has led to an enumeration of
some desirable characteristics of theoretical flood frequency
distributions and a distribution has been derived that has these
characteristics. Termed here as the generalized Cauchy, the
derived distribution turns out to be a 3-parameter version of the
well known Cauchy distribution. The third parameter introduced,
namely γ, has the effect of changing the skewness of the otherwise
symmetrical distribution. Depending on the magnitude of γ, the
generalized version of the Cauchy distribution may exhibit zero,
positive or negative skewness, yet remains doubly unbounded.
Following the derivation of the generalized Cauchy distribution,
methods of estimation of its parameters have also been discussed.
While work remains to be done in this area, a recommendation to use
an observed quantile estimation technique has been made here.
Future work might address alternative estimation techniques.
Applications of the generalized Cauchy distribution to
hydrologic problems have not been found; however, future research
directions have been identified. Indeed, the most basic of the
hypotheses underlying the derivation, namely those of double
unboundedness and simultaneous skewness, appear to be correct. The
fact that the separation effect is overcompensated for by the
distribution justifies this claim. It appears as though a similar
distribution with thinner tails of an exponential rather than
power function nature will meet the desired objectives of fitting
site-specific series as well as explaining the separation effect.
The inducement of skewness to the normal distribution in a manner
similar to that performed here might be one way to accomplish this;
use of the Pearson Type IV distribution might be another. These
latter distributions are not easily expressible, particularly in
cumulative distribution function form, but appear worthy of
evaluation nonetheless.
While the study presented here seems to have failed in its
intent to identify an alternative distribution for flood frequency
analyses, and thus perhaps its title should be revised,
significance can be claimed. Not only has it identified a new
direction for flood frequency research, but it has made a
contribution to the repertoire of tools at the statistician's
disposal. Some basic characteristics of the distribution have been
identified here, but additional work needs to be done to examine
sampling distributions of its parameters for hypothesis testing
purposes. Work also needs to be done to develop methods of placing
confidence limits on the cumulative distribution function curve.
In closing, one should come away with the impression that
a tremendous amount of work remains to be done. This study has
dealt with one of the simplest theories of probability and
mathematical statistics, namely that of continuous univariate
distributions. In combination with the balance of the theories
underlying those major headings, one could easily devote a lifetime
of study yet barely scratch the surface. Study of these fields is
particularly exciting, especially when they have applications as
direct as they do in water resources engineering, and can be both
rewarding and frustrating. Since the theories deal with random
variables, it is often difficult to even speculate as to the types
of results that might be obtained in any given endeavor. One must
therefore adopt a positive attitude; even the most frustrating
realizations become rewarding when taken in the proper context.
CITED REFERENCES
Beard, L.R. 1943. "Statistical Analysis in Hydrology." Transactions,
American Society of Civil Engineers, Vol. 108, pp. 1110-1160.
Bodhaine, G.L. 1961. "Flood-Frequency Relationships in the Pacific
Northwest." Transactions, American Society of Civil Engineers,
Vol. 126, Part I, pp. 1858-1867.
Chow, V.T. 1951. "A Generalized Formula for Hydrologic Frequency
Analysis." Transactions, American Geophysical Union, Vol. 32,
No. 2, pp. 231-237.
Costa, J.E., and R.D. Jarrett. 1981. "Debris Flows in Small
Mountain Stream Channels of Colorado and Their Hydrologic
Implications." Bulletin of the Association of Engineering
Geologists, Vol. XVIII, No. 3, pp. 309-322.
Dawdy, D.R., and D.P. Lettenmaier. 1987. "Initiative for Risk-
Based Flood Design." Journal of Hydraulic Engineering, Vol.
113, No. 8, pp. 1041-1051.
Dennis, H.W. 1921. "A Method for Adapting the Records of Stream
Flow at One Point to Another Point on the Same Stream."
Transactions, American Society of Civil Engineers, Vol. LXXXIV,
pp. 551-569.
Evans, W.S. 1930. "The Graphical Solution of a Correlation Table."
Transactions, American Society of Civil Engineers, Vol. 94,
pp. 936-960.
Foster, H.A. 1924. "Theoretical Frequency Curves and Their
Application to Engineering Problems." Transactions, American
Society of Civil Engineers, Vol. LXXXVII, pp. 142-203.
Foster, H.A. 1934. "Duration Curves." Transactions, American
Society of Civil Engineers, Vol. 99, pp. 1213-1267.
Gumbel, E.J. 1945. "Floods Estimated by Probability Method."
Engineering News-Record, Vol. 134, p. 833.
Guttman, I., S.S. Wilks, and J.S. Hunter. 1982. Introductory
Engineering Statistics, 3rd Edition. John Wiley & Sons: New
York.
Haan, C.T. 1982. Statistical Methods in Hydrology, 3rd Printing.
Iowa State University Press: Ames.
Hall, L.S. 1921. "The Probable Variations in Yearly Run-Off as
Determined from a Study of California Streams." Transactions,
American Society of Civil Engineers, Vol. LXXXIV, pp. 191-257.
Harza, L.F. 1921. Discussion of "The Probable Variations in Yearly
Run-Off as Determined from a Study of California Streams" by
L.S. Hall. Transactions, American Society of Civil Engineers,
Vol. LXXXIV, pp. 191-257.
Hazen, A. 1914. "Storage to be Provided in Impounding Reservoirs
for Municipal Water Supply." Transactions, American Society
of Civil Engineers, Vol. LXXVII.
Hosking, J.R.M., and J.R. Wallis. 1986. "Paleoflood Hydrology and
Flood Frequency Analysis." Water Resources Research, Vol. 22,
No. 4, pp. 543-550.
Houghton, J.C. 1978. "Birth of a Parent: The Wakeby Distribution
for Modeling Flood Flows." Water Resources Research, Vol. 14,
No. 6, pp. 1105-1109.
Hughes, W.C. 1977. "Peak Discharge Frequencies from Rainfall
Information." ASCE Journal of the Hydraulics Division, Vol.
103, No. HY1, pp. 39-50.
Jarrett, R.D., and J.E. Costa. 1983. "Multidisciplinary Approach
to the Flood Hydrology of Foothill Streams in Colorado."
Proceedings, International Symposium on Hydrometeorology,
American Water Resources Association.
Jennings, M.E., and M.A. Benson. 1969. "Frequency Curves for
Annual Flood Series with Some Zero Events or Incomplete Data."
Water Resources Research, Vol. 5, No. 1, pp. 276-280.
Kirby, W. 1974. "Algebraic Boundedness of Sample Statistics."
Water Resources Research, Vol. 10, No. 2, pp. 220-222.
Kite, G.W. 1985. Frequency and Risk Analyses in Hydrology, 3rd
Printing. Water Resources Publications: Littleton.
Lane, E.W., and K. Lei. 1950. "Stream Flow Variability."
Transactions, American Society of Civil Engineers, Vol. 115,
pp. 1084-1134.
Linsley, R.K., M.A. Kohler, and J.L.H. Paulhus. 1982. Hydrology
for Engineers, 3rd Edition. McGraw-Hill Book Co: New York.
Matalas, N.C., J.R. Slack, and J.R. Wallis. 1975. "Regional Skew
in Search of a Parent." Water Resources Research, Vol. 11, No.
6, pp. 815826.
Mavis, F.T. 1970. "Nomographs for Some Cumulative Frequency
Distributions." ASCE Journal of the Hydraulics Division, Vol.
96, No. HY10, pp. 2177-2183.
Potter, K.W. 1987. "Research on Flood Frequency Analysis:
1983-1986." Contributions in Hydrology, U.S. National Report to
International Union of Geodesy and Geophysics. American
Geophysical Union: Washington, D.C. pp. 113-118.
Stedinger, J.R., and T.A. Cohn. 1986. "Flood Frequency Analysis
with Historical and Paleoflood Information." Water Resources
Research, Vol. 22, No. 5, pp. 785-793.
USWRC. 1981. "Guidelines for Determining Flood Flow Frequency."
Bulletin No. 17B of the Hydrology Committee, U.S. Water
Resources Council: Washington, D.C.
Wallis, J.R., and E.F. Wood. 1985. "Relative Accuracy of Log
Pearson III Procedures." Journal of Hydraulic Engineering,
Vol. 111, No. 7, pp. 1043-1056.
Yevjevich, V.M. 1972. Probability and Statistics in Hydrology.
Water Resources Publications: Littleton.
APPENDIX A
MONTE CARLO EXPERIMENT RESULTS
Table A.1, which follows, is a partial listing of the results of
Monte Carlo experiments performed to assess the acceptability of
parameter estimates derived using the approximate method of
moments technique. Discussions of this estimation technique, as
well as of the Monte Carlo procedure used, are contained in
Chapter IV.
Tabulated statistics in this appendix should not be used
to describe sampling distributions of parameter estimates. Since
only 50 samples were generated to assess the statistics,
stochastic variations are evident in the tabulated values. Trends
are indicated, however.
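
The way such tabulated means and variances are assembled from replicate samples can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Chapter IV: it draws from the ordinary two-parameter Cauchy distribution rather than the generalized Cauchy, and it substitutes a simple quantile-based estimator (median and half the interquartile range) for the approximate method of moments. The function names and the parameters loc, scale, replicates and seed are hypothetical.

```python
import math
import random
import statistics


def sample_cauchy(n, loc, scale, rng):
    """Draw n ordinary Cauchy variates by inverting the CDF (stand-in parent)."""
    return [loc + scale * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(n)]


def estimate_loc_scale(sample):
    """Stand-in estimator: median for location, half the IQR for scale."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return q2, 0.5 * (q3 - q1)


def monte_carlo_summary(n, loc, scale, replicates=50, seed=0):
    """Mean and variance of the estimates over repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [estimate_loc_scale(sample_cauchy(n, loc, scale, rng))
                 for _ in range(replicates)]
    locs = [e[0] for e in estimates]
    scales = [e[1] for e in estimates]
    return {
        "E_loc": statistics.mean(locs),
        "E_scale": statistics.mean(scales),
        "Var_loc": statistics.variance(locs),
        "Var_scale": statistics.variance(scales),
    }


if __name__ == "__main__":
    # Two runs that differ only in the seed show the stochastic variation
    # expected when each statistic rests on just 50 replicates.
    for seed in (1, 2):
        print(seed, monte_carlo_summary(n=15, loc=0.0, scale=15.0, seed=seed))
```

Repeating a run with a different seed changes the summary values noticeably at small n, which is the kind of stochastic variation cautioned against above.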
Table A.1
Monte Carlo Experiment Results
 n   α   β    γ    E(α̂)   E(β̂)   E(γ̂)   Var(α̂)   Var(β̂)   Var(γ̂)
15  15   0   0.95    69    0.02    1.01     1670     12.0     0.001
30  15   0   0.95    45    0.05    1.03      638      5.0     0.002
45  15   0   0.95    42    0.27    1.04      371      2.8     0.002
60  15   0   0.95    39    0.84    1.06      559      1.7     0.003
15  15   0   1.01    53    0.04    1.01      891      9.4     0.001
30  15   0   1.01    34    0.24    1.04      574      4.4     0.002
45  15   0   1.01    45    1.32    1.05      679      2.1     0.001
60  15   0   1.01    35    1.25    1.07      536      1.4     0.003
15  15   0   1.05    48    0.15    1.02      983      9.0     0.001
30  15   0   1.05    47    1.57    1.04      960      3.8     0.001
45  15   0   1.05    41    1.06    1.05      682      3.5     0.002
60  15   0   1.05    38    1.62    1.08      676      1.7     0.004
Table A.1
(Continued)
 n   α   β    γ    E(α̂)   E(β̂)   E(γ̂)   Var(α̂)   Var(β̂)   Var(γ̂)
15  30   0   0.95   122    0.02    1.01     7250     10.0     0.001
30  30   0   0.95   108    0.27    1.02     3900      6.9     0.001
45  30   0   0.95    85    0.35    1.04     2870      3.2     0.003
60  30   0   0.95    76    0.74    1.06     2590      1.5     0.004
15  30   0   1.01   142    1.09    1.01     8110     12.4     0.001
30  30   0   1.01   130    1.79    1.03     3790      3.6     0.001
45  30   0   1.01    73    0.83    1.06     2880      2.6     0.003
60  30   0   1.01    89    1.38    1.06     2850      1.3     0.003
15  30   0   1.05   106    0.71    1.02     2990      7.3     0.001
30  30   0   1.05    75    0.90    1.04     3260      4.6     0.002
45  30   0   1.05    89    1.38    1.05     3180      2.4     0.002
60  30   0   1.05    61    1.48    1.08     1760      0.5     0.003
Table A.1
(Continued)
 n   α   β    γ    E(α̂)   E(β̂)   E(γ̂)   Var(α̂)   Var(β̂)   Var(γ̂)
15  60   0   0.95   261    0.17    1.01    31500     10.1     0.001
30  60   0   0.95   139    0.48    1.03     8910      8.2     0.002
45  60   0   0.95   174    0.70    1.04     8430      2.8     0.002
60  60   0   0.95   139    0.56    1.06     9690      2.2     0.004
15  60   0   1.01   227    0.50    1.02    19400     11.1     0.001
30  60   0   1.01   198    1.27    1.03    13900      3.2     0.001
45  60   0   1.01   209    1.39    1.04    12200      2.8     0.002
60  60   0   1.01   189    1.54    1.05     7240      0.9     0.001
15  60   0   1.05   260    1.30    1.01    19800      7.9     0.001
30  60   0   1.05   180    1.04    1.04    13000      3.3     0.001
45  60   0   1.05   164    1.39    1.06    13100      1.6     0.002
60  60   0   1.05   101    1.30    1.11     7900      1.3     0.005
