AN IMPROVED UNCONSTRAINED GLOBAL
OPTIMIZATION ALGORITHM
by
Ronald John Van Iwaarden
B. A., University of Colorado at Denver, 1989
M. A., University of Colorado at Denver, 1992
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
1996
This thesis for the Doctor of Philosophy
degree by
Ronald John Van Iwaarden
has been approved
by
Weldon Lodwick
Harvey Greenberg
George Corliss
Jennifer Ryan
Tom Russell
Date
Acknowledgement
I would like to recognize the guidance of Dr. Weldon Lodwick,
without whom the ideas for this dissertation would not have grown. I rec-
ognize the criticism of Dr. Harvey Greenberg who pushed me farther than
I thought I could go. I recognize the critique by Dr. George Corliss who
improved the presentation of this dissertation and has been a continual
source of inspiration and support. Finally, I recognize the love, support
and devotion of my wife, Stephanie Van Iwaarden. She have given be the
strength to endure and the love that was both necessary and sufficient
to complete this work. Thanks to all who supported and instructed me
through the years and have not been directly mentioned here.
m
Van Iwaarden, Ronald John (Ph.D., Applied Mathematics)
An Improved Unconstrained Global Optimization Algorithm
Thesis directed by Associate Professor Weldon Lodwick
ABSTRACT
Global optimization is a very hard problem especially when the
number of variables is large (greater than several hundred). Recently,
some methods including simulated annealing, branch and bound, and an
interval Newtons method have made it possible to solve global optimiza-
tion problems with several hundred variables. However, this is a small
number of variables when one considers that integer programming can
tackle problems with thousands of variables, and linear programming is
able to solve problems with millions of variables. The goal of this research
is to examine the present state of the art for algorithms to solve the un-
constrained global optimization problem (GOP) and then to suggest some
new approaches that allow problems of a larger size to be solved with an
equivalent amount of computer time. This algorithm is then implemented
using portable C-|f and the software will be released for general use.
This new algorithm is given with some theoretical results under
which the algorithm operates. Numerical results that demonstrate the
improvement that can be obtained by using this new algorithm.
IV
This abstract accurately represents the content of the candidates thesis.
I recommend its publication.
Weldon Lodwick
v
CONTENTS
Chapter
1. Introduction: Unconstrained Global Optimization Problems . 1
1.1 Introduction................................................... 1
1.2 Overview ...................................................... 2
2. A Review of General Global Optimization Methods............. 3
2.1 Terminology.................................................... 4
2.2 Probabilistic Methods ......................................... 5
2.2.1 Simulated Annealing.......................................... 6
2.2.2 Adaptive Simulated Annealing ............................... 10
2.2.3 Multi-level, Single Linkage (Clustering Methods) ........... 11
2.3 Deterministic Methods........................................ 15
2.3.1 Cutting Plane Methods ...................................... 16
2.3.2 Branch and Bound History ................................. 19
2.3.3 Branch and Bound............................................ 26
2.3.4 Bounding Functions ......................................... 29
2.3.4.1 Bounding Based on Lipschitz Constants................... 30
2.3.4.2 Interval Arithmetic....................................... 30
2.3.4.3 Interval Taylors Expansion............................... 32
2.3.4.4 Slope Form................................................ 34
2.3.4.5 Convex Envelopes.......................................... 35
2.3.5 Partitioning Schemes....................................... 36
2.4 Result Verification
40
2.4.1 Interval Newton Operator................................ 41
2.4.2 Krawczyk Operator....................................... 42
2.4.3 Hansen-Sengupta Operator ............................... 43
2.5 Interval Iteration......................................... 45
2.5.1 Hansens Algorithm ..................................... 46
3. Back-Boxing Methods For Global Optimization................ 51
3.1 Back-Boxing.............................................. 54
4. Test Problems.............................................. 70
4.1 Published Problem Sets ................................... 70
5. Results.................................................... 83
5.1 Implementation............................................. 83
5.2 Numerical Results ...................................... 88
5.3 Conclusions and Directions for Further Research............ 95
5.3.1 Open Questions on Back-Boxing .......................... 95
Appendix....................................................... 96
A. Data for Numerical Results ................................. 97
Bibliography ................................................. 110
vii
FIGURES
Figure
2.1 Clusters of the function (x2 l)2....................... 12
3.1 First locate a local minimum............................. 62
3.2 Place a box around that point such that / is convex . 63
3.3 Partition the region around that box..................... 63
3.4 Partitions for B B'. The first is from Algorithm 14
and the second is the memory intensive method........ 67
viii
1. Introduction: Unconstrained Global Optimization Problems
1.1 Introduction
The general problem to be considered is min f(x) Â£ 3?, where
x Â£ S and S' is a non-empty compact set in 3ftn. Although general compact
sets can have complex structure, S is assumed here to be a box [x1?xi] X
[aq, X X \x_n, xn] Â£ 3?. The general unconstrained global optimization
is one of the problems in mathematics that is still considered to be very
hard. Murty and Kabadi [71] construct a class of global optimization
problems for which either identifying a given point as a local minimum or
showing the objective to be unbounded is an NP-complete problem. This
is also shown by Pardalos and Vavasis [75] for quadratic programming in
1991.
The class created by Murty and Kabadi [71] is contrived, but
demonstrates that there are NP-complete global optimization problems.
This has been confirmed in practice since extremely challenging prob-
lems relating to global optimization can be found in almost all sections
of science and industry. Proving that these problems are NP-complete is
difficult, but the difficulty in solving needs no proof. Examples of such
nonlinear problems in which having the global optimum is either necessary
or desirable include Ending the direction and duration of radiation dosage
of cancer tumors, the lowest energy state for a protein molecule, and the
correct mixtures for optimal chemical plant operations.
1.2 Overview
The dissertation first reviews general global optimization meth-
ods. It explores both deterministic and stochastic methods and gives some
of the advantages or disadvantages of these methods. It also includes some
of the historical work in global optimization. Branch and bound methods
(a type of deterministic method) are presented in depth because branch
and bound methods are central to the development of the algorithm cre-
ated herein. This sets the framework in which this research lies.
The dissertation then looks at some new developments in global
optimization. The main contribution of this dissertation is the develop-
ment of a method that we call Back-Boxing and the release of source code
for global optimization and interval arithmetic. Back-Boxing allows prob-
lems to be solved to a greater degree of accuracy without an increase in
the runtime.
The last section contains published or electronically posted test
problems for nonlinear global optimization that are deemed to be difficult
global optimization problems. These problems are then used to show the
performance of the Back-Boxing algorithm. Tests are run showing that
the Back-Boxing algorithm is faster for either high dimensional problems
or for problems in which a high degree of accuracy is required.
2
2. A Review of General Global Optimization Methods
A general global optimization procedure is a technique that finds
the global optima of any function, where it is assumed that the initial
bounds on the values of the variables are given. It is also acceptable for
the bounds on the variables to be infinite. That is, the acceptable values
for the variables could be ( 00,00), [k, 00), or ( 00, k] where k is any real
number. Thus, the ideal general global optimization solver is a black box
into which one feeds the description of the function, its constraints, and
the bounds on the variables. The output of this black box is a combination
of the following states:
The approximate values at which the function attains the optima
with mathematically verified error bounds.
Values which are the approximation of the global optimum along
with a mathematically verified maximum error.
The function is unbounded.
The character of the first two states could be a set of boxes that
are guaranteed to contain the global optimum. Another possibility is to
report a set of points for which one can give a mathematically correct
probability as to how close the present best minimum function value is to
the global optimum. These termination situations correspond to the types
of output one would receive after completing a deterministic algorithm or
terminating a stochastic process prematurely. These types of algorithms
are studied in depth in the following sections.
One of the earliest papers on the subject of global optimization
was published by Tuy [99]. The research has progressed quite quickly
since then and has generated several books that are devoted only to global
optimization [35, 38, 19, 18, 39, 88, 49].
2.1 Terminology
The following symbols and terms are used throughout this work.
NaN: (Not A Number) This is the IEEE definition of a number that does
not exist or is undefined, such as 0/0 or sin-1 (2). Note that numbers
such as 1/0 are not NaNs, but are represented as infinite values.
Â£: corresponds to a computer or an automaton on which the global op-
timization algorithm is applied. For example corresponds to the
set of real numbers as represented on the automaton.
x G A real vector.
X: An interval number. That is, X = {x\x < x < x}, where x is al-
ways the left endpoint, and x is the right endpoint. Note that this
dissertation uses capitol letters to denote the interval version of any
structure such as an interval matrix or vector.
/: The space of interval numbers.
X G An interval vector. That is X = [x, x\ where x, x G 4Tb
A G InXn: An interval matrix. That is A = [a, a] where a, a G WnXn.
thick: An interval number, vector, or matrix X is said to be thick if there
exists X\ G X and x2 G X with x\ ^ x2. That is, width(A) > 0.
thin: An interval number, vector, or matrix X is said to be thin if for all
X\ G X and x2 G X, x\ = x2. That is, width(A) = 0.
4
macheps: macheps is the smallest positive value e such that 1+macheps ^
1 on the automaton being used.
machinf: machinf is the largest positive number that can be represented
on the automaton.
mach-inf: mach-inf is the largest (in absolute value) negative number
that can be represented on the automaton.
X: The midpoint of the interval number or vector X.
OX: The smallest interval number, vector, or matrix that contains the
set X.
]X[: The smallest outward rounded interval number, vector, or matrix
that contains X. On an automaton, this would correspond to round-
ing x up and rounding x down to the next machine representable
number for that automaton.
vertex: x* Â£ R is said to be a vertex of a convex region R if the set
{R x*} is convex
2.2 Probabilistic Methods
Probabilistic methods for global optimization are infinite pro-
cesses for which the probability of having visited the global optima ap-
proaches 1 as the number of steps tends to infinity. This means that if an
uncountable number of test runs were made with the method, the global
optimum would not be found in at most a countable number of the test
runs. The most basic such method is as follows:
Algorithm 1 : Random Search
Initialization: fmin = oo, xmin =NaN, and i = 0.
step 1: xl = random number.
5
Step 2. If f (x )
step 3: Increment i and go to step 1.
If F is at least piecewise continuous, then as i > oo, fmin >
min/. Hence, this is a general stochastic optimization technique. Rather
than converging to the global minimum over the procedure converges
to the minimum function value over all machine representable numbers if
this procedure is implemented on a computer.
This method was studied by several researchers ([1, 10]). They
show that if the function is evaluated at points which are drawn from
a uniform distribution over the region S, then it can be shown that the
smallest function value found in this way converges to the global minimum
value y* with probability 1.
2.2.1 Simulated Annealing
Simulated annealing (SA) is another probabilistic method that
has been used with good results as an optimization technique for integer
optimization and, more recently, for general global optimization.
SA is loosely based on the physical idea of annealing (cooling)
metals in order to find the lowest energy state of that metal. For this
reason, SA uses words such as temperature that relate back to this physical
interpretation. The mathematical interpretation of annealing is a cooling
schedule which is a decreasing function of time. For integer optimization,
SA can be seen as a random walk with bias [57]. In general, a SA technique
can be written as follows:
Algorithm 2 : Naive Simulated Annealing
Initialization: Set x = 0, and i = 0.
6
step 1: Choose y in some neighborhood of xl.
step 2: If f(y) < /(V) set xl+1 = y, go to step 4.
step 3: If (Uniform Random Number) < set xt+1 = y.
step 4: Increment i and go to step 1.
where c(i) is the temperature at time i [25], and / is the function being
minimized. In general, one wants c(i) \ cro ^ 0 as i oo. This is
a very naive implementation. Fox [25] recommends several methods for
speeding up the convergence of this naive implementation. The conver-
gence properties of this algorithm, as well as some of the more intelligent
implementations, are discussed in [36].
Like most stochastic processes, this is not a true algorithm in the
sense that it does not terminate in a finite number of steps. However, if
one places a limit on the number of steps taken by the algorithm, then
Heine [36] gives some probabilistic bounds on the percentage of the global
optima that has been found based on the number of steps that have been
taken.
The implementation of this technique is straight forward for dis-
crete problems. One need only define the neighborhood of ay and the
cooling schedule, c(i). When SA is applied to continuous problems, the
idea of neighborhood is even more nebulous. For continuous SA, one idea
is to first determine a search direction. Continuous SA would choose a
point on the unit hyper-sphere at random about the point Xi that gives
the search direction. The algorithm would then choose a random length
to step in that direction. An obvious drawback of this procedure is that
it does not use any local information (such as the derivatives). To avoid
7
this problem, Dekkers and Aarts [15] randomly pick one of the following
two methods. Either ay is chosen at random, or ay is chosen by applying
a few steps of a local search procedure to ay_i. Dekkers then goes on to
show that as long as one implements the random search a sufficiently large
number of times, the algorithm is still guaranteed to converge.
A general algorithm for the continuous nonlinear case with finite
termination is as follows:
Algorithm 3 : Nonlinear Simulated Annealing
Initialization: Initialize (c, x) and set i = 0.
step 1: Generate another vector y from x.
step 2: If f(y) f(x) < 0, accept y as the new state and set x = y.
step 3: Else, if e~U(y)-f(x))lci > random [0, 1), then accept y as the new
state and set x = y.
step 4: Lower Ci to
step 5: If i < L, go to step 1.
step 6: Report all points x for which the best function value is achieved.
One immediately notes that, similar to the basic SA algorithm, there are
several portions of this algorithm that are chosen at the discretion of the
implementor. Firstly, one must choose an initial value for the cooling
function. This parameter must be chosen sufficiently large so that ap-
proximately all transition stages are acceptable at this value. Let x(c) be
the ratio between the number of accepted transitions and the number of
proposed transitions. The cooling parameter can be chosen by taking a
sample set and requiring that the initial acceptance ratio Xo = x(co) is
close to 1. Next perform m0 samples with m0 = rrii + ra2, where m2 is
8
the number of accepted transitions, and m\ is the number of unacceptable
transitions. Let A/+ be the average value of those A fX}V values for which
AfXtV = f(y) f(x) > 0. Then choose c0 according to the formula
/ \ -l
c0 = A/+ In
m2
1x12X0 + (1 Xo)mi
Similarly, the cooling schedule can be formulated in many differ-
ent ways. One such method is
Q+i Q
Cj ln(l + h)\
3
where a(c) is a small positive number and denotes the standard deviation
of the values of the cost function of the points in the Markov chain of x
at C{. The constant 8 is called the distance parameter and determines the
speed of the decrement of the control parameter [15].
Finally, one must determine how to generate a new point y from
x. Call this procedure gXjV. There are several requirements that must be
satisfied so that the method is guaranteed to converge in a probabilistic
sense. These are beyond the scope of this paper but can be found in [15].
The following methods do satisfy the necessary criteria.
First, y can be generated from x by choosing y from a uniform
distribution of points on S, where S is the region over which one wants
the global optimum. That is, the probability that y is generated from any
given point x is gXjV = l/m(S) where m(S) is the Lebesgue measure of the
set S. If S is simply a box, then m(S) = nr=i(^' However, as men-
tioned before, this does not take into account any structural information
about the function.
9
Second, one could choose
9x,y
LS(x) if w > t
l/m(S) if w < t
where t is a fixed number in the interval (0,1], w is a uniform random
number drawn from [0,1), and LS(x) is a local search procedure that
generates a point y in a descent direction from x such that f(y) < f(x)
but y is not necessarily a local minimum.
2.2.2 Adaptive Simulated Annealing
Simulated annealing has been further developed and modified by
many different people. One of this modifications is called Adaptive Sim-
ulated Annealing (ASA). This software was developed by Lester Ingber
and other contributors first released for public use in 1989 as very fast
simulated reannealing (VFSR) code. This code is presently available on
the WWW at
http://alumni.caltech.edu/~ingber or via anonymous ftp at ftp.alumni.caltech.edu
in the directory /pub/ingber.
One problem with standard SA is that each parameter or variable
is annealed or cooled at the same rate, regardless of its range, value, or
sensitivity. ASA allows for different annealing (cooling) schedules for the
different parameters and variables.
SA uses a fixed cooling schedule in which the temperature decays
monotonically to 0 (or to a temperature close to 0). In a real world
setting, it would seem reasonable to stretch out the range over which
the relatively insensitive parameters are being searched. For example,
if the objective function is fairly insensitive to changes in a particular
10
variable ay, ASA decreases the rate at which variable Xi cools. This allows
ASA to explore more of the region for that particular variable. This is
accomplished by periodically rescaling the annealing time. Ingber [51]
calls this reannealing.
2.2.3 Multi-level, Single Linkage (Clustering Methods)
Multi-level, single linkage is one of a group of stochastic methods
that are often called clustering methods. A clustering method starts with
a uniform sample of points from the region S and then creates groups or
clusters of close points that correspond to a common region of attraction.
A cluster C is a set of points which correspond to a region of attraction
and has the following properties:
C contains exactly one local minimum x*.
Any descent algorithm that is started from a point x Â£ C con-
verges to the point x*.
The first two properties imply that C can contain exactly one
stationary point.
For example, consider the function (x2 l)2, which has global
minima at the points x = 1. There are two clusters of points for this
function. The first cluster is {x \-y/2< x < 0}, and the second cluster
is {x |0 < x < a/2} ( see figure 2.1). Any descent method that starts
in cluster 1 is guaranteed to converge to the point x = 1. Similarly, a
descent method starting at any point in cluster 2 is guaranteed to converge
to the point x = 1.
Clustering methods have several techniques to identify these clus-
ters, but the basic algorithm is still the same. A seed point is chosen which
11
12
is often the point with the lowest function value obtained so far or a local
minimum. Points are added to this cluster through the application of a
clustering rule or until some termination criterion is satisfied. Methods
differ on the clustering rule and on the termination criterion.
Two of the more common methods are based on either the den-
sity [98] of the cluster or on the distances between the points. In the first
method, the termination criterion states that the number of points per
unit volume within the cluster should not drop below a certain value. For
the second method, the termination criterion states that a certain critical
distance between points of the cluster may not be exceeded.
Single linkage is the specific name given to one type of clustering
method. It begins by evaluating the function at a set of sample points.
It then applies a local optimizer procedure (LO) to the sample point with
the lowest function value and starts a cluster about the resulting local
minimum. It then adds points to the current cluster until the distance
of the unclustered point to the cluster exceeds the critical distance. The
distance between a point x and a cluster is defined as the minimum dis-
tance between x and any point in the cluster. Once no more points can be
added to the cluster, the cluster is said to be terminated. If there are still
unclustered points, then the (LO) is applied to the unclustered sample
point with the lowest function value and a new cluster is formed around
the resulting local minimum.
The method as found in [59] is actually implemented slightly
differently in an iterative fashion. The points are sampled in groups of
fixed size M, and a continually expanded sample set is re-clustered at each
13
iteration. If X* is the set of local minima from the previous iterations,
the elements of X* are used as the seed points for the first clusters.
Suppose the critical distance (4 for single linkage in iteration k
for some a > 0 is chosen such that
1/n
dk = 7T1/2
r(l +
' 2 AM
(2,1)
where m(S) is the Lebesgue measure of the region S and
roo
T(n) = / dx
Jo
is the Gamma function. The resulting method has strong theoretical prop-
erties. If a > 4, then even if the sampling continues forever, the total
number of local searches started by single linkage can be proved to be fi-
nite with probability 1 [58]. Since the method cannot run forever, multiple
local (even global) minima can remain undetected in a single given cluster
and can remain undetected since only one local minimizer is applied to
each cluster.
Multi Level Single Linkage (MLSL) [59] differs since it is guaran-
teed to find all global optima with probability 1. The method is applied
to an ever expanding sample set like single linkage. The LO is then ap-
plied to every sample point unless there is another sample point within
the critical distance with a smaller function value. Formulated in this way,
MLSL does not form clusters and can be applied to the entire sample set
without reduction or concentration. Reduction is the act of discarding
a certain fraction of the sample points with the highest function values.
Concentration transforms the sample by allowing one, or at most a few,
steepest descent steps from every point.
14
MLSL may appear quite naive but, similar to single linkage, if
the critical distance is chosen according to (2.1) with a > 4, and if the
sampling continues forever, the number of local searches started is finite
with probability 1 [96]. Furthermore, single linkage is not guaranteed to
find the global optima. MLSL has been shown in [96] to find all local
minima (including the global minima) if the algorithm is allowed to run
forever. More precisely, consider the connected component for a level set
L(y) = {x\f(x) < y} containing a local minimum x*. Define the basin
Q(x*) as the largest value y for which every application of LO converges to
x*. Timmer then shows that in the limit, if a point is sampled in Q(x*)}
then the local minimum x* is found. If one chooses a > 0 (2.1), any
local minimum x* is found by MLSL in a finite number of iterations with
probability 1.
2.3 Deterministic Methods
Deterministic algorithms differ from stochastic algorithms in sev-
eral ways. First, stochastic algorithms never give a 100% guarantee of hav-
ing located the global optima since a guarantee of that quality requires
that the algorithm run for infinite time. Second, deterministic algorithms
always run in finite (although quite long) time and return a set of regions
that are guaranteed to contain all global optima. Finally, stochastic algo-
rithms are not guaranteed to find all global optima of a function but are
guaranteed to find one global optima when there may be many alternative
global optima. Deterministic algorithms return a guarantee that all global
optima are contained in the regions that are returned at the algorithms
termination.
15
This section is not meant to be a complete guide to the history
of global optimization. In fact, several methods are left out intention-
ally. The research contained herein strives to focus on problems that were
more general in nature such as concave objective function with convex
constraints rather than problems designed to solve a specific type of prob-
lem.
2.3.1 Cutting Plane Methods
Cutting plane methods are best applied to concave minimization.
That is, they are best applied to minimizing a concave function over a
convex set. According to Horst and Tuy [49] Many applications lead to
minimizing a concave function over a convex set. Moreover, it turns out
that concave minimization techniques also play an important role in other
fields of global optimization. There are three main techniques that are
generally applied to concave minimization: cutting methods, successive
approximation, and successive partition methods. This research explores
only cutting plane methods, but a good reference for all these methods
can be found in [49].
The basic concave program (BCP) can be written as follows:
min f(x)
s.t. Ax < b
x > 0
where A is an m X n matrix, x is an m-vector, / : > is a concave
function, and D = <6, x > 0}. It can be shown that the minimum
of any bounded concave program occurs at a vertex of the polytope defined
16
by A, 6, and the non-negativity constraint. Therefore, the solution to the
BCP can be re-written as:
min f(xr) where xl is a vertex of the polytope P
Define a point x to be a local BCP minimizer if f(x) < f(x) for
any x in the convex hull of x and the adjacent vertices y1, y2,. ., ys. Let
7 = f(x)- For any x Â£ satisfying f(x) > 7, the point x + 6(x x)
such that
9 = sup{t|t > 0, f(x + t(x x) > 7}
is called the 7-extension of x with respect to x. Now, let 6i be the 7
extension of x with respect to y\ It can be shown that any solution 7r of
the system of linear inequalities
9irx{yi x) > 1 (i = l,2,...,s) (2.2)
where x is a local BCP minimizer. This follows since min{/(y1),. ., f(ys)} >
f(x). From this result the following theorem can be given.
Theorem 1 (Sufficient condition for global optimality see [49],
page 179). Let 7r be a solution of the system 2.2. Then
7t(x x) > 1 for all x Â£ D such that f(x) < f(x)
Hence, if
max{7r(x x)\x Â£ D} < 1,
then x is a global optimal solution of BCP.
This gives a quick and easy method to check for optimality of a
given point by solving the LP
max{7r(x x)\x Â£ D~},
17
where tt(x x) > 1 is a valid cut for the BCP. If the optimum value
of this LP is less than 1, then x is a global optimal solution. If not, the
solution must be found in the region
D fs\{x\Tr(x x) > 1}.
Hence the name cutting plane algorithms. The basic algorithm
finds a local optimum. If that local optimum is not a global optimum,
then the algorithm cuts away the region of the domain that contains that
solution and proceeds to find another local optimum. This continues until
the global optimum is found.
Algorithm 4 : Cutting Plane [49]
initialization: Locate a local BCP minimizer x. Set 7 = f(x), D0 = D,
k = 0.
step 1: At xk, construct a 7-valid cut TTk for (/, TC).
step 2: Solve the linear program max7rfc(x xk) s.t. x Â£ Dk. Let uk be
a basic optimal solution of this linear program. If TTk(ujk xk) < 1,
then stop: x is a global minimizer. Else, go to step 3.
step 3: Let Dj~+i = DkC\{x\^k(x xk) > 1}. Find a vertex xk+l which
is a local BCP minimizer of f(x) over Dk+1- If f(xk+1) > 7, set
k = k + 1 and go to step 1. Else go to step 4.
step 4: Set 7 = f(xk+1), k = k + 1, and goto step 1.
The concavity cuts that are used in Algorithm 4 are easy to
construct. The cuts that are constructed tend to become shallow as the
algorithm progresses. The volume of the area removed by the cuts can
approach zero long before enough is cut away to find the global minimum.
This causes slow convergence and can cause the algorithm to not converge
18
at all. This has made the research look to more expensive but deeper
cuts in order to guarantee finiteness. This dissertation does not describe
in detail the different methods for doing this but again refers the reader
to [49].
2.3.2 Branch and Bound History
Branch and bound is one of the few deterministic methods that
can be applied to a wide class of problems with little or no knowledge of
the state of the function that is to be minimized. It can be applied quite
easily to methods with concave or convex domains, concave or convex
objective functions, and functions for which no information on continuity
is available. While one does generally have some information about the
function, some branch and bound methods do not require this in any form.
The version we develop here exploits information about the function as it
is actively discovered. However, the algorithm finds the information that
it needs rather than requiring the user to provide the information at the
start. The algorithm computes all the needed derivatives and then uses
this information to exploit the structure of the function.
The earliest paper this author found that presents a branch and
bound algorithm with finite convergence proofs is the 1969 paper by Falk
and Soland [23]. Many people had thought about using branch and bound
to solve the nonlinear optimization problem. In particular, the ideas of
global optimization through branch and bound had been around for a few
years at this point. In 1966, Lawler and Wood [62] wrote an article that
surveyed branch and bound methods. They concentrated on methods for
integer programming but do mention that several people had considered
19
and suggested branch and bound methods for solving nonlinear program-
ming problems. For example, they also refer to a 1963 presentation by
Beale [4] in which he applies branch and bound to solve nonlinear trans-
portation problems. Lawler and Wood state that this work was not in-
vestigated completely, but Beales paper shows that the ideas for branch
and bound have been considered for some time. The presentation by Beale
was then accepted for publication in 1964. This is approximately the same
time as Tuys [99] first publication on the subject.
Falk and Solands algorithm concentrated on finding the mini-
mum of a separable objective function over a rectangular domain (bounds
on the variables, i.e. a box). They subdivide the region using a rectan-
gular partition, and they bound the objective function by using a convex
underestimating nonlinear function. The nonlinear function is derived by
computing the convex envelope of the objective function over each rect-
angular region. The convex envelope of / on D is the supremum of all
convex functions that underestimate the function / on D. The resulting
function is convex, and it can be shown that the minimum of a continuous
function / on a compact set is the same as the minimum of the convex
envelope of / on that same set. At each step, the algorithm splits along
the variable for which the greatest difference between the convex envelope
and the function is greatest. By continuing to reduce the size of each
partition, the algorithm is able to improve the fit of the convex envelope
at each iteration.
Greenberg [29] showed that Falk and Solands result could be
improved upon in terms of computational time as well as accuracy of the
20
underestimating function. He shows that one can use the Lagrangian dual
to compute a lower bound on the function rather than using a convex
underestimating function. He goes on to show that in the worst case
the Lagrangian dual gives the same result as the convex underestimating
function and is potentially better. He also shows that the Lagrangian
dual is less expensive to compute and gives some economic interpretation
of the presence of gaps. Additionally, Greenberg gives some directions for
further research.
Falk and Solands algorithm was improved by Soland [92] so that
it could handle non-convex constraints. The constraints are treated in
a manner similar to that of the objective function. The constraints are
replaced by their convex envelopes. Soland was able to prove convergence,
but it was possible that the convergence was asymptotic.
At the same time, Beale and Tomlin [5] introduced their global
optimization algorithm which was based on the concept of using ordered
variables. The objective function was approximated with piecewise linear
functions, and the problem was solved as an LP. For more information on
the early work in branch and bound methods for global optimization, see
McCormick [64] who provides a summary of the early work and feels that
branch and bound methods held much promise for global optimization.
McCormick [65] extends the early work of Falk and Soland to
allow for factorable functions. McCormick presents rules to generate con-
vex and concave bounding functions for factorable functions which appear
on many NLPs. These extensions allow branch and bound to solve most
NLPs and is used by Epperly [21] to handle bilinear constraints.
21
In 1976, Horst [43] frees the original Falk and Soland algorithm
of the restriction to rectangular partitions and the restriction to convex
underestimating functions. Horst uses compact partitions and general un-
derestimating functions in their places and then provides the necessary
conditions on these partitions and functions to prove convergence. He
provides an algorithm that uses n-dimensional simplices to partition the
domain. Even though he proves that convex envelopes are unnecessary, he
uses them in his algorithm for simplicity. The use of n-dimensional sim-
plices also makes his algorithm much more efficient for concave functions
since the n-dimensional simplices have only n + 1 vertices, while rectan-
gles have 2n vertices. He proves that the algorithm must give at least one
accumulation point which is a solution of the original global optimization
problem. In a later paper [44], he proves that every accumulation point
must be a solution to the GOP.
Another algorithm was given by Thoai and Tuy [94], They
present a convex branch and bound algorithm which uses conical par-
titions. They develop the necessary consistency and criteria required for
convergence. The problems which their algorithm attacks have a concave
objective function and a polyhedral constraint set. They use an LP which
is based on Tuys cutting planes [99] to obtain lower bounds for each
convex region.
We have now presented the three main partitioning methods used
in branch and bound global optimization algorithms. These methods are
rectangular, simplicial, and conical. Rectangular partitions are generally
22
used by methods that use convex envelopes such as the algorithm devel-
oped by Falk and Soland [23] or algorithms that employ interval arithmetic
such as the algorithm developed by Hansen [35]. Simplicial and conical
partitions are generally used to solve concave minimization problems. This
research examines these ideas more in depth in section 2.3.5.
In 1982, Benson [6] generalized Falk and Solands [23] and Horsts [43]
algorithms to an algorithm that was less restrictive on the types of under-
estimating functions used. Benson shows that every accumulation point is
a solution of the GOP and, since Falk and Solands work is a special case,
shows that for their algorithm, every accumulation point is a solution of
the GOP. Unfortunately, no computational results have been reported, so
it is difficult to judge the utility of this work.
Rosen [86] presents a unique method of minimizing a concave
function over a convex set. Rosen first computes a local maximum of the
function by using a local search method. He then examines the eigen-
vectors of the Hessian matrix at the local maximum in order to estimate
which feasible vertices are far from the maximum by using a multiple cost
row LP. The vertex that yields the lowest objective functional value is
then used an an upper bound for the global minimum and is used to cre-
ate a set in which the minimum cannot occur due to the concavity of
the objective. He can then exclude this region from the search leaving
at most 2n polyhedral subsets that can be solved using the Falk-Hoffman
algorithm [22], It is hoped that the algorithm converges quickly since
small simplices can be given from the first step of Rosens method. Rosen
applied his algorithm to several problems and provides a bound on the
23
amount of work required.
Tuy, Thieu, and Thai [103] provide an improved version of the
Thoai-Tuy algorithm that no longer required a bounded polyhedral feasi-
ble region. They develop a conical branch and bound algorithm that works
for concave objective functions defined over a closed convex set with the
possibility of the set being unbounded. They explore several different
splitting strategies for the method including splitting the largest cone,
splitting the oldest cone, and splitting the most promising cone. They use
a supporting hyperplane of the convex set to obtain an underestimating
function over a cone.
In [45], Horst generalizes the Benson-Falk-Horst-Soland algo-
rithm. The latter is often referred to as the BFHS algorithm developed
by Benson [6] (not to be confused with the BFGS local optimization
method). It includes the branch and bound algorithm that Tuy, Thieu,
and Thai [103] developed. Horst creates some extensions to the original
method which allows for region deletion and different variable selection
rules. He shows that the general algorithm has convergence in the limit
and provides a concrete algorithm for minimizing concave functions over
a compact and convex sets.
Horst and Tuy [42] show that many other algorithms are special
cases of a general branch and bound algorithm. They show that Pinters
algorithms [79, 80], the algorithm by Zheng and Galperin [27, 28, 109],
and the algorithms of Pijavskii [78], Shubert [91], and Mladineo [67] are
all special cases of their general branch and bound algorithm. In a later
paper, Tuy and Horst [102] review the convergence criterion for a general
24
branch and bound algorithm. They present a restart branch and bound
algorithm for use when the feasible set is overestimated using feasible cuts.
Their algorithm introduces new cuts for the feasible region as it progresses.
They provide concrete algorithms for concave D.C. programs (programs
that can be expressed as the difference of two convex programs).
Benson, Horst, and Tuy [6, 45, 102] develop algorithms that as-
sume it is easy to find feasible points inside each of the partitions. While
this may be true for convex regions, it is not the case for general con-
straints. In fact, determining feasibility can be just as difficult as folding
the global optimum. Horst [46] extends his previous general branch and
bound algorithm and convergence proofs to apply when the feasibility of
the partition sets is unknown. Additionally, Horst and Thoai [48] use this
modification to solve systems of Lipschitzian equations and inequalities.
In [41], Horst, Thoai, and Benson refine their algorithm to present
a faster version for branch and bound for concave minimization over a con-
vex feasible region. They use conical partitions again along with polyhe-
dral outer approximation to the convex region. Both of these modifications
give a lower bound step that is a linear program rather than just a lin-
ear objective function with convex constraints. This is the reason for the
speed-up observed in this modification.
Most of the algorithms presented thus far assume a bounded
feasible region. Hamed and McCormick [33] investigate how to treat a
problem for which there are no bounds on the variables. They investigate
several procedures in an attempt to provide bounds on the variables. Each
procedure depends on the type of NLP that is being solved.
25
2.3.3 Branch and Bound
Branch and bound, like SA, is a technique that has been suc-
cessfully applied to integer optimization problems. In this context, a
general integer program is given by (IP): max{cx|x Â£ S'}, where S' =
{x Â£ Zp\Ax < b}. Let Sr be a relaxation of the set S to some larger
set so that zp(x) > cx. For example, a common relaxation of a inte-
ger program is to its linear counterpart (RP/p): max{cx|x Â£ S)p}, where
Sip = {x Â£ 4P(|Ax < b}. Then the general integer branch and bound
algorithm is given in algorithm 5 (see [73]).
Algorithm 5 : Integer Branch and Bound
Initialization: Set L = IP (L is a list of IPs, each with its own portion
of the domain), and S = S', z = machinf, and zIP = machinf.
step 1: (Termination test): If L = 0, then the solution x that yielded
zIP = cx is optimal. If there is no such x, then return infeasible,
step 2: (Problem selection and relaxation): Select and delete a problem
from L. Solve a relaxation of (IP): RP*. Let zp be the optimal value
of the relaxation and let xp be an optimal solution if one exists,
step 3: (Pruning):
a : If zr< z_ip, g t step 1-
b : If xlR ^ S'*, go to step 4.
c : if xp Â£ S'*, and cxp > zIP} let zIP = cxlR. Delete from L all
problems with Z{ < z_iP. If cxp = zP} go to step 1. Otherwise,
go to step 4.
step 4: Let {S'*-?}(i=1 be a division of S'* such that S'* = (Jj=i SlJ. Add
problems {IP13} to L, where A3 = zp for j = Go to step 1.
26
In step 2 if the entire set of subsets of S has been pruned, then
the last integer solution found was optimal. If not, then a set is pulled off
of the list, and an upper bound on the value of IP is computed for the set
If this upper bound is less than the best computed integer solu-
tion computed to this point, then the set Sl is pruned. If the solution is
integral and better than the best solution so far, then the set of IPs that
remain are searched for sets that can be pruned.
The branch portion of branch and bound becomes obvious when
step 5 partitions Sl into k disjoint subsets. IPs are then generated that
have these subsets for their domain, and they added to the list L and the
algorithm continues.
This algorithm has been adapted to solve the Global NLP prob-
lem in a straight forward fashion by using some relaxation to bound the
function value over a portion of the domain. Given the general NLP prob-
lem max/(x) st x Â£ R C a naive algorithm is as follows.
Algorithm 6 : Continuous Branch and Bound
Initialization: Set i = 1, z* = mach-inf, and list = R.
step 1: Pull a region Ri off of the list.
step 2: If the region Ri is too small, put it on the list of finished boxes
and go to step 1.
step 3: Compute z_,2:+ such that < f(x)\jit < z+.
step 4: If > z* discard the region Ri and go to step 1.
step 5: If z+ < z* set z* = z+.
27
step 6: Split the region Ri into k equal sized pieces and add them to the
list.
step 7: Go to step 1.
step 8: Return the list of finished boxes.
Algorithm 6 covers the region R with smaller and smaller boxes, eliminat-
ing the ones where it is impossible for a global minimum to exist since we
have bounded the values of that region to be less than some lower bound
on the global minimum.
This basic algorithm suffers from several potential difficulties.
First, the bounds computed for the function over a region that is used in
step 2 are potentially not tight. If the bounds computed are sufficiently
bad, then the algorithm could spend a large amount of time attempting to
eliminate regions that are close to the global minimum, but whose objec-
tive function is very flat. Second, in step 6, the initial box is divided into k
equal sized pieces which does not take into consideration any information
such as local/global behavior of the function over any given region. A
more intelligent algorithm would take such information into consideration
when determining where to split. For example, if the box consists of the
square [-1, 1] X [-1, 1] and / = x2} the box could be divided into four boxes,
each with one corner at [0, 0]. This may not be a good partition since
each box still contains the global minimum of the function. However, if
the box were split into the boxes
[e, e] X [e, e], [e, 1] X [1,1], [1, e] X [1,1], [e, e] X [e, 1], [e, e] X [1, e],
where e is a small positive number, we now have five boxes, only one of
which contains the global min, and the other four can be easily eliminated.
28
It would be an advantage to have an algorithm which could identify these
boxes so that the region can be split more intelligently.
Third, regions can be eliminated on the basis of information other
than just function values. If the first derivative is strictly positive (neg-
ative) over a region, that region can be eliminated as it cannot contain
even a local minimum. Similarly, a region can be eliminated if the Hes-
sian matrix cannot be positive definite on that region. Fourth a region
could be partially eliminated using a combination of first and second order
information. These ideas are explored in greater depth in section 2.5.1.
There are two basic features to any branch and bound algorithm.
First, every algorithm requires some sort of bounding function for the
objective. Second, the algorithm must determine how to partition the
region after the bound has been computed. This research examines both
these features and develop new approaches that allow high dimensional
global optimization problems to be solved. We then present a new method
for partitioning the region and give computational results that compare
the new method with an older one.
2.3.4 Bounding Functions
It is desirable to have a bounding function that quickly gets ac-
curate bounds on the values of a function when applying a branch and
bound method for global optimization. Unfortunately, the desire to have
both an accurate as well as a quickly computed bound is usually contradic-
tory. We now explore a variety of bounding functions. Many of them are
applicable to only specific types of problems such as concave minimization
problems or separable functions.
29
2.3.4.1 Bounding Based on Lipschitz Constants
A potentially inexpensive but inaccurate bound on the function
is given by
BL,Rf = f(x*) Lw(R) < f(x) < f(x*) + Lw(R) = BL'Rf,
where x* Â£ R} L is the Lipschitz constant for / on R} and w(R) is an upper
bound on the diameter of the region R. This bound has the advantage of
being very inexpensive to compute provided that the Lipschitz constant
is available at no cost. In addition it has the property that
lim BlR BltRf = 0.
w(R)>-0
However, this method also has the disadvantage of potentially overestimat-
ing the bound if the constant l is unnecessarily large. Also, the bounds
do not take advantage of the fact that, for most functions, a Lipschitz
constant may vary greatly over different regions of the function.
2.3.4.2 Interval Arithmetic
Another method of bounding a function is based on interval arith-
metic. Interval arithmetic was invented by Moore [68] and has been used
recently for solving ordinary differential equations, linear systems, verify-
ing chaos, and global optimization.
Define an interval X = [x, af], where x and x are real numbers.
Then, for any binary operation o where o is one of +, , /, or *, and X
and Y any two intervals, we desire
X o Y = [minx o y, maxx o y\.
xEX xEX
yeY yeY
30
Similarly, for a unary function
4>(X) = [min(:r), max^(r)].
xÂ£X xÂ£X
Let X = [ay x\ and Y = [y, y] be intervals. We find that
X + Y = [x + y,x + y\
X Y = \x_ y, x y]
= [a, 6]
a = min{xy, xy, xy, xy}
b = max{iy, xy, xy} xy}
Y = ,- O^Y [y y\
Vx = [y/x_i V^] X > 0
lnX = [In x, In a7] X > 0
ex = [eA ei
For non-monotonic computer functions such as sin(W), we can
treat these functions by using the periodicity of these functions. Also, on
an automaton, one must take care to round outwards the result of any
operation on intervals so that the result is mathematically correct.
For example, let f(X, Y) = 2X + 3XY \[X and let X = [1,4]
and Y = [1,2]. One way of computing f(X}Y) is as follows.
T1 = 2X = 2 [1,4] = [2, 8]
T2 = X*Y = [1,4] [-1,2] = [-4,8]
T3 = 3 T2 = 3 [4, 8] = [12, 24]
31
T 4 = Tl + T3 = [2,8] + [-12,24] = [-10,32]
T5 = Vx = y/\lJ\ = [l,2]
T6 = T4-T5 = [-10,32] [1,2] = [-12,31]
true range = [/(4,-1), /(4, 2)]
= [-6,30]
We can now take the above operations and apply them to any
function. The real operations are replaced by interval operations and
all the intermediate results are intervals. This provides a function with
guaranteed enclosures of the range of any function. Also (see [69]) we have
the guarantee that
lim F(X) = F(x)
w(X)^-0
where F is the interval inclusion function of / over R and / is continuous
on X.
Interval inclusion functions (of which there are many different
types) have the advantage of being easy to implement using an object
oriented language such as CHf or Ada. It also does not require any a
prioi information about /. However, computing this interval bound carries
a cost of 2 to 4 times as much effort as evaluating / [74],
2.3.4.3 Interval Taylors Expansion
Any n + 1 times continuously differentiable function f in an in-
terval about x* can be written as
f(x) = J2
f^(x*)(x x*y
i = 0
n\
T R -n ( X, Â£ ) ,
(2.3)
32
where
Rn(x,0 =
fn+1(0(x x*)n+1
(n + 1)!
and Â£ lies on the line segment connecting x and x*. Interval arithmetic can
quite readily take advantage of this Taylors expansion of / at any point x*
in an interval X. Evaluate (2.3) where x* Â£ X, and (x x*) = X x*. Add
the interval remainder term Rn(x} Â£) by evaluating the n + 1st derivative
over the interval X and multiply this value by (X x*)n+l. Define this
value to be the nth order Taylor polynomial Tn(X) and write it as
T(A-) = V t/(V)(A--.V)^ + /+1(A-)(A- -
8 = 0
Uh
(n + 1)!
Note that for n = 1, this is similar to the Lipschitz method except
that an interval of values is used for the Lipschitz constant. This is also
known (see [74]) as the generalized mean value form and is written
Ti{X) = fm(X, z) = f(z) + f'(X)(X I),
where the Taylors expansion is computed at the point x Â£ X.
As with the other methods,
lim TniX) = f(x).
w(X)-*Q
Furthermore, if fn+1 is continuous and R is a box with sides parallel to
the coordinate axis, then the interval Taylors expansion has the property
that
lim Tn(X) = {y\y = f(x),x Â£ X}.
nYoo
Traditional methods for computing f^{X) require exponential
computational effort as n > oo if / has non-trivial nth derivatives. Auto-
matic differentiation [55] allows the computation of the nth term of Tn(X)
33
for at most a constant times the work required to compute f(x). In par-
ticular, for n = 1, V/(i?) can be computed for at most 5 times the effort
required to compute f(R) itself [55].
2.3.4.4 Slope Form
Neumaier [74] suggests a method for including the range of a
function that is similar to using a first order Taylor method. Let / be a
differentiable function of one variable. Let the standard divided difference
of / be defined by
f[x,y]
f(y)-fR)
y-x
if x ^ y
f(x) if x = y
When / is twice continuously differentiable one can write
f(X) = f(z) + f[z,z\(x z).
If this holds for x Â£ x and if f[x, y] is an arithmetical expression in x and
y, the slope form is written as
fs(x, z) = f(z) + f[z, x](x z).
For example, consider
f(x) = x3 2x2 + 3x -
f'(x) = 3x2 4x + 3
f[x-,y] = x2 + xy + y2 -
X = [o,i].
Then
/M/2] = [1/2,4]
- 1
2(x + y) + 3
34
and so,
fs(x, 1/2) = 1/8 + [1/2,4][-l/2,1/2] = [-15/8,17/8],
Note that the simple interval extension of / and the mean value form give
/([O' 1]) = [-3.3]
1/2) = 1/8 + [1,6][1/2,1/2] = [-23/8,25/8]
which are both larger than the slope form. In fact, for the same center, the
slope form gives consistently more accurate results than the generalized
mean value form [74],
This presentation of the slope form applies to functions of a single
variable. If the number of variables is greater than 1, then the slope form
is not unique. For example, for two variables Â£i,Â£2 and /(Â£) = Â£iÂ£2, both
f\x,x\ = (+2,4i) and f\x,x\ = (+2,3/1) are slopes which shows that the
slope form is not unique.
2.3.4.5 Convex Envelopes
In their first paper on the subject, Falk and Soland [23] by using
the convex envelope (the supremum of all convex functions less than or
equal to /) of the function. In fact, if the function / is separable and the set
over which the convex envelope is desired is given by C = {x\l < x < +},
then the convex envelope T of / can be computed as follows. Let
n n
^(3/) = = ~
Z=1
i=1
where
P(t) = J2p^) = /(*)}
i=1
lt<.xt<.Lt
2=1
35
That is, the convex envelope of the function / over a rectangular partition
is equal the sum of the convex envelopes of /; taken over Ci = {xi\U <
xt < Lt}
2.3.5 Partitioning Schemes
As mentioned in section 2.3.2, there are three methods of par-
titioning a region: cone, rectangle, and simplex. Conical partitions are
generally used by methods for concave minimization problems, rectangu-
lar are used by methods that use interval arithmetic or convex envelopes,
and simplicial partitions are generally only applied to concave minimiza-
tion problems. One of the two main thrusts of this research is partitioning.
In particular, we have developed a new method for a partitioning scheme
that improves the performance of a validated branch and bound algo-
rithm when the accuracy desired is quite small or when the dimension of
the problem is large.
Falk and Soland [23] used rectangular partitions in their origi-
nal algorithm. This has been the partitioning method of choice for any
algorithm that employs either interval arithmetic or convex envelopes. In-
terval arithmetic uses rectangular partitions since an interval vector X
in n-space is a rectangle. Hence, partitioning into rectangles is natural.
Convex envelopes use rectangular partitions because they work so well for
finding an underestimating convex function.
Schemes that are used for determining where to partition a given
rectangle can be quite varied. One can bisect regions along the longest
edge, bisect according to the most promising variable, or bisect according
36
the size of the gradient. More generally, one can cut the region into k re-
gions where k is any integer. Hansen [35] has implemented several of these.
He recommends trisecting along the longest edge. These ideas were also
explored by Csendes and Ratz [14, 84, 85]. They explored chopping the
box into 2 to n pieces where n was the dimension of the function. They also
explored using the Hansen-Sengupta operator (see section 2.4.3) to deter-
mine where to divide the box and they had some success with this method.
In order to maintain the theoretical convergence, one must guarantee that
the length of the longest edge tends to 0 as the algorithm progresses. Di-
viding the region into k equal sized pieces satisfies this requirement. We
have developed a new method for dividing a rectangular partition that
takes into account information about convexity and/or monotonicity of
the function around a local optimum of a rectangular partition.
By contrast, Horst and Tuy [49] give a process which does not
take into consideration any information about convexity or monotonicity.
This is a normal simplicial subdivision process for the optimiza-
tion of a concave function over a convex but not necessarily polyhedral
set. They let an n-simplex M = [id, id,..., vn+l] be a subsimplex of M0
which contains the feasible region D and let 4>m(%) be the convex envelope
of the function / over M. 4>m(%) is obtained by finding the affine function
that agrees with f(x) at each vertex vl of M. The linear program
min4>m(x) s.t. x Â£ D fl M
has a basic optimal solution u(M) with value (3(B). Since 4>m(x) is a
convex (also affine) underestimating function of /, we know that (3(M) <
min f(D fl M).
37
A basic simplicial subdivision process is given by:
Algorithm 7 : Simplicial Subdivision Process
step 1: The initial simplex is M0.
step 2: The subdivision of any n-simplex M is a radial subdivision with
respect to a point x = w(M) Â£ M which is distinct from any vertex
of M.
Note that when the underestimating function is affine as above, the re-
sulting LP is easy to solve since the n-simplex has exactly n vertices.
Conical partitioning is also generally implemented for the opti-
mization of a concave function over a convex set. There are two main
methods of conical partitioning: bisection and normal subdivision.
Following the development by Horst and Thoai [49], let S'=conv{n1,. . vn+l}
be an n-simplex defined as the convex hull of its vertices id,..., un+1, and
let 0 Â£ A. Denote by Fi the facet of S that is opposite to ly i.e. ly ^
Clearly Fi is an (n l)-simplex. Let C(Fi) be the convex polyhedral cone
generated by the vertices of F. It is well known that
n-\-1
U C(Fi) = ^
8 = 1
and that {C(fi)\i = 1,. ., n + 1} is a partition of
Now let C = C'(n1,. ., un) be a polyhedral cone generated by n
independent vectors ul Â£ Then every r Â£ C can be written as
n
r = ^ AiUi i = 1,. ., n, where A; > 0.
8 = 1
If r ^ 0 and r ^ u\ i = 1,. . n, then for each i satisfying A; > 0 in 2.3.5,
we have
Ci = C'(n1,. . id-1, r, id+1,. ., un).
38
It is easy to see that {Ci\Xi > 0} is a partition of C. We have
U c. = c.
A, >0
and
int(C'i) P|int(C'J) = 0, for i ^ j.
A classic method for selecting the vector r is the bisection method.
Choose
r = (1/2) (A1 + A2),
where
||un un 11 = max{| |it* uJ \ \ \i,j Â£ {1,. ., n}, i ^ j}.
Geometrically, bisection corresponds to choosing the midpoint of one of
the longest edges of the (n l)-simplex conv/u1,. . un}. It has been
shown that a conical branch and bound method that uses bisection conical
partitioning is convergent. Many more choices of r that lead to the same
convergence properties can be found in [104], However, bisection tends
to be quite slow in practice [49], and other methods for partitioning the
region have been developed.
Horst and Tuy [49] give a method that is a hybrid of the bisec-
tion method and an cu-subdivision process. An cu-subdivision process sub-
divides a region K with respect to the basic optimal solution of the linear
program associated with the n-simplex. cu-subdivisions take into consid-
eration the values at the points of the n-simplex where bisection simply
throws it away in favor of the longest edge. However, this information
can cause cu-subdivision to degenerate and jam. That is, cu-subdivision
slows and make smaller and smaller cuts in the cone..
39
Define Z(Q) to be the (n l)-simplex which is the section of
K by the hyperplane H0 = {x\eQ~lx = 1}, where e is the vector of all
ones. Define a(Q) to be the eccentricity of K relative to the basic optimal
solution of the linear program associated with Q. The following algorithm
gives a hybrid of the cu-subdivision process with the bisection process for
simplicial branch and bound.
Algorithm 8 : cu-subdivision, Bisection Hybrid Process
initialization: Let Q0 be the initial n-simplex. Select r/0 and p Â£ (^, 1).
Set r/(Q) = r/o for the initial cone K0 = con(Q) to be subdivided,
step 1: If ||eQ-1|| < v(Q) and ^(Q) ^ p, then perform an cu-subdivision
of K = con(<5).
step 2: Otherwise, perform a bisection of K = con(Q).
This algorithm has the advantage of using some function information, but
it is prevented from jamming like pure cu-subdivision by applying bisection
when the region becomes eccentric.
2.4 Result Verification
Result verification methods are computational methods that math-
ematically guarantee existence of solutions. That is, result verification
requires the verification of hypotheses of an existence theorem be carried
out on a digital computer. This is generally done using interval arithmetic
techniques with outward rounding. This research requires result verifica-
tion in its partitioning scheme and for this reason, some ideas pertaining
to this research and result verification are explored here.
Given a vector valued function y = F(x) mapping and
an interval X, we would like to quickly and easily determine if there is
40
a solution to the equation F(x) = 0 in the interval X. This research is
interested in solutions to F(x) = 0 (where F(x) = X f(x)) since these are
points at which the function can achieve a global or local optimum. The
points for which XF(y) = 0 indicate that these points deserve further
inspection. Result verification methods would allow one to place a small
box around this point y and then mathematically verify that either there
exists a point within that box which is a local minimum, some point in
the box is a saddle point, or nothing can be verified.
This is important for global optimization since using result veri-
fication and a convexity test can either verify or dismiss the possibility of
a particular region containing a local optima rather than simply contain-
ing a saddle point. Also, it is shown that some of the result verification
methods can also improve an approximate solution that was found by a
local optimization method.
This research now introduces three operators that verify a solu-
tion in a mathematically correct fashion. These operators are the Interval
Newton Operator, the Krawczyk operator, and the Hansen-Sengupta op-
erator.
2.4.1 Interval Newton Operator
Let AH = {A_1|A Â£ A}, and define N(x,x) = x AHF(x).
Theorem 2 Let F : D0 C {ft iff be Lipschitz continuous on
D C D0 and let F(x) F(y) = A(x y) for A Â£ A and y, y Â£ D0. Then,
if x Â£ X Â£ D, every X' satisfying
N(X,x) = x AHF(x) C X'
41
has the following three properties:
(1) Every zero x* Â£ X of F satisfies x* Â£ X'.
(2) If X' F\X = 0, then F contains no zero in X.
(3) If x Â£ int(A) and X' C X, then F contains a unique zero in X and,
hence in X'.
Proof: See [74],
N(X, x) is called the Interval Newton operator since, if A =
F'^X)-1, the operator looks like the Newton update xn+i = xn F\xn)~l F(xn).
The disadvantage to this method is that A must be regular. That is,
VA Â£ A, A-1 must exist. A closed, convex and bounded set A of m X n
matrices is called a Lipschitz set for F on D Â£ D0 if F(x) F(y) = A(x y)
for some A Â£ A. A is called a Lipschitz matrix if A is a Lipschitz set and
A Â£ IRnXn. It follows that A must be a regular Lipschitz matrix. This
strongly restricts the set of matrices that work for this method for a par-
ticular choice of F.
2.4.2 Krawczyk Operator
An improved operator was developed by Krawczyk. Define
K(X, x) : = 7 CF{x) (CA -I)(X-x),
where C is any n X n real matrix Then the following theorem holds.
Theorem 3 Let
F : D0 C be Lipschitz continuous on D C D0} and let A Â£
FRnXn be an interval Lipschitz matrix for F on h. If x Â£ X Â£ ID and
X1 = K(X,x) then:
(1) Every zero x* Â£ X of F satisfies x* Â£ X'.
42
(2) If X' F\X = 0 then F contains no zero in X.
(3) If x Â£ int(X) and 0 ^ X' C X, then A is strongly regular (that is,
A~XA is regular), F contains a unique zero in X, and hence in X'.
Proof: See [74],
If, as with the interval Newton operator, we replace C by F{x)
and let A = F(x), then this operator can be seen as a Newton step plus
a contraction of the interval. A distinct advantage of this method is that
A is not required to be regular. If not, then we cannot verify uniqueness,
but it is still possible to verify that no solution exists in the interval X.
Neumaier [74] goes on to demonstrate that the best possible value for C
is A, that the best possible value of x is i, and that these values are
independent of each other.
2.4.3 Hansen-Sengupta Operator
Define the operator
r (A,B,X)
B/AHX
if 0 a,
(a; B/A, B/A[)
(A'] B/A,B/A[)
X
if B > 0 Â£ A,
if B < 0 Â£ A,
if 0 Â£ A, B.
Then T(A, B, X) = Â£ X\x = b for some a Â£ A, b Â£ B} [74], Given
a matrix A, a vector 6, and an approximate solution X* to the equation
AX = 6, an improved guess jq, y2, yn can be obtained by the following
process,
Yi = r ( Aii, k AtkYk AlkXk,X* ] = 1,..., n.
k*
This operator can only be applied to matrices for which An ^ 0. If at*
43
some point, becomes 0, then we assign the value of 0 to Ip and have
verified that no solution to Ax = b exists in the interval X. Note that
A, 6, and X can all be either real or interval. If they are all real, this
operator becomes the standard Gauss-Sidel process for solving systems of
linear equations.
For nonlinear result verification, define
H(X} x) = x + r{CA,-CF{x), (X 7)).
The following theorem then holds for the nonlinear Hansen-Sengupta op-
erator.
Theorem 4 Let F : D0 C iff > be Lipschitz continuous on
D C D0} and let A Â£ I^R.nXn be an interval Lipschitz matrix for F on D.
If x Â£ X Â£ ID and X' = H(X} x) then:
(1) Every zero x* Â£ X of F satisfies x* Â£ X'.
(2) If X'f\X = 0 then F contains no zero in X.
(3) If x Â£ int(A) and 0 ^ X' C X, then A is strongly regular (that is,
A~XA is regular), F contains a unique zero in X, and hence in X'.
Proof: See [74],
Neumaier goes on to show that the Hansen-Sengupta operator
provides enclosures for the zero of F that are guaranteed to be at least
as good as those of the Krawczyk operator. Also, the Hansen-Sengupta
operator gives sharper non-existence and existence tests. However, it can
be shown that the existence test with H(X}x) is weaker than with the
Newton operator N(X, x). Also, there is not yet a known result that gives
the optimal values for C and x for the Hansen-Sengupta operator.
44
2.5 Interval Iteration
The methods in the previous section can be seen as a means to
verify whether a given box contains a possible local optimum. They can
all be used as the basis of a global optimization algorithm. Historically,
the earliest of these algorithms used the Newton operator, but any of the
operators K or T can be used.
Algorithm 9 : Interval Iteration (Newton operator)
Initialization: Put the box X on the list L of boxes, where X is the
region over which the global optimum is desired, and set the DONE
list to NULL.
step 1: If the L is empty, go to step 3; otherwise, remove a box B from
L.
step 2: If N(B}x)
step 2a: Else if N(B} x) is smaller than some pre-determined toler-
ance, put K(B) on the DONE list. Go to step 1.
step 2b: Else if N(B}x) is sufficiently smaller than B, put N(B}x)
on L. Go to step 1.
step 2c: Else, split N(B} x)f]B into k pieces, B\ ... Bk and put
them on L. Go to step 1.
step 3: Output the DONE list.
A potential problem of this algorithm is the dependence on the
matrix Y. The algorithm works best if Y is close to the inverse of the
midpoint of the Hessian matrix and computing that inverse requires 0(n3)
operations. The approximate inverse can be used from step to step, but
a good approximation to the inverse of the midpoint of the Hessian is
45
important to the speed of convergence. When the second derivative is
continuous on the box X, the third derivative is non-zero on X, and /
has a fixed point in X, then the above algorithm converges quadratically
in the limit [69]. Like a quasi-Newtons method for real numbers, this
quadratic convergence rate is dependent on having a good approximation
to the inverse of the midpoint of the Hessian.
All boxes on the DONE list consist of all the points of the orig-
inal box for which it is possible that VL(x) = 0. To complete the global
optimization algorithm, each of these boxes must now be checked to de-
termine which boxes contain only local optima and which boxes contain
the global optima. To do this, let L be the DONE list and let
y* = min F(Bi)
and discard all Bi for which F(Bi) > y*. The global optima must be in
the remaining boxes.
2.5.1 Hansens Algorithm
Hansen has implemented an interval branch and bound algorithm
which includes several different techniques to eliminate portions of a box.
This research touches on the highlights of the particular methods used
to eliminate part or all of a box as presented in [35]. His algorithm is
guaranteed to find the global maximum of all local maximums, and he
assumes that the global maximum occurs in the interior of the initial box
and not on the boundary.
The most obvious method of eliminating a box is to evaluate the
function at some point in a region. If that function value is better than
46
any known function value, record this function value as the best known
function value y*. If the lowest function value of the function over a box
is greater than y*, that box can be eliminated. Similarly, if 0 ^ VF(X),',
we know that it is impossible for there to be a stationary point of F in
the box X. Thus X can be deleted. Finally, if V2F(W)8y < 0, it must
be true that V2F(x)8y < 0 for any real vector x Â£ X. This implies
that the Hessian matrix cannot be positive semi-definite (and therefore,
the function cannot be convex) on the box X. Again the box X can be
deleted. Enclosures of both the gradient and the hessian can be computed
using interval arithmetic. Just as with evaluating the function itself, the
enclosures for the gradient and the Hessian are guaranteed.
Hansen also implements some methods that are able to eliminate
just part of a box. These methods require more work, but can result in a
savings of time as large regions can be eliminated. It can be shown using a
Taylors expansion of the function F about a point x = X that for y Â£ X,
n
F(y) Â£ F(x) + ~ Xi)gt(X i,..., Xi,xi+1, ...,xn),
8 = 1
where y8- is the ith component of the gradient of F. We denote g\ =
gi(Xi}. ., Xi, Xi-|_i,. ., xn) and rewrite equation( 2.5.1) for some j = 1,. . n,
as
n
F(y) Â£ F(x) + ((yj Xj)g] + Y(Vi ~
8 = 1
This equation is true for all y8- Â£ X{. We can replace y8- by Xi in
the right hand side and get
n
F(y) Â£ F(x) + (y3 Xj)g] + - xjgj. (2.4)
8 = 1
47
Hence, F(y) > F, where F is the best function value so far.
Define t = yj Xj,
n
u = F(x) + Y,{Xi Xi)gj F
i=1
and
We can then delete the portions of X for which U + Vt > 0 and
retain the portions for which U + Vt < 0. Let T = {t : U + Vt < 0},
U = [a, 6], and V = [c, d\. Then T is given by the following:
\a/d, oo] if a < 0 and d > 0
\a/c, oo] if a > 0, c < 0, and d < 0
[oo, oo] if a < 0 and c < 0 < d
[ oo, a/d\ U\a/c, oo] if a > 0 and c < 0 < d
[ oo, a/c] if a < 0 and c > 0
[ oo, a/d] if a > 0, c > 0 and d > 0
0 otherwise.
Hansen sets Yj = Xj Pll?1 + xy)- If this set is empty, all of X is
deleted. Otherwise, the algorithm sets Xj = Yj and repeats this operation
for j = 1,... ,n. It is also possible for Yj to be made up of two intervals.
If this is the case, the algorithm sets Xj = OY3 but retains the gap
information if the box is split. In this manner, it saves the problem of
dealing with handling the branching that would occur if we split the box.
It also allows the algorithm to discard the gap if the box is split at a later
point in time.
48
Similarly, Hansen uses the second order Taylors expansion to
eliminate portions of the box X. If we let x = X and z = y x} then we
can delete the points y for which
f(x) + zT g(x) +
ztX23{x,X)z
2
> F
where X23 (x, X) is the Hessian with arguments (Ah,. . Xj, Xj+i,. . xn).
For example, when n = 2, we then want to retain the points y such that
X, ^ + 2z1z2X^1 + ^22V22 ^
j(x) + zigi + z2g2 H------------- < J
Hansen shows that X2(x}X) is lower triangular(see [35], section 6.4) and
so the above expression simplifies to
f(x) + Zigi + z2g2 +
z\V2i + Z\V21
T z2 X22
After simplifying the above and replacing z2 by Z2 = X2 x2} one wants
to solve the quadratic equation
A + Bzi + Cz\ < 0,
(2.5)
where
A = f(x) f + Z2g2 + V22Z2,
B = Mk
2
and
The solution of 2.5 then becomes a matter of solving the equation
A + BT + CT2 = 0 where A, B} (7, T are all intervals. The solution to this
equation is quite complicated, is not presented here but can be found
49
in [35]. The above equation is then also solved for the case in which one
replaces z\ by Z\ = X\ aq, and then solves for z2.
Hansen applies the zerof/l order check first, immediately followed
by the test for monotonicity and the test for non-convexity. He then
applies the first order elimination which is followed by the second order
elimination. One pass of an interval Newtons method (presented in sec-
tion 2.5) is also applied. If the box is reduced at all, then the function
is re-evaluated at (approximately) its center, and F is updated. Hansen
also implements a Newton-like search whenever a box yields a new upper
bound on F is found, but this search is terminated if it leaves the box.
Hansen claims to have had a great deal of success with this com-
bination of techniques. He states that run time appears to increase roughly
with the cube of the dimension of the problem. Since a specially mod-
ified compiler was used by Hansen, this research was unable to compile
Hansens FORTRAN code, so direct comparisons on a single machine were
not possible.
50
3. Back-Boxing Methods For Global Optimization
The algorithms for global optimization presented in the previous
chapter work well, but often take a great deal of time for large numbers
of variables or when high accuracy is required. We present a new method
that can solve problems with a large number of variables to a high degree
of accuracy in a short amount of time, even when the available memory
for the automaton is limited.
Back-Boxing is a new partitioning scheme for global optimization
with interval arithmetic and rectangular partitions. It takes advantage
of the fact that local optimization is quick and easy and attempts to use
ideas of both convexity and result verification to reduce the work expended
when close to an optimum. As with global optimization algorithms, this
scheme can be foiled and it can actually take significantly longer on some
problems. When one desires a high degree of accuracy, a Back-Boxing
partitioning scheme performs significantly better.
The algorithm that is developed here finds a local minimum for a
given function subject to bound contraints on the variables. As before, we
will continue to use capital letters to represent interval variables and non-
capitols for point variables. First, consider the following local optimization
algorithm of the function F where (a) = F(x + ap):
Algorithm 10 : Damped Newtons Method for Real Lo-
cal Optimization
step 1: Solve V2F(x)p = VF(x) for p.
step 2: Find a so that the following are satisfied:
a) (a) < (0) + e
b) 4>(a) > (f)(0) + (1 e)
step 3: Set x y- x + ap.
step 4: If ||VF(x)|| < e, stop; else, go to step 1.
This is a basic damped Newtons method for local optimization.
Given enough time and a function that is not too poorly behaved, this al-
gorithm converges to a point x* where *V F(x*) = 0, which may or may not
be a local optima. If we want the algorithm to converge to a point inside a
box, this algorithm could easily fail. It is possible that the method would
attempt to go outside the box when it applies the line search in step 2. To
prevent this, consider the following modification to the algorithm to find
the local optimum of a function over a box B\
Algorithm 11 : Constrained Damped Newtons Method
for Real Local Optimization
step 1: Solve HF(x)p = VF(x)
step 2: For 1 < i < n if ay = Bi and pi > 0, set pi = 0.
step 3: For 1 < i < n if ay = Bi and pi < 0, set pi = 0.
step 4: If x + p (jf B, find a such that x + ap Â£ dB and set p ap.
step 5: Find a so that the following are satisfied:.
a) 4>(a) <
b) (a) > (f>(0) + (1 e)
step 6: Set x x + ap.
step 7: If ||VF(x)|| < e, stop; else, go to step 1.
52
Steps 2 and 3 check to determine if x is on the side of the box.
If it is, step 4 makes sure that the vector p does not take x outside the
box. That is, if Xi is on the right side of the box and if pi > 0, any value
of a would place x + ap outside the box, so pi is set to 0. Similarly, if ay
is on the left side of the box and pi < 0, then any value of a would place
x + ap outside of the box.
Step 4 then checks to see if the x + p would remain inside the
box. If not, p is reduced to a length that would, at most, allow x + ap be
on the side of the box where 0 < a <1. Finding that a is accomplished
by the following theorem.
Theorem 5 Let X Â£ In, X not thin, x* Â£ X, x* Â£ and let
a unit direction p be given. Then the line starting at x* with direction p
leaves X at the point x* + ap where a is given by
where
Proof:
a =
min ay(X, x*, p),
l*
at(X, x*,p)*
X;~l
if Pi > 0
if Pi < 0
oo if pi = 0.
Finding the largest a is equivalent to solving the LP
Max a
st:
x* + api < Xi i = l,.. .,n
x* + api > Xt 1 = 1,.. .,n
a > 0.
53
Since x* Â£ X, a = 0 is a feasible point of this LP. Therefore,
the largest a for which x* + ap is inside or on the box X is given by
a* = min8'jK^0 cq(W, x*, p), providing the desired result. S
On a digital computer, directed roundings must be used so that
it is guaranteed that the point x + ap is in the box. For example, suppose
X = [0,1], x* = 1/3, and p = 1. Then a* = 2/3. Suppose also that 1/3 is
represented on our automaton as 0.334, and 2/3 is represented as 0.667.
Then, x + ap = 1.001, which lies outside of X. While this example is
indeed contrived, it is entirely possible for roundings to occur that force
x + ap outside of X.
3.1 Back-Boxing
Some branch and bound global optimization algorithms are based
on interval arithmetic (see [35]). While interval arithmetic is mathemat-
ically correct, it has an inherent trend to sometimes explosively overesti-
mate the range of a function over a box. When the box width tends to
zero, interval arithmetic guarantees that the width of the overestimation
also tends to zero but only linearly in the box width. From this we obtain
the convergence proof.
However, even though the overestimation tends to zero, this is
only a theoretical result and often can cause a great deal of computer time
to be expended for the boxes immediately surrounding a global minima.
Therefore, it may be wise to identify the areas where minima are located.
The algorithm slows on these areas and a fast local search procedure could
applied instead. The following theorems are required to help identify these
basins of attraction and are taken from [8].
54
Theorem 6 If Gaussian elimination can be performed on A Â£
R"x" without pivoting, and all the diagonal elements of the resulting ma-
trix B are positive, then A is positive definite.
Theorem 7 If A Â£ is symmetric, has positive diagonal
elements, and is diagonally dominant, then A is positive definite.
From these standard theorems for real linear algebra, one can
quickly derive corollaries for interval linear algebra. Define the following:
\X\
(X)
max(|2Â£|, |X|)
'
min(|X|,|X|) if 0 ^ AG
<
0 otherwise.
Define an interval matrix A to be diagonally dominant if
{A%%) > E |Aij
8 = 1
*'Â£i
Corollary 1 If A Â£ /nXn -g Symmetric, has positive diagonal
elements, and is diagonally dominant, then every symmetric sub-matrix
A1 Â£ A is positive definite.
Proof: Let A' Â£ $tnXn,A' Â£ A, and (A')T = A'. Since A has positive
diagonal elements, A' has positive diagonal elements. Since A is diagonally
dominant,
n n
A't > A^> (Att) > Z\Av\ > E\A'j\-
8 = 1 8 = 1
*'Â£i *'Â£i
The first inequality is possible since we know that An > 0. There-
fore, A' is symmetric, diagonally dominant, and has a positive diagonal.
Applying Theorem 7, A' is positive definite.
55
Corollary 2 If A Â£ InXn can be Gaussian eliminated without
pivoting, and the resulting matrix B Â£ JnXn has a positive diagonal, then
every symmetric sub-matrix A' Â£ A is positive definite.
Proof: Let B = AG, where G Â£ and G is the product of elementary
matrices, none of which correspond to row exchanges. Furthermore, let
Bn >0, 1 < i < n. Let A' Â£ A be symmetric. Then B' = A'C Â£ AG = B
has a positive diagonal and is obtained from A' by Gaussian elimination
without row exchanges and, by Theorem 6, is positive definite.
One interesting point in the context of the unconstrained opti-
mization problem is that even though the Hessian should be symmetric if
calculated using exact real arithmetic, the interval Hessian often is non-
symmetric due to round-offs occurring in different places and in different
orders. The algorithm that is developed here forces symmetry on the inter-
val Hessian by taking A8 J = A8 J fj Ahi which sometimes results in tighter
values for the Hessian.
The reason that forcing symmetry is a valid procedure is that the
Hessian of every twice continuously differentiable function is symmetric.
Let A be a matrix such that A = V2F(W) and A8J ^ Aji for some i,j.
Without loss of generality, suppose that there exists a Â£ A8J, a Aji.
Then any matrix A' Â£ A, A' Â£ with AG = a cannot be symmetric
and so could not be the Hessian of a twice continuously differentiable
function. Therefore, we can eliminate any such matrices by applying the
above theorems to AfjAT rather than just the matrix A. We take the
intersection rather than the union since the intersection gives a tighter
bound on A.
56
Back-Boxing is defined to be the process of identifying a box that
surrounds a given point in such that the objective function is convex
on that box. This may not always be possible. For example, consider the
function f(x) = x3 at the point x = 0. For any interval that contains x = 0
as an interior point, / is not convex on that interval. For a case such as
this one, a small box is placed around the point and it is set aside for later
processing. The computational method developed here finds a box where
F has a unique global optimum on that box and verifies mathematically
that the minimum is unique using result verification methods unlike [56].
In [56], a stationary point is found, and then a prohibited area is placed
around that point. Next, it is assumed that this stationary point is the
global minimum of the prohibited area, and this portion of the domain
is no longer considered. This has the advantage that one does not need
to verify that the global minimum of the prohibited area occurs at the
stationary point, but this is also its downfall. Since the region is not
verified, it is possible that it could contain not only other local minima,
but also a smaller local minima.
Recall that the majority of the work expended by a branch and
bound algorithm is usually expended searching close to the local optima.
This has been verified by several different people including Hansen, Wal-
ster, and Sengupta in [107]. They state that for most problems they
tested, the time required to globally optimize did not go up as an ex-
ponential function of the size of the domain. This was also verified by
Hansen [35] as well as in some private communication between Hansen
and Moore. If the work required for a branch and bound algorithm was
57
evenly distributed without regard to the existence or location of a local
optima, then one would expect exponential growth in the time when the
size of the domain increases. Since this does not happen, most of the work
must be expended when close to the optima.
Another way to see this is to consider what happens to the inter-
val extension of F, VF, and V2F when X is close to or contains a global
minimum. If X contains a local minimum, and we then bisect X into two
pieces, both pieces could now contain the global minimum and, neither
can be deleted. Similarly, if a box G of width macheps is placed around
the global minimum of X, and each of the boxes X G is evaluated with
an interval extension, the evaluation (due to round off error) does not al-
low any of the boxes in X G to be eliminated. For example, consider
the function f(x) = x2 on the interval X = [10, 10]. If we split X into
the three boxes [-10, e], [-e, e], [e, 10] where e is the smallest machine rep-
resentable number, only one box contains the global minimum. However,
F([e, 10]) = [e2,100] and on the automaton, ]e2,100[= [0,100] cannot be
deleted. This may be a trivial example, but round-offs must be taken into
consideration. Many more boxes that are close to the solution must be
treated when tolerance is small. Similarly, inevitable overestimation in
the evaluation of the gradient causes the enclosures to contains zero, even
when there is no local minimum in that box.
It is relatively easy (when compared to global optimizing) to find
the global optima of a convex function over a convex set. This can be done
to a Back-Boxed region. Naturally, one wants to identify the largest box
around a point such that the objective function is convex. This turns out
58
to be a non-trivial problem and is discussed later. Since one does not
want to spend time Back-Boxing regions that are not promising, it would
be wise to first attempt to locate a local optima and then Back-Box that
local optima. In this way, rather than spend work attempting to eliminate
regions close to the global optima, this difficult region is treated using a
real arithmetic algorithm which is inexpensive in terms of CPU time and
also is fast. The following algorithm incorporates these ideas.
The Back-Boxing algorithm maintains three lists of boxes. The
first list is the list of undetermined boxes which contains all boxes for which
it has not been determined whether they contain a global minimum. The
second list is the set of finished boxes. These are boxes that have been
reduced to a size smaller than a preset tolerance. The global optima of
the function may be contained within these boxes. Finally, there is the
list of CX boxes. This is a list of boxes for which it has been determined
that the objective function F is convex at each point of the box. For each
of these boxes, the minimum over the box is verified, and the remainder
of the box is deleted at the end of the algorithm.
The algorithm contains two loops. In the outer loop the algo-
rithm checks to see whether any excessively large boxes remain on either
the unchecked list or the CX list. If there are any boxes that are too large
on either of these lists, the inner loop processes these boxes. These boxes
are reduced in size or split either by Back-Boxing or by bisection. This is
continued until the unchecked list is empty. The algorithm then returns
to the outer loop. If any boxes B on the CX list are too large, a box B* of
width e of the width of the box is located in the center. Then the boxes
59
B* and B B* are put on the unchecked list, and the algorithm contin-
ues. The box is located in the center because that is where the proposed
minimum of the box was found.
At termination, the algorithm returns all the boxes that are on
the CX list and all the boxes on the finished list. All the global minima
must be contained within these boxes.
Algorithm 12 : Back-Boxing
Initialization: Set list of undetermined boxes to B0} where B0 is the set
over which one wants the set of global minima. Assign the set of CX
boxes NULL, the list of finished boxes NULL, and i = 0.
step 1: Choose a box B from the list of undetermined boxes. If the list
of undetermined boxes is empty, go to step 4.
step 2: Attempt to eliminate all or part of the box using some of the
techniques in section 2.3.3.
step 3: If either B is not large enough for Back-Boxing or if (F(B)
F*)/F* > BBTol, where F* is the smallest upper bound of the func-
tion values on the region, split B into k pieces B1,. . bk. Place the
boxes for which max, width(IT) < pre-specihed tolerance on the list
of finished boxes and place the remainder on the unchecked list and
go to step 1. Otherwise, continue.
Overview: In steps 3a through 3d, the algorithm attempts to find
the largest box B' for which it is mathematically guaranteed
that B' has a unique global optimum. This process uses the
intermediate boxes Bg} Bh} and Bhs to complete its work,
step 3a: Apply a point algorithm to minimize / over B to find a
60
point x that (nearly) satisfies the Kuhn-Tucker conditions. That
is, find a point x' such that X F gg < e, where g represents the
constraints ay < B and ay > F, and e is some specified tolerance,
step 3b: While 0 ^ V/(F')), enlarge B' to Bg. Then while V2/(F3)
is positive definite, enlarge Bg to B/,. Finally, while H(X} x) Â£ X,
enlarge Bh to Bhs The strategy to enlarge B' is given below in
Algorithm 13.
step 3c: If Bhs is larger than Bh or if Bh is sufficiently larger than
Bg, set B' = Bhs and place Bh on the list of CX boxes. Else set
B' = Bg and discard Bg.
step 3d: Let BB' = B1,. ., BJ. Place the boxes for which max, width(F') <
pre-specihed tolerance on the list of finished boxes and place the
remainder on the unchecked list and go to step 1. A discussion
of how to do this in a mathematically verifiable way on a digital
computer is given below.
step 4: All boxes have now been classified as either being too small to
be processed further, or it has been determined that the function F
is convex on that box. The algorithm now attempts to eliminate or
reduce the size of the boxes that remain. This is the end of the inner
loop as mentioned above. Remove a box B from the list of CX boxes.
If the list of CX boxes is empty, go to step 9.
step 5: Attempt to eliminate part or all of B using some of the techniques
in section 2.3.3.
step 6: If B is now empty, go to step 4. Otherwise apply a point descent
61
Figure 3.1: First locate a local minimum
method which is guaranteed in theory (given enough time and pre-
cision) to converge to the global minimum of a convex function on a
convex set to what remains of the box. This returns a point x* Â£ R,
x* Â£ $Rn.
step 7: Let Y = x* + [e, ev\ for some ev > 0, and verify that Y contains
a local minimum over the box B. If not, go to step 6 with x* as the
starting point for the minimizer.
step 8: Let Y = x* + [e^, e]. If F(Y) > best function value so far,
discard B. Else, discard B Y and contract the box Y using one of
the result verification routines in section 2.4 until it is smaller than
the predescribed tolerance. Put Y on the list of finished boxes and
go to step 4.
step 9: Discard all boxes on the finished list for which F_(Bi) > F*.
step 10: Return the list of finished boxes.
This research desired an algorithm that worked as a black box
62
Figure 3.2: Place a box around that point such that / is convex
Figure 3.3: Partition the region around that box
63
requiring no information about the function other than its definition. This
algorithm satisfies those goals. In fact (assuming the function is differen-
tiable), derivatives are computed using automatic differentiation so one
does not need to provide an algebraic definition for the derivative. As
output, it gives a set of boxes for which are mathematically guaranteed to
contain the global optima when this algorithm is implemented on a dig-
ital computer with directed roundings (available on any IEEE machine)
for interval arithmetic and guaranteed function evaluations.
Step 4b of the Back-Boxing algorithm is accomplished in the
following manner.
Algorithm 13 : Grow Box
Initialization: Let c = width(I?)/2, a = 0, b = (a + c/2), and ONE to
be the vector of all ones. Finally, let x be the point about which
one desires a box of maximal size for which F has a unique global
optimum on that box.
step 1: Bg = Bf)(x + b[-l,l]ONE).
step 2: If 0 ^ Vf(Bg), a = b, else c = b.
step 3: If (c a) < tol, go to step 4. Otherwise, go to step 1.
step 4: Bg = B f|( + a[1, lJOtVE), c = width(I?)/2, b = (a + c/2).
step 5: Bh = Bf){x + b[-l,l]ONE).
step 6: If 'V2f(B[l) is positive semi-definite, a = 6, else c = b.
step 7: If (c a) < tol, go to step 8. Otherwise, go to step 4.
step 8: Bh = B p|( + a[1, l]ONE).
step 9: Bhs = Bf){x + b[-l,l]ONE).
step 10: If H(Bh.s, x) Â£ A, a = 6, else c = b.
64
step 11: If (c a) < tol, go to step 12. Otherwise, go to step 8.
step 12: Bhs = B p|( + a[1,1]0NE).
The Grow Box algorithm is a combination of three one dimen-
sional bisection algorithms. The first identifies the largest value a for
which VF(Bg = XF(Bf](x + a[1, lJOiVif)) does not contain 0. The
second uses Bg as a starting point and then determines the largest box
for which X2F(Bil) = X2F(B f](x + a[1, lJOiVif)) is positive definite.
The final bisection algorithm finds the largest box for which H(X} x) Â£ X.
Since the cost of computing the gradient is an order of magnitude less than
computing the Hessian [55], the box is first enlarged using the gradient
rather than the Hessian or the Hansen-Sengupta operators.
Jansson and Kniipel proposed and implemented a similar method
in [56]. They use a local minimizer to find a proposed local minimum and
then place a prohibited area around this local minimum which would
no longer be searched. They do not guarantee the existence or lack of
a local or global minima in this prohibited area. Hence, other local or
global minima could escape detection. They do compute bounds on the
function value over this prohibited area and discard the prohibited area
if the minimum function value exceeds the best function value discovered
for the function on the region.
Back-Boxing is also similar in some respects to MLSL. In MLSL,
a local minimizer is applied to a proposed basin of attraction exactly
once. That region is then ignored since the best function value for that
region has been found. There is always the potential problem that there
might be two or more local minima in a proposed basin of attraction and
65
that a global (or smaller local) minimum could be missed by MLSL.
Back-Boxing is guaranteed to not miss any local or global min-
ima of the function since any region that is identified as convex has its
local minimum verified. If the local minimum cannot be quickly veri-
fied, then the box is split and treated again until it is either discarded or
the global minimum of the box is found. Regions that are not identified
as having a convex objective function are eliminated only if they do not
have the necessary characteristics of a global minimum. Some of these
conditions are the gradient containing 0, having a smaller function value
than other regions, having a positive definite Hessian, and others given in
section 2.3.3.
We know that a real matrix is positive definite if it is symmetric,
its diagonal is greater than zero, and it is diagonally dominant. The test
for diagonal dominance of every symmetric sub-matrix becomes Ay; <
YT3=i min Similarly, the test for the diagonal being greater
3 A
than 0 is simply Ay > 0. If both of these conditions are satisfied, then
the every real sub-matrix A' of A is guaranteed to be positive definite.
Step 4d of Algorithm 12 can be accomplished in any number of
ways. If we assume that B' is strictly interior to R, we can let
B-B' = {X'\B' X\X\ G [B[,
There are three choices for each Xi, i = 1,. . N which yields 3W
possible boxes. 1 must be subtracted for the box B where X1 =B which
gives 3^ 1 possible boxes for B B1 (see figure 3.4) and this may not
be the best way to calculate the difference. The hope from steps 4a-d
is that the box B has only a few local optima, and that each of these
66
1 4 6
B B B
2 B B1 7 B
3 s 8
B B
1 B B' 2 B
1
Figure 3.4. Partitions for B B'. The first is from Algorithm 14 and the
second is the memory intensive method.
can be captured in its own box, while the remainder of the box can be
easily eliminated using any of the standard branch and bound techniques.
Adding an exponential number of boxes to the list is not a good choice in
this case as it requires an exponential number of boxes to be eliminated.
For example, if N = 100 (not a large number of variables), 5.154xl047
boxes are added to the list. Considering that each box requires storing
2N double numbers to represent the upper and lower bounds of the box in
each dimension and each double requires ten bytes, this requires in excess
of 1.031xl045 megabytes of memory which exceeds the storage capacity of
any known computer today (actually more than the sum of the storage
capacity of all computers ever built).
Since this is unreasonable for a large number of variables, we
compute B B' in the following fashion:
Algorithm 14 : Minus
Initialize: Set MinusList = NULL, i = 0.
step 1: X = B.
step 2: Xi = [fy, B].
67
step 3: Add X to MinusList.
step 4: Xi = [Â£, Bt].
step 5: Add X to MinusList.
step 6: Xi =
step 7: If i < N} increment and go to step 2. Else return MinusList.
This algorithm creates boxes by cutting out portions of the box
in each dimension. For example, if
[ [-10,10]
JD =
_ [-5,30] _
and
[2,3]
the following boxes would be added to the list by the above method in the
following order:
[-10,-1] [1,10] [-l.i] [-1,1]
[-5,30] 5 [-5,30] 5 [-5,2] 5 _ [3,30] _
This method returns a total of 2N boxes of considerably different
volumes. Moreover, the amount of memory required for this method is
much lower. For example, for 10,000 variables and 10 bytes per double,
only 200K of memory is required for these boxes. This is easily handled
on todays computers. It is hoped that only a few of the boxes require
significant time to process, and that many may be eliminated outright.
Jansson and Kniipel use a branch and bound algorithm like Algorithm 12
and compute what they call a prohibited area. They do not describe
how they search the area that is not prohibited, nor do they describe how
they divide up the region B B'.
68
Algorithm 12 has similarities to many global optimization algo-
rithms reviewed here. If the minimum size required for Back-Boxing is set
to 0, then the algorithm degenerates into a standard branch and bound
algorithm such as Hansens method. If not, then at steps 4c and 4d of
the algorithm, the procedure attempts to find as large a box as is possible
that is still either convex or does not have 0 in the gradient. B' can either
easily be discarded or a global maximum over the box B' can be quickly
and easily found using a local optimization method and result verification.
69
4. Test Problems
This section contains the problem set that was compiled from
problems found in the literature or specially created to test the Back-
Boxing algorithm that was developed in this paper. Several problems
were either created or found to test the Back-Boxing algorithm that was
developed in this paper. The solutions (if known) are given. The majority
of the test problems can be found in [39, 56, 88] but the original reference
is provided when known.
4.1 Published Problem Sets
Test Problem 1 Extended Rosenbrocks function [87]
n/2
F(x) = 100(z2; x2i-l)2 + (1 x2i-l)2
8 = 1
s.t. 10 < Xi < 10
x* = (1,1,1,..., 1)T, and f(x*) = 0.
Test Problem 2 Extended Rosenbrocks function, version
2 [88]
n
F(x) = X! 100(x,- ay2_i)2 + (1 ay_i)2
8 = 1
s.t. 10 < Xi < 10
z* = (l,l,l,...,l)T, f(x*) = 0.
Schittkowski [88] provides many different variations of the standard Rosen-
brocks function, of which the above two examples are given. The original
reference by Rosenbrock [87] is the first of the above two with n = 2.
Rosenbrocks function is a classic problem in local optimization since it
has a very narrow and very steep banana-shaped valley that surrounds
the global minimum.
Test Problem 3 Six Hump Camel Back [56, 97]
F(x) = 4x2 2.1xf + xf/3 + x\x *2 4x\ + 4^^
x
*
X
*
s.t. 5 < Xi < 5
'
(0.0898400,-0.712659)
< , f(x*) = -1.0316285.
(-0.0898400,0.712659)
Test Problem 4 Levy 3 [56, 107]
5 5
F(x) = J2lcos((l + 1)X1 + 0 cos((j + l)x2 + j)
*'=1 j=l
s.t. 10 < Xi < 10
'
(4.97648,4.85806)
(4.97648,-1.42513)
(4.97648,-7.70831)
(-1.30671,4.85806)
< (-1.30671,-1.42513) ,/(O =-176.542.
(-1.30671,-7.70831)
(-7.58989,4.85806)
(-7.58989,-1.42513)
(-7.58989,-7.70831)
Test Problem 5 Levy 5 [56, 107]
5 5
F(x) = J2lcos((l + 1)X1 + 0 cos((j + l)x2 + j)
i=l 3=1
+ (Xl + 1.42513)2 + (x2 + 0.80032)2
s.t. 10 < Xi < 10
71
x* = (-1.30685,-1.42485), /(x*) = -176.138.
Levy 3 and Levy 5 are very difficult problems for a local opti-
mizer. The products of different periods of cosines lead to a very rugged
landscape with hundreds of local optima. Levy 2 has 9 separate local
optima, and Levy 5 is identical to Levy 3 except for the addition of a
quadratic term. This quadratic term has the effect of raising the land-
scape on the edges so that the global minimum that is closest to the point
(-1.42513, -0.80032) becomes unique.
Test Problem 6 Beale 1 [56, 107]
F(x) = (1.5 x\ + X1X2)2 + (2.25 x\ + xix22)2 + (2.625 aq + xix2)2
s.t. 4.5 < Xi < 4.5
x* = (3,0.5), /(x*) = 0.
Test Problem 7 Booth [56, 107]
T(x) = (aq T 2^2 7)2 T (2aq T aq 5)2
s.t. 10 < Xi < 10
x* = (1, 3), f{x*) = 0.
Test Problem 8 Matyas [56, 107]
F(x) = 0.26(x2 + x2) 0.48xix2
s.t. 10 < Xi < 10
x* = (0, 0), f(x*) = 0.
Matyas appears to be simple since it is just a quadratic. However,
the Hessian matrix is very nearly singular along the line aq = aq which
72
causes some problems eliminating regions using zerof/l, first or second order
information.
Test Problem 9 Branin [56, 97]
F(x) = (x2 ~^X1 + ~X1 ~ ^ + 10 ^1 ^ cos X! + 10
s.t. 5 < xi < 10
0 < x2 < 15
x* = <
, f(x*)
(-3.14159,12.27500)
(3.14159,2.27500)
(9.42478,2.47500)
= 0.397887.
Test Problem 10 Rastrigin [56, 97]
F(x) = x\ + x22 cos(12xi) cos(18x2)
S.t. 1 < Xi < 1
= (0,0), f(x*) = 2.
Test Problem 11 Griewank 2 [56, 97]
ryt 2 _J_ ryt 2 ryt
771/ \ X1 -T X2 X2
r (x) =--------------cos xi cos7= + 1
V ; 200 V2
s.t.
-100 < Xi < 100
x* = (0,0),f(x*) = 0.
Test Problem 12 Exp2 [18, 56]
10 ?
F(x) = jr (edio 5e2/10 e-.yio + 5e-^
8 = 1
S.t. 0 < X; < 20
= (1,10), f(x*) = 0.
73
Test Problem 13 Treccani [18, 56]
F(x) = x\ + Ax\ + Ax\ +
s.t.
5 < Xj < 5
x =
. /(**) = 0.
(0.0)
:-2,0)
Test Problem 14 Three Hump Camel Back [18, 56]
F = 2x\ l.Qhx\ + -x\ + xi x2 + x\
s.t.
5 < Xi < 5
x* = (0,0), f(x*) = 0.
Test Problem 15 Branin2 [18]
F = ^1 2x2 + sin(47rx2) - ^ sin(27rxi)^
s.t. 10 < Xi < 10
(1,0)
(0.148696,0.402086)
(0.402537,0.287408) f(x*) = 0.
(1.59746,-0.287408)
(1.85130,-0.402086)
Test Problem 16 Chichinadze [12, 19, 56]
9 7T . 1 / (x2 1/2)2
F = x\ 12a:i + 11 + 10 cos x1 + 8 sin(57rxi)------exp----------------
x =
V5
s.t.
-30 < X! < 30
x* = (5.90133,0.5), f(x*) = -43.3159. -10 < x2 < 10
74
Test Problem 17 Price [19, 56]
F(x) = (2x\x2 x2)2 + (6x1 x22 + X2)2
s.t. 10 < Xi < 10
(0,0)
X* = < (2,4) fix*) = 0.
(1.464352,-2.506012)
Test Problem 18 Mccormick [39, 65]
F(x) = sin(xi + X2) + (xi X2)2 1.5a:i + 2.5x2 + 1
s.t. 1.5 < xi < 4
3 < x2 < 3
x* = (0,0), f(x*) = 1, 1.5 < x 1 < 4, and 3 < x2 < 3
Test Problem 19 Schwefel 3.2 [56, 107]
F(x) = J2 [(xi xl)2 + l1 xif]
i=2
s.t. 10 < Xi < 10
= (1,1,1), f(x*) = 0.
Test Problem 20 Levy 8 [56, 107]
k 1
F(x) = sin2(7ryi) + ^(y* 1)2(1 + 10 sin2(7ry8 + 1))
i=l
+ (yk 1)2(1 + sin2(27rxfc))
s.t. 10 < Xi < 10
x* = 1, f(x*) = 0
75
Levy 8, is difficult due to the combinations of different periods of
the sine function. However, like Levy 5, they are coupled with a quadratic
which allows the interval methods to quickly eliminate large regions.
Test Problem 21 Hartman 3 [56, 97]
F(x) = ~X!c*exP
8 = 1
s.t. 0 < x; < 1
a =
P =
-Efly(h
3 = 1
0.36890 0.11700 0.26730
0.46990 0.43870 0.74700
0.10910 0.87320 0.55470
0.03815 0.57430 0.88280
3 10 30 1
0.1 10 35 c = 1.2
3.0 10 30 3
0.1 10 35 3.2
x* = (0.114614,0.555649,0.852547), /(x*) = -3.86278.
Test Problem 22 Shekel 5 [56, 97]
x
*
F(x)
5
Â£
i
Â£j = 1 (X3 m)2 + C*
s.t. 0 < Xi < 10
4 4 4 4 0.1
1 1 1 1 0.2
8 8 8 8 c = 0.2
6 6 6 6 0.4
3 7 3 7 0.4
(4.00004,4.00013,4.00004,4.00013), /(x*)=-10.1532.
76
Test Problem 23 Shekel 7 [56, 97]
7 1
F(x)
s.t.
= -Â£
^ J2j=i{xj ahJ)2 + a
0 < Xi < 10
a =
4 4 4 4
1111
8 8 8 8
6 6 6 6
3 7 3 7
2 9 2 9
5 5 3 3
0.1
0.2
0.2
0.4
0.4
0.6
0.3
x
*
(4.00057,4.00069, 3.99949, 3.99961), /(x*)=-10.4029.
Test Problem 24 Shekel 10 [56, 97]
1
10
Fi*) = -E
= 1 J2j = l(xj ai,j)2 + ci
s.t. 0 < Xi < 10
4 4 4 4 0.1
1 1 1 1 0.2
8 8 8 8 0.2
6 6 6 6 0.4
3 7 3 7 c = 0.4
2 9 2 9 0.6
5 5 3 3 0.3
8 1 8 1 0.7
6 2 6 2 0.5
7 3.6 7 3.6 0.5
77
x* = (4.00075,4.00059,3.99966,3.99951), /(x*)=-10.4029.
Test Problem 25 Colville4 [37, 39]
F(x) = 100(x2 xz)2 + (1 xi)2 + 90(x4 Xg)2 + (1 x3)2
+ 10.1((x2 l)2 + (4 l)2) + 19.8(^2 1)(4 1)
s.t. 10 < Xi < 10
x* = (l,l,l,l), f(x*) = 0.
Colville is a variation on the standard Rosenbrock function but
turned out to be similarly difficult. This function has a banana-shaped
valley in 4 dimensions as well as a dense 4x4 Hessian as opposed to the
sparse Hessian of the 4 dimensional Rosenbrock function.
Test Problem 26 Hartman 6 [56, 97]
4 6
F(x) = -Â£ Ci exp I] J KXj Phi )2
i=l 1 = 1
s.t. 0 < x 8 < 1
0.1312 0.1696 0.5569 0.0124 0.8282 0.5886
0.2329 0.4135 0.8307 0.3736 0.1004 0.9991
0.2348 0.1451 0.3522 0.2883 0.3047 0.6650
0.4047 0.8828 0.8732 0.5732 0.1091 0.0381
10 17 3.5 1.7 8 1
0.05 10 17 0.1 8 14 1.2
a = 3 3.5 1.7 10 17 8 c = 3
17 8 0.05 10 0.1 14 3.2
x* = (0.201690,0.150011,0.476874,0.275332,0.311652,0.657300)
f(x*) = -3.32237.
78
Test Problem 27 Sum Squares4 [37, 39]
n
F(x = ^ i x2
8 = 1
s.t. 10 < Xi < 10
x* = 0, f(x*) = 0.
Test Problem 28 Powells function [81]
F(x) = J2jLi (x4j-3 + 10x4j_2)2 + 5(x4j_i X4j)2
+ (4j-2 2x4j_i)4 + 10(x4j_3 X4j)4
s.t. 4 < Xi < 5
X0 = (3, 1,0,1,3, 1,0,1,..., 3,-1, 0,1)T, f(x*)
0.
Test Problem 29 [17]
x
*
l1
F(x) =
s.t.
F(x) = ~ Xi-i)2 + (xi l)2
8 = 2
s.t. 10 < Xi < 10
1 1 1 _____I_____f(r*) 0
21 /2 5 23/4 5 27/8 !> 2(2n~1 -1)/2n~1 J J \ 'V ) U'
Test Problem 30 Goldstein-Price [56, 97]
[1 + (x\ + x2 + l)2 (19 14a: i + 3x2 14x2 6xix2 + &x22)]
*[20 + (2xi 3x2)2(18 32x4 + 12x2 + 48x2 36ai 1 ai2 + 27^2)]
-2 < xi < 2
x* = (0,-l) f(x*) = 3.
Test Problem 31 Griewank 10 [56, 97]
F(x)
s.t.
600 < Xi < 600
+ 1
x* = 0, f(x*) = 0.
79
Griewank 2 and Griewank 10 are both very difficult problems for
local optimizers since they have a large number of local optima due to
the products of the the cosines. However, they are not very difficult for
an interval optimizer since they are raised slightly by a quadratic. This
allows large regions to be quickly eliminated with each improvement in
the present estimate of the global optima.
Test Problem 32 Schwefel 3.1 [56, 107]
F(x) = J2 [(g g2)2 + (g -1)2]
8 = 1
s.t. 10 < Xi < 10
a:* = (1,1,1), f(x*) = 0.
Test Problem 33 Kowalik [56, 107]
F(x)
s.t.
a
y (a-- Xl ^ + h%X2__]
ti\l % + btx3 + xJ
0 < xi < 0.42
0.1957 4
0.1947 2
0.1735 1
0.1600 1/2
0.844 1/4
0.627 b = 1/6
0.456 1/8
0.342 1/10
0.323 1/12
0.235 1/14
0.246 1/16
80
X
* = (0.192833,0.190836,0.123117,0.135766) /(x*) = 3.07486104.
Test Problem 34 Holzmann [37, 39, 40]
99
F(x) = Y.fKx)
8 = 1
I (i X2)X?-> |
fi(x) = -O.D + eV*1 j
Ui = 25 + (50 log(0.01z))2/3
s.t. .1 < xi < 100
0 < x2 < 25.6
0 < x3 < 5
x* = (50,25,1.5), f(x*) = 0.
Holzman may not appear to be difficult at first glance, but is
difficult if only because each evaluation of the function requires a large
amount of computer time. It was difficult to solve the problem due to
this restriction but, in some communication with Dr. Bennet Fox, it was
suggested that one could evaluate only the first few terms and approximate
the remainder with a Markov chain. As the size of the box became smaller,
one would need a better approximation in order to eliminate a region, but
this could be a useful direction of attack on this problem.
Test Problem 35 Holzmann [37, 39, 40]
n
x) =
8 = 1
s.t. 10 < Xi < 10
X* = 0, f(x*) = 0.
This problem is trivial to solve for almost any gradient based
method. However, the Hessian is identically 0 at the solution and this
81
should cause problem for branch and bound methods that rely on gradient
or hessian information to quickly eliminate large regions.
82
5. Results
5.1 Implementation
All of the code for this dissertation was coded in C-|f and com-
piled using the GNU C/C-|f compiler version 2.7.2. The programs were
run on an IBM RS6000 running AIX version 3. The algorithm was termi-
nated either when the desired tolerance was reached for every box or when
4 hours of CPU time had expired. If less than 4 hours of CPU time was
used, then the research executed 10 runs of the algorithm and presents
the average run time in this paper. The unit time for this machine as
computed by 1000 executions of the Shekel 5 function is 0.03 seconds.
The code will be made available at
ftp://interval.usl.edu/pub/interval_math/BackBox. The Back-Boxing soft-
ware will be updated to include more of the techniques proposed by Hansen
(see section 2.5.1) as well as will as support for general constraints rather
than just bounded variables.
At the time this research began, no solid public interval domain
package was available. For this reason, an interval package was developed
expressly for this purpose. It is easily ported to any IEEE machine and
requires the modification of only three small functions. The author will
aid in the porting upon request if access to the platform is available. A
new package for interval arithmetic called BIAS (Basic Interval Arithmetic
Software) is available at http://www.ti3.tu-harburg.de/indexEnglisch.html.
Algorithm 11 is used for the local optimizer. The solution of
the Newtons equations was implemented using a truncated Conjugate
Gradient method. The line search was then implemented by checking if
a = 1 is acceptable. If it is not, then a cubic is fitted using (0),
(f)(1), and
acceptable, then the algorithm half steps back until an acceptable step
size is found.
The standard branch and bound algorithm that was used for
comparison used 0th and lsf order conditions for elimination of a box and
used a local optimizer when the function value at the center was close to
the best function value already found. This algorithm was then modified
to implement Back-Boxing on every box that was large enough and had
a function value that was close enough to the best function value already
seen.
Early in the research, many methods were explored for storing
the list of unchecked boxes. They were stored in a first in, first out method
(corresponds to a depth first search), first in last out (corresponds to a
breadth first search), sorted by increasing lower bounds, sorted by in-
creasing upper bounds, sorted by increasing volume, sorted by decreasing
volume, sorted by decreasing upper bound, and sorted by decreasing lower
bound. Although each of these sorting methods may work best for a par-
ticular problem type, this research found that sorting the list according to
increasing lower bound provided the best results in general.
84
All derivatives, both real and interval, are computed using re-
verse automatic differentiation [55]. This lead to a great savings of exe-
cution time when the number of variables was large, but it also required
more memory than explicitly coding the derivatives. This did not turn
out to be a significant problem as the amount of memory required was
small. The automatic differentiation was accomplished by using operator
overloading in C-||-. This overloading also imposed overhead above and
beyond the overhead for interval arithmetic. On average, each function
took approximately 3 times as long to evaluate due to the overhead of
automatic differentiation. However, this was made up in the time savings
in computing derivatives as well as the time that would be required to
debug the derivatives that would need to be entered in to the computer.
The author realizes that this implementation of a standard branch
and bound algorithm is by no means optimal. However, conclusions can
be drawn from this implementation in terms of the general behavior of the
algorithm.
Greenberg [30] states
The reasons to perform computational testing in the context of con-
ducting and reporting research are to demonstrate:
correctness of model or algorithm
quality of solution
speed of computation
robustness
These are, of course, not exhaustive, but they are enough to serve our
purposes.
This algorithm is proven [35] to converge to the correct set of
solutions to the GOP. Hence, correctness of the algorithm is not in ques-
tion as it would be if this algorithm were a local optimization procedure.
85
Similarly, the quality of the solution is not in question since the algorithms
are run until they achieve a predetermined accuracy which must contain
the global optima. That leaves the questions of speed of computation, ro-
bustness, and whether my code implements the algorithm presented here
that must be answered.
Verifying that the software written implements the algorithm
presented here is a daunting task and is not possible for very large pro-
grams. In order to attempt to verify that this algorithm achieves its stated
goals of globally optimizing a function, the algorithm was executed on all
the functions presented here and was able to both find and verify each
of the global optima for each of the presented functions. We believe that
this gives a strong case for believing that the software written achieves the
goal of finding all global optima for a bound constrained function.
To this end, this research has assembled a set of test problems
that are widely accepted as being difficult for global optimization algo-
rithms to solve. This was determined in several ways. First a thorough
review of the literature was performed and a set of problems that are used
to test other global optimization algorithms was uncovered.
Second, the Frequently Asked Questions (FAQ) [90] about non-
linear optimization was consulted. This document is posted monthly on
the Usenet Newsgroup sci.op-research as well is available on the WWW at
http://www.skypoint.com/subscribers/ashbury/nonlinear-programming-faq.html.
This resulted in locating a few more articles and books with global opti-
mization problems as well as a few pieces of source code that are available
on the internet.
86
Finally, this research consulted the Usenet newsgroup sci.op-
research and put out a request for difficult global optimization problems.
This request was met by several recommendations which duplicated many
of the problems found in the first and second portions of this search. This
set of problems listed in section 4 was then used to evaluate the per-
formance of the algorithm at a variety of numbers of variables and with
several different tolerance levels.
Late in this research, several libraries have started to become
available as global optimization test suites. The first is CUTE (Con-
strained and Unconstrained Testing Environment) which is a set of FOR-
TRAN subroutines, system tools and test problems in the area of nonlin-
ear optimization and nonlinear equations. The package may be obtained
via anonymous FTP at camelot.cc.rl.ac.uk (Internet 130.246.8.61), in the
directory pub/cute, or at thales.math.fundp.ac.be (Internet 138.48.4.14)
in directory cute. Argonne National Laboratories has also released the
MinPack-2 Test Problem Collection [3]. This is a set of actual nonlinear
optimization problems that have arisen in applied mathematics settings.
These problems involve small dimensional application problems with in-
teresting features that make them difficult to solve and so are useful to
test the robustness of a global optimization algorithm. They are not large
scale, but they can be computationally intensive (some require solving
ODEs or PDEs via numerical techniques) and are often difficult to par-
allelize. The source code for both the function value as well as the gradient
is available in FORTRAN. This research did not use the problems from
CUTE or from the MinPack-2 test set because it found them late in the
87
research and was not able to easily translate them into usable C-|f code.
5.2 Numerical Results
This research applied the Back-Boxing algorithm to most of the
problems in the test suite for various ranges of accuracy. The results of
these executions can be found in Appendix A. Some of these results are
analyzed here. This research also compares the results obtained here with
those from Ratz [84] and those of Hansen [35].
For the standard form of Rosenbrocks function, when the num-
ber of variables is increased, the execution time of the standard branch
and bound algorithm and the Back-Boxing algorithm increase. Also, the
execution time for the standard branch and bound algorithm appears to
increase exponentially for the higher levels of accuracy. On the other hand,
for the Back-Boxing algorithm, the execution time is larger for the lower
accuracy but is independent of the accuracy. This is because the global
optimum is surrounded by a box of size (approximately) 0.004 and this
box has its global local optimum verified.
The 6-hump camel back function again exhibited the standard
behaviour for the two algorithms. The standard branch and bound al-
gorithm increased its execution time as the tolerance decreased while the
time for Back-Boxing remained constant.
The Shekel series all performed similarly. The Back-Boxing algo-
rithm did not significantly decrease the amount of time required, even for
the 10_15case. However, the time required for the Back-Boxing algorithm
is fairly constant for tolerances lower than 10-5.
The Levy problems were quite interesting. Due to the extremely
large number of local minimum (approximately 700), Back-Boxing was not
expected to perform well. It still turned in a good performance, matching
a standard branch and bound for low tolerances and it performed signif-
icantly better for the higher tolerances. The quadratic nature of Levy-5
was brought out in the faster solution times when compared to Levy-3
The Sum Squares algorithm is the exact sort of problem for which
one would expect Back-Boxing to excel. That is exactly what happened.
The Back-Boxing algorithm was able to identify the entire region as convex
and immediately dismiss it. It should be noted that the standard branch
and bound algorithm could have been just as fast if a good partitioning
scheme was used. However, a better partitioning scheme is the point of
Back-Boxing. Also, the addition of a second order test would not have
significantly improved the performance of standard branch and bound
since it was slow due to the exponential number of boxes that must be
checked.
Historically, the second order information has been used only to
attempt to eliminate regions where the function is not convex rather than
identify regions where it is convex and leave those to an easier algorithm
which is what the Back-Boxing procedure does. That is where the advan-
tage of Back-Boxing appears.
The Beale function was mostly uninteresting since, again, Back-
Boxing had constant times for higher tolerances while standard branch
and bound increased its time as the tolerance decreased. Similarly the
Booth function performed exactly as was expected. Since the function
is quadratic, the Back-Boxing algorithm was able to quickly identify this
89
fact, and the entire algorithm terminated with the global optimum the
local optimizer had found.
Kowalick was an especially difficult problem for both problems
and was not solvable to even 10-5 and so no data is provided here. The
research feels that this would be a good test problem for all global opti-
mization algorithms due to its simple nature as well as its difficulty.
Matyas is yet another problem that should be easy for the Back-
Boxing algorithm to solve. Indeed it was but, surprisingly, it was quite
difficult for the standard branch and bound algorithm. The branch and
bound algorithm had problems with it due to the large number of local op-
timization procedures that are applied. They are applied so often because
the function is very flat close the line x\ = x2. Since the function values
there are so close to the values for the global best known function value, it
continued to apply the local optimizer, even though it had already found
the global optimum with the first application.
The Schweffel32 problem was interesting since the time required
for Back-Boxing increased with each decrement in the tolerance. The
research is unable to provide a good explanation for this behaviour except
that the function may simply be too flat at the global optimum to identify
a region of convexity.
Branin performed well for Back-Boxing as well in spite of having
a large number of local optima. The time for Back-Boxing was mostly
constant while the standard branch and bound increased quickly.
Griewanks second problem was quite surprising. It was solved
90
for the same amount of time for each of the tolerances for the Back-
Boxing algorithm. However, the algorithm (Back-Boxing and the branch
and bound) found the global minimum on the first iteration by chance
(designr) and then the Back-Boxing algorithm set the region around the
global minimum aside for later processing. The surrounding area was the
sum of a quadratic and a cosine which was quickly and easily eliminated.
Similarly, the branch and bound function eliminated much of the region
around the global optima but then spent the remainder of its time trying
to resolve the region that immediately surrounded the global optima.
However, when the number of variables was increased to 10,
Back-Boxing out performed standard branch and bound by a large margin.
Back-Boxing again was able to eliminate large regions due to a more in-
telligent partitioning while standard branch and bound took an extremely
long time due to partitioning boxing exactly at the global optimum. The
research also tried this problem with the box [-600,6000]. Each method
took approximately 18 seconds for 10_15due to standard branch and bound
not partitioning exactly or very close to the global optimum.
The research expected the Back-Boxing algorithm to perform
better on the Hartman 6 problem since the function appears to be quite
well behaved. Again, the time for Back-Boxing was constant for the tol-
erances tested and standard branch and bound increased quite rapidly.
Exp2 is one of the problems on which Back-Boxing performed
well. It was faster for each of the tolerances tested. Back-Boxing required
more time as the size of the problem increased just as the non-Back-Boxing
algorithm did but the increase was not as large.
91
Treccani is a well behaved quartic and Back-Boxing performed
well on this one also. The time for Back-Boxing was constant over the
tested tolerances while standard branch and bound increased rapidly.
The 3-hump camel back problem was perfect for Back-Boxing.
The execution time was constant and was significantly faster than the
branch and bound algorithm for all the tolerances tested. However, the
problem is quite well behaved, has only two variables, and is quadratic in
one of the variables. This is the type of problem for which Back-Boxing
was designed. It did just as well as could be expected.
Branin 2 was another quartic polynomial and was quite difficult
for both algorithms. Both solved the problem very quickly for low toler-
ances but, as the accuracy increased, each one required huge amounts of
CPU time to solve the problem. In fact, neither was able to solve it to a
tolerance of 10-15 within the 4 hour limit. It should be noted that both
had located each global optimum and that the remaining time was spent
in verifying to the required tolerance.
Chichinadze was another problem that displayed the standard
behavior of the Back-Boxing algorithm. Back-Boxing was slower for 10-5
but was equal to or faster than branch and bound for 10_loand 10-15.
Again, this is due to the fact that it is able to identify a large region (or
regions) in which the function is convex and then ignore those until later
in the algorithm.
Price was a bit of a surprise. It is a 6th order polynomial, but
Back-Boxing still performed well. Back-Boxing was equally fast for 10-5
and significantly faster for 10-1 and 10-15 The number of Back-Boxing
92
*
* |