A NEW PARADIGM FOR ROBUST COMBINATORIAL OPTIMIZATION
USING PERSISTENCE AS A THEORY OF EVIDENCE
by
Tod Morrison
B.A., Metropolitan State College of Denver, 1995
M.S., University of Colorado Denver, 2007
A thesis submitted to the
University of Colorado Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
2010
This thesis for the Doctor of Philosophy
degree by
Tod Morrison
has been approved
Alexander Engau
Weldon Lodwick
Burt Simon
Date
Morrison, Tod (Ph.D., Applied Mathematics)
A New Paradigm for Robust Combinatorial Optimization:
Using Persistence as a Theory of Evidence
Thesis directed by Professor Harvey J. Greenberg
ABSTRACT
Robust optimization is a name given to a collection of mathematical pro-
gramming techniques for proactively addressing uncertainties in mathematical
optimization problems. This dissertation adds to this collection a new paradigm
for combinatorial optimization by which the persistence of decisions is treated
as evidence of robustness. We begin by introducing necessary terminology and
developing the conceptual foundations for our notion of persistence. We then
review the necessary elements of Dempster-Shafer Theory before presenting the
main thesis: by treating persistence as evidence, we can select solutions based
on their evidential robustness. Finally, we demonstrate the potential of this new
approach with a discussion of some applications.
Central to this paradigm is the notion that by looking at how individual
decisions (selections and rejections) persist across a given set of acceptable
solutions, we can find solutions that are more robust to uncertainties in the
underlying model. This set of solutions may come from any of a number of
sources, such as expert opinion or historical decisions, or may be generated by
algorithmic means. We develop one such algorithmic approach that produces a
rank-ordered list of optimal and near-optimal solutions by a method of successive
exclusions. Once we have a collection of solutions to examine, we can model
decision persistence in a variety of ways: by looking at individual decisions
independently or by looking at sets of decisions and how they arise as
complements or substitutes; and by considering selections and rejections alone or
in combination.
Using the Dempster-Shafer Theory of Evidence, we construct a formal foun-
dation in which we treat persistence as evidence of robustness, which we in turn
use to build belief functions. Then, using these belief functions, we investi-
gate conditioning on prior knowledge, and methods for combining persistence
evidence from multiple sources (or multiple objectives) using various rules of
combination, including an extension to Shafer's discount and combine method.
Further, we investigate relations to Hamming distance and information theory.
Finally, we apply our approach to two applications: the sensor placement prob-
lem, and portfolio tracking.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
ACKNOWLEDGMENT
First and foremost, I want to thank Dr. Harvey Greenberg who, more than
simply being my thesis advisor, has been my mentor and, most importantly,
my friend. Without Harvey's guidance and constant support throughout my
graduate career, this dissertation could not have been completed. Additionally,
I want to thank my committee: Dr. Steve Billups, Dr. Alex Engau, Dr. Bill
Hart, Dr. Weldon Lodwick, and Dr. Burt Simon for their valuable input,
most notably Steve and Bill for their thorough and careful review.
Finally, I would like to thank my parents for their continual support and
encouragement; and Brian Peisley, Ron Nelson, and Andra Ferrara for their
willingness to play sounding board and readiness to nag when necessary.
Tod Morrison
June 2010
CONTENTS
Figures ............................................................. viii
Tables............................................................... ix
Chapter
1. Introduction......................................................... 1
2. Conceptual Foundations of Persistence................................ 6
2.1 Persistence of Selections.......................................... 6
2.2 Persistence of Rejections......................................... 17
2.3 Normalization..................................................... 23
2.4 A Note on the Domain of the Maximum-Persistence Model.......... 30
2.5 Summary........................................................... 32
3. Dempster-Shafer Theory of Evidence.................................. 34
3.1 Belief Functions.................................................. 35
3.1.1 Other Properties of Belief Functions............................ 39
3.2 Dempster's Rule of Combination ................................... 40
3.2.1 Alternate Rules of Combination.................................. 42
3.2.2 Discount & Combine: Combining Highly Conflicting Beliefs ... 45
4. Using Persistence as Evidence....................................... 52
4.1 Building Belief from Persistence of Solutions.................. 52
4.1.1 Making a Partial Decision: Conditional Beliefs.............. 56
4.2 Building Belief from Persistence of Decisions..................... 58
4.2.1 Conditional Beliefs............................................ 63
4.3 Multiple Objectives................................................. 65
4.3.1 Combining Multiple Objectives as Pools of Evidence................ 66
4.3.2 Computing Persistence from Multiple Objectives.................... 71
4.4 Persistence and Generalized Information Theory...................... 75
4.4.1 Hamming Distance.................................................. 75
4.4.2 Entropy........................................................... 79
4.4.3 Evidence Paradox ................................................. 84
5. Applications.......................................................... 87
5.1 Sensor Placement.................................................... 87
5.1.1 Robustness Using Successive Exclusions............................ 92
5.1.1.1 Persistence of Solutions........................................ 96
5.1.1.2 Persistence of Decisions....................................... 100
5.1.2 Multiple Objectives............................................. 107
5.1.2.1 Combining Decision-Sets Belief Functions....................... 107
5.1.2.2 Grouping Objectives............................................ 110
5.2 Portfolio Selection ............................................ 114
5.2.1 Base Portfolio Tracking Model ................................... 115
5.2.2 Persistent Portfolio Tracking Model.............................. 117
6. Concluding Remarks and Avenues for Future Research ............... 123
Notation................................................................ 126
Glossary................................................................ 128
References.............................................................. 132
FIGURES
Figure
1.1 Chronology of stochastic programming models.......................... 2
2.1 Breakpoints for the objective value................................. 11
2.2 Plotting as a function of n......................................... 23
4.1 Question flow charts and information gain........................... 83
5.1 Network flow contamination relative to injection point.............. 88
5.2 Comparing benchmark and tracking portfolio returns................. 117
5.3 Comparing persistent and base-model tracking portfolio returns. . 119
TABLES
Table
2.1 Treasure-room knapsack solution sets................................... 9
2.2 Consensus sets for treasure-room knapsack problem..................... 10
2.3 Treasure-room knapsack solution sets with global persistence values. 14
2.4 Treasure-room solution sets with total global persistence factors. . 15
2.5 Treasure-room solution sets with total adjusted global persistence
factors........................................................... 18
2.6 Total persistence values for example class of knapsack problems. . 20
2.7 Comparative rankings under various models of persistence.............. 22
2.8 Modified treasure-room knapsack solutions (a = c)..................... 31
3.1 Combined belief and plausibility values for the technician example. 45
3.2 Discount & combine belief and plausibility values for the technician
example........................................................... 51
4.1 Singleton-only belief and plausibility values......................... 55
4.2 Singleton-only conditional beliefs.................................... 59
4.3 Decision-sets belief values....................................... 61
4.4 Decision-sets conditional beliefs..................................... 65
4.5 Second-objective treasure-room knapsack problem solution sets. . 68
4.6 Combined singleton-only beliefs for the common solutions.............. 69
4.7 Combined decision-sets belief function values for solution sets union. 70
4.8 Sample bpa and belief values with associated entropy scores........... 86
5.1 Ten sensor placements for minimizing population exposed.......... 92
5.2 Sensor placements for minimizing extent contaminated.............. 94
5.3 Sensor placements for minimizing volume consumed................... 94
5.4 Sensor placements for minimizing failed detections................ 95
5.5 Sensor placements for minimizing time to detection................ 95
5.6 Singleton-only beliefs for minimizing population exposed........... 96
5.7 Singleton-only beliefs for minimizing extent contaminated.......... 97
5.8 Singleton-only beliefs for minimizing volume consumed.............. 97
5.9 Singleton-only beliefs for minimizing failed detections............ 98
5.10 Singleton-only beliefs for minimizing time to detection............ 98
5.11 Example singleton-only conditional beliefs...................... 99
5.12 Decision-sets beliefs for minimizing population exposed.......... 102
5.13 Decision-sets beliefs for minimizing extent contaminated......... 102
5.14 Decision-sets beliefs for minimizing volume consumed............. 103
5.15 Decision-sets beliefs for minimizing failed detections........... 103
5.16 Decision-sets beliefs for minimizing time to detection........... 104
5.17 Example decision-sets conditional beliefs...................... 106
5.18 Combined belief values for PE and EC............................. 108
5.19 Combined belief values for VC and NF............................. 109
5.20 Combined belief values for VC and TD............................. 109
5.21 Combined belief values for NF and TD............................. 110
5.22 Robust sensor placements based on 10 sensors.................... 113
5.23 Comparing solution performance across all objectives............ 114
5.24 Comparing persistent and base-model tracking portfolio selections. 121
1. Introduction
Nearly all decision problems are plagued by some form of uncertainty. In
mathematical models, this uncertainty may appear as erroneous, incomplete, or
noisy data. This presents a number of interesting challenges to mathematical
programming that have been addressed by a variety of techniques, both reactive
and proactive. Reactive techniques, such as sensitivity analysis, are concerned
with understanding how a particular solution changes under small perturbations
in the data. Proactive techniques, instead, attempt to anticipate uncertainties
and provide a solution that is less sensitive to changes in the data.
The mathematical notion of robustness has evolved since its origins in sta-
tistical decision theory [29]. As mathematical programming developed in the
1950s, three paradigms emerged: Mean-risk [23], Recourse [9], and Chance-
constrained [8]. Building on the recourse model, Rosenhead [1968] proposed
taking suboptimal decisions at an early stage in order to gain flexibility at subse-
quent stages. That was a non-mathematical proposal, and not all situations have
opportunities for recourse. One example, which we explore in depth, is assigning
sensors to placements in a water distribution system to detect the injection of
a contaminant. A sensor, once affixed, cannot be moved; hence, there is no
recourse. Another notion of robustness originates with robust statistics, that
is, volatility. This was proposed by Mulvey, Vanderbei, and Zenios [24], using
an objective function that rewarded expected returns and penalized variance.
During the past decade, robust optimization has transformed from a special case
designed to capture the discouragement of catastrophic outcomes of low proba-
bility towards a unifying framework of the classical paradigms (see Kouvelis and
Yu [20] and Ben-Tal and Nemirovski [2, 3]). (Also see Greenberg [13, 14] for
more details and an extensive bibliography, plus Greenberg and Morrison [15]
for a broad range of robustness notions.)
Figure 1.1: Chronology of stochastic programming models.
Common to all these approaches, a robust decision is a policy chosen in the
presence of uncertainty, in which a trade-off is made between maximizing reward
and minimizing risks inherent in its implementation. Robust optimization is a
term given to a broad collection of mathematical programming methods that
use some combination of algorithmic and modeling techniques to find robust
solutions to uncertain optimization problems. This active approach to managing
uncertainty places robust optimization into the category of proactive techniques
for optimization under uncertainty.
This thesis adds to the modeling paradigms in three principal ways:
• Probability need not be the underlying measure of uncertainty.
• Worst-case outcomes could be ruled out separately, and remaining solutions
are robust if they impart some confidence in their implementation.
The greater the confidence, the more robust the solution.
• A decision-maker's confidence grows with greater persistence of solutions.
Our starting point is a given set of acceptable solutions, X^K = {x^1, ..., x^K},
from which we want to infer a robust solution. In particular, X^K can be an
objective-ranked list, though we consider other ways to choose XK. Intuitively,
if an asset is selected not only in an optimal solution but also in all of the
10 best solutions, we think this is a robust selection.
Brown et al. [5] recognized the impact of persistence on confidence in a
computed solution. As one solves a series of scenarios for a model, with modest
changes in the data, one expects to see similar solutions. That is a form of
robustness that we call confidence in the solutions computed for implementation.
They proceed to elaborate on how to take persistence into account for any
mathematical program. The developments in this thesis are consistent with
their goals, but we use a different approach than the elastic constraints given by
Brown et al. [6].
Glover [12] introduced the use of persistence, which he called consistency,
in variable assignment in the context of using the information to construct good
solutions. Adams et al. [1] extended this by proving some properties of persis-
tence, again for the purpose of developing a better solution algorithm. Slaney
and Walsh [28] studied how persistence affects the computational complexity of
combinatorial optimization problems in what they call the backbone.
The move from using persistence for solution computation to its impact on
models with uncertainty came with Walsh [30] and was extended by Manandhar
et al. [22]. Their observations about values persisting under different scenarios
are related to our work, but they did not connect it to robust optimization. As far as
we can tell, the first connection of persistence with robust optimization was by
Bertsimas et al. [4]. They are concerned with finding the persistence of a variable's
value and using that frequency as a probability measure for robustness. Our
work differs from theirs in that we focus on how persistence can be used as
evidence, and we do not limit it to single variables. We develop a new theory
of robustness based on persistence, and we show how it applies to a variety of
problems.
This approach has some boundaries, so it may not be the best choice for
some decision-making situations. Our basic model is combinatorial: select a
subset, whose reward (or cost) is uncertain. We assume there is no recourse
possible. Once a subset is selected, it is implemented, and the decision-makers
have no further control over outcomes (except to cross their fingers). Our focus
is on notions of confidence and how persistence offers evidence that builds confi-
dence. Besides building intuition for this approach, we propose a mathematical
foundation, which we illustrate with familiar applications knapsack problem
and portfolio selection. We also go deeper into the sensor placement problem,
giving numerical results to demonstrate how this approach could be used.
The rest of this thesis is organized as follows. Chapter 2 builds intuition for
laying the formal foundation of persistence, and Chapter 3 presents what we need
from the Dempster-Shafer Theory of Evidence. Then, Chapter 4 presents the
main thesis: persistence is evidence, and we can select solutions based on their
evidential robustness. Chapter 5 goes deeper into some applications to demon-
strate the efficacy of this approach, and Chapter 6 presents conclusions and
avenues for further research. See the Mathematical Programming Glossary [16]
for basic terms; new terms are defined as needed.
2. Conceptual Foundations of Persistence
Most of our decisions are affected by a host of uncertainties. Situations
of incomplete or erroneous data, unclear and possibly conflicting objectives and
goals, and the potential for unforeseen complications in implementation warrant
robust solutions. Robust solutions are designed to mitigate uncertainty, regard-
less of source. In this chapter, we develop the foundations for an approach to
mathematical programming that defines robustness by considering the extent of
agreement among a collection of acceptable solutions (for instance, optimal
and near-optimal).
2.1 Persistence of Selections
We begin our development with a discussion of some basic concepts and
terminology. In order to ground and give context to this discussion, let us
start with a simple example. Suppose, like a character from an Indiana Jones
adventure, we have discovered the location of a legendary treasure room. Of
course, like any good treasure room, this one is largely inaccessible and protected
by all kinds of traps and pitfalls, so we will take only what we can fit in one
small rucksack. Given that we will need to quickly decide what to take once we
enter the room, we want to have a definite plan; we need to solve a knapsack
(or perhaps, rucksack) problem. Unfortunately, due to the vagaries of historical
records, we can only speculate on what we will find (some things may be lighter
or heavier, or more or less valuable than expected), so our solution should allow
for some extra flexibility.
Suppose after researching the likely contents of the treasure room our best
guess at their weights and values gives the following knapsack problem:
max cx
s.t. ax ≤ b, (2.1)
     x ∈ {0, 1}^n,
where the values, weights, and bag limit are
c = [44 49 39 43 11 10 10 41 5 25]
a = [47 52 42 39 16 5 13 43 1 30]
b = 107.
Our first step is to generate a collection of acceptable solutions. One
approach is to use a method of successive exclusions to generate a collection
of optimal and near-optimal solutions. We do this by first obtaining an initial
optimal solution, x^1, with objective value z^1. For each x^k, we define the selection
set S^k = {i ∈ X : x_i^k = 1}, where X = {1, ..., n} is the set of decisions
corresponding to the decision variables x_i. Then, starting with k = 1, we
obtain S^{k+1} by solving our problem with the added constraints:

∑_{i ∈ S^j} x_i ≤ |S^j| − 1   for j = 1, ..., k,

where |S^j| is the cardinality of the selection set S^j. We stop when:
• (2.1) becomes infeasible;
• a specified maximum number of solutions is generated; or
• the objective value falls below some threshold, such as z^K < αz^1 for some
α ∈ (0, 1).
Stepping through in this way, we find alternative optima, if any, and continue
generating solutions, allowing the objective value to decrease while ensuring that no
solutions are regenerated. Other stopping rules are possible but, whatever rule
is chosen, the result is X^K = {x^1, ..., x^K}, for some K ≥ 1.
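As an illustrative sketch (not the implementation used in the thesis), the successive-exclusions procedure can be brute-forced for the treasure-room data, since n = 10 gives only 2^10 subsets; items are 0-indexed here, and the helper name is ours:

```python
from itertools import combinations

c = [44, 49, 39, 43, 11, 10, 10, 41, 5, 25]   # item values
a = [47, 52, 42, 39, 16, 5, 13, 43, 1, 30]    # item weights
b = 107                                        # bag limit
n = len(c)

def successive_exclusions(max_solutions=14):
    """Generate optimal and near-optimal selection sets S^1, ..., S^K by
    brute force, adding one exclusion cut per generated solution."""
    excluded = []     # previously generated selection sets S^j
    solutions = []    # (selection set, objective value) pairs
    for _ in range(max_solutions):
        best_set, best_val = None, -1
        for r in range(1, n + 1):
            for S in combinations(range(n), r):
                if sum(a[i] for i in S) > b:
                    continue
                # exclusion cut: sum_{i in S^j} x_i <= |S^j| - 1, i.e. the
                # new solution may not contain any prior set in its entirety
                if any(Sj <= set(S) for Sj in excluded):
                    continue
                val = sum(c[i] for i in S)
                if val > best_val:
                    best_set, best_val = set(S), val
        if best_set is None:        # (2.1) with cuts became infeasible
            break
        excluded.append(best_set)
        solutions.append((best_set, best_val))
    return solutions
```

Run on the data above, the first generated set is {0, 3, 5, 6, 8} (items 1, 4, 6, 7, 9 in the thesis's 1-indexing) with value 112, and the objective sequence is nonincreasing, matching Table 2.1.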
The rate at which the objective value changes as we generate new solutions
is one measure of interest; a flat function implies that there are at least K
alternative optima. We can see the result of applying this process to our treasure-
room knapsack problem in Table 2.1, where the last row shows the total number
of times each object is selected. Intuitively, object 6 seems highly preferred,
while object 10 is never selected.
The consensus set is defined as:

C^k = S^1 ∩ S^2 ∩ ⋯ ∩ S^k. (2.2)
We are interested in how the consensus set changes as alternative solutions
are generated. Table 2.2 shows the consensus set for the best 10 knapsack
solutions. Since the consensus set is defined in terms of set intersections, the set
cannot increase as solutions are added. In fact, for our example, |C^k| = 0 for
k > 7. Moreover, this process is highly order-dependent; in our example, since
z^4 = z^5 = z^6, the fact that |C^4| = 3 is simply an effect of the order in which the
Table 2.1: Treasure-room knapsack solution sets.
k   Objective Value   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
1 112 1 0 0 1 0 1 1 0 1 0
2 110 0 0 0 1 1 1 0 1 1 0
3 109 0 0 0 1 0 1 1 1 1 0
4 108 0 0 1 1 1 1 0 0 1 0
5 108 1 1 0 0 0 1 0 0 1 0
6 108 1 0 0 1 1 1 0 0 0 0
7 107 0 0 1 1 0 1 1 0 1 0
8 107 0 1 0 1 0 0 1 0 1 0
9 107 1 0 0 1 0 1 1 0 0 0
10 107 0 1 0 1 0 1 0 0 1 0
11 106 0 0 1 0 1 1 0 1 1 0
12 105 0 0 1 0 0 1 1 1 1 0
13 105 0 1 0 0 0 1 0 1 1 0
14 105 0 0 0 1 1 1 0 1 0 0
Total 4 4 4 10 5 13 6 6 11 0
solutions are generated. On the other hand, the relative slowness with which the
optimal value decreases suggests that some form of robustness may be gained by
considering near-optimal solutions; for instance, there may be near-optimal
solutions that are less sensitive to variations in model parameters.
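The consensus computation (2.2) is a running intersection; a small sketch (the helper name is ours, and the selection sets are transcribed from Table 2.1 with 1-indexed items) reproduces the |C| row of Table 2.2:

```python
# Table 2.1 selection sets, transcribed with 1-indexed items.
S = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9},    {1, 4, 5, 6},    {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7},    {2, 4, 6, 9},    {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9},    {4, 5, 6, 8},
]

def consensus(sets):
    """Running intersections C^k = S^1 ∩ ... ∩ S^k, per (2.2)."""
    C, out = set(sets[0]), []
    for Sk in sets:
        C &= Sk               # the consensus set can only shrink
        out.append(set(C))
    return out

C = consensus(S)
```

The sizes of the first ten consensus sets come out as 5, 3, 3, 3, 2, 1, 1, 0, 0, 0, with item 6 the last survivor, matching Table 2.2.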
We can address the issue of order-dependence by considering the notion of
breakpoints. Formally, breakpoints are the values k = b_0, b_1, ... in the sequence at
which the objective value decreases (for maximization). For a maximization
problem like the treasure-room knapsack problem, we begin with b_0 = 1 and
recursively define b_i = min{k : z^k < z^{b_{i−1}}}, while adopting the convention that
Table 2.2: Consensus sets for treasure-room knapsack problem.
k     1    2    3    4    5    6    7    8    9    10
z     112  110  109  108  108  108  107  107  107  107
x1    ✓
x4    ✓    ✓    ✓    ✓    *
x6    ✓    ✓    ✓    ✓    ✓    ✓    ✓    *    *
x7    ✓
x9    ✓    ✓    ✓    ✓    ✓
|C|   5    3    3    3    2    1    1    0    0    0
(alternative consensus set elements are marked with asterisks)
b_i = K + 1 if z^k ≥ z^{b_{i−1}} for all k. The recursion stops with m breakpoint regions:

{b_0, ..., b_1 − 1}, {b_1, ..., b_2 − 1}, ..., {b_{m−1}, ..., K},

and m + 1 breakpoints:

b_0 = 1
b_1 = min{k : z^k < z^1}
b_2 = min{k : z^k < z^{b_1}}
  ⋮
b_{m−1} = min{k : z^k = z^K}
b_m = K + 1.
This is illustrated graphically in Figure 2.1.
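The breakpoint recursion can be sketched directly from the objective-value sequence; the helper below and its variable names are ours:

```python
def breakpoints(z):
    """Return the breakpoints [b_0, ..., b_m] (1-indexed) of a
    nonincreasing objective sequence z^1, ..., z^K."""
    K = len(z)
    bp = [1]                              # b_0 = 1
    for k in range(2, K + 1):
        if z[k - 1] < z[bp[-1] - 1]:      # b_i = min{k : z^k < z^{b_{i-1}}}
            bp.append(k)
    bp.append(K + 1)                      # convention: b_m = K + 1
    return bp

# objective values from Table 2.1
z = [112, 110, 109, 108, 108, 108, 107, 107, 107, 107, 106, 105, 105, 105]
bp = breakpoints(z)   # breakpoint regions are {b_{i-1}, ..., b_i - 1}
```

For the treasure-room sequence this yields breakpoints 1, 2, 3, 4, 7, 11, 12, 15, so, for instance, the fourth breakpoint region is {4, 5, 6}, the three tied solutions at z = 108.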
We define persistence as the continued presence of a selection s in the
Figure 2.1: Breakpoints for the objective value.
collection of selection sets. One form, if there are alternative optima, is
the percentage of solutions in the optimality region that contain s, that is
|{k : 1 ≤ k < b_1, x_s^k = 1}| / (b_1 − 1). A stronger form of persistence is a
measure of how long a selection remains in the consensus set. Because this
can depend on the order in which the optimal-set is generated, we modify our
definition of consensus set to only consider breakpoint regions in their entirety.
Formally, we define the multisets:
S̄^i = ⊎_{k = b_{i−1}}^{b_i − 1} S^k   for i = 1, ..., m. (2.3)
Let N(x,X) denote the number of occurrences of element x in multiset X. In
particular, N(s, S̄^1) is the number of optimal solution sets that contain the
selection s. More generally,

N(s, S̄^i) = |{k : s ∈ S^k for b_{i−1} ≤ k < b_i}| (2.4)
gives the number of solution sets in the i-th breakpoint region that contain
selection s.
For example, looking at the knapsack solution sets we see that the first
decision, x_1, is selected twice in the fourth multiset (the first with multiple
solutions): N(1, S̄^4) = 2.
Now define the consensus multisets by their intersection:

C̄^k = S̄^1 ∩ ⋯ ∩ S̄^k   for k = 1, ..., m. (2.5)

The number of occurrences satisfies N(s, C̄^k) = min{N(s, C̄^{k−1}), N(s, S̄^k)}. If
the solution set for each breakpoint region is unique, we have b_i = i + 1 and
C̄^k = C^k.
The local persistence of a selection s is its frequency within a breakpoint
region:

p_i(s) = N(s, S̄^i) / (b_i − b_{i−1}). (2.6)

The numerator is the number of optimal or near-optimal solution sets that
contain s, while the denominator is the number of solution sets in the breakpoint
region, so p_i(s) ∈ [0, 1]. The greater the value of p_i(s), the more persistent
selection s is in the i-th breakpoint region.
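A sketch of the local-persistence computation, reusing the Table 2.1 selection sets (1-indexed) and objective values; the helper name is ours:

```python
# Table 2.1 selection sets (1-indexed items) and objective values.
S = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9},    {1, 4, 5, 6},    {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7},    {2, 4, 6, 9},    {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9},    {4, 5, 6, 8},
]
z = [112, 110, 109, 108, 108, 108, 107, 107, 107, 107, 106, 105, 105, 105]

bp = [1]                                  # breakpoints b_0, ..., b_m
for k in range(2, len(z) + 1):
    if z[k - 1] < z[bp[-1] - 1]:
        bp.append(k)
bp.append(len(z) + 1)

def local_persistence(s, i):
    """p_i(s) = N(s, S̄^i) / (b_i - b_{i-1}), per (2.6)."""
    lo, hi = bp[i - 1], bp[i]             # region {b_{i-1}, ..., b_i - 1}
    return sum(1 for k in range(lo, hi) if s in S[k - 1]) / (hi - lo)
```

In the fourth breakpoint region (the three 108-solutions), item 1 appears in two of the three sets, so p_4(1) = 2/3, while item 6 appears in all three, so p_4(6) = 1.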
Although there may be some value to considering the local persistence of a
selection, particularly within the first breakpoint region when there are many
alternate optima, this can be overly restrictive when the optimal value decreases
relatively slowly. For instance, in our example problem we see a total drop in
objective value of less than 7% between the first and fourteens solutions. In
such cases, the extent of the data uncertainties can be more concerning than a
slight loss in objective value, and the possibility of discovering highly persistent
selections that might not be identified when considering only the first breakpoint
warrants considering a broader notion of persistence.
The global persistence of selection s is its frequency for the horizon:

γ(s) = |{k : s ∈ S^k, 1 ≤ k ≤ K}| / K. (2.7)
In this case, the numerator is the number of optimal and near-optimal solution
sets that contain s across all breakpoints, or equivalently, across all K generated
solutions.
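A minimal sketch of the global-persistence computation over the Table 2.1 selection sets (1-indexed items; the function name is ours) reproduces the totals row:

```python
# Table 2.1 selection sets, transcribed with 1-indexed items.
S = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9},    {1, 4, 5, 6},    {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7},    {2, 4, 6, 9},    {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9},    {4, 5, 6, 8},
]
K = len(S)

def gamma(s):
    """Global persistence (2.7): the fraction of all K solutions whose
    selection set contains s."""
    return sum(1 for Sk in S if s in Sk) / K
```

For instance, γ(6) = 13/14 and γ(10) = 0, and the items of the third solution sum to the 46/14 total persistence shown in Table 2.3.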
To illustrate this, Table 2.3 extends the list of generated solutions from
Table 2.1 by including the global persistence values for each of the items. Considering
these values, we see that x^3 satisfies the knapsack constraint and maximizes
the total persistence, ∑_s γ(s), with a solution value of 109, only slightly
less than the optimal value.
Looking at the global persistence value of an individual selection in isolation
fails to capture interaction effects, so we must extend the notion of persistence
to consider subsets of S. In the context of robust decision-making, we may feel
confident in choosing s if y(s) is relatively large, but we want to have confidence
in the entire set of selections. Two selections could be substitutes, where the
Table 2.3: Treasure-room knapsack solution sets with global persistence values.
k     Objective Value   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   ∑_{s∈S^k} γ(s)
1     112               1  0  0  1  0  1  1  0  1  0     44/14
2     110               0  0  0  1  1  1  0  1  1  0     45/14
3     109               0  0  0  1  0  1  1  1  1  0     46/14
4     108               0  0  1  1  1  1  0  0  1  0     43/14
5     108               1  1  0  0  0  1  0  0  1  0     32/14
6     108               1  0  0  1  1  1  0  0  0  0     32/14
7     107               0  0  1  1  0  1  1  0  1  0     44/14
8     107               0  1  0  1  0  0  1  0  1  0     31/14
9     107               1  0  0  1  0  1  1  0  0  0     33/14
10    107               0  1  0  1  0  1  0  0  1  0     38/14
11    106               0  0  1  0  1  1  0  1  1  0     39/14
12    105               0  0  1  0  0  1  1  1  1  0     40/14
13    105               0  1  0  0  0  1  0  1  1  0     34/14
14    105               0  0  0  1  1  1  0  1  0  0     34/14
γ(s)                    4/14 4/14 4/14 10/14 5/14 13/14 6/14 6/14 11/14 0
inclusion of one excludes the other. Or, they could be complements, where
the inclusion of one includes the other. For instance, in our example solution
list the second and third items both have global persistence values of 4/14 but
are never selected together, which we should consider when selecting based on
persistence.
We address this issue by defining the global selection persistence factor of a
set as:

Γ(S) = |{k : S ⊆ S^k for 1 ≤ k ≤ K}| / K   for S ≠ ∅,

and

Γ(∅) = 0.

In particular, Γ({s}) = γ(s). More generally, we have the following.
Theorem 2.1. Γ(S) ≤ min_{s∈S} γ(s).
Proof: We have

|{k : S ⊆ S^k for 1 ≤ k ≤ K}| ≤ |{k : s ∈ S^k for 1 ≤ k ≤ K}|

for each s ∈ S. Equivalently, Γ(S) ≤ γ(s) for each s ∈ S. □
Theorem 2.1 says that the global selection persistence factor of a subset
cannot exceed the global persistence of any of its members. For example, considering
the selection set S = {4, 5, 6}, we see that S is a subset of S^2, S^4, S^6,
and S^14, giving it a global selection persistence factor Γ(S) = 4/14, while the
global persistence values of the constituent selections are γ(4) = 10/14, γ(5) = 5/14,
and γ(6) = 13/14.
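Both quantities, and the inequality of Theorem 2.1, can be checked directly on the Table 2.1 selection sets (1-indexed items; helper names are ours):

```python
# Table 2.1 selection sets, transcribed with 1-indexed items.
S = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9},    {1, 4, 5, 6},    {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7},    {2, 4, 6, 9},    {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9},    {4, 5, 6, 8},
]
K = len(S)

def gamma(s):
    """Global persistence (2.7) of a single selection."""
    return sum(1 for Sk in S if s in Sk) / K

def Gamma(subset):
    """Global selection persistence factor: fraction of solutions whose
    selection set contains every element of `subset`."""
    if not subset:
        return 0.0
    return sum(1 for Sk in S if subset <= Sk) / K
```

Here Γ({4, 5, 6}) = 4/14, which is indeed no larger than the smallest of γ(4), γ(5), γ(6), and Γ({s}) coincides with γ(s) for singletons.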
Table 2.4: Treasure-room solution sets with total global persistence factors.
k    Objective Value   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   ∑_{S⊆S^k} Γ(S)
1    112               1  0  0  1  0  1  1  0  1  0     134/14
2    110               0  0  0  1  1  1  0  1  1  0     142/14
3    109               0  0  0  1  0  1  1  1  1  0     146/14
4    108               0  0  1  1  1  1  0  0  1  0     130/14
5    108               1  1  0  0  0  1  0  0  1  0     64/14
6    108               1  0  0  1  1  1  0  0  0  0     68/14
7    107               0  0  1  1  0  1  1  0  1  0     138/14
8    107               0  1  0  1  0  0  1  0  1  0     64/14
9    107               1  0  0  1  0  1  1  0  0  0     74/14
10   107               0  1  0  1  0  1  0  0  1  0     86/14
11   106               0  0  1  0  1  1  0  1  1  0     118/14
12   105               0  0  1  0  0  1  1  1  1  0     120/14
13   105               0  1  0  0  0  1  0  1  1  0     74/14
14   105               0  0  0  1  1  1  0  1  0  0     78/14
In Table 2.4, we illustrate an application of global selection persistence fac-
tors, extending Table 2.1 to include with each solution its total global selection
persistence factor, which we compute as the sum of Γ over all subsets of the
corresponding selection set. Because of its ability to capture interactions among
selections, considering the total Γ of solutions can, in some cases, allow us to
distinguish among solutions with similar global persistence values. For instance,
since the first and seventh solutions, x1 and x7, have the same total global persis-
tence (Table 2.3), they could be seen as equally good when we consider decisions
independently. However, the difference in their total global selection persistence
factor (Table 2.4) suggests that x7 is a better solution when interactions among
selections are considered.
In computing the global selection persistence factor, a solution that selects
many elements contributes the same weight to each as one that selects only a
few. As a consequence, T will tend to give more weight to subsets of those
solutions whose selection sets are relatively large. In the knapsack problem this
would tend to favor low-weight items. For instance, x2, which has the
highest item-weight at 52, is selected only by those solutions that select just
four items; hence, feasible selection sets S that include x2 will have low
values of Γ(S) despite its high item-value. By contrast, while x7 has a much
lower item-value relative to its item-weight, sets that include it will have higher
Γ values. To address this situation, we propose an adjusted global selection
persistence factor that normalizes the weight contributed by each solution.
One approach to normalizing the contribution of a solution is to define an
adjusted global selection persistence factor, so that each solution in our list
contributes an amount relative to the cardinality of its solution set. Thus, we
define the adjusted global selection persistence factor as:

Γ̂(S) = (1/K) ∑_{k : S ⊆ S^k} (2^{|S^k|} − 1)^{−1}   for S ≠ ∅, and Γ̂(∅) = 0.

Note that each solution in our list of acceptable solutions contributes an equal
weight, which is distributed over the nonempty subsets of that solution.
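Under this reading of the formula (each solution's unit weight split over its 2^{|S^k|} − 1 nonempty subsets), a sketch over the Table 2.1 selection sets (1-indexed items; helper names are ours) reproduces the adjusted totals of Table 2.5:

```python
from itertools import combinations

# Table 2.1 selection sets, transcribed with 1-indexed items.
S = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9},    {1, 4, 5, 6},    {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7},    {2, 4, 6, 9},    {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9},    {4, 5, 6, 8},
]
K = len(S)

def Gamma_hat(subset):
    """Adjusted factor: each containing solution contributes a weight of
    1 / (2^{|S^k|} - 1), its unit weight spread over its nonempty subsets."""
    if not subset:
        return 0.0
    return sum(1 / (2 ** len(Sk) - 1) for Sk in S if subset <= Sk) / K

def total_adjusted(selection_set):
    """Sum of Gamma_hat over all nonempty subsets of a selection set."""
    items = sorted(selection_set)
    return sum(Gamma_hat(set(T))
               for r in range(1, len(items) + 1)
               for T in combinations(items, r))
```

Evaluating the totals for the first two solutions gives 0.42919 and 0.43779 (to five places), agreeing with the last column of Table 2.5.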
We contrast this normalized form with the unnormalized global selection
persistence factor in Table 2.5. The effect of the adjustment can be seen when
comparing the second and third solutions: using Γ the third solution scores
higher, while with Γ̂ the second scores higher. We can explain this, in part,
by the increased contribution of the fourteenth solution, which selects only four
items that form a proper subset of the items selected by the second solution.
2.2 Persistence of Rejections
As we have seen, defining persistence in terms of selections alone causes the
global persistence to be greater for those solutions whose selection sets are rela-
tively large. In the knapsack problem this favors low-weight items and does not
account for the information provided by considering when items are rejected.
For our example, item 1 is selected only four times; another perspective is that
item 1 is rejected 10 times. The rejections offer information that is complemen-
tary to selections. This observation offers an alternative to using Γ̂ for dealing
with the issues raised when the number of selections made varies between solu-
tions. Instead, we address the problem of using selections alone by accounting
Table 2.5: Treasure-room solution sets with total adjusted global persistence
factors.

  k  Objective   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   Σ_{S⊆S_k} Γ(S)   Σ_{S⊆S_k} Γ̂(S)
  1    112        1  0  0  1  0  1  1  0  1  0       134/14          0.42919
  2    110        0  0  0  1  1  1  0  1  1  0       142/14          0.43779
  3    109        0  0  0  1  0  1  1  1  1  0       146/14          0.43717
  4    108        0  0  1  1  1  1  0  0  1  0       130/14          0.38065
  5    108        1  1  0  0  0  1  0  0  1  0        64/14          0.24332
  6    108        1  0  0  1  1  1  0  0  0  0        68/14          0.24762
  7    107        0  0  1  1  0  1  1  0  1  0       138/14          0.39908
  8    107        0  1  0  1  0  0  1  0  1  0        64/14          0.22857
  9    107        1  0  0  1  0  1  1  0  0  0        74/14          0.25653
 10    107        0  1  0  1  0  1  0  0  1  0        86/14          0.30876
 11    106        0  0  1  0  1  1  0  1  1  0       118/14          0.33333
 12    105        0  0  1  0  0  1  1  1  1  0       120/14          0.33303
 13    105        0  1  0  0  0  1  0  1  1  0        74/14          0.26144
 14    105        0  0  0  1  1  1  0  1  0  0        78/14          0.26083
for both rejections and selections in determining the overall persistence.
We could account for rejections by simply re-modeling with complementary
decision variables, x' = 1 - x, and considering the persistence of selections
for the complementary variables. However, despite the dependence, accounting
for both selections and rejections in one framework does change the effect of
final decision-making and the confidence we have in the result. An analogy,
albeit imperfect, is grading by subtracting the number of wrong answers from
the number of right ones, rather than just counting the percent of right ones.
Another view is the statistical concept of hypothesis testing. Although there
may not be enough selections to make a confident decision to select an asset,
there could be enough rejections to make a confident decision to reject it; there
could also be insufficient persistence of either decision, thus giving little or no
confidence to either selection or rejection. More to the point, the consequences
of a selection are different from those of a rejection. For example, rejecting an
asset may cost a little due to its being better (in terms of lost opportunity) than
what we finally choose, but selecting an asset may incur a significant downside.
Let R_k = ¬S_k, the rejection set, be the complement of S_k, and define the
global persistence of rejection:

\[
\bar\gamma(r) = \frac{\lvert\{k : r\in R_k \text{ for } 1\le k\le K\}\rvert}{K}, \tag{2.8}
\]

and the global rejection persistence factor:

\[
\bar\Gamma(R) = \frac{\lvert\{k : R\subseteq R_k \text{ for } 1\le k\le K\}\rvert}{K}
\quad\text{for } R\neq\emptyset \text{ and } \bar\Gamma(\emptyset)=0. \tag{2.9}
\]

Note that Γ̄({r}) = γ̄(r) and, by analogy, Theorem 2.1 extends as:

\[
\bar\Gamma(R) \le \min_{r\in R}\bar\gamma(r).
\]

We can combine the information provided by considering the persistence of
both selections and rejections in several ways. For now, consider their simple
sum, the combined global persistence factor:

\[
\Psi(S,R) = \Gamma(S) + \bar\Gamma(R) \quad\text{for } S\cap R=\emptyset. \tag{2.10}
\]

In particular, Ψ(S, ∅) = Γ(S) and Ψ(∅, R) = Γ̄(R).
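As an illustration (a sketch, not from the original text), the rejection-side quantities and the combined factor can be computed from the same Table 2.5 solution list by complementing each selection set against the ground set X = {1, ..., 10}:

```python
# Selection sets of the 14 treasure-room solutions (Table 2.5).
SOLUTIONS = [
    {1, 4, 6, 7, 9}, {4, 5, 6, 8, 9}, {4, 6, 7, 8, 9}, {3, 4, 5, 6, 9},
    {1, 2, 6, 9}, {1, 4, 5, 6}, {3, 4, 6, 7, 9}, {2, 4, 7, 9},
    {1, 4, 6, 7}, {2, 4, 6, 9}, {3, 5, 6, 8, 9}, {3, 6, 7, 8, 9},
    {2, 6, 8, 9}, {4, 5, 6, 8},
]
X = set(range(1, 11))
K = len(SOLUTIONS)
REJECTIONS = [X - Sk for Sk in SOLUTIONS]   # R_k = complement of S_k

def Gamma(S):
    """Global selection persistence factor."""
    return sum(1 for Sk in SOLUTIONS if S <= Sk) / K if S else 0.0

def Gamma_bar(R):
    """Global rejection persistence factor (2.9)."""
    return sum(1 for Rk in REJECTIONS if R <= Rk) / K if R else 0.0

def gamma_bar(r):
    """Global persistence of rejection (2.8); equals Gamma_bar({r})."""
    return sum(1 for Rk in REJECTIONS if r in Rk) / K

def Psi(S, R):
    """Combined global persistence factor (2.10), for disjoint S and R."""
    assert not (S & R)
    return Gamma(S) + Gamma_bar(R)

print(gamma_bar(1))                       # item 1 is rejected 10 times: 10/14
print(Gamma_bar({1, 2}))                  # 7/14, at most min(gamma_bar(1), gamma_bar(2))
print(Psi(set(), {1}) == gamma_bar(1))    # True: Psi(0, R) = Gamma_bar(R)
```

The second print also illustrates the extension of Theorem 2.1: the factor of a rejection set never exceeds the persistence of its least-persistent member.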
Table 2.6: Total persistence values for example class of knapsack problems.

  k          γ^k               γ̄^k          Γ^k                       Γ̄^k                  Ψ^k
  1          1/(n+1)           2(n-1)/(n+1)  1/(n+1)                   (2^{n-1}+n-2)/(n+1)   (3·2^{n-1}+2n-4)/(n+1)
  2          (n-1)²/(n+1)      n/(n+1)       ((n+1)2^{n-2}-n)/(n+1)    n/(n+1)               ((2n+1)2^{n-1}-2n)/(n+1)
  3,...,n+1  (n-2)(n-1)/(n+1)  (n+2)/(n+1)   ((n+2)2^{n-3}-n)/(n+1)    (n+3)/(n+1)           ((3n+7)2^{n-2}-4n)/(n+1)
To illustrate these extensions, consider the class of knapsack problems for
n > 2 given by:

\[
\begin{aligned}
\max\;& n x_1 + \sum_{j=2}^{n} x_j\\
\text{s.t.}\;& (n-1)\,x_1 + \sum_{j=2}^{n} x_j \le n-1,\\
& x \in \{0,1\}^n.
\end{aligned}\tag{2.11}
\]

The optimal solution is x^1 = e_1 with S_1 = {1}, R_1 = {2, ..., n}, and objective
value n. The second-best solution is x^2 = 1 - e_1 with an objective value of n - 1.
The next n - 1 solutions, x^k for k ∈ {3, ..., n+1}, each have value n - 2 and
are, respectively, x^k = 1 - e_1 - e_{k-1}. Our objective-ranked list has the values
shown in Table 2.6, where
\[
\gamma^k = \sum_{s\in S_k}\gamma(s), \qquad
\bar\gamma^k = \sum_{r\in R_k}\bar\gamma(r),
\]

\[
\begin{aligned}
\Gamma^k = \sum_{S\subseteq S_k}\Gamma(S)
&= \sum_{S\subseteq S_k:|S|=1}\Gamma(S) + \sum_{S\subseteq S_k:|S|\ge 2}\Gamma(S)\\
&= \begin{cases}
\gamma^k + 0 = \dfrac{1}{n+1} & \text{if } k=1;\\[1.5ex]
\gamma^k + \dfrac{1}{n+1}\displaystyle\sum_{i=2}^{n-1}\binom{n-1}{i}(n-i)
  = \dfrac{(n+1)\,2^{n-2}-n}{n+1} & \text{if } k=2;\\[1.5ex]
\gamma^k + \dfrac{1}{n+1}\displaystyle\sum_{i=2}^{n-2}\binom{n-2}{i}(n-i)
  = \dfrac{(n+2)\,2^{n-3}-n}{n+1} & \text{if } 3\le k\le n+1,
\end{cases}
\end{aligned}
\]

\[
\begin{aligned}
\bar\Gamma^k = \sum_{R\subseteq R_k}\bar\Gamma(R)
&= \sum_{R\subseteq R_k:|R|=1}\bar\Gamma(R) + \sum_{R\subseteq R_k:|R|\ge 2}\bar\Gamma(R)\\
&= \begin{cases}
\bar\gamma^k + \dfrac{2^{n-1}-1-(n-1)}{n+1} = \dfrac{2^{n-1}+n-2}{n+1} & \text{if } k=1;\\[1.5ex]
\bar\gamma^k + 0 = \dfrac{n}{n+1} & \text{if } k=2;\\[1.5ex]
\bar\gamma^k + \dfrac{1}{n+1} = \dfrac{n+3}{n+1} & \text{if } 3\le k\le n+1,
\end{cases}
\end{aligned}
\]

and

\[
\Psi^k = \sum_{S\subseteq S_k}\sum_{R\subseteq R_k}\Psi(S,R)
= 2^{|R_k|}\sum_{S\subseteq S_k}\Gamma(S) + 2^{|S_k|}\sum_{R\subseteq R_k}\bar\Gamma(R)
= 2^{|R_k|}\,\Gamma^k + 2^{|S_k|}\,\bar\Gamma^k.
\]
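These closed forms can be checked numerically. The sketch below (Python; not part of the thesis) builds the ranked solution list for the class (2.11), evaluates Ψ^k by brute-force subset enumeration, and compares against the Table 2.6 entries for n = 5:

```python
from itertools import combinations

def knapsack_class(n):
    """Ranked solutions of (2.11): x^1 = e_1, x^2 = 1 - e_1, x^k = 1 - e_1 - e_{k-1}."""
    rest = set(range(2, n + 1))
    return [{1}, set(rest)] + [rest - {j} for j in range(2, n + 1)]

def psi_totals(n):
    S = knapsack_class(n)
    X = set(range(1, n + 1))
    R = [X - s for s in S]
    K = len(S)

    def factor(A, family):      # Gamma(A) or Gamma_bar(A) over the given family of sets
        return sum(1 for B in family if A <= B) / K if A else 0.0

    def total(A, family):       # sum of the factor over all nonempty subsets of A
        a = sorted(A)
        return sum(factor(set(c), family)
                   for r in range(1, len(a) + 1) for c in combinations(a, r))

    # Psi^k = 2^|R_k| * Gamma^k + 2^|S_k| * GammaBar^k
    return [2 ** len(R[k]) * total(S[k], S) + 2 ** len(S[k]) * total(R[k], R)
            for k in range(K)]

n = 5
psi = psi_totals(n)
print([round(p * (n + 1)) for p in psi])  # [54, 166, 156, 156, 156, 156], as in Table 2.6
```

Multiplying by n + 1 = 6 recovers the integer numerators of Table 2.6, confirming the case analysis above.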
In Table 2.7, we compare the rankings under various models of persistence of
solutions to our example class of knapsack problems for different values of n > 2.
For n = 4 and n = 5, the optimal solution is most persistent with respect only
to the rankings of {γ̄^k} and {Γ̄^k}. In all three cases, the second-best solution
is the most persistent with respect to the rankings of {γ^k}, {Γ^k}, {γ^k + γ̄^k},
Table 2.7: Comparative rankings under various models of persistence.

n = 3
  k  x^k        γ^k    γ̄^k   Γ^k    Γ̄^k    Ψ^k
  1  1 0 0      1/4    4/4    1/4    5/4    14/4
  2  0 1 1      4/4    3/4    5/4    3/4    22/4
  3  0 0 1      2/4    5/4    2/4    6/4    20/4
  4  0 1 0      2/4    5/4    2/4    6/4    20/4

n = 4
  k  x^k        γ^k    γ̄^k   Γ^k    Γ̄^k    Ψ^k
  1  1 0 0 0    1/5    6/5    1/5    10/5   28/5
  2  0 1 1 1    9/5    4/5    16/5   4/5    64/5
  3  0 0 1 1    6/5    6/5    8/5    7/5    60/5
  4  0 1 0 1    6/5    6/5    8/5    7/5    60/5
  5  0 1 1 0    6/5    6/5    8/5    7/5    60/5

n = 5
  k  x^k        γ^k    γ̄^k   Γ^k    Γ̄^k    Ψ^k
  1  1 0 0 0 0  1/6    8/6    1/6    19/6   54/6
  2  0 1 1 1 1  16/6   5/6    43/6   5/6    166/6
  3  0 0 1 1 1  12/6   7/6    23/6   8/6    156/6
  4  0 1 0 1 1  12/6   7/6    23/6   8/6    156/6
  5  0 1 1 0 1  12/6   7/6    23/6   8/6    156/6
  6  0 1 1 1 0  12/6   7/6    23/6   8/6    156/6
{Γ^k + Γ̄^k}, and {Ψ^k}. The remaining solutions (k ≥ 3) are the second-most
persistent with respect to all rankings.

We further investigate the persistence rankings with respect to {Ψ^k} by
using the formulas from page 20. First, we note that for n > 2, the optimal
solution is always the least persistent (that is, Ψ^1 < Ψ^k for k ≠ 1). We then
investigate the relative rankings of Ψ^2 and Ψ^{3+} (that is, Ψ^k for k ≥ 3) by looking
at their difference,

\[
\Psi^2 - \Psi^{3+}
= \frac{(2n+1)\,2^{n-1} - 2n}{n+1} - \frac{(3n+7)\,2^{n-2} - 4n}{n+1}
= \frac{(n-5)\,2^{n-2} + 2n}{n+1},
\]
where we see that Ψ^2 > Ψ^{3+} for n > 2 (illustrated graphically in Figure 2.2).

Figure 2.2: Plotting Ψ^k as a function of n.
2.3 Normalization

So far in this chapter, we have developed several functions that seek to quan-
tify the persistence of a given decision or solution. However, these functions
may produce values on vastly different scales and are not readily comparable.
Consider, for instance, a situation where the underlying problem is such that
acceptable solutions make only relatively few selections, rejecting most op-
tions. In this case, the total global rejection persistence factor, Σ_{R⊆¬S} Γ̄(R),
will be an order of magnitude larger than the total global selection persistence
factor, Σ_{S'⊆S} Γ(S'). We can address these incompatibilities by normalizing the
functions, rescaling their range with an appropriate divisor to produce values in
the unit interval, [0,1].
Even though any sufficiently large divisor can be used to produce values in
the desired interval, the idea is to find an upper bound on the possible values that
is meaningful, so some consideration of the underlying model should be given.
For instance, recall that for a given decision (that is, a selection, s, or rejection,
r), the global persistence values for selections and rejections are defined as

\[
\gamma(s) = \frac{\lvert\{k : s\in S_k \text{ for } 1\le k\le K\}\rvert}{K}
\quad\text{and}\quad
\bar\gamma(r) = \frac{\lvert\{k : r\in R_k \text{ for } 1\le k\le K\}\rvert}{K},
\]

respectively. We have 0 ≤ γ(s) ≤ 1 and 0 ≤ γ̄(r) ≤ 1, so an easy upper
bound for the total global persistence of selections, Σ_{s:x_s=1} γ(s), or rejections,
Σ_{r:x_r=0} γ̄(r), is simply the number of decision variables, n. Although this bound
is non-arbitrary, it wholly disregards the information contained in the solutions
set and reflects only a very coarse understanding of the underlying problem.
We can find a tighter bound by noting that the total global persistence
of selections would be maximized by selecting everything, if this were feasible,
while the total global persistence of rejections would be maximized by rejecting
all possible choices. Hence, we have

\[
\max_x \sum_{s\,:\,x_s=1}\gamma(s) \;\le\; \sum_{s=1}^{n}\gamma(s)
\;=\; \frac{1}{K}\sum_{k=1}^{K}|S_k| \tag{2.12}
\]

for selections, and

\[
\max_x \sum_{r\,:\,x_r=0}\bar\gamma(r) \;\le\; \sum_{r=1}^{n}\bar\gamma(r)
\;=\; \frac{1}{K}\sum_{k=1}^{K}|R_k| \tag{2.13}
\]

for rejections. Using the bounds from (2.12) and (2.13) as divisors, we define
the normalized global persistence functions for selections and rejections as

\[
\mathcal{N}(s) = \frac{\gamma(s)}{\frac{1}{K}\sum_{k=1}^{K}|S_k|}
\quad\text{and}\quad
\bar{\mathcal{N}}(r) = \frac{\bar\gamma(r)}{\frac{1}{K}\sum_{k=1}^{K}|R_k|}, \tag{2.14}
\]

respectively, which give us the desired property that the total normalized global
persistence values, Σ_{s:x_s=1} 𝒩(s) and Σ_{r:x_r=0} 𝒩̄(r), are appropriately adjusted
to lie in the unit interval.
Considering the global persistence factor functions,

\[
\Gamma(S) = \frac{\lvert\{k : S\subseteq S_k \text{ for } 1\le k\le K\}\rvert}{K}
\quad\text{for } S\neq\emptyset \text{ and } \Gamma(\emptyset)=0,
\]

and

\[
\bar\Gamma(R) = \frac{\lvert\{k : R\subseteq R_k \text{ for } 1\le k\le K\}\rvert}{K}
\quad\text{for } R\neq\emptyset \text{ and } \bar\Gamma(\emptyset)=0,
\]

we have 0 ≤ Γ(S) ≤ 1 and 0 ≤ Γ̄(R) ≤ 1 for arbitrary decision sets S, R ⊆ X.
As noted, though, the total global selection persistence factor, Σ_{S'⊆S} Γ(S'), and
the total global rejection persistence factor, Σ_{R'⊆R} Γ̄(R'), may produce very large
values.

As with the total global persistence functions, we can find a gross upper
bound by simply examining the general form of the total Γ (or total Γ̄) function
without considering the information contained in the solutions set. Note that,
since 0 ≤ Γ(S) ≤ 1 for all S ⊆ X and Γ(∅) = 0, the total Γ for the maximal
selection set, S = X, which sums over all 2^{|X|} = 2^n subsets of X, is never more
than (2^n - 1). By similar reasoning, we get the same bound for the total Γ̄ using
the maximal rejection set, R = X.
Although this bound provides a sufficient divisor for rescaling both total Γ
and total Γ̄ onto the unit interval, it is a single divisor for both, so it does not
fully address the very incompatibilities that we are attempting to adjust. We
can improve on this bound by incorporating information from the solution set,
in which case, given an arbitrary selection set S ⊆ X and rejection set R ⊆ X,
we have

\[
\sum_{S'\subseteq S}\Gamma(S') \;\le\; \frac{1}{K}\sum_{k=1}^{K}\bigl(2^{|S_k|}-1\bigr) \tag{2.15}
\]

for the global selection persistence factor, and

\[
\sum_{R'\subseteq R}\bar\Gamma(R') \;\le\; \frac{1}{K}\sum_{k=1}^{K}\bigl(2^{|R_k|}-1\bigr) \tag{2.16}
\]

for the global rejection persistence factor.

These bounds can be significantly tighter than the gross bound, (2^n - 1).
Consider the knapsack problems from Section 2.2. For rejections we have

\[
\sum_{R'\subseteq R}\bar\Gamma(R')
\;\le\; \frac{1}{n+1}\sum_{k=1}^{n+1}\bigl(2^{|R_k|}-1\bigr)
= \frac{1}{n+1}\Bigl[(2^{n-1}-1) + (2^{1}-1) + \sum_{k=3}^{n+1}(2^{2}-1)\Bigr]
= \frac{2^{n-1} + 3(n-1)}{n+1} \;<\; 2^n - 1.
\]

Thus, for large n, the gross bound will be on the order of (2n + 2) times the
bound given in (2.16).
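A quick numeric check of this ratio for the example class (a Python sketch, not part of the original text):

```python
def class_rejection_bound(n):
    """RHS of (2.16) for the knapsack class: |R_1| = n-1, |R_2| = 1, |R_k| = 2 for k >= 3."""
    sizes = [n - 1, 1] + [2] * (n - 1)
    return sum(2 ** s - 1 for s in sizes) / (n + 1)

n = 20
gross = 2 ** n - 1                    # the problem-independent bound
tight = class_rejection_bound(n)      # = (2^{n-1} + 3(n-1)) / (n+1)
print(round(gross / tight, 1))        # close to 2n + 2 = 42
```

For n = 20 the ratio is already within half a percent of 2n + 2, confirming the asymptotic comparison.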
Using the bounds in (2.15) and (2.16), we define the normalized global persis-
tence factor functions:

\[
\mathcal{L}(S) = \frac{\lvert\{k : S\subseteq S_k \text{ for } 1\le k\le K\}\rvert}{\sum_{k=1}^{K}\bigl(2^{|S_k|}-1\bigr)}
\quad\text{for } S\neq\emptyset \text{ and } \mathcal{L}(\emptyset)=0, \tag{2.17}
\]

and

\[
\bar{\mathcal{L}}(R) = \frac{\lvert\{k : R\subseteq R_k \text{ for } 1\le k\le K\}\rvert}{\sum_{k=1}^{K}\bigl(2^{|R_k|}-1\bigr)}
\quad\text{for } R\neq\emptyset \text{ and } \bar{\mathcal{L}}(\emptyset)=0. \tag{2.18}
\]

We can compare these normalized forms to the inherently normalized adjusted
global selection persistence factor,

\[
\hat\Gamma(S) = \frac{1}{K}\sum_{k\,:\,S\subseteq S_k}\bigl(2^{|S_k|}-1\bigr)^{-1}
\quad\text{for } S\neq\emptyset \text{ and } \hat\Gamma(\emptyset)=0,
\]

which is already normalized, with

\[
\sum_{S'\subseteq S}\hat\Gamma(S') \le 1 \quad\text{for } S\subseteq X,
\]

holding with equality when S = X.

Another normalization is to consider the maximum possible value, taking
into account that there is a limit, possibly due to a budget. For example, consider
the knapsack problem from Section 2.2. Once x_1 = 1, all other variables must
be zero, whereas x_1 = 0 allows all other variables to be 1. We can thus partition
(2.15) as:

\[
\sum_{S'\subseteq S}\Gamma(S') \;\le\;
\begin{cases}
1 & \text{if } 1\in S;\\
2^{n-1}-1 & \text{otherwise.}
\end{cases}
\]

(Similarly for (2.16).)
Using problem-dependent bounds to normalize could raise new issues, espe-
cially as a problem is modified. Hence, we point this out only as an extreme, and
note that the normalization chosen can affect the ranking. For this reason, we
maintain a problem-independent normalization for the remainder of this study.
In deriving an appropriate divisor for the combined global persistence factor,

\[
\Psi(S,R) = \Gamma(S) + \bar\Gamma(R) \quad\text{for } S\cap R=\emptyset,
\]

we can use the bound information we have already found. First, we recall (from
Section 2.2) that the total combined global persistence factor for a solution
V ⊆ X can be written as

\[
\sum_{\substack{S\subseteq V\\ R\subseteq \neg V}}\Psi(S,R)
= \sum_{\substack{S\subseteq V\\ R\subseteq \neg V}}\bigl(\Gamma(S) + \bar\Gamma(R)\bigr)
= 2^{|\neg V|}\sum_{S\subseteq V}\Gamma(S) + 2^{|V|}\sum_{R\subseteq \neg V}\bar\Gamma(R).
\]

Then, using the bounds from (2.15) and (2.16), we have, for any V ⊆ X,

\[
\sum_{\substack{S\subseteq V\\ R\subseteq \neg V}}\Psi(S,R)
\;\le\; \frac{2^{|\neg V|}\sum_{k=1}^{K}\bigl(2^{|S_k|}-1\bigr)
+ 2^{|V|}\sum_{k=1}^{K}\bigl(2^{|R_k|}-1\bigr)}{K}. \tag{2.19}
\]
Finally, we determine an upper bound on the total Ψ for all V ⊆ X by maximiz-
ing the right-hand side of (2.19) with respect to V.

Theorem 2.2. For A, B > 0,

\[
\max_{0\le v\le N}\bigl\{A\,2^{v} + B\,2^{N-v}\bigr\} = 2^{N}\max\{A,B\} + \min\{A,B\}.
\]

Proof: The maximand is convex in v since its second derivative is

\[
(\ln 2)^2\bigl(A\,2^{v} + B\,2^{N-v}\bigr) > 0.
\]

Thus, the maximum occurs at an end point:

\[
\max_{0\le v\le N}\bigl\{A\,2^{v} + B\,2^{N-v}\bigr\}
= \max_{v\in\{0,N\}}\bigl\{A\,2^{v} + B\,2^{N-v}\bigr\}
= \max\bigl\{A\,2^{N} + B,\; A + B\,2^{N}\bigr\}
= 2^{N}\max\{A,B\} + \min\{A,B\}.
\]
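A small numeric check of Theorem 2.2 (an illustrative Python sketch; the theorem's v is continuous, but convexity places the maximum at an end point, so evaluating integer v suffices here):

```python
def end_point_max(A, B, N):
    """Closed form from Theorem 2.2: 2^N * max{A, B} + min{A, B}."""
    return 2 ** N * max(A, B) + min(A, B)

A, B, N = 3.0, 7.0, 10
# brute-force over v = 0, 1, ..., N; the end points v = 0 and v = N dominate
brute = max(A * 2 ** v + B * 2 ** (N - v) for v in range(N + 1))
print(brute, end_point_max(A, B, N))   # both 7171.0
```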
Applying Theorem 2.2 to (2.19), with v = |V| and N = |X|, our upper bound
on the total Ψ is

\[
\max_{V\subseteq X}\;\sum_{\substack{S\subseteq V\\ R\subseteq\neg V}}\Psi(S,R)
= \max_{V\subseteq X}\Bigl\{2^{|\neg V|}\sum_{S\subseteq V}\Gamma(S)
+ 2^{|V|}\sum_{R\subseteq\neg V}\bar\Gamma(R)\Bigr\}
\le \frac{2^{|X|}\max\Bigl\{\sum_{k=1}^{K}\bigl(2^{|S_k|}-1\bigr),\,
\sum_{k=1}^{K}\bigl(2^{|R_k|}-1\bigr)\Bigr\}
+ \min\Bigl\{\sum_{k=1}^{K}\bigl(2^{|S_k|}-1\bigr),\,
\sum_{k=1}^{K}\bigl(2^{|R_k|}-1\bigr)\Bigr\}}{K}.
\]

Moreover, letting D denote this bound, we define the normalized combined
global persistence factor,

\[
\mathcal{M}(S,R) = \frac{\Gamma(S) + \bar\Gamma(R)}{D}
\quad\text{for } S\cap R=\emptyset. \tag{2.20}
\]
In summary, our exploration of normalization concludes that using problem-
independent bounds for scaling offers simplicity and consistency, but tighter
bounds are possible for specific problems.
2.4 A Note on the Domain of the Maximum-Persistence Model

In the treasure-room knapsack problem, maximizing the total global persis-
tence subject to the original knapsack constraint gives a solution with a value
very near the optimal of the original and well within the range of generated
solutions. In general, though, there is no guarantee that there exists an

\[
x^* \in \operatorname*{argmax}_{x\in X}\;\sum_{s}\gamma(s)\,x_s
\]

with cx^* ≥ z^K. We illustrate this by modifying our previous knapsack example
slightly. Setting the weights equal to the values, so that

\[
a = c = [\,44\;\;49\;\;39\;\;43\;\;11\;\;10\;\;10\;\;41\;\;5\;\;25\,],
\]

the method of successive exclusions generates the results shown in Table 2.8.¹
In this case, we maximize global persistence by setting x̂ = [0 0 0 0 1 1 1 1 1 1],
which gives a total persistence value of 57/19, but whose value in the original
problem is only 102, well outside the range of the generated solutions.
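The maximum-persistence selection can be reproduced by brute force. In the sketch below (Python, not part of the thesis), the γ values come from the bottom row of Table 2.8; the knapsack capacity is not restated in this chapter, so b = 107 is an assumption chosen to be consistent with the listed solutions:

```python
from itertools import combinations

GAMMA19 = [5, 6, 6, 7, 9, 11, 11, 12, 10, 4]        # 19 * gamma(s), from Table 2.8
WEIGHTS = [44, 49, 39, 43, 11, 10, 10, 41, 5, 25]   # a = c in the modified problem
CAPACITY = 107                                      # assumed capacity (not restated here)

# Maximize total selection persistence over all feasible 0-1 vectors.
best = max(
    (set(c) for r in range(11) for c in combinations(range(10), r)
     if sum(WEIGHTS[i] for i in c) <= CAPACITY),
    key=lambda s: sum(GAMMA19[i] for i in s),
)
x_hat = [1 if i in best else 0 for i in range(10)]
print(x_hat)                          # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(sum(GAMMA19[i] for i in best))  # 57  -> total persistence 57/19
print(sum(WEIGHTS[i] for i in best))  # 102 -> objective value in the modified problem
```

Under the assumed capacity, the search recovers exactly the x̂ discussed above, whose objective value of 102 falls below every solution in the generated list.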
Table 2.8: Modified treasure-room knapsack solutions (a = c).

  k     Objective   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   Σ_{s∈S_k} γ(s)
  1       107        0  1  0  1  0  1  0  0  1  0       34/19
  2       107        1  0  0  1  0  1  1  0  0  0       34/19
  3       107        0  0  1  1  0  0  0  0  0  1       17/19
  4       107        0  1  0  1  0  0  1  0  1  0       34/19
  5       107        0  0  1  1  0  1  1  0  1  0       45/19
  6       106        0  0  1  0  1  0  1  1  1  0       48/19
  7       106        0  1  0  0  1  0  0  1  1  0       37/19
  8       106        0  0  1  0  1  1  0  1  1  0       48/19
  9       106        1  0  0  0  1  0  1  1  0  0       37/19
 10       106        1  0  0  0  1  1  0  1  0  0       37/19
 11       105        0  1  0  0  0  1  0  1  1  0       39/19
 12       105        0  1  0  0  1  1  1  0  0  1       41/19
 13       105        1  0  0  0  1  1  1  0  1  1       50/19
 14       105        0  1  0  0  0  0  1  1  1  0       39/19
 15       105        0  0  0  1  1  0  1  1  0  0       39/19
 16       105        0  0  0  1  1  1  0  1  0  0       39/19
 17       105        0  0  1  0  0  1  1  1  1  0       50/19
 18       105        0  0  1  0  0  0  0  1  0  1       22/19
 19       105        1  0  0  0  0  1  1  1  0  0       39/19
 γ(s)              5/19 6/19 6/19 7/19 9/19 11/19 11/19 12/19 10/19 4/19
This same result extends to other methods of generating the collection of
acceptable solutions. In general, we impose no restriction on the feasible domain
of the maximum-persistence problem that would restrict the optimal solution
to be in the acceptable solution list beyond the constraints of the underlying
model. Mathematically, this means only that the collection of acceptable solutions
is a subset of the considered decision space, that is, X^K ⊆ X.

¹For this example we set K = 19 in order to capture a complete breakpoint.
2.5 Summary

By considering the nature and extent of agreement among a collection of
acceptable solutions, and using the persistence of decisions, both selections and
rejections, to make those decisions, we gain greater confidence in the resulting
solution. We develop an approach to uncertainty in combinatorial optimization
that defines robustness in terms of maximizing persistence. First, however,
we must determine an appropriate method by which a suitable collection of
solutions will be generated, which will vary by application. Then, we must clarify
what we mean by persistence, which can simply consider individual decisions
independently, or can be extended to consider interactions among subsets of the
decision space.
One approach to generating the required collection of solutions, though not
the only approach, is the method of successive exclusions, which produces
a rank-ordered list of solutions by starting with an initial optimal solution and
generating alternative solutions by excluding those solutions already found and
re-solving our model. A consequence of using this approach is that the solu-
tions are generated in order of decreasing objective value (with respect to the base
model). Hence, by considering the persistence of a decision between successive
breakpoints, where the optimal value changes, we can measure the local per-
sistence of a decision across only those solutions that have the same objective
value. Alternately, when the objective value changes very slowly between break-
points, or when our collection of acceptable solutions is not rank-ordered, we
can consider the global persistence of decisions.
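The method can be sketched with a brute-force stand-in for the re-solve step (a toy Python illustration, not from the thesis; it uses the modified treasure-room data, with the capacity b = 107 assumed since it is not restated in this chapter):

```python
from itertools import combinations

VALUES = WEIGHTS = [44, 49, 39, 43, 11, 10, 10, 41, 5, 25]   # modified problem: a = c
CAPACITY = 107                                               # assumed capacity

def solve(excluded):
    """Stand-in for re-solving the model: best feasible solution not yet excluded."""
    feasible = (frozenset(c) for r in range(11) for c in combinations(range(10), r)
                if sum(WEIGHTS[i] for i in c) <= CAPACITY)
    return max((s for s in feasible if s not in excluded),
               key=lambda s: sum(VALUES[i] for i in s))

def successive_exclusions(K):
    """Rank-ordered list: start from an optimal solution, exclude it, re-solve, repeat."""
    found = []
    while len(found) < K:
        found.append(solve(set(found)))
    return found

ranked = successive_exclusions(6)
values = [sum(VALUES[i] for i in s) for s in ranked]
print(values)   # [107, 107, 107, 107, 107, 106]: the first breakpoint follows five ties
```

The non-increasing value sequence makes the breakpoints visible: the first five solutions tie at 107, matching the top of Table 2.8, after which the objective drops.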
If we wish to extend our notion of persistence to include not only individ-
ual decisions but also the interactions among sets of decisions, we can measure
persistence by the global selection persistence factor. Additionally, we can ei-
ther consider selections or rejections alone, or we can consider both selections
and rejections in a combined measure. Then, because, in general, there is no
reason for all the solutions in our list to select the same number of elements,
we also investigate adjusting our measures of persistence to account for the size
of the selection sets in which a given decision occurs. The result of this in-
vestigation is a normalized measure of persistence, from which follows a deeper
and more general discussion of comparability and normalization across all our
various persistence measures.
3. Dempster-Shafer Theory of Evidence
Dempster-Shafer Theory (DST) is a mathematical theory of evidence that
allows one to quantify the degree to which some source of evidence supports a
particular proposition. It is an alternative to traditional probability theory in
that this support can be allocated to sets or intervals without requiring assump-
tions about the degree of support given to their constituent elements. In the
case where the evidence is sufficient to completely assign support to singleton
sets, the model reduces to the traditional probability model. DST was initially
developed by Shafer in his 1976 book, A Mathematical Theory of Evidence [27],
as an expansion of Dempster's earlier work on the mathematical structure for
upper and lower probabilities [10]. In that discussion, Shafer proposed calling
set functions with the structure of Dempster's lower probabilities "belief func-
tions," and developed the implications of Dempster's rule for combining belief
functions based on different bodies of evidence.
The theory is at once a theory of evidence, using belief functions to indicate
the degree of support provided by a body of evidence, and a theory of probable
reasoning, focusing on the combination of degrees of support stemming
from distinct bodies of evidence. Our discussion begins with an examination
of belief functions, their relatives, plausibility functions, and the mathematical
representation of evidence. Following that we examine the combination of belief
functions, beginning with a discussion of Dempster's rule of combination and its
implications, then with a brief review of some alternative rules of combination,
including an investigation of a novel modification of Shafer's discount & combine
method. Finally, we investigate some criticisms and alternate interpretations of
DST that have arisen as the theory has evolved.
3.1 Belief Functions
A belief function assigns a value, representing the weight of evidence (or
degree of support), to each of the subsets of a universe of possible events. Unlike
traditional probability theory, DST relaxes the additivity axiom, so that beliefs
can be formed from evidence that does not add up in some simple way. Formally,
a belief function is defined as follows:
Definition 3.1. Given a nonempty universal set (or frame of discernment), Ω,
let 𝒫(Ω) denote its power set. A belief function is a function, Bel : 𝒫(Ω) →
[0,1], such that

1. Bel(∅) = 0,
2. Bel(Ω) = 1, and
3. Bel(A ∪ B) ≥ Bel(A) + Bel(B) - Bel(A ∩ B) for A, B ∈ 𝒫(Ω).
The first two conditions are the same axioms as in probability theory. The
third condition is the property of superadditivity, which relaxes the additivity
axiom of probability by allowing the belief allocated to a set to be greater than
the sum of the beliefs allocated to its constituents.
Related to belief functions is another type of function for representing the
plausibility of evidence in DST. Similar to belief functions, plausibility func-
tions relax the additivity axiom of probability, but instead have the property of
subadditivity, which means that the plausibility allocated to a set can be less
than the sum of the plausibilities of its constituents. We formalize this in the
following definition:
Definition 3.2. A plausibility function is a function Plaus : 𝒫(Ω) → [0,1]
such that

1. Plaus(∅) = 0,
2. Plaus(Ω) = 1, and
3. Plaus(A ∩ B) ≤ Plaus(A) + Plaus(B) - Plaus(A ∪ B) for A, B ∈ 𝒫(Ω).
Given a belief function, Bel_m, its companion plausibility function is defined
by the formula

\[
\mathrm{Plaus}_m(A) = 1 - \mathrm{Bel}_m(\neg A). \tag{3.1}
\]

We verify that this function is a plausibility function as defined by Definition 3.2
by confirming that

\[
\begin{aligned}
\mathrm{Plaus}_m(\emptyset) &= 1 - \mathrm{Bel}_m(\Omega) = 0,\\
\mathrm{Plaus}_m(\Omega) &= 1 - \mathrm{Bel}_m(\emptyset) = 1, \text{ and}\\
\mathrm{Plaus}_m(A\cap B) &= 1 - \mathrm{Bel}_m(\neg A\cup\neg B)\\
&\le 1 - \bigl[\mathrm{Bel}_m(\neg A) + \mathrm{Bel}_m(\neg B) - \mathrm{Bel}_m(\neg A\cap\neg B)\bigr]\\
&= 1 - \bigl[(1-\mathrm{Plaus}_m(A)) + (1-\mathrm{Plaus}_m(B)) - (1-\mathrm{Plaus}_m(A\cup B))\bigr]\\
&= \mathrm{Plaus}_m(A) + \mathrm{Plaus}_m(B) - \mathrm{Plaus}_m(A\cup B).
\end{aligned}
\]
Like probability functions, belief functions encode the sum of all support for
any subset of Ω. Unlike probability functions, however, the weakening of the
additivity axiom allows some degree of support to be allocated to the whole of a
compound set without making any additional claims about the support given to
any of its subsets. This behavior becomes explicit when we look at another set
function, related to the belief function, called the basic probability assignment
(bpa).
Definition 3.3. A bpa is a set function m : 𝒫(Ω) → [0,1] with the following
properties:

1. m(∅) = 0,
2. Σ_{A∈𝒫(Ω)} m(A) = 1.
The bpa, m(A), expresses the degree of support given to the set A but to
no particular subset of A. The belief in A is then the sum of all support given
to subsets of A. Thus, given a bpa, m, we can write the belief function based
on m as

\[
\mathrm{Bel}_m(A) = \sum_{B\subseteq A} m(B), \tag{3.2}
\]

and we can write the associated plausibility function as

\[
\mathrm{Plaus}_m(A) = \sum_{B\cap A\neq\emptyset} m(B). \tag{3.3}
\]

Moreover, we can obtain the bpa from a given belief function via the formula:

\[
m(A) = \sum_{B\subseteq A} (-1)^{|A|-|B|}\,\mathrm{Bel}(B), \tag{3.4}
\]

where |A| - |B| gives the difference in cardinality of the two sets, and from a
given plausibility function via the formula:

\[
m(A) = \sum_{B\subseteq A} (-1)^{|A|-|B|}\bigl[1 - \mathrm{Plaus}(\neg B)\bigr]. \tag{3.5}
\]
Using (3.2) and (3.3), and noting that B ⊆ A ⇒ B ∩ A ≠ ∅ for A, B ≠ ∅,
we can readily show another relation between belief functions and plausibility
functions. We have, given a bpa, m,

\[
\mathrm{Bel}_m(A) = \sum_{B\subseteq A} m(B) \;\le\; \sum_{B\cap A\neq\emptyset} m(B) = \mathrm{Plaus}_m(A) \tag{3.6}
\]

for all A ⊆ Ω. In words, belief functions are bounded above by their companion
plausibility functions.
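These relations are mechanical to verify in code. The following Python sketch (not part of the text; focal elements are represented as frozensets over a small hypothetical frame) implements (3.2), (3.3), and the inversion (3.4):

```python
from itertools import chain, combinations

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    s = sorted(universe)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def belief(m, A):
    """Bel_m(A): sum of m(B) over B a subset of A, as in (3.2)."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(m, A):
    """Plaus_m(A): sum of m(B) over B intersecting A, as in (3.3)."""
    return sum(v for B, v in m.items() if B & A)

def bpa_from_belief(bel, universe):
    """Moebius inversion (3.4): m(A) = sum over B subset A of (-1)^(|A|-|B|) Bel(B)."""
    return {A: sum((-1) ** (len(A) - len(B)) * bel(B) for B in powerset(A))
            for A in powerset(universe)}

OMEGA = frozenset({"a", "b", "c"})   # hypothetical frame of discernment
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, OMEGA: 0.2}
A = frozenset({"a", "b"})
print(belief(m, A), plausibility(m, A))   # 0.8 1.0  (Bel <= Plaus, as in (3.6))
recovered = bpa_from_belief(lambda B: belief(m, B), OMEGA)
print(abs(recovered[A] - 0.3) < 1e-12)    # True: the bpa is recovered from Bel
```

Note that the example bpa is non-dogmatic (m(Ω) = 0.2 > 0), and the 0.3 assigned to {a, b} supports that compound set without committing support to {b} alone.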
Sets to which a given bpa assigns support, those sets A for which m(A) > 0,
are called focal elements [19]. We can express total ignorance in terms of the
bpa by setting m(Ω) = 1 and m(A) = 0 for all A ≠ Ω. This reflects that there
is no evidence to support any proper subset of the universal set; the only focal
element is the universal set, Ω. Using (3.2), we see that the corresponding belief
function is Bel(Ω) = 1 and Bel(A) = 0 for all A ≠ Ω, exactly the same as the
bpa. Total ignorance is an extreme example of a class of belief functions, called
non-dogmatic belief functions, whose corresponding bpa is such that m(Ω) > 0.
Some other special classes of belief functions are:

- Dogmatic belief functions, which assign no support to the universal set (that
  is, m(Ω) = 0); and
- Additive belief functions, which assign support only to singletons, so that
  m(A) = 0 for all A ⊆ Ω with |A| > 1.
Additive belief functions are also referred to as Bayesian belief functions
because of their relation to traditional probability functions, which is given in
the following known propositions [19].

Theorem 3.4. A belief function Bel on a finite universe Ω is additive if, and
only if, its corresponding bpa m is such that m({x}) = Bel({x}) and m(A) = 0
for all A ⊆ Ω that are not singletons.

Corollary 3.5. For additive belief functions,

\[
\mathrm{Bel}(A\cup B) = \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A\cap B)
\quad\text{for } A, B\subseteq\Omega.
\]

Corollary 3.6. For additive belief functions, we have Bel(A) = Plaus(A) for
all A ⊆ Ω.
3.1.1 Other Properties of Belief Functions

Let m_1, ..., m_n be bpas with their associated belief functions, Bel_1, ..., Bel_n,
and let

\[
\sum_{i=1}^{n}\alpha_i = 1, \qquad \alpha_i \ge 0.
\]

Then, the weighted sum of belief functions is a belief function:

\[
\sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(\emptyset) = \sum_{i=1}^{n}\alpha_i \times 0 = 0, \tag{3.7a}
\]

\[
\sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(\Omega) = \sum_{i=1}^{n}\alpha_i \times 1 = 1, \tag{3.7b}
\]

\[
\begin{aligned}
\sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(A\cup B)
&\ge \sum_{i=1}^{n}\bigl[\alpha_i\,\mathrm{Bel}_i(A) + \alpha_i\,\mathrm{Bel}_i(B)
- \alpha_i\,\mathrm{Bel}_i(A\cap B)\bigr]\\
&= \sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(A) + \sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(B)
- \sum_{i=1}^{n}\alpha_i\,\mathrm{Bel}_i(A\cap B). \tag{3.7c}
\end{aligned}
\]
3.2 Dempster's Rule of Combination

In general, distinct bodies of evidence over a common frame of discernment
can be combined in a number of ways. Dempster's rule, originally presented by
Dempster [10] and subsequently further developed by Shafer [27], was the first
of these techniques specifically applied to combining belief functions. Given two
independent bodies of evidence expressed by bpas, m_1 and m_2, Dempster's rule
of combination is given by

\[
m_{1,2}(A)
= \frac{\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y)}
       {1 - \displaystyle\sum_{X\cap Y=\emptyset} m_1(X)\,m_2(Y)}
= \frac{\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y)}{1 - c_{1,2}}
\]

for c_{1,2} ≠ 1, where c_{1,2} = Σ_{X∩Y=∅} m_1(X) m_2(Y). (Note, summations over
X, Y are abbreviated to just the condition for notational convenience; for
instance, we write X ∩ Y = A where we want to sum over all X and Y such that
X ∩ Y = A.) Alternatively, because

\[
\sum_{X\in\mathcal{P}(\Omega)}\sum_{Y\in\mathcal{P}(\Omega)} m_1(X)\,m_2(Y)
= \Bigl(\sum_{X\in\mathcal{P}(\Omega)} m_1(X)\Bigr)
  \Bigl(\sum_{Y\in\mathcal{P}(\Omega)} m_2(Y)\Bigr) = 1
\]

implies

\[
1 - c_{1,2} = 1 - \sum_{X\cap Y=\emptyset} m_1(X)\,m_2(Y)
= \sum_{X\cap Y\neq\emptyset} m_1(X)\,m_2(Y),
\]

we can write Dempster's rule as

\[
m_{1,2}(A)
= \frac{\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y)}
       {\displaystyle\sum_{X\cap Y\neq\emptyset} m_1(X)\,m_2(Y)},
\]

with the requirement that Σ_{X∩Y≠∅} m_1(X) m_2(Y) ≠ 0.
Using this expression of Dempster's rule, we have

\[
m_{(1,2),3}(A)
= \frac{\displaystyle\sum_{X\cap Y=A} m_{1,2}(X)\,m_3(Y)}
       {\displaystyle\sum_{X\cap Y\neq\emptyset} m_{1,2}(X)\,m_3(Y)}
= \frac{\displaystyle\sum_{U\cap V\cap Y=A} m_1(U)\,m_2(V)\,m_3(Y)}
       {\displaystyle\sum_{U\cap V\cap Y\neq\emptyset} m_1(U)\,m_2(V)\,m_3(Y)}
= m_{1,(2,3)}(A),
\]

which shows that Dempster's rule is associative. Moreover, since

\[
m_{1,2}(A)
= \frac{\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y)}
       {1-\displaystyle\sum_{X\cap Y=\emptyset} m_1(X)\,m_2(Y)}
= \frac{\displaystyle\sum_{X\cap Y=A} m_2(X)\,m_1(Y)}
       {1-\displaystyle\sum_{X\cap Y=\emptyset} m_2(X)\,m_1(Y)}
= m_{2,1}(A),
\]

Dempster's rule of combination is also commutative. Thus, Dempster's rule is
independent of the order in which evidence is combined. In fact, given J distinct
bpas, m_i : 𝒫(Ω) → [0,1] for i ∈ {1, ..., J}, defined on the same frame of
discernment, Ω, we have

\[
m_{1,\dots,J}(A)
= \frac{\displaystyle\sum_{X_1\cap\cdots\cap X_J=A}\;\prod_{i=1}^{J} m_i(X_i)}
       {\displaystyle\sum_{X_1\cap\cdots\cap X_J\neq\emptyset}\;\prod_{i=1}^{J} m_i(X_i)}.
\]
3.2.1 Alternate Rules of Combination

Dempster's rule can lead to counterintuitive results with highly conflicting
evidence. The value of the term c in the denominator can be seen as a measure
of the conflict. When c = 0, Dempster's rule is well justified, but for c ≠ 0 the
normalization factor, 1 - c, has the effect of attributing any probability mass
associated with conflict to the null set, completely ignoring the conflict.
As an example, consider a situation where two technicians are asked to
diagnose a malfunctioning computer. The first believes that the problem is
likely due to bad RAM, though he considers the possibility of a failing CPU,
and gives the following values: Bel_1(bad RAM) = m_1(bad RAM) = 0.98 and
Bel_1(CPU failure) = m_1(CPU failure) = 0.02. The second suspects a failing
hard drive, but also recognizes a small possibility of a failing CPU, assigning the
values: Bel_2(Drive failure) = m_2(Drive failure) = 0.97 and Bel_2(CPU failure) =
m_2(CPU failure) = 0.03. Using these values, we see that

\[
c = (0.98)(0.97) + (0.98)(0.03) + (0.02)(0.97) = 0.9994,
\]

and thus,

\[
m_{1,2}(\text{CPU failure}) = \frac{(0.02)(0.03)}{1 - 0.9994} = 1,
\]

which implies that Bel_{1,2}(CPU failure) = 1. Dempster's rule of combination
thus assigns complete support to the belief in a failing CPU, a diagnosis neither
technician believes is very likely. This happens because each of the technicians
rules out the preferred alternative of the other.
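The rule and this pathology are easy to reproduce. A minimal Python sketch (not from the text; focal elements as frozensets):

```python
def dempster(m1, m2):
    """Dempster's rule: combine two bpas; returns the combined bpa and the conflict c."""
    combined, conflict = {}, 0.0
    for X, p in m1.items():
        for Y, q in m2.items():
            inter = X & Y
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q          # mass on empty intersections
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

RAM, CPU, DRIVE = frozenset({"RAM"}), frozenset({"CPU"}), frozenset({"Drive"})
m1 = {RAM: 0.98, CPU: 0.02}
m2 = {DRIVE: 0.97, CPU: 0.03}
m12, c = dempster(m1, m2)
print(round(c, 4))    # 0.9994
print(m12[CPU])       # ~1.0: all mass is forced onto the CPU diagnosis
```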
In order to address this anomaly, a number of alternative rules of com-
bination have been developed (Sentz & Ferson [26] provide a good survey of
these). We describe two here: Yager's rule [34] and Dubois and Prade's rule [11].
In the next section, we explore and develop a specialized extension of Shafer's
discount & combine method [27].

Yager's Rule:

\[
m_{1,2}(A) =
\begin{cases}
\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y) & \text{for } A\neq\emptyset \text{ and } A\neq\Omega\\[1.5ex]
m_1(\Omega)\,m_2(\Omega) + \displaystyle\sum_{X\cap Y=\emptyset} m_1(X)\,m_2(Y) & \text{for } A=\Omega\\[1.5ex]
0 & \text{for } A=\emptyset
\end{cases}
\]
Applying Yager's Rule to the technician example, we get the combined bpa:

m_{1,2}(bad RAM) = 0,            m_{1,2}(Drive failure) = 0,
m_{1,2}(CPU failure) = 0.0006,   m_{1,2}(Ω) = 0.9994.

This gives a belief function that represents the technicians' disagreement with
nearly total ignorance (see Table 3.1).
Dubois and Prade's Rule:

\[
m_{1,2}(A) =
\begin{cases}
\displaystyle\sum_{X\cap Y=A} m_1(X)\,m_2(Y)
+ \sum_{\substack{X\cup Y=A\\ X\cap Y=\emptyset}} m_1(X)\,m_2(Y) & \text{for } A\neq\emptyset\\[1.5ex]
0 & \text{for } A=\emptyset
\end{cases}
\]

Applying Dubois and Prade's Rule to the technician example, we get the
combined bpa:

m_{1,2}(bad RAM) = 0,            m_{1,2}(bad RAM or CPU failure) = 0.0294,
m_{1,2}(CPU failure) = 0.0006,   m_{1,2}(bad RAM or Drive failure) = 0.9506,
m_{1,2}(Drive failure) = 0,      m_{1,2}(CPU failure or Drive failure) = 0.0194.

Thus, instead of putting the most support into Ω (as Yager's Rule does), this
rule disjoins the technicians' top choices.
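Both rules differ from Dempster's only in where the conflict mass is placed; a Python sketch (not from the text) makes that explicit:

```python
def yager(m1, m2, omega):
    """Yager's rule: mass on empty intersections is reassigned to Omega."""
    out = {}
    for X, p in m1.items():
        for Y, q in m2.items():
            target = (X & Y) or omega          # empty frozenset is falsy
            out[target] = out.get(target, 0.0) + p * q
    return out

def dubois_prade(m1, m2):
    """Dubois and Prade's rule: conflicting pairs are disjoined (union) instead."""
    out = {}
    for X, p in m1.items():
        for Y, q in m2.items():
            target = (X & Y) or (X | Y)
            out[target] = out.get(target, 0.0) + p * q
    return out

OMEGA = frozenset({"RAM", "CPU", "Drive"})
m1 = {frozenset({"RAM"}): 0.98, frozenset({"CPU"}): 0.02}
m2 = {frozenset({"Drive"}): 0.97, frozenset({"CPU"}): 0.03}
print(round(yager(m1, m2, OMEGA)[OMEGA], 4))                       # 0.9994
print(round(dubois_prade(m1, m2)[frozenset({"RAM", "Drive"})], 4)) # 0.9506
```

The two printed masses match the combined bpas above: Yager's rule pushes almost everything to Ω, while Dubois and Prade's rule concentrates it on "bad RAM or Drive failure."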
We compare the resulting belief and plausibility values in Table 3.1. Note
that Yager's rule results in nearly total ignorance, assigning belief values near
zero while assigning very high plausibility values to everything. By contrast,
though Dubois and Prade's rule also assigns very little belief to singletons, it
does assign substantial belief to the set "bad RAM or Drive failure" and very
low plausibility to "CPU failure."

Table 3.1: Combined belief and plausibility values for the technician example.

                                       Yager              Dubois & Prade
                                   Bel      Plaus       Bel      Plaus
  bad RAM                         0.0000   0.9994      0.0000   0.9800
  CPU failure                     0.0006   1.0000      0.0006   0.0494
  Drive failure                   0.0000   0.9994      0.0000   0.9700
  bad RAM or CPU failure          0.0006   1.0000      0.0300   1.0000
  bad RAM or Drive failure        0.0000   0.9994      0.9506   0.9994
  CPU or Drive failure            0.0006   1.0000      0.0200   1.0000
  bad RAM or CPU or Drive failure 1.0000   1.0000      1.0000   1.0000
3.2.2 Discount & Combine: Combining Highly Conflicting Beliefs

As discussed in Section 3.2.1, Dempster's rule can be problematic when com-
bining highly conflicting evidence. Mathematically, the denominator becomes
zero, but more importantly, the conflict means we cannot believe the combina-
tion. This problem is particularly evident when combining highly conflicting,
dogmatic belief functions. For such situations, Shafer proposed a method in
chapter 11 of his book [27, p. 251-255] that first applies a discount factor
0 ≤ α_i ≤ 1 to each belief function, reflecting our confidence in that source of
evidence, so that

\[
\mathrm{Bel}_i^{\alpha}(A) = \alpha_i\,\mathrm{Bel}_i(A) \quad\text{for } A\neq\Omega
\qquad\text{and}\qquad \mathrm{Bel}_i^{\alpha}(\Omega) = 1,
\]

for all proper subsets A of Ω, and then combines the discounted belief functions
using Dempster's rule. This discount & combine method is more appropriate
for combining highly conflicting belief functions because it avoids the problem
of discarding conflicting information seen in other rules of combination. In fact,
for belief functions that are both highly conflicting and dogmatic, discounting
the belief functions before combining them is essential in order to avoid
discarding evidence, because the other methods assign zero belief to anything not
supported by both evidence pools.
For combining highly conflicting, dogmatic belief functions we propose a new
method based on Shafer's discount & combine method, in which we introduce an
additional re-normalizing step to convert the resulting combined belief function
back into a dogmatic belief function. We further show that for totally conflicting
bpas (where c = 1) the combined belief function is equivalent to a weighted sum
of the original belief functions.

Given totally conflicting, dogmatic bpas m_1 and m_2, define the non-dogmatic
bpas:

\[
\tilde m_1(A) = \alpha\,m_1(A) \quad\text{for } A\neq\Omega, \qquad
\tilde m_2(A) = \beta\,m_2(A) \quad\text{for } A\neq\Omega,
\]

and

\[
\tilde m_1(\Omega) = 1-\alpha, \qquad \tilde m_2(\Omega) = 1-\beta,
\]

where 0 < α < 1 and 0 < β < 1. We can think of α and β as weights of trust
on the evidence for m_1 and m_2, respectively; setting α = 0 implies that we have
no trust in m_1 (that is, m̃_1 reflects total ignorance), while setting α = 1 implies
total faith in m_1, with m̃_1 = m_1.
We can then combine $\tilde m_1$ and $\tilde m_2$ using Dempster's rule:
$$\tilde m_{1,2}(A) \;=\; \frac{\displaystyle\sum_{X \cap Y = A} \tilde m_1(X)\,\tilde m_2(Y)}{1 - \displaystyle\sum_{X \cap Y = \emptyset} \tilde m_1(X)\,\tilde m_2(Y)} \;=\; \frac{\displaystyle\sum_{X \cap Y = A} \tilde m_1(X)\,\tilde m_2(Y)}{\displaystyle\sum_{X \cap Y \neq \emptyset} \tilde m_1(X)\,\tilde m_2(Y)}$$
for $A \neq \emptyset$ and $\tilde m_{1,2}(\emptyset) = 0$.
Since $m_1$ and $m_2$ are totally conflicting, we have, for $A \neq \Omega$,
$$\tilde m_{1,2}(A) \;=\; \frac{\alpha\, m_1(A)(1-\beta) + (1-\alpha)\beta\, m_2(A)}{\displaystyle\sum_{A \neq \Omega} \alpha\, m_1(A)(1-\beta) + \sum_{A \neq \Omega} (1-\alpha)\beta\, m_2(A) + (1-\alpha)(1-\beta)}$$
$$=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A)}{\alpha(1-\beta) + (1-\alpha)\beta + (1-\alpha)(1-\beta)} \;=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A)}{1 - \alpha\beta}$$
and, for $A = \Omega$,
$$\tilde m_{1,2}(\Omega) \;=\; \frac{(1-\alpha)(1-\beta)}{1 - \alpha\beta}. \tag{3.9}$$
The reasons for restricting $\alpha$ and $\beta$ to be strictly greater than zero and less than one become apparent when we examine the behavior of $\tilde m_{1,2}$ at these extremes. If, for example, we let $\alpha = 0$, we have $\tilde m_{1,2}(\Omega) = 1 - \beta$ and $\tilde m_{1,2}(A) = \beta\, m_2(A)$, which is simply the non-dogmatic bpa $\tilde m_2$. At the other extreme, where $\alpha = 1$, we have $\tilde m_{1,2}(\Omega) = 0$ and $\tilde m_{1,2}(A) = \frac{(1-\beta)\, m_1(A)}{1-\beta}$, which is undefined when $\beta = 1$ and equal to $m_1$ otherwise.
Finally, we produce the dogmatic combination, $m_{1,2}$, by setting $m_{1,2}(\Omega) = 0$
and normalizing:
$$m_{1,2}(A) \;=\; \frac{\tilde m_{1,2}(A)}{1 - \tilde m_{1,2}(\Omega)} \;=\; \frac{\dfrac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A)}{1-\alpha\beta}}{1 - \dfrac{(1-\alpha)(1-\beta)}{1-\alpha\beta}} \;=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A)}{1 - \alpha\beta - (1-\alpha)(1-\beta)}$$
$$=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A)}{\alpha - 2\alpha\beta + \beta} \;=\; \frac{\alpha(1-\beta)}{\alpha - 2\alpha\beta + \beta}\, m_1(A) \;+\; \frac{(1-\alpha)\beta}{\alpha - 2\alpha\beta + \beta}\, m_2(A). \tag{3.10}$$
Note that, since
$$\frac{\alpha(1-\beta)}{\alpha - 2\alpha\beta + \beta} + \frac{(1-\alpha)\beta}{\alpha - 2\alpha\beta + \beta} = 1,$$
$m_{1,2}$ is a weighted sum of $m_1$ and $m_2$. Moreover, in the case where $\alpha = \beta$ we have
$$\frac{\alpha(1-\beta)}{\alpha - 2\alpha\beta + \beta} = \frac{(1-\alpha)\beta}{\alpha - 2\alpha\beta + \beta} = \frac{1}{2}.$$
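To make (3.10) concrete, here is a minimal numeric sketch; the function name `combine_total_conflict` and the example bpas are illustrative assumptions, not taken from the thesis.

```python
# Sketch of eq. (3.10): combine two totally conflicting, dogmatic bpas.
def combine_total_conflict(m1, m2, alpha, beta):
    """m_{1,2}(A) = [a(1-b) m1(A) + (1-a)b m2(A)] / (a - 2ab + b)."""
    denom = alpha - 2 * alpha * beta + beta
    return {A: (alpha * (1 - beta) * m1.get(A, 0.0)
                + (1 - alpha) * beta * m2.get(A, 0.0)) / denom
            for A in set(m1) | set(m2)}

m1 = {"A": 0.7, "B": 0.3}   # dogmatic: no mass left on Omega
m2 = {"C": 0.6, "D": 0.4}   # totally conflicting with m1
m12 = combine_total_conflict(m1, m2, alpha=0.5, beta=0.5)
```

With $\alpha = \beta$, the output is the plain average of $m_1$ and $m_2$, matching the observation above.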
For the case of combining $n$ totally conflicting belief functions, whose focal elements are completely disjoint, we have, for $A \neq \Omega$,
$$\tilde m_{1\ldots n}(A) \;=\; \frac{\displaystyle\sum_{i=1}^n \Bigl(\alpha_i \prod_{j\neq i}(1-\alpha_j)\Bigr)\, m_i(A)}{\displaystyle\sum_{i=1}^n \Bigl(\alpha_i \prod_{j\neq i}(1-\alpha_j)\Bigr) + \prod_{i=1}^n (1-\alpha_i)}$$
and, for $A = \Omega$,
$$\tilde m_{1\ldots n}(\Omega) \;=\; \frac{\displaystyle\prod_{i=1}^n (1-\alpha_i)}{\displaystyle\sum_{i=1}^n \Bigl(\alpha_i \prod_{j\neq i}(1-\alpha_j)\Bigr) + \prod_{i=1}^n (1-\alpha_i)},$$
where $0 < \alpha_i < 1$ is the discounting weight for $m_i$. Setting $m_{1\ldots n}(\Omega) = 0$ and normalizing:
$$m_{1\ldots n}(A) \;=\; \frac{\tilde m_{1\ldots n}(A)}{1 - \tilde m_{1\ldots n}(\Omega)} \;=\; \frac{\displaystyle\sum_{i=1}^n \Bigl(\alpha_i \prod_{j\neq i}(1-\alpha_j)\Bigr)\, m_i(A)}{\displaystyle\sum_{i=1}^n \Bigl(\alpha_i \prod_{j\neq i}(1-\alpha_j)\Bigr)}.$$
Looking again at the case where all the discounting weights are equal, $\alpha_i = \alpha$ for all $i$, we have
$$m_{1\ldots n}(A) \;=\; \frac{\displaystyle\sum_{i=1}^n \alpha(1-\alpha)^{n-1}\, m_i(A)}{\displaystyle\sum_{i=1}^n \alpha(1-\alpha)^{n-1}} \;=\; \frac{\alpha(1-\alpha)^{n-1} \displaystyle\sum_{i=1}^n m_i(A)}{n\,\alpha(1-\alpha)^{n-1}} \;=\; \frac{1}{n} \sum_{i=1}^n m_i(A).$$
We have thus generalized our previous observation for combining two totally
conflicting belief functions with equal discounting weights.
In the case where $m_1$ and $m_2$ are not totally conflicting, we need to factor in the extent of agreement. Thus, we have, for $A \neq \Omega$,
$$\tilde m_{1,2}(A) \;=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A) + \alpha\beta \displaystyle\sum_{X \cap Y = A} m_1(X)\, m_2(Y)}{1 - \alpha\beta + \alpha\beta \displaystyle\sum_{X \cap Y \neq \emptyset} m_1(X)\, m_2(Y)} \tag{3.11}$$
and, for $A = \Omega$,
$$\tilde m_{1,2}(\Omega) \;=\; \frac{(1-\alpha)(1-\beta)}{1 - \alpha\beta + \alpha\beta \displaystyle\sum_{X \cap Y \neq \emptyset} m_1(X)\, m_2(Y)}. \tag{3.12}$$
Setting $m_{1,2}(\Omega) = 0$ and normalizing we get:
$$m_{1,2}(A) \;=\; \frac{\alpha(1-\beta)\, m_1(A) + (1-\alpha)\beta\, m_2(A) + \alpha\beta \displaystyle\sum_{X \cap Y = A} m_1(X)\, m_2(Y)}{\alpha - 2\alpha\beta + \beta + \alpha\beta \displaystyle\sum_{X \cap Y \neq \emptyset} m_1(X)\, m_2(Y)}. \tag{3.13}$$
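The general rule (3.13) can be sketched in a few lines of code; focal elements are represented as frozensets so intersections can be taken, and the function name and example bpas below are assumptions for illustration only.

```python
from itertools import product

def combine_mod_dc(m1, m2, alpha, beta):
    """Modified discount & combine, eq. (3.13); bpa keys are frozensets."""
    pairs = list(product(m1.items(), m2.items()))
    # total agreement: sum over pairs of focal elements that intersect
    agree = sum(v1 * v2 for (X, v1), (Y, v2) in pairs if X & Y)
    denom = alpha - 2 * alpha * beta + beta + alpha * beta * agree
    focals = set(m1) | set(m2) | {X & Y for X, Y in product(m1, m2) if X & Y}
    return {A: (alpha * (1 - beta) * m1.get(A, 0.0)
                + (1 - alpha) * beta * m2.get(A, 0.0)
                + alpha * beta * sum(v1 * v2 for (X, v1), (Y, v2) in pairs
                                     if X & Y == A)) / denom
            for A in focals}

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m12 = combine_mod_dc({A: 0.6, B: 0.4}, {B: 0.5, C: 0.5}, alpha=0.5, beta=0.5)
```

When the two bpas are totally conflicting, the agreement sums vanish and the rule reduces to (3.10).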
Applying our modified discount & combine method to the technician example from Section 3.2.1 with $\alpha = \beta = 0.5$, we get the combined bpa:
$$m_{1,2}(\text{bad RAM}) = 0.48985, \quad m_{1,2}(\text{CPU failure}) = 0.02529, \quad m_{1,2}(\text{Drive failure}) = 0.48485,$$
which places support nearly evenly split between bad RAM and drive failure.
Table 3.2 gives the resulting belief function and plausibility function values for both the unmodified and our modified discount & combine method. We note that in this case, the bpa produced by our method gives nonzero values to only singletons (unlike the unmodified method, which leaves some weight assigned to $\Omega$). Hence, this belief function is additive, so Corollary 3.6 applies and the belief values are equal to the plausibility values. Moreover, unlike either Yager's rule or Dubois and Prade's rule, this method of combination assigns significant belief to both technicians' preferred causes for the malfunction, and, unlike the unmodified approach, it retains the dogmatic nature of the belief functions being combined.
Table 3.2: Discount & combine belief and plausibility values for the technician example.

                                   Unmodified            Modified
                                   Bel      Plaus        Bel      Plaus
bad RAM                            0.32660  0.65987      0.48985  0.48985
CPU failure                        0.01686  0.35013      0.02529  0.02529
Drive failure                      0.32327  0.65654      0.48485  0.48485
bad RAM or CPU failure             0.34346  0.67673      0.51514  0.51514
bad RAM or Drive failure           0.64987  0.98314      0.97470  0.97470
CPU or Drive failure               0.34013  0.67340      0.51014  0.51014
bad RAM or CPU or Drive failure    1.00000  1.00000      1.00000  1.00000
4. Using Persistence as Evidence
The intuitive notion of persistence is a compelling force for robust decision-making. Given a list of acceptable solutions, an individual decision, such as
selecting or rejecting a particular asset, that persists in appearing in the solution
set is appealing to select in our final solution. This persistence conveys a form
of robustness in that we feel more confident that this selection or rejection is a
good decision even when our estimates of the returns are inexact.
In this chapter, we build a formal foundation using the Dempster-Shafer
Theory of Evidence by treating persistence as evidence, which we in turn use to
build belief functions. Our first approach builds belief functions by considering
the total global persistence of solutions. Following that, we investigate an alter-
native approach that builds its belief functions by treating solutions as sets of
decisions and considering the persistence of subsets of these sets.
4.1 Building Belief from Persistence of Solutions
In this section, we develop our first approach to using persistence as evidence
of a solutions robustness. We define the frame of discernment, f2, for our belief
functions to be the set of 2" possible solutions, expressed as vectors. This
corresponds to the set of selections:
S(x) = {j : Xj 1} and Xj(S)
0 otherwise.
52
Using vector notation, let $X^K = \{x^1, \ldots, x^K\}$ denote the set of $K$ $x$-vectors used to compute the persistence vector, $p = [\,\gamma(x_1)\ \cdots\ \gamma(x_n)\,]^T$ (hence, $1 - p$ is a vector of the global persistence of rejection values). Then, for a given feasible solution space, $X$, we define a bpa based on the total combined (selections and rejections) global persistence value of a solution:
$$m(\{x\}) = 0 \ \text{ for } x \notin X \tag{4.1a}$$
$$m(\{x\}) = \tfrac{1}{\kappa}\bigl(p^T x + (1-p)^T(1-x)\bigr) \ \text{ for } x \in X \tag{4.1b}$$
$$m(Y) = 0 \ \text{ for } Y \subseteq \Omega : |Y| > 1, \tag{4.1c}$$
where $\kappa = \sum_{y \in X} \bigl(p^T y + (1-p)^T(1-y)\bigr)$ and $1$ is an appropriately sized vector of ones. Note that setting $m(\{x\}) = 0$ for $x \notin X$ defines the set of focal elements to be exactly the set $X$. Moreover, since $m(Y) = 0$ for $|Y| > 1$, we can simplify our notation, writing $m(\{x\})$ as $m(x)$.
We call (4.1) the singleton-only bpa, as it does not assign a probability mass to subsets of $X$ with more than one vector. We also consider replacing (4.1c) with the following:
$$\text{disjunctive bpa:}\quad m(Y) = \sum_{x \in Y} m(x) \tag{4.2}$$
for $Y \subseteq \Omega$, and re-defining $\kappa$ in (4.1b) accordingly. The disjunctive bpa interprets $m(Y)$ as the probability that some member of $Y$ is the most robust, as defined in the following.
Definition 4.1. The mode of the bpa is the solution with the greatest belief, which we define to be the most robust.
This interpretation carries over to the belief function that we now develop for the singleton-only bpa. The total persistence of the selections is given by $p^T x = \sum_{j \in S(x)} p_j$, and the total persistence of the rejections is given by $(1-p)^T(1-x) = \sum_{j \notin S(x)} (1 - p_j)$. This induces the belief function:
$$\mathrm{Bel}(x) = m(x) \ \text{ for } x \in X, \qquad \mathrm{Bel}(x) = 0 \ \text{ for } x \notin X. \tag{4.3}$$
Since $\mathrm{Bel}(x) = m(x)$ for singletons and $m(Y) = 0$ for all other subsets of $\Omega$, this belief function is an example of the special case of additive belief functions that are equivalent to probability measures.
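As a sketch (function and variable names are assumed), the singleton-only bpa (4.1) can be computed directly from the 0-1 solution vectors, here taking the feasible space $X$ to be the solution list $X^K$ itself:

```python
def singleton_only_bpa(solutions):
    """bpa (4.1) with X = X^K: mass proportional to the total combined
    global persistence p^T x + (1-p)^T (1-x) of each solution."""
    K, n = len(solutions), len(solutions[0])
    p = [sum(x[j] for x in solutions) / K for j in range(n)]  # persistence
    def psi(x):
        return sum(p[j] if x[j] else 1 - p[j] for j in range(n))
    kappa = sum(psi(y) for y in solutions)                    # normalizer
    return {tuple(x): psi(x) / kappa for x in solutions}

m = singleton_only_bpa([(1, 0, 1), (1, 1, 0), (1, 0, 0)])
```

Because the bpa is additive, these masses are also the belief values of (4.3).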
Depending on how we generate our initial list of acceptable solutions, the list may include solutions of questionable quality. For example, consider the ranked list of solutions from Table 2.1, which was produced by solving the knapsack problem using the method of successive exclusions discussed in Section 2.1 (see p. 7). For many of the generated solutions there exist higher-ranking solutions with better objective item-values and less total item-weight. We call such solutions dominated.
Definition 4.2. The solution vector $x$ dominates $y$ for the knapsack problem if $cx > cy$ and $ax < ay$.
Counting dominated solutions may cause a distortion of the persistence of a decision, which we can address by modifying the definition of the persistence vector so that $p_j$ denotes the undominated persistence of $x_j = 1$ (that is, we define $X^K$ to include only the undominated solutions and compute the persistence values from this new set). The persistence of a rejection is $1 - p_j$.
As an example, Table 4.1a shows the belief and plausibility values resulting from using the solution set from Table 2.1 and setting $X = X^K$. Notice that each of the solutions with $cx = 108$ is dominated by the solution with $cx = 109$ because $ax = 101 < 103 < 105 < 107$. Similarly, both solutions with $cx = 107$ dominate all the lower-valued solutions. Removing these dominated solutions, we get Table 4.1b, which shows the adjusted belief values for the undominated solutions. Since this belief function is additive, we observe a characteristic of such functions: $\mathrm{Bel}(x) = \mathrm{Plaus}(x)$ for all solutions. (We include the extra column here for later comparison.)
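Definition 4.2 suggests a straightforward filter; the following sketch (names assumed) keeps the undominated (value, weight) pairs, using the strict inequalities of the definition:

```python
def undominated(solutions):
    """Keep (value, weight) pairs not dominated per Definition 4.2:
    x dominates y iff value(x) > value(y) and weight(x) < weight(y)."""
    return [s for s in solutions
            if not any(t[0] > s[0] and t[1] < s[1] for t in solutions)]

sols = [(112, 105), (110, 104), (109, 101), (108, 103), (107, 100), (106, 107)]
survivors = undominated(sols)
```

On this subset of the Table 2.1 value/weight pairs, (108, 103) falls to (109, 101) and (106, 107) falls to (107, 100), mirroring the discussion above.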
Table 4.1: Singleton-only belief and plausibility values.

(a) with dominated solutions            (b) without dominated solutions
cx   ax   Bel(x)   Plaus(x)             cx   ax   Bel(x)   Plaus(x)
112  105  0.0750   0.0750               112  105  0.2010   0.2010
110  104  0.0766   0.0766               110  104  0.1907   0.1907
109  101  0.0782   0.0782               109  101  0.2113   0.2113
108  103  0.0735   0.0735               107  100  0.2010   0.2010
108  105  0.0671   0.0671               107   97  0.1959   0.1959
108  107  0.0671   0.0671
107  100  0.0750   0.0750               $p_u = [\,1\ 1\ 1\ 5\ 1\ 5\ 3\ 2\ 5\ 0\,]^T/5$
107  105  0.0656   0.0656
107  104  0.0687   0.0687
107   97  0.0766   0.0766
106  107  0.0671   0.0671
105  104  0.0687   0.0687
105  101  0.0703   0.0703
105  103  0.0703   0.0703

$p_d = [\,4\ 4\ 4\ 10\ 5\ 13\ 6\ 6\ 11\ 0\,]^T/14$
4.1.1 Making a Partial Decision: Conditional Beliefs
Suppose we make an a priori decision to select or reject some particular item. Define $X_j = \{x \in X : x_j = 1\}$ and $\bar X_j = \{x \in X : x_j = 0\}$. Let $\mathrm{Bel}(x \mid Y)$, where $Y \subseteq X$, denote Dempster's conditional belief function, which is given by Dempster's rule of conditioning:
$$\mathrm{Bel}(x \mid X_j) \;=\; \frac{\mathrm{Bel}(x \vee \bar X_j) - \mathrm{Bel}(\bar X_j)}{1 - \mathrm{Bel}(\bar X_j)}.$$
Theorem 4.3. For the singleton-only bpa, the belief for a solution $x$ given we know $x_j = 1$ is
$$\mathrm{Bel}(x \mid X_j) = \begin{cases} \dfrac{\mathrm{Bel}(x)}{\mathrm{Bel}(X_j)} & \text{if } x \in X_j \\[1ex] 0 & \text{if } x \notin X_j. \end{cases}$$
Proof: For $x \in \bar X_j$, $\{x\} \vee \bar X_j = \bar X_j$, so
$$\mathrm{Bel}(x \vee \bar X_j) = \mathrm{Bel}(\bar X_j) \;\Rightarrow\; \mathrm{Bel}(x \mid X_j) = 0.$$
For $x \in X_j$, since the singleton-only belief is additive,
$$\mathrm{Bel}(x \vee \bar X_j) = \mathrm{Bel}(x) + \mathrm{Bel}(\bar X_j).$$
Then, from (3.1) and Corollary 3.5,
$$1 - \mathrm{Bel}(\bar X_j) = \mathrm{Plaus}(X_j) = \mathrm{Bel}(X_j) \;\Rightarrow\; \mathrm{Bel}(x \mid X_j) = \frac{\mathrm{Bel}(x)}{\mathrm{Bel}(X_j)}.$$
Intuition suggests that belief increases as support for that belief increases, meaning that conditional belief in $x$ should increase with conditions that it satisfies; otherwise, if the evidence does not support $x$, the belief in $x$ drops to zero. That is, intuition suggests that, if $x \notin X_i \cup X_j$ (so $x \notin X_i$ and $x \notin X_j$),
$$\mathrm{Bel}(x) \geq \mathrm{Bel}(x \mid X_i \cup X_j) = \mathrm{Bel}(x \mid X_i \cap X_j) = 0$$
and, if $x \notin X_i$ (or $x \notin X_j$),
$$\mathrm{Bel}(x \mid X_i) = \mathrm{Bel}(x \mid X_i \cap X_j) = 0 \quad \bigl(\text{or } \mathrm{Bel}(x \mid X_j) = \mathrm{Bel}(x \mid X_i \cap X_j) = 0\bigr);$$
otherwise, if $x \in X_i \cap X_j$,
$$\mathrm{Bel}(x) \leq \mathrm{Bel}(x \mid X_i \cup X_j) \leq \mathrm{Bel}(x \mid X_i) \leq \mathrm{Bel}(x \mid X_i \cap X_j).$$
Although this intuition is not valid for all bpas, it does hold for our singleton-only bpa. The first two cases follow directly from Theorem 4.3 and we confirm the third in the following theorem.
Theorem 4.4. For the singleton-only bpa, $x \in X_i \cap X_j$ implies:
$$\mathrm{Bel}(x) \leq \mathrm{Bel}(x \mid X_i \cup X_j) \leq \mathrm{Bel}(x \mid X_i) \leq \mathrm{Bel}(x \mid X_i \cap X_j).$$
Proof: Since $X_i \cap X_j \subseteq X_i \subseteq X_i \cup X_j \subseteq \Omega$,
$$\frac{\mathrm{Bel}(x)}{\mathrm{Bel}(X_i \cap X_j)} \;\geq\; \frac{\mathrm{Bel}(x)}{\mathrm{Bel}(X_i)} \;\geq\; \frac{\mathrm{Bel}(x)}{\mathrm{Bel}(X_i \cup X_j)} \;\geq\; \frac{\mathrm{Bel}(x)}{\mathrm{Bel}(\Omega)} = \mathrm{Bel}(x)$$
and, by Theorem 4.3,
$$\mathrm{Bel}(x) \leq \mathrm{Bel}(x \mid X_i \cup X_j) \leq \mathrm{Bel}(x \mid X_i) \leq \mathrm{Bel}(x \mid X_i \cap X_j).$$
Table 4.2a shows the revised (conditional) beliefs for some selections from Table 4.1a, and Table 4.2b shows the associated beliefs for solutions from Table 4.1b. Observe that in the undominated case (Table 4.2b), since $x_1 = 1$ in only the first solution, the belief in the first solution jumps to 1, while the beliefs in all others drop to zero. Moreover, since $x_4 = x_6 = x_9 = 1$ in all undominated solutions, fixing these values leaves the beliefs unchanged. In these cases, where $\mathrm{Bel}(x \mid X_j) = \mathrm{Bel}(x)$, the knowledge that $x_j = 1$ does not affect our belief that $x$ is the most robust solution.
Finally, we note that since $x_{10} = 0$ for all $x$, the conditional belief, $\mathrm{Bel}(x \mid X_{10})$, is not defined in either the dominated or undominated case. Just as the Bayesian conditional probability $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ fails when $P(B) = 0$, Dempster's rule also fails when we attempt to define a belief conditioned on an event that does not occur.
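Theorem 4.3 makes conditioning on $x_j = 1$ a one-line renormalization over the solutions that select item $j$; the sketch below (names assumed) operates on a singleton-only bpa stored as a dict from solution tuples to masses:

```python
def condition_on_selection(m, j):
    """Theorem 4.3: Bel(x | X_j) = Bel(x)/Bel(X_j) if x_j = 1, else 0."""
    bel_Xj = sum(v for x, v in m.items() if x[j] == 1)
    if bel_Xj == 0:   # conditioning on an event that never occurs fails
        raise ValueError("Bel(X_j) = 0: conditional belief undefined")
    return {x: (v / bel_Xj if x[j] == 1 else 0.0) for x, v in m.items()}

m = {(1, 0, 1): 0.4, (1, 1, 0): 0.3, (0, 1, 0): 0.3}
cond = condition_on_selection(m, 1)
```

The guard mirrors the Bayesian analogy above: conditioning on an item never selected raises an error rather than returning a meaningless value.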
Table 4.2: Singleton-only conditional beliefs.

(a) with dominated solutions
Bel(x)   Bel(x|X1)  Bel(x|X4)  Bel(x|X6)  Bel(x|X9)
0.0750   0.2699     0.1033     0.0803     0.0945
0.0766   0          0.1054     0.0820     0.0965
0.0782   0          0.1076     0.0837     0.0985
0.0735   0          0.1011     0.0786     0.0925
0.0671   0.2415     0          0.0719     0.0846
0.0671   0.2415     0.0924     0.0719     0
0.0750   0          0.1033     0.0803     0.0945
0.0656   0          0.0902     0          0.0826
0.0687   0.2472     0.0946     0.0735     0
0.0766   0          0.1054     0.0820     0.0965
0.0671   0          0          0.0719     0.0846
0.0687   0          0          0.0735     0.0866
0.0703   0          0          0.0752     0.0886
0.0703   0          0.0967     0.0752     0

(b) without dominated solutions
Bel(x)   Bel(x|X1)  Bel(x|X4)  Bel(x|X6)  Bel(x|X9)
0.2010   1          0.2010     0.2010     0.2010
0.1907   0          0.1907     0.1907     0.1907
0.2113   0          0.2113     0.2113     0.2113
0.2010   0          0.2010     0.2010     0.2010
0.1959   0          0.1959     0.1959     0.1959

4.2 Building Belief from Persistence of Decisions
In our second approach we treat the solutions in our solution list as sets of decisions and construct a belief function by examining the persistence of each of their subsets. Consider the collection of belief functions induced by each $S_k$ from defining each bpa such that each nonempty subset of $S_k$ is equally likely:
$$m_k(S) = \begin{cases} \dfrac{1}{2^{|S_k|} - 1} & \text{if } S \subseteq S_k,\ S \neq \emptyset \\[1ex] 0 & \text{otherwise.} \end{cases}$$
We then aggregate the evidence from the full solution list by defining a final bpa as a weighted sum of each of these $K$ bpas, so that
$$m(S) = \frac{1}{K} \sum_{k=1}^K m_k(S) = \frac{1}{K} \sum_{k : S \subseteq S_k} \frac{1}{2^{|S_k|} - 1},$$
and the belief for a given solution is
$$\mathrm{Bel}(x) = \frac{1}{K} \sum_{k=1}^K \sum_{\substack{S \subseteq S(x) \\ S \neq \emptyset}} m_k(S) = \frac{1}{K} \sum_{k=1}^K \mathrm{Bel}_k(S(x)). \tag{4.4}$$
We call (4.4) the decision-sets belief function. Note that, since this belief function considers only selections, there is nothing restricting infeasible solutions from having large belief values (in fact, for $x = 1$ we have $\mathrm{Bel}(x) = 1$). Hence, we must explicitly require that the solutions under consideration first satisfy any constraints from the underlying problem. Because it does not affect the relative rankings, this requirement is not onerous. Moreover, depending on the nature of the problem, we may feel that some solutions should have high belief values despite their infeasibility (for instance, we might want to represent solutions we wish were available if only we had a larger knapsack).
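Because each $m_k$ is uniform over the nonempty subsets of $S_k$, the inner sum in (4.4) has the closed form $\mathrm{Bel}_k(S(x)) = \bigl(2^{|S_k \cap S(x)|} - 1\bigr)/\bigl(2^{|S_k|} - 1\bigr)$, which follows by counting the nonempty subsets of $S_k \cap S(x)$. The sketch below (names assumed) uses that form:

```python
def decision_sets_belief(selection_sets, Sx):
    """Eq. (4.4) via Bel_k(S(x)) = (2^|S_k & S(x)| - 1) / (2^|S_k| - 1)."""
    K = len(selection_sets)
    return sum((2 ** len(Sk & Sx) - 1) / (2 ** len(Sk) - 1)
               for Sk in selection_sets) / K

sets = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
bel = decision_sets_belief(sets, frozenset({2, 3}))
```

A solution that selects every item receives belief 1, illustrating the feasibility caveat above.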
Table 4.3 illustrates this approach using the solution set from Table 2.1. Mirroring Tables 4.1a and 4.1b, Table 4.3a gives the beliefs resulting from using the persistence values from all fourteen solutions, while Table 4.3b considers only the undominated solutions. Note that in considering subsets, the second solution (with $cx = 110$) has the highest belief when all fourteen solutions are considered, unlike in the singleton-only bpa, where the third solution has the highest belief. When considering only undominated solutions, both approaches assign the greatest belief to the third solution (with $cx = 109$).
Table 4.3: Decision-sets belief values, $\mathrm{Bel}(x) = \frac{1}{K} \sum_{k=1}^K \mathrm{Bel}_k(S(x))$.
(a) with dominated solutions
cx ax Bel(x) Plaus(x)
112 105 0.4292 0.9109
110 104 0.4378 0.9152
109 101 0.4372 0.9246
108 103 0.3807 0.9009
108 105 0.2433 0.8106
108 107 0.2476 0.7916
107 100 0.3991 0.9055
107 105 0.2286 0.7820
107 104 0.2565 0.8057
107 97 0.3088 0.8760
106 107 0.3333 0.8441
105 104 0.3330 0.8628
105 101 0.2614 0.8192
105 103 0.2608 0.8143
(b) without dominated solutions
cx ax Bel(x) Plaus(x)
112 105 0.5320 0.9544
110 104 0.4804 0.9415
109 101 0.5837 0.9673
107 100 0.5320 0.9544
107 97 0.3807 0.9226
This belief function is very closely related to the global selection persistence factor and adjusted global selection persistence factor from Section 2.1. Recall (p. 14) that for any selection set $S$, the global selection persistence factor is defined as
$$\Upsilon(S) = \frac{|\{k : S \subseteq S_k \text{ for } 1 \leq k \leq K\}|}{K} \ \text{ for } S \neq \emptyset \ \text{ with } \Upsilon(\emptyset) = 0,$$
and (cf. p. 17) the adjusted global selection persistence factor is defined as
$$\hat\Upsilon(S) = \frac{\displaystyle\sum_{k : S \subseteq S_k} \bigl(2^{|S_k|} - 1\bigr)^{-1}}{K} \ \text{ for } S \neq \emptyset \ \text{ with } \hat\Upsilon(\emptyset) = 0.$$
When all optimal selection sets have the same cardinality (that is, $|S_k| = m$ for all $k$ in $1, \ldots, K$), maximizing total global selection persistence factor is equivalent to maximizing total belief. This is shown with the following.
Theorem 4.5 (Restricted Equivalence). If $|S_k| = |S_1|$ for all $k \in \{1, \ldots, K\}$,
$$\operatorname*{argmax}_{x \in X} \sum_{S' \subseteq S(x)} \Upsilon(S') = \operatorname*{argmax}_{x \in X} \sum_{k=1}^K \mathrm{Bel}_k(S(x)).$$
Proof: Using the indicator function,
$$\mathbb{1}(a, A) = \begin{cases} 1 & \text{if } a \in A \\ 0 & \text{otherwise,} \end{cases}$$
we can write, with $\mathcal{P}(S_k)$ denoting the power set of $S_k$,
$$\begin{aligned}
\operatorname*{argmax}_{x \in X} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \Upsilon(S')
&= \operatorname*{argmax}_{x \in X} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \frac{|\{k : S' \subseteq S_k \text{ for } 1 \leq k \leq K\}|}{K} \\
&= \operatorname*{argmax}_{x \in X} \frac{1}{K} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \sum_{k=1}^K \mathbb{1}\bigl(S', \mathcal{P}(S_k)\bigr) \\
&= \operatorname*{argmax}_{x \in X} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \sum_{k=1}^K \mathbb{1}\bigl(S', \mathcal{P}(S_k)\bigr).
\end{aligned}$$
Similarly, since $|S_k| = |S_1|$ for all $k$ in $1, \ldots, K$, we have
$$\begin{aligned}
\operatorname*{argmax}_{x \in X} \sum_{k=1}^K \mathrm{Bel}_k(S(x))
&= \operatorname*{argmax}_{x \in X} \sum_{k=1}^K \sum_{S' \subseteq S(x)} m_k(S') \\
&= \operatorname*{argmax}_{x \in X} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \sum_{k=1}^K \frac{\mathbb{1}\bigl(S', \mathcal{P}(S_k)\bigr)}{2^{|S_1|} - 1} \\
&= \operatorname*{argmax}_{x \in X} \frac{1}{2^{|S_1|} - 1} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \sum_{k=1}^K \mathbb{1}\bigl(S', \mathcal{P}(S_k)\bigr) \\
&= \operatorname*{argmax}_{x \in X} \sum_{\substack{S' \subseteq S(x) \\ S' \neq \emptyset}} \sum_{k=1}^K \mathbb{1}\bigl(S', \mathcal{P}(S_k)\bigr).
\end{aligned}$$
Thus, the two formulations differ only by a constant multiplier and are optimized by the same solution set.
In words, Theorem 4.5 says that the following are equivalent:
1. $V$ is a most-robust selection set.
2. $V$ is a selection set with maximum total belief.
3. $V$ is a selection set with maximum total global selection persistence factor.
The condition that all $x \in X^K$ have the same sum holds for any problem with a constant-sum constraint, $\sum_j x_j = N$, such as sensor assignment without slack.
4.2.1 Conditional Beliefs
As with the singleton-only belief function, if we make an a priori decision to select or reject some item, we can conditionally revise our beliefs. Recall from Section 4.1.1 that the general form of Dempster's conditional belief function is given by
$$\mathrm{Bel}(x \mid X_j) = \frac{\mathrm{Bel}(x \vee \bar X_j) - \mathrm{Bel}(\bar X_j)}{1 - \mathrm{Bel}(\bar X_j)},$$
where we have defined $X_j = \{x \in X : x_j = 1\}$ and $\bar X_j = \{x \in X : x_j = 0\}$. (Note that $\mathrm{Bel}(x \mid X_j)$ is not the same as $\mathrm{Bel}(x \mid S(x) = \{j\})$.) We specify this rule for the decision-sets belief function in the following theorem.
Theorem 4.6. For the decision-sets bpa,
$$\mathrm{Bel}(x \mid X_j) = \frac{\displaystyle\sum_{y : S(y) \subseteq S(x) \wedge y_j = 1} m\bigl(S(y)\bigr)}{\displaystyle\sum_{y : y_j = 1} m\bigl(S(y)\bigr)},$$
where $\mathrm{Bel}(x \mid X_j)$ is the belief in $x$ given we know $x_j = 1$.
Proof: We have
$$\begin{aligned}
\mathrm{Bel}(x \mid X_j) &= \frac{\mathrm{Bel}(x \vee \bar X_j) - \mathrm{Bel}(\bar X_j)}{1 - \mathrm{Bel}(\bar X_j)} \\
&= \frac{\displaystyle\sum_{y : S(y) \subseteq S(x) \vee y_j = 0} m\bigl(S(y)\bigr) - \sum_{y : y_j = 0} m\bigl(S(y)\bigr)}{1 - \displaystyle\sum_{y : y_j = 0} m\bigl(S(y)\bigr)} \\
&= \frac{\displaystyle\sum_{y : S(y) \subseteq S(x) \wedge y_j = 1} m\bigl(S(y)\bigr)}{1 - \displaystyle\sum_{y : y_j = 0} m\bigl(S(y)\bigr)} \\
&= \frac{\displaystyle\sum_{y : S(y) \subseteq S(x) \wedge y_j = 1} m\bigl(S(y)\bigr)}{\displaystyle\sum_{y : y_j = 1} m\bigl(S(y)\bigr)}.
\end{aligned}$$
In Table 4.4, we illustrate some revised belief values for the belief function shown in Table 4.3. Fixing $x_1 = 1$ gives the most dramatic results, since very few of the considered solutions select $x_1$. Additionally, the final column shows the results of conditioning on two selections, $x_4 = 1$ and $x_6 = 1$, and thus is nonzero for only those solutions that make both selections.
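Theorem 4.6 can be sketched directly over a solution list; `m_of_S` below is an assumed mapping from each selection set $S(y)$ to its mass $m(S(y))$, and the toy data are illustrative only.

```python
def S(v):
    """Selection set of a 0-1 vector."""
    return frozenset(i for i, vi in enumerate(v) if vi)

def ds_conditional_belief(solutions, m_of_S, x, j):
    """Theorem 4.6: Bel(x | X_j) for the decision-sets bpa."""
    num = sum(m_of_S[S(y)] for y in solutions if S(y) <= S(x) and y[j] == 1)
    den = sum(m_of_S[S(y)] for y in solutions if y[j] == 1)
    return num / den

ys = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
m_of_S = {S(y): w for y, w in zip(ys, (0.5, 0.3, 0.2))}
b = ds_conditional_belief(ys, m_of_S, x=(1, 1, 0), j=1)
```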
Table 4.4: Decision-sets conditional beliefs.

Bel(x)   Bel(x|X1)  Bel(x|X4)  Bel(x|X6)  Bel(x|X9)  Bel(x|X4 ∩ X6)
0.4292 0.7480 0.5254 0.4527 0.4207 0.5283
0.4378 0.0000 0.5000 0.4811 0.4311 0.5283
0.4372 0.0000 0.4992 0.4609 0.4768 0.4991
0.3806 0.0000 0.4492 0.4223 0.4079 0.4717
0.2433 0.4390 0.0000 0.2813 0.2987 0.0000
0.2476 0.5020 0.3398 0.2955 0.0000 0.3641
0.3991 0.0000 0.4738 0.4217 0.4536 0.4708
0.2286 0.0000 0.3016 0.0000 0.2923 0.0000
0.2565 0.5630 0.3516 0.2948 0.0000 0.3631
0.3088 0.0000 0.3516 0.3289 0.3668 0.3349
0.3333 0.0000 0.0000 0.3835 0.3851 0.0000
0.3330 0.0000 0.0000 0.3734 0.4079 0.0000
0.2614 0.0000 0.0000 0.2999 0.3323 0.0000
0.2608 0.0000 0.3389 0.3090 0.0000 0.3631
4.3 Multiple Objectives
Another source of evidence comes when considering solutions from multiple
objectives. In multi-objective optimization, though, the various objectives are
usually at least partially conflicting. In this section, we investigate a couple of
approaches for integrating evidence produced by multiple objectives. In both
cases, each objective is associated with a distinct solution set. In the first,
we treat each solution set and associated persistence values as a distinct pool
of evidence, construct a belief function from each, and combine them using
one of the rules for combining evidence (we consider both Dempster's rule and
the modified discount & combine method). The second takes an alternative
approach of attempting to merge the solution sets prior to computing persistence
values.
4.3.1 Combining Multiple Objectives as Pools of Evidence
Let $X^{K_1} \subseteq X^1$, $X^{K_2} \subseteq X^2$ denote two distinct solution sets produced by considering two objectives, and let $m_1$ and $m_2$ be their associated bpas. We combine these with Dempster's rule:
$$m_{12}(\emptyset) = 0, \qquad m_{12}(Y) = \frac{\displaystyle\sum_{U \cap V = Y} m_1(U)\, m_2(V)}{1 - c}, \qquad \text{where } c = \sum_{U \cap V = \emptyset} m_1(U)\, m_2(V).$$
Since Dempster's rule fails when the sets to be combined do not share any focal elements (in other words, when $c = 1$), we require that $X^1 \cap X^2 \neq \emptyset$ for this approach.
Theorem 4.7. For the singleton-only bpa,
$$m_{12}(x) = 0 \ \text{ for } x \notin X^1 \cap X^2, \qquad m_{12}(x) = \frac{m_1(x)\, m_2(x)}{\displaystyle\sum_{u \in X^1 \cap X^2} m_1(u)\, m_2(u)} \ \text{ for } x \in X^1 \cap X^2, \qquad m_{12}(Y) = 0 \ \text{ for } Y : |Y| > 1,$$
and
$$\mathrm{Bel}_{12}(Y) = \sum_{y \in Y} m_{12}(y) = \frac{\displaystyle\sum_{y \in Y} m_1(y)\, m_2(y)}{\displaystyle\sum_{u \in X^1 \cap X^2} m_1(u)\, m_2(u)}.$$
Proof: Since $U \cap V = Y$ with $|Y| > 1$ implies $|U| > 1$ and $|V| > 1$, and $m_1(U)\, m_2(V) = 0$ for $|U| > 1$ or $|V| > 1$, Dempster's rule becomes:
$$m_{12}(x) = 0 \ \text{ for } x \notin X^1 \cap X^2, \qquad m_{12}(x) = \frac{m_1(x)\, m_2(x)}{1 - c} \ \text{ for } x \in X^1 \cap X^2, \qquad m_{12}(Y) = 0 \ \text{ for } Y : |Y| > 1,$$
where
$$c = \sum_{U \cap V = \emptyset} m_1(U)\, m_2(V) = \sum_{\substack{u \in X^1,\, v \in X^2 \\ u \neq v}} m_1(u)\, m_2(v) = \Bigl(\sum_{u \in X^1} m_1(u)\Bigr) \Bigl(\sum_{v \in X^2} m_2(v)\Bigr) - \sum_{u \in X^1 \cap X^2} m_1(u)\, m_2(u) = 1 - \sum_{u \in X^1 \cap X^2} m_1(u)\, m_2(u).$$
Further,
$$\mathrm{Bel}_{12}(Y) = \sum_{U \subseteq Y} m_{12}(U) = \sum_{y \in Y} m_{12}(y) = \frac{\displaystyle\sum_{y \in Y} m_1(y)\, m_2(y)}{\displaystyle\sum_{u \in X^1 \cap X^2} m_1(u)\, m_2(u)}.$$
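Theorem 4.7 reduces Dempster's rule for two singleton-only bpas to a renormalized product over the common solutions; a sketch (names and example masses assumed):

```python
def combine_singleton_bpas(m1, m2):
    """Theorem 4.7: mass only on X^1 ∩ X^2, renormalized products."""
    common = set(m1) & set(m2)
    z = sum(m1[u] * m2[u] for u in common)   # = 1 - c
    if z == 0:
        raise ValueError("total conflict (c = 1): Dempster's rule fails")
    return {u: m1[u] * m2[u] / z for u in common}

m1 = {(1, 0): 0.6, (0, 1): 0.4}
m2 = {(1, 0): 0.3, (0, 1): 0.7}
m12 = combine_singleton_bpas(m1, m2)
```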
For the purpose of illustration, consider adding a second objective to the treasure-room knapsack problem. This could be the relative historical values of the various treasures as assigned by the curator of the local museum, for instance (for this example, though, we use a random perturbation of the original values). Let our second objective be:
$$\max\ cx, \qquad \text{where } c = [\,46\ 40\ 41\ 36\ 10\ 11\ 11\ 44\ 5\ 22\,].$$
Table 4.5: Second-objective treasure-room knapsack problem solution sets.

     Objective          Knapsack Assignment
k    Value     x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   $\mathrm{Bel}_{s_2}(x)$   $\mathrm{Bel}_{d_2}(x)$
1 112 0 0 1 0 0 1 1 1 1 0 0.0733 0.4740
2 111 0 0 1 0 1 1 0 1 1 0 0.0717 0.4178
3 109 1 0 1 0 0 1 1 0 0 0 0.0634 0.2836
4 109 1 0 0 1 0 1 1 0 1 0 0.0799 0.4178
5 107 0 0 1 0 0 1 1 1 0 0 0.0651 0.2968
6 107 0 0 0 1 0 1 1 1 1 0 0.0815 0.4163
7 106 1 0 0 0 0 1 0 1 1 0 0.0750 0.3161
8 106 0 0 1 0 1 1 0 1 0 0 0.0634 0.2544
9 106 1 0 0 0 0 0 1 1 1 0 0.0618 0.2691
10 106 0 0 0 1 1 1 0 1 1 0 0.0799 0.3508
11 105 1 0 0 0 1 0 0 1 1 0 0.0601 0.2363
12 104 0 0 1 1 0 1 1 0 1 0 0.0799 0.3883
13 104 1 0 0 1 0 1 1 0 0 0 0.0717 0.2544
14 103 1 0 1 0 0 1 0 0 1 0 0.0733 0.2879
Total 4 4 4 10 5 13 6 6 11 0
In Table 4.5, we show the solutions generated by applying the method of successive exclusions to the treasure-room knapsack problem with this second objective, along with the resulting singleton-only, $\mathrm{Bel}_{s_2}$, and decision-sets, $\mathrm{Bel}_{d_2}$, belief values. We illustrate the use of Dempster's rule to combine the singleton-only belief functions, $\mathrm{Bel}_{s_1}$ and $\mathrm{Bel}_{s_2}$, in Table 4.6, which gives the post-combination beliefs and solutions for the situation in which the bpas are constructed by limiting the solution spaces to the respective solution lists (that is, $X^1 = X^{K_1}$ and $X^2 = X^{K_2}$). Since, in this case, we are combining singleton-only bpas, only the common solutions, $X^1 \cap X^2 = X^{K_1} \cap X^{K_2}$, have non-zero belief values.
Table 4.6: Combined singleton-only beliefs for the common solutions.

Objective Values   Knapsack Assignment
                   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10   $\mathrm{Bel}_{s_1}(x)$   $\mathrm{Bel}_{s_2}(x)$   $\mathrm{Bel}_{12}(x)$
107 104 1 0 0 1 0 1 1 0 0 0 0.0687 0.0717 0.1254
112 109 1 0 0 1 0 1 1 0 1 0 0.0750 0.0799 0.1527
107 104 0 0 1 1 0 1 1 0 1 0 0.0750 0.0799 0.1527
106 111 0 0 1 0 1 1 0 1 1 0 0.0671 0.0717 0.1225
110 106 0 0 0 1 1 1 0 1 1 0 0.0766 0.0799 0.1559
105 112 0 0 1 0 0 1 1 1 1 0 0.0687 0.0733 0.1283
109 107 0 0 0 1 0 1 1 1 1 0 0.0782 0.0815 0.1624
For the decision-sets belief function, the necessity for $c < 1$ entails a more stringent requirement: it is not sufficient that the acceptable solution domains intersect (that is, $X^1 \cap X^2 \neq \emptyset$); we must further require that the respective selection sets intersect (that is, $S(X^{K_1}) \cap S(X^{K_2}) \neq \emptyset$). Moreover, Dempster's rule assigns non-zero beliefs only to subsets of this intersection. Thus, when working with objectives that produce largely disjoint selection sets, the results of combining decision-sets belief functions with Dempster's rule are of dubious quality. In these cases, using a rule like the modified discount & combine method is more appropriate.
We illustrate the combination of decision-sets belief functions in Table 4.7, where we show the results of both Dempster's rule and the modified discount & combine method (with $\alpha_1 = \alpha_2 = 0.5$). Since the intersection of the two selection sets, $S(X^{K_1}) \cap S(X^{K_2}) = \{1, 3, 4, 5, 6, 7, 8, 9\}$, is substantial, either method produces reasonable results. The resulting beliefs differ in that the bpa produced by the modified discount & combine method assigns weight to subsets of the union of the selection sets, while Dempster's rule gives weight only to subsets of the intersection of the selection sets.

Table 4.7: Combined decision-sets belief function values for solution sets union.

Objective Values   Knapsack Assignment                  Dempster's   mod-D&C
                   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
112 109            1  0  0  1  0  1  1  0  1  0         0.4104       0.3144
110 106            0  0  0  1  1  1  0  1  1  0         0.4043       0.2982
109 107            0  0  0  1  0  1  1  1  1  0         0.4395       0.3232
108 103            1  0  0  1  1  1  0  0  0  0         0.2431       0.1651
108 102            1  1  0  0  0  1  0  0  1  0         0.2899       0.1793
108 103            0  0  1  1  1  1  0  0  1  0         0.3675       0.2608
107 104            1  0  0  1  0  1  1  0  0  0         0.2768       0.1969
107  92            0  1  0  1  0  1  0  0  1  0         0.3118       0.1914
107  92            0  1  0  1  0  0  1  0  1  0         0.1867       0.1331
107 104            0  0  1  1  0  1  1  0  1  0         0.4055       0.2982
106 111            0  0  1  0  1  1  0  1  1  0         0.3795       0.2827
105 101            0  0  0  1  1  1  0  1  0  0         0.2681       0.1803
105 100            0  1  0  0  0  1  0  1  1  0         0.3192       0.1933
105 112            0  0  1  0  0  1  1  1  1  0         0.4108       0.3045
103 109            1  0  1  0  0  1  1  0  0  0         0.2495       0.1714
101 106            0  0  1  0  1  1  0  1  0  0         0.2466       0.1691
101 105            1  0  0  0  1  0  0  1  1  0         0.1959       0.1415
100 107            0  0  1  0  0  1  1  1  0  0         0.2735       0.1840
100 106            1  0  0  0  0  1  0  1  1  0         0.3543       0.2234
100 106            1  0  0  0  0  0  1  1  1  0         0.2226       0.1586
 98 103            1  0  1  0  0  1  0  0  1  0         0.3215       0.2010
4.3.2 Computing Persistence from Multiple Objectives
Let $X^{K_1}$, $X^{K_2}$ be solution sets for two objectives from which their persistences are computed. One could use $X^K = X^{K_1} \cup X^{K_2}$ (where $K$ is the number in their union), in which case nothing new is needed. However, one may intuit that solutions in their intersection are more robust against uncertainties.
Definition 4.8. Two solution sets are in total conflict if $X^{K_1} \cap X^{K_2} = \emptyset$. Otherwise, they are compatible.
If they are compatible, we could reduce to their common solutions, $X^{K_1} \cap X^{K_2}$, from which persistence values can be computed to yield a robust solution. Or, we could use the multiset union, letting $K = K_1 + K_2$, so that more weight is given to common solutions, and compute persistence using the combined set. However, if the solution sets are in total conflict, the multiset union simply appends one list onto the other, and we must seriously question whether the resulting persistences are meaningful. Moreover, when the solution sets are in total conflict, the approach used in Section 4.3.1 is generally not appropriate:
- Dempster's rule fails because $1 - c = 0$.
- Yager's rule fails because $m(A) = 0$ for all $A \neq \Omega$.
- Dubois & Prade's rule and the discount & combine rule do not produce meaningful results for belief functions, such as the singleton-only bpa, whose focal elements are solution vectors (but may give meaningful results for the decision-sets belief function, since its focal elements are individual decisions).
For the remainder of this section, we suppose at least some subset of the objectives are in total conflict and investigate how to combine the evidence their solution sets provide. For two objectives, let $X^K = X^{K_1} \cup X^{K_2}$ and compute the persistences from $X^K$, where $K = K_1 + K_2$ since $X^{K_1} \cap X^{K_2} = \emptyset$.
For $M$ objectives, where $M > 2$, our first phase is to partition the collection of solution sets (1 set per objective) into groups such that each group has a non-empty intersection. (In particular, two objectives in conflict result in two groups.) Among such partitions, choose a minimum number of groups. The idea is to represent each of the groups by the solutions in its intersection, and focus on the situation where the collection is in total conflict: every inter-group intersection is empty.
First, we must consider how to allocate groups. To help fix ideas, suppose $M = 3$ and
$$X^{K_1} = \{x^1, x^2\}, \quad X^{K_2} = \{x^2, x^3\}, \quad X^{K_3} = \{x^3, x^4\}. \tag{4.5}$$
There are two possible allocations that minimize the number of groups:
$$G_1^1 = \{X^{K_1}, X^{K_2}\}, \quad G_2^1 = \{X^{K_3}\} \tag{4.6a}$$
$$G_1^2 = \{X^{K_1}\}, \quad G_2^2 = \{X^{K_2}, X^{K_3}\} \tag{4.6b}$$
We require that the groups satisfy the following properties:
(P1) Assignment. Every solution set is assigned to exactly one group.
(P2) Inter-group conflict. $G \neq G' \Rightarrow X(G) \cap X(G') = \emptyset$.
(P3) Intra-group compatibility. $X(G) \neq \emptyset$.
We could use the assignment model with conflict constraints to form groups that satisfy the first two conditions. However, the intra-group compatibility condition is more difficult because we could have all pairs of sets compatible but not have a complete non-empty intersection.
Let $a_{mg} = 1$ if objective set $X^{K_m}$ is assigned to group $g$, so the constraints of a conflict assignment model with $N_g$ groups are
$$\sum_{g=1}^{N_g} a_{mg} = 1 \ \text{ for } m \in \{1, \ldots, M\}, \tag{4.7a}$$
$$\sum_{i=1}^{t} a_{m_i g} \leq t - 1 \ \text{ for } g \in \{1, \ldots, N_g\} \text{ and } (m_1, \ldots, m_t) \in \mathcal{C}_t \text{ with } t = 2, \ldots, M, \tag{4.7b}$$
where $\mathcal{C}_t = \{(m_1, \ldots, m_t) : X^{K_{m_1}} \cap \cdots \cap X^{K_{m_t}} = \emptyset\}$. The first constraint (4.7a) ensures that every solution set is assigned to exactly one group (P1), while the second (4.7b) enforces the requirement that every group have a non-empty intersection (P3).
The requirement that every inter-group intersection is empty (P2) is enforced not by the model constraints, but instead results from optimization. If we minimize $N_g$ for $a \in \{0,1\}^{M \times N_g}$ by seeking a solution for $N_g = 1, 2, \ldots, M$, and stopping the first time a feasible solution is found, the resulting assignment must be such that every inter-group intersection is empty; otherwise we could find an assignment of $N_g - 1$ groups that would be feasible.
To illustrate this assignment model, consider the $M = 3$ solution sets from (4.5). In this example, we have only one conflict set, $\mathcal{C}_2 = \{(1,3)\}$. So, for $N_g = 1$, (4.7b) requires that $a_{11} + a_{31} \leq 1$, but (4.7a) gives $a_{11} = a_{21} = a_{31} = 1$.
Hence, the assignment is infeasible for $N_g = 1$ and we must increase $N_g$. When $N_g = 2$, the assignment constraints are:
$$\text{(4.7a)} \begin{cases} a_{11} + a_{12} = 1 \\ a_{21} + a_{22} = 1 \\ a_{31} + a_{32} = 1 \end{cases} \quad\text{and}\quad \text{(4.7b)} \begin{cases} a_{11} + a_{31} \leq 1 \\ a_{12} + a_{32} \leq 1 \end{cases}$$
In this case, there are two distinct feasible assignments. First, in order to break symmetries, we fix $a_{11} = 1$. This forces $a_{12} = 0$ and $a_{31} = 0$; then $a_{31} = 0$ forces $a_{32} = 1$, giving $X^{K_1} \in G_1^*$ and $X^{K_3} \in G_2^*$. Finally, the only restriction we have on $a_{21}$ and $a_{22}$ is that $a_{21} + a_{22} = 1$, so either $a_{21} = 1$ ($\Rightarrow a_{22} = 0$) or $a_{22} = 1$ ($\Rightarrow a_{21} = 0$) is feasible. Hence, both groupings, (4.6a) and (4.6b), represent optimal solutions to the assignment model.
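For small $M$, the minimal grouping can also be found by brute force rather than by the integer program; the sketch below (names assumed) enumerates assignments for $N_g = 1, 2, \ldots, M$, stops at the first feasible value, and checks (P1) and (P3) directly, with (P2) then following from minimality as argued above.

```python
from itertools import product

def min_groups(sets):
    """Brute-force grouping: smallest Ng such that every group's
    intersection is non-empty (P3); each set joins exactly one group (P1)."""
    M = len(sets)
    for Ng in range(1, M + 1):
        feasible = []
        for assign in product(range(Ng), repeat=M):
            groups = [[sets[m] for m in range(M) if assign[m] == g]
                      for g in range(Ng)]
            if any(not grp for grp in groups):       # skip empty groups
                continue
            inters = [frozenset.intersection(*grp) for grp in groups]
            if all(inters):                          # (P3) holds
                feasible.append(inters)
        if feasible:
            return Ng, feasible
    return M, []

X1, X2, X3 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
ng, groupings = min_groups([X1, X2, X3])
```

For the sets of (4.5) this returns $N_g = 2$, with the groupings of (4.6a) and (4.6b) each appearing twice (once per labeling of the two groups).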
Once we have assigned the objectives to groups, the second phase is to let
each group be represented by the intersection of its members:

$$X(G) = \bigcap_{m \in G} X^{K_m}.$$

Thus, in our previous example we have:

$$X(G_1^1) = \{x^2\}, \quad X(G_2^1) = \{x^3, x^4\}, \tag{4.8a}$$

$$X(G_1^2) = \{x^1, x^2\}, \quad X(G_2^2) = \{x^3\}. \tag{4.8b}$$

Finally, we construct a merged solution set by taking the union of the groups
and compute persistence values from the resulting list. Thus, in our example, we
would compute persistence values from either:

$$X^1 = \{x^2, x^3, x^4\} \quad\text{or}\quad X^2 = \{x^1, x^2, x^3\}.$$
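The two-phase procedure can be made concrete with a minimal Python sketch. Here `group_and_merge` is an illustrative helper, the set labels mirror the running example, and brute-force enumeration over assignments stands in for solving the 0-1 assignment model (4.7):

```python
from itertools import product

def group_and_merge(solution_sets):
    """Phase 1: find the minimum number of groups, Ng, such that every
    solution set joins exactly one group (P1) and each non-empty group
    has a common element (P3); conflicting sets end up in different
    groups (P2) as a by-product of minimizing Ng.
    Phase 2: represent each group by the intersection of its members,
    then merge the group representatives by union."""
    M = len(solution_sets)
    for Ng in range(1, M + 1):
        # Brute force stands in for the 0-1 assignment model (4.7).
        for assign in product(range(Ng), repeat=M):
            groups = [[solution_sets[m] for m in range(M) if assign[m] == g]
                      for g in range(Ng)]
            # (4.7b)/P3: every used group must have a common element.
            if all(not gs or set.intersection(*gs) for gs in groups):
                reps = [set.intersection(*gs) for gs in groups if gs]
                return Ng, reps, set.union(*reps)

# The M = 3 solution sets of the running example (illustrative labels):
sets = [{'x1', 'x2'}, {'x2', 'x3'}, {'x3', 'x4'}]
Ng, reps, merged = group_and_merge(sets)
print(Ng, sorted(merged))   # 2 ['x2', 'x3', 'x4']
```

Stopping at the first feasible $N_g$ reproduces the optimization argument above: the conflict between the first and third sets makes $N_g = 1$ infeasible, and the first feasible assignment found for $N_g = 2$ yields the grouping (4.6a).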
4.4 Persistence and Generalized Information Theory
In the rest of this chapter, we discuss some relations between persistence as
evidence and information theory. We begin by showing a relationship between
optimizing with respect to global persistence and minimizing a Hamming dis-
tance. We build upon these results by further relating persistence to generalized
information theory and investigating a relationship between the size of the solu-
tion set, XK, and entropy. Finally, we conclude this section with a discussion of
a counterintuitive, and seemingly paradoxical, relationship between uncertainty
and the quantity of evidence considered that is suggested by our results.
4.4.1 Hamming Distance
In (2.7) and (2.8), we generated the global persistence values for selections
and rejections by counting the number of elements in the respective multisets.
An alternative method of generation, which offers some notational convenience,
constructs the persistence vector directly from the $K$ acceptable solutions as
0-1 vectors. Hence, we observe that

$$p = \begin{pmatrix} \gamma(1) \\ \vdots \\ \gamma(n) \end{pmatrix} = \frac{1}{K}\sum_{k=1}^{K} x^k \quad\text{and}\quad \bar{p} = \begin{pmatrix} \bar\gamma(1) \\ \vdots \\ \bar\gamma(n) \end{pmatrix} = \frac{1}{K}\sum_{k=1}^{K} (1 - x^k), \tag{4.9}$$

where $\bar{p} = 1 - p$ is the persistence vector of rejections. Using this notation,
we can then write the total global persistence of selections and rejections for a
vector $x \in \{0,1\}^n$ as

$$\sum_{s:\, x_s = 1} \gamma(s) = p^T x \quad\text{and}\quad \sum_{r:\, x_r = 0} \bar\gamma(r) = \bar{p}^T (1 - x). \tag{4.10}$$
Since solutions are 0-1 vectors, we can relate global persistence to Hamming
distance:

$$H(x, y) = \sum_i |x_i - y_i|.$$

Define the total combined global persistence of a vector $x \in \{0,1\}^n$ as:

$$\psi(x) = \sum_{s:\, x_s = 1} \gamma(s) + \sum_{r:\, x_r = 0} \bar\gamma(r) = p^T x + \bar{p}^T (1 - x). \tag{4.11}$$

Then, Theorem 4.10 shows that a vector, $x$, that maximizes total global persis-
tence also minimizes the sum of Hamming distances, $\sum_{k=1}^{K} H(x, x^k)$.

Lemma 4.9. $H(x, y) = x^T(1 - y) + (1 - x)^T y$.

Proof: $H(x, y) = |\{i : x_i = 1 \wedge y_i = 0\}| + |\{i : x_i = 0 \wedge y_i = 1\}|
= x^T(1 - y) + (1 - x)^T y$. $\square$
Theorem 4.10. Given a collection of solution vectors, $\{x^k\}$, for $k = 1, \ldots, K$,

$$\operatorname*{argmax}_{x \in X} \psi(x) = \operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} H(x, x^k)$$

for $\emptyset \ne X \subseteq \{0,1\}^n$.

Proof: Using Lemma 4.9, we have

$$\operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} H(x, x^k) = \operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} \left[ x^T(1 - x^k) + (1 - x)^T x^k \right]$$

$$= \operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} \Big[ x^T(1 - x^k) + \underbrace{1^T x^k}_{\text{constant}} - x^T x^k \Big]$$

$$= \operatorname*{argmax}_{x \in X} \sum_{k=1}^{K} \left[ x^T x^k - x^T(1 - x^k) \right].$$

Substituting the summations from (4.9) into (4.11), we have

$$\operatorname*{argmax}_{x \in X} \psi(x) = \operatorname*{argmax}_{x \in X} \Bigg[ \sum_{s:\, x_s = 1} \gamma(s) + \sum_{r:\, x_r = 0} \bar\gamma(r) \Bigg]$$

$$= \operatorname*{argmax}_{x \in X} \Bigg[ x^T \Big( \frac{1}{K}\sum_{k=1}^{K} x^k \Big) + (1 - x)^T \Big( \frac{1}{K}\sum_{k=1}^{K} (1 - x^k) \Big) \Bigg]$$

$$= \operatorname*{argmax}_{x \in X} \sum_{k=1}^{K} \left[ x^T x^k + (1 - x)^T (1 - x^k) \right]$$

$$= \operatorname*{argmax}_{x \in X} \sum_{k=1}^{K} \Big[ x^T x^k + \underbrace{1^T (1 - x^k)}_{\text{constant}} - x^T (1 - x^k) \Big]$$

$$= \operatorname*{argmax}_{x \in X} \sum_{k=1}^{K} \left[ x^T x^k - x^T(1 - x^k) \right]$$

$$= \operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} H(x, x^k). \qquad\square$$
(Aside: The only property of the $L_1$ norm we use is that $|x - y| \in \{0, 1\}$ for
$x, y \in \{0, 1\}$. Since $|x - y|^p = |x - y|$ in this case, the Theorem and Corollary
extend to minimizing the sum of distances under any $L_p$ norm raised to the
$p$-th power. Since we do not use the other norms explicitly, we continue with
$H$ defined by $L_1$.)
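Theorem 4.10 is easy to check numerically by brute force. The following sketch uses illustrative names and random 0-1 solution vectors; it scales $\psi$ by $K$ to keep the arithmetic integral, and also verifies the identity $K\psi(x) + \sum_k H(x, x^k) = Kn$ that drives the proof:

```python
import itertools
import random

random.seed(1)
n, K = 6, 5
xs = [[random.randint(0, 1) for _ in range(n)] for _ in range(K)]

# Integer persistence counts: cnt[i] = K * p_i, the number of solutions
# selecting decision i, as in (4.9).
cnt = [sum(xk[i] for xk in xs) for i in range(n)]

def psi_scaled(x):
    """K * psi(x): total combined global persistence (4.11), kept integral."""
    return sum(cnt[i] if x[i] else K - cnt[i] for i in range(n))

def ham_sum(x):
    """Sum of Hamming distances from x to the K acceptable solutions."""
    return sum(sum(xi != yi for xi, yi in zip(x, xk)) for xk in xs)

X = list(itertools.product([0, 1], repeat=n))   # X = {0,1}^n
best_psi = max(X, key=psi_scaled)
best_ham = min(X, key=ham_sum)

# Identity behind the proof: K*psi(x) + sum_k H(x, x^k) = K*n for all x,
# so maximizing psi is exactly minimizing the Hamming sum.
assert all(psi_scaled(x) + ham_sum(x) == K * n for x in X)
assert psi_scaled(best_psi) == psi_scaled(best_ham)
```

The identity makes the equivalence exact rather than approximate: any maximizer of $\psi$ over $X$ attains the minimum Hamming sum, regardless of ties.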
A special case of the relation between global persistence and Hamming dis-
tances arises when we have a constant-sum constraint, $\sum_i x_i = N$. First, since
$\bar{p} = 1 - p$, we can write total global persistence (4.11) as:

$$\psi(x) = p^T x + \bar{p}^T (1 - x)$$
$$= p^T x + (1 - p)^T (1 - x)$$
$$= p^T x + 1^T 1 - 1^T x - p^T 1 + p^T x$$
$$= 2 p^T x - 1^T x + 1^T 1 - 1^T p$$
$$= (2p - 1)^T x + n - \|p\|_1.$$

Note that the last two terms are constant with respect to $x$. Moreover, under
the constant-sum constraint, $1^T x = N$, we have:

$$\operatorname*{argmax}_{x \in X} \psi(x) = \operatorname*{argmax}_{x \in X} \left[ 2 p^T x - 1^T x + n - \|p\|_1 \right]$$

$$= \operatorname*{argmax}_{x \in X} \Big[ 2 p^T x \underbrace{- N + n - \|p\|_1}_{\text{constant}} \Big]$$

$$= \operatorname*{argmax}_{x \in X} p^T x$$

$$= \operatorname*{argmax}_{x \in X} \sum_{s:\, x_s = 1} \gamma(s).$$
This is simply the total global persistence of selections. We have thus shown
the following.
Corollary 4.11. Given a collection of solution vectors, $\{x^k\}$, for $k = 1, \ldots, K$,

$$\operatorname*{argmax}_{x \in X} \sum_{s:\, x_s = 1} \gamma(s) = \operatorname*{argmin}_{x \in X} \sum_{k=1}^{K} H(x, x^k)$$

for $\emptyset \ne X \subseteq \left\{ x \in \{0,1\}^n : x^T 1 = N \right\}$.
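Corollary 4.11 admits the same kind of brute-force check, now restricting $X$ to the constant-sum slice. This sketch uses illustrative random data and integer persistence counts (the constant-sum restriction applies to the candidate set $X$; the given solutions are made constant-sum here only for symmetry):

```python
import itertools
import random

random.seed(2)
n, K, N = 6, 5, 3
# K random solutions, each selecting exactly N of the n items.
chosen = [random.sample(range(n), N) for _ in range(K)]
xs = [[1 if i in c else 0 for i in range(n)] for c in chosen]

cnt = [sum(xk[i] for xk in xs) for i in range(n)]   # K * gamma(i), integral

def sel_persistence(x):
    """Total global persistence of selections only, scaled by K."""
    return sum(cnt[i] for i in range(n) if x[i])

def ham_sum(x):
    return sum(sum(xi != yi for xi, yi in zip(x, xk)) for xk in xs)

# X restricted to the constant-sum slice 1^T x = N:
X = [x for x in itertools.product([0, 1], repeat=n) if sum(x) == N]
best_sel = max(X, key=sel_persistence)
best_ham = min(X, key=ham_sum)
assert sel_persistence(best_sel) == sel_persistence(best_ham)
assert ham_sum(best_sel) == ham_sum(best_ham)
```

On the slice, the Hamming sum is an affine decreasing function of the selection persistence (it equals a constant minus twice the scaled persistence), so the two criteria pick out exactly the same solutions.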
4.4.2 Entropy
Yager [33] generalized Shannon's measure of information in the DST (also
see Lamata and Moral [21] and Klir [17, 18, 19]):

$$E(m) = -\sum_{x \in X} m(x) \log_2 \mathrm{Plaus}(x),$$

where $\mathrm{Plaus}(x) = 1 - \mathrm{Bel}(\neg x)$ is the plausibility of $x$, and
$m(x) \log_2 \mathrm{Plaus}(x) \stackrel{\mathrm{def}}{=} 0$ when $\mathrm{Plaus}(x) = 0$. Using any Bayesian structure (notably, (4.1)), Yager showed

$$E(m) = -\sum_{x \in X} m(x) \log_2 m(x), \tag{4.12}$$

where $0 \log_2 0 \stackrel{\mathrm{def}}{=} 0$. It is well known that the entropy function is strictly concave.
This follows from the strict convexity of each term $v \log_2 v$ in the negated sum:

$$\frac{d^2 (v \log_2 v)}{dv^2} = \frac{1}{v \ln 2} > 0 \quad\text{for } v > 0.$$
Let $m_s$ be the bpa for considering only selections, and let $m_r$ be the bpa for
considering only rejections. Henceforth, we assume: $x_j = 1$ for some decision $j$
and solution $x \in X^K$; $x_j = 0$ for some decision $j$ and solution $x \in X^K$; and
$X \supseteq X^K$. (We could have $X \supsetneq X^K$; for instance, $X$ could be equal to the entire
feasible region when we consider choosing a solution that minimizes the sum of
Hamming distances to $X^K$ or, equivalently, maximizes total persistence.) Then,

$$K_s = \sum_{x \in X} p^T x > 0 \quad\text{and}\quad K_r = \sum_{x \in X} (1 - p)^T (1 - x) > 0,$$

so for $x \in X$:

$$m_s(x) = \frac{p^T x}{K_s} \quad\text{and}\quad m_r(x) = \frac{(1 - p)^T (1 - x)}{K_r}.$$
Theorem 4.12. For $X = X^K = \{x^1, \ldots, x^K\}$,

$$m_s(x) = \frac{p^T x}{K \|p\|^2} \quad\text{and}\quad m_r(x) = \frac{(1 - p)^T (1 - x)}{K \|1 - p\|^2}.$$

Proof: Since $p = \frac{1}{K}\sum_{k=1}^{K} x^k$, we have

$$K_s = \sum_{x \in X^K} p^T x = \sum_{k=1}^{K} p^T x^k = p^T K p = K \|p\|^2,$$

$$K_r = \sum_{x \in X^K} (1 - p)^T (1 - x) = \sum_{k=1}^{K} (1 - p)^T (1 - x^k) = (1 - p)^T K (1 - p) = K \|1 - p\|^2. \qquad\square$$
Let $m = \alpha m_s + (1 - \alpha) m_r$ for some $\alpha \in [0, 1]$. In particular, the bpa defined
in (4.1) corresponds to choosing $\alpha = \dfrac{K_s}{K_s + K_r}$. The belief function is the same
convex combination: $\mathrm{Bel}(x) = \alpha\, \mathrm{Bel}_s(x) + (1 - \alpha)\, \mathrm{Bel}_r(x)$. However, the entropy
is generally not the same linear combination, as shown in the following.
Theorem 4.13. Suppose $E(m) = \beta E(m_s) + (1 - \beta) E(m_r)$. Then,

$$E(m_s) > E(m_r) \Rightarrow \beta > \alpha \quad\text{and}\quad E(m_s) < E(m_r) \Rightarrow \beta < \alpha.$$

Proof: From the strict concavity of $E$,

$$\beta E(m_s) + (1 - \beta) E(m_r) = E(m) = E(\alpha m_s + (1 - \alpha) m_r) > \alpha E(m_s) + (1 - \alpha) E(m_r)$$

$$\Rightarrow\ \beta \left( E(m_s) - E(m_r) \right) > \alpha \left( E(m_s) - E(m_r) \right).$$

This yields the desired result. $\square$
Consider the simple example:

$$X^K = \left\{ (1, 0, 0)^T,\ (0, 1, 0)^T \right\}, \qquad X = X^K \cup \left\{ (0, 0, 0)^T \right\}.$$

The maximum-persistence solution is $x = x^3 = 0$, with minimum sum of Ham-
ming distances over $X$: $\mathcal{H}(0, X) = 2$. (Note that $\mathcal{H}(x^k, X) = 3$ for $k = 1, 2$.) How
much evidence is there for the added solution, $x = 0$? There is clear evidence
that $x_3 = 0$, and there is some evidence that $x_2 = 0$ (by $x^1$) and that $x_1 = 0$
(by $x^2$). The evidence is persistence.

We have $p = (\frac{1}{2}, \frac{1}{2}, 0)^T$, so $m_s = (\frac{1}{2}, \frac{1}{2}, 0)^T$ and $m_r = (0.3, 0.3, 0.4)^T$ over $X$.
The entropies are $E(m_s) = 1$ and $E(m_r) = 1.571$. This suggests that there is
more uncertainty when measuring by rejections.
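These numbers are easy to reproduce. The sketch below assumes the general normalization $m_s(x) = p^T x / K_s$ and $m_r(x) = (1-p)^T(1-x) / K_r$ with the sums taken over $X$, and evaluates the entropy (4.12):

```python
from math import log2

# The example: X^K = {x1, x2}, and X adds the zero vector.
xs_K = [(1, 0, 0), (0, 1, 0)]
X = [(1, 0, 0), (0, 1, 0), (0, 0, 0)]
K, n = len(xs_K), 3

# Persistence vector (4.9): p = (1/2, 1/2, 0).
p = [sum(x[i] for x in xs_K) / K for i in range(n)]

# bpa's from selections and rejections, normalized over X.
sel = [sum(p[i] * x[i] for i in range(n)) for x in X]
rej = [sum((1 - p[i]) * (1 - x[i]) for i in range(n)) for x in X]
m_s = [v / sum(sel) for v in sel]
m_r = [v / sum(rej) for v in rej]

def entropy(m):
    """Shannon/Yager entropy (4.12), with 0 * log2(0) taken as 0."""
    return -sum(v * log2(v) for v in m if v > 0)

print(m_s)                      # [0.5, 0.5, 0.0]
print(m_r)                      # [0.3, 0.3, 0.4]
print(entropy(m_s))             # 1.0
print(round(entropy(m_r), 3))   # 1.571
```

The rejection bpa spreads mass over all three candidates, which is exactly why its entropy exceeds that of the selection bpa.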
A way to think of this is to consider the situation where $x = (?, ?, ?)$ as
we gain the information necessary to progressively fill in the ?s. If we learn
$x_j = 1$, we are done: the other coordinates must be zero. However, if we learn
$x_j = 0$, we are not done, because the other coordinates could still be 0 or 1. If
we ask "$x_1 = 1$?" first, then ask "$x_2 = 1$?", the average number of questions to
determine $x$ is $\frac{1}{2} \times 1 + \frac{1}{2} \times 2 = 1.5$. On the other hand, if we ask "$x_1 = 0$?" first,
then ask "$x_2 = 0$?", the average number of questions is $0.3 \times 1 + 0.7 \times 2 = 1.7$.
Figure 4.1 shows the situation.
[Figure 4.1: Question flow charts and information gain. Branch weights: $m_s(x^1) = \frac{1}{2}$, $m_s(\neg x^1) = \frac{1}{2}$; $m_r(x^1) = 0.3$, $m_r(\neg x^1) = 0.7$.]
Consider the knapsack example with and without dominated solutions,
shown in Table 2.1, and let $X = X^K$ in each case. We show the resulting
bpa and belief values for both selections only and rejections only, as well as
the combined values, in Table 4.8 (along with the associated $\alpha$, $\beta$, and entropy
scores). We see that across all three belief functions, the set with dominated
solutions has more uncertainty (that is, greater entropy values) than its subset
of undominated solutions.
Let $\mathcal{H}^*(X, X^K)$ be the minimum sum of Hamming distances from $X$ to $X^K$:

$$\mathcal{H}^*(X, X^K) = \min_{x \in X} \mathcal{H}(x, X^K) = \min_{x \in X} \sum_{k=1}^{K} H(x, x^k).$$

Theorem 4.14. $Y \subset X^K \Rightarrow \mathcal{H}^*(X, Y) \le \mathcal{H}^*(X, X^K)$, with equality only if
$H(x^*, x^k) = 0$ for all $x^k \in X^K \setminus Y$ and $x^* \in \operatorname*{argmin}_{x \in X} \mathcal{H}(x, X^K)$.

Proof: Since $Y \subset X^K$, $\mathcal{H}(x^*, X^K) = \mathcal{H}(x^*, Y) + \mathcal{H}(x^*, X^K \setminus Y)$. Thus,
$\mathcal{H}(x^*, X^K) \ge \mathcal{H}(x^*, Y) \ge \mathcal{H}(x_Y^*, Y) = \mathcal{H}^*(X, Y)$, where $x_Y^*$ minimizes $\mathcal{H}(x, Y)$
over $X$ and the first inequality is strict if $\mathcal{H}(x^*, X^K \setminus Y) > 0$. $\square$
Theorem 4.14 suggests that it is better to use undominated solutions to
obtain a robust solution that maximizes persistence (for any $\alpha \in [0, 1]$).
This is consistent with using the $X^K$ with lesser entropy, meaning less uncertainty.
More generally, any reduction in the set of candidates leads to a more robust
maximum-persistence solution.
4.4.3 Evidence Paradox
We have seen that having a subset to determine persistence reduces the
uncertainty, but the larger set has more evidence. This is the paradox we address
by considering $X^K = \{x^1, \ldots, x^K\}$ for $K = 1, \ldots, K_{\max}$. Using $X^1$ results in
a lower sum of Hamming distances than using $X^2$, and the entropy, vis-a-vis the
uncertainty, is less. Does this make the solution more robust? Adding $x^2$ as
evidence gives us greater belief in a solution than using just $x^1$, so does $X^2$ yield
a more robust solution? Therein lies the evidence paradox.

To resolve this paradox, we consider the reduction of dominated solutions. In-
tuition suggests that this is better, consistent with the first property: fewer is
better. But that is because we used objective values in the dominance, which
is a measure of merit not included in our computation of persistence. If we
removed the optimal solution from $X^K$, we would believe the conflicting prop-
erty: fewer is worse. Using objective values, however, does not fully reveal the
paradox with respect to robustness. At issue is whether entropy is synonymous
with uncertainty, and whether reduced uncertainty is synonymous with greater
robustness.
This question can be partially settled by recalling that the purpose in con-
sidering a list of solutions is to address uncertainties in the underlying model or
its parameters. The hope is that by considering a larger solution set, we have
greater confidence in discovering the ex-post optimum (that is, the true optimum
once uncertainties have been resolved). In this context, the evidence paradox
says that the solution set should be large enough, but not too large, and should
reflect the uncertainties in the underlying model.
Table 4.8: Sample bpa and belief values with associated entropy scores.
(a) with dominated solutions
$m_s^d(x^k)$   $\mathrm{Bel}_s^d(x^k)$   $m_r^d(x^k)$   $\mathrm{Bel}_r^d(x^k)$   $m^d(x^k)$   $\mathrm{Bel}^d(x^k)$
0.0822 0.0822 0.0698 0.0698 0.0750 0.0750
0.0841 0.0841 0.0711 0.0711 0.0766 0.0766
0.0860 0.0860 0.0725 0.0725 0.0782 0.0782
0.0804 0.0804 0.0684 0.0684 0.0735 0.0735
0.0598 0.0598 0.0725 0.0725 0.0671 0.0671
0.0598 0.0598 0.0725 0.0725 0.0671 0.0671
0.0822 0.0822 0.0698 0.0698 0.0750 0.0750
0.0579 0.0579 0.0711 0.0711 0.0656 0.0656
0.0617 0.0617 0.0739 0.0739 0.0687 0.0687
0.0710 0.0710 0.0807 0.0807 0.0766 0.0766
0.0729 0.0729 0.0629 0.0629 0.0671 0.0671
0.0748 0.0748 0.0643 0.0643 0.0687 0.0687
0.0636 0.0636 0.0752 0.0752 0.0703 0.0703
0.0636 0.0636 0.0752 0.0752 0.0703 0.0703
$\alpha = 0.3317$ ($\beta = 0.639$)
$E(m_s^d) = 3.7935$, $E(m_r^d) = 3.8253$, $E(m^d) = 3.8050$
(Recall that $p^d = [4\ 4\ 4\ 10\ 5\ 13\ 6\ 6\ 11\ 0]^T / 14$.)
(b) without dominated solutions
$m_s^u(x^k)$   $\mathrm{Bel}_s^u(x^k)$   $m_r^u(x^k)$   $\mathrm{Bel}_r^u(x^k)$   $m^u(x^k)$   $\mathrm{Bel}^u(x^k)$
0.2111 0.2111 0.2000 0.2000 0.2053 0.2053
0.1889 0.1889 0.1800 0.1800 0.1842 0.1842
0.2222 0.2222 0.2100 0.2100 0.2158 0.2158
0.2111 0.2111 0.2000 0.2000 0.2053 0.2053
0.1667 0.1667 0.2100 0.2100 0.1895 0.1895
$\alpha = 0.4737$ ($\beta = 0.657$)
$E(m_s^u) = 2.3146$, $E(m_r^u) = 2.3289$, $E(m^u) = 2.3195$
(Recall that $p^u = [1\ 1\ 1\ 5\ 1\ 5\ 3\ 2\ 5\ 0]^T / 5$.)
5. Applications
In the preceding chapters, we developed an approach to optimization under
uncertainty that treats the persistence of individual decisions across a collection
of acceptable solutions as evidence of the robustness of those decisions. Thus
far, our development has focused on a relatively simple 0-1 knapsack problem.
In this chapter, we demonstrate how the new paradigm of persistence yields
robust solutions for substantive applications.
5.1 Sensor Placement
In this section, we investigate the problem of placing sensors in municipal
water networks to detect contaminants that are maliciously or accidentally in-
jected. Following Watson et al. [31] and Carr et al. [7], we are given a water
network and N sensors and must decide to which placements the sensors should
be assigned. Once a sensor is assigned to a placement, it cannot be moved, so
the decision has no recourse.
Let v denote the point at which the contaminant is injected, our primary
source of uncertainty. Given v, each sensor placement assignment x determines
the effect of the contamination relative to that injection point and assignment
as illustrated in Figure 5.1. The actual impact of the contamination is measured
by multiple criteria:
[Figure 5.1: Network flow contamination relative to injection point, showing the injection point ($v$) and the contaminated downstream flow.]

Population Exposed (PE). Any node, $i$, in the network that is downstream
of $v$ is exposed if there is a path with no sensor from $v$ to $i$. The population
of $i$ is a parameter in the model.
Extent of Contamination (EC). The contaminant flows through the system,
contaminating each placement in its path until it reaches a sensor. The
contaminated pipe lengths determine the extent of this contamination.
Volume Consumed (VC). The population continues to drink the water until
the contaminant is detected. The demand rates and the time to detection
determine how much contaminated water is consumed.
Number of Failed Detections (NF). Since, generally, there will not be enough
sensors to completely cover the water network, some attacks will fail to be
detected by any sensor. (Measured as a proportion of all attacks.)
Time Until Detection (TD). The time between the attack and detection of
the contaminant by a sensor, where the time for water (or a contaminant)
to pass from one node to another is a parameter in the model. (Contami-
nation events are presumed to be detected within 24 hours with or without
sensors, so any sensor detection times greater than 24 hours are considered
undetected by the sensors.)
The primary decision variable is the binary assignment:

$$x_i = \begin{cases} 1 & \text{if a sensor is assigned to placement } i; \\ 0 & \text{otherwise.} \end{cases}$$

These are subject to the budget constraint:

$$\sum_i x_i \le N,$$

where $N$ is less than the number of placements.
Since there is no advantage to not placing a sensor, it is optimal to assign
them all, so the budget constraint always holds with equality at optimality. For
the purpose of this discussion we assume the number of sensors, N, is given.
Other than the budget constraint, there is no issue of feasibility; we want our
decision to be robust in the sense that the objective values are as insensitive as
possible to the injection point.
Although the model uses secondary variables to measure each of the objec-
tives, we can simply write each objective as a function of any assignment [32].
Thus, the sensor placement model is given by:

$$\min_{x \in X(N)} \Phi\big(\mathrm{PE}(x, v), \mathrm{EC}(x, v), \mathrm{VC}(x, v), \mathrm{NF}(x, v), \mathrm{TD}(x, v)\big),$$

where

$$X(N) = \Big\{ x \in \{0,1\}^n : \sum_i x_i \le N \Big\},$$

and $\Phi$ is an appropriate function of the objectives, such as a weighted sum. In
this form, the model is not yet well posed, since the objective value depends upon
the uncertain point of injection, $v$.
One model assumes no information about the injection point and prepares
for the worst case:

$$\min_{x \in X(N)} \max_v \Phi\big(\mathrm{PE}(x, v), \mathrm{EC}(x, v), \mathrm{VC}(x, v), \mathrm{NF}(x, v), \mathrm{TD}(x, v)\big).$$
However, analysts have some information about the injection point, which
takes the form of injection weights, $w \ge 0$, where $\sum_v w_v = 1$. One could think of
$w$ as a probability and the objective as an expected value, but it is not necessary
to attribute any probabilistic meaning to these weights.
Since, in the general case, the admissible weights include $w_{\hat v} = 1$ for some $\hat v$
and $w_v = 0$ for $v \ne \hat v$, we can formulate the worst-case model, where we have
no information, as:

$$\min_{x \in X(N)} \max_{w \in S_n} \sum_v w_v\, \Phi\big(\mathrm{PE}(x, v), \mathrm{EC}(x, v), \mathrm{VC}(x, v), \mathrm{NF}(x, v), \mathrm{TD}(x, v)\big),$$

where $S_n$ is the $n$-simplex, $\{w \ge 0 : \sum_v w_v = 1\}$.
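For small instances, both formulations can be sketched directly. In the sketch below the impact values $\Phi(x, v)$ are hypothetical random numbers standing in for the simulated PE, EC, VC, NF, and TD criteria (in practice they come from network simulation, as in the models above), and the placement and scenario names are illustrative:

```python
import itertools
import random

random.seed(3)
n_placements, N = 5, 2
scenarios = ['v1', 'v2', 'v3']   # candidate injection points (hypothetical)
placements = list(itertools.combinations(range(n_placements), N))

# Hypothetical aggregated impacts Phi(x, v), one per (placement set, scenario).
phi = {(x, v): random.uniform(0, 100) for x in placements for v in scenarios}

# Worst-case model: min over placements of max over injection points.
worst_case = min(placements,
                 key=lambda x: max(phi[x, v] for v in scenarios))

# Weighted model: min of sum_v w_v * Phi(x, v) for weights w on the simplex.
w = {'v1': 0.5, 'v2': 0.3, 'v3': 0.2}
weighted = min(placements,
               key=lambda x: sum(w[v] * phi[x, v] for v in scenarios))
```

The worst-case model corresponds to weights concentrated on the single most damaging injection point for each candidate placement, which is exactly the extreme-point argument above; informative bounds on $w$ interpolate between the two.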
Alternatively, as in Carr et al. [7], the information about the weights may
be in the form of bounds, so that w is restricted to