Citation
Theory of disruption

Material Information

Title:
Theory of disruption
Creator:
Simler, Kenneth John
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
2001
English
Physical Description:
50 leaves ; 28 cm

Subjects

Subjects / Keywords:
NP-complete problems ( lcsh )
NP-complete problems ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 46-50).
Thesis:
Computer science
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Kenneth John Simler.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
47832839 ( OCLC )
ocm47832839
Classification:
LD1190.E52 2001m .S55 ( lcc )

Full Text
Theory of Disruption
by
Kenneth John Simler
B.S., Mesa State College 1997
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2001


Simler, Kenneth John (M.S., Computer Science)
Theory of Disruption
Thesis directed by Associate Professor William J. Wolfe
ABSTRACT
Finding solutions to problems dominates most work in computer science.
Regardless of the application or research under development, there is almost always a
problem being investigated. The algorithms and heuristics employed to solve NP-
Complete problems grow constantly from the research of scientists across the globe.
These problems are of intense interest in computer science because of the broad
domain of applications they apply to and the desire to know whether Polynomial (P) equals
Nondeterministic Polynomial (NP). This thesis centers on methods for solving such
problems, particularly on how greedy algorithms act by sorting through the search
space of a problem and, by taking certain steps, arrive at a solution. All of the problems
examined have a sequence that represents their solution. By manipulating this sequence
a given algorithm advances (normally) from a poor starting solution to a local maximum
or minimum. Each step changes the solution to a better one, and the change may be large
or small depending on the step. The Theory of Disruption, discussed in this paper,
argues that a smaller change (a less disruptive step) will, on average, produce a better
solution.
This abstract accurately represents the content of the candidate's thesis. I recommend its
publication.
iii
William J. Wolfe


ACKNOWLEDGMENT
I would like to thank Dr. William Wolfe for the concept of Disruption and his
guidance during the research conducted for this paper. It would never have been
possible without his leadership. I also thank the reviewers, Charlene Long, Tanya
Simler, Carolyn Simler, for their helpful comments on the technical content and
structure of the paper.


FIGURES
Figure
Figure 6.1.1...........................................................19
Figure 6.1.2...........................................................19
Figure 6.1.3...........................................................20
Figure 6.1.4...........................................................20
Figure 6.1.5...........................................................21
Figure 6.1.6...........................................................21
Figure 6.1.7...........................................................22
Figure 6.1.8...........................................................22
Figure 6.1.9...........................................................23
Figure 6.1.10..........................................................23
Figure 6.3.1...........................................................33
Figure 6.3.2...........................................................33
Figure 6.3.3...........................................................33
Figure 6.3.4...........................................................33
Figure 6.3.5...........................................................34
Figure 6.3.6...........................................................34
Figure 6.3.7...........................................................34
Figure 6.3.8...........................................................34
Figure 6.3.9...........................................................35
Figure 6.3.10..........................................................35
Figure 6.3.11..........................................................35
Figure 6.3.12..........................................................35
Figure 7.1 Traveling Salesman Average Scores...........................36
Figure 7.2 Weighted Early Tardy Average Scores.........................37
Figure 7.3 Knapsack Average Scores: Knapsack Size = 1/4.................38
Figure 7.4 Knapsack Average Scores: Knapsack Size = 1/2.................39
Figure 7.5 Knapsack Average Scores: Knapsack Size = 3/4.................40
Figure 7.6 Traveling Salesman Percent Wins.............................41
Figure 7.7 Weighted Early Tardy Percent Wins...........................42
Figure 7.8 Knapsack=1/4 Percent Wins...................................43
Figure 7.9 Knapsack=1/2 Percent Wins...................................44
Figure 7.10 Knapsack=3/4 Percent Wins..................................45
vii


The Weighted Early/Tardy (E/T) Problem lives in the scheduling world and can
be viewed as a simple (controlled environment) factory-scheduling problem. The
problem involves a given number of jobs that must be completed. Each job has a given
length, due date, early penalty and late penalty. The optimal solution is a schedule that
completes all jobs and has the smallest penalty score. The E/T is NP-Complete [18].
It is hard to find an environment in which scheduling does not have some
application. Even narrowing the scope of scheduling such that only the machine/job
problem is considered, there are still interesting problems that fall in this domain. One
such problem is the relocation problem. The relocation problem is to determine a
reconstruction sequence for a set of old buildings, under a limited budget, such that there
is adequate temporary space to house the residents decanted during rehabilitation [25].
This problem can surprisingly be cast as a resource-constrained single-machine
scheduling problem where the object is to minimize the tardy penalty for each job [25].
Although it does not address any early penalties, it is a very similar problem and is NP-
hard [25].
A closely related scheduling problem to the E/T is the E/T with batch delivery
costs factored in. This problem differs from the E/T in that all jobs in a given batch have
a common due date. It follows the philosophy of Just-In-Time production [7]. The slight
difference of having a common due date decision variable for the jobs changes the
problem enough to make it polynomially solvable by a dynamic programming algorithm
[7].
Quite often there are scheduling problems involving several machines with an
associated set of jobs. It is also true that, given multiple machines, there are cases in
which the problem can be considered a single machine problem. One such case is
when one machine acts as a bottle-neck [3]. Research conducted on a scheduling
problem only interested in tardiness penalties used an insertion algorithm for its final
step in determining optimality. This insertion algorithm was not identical to the one
discussed in this paper, but the idea of taking one job and inserting it somewhere else in
the sequence was the same. Although this scheduling problem was not in NP, the use of
the insertion algorithm to solve it is more evidence that it holds some merit for this type
of scheduling problem. The E/T and related problems have many applications to real-
world problems and, as a result, significant attention has been paid to them.
The Standard 0-1 Knapsack problem (SKP) differs from the previous two in that
a solution does not involve every member of the problem but a subset of the members.
This problem has a thief in a jewelry store trying to fill his fixed-sized knapsack with the
most valuable items. Taking part of an item is not allowed; it is either all or none. The
optimal solution is the most valuable knapsack that can be filled from the items. The
SKP is NP-Complete [42].
The SKP has many interesting variations. One such variation allows the thief to have an
expanding knapsack. Thus, its name is the expanding knapsack problem (EKP). It can
be interpreted as a rubber knapsack whose capacity increases the more items it
contains. It has several applications, e.g. in budget control where the dealer gives a trade
discount depending on the number of items purchased ... [31]. This problem is closely
related to the collapsing knapsack problem (CKP) where the knapsack is allowed to
2


would increase the solution space by n!*(n+1). From a computational standpoint this
would mean it now takes n+1 computers to solve the problem in one hour, or it takes
n+1 hours for one computer to solve the problem. Adding another city means it would
take (n+1)*(n+2) hours to solve the problem, etc. Thus, we turn to other methods for
finding solutions.
4


2.3 Knapsack Problem
The knapsack problem addresses the worries faced by a thief. A thief wants to
fill his sack such that the items contained therein are of the most value. He also wants to
make sure the sack is as full as possible. A large valuable item might fill the sack but be
of less worth than taking two medium-sized, medium-valued items. Thus, taking the
most valuable items first does not guarantee the best sack because the item's size is of
importance. Unfortunately for the thief, he will be there when the authorities arrive if he
tries to find the answer. For this paper it is sufficient to know that the problem is NP-
Complete, and a sequence of numbers can be used to represent a solution.
6


3.1 The 2-opt Heuristic
The 2-opt tries all possible ways to invert a given sequence of numbers. It
begins at the first position of a sequence and inverts the first number with the second,
then the first number with the third, first with the fourth, and so on until all possible
inversions from the first position have been accomplished. It then starts with the second
number and inverts it with the third, then second with fourth, and so on. When all
possible inversions have been tried, the best inverted sequence is used for the next
iteration. If no improvement could be made, the heuristic terminates.
Example: 1
Sequence is 1234567890
Positions are 1 and 4
Resulting string after inversion 4321567890
Example: 2
Sequence is 1234567890
Positions are 3 and 9
Resulting string after inversion 1298765430
Example: 3
Sequence is 1234567890
Positions are 1 and 9
Resulting string after inversion 9876543210
This last example shows that some inversions for the TSP problem produce the same
tour and, thus, can be avoided. The starting sequence and the resulting sequence for
example three are the same tour because the beginning and ending values touch with the
TSP problem. By analyzing both sequences it can be seen that each number still has the
exact same neighbors even though the ordering is different. However, in the other two
problems the ends of a sequence do not touch and no duplicates are created.
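To make the loop concrete, the following is a minimal Python sketch of the 2-opt as described above. The thesis gives no code, so the function and parameter names here are illustrative, and a lower-is-better score function is assumed (as for the TSP and E/T; for the knapsack the comparisons would be reversed).

def two_opt(seq, score):
    # Try every segment inversion; keep the best per pass; stop when no inversion improves.
    best = list(seq)
    while True:
        candidate = best
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                # Invert the segment between positions i and j (inclusive).
                trial = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if score(trial) < score(candidate):
                    candidate = trial
        if score(candidate) >= score(best):
            return best          # no inversion improved the sequence
        best = candidate         # the best inverted sequence feeds the next iteration

Inversions that merely reproduce the same tour, as in example 3, simply fail to improve the score and are discarded.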
8


3.3 The Insertion Heuristic
The insertion tries all possible ways to pick one number and insert it in another
position for a given sequence of numbers. It begins at the first position of a sequence
and inserts the first number in the second position, then the first number in the third
position, first in the fourth position and so on until all possible insertions from the first
position have been accomplished. It then starts with the second number and inserts it in
the first position, then the second in the third, and so on. When all possible insertions have
been tried the best sequence is used for the next iteration. If no improvement could be
made, the heuristic terminates.
This heuristic differs from the other two because it must go back each time;
meaning that the other two, once they have handled the first number, never touch it
again because it would produce a duplicate sequence. For example, the pairwise
swapping 1 and 7 would create the same tour if it swapped 7 and 1. This is not the case
with the insertion. Putting 1 in the 9th position is not the same as putting 9 in the 1st
position. As a result, this heuristic checks roughly twice the number of possibilities the
others do. There are two insertions that are skipped in each pass through the sequence.
Inserting a number in its own position is not done, and inserting a number in the position
immediately to the left is not done because it is the same as inserting the left number
with the one immediately to the right which would have already been done.
Example: 1
Sequence is 1234567890
Working number is 1
Position is 4th
Resulting string after insertion 2341567890
Example: 2
Sequence is 1234567890
Working number is 3
Position is 9th
Resulting string after insertion 1245678930
Example: 3
Sequence is 1234567890
Working number is 9
Position is 2nd
Resulting string after insertion 1923456780
All insertions create a unique sequence except the ones mentioned above.
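Again as an illustrative sketch (the names are not the author's), the insertion loop described above can be written as follows; the two redundant insertions noted in the previous paragraph are skipped explicitly.

def insertion(seq, score):
    # Try moving each number to every other slot; keep the best move per pass; stop at a local optimum.
    best = list(seq)
    while True:
        candidate = best
        for i in range(len(best)):
            item = best[i]
            rest = best[:i] + best[i + 1:]    # the sequence with the working number removed
            for j in range(len(best)):        # all n slots of the shortened sequence
                # Skip putting the number back in its own position, and putting it in the
                # position immediately to its left (a move that duplicates one already tried).
                if j == i or j == i - 1:
                    continue
                trial = rest[:j] + [item] + rest[j:]
                if score(trial) < score(candidate):
                    candidate = trial
        if score(candidate) >= score(best):
            return best
        best = candidate

With this indexing, moving the working number 1 to the 4th slot reproduces example 1 above (2341567890), and moving 9 to the 2nd slot reproduces example 3 (1923456780).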
10


performed very well in finding excellent solutions to this problem. Data smoothing
allows a local search heuristic to escape from a poor, local optimum [9]. It does this by
allowing a heuristic to move from a locally good state to an inferior state in hopes that it
will lead to a better state than the first local minimum. There are many different
approaches to implementing smoothing heuristics. One study used eight unique
smoothing heuristics and compared them to the 2-opt, 3-opt and well-known Lin-
Kernighan heuristic.
On the TSPLIB problems, the tours generated
by sequential smoothing were competitive with the
tours generated by two versions of the well-known Lin-
Kernighan heuristic and were better than the tours
generated by standard versions of the two-opt and three-
opt heuristics [9].
In the above paragraph, TSPLIB refers to a well-known library of
TSP test problems. The 2-opt is used here as a standard against which to
compare other algorithms, another testament to its capability with
regard to the TSP.
4.2 Weighted Early Tardy Research
The E/T is a member of the scheduling problem family. This family of
problems is quite large and much effort has been poured into finding good algorithms
and heuristics for it. One reason for such effort lies within the many practical uses
scheduling has in real-world problems. The E/T, even being NP-Complete, is a very
simple scheduling problem in its constituent elements, if not in its solvability.
Many neighborhood strategies have been used to find solutions to this problem
with mixed results. With large problems the neighborhood searches must be restricted
because even n^3 produces far too many possibilities to handle in a reasonable amount
of time. Such constraints reduce the time to solve a problem. However, in most cases
such constraints produce a lower quality solution since fewer candidates are considered
[18].
One research paper discussed two neighborhood search algorithms, namely the
insertion and pairwise interchange. Since both were used in this project, it was of great
interest to read about the observations their utilization produced. In support of the
findings presented herein, the insertion algorithm is known to produce better solutions
on average than the pairwise interchange algorithm. However, as noted in [19], the
combination of the two produces a better solution than the insertion does alone. The
most surprising and rewarding information found was the following comment:
12


The strongly correlated knapsack problem (SCKP) is a variation of the SKP.
The SCKP is defined as the SKP with the variation that the profit of each item
corresponds to its weight plus a fixed constant [33]. One approach to solving such
problems is to cast them as Subset-sum Problems. It can be proven that an optimal
solution to the Subset-sum Problem is also an optimal solution to the original problem
provided that the largest possible number of items is chosen [33]. This approach used
the 2-opt to solve the Subset-sum Problem. A 2-optimal heuristic was able to find exact
solutions with overwhelming probability [33]. The research done in this paper does
not address the Subset-sum Problem, but the 2-optimal heuristic used for this problem
was similar to the pairwise interchange algorithm as defined in this paper. It is
interesting to note that the 2-optimal performs very well and by doing so supports the
research done in this paper, where the pairwise was also the dominant heuristic for this
problem.
14


grew rapidly. The gap between the pairwise algorithm and the 2-opt algorithm grew
slowly but steadily.
5.3 Knapsack Disruption
The Knapsack problem does not use an entire sequence for its solution so it had
to be adjusted to make a complete sequence work as the solution. For information on
how a complete sequence works as a solution, see the section titled Research under the
heading of Knapsack Research (page 32). As with the Weighted Early Tardy
Problem, the knapsack problem required far more effort than the TSP did in determining
a definition of disruption. This process was grueling, and it is still not entirely clear that
the definition given in this paper captures the essence of the problem. The reason will
be explained in a moment. In order to label disruption there must be something to
disrupt. The sequence of numbers cannot be used for obvious reasons; namely, the
disruption of a number sequence has no connection to a given problem, and if used it could
then be used for all problems. Finding a valid disruption definition involves finding the
essence of a problem and defining disruption around it. With the TSP it was the differing
tour legs, and with the E/T it was job positioning, both of which represented something very
unique and essential for each problem. The knapsack problem has items for the taking
and a sack to put them in. To calculate disruption, the following four methods were tried.
1. Count the number of different items that make up the sack and compare that
to the number of items the sack contained on the last iteration. The difference would be
the disruption value.
2. Store the value of the sack and compare the current value with the last value.
This idea was quickly removed because the "score" was not used for a disruption value
on either of the other two problems.
3. Keep track of each item and its position in the sack. The number of different
items and the differing positions would be used to calculate disruption. This was also
thrown out because it tied too closely with the positioning of numbers in the sequence
which as mentioned earlier should not be used.
4. Save the sack size used and check it against the current sack size used. The
difference in the sack sizes would be considered disruption.
Numbers 1 and 4 seemed equally suited for a definition of disruption. However,
number 4 supported the current theory that less disruption yields better solutions and
thus, it was used as the definition. In all fairness, number 1 might be just as qualified or
perhaps more. There is not enough information to say.
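As a rough sketch of definition 4, the chosen measure, assuming the sack is filled from the front of the sequence as described in section 6.3 and that sizes maps each item to its size (the names here are illustrative, not the author's):

def used_capacity(seq, sizes, capacity):
    # Sack capacity actually used when filling from the front until an item no longer fits.
    used = 0
    for item in seq:
        if used + sizes[item] > capacity:
            break
        used += sizes[item]
    return used

def disruption(prev_seq, new_seq, sizes, capacity):
    # Definition 4: disruption is the change in used sack size between consecutive iterations.
    return abs(used_capacity(new_seq, sizes, capacity) -
               used_capacity(prev_seq, sizes, capacity))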
The pairwise algorithm had the least disruption, with the 2-opt second and the
insertion third. For informational purposes and for those interested, if number 1 was
used for the definition of disruption, the insertion was the least disruptive, the pairwise
next and the 2-opt most disruptive. This problem was much like the TSP in that the
16


6.0 Research
In applying the three heuristics to the three problems, it was not obvious
that any one heuristic changed a given sequence of numbers better than another did. It
should be noted that the insertion heuristic checked nearly twice as many possibilities as the other
two, which on the surface hinted it might perform better on average, but such was not
the case. In all of the following, each heuristic was given the same starting sequence of
numbers and proceeded to optimize from there. Data from multiple runs can be
referenced in the graph portion of the paper. The heuristics are rated as first, second, or
third, depending on how they performed in the testing. It should be noted that if two
heuristics reported the same result, then they were given the same place, meaning two
could come in first place if they tied for the highest score or second place if they tied for
the lowest score, etc.
One important observation to make is that all three heuristics have the ability to
swap two adjacent numbers. For example, if the sequence was 1235467890 all of the
heuristics can swap 4 with 5. The 2-opt simply inverts the subsequence 54, the pairwise
always swaps two numbers, and the insertion just takes one of the numbers and puts it in
the other's place. Therefore, each heuristic can fix a problem if it is caused by two
adjacent numbers being out of place.
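This shared move can be checked directly. In the small Python verification below (illustrative only), all three operations turn the example sequence 1235467890 into 1234567890.

seq = [1, 2, 3, 5, 4, 6, 7, 8, 9, 0]

# 2-opt: invert the two-element segment holding 5 and 4.
two_opt_move = seq[:3] + seq[3:5][::-1] + seq[5:]

# Pairwise interchange: swap the two adjacent numbers.
pairwise_move = seq[:]
pairwise_move[3], pairwise_move[4] = pairwise_move[4], pairwise_move[3]

# Insertion: take 5 out and put it back one slot to the right.
insertion_move = seq[:3] + [seq[4]] + [seq[3]] + seq[5:]

assert two_opt_move == pairwise_move == insertion_move == [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]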
6.1 Traveling Salesman Research
When the heuristics were applied to the TSP involving ten cities, the 2-opt and
the insertion performed equally well while the pairwise score was a little better than half
the others. As more cities were added to the problem, a gap between the insertion and 2-
opt began to emerge and then grew rapidly. The pairwise never did compete with the
others, remaining in third place through the entire test. In the ten city test the insertion
came in first 75% of the time; with 20 cities it dropped to 25%; with 30 to 10%; with 40
to 7%, with 50 to 4%, etc. The 100-city test yielded the insertion first 1% of the time.
The 2-opt was in first place 75% at 10 cities, 90% at 20 cities and continued to climb up
to 99% at 70 cities and beyond. The pairwise was first less than 1% of the time after 30
cities were reached. The natural question is why? Given the explanation of the
heuristics, what makes one better than another? As mentioned previously, the theory
boils down to step size or disruption. The heuristic with the smallest step (or least
disruption) dominates. With the TSP problem, a visual helps greatly to understand why
one heuristic is more disruptive than another.
18


Consider the same tour 1237654890. If we apply the pairwise to this tour the optimal
tour will be found in two moves.
Figure 6.1.3
The pairwise swaps two cities. For the first swap, choose 7 and 4. This produces
the following sequence 1234657890 as shown below.
Figure 6.1.4
This swap removed four legs of the tour, 3-7, 4-8, 4-5 & 6-7, and replaced them
with the four new legs, 3-4, 4-6, 5-7 & 7-8. Notice that a new cross was created from
the swapping. This is often the case with the pairwise. If this new cross made the tour
longer, the pairwise would not continue and would have stopped on the last tour. Of the
three heuristics, the pairwise breaks more legs at each step (4), and is, therefore, the
most disruptive. Thus, according to the theory, that is why it performs the worst on the
TSP problem. For the next move, swap cities 5 and 6. This will produce the optimal tour
shown in Figure 6.1.2. The swapping of cities which are adjacent such as 5 and 6 in the
current tour only removes two legs and inserts two new ones, which makes the pairwise
behave like the 2-opt in this special case. The oval configuration is one that all three
heuristics are able to find the optimal tour for, but it's a nice example problem.
20


Figure 6.1.7
The insertion of 5 or 4 into its place yields the optimal tour (see Figure 6.1.2).
To understand the kind of tour configuration problems each heuristic can fix in
one move, consider the following examples. From Figure 5.1 it is known that the 2-opt
fixes crosses one at a time. What do the other two heuristics fix? The pairwise fixes a
double cross in one move as shown by the following tour 1284567390.
Figure 6.1.8
In one move, by swapping cities 8 and 3, the pairwise heuristic can produce the
optimal tour, see Figure 6.1.2. The ability to fix a double cross looks impressive, but
since it cannot fix a single cross without creating another cross (except in special cases)
as shown in Figure 6.1.3, the pairwise is often stuck with a poor tour. The 2-opt and the
insertion can fix a double cross in two moves. However, the insertion will create a
different cross, in this case a spike, as shown in the next figure. If the spike is worse
than the double cross, then the insertion will be stuck. The 2-opt will simply remove the
crosses one at a time with no chance of being stuck with a cross in its tour.
22


The insertion heuristic is able to fix a "spike" in one move. Consider the
following tour 1238456790.
Figure 6.1.9
In one move, by inserting city 8 into its correct position, the insertion heuristic
will produce the optimal tour (see Figure 6.1.2). The 2-opt requires two moves to fix
this kind of problem, and the pairwise requires three.
The following is a run of forty cities with, from left to right, the 2-opt, pairwise
and insertion. Notice that the 2-opt tour does not contain a cross and, in fact, never does.
The insertion tour is not very good but is noticeably better than the pairwise tour and
typically is.
Figure 6.1.10
23


and finish time. Each plays an important role for determining the score of a job
sequence. If any given sequence is taken and examined, finding the optimal job
positions on a given timeline is not an obvious thing to do. The due times for each job
must be checked so their weights can be compared using either the early or the tardy
weight depending on their current finish time. The jobs are then moved accordingly
until the least total penalty is obtained. Several runs of four jobs were taken and
examined when one of the heuristics did not perform as well as one of the others. In
studying the positioning of the jobs, it was hoped that the reason the heuristic did not do
well would present itself. With the TSP, the reason was easily explainable due to the
number of new legs each heuristic created on an iteration. A study of the Knapsack
problem also showed some of the trouble spots each heuristic had challenges with, but
the E/T did not. The best explanation to be offered revolves around the nature of each
heuristic. The 2-opt performs inversions. If a swap or an insertion is needed to improve
the penalty score, then it will be stuck. The same rule applies to the other two heuristics.
If their operations cannot fix the problem, they are forced to stop. This explanation is
obvious and of little help. However, no other patterns have been discovered. The
following represents three different runs of four jobs. In each instance one of the
heuristics had a higher penalty than the optimal. The poor-performing heuristic's score
and job sequence is shown along with the optimal sequence/score and all possible job
sequences that the heuristic could create given its definition. In all cases the possibilities
were worse than the current one; hence, it was stuck with a poor score. In studying the other
job sequences, it is possible to see why the heuristic could not find a better sequence.
Namely, its operations on the job sequence were not sufficient to find a better sequence
in one move. To explain "why" as it has been explained with the other two problems
cannot presently be done.
Key for the following data
ST = Start Time
FT = Finish Time
DT = Due Time
L = Length
EW = Early Weight
TW = Tardy Weight
Score = Total Penalty
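The thesis does not print the scoring formula, but the key above implies the standard weighted earliness/tardiness penalty; as a reconstruction, the score of a schedule of n jobs is presumably

$$ \text{Score} = \sum_{j=1}^{n} \Big( EW_j \cdot \max(0,\ DT_j - FT_j) + TW_j \cdot \max(0,\ FT_j - DT_j) \Big), $$

which reproduces the schedules listed on the following pages; for the first one shown, 1(31-12) + 7(22-19) + 2(26-13) + 1(35-31) = 19 + 21 + 26 + 4 = 70.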
25


Score = 70
ST= 6 FT= 12 DT= 31 L= 7 EW= 1 TW= 9
ST= 13 FT= 19 DT= 22 L= 7 EW= 7 TW= 4
ST= 20 FT= 26 DT= 13 L= 7 EW= 5 TW= 2
ST= 27 FT= 35 DT= 31 L= 9 EW= 2 TW= 1
Score = 116
ST= 2 FT= 8 DT= 22 L= 7 EW= 7 TW= 4
ST= 9 FT= 15 DT= 13 L= 7 EW= 5 TW= 2
ST= 16 FT= 24 DT= 31 L= 9 EW= 2 TW= 1
ST= 25 FT= 31 DT= 31 L= 7 EW= 1 TW= 9
27


Score = 37
ST= 4 FT= 5 DT= 5 L= 2 EW= 2 TW= 6
ST= 12 FT= 19 DT= 26 L= 8 EW= 4 TW= 8
ST= 20 FT= 22 DT= 13 L= 3 EW= 4 TW= 1
ST= 23 FT= 24 DT= 24 L= 2 EW= 5 TW= 5
Score = 186
ST= 17 FT= 18 DT= 24 L= 2 EW= 5 TW= 5
ST= 19 FT= 26 DT= 26 L= 8 EW= 4 TW= 8
ST= 27 FT= 28 DT= 5 L= 2 EW= 2 TW= 6
ST= 29 FT= 31 DT= 13 L= 3 EW= 4 TW= 1
29


6.2.3 Non-Optimal Insertion
Insertion heuristic score = 47
ST= 0 FT= 6 DT= 3 L= 7 EW= 7 TW= 1
ST= 7 FT= 7 DT= 8 L= 1 EW= 9 TW= 8
ST= 8 FT= 15 DT= 8 L= 8 EW= 5 TW= 5
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Optimal score = 17
ST= 0 FT= 7 DT= 8 L= 8 EW= 5 TW= 5
ST= 8 FT= 8 DT= 8 L= 1 EW= 9 TW= 8
ST= 9 FT= 15 DT= 3 L= 7 EW= 7 TW= 1
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Possible sequences
Score = 108
ST= 0 FT= 6 DT= 3 L= 7 EW= 7 TW= 1
ST= 7 FT= 14 DT= 8 L= 8 EW= 5 TW= 5
ST= 15 FT= 16 DT= 17 L= 2 EW= 3 TW= 6
ST= 17 FT= 17 DT= 8 L= 1 EW= 9 TW= 8
Score = 89
ST= 0 FT= 6 DT= 3 L= 7 EW= 7 TW= 1
ST= 7 FT= 14 DT= 8 L= 8 EW= 5 TW= 5
ST= 15 FT= 15 DT= 8 L= 1 EW= 9 TW= 8
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Score = 83
ST= 0 FT= 6 DT= 3 L= 7 EW= 7 TW= 1
ST= 7 FT= 8 DT= 17 L= 2 EW= 3 TW= 6
ST= 9 FT= 9 DT= 8 L= 1 EW= 9 TW= 8
ST= 10 FT= 17 DT= 8 L= 8 EW= 5 TW= 5
Score = 74
ST= 0 FT= 6 DT= 3 L= 7 EW= 7 TW= 1
ST= 8 FT= 8 DT= 8 L= 1 EW= 9 TW= 8
ST= 9 FT= 10 DT= 17 L= 2 EW= 3 TW= 6
ST= 11 FT= 18 DT= 8 L= 8 EW= 5 TW= 5
30


6.3 Knapsack Research
A solution to the knapsack problem consists of a list of items which fit into the
knapsack, not an entire list of items like the other two problems. This presented a small
problem because all three heuristics work with a complete sequence of jobs/cities/items,
as the case may be. Some adjustments were necessary to allow the three heuristics to
produce solutions to this problem, which only needed a subsequence. The adjustment
was made by allowing the heuristics to work like before, taking a complete list of items,
processing them normally, but the knapsack would be filled by starting with the first
item in the sequence and progressing to the second, third and so on until the knapsack
was full. The best, complete sequence was kept and used in the next iteration. When a
better knapsack could not be found, the process was ended.
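A minimal Python sketch of this adjustment (not the author's code; the names are illustrative) scores a complete item sequence by filling the sack from the front and stopping at the first item that does not fit, which matches the worked examples later in this section.

def knapsack_score(seq, values, sizes, capacity):
    # Value of the sack obtained by filling from the front of the sequence.
    # Filling stops at the first item that does not fit, so a too-large first
    # item yields a score of 0, as in Figure 6.3.2.
    used, value = 0, 0
    for item in seq:
        if used + sizes[item] > capacity:
            break
        used += sizes[item]
        value += values[item]
    return value

# The item list of Figure 6.3.1 (top row = values, bottom row = sizes):
values = [2, 6, 9, 3, 8, 4, 8, 7, 1, 0]
sizes  = [2, 1, 2, 6, 4, 5, 1, 2, 8, 9]
assert knapsack_score(list(range(10)), values, sizes, 5) == 17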
The heuristics were run against a battery of tests, 10-100 items, and scored
accordingly. The beauty of this problem was that the pairwise heuristic, which to this
point had not shown potential, dominated the others. The pairwise came in first, the 2-
opt second and the insertion third. However, the gap was not very great between the
three. One possibility for such a close performance is that all values were between 1 and
10 for each item. Thus, with large lists of items there were several items of the same
value. This allows for close scoring. However, the gap is still definite between all three
heuristics.
The knapsack problem had a unique attribute which required more testing than
the others did. The knapsack could be of any size and different sizes could affect the
performance of the heuristics. Three sizes were chosen: 1/4 the total size of all the
items, 1/2 the size of all the items and 3/4 the size of all the items. In each of these runs
the performance of the heuristics was much the same. The insertion performed a little
better with a smaller knapsack but not by much. The term better refers to the insertion
taking second place 100 times more with the 1/4 size knapsack than with the others,
which is a ~35% increase. There were no other obvious comparable differences.
In the 10-item test the pairwise had a marginal but decided lead, and the other
two were within a point of each other. After the 10-item test the gap spread slowly,
definitely and consistently. The pairwise was a clear winner, followed by the 2-opt and
then the insertion. This was a surprising result because it was initially believed that the
insertion heuristic would out-perform the 2-opt and the pairwise because it could place
any item in the front of the sequence. Thus, it should be able to construct an excellent
knapsack. As will be discussed below, there were snags that were not obvious that the
insertion could not handle well. These snags were apparently enough to keep it in third
place. This leads us to the discussion of why one heuristic out-performs another.
Keeping the theory of disruption in mind, the following will illustrate some of the
strengths and weaknesses the heuristics have when used on the knapsack problem.
32


One of the 2-opt's weaknesses comes from moving chunks of a sequence as
opposed to moving individual items in the sequence. Consider the following list of
values and sizes. Sizes are in the lower row.
2 6 9 3 8 4 8 7 1 0
2 1 2 6 4 5 1 2 8 9
Figure 6.3.1
If the knapsack size is 5, then the first three items will fit and fill the sack. The
value of this sack is 17. For an optimal sack items (8/1) and (7/2) need to be in the first
and second positions. Notice that it is impossible for the 2-opt to move the first item out
of the sack without moving at least four other items. If positions 1 and 4 were chosen
and inverted, then the sequence would be as follows.
3 9 6 2 8 4 8 7 1 0
6 2 1 2 4 5 1 2 8 9
Figure 6.3.2
Now the sack score is 0 because the first item is too large to fit in the sack. It
does not matter which positions are chosen to invert; the resulting score for this
problem will not be higher than 17. Thus it cannot improve the sack score and will
terminate. Suppose instead that positions 1 & 3 in Figure 6.3.1 are inverted; the resulting
sequence is as follows.
9 6 2 3 8 4 8 7 1 0
2 1 2 6 4 5 1 2 8 9
Figure 6.3.3
Notice that the score is the same. Hence this move would not be allowed, but
now the higher values are on the left and the lower on the right. This configuration is
exactly what is needed for the 2-opt to excel. It can now move the lower scoring items
out of the sack without affecting the higher valued ones. If positions 2 and 8 are
inverted, then the following sequence is created.
9 7 8 4 8 3 2 6 1 0
2 2 1 5 4 6 2 1 8 9
Figure 6.3.4
This is the optimal sequence and produces a knapsack with a score of 24. The
2-opt needs a starting sequence, or needs to construct a sequence, such that the higher
scores (best items) are always on the left; otherwise it will never be able to move the
lower scores without removing the higher scores as well.
33


Consider the following list of values and sizes when the insertion heuristic is
applied to one scenario of the knapsack problem.
7 6 9 3 8 4 8 2 1 0
2 1 2 6 4 5 1 2 8 9
Figure 6.3.9
If the knapsack is of size five then the first three items will fit in the sack. The
value of the sack will be 22. The seventh item has a value of eight and a size of one.
The optimal sack would have the seventh item in the sack and the second item out. The
insertion heuristic can place an item anywhere in the list of items. If the seventh item is
placed in the first position, the following results.
8 7 6 9 3 8 4 2 1 0
1 2 1 2 6 4 5 2 8 9
Figure 6.3.10
Notice that the first three items still fit in the sack, but the total score is one less
than the previous sack. Since the largest value is in the last position (third in this case),
the (9/2) item is always bumped out of the sack if another item is placed ahead of it in
the sequence. The insertion heuristic cannot create a better sack score in one move. It
would have to move the (9/2) item into the first or second position so that the (6/1) item
would be bumped out of the sack when the (8/1) item was moved into the first position.
If a better score cannot be found in one move, then the heuristic terminates and is left
with a sub-optimal score. If the insertion heuristic builds its sack such that the larger
items are on the right most used portion of the sequence, it will always find sub-optimal
scores because the larger items are always removed from the sack when another item is
placed before them. This is very similar to the 2-opt's problem. If the sequence had
been as follows,
7 9 6 3 8 4 8 2 1 0
2 2 1 6 4 5 1 2 8 9
Figure 6.3.11
Moving the seventh item (8/1) into the first position would remove the 6/1 item
from the sack and produce the optimal sequence shown below.
8 7 9 6 3 8 4 2 1 0
1 2 2 1 6 4 5 2 8 9
Figure 6.3.12
This sequence yields the optimal value of 24.
35


Weighted Early Tardy Average Scores
Figure 7.2
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
Lower scores are best.
All jobs were randomly generated with the following constraints:
Length = 1..10
Tardy Weight = 1..10
Early Weight = 1..10
Due Date = 1..MAXJOBLENGTH*NUMJOBS
The Average Score represents the average penalty for the given number of jobs.
37


Knapsack Average Scores: Knapsack Size = 1/4
Figure 7.3
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
Higher scores are best.
All Items were randomly generated with the following constraints:
Size = 1..10
Value = 1..10
Knapsack Size = 1/4*TotalSizeOfAllTheItems
The Average Score represents the average value of the knapsack
for the given number of items.
38


Knapsack Average Scores: Knapsack Size = 3/4
Figure 7.5
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
Higher scores are best.
All Items were randomly generated with the following constraints:
Size = 1..10
Value = 1..10
Knapsack Size = 3/4*TotalSizeOfAllTheItems
The Average Score represents the average value of the knapsack for the given number of
items.
40


Weighted Early Tardy Percent Wins
Figure 7.7
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
The Percent Wins indicates the percentage a given algorithm had the best score for a
given number of jobs. If the algorithms tied for the best score, they all would be
awarded first place. Thus, it is possible for the sum of the three graph values to exceed
100%, as they do for the 10 job run.
42


Knapsack=l/2 Percent Wins
Figure 7.9
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
The Percent Wins indicates the percentage a given algorithm had the best score for a
given number of items. If the algorithms tied for the best score, they all would be
awarded first place. Thus, it is possible for the sum of the three graph values to exceed
100%, as they do for the 10 item run.
44


References
[1] Adamopoulos, G.I.; and Pappis, C.P. Scheduling jobs with different, job-dependent
earliness and tardiness penalties using the SLK method European Journal of
Operational Research. January 20, 1996, Vol. 25, Issue 3, pp. 336-344.
[2] Arthanari, T.S.; Usha, M. An alternate formulation of the symmetric
traveling salesman problem and its properties Discrete Applied Mathematics.
January 15, 2000, Vol. 98, Issue 3, pp. 173-190.
[3] Ben-Daya, M.; Duffuaa, S.O.; Raouf, A. Minimizing mean tardiness subject to
unspecified minimum number tardy for a single machine European Journal of
Operational Research. February 22, 1996, Vol. 89, Issue 1-2, pp. 100-107.
[4] Boland, N.L.; Clarke, L.W.; Nemhauser, G.L. The asymmetric traveling salesman
problem with replenishment arcs European Journal of Operational Research.
June 1, 2000, Vol. 123, Issue 2, pp. 408-427.
[5] Burkard, Rainer E.; Deineko, Vladimir G. On the traveling salesman problem with a
relaxed Monge matrix Information Processing Letters. September 15, 1998,
Vol. 67, Issue 5, pp. 231-237.
[6] Chen, Dewu; Li, S.; Tang, Guochun. Single Machine Scheduling with
Common Due Date Assignment in a Group Technology Environment
Mathematical And Computer Modelling. February 1997, Vol. 25, Issue 3, pp.
81-90.
[7] Chen, Zhi-Long. Scheduling and common due date assignment with
earliness-tardiness penalties and batch delivery costs European Journal of
Operational Research. August 23, 1996, Vol. 93, Issue 1, pp. 261-264.
[8] Cormen, Thomas H.; Leiserson, Charles E.; and Rivest, Ronald L. Introduction to
Algorithms. Cambridge, Massachusetts: MIT Press, 1990.
[9] Coy, Steven P.; Golden, Bruce L.; Wasil, Edward A. A computational study of
smoothing heuristics for the traveling salesman problem European Journal of
Operational Research. July 1, 2000, Vol. 124, Issue 1, pp. 15-27.
[10] Cucker, Felipe; Shub, Michael. Generalized Knapsack problems and fixed degree
separations Theoretical Computer Science. October 20, 1996, Vol. 166, Issue
1-2, pp. 301-305.
46


[21] Johnston, Robert E.; Khan, Lutfar R. Bounds for nested knapsack problems
European Journal of Operational Research. February 16, 1995, Vol. 81, Issue 1,
pp. 154-165.
[22] Kohli, Rajeev; and Krishnamurti, Ramesh. Joint performance of greedy heuristics
for the integer knapsack problem Discrete Applied Mathematics. January 9,
1995, Vol. 56, Issue 1, pp. 37-48.
[23] Kovalyov, Mikhail Y.; Shafransky, Yakov M. Uniform machine
scheduling of unit-time jobs subject to resource constraints Discrete Applied
Mathematics. May 15, 1998, Vol. 84, Issue 1-3, pp. 253-257.
[24] Kravchenko, Svetlana A. Minimizing the number of late jobs for the two-
machine unit-time job-shop scheduling problem Discrete Applied Mathematics.
January 15, 2000, Vol. 98, Issue 3, pp. 209-217.
[25] Lin, B .M.T.; Cheng, T.C.E. Minimizing the weighted number of tardy jobs and
maximum tardiness in relocation problem with due date constraints European
Journal of Operational Research. July 1, 1999, Vol. 116, Issue 1, pp. 183-193.
[26] Mak, King-Tim; and Morton, Andrew J. Distances between traveling salesman
tours Discrete Applied Mathematics. April 7, 1995, Vol. 58, Issue 3,
pp. 281-291.
[27] Malandraki, Chryssi; and Dial, Robert B. A restricted dynamic programming
heuristic algorithm for the time dependent traveling salesman problem
European Journal of Operational Research. April 5, 1996, Vol. 90, Issue 1,
pp. 45-55.
[28] Marchetti-Spaccamela, A.; Vercellis, C. Stochastic on-line knapsack problems
Mathematical Programming. January 2, 1995, Vol. 68, Issue 1, pp. 73-104.
[29] Park, Kyungchul; Park, Sungsoo. Lifting cover inequalities for the precedence-
constrained knapsack problem Discrete Applied Mathematics. February 7,
1997, Vol. 72, Issue 3, pp. 219-241.
[30] Pattloch, Marcus; Schmidt, Gunter. Lot-size scheduling of two types of jobs on
identical machines Discrete Applied Mathematics. March 7, 1996, Vol. 65,
Issue 1-3, pp. 409-419.
48


[31] Pferschy, Ulrich; Pisinger, David; Woeginger, Gerhard J. Simple but efficient
approaches for the collapsing knapsack problem Discrete Applied
Mathematics. August 22, 1997, Vol. 77, Issue 3, pp. 271-280.
[32] Phillips, Jeffrey Mark; Punnen, Abraham P.; Kabadi, S.N. A linear time
algorithm for the bottleneck traveling salesman problem on a Halin graph
Information Processing Letters. July 30, 1998, Vol. 67, Issue 2, pp. 105-110.
[33] Pisinger, David. A fast algorithm for strongly correlated knapsack problems
Discrete Applied Mathematics. December 1, 1998, Vol. 89, Issue 1-3, pp. 197-
212.
[34] Pisinger, David. A minimal algorithm for the Multiple-Choice Knapsack
Problem European Journal of Operational Research. June 8, 1995, Vol. 83,
Issue 2, pp. 394-410.
[35] Renaud, Jacques; and Boctor, Fayez F. An efficient composite heuristic for the
symmetric generalized traveling salesman problem European Journal of
Operational Research. August 1, 1998, Vol. 83, Issue 2, pp. 571-584.
[36] Righini, Giovanni; and Trubian, Marco. A worst-case analysis of two approximate
algorithms for the asymmetric travelling salesman problem European Journal
of Operational Research. March 16, 1995, Vol. 81, Issue 3, pp. 553-556.
[37] Sipser, Michael. Introduction to the Theory of Computation.
Boston, PWS Publishing Company, 1997.
[38] Stadje, Wolfgang. Selecting jobs for scheduling on a machine subject to failure
Discrete Applied Mathematics. December 8, 1995, Vol. 63, Issue 3, pp. 257-
265.
[39] Steiner, George. Minimizing the number of tardy jobs with precedence
constraints and agreeable due dates Discrete Applied Mathematics. January 10,
1997, Vol. 72, Issue 1-2, pp. 167-177.
[40] Tamir, Arie. Fully polynomial approximation schemes for locating a tree-shaped
facility: A generalization of the knapsack problem Discrete Applied
Mathematics. October 5, 1998, Vol. 87, Issue 1-3, pp. 229-243.
49


[41] de la Vega, W. Fernandez; and Karpinski, M. On the approximation hardness of
dense TSP and other path problems. Information Processing Letters. April 30,
1999, Vol. 70, Issue 2, pp. 53-55.
[42] Verhaegh, W.F.J.; and Aarts, E.H.L. A polynomial-time algorithm for knapsack
with divisible item sizes Information Processing Letters. May 28, 1997, Vol. 62,
Issue 4, pp. 217-221.
[43] Weismantel, Robert. On the 0/1 knapsack polytope Mathematical
Programming. April 1, 1997, Vol. 77, Issue 1, pp. 49-68.
[44] Woeginger, Gerhard J. Sensitivity analysis for knapsack problems:
another negative result Discrete Applied Mathematics. June 1999, Vol. 92,
Issue 2-3, pp. 247-251.
[45] Zhu, Nan. A relation between the knapsack and group knapsack problems
Discrete Applied Mathematics. October 5, 1998, Vol. 87, Issue 1-3, pp. 255-
268.
50


This thesis for the Master of Computer Science
degree by
Kenneth John Simler
has been approved
by
Gita Alaghband
April 5, 2001
Date


DEDICATION
I dedicate this thesis to my wife who has without fail supported me through my
schooling at substantial sacrifice to her time and interests.


CONTENTS
Figures...............................................................vii
Chapter
1.0 Introduction......................................................1
2.0 The NP-Complete Problems..........................................5
2.1 Traveling Salesman Problem........................................5
2.2 Weighted Early Tardy Problem......................................5
2.3 Knapsack Problem..................................................6
3.0 Heuristics Greedy Algorithms..................................7
3.1 The 2-opt Heuristic...............................................8
3.2 The Pairwise Interchange Heuristic................................9
3.3 The Insertion Heuristic..........................................10
4.0 Previous Research................................................11
4.1 Traveling Salesman Research......................................11
4.2 Weighted Early Tardy Research....................................12
4.3 Knapsack Research................................................13
5.0 Theory of Disruption.............................................15
5.1 Traveling Salesman Disruption....................................15
5.2 Weighted Early Tardy Disruption................................ 15
5.3 Knapsack Disruption..............................................16
6.0 Research.........................................................18
6.1 Traveling Salesman Research......................................18
6.2 Weighted Early Tardy Research....................................24
6.2.1 Non-Optimal 2-opt...............................................26
6.2.2 Non-Optimal Pairwise...........................................28
6.2.3 Non-Optimal Insertion..........................................30
6.3 Knapsack Research..............................................32
7.0 Research Charts..................................................36
References..............................................................46
vi


1.0 Introduction
One of the most studied classes of problems in the realm of computer science is
the class known as Nondeterministic Polynomial (NP). All the problems studied in this
paper are drawn from this class and are known to be NP-complete [37]. This paper will
NOT address finding an optimal solution to any NP-complete problem but rather will
show how the step size of an algorithm affects the solution of three unique problems in
an unintuitive way. The following gives an introduction to the three problems along
with a little research history. For those interested, section 4.0 contains information on
previous research conducted on the three problems that relates directly to the research in
this paper. Section 2.0 contains a detailed description of the three problems.
The Traveling Salesperson Problem (TSP) (planar TSP for the purposes of this
paper) is perhaps one of the most celebrated NP-Complete problems [8] and was a prime
candidate for this research project. It consists of a given number of cities where all must
be visited only once (except the first city, because it is also the last). The optimal
solution has the salesman traveling the shortest possible distance. There are many
different variations of the TSP which have been researched and many more heuristics
employed than problem types.
One variation termed the bottleneck TSP (BTSP) is defined as to find a
Hamiltonian cycle (tour) H in an edge weighted graph G such that the largest weight of
edges in H is as small as possible [32]. This variation is also NP-hard but there are
well-known cases, which can be solved in linear time [32].
The TSP and its variations have been studied for many years with no success in
finding a polynomial algorithm. However, another approach to solving these problems
is to investigate special cases that are polynomial time solvable. The goal is to find
good conditions such that for the instance which satisfies the conditions an optimal tour
can be found in polynomial time [13]. Several special cases of the TSP are solvable in
polynomial time, due to special combinatorial structures of the distance matrix [5].
These special problems have been the topic of much research, but their relevance to this
paper is minimal.
As the TSP is easily represented as a graph, many graph algorithms have been
used in solving it. One such approach uses replenishment arcs in the tour. Such arcs
are allowed to make subpaths in the tour. A path is said to be infeasible if its weight
exceeds a specified constant weight limit and it contains no replenishment arcs;
otherwise it is feasible [4]. As defined in the paper a replenishment arc is one that is
parallel to another, meaning two paths to go from city A to city B. Such problems are
encountered in the real world. One such instance is an aircraft routing problem. The
replenishment arcs represent maintenance opportunities [4]. This is one of many
practical problems that the TSP models well.
1


decrease in size. Both problems have been tackled with dynamic programming with
positive results.
The multi-dimensional knapsack problem can be cast in many different ways,
one being a knapsack of knapsacks, each one being a certain size. In the real world such
problems may be known as Trim problems [14] or Cutting Stock Problems [14]. Such
problems may be coined as follows:
Given quantities of goods of different shapes
to be cut from a material that comes in various sizes,
such as lengths and heights, a number of possible
efficient cutting patterns are considered. How much of
each pattern should be cut? The numbers of each
pattern cut represent the decision variables. The
constraints are given by the required quantities and by
the amount of available material of each size. The
objective may be to minimize the cost of the used
material or the amount of produced waste. [14].
As such problems appear in the real world, much effort has been exerted in solving them
and many branch and bound algorithms along with dynamic programming have been
tried with positive results.
The multiple-choice knapsack problem has the SKP as a special case and is,
thus, NP-hard [34]. The problem differs from the SKP in that it has k classes N1, ..., Nk
of items to pack in some knapsack of capacity c. ... the problem is to choose one item
from each class such that the profit sum is maximized without having the weight sum to
exceed c [34]. This problem also appears in the real world, a couple of examples being
Capital Budgeting [34] and Menu Planning [34]. With so many different practical
applications that map well to different variants of the SKP, it is one of the most studied
problems in its family.
Because the three heuristics used in this research project work with sequences of
numbers representing cities, jobs, jewelry, etc., it was necessary to work with NP-
Complete problems that had a sequence of numbers or partial sequence of numbers to
represent their solutions.
The heuristics used, the 2-opt, pairwise interchange and insertion, can be classified
as greedy algorithms. Each of the heuristics manipulates a sequence of numbers and
chooses the best sequence. Best is defined differently for each problem domain. The
manipulation continues until a local minimum or maximum is reached, meaning no
manipulation, given the constraints of the heuristic, produces a better solution.
Since each of the problems is combinatorially explosive, the solution is hiding in
a sea of possibilities that, for moderate to hard problems, is too big to search through,
at least too big to examine every possible solution. Each problem's number of possible
solutions grows exponentially with the number of elements involved.
For example, the TSP requires looking at roughly n! different
tours to find the correct one. If one computer was solving a given problem of n cities
and it was taking 1 hour to find the solution, then adding one more city to the problem
3


2.0 The NP-Complete Problems
2.1 Traveling Salesman Problem (TSP)
In the TSP a salesman must visit a given number of cities. Each city
must be visited once and the salesman must finish at the city he started from. (The first
city is technically visited twice.) The goal is to find the tour through the cities that has a
minimal distance. This problem lends itself nicely to a graph representation. In a
complete graph of the cities, each city is a node and the edges with their weights
represent the distance from one city to another. A solution to this problem can be
represented by a sequence of numbers. Each number corresponds to a city. The
ordering of the numbers represents the order in which the cities are visited. The first and
last numbers in the sequence are considered neighbors because the last city must connect
to the first in order to complete a tour. There are certain configurations that are trivial
for finding the optimal tour. Cities arranged in a circle is one example. For all the test
cases mentioned in this paper, the cities were randomly generated in a 1 x 1 square.
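As a small illustrative sketch (not from the thesis), the score of such a sequence is the total length of the closed tour, including the leg from the last city back to the first; the coordinates below mimic the random 1 x 1 square layout just described.

import math
import random

def tour_length(seq, coords):
    # Length of the closed tour: the last city in the sequence connects back to the first.
    total = 0.0
    for i in range(len(seq)):
        x1, y1 = coords[seq[i]]
        x2, y2 = coords[seq[(i + 1) % len(seq)]]   # wraps around to close the tour
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Ten cities generated uniformly at random in a 1 x 1 square.
coords = [(random.random(), random.random()) for _ in range(10)]
print(tour_length(list(range(10)), coords))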
2.2 Weighted Early Tardy Problem (E/T)
The Weighted Early Tardy Problem is a well-known scheduling problem. The
problem consists of a number of jobs that must be scheduled on a strict timeline. All the
jobs must be scheduled with the goal of minimizing the total penalty. Each job has a
concrete due date that is guaranteed to be on the timeline. The jobs also have a penalty
if they are completed early, a penalty if they are finished late, and a known length of
processing. If the jobs do not conflict, the problem is trivial because all jobs can be
scheduled on their appointed due date. The problem is also trivial if some constraint
forces all the jobs to be late or early. However, for this research project the jobs' due
dates are all on the available timeline. The optimal solution is an ordering of the jobs
such that the combined early and tardy penalties are minimized.
5


3.0 Heuristics Greedy Algorithms
All heuristics used in this research project are greedy algorithms. They fall
under the category of improvement algorithms rather than the category of creation
algorithms. A greedy algorithm always makes the choice that looks best at the moment.
That is, it makes a locally optimal choice in the hope that this choice will lead to a
globally optimal solution [8]. In the context of the problems discussed in this paper
each greedy algorithm (heuristic) starts with a sub-optimal solution (sequence of
numbers) to a problem and improves it until it reaches a local minimum or maximum. A
creation algorithm builds a solution from scratch making the best choice for each
iteration.
The run time for each heuristic's sequence-adjusting part is O(n^2), although
for practical computing the insertion runs nearly twice as long as the others because its
inner two loops run (n-1)^2 times, whereas the 2-opt and the pairwise both run n*(n-1)/2 times.
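In symbols, the number of candidate moves examined in a single pass is

$$ \frac{n(n-1)}{2} \ \text{(2-opt and pairwise)} \qquad \text{versus} \qquad (n-1)^2 \approx n(n-1) \ \text{(insertion)}, $$

which is where the roughly two-to-one practical running time comes from.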
It should be mentioned that the outer loop, which determines whether the
heuristics have made any progress, will run an indeterminate number of times. For the
TSP it is ~n, a little lower for the 10-city through 50-city testing and a little higher
thereafter. For the E/T it starts at ~n for 10 jobs and, depending on the heuristic,
increases or remains the same. The insertion maintains an outer loop of ~n, the pairwise
approximately ~1.5n, and the 2-opt ~2n at the 90-job range. The knapsack problem,
depending on the size of the knapsack and the heuristic in question, ranges from (1/14)n
to (1/6)n, which makes testing a higher number of items much easier than with the other
two problems. From a theoretical point of view, all of the heuristics run at O(n^3), but
for practical computing the knapsack allows a thousand runs for the 100-item test in the
same time the TSP can do 100 runs for the 100-city test. The E/T problem has an extra
element for arranging the jobs on a timeline. As a result it is O(n^4) and only 10 runs
could be tested at 90 jobs. A test of 100 jobs was not done because the time for the 90-
job test took nearly two days.
7


3.2 The Pairwise Interchange Heuristic
The pairwise tries all possible ways of swapping two numbers in a
sequence. It begins by taking the first number and swapping it with the second, then
swapping the first with the third, first with the fourth, etc. Next it takes the second
number and swaps it with the third number, then the second with the fourth and so on
until all possible swaps have been made. It then takes the best-swapped sequence and
tries all possible swaps again until a sequence cannot be improved upon. Once that
sequence is found the heuristic terminates.
Example 1:
Sequence: 1234567890
Positions: 1 and 4
Resulting sequence after the swap: 4231567890
Example 2:
Sequence: 1234567890
Positions: 3 and 9
Resulting sequence after the swap: 1294567830
Example 3:
Sequence: 1234567890
Positions: 1 and 10
Resulting sequence after the swap: 0234567891
All swaps of the pairwise create a unique sequence.
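A minimal Python sketch of this swap enumeration (illustrative only; the helper name is invented):

from itertools import combinations

def pairwise_neighbors(seq):
    # Every sequence reachable by swapping two positions, in the order
    # described above: (1,2), (1,3), ..., (2,3), (2,4), and so on.
    for i, j in combinations(range(len(seq)), 2):
        cand = list(seq)
        cand[i], cand[j] = cand[j], cand[i]
        yield "".join(cand)

neighbors = list(pairwise_neighbors("1234567890"))
print(len(neighbors))                 # 45 swaps for ten elements, i.e. n(n-1)/2
print("4231567890" in neighbors)      # True -- Example 1 above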


4.0 Previous Research
Most of the previous research studied while writing this paper focused on
solving the problems optimally, or on solving a special case of them optimally. In each
case, the researchers looked for the most efficient way to solve the problem given their
constraints, be it time, space, or something else. Some of the research used the same
algorithms addressed in this paper. Although the results of that research made no direct
reference to the concept of disruption, the algorithms ranked in the same order of
performance. Some of the comments from one research paper were particularly
supportive of the theory of disruption. The following sections address previous research
involving the three problems and how that research relates directly to this paper.
4.1 Traveling Salesman Research
Being one of the most celebrated (if not the most celebrated) NP-Complete
problems in the field, the TSP has received a great deal of attention, and many methods
have been applied to it with great results. The Lin-Kernighan algorithm,
which is considered the champion when comparing TSP heuristics, uses the 2-opt as the
workhorse of the algorithm [26]. It is therefore consistent that the 2-opt outperforms the
pairwise and insertion algorithms in the tests run during this research. Other
variations on the Lin-Kernighan algorithm have been made using what is called a k-opt
algorithm. The k-opt can make any number of swaps, rather than being restricted to two
as the 2-opt is. Since the time involved in solving a large problem with three or more for
the k in the k-opt becomes unacceptable, it is necessary to determine judiciously which
neighborhoods to visit and how long to allow a given k-opt to search. Even with these
time problems, such k-opts have proven to be very effective in solving the TSP [26].
Another variation of the TSP is the generalized traveling salesman problem
(GTSP). This problem consists of a set of points, each assigned to exactly one group.
Assuming each group represents a city, and each point in a group represents a customer,
the problem is as follows: the salesman must visit at least one customer in each city, and
he is allowed to visit more than one. The solution is a tour that covers all the cities,
visiting one or more customers in each, while minimizing the sum of traveling costs [35].
The Lin-Kernighan heuristic was again used on this problem with some modification [35]
and produced excellent results. Since the Lin-Kernighan algorithm has lasted more than
25 years as the dominant TSP algorithm, it would seem that the 2-opt, being the main
staple of the algorithm, has within it some essence of the TSP and, as argued in this
paper, it is the least disruptive.
One approach applied to the TSP under the category of data smoothing, also
called search space smoothing and fine-tuned learning in the literature [9], has


These results also indicate that the process of choosing
the next schedule using the early/tardy neighbourhood
scheme is more important to the overall result of the
search than under a standard neighbourhood scheme.
Hence when the window size is reduced, the acceptance
of poorer neighbours reduces the effectiveness of the
search. This is because the early/tardy status of a job
may cause major changes to the overall schedule...
[19]
This comment refers to the process of moving jobs around to produce a
schedule. The last sentence in the quotation, "This is because the early/tardy status of a
job may cause major changes to the overall schedule...", supports the theory of
disruption. The process of moving jobs around, as explained in the disruption portion of
this paper, may cause one neighbor to look very different from another. It is very
encouraging to see such findings.
Because this paper deals with three well-known NP-complete problems, it was
interesting to find research that treated a scheduling problem as a compact
multidimensional knapsack problem. That research used branch and bound algorithms to
solve the problem, and a comment was made that is relevant to this research.
We believe that the exploitation of linear
programming models in scheduling problems leading to
optimal solution procedures like branch and bound or
branch and cut will give good quality results if and only
if the corresponding models take well into consideration
the problem structure [11]
The research in this paper is considering how one heuristic captures the essence of a
problem, or in the above vernacular, the problem structure. The theory of disruption is
proposing that a heuristic, which captures the problem structure, will be less disruptive.
4.3 Knapsack Research
The SKP also has several variations, some of which can be solved in polynomial
time and some that lend themselves nicely to dynamic programming. There are also
many greedy methods which, when used with certain classes of problems, produce good
results. In one such class of problems, density [22], defined as value/weight, is used to
order and choose which items are taken. Such findings indicate that greedy heuristics
can perform well on the knapsack problem.
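As an illustration of that density-ordering approach (which is not one of the three heuristics studied in this paper), here is a minimal Python sketch applied to the item list that appears later in Figure 6.3.5:

def greedy_by_density(items, capacity):
    # items is a list of (value, size) pairs; take items in order of
    # decreasing value/size density while they still fit in the sack.
    chosen, used = [], 0
    for value, size in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if used + size <= capacity:
            chosen.append((value, size))
            used += size
    return chosen

items = [(8, 2), (8, 1), (9, 2), (3, 6), (6, 4),
         (4, 5), (7, 1), (6, 1), (1, 8), (5, 7)]
print(greedy_by_density(items, 5))    # the value-30 sack, optimal for this instance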


5.0 Theory of Disruption
What is disruption? It is different for every problem, and for the three problems
addressed in this paper it had to be defined. The real challenge, for two of the problems,
was finding a definition that captured the heart of the problems.
5.1 Traveling Salesman Disruption
In the TSP the disruptive measure was obvious. While drawing the
tours for any given sequence, it was easy to see that the 2-opt removed two legs of a tour
and inserted two new legs. The pairwise removed four legs, and the insertion removed
three. If a random set of 40 cities were taken and 40³ = 64,000 random tours (roughly
the number of tours each algorithm examines) were generated for it, those tours would
on average be very different from one another, since there are about 10⁴⁶ distinct tours.
The difference between two tours is defined as disruption. If all the tour legs from one
tour to another were different, that would be maximum disruption. If no tour legs were
different, then no disruption would have occurred. Returning to the three algorithms,
each one changes a different number of tour legs at each iteration. The 2-opt creates
two, the insertion three, and the pairwise four new legs for each new tour created (on
average). Their disruption values are therefore 2, 3 and 4 respectively. The theory of
disruption as defined in this paper is: the algorithm that disrupts the search space the
least will, on average, perform the best. For this problem the 2-opt is the decided
winner, with the insertion a solid second and the pairwise very much the loser.
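A minimal Python sketch of this leg-difference measure (illustrative only; how the thesis code recorded the changed legs is not reproduced here, so the routine below is simply one way to compute the measure):

def tour_legs(tour):
    # The set of undirected legs in a tour, including the closing leg
    # from the last city back to the first.
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def disruption(old_tour, new_tour):
    # Number of legs in the new tour that were not present in the old one.
    return len(tour_legs(new_tour) - tour_legs(old_tour))

# The 2-opt move of Section 6.1 (1237654890 -> 1234567890) replaces two legs.
print(disruption([1, 2, 3, 7, 6, 5, 4, 8, 9, 0],
                 [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))   # 2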
5.2 Weighted Early Tardy Disruption
For the Weighted Early Tardy Problem a definition of disruption had to be
determined, and a visual interpretation did not help. For a given sequence of jobs there is
an optimal starting position on the available timeline for those jobs. When the sequence
is changed, the jobs' starting positions may or may not change, depending on their
optimal positioning. Whether or not the starting positions differ is the definition of
disruption for this problem. Every time an algorithm made a change to a sequence, the
new sequence's optimal job positioning was calculated and compared to the previous
optimal job positioning, and any differences were recorded. It was found that the
insertion had the least disruption, followed by the pairwise and finally the 2-opt. The
insertion's disruption was ~4 jobs out of position regardless of the number of jobs being
worked. The pairwise's and the 2-opt's disruption increased steadily as the number of
jobs increased. At ten jobs, the pairwise and 2-opt disruption was ~5. At 90 jobs, the
pairwise disruption value was ~13 and the 2-opt's was ~20. As the job number increased,
the gap between the insertion algorithm's score and the other two algorithms' scores


disruption values remained fairly constant regardless of how many items were being
looked at. (Note: with the TSP the values were constant, not just fairly constant.) If the
knapsack size was 1/4 the total size of all the items, then the disruption value was roughly
the same from 10 items up to 100 items, with a slight decrease as the number of items
increased. At 20 items and a 1/4-size knapsack the pairwise disruption was 0.371, the
2-opt's was 0.388, and the insertion's was 0.416. At 100 items the pairwise disruption
value was 0.346, the 2-opt's was 0.370, and the insertion's was 0.376. With a 1/2-size
knapsack and a 3/4-size knapsack the algorithms' scores were different but in the same
order, with a little more spacing between the 2-opt's value and the insertion's value.
The closeness of these scores is worth discussing. The scores are tightly
bunched, and it is hard to argue that one algorithm is less or more disruptive than
another. To argue that the disruption value is relevant, the scores for each algorithm
need to be taken into account. The scores referred to here are the overall scores of the
knapsack's value. Of the three problems examined, the knapsack has the three
algorithms most closely matched in performance, and as more items are added the
algorithms improve by proportional amounts. It makes sense that the disruption values
are close because the overall knapsack scores are also close.


Consider the following tour 1237654890. If we apply the 2-opt to this sequence, the
optimal tour will be found in one move.
Figure 6.1.1
Invert the subsequence 7654, and it becomes 4567, creating the tour 1234567890 as
shown below.
Figure 6.1.2
It is important to notice that the 2-opt removed two legs of the tour, legs 3-7 and 4-8,
and replaced them with two new legs, 3-4 and 7-8. This supports the theory of disruption
as an explanation of why the 2-opt dominates the other heuristics.
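A minimal Python sketch of the move itself (illustrative only): the 2-opt reverses the chosen subsequence, which is exactly the inversion shown above.

def two_opt_move(tour, i, j):
    # Reverse the subsequence between positions i and j (inclusive).
    # Only the two legs at the cut points change; the reversed legs are
    # the same edges traversed in the opposite direction.
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

print(two_opt_move("1237654890", 3, 6))   # 1234567890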


Consider again the same tour 1237654890. If we apply the insertion heuristic to this
tour, the optimal tour will be found in three moves.
Figure 6.1.5
The insertion moves one city to another position in the tour. The first move will
place city 7 in its proper position. This produces the following sequence 1236547890 as
shown below.
Figure 6.1.6
The insertion of one city in another position removes three legs of the tour, 3-7,
4-8 & 6-7, and replaces them with three new ones, 3-6, 4-7 & 7-8. There is no obvious
visual indication that this tour is better than the last; however, it is. The insertion
heuristic, by breaking three legs, is the second least disruptive and comes in
second best the majority of the time. The next insertion will be to move city 6 into its
position, which produces the following:


6.2 Weighted Early Tardy Research
Both the TSP and this problem use an entire sequence of numbers to represent a
solution. One important difference is that in the TSP the beginning and ending numbers
are considered connected, because the salesman must travel from the last city back to the
first. In the E/T, the sequence represents the order in which the jobs will be performed,
and the last and first numbers have no relation to each other beyond what their order
implies. The other important difference is that a given job sequence must be placed on a
timeline. A sequence dictates the order in which the jobs will be done but not the times
at which they will be done. For each sequence the jobs must be placed optimally on the
given timeline. This extra step requires another nested loop in the code, which makes the
core of the heuristic O(n³), and it made testing large numbers of jobs difficult. The
random sampling could not be large when the job number was more than sixty.
This problem did not lend itself obviously to the theory of disruption. The TSP
had an easy graphical interpretation of disruption, but the scheduling of jobs had no
obvious notion of being disrupted. After some investigation and testing, it was
decided that the starting position of each job would be recorded. After each heuristic
performed one adjustment of a sequence, the jobs' new starting positions would be
compared to the previous set of starting positions. The changes would be tallied and
recorded. The average number of jobs out of position after one adjustment is the
measure of disruption.
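A minimal Python sketch of that count (illustrative only; the routine that computes each sequence's optimal start times is the timeline-placement step described above and is not reproduced here):

def jobs_out_of_position(old_starts, new_starts):
    # old_starts and new_starts map each job to its optimal start time
    # before and after one adjustment of the sequence.
    return sum(1 for job in old_starts if old_starts[job] != new_starts[job])

# For illustration, comparing the start times of the non-optimal 2-opt
# schedule and the optimal schedule listed in Section 6.2.1 (jobs keyed
# A-D in the order of the first listing), three of the four jobs start
# at a different time.
old = {"A": 0, "B": 9, "C": 16, "D": 25}
new = {"A": 23, "B": 7, "C": 16, "D": 0}
print(jobs_out_of_position(old, new))   # 3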
The first test consisted of a thousand runs for ten jobs. It resulted in the insertion
heuristic having the lowest score (penalty), the pairwise the second, and the 2-opt the
third. The disruption values were in the same order, with the insertion being ~4, the
pairwise ~5 and the 2-opt ~5.5. The disruption score represents how many jobs, on
average, were in a different starting position after one adjustment of a sequence. The
tests continued from 20 jobs all the way to 90 jobs in 5-job increments. However,
because the heuristic runs in O(n⁴), only ten runs were made at the 70- to 90-job level.
As the number of jobs increased so did the disruption values, except for the insertion's
disruption value. It held constant at ~4 while the others increased steadily. The pairwise
disruption value was ~13 at 90 jobs and the 2-opt's was ~20.
As with the other problems, there was a decided first, second and third
performer, but the gaps here were very wide. For the higher numbers of jobs the insertion
heuristic not only dominated but also had a penalty two to four times lower than the
other two heuristics. This was not the case with the TSP and knapsack problems.
Although a definite winner emerged for those problems, the losers were usually less
than two times as bad as the victor. The difference here is striking.
The question of why one heuristic dominates another still needs to be
asked. The theory of disruption is supported by this test, but what intrinsic nature does
this problem have that makes one heuristic dominate another? What type of
sequential job problems do the three heuristics fix or find difficult? This problem was
not easy to analyze. It has several variables: early weight, tardy weight, length, due date


6.2.1 Non-Optimal 2-opt
2-Opt heuristic score 50
ST= 0 FT= 8 DT= 31 L= 9 EW= 2 TW= 1
ST= 9 FT= 15 DT= 13 L= 7 EW= 5 TW= 2
ST= 16 FT= 22 DT= 22 L= 7 EW= 7 TW= 4
ST= 25 FT= 31 DT= 31 L= 7 EW= 1 TW= 9
Optimal sequence score 25
ST= 0 FT= 6 DT= 31 L= 7 EW= 1 TW= 9
ST= 7 FT= 13 DT= 13 L= 7 EW= 5 TW= 2
ST= 16 FT= 22 DT= 22 L= 7 EW= 7 TW= 4
ST= 23 FT= 31 DT= 31 L= 9 EW= 2 TW= 1
Possible sequences
Score = 87
ST= 0 FT= 8 DT= 31 L= 9 EW= 2 TW= 1
ST= 9 FT= 15 DT= 13 L= 7 EW= 5 TW= 2
ST= 16 FT= 22 DT= 31 L= 7 EW= 1 TW= 9
ST= 23 FT= 29 DT= 22 L= 7 EW= 7 TW= 4
Score = 99
ST= 2 FT= 10 DT= 31 L= 9 EW= 2 TW= 1
ST= 11 FT= 17 DT= 22 L= 7 EW= 7 TW= 4
ST= 18 FT= 24 DT= 13 L= 7 EW= 5 TW= 2
ST= 25 FT= 31 DT= 31 L= 7 EW= 1 TW= 9
Score = 94
ST= 0 FT= 8 DT= 31 L= 9 EW= 2 TW= 1
ST= 9 FT= 15 DT= 31 L= 7 EW= 1 TW= 9
ST= 16 FT= 22 DT= 22 L= 7 EW= 7 TW= 4
ST= 23 FT= 29 DT= 13 L= 7 EW= 5 TW= 2
Score = 61
ST= 2 FT= 8 DT= 13 L= 7 EW= 5 TW= 2
ST= 9 FT= 17 DT= 31 L= 9 EW= 2 TW= 1
ST= 18 FT= 24 DT= 22 L= 7 EW= 7 TW= 4
ST= 25 FT= 31 DT= 31 L= 7 EW= 1 TW= 9


6.2.2 Non-Optimal Pairwise
pairwise heuristic score 30
ST= 4 FT= 5 DT= 5 L= 2 EW= 2 TW= 6
ST= 15 FT= 22 DT= 26 L= 8 EW= 4 TW= 8
ST= 23 FT= 24 DT= 24 L= 2 EW= 5 TW= 5
ST= 25 FT= 27 DT= 13 L= 3 EW= 4 TW= 1
optimal score 16
ST= 4 FT= 5 DT= 5 L= 2 EW= 2 TW= 6
ST= 11 FT= 13 DT= 13 L= 3 EW= 4 TW= 1
ST= 15 FT= 22 DT= 26 L= 8 EW= 4 TW= 8
ST= 23 FT= 24 DT= 24 L= 2 EW= 5 TW= 5
possible sequences
Score = 142
ST= 11 FT= 13 DT= 13 L= 3 EW= 4 TW= 1
ST= 15 FT= 22 DT= 26 L= 8 EW= 4 TW= 8
ST= 23 FT= 24 DT= 24 L= 2 EW= 5 TW= 5
ST= 25 FT= 26 DT= 5 L= 2 EW= 2 TW= 6
Score = 114
ST= 0 FT= 7 DT= 26 L= 8 EW= 4 TW= 8
ST= 8 FT= 9 DT= 5 L= 2 EW= 2 TW= 6
ST= 23 FT= 24 DT= 24 L= 2 EW= 5 TW= 5
ST= 25 FT= 27 DT= 13 L= 3 EW= 4 TW= 1
Score = 30
ST= 4 FT= 5 DT= 5 L= 2 EW= 2 TW= 6
ST= 11 FT= 13 DT= 13 L= 3 EW= 4 TW= 1
ST= 17 FT= 18 DT= 24 L= 2 EW= 5 TW= 5
ST= 19 FT= 26 DT= 26 L= 8 EW= 4 TW= 8
Score = 46
ST= 4 FT= 5 DT= 5 L= 2 EW= 2 TW= 6
ST= 17 FT= 18 DT= 24 L= 2 EW= 5 TW= 5
ST= 19 FT= 26 DT= 26 L= 8 EW= 4 TW= 8
ST= 27 FT= 29 DT= 13 L= 3 EW= 4 TW= 1


Score = 72
ST= 0 FT= 7 DT= 8 L= 8 EW= 5 TW= 5
ST= 8 FT= 14 DT= 3 L= 7 EW= 7 TW= 1
ST= 15 FT= 15 DT= 8 L= 1 EW= 9 TW= 8
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Score = 111
ST= 0 FT= 0 DT= 8 L= 1 EW= 9 TW= 8
ST= 1 FT= 7 DT= 3 L= 7 EW= 7 TW= 1
ST= 8 FT= 15 DT= 8 L= 8 EW= 5 TW= 5
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Score = 84
ST= 0 FT= 0 DT= 8 L= 1 EW= 9 TW= 8
ST= 1 FT= 8 DT= 8 L= 8 EW= 5 TW= 5
ST= 9 FT= 15 DT= 3 L= 7 EW= 7 TW= 1
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
Score = 65
ST= 7 FT= 7 DT= 8 L= 1 EW= 9 TW= 8
ST= 8 FT= 15 DT= 8 L= 8 EW= 5 TW= 5
ST= 16 FT= 17 DT= 17 L= 2 EW= 3 TW= 6
ST= 18 FT= 24 DT= 3 L= 7 EW= 7 TW= 1
Score = 106
ST= 0 FT= 1 DT= 17 L= 2 EW= 3 TW= 6
ST= 2 FT= 8 DT= 3 L= 7 EW= 7 TW= 1
ST= 9 FT= 9 DT= 8 L= 1 EW= 9 TW= 8
ST= 10 FT= 17 DT= 8 L= 8 EW= 5 TW= 5


Now consider an instance of the knapsack problem and apply the pairwise
heuristic to it. Consider the following list of values and sizes.
Values: 8 8 9 3 6 4 7 6 1 5
Sizes:  2 1 2 6 4 5 1 1 8 7
Figure 6.3.5
The knapsack size will remain at 5 for this problem. The score of the
current sack is 25. No single swap creates a better score, which is the point at which the
heuristic terminates. Clearly, having items seven (7/1) and eight (6/1) replace the first
item (8/2) would improve the sack score, but producing the optimal sequence would
require two moves, the first of which is a backtracking step. If the (7/1) item were
swapped with the (8/2) item, the following sequence would be created.
Values: 7 8 9 3 6 4 8 6 1 5
Sizes:  1 1 2 6 4 5 2 1 8 7
Figure 6.3.6
This sack's score is 24, which is one lower than the previous one. The heuristic would
never make this move because it is a move backwards, but the move is necessary to
reach the optimum. As shown below, the (6/1) item can then be swapped with the (3/6)
item to produce the optimal sequence.
Values: 7 8 9 6 6 4 8 3 1 5
Sizes:  1 1 2 1 4 5 2 6 8 7
Figure 6.3.7
The problem of having a large item in the sack that can be replaced by two
smaller but more valuable items will always stop the pairwise heuristic. It can also stop
the insertion heuristic if conditions are right. Another related problem is swapping two
items of equal value but of different sizes. Consider the following.
Values: 8 9 9 3 6 4 7 2 8 5
Sizes:  2 1 2 6 4 5 1 1 1 7
Figure 6.3.8
By swapping item nine (8/1) with item one (8/2), more room is created in
the sack while the score stays the same. Then the eighth item (2/1) could be swapped
with the fourth item (3/6) and the optimal sack created. If using less space at the same
score were considered an improvement, this problem could be avoided.
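A minimal Python sketch of how a sequence appears to be scored in these examples; the packing rule, items taken in sequence order until the first item that does not fit, is an inference from the scores quoted above rather than a documented part of the heuristics.

def sack_score(values, sizes, capacity):
    # Pack items in sequence order and stop at the first item that does
    # not fit (inferred rule; it reproduces the scores quoted above).
    score, used = 0, 0
    for value, size in zip(values, sizes):
        if used + size > capacity:
            break
        score += value
        used += size
    return score

values = [8, 8, 9, 3, 6, 4, 7, 6, 1, 5]             # Figure 6.3.5
sizes = [2, 1, 2, 6, 4, 5, 1, 1, 8, 7]
print(sack_score(values, sizes, 5))                 # 25

swapped_values = [7, 8, 9, 3, 6, 4, 8, 6, 1, 5]     # Figure 6.3.6
swapped_sizes = [1, 1, 2, 6, 4, 5, 2, 1, 8, 7]
print(sack_score(swapped_values, swapped_sizes, 5)) # 24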


Traveling Salesman Average Scores
Figure 7.1
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
Lower scores are best. All city positions were randomly generated in a 1 by 1 square.
The average score represents the average distance of the best tours for the given number
of cities.


Knapsack Average Scores: Knapsack Size = 1/2
Figure 7.4
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
Higher scores are best.
All Items were randomly generated with the following constraints:
Size = 1..10
Value = 1..10
Knapsack Size = 1/2 * Total Size Of All The Items
The Average Score represents the average value of the knapsack for the given
number of items.


Traveling Salesman Percent Wins
Figure 7.6
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
The Percent Wins indicates the percentage a given algorithm had the best score for a
given number of cities. If the algorithms tied for the best score, they all would be
awarded first place. Thus, it is possible for the sum of the three graph values to exceed
100%, as they do for the 10 city run.


Knapsack=1/4 Percent Wins
Figure 7.8
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
The Percent Wins indicates the percentage a given algorithm had the best score for a
given number of items. If the algorithms tied for the best score, they all would be
awarded first place. Thus, it is possible for the sum of the three graph values to exceed
100%, as they do for the 10 item run.


Knapsack=3/4 Percent Wins
Figure 7.10
Blue = 2-opt Algorithm
Black = Pairwise Algorithm
Pink = Insertion Algorithm
The Percent Wins indicates the percentage a given algorithm had the best score for a
given number of items. If the algorithms tied for the best score, they all would be
awarded first place. Thus, it is possible for the sum of the three graph values to exceed
100%, as they do for the 10 item run.


[11] Della Croce, Federico; Gupta, Jatinder N.D.; Tadei, Roberto. Minimizing tardy
jobs in a flowshop with common due date. European Journal of Operational
Research, January 16, 2000, Vol. 120, Issue 2, pp. 375-381.
[12] Dutta, P.; Dutta Majumder, D. Performance Evaluation of Evolutionary Class of
Algorithms: An Application to 0-1 Knapsack Problem. Mathematical and
Computer Modelling, April 1998, Vol. 27, Issue 7, pp. 57-72.
[13] Enomoto, Hikoe; Oda, Yoshiaki; Ota, Katsuhiro. Pyramidal tours with
step-backs and the asymmetric traveling salesman problem. Discrete Applied
Mathematics, October 5, 1998, Vol. 87, Issue 1-3, pp. 57-65.
[14] Fayard, Didier; Zissimopoulos, Vassilis. An approximation algorithm for
solving unconstrained two-dimensional knapsack problems. European Journal
of Operational Research, August 3, 1995, Vol. 84, Issue 3, pp. 618-632.
[15] Gens, George; Levner, Eugene. An approximate binary search
algorithm for the multiple-choice knapsack problem. Information Processing
Letters, September 15, 1998, Vol. 67, Issue 5, pp. 261-265.
[16] Gupta, S.K.S.; Srimani, P.K. Scheduling Independent Jobs for Torus Connected
Networks With/Without Link Contention. Mathematical and Computer
Modelling, January 2000, Vol. 31, Issue 2-3, pp. 131-145.
[17] Hassin, Refael; Rubinstein, Shlomi. An approximation algorithm for the
maximum traveling salesman problem. Information Processing Letters,
August 17, 1998, Vol. 67, Issue 3, pp. 125-130.
[18] James, R.J.W.; Buchanan, J.T. Performance enhancements to tabu search for
the early/tardy scheduling problem. European Journal of Operational Research,
April 16, 1998, Vol. 106, Issue 2-3, pp. 254-265.
[19] James, R.J.W.; Buchanan, J.T. A neighbourhood scheme with a compressed
solution space for the early/tardy scheduling problem. European Journal of
Operational Research, November 1, 1997, Vol. 102, Issue 3, pp. 513-527.
[20] Jenner, Birgit. Knapsack problems for NL. Information Processing Letters,
May 12, 1995, Vol. 54, Issue 3, pp. 169-174.