
Citation 
 Permanent Link:
 http://digital.auraria.edu/AA00005730/00001
Material Information
 Title:
 Algebraic approach to graph isomorphism
 Creator:
 Choudhry, Shahzad
 Publication Date:
 2007
 Language:
 English
 Physical Description:
 viii, 52 leaves : ; 28 cm
Thesis/Dissertation Information
 Degree:
 Master's ( Master of Science)
 Degree Grantor:
 University of Colorado Denver
 Degree Divisions:
 College of Engineering and Applied Science, CU Denver
 Degree Disciplines:
 Computer Science
 Committee Chair:
 Altman, Tom
 Committee CoChair:
 Stillman, Boris
 Committee Members:
 Ra, IlKyeun
Subjects
 Subjects / Keywords:
 Isomorphisms (Mathematics) ( lcsh )
Graph theory ( lcsh ) Algebra ( lcsh ) Algebra ( fast ) Graph theory ( fast ) Isomorphisms (Mathematics) ( fast )
 Genre:
 bibliography ( marcgt )
theses ( marcgt ) nonfiction ( marcgt )
Notes
 Bibliography:
 Includes bibliographical references (leaves 5152).
 Statement of Responsibility:
 by Shahzad Choudhry.
Record Information
 Source Institution:
 University of Colorado Denver Collections
 Holding Location:
 Auraria Library
 Rights Management:
 All applicable rights reserved by the source institution and holding location.
 Resource Identifier:
 166386637 ( OCLC )
ocn166386637
 Classification:
 LD1193.E52 2007m C46 ( lcc )

Downloads 
This item has the following downloads:

Full Text 
ALGEBRAIC APPROACH TO
GRAPH ISOMORPHISM
by
Shahzad Choudhry
B.Sc., University of Colorado at Denver, 2000
A thesis submitted to the
University of Colorado at Denver and Health Sciences Center
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2007
This thesis for the Masters of Science
degree by
Shahzad Choudhry
has been approved
by
Lite 3/0?
Date
Choudhry, Shahzad (M.S., Computer Science)
Algebraic Approach To Graph Isomorphism
Thesis directed by Professor Tom Altman
ABSTRACT
The subject of this thesis is graph isomorphism. There are many ways to
draw the same graph. If the graph is complex, with a little imagination, one can see
that deciding whether two drawings represent the same graph can become a
challenging task. This, however, is not graph isomorphism. The word iso indicates
similarity and the word morph indicates change. The graph isomorphism problem is
one of deciding whether two different graphs are similar in the sense of structural
equivalence. That is whether they have the same pattern of connections. To this day,
no practical algorithm has been designed to solve this problem for all cases. A Graph
Isomorphism/from a graph G to a graph H is a onetoone mapping for each node in
G to a node in H such that for any two nodes u and v in G there are the same number
of edges between f(u) and f(v) as there are between u and v in G. Two graphs G and
H are isomorphic, if and only if, there exists a vertex bijection/: Vg Vh that
preserves edge connectivity. If you can relabel the nodes of one graph to match
those of another graph, preserving edge multiplicity, you have two isomorphic
graphs. This thesis provides a detailed specification of the graph isomorphism
problem along with a discussion of algorithms for its solution. The major portion of
the thesis work performed was the design and implementation of a computer program
to check if two graphs are Isomorphic. This report also provides the design
documentation, testing, and pseudocode for this program, which uses Altmans
Marking Algorithm [3].
This abstract accurately represents the content of the candidates thesis. I recommend
its publication.
Signed
Tom Altman
CONTENTS
Figures..................................................................vii
Tables..................................................................viii
Chapter
1. Introduction............................................................1
1.1 Graph Isomorphism Specification........................................1
1.2 Graph Isomorphism Algorithms...........................................7
1.2.1 Vertex Filtering (VF) Algorithm.....................................12
1.2.2 McKay's Nauty Algorithm.............................................15
1.2.3 The Marking Algorithm...............................................16
1.2.4 The Projection Algorithm............................................18
2. Developer Requirements.................................................21
3. HighLevel Design......................................................25
4. Detailed Design.......................................................30
4.1 Program Hardware and Software Specifications..........................30
4.2 Code Files, Classes, and Data Structures.............................31
4.3 The Marking Algorithm................................................34
4.4 The LRCholesky Algorithm............................................38
5. Testing and Analysis..................................................41
6. Conclusion..........................................................48
References..............................................................51
vi
LIST OF FIGURES
Figure
1.1 Example Graphs................................................1
1.2 Example Graphs................................................4
1.3 Example Graphs................................................5
3.1 Example Graph Input File.....................................29
Vll
LIST OF TABLES
Table
5.1 Test Data
.42
Vlll
1. Introduction
This introduction will provide a detailed specification of Graph Isomorphism
(GI) followed by a discussion of algorithms to solve the problem.
1.1 Graph Isomorphism Specification
It is not difficult to see that there are an infinite number of ways to draw the
same graph. If the graph is a little complex, with a little imagination, one can see that
deciding whether two drawings represent the same graph can become a challenging
task. This, however, is not graph isomorphism. The word iso indicates similarity and
the word morph indicates change. The graph isomorphism problem is one of
deciding whether two different graphs are similar in the sense of structural
equivalence. That is whether they have the same pattern of connections. To this day,
no practical algorithm has been designed to solve this problem for all cases. With
that said, let us move on to further specification and explanation of the problem.
Consider the following two graphs, G and H.
Fig 1.1 Example Graphs
1
It is easy to see, because of the way the graph is drawn, that by renaming the vertices,
from graph H, we can get graph G. That is 9>l, 2>4, 4>2, and 5>3 transforms
graph H to graph G. Note that this renaming or transformation in turn, causes the
edges to change endpoints, that is edge (9, 2) in H now becomes edge (1,4), as in G.
Each renaming, of edges or nodes, is known as a bijection. If a set of bijections exists
to transform a graph to another graph, the two graphs are isomorphic. However,
finding these bijections can become very difficult when dealing with more complex
graphs. Consider the following graph specifications:
Graph G has 4 nodes with edges (4,1), (4,2), and (3,4).
Graph H has 4 nodes with edges (9,2), (4,2), and (2, 5).
Yes, these are the same graphs G and H as shown in the diagrams before. Since the
graphs are undirected, the edge (x, y) is the same as edge (y, x). Now, without the aid
of the diagram, we can see that it is not that straightforward to come up with the
bijections to transform G to H or H to G.
In the graphs above, the edge bijections are a direct result of node bijections.
The node bijections are sufficient to show the transformation, and thus show that
graphs G and H are isomorphic. The graphs are undirected, but what if the graphs
were digraphs, where (x, y) and (y, x) are not the same edge? Then we can simply
take the direction of the edges into account when looking for isomorphism. What if
the edges were labeled and/or assigned weights, as in a weighted graph or weighted
digraph? In this case, we simply say that the weight on an edge is not really part of
2
the structure of the graph and is generally ignored in graph isomorphism. However,
in multigraphs and multidigraphs, if edge multiplicity is not maintained, edge
bijections are needed. That is, if an isomorphism,/ on a graph H, maps a node v to
node f(v) and a node u to node f(u)\ f(v) and f(u) must have the same number of edges
between each other as do the original nodes v and u in H. If this is not the case, then
we are changing the structure of the original graph H, which is defined by its edges
and which edges connect which nodes. In a naive algorithm, for the general case, it
may require n\ vertex bijections to show that one graph can be transformed into
another, where n is the number of nodes.
Now, let us formalize a definition of graph isomorphism. A graph
isomorphism/from a graph G to a graph H is a onetoone mapping for each node in
G to a node in H such that for any two nodes u and v in G there are the same number
of edges between f(u) and f(v) as there are between u and v in G. Two graphs G and
H are isomorphic, if and only if, there exists a vertex bijection/: Vq Vh that
preserves edge multiplicity. If you can relabel the nodes of one graph to match those
of another graph, preserving edge multiplicity, you have two isomorphic graphs. In
other words, G and H are isomorphic if their vertices can be labeled or numbered so
that adjacency matrices are equal. With regard to matrices, the problem for graphs G
and H can be boiled down to: does there exist a permutation matrix P such that P GP
= H? Of course, adjacency must be preserved, to avoid changing the structure of the
graph. A vertex bijection/: Vq Vh preserves adjacency iff(u) is adjacent to f(v) if
3
and only if u is adjacent to v, for all u, v in G. For simple graphs, which do not
include multigraphs or multidigraphs, preserving adjacency is the same as
preserving edge multiplicity (as given in the definition), since there is, at most, only
one edge between any pair of vertices. In the case of digraphs, the edge direction
must also be preserved.
Now that we know what graph isomorphism is, we can see that in order for
two graphs to be isomorphic, they must have the same number of nodes. This is true,
because each node in one graph has to be mapped by the isomorphism to a node in
the other graph and because this is a onetoone mapping. What about the number of
edges? If the number of nodes and the number of edges are the same amongst two
simple nonmulti graphs, are the graphs isomorphic? See the example below.
 i
Fig 1.2 Example Graphs
In this example, you can easily see that the structure of the graphs is different even
though they have the same number of nodes and edges. The graphs are said to be of
different isomorphism type. Graphs of the same isomorphism type are isomorphic.
Isomorphic type, however, only seems to make sense if we are talking about graphs
with a fixed number of nodes. On the other hand, we could abstract graphs to
isomorphic types, in which case, all graphs with the same isomorphic type can be
4
considered the same graph. Thus, the number of nodes and the number of edges
being the same for two graphs G and H, is necessary, but not sufficient, to show
isomorphism. Similar is the case for edge multiplicity preservation in an
isomorphism. If/is an isomorphism from graph G to graph H, then node v in G for
all nodes in G and f(v) in H must have the same degree. Otherwise, the structure of
the graphs G and H is different and they cannot be isomorphic. See the following
example.
Fig 1.3 Example Graphs
The two graphs have the same number of nodes, edges, and the same degree sequence
of <1,1,1,2, 2, 3>. However, they are not isomorphic because of the structural
difference. Such properties of graphs are known as isomorphism invariants. These
invariants cannot be different between two isomorphic graphs. Thus, they can be
used to show nonisomorphism, but are not sufficient in themselves to show
isomorphism.
5
A common goal with graph isomorphism is to create a practical general
algorithm to decide graph isomorphism, or to show that no such algorithm exists. In
a bruteforce approach there are n\ vertex bijections to check and each check would
require examining all vertex pairs, where n is the number of nodes. There are many
uses of graphs. Similarly, as isomorphism is a property of graphs, it also has
importance for the same uses. In particular, there are uses in linguistics with context
free grammars, in circuits, in pattern recognition, in lowlevel image representations,
in chemistry with chemical compounds, etc.
As mentioned earlier, no practical algorithm is known to exist that solves the
general graph isomorphism problem in polynomial time. In fact, it is not even known
which classification the problem falls into, whether it is NPComplete, or whether it
falls into the P class and can be decided in polynomial time. The problem is,
however, in NP, due to the ties to the permutation of vertices. There are also other
types of classifications, but it is still not known whether graph isomorphism falls into
them or not. In either case, it is considered a difficult problem to solve efficiently.
However, by placing certain restrictions on the type of graphs, polynomial time
algorithms have been devised. Such restrictions include bounded Eigenvalue
multiplicity, planar graphs, graphs satisfying a fixed degree bound, interval graphs,
permutation graphs, convex graphs, and more. A more general problem, the sub
graph isomorphism problem, is known to be NPComplete. A new class, GI, the set
of problems with a polynomialtime Turing reduction to the graph isomorphism
6
problem, was also created. The problem is trivially a complete problem for GI [14].
The rest of this thesis is dedicated to algorithms for graph isomorphism. Graph
isomorphism is perhaps the most widely studied natural problem whose complexity is
not classified as complete for some natural class. In 1852, Francis Guthrie posed the
Four Color Problem for coloring a map. This problem asks if it is possible to color,
using only four colors, any map of countries in such a way as to prevent two
bordering countries from having the same color [11]. This problem was solved in
1976 by Kenneth Appel and Wolfgang Haken. This can be considered the birth of
graph theory. While trying to solve it, mathematicians invented many fundamental
graph theoretic terms and concepts, and graph isomorphism was bom [11].
1.2 Graph Isomorphism Algorithms
Efficient graph matching algorithms suited for matching large graphs have been a
subject of research for approximately the last three and a half decades. As mentioned
in the previous section, most algorithms reduce the complexity by imposing
restrictions on the input graphs. An alternative approach to reducing matching
complexity is that of using an adequate representation of the searching process and
pruning unprofitable paths in the search space. This is a technique most common in
Artificial Intelligence. In this way, no restrictions must be imposed on the structure
of the input graphs and the obtained algorithms can be generally applied. However,
this process can become a difficult problem to solve.
7
One of the pioneer papers ascribed to this area of study, was published in 1970
by D.G. Comeil, C.C. Gotlieb [7]. It illustrates an isomorphism algorithm, which
performs suitable transformations on the input graphs, in order to find a different
representation for which the matching is computationally more convenient. However,
it has been shown that the conjecture on which this method is based, is not always
true.
A procedure that significantly reduces the size of the search space is the
backtracking algorithm proposed by Ullmann in his 1976 paper [8]. This algorithm is
devised for both graph isomorphism and subgraph isomorphism, and is still one of
the most commonly used today, for exact graph matching. Compared with other
algorithms, it is more convenient in terms of matching time, in case of onetoone
matching.
Another backtracking algorithm, SD, was also presented in 1976 by D.C.
Schmidt and L.E. Druffel [9]. It uses the information contained in the distance matrix
representation of a graph to establish an initial partition of the graph nodes. This
distance matrix information is then used in a backtracking procedure to reduce the
search tree of possible mappings.
A more recent algorithm, known as VF for Vector Filtering, is based on a depth
first search strategy, with a set of rules to efficiently prune the search tree. This
algorithm can be found in the 1999 paper by L. P. Cordelia, P. Foggia, C. Sansone,
8
and M. Vento [10]. In 2000, an improvement was made to this VF algorithm, by the
same authors, giving rise to the VF2, which always outperforms the VF [11].
The McKay's Nauty algorithm was developed in 1981 by B.D. McKay [12].
Even if Nauty is considered to be one of the fastest graph isomorphism algorithms
available, it has been shown that there are categories of graphs for which it employs
exponential time in order to find an isomorphism. It is based on a set of
transformations that reduce the graphs to be matched to a canonical form for which it
is very simple to test for the presence of an isomorphism.
Another possible approach to the isomorphism problem is the one presented in
1995 by H. Bunke and B.T. Messmer [13]. Instead of reducing the complexity of
matching two graphs, the authors attempt to reduce the overall computational cost
when matching a sample graph against a large set of prototypes. The method
performs the matching in quadratic time with the size of the input graph and
independently of the number of prototypes. It is obviously convenient in applications
requiring matching a graph against a database, but the memory required to store the
preprocessed database grows exponentially with the size of the graphs, making the
method suitable only for small graphs. So, in case of onetoone matching, other
algorithms, in particular Ullmanns, are more suitable.
All the above cited algorithms are exact ones, meaning they find the exact
solution to the matching problem, if any. Other techniques, like those based on non
deterministic paradigms, able to find approximate solutions to the matching problem
9
have been proposed, especially in the recent past. In most cases, they are very
powerful in reducing the complexity from exponential to polynomial, but they do not
guarantee finding an exact and optimal solution. As mentioned before, another
category of algorithms for this problem places restrictions on the input graphs. Such
restrictions include bounded Eigenvalue multiplicity, planar graphs, graphs
satisfying a fixed degree bound, interval graphs, permutation graphs, convex graphs,
and more. These algorithms are also polynomial in running time. However, unlike
approximation algorithms, these algorithms are important and practical, as given
problem definitions may very well meet, or can be modified to meet the restrictions
with little consequence.
Altmans paper [3] gives two heuristic backtracking algorithms with polynomial
running times, if no backtracking is applied. The Marking Algorithm is based on the
Principal Axis Theorem and a specific marking of diagonal elements of the adjacency
matrices. The Projection Algorithm modifies the other, incorporating a heuristic that
uses at most n steepest descent projections. Both algorithms rely on the calculation of
Eigenvalues of the adjacency matrices, which are forced to be symmetric, positive
definite matrices to ensure that the Eigenvalue algorithm(s) converge.
Thus far, we have discussed different classifications of algorithms, and given a
brief overview of several effective and wellknown algorithms. For each algorithm,
the source papers were referenced, and can be studied for more detailed specifications
of these algorithms. However, we will now discuss four of these algorithms in more
10
detail, along with their performance. The four algorithms that will be discussed
below are Vertex Filtering (VF) algorithm, McKay's Nauty algorithm, Altmans
Marking algorithm, and Altmans Projection algorithm.
Through research, one useful paper comparing the performance of the five
general GI algorithms discussed above was found. The paper was published fairly
recently, by P. Foggia, C.Sansone, and M. Vento [4]. As it could be expected, the
research in this paper showed that none of the algorithms is definitively better than
the others. In particular, for randomly connected graphs, the Nauty algorithm is
better if the graphs are dense and/or of large size. On the contrary, for smaller, sparse
graphs, VF2 performs better. On more regular graphs, i.e. on 2D meshes, VF2 is
definitely the best algorithm, since the Nauty algorithm is not even able to find a
solution for graphs bigger than a few dozens of nodes. In the case of bounded
valence graphs, if the valence is small, VF2 is always the best algorithm; while for
bigger values of the valence, if the size of the graphs is small, the Nauty algorithm is
more convenient. Finally, it is also worth noting that SD, VF, and VF2, are the only
algorithms that were always able to find a solution to the isomorphism problem,
independent of the size and kind of graphs to be matched.
It seems that it is a tie between the performance of the VF2 (based on the VF)
and the Nauty algorithm. However, depending on the type of graphs and valence, one
would be better suited to a particular situation than the other. As these two
algorithms have proven themselves, let us take a closer look at them.
11
1.2.1 Vertex Filtering (VF) Algorithm
There are actually two main VF algorithms for graph and subgraph
isomorphism common today, the VF and VF2. Performance evaluations done by the
authors of VF show major increases in speed compared to the algorithm written by
Ullmann [8]. The VF2 algorithm reduces memory usage from 0(n2) to 0(n) and has
been found to always outperform its older version, the VF algorithm. The VF2
algorithm was published by the same authors in 2000, about one year after its
predecessor [10] [11].
Both VF algorithms are implemented as backtracking match functions that
apply recursive function calls for every added mapping between the pattern and the
target graph. The recursive programming style makes it possible to backtrack simply
by returning from the current invocation of the matching function. In order to keep
track of which nodes have been mapped in each iteration, between the two input
graphs, a state object is created during each iteration. The state object is cloned from
the previous state and extended with a new mapping during each recursion. This
means that every state object contains all the mappings found in the previous
state object, in addition to the currently added one. This way the state object always
includes a partial, valid isomorphism. The recursion continues until all the nodes and
edges of the pattern graph have been matched. If at some point there are no more
possible nodes in the target to match, even though the whole pattern graph has not yet
12
been matched, it simply returns to the previous level and tries the next pair. This
continues until the whole search space has been covered. The VF algorithms do not
have an exact strategy for choosing the search plan, instead it is dynamically
determined during the search. This is done so that the next node is always selected
from the adjacent nodes of the most recently added node. Choosing the path in this
manner makes it a depthfirst search. When the search finally reaches a node from
which there are no adjacent nodes, then the unmapped nodes, which are adjacent to
previously mapped nodes, will be matched. Below you will find the VF2 algorithm.
M is said to be a graph/subgraph isomorphism if and only if M is an
isomorphism between G2 and a subgraph of G1. We will assume that the graphs
involved are directed graphs, i.e. an arc (i,j) is to be considered different from (j, i).
However, the extension of the algorithm to undirected graphs is trivial. The matching
process can be suitably described by means of a State Space Representation (SSR).
Each state s of the matching process can be associated to a partial mapping solution
M(s), which contains only a subset of the components of the mapping function M. A
partial mapping solution univocally identifies two subgraphs of G1 and G2, say
Gl(s) and G2(s), obtained by selecting from G1 and G2 only the nodes included in
the components of M(s), and the branches connecting them. In the following, we will
denote by Ml(.v) and M2(s) the projection of M(s) onto N1 and N2, respectively,
while the sets of the branches of Gl(.y) and G2(s) will be denoted by Bl(s) and B2(s),
respectively.
13
A highlevel description of the VF2 matching algorithm can be outlined at this
point:
PROCEDURE Match(s)
INPUT: an intermediate state s; the initial state sO has M(sO) NULL
OUTPUT: the mappings between the two graphs
IF M(s) covers all the nodes of G2 THEN
OUTPUT M(s)
ELSE
Compute set P(s) of the pairs candidate for inclusion in M(s)
FOREACH (n, m) member of P(s)
IF F(s, n, m) THEN
Compute state s' by adding (n, m) to M(s)
CALL Match(s')
END IF
END FOREACH
Restore data structures
END IF
END PROCEDURE
Here F(s, n, m) is a boolean function (called feasibility function) that is used to prune
the search tree. If its value is true, it is guaranteed that the state s', obtained by
adding (n, m) to s, is a partial isomorphism if s is; hence the final state is either an
isomorphism between G1 and G2, or a graph/subgraph isomorphism between a sub
graph of G1 and G2. Moreover, F will also prune some states that, albeit
corresponding to an isomorphism between GIG) and G2(s), would not lead to a
complete matching solution.
Once again, we reiterate that there is no efficient general graph isomorphism
algorithm. In the worst cases, this algorithm performs the same as the bruteforce
14
naive algorithm. That is, it has a 0(nm) exponential running time, where n and m are
the number of nodes in the input graphs. However, in most cases it outperforms any
other available algorithm, and requires close to O(n) space; and thus it can handle
larger graphs more easily than most other algorithms.
1.2.1 McKay's Nauty Algorithm
Despite the fact that this algorithm is advertised as the fastest general graph
GI algorithm, it also has a worstcase 0(nm) exponential running time, like the VF
algorithm. However, in performance testing it was found that this algorithm
outperforms any others when dealing with dense, small to medium size graphs.
Except in the case of graphs known as 2D meshes, where the algorithm was not able
to find a solution, but this may just be due to an implementation error.
As mentioned before, the algorithm is based on a set of transformations that
reduce the graphs to be matched, to a canonical form, for which it is very simple to
test the presence of an isomorphism. There is no document which explains in detail
how Nauty works. However, most of the important ideas were published in Brendan
D. McKays paper [12]. The algorithm finds the automorphism group of a vertex
colored graph. Nauty is also able to produce a canonicallylabeled isomorph of the
graph, to assist in isomorphism testing. An automorphism of a graph is a graph
isomorphism with itself, i.e., a mapping from the vertices of the given graph, G, back
to vertices of G, such that the resulting graph is isomorphic with G. The set of
automorphisms for G make up its automorphism group, which the algorithm uses as
15
an abstraction to look for isomorphism. You can also say that the algorithm is based
on group theory. Some data structures, including trees, are used in the process.
Coloring or partitioning is used, automorphisms of the graph are found by noticing
that two labelings give the same labeled graph. The canonical labeling map
corresponds to one of these labelings, chosen according to a complicated scheme.
This is just a high level view of some of the features of the algorithm. Despite
my sincerest efforts, I was unable to find a good description or explanation of the
algorithm. McKays paper, which can be found online, contains a very long and
difficult to understand description of the algorithm, which may also involve concept
learning.
1.2.3 The Marking Algorithm
As mentioned earlier, this algorithm is based on the Principal Axis Theorem
and a specific marking of the diagonal elements of the adjacency matrices. For every
symmetric matrix A, there exists an orthogonal matrix Za which transforms A into a
diagonal form Da = ZaTAZa The diagonal elements of Da are the Eigenvalues of A.
This is the Principal Axis Theorem, and it tells us that every symmetric matrix with n
rows and n columns, has n Eigenvalues. Computation of Eigenvalues is part of the
algorithm. To ensure that the algorithm for computation of these values converges, it
is necessary that only symmetric, positive definite matrices are used to represent the
graphs. Matrices for undirected graphs are symmetric, and the algorithm can be used
for simple undirected graphs. For the matrices to be positive definite, it is sufficient
16
that the matrices be strongly diagonally dominant, i.e., the sum of the absolute value
of the nondiagonal entries must be less than the absolute value of the diagonal entry,
for each diagonal entry in each row. This can be accomplished by setting the
diagonal entries to n, which will be zero for simple graphs with no selfloops. This
will not change the structure of the permutation matrix P, whose existence will show
that the input graphs A and B are isomorphic.
The LRCholesky method for computing Eigenvalues is recommended, since
it is guaranteed to converge for the type of matrices described above with sufficient
accuracy in polynomial time. The algorithm attempts to iteratively change or mark
the diagonal entries of the adjacency matrices so that they remain isospectral or
orthogonally congruent, while converging to an Eigenvalue multiplicity of one. If
successful, the matrices will be permutation similar, and thus isomorphic. With
certain isospectral graphs, this algorithm will be forced to backtrack and try a
different marking sequence, causing it to run in exponential time. Pairs of Isospectral
Nonisomorphic Graphs or PINGs, where the graphs are orthogonally congruent, but
not permutation similar, will cause this algorithm to backtrack. If the graphs are
isomorphic, the algorithm marks their adjacency matrices in such a way as to wind up
with orthogonally similar matrices, that is, they will have the same Eigenvalues.
However, whether the graphs are found to be isomorphic or not, the algorithm hopes
that the graphs remain isospectral during the marking, otherwise backtracking will be
17
required. If the graphs are isomorphic and no backtracking is involved, the running
time of the algorithm is 0(n4). Below is the pseudocode for the algorithm.
Initialize ilast = j last = 1
If Not SameEigenValues(A, B) then return NOT ISOMORPHIC
Else
For i from ilast to n
A[i, i] = 2in
For j from jlast to n
If B[j, j] = n then
B[j,j] = 2in
If SameEigenValues(A, B) then Exit jloop
Else If j = n then BACKTRACK if possible,
else return NON ISOMORPHIC
Else B[j, j] = n
End For
End For
Endlf
1.2.4 The Projection Algorithm
The observation of how fast the previous algorithm converges toward the final
permutation matrix P, as the diagonal elements of the graphs are cumulatively
marked, led to the projection algorithm. There are two disadvantages to the marking
algorithm. First, since the algorithm begins marking immediately, the presence of
PINGs will force backtracking. Second, the algorithm must fully mark the pair
before a permutation matrix can be determined. The projection algorithm addresses
both of these issues in a twophase approach. In the first phase, the vertex sets of the
two graphs are partitioned into equivalence classes based on Eigenvalue invariance
with a single marking. This avoids the first shortcoming of the marking algorithm.
In the second phase, the marking algorithm is performed only on the submatrices
18
whose corresponding vertices belong to the same equivalence class from phase one.
This refines the original vertex partition, and at each stage of refinement, the
similarity transformation matrix Q is examined and a candidate permutation matrix P
is computed by solving an assignment problem that maximizes the sum of the
absolute values of Qs entries (a one perrow, percolumn choice). This avoids
having to mark the entire matrices before a possible permutation matrix is found. We
will now discuss each of the phases of the algorithm individually.
The first phase will determine equivalence classes of a graph matrix by
separately marking each of its diagonal entries and checking to see if the Eigen
values are the same. The process will be done for both graphs and can complete in
0(n4) steps. A Possible Mapping Tree (PMT) is constructed from these equivalence
classes, where at level i, all possible mappings between the ith vertex equivalence
classes of Ga and Gb are represented. Note that for some graphs this tree may just
contain a root node and leaf nodes. The PMT gives us a vertex partitioning, for
which all diagonals for each vertex in a partition are marked with a value sufficiently
distinct from the other partitions, to allow for refined future partitioning. If the newly
marked matrices are isospectral, then the algorithm moves on to the second phase
described below.
Now, a refined marking is executed on vertex pairs in the same equivalence class.
At each stage of refinement, Q is computed, which converges towards some P. Any
Q defines a weighted matching, whose cost is bounded by n, by some P that
19
maximizes the overall weighted matching cost between the vertices of GA and Gb
The best choice for the next proper marking, at each step, is the one that maximizes
the overall cost from Q. The running time for this choice computation is 0(n4).
If no backtracking is needed, since the maximum number of markings is n 1,
the running time of this algorithm is 0(n5). It is possible that the algorithm may need
to backtrack after marking some nodes, when it is unable to proceed to a permutation
matrix P. In this case, the running time again may become exponential as in the
marking algorithm. However, if PINGs are present, nonisomorphism will be
detected early, unlike in the marking algorithm. In some cases, if backtracking is not
required, P may be found earlier than in the marking algorithm as well.
20
2. Developer Requirements
This section describes the requirements in paragraph format for the Graph
Isomorphism Program from a developers point of view. Some of these requirements
were verbally discussed and some were left to be decided by the developer.
This program will determine if two simple, undirected, unweighted graphs
with no selfloops or multiple edges are isomorphic or not. The program will have a
Graphical User Interface (GUI) and must be able to run on Microsoft Windows XP.
Once the program is launched, the user will be allowed to input two graphs to check
if they are isomorphic or not.
The program will allow for four types of graph input. Both graphs must be
specified using one of these four input formats. Two of the input types will have a
limit of onehundred nodes. The user will have to select which format he/she would
like to use, and the program will provide a method for this selection. The user may
choose to supply the graphs by using GUI controls designed to represent adjacency
matrices for the graphs. There will be two of these Adjacency Matrix Graph
Representation (AMGR) GUI controls, sidebyside, to allow for graph input and
modification. The AMGRs will allow the user to input graphs with up to one
hundred nodes.
21
The user will be able to specify files for the input graphs. The format of such
files will be explained in a ReadMe.doc file accompanying the program. The
number of nodes and edges for the input graphs will be unlimited when using this
type of input.
The third type of graph input will be a Graph Diagram Builder (GDB), which
will allow the user to build a regular graph diagram by adding nodes and edges to this
builder. This GDB will also allow for deletion of nodes and edges, and a mechanism
for relabeling of nodes. There will need to be one GDB for each input graph, and it
should not allow any selfloops or multiple edges, and no more than onehundred
nodes.
The fourth and last type of graph input will be a Random Graph Generator
(RGG), which will allow the user to generate random input graphs. The user will
need to specify the number of nodes and edges, with no graphsize limit imposed.
The user will be able to create both graphs randomly, or create the one random graph
and another random graph isomorphic to the first.
The program should treat the AMGRs as the main input format to read from
prior to performing the isomorphism check, as discussed below.
A button will be provided for the user to click on once the input graphs have
been provided in one of the three formats. When this button is clicked, the program
will first determine which format the user chose to provide the input graphs. If the
user chose to provide the input graphs from files, the program will check to see if the
22
files exist and that they are in the right format. The format will be checked as the
program reads in the graphs, from the files, into the AMGRs. The program will also,
at this time, check to see that the graphs do not contain any selfloops or multiple
edges. If the user chose to provide the input graphs using the GDBs, the AMGRs will
be updated from them. Similarly, if the user chose to provide the input graphs from
the RGGs, the AMGRs will be updated from them.
After checking the input and updating the AMGRs, the program will check to
see if the graphs have the same number of nodes and edges. Then finally, the main
part of the program is executed to see if the graphs are isomorphic or not. The
program will check for isomorphism according to The Marking Algorithm presented
in Altmans paper [3]. This algorithm will not be discussed in this section, but will be
discussed in the detailed design for this program along with some of its
implementation details.
Once the program has checked for isomorphism and the results have been
given to the user, the userprovided inputs the will not be cleared. Another button
will be provided to clear the form, if the user desires. This will allow the user to
modify the graph inputs and check for isomorphism again using the same button. The
user can either modify the AMGRs, or modify the input graphs using the same format
used to provide the input graphs originally. The user will have to select whichever
format he/she will modify for both graphs, as in the previous run. Then the user can
23
make the modifications, and the program will then use the input format selected, to
obtain the input graphs before checking them for isomorphism.
When using the RGGs for the input graphs, if the user chooses, the program
will generate the second graph (B), isomorphic to the first (A), by performing random
permutations. A random permutation matrix (P) will be created and PTAP will
produce B. The program will store P, also while running the algorithm to check for
isomorphism when using RGGs, the program will also store the permutation matrix
produced by the algorithm. A button will be provided to compare these two
permutation matrices, after checking for isomorphism.
Finally, the program will also provide a progress bar and status textindicator
to show the progress while checking for isomorphism. The time taken to check for
isomorphism will also be reported by the program.
24
3. HighLevel Design
This section is for the Graph Isomorphism Program. It will discuss in
paragraph format, how the program will accept input and give output. In other words,
it will describe the Graphical User Interface (GUI) for the program. These designs
follow directly from the requirements for the program.
The program will need some way of allowing the user to choose the way in
which the input graphs will be provided. That is, either from adjacency matrices,
files, graph diagrams, or random graphs. This can be accomplished by radio buttons
across the top left of the form. The first one on the left will be Adjacency Matrices
and will be selected by default. Then we will have the radio button for Files, then
one for the Graph Diagram Builder, and the last one on the right, will be for the
Random Graph Generator. Above these radio buttons will be the following label:
Select the method for providing the input graphs.
Below the radio buttons will be two text fields labeled Graph A and Graph
B for the full paths to the input files. There will also be Browse buttons to the
right of each of these text fields to allow the user to navigate to the files. All four of
these controls will be disabled unless the user selects the Files radio button. Below
these controls, there will be three tabs: one for the Adjacency Matrix Graph
Representation (AMGR) controls, one for the Graph Diagram Builder (GDB)
controls, and one for the Random Graph Generator (RGG) controls.
25
The AMGR controls will be two sidebyside checkbox grid controls with
labels 0 to 100 for both rows and columns. All checkboxes in each of the
AMGRs will be unchecked and the user will have to simply check a box to represent
an edge. Both of these controls will be disabled until the user selects the number of
nodes from a list box, labeled Number of nodes, in the middle of the two controls.
When the user selects the number of nodes, the controls will be enabled and the grids
resized to match the number of nodes selected. These grid controls will need vertical
and horizontal scrollbars to allow the user to scroll through the rows and columns. If
the user selects a checkbox on the diagonal of either grid, to represent a selfloop, a
message will be displayed that the selection will be ignored, as selfloops are not
allowed.
The GDB controls, will reside on another tab. Each of these sidebyside
controls will allow the user to build a diagram of a graph using nodes and edges. The
details of these controls cannot yet be determined, as no such standard controls exist.
Most probably, some third party components will be used to build the GDB controls.
These controls will not allow, or the program will ignore, selfloops and multiple
edges. These controls will not allow the user to attempt to build a diagram with more
than 100 nodes. These controls will allow the user to change the labels of nodes,
delete edges, and delete nodes.
The last and third tab will be for the RGG controls. At the top of this tab,
there will be a label, Specify the number of nodes." Below this label will be a text
26
box to allow the user to enter the number of nodes. Below this, there will be another
label, Specify the number of edges. Below this label will be another text box to
allow the user to enter the number of edges. Below and indented to the right, will be
a button to Generate Random Graph A followed below by another to Generate
Random Graph B. Then below these buttons, will be the label OR, followed
below by another button to Generate Random Graph B Isomorphic to Graph A.
Finally, below this button and aligned to the bottom right of the tab area, will be a
button to Check if Permutation Matrices Equal. This button will only be enabled if
the user chooses to generate graph B isomorphic to A, and after the program has
checked for isomorphism and found the graphs to be isomorphic. If the program
works correctly, the latter will always be true if the former is true.
Underneath these tabs to the right there will be two buttons, one for Clear
Form and to the right of that, one for Check for Isomorphism. The form clear
button will clear all the input the user has provided and reset the form, variables, and
data structures. The other button, when clicked, will first check all the input for
errors, then store the graphs in an internal object representation along with other
variables, and then find out if the graphs are isomorphic. The program will check for
isomorphism according to the Marking Algorithm presented in Altmans paper [3].
This algorithm will not be discussed here but will be discussed in the detailed design
for this program along with other implementation details.
27
On the opposite side of the form, from the buttons discussed in the above
paragraph, to the left, will be a status indicator, below which will be a progress bar.
As the program checks for isomorphism, the different stages will be indicated by the
status indicator. Suitable status indication texts will be decided and implemented
when the program is written. Similarly, the progress bar will progress as
computations are completed by the program, and these computations and
progressions will have to be decided at the time the program is written. This is due to
details that have not yet been designed and are beyond the scope of this section.
Once the program has found the graphs to be isomorphic to one another, or
not, this result will be indicated by the status indicator and the progress bar will be
reset. A message box will also accompany, to indicate the result. After the user
clicks the OK button on this message box, the status indicator will also be reset, but
the form will not be cleared. This is so that the user can modify the input and check
for isomorphism again.
Lastly, we will discuss the format of the input files, if files are used for the
input graphs. The files must be simple text files and could be created in Notepad,
which is available in Microsoft Windows XP. The first line of the file will be the
number of nodes. The node numbering or labeling will begin at 1, not 0. All other
lines will simply give two numbers separated by a comma, to represent an edge.
There can only be one edge on each line. Empty lines or lines with only whitespace
28
characters will not be accepted anywhere in the files. An example of a simple file
would look like the following.
29
4. Detailed Design
This section will address in paragraph format, the implementation design
details for the program. These designs follow directly from the requirements and
highlevel design for the program. The previous two sections should be reviewed
prior to reading this section. This detailed design will be the final documentation
leading to the actual implementation of the program.
First, we will discuss the hardware and software requirements for running the
program and for the computer that will be used to build the program, along with the
development environment and programming language. Second, we will discuss the
main code files, classes, and data structures that will be part of the program. Third,
we will discuss the specification of the main algorithms that will be used by the
program for Graph Isomorphism (GI). The GDB or Graph Diagram Builder controls,
that allow the user to provide input graphs by creating graph diagrams, will need to be
created using some third party component(s), which will be determined during
implementation.
4.1 Program Hardware and Software Specifications
The minimum hardware and software requirements for a machine to be able to
run this program are as follows.
Intel Pentium 4 1.6GHz or compatible CPU
256 MB of RAM
100 MB of free harddisk space
30
Video Card that supports 32Bit color quality and 1152x864 screen resolution
17inch color monitor
Microsoft Windows XP operating system
Microsoft .NET Framework version 1.1.x
The hardware and software specifications for the machine that will be used to build
this program are as follows.
Intel Pentium 4 2.4GHz CPU
256 MB of RAM
24 GB of free harddisk space
NVIDIA GeForce4 MX 440 with AGP8x video card
17inch color monitor
Microsoft Windows XP Professional with Service Pack 2 operating system
Microsoft .NET Framework version 1.1.4322
The development environment chosen for the program is Microsoft Visual Studio
.NET Architect 2003, and the programming language is Visual Basic .NET or
VB.NET. This is an objectoriented language, released in 2003, developed by
Microsoft as a very highlevel easytouse programming language. The Object
Oriented Paradigm will be used in building the program.
4.2 Code Files, Classes, and Data Structures
The program will have a file named Main.vb which will contain all the
autogenerated code from the development environments GUI builder for the single
GUI form of the program. This file will contain any code required to allow user
interaction with the GUI. It will contain the code that allows the GUI to interact with
the rest of the programs functionality. All the functionality in this file will be
contained in a single autogenerated class called Main. This Main class will
31
contain the main code that will allow all of the objects of the program to interact with
each other as well. Furthermore, this class will contain functions to update the status
indicator and progress bar of the GUI form and will have global member object
variables for the classes below. A timing mechanism will also be implemented for
final analysis.
Another file called Graph.vb will contain a single class named Graph to
implement the graph data structure. This data structure will be used to store
internally and interact with simple graphs throughout the processing of the program.
This class will contain three, main global member variables to represent a graph: one
for the number of nodes in a graph, another for the number of edges, and a double
dynamic integer array to store the adjacency matrix of a graph. The class will contain
input functions for adding a node and adding an edge to a graph for building graphs.
These functions will not allow multigraphs, multidigraphs, or graphs with selfloops
to be created. The class will also contain accessor functions to get the number of
nodes, number of edges, and any valid entry from the adjacency matrix.
A file called GraphEigenValues.vb will contain a single class named
GraphEigenValues to provide the program with the functionality to find the Eigen
values of a graph. The constructor for this method will take a Graph object for which
the Eigenvalues need to be found. Thus, each object of this class will deal with a
single graph. A thirdparty component will be used for the calculations, more details
can be found in the Algorithms section. One global member variable will be
32
needed to represent a matrix in the format the third party component needs. The
constructor will put the supplied graphs adjacency matrix into this format. One
main, readonly property function called GetEigenValues will use the thirdparty
component to return the Eigenvalues of the graph as an integer array.
Another code file for this program will be named Isomorphic.vb and will
contain a single class called Isomorphic. The Isomorphic class will use objects of
both the Graph and GraphEigenvalues classes, discussed above, to find out if the two
input graphs are isomorphic to one another. This class will check for isomorphism
according to the Marking Algorithm presented in Altmans paper [3]. This paper also
mentions the use of the LRCholesky method for finding Eigenvalues of a graph.
This class will implement functions for the Marking Algorithm as described in the
next section. This class will also have a function for comparing Eigenvalues of two
graphs to see if they are the same or not, using objects of the GraphEigenvalues class.
The algorithm used by the class above will require a fifth file named
IntPairStack.vb, which will contain a structure called IntPair, and a class called
IntPairStack. The IntPair structure will contain two public member integer
variables called Firstlnt and Secondlnt. The IntPairStack class will have one
private member variable, which will be an array of IntPair structures. Later in this
section, we will discuss how this stack will be used by the algorithm. The stack will
be a first in last out or last in first out stack with push and pop functions that take and
33
return an IntPair, respectively. The constructor will take two integers, to push onto
the stack.
4.3 The Marking Algorithm
This algorithm is based on the Principal Axis Theorem and a specific marking
of the diagonal elements of the adjacency matrix. Computation of Eigenvalues is
part of the algorithm. To ensure that the algorithm for computation of these
converges, it is necessary that only symmetric, positive definite matrices are used to
represent the graphs. Matrices for undirected graphs are symmetric and the algorithm
can be used for simple undirected graphs. For the matrices to be positive definite, it
is sufficient that the matrices be strongly diagonally dominant, i.e., the sum of the
absolute values of the nondiagonal entries must be less than the absolute value of the
diagonal entry, for each diagonal entry in each row. This can be accomplished by
setting the diagonal entries to n. This will not change the structure of the permutation
matrix P, whose existence will show that the input graphs A and B are isomorphic.
The LRCholesky method for computing Eigenvalues is recommended, since
it is guaranteed to converge for the type of matrices described above with sufficient
accuracy in polynomial time. The algorithm attempts to iteratively change or mark
the diagonal entries of the adjacency matrices so that they remain isospectral or
orthogonally congruent or have the same Eigenvalues, while converging to an Eigen
value multiplicity of one, or having distinct Eigenvalues. If successful, the matrices
will be permutation similar, and thus isomorphic. The marking maps diagonal
34
elements of one matrix to the other, essentially creating the node permutations, and
ultimately finding out if the graphs are isomorphic. With certain isospectral graphs,
this algorithm will be forced to backtrack and try a different marking sequence,
causing it to run in exponential time. Pairs of Isospectral Nonisomorphic Graphs or
PINGs, where the graphs are orthogonally congruent, but not permutation similar,
will cause this algorithm to backtrack. If the graphs are isomorphic, the algorithm
marks their adjacency matrices in such a way as to produce orthogonally similar
matrices, that is, they have the same Eigenvalues. However, whether the graphs are
found to be isomorphic or not, the algorithm hopes they remain isospectral during the
marking, otherwise backtracking will be required. If the graphs are isomorphic and
no backtracking is involved, the running time of the algorithm is 0(n4).
Below is the pseudocode for the algorithm. This pseudocode needs to be
refined in order to implement the backtracking functionality.
Initialize ilast = jlast = 1
If Not SameEigenValues(A, B) then return NOT ISOMORPHIC
Else
For i from ilast to n
A[i, i] = 2in
For j from jlast to n
If B[j, j] = n then
B[j,j] = 2in
If SameEigenValues(A, B) then Exit jloop
Else If j = n then BACKTRACK if possible,
else return NON ISOMORPHIC
Else B[j, j] = n
End For
End For
Endlf
35
In order to implement the backtracking functionality, this algorithm will be divided
into two functions, a parent and a child. The parent function can then call the child
function in a loop. The public parent function will be called
ChecklfGraphsIsomorphicByMarking and the private child function will be called
MarkGraphs. The Stack data structure, as described in the previous section, will be
used here and the algorithms are more detailed. The pseudocode for the parent is
given below.
Initialize ilast = 1
Initialize jlast = 0
Initialize BackTrack = False
Initialize Stack(ilast, jlast)
Set all diagonal entries in A and B to n
While (True)
If MarkGraphs(A, B, Stack, BackTrack) Then
Return ISOMORPHIC
Else
If BackTrack Then
If Stack is Empty Then
Return NOT ISOMORPHIC
Else
Return NOT ISOMORPHIC
Endlf
If the marking is successful, the child returns true and we have isomorphic graphs.
Otherwise, the child function will set the BackTrack variable, letting us know if we
need to attempt to backtrack or not. If not, then the graphs are not isomorphic;
otherwise, we attempt to backtrack. However, if the stack is empty, we cannot
backtrack, meaning the graphs are not isomorphic. The pseudocode for the child is
36
given below. It is almost the same as the original pseudocode with some slight
changes to work with the parent function and add more detail. Note that both
parameters of this child function are reference parameters, that is, if the child function
modifies the passed arguments, these changes will be made to the actual objects
declared in the parent function. The first two are for the graphs, one is the
IntPairStack and the other is the backtracking variable.
Initialize BackTrack = False
Initialize IntPair = Stack.Pop
If Not SameEigenValues(A, B) then
return FALSE
Else
For i from IntPair.Firstlnt (ilast) to n
A[i, i] = 2in
For j from 1 to n
If B[j, j] = n AND j != IntPair.Secondlnt (jlast) then
B[j, j] = 2in
If SameEigenValues(A, B) then
jLast = j
Stack.Push(i, j)
Exit jloop
Else
B[j,j] = n
If j = n then
BACKTRACK = TRUE
Return FALSE
Endlf
Endlf
Endlf
End For
End For
Return TRUE
Endlf
37
4.4 The LRCholesky Algorithm
Now it is time to discuss the LRCholesky algorithm for computing the Eigen
values of a graph. I have been researching this algorithm using resources online and
have found the problem of calculating Eigenvalues to be a very difficult one, in the
field of Linear Algebra. In mathematics, and in particular in Linear Algebra, an
important tool for describing Eigenvalues of square matrices is the characteristic
polynomial. Saying that X is an Eigenvalue of A is equivalent to stating that the
system of linear equations (A XI)v = 0 (where I is the identity matrix) has a nonzero
solution v (namely an Eigenvector), and so it is equivalent to the determinant det(A 
XI) being zero. The function p(k) = det(A XI) is a polynomial in X since
determinants are defined as sums of products. This is the characteristic polynomial of
A, and the Eigenvalues of a matrix are the zeros of its characteristic polynomial. If
the matrix is small, we can compute them symbolically using the characteristic
polynomial. However, this is often impossible for larger matrices, in which case we
must use a numerical method. In fact, for any square matrix larger than 3x3, the
problem becomes difficult to solve. An analytic solution for the Eigenvalues of 2x2
matrices can be obtained directly from the quadratic formula. If
a b
A =
c d
then the characteristic polynomial is
38
det
a A b
c d A
= (a A)(d A)bc = A2 (a d)A + {ad be).
Notice that the coefficients are the trace TR(A) = a + d and the determinant det(A):
ad be, so the solutions are
. a + d (a + d)2 , ,
A =,  + bc ad =
_ a + d + ^4bc + {ad)2
A formula for the Eigenvalues of a 3x3 matrix or 4x4 matrix could be derived in an
analogous way, using the formulae for the roots of a cubic or quatric equation.
The LRCholesky algorithm relies on reducing the matrix A to tridiagonal
form. It uses Laguerres shifts for the symmetric tridiagonal matrices. The general
algorithm is as follows.
Let A be a s.p.d. matrix.
A0 = A
k = 0,1,2,...
choose shift Ok
T
Ak aj = VkVk (Cholesky decomposition, Vk is lowertriangular)
Ak+i = VkTVk + akI
Two steps of the zero shift Cholesky LR are equivalent to one step of the zero shift
QR. The shift Ok should be such that Ak cj is positive definite. When applied to a
symmetric positive definite diagonalplussemiseparable matrix, D + S, where D is
the diagonal, the shift can be included into the diagonal part.
Algorithms for the Laguerres Shift, VTV Product, and the Cholesky
Decomposition are also needed, along with the equations for their computation.
39
Implementing this algorithm becomes a more complex problem than implementing
the algorithm for GI, discussed in the previous section. The main purpose of this
program and thesis work is to solve the GI problem in Computer Science. Since this
Eigenvalue problem is mainly a study in Linear Algebra, and because of the
complexities of its implementation, it seems unnecessary to spend the time and effort
in implementing this algorithm. Also, code for this algorithm is not readily available
over the internet either, it seems that only companies have implemented similar
algorithms and are charging for their components and/or software.
After a significant amount of research, a .NET component that will work with
VB.NET called .NET Matrix Library 2.5 made by BlueBit Software, was found. The
advanced version 2.5 can calculate the Eigenvalues of a matrix efficiently using the
Cholesky Decomposition, for symmetric real matrices. Of course, the matrix may
have to be supplied in a form that is accepted by the functions provided in the .NET
component for calculating Eigenvalues. In addition, learning to use the library will
also be another task, but the process should be simple and documentation should be
available on the internet or with the developer install. This component may need to
be purchased since the trial version only handles 6x6 size matrices. Using this
component will allow focus on the GI problem for the thesis work rather than
worrying about the Linear Algebra problem of calculating Eigenvalues.
40
5. Testing and Analysis
The material in this section will serve as a roadmap for thoroughly testing the
program. The test cases will be organized according to the number of nodes,
increasing from lower boundary values. Since the program accepts input graphs in
four different formats, that is, adjacency matrices, files, diagrams, and randomly
generated graphs, some tests will be run four times, once for each format. It will also
be necessary to have tests for a particular format that do not apply to the other
formats. Similarly, for some tests it will only make sense to use one type of format.
Test data has been organized into the table below. Three of the graph input
formats will be abbreviated as AMGR for Adjacency Matrix Graph Representation;
GDB for Graph Diagram Builder; and RGG for Random Graph Generator. The first
column is the test number for each distinct number of nodes and edges, including a
letter for the different input formats. The input format used to provide the graphs is
given in the Input Type column. The Program State column specifies whether,
before running the test, the program was relaunched, the form was cleared, or the
input modified. The Iso column specifies whether the test was for isomorphic
graphs or not. Before testing, in all cases, we should know whether the graphs are
isomorphic or not. For each test, if the program correctly identifies the graphs as
isomorphic or not, the Passed column will contain Yes, to show that the program
41
passed the test. The Time Taken column gives the actual running time rounded to
the nearest millisecond, as reported by the program in the status indicator.
Table 5.1 Test Data
Test Nodes Edges Input Tvoe Program State Iso Time Taken Passed
la 0 0 Files Relaunched Y N/A Yes
lb 0 0 GDB Relaunched Y N/A Yes
2a 1 0 AMGR Relaunched Y N/A Yes
2b 1 0 Files Form Cleared Y N/A Yes
2c 1 0 GDB Form Cleared Y N/A Yes
3a 2 0 AMGR Relaunched Y N/A Yes
3b 2 0 Files Input Modified Y N/A Yes
3c 2 0 GDB Input Modified Y N/A Yes
4a 2 1 AMGR Relaunched Y N/A Yes
4b 2 1 Files Relaunched Y N/A Yes
4c 2 1 GDB Relaunched Y N/A Yes
4d 2 1 RGG Relaunched Y N/A Yes
5a 3 1 AMGR Relaunched Y 750 Yes
5b 3 1 Files Form Cleared Y 109 Yes
5c 3 1 GDB Form Cleared Y 94 Yes
5d 3 1 RGG Form Cleared Y 78 Yes
6a 3 2 AMGR Relaunched Y 313 Yes
6b 3 2 Files Input Modified Y 125 Yes
6c 3 2 GDB Input Modified Y 78 Yes
6d 3 2 RGG Input Modified Y 94 Yes
7a 3 3 AMGR Form Cleared Y 0 Yes
7b 3 3 Files Relaunched Y 406 Yes
7c 3 3 GDB Relaunched Y 406 Yes
7d 3 3 RGG Relaunched Y 109 Yes
8a 4 2 AMGR Form Cleared Y 0 Yes
8b 4 2 Files Form Cleared Y 125 Yes
8c 4 2 GDB Form Cleared Y 94 Yes
8d 4 2 RGG Form Cleared Y 109 Yes
9a 4 2 AMGR Form Cleared N 16 Yes
9b 4 2 Files Input Modified N 125 Yes
9c 4 2 GDB Input Modified N 78 Yes
9d 4 2 RGG Input Modified N 78 Yes
42
10a 4 3 AMGR Input Modified Y 0 Yes
10b 4 3 Files Relaunched Y 406 Yes
10c 4 3 GDB Relaunched Y 391 Yes
lOd 4 3 RGG Relaunched Y 109 Yes
11a 4 3 AMGR Input Modified N 16 Yes
lib 4 3 Files Form Cleared N 109 Yes
lie 4 3 GDB Form Cleared N 78 Yes
lid 4 3 RGG Form Cleared N 109 Yes
12a 5 4 AMGR Input Modified Y 0 Yes
12b 5 4 Files Input Modified Y 125 Yes
12c 5 4 GDB Input Modified Y 94 Yes
12d 5 4 RGG Input Modified Y 78 Yes
13a 5 4 AMGR Relaunched N 312 Yes
13b 5 4 Files Input Modified N 125 Yes
13c 5 4 GDB Input Modified N 94 Yes
13d 5 4 RGG Input Modified N 94 Yes
14a 6 9 AMGR Form Cleared Y 16 Yes
14b 6 9 Files Input Modified Y 109 Yes
14c 6 9 GDB Input Modified Y 94 Yes
14d 6 9 RGG Input Modified Y 94 Yes
15a 6 9 AMGR Form Cleared N 0 Yes
15b 6 9 Files Input Modified N 172 Yes
15c 6 9 GDB Input Modified N 78 Yes
15d 6 9 RGG Input Modified N 94 Yes
16 20 16 RGG Relaunched N 328 Yes
17 20 16 RGG Input Modified Y 188 Yes
18 20 45 RGG Input Modified N 94 Yes
19 20 45 RGG Input Modified Y 156 Yes
20 50 50 RGG Input Modified N 141 Yes
21 50 50 RGG Input Modified Y 1234 Yes
22 52 100 RGG Input Modified N 105 Yes
23 52 100 RGG Input Modified Y 2016 Yes
24 100 120 RGG Input Modified N 125 Yes
25 100 120 RGG Input Modified Y 28781 Yes
26 105 330 RGG Input Modified N 172 Yes
27 105 330 RGG Input Modified Y 38641 Yes
Now that we have a set of test cases and their results, we can analyze the test data.
43
The program result was as expected; for all test cases, meaning, in each case,
the program passed the test. For the first tests, up to and including 4d, the program
did not report any running times. This is due to the fact that the program did not run
the algorithm to check for isomorphism. This is the expected behavior in these tests,
since the nodes were less than three and the number of edges could not be more than
one.
From test 5a through 15d, the program state is changed to see if the program
gets the correct result in different situations, which it did. The running times in these
tests are so small due to the small number of nodes and edges, that they are more
dependant on the machine and operating system than the number of nodes and/or
edges. However, in most cases, if the user needs to run multiple checks, the user
would simply modify the input rather than clearing the form or relaunching the
program. Therefore, in the tests where the input was modified, checks for
isomorphism were performed a few times to make sure the running times were not
significantly different. This is important if we want to analyze the running time. The
program code does not store any data for future calculations or processing between
checks for isomorphism. With these rechecks, we now know that this is also the
case for the third party component used for Eigenvalue calculations, the operating
system, the .NET framework, and the machine itself.
From test 5a through 15d, the different input types are also tested, to see if the
program gets the correct result no matter how the input graphs are provided, which it
44
did. These tests also show that the type of input format used to provide the graphs
effects the running time. Therefore, we should use the same input type for analyzing
running times.
The last tests, from test 16 through 27 all use the RGG input type to randomly
create input graphs, both isomorphic and not. Generally speaking, when the program
randomly creates both input graphs, they will not be isomorphic. If the user chooses
to create graph B isomorphic to A, then they will always be isomorphic. In general,
these tests show that the running times are significantly higher when the graphs are
isomorphic. This makes sense since the randomly generated graphs probably do not
have the same Eigenvalues before marking, even when the diagonal elements are all
the same in both graphs. These tests also show that the running times are much more
dependant on the number of nodes and not the number of edges. This is because
increasing the number of edges, with the same number of nodes, to make the graphs
more dense, does not significantly change the running time.
For isomorphic tests, from test 16 through 27, the program option to randomly
create graph B isomorphic to graph A, was used. When using this option, the
program randomly generates a permutation matrix to create graph B isomorphic to A,
which is stored. The program then also has another option to see if this permutation
matrix is the same as the one generated by the algorithm, after checking for
isomorphism. This check was performed in all of these tests and showed that there
45
can be more than one permutation matrix. Furthermore, it showed that it is more
likely to have the same permutation matrices for denser graphs.
The program also has other options for the GDB input format. You can copy
the diagram from graph A to B and then modify it to save time. This option was used
for all GDB input format tests. Other options allow you to swap two node labels in
the diagrams and randomly relabel the nodes in the diagram for graph B. These two
options were tested independently from the tests given in the table above, and worked
correctly. Similarly, many other tests, not in the table above, were performed to make
sure the program works correctly. For example, there were many tests performed to
ensure the program worked correctly and displayed the correct messages when bad
input was provided by the user.
Now it is time to do the final running time analysis, that is, to show that the
running time is actually 0(n4), as is for the algorithm, and that the program code
correctly implements the algorithm. The running time of 0(n4) is based on the
number of nodes only, and the program test results show that denser graphs do not
significantly change the running time, if the nodes are kept constant. This running
time is the highest order term of a quatric equation,
An4 + Bn + Cn2 + Dn = Actual Running Time.
Here A, B, C, and D are all constants and n is the number of nodes. From tests 21,
23, 25, and 27, we can get the numbers of nodes and actual running times, and solve
46
the system of equations that result. The system of equations that needs to be solved
is,
6250000.4 +1250005 + 2500C + 50D = 1234
7311616.4 + 1406085 + 2704C + 52Ã‚Â£> = 2016
100000000.4 +10000005 +10000C +100D = 28781
121550625.4 +11576255 +11025C + 105D = 38641.
The solutions for A, B, C, and D were found using a computer program available on
the internet [20]. This gives us an equation for the actual running time for the
program,
93323757 4 58917728297 3 142439935447 2 19958154731
21221200000 63663600000 + 2122120000 12732720
Using this equation, now we can predict the running time for, say, 75 nodes, which
comes to 8717 milliseconds or approximately 9 seconds. Running the program using
randomly generated isomorphic graphs with 75 nodes and 150 edges, a running time
of 9345 milliseconds or approximately 9 seconds was reported. While not exactly
equal, the two running times are equal when rounding to the nearest second.
However, this will not always be the case, because the algorithm has backtracking
built into it, and depending on the graphs, it may run in exponential time.
Furthermore, the test data used to create the system of equations above may have
contained tests in which backtracking was needed. To some extent however, we have
shown that the running time of the implemented algorithm is indeed 0(n4).
Therefore, the actual running time may be less or more than the running time
calculated using the equation above.
47
6. Conclusion
The Vertex Filtering (VF) and McKays Nauty were two algorithms
researched in more depth as they were advertised as the best available. However,
there is no efficient general Graph Isomorphism (GI) algorithm. In the worst cases,
these algorithms perform the same as the bruteforce naive algorithm. That is, with a
0(f) exponential running time, where t and s are the number of nodes in the input
graphs. However, in most cases, they outperform any other available algorithms. It
seems the VF2 and the Nauty algorithm can perform equally well, depending on the
type of graphs and valence. The running times for these algorithms are highorder
polynomials.
Also studied in depth, were the Marking Algorithm and the Projection
Algorithm, both given in Altmans paper [3]. These are not solutions to the general
problem, nor are they approximation algorithms; they basically restrict the input to
simple undirected graphs. For many applications, they can be very useful. The
Marking algorithm has a running time of 0(n4) and the Projection Algorithm has a
running time of 0(n5). In the worst case, both will still run in exponential time, in the
Projection Algorithm the likelihood of this happening is decreased.
The GI program designed, written, and tested; was discussed with Altman and
implements the Marking Algorithm [3]. Developer Requirements, HighLevel
Design, and Detailed Design documents were created for the program. This program
48
uses the Marking Algorithm to see if two simple undirected graphs are isomorphic.
While writing these documents, a significant amount of research was done on
calculating Eigenvalues, using the LRCholesky method, and no useful pseudocode
or actual code was found for this daunting task. Since the algorithm relied on this, a
suitable thirdparty component was found. The same is true for the graph diagram
builder that is part of the program and allows the user to enter the two input graphs by
drawing them. However, this functionality required much more coding to get it to
work correctly, and even more research and testing to find a suitable thirdparty
component.
Writing the program took up most of the time spent on this thesis, and
modifications to the design were required throughout the process, but in the end, it
was successful. A Testing and Analysis document was also written and used as a
road map for thoroughly testing the program. The program created is a userfriendly
GUI program with robust error checking and reporting for bad input and system
exceptions. Four different input formats can be used to specify the graphs; these
include adjacency matrices, files, diagrams, and random generation. The adjacency
matrices are provided by simply selecting the number of nodes from a list, and then
selecting checkboxes for edges. The diagram builder controls required the most
custom programming and allow the user to copy the diagram for graph A to graph B.
Options are also available to swap labels of two nodes and for graph B, to randomly
relabel the nodes. Graph B can be randomly generated along with A or isomorphic
49
to A, using a randomly generated permutation matrix. This matrix can then be tested
for equality against the one found by the algorithm, after it discovers the graphs are
isomorphic. The program also includes an option to clear and reset the form, and a
status indicator along with a progress bar. The program has been thoroughly tested
and is an easytouse, reliable tool for checking if two graphs are isomorphic.
When randomly generating both graphs, for larger numbers of nodes, they
were generally found to be nonisomorphic. When randomly generating graph B
isomorphic to A, the random permutation matrix used to create graph B was not
always the same as the one found by the algorithm. This shows that there can be
multiple permutation matrices for pairs of isomorphic graphs. It was also found that
this is less likely for denser graphs, and in such cases, the two permutation matrices
were the same. The programs running time was dominated by the number of nodes,
as is the case for the algorithm. In fact, in the Testing and Analysis section of this
thesis, to some extent, it is shown that the running time of the program is 0(n4), as
given for the algorithm [3].
50
REFERENCES
1. Cvetkovic, D.M., Doob, M, and Sachs, H., Spectra of Graphs Theory and
Application, Academic Press, New York (1980).
2. Gross, J., Yellen, J., Graph Theory and Its Applications, CRC Press, Boca
Raton, Florida (1998).
3. Altman, T., An Algebraic Approach to the Graph Isomorphism Problem, to
appear in Ullmans Quarterly Journal.
4. Foggia, P., Sansone, C., Vento, M., A Performance Comparison of Five
Algorithms for Graph Isomorphism. Napoli, Italy (2003).
5. Kukluk, J.P., Holder, L.B., Cook, D.J., Algorithm and Experiments in Testing
Planar Graphs for Isomorphism, Journal of Graph Algorithms and
Applications (2004) 313356.
6. Plestenjak, B., Camp, E.V., and Barel, M.V., A Cholesky LR Algorithm for the
Positive Definite Symmetric DiagonalPlusSemiseparable Eigenproblem,
Leuven, Belgium, 2005.
7. Comeil, D.G. and Gotlieb, C.C., An Efficient Algorithm for Graph
Isomorphism, Journal of the Association for Computing Machinery, 1970.
8. Ullman, J.R., An algorithm for Subgraph Isomorphism, Journal of the
Association for Computing Machinery, 1976.
9. Schmidt, D.C. and Druffel, L.E., A Fast Backtracking Algorithm to test
Directed Graphs for Isomorphism using Distance Matrices, Journal of the
Association for Computing Machinery, 1976.
10. Cordelia, L. P., Foggia, P., Sansone, C., and Vento, M., Performance
evaluation of the VF graph matching algorithm, Proc. 10th Int. Conf. on
Image Analysis and Processing, 1999.
51
11. Cordelia, L. P., Foggia, P., Sansone, C., and Vento, M., A Subgraph
Isomorphism algorithm for matching large graphs, IEEE Transactions On
Pattern Analysis and Machine Intelligence, 2004.
12. McKay, B.D., Practical Graph Isomorphism, 10th. Manitoba Conference on
Numerical Mathematics and Computing, 1981.
13. Bunke, H., Messmer, B.T., Efficient Attributed Graph Matching and its
Application to Image Analysis, Proc. on Image Analysis and Processing, 1995.
14. http://en.wikipedia.org/wiki/Graph_isomorphism_problem, 2006.
15. http://www.cs.ucf.edu/newsletter/voll/issue_two/graphtheory.html, 2006.
16. http://mde.abo.fi/confluence/download/attachments/675/LillqvistThesis.pdf,
2006.
17. http://amalfi.dis.unina.it/graph/db/papers/vfalgorithm.pdf, 2006.
18. http://en.wikipedia.Org/wiki/Graph_theory#History, 2006.
19. http://cs.anu.edu.au/~bdm/nauty/, 2006.
20. http://www.hostsrv.com/webmab/app 1 /MSP/quickmath/02/pageGenerate?site
=quickmath&s 1 =equations&s2=solve&s3=advanced#reply, 2006.
52

Full Text 
PAGE 1
ALGEBRAIC APPROACH TO GRAPH ISOMORPHISM by Shahzad Chaudhry B.Sc., University of Colorado at Denver, 2000 A thesis submitted to the University of Colorado at Denver and Health Sciences Center in partial fulfillment of the requirements for the degree of Master of Science Computer Science 2007
PAGE 2
This thesis for the Masters of Science degree by Shahzad Choudhry has been approved by Tom Altman 11 I J. 31 o'f Date
PAGE 3
Chaudhry, Shahzad (M.S., Computer Science) Algebraic Approach To Graph Isomorphism Thesis directed by Professor Tom Altman ABSTRACT The subject of this thesis is graph isomorphism. There are many ways to draw the same graph. If the graph is complex, with a little imagination, one can see that deciding whether two drawings represent the same graph can become a challenging task This, however, is not graph isomorphism The word iso indicates similarity and the word morph indicates change. The graph isomorphism problem is one of deciding whether two different graphs are similar in the sense of structural equivalence. That is whether they have the same pattern of connections To this day, no practical algorithm has been designed to solve this problem for all cases A Graph Isomorphism/from a graph G to a graph His a onetoone mapping for each node in G to a node in H such that for any two nodes u and v in G there are the same number of edges betweenf(u) andf(v) as there are between u and v in G. Two graphs G and Hare isomorphic, if and only if, there exists a vertex bijection/: Vo 7 VH that
PAGE 4
preserves edge connectivity. If you can relabel the nodes of one graph to match those of another graph, preserving edge multiplicity, you have two isomorphic graphs. This thesis provides a detailed specification of the graph isomorphism problem along with a discussion of algorithms for its solution. The major portion of the thesis work performed was the design and implementation of a computer program to check if two graphs are Isomorphic. This report also provides the design documentation, testing, and pseudocode for this program, which uses Altman's Marking Algorithm [3]. This abstract accurately represents the content of the candidate's thesis. I recommend its publication. Signed Tom Altman
PAGE 5
CONTENTS Figures ................................................................................................................... vii Tables .................................................................................................. ... ....... ....... viii Chapter 1. Introduction .... .... ............. .... ................ ............ ......................................... ......... 1 1.1 Graph Isomorphism Specification .................................................................... 1 1.2 Graph Isomorphism Algorithms ............................................... ..................... 7 1.2.1 Vertex Filtering (VF) Algorithm ................................................................ .12 1.2.2 McKay's Nauty Algorithm ...... ..... ............................... ..... ............................ 15 1.2.3 The Marking Algorithm ............................................................................... 16 1.2.4 The Projection Algorithm ............. ......................... ......................... ......... 18 2. Developer Requirements ............................................................ .......... ........... 21 3. HighLevel Design ............................................................................................ 25 4. Detailed Design .......... ....... .......... ... ................ ............... ....... ... ........................ 30 4.1 Program Hardware and Software Specifications ........................................... .30 4.2 Code Files, Classes, and Data Structures ............... ... .................................... 31 4.3 The Marking Algorithm ............................... .... ............................................... 34 4.4 The LRCholesky Algorithm ......................................................................... .38 5. Testing and Analysis ............................................................... ................ ... ...... 41 v
PAGE 6
6. Conclusion ........................................................................................................ 48 References ........ .... ....................... ........ .................................................. .............. 51 VI
PAGE 7
LIST OF FIGURES Figure 1.1 Example Graphs ....................................... ......................................................... 1 1.2 Example Graphs ........................................................... ..................................... 4 1.3 Example Graphs ................................................................................................. 5 3.1 Example Graph Input File ............................................................................... 29 Vll
PAGE 8
LIST OF TABLES Table 5.1 Test Data .............................................. .......................................................... 42 Vlll
PAGE 9
1. Introduction This introduction will provide a detailed specification of Graph Isomorphism (GI) followed by a discussion of algorithms to solve the problem. 1.1 Graph Isomorphism Specification It is not difficult to see that there are an infinite number of ways to draw the same graph. If the graph is a little complex, with a little imagination, one can see that deciding whether two drawings represent the same graph can become a challenging task. This, however, is not graph isomorphism. The word iso indicates similarity and the word morph indicates change. The graph isomorphism problem is one of deciding whether two different graphs are similar in the sense of structural equivalence. That is whether they have the same pattern of connections. To this day, no practical algorithm has been designed to solve this problem for all cases. With that said, let us move on to further specification and explanation of the problem. Consider the following two graphs, G and H. 9 4 2 2 4 Graph G Graph H Fig 1.1 Example Graphs 1
PAGE 10
It is easy to see, because of the way the graph is drawn, that by renaming the vertices, from graph H, we can get graph G. That is and transforms graph H to graph G. Note that this renaming or transformation in tum, causes the edges to change endpoints, that is edge (9, 2) in H now becomes edge (1, 4), as in G. Each renaming, of edges or nodes, is known as a bijection. If a set of bijections exists to transform a graph to another graph, the two graphs are isomorphic. However, finding these bijections can become very difficult when dealing with more complex graphs. Consider the following graph specifications: Graph G has 4 nodes with edges (4, 1), (4, 2), and (3, 4). Graph H has 4 nodes with edges (9, 2), (4, 2), and (2, 5). Yes, these are the same graphs G and H as shown in the diagrams before. Since the graphs are undirected, the edge (x, y) is the same as edge (.y, x). Now, without the aid of the diagram, we can see that it is not that straightforward to come up with the bijections to transform G to H or H to G. In the graphs above, the edge bijections are a direct result of node bijections. The node bijections are sufficient to show the transformation, and thus show that graphs G and H are isomorphic. The graphs are undirected, but what if the graphs were digraphs, where (x, y) and (y, x) are not the same edge? Then we can simply take the direction of the edges into account when looking for isomorphism. What if the edges were labeled and/or assigned weights, as in a weighted graph or weighted digraph? In this case, we simply say that the weight on an edge is not really part of 2
PAGE 11
the structure ofthe graph and is generally ignored in graph isomorphism. However, in multigraphs and multidigraphs, if edge multiplicity is not maintained, edge bijections are needed. That is, if an isomorphism,/, on a graph H, maps a node v to nodef(v) and a node u to nodef(u);f(v) andf(u) must have the same number of edges between each other as do the original nodes v and u in H. If this is not the case, then we are changing the structure of the original graph H, which is defined by its edges and which edges connect which nodes. In a naive algorithm, for the general case, it may require n! vertex bijections to show that one graph can be transformed into another, where n is the number of nodes. Now, let us formalize a definition of graph isomorphism. A graph isomorphism J from a graph G to a graph H is a onetoone mapping for each node in G to a node in H such that for any two nodes u and v in G there are the same number of edges betweenf(u) andf(v) as there are between u and v in G. Two graphs G and Hare isomorphic, if and only if, there exists a vertex bijection/: Vo VH that preserves edge multiplicity. If you can relabel the nodes of one graph to match those of another graph, preserving edge multiplicity, you have two isomorphic graphs. In other words, G and H are isomorphic if their vertices can be labeled or numbered so that adjacency matrices are equal. With regard to matrices, the problem for graphs G and H can be boiled down to: does there exist a permutation matrix P such that PTGP = H? Of course, adjacency must be preserved, to avoid changing the structure of the graph. A vertex bijection/: V0 VH preserves adjacency ifj(u) is adjacent tof(v) if 3
PAGE 12
and only if u is adjacent to v, for all u v in G. For simple graphs, which do not include multigraphs or multidigraphs, preserving adjacency is the same as preserving edge multiplicity (as given in the definition), since there is, at most, only one edge between any pair of vertices. In the case of digraphs the edge direction must also be preserved. Now that we know what graph isomorphism is, we can see that in order for two graphs to be isomorphic, they must have the same number of nodes. This is true, because each node in one graph has to be mapped by the isomorphism to a node in the other graph and because this is a onetoone mapping What about the number of edges? If the number of nodes and the number of edges are the same amongst two simple nonmulti graphs, are the graphs isomorphic? See the example below. r: Fi2 1.2 Example Graphs In this example, you can easily see that the structure of the graphs is different even though they have the same number of nodes and edges. The graphs are said to be of different isomorphism type. Graphs ofthe same isomorphism type are isomorphic. Isomorphic type, however, only seems to make sense if we are talking about graphs with a fixed number of nodes. On the other hand, we could abstract graphs to isomorphic types, in which case, all graphs with the same isomorphic type can be 4
PAGE 13
considered the same graph. Thus, the number of nodes and the number of edges being the same for two graphs G and H, is necessary, but not sufficient, to show isomorphism. Similar is the case for edge multiplicity preservation in an isomorphism. lffis an isomorphism from graph G to graph H, then node v in G for all nodes in G andf(v) in H must have the same degree. Otherwise, the structure of the graphs G and H is different and they cannot be isomorphic. See the following example. I I Fi2 1.3 Example Graphs The two graphs have the same number of nodes, edges, and the same degree sequence of<1, 1, 1, 2, 2, 3>. However, they are not isomorphic because ofthe structural difference. Such properties of graphs are known as isomorphism invariants. These invariants cannot be different between two isomorphic graphs. Thus, they can be used to show nonisomorphism, but are not sufficient in themselves to show isomorphism. 5
PAGE 14
A common goal with graph isomorphism is to create a practical general algorithm to decide graph isomorphism, or to show that no such algorithm exists. In a bruteforce approach there are n! vertex bijections to check and each check would require examining all vertex pairs, where n is the number of nodes. There are many uses of graphs. Similarly, as isomorphism is a property of graphs, it also has importance for the same uses. In particular, there are uses in linguistics with context free grammars, in circuits, in pattern recognition, in lowlevel image representations, in chemistry with chemical compounds, etc. As mentioned earlier, no practical algorithm is known to exist that solves the general graph isomorphism problem in polynomial time. In fact, it is not even known which classification the problem falls into, whether it is NPComplete, or whether it falls into the P class and can be decided in polynomial time. The problem is, however, in NP, due to the ties to the permutation of vertices. There are also other types of classifications, but it is still not known whether graph isomorphism falls into them or not. In either case, it is considered a difficult problem to solve efficiently. However, by placing certain restrictions on the type of graphs, polynomial time algorithms have been devised. Such restrictions include bounded Eigenvalue multiplicity, planar graphs, graphs satisfying a fixed degree bound, interval graphs, permutation graphs, convex graphs, and more. A more general problem, the sub graph isomorphism problem, is known to be NPComplete. A new class, GI, the set of problems with a polynomialtime Turing reduction to the graph isomorphism 6
PAGE 15
problem, was also created. The problem is trivially a complete problem for GI [14]. The rest of this thesis is dedicated to algorithms for graph isomorphism. Graph isomorphism is perhaps the most widely studied natural problem whose complexity is not classified as complete for some natural class. In 1852, Francis Guthrie posed the Four Color Problem for coloring a map. This problem asks if it is possible to color, using only four colors, any map of countries in such a way as to prevent two bordering countries from having the same color [11]. This problem was solved in 1976 by Kenneth Appel and Wolfgang Haken. This can be considered the birth of graph theory. While trying to solve it, mathematicians invented many fundamental graph theoretic tenns and concepts, and graph isomorphism was born [11 ]. 1.2 Graph Isomorphism Algorithms Efficient graph matching algorithms suited for matching large graphs have been a subject of research for approximately the last three and a half decades. As mentioned in the previous section, most algorithms reduce the complexity by imposing restrictions on the input graphs. An alternative approach to reducing matching complexity is that of using an adequate representation of the searching process and pruning unprofitable paths in the search space. This is a technique most common in Artificial Intelligence. In this way, no restrictions must be imposed on the structure of the input graphs and the obtained algorithms can be generally applied. However, this process can become a difficult problem to solve. 7
PAGE 16
One of the pioneer papers ascribed to this area of study, was published in 1970 by D.G. Comeil, C.C Gotlieb [7]. It illustrates an isomorphism algorithm, which performs suitable transformations on the input graphs, in order to find a different representation for which the matching is computationally more convenient. However, it has been shown that the conjecture on which this method is based, is not always true. A procedure that significantly reduces the size of the search space is the backtracking algorithm proposed by Ullmann in his 1976 paper [8]. This algorithm is devised for both graph isomorphism and sub graph isomorphism, and is still one of the most commonly used today, for exact graph matching. Compared with other algorithms, it is more convenient in terms of matching time, in case of onetoone matching. Another backtracking algorithm, SD, was also presented in 1976 by D.C. Schmidt and L.E. Druffel [9]. It uses the information contained in the distance matrix representation of a graph to establish an initial partition of the graph nodes. This distance matrix information is then used in a backtracking procedure to reduce the search tree of possible mappings. A more recent algorithm, known as VF for Vector Filtering, is based on a depth first search strategy, with a set of rules to efficiently prune the search tree. This algorithm can be found in the 1999 paper by L. P. Cordelia, P. Foggia, C. Sansone, 8
PAGE 17
and M. Vento [10]. In 2000, an improvement was made to this VF algorithm by the same authors, giving rise to the VF2, which always outperforms the VF [11]. The McKay's Nauty algorithm was developed in 1981 by B.D. McKay [12]. Even ifNauty is considered to be one of the fastest graph isomorphism algorithms available, it has been shown that there are categories of graphs for which it employs exponential time in order to find an isomorphism. It is based on a set of transformations that reduce the graphs to be matched to a canonical form for which it is very simple to test for the presence of an isomorphism. Another possible approach to the isomorphism problem is the one presented in 1995 by H. Bunke and B.T. Messmer [13]. Instead of reducing the complexity of matching two graphs, the authors attempt to reduce the overall computational cost when matching a sample graph against a large set of prototypes. The method performs the matching in quadratic time with the size of the input graph and independently of the number of prototypes. It is obviously convenient in applications requiring matching a graph against a database, but the memory required to store the preprocessed database grows exponentially with the size of the graphs, making the method suitable only for small graphs. So, in case of onetoone matching, other algorithms, in particular Ullmann's, are more suitable. All the above cited algorithms are exact ones, meaning they find the exact solution to the matching problem, if any Other techniques, like those based on non deterministic paradigms, able to find approximate solutions to the matching problem 9
PAGE 18
have been proposed, especially in the recent past. In most cases, they are very powerful in reducing the complexity from exponential to polynomial, but they do not guarantee finding an exact and optimal solution. As mentioned before, another category of algorithms for this problem places restrictions on the input graphs. Such restrictions include bounded Eigenvalue multiplicity, planar graphs, graphs satisfying a fixed degree bound, interval graphs, permutation graphs, convex graphs, and more. These algorithms are also polynomial in running time. However, unlike approximation algorithms, these algorithms are important and practical, as given problem definitions may very well meet, or can be modified to meet the restrictions with little consequence. Altman's paper [3] gives two heuristic backtracking algorithms with polynomial running times, if no backtracking is applied. The Marking Algorithm is based on the Principal Axis Theorem and a specific marking of diagonal elements of the adjacency matrices. The Projection Algorithm modifies the other, incorporating a heuristic that uses at most n steepest descent projections. Both algorithms rely on the calculation of Eigenvalues of the adjacency matrices, which are forced to be symmetric, positive definite matrices to ensure that the Eigenvalue algorithm(s) converge. Thus far, we have discussed different classifications of algorithms, and given a brief overview of several effective and wellknown algorithms. For each algorithm, the source papers were referenced, and can be studied for more detailed specifications of these algorithms. However, we will now discuss four of these algorithms in more 10
PAGE 19
detail, along with their performance. The four algorithms that will be discussed below are Vertex Filtering (VF) algorithm, McKay's Nauty algorithm, Altman's Marking algorithm, and Altman's Projection algorithm. Through research, one useful paper comparing the performance of the five general GI algorithms discussed above was found. The paper was published fairly recent1y, by P. Foggia, C.Sansone, and M. Vento [4]. As it could be expected, the research in this paper showed that none of the algorithms is definitively better than the others. In particular, for randomly connected graphs, the Nauty algorithm is better if the graphs are dense and/or of large size. On the contrary, for smaller, sparse graphs, VF2 performs better. On more regular graphs, i.e. on 20 meshes, VF2 is definitely the best algorithm, since the Nauty algorithm is not even able to find a solution for graphs bigger than a few dozens of nodes. In the case of bounded valence graphs, if the valence is small, VF2 is always the best algorithm; while for bigger values of the valence, if the size of the graphs is small, the Nauty algorithm is more convenient. Finally, it is also worth noting that SO, VF, and VF2, are the only algorithms that were always able to find a solution to the isomorphism problem, independent of the size and kind of graphs to be matched. It seems that it is a tie between the performance of the VF2 (based on the VF) and the Nauty algorithm. However, depending on the type of graphs and valence, one would be better suited to a particular situation than the other. As these two algorithms have proven themselves, let us take a closer look at them. 11
PAGE 20
1.2.1 Vertex Filtering (VF) Algorithm There are actually two main VF algorithms for graph and subgraph isomorphism common today, the VF and VF2. Performance evaluations done by the authors of VF show major increases in speed compared to the algorithm written by Ullmann [8]. The VF2 algorithm reduces memory usage from O(n2 ) to O(n) and has been found to always outperform its older version, the VF algorithm. The VF2 algorithm was published by the same authors in 2000, about one year after its predecessor [I 0] [ 11]. Both VF algorithms are implemented as backtracking match functions that apply recursive function calls for every added mapping between the pattern and the target graph. The recursive programming style makes it possible to backtrack simply by returning from the current invocation of the matching function. In order to keep track of which nodes have been mapped in each iteration, between the two input graphs, a state object is created during each iteration. The state object is cloned from the previous state and extended with a new mapping during each recursion. This means that every state object contains all the mappings found in the previous state object, in addition to the currently added one. This way the state object always includes a partial, valid isomorphism. The recursion continues until all the nodes and edges of the pattern graph have been matched. If at some point there are no more possible nodes in the target to match, even though the whole pattern graph has not yet 12
PAGE 21
been matched, it simply returns to the previous level and tries the next pair. This continues until the whole search space has been covered. The VF algorithms do not have an exact strategy for choosing the search plan, instead it is dynamically determined during the search. This is done so that the next node is always selected from the adjacent nodes of the most recently added node. Choosing the path in this manner makes it a depthfirst search. When the search finally reaches a node from which there are no adjacent nodes, then the unmapped nodes, which are adjacent to previously mapped nodes, will be matched. Below you will find the VF2 algorithm. M is said to be a graph/subgraph isomorphism if and only if M is an isomorphism between G2 and a subgraph ofGI. We will assume that the graphs involved are directed graphs, i.e. an arc (i, j) is to be considered different from (j, i). However, the extension ofthe algorithm to undirected graphs is trivial. The matching process can be suitably described by means of a State Space Representation (SSR). Each state s of the matching process can be associated to a partial mapping solution M(s), which contains only a subset of the components of the mapping function M. A partial mapping solution univocally identifies two subgraphs of G 1 and G2, say Gl(s) and G2(s), obtained by selecting from Gland G2 only the nodes included in the components ofM(s), and the branches connecting them. In the following, we will denote by Ml(s) and M2(s) the projection ofM(s) onto Nl and N2, respectively, while the sets of the branches ofGI(s) and G2(s) will be denoted by Bl(s) and B2(s), respectively. 13
PAGE 22
point: A highlevel description of the VF2 matching algorithm can be outlined at this PROCEDURE Match(s) INPUT: an intermediate states; the initial state sO has M(sO) =NULL OUTPUT: the mappings between the two graphs IF M(s) covers all the nodes of G2 THEN OUTPUTM(s) ELSE Compute set P(s) of the pairs candidate for inclusion in M(s) FOREACH (n, m) member ofP(s) IF F(s, n, m) THEN Compute states' by adding (n, m) to M(s) CALL Match(s') END IF ENDFOREACH Restore data structures END IF END PROCEDURE Here F(s, n m) is a boolean function (called feasibility function) that is used to prune the search tree. If its value is true, it is guaranteed that the state s ', obtained by adding (n, m) to s, is a partial isomorphism if s is; hence the final state is either an isomorphism between G 1 and G2, or a graph/subgraph isomorphism between a subgraph of G 1 and G2. Moreover, F will also prune some states that, albeit corresponding to an isomorphism between G1(s) and G2(s), would not lead to a complete matching solution. Once again, we reiterate that there is no efficient general graph isomorphism algorithm. In the worst cases, this algorithm performs the same as the bruteforce 14
PAGE 23
naive algorithm. That is, it has a O(nm) exponential running time, where n and m are the number of nodes in the input graphs. However, in most cases it outperforms any other available algorithm, and requires close to O(n) space; and thus it can handle larger graphs more easily than most other algorithms. 1.2.1 McKay's Nauty Algorithm Despite the fact that this algorithm is advertised as the fastest general graph GI algorithm, it also has a worstcase O(nm) exponential running time, like the VF algorithm. However, in performance testing it was found that this algorithm outperforms any others when dealing with dense, small to medium size graphs. Except in the case of graphs known as 2D meshes, where the algorithm was not able to find a solution, but this may just be due to an implementation error. As mentioned before, the algorithm is based on a set of transformations that reduce the graphs to be matched, to a canonical form, for which it is very simple to test the presence of an isomorphism. There is no document which explains in detail how Nauty works. However, most of the important ideas were published in Brendan D. McKay's paper [12]. The algorithm finds the automorphism group of a vertex colored graph. Nauty is also able to produce a canonicallylabeled isomorph ofthe graph, to assist in isomorphism testing. An automorphism of a graph is a graph isomorphism with itself, i.e., a mapping from the vertices of the given graph, G, back to vertices of G, such that the resulting graph is isomorphic with G. The set of automorphisms for G make up its automorphism group, which the algorithm uses as 15
PAGE 24
an abstraction to look for isomorphism. You can also say that the algorithm is based on group theory. Some data structures, including trees, are used in the process. Coloring or partitioning is used, automorphisms of the graph are found by noticing that two labelings give the same labeled graph. The canonical labeling map corresponds to one of these labelings, chosen according to a complicated scheme. This is just a high level view of some of the features of the algorithm. Despite my sincerest efforts, I was unable to find a good description or explanation of the algorithm. McKay's paper, which can be found online, contains a very long and difficult to understand description of the algorithm, which may also involve concept learning. 1.2.3 The Marking Algorithm As mentioned earlier, this algorithm is based on the Principal Axis Theorem and a specific marking of the diagonal elements of the adjacency matrices. For every symmetric matrix A, there exists an orthogonal matrix ZA which transforms A into a diagonal form D A = ZA T AZA. The diagonal elements of D A are the Eigenvalues of A. This is the Principal Axis Theorem, and it tells us that every symmetric matrix with n rows and n columns, has n Eigenvalues. Computation of Eigenvalues is part of the algorithm. To ensure that the algorithm for computation of these values converges, it is necessary that only symmetric, positive definite matrices are used to represent the graphs. Matrices for undirected graphs are symmetric, and the algorithm can be used for simple undirected graphs For the matrices to be positive definite, it is sufficient 16
PAGE 25
that the matrices be strongly diagonally dominant, i.e., the sum of the absolute value of the nondiagonal entries must be less than the absolute value of the diagonal entry, for each diagonal entry in each row. This can be accomplished by setting the diagonal entries to n, which will be zero for simple graphs with no selfloops. This will not change the structure of the permutation matrix P, whose existence will show that the input graphs A and B are isomorphic. The LRCholesky method for computing Eigenvalues is recommended, since it is guaranteed to converge for the type of matrices described above with sufficient accuracy in polynomial time. The algorithm attempts to iteratively change or mark the diagonal entries of the adjacency matrices so that they remain isospectral or orthogonally congruent, while converging to an Eigenvalue multiplicity of one. If successful, the matrices will be permutation similar, and thus isomorphic. With certain isospectral graphs, this algorithm will be forced to backtrack and try a different marking sequence, causing it to run in exponential time. Pairs of Isospectral Nonisomorphic Graphs or PINGs, where the graphs are orthogonally congruent, but not permutation similar, will cause this algorithm to backtrack. If the graphs are isomorphic, the algorithm marks their adjacency matrices in such a way as to wind up with orthogonally similar matrices, that is, they will have the same Eigenvalues. However, whether the graphs are found to be isomorphic or not, the algorithm hopes that the graphs remain isospectral during the marking, otherwise backtracking will be 17
PAGE 26
required. If the graphs are isomorphic and no backtracking is involved, the running time of the algorithm is O(n4). Below is the pseudocode for the algorithm. Initialize ilast = jlast = 1 IfNot SameEigenValues(A, B) then return "NOT ISOMORPHIC" Else Endlf For i from ilast to n A[i, i] = 2in For j from jlast ton If B[j, j] = n then B[j,j] = 2in End For End For If SameEigen Values( A, B) then Exit jloop Else If j = n then BACKTRACK if possible, else return "NON ISOMORPHIC" Else B[j, j] = n 1.2.4 The Projection Algorithm The observation of how fast the previous algorithm converges toward the final permutation matrix P, as the diagonal elements of the graphs are cumulatively marked, led to the projection algorithm. There are two disadvantages to the marking algorithm. First, since the algorithm begins marking immediately, the presence of PINGs will force backtracking. Second, the algorithm must fully mark the pair before a permutation matrix can be determined. The projection algorithm addresses both of these issues in a twophase approach. In the first phase, the vertex sets of the two graphs are partitioned into equivalence classes based on Eigenvalue invariance with a single marking. This avoids the first shortcoming ofthe marking algorithm. In the second phase, the marking algorithm is performed only on the submatrices 18
PAGE 27
whose corresponding vertices belong to the same equivalence class from phase one. This refines the original vertex partition, and at each stage of refinement, the similarity transformation matrix Q is examined and a candidate permutation matrix P is computed by solving an assignment problem that maximizes the sum of the absolute values ofQ's entries (a one perrow, percolumn choice). This avoids having to mark the entire matrices before a possible permutation matrix is found. We will now discuss each of the phases of the algorithm individually. The first phase will determine equivalence classes of a graph matrix by separately marking each of its diagonal entries and checking to see if the Eigen values are the same. The process will be done for both graphs and can complete in O(n4 ) steps A Possible Mapping Tree (PMT) is constructed from these equivalence classes, where at level i all possible mappings between the lh vertex equivalence classes ofGA and 08 are represented. Note that for some graphs this tree may just contain a root node and leaf nodes. The PMT gives us a vertex partitioning, for which all diagonals for each vertex in a partition are marked with a value sufficiently distinct from the other partitions, to allow for refined future partitioning. If the newly marked matrices are isospectral, then the algorithm moves on to the second phase described below. Now, a refined marking is executed on vertex pairs in the same equivalence class. At each stage of refinement, Q is computed, which converges towards some P. Any Q defines a weighted matching, whose cost is bounded by n, by some P that 19
PAGE 28
maximizes the overall weighted matching cost between the vertices of GA and G8 The best choice for the next proper marking, at each step, is the one that maximizes the overall cost from Q. The running time for this choice computation is O(n4). If no backtracking is needed, since the maximum number of markings is n1, the running time of this algorithm is O(n5). It is possible that the algorithm may need to backtrack after marking some nodes, when it is unable to proceed to a permutation matrix P. In this case, the running time again may become exponential as in the marking algorithm. However, ifPINGs are present, nonisomorphism will be detected early, unlike in the marking algorithm. In some cases, if backtracking is not required, P may be found earlier than in the marking algorithm as well. 20
PAGE 29
2. Developer Requirements This section describes the requirements in paragraph format for the Graph Isomorphism Program from a developer's point of view. Some ofthese requirements were verbally discussed and some were left to be decided by the developer. This program will determine if two simple, undirected unweighted graphs with no selfloops or multiple edges are isomorphic or not. The program will have a Graphical User Interface (GUI) and must be able to run on Microsoft Windows XP. Once the program is launched, the user will be allowed to input two graphs to check if they are isomorphic or not. The program will allow for four types of graph input. Both graphs must be specified using one of these four input formats. Two of the input types will have a limit of onehundred nodes. The user will have to select which format he/she would like to use, and the program will provide a method for this selection. The user may choose to supply the graphs by using GUI controls designed to represent adjacency matrices for the graphs. There will be two of these Adjacency Matrix Graph Representation (AMGR) GUI controls, sidebyside, to allow for graph input and modification. The AMGRs will allow the user to input graphs with up to one hundred nodes 21
PAGE 30
The user will be able to specify files for the input graphs. The format of such files will be explained in a 'ReadMe.doc' file accompanying the program. The number of nodes and edges for the input graphs will be unlimited when using this type of input. The third type of graph input will be a Graph Diagram Builder (GDB), which will allow the user to build a regular graph diagram by adding nodes and edges to this builder. This GOB will also allow for deletion of nodes and edges, and a mechanism for relabeling of nodes. There will need to be one GDB for each input graph, and it should not allow any selfloops or multiple edges, and no more than onehundred nodes. The fourth and last type of graph input will be a Random Graph Generator (RGG), which will allow the user to generate random input graphs. The user will need to specify the number of nodes and edges, with no graphsize limit imposed. The user will be able to create both graphs randomly, or create the one random graph and another random graph isomorphic to the first. The program should treat the AMGRs as the main input format to read from prior to performing the isomorphism check, as discussed below. A button will be provided for the user to click on once the input graphs have been provided in one of the three formats. When this button is clicked, the program will first determine which format the user chose to provide the input graphs. If the user chose to provide the input graphs from files, the program will check to see if the 22
PAGE 31
files exist and that they are in the right format. The format will be checked as the program reads in the graphs, from the files, into the AMGRs. The program will also, at this time, check to see that the graphs do not contain any selfloops or multiple edges. If the user chose to provide the input graphs using the GDBs the AMGRs will be updated from them. Similarly, if the user chose to provide the input graphs from the RGGs, the AMGRs will be updated from them. After checking the input and updating the AMGRs, the program will check to see ifthe graphs have the same number of nodes and edges. Then finally, the main part of the program is executed to see if the graphs are isomorphic or not. The program will check for isomorphism according to The Marking Algorithm presented in Altman's paper [3]. This algorithm will not be discussed in this section, but will be discussed in the detailed design for this program along with some of its implementation details. Once the program has checked for isomorphism and the results have been given to the user, the userprovided inputs the will not be cleared. Another button will be provided to clear the form, if the user desires. This will allow the user to modify the graph inputs and check for isomorphism again using the same button. The user can either modify the AMGRs, or modify the input graphs using the same format used to provide the input graphs originally. The user will have to select whichever format he/she will modify for both graphs, as in the previous run. Then the user can 23
PAGE 32
make the modifications, and the program will then use the input format selected, to obtain the input graphs before checking them for isomorphism. When using the RGGs for the input graphs, if the user chooses, the program will generate the second graph (B), isomorphic to the first (A), by performing random permutations. A random permutation matrix (P) will be created and pT AP will produce B. The program will store P, also while running the algorithm to check for isomorphism when using RGGs, the program will also store the permutation matrix produced by the algorithm. A button will be provided to compare these two permutation matrices, after checking for isomorphism. Finally, the program will also provide a progress bar and status textindicator to show the progress while checking for isomorphism. The time taken to check for isomorphism will also be reported by the program. 24
PAGE 33
3. HighLevel Design This section is for the Graph Isomorphism Program. It will discuss in paragraph format, how the program will accept input and give output. In other words, it will describe the Graphical User Interface (GUI) for the program. These designs follow directly from the requirements for the program. The program will need some way of allowing the user to choose the way in which the input graphs will be provided. That is, either from adjacency matrices, files, graph diagrams, or random graphs. This can be accomplished by radio buttons across the top left of the form. The first one on the left will be "Adjacency Matrices" and will be selected by default. Then we will have the radio button for "Files," then one for the "Graph Diagram Builder," and the last one on the right, will be for the "Random Graph Generator." Above these radio buttons will be the following label: "Select the method for providing the input graphs." Below the radio buttons will be two text fields labeled "Graph A" and "Graph B" for the full paths to the input files. There will also be "Browse" buttons to the right of each of these text fields to allow the user to navigate to the files. All four of these controls will be disabled unless the user selects the "Files" radio button. Below these controls, there will be three tabs: one for the "Adjacency Matrix Graph Representation" (AMGR) controls, one for the "Graph Diagram Builder" (GDB) controls, and one for the "Random Graph Generator" (RGG) controls. 25
PAGE 34
The AMGR controls will be two sidebyside checkbox grid controls with labels "0" to "1 00" for both rows and columns. All checkboxes in each of the AMGRs will be unchecked and the user will have to simply check a box to represent an edge. Both of these controls will be disabled until the user selects the number of nodes from a list box, labeled "Number of nodes," in the middle of the two controls. When the user selects the number of nodes, the controls will be enabled and the grids resized to match the number of nodes selected. These grid controls will need vertical and horizontal scrollbars to allow the user to scroll through the rows and columns. If the user selects a checkbox on the diagonal of either grid, to represent a selfloop, a message will be displayed that the selection will be ignored, as selfloops are not allowed. The GDB controls, will reside on another tab. Each of these sidebyside controls will allow the user to build a diagram of a graph using nodes and edges. The details of these controls cannot yet be determined, as no such standard controls exist. Most probably, some third party components will be used to build the GDB controls. These controls will not allow, or the program will ignore, selfloops and multiple edges. These controls will not allow the user to attempt to build a diagram with more than 1 00 nodes. These controls will allow the user to change the labels of nodes, delete edges, and delete nodes. The last and third tab will be for the RGG controls. At the top of this tab, there will be a label, "Specify the number of nodes." Below this label will be a text 26
PAGE 35
box to allow the user to enter the number of nodes. Below this, there will be another label, "Specify the number of edges." Below this label will be another text box to allow the user to enter the number of edges. Below and indented to the right, will be a button to "Generate Random Graph A" followed below by another to "Generate Random Graph B." Then below these buttons, will be the label "OR," followed below by another button to "Generate Random Graph B Isomorphic to Graph A." Finally, below this button and aligned to the bottom right of the tab area, will be a button to "Check if Permutation Matrices Equal." This button will only be enabled if the user chooses to generate graph B isomorphic to A, and after the program has checked for isomorphism and found the graphs to be isomorphic. If the program works correctly, the latter will always be true if the former is true. Underneath these tabs to the right there will be two buttons, one for "Clear Form" and to the right of that, one for "Check for Isomorphism." The form clear button will clear all the input the user has provided and reset the form, variables, and data structures. The other button, when clicked will first check all the input for errors, then store the graphs in an internal object representation along with other variables, and then find out if the graphs are isomorphic. The program will check for isomorphism according to the Marking Algorithm presented in Altman's paper [3]. This algorithm will not be discussed here but will be discussed in the detailed design for this program along with other implementation details. 27
PAGE 36
On the opposite side of the form, from the buttons discussed in the above paragraph, to the left, will be a status indicator, below which will be a progress bar. As the program checks for isomorphism, the different stages will be indicated by the status indicator. Suitable status indication texts will be decided and implemented when the program is written. Similarly, the progress bar will progress as computations are completed by the program, and these computations and progressions will have to be decided at the time the program is written. This is due to details that have not yet been designed and are beyond the scope of this section. Once the program has found the graphs to be isomorphic to one another, or not, this result will be indicated by the status indicator and the progress bar will be reset. A message box will also accompany, to indicate the result. After the user clicks the "OK" button on this message box, the status indicator will also be reset, but the form will not be cleared. This is so that the user can modify the input and check for isomorphism again. Lastly, we will discuss the format of the input files, if files are used for the input graphs. The files must be simple text files and could be created in Notepad, which is available in Microsoft Windows XP. The first line of the file will be the number of nodes. The node numbering or labeling will begin at 1, not 0. All other lines will simply give two numbers separated by a comma, to represent an edge. There can only be one edge on each line. Empty lines or lines with only whitespace 28
PAGE 37
characters will not be accepted anywhere in the files An example of a simple file would look like the following. 6 4,5 6,2 1,3 4,3 1,2 Fi2 3.1 Example Graph Input File 29
PAGE 38
4. Detailed Design This section will address in paragraph format, the implementation design details for the program. These designs follow directly from the requirements and highlevel design for the program. The previous two sections should be reviewed prior to reading this section. This detailed design will be the final documentation leading to the actual implementation of the program. First, we will discuss the hardware and software requirements for running the program and for the computer that will be used to build the program, along with the development environment and programming language. Second, we will discuss the main code files, classes, and data structures that will be part of the program. Third, we will discuss the specification of the main algorithms that will be used by the program for Graph Isomorphism (GI). The GDB or Graph Diagram Builder controls, that allow the user to provide input graphs by creating graph diagrams, will need to be created using some third party component(s), which will be determined during implementation. 4.1 Program Hardware and Software Specifications The minimum hardware and software requirements for a machine to be able to run this program are as follows. Intel Pentium 4 1.6GHz or compatible CPU 256 MBofRAM 100MB offree harddisk space 30
PAGE 39
Video Card that supports 32Bit color quality and 1152x864 screen resolution 17inch color monitor Microsoft Windows XP operating system Microsoft .NET Framework version l.l.x The hardware and software specifications for the machine that will be used to build this program are as follows. Intel Pentium 4 2.4GHz CPU 256MBofRAM 24GB of free harddisk space NVIDIA GeForce4 MX 440 with AGP8x video card 17inch color monitor Microsoft Windows XP Professional with Service Pack 2 operating system Microsoft .NET Framework version 1.1.4322 The development environment chosen for the program is Microsoft Visual Studio .NET Architect 2003, and the programming language is Visual Basic .NET or VB.NET. This is an objectoriented language, released in 2003, developed by Microsoft as a very highlevel easytouse programming language. The Object Oriented Paradigm will be used in building the program. 4.2 Code Files, Classes, and Data Structures The program will have a file named "Main.vb" which will contain all the autogenerated code from the development environment's GUI builder for the single GUI form of the program. This file will contain any code required to allow user interaction with the GUI. It will contain the code that allows the GUI to interact with the rest of the program's functionality. All the functionality in this file will be contained in a single autogenerated class called "Main." This Main class will 31
PAGE 40
contain the main code that will allow all of the objects of the program to interact with each other as well. Furthermore, this class will contain functions to update the status indicator and progress bar of the GUI form and will have global member object variables for the classes below. A timing mechanism will also be implemented for final analysis. Another file called "Graph. vb" will contain a single class named "Graph" to implement the graph data structure. This data structure will be used to store internally and interact with simple graphs throughout the processing of the program. This class will contain three, main global member variables to represent a graph: one for the number of nodes in a graph, another for the number of edges, and a double dynamic integer array to store the adjacency matrix of a graph. The class will contain input functions for adding a node and adding an edge to a graph for building graphs. These functions will not allow multigraphs, multidigraphs, or graphs with selfloops to be created. The class will also contain accessor functions to get the number of nodes, number of edges, and any valid entry from the adjacency matrix. A file called "GraphEigenValues.vb" will contain a single class named "GraphEigen Values" to provide the program with the functionality to find the Eigen values of a graph. The constructor for this method will take a Graph object for which the Eigenvalues need to be found. Thus, each object of this class will deal with a single graph. A thirdparty component will be used for the calculations, more details can be found in the "Algorithms" section. One global member variable will be 32
PAGE 41
needed to represent a matrix in the format the third party component needs. The constructor will put the supplied graph's adjacency matrix into this format. One main, readonly property function called "GetEigenValues" will use the thirdparty component to return the Eigenvalues of the graph as an integer array. Another code file for this program will be named "Isomorphic.vb" and will contain a single class called "Isomorphic." The Isomorphic class will use objects of both the Graph and GraphEigenvalues classes, discussed above, to find out if the two input graphs are isomorphic to one another. This class will check for isomorphism according to the Marking Algorithm presented in Altman's paper [3]. This paper also mentions the use ofthe LRCholesky method for finding Eigenvalues of a graph. This class will implement functions for the Marking Algorithm as described in the next section. This class will also have a function for comparing Eigenvalues of two graphs to see if they are the same or not, using objects of the GraphEigenvalues class. The algorithm used by the class above will require a fifth file named "IntPairStack.vb," which will contain a structure called "IntPair," and a class called "IntPairStack." The IntPair structure will contain two public member integer variables called "Firstlnt" and "Secondlnt." The IntPairStack class will have one private member variable, which will be an array of IntPair structures. Later in this section, we will discuss how this stack will be used by the algorithm. The stack will be a first in last out or last in first out stack with push and pop functions that take and 33
PAGE 42
return an IntPair, respectively. The constructor will take two integers, to push onto the stack. 4.3 The Marking Algorithm This algorithm is based on the Principal Axis Theorem and a specific marking of the diagonal elements of the adjacency matrix. Computation of Eigenvalues is part of the algorithm. To ensure that the algorithm for computation of these converges, it is necessary that only symmetric, positive definite matrices are used to represent the graphs. Matrices for undirected graphs are symmetric and the algorithm can be used for simple undirected graphs. For the matrices to be positive definite, it is sufficient that the matrices be strongly diagonally dominant, i.e., the sum of the absolute values of the nondiagonal entries must be less than the absolute value of the diagonal entry, for each diagonal entry in each row. This can be accomplished by setting the diagonal entries to n. This will not change the structure of the permutation matrix P, whose existence will show that the input graphs A and Bare isomorphic. The LRCholesky method for computing Eigenvalues is recommended, since it is guaranteed to converge for the type of matrices described above with sufficient accuracy in polynomial time. The algorithm attempts to iteratively change or mark the diagonal entries of the adjacency matrices so that they remain isospectral or orthogonally congruent or have the same Eigenvalues, while converging to an Eigen value multiplicity of one, or having distinct Eigenvalues. If successful, the matrices will be permutation similar, and thus isomorphic. The marking maps diagonal 34
PAGE 43
elements of one matrix to the other, essentially creating the node permutations, and ultimately finding out if the graphs are isomorphic. With certain isospectral graphs, this algoritlun will be forced to backtrack and try a different marking sequence, causing it to run in exponential time. Pairs of Isospectral Nonisomorphic Graphs or PINGs, where the graphs are orthogonally congruent, but not permutation similar, will cause this algoritlun to backtrack. If the graphs are isomorphic, the algoritlun marks their adjacency matrices in such a way as to produce orthogonally similar matrices, that is, they have the same Eigenvalues. However, whether the graphs are found to be isomorphic or not, the algoritlun hopes they remain isospectral during the marking, otherwise backtracking will be required. Ifthe graphs are isomorphic and no backtracking is involved, the running time of the algoritlun is O(n4). Below is the pseudocode for the algoritlun. This pseudocode needs to be refined in order to implement the backtracking functionality. Initialize ilast = jlast = 1 IfNot SameEigenValues(A, B) then return "NOT ISOMORPHIC" Else Endif For i from ilast to n A[i, i] = 2in For j from jlast ton IfB[j, j] = n then B[j,j] = 2in End For End For If SameEigen Values( A, B) then Exit jloop Else Ifj = n then BACKTRACK if possible, else return "NON ISOMORPHIC" Else B[j, j] = n 35
PAGE 44
In order to implement the backtracking functionality, this algorithm will be divided into two functions, a parent and a child. The parent function can then call the child function in a loop. The public parent function will be called "ChecklfGraphslsomorphicByMarking" and the private child function will be called "MarkGraphs." The Stack data structure, as described in the previous section, will be used here and the algorithms are more detailed. The pseudocode for the parent is given below. Initialize ilast = 1 Initialize jlast = 0 Initialize BackTrack= False Initialize Stack(ilast, jlast) Set all diagonal entries in A and B to n While (True) IfMarkGraphs(A, B, Stack, BackTrack) Then Return ISOMORPHIC Else If BackTrack Then If Stack is Empty Then Return NOT ISOMORPHIC Else Return NOT ISOMORPHIC Endlf If the marking is successful, the child returns true and we have isomorphic graphs. Otherwise, the child function will set the BackTrack variable, letting us know if we need to attempt to backtrack or not. If not, then the graphs are not isomorphic; otherwise, we attempt to backtrack. However, if the stack is empty, we cannot backtrack, meaning the graphs are not isomorphic. The pseudocode for the child is 36
PAGE 45
given below. It is almost the same as the original pseudocode with some slight changes to work with the parent function and add more detail. Note that both parameters of this child function are reference parameters, that is, if the child function modifies the passed arguments, these changes will be made to the actual objects declared in the parent function. The first two are for the graphs, one is the IntPairStack and the other is the backtracking variable. Initialize BackTrack= False Initialize lntPair = Stack.Pop lfNot SameEigenValues(A, B) then return FALSE Else Endlf Fori from IntPair.Firstlnt (ilast) ton A[i, i] = 2in For j from 1 to n IfBO,j] =nAND j != IntPair.Secondint Glast) then BO,j] = 2in Endlf End For End For Return TRUE IfSameEigenValues(A, B) then Else End If 37 jLast = j Stack.Push(i, j) Exitjloop BO,j] = n Ifj = n then Endlf BACKTRACK= TRUE Return FALSE
PAGE 46
4.4 The LRCholesky Algorithm Now it is time to discuss the LRCholesky algorithm for computing the Eigen values of a graph. I have been researching this algorithm using resources online and have found the problem of calculating Eigenvalues to be a very difficult one, in the field of Linear Algebra. In mathematics, and in particular in Linear Algebra, an important tool for describing Eigenvalues of square matrices is the characteristic polynomial. Saying that A. is an Eigenvalue of A is equivalent to stating that the system oflinear equations (AA.I)v = 0 (where I is the identity matrix) has a nonzero solution v (namely an Eigenvector), and so it is equivalent to the determinant det(A A.I) being zero. The function p(A.) = det(A A.I) is a polynomial in A. since determinants are defined as sums of products. This is the characteristic polynomial of A, and the Eigenvalues of a matrix are the zeros of its characteristic polynomial. If the matrix is small, we can compute them symbolically using the characteristic polynomial. However, this is often impossible for larger matrices, in which case we must use a numerical method. In fact, for any square matrix larger than 3 x 3, the problem becomes difficult to solve. An analytic solution for the Eigenvalues of2x2 matrices can be obtained directly from the quadratic formula. If then the characteristic polynomial is 38
PAGE 47
det = (a A)( d A) be = A2 (a d) A + (ad be). [aA b ] e dA Notice that the coefficients are the trace TR(A) =a+ d and the determinant det(A) = adbe, so the solutions are A formula for the Eigenvalues of a 3x3 matrix or 4x4 matrix could be derived in an analogous way, using the formulae for the roots of a cubic or quatric equation. The LRCholesky algorithm relies on reducing the matrix A to tridiagonal form. It uses Laguerre's shifts for the symmetric tridiagonal matrices. The general algorithm is as follows. Let A be a s.p.d. matrix. Ao=A k=O,l,2, ... choose shift crk Akcrkl = VkVkT(Cholesky decomposition, Vk is lowertriangular) Ak+I = vkrvk + crkl Two steps of the zero shift Cholesky LR are equivalent to one step of the zero shift QR. The shift crk should be such that Akcrkl is positive definite. When applied to a symmetric positive definite diagonalplussemiseparable matrix, D + S, where D is the diagonal, the shift can be included into the diagonal part. Algorithms for the Laguerre's Shift, vrv Product, and the Cholesky Decomposition are also needed, along with the equations for their computation. 39
PAGE 48
Implementing this algorithm becomes a more complex problem than implementing the algorithm for GI, discussed in the previous section. The main purpose of this program and thesis work is to solve the GI problem in Computer Science. Since this Eigenvalue problem is mainly a study in Linear Algebra, and because of the complexities of its implementation, it seems unnecessary to spend the time and effort in implementing this algorithm. Also, code for this algorithm is not readily available over the internet either, it seems that only companies have implemented similar algorithms and are charging for their components and/or software. After a significant amount of research, a .NET component that will work with VB.NET called .NET Matrix Library 2.5 made by BlueBit Software, was found. The advanced version 2.5 can calculate the Eigenvalues of a matrix efficiently using the Cholesky Decomposition, for symmetric real matrices. Of course, the matrix may have to be supplied in a form that is accepted by the functions provided in the .NET component for calculating Eigenvalues. In addition, learning to use the library will also be another task, but the process should be simple and documentation should be available on the internet or with the developer install. This component may need to be purchased since the trial version only handles 6x6 size matrices. Using this component will allow focus on the GI problem for the thesis work rather than worrying about the Linear Algebra problem of calculating Eigenvalues. 40
PAGE 49
5. Testing and Analysis The material in this section will serve as a roadmap for thoroughly testing the program. The test cases will be organized according to the number of nodes, increasing from lower boundary values. Since the program accepts input graphs in four different formats, that is, adjacency matrices, files, diagrams, and randomly generated graphs, some tests will be run four times, once for each format. It will also be necessary to have tests for a particular format that do not apply to the other formats. Similarly, for some tests it will only make sense to use one type of format. Test data has been organized into the table below. Three ofthe graph input formats will be abbreviated as AMGR for Adjacency Matrix Graph Representation; GDB for Graph Diagram Builder; and RGG for Random Graph Generator. The first column is the test number for each distinct number of nodes and edges, including a letter for the different input formats. The input format used to provide the graphs is given in the "Input Type" column. The "Program State" column specifies whether, before running the test, the program was relaunched, the form was cleared, or the input modified. The "lso" column specifies whether the test was for isomorphic graphs or not. Before testing, in all cases, we should know whether the graphs are isomorphic or not. For each test, if the program correctly identifies the graphs as isomorphic or not, the "Passed" column will contain "Yes," to show that the program 41
PAGE 50
passed the test. The "Time Taken" column gives the actual running time rounded to the nearest millisecond, as reported by the program in the status indicator. Table 5.1 Test Data Test Nodes Edges Program State Jso Time Taken Passed 1a 0 0 Files Relaunched y N/A Yes 1b 0 0 GDB Relaunched y N/A Yes 2a 1 0 AMGR Relaunched y N/A Yes 2b 1 0 Files Form Cleared y N/A Yes 2c 1 0 GDB Form Cleared y N/A Yes 3a 2 0 AMGR Relaunched y N/A Yes 3b 2 0 Files Input Modified y N/A Yes 3c 2 0 GDB Input Modified y N/A Yes 4a 2 1 AMGR Relaunched y N/A Yes 4b 2 1 Files Relaunched y N/A Yes 4c 2 1 GDB Relaunched y N/A Yes 4d 2 1 RGG Relaunched y N/A Yes 5a 3 1 AMGR Relaunched y 750 Yes 5b 3 1 Files Form Cleared y 109 Yes 5c 3 1 GDB Form Cleared y 94 Yes 5d 3 1 RGG Form Cleared y 78 Yes 6a 3 2 AMGR Relaunched y 313 Yes 6b 3 2 Files Input Modified y 125 Yes 6c 3 2 GDB Input Modified y 78 Yes 6d 3 2 RGG Input Modified y 94 Yes 7a 3 3 AMGR Form Cleared y 0 Yes 7b 3 3 Files Relaunched y 406 Yes 7c 3 3 GDB Relaunched y 406 Yes 7d 3 3 RGG Relaunched y 109 Yes 8a 4 2 AMGR Form Cleared y 0 Yes 8b 4 2 Files Form Cleared y 125 Yes 8c 4 2 GDB Form Cleared y 94 Yes 8d 4 2 RGG Form Cleared y 109 Yes 9a 4 2 AMGR Form Cleared N 16 Yes 9b 4 2 Files Input Modified N 125 Yes 9c 4 2 GDB Input Modified N 78 Yes 9d 4 2 RGG Input Modified N 78 Yes 42
PAGE 51
lOa 4 3 AMGR Input Modified y 0 Yes lOb 4 3 Files Relaunched y 406 Yes lOc 4 3 GDB Relaunched y 391 Yes IOd 4 3 RGG Relaunched y 109 Yes lla 4 3 AMGR Input Modified N 16 Yes llb 4 3 Files Form Cleared N 109 Yes lie 4 3 GDB Form Cleared N 78 Yes lid 4 3 RGG Form Cleared N 109 Yes 12a 5 4 AMGR Input Modified y 0 Yes 12b 5 4 Files Input Modified y 125 Yes 12c 5 4 GDB Input Modified y 94 Yes 12d 5 4 RGG Input Modified y 78 Yes 13a 5 4 AMGR Relaunched N 312 Yes 13b 5 4 Files Input Modified N 125 Yes 13c 5 4 GDB Input Modified N 94 Yes 13d 5 4 RGG Input Modified N 94 Yes 14a 6 9 AMGR Form Cleared y 16 Yes 14b 6 9 Files Input Modified y 109 Yes 14c 6 9 GDB Input Modified y 94 Yes 14d 6 9 RGG Input Modified y 94 Yes 15a 6 9 AMGR Form Cleared N 0 Yes ISh 6 9 Files Input Modified N 172 Yes 15c 6 9 GDB Input Modified N 78 Yes 15d 6 9 RGG IOQ_ut Modified N 94 Yes 16 20 16 RGG Relaunched N 328 Yes 17 20 16 RGG ln.Qut Modified y 188 Yes 18 20 45 RGG Ingut Modified N 94 Yes 19 20 45 RGG Input Modified y 156 Yes 20 50 50 RGG Input Modified N 141 Yes 21 50 50 RGG Input Modified y 1234 Yes 22 52 100 RGG Input Modified N 105 Yes 23 52 100 RGG Input Modified y 2016 Yes 24 100 120 RGG Input Modified N 125 Yes 25 100 120 RGG lnQut Modified y 28781 Yes 26 105 330 RGG ln_put Modified N 172 Yes 27 105 330 RGG ln_2ut Modified y 38641 Yes Now that we have a set of test cases and their results, we can analyze the test data. 43
PAGE 52
The program result was as expected; for all test cases, meaning, in each case, the program passed the test. For the first tests, up to and including 4d, the program did not report any running times. This is due to the fact that the program did not run the algorithm to check for isomorphism. This is the expected behavior in these tests, since the nodes were less than three and the number of edges could not be more than one. From test 5a through 15d, the program state is changed to see ifthe program gets the correct result in different situations, which it did. The running times in these tests are so small due to the small number of nodes and edges, that they are more dependant on the machine and operating system than the number of nodes and/or edges. However, in most cases, if the user needs to run multiple checks, the user would simply modify the input rather than clearing the form or relaunching the program. Therefore, in the tests where the input was modified, checks for isomorphism were performed a few times to make sure the running times were not significantly different. This is important if we want to analyze the running time. The program code does not store any data for future calculations or processing between checks for isomorphism. With these rechecks, we now know that this is also the case for the third party component used for Eigenvalue calculations, the operating system, the .NET framework, and the machine itself. From test 5a through 15d, the different input types are also tested, to see if the program gets the correct result no matter how the input graphs are provided, which it 44
PAGE 53
did. These tests also show that the type of input format used to provide the graphs effects the running time. Therefore, we should use the same input type for analyzing running times. The last tests, from test 16 through 27 all use the RGG input type to randomly create input graphs, both isomorphic and not. Generally speaking, when the program randomly creates both input graphs, they will not be isomorphic. If the user chooses to create graph B isomorphic to A, then they will always be isomorphic. In general, these tests show that the running times are significantly higher when the graphs are isomorphic. This makes sense since the randomly generated graphs probably do not have the same Eigenvalues before marking, even when the diagonal elements are all the same in both graphs. These tests also show that the running times are much more dependant on the number of nodes and not the number of edges. This is because increasing the number of edges, with the same number of nodes, to make the graphs more dense, does not significantly change the running time. For isomorphic tests, from test 16 through 27, the program option to randomly create graph B isomorphic to graph A, was used. When using this option, the program randomly generates a permutation matrix to create graph B isomorphic to A, which is stored. The program then also has another option to see if this permutation matrix is the same as the one generated by the algorithm, after checking for isomorphism. This check was performed in all of these tests and showed that there 45
PAGE 54
can be more than one permutation matrix. Furthermore, it showed that it is more likely to have the same permutation matrices for denser graphs. The program also has other options for the GDB input format. You can copy the diagram from graph A to B and then modify it to save time. This option was used for all GDB input format tests. Other options allow you to swap two node labels in the diagrams and randomly relabel the nodes in the diagram for graph B. These two options were tested independently from the tests given in the table above, and worked correctly. Similarly, many other tests, not in the table above, were performed to make sure the program works correctly. For example, there were many tests performed to ensure the program worked correctly and displayed the correct messages when bad input was provided by the user. Now it is time to do the final running time analysis, that is, to show that the running time is actually O(n4), as is for the algorithm, and that the program code correctly implements the algorithm. The running time of O(n4 ) is based on the number of nodes only, and the program test results show that denser graphs do not significantly change the running time, if the nodes are kept constant. This running time is the highest order term of a quatric equation, An4 + Bn3 + Cn2 + Dn =Actual Running Time. Here A, B, C, and D are all constants and n is the number of nodes. From tests 21, 23, 25, and 27, we can get the numbers of nodes and actual running times, and solve 46
PAGE 55
the system of equations that result. The system of equations that needs to be solved IS, 6250000A + 125000B + 2500C +SOD= 1234 7311616A + 140608B + 2704C + 52D = 2016 1 OOOOOOOOA + 1 OOOOOOB + 1 OOOOC + 1 OOD = 28781 121550625A + 1157625B + 11025C + 105D = 38641. The solutions for A, B, C, and D were found using a computer program available on the internet [20]. This gives us an equation for the actual running time for the program, 93323757 4 58917728297 3 142439935447 2 19958154731 n n + n n. 21221200000 63663600000 2122120000 12732720 Using this equation, now we can predict the running time for, say, 75 nodes, which comes to 8717 milliseconds or approximately 9 seconds. Running the program using randomly generated isomorphic graphs with 7 5 nodes and 150 edges, a running time of9345 milliseconds or approximately 9 seconds was reported. While not exactly equal, the two running times are equal when rounding to the nearest second. However, this will not always be the case, because the algorithm has backtracking built into it, and depending on the graphs, it may run in exponential time. Furthermore, the test data used to create the system of equations above may have contained tests in which backtracking was needed. To some extent however, we have shown that the running time of the implemented algorithm is indeed O(n4). Therefore, the actual running time may be less or more than the running time calculated using the equation above. 47
PAGE 56
6. Conclusion The Vertex Filtering (VF) and McKay's Nauty were two algorithms researched in more depth as they were advertised as the best available. However, there is no efficient general Graph Isomorphism (GI) algorithm. In the worst cases, these algorithms perform the same as the bruteforce naive algorithm. That is, with a 0(1) exponential running time, where t and s are the number of nodes in the input graphs. However, in most cases, they outperform any other available algorithms. It seems the VF2 and the Nauty algorithm can perform equally well, depending on the type of graphs and valence. The running times for these algorithms are highorder polynomials. Also studied in depth, were the Marking Algorithm and the Projection Algorithm, both given in Altman's paper [3]. These are not solutions to the general problem, nor are they approximation algorithms; they basically restrict the input to simple undirected graphs. For many applications, they can be very useful. The Marking algorithm has a running time of O(n4 ) and the Projection Algorithm has a running time of O(n5). In the worst case, both will still run in exponential time, in the Projection Algorithm the likelihood of this happening is decreased The GI program designed, written, and tested; was discussed with Altman and implements the Marking Algorithm [3]. Developer Requirements, HighLevel Design, and Detailed Design documents were created for the program. This program 48
PAGE 57
uses the Marking Algorithm to see if two simple undirected graphs are isomorphic. While writing these documents, a significant amount of research was done on calculating Eigenvalues, using the LRCholesky method, and no useful pseudocode or actual code was found for this daunting task. Since the algorithm relied on this, a suitable thirdparty component was found. The same is true for the graph diagram builder that is part of the program and allows the user to enter the two input graphs by drawing them. However, this functionality required much more coding to get it to work correctly, and even more research and testing to find a suitable thirdparty component. Writing the program took up most of the time spent on this thesis, and modifications to the design were required throughout the process, but in the end, it was successful. A Testing and Analysis document was also written and used as a road map for thoroughly testing the program. The program created is a userfriendly GUI program with robust error checking and reporting for bad input and system exceptions. Four different input formats can be used to specify the graphs; these include adjacency matrices, files, diagrams, and random generation. The adjacency matrices are provided by simply selecting the number of nodes from a list, and then selecting checkboxes for edges. The diagram builder controls required the most custom programming and allow the user to copy the diagram for graph A to graph B. Options are also available to swap labels of two nodes and for graph B, to randomly relabel the nodes. Graph B can be randomly generated along with A or isomorphic 49
PAGE 58
to A, using a randomly generated permutation matrix. This matrix can then be tested for equality against the one found by the algorithm, after it discovers the graphs are isomorphic. The program also includes an option to clear and reset the form, and a status indicator along with a progress bar. The program has been thoroughly tested and is an easytouse, reliable tool for checking if two graphs are isomorphic. When randomly generating both graphs, for larger numbers of nodes, they were generally found to be nonisomorphic. When randomly generating graph B isomorphic to A, the random permutation matrix used to create graph B was not always the same as the one found by the algorithm. This shows that there can be multiple permutation matrices for pairs of isomorphic graphs. It was also found that this is less likely for denser graphs, and in such cases, the two permutation matrices were the same. The program's running time was dominated by the number of nodes, as is the case for the algorithm. In fact, in the Testing and Analysis section of this thesis, to some extent, it is shown that the running time of the program is O(n4), as given for the algorithm [3]. 50
PAGE 59
REFERENCES 1. Cvetkovic, D.M Doob, M and Sachs, H., Spectra of Graphs Theory and Application, Academic Press, New York (1980). 2. Gross, J., Yellen J., Graph Theory and Its Applications, CRC Press, Boca Raton, Florida (1998). 3. Altman, T., An Algebraic Approach to the Graph Isomorphism Problem, to appear in Ullman's Quarterly Journal. 4 Foggia, P., Sansone, C., Vento, M., A Performance Comparison of Five Algorithms for Graph Isomorphism. Napoli, Italy (2003). 5 Kukluk J.P., Holder, L.B., Cook, D.J., Algorithm and Experiments in Testing Planar Graphs for Isomorphism, Journal of Graph Algorithms and Applications (2004) 313356. 6. Plestenjak, B., Camp, E.V., and Barel, M.V., A Cholesky LR Algorithm for the Positive Definite Symmetric DiagonalPlusSemiseparab/e Eigenprob/em, Leuven, Belgium, 2005. 7. Corneil, D.G and Gotlieb, C C., An Efficient Algorithm for Graph Isomorphism, Journal of the Association for Computing Machinery 1970. 8. Ullman, J.R., An algorithm for Subgraph Isomorphism, Journal of the Association for Computing Machinery, 1976. 9. Schmidt, D.C. and Druffel, L.E., A Fast Backtracking Algorithm to test Directed Graphs for Isomorphism using Distance Matrices Journal ofthe Association for Computing Machinery, 1976. 10. Cordelia, L. P., Foggia, P., Sansone, C., and Vento, M., Performance evaluation of the VF graph matching algorithm, Proc. 1Oth Int. Conf. on Image Analysis and Processing, 1999 51
PAGE 60
11. Cordelia, L. P., Foggia, P., Sansone, C., and Vento, M., A Subgraph Isomorphism algorithm for matching large graphs, IEEE Transactions On Pattern Analysis and Machine Intelligence, 2004. 12. McKay, B.D., Practical Graph Isomorphism, lOth. Manitoba Conference on Numerical Mathematics and Computing, 1981. 13. Bunke, H., Messmer, B.T., Efficient Attributed Graph Matching and its Application to Image Analysis, Proc. on Image Analysis and Processing, 1995. 14. http://en.wikipedia.org/wiki/Graph_isomorphism_problem, 2006. 15. http://www.cs. ucf.edu/newsletter/vol1 /issue_ two/graphtheory.html, 2006. 16. http:/ /mde.abo.fi/confluence/ download/attachments/67 5/LillqvistThesis. pdf, 2006. 17. http://amalfi.dis.unina.it/graph/db/papers/vfalgorithm.pdf, 2006. 18. http://en.wikipedia.org/wiki/Graph _theory#History, 2006. 19. 2006. 20. http://www.hostsrv.com/webmab/app1/MSP/quickmath/02/pageGenerate?site =quickmath&s1 =equations&s2=solve&s3=advanced#reply, 2006. 52

