Citation
Visualizing large-scale biological systems

Material Information

Title:
Visualizing large-scale biological systems
Creator:
Gabow, Aaron
Publication Date:
Language:
English
Physical Description:
xi, 100 leaves : ; 28 cm

Subjects

Subjects / Keywords:
Bioinformatics ( lcsh )
Biology ( lcsh )
Data structures (Computer science) ( lcsh )
Graph theory ( lcsh )
Bioinformatics ( fast )
Biology ( fast )
Data structures (Computer science) ( fast )
Graph theory ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 98-100).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Aaron Gabow.

Record Information

Source Institution:
|University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
62860494 ( OCLC )
ocm62860494
Classification:
LD1193.E52 2005m G32 ( lcc )

Full Text
VISUALIZING LARGE-SCALE BIOLOGICAL SYSTEMS
by
Aaron Gabow
B.A., Brown University, 2001
B.S., Brown Univsersity, 2001
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2005


This thesis for the Master of Science
degree by
Aaron Gabow
has been approved
by
Tom Altman
S/ID/05
Date


Gabow, Aaron (M.S., Computer Science)
VISUALIZING LARGE-SCALE BIOLOGICAL SYSTEMS
Thesis directed by Associate Professor Lawrence Hunter
ABSTRACT
The emergence of high-throughput biological tools has led to an explosion of
information in the past decade that biologists are still learning how to work with
effectively. One particularly weak area remains visually presenting and exploring
large collections of data. For example, biologists often visualize protein-protein
interaction data by mapping proteins to nodes in a graph, interactions to edges,
and use an automated graph layout algorithm to position the nodes. While this
method can create picture that meets basic aesthetic concerns, it is not good
at conveying information. The graph can have thousands of nodes too many
data points shown to effectively search for a particular node, and the graphic
does not lend itself to an immediate summary or meaning. We have integrated
several ideas from graph theory and graph visualization to create an application
capable in presenting large-scale biological interaction data.
Many biological data sets can be modelled as small-world networks, and
often these ones are multi-scale when clustered. We use this hierarchical decom-
position to assist in layout and providing a representation that gives an effective
overview of a graph while allowing the user to pick particular areas of interest.
111


This abstract accurately represents the content of the candidates thesis. I
recommend its publication.
Signed
Lawrence Hunter
IV


DEDICATION
To my parents for their complete support, and to the U.S. Forces, 5th Special
Forces Group; never forget the Nam.


ACKNOWLEDGMENT
This thesis was made possible by Prof. Lawrence Hunters insight and sugges-
tions on direction. Sonia Leach and Andy Dolbey both made data available to
me for the research.


CONTENTS
Figures ............................................................ ix
Tables.............................................................. xi
Chapter
1. Overview and General Background..................................... 1
1.1 Overview ........................................................... 1
1.2 Small-World Models.................................................. 4
1.3 Layouts............................................................ 6
1.4 Clustering.......................................................... 8
1.5 Focus + Context..................................................... 9
2. Layout Algorithms.................................................. 10
2.1 Force-Directed Placement........................................... 10
2.2 LEDA Implementation................................................ 11
2.3 Modifications for Expansion........................................ 11
2.4 Alternatives...................................................... 12
3. Clustering......................................................... 14
4. Focus+Context...................................................... 19
4.1 Visualizing Trees.................................................. 22
4.2 Spacetrees and Degree of Interest Trees........................... 23
4.3 Hyperbolic Visualization.......................................... 25
4.4 Putting it All Together........................................... 30
vu


4.4.1 ExpAndroids Design............................................. 32
5. Results.......................................................... 41
5.1 Evaluation Methodology and Test Data............................. 41
5.2 Clustering Results ............................................... 43
5.3 Visualization..................................................... 46
6. Future Work....................................................... 58
6.1 Layout Algorithm and Improved Presentation........................ 58
6.2 Adjusting Clustering Threshold.................................... 58
6.3 Further Characterizing Biological Graphs.......................... 58
Appendix
A. Application Code.................................................... 60
References............................................................. 98
viii


FIGURES
Figure
1.1 Yeast Protein-Protein Interactions ................................... 2
2.1 Calculating Distance with Similar Triangles.......................... 12
3.1 A Protein Interaction Network Clustered by Bu and Han................ 18
3.2 Sets for Determining Edge Strength................................... 18
4.1 Fisheye Distortion................................................... 20
4.2 Distance in Spring Layouts .......................................... 21
4.3 Perspective Wall..................................................... 33
4.4 Cone Trees........................................................... 34
4.5 Spacetrees........................................................... 34
4.6 Degree-of-Interest Trees............................................. 35
4.7 Degree-of-Interest on a Shallow Tree................................. 35
4.8 Distance on a Poincare Disk.......................................... 36
4.9 A Radial Layout in Euclidean Space................................... 37
4.10 Gene Ontology Visualized in a Hyperbolic Viewer..................... 38
4.11 H3 with a Poincare Model............................................ 39
4.12 H3 with a Klein Model............................................... 40
5.1 Unprocessed Basic Graph.............................................. 48
5.2 Clustered Basic Graph................................................ 49
5.3 Clustered Protein Interaction Graph.................................. 50
ix


5.4 Unclustered Protein Interaction Graph............................. 51
5.5 Clustered Protein Interaction Graph............................... 52
5.6 Metabolic Interaction Graphs...................................... 53
5.7 Medline Co-Citation Graphs........................................ 54
5.8 Hyperbolic View of Stanfords Graphics Website.................... 55
5.9 ExpAndroids View of Stanfords Graphics Website.................. 56
5.10 ExpAndroids Filtered View of Stanfords Graphics Websites .... 57
x


TABLES
Table
1.1 Real World Data and Small-World Graphs......................... 5
5.1 Size Reduction from Small-World Clustering..................... 43
5.2 Comparing Ju and Aubers clustering methods .................. 44
5.3 Small-World Clustering of Data................................ 45
xi


1. Overview and General Background
1.1 Overview
High-throughput biology has released a flood of information. While having
large amounts of data is helpful, sometimes all the information is not fully
utilized. When it comes to visualizing this data for exploration or presentation,
it difficult to maintain all the information while still presenting understandable
images.
It is possible to visualize large-scale biological interactions by mapping ob-
jects such as proteins to nodes in a graph, interactions to edges, and use an
automated graph layout algorithm to position the nodes While this method
can create picture that meets basic aesthetic concerns, it is suboptimal at con-
veying information, as shown in Fig. 1.1. These graphs can have thousands of
nodes too many data points shown to effectively search for a particular node,
and the graphic does not lend itself to an immediate summary or meaning.
This thesis integrates several ideas from graph theory and graph visualiza-
tion to create an application capable in presenting large-scale biological net-
works. The data underlying these networks could be as diverse as protein in-
teractions in a cell or abstracts of biology papers that share references to a
particular gene.
The rise of high-throughput biology brings to light a problem with using
large data sets there is no Moores Law for the size of computer screens (or
the size of journal pages). While dramatic increases in memory and processing
1


*
Figure 1.1: Yeast Protein-Protein Interactions: This picture, from Nature,
shows the largest connected component of proteins in yeast. Color signifies the
phenotypic effect of knocking-out that protein [10]
2


power allow for the processing a dataset with all 20,000 genes [19], the space used
to display it remains essentially the same. Previous research considered how to
maximize screen space [22], [5] and [20], how to give maximum screen space to
the area of interest [17], and how to minimize the amount of data displayed
while maintaining the meaning of the graph [1], [12], [3] and [21]. This project
integrates all three approaches, using properties typically found in graphs of
biological data.
Some biological data sets can be characterized as small-world models [10],
[11], Non-biological networks such as the world-wide web and film actors have
been shown [1] to be multiscale. that is the highly connected components
that form a cluster in a graph have small-world properties as well. If biological
networks have this multiscale property, then the hierarchical decomposition it
implies can assist in layout and providing a representation that gives an effective
overview of a graph while allowing the user to pick particular areas of interest.
Current applications for visualizing large-scale systems in biological data
rarely treat the limitations on screen space as the main concern, ignoring key
elements that could improve the overall viewability. For this thesis, the goal of
the application is to display a modest amount of data that allows the user to
explore the dataset for interesting phenomenon without overwhelming the user.
The richness of the data is preserved by intuitive methods for moving among
different subsets of the data, while preserving the global context. We claim that
by decomposing graphs into meaningful cluster, our application, ExpAndroid,
can cluster large datasets to reduce size, lay them out, pick areas of interest
to give maximum screen space, and summarize the properties of nodes out of
3


focus.
1.2 Small-World Models
Small-world networks have recently emerged as an important area of graph
theory [26]. These graphs capture elements that are not included in more tra-
ditional network topology. A small-world network is somewhere in-between a
completely regular (crystalline) graph and a completely random one. It has
several properties that make it distinct from either.
The defining properties of small-world graphs are their characteristic path
length and clustering coefficient [26]. The path length between two given nodes
is the shortest path connecting them. The characteristic path length is path
length averaged over all pairs of vertices. The clustering value for a given node
is the percent of possible edges between its neighbors that actually exist:
c(v) = r(N(v))/(k *{k- 1)) (1.1)
where r(N(v)) is the total number of edges connecting the neighbors of v and k
is the size of vs neighborhood [1].
From these properties, it is possible to see why this graph is often used
to explain social networks. Let each node represent a person, and each edge
represent the relation is a friend of. Using this idea, an edge between a node
representing Reed and one representing Ben means that Reed is a friend of Ben.
Now most people hang out in cliques, like most co-workers are friends. If Reed
and Ben also work with Sue, both Reed and Ben are likely to be friends with her.
Of course, people know people outside of their work. Ben still keeps in touch
with his high school buddy Johnny, who now lives in Kansas. While he and
4


Table 1.1: Real-World Data and Small-World Graphs: (reproduced from Watts)
Path Length and Clustering Coefficients in the Real World. Each example shows
behavior where the characteristic path length L is about the same for random
and actual data. However, the cluster coefficient C is much greater than it would
be if the graph was randomly connected.
example L(actual) L(random) C(actual) C (random)
Film Actors 3.65 2.99 .79 .00027
Power Grid 18.7 12.4 .080 .005
C. elegans 2.65 2.25 .028 .05
Ben are still friends, they do not know each others friends. This friendship is
represented as one of the edges that goes outside the local neighborhood. Now, in
small-world models, the percentages of these topographically distant connections
are smaller than in a totally random graph. However, only a small number of
these connections will reduce the characteristic path length. Random graphs
have small characteristic path lengths and low clustering coefficients. Regular
graphs have large characteristic path lengths and high clustering coefficients.
Small-world graphs have characteristic path lengths close to random graphs,
and cluster coefficients close to regular graphs.
There are many empirical examples that map to the small-world network
model. Watts [26] speculates that this type of connectivity is typical in large,
sparse networks. To support this claim he offers three real-world examples that
are characteristic of small-world models, as shown in Table 1.2.
The C. elegans entry takes the synaptic information drawn from White et
al.s Mind of the Worm and generates a graph where a neuron is represented by
5


a node and a synapse or a gap junction is an edge.
Clearly, these sorts of networks occur in both man-made and natural sys-
tems. Furthermore, yeast metabolic pathways were similarly shown to have
small-world-like average path length and correlation coefficient [11] in the graph
formed by representing compounds as nodes, and metabolic transformations be-
tween compounds as edges. Consequently, there is reason to believe that many
of the graphs we want this application to display will have these sorts of char-
acteristics.
1.3 Layouts
Graphs are traditionally laid-out by hand, or manually readjusted after the
initial layout by an algorithm; laying out graphs entirely algorithmically is a
growing subsection of graph theory. Being able to give an effective layout with-
out human-aide becomes more useful as people regularly work with very large
graphs. The size of graphs that researchers can use increase as computer mem-
ory and speed increase. Also, many applications, such as Ecocyc, that allow for
users to explore data collections use graph layouts to present information to the
user. If users want to browse arbitrary subsets of data, the graph layout cannot
require human interaction.
There are several aesthetic constraints for laying out a graph. These include
reducing edge crossings (using a planar embedding when possible), minimization
of drawing area, minimizing total edge length, using uniform edge lengths, and
showing symmetry in the graph [2].
Characterizing a graph before layout often allows for the usage of better
layout algorithms for that particular graph. For example, algorithms exist for
6


planar graphs and trees that work much faster and do a far better job at meeting
the aesthetic constraints than algorithms for general graphs [2].
Generalized graph layout algorithms are much less effective. These algo-
rithms usually work by treating the graph as a collection of forces and then try-
ing to find a minimum to the systems energy corresponding to the best display.
This approach typically has greater computational complexity, which limits vi-
able graph size for rapid displaying. Heuristics allow for larger-scale drawings,
such as Harel and Korens [7] approach that can quickly handle graphs with
1,000 nodes.
However, heuristics have degenerate cases that slower-running force algo-
rithms typically do not have, such as Harel and Korens method offers almost
no speed up to hierarchies unless it allows for many edge crossings. More im-
portantly, even if an algorithm could take in an arbitrary graph of arbitrary size
and instantaneously produce a layout that met the typical aesthetic concerns,
this algorithm still would not solve the problem of visualizing large-scale graphs.
Graphs with thousands of nodes are difficult for the viewer to understand and
use. While limiting such conditions as edge crossings improve the readability of
the graph, the size itself places a large cognitive load on the viewer trying to
browse the graph. Furthermore, when graphs of that size are used to present
findings, there can be a fair amount of extraneous data that partially obfus-
cates the intended point. In addition to automatically laying out the graph,
it be valuable for the application to offer a way to deal with the scale of the
information.
1.4 Clustering
7


Rather than displaying all the nodes and edges in the graph, some ap-
proaches use composite nodes, where a single node stands in for a collection
of nodes and edges. This clustering is like a filter of nodes to display, without
actually removing any data.
These composite nodes can be formed in several ways. For example, if nodes
in the graph represent genes, a composite node might represent a collection of
genes with similar function. This method can cluster a graph rather quickly.
The drawback is that this sort of functional clustering might not always best
represent what is happening in the graph itself. For example, in a graph of
gene regulation for a given experiment, not all membrane transport proteins
are necessarily co-regulated, so there might not be edges between the nodes
that functional clustering group together. Furthermore, this sort of functional
annotation might not always be available to the given biological system.
Graph-theoretic approaches to clustering are applicable to a larger range
of graphs. However, optimal clustering is an NP-Hard problem. While many
approximation algorithms exist, approximating with accuracy is still very slow,
sometimes no faster than the NP-Hard problem itself.
We use a technique developed by Auber et al. [1] to quickly cluster small-
world networks. The technique was found by Auber to usually give clusters
that could be broken down into even smaller clusters, breaking a small-world
graph down into smaller and smaller clusters. This approach offers a potentially
quick way to cluster, suggests possible ways to manage the amount of data, relies
on the topology of the graph itself, and assumes very little about the data. The
main requirement is that the graph model of the data has small-world properties,
8


which is a good assumption for many biological datasets.
1.5 Focus -f Context
One study that influenced the direction of the application was performed
by Munzner et al. [18] while working with biologists on ways of comparing
phylogenetic trees. The scientists mentioned that they wanted to reliably detect
structural differences, allowing them to focus in upon the area of interest. While
focusing in on these areas, they still want to see the differences in the global
context of the models [18]. This indicates that our application should handle
focus + context.
Focus + context (F + C) is a group of techniques that expand the amount
of screen space given to an area of interest, while reducing the space given to the
peripheral data. Sometimes, the magnification is just like placing a magnifying
glass over a graph, those physically closest to the area of interest will also get
more screen space. Sometimes the magnification will look for the most seman-
tically relevant nodes to zoom in on. For example, if the viewer wanted to look
at a node in the middle of a path in a tree, it might make sense to magnify the
area given to all nodes in that path, while not magnifying the siblings of the
in-focus node.
9


2. Layout Algorithms
The layout problem was considered relatively independently from issues of
clustering and F+C. While some previous efforts manage to solve issues of lay-
out and F+C in one step, our modular approach allows for different layout algo-
rithms to be substituted in future versions. It also allows for greater emphasis
on the effectiveness of focus + context and clustering techniques in visualizing
biological networks.
2.1 Force-Directed Placement
Force-directed placement algorithms treat the graph as a collection of forces,
and the layout problem as a way to find a minima for those forces. One of
the more well-known general layout algorithms is the spring embedder, where
nodes are treated as repulsive objects, such as magnets and edges are treated
as springs. All nodes push away from each other, while nodes that are attached
by edges experience an attractive force to counterbalance the repulsion. This
approach is rather simple, and does not explicitly punish edge crossings, but
tends to give nice layouts on graphs that are fairly well connected (it performs
badly on graphs that have unconnected subgraphs and on trees). Because each
node interacts with every other node (0(n2)) for this algorithm it scales rather
poorly. In its basic form, it can only display graphs of less than 40 nodes to
perform real-time layout.
2.2 LEDA Implementation
10


LEDA is a graph package that had an implementation based on this idea of
a spring embedding. While it does not use Hookes law or magnetic repulsion,
it has the same basic approach of calculating a repulsive force between every
node in the graph, attractive forces for nodes that are connected by edges, and
then moving each node based on a percentage of the force vector acting on it
each iteration. ExpAndriods layout algorithm is based in their ideas. It uses a
Java version of the algorithm that has been extended to allow node expansion.
2.3 Modifications for Expansion
ExpAndroid initially lays out the graph with all clusters represented by
composite nodes. This reduction of number of nodes that need to be considered
by the algorithm improves the speed performance and scale of the graph that
can be displayed. However, users may expand a given node, either by clicking
on the node, or by textually searching for a node that is currently contained in a
composite node. Once the user focuses upon a composite node, the node expands
in size, and the nodes previously represented by the cluster are displayed inside
the large rectangular region of the expanded node.
Since the clusters are expandable, we needed to provide a methodology
for laying out graphs with graphs with variable size vertices. The simplest
solution is to give expanded nodes much larger repulsive forces, so they have the
room to expand. However, it is difficult to know how much the repulsive force
should be increased by, particularly if multiple nodes are expanded at the same
time. Bounding spheres allow for a better solution to the problem, but since the
expanded nodes are rectangular, there is a large portion of unused space, which
is sub-optimal when screen space is a limiting resource.
11


We propose a computationally efficient way of comparing two nodes, based
upon the position between nodes centers, the size of node, and some basic
geometry of similar triangles. These similar triangles allow for distances to
be given between the nearest boundaries between two nodes with little more
computation than required to find the distance between the center points as
shown in Figure 2.1.
Figure 2.1: Calculating Distance with Similar Triangles: To find the distance
between boundaries using only one square root, first the distance calculate the
distance between the center of the two rectangles. Then use the relative position
of the two center, and the slope of the line connecting the centers versus the
slope of the line from the center to relevant corner to find which of the rectangles
side the edge connecting the two rectangles begins from. Then similar triangles
are used to find how much of the calculated distance is inside the rectangle
without using another computationally expensive square root operation, which
is subtracted from the distance calculation.
2.4 Alternatives
As mentioned earlier, there are several possibly viable improvements to the
layout algorithm, particularly those by Harel and Koren [7], [8]. However, many
of the improvements on these systems come from grouping nodes together for one
representative force, reducing the complexity of the system. ExpAndroid also
12


reduces the complexity of the layout algorithm by treating a clustered region,
whether it is expanded or not, as a single node. If the node is expanded, the
nodes it contains are layed out with a separate call to the spring layout that
does not examine the nodes outside the cluster. If the clustering significantly
reduces the complexity of the the graph, than the layout algorithm runs well. If
it cannot reduce the graph complexity to under 40 nodes per level, the layout is
the speed-limiting step of Exp Android. However, one of the nice things about
the implementation of our system is the modularity in graph layout. A new
implementation for laying out an expanded node is no more difficult than the
implementation of the layout class itself.
13


3. Clustering
Clustering tends to be a hard problem to solve quickly. For example, finding
the maximal cliques in a graph is NP-HARD. Finding a close approximation is
actually NP-HARD as well. There are some rather good general approximation
algorithms that run far too slowly to be useful for this application. For example,
Hartuv and Shamir have an algorithm for finding highly connected subgraphs.
Highly connected is defined as a subgraph where it takes the removal of at least
n/2 edges to disconnect the graph, where n is the number nodes in the sub-
graph. This is a nice algorithm in terms of grabbing really good, clearly defined
subclusters. The algorithm finds a highly connected subgraph by finding the
mincut of a graph, and then looking to see if the mincut is bigger than n/2, i.e.
if the number of edges required to disconnect the graph is larger than half the
nodes in the graph. If it is, the graph is highly connected and returned. Oth-
erwise, the algorithm recursively searches both partitions for highly connected
subgraph. A single node is not considered highly connected. This algorithm is
called on the initial graph, and each highly connected subgraph that is found,
is removed from the graph. When no new subclusters can be found on this
modified graph, the algorithm terminates. This algorithm was used by Przulj
et al. [21] to cluster protein-protein interaction graphs. We began our applica-
tion by trying to use this approach for clustering. When we implemented it, we
put a large emphasis on optimizing the algorithm for speed. For example, once
the recursive call to the first partition completes, that partition can be freed
14


from memory. The next improvement we made was realizing that the smaller
partition should always be examined first. If the smaller partition is always
examined the maximal recursion depth is only 10 for this graph of 989 proteins
(the dataset discussed in the Przulj paper), since at most n/2 nodes are going
to be in the smaller partition, and it only requires 10 halvings to reduce 1024
to 1 (ie /op2l024 or a balanced binary tree with depth 10, depending on how
you like to think of these things).Even with such emphasis on optimization, this
algorithm took an entire day to cluster.
Bu et al. [3] also cluster protein-protein interaction graphs. They use the
eigenvalues of the adjacency matrix to characterize the topology of the network.
They search for two types of groupings within the network, quasi-cliques and
quasi-bipartites. When they categorized the proteins in the graph, they found
that proteins within the same group tended to share function. This allowed
them to create putative functional tags for uncharacterized proteins. They find
quasi-cliques by looking for larger, positive eigenvalues and then taking the
corresponding eigenvector. The proteins that have the largest absolute weight
form a potential quasi-clique. If it has at least 10 proteins and each protein
there interacts with at least 20 percent of the other proteins in the quasi-clique,
then it is considered an actual quasi-clique.
Bu does not include any formal proof of this method and just gives an
intuition about it. However, we found important degenerate cases. For example,
any regular graph is going to have a maximal eigenvalue whose eigenvector is
all 1. This means on a simple circular cycle where every node has degree 2, this
method would return all the nodes, even though they do not form a quasi-clique.
15


Clearly the method is easily influenced by graph topology and the paper does
not to show that these biological interaction graphs are not so influenced.
Ju and Han [12] efficiently cluster protein interaction graphs before applying
graph layout algorithms. They describe their method as nodes with the iden-
tical interacting partners are collapsed into a single composite node [12]. This
seems to mean that they cluster together nodes that have identical neighbors.
However, this approach does not appear to explain their results, such as in Fig-
ure 3.1. All the nodes in the center of Fig. 3.1(a) are collapsed into one node
through identical neighbors. The edge count in (b) is 105 = 15 14/2, so the
graph is the complete graph klb. Since collapsing nodes into a composite would
not affect any of the edges that do not go to the composite node, and since
(b) is /cl5, clearly all nodes on the outside are connected to each other in (a).
Since the composite node in (b) has edges to every node on the outside, each
node on the outside of (a) needs an edge to at least one internal node. However,
since all the internal nodes get clustered, they all need the same neighbors. This
restriction means that every external node must connect to every internal node.
Furthermore, this identical-neighbor constraint means that every node on the
inside must be connected to all the other internal nodes or no other internal
nodes. If the internal nodes are fully connected, then the graph in (a) should
have 513 14 4- 513 512/2 + 14 13/2 = 138,601 edges. If there are no internal
connections, then the graph should have 513 14 + 14 13/2 = 7273 edges. It
seems that this dataset cannot generate (b) from (a) using the clustering method
described in their paper. However, this interpretation of clustering off identical
16


neighbors is what we shall mean by Jus method in the rest of the paper.
This approach should give similar results to Auber et al. Auber et a!, score
each edge in terms of clustering and discard weaker nodes to cluster. The method
scores edges by looking at the sets formed by the endpoints u and v of an edge.
It then forms three sets of nodes. Uonly is the set of nodes that u has edges to,
but v does not. Vonly is the set of nodes that v has edges to, but u does not.
S is the set of nodes that both u and v send edges to (see Fig. 3.2. They define
a function / that takes two sets as input and returns the percentage of edges
between the two sets divided by the total possible number of edges between
the two sets, and the function r which takes one set as input and returns the
percentage of edges where both endpoints are in the set divided by the total
possible number of edges that could have both endpoints in that set. Each edge
is then scored such that score = f (Uonly, Vonly) 4- f (Uonly, S)+f(S, Vonly) +
r(S) + )5|/(|5| + \Vonly\ + \Uonly\). Edges that have strengths less than a
threshold value are discarded, and connected components form clusters. After
clusters are formed, edges are restore. The choice in cutoff threshold is somewhat
arbitrary, but they find that a strength of 1.6 tends to create optimal clusters.
Since the condition where two nodes have identical neighbors is gives a score of
at least 1.0 (provided there is an edge between the two nodes), it seems that
with a threshold of 1.0 this method should reduce a graph at least as much of
Ju.
17


la)
(b)
Figure 3.1: A Protein Interaction Network Clustered by Bu and Han:(a) 527
nodes with 8568 edges (b) 15 nodes and 105 edges. All nodes in the center of
(a) have been collapsed to a single composite node.
Figure 3.2: Sets for Determining Edge Strength: Each edge forms three sets
of neighboring edges.
18


4. Focus+Context
Focus + context (F + C) is a distortion-based technique [15] that magnifies
a portion of the data, while shrinking, but still displaying, the rest of the data.
Presenting data without distortion is like having two diagrams on separate pages
of a book. To compare the graphs, the reader needs to flip the pages, and keep
the details of each diagram looks like in his head. A distortion based approach
would be presenting smaller versions of each diagram on the same page, with
an additional picture enlarging the area of comparison. This representation
removes the mental work of keeping a diagram in mind. The first studies in this
area were in textual information representation. The first work was the Bifocal
display, where a are of interest was magnified. This idea was generalized into the
perspective wall [15]. The perspective wall offers an intuitive visual metaphor
for the distortion in the system as shown in Fig. 4.3. The area of focus appears
to be on a wall that is physically closer to the viewer. The bifocal display is
really just a special case of the perspective wall where the wall moves back from
the viewer at a 90 degree angle. The nice metaphor comes at a heavy price -
large amounts of the screen, the limited resource in the problem, are taken up
with empty space. It is possible to rectify this problem to some degree, giving
fisheye distortion, where like a fisheye lens, the area farther away from the
area in focus gets successively less detail [9].
There are several problems with these approaches. First, expanding in this
manner can cause edge crossings as shown in Fig. 4.1. Edge crossings are not
19


only bad from the layout perspective, but also from a user interface perspective.
Studies by Plaisant [20] found that they greatly affect viewer comprehension.
Also, this solution interacts poorly with spring embedded general graphs. The
distortion of the basic fisheye method is purely geometric, that is the area mag-
nified is that which is nearest to the focal point by some distance metric (such
as Euclidean). Depending on the graph and what local minima the embedding
has finished in, a force directed layout might have an embedding where, for ex-
ample, a gene g is connected a path length of two to both genes a and b, but the
Euclidean distance for ga and gb are not that close. Consequently the distortion
might incorrectly make g look less related to a than b as in Fig. 4.2.
\tu
Figure 4.1: Fisheye Distortion: Figure (a) is a non-distorted view of the graph.
Figure (b) is the same graph with a fisheye distortion noticed the nodes nearer
to the area of focus have had edge crossings introduced.
4.1 Visualizing Trees
20


/
/
f b
r
\
!
\
Figure 4.2: Distance in Spring Layouts: Although spring layouts try to place
connected nodes near each other, they do not give distance a semantic content.
For example, the distance path length for ga is the same for gb, even though
their Euclidean distances are different
21


Most of the existing F-fC work is on trees rather than general graphs. Tree
layouts our simpler a lot of data fit into this hierarchical representation, and
having a limited number of connections to the rest of the graph makes the
decision of which other nodes to put in focus simpler. One of the initial attempts
to provide F + C for trees was cone trees [9]. Cone trees are a way of embedding
the graph into 3D space as shown Fig. 4.4. Not only does this give the system
another dimension to embed the graph in, but it shifts some of the cognitive
load [22] of processing the graph from conscious thought to the visual system.
This system allows viewers to deal with larger than typically understandable
hierarchies, and, similar to the perspective wall, it presents the distortion in a
way that is intuitive to the viewer. The viewer does not perceive it as distortion
[5].
These naturally understandable metaphors are useful when people are learn-
ing a system. In personal discussions with researchers, several people liked the
Perspective Wall more than other F -f C systems. It seemed like this real-world
connection helped them immediately understand the visualization. The pop-
ularity of cone trees was limited by the technology at the time. This sort of
three-dimensional visualization was still relatively high-end in the early-90s and
made them infeasible solutions. Now that graphics cards can easily process this
information, cone trees might deserve a second look. However, there is one sig-
nificant disadvantage with them they require user interaction to be effective.
At any given position, the viewing angle might not display all the nodes. This
means that cone trees can be used for exploratory purposes, but they probably
cannot completely solve the problem of presenting the data.
22


4.2 Spacetrees and Degree of Interest Trees
Spacetrees take a different approach to F + C. Rather that making sure that
some area is in focus and the remainder of the graph is in context, spacetrees
provide just as much context as space allows [20]. Given that screen space is
the limited resource, they build their system to try to make maximal use out
of screen real estate when visualizing the tree. One feature to this approach
is that they rotate the tree to go from left-to-right rather than up-and-down,
taking advantage of screens typically being wider than they are tall. The main
contribution in Plaisant et al. is automating the decision of when to collapse a
subtree into a glyph, an icon that represents a subgraph. Another distinguishing
feature was that this application was designed with constant feedback from a
representative user base. This user evaluation caused the design to move away
from a strict F + C system. Initial prototypes tried to maintain the entire
graph in context, performing a progressive geometric de-magnification on the
areas not in focus. Users noted that while this provided smooth transitions, it
was distracting to have labels that were visible but not legible. Plaisant et al.
switched to system with a more semantic, binary demagnification. Nodes are
either visible in their normal size, or they are collapsed down into glyphs that
represent a subtree. The height, length and tone of the subtree glyph indicate
its depth, average width, and total number of nodes, respectively (see Fig. 4.5.
The testing also found users best understood interactions if transition between
steps was segmented, so the user first saw the branches trimmed, then the tree
23


translated, and then had the glyphs expanded into visible subtrees.
Degree-of-interest trees (DOI trees) combine several approaches to F + C,
but in many ways they are similar to spacetrees. Card et al. [5] calculates the
degree-of-interest of each node based on how deep in the tree it is and how the
path distance between the node and the node that is currently in focus. They
take these interest scores and then the system uses them to assign node size.
They use three sizes of nodes, the largest containing the most information about
the node. For example, if this approach was used to visualize the Gene Ontology
hierarchy, this might include entity name, GO description, and a list of all the
genes that correspond to the entry. The medium size nodes might just have
entity name and a single corresponding gene. The smallest node just indicate
that the node exists, giving information about the subtree structure, as seen in
Fig. 4.6.
These size assignments are preliminary. If there is insufficient space to dis-
play the entire graph, the program shrinks the visualization down by shrinking
lower interest nodes and collapsing low interest subtrees into triangular glyphs.
Every time a new node comes into focus, the interest score is recalculated and
the tree is redrawn. To aid in viewing comprehension, these transitions are an-
imated as a series of linear changes, with an adjustable number of animating
frames between transitions. Card et al. also adds new ways for how the viewer
can interact with the tree. Each node is a box, and boxes can be rotated left
and right. The left and right sides of the box can display additional information
about that particular node. The effectiveness of this approach on large-scale
graphs when compared to other systems is not discussed by Card et al. The
24


visualizations start to get a little crowded when thousands of nodes are in the
tree. Card et al. shows this system laying out a taxonomic structure of 7,000
entries, and it looks crowded (notice the overlapping regions in Fig. 4.6. Also
see Fig. 4.7 for an example where most of the space is devoted to context).
While visualizing that many nodes in a constrained space is going to introduce
some clutter, it seems like DOI trees become more and more like other systems
as the graph grows. The distinctions between spacetrees and DOI trees are less
apparent in these very large trees.
4.3 Hyperbolic Visualization
Hyperbolic visualization takes a different approach than spacetrees and DOI
trees. With this method everything is always visible, even when the nodes are
very far from the node that is in focus. Also, the demagnification is continuous,
similar to the approach tried and abandoned in the initial prototype of space-
trees. An example of such a system can be seen in an interactive model available
at
http://www.inxight.com/VizServerDemos/demo/orgchart.html.
There are several approaches to visualizing trees using hyperbolic geometry.
Both methods are driven by the same basic idea. While trees have several nice
properties for visualization, they have the (literally) big problem that the space
a tree takes up can grow exponentially as the depth increases. To combat this
problem, Card et al. [5] uses a distortion technique that always gives more space
to the next level while never exiting the bounds of the graph layout. This bit of
magic is accomplished by using non-Euclidean geometry.
25


Hyperbolic geometry differs from Euclidean geometry in that given a line £
and a point p not on £, there does not need to be one line that is parallel to £
and contains p [6].
Hyperbolic geometry was fully developed by Janos Bolyai and Nikolai
Lobachevsky.
There are several properties of the geometry that have are useful for visual-
izing trees when the hyperbolic space is projected into Euclidean space.
For a non-Euclidean geometry to be useful for visualizing a graph, there
needs to be way of converting it into something representable in the Euclidean
geometry without violating the axioms of the hyperbolic geometry. There are
several models for doing this conversion. The Poincare and Klein models have
both been used.
The Poincare disk model takes an infinite surface and projects it down into
the unit circle. To do such, distances get farther away from the center of the
circle. The distances between two points in the Poincare disk is given as follows.
Let a and b and let p and q be the ends of the p-line that connects a and b.
Then the distance dp is given by
dp = | log(AB, PQ)\ = | log(AP BQ)/(AQ BP)| (4.1)
[23]. Conceptually, (although not actually correct mathematically) distances in
the Poincare disk can be thought of as follows. As shown in Fig. 4.8, start at
the center of a circle and move along a ray that lies on the radius. Now move
a distance x on the ray. Now, from the outside observation it looks like the
distance x is half the distance between the center and the end of the radius.
Move a distance x again, and from the outside it looks like the distance x is half
26


of what it was before, meaning that only 3/4 of the radius has been covered.
Each time you move a distance x, you halve the distance to the circle itself,
without ever reaching it [24].
Transformations to change the focus of the graph are done with circle pre-
serving transformations on the unit disk, expressed as complex function of the
form
Zt = (Qz + P)/(l + Pz) (4.2)
where P and 0 are complex numbers, |P| <1 and |0| = 1, and P is the
complex conjugate of P. 0 indicates the rotation of the origin and P indicate
the translation to the origin [14],
The basic idea is that the hyperbolic models give us an infinite amount of
room to theoretically work with, and certain layout algorithms that would be
ineffective using Euclidean geometry work fine using the hyperbolic. The trade-
off is that there is a loss of detail at subtrees away from the center, but this
actually can be a desired property for giving an F -I- C representation. This
approach combines questions of layout and F + C, since the layout combined
with the Euclidean projection needed for visualization solves both problems.
As Herman [9] notes, these hyperbolic visualization algorithms are somewhat
mysterious. It is quite difficult to reproduce the results. Unfortunately, none
of the papers axe didactic enough to reveal the mystery. Many of the details
of the algorithm are omitted from the papers as the developers decided to focus
on a commercial application available from Inxight.
27


The conceptual description of this approach seems similar to general radial
algorithms performed in a hyperbolic geometry.
Figure 4.9 shows such a basic radial layout algorithm. These algorithms
draw subtrees within an annulus wedge. The nodes at layer i are simply placed
on the circle C*. C{ has a radius incrementally greater than Cj_i. The subtree
can never expand past the angle established at the root of the subtree, v. For
example, in the figure, the bold rays denote the maximum span for where de-
scendants of v can go. Specifically, let the annulus wedge be denoted Wv. The
children of v are placed out on C:+i such that for a child u of v, the angle (3U of
Wu is
Ai = min {1(u)/3v/1(v),t)
where l(x) is the number of leaves rooted at the subtree of x and r is the angle
formed by the bounding rays.
These layouts are of limited use because even a branching factor of three
will make the amount of space left for the subtree too small within a few layers.
However, using a hyperbolic geometry, specifically because of the way parallel
lines diverge in a hyperbolic geometry, each child will typically get a wedge
that spans about as big an angle as does its parents wedge, yet none of the
childrens wedges will overlap [14]. This approach not only allows a large
number of nodes to be displayed in a limited space, but by providing smooth
and continuous transitions between what is and is not in focus, they are useful
for browsing large hierarchies. These hyperbolic viewers borrow from space trees
and typically only have labels visible or hidden, with no geometric shrinking (Fig.
4.10). This does introduce a non-linearity in the representation that is similar
28


to spacetrees and DOI trees. Alternatives are to shrink the labels, leaving much
of the space occupied with unreadable labels, or to leave them always the same
size and therefore overlapping These approaches all have legibility problems,
as described above.
Lampings [14] basic idea is extended by the work Munzner [17]. She
switches the Poincare disk model into a Klein model in the volume of a 3D
sphere (Fig. 4.11 and Fig. 4.12). The switch from Poincare to Klein model was
based on tests that showed more nodes can be represented legibly that way. By
switching over to 3D, she gains virtual space in a manner similar to cone trees.
Actually, this system is a generalization of cone trees. These changes allow the
systems to handle about an order of magnitude more nodes in the fringe than
the Lamping system. It also has nodes that are in-between being focused upon
and shunted to the fringe, a feature that Lamping lacks since there are no real
nodes, just labels.
Munzner also proposes a way to handle generalized graphs. Dropping edges
transform a graph into a tree. Munzner offers a domain specific way to pick the
best edge to keep. The system lays out the graph with a spanning tree. The
removed edges can be added back in for the visualization. This approach can
create a clutter in the visualized graph, which is partially ameliorated by having
the system in 3D.
Munzners system has advantages in displaying graphs with very large num-
bers quickly, but there are several problems. The perspective walls main prob-
lem holds for both hyperbolic approaches by placing the graph inside a circle,
not all of the screen space is used.
29


Also, picking an appropriate spanning tree for a graph is non-trivial. To
use this idea might require finding an edge appropriateness metric for every
graph visualized. In the case of the small-world graphs found in this problem
space, the metric has to not only pick the most meaningful edge, but it has
to work against small-world properties. Since small-world graphs have small
average path lengths, there is the problem that trying to transform a graph into
a tree could create a very shallow tree. A shallow tree leaves most of the graph
in focus, since everything is pretty close to whatever node is being examined.
4.4 Putting it All Together
Auber et al. [1] finds that the small-world graphs they tested can generally
be hierarchically decomposed, such that one clustered subgraph turns out to
also have small-world properties. These small-world sub-graphs can similarly
be decomposed into clusters, recursing down to give a multi-scale model of the
small world graph. If this observation holds for biological systems, then there
is a natural mapping into Munzners system. Decompose the graph as much
as possible. Map the graph into a tree of the hierarchy of clustering. Examine
a node n. The nodes that are siblings in the tree are the nodes that are in
the same cluster as n. Nodes that are in subclusters of the cluster that n is in
are descendants in the tree. Nodes that do not cluster down to the level of n
are ancestors. Unlike the previous way Munzner decomposed a graph into level
assignments with a spanning tree, this approach uses levels of recursion to find
distance.
While this idea is interesting, hyperbolic viewers are difficult to work with.
One problem is that Xerox holds a very broad patent on the idea of laying graphs
30


out in hyperbolic space and they have shown little interest in allowing the idea
to be used by anyone, academic or not, unaffiliated with Xerox. They have
interfered the distribution of other non-commercial hyperbolic viewing tools [4].
On a more philosophical level, using a hyperbolic viewer would not show
clustering information, it would just use the clustering for picking edges. It is
preferable to show the clusters because they allow for summary statistics on
what type of things are getting clustered together (such as this is a duster
of mainly regulatory proteins), and presenting clusters can make relationships
appear clearer. One possibility is splitting the screen, giving a cluster view and
a hyperbolic F-fC view, but this approach greatly reduces how much screen
space the application can work with. Most importantly, the embedding process
projects the hyperbolic space down onto a circle (or sphere). Since screens are
almost always rectangular, this means such a system will always waste space
(if the display region is square, a circular display area will waste 4 ir units
of display space, more if the display is rectangular). While it seems possible
to use the corners that have white space for something like buttons or extra
information, Neither H3 [17] or Inxights system [14] do this. This space is
hard to make use of because it is on the corners and the curvature is difficult
to work around. Also, the proposed mapping is dependent on the data being
truly multiscale. Since the depth of the hierarchy depends on how many times
a cluster can be decomposed into small-world subclusters, if the data does not
repeatedly subcluster, than this approach does not overcome the shallowness
inherent to small-world graphs.
4.4.1 Exp Androids Design
31


Our system, ExpAndroid, functions like spacetrees. The hierarchy is used to
determine how much of a collapsed node to expand. For example, say the user
wants to examine a cluster in a protein-protein interaction graph for Alcohol
dehydrogenase. If the node is currently contained in a composite, the system
will see how many nodes would be displayed if all real nodes in the cluster were
displayed. If there is sufficient space, then all the nodes will be shown once the
cluster node is expanded. If there is insufficient space, the system looks for nodes
in the expansion that can remain clustered, which is likely to occur if the graph is
a multiscale small-world graph. The decision on what space is sufficient is based
upon how many many nodes are to be displayed. The system tries to balance
the space given to out-of-focus nodes and in-focus nodes. If few nodes are in-
focus, little screen space is given to the expanded region. If a large percentage
of displaying nodes are in the expanded cluster, a large portion of the screen is
given to that node. This methodology loses the semantic meaning of distance
that a hyperbolic viewer has. An object that is far in the hierarchy from the
in-focus node will not only get little screen space, but it will be shunted to the
edge of the display area. While this means we lose a dimension for information
transfer, that also means our system can take user improvements to the layout
for creating a final image for publication.
32


Rosenc^J
we muck's?
to v
sending. So,
n *r
j.et*s transfonn^t^
y^r t-h* exterior
;t^ -.l that it was
i,r*
nuotn from the und^rstandi,
enfrs1'-!
day*.
tto-an his fathers d*ath
m> wucn
thf'^nnc?t dream of: I
- ^ay*
rKJ *, ^ing of so yo*Jn~ t r*
SJ 50 ni9htaojr tjfi
i i j ^>u vouchsafe YaZ,v X-*\
$>t tie time: so
ijv him on to pl^
£) as from occa£
Figure 4.3: Perspective Wall: Perspective walls were one of the earliest at-
tempts at focusing in on an area while providing a global context
33


Figure 4.4: Cone Trees: Cone trees are a way of presenting trees in 3-space
:**!r*l
____
(fcrLlw]
Figure 4.5: Spacetrees: The triangular glyphs representing subtrees vary by
shade, width and height
34


Figure 4.6: Degree-of-Interest Trees
Figure 4.7: Degree-of-Interest on a Shallow Tree: Almost all the space in the
figure is devoted to context and not the focus. While the area of context is large,
little information about the context is given other than the general shape of the
subtree.
35


Figure 4.8: Distance on a Poincare Disk
36


Figure 4.9: A Radial Layout in Euclidean Space
37



T might 4?
ME >n!y M ttath cell iaaO. bii'icl. IS a t.< it. itn visual.rati.n*. .men* all tn >^K* sv'£ *A:rVg. tSBET,lJ3S'atT? i
---------___________r
fft _*v a .
ftaW&feftfeS
rjtvaCT^ U a_
^OfMrp*r/^ 'r
jmmv
~>Crjitf >Trer jJ
qtcvr*. 3*;fy^rT~]
r '3£*_3^ b*>'.<5* ________
Figure 4.10: Gene Ontology Visualized in a Hyperbolic Viewer [16]
38


Figure 4.11: H3 with a Poincare Model: This graph is laid out in hyperbolic
space and then projected down into a sphere using the Poincare model
39


Figure 4.12: H3 with a Klein Model: This graph is laid out in hyperbolic
space and then projected down into a sphere using the Klein model
40


5. Results
5.1 Evaluation Methodology and Test Data
To evaluate ExpAndroid the speed of clustering, the degree of clustering,
and the images created were tested for several datasets. When possible this
performance was compared to other applications. However, many of the relevant
focus+context tools are not openly available, and are intended for tree data.
Since small-world clustering does not work on trees, most datasets for other
focus+context systems do not work with ExpAndroid. One exception is H3.
While we could not use H3 to visualize our data, we did visualize one of their
datasets.
One interaction graph comes from protein-protein interactions in budding
yeast, Sacchromyces cerevisiae. Von Mering [25] created a large data set of these
interactions. It is a collection of various experimental high-throughput tech-
niques. The techniques include the yeast two-hybrid systems, protein complex
purification techniques using mass spectrometry. Correlated messenger RNA
expression profiles, genetic interaction data, and computer predicted interac-
tions. There is also a smaller gold-standard data subset of manually curated
catalogs. Von Mering divided all of the interactions into high, medium and low
confidence to account for the high degree of false positives that occur with these
techniques. If just the high and medium confidence interactions were included
the graph had 2,617 nodes and 11,865 edges. This dataset was selected to fa-
cilitate comparisons of our clustering method with to the techniques of Bu and
41


Przulj, who used the same data set [3], [21]. Jeong has made various metabolic
networks available [11]. We used Chlamydia pneumoniae for testing.
We also visualized two datasets that came from within the Hunter lab.
One was a collection of Medline abstracts, where nodes are genes and an edge
connects gene u and v if and only if there is an abstract that references both
genes. The other group of datasets was a series of predicted protein interactions,
much like the von Mering data. These datasets varied on what kind of evidence
and how much of it was present to predict an interaction.
In addition, we generated several artificial datasets using a method based
upon the generalized connectivity equation given by Kleinberg [13]. The basic
idea involves laying a graph out in a grid. A node has a certain lattice distance
for which it is connected to all its neighbors.In addition, a node has long-range
contacts. The creation of a contact depends on two constants q > 0 and r > 0,
such that for a given trial the edge from u has endpoint v with probability
proportional to [<2(, v)]~r. Unlike Klienbergs method, the edge here are undi-
rected, meaning that for our dataset, if a long-range connection connects u and
v then there is a connection to v and u as well. Also, to make the datasets look
closer to biological data, on some datasets we removed a subsection of the local
neighborhood connections, so while a node would still be far more likely to have
more local connections than long-range, it would not be guaranteed to connect
to all of the nodes in the given lattice distance. While actual biological data
is more important, these created data create graphs to visualize with a finer
degree of control of the graph properties. Testing indicated what sort of graphs
this approach works well for.
42


Table 5.1: Small-World Clustering of Data: Data sets were checked to see
how much clustering reduced the initial graph. Datasets that had unconnected
subgraphs only had the largest connected subgraph checked. All data is with a
clustering threshold of 1.6 unless otherwise noted
dataset initial node count top-level reduction
kleinberg random 400 3%
kleinberg regular 400 .997%
yeast (high confidence) 601 47.6%
yeast (medium + high) 2357 39.7%
predicted protein 1 344 79.4%
predicted protein 2 431 13.9%
predicted protein 3 1316 22.1%
Chlamydia Metabolism 386 0%
Chlamydia (.8 threshold) 386 17.1%
Medline Co-citation 708 19%
5.2 Clustering Results
While almost all datasets experienced a significant reduction of initial size,
only one data set was reduced close to an order of magnitude (79%). This resullt
was surprising given the results demonstrated on protein interaction graphs in
Ju [12], which as discussed previously should be at most as good as the clustering
method in ExpAndroid. The two datasets discussed in Ju and Han are reduced
from 307 nodes down to 25, and from 527 nodes to 15. While these two datasets
were not available, we did implement Jus clustering method and compared it
43


Table 5.2: The clustering methods of Ju and Auber were compared on a
variety of datasets. All tests were performed on a 1.7 GHz Pentium 4 with 512
MB RAM.
dataset Ju top-level reduction Jus time ExpAndroid time
predicted protein 1 1.7 % .69 sec 3.73 sec
predicted protein 2 13% .281 sec .157 sec
predicted protein 3 9% 4.23 sec 18.7 sec
yeast (high conf.) 14% .67 sec .953 sec
yeast (all) 19% 17.7 sec 25.5 sec
Medline Co-Citation 11% .812 sec .80 sec
with small-world clustering (see Table 5.2.
These results were more in line with what we expected for the performance of
Jus clustering method. Jus simpler method was faster on all but two datasets,
and was faster by at least a multiple of 4 on two tests. However, Jus clustering
managed to cluster fewer nodes on every dataset. If the speed performance of
Aubers method is acceptable, than it appears that it should be preferred for
reducing the complexity of biological graphs for layout, particularly since laying
out a graph of reduced complexity speeds up overall runtime remarkably [12].
Certainly ExpAndroid spends most of its time with layout, and benefits in speed
from clustering. For example, the Medline co-citation graph takes .80 seconds
to cluster, but takes 21 seconds to cluster, layout and display the graph. This
clustering step is well worth the .80 seconds, as it reduces the total execution
time from 56 second down to 21. All times are derived from execution on a 1.7
44


Table 5.3: Small-World Clustering of Data: Data sets were checked to see
average number of recursive clusters in a given cluster, size of a given cluster,
and nodes reduced on the top-level. Datasets that had unconnected subgraphs
only had the largest connected subgraph checked. All data is with a clustering
threshold of 1.6
dataset ave. cluster depth ave. clust. size
kleinberg random 1.0 13
kleinberg regular 1.0 399.0
yeast (high confidence) 1.19 16.5
yeast (medium + high) 1.18 23.9
predicted protein 1 1.33 134
predicted protein 2 1.05 11.16
predicted protein 3 1.08 70.54
Chlamydia Metab. (.8 threshold) 1.0 3.28
Medline Co-Citation 1.5 50.6
GHz computer with 512 MB RAM.
The data was also rather shallow, often not producing clusters that could
be further clustered. However, the average depth could often be increased by
changing the clustering threshold. For example, the dataset predicted protein 1
had an average clustering depth of 1.8 when clustered with a threshold of 1.71.
However, even when the average depth increased, the maximum depth of the
recursion never exceeded two. There are two possible ways of improving picking
the threshold constant. One is that testing on biological datasets might establish
a better default value than 1.6. The other possibility is given that clustering is
45


quicker than layout, it might be worth trying several probe values to give some
statistics on reduction and depth of clusters, and then pick the best of those
values.
The more intractable problem is that the data was not that hierarchically
decomposable. ExpAndroid is not based upon hierarchical decomposition to
the degree it would be if it was based upon a hyperbolic viewing system. It
only becomes a problem when a large portion of the graph is contained in a
composite node that does not further decompose. If the user is interested in
that region, ExpAndroid needs to then show all the contents of that cluster. If
those contents are most of the graph, then clustering does not make the data
much clearer to understand.
5.3 Visualization
The application takes a square region of the screen and uses it for display-
ing the graph. The remainder of screen space is used to provide an additional
interface for exploring the graph and to display information. This includes what
nodes are in the in-focus area, name of the highlighted node, and annotation
details. Because many biological datasets could tie into biological databases
for additional brief pieces of information, we included support in our XML file
system for an annotation tag. For example, if a graph represented gene interac-
tions, each gene node might include the most relevant Gene Ontology functional
annotation. Such annotation becomes particularly useful once the graph is clus-
tered. By searching all the nodes in a cluster, the application can check all the
annotations and characterize the node. Initially we just presented the highest
occurring annotation in that cluster as its characterization. Based on a users
46


suggestion, it is also possible to display the three most frequently occurring an-
notations in that cluster and its subclusters. If nodes do not have annotations,
no space is used for this characterization.
Even with smaller size graphs (n = 20), clustering proved helpful for some
graphs for removing clutter such as that created by node labels. As shown in
Fig. 5.1 and 5.2, a small world graph can have its complexity greatly reduced
by clustering.
47


Figure 5.1: Unprocessed Basic Graph: This is a smaller graph with no clus-
tering.
A large-scale protein-interaction graphs that were barely legible without
clustering displayed successfully with clustering, as in Figure 5.4 and 5.3, and
again in Figure 5.5.
ExpAndroid also reduced clutter for viewing metabolic graphs (Fig. 5.6)
and Medline co-citations (Fig. 5.7).
ExpAndroid was also compared to H3 (Fig. 5.8 for visualizing the Stanford
graphics department website. The website had many pendent vertices (nodes
attached to the graph by only one edge). Pendent vertices are problematic for
48


Figure 5.2: Clustered Basic Graph: This is the same graph after clustering
49


Figure 5.3: Clustered Protein Interaction Graph
50


Figure 5.4: Unclustered Protein Interaction Graph: The same graph as pre-
viously shown, without any clustering
51


Figure 5.5: Clustered Protein Interaction Graph
52


Figure 5.6: Metabolic Interaction Graphs: The metabolic interactions of
Chlamydia pneumoniae (a) unclustered and (b) clustered
53


Figure 5.7: Medline Co-Citation Graphs: The co-citation of genes in Medline
abstracts (a) unclustered and (b) clustered
54


small-world clustering, since their edge strength is always zero and will never
cluster. This makes Figure 5.9 cluttered. Figure 5.10 automatically filters these
pendent vertices, which suggests that ExpAndroid could be improved by clus-
tering these pendent vertices to their one neighbor.
Figure 5.8: Hyperbolic View of Stanfords Graphics Website
55


56


Figure 5.10: ExpAndroids Filtered View of Stanfords Graphics Website
57


6. Future Work
6.1 Layout Algorithm and Improved Presentation
The layout algorithm is often the performance bottleneck and should be
upgraded to allow for real-time layout of hundreds of visible nodes. Since small-
world clustering appears to rarely reduce the nodes on the initial layout by more
than30 percent, if the layout algorithm can only efficiently handle 50 nodes, than
the application cannot be used for truly large graphs.
It would also improve ExpAndroid to give more visual cues as to the sub-
structure of collapsed nodes. For example, the clustered nodes that have many
nodes in the cluster might be presented larger.
6.2 Adjusting Clustering Threshold
As noted earlier, the threshold of 1.6 is rarely optimal for reducing the size of
the graph, or for recursively clustering the graph. In addition, possibly the clus-
ter should not be a constant. Sometimes a cluster will not break up into smaller
points even if the cluster is not a fully-connected clique. If it is not a clique,
raising the threshold can break apart a subcluster. This dynamic changing of
threshold might improve the maximum depth of hierarchical decomposition.
6.3 Further Characterizing Biological Graphs
When the randomly generated data was clustered, it performed differently
than the other data sets. It tended to form one giant cluster or none at all,
depending on how many local connections the set had. It was possible to increase
58


the number of first-level clusters to two or three if the threshold was raised to
around 4. This disparity might indicate what type of graphs work well with this
sort of clustering technique.
The work of Auber et al. focuses on small-world graphs. A characteriz-
able subset of small-world graphs might better fit into this clustering approach.
Many small-world graphs are also scale-free graphs. Scale-free graphs are a type
of graph where the distribution of the number of neighbors a node has obeys a
power law. Consequently, a scale-free network will have a few nodes that have a
very large number of adjacent nodes. For example, in the high-confidence yeast
protein interaction dataset over half of the 601 nodes have 4 or fewer neighbors.
However, four nodes have over 50 neighbors. If the small-world clustering tech-
nique is ultimately a scale-free-network clustering technique, it would allow for
better screening of what kind of dataset this visualization approach can handle.
59


Appendix A. Application Code
This is an appendix with the code for clustering, laying out, and displaying
graphs.
0.8
/* *
Craatad on Jan 4, 2005

* TO DO To changa tha taaplata for this ganaratad fila go to
* Window Prafarancas Java Coda Styla ~ Coda Tanplataa
*/
packaga bosM.agabow.graph.viz.swingsat;
/**
Cantbor agabow
*
T0D0 To changa tha tasplata for this ganaratad typa comant go to
Window Prafarancas Java Coda Styla Coda Taerolatas
*/
public class DiractadEdga artands Edga {
public DiractadEdga(Hoda sourca, loda sink){
sourca. sourca;
sink. sink;
>
/* (non-Jawadoc)
Csaa hosts.agabow.graph.viz.swingsat.EdgaSisDiroctadO
*/
public boolaan isDiractadO {
raturn trua;
>
/* (nan-Javadac)
Ssaa heoa.agabow.graph.viz.swingsat.Bdga#isWaightad()
*/
public boolaan isWaightadO {
raturn falsa;
>
public static wold aain(StringG args) {
>
/
* Craatad on Fab 1, 2005
a
TO DO To changa tha t sop lata for this ganaratad fila go to
* Window Prafarancas Java Coda Styla Coda TaspUtay
*/
packaga hoos.agabow.graph.viz.swingsat;
inport jzva.awt.AlpbaCooposita;
inport java.avt.Dinansion;
inport iawa.mrt.Qraphics2D;
inport jzva.awt.ftandaringSints;
inport Java.art.gaon.Lin2D;
inport java.awt.gaoo.Bactangla2D;
inport java.util.;
inport java.awt.Color;
iaport hoM.agabow.graph.viz.layout.*;
/*
teuthor agabow
T0D0 To changa tha tasplata for this ganaratad typa consant go to
Window Prafarancas Java Coda Styla Coda Tassplatas
*/
public class DisplayClnstsr artands DisplayHoda {
privsta BashSat nodaSat.;
privata OraphSat graph.;
7/ privata SactanglaZD.Doubla bordar.;
// privata Bactangla2D.Doubla ract.;
//doubla width.;
//doubla haigtrt.;
//doubla x_, y_;
privata boolaan isErpandad.;
privata int azpandOrdar.;
privata static int azpandZD 0;
//addad 4/2
//doubla pWidth., pBaigbt.;
60


t:
s
13
t
f
b,
I
I eg
S lot..
I issl
I HP
p.
a i '
4*
t;*/

s?
ii
s S>
i
$ -* <-*
s m
I s**
-I?
O --. < ft *
o3 ,SS2
aS42S3
mil*
34
:4
s^JB ft
ft.* f.
SHS J
~.mu
-K s|s^
a3?Sas
SaV'ScS
:*^sH
^ IsTdl
rSrS
public void *xpaad(K
iBxp*ad*d_ true;
void *zpand(doubl* paaslVidtb, doubl* pan*lB*ight, doubl* navVidth, doubl* novHoight, 8raphcB2D g2d){


>
public void draw(double paatlVidth, doable panelBeight, doable z, double y, Grapbics2D g2d){
if(isRxponded.){
//sidth. height. 70;
//x_ y_ .3;
c2d.eetColor(nev Color(255. 255, 255));
7/ g2d.setCogposite(AiphaCompo site.get Inst ance(AlphaComposite. SBC, 0.5f));
//modified 4/6
g2d.fill(ne Rectongle2D.Double(panelVidth
W......... '
7/g2d.fill(rect_);
g2d.setColor(new Color(265, 0
0));
z_, panelBeight
x., paaelBeight
g2d.drav(nev Re ctongle2D.Doable(panelVidth
g2d.setColor(nev Color(0, 255, 01);
7/ g2d.draw(rect.);
g2d.eetColor(oev Color(Q, 0, 0));
7/g2d. eetCoapoeite (AlphaCcapoalt#. get Instance (AlphaCoaposita .SRC) );
' y_, width., height.));
1 J-, eidth., bight.));
Iterator i graph..getllllodesO.iterator();
Iterator 1 graph .getill£dzei() iteratorO ;
while (j .haslert OH
Edge e (Edge)j.nozt();
if (graph..haslode(e.sink.) Ah graph..has*oda(e.source.))<
//g2d. setCeaposit a (ilphoComposite. get Iastanee (AlphaComposite. SRC.0VER, 5f));
e.getDisplajO .drow(widtb_, height., z. panelVidth, y_ ponalBeigfat, g2d);
//g2d. setCoaipositedlphaCoapoeite. get Instance (ilphaCoaposite. SRC) );
//System.out .print la ("blab + e.getDisplayO);
>
vhile(i. ha alert () ){
6 ((lode)i.next()) .gatDisplayO;
height., z_ panelVidth, y_
paaelBeight, g2d);
>
/ HashSet adjlodes ((glyph.getlodeO) .getleighbors());
Iterator 1 adilodes. iterator!);
while (j. haslext O ){
lode a (lode)j.next();
Displsylode da n.getDiaplayO;
if (oodeSet..contains (da) ){
g2d.drsw(new Line2D.Double(glyph.getCenterl() width. x. panelVidth, glyph.getCeaterTO height. y_
m.getCenterZO width. panelVidth z_, dn. get Cent erT() height. + y. paaelBeight ));
>
paaelBeight,
>
>
w
else<
// x_ y. .3;
/ g2d.setColor(new Color(0, 0, 0));
width. height. 10;
// System.oat .println^The locatiam is: + z. + " + y_);
Iectaagle2D.Doable rect aew Rectangla2D.Double(panalVidth z. + z, paaelBeight y_ y, width., height.);
g2d.fill(rect);
g2d.setColor(aew Color(0, 255, 0));
g2d.fill(new Rectangle2D.Double(panelVidth (z_ .01) x, paaelBeight (y_ .01) y, width_/2, height_/2));
g2d.eetColor(nev Color(0, 0, 0));
*/
g2d.eetColor(new Color(255, 0, 0));
width. height. 10;
// System.oat.priatln(MThe location is: + z. + " + y_);
Rectangle2D.Double rect new Rect angle 2D. Double (panelVidth z_ z, paaelBeight y_ + y, width., height.);
g2d.fill(rect);
g2d.setColor(aew Color(0, 0, 0));
>
>
public String getlame(double x, doable y){
String name super.getBane(z, y);
if(isExpanded.){
//width. height. 70;
//x_ y. .3;
Iterator 1 nodeSet..iterator();
whila(i.haolazt()H
DisplayBode glyph (DlsplayBode)i.nezt();
if (glyph, contained, y))l
dsm glyph.getB(x, y) ;
>
>
return name;
>
public void aetLocatica(dodble x, doable y){
62


x_ x;
7- 7i
y
public void set!(double i){
i. x;
>
public wold setT(double y){
j- t:
>
public double getX(K
return x_;
>
public double getT()-{
return y.;
//modified 4/2
public double getCenterX(){
//return (x. + .007);
//System.out.printlB(Right now: "
if(isBxnanded()H
return (x_ + (double)width_/(2.0 *
}
else{
return (x_ + 6.0/(2.0 pVidtb.));
>
public double getCenter!()(
//return (y. + .007);
if (ieBxpanded()){
(y. + (double)heigbt_/(2.0
(y. + 5.0/(2.0 pBeight.));
return
}
else<
>
>
x_ + "
pVidtb.));
pHeight_))
vidth_/(2.0
public double getVidth(}\
return wldth./pVidth.;
pVidtb.));
public double getBeigJrt(){
return height./pBeight.;
public void eetPDina (double pHeight, double pVidtb){
pHeight. pHeight;
pVidtb. pVidtb;
public boolean contain*(double x, double y){
if(pVidtb. 0 kk pHeight. 0)<
return false;
>
else if (isExpanded.H
//else<
//return border..contains(z, y);
Hectangle20.Double rect nee BectangleZD.Double(pVidtb.
pHeight. y_, width., height.);
//System.out.println(this expanded: width. + height.);
return rect.contains(x, y);
else<
//System.out.printla(Mthis collapsed: M + width. + + height.);
//Rectangle2D.Double rect new Rectangle2D. Double (pVidtb. x.. pHeight. y_, 10, 10);
Kectangle2C. Double rect new Rectangle2D. Double (pVidtb. x_, pHeight. y_, height., width.);
return rect.contains(x y);
>
}
public void setDiasnsions(double width, double heightH
width. width;
height. > height;
public boolean i sExpanded(){
return isExpanded.;
public boolean isClustr(){
return true;
>
public String nodesirrey(String fillirrayX
lnt sixe fillirray.length;
int i;
fer( i 0; i < sixe; i++){
fillArrayCiJ
}
i 4;
Iterator iter graph..getAllBodesO .iterator() ;
while (iter. haelaxt (J){
i + 1;
lode n (Hode)iter.naxtO;
63


fillArrayCU s n.gatMuaaO;
ntun fillArray;
public HaahSat aodaSat(}{
r-atura graph_.gvtillBodaaO;
public void addDiaplay)loda(Diaplayl6da navItaaH
oodaSat..add(oavXtaa);
>
public void findiraragaPoaitioa(H
doubla x, ji
x 0.0;
y 0.0;
HaahSat nodaa gra^h_ .gatillkodatO;
Itarator i nodaa.iterator();
vhila(i.hasHaxt())<
Boda n (Boda)i.naxt();
x -+ a.gatDiaplajO.gatX();
y + a.gatDiaplay().gatT();
>
x x/aodaa.aimaO;
y y/nodaa.aisaO;
>
public boolaan orarlapa(Diaplayboda dM
doubla du_x dn.gatXO;
doubla da_y dn.gatTO;
doubla vidth vidth./pVidth.;
doubla baigbt hoight./pHaigbt :
doubla da.haight da.gatBaight() / pBeight.;
doubla da.vidth dn-gatVidthQ / pVidth.;
boolaau iaOrarlappad falaa;
if (dn_x < x. kk dn_y < y.H
if((dn_x 4 da.vidth) > x. U
(du_y + do.baight) > y_H
iaOrarlappad trua;
>
if(da.x > x. kk dn_y > r_H
if ((x_ 4 vidth) > dn.x u
(y_ 4 baigbt) > dn_yH
iaOrarlappad trua;
>
if(da.x ( x. U dn_y > y.K
if((dn_x 4 da.vidth) > x. U
(y_ 4 baigbt) > duyH
iaOrarlappad trua;
>
if(dn_x > x. U da.y < y_){
if((x_ 4 vidth) > dn_x mk
(dn_y 4 dn_haight) > y_){
iaOrarlappad trua;
>
//Syvtau.out.printlaCcbackiag 4 ga-rRodaO.gatBauaO 4 and 4 dn.gatSodaQ ,gatla() 4 "
raiaru iaOrarlappad;
public int gatBxpandOrdar(){
raturn axpandOrdar.;
public vtatic void ainCStriugQ arga) {
>
/* *
* Croatad on Fab 1, 2000
* TODO To chaaga tha taoplata for this gaaaratad fila go to
* Vlndov Prafaraucas Jara Coda Styla ~ Coda Taaiplataa
/
packaga bona.agabov.graph.rix.aviagaat;
iaport jara.avt.caon.a;
iaport jara.avt.Color;
iaport jara.avt.Oraphica;
iaport jara.avt.6rsphica2D;
iaport jara.avt.AlphaCoqpoaita;
/*
* toutbor agabov
*
* TODO To chauga tba taiiylnta for thia gaaaratad tvpa comast go to
* Viadov Prafaraacaa Jara Coda Styla Coda Tanplataa
*/
public claaa DiaplayHdga {
prirota doubla xSourca.;
n 4 iaOrarlappad);
64


private DisplayRode source.;
private DisplayRods sink.;
private double ySink.;
private Bdg* edge.;
public DispleyEdge(Edg* ){
tdg. a;
Mode arc, auk;
arc e.source.;
auk a.sink.;
source. a.source..getDisplayO;
sink. e.sink..getDisplayO;
public DisplayEdge(Edge a, Point2) arc, Point2D snk){
edge. a;
// source. ;
// sink. snk;
>
public DisplnyBdge(Edge a. Rode sre, Rode snk){
adge. a;
source. arc.getDisplayO;
sink. uk.getDisplayO;
public Point2D getSource(){
return na* Point2D.Double(sourca_.getCenter!(), source..gatCantarT());
public Point 2D getSink(){
return nev Point2D.Doubla(sink_.gatCanterZO, sink..getCenterT());
public Edga getEdge(){
return edge.;
public void draw(doubla panelVidth, double paneLBeight( double xCorner, doubls yCoraer, Graphict2D g2d){
if (source, a* nnllK
source. edge..source..getDisplayO;
if(sink, as null){
sink. edge..sink..getDisplayO;
//g2d.setCoaposite(41phaCaposite.getInstance(AlphaConposite.DST_0TER));
if (source.. isRxpandedO || sink.. isRxpandedO ){
d.setColor(nev Color(2S5, 0, 0));g2d.setColor(nev Color(2SS, 100, 0));
j2d.i
g2d.drav(nev Line2D.Double(source..getCenter!() panelWidth 4 zCorner, source..getCenterT()
sink..getCenter!() a panelVidth 4 rCornsr, sink..getCenterT() panelBeight 4 yCoraer));
//g2d. eetCmposit e(AlphaCmposite. gat Instance (Alpha Opposite. SRC .ATOP) );
~ '.setColor(xMV Color(0, 0, 0));
panelBaight
j2d.J
public static void min (String Q args) {
Rode nl, n2;
nl a nev Rode("foo". "4);
Display Node dn nev DisplayRoda(nl, .3, .5);
nl.eetDisplay(dn);
n2 nev Rode(n", "4");
DisplayRode dn2 nev DisplayRode(n2, .0, .7);
a2.setDisplay(dn2);
UndirectedBdga edge nev UndiractadKdge(nl, n2);
DiaplayEdge de nev DispiayBdga(edga);
Systee.ovt .print In ("The orignal string is:M 4 nl.anaot.
nl.setAnnot("blah blah blah");
Systea.out.printIn(nTba aodified string is:" 4 nl.axmot.
4 and the reference is 4 ds.getBdgeT) .getSouroeO .annot.);;
>
}
/
* Crsated on Jan 31, 2005
* T0D0 To change the tsaplate for this generated file go to
* Vindov Preferences Java Coda Styla Coda Tsaplatas
/
package bom.egabov.graph.vis.svingset;
l^ort Java.avt.gem.*;
u^ort java.avt .Color;
ispart java.avt.Graphics;
isport java.avt.Graphics2D;
inport java.util.*;
/*
* danthor agabov
* T0D0 To change the tablate for this generated type romant go to
* Vindov Preferences Java Code Style Code Tmplates
*/
public class DisplayRode {
protected double x_;
protected double y_;
protected double pVidth., pBeigbt.;
private Rode node.;
yComer,
65


private Bactaagle2D.Double net.;
protected double width. 10;
protected double baipt. 10;
public DiiplaylocUO'Q
public Displaylode(bode n)(
node. a;
*_ 0;
7- 0;
//>dM 4/3
pVidth. 660;
pBeight. 650;
>
public Displaybodefbode a, doable xLoc, doable yLoc){
node. a;
x_ xLoc;
7- yLoc;
//added 4/2
pWidth. 650;
pBeight. 650;
>
public doable gtl(){
retara z ;
>
public doable getY(){
public void eetZ(double x){
x_ x;
>
public void eetT (double y){
public bode getbode(H
return node.;
>
public void draw(double panelVidth, double panelBeight, double z, double y, 8raphies2D g2d){
double radio* 10.0;
pVidth. panelVidth;
pBeight. panelBeight;
rect oev Bectangla2D.Double(panelVidth i. x, panelBeight j. + y, radius, radius);
g2d.fill(rect_);
>
public String getbane(doable z, double y){
return node..setNaaa();
)
//uodified 4/2
public double gatCenterZ(){
//return (*_ .007)
retura (x_ + vidth_/(2.0 pVidth.));
>
public doable getCenterY(){
retura (y_ + height_/(2.6 pHeight.));
public String toStrlag(K
return aev String(Doable.toString(x_) + " + Double.toString(y_));
public boolean contains(double z, double y){
if (rect. aullH
return false;
}
elsa{
return rect..contains(x, y);
>
public boolean isBxpanded(){
return false;
>
public boolean i sCluster()(
return false;
>
public double getVidtb(H
return vidth./pVidth.;
public double getB*ight(H
return height./pBeight.;
//return 0;
>
public &ectangle2D.Double getBect() {
Baotengle2D.Double retBec* nee Bectangle2D.Double(pVidth_ x_ + z_, pHeight. y. + y_, width., height.)
zetiirn ret Bee;
>
66


public void axpand(doubl panelWidth, double panelBeight, doable nevWidtb,
doable stfBeight, Grephics2D g2d)0
public static void nainCStringn ergs) {
>
/
Created on Jan 3 2005
TOM To change the t^>late for this generated file go to
Window Preferences Java Code Style Code Templates
*/
package heats .agabov .greph.vis.svingset;
inport java.util.HashSet;
/* *
tanthor agabov

T0D0 To change the tsnplate for this generated type coaaant go to
* Window Preferences Java Code Style Cods Tanqtlates
public abstract class Edge {
protected Mode source.;
protected Bode sink.;
protected doable strength.;
private HaahSet nultiEdges.;
private DisplayEdge display.;
public abstract boolean isDirectedO;
public abstract boolean isWelghtedO;
public String toStriagCH
retnrn new String ("Source: + source. + sink:" + sink.);
public Node getSource(){
return source.;
>
public Node getSink(H
return sink.;
>
public void addEdge(Edge e){
nit iBdges.. add (e >;
>
public void setDisplay(DisplayEdge de){
display. de;
public DisplayEdge getDisplay(H
return display.;
public HashSet getBdges(H
return sailtiEdges.;
public static void aain(StringO args) {
>
}
Created on Jan 81, 2005

TOM To change the tsoplate for this generated file go to
* Window Preferences Java Code Style Code Tssplates
*/
hO; agabov.graph.vis.soingset;
Inpart java.io.*;
inport java.awt iiphaCosposlte;
i^ort java.awt .Color;
inport java.awt.Graphics;
Inport java.awt .Graphics2D;
jj^iart java.avt.KanderingEints;
i^ort java.awt.event.*;
Inport java.awt.aeon.*;
inport java.nt.Sisniiai;
inport java.awt.image.*;
inport java.avt.GridBagCenstraints;
Inport javaawt.CrldRagLayout;
inport jnvax.swing.JPrama;
inport jnvax.swing.JLnbol;
inport javsx.swing.JOptionPane;
i^ort javax. swing.JPanel;
inport jnvax.swing.JList;
import javax.swing.JScrollPans;
inport jnvax.swing.JToxtirea;
inport java.awt.Font;
inport javax.swing. JMenuItam;
isport javax.swing.JMennBar;
inport javax.swing.JMonu;
inport javax.swing.JFileChoosar;
67


u
1
!

1


>
1m{
Systaa.out.priatln(Plaasa pick a smaller clustering threshold. Cannot display + dust .getAlLBodesQ .sise() + N nodes"
pablic void BOQHPriMd(KciiMEmt e)<
//System. out.println(Bnonkey?");
// Checks whether or not the cursor is inside of tbs rectangle while tba user is pressing tbs sous*.
Itarator i dc_.nodaSat().iterator();
while (i. haaJfartC) ) {
Oisplayloda glyph ((Vode)i.nextO) .getDisplayO;
if(glyph.contains(a.gatX(), a.gatT())){
//JOytionPane.showHastageDialog(null, glyph.gstHodeO .gatNamaO);
text. sat Tart ("In Focus: \n glyph. gat9ana(a. gatZ() a.gatTQ));
if(glyph.isClustar() Aft glyph.isSxpaaaedO false)!
DiaplayClustar blorg (DisplayClustar)glyph;
blorg. elustarChackO;
Graph!cs2D g (Graphics2D)gatGraphics();
isfoList. > blorg.uodaaArray (inf oList_);
inf oLiwt. [0] "Expandad 8uamary";
inf oLiat.[l] "Ion Transport: 461";
isfoList. [2] "Lipid Transport: 121;
infoList.CS] "Beurotranaaitter Transport: 9X";
isfoList.[4] "Clustered:";
list..satListData(infoList_);
//Graphics2D g2d (Graphics2D)gatGraphics();
Graphics2D g2d display..craataGraphics();
g2d.setRanderingHint (BanderlagHint a .XET.AHTIALIA3IMG,bandaringBints.VALUE A1TIAL1AS (DO;
g2d.satBandaringHint(EandaringBinta.KEY.TEXT. AHTIALIA3UG BandaringHints.TALOE.TEXT.AVTIALXAS.OI );
g2d.setColar(aaw Color(0, 0( 0));
g. satBandaringBintCBandariagHints .KET.AKTI ALIAS DM, RaadaringHints -VALUE. AET1ALI AE_0E);
g. satBandaringHint (Bandar iugHinta.KET.TEXT.AVTIAL IAS I1G WandaringHinta.tAUlB TEXT ATTiLTigpn );
g.satColor(aa Color(0, 0. 0));
Dinansioo d gatSixaO;
double panalHaight d.getHeightO;
double panelVidth d.getVidthf);
// blorg.expaad(paael9idth, panelBeight, 100, 100, g2d);
long width ScraauAllocator.raquastSpaoa(blorg, dc_);
//width 200;
Syrtan.oat.println("Tba width is width);
blorg.axpend(p*nelbidth, panalBaight, width, width, g);
SprlugLayoot si nsw SpringLayout();
fcr(int k 0; k < 20; k++H
//System, out.print Inf"Moot! layout" + dc_.getXode() nans_);
sl.epringEabed(dc_ .g*tBode() .gwtClusterO, (k + 1)* 5, k 6);
//paiatComponant (g);
update(g2d);
update(g);
>
sl.po tydatafg);
update(g2d);
>
}
alaa {
//System.out.println(Dwoot woot");
>
public void mouseDragged(MouseEvent a)(
// Baudlas tba avast of a usar ralaasing the moosa button,
public void oueeEeleaaed(HouseBvent a)! }
// This art hod required by MouaaListanar.
pablic void uouaeMovedOfouseEvest e){>
// Tbase natfaods are required by MousaMotiouListauar.
public void mouseClicked(HouseEvent a)0
public void son sa Exited (Moo aaEvant a)0
public void mouse Entered(HouseBvent a)0
public void paintComponent(Graphics g) {
Int r 10;
clear(g);
Graphics2D g2d2 (Graphics2D)g;
if (display. null) (
// Compute tba grid only ona time
int w tbis.gatHidthO:
int b this.getBeigbtO;
display. (Bufferedlmage)(this.craatelmage(w,h));
Graphica2D g2d display..craataGraphicsO;
g2d. satBandaringHint (Bander ingHint s. KET_AVTI ALIAS H6 ,BandaringHints. TALBE.AVTIALIAS.OI);
g2d. satBandaringHint (Bandar iuglint . KET.TCXT.AMTXALIASIBG RandaringHints.TALOB_TEXT.AVTIALU8.OH );
g2d.setColor(new Color(0, 0, 0));
69


TTinmiinn d getSixe();
double panelBsigbt d.getHeightO;
double panelVidth d.getVidtb();
g2d.setCoaposite(AlphaCoaposite.getInstance(AlphaCooposite.SKC));
dc_.drav(panelHeight, panelUidth, 0
g2d);
ttarator J edgeiterator();
while(1. h&sHert())-{
((Edgejj .nextQ) .getDisployO .drarv(panelUidth, panelBeight, g2d);
Iterator i nodeDisplay_.iterator();
hila(i.badxt()){
DispleyBode glyph (DisplayBodeH.nextO;
glypb.drav(panelWidth, panelHeight, 0, 0, g2d);
>
*/
>
// super. paintCoaponent clears offscreen pisaap.
// line* vere using double buffaring by default.
protactad void elear(Granhics g) {
roper.paintCoaponent(g);
public void setLocatians(HashSet nodes, BashSet od^i){
nodeDisplay. nodes;
^adges_ edges;
public roid sat Text Field (JText Area text){
taxt_ tart;
>
public void setJLiet(JList scrollH
Lirt_ scroll;
>
public JNanuBar croataManuBarO <
JNanuBar aanuBar;
Jftaxm aanu;
JMenuItan aiOpan, liQoit, oiSave;
//Craata tha menu bar.
aanuBar nav JBanuBarQ;
//Build tha first asnu.
sanu nav JNenu(?ile");
asnu.gatAccassibleContextO satAceassiblaDascriptian(
"Load filas and control tha application*);
aanuBar.add(asnu);
//a group of JManuItsns
miOpan nav JManuItaa(Open");
niQuit nav JMeHnItans(HQuit");
ai8ava nav JHsnuItea( //aanaltan. satWnsaonic (Keygvsnt .TK,T); //usad constructor instead
BiOpan.gatAccassiblaCootaxtO. eeticeesaiblaDaecriptian(
This doesnt really do anything*);
ni Open.add! ctionListaner(this);
niQuit. addict ionli Stan er (this);
ai Sava.addictioaListenar(this);
aasu.add(niOpan);
menu.add(niQuit);
nanu.add(aiSava);
return aanuBar;
>
public void actioaPerfornad(ActionEvent a) {
JHanuItaa sourca (JHanuItaa)(a.gatSourca());
String s Action avant detected.*
Bvant source: source.getTaxtO;
Systaa.out.print ln(s);
if (sourca. gutTextO "Quit"){
System.exit(0);
if (source, gat Tart () "SaveMH
try <
// Sava as JPEG
Fils fils nav File("savediaage.jpg");
InageI0.vrite(display., "jpg", file);
) catch (IOBxcaptian ioe) {
>
if (sourca. getTaxtO M0panH
JFilaChoosar chooser nav JFllaChoosarO;
// Iota: sourca for ExaanleFilePiltar can be found in PileChooserDeao,
// under tha do/jfc directory in the Java 2 SDK, Standard Edition.
int raturnTal chooser.shovOpanOialog(this);
if (returnYal JPilaGboosar.APPBOTE.QPTIOlH
70


s M S'S'S'B M'SM'SM'S'S'S'S'S'SMM'S'S'S'S SS '' list lUHlIiliiiiiiiiiiiii ii Him
I ll
E EE EEEE EEEEEEEEEEEEEEE&EEEEEE EE EEjEEb
O to D 00 CO CD 00 Ha*4O0l*MKOO>
S Ss ^o i fimi


imminm
X ............
li ml
is sssi
ShEhhCShhl
jsjajajsAAAAAAAA
|SSSIS332S21S
o aaiiaactflaaflflM
IMpffi
sSxSszSssssS
g.e.g.8.g.e.e.g.e.e.g.e.
-aaoSaoaaaoaa

S888888838888
HSIIIIIIJTIII
Hg.g.g.g.g.D.g.g.g.g.g.g.
5*4 -H *4 -H 4 -H -H f* *4 -H -H -H
^oaQOoaoooooo
isM2!:i:s::s
sis is1
BB-B- frfr
H >H *H *rt *H
O a Q QQ
lit u
igt ti
Ill h is:|
sis is liii*
OK ...
****
r
pi
s S e-6-e-
h. k. 0. h. 3J 3j 3J
llll
wC
H!S!
.... s-bb-
SSS'SSfS
dlim
CM
fsssi
ft? si
c-
IS
b
t4 *4
S3
ll
fl
SB.'
*5
4
tf
o Q
2l.
,*8 *
flic*
llffl 1
Isiii i
}
i
SSS t|


//e4.aetDiaplay(nev Diaplay£dg(e4));
4.aetDiaplay(naw Display£dge(e4));
e5.aetDiaplay(nev DiaplayEdge(e6));
edgea.add(el);
edges.add(e2);
edges.add(e3);
edges.add(e4);
edges.* *dd(e6);
/
if (dc.getBodeO ! anll){
Syitaa.ont.priBtLi(dc.gtlod() .gatlutO);
In
<
Syrtea.oat.printin("This ia nail");
al.spriagBnbed(gs, 100, 0);
//h*3 sl.springEahed(gs2, 1000, 0);
SVCloster svc nan SVCluster(gs);
GraphDisplay toe nav GraphDisplay(ga, ha, edges);
CraphSet claat svc.cloatar();
oda n28 Mod*.lodePactory(dnat, "", "8");
Display-Cluster dc2 nav DisplayClnater(n28, clnat);
al.*pring£Bbed(dnst 100, 0);
dc2.aatLocatlon(0, 0);
dc2.espead();
dc2.aatDiaaaaioaa(849, 649);
1oo.setDisplay(dc2);
JPanel jp saw JPaaelO;
GridBagLayoat eanager aav GridBagLayoat0;
jp.set Lay oat (manager);
GridBagCoastralnts c nav SridBagConatraintaQ;
c.fill OridBagConatraints.BOTB;
JPnm fraa* aav JFri ("Clnster fin");
foo.aetBackgrovad(aev Color(255, 265, 256));
foo.setPreferredSixe(nev DiMnaion{650, 650));
JHamaBar aami foo.ereateMamiBarO;
fraM.setDafanltCloaeOperatiaaQPraM.EZIT.OI.CLOSB);
fraM. set JManuBar (mob) ;
fraaa.gatCoatantPaaaO .add(Jp);
JLiat SMpleJList nav JLiat(foo.iafoList.);
splaJlitt.setPrototypeCellYalae("blahblahhblah");
saaplaJList.stVne f"Hnrin in Poena");
sa^leJLlst, aatTi siblaRovCormt (20);
Post diaplayPont aav Foot ("Sen.f ", Pont.BOLD, 12);
sa^le JLiat. setFoot(diaplayFoat);
JScrollPana liatPaaa aav JSerollPaneCtaapleJLirt);
JTertirea taxt nav JTestArea("vakka \n vakkan);
c. gridbeiglrt 2;
e.grids 0;
e.gridy 0;
e.gridvidth GridBagCoostraints.RELATIVE;
jp.edd(foo, e);
c.veighty 1.0;
c.gridbaigbt 1;
c.grids 1;
e.gridy 0;
e.gridvidth QridBagCoattraints.REXAXlDBR;
jp.add(liatPaaa, c);
c.gridbaigbt *1;
c.grids 1;
e.gridy 1;
e.gridvidth GridBagCoostraints.REHAXRDER;
jp.add(tart, c);
foo.setTextField(tert);
foo.aatJLiat(aaaplaJLiat);
fraM.paekO;
fraM. aat Visible (traa);
>
}
/
Created oa Jan 3, 2005

* TQDO To change the tanplate for this generated file go to
Uindov Preferences Java Code Style Coda TasipTates
*/
package hoM.agabov.graph_viz.svingset;
73


import j avax. ml. par Bars -Facto ryConfigur at ionError;
import javax.xml.parsare.ParserConf ignrationException;
import javax.xml.parsers.SiXParsar;
import javax.xml.parsers.SiXParserFactory;
jgpriT-t org.xml.sax.Attributes;
import org.xml.sax.XMUteader;
import org.xml.sax.SilZxcaption;
import org.nl.aaz.SAXParseKxcaption;
import org.xml.sax .helpers.DefaultBandler;
import org.nl.aaz.helpers.XKLReadarFactory;
import java.io.FilaOutpntStraam;
import java.io.IOBxcsption;
import jars.io.PrintStream;
import java.util.*;
/*
* Caotbor agabov
* TODQ To change tba template for this ganaratad tjpa conwot go to
* Window Prafarancas ~ Java Coda Style Cods Templates
*/
public class GraphParssr axtands DefaoltBandler {
privata boolaan isBams. falsa;
privata boolaan isltt. falsa;
privata boolaan isSoorca. falsa;
privata boolaan isSink_ falsa;
privata boolaan ieVeight. falsa;
privata GraphSet gs_ nav 6raphSat();
privata String sonrceMame_, lisUm., desc_, weight.;
privata BashMap nodeBash. oav BashHapO;
privata FilaOutputStraam vritaFila_;
privata PrintStran coot.;
public GraphParsarOt
try<
vriteFile. new FilaOtttputStreaa( "blah.txt");
//coot. nav PrimtStran(vritaFila_);
//ceut_ .printlnC1 (andy-preprocess ");
catch (Exemption a)
{
Syrtaa.arr.printIn ("Error writing to fils");
>
public GraphSet getGraph(){
ratnrn g*.;
// Tha thraa mathods of tha ErrorHandlar intarfaca
public void arror (SilParsaExcaption a)
System.ont.println("\n***Brror*** 4 a);
public void warning (SAXParseExcaption a)
Sy*tam.out.println("\na**Warniag*** 4 a);
public void fatslError (SAZParaaExcaption a)
System.out .orintln(B\n***Fatal Error*** 4 a);
public void startDocumant ()
^ System.out .println("\n***Start Dormant***") ;
public void andDocnont ()
System.out.printla(H\n***Ead Docnant***");
// Thara ara many ways to filter out Elements T
// this is an alnantai sxampla
public void start El meat (.String namespace. String localBame,
String qualifiediame, ittributes attribs)
if(qualifiadBama "IAHE"K
isBam* trna;
>
alsa if(qualifiadBama "TEXT_DESC"H
isitt trua;
>
alsa if(qualifiadBama "SOURCE"H
isSourca. trua;
}
alsa if (qualif iadBama "SDDC"H
isSink. trua;
>
alsa if (qualif iedBmee "VEI6HT H
isWeight. trua;
//Systaa.out.printIn("boot!");
alee if (qualif iadBna "XKL"H
//cout_.println("*(")i
74


if(ialame. II iaAtt. || isSonrce. II isSink.H
//System.out.println ("\nlew 4 qualifiedkene Entry: );
}
// Only print character* from the 'name* elements
public void characters (char eh, int start, int length)
if(islame. || isAtt. II isSonrce. 11 IsSink. II isVeight.H
//System.ont.println("well 4 start 4 " length);
String chData (nev String(ch, start, length)).rim();
//System.oat.printIn(chData);
if (islmw.H
aonrceBme. chData;
>
else if(isltt_)<
deac. chData;
SyrtetB.ont.println(Mitt Is + desc_);
>
else if (isSource.H
oarcelane chData,
//coat_.print("({ | 4 sonrcsBeme. 4 "I ");
// System.out .println("I love yon Laraleen "+ sinkXan*.);
>
else if(isSink. U !isVeigbt_){
sinklame. chData;
//cout_.print(" I" 4 sinkBame. 4 "I ");
//Systen.onfe.println("I lore yon Luraleen sinkhmne.);
else if (isWaight.H
weight. chData;
// system.oat.printlnCVhoo boo 4 sinVWama. 4 and + sourceName. + at 4 weight.);
>
>
>
// Regardless of what Slsaant we*re on, were done with 'nans
public void endElensnt (String nansspaceti&I, String localBaoe,
String qualif iedNmne)
if (qualif iedBame "BAMB"){
iaBame. false*
\
else if (qnalif iedXaste *TEXT_DESC"){
iaAtt. false;
>
els* if(qnalifiedBame "S0UBCB"){
IsSonrce false;
>
else if (qnalif iedBame SHK"H
iaSink. false;
// Systea. out. println ("blah" );
if ((nodeHash..get(sonrreWne.)) ! nail AA
(nodeHash_.getIsinhR.)) ! nullH
//System, out .printla(sourcaHama. + and sink name: + sinkBame. + hash sal: +
// (Bode) (nodeHash..get(sourcaBama.)) + and bash ral; 4
// (Bode)(nodeHash..get(sinkBame.)));
//eliminate self edges
//nodifed on 4/18 to hare call to eddBiBdge(node, node, int) rather than
//add81Edge(node, node)
if (nodeHash. .get (sooreeBae.) ! nodeHash. .ge-..(sinkBame_) 11
jgs..hasEdge((Bode) (node Hash, .get (sink!e_ )), (Rode) (nodeHash.. get ( sonrcaHame.) ) ) ){
^s.. addBiEdge( (Bode) (nodeHash. .get(sonreeflmae. )), (Bode) (nodeHash,.get (sinkBame.) ) 1);
elae{
System.ont.printInC'AeaoTeing edge: 4 sonrrel_ 4 to 4 ainkHasm. 4 "or removing double edge"
>
else{
ft System.ont .print In (source Berne. 4 and sink name: 4 sinkBama. );
elsa if (qualif iedlane * VEIGH?"){
isWelght. false;
else if (qualif iedBame "B0DE"H
Bode n Hode.BodeFectory(desc_, sonrceMm
nodelash..pot(n.name., n);
else if (qualif iertHe "ML"K
// coot..printing)*);
elee if (qnalif iedBame "SXKS"H
//coot..println( ) 4 weight. 4 ")");
>
public void endFile(H
//coot..println("() () 8000)");
75


//corxt..close();
>
public static void main (StringO args)
SIXParserFactory factory SiXPerserFactory .nevInstanceO;
SAXParser parser factory.aevSAXParserO;
GraphParser handler new GraphParser();
XMLBeader raadar parser.getXMLBeader();
parser.parse("test. znl", handler);
// parser.parse{"cardinalityS.xnl, handler);
// parser.parse("cardinality2.xal", handler);
handler.ondFileO;
GraphSet gs handler-setGraph();
SWCiuster sve new SWClnrter(gs);
sac.cluster();
>
// set parser features
catch(Exception )(
e.printStackTraee(Systea.err);
>
>
/*
* Created on Jan 21, 2005
*
TOGO To change the tablate for this generated file go to
* Window ~ Preferences Java Code Style Code Tanplates
/
package hoaa.agabov.graph.vis.sviagget;
i^ort Java.utll.e;
inport java.io'.*;
* Santhor agabov

TOGO To change the tablate for this generated type consent go to
* Window Preferences Java Code Style ~ Code Templates
/
public class GraphSet {
private HashMap edgeMap.;
private HashMap otherEdgaMsp.;
private HaahSet nodeSet.;
private GraphSet parent.;
private int id_;
private static int idConnt. 0;
public GraphSet(){
edgeMap. new HashMap(1024);
nodeSet. new BashSet(1024);
otherEdgaMap. new 3amhMap(1024);
public GraphSet(HashMap edges, HashSet nodes, GraphSet parent){
dgeMep. edges;
nodaSet. nodes;
parent. parent;
id. idCount.;
idCount. + 1;
otherEdgeMap. nev HashMap(1024);
public OraphSet(HashMap edges, HashMap otherBdges, HashSet nodes, GraphSet parent){
dgeHap. edges;
otherBdgeMap. otherBdges;
nodeSet. nodes;
parent. parent;
id_ e idCount.;
idCount. + 1;
Iterator i nodeSet..iterator();
while ( i. hasHext ( ) ) {
otherEdgeMap..put(i.next(), nev BaabSotO);
>
>
public String toString()<
if (nodeSet. null 11 otherEdgeMap. nullH
return
>
elao{
return new String("The nodes are M nodeSet. + "and the edges are "
+ otherBdgeMap.);
>
public void addHoda(loda a){
nodeSet..add(n);
otherBdgeMap..put(n, nev HaehSetO);
public boolean basHoda(Hode n){
76


return nodeSet. .caxrtains(n);
>
public boolean addKdgs(lods source, Hod* sink){
String key Integer.toString(source.id.) + + Integer.toString(aink.id.);
if (! nodeSat.. contains (source) ) {
>
if OuodeSet.. contains (sink)){
( sink);
>
if (! edgeMap _. centainsKey (key) ) {
UndireetedEdge neeBdge nav tfadirectedBdge( source, sink);
edgeHap..put(key, neeBdge);
((BashSet)otherBdgeMap_.get(source)) .add(neeEdge);
//((HaBhSet)otberEdgenap_.get(sink)).add(neeBdge);
//naad this chack to prevent duplication of nodes in
//directed graphs
if(!source.isleighbor(sink)M
source.addleighbor(sink);
sink.eddleigbbor(source);
>
return true;
>
public boolean addBiEdge(Bode source, lode sink){
addBdge(source, sink);
addBdge(sink, source);
return true;
>
public boolean addBiEdge(lode source, lode sink, int i){
OndiractadEdge a nav UndireetedEdge (source, sink);
e.satDisplayTnav DisplayEdge(e));
addBiEdga(a);
return true;
}
public boolean addBiEdge (OndiractadEdge edge){
ode source ec%e. source.;
lode sink edge. sink.;
String key Integer.toString(source.id.) + + Integer.toStnng(sink.id_);
if('nodeSet..contains(source)){
>
if(inodeSet..contains(sink) ){
addlods(sink);
}
if(ledgeMap..contain*Key(key)){
dgeflep..put(key, edge);
//key Integer.toString(sink.id.) + + Integer.toString(source.id.);
dgeHap..put(key, edge);
( (BasbSet)otbarBdgeNap. .get (eource) ) add (edge);
((laahS#t)otherBdge*ap..get(sink)).add(edge);
//need this check to prevent duplication of nodes in
//directed graphs
if(!source.isleighbor(sink)H
source.addlelghbor(sink);
sink.eddleigbbor(source);
>
return true;
>
public RashSet getEdgee(lode source) throes lullPointarBxcsptiac
return source.getleighborsO;
public BasbSet getEdges2(Hode source) throes InllPourterException
return (BashSet)otherEdgeKap_.get(source);
public BaebSet getill£dgea2(H
BaakSat edges nae BashSat(1024);
Set nodes otherEdgeKap..ksySet();
Iterator i nodes.IteratorQ;
ehile(i.basisxt()H
edges.addill(((HashSet)other&dgeHap..get(i.next ())));
return edges;
public Collection getAll£dgea(){
return edgeMap..values();
public BasbSet getilHodes(H
return nodaSet.;
>
public int getID(){
return id.;
>
77


public SraphSet g* *tPant(){
return parent.;
public boolean haaBdgo(Rode source, lode eiukK
String key Integer.toString(soarcs.id.) + 4- Integer.toString(sink.id_);
return edgeMap. .containsXey(Eey);
public boolean baeOndlBdge(Mode soorce, Rode einkH
boo lea contains false;
String keyl Integer.toStriug(source.id.) + Integer.toString(sink.id.);
String key2 Integer.toString(sink.id.) + Integer.toString(source.id.) ;
// Systen. out. print In (" the key is keyl);
if (edgeMap..containsKey(keyl) 11 edgeMap..containsKey(key2)){
contains true;
>
return contains;
}
/*
public boolean haaBdge2(Rode scarce. Rode sinkH
ResbMsp edges it
return edgeMap.. containsKey(key) ;
public int collapse( BasbKap edges, RashSet nodesH
OraphSet cluster nee OraphSet(edges, nodes, this);
Iterator i nodes.iteratorO;
ehiled.basRert () ){
Rode checkRoda (Rode)i.naxtO;
if (nodeSet.. contains (checksode )H
aodeSet .reeove(checkRode);
>
>
return duster.getXDQ;
public int expand (K
id. parent, .get IDO;
parent. parent.-getPerent();
return id.;
}
public static void aain(StringD ergs) {
CrapbSet gs nee GraphSetQ;
Rode nl Rode.RodeFectory(aH, "1");
Rode n2 Rode.BodaFectory("", "2");
Rode nS Rode.RodePactoryC", "3*1);
Rode n4 Rode.RodePactory( u, -V);
Rode n5 Rode.RodeFactoryC"", "6");
gs.edd81Bdge(nl, n2, 1)
ge.addBiEdge(nl, nS. 1)
gs. addBiEdge (nl, o4, 1)
gs.adrfBiEdge(n1, n5, 1)
gs.addBiEdge(n2. n3. 1)
gs.addBiEdge(n2, n4, 1)
ge.addBiBdge(n2, nS, 1)
gs.addBiEdge(n3. 4, 1)
gs. addBiEdge (nS, n5, 1)
g*.addBiBdge(a4, n5, 1)
//gs.printiispO;
>
/
Created on Apr 22 2006
T0D0 To change the teaplate far this generated file go to
Vindoe Preferences Jars Code Style Code Templates
*/
package hcM.agaboe.graph.eis.graphalgoa;
i^ort hoae. egaboe. gxapk.Tis layout. Spri ngLayoot;
iaport hcue.agaboe.graph_eln.awing eat.*;
i^xart jaee.io.FileReader;
inpart jaee.util.Collection;
iaport java.util.BashMap;
(port jaee.util.RashSet;
iaport 3***-Btil.Stack;
iaport java.util. Iterator;
inport jaraz.nl.parsers.SAZParser;
i^mrt javax.nl.parsers.SAXParserFactory;
iaport org.nl.sax.InputSoureo;
iaport org.nl.sax.ZRLRaadsr;
iaport DateLibrary. DataConvsrter;
/*
* Ranthor agaboe
78


Q
h!
U t
*4 I
ttiiuaHaJissSla
H
l
*
a
i
&
s
&
i
!
£
I I
^ 1
I i

:-Sf
i*
H N
"1
t*
Sis
'fist
I 5
3
S,
1 p
I a
gSS-
-Sc-t
*T"
Ilf
II
T,
Ilia!
fS /*><-*-!
public donblo g*tStt(CraphSat tm, int aod){
//find root
int coast 0, dopth 0, iiu 0, daptb,total
Hod* root no* Mod*
Iterator 1 troo. cotillSodaiO. iterator ();
&
*
s>
-3
zli
ot
II;
=:(:
>3 M 4*
ilsiL
h
Ji
f
i
i
I
la
g|£ I?*
3} !*j 3 oTrS -H ^
Oi


5 8
!
s3
g *§
- 4 4*
4 H
H C^S hf1
3 6-
- .fid
t* *g Sr
IS'fl
"rui 5
IIS-5
M in H fH *4
3 ins}
£ hhhhfl
(?
4* 4* 4* 4*s
|
tf
£, EEtfeg-
I llllf
i,5iS,S,l
-y
a
sail
ftaii Sissi
Ihfi HIM
kh-..
!?y-36
53a
aa*
. 688 .
IUU*
f S
8 8
! 5
15
H
S3
l
is
&
Iffi
ill '
tsJ!LiL33l:|
Ills
8 3
tl
* :
a
I
is 3
Sss-
=13. S
=1.14 5
811 £
S'
H
a
5
4
£iS
|rt|-
:nl
6 i
v= 3

IIU
jo.sts. s3sfl
"4. .
nmirU i
iijiisIMi


lode newNode new Mode(annotation
newRode.id. current lhn_;
cuntntBia, ctxrrvntVun. 1;
);
return nevHode;
>
pablic static Soda HodeFactory(GraphSet cluster, Stria* annotation, String um){
Med* newiode nav lode(annotation, naae + currentRuB.);
aavloda. id. cumntlta.;
currantlxe_ currantRi_ 1;
nevSoda.cluster. clastar;
ratun aavSoda;
>
public HashSat getReighbors(K
raturn neighbors.;
public boolaaa i*Reighbor(Rocle checkKeyH
raturn (neighbors.. contains (checkJCey)) ;
pablic void eddReighborCHode nevRodeH
neighbors_.edd(newHode);
public String gstl(H
raturn naaa.;
>
pablic boolean isCoupo
If(cluster. * nullH
return falsa;
>
alsa{
raturn true;
sita()<
>
>
public QraphSet getClu*ter() throws RullPointarExceptioiK
if (! isCcuposita ()) i.
throw uav luHPoistarEzcaptioa(a'Tbe node + Integer .toSt ring (id.) + isn't a composite node)
>
raturn cluster.;
>
public void set Cluster(GrophSet clust){
if(!isCcuposita()){
throw new XullPointarfixcaptioa("Tbe node + Integer.toString(id.) + isn't a couposita node")
>
duster. dust;
>
public void setDisplay(DisplayRode d){
display. d;
public DisplayRode gatDisplay()(
return display.;
public void set Aimot (String axmotK
am
}
public lot getXD(){
raturn id.;
>
/
pablic lnt ha sbCode( ) {
return id.;
}
/
public String toStringCH
//return nee String (name: name. + id: + Id.);
raturn new String("atia: + nans.);
public static sold ain(StringQ args) {
>
/
Created on Apr 15 2006
e
*/
package bone.agabov.graph.vLs.svingset;
i^ort java.util.*;
/*
another agabov
e
The ScraanAllocator claas is a static class used by GraphDisplay
and all OisplayClustars. It coetaina the logic for deciding how neb
space an expanding region gats, and decides if any additional collapses
are needed
/
public claas ScraanAllocator {
private static BaahSet spaceLeft. nev HashSat(1024);
private static HashSat axpandadHodas. new HashSat (1024);
private static GraphDisplay display.;
81


protected static long reqnastSpace(DisplayClaster expandingClaster, DisplayClnster
int closterSixe expandingClaster .getBodeO .getClnsterO .getAllHodes() .sixe();
int parentSixe parantClnstar .ratSoda() .getCiusterO .getAllJiodes(). sixe();
double width parantClustar.width.;
long space Nath, round (width (double)clatterSize/(parentSixe 4 clnrterSike)) /2;
if(space < 75 H
space 76;
Systen.out.prlntln(Allocating space " 4 closterSixe " 4 parantSisa);
ratera space;
protected static void collapsePpdate(DisplayClnster de){
>
protected static roid expaadlfode(DisplayCluster dc){
>
protected static roid setScreea(QraphDisplay display)^
display. display;
public static roid nain(StringO ergs) {
/*
Created an Apr 15, 2005

* /
package hone.agabow.graph.riz.swingset;
inport jara.vtil.*;
/**
* 4autbor agabow

The ScreenAllocator class is a static class used by QraphDisplay
and all DisplayClnsters. Xt contains tha logic for deciding how nach
space as expanding region gets, and decides if any additional collapses
are needed
/
public class ScreenAllocator {
private static BashSat spaceLaft. new HashSet(1024);
prirate static HashSet expaadadBodea. naw BashSat(1024);
private static QraphDisplay display.;
protected static long requestSpace(DisplayCluster expand!ngCloster, DisplayClnster
int closterSixe expandingClnster .getVode().getClasterO .getAUBodes() .sise() ;
int parentSixe parantClustar .getBodeO .getCiusterO .getAxlBodesO. sixe();
double width parentClaster.width.;
long space Math .ro and (width (double )clusterSize/(parentSixe 4 clustsrSixe))/2;
if (space < 75H
space 75;
Systen. oat .print la ("Allocating space " 4 clusters ixe 4 4 parentSixe);
return space;
protected static roid collapseOpdate(DisplayClnster de){
>
protected static roid axpandRode(DisplayClnster dcM
>
protected static roid setScreen(QrnphDisplay displayH
display. display;
public static roid ain(StringG args) {
>
/
Created on Jen 25, 2005
TOM To change the teoplate for this generated file go to
Window Preferences Jara Code Style Code Templates
*/
package bans. agabow. greph.rin. grephelgos;
inport hone .agabow.graph.riz. swingset .Displayliode;
i^ort bona.agabow.graph.riz.swingaet.OraphSet;
inport bosM. agabow. grph_rix. swing set. Bode;
i^ert hone.agabow.graph.rix.swingset.Bdge;
inport bone.agabow.grapb.wix. avingset .QadirectedBdge;
iajrart hone. agabow .graph.r is. swing set. OraphParser;
inport hone.agabow.graph.rix.swingset.SWCluster;
inport hone.agabow.xraph.rix.layout.SpringLayout;
iaport DataLibrary.OataConverter;
parentClaster){
parentClasterH
82


import jaTa.io.PilaEeader;
isport java.util.*;
iapart javax.xnl.parsers.SAXParser;
iaport jawax. xnl. parsers. SAXParserFactory;
iaport org. t1 ux. InputSource;
i^xsrt org.xnl.sax.XHLBeader;
Author agabov

* TOGO To change the tcapiat* for this generated tjpa coanaat go to
* Window Preferences Java Cod* Styla Coda Teoplates
*/
public class SearchAlgos {
public SearchAlgos OO
public BasbKap BFS(6raphSet graphH
ratarn nav HaahMapO;
public HashMap DPS(OraphSot graphM
HashSet nodai graph.getilllodss();
Iterator i aodas.iterator();
ode aextlloda;
HashMap partitions now HasbMapOj
HashMap colors nav HaahMapO;
HashMap rasuits new HashMap();
int pertitionld 0;
whiled. haaBext OH
nextBode (Hods)i.next();
colors.pat(nextHod*, "WHITE");
i nod* s.iterator();
Ailed basEsxt () ){
nextBod* (Bode)i.aext();
If(((String)colors.get(axtBode)) "WBITB"){
HashSet ccnpositeHodes new HashSetQ;
partitions.put(nev Integar(partitiaold), ccvpositeHodes);
results DPS.Visit(colors, partitions, partitionld, nextBode, graph);
colors (HashMap)(results.gat("COLORS");;
partitions (HashMap)(results.g*t("PARTITIOHS"));
partitionld partitionld + 1;
>
return partitions;
public HashMap DPS.Tisit(HashMap colors, HashMap partitions, int partitionld, Bods u, HraphSot graphX
if(colors nsil){
Systen.arr.pristln("Colors hashnap is null.");
if (partitions * nullH
Syrten.arr.println("Partitions h*"*t*p is null.");
HashSet edges graph.r*tBdges2(u) ;
HashMap results nev HashMap ();
Iterator 1 edges.iterator O;
Bod* neighbor;
//Syrten.out.println(nTisitisg + u + with id partitionld);
colors.put(u, "CHAT");
((HashSet)(partitions.gA(new Intogor(partltionld)))).add(u);
vhile(i. haaBext OH
Edge o (Hdga)i.nertO;
if(u e.getSinkOH
neighbor e.getSource();
else{
neighbor e .getSinkO;
//Systen.out.println(neighbor.gstlsoeO) ;
if (colors.mt(neighbor) "WHITE" ){
results DPS.Tisit(colors, partitions, partitionld, neighbor, graph);
colors (HashMap)(resalts.got("C0L015"));
partitions (HashMap) (results.get("PAST ITIOHS"));
>
colors.put(u, "BLACK");
resuits.put("COLOAS", colors);
results.put("PAHTITIOHS , partitions);
return results;
>
public HashSet Artieulatiou(CraphS*t graph) {
HashSet nodes graph.getillBodes();
Iterator i nodes.iteratorO;
Bode aextBode;
HashMap partitions new HashMap();
HashMap colors now HasbMapO;
HashMap results sew HashMap();
HashSet notArt new HashSet();
83


HashSat artPoints bn HashSat();
lot partitlenld 0;
ihila(i.haiB*xt()){
nextSode (Rod*) i .next();
colon .put(nertBoda, 'WBIT11");
i nodes.iterator();
shi 1* (i. hasRext ())i
nextRode (Rode)i.next()j
if(((String)color*.gat(naxtRoda)) "WHITE")!
HashSat cenpositeRodas nav BaahSatO;
partitions.put(nee Isteger(partitianld) conpociteBodes) ;
molts Art.Visit(colors, partitions, partitioold, nsztlods, graph, notArt);
colon (HashMap) (nsults.gat ("COLORS"));
partitions (HashMap) (nsults.gat(*PAHTITI01S")) ;
^artitionld partitioald +1;
>
Iterator j nodes.itaratorO;
vhil*(j. haslextO )!
Rod* n (Rods)j.next();
if(! not Art.contains(a)H
artPoints.add(n);
Syrian.out.printIn(n + is
articulation point");
}
}
Syrtan.out.printin("Th* re an + artPoints. sis*()
//Systan.out.printin("Bot art points" + notArt);
retars artPoints;
+ articulation points");
public HashMap Art.Visit(HashMap colors, HashMap partitions, int partitionId, Rod* n, OraphSet graph, HashSat notArt)!
if (colors nullH
Systan.err.print In("Colors is nnll.");
if(partitions noil)!
Syrtan.err.printIn("Partitions hashntap is anil.*);
HashSat adgaa graph. jcetBdg*s2(a);
results nee HashMap();
Iterator i edges.iterator(/;
Rode neighbor;
//STstan.ont.println("Visiting + a + with id + pertitionld);
colon.put(n, "MAT");
((H*shS*t) (partitions .gat (nav Iatog*r(partitioold)))) .add(n);
efail*(i. hasRext ())!
Idgo s (Bdge)i.nextO;
ifiu s.getSinkQ)!
neighbor *.g*tSourc*();
Isa!
neighbor *.getSink();
if(colors.gat
rasnlts Art_Vlsit(colen, partitions, part:
colon (HashMap) (results. gat ("COLORS"));
partitions (HashMap)(results .get("PARTITiaiS")) ;
// Sy stan. out. print la ( ae ighber. gatRsns ());
i.gat(neighbor) * "VHxTK")!
* Art_Vlait(colers, partitions,
itianld, neighbor, graph, notArt);
else if(colors.gat(naighbor) "GRAY")!
,,B- --------'"Adding + neighbor);
uneigaDorj;
/ als*<
//Systan. out .pr Intin ( ".
notArt.add(neighbor);
>
Systan. out. println( colors. gat (neighbor) );
*/
>
colon.put(s, "BLACK");
rasnlts.put("00L0IS", colors);
results.put("PARTITIQIB", partitions);
ratnrn results;
>
public static void aain(Strug ergs) !
OraphSat gs n* OraphSet();
SeerchAlgos sa nav SaarchAlgosO;
rrt
SAZParsarFactory factory SAZParsarFactory.navInstancaO;
SAZParsar paraar factory.navSAZParsarO;
OraphParsar handler nav GranhPsrser();
ZMLRaadar reader parser. gatZMLRaadarQ;
parser, parse (nav IaputSourca(ne* FilaB*adar("/hona/agabov/graph_Tia/d*ta/Sonia8pos .xnl") ), handler);
// parser .pare* (nav Input Source (nee FileA*ader("/boee/agaboe/grepb_Ti*/data/r*nd6en/Sonia. xnl") ), handler);
handler.aadFila();
gs handler.get6raph();
parser null;
handler null;
factory null;
reader null;
>
catch(Exception exception)!
except ion.printStacfcTraca(Systan. err);
84


Jt
l
lO
.convert(filteredVodeo, w%h, "/haae/egabov/greph.viz/dete/So&iedpoa.filtered.zml");

35
3l
3?
i:
II
V.
*1
Ifi
s:
If
A
ill
tb

e
H
iT 3 -J3 J3
r§jjii
3SSSJ3
H........
iS
rt.
.. a
9 H to
-ft 3 9
lllllli
a a a a a a
9686888
C
S3
b$
Ss
b l
S3
%
2.
IK
f: 1
t
31 !;
Id-:
3WS
till til
u e e e
H H H H
h-S**
10
00


fgliJ
!S33?-|4
ft
S
i i
|, 1
UfM
.. 8 -8 -H
L-.-s.-.-'a
o..! 2
3
-"5 3
ft
JT6HB hhr,
ull'******^0
tllllllll
sr*a
** i h .
h 2 'S>So 8 S j
5,iiirr
a... H-se-Mve-e- ss.s
it
ss

S3
* H
9ft
to
oo
Ts
e-e-
93
ftft
99
M 5%
33 s-s-
S3 * -H OO
^ e 23 ftft
W* SS-s H KJ? b'S
88 Sli
IslSU-va I 0 ill 299
1 31.SS01 sis-..s * ** m
^ 1 *3 M k1 O -SS
||9a9'g9 K "9ft
33
33
33
ffrfr
*4 -H
9ft
II
8-ft
9ft
a
a
u
e-e-
H H
9ft
Is
4*
r*>/S
//vltbad gotX to gotCoator 4/4
Bl.x al.gotDivplayO.gotCoirtorZO
al.j al.gotDisply() .g*tC*storT()
n2_x a2.got0ispljr() .gotCoatorXO
b2_j n2.g*tDiiplaj() gat Cost rT()


s
"s
.a

g
ci
I
l
M
8
$
0 0
l 1
M >
ii
a
4 4*
m m
* -H
9ft
a
8
H
o
4*
s>
3
a
8
I
^4 J4 J4£

S-S-S-S
I I
t tttttt
H -H -H -4
9ft9 ft
8SS3
3333
:gft
lli
m m m
4 5S
.98
4 tt t
as?
3333
mi
8888
iu%
HHH H
1111
IBS
8888
H hH H
9ft9ft
iSS
59ft
*83 8
SftSft

rtH Hrl
8||-8
all*
ISIS
asaa
4* 4 4 4
S.B.S.S.
e-e-e-s*
SS.S6
31
3*3

rH iH ^
0 J> .
u:
ur
:8l
Sli
O SC883
MM 0.0,
3
>
ft
S'
o.
m
a
t
?r:
' 8nl7
aSili______
J 823*33 aa..-. .
J-,f 8 2 00 *8
, si" ** ftt* 1 8S
-aw* 888S w?
S*
g
X
lw
is
3 8
-rt
o
V
SI
M Q M
XI &
8-;
as
*1 j>l sl
H :l
Nloill'az Wall
5
t
s
6
a
8
83 ft
|S
^4
fs
38
A Q
4
SS
4* 0
ft.
B
m
s It
**
5^
sS vS IS
8 88 -58
A.* W-H --H
O 9U4* 3 4* ^ 4*
8, b
£fc .
an h a ho
r^s m f*i rS
a
o.
*r4
a
4*
ft
lis#
jj
r* &5
rsii
as a|.
Jilt
ii
95
a3
H H
l!
V
r>
00


+
t + 1 ss a Is, s- t
8 SS < ....
1 $1 a oo A. MM
5* 6-S- .* *H *< | ||
^ Q O w JS 4* * ,pH&S> H H 3 t 4*
s^3-sii l JJ
lfi|S! Soto S.-*-, a
2? -H5 al e,a"
0, jl.. illSl- A tlfj
ill
ociw*
a aw a *
ja a A h
%>
s aaasaas.!)
t
i~
u
s
fc-s'
a> I
48
oi
4* £
!3
&
8S II 8
8
- 4>
u
m w'm * y *
SI 51 SI 51
? a a,
s i *
4 '*S "S '* '
s5 a
oo
oo
r*i i"*"! r*-*^ A


*/
double yLocl, yLoc2, zLocl, xLoc2;
double distance 0.0;
double ravDietance 0.0;
yLocl dul.getCeaterYO;
yLoc2 dn2.getCanterY();
zLocl dal.getCenterXO;
xLoc2 dn2.getCenterX();
if (yLocl yLoc2 U zLocl zLoe2}{
distance 0.0;
>
else if(yLocl yLoc2){
distance findXDistanee(dnl, dn2);
>
else if(zLocl zLoc2){
distance fiadYDistance(dnl, dn2);
>
// if node 1 is above and to the left of node 2
else if(yLocl < yLoc2 U xLocl < zLoc2){
//the Euclidian distance between center points
ravDi stance Math, sort ((yLocl yLoc2) (yLocl yLoc2) +
(zLocl zLoc2) (zLocl zLoc2));
doable coraerSlope dnl.getleight()/(dnl.getWidthO);
double lineSlope (yLocl yLoc2)/(zLocl xLoc2);
double dl, d2;
if (lineSlope > cornerSlopeH
//use rectangle height to fora similar triangles
dl rawDistance (4nl.getBeigbt()/2)/(yLoc2 yLocl);
>
else{
//use rectangle width
dl rawDistance (dnl.getVidtb()/2)/(zLoc2 zLocl);
>
//now find for second node
coTnorSlopo dn2.getBeight()/(dn2.gatVidth());
if(lineSlope > cornerSlopeH
d2 rawDistance (dn2.getBeight()/2)/(yLoc2 yLocl) ;
>
elso{
//use rectangle width
d2 rawDistance (dn2.getWidth()/2)/(zLoc2 zLocl);
>
distance rawDistance dl d2;
>
// if node 1 is above and to the right of node 2
else if (yLocl < yLoc2 kk zLocl > zLoc2)<
//the Euclidian distance between center points
rawDistance Rath.sqrt((yLocl yLoc2) (yLocl yLoc2) +
(zLocl xLoc2) (zLocl zLoc2));
double coraerSlope -dnl .getBeightO/(dnl .getWidthO );
double lineSlope (yLocl yLoc2)/(zLocl zLoc2);
doable dl, d2;
if(lineSlope < cornerSlopeH
//uee rectangle height to form similar triangles
dl rawDistance (dal.getBeigbt()/2)/(yLoc2 yLocl);
}
elso(
//use rectangle width
dl rawDistance (dnl.getVidth()/2)/(xLocl zLoc2);
>
//now find for second node
coraerSlope -dn2.getBeight()/(dn2.getUidth());
if (lineSlope < cornerSlopeH
d2 rawDistance (du2.getheigbt()/2)/(yLoc2 yLocl) ;
>
elseC
//use rectangle width
d2 rawDistance *(dn2.gettfidth()/2)/(zLocl zLoc2);
>
distance rawDistance dl d2:
>
// if node 1 is below and to the left of node 2
else if (yLocl > yLoc2 kk zLocl < zLoc2){
//the Euclidian distance between center points
rawDistance Hath.sqrt((yLocl yLoc2) (yLocl yLoc2) +
(zLocl zLoc2) (zLocl zLoe2)j;
double coraerSlope -dnl.get Height()/(dnl.getWidthO);
double lineSlope (yLocl yLoc2)/ (zLocl xLoc2);
double dl, d2;
if (lineSlope < cornerSlopeH
//use rectangle height to fora similar triangles
dl rawDistance (dnl.getHeight()/2)/(yLocl yLoc2);
>
elee{
//use rectangle width
// System.out.println(Vh*t do you want?");
89