RELIABLE COMMUNICATION SCHEMES IN HYPERRINGS DISTRIBUTED SYSTEMS
by
Ashraf Ahmed Aly
B.S., University, 1986
A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science, Computer Science, 1999
This thesis for the Master of Science degree by Ashraf Ahmed Aly has been approved by Tom Altman and Mike Radenkovic.
Aly, Ashraf Ahmed (M.S., Computer Science)
Reliable Communication Schemes in HyperRings Distributed Systems
Thesis directed by Professor Tom Altman

ABSTRACT

In this paper, we study the properties, constructions, fault tolerance, and connectivity of hyperrings. We usually use a graph to represent the architecture of a network, where nodes represent processors and edges represent communication links between pairs of processors. A graph G = (V, E) is called a hyperring with N nodes (N-HR for short) if V = {0, 1, ..., N-1} and E = {{u, v} | (v - u) modulo N is a power of 2}. Although the number of edges in a hyperring is roughly twice that of a hypercube with the same number of nodes, the diameter and the connectivity of the hyperring are, respectively, smaller and larger than those of the corresponding hypercube. These properties make hyperrings desirable as interconnection networks. Since hypercubes are commonly used networks, the study of embeddings of hypercubes into hyperrings will be useful. Altman [2] showed that for any n, the n-dimensional hypercube can be embedded in the hyperring with 2^n nodes as its subgraph. This paper discusses reliability in hyperrings. One of the major goals in network design is to find the best way to increase the system's reliability. The reliability of a distributed system depends on the reliabilities of its communication links and computer elements, as well as on the distribution of its resources, such as programs and data files.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed Tom Altman
DEDICATION I dedicate this thesis to my parents, Ahmed and Kamla, my wife, Shahinaz, and my daughter, Nesma, with love.
ACKNOWLEDGMENTS My thanks to Professor Altman for his support, encouragement, and guidance with my research.
CONTENTS

Figures

Chapter
1. Introduction
1.1 Research Goal
2. HyperRings
2.1 Advantages of HRs over HCs
2.2 Construction of HRs
2.3 The Number of Edges in HRs
2.4 Embedding of HyperCubes into HyperRings
2.4.1 Embedding of a 4-HC into an 8-HR
2.4.2 Embedding of a 16-HC into a 16-HR
2.5 Embedding of Complete Binary Tree (size 2^n - 1) in N-HR
3. Connectivity in HRs
3.1 Network Flows: Basic Concepts
3.2 Path Model of Network Flow
3.3 Maximum Network Flows and Minimum Capacity Cuts
3.3.1 Mathematical Formulation of the Maximal Flow
3.4 Connectivity: Basic Concepts
3.5 Theorem (Bounds on Connectivity)
3.6 Menger's Connectivity Theorems
3.7 Vertex Connectivity of a Graph
3.8 Problems of Edge Connectivity
3.8.1 Finding Edge-Disjoint Paths in a Graph
3.8.2 Edge Connectivity of a Graph
4. Fault-Tolerant Communication in HRs
5. Reliability in Distributed Systems
5.1 Reliable Transmission (Byzantine Generals Problem)
5.2 Reliability Optimization
5.3 Message Transmission
5.4 Network with Faults
5.5 Efficiency of Message Transmission
5.6 Optimization Problem Formulation
5.7 Reliability Optimization Algorithm
5.8 Broadcasting with Random Faults
5.9 Tolerating Faulty Nodes/Links
5.10 Optimal Transmission Complexity
6. Maximum Flow for Odd-Sized HRs
7. Summary
Appendix
References
FIGURES

Figure
1. Examples of a 16-HC and a 16-HR
2.1 An 11-HR
2.2 A 16-HR
2.3 A 16-HC
2.4 Tree-like method
2.5 Circular method
2.6 Doubling method
2.7 An 11-HR
2.8 A 10-HR
2.9 A 16-HR
2.10 A 4-HC
2.11 An embedding of a 4-HC into an 8-HR
2.12 An embedding of a 16-HC into a 16-HR
2.14 An embedding of a complete binary tree with 15 nodes into a 15-HR
3.1 Conservation of flow for node 3
4. A 6-HR
5.1 Comparing complexity through examples
6.1 A 3-HR
6.2 A 5-HR
6.3 An 11-HR
1. Introduction

1.1 Research Goal

Although the HyperCube is quite powerful from a computational point of view, there are some disadvantages to its use as an architecture for parallel computation. One of the most obvious disadvantages is that the node degree of the HyperCube grows with its size. This means, for example, that processors designed for an N-node HyperCube cannot later be used in a 2N-node HyperCube. Moreover, the complexity of the communications portion of a node can become fairly large as N increases. For example, every node in a 1024-processor HyperCube has 10 neighbors, and every node in a one-million-processor HyperCube has 20 neighbors.

Figure 1. Examples of a 16-HC and a 16-HR.
In this thesis we present hyperrings (HRs for short). The number of links between processors in an HR is roughly twice that of an HC with the same number of processors, but the proposed organization possesses a number of advantages over that of an HC. Altman et al. [2] have shown that for any N we can construct an HR with N processors, whereas a HyperCube must contain exactly 2^n processors for some positive n. Figure 1 shows examples of a 16-HC and a 16-HR. We first formally introduce hyperrings, then study their properties, constructions, connectivity, and fault tolerance, and finally study embeddings of HyperCubes, grid graphs, and complete binary trees into HRs. Hyperrings and their variations have appeared in the literature under several different names. See Altman [1] for details and references.
2. HyperRings

Definition 1. A circulant graph G = (V, E) is called a hyperring with N nodes (N-HR for short) if V = {0, 1, ..., N-1} and E = {{u, v} | (v - u) modulo N is a power of 2}.

Hyperrings are a multimachine organization that is, in a sense, a generalization of the standard HyperCube architecture. An example of an 11-HR is shown in Fig. 2.1.

Figure 2.1. An 11-HR.

2.1 Advantages of HRs over HCs

Clearly, hyperrings are connected regular graphs, as shown in Fig. 2.2. Although the number of edges in a hyperring is roughly twice that of a HyperCube with the same number of nodes, the diameter and the connectivity of the hyperring are, respectively, smaller and larger than those of the corresponding HyperCube. These properties make hyperrings desirable as interconnection networks. A number of topologies have been proposed for interconnecting processors in large-scale distributed systems. The HyperCube (HC for short), shown in Fig. 2.3, is one of the most popular and efficient networks due to its topological richness.
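To make Definition 1 concrete, the membership test can be coded directly. The following is a minimal Python sketch (the function names are mine, not from the thesis); since an edge {u, v} is unordered, either (v - u) or (u - v) may reduce to a power of 2 modulo N.

```python
def is_power_of_two(x):
    # True iff x is in {1, 2, 4, 8, ...}
    return x > 0 and (x & (x - 1)) == 0

def hyperring_edges(n):
    """Edge set of the N-HR from Definition 1: {u, v} is an edge
    iff (v - u) mod n or (u - v) mod n is a power of 2."""
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if is_power_of_two((v - u) % n) or is_power_of_two((u - v) % n):
                edges.add((u, v))
    return edges
```

The resulting edge counts (44 for the 11-HR, 30 for the 10-HR, 56 for the 16-HR) agree with the worked examples in Section 2.3.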
Figure 2.2. A 16-HR.

Figure 2.3. A 16-HC.

HyperRings and their variations have appeared in the literature under several different names, including an optimal broadcasting scheme. Altman [1] showed that the number of links between processors in an HR is roughly twice that of an HC with the same number of processors, but the proposed organization possesses a number of advantages over that of an HC. In particular, for any N we can construct an HR with N processors, whereas a HyperCube must contain exactly 2^n processors for some positive n. Some of the advantages of HRs over HCs are:
The number of processors in an HR does not have to be a power of 2.
The number of edges in an HR is roughly twice that of a same-sized HC.
HCs and complete binary trees can be directly embedded in HRs.

2.2 Construction of HRs

Definition 2. Let N be a positive integer. The binarity of N, written bin(N), is a function whose range is between 0 and ⌊log N⌋ that returns the largest integer k such that N is divisible by 2^k. Observe that any positive integer N may be represented as the following product:

N = 2^bin(N) · M,   (1)
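Definition 2 and Equation (1) can be sketched in a few lines of Python (the helper names `binarity` and `odd_part` are mine):

```python
def binarity(n):
    """bin(N) from Definition 2: the largest k such that 2^k divides N.
    (n & -n) isolates the lowest set bit of n."""
    assert n > 0
    return (n & -n).bit_length() - 1

def odd_part(n):
    """The odd factor M in N = 2^bin(N) * M, Equation (1)."""
    return n >> binarity(n)
```

For example, N = 12 gives bin(12) = 2 and M = 3, so 12 = 2^2 * 3 as Equation (1) requires.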
where M is an odd number. First, let us observe a simple (nonrecursive) construction of N-HRs. The following procedure is straightforward and needs no further explanation [1]:

for k := 0 to N-1 do
    for i := 0 to ⌊log N⌋ do
        connect processor k to the one labeled [k + 2^i]_N
            if there is no link between k and [k + 2^i]_N already

Although the above is an effective and correct procedure, it gives us no insight into the actual number of edges placed in the HR. In addition, it provides no obvious information about the recursive structure of HRs constructed in this fashion. If E(N) denotes the number of edges in an N-HR, then

E(2N) = 2E(N) + 2N.   (2)

The problem, of course, is the construction and the determination of the number of edges in the original N-HR. In fact, the problem may be reduced to examinations of M-HRs, where M is the odd positive integer in Equation (1). The nonrecursive construction of N-HRs may be used for the construction of such M-HRs, with a slight modification. Observe that for odd M of the form 2^k + 1, we can construct the M-HR by using the nonrecursive construction procedure that ignores the last iteration of the inner for loop. Note that in this case the last iteration of the inner for loop produces the same connections as those produced in the first iteration. That is, the inner for loop should be iterated one less time to produce the M-HR. For the construction of an arbitrary N-HR one simply needs to compute k = bin(N), determine the odd M in Equation (1), construct the appropriate M-HR, and recursively apply the doubling construction procedure exactly k times. There are three general methods to construct HRs.

1. Tree-like method

In the tree-like method, each node is connected to a neighbor if and only if it is a power-of-2 distance away from that node, in both the clockwise and counterclockwise
direction, as shown in Fig. 2.4. Obviously, no duplicate edges are introduced.

Figure 2.4. Tree-like method.

2. Circular method

The circular method consists of approximately log N phases. In phase i, the two nodes that are a distance of 2^i away from a given node are connected to it, as shown in Fig. 2.5, where the connections of 2^1 are displayed.

Figure 2.5. Circular method.
3. Doubling method

To construct a hyperring by using the doubling method we use the following steps:
(1) make a duplicate copy of N-HR_0 and call it N-HR_1;
(2) relabel all nodes in N-HR_0 from i to 2i (0 ≤ i ≤ N-1);
(3) relabel all nodes in N-HR_1 from i to 2i+1 (0 ≤ i ≤ N-1);
(4) for each i (0 ≤ i ≤ 2N-1) connect node i and node j if j = i+1 modulo 2N.

The above procedure is called the doubling construction procedure, as shown in Fig. 2.6.

Figure 2.6. Doubling method: (a) a 3-HR; (b) a copy of the 3-HR; (c) a 6-HR constructed from the two connected 3-HRs.

2.3 The Number of Edges in HRs

Altman et al. [2] showed that the number of edges in HRs does not grow monotonically with the number of nodes.
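Steps (1)-(4) of the doubling construction can be sketched as follows (a Python illustration under my own naming; it takes the edge set of an N-HR as unordered pairs and returns the edge set of the 2N-HR):

```python
def double_hyperring(n, edges):
    """Doubling construction: steps (1)-(3) relabel copy N-HR_0 by
    i -> 2i and copy N-HR_1 by i -> 2i+1; step (4) adds the ring
    edges {i, i+1 mod 2N}. Edges are stored as sorted pairs."""
    doubled = set()
    for (u, v) in edges:
        doubled.add((min(2 * u, 2 * v), max(2 * u, 2 * v)))          # copy 0
        doubled.add((min(2 * u + 1, 2 * v + 1), max(2 * u + 1, 2 * v + 1)))  # copy 1
    for i in range(2 * n):                                           # step (4)
        j = (i + 1) % (2 * n)
        doubled.add((min(i, j), max(i, j)))
    return doubled
```

Doubling the 3-HR (a triangle, 3 edges) yields 12 edges, in agreement with the recurrence E(2N) = 2E(N) + 2N from Equation (2): 2·3 + 2·3 = 12.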
Theorem 1. The number of edges in an N-HR, N ≥ 2, is given by

E(N) = N(log N - 1/2)   if HW(N) = 1,
       N⌊log N⌋         if HW(N) = 2,
       N⌈log N⌉         if HW(N) = 3 or more,

where HW(N) denotes the Hamming weight of N (the number of 1's in its binary representation).

Proof: If the Hamming weight of N is 1, then N = 2^n, n ≥ 1. Observe that during the execution of the nonrecursive construction procedure of the N-HR, no multiple edges could possibly be constructed until the very last iteration of the inner for loop. During that iteration, exactly half of the edges would be eliminated. It follows that the number of (non-replicated) edges when N is a power of 2 is N log N - N/2 = N(log N - 1/2).

A Hamming weight of 2 implies that N can be decomposed into a sum of 2^p and 2^q. Moreover, this decomposition must be unique. Because N = 2^p + 2^q, for each pair of nodes that were 2^p and 2^q positions apart, exactly one extra edge is eliminated due to duplication in the nonrecursive construction procedure of N-HRs. Hence, the number of edges, N⌈log N⌉ - N, in this case is equal to N⌊log N⌋.

In the case where the Hamming weight is 3 or more, the nonrecursive construction procedure of HRs does not attempt to create any duplicated edge even if the condition is removed. Hence, the number of edges in this case is N⌈log N⌉.

Example 1

Figure 2.7. An 11-HR.
Assume we have an 11-HR, as shown in Fig. 2.7. The Hamming weight of N = 11 is HW(11) = 3, the number of 1's in its base-2 representation, since (11)_10 = (1011)_2. By using Theorem 1, we can find the number of edges in the 11-HR: because HW(N) ≥ 3, the number of edges is N⌈log N⌉ = 11 · 4 = 44.

Example 2

Figure 2.8. A 10-HR.

Examine the 10-HR shown in Fig. 2.8. The number of nodes is N = 10, and (10)_10 = (1010)_2, so the Hamming weight is HW(10) = 2. The number of edges in a 10-HR is therefore N⌊log N⌋ = 10 · 3 = 30.
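The case split of Theorem 1 can be cross-checked against a brute-force count taken straight from Definition 1. The following sketch uses my own helper names; `bit_length` gives ⌊log2 N⌋ and ⌈log2 N⌉ without floating-point logs:

```python
def edges_by_theorem(n):
    """E(N) from Theorem 1, split by the Hamming weight of N (n >= 2)."""
    hw = bin(n).count("1")
    floor_log = n.bit_length() - 1        # floor(log2 N)
    ceil_log = (n - 1).bit_length()       # ceil(log2 N)
    if hw == 1:
        return n * floor_log - n // 2     # N(log N - 1/2)
    if hw == 2:
        return n * floor_log              # N*floor(log N)
    return n * ceil_log                   # N*ceil(log N)

def edges_brute_force(n):
    """Count edges {u, v} with (v - u) mod n a power of 2 (Definition 1)."""
    def pow2(x):
        return x > 0 and (x & (x - 1)) == 0
    return sum(1 for u in range(n) for v in range(u + 1, n)
               if pow2((v - u) % n) or pow2((u - v) % n))
```

Both functions give 44 for the 11-HR (Example 1), 30 for the 10-HR (Example 2), and 56 for the 16-HR (Example 3).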
Example 3

Figure 2.9. A 16-HR.

Examine the 16-HR shown in Fig. 2.9. By using Theorem 1, we have: the number of nodes is N = 16, and (16)_10 = (10000)_2, so HW(16) = 1. The number of edges is N(log N - 1/2), with log_2 N = log_2 16 = 4; hence the number of edges is 16(4 - 1/2) = 56.

2.4 Embedding of HyperCubes into HyperRings

Since HyperCubes are commonly used networks, the study of embeddings of HyperCubes in hyperrings will be useful. Altman [2] showed that for any n, the n-dimensional HyperCube can be directly embedded in the hyperring with 2^n nodes.
The n-dimensional HyperCube (n-HC for short) is a graph with N = 2^n nodes labeled by the 2^n binary numbers from 0 to 2^n - 1, where there exists an edge between two nodes if and only if the binary representations of their labels differ in exactly one bit. The HC can be used to simulate many networks since it contains, or nearly contains, those networks as subgraphs. This is one of the main reasons why the HC is a powerful network for parallel computation. From the structures of N-HCs and N-HRs, it is clear that for N = 2^n, the N-HR contains the N-HC as a subgraph, making the HR an even more powerful network for parallel computation.

1-dimensional grid (size 1 x k)

A 1-dimensional grid of size 1 x k can be embedded as a subgraph into a k-HR in an obvious way. That is, there are no wasted nodes in the HR. Note that for HCs, if k is not a power of 2, e.g., k = 2^m + 1, then the HC will need 2^(m+1) = 2k - 2 nodes.

2-dimensional grid (size a x b)

To embed a grid of size a x b as a subgraph into an HR, we need min(a × 2^⌈log b⌉, b × 2^⌈log a⌉) nodes.

3-dimensional grid (size a x b x c)

The number of nodes required in the HR to embed an a x b x c grid is min(a × 2^⌈log b⌉ × 2^⌈log c⌉, b × 2^⌈log a⌉ × 2^⌈log c⌉, c × 2^⌈log a⌉ × 2^⌈log b⌉).

In general, when embedding k-dimensional grids into HRs we have one extra degree of freedom, i.e., the size of one of the dimensions can be the same as in the original grid, which is not necessarily a power of 2. It follows that more nodes are wasted when the grid embedding is done in HCs than in HRs (unless each of the grid dimensions is a power of 2).
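The node counts above follow one rule: a single chosen dimension keeps its exact size, while every other dimension is rounded up to a power of 2, and we take the best choice. A small Python sketch of that rule (function names are mine, and I assume the min is taken over all choices of the exact dimension, matching the 2- and 3-dimensional formulas):

```python
def ceil_pow2(d):
    """Smallest power of 2 that is >= d, i.e. 2^ceil(log2 d)."""
    return 1 << (d - 1).bit_length() if d > 1 else 1

def hr_nodes_for_grid(dims):
    """Nodes needed to embed a grid (list of dimension sizes) into an HR:
    one dimension may keep its exact size, the rest are rounded up to
    powers of 2; the best choice of exact dimension is returned."""
    best = None
    for i, keep in enumerate(dims):
        total = keep
        for j, d in enumerate(dims):
            if j != i:
                total *= ceil_pow2(d)
        best = total if best is None else min(best, total)
    return best
```

For a 1 x 7 grid this gives 7 (no wasted nodes, as stated for the 1-dimensional case), while a 3 x 5 grid needs min(3 · 8, 5 · 4) = 20 nodes.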
Below are several examples of direct embeddings of HCs into HRs.

2.4.1 Embedding of a 4-HC into an 8-HR

Figure 2.10. A 4-HC.

Figure 2.11. An embedding of a 4-HC into an 8-HR.

Fig. 2.10 represents a 4-HC, and Fig. 2.11 represents an embedding of a 4-HC into an 8-HR.

2.4.2 Embedding of a 16-HC into a 16-HR

Figure 2.12. An embedding of a 16-HC into a 16-HR.
2.5 Embedding of Complete Binary Tree (size 2^n - 1) in N-HR

It is known that a complete binary tree with 2^n - 1 nodes cannot be embedded as a subgraph into a 2^n-node HC. Altman et al. [1] showed that it can be embedded as a subgraph into a (2^n - 1)-HR. The address of each node of the complete binary tree is given by the following procedure:
(1) The address of the root is 0.
(2) If the address of a node at depth k is i, then the addresses of its left child and its right child are i + 2^k and i + 2^(k+1), respectively.

Theorem 2. Suppose that the address of each node of the complete binary tree with 2^n - 1 nodes is given by the above procedure. Then the addresses of all nodes in the complete binary tree are distinct. Furthermore, the address of any node at depth k (0 ≤ k ≤ n-1) is between 2^k - 1 and 2^(k+1) - 2.

Proof. From the procedure above, the address of any node is different from the address of any ancestor of the node. It is also immediate that the address of any node in its left subtree is different from the address of any node in its right subtree. Hence, all addresses in the complete binary tree are distinct. The latter assertion of the theorem is proved by induction on k. For k = 0, the assertion obviously holds. Suppose that up to depth k the assertion holds. From the induction hypothesis, the range of the addresses at depth k + 1 is between (2^k - 1) + 2^k = 2^(k+1) - 1 and (2^(k+1) - 2) + 2^(k+1) = 2^(k+2) - 2, which is exactly the required range for depth k + 1.

The address of a node in a complete binary tree differs from the address of its parent by a power of 2. From this fact and Theorem 2, a complete binary tree with 2^n - 1 nodes can be embedded as a subgraph into a (2^n - 1)-HR. The embedding of a complete binary tree into the HR is shown in Fig. 2.14.
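The addressing procedure above can be run mechanically to check Theorem 2 on a small instance (a Python sketch with my own naming):

```python
def tree_addresses(levels):
    """Addresses for a complete binary tree with 2^levels - 1 nodes:
    the root gets 0; a node at depth k with address i gets children
    i + 2^k (left) and i + 2^(k+1) (right). Returns a list of levels."""
    by_depth = [[0]]
    for k in range(levels - 1):
        nxt = []
        for i in by_depth[k]:
            nxt += [i + 2 ** k, i + 2 ** (k + 1)]
        by_depth.append(nxt)
    return by_depth
```

For the 15-node tree of Fig. 2.14 the addresses come out as exactly {0, ..., 14}, each depth-k address lies in [2^k - 1, 2^(k+1) - 2], and every parent-child pair differs by a power of 2, so each tree edge is an edge of the 15-HR.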
Figure 2.14. An embedding of a complete binary tree with 15 nodes into a 15-HR.
3. Connectivity in HRs

Introduction: In the case of computer networks, the edge or node connectivity corresponds to the minimum number of communication links or computer centers which, if removed, would disconnect the network. This criterion is most meaningful if all centers in the network are of equal importance. Otherwise, it is desirable for certain pairs of nodes in the graph to have a larger edge or node connectivity than others. The minimum number of nodes in any cut between a pair of nodes is equal to the maximum number of node-disjoint paths between those nodes. Similarly, the edge connectivity between a pair of nodes is equal to the maximum number of edge-disjoint paths between that pair of nodes. Menger's fundamental theorem has simplified the determination of the node and edge connectivity between any pair of nodes in a graph [24].

3.1 Network Flows: Basic Concepts

A network is a graph in which a flow can take place in the edges of the graph. Every network has an origin and a terminus, called the source and sink, respectively. A network can be thought of as a pipeline system in which the edges represent pipelines. The source is the inlet of water, the sink is the outlet of water, and all other nodes are connections between the lengths of pipe [22].

Figure 3.1. Conservation of flow for node 3.
A flow in a network may be a flow of fluids, electricity, or another commodity. The corresponding units of the flow may be cubic feet per second, amperes, or dollars per month, etc. In general, a flow is a rate. One of the major assumptions about networks is the conservation of flow in and out of a node: the flow into a node is equal to the flow out of it. If the number above each edge in Fig. 3.1 represents the flow in that edge, then the conservation of flow for node 3 is observed.

The upper limit on the flow in an edge is the capacity of the edge. The flow capacity of an edge is a positive number and may be infinite, in which case the edge is said to be uncapacitated. If an edge has a flow capacity of zero, removing the edge from the network does not alter the potential flow in the network. Given a chain connecting the source and sink of a network, the maximum flow that can be put across the chain is restricted by the edge of the chain with the smallest capacity. If (x, y) is an edge of G, then the capacity of (x, y) is given by c(x, y). The flow in an edge (x, y), designated by f(x, y), is related to its capacity by f(x, y) ≤ c(x, y).

A network is a graph G(V, E) with nonnegative weights assigned to each edge. Alternatively, a network is an ordered triple (V, E, W), where V is a set of vertices, E a set of directed edges on V, and W a function mapping the elements of E to the nonnegative reals. If we interpret the edge weights as limits on the capacities of the edges to transmit a commodity of some type (such as vehicular traffic, data, or fluids), the network is called a capacitated network. The determination of the maximum amount of traffic that can be transmitted between a given pair of vertices in a capacitated network is called the maximum network flow problem. The problem of network flows is a basic problem in graph theory and combinatorial optimization. A basic variation of the network flow problem is defined as follows.
Let G(V, E) be a directed graph with two distinguished vertices, S (the source) with indegree 0 and T (the sink) with outdegree 0. Each edge x in E has an associated positive weight c(x), called the capacity of x. The capacity measures the amount of flow that can pass through an edge. We call such a graph a network. For convenience, we assign a capacity of 0 to nonexistent edges. A flow is a function f on the edges of the network that satisfies the following two conditions:
(1) 0 ≤ f(x) ≤ c(x): the flow through an edge cannot exceed the capacity of that edge.
(2) For all v ∈ V - {S, T}, Σ_u f(u, v) = Σ_w f(v, w): the total flow entering a vertex is equal to the total flow exiting that vertex (except for the source and sink).

These two conditions imply that the total flow leaving S is equal to the total flow entering T. The problem is to maximize this flow. One way to visualize this problem is to think of the network as a network of water pipes. The goal is to push as much water through the pipes as possible. If too much water is pushed to the wrong area, the pipes will burst. We need to introduce some terminology, beginning with a precise definition of what we mean by a traffic flow. There are two ways to define this: by a path model of flow, or by using a conservation model. While the two methods are mathematically equivalent, only the path model of flow will be discussed here.

3.2 Path Model of Network Flow

Let N(V, E) be a capacitated network. Let S and T be a pair of distinct vertices in V(N), and let P be a set of paths from S to T. If x is an edge in E(N), we denote the (nonnegative integral) capacity of x by C(x), and we denote the set of paths in P that include the edge x by P(x). Let f be a nonnegative integral function defined on P. We denote the sum of the values of f(P) taken over all paths P in P(x) by sum(f, x). We call f a flow on P if for every edge x in E(N), sum(f, x) is at most C(x). If f is a flow on P and P is a path in P, we call f(P) the flow on P. If x is an edge in E(N), we call sum(f, x) the flow on x. The set of paths P together with a flow f on P is called a path flow on N and is specified by the quintuple (N, S, T, P, f). The value of a path flow (N, S, T, P, f) is the sum of the values of f(P) taken over all paths P in P and is denoted by value(P, f), the other parameters being left implicit.
The vertices S and T are called the source and sink of the path flow, respectively; see [21] for details.
3.3 Maximum Network Flows and Minimum Capacity Cuts

The fundamental bound on the maximum value of a path flow or a conservative flow on a network N with source S and sink T can be informally stated as follows: the maximum flow value is bounded above by the capacity of the tightest "bottleneck" between S and T. We now formalize this intuitive notion. Let (N, S, T, P, f) be a conservative flow. If X is a set of vertices in V(N), then X^c denotes the complement of X with respect to V(N), that is, V(N) - X. We define a cut (X, X^c) as the set of edges {(x, y) | x ∈ X and y ∈ X^c}. If S is in X and T is in X^c, then (X, X^c) is called an S-T cut. If (X, X^c) is an S-T cut, there is no path from S to T in N(V, E - (X, X^c)). We define the capacity of a cut, denoted by cap(X, X^c), as the sum of the capacities of the edges in (X, X^c). We denote the capacity of a minimum capacity S-T cut in N by mincap(N, S, T). The flow on (X, X^c), denoted flow(X, X^c), is the sum of the edge flows in (X, X^c). The net flow across (X, X^c), denoted by netflow(X, X^c), is flow(X, X^c) - flow(X^c, X). See [21] for details.

3.3.1 Mathematical Formulation of the Maximal Flow Problem: The Max-flow Min-cut Theorem

The maximal flow problem is to find a feasible equilibrium or steady-state flow pattern through a network such that the total flow from the source to the sink is maximized. Should there exist more than a single source (sink) node, then a supersource (supersink) node is connected to each of the sources (sinks). The edges making the connection have infinite flow capacity [22]. The flow out of a node x is Σ_y f(x, y), and the flow into it is Σ_y f(y, x). From the conservation-of-flow assumption,

Σ_y f(x, y) - Σ_y f(y, x) = 0.
In addition, the flow out of the source node s equals the flow into the sink node t; then

Σ_y f(s, y) = v = Σ_y f(y, t).

The maximal flow problem can be stated as: maximize v subject to

Σ_y f(x, y) - Σ_y f(y, x) = v if x = s,  0 if x ≠ s, t,  -v if x = t,
c(x, y) ≥ f(x, y) ≥ 0.

An observation can be made about the upper bound on the flow through a network. The sum of the capacities on the edges leaving the source provides an upper bound on the flow out of the source node. Similarly, the sum of the capacities of the edges entering the sink provides an upper bound on the flow into the sink. Since the flow out of the source equals the flow into the sink, the smaller of the two sums is an upper limit on the flow in the network.

A cut is defined as any set of edges containing at least one edge from every chain from source to sink. The capacity or value of the cut is the sum of the flow capacities of the edges in the direction specified by the cut. Thus, recalling the earlier observation, a cut across the edges leaving the source has a value that is an upper bound on the flow in the network. A cut separating the source and sink that has the smallest capacity is called a minimum cut. The max-flow min-cut theorem states that for any network, the value of the maximal flow from source to sink is equal to the capacity of a minimum cut separating sink and source. The capacity of any cut in a network specifies an upper bound on the flow in a network.
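The thesis states the max-flow min-cut theorem without giving an algorithm here. As a small illustration only (the example network and the function name are my own), a minimum S-T cut can be found on a tiny network by enumerating every vertex set X with s ∈ X and t ∈ X^c; by the theorem, the smallest cut capacity equals the maximum flow value.

```python
from itertools import combinations

def min_cut_capacity(vertices, cap, s, t):
    """Capacity of a minimum s-t cut, by enumerating all vertex sets X
    with s in X and t in X^c (exponential, for tiny networks only).
    cap maps a directed edge (u, v) to its capacity c(u, v)."""
    others = [v for v in vertices if v not in (s, t)]
    best = float("inf")
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            x = {s} | set(subset)
            cut = sum(c for (u, v), c in cap.items() if u in x and v not in x)
            best = min(best, cut)
    return best
```

On the hypothetical network with capacities c(s,a)=3, c(s,b)=2, c(a,t)=2, c(b,t)=3, c(a,b)=1, the minimum cut has capacity 5, so the maximal flow value is 5 as well.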
3.4 Connectivity: Basic Concepts

Previously, we defined the vertex connectivity, VC(G), and edge connectivity, EC(G), of a graph G. These invariants measure the susceptibility of a graph to disconnection under vertex or edge deletion. From one viewpoint, they calibrate the vulnerability of the graph, considered as a communication network, to disruption. It is remarkable that precisely these measures of strength of connection also determine the diversity of routing in the graph, that is, the number of disjoint paths between pairs of vertices [21]. We now describe the connectivity of a graph G(V, E) in terms of parameterized versions of these invariants, which will be useful later in developing algorithms for VC and EC.

If u and v are nonadjacent vertices in G, we define a vertex-disconnecting set (or vertex separator) between u and v in G as a set of vertices S in V(G) such that u and v are in different components of G - S. S is said to disconnect (or separate) the pair of vertices u and v. An edge-disconnecting set between a pair of vertices u and v in G is similarly defined as a set of edges T in E(G) such that u and v are in different components of G - T. For nonadjacent vertices u and v, we define the vertex connectivity between u and v, denoted VC(G, u, v), as the cardinality of the smallest vertex-disconnecting set between u and v. On the other hand, EC(G, u, v) denotes the cardinality of the smallest edge-disconnecting set between the pair of vertices u and v. We can then define VC(G) as the minimum of VC(G, u, v) taken over all distinct, nonadjacent vertices u and v in G (or |V| - 1 if G is a complete graph), while EC(G) is the minimum of EC(G, u, v) taken over all distinct vertices u and v in G.

3.5 Theorem (Bounds on Connectivity)

For every graph G(V, E), VC(G) ≤ EC(G) ≤ deg_min(G), where deg_min(G) is the minimum vertex degree of G. There are a variety of conditions on the degree sequence of a graph that guarantee lower bounds on connectivity, such as the following.
Theorem (Connectivity Lower Bound). Let G(V, E) be a graph of order greater than one, and let n satisfy 1 ≤ n ≤ |V| - 1. If deg_min(G) ≥ (|V(G)| + n - 2)/2, then VC(G) ≥ n.

By the previous theorem, this also establishes a lower bound on the edge connectivity. The precise relation between the connectivity of a graph and the number of disjoint paths that exist between pairs of vertices in the graph is the subject of the following classical theorem. Let us say a pair of paths are vertex-disjoint if they have at most their endpoints in common, while a pair of paths are edge-disjoint if they have no edges in common.

Theorem (Path Diversity Characterization of Connectivity). Let G(V, E) be a graph and let k be an integer. Then:
(1) If the removal of k (but not fewer than k) vertices (edges) disconnects a pair of vertices u and v in G, there are at least k vertex-disjoint (edge-disjoint) paths in G from u to v;
(2) If G is k-vertex-connected (k-edge-connected), there are at least k vertex-disjoint (edge-disjoint) paths between every pair of vertices in G [21].

3.6 Menger's Connectivity Theorems

Let λ_e(X, Y) denote the maximum number of edge-disjoint paths between nodes X and Y. Let λ_v(X, Y) denote the maximum number of vertex-disjoint paths from X to Y. Let κ_e(X, Y) denote the minimum size of the edge-cuts which partition the graph so that X is in one component and Y is in the other. Also, we define κ_v(X, Y) to be the minimum size of the vertex-cuts which separate the graph G into two components, one containing X and the other containing Y; no such vertex-cut exists if (X, Y) is an edge of the graph [20].

Let G = (V, E) be an undirected graph with X, Y ∈ V. Then κ_e(X, Y) ≥ λ_e(X, Y), because every path from X to Y uses at least one edge of a set which disconnects
G, and no two paths use the same edge; thus κe(X, Y) = λe(X, Y) if (X, Y) is not in E. The following is the statement of Menger's edge-connectivity theorem: a graph is k-edge-connected if and only if any two distinct vertices are connected by at least k edge-disjoint paths.
3.7 Vertex Connectivity of a Graph
The usual formula for the vertex connectivity of a graph G = (V, E) in terms of its pairwise vertex connectivities is VC(G) = min {VC(G, u, v)}, where (u, v) is not in E(G). The following procedure improves on this slightly. We invoke the function VC(G, u, v) to calculate the pairwise vertex connectivities [21].

Function VC(G)  (* Returns in VC the vertex connectivity of G *)
  Var G: Graph
      i, j: 1 .. |V(G)|
      VC: integer
  set VC to |V(G)| − 1
  if G is complete then return
  for i = 1 to |V(G)| do
    for j = i + 1 to |V(G)| do
      if i and j not adjacent then set VC to min { VC, VC(G, i, j) }
    if i > VC then return
End Function VC
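The procedure above can be made runnable; in this sketch the pairwise connectivity VC(G, u, v) is computed by brute force purely for illustration (in practice a flow-based routine would be substituted), and the early exit once i exceeds the current minimum is preserved:

```python
from itertools import combinations

def reachable(adj, removed, u):
    """Vertices reachable from u after deleting the set `removed`."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in removed and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def vc_pair(adj, u, v):
    """Smallest vertex cut between nonadjacent u and v (brute force)."""
    others = [w for w in adj if w not in (u, v)]
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            if v not in reachable(adj, set(S), u):
                return k

def vertex_connectivity(adj):
    nodes = sorted(adj)
    n = len(nodes)
    vc = n - 1                      # correct value for a complete graph
    for i in range(n):
        if i > vc:                  # early exit from the thesis procedure
            break
        for j in range(i + 1, n):
            u, v = nodes[i], nodes[j]
            if v not in adj[u]:     # only nonadjacent pairs matter
                vc = min(vc, vc_pair(adj, u, v))
    return vc

# A 4-cycle has vertex connectivity 2; K2 is complete, giving |V| - 1 = 1.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(vertex_connectivity(cycle4))              # 2
print(vertex_connectivity({0: [1], 1: [0]}))    # 1
```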
3.8 Problems of Edge Connectivity
We can calculate EC(G) by finding a maximum flow on a network G′ obtained from G by replacing each undirected edge {u, v} in G by a pair of directed edges (u, v) and (v, u) in G′ and assigning a unit capacity to each edge of G′. The pairwise edge-connectivity, pairwise edge-disjoint path, and pairwise edge-disconnecting set problems can then be solved using flow methods. The flow-bearing paths of a flow on G′ correspond directly to a set of edge-disjoint paths in G. The edges of a minimum cut between a pair of vertices in G′ determine a minimum-cardinality edge-disconnecting set in G between those vertices.
3.8.1 Finding Edge-Disjoint Paths in a Graph
Let G = (V, E) be a connected undirected graph. Two paths in G are said to be edge-disjoint if they do not contain the same edge. Let O be the set of vertices in V with odd degrees. We first claim that the number of vertices in O is even. To prove this claim, we notice that, by summing up the degrees of all vertices, we get exactly twice the number of edges (since each edge is counted twice). But, since all vertices of even degree contribute an even number to this sum, there must be an even number of vertices of odd degree. We now prove the following theorem [19].
Theorem: Let G = (V, E) be a connected undirected graph, and let O be the set of vertices with odd degrees. We can divide the vertices in O into pairs and find edge-disjoint paths connecting the vertices in each pair.
Proof: The proof is by induction on the number of edges.
Induction hypothesis: The theorem is true for all connected undirected graphs with < m edges.
Consider a connected undirected graph G with m edges, and let O be the set of odd vertices. If O is empty, then the theorem is true. Otherwise, take any two vertices in O. Since G is connected, there is a path connecting them. Remove the whole path from G. The remaining graph has fewer edges.
We would like to use the induction hypothesis to find the paths for the rest of the odd vertices and to complete the proof. The problem, however, is that, by removing the path, we may have disconnected the graph. The induction hypothesis applies only to connected graphs. We have to be very careful about using the induction hypothesis
correctly. We can avoid this difficulty in this case; we will change the hypothesis and adapt it to our needs. The problem we encountered was with the connectivity requirement, so let us remove it. We now have the following induction hypothesis:
Revised induction hypothesis: The theorem is true for all undirected graphs having < m edges.
This is obviously a stronger theorem. Its proof, on the other hand, is simpler. Consider again an undirected graph with m edges, and O as before. The graph may not be connected; in this case, the graph is partitioned into several connected components. We will take two odd vertices from the same component. Since each component is a connected graph by itself, it must have an even number of odd vertices. Hence, if there are any odd vertices, we can find two of them in the same component. So now we are basically done: since the two chosen vertices are in the same component, we can connect them by a path. We then remove the path. The graph now has fewer than m edges, and we can use the induction hypothesis because it does not require connectivity. Thus, in the remaining graph, we can pair the odd vertices by edge-disjoint paths. We can then add back the path we removed and complete the proof.
3.8.2 Edge Connectivity of a Graph
The algorithm shown below finds the edge-connectivity K(G) of an undirected graph G, where K(G) denotes the edge-connectivity of an arbitrary graph G = (V, E), and G′ denotes the digraph obtained by replacing each edge of G by two antiparallel edges, each of unit capacity. The algorithm finds the value of a maximum flow F for the network consisting of G′ with a source x and a sink y. The algorithm has O(n|E|) running time [20].
Algorithm:
  Input G and construct G′
  Specify u; K ← |E|
  for all v ∈ V − {u} do
  begin
    find F for (G′ with x = u and y = v)
    if F < K then K ← F
  end
  output K as the edge-connectivity of G
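A runnable sketch of this computation, assuming a simple BFS augmenting-path (Edmonds-Karp style) maximum flow on the unit-capacity digraph G′; the 4-cycle example is illustrative:

```python
from collections import deque

def max_flow_unit(arcs, s, t):
    """Max flow with unit capacities; arcs is a set of directed (u, v)."""
    cap = {a: 1 for a in arcs}
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        prev = {s: None}
        q = deque([s])
        while q and t not in prev:
            x = q.popleft()
            for (a, b), c in cap.items():
                if a == x and c > 0 and b not in prev:
                    prev[b] = a
                    q.append(b)
        if t not in prev:
            return flow
        # Augment by one unit along the path found.
        x = t
        while prev[x] is not None:
            cap[(prev[x], x)] -= 1
            cap[(x, prev[x])] = cap.get((x, prev[x]), 0) + 1
            x = prev[x]
        flow += 1

def edge_connectivity(adj):
    """K(G): replace each edge by two antiparallel unit arcs, fix one
    source u, and take the minimum max-flow over all sinks v != u."""
    nodes = list(adj)
    arcs = {(x, y) for x in adj for y in adj[x]} | \
           {(y, x) for x in adj for y in adj[x]}
    u = nodes[0]
    k = len(arcs)                       # safe upper bound on any cut
    for v in nodes[1:]:
        k = min(k, max_flow_unit(arcs, u, v))
    return k

cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(edge_connectivity(cycle4))  # 2
```

Fixing a single source u suffices because some minimum cut must separate u from at least one other vertex, which is what gives the O(n) flow computations mentioned above.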
4. Fault-Tolerant Communication in HRs
A major goal in the design of a multiprocessor network, which may be defined as a system of independent processors or computers that communicate with one another via a communication network, is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or redundant communication links. When an active system component fails, its tasks are dynamically transferred to spare components, thereby allowing system operation to continue, possibly in a degraded mode. In this thesis, we present an approach to the analysis and synthesis of highly fault-tolerant multiprocessor networks whose control is decentralized, i.e., distributed throughout the system. Each processor is assumed to be capable of acting as a local supervisor of reconfiguration and recovery. The system is represented by a facility graph whose nodes denote processors and whose edges denote communication paths.
Figure 4.a. (G). Figure 4.b. (G_F).
Example 1: Figure 4.a shows a 6-HR graph. The nodes of G correspond to the microcomputers, and the edges of G correspond to the direct links between the microcomputers. Suppose that the initial state of the system G is S = (a, b, c, d, e, f) → (1, 2, 3, 4, 5, 0). The graph G in Fig. 4.a is the corresponding initial configuration. The fault F affecting node a changes S to S_F = (−1, 2, 3, 4, 5, 0), and yields the configuration G_F of Figure 4.b. We require each node Y to be capable of acting as a recovery supervisor in a small part of the system, termed its neighborhood N(Y). We restrict N(Y) to containing Y and all nodes adjacent to Y. Each fault-free node Y may periodically test itself. It is assumed that if Y is faulty, its state S(Y) changes to −1, and this fact can be detected by all nodes in the neighborhood N(Y). Every active fault-free node Y_i periodically polls and records the status of all other nodes in N(Y_i). Y_i also stores and periodically updates backup files for certain nodes in N(Y_i), which allow Y_i, as a local supervisor, to reassign the tasks of these nodes when they become faulty. Although it is generally preferable to limit the role of supervisor to active fault-free nodes, there are cases where it is useful to allow spares to act as supervisors. Distributed recovery typically proceeds in several steps as follows. (1) A fault F occurs, changing the system configuration. (2) An active node Y_i detects a faulty node Y_j in its neighborhood N(Y_i), assumes the role of local supervisor, and initiates recovery. Y_i attempts to find a spare node Y_k in N(Y_i) such that Y_k can assume the previous state of the faulty node Y_j. (3) If no suitable spare is available, either because no spares are present in N(Y_i), or because the available spares do not have the proper connectivity, Y_i makes an appropriate change to the state of N(Y_i) and relinquishes the role of local supervisor, which must then be assumed by some other node.
It should be noted that the initial fault affecting node Y_j in step (1) above corresponds to the absence of a node in Y_j's state; subsequent reassignment of states during recovery may also lead to other states being temporarily missing from the system configuration. For example, if supervisor node Y_i changes the state of another
node Y_k from S_k to the state of the faulty node Y_j, the old state S_k of Y_k is unassigned. These considerations lead to the important concept of an error condition, which is defined formally as follows. Let G_F be a fault-tolerant system whose configuration is shown in Fig. 4.b. An error condition exists if G_F has no active node in some required state. Thus, error detection and recovery will be based on the identification of configurations in which one of the possible active states is missing. A particular error E may occur with many different system configurations. The fastest recovery is achieved when a spare node Y_k is immediately used to replace a faulty node Y_j by assuming its state, as in step (2) above. In this case, we say that Y_k covers Y_j. In general, a node Y_k covers a node Y_j in a graph G if Y_k is adjacent to every node in G that is adjacent to Y_j. See [23] for details.
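The covering condition can be checked mechanically. The helper below and the 6-node example (a 6-HR, where nodes are adjacent when their difference modulo 6 is a power of 2) are an illustrative sketch, not code from the thesis:

```python
def covers(adj, k, j):
    """Node k covers node j if k is adjacent to every node of the
    graph that is adjacent to j (so k can take over j's role)."""
    return all(x in adj[k] for x in adj[j] if x != k)

# 6-HR: offsets +-1 and +-2 mod 6 are powers of 2 (1, 2, 4 = -2).
adj = {v: [(v + d) % 6 for d in (1, 2, 4, 5)] for v in range(6)}

print(covers(adj, 0, 3))  # True: node 0 is adjacent to all of 3's
                          # neighbours (1, 2, 4, 5)
print(covers(adj, 0, 1))  # False: node 3 neighbours 1 but not 0
```

In this small hyperring, each node covers its antipodal node, which is the kind of structural redundancy the recovery scheme exploits.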
5. Reliability in Distributed Systems
One of the major goals of the reliability engineer is to find the best way to increase the system's reliability. Usually, how well a system meets its design objective is expressed as a function of the reliability of its subsystems or components. Thus, there are many factors that affect the reliability of a system. It is well accepted that improving the reliability of a system can be achieved by one or a combination of the following techniques:
- reducing the complexity of the system
- improving the reliability of components or subsystems
- implementing large safety factors
- using redundant components (standby, switched)
- practicing a planned maintenance and repair schedule.
Many of these techniques are in conflict with one another, and it is extremely important to consider all aspects of the system. Computing exact reliability and performing optimization is a complex process. Some simplifying assumptions about the system network configurations are made so that the problem can be handled with reasonable effort. The reliability of a distributed system depends on the reliabilities of its communication links and computer elements, as well as on the distribution of its resources, such as programs and data files. A useful measure of reliability in distributed systems is the terminal reliability between a pair of nodes, which is the probability that at least one communication path exists between these nodes. An interesting optimization problem is that of maximizing the terminal reliability between a pair of computing elements under a given budget constraint. Analytical techniques to solve this problem are applicable only to special forms of reliability expressions [24].
5.1 Reliable Transmission (Byzantine Generals Problem)
Consider a distributed system of processors whose interconnection network is given by an undirected graph G = (V, E).
Assume there is a processor at each vertex of G which can communicate directly with the neighboring processors at adjacent
vertices. To transmit a message to a distant processor, a processor includes the intended route with its message and then routes the message through the network by transmitting to the first processor/vertex on the route, which proceeds in a similar manner, and so on, until the message eventually reaches its destination. Feng [25] showed that the difference between the Byzantine problem and fault-tolerant broadcasting is that the source may be faulty in the former case, but the source is always good in the latter case. We are interested in the kinds of problems that can arise when some of the processors in the network may be faulty. We will assume that faulty processors can exhibit a bizarre variety of deleterious effects. They may lose messages, route messages incorrectly, garble messages, or even generate multiple incorrect copies of messages. It is not at all obvious whether reliable transmission can be achieved in the presence of such potential chaos. But, surprisingly, we can show that if the interconnection graph of the network has sufficiently high vertex connectivity, then the reliable processors in the network can communicate reliably even in the presence of severely unreliable components.
5.2 Reliability Optimization
The development of computer networking and low-cost processing devices has led to an increasing interest in distributed systems. This trend is becoming more important because:
- it provides a practical means for sharing expensive information resources, computing power, and information-handling devices;
- it provides higher performance by permitting concurrent operations on the processing elements;
- it provides more reliable operational capabilities through the replication of resources and the rerouting of the tasks assigned to a faulty processing element to other fault-free processing elements of the distributed system.
The reliability of a distributed system depends not only on the reliability of the communication network, but also on the processing nodes' reliabilities and the
distribution of other resources, such as programs and data files in the network. Therefore, the reliability of a distributed system can be measured by the probability of the successful execution of its programs. As a result, the successful execution of a task requires that the host processor on which the task is running is operational, that all the processing nodes on which the required data files reside are operational, and that all the links which will be used in data file transfers are operational [25]. The reliability of distributed systems has been studied using different measures, such as the survivability index, a quantitative measure of the survivability of a distributed system obtained by averaging the number of programs that remain operational after a combination of links and nodes have failed. Another measure, terminal reliability, is the probability that at least one path exists between the terminal nodes. The multiterminal reliability describes the probability that paths exist between a subset of the processing nodes designated as source nodes and another subset designated as destination nodes. An important design problem in distributed systems is that of maximizing terminal reliability between a pair of terminal nodes. The optimization techniques that can be used for terminal reliability maximization vary from analytical to heuristic approximation methods. Most of the available methods proposed for maximizing the system's reliability are based on using redundancy to improve the reliability of series-parallel systems. The use of redundancy makes this optimization problem more difficult to solve than a nonlinear programming problem. Integer programming techniques give integer solutions, but they do not guarantee that optimal solutions can be obtained in a reasonable computing time [24].
INSTANCE: A graph G = (V, E), a subset V′ ⊆ V, a rational "failure probability" p(e), 0 ≤ p(e) ≤ 1, for each e ∈ E, and a positive rational number q ≤ 1 [26].
QUESTION: Assuming edge failures are independent of one another, is the probability q or greater that each pair of vertices in V′ is joined by at least one path containing no failed edge?
Dynamic programming solves the optimal redundancy allocation problem, but the required computation time and the amount of storage needed for the generated tables make it infeasible when the number of state variables is large.
5.3 Message Transmission
In this paper, we express a network by a graph, where a node is a processor and an edge is a communication link. If two nodes are connected by an edge, they are called adjacent. If every pair of nodes in a network is connected by an edge, the network is said to be a complete network. Otherwise, it is called an incomplete network or a general network. Broadcasting is the process of disseminating a message from a source node to all other nodes. Since not every pair of nodes has a link connecting them, broadcasting is achieved by having each node repeatedly receive and forward messages. When only one node has a message to disseminate, it is called one-to-all broadcasting. If every node has a message to disseminate to all other nodes, it is called all-to-all broadcasting. For broadcasting, no secrecy is required. That is, we need not keep the message secret from any node. The only concern is correctness, i.e., every good node should correctly receive the message broadcast by the source node before the broadcasting is finished. For broadcasting, we require that every good node can correctly receive the messages broadcast by good nodes. The goal of all-to-all broadcasting can be formalized as follows: Let p_1, p_2, ..., p_n be all the processors (nodes) of the network. Denote by m[i] the message broadcast by p_i, and by m_j[i] the message received by p_j from p_i, for i, j = 1, 2, ..., n. The broadcasting is said to be successful if m[i] = m_j[i] for every pair of i and j such that p_i and p_j are good processors [25]. We always assume that the source node of broadcasting is good and honest. That is, the source node always sends a correct message to its neighbors (adjacent
nodes). However, it is sometimes possible that the source node itself is faulty; in this situation all other nodes must agree on some value as what the source node broadcasts. This problem is called Byzantine agreement. Hence, the difference between fault-tolerant broadcasting and Byzantine agreement can be stated as follows: the former always has a good source, while the latter may have a faulty source. We formalize the goal of Byzantine agreement as follows: Let p_1, p_2, ..., p_n be all the processors (nodes) of the network. Suppose that p_1 is the source node and has a message m to send to all the other p_i's. At the end of the execution of a Byzantine agreement protocol, every p_i decides on a message m_i. We require that m_i = m_j for every pair of i and j such that p_i and p_j are good processors. Furthermore, if p_1 is good, m_i = m for all the good p_i's. Byzantine agreement is one of the most important tasks in distributed systems. By applying Byzantine agreement protocols, we can operate as if there exists a broadcast channel when there actually is none. The Byzantine agreement problem has been much studied, and many protocols for Byzantine agreement have been proposed. Although Byzantine agreement is usually studied on complete networks, it can be implemented for general networks. The most basic operation used in Byzantine agreement protocols is fault-tolerant broadcasting: the messages broadcast by good nodes can be correctly received by all other good nodes. For example, the simplest protocol for Byzantine agreement, EIG (exponential information gathering), just exploits t + 1 rounds of all-to-all broadcasting. Hence, the study of how to raise the efficiency and reliability of broadcasting helps to raise the efficiency of realizing Byzantine agreement in general networks.
Only reliability is of concern in the study of broadcasting, whereas both correctness and secrecy are of concern in secure message transmission. In the secure message transmission setting, one node (processor), called the sender, has a secret message s to send to another node, called the receiver. We require that the receiver can correctly obtain s, whereas all other nodes learn nothing about s. We always assume that all the good nodes execute the protocol honestly. We also assume that every node knows the topology of the underlying network in advance and that the network is synchronized by a global clock. Thus, all the processors can implement one step of a protocol concurrently in a specified time unit.
The basic idea of secure message transmission is to exploit disjoint paths between a sender and a receiver. When the distance between a sender and a receiver is large, the transmitted messages may suffer from both corruption and leakage. For more details, see [25].
5.4 Networks with Faults
If the nodes (processors) and edges (links) of the network work well, we say they are good. The main theme of this work is to study the problems in the previous section in the situation where some processors and/or links may be faulty. In a network, both processors and links may fail to work well. We call them faulty nodes and faulty links. Both faulty nodes and faulty links are called faults. If we consider the maximum number of faults that can be tolerated in the worst case, the assumption that only nodes may be faulty covers the assumption that both nodes and links may be faulty. This is because we can regard all the links connected to faulty nodes as faulty. However, we sometimes assume that only faulty links exist, i.e., that all the nodes are good. In some cases, it is easier to cope with just faulty links than to cope with both faulty nodes and faulty links. Two types of faults are usually considered in the study of fault-tolerant broadcasting. One is the fail-stop type, and the other is the arbitrary-change (or Byzantine) type. A faulty node or link of the fail-stop type does not transmit anything, i.e., it stops the messages passing through it. A faulty node or link of the Byzantine type may not only stop sending messages but also arbitrarily change a message passing through it. It may even fabricate a fake message. Byzantine faults are much more malicious than fail-stop faults and, therefore, are more difficult to tolerate. Fault tolerance of broadcasting against faults of the fail-stop type has been much studied. For the study of broadcasting, we do not consider the situation where the source node is faulty, i.e., we always assume that the source node is good.
The problem in which the source node may be faulty is Byzantine agreement. Byzantine agreement is equivalent to a problem called Consensus. In the Consensus problem, every node holds a value at first. After a Consensus protocol is finished, all the good nodes must achieve agreement on a value. If at first all the good nodes hold an identical value, the agreed value should be that identical value.
One of the main issues studied in message transmission is to evaluate the maximum number of faults that can be tolerated in the worst case. This problem arises in both reliable message transmission (i.e., broadcasting) and secure message transmission. For nonadaptive protocols/schemes, we can never tolerate as many Byzantine faults as a half of the connectivity of the network. This is the case for both reliable message transmission and secure message transmission. Hence, we sometimes would like to take another assumption about faults, that is, we assume that nodes and/or links become faulty randomly with a fixed probability. In the random cases, we study the relationship between the probability that a protocol/scheme succeeds and the probability that each node and/or each link becomes faulty. For large networks, the assumption of random faults is more realistic [25]. 5.5 Efficiency of Message Transmission When we talk about the efficiency of a protocol/scheme/algorithm of message transmissions, we usually refer transmission complexity, messagesize complexity, stop/round complexity and communication complexity. In the following, we explain in details these complexities. By a transmission we mean that a node transmits a message along one link once. For example, when A sends a message to C through B, two transmissions are needed, as shown in Figure 5.1 (a). Transmission complexity of a protocol/scheme refers the number of transmissions in the implementation of the protocol/scheme. Messagesize complexity refers the size of the message being transmitted. When we say messagesize complexity of a protocol/scheme, we mean the maximum size of the messages of all transmissions by the protocol/scheme. For example, A has a message of size 20 bits. If A sends the message to C through B as shown in Figure 5.1 (a), then the messagesize complexity is 20 bits while the transmission complexity is 2. 
If A separates the message into two parts of size 10 bits and sends one of them through B and the other through D, as shown in Figure 5.1(b), then the message-size complexity is 10 bits, but the transmission complexity is 4. Apparently, there is a trade-off between message-size complexity and transmission complexity. We can always reduce message-size complexity by
increasing transmission complexity. However, from a practical viewpoint, we would rather reduce transmission complexity as much as possible. This is because increasing the size of a transmission is less costly than increasing the number of transmissions. In network communication, many transmissions can take place concurrently. For example, the transmissions from B to C and from D to C in Figure 5.1(b) can take place concurrently. However, the transmissions from A to B and from B to C have to take place one after another. We call a concurrent round of transmissions a round or a step. The number of concurrent rounds/steps is called the round/step complexity. For example, suppose A has a message of size 20 bits to send to B. If A separates the message into two parts of size 10 bits, sends one of them directly to B, and sends the other through D and C as shown in Figure 5.1(c), then the transmission complexity, the message-size complexity, and the round/step complexity are 4, 10, and 3, respectively. Sometimes we also call the round/step complexity of a protocol/scheme the running time of the protocol/scheme. Sometimes we also mention communication complexity. Communication complexity refers to the total amount of messages being transmitted. If the message size is always the same during the execution of the protocol/scheme, the communication complexity is the product of the message-size complexity and the transmission complexity. The communication complexity is 40 bits in Figure 5.1(c).
Figure 5.1. Comparing complexity through examples.
One thing we need to mention about transmission complexity, message-size complexity, step complexity, and communication complexity is that all these complexities are calculated for good nodes/links. In other words, we do not consider
faults in estimating those complexities, since faults of the Byzantine type may behave arbitrarily. The efficiency of message transmission is affected not only by the number of faults (or the fault probability in the random case), but also by restrictions on the manner of transmission. For example, in broadcasting, if we allow a node to send messages along all the links connected to it concurrently in one step, this is said to be all-port broadcasting. On the other hand, if we only allow a node to send messages along one of its links in each step, this is said to be one-port broadcasting. Apparently, one-port broadcasting requires larger step complexities than all-port broadcasting does. In the case where no fault exists, all-port broadcasting trivially requires the same number of steps as the diameter of the network. However, the situation is much more complicated for one-port broadcasting. Constructing one-port broadcasting with the optimal step complexity in general networks is NP-hard even for the case where no fault exists [25].
5.6 Optimization Problem Formulation
The optimization problem is more general than the problem of solving series-parallel systems. Distributed systems have complex structures and cannot be modeled as series-parallel systems; therefore, different techniques need to be used to obtain optimal or near-optimal solutions in a reasonable computation time. Furthermore, the cost-reliability functions associated with the components depend on their characteristics, the technology used in manufacturing them, the environment surrounding their physical location, and the method used for improving their reliability. The algorithms introduced here are independent of the techniques used to improve the reliability of the components and assume that the cost-reliability functions are given. These cost-reliability functions approximate the amount of dollars that must be spent to improve the reliability of the distributed system (DS) components [24].
Terminal reliability is a measure of the probability of successful communication between a pair of nodes. If multiple paths exist between a given pair of nodes, then the terminal reliability is the probability that at least one path is
operational. There are several algorithms proposed in the literature to obtain a terminal reliability expression. These algorithms are based on path enumeration, cut-set enumeration, or state enumeration. A general formula for the terminal reliability, R(s, t), can be derived directly from its definition: if P_1, P_2, ..., P_m are the events that each of the m paths between s and t is operational, then R(s, t) is the probability that at least one of these events occurs. By direct expansion (inclusion-exclusion), a general expression of terminal reliability between a pair of terminal nodes (s, t) can be obtained as follows:

R(s, t) = Σ_{i=1}^{m} Pr(P_i) − Σ_{i<j} Pr(P_i ∩ P_j) + ... + (−1)^{m−1} Pr(P_1 ∩ P_2 ∩ ... ∩ P_m).
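The expansion above can be evaluated mechanically once the s-t paths are enumerated, assuming independent edge failures; since the expansion has 2^m − 1 terms, this brute-force sketch is practical only for small graphs (the example network is hypothetical):

```python
from itertools import combinations

def simple_paths(adj, s, t):
    """All simple s-t paths, each returned as a set of (frozenset) edges."""
    paths, stack = [], [(s, [s])]
    while stack:
        x, walk = stack.pop()
        if x == t:
            paths.append({frozenset(e) for e in zip(walk, walk[1:])})
            continue
        for y in adj[x]:
            if y not in walk:
                stack.append((y, walk + [y]))
    return paths

def terminal_reliability(adj, p_up, s, t):
    """R(s, t) by inclusion-exclusion over the path events P_i:
    Pr(intersection of chosen P_i) = product of edge reliabilities
    over the union of those paths' edges (independent edges)."""
    paths = simple_paths(adj, s, t)
    m = len(paths)
    r = 0.0
    for k in range(1, m + 1):
        for I in combinations(range(m), k):
            union = set().union(*(paths[i] for i in I))
            term = 1.0
            for e in union:
                term *= p_up[e]
            r += (-1) ** (k + 1) * term
    return r

# Two edge-disjoint two-hop paths from 0 to 3; each edge up with prob 0.9.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
p_up = {frozenset(e): 0.9 for e in [(0, 1), (1, 3), (0, 2), (2, 3)]}
print(round(terminal_reliability(adj, p_up, 0, 3), 6))  # 0.9639
```

For this example the answer can be checked by hand: each path works with probability 0.81, and the paths are edge-disjoint, so R = 1 − (1 − 0.81)^2 = 0.9639.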
An analytical methodology was studied in [21] to solve this optimization problem using the Lagrangian multiplier method, and to show that optimal solutions can be derived for some simple reliability expressions. For example, the following problem was studied analytically: given a budget D_max and the cost-reliability functions, maximize the terminal reliability over the design variables d_i (the dollar amounts invested in the elements), subject to the constraints that the d_i sum to at most D_max, that each d_i ≥ 0, and that no more is spent on element i than is required to raise its reliability p_i from its initial value r_i to 1. Applying the Lagrangian multiplier method yields a closed-form optimal allocation for each d_i in terms of D_max, the costs c_i, and the initial reliabilities r_i. However, this analytical approach can only be applied to simple forms of reliability expressions. Therefore, we adopt an iterative approach that can handle large distributed systems and develop an algorithm to maximize the reliability within a cost constraint.
5.7 Reliability Optimization Algorithm
The following iterative algorithm consists mainly of two steps, an exploring step and an updating step. In the exploring step, the reliability of each unreliable element is improved by investing a δ dollar amount, and then the objective
function is evaluated for every element examined. In the updating step, the values of the objective function are compared to find the element whose reliability improvement maximizes the objective function, and δ dollars are then invested to improve its reliability [24]. These two steps are repeated until all the available budget has been used up. The algorithm uses the terminal reliability expression itself as the objective function to be maximized in each iteration. In the exploring step, the improvement in the reliability of each element is examined sequentially (one at a time) by spending δ dollars on improving its reliability, and the corresponding terminal reliability value is evaluated. In the updating step, the values of terminal reliability resulting from the improvements of the explored elements are compared to identify the element that maximizes terminal reliability, and its reliability is then improved. These two steps are repeated until all the budget has been used up. The number of iterations (maxiter) is a function of δ, which can be chosen as the minimum number of dollars that must be spent on an element to produce an increase in its reliability equal to the desired accuracy in the reliability values. For example, if the accuracy in measuring the reliability of the elements is 10^−4, δ is then the amount that should be spent on any unreliable element to increase its reliability by at least 10^−4. Once δ is determined, the number of iterations is D_max/δ. The complete description of this algorithm is shown below [24].
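The exploring and updating steps can be sketched in runnable form. The exponential cost-reliability curve and the two-element parallel objective below are hypothetical stand-ins, since the thesis assumes the cost-reliability functions are given:

```python
import math

def reliability(i, spent, r0, c):
    # Hypothetical cost-reliability curve: spending moves element i's
    # reliability from its initial value r0[i] toward 1, at cost scale c[i].
    return 1.0 - (1.0 - r0[i]) * math.exp(-spent / c[i])

def greedy_allocate(r0, c, objective, D_max, delta):
    n = len(r0)
    d = [0.0] * n
    for _ in range(int(D_max / delta)):      # maxiter = D_max / delta
        best_i, best_val = None, -1.0
        for i in range(n):                   # exploring step
            d[i] += delta
            val = objective([reliability(j, d[j], r0, c) for j in range(n)])
            d[i] -= delta
            if val > best_val:
                best_i, best_val = i, val
        d[best_i] += delta                   # updating step
    return d

# Toy objective: two elements in parallel between the terminal nodes.
parallel = lambda r: 1.0 - (1.0 - r[0]) * (1.0 - r[1])
d = greedy_allocate(r0=[0.7, 0.9], c=[50.0, 200.0],
                    objective=parallel, D_max=200.0, delta=10.0)
print(d)  # the whole budget goes to the cheaper-to-improve element
```

With these made-up curves, element 0 improves faster per dollar at every step, so the greedy loop allocates the entire budget to it.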
Algorithm:
    *Initialization.
    maxiter = Dmax/δ;
    for i = 1 to n do d_i = 0 od
    for iter = 1 to maxiter do
    begin
        *Exploring step.
        for i = 1 to n do
        begin
            d_i = d_i + δ;
            evaluate R_i;
            d_i = d_i − δ;
        end
        *Updating step.
        find: R_imax = max(R_i : i = 1, ..., n);
        improve imax: d_imax = d_imax + δ;
    end

5.8 Broadcasting with Random Faults

For large networks the worst case rarely appears. That is, broadcasting succeeds with high probability even if many more than ⌊(n−1)/2⌋ faults exist in a network of n component graphs. Hence, it is a more realistic assumption that faults are randomly distributed in a network [25].

5.9 Tolerating Faulty Nodes/Links

We assume that f faulty nodes of the Byzantine type are randomly distributed in a network and that all the links are healthy. We consider broadcasting from the
source node through n independent spanning trees of a network, and study the relation between the number of faulty nodes and the probability of successful broadcasting. Suppose that each configuration of the network with f faulty nodes is equally probable. We denote a configuration of network G with f faulty nodes by c_G^f, and denote the set of all configurations of G with f faulty nodes by C_G^f. If broadcasting in c_G^f is successful, then c_G^f is called a successful configuration; otherwise c_G^f is called a failed configuration. We denote the set of all successful configurations of G with f faulty nodes and the set of all failed configurations of G with f faulty nodes by SC_G^f and FC_G^f, respectively. Then the probability of successful broadcasting in G with f faulty nodes is equal to

    |SC_G^f| / |C_G^f| = 1 − |FC_G^f| / |C_G^f|.

We can more realistically consider all configurations with up to f faults to estimate the probability of successful broadcasting. However, if we assume that in this consideration the probabilities of configurations with the same number of faults are equal (e.g., the probability of each configuration with k faults (k ≤ f) is p_k), the results in this section and the next section are still valid for such distributions of faults [25].

Let us consider a network G = G_1 × ⋯ × G_n, where the order of each G_i (1 ≤ i ≤ n) is not greater than a constant b. Let s be the source node of G, and let q_1(s, u), ..., q_n(s, u) be n paths from s to u in the n independent spanning trees T_1, T_2, ..., T_n, respectively. These n paths are internally node-disjoint. By the broadcasting scheme, the message from the source node s is disseminated to each node u of G through the n independent spanning trees. From the construction of the n independent spanning trees, for each i (1 ≤ i ≤ n) there are at most nb nodes on q_i(s, u). If in a configuration c_G^f, more than ⌊(n−1)/2⌋ paths among q_1(s, u), q_2(s, u), ..., q_n(s, u) contain faulty nodes, then c_G^f is called a failed configuration for u.
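The probability |SC_G^f|/|C_G^f| can also be estimated empirically by sampling random fault configurations. The sketch below uses explicit lists of intermediate path nodes as illustrative placeholders; it does not construct the independent spanning trees of [25].

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Monte Carlo estimate of |SC|/|C|: place f faulty nodes uniformly at
// random among the N nodes and count the configurations in which at most
// floor((n-1)/2) of the n internally node-disjoint paths to the target
// contain a faulty node.  paths[i] lists the intermediate nodes of path i.
double successProbability(int N, int f,
                          const std::vector<std::vector<int>>& paths,
                          int trials) {
    std::mt19937 rng(12345);          // fixed seed for reproducibility
    std::vector<int> nodes(N);
    for (int i = 0; i < N; ++i) nodes[i] = i;
    int n = (int)paths.size();
    int ok = 0;
    for (int t = 0; t < trials; ++t) {
        // draw one random configuration of f faulty nodes
        std::shuffle(nodes.begin(), nodes.end(), rng);
        std::vector<char> faulty(N, 0);
        for (int i = 0; i < f; ++i) faulty[nodes[i]] = 1;
        int hit = 0;                  // paths containing a faulty node
        for (const auto& p : paths) {
            for (int v : p)
                if (faulty[v]) { ++hit; break; }
        }
        if (hit <= (n - 1) / 2) ++ok; // a majority of copies arrive intact
    }
    return (double)ok / trials;
}
```

With f = 0 the estimate is exactly 1; with a single fault and node-disjoint paths at most one path can be hit, so broadcasting with n = 3 paths still always succeeds.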
We denote the set of all failed configurations for u by FC_G^f(u). Then we have

    |FC_G^f| ≤ Σ_{u ∈ V(G)} |FC_G^f(u)|.

Theorem: If f faulty nodes of the Byzantine type are randomly distributed in a network G = G_1 × ⋯ × G_n with no faulty links, then broadcasting through the n independent spanning trees in G is successful with a probability higher than
1 − (4b^3 n f / N)^⌈n/2⌉, where N is the order of G and b is the maximum order of the component graphs [25].

Proof: The probability of successful broadcasting through the n independent spanning trees of G is

    |SC_G^f| / |C_G^f| = 1 − |FC_G^f| / |C_G^f|.

For any node u of G, there are n internally node-disjoint paths q_1(s, u), ..., q_n(s, u) of length at most bn. Consider the configurations in which exactly ⌈n/2⌉ faulty nodes lie on different paths among the n node-disjoint paths from s to u. The number of such configurations is apparently no more than (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) C(N − ⌈n/2⌉, f − ⌈n/2⌉). Allowing for repeated counting, we have

    |FC_G^f(u)| < (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) C(N − ⌈n/2⌉, f − ⌈n/2⌉).

Hence,

    |FC_G^f| / |C_G^f| < N (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) C(N − ⌈n/2⌉, f − ⌈n/2⌉) / C(N, f)
                       = N (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) · f(f−1)⋯(f−⌈n/2⌉+1) / [N(N−1)⋯(N−⌈n/2⌉+1)]
                       < N (bn)^⌈n/2⌉ 2^n (f/N)^⌈n/2⌉
                       ≤ (4b^3 n f / N)^⌈n/2⌉,

since C(n, ⌈n/2⌉) < 2^n ≤ 4^⌈n/2⌉ and N ≤ b^n ≤ (b^2)^⌈n/2⌉. Then we have
Corollary: For any k > 1, if at most N/(4b^3 n k) faulty nodes of the Byzantine type are randomly distributed in a network G = G_1 × ⋯ × G_n, then broadcasting through the n independent spanning trees in G is successful with a probability higher than 1 − k^(−⌈n/2⌉).

5.10 Optimal Transmission Complexity

Let G be an incomplete network (graph) and let |G| denote the number of nodes of G. Apparently, if there are no faults in G, the optimal transmission complexity of broadcasting in G is |G| − 1. If the connectivity of G is n, at most ⌊(n−1)/2⌋ Byzantine faults can be tolerated in the broadcasting. We consider only nonadaptive fault-tolerant broadcasting. We can simply let the source node send n copies of the message through n disjoint paths to some node, and we can do this for all the nodes one by one. Of course, ⌊(n−1)/2⌋ Byzantine faults can then be tolerated; however, such fault-tolerant broadcasting has large transmission complexity. It is easy to see that any nonadaptive fault-tolerant broadcasting has transmission complexity at least n(|G| − 1) if it is required to tolerate up to ⌊(n−1)/2⌋ Byzantine faults. This is because, in the nonadaptive case, every node has to receive at least n transmissions in order to obtain the correct message in the presence of ⌊(n−1)/2⌋ Byzantine faults. If G is an n-channel graph, the nonadaptive fault-tolerant broadcasting algorithm has complexity n(|G| − 1), and it can tolerate up to ⌊(n−1)/2⌋ Byzantine faults. Hence, it is optimal in transmission complexity.

Observation: Let s be the source and v be an arbitrary node in G. Let m_i (i = 1, 2, ..., n) be a copy of the message m transmitted from s to v. Without loss of
generality, we may suppose that m_i is transmitted from s to v through a path of length k, namely,

    Transmission 1: s → u_1,
    Transmission 2: u_1 → u_2,
    ...
    Transmission k: u_{k−1} → u_k = v.

Then every Transmission j (j = 1, 2, ..., k) is used in the final majority voting not only at v but also at u_j. We attach a label to each copy of m when it starts from s. Suppose that these copies are m_1, m_2, ..., m_n. Remember that each of m_1, m_2, ..., m_n is actually the same as m; we just imagine that we place a label on each copy of the message. Let v be an arbitrary node of G − {s}. Since our broadcasting is nonadaptive and tolerates ⌊(n−1)/2⌋ Byzantine faults, v must receive all of m_1, m_2, ..., m_n. That is, each m_i should reach all the nodes [25].
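The final step at each node, taking the majority over the n received copies, can be sketched as follows. With at most ⌊(n−1)/2⌋ Byzantine faults, the corrupted copies can never outnumber the correct ones, so the majority value is the original message.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Majority vote over the n labeled copies m_1, ..., m_n received along the
// n internally node-disjoint paths.  With at most floor((n-1)/2) Byzantine
// faults, corrupted copies can never outnumber the correct ones, so the
// most frequent value is the original message.
int majorityVote(const std::vector<int>& copies) {
    std::unordered_map<int, int> count;
    int best = copies.front();
    int bestCount = 0;
    for (int m : copies) {
        if (++count[m] > bestCount) {
            bestCount = count[m];
            best = m;
        }
    }
    return best;
}
```

For n = 5 copies of which two were corrupted, majorityVote({7, 7, 3, 7, 9}) returns 7.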
6. Maximum Flow for Odd-Sized HRs

Altman [3] showed that the edge connectivity of an HR is equal to its nodal degree, which is optimal. However, it is still an open question whether the node connectivity of an odd-sized HR is equal to its degree. We have attempted to answer this question by explicitly testing the node connectivities of various odd-sized HRs.

The maximum-flow problem is the simplest problem concerning flow networks. It asks: what is the greatest rate at which material can be shipped from the source to the sink without violating any capacity constraints? We shall use the classical method of Ford and Fulkerson for finding maximum flows.

The Ford-Fulkerson method

The Ford-Fulkerson method for solving the maximum-flow problem depends on three ideas: residual networks, augmenting paths, and cuts of flow networks. These ideas are essential to the max-flow min-cut theorem, which characterizes the value of a maximum flow in terms of cuts of the flow network. The Ford-Fulkerson method is iterative. We start with f(u, v) = 0 for all u, v ∈ V, giving an initial flow of value 0. We increase the flow value by finding an "augmenting path", that is, a path from the source s to the sink t along which we can push more flow. We repeat this process until no augmenting path can be found. This process yields a maximum flow.

Residual networks

A residual network consists of edges that can admit more net flow. Suppose we have a flow network G = (V, E) with source s and sink t. Let f be a flow in G, and consider a pair of vertices u, v ∈ V. The residual capacity c_f is the amount of additional net flow we can push from u to v before exceeding the capacity: c_f(u, v) = c(u, v) − f(u, v).
Augmenting paths

Given a flow network G = (V, E) and a flow f, an augmenting path p is a simple path from s to t in the residual network G_f. The residual capacity of p is the maximum amount of net flow that we can send along the edges of the augmenting path p.

Cuts of flow networks

The Ford-Fulkerson method repeatedly augments the flow along augmenting paths until a maximum flow has been found. The max-flow min-cut theorem tells us that a flow is maximum if and only if its residual network contains no augmenting path. A cut (S, T) of flow network G = (V, E) is a partition of V into S and T = V − S such that s ∈ S and t ∈ T. If f is a flow, then the net flow across the cut (S, T) is defined to be f(S, T). The capacity of the cut (S, T) is C(S, T).

The basic Ford-Fulkerson algorithm

In each iteration of the Ford-Fulkerson method, we find any augmenting path p and augment the flow f along p by the residual capacity c_f(p). We update the net flow f[u, v] between each pair u, v of vertices that are connected by an edge in either direction; we assume implicitly that f[u, v] = 0 and C(u, v) = 0 if (u, v) ∉ E. The residual capacity is c_f(u, v) = C(u, v) − f(u, v). First we initialize the flow f to 0. The while loop repeatedly finds an augmenting path p in G_f and augments the flow f along p by the residual capacity c_f(p). When no augmenting paths exist, the flow f is a maximum flow.

Breadth-first search (BFS)

To visit all nodes connected to node k in a graph, we put k onto a FIFO queue, then enter a loop in which we get the next node from the queue; if it has not been visited, we visit it and push all the unvisited nodes on its adjacency list onto the queue. We continue doing this until the queue is empty [22].

We have used the Ford-Fulkerson method to find the maximum flow for the odd-sized HRs, which is related to the node connectivity of the HRs.
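The procedure can be sketched compactly as follows. This is a sketch with unit edge capacities, independent of the hring.h implementation in Appendix I; hrCapacity builds the N-HR adjacency from the power-of-2 differences in the definition, and maxFlow is Ford-Fulkerson with BFS augmenting paths (the Edmonds-Karp variant).

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

// Unit-capacity arc matrix for an N-HR: {u, v} is an edge when
// (v - u) mod N is a power of 2 smaller than N.
std::vector<std::vector<int>> hrCapacity(int N) {
    std::vector<std::vector<int>> cap(N, std::vector<int>(N, 0));
    for (int u = 0; u < N; ++u)
        for (int p = 1; p < N; p *= 2) {
            cap[u][(u + p) % N] = 1;  // each undirected edge becomes
            cap[(u + p) % N][u] = 1;  // two unit-capacity arcs
        }
    return cap;
}

int hrEdgeCount(const std::vector<std::vector<int>>& cap) {
    int total = 0;
    for (const auto& row : cap)
        for (int c : row) total += c;
    return total / 2;  // every undirected edge was counted twice
}

// Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).  The local
// copy of cap is consumed as the residual network.
int maxFlow(std::vector<std::vector<int>> cap, int s, int t) {
    int N = (int)cap.size(), flow = 0;
    while (true) {
        std::vector<int> parent(N, -1);  // BFS tree over the residual net
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {
            int u = q.front(); q.pop();
            for (int v = 0; v < N; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
        }
        if (parent[t] == -1) break;  // no augmenting path: flow is maximum
        int aug = INT_MAX;           // residual capacity of the path
        for (int v = t; v != s; v = parent[v])
            aug = std::min(aug, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= aug;  // push flow forward
            cap[v][parent[v]] += aug;  // and open the reverse residual arc
        }
        flow += aug;
    }
    return flow;
}
```

For the small rings this sketch reproduces the output below: the 3-, 5-, and 11-HR have 3, 10, and 44 edges, and maximum flows 2, 4, and 8 from node 0 to the last node (the 5-HR is the complete graph K5, so its flow equals its degree 4).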
The following is the maximum-flow output for various odd-sized HRs.

Maximum flow for 3-HR.
Number of edges = 3
augment path: 0 2
augment path: 0 1 2
Max flow = 2

Figure 6.1. A 3-HR.

Maximum flow for 5-HR.

Figure 6.2. A 5-HR.
Number of edges = 10
augment path: 0 4
augment path: 0 1 4
augment path: 0 2 4
augment path: 0 3 4
Max flow = 4

Maximum flow for 11-HR
Number of edges = 44
augment path: 0 10
augment path: 0 1 10
augment path: 0 2 10
augment path: 0 3 10
augment path: 0 7 10
augment path: 0 8 10
augment path: 0 9 10
augment path: 0 4 6 10
Maximum flow = 8

Figure 6.3. An 11-HR.
Maximum flow for 29-HR
Number of edges = 116
augment path: 0128
augment path: 02328
augment path: 042628
augment path: 082328
augment path: 028
augment path: 02728
augment path: 02528
augment path: 0212028
Maximum flow = 8

Maximum flow for 57-HR
Number of edges = 342
augment path: 0156
augment path: 02356
augment path: 056
augment path: 045256
augment path: 0895056
augment path: 0163156
augment path: 025333556
augment path: 032404856
augment path: 0414256
augment path: 04956
augment path: 0535456
augment path: 05556
Maximum flow = 12
7. Summary

In this thesis we have studied the properties, constructions, embeddings, fault tolerance, connectivity, reliability, and optimization of HRs. We have shown construction procedures for HRs and an exact evaluation of the number of edges in an N-HR. Altman [3] showed that the edge connectivity of an HR is equal to its nodal degree, which is optimal. However, it is still an open question whether the node connectivity of an odd-sized HR is equal to its degree. We have attempted to answer this question by explicitly testing the node connectivities of various odd-sized HRs. Empirical results so far have confirmed our hypothesis for the first 1,000 odd-sized HRs.
Appendix I

// MaxFlow.cpp
#include <stdio.h>
#include <iostream>
#include "hring.h"

void main()
{
    // clrscr();
    FlowMap *myMap;
    hyperRing *myRing;
    AUGLIST::iterator aug;
    edgeList::iterator path;
    FlowMap::iterator mapIterator;
    for (int i = 3; /* ... */; /* ... */)
    {
        int maxFlow = 0;
        int unsaturatedPath;
        myRing = new hyperRing(i);
        myMap = myRing->connectivity();
        FlowMap::iterator mydex = myMap->begin();
        FlowMap::iterator mydex2;
        while (mydex != myMap->end())
        {
            edge *theEdge = mydex->second;
            printf("\nThe edge from node %i to node %i has Capacity %i, Flow %i",
                   theEdge->startNode, theEdge->endNode,
                   theEdge->capacity, theEdge->flow);
            mydex++;
        }
        int numbOfEdges = myMap->size();
        unsaturatedPath = numbOfEdges;
        printf("\n\n\t**** %i nodes HRing ****\n", i);
        printf("\tNumber of edges %i\n", numbOfEdges);
        // myRing->getDirectedGraph(myMap, i);
        while (1)
        {
            edgeList *pathlist = myRing->searchPath(i);
            edgeList::iterator pathtemp4 = pathlist->begin();
            printf("augment path:\t");
            while (pathtemp4 != pathlist->end())
            {
                edge *aaaa = *pathtemp4;
                printf("%i", aaaa->startNode);
                pathtemp4++;
            }
            printf("\n");
            if (pathlist->size() == 0)
            {
                if (unsaturatedPath != 0)
                    continue;
                printf("\n node number %i maxFlow = %i\n", i, maxFlow);
                break;
            }
            int mincap = 0;
            path = pathlist->begin();
            edge *temp = *path;
            // mincap = temp->capacity;
            int thisNode = temp->endNode;
            int con = 0;
            while (path != pathlist->end())
            {   // find mincap in the path
                temp = *path;
                if (temp->startNode != thisNode)
                {
                    if (con == 0)
                    {
                        con++;
                        path++;
                        mincap = temp->capacity;
                        continue;
                    }
                    pathlist->erase(path);
                    path = pathlist->begin();
                    con = 0;
                    continue;
                }
                thisNode = temp->endNode;
                if (mincap >= temp->ResidualCap)
                    mincap = temp->ResidualCap;
                path++;
            }   // found mincap
            if (mincap <= 0)
            {
                unsaturatedPath--;
                if (unsaturatedPath == 0)
                    break;
                myRing->AugList.erase(myRing->AugList.begin(), myRing->AugList.end());
                continue;
            }
            path = pathlist->begin();
            while (path != pathlist->end())
            {
                temp = *path;
                temp->flow += mincap;
                temp->ResidualCap -= mincap;
                path++;
            }
            myRing->AugList.erase(myRing->AugList.begin(), myRing->AugList.end());
            maxFlow += mincap;
        }
        mydex = myMap->begin();
        // FlowMap::iterator mydex2;
        while (mydex != myMap->end())
        {
            edge *theEdge = mydex->second;
            printf("The edge from node %i to node %i has Capacity %i, Flow %i\n",
                   theEdge->startNode, theEdge->endNode,
                   theEdge->capacity, theEdge->flow);
            mydex++;
        }
        cout << endl << endl;
        // cin.get();
        delete myRing;
    }
    // Sleep(30000);
}

// matrix.h
#ifndef MATRIX_H
#define MATRIX_H

class matrix
{
public:
    matrix(unsigned, unsigned);
    ~matrix();
    void init();
    int *operator[](unsigned);
    void display() const;
private:
    int **mat;
    unsigned row, col;
};
#endif

// matrix.cpp
#include "matrix.h"

matrix::matrix(unsigned r, unsigned c) : row(r), col(c)
{
    mat = new int *[row];
    for (unsigned i = 0; i < row; i++)
        mat[i] = new int[col];
    init();
}

void matrix::init()
{
    for (unsigned i = 0; i < row; i++)
        for (unsigned j = 0; j < col; j++)
            mat[i][j] = 0;
}
REFERENCES

[1] T. Altman, Y. Igarashi, and K. Obokata, Hyper-Ring Connection Machines, Proc. IEEE 9th Annual International Conference (1994), pp. 290-294.

[2] T. Altman, Y. Igarashi, and K. Obokata, Hyper-Ring Connection Machines, Parallel Computing 21 (1995), pp. 1327-1338.

[3] T. Altman, Reliable Schemes for Hyper-Rings, 28th International Southeastern Conference on Combinatorics, Graph Theory and Computing, Boca Raton, Florida, Feb. 1997.

[4] Y. Han and R. Finkel, An Optimal Scheme for Disseminating Information, Proc. 1988 Int. Conf. on Parallel Processing, St. Charles (1988), pp. 198-203.

[5] Y. Han, Y. Igarashi, K. Kanai, and K. Miura, Fault-Tolerant Broadcasting in Binary Jumping Networks, in: Algorithms and Computation, ISAAC '92, Nagoya, Lecture Notes in Computer Science 650 (Springer-Verlag, New York, 1992), pp. 145-154.

[6] D. Peleg and J. D. Ullman, An Optimal Synchronizer for the Hypercube, SIAM J. Comput. 18 (1989), pp. 740-747.

[7] G. Tel, Topics in Distributed Algorithms (Cambridge University Press, Cambridge, 1991).

[8] N. Alon, A. Barak, and U. Manber, On Disseminating Information Reliably without Broadcasting, in: Proc. 7th Internat. Conf. on Distributed Computing Systems (1987), pp. 74-81.

[9] M. Y. Chan, Embedding of D-Dimensional Grids into Optimal Hypercubes, Proc. 1st Annual ACM Symposium on Parallel Algorithms and Architectures (1989), pp. 52-57.
[10] S. N. Bhatt, F. R. K. Chung, F. T. Leighton, and A. L. Rosenberg, Efficient Embeddings of Trees in Hypercubes, SIAM J. Comput. 21 (1992), pp. 151-162.

[11] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes (Morgan Kaufmann, San Mateo, CA, 1992).

[12] M. Röttger, U.-P. Schroeder, and W. Unger, Embedding 3-Dimensional Grids into Optimal Hypercubes, Technical Report, Dept. of Mathematics and Computer Science, Univ. of Paderborn, Paderborn, Germany, 1994.

[13] K. Efe, Embedding Mesh of Trees in the Hypercube, J. Parallel and Distributed Computing (1991), pp. 230.

[14] F. T. Leighton, M. Newman, A. G. Ranade, and E. Schwabe, Dynamic Tree Embeddings in Butterflies and Hypercubes, The 1st Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 224-234, 1989.

[15] R. Koch, F. T. Leighton, B. Maggs, S. Rao, and A. L. Rosenberg, Work-Preserving Emulations of Fixed-Connection Networks, The 21st Annual ACM Symposium on Theory of Computing, pp. 227-240, 1989.

[16] S. N. Bhatt, F. R. K. Chung, F. T. Leighton, and A. L. Rosenberg, Efficient Embeddings of Trees in Hypercubes, SIAM J. Computing, vol. 21, pp. 151-162, 1992.

[17] B. Aiello and F. T. Leighton, Coding Theory, Hypercube Embedding, and Fault Tolerance, The 3rd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 125-136, 1991.

[18] N. Alon, A. Barak, and U. Manber, On Disseminating Information Reliably without Broadcasting, The 7th International Conference on Distributed Computing Systems, pp. 74-81, 1987.

[19] U. Manber (1989). Introduction to Algorithms. Addison-Wesley Publishing Company, Inc.

[20] A. Gibbons (1985). Algorithmic Graph Theory. Cambridge University Press.
[21] J. A. McHugh (1990). Algorithmic Graph Theory. Prentice-Hall, Inc.

[22] B. D. Sivazlian (1975). Optimization Techniques in Operations Research. Prentice-Hall, Inc.

[23] R. M. Yanney and J. P. Hayes, Distributed Recovery in Fault-Tolerant Multiprocessor Networks, IEEE Transactions on Parallel and Distributed Systems (1984), pp. 514-520.

[24] C. S. Raghavendra, Reliability Optimization in the Design of Distributed Systems, IEEE Transactions on Parallel and Distributed Systems (1985), pp. 265-270.

[25] B. Feng (1996), "Reliable and Secure Message," Doctoral Thesis, Dept. of Computer Science, Gunma University, Japan, pp. 0238.

[26] M. Garey (1979). Computers and Intractability. Bell Telephone Laboratories [cf. 7].
