Citation
Reliable communication schemes in hyper-rings distributed systems

Material Information

Title:
Reliable communication schemes in hyper-rings distributed systems
Creator:
Aly, Ashraf Ahmed
Publication Date:
Language:
English
Physical Description:
x, 60 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Hypercube networks (Computer networks) -- Design ( lcsh )
Ring networks (Computer networks) -- Design ( lcsh )
Electronic data processing -- Distributed processing ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (M.S.)--University of Colorado at Denver, 1999.
Bibliography:
Includes bibliographical references (leaves 58-60).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Ashraf Ahmed Aly.

Record Information

Source Institution:
University of Colorado Denver
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
44104731 ( OCLC )
ocm44104731


Full Text

PAGE 1

RELIABLE COMMUNICATION SCHEMES IN HYPER-RINGS DISTRIBUTED SYSTEMS by Ashraf Ahmed Aly B.S., University, 1986 A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science Computer Science 1999

PAGE 2

This thesis for the Master of Science degree by Ashraf Ahmed Aly has been approved by Tom Altman Mike Radenkovic

PAGE 3

Aly, Ashraf Ahmed (M.S., Computer Science) Reliable Communication Schemes in Hyper-Rings Distributed Systems Thesis directed by Professor Tom Altman

ABSTRACT

In this paper, we study the properties, constructions, fault tolerance, and connectivity of hyper-rings. We usually use a graph to represent the architecture of a network, where nodes represent processors and edges represent communication links between pairs of processors. A graph G = (V, E) is called a hyper-ring with N nodes (N-HR for short) if V = {0, ..., N-1} and E = {{u, v} | (v - u) mod N is a power of 2}. Although the number of edges in a hyper-ring is roughly twice that of a hypercube with the same number of nodes, the diameter and the connectivity of the hyper-ring are shorter and larger, respectively, than those of the corresponding hypercube. These properties make hyper-rings desirable as interconnection networks. Since hypercubes are commonly used networks, the study of embeddings of hypercubes into hyper-rings will be useful. Altman [2] showed that for any n, the n-dimensional hypercube can be embedded in the hyper-ring with 2^n nodes as its subgraph. This paper discusses reliability in hyper-rings. One of the major goals in network design is to find the best way to increase the system's reliability. The reliability of a distributed system depends on the reliabilities of its communication links and computer elements, as well as on the structure of the network and the distribution of its resources, such as programs and data files.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication. Signed Tom Altman

PAGE 4

DEDICATION I dedicate this thesis to my parents, Ahmed and Kamla, my wife, Shahinaz, and my daughter, Nesma, with love.

PAGE 5

ACKNOWLEDGMENTS My thanks to Professor Altman for his support, encouragement, and guidance with my research.

PAGE 6

CONTENTS
Figures ........................................................... ix
Chapter
1. Introduction .................................................... 1
1.1 Research Goal .................................................. 1
2. Hyper-Rings ..................................................... 3
2.1 Advantages of HRs over HCs ..................................... 3
2.2 Construction of HRs ............................................ 4
2.3 The Number of Edges in HRs ..................................... 7
2.4 Embedding of Hyper-Cubes into Hyper-Rings ..................... 10
2.4.1 Embedding of a 4-HC into an 8-HR ............................ 12
2.4.2 Embedding of a 16-HC into a 16-HR ........................... 12
2.5 Embedding of Complete Binary Tree (size 2^n - 1) in N-HR ...... 13
3. Connectivity in HRs ............................................ 15
3.1 Network Flows: Basic Concepts ................................. 15
3.2 Path Model of Network Flow .................................... 17
3.3 Maximum Network Flows and Minimum Capacity Cuts ............... 18
3.3.1 Mathematical Formulation of the Maximal Flow ................ 18

PAGE 7

3.4 Connectivity: Basic Concepts .................................. 20
3.5 Theorem (Bounds on Connectivity) .............................. 20
3.6 Menger's Connectivity Theorems ................................ 21
3.7 Vertex Connectivity of a Graph ................................ 22
3.8 Problems of Edge Connectivity ................................. 23
3.8.1 Finding Edge-Disjoint Paths in a Graph ...................... 23
3.8.2 Edge Connectivity of a Graph ................................ 24
4. Fault-Tolerant Communication in HRs ............................ 26
5. Reliability in Distributed Systems ............................. 29
5.1 Reliable Transmission (Byzantine Generals Problem) ............ 29
5.2 Reliability Optimization ...................................... 30
5.3 Message Transmission .......................................... 32
5.4 Network with Faults ........................................... 34
5.5 Efficiency of Message Transmission ............................ 35
5.6 Optimization Problem Formulation .............................. 37
5.7 Reliability Optimization Algorithm ............................ 39
5.8 Broadcasting with Random Faults ............................... 41
5.9 Tolerating Faulty Nodes/Links ................................. 41
5.10 Optimal Transmission Complexity .............................. 44

PAGE 8

6. Maximum Flow for Odd-Sized HRs ................................. 46
7. Summary ........................................................ 51
Appendix .......................................................... 52
References ........................................................ 58

PAGE 9

FIGURES
Figure
1. Examples of a 16-HC and a 16-HR ................................. 1
2.1 An 11-HR ....................................................... 3
2.2 A 16-HR ........................................................ 4
2.3 A 16-HC ........................................................ 4
2.4 Tree-like method ............................................... 6
2.5 Circular method ................................................ 6
2.6 Doubling method ................................................ 7
2.7 An 11-HR ....................................................... 8
2.8 A 10-HR ........................................................ 9
2.9 A 16-HR ....................................................... 10
2.10 A 4-HC ....................................................... 12
2.11 An embedding of a 4-HC into an 8-HR .......................... 12
2.12 An embedding of a 16-HC into a 16-HR ......................... 12
2.14 An embedding of complete binary tree with 15 nodes into a 15-HR . 14
3.1 Conservation of flow for node 3 ............................... 15
4. A 6-HR ......................................................... 26

PAGE 10

5.1 Comparing complexity through examples ......................... 36
6.1 A 3-HR ........................................................ 48
6.2 A 5-HR ........................................................ 48
6.3 An 11-HR ...................................................... 49

PAGE 11

1. Introduction

1.1 Research Goal

Although the Hyper-Cube is quite powerful from a computational point of view, there are some disadvantages to its use as an architecture for parallel computation. One of the most obvious disadvantages is that the node degree of the Hyper-Cube grows with its size. This means, for example, that processors designed for an N-node Hyper-Cube cannot later be used in a 2N-node Hyper-Cube. Moreover, the complexity of the communications portion of a node can become fairly large as N increases. For example, every node in a 1024-processor Hyper-Cube has 10 neighbors, and every node in a one-million-processor Hyper-Cube has 20 neighbors.

Figure 1. Examples of a 16-HC and a 16-HR.

PAGE 12

In this thesis we present hyper-rings (HRs for short). The number of links between processors in a HR is roughly twice that of a HC with the same number of processors, but the proposed organization possesses a number of advantages over that of a HC. Altman, et al. [2], have shown that for any N we can construct a HR with N processors, whereas a Hyper-Cube must contain exactly 2^n processors for some positive integer n. Figure 1 shows examples of a 16-HC and a 16-HR. We first formally introduce hyper-rings, then study their properties, constructions, connectivity, and fault tolerance, and finally study embeddings of Hyper-Cubes, grid graphs, and complete binary trees into HRs. Hyper-rings and their variations have appeared in the literature under several different names. See Altman [1] for details and references.

PAGE 13

2. Hyper-Rings

Definition 1. A circulant graph G = (V, E) is called a hyper-ring with N nodes (N-HR for short) if V = {0, 1, ..., N-1} and E = {{u, v} | (v - u) mod N is a power of 2}.

Hyper-rings are a multi-machine organization that is, in a sense, a generalization of the standard Hyper-Cube architecture. An example of an 11-HR is shown in Fig. 2.1.

Figure 2.1. An 11-HR.

2.1 Advantages of HRs over HCs

Clearly, hyper-rings are connected regular graphs, as is shown in Fig. 2.2. Although the number of edges in a hyper-ring is roughly twice that of a Hyper-Cube with the same number of nodes, the diameter and the connectivity of the hyper-ring are shorter and larger, respectively, than those of the corresponding Hyper-Cube. These properties make hyper-rings desirable as interconnection networks. A number of topologies have been proposed for interconnecting processors in large-scale distributed systems. The Hyper-Cube (HC for short), shown in Fig. 2.3, is one of the most popular and efficient networks due to its topological richness.
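Definition 1 translates directly into code. As an illustration (this sketch is not part of the thesis), the following builds the adjacency structure of an N-HR from the definition; for the 11-HR of Fig. 2.1, node 0 is adjacent to the nodes a power-of-2 distance away in either direction around the ring:

```python
def hyper_ring(n):
    """Adjacency sets of the N-HR: {u, v} is an edge iff
    (v - u) mod n or (u - v) mod n is a power of 2 (Definition 1)."""
    def is_pow2(x):
        return x > 0 and (x & (x - 1)) == 0
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(n):
            if u != v and (is_pow2((v - u) % n) or is_pow2((u - v) % n)):
                adj[u].add(v)
    return adj

hr11 = hyper_ring(11)
print(sorted(hr11[0]))  # [1, 2, 3, 4, 7, 8, 9, 10]: offsets +-1, +-2, +-4, +-8 (mod 11)
```

Every node of the 11-HR has degree 8, so the graph is indeed regular, as claimed above.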

PAGE 14

Figure 2.2. A 16-HR. Figure 2.3. A 16-HC.

Hyper-Rings and their variations have appeared in the literature under several different names, including an optimal broadcasting scheme. Altman [1] showed that the number of links between processors in a HR is roughly twice that of a HC with the same number of processors, but the proposed organization possesses a number of advantages over that of a HC. In particular, for any N we can construct a HR with N processors, whereas a Hyper-Cube must contain exactly 2^n processors for some positive integer n. Some of the advantages of HRs over HCs are:

- The number of processors in a HR does not have to be a power of 2.
- The number of edges in a HR is roughly twice that of a same-sized HC.
- HCs and complete binary trees can be directly embedded in HRs.

2.2 Construction of HRs

Definition 2. Let N be a positive integer. The binarity of N, written bin(N), is a function whose range is between 0 and ⌈log N⌉ that returns the largest integer k such that N is divisible by 2^k. Observe that any positive integer N may be represented as the following product:

N = 2^bin(N) * M,     (1)

PAGE 15

where M is an odd number. First, let us observe a simple (nonrecursive) construction of N-HRs. The following procedure is straightforward and needs no further explanation [1].

for k := 0 to N-1 do
    for i := 0 to ⌊log N⌋ do
        connect processor k to the one labeled [k + 2^i]_N if there is no link between k and [k + 2^i]_N

Although the above is an effective and correct procedure, it gives us no insight about the actual number of edges placed in the HR. In addition, it reveals no obvious information about the recursive structure of HRs constructed in this fashion. If E(N) denotes the number of edges in an N-HR, then

E(2N) = 2E(N) + 2N.     (2)

The problem, of course, is the construction and the determination of the number of edges in the original N-HR. In fact, the problem may be reduced to examinations of M-HRs, where M is the odd positive integer in Equation (1). The nonrecursive construction of N-HRs may be used for the construction of such M-HRs, with a slight modification. Observe that for odd M of the form 2^k + 1, we can construct the M-HR by using the nonrecursive construction procedure that ignores the last iteration of the inner for loop. Note that in this case the last iteration of the inner for loop produces the same connections as the connections produced in the first iteration of the inner for loop. That is, the inner for loop should be iterated one less time to produce the M-HR. For the construction of an arbitrary N-HR one simply needs to compute k = bin(N), determine the odd M in Equation (1), construct the appropriate M-HR, and recursively apply the doubling construction procedure exactly k times.

There are three general methods to construct HRs.

1. Tree-like method

In the tree-like method, each node is connected to its neighbor if and only if it is a power-of-2 distance away from that node, in both the clockwise and counter-clockwise

PAGE 16

direction, as shown in Fig. 2.4. Obviously, no duplicate edges are introduced.

Figure 2.4. Tree-like method.

2. Circular method

The circular method consists of approximately log N phases. In phase i, the two nodes that are a distance of 2^i away from a given node are connected to it, as is shown in Fig. 2.5, where the connections of 2^1 are displayed.

Figure 2.5. Circular method.

PAGE 17

3. Doubling method

To construct a hyper-ring by using the doubling method, we use the following steps:

(1) make a duplicate copy of N-HR_0 and call it N-HR_1;
(2) relabel each node i in N-HR_0 as 2i (0 ≤ i ≤ N-1);
(3) relabel each node i in N-HR_1 as 2i+1 (0 ≤ i ≤ N-1);
(4) for each i (0 ≤ i ≤ 2N-1), connect node i and node j if j = i+1 modulo 2N.

The above procedure is called the doubling construction procedure, as is shown in Fig. 2.6.

Figure 2.6. Doubling method: (a) a 3-HR; (b) a copy of the 3-HR; (c) a 6-HR constructed from the two connected 3-HRs.

2.3 The Number of Edges in HRs

Altman, et al. [2], showed that the number of edges in HRs does not grow monotonically with the number of nodes.
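This non-monotonic growth can be checked directly from Definition 1. The sketch below (illustrative code, not from the thesis) counts the edges of small HRs by brute force; note that the 16-HR has fewer edges than the 15-HR:

```python
def hr_edge_count(n):
    """Number of edges of the N-HR, counted directly from Definition 1:
    {u, v} is an edge iff one of the cyclic distances is a power of 2."""
    def is_pow2(x):
        return x > 0 and (x & (x - 1)) == 0
    return sum(1 for u in range(n) for v in range(u + 1, n)
               if is_pow2((v - u) % n) or is_pow2((u - v) % n))

for n in range(14, 18):
    print(n, hr_edge_count(n))  # 14->56, 15->60, 16->56, 17->68: a dip at N = 16
```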

PAGE 18

Theorem 1. The number of edges in an N-HR, N ≥ 2, is given by:

E(N) = N(log N - 1/2)   if the Hamming weight of N is 1,
E(N) = N⌊log N⌋         if the Hamming weight of N is 2,
E(N) = N⌈log N⌉         if the Hamming weight of N is 3 or more,

where HW(N) denotes the Hamming weight of N.

Proof: If the Hamming weight of N is 1, then N = 2^n, n ≥ 1. Observe that during the execution of the nonrecursive construction procedure of the N-HR, no multiple edges could possibly be constructed until the very last iteration of the inner for loop. During that iteration, exactly half of the edges would be eliminated. It follows that the number of (nonreplicated) edges when N is a power of 2 is N log N - N/2 = N(log N - 1/2).

A Hamming weight of 2 implies that N can be decomposed into a sum of 2^p and 2^q. Moreover, this decomposition must be unique. Because N = 2^p + 2^q, for each pair of nodes that were 2^p and 2^q positions apart, exactly one extra edge is eliminated due to duplication in the nonrecursive construction procedure of N-HRs. Hence, the number of edges, N⌈log N⌉ - N, in this case, is equal to N⌊log N⌋.

In the case where the Hamming weight is 3 or more, the nonrecursive construction procedure of HRs does not attempt to create any duplicated edge even if the condition is removed. Hence, the number of edges in this case is N⌈log N⌉.

Example 1

Figure 2.7. An 11-HR.

PAGE 19

Assume we have an 11-HR, as is shown in Fig. 2.7. The Hamming weight of N = HW(11) = 3 (the number of 1's in the base-2 representation). By using Theorem 1, we can find the number of edges in the 11-HR:

HW(11) = 3
The number of edges in the 11-HR = N⌈log N⌉, since HW(N) ≥ 3.

Hence, the number of edges is 11 × 4 = 44.

Example 2

Examine the 10-HR, as is shown in Fig. 2.8. Number of nodes = N = 10. (10)_10 = (1010)_2.

Figure 2.8. A 10-HR.

The Hamming weight of N = HW(10) = 2 (the number of 1's in the base-2 representation). The number of edges in a 10-HR = N⌊log N⌋ = 10 × 3 = 30.
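The case analysis of Theorem 1 is easy to mechanize. The sketch below (an illustration, not from the thesis) computes E(N) from the Hamming weight of N, reproduces Examples 1 and 2, and confirms the recurrence E(2N) = 2E(N) + 2N of Equation (2):

```python
import math

def hamming_weight(n):
    """Number of 1's in the base-2 representation of n."""
    return bin(n).count("1")

def edge_count(n):
    """E(N) per Theorem 1, split on the Hamming weight of N."""
    hw = hamming_weight(n)
    if hw == 1:                        # N is a power of 2: N(log N - 1/2)
        return n * (n.bit_length() - 1) - n // 2
    if hw == 2:
        return n * math.floor(math.log2(n))
    return n * math.ceil(math.log2(n))

print(edge_count(11), edge_count(10))                 # 44 30
print(edge_count(22) == 2 * edge_count(11) + 2 * 11)  # True: recurrence (2)
```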

PAGE 20

Example 3

Figure 2.9. A 16-HR.

Examine the 16-HR as shown in Fig. 2.9. By using Theorem 1, we have:

The number of nodes = N = 16, (16)_10 = (10000)_2, so HW(16) = 1 (the number of 1's in the base-2 representation).
The number of edges = N(log N - 1/2).
log_2 N = log 16 = 4.
The number of edges = 16 × (4 - 1/2) = 56.

2.4 Embedding of Hyper-Cubes into Hyper-Rings

Since Hyper-Cubes are commonly used networks, the study of embeddings of Hyper-Cubes in hyper-rings will be useful. Altman [2] showed that for any n, the n-dimensional Hyper-Cube can be directly embedded in the hyper-ring with 2^n nodes.

PAGE 21

The n-dimensional Hyper-Cube (n-HC for short) is a graph with N = 2^n nodes labeled by the 2^n binary numbers from 0 to 2^n - 1, where there exists an edge between two nodes if and only if the binary representations of their labels differ by exactly one bit. The HC can be used to simulate many networks since it contains or nearly contains those networks as subgraphs. This is one of the main reasons why the HC is a powerful network for parallel computation. From the structures of N-HCs and N-HRs, it is clear that for N = 2^n, the N-HR contains the N-HC as a subgraph, making the HR an even more powerful network for parallel computation.

1-dimensional grid (size 1 × k)

A 1-dimensional grid of size 1 × k can be embedded as a subgraph into a k-HR in an obvious way. That is, there are no wasted nodes in the HR. Note that for HCs, if k is not a power of 2, e.g., k = 2^m + 1, then the HC will need 2^(m+1) = 2k - 2 nodes.

2-dimensional grid (size a × b)

To embed a grid of size a × b as a subgraph into a HR, we need min(a × 2^⌈log b⌉, b × 2^⌈log a⌉) nodes.

3-dimensional grid (size a × b × c)

The number of nodes required in the HR to embed an a × b × c grid is min(a × 2^⌈log b⌉ × 2^⌈log c⌉, b × 2^⌈log a⌉ × 2^⌈log c⌉, c × 2^⌈log a⌉ × 2^⌈log b⌉).

In general, when embedding k-dimensional grids into HRs we have one extra degree of freedom, i.e., the size of one of the dimensions can be the same as in the original grid, which is not necessarily a power of 2. It follows that more nodes are wasted when the grid embedding is done in HCs than in HRs (unless each of the grid dimensions is a power of 2).
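The grid-embedding counts above follow one rule: one chosen dimension keeps its exact size while every other dimension is rounded up to a power of 2, and we minimize over the choice of the free dimension. A sketch of this computation (illustrative code, not from the thesis; the formulas are as reconstructed above):

```python
from math import ceil, log2

def pow2_ceil(x):
    """Smallest power of 2 that is >= x."""
    return 1 << ceil(log2(x)) if x > 1 else 1

def hr_nodes_for_grid(dims):
    """Minimum HR size for embedding the grid: one dimension stays exact,
    the others are rounded up to powers of 2; minimize over the free one."""
    best = None
    for free in range(len(dims)):
        n = dims[free]
        for i, d in enumerate(dims):
            if i != free:
                n *= pow2_ceil(d)
        best = n if best is None else min(best, n)
    return best

print(hr_nodes_for_grid([3, 5]))     # min(3*8, 5*4) = 20
print(hr_nodes_for_grid([3, 5, 6]))  # 160: keep the 5 exact, round 3->4 and 6->8
```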

PAGE 22

Below are several examples of direct embeddings of HCs into HRs.

2.4.1 Embedding of a 4-HC into an 8-HR

Figure 2.10. A 4-HC. Figure 2.11. An embedding of a 4-HC into an 8-HR.

Fig. 2.10 represents a 4-HC, and Fig. 2.11 represents an embedding of a 4-HC into an 8-HR.

2.4.2 Embedding of a 16-HC into a 16-HR

Figure 2.12. An embedding of a 16-HC into a 16-HR.

PAGE 23

2.5 Embedding of a Complete Binary Tree (size 2^n - 1) in an N-HR

It is known that a complete binary tree with 2^n - 1 nodes cannot be embedded as a subgraph into a 2^n-HC. Altman, et al. [1], showed that it can be embedded as a subgraph into a (2^n - 1)-HR. The address of each node of the complete binary tree is given by the following procedure:

(1) The address of the root is 0.
(2) If the address of a node at depth k is i, then the addresses of its left child and its right child are i + 2^k and i + 2^(k+1), respectively.

Theorem 2. Suppose that the address of each node of the complete binary tree with 2^n - 1 nodes is given by the above procedure. Then the addresses of all nodes in the complete binary tree are distinct. Furthermore, the address of any node at depth k (0 ≤ k ≤ n-1) is between 2^k - 1 and 2^(k+1) - 2.

Proof. From the procedure above, the address of any node is different from the address of any ancestor of the node. It is also immediate that the address of any node in its left subtree is different from the address of any node in its right subtree. Hence, all addresses in the complete binary tree are distinct. The latter assertion of the theorem is proved by induction on k. For k = 0, the assertion obviously holds. Suppose that up to depth k the assertion holds. From the induction hypothesis, the range of the addresses at depth k + 1 is between 2^(k+1) - 1 and 2^(k+2) - 2.

The address of a node in a complete binary tree is different from the address of its parent by a power of 2. From this fact and Theorem 2, a complete binary tree with 2^n - 1 nodes can be embedded as a subgraph into a (2^n - 1)-HR. The embedding of a complete binary tree into the HR is shown in Fig. 2.14.
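The addressing procedure and the invariant of Theorem 2 can be checked mechanically. An illustrative sketch (not from the thesis) for the 15-node tree of Fig. 2.14:

```python
def tree_addresses(n):
    """Addresses of the complete binary tree with 2**n - 1 nodes,
    assigned by the procedure above, grouped by depth."""
    levels = [[0]]                              # the root has address 0
    for k in range(n - 1):
        nxt = []
        for i in levels[k]:
            nxt += [i + 2**k, i + 2**(k + 1)]   # left child, right child
        levels.append(nxt)
    return levels

levels = tree_addresses(4)                      # the 15-node tree
for k, level in enumerate(levels):
    # Theorem 2: depth-k addresses are distinct and lie in [2**k - 1, 2**(k+1) - 2]
    assert len(set(level)) == len(level)
    assert all(2**k - 1 <= a <= 2**(k + 1) - 2 for a in level)
print(levels[1], levels[2])  # [1, 2] [3, 5, 4, 6]
```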

PAGE 24

Figure 2.14. An embedding of a complete binary tree with 15 nodes into a 15-HR.

PAGE 25

3. Connectivity in HRs

Introduction: In the case of computer networks, the edge or node connectivity corresponds to the minimum number of communication links or computer centers which, if removed, would disconnect the network. This criterion is most meaningful if all centers in the network are of equal importance. Otherwise, it is desirable for certain pairs of nodes in the graph to have a larger edge or node connectivity than others. The minimum number of nodes in any cut of a graph is equal to the maximum number of node-disjoint paths between nodes. Similarly, the edge connectivity between a pair of nodes is equal to the maximum number of edge-disjoint paths between that pair of nodes. Menger's fundamental theorem has simplified the determination of the node and edge connectivity between any pair of nodes in a graph [24].

3.1 Network Flows: Basic Concepts

A network is a graph in which a flow can take place in the edges of the graph. Every network has an origin and a terminus called the source and sink, respectively. A network can be thought of as a pipeline system in which the edges represent pipelines. The source is the inlet of water, the sink is the outlet of water, and all other nodes are connections between the lengths of pipe [22].

Figure 3.1. Conservation of flow for node 3.

PAGE 26

A flow in a network may be a flow of fluids, electricity, or some other commodity. The corresponding units of the flow may be cubic feet per second, amperes, or dollars per month, etc. In general, a flow is a rate. One of the major assumptions about networks is the conservation of flow in and out of a node: the flow into a node is equal to the flow out of it. If the number above each edge in Fig. 3.1 represents the flow in the edge, then the conservation of flow for node 3 is observed.

The upper limit on the flow in an edge is the capacity of the edge. The flow capacity of an edge is a positive number and may be infinite, in which case the edge is said to be uncapacitated. If an edge has a flow capacity of zero, removing the edge from the network does not alter the potential flow in the network. Given a chain connecting the source and sink of a network, the maximum flow that can be put across the chain is restricted by the edge of the chain with the smallest capacity. If (x, y) is an edge of G, then the capacity of (x, y) is given by c(x, y). The flow in an edge (x, y) is designated by f(x, y) and is related to its capacity by f(x, y) ≤ c(x, y).

A network is a graph G(V, E) with nonnegative weights assigned to each edge. Alternatively, a network is an ordered triple (V, E, W) where V is a set of vertices, E a set of directed edges on V, and W is a function mapping the elements of E to the nonnegative reals. If we interpret the edge weights as limits on the capacities of the edges to transmit a commodity of some type (such as vehicular traffic, data, or fluids), the network is called a capacitated network. The determination of the maximum amount of traffic that can be transmitted between a given pair of vertices in a capacitated network is called the maximum network flow problem. The problem of network flows is a basic problem in graph theory and combinatorial optimization. A basic variation of the network flow problem is defined as follows.
Let G(V, E) be a directed graph with two distinguished vertices, S (the source) with in-degree 0, and T (the sink) with out-degree 0. Each edge x in E has an associated positive weight c(x), called the capacity of x. The capacity measures the amount of flow that can pass through an edge. We call such a graph a network. For convenience, we assign a capacity of 0 to non-existing edges. A flow is a function f on the edges of the network that satisfies the following two conditions:

PAGE 27

(1) 0 ≤ f(x) ≤ c(x): the flow through an edge cannot exceed the capacity of that edge.

(2) For all v ∈ V - {S, T}, Σ_u f(u, v) = Σ_w f(v, w): the total flow entering a vertex is equal to the total flow exiting this vertex (except for the source and sink).

These two conditions imply that the total flow leaving S is equal to the total flow entering T. The problem is to maximize this flow. One way to visualize this problem is to think of the network as a network of water pipes. The goal is to push as much water through the pipes as possible. If too much water is pushed to the wrong area, the pipes will burst. We need to introduce some terminology, beginning with a precise definition of what we mean by a traffic flow. There are two ways to define this: either by a path model of flow, or using a conservation model. While the two methods are mathematically equivalent, only the path model of flow will be discussed here.

3.2 Path Model of Network Flow

Let N(V, E) be a capacitated network. Let S and T be a pair of distinct vertices in V(N), and let P be a set of paths from S to T. If x is an edge in E(N), we denote the (nonnegative integral) capacity of x by C(x), and we denote the set of paths in P that include the edge x by P(x). Let f be a nonnegative integral function defined on P. We denote the sum of the values of f(P), taken over all paths P in P(x), by sum(f, x). We call f a flow on P if for every edge x in E(N), sum(f, x) is at most C(x). If f is a flow on P and P is a path in P, we call f(P) the flow on P. If x is an edge in E(N), we call sum(f, x) the flow on x. The set of paths P together with a flow f on P is called a path flow on N and is specified by the quintuple (N, S, T, P, f). The value of a path flow (N, S, T, P, f) is the sum of the values of f(P) taken over all paths P in P and is denoted by value(P, f), the other parameters being left implicit.
The vertices S and T are called the source and sink of the path flow, respectively; see [21] for details.
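The path-model definitions can be made concrete with a small check. In the sketch below (an illustration with made-up paths and capacities, not from the thesis), sum(f, x) is accumulated per edge and compared against C(x):

```python
def is_path_flow(paths, f, capacity):
    """Check the path-model condition: for every edge x,
    sum(f, x) -- the total flow of the paths using x -- is at most C(x)."""
    load = {}
    for p, fp in zip(paths, f):
        for edge in zip(p, p[1:]):          # consecutive vertices form edges
            load[edge] = load.get(edge, 0) + fp
    return all(load[e] <= capacity.get(e, 0) for e in load)

paths = [("S", "A", "T"), ("S", "B", "T"), ("S", "A", "B", "T")]
f = [2, 3, 1]                               # f(P) for each path P in P
capacity = {("S", "A"): 3, ("A", "T"): 2, ("S", "B"): 3,
            ("B", "T"): 4, ("A", "B"): 1}
print(is_path_flow(paths, f, capacity))     # True; value(P, f) = 2 + 3 + 1 = 6
```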

PAGE 28

3.3 Maximum Network Flows and Minimum Capacity Cuts

The fundamental bound on the maximum value of a path flow or a conservative flow on a network N with source S and sink T can be informally stated as follows: the maximum flow value is bounded above by the tightest "bottleneck" between S and T. We now formalize this intuitive notion. Let (N, S, T, P, f) be a conservative flow. If X is a set of vertices in V(N), then X^c denotes the complement of X with respect to V(N), that is, V(N) - X. We define a cut (X, X^c) as the set of edges {(x, y) | x ∈ X and y ∈ X^c}. If S is in X and T is in X^c, then (X, X^c) is called an S-T cut. If (X, X^c) is an S-T cut, there is no path from S to T in N(V, E - (X, X^c)). We define the capacity of a cut, denoted by cap(X, X^c), as the sum of the capacities of the edges in (X, X^c). We denote the capacity of a minimum-capacity S-T cut in N by min cap(N, S, T). The flow on (X, X^c), denoted flow(X, X^c), is the sum of the edge flows in (X, X^c). The net flow across (X, X^c) is net flow(X, X^c) = flow(X, X^c) - flow(X^c, X). See [21] for details.

3.3.1 Mathematical Formulation of the Maximal Flow Problem: The Max-Flow Min-Cut Theorem

The maximal flow problem is to find a feasible equilibrium or steady-state flow pattern through a network such that the total flow from the source to the sink is maximized. Should there exist more than a single source (sink) node, then a supersource (supersink) node is connected to each of the sources (sinks). The edges making the connection have infinite flow capacity [22]. The flow out of a node x is Σ_y f(x, y). From the conservation-of-flow assumption, for every node x other than the source and sink,

Σ_y f(x, y) - Σ_y f(y, x) = 0.

PAGE 29

In addition, the flow out of the source node s equals the flow into the sink node t:

Σ_y f(s, y) = v = Σ_y f(y, t).

The maximal flow problem can be stated as: maximize v subject to

Σ_y f(x, y) - Σ_y f(y, x) =  v  if x = s,
                             0  if x ≠ s, t,
                            -v  if x = t,
0 ≤ f(x, y) ≤ c(x, y).

An observation can be made about the upper bound on the flow through a network. The sum of the capacities on the edges leaving the source provides an upper bound on the flow out of the source node. Similarly, the sum of the capacities of the edges entering the sink provides an upper bound on the flow into the sink. Since the flow out of the source equals the flow into the sink, the smaller of the two sums is an upper limit on the flow in the network.

A cut is defined as any set of edges containing at least one edge from every chain from source to sink. The capacity or value of the cut is the sum of the flow capacities of the edges in the direction specified by the cut. Thus, recalling the earlier observation, a cut across the edges leaving the source has a value that is an upper bound on the flow in the network. A cut separating the source and sink that has the smallest capacity is called a minimum cut. The max-flow min-cut theorem states that for any network, the value of the maximal flow from source to sink is equal to the capacity of the minimum cut separating the source and sink. The capacity of any cut in a network specifies an upper bound on the flow in the network.
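As an illustration of the max-flow min-cut theorem (the following is a standard Edmonds-Karp sketch, not an algorithm from the thesis), the flow found by repeated shortest augmenting paths equals the capacity of the bottleneck cut:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max flow; `capacity` maps directed edges (u, v) -> capacity."""
    res, adj = {}, {}                    # residual capacities, adjacency
    for (u, v), c in capacity.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)        # reverse edge for pushing flow back
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total                 # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)
        for u, v in path:                # augment along the path
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
        total += bottleneck

capacity = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 3, ("b", "t"): 5}
print(max_flow(capacity, "s", "t"))  # 6: equals the cut {("s","a"), ("s","b")}
```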

PAGE 30

3.4 Connectivity: Basic Concepts

Previously, we defined the vertex connectivity, VC(G), and edge connectivity, EC(G), of a graph G. These invariants measure the susceptibility of a graph to disconnection under vertex or edge deletion. From one viewpoint, they calibrate the vulnerability of the graph, considered as a communication network, to disruption. It is remarkable that precisely these measures of strength of connection also determine the diversity of routing in the graph, that is, the number of disjoint paths between pairs of vertices [21]. We now describe the connectivity of a graph G(V, E) in terms of parameterized versions of these invariants, which will be useful later in developing algorithms for VC and EC.

If u and v are nonadjacent vertices in G, we define a vertex-disconnecting set (or vertex separator) between u and v in G as a set of vertices S in V(G) such that u and v are in different components of G - S. S is said to disconnect (or separate) the pair of vertices u and v. An edge-disconnecting set between a pair of vertices u and v in G is similarly defined as a set of edges T in E(G) such that u and v are in different components of G - T. For nonadjacent vertices u and v, we define the vertex connectivity between u and v, denoted VC(G, u, v), as the cardinality of the smallest vertex-disconnecting set between u and v. On the other hand, EC(G, u, v) denotes the cardinality of the smallest edge-disconnecting set between the pair of vertices u and v. We can then define VC(G) as the minimum of VC(G, u, v), taken over all distinct, nonadjacent vertices u and v in G (or |V| - 1 if G is a complete graph), while EC(G) is the minimum of EC(G, u, v) taken over all distinct vertices u and v in G.

3.5 Theorem (Bounds on Connectivity)

For every graph G(V, E), VC(G) ≤ EC(G) ≤ deg_min(G), where deg_min(G) is the minimum vertex degree of G. There are a variety of conditions on the degree sequence of a graph that guarantee lower bounds on connectivity, such as the following.

PAGE 31

Theorem (Connectivity Lower Bound). Let G(V, E) be a graph of order greater than one, and let n satisfy 1 ≤ n ≤ |V| - 1. If deg_min(G) ≥ (|V(G)| + n - 2)/2, then VC(G) ≥ n.

By the previous theorem, this also establishes a lower bound on the edge connectivity. The precise relation between the connectivity of a graph and the number of disjoint paths that exist between pairs of vertices in the graph is the subject of the following classical theorem. Let us say a pair of paths are vertex-disjoint if they have at most their end points in common, while a pair of paths are edge-disjoint if they have no edges in common.

Theorem (Path Diversity Characterization of Connectivity). Let G(V, E) be a graph and let k be an integer. Then, (1) if the removal of k (but not fewer than k) vertices (edges) disconnects a pair of vertices u and v in G, there are at least k vertex- (edge-) disjoint paths in G from u to v; (2) if G is k vertex- (edge-) connected, there are at least k vertex- (edge-) disjoint paths between every pair of vertices in G [21].

3.6 Menger's Connectivity Theorems

Let λ_e(X, Y) denote the maximum number of edge-disjoint paths between nodes X and Y. Let λ_v(X, Y) denote the maximum number of vertex-disjoint paths from X to Y. Let K_e(X, Y) denote the minimum size of those edge cut-sets which partition the graph so that X is in one component and Y is in the other. Also, we define K_v(X, Y) to be the minimum size of those vertex-cuts which separate a graph G into two components, one containing X and the other containing Y. No such vertex-cut exists if (X, Y) is an edge of the graph [20].

Let G = (V, E) be an undirected graph with X, Y ∈ V. Then K_e(X, Y) ≥ λ_e(X, Y), because every path from X to Y uses at least one edge of a set which disconnects


G, and no two paths use the same edge; thus Ke(X, Y) = λe(X, Y) if (X, Y) is not in E. The following is the statement of Menger's edge-connectivity theorem: a graph is K-edge-connected if and only if any two distinct vertices are connected by at least K edge-disjoint paths.

3.7 Vertex Connectivity of a Graph

The usual formula for the vertex connectivity of a graph G = (V, E) in terms of its pairwise vertex connectivities is VC(G) = min {VC(G, u, v)}, where (u, v) is not in E(G). The following procedure improves on this slightly. For convenience, we assume that G is not a complete graph. We invoke the function VC(G, u, v) to calculate the pairwise vertex connectivities [21].

Function VC (G)
(* Returns in VC the vertex connectivity of G *)
Var G: Graph
    i, j: 1 .. |V(G)|
    VC: integer

set VC to |V(G)| − 1
if G is complete then return
for i = 1 to |V(G)| do
    for j = i + 1 to |V(G)| do
        if i and j not adjacent then set VC to min {VC, VC(G, i, j)}
    if i > VC then return
End Function VC
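The pairwise quantities VC(G, u, v) that this procedure invokes can be computed by maximum flow after splitting each vertex into an in/out pair joined by a unit-capacity edge. The following Python sketch is our illustration, not the thesis's code (all function names are ours); it implements both the pairwise routine and the outer loop above, including the early exit once i exceeds the current bound.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a residual capacity dict {u: {v: c}}."""
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # augment by one unit along the path found
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] = cap[v].get(u, 0) + 1
            v = u
        flow += 1

def vc_pair(adj, s, t):
    """VC(G, s, t) for nonadjacent s, t via node splitting:
    each vertex v becomes (v,'in') --1--> (v,'out')."""
    INF = len(adj) + 1
    cap = {}
    for v in adj:
        cap[(v, 'in')] = {(v, 'out'): 1}
        cap.setdefault((v, 'out'), {})
    for u in adj:
        for v in adj[u]:
            cap[(u, 'out')][(v, 'in')] = INF
    return max_flow(cap, (s, 'out'), (t, 'in'))

def vertex_connectivity(adj):
    """VC(G) = min over nonadjacent pairs; |V|-1 for a complete graph."""
    nodes = sorted(adj)
    vc = len(nodes) - 1
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                vc = min(vc, vc_pair(adj, u, v))
        if i + 1 > vc:          # early exit, as in Function VC above
            break
    return vc
```

For example, a 4-cycle has vertex connectivity 2, while a path on three vertices has vertex connectivity 1.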


3.8 Problems of Edge Connectivity

We can calculate EC(G) by finding a maximum flow on a network G' obtained from G by replacing each undirected edge {u, v} in G by a pair of directed edges (u, v) and (v, u) in G', and assigning a unit capacity to each edge of G'. The pairwise edge-connectivity, pairwise edge-disjoint path, and pairwise edge-disconnecting set problems can then be solved using flow methods. The flow-bearing paths of a flow on G' correspond directly to a set of edge-disjoint paths in G. The edges of a minimum cut between a pair of vertices in G' determine a minimum-cardinality edge-disconnecting set in G between those vertices.

3.8.1 Finding Edge-Disjoint Paths in a Graph

Let G = (V, E) be a connected undirected graph. Two paths in G are said to be edge-disjoint if they do not contain the same edge. Let O be the set of vertices in V with odd degrees. We first claim that the number of vertices in O is even. To prove this claim, notice that summing the degrees of all vertices gives exactly twice the number of edges (since each edge is counted twice). Since the vertices of even degree contribute an even number to this sum, there must be an even number of vertices of odd degree. We now prove the following theorem [19].

Theorem: Let G = (V, E) be a connected undirected graph, and let O be the set of vertices with odd degrees. We can divide the vertices in O into pairs and find edge-disjoint paths connecting the vertices in each pair.

Proof: The proof is by induction on the number of edges.

Induction hypothesis: The theorem is true for all connected undirected graphs with < m edges.

Consider a connected undirected graph G with m edges, and let O be the set of odd vertices. If O is empty, then the theorem is trivially true. Otherwise, take any two vertices in O. Since G is connected, there is a path connecting them. Remove the whole path from G. The remaining graph has fewer edges.
We would like to use the induction hypothesis to find the paths for the rest of the odd vertices and to complete the proof. The problem, however, is that by removing the path we may have disconnected the graph, and the induction hypothesis applies only to connected graphs. We have to be very careful about using the induction hypothesis


correctly. We can avoid this difficulty in this case by changing the hypothesis and adapting it to our needs. The problem we encountered was with the connectivity requirement, so let us remove it. We now have the following induction hypothesis:

Revised induction hypothesis: The theorem is true for all undirected graphs having < m edges.

This is obviously a stronger theorem. Its proof, on the other hand, is simpler. Consider again an undirected graph with m edges, and O as before. The graph may not be connected; in this case, the graph is partitioned into several connected components. We will take two odd vertices from the same component. Since each component is a connected graph by itself, it must have an even number of odd vertices. Hence, if there are any odd vertices, we can find two of them in the same component. Now we are essentially done. Since the two chosen vertices are in the same component, we can connect them by a path. We then remove the path. The graph now has fewer than m edges, and we can use the induction hypothesis because it no longer requires connectivity. Thus, in the remaining graph, we can pair the odd vertices with edge-disjoint paths. We then add back the path we removed, which completes the proof.

3.8.2 Edge Connectivity of a Graph

The algorithm shown below finds the edge-connectivity K(G) of an undirected graph G, where K(G) denotes the edge-connectivity of an arbitrary graph G = (V, E), and G' denotes the digraph obtained by replacing each edge of G by two antiparallel edges, each of unit capacity. The algorithm finds the value of a maximum flow F for the network consisting of G' with a source x and a sink y. The algorithm has O(n|E|) running time [20].


Algorithm:
    Input G and construct G'
    Specify u; K ← |E|
    for all v ∈ V − {u} do
    begin
        find F for (G' with x = u and y = v)
        if F < K then K ← F
    end
    Output K as the edge-connectivity of G
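A runnable sketch of the same idea, with names of our own choosing: fix one vertex u and take the minimum, over all other vertices v, of the unit-capacity max flow from u to v on the digraph G' with antiparallel unit arcs.

```python
from collections import deque

def unit_max_flow(adj, s, t):
    """Edmonds-Karp max flow from s to t on G': each undirected edge
    of the adjacency-set graph becomes two unit-capacity arcs."""
    cap = {x: {y: 1 for y in adj[x]} for x in adj}
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            x = q.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    q.append(y)
        if t not in parent:
            return flow
        # augment by one unit along the augmenting path
        y = t
        while parent[y] is not None:
            x = parent[y]
            cap[x][y] -= 1
            cap[y][x] = cap[y].get(x, 0) + 1
            y = x
        flow += 1

def edge_connectivity(adj):
    """K(G): fix u, then min over v != u of max-flow(u, v);
    the minimum degree is an upper bound, used as the start value."""
    nodes = list(adj)
    u = nodes[0]
    best = min(len(adj[v]) for v in nodes)
    for v in nodes[1:]:
        best = min(best, unit_max_flow(adj, u, v))
    return best
```

Fixing u is sound because any minimum edge cut separates u from some vertex, so the minimum over v of the u-to-v flows already attains K(G).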

4. Fault-Tolerant Communication in HRs

A major goal in the design of a multiprocessor network, which may be defined as a system of independent processors or computers that communicate with one another via a communication network, is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or redundant communication links. When an active system component fails, its tasks are dynamically transferred to spare components, thereby allowing system operation to continue, possibly in a degraded mode. In this thesis, we present an approach to the analysis and synthesis of highly fault-tolerant multiprocessor networks whose control is decentralized, i.e., distributed throughout the system. Each processor is assumed to be capable of acting as a local supervisor of reconfiguration and recovery. The system is represented by a facility graph whose nodes denote processors and whose edges denote communication paths.

[Figure 4.a: the graph G. Figure 4.b: the faulty configuration G_F.]


Example 1: Figure 4.a shows a 6-HR graph. The nodes of G correspond to the microcomputers, and the edges of G correspond to the direct links between the microcomputers. Suppose that the initial state of the system G is S = (a, b, c, d, e, f) = (1, 2, 3, 4, 5, 0). The graph G in Fig. 4.a is the corresponding initial configuration. The fault F affecting node a changes S to S_F = (−1, 2, 3, 4, 5, 0), and yields the configuration G_F of Figure 4.b.

We require each node Y to be capable of acting as a recovery supervisor in a small part of the system, termed its neighborhood N(Y). We restrict N(Y) to containing Y and all nodes adjacent to Y. Each fault-free node Y may periodically test itself. It is assumed that if Y is faulty, its state S(Y) changes to −1, and this fact can be detected by all nodes in the neighborhood N(Y). Every active fault-free node Y_i periodically polls and records the status of all other nodes in N(Y_i). Y_i also stores and periodically updates backup files for certain nodes in N(Y_i), which allow Y_i, as a local supervisor, to reassign the tasks of these nodes when they become faulty. Although it is generally preferable to limit the role of supervisor to active fault-free nodes, there are cases where it is useful to allow spares to act as supervisors. Distributed recovery typically proceeds in several steps as follows.

(1) A fault F occurs, changing the system configuration.
(2) An active node Y_i detects a faulty node Y_j in its neighborhood N(Y_i), assumes the role of local supervisor, and initiates recovery. Usually, Y_i attempts to find a spare node Y_k in N(Y_i) such that Y_k can assume the previous state of the faulty node Y_j.
(3) If no suitable spare is available, either because no spares are present in N(Y_i), or because the available spares do not have the proper connectivity, Y_i makes an appropriate change to the state of N(Y_i) and relinquishes the role of local supervisor, which must then be assumed by some other node.
It should be noted that the initial fault affecting node Y_j in step (1) above corresponds to the absence of a node in state S_j; subsequent reassignment of states during recovery may also lead to other states being temporarily missing from the system configuration. For example, if supervisor node Y_i changes the state of another


node Y_k from S_k to the state S_j of the faulty node Y_j, the old state S_k of Y_k is unassigned. These considerations lead to the important concept of an error condition, which is defined formally as follows. Let G_F be a fault-tolerant system whose configuration is as in Fig. 4.b. An error E_j exists if G_F has no active node in state S_j. Thus, error detection and recovery will be based on the identification of configurations in which one of the possible active states is missing. A particular error E may occur with many different system configurations. The fastest recovery is achieved when a spare node Y_k is immediately used to replace a faulty node Y_j by assuming its state, as in step (2) above. In this case, we say that Y_k covers Y_j. In general, a node Y_i covers a node Y_j in a graph G if Y_i is adjacent to every node in G that is adjacent to Y_j. See [23] for details.
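The covering test is straightforward to state in code. A minimal sketch, assuming the facility graph is an adjacency dict (function name is ours):

```python
def covers(adj, yi, yj):
    """True if node yi covers node yj in graph G: yi is adjacent to every
    node of G that is adjacent to yj, so yi can assume yj's state without
    losing any of yj's communication links."""
    return all(x == yi or x in adj[yi] for x in adj[yj])
```

In a 4-cycle, for instance, a node covers the node opposite to it (they share both neighbors), but it does not cover an adjacent node.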


5. Reliability in Distributed Systems

One of the major goals of the reliability engineer is to find the best way to increase a system's reliability. Usually, how well a system meets its design objective is expressed as a function of the reliability of its subsystems or components. Thus, there are many factors that affect the reliability of a system. It is well accepted that improving the reliability of a system can be achieved by one or a combination of the following techniques:

- reducing the complexity of the system
- improving the reliability of components or subsystems
- implementing large safety factors
- using redundant components (standby, switched)
- practicing a planned maintenance and repair schedule.

Many of these techniques are in conflict with others, and it is extremely important to consider all aspects of the system. Computing exact reliability and performing optimization is a complex process. Some simplifying assumptions about the system network configurations are made so that the problem can be handled with reasonable effort.

The reliability of a distributed system depends on the reliabilities of its communication links and computer elements, as well as on the distribution of its resources, such as programs and data files. A useful measure of reliability in distributed systems is the terminal reliability between a pair of nodes, which is the probability that at least one communication path exists between these nodes. An interesting optimization problem is that of maximizing the terminal reliability between a pair of computing elements under a given budget constraint. Analytical techniques to solve this problem are applicable only to special forms of reliability expressions [24].

5.1 Reliable Transmission (Byzantine Generals Problem)

Consider a distributed system of processors whose interconnection network is given by an undirected graph G = (V, E).
Assume there is a processor at each vertex of G which can communicate directly with the neighboring processors at adjacent


vertices. To transmit a message to a distant processor, a processor includes the intended route with its message and then routes the message through the network by transmitting it to the first processor/vertex on the route, which proceeds in a similar manner, and so on, until the message eventually reaches its destination. Feng [25] showed that the difference between the Byzantine problem and fault-tolerant broadcasting is that the source may be faulty in the former case, but the source is always good in the latter case. We are interested in the kinds of problems that can arise when some of the processors in the network may be faulty. We will assume that faulty processors can exhibit a bizarre variety of deleterious effects. They may lose messages, route messages incorrectly, garble messages, or even generate multiple incorrect copies of messages. It is not at all obvious whether reliable transmission can be achieved in the presence of such potential chaos. But, surprisingly, we can show that if the interconnection graph for the network has sufficiently high vertex connectivity, then the reliable processors in the network can communicate reliably even in the presence of severely unreliable components.

5.2 Reliability Optimization

The development of computer networking and low-cost processing devices has led to an increasing interest in distributed systems. This trend is becoming more important because:

- it provides a practical means for sharing expensive information resources, computing power, and information handling devices;
- it provides higher performance by permitting concurrent operations on the processing elements;
- it provides more reliable operational capabilities by the replication of resources and the rerouting of tasks that are assigned to a faulty processing element to some other fault-free processing elements of the distributed system.
The reliability of a distributed system depends not only on the reliability of the communication network, but also on the processing nodes' reliabilities and the


distribution of other resources, such as programs and data files in the network. Therefore, the reliability of a distributed system can be measured by the probability of successful execution of its programs. As a result, the successful execution of a task requires that the host processor on which the task is running is operational, that all the processing nodes on which the required data files reside are operational, and that all the links which will be used in data-file transfers are operational [25].

The reliability of distributed systems has been studied using different measures, such as the survivability index, a quantitative measure used to determine the survivability of a distributed system by averaging the number of programs that remain operational after a combination of links and nodes have failed. Another measure, called terminal reliability, is the probability that at least one path exists between the terminal nodes. The multiterminal reliability describes the probability that paths exist between a subset of the processing nodes designated as source nodes and another subset designated as destination nodes.

An important design problem in distributed systems is that of maximizing terminal reliability between a pair of terminal nodes. The optimization techniques that can be used for terminal reliability maximization vary from analytical to heuristic approximation methods. Most of the available methods proposed for maximizing a system's reliability are based on using redundancy to improve the reliability of series-parallel systems. The use of redundancy makes this optimization problem more difficult to solve than a nonlinear programming problem. Integer programming techniques give integer solutions, but they do not guarantee that optimal solutions can be obtained in a reasonable computing time [24].
INSTANCE: Graph G = (V, E), a subset V' ⊆ V, a rational "failure probability" p(e), 0 ≤ p(e) ≤ 1, for each e ∈ E, and a positive rational number q ≤ 1 [26].

QUESTION: Assuming edge failures are independent of one another, is the probability q or greater that each pair of vertices in V' is joined by at least one path containing no failed edge?
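Evaluating this probability exactly is computationally hard in general, which is why the decision problem above is of interest. A common practical workaround — our illustration, not part of the cited formulation — is Monte Carlo estimation: repeatedly sample which edges fail and check whether all terminals in V' remain connected.

```python
import random
from collections import deque

def mc_network_reliability(edges, fail_p, vprime, trials=20000, seed=1):
    """Monte Carlo estimate of the probability that every pair of nodes in
    vprime stays connected when each edge e fails independently with
    probability fail_p[e]."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        # sample the surviving edges for this trial
        alive = [e for e in edges if rng.random() >= fail_p[e]]
        adj = {}
        for u, v in alive:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        # BFS from one terminal; every terminal must be reached
        src = vprime[0]
        seen = {src}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    q.append(w)
        if all(t in seen for t in vprime):
            ok += 1
    return ok / trials
```

The estimate converges to the true probability as the number of trials grows, with error shrinking like the inverse square root of the trial count.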


Dynamic programming solves the optimal redundancy allocation problem, but the required computation time and the amount of storage needed to hold the generated tables make it infeasible when the number of state variables is large.

5.3 Message Transmission

In this paper, we express a network by a graph, where a node is a processor and an edge is a communication link. If two nodes are connected by an edge, they are called adjacent. If every pair of nodes in a network is connected by an edge, the network is said to be a complete network. Otherwise, it is called an incomplete network or a general network.

Broadcasting is the process of disseminating a message from a source node to all other nodes. Since not every pair of nodes has a link connecting them, broadcasting is achieved by each node repeatedly receiving and forwarding messages. When only one node has a message to disseminate, it is called one-to-all broadcasting. If every node has a message to disseminate to all other nodes, it is called all-to-all broadcasting. For broadcasting, no secrecy is required; that is, we need not keep the message secret from any node. The only concern is correctness, i.e., every good node should correctly receive the message broadcast by the source node before the broadcasting is finished. For broadcasting, we require that every good node can correctly receive the messages broadcast by good nodes. The goal of all-to-all broadcasting can be formalized as follows: Let p_1, p_2, ..., p_n be all the processors (nodes) of the network. Denote by m[i] the message broadcast by p_i and by m_j[i] the message received by p_j from p_i, for i, j = 1, 2, ..., n. The broadcasting is said to be successful if m[i] = m_j[i] for every pair of i and j such that p_i and p_j are good processors [25].

We always assume that the source node of broadcasting is good and honest. That is, the source node always sends a correct message to its neighbors (adjacent


nodes). However, it is sometimes possible that the source node itself is faulty, and in this situation all other nodes must agree on some value as what the source node broadcasts. This problem is called Byzantine agreement. Hence, the difference between fault-tolerant broadcasting and Byzantine agreement can be stated as follows: the former always has a good source, while the latter may have a faulty source. We formalize the goal of Byzantine agreement as follows: Let p_1, p_2, ..., p_n be all the processors (nodes) of the network. Suppose that p_1 is a source node and has a message m to send to all other p_i's. At the end of the execution of a Byzantine agreement protocol, every p_i decides on a message m_i. We require that m_i = m_j for every pair of i and j such that p_i and p_j are good processors. Furthermore, if p_1 is good, m_i = m for all the good p_i's.

Byzantine agreement is one of the most important tasks in distributed systems. By applying Byzantine agreement protocols, we can perform as if there exists a broadcast channel when there actually is none. The Byzantine agreement problem has been much studied, and many protocols for Byzantine agreement have been proposed. Although Byzantine agreement is usually studied on complete networks, it can be implemented for general networks. The most basic operation used in Byzantine agreement protocols is fault-tolerant broadcasting: the message broadcast by good nodes can be correctly received by all other good nodes. For example, the simplest protocol for Byzantine agreement, EIG (exponential information gathering), just exploits t + 1 rounds of all-to-all broadcasting. Hence, the study of how to raise the efficiency and reliability of broadcasting helps to raise the efficiency of realizing Byzantine agreement in general networks.
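The connectivity-based argument of Section 5.1 suggests the reception primitive these protocols build on: send a copy of the message along k vertex-disjoint paths and let the receiver take a majority vote, so that fewer than k/2 Byzantine nodes cannot change the outcome. A minimal sketch (ours; the function name is hypothetical):

```python
from collections import Counter

def receive_by_majority(copies):
    """Fault-tolerant reception: one copy of the message arrives along
    each of k vertex-disjoint paths, and the receiver takes a majority
    vote.  With at most floor((k-1)/2) corrupted copies, the majority
    value is the one the good source actually sent."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count > len(copies) // 2 else None  # None: no majority
```

For example, with five disjoint routes of which two are controlled by faulty nodes, three uncorrupted copies still outvote the two corrupted ones.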
Only reliability is of concern in the study of broadcasting, whereas both correctness and secrecy are of concern in secure message transmission. In the secure message transmission setting, one node (processor), called a sender, has a secret message s to send to another node, called a receiver. We require that the receiver can correctly obtain the message, whereas all other nodes obtain nothing about s. We always assume that all the good nodes execute the protocol honestly. We also assume that every node knows the topology of the underlying network in advance and that the network is synchronized by a global clock. Thus, all the processors can implement one step of a protocol concurrently in a specified time unit.


The basic idea of secure message transmission is to exploit disjoint paths between a sender and a receiver. When the distance between a sender and a receiver is large, the transmitted messages may suffer from both corruption and leakage. For more details, see [25].

5.4 Networks with Faults

If the nodes (processors) and edges (links) of the network work well, we say they are good. The main theme of this work is to study the problems in the previous section in the situation where some processors and/or links may be faulty. In a network, both processors and links may fail to work well; we call them faulty nodes and faulty links, and both are called faults. If we consider the maximum number of faults that can be tolerated in the worst case, the assumption that only nodes may be faulty covers the assumption that both nodes and links may be faulty, because we can regard all the links connected to faulty nodes as faulty. However, we sometimes assume that only faulty links exist, i.e., that all the nodes are good. For some cases, it is easier to cope with just faulty links than to cope with both faulty nodes and faulty links.

Two types of faults are usually considered in the study of fault-tolerant broadcasting. One is the fail-stop type, and the other is the arbitrary-change (or Byzantine) type. A faulty node or link of the fail-stop type does not transmit any message, i.e., it stops the messages passing through it. A faulty node or link of the Byzantine type may not only stop sending messages but also arbitrarily change a message passing through it; it may even fabricate a fake message. Byzantine faults are much more malicious than fail-stop faults and, therefore, are more difficult to tolerate. Fault tolerance of broadcasting against faults of the fail-stop type has been much studied. For the study of broadcasting, we do not consider the situation where the source node is faulty, i.e., we always assume that the source node is good.
It is the problem of Byzantine agreement in which the source node may be faulty. Byzantine agreement is equivalent to a problem called Consensus. In the Consensus problem, every node holds a value at first. After a Consensus protocol is finished, all the good nodes must reach agreement on a value. If at first all the good nodes hold an identical value, the agreed value should be that identical value.


One of the main issues studied in message transmission is to evaluate the maximum number of faults that can be tolerated in the worst case. This problem arises in both reliable message transmission (i.e., broadcasting) and secure message transmission. For non-adaptive protocols/schemes, we can never tolerate as many Byzantine faults as half of the connectivity of the network. This is the case for both reliable and secure message transmission. Hence, we sometimes would like to make another assumption about faults: we assume that nodes and/or links become faulty randomly with a fixed probability. In the random cases, we study the relationship between the probability that a protocol/scheme succeeds and the probability that each node and/or each link becomes faulty. For large networks, the assumption of random faults is more realistic [25].

5.5 Efficiency of Message Transmission

When we talk about the efficiency of a protocol/scheme/algorithm for message transmission, we usually refer to transmission complexity, message-size complexity, step/round complexity, and communication complexity. In the following, we explain these complexities in detail. By a transmission we mean that a node transmits a message along one link once. For example, when A sends a message to C through B, two transmissions are needed, as shown in Figure 5.1(a). The transmission complexity of a protocol/scheme refers to the number of transmissions in the implementation of the protocol/scheme. Message-size complexity refers to the size of the message being transmitted. When we speak of the message-size complexity of a protocol/scheme, we mean the maximum size of the messages over all transmissions by the protocol/scheme. For example, suppose A has a message of size 20 bits. If A sends the message to C through B as shown in Figure 5.1(a), then the message-size complexity is 20 bits while the transmission complexity is 2.
If A separates the message into two parts of size 10 bits and sends one of them through B and the other through D as shown in Figure 5.1(b), then the message-size complexity is 10 bits, but the transmission complexity is 4. Apparently, there is a tradeoff between message-size complexity and transmission complexity. We can always reduce message-size complexity by


increasing transmission complexity. However, from a practical viewpoint, we would rather reduce transmission complexity as much as possible, because increasing the size of a transmission is less costly than increasing the number of transmissions. In the communication in a network, many transmissions can take place concurrently. For example, the transmissions from B to C and from D to C in Figure 5.1(b) can take place concurrently. However, the transmissions from A to B and from B to C have to take place one after another. We call a concurrent round of transmissions a round or a step. The number of concurrent rounds/steps is called the round/step complexity. For example, suppose A has a message of size 20 bits to send to B. If A separates the message into two parts of size 10 bits and sends one of them directly to B and the other through D and C as shown in Figure 5.1(c), then the transmission complexity, the message-size complexity, and the round/step complexity are 4, 10, and 3, respectively. Sometimes we also call the round/step complexity of a protocol/scheme the running time of the protocol/scheme.

Sometimes we also mention communication complexity. Communication complexity refers to the total amount of messages being transmitted. If the message size is always the same during the execution of the protocol/scheme, the communication complexity is the product of the message-size complexity and the transmission complexity. The communication complexity is 40 bits in Figure 5.1(c).

[Figure 5.1. Comparing complexity through examples: (a), (b), and (c) show the three routings discussed above.]

One thing we need to mention about transmission complexity, message-size complexity, step complexity, and communication complexity is that all these complexities are calculated for good nodes/links. In other words, we do not consider


faults in estimating those complexities, since faults of the Byzantine type may behave arbitrarily. The efficiency of message transmission is affected not only by the number of faults, or the fault probability in the random case, but also by the manner in which transmissions are limited. For example, in broadcasting, if we allow a node to send messages along all the links connected to it concurrently in one step, this is said to be all-port broadcasting. On the other hand, if we only allow a node to send messages along one of its links in each step, this is said to be one-port broadcasting. Apparently, one-port broadcasting requires larger step complexities than all-port broadcasting does. In the case where no fault exists, all-port broadcasting trivially requires the same number of steps as the diameter of the network. However, the situation is much more complicated for one-port broadcasting. Constructing one-port broadcasting with the optimal step complexity in general networks is NP-hard even for the case where no fault exists [25].

5.6 Optimization Problem Formulation

The optimization problem is more general than the problem of solving series-parallel systems. Distributed systems have complex structures and cannot be modeled as series-parallel systems; therefore, different techniques need to be used to obtain optimal or near-optimal solutions in a reasonable computation time. Furthermore, the cost-reliability functions associated with the components depend on their characteristics, the technology used in manufacturing them, the environment surrounding their physical location, and the method used for improving their reliability. The algorithms introduced here are independent of the techniques used to improve the reliability of the components and assume that the cost-reliability functions are given. These cost-reliability functions approximate the amount of dollars that must be spent to improve the reliability of the distributed system (DS) components [24].
Terminal reliability is a measure of the probability of successful communication between a pair of nodes. If multiple paths exist between a given pair of nodes, then the terminal reliability is the probability that at least one path is 37

PAGE 48

operational. There are several algorithms proposed in the literature to obtain a terminal reliability expression. These algorithms are based on path enumeration, cut-set enumeration, or state enumeration. A general formula for the terminal reliability, R(s → t), can be directly derived from its definition, which can be stated as follows: R(s → t) is the probability that at least one of the m paths P_1, ..., P_m between s and t is operational. By direct expansion of this formula, a general expression for the terminal reliability between a pair of terminal nodes (s, t) can be obtained by inclusion-exclusion:

R(s → t) = Σ_{i=1}^{m} Pr(P_i) − Σ_{i<j} Pr(P_i ∩ P_j) + ... + (−1)^{m−1} Pr(P_1 ∩ P_2 ∩ ... ∩ P_m).
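This expansion can be evaluated directly when the number of paths is small. A Python sketch of the inclusion-exclusion computation, assuming independent edge reliabilities p(e); the representation (paths as sets of edge labels) and names are ours:

```python
from itertools import combinations

def terminal_reliability(paths, p):
    """Exact R(s->t) by inclusion-exclusion over the paths.
    paths: list of sets of edge labels; p: dict of independent edge
    reliabilities.  Pr(P_i1 & ... & P_ik) is the product of p(e) over
    the union of the edges of the chosen paths."""
    m = len(paths)
    total = 0.0
    for k in range(1, m + 1):
        sign = (-1) ** (k - 1)
        for combo in combinations(paths, k):
            edges = set().union(*combo)
            prob = 1.0
            for e in edges:
                prob *= p[e]
            total += sign * prob
    return total
```

For two edge-disjoint single-edge paths of reliability 0.9 each, this gives 0.9 + 0.9 − 0.81 = 0.99, the familiar parallel-redundancy value.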

An analytical methodology was studied in [21] to solve this optimization problem using the Lagrangian multiplier method, and to show that optimal solutions can be derived for some simple reliability expressions. For example, the following problem was studied analytically: given a budget Dmax, maximize the terminal reliability over the design variables d_i (the dollar amounts invested in improving the elements), subject to Σ_{i=1}^{n} d_i ≤ Dmax and d_i ≥ 0, where c_i is the cost coefficient of element i and one does not spend on element i more than is required to raise its reliability p_i from r_i to 1. The optimal solution resulting from applying the Lagrangian multiplier method is

d_i = (1/n) (Dmax + Σ_{j=1}^{n} r_j / c_j) − r_i / c_i.

However, this analytical approach can only be applied to simple forms of reliability expressions. Therefore, we adopt an iterative approach that can handle large distributed systems, and develop an algorithm to maximize the reliability within a cost constraint.

5.7 Reliability Optimization Algorithm

The following iterative algorithm consists mainly of two steps, an exploring step and an updating step. In the exploring step, the reliability of each unreliable element is improved by investing a δ dollar amount, and then the objective


function is evaluated for every element examined. In the updating step, the values of the objective function are compared to find the element whose reliability improvement maximizes the objective function, and δ dollars are then invested to improve its reliability [24]. These two steps are repeated until all the available budget has been used up. However, this algorithm uses different objective functions to identify at each iteration the element that maximizes terminal reliability. Here, the terminal reliability expression itself serves as the objective function to be maximized in each iteration. In the exploring step, the improvement in the reliability of each element is first examined sequentially (one at a time) by spending δ dollars on improving its reliability, and the corresponding terminal reliability value is evaluated. In the updating step, the values of terminal reliability resulting from the improvements of the explored elements are compared to identify the element that maximizes terminal reliability, and its reliability is then improved. These two steps are repeated until all the budget has been used up. The number of iterations (maxiter) is a function of δ, which can be chosen as the minimum amount of dollars that must be spent on every element to produce an increase in its reliability equal to the desired accuracy in the reliability values. For example, if the accuracy in measuring the reliability of the elements is 10^-4, δ is then the amount that should be spent on any unreliable element to increase its reliability by at least 10^-4. Once δ is determined, the number of iterations is Dmax/δ. The complete description of this algorithm is shown below [24].
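As a runnable companion to the pseudocode, here is a Python sketch of the exploring/updating iteration. The greedy loop follows the description above; the linear cost-reliability demo objective `series_reliability` is purely our illustrative assumption, not from the thesis.

```python
def optimize_reliability(n, Dmax, delta, reliability_fn):
    """Greedy delta-allocation: in each iteration, explore spending
    delta on each element in turn (then undo it), and commit delta to
    the element whose improvement maximizes reliability_fn(d)."""
    d = [0.0] * n
    for _ in range(int(Dmax / delta)):
        best_i, best_r = None, float("-inf")
        for i in range(n):              # exploring step
            d[i] += delta
            r = reliability_fn(d)
            d[i] -= delta
            if r > best_r:
                best_i, best_r = i, r
        d[best_i] += delta              # updating step
    return d

# Illustrative objective (our assumption): two elements in series whose
# reliabilities rise linearly with investment, capped at 1.
def series_reliability(d):
    r = [min(1.0, 0.5 + 0.01 * x) for x in d]
    return r[0] * r[1]
```

With two identical series elements and a budget of 20 at δ = 1, the greedy loop alternates between the elements and ends with the budget split evenly, as one would expect by symmetry.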


Algorithm:

*Initialization.
maxiter = D_max/δ
for i = 1 to n do d_i = 0 od
for iter = 1 to maxiter do
begin
    *Exploring step.
    for i = 1 to n do
    begin
        d_i = d_i + δ;
        evaluate R_i;
        d_i = d_i − δ;
    end
    *Updating step.
    find: R_imax = max(R_i : i = 1, ..., n);
    improve imax: d_imax = d_imax + δ;
end

5.8 Broadcasting with Random Faults

For large networks the worst case rarely appears. That is, broadcasting succeeds with a high probability even if many more than ⌊(n−1)/2⌋ faults exist in a network of n component graphs. Hence, it is a more realistic assumption that faults are randomly distributed in a network [25].

5.9 Tolerating Faulty Nodes/Links

We assume that f faulty nodes of the Byzantine type are randomly distributed in a network and that all the links are healthy. We consider broadcasting from the


source node through n independent spanning trees of a network, and study the relation between the number of faulty nodes and the probability of successful broadcasting. Suppose that each configuration of the network with f faulty nodes is equally probable. We denote a configuration of network G with f faulty nodes by c_G^f, and denote the set of all configurations of G with f faulty nodes by C_G^f. If broadcasting in c_G^f is successful, then c_G^f is called a successful configuration; otherwise c_G^f is called a failed configuration. We denote the set of all successful configurations of G with f faulty nodes and the set of all failed configurations of G with f faulty nodes by SC_G^f and FC_G^f, respectively. Then the probability of successful broadcasting in G with f faulty nodes is equal to

|SC_G^f| / |C_G^f| = 1 − |FC_G^f| / |C_G^f|.

We can more realistically consider all configurations with up to f faults to estimate the probability of successful broadcasting. However, if we assume that in this consideration the probabilities of configurations with the same number of faults are equal (e.g., the probability of each configuration with k faults (k ≤ f) is p_k), the results in this section and the next section are still valid for such distributed modes of faults [25].

Let us consider a network G = G1 × ... × Gn, where the order of each Gi (1 ≤ i ≤ n) is not greater than a constant b. Let s be the source node of G, and let q1(s, u), ..., qn(s, u) be n paths from s to u in the n independent spanning trees T1, T2, ..., Tn, respectively. These n paths are internally node-disjoint. By the broadcasting scheme the message from the source node s is disseminated to each node u of G through the n independent spanning trees. From the construction of the n independent spanning trees, for each i (1 ≤ i ≤ n) there are at most nb nodes on qi(s, u). If, in a configuration c_G^f, more than ⌊(n−1)/2⌋ paths among q1(s, u), q2(s, u), ..., qn(s, u) contain faulty nodes, then c_G^f is called a failed configuration for u.
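The probability |SC_G^f| / |C_G^f| can also be estimated empirically. The sketch below is an illustration only, not the spanning-tree construction itself: it assumes a toy layout in which each of the n node-disjoint paths to a fixed node owns a block of `pathLen` interior nodes, places f faulty nodes uniformly at random among the N nodes, and counts the trials in which at most ⌊(n−1)/2⌋ paths are hit (so a majority of the copies arrives intact).

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Estimate the probability that broadcasting to one node succeeds when f
// faulty nodes (f <= N) are placed uniformly at random among N nodes.
// Assumed toy layout: nodes 0 .. n*pathLen-1 are path interiors, path i
// owning the block [i*pathLen, (i+1)*pathLen).
double successProbability(int N, int n, int pathLen, int f,
                          int trials, unsigned seed = 1) {
    std::mt19937 gen(seed);
    int ok = 0;
    for (int t = 0; t < trials; ++t) {
        // Choose f distinct faulty nodes by shuffling the node ids.
        std::vector<int> ids(N);
        for (int i = 0; i < N; ++i) ids[i] = i;
        std::shuffle(ids.begin(), ids.end(), gen);
        std::vector<bool> pathHit(n, false);
        for (int j = 0; j < f; ++j)
            if (ids[j] < n * pathLen) pathHit[ids[j] / pathLen] = true;
        int hit = 0;
        for (int i = 0; i < n; ++i) hit += pathHit[i] ? 1 : 0;
        if (hit <= (n - 1) / 2) ++ok;   // majority of the n copies intact
    }
    return double(ok) / trials;
}
```

With no faults the estimate is exactly 1, and with every node faulty it is exactly 0; between these extremes it traces how quickly the success probability degrades as f grows.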
We denote the set of all failed configurations for u by FC_G^f(u). Then we have

|FC_G^f| ≤ Σ_{u∈V(G)} |FC_G^f(u)|.

Theorem: If f faulty nodes of the Byzantine type are randomly distributed in a network G = G1 × ... × Gn with no faulty links, then the broadcasting through the n independent spanning trees in G is successful with a probability higher than


1 − (4b³n²f/N)^⌈n/2⌉, where N is the order of G and b is the maximum order of the component graphs [25].

Proof: The probability of successful broadcasting through the n independent spanning trees of G is

|SC_G^f| / |C_G^f| = 1 − |FC_G^f| / |C_G^f|.

For any node u of G, there are n internally node-disjoint paths q1(s, u), ..., qn(s, u) of length at most bn. Consider the configurations where exactly ⌈n/2⌉ faulty nodes exist and they are on different paths among the n node-disjoint paths from s to u. By repeated counting, we have

|FC_G^f(u)| < (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) C(N − ⌈n/2⌉, f − ⌈n/2⌉).

Hence,

|FC_G^f| / |C_G^f| < N (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) C(N − ⌈n/2⌉, f − ⌈n/2⌉) C(N, f)^{−1}
= N (bn)^⌈n/2⌉ C(n, ⌈n/2⌉) · f(f−1)···(f−⌈n/2⌉+1) / (N(N−1)···(N−⌈n/2⌉+1))
< (4b³n²f/N)^⌈n/2⌉.

Then the theorem follows.


Corollary: For any k > 1, if at most N/(4b³n²k) faulty nodes of the Byzantine type are randomly distributed in a network G = G1 × ... × Gn, then broadcasting through the n independent spanning trees in G is successful with a probability higher than 1 − k^{−⌈n/2⌉}.

5.10 Optimal Transmission Complexity

Let G be an incomplete network (graph), and let |G| denote the number of nodes of G. Apparently, if there are no faults in G, the optimal transmission complexity of broadcasting in G is |G| − 1. If the connectivity of G is n, at most ⌊(n−1)/2⌋ Byzantine faults can be tolerated in the broadcasting. We only consider non-adaptive fault-tolerant broadcasting. We can simply let the source node send n copies of the message through n disjoint paths to some node. We can do this to all the nodes one by one. Of course, ⌊(n−1)/2⌋ Byzantine faults can be tolerated. However, such fault-tolerant broadcasting has large transmission complexity. It is easy to see that any non-adaptive fault-tolerant broadcasting has transmission complexity at least n(|G| − 1) if it is required to tolerate up to ⌊(n−1)/2⌋ Byzantine faults. This is because, in the non-adaptive case, every node has to receive at least n transmissions in order to obtain the correct message in the presence of ⌊(n−1)/2⌋ Byzantine faults. If G is an n-channel graph, the non-adaptive fault-tolerant broadcasting algorithm has complexity n(|G| − 1), and it can tolerate up to ⌊(n−1)/2⌋ Byzantine faults. Hence, it is optimal in transmission complexity.

Observation: Let s be the source and v be an arbitrary node in G. Let m_i (i = 1, 2, ..., n) be a copy of the message m transmitted from s to v. Without loss of


generality, we may suppose that m_i is transmitted from s to v through a path of length k, namely,

Transmission_1: s → u_1,
Transmission_2: u_1 → u_2,
...
Transmission_k: u_{k−1} → v = u_k.

Then every Transmission_j (j = 1, 2, ..., k) is used in the final majority voting not only at v but also at u_j. We attach a label to each copy of m when it starts from s. Suppose that these copies are m_1, m_2, ..., m_n. Remember that each of m_1, m_2, ..., m_n is actually the same as m. We just imagine that we place a label on each copy of the message. Let v be an arbitrary node of G − {s}. Since our broadcasting is non-adaptive and tolerates ⌊(n−1)/2⌋ Byzantine faults, v must receive all of m_1, m_2, ..., m_n. That is, each m_i should reach all the nodes [25].
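A node that has received its n copies recovers the correct message by majority voting: with at most ⌊(n−1)/2⌋ Byzantine faults, more than half of the copies are intact, so the majority value is the original message. The voting step itself can be sketched as follows (an illustration, not code from the thesis; the copies are modeled as strings and the list is assumed non-empty):

```cpp
#include <map>
#include <string>
#include <vector>

// Majority vote over the n received copies of a message. If more than half
// of the copies are intact, the most frequent value is the original message.
std::string majorityVote(const std::vector<std::string>& copies) {
    std::map<std::string, int> count;
    for (const std::string& c : copies) ++count[c];
    const std::string* best = nullptr;
    int bestCount = -1;
    for (const auto& kv : count)
        if (kv.second > bestCount) { bestCount = kv.second; best = &kv.first; }
    return *best;   // assumes copies is non-empty
}
```

For n = 5 copies, up to ⌊4/2⌋ = 2 of them may be corrupted arbitrarily and the vote still returns the original value.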


6. Maximum flow for odd-sized HRs

Altman [3] showed that the edge connectivity of an HR is equal to its nodal degree, which is optimum. However, it is still an open question whether the node connectivity of an odd-sized HR is equal to its degree. We have attempted to answer this question by explicitly testing the node connectivities of various odd-sized HRs. The maximum-flow problem is the simplest problem concerning flow networks. It asks: what is the greatest rate at which material can be shipped from the source to the sink without violating any capacity constraints? We shall use the classical method of Ford and Fulkerson for finding maximum flows.

- The Ford-Fulkerson method

The Ford-Fulkerson method for solving the maximum-flow problem depends on three ideas: residual networks, augmenting paths, and cuts of flow networks. These ideas are essential to the max-flow min-cut theorem, which characterizes the value of a maximum flow in terms of cuts of the flow network. The Ford-Fulkerson method is iterative. We start with f(u, v) = 0 for all u, v ∈ V, giving an initial flow of value 0. We increase the flow value by finding an "augmenting path", a path from the source s to the sink t along which we can push more flow. We repeat this process until no augmenting path can be found. This process yields a maximum flow.

Residual networks

The residual network consists of edges that can admit more net flow. Suppose we have a flow network G = (V, E) with source s and sink t. Let f be a flow in G, and consider a pair of vertices u, v ∈ V. The residual capacity c_f(u, v) is the amount of additional net flow we can push from u to v before exceeding the capacity: c_f(u, v) = c(u, v) − f(u, v).


Augmenting paths

Given a flow network G = (V, E) and a flow f, an augmenting path p is a simple path from s to t in the residual network G_f. The residual capacity of p is the maximum amount of net flow that we can send along the edges of the augmenting path p.

Cuts of flow networks

The Ford-Fulkerson method repeatedly augments the flow along augmenting paths until a maximum flow has been found. The max-flow min-cut theorem tells us that a flow is maximum if and only if its residual network contains no augmenting path. A cut (S, T) of flow network G = (V, E) is a partition of V into S and T = V − S such that s ∈ S and t ∈ T. If f is a flow, then the net flow across the cut (S, T) is defined to be f(S, T). The capacity of the cut (S, T) is C(S, T).

- The basic Ford-Fulkerson algorithm

In each iteration of the Ford-Fulkerson method, we find any augmenting path p and augment flow f along p by the residual capacity c_f(p). We update the net flow f[u, v] between each pair u, v of vertices that are connected by an edge in either direction; otherwise we assume implicitly that f[u, v] = 0. The capacity c(u, v) = 0 if (u, v) ∉ E. The residual capacity is c_f(u, v) = c(u, v) − f(u, v). First we initialize the flow f to 0. The while loop repeatedly finds an augmenting path p in G_f and augments flow f along p by the residual capacity c_f(p). When no augmenting paths exist, the flow f is a maximum flow.

Breadth-first search (BFS)

To visit all nodes connected to node k in a graph, we put k onto a FIFO queue, then enter a loop in which we get the next node from the queue; if it has not been visited, we visit it and push all the unvisited nodes on its adjacency list. We continue doing this until the queue is empty [22]. We have used the Ford-Fulkerson method to find the maximum flow for the odd-sized HRs, which is related to the node connectivity of the HRs.
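The procedure just outlined — find an augmenting path by BFS, augment by the bottleneck residual capacity, repeat until no path remains — can be sketched compactly on a capacity-matrix representation (this is an illustrative reimplementation of the Edmonds-Karp variant, not the appendix code):

```cpp
#include <algorithm>
#include <climits>
#include <queue>
#include <vector>

// Ford-Fulkerson with BFS choosing each augmenting path (Edmonds-Karp).
// cap[u][v] is the capacity of edge (u, v); returns the maximum flow s -> t
// (s != t assumed). cap is taken by value and reused as the residual network.
int maxFlow(std::vector<std::vector<int>> cap, int s, int t) {
    const int n = (int)cap.size();
    int flow = 0;
    for (;;) {
        // BFS in the residual network, recording each node's predecessor.
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.push(v); }
        }
        if (parent[t] == -1) break;         // no augmenting path: flow is maximum
        // Residual capacity of the path = minimum residual capacity on it.
        int bottleneck = INT_MAX;
        for (int v = t; v != s; v = parent[v])
            bottleneck = std::min(bottleneck, cap[parent[v]][v]);
        // Augment along the path, updating residual capacities both ways.
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= bottleneck;
            cap[v][parent[v]] += bottleneck;
        }
        flow += bottleneck;
    }
    return flow;
}
```

With every edge of the 5-HR (which is the complete graph on five nodes) given unit capacity in both directions, this returns 4 from source 0 to sink 4, matching the 5-HR output reported in the next section.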


The following is the maximum-flow output for various odd-sized HRs.

Maximum flow for 3-HR.
Number of edges = 3
augment path : 0-2
augment path : 0-1-2
Max flow = 2

Figure 6.1. A 3-HR.

Maximum flow for 5-HR.

Figure 6.2. A 5-HR.


Number of edges = 10
augment path : 0-4
augment path : 0-1-4
augment path : 0-2-4
augment path : 0-3-4
Max flow = 4

Maximum flow for 11-HR
Number of edges = 44
augment path : 0-10
augment path : 0-1-10
augment path : 0-2-10
augment path : 0-3-10
augment path : 0-7-10
augment path : 0-8-10
augment path : 0-9-10
augment path : 0-4-6-10
Maximum flow = 8

Figure 6.3. An 11-HR.


- Maximum flow for 29-HR
Number of edges = 116
augment path : 0-1-28
augment path : 0-2-3-28
augment path : 0-4-26-28
augment path : 0-8-23-28
augment path : 0-28
augment path : 0-27-28
augment path : 0-25-28
augment path : 0-21-20-28
Maximum flow = 8

- Maximum flow for 57-HR
Number of edges = 342
augment path : 0-1-56
augment path : 0-2-3-56
augment path : 0-56
augment path : 0-4-52-56
augment path : 0-8-9-50-56
augment path : 0-16-31-56
augment path : 0-25-33-35-56
augment path : 0-32-40-48-56
augment path : 0-41-42-56
augment path : 0-49-56
augment path : 0-53-54-56
augment path : 0-55-56
Maximum flow = 12
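For reference, the hyper-ring edge sets tested above follow directly from the definition: {u, v} is an edge of the N-HR exactly when (v − u) mod N is a power of 2. A minimal generator can be sketched as follows (an illustration, not the appendix code):

```cpp
#include <algorithm>
#include <set>
#include <utility>

// Edge set of the N-HR (N >= 2): {u, v} is an edge when (v - u) mod N is a
// power of 2. Each undirected edge is stored once as an ordered pair.
std::set<std::pair<int, int>> hyperRingEdges(int N) {
    std::set<std::pair<int, int>> edges;
    for (int u = 0; u < N; ++u)
        for (int p = 1; p < N; p *= 2) {     // powers of 2 below N
            int v = (u + p) % N;
            edges.insert({std::min(u, v), std::max(u, v)});
        }
    return edges;
}
```

Its edge counts agree with the listings above for the small cases: 3 edges for the 3-HR, 10 for the 5-HR, and 44 for the 11-HR.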


7. Summary

In this thesis we have studied the properties, constructions, embeddings, fault tolerance, connectivity, reliability, and optimization of HRs. We have shown construction procedures for HRs and the exact evaluation of the number of edges in an N-HR. Altman [3] showed that the edge connectivity of an HR is equal to its nodal degree, which is optimum. However, it is still an open question whether the node connectivity of an odd-sized HR is equal to its degree. We have attempted to answer this question by explicitly testing the node connectivities of various odd-sized HRs. Empirical results so far have confirmed our hypothesis for the first 1,000 odd-sized HRs.


Appendix I

// MaxFlow.cpp
#include
#include
#include
#include
#include
#include
#include "hring.h"

void main()
{
    // clrscr();
    FlowMap myMap;
    hyperRing myRing;
    AUGLIST::iterator aug;
    edgeList::iterator path;
    FlowMap::iterator mapIterator;
    for(int i=3; i ... connectivity();
    FlowMap::iterator mydex = myMap->begin();
    FlowMap::iterator mydex2;
    while(mydex != myMap->end())
    {
        edge *theEdge = mydex->second;
        printf("\nThe edge from node %i to node %i has Capacity %i, Flow %i",
               theEdge->startNode, theEdge->endNode,
               theEdge->capacity, theEdge->flow);
        mydex++;


    }
    int numbOfEdges = myMap->size();
    unsaturatedPath = numbOfEdges;
    printf("\n\n\t**** %i nodes H-Ring **** \n", i);
    printf("\tNumber of edges %i\n", numbOfEdges);
    //myRing->getDirectedGraph(myMap,i);
    while(1)
    {
        edgeList pathlist = myRing->searchPath(i);
        edgeList::iterator pathtemp4 = pathlist->begin();
        printf("augment path:\t");
        while(pathtemp4 != pathlist->end())
        {
            edge* aaaa = *pathtemp4;
            printf("%i", aaaa->startNode);
            pathtemp4++;
        }
        printf("\n");
        if(pathlist->size() == 0)
        {
            if(unsaturatedPath != 0)
            {
                continue;
            }
            printf("\n node number %i maxFlow = %i\n", i, maxFlow);
            break;
        }
        int mincap = 0;
        path = pathlist->begin();
        edge *temp = *path;
        //mincap = temp->capacity;
        int thisNode = temp->endNode;
        int con = 0;
        while(path != pathlist->end())
        {   //find mincap in the path
            temp = *path;


            if(temp->startNode != thisNode)
            {
                if(con == 0)
                {
                    con++;
                    path++;
                    mincap = temp->capacity;
                    continue;
                }
                pathlist->erase(path);
                path = pathlist->begin();
                con = 0;
                continue;
            }
            thisNode = temp->endNode;
            if(mincap >= temp->ResidualCap)
                mincap = temp->ResidualCap;
            path++;
        }   // found mincap
        if(mincap <= 0)
        {
            unsaturatedPath--;
            if(unsaturatedPath == 0)
                break;
            myRing->AugList.erase(myRing->AugList.begin(), myRing->AugList.end());
            continue;
        }
        path = pathlist->begin();
        while(path != pathlist->end())
        {
            temp = *path;
            temp->flow += mincap;
            temp->ResidualCap -= mincap;
            path++;
        }
        myRing->AugList.erase(myRing->AugList.begin(), myRing->AugList.end());


        maxFlow += mincap;
    }
    mydex = myMap->begin();
    //FlowMap::iterator mydex2;
    while(mydex != myMap->end())
    {
        edge *theEdge = mydex->second;
        printf("The edge from node %i to node %i has Capacity %i, Flow %i\n",
               theEdge->startNode, theEdge->endNode,
               theEdge->capacity, theEdge->flow);
        mydex++;
    }
    cout << endl << endl;
    // cin.get();
    delete myRing;
}
//Sleep(30000);

// matrix.h
#ifndef MATRIX_H
#define MATRIX_H

class matrix
{
public:
    matrix(unsigned, unsigned);
    ~matrix();
    void init();
    int* operator [](unsigned);
    void display() const;
private:
    int **mat;
    unsigned row, col;
};


#endif

// matrix.cpp
#include
#include "matrix.h"

matrix::matrix(unsigned r, unsigned c) : row(r), col(c)
{
    mat = new int* [row];
    for(unsigned i=0; i

{ } } for(unsigned j=0; j

REFERENCES

[1] T. Altman, Y. Igarashi and K. Obokata, Hyper-Ring Connection Machines, Proc. IEEE 9th Annual International Conference (1994), pp. 290-294.
[2] T. Altman, Y. Igarashi and K. Obokata, Hyper-Ring Connection Machines, Parallel Computing 21 (1995), pp. 1327-1338.
[3] T. Altman, Reliable Schemes for Hyper-Rings, 28th Southeastern International Conference on Combinatorics, Graph Theory and Computing, Boca Raton, Florida, Feb. 1997.
[4] Y. Han and R. Finkel, An Optimal Scheme for Disseminating Information, Proc. 1988 Int. Conf. on Parallel Processing, St. Charles (1988), pp. 198-203.
[5] Y. Han, Y. Igarashi, K. Kanai and K. Miura, Fault-Tolerant Broadcasting in Binary Jumping Networks, in: Algorithms and Computation, ISAAC '92, Nagoya, Lecture Notes in Computer Science 650 (Springer-Verlag, New York, 1992), pp. 145-154.
[6] D. Peleg and J. D. Ullman, An Optimal Synchronizer for the Hypercube, SIAM J. Comput. 18 (1989), pp. 740-747.
[7] G. Tel, Topics in Distributed Algorithms (Cambridge University Press, Cambridge, 1991).
[8] N. Alon, A. Barak and U. Manber, On Disseminating Information Reliably without Broadcasting, in: Proc. 7th Internat. Conf. on Distributed Computing Systems (1987), pp. 74-81.
[9] M. Y. Chan, Embedding of D-Dimensional Grids into Optimal Hypercubes, Proc. 1st Annual ACM Symposium on Parallel Algorithms and Architectures (1989), pp. 52-57.


[10] S. N. Bhatt, F. R. K. Chung, F. T. Leighton and A. L. Rosenberg, Efficient Embeddings of Trees in Hypercubes, SIAM J. Comput. 21 (1992), pp. 151-162.
[11] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes (Morgan Kaufmann, San Mateo, CA, 1992).
[12] M. Röttger, U.-P. Schroeder and W. Unger, Embedding 3-Dimensional Grids into Optimal Hypercubes, Technical Report, Dept. of Mathematics and Computer Science, Univ. of Paderborn, Paderborn, Germany, 1994.
[13] K. Efe, "Embedding Mesh of Trees in the Hypercube," J. Parallel and Distributed Computing, vol. , pp. -230, 1991.
[14] F. T. Leighton, M. Newman, A. G. Ranade and E. Schwabe, "Dynamic Tree Embeddings in Butterflies and Hypercubes," The 1st Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 224-234, 1989.
[15] R. Koch, F. T. Leighton, B. Maggs, S. Rao and A. L. Rosenberg, "Work-Preserving Emulations of Fixed-Connection Networks," The 21st Annual ACM Symposium on Theory of Computing, pp. 227-240, 1989.
[16] S. N. Bhatt, F. R. K. Chung, F. T. Leighton and A. L. Rosenberg, "Efficient Embeddings of Trees in Hypercubes," SIAM J. Computing, vol. 21, pp. 151-162, 1992.
[17] B. Aiello and F. T. Leighton, "Coding Theory, Hypercube Embedding, and Fault Tolerance," The 3rd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 125-136, 1991.
[18] N. Alon, A. Barak and U. Manber, "On Disseminating Information Reliably without Broadcasting," The 7th International Conference on Distributed Computing Systems, pp. 74-81, 1987.
[19] U. Manber (1989). Introduction to Algorithms. Addison-Wesley Publishing Company, Inc.
[20] A. Gibbons (1985). Algorithmic Graph Theory. Cambridge University Press.


[21] J. A. McHugh (1990). Algorithmic Graph Theory. Prentice-Hall, Inc.
[22] B. D. Sivazlian (1975). Optimization Techniques in Operations Research. Prentice-Hall, Inc.
[23] R. M. Yanney and J. P. Hayes, Distributed Recovery in Fault-Tolerant Multiprocessor Networks, IEEE Transactions on Parallel and Distributed Systems (1984), pp. 514-520.
[24] C. S. Raghavendra, Reliability Optimization in the Design of Distributed Systems, IEEE Transactions on Parallel and Distributed Systems (1985), pp. 265-270.
[25] B. Feng (1996), "Reliable and Secure Message," Doctoral Thesis, Dept. of Computer Science, Gunma University, Japan, pp. 02-38.
[26] M. Garey (1979). Computers and Intractability. Bell Telephone Laboratories [c.f. 7].