Concepts in modular computing design for complex systems

Material Information

Concepts in modular computing design for complex systems
Thompson, Anthony Scott
Publication Date:
Physical Description:
120 leaves : illustrations ; 28 cm

Thesis/Dissertation Information

Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Computer Science and Engineering, CU Denver
Degree Disciplines:
Computer Science
Committee Chair:
Alaghband, Gita
Committee Co-Chair:
Ra, Ilkyeun


Subjects / Keywords:
Electronic data processing -- Distributed processing (lcsh)
Modularity (Engineering) (lcsh)
Object-oriented methods (Computer science) (lcsh)
Electronic data processing -- Distributed processing (fast)
Modularity (Engineering) (fast)
Object-oriented methods (Computer science) (fast)
bibliography (marcgt)
theses (marcgt)
non-fiction (marcgt)


Includes bibliographical references (leaves 118-120).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Anthony Scott Thompson.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
50710142 (OCLC)
LD1190.E52 2002m .T46 (lcc)

Full Text
Concepts in Modular Computing Design for Complex Systems
Anthony Scott Thompson
B.A. Johnson State College, VT, 1991
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science

This thesis for the Master of Science
degree by
Anthony Scott Thompson
has been approved
Gita Alaghband

Thompson, Anthony Scott (M.S., Computer Science)
Concepts in Modular Computing Design for Complex Systems
Thesis directed by Professor Gita Alaghband
As computer science and technology advance, the need for more powerful
systems also increases. These growing demands can be met through
modular distributed systems. These types of systems pool a farm or cluster of
servers that collaborate with each other on a variety of tasks. Many distributed
systems have been created with the concept of emulating a single computer
system and/or Network Operating System (NOS). While this approach seems
practical, it has many pitfalls due to constraints on distributed systems that
make them difficult to adapt to the single-system model. A better and more
flexible approach is to apply Object Oriented Analysis and Design (OOA&D)
and Design Patterns to create a low-level framework for developing
distributed applications.
This paper will set out to explore designing a complex modular system using
Object Oriented techniques with an emphasis on patterns. We will develop
the Gnu Object/Distributed Object (GoDo) library to investigate and test
concepts in process distribution. This framework will provide low level
objects for developing distributed systems. These objects will provide a
unified interface for creating concrete objects using Design Patterns when
possible. This provides a more flexible approach to system design when
compared to a static component-based method.
One of the most important decisions for any distributed system is the
Location or Scheduling Policy. A series of tests will be conducted to deduce
which scheduling policy is best given the processing dynamics of a given
application. This research will provide insight into a critical design decision
for distributed applications.

We will extrapolate using the GoDo framework for designing a complex
system by architecting the Virtual Office internet System (VOiS). This will
provide a point of study to explore OOA&D techniques and applying GoDo.
The VOiS system will handle unified but distributed file systems, a Virtual
Network Operating System (VNOS), a voice/video conferencing system
and many other features. We will use the real-world example of a traditional
brick-and-mortar office space on which to base our objects. We will show
that we can take an extremely complex system and simplify it through shared
object abstractions.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Gita Alaghband

1. Introduction........................................................1
1.1 Why Use Patterns?..................................................2
1.2 Load Distribution.................................................3
1.3 What This Paper Hopes to Show.....................................4
2. Using Design Patterns In a Distributed System......................5
2.1 Review of Patterns................................................6
2.2 GoDo: A Framework for Modular Distributed Programming.............7
2.2.1 GoDo Concepts and Abstractions..................................10
2.3 Patterns Used in GoDo............................................11
2.3.1 Facade Patterns [GHJV95]........................................12
2.3.2 Acceptor/Connector [SSRB01]....................................13
2.3.3 Observer [GHJV95], [BMRSS01]...................................14
2.3.4 Component Configuration [SSRB01]...............................15
2.4 Freedom Offered By Abstract Classing...........................16
2.5. Summary of GoDo and Patterns.....................................17
3. Process Distribution and Load Balancing...........................18
3.1 Load Balancing Schemes...........................................18
3.2 Load Balancing System............................................20
3.3 Using Weighting In Location Policy...............................21
3.4 Testing Transfer Policies........................................22
3.4.1 Test Scheme.....................................................22
3.4.2 Test System....................................................24
3.4.3 Test Programs and Assumptions..................................25
3.4.4 The Test.......................................................25
3.4.5 Results and Conclusion.........................................26
4. Designing a NGI Application: VOiS.................................29
4.1 Introduction to the VOiS.........................................29
4.1.1 VOiS Overview...................................................29
4.2 VOiS System Design...............................................31
4.2.1 Overview of VOiS Process Creation...............32
      VOiS Client.....................................32
      VOiS Boss.......................................32
      VOiS Worker.....................................32
      VOiS Process Creation...........................32
4.2.2 VOiS Distributed Services.......................33
      VOiS System Services............................34
         VOiS Facility Service........................34
         VOiS Switch Board Service....................34
         VOiS Data Center Services....................34
      VOiS User Services..............................35
         VOiS Office Service..........................35
         VOiS Terminal Service........................35
         VOiS Phone Service...........................35
         VOiS Directory Service.......................36
4.3 VOiS Distributed Processing Conclusions............................36
5. Conclusions and Future Work........................................37
5.1 General Conclusion................................................37
5.2 Future Work.......................................................38
A. Heterogeneous Test: Aggregate Processing Time Table................39
B. Heterogeneous Test: Aggregate Processing Time Charts...............40
C. Heterogeneous Test: Average Processing Time Table..................42
D. Heterogeneous Test: Average Processing Time Charts.................43
E. Gnu Object /Distributed Object Source Code.........................45
F. GNU GENERAL PUBLIC LICENSE........................................110

2.1 GoDo Communication Sequence......................................9
3.1 GoDo System Architecture.........................................23
4.1 VOiS High Level Conceptual Diagram............................30
4.2 Collaboration Diagram VOiS Client Process Creation............33


3.1 The sum total processing time......................................28
3.2 Mean test processing time..........................................28

1. Introduction
The speed, power and storage of computers are increasing at an ever-
quickening pace. The general principle of Moore's law appears to apply to
more than just processing power [DM98]. Storage space, memory and
network speeds are all increasing rapidly. Even with these facts, the ideas that
drive new technologies are still outpacing the speed of improvements to
computer hardware.
The current computing capacity of a single system and the "last leg"
bandwidth of the Internet do not meet the needs for most Next Generation
Internet (NGI) applications. This is due to the large amount of processing and
bandwidth required to accomplish acceptable performance for tasks such as
Multitrack Voice Over IP (VOIP), Video over IP or Multimedia Broadcast
over IP. Bandwidth restrictions may be resolved through Gigabit Ethernet
and its eventual widespread adoption, but processing barriers will not be
easily overcome through hardware alone. To accomplish the processing
required, systems will need to be massively parallel, having a large number of
Processing Elements (PEs). These systems must be easily maintainable,
upgradable and cost effective. Changes to a system should impose zero down
time. Unfortunately, large tightly coupled Multi Processor (MP) systems do
not meet any of these requirements: they are extremely expensive, while
most hardware upgrades and/or maintenance require downtime.
These shortcomings become most evident when exploring NGI
applications. These applications hope to take advantage of increases in the
speed of the Internet. While taking advantage of the speed of the Internet is
critical, these systems require huge amounts of processing power. Let's take
an example of a non-linear television broadcast system. This system will be
required to move large amounts of data and, in the case of a live broadcast,
process the data in near real time. Since the feed is non-linear, there will be
multiple users/clients accessing the feed at differing points on the disk each
requiring a streaming server. Needless to say, this is a very expensive
proposition and one that even the largest Symmetric Multi-Processor (SMP)
servers would be unable to service given a large viewing audience.
To handle the load incurred by massive systems, we need a flexible
way to add, remove and configure processing elements (PEs). The best way
to accomplish this is through a distributed system where processes are
offloaded to a cluster or farm of separate computers. These types of systems
were inspired by Beowulf, a system originally developed at
NASA's Goddard Space Flight Center to handle large, complex mathematical
computations [Mer98]. This approach to computing has been coined the
"pile of PCs" approach since many research systems are built on leftover or
recycled computers. The objective is to use a group of computers
together to distribute and compute a common task. This creates a Multi-
Processor (MP) system without the need for expensive, monolithic MP
hardware.1 Adding more processing power becomes a simple matter of
adding more computers to the collection of Processing Elements (PEs).
Another big advantage of this type of system is that it can be designed so
PEs can be added in real time without the need to take the system down. This
is of great importance to NGI applications. Since these applications will
presumably be feeding linear data, disruptions in service are unacceptable. A
larger monolithic MP system must have down time to add new systems
resources. This is not the case with a modular system. In fact with the proper
design, an entire modular system can be upgraded incrementally with zero
down time.
There are two key issues with designing a distributed multiprocessor system:
1) What will be the framework or method for creating system applications?
2) What technique will be used for distributing and load balancing the processes?
1.1 Why Use Patterns?
There have been two distinct approaches to creating applications for
distributed systems that act as a single unit:
1. The first approach is to emulate a single server and write applications to
this pseudo server. While the single-system approach is logical and
grounded in real-world modeling, it falls short due to the differences
between single-server and distributed MP systems, mainly dealing with
memory management. Some systems, such as Shasta [SG97], even attempt
to support the instruction set of a given architecture in order to run
system binaries directly without modification. This approach defeats one
of the advantages of a loosely coupled distributed system, as it locks the
system into a single architecture and instruction set. One of the main
assumptions of this paper is that it is advantageous to have a mixed,
heterogeneous Distributed Multiprocessor (DMP) system.2 The paper
1 The term monolithic refers to tightly coupled single system computers.
2 MP will be used to refer to monolithic multiprocessor systems.

explores this issue further in section 3.3.
2. The other approach is to create a programming framework on which to
build applications. This approach is much better suited to distributed
systems because it de-couples and distributes memory management. This
approach is also much more flexible when it comes to modifying the inner
workings of the various components of a distributed system.
A single-system approach implies there is, at some level, a shared pool
of virtual memory. The problem of memory coherence, or keeping shared
data in sync, is difficult at best. In a single system, memory coherence
issues are handled expeditiously through hardware, and communication
latency is usually negligible given modern bus speeds. Neither of these facts
is true in a distributed virtual memory system. Memory management must
be handled through software that, at some level, needs to be centralized to
emulate a single system. Given the single point of entry needed for such a
virtual memory system, communication does not come at trivial cost. If
several processes are contending for the same piece of data, a communication
contention will occur [LH89].
The object oriented (OO) framework approach uses memory objects
and avoids many of the above issues. If memory is not shared, it is handled
locally. This saves the overhead of fetch latency incurred when traversing an
internet communication network when a page fault occurs. If shared memory
is required, a memory object can be used to provide dynamic distributed
memory management [LH89]. The manager is only responsible for a single
piece of data. This removes the communication contention by distributing the
point of entry to various shared memory objects, as opposed to a fixed page
ownership strategy. This also has the added advantage of distributing the
processing of memory contention. The added benefit of OO design is that the
memory manager can be changed out and/or researched easily for any given
application.
1.2 Load Distribution
The issue of process distribution and load balancing is key to having a
robust and predictable distributed system. It is very important that no PE gets
overloaded with processes while others sit idle. It is also important that
the system not spend too much time trying to balance the load on the system
resulting in delayed process creation.
The latency during decision making is amplified when compared to
MP systems. An MP system, since it is hardware based, has the advantage of
a much faster communication bus and load balancing algorithms embedded in
hardware. This makes the cost of PE load discovery and the balancing
algorithm trivial when compared to a distributed system [LH89]. Distributed
systems have much longer communication latency and, since the load
balancing is done in software, it is reliant on both processing speed and the
availability of a local PE.
These issues are similar to the ones faced with virtual memory. The
fundamental difference is that we can work around using virtual memory, but we
cannot ignore process distribution. The method of process distribution
greatly affects the speed of a distributed system. For this reason, it is
important to choose an efficient process distribution algorithm. The choice of
algorithm may change given various factors such as the randomness of the
processes, PE variability and overall system load. These factors need to be
considered as a whole when deciding the best process distribution algorithm.
1.3 What This Paper Hopes to Show
This paper will show the advantages of using an OO approach to
creating distributed systems. To this end, the Gnu Object/Distributed Object
(GoDo) library is developed and used to help research concepts in distributed
system design. This library will be used both as a point of reference and as a
research tool for testing.
The paper then tests several approaches to process distribution
and load balancing. This paper will display the results of that research and
draw a conclusion about a good approach. The paper will also extrapolate on
reasons why dynamically changing the load distribution policy may be beneficial.
We will then use the Virtual Office internet System (VOiS) to show
the use of GoDo in designing a framework for creating a VOiS system. This
will provide an example of designing a large distributed system using GoDo
and OO technologies. This should provide some insight into the organization
and ease of system design using OO techniques.

2. Using Design Patterns In a Distributed System
Patterns are very useful in creating an abstract framework on which to
build any system. Patterns provide the flexibility of object oriented design, in
contrast to the component-based structure on which most distributed systems are
based. Component-based systems define both the operations they provide and
those they require [Gam20]. An object-oriented approach is defined only by
the operations it provides. It is the addition of required operations that makes
component-based solutions less desirable. Sparse requirements are the strength
of object-oriented designs and the reason many modern programmers use this
approach for software engineering. All requirements are removed, leaving the
implementation flexible. Design Patterns go a step further by providing
simplified interfaces that can be reused for a variety of common tasks,
providing an even higher level of abstraction.
We will present a brief review of patterns and the reasoning behind
their adoption. Design Pattern basics are treated in detail in [BMRSS01],
[Lar98], and [GHJV95], and design patterns in distributed systems are covered
in [Gam20] and [SSRB01].
The section will then introduce the Gnu Object / Distributed Object
(GoDo) framework, which was created and developed for this thesis. The GoDo
framework was developed in C++ to provide maximum efficiency but still
exploit OO techniques. Appendix E contains most of the source code for this framework.
The paper follows by exploring some of the patterns used in GoDo.
This section provides a description of each pattern and a list of the classes that
exploit the given Design Pattern. Some of the example classes are from the
Gnu Object (Go) library [Tho01]. The Go library was also created for this
thesis and provides a thin C++ wrapper around many common C libraries.
The classes that are referenced are used so often in the GoDo framework that
it was felt they were worth mentioning. Another reason is
simply that GoDo is a subset of the Go library.
We then continue with a discussion of Abstract Classes and their
importance. The concept of Abstract Classing is used extensively in the
GoDo library. This is what makes an OO approach to distributed system
development more flexible than other methodologies.
The section concludes with a summary of Patterns and GoDo. The

summary highlights some of the important points of the section as they relate
to distributed systems.
2.1 Review of Patterns
Patterns are a set of common, high level, abstract problem domains
that can be used to solve more specific problems. They can be viewed as
templates that can be reused to solve a variety of problems [GHJV95].
Patterns provide common interfaces and general solutions to a wide range of
problems. There are a special set of patterns for solving concurrent and
distributed problems.
Patterns have four essential elements [GHJV95]:
Name A handle used to describe a pattern. Similar patterns are
commonly described with different names depending on the author.
Every attempt will be made to provide all possible names for any
given pattern. Some refer to alternate names as "Alias" [BMRSS01].
For our purposes we'll treat all potential names equally.
Problem The issue that the pattern addresses and how the pattern
may overcome inflexible design decisions.
Solution Describes the elements that make up the pattern along with
the basic interface definitions. This also demonstrates the
"arrangement of elements" [GHJV95]: the classes and how they
interact. This will be our main focus when designing classes.
Consequence This describes the benefits and/or risks of using a
given pattern.
We will use these elements to describe the design patterns.
Patterns can also be broken down into three distinct categories:
Architectural This type of pattern controls and defines the
architecture of a given framework or system. This is the most rigid
type of pattern, as it defines system-wide structural properties such as
subsystems and lays out the responsibilities, rules and guidelines for
the relationships between them.
Design This group defines a single solution to a problem that
recurs or manifests itself in differing problem domains. In short, a
Design Pattern provides a common solution to similar but different

problems. This makes code reuse much simpler by providing a
common interface to solutions that are only abstractly similar.
Idiom These are low level patterns that reflect some aspect of the
programming language. These are the lowest level of patterns and are
often specific to a given language.
The differentiation between patterns and primitive building blocks is a
question that varies with interpretation [GHJV95]. For our purpose we will
be describing and highlighting a few patterns as examples of how they can be
used to create building blocks. These building blocks, when put together,
create the framework on which distributed applications can be built.
MP systems have three main components that patterns address: Inter
Process Communication (IPC), Synchronization and Concurrency Control.
Patterns address all of these needs and provide abstract interfaces that allow a
framework to be created that can be used for distributed and non-distributed
processes. Abstraction allows the details of the inner workings of the
protocols to be reworked. This has two advantages:
1) It allows certain aspects of the distributed system to be reworked, changed
and/or researched without breaking other parts of the system.
2) It allows vendors and researchers the opportunity to provide a flexible
interface on which to develop new applications. As long as the top level
abstract interfaces are adhered to, applications easily port to different
implementations.
2.2 GoDo: A Framework for Modular Distributed Programming
We have designed GoDo as a simple framework for distributing
processing among multiple machines. GoDo's design is heavily influenced
by Douglas C. Schmidt's work on the ACE library [Sch02]. GoDo provides
the services necessary to create threads of execution. These threads can
technically be of any type. This includes heavy weight processes, light
weight processes (threads), distributed processes or new types of processes
not yet invented. Because of the subject matter of the thesis, this paper will
only concentrate on distributed processes. GoDo works with a client/slave
model and uses a set of top level abstract classes to provide a set of services.
GoDo breaks down its services in the following way:
1) Client Services
Process Manager Session A class for connecting to a Process Manager
and creating Client Processes.

Client Process These objects are created by the Process Manager Session.
2) Master Services
Process Manager (PM): This module manages the creation of all
processes. The PM acts as a proxy for the client during process
initialization. The PM uses a Process Creator object to create new processes.
Process Creator (PC)3: This will create new processes. This module is
designed to be used by the Process Manager to create any type
of process. GoDo accomplishes the flexibility of creating a process of
any required type by allowing new Process Creators to be substituted
without affecting programs written using this framework.
3) Slave Services
Slave: This object communicates with the Master services to create
distributed processes. The Slave also negotiates its addition to the farm
or cluster of PEs. This automates the process of adding additional
elements and makes the system modular. If the system is using a local
process type (lightweight or heavy weight processes), this class may be
omitted.
Process (Proc): This is the process itself. This mixin class [GHJV95]
provides communication to the client and the process manager on the
process's behalf. The processes are always compiled as dynamic
libraries linked in by the Slave. If the system is using a local process
type (lightweight or heavy weight processes), Process objects may be
created directly by the PC.
To create distributed processes, the system does the following: The
client application will connect to the Process Manager through a single
communication socket owned by a Process Manager Session (PMSession).
The PMSession will get a list of all available processes by name. This is all
done transparently by the PMSession object during initialization. Once this
is done, the client application can request distributed objects through the
PMSession. The PM will receive the requests and then forward them to the
PC. This design de-couples the PM from the knowledge of the process type
to be created. It also allows the PC to be changed dynamically depending on
processing needs or system conditions. The PC returns the address, port and
process id to the PM, who in turn forwards this information to the PMSession.
3 It's important to note that PC will always refer to a Process Creator object and not a
Personal Computer for this paper's purposes.

Illustration 2.1 GoDo Communication Sequence
The PMSession creates a Client Process object and initiates the connection to
the remote process. Upon successful process creation, the PMSession then
returns a fully instantiated Client Process object of the type requested. This
procedure is invisible to the developer. Illustration 2.1 is a sequence diagram
showing the basic communication between the GoDo subsystems.
GoDo runs on a TCP/IP based System Area Network (SAN). Since
communication is done via TCP/IP, the network speed is dependent on the
hardware used and is easily upgradeable.
System Inter Process Communication (IPC) between the process and
the client is limited to a standard in (stdin) and standard out (stdout) in the
base Process and Client Process classes. This was done to simplify the
protocol and remove the need for marshaling. The system relies on other

objects or child classes of the Process and Client Process to provide additional
functionality. The stdin/stdout socket will also provide communication for
all system level communication.
2.2.1 GoDo Concepts and Abstractions
GoDo simplifies process distribution for modular distributed systems
by providing a service specifically designed for a given problem domain.
GoDo is a hybrid system lying somewhere between distributed process
schemes such as Java RMI, DCOM, CORBA or RPC and migrated processes
such as Java Applets.
RMI, DCOM, CORBA and RPC are all appropriate protocols for
centralizing business logic. Through these protocols, core pieces of
application code can be located in a centralized location. This type of
architecture provides for easy system upgrades. All critical code is located on
a limited number of servers as opposed to being spread to multiple client
applications. Updating or upgrading functionality only requires changing
files in a few locations. This is much easier than updating files to every
client's application. These protocols also allow processing that may require an
MP system or large amounts of memory to be done on a server with adequate
hardware. The main problem with this set of protocols when applied to a
DMP system is the large amount of overhead incurred through forced
"features". Most notable is the use of a stub. A stub is a layer used for
marshaling data that is sent to and from a distributed object. The stub ensures
data compatibility between systems where bit order and/or word size may
vary. With GoDo, a process may or may not have a stub depending on the
application. For short lived processes that have limited parameters and return
values, a stub adds unneeded processing. Optional stubbing is a flexibility of
classing that is unfortunately lost in static protocols like the ones mentioned above.
The second methodology (migrating the process) has the problem of
requiring the entire process to be transported each time it is to be used. For
large processes, this is impractical and would unnecessarily consume network
bandwidth. For this reason it will not be discussed further.
GoDo is designed specifically for distributed processing inside a
System Area Network (SAN).4 Its purpose is to run multiple processes for a
single application. This differs from the remote procedure protocols that are
designed quite literally in reverse: to service multiple applications with a
single process. Service features that are not required in a DMP system, such
4 A SAN is sometimes used to refer to a Storage Area Network. For our discussion, a
SAN will always refer to a System Area Network.

as naming/registry, activation services, distributed garbage collection and
object brokers are removed. By stripping these and other unnecessary
features, client/object communication is greatly simplified.
GoDo does set out to provide some of these services, but it does so
through object inheritance that can be added only if needed. Class interfaces
are defined through a set of abstract classes. This has the advantage of
allowing the programmer to add in the necessary amount of complexity as
warranted. By adhering to the interface defined by the base classes, modified
child classes will not break other parts of the system. This leaves a high
degree of flexibility and allows the framework to be quickly re-factored as
technology advances. This basic premise of OO development sets GoDo apart
from the static high level protocols used for remote procedure invocation.
To facilitate communication, each class has its own communication
channel and top level mixin class for hiding all of the object's internal IPC.
This helps remove the burden of low level socket programming and makes
connecting to a yet undetermined Slave transparent. This socket is also used
for system level communication. It could also be used by child classes for
communicating things such as stubbed data.
2.3 Patterns Used in GoDo
The GoDo framework is based on the Gnu Objects (Go) library. It,
and GoDo, use patterns to the highest degree possible to provide common
interfaces to much of the system's functionality. This section describes the
patterns used and how they apply to the system. We will break the sections
up into three categories of Go/GoDo classes:
1) Core This represents the main objects that are required to run the system.
2) Communication These are objects dealing with communication
between processes.
3) Synchronization This group of classes handles any mutual
exclusion or barriers that must be created to protect data or recognize
the completion of critical sections.
As we describe patterns, we will acknowledge the classes and
categories these patterns are used in at the end of the description section. The
patterns and their descriptions are listed below by subsection.

2.3.1 Facade Pattern [GHJV95]
Name: Facade
Problem: A subsystem contains a high degree of complexity, most of which
the programmer does not need to interact with or which can be resolved
and configured programmatically. While the implementation has
these complexities, they need to be encapsulated to hide them from
the class user.
Solution: Facade is used to provide a common interface with the correct level of
access to a given set of functionality. This also makes subsystems
easier to use by encapsulating the complexities.
1. Hides the subsystem components, reducing the number of objects
a client must deal with when using the class.
2. Promotes weak or loose coupling between a client and a
subsystem. The components of a subsystem can be varied without
affecting its clients.
3. Does not prevent the client from using the subsystem classes
directly when the more general interface is insufficient.
1. Client Process (Core)
2. Process Creator (Core)
3. Process (Core)
4. IPC (Communications)
5. Shared Memory (Communications)
6. Mutex (Synchronization and Concurrency)
7. Command (Synchronization and Concurrency)
The Facade pattern is used in almost every class in the Go and GoDo
libraries. It promotes low coupling, which is key to allowing polymorphism to
be used effectively. The Process Creator class takes advantage of this feature,
allowing different process creation schemes to be injected into the system.
Furthermore, this reduces compilation dependencies. The Facade is a key
component in sub-classing the Process classes, allowing them to be
dynamically created.
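A minimal sketch of the idea follows. The class and method names here (IpcFacade, Channel, Framer) are hypothetical illustrations, not actual Go/GoDo classes; the Channel stands in for a real socket so the example is self-contained.

```cpp
#include <string>
#include <vector>

// Hypothetical subsystem pieces a client would otherwise juggle directly.
class Channel {              // low-level transport (stands in for a socket)
public:
    void write(const std::string& m) { log_.push_back(m); }
    const std::vector<std::string>& log() const { return log_; }
private:
    std::vector<std::string> log_;
};

class Framer {               // message framing/encoding
public:
    std::string frame(const std::string& payload) const {
        return std::to_string(payload.size()) + ":" + payload;
    }
};

// The Facade: one object, one call; the subsystem stays hidden,
// but nothing stops a client from using Channel or Framer directly.
class IpcFacade {
public:
    void send(const std::string& payload) { ch_.write(fr_.frame(payload)); }
    const std::vector<std::string>& sent() const { return ch_.log(); }
private:
    Channel ch_;
    Framer fr_;
};
```

Because the client only touches IpcFacade, the framing scheme or transport can be swapped without affecting callers, which is the loose coupling the text describes.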

2.3.2 Acceptor/Connector [SSRB01]
Name: Acceptor/Connector
Problem: Network oriented connection protocols often require a significant
amount of coding and configuration to establish a connection.
Most of this is repetitive and there are, for the most part, only a
few common configurations that satisfy most cases.
Solution: Encapsulate repetitive initialization code and provide an interface
to only the required services. Provide a mechanism for customizing
the configuration but provide standard initialization for the most
common case. Decouple the connection and initialization
services from the processing of peer services.
1) Portability: The connection establishment class can be used to
wrap subtle differences between platforms using the same core
protocol. This would be most evident when using WinSock as
opposed to Berkeley Sockets. By wrapping each initialization with
the same interface and making platform-dependent runtime or
compile-time decisions, a portable class can be created.
2) Extensibility: The connection establishment classes can be
written as basic initializations of the given service and, through
inheritance, be extended by subclasses. Hence the basic or
general initialization can be written once and then may be
extended to more specific cases without the need for code changes
to the main class.
3) Polymorphism: It is possible, and actually desirable, to create
the interfaces to these classes with enough abstraction to allow
different types of IPC to be returned by either active or passive
connection establishment. In short, this means protocols with the
same functional attributes can be exchanged by a program without
the need for a code change.
4) Complexity: This method incurs a degree of complexity that
may not be necessary for simple sockets. This makes these
patterns possibly unsuitable for simple programs or programs
where the use of IPC is minimal.
1) SocketAcceptor: This class is part of the Go library's
goSocket classes and is used frequently in GoDo.
2) SocketConnector: This is the active complement to the passive
SocketAcceptor.

As mentioned earlier, these classes are not directly part of GoDo but
are part of the Go library. This does not diminish the importance of the role
these classes play in the GoDo framework. These classes are an integral part
of the GoDo framework as they are used for all IPC once a connection is
established. These classes are used internally by almost every class in the
GoDo library though their usage is hidden.
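The decoupling described above can be sketched as follows. This is not the Go library's SocketAcceptor/SocketConnector implementation; the names Transport, LoopbackTransport, Acceptor and Connector are hypothetical, and an in-memory transport stands in for a real socket so the example needs no network.

```cpp
#include <memory>
#include <string>

// Abstract transport returned by either side. The concrete type is hidden,
// so different IPC mechanisms can be swapped in (the polymorphism benefit).
class Transport {
public:
    virtual ~Transport() = default;
    virtual std::string peer() const = 0;
};

class LoopbackTransport : public Transport {  // stands in for a real socket
public:
    explicit LoopbackTransport(std::string peer) : peer_(std::move(peer)) {}
    std::string peer() const override { return peer_; }
private:
    std::string peer_;
};

// Passive side: encapsulates the listen/accept boilerplate.
class Acceptor {
public:
    std::unique_ptr<Transport> accept(const std::string& from) {
        return std::make_unique<LoopbackTransport>(from);
    }
};

// Active side: encapsulates address resolution and connect boilerplate.
class Connector {
public:
    std::unique_ptr<Transport> connect(const std::string& to) {
        return std::make_unique<LoopbackTransport>(to);
    }
};
```

Both sides hand back the same abstract Transport, so the code that processes peer messages never needs to know which side established the connection or over which protocol.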
2.3.3 Observer [GHJV95], [BMRSS01]
Name: Observer, Dependents, Publish-Subscribe or Gatekeeper
Problem: As data or information changes in one place, it needs to notify and
update other components that depend on this data.
Solution: Provide a dedicated component to hold the data and/or keep state.
This is called the Subject [GHJV95] or Publisher [BMRSS01].
Other components can then ask to be notified of state/data
changes. Components dependent on the state/data change are
called Observer [GHJV95] or Subscriber [BMRSS01]. In
distributed systems this pattern is also referred to as Gatekeeper or
Event Channel depending on its usage. We will use
Subject/Observer for all instances to keep consistency and make
identification of the pattern simple.
1) Coupling between Subject and Observer is abstract. The
Observer does not necessarily need to have knowledge of the
event it is registering to be notified about. This provides loose
coupling.
2) The Subject does not care about the number of observers. This
is due to the fact that notification is done through a broadcast to all
registered observers.
3) Because the Observer has no knowledge of when or how often
notifications will come from the Subject, updates may arrive
unexpectedly and often, leading to disruption of process flow.
This can potentially impact application performance.
1) Shared Memory (Communication)
2) Mutex (Synchronization & Concurrency)
3) Command (Synchronization & Concurrency)

The GoDo framework makes extensive use of the Observer pattern for creating
events. These events can be memory changes (Shared Memory) or a lifted
barrier (Mutex and Command).
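The Subject/Observer relationship just described can be sketched in a few lines. The Subject class, the callback-based subscription and the int state are simplifications for illustration, not GoDo's actual event classes.

```cpp
#include <functional>
#include <vector>

// The Subject holds the state; Observers register callbacks and every
// registered observer is notified (broadcast) when the state changes.
class Subject {
public:
    using Observer = std::function<void(int)>;
    void subscribe(Observer o) { observers_.push_back(std::move(o)); }
    void set(int v) {
        value_ = v;
        for (auto& o : observers_) o(value_);  // broadcast to all observers
    }
    int get() const { return value_; }
private:
    int value_ = 0;
    std::vector<Observer> observers_;
};
```

Note that the Subject neither knows nor cares how many observers are registered, and each observer only learns of a change when the broadcast arrives — the two coupling properties listed above.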
2.3.4 Component Configuration [SSRB01]
Name: Component Configuration or Service Configuration
Problem: An application needs to be reconfigured or have services added
without recompiling or stopping an application.
Solution: Create a dynamically linked object with a common interface that
can be used to execute and configure the object. The object can
then be instantiated dynamically at runtime.
1) This pattern provides uniformity through a common interface
for components.
2) Administration of components is centralized, simplifying
development by allowing certain activities to be performed
centrally.
3) The code for the component's base class is reusable. Common
or distinct behavior can be encapsulated and hidden from the
client.
4) The dynamism of a component is greatly enhanced. This is
probably the most important attribute, as functionality is brought
into existence at runtime without having to stop or reinitialize a
process. It can be done without necessarily affecting the running
application.
5) The major downside to this pattern is increased overhead. It
takes more time to instantiate a dynamic object than it does one
that is statically compiled. Some compilers also add extra levels
of indirection for method invocation and global variable access for
dynamic libraries [GLDW87].
6) The Component Configuration pattern suffers from a
potentially narrow common interface. It is difficult, if not
impossible, to add or enhance direct communication between the
component and the invoking process.
1) Process Creator (Core)
2) Process (Core)

3) Process Manager (Core)
4) Process Distributor (Core)
The GoDo library uses components to create distributed processes.
The Slave object resides on each server/PE and uses
the Component Configuration pattern to create new processes in a threaded
environment locally on the server. The Process class is used as a prototype
and mixin for all processes to be created. The downside is a limited interface
with only three methods: init, fini and run. This can be easily overcome
through private Communication objects and/or the standard-in and standard-out
communication channels.
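The three-method interface can be sketched as below. In the real system the factory for a Process subclass would come out of a dynamically loaded library (e.g. via dlopen); here an in-process factory registry stands in for that step, and HelloProcess and its return values are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

// The common component interface from the text: init, fini and run.
class Process {
public:
    virtual ~Process() = default;
    virtual void init() = 0;
    virtual int run() = 0;
    virtual void fini() = 0;
};

// In GoDo the factory would be resolved from a dynamic library at runtime;
// a registry of factory function pointers stands in for dlopen/dlsym here.
using Factory = std::unique_ptr<Process> (*)();
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;
    return r;
}

// A hypothetical concrete component honoring the narrow interface.
class HelloProcess : public Process {
public:
    void init() override { ready_ = true; }
    int run() override { return ready_ ? 42 : -1; }
    void fini() override { ready_ = false; }
private:
    bool ready_ = false;
};

std::unique_ptr<Process> makeHello() { return std::make_unique<HelloProcess>(); }
```

Because every component honors the same interface, the invoker can instantiate and drive a component it has never seen before — which is exactly why the interface is narrow, the limitation noted above.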
2.4 Freedom Offered By Abstract Classing
Abstract classing allows the internals of the class to be changed
radically without affecting other classes or compiled dynamic libraries. This
is a key advantage over the top-down static protocols other systems dictate. Many
systems require the developer to code either to existing paradigms such as
Amoeba or Chorus [Tan95] or to a special rigid library. Both of these
methodologies are extremely restrictive when it comes to the most important
aspect of creating a distributed system: how the process is actually being
distributed.
GoDo relies heavily on abstract classing and interfaces. As we've
seen, we can use the Facade pattern to provide a uniform interface that
facilitates polymorphism. The Process and the Process Creator classes both
rely on common interfaces but for different reasons.
The Process class is an abstract mixin class. It is used to create a
process's functionality. Late binding is used to allow a Process object's
functionality to be created at runtime through dynamic libraries. This allows
the PC to create processes based on threads, system processes (fork) or
distributed processes. This technique uses the Component Configuration
design pattern. To facilitate this functionality, each process has a common set
of functions (an interface), provided by the Facade design pattern, that are
called by the PC. This same technique is used by the PM. The PM can use a
PC of any process type and load it while running.
Another good example of how these freedoms come into play in an
application is to look at the load balancing scheme facilitated through the PC.
The Component Configuration pattern allows the distribution or load
balancing method of a running process to be changed dynamically while a

process is running. This is handled through the abstract Process Distributor
class. Its children provide the programmer with complete control over how a
process or group of processes are distributed. There may be sections of a
calculation where the size and complexity is non-deterministic. This type of
process distribution would be best suited to a dynamic or adaptive scheduling
algorithm. On the other hand, there may be points where a simple round
robin scheme works best. The decision-making overhead of dynamic
scheduling is then eliminated since the PEs receive processes evenly. To clarify
this subject, we will discuss the merits and implications of differing process
distribution schemes in section 3 of this paper. The key here is that using a
framework provides the programmer with control over these key elements.
2.5 Summary of GoDo and Patterns
Using a framework and an object-oriented approach to creating
distributed systems provides a much better solution for building and optimizing
them. This method of distributed processing provides freedom
and control that static protocols or single system emulations do not provide.
What these types of systems are trying to "hide" is exactly what is most
important to real world distributed applications. The OO approach allows the
programmer to access the correct amount of functionality for the given
application.
The GoDo library provides us with a point of reference from which
we can explore OO programming and distributed system design. Modern
distributed systems will need to take a completely different approach to
process distribution than what has been prescribed in the past. Most current
process distribution technologies such as RMI, DCOM and CORBA are based
on the concept of multiple applications accessing a single core piece of logic.
This is completely the opposite of the problem the GoDo library and many
single system emulations are trying to solve: a single
application that needs its processes distributed across multiple processing elements.
Many unnecessary and expensive features are needed by these protocols to
integrate into application development environments. These are mostly
unnecessary when designing a single application to be distributed across
multiple processing elements. The one feature needed for process distribution
is absent from all of the aforementioned protocols: a load balancing method.
The Go and GoDo libraries both use patterns to solve various, and
sometimes very different, problems. Design Patterns simplify programming
by providing common interfaces or approaches to a wide variety of problems.

3. Process Distribution and Load Balancing
Load balancing is used in parallel processing systems to evenly spread
tasks across multiple Processing Elements (PEs). This is a necessary
component of both distributed and non-distributed parallel systems. Load
balancing is the logic by which processes are dispersed among PEs. Its
main purpose is to prevent idle time among PEs. The other objective of a
load balancing or scheduling algorithm is to avoid "thrashing". This is a
state where some or all PEs become overloaded and are no longer able to
effectively process data. Load balancing is also known as load sharing and is
considered to be part of the larger distributed scheduling problem [SHK95].
Load balancing properties can be broken up into two parts: the load
balancing scheme and the load balancing system. The properties of the
scheme deal with more abstract concepts, while those of the system concern
the physical elements needed to make the scheme work. The first two sections
provide information on each of these concepts respectively.
The next section deals with the weighting of PEs in a load balancing
policy. We will show why the weighting of PEs is important to proper load
balancing in a heterogeneous slave pool. We will also discuss when it might
be appropriate to use weighting in a cluster that is homogeneous. This paper
does not cover the weighting of processes and will leave that to future work.
The final section deals with testing and comparing four different load
balancing algorithms: Round Robin, Random, Dynamic and
Weighted. It is important to point out that the weighted algorithm is the same
as the dynamic one but has the additional benefit of a PE weighting factor. We will
show that this weighting factor is critical to the proper load balancing of a
heterogeneous system.
3.1 Load Balancing Schemes
The core of a load balancing system is embedded in the scheduling
algorithm. This is the decision-making piece of the system. The scheduling
algorithm will "receive" a process or part of a process to distribute. The
algorithm will then choose a PE to "send" the work to. The algorithm can be
static or dynamic [SS94]. A static system sends the work out in a set pattern.
There is no consideration for processor load, process size and/or complexity.
Dynamic load balancing takes these factors into consideration. The load

balancing algorithm may take part or all of these things into consideration.
There also may be a weighting factor used for the PE and/or process. For the
PE, this will represent the processing power and/or memory of the element.
These could be supplied in differing weights or as an overall weight. The
process is weighted in a similar fashion. Likewise, the memory load and/or
complexity of a process can be weighted together or separately. There is also
a third type of algorithm derived from dynamic load balancing called
adaptive [SS94]. This type of system adapts to changes in the system's load or
performance. This can be as simple as not performing polling or holding jobs
in a special queue if the overall system load is too high, or as complex as a self-
tuning system with a high amount of AI (artificial intelligence) for decision
making.
There are two models that determine where the load balancing
algorithm is run. These are centralized and decentralized respectively [SS94].
Centralized load balancing has one element that is dedicated to load
balancing. The upside to this model is that the load balancing element of the
system is almost always free to do its job. This also is a good place to do any
queuing of tasks as processes can be held until a PE with load below some
threshold is available. The downside to a centralized system is there is no
redundancy. If the master or central load balancing agent fails for any reason,
the entire system fails. Decentralized load balancing has been around since
the beginning of networking. It can be seen in systems such as routers and
DNS. These daemons are both "senders" and "receivers" (these concepts will
be discussed in more detail in the next section). If they are overloaded, they
can offload work to another system. One popular area of research has been
the decentralized workstation model. While showing much promise at its
outset, it is less common in practice. This concept takes all workstations on a
network and has them work as a single multiprocessor system. If a
workstation is overloaded, it simply sends tasks off to a system that is
underloaded. Problems arise when all workstations are overloaded. The load
balancing process itself becomes an extra burden to an already overloaded
system. There can also be long loops of process transfers creating
communication latency, compounding the problem [SS94] [SHK95]. Due to
these inescapable shortcomings, this model has been, for the most part,
abandoned.
A load balancing scheme can be broken down into various policies.
These are as follows [SHK95]:
Transfer Policy: Sets a threshold for transferring tasks to another node.
It is indicative of a decentralized system. We will not use a transfer policy
in our model because distribution is centralized in our system. We also do
not engage in the process migration typical of a transfer policy.
Selection Policy: Selects a task for transfer. This is done after a node
goes over a load threshold. Selection policy is another element of a
decentralized system, so it too will not be used in our model.
Location Policy: Finds a suitable node to send a process to. The
decision can be made statically or dynamically. This is the load balancing
algorithm and is present in all MP systems.
Information Policy: Decides when information about an element should
be collected. There are several types:
Demand-driven: State information is collected only when it is needed.
Periodic: Information is collected periodically at set intervals. This
type is static because it does not adapt to the system's state.
State-change-driven: Elements provide state information whenever
their state changes by a certain degree. This requires constant state
monitoring at each element and provides the most dynamic solution, as
changes in processing are reported proactively.
3.2 Load Balancing System
The communications network is of critical importance to the load
balancing system. If communications are not reliable, the system as a whole
will be unreliable. It is best practice to use a dedicated point-to-point
protocol with error correction in a distributed system. This ensures there will
not be contention for communications channels and message delivery will be
guaranteed. In a load balancing system, the communications network provides
the link between the load balancing logic, the process request and the PE.
The Sender is the system element that has work to be done [SHK95].
This is either a client sending a task to a distributed system or, in the case of a
network of workstations (NOW), a workstation whose processing load has
gone over a certain threshold. In the first instance, there is no decision
making by the sender. The sender simply hands a task off. The process of
load balancing and process distribution is invisible to the end user. In the
second case, it is customary for the workstation to decide where the process is
to be sent.
The Receiver is the element that takes on processing requests
[SHK95]. It can be a slave/worker, daemon or a workstation. There is little
or no logic to redistribute the process. In the worst case the slave/worker can
respond with a NAK (negative acknowledgment) or refuse the process. In the

workstation model, the receiver may be responsible for redistributing the
process. Typically all elements are responsible for load balancing processes.
One such model makes the Receiver act as the load balancing element. If its
load is above a given threshold, it becomes a Sender
and either randomly or in some set pattern (e.g. nearest neighbor, next address,
etc.) resubmits the process for load balancing. This model is potentially
unstable. If all the workstations are above the given threshold, the
communications network will become jammed with process "sends". The
system will essentially go into a busy loop [SHK95].
The last part of a load balancing system is the queue. The queue is a
list of processes that need to be run. The queue can be stored on any part of
the system. It is typically stored on the system with the least amount of
overhead. In the case of a centralized system, the queue is held on the process
distribution server. The queuing may also be offloaded to a kernel based
queuing system provided by the OS. An OS will typically provide a
pre-optimized queuing system.
3.3 Using Weighting In Location Policy
In order to create a system that is flexible, we must be able to add
PEs that vary in resources and processing power. This holds true for the
"Pile of PCs" approach.5 We want to be able to add any free and available
computer to the system. In this model there is little control over what types of
PE will become available. While this is obvious for the junkyard approach, it
is also very important for commercial distributed systems. These types of
systems will have PEs of varying power. This results in improved
scalability. Legacy machines can be utilized as state-of-the-art elements are
added. Therefore, the legacy systems can be phased out without impacting the
overall system. An older PE can be taken off-line when it is no longer
providing ample processing or it's at the end of its life cycle. It is a
worthwhile arrangement from the vantage point of not impacting the end
users. The entire system can potentially be upgraded with no down time.
With these advantages, it seems very lucrative to create a dynamic
distributed system where PE processing power is varied. While it appears to
be a good idea, there is a big hurdle to overcome. In order to compensate for
differences in processing power, a deterministic or quasi-deterministic system
of process distribution must be created. To have a true deterministic system,
everything about the process behavior should be known in advance [Tan95].
In a system where PEs vary, the PE system information will also need to be
known in advance. This includes static information such as total memory and
CPU power as well as dynamic information such as process load and free
memory. Static system information can be collected up front when a system
offers itself for processing, becoming a system PE. Dynamic system
information should be collected at the time of process distribution for
temporal accuracy. Deterministic systems often use dynamic scheduling
algorithms to "determine" which PE should get the process.
5 Note that when referencing a "Pile of PCs", PC does mean Personal Computer.
For the reasons mentioned above, it is believed that a heterogeneous
system using a weighted load balancing scheme provides the most attractive
model for a distributed system. The next section will test if the weighted
policy is indeed the best solution for a heterogeneous collection of loosely
coupled PEs.
3.4 Testing Transfer Policies
We will use four different Transfer Policies for determining how
processes should be distributed: Round Robin, Random, Dynamic and
Weighted. These are combined with four different test programs for a total of
sixteen tests. The transfer policies and test programs are described in detail in
the next sections.
We will use a heterogeneous system where the hardware of each PE is
variable. As we have and will argue, homogeneous systems are not of interest
due to the fact that they lock the system into a single PE architecture. With a
heterogeneous distributed system, individual PEs may be changed out and
retired but the overall system should continue to operate for a long time due
to modular hardware upgrades.
3.4.1 Test Scheme
The load balancing system is a centralized three-tier architecture based
on the GoDo library. Tier one consists of a client who will request processing
space. The middle tier is the Master server. The Master server receives a
request from the client and brokers the process to a PE. The decision as to
which PE gets the process is made by the load balancing or scheduling
algorithm based on the transfer policy in place. The third tier is the slave. The
slave takes process requests and runs them given thread availability. If the
slave is out of threads, it responds with a NAK and the server retries the
transfer policy. This relationship provides our queuing rule. Upon process
completion, the slave returns its IP address and process runtime to the client.
This runtime is used in evaluating the effectiveness of the transfer policy.

Aside from running processes, the Slave will also return the load average to
the Master on request. The Slave is analogous to a PE. Illustration 1 in
section 2.2 provides a graphic representation of the system communication
protocol. A detailed explanation of process creation is also described in
section 2.2. Illustration 2 shows an overview of the system.
Illustration 3.1 GoDo System Architecture
The static algorithm uses Round Robin scheduling. This scheme
involves looping through the slave PEs and giving each a process in turn.
There is no consideration for processing load or process size. The load
balancing scheme blindly hands each PE a process.
The Random Location Policy picks a number between 0 and the total
number of slave PEs minus one. This number is used to pick an element from
a constant ordered vector of processing elements. There are no other
considerations. This is a hybrid policy as it is not truly static nor is it really
dynamic. It is not static because a decision, albeit random, is being made in
regard to process distribution. Given our definition, it is not really dynamic
either since it does not take into account any system state information.
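The two policies above are simple enough to sketch directly. The class names are hypothetical (they are not the GoDo scheduler classes), and the slave count stands in for the constant ordered vector of PEs mentioned in the text.

```cpp
#include <cstddef>
#include <cstdlib>

// Round Robin: hand each slave a process in turn; no state is consulted.
class RoundRobinPolicy {
public:
    explicit RoundRobinPolicy(std::size_t slaves) : slaves_(slaves) {}
    std::size_t next() { return next_++ % slaves_; }
private:
    std::size_t slaves_;
    std::size_t next_ = 0;
};

// Random: pick an index in [0, slaves-1] over a fixed ordered set of PEs.
// A decision is made, but no system state is examined (the hybrid policy).
class RandomPolicy {
public:
    explicit RandomPolicy(std::size_t slaves) : slaves_(slaves) {}
    std::size_t next() {
        return static_cast<std::size_t>(std::rand()) % slaves_;
    }
private:
    std::size_t slaves_;
};
```

Neither policy ever queries a PE, which is precisely why their distribution overhead is so low and why they cannot react when a PE becomes overloaded.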
The dynamic scheme uses a demand-driven information gathering
policy. Each time a process request comes from a client, each Slave is polled
for its load average. This calculation is done by the kernel automatically so
there is no additional processing penalty. After being polled, the Slave passes
its load average value back to the Master. The Master then chooses the
Slave (PE) with the lowest load average.

The last policy uses the above mentioned dynamic method in
conjunction with an element weighting. This weighting is derived from system
memory and the kernel-calculated bogomips. The latter is a calculation done
at boot time by the system on a processor to estimate its MIPS. The same
formula is used to determine the processing and memory weights. It is as
follows:
Weight = Weight Value / (2 × Max(All Initial Weight Values))
The "Weight Value" can be any metric used for weighting. The idea
behind this formula is to not introduce too much bias when weighting a PE.
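A sketch of the weighted policy follows. The weight function implements the formula above; how the load average is then combined with the weight is not spelled out in the text, so dividing load by weight (favoring lightly loaded, powerful PEs) is an assumption made here for illustration, as are the function names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Weight formula from the text: value / (2 * max(all initial values)).
// Every PE maps into (0, 0.5], limiting the bias of any single metric.
double weight(double value, double maxInitialValue) {
    return value / (2.0 * maxInitialValue);
}

// Hypothetical selection step: score each PE by load divided by weight
// and pick the lowest score, i.e. the least loaded relative to its power.
std::size_t pickPE(const std::vector<double>& loadAvg,
                   const std::vector<double>& bogomips) {
    double maxBogo = *std::max_element(bogomips.begin(), bogomips.end());
    std::size_t best = 0;
    double bestScore = loadAvg[0] / weight(bogomips[0], maxBogo);
    for (std::size_t i = 1; i < loadAvg.size(); ++i) {
        double score = loadAvg[i] / weight(bogomips[i], maxBogo);
        if (score < bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```

With equal load averages the scheme sends work to the PE with the highest bogomips, while a sufficiently idle slow PE can still win, which is the behavior a heterogeneous pool needs.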
3.4.2 Test System
The test system uses a 100Mb Ethernet network.6 The TCP/IP protocol
was used for packet transfer. TCP/IP provides guaranteed packet delivery on
a socket connection. TCP uses a three-way handshake for connection
establishment and a four-segment connection close negotiation. TCP also
handles error detection and packet retransmission [Ste98]. The features of
TCP greatly simplify the creation of a reliable network.
Connecting the network is a 10/100 asynchronous Ethernet switch.
An asynchronous switch provides two-way, full-bandwidth communication on
independent collision domains. This provides for the least amount of network
contention and reduces communications latency. All computers are equipped
with 10/100 asynchronous Ethernet network interfaces.
The Client and Master are run on the same machine. This is due to the
low overhead each process incurs. These were run on an Alienware MJ-12.
This system has two MP Athlon 1600+ processors and 512 Mb of system
memory.
Two of the PEs were Compaq Armada laptops. Each system has a
300 MHz Pentium II processor with 192 Mb of RAM. One laptop had a
bogomips rating of 598.01 while the other was 663.55. The third PE was a
Compaq Presario 5868 with a 600 MHz Athlon processor, 512 Mb of
RAM and a bogomips rating of 1192.75. The fourth and last PE was a clone
Athlon 1.2 GHz system with 256 Mb of RAM and a bogomips rating of
6 Fast Ethernet is not the optimal network speed given the fact that Gigabit
Ethernet is readily available. Because of this, network latency is ignored as
a factor in our tests.

3.4.3 Test Programs and Assumptions
It is assumed that static load balancing will perform well on a series of
identical tasks. This is due to the fact that static algorithms have lower
overhead. It would certainly hold true for small tasks that never overload the
entire system but it is not clear if this assumption will also hold if the system
is given a task that overloads it or if the process size is random. To this end,
four test programs were created. One process simulates straight double-loop
linear vector/vector multiplication and has a runtime of O(n²). The aggregate
of these processes forms a simulation of a parallel matrix/matrix multiplication
process. The second test uses the same double nested loop but the loop size is
chosen at random within a given bound. These processes are designed to
simulate a random system load, more likely not associated with mathematical
computation per se. The two versions had their loop or vector size set to
5000000 and 2500000. For the random test, the loop count represents the
upper bound for randomization with the lower bound being 0. A small
calculation is performed in the inner loop to hinder loop optimization by the
compiler or processor. Optimization may lead the processor or compiler to
figure out the calculated values are not used and not perform the required
looping. The memory size was set to 1Mb and 5kb respectively. The
memory was filled with a single character to force its use avoiding any
optimization that might put off instantiating it. The smaller versions of the
base test programs are easily accomplished by the PE farm. The larger ones
cannot be accomplished quickly and provide a slightly overloaded system
state.
Dynamic load balancing is expected to perform better under
variable load, while weighted is expected to perform the best. This is due to
load factors and processing resources being taken into account. It is also
assumed that the weighted scheme will perform better at non-random linear
calculations under heavy system load since the system should be more evenly
balanced. The static policy will unknowingly overload the weaker systems
while leaving the more powerful systems underutilized.
The static algorithms should fare better at handling low to moderate
matrix/matrix multiply calculations. This is due to the low overhead incurred
during process distribution, the consistency of processing resources required
and the fact that no system will ever get overwhelmed.
3.4.4 The Test
The test program runs 1000 test processes. Each test program is run

three times to give a total of 3000 data points per test. The test was run three
times to provide a check for the random processes to make sure there was not
a massive fluctuation in program results. The large number of processes per
program (1000) also helps negate issues by creating a mean loop size that
should be close to half the upper bound.
The client collects data on the process's run times and the PE. This is
written to a test-specific file. These files are then transformed into comma-
delimited files, with other data, contained in the file name, put on each line.
These are then rolled into a single file that is uploaded to a database for
analysis.
The data is analyzed based on two criteria. The first is total
processing time: the total time a processing element was utilized, or
the aggregate run time. The second is the average run time.
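The two criteria can be computed directly from the per-process run times collected by the client. A minimal sketch, assuming one run time (in seconds) per distributed process:

```cpp
#include <numeric>
#include <vector>

// Total time the processing elements were utilized (aggregate run time).
double aggregate_time(const std::vector<double> &runtimes) {
    return std::accumulate(runtimes.begin(), runtimes.end(), 0.0);
}

// Aggregate run time divided by the number of processes.
double average_time(const std::vector<double> &runtimes) {
    return runtimes.empty() ? 0.0 : aggregate_time(runtimes) / runtimes.size();
}
```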
3.4.5 Results and Conclusion
Given Moore's law, which proposes an exponential increase in
processing power over time, PE system resources in a heterogeneous,
modular system will likely vary greatly. While most heterogeneous systems
are research based like the ones at Sandia National Labs, it is believed that
single modular distributed systems will become commonplace in commercial
applications as well.
There are several advantages of a software based heterogeneous DMP
system over homogeneous MP or DMP systems. The most compelling is that
the system can be expanded using the latest technology. This keeps the
system from becoming locked into a legacy PE which will eventually be
eclipsed by ever faster processing power. Heterogeneous systems hope to
circumvent system obsolescence for this reason. Another compelling
advantage is that a PE can be a computer of any type. The system is never
locked into a single PE architecture. This allows the system to assimilate the
most advantageous current technology. Taken further, if the GoDo library is
ported, it is no longer bound to a single operating system. This increases the
system's heterogeneity. Upgrading or adding to the farm of PEs is made even
more flexible while leveraging the latest software as well as hardware.
Most of the assumptions made before the test have held true. In all
cases except one the weighted scheme performed best. Likewise, the dynamic
demand-based polling schemes, of which the weighted policy is one,
worked better on random-size tasks. There was one anomaly to this

assumption where the round robin scheme outperformed the dynamic scheme
on smaller tasks. The results of the aggregate sum runtime can be seen in
Table 3.1.
          Round Robin    Random     Dynamic    Weighted
Test 1       3589.5      3408.32    2572.62    1685.87
Test 2      28749.66    27638.89   12541.19   13418.62
Test 3        420.25      705.89     529.67     334.38
Test 4       3765.43     5195.52    2803.82    1660.05

Test 1: Linear Double Nested Loop, 2.5 million loops, 5kb of memory
Test 2: Linear Double Nested Loop, 5 million loops, 1Mb of memory
Test 3: Random Double Nested Loop, upper bound of 2.5 million loops, 5kb of memory
Test 4: Random Double Nested Loop, upper bound of 5 million loops, 1Mb of memory
Note: All run times are in seconds. The lower bound for all random-size loops is 0.
Table 3.1 Aggregate sum of three tests consisting of 1000 distributed processes each.
There are only two variations in the test results. The first is seen in Test 2, where the
dynamic scheduling policy was better than the weighted one. The other is in Test 3, where the
round robin policy performed better than the dynamic.
An interesting result was that the static Round Robin approach was not
the best choice for consistent linear parallel processing on our heterogeneous
test system. The dynamic schemes were clearly better at this task. This may
be partially due to the wide range of processing power among the PEs. PEs
that are closely matched are not of much interest, as a wide variance in PE
capabilities would most likely be the case in any large heterogeneous system.
Charts 3.1 and 3.2 below show the general trend of the tests with regard
to aggregate and average processing time. A complete listing of tables and
charts from these tests can be found in appendix A, B, C and D. Please note
that the charts are in time and not in policy order. The label on the Y axis
must be read to determine what policy is being represented.

[Chart 3.1: Sum Run Time. Random Process Size: 0-5 Million Loops Double Nested, 1Mb of Memory]
Chart 3.1 The sum total processing time of three 1000 process tests. The test
process consists of a double nested loop with a random loop size. The upper
bound of the loop size is five million while the lower bound is 0. Each process
also instantiates one megabyte of memory.
[Chart 3.2: Average Run Time. Random Process Size: 0-5 Million Loops Double Nested, 1Mb of Memory]
Chart 3.2 The average processing time of three 1000 process tests. The test
process consists of a double nested loop with a random loop size. The upper
bound of the loop size is five million while the lower bound is 0. Each process
also instantiates one megabyte of memory.

4. Designing an NGI Application: VOiS
We will now pull together all of the preceding material in this paper to
design a Next Generation Internet application. We will use the Virtual Office
internet System (VOiS) to illustrate using the distributed object library GoDo
in a distributed application.
We will start with an introduction and overview of the system. This
will provide information on the system's purpose and provide a gross overview
of the high level subsystems.
We will then look at the VOiS system design. This will introduce the
process creation process which is built on top of GoDo. We will then work
our way through the components of VOiS starting at the lowest level used for
process distribution and working our way up to resources directly available to
VOiS users.
4.1 Introduction to the VOiS
VOiS is a system that sets out to replace brick and mortar office space
for small to medium-sized businesses. The VOiS system will provide a private
network to people in disparate locations. This will allow companies to be
developed without the need for expensive brick and mortar office space and to
hire talent with no need for relocation. This also allows talent to live
wherever they please, improving the quality of life of all involved.
4.1.1 VOiS Overview
VOiS is designed to provide all the facilities of an office environment
including network file services, e-mail, a phone/video conferencing system
and "offices".
An "office" will resemble a complex instant messaging system. While
working, each employee will have the ability to "knock" on another's door. The
"office" will consist of a private connection for voice and video
conferencing. The recipient of the knock can then decide how to respond to
the connection. "Walking" the office will simply mean looking up a list of
names, positions or both and choosing the office to "knock" on. Walking the
office can also be done through an organizational tree, allowing new
users/employees to quickly identify who to contact in the organization for
certain requests.

The phone system will probably be of the most interest. VoIP (Voice
over IP) will be used to connect voice-style communications. Soft Switch
technology can then be employed to make the connection between the VOiS
phone system and the traditional circuit-switched network.
The virtual file and e-mail services will ride on top of a VNOS
(Virtual Network Operating System). The idea behind the VNOS is to
provide a layer of abstraction from the actual NOS servers. Therefore the
distribution of processing and services across multiple servers becomes
possible. The VOiS system runs on a "farm" of servers. New resources may
be added in a modular fashion. When more resources are required for the
system, simply add more computers to the farm, thereby increasing both
the disk space and processing power of the VOiS system. The concept is to
provide a layer of abstraction that makes physical limitations such as disk
partition size irrelevant.
VOiS can be viewed on an architectural level as a four tier system
consisting of clients, authentication server, boss server and a server farm of
workers (Illustration 4.1).
Illustration 4.1 VOiS High Level
Conceptual Diagram

The VOiS system will be controlled by the "Boss" server. The Boss
is a child class of the GoDo Process Manager. The Boss will serve many
functions and keep a database of users, services, authentication tokens and
other user/office information. The Boss will also act as a distribution server
for the worker server farm. Distribution of services will be handled by the
GoDo library. The file system will also be distributed across the workers.
Therefore common servers' disk space can be combined into a storage area
network. The Boss's job will be to relay the results of a process request and
coordinate the starting and stopping of the processes. The Boss will be
responsible for keeping track of the workload of each Worker, enabling it
to decide which Worker is best suited to handle a new process.
The Authentication server is a fairly complex firewall system. The
authentication server will mainly act as a gateway insulating the Boss from
direct Internet communications. It does basic initial authentication for high
level security but will defer to the Boss for low level system authentication.
The Boss's security system will closely resemble MIT's Kerberos [KN93] or
Secure Shell [BS01], using public/private key encryption for authentication.
The Boss will be the holder of private and public key information, ensuring it
is never accessible directly from the Internet.
The Client will be the user interface run by a VOiS user. It will
contain several smaller applications to provide access to the VOiS system's
functionality. These will include the client office, client phone, and a
connection to the VNOS.
The Worker will have a suite of server applications which will be run
when the Boss instructs them to do so. A Worker is analogous to and a child
of the GoDo Slave.
As mentioned above, the Worker will also provide disk space for the
Boss storage area network. The Worker will have no knowledge of what the
disk space is being used for. This will be coordinated by the Boss and allow
for the layer of abstraction necessary to maintain dynamic scalability of
storage resources.
4.2 VOiS System Design
The next section will explore the system design of the VOiS using
GoDo. We will first look at the high level separation of work between the
Client, Boss and Worker. We will then explore the design of the various
VOiS services. We will conclude this section with an overview of the process
creation collaboration.

4.2.1 Overview of VOiS Process Creation
The conceptual diagram in Illustration 4.1 shows a high level
breakdown of the parts of the VOiS system. We will provide a quick
clarification of how each part fits into the GoDo distributed system
framework. The Authentication server is omitted from this discussion as,
while it is an important part of the VOiS system, it has little to do with
distributed processing and hence is outside the scope of this paper.
VOiS Client
The VOiS Client uses the GoDo PMSession class to request
processing space from the VOiS Boss Server. Once connected to the Boss, all
subsequent functionality is first initiated by creating a GoDo Process Client
object to communicate with the running process on the farm. The Boss will
run a Proxy Process Client for each request so the real client never gains
connectivity to the system's area network. This will provide protection from
one VOiS user hacking the data of a different Office.
VOiS Boss
The Boss node uses the GoDo Process Manager class to create
distributed objects. The Boss is our centralized Master. It will acquire most
of its functionality through inheritance in order to gain access to the Process
Manager's protected attributes and methods. The addition of a Proxy Client
and the need to create an extra IPC channel make this necessary.
VOiS Worker
The Worker uses the GoDo Slave class. Since the functionality
extracted is straightforward, an object of the Slave class is created and the
class is not inherited. After creating and starting the Slave object in its own
thread, the Worker has no further need to interact with the object. The Slave
is left to go about its business.
VOiS Process Creation
The procedure for creating processes in the VOiS system is identical
to the process used for GoDo except for the addition of a proxy client. (The
GoDo procedure is described in detail in section 2.2.) With the addition of
the proxy, the Process Manager will create a new proxy client,
providing it with the port and IP of the process and an additional IPC channel
for communicating with the Boss. This channel will give the Boss the ability
to control processes through the proxy, which is necessary for system
administration. The proxy will return an IP and port over the local IPC
channel, allowing the Process Manager to return the Client's proxy connection
information.
Illustration 4.2 shows the communication between the Client, Boss and
Worker.
Illustration 4.2 Collaboration Diagram VOiS Client Process Creation
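The collaboration above can be sketched in simplified form. All class names, methods and addresses below are hypothetical stand-ins for the GoDo/VOiS classes, with the network calls stubbed out; the point is only that the Client receives the proxy's endpoint, never the Worker's.

```cpp
#include <string>

// Illustrative sketch of the proxy step in VOiS process creation.
struct Endpoint {
    std::string ip;
    int port;
};

class Boss {
public:
    // The Client receives only the proxy endpoint; the real process
    // endpoint on the Worker farm is never exposed.
    Endpoint createProcess(const std::string &proc, const std::string &args) {
        Endpoint real = createOnWorker(proc, args); // normal GoDo creation
        return startProxy(real);                    // proxy is given the process IP/port
    }

private:
    Endpoint createOnWorker(const std::string &, const std::string &) {
        return {"10.0.0.7", 5000};   // stub: address inside the system area network
    }
    Endpoint startProxy(const Endpoint &target) {
        (void)target; // a real proxy would also open an IPC channel to the Boss
        return {"192.0.2.10", 6000}; // stub: public-facing proxy address
    }
};
```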
An important consideration when creating a distributed system is what
constitutes a process worthy of distributing. It makes little sense to distribute
a process that is short lived or whose resource requirements are so low that
the distribution and creation of the process requires more CPU time than the
process itself. This is a very intriguing problem for the VOiS, but a simple
solution can be deduced. A distributed process will, in general, constitute a
server service. A server implies that there is some type of long running
process. This may not be processor or system intensive, but it will be long
lived and, more than likely, vary in resource requirements based on demand.
For the remainder of this discussion, we discover which services in the VOiS
require process distribution.
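The rule of thumb above can be expressed as a small predicate. The overhead figure below is an assumed placeholder, not a value measured in this work:

```cpp
// Distribute a process only when it is a long lived service, or when
// its expected work clearly exceeds the cost of creating and
// distributing it. The constant is an illustrative assumption.
bool worth_distributing(double expected_runtime_sec, bool long_lived_service) {
    const double distribution_overhead_sec = 0.5; // assumed creation + network cost
    return long_lived_service ||
           expected_runtime_sec > 10.0 * distribution_overhead_sec;
}
```

Under this rule, VOiS server services always qualify because they are long lived, even when they are not processor intensive.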
4.2.2 VOiS Distributed Services
VOiS has a series of services which run as servers to fulfill various
system requests. These services can be broken down into two basic categories.
The first is system services. These are started at system initialization
and run for the life of the system. These are top level components and are used
to spawn members of our next category: user distributed processes.
A user distributed process lasts for the duration of a client session.
These are, for the most part, created when a client initiates a VOiS session.
There may be further sub-grouping beyond these where a user service
may need to create distributed processes. So far none have been
discovered during system analysis and design, but the possibility that one
might exist will be left open.
VOiS System Services
VOiS system services can be broken down by examining the real
world functionality we will be emulating. At the highest level, we will need a
facility, data center and communication switch board. These are high level
global services for the entire system. Each of these systems will aggregate
company specific services of the given type. The term "company" refers to a
logical group of users who collectively use the VOiS services in a private
virtual office space. We will continue our discussion with a description of
these services and their subsystems.
VOiS Facility Service
There is a single Facility service for the VOiS. This facility will
aggregate the Office Space services. There will be one Office Space per
company. These will exist for the period company has VOiS service from a
POP (Point of Presence). The Office Space service will act as a container and
point of entry for all other company level services. VOiS Switch Board Service
To facilitate communications, a Switch Board service will be created.
When a communication request is received, the Switch Board will request a
distributed process to handle the Session service. This session will last for
the duration of the "call". The Switch Board service is complemented by the
Office Space Switch Board (OSSB) contained in the Office Space. This
service allows users of the same office space to communicate with each other
on a virtual "private line". The physical purpose of this service is to isolate
communication between office spaces. The OSSB will forward all "call"
requests to the main Switch Board for Session creation.
VOiS Data Center Services
The Data Center service acts as a creator, aggregator and control
center for VNOS services. There will be VNOS servers to handle Storage
Area Network resources, email and file system access.
Each worker or node will provide a portion of its local file system to
the VOiS Storage Area Network. The Storage service will provide a

collection of virtual inodes and model itself after the time-proven UNIX File
System, or UFS [Lem97].
Each Office Space will get two distinct VNOS services. One will
handle email while the other will provide access to a private company file
system. Access to these services will be integrated into the client's OS
paradigm. The email service will use SMTP (Simple Mail Transfer Protocol)
[Pos82] while the file system server will integrate itself with the methodology
used by the client's OS. For UNIX based systems, now including Macintosh,
the resources will appear as an inode or, more literally, as a directory. This
presentation is similar to and will be modeled after NFS. For Microsoft based
systems, the service will appear as an SMB (Server Message Block) service
[AA87][Sha99].
VOiS User Services
User services exist for the duration of a VOiS user session. These are a
mix of user private and Office Space global services. These services provide
the intermediate layer between the client and the VOiS system.
VOiS Office Service
When a user initiates a session, they are provided with a virtual Office
from the company Office Space by the Office Space Manager. The Office
service is analogous to the Office Space in that it is a container for other
services. The objects or services contained in the Office include a Terminal
for accessing VNOS services, a Phone for communication and a Directory
containing members of the company.
VOiS Terminal Service
The Terminal service provides access to the VNOS servers. The
Terminal is responsible for preparing VNOS and system data for client
presentation.
VOiS Phone Service
Each office will be equipped with a Phone. These have video
conferencing capabilities for communicating inside the Office Space or with
VOiS users from other companies. The H.323 standard and VoIP (Voice over
IP) would be the logical choice for the audio/video signaling [Arc01].
VOiS Directory Service
The Directory itself is maintained by the Office Space as an LDAP
(Lightweight Directory Access Protocol) server [KHY95][Kil95]. The
Directory service is an LDAP client. This service provides a standards-based
list of company employees and gives the client the ability to "walk" the office.
This directory can be displayed in a tree structure by company organization or
in a flat list.
4.3 VOiS Distributed Processing Conclusions
By using the GoDo library, we greatly simplify the creation of a
distributed application. We can use preexisting class objects and inheritance
to provide most of the functionality needed for process distribution. The only
piece we need to reengineer is the PM to add a proxy. This still does not pose
much of an issue because we have a strong base class to derive functionality
from.
Our choice of using long lived services for process distribution seems
to have worked well. Through examining the design of VOiS we can
discover a large number of candidates for process distribution. We have
successfully offered almost all of the system's functionality for process
distribution. This should provide a highly optimized system.
There is one last observation that makes a strong case for a Dynamic
Weighted scheduling policy for an application such as VOiS. By using this
type of policy, we will, at system initialization time, bias the processes loaded
first toward the most powerful elements in our Worker Farm. These
processes/systems also happen to be the most critical for VOiS. They are
global in scope and will affect every user and subsystem. We clearly want
these to run on the most powerful PE and, through weighting, this will happen.

5. Conclusions and Future Work
This section will present the conclusions drawn from this paper
and future work that will extend research into this topic.
5.1 General Conclusion
This paper has explored a wide variety of issues surrounding the
development of parallel distributed applications. We began with a general
overview and a very brief history of the "pile of PCs" school of parallel
processing. We then explored the creation of a flexible OO library on which to
base parallel distributed applications. This was followed by testing what is
probably the most important aspect of any distributed parallel system: the
location policy, or load balancing scheme. Through these tests we discovered
that a dynamic weighted scheme works best for a heterogeneous system.
This paper has dismissed other distributed programming technologies
mainly because they are designed to solve a different problem domain:
remotely locating an instance of critical segments of system logic to be run by
multiple client applications. This solution is inverse to the issue being
discussed: locating multiple processes from a single application on distributed
PEs for parallel processing. To meet their purpose, these libraries have
accrued unnecessary processing overhead.
We also noted that clustered systems that attempt to present
themselves as a single system are, by their very nature, inefficient. They
attempt to mimic hardware based systems and do not take into account the
fundamental differences between software based DMP and hardware based
MP systems. Hardware based systems have the advantage of a very fast
communication bus and the ability to embed algorithmic logic in specialized
hardware. These optimizations make things that are possible in a hardware
based MP system impractical for distributed parallel systems. Things such as
virtual memory and process migration are key to a single system image.
These procedures require too much computation and communication to work
effectively in a distributed parallel system.
The use of an Object Oriented framework provides a compelling
structure on which to base distributed applications. OO design provides the
user with the flexibility to explore ideas and make changes to very low level
parts of the distributed system without affecting other elements. We've also
discovered through GoDo that there are several areas where dynamism can be
added to a distributed system and controlled through software. This includes
changing process types and/or distribution algorithms. We used the
Component Configurator design pattern to provide this functionality. For
distribution algorithms, our experiment showed that dynamic change is
probably desirable. This flexibility allows us to create an adaptive load
balancing system that can even contain a level of AI. In short, OO techniques
provide the flexibility to create just about any type of distributed system.
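The runtime-swappable distribution policy described above can be sketched as follows. These classes are simplified stand-ins in the spirit of GoDo's ProcessDistributor pointer (see the `pd = &pdRand` default in Appendix E), not the actual GoDo classes.

```cpp
#include <string>

// Abstract location policy: choose the next PE for a process.
struct ProcessDistributor {
    virtual std::string pick() = 0;
    virtual ~ProcessDistributor() {}
};

struct RoundRobin : ProcessDistributor {
    int next = 0;
    // Cycle through three hypothetical PEs.
    std::string pick() override { return "pe" + std::to_string(next++ % 3); }
};

struct Weighted : ProcessDistributor {
    // Stub: a real policy would poll load and weight by PE capability.
    std::string pick() override { return "pe-most-capable"; }
};

class Manager {
    ProcessDistributor *pd; // current policy, replaceable while running
public:
    explicit Manager(ProcessDistributor *p) : pd(p) {}
    void setDistributor(ProcessDistributor *p) { pd = p; } // live reconfiguration
    std::string place() { return pd->pick(); }
};
```

Swapping the pointer is all that is needed to change the location policy mid-run, which is what makes an adaptive (or even AI-driven) balancer possible.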
5.2 Future Work
The possibilities for future work on this subject are almost endless.
Since GoDo aims to provide all the functionality of a monolithic MP system,
there are many areas that deserve further investigation.
The most compelling is the development of a shared memory object.
The key feature of the OO approach is that if the top level Shared
Memory class's interface is well designed, the underlying functionality can be
changed, tested and researched without changing the code of processes using a
Shared Memory object. Differing memory coherence schemes should be
tested against differing problem domains. It is more than likely that certain
schemes will solve niche problem domains better than others. It is probably
most important to find the most efficient general coherence method for a
loosely coupled distributed system.
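Such a Shared Memory interface might look something like the following sketch. All names are hypothetical (no such class exists in GoDo yet), and the local map merely stands in for a real coherence scheme.

```cpp
#include <map>
#include <string>
#include <vector>

// Fixed top level interface behind which coherence schemes could be
// swapped without touching client code.
class SharedMemory {
public:
    virtual void write(const std::string &key, const std::vector<char> &val) = 0;
    virtual std::vector<char> read(const std::string &key) = 0;
    virtual ~SharedMemory() {}
};

// One stand-in backing store; a real scheme would propagate writes
// (or invalidations) to the other PEs per its coherence protocol.
class LocalSharedMemory : public SharedMemory {
    std::map<std::string, std::vector<char>> store;
public:
    void write(const std::string &key, const std::vector<char> &val) override {
        store[key] = val;
    }
    std::vector<char> read(const std::string &key) override {
        return store[key];
    }
};
```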
Objects that handle distributed concurrency control are another area of
interest. These would be very similar to the Shared Memory class discussed
above and would probably closely model a Mutex, Semaphore, and/or
Condition. This area would probably be more along the lines of development and
proof of concept as opposed to research. Since any condition typically blocks
(unless being polled for availability), it is unlikely there is much in the way of
optimization once an adequate method is derived.
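A minimal sketch of such a distributed Mutex follows, with the (assumed) networked coordinator stubbed out by a local std::mutex; only the interface shape is the point here.

```cpp
#include <mutex>

// The interface mirrors a local mutex; a real implementation would
// forward lock/unlock requests to a coordinator node over the network.
class DistMutex {
    std::mutex local; // placeholder for a remote lock service connection
public:
    void lock()   { local.lock(); }   // would send LOCK and block until granted
    void unlock() { local.unlock(); } // would notify the coordinator
};
```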
The full development and implementation of a working VOiS system
would be of great value. It would probably make the most sense to break the
system down into its subsystem components. The order in which the systems
are designed is of no consequence as long as they adhere to the GoDo
communication and process creation protocols. To accomplish this, the
Shared Memory and Concurrency Control classes should be developed first.

Appendix A
Heterogeneous Test: Aggregate Processing Time Table
Process Type    Balancing Method   Double Loops   Memory   Total Processing
                                   in Millions    Size     Time in Seconds
Random          Random             2.5            5kb        705.89
Random          Dynamic            2.5            5kb        529.67
Random          Round Robin        2.5            5kb        420.25
Random          Weighted           2.5            5kb        334.38
Random          Random             5              1Mb       5195.52
Random          Round Robin        5              1Mb       3765.43
Random          Dynamic            5              1Mb       2803.82
Random          Weighted           5              1Mb       1660.05
Straight Loop   Round Robin        2.5            5kb       3589.5
Straight Loop   Random             2.5            5kb       3408.32
Straight Loop   Dynamic            2.5            5kb       2572.62
Straight Loop   Weighted           2.5            5kb       1685.87
Straight Loop   Round Robin        5              1Mb      28749.66
Straight Loop   Random             5              1Mb      27638.89
Straight Loop   Weighted           5              1Mb      13418.62
Straight Loop   Dynamic            5              1Mb      12541.19

Appendix B
Heterogeneous Test: Aggregate Processing Time Charts
[Chart: Sum Run Time. Loop Process Size: 2.5 Million Loops Double Nested, 5kb of Memory]
[Chart: Sum Run Time. Loop Process Size: 5 Million Loops Double Nested, 1Mb of Memory]
[Chart: Sum Run Time. Random Process Size: 0-5 Million Loops Double Nested, 1Mb of Memory]

Appendix C
Heterogeneous Test: Average Processing Time Table
Process Type    Balancing Method   Double Loops   Memory   Average Processing
                                   in Millions    Size     Time in Seconds
Random          Random             2.5            5kb       0.24
Random          Dynamic            2.5            5kb       0.18
Random          Round Robin        2.5            5kb       0.14
Random          Weighted           2.5            5kb       0.11
Random          Random             5              1Mb       1.73
Random          Round Robin        5              1Mb       1.26
Random          Dynamic            5              1Mb       0.93
Random          Weighted           5              1Mb       0.55
Straight Loop   Round Robin        2.5            5kb       1.20
Straight Loop   Random             2.5            5kb       1.14
Straight Loop   Dynamic            2.5            5kb       0.86
Straight Loop   Weighted           2.5            5kb       0.56
Straight Loop   Round Robin        5              1Mb       9.58
Straight Loop   Random             5              1Mb       9.21
Straight Loop   Weighted           5              1Mb       4.47
Straight Loop   Dynamic            5              1Mb       4.18

Appendix D
Heterogeneous Test: Average Processing Time Charts
[Chart: Average Run Time. Loop Process Size: 2.5 Million Loops Double Nested, 5kb of Memory]
[Chart: Average Run Time. Random Process Size: 0-2.5 Million Loops Double Nested, 5kb of Memory]

[Chart: Average Run Time. Loop Process Size: 5 Million Loops Double Nested, 1Mb of Memory]
[Chart: Average Run Time. Random Process Size: 0-5 Million Loops Double Nested, 1Mb of Memory]

Appendix E
Gnu Object /Distributed Object Source Code
License Agreement
This program and all source code contained in this thesis paper is free
software; you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License in appendix G for more details.
To receive the latest copy of the GNU General Public License, write
to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
Copyright 2000,2001,2002 Anthony Scott Thompson
asthomps @
Process Manager
File: ProcessManager.h
This is a top level class for managing processes. This class manages processes
created by the process creator.
Author: Anthony Scott Thompson
Date: 11/17/01
#ifndef ProcessManager_h
#define ProcessManager_h 1

#include "ProcessCreator.h"
#include "DoProtocol.h"
#include "ProcessID.h"
using namespace std;
class ProcessManager: public Daemon {
SocketServer *ss;
ThreadPool thrdPool;
String sProcList;
Exception exc;
friend void* client_connection(SockArgs *sa);
ProcessCreator *pc;
ReaderWriter rwPC;
ProcessManager(const char* host, int port, u_int thread_count = 20) {
ss = new SocketServer(this,&thrdPool,port,host);
this->pc = NULL;
}
~ProcessManager() {
delete ss;
}
//Start the process creator
void run();
//Add or change the process creator.
void setProcessCreator(ProcessCreator* pc){
this->pc = pc;
} //End extern
File: ProcessManager.cpp
This is a top level class for managing processes. This class manages processes
created by the process creator.
Author: Anthony Scott Thompson
Date: 11/17/01
#include "ProcessManager.h"

void* client_connection(SockArgs *sa){
BinSocket& bs = *sa->sock;
ProcessManager& pm = *(ProcessManager*)sa->args;
String sBuff;
char* ret;
size_t size;
ProcessID *procID = 0;
ret = (char*)bs.Receive(size);
pm.showError("Bad Connection Message Received from client.");
return NULL;
bs.Send(ACK,strlen(ACK) +1);
(void*)ret = bs.Receive(size);
if(strncmp(ret,LIST,strlen(LIST))) {
pm.exc.setWhat("client_connection: Bad message received. Expected LIST.");
delete ret;
return NULL;
bs.Send(pm.sProcList.c_str(),pm.sProcList.size() +1);
}catch(Exception e){
e.appendWhat("ProcessManager::client_connection: Closing connection.");
pm.exc.setWhat("client_connection: Unknown critical error. Closing connection.");

delete ret;
ret = (char*)bs.Receive(size);
if(!strncmp(ret,REQ,strlen(REQ))) {
char* proc = ret + strlen(REQ) + 1;
char* fnd = proc;
while(fnd[0] != SEP_CHAR){
    fnd++;
}
fnd[0] = '\0'; //Terminate the proc string
char* args = ++fnd; //anything left is argument
procID = pm.pc->Create(proc,args);
//pm.exc.setWhat("client_connection: Process creation failed.");
bs.Send(NAK,strlen(NAK) +1);
bs.Send(sBuff.c_str(),sBuff.size() +1);
char buff[1024];
sprintf(buff,"PM: Sent new data to client: %s: ",;
delete procID;
char *num,*tmp = ret + strlen(KILL);
int check;

int iSig = -9;
for(int i = 0; i < 2; i++){
    tmp[0] = '\0';
    num = ++tmp;
    check = 0;
    while(tmp[0] != SEP_CHAR){
        tmp++;
        check++;
        if(check == 20) { //We've gone too far
            break;
        }
    }
    if(check == 20) { //Error
        pm.exc.setWhat("client_connection: Malformed kill message received.");
        pm.showError(pm.exc);
        break;
    }
    if(!i){
        iSig = atoi(num);
    }else{
        pm.rwPC.readLock();
        pm.pc->kill(atol(num),iSig);
        pm.rwPC.readUnlock();
    }
}

pm.exc.setWhat("client_connection: Bad message from session.");
}catch(Exception e){
e.appendWhat("ProcessManager::client_connection: Closing connection.");
pm.exc.setWhat("client_connection: Unknown critical error. Closing connection.");
return NULL;
void ProcessManager::run() {
deque<string> dq;

sProcList += dq.front();
sProcList += SEP;
Process Creator
File: ProcessCreator.h
This is an abstract top level class for managing processes. The sole job of this class
is to create processes. Processes are designed to be controlled by an outside process or
Process Controller.
Author: Anthony Scott Thompson
Date: 09/30/01
#ifndef ProcessCreator_h
#define ProcessCreator_h
#include "ProcessID.h"
using namespace std;
extern "C++"{
class ProcessCreator{

deque<string> dq;
Application *app;
void addToList(char* app){
    dq.push_back(app);
}
virtual bool run() = 0;
virtual ProcessID* Create(char* proc, char* args) = 0;
virtual void kill(u_long ProcID, int sig = 9) = 0;
void getList(deque<string>& dq){
dq = this->dq;
void setApp(Application *app){
this->app = app;
virtual ~ProcessCreator() {};
Distributed Object Process Creator
File: doProcessCreator.h
This is the Process Creator for making Distributed Objects. These objects use a
pool of servers to distribute processing across a pool of computers.

This file/class replaces doMaster.* and the Master class.
Author: Anthony Scott Thompson
Date: 03/10/02
#ifndef doProcessCreator_h
#define doProcessCreator_h 1
#define LOGFILE "/tmp/doMaster.log"
//#include "time_tool.h"
#include "ProcessCreator.h"
#include "pdDynamic.h"
#include "pdRandom.h"
#include "pdRoundRobin.h"
#include "pdWeighted.h"
#include "SlaveManager.h"
#include "DoProtocol.h"

#define TIME_OUT 2 //Slave response time out in seconds
#define BOGUS "bogus"
extern "C++"{
class doProcessCreator: public ProcessCreator{
Exception exc;
//int fdLogFile;
//Condition* condRequestStats;
//Mutex* mtxSlaveList;
Mutex* mtxOut;
Mutex mtxProcList;
Barrier* barStatList;
ReaderWriter* rwSlaveCount;
SlaveManager *sm;
ProcessDistributor *pd;
pdRandom pdRand; //default process distributor
//TMapSlaves* mapSlaves;
TProcs* lstProcs;
unsigned int uintSlaveCount;
// opThread* thrdSlave;
//opThread* thrdSlaveRecv;
String sHost, sProcPath;
int iPort, fdSigNewSlave;
Application *app;
bool bUseMem;
bool bUseMips;
bool processSlave(int fd);

u_int addSlave(){
writeGuard g(*rwSlaveCount);
return ++uintSlaveCount;
u_int subSlave(){
writeGuard g(*rwSlaveCount);
return --uintSlaveCount;
unsigned int getSlaveCount(){
readGuard g(*rwSlaveCount);
return uintSlaveCount;
friend void* slaveProc(void* args);
friend void* slave_recv(void* args);
friend string kill(string s);
friend string slaveCount(string s);
bool loadProcs(char* path);
doProcessCreator(Application *app, char* proc_path, int port, char* host = ""){
this->app = app;
sm = new SlaveManager(app, port, host);
sProcPath = proc_path;
iPort = port;
sHost = host;
pd = &pdRand; //Set the default process distributor

bUseMips = true;
bUseMem = true;
//sockClient = new Socket;
//thrdSlaveRecv = new opThread;
//condRequestStats = new Condition;
//mtxSlaveList = new Mutex;
mtxOut = new Mutex;
barStatList = new Barrier;
rwSlaveCount = new ReaderWriter;
//mapSlaves = new TMapSlaves;
lstProcs = new TProcs;
bool run();
ProcessID* Create(char* proc, char* args);
//?? The kill method needs to be added
void kill(u_long proc_id,int sig) {};
void setProcessDistributor(ProcessDistributor *pd){
this->pd = pd;
}//end extern

File: doProcessCreator.cpp
This is the Process Creator for making Distributed Objects. These objects use a
pool of servers to distribute processing across a pool of computers.
This file replaces doMaster.* and the Master class.
Author: Anthony Scott Thompson
Date: 03/10/02
#include "doProcessCreator.h"
Mutex mtx;
Mutex mtxWrite;
pthread_mutex_t pmtxGlobal = PTHREAD_MUTEX_INITIALIZER;
struct Args{
int fd;
doProcessCreator* master;
struct argsProc{
Args* a;
String* proc;
//Socket* sock;
a = new Args;
proc = new String;
//sock = new Socket;

delete a;
delete proc;
//delete sock;
ProcessID* doProcessCreator::Create(char* proc, char* args){
app->debug("New Create request received.");
//if the slave DQ has changed, refresh
//the Process distributor
app->showError("doProcessCreator::Create: Bad return from Distributor refresh.");
return 0;
int fdBest = pd->schedule();
exc.setWhat("Create: Bad return from Process Distributor. No Slave found.");
Socket newSocket;
app->debug("Found Best Slave:" + newSocket.getPeerAddress(fdBest));
String sBuff,sVal;
StrTok st;
sBuff.setSize(strlen(proc) + strlen(args) + strlen(PROC) + 3);
sprintf(sBuff.data(),"%s%s%s%s%s", PROC, SEP, proc, SEP, args);

string stlbuff = sBuff.data();
}catch(Exception e){
app->showError(e);
sVal = //This will force us to exit and remove the slave.
if(sVal == ACK){
ProcessID *procID = new ProcessID();
//!! Get the process ID
//Add the file descriptor to the value of the whole
//This needs to be changed as we should use an ID for a PE instead of
//the fd as this may change if a socket is broken. We'll use it for now.
procID->setProcessID((u_long)((fdBest*1000000) + atol(sVal.c_str())));
//!! End
//!!Get the Host Name
//!!Get Port
//!! End
//??To Do: Manage the Processes from the Process Creator

sprintf(sBuff.data(),"New Process created: %lu",procID->getProcessID());
return procID;
//We've run out of threads on the server!
//Make a recursive call to get the next available Slave.
app->logInformation("Receive NAK from slave. Probably out of threads. Sleeping 1 second.");
//Sleep for a little to avoid system thrashing
struct timespec ts;
ts.tv_sec = 1;
ts.tv_nsec = 0;
//return NULL; //We'll try again
//This shouldn't happen.
//Make sure we don't hang the slave.
try{newSocket.Send(EXT,fdBest);}catch(...){} //We don't care about errors
return NULL;

delete sm;
delete mtxOut;
delete barStatList;
delete rwSlaveCount;
delete lstProcs;
bool doProcessCreator::run(){
if(!loadProcs(sProcPath.data())) {
exc.setWhat("run: Problem loading processes. Exiting.");
return false;
app->debug("Processes Loaded!");
app->debug("Slave manager running.");
return true;
bool doProcessCreator::loadProcs(char* path){
DIR *dir = opendir(path);
struct dirent *dirEntry;
String sName;
char *loc;
exc.setWhat("loadProcs: Could not open directory.");

while((dirEntry = readdir(dir)) != NULL){
if(dirEntry->d_name[0] != '.'){
sName = dirEntry->d_name;
loc = strtok(sName.data(),".");
loc = strtok(NULL,".");
if(!strncmp(loc,"do",2)) {
//?? This needs to be changed so the memory is cleaned up
//?? This currently does not matter since this is only destroyed
//?? when the program terminates
exc.setWhat("loadProcs: Null value return searching for .do");
return true;
exc.setWhat("loadProcs: No processes loaded. Check path.");
return false;
bool doProcessCreator::processSlave(int fd){
app->debug("Processing Slave");

Socket sock;
SlaveStats* ss = new SlaveStats;
string buff;
String sBuff;
app->debug("Processing variables set");
buff = "New slave being registered: " + sock.getPeerAddress(fd);
if(sBuff != JOIN){
app->logInformation("Bad response from slave:" + buff +
" Slave:" +sock.getPeerAddress(fd));
return false;
if(sBuff != NAK){
app->debug("MIPS:" + buff);
ss->mips = atof(buff.c_str());
app->logInformation("MIPS not returned by Slave:" + sock.getPeerAddress(fd));
return false;
if(buff != NAK){
app->debug("MEM:" + buff);
ss->tmem = atof(buff.c_str());

app->logInformation("MEM not returned by Slave:" + sock.getPeerAddress(fd));
return false;
app->debug("Slave added successfully!");
return true;
Process Distributor
File: ProcessDistributor.h
This is an abstract class for creating scheduling algorithms. This is really only
useful for Distributed Objects (DO) since LWP and HWP are handled by the kernel. Since
we need a dummy class to create empty methods for LWP and HWP, we'll use empty
functions instead of null for this class.
Author: Anthony Scott Thompson
Date: 03/10/02
#ifndef ProcessDistributor_h
#define ProcessDistributor_h 1
#include "DoProtocol.h"
extern "C++" {
class ProcessDistributor{
TdqFileDescr dqFD;

TdqFileDescr& getSlaveDQ(){
return dqFD;
//Does any processing after receiving an update DQ
virtual int refreshO = 0;
//Run the scheduling algorithm and return the file descriptor to
//the chosen socket.
virtual int schedule() = 0;
virtual ~ProcessDistributor(){}
}//end extern
Round Robin Process Distributor
File: pdRoundRobin.h
This is a simple Round Robin scheduler. This type of load balancing is very useful
when Processing Elements have the same system resources and the processing power
required by the distributed processes is identical. This method of scheduling has the least
amount of overhead, so time can be shaved off of large calculations by reducing the time
taken to choose a PE.
Author: Anthony Scott Thompson
Date: 03/10/02
#ifndef pdRoundRobin_h
#define pdRoundRobin_h 1

#include "ProcessDistributor.h"
extern "C++"{
class pdRoundRobin: public ProcessDistributor{
TdqFileDescr dqTmp, dqPopper;
Mutex mtxTmp;
//Does any processing after receiving an update DQ
int refresh(){
//The use of temp protected by a mutex makes this
//thread safe given getSlaveDQ and refresh are called
//sequentially from the same thread (as should be).
dqTmp = dqFD;
return 0;
//Run the scheduling algorithm and return the file descriptor to
//the chosen socket.
int schedule(){
if(dqPopper.empty()) {
dqPopper = dqTmp;
int ret = dqPopper.front();
dqPopper.pop_front();
return ret;

virtual ~pdRoundRobin() {}
}//end extern
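The snapshot-and-rotate behavior described above can be sketched in isolation. The following is a hypothetical minimal version, not the thesis code: `RoundRobinSketch` and a plain `std::deque<int>` of descriptors stand in for `pdRoundRobin` and `TdqFileDescr`, and the mutex/refresh machinery is omitted.

```cpp
#include <deque>

// Minimal round-robin sketch: refresh() snapshots the current descriptor
// list; schedule() pops from a working copy and refills it from the
// snapshot once a full cycle has been handed out.
class RoundRobinSketch {
    std::deque<int> snapshot, popper;
public:
    void refresh(const std::deque<int>& fds) { snapshot = fds; }
    int schedule() {
        if (snapshot.empty()) return -1;      // no slaves registered
        if (popper.empty()) popper = snapshot; // start a new cycle
        int fd = popper.front();
        popper.pop_front();
        return fd;
    }
};
```

Each descriptor is handed out exactly once per cycle, which is why this policy carries the least scheduling overhead of the distributors in this chapter.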
Random Process Distributor
File: pdRandom.h
This class randomly picks an element for processing from a list. This type of
algorithm works best when the decision about which element to use is not critical to system
performance and either the complexity of the algorithm is undetermined or the resources of
the processing elements vary greatly.
Author: Anthony Scott Thompson
Date: 03/10/02
#ifndef pdRandom_h
#define pdRandom_h 1
#include "ProcessDistributor.h"
extern "C++"{
class pdRandom: public ProcessDistributor{
Mutex mtxRandom;
//Make sure the seed is randomized

//Reseed the random number generator
//This will make the numbers more random
int refresh() {
time_t t;
return 0;
//Run the scheduling algorithm and return the file descriptor to
//the chosen socket.
int schedule() {
if(dqFD.empty())
return -1;
//Calls to random are not thread safe.
int idx = (int)(random()%dqFD.size());
return dqFD[idx];
~pdRandom() {}
}//end extern
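The selection policy amounts to one modulo operation over the descriptor list. A self-contained sketch, with assumptions flagged: `randomSchedule` is an illustrative name, the list is a plain `std::deque<int>`, and standard `rand()` replaces the listing's POSIX `random()` (which, as the listing notes, is not thread safe either).

```cpp
#include <cstdlib>
#include <deque>

// Sketch of the pdRandom policy: pick a uniformly random element from the
// descriptor list, or return -1 when no slaves are registered.
int randomSchedule(const std::deque<int>& fds) {
    if (fds.empty()) return -1;
    int idx = (int)(rand() % fds.size()); // not thread safe, as in the listing
    return fds[idx];
}
```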
Dynamic Process Distributor

File: pdDynamic.h
This process distributor uses a Dynamic method for scheduling processes. This
approach chooses the best PE by calculating the available memory and current system load
to decide which element is optimal for receiving the process.
Author: Anthony Scott Thompson
Date: 03/10/02
#ifndef pdDynamic_h
#define pdDynamic_h 1
#include "ProcessDistributor.h"
extern "C++"{
class pdDynamic: public ProcessDistributor {
int fd;
char* buff;
SlaveStats ss;
u_int mem_multi;
buff = new char[1024];
delete buff;

//Does any processing after receiving an update DQ
int refresh() {return 0;}
//Run the scheduling algorithm and return the file descriptor to
//the chosen socket.
int schedule(){
Socket newSocket;
int fdBest;
TdqFileDescr::iterator it = dqFD.beginO;
float weight;
float bestWeight = -20000; //set this to an implausibly low number
while(it != dqFD.end()){
fd = *it;
weight = 0;
ss.load = atof(buff);
newSocket.Receive(buff,MAXBUFF,fd);
ss.cmem = atof(buff);
weight -= ss.load;
if(weight>bestWeight) {
bestWeight = weight;
fdBest = *it;
return fdBest;

}//end extern
Weighted Process Distributor
File: pdWeighted.h
This process distributor uses a Dynamic Weighted method for scheduling
processes. This approach chooses the best PE by calculating the available memory and
current system load to decide which element is optimal for receiving the process. It also
takes into account the total memory and the processor's bogomips. These are used as
weighting factors for the PE.
Author: Anthony Scott Thompson
Date: 03/11/02
#ifndef pdWeighted_h
#define pdWeighted_h 1
#include "ProcessDistributor.h"
extern "C++"{
class pdWeighted:public ProcessDistributor {
int fd;
char* buff;
TMapSlaves mapSlaves;
SlaveStats ss;

u_int mem_multi;
//?? Add a parameter to take the Application obj
// and add debugging
buff = new char[1024];
delete buff;
//Does any processing after receiving an update DQ
int refresh() {
//Clean the map
TMapSlaves::iterator it = mapSlaves.begin();
while(it != mapSlaves.end()){
delete it->second;
//Rebuild the map
TdqFileDescr::iterator itDQ = dqFD.begin();
while(itDQ != dqFD.end()){
fd = *itDQ;
Socket sock;
SlaveStats* ss = new SlaveStats;

if(strcmp(buff,NAK)) {
ss->mips= atoi(buff);
if(strcmp(buff,NAK)) {
ss->tmem = atoi(buff);
return 0;
//Run the scheduling algorithm and return the file descriptor to
//the chosen socket.
int schedule(){
Socket newSocket;
SlaveStats* ss;
int fdBest = -1;
TMapSlaves::iterator it = mapSlaves.begin();
float bestWeight = -20000; //set this to an implausibly low number
float weight;
while(it != mapSlaves.end()){
fd = it->first;
ss = it->second;
weight = 0;

ss->load = atof(buff);
ss->cmem = atof(buff);
weight -= ss->load;
weight += ss->mips/4000.0;
weight += ss->tmem/40.0;
if(weight>bestWeight) {
bestWeight = weight;
fdBest = it->first;
return fdBest;
}//end extern
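The ranking in `schedule()` above combines three statistics into one weight: current load counts against a PE, while bogomips and total memory raise its score, normalized by the 4000.0 and 40.0 divisors from the listing. A self-contained sketch of just that scoring; `SlaveStatsSketch` and `weightOf` are illustrative names, not part of the library.

```cpp
// Sketch of the pdWeighted scoring. Picking the slave with the maximum
// weight reproduces the fdBest selection loop in the listing.
struct SlaveStatsSketch { float load, mips, tmem; };

float weightOf(const SlaveStatsSketch& ss) {
    float weight = 0;
    weight -= ss.load;           // busy PEs score lower
    weight += ss.mips / 4000.0f; // faster processors score higher
    weight += ss.tmem / 40.0f;   // larger total memory scores higher
    return weight;
}
```

An idle, fast, well-provisioned slave thus outranks a loaded, slower one, which is the intended effect of the weighting factors described in the file header.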
Process Manager Session
File: PMSession.h
This is the Process Manager Session class and is part of the goDo (Gnu Object
Distributed Object) library. This class establishes a connection to the process manager and
allows a client to create new distributed processes. The PMSession class gets a list of all

valid processes from the Process Manager. The create method creates and returns a new
ProcessClient pointer. See the ProcessClient class for more information.
Author: Anthony Scott Thompson
Date: 11/23/01
#ifndef PMSession_h
#define PMSession_h 1
#include "ProcessClient.h"
#include "DoProtocol.h"
using namespace std;
class PMSession{
BinSocket* myBinSocket;
list<string> lstProcs;
Application *app;
Exception exc;
PMSession(Application *app){
this->app = app;

delete myBinSocket;
bool connect(char* host, int port);
list<string>* getProcessList(){
return &lstProcs;
ProcessClient* create(char* proc, char* args);
void CloseO {
try {
myBinSocket->Send(QUIT,strlen(QUIT) +1);
}catch(Exception e){
} //End Extern
File: PMSession.cpp
This is the Process Manager Session class and is part of the goDo (Gnu Object
Distributed Object) library. This class establishes a connection to the process manager and
allows a client to create new distributed processes. The PMSession class gets a list of all
valid processes from the Process Manager. The create method creates and returns a new
ProcessClient pointer. See the ProcessClient class for more information.

Author: Anthony Scott Thompson
Date: 11/23/01
#include "PMSession.h"
bool PMSession::connect(char* host, int port){
try {
String sMsg;
char* ret;
size_t size;
SocketConnector myConnector(host,port);
myBinSocket = myConnector();
sMsg = CONN_PM;
myBinSocket->Send(sMsg.c_str(), sMsg.size());
ret = (char*)myBinSocket->Receive(size);
sMsg = ret;
delete ret;
if(sMsg != ACK){
exc.setWhat("connect: ACK not received from Process Manager. Closing connection.");
myBinSocket->Send(LIST,strlen(LIST) +1);
ret = (char*)myBinSocket->Receive(size);
sMsg = ret;
delete ret;
StrTok st;
String sRet;
while(sRet.size()) {

return true;
}catch(Exception e){
delete myBinSocket;
exc.setWhat("connect: Unknown error");
app->showError(exc);
return false;
ProcessClient* PMSession::create(char* proc,char* args){
int retry = 0;
String sRet, sMsg(REQ);
sMsg += SEP;
sMsg += proc;
sMsg += SEP;
sMsg += args;
char* ret;
size_t size;
StrTok st;
myBinSocket->Send(sMsg.c_str(),sMsg.size() +1);
ret = (char*)myBinSocket->Receive(size);

sMsg = ret;
delete ret;
if(sRet == ACK){
SocketConnector sc;
BinSocket* bs = NULL;
while(retry < RETRY_MAX){
bs = sc();
}catch(Exception e){
app->debug("Client process connection failed. Retrying.");
if(retry == RETRY_MAX){
exc.appendWhat("PMSession::create: Retries failed.");
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = 10000000; //Wait .01 seconds

bs->Send(CONN_CLIENT,strlen(CONN_CLIENT) + 1);
ret = (char*)bs->Receive(size);
if(strcmp(ret,ACK)) {
delete ret;
exc.setWhat("create: Bad return from process");
delete ret;
return new ProcessClient(bs);
exc.setWhat("create: Bad return from create request.");
return NULL;
Client Process
File: ProcessClient.h
This is the ProcessClient class and is part of the goDo (Gnu Object Distributed
Object) library. The ProcessClient class is designed to be a foundation for creating a
distributed process. An instance of this class is created by calling the "create" method of

the PMSession. This will return a pointer to a connected ProcessClient. Communication to
the process is facilitated through the stdin and stdout methods. Separate sockets or special
objects can be used to create auxiliary communication channels.
Author: Anthony Scott Thompson
Date: 11/23/01
#ifndef ProcessClient_h
#define ProcessClient_h 1
#include "DoProtocol.h"
#include "DoSigProcessor.h"
using namespace std;
extern "C++"{
class ProcessClient: private DoSigProcessor{
BinSocket *myBinSocket;
ProcessClient(BinSocket* bs){
myBinSocket = bs;
delete myBinSocket;
int stdIn(String& sIn);

int stdOut(String sOut);
void kill(int signal);
} //End Extern
Distributed Object Slave
File: doSlave.h
This is the slave class for the Do (Distributed Object) library. This class is used as
a slave server that will take processes from a master server (or more correctly the Process
Manager class), process them and return the results. This class also allows for the automatic
negotiation between Master and Slave to automate adding PEs to the overall system.
Author: Anthony Scott Thompson
#ifndef Slave_h
#define Slave_h 1

#include "go/do.h"
#include "go/ErrorHandler.h"
#include "go/Mutex.h"
#include "go/Barrier.h"
#include "Process.h"
using namespace std;
typedef Process *maker_ptr();
typedef map TProcsMap;
typedef TProcsMap::value_type TProcMapValue;
class Slave;
struct LoadStats{
float mem;

float load;
struct SysStats{
float bogomips;
float tmem;
struct Proc{
opThread* thrd;
long proc_id;
Process* proc;
Slave* slave;
delete proc;
extern "C++"{
class Slave: public Daemon {
TProcsMap mapProcs;
Socket* sockMaster;
ThreadPool thrdPool;
opThread* thrdProc;
opThread* thrdClean;
Proc* procKill;
Condition condClean;
static string* MasterHostName;
String sProcPath;
String sHost;

static Mutex mtxMasterHost;
static Mutex* mtxOut;
int iMasterPort;
Exception exc;
char* buff;
struct SysStats ss;
int iConnectRetries;
bool connect(const string& HostName, int port);
static string badReply(string buff = "");
void getSysStats(struct SysStats& ss);
void getLoadStats(struct LoadStats& Is);
void procListen();
bool loadProcs();
friend void* runProc(void* args);
friend void* clean(void* args);
friend void* listen(void* args);
sockMaster = new Socket;
thrdProc = new opThread;
thrdClean = new opThread;
MasterHostName = new string;
mtxOut = new Mutex;
buff = new char[MAXBUFF];

delete sockMaster;
delete thrdProc;
delete thrdClean;
delete MasterHostName;
delete mtxOut;
delete buff;
void run(const char* master, int master_port,
char* host, char* proc_path, u_int ThreadPoolSize = 50);
} //end extern
File: doSlave.cpp
This is the slave class for the Do (Distributed Object) library. This class is used as
a slave server that will take processes from a master server (or more correctly the Process
Manager class), process them and return the results. This class also allows for the automatic
negotiation between Master and Slave to automate adding PEs to the overall system.
Author: Anthony Scott Thompson
#include "doSlave.h"
extern char **environ;
struct Args{

int fd;
Socket* sock;
Slave* slave;
string* Slave::MasterHostName;
Mutex* Slave::mtxOut;
Mutex Slave::mtxMasterHost;
//Exception Slave::exc;
void* runProc(void* args){
Proc *proc = (Proc*)args;
Slave* s = proc->slave; //Grab the slave pointer, proc will be deleted
if(!proc->proc->checkConnected()) {
s->showError("doSlave::runProc: Failed client connection check.");
delete proc;
s->debug("runProc: Thread Exiting.",DEBUG_MILD);
return NULL;
void Slave::run(const char* master, int master_port,
char* host, char* proc_path, u_int ThreadPoolSize = 50) {
*MasterHostName = master;
iMasterPort = master_port;

sProcPath = proc_path;
sHost = host;
//Sets the file descriptor limit and blocks unwanted signals
//if(thrdPool.setPriority(10)){ //This is not a show stopper
//showError("doSlave::run: Problem setting priority for thread pool.");
try {
}catch(Exception e){
e.appendWhat("Slave::run");
debug("run: ThreadPool created.");
exc.setWhat("run: Problem loading processes.");
iConnectRetries = 0;
while(iConnectRetries < CONN_RETRIES){
if(connect(*MasterHostName, master_port)) {
sleep(2); //Wait two seconds before retrying

exc.setWhat("run: Exceeded connection retries.");
bool Slave::connect(const string& HostName, int port){
//Get the system statistics
//print("Getting system stats.");
debug("Socket Set for Master");
debug("Connected to Master");
debug("Join sent.");
if(strcmp(buff,ACK)) {
exc.setWhat("connect: Did not receive ACK from Master after JOIN.");
return false;
debug("Received ACK from master, returning ACK");
logInformation(("Connection Succeeded to " + HostName + ".").c_str());
return true;
}catch(Exception e){

exc.setWhat("connect: Unknown error.");
return false;
bool Slave::loadProcs(){
DIR *dir = opendir(sProcPath.c_str());
struct dirent *dirEntry;
String sName,sFile;
char *loc;
exc.setWhat("loadProcs: Could not open directory.");
while((dirEntry = readdir(dir)) != NULL){
if(dirEntry->d_name[0] != '.'){
sFile = sName = dirEntry->d_name;
loc = strtok(sName.data(),".");
loc = strtok(NULL,".");
if(!strncmp(loc,"do",2)) {
//?? This needs to be changed so the memory is cleaned up
//?? This currently does not matter since this is only destroyed
//?? when the program terminates
String sLib(sProcPath);
sLib += sFile;
//?? This should be changed to the glib module functions for portability.
void* hndl = dlopen(sLib.c_str(),RTLD_NOW);