Citation
The concepts and implementation of Hydra PPS

Material Information

Title:
The concepts and implementation of Hydra PPS
Creator:
Powers, Franklin E
Publication Date:
Language:
English
Physical Description:
xi, 137 leaves : ; 28 cm

Subjects

Subjects / Keywords:
Parallel programming (Computer science) ( lcsh )
Parallel programming (Computer science) ( fast )
Hydra Parallel Programming System
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 135-137).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Franklin E. Powers.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
466878302 ( OCLC )
ocn466878302
Classification:
LD1193.E52 2006m P68 ( lcc )

Full Text
THE CONCEPTS AND IMPLEMENTATION OF HYDRA PPS
by
Franklin E. Powers, Jr.
A.A., Arapahoe Community College, 1996
A.S., Arapahoe Community College, 1999
B.S., University of Colorado at Denver, 2003
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2006


Copyright (C) 2006 by Franklin E. Powers, Jr.
All rights reserved.


This thesis for the Master of Science
degree by
Franklin E. Powers, Jr.
has been approved
by
Boris Stilman
Ellen Gethner

Date


Powers, Franklin E. (M.S., Computer Science)
The Concepts and Implementation of Hydra PPS
Thesis directed by Professor Gita Alaghband
ABSTRACT
The Hydra Parallel Programming System, a Java-based parallel language extension and its collection of tools, are described. The Hydra language addresses a number of areas that have not received much attention. The first and most important of these is that it allows a parallel program to be written once and then recompiled for a particular architecture and runtime at runtime. The first implementation of this system was designed explicitly for SMP (Symmetric Multiprocessing) and compatible architectures. SMP architectures are becoming more common, and a simple, reliable, flexible, and efficient way to program them is needed.
This paper describes the reasons why Hydra was created, presents the details of Hydra, compares it to some alternative Java-based parallel language extensions, and reports a number of benchmarks for these systems. The details of Hydra that are discussed include an overview of the whole system, the Hydra language, the Hydra runtime, the Alchemist compiler, Hydra Stay Resident, and the Benchmark And Profiling System. The alternatives discussed are: javab, javar, JACE, JOMP, Java Threads, and Java Thread Pools. The benchmarks are used to compare Hydra with the various alternatives and with sequential programs.
This abstract accurately represents the content of the candidate's thesis. I recommend its publication.
Signed
Gita Alaghband


DEDICATION
This thesis is dedicated to my parents for their love and support while I was going
to college and writing this.


ACKNOWLEDGEMENT
The advice of my advisor, Gita Alaghband, was very helpful while I was writing
this.


CONTENTS
Figures........................................................ x
Tables.......................................................... xi
Chapter
1. Introduction.................................................1
1.1 The Problem In Detail........................................2
1.2 The Current Solutions....................................3
1.3 The New Solution - The Hydra Parallel Programming System.5
1.4 Components of the Hydra PPS..................................6
2. The Hydra Language...........................................8
2.1 Events...................................................... 8
2.2 Time Slots..................................................9
2.3 Event Groups...............................................10
2.3.1 Parallel Event Groups....................................11
2.3.2 FIFO Event Groups........................................11
2.3.3 Sequential Event Groups..................................12
2.3.4 Prescheduled Event Groups................................14
2.4 Locks....................................................14
2.4.1 Pass Locks...............................................15
2.4.2 Read Locks...............................................16
2.4.3 Write Locks........................................ 16
2.4.4 ReadWrite Locks..........................................17
2.4.5 Using Locks..............................................18
2.5 Commands.................................................21
2.5.1 Event Group Creation.....................................21
2.5.2 Event Group Manipulation.................................21
2.5.3 Prescheduled Event Group Specific........................22
2.5.4 Parameter Locks..........................................23
3. Comparison With Other Languages.............................24
3.1 Architecture Independence..................................24
3.2 Compilation To Architecture at Runtime.....................25
3.3 Fewer Language Restrictions................................25
3.4 Recognized By Existing Tools...............................26
3.5 No Deadlocks...............................................26


3.6 No Explicit Barriers......................................28
3.7 Scheduled Locking.........................................28
3.8 Restricted Mixing of Parallel and Sequential Code.........29
3.9 Indirect Support for Many Kinds of Parallel Loops.......30
3.10 Support for Parallel Invocations.........................31
3.11 Support for Spawning Workers.............................32
4. The Hydra Runtime .........................................33
4.1 Compilation of Programs...................................33
4.2 Scheduling of Events......................................34
4.3 Execution of Events.......................................37
4.4 System Maintenance........................................38
4.5 The Supervisor............................................38
4.6 Locks.....................................................40
5. The Alchemist Compiler.....................................41
5.1 Compiler Stages............................................41
5.2 Plugins...................................................42
5.2.1 Compiler Plugins.........................................42
5.2.2 Language Plugins........................................43
5.2.3 Runtime Plugins.........................................43
5.3 Definitions.............................................44
5.3.1 Alchemist Definition Generator...........................44
5.3.2 How Definitions Work................................... 45
5.4 Transformations...........................................46
5.4.1 Defining A Transformation................................47
5.4.2 How Transformations Work................................48
6. Benchmarks.................................................50
6.1 Test Machines..............................................51
6.1.1 Desktop.................................................51
6.1.2 Workstation.............................................51
6.1.3 Server..................................................52
6.2 Hydra's Benchmarking Framework - BAPS.....................52
6.3 Overhead Test.............................................53
6.3.1 Overhead for Threads Algorithm...........................54
6.3.2 Overhead for Parallel Loops Algorithm...................54
6.3.3 Overhead for Parallel Invocations Algorithm.............55
6.3.4 Overhead for Parallel Sections Algorithm................55
6.3.5 Overhead for Hydra Algorithm............................56
6.3.6 Results.................................................57
6.4 Merge Sort Test...........................................58
6.4.1 Parallel MergeSort With A Set Number Of Threads.........59
6.4.2 Parallel MergeSort With A Minimum Array Size............60


6.4.3 Problem Parallel MergeSort and Threads..................61
6.4.4 Results.................................................62
6.5 Matrix Matrix Multiplication Test.........................63
6.5.1 Threaded Algorithm......................................65
6.5.2 Hydra Algorithm.........................................66
6.5.3 Results.................................................67
6.6 Single Master Thread with One or More Worker Threads ... 68
6.6.1 Thread Pools and Explicitly Spawning Workers............69
6.6.2 Results - Total Execution Time..........................70
6.6.3 Results - Time To Start Work............................72
6.7 Alchemist Definition Generator............................73
6.8 Hydra Stay Resident.......................................74
6.8.1 How It Works............................................75
6.8.2 The Results.............................................76
6.8.3 Conclusions and Future Work.............................77
7. Conclusions and Future Research And Development Directions.78
7.1 A Lighter System..........................................79
7.2 New Language Constructs...................................79
7.3 Runtime for Clusters......................................80
7.4 Real-time Hydra...........................................80
Appendix
A. Alchemist Definition Generator Source Code.................81
B. Benchmark Source Code for Hydra............................97
B.1 Overhead Benchmark for Hydra..............................97
B.2 Merge Sort Benchmark for Hydra............................109
B.3 Matrix Matrix Multiply Benchmark for Hydra...............113
B.4 Spawn Worker Benchmark for Hydra.........................115
C. Licenses..................................................121
C.1 GNU Lesser General Public License........................121
C.2 Merge Sort Benchmark for Hydra...........................132
D. The Included CD...........................................134
References....................................................135


LIST OF FIGURES
Figure
2.1 Example Execution With A FIFO Event Group..................12
2.2 Example Execution With A Sequential Event Group............13
2.3 Example Execution Without A Hydra Runtime..................13
2.4 Locking Example Part 1 - No Locks.........................19
2.5 Locking Example Part 2 - After 1st Lock...................20
2.6 Locking Example Part 3 - After 2nd Lock...................20
2.7 Locking Example Part 4 - After 3rd and 4th Locks..........21
4.1 Scheduling In A Hydra Program.............................35
4.2 Scheduling In A Hydra Runtime.............................37
6.1 Overhead for Starting Parallel Work.......................57
6.2 Times Faster Than Sequential Merge Sort....................63
6.3 Times Faster Than Sequential Matrix Matrix Multiply........67
6.4 Times Faster Than Sequential for Execution.................71
6.5 Times Faster Than Sequential to Start Worker...............72
6.6 Times Faster Than Sequential for Definition Generator......74


LIST OF TABLES
Table
1.1 Comparison of Parallel Languages for Java...................5
1.2 Comparison of Parallel Languages for Java including Hydra..6
6.1 Time for Overhead Test....................................58
6.2 Time and Times Faster for Merge Sort......................63
6.3 Time and Times Faster for Matrix Matrix Multiplication....68
6.4 Total Execution Time for Spawning Workers.................71
6.5 TTSW for Spawning Workers.................................73
6.6 Time and Times Faster for Definition Generator............74


1. Introduction
The Hydra Parallel Programming System, Hydra PPS, is a parallel language extension to Java that was developed to address a number of issues[1]. The language works well with existing Java tools, and programs written with the language are portable. They execute in parallel when used on a parallel architecture with a compatible runtime. They execute sequentially as regular Java programs when used without a compatible runtime. Hydra is designed as a shared memory parallel language. Instead of replacing Java, Hydra extends it. The language is designed to avoid deadlock. An implementation of all language constructs is provided at runtime rather than at compile time.
Hydra provides a solution to the issue addressed by Herb Sutter, a software architect from Microsoft, who stated that the software industry is not prepared to make use of multi-core CPUs (Central Processing Units) from AMD and Intel[2]. Many developers do not have the skills to create software that uses multiple threads, and it is difficult to parallelize the algorithms used in many desktop programs. This means that the software industry is not prepared to handle computers with multiple CPUs or with processors that are explicitly multithreaded. The problem is compounded by the fact that this technology has been steadily moving downward to smaller computers and becoming more common. Explicit multithreading is already common in the form of Intel's Hyper-Threading, and cheap multi-processor motherboards are now available from VIA[3][4]. Multi-core processors are now available in laptops; there is also a dual-processor handheld available[5][6][7][8].
Another issue is that many high-level parallel languages focus either on distributing tasks between multiple computers or on loops. Distributing tasks between multiple computers is great for large systems, but not necessarily the best way of working with programs that typically run on only a single computer. In addition, not every algorithm is easy to express with loops that parallelize well. There are also autoparallelizing compilers, but they do not work with everything. Working with threads directly can cause problems like deadlocks and race conditions. Some developers may never be entirely comfortable using certain methods of parallelizing code, so having a few different methods available would be invaluable for developers.
Obviously, something needs to be done to make the average developer's job of creating multithreaded programs for the desktop and smaller platforms easier.
This paper will introduce this programming problem, present a few of the current solutions, and discuss a new solution in detail. To narrow the focus of this paper, the discussion will concentrate on the Java environment and on architectures that work within a single desktop computer and are closely related to SMP (Symmetric Multiprocessing). A number of tests will be defined: Overhead Test, Merge Sort, Matrix Matrix Multiply, and Spinning off Tasks. An additional test, the Alchemist Definition Generator, is exclusive to Hydra and sequential Java. The discussion of constructs will be limited to those that are required for these tests and are defined by the solutions presented in this paper. Lastly, since there are a number of versions of Java, it should be noted that only Java 1.5 as defined by Sun will be covered.
1.1 The Problem In Detail
Today's sequential architectures have essentially reached their limit[9]. As a result, numerous parallel architectures have been proposed and many have been implemented. The most common are SMP (Symmetric Multiprocessing) personal computers; CMP (Chip Multiprocessing) and SMT (Simultaneous Multithreading) are related architectures[10]. These architectures are based on the concept of two or more threads executing concurrently on a system that has shared memory. SMP architectures use two or more identical processors, with each processor capable of executing a thread. CMP architectures use two or more identical processor cores on one chip, where each core can execute a different thread[11]. SMT architectures execute two or more threads on a single core and allow them to share the computational capabilities of the core when one thread is unable to make use of the core's full processing capabilities[12]. These architectural features may be combined, so that the user may have two or more processors, each with two or more cores, each of which can in turn execute two or more threads. While these architectures are compatible with one another, they do not have identical performance, and the same code may not be optimal on all of them[13]. Additionally, there are other parallel architectures in use and proposed, including SIMD (Single Instruction Multiple Data) and multiple specialized cores. These other architectures are not generally compatible with the thread model of programming. As a result, parallel programs developed today may either not work on the parallel architectures of tomorrow or may not be optimal.


There is also the problem that many programmers do not have the skills needed for parallel programming; programming tools either lack support for parallel programming or provide it in a form that high-level programmers are not accustomed to.
1.2 The Current Solutions
Three of the most common methods for writing parallel programs are using an
autoparallelizing compiler, using OpenMP parallel language, and writing multi-
threaded programs manually. This paper will focus on a few freely available Java
versions of these three technologies: javab (autoparallelization), javar (OpenMP-
like), JOMP (OpenMP implementation for Java), Java ACE or JACE (threads),
Java Threads (threads), and Java Thread Pools (threads)[14][15][16][17].
Autoparallelization is based on the idea that the compiler can figure out what needs to be parallelized on its own or with very little help. For example, javab takes an already compiled Java class file and recompiles it to produce parallel code. Autoparallelization tends to be fairly limited; in this particular project, javab was unable to recompile any of the test programs. The javab compiler itself needs to be compiled with a C compiler and is not a Java program. javab is fairly old and does not support all modern Java language constructs, such as inner classes.
javar is an OpenMP-like language extension for Java. Its support for parallelism is limited to parallel loops and parallel invocations. Like javab, javar is a C program. Also like javab, javar is fairly old and does not support all modern Java language constructs, such as generics.
JOMP is an OpenMP implementation for Java. It has extensive support for parallel loops, parallel regions, barriers, and several other constructs. Unlike the previous solutions, JOMP is a native Java compiler. Like the previous two, however, JOMP does not support all modern Java language constructs, such as generics. We will demonstrate that Hydra can efficiently spawn worker threads from a master thread in one of our test cases, while this case is not supported in JOMP due to implementation bugs.
JACE, Java Threads, and Java Thread Pools are implementations of threads for Java. They can be difficult to use under some circumstances, and deadlocks are a strong possibility. However, we were able to successfully implement and execute all tests with these parallel programming tools. The benchmarks will demonstrate that JACE and Java Threads are slow, while Java Thread Pools is a much faster implementation. This improvement is due to all threads being created in advance. However, the program needs to know in advance how many threads it will need, which can complicate some tasks, as discussed in section 6.6.1.
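The thread-pool pattern described above can be illustrated with the standard java.util.concurrent API, which has been part of Java since version 1.5. This is a minimal sketch of the technique, not the thesis's benchmark code: the pool's threads are created once up front, and subsequent tasks only pay the cost of being queued.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    static final AtomicInteger sum = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        // The pool's four threads are created once, in advance; each
        // submitted task is queued to an already-running thread, so the
        // per-task cost is far lower than spawning a new Thread each time.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 1; i <= 100; i++) {
            final int n = i;
            pool.execute(new Runnable() {
                public void run() { sum.addAndGet(n); }
            });
        }
        pool.shutdown();                           // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait for queued work
        System.out.println(sum.get());             // sum of 1..100 = 5050
    }
}
```

Note the limitation the text mentions: the pool size (4 here) must be chosen before any work is submitted.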
There are certainly a number of ways these solutions can be compared. One
is by their performance, which will be shown in chapter 6. For this paper, the
following attributes will be considered:
Whether the solution is written in the Java language - A solution may handle Java code and still be written in a different language, such as C. If a solution is written in a different language, it may not be usable by all developers, and even when it is usable, it may not be easy to use. As a result, we consider this a desired attribute and note it in table 1.1.

Whether the solution supports all current Java language constructs - The Java language changes over time and many constructs are complicated. Even though a solution may support Java, it may be unable to work with all Java language constructs. A solution that supports all Java language constructs is desirable. We consider this in table 1.1.

Whether the solution requires threads to be created and managed explicitly - Managing threads can be a complicated task: deadlocks can occur and performance can suffer. Explicit threading also makes it harder to port programs to other architectures later. We do not consider this a positive attribute, so the desired value is No in table 1.1.

Whether the solution compiles to the architecture at runtime - By compiling to the architecture at runtime, an individual program can potentially work on different parallel architectures. It can also allow programs to work more efficiently on architectures that are compatible but differ in performance. Since parallel architectures are changing all the time, Yes is the desired value, as noted in table 1.1.

The number of tests completed successfully - This counts how many of the tests in this paper a solution can complete successfully. We mark this as All for the ideal result and 0 for the worst possible in table 1.1.
As can be seen from table 1.1, none of the tested solutions has all of the attributes that are considered desirable for this project.


Table 1.1 Comparison of Parallel Languages for Java

Parallelization   Java    Supports All Java    Explicit   Compiles to    Successful
Software          Native  Language Constructs  Threading  Architecture   Tests
                                                          at Runtime
javab             No      No                   No         No             0
javar             No      No                   No         No             All (4)
JOMP              Yes     No                   No         No             3
JACE              Yes     Yes                  Yes        No             All (4)
Java Threads      Yes     Yes                  Yes        No             All (4)
Java Thread Pool  Yes     Yes                  Yes        No             All (4)
1.3 The New Solution - The Hydra Parallel Programming System
This paper will present a new solution that has a number of desirable attributes:
Can successfully execute all the tests that are presented in this paper.
Does not have explicit threading.
Will work with all Java constructs and will stay compatible with Java programs.
Is native to Java.
Can recompile the programs at runtime.
This new solution is called the Hydra Parallel Programming System[1]. It is a language extension to Java, rather than a completely new language or a simple library. It is an explicit parallel language, and yet it avoids a number of issues that are common in such languages (such as deadlock). Unlike many other systems, Hydra compiles the program to a particular architecture at runtime. It is backwards compatible with regular Java, and as a result Hydra programs can be executed without a Hydra runtime. In addition, it is modular, which allows new Hydra runtimes to be created with ease.
As can be seen from table 1.2, Hydra meets all the specifications for this project. In chapter 3, a more in-depth comparison of the language features will be presented.


Table 1.2 Comparison of Parallel Languages for Java including Hydra

Parallelization   Java    Supports All Java    Explicit   Compiles to    Successful
Software          Native  Language Constructs  Threading  Architecture   Tests
                                                          at Runtime
javab             No      No                   No         No             0
javar             No      No                   No         No             All (4)
JOMP              Yes     No                   No         No             3
JACE              Yes     Yes                  Yes        No             All (4)
Java Threads      Yes     Yes                  Yes        No             All (4)
Java Thread Pool  Yes     Yes                  Yes        No             All (4)
Hydra             Yes     Yes                  No         Yes            All (4)
1.4 Components of the Hydra PPS
In this section every software package in the Hydra Parallel Programming System
will be mentioned, along with where more information can be found on each one.
Alchemist Compiler - A class library which defines a generic recompiler used to convert precompiled Java class files from one form to another. This provides Hydra the ability to recompile Hydra programs at runtime, as described further in chapter 5.

Alchemist Definition Generator - A program that is used to produce definitions for the Alchemist compiler. These definitions are used to describe a language or an implementation of a language. Definitions are described further in 5.3. The Alchemist Definition Generator is described further in 5.3.1 and 6.7. The source code is in appendix A.

BAPS - The Benchmark And Profiling System is developed for use with Hydra and other Java-based parallel programming languages. It provides a framework that makes it quick and easy to develop different benchmarks, described in 6.2.

Hydra Language - A class library that describes the Hydra Language to all Java tools, described further in chapter 2.

Hydra Language Plugin - A class library that describes the Hydra Language to the Alchemist compiler. It is combined with Alchemist and a Hydra runtime to allow Hydra programs to execute. All plug-ins are briefly described in 5.2.


Hydra Runtime - The first Hydra runtime executes Hydra programs on SMP and compatible architectures and is described further in chapter 4.

Hydra Stay Resident - An add-on for the Hydra Runtime that allows it to be loaded and remain in memory waiting for programs to execute, shortening the time it takes to start Hydra and Java programs. It is described further in 6.8.

Par Make - A set of scripts that allows the Hydra PPS to be built quickly and easily on a Windows computer. For older Windows systems, it builds the Hydra PPS sequentially; for newer Windows systems, it builds the Hydra PPS in parallel. This is not described any further in this paper.

Test Suite - A set of simple tests for Hydra and other Java-based parallel languages, described further in chapter 6. The source code for the Hydra versions of the benchmarks is in appendix B.


2. The Hydra Language
The Hydra Language is similar to OpenMP in that it is an extension to another language, uses shared memory, and, when used with a system that is incompatible, has its extensions ignored[18]. Unlike OpenMP, the focus of Hydra is events and the relationships between events. All events in Hydra are atomic: there is a before an event and an after an event, but no during. As a result, Hydra cannot interrupt an event. To specify the relationships between events, event groups and locks are used. Event groups specify the order in which events may execute and which events may execute in parallel. To allow for complex combinations, event groups may be placed in other event groups. Locks are used to specify how the events manipulate data. Unlike the locks of many other languages, Hydra's locks do not cause deadlock.

The Hydra language consists of annotations that specify events, and empty classes and methods for all other program specifications. These are used by the programmer to mark up the source code, which is then processed by the Java compiler. Later, the program is processed by a Hydra compiler (usually part of a runtime system) and converted into a true parallel program. This allows the program to be tied to a particular architecture and runtime system after development, which makes Hydra much more powerful than the other languages for parallel programming.

Time slots are used to determine when events will execute. The runtime's time slots cannot be accessed directly; instead, they are accessed indirectly through the use of event groups and locks.
2.1 Events
Events are the smallest unit of code that the Hydra runtime is aware of. As far as the Hydra runtime is concerned, events are all atomic: there is a before and an after for events, but no during. The events may be executed either in parallel or sequentially, and in any order. Any and all dependencies between the events are specified through other mechanisms (event groups and locks). As long as the runtime adheres to the specifications made with the event groups and locks, it will execute Hydra programs correctly. This means that if events are placed in a parallel event group (discussed in 2.3.1) and there are no conflicting locks, then correct operation of the events should not depend on the order in which the runtime executes them.

Since all events are atomic, they will not be interrupted by the Hydra runtime. Likewise, they should generally proceed from beginning to end without pausing for any great length of time. That is, an event should not wait on things like non-Hydra locks, because doing so interrupts the flow of the program and the runtime will be unable to properly allocate resources.
Events are specially designated methods. Normally, when a method is invoked by a Java program, the code for that method is executed immediately. When the method is a Hydra event, a request is made to schedule the event instead. When the request is made, the event is implicitly placed within an event group along with the event's parameters. The time at which the event executes is determined by the event group, by when its parameters can be locked, and by the implementation of the runtime.
An event is declared by placing the @event annotation before the return type of the method.
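As a minimal sketch, an event declaration might look as follows. The real @event annotation is provided by the Hydra language library, which is not reproduced in this excerpt, so the stand-in annotation declared here (and the class and method names) are assumptions made only so the sketch compiles:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in for the annotation supplied by the Hydra language library.
@Retention(RetentionPolicy.RUNTIME)
@interface event {}

public class Greeter {
    // The @event annotation goes before the return type. Under a Hydra
    // runtime, invoking sayHello() schedules the event (with its
    // parameters) into the current event group rather than executing the
    // body immediately; without a runtime it runs as a plain Java call.
    @event
    void sayHello(String name) {
        System.out.println("Hello, " + name);
    }
}
```

Because the annotation carries no behavior by itself, a plain Java compiler accepts this class unchanged, which is what makes Hydra programs backwards compatible.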
2.2 Time Slots
Time slots are used to determine when events will execute. All events that are in
the same time slot may execute in any order and, if possible, in parallel. Different
time slots execute at different times. Two events that are in different time slots
will also execute at different times.
Time slots are executed in a FIFO (First-In-First-Out) order. The first time
slot is executed, then the second, and so on, until the last time slot has been
executed.
For any particular time slot, X, the following is true:

No other time slots will execute in parallel with time slot X.

Once time slot X begins execution, the events in time slot X can execute at any time, in any order, and in parallel.

All events in time slot X must complete execution for time slot X to complete execution.

All time slots before X must complete execution before time slot X can begin.

Time slot X must complete execution before the time slots after time slot X can begin.
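The rules above can be sketched in plain Java: slots are drained in FIFO order, events within one slot are submitted in parallel, and a join acts as the boundary between slots. This is an illustrative analogue of the time-slot discipline, not the Hydra runtime's actual implementation:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimeSlots {
    // Executes each slot's events in parallel, but waits for a slot to
    // finish completely before the next slot begins (slots are FIFO).
    public static void runSlots(List<List<Runnable>> slots)
            throws InterruptedException {
        for (List<Runnable> slot : slots) {       // slots in FIFO order
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (Runnable event : slot) {
                pool.execute(event);              // events in a slot: parallel
            }
            pool.shutdown();
            // all events in slot X must complete before slot X+1 starts
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```

The join between slots plays the role of the rule that time slot X must complete before any later slot begins.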
The runtime's time slots cannot be manipulated directly; instead, the programmer must specify the relationships between events using event groups and locks. The time slot concept is used in three different places in the Hydra language and runtime. The first two are specified by the programmer and the third is used by the runtime:
Logical Time Slots - Event groups and locks are used to specify the logical ordering of all events. The definitions of all Hydra constructs, and all combinations of those constructs, can be represented with logical time slots.

Prescheduled Event Group Time Slots - Each set of prescheduled event group time slots is specific to an individual prescheduled event group. These are identical to the logical time slots except that the programmer has direct access to them. The prescheduled event group type is described in section 2.3.4.

Runtime Time Slots - The runtime uses the time slot data structures to represent when events will actually execute. The runtime provides a mapping from the logical order created by the combination of the previous two types of time slots to runtime time slots.
2.3 Event Groups
Event groups are used to organize events. They specify the order in which events may execute and which events may execute in parallel. Complicated relationships can be expressed by placing event groups within other event groups. All event invocations and event groups must be placed within one and only one event group. The only exception to this rule is the very first event group, which is created by the runtime.

Every event group belongs to one of four types, and the type of the event group defines its properties. These types are: Parallel, FIFO, Sequential, and Prescheduled. The properties of these event groups are described in the following subsections and in section 2.5.


To demonstrate the properties of these event groups, an example algorithm
shall be considered.
ScheduleAllEvents():
    SCHEDULE SayHello()
    SCHEDULE SayGoodBye()

SayHello():
    DISPLAY Hello
    SCHEDULE SayHowAreYou()

SayHowAreYou():
    DISPLAY How Are You?

SayGoodBye():
    DISPLAY Good Bye
2.3.1 Parallel Event Groups
An event group that is of the parallel type allows all of its children to execute in
any order and to execute in parallel.
When the SayHello event is added to the parallel event group, it will be added
to the 1st time slot as well. When the SayGoodbye event is added to the event
group, it will also be added to the 1st time slot. The result is that both events
will be in the same time slot, so they can execute in either order.
When the SayHello event is executed, the SayHowAreYou event will be added
to the parallel event group and it will be added to the 1st time slot. If the Say-
Goodbye event has not executed yet, then SayHowAreYou and SayGoodbye will
be able to execute in either order. But since it is the execution of the SayHello
event that causes the scheduling of the SayHowAreYou event to take place, the
output will always have Hello appearing before How Are You.
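The behavior just described can be approximated in plain Java, using an ExecutorService in place of a parallel event group (this is not the Hydra API; the class and event names below are illustrative). The pool may run its tasks in any order, yet "Hello" always precedes "How Are You?" because the latter is only scheduled by the former:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelGroupDemo {
    // A plain-Java thread pool stands in for a parallel event group,
    // with Runnables standing in for events.
    public static List<String> run() throws InterruptedException {
        List<String> output = Collections.synchronizedList(new ArrayList<>());
        ExecutorService group = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(3);
        group.submit(() -> {                 // SayHello
            output.add("Hello");
            group.submit(() -> {             // SayHowAreYou, scheduled by SayHello
                output.add("How Are You?");
                done.countDown();
            });
            done.countDown();
        });
        group.submit(() -> {                 // SayGoodbye
            output.add("Good Bye");
            done.countDown();
        });
        done.await(10, TimeUnit.SECONDS);
        group.shutdown();
        return new ArrayList<>(output);
    }
}
```

Running this repeatedly may interleave "Good Bye" anywhere in the output, but "Hello" will always appear before "How Are You?", mirroring the causal ordering described above.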
2.3.2 FIFO Event Groups
FIFO event groups cause child events to execute in a FIFO manner. None of
the children execute in parallel. Yet the results are not always the same as run-
ning the events sequentially.
Children are executed in the same order that they are placed in the event
group. So the first child to be added to a FIFO event group will be executed first,
the second will be executed second, and so on.
When the SayHello event is added to the FIFO event group, it will be added to
the 1st time slot. When the SayGoodbye event is added to the event group, it will
be added to the 2nd time slot. Since they are in two different time slots, there is
a definite order between the events. SayHello must execute first and SayGoodbye
must execute second.
When the SayHello event is executed, the SayHowAreYou event will be added
to the FIFO event group and it will be added to the 3rd time slot. Once again
the events are in two different time slots, so there is a definite order between the
events. SayGoodbye must execute next and then SayHowAreYou will be last.
[Schedule order: SayHello (t1), SayGoodbye (t2), SayHowAreYou (t4). Execution order: Hello (t3), Goodbye (t5), How are you? (t6).]
Figure 2.1 Example Execution With A FIFO Event Group
2.3.3 Sequential Event Groups
Sequential event groups simulate how events would execute without a Hydra run-
time. None of the children execute in parallel.
When the SayHello event is added to the sequential event group, it will be
added to the 1st time slot. When the SayGoodbye event is added to the event
group, it will be added to the 2nd time slot. Since they are in two different time
slots, there is a definite order between the events. SayHello must execute first and
SayGoodbye must execute second.
When the SayHello event is executed, the SayHowAreYou event will be added
to the sequential event group. However, at this point Hydra knows that we wish
to mimic sequential execution and so rather than inserting the event at the end of
the event group, it will place it before the next event. So SayHowAreYou will be
added to the 2nd time slot and SayGoodbye will be moved to the 3rd time slot.
Once again the events are in two different time slots, so there is a definite order
between the events. SayHowAreYou must execute next and then SayGoodbye will
be last.
[Schedule order: SayHello (t1), SayGoodbye (t2), SayHowAreYou (t4). Execution order: Hello (t3), How are you? (t5), Goodbye (t6).]
Figure 2.2 Example Execution With A Sequential Event Group
It is important to note that sequential event groups cause the children of the
group to execute in the exact order that they would without a Hydra runtime.
However, note that the results produced by a sequential event group are not
necessarily the same as those produced without a Hydra runtime. This is because
Hydra schedules only events and event groups: all events have delayed execution,
whereas all non-event methods within an event execute immediately.
Without a Hydra runtime, the example program would produce the same re-
sults, but the scheduling does not occur in the same order.
[Interleaved schedule and execution: SayHello (t1), Hello (t2), SayHowAreYou (t3), How are you? (t4), SayGoodbye (t5), Goodbye (t6).]
Figure 2.3 Example Execution Without A Hydra Runtime
2.3.4 Prescheduled Event Groups
Prescheduled event groups are the most powerful event group type available in
Hydra. They allow the program to specify in detail the ordering of child events.
This event group is roughly equivalent to a combination of a FIFO group and a
parallel event group.
The ordering of children is specified through the use of time slots. All time
slots in a prescheduled event group are relative to all other time slots in the same
event group. Each time slot is essentially like a parallel event group, which means
that all children in a time slot may execute in any order and can execute in parallel.
A detailed example of how to use prescheduled event groups will be shown in
the section on merge sort.
2.4 Locks
In Hydra, locks are combined with the concept of method parameters. When
an event is defined, it specifies not only every parameter it takes, but also the
desired locking method for each. When a request is made to schedule an
event, a request is also made to lock the parameters. The event will execute when
all specified parameters are available for the requested lock types. So when exe-
cution of an event begins, all of the locks have been acquired. Once the execution
of an event ends, all of the locks are released.
Since all locks are defined in the parameter list, no new locks may be acquired
during the execution of an event. Additionally, no locks may be released during
the execution of an event.
Parameters are locked using four different types of locks. All types of locks
are specified with an annotation with the same name as the lock type.
Locks are also used as an implicit form of message passing: when a lock is
requested, access to the requested data is implied. So it is the job of the runtime
to ensure that all events have access to the specified data.
The runtime may change locks to achieve better performance or to make a
program conform to a type of architecture. For example, an implementation of
Hydra may reduce all locks to the pass lock, since no locking is required. On the
other hand, another implementation may elect to upgrade pass locks to readwrite
locks.
In the following subsections, some examples will be presented. For these ex-
amples, it is assumed that the runtime is using ideal implementations of all locks.
In the case of the pass lock, this means that no actual locking will take place. For
the others, it means that they will also function as described.
2.4.1 Pass Locks
Pass is used to ensure that an event has access to the parameter, but that a
minimal amount of locking is done. This means that ideally, the Hydra runtime
would allow the parameter to be passed without having to actually lock it. Since
the desire is to not lock the parameter, the program is responsible for using some
other mechanism, namely the ordering implied by the event groups through time
slots, to ensure that the data is synchronized properly.
It is important to note that a pass lock is slightly different from the absence of
any lock. In Hydra, a pass lock can be upgraded to another type of lock if
necessary, whereas if no lock is used there is no upgrade. To put it another way,
the pass lock recommends that no locking, or the minimum locking, take place,
whereas with an actual absence of a lock there will be no locking in any case.
This is generally used when an event will have exclusive access to a parameter
or when a prescheduled event group is being used. Pass is also the default: if no
lock type is specified for a parameter, then pass is applied.
To clarify how the pass lock works, let's look at the following example code:
void Schedule(@pass Object o1, @pass Object o2) {
    EventGroup.Parallel().AddToThis();
    A(o1);
    A(o1);
    A(o2);
    A(o2);
}

@event void A(@pass Object o) {
}
In this example, the A event takes a single parameter o. The event requests a
pass lock on this parameter. We request that this event be scheduled four times
as part of a parallel event group. The first two times are with the object o1 and
the last two times are with the object o2. Since the lock is a pass lock and we
have an ideal implementation of the pass lock, all events may execute in parallel.
2.4.2 Read Locks
Read locks are used when the specified parameter will not be modified. When
a parameter will not be modified, more than one event can access it. This lock
ensures that no events that want to modify the parameter will execute at the same
time as the events that wish to read it.
To clarify how the read lock works, let's look at the following example code:
void Schedule(@pass Object o1, @pass Object o2) {
    EventGroup.Parallel().AddToThis();
    A(o1);
    A(o1);
    A(o2);
    A(o2);
}

@event void A(@read Object o) {
}
In this example, the A event reads a single parameter o. We request that this
event be scheduled four times as part of a parallel event group. The first two
times are with the object o1 and the last two times are with the object o2. Even
though there is an overlap in the parameters, all events may execute in parallel.
2.4.3 Write Locks
Write locks are used when the specified parameter will be modified and not read
from. Since the specified parameter will be modified, only one event may have
access to it. If there are two or more events writing to the same parameter in
a row, then it is possible for the runtime to optimize the sequence of events by
removing one or more of them. This optimization is possible because the last
event will overwrite the work of the previous events, so if the previous events
don't execute, then nothing will be different.
To clarify how the write lock works, let's look at the following example code:
void Schedule(@pass Object o1, @pass Object o2) {
    EventGroup.Parallel().AddToThis();
    A(o1);
    A(o1);
    A(o2);
    A(o2);
}

@event void A(@write Object o) {
}
In this example, the A event writes to a single parameter o. We request that
this event be scheduled four times as part of a parallel event group. The first
two times are with the object o1 and the last two times are with the object o2.
Since the lock is a write lock, only half of the events may execute in parallel. The
two that write to o1 may not execute in parallel and the two that write to o2
may not execute in parallel. However, either event that writes to o1 may execute
in parallel with either event that writes to o2. Additionally, the first event that
writes to o1 and the first event that writes to o2 may be removed by the runtime.
2.4.4 Read Write Locks
ReadWrite locks are used when the specified parameter will be modified and read
from. Since the specified parameter will be modified, only one event may have
access to it.
To clarify how the readwrite lock works, let's look at the following example
code:
void Schedule(@pass Object o1, @pass Object o2) {
    EventGroup.Parallel().AddToThis();
    A(o1);
    A(o1);
    A(o2);
    A(o2);
}

@event void A(@readwrite Object o) {
}
In this example, the A event reads from and writes to a single parameter o. We
request that this event be scheduled four times as part of a parallel event group.
The first two times are with the object o1 and the last two times are with the
object o2. Since the lock is a readwrite lock, only half of the events may execute
in parallel. The two that readwrite o1 may not execute in parallel and the
two that readwrite o2 may not execute in parallel. However, either event that
readwrites o1 may execute in parallel with either event that readwrites o2.
2.4.5 Using Locks
When a lock is requested for a parameter, the runtime checks to see what the
previously requested lock was. If the requested lock can't execute in parallel with
the previous lock, then the runtime schedules the lock for the time slot after the
previous lock. If the requested lock can execute in parallel with the previous lock,
then the runtime schedules the lock for the same time slot as the previous lock.
If an event requests locks for two or more different objects and they are available
at different time slots, then the runtime picks the latest of the time slots and
schedules all locks requested by the event for that time slot. The SMP runtime
only does scheduling for read, write, and readwrite locks. Additionally, write and
readwrite are treated the same.
The simplest algorithm for handling the scheduling of locks is as follows:
ScheduleFor = 0
For Each Object in ParametersOf(Event)
    If PreviousLockFor(Object) == READ Then
        If RequestedLockFor(Object) == READ Then
            Available = PreviousTimeslotFor(Object)
        Else If RequestedLockFor(Object) == WRITE Then
            Available = PreviousTimeslotFor(Object) + 1
        Else If RequestedLockFor(Object) == READWRITE Then
            Available = PreviousTimeslotFor(Object) + 1
    Else If PreviousLockFor(Object) == WRITE Then
        Available = PreviousTimeslotFor(Object) + 1
    Else If PreviousLockFor(Object) == READWRITE Then
        Available = PreviousTimeslotFor(Object) + 1
    If Available > ScheduleFor Then
        ScheduleFor = Available
For Each Object in ParametersOf(Event)
    SetPreviousLockFor(Object, RequestedLockFor(Object))
    SetPreviousTimeslotFor(Object, ScheduleFor)
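The algorithm above can be sketched as a small self-contained Java class. The maximum taken over all parameters follows the listed pseudocode; treating an object that has never been locked as becoming available one slot after its initial slot 0 is an assumption made here so that the results match the figures that follow:

```java
import java.util.HashMap;
import java.util.Map;

public class LockScheduler {
    public enum Lock { READ, WRITE, READWRITE }

    private static class State { Lock lock; int slot; }  // lock == null: never requested

    private final Map<Object, State> states = new HashMap<>();

    // Returns the time slot an event is scheduled for, given its
    // (parameter object -> requested lock) map.
    public int schedule(Map<Object, Lock> request) {
        int scheduleFor = 0;
        for (Map.Entry<Object, Lock> e : request.entrySet()) {
            State s = states.computeIfAbsent(e.getKey(), k -> new State());
            // Two reads of the same object may share a slot; anything else,
            // including the first request for an object (an assumption here),
            // must wait one slot past the previous lock.
            int available = (s.lock == Lock.READ && e.getValue() == Lock.READ)
                    ? s.slot
                    : s.slot + 1;
            if (available > scheduleFor) scheduleFor = available;
        }
        // All of the event's locks are recorded for the chosen slot.
        for (Map.Entry<Object, Lock> e : request.entrySet()) {
            State s = states.get(e.getKey());
            s.lock = e.getValue();
            s.slot = scheduleFor;
        }
        return scheduleFor;
    }
}
```

With this sketch, two reads of A land in slot 1, and a subsequent event that writes A while reading B lands in slot 2, reproducing the worked example in the figures below.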
To demonstrate how scheduling for locks is handled with the SMP runtime,
we will use a few figures. In these figures, we will have two objects, A and B. To
the right of each object is the number that designates the time slot for which
the lock was last scheduled and the type of the lock as one or two letters. We will
also have two time slots, which are separated by a line and each designated
with a number. Lastly, we will have events, which will be boxes in the two time
slots. For the sake of simplicity we will assume that the event group used is of
the parallel type, so that we can focus solely on the locks. We will also assume that
there are no other events executing during this example.
Now, when our example begins, there are no events and nothing has been
scheduled for the locks. In this example, we represent this state with empty boxes
and zeros next to the locks.
[Locks: A:0, B:0. Time slots 1 and 2 are empty.]
Figure 2.4 Locking Example Part 1 No Locks
Next we add an event that takes one parameter and uses the read lock. The
parameter that was passed was A. The event will naturally be added to the 1st
time slot. The A object will also be scheduled for time slot 1 with a read lock.
19


[Locks: A:1R, B:0. The event appears in time slot 1.]
Figure 2.5 Locking Example Part 2 After 1st Lock
Next we add the same event and with the same parameter. Since the requested
lock is still a read lock, the event will be added to the 1st time slot and the A
object will still be scheduled for time slot 1 with a read lock.
[Locks: A:1R, B:0. Both events appear in time slot 1.]
Figure 2.6 Locking Example Part 3 After 2nd Lock
Next we will add a new event. This event takes two parameters. One with a
read lock and one with a write lock. The A object was passed for the write lock
and the B object was passed for the read lock. The B object could be scheduled
for the 1st time slot with a read lock. However, since a read lock conflicts with
a write lock on the same object, the A object can't be scheduled for the 1st time
slot. So the A object will be scheduled for the 2nd time slot with a write lock.
Since both locks must occur at the same time, the B object will be scheduled for
a read lock for the 2nd time slot as well. As a result, the event will end up in the
second time slot.
[Locks: A:2W, B:2R. The new event, holding both A and B, appears in time slot 2.]
Figure 2.7 Locking Example Part 4 After 3rd and 4th Locks
2.5 Commands
All major Hydra commands are divided into three groups: Event Group Creation,
Event Group Manipulation, and Parameter Locks. Each group of commands will
be handled separately.
2.5.1 Event Group Creation
New event groups can be created with the EventGroup class by invoking the
static method that has the same name as the desired event group. The method
will then return the new event group.
So if a new FIFO event group is desired, then the method EventGroup.FIFO()
is invoked. If a new Parallel event group is desired, then the method
EventGroup.Parallel() is invoked, and so on.
2.5.2 Event Group Manipulation
Event Group Manipulation commands are used to retrieve event groups, place
events within event groups, and place event groups into other event groups.
There are two different event groups that can be retrieved. First is the event
group that is the parent of the event that is currently executing. This is retrieved
with the EventGroup.GetCurrent() method. Once it is retrieved, it can be as-
signed to a variable or otherwise manipulated. This event group will not change
throughout the execution of the event. The second event group that can be
retrieved is the event group that is being added to. This is the event group that
is accepting the new events as they are being invoked by the currently executing
event. This second event group may or may not be the same as the current event
group. This event group can be changed during the execution of the event. This
event group is retrieved with the EventGroup.GetAddingTo() method.
The event group that is being added to may be changed with the
AddEventsToThis() method. This is used by specifying the new adding-to event
group before the command, like so: a.AddEventsToThis(), in which case events
would be added to the a event group.
Event groups may be placed within other event groups using two different
methods. The first adds an event group to the event group that is being added
to, which is done with the AddAsChildEventGroup() method, used like
so: a.AddAsChildEventGroup(), which adds the a event group to the event group
that is being added to. The second method adds one event group to another ex-
plicitly specified event group, which is done with the AddAsChildEventGroupTo()
method. That is: a.AddAsChildEventGroupTo(b), which adds the event group a
to the event group b.
2.5.3 Prescheduled Event Group Specific
There are commands that may only be used with Prescheduled Event Groups.
These commands change the time slot that events and event groups are being
added to. In the description of these commands, this time slot will be referred to
as the current time slot. These commands are:
Back Changes the current time slot to the time slot before the current
one. If there is no time slot before the current one, then a new time slot is
added before the current one. The new time slot is then made the current
time slot.
Forward Changes the current time slot to the time slot after the current
one. If there is no time slot after the current one, then a new time slot is
added after the current one. The new time slot is then made the current
time slot.
GoToFirst Changes the current time slot to the earliest time slot that
has been defined in the event group.
GoToLast Changes the current time slot to the latest time slot that has
been defined in the event group.
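The four commands above can be sketched as a list of time slots with a movable cursor. This is a minimal plain-Java model, not the Hydra API; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SlotCursor {
    // Each inner list is one time slot holding event names.
    private final List<List<String>> slots = new ArrayList<>();
    private int current = -1;  // no slot exists yet

    // Add an event to the current time slot, creating the first slot on demand.
    public void add(String event) {
        if (current < 0) { slots.add(new ArrayList<>()); current = 0; }
        slots.get(current).add(event);
    }

    // Back: move to the previous slot, inserting a new one if none exists.
    public void back() {
        if (current <= 0) { slots.add(0, new ArrayList<>()); current = 0; }
        else { current--; }
    }

    // Forward: move to the next slot, appending a new one if none exists.
    public void forward() {
        if (current >= slots.size() - 1) { slots.add(new ArrayList<>()); current = slots.size() - 1; }
        else { current++; }
    }

    public void goToFirst() { if (!slots.isEmpty()) current = 0; }
    public void goToLast() { if (!slots.isEmpty()) current = slots.size() - 1; }

    public List<List<String>> slots() { return slots; }
}
```

For example, adding an event, moving Forward to add a second, then calling Back twice inserts a brand-new slot before the first one, just as described above.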
2.5.4 Parameter Locks
The parameter lock commands are used to specify the locks that were described in
section 2.4. These commands take the form of annotations that are placed before
the data type of a parameter. These annotations have the same name as the lock
they represent, but are specified in lower case. It should be noted that
all annotations start with the at sign (@).
So if a pass lock is desired, then the annotation @pass is used. If a read lock
is desired, then the annotation @read is used. And so on.
3. Comparison With Other Languages
Hydra has a number of strengths in comparison to other parallel language ex-
tensions and libraries. The most common ones are as follows:
Architecture Independence
Compilation To Architecture at Runtime
Fewer Language Restrictions
Recognized By Existing Tools
No Deadlocks
There are also a number of differences in how some constructs in Hydra work
including:
No Explicit Barriers
Scheduled Locking
Restricted Mixing of Sequential and Parallel Code
Indirect Support for Many Kinds of Parallel Loops
Support for Parallel Invocations
Support for Spawning Workers
Each of these differences will be described in detail. Sample algorithms for the
differences are provided in chapter 6.
3.1 Architecture Independence
Many methods of parallelizing code are fairly low-level. For example, Java
Threads, Java Thread Pools, and JACE all require the programmer to use threads
for parallelism. While threads are a well known and well documented subject,
threaded programs are not portable to different architectures. Many such meth-
ods of parallel programming are not backwards compatible with non-parallel ar-
chitectures. Even when they are backwards compatible, they generally can't make
efficient use of the old architectures.
Hydra uses events as its unit of parallel execution. A Hydra program can be
recompiled with a runtime to use threads so that it will work with a thread-based
parallel architecture, but it is not required to use one. Hydra has a very high level
of backwards compatibility with non-parallel architectures. This is demonstrated
by the fact that the Hydra programs presented in this paper can execute with or
without Hydra and will execute efficiently either way.
3.2 Compilation To Architecture at Runtime
Many high-level methods of parallelizing code require that the program be bound
to a particular architecture and runtime at compile time. For example, javar and
JOMP both require the programmer to use their respective compilers to produce
low-level architecture-dependent code at the time of compilation. This means
that once a user receives the program, the user can't use it on a different architec-
ture even if the original source code would be compatible with that architecture.
Additionally, the programmer needs to be aware of the underlying architectures
and ensure that different versions of the compiled program are available for all
the different architectures that the user would wish to use. Lastly, the user also
needs to be aware of their architecture, so that they can be sure to use the correct
version of the program.
Hydra compiles programs to a particular architecture and runtime at runtime.
So as long as a program is written correctly, it can be used on any architecture
that Hydra supports. The programmer does not need to compile multiple versions
of the same program and the user does not need to be aware of which architecture
they have. All that needs to be done is to verify that the architecture that is to
be used is compatible with Hydra and that the program is a Hydra program.
3.3 Fewer Language Restrictions
Many extensions to Java rely on class libraries for the definitions of the paral-
lel language constructs, for example Java Threads, Java Thread Pools, and
JACE. These class libraries restrict what can be done with the language to only
what classes, methods, and fields can do. For example, parallel loops can't be
easily represented with these languages like they can with other languages, such
as javar and JOMP. Another example is that they also can't have locks specified
in the parameter list, as in Hydra.
Hydra uses a class library to represent the language constructs. However, the
class library does not contain the definitions for the constructs. The actual def-
initions of the language constructs are supplied at runtime and Hydra programs
are recompiled. To put it another way, the class library is only used to mark up
Java programs, so that they can be Hydra programs. Since the class library is
only used for markup purposes, it can do things that classes, methods, and fields
can't normally do.
3.4 Recognized By Existing Tools
Many extensions to Java can't be recognized by many, if any, existing tools for
Java, for example javar and JOMP. These language extensions make use of com-
ments to represent their language constructs, and comments are generally ignored
by Java tools.
Hydra makes use of a class library to represent language constructs. Existing
Java tools can thus recognize the constructs without knowing anything about Hy-
dra. For example: Netbeans can be used to display information about different
classes and check for errors [19].
It works because these tools are required to recognize classes and their mem-
bers to start with. For example, Netbeans needs to recognize these, so it can
check the code for typos and to display javadoc on the classes and members [19].
3.5 No Deadlocks
In most parallel programming languages, deadlock prevention is the responsibility
of the programmer, for example in Java Threads, Java Thread Pools, and JACE.
It is possible to cause deadlock with the following:
1. Produce two threads.
2. Have thread 1 acquire lock 1.
3. Have thread 2 acquire lock 2.
4. Have thread 1 wait for lock 2.
5. Have thread 2 wait for lock 1.
At this point, the two threads would wait indefinitely. While this is a trivial
example, it demonstrates the possibility of deadlock.
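The five steps above can be reproduced in plain Java. So that the demonstration terminates instead of actually hanging, this sketch substitutes ReentrantLock.tryLock() for the blocking waits in steps 4 and 5; both attempts fail, which is exactly the circular-wait condition that would deadlock with blocking lock() calls:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockShape {
    // Returns true when both threads, each holding one lock,
    // fail to acquire the other.
    public static boolean circularWaitDetected() throws InterruptedException {
        ReentrantLock lock1 = new ReentrantLock();
        ReentrantLock lock2 = new ReentrantLock();
        CountDownLatch bothHeld = new CountDownLatch(2);
        boolean[] got = new boolean[2];

        Thread t1 = new Thread(() -> {
            lock1.lock();                    // step 2: thread 1 acquires lock 1
            try {
                bothHeld.countDown();
                bothHeld.await();            // wait until lock 2 is also held
                got[0] = lock2.tryLock();    // step 4: lock2.lock() would block forever
                if (got[0]) lock2.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                lock1.unlock();
            }
        });
        Thread t2 = new Thread(() -> {
            lock2.lock();                    // step 3: thread 2 acquires lock 2
            try {
                bothHeld.countDown();
                bothHeld.await();
                got[1] = lock1.tryLock();    // step 5
                if (got[1]) lock1.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                lock2.unlock();
            }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return !got[0] && !got[1];
    }
}
```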
Hydra avoids deadlocks. To clarify, when it is said that deadlock is impossible
with Hydra, the following restrictions are put into place:
Only Hydra's parallel constructs are being used, meaning Hydra locks and
event groups.
Deadlocks as defined by operating system textbooks and circular dead
states [20].
The Hydra runtime is implemented correctly.
This does not include:
Java's parallel constructs, which can be used with Hydra programs.
Custom parallel constructs produced by the programmer or a 3rd party via
the Java language.
Incorrectly implemented Hydra runtimes.
There are three different categories of circular dead states [20]:
Communication Deadlock
Interleaved Deadlock (identical to the operating system textbook definition
of deadlock)
Scheduling Deadlock
Operating system textbooks define a list of properties that must be met for
a deadlock to occur [20]:
A circular chain of processes
Each process in the chain must be waiting for something that another in the
chain has.
Deadlock is impossible in Hydra for the following reasons:
1. With Hydra locks, all locks must be acquired at one time. If an event is
unable to acquire all of its locks at one time, it can't acquire any locks and
must wait until all are available. Since all must be acquired at one time, it
is impossible for one event to be waiting for a lock while it already holds
another lock. This is referred to as one-shot allocation, and it does not allow
for a circular chain of processes, which is necessary for deadlock [21][22].
27


2. With event groups, the ordering of events is determined and does not use
locks. With this there can be no circular chain of processes waiting on each
other, so deadlock is impossible.
3. Communication deadlock requires message passing, and Hydra does not sup-
port message passing, so communication deadlock can't occur.
4. Scheduling deadlock requires a scheduling problem to occur, which causes
two or more processes to wait on each other. Hydra does not have any
constructs that can cause this problem, so scheduling deadlock can't occur
either.
The technique that Hydra uses to eliminate deadlocks is referred to as dead-
lock prevention [21]. Deadlock prevention does not allow deadlocks to occur,
due to the nature of the parallel language.
3.6 No Explicit Barriers
In the Hydra language, there are only implicit barriers, and they are defined as
part of the ordering constructs. This means that the programmer cannot spec-
ify explicit barriers using the Hydra language. Though if it is absolutely necessary
to have explicit barriers, the programmer could use Java's CountDownLatch or
CyclicBarrier.
Between all time slots is an implicit barrier. Events in the same time slot do
not have an implicit barrier between them. This is true for all types of time slots
be they part of a prescheduled event group, logical time slots, or runtime time
slots. For FIFO and Sequential event groups, there are implicit barriers between
every event and child event group.
In Java Threads, Java Thread Pools, and JACE, barriers are generally explicit.
In OpenMP-like parallel language extensions, like javar and JOMP, implicit barri-
ers are generally placed at the end of parallel constructs by default [15][16][18]. In
Hydra, as long as no ordering constructs, like FIFO, Sequential, and Presched-
uled event groups, are used, there are no implicit barriers.
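A minimal sketch of the CyclicBarrier fallback mentioned above: every worker logs a phase-1 entry, waits at the barrier, then logs a phase-2 entry, so all phase-1 entries are guaranteed to precede all phase-2 entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BarrierDemo {
    public static List<String> run(int workers) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                log.add("phase1");
                try {
                    barrier.await();   // no worker proceeds until all arrive
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                log.add("phase2");
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(log);
    }
}
```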
3.7 Scheduled Locking
Locks work differently with the Hydra language than they do with most par-
allel language extensions. First of all, as mentioned before, it is impossible to
cause deadlock. However, that is not all.
Locks are traditionally acquired at the time they are requested. With Hydra,
virtually everything is scheduled in advance, including locks. So rather than
requesting a lock when it is desired, a Hydra program specifies that it will need
a certain lock with an event in the future. The runtime then determines when
to schedule the event so that the requested lock is available. This eliminates the
need for traditional waiting mechanisms. So there is no busy wait while waiting
for a lock and there are no threads simply idling because a requested lock is not
available. Instead, when a lock is being waited on, the Hydra runtime can schedule
other events to take place or have the threads perform system maintenance. On
the other hand, since Hydra locks have to be scheduled in advance, flexibility is
reduced.
Additionally, locks can be easily optimized by the Hydra runtime. For exam-
ple, if a Hydra program executes without a Hydra runtime and with only a Java
virtual machine, then the locks will automatically be optimized out. This elimi-
nates both the processing and memory overhead that is necessary for maintaining
locks. On the other hand, this makes it so that a programmer can't know for
certain how a lock will function in advance. However, this is not expected to be
a problem.
Also, Hydra locks are all acquired at one time. This improves performance
over most other parallel language extensions. Additionally, it provides greater
ease of use since all locking is done in one place. On the other hand, this also
reduces flexibility.
Lastly, Hydra locks are released automatically when an event ends. This re-
sults in greater ease of use, since the operation of events will be consistent and
the programmer does not have to worry about manually releasing all locks. On
the other hand, once again this reduces flexibility.
Overall, Hydra locks should provide greater performance and ease of use than
traditional locks. However, due to their nature, Hydra locks will lack flexibility,
and they lack some consistency since Hydra can optimize the locks.
3.8 Restricted Mixing of Sequential and Parallel Code
In Hydra, events have a number of properties including:
All events are considered atomic. This means that there is a "before" an
event and an "after" an event, but no "during" an event. The Hydra
runtime does not process parallel constructs used by an event while the
event is executing.
All code in an event is executed sequentially. This means that there is
no parallel code in an event. While an event may make use of parallel
constructs, the actual parallel code will not execute until after the event
is completed. So while an event may schedule other events to do work in
parallel, each event during execution will only do sequential work.
With other parallel language extensions, parallel and sequential code can fre-
quently be nested within one another. Here are some examples:
For JOMP, parallel regions may be placed anywhere within a JOMP pro-
gram. Additionally, sequential code may be specified within a parallel re-
gion.
For thread based systems (Java Threads, Java Thread Pools, and JACE)
parallel threads can be created and then a barrier can be specified to have
a region of sequential code. Additionally, within that region of sequential
code, a new set of parallel threads could be produced, which then have their
own region of sequential code, and so on.
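The nesting described in the thread-based bullet above can be sketched in plain Java: threads form a parallel region, `Thread.join` acts as the barrier, and the sequential code after the barrier may spawn a new set of threads. This is an illustrative sketch, not code from Hydra or any of the cited systems.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NestedParallel {
    static final AtomicInteger counter = new AtomicInteger();

    // One parallel phase: start N threads, then join them all (the barrier).
    static void parallelPhase(int threads) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(counter::incrementAndGet);
            workers[i].start();
        }
        for (Thread t : workers) t.join(); // sequential code resumes after this
    }

    public static int run() throws InterruptedException {
        counter.set(0);
        parallelPhase(4); // first parallel region
        // sequential region after the barrier; it may open a new parallel region:
        parallelPhase(2);
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 6
    }
}
```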
This means that other parallel language extensions should have a slightly
higher degree of flexibility than Hydra.
3.9 Indirect Support for Many Kinds of Parallel Loops
Parallel loops allow the work within a loop to be parallelized. In Hydra, par-
allel loops are generally produced by an event with a sequential loop that adds
events to either a parallel event group or a prescheduled event group. The loop
itself will not be executed in parallel. However, some time after the loop completes
execution, the events will begin execution. At that time the events will be ex-
ecuted in parallel. Many different loops can be represented with this and have
their work parallelized.
In JOMP and javar, parallel loops are supported directly. However, only sim-
ple counted loops are supported. When the parallel loop begins, work will be split
between different threads automatically. When the work is done, the loop will end
and the results will be available.
javab's parallel loop support is similar to JOMP's and javar's. The main dif-
ference is that there is no way to specify the loops that are to be parallelized
since javab does everything automatically. So if javab does not detect that it can
parallelize a loop, then no attempt will be made to parallelize the loop.
In thread based systems (Java Threads, Java Thread Pools, and JACE), par-
allel loops are not supported directly. While it is possible to represent any kind
of parallel loops, it is not easy to do.
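To illustrate why thread-based parallel loops take effort, here is a hand-partitioned loop in plain Java: the programmer must split the index range, create the threads, give each thread its own result slot, and join them, all of which a directive-based system does automatically. The example (an array sum) is illustrative, not taken from any of the cited systems.

```java
public class ManualParallelLoop {
    /** Sum arr[0..length) by splitting the index range evenly across threads. */
    public static long parallelSum(int[] arr, int nThreads) throws InterruptedException {
        long[] partial = new long[nThreads];      // one result slot per thread
        Thread[] workers = new Thread[nThreads];
        int chunk = (arr.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t, lo = t * chunk, hi = Math.min(arr.length, lo + chunk);
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = lo; i < hi; i++) s += arr[i];
                partial[id] = s;                  // no locking: disjoint slots
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < nThreads; t++) { workers[t].join(); total += partial[t]; }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++) a[i] = i + 1;
        System.out.println(parallelSum(a, 4)); // 500500
    }
}
```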
This means that Hydra and the thread based systems offer the greatest flexi-
bility for parallel loops, while the simplest would be JOMP, javar, and javab. The
hardest to use would be the thread based systems.
3.10 Support for Parallel Invocations
Parallel invocations allow two or more methods to be executed in parallel.
In Hydra, parallel invocations are supported directly via events. Any methods
that a programmer wishes to invoke in parallel need to be made into events and
placed in a parallel or prescheduled event group.
In JOMP, parallel invocations are done with parallel loops and sections con-
tained within parallel regions.
In javar, parallel invocations are supported directly via the parJnv construct. This
allows for a block of method invocations to be executed in parallel.
In thread based systems (Java Threads, Java Thread Pools, and JACE), paral-
lel invocations are partially supported via the thread mechanism. For any methods
that a programmer wishes to invoke in parallel, the programmer needs to produce
a thread and have the thread invoke the method. However, full support for paral-
lel invocations does not exist in such systems because they generally don't provide
an easy-to-use mechanism for passing parameters to the methods.
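The parameter-passing awkwardness can be seen in a small plain-Java sketch: each method call and its arguments must be packaged by hand into a Runnable (here via lambda capture), and results must come back through a shared array or similar. The method names are illustrative.

```java
public class ParallelInvoke {
    static int square(int x) { return x * x; }
    static int cube(int x)   { return x * x * x; }

    /** Invoke square(a) and cube(b) in parallel and return both results. */
    public static int[] invokeBoth(int a, int b) throws InterruptedException {
        int[] results = new int[2];
        // Parameters must be captured manually in each Runnable;
        // there is no direct "call this method with these args on a thread".
        Thread t1 = new Thread(() -> results[0] = square(a));
        Thread t2 = new Thread(() -> results[1] = cube(b));
        t1.start(); t2.start();
        t1.join();  t2.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = invokeBoth(3, 2);
        System.out.println(r[0] + " " + r[1]); // 9 8
    }
}
```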
This means that all parallel language extensions that are covered in this pa-
per support parallel invocations. However, all versions are slightly different. The
hardest to use would be the thread based systems.
3.11 Support for Spawning Workers
Sometimes it is desirable to spawn off a separate worker task without hav-
ing to worry about when it will be done and having to manage it.
In Hydra, such tasks are supported directly via events and event groups. The
programmer simply has to make the worker task an event or event group and place
it within a parallel event group. This task will then execute in parallel with the
other children in the parallel event group. After that, there is no need to worry
about the task.
In javar, this can be done by using parallel loops and parallel invocations.
However, there is a barrier at the end of all parallel constructs, which prevents
the main thread from proceeding and potentially coming back to spawn off more
workers. This can only be avoided through the use of recursion.
In JOMP, this is supported by using parallel regions, parallel loops or parallel
sections and having the implicit barrier disabled.
In most thread based systems (Java Threads and JACE), this is also supported
directly and works well. The task simply needs to be given its own thread.
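The "give the task its own thread" approach for Java Threads can be sketched as follows; the main flow starts the worker and never joins it. The latch exists only so this demo can observe completion and is not part of the pattern.

```java
import java.util.concurrent.CountDownLatch;

public class SpawnWorker {
    static final CountDownLatch done = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // ... long-running background work would go here ...
            done.countDown();
        });
        worker.setDaemon(true); // JVM need not wait for this thread at exit
        worker.start();
        // main continues immediately: no join, no further management of the worker
        done.await(); // demo only: lets us observe that the worker ran
        System.out.println("worker finished");
    }
}
```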
In some thread based systems that rely on pools (Java Thread Pools), it is
possible to make it work, but there are some complications. While functionality
is essentially equivalent to other thread systems, there is one substantial problem.
It is possible that some code may be dependent on having access to a certain
number of threads from the thread pool. This is problematic because the thread
pool will typically be divided between the different tasks, so guaranteeing that
the required number of threads are available all the time may be difficult. This
will be shown in chapter 6.
This means that not all parallel language extensions support spawning work-
ers. Additionally, even when a parallel language extension supports it, it may not
work very well. However, Hydra supports this completely along with JOMP and
most thread based systems (Java Threads and JACE).
4. The Hydra Runtime
The runtime is responsible for mapping the Hydra constructs to the underly-
ing machine architectures in order to exploit their parallel processing capabilities.
Therefore a variety of runtimes may be created for the Hydra language. The
current version is tailored for the SMP (Symmetric Multi-Processing) architec-
ture, allowing Hydra events to execute on separate threads. The runtime creates
the exact number of threads that may execute concurrently on the system, one
thread for each virtual processor. Virtual processors include physical proces-
sors, additional cores on each processor, and additional threads that can execute
concurrently on each core.
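A runtime can query the virtual-processor count with a standard Java call: `Runtime.availableProcessors()` reports the logical processors visible to the JVM, which covers physical processors, extra cores, and extra hardware threads per core. Whether the Hydra runtime uses exactly this call is an assumption; the API itself is standard.

```java
public class ProcessorCount {
    /** Logical processors visible to the JVM: sockets x cores x hardware threads. */
    public static int virtualProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int n = virtualProcessors();
        System.out.println("creating " + n + " worker threads"); // one per virtual processor
    }
}
```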
This runtime executes on any Java VM that supports Java 1.5. Hydra pro-
grams written with backwards compatibility in mind can execute on any Java 1.5
VM as a regular sequential program.
4.1 Compilation of Programs
Typically, when a parallel program is written, it must be compiled to a par-
ticular architecture by the programmer. If support for multiple architectures is
desired, then different versions of the program are required. Occasionally, even
different versions of the source code may also be required. In addition, it's
inconvenient for the user, who may not be aware of the type of the underlying architecture.
Hydra works differently; one version of the program is created and distributed
by the programmer. The runtime that is used with the program determines the
type of architecture the program will be capable of running on.
Before execution a Hydra program does not have the proper code to interface
with a particular Hydra runtime. Instead, it only has marked-up bytecode,
which specifies how all Hydra language constructs should be used in the program.
To make the program actually work with the runtime, it must first be converted
so that it can interface with the API of a particular runtime. This conversion
is performed as part of the runtime by a compiler called Alchemist. Alchemist
will recompile the classes as they are loaded by the Java VM by using a custom
loader[23]. Compilation is discussed further in chapter 5.
4.2 Scheduling of Events
When events are scheduled, to the programmer it appears as a simple method
invocation. However, there is a lot that goes on under the hood of the Hydra sys-
tem. First of all, Hydra programs are recompiled and when they are recompiled,
that simple method invocation becomes a multistep process that results in the
event being placed in an event group. This process is shown in Figure 4.1 and
described below:
1. Scheduling Method Is Invoked The scheduling method for the event
is invoked. In the source code, this appears as if the event itself is being
invoked.
2. Gets Event Object All events have a singleton that represents the event.
This singleton is retrieved so that the event may be manipulated.
3. Acquires Parameters Instance Events have parameters just like the
methods they are based on. When scheduling an event, Hydra needs to
know what these parameters are so that they can be properly locked and
the event can be scheduled. We say that the parameters instance is acquired
because a new instance of a parameters object may or may not be created
based on the capabilities of the runtime. No more details will be covered on
this, because it is beyond the scope of this paper. The parameters for the
event are placed in the parameters object.
4. Place Event and Parameters In Event Group The event and the
parameters are placed into the event group so that they can be found by the
supervisor when it begins its scheduling operation.
[Figure 4.1 diagram: the scheduling method retrieves the event singleton, acquires a
parameters instance, stores the parameters in it, and places the event and parameters
in the event group]
Figure 4.1 Scheduling In A Hydra Program
Most of these details are taken care of by a new method which replaced the
original method. The algorithm for the new method follows:
AnyEvent(P1, P2, ..., PN):
    AllParameters = AnyEventClass.Instance.GetParameters()
    AllParameters.P1 = P1
    AllParameters.P2 = P2
    ...
    AllParameters.PN = PN
    AnyEventClass.Instance.Schedule(AllParameters)
After placing the event and its parameters in an event group, eventually the
time comes for the event group to be scheduled. The time this happens is de-
pendent on the overall structure of the program and the implementation of the
runtime. However, the first prerequisite is that the event group must be a child
of another event group. The only exception to this is the first event group for
a program because the creation and scheduling of this event group is completely
performed by the runtime. After that it depends on how the parent event group
operates. For example, if we have a child event group of a FIFO event group,
the child will be scheduled when all prior events and event groups in the FIFO
event group have been executed or scheduled. If an event group has already been
scheduled once before and new events have been added, then it is dependent on
the runtime implementation. At the time of this writing the current implemen-
tation of Hydra schedules new events for an event group after it has executed all
the events that have already been scheduled for the event group.
Once the Hydra runtime decides to schedule events from an event group, it
pulls events out of the event group and begins the scheduling process. This
process is shown in Figure 4.2 and described below:
1. Retrieve Data On Event The supervisor queries the event group for
the information on the event, including the objects it needs to lock and the
types of locks it needs.
2. Check Event Group The supervisor queries the event group to determine
the earliest time slot the event should be allowed to execute in. This is used
to maintain the proper ordering of the events.
3. Check Parameters The supervisor then checks the objects that the event
needs to lock to determine when they are available. It then schedules the
event for the earliest time slot that it can execute in.
4. Place Event In Time Slot The supervisor then attempts to place the
event in the data structure for the specified time slot. If the data structure
is full, the next available time slot is used.
5. Notify Event Group The supervisor notifies the event group of the
scheduled event and its time slot.
6. Update Locks The supervisor updates the locks on the objects to indicate
that the event will be using them in the specified time slot.
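The scheduling steps above can be sketched as a small model: each locked object records the earliest slot in which it is free, an event is assigned the latest of its group's earliest slot and its objects' free slots, and the objects are then marked busy for that slot. This is a hypothetical illustration of the idea, not the actual Hydra data structures; it also treats every lock as exclusive and ignores slot capacity (step 4) for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

public class SlotScheduler {
    // Earliest slot in which each locked object is free (illustrative model).
    private final Map<String, Integer> objectFreeSlot = new HashMap<>();

    /**
     * Schedule an event needing exclusive access to the given objects.
     * groupEarliest is the earliest slot allowed by the event group (step 2).
     * Returns the slot the event was assigned.
     */
    public int schedule(int groupEarliest, String... lockedObjects) {
        int slot = groupEarliest;
        for (String obj : lockedObjects)                      // step 3: availability
            slot = Math.max(slot, objectFreeSlot.getOrDefault(obj, 0));
        for (String obj : lockedObjects)                      // step 6: update locks
            objectFreeSlot.put(obj, slot + 1);                // busy during `slot`
        return slot;                                          // step 4: placement
    }

    public static void main(String[] args) {
        SlotScheduler s = new SlotScheduler();
        // Two events writing the same object land in consecutive slots;
        // an event on an independent object can share the first slot.
        System.out.println(s.schedule(0, "X")); // 0
        System.out.println(s.schedule(0, "X")); // 1
        System.out.println(s.schedule(0, "Y")); // 0
    }
}
```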
[Figure 4.2 diagram: the supervisor gets the event and parameters from the event
group, checks when the event should be scheduled according to the group, finds out
when the requested access is available for all parameters, places the event in a time
slot, notifies the event group, and updates the schedule for all parameters]
Figure 4.2 Scheduling In A Hydra Runtime
It should be noted that the traditional method of locking objects is not used
in Hydra. Instead events are scheduled to have access to the locked objects at a
particular time. Since events will always execute at that particular time, there is
no need to actually lock the objects using the traditional method. For example:
When two events are scheduled that wish to write to the same object, the fol-
lowing will happen. The first event will execute. After it is completed and only
after it has been completed, will the second event be allowed to execute. This is
implemented by the supervisor placing the two events in separate time slots.
4.3 Execution of Events
The Hydra runtime executes each time slot one at a time and in order. Events
within a given time slot can execute concurrently. When a time slot is executed,
it is split into parts. Each part is then handed to a different thread. That thread
is then responsible for executing all the events in its part. After an entire time
slot is executed, the supervisor moves on to the next time slot. When all time
slots have been executed, the system shuts down.
As each event executes, the requests to schedule event groups are placed in
queues. There are three queues per thread: one that is currently being added to,
one that is assigned to the supervisor, and finally an inactive queue. The thread
specific queues allow the thread to safely manipulate the queues without having
to lock them. Every so often a running thread attempts to switch its queue with
the inactive one. If it finds that the inactive queue already has event groups in it,
then it will force the supervisor to execute. Upon execution, the supervisor will
swap its queue for that thread with the inactive one. It will then empty the queue
and attempt to schedule the events in the event groups.
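The three-queue rotation can be sketched as follows. This is a hypothetical model of the mechanism, not the Hydra source: one atomic slot holds the inactive queue, the worker thread swaps its filled queue into that slot without locking, and the supervisor later swaps the slot's contents out to drain them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicReference;

public class TripleQueue {
    // The "inactive" queue lives in an atomic slot; the worker thread and the
    // supervisor each own one of the other two queues at any moment.
    final AtomicReference<Deque<String>> inactive =
            new AtomicReference<>(new ArrayDeque<>());
    Deque<String> threadQueue = new ArrayDeque<>();     // being added to
    Deque<String> supervisorQueue = new ArrayDeque<>(); // being drained

    /**
     * Worker thread: hand its filled queue to the inactive slot.
     * Returns false if the inactive queue still holds unscheduled groups,
     * which is the signal to force the supervisor to run.
     */
    boolean trySwapThreadQueue() {
        if (!inactive.get().isEmpty()) return false;    // supervisor is behind
        threadQueue = inactive.getAndSet(threadQueue);  // lock-free exchange
        return true;
    }

    /** Supervisor: take over the inactive queue and drain ("schedule") it. */
    int drainAsSupervisor() {
        supervisorQueue = inactive.getAndSet(supervisorQueue);
        int scheduled = supervisorQueue.size();
        supervisorQueue.clear();
        return scheduled;
    }

    public static void main(String[] args) {
        TripleQueue q = new TripleQueue();
        q.threadQueue.add("group-1");
        System.out.println(q.trySwapThreadQueue());  // true
        System.out.println(q.drainAsSupervisor());   // 1
    }
}
```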
4.4 System Maintenance
Whenever a thread has no work to do it performs system maintenance. There
are two primary types of tasks it can perform:
Supervisor Functions The thread can switch to supervisor mode, sched-
ule, and assign events for execution.
Miscellaneous Maintenance This covers all non-essential tasks the run-
time can perform. This includes tasks such as garbage collection, producing
more objects for object pools, saving buffered information, etc.
The idea is to try to keep the threads doing work even when there is noth-
ing they have to do. For example, when only one thread is executing the actual
program, a second thread could schedule the events, so that the program thread
doesn't need to wait for the events to be scheduled. A third thread could add
objects to the object pools to make object creation faster for the program and
supervisor threads. A fourth thread could be busy disposing of unneeded objects.
A fifth can be busy flushing buffers.
Every time a thread completes some maintenance for the system, it checks to
see if more work for the actual program is available. In addition, all system main-
tenance functions are meant to be quick so that if real work arises, the threads
can quickly begin executing the rest of the program. Ultimately, if everything
that can be done has been done, then the threads which have no work will go
to sleep for a little while. In the current version of the SMP Hydra runtime the
amount of time a thread spends asleep is proportional to how long the thread
has been without work up to a limit of 16 ms.
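A proportional-with-cap sleep policy like the one just described can be sketched as below. The thesis only states that the sleep is proportional to idle time with a 16 ms cap; the exact growth rule here (doubling per idle round) is an assumption for illustration.

```java
public class IdleBackoff {
    static final long MAX_SLEEP_MS = 16;

    /**
     * Sleep time grows with the number of consecutive idle rounds, capped
     * at 16 ms. The doubling rule is an assumed concrete instance of
     * "proportional to how long the thread has been without work".
     */
    public static long sleepMillis(int idleRounds) {
        return Math.min(MAX_SLEEP_MS, 1L << Math.min(idleRounds, 4));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 6; i++)
            System.out.print(sleepMillis(i) + " "); // 1 2 4 8 16 16
        System.out.println();
    }
}
```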
4.5 The Supervisor
The Supervisor handles all critical functions, besides compilation, for the Hy-
dra runtime. The Supervisor is generically defined only by the tasks it performs
and not by a particular thread or section(s) of code. As of this version of Hydra,
the Supervisor is responsible for the following:
Determines which time slot events are to be scheduled for.
Determines which time slot should be executed now.
Determines which threads should execute which events.
Assigns events to threads.
Wakes up threads, if necessary.
For each Hydra runtime, the Supervisor is allowed to function differently, which
is why there is no concrete definition of how it works. However, for the runtime
that was developed first, the following properties apply to the Supervisor:
The Supervisor is a specially designated thread.
The Supervisor thread can change as the runtime is running.
There is not always a Supervisor thread. This means that the Supervisor is
not always executing.
The Supervisor thread is whichever thread claims to be the Supervisor first.
The Supervisor thread can relinquish its role.
A thread attempts to become the Supervisor when it is unable to
continue executing events.
The Supervisor thread communicates with all non-Supervisor threads through
special array-based buffers. Once the Supervisor thread relinquishes its Supervisor
status, it will also be a non-Supervisor thread; as a result, the Supervisor thread
sends messages to itself this way as well. Each non-Supervisor thread has its own
set of buffers. The buffers are as follows:
Event groups to be scheduled Any time a new event group needs to be
scheduled, it is placed in this buffer. When this buffer is full, the currently
executing thread will attempt to become the Supervisor.
Events to be executed Any time an event needs to be executed, it will
be placed in this buffer. When this buffer is empty, the currently executing
thread will attempt to become the Supervisor.
4.6 Locks
Internally the Hydra runtime has another classification system for locks in ad-
dition to what is specified by the language. All locks have one of the locking types
described in section 2.4, as well as one of the following:
Native A native Hydra lock has all of the Hydra data structures necessary
for locking right in the lock. These locks have Hydra.Runtime.Object as a
superclass.
Non Native A non-native lock does not have the Hydra data structures
necessary for locking in the lock. Instead, the Hydra runtime needs to
reference a separate mapping data structure and look up the Hydra data
structures for the lock. These locks do not have Hydra.Runtime.Object as
a superclass.
Unknown An unknown lock is one of the two types above; however,
the runtime does not know in advance which one it is. The Hydra runtime
must first determine which type of lock it is and then use the appropriate
data structures.
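The three cases can be sketched as a dispatch like the one below. This is a hypothetical illustration, not Hydra's implementation: a stand-in class plays the role of Hydra.Runtime.Object with its locking data inline, a side table keyed by object identity plays the role of the non-native mapping structure, and an `instanceof` check resolves the unknown case.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class LockLookup {
    /** Stand-in for Hydra.Runtime.Object: a native lock carries its
     *  scheduling data inline (field names here are illustrative). */
    static class NativeLock { final int[] slotData = new int[4]; }

    // Non-native locks: data lives in a side table keyed by object identity.
    static final Map<Object, int[]> sideTable = new IdentityHashMap<>();

    /** Unknown lock: decide at runtime which data structures apply. */
    static int[] dataFor(Object lock) {
        if (lock instanceof NativeLock)                 // native: fastest path
            return ((NativeLock) lock).slotData;
        return sideTable.computeIfAbsent(lock, k -> new int[4]); // non-native
    }

    public static void main(String[] args) {
        NativeLock nl = new NativeLock();
        Object plain = new Object();
        System.out.println(dataFor(nl) == nl.slotData);       // true: inline data
        System.out.println(dataFor(plain) == dataFor(plain)); // true: cached in table
    }
}
```

Note that the side-table entries are what make non-native locks harder to garbage collect, as the text observes: they keep data alive outside the lock object itself.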
Out of these three categories, the native lock is the ideal lock. It has the small-
est overhead and fastest access time. The non-native type is slightly slower and
takes up a little bit more space. How much of a difference it makes is dependent
upon the map that is used and how it works. The data structures for native locks
will be automatically garbage collected, whereas the non-native locks take a bit
more work. The unknown lock is the slowest since its type needs to be determined.
These lock types are not specified by the programmer. They are determined
automatically by the compiler when the program is recompiled to work with the
Hydra runtime. At this time, all three lock types are implemented by the runtime.
However, only the non-native type is implemented by the compiler.
5. The Alchemist Compiler
The Alchemist compiler handles most of the compilation process for Hydra. The
Alchemist compiler is a generic compiler for converting Java byte code from one
form to another; in other words, it recompiles the code.
The primary method of recompiling code in Alchemist is to use definitions and
transformations. Definitions define what constructs are contained in a language
or what classes and members are in a class library. Transformations are used to
specify what the rules are for transforming the Java byte code.
Plugins are used to specify how the compiler is to do the recompiling and
contain all the details about the language or class library. Custom code for re-
compiling can also be specified in the plugins.
5.1 Compiler Stages
The Alchemist compiler is broken up into seven stages. The first stage loads the
data. The last stage gets rid of unnecessary data. All stages in between process
the data in some way and pass it on to the next stage. Not all of these stages
need to be used.
Get Prerequisites Determines if anything else needs to be loaded before
this particular class.
Pull Apart Loads the Java byte code into the compiler's data structures.
The byte code is read in using the ASM 2.0 toolkit[24].
Analysis Analyzes the code to determine what changes should be made.
For example, it determines which methods need to be converted into events.
Transform Changes the code from its original form to run on a particular
Hydra runtime. For example, it converts methods that should be events
into events by creating new classes for the event and replacing the methods
so that invocation results in scheduling the event rather than executing the
method.
Compile Produces new byte code from the resulting transformed code.
This byte code is produced using the ASM 2.0 toolkit[24].
Output Outputs the resulting byte code.
Clean Up Disposes of all unnecessary data.
5.2 Plugins
There are three types of plugins for the Alchemist compiler. The first type is
the compiler plugins, which are used to specify different types of compilers. The
second type is used to specify different languages. The third type is used to spec-
ify different runtime systems.
Without these plug-ins, the Alchemist compiler does not know what language
it is to convert from, what it is to compile to, or even when it should compile
code. These plug-ins convert the Alchemist compiler from a generic compiler to
a specialized compiler.
5.2.1 Compiler Plugins
The core components of the Alchemist compiler are fairly generic and can compile
class files at any time. Compiler plugins are used to specify what type of compiler
is desired. Three separate plugins are specified, though only one was completely
implemented and tested. The three different plugins are as follows:
Class Loader This plugin compiles classes as they are being loaded by
the Java runtime. It replaces the class loader that is normally used by Java
programs. Additionally, it can easily keep track of information about the
classes it loads and store it in a cache for quick access later. This is the
plugin that has been completely implemented and tested.
Class File Transformer This plugin compiles classes as they are being
loaded by the Java runtime. It does not replace the class loader that is
normally used by Java programs. Instead, it transforms class files while
another class loader is loading the files.
Compiler This plugin functions as a traditional compiler. It is invoked
from the command-line. Rather than recompiling classes as they are loaded,
it is instructed to compile specified classes in advance.
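The Class File Transformer plugin corresponds to the standard `java.lang.instrument.ClassFileTransformer` hook, which rewrites class bytes while the normal class loader loads them. The sketch below shows the shape of such a transformer; the `myapp/` package filter is a hypothetical example, and the actual rewriting (which Alchemist does with ASM) is elided.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class SelectiveTransformer implements ClassFileTransformer {
    /** Returning null tells the JVM to keep the original bytes; a real
     *  transformer would rewrite classBytes for the classes it targets. */
    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> beingRedefined, ProtectionDomain domain,
                            byte[] classBytes) {
        if (className == null || !className.startsWith("myapp/")) // hypothetical filter
            return null; // leave JDK and third-party classes untouched
        byte[] rewritten = classBytes.clone();
        // ... bytecode rewriting (e.g. with a toolkit like ASM) would go here ...
        return rewritten;
    }

    public static void main(String[] args) {
        SelectiveTransformer t = new SelectiveTransformer();
        System.out.println(t.transform(null, "java/lang/String", null, null, new byte[0]) == null);
        System.out.println(t.transform(null, "myapp/Foo", null, null, new byte[]{1}) != null);
    }
}
```

In a real deployment the transformer is registered through an agent's `Instrumentation.addTransformer`, which matches this plugin's description: the default class loader stays in place and only the bytes are changed.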
Compiler plug-ins are created by the creator of Alchemist. These plug-ins are
distributed with the Alchemist compiler and are needed by the compiler. Since
the current focus of Alchemist is on compilation at runtime, it is assumed that
the complete Alchemist package will be distributed with the runtime.
5.2.2 Language Plugins
Language plug-ins define the properties of a language, which includes the fol-
lowing:
Definitions of all language constructs
Tests that can be performed to verify that an implementation of the language
is working correctly.
Language plug-ins are created by the creator of the language. The plug-in
does not necessarily need to be distributed to programmers who use the language.
Those who create runtimes for the language do require the language plug-in.
5.2.3 Runtime Plugins
Runtime plug-ins define the properties of a runtime, which includes the following:
The plug-in for the language the runtime implements.
Definitions of the interface for the runtime.
Transformations from the language to the runtime.
Custom compiler classes to replace or extend some of Alchemist's.
Database information, so that Alchemist can cache compiled classes in a
database.
Shared initialization code and information for Alchemist and the runtime.
Runtime plug-ins are created by the creator of the runtime. The plug-in needs
to be distributed with the compiler. Since the current focus of Alchemist is on
compilation at runtime, it is assumed that the developer will distribute the plug-in
with the runtime. Additionally, the compiler will need the language plug-in as
well.
5.3 Definitions
Definitions are used as a form of reflection for the Java language. However, unlike
the standard reflection defined in Java, members of classes can be referenced as
easily as classes. To understand what this means, first it is necessary to under-
stand how Java handles reflection.
In Java, classes can be referenced directly by simply stating the class name
followed by .class. For example, the class EventGroup can be referenced with
EventGroup.class. However, members of classes need to be referenced with a
string or by iterating through the list of methods. For example, the method
EventGroup.GetCurrent() needs to be referenced with:
EventGroup.class.getMethod("GetCurrent").
The problems with this method are the following:
Typos can't be picked up until runtime.
If a method is changed, a reference to it can become invalid, and it won't
be noticed until runtime.
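Both problems can be demonstrated with standard Java reflection: a misspelled method name compiles cleanly and only fails when the lookup actually runs. The example uses `java.lang.String` so it is self-contained; the deliberate typo `"lenght"` is, of course, illustrative.

```java
public class ReflectionTypo {
    /** Returns true if the named method exists on String; a typo in the
     *  string compiles fine and only surfaces here, at runtime. */
    public static boolean lookupSucceeds(String methodName) {
        try {
            String.class.getMethod(methodName);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupSucceeds("length")); // true
        System.out.println(lookupSucceeds("lenght")); // false, yet it compiled
    }
}
```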
Alchemist attempts to solve this problem with definitions. Definitions are class
files that are used to represent everything in a class library. This means that they
represent not just the classes, but all the members as well. Since the definitions
are class files, they can be referenced in the same way that regular Java class
files can be referenced. However, at this time they are only good for specifying
transformations for Alchemist.
5.3.1 Alchemist Definition Generator
Definitions need to be produced before they can be used. The program that
produces definitions is called the Alchemist Definition Generator.
When used, the Definition Generator goes through the available packages and
finds the public classes in them. Once a class is found, a definition file is produced
for the class. The first definition added to the definition file is the definition for the
class. Reflection is then used to load the class and find all the members of the class.
For each member of the class, a definition is added to the definition file. Each
definition inherits from a particular subclass of Alchemist.Definitions.Definition.
For example, when used for producing the definition files for the Hydra run-
time, the Definition Generator will discover a number of packages, including the
Hydra.Runtime package. Inside this package the following classes will be found:
EventGroup In this it will find a number of members including the inner
classes: FIFO, Parallel, and Sequential.
Parameters In this it will find a number of members including the fields:
READ, READWRITE, and WRITE.
For each class, a separate definition file will be produced and each member will
be noted in that file.
This program is not only used to help produce a Hydra Runtime; it is also
useful for demonstrating the capabilities of the language due to the following prop-
erties:
It can operate with or without Hydra and demonstrates that Hydra is indeed
backwards compatible with regular Java.
It was easy to write with the Hydra language.
Its performance improves when used with a Hydra runtime. This will be
shown in section 6.7.
Making this program use the Hydra language was relatively easy. First, the
individual packages can be handled in parallel, since each is stored in a separate
directory. Additionally, all of the individual classes in a package can be handled in
parallel, since each is stored in a separate file. However, the individual members
of each class need to be handled sequentially since they are stored in the same
file. There is no data that needs to be shared between the parallel operations.
There is one method in the program that determines whether a file in a direc-
tory is a class or another directory. This method then either produces definitions
for the class or for the directory.
Thus when making the program use Hydra, all that was necessary was to
make the program use a parallel event group and to make the previously men-
tioned method an event. This event only uses pass locks, since no data is shared.
5.3.2 How Definitions Work
All definitions inherit from a subclass of Alchemist.Definitions.Definition. The
subclass it inherits from specifies what type of definition it is. For example, class
definitions inherit from Alchemist.Definitions.Definition.Class and field definitions
inherit from Alchemist.Definitions.Definition.Field. When a definition is first used
with Alchemist, an instance of the class is created and stored in a hashtable with
the definition class as the key. After the first use, the instance of the definition is
retrieved from the hashtable.
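The create-on-first-use, cache-thereafter behavior just described can be sketched with a map keyed by the definition class. This is an illustrative reconstruction, not Alchemist's code; `StringBuilder` stands in for a definition class since any class with a no-argument constructor works.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DefinitionCache {
    // One instance per definition class, keyed by the class itself.
    private static final Map<Class<?>, Object> instances = new ConcurrentHashMap<>();

    /** First use creates and caches the instance; later uses retrieve it. */
    public static Object instanceOf(Class<?> definitionClass) {
        return instances.computeIfAbsent(definitionClass, c -> {
            try {
                return c.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + c, e);
            }
        });
    }

    public static void main(String[] args) {
        Object a = instanceOf(StringBuilder.class);
        Object b = instanceOf(StringBuilder.class);
        System.out.println(a == b); // true: cached after the first use
    }
}
```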
All instances of Alchemist.Definitions.Definition store all necessary informa-
tion about the particular class or member it represents. At this time all this
information is restricted to Alchemist for internal use only. This information
includes the following:
The name
Whether the member is static or not.
Whether the member was defined by an interface.
The parameter types (Constructor and Method specific)
The return type (Method specific)
The type (Field specific)
Retention Policy (Annotation specific)
Targets (Annotation specific)
Values (Annotation specific)
5.4 Transformations
Transformations are used to describe what needs to be converted and what it
needs to be converted to. Transformations are not used to specify how the con-
version is to take place, because Alchemist decides this on its own. For example, a
transformation can be used to specify that the classes EventGroup and Presched-
uledEventGroup should be changed to the class EG. In that case, Alchemist would
convert all return types, parameter types, field types, etc. that specify EventGroup
and PrescheduledEventGroup to EG.
Transformations can be used to change:
All references to a class to refer to another class.
All references to an interface to refer to another interface.
A constructor invocation to a static field.
A constructor invocation to a static method.
A method invocation to a get field.
A method invocation to a set field.
A method invocation to a different method invocation.
A static method invocation to an invocation of an instance method.
A static method invocation to a constructor invocation.
An instance method invocation to a constructor invocation.
An instance method invocation to a static method invocation.
A get field to a method invocation.
A set field to a method invocation.
A get field to get a different field.
A set field to set a different field.
A get static field to a get instance field.
A get static field to a constructor invocation.
A get instance field to a constructor invocation.
A get instance field to a get static field.
A set static field to a set instance field.
A set instance field to a set static field.
When a transformation is specified, Alchemist determines the type of the trans-
formation. The type of the transformation is then used to help Alchemist deter-
mine what needs to be done to perform the transformation on Java byte code.
5.4.1 Defining A Transformation
A transformation is defined by simply stating what needs to be changed and what
it needs to be changed to. In pseudo code, a transformation would be defined like
so:
EventGroup->EG
In that case we would be defining that all references to the class EventGroup be
changed to the class EG. The way we do this with Java code is like so:
Transform.FromTo(Definitions.EventGroup.class, Definitions.EG.class);
In that case, we would have all Alchemist Definition files stored in the Definitions
package. Definitions.EventGroup.class would refer to the class that contains the
definition for EventGroup and Definitions.EG.class would refer to the class that
contains the definition for EG.
The transformation does not need to be stored or manipulated by the pro-
grammer. After definition, Alchemist stores the transformation in its internal
data structures and takes care of everything after that.
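To make the registry idea concrete, a transformation store can be pictured as a simple map from source definition to target definition. The sketch below is purely illustrative; Alchemist's actual internal data structures are not described here, and the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a transformation registry; Alchemist's real
// internal data structures are not shown in the text.
public class TransformRegistry {
    private final Map<String, String> classMap = new HashMap<>();

    // Analogous in spirit to Transform.FromTo(...): record that all
    // references to one definition become references to another.
    public void fromTo(Class<?> from, Class<?> to) {
        classMap.put(from.getSimpleName(), to.getSimpleName());
    }

    // During compilation, look up the replacement for a referenced name;
    // an unmapped name passes through unchanged.
    public String resolve(String className) {
        return classMap.getOrDefault(className, className);
    }
}
```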
5.4.2 How Transformations Work
When Alchemist is compiling, it uses the transformations to determine what it
should do. Since there are many possible transformations, Alchemist simplifies
the process by breaking all possible transformations into a number of different
categories. These categories define most of what needs to be done for a transfor-
mation. The transformation categories are as follows:
Not Supported These transformations are not supported.
Custom Custom transformations specified by the programmer.
Change From Constructor Removes the extra instruction(s) associated
with using a constructor and changes the constructor invocation.
Change To Constructor Adds the extra instruction(s) associated with
using a constructor and changes an instruction to the constructor invocation.
Constructor To Constructor Converts one constructor to another constructor.
Instruction Transform Changes the type of the instruction used to
access a member. For example, from an invocation to a get field.
Name Transform Changes the names of classes and members that are
specified in instructions.
Then during the transform stage for the compiler, the byte code is modified
using three separate substages. The classes are sent through a class transform,
which checks all the classes and interfaces the class inherits from, uses for a signa-
ture, and uses for field types. Then the class transform sends the methods through
a method transform. The method transform changes the return and parameter
types for methods. Then the method transform sends the individual instructions
through a method builder filter, which is responsible for changing them. This
filter determines the type of each instruction and looks up the transformation
that applies to it. If a transformation is found, then the transformation is
performed and the resulting instructions are sent on to the next method builder.
If no transformation is found, then the instruction is sent unchanged on to the
next method builder.
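The per-instruction step can be pictured as a lookup-then-dispatch loop. The following sketch only illustrates that idea; the Transformation interface, the string-keyed instruction representation, and the instruction names used in the test are hypothetical stand-ins, not Alchemist's actual filter API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the method builder filter stage: each instruction
// is matched against the registered transformations; matched instructions
// are replaced, unmatched ones pass through unchanged. The Transformation
// interface and string-keyed instructions are hypothetical stand-ins.
public class MethodBuilderFilter {
    public interface Transformation {
        List<String> apply(String instruction); // may emit several instructions
    }

    private final Map<String, Transformation> byTarget;

    public MethodBuilderFilter(Map<String, Transformation> byTarget) {
        this.byTarget = byTarget;
    }

    // Send every instruction on to the "next method builder", transformed
    // where a transformation applies.
    public List<String> filter(List<String> instructions) {
        List<String> out = new ArrayList<>();
        for (String insn : instructions) {
            Transformation t = byTarget.get(insn); // look up the transformation
            if (t != null) {
                out.addAll(t.apply(insn));         // transformation found
            } else {
                out.add(insn);                     // pass through unchanged
            }
        }
        return out;
    }
}
```

A transformation may expand one instruction into several, which is why `apply` returns a list; instructions with no registered transformation pass through untouched, as described above.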
6. Benchmarks
To demonstrate that the system works and has achieved good performance, the
results from a number of benchmarks will be presented in the following sections.
These results will be compared with the results from a number of different paral-
lel frameworks that work with Java. Each benchmark was programmed for and
executed with the following:
Regular non-extended sequential Java
Hydra
JACE
javab
javar
JOMP
Java Thread Pool
Java Threads
As noted in section 1.2, javab was unable to successfully recompile any of the
tests. As a result, its performance was the same as sequential Java.
The benchmarks were executed on a number of different machines, which shall
be presented shortly. To help develop and execute the benchmarks a framework
(BAPS) was created, which shall also be presented shortly.
Each benchmark will be presented along with the results from each one. The
benchmarks are as follows:
Overhead
Merge Sort
Matrix Matrix Multiply
Spawn Worker Tasks
After the discussion of benchmarks, performance results will be presented for
the Alchemist Definition Generator. Lastly, an add-on for Hydra, Hydra Stay
Resident (HSR), which improves the start time of the Hydra runtime, will be
presented.
6.1 Test Machines
Three test machines were used for the benchmarks. Each has slightly differ-
ent parallel capabilities from the others. They will be presented in the order of
how much of an improvement we can expect to see for parallel programs over a
sequential program.
6.1.1 Desktop
The desktop computer is an HP Pavilion a1450n. This computer has a dual-core
processor and is able to execute two threads in parallel, one on each core.
The maximum amount of possible improvement is that a parallel program will
take half the amount of time as a sequential version of the same program.
This computer has 2.00 GB of RAM and is running a 32-bit operating system,
Microsoft Windows XP Media Center 2005.
This computer is like many other desktops and laptops being sold today.
6.1.2 Workstation
The workstation computer is a Dell Precision 670 with dual Intel Xeon 2.8 GHz
processors. This computer has two processors, each of which can execute two
threads using Intel's Hyper-Threading technology. This allows four threads to
be executed in parallel; however, it is not as good as a four-processor or
four-core computer. The maximum amount of possible improvement is that a
parallel program will take a quarter of the amount of time as a sequential
version of the same program. However, we won't see this because it is not a
true four-core or four-processor computer.
The workstation has 2.00 GB of RAM and is running a 64-bit operating sys-
tem, Microsoft Windows XP Professional x64 Edition.
6.1.3 Server
The server is a Sun Enterprise 450 Server with four 400 MHz Sparc processors;
this computer can execute four threads in parallel. The maximum amount of pos-
sible improvement is that a parallel program will take a quarter of the amount of
time as a sequential version of the same program.
This computer has 2.00 GB of RAM and is running a 32-bit operating system,
Solaris.
6.2 Hydra's Benchmarking Framework BAPS
A benchmark framework was specifically created for this project. The objective
was to create a framework that could properly benchmark parallel programs with
Hydra, other parallel extensions to Java, and non-extended Java. This project
also needed to provide the ability to handle command-line arguments consistently
among benchmarks so that different options can be specified for benchmarks. The
end result was the BAPS (Benchmark and Profiling System) framework.
One problem that can occur with benchmarks is that there can be variance
in their execution time. Two causes of this are that Java loads classes and
compiles them while the program is running [25]. BAPS can account for this by
allowing the user to set the conditions for when the benchmark should terminate
by setting the MIN, MAX, and WITHIN command-line arguments, which are
described later in this section.
BAPS offers a number of different features including:
Benchmarks Benchmarks can be created which execute within other
programs.
Benchmark Programs These are stand-alone benchmarks and are able
to parse command-line arguments and constants.
Benchmark Sets These can contain a number of different benchmarks,
which are desirable to execute as a set.
Measurers Used for measuring the amount of time or memory that is
used by a benchmark.
Recorders Used for producing measurers that have consistent options
specified.
With BAPS, different command-line arguments can be specified for bench-
marks including:
ARGFILE A file that contains additional command-line arguments.
HELP Display help for BAPS and a particular benchmark.
MAX Set the maximum number of iterations for the benchmark.
MIN Set the minimum number of iterations for the benchmark.
NOGC No garbage collection between benchmark iterations.
SPREADSHEET Outputs results in a spreadsheet/table-like format.
WITHIN The desired difference between averages for the benchmark.
Custom Arguments Custom arguments can be defined for benchmarks.
With BAPS, different command-line constants can be specified for benchmarks
including:
PROCESSORS The total number of physical or virtual processors in
the computer.
Custom Constants Custom constants can be defined for benchmarks.
6.3 Overhead Test
The overhead test checks to see how much overhead there is for executing code
in parallel. Each parallel language extension has its own method of producing
parallel code; however, some of them share a common algorithm for their tests.
Additionally, some have multiple functions that need to be tested for overhead,
although there is something they all have in common: they all need to split up
the work so that it can execute in parallel. Each algorithm will produce a set
number of threads (or events, or whatever the parallel language extension calls
its equivalent for doing parallel work); this number will be equal to the number
of threads that can execute in parallel on the computer. Then all the threads
are joined together, and the time to perform this operation is divided by the
number of threads to produce the overhead for each thread.
The algorithms for the overhead tests will be presented in sections 6.3.1
through 6.3.5. The results will be presented and discussed in section 6.3.6.
6.3.1 Overhead for Threads Algorithm
When testing for the overhead of creating and joining a set number of threads,
there is no need to worry about the first thread because it is already created and
it is automatically disposed of when there is no longer a need for it. So all that
needs to be checked are the other threads. As a result 1 is subtracted from the
number of threads that need to be created. That number of threads is created
and instructed to perform whatever operation needs to be done. Then the original
thread does its work. In this case there is no work for any of the threads to do.
Then all of the threads are joined together.
It is possible to have all the threads help with the creation of the remaining
threads after they begin, but that complicates the algorithm and is unlikely to
help, since it takes a short period of time for each thread to start doing its work.
In addition, some form of synchronization would have to be used to make it ef-
ficient and this test will not use any synchronization, so the original thread will
create all the additional threads.
The algorithm is as follows:
TestForThreads(Number):
FOR I = 1 TO Number - 1
Threads[I] = CreateNewThread(WorkToDo)
NEXT I
WorkToDo()
FOR I = 1 TO Number - 1
Threads[I].Join()
NEXT I
This algorithm applies to all thread-based systems (Java Threads, Java Thread
Pools, and JACE).
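The thread-based version of this test can be written out in plain Java. The sketch below follows the pseudocode above; since WorkToDo is empty in the overhead test, a counter is used here (an addition, not part of the benchmark) only to verify that every thread actually ran:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Java sketch of the TestForThreads pseudocode above. WorkToDo is empty in
// the overhead test; the counter is an addition used here only to verify
// that every thread ran.
public class ThreadOverheadTest {
    public static final AtomicInteger ran = new AtomicInteger();

    static void workToDo() { ran.incrementAndGet(); }

    public static long testForThreads(int number) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] threads = new Thread[number - 1];   // original thread is free
        for (int i = 0; i < number - 1; i++) {
            threads[i] = new Thread(ThreadOverheadTest::workToDo);
            threads[i].start();
        }
        workToDo();                                  // original thread's share
        for (Thread t : threads) t.join();
        return (System.nanoTime() - start) / number; // overhead per thread (ns)
    }

    public static void main(String[] args) throws InterruptedException {
        int n = Math.max(Runtime.getRuntime().availableProcessors(), 2);
        System.out.println("overhead per thread (ns): " + testForThreads(n));
    }
}
```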
6.3.2 Overhead for Parallel Loops Algorithm
When testing parallel loops, all that needs to be done is to specify a loop, have it
do some work, and have the loop execute a set number of times. The work in the
loop should be parallelized automatically.
The algorithm is as follows:
TestForLoops(Number):
FOR I = 0 TO Number - 1
WorkToDo()
NEXT I
This algorithm applies to javab, javar, and JOMP.
6.3.3 Overhead for Parallel Invocations Algorithm
When testing parallel invocations, all that needs to be done is to have a recursive
function split itself into two separate threads. One thread executes the work and
the next thread continues its recursion until all threads have been created, and
then it does its work.
The algorithm is as follows:
TestForInvocations(Number):
Split(Number)
Split(Number):
IF Number > 1 THEN
Split(Number - 1)
WorkToDo()
ELSE
WorkToDo()
This algorithm applies to only javar.
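Although javar generates its own code from annotations, the recursive-split idea itself can be illustrated with plain Java threads; this is only a sketch of the control structure, not javar's output, and the counter is added here solely to verify that every unit of work ran:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the recursive split with plain Java threads (javar generates
// its own code; this only illustrates the control structure).
public class RecursiveSplit {
    public static final AtomicInteger ran = new AtomicInteger();

    static void workToDo() { ran.incrementAndGet(); }

    public static void split(int number) throws InterruptedException {
        if (number > 1) {
            // one thread continues the recursion...
            Thread rest = new Thread(() -> {
                try {
                    split(number - 1);
                } catch (InterruptedException ignored) {
                }
            });
            rest.start();
            workToDo();   // ...while this one does its work
            rest.join();  // threads rejoin where they were created
        } else {
            workToDo();
        }
    }
}
```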
6.3.4 Overhead for Parallel Sections Algorithm
When testing parallel sections for JOMP, it is necessary to note that JOMP can't
have one set of parallel sections within another set. So it is necessary to create all
parallel sections right away. The algorithm first checks to see how many parallel
sections are necessary and then it produces that number of parallel sections.
The algorithm is as follows:
TestForParallelSections(Number):
IF Number == 1 THEN
PARALLEL SECTIONS
SECTION
WorkToDo()
END SECTION
END PARALLEL SECTIONS
ELSE IF Number == 2 THEN
PARALLEL SECTIONS
SECTION
WorkToDo()
END SECTION
SECTION
WorkToDo()
END SECTION
END PARALLEL SECTIONS
ELSE IF Number == 3 THEN
PARALLEL SECTIONS
SECTION
WorkToDo()
END SECTION
SECTION
WorkToDo()
END SECTION
SECTION
WorkToDo()
END SECTION
END PARALLEL SECTIONS
And so on...
This algorithm applies to only JOMP.
6.3.5 Overhead for Hydra Algorithm
For the Hydra version of the algorithm, a set number of events need to be sched-
uled and placed in the current event group. Hydra will then execute the events
on as many separate threads as it can and then will automatically join all the
threads together at the end of the test.
There is no need to specify what the current event group is because the code
outside of the algorithm will determine this. Separate benchmarks will then be
produced for all of the different event group types, each of them will execute this
algorithm, but with different event group types.
The algorithm is as follows:
TestForHydra(Number):
FOR I = 0 TO Number - 1
SCHEDULE WorkToDo()
NEXT I
This algorithm only applies to Hydra. The source code for the algorithm is
presented in section B.1.
6.3.6 Results
For this test, the smaller the overhead the better. From the results shown in
figure 6.1 and table 6.1, it can be seen that Hydra is among the lowest in over-
head for performing parallel work.
[Figure: bar chart of per-thread overhead in microseconds on the Desktop, Workstation, and Server for Hydra FIFO, Hydra Parallel, Hydra Prescheduled, Hydra Sequential, JACE, javar par, javar par inv, JOMP for, JOMP sections, Java Thread Pool, and Java Threads]
Figure 6.1 Overhead for Starting Parallel Work
Table 6.1 Time for Overhead Test

Test                Desktop Time(μs)  Workstation Time(μs)  Server Time(μs)
Hydra FIFO          36.75             47.06                 227.49
Hydra Parallel      58.95             35.24                 289.51
Hydra Prescheduled  58.71             34.29                 295.89
Hydra Sequential    33.11             48.66                 222.23
JACE                192.51            170.82                811.13
javar par           499.84            220.28                1032.73
javar par inv       189.2             129.71                64.25
JOMP for            15.42             14.06                 629.78
JOMP sections       16.75             55.32                 64.89
Java Thread Pool    49.25             21.46                 208.08
Java Threads        189.86            168.48                798.94
6.4 Merge Sort Test
Merge sort was chosen as a benchmark for the following reasons:
It is a simple sort that is easily understood and as a result it is likely to be
used.
It is an example of a recursive algorithm.
It divides the work up evenly and as a result it produces a fairly clean
parallel algorithm.
Not all the details of merge sort will be covered: only the basic idea and
how it works in the algorithms used for the benchmarks.
The basic idea is that merge sort splits the array into two halves and recur-
sively merge sorts the two halves. After successfully completing the merge sort
on the two halves, it then merges the two. It starts out sorting elements
0 through N-1. The merge sort then splits the array into two parts and merge
sorts 0 through N/2 and N/2+1 through N-1 separately, and so on until it reaches
the individual elements in the array, each of which is already sorted. It then re-
turns from each recursive call one by one, merging the individually sorted pieces
together until it has successfully merged the first two parts. At that point, the
entire array is successfully sorted.
Merge sort is a very simple algorithm to parallelize. For javar, all that needs
to be done is to specify that the recursion is a parallel invocation and it is done.
For the others it is a little more complicated, but not by much. It is important
to note at this point that there is overhead for executing parallel code. Many
parallel language extensions can't deal with this overhead automatically with
merge sort and as a result need a little bit of help. This help comes in the
form of only parallelizing part of the merge sort; the rest will then be done
sequentially.
For thread-based systems, it is desirable to have the exact number of threads
that can be executed in parallel. For JOMP, the merge sort can only be split into
two parallel parts, because it doesn't support nested parallel sections. For Hydra,
there is some overhead with every event that is scheduled, but the more events
that are produced the more the work can be split up between different threads.
Thus with Hydra, it is desirable to find a balance.
There are two separate ways to determine when to stop creating threads or
events, but both are very similar. One is to keep track of how many parallel
invocations have been made with an atomic integer. Another is to check the size
of the array to sort against some other value, G, that specifies when the desired
size has been reached.
For this benchmark, an array of 500,000 integers will be sorted. G will be
defined for Hydra as 50,000.
After the basic algorithms are presented, one problem with parallel merge sort
will be presented and then the results will be presented.
6.4.1 Parallel Merge Sort With A Set Number Of Threads
The first possibility that will be covered is keeping track of the number of splits
with an atomic integer. For this merge sort the number of threads (could be
events, parallel sections, parallel invocations, etc...) will be stored in NumberOfTh-
reads. This variable needs to be an integer that supports an atomic increment.
Every time the parallel merge sort splits up some work, NumberOfThreads will
be incremented by 1. When this variable has reached a preset value, Maximum,
then the splitting will stop and a sequential merge sort will begin.
SpawnAllThreads(A, B):
IF A = B THEN
RETURN
Middle = (A + B) / 2
IF NumberOfThreads.Increment() >= Maximum THEN
MergeSort(A, Middle, B)
ELSE
NewThread = CreateNewThread(SpawnAllThreads, A, Middle)
SpawnAllThreads(Middle, B)
NewThread.Join()
ENDIF
It starts with just thread 1. Thread 1 creates thread 2 and then executes a
recursive call to merge sort. At that point it then creates thread 3 and executes
another recursive call to merge sort. This keeps up until all threads are created.
Meanwhile, each of the child threads will be creating new threads as well. Thread
2 creates thread 4 and then executes a recursive call to merge sort. It is important
to note that the threads will join together again at the same point in the recursive
algorithm at which they were created.
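Filled out in Java, the counting variant looks roughly like this. It is a sketch following the pseudocode, not the benchmark's actual source; the merge after the join, which the pseudocode leaves implicit, is shown explicitly:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of parallel merge sort limited by an atomic counter, following the
// SpawnAllThreads pseudocode; not the benchmark's actual source.
public class ParallelMergeSort {
    static final AtomicInteger numberOfThreads = new AtomicInteger(1);
    public static int maximum = Runtime.getRuntime().availableProcessors();

    public static void sort(int[] a) throws InterruptedException {
        spawnAllThreads(a, new int[a.length], 0, a.length - 1);
    }

    static void spawnAllThreads(int[] a, int[] tmp, int lo, int hi)
            throws InterruptedException {
        if (lo >= hi) return;
        int mid = (lo + hi) / 2;
        if (numberOfThreads.incrementAndGet() >= maximum) {
            mergeSort(a, tmp, lo, hi);                 // sequential from here on
        } else {
            Thread t = new Thread(() -> {              // spawn one half...
                try { spawnAllThreads(a, tmp, lo, mid); }
                catch (InterruptedException ignored) { }
            });
            t.start();
            spawnAllThreads(a, tmp, mid + 1, hi);      // ...recurse on the other
            t.join();                                  // join where created
            merge(a, tmp, lo, mid, hi);
        }
    }

    static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
        if (lo >= hi) return;
        int mid = (lo + hi) / 2;
        mergeSort(a, tmp, lo, mid);
        mergeSort(a, tmp, mid + 1, hi);
        merge(a, tmp, lo, mid, hi);
    }

    // Standard merge of two sorted halves a[lo..mid] and a[mid+1..hi].
    static void merge(int[] a, int[] tmp, int lo, int mid, int hi) {
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= mid) tmp[k++] = a[i++];
        while (j <= hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo + 1);
    }
}
```

Note that the counter is consumed as splits occur, so this sketch is one-shot; reusing it for a second sort would require resetting `numberOfThreads`.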
6.4.2 Parallel Merge Sort With A Minimum Array Size
The second possibility uses a set minimum array size. For this merge sort the
size of the array to be sorted will be compared with the minimum size of an array,
Minimum. If the array is larger, then the array will be split and the sort will
be parallelized. The parallel sorts will recursively invoke this function until the
arrays are smaller than Minimum. At that point, the sort will be sequential.
This is the version of the sort that is used with Hydra. For Hydra, a presched-
uled event group will be used so that time slots can be assigned to each event.
FORWARD and BACK are used to change the time slots that events are being
placed into. When BACK is specified, the previous timeslot is being added to.
When FORWARD is specified, the next timeslot is being added to. Every time
ScheduleMergeSort recursively calls itself, BACK is specified. When each recursive
call ends, FORWARD is specified. This is done to make the last events to be
scheduled (sorting the smallest arrays) execute first and the first event to be
scheduled (merging the last two arrays) execute last.
ScheduleMergeSort(Array):
CREATE NEW Prescheduled Event Group
ScheduleMergeSort
ScheduleMergeSort(A, B):
BACK
IF A = B THEN
RETURN
Difference = B - A
Middle = (A + B) / 2
IF Difference >= Minimum THEN
ScheduleMergeSort(A, Middle);
ScheduleMergeSort(Middle + 1, B);
SCHEDULE Merge(A, Middle, B)
ELSE
SCHEDULE MergeSort(A, Middle, B)
ENDIF
FORWARD
The scheduling of the events that perform the sort in parallel is done in a
depth-first manner. Recursion is used to split the array into sub-arrays.
The first set of events that are executed are ordinary recursive merge sorts
that work on only a part of the larger array. These parts are of size Minimum or
smaller. After completing the first set of events, the following events are all merges
and they steadily merge together the previous results until at last the entire array
is sorted. Events in the same time slot may all be executed independently and as
a result may be executed in separate threads. Hydra will exploit this to the best
of its and the hardware's ability.
The source code for the algorithm is presented in section B.2.
6.4.3 Problem With Parallel Merge Sort and Threads
Now there is one important problem to note. It takes time for a thread to start
up and there is no synchronization with the original algorithm. As a result it is
possible for the actual spawning of threads to become skewed. For example, if
it takes too long for thread 2 to start execution, then thread 4 will actually be
created by thread 1 after it creates thread 3. If the limit for the number of threads
is in fact 4, then the work will not quite be evenly divided between the threads. In
order to ensure that work is evenly divided between the threads a small amount of
synchronization needs to be added to the algorithm. After creating a thread, the
creating thread should wait until the created thread starts its work. The modified
algorithm is below:
SpawnAllThreads(A, B):
NOTIFY CREATOR
IF A = B THEN
RETURN
Middle = (A + B) / 2
IF NumberOfThreads.Increment() >= Maximum THEN
MergeSort(A, Middle, B)
ELSE
NewThread = CreateNewThread(SpawnAllThreads, A, Middle)
WAIT FOR NewThread TO START
SpawnAllThreads(Middle, B)
NewThread.Join()
ENDIF
The WAIT FOR and NOTIFY can be implemented in a number of ways, including
with a barrier that is created by the creator and passed to the created thread,
which both threads then wait at, or with message passing.
Now it may not seem like that big of a difference, but as shall be seen from the
benchmark results, the difference is fairly significant.
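One of the several possible implementations of the WAIT FOR / NOTIFY pair in Java is a CountDownLatch created by the parent and released as the first action of the child; this is a sketch of that option, not the benchmark's exact mechanism:

```java
import java.util.concurrent.CountDownLatch;

// Sketch: the creating thread blocks until the created thread has actually
// started, implementing the WAIT FOR / NOTIFY pair with a CountDownLatch.
public class StartBarrier {
    public static Thread startAndWait(Runnable work) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            started.countDown(); // NOTIFY CREATOR
            work.run();
        });
        t.start();
        started.await();         // WAIT FOR NewThread TO START
        return t;
    }
}
```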
6.4.4 Results
For this benchmark, shorter times are better and the faster it is in comparison to
the sequential merge sort, the better. From the results in figure 6.2 and table 6.2
below, it can be seen that Hydra does well in comparison to the others.
It can also be seen that generally using synchronization with threads can
greatly improve the results. For example, with a Java Thread Pool on the server,
this test takes 459.22 ms without synchronization, but with synchronization it
takes only 302.41 ms.
[Figure: bar chart of times faster than sequential merge sort on the Desktop, Workstation, and Server for Hydra, JACE, javar, JOMP, Java Thread Pool, and Java Threads, with and without synchronization]
Figure 6.2 Times Faster Than Sequential Merge Sort
Table 6.2 Time and Times Faster for Merge Sort

Test              Desktop           Workstation       Server
                  Time(ms)  Faster  Time(ms)  Faster  Time(ms)  Faster
Sequential        152.04    1.00x   155.66    1.00x   865.20    1.00x
Hydra             92.42     1.65x   72.82     2.14x   305.06    2.84x
JACE              90.18     1.69x   100.07    1.56x   304.19    2.84x
  w/Sync          76.83     1.98x   73.84     2.11x   295.52    2.93x
javar             82.35     1.85x   70.97     2.19x   279.63    3.09x
JOMP              79.17     1.92x   123.29    1.26x   458.81    1.89x
Java Thread Pool  89.01     1.71x   99.65     1.56x   459.22    1.88x
  w/Sync          84.26     1.80x   74.98     2.08x   302.41    2.86x
Java Threads      76.30     1.99x   95.11     1.64x   298.06    2.90x
  w/Sync          79.80     1.99x   73.59     2.12x   293.68    2.95x
6.5 Matrix Matrix Multiplication Test
Matrix matrix multiplication was chosen as a benchmark for the following reasons:
Manipulation of matrices is a popular application of parallel programming.
It is an example of an algorithm which is easy to parallelize, but is not
recursive, unlike the previous benchmark.
Not all the details of matrix matrix multiplication will be covered: only the
basic idea and how it works in the algorithms used for the benchmarks.
The basic idea is that when multiplying two matrices, the rows are taken
from one and matched up with the columns of the other. Then the elements from
each row and column are multiplied together, and the results for each pairing
of elements are added together to produce one element in the resulting matrix.
The following is an actual algorithm for performing this task sequentially:
FOR J = 0 TO A.Height
FOR I = 0 TO A.Width
C.Values[I,J] = 0
FOR K = 0 TO A.Width
C.Values[I,J] = C.Values[I,J] + A.Values[K,J] * B.Values[I,K]
Now this algorithm is easy enough to parallelize. All that needs to be done
is to divide this up into row and column pairs and assign each pair to a different
thread or event and the result will be a parallel algorithm. However, as stated by
Jordan and Alaghband it can be advantageous to divide the matrices into square
submatrices [26]. This is primarily an issue if multiplying matrices together is
just one part of a larger computation. For example matrices could be added or
subtracted as well. It is also useful in a distributed environment. However, this
shall be included for the sake of completeness since it is a different method of
parallelizing matrix matrix multiply.
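As a minimal illustration of the row-based decomposition, each thread can be assigned a contiguous band of rows of the result. This simplified sketch computes C = A * B directly; the benchmarks use a generalized row/column/submatrix framework with copied parts, not this code:

```java
// Simplified sketch of row-wise parallel matrix multiply: each thread
// computes a contiguous band of rows of C = A * B. The benchmarks use a
// generalized row/column/submatrix framework with copied parts, not this.
public class RowParallelMultiply {
    public static double[][] multiply(double[][] a, double[][] b, int threads)
            throws InterruptedException {
        int n = a.length, m = b[0].length, inner = b.length;
        double[][] c = new double[n][m];
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int lo = t * n / threads, hi = (t + 1) * n / threads;
            pool[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++)       // this thread's rows of C
                    for (int j = 0; j < m; j++) {
                        double sum = 0;
                        for (int k = 0; k < inner; k++)
                            sum += a[i][k] * b[k][j];
                        c[i][j] = sum;
                    }
            });
            pool[t].start();
        }
        for (Thread t : pool) t.join();             // all bands done
        return c;
    }
}
```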
Now the algorithms that were used for this parallel multiplication were gen-
eralized so that they could accept either row and column multiplication or
submatrix multiplication. In addition the data will be copied from the matrices
into data structures that represent the various parts of the matrix (the rows,
columns, or submatrices) so that the multiplication can be simple for each indi-
vidual part. This also allows the data structures to be smaller for each piece of
the matrix, whereas otherwise the program would have to access the entire matrix. When
dealing with large matrices this should help avoid having to access main memory
and allow the system to keep more data in the cache. In addition, if the matrices
are very large, it should help avoid page faults during the multiplication.
After multiplication the result will then be copied back into the result matrix.
In addition, all the data for a matrix will be stored in a one-dimensional
array; transposed matrices are represented with a single bit switched on, which
indicates the particular algorithm to use for multiplication and other
operations. By making the matrix a one-dimensional array, only one array access
needs to be made every
time the matrix is accessed, which should improve performance slightly. The rows
will basically be represented as a submatrix with only one dimension and a column
will basically be a row that is transposed. Each submatrix will function like the
larger matrix, except that they will all have references to all other submatrices in
their rows and columns.
6.5.1 Threaded Algorithm
For the thread version of the matrix matrix multiplication algorithm as presented
here, the following will need to be done:
1. Create all threads.
2. Create the parts (the submatrices or the rows and columns) of the matri-
ces.
3. Multiply all the parts.
4. Create the resulting matrix.
5. Copy the data into the resulting matrix.
ForEachThread():
SpawnAllThreads()
CreateAllParts()
Await()
FillAllAParts()
FillAllBParts()
Await()
ProcessAllParts()
Await()
BuildAnswerMatrix()
Await()
FillAnswerMatrix()
Await()
6.5.2 Hydra Algorithm
For the Hydra version of the matrix matrix multiplication algorithm as presented
here, the following will need to be done:
1. Create the parts (the submatrices or the rows and columns) of the matri-
ces.
2. Multiply all the parts.
3. Create the resulting matrix.
4. Copy the data into the resulting matrix.
ScheduleAll():
CREATE NEW Prescheduled Event Group
ScheduleCreationOfParts()
Schedule DoneCreating
ScheduleCreationOfParts():
Schedule HowToMultiply.GetPartsOfA(A)
Schedule HowToMultiply.GetPartsOfB(B)
Schedule HowToMultiply.GetPartsOfC(C)
DoneCreating():
ScheduleFillA()
ScheduleFillB()
FORWARD
ScheduleMultiplication()
FORWARD
ScheduleBuildResult()
FORWARD
ScheduleFillResult()
The source code for the algorithm is presented in section B.3.
6.5.3 Results
For this benchmark, shorter times are better, and the faster it is in comparison
to the sequential matrix matrix multiplication, the better. Some parallel
languages support this matrix
matrix multiplication directly with an implementation of parallel loops, for others
it is necessary to split the work up manually. For those that require the work to
be split up manually, dashes are put in place of their results in the table and the
actual results are placed next to the algorithm between them. From the results
shown in figure 6.3 and table 6.3, it can be seen that Hydra does not do as well in
this benchmark in comparison to most of the others. However, Hydra still does
better than sequential Java and it is still reasonably close to the others.
[Figure: bar chart of times faster than sequential matrix matrix multiply on the Desktop, Workstation, and Server for Hydra, JACE, javar, JOMP, Java Thread Pool, and Java Threads, using the Rows and Columns and Submatrices decompositions]
Figure 6.3 Times Faster Than Sequential Matrix Matrix Multiply
Table 6.3 Time and Times Faster for Matrix Matrix Multiplication

Test                Desktop           Workstation       Server
                    Time(ms)  Faster  Time(ms)  Faster  Time(ms)   Faster
Sequential          112.72    1.00x   69.50     1.00x   1,399.56   1.00x
  Rows and Columns  123.17    0.92x   73.47     0.95x   1,401.28   1.00x
  Submatrices       121.72    0.93x   84.66     0.82x   1,384.77   1.01x
Hydra               -         -       -         -       -          -
  Rows and Columns  54.12     2.08x   40.38     1.72x   393.17     3.56x
  Submatrices       74.25     1.52x   51.97     1.34x   488.30     2.87x
JACE                -         -       -         -       -          -
  Rows and Columns  121.11    0.93x   33.27     2.09x   357.47     3.92x
  Submatrices       124.58    0.90x   34.66     2.01x   367.06     3.81x
javar               50.34     2.24x   34.04     2.04x   362.06     3.87x
JOMP                54.08     2.08x   108.86    0.64x   497.83     2.81x
Java Thread Pool    -         -       -         -       -          -
  Rows and Columns  55.27     2.04x   33.53     2.07x   353.51     3.96x
  Submatrices       59.90     1.88x   35.19     1.97x   364.18     3.84x
Java Threads        -         -       -         -       -          -
  Rows and Columns  53.91     2.09x   32.96     2.11x   357.75     3.91x
  Submatrices       57.75     1.95x   34.59     2.01x   374.29     3.74x
6.6 Single Master Thread with One or More Worker Threads
In this benchmark responsive user interfaces will be covered[31]. A single mas-
ter thread will be used to represent the typical interface thread that exists in a
Java program. This thread will proceed to execute one variation of each of the
two previous benchmarks and then terminate. The test is used to see how long it
takes for all the work to be completed and how long the master thread takes to
start each of the two individual tasks.
The first number is what has typically been shown in this paper. It indicates
how long it takes the program to actually complete its work. Ideally this should
be as low as possible, but this is not the most important measure of performance
for this benchmark.
The second number is closely related to responsiveness (though not identical).
This is due to the fact that the first thing a program must do to respond to
something is that it must begin the work of responding. However, the actual
response time would be measured by determining how long it takes for the program
to actually output some kind of information to the end user. As a result, this
number will be referred to as TTSW (Time To Start Work) instead of response
time. This is the most important measure of performance for this benchmark, so
it should be as low as possible.
StartWork():
FOR I = 0 TO Benchmarks.Length - 1
START Profiler
Benchmarks[I].Run()
DONE Profiler
NEXT I
There are two separate versions of this algorithm. One just executes sequen-
tially. The other explicitly spawns off the work using parallel constructs. In order
to get this benchmark to execute efficiently, all parallel languages, except for
Hydra, had to explicitly spawn off their work. It is possible to explicitly
spawn off the work for Hydra, and it does provide slightly better performance
and better control over what happens; however, it was unnecessary.
The source code for the Hydra versions of the benchmark are presented in
section B.4.
6.6.1 Thread Pools and Explicitly Spawning Workers
The Java Thread Pool was not set up for explicitly spawning workers, since there
are many possible ways to do this. The first way is the naive way, in which the
program doesn't worry about it and just allocates one thread to each worker. The
first way can be problematic if the thread pool is static in size, since the workers
may require a certain number of threads to execute their algorithms. If two or
more such workers start at the same time and each takes enough threads to ensure
that the others won't get all they need, then the system will deadlock.
This problem is resolvable through a number of different methods. The first
is to make the thread pool dynamic in size; this causes the thread pool to create
more workers when it needs them, but then this makes using a thread pool similar
69


to using regular threads. The second is to make the workers capable of seeing and
using a limited number of threads, in which case they will use fewer threads than
they usually do. The problem with the second approach, is that the workers may
be unable to properly split up all their work to take full advantage of the parallel
capabilities of the computer. The third is to make the workers more flexible so
that they can use any and all threads as they become available and wont try to
use additional threads when they arent available.
Due to the complexities of this problem, it is assumed for this paper that the
programmer would choose the simplest approach and use a thread pool with a
dynamic size, in which case it will be similar to using regular threads. So the
thread pool result for explicitly spawning workers was not included and simply
combined with the threads result.
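The deadlock described above, and why a dynamically sized pool avoids it, can be shown with the standard java.util.concurrent executors. This is a self-contained sketch, not Hydra or benchmark code; the class and method names are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DynamicPoolDemo {
    // An outer task that blocks on an inner task submitted to the same
    // pool. On Executors.newFixedThreadPool(1) this deadlocks: the only
    // thread is held by the outer task, so the inner task never runs. A
    // cached pool grows on demand, so the inner task gets its own thread.
    public static int nestedSubmit(ExecutorService pool) throws Exception {
        Future<Integer> outer = pool.submit(() -> {
            Future<Integer> inner = pool.submit(() -> 21);
            return inner.get() * 2;   // waits for the inner task
        });
        return outer.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool(); // dynamic size
        System.out.println(nestedSubmit(pool));
        pool.shutdown();
    }
}
```

Swapping in a fixed-size pool with too few threads reproduces the deadlock scenario described above, which is why the dynamically sized pool is treated as equivalent to regular threads in these results.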
6.6.2 Results: Total Execution Time
For the first result of this benchmark, shorter times are better, and the faster
it is in comparison to the sequential merge sort, the better. From the results
shown in figure 6.4 and table 6.4, it can be seen that Hydra falls behind slightly,
since this makes use of the benchmark from section 6.5, in which Hydra fell
behind. Once again, it is better than sequential Java and it is reasonably close to
the others. Also, the second result for this benchmark is the most important.
It can also be seen that JOMP was able to complete this test for the desktop,
but was unstable on the workstation and the server. As a result, dashes are put
in place of the results for JOMP on the workstation and server.
[Figure: bar chart "Spawn Workers Total Time to Execute," comparing implicit and
explicit spawning for Sequential, Hydra, JACE, javar, JOMP, Java Thread Pool, and
Java Threads.]
Figure 6.4 Times Faster Than Sequential for Execution
Table 6.4 Total Execution Time for Spawning Workers
Test               Desktop             Workstation         Server
                   Time(ms)   Faster   Time(ms)   Faster   Time(ms)   Faster
Sequential         265.70     1.00x    221.18     1.00x    2,234.32   1.00x
Hydra              135.13     1.97x    105.09     2.10x    680.22     3.28x
  Explicit         141.03     1.88x    156.77     1.41x    668.35     3.34x
JACE               136.25     1.95x    225.69     0.98x    646.13     3.46x
  Explicit         134.34     1.98x    197.82     1.12x    623.52     3.58x
javar              135.52     1.96x    103.04     2.15x    644.27     3.47x
  Explicit         136.04     1.95x    111.33     1.99x    624.17     3.58x
JOMP               140.48     1.89x    172.31     1.28x    960.97     2.33x
  Explicit         281.69     0.94x    -          -        -          -
Java Thread Pool   133.82     1.99x    103.34     2.14x    656.50     3.40x
Java Threads       141.66     1.88x    107.94     2.05x    657.17     3.40x
  Explicit         145.38     1.83x    111.86     1.98x    638.28     3.50x
6.6.3 Results: Time To Start Work
For the second result of this benchmark, shorter times are better, and the faster it
is in comparison to the sequential merge sort, the better. From the results shown
in figure 6.5 and table 6.5, it can be seen that Hydra can implicitly spawn workers.
It can also be seen that Hydra performs well whether it is used to implicitly or
explicitly spawn workers; however, it does perform better when used explicitly.
It can also be seen that JOMP was able to complete this test for the desktop,
but was unstable on the workstation and the server. As a result, dashes are put
in place of the results for JOMP on the workstation and server.
[Figure: bar chart "Spawn Workers TTSW," comparing implicit and explicit spawning
for Sequential, Hydra, JACE, javar, JOMP, Java Thread Pool, and Java Threads.]
Figure 6.5 Times Faster Than Sequential for Time To Start Worker
Table 6.5 TTSW for Spawning Workers
Test               Desktop              Workstation          Server
                   Time(ms)   Faster    Time(ms)   Faster    Time(ms)   Faster
Sequential         132.81     1.00x     110.55     1.00x     1,116.53   1.00x
Hydra              0.24       558.46x   0.31       359.8x    1.15       971.47x
  Explicit         0.080      1657.1x   0.11       968.00x   0.14       7860.74x
JACE               68.09      1.95x     112.80     0.98x     322.57     3.46x
  Explicit         0.41       320.11x   0.29       379.89x   0.86       1305.56x
javar              67.73      1.96x     51.47      2.15x     321.58     3.47x
  Explicit         13.85      9.59x     0.88       125.4x    2.69       414.46x
JOMP               70.21      1.89x     86.11      1.28x     478.35     2.33x
  Explicit         0.15       889.57x   -          -         -          -
Java Thread Pool   66.88      1.99x     51.63      2.14x     327.76     3.41x
Java Threads       70.79      1.88x     53.92      2.05x     328.08     3.40x
  Explicit         0.34       394.35x   0.24       470.21x   0.77       1445.28x
6.7 Alchemist Definition Generator
This section demonstrates that the Alchemist Definition Generator works with
and without Hydra. It also demonstrates that the performance improves when
used with Hydra. For more details about how the Alchemist Definition Generator
works see section 5.3.1.
The Alchemist Definition Generator was executed with the BAPS framework
to produce a benchmark. The Alchemist Definition Generator was used to produce
definitions for the Hydra runtime and was executed 1000 times. The definition
generator was only written to work with Hydra and regular Java; as a result, it
will only be tested with those. The program is largely I/O bound, and so it is
unlikely to demonstrate the type of gains we saw in the previous benchmarks.
The results of the benchmark are shown in figure 6.6 and table 6.6. It can be
seen that the definition generator does perform better with Hydra than it does
with regular Java. It can also be seen that the performance improves when used
with a computer that has greater parallel processing capabilities.
[Figure: bar chart "Alchemist Definition Generator," y-axis "Times Faster Than
Sequential" (0 to 1.6), comparing Java and Hydra on the Desktop, Workstation,
and Server.]
Figure 6.6 - Times Faster Than Sequential for the Definition Generator
Table 6.6 Time and Times Faster for Definition Generator
Test                              Desktop             Workstation         Server
                                  Time(ms)   Faster   Time(ms)   Faster   Time(ms)   Faster
Definition Generator with Java    247.27     x1.00    325.36     x1.00    1,318.82   x1.00
Definition Generator with Hydra   211.96     x1.17    233.35     x1.39    877.34     x1.50
6.8 Hydra Stay Resident
The Hydra Stay Resident (HSR) project was started to address one significant
performance issue with the Hydra runtime. This issue is that it takes a while for
the complete runtime to start up. This is caused by a number of different factors
including:
Startup time for the Java virtual machine.
Connecting to the database where recompiled classes are stored.
Initialization of all of Alchemist's data structures.
Initialization of all of the Hydra runtime's data structures.
While many data structures are initialized in parallel, it still takes some time
for the complete runtime to start up. This can be a substantial problem for two
reasons:
1. Some programs are short and if they are too short, then the startup time
for the runtime will either take longer than the program takes to run or the
startup time will account for a very large percentage of the total execution
time.
2. Interactive programs will start up slowly and users may notice this. If users
notice this, then it can cause impatience and make using Hydra programs
undesirable.
In order to solve these problems, HSR was developed. When HSR is used,
the Hydra runtime and the Java virtual machine will both stay in memory after
they are first loaded. Programs to be executed will be sent to the already started
Hydra runtime and since the runtime is already loaded, the startup time is reduced
to virtually nothing. If a program was started previously, then there are two
additional benefits to using HSR:
Classes Are Already Loaded - Its classes do not need to be loaded again.
Code Is Preoptimized - Classes are already optimized by Java.
6.8.1 How It Works
HSR operates by providing two separate programs:
The HSR Server - This is written in the Hydra language and is responsible
for two things. First, to make the Hydra runtime sleep whenever there is
no work to do. Second, to wake up the Hydra runtime whenever there is
work to do. This program uses socket-based interprocess communication to
communicate with the HSR start program[32][33].
The HSR Start Program - This is written in C and is compiled to a native
executable. So it should have a near instantaneous start time. It is used
to indicate what program the Hydra runtime should be executing and the
command-line arguments for the program. This program uses socket-based
interprocess communication to communicate this to the HSR server[32].
HSR operates through a simple sequence of events, which are as follows:
1. The Hydra runtime is started in advance.
2. Once started the Hydra runtime starts the HSR server.
3. The HSR server creates a server socket for interprocess communication.
4. The HSR server schedules the sleeping events.
5. Hydra executes the sleeping events, which causes the system to wait un-
til...
6. The user specifies a program to start with the start program.
7. The HSR start program opens a socket for interprocess communication to
the HSR server.
8. The HSR start program sends the name and the command-line arguments
for the program to the HSR server.
9. The HSR server receives the information on the program.
10. The Hydra runtime wakes up.
11. The HSR server places the program in a FIFO event group, which is followed
by the sleeping events.
12. The Hydra runtime executes the program that was sent to it.
13. The HSR server informs the start program through the socket connection
that the program has completed execution.
14. The HSR start program ends.
15. Loop back to 4.
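A minimal sketch of the client side of this exchange (steps 7, 8, and 13) follows, written in Java rather than C for consistency with the rest of this paper. The method name, the single-line wire format, and the "DONE" reply are assumptions made for illustration, not the actual HSR protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class HsrStartSketch {
    // Opens a socket to the HSR server, sends the program name and its
    // command-line arguments as one line, then blocks until the server
    // reports that execution has completed.
    public static String sendRequest(String host, int port, String request)
            throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println(request);   // step 8: program name + arguments
            return in.readLine();   // step 13: completion notice
        }
    }
}
```

Blocking on the reply is what lets the start program end only after the requested program has finished, as in step 14.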
6.8.2 The Results
All tests in this section were executed on the workstation.
To understand the results, it is important to consider how long it takes for the
Hydra runtime and the Java virtual machine to start. If the Hydra runtime is
started with a class that has an empty main method and is already precompiled
in the database, then it takes 2.36s to complete execution. If just the Java vir-
tual machine is started with the same class file, then it takes 170 ms to complete
execution. From this it can be seen that the Hydra runtime is quite slow.
When using HSR the results will vary. For the purposes of this paper, it is
assumed that the HSR server is already started and is not doing any work. Then
if the start program is used to instruct the HSR server to start the same class
that was used earlier, then it will take about 170 ms for the first time. After this,
the class file is already loaded into memory, so it can be as low as 10 ms. Using
HSR provides us with a speedup of about 236 times over the Hydra runtime and
17 times over the Java virtual machine.
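The speedup figures quoted above follow directly from the measured times; the class name below is invented for this trivial check.

```java
public class HsrSpeedup {
    // Speedup is simply the baseline time divided by the HSR time.
    public static double speedup(double baselineMs, double hsrMs) {
        return baselineMs / hsrMs;
    }

    public static void main(String[] args) {
        System.out.println(speedup(2360, 10)); // Hydra runtime vs. HSR: 236x
        System.out.println(speedup(170, 10));  // JVM vs. HSR: 17x
    }
}
```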
HSR also works with real programs. If the Alchemist Definition Generator is
used with HSR, then it will take 730 ms the first time, and 300 ms the second
time. If the same program is executed repeatedly (20+ times), then the runtime
for the Alchemist Definition Generator will slowly drop to about 250 ms.
HSR allows Java programs to avoid the startup time of the Java virtual machine
and allows Hydra programs to avoid the startup time of the Hydra runtime.
Reducing startup time enhances the performance of Hydra programs. HSR allows
the class cache for Hydra to remain in memory and the optimizations performed
by the Java JIT (Just-In-Time) compiler to be kept in memory, so the
performance of programs gradually improves with use.
6.8.3 Conclusions and Future Work
HSR is just a very simple prototype system that is meant to demonstrate that
the startup time problem can be addressed. A complete version of HSR should
add a number of features, including:
The ability to schedule new programs while one is already executing.
The ability to direct console I/O from the HSR server to the HSR start
program.
The ability to allow the HSR start program to terminate programs being
executed by the HSR server.
Even though there is still a great deal of development work left for HSR, it
certainly demonstrates that it can resolve Hydra's startup time problem. It
also demonstrates that it can be used to improve the startup time of regular
Java programs.
7. Conclusions and Future Research and Development Directions
Hydra provides solutions to the issues presented in this paper. Hydra programs
can execute without change on a regular Java virtual machine as a sequential
program or can execute as a parallel program with a Hydra runtime. It pro-
vides comparable performance to other parallel programming languages. It is
easy to use and avoids some problems and limitations that some other parallel
programming languages have, like deadlocks. In particular it provides a solution
to developing software for SMP and compatible architectures, which are becoming
more common.
It should be noted that this paper cannot be used to reach a conclusion regard-
ing which parallel programming language is the best or the worst. The reasons
for this are as follows:
1. There are many different things we could be looking for in a comparison of
parallel programming languages and each would likely result in a different
best. For example, it is possible to search for the easiest to use or for the
one with the best performance.
2. Even if one particular attribute was chosen, there are many ways to conduct
a comparison. For example, if the objective is to find the one with the best
performance, it has to be asked which benchmarks are the most important.
In this particular paper, for example, Hydra did exceptionally well with the
spawn worker benchmark and the overhead test, but not so well with the
matrix matrix multiplication.
3. This paper does not attempt to address these issues; its purpose is merely
to introduce the Hydra Parallel Programming System and demonstrate that it
works.
There are a large number of different directions Hydra can go in for the future.
Some of those directions are specific to Hydra Stay Resident and are mentioned in
6.8.3. Others cover the project as a whole. Some possible directions are described
here.
7.1 A Lighter System
At the moment, Hydra uses a lot of memory and has a number of features that are
specific to larger systems with more powerful processing capabilities. This is
great for achieving high performance on systems that have the required
resources, but it can be problematic for systems that have few resources or
don't need the full features. These features are as follows:
Database for storing recompiled classes.
Hashmap for storing recompiled classes.
Recompilation at runtime.
The Hydra runtime expects to have parallel processing capabilities.
As stated, all of these are great and are currently considered advantages of
Hydra, but what if it was decided to use Hydra on a cellphone or some other
handheld device? There likely will not be a need, or the memory, for the database
and the hashmap. Additionally, compilation at runtime may not be desirable;
instead, compilation at program installation time may be desirable. Also, future
versions of Hydra may have additional features beyond parallel programming that
may be considered desirable. If that happens, then some users of Hydra may wish
to use the extra features and not the parallel constructs in Hydra.
7.2 New Language Constructs
The first version of the Hydra language does not have all the features that other
parallel language extensions have. For example, while it is possible to do parallel
loops, there is no high-level construct for doing so like in JOMP. Adding these to
the language itself should be fairly straightforward. The problem arises when we
start to consider how it should be implemented.
If parallel loops do not produce events, then more work would be required on
the runtime itself: a new low-level construct would need to be created to
support parallel loops, and it would be highly disruptive to the current
scheduling and execution mechanisms.
These high-level constructs are worth the effort, because they would make a
number of tasks simpler to do. For example, the matrix matrix multiplication in
section 6.5 would not require the programmer to do as much work as it does now.
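To make the cost of the missing construct concrete, the following sketches the boilerplate a programmer currently writes by hand for a parallel loop in plain Java: partition the index range among worker threads and join them. A high-level construct, as in JOMP, would generate this for the programmer; the class and method names here are invented for illustration.

```java
public class ParallelLoopSketch {
    // Runs the loop body out[i] = i * i in parallel by giving each worker
    // a strided share of the index range; iterations are independent, so
    // no locking is needed.
    public static long[] squares(int n, int threads) throws InterruptedException {
        final long[] out = new long[n];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = id; i < n; i += threads)   // strided partition
                    out[i] = (long) i * i;
            });
            workers[t].start();
        }
        for (Thread w : workers)
            w.join();   // wait for every worker to finish
        return out;
    }
}
```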
7.3 Runtime for Clusters
The first Hydra runtime was developed for SMP and compatible architectures.
This runtime does not cover clusters, which are another popular architecture. It
is considered possible to port the Hydra runtime to clusters, though it has not yet
been attempted.
Handling events, event groups, and scheduling should all be easy to do.
Parameters will be a little more complicated, but if the locks are treated as a
form of message passing, then it should be possible to make those work. There
are only three real issues to be resolved.
1. When a parameter with a lot of references to other objects is passed, the
compiler or the runtime will have to find what the cut-off point is for
sending all the data.
2. How the storing of references to parameters should be handled. One process's
copy of a reference will not be the same as another's.
3. The sharing of data that is not locked in the current version of Hydra. For
example, class and instance fields are not locked at this time.
7.4 Real-time Hydra
Due to its ability to schedule everything in advance, Hydra may one day be
suitable for real-time programming. For this to happen there are a few issues that
need to be resolved, which will be discussed here.
First, there is no mapping between time slots and real world time. Without
this mapping, there is no way for Hydra to schedule anything based on real
world time. Also, Hydra cannot tell when anything will execute in real world
time either. This can potentially be taken care of by having Hydra estimate the
amount of time each event will take to execute. With that information, it is
possible to determine how long time slots will take. With both pieces of
information, it is possible for Hydra to tell when an event will execute. Once
that is done, it will be possible for Hydra to schedule events based on when
they should execute.
Second, there need to be new constructs for the language that allow the pro-
grammer to specify when an event should execute. Other constructs may also be
desirable, like periodic events and determining when an event will execute.
A. Alchemist Definition Generator Source Code
The source code for the Alchemist Definition Generator follows. All Hydra com-
mands are in the main method and the first Generate method.
/*
* DefinitionGenerator.java
*
* This file is part of the Alchemist Compiler Library Public Release 3, which
* is in turn part of the Hydra Parallel Programming System Public Release 3.
* Copyright (C) 2006, Franklin E. Powers, Jr.
*
* This library is free software; you can redistribute it and/or modify it
* under the terms of the GNU Lesser General Public License as published by the
* Free Software Foundation; either version 2.1 of the License, or (at your
* option) any later version.
*
* This library is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
* for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with this library; if not, write to the Free Software Foundation,
* Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
package Alchemist;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.lang.Class;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;
import java.util.Stack;
import Hydra.Language.*;
/**
* The Definition Generator is used to produce definition class files for a set
* of classes.
*
* @author Franklin E. Powers, Jr.
*/
public abstract class DefinitionGenerator {
/*************************************************************************
* Public Methods
*************************************************************************/
/**
* The method invoked when the Definition Generator is started. It goes
* through the command-line arguments in groups of 4 and produces
* definitions for all the specified files.
*
* @param aosArgs The command-line arguments.
* @throws Throwable If an unexpected error occurs, a message is displayed
* telling the user where they can go to report the error
* or to get help and the error is rethrown.
*/
public static final void main(final String[] aosArgs) throws Throwable {
if (!bSilent)
System.out.println("Alchemist Definition Generator v1.0\n" +
"Copyright (C) 2006, Franklin E. Powers, Jr.\n" +
"http://www.dominarvis.com/Hydra/Alchemist");
if (((aosArgs.length % 4) != 0) || (aosArgs.length == 0)) {
if (!bSilent) {
System.out.println("Incorrect number of parameters supplied.");
DisplayHelp();
}
} else
try {
final EventGroup egGroup = EventGroup.Parallel();
final EventGroup egAddingTo = EventGroup.GetAddingTo();
egGroup.AddEventsToThis();
for (int i = 0; i < aosArgs.length; i += 4) {
Generate(aosArgs[i].equals(".") ? "" : aosArgs[i], new
File(aosArgs[i + 1]), aosArgs[i + 2].equals(".") ?
"" : aosArgs[i + 2], new File(aosArgs[i + 3]),
false);
}
egAddingTo.AddEventsToThis();
egGroup.AddAsChildEventGroup();
} catch (final Throwable e) {
if (!bSilent) {
System.out.println("An error which should not have occured "
+ "has.\nAt this time it may be advisable to " +
"report this error.\nTo report this error or to " +
"get help go to:\n" +
"\thttp://www.dominarvis.com/Hydra/Alchemist/Error"
+ ".html\n\nThe error follows:");
}
throw e;
}
}
public static final void Silent() {
bSilent = true;
}
/*************************************************************************
* Private Constructors
*************************************************************************/
/**
* The DefinitionGenerator is not meant to be instantiated, so it is an
* abstract class and its constructor is private.
*/
private DefinitionGenerator() { }
/*************************************************************************
* Private Fields
*************************************************************************/
private static boolean bSilent = false;
/*************************************************************************
* Private Inner Classes: The Base Class for Errors
*************************************************************************/
/**
* This class is used to represent errors that occur when the Definition
* Generator is executing. It is defined as a NamedRuntimeException, so
* that the exceptions can be thrown anywhere and caught by the main
* method.
*
* @author Franklin E. Powers, Jr.
*/
private static abstract class Error extends
Hydra.Util.Specialized.NamedRuntimeException {
/**
* Creates a new instance of Error.
*
* @param fFile The file for which the error occured.
*/
public Error(final File fFile) {
this(fFile.getAbsolutePath());
}
/**
* Creates a new instance of Error.
*
* @param sFile The file for which the error occured.
*/
public Error(final String sFile) {
super("The code generator " + sFile);
}
}
/*************************************************************************
* Private Inner Classes: Errors
*************************************************************************/
/**
* This error is thrown when a destination file can't be created by the
* Definition Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class could_not_create_destination_file extends Error
{
/**
* Creates a new instance of could_not_create_destination_file.
*
* @param fFile The file for which the error occured.
*/
public could_not_create_destination_file(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when a destination directory can't be created by
* the Definition Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class could_not_create_destination_directory extends
84


Error {
/**
* Creates a new instance of could_not_create_destination_directory.
*
* @param fFile The file for which the error occured.
*/
public could_not_create_destination_directory(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when a source file can't be found by the Definition
* Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class could_not_find_the_source_file extends Error {
/**
* Creates a new instance of could_not_find_the_source_file.
*
* @param fFile The file for which the error occured.
*/
public could_not_find_the_source_file(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when a source file can't be loaded by the
* Definition Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class could_not_load_the_source_class_file extends
Error {
/**
* Creates a new instance of could_not_load_the_source_class_file.
*
* @param fFile The file for which the error occured.
*/
public could_not_load_the_source_class_file(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when a destination file can't be written to by the
* Definition Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class could_not_write_to_the_destination_file extends
Error {
/**
* Creates a new instance of could_not_write_to_the_destination_file.
*
* @param fFile The file for which the error occured.
*/
public could_not_write_to_the_destination_file(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when the Definition Generator encounters a
* package name which is invalid.
*
* @author Franklin E. Powers, Jr.
*/
private static final class
does_not_consider_this_to_be_a_valid_package_name extends Error {
/**
* Creates a new instance of
* does_not_consider_this_to_be_a_valid_package_name.
*
* @param sName The package name which is considered invalid.
*/
public does_not_consider_this_to_be_a_valid_package_name(final String
sName) {
super(sName);
}
}
/**
* This error is thrown when a specified file can't be used by the
* Definition Generator.
*
* @author Franklin E. Powers, Jr.
*/
private static final class is_unable_to_use_the_following_file extends
Error {
/**
* Creates a new instance of is_unable_to_use_the_following_file.
*
* @param fFile The file for which the error occured.
*/
public is_unable_to_use_the_following_file(final File fFile) {
super(fFile);
}
}
/**
* This error is thrown when the output for a directory is specified as a
* file.
*
* @author Franklin E. Powers, Jr.
*/
private static final class
will_not_produce_generated_a_file_from_a_directory extends
Error {
/**
* Creates a new instance of
* will_not_produce_generated_a_file_from_a_directory.
*
* @param fSource The source directory for which the error occured.
* @param fDest The destination file for which the error occured.
*/
public will_not_produce_generated_a_file_from_a_directory(
final File fSource, final File fDest) {
super(fSource.getAbsolutePath() + " and " +
fDest.getAbsolutePath());
}
}
/**
* This error is thrown when the output for a directory or a file is the
* same as a source directory or file.
*
* @author Franklin E. Powers, Jr.
*/
private static final class
will_not_produce_generated_a_file_on_top_of_the_source_file extends
Error {
/**
* Creates a new instance of
* will_not_produce_generated_a_file_on_top_of_the_source_file.
*
* @param fFile The file for which the error occured.
*/
public will_not_produce_generated_a_file_on_top_of_the_source_file(
final File fFile) {
super(fFile);
}
}
/*************************************************************************
* Private Methods
*************************************************************************/
/**
* Creates a directory of generated definitions based on a source
* directory.
*
* @param sSourcePackage The name of the source package.
* @param fSourceDir The source directory.
* @param sDestPackage The name of the destination package.
* @param fDestDir The directory to place the definitions in.
*/
private static final void CreateDirectory(String sSourcePackage, final File
fSourceDir, String sDestPackage, final File fDestDir) {
final File[] aofSourceFiles = fSourceDir.listFiles();
EnsureThisDirectoryExists(fDestDir);
if (sDestPackage.length() == 0)
sDestPackage = fDestDir.getName();
else
sDestPackage = sDestPackage + "." + fDestDir.getName();
if (sSourcePackage.length() == 0)
sSourcePackage = fSourceDir.getName();
else
sSourcePackage = sSourcePackage + "." + fSourceDir.getName();
for (final File fSourceFile : aofSourceFiles)
Generate(sSourcePackage, fSourceFile, sDestPackage,
CreateGeneratedFile(fDestDir, fSourceFile.getName()),
true);
}
/**
* Produces an instance of java.io.File for use as the destination file
* from an instance of java.io.File that specifies the destination
* directory and the name of the source file.
*
* @param fDestDir The destination directory.
* @param sSource The name of the source file.
* @return The destination file.
*/
private static final File CreateGeneratedFile(final File fDestDir, final
String sSource) {
return new File(fDestDir, ReplaceClass(sSource));
}
/**
* Displays the documentation for the Definition Generator.
*/
private static final void DisplayHelp() {
System.out.println(
"The definition generator takes parameters in groups of 4.\n" +
"These parameters are as follows:\n" +
"SOURCE PACKAGE - The name of the package that the source " +
"classes are being read\n" +
" from.\n" +
"SOURCE PATH - The name of the directory or file that is " +
"being read from.\n" +
"DEST PACKAGE - The name of the package that the " +
"definitions are being written\n" +
" to.\n" +
"DEST PATH - The name of the directory or file that is " +
"being written to.\n\n\n" +
"Note that any package name may be replaced with a .\n" +
"A . indicates that the package name is the name of the " +
"directory that is being\n" +
"used. Also note that if a package is specified and a " +
"directory is specified,\n" +
"then the directory name is appended to the package name."
);
}
/**
* Ensures that the specified destination directory exists.
*
* @param fDestDir The destination directory.
* @throws could_not_create_destination_directory If a directory which is
* supposed to store one or
* more definition files
* could not be created.
*/
private static final void EnsureThisDirectoryExists(final File fDestDir) {
if (!(fDestDir.exists()))
if (!(fDestDir.mkdirs()))
throw new could_not_create_destination_directory(fDestDir);
}
/**
* Ensures that the specified destination file exists.
*
* @param fDestFile The destination file.
* @throws could_not_create_destination_file If a definition file could not
* be created.
*/
private static final void EnsureThisFileExists(final File fDestFile) {
if (fDestFile.exists())