Citation
Streaming Java applications from a relational database

Material Information

Title:
Streaming Java applications from a relational database
Creator:
Gnabasik, David
Publication Date:
Language:
English
Physical Description:
72 unnumbered leaves : ; 28 cm

Subjects

Subjects / Keywords:
Java (Computer program language) ( lcsh )
Streaming technology (Telecommunications) ( lcsh )
Relational databases ( lcsh )
Java (Computer program language) ( fast )
Relational databases ( fast )
Streaming technology (Telecommunications) ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 66-67).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by David Gnabasik.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
53368684 ( OCLC )
ocm53368684
Classification:
LD1190.E52 2002m G52 ( lcc )

Full Text
STREAMING JAVA APPLICATIONS FROM A RELATIONAL
DATABASE
by
David Gnabasik
B.A., University of Chicago, 1979
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science


This thesis for the Master of Science
degree by
David Gnabasik
has been approved
by
Dr. Gita Alaghband
Date
Dr. Ellen Gethner


Gnabasik, David (M.S., Computer Science)
Streaming Java Applications from a Relational Database
Thesis directed by Professor Gita Alaghband
ABSTRACT
This paper describes a streaming mechanism that distributes and
deploys Java class bytecode streams persisted in a relational
database. Based upon the Java linking model, the mechanism
distributes a virtual application to a client's process space, allowing
the deployment of dynamic component streams instead of static
applications. Customized client class loaders request Java
components from a class server using dedicated socket ports. The
class server uses a 1st-order Markov probability model to effectively
predict the client's next class request. Various performance
measurements are made of the system.
This abstract accurately represents the content of the candidate's
thesis. I recommend its publication.
Signed
Gita Alaghband


DEDICATION
I dedicate this thesis to my wife and family for their love, understanding
and support.


ACKNOWLEDGEMENT
I wish to thank my graduate advisor, Dr. Gita Alaghband, for her direction
and attentiveness during my research. I also wish to thank the faculty at
the University of Colorado at Denver for their inspiration and the Graduate
School for their support.


Contents
Figures
Tables
Chapter 1: Analysis and Design
  Introduction
  Description of Problem
  Other Design Goals
  Initial Questions
  Relational Databases
  Operational Scenarios
  Software Design Patterns
  Mathematical Models
  The Java Architecture
  Java Class Activation
  Java Class File Format
  Class Stream Verification
  Class Loaders and the Class Loading Mechanism
  Class Hierarchy Storage
  Class Update Notification Mechanism
  Version Control and Binary Compatibility
  Java ARchive Format
  Security Management
Chapter 2: Architectural Comparisons
  Serialization and Reflection
  Distributed Object Architectures
  Other Java Distribution Solutions
  Middleware Component Models
Chapter 3: Performance Measurements
  Increasing System Performance
  Performance Measurements
  Classes from database
Chapter 4: Conclusions
  Conclusions and Future Research
Bibliography
Appendix A: Class Size Sampling Distribution
Glossary


Figures
Figure
1 Generic Cache Management Pattern
2 Producer-Consumer Collaboration
3 Overall Architectural Model
4 Internal Structure of the Java Virtual Machine
5 Database Entity-Relation Diagram
6 Components of a Jini System
7 Time of Class Package Instantiation: Cached (256k) and Non-Cached
8 Average Transfer Rate in Bytes Per Second


Tables
Table
1 Order of Class Activation
2 Format of a Class File Table
3 Types of Attribute_info Tables
4 Similarities and Differences Between Jini and JNLP
5 Cached / Non-Cached Time Ratios vs Number of Clients
6 Effectiveness of Class Prediction
7 Effectiveness of Class Prediction Over Simulated Network


Chapter 1: Analysis and Design
Introduction
This paper describes a mechanism for the storage, distribution and
deployment of streaming Java applications from a relational database. Java
class hierarchies are persisted as Java bytecode streams into database
rows. A streaming mechanism transmits a virtual application to a client's
process space from a class server, in effect deploying dynamic component
streams instead of large, static applications. The mechanism is suitable for
resource-constrained clients, proximity computing environments, or mobile
computing devices. The model is designed to mitigate some of the
distribution and deployment problems of application software.
In much the same way that Java applets can be downloaded from a server
to execute in a client's process space, so too can more powerful class
streams be prefetched and delivered to clients on demand. The class server
uses a 1st-order Markov probability model to effectively predict the client's
next class request. The problem is fundamentally one of application
partitioning and distribution using efficient class prediction. The Java linking
model and class file format are ideally suited for managing this problem.
Brief Outline
The rest of the paper is organized as follows. Chapter 1 provides an
analysis of the problem and design considerations for its solution with regard
to software design patterns, mathematical models, and particulars of the
Java architecture including Java class activation, class file format, class
loading mechanism and linking model. Chapter 2 provides architectural
comparisons with other systems such as Java RMI, distributed object
architectures and middleware component models. Chapter 3 provides
performance measurements of the proposed model. Chapter 4 offers
conclusions and possible directions for future research.
Description of Problem
Dynamic component streams can address several software distribution and
deployment issues. The internet has permitted the relatively automated and
unobtrusive update or correction of software applications. The software
versioning process has been encumbered by static library or object file
linking limitations, from a combination of language and compiler design and
operating system constraints. The software deployment process has
become even more complicated trying to satisfy specific customer requests
and environments. It is difficult to effectively manage an application that
maintains core functionality yet is customizable for a particular computing
environment, a set of formal RFP requirements, or changing business
needs. Indeed, an application can become so flexible and customizable that
it becomes a deployment and maintenance nightmare with an uncontrollable
number of different versions in the field. A very expensive development staff
ends up directly supporting the application because they are the only ones
who really know how to configure or patch the application. The developers
are often too busy to properly train support personnel, and resent
well-meaning support efforts that often bring a system to its knees.
The latest mobile telephones, PDAs or wireless computing devices are often
resource-constrained, with limited amounts of memory and/or bandwidth. An
entire application simply does not fit in this amount of space. Furthermore,
computing is also becoming personalized, where many of the same software
components are combined in customized ways for small groups or even
individuals.
The process of application deployment can take advantage of the dynamic
linking and class loading mechanisms of the Java virtual machine to
support a distributed and customized component model. Since the Java
virtual machine, class file format and bytecode streams are supported in
heterogeneous computing environments, an immediate advantage is that
this distribution model works with many different computing environments.
The class loading mechanism that retrieves and distributes a streaming
class hierarchy to a client's process space faces some conceptual issues,
but the primary issue is to maintain responsive overall system performance.
System performance is mainly affected by:
- how a class hierarchy is described and persisted in the database
- how a class hierarchy is distributed to a client
- how linking and class loading works in the client virtual machine
- the frequency and cost of client-server communication
Therefore, the major problems to be resolved are:
- defining an efficient streaming component model
- decomposing and persisting class hierarchies into a database
- retrieving the most probable class subgraphs to be activated next
- distributing new or updated classes to clients
- maintaining overall system performance
A Note on Terminology
The definitions of many italicized terms are found in the glossary. Two terms
require particular attention. In Java, variables and expressions have
(compile-time) type, whereas objects and arrays have (run-time) class. An
object is an instance of its class and all of its super-classes. A non-null
reference-type variable at run-time refers to an object whose class is
compatible with the type of the variable. The term class in this context can
refer to both classes and interfaces.
Other Design Goals
This model aims for simplicity and generality, but it cannot properly be called
an architecture. It is characterized equally by the problems that it avoids as
well as the problems it addresses. These negative design goals include:
- The client's virtual machine executable does not have to be modified.
  Instead, downloaded user-defined class loaders act as the primary
  software environment control mechanism.
- Hardware enhancements are implemented on the server, not on the
  client, unlike an ultimate gaming machine strategy.
- The server does not manage client state beyond minimal client
  authentication, authorization and initialization, although it does record
  client class usage and requests in order to adaptively predict and
  prefetch Java class streams.
- Time and effort are expended at the class-producing or -authoring stage
  instead of the class-consuming stage.
Initial Questions
What was the system development environment?
The system used to develop this model was a Sun Microsystems Java
1.3.1 run time and SDK environment, dated 10/23/99, running under the
Microsoft Windows NT 4.0 service pack 6 operating system on a
Pentium III 600 MHz computer with 256 megabytes of RAM. The client
also executed under the Java HotSpot Client version 1.3.1_02-b02. The
development server ran Microsoft Windows NT 4.0 service pack 6 on
the same computer that hosted an Oracle 8.1.6 relational database.
Why was Java chosen to explore this distributed model and not C++, given
the concern for performance?
Although recent just-in-time Java compilers rival the speed of most C++
compilers, it is the linking model of Java that allows it to easily and
dynamically integrate class streams into a client's process space.
What is the difference between this model and source control software?
Both maintain software versions and history, which has proven beneficial
time and again. This model, however, attempts to predict which class to
distribute next to the client.
What is the difference between this model and standard Java applets?
Standard Java applets are burdened with severe security constraints,
such as not writing to local disks or not connecting to other servers, that
are appropriate for other computing environments. An applet will run only
under a hosting program, nearly always a Web browser. This model
attempts to generalize and extend the mobile code concepts of Java
applets.
Why not use an object-oriented database to persist and distribute classes?
The problem to be solved implies not distributing an entire object graph
to the client, but only what the client probably needs next. The fact that symbolic
references to other classes exist within a class file provides the
opportunity and the granularity of distributing just enough of an
application trace. Simple and consistent naming and indexing schemes
provide sufficient mapping of classes to relational rows.
Why not use one of the available, distributed remote-object invocation
architectures such as RMI, DCOM, CORBA, JavaBeans or even JNLP to
distribute classes?
The model strives to be as simple as possible and to work with resource-
constrained environments. Further architectural comparisons are made
in Chapter 2.
Relational Databases
Why not distribute Java classes directly from a server's file system instead
of using a relational database? The advantages of a database over a file
system are well known. A robust database does not suffer from the data
redundancy and inconsistency issues, the difficulty in accessing data, the
lack of centralized control and operational safety, data isolation, integrity and
atomicity issues, concurrent access, security problems, and transaction
issues that bedevil most file systems. An effective way to persist the class
meta-information is in a database, and a unified storage approach demands
that the class bytestreams be stored there as well. Storing the classes in a
distributed, yet synchronized, database allows for the possibility of several
servers feeding a single client at the same time. Neither does the client have
to load a file-system driver or browser, only a TCP/IP driver that connects to
the server, which allows for the distribution of classes to PDAs and other
non-disk devices.
A significant performance disadvantage of object-oriented databases is that
they do not return small, discrete rows but instead must return entire class
hierarchies, only a portion of which might be actually invoked. If a large
fraction of a class's transitive closure is returned at each client request,
system performance suffers accordingly. The model is designed to stay only
one step ahead of what classes an application trace requires.
This is not to say that relational databases are perfect for the task at hand.
Relational databases suffer poor performance for compute-intensive
applications and data access navigation. Making database schema changes
is a difficult process at best and concurrency control protocols are designed
for short-lived transactions. Furthermore, unlike software applications,
databases usually support weak notions of time, object versioning and event
notification. Each of these issues must be specifically addressed when
implementing the model.
Operational Scenarios
There are two primary operations with the system. A class author must load
and manage valid Java classes and applications into the database, and a
client must efficiently access the same classes and applications over a
network. Whereas class authors write infrequently to the class server
database, the speed of class retrieval is crucial to the client. This produces
five main types of interaction with the database: class-authoring, client-
access (application navigation), application streaming, authorization and
security, and miscellaneous information-management. The database
schema is designed to support the different needs of these interactions.
Class Authoring
The system trades class and application development complexity up front
for the simplicity of client retrieval. The class authoring interface is primarily
responsible for inserting, updating and deleting classes and applications in
the class database server. It also manages the following information:
- appropriate class and application authorization and security measures
- any needed digital signatures or certificates
- the distinction between public or common components and private or
  local components
- which foundation set of Java classes is required locally on the client
- the set of client deployment preferences that indicate how and when
  the classes in a particular application trace should be updated.
These requirements specify the minimal client state persisted on the server.
A client profile would include application authentication, authorization and
access information as well as application deployment preferences including
the choice of initial class loader.
Whether presented through a web interface or a static program, a Class
Container Interface provides access to the entry points or named containers
that a client uses to navigate the class database. A container is another
name for the streaming application trace delivered to a client.
The author compiles and persists an entire Java application class set as
binary streams of Java bytecode into a relational database. Each stream is
uniquely named and indexed by a database version-package-classname
convention. Importantly, the class is scanned for references to included
classes, and that list is persisted. Note that this class analysis cannot detect
the dynamic class loading performed by the Java newInstance() and forName()
methods. Although such classes would be cached on the server, they cannot
be effectively predicted at the time of class authoring.
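The naming convention and the limits of static reference scanning can be illustrated with a minimal sketch. The class ClassRowKey, the '/' separator and the method names are assumptions made for illustration only; the thesis does not specify the exact key layout.

```java
/** Hypothetical row key following a version-package-classname convention. */
final class ClassRowKey {
    final String version;
    final String packageName;
    final String className;

    ClassRowKey(String version, String binaryName) {
        int dot = binaryName.lastIndexOf('.');
        this.version = version;
        this.packageName = dot < 0 ? "" : binaryName.substring(0, dot);
        this.className = binaryName.substring(dot + 1);
    }

    /** Assumed index format: version, package and class name joined by '/'. */
    String asIndex() {
        return version + "/" + packageName + "/" + className;
    }

    /**
     * The class name is assembled at run time, so it never appears as a
     * class reference that a scan of the compiled class could list.
     */
    static Class<?> loadDynamically(String prefix, String suffix) {
        try {
            return Class.forName(prefix + suffix);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + prefix + suffix, e);
        }
    }
}
```

A call such as ClassRowKey.loadDynamically("java.lang.", "String") resolves a class whose name is only known while the program runs, which is why such classes cannot be placed on the persisted reference list at authoring time.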
Client Access
Clients initiate and use the system by requesting a particular application by
name, radio frequency or well-known URL address. The Java Naming and
Directory Interface (JNDI) is one alternative to browse a central naming
repository. In fact, the Java architecture allows several ways for a client to
activate Java classes: by scripting code (JavaScript, VBscript, Perl), by
JavaBeans, by C++ executables, and so on. These class browsers are a
function of the client device, its resources and the user operating it.
A client message requests a consumer-producer channel (CPC) to be
established with the well-known class server. Once the client is authorized,
the server locates an associated Java class loader (JCL) in the database
and transmits it to the client, which loads it permanently into its Java virtual
machine (JVM). The channel is now established on the client side, including
the initial minimum and maximum sizes of the consumer buffer. The server
is notified about the channel, and it opens the producer end of the CPC.
Class Activation
Now the client attempts to load a particular class with its JCL. The JCL is
designed to search for classes in the following order: the currently active
JVM, the consumer buffer, the client's basic Java class libraries, and the
class server. If the class is found in the client buffer, it is decompressed by
the JCL or built-in classes of the JVM, loaded into the running JVM, and
removed from the buffer.
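The search order described above can be sketched as a user-defined class loader. This is an illustration under stated assumptions, not the thesis implementation: the class name StreamingClassLoader, the buffer() method and the Function used to stand in for the class server connection are hypothetical, and decompression is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of a loader that searches: active JVM, buffer, local libraries, server. */
final class StreamingClassLoader extends ClassLoader {
    private final Map<String, byte[]> consumerBuffer = new ConcurrentHashMap<>();
    // Stand-in for the socket connection; returns null when the class is unknown.
    private final Function<String, byte[]> classServer;

    StreamingClassLoader(ClassLoader parent, Function<String, byte[]> classServer) {
        super(parent);
        this.classServer = classServer;
    }

    /** Called when the server streams a predicted class; duplicates are ignored. */
    void buffer(String name, byte[] bytecode) {
        consumerBuffer.putIfAbsent(name, bytecode);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);              // 1. already active in the JVM
            if (c == null) {
                byte[] b = consumerBuffer.remove(name);      // 2. consumer buffer (removed on use)
                if (b != null) c = defineClass(name, b, 0, b.length);
            }
            if (c == null) {
                try {
                    c = super.loadClass(name, false);        // 3. client's local class libraries
                } catch (ClassNotFoundException notLocal) {
                    // fall through to the class server
                }
            }
            if (c == null) {
                byte[] b = classServer.apply(name);          // 4. request from the class server
                if (b == null) throw new ClassNotFoundException(name);
                c = defineClass(name, b, 0, b.length);
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }
}
```

Overriding loadClass rather than findClass is what allows the buffer to be consulted before the parent loader; the default delegation model would consult the parent first.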
If the requested class is not in the client buffer, the JCL requests the class
from the class server. The class is retrieved from the database along with its
previously parsed list of symbolically referenced classes as either explicitly
declared class members or implicitly inherited references. This list is
generated when the class is inserted into the database by the class author.
The list of classes is streamed to the client buffer in the order in which a
JVM internally resolves all of its referenced classes. This strategy allows the
class server to act as a predictive look-ahead class feeder for the client. A
set of referenced classes is compressed and packaged into a single stream
and transmitted to the client. The amount of data transmitted is directly
proportional to the size of the client's buffer. In order for the set of classes to
execute at the client, one of the classes will always have a main() or
init() method, the entry method for a Java application or applet.
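Compressing and packaging a set of referenced classes into a single stream might look like the following sketch. The ZIP format from java.util.zip is an assumption chosen for illustration, as are the names ClassBundle, pack and unpack; the thesis does not state which compression format is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Packs classes, in resolution order, into one compressed byte stream. */
final class ClassBundle {
    static byte[] pack(Map<String, byte[]> classesInOrder) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : classesInOrder.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    /** Restores the classes in the order in which they were packed. */
    static Map<String, byte[]> unpack(byte[] bundle) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            ZipEntry entry;
            byte[] chunk = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) bytes.write(chunk, 0, n);
                classes.put(entry.getName(), bytes.toByteArray());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return classes;
    }
}
```

Passing a LinkedHashMap to pack preserves the order in which the JVM would resolve the referenced classes, so the client can insert them into its buffer in that same order.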
Class Discard Policy
When non-duplicated classes are inserted into the client CPC buffer, they
are associated with the class that symbolically invoked them. If a class is
extracted from the buffer to be activated, the discard policy marks the list of
explicitly associated classes for discard. These classes can be sequentially
pushed out of the buffer, if necessary, to make room for new classes or they
can be extracted for activation. Note that any inherited (i.e., parent) classes
or interfaces have themselves already been extracted from the buffer since
a class's parent(s) must be completely activated before the class itself.
Since only the client removes classes from the buffer, there is no need for
the client to inform the server that the client has unloaded a class. There is
no provision to pin a class into the buffer so that it cannot be pushed out.
The client is responsible for buffer overflows.
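One possible reading of this discard policy is sketched below. The names ConsumerBuffer, insert and extract are hypothetical, and the sketch assumes that the classes "explicitly associated" with an extracted class are those that were buffered on its behalf; the thesis may intend a different association.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Client-side buffer with a mark-then-push-out discard policy (illustrative). */
final class ConsumerBuffer {
    private static final class Entry {
        final byte[] bytecode;
        final String invoker;     // class that symbolically referenced this one
        boolean discardable;
        Entry(byte[] bytecode, String invoker) {
            this.bytecode = bytecode;
            this.invoker = invoker;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>(); // insertion order
    private final int maxBytes;
    private int usedBytes;

    ConsumerBuffer(int maxBytes) { this.maxBytes = maxBytes; }

    /** Inserts a non-duplicated class, pushing out marked classes if room is needed. */
    boolean insert(String name, String invoker, byte[] bytecode) {
        if (entries.containsKey(name)) return true;          // duplicates are ignored
        Iterator<Entry> it = entries.values().iterator();
        while (usedBytes + bytecode.length > maxBytes && it.hasNext()) {
            Entry e = it.next();
            if (e.discardable) {
                usedBytes -= e.bytecode.length;
                it.remove();
            }
        }
        if (usedBytes + bytecode.length > maxBytes) return false; // overflow: caller decides
        entries.put(name, new Entry(bytecode, invoker));
        usedBytes += bytecode.length;
        return true;
    }

    /** Removes a class for activation and marks its associated classes for discard. */
    byte[] extract(String name) {
        Entry e = entries.remove(name);
        if (e == null) return null;
        usedBytes -= e.bytecode.length;
        for (Entry other : entries.values()) {
            if (name.equals(other.invoker)) other.discardable = true;
        }
        return e.bytecode;
    }

    boolean contains(String name) { return entries.containsKey(name); }
}
```

Marked classes remain available for activation until space is actually needed, which matches the statement that they can either be pushed out or extracted.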
This approach can be extended to any type of binary resource that can be
stored in the database, including multi-media resources and C++ object files
invoked as native methods. It is straightforward to enforce the rule that
classes authored into the database only refer to other database classes or
to well-known base classes in a JVM. Since classes loaded by different
Java class loaders are placed into separate name spaces inside a client's
JVM, this partitioning supports a client's separate name spaces connecting
to different class servers.
Depending upon the client's disk and memory resources, all or very few of
the distributed classes may be kept active at the client. If client resources
are low, its customized class loader can unload selected classes, thereby
invoking local garbage collection. If the class is invoked later on, the same
request is made of the database.
Prediction and Relevant Statistics
Even though the process is simpler and faster if there is no feedback from
the client to the server, the deployment process becomes more efficient if
the client sends messages to the server indicating which specific classes
have been loaded into the client JVM. The predictive powers of the server
are adaptively enhanced if the client maintains consumer class activation
statistics and sends them back to the server each time a class is requested,
or all at once at the application's end. Defining the miss rate as the fraction of
class requests that are not in the client's local cache, the effectiveness of
prediction becomes the ratio of class requests satisfied by the client buffer to
requests satisfied by the class server. The modeling question becomes: How can
the server more efficiently predict which classes to stream to the client? The
overall measure of success will be good performance and scaling as
components and active clients are added to the system.
In order to experimentally collect class probabilities over time, the client's
class loader collects statistics regarding which classes were invoked and
when, the total number of class invocations, and the type of activation.
These statistics are sent back to the server along with subsequent class
requests.
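The miss rate and the effectiveness of prediction defined in this section can be computed from two counters, as in the following sketch. The class PredictionStats and its method names are hypothetical, and the sketch assumes that every buffer miss results in exactly one request to the class server.

```java
/** Counters a client class loader could keep to measure prediction quality. */
final class PredictionStats {
    private long bufferHits;      // requests satisfied by the client buffer
    private long serverRequests;  // requests that had to go to the class server

    void recordBufferHit() { bufferHits++; }
    void recordServerRequest() { serverRequests++; }

    /** Fraction of class requests not found in the client's local cache. */
    double missRate() {
        long total = bufferHits + serverRequests;
        return total == 0 ? 0.0 : (double) serverRequests / total;
    }

    /** Ratio of buffer-satisfied requests to server-satisfied requests. */
    double effectiveness() {
        return serverRequests == 0
                ? Double.POSITIVE_INFINITY
                : (double) bufferHits / serverRequests;
    }
}
```

With three buffer hits and one server request, the miss rate is 0.25 and the effectiveness is 3.0.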
Software Design Patterns
One principle of object-oriented analysis and design is to look for software
design patterns that characterize the specific problem being researched.
These patterns are reusable solutions to recurring problems that occur
during software development. By abstracting the problem into a design
pattern, effective design reuse is accomplished. The following patterns were
found to be relevant to the project at a high level (Grand 1998).
Facade: this pattern simplifies access to a related set of objects by providing
one object that all objects outside the set use to communicate with the set,
thereby hiding complexity. The overall class distribution system from the
database acts as a virtual application facade for the client.
Dynamic Linkage: this pattern allows a program to load and use arbitrary
classes that implement a known interface. The Java linking model is built
solidly upon this polymorphic foundation. This is used in particular for the
specialized client class loaders that retrieve classes from the class server.
Virtual Proxy: this pattern delays the instantiation of an object until it is
actually needed (lazy instantiation), which is the basis for the class
distribution mechanism. In a virtual proxy, clients access the object indirectly
through a proxy object that implements the same interface. A logical
extension of this idea is to initially deliver a virtual object to a client
containing only method stubs. This technique grows in appeal the longer the
object derivation hierarchy.
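A minimal sketch of the Virtual Proxy pattern follows. The interface Report and the proxy LazyReport are hypothetical names used only to show lazy instantiation behind a shared interface; they do not appear in the thesis.

```java
import java.util.function.Supplier;

/** Interface shared by the real object and its proxy. */
interface Report {
    String render();
}

/** Proxy that creates the real object only when it is first used. */
final class LazyReport implements Report {
    private final Supplier<Report> factory;
    private Report target;            // stays null until the first call

    LazyReport(Supplier<Report> factory) { this.factory = factory; }

    boolean isLoaded() { return target != null; }

    @Override
    public String render() {
        if (target == null) target = factory.get();   // lazy instantiation
        return target.render();
    }
}
```

In the streaming model, the factory would correspond to the step that fetches and activates the real class, so a client holding only the proxy pays that cost on first use.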
Cache Management: this pattern allows fast access to objects that would
otherwise take a long time to access, by retaining a copy of expensive
objects after the immediate need for the object is over. There are ordinarily
read and write consistency issues with caches, but the proposed model
avoids these problems by wrapping the cache buffer within the producer-
consumer pattern. The class server only inserts into the buffer, and the
client only reads or deletes from the buffer. The specific cache management
issues that must be addressed include:
- deciding which and how many classes to keep in memory
- the enforcement of a class discard policy
- what to do if a particular class is larger than the class buffer size.
Figure 1 diagrams a generic cache management pattern.
Figure 1: Cache Management Pattern
Producer-Consumer: this pattern coordinates the asynchronous production
and consumption of objects, and forms the basis of the class distribution
mechanism. A hybrid queue is used because items can sometimes be
removed from anywhere inside the queue buffer. Note that some items may
never be activated at all. Refer to Figure 2 for a generic diagram of a
producer-consumer collaboration.
Figure 2: Producer-Consumer Collaboration
Mathematical Models
Queuing Models
Standard queuing theory is able to predict the behavior of random variables
under the following set of assumptions (Hennessy and Patterson 1996, 507-515):
- The system is in equilibrium.
- The inter-arrival times are exponentially distributed.
- The number of requests is unlimited (an infinite population model).
- There is no limit to the length of the queue, and it follows a FIFO
  discipline.
- All tasks in line must be completed.
- The system is memoryless, where the past history of events has no
  impact on the probability of an event occurring now.
- Items can only be removed from one end of the queue.
Each of the above assumptions is violated, making formal queuing theory
unsuitable for this model. Informally, the consumer buffer of distinct and
non-duplicated classes still behaves as a queue under specific
circumstances.
Grammatical Models
Some research has been done (Bell 1989, 571) that considers the
grammatical structure of a programming language as the basis for predicting
program behavior. Grammatical models have proven to be useful for
analyzing the dependencies of while and do-while looping structures, for
loops, if-then-else conditionals, switch statements, try-catch-finally blocks
and even recursive method structures. But classes activated inside loops,
recursions and conditional or switch blocks need only be distributed once to
a client, and the practice of instantiating classes within catch blocks is
discouraged. Reconciling these language structures with class activation
can be adequately handled by transition probabilities. Directly modeling the
grammatical structure of the Java language does not appear to capture the
essence of the proposed problem, although it might be useful to record
which enclosing structure actually invoked a class.
For the purposes of this paper, an application can be modeled as a graph of
uniquely named, partially cyclic, partially hierarchical, activated Java
classes. This means that applications flow hierarchically, punctuated by
returns to previously encountered class nodes, such as in while and for loops.
The actual trace of an application is a subgraph of the same type since the
user traces a path through the possible class graph. The point is that a
predictive model must recognize that an application trace contains both
cyclic and hierarchical structure.
Transition Probability Model
The proposed model generates a set of probability distributions, one for
each context in which a class might be invoked. These contexts are termed
conditioning classes. The streams of class bytecodes, each persisted in its
own database row, act as the source domain of possible application
symbols. Its range would correspond to an actual trace through the possible
application class hierarchy per application session. Given a particular class,
the model enumerates all of the invocation probabilities of the classes that it
possibly references. Since the probability of transmitting a specific class is
conditionally dependent upon the class that calls it, the model establishes a
conditional probability graph (CPG) of P(classA | classB) probability values. A
class may be invoked by different classes; hence each class might have
multiple conditioning classes. The third dimension of this CPG is the
number of class invocations. From one perspective, class invocation
prediction implies distributing the globally most frequently accessed classes
to the client. Limiting the client buffer size implies distributing a more locally
defined subset of the class graph: the class most likely to be invoked next at
any point in the program. Good prediction means transmitting those classes
most likely to be consumed by the client.
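The conditional probability graph can be approximated from invocation counts, as in the sketch below. The class TransitionModel and its methods are hypothetical; the probabilities are simple relative frequencies, which is one plausible estimator rather than the one necessarily used in the thesis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Count-based estimate of P(invoked class | conditioning class). */
final class TransitionModel {
    // counts.get(b).get(a) = times class a was invoked from conditioning class b
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void record(String conditioningClass, String invokedClass) {
        counts.computeIfAbsent(conditioningClass, k -> new HashMap<>())
              .merge(invokedClass, 1, Integer::sum);
    }

    /** Relative frequency of invokedClass among all invocations from conditioningClass. */
    double probability(String invokedClass, String conditioningClass) {
        Map<String, Integer> row = counts.get(conditioningClass);
        if (row == null) return 0.0;
        int total = 0;
        for (int n : row.values()) total += n;
        return row.getOrDefault(invokedClass, 0) / (double) total;
    }

    /** The k classes most likely to be invoked next, most probable first. */
    List<String> predictNext(String conditioningClass, int k) {
        Map<String, Integer> row = counts.getOrDefault(conditioningClass, new HashMap<>());
        List<String> names = new ArrayList<>(row.keySet());
        names.sort((x, y) -> {
            int byCount = Integer.compare(row.get(y), row.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);   // ties broken by name
        });
        return new ArrayList<>(names.subList(0, Math.min(k, names.size())));
    }
}
```

A server could call predictNext with k chosen from the client's buffer size, so that a smaller buffer receives only the locally most probable classes.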
A 1st-order, finite state, probabilistic Markov model is proposed for the
following reasons:
- The Markov model is suited to the local dependencies embedded in
  the Java class invocation structure.
- The finite-state machine model accurately reflects the necessary and
  unique set of state transitions that occurs in an application trace. Any
  trace defines a path through the model that reflects a specific sequence
  of class invocations.
- Since classes are conditionally invoked in an application trace, the
  model must characterize their invocations by probability values.
- There are no constraints imposed on the model by using threads.
A 1st-order Markov model is formulated generally as
$P(x_n \mid x_{n-1}) = P(x_n \mid x_{n-1}, x_{n-2}, \ldots)$,
where $\{x_n\}$ is a sequence of observations. Finite-state models are
based on finite-state machines possessing a set of states and a set of
transition probabilities where the probabilities of outgoing transitions from a
state must add up to 1; i.e., a transition must occur. The probability of any
message is the product of the transition probabilities going out of a certain
state, which can be represented using a state transition table. The entropy
of a finite state process with states $S_i$ (i.e., an application) is then simply the
average value of the entropy at each state:
$H = \sum_{i=1}^{M} P(S_i) H(S_i)$.
Indeed, using a state transition probability matrix, the entropy of a particular
class being invoked (a message) would be the sum of its transition
entropies through the matrix.
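The transition model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the implementation used in this work: the class name TransitionModel and its methods are hypothetical, probabilities are estimated as relative frequencies over observed traces, and P(S_i) is approximated by the share of all observed transitions that leave state S_i.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical first-order transition model estimated from class-invocation traces. */
class TransitionModel {
    // counts.get(a).get(b) = number of times class b was invoked directly after class a
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records every adjacent (previous class, next class) pair of one trace. */
    void observe(List<String> trace) {
        for (int i = 1; i < trace.size(); i++) {
            counts.computeIfAbsent(trace.get(i - 1), k -> new HashMap<>())
                  .merge(trace.get(i), 1, Integer::sum);
        }
    }

    private static long rowTotal(Map<String, Integer> row) {
        long total = 0;
        for (int c : row.values()) total += c;
        return total;
    }

    /** Estimates P(next | current); the outgoing probabilities of a state sum to 1. */
    double probability(String current, String next) {
        Map<String, Integer> row = counts.get(current);
        if (row == null) return 0.0;
        return row.getOrDefault(next, 0) / (double) rowTotal(row);
    }

    /** Entropy H(S_i), in bits, of the outgoing transitions of one state. */
    double stateEntropy(String current) {
        Map<String, Integer> row = counts.get(current);
        if (row == null) return 0.0;
        double h = 0.0;
        for (String next : row.keySet()) {
            double p = probability(current, next);
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** H = sum over states of P(S_i) H(S_i), with P(S_i) taken as the share of transitions leaving S_i. */
    double processEntropy() {
        long all = 0;
        for (Map<String, Integer> row : counts.values()) all += rowTotal(row);
        if (all == 0) return 0.0;
        double h = 0.0;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            h += (rowTotal(e.getValue()) / (double) all) * stateEntropy(e.getKey());
        }
        return h;
    }
}
```

A class whose successors are equally likely (for example, A followed by B in one trace and by C in another) has a state entropy of one bit, so prefetching for it is least certain; a class with a single observed successor has zero entropy and its successor can be streamed with confidence.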
It is possible to compare sets of application traces that all start from the
same Java main() method. For any application, those classes nearest the main()
root will have high probabilities of being invoked, even though they may be
called only once. The model must handle classes that are invariably invoked
as well as classes that are invoked frequently. It appears that two measures
of prediction must be used, assuming a fixed-length client buffer size:
- a measure of local effectiveness to predict and transmit only the
  classes that are needed by the client, just in time
- a measure of global efficiency to predict and transmit the most
  frequently invoked classes, overall
[Figure 3: Overall Architectural Model. Labeled components: Client PC; JVM1 on Client1; Class Browser; ClassLoader-NameSpace 1 (Connection 1A); ClassLoader-NameSpace 2 (Connection 2A, Connection 2B); Class Method Cache (CMC); TCP/IP Connection; Class Streamer; Streaming Distribution Monitor; Class Name Resolution Service; Event Notification System; Consumer-Producer Channel (CPC); Common Cache; Client1 Cache; Client2 Cache; Class Database; Performance Monitor; Database Security Interface; Thread Pool; Connection Pool; Database Server 1.]
The overall architectural model is diagrammed in Figure 3. Note the use of
caches on both the server and the client.
The Java Architecture
Fundamental to the proposed model is the architecture of Java. Java
signifies several things: a programming language, a virtual machine
specification, an application programming interface, and a class file format
(Eckel 2000; Gosling et al. 2000). This proposal focuses on the virtual
machine and the class file format, specifically the class loading mechanism,
and how they interact over a logical network connection.
Deployment Forms
Java is currently packaged into three deployment forms that are scalable
supersets of each other. The minimal API set for embedded devices is
called Java 2 Platform, Micro Edition (J2ME). The desktop API is called
Java 2 Platform, Standard Edition (J2SE). The server API is called
Java 2 Platform, Enterprise Edition (J2EE). Importantly, this analysis
pertains to all three platforms, although there are significant limitations
regarding J2ME.
J2ME specifies a virtual machine that works within the much
stricter memory constraints of Connected Limited Device Configuration
(CLDC) devices. Currently, J2ME has no floating point support, no
finalization, fewer error handling classes, no Java Native Interface (JNI), no
user-defined class loaders, no reflection (and therefore no object serialization
or RMI), no thread groups or daemon threads, no JDBC support, and no weak
references.
Internal Architecture
Java uses a single-inheritance object-oriented model. All objects are derived
from Object at the top of the inheritance tree. Java does not support
multiple inheritance because of name clashes and the possibility of acyclic
class libraries.
The internal architecture of the run time virtual machine is based solidly on
stacks and threads. All threads in one virtual machine instantiation share
one common method area and one common heap. Each thread has its own
set of logical registers, a Java method stack and a native method stack.
Every Java thread also has an associated stack of its ongoing method
invocations. The virtual machine recognizes two fundamental types:
primitives and references. The reference type references object classes,
interfaces and arrays. Java does not define access to machine registers.
Java was designed specifically with networks in mind. It is suitable for
networked environments because it enables secure, robust, platform-
independent software objects, both code and state, that are mobile over the
network. At a structural level, this process is assisted by the Java class file
format and the dynamic linking mechanism.
Java Security Model
Java's security model is one of the language's key architectural features that
makes it an appropriate technology for networked environments because it
establishes a needed trust in the safety of network-mobile code. The
security model focuses on protecting users from hostile programs
downloaded from untrusted sources across a network. To accomplish this
goal, Java provides a customizable "sandbox" in which Java programs run.
The fundamental components responsible for Java's sandbox are:
- multiple safety features built into the JVM and the language itself
- the class loader architecture
- the class file verifier
- the security manager and the Java API
Two of the more important safety features include structured memory
access (i.e., no pointer arithmetic) and the unspecified memory layout of the
JVM run time data areas. This way, a class file cannot contain any run time
memory addresses. Note that the use of native methods circumvents the
standard Java security mechanisms.
Class Loading Mechanism
Class loaders are responsible for importing binary data that define the
running program's classes and interfaces. The JVM has a flexible class
loader architecture that allows a Java application to load classes in custom ways.
Multiple class loaders can be defined by the user in addition to the
primordial (default) system class loader. Because of class loader objects, it
is not necessary to know at compile-time all the classes that may ultimately
take part in a running Java component. They enable a Java component to
be dynamically extended at run time. A component can determine what
extra classes it needs and load them through one or more class loader
objects at run time. Indeed, the power and flexibility of Java is revealed in
how classes are loaded and incorporated into a client's virtual machine.
Figure 4 illustrates the JVM's internal structure (Venners 1999, 137).
Java Class Activation
The process of class loading brings a binary class stream into the JVM.
Linking incorporates the stream into the run time state of the JVM. Linking is
divided into three steps: verification, preparation, and resolution. Verification
ensures that the stream is properly formed; preparation allocates memory
needed by the stream; and resolution transforms symbolic class references
into direct machine references. Finally, the process of class initialization
assigns class variables with their proper initial values. Loading, verification,
preparation, resolution and initialization must take place in that order,
although resolution can occur after initialization. The entire process is called
class activation, and it is driven by the first active use rule.
Invoking a class method requires that the class first be activated. The class
is represented in client memory by the Java class file, a precisely
defined binary file format for Java classes. Each Java class file represents a
complete description of exactly one Java class or interface.
Instances of the class Class represent classes and interfaces in a running
Java application. Every array also belongs to a class that is reflected as a
Class object that is shared by all arrays with the same element type and
number of dimensions. The primitive Java types (boolean, byte, char, short,
int, long, float, and double) and the keyword void are also represented as
Class objects. Class has no public constructor. Instead, Class objects are
constructed automatically by the JVM as classes are loaded and by calls to
the defineClass() method in the class loader.
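These properties of Class objects can be checked directly with the standard reflection API. The following is a small sketch; the class name ClassObjectDemo and its method names are chosen here for illustration only.

```java
/** Hypothetical demo of how arrays, primitives and void are reflected as Class objects. */
class ClassObjectDemo {
    /** Arrays with the same element type and number of dimensions share one Class object. */
    static boolean arraysShareClass() {
        int[] small = new int[3];
        int[] large = new int[10];
        return small.getClass() == large.getClass();
    }

    /** A different number of dimensions means a different Class object. */
    static boolean dimensionsDiffer() {
        Class<?> oneDim = int[].class;
        Class<?> twoDim = int[][].class;
        return oneDim != twoDim;
    }

    /** Primitive types and void are represented by predefined Class objects. */
    static boolean primitivesHaveClassObjects() {
        return int.class.isPrimitive() && void.class.isPrimitive();
    }

    /** Class exposes no public constructor; the JVM creates its instances. */
    static boolean noPublicConstructor() {
        return Class.class.getConstructors().length == 0;
    }

    public static void main(String[] args) {
        System.out.println(arraysShareClass());
        System.out.println(dimensionsDiffer());
        System.out.println(primitivesHaveClassObjects());
        System.out.println(noPublicConstructor());
    }
}
```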
Active Use Initialization
Since the delivery mechanism operates upon class activation, its predictive
effectiveness depends exactly upon when a JVM first initializes a class into
active use. There are six situations in which a Java class or interface is
initialized for active use (Venners 1999, 239):
- when the new operator creates a class instance, or when an instance is
  created implicitly or through reflection, cloning or deserialization,
  including Class.newInstance(), Object.clone() and
  ObjectInputStream.readObject()
- the invocation of a static method declared by a class
- the use or assignment of a static field declared by a class or interface,
  except for static fields that are final and are initialized by a compile-time
  constant expression
- the invocation of certain reflection methods, such as methods in class
  Class or in classes in the java.lang.reflect package
- the initialization of a subclass of a class
- the designation of the initial class (with the main() method) when the
  JVM starts
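The difference between an active use and a reference that does not trigger initialization can be observed with a static initializer. The sketch below is illustrative only; the class names InitLog, Config and ActiveUseDemo are hypothetical. Reading a compile-time constant does not initialize the declaring class, whereas invoking one of its static methods does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log that records the order of events. */
class InitLog {
    static final List<String> events = new ArrayList<>();
}

class Config {
    static final int LIMIT = 10;   // compile-time constant: reading it is not an active use
    static int counter = 0;

    static {
        InitLog.events.add("Config initialized");   // runs once, at first active use
    }

    static int next() {
        return ++counter;
    }
}

class ActiveUseDemo {
    /** Call once per JVM: Config is initialized only the first time. */
    static List<String> run() {
        int limit = Config.LIMIT;                    // does not initialize Config
        InitLog.events.add("read constant " + limit);
        int n = Config.next();                       // static method call: first active use
        InitLog.events.add("called next() -> " + n);
        return new ArrayList<>(InitLog.events);
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

For the delivery mechanism, this means that a class referenced only through its constants never needs to be streamed at all, while the first static call or instantiation is the moment at which its bytecode must already be present on the client.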
In Java programs, classes can be instantiated explicitly or implicitly. The
four ways a class can be instantiated explicitly are:
- via the new operator
- by invoking newInstance() on a Class or java.lang.reflect.Constructor
  object
- by invoking clone() on any existing object
- by deserializing an object via the readObject() method of class
  java.io.ObjectInputStream
All other ways use implicit instantiation.
Activation Dependencies
Clearly, there are types of activations that are characterized by where and
when the activation occurs. Syntactically, classes can be activated inside
other class methods, inside constructors, as method arguments, as method
return types, as array elements, in the class extends or implements
declaration, and as inner classes. Note that whereas the superclasses of a
class must be recursively initialized, such is not true for a class's
superinterfaces. In fact, only an interface's static block is initialized.
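The asymmetry between superclasses and superinterfaces can be demonstrated with a short sketch. The names OrderLog, Marker, Base and Derived are hypothetical. The interface field is deliberately initialized by a method call, so that any initialization of the interface would leave a trace in the log; the interface declares no default methods, which matters on Java 8 and later, where such interfaces are initialized together with their implementing classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log that records which types were initialized, in order. */
class OrderLog {
    static final List<String> events = new ArrayList<>();

    static int note(String what) {
        events.add(what);
        return 0;
    }
}

interface Marker {
    // Not a compile-time constant: evaluated only if the interface itself is initialized.
    int TOKEN = OrderLog.note("Marker");
}

class Base {
    static { OrderLog.note("Base"); }
}

class Derived extends Base implements Marker {
    static { OrderLog.note("Derived"); }
}

class InitOrderDemo {
    static List<String> run() {
        new Derived();   // active use of Derived: Base is initialized first
        return new ArrayList<>(OrderLog.events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The superclass appears in the log before the subclass, and the superinterface does not appear, which is why only the superclass chain has to be transmitted ahead of a class.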
The temporal dependencies are even stronger. A class must be initialized in
a certain order. First, all the class's superclasses are recursively initialized,
then the class itself; then any inner classes, followed by the class's static
variables. If the class is initialized for active use, then the class's private
variables are initialized followed by all classes referenced in the constructor.
If a method is invoked, the method arguments and method return type are
initialized before any and all classes referenced in the method. The two
levels involved in this linear initialization order imply that a single class
name immediately generates the next series of classes, providing effective
compression and prediction. Table 1 defines the stages in class activation,
and the classes referenced in each stage.
STAGE 1a            | STAGE 1b              | STAGE 2
parent classes      | private variables     | method arguments
explicit interfaces | constructor variables | method return type
class itself        |                       | method references
inner classes       |                       |
static variables    |                       |
Table 1: Order of Class Activation
Model Bounding
The class server explicitly transmits only the classes referenced in Stage 1,
those that are necessary to initialize a class. By doing so, the model
assumes a single-level look-ahead as a Markov bound to the model size.
This assumption is reasonable if the application class hierarchy is relatively
flat; that is, if there are not too many classes with more than 3 user-defined
parent classes. The Java inheritance structure tends to force this
user-defined flatness for the following reasons:
- Java uses a single, not a multiple, inheritance structure.
- A Java class can only extend from one parent.
- The multiple interfaces that a class can implement (inherit) only declare
  constants and method signatures; they do not declare member classes.
- There are so many predefined standard Java classes available that
  only the smallest extensions are usually necessary.
Java Class File Format
The Java class file contains everything a JVM needs to know about exactly
one class or interface. A persistent class file or stream is defined in terms of
8-bit bytes, with multi-byte data items stored sequentially in big-endian
order and no padding.
The length of variable-length items precedes the actual data for the item.
The class file is organized in terms of multiple named tables, given in Table
2. The items in each table have a name, a type, and a count. The type is
either a table name or one of four unsigned primitives: u1 (1 byte), u2 (2
bytes), u4 (4 bytes) or u8 (8 bytes). The class file format follows this
order (Venners 1999, 194), and its constant pool is loaded directly into the
JVM's constant pool area, one for each class. The constant pool is the one
place where each class file keeps all of its symbolic references, and so
requires a detailed examination.
Type           | Name                | Count
u4             | magic               | 1
u2             | minor_version       | 1
u2             | major_version       | 1
u2             | constant_pool_count | 1
cp_info        | constant_pool       | constant_pool_count - 1
u2             | access_flags        | 1
u2             | this_class          | 1
u2             | super_class         | 1
u2             | interfaces_count    | 1
u2             | interfaces          | interfaces_count
u2             | fields_count        | 1
field_info     | fields              | fields_count
u2             | methods_count       | 1
method_info    | methods             | methods_count
u2             | attributes_count    | 1
attribute_info | attributes          | attributes_count
Table 2: Format of a Class File Table
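Because the class file layout is fixed and big-endian, the leading items of the table can be read with a few buffer operations. The sketch below is an illustration only; the class name ClassHeader is hypothetical, and only the first four items (magic, minor_version, major_version, constant_pool_count) are decoded.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader for the first four items of a class stream. */
class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassHeader parse(byte[] stream) {
        ByteBuffer in = ByteBuffer.wrap(stream);      // ByteBuffer is big-endian by default
        if (in.remaining() < 10) {
            throw new IllegalArgumentException("stream too short for a class header");
        }
        if (in.getInt() != 0xCAFEBABE) {              // u4 magic
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = in.getShort() & 0xFFFF;           // u2 minor_version (unsigned)
        int major = in.getShort() & 0xFFFF;           // u2 major_version (unsigned)
        int poolCount = in.getShort() & 0xFFFF;       // u2 constant_pool_count (unsigned)
        return new ClassHeader(minor, major, poolCount);
    }
}
```

Since each class stream is persisted in its own database row, a check of this kind could be run when a stream is stored, so that rows which are not class streams are rejected before they are ever transmitted.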
this_class is a constant pool index to the CONSTANT_Class_info table.
super_class is another constant pool index to the fully qualified name of
this class's super-class. It is 0 for the Object class. It points to
java.lang.Object for interfaces.
interfaces is an array that contains one index into the constant pool to the
fully qualified name of each super-interface directly implemented by this
class or interface via the implements or extends keywords.
fields_count includes both class and instance variables. fields is a
list of non-inherited, explicitly defined, variable-length field_info tables which
contain the field's name, descriptor and modifiers. Inner classes are
dynamically added to this list and they are marked with the synthetic
attribute.
methods_count only counts explicitly defined, non-inherited methods.
methods is a list of variable-length method_info tables which contain the
method's name, descriptor and modifiers. If the method is not abstract and
not native, this table includes the number of stack words required for the
method's local variables, the maximum number of stack words required by
the operand stack, a table of exceptions caught by the method, a table of
checked exceptions, the bytecode sequence, and optional line number and
local variable tables.
The constant_pool list contains the constants associated with the class or
interface such as literal strings, final variable values, fully qualified class and
interface names and field / method names and descriptors. (A field
descriptor string indicates the field's type. A method descriptor string
indicates the method's return type and the number, order and types of its
parameters.) These entries are referred to by their integer index throughout
the class stream. The zeroth entry is always empty. Each entry starts with a
one-byte type tag.
The symbolic references in the constant pool use three special strings: fully
qualified names, simple names, and descriptors. References to classes and
interfaces are declared using fully qualified names. References to fields are
declared using a simple field name, descriptor and a fully qualified type
name. References to methods are declared using a simple method name,
descriptor and a fully qualified type name.
Field and method descriptors are defined by a context-free grammar. A
method descriptor can contain only as many parameters as will fit into 255
words. Most parameters occupy one word; long and double parameters
occupy two words.
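The descriptor grammar is easiest to see from a few concrete strings. The sketch below uses java.lang.invoke.MethodType, a standard API that was added to the platform after this work was written, to produce the descriptor for a given return type and parameter list; the class name DescriptorDemo is hypothetical.

```java
import java.lang.invoke.MethodType;

/** Hypothetical helper that renders method descriptors as they appear in the constant pool. */
class DescriptorDemo {
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(describe(void.class));                       // no parameters, returns void
        System.out.println(describe(int.class, String.class, long.class));
        System.out.println(describe(Object[].class, double.class));
    }
}
```

In these strings, primitives are single letters (I for int, J for long, D for double, V for void), object types are written as L followed by the fully qualified name with slashes and a closing semicolon, and each array dimension adds a leading bracket.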
Each of these tags has a corresponding table whose name appends _info
to the tag name. Importantly, the fully qualified class / interface names and
field / method names and descriptors are used at run time to link this class
code to other classes and interfaces. The access_flags indicate whether the
type is a class or an interface.
Each method declared in a class or interface or generated by the compiler is
described in the class stream by a variable-length method_info table. The
two types of compiler-generated methods are instance initialization methods
and class and interface initialization methods.
Attributes give general information about the particular class or interface
defined by the class stream. The JVM implementor is free to define his own
class attributes and place them into the class stream. The JVM specification
defines two types of attributes that can appear in the class stream:
SourceFile and InnerClasses. Attributes can appear in the ClassFile,
field_info, method_info, and Code_attribute tables. The nine types of
attribute_info tables are listed in Table 3.
Name               | Used By                 | Description
Code               | method_info             | The bytecodes and other data for one method
ConstantValue      | field_info              | The value of a final variable
Deprecated         | field_info, method_info | The field or method has been deprecated
Exceptions         | method_info             | The checked exceptions a method may throw
InnerClasses       | ClassFile               | A list of inner and outer classes
LineNumberTable    | Code_attribute          | A mapping of line numbers to bytecodes
LocalVariableTable | Code_attribute          | A description of local method variables
SourceFile         | ClassFile               | Name of the source file
Synthetic          | field_info, method_info | The field or method was compiler generated
Table 3: Types of attribute_info Tables
Method Area
When the JVM loads a class stream it parses the binary type stream and
then places that information into the run time method area, which all threads
share. (The other components of the run time environment are the heap, the
Java stacks, the program counter registers and the native method stacks.)
Whereas the class stream format is consistently defined for all JVM
implementations, the internal method area is completely implementation
dependent all the way down to byte order. The JVM designers make
engineering trade-offs about their target environment regarding the speed
and compactness of the method area. Even the form and data structure of a
fully qualified name in the method area is designer dependent.
However, a common set of type information is placed into the method area,
including:
- the fully qualified name of the object reference
- the name of the object's direct super-class, unless it's an interface
- whether this type is an object or an interface
- the set of class modifiers: public, abstract, final
- the list of direct super-interfaces
The following information for each loaded type is also placed into the
method area:
- constant pool data, which contains direct and symbolic references
- field descriptors, including name, type and modifiers
- method descriptors, including name, type, modifiers, and parameters
- all class (static) variables declared in the type, except constants
- a reference to the type's class loader
- a Class reference, which exposes the getName(),
  getSuperclass(), isInterface(), getInterfaces() and
  getClassLoader() methods of class Class
- bytecodes, operand stack sizes and local variables for non-abstract
  and non-native methods
Given a class reference, a method will always occupy the same position in
the object's method table, independent of the actual class of the object. The
table itself contains the method operand stack sizes, local variable sizes,
bytecodes and exceptions.
Given this information, what are the alternatives for notifying the client that a
class or method needs to be dynamically reloaded?
It is possible that direct references in the constant pool section of the
method area can be converted back into symbolic (string) references in the
event of an updated class or interface method, which would force the local
JVM to reload that bytecode from the database at the next invocation.
However, is it possible for a standardized conversion mechanism to manage
all of the various method area implementations? Probably not, although
most JVM implementations structure the constant pool in the same way and
simply use arrays of native pointers as method tables to the method area.
Whereas the Java class file format can be easily and consistently mapped to
database tables and queried for the sake of managing class activation, the
internal method area is of no use.
It is interesting to note that there are a few format restrictions that are
relaxed in a persisted class file. For instance, in the Java language one
cannot overload methods by varying only the return type, but one can in the
class file as long as their method descriptors are different.
Class Stream Verification
The Java class stream verification process has the tasks of finding and
loading the class, checking that the bytestream conforms to the basic
structure of a Java class, and then resolving symbolic references with direct
references for the sake of run time speed and efficiency. Symbolic
references contain a class name, a field or method name and a field or
method descriptor. They are formatted in the class stream as unresolved
text strings. The class stream verifier uses this information to ensure run
time, not compile-time, binary compatibility. A reference must always refer to
an object, and all references are remote. Direct references are usually
converted to native pointers or offsets to the method area.
There are four passes made by the class stream verifier.
In the first pass, the stream is checked to see whether it starts with the
magic four bytes 0xCAFEBABE and whether the major and minor version
numbers declared in the stream can be supported by the particular JVM
reading the stream. The last part of this first pass calculates the expected
total length of the type based on each sub-component type and its length.
The second pass makes semantic checks on the type data. Individual
components are inspected to make sure that they are well-formed instances
of their declared type. The verifier checks that the class has a super-class; that final
classes are not subclassed; that final methods are not overridden; that
constant pool entries are valid; and that all indexes into the constant pool refer to
the correct type of constant pool entry.
The third pass does the actual bytecode verification using data-flow analysis
on the class methods. It verifies the operand stack, class member
assignments, variable initialization, method invocation and parameter
passing, opcode operands and so on. The verifier tests whether some, but
not all, programs will execute in the virtual machine environment.
The fourth pass traverses and verifies symbolic references during the
process of dynamic linking. Implementation specifics may vary, but every
JVM must give the impression that it loads classes as late as possible, a
process called lazy activation. A symbolic reference string to another class
gives its full name. A reference to the fields of that class gives the class
name, field name and field descriptor. A reference to methods of other
classes gives the class name, method name and method descriptor. The
process of symbolic resolution locates and loads the class, then replaces
the symbolic reference with a direct reference, such as a pointer or offset.
These direct references are then stored in a structure for fast retrieval so the
symbolic reference does not have to be resolved again.
Rules of Binary Compatibility
Pass four essentially tests for run time binary compatibility (Gosling et al
2000). These rules define what can be added, changed or deleted in a class
without breaking binary compatibility with pre-existing class streams that
depend on the changed class; i.e., the class will link without errors. Clearly,
it is up to the class authoring interface to ensure that modified classes
that break the rules of binary compatibility are not reintroduced into the
database.
The following is a list of some binary compatible changes that the Java
programming language supports:
- adding new fields, methods, or constructors to an existing class or
  interface
- reimplementing existing methods, constructors, and initializers to
  improve performance
- changing methods or constructors to return values on inputs for which
  they previously either threw exceptions that normally would not occur or
  failed by going into an infinite loop or causing a deadlock
- deleting private fields, methods, or constructors of a class
- when an entire package is updated, deleting default (package-only)
  access fields, methods, or constructors of classes and interfaces in the
  package
- reordering the fields, methods, or constructors in an existing type
  declaration
- moving a method upward in the class hierarchy
- reordering the list of direct super-interfaces of a class or interface
- inserting new class or interface types in the type hierarchy
The class stream format has several properties which support source code
transformations that preserve binary compatibility. Venners (1999) has
complete details.
Class Loaders and the Class Loading Mechanism
Native Class Loaders
A class loader is an object that is responsible for loading classes into the
virtual machine. The Java 1.3.1 class library includes four class
loader classes. The root for three of them is an abstract class called
ClassLoader. Given the name of a class, it locates or generates data that
constitutes a definition for the class.
SecureClassLoader extends ClassLoader with additional support for
defining classes with an associated code source and permissions.
URLClassLoader extends SecureClassLoader, and is used to load
classes and resources from a search path of URLs referring to both JAR
files and directories.
RMIClassLoader provides static methods for loading classes from a
network location, one or more URLs, and obtaining the location from which
an existing class can be loaded. It is used in conjunction with RMI-based
systems.
The proposed model uses a Java class loader derived from the
SecureClassLoader class to load all subsequent classes.
SecureClassLoader uses a delegation model to search for classes and
resources. Each instance of it has an associated parent class loader. When
called upon to find a class or resource, a SecureClassLoader instance
will delegate the search for the class or resource to its parent class loader
before attempting to find the class or resource itself. Its defineClass()
method converts an array of bytes into an instance of class Class. Instances
of this newly defined class can then be created using the newInstance()
method in class Class, such as follows:
ClassLoader loader = new NetworkClassLoader(host, port);
Object main = loader.loadClass("Main").newInstance();
The new class loader subclass must define the methods findClass() and
loadClassData() to load a class from another location, such as follows:
import java.security.SecureClassLoader;
class NetworkClassLoader extends SecureClassLoader {
    private final String host;
    private final int port;
    NetworkClassLoader(String host, int port) {
        this.host = host;
        this.port = port;
    }
    // Called by loadClass() only after the parent loader fails to find the class.
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = loadClassData(name);
        if (b == null) throw new ClassNotFoundException(name);
        return defineClass(name, b, 0, b.length);
    }
    private byte[] loadClassData(String name) {
        // load the class data from the connection to host:port
        return null; // placeholder: the transport is not shown here
    }
}
Class Loading Security and Name Spaces
Class loaders place each loaded class into a protection domain. They
locate, load, link (verify, prepare, resolve) and initialize class streams in that
order. A SecureClassLoader performs many security-related duties, as
well. First, it will not attempt to load any classes in java.* packages from
over the network. This ensures that the JVM isn't tricked into using false
representations of the core class libraries that could break the Java security
model. Second, it provides separate name spaces for classes loaded from
different locations to ensure that classes with the same name loaded from
different hosts will not clash.
For each class it loads, the JVM keeps track of which class loader, whether
primordial or object, loaded the class (Venners 1999, 45). When a loaded
class first refers to another class, the JVM requests the referenced class
from the same class loader that originally loaded the referencing class.
Because the JVM takes this approach to loading classes, classes can by
default only see other classes that were loaded by the same class loader. In
this way, Java's architecture enables multiple name spaces to be created
inside a single Java application. A name space is a set of unique names of
classes loaded by a particular class loader. For each class loader, the JVM
maintains a name space, which is populated by the names of all the classes
that have been loaded through that class loader. Thus, there is a one-to-one
mapping between class loaders and name spaces in a JVM. Classes loaded
by different class loaders are in different name spaces and cannot gain
access to each other unless the application explicitly allows it.
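The one-to-one mapping between class loaders and name spaces can be observed by defining the same class stream through two different loaders. The sketch below is illustrative and makes several assumptions: TinyClassStream, SingleClassLoader and NameSpaceDemo are hypothetical names, the class stream is written by hand as a minimal class with no fields or methods, and the class name StreamedHello is arbitrary and must not exist on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Writes a minimal class stream (no fields, no methods) for a class in the unnamed package. */
class TinyClassStream {
    static byte[] forName(String className) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);   // writes big-endian
            out.writeInt(0xCAFEBABE);                  // magic
            out.writeShort(0);                         // minor_version
            out.writeShort(49);                        // major_version
            out.writeShort(5);                         // constant_pool_count = entries + 1
            out.writeByte(7); out.writeShort(2);       // #1 CONSTANT_Class, name at #2
            out.writeByte(1); out.writeUTF(className); // #2 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(4);       // #3 CONSTANT_Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 CONSTANT_Utf8
            out.writeShort(0x0021);                    // access_flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                         // this_class
            out.writeShort(3);                         // super_class
            out.writeShort(0);                         // interfaces_count
            out.writeShort(0);                         // fields_count
            out.writeShort(0);                         // methods_count
            out.writeShort(0);                         // attributes_count
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** A loader that can define exactly one class from an in-memory stream. */
class SingleClassLoader extends ClassLoader {
    private final String name;
    private final byte[] stream;

    SingleClassLoader(String name, byte[] stream) {
        super(SingleClassLoader.class.getClassLoader());
        this.name = name;
        this.stream = stream;
    }

    @Override
    protected Class<?> findClass(String requested) throws ClassNotFoundException {
        if (!name.equals(requested)) throw new ClassNotFoundException(requested);
        return defineClass(requested, stream, 0, stream.length);
    }
}

class NameSpaceDemo {
    static Class<?> load(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stream = TinyClassStream.forName("StreamedHello");
        Class<?> first = load(new SingleClassLoader("StreamedHello", stream), "StreamedHello");
        Class<?> second = load(new SingleClassLoader("StreamedHello", stream), "StreamedHello");
        System.out.println(first.getName().equals(second.getName()));  // same name
        System.out.println(first == second);                           // different types
    }
}
```

Both loaders produce a class with the same fully qualified name, yet the two Class objects are distinct, because each belongs to the name space of its own defining loader.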
Overriding findClass()
Writing a custom class loader essentially means overriding the
findClass() method, where at least five operations must be performed.
(Additional operations, like decryption, can also be performed here.) These
operations include:
1. Determining if the class is already loaded. The ClassLoader super-class
maintains a private Hashtable of previously loaded classes. To find
out if anything is loaded, ask the super-class with the findLoadedClass()
and findSystemClass() methods. It is necessary to check both, as not
only will a class be loaded through the custom class loader, but all of its
super-classes will be loaded that way, too.
2. If the class is not already loaded, loading the class file from a source
location. This loading could be accomplished using JDBC from a database,
using a network connection based on a URL, or through some other
deterministic mechanism.
3. Calling defineClass() to convert the bytestream into a Class. This
method throws a ClassFormatError if the byte array is invalid. The loader
should throw a ClassNotFoundException if defineClass() returns null, which
prevents accessing an erroneous class variable later.
4. Symbolically resolving the class. Until a class is resolved, class instances
cannot be created nor its methods called.
5. Returning the newly created class.
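The five operations listed above can be sketched as a single loadClass() override. This is a minimal illustration under stated assumptions, not the loader used by the proposed model: the class name RowClassLoader is hypothetical, and an in-memory map from class name to bytecode stands in for the database rows that would be fetched over JDBC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical loader; the map is a stand-in for class streams persisted in database rows. */
class RowClassLoader extends ClassLoader {
    private final Map<String, byte[]> rows;
    final List<String> fetched = new ArrayList<>();   // names actually read from the rows

    RowClassLoader(Map<String, byte[]> rows) {
        this.rows = rows;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Has this loader, or the system loader, already got the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findSystemClass(name);
                } catch (ClassNotFoundException notASystemClass) {
                    // 2. Load the class stream from the source location.
                    byte[] stream = rows.get(name);
                    if (stream == null) throw new ClassNotFoundException(name);
                    fetched.add(name);
                    // 3. Convert the stream into a Class (throws ClassFormatError if invalid).
                    c = defineClass(name, stream, 0, stream.length);
                }
            }
            // 4. Resolve (link) the class when requested.
            if (resolve) resolveClass(c);
            // 5. Return the class.
            return c;
        }
    }

    /** Convenience wrapper: returns null instead of throwing when the class is unknown. */
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

A request for a core class such as java.lang.String is answered in step 1 and never reaches the rows, which matches the requirement that classes in the java.* packages are not loaded from over the network.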
Java Linking Model
As mentioned in the section on class stream verification, linking incorporates
a binary type stream into the run time state of the JVM and is divided into
three steps: verification, preparation, and resolution.
Linking involves symbolic resolution, the process of replacing symbolic
references with direct memory references, as well as testing references for
existence and access permissions. Although different JVM implementations
differ in when they actually perform resolution, every JVM must give the
outward impression that late resolution is used so that throwing an exception
occurs when the class is actually used for the first time.
Dynamic linking implies dynamic extension deciding which classes to link
at run time, a departure from most other linking models. Previous to Java
1.2, there were two ways in which this could be accomplished. The first way
was to pass the class name to the j ava. lang. Class f orName ()
method, which returned a reference to the Class instance that represents
the loaded class. The second way was to override the loadClass ()
method of a user-defined subclass of classLoader.
The forName() method guaranteed that the class created had been
initialized and linked upon its return, whereas loadClass() did not
guarantee that the class was linked upon return. But loadClass() placed
a loaded class into a specified protection domain and forName() did not.
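The difference in initialization between the two calls can be observed directly. The following is a minimal sketch, assuming a hypothetical probe class (InitProbe) whose static initializer sets a flag held in a second hypothetical class (LinkTimingDemo); neither name comes from the model itself.

```java
/** Hypothetical probe class: its static initializer records that initialization ran. */
class InitProbe {
    static {
        LinkTimingDemo.probeInitialized = true;
    }
}

class LinkTimingDemo {
    static boolean probeInitialized = false;

    /** Returns {flag after loadClass(), flag after Class.forName()}. */
    static boolean[] observe() throws ClassNotFoundException {
        String name = InitProbe.class.getName();     // a class literal does not initialize
        ClassLoader loader = LinkTimingDemo.class.getClassLoader();

        loader.loadClass(name);                       // loads, but does not initialize
        boolean afterLoadClass = probeInitialized;

        Class.forName(name, true, loader);            // loads, links and initializes
        boolean afterForName = probeInitialized;

        return new boolean[] {afterLoadClass, afterForName};
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = observe();
        System.out.println("initialized after loadClass(): " + r[0]);
        System.out.println("initialized after forName():   " + r[1]);
    }
}
```

On a first call, the flag is still false after loadClass() and true after forName().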
These differences are pointed up by the parent-delegation model for class
loaders, introduced in Java 1.2, that determines which class loader actually
loads a given class. The rule is that the JVM uses the same class loader
that loaded the referencing (or calling) class to load the referenced (or
called) class. Each user-defined class loader is assigned a parent class
loader when it is created, or the system (bootstrap) class loader by default.
The delegation part of the model occurs when a class loader is asked to load
a class because it (recursively) first asks its parent to attempt to load the
class, all the way back to the system class loader. The process then
bounces back down to the first class loader that can actually load the
requested class, and this is not necessarily the first loader in the chain.
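Parent delegation can be made visible with a loader that adds nothing of its own. The sketch below uses hypothetical names (DelegationDemo, ChildLoader, chain()); it walks the parent chain and then asks the child loader for a core class, which ends up being defined further up the chain.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    /** A user-defined loader that defines no classes itself, so every request is delegated. */
    static class ChildLoader extends ClassLoader {
        ChildLoader(ClassLoader parent) {
            super(parent);
        }
    }

    /** Lists the chain from the given loader up to the bootstrap loader (represented by null). */
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> loaders = new ArrayList<>();
        for (ClassLoader l = start; l != null; l = l.getParent()) {
            loaders.add(l);
        }
        loaders.add(null);   // the bootstrap loader has no ClassLoader object
        return loaders;
    }

    public static void main(String[] args) throws Exception {
        ClassLoader child = new ChildLoader(ClassLoader.getSystemClassLoader());
        for (ClassLoader l : chain(child)) {
            System.out.println(l == null ? "bootstrap" : l.getClass().getName());
        }
        // The child is asked first, but the class is defined by a loader above it.
        Class<?> c = child.loadClass("java.lang.String");
        System.out.println("defining loader of String: " + c.getClassLoader());
    }
}
```

The defining loader reported for String is null, which is how the JVM reports the bootstrap loader.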
Loading constraints enforce type safety by ensuring that a name in one
name space must refer to the same type data in the method area as the
name in another name space. Type safety issues arise because multiple
name spaces inside a JVM can share types. At compile-time a type is
uniquely identifiable by its fully qualified name; but at run time it also needs
the defining class loader.
The specific mechanics of resolving constant pool entries, particularly the
CONSTANT_Class_info entry that represents class and interface
references, are given in (Venners 1999, 278-323). The steps are briefly
outlined as follows:
Load the class and any super-classes.
Check for class access permission.
Link and initialize the class and any super-classes.
Verify the class for language semantics.
Prepare the class (allocate memory).
Resolve the class.
Initialize the class.
Class Hierarchy Storage
What is the most effective way to persist a class hierarchy in a relational
database? The most important constraint is to return only the necessary
class subgraph, not an entire logical class hierarchy.
Although it is certainly possible to persist source code in database rows as a
text blob, there is no compelling reason to do so. Persisting Java bytecodes
offers the following advantages:
the bytecode does not have to be recompiled for each client
unlike machine code, the Java bytecode supports a heterogeneous
computing environment
the cost of source compilation and compression has been distributed
over many class authors
Several assumptions should be made explicit at this point. If a class author
has successfully compiled a class and integrated it into the database, the
development interface has resolved all symbolic class references whether
they exist within the database or in the default Java class libraries. More
specifically, successful compilation would resolve any overloaded, overriding
and inherited fields and methods.
The database schema will persist references to each class file format as
well as a class's conditional probability graph (CPG), and an enumeration of
all of the transition probabilities of the classes that it possibly invokes. Each
class is clustered and indexed on its CPG by database keys to provide
locality of reference during retrieval.
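One way to picture the enumeration of transition probabilities is as a weighted graph keyed by class name. The following sketch is an assumption made for illustration only: TransitionGraph, addTransition() and likelySuccessors() are hypothetical names, and the in-memory maps stand in for the database tables. It returns only the successors of a class whose transition probability meets a threshold, most likely first, which reflects the goal of retrieving only the necessary class subgraph.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory form of the per-class transition probabilities. */
class TransitionGraph {
    // invoking class name -> (invoked class name -> transition probability)
    private final Map<String, Map<String, Double>> edges = new LinkedHashMap<>();

    void addTransition(String from, String to, double probability) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, probability);
    }

    /** Successors of 'from' with probability at or above the threshold, most likely first. */
    List<String> likelySuccessors(String from, double threshold) {
        Map<String, Double> out =
                edges.getOrDefault(from, Collections.<String, Double>emptyMap());
        List<Map.Entry<String, Double>> candidates = new ArrayList<>();
        for (Map.Entry<String, Double> e : out.entrySet()) {
            if (e.getValue() >= threshold) {
                candidates.add(e);
            }
        }
        candidates.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Double> e : candidates) {
            names.add(e.getKey());
        }
        return names;
    }
}

class TransitionGraphDemo {
    public static void main(String[] args) {
        TransitionGraph g = new TransitionGraph();
        g.addTransition("app.Main", "app.Parser", 0.7);
        g.addTransition("app.Main", "app.HelpDialog", 0.1);
        g.addTransition("app.Main", "app.Editor", 0.9);
        System.out.println(g.likelySuccessors("app.Main", 0.5));
    }
}
```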
The Persisted Class File
Whereas classes have identity, state, and behavior (methods) in addition to
data, the relational database used in the model will only persist class
methods and static class data. The model is not concerned with the identity
or state of individual objects per se, only their schematic representation as
classes that can be instantiated by a virtual machine. The role of the
database in this model is to persist the class bytecode stream and as much
of the class file information as needed in order to:
reconstruct Java objects so that they may be executed in some client
process space; and
compute class activation transition probabilities
The model side-steps discussion of the many mapping-classes-to-tables
issues raised by other researchers who seek to reconcile the normalization
of relational data (i.e., eliminate redundant data from tables) with the goal of
object-oriented design, which is to model a business process by creating
real-world objects with data and behavior.
Containers
A client invokes a specific Java main() method as an entry point into the
system to initiate an application trace. There are many different main()
methods in the database, and it is up to the client navigation interface to
map unique class identifiers to client naming schemes.
The idea of an application container conveniently names an entire class
traversal graph, starting from a particular main() method. Clients can
navigate containers via menuing or navigation pages. Once the starting path
to a particular component is identified, that path is bookmarked by the client,
just as a favorite web page would be. The client community may
decide to impose a hierarchy of names reflecting various business
organizational relationships over the available components for navigational
and security reasons.
Database Transactions
Clearly, persistence is expected to operate with transactional integrity so
that the database is maintained in a consistent state. The class author,
when inserting, updating or deleting a particular class, depends upon the
transaction that uses an all-or-nothing approach where all actions must
either succeed or be rolled back. This type of transaction is short-lived and
infrequent, but it requires an optimistic locking mechanism and must be
carefully designed to avoid performance degradation.
However, the authoring transaction impacts upon the semantic transaction
involved in pursuing an application trace. A class subgraph might be
modified in the update or delete process, and the associated class set may
or may not be immediately available to clients. Although it is possible for the
class authoring interface to enforce a short-lived transaction by writing new
or updated rows all at once, it seems prudent that classes and applications
be marked with an availability flag during this type of transaction.
Traversing an application trace is a long-lived and frequent pseudo-
transaction. Technically, it is not a database transaction because the many
select statements that are dynamically generated during the course of an
application trace can be completely satisfied. However, semantically, a
retrieved class set may not be exactly what the client expects if one of the
classes has just been updated. The client or class authoring interface might
specify a set of deployment preferences that indicates how and when the
classes in a particular application trace should be updated, or simply
generate a client notification event. This event is easily initiated by a
database trigger, which has the format: on event if condition
then action. The insertion, update or deletion of a class then either
simply notifies the relevant set of clients, or actually pushes the updated
classes immediately to the clients.
Database Entity-Relation Diagram
The table schema is static, as shown in Figure 5. Once it is implemented,
the number of database entities and their relationships do not change.
Figure 5: Database Entity-Relation Diagram
Class Update Notification Mechanism
There are several expectations that must be addressed when considering
how and when a client would receive a new or updated class. The following
constraints would be expected as normal behavior:
a client will cache his favorite applications onto local disk, if possible
a client will sometimes want the most recent version of an application
a client will sometimes keep an older version of an application
automated version identifiers are associated with every class
automated version identifiers are associated with every application
trace
The following scenarios should be avoided:
forcing the class server to maintain client state, that is, knowing what
classes and versions are cached locally on each client
having the client transmit locally cached class name and version
numbers to the class server
It is possible to automatically notify clients when classes within an
application trace have been changed by persisting these class differences
as a list in the database through the class authoring interface. It is also
possible for the client navigation component to manually ping the class
server for these differences for a particular application. The more practical
approach simply reloads the updated class at the next invocation. The best
compromise appears to be one in which the client informs the class server
what it wishes to be done at application start, in one of three ways:
The client does not wish any updates to this application at this time.
The client pulls the most recent component versions at client
invocation.
The server pushes the most recent component versions to the client.
Version Control and Binary Compatibility
The client transmits his Java run time version when requesting an
application from the class server. The Java development version that last
compiled a class will also be stored with the class in the database. These
two pieces of information partially resolve the issue of component versioning
because the class streamer will prevent incompatible Java classes from
being transmitted to a client Java environment. This check will
prevent linkage errors by providing only binary compatible class versions to
clients.
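One concrete form such a check could take is a comparison of the class file version against the highest version the client run time accepts. The sketch below is an assumption about how this might be done, not the model's actual mechanism; ClassFileVersion, majorVersion() and isLoadableBy() are hypothetical names. It relies only on the class file header layout: a 4-byte magic number 0xCAFEBABE followed by a 2-byte minor version and a 2-byte major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassFileVersion {
    /** Reads the major version from the class file header (magic, minor, major). */
    static int majorVersion(byte[] classBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file: bad magic number");
            }
            in.readUnsignedShort();          // minor_version, not needed here
            return in.readUnsignedShort();   // major_version
        }
    }

    /** True when a run time accepting class files up to maxSupportedMajor can link this class. */
    static boolean isLoadableBy(byte[] classBytes, int maxSupportedMajor) throws IOException {
        return majorVersion(classBytes) <= maxSupportedMajor;
    }

    public static void main(String[] args) throws IOException {
        // Header only: magic 0xCAFEBABE, minor 0, major 48 (the level introduced with Java 1.4).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 48};
        System.out.println("major version: " + majorVersion(header));
        // A Java 1.3 run time accepts class files up to major version 47.
        System.out.println("loadable by a 1.3 run time: " + isLoadableBy(header, 47));
    }
}
```

A class streamer could store the major version alongside each class at authoring time and filter on it before transmission, so that the bytes never need to be re-read per request.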
Serialization and Versioning
The Java mechanism used to detect class versioning differences is
instructive. Both the class file format and the Java serialization mechanisms
automatically assign a unique identifier to all persistent-capable classes in
order to manage object version compatibility during serialization. Any class
that implements the Serializable interface is versionable.
This serialization identifier is maintained in a 64-bit integer field declared as:
static final long serialVersionUID. Thus, to control versioning,
provide the serialVersionUID field manually and ensure that it is always
the same, no matter what changes are made to the class. (A utility that
comes with the JDK distribution called serialver displays the default
identifier. When given a fully-qualified class name, serialver calculates
and displays the serialVersionUID for the class.) The default
serialVersionUID is created by calculating a 64-bit hash of the
following information about the class:
the class name
the class modifiers
a sorted list of interface names implemented by the class
the name, modifiers, and descriptor of each field, sorted by field name,
except for private static and private transient fields
the name, modifiers, and signature of each method, sorted by method
name, except for private methods and constructors
Before calculating the serialVersionUID, the ObjectOutputStream
checks the class it's serializing for a variable called serialVersionUID.
If it finds the variable, that number, instead of the calculated value, is written
to the output stream. So, if in a new version this variable is set to the
serialVersionUID of a previous class, ObjectInputStream will get
the number it's expecting from the stream: the serialVersionUID of the
previous class. The convention is that all subsequent compatible versions of
a class file return the same number, that is, the serialVersionUID of
the earliest compatible version of the file. Multiple, newer versions of a class
file may be scattered around a network, but so long as the original contract
isn't violated, they can all be considered equivalent, and
ObjectInputStream will pass them. Note that the original class version
doesn't need to declare this number. Nothing needs to be done to the initial
version of a class to prepare it for versioning at some future time.
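The identifier can be inspected at run time through java.io.ObjectStreamClass, which reports the declared value when one exists and the computed default otherwise. The sketch below uses hypothetical class names (PinnedAccount, UnpinnedLedger, SerialVersionDemo); in Java source code the field name is spelled serialVersionUID.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical class that pins its stream identifier explicitly. */
class PinnedAccount implements Serializable {
    private static final long serialVersionUID = 1L;
    String owner;
}

/** Hypothetical class without an explicit identifier; a default value is computed. */
class UnpinnedLedger implements Serializable {
    int entries;
}

class SerialVersionDemo {
    /** Returns the identifier that would be written to a stream for this class. */
    static long uidOf(Class<?> type) {
        // lookup() returns null for classes that are not serializable.
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("PinnedAccount (declared):  " + uidOf(PinnedAccount.class));
        System.out.println("UnpinnedLedger (computed): " + uidOf(UnpinnedLedger.class));
    }
}
```

The computed value changes when the shape of the class changes, whereas the declared value stays the same until the author edits it.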
Maintaining Object Compatibility
The key to managing object version compatibility is to identify which kinds of
changes may cause serialization-incompatibilities between versions and
which won't, and to treat these cases differently. Serialization-compatible
changes include adding a method or a field. Incompatible changes include
changing an object's hierarchy or removing the implementation of the
Serializable interface. Any private static or private
transient fields never make it into an object stream, and private methods
and constructors can't be called from outside of the class, so none of these
can affect the contract.
The serialization-compatible changes that can be made to a class file are:
Adding fields.
Adding readObject() or writeObject() methods. So long as an
added readObject() or writeObject() method calls
defaultReadObject() or defaultWriteObject(), the formatted
object information will remain the same, and the previous version will be
able to read the data. (The default methods must always be called before
processing any optional object information.)
Changing the access to a field -- public, package, protected, and
private don't affect how or if a field can be serialized.
Changing a field from static/transient to nonstatic/nontransient. Since in
the previous version these fields would not have been serialized, this is
equivalent to adding a field.
Changes that usually break class caller contracts include:
Moving a class up or down in the inheritance tree. Since serialization
traverses the inheritance tree, changing this order would cause the data
in the stream to be out-of-sequence.
Changing a field to transient or static.
Deleting a field.
Eliminating or adding a call to defaultReadObject() or
defaultWriteObject() in any readObject() or writeObject()
methods.
Removing Serializable or Externalizable, or changing
between the two, since the stream data will not be structured as the input
stream expects.
Changing the data type of a field of built-in type, since the object
reading will be trying to get data for one type and receive data for
another.
A complete list of compatible and incompatible changes is given in the Java
Serialization Specification (Sun Microsystems 1998).
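A compatible change of the kind listed above, adding readObject() and writeObject() methods that call the default methods first, might look as follows. This is a sketch with hypothetical names (Reading, CompatibleChangeDemo); the optional data written after the default fields is an arbitrary example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical versioned class: the custom stream methods were added in a later version. */
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;   // unchanged across compatible versions
    double celsius;
    transient double fahrenheit;                        // derived, never written to the stream

    Reading(double celsius) {
        this.celsius = celsius;
        this.fahrenheit = celsius * 9 / 5 + 32;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // default field data is written first
        out.writeUTF("celsius");           // optional data follows the default data
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();            // must precede any optional data
        in.readUTF();                      // consume the optional data written above
        fahrenheit = celsius * 9 / 5 + 32; // rebuild the transient field
    }
}

class CompatibleChangeDemo {
    static Reading roundTrip(Reading r) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Reading) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Reading copy = roundTrip(new Reading(100));
        System.out.println(copy.celsius + " C = " + copy.fahrenheit + " F");
    }
}
```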
Java ARchive Format
The Java ARchive (JAR) format is the standard way to compress Java class
files for transport over the network. The JAR format is a set of conventions
for associating digital signatures, installer scripts, and other information with
class files in a directory. Signing tools such as the JAR Packager use this
format to create JAR archive files, which are used by client software to
support automatic software installation, client-controlled access to local
system resources by Java applets, and other features that help address
potential security problems.
The JAR file type is a registered internet MIME type based on the standard
cross-platform ZIP archive format. A JAR file functions as a digital envelope
for a compressed collection of files. The JAR file type is distinct from the
JAR format, which is simply a way of organizing information in a directory.
A JAR archive has a subdirectory of meta-information named META-INF.
This subdirectory contains the following information:
A single, ASCII-encoded manifest file named MANIFEST.MF. Manifest
files can contain arbitrary information about the files in the archive, such
as their encoding or language.
Zero or more signature instruction files named name.SF. There is one
of these files for each entity that has signed files in the archive.
Zero or more digital signature files named name.suf, where the suffix is
determined by the digital signature format. There is at least one of these
files for each signature instruction file.
In addition to the META-INF subdirectory, the archive contains
whatever files are to be packaged in the archive.
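The layout described above can be reproduced with the java.util.jar classes. The sketch below builds a JAR in memory with a manifest and one entry and then reads it back; JarRoundTrip and its methods are hypothetical names, and the entry content is a stand-in for real class bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarRoundTrip {
    /** Builds an in-memory JAR holding META-INF/MANIFEST.MF plus one entry. */
    static byte[] buildJar(String entryName, byte[] content) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version is the conventional first main attribute.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content);
            jar.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Lists the names of the entries that follow the manifest. */
    static List<String> entryNames(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry e = jar.getNextJarEntry(); e != null; e = jar.getNextJarEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** Reads the Manifest-Version attribute back from the archive. */
    static String manifestVersion(byte[] jarBytes) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            return jar.getManifest().getMainAttributes()
                    .getValue(Attributes.Name.MANIFEST_VERSION);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] placeholder = {1, 2, 3};   // stand-in for real class bytes
        byte[] jar = buildJar("app/Hello.class", placeholder);
        System.out.println("entries: " + entryNames(jar));
        System.out.println("Manifest-Version: " + manifestVersion(jar));
    }
}
```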
Security Management
Architectural security is necessary to control who can introduce methods
into the database and who can invoke these methods. Traditional security
schemes either associate a specific list of resources and permissions with a
client or group profile record, where the default behavior forbids access to a
resource; or assign each specific resource a set of static access codes
that may be granted to a client's profile.
Security management can be introduced into the system at five access
points:
through the database security mechanisms
through the standard Java 2 Security Manager implementation
through package and class access modifiers (public, private, and final)
through the RMI security mechanisms
through the new Java Authentication and Authorization Service
Database Security Mechanisms
Since the database beneath the class server is effectively hidden from
remote clients, only the class authoring interface that decides to access the
database tables directly should integrate any native database security
mechanisms. The server might use only two types of accounts (say, reader
and writer) to access the database tables.
Standard Java Security
The standard Java 2 security mechanisms, including digital signatures and
certificates, are explicitly written into Java code and can be supported
without change. The Security Manager controls access to external
resources from within the JVM. It specifies and enforces a security policy
per protection domain. The primary domain in this system is the class server
database itself, which would have permissions assigned to it as a code
source.
Class Access Modifiers
Explicitly controlling read access to an application trace appears
complicated at first because the client may or may not have access to a
particular class during the process of traversing a trace. However, it is safe
to rely on Java's built-in security mechanisms for preventing the compromise
of private bytecode. A private class cannot serve as the entry point for an
application trace. In addition, code obfuscation techniques and the digital
signing of individual classes may be used to further decrease the risk of
software theft or fraud.
The class access modifiers are also used to control access by other class
authors. Public, private, final and protected classes are entry
points defined by the class author into the functionality of the class
hierarchy. It is conceivable that the class streamer could be configured to
read and write (i.e., override) particular classes based upon code source
signing, client certificates and the process of client authentication.
A JVM only grants package-level access between classes loaded into the
same package by the same class loader. This produces, in effect, a run time
package which can also be used to secure class access.
RMI Security
The Java Remote Method Invocation mechanism is discussed in greater
detail later on, but its security requirements are mentioned here. It is not
mandatory to set a security manager to use RMI, unless the RMI client
needs to handle serialized objects for which the client does not have a
corresponding class file in its local classpath. If the security manager is set
to RMISecurityManager, the client can safely download and instantiate
class files from an RMI server. This mechanism is actually fairly important to
RMI, as it allows the server to generate subclasses for any Serializable
object and provide the code to give these subclasses to the client.
It is entirely possible to use RMI without setting the security manager, as
long as the client has access to definitions for all objects that might be
returned from the RMI server. RMI's ability to handle the passing of any
object at any time using serialization and class file downloading is possible
only because the JVM provides a portable and secure environment for
passing around Java bytecodes from which Java objects can be
reconstructed at run time.
Java Authentication and Authorization Service
Authorization is secure access to system resources based on who a person
is or what role or group a person belongs to. The unit of authorization in
conventional database systems is usually an entire table or a table column
but never a single record. The Java Authentication and Authorization
Service (JAAS), available for the first time in the Java 1.3 platform and to be
integrated into the Java 1.4 platform, exercises access control based on the
clients who are executing code. It allows a flexible access control policy for
client-based, group-based, and role-based authorization as well as single
sign-on support (Sun Microsystems 2002). JAAS is the obvious choice for
the authentication and authorization of mobile code.
The Java Authentication and Authorization Service is a Java package that
enables services to authenticate and enforce access controls upon users. It
provides a means to enforce access controls based on where code came
from and who signed it. The need for such access controls derives from the
distributed nature of the Java platform, where, for instance, a remote applet
may be downloaded over a public network and then run locally.
The JAAS infrastructure can be divided into two main components: an
authentication component and an authorization component. The JAAS
authentication component provides the ability to reliably and securely
determine who is currently executing Java code, regardless of whether the
code is running as an application, an applet, a bean, or a servlet. The JAAS
authorization component supplements the existing Java 2 security
framework by providing the means to restrict the executing Java code from
performing sensitive tasks, depending on its codesource and depending on
who was authenticated.
JAAS can also be used in conjunction with the Java Secure Sockets
Extension (JSSE), which provides Secure Sockets Layer (SSL) encryption
facilities.
Chapter 2: Architectural Comparisons
This section presents, compares and contrasts the underlying architectural
technologies of the proposed model and their possible alternatives.
Serialization and Reflection
Is it possible to use Java's built-in serialization or reflection mechanisms to
effectively persist classes in a relational database? After all, serialization is
the mechanism that Java itself uses to transport objects over the network.
For example, a Java object could be serialized using object streams and
then inserted into a database as a binary blob, which the JDBC API explicitly
supports. Serialization comes in two forms: the Serializable interface
and the Externalizable interface.
The Serializable Interface
Object serialization is the Java language-specific mechanism for the storage
and retrieval of Java objects and primitives to streams. It is the process of
saving an object's state to a sequence of bytes, as well as the process of
rebuilding those bytes into a live object at some future time. However,
serialization of an object is simply the encoding of its state, the values of its
fields, in a structured way. The object serializer outputs only the fields, not
the method bytecodes, of an object. Although this constraint immediately
removes serialization as a direct mechanism for persisting classes, it is still
instructive to see how serialization works in case it can be used indirectly.
Serialized Java object streams don't contain bytecodes. They contain only
the information necessary to reconstruct an object assuming the class files
are available to the client to reconstruct the object. When an object is
created from its serialized representation, the JVM creating the instance of
the object must have already loaded the class into the JVM, or the JVM
must know where to get the class definition using a custom class loader. An
explicit typecast is then applied to the serial stream to produce an actual
object.
Java serialization uses the writeObject() method of the
ObjectOutputStream class to serialize an object and its transitive
closure. In other words, if an object contains references to other objects or is
composed of other objects, the serialization mechanism automatically
detects these references. As long as the "sub-objects" are also
Serializable, ObjectOutputStream serializes them and includes them
in the stream. ObjectOutputStream is a filter stream: it is wrapped
around a lower-level byte stream, called a node stream, to handle the
serialization protocol. Node streams can be used to write to file systems or
even across sockets. ObjectOutputStream actually doesn't write class
objects when it's creating a stream of bytes representing an object. Instead,
it writes an ObjectStreamClass, which is a description of the class. The
destination JVM's class loader uses this description to find and load the
bytecodes for the class.
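The behavior described above can be sketched with two hypothetical classes, Vehicle and Engine, where a Vehicle holds a reference to an Engine. Writing the Vehicle through an ObjectOutputStream wrapped around a byte-array node stream also writes the Engine, and the stream carries the class names as part of the class descriptions rather than any bytecode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

/** Hypothetical classes: a Vehicle refers to an Engine, so both belong to one closure. */
class Engine implements Serializable {
    private static final long serialVersionUID = 1L;
    int horsepower;

    Engine(int horsepower) {
        this.horsepower = horsepower;
    }
}

class Vehicle implements Serializable {
    private static final long serialVersionUID = 1L;
    String model;
    Engine engine;   // reached through the reference and written automatically

    Vehicle(String model, Engine engine) {
        this.model = model;
        this.engine = engine;
    }
}

class ClosureDemo {
    /** Filter stream (ObjectOutputStream) wrapped around a node stream (byte array). */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream node = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(node)) {
            out.writeObject(o);
        }
        return node.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** True when the stream text contains the given class name. */
    static boolean namesClass(byte[] bytes, Class<?> type) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(type.getName());
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Vehicle("T", new Engine(20)));
        Vehicle copy = (Vehicle) deserialize(bytes);
        System.out.println(copy.model + " / " + copy.engine.horsepower + " hp");
        System.out.println("stream names Engine: " + namesClass(bytes, Engine.class));
    }
}
```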
By default, serialization only writes and reads non-static and non-transient
fields from the stream. No facility is provided to gain access to the nonpublic
fields of an object being serialized, and no facility is provided to access the
private methods and fields associated with custom serialization. Serialized
fields may only be of primitive or string types. No arrays or other reference
types are supported.
An object's constructor is called only when a new instance of a class is
created. But serialization does not create new instances; it merely restores
a persisted object, so an object's constructor is not invoked.
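Both points, that static and transient fields are skipped and that the constructor of a Serializable class is not run on restore, can be checked with a counter. The following sketch uses hypothetical names (Session, RestoreDemo).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical class that counts how often its constructor runs. */
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;   // static: not part of the serialized state
    int id;
    transient String cachedToken;      // transient: skipped by default serialization

    Session(int id, String cachedToken) {
        this.id = id;
        this.cachedToken = cachedToken;
        constructorCalls++;
    }
}

class RestoreDemo {
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session restored = roundTrip(new Session(7, "secret"));
        System.out.println("id: " + restored.id);
        System.out.println("transient field: " + restored.cachedToken);
        System.out.println("constructor calls: " + Session.constructorCalls);
    }
}
```

After the round trip the restored object keeps its id, its transient field is null, and the constructor count is still one.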
The Externalizable Interface
The java.io.Externalizable interface provides more control over how
objects are serialized. It contains just two interface methods:
public abstract void writeExternal(ObjectOutput out) throws
IOException.
public abstract void readExternal(ObjectInput in) throws
IOException, ClassNotFoundException.
The Externalizable interface specification requires that the object
implement a public no-argument constructor. The
ObjectOutputStream container writes class information to the stream,
identifying the object type. Reading and writing the object is deferred entirely
to the two methods defined in the Externalizable interface. This allows
private fields to be written as well as arrays and reference types, which do
not have to be declared as Serializable. However, the object's transitive
closure or class subgraph must be explicitly traversed in code.
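A minimal Externalizable class might look like the sketch below. Track and ExternalizableDemo are hypothetical names. The class writes a private field and an array by hand, and supplies the public no-argument constructor that the run time uses before calling readExternal().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical class that controls its own stream format. */
class Track implements Externalizable {
    private String title;
    private int[] samples;

    public Track() {
        // public no-argument constructor, used when the object is read back
    }

    Track(String title, int[] samples) {
        this.title = title;
        this.samples = samples;
    }

    String title() {
        return title;
    }

    int[] samples() {
        return samples;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(title);
        out.writeInt(samples.length);   // the array is traversed explicitly
        for (int s : samples) {
            out.writeInt(s);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        title = in.readUTF();
        samples = new int[in.readInt()];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = in.readInt();
        }
    }
}

class ExternalizableDemo {
    static Track roundTrip(Track t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Track) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Track copy = roundTrip(new Track("intro", new int[] {1, 2, 3}));
        System.out.println(copy.title() + " has " + copy.samples().length + " samples");
    }
}
```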
Reflection
Another interesting Java technology is reflection. The Java Reflection API
grew out of the needs of the JavaBeans user interface component API. The
API consists of two components: objects that represent the various parts of
a class file, and a means for extracting those objects in a safe and secure
way.
The first component of the reflection API is the mechanism used to fetch
information about a class. This mechanism is built into the class named
Class. The special class Class is the universal type for the meta information
that describes objects within the Java system. Class loaders in the Java
system return objects of type Class.
The API is symmetric, which means that, given a Class object, one can ask
about its internals, and given its internals, one can ask it which class
declared it. One can move back and forth from class to method to parameter
to class to method, and so on. One interesting use of this technology is to
find out most of the interdependencies between a given class and the rest of
the system.
Reflection reveals, and can invoke, all constructors, fields, methods (three of
the four fundamental parts of the class stream), interfaces, declared classes
and super-classes. Unfortunately, the API does not provide any class
attributes.
Reflection does solve a long-standing problem with dynamic Java
execution: creating an instance of a dynamically loaded class that does not
have a no-argument (null) constructor.
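As an illustration of this last point, the sketch below loads a class by name, selects its one-argument constructor through reflection, creates an instance, and invokes a method on it. Greeting and ReflectionDemo are hypothetical names chosen for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

/** Hypothetical class with no no-argument constructor. */
class Greeting {
    private final String name;

    public Greeting(String name) {
        this.name = name;
    }

    public String text() {
        return "Hello, " + name;
    }
}

class ReflectionDemo {
    /** Creates an instance by class name through a one-argument constructor, then calls text(). */
    static String greet(String className, String arg) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        Constructor<?> ctor = type.getConstructor(String.class);
        Object instance = ctor.newInstance(arg);
        Method text = type.getMethod("text");
        return (String) text.invoke(instance);
    }

    public static void main(String[] args) throws Exception {
        Class<?> type = Greeting.class;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            System.out.println("constructor: " + c);
        }
        for (Method m : type.getDeclaredMethods()) {
            System.out.println("method: " + m);
        }
        System.out.println(greet(type.getName(), "database"));
    }
}
```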
In summary, the reflection API is used in the class authoring process to
gather the class file information regarding a class, and is therefore useful in
the model. However, Java's serialization mechanisms lack the functionality
needed for network transport of classes.
Oracle-Java Integration
Are there any possible solutions in the database engine itself? Whereas
many commercial relational databases are able to persist some form of
binary stream into table rows, the Oracle Corporation has invested a
significant amount of resources into integrating Java classes and methods
directly into its database engine in order to deliver the functionality of server-
side Java components to the client.
In the Oracle 8.1.5 JVM products, the class developer is assisted in class
resolution by several make-like utilities. For instance, the Oracle JPublisher
utility manually creates Java classes to correspond to Oracle database
objects. The developer spends time resolving the following issues:
how to map between Oracle object datatypes and Java classes
how to store Oracle object attributes in corresponding Java objects
how to convert attribute data between SQL and Java formats
how to access data
Oracle made significant changes to their JVM products in the release of
Oracle9i, particularly in their embrace of Enterprise JavaBeans (EJB) and in
the support of Sun's JDK 1.2.1 (Mensah 2001). Indeed, the Oracle database
now provides a complete execution environment for EJB components. Once
deployed, these EJB methods are run inside the RDBMS via the integrated
JVM. The benefits of deploying class methods to run inside the database
include:
direct access to JDBC
reduced network traffic
server management of resources and security
However, Oracle continues to demand an object-relational mapping process
in which database object types are translated to Java classes. A significant
constraint is that Java methods must be public and static in order to be
deployed in an Oracle database.
Distributed Object Architectures
How does the model compare to other dynamic invocation interfaces, which
do not restrict the client to invoking methods that were defined at the time
the client was compiled? Because the model moves classes over the
network, a brief comparison to other distributed object architectures is
instructive. The following reference focuses on the activation and use of
objects on distributed servers and not the deployment of code to client
process space, which implies a much more sophisticated communication
model.
An additional question that the model is not prepared to answer is: what
factors influence the decision to activate classes in server memory space as
opposed to client memory space? The proposed model is disposed to
having classes activated at the client, and stops short of partitioning
applications at this distributed level.
CORBA, DCOM and RMI
The architectures of CORBA, DCOM and Java/RMI provide mechanisms for
transparent invocation and accessing of remote distributed objects (Raj
1997). Though the mechanisms that they employ to achieve remoting may
be different, the approach each of them takes is more or less similar. Three of
the most popular distributed object paradigms are Microsoft's Distributed
Component Object Model (DCOM), OMG's Common Object Request Broker
Architecture (CORBA) and JavaSoft's Java/Remote Method Invocation
(Java/RMI).
Common Object Request Broker Architecture
CORBA relies on a protocol called the Internet Inter-ORB Protocol (IIOP) for
remoting objects. Everything in the CORBA architecture depends on an
Object Request Broker (ORB). The ORB acts as a central object bus over
which each CORBA object interacts transparently with other CORBA objects
located either locally or remotely. Each CORBA server object has an
interface and exposes a set of methods. To request a service, a CORBA
client acquires an object reference to a CORBA server object. The client can
now make method calls on the object reference as if the CORBA server
object resided in the client's address space. The ORB is responsible for
finding a CORBA object's implementation, preparing it to receive requests,
communicating requests to it and carrying the reply back to the client. A CORBA
object interacts with the ORB either through the ORB interface or through an
Object Adapter, either a Basic Object Adapter (BOA) or a Portable Object
Adapter (POA). Since CORBA is simply a specification, it can be used on
diverse operating system platforms from mainframes to UNIX boxes to
Windows machines to handheld devices as long as there is an ORB
implementation for that platform.
Distributed Component Object Model
DCOM (Brockschmidt 1995) supports remoting objects by running on a
protocol called the Object Remote Procedure Call (ORPC). This ORPC layer
is built on top of DCE's RPC and interacts with COM's run time services. A
DCOM server is a body of code that is capable of serving up objects of a
particular type at run time. Each DCOM server object can support multiple
interfaces each representing a different behavior of the object. A DCOM
client calls into the exposed methods of a DCOM server by acquiring a
pointer to one of the server object's interfaces. The client object then starts
calling the server object's exposed methods through the acquired interface
pointer as if the server object resided in the client's address space. As
specified by COM, a server object's memory layout conforms to the C++
vtable layout. Since the COM specification is at the binary level it allows
DCOM server components to be written in diverse programming languages
like C++, Java, Object Pascal, Visual Basic and even COBOL. As long as a
platform supports COM services, DCOM can be used on that platform.
Java/Remote Method Invocation
Java/RMI (Sun Microsystems 1997) relies on a protocol called the Java
Remote Method Protocol (JRMP). Java/RMI relies heavily on Java object serialization,
which allows objects to be marshaled and transmitted as a stream. Since
Java object serialization is specific to Java, both the Java/RMI server object
and the client object have to be written in Java. Each Java/RMI Server
object defines an interface which can be used to access the server object
outside of the current JVM and on another machine's JVM. The interface
exposes a set of methods which are indicative of the services offered by the
server object. For a client to locate a server object for the first time, RMI
depends on a naming mechanism called an RMIRegistry that runs on the
server machine and holds information about available server objects. A
Java/RMI client acquires an object reference to a Java/RMI server object by
doing a lookup for a server object reference and invokes methods on the
server object as if the Java/RMI server object resided in the client's address
space. Java/RMI server objects are named using URLs and for a client to
acquire a server object reference, it specifies the URL of the server object
as one would with the URL to a HTML page. Since Java/RMI relies on Java,
it can be used on diverse operating system platforms from mainframes to
UNIX boxes to Windows machines to handheld devices as long as there is a
JVM implementation for that platform.
Whenever a client needs some service from a remote distributed object, it
invokes a method implemented by the remote object. The service that the
remote distributed object provides is encapsulated as an object and the
remote object's interface is described in an Interface Definition Language
(IDL). The interfaces specified in the IDL file serve as a contract between a
remote object server and its clients. Clients can then interact with these
remote object servers by invoking methods defined in the IDL.
To invoke a remote method, the client makes a call to the client proxy. The
client side proxy packs the call parameters into a request message and
invokes a wire protocol like IIOP (in CORBA) or ORPC (in DCOM) or JRMP
(in Java/RMI) to ship the message to the server. At the server side, the wire
protocol delivers the message to the server side stub. The server side stub
then unpacks the message and calls the actual method on the object. In
both CORBA and Java/RMI, the client stub is called a stub or proxy and the
server stub is called a skeleton.
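The call path just described can be sketched in a few lines of Java. This is only an in-process illustration, not how CORBA, DCOM or Java/RMI are implemented: the names StubSkeletonSketch, Greeter, GreeterImpl, Request and Skeleton are hypothetical, and the wire protocol is replaced by a direct method call. The client stub is a dynamic proxy that packs each call into a request message, and the skeleton unpacks it and calls the actual method.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class StubSkeletonSketch {

    /** Hypothetical service contract, standing in for an IDL-defined interface. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical server object that implements the contract. */
    public static class GreeterImpl implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Request message packed by the client stub. */
    public static final class Request {
        final String methodName;
        final Class<?>[] parameterTypes;
        final Object[] arguments;

        Request(String methodName, Class<?>[] parameterTypes, Object[] arguments) {
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
            this.arguments = arguments;
        }
    }

    /** Server-side skeleton: unpacks the request and calls the actual method. */
    public static final class Skeleton<T> {
        private final Class<T> contract;
        private final T target;

        public Skeleton(Class<T> contract, T target) {
            this.contract = contract;
            this.target = target;
        }

        Object dispatch(Request request) throws Exception {
            Method method = contract.getMethod(request.methodName, request.parameterTypes);
            return method.invoke(target, request.arguments);
        }
    }

    /** Client-side stub: a dynamic proxy that packs each call into a Request. */
    public static <T> T createStub(Class<T> contract, Skeleton<T> skeleton) {
        InvocationHandler handler = (proxy, method, args) -> {
            Request request = new Request(method.getName(), method.getParameterTypes(), args);
            // Assumption: a real system would ship the request over IIOP, ORPC or JRMP here.
            return skeleton.dispatch(request);
        };
        Object stub = Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, handler);
        return contract.cast(stub);
    }

    public static void main(String[] args) {
        Skeleton<Greeter> skeleton = new Skeleton<>(Greeter.class, new GreeterImpl());
        Greeter stub = createStub(Greeter.class, skeleton);
        System.out.println(stub.greet("world"));
    }
}
```

The client code only ever sees the Greeter interface, which is the sense in which the interface acts as the contract between the remote object server and its clients.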
The Java remote method invocation system described in this specification
has been specifically designed to operate in the Java environment. While
other RMI systems can be adapted to handle Java objects, these systems
fall short of seamless integration with the Java system due to their inter-
operability requirement with other languages. For example, CORBA
presumes a heterogeneous, multi-language environment and thus must
have a language-neutral object model. In contrast, the Java language's RMI
system assumes the homogeneous environment of the Java Virtual
Machine, and the system can therefore take advantage of the Java object
model whenever possible.
Java Support for Distributed Objects
The goals for supporting distributed objects in the Java language are:
Support seamless remote invocation on objects in different virtual
machines.
Support callbacks from servers to applets.
Integrate the distributed object model into the Java language in a
natural way while retaining most of the Java language's object
semantics.
Make differences between the distributed object model and local Java
object model apparent.
Make writing reliable distributed applications as simple as possible.
Preserve the safety provided by the Java run time environment.
Underlying all these goals is a general requirement that the RMI model
be both easy to use and fit well into the language.
In addition, the RMI system should allow extensions such as garbage
collection of remote objects, server replication, and the activation of
persistent objects to service an invocation. These extensions should be
transparent to the client and add minimal implementation requirements on
the part of the servers that use them. To support these extensions, the
system should also support:
Several invocation mechanisms, for example simple invocation to a
single object or invocation to an object replicated at multiple locations.
Various reference semantics for remote objects, for example live (non-
persistent) references, persistent references, and lazy activation.
The safe Java environment provided by security managers and class
loaders.
Distributed garbage collection of active objects.
Capability of supporting multiple transports.
Distributing RMI Classes
To run an RMI application, the supporting class files must be placed in
locations that can be found by both the server and the client. For the server,
the following classes must be available to its class loader:
Remote service interface definitions
Remote service implementations
Skeletons for the implementation classes (JDK 1.1 based servers only)
Stubs for the implementation classes
All other server classes
For the client, the following classes must be available to its class loader:
Remote service interface definitions
Stubs for the remote service implementation classes
Server classes for objects used by the client (such as return values)
All other client classes
RMIClassLoader
The RMI designers extended the concept of class loading to include the
loading of classes from FTP servers and HTTP servers. This is a powerful
extension as it means that classes can be deployed in one, or only a few
places, and all nodes in a RMI system will be able to get the proper class
files to operate.
RMI supports this remote class loading through the RMIClassLoader
class. If a client or server is running an RMI system and it sees that it must
load a class from a remote location, it calls on the RMIClassLoader to do
this work. RMIClassLoader provides static methods for loading classes
from a network location (one or more URLs) and obtaining the location from
which an existing class can be loaded.
Again recall that RMI does not send class files along with the serialized
objects.
Comparison
DCOM, CORBA and Java/RMI are fully distributed object architectures that
relate server objects to client objects. They solve a more general set of
problems than the proposed model, which simply describes a mechanism to
effectively distribute and deploy a class subgraph to a client's process
space.
Other Java Distribution Solutions
There are two other Java technologies that are relevant in that they both
allow code to be migrated to and executed on a client machine: JNLP and
Jini.
Java Network Launching Protocol
JNLP allows a user to run a Java application or applet directly from the
Internet. JNLP provides direct access to Java software using the latest JVM
without the constraints and problems of applets within web browsers. It has
been successfully incorporated into a product called Java Web Start, and
provides the following benefits:
automatic download, update, and launching of an application
automatic creation of desktop shortcuts
code signing for security
automatic proxy configuration
Java Web Start is bundled with Java 1.4
However, to deploy a Java application using JNLP:
the application code must be modified
the application must be deployed as a set of jar files
the getResource mechanism must load all resources
issues arising from running under a different ClassLoader and
SecurityManager must be addressed
Java Web Start must be installed on the client, and the large JWS is
designed for resource-intensive desktop applications (especially disk
space), not resource-starved environments.
The JNLP specification explicitly limits JNLP to the HTTP protocol,
which means that all current JWS applications must be run from within a
browser.
Jini
Jini is the name for a distributed computing environment that can offer
network plug and play. A device or a software service can be connected to
a network and announce its presence, and clients that wish to use such a
service can then locate it and call it to perform tasks. Jini can be used for
mobile computing tasks where a service may only be connected to a
network for a short time, but it can more generally be used in any network
where there is some degree of change. In effect, Jini is a specification for a
set of middleware protocols that links services and clients together. A Jini
system or federation is a collection of clients and services all communicating
by the Jini protocols.
The Jini interface is useful for defining interactions among networked
services and components. The components of a Jini system are shown in
Figure 6.
[Figure 6 diagram: a client, a service and a lookup service connected over TCP/IP.]
Figure 6: Components of a Jini System.
Comparison Between Jini and JNLP
Jini | JNLP
Downloads a service | Downloads an application
A client must be running to request a service | A browser must be running which calls a helper to start the application
Each service looks after itself, independently of any clients | Each JNLP file specifies all required parts of an application; if any part changes, the JNLP file must be updated
The client may need to know the location of a lookup service, but not of any service | The user of a JNLP application must know the URL of the JNLP file
No generic client | Generic JNLP helper
Table 4: Similarities and Differences Between Jini and JNLP.
Table 4 summarizes some of the similarities and differences between Jini
and JNLP.
Another distributed perspective is the middleware component approach.
Middleware Component Models
Middleware component models (Raj 1999) take a high level approach to
building distributed systems. They free the application developer to
concentrate on programming only the business logic, while removing the
need to write all the "plumbing" code that is required in any enterprise
application development scenario. For example, the enterprise developer no
longer needs to write code that handles transactional behavior, security,
database connection pooling or threading, because the architecture
delegates this task to the server vendor.
Two popular models are the Microsoft Transaction Server (MTS)
Architecture from Microsoft and JavaSoft's Enterprise JavaBeans (EJB).
MTS, based on the Component Object Model (COM) which is the
middleware component model for Windows NT, is used for creating
scalable, transactional, multi-user and secure enterprise-level server side
components. EJB is a middleware component model for Java and CORBA
and is a specification for creating server-side, scalable, transactional, multi-
user and secure enterprise-level applications. It defines a set of
specifications and a consistent component architecture framework for
creating distributed n-tier middleware.
In essence, both MTS and EJB work by intercepting method calls and
inserting services based on a set of attributes defined at deployment time.
MTS uses class factory wrappers and object wrappers to intercept the
method calls (the MTS executive is called automatically by the COM run
time). Similarly, EJB requires wrappers for each component type and each
component instance.
Middleware components are always deployed on the server side. Each
server will probably service tens to many thousands of client requests at any
point in time. The server should be able to scale well and service these calls
efficiently from multiple clients. The technology used to build these server
components has to provide the features and a framework that will help the
developer build scalable middleware server components.
MTS uses a stateless model to ensure database consistency. Stateless
models are essentially stored procedures that operate on the database
state. There is no state that is kept between calls on the stored procedure,
so as soon as that procedure has finished its task, everything can be thrown
away to start again with a clean slate. This is why most relational databases
provide some sort of stored procedure language that runs within the SQL
process and is tuned for database operations. The stateless model ensures
the consistency of the database's state, which is the primary goal of a
transaction processing system.
Comparison
As before, the middleware component models are highly sophisticated
approaches to solving enterprise-wide software development issues. They
do not involve any notion of prediction.
Chapter 3: Performance Measurements
Increasing System Performance
This proposal explores the trade-offs involved in structured management of
the software deployment process. Can effective class distribution over the
network be reconciled with efficient client system performance? The most
important methods for managing performance in a client-server system are
those that directly impact the ability of the system to scale well as system
components and active clients are added to the system. This is usually
achieved by identifying and removing throughput bottlenecks.
General methods for increasing overall system performance include
minimizing network traffic, selecting the best algorithm for the job,
implementing caching on both the server and the client, properly structuring
the database schema, and making effective use of threads.
Minimizing Network Traffic
The proposed architecture attempts to directly minimize network traffic by
delivering only the immediately needed classes of an application trace in
compressed format. The effectiveness of the prediction algorithm will be
crucial to overall throughput, especially for clients that do not persist classes
locally.
The time required to transport n bytes over the network, a matter of simple
efficiency, will not be measured; rather, the thrust is to reduce the total
number of classes transmitted beforehand, which is a function of system
effectiveness. Network handshaking can be reduced by packaging the entire
class subgraph into a single JAR container and transmitting it at one time
as a function of the size of the client consumer buffer.
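As a rough illustration of the single-container idea, the sketch below packs a set of named bytecode arrays into one compressed in-memory JAR and unpacks it again, using only the standard java.util.jar classes. The class name SubgraphJarSketch, the method names and the entry names are assumptions made for the example; the actual packaging code of the model is not shown in this text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class SubgraphJarSketch {

    /** Packs named bytecode arrays into one compressed, in-memory JAR. */
    public static byte[] pack(Map<String, byte[]> classes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : classes.entrySet()) {
                jar.putNextEntry(new JarEntry(entry.getKey()));
                jar.write(entry.getValue());
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The JAR stream is closed at this point, so the buffer is complete.
        return buffer.toByteArray();
    }

    /** Unpacks every entry of an in-memory JAR, preserving entry order. */
    public static Map<String, byte[]> unpack(byte[] jarBytes) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                classes.put(entry.getName(), readCurrentEntry(jar));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classes;
    }

    /** Reads the bytes of the entry the stream is currently positioned on. */
    private static byte[] readCurrentEntry(JarInputStream jar) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int count;
        while ((count = jar.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, count);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> subgraph = new LinkedHashMap<>();
        subgraph.put("demo/A.class", new byte[4096]); // placeholder bytes, not real bytecode
        subgraph.put("demo/B.class", new byte[] {1, 2, 3});
        byte[] jar = pack(subgraph);
        System.out.println("entries: " + unpack(jar).size() + ", jar bytes: " + jar.length);
    }
}
```

Sending one such container replaces a request and reply per class with a single transfer, at the cost of holding the whole subgraph in the client buffer at once.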
Choice of Algorithm
Implementation of the proposed model used many of the native Java
classes and hence their internal algorithms. The model claims that overall
delivery throughput is increased by incorporating a Markov probability model,
limited to the first order, for predicting the client's next class
request. Finally, the separate class fetching algorithms used by the client
and the server are designed to optimize overall class retrieval time.
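A first-order Markov predictor of this kind can be sketched as a table of transition counts between consecutive class requests. The sketch below is a minimal illustration under stated assumptions, not the thesis implementation: the class name FirstOrderPredictor and its methods are hypothetical, transition probabilities are estimated as simple relative frequencies, and ties between equally likely successors are broken arbitrarily.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FirstOrderPredictor {

    // transitions.get(a).get(b) counts how often class b was requested right after class a.
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>();
    private String previous;

    /** Records one class request in the order the client issued it. */
    public void observe(String className) {
        if (previous != null) {
            transitions.computeIfAbsent(previous, key -> new HashMap<>())
                       .merge(className, 1, Integer::sum);
        }
        previous = className;
    }

    /** Estimated probability that 'to' is requested immediately after 'from'. */
    public double probability(String from, String to) {
        Map<String, Integer> row = transitions.get(from);
        if (row == null) {
            return 0.0;
        }
        int total = 0;
        for (int count : row.values()) {
            total += count;
        }
        return row.getOrDefault(to, 0) / (double) total;
    }

    /** Most frequently observed successor of 'current', if any has been seen. */
    public Optional<String> predictNext(String current) {
        Map<String, Integer> row = transitions.get(current);
        if (row == null) {
            return Optional.empty();
        }
        return row.entrySet().stream()
                  .max(Map.Entry.comparingByValue())
                  .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        FirstOrderPredictor predictor = new FirstOrderPredictor();
        for (String name : new String[] {"A", "B", "A", "B", "A", "C"}) {
            predictor.observe(name);
        }
        System.out.println(predictor.predictNext("A").orElse("none"));
    }
}
```

Because only the immediately preceding request is used as context, the table grows with the number of distinct class pairs rather than with the length of the request history.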
Caching
The client informs the class server that it will establish a fixed-size cache
buffer, and it is reasonable to assume that the client knows its best cache
size. The server cache is simply made as large as possible. Performance
measurements clearly demonstrate the effectiveness of both server and
client caches, including the use of a common server cache.
Database Schema
The time required to retrieve a class subgraph from the database can be
reduced depending upon the structure and size of table schema, indexes
and key fields, network and database drivers, and standard database
performance techniques such as minimizing the number of indexes
associated with the class bytecode tables.
The cost of inserting, updating or deleting class methods from the database
can be safely ignored because these events would happen much less
frequently than selecting or extracting classes for client use. Since a client
does not establish a direct connection to the database, the expense of
establishing a database connection is not an issue. The class server will
pool and thread any database connections. Database performance is also
sensitive to tuning factors, so these are standardized.
An important database performance factor is total class bytecode size, since
most relational databases, including Oracle, restrict the size of a record to
be no larger than the size of a system page, usually 4k.
This implies that performance monitoring should begin with a fixed TCP/IP
network configuration, standard server hardware and relational database
with no class prediction as a baseline. Adding more active clients will be
simulated via multiple virtual client sessions.
Effective Threading
The Java environment supports the effective use of threads for
asynchronous events. The model is implemented so that each connected
client is allocated at least two threads: one to receive class requests and
one to transmit class data. Separate server threads manage any inter-cache
communication. This coarse-grained parallelism, coupled with the use of
separate and dedicated socket ports, permits the effective overlapping of
client-server communication. The use of the Java synchronized keyword
was limited to thread and connection pools and access to the server
common cache.
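The overlap between a transmitting thread and a receiving thread can be pictured with a bounded queue. The sketch below is an assumption-laden illustration rather than the model's socket code: DeliveryChannelSketch, its stream method and the end-of-stream marker are hypothetical, and an ArrayBlockingQueue stands in for the socket connection and the client's fixed-size buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DeliveryChannelSketch {

    // Hypothetical marker that tells the receiver no more classes will follow.
    private static final String END_OF_STREAM = "<end>";

    /** Streams class names through a bounded channel using one thread per direction. */
    public static List<String> stream(List<String> classNames, int bufferSlots) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(bufferSlots);
        List<String> received = new ArrayList<>();

        // Transmitting thread: blocks whenever the bounded buffer is full.
        Thread producer = new Thread(() -> {
            try {
                for (String name : classNames) {
                    channel.put(name);
                }
                channel.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Receiving thread: blocks whenever the buffer is empty.
        Thread consumer = new Thread(() -> {
            try {
                String name = channel.take();
                while (!name.equals(END_OF_STREAM)) {
                    received.add(name);
                    name = channel.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> sent = List.of("A", "B", "C", "D", "E");
        System.out.println(stream(sent, 2));
    }
}
```

The queue handles its own locking, so no explicit synchronized block is needed here; that is consistent with keeping explicit synchronization confined to a few shared structures.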
Other Performance Enhancements
Just-In-Time (JIT) compilers compile bytecode into native machine code and
then cache the code locally. However, the appropriate function of the class
server is not to deliver machine code. Not only does the server exist within a
heterogeneous computing environment, but bytecodes are significantly more
compact than expanded JIT code. Therefore, this choice of using a JIT
compiler is one that a client makes locally.
Performance Measures
Based on the foregoing analysis, the following performance measures are
relevant:
Database class subgraph retrieval time: the time between the server
receiving a class request and the time it finishes delivering the subgraph
to the client.
Delivery prediction effectiveness: the ratio of class requests that invoke
the class server to requests that are satisfied by the client cache.
Server Memory Management Strategies
This section focuses on server memory management, which can be very
effective when it manages small, discrete elements such as database rows.
A memory management strategy can improve system performance in
several ways:
By predicting and pre-fetching class streams.
By caching the most frequently invoked bytecodes at the server.
By avoiding slow file system accesses.
By avoiding disk paging.
The Oracle database caches are disabled during these measurements.
Regrettably, there is no way in Java to pin memory caches in memory so
that they are not paged out onto disk.
The guiding principle behind the server caching strategy is that it should be
self-regulating; i.e., that total server memory usage should level off between
an upper and lower bound no matter how many clients are currently
connected.
The server implements the following caching strategy. A common cache is
maintained by the server, which contains classes or resources that have
recently been used by two or more active clients. A smaller delivery cache is
established for each client connection that receives the anticipated set of
classes referenced in the class subgraph.
Class Fetching Algorithm
The class fetching algorithm works as follows. The class requested by the
client is first checked to see if it already exists in the client's dedicated
server cache, followed by the server's common cache. If it does not exist
there, the class is fetched from the database along with its class subgraph.
Concurrent with the requested class being transmitted to the client, the class
subgraph classes are retrieved into the delivery cache and then distributed
according to their current transition probabilities. The set of caches is
indexed by a map structure, and the set of classes in each cache is also
indexed by a map structure.
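The lookup order of this algorithm can be sketched with nested maps. The code below is a simplified, single-threaded illustration: ClassFetchSketch, ClassStore and fetchWithSubgraph are hypothetical names, the database is replaced by a pluggable interface, and the concurrent transmission and the ordering by transition probability are left out.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassFetchSketch {

    /** Hypothetical stand-in for the database: returns a class together with its subgraph. */
    public interface ClassStore {
        Map<String, byte[]> fetchWithSubgraph(String className);
    }

    // The server common cache, indexed by class name.
    private final Map<String, byte[]> commonCache = new HashMap<>();
    // The set of per-client delivery caches, indexed by client id, then by class name.
    private final Map<String, Map<String, byte[]>> deliveryCaches = new HashMap<>();
    private final ClassStore store;
    private int databaseFetches;

    public ClassFetchSketch(ClassStore store) {
        this.store = store;
    }

    /** Places a class in the common cache, as if two or more clients had used it. */
    public void shareInCommonCache(String className, byte[] bytecode) {
        commonCache.put(className, bytecode);
    }

    /** Checks the client's delivery cache, then the common cache, then the database. */
    public byte[] fetch(String clientId, String className) {
        Map<String, byte[]> delivery =
                deliveryCaches.computeIfAbsent(clientId, id -> new HashMap<>());
        byte[] bytecode = delivery.get(className);
        if (bytecode != null) {
            return bytecode;
        }
        bytecode = commonCache.get(className);
        if (bytecode != null) {
            return bytecode;
        }
        // Cache miss: fetch the class and its subgraph, and stage them for delivery.
        databaseFetches++;
        Map<String, byte[]> subgraph = store.fetchWithSubgraph(className);
        delivery.putAll(subgraph);
        return subgraph.get(className);
    }

    public int databaseFetches() {
        return databaseFetches;
    }

    public static void main(String[] args) {
        Map<String, byte[]> subgraph = new HashMap<>();
        subgraph.put("A", new byte[] {1});
        subgraph.put("B", new byte[] {2});
        ClassFetchSketch server = new ClassFetchSketch(name -> subgraph);
        server.fetch("client-1", "A");
        server.fetch("client-1", "B");
        System.out.println("database fetches: " + server.databaseFetches());
    }
}
```

In this sketch a request for a subgraph member that arrives after its parent class is served from the delivery cache without touching the database again.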
The entire memory structure can be visualized as a series of vertical bars,
one per client plus one for the server common cache. Each cache is
accessed using a separate thread. Basic statistics are collected for each
cache.
When a client is added to or removed from the cache set, a test is made
whether any shared classes can be moved into the server common cache.
There is no timer to periodically sweep the connected client caches.
The following are the fixed constraints on the entire mapping structure,
which would be determined after long production experiments:
the minimum and maximum number of connected clients
individual cache sizes, a constant fraction of the client's own cache size
the total size of the entire set of caches
Given these constraints, the total amount of memory used by the server is
best managed as a product of the current number of active clients and a soft
maximum size. A client should not have to be discarded once it is
connected to the class server, but it may be necessary to effect a denial of
service upon new clients if an established maximum number has been
breached.
The actual Java application used in the performance measurements has
the following characteristics. No class is larger than 13k, and the average
class size is 3047 bytes. The total uncompressed class size is 129393
bytes, which can be compressed into a single JAR file of 66170 bytes, about
51% of the original size. The 41 classes directly invoke an average of 1.56
class types. But two very commonly accessed classes, ClassFile and
EncapsulatedEventAdaptorClassFile, each invoke 10 classes. Thus,
assuming a JAR average compression ratio of 2-to-1 and 10 classes
possibly invoked per activated class, a client buffer size of 10 x 1.6k = 16k
would be a minimal choice.
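The buffer estimate above can be reproduced with a short calculation. The sketch below assumes that the per-class average is derived from the total size divided by the class count (about 3156 bytes, slightly above the 3047-byte average quoted above) and that the compression ratio is the rounded 2-to-1 figure; BufferSizeSketch and its method names are hypothetical.

```java
public class BufferSizeSketch {

    /** Estimated client buffer size: average compressed class size times the maximum fan-out. */
    public static long minimalBufferBytes(long totalUncompressedBytes, int classCount,
                                          double compressionRatio, int maxFanOut) {
        double averageClassBytes = totalUncompressedBytes / (double) classCount;
        double averageCompressedBytes = averageClassBytes / compressionRatio;
        return Math.round(averageCompressedBytes * maxFanOut);
    }

    /** Compressed size as a fraction of the uncompressed size. */
    public static double compressedFraction(long compressedBytes, long uncompressedBytes) {
        return compressedBytes / (double) uncompressedBytes;
    }

    public static void main(String[] args) {
        // Figures quoted in the text: 129393 bytes, 41 classes, 2-to-1 compression, fan-out of 10.
        System.out.println(minimalBufferBytes(129393, 41, 2.0, 10)); // prints 15780
        System.out.println(compressedFraction(66170, 129393));
    }
}
```

The result of 15780 bytes is the unrounded form of the 10 x 1.6k = 16k figure.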
Performance Measurements
These tests use one runtime instance of Java for all clients and another
instance for all servers within the same (server) computer. All tests were run
from within the Forte development environment. There is no common server
cache or statistical prediction used in the first set of tests that measure multi-
user throughput.
[Figure 7 chart: Sequential Class Instantiation, Cached vs Not Cached.]
Figure 7: Time of Class Package Instantiation: Cached (256k) and Non-Cached.
Class Subgraph Retrieval Time
Figure 7 measures the averaged time from client request to delivery in
seconds; i.e., the delivery, but not instantiation, of 269 class packages for 16
simultaneous clients. It reveals the beneficial effect of 256k client caches. A
single class package contains one or more (an average of 1.56) of the
available class files. The cumulative time required for the client to receive a
class package is linear in both cases, which implies that delivery is relatively
independent of class package size.
Clients | Cached / Non-Cached Time Ratio
1 | 0.61
4 | 0.42
16 | 0.87
32 | 0.69
Table 5: Cached / Non-Cached Time Ratios vs Number of Clients.
Figure 8: Average Transfer Rate in Bytes Per Second.
Figure 8 graphs the averaged transfer rate in bytes per second for 1, 4, 16
and 32 simultaneous clients. The Activated (A) and Common (x) lines
include complete class activation in the client JVM and are cached, as well;
whereas the cached and not cached lines measure simple delivery. The use
of a server common cache helps to speed up the overall class transfer rate
for multiple clients. The convergence and leveling of the measurements at
16 simultaneous clients appears to reflect the saturation of machine and
database resources.
Delivery Prediction
The second set of experimental conditions aims to determine the
effectiveness of class delivery prediction. Servers and clients execute on the
same machine, but all server instances execute within a single JVM,
whereas every client instance executes within its own separate JVM. Each
client is given a large class buffer size of 256k, enough memory to contain
all requested classes. A software switch turns off class pre-fetching and
delivery to measure the effects of class prediction. Predicted (stage 2)
classes are transmitted immediately following requested (stage 1) classes in
a separate JAR envelope.
Measure | No Prediction | Prediction | P/Np Ratio
Total stream size | 84746 | 169104 | 1.995
Total transfer time: seconds | 14 | 19 | 1.357
Classes requested | 27 | 45 | 1.667
Cache class count | 52 | 85 | 1.635
Classes from database | 27 | 39 | 1.444
Classes from cache | 25 | 59 | 2.360
Extra classes transmitted | 0 | 60 | n.a.
Classes from cache / cache class count | 0.48 | 0.69 | 1.438
Classes from cache / classes from db and cache | 0.48 | 0.60 | 1.250
Table 6: Effectiveness of Class Prediction.
The prediction tests involve more class requests because the additional
classes transmitted need other classes to be fully instantiated. Hence, 60
more classes are both transmitted and loaded. The class cache hit ratio,
defined as the ratio of class requests satisfied by the client buffer to
requests satisfied by the class server, is calculated as the ratio of classes
retrieved from the client cache over the number of classes retrieved from the
database plus the client cache.
Even though twice as much data is transferred to the client, the class cache
hit ratio is 25% higher when class prediction is used. In effect, the class
server transmits classes to the client before the client has requested them.
The client is able to locate more requested classes locally without having to
request them from the class server database. Prediction also allows the
client to make more effective use of the cache, by a factor of over 43%.
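The two derived ratios can be recomputed from the raw counts in Table 6. The sketch below is only a worked check of that arithmetic; HitRatioSketch and its method names are hypothetical.

```java
public class HitRatioSketch {

    /** Classes served from the client cache over all classes served (cache plus database). */
    public static double hitRatio(int fromCache, int fromDatabase) {
        return fromCache / (double) (fromCache + fromDatabase);
    }

    /** Classes served from the client cache over the number of classes held in the cache. */
    public static double cacheUse(int fromCache, int cacheClassCount) {
        return fromCache / (double) cacheClassCount;
    }

    public static void main(String[] args) {
        // Raw counts from Table 6.
        double withoutPrediction = hitRatio(25, 27);
        double withPrediction = hitRatio(59, 39);
        System.out.printf("hit ratio: %.2f -> %.2f (x%.2f)%n",
                withoutPrediction, withPrediction, withPrediction / withoutPrediction);
        System.out.printf("cache use: %.2f -> %.2f%n", cacheUse(25, 52), cacheUse(59, 85));
    }
}
```

Computed from the unrounded values, the hit ratio improves by a factor of about 1.25 and the cache use by a factor of about 1.44; the 1.438 in the table comes from dividing the rounded values 0.69 and 0.48.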
To simulate a slower client running over a network, a second prediction test
introduced a 100 millisecond delay into the class request thread. In Table 7,
the number of classes retrieved from the database rose to 45 and the
number of classes retrieved from the client cache jumped to 84. The classes
from cache / cache class count ratio changed to 84/85 = 0.99 and the
classes from cache / classes from database plus client cache ratio changed
to 84/129 = 0.65. The server was able to transmit nearly all expected
classes to a sufficiently slower client.
Measure | No Prediction | Prediction | P/Np Ratio
Total stream size | 84746 | 169104 | 1.995
Total transfer time: seconds | 14 | 32 | 2.286
Classes requested | 27 | 45 | 1.667
Cache class count | 52 | 85 | 1.635
Classes from database | 27 | 45 | 1.667
Classes from cache | 25 | 84 | 3.360
Extra classes transmitted | 0 | 60 | n.a.
Classes from cache / cache class count | 0.48 | 0.99 | 2.063
Classes from cache / classes from db and cache | 0.48 | 0.65 | 1.354
Table 7: Effectiveness of Class Prediction Over Simulated Network.
Chapter 4: Conclusions
Conclusions and Future Research
This project described an abstract transmission-compression model used to
distribute Java classes using a streaming JAR format from a relational
database instead of a file system. The mechanism manages communication
between multiple clients and a class server within a limited-buffer,
consumer-producer channel. The class server uses a first-order Markov
probability model to predict the client's next class request. The Java linking
model allows the effective analysis of application behavior for the sake of
efficient application distribution. The use of design patterns simplified the
design process in that only Java interfaces crossed client-server boundaries.
Using the Java architecture required writing and delivering a custom class
loader to client computing devices.
The experiments demonstrated the following results. Class package
delivery time is relatively independent of class package size. As expected,
the use of a server common cache helps to speed up the overall class
transfer rate for multiple clients.
The class cache hit ratio is defined as the ratio of class requests satisfied by
the client cache to class requests satisfied by the class server. Even though
twice as much data is transferred to the client, the class cache hit ratio is
25% higher when class prediction is used. Given a sufficiently slower client,
the server is able to transmit nearly all expected classes to the client and the
improvement in the class cache hit ratio rises to over 35%.
In terms of future project development, the optimal client and server cache
buffer sizes would need to be determined, as well as the ability of the
system to scale as components and active clients are added to the system.
The security measures and component versioning mechanisms would also
need to be incorporated into the system.
One direction of future research would be to investigate a software payment
model that charges clients as they use specific classes. These non-
persistable classes would be able to erase themselves from the local JVM
after use.
Another significant area for research involves streaming applications to
mobile wireless devices. Although J2ME does not directly support user-
defined class loaders, reflection, object serialization and RMI, the RMI
profile JSR-066, currently in the last stages of public draft, does add RMI
functionality to the CDC Foundation profile. A modified class loader could
then be embedded into the device that handles the streaming requirements.
Bibliography
Bell, T.C., Witten I.H., Cleary, J.G. 1989. Modeling for Text Compression. ACM
Computing Surveys 21:557-591
Brockschmidt, Kraig. 1995. Inside OLE, 2nd Ed. Redmond: Microsoft Press.
Eckel, Bruce. 2000. Thinking in Java, 2nd Ed. New Jersey: Prentice Hall.
Gosling, J., B. Joy, G. Steele, and G. Bracha. 2000. Java Language Specification,
2nd Ed. Boston: Addison Wesley.
Grand, Mark. 1998. Patterns in Java, Volume 1. New York: Wiley Computer
Publishing.
Hennessy, J.L. and Patterson, D.A. 1996. Computer Architecture: A Quantitative
Approach, 2nd Ed. San Francisco: Morgan Kaufmann.
Liang, Sheng and Bracha, Gilad. 1998. Dynamic Class Loading in the Java Virtual
Machine, in Proceedings of OOPSLA '98, published as ACM SIGPLAN Notices,
Volume 33, Number 10, October 1998, pages 36-44.
Mensah, Kuassi. 2001. Oracle 9i for e-business: Using Java for
e-business. Redwood City: Oracle Corporation.
Oracle Corp. 1999. Oracle8i Java Developer's Guide Release 8.1.5 A64682-01.
Available online:(http://pythie.univ-lyon2.fr/oracle2/java.815/a64682/toc.htm)
Raj, Gopalan Suresh. 1997. A Detailed Comparison of CORBA, DCOM and
Java/RMI. Available online: (http://www.execpc.com/~gopalan/misc/compare.html)
Raj, Gopalan Suresh. 1999. A Detailed Comparison of Enterprise JavaBeans
(EJB) & the Microsoft Transaction Server (MTS) Models. Available online:
(http://members.tripod.com/gsraj/misc/ejbmts/ejbmtscomp.html)
Reese, George. 2000. Database Programming with JDBC and JAVA, 2nd Edition.
Sebastopol: O'Reilly.
Sayood, Khalid. 2000. Introduction to Data Compression, 2nd Ed. San Diego:
Academic Press.


Sun Microsystems. 1997. Remote Method Invocation Specification. Available
online: (http://java.sun.com/products/jdk/1.1/docs/guide/rmi/spec/rmiTOC.doc.html)
Sun Microsystems. 1998. Java Object Serialization Specification. Available online:
(http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html)
Sun Microsystems. 2002. Java Authentication and Authorization Service. Available
online: (http://java.sun.com/products/jaas/)
Venners, Bill. 1999. Inside the Java 2 Virtual Machine, 2nd Ed. New York: McGraw-
Hill Companies.


Appendix A: Class Size Sampling Distribution
To help estimate client buffer sizes, a representative distribution of 4046
uncompressed Java class files is given in Table A, followed by its frequency
histogram in Figure A.
Class File Size (bytes)    No. of Files    Cumulative Percentage    Boundary
0-1023                     1692            41.8                     1k
1024-2047                  1041            67.5                     2k
2048-3071                  529             80.6
3072-4095                  265             87.2                     4k
4096-5119                  144             90.7
5120-6143                  90              93.0
6144-7167                  86              95.1
7168-8191                  57              96.5                     8k
8192-9215                  29              97.2
9216-10239                 30              97.9
10240-11263                12              98.2
11264-12287                14              98.6
12288-13311                11              98.9
13312-14335                5               99.0
14336-15359                7               99.2
15360-16383                5               99.3                     16k
>= 16384                   29              100.0
Table A: Distribution of 4046 Uncompressed Java Class Files.
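
One way to use Table A when sizing a client buffer is to pick the smallest bucket boundary whose cumulative percentage reaches a chosen coverage target. The following is a minimal sketch of that reading, assuming 1024-byte buckets and the file counts from Table A. The class name, method names, and the coverage targets are illustrative assumptions, not part of the thesis implementation.

```java
final class ClassSizeTable {
    // File counts per 1024-byte bucket, copied from Table A.
    // The last entry is the open-ended bucket (16384 bytes and larger).
    static final int[] COUNTS = {
        1692, 1041, 529, 265, 144, 90, 86, 57, 29, 30, 12, 14, 11, 5, 7, 5, 29
    };
    static final int BUCKET_WIDTH = 1024;

    /** Recomputes the cumulative percentage column of Table A. */
    static double[] cumulativePercentages() {
        int total = 0;
        for (int count : COUNTS) {
            total += count;
        }
        double[] result = new double[COUNTS.length];
        int running = 0;
        for (int i = 0; i < COUNTS.length; i++) {
            running += COUNTS[i];
            result[i] = 100.0 * running / total;
        }
        return result;
    }

    /**
     * Returns the smallest exclusive bucket bound (in bytes) that covers at
     * least targetPercent of the sampled files, or -1 when only the
     * open-ended last bucket reaches the target.
     */
    static int smallestBoundFor(double targetPercent) {
        double[] cumulative = cumulativePercentages();
        for (int i = 0; i < COUNTS.length - 1; i++) {
            if (cumulative[i] >= targetPercent) {
                return (i + 1) * BUCKET_WIDTH;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(smallestBoundFor(90.0));
    }
}
```

With these counts, a 90 percent target selects the bucket that ends at 5119 bytes, which matches the 90.7 percent cumulative figure in the table.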
[Figure A: Frequency histogram of the class file size distribution in Table A.]


Glossary
active container
A partition of an application trace that is retrievable from and storable into a
database by name. A container is active because, when it is updated in the
database, it attempts to dynamically update the client processes that are currently
using it.
application trace
This is a nameable graph where all the invoked classes are traceable to a starting
object; or, the serialized list of threaded classes that a client has invoked, starting
from a main() method to application termination.
binary compatibility rules
These rules ensure that a Java class definition can be consistently loaded and
instantiated on different platforms and operating systems.
bytecode
The roughly 200 Java virtual machine instructions that are compiled into class streams.
class activation
The process of bringing a binary class stream into the JVM and linking it through
verification, preparation, resolution, and initialization.
class author
The person who writes and loads valid Java classes into the class server.
class file
A precisely defined binary file format for Java classes. Each Java class file
represents a complete description of exactly one Java class or interface.
class loader
The mechanism that actually loads persistent class streams into a JVM.
class stream
This is the actual bytecode stream extracted from the database which satisfies a
client's request for a particular class or method. It is a more general concept of a
class file because file system storage of classes is not important in this
architecture.
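
The class loader and class stream entries above can be illustrated with a small loader that defines classes from byte arrays rather than files. This is only a sketch: the in-memory map stands in for the database, and the names StreamClassLoader and putStream are hypothetical. The calls to findClass and defineClass are the standard java.lang.ClassLoader extension points.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical loader that defines classes from byte arrays instead of files. */
final class StreamClassLoader extends ClassLoader {
    // Stand-in for the database: binary class name -> class stream bytes.
    private final Map<String, byte[]> streams = new ConcurrentHashMap<>();

    StreamClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Registers a class stream under its binary name (assumed helper). */
    void putStream(String className, byte[] bytecode) {
        streams.put(className, bytecode.clone());
    }

    /**
     * Called by loadClass only after the parent loader has failed to find
     * the class, so system classes are never taken from the stream store.
     */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = streams.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        // defineClass hands the raw bytes to the JVM, which then links the class.
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Overriding findClass rather than loadClass keeps the default delegation to the parent loader intact.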


class streamer
This object sits on the database server to retrieve and persist class streams into the
database based on client requests.
class subgraph
The first level of classes, both explicit and implicit, that are referenced from a given
class.
conditional probability graph (CPG)
The graph of class invocation transition probabilities for a class or an application.
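
As an illustration of a conditional probability graph, transition probabilities can be estimated from an observed sequence of class invocations by counting how often one class follows another. The sketch below assumes the trace is available as a simple list of class names; the class TransitionGraph and its method are hypothetical, and the thesis may derive its probabilities differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransitionGraph {
    /**
     * Estimates P(next | current) for each pair of consecutive entries in
     * the trace: count(current -> next) / count(current -> anything).
     */
    static Map<String, Map<String, Double>> fromTrace(List<String> trace) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < trace.size(); i++) {
            counts.computeIfAbsent(trace.get(i), k -> new LinkedHashMap<>())
                  .merge(trace.get(i + 1), 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> from : counts.entrySet()) {
            int total = 0;
            for (int count : from.getValue().values()) {
                total += count;
            }
            Map<String, Double> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> to : from.getValue().entrySet()) {
                row.put(to.getKey(), (double) to.getValue() / total);
            }
            probabilities.put(from.getKey(), row);
        }
        return probabilities;
    }
}
```

Each row of the result sums to one, so a row can be read directly as the outgoing edge weights of one class in the graph.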
conditioning classes
Encoding contexts for symbols in a transition probability distribution model.
constant pool
The constant pool is an ordered set of constants used by each class loaded by the
JVM, including literals and symbolic references to types, fields and methods. The
pool is vital to the process of dynamic linking.
database proxies
A database proxy is an object that represents an object stored in a database. To
every other object in the system, the proxy appears to be the object that it
represents. When another object sends the proxy a message, the proxy immediately
fetches the object from the database and replaces itself with the fetched object,
passing the message on to it.
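
A database proxy can be sketched in Java as follows. Because a Java object cannot literally replace itself, this sketch approximates the behavior by fetching the real object on the first message and forwarding every call to it afterwards. The Customer interface, the CustomerProxy class, and the Supplier used as the database lookup are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical interface shared by the stored object and its proxy. */
interface Customer {
    String name();
}

final class CustomerProxy implements Customer {
    private final Supplier<Customer> fetch; // assumed database lookup
    private Customer target;                // null until the first message arrives

    CustomerProxy(Supplier<Customer> fetch) {
        this.fetch = fetch;
    }

    @Override
    public synchronized String name() {
        if (target == null) {
            target = fetch.get(); // fetch from the database exactly once
        }
        return target.name();     // pass the message on to the real object
    }
}
```

Callers hold only a Customer reference, so they cannot tell whether they are talking to the proxy or to the fetched object.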
direct references
Resolved pointers in the run time constant pool that refer to other methods, classes
or resources.
dynamic extension
The ability to add classes to a client component at run time and not at compile-time.
The client is not restricted to invoking methods that were defined at the time the
component was compiled.
dynamic linking
The process of inserting and symbolically resolving classes into a client component
at run time and not at compile-time.
Enterprise JavaBeans (EJB)
A Java architecture for component-based distributed computing.
first active use rule
The initialization of an object because its class has been referenced.


Java Database Connectivity (JDBC)
A standardized and generic (relational) database API for Java that allows SQL-level
statements in Java programs.
Java virtual machine (JVM)
An abstract computer specification whose main job is to load class streams and
execute their bytecodes.
lazy instantiation
The Java policy of not loading a class until it is actually referenced.
method area
A run-time memory area in a JVM where all the class bytecodes are stored.
miss rate
The fraction of class requests that cannot be satisfied from the client's local cache.
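
As a worked example of the miss rate, the sketch below replays a sequence of class requests against an initially empty cache and divides the number of misses by the number of requests. It assumes an unbounded cache with no eviction, which is a simplification; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissRate {
    /** Returns misses / requests for an unbounded, initially empty cache. */
    static double of(List<String> requests) {
        if (requests.isEmpty()) {
            return 0.0;
        }
        Set<String> cache = new HashSet<>();
        int misses = 0;
        for (String className : requests) {
            // add() returns true only when the class was not cached yet.
            if (cache.add(className)) {
                misses++;
            }
        }
        return (double) misses / requests.size();
    }
}
```

For the request sequence A, B, A, A the first two requests miss and the last two hit, giving a miss rate of 0.5.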
native methods
Functions compiled to a host CPU's machine language that Java programs invoke in
order to interact with a specific host operating system.
object repository
The concept that pre-compiled Java objects can be persisted to a relational
database in a structured manner. In contrast, a file repository would be an up-to-
date named container placed on a file system for fast distribution.
object serialization
The mechanism that takes any object that implements the Serializable interface
and turns it into a sequence of bytes that can later be restored fully into the
original object, particularly across a network.
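
A minimal round trip through object serialization might look like the following sketch, which uses the standard ObjectOutputStream and ObjectInputStream classes with an in-memory byte array in place of a network connection. The Point class and the helper method names are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationDemo {
    /** Hypothetical value class; implementing Serializable is all that is required. */
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Turns an object into a sequence of bytes. */
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from the bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The restored object is a distinct instance with the same field values as the original.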
overloaded methods
Class methods are overloaded if they have the same name, are declared in the
same scope, but have different parameter lists. Methods that differ only in their
return type are not overloaded. The programmer cannot overload Java operators.
overridden methods
Non-private class methods with the same name and parameter list in a derived
class as in the ancestor class whose behavior is then replaced by the derived
implementation.
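
The difference between overloaded and overridden methods can be seen in a short example. The Shape and Circle classes below are hypothetical and exist only to show the two cases side by side.

```java
class Shape {
    String describe() {
        return "shape";
    }

    // Overload: same name and scope as describe(), different parameter list.
    String describe(String prefix) {
        return prefix + " " + describe();
    }
}

class Circle extends Shape {
    // Override: same name and parameter list as Shape.describe(), so this
    // implementation replaces the ancestor's behavior for Circle instances.
    @Override
    String describe() {
        return "circle";
    }
}
```

Which overload is called is decided at compile time from the argument types, while which override runs is decided at run time from the actual class of the object.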


parent-delegation model
The policy that determines which class loader actually loads a given class. The rule
is that the JVM uses the same class loader that loaded the referencing (or calling)
class to load the referenced (or called) class.
push technology
An event notification mechanism in which the database engine responds to specific
event definitions by re-issuing database queries and sending the result sets back to
a client. The client does not have to poll the database to get answers; it simply
listens for data. Similar to publish-subscribe event models.
protection domain
A protection domain defines all the permissions that are granted to a particular
code source, which corresponds to one or more grant clauses in a policy file.
reflection
The built-in Java mechanism that queries or introspects an object to determine its
constructors, fields, and methods.
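
As a small illustration of reflection, the sketch below looks up a class by name at run time and lists the names of its public methods using the standard java.lang.reflect API. The ReflectionDemo class and its method name are assumptions made for this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ReflectionDemo {
    /** Returns the sorted, de-duplicated names of a class's public methods. */
    static List<String> publicMethodNames(String className) throws ClassNotFoundException {
        Class<?> type = Class.forName(className); // resolve the class at run time
        Set<String> names = new TreeSet<>();
        for (Method method : type.getMethods()) {
            names.add(method.getName());
        }
        return new ArrayList<>(names);
    }
}
```

Constructors and fields can be inspected in the same way with getConstructors() and getFields().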
Remote Method Invocation (RMI)
A Java-specific mechanism for transparently invoking and accessing remote
distributed objects.
security manager
A special object that defines a custom security policy for a client component.
symbolic references
Unresolved text strings in class streams that point to other methods or classes.
transitive closure
The complete, recursively invoked, set of classes that make up a class, both by
reference and composition.
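
The transitive closure of a class can be computed by repeatedly following first-level references until no new classes appear. The sketch below assumes that each class's directly referenced classes are already known and supplied as a map; the ClassClosure name and the map representation are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClassClosure {
    /**
     * Returns every class reachable from root, including root itself.
     * The references map gives the first-level references of each class.
     */
    static Set<String> of(String root, Map<String, List<String>> references) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            if (!seen.add(current)) {
                continue; // already visited; this also guards against cycles
            }
            for (String next : references.getOrDefault(current, Collections.<String>emptyList())) {
                if (!seen.contains(next)) {
                    pending.add(next);
                }
            }
        }
        return seen;
    }
}
```

Classes that are not reachable from the root are left out, even when they themselves reference classes inside the closure.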
URL
Uniform Resource Locator. A standard format for locating resources on
the Internet.