Citation
The role of automated test tools in the spiral development model

Material Information

Title:
The role of automated test tools in the spiral development model: an X-Runner case study
Creator:
Swagerty, John David
Publication Date:
1996
Language:
English
Physical Description:
viii, 72 leaves : illustrations ; 28 cm

Subjects

Subjects / Keywords:
Computer software ( lcsh )
Software engineering ( lcsh )
Computer software ( fast )
Software engineering ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaf 72).
General Note:
Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Computer Science and Engineering.
Statement of Responsibility:
by John David Swagerty.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
36434476 ( OCLC )
ocm36434476
Classification:
LD1190.E52 1996m .S93 ( lcc )

Full Text
THE ROLE OF AUTOMATED TEST TOOLS
IN THE SPIRAL DEVELOPMENT MODEL:
AN X-RUNNER CASE STUDY
by
John David Swagerty
B.S. Northeastern Oklahoma State University, 1985
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science and Engineering
1996


This thesis for the Master of Science
degree by
John David Swagerty
has been approved
by
Date


Swagerty, John David (M.S., Computer Science and Engineering)
The Role Of Automated Test Tools In The Spiral Development Model:
An X-Runner Case Study
Thesis directed by Associate Professor William J. Wolfe
ABSTRACT
This thesis presents the results of a case study investigating the
employment of automated test tools to evaluate an X-windows, GUI-based
application which used the Spiral Development Model. The case study
focused on actual performance metrics derived from testing an application
using both manual and automated approaches. The differences in the test
results were examined, and projections for efficiency improvements were
derived from them. Particular emphasis is placed on overcoming side
effect propagation, which is a potential problem with the spiral
development model.
The results of this case study have been the development of a new
approach to test design, which anticipates the impact automated test tools
will have on system cost, schedule, and performance. Test engineers can
use these tools to properly design a balanced methodology which will
utilize automated testing where it provides a value-added contribution, and
avoid automated testing where the benefits of doing so are not as promising.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
William J. Wolfe


CONTENTS
Chapter
1. Introduction............................................................................1
1.1 Opportunities and Challenges in the Software Industry...............................1
1.2 Costs of Software Development.......................................................1
1.3 Changes to DOD Software Procurement.................................................2
1.4 Rapid Application Development and the Air Force.....................................2
1.5 Using Automated Test Tools..........................................................3
2. Purpose.................................................................................4
2.1 Goal of This Thesis.................................................................4
2.2 Why the Spiral Development Model Was Chosen.........................................4
2.3 The Choice of the Case Study Application............................................4
3. The Spiral Development Model............................................................5
3.1 What Are Process Models?............................................................6
3.2 What Is the Spiral Model?...........................................................6
3.3 Risk Analysis in Software Development...............................................7
3.4 Development Phases in the Spiral Model .............................................7
3.4.1 Quadrant A..........................................................................8
3.4.2 Quadrant B..........................................................................9
3.4.3 Quadrant C..........................................................................9
3.4.4 Quadrant D..........................................................................9
3.5 Advantages of the Spiral Model......................................................9
4. Traditional Verification/Certification Approaches......................................10
4.1 The DOD Approach...................................................................11
4.1.1 Unit Testing.......................................................................11
4.1.2 Integration & Testing..............................................................12
4.1.3 Systems Testing ...................................................................12
4.1.4 Systems Acceptance Testing.........................................................12


4.2 The Shortfalls of the DOD Approach..................................................12
5. The Case for Automated Testing..........................................................13
5.1 New Demands on Traditional Organizations............................................14
5.2 The Need for New Methods of Problem Solving Using Technology........................15
5.3 Demands Caused by the Spiral Development Model......................................15
6. Automated Test Tools: X-Runner..........................................................16
6.1 Description.........................................................................17
6.2 Initial X-Runner Employment within Hughes...........................................17
6.2.1 Impact of X-Runner on Existing Hughes Organizations................................17
6.2.2 The Need for Selectivity to Implement X-Runner......................................18
7. The Case Study Environment..............................................................18
7.1 The Timeline Initializer Function (TIF).............................................19
7.2 The Application ....................................................................19
7.3 The Hardware........................................................................19
7.3.1 The Test Hardware Environment..................................................... 22
7.4 The Development Approach............................................................22
7.4.1 The Initial Software Development Plan...............................................22
8. Integrating X-Runner During System Development..........................................24
8.1 Test Strategy.......................................................................25
8.2 Test Execution .....................................................................26
8.2.1 Spiral I. GUI.......................................................................26
8.2.2 Spiral II. The Timeline Object .....................................................27
8.2.3 Spiral III. Requirements Development Tools..........................................30
8.2.4 The Complexity of This Analysis During Spiral III...................................31
8.2.5 Spiral IV. Requirements Translation/Schedule Scoring................................31
8.2.6 Testing The Context-Sensitive Widget Set............................................32
8.2.7 Miscellaneous Table Notes ..........................................................32
8.2.8 The Effectiveness of X-Runner for Green Requirements ...............................33
8.3 The Improved Test Design Process....................................................33
8.3.1 The Requirements Rating Procedure ..................................................33
8.3.2 The Test Methodology Impact on Cost/Schedule (T-MICS) Model ........................36
8.4 The New Architecture................................................................37


8.4.1 An Improved Test Design............................................................37
8.4.2 New Possibilities of Test Design Using X-Runner....................................38
8.4.3 Step 1: GUI Checkout...............................................................39
8.4.4 Step 2: Requirement Testing........................................................40
8.4.5 Step 3: Regression Testing.........................................................40
8.4.6 Step 4: Stress Testing.............................................................40
8.4.7 The Goals of Stress Testing ......................................................41
8.4.8 Stand-Alone Stress Testing.........................................................41
8.4.9 Multiple Application Stress Testing................................................42
9. Summary...............................................................................43
9.1 The Fundamental Question Concerning Using ATT......................................44
9.2 Observations ......................................................................44
9.2.1 The Need for Customer Support......................................................44
9.2.2 The Need for Organization Support..................................................44
9.2.3 The Need to Customize Tools for Problem Domain.....................................45
Appendix
A. Facts About X-Runner................................................................46
B. The Timeline Application............................................................49
C. The Formal DOD Development Test Process ............................................52
D. The TIF Test Development Model......................................................54
E. Lessons Learned from the TIF Evaluation Using X-Runner..............................58
F. The Requirements Rating Procedure ..................................................62
G. The Test Methodology Impact on Cost/Schedule (T-MICS) Model.........................65
H. Glossary............................................................................69
References..............................................................................72


FIGURES
Figure
3.1: The Spiral Development Model (SDM)......................................8
7.1: Sample Timeline Output Built By TIF....................................21
7.2: Test TIF Hardware Configuration.........................................22
8.1: The Adaptive Test Architecture (ATA)....................................38
A-1: X-Runner Test Development Environment...................................48
D-1: The Cyclical Nature Of Test Development.................................54
D-2: Early Stages Of Test Development........................................55
D-3: Test Execution For Both Manual And Automated Approaches.................56
F-1: The Requirements Rating Procedure For TIF0003...........................62
G-1: T-MICS Results Evaluating TIF (First Spiral)............................67


TABLES
Table
5.1: Hours Spent for Manual vs. Automated Testing on a Sample Application.....15
8.1: GUI Test Results (Spiral I)..............................................26
8.2: Timeline Test Results (Spiral II)........................................29
8.3: Requirements Translation/Schedule Scoring Test Results (Spiral IV).......32
8.4: Requirements Rating Procedure Scores.....................................34
8.5: Requirements Rating Procedure Evaluation Criteria........................36
8.6: Impact of ATA on Test Engineer Level of Effort...........................38
A-1: Recognized X-Runner GUI Object Classes...................................48
B-1: The Notional Satellite (NS) Operational Parameters.......................49
D-2: Relative Test Execution Performance per Hour.............................57


1. Introduction
1.1 Opportunities and Challenges in the Software Industry. The explosive growth
in the software industry has created many opportunities as well as challenges. It is
now possible to develop significantly large applications with substantially less effort.
While software and systems engineers have been able to leverage new technologies
and design methods to increase their productivity, the approaches widely used in
industry today to verify program correctness have not kept pace.
1.2 Costs of Software Development. In commercial software development, up to
25% of the entire project costs are in the test area.1,2 Additionally, industry averages of
on-time delivery of products are down to a low of 16.2%. When the complexities of
object-oriented development and diverse client-server architectures are considered,
the job of testing software to reduce performance risk has become more complicated.
1.3 Changes to DOD Software Procurement. In the arena of government
applications, particularly the Department of Defense (DOD), this is a growing
concern. DOD expects to spend $40 billion this fiscal year for software development
1 Client Server Today. February 1995
2 ibid.


and maintenance.3 Unlike previous years, today's DOD software is produced using a
variety of methods, including economical commercial-off-the-shelf (COTS)
software. This allows the development life cycle to be shortened significantly.
Obtaining waivers to use the more current and inexpensive product is becoming more
common. Additionally, the easing of regulations to allow quicker updates to existing
software has created opportunities for system enhancement that would have been
unheard of a few years ago. Because of these changes, new development models are
being used at an unprecedented rate.
1.4 Rapid Application Development and the Air Force. One approach favored by
some U.S. Air Force programs has been to use a Rapid Application Development
(RAD) method known as the Spiral Development Model (SDM). This technique
expands functionality in a series of releases of increasingly more sophisticated
applications. One major design concern is the amount of effort required by testing
organizations to keep pace. A programmer may add only a single module to an
existing application. However, the test engineer must examine both the new module
and the rest of the existing application to ensure side effects have not propagated
through previously functional code.
3 Testing DOD Software, Defense Software Engineering Report. Software Technology Support Center,
October 1995


1.5 Using Automated Test Tools. To cope with this increase in responsibility,
several vendors have developed automated test tools (ATT), which perform many
duties previously performed by test engineers. While ATTs perform many of the
bookkeeping tasks of testing adequately, they are not robust enough to provide
adequate relief in a meaningful way without significant tailoring to the problem
domain.


2. Purpose
2.1 Goal of this Thesis. The purpose of this thesis is to use a specific ATT, X-
Runner from Mercury Interactive Software, on a demanding graphical user interface
(GUI)-based application to develop scoring criteria which will help planners decide
what applications should be tested using ATTs, to what extent, and what the expected
impacts to cost and schedule will be.
2.2 Why the Spiral Development Model Was Chosen. This case study employs a
non-traditional software development approach known as the spiral development
model. This model is specifically designed to develop a product in a series of
releases, each with increasing capabilities. The iterative nature of this approach
tests the ability of test designs to cope with the side-effect propagation problem,
which is the largest obstacle to fully automating the test process.
2.3 The Choice of the Case Study Application. The specific customer for this
application is the U.S. Air Force (USAF) which needs the ability to schedule a variety
of conflicting missions on a tightly constrained resource, in this case a common
geosynchronous satellite. A military application was chosen as it demands the
highest standards of reliability compared with most other potential applications. The
software must be efficient, but must display a degree of fault tolerance not required
by commercial customers.


3. The Spiral Development Model
3.1 What Are Process Models? Software process models are tools which aid
systems engineers and management in the proper planning and development of
software applications. They provide a framework that assists planning by defining
the expected sequences of events, development and management activities, reviews,
products, and milestones for a project. The phases in a software process model
increase the visibility of individual activities within the complex, intertwined network
of events during the development of a software product. Each cycle is completed
with a review involving vested, interested parties to determine whether and how
to proceed.4
3.2 What is the Spiral Model? The spiral model represents activities related to
software development as a spiraling progression of events that moves outward from
the center of the spiral. For each development phase, from project conception
through preliminary design, this model places great emphasis on defining the
objectives, alternatives, and constraints, evaluating the alternatives and
their potential risks, developing and verifying the compliance of an interim product
4 Sodhi, Jag., Software Engineering: Methods, Management, and CASE Tools. TAB Books, 1991.


(e.g. prototypes), and planning for the next phase using knowledge from the previous
phases. As B. Boehm, the inventor of the spiral development model, stated:
The model reflects the underlying concept that each cycle involves a progression that
addresses the same sequence of steps for each portion of the product and for each of its levels
of elaboration from an overall concept-of-operation document down to the coding of each
individual program. 5
3.3 Risk Analysis in Software Development. In the overall field of software
development, where up to 50% of software projects lead to no useable products, the
spiral model is useful in promoting reasoned analysis during the life of the project.6
For example, if the risk analysis conducted after the definition of requirements
showed that the system was not feasible, the requirements can be scaled back, or the
entire project modified before large amounts of resources are wasted.
3.4 Development Phases in the Spiral Model. The spiral model proceeds through
four distinct quadrants (steps) during each cycle. Figure 3.1 describes the spiral
development model in graphical detail.
5 Boehm, Barry, A Spiral Model of Software Development and Enhancement. IEEE Computer, Vol. 21, No. 5,
May 1988.
6 Software Engineering Guidebook, p. 3-13, Copyright 1994, Hughes STX Corporation


Figure 3.1: The Spiral Development Model (SDM)
3.4.1 Quadrant A: Determine objectives, alternatives, and constraints. Each cycle
of the spiral begins with this step to identify:
The actual system objectives, e.g., performance, functionality, changeability, etc.
Alternative approaches, such as design A, reuse, purchase.
Limitations that affect these alternatives, such as cost and time available.


3.4.2 Quadrant B: Evaluation of alternatives; identify and resolve risks. This
process frequently identifies areas of uncertainty that are often a significant source of
risk. Prototypes, simulations, questionnaires, and analytical models may be required
to ensure the cost-effectiveness of the design approach or method of risk mitigation.
At the end of this process, the next step of system development is decided and could
contain one of the following alternatives:
Proceed with the next phase
Develop a model
Change the objectives
Revise the constraints
Adopt a more traditional, evolutionary development model
Stop the development
3.4.3 Quadrant C: Develop and verify next-level product. The products vary and
can be a plan, software requirements, software design, code, simulation, or prototype
to address a specific problem. The product is then verified to ensure it meets the
objectives set in Quadrant A.
3.4.4 Quadrant D: The next iteration of the spiral is outlined. These plans are
based on information and lessons learned from the last completed step.
3.5 Advantages of the Spiral Model. The spiral model encourages analysis of
objectives, alternatives, and risks at each stage of development, providing an
alternative to one large commitment/decision point at the start of the project. In the
figure of the spiral model above, the farther one moves away from the intersection of
the axes, the greater the cost commitment. Additionally, the spiral model allows for
objectives to be re-evaluated and refined based on the latest perception of needs,
resources, and capabilities.7
7 Software Engineering Handbook, Build 3, p. 2-5, Copyright March, 1992, Information Systems Division,
Hughes Aircraft Corporation.


4. Traditional Verification/Certification Approaches
4.1 The DOD Approach. Testing DOD software has traditionally been a resource-
intensive proposition. It takes time, people, and hardware resources. It also generates
a great deal of plans, reports, and other paperwork that becomes part of the historical
database associated with the system. Generally speaking, tests are designed based on
the requirements for the system in an imperative manner. In most DOD models,
testing consists of four distinct layers of verification: unit testing, integration &
testing, systems testing, and systems acceptance testing.
4.1.1 Unit Testing. This phase of testing is informal and usually conducted by the
software engineering function instead of full-time test engineers. The focus is on
ensuring all coding and design errors are caught and fixed before the customer sees
the product. Examples of unit testing are ensuring code compiles and links properly,
subsystem errors are trapped, shop coding standards are followed, and established
configuration management procedures are being enforced.
4.1.2 Integration & Testing. There is no formal dividing line between unit testing
and this phase. The focus has changed, however, to the proper integration of the code
units produced earlier into a cohesive program. Finding and correcting problems in
this phase are usually more expensive and time consuming. Generally, testing at this
level is performed by more experienced software and systems engineers who have a
deeper understanding of overall system objectives.
4.1.3 Systems Testing. In this phase, formal evaluation by a full-time testing
organization begins. The intent of this testing is to operate the software under as
close to operational conditions as possible, exercising the fullest set of functionality
possible. This requires a comprehensive understanding of not only the capabilities of
the software, but also the desires of the end user, which are sometimes at odds.
4.1.4 Systems Acceptance Testing. This phase of testing is further divided into two
activities: acceptance testing and system installation. Acceptance testing is a formal,
comprehensive effort that identifies what problems, if any, exist with the system and
how they will be corrected. This is the final time a system is evaluated before
becoming part of the customer's baseline. Installation is also a formal activity
concerned with building and integrating the new system with existing software on
hardware at the customer's location. At the conclusion of this step, the system is
expected to perform to specifications and be fully useable by end users.
4.2 The Shortfalls of the DOD Approach. As we have seen, software verification
in a DOD environment is a continuous, time-consuming effort, performed by both
programmers and dedicated test personnel. It is an attempt by the development
organization to demonstrate to the customer that all reasonable effort has been made
to ensure the products will meet expectations. Since companies have gone out of
business by neglecting these efforts, most surviving ones do not take this step lightly.


5. The Case for Automated Testing
5.1 New Demands on Traditional Organizations. The growing complexities of
software have placed demands on the verification process that cannot be
accomplished using traditional test methods. The growing emphasis on
function/module reuse with tools that facilitate the integration of these modules in
increasingly diverse ways has made it possible to develop significantly large
applications in a small amount of time. However, the technology to validate these
new applications has not kept pace. In many commercial markets, this would not
pose a major problem. In domains where there is little or no fault tolerance, e.g.,
satellite health and welfare, status and commanding, etc., the need for complete
verification has not subsided. What is called for are tools that leverage the test
organization's capabilities in the same way that CASE tools and object-oriented
design approaches have done for software engineers. Table 5.1
demonstrates the dramatic time-saving potential of ATTs using SQA's TeamTest test
suite on a generic Windows application.8 Note: All figures are given in hours.
8 Client Server Today. February 1995


TEST ELEMENT                          Manual Testing   Automated Testing   Time Difference
Test Plan Creation                          32                40                  8
Test Development (1,750 Test Cases)        262               116                145
Test Execution                             466                23                443
Test Results Analyses (700 Errors)         117                58                 59
Error Status/Correction Monitoring         117                23                 94
Report Creation                             96                16                 82
Total Duration (in Hours)                 1090               277
Table 5.1: Hours Spent for Manual vs. Automated Testing on a Sample Application
5.2 The Need for New Methods of Problem Solving Using Technology. This
paradigm shift in the test domain has far-reaching implications. Previously, there was
a distinct separation between the activities of software development and
software test. The old adage of "throwing the code over the wall" to the testers was
only slightly metaphorical. New methods of software development mean the test has
to be much more carefully designed than before. It is no longer cost-effective to test
each requirement in an imperative manner.
5.3 Demands Caused by the Spiral Development Model. The iterative nature of
software development using the SDM forces test designers to reexamine their test
plans at each spiral. This means that a given section of code may have to be retested
numerous times, which can have a noticeable impact on labor costs. Additionally,
when a certain module is changed, a large amount of test effort must be expended to
test nearby but unchanged software to ensure the application still performs
previously-tested functionality.


6. Automated Test Tools: X-Runner
6.1 Description. X-Runner is a commercial software package specifically designed
to provide automated testing functionality for X-windows applications. It offers the
ability to approach testing problems in manners which are impractical to accomplish
in a manual testing mode, i.e. repetitive tasks, consistently accurate test execution,
foreground and background testing, and expansible capability through both C and
UNIX. A further description of X-Runner capabilities and how they were used during
this case study is contained in Appendix A.
6.2 Initial X-Runner Employment within Hughes. There were two basic reactions
encountered upon integrating X-Runner into the test organization at HITC: use the
new tool for every single requirement, or resist using it for any requirements. Since
the tool was purchased by a government customer for a specific program,
management provided emphasis consistent with the former view.
6.2.1 Impact of X-Runner on Existing Hughes Organizations. Because of this, a
major training and experimentation program was instituted to prepare the existing test
organization for X-Runner integration. More people with programming experience
were assigned to test. Traditionally, the programmer to test engineer ratio at Hughes
for normal governmental development programs was 6:1. Using ATTs would lower
that ratio, making the overhead margins larger. Tighter coordination between test and
software development organizations occurred. Test schedules and budgets were
adjusted to cope with the changes. While many of these changes have been adjusted
or abandoned over time, the overall test process has improved, primarily due to
improved integration between software development and test organizations.
6.2.2 The Need for Selectivity to Implement X-Runner. Early in this process it
became apparent some selectivity was needed in implementing X-Runner.
Generating automated test cases and verification processes for every requirement was
too time consuming to justify the additional expense. The question remained: when
should X-Runner be used, for what level of verification, and what skill mix is
required to accomplish these goals? The experience gained during the actual software
tests provides insight into answering these questions.


7. The Case Study Environment.
7.1 The Timeline Initializer Function (TIF). To further examine the effect of
ATTs on the software development process, a sample application under development
that used the spiral development model was selected. For further information
regarding generic timeline issues and the packed-window scheduling problem, refer
to Appendix B. The selected system was evaluated in three distinct areas: the
application (problem domain), the hardware, and the development cycle.
7.2 The Application. The Timeline Initializer Function (TIF) is a software
application that will build a prototype schedule of tasks on a satellite resource
management tool known as the timeline. The timeline is a graphical representation of
jobs the mission planners want the satellite to perform during a given interval.
Additionally, specific resources aboard the spacecraft offer unique opportunities and
challenges to developing an optimized schedule of work.
7.3 The Hardware. Currently, the TIF is designed to operate on a Sun-based
workstation, preferably a Sparc-10 or greater. The necessity of the timeline
workstation to monitor real-time telemetry, ranging, and other spacecraft health data
requires significant multi-tasking abilities. Additionally, the timeline operator must
monitor message traffic from a variety of sources to see if adjustments to the existing
timeline are required by dynamic events, such as a change in mission priorities. For
the purposes of this case study, a two-monitor display was used, with one monitor to
contain the application under test, while the other one contained all associated X-
Runner windows. This approach allowed operators and software engineers to interact
with the software, while a test was being conducted in a passive, background mode.
For some of the later test efforts, where throughput was a concern, X-Runner was
executed on a different CPU and remotely connected to the machine which contained
TIF. This approach removes processor contention issues from the test results, but
was used sparingly to limit the impact of testing on other development activities.
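As an aside on the mechanics of this arrangement: driving an application that lives on another machine's screens is an ordinary X11 configuration in which the test host simply points its DISPLAY at the workstation that owns the application's displays. The short Python sketch below is illustrative only; the hostname is hypothetical, and in the case study the remote connection was made by X-Runner itself rather than by a wrapper like this. It merely confirms that the remote display answers a basic query before any test traffic is sent.

    import os
    import subprocess

    # Hypothetical hostname for the workstation that owns the TIF displays.
    TIF_HOST_DISPLAY = "tif-workstation:0.0"

    def remote_display_ok(display: str) -> bool:
        """Return True if the remote X display answers a basic query (xdpyinfo)."""
        env = dict(os.environ, DISPLAY=display)
        try:
            subprocess.run(["xdpyinfo"], env=env, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return True
        except (subprocess.CalledProcessError, FileNotFoundError):
            return False

    if __name__ == "__main__":
        print("remote display reachable:", remote_display_ok(TIF_HOST_DISPLAY))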



Figure 7.1: Sample Timeline Output Built by TIF
7.3.1 The Test Hardware Environment. The testing of TIF was executed on a
two-headed Sparc-20, connected to a simulated ground station through a generic
Ethernet interface. Figure 7.2 displays the screen configuration of TIF in the actual test
environment. TIF was located on the primary screen (display :0.0), while the X-
Runner and X-Runner Backbone support utilities were echoed on the secondary
screen (display :0.1). Both displays shared a keyboard and mouse. The X-Runner
Backbone utilities provided messages generated during test execution. The messages
were routed to one of three windows: general messages, error messages, and debug
information. By examining the output on these windows, as well as the X-
Runner/TSL window, the test engineer could examine the state of the application
before a particular TIF operation occurred and see the results afterwards.
Figure 7.2: Test TIF Hardware Configuration
7.4 The Development Approach. TIF was a portion of an application being
developed as a training vehicle for newly-hired software engineers on both X-
windows and mission planning applications. Over the course of the case study,
a number of people worked on both the code and the test program. Significant,
recurring management participation in the case study helped ensure the integrity and
consistency of findings over the course of the entire project.
7.4.1 The Initial Software Development Plan. The initial software development
plan (SDP) called for four distinct spirals for the initial operational capability (IOC)
for TIF. At the end of each spiral, the customer would conduct a design review to
determine whether the software had met its original goals, whether any of the adjustments on this
spiral created a need to review the original goals/capabilities, and whether development
should continue into the next spiral. The following list expands the goals of each spiral.
7.4.1.1 Spiral I: GUI. At this level, the primary focus was generating all displays,
pop-ups, and other human-computer interaction (HCI) concerns. Very little of the
underlying functionality would be implemented, but the logical flow of the design
would be established. Where needed, program stubs would provide a simulation
function. The expected coding time was estimated at eight labor-weeks. Testing was
allocated two labor-weeks.
7.4.1.2 Spiral II: The Timeline Object. This phase brought more emphasis on the
basic timeline object. This object is more than the simple placement of tasks in a
packed window. Jobs on the timeline behave like separate objects, with unique
constraints and opportunities. Proper reflection of timeline dynamics, based on
operator interaction and other dynamic events, was not a straightforward effort, yet
was fundamental to the value of this product. The budgeted coding time was
estimated at 20 labor weeks, with three labor-weeks allocated for testing.
7.4.1.3 Spiral III: Requirements Development Tools. After the GUI was functional
and the timeline object performed reliably, numerous support tools were required to
actually operate the application in any real sense. These tools were primarily text-
related database functions which helped develop scheduling requirements from
multiple sources. The output of these tools served as the basis for the prototype
schedule generated and displayed in the timeline object. The budgeted coding time
was estimated at ten labor weeks. Testing was allocated two labor weeks.
7.4.1.4 Spiral IV: Requirements Translation/Schedule Scoring. The final spiral used
for TIF focused on ensuring the validity of the generated prototype schedule. The
results of the scheduling algorithm were compared with schedule parameters with
known optimal values, and the deltas tracked. The total worth of the schedule was
calculated in a variety of ways, and cross-checked with a human-developed one. The
expected coding time was estimated at four labor weeks, with four labor weeks
allocated for testing.


8. Integrating X-Runner During System Development
8.1 Test Strategy. An umbrella test plan was developed to cover the entire scope of
the project. The portion covering the initial development effort, Spiral I, was completely
developed. Subsequent spirals were covered in increasingly fuzzier detail, due to the dynamic
nature the product could take under the SDM. For this reason, the Spiral II portion of
the test plan was released for external staffing, but not included in the umbrella plan.
Spiral III was released for internal Hughes staffing. Spiral IV was still being
developed by key personnel on the development team. As the project proceeded to the
next spiral, the test plan for each phase would move to the next step in its completion
cycle. This assisted in keeping the project on track, while allowing the flexibility to
adjust to changes in system requirements.
8.2 Test Execution. Since the SDM provides multiple releases of software, there is
a need for constant involvement by the supporting test team. Officially, test
involvement began as an advisory and information gathering role during unit and
integration testing. Much of the test case development occurred during these phases,
with particular emphasis on re-using as much of the software engineers' test efforts as
possible. This approach also allowed systems engineers to steer test case
development in a more prominent way than is traditionally done. This improved the
quality of the total test program and helped ensure better coverage throughout the
code.
8.2.1 Spiral I. GUI. Most of the testing of this version of the software involved
performance of the GUI and the Motif widget set. Each object in the entire GUI was
manipulated, with results noted, a total of ten times. Since the TIF contains 129
separate objects, over 1,200 separate tests were performed. The following table lists
the results of those tests:
Object Tested   Occurrences of Each Object   Manual Time Required (Min.)   X-R Time Required (Min.)   Time Savings (%)
Window                       7                        N/A                          N/A                      N/A
pushbutton                  36                     400.00                        22.65                   94.34%
checkbutton                  9                     100.00                         7.53                   92.47%
radiobutton                  2                      25.00                         1.78                   92.88%
list                         3                     350.00                        29.33                   91.62%
edit                         1                     100.00                        17.87                   82.13%
scroll                      12                     600.00                        72.59                   87.90%
menuitem                     5                     250.00                        24.93                   90.02%
statictext                  53                     125.00                         9.19                   92.64%
object                       1                     100.00                        40.97                   59.03%
Total                      129                    2050.00                       226.84                   88.93%
Table 8.1: GUI Test Results (Spiral I)
8.2.1.1 Miscellaneous Table Notes: The window objects were tested through their
component widgets. The numbers in the "Occurrences of Each Object" column
denote the total number of those objects within the entire TIF application. The "Time
Required" columns list the actual time, in minutes, each test approach needed to
examine all the objects ten times. These times do not include test preparation, test
case development, etc., but do include test report generation. The "Time Savings"
column represents the amount of time saved using automated test tools for this type of test.
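The "Time Savings (%)" column and the totals above follow directly from the two time columns. As a check, the short Python fragment below, using only the figures from Table 8.1, reproduces the totals, the overall savings, and the execution-time difference of roughly 30 labor hours discussed in the next paragraph.

    # Figures taken from Table 8.1 (manual minutes vs. X-Runner minutes per object class);
    # the window class is omitted because it was tested through its component widgets.
    times = {
        "pushbutton":  (400.00, 22.65),
        "checkbutton": (100.00, 7.53),
        "radiobutton": (25.00, 1.78),
        "list":        (350.00, 29.33),
        "edit":        (100.00, 17.87),
        "scroll":      (600.00, 72.59),
        "menuitem":    (250.00, 24.93),
        "statictext":  (125.00, 9.19),
        "object":      (100.00, 40.97),
    }

    def savings_pct(manual: float, automated: float) -> float:
        """Time Savings (%) as used in Table 8.1."""
        return 100.0 * (manual - automated) / manual

    total_manual = sum(m for m, _ in times.values())   # 2050.00 minutes
    total_xr = sum(x for _, x in times.values())       # 226.84 minutes

    print(f"Totals: {total_manual:.2f} min manual vs {total_xr:.2f} min automated")
    print(f"Overall savings: {savings_pct(total_manual, total_xr):.2f}%")               # 88.93%
    print(f"Execution time difference: {(total_manual - total_xr) / 60:.1f} labor hours")  # ~30.4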
8.2.1.2 The primary difference between automated and manual testing was the
amount of time required to perform each test on multiple occasions. The total test
execution time difference was over 30 labor hours. Dramatic differences in time
allow test engineers to quickly examine GUI test data and move on to other test
objectives. In many cases, the automated testing was more effective, as the input data
was more consistent and quickly processed. The "object" class referenced in the table is
the timeline, a non-standard object that was unstable during this early portion of the
test.
8.2.2 Spiral II. The Timeline Object. Although some of the earlier versions of the
timeline object suffered from a lack of stability, later versions improved significantly.
The timeline object was based loosely on several other similar developments, so the
software matured rapidly. From the testing perspective, the most challenging aspect
of this spiral was the almost complete reliance on non-Motif-standard widgets to
implement the timeline. Due to this approach, all TSL scripts which served as test
procedures were written using the analog testing mode. This created the necessity of
including numerous safety checks, such as ensuring the window was in exactly the
proper physical location before a given test thread executed.
8.2.2.1 Acceptance Criteria. Due to the complexity of analog testing, verification of
the actual screen representation of scheduled events was not attempted using
X-Runner. Instead, the test relied on system calls within TSL to existing C-shell
scripts, which examined the scheduling database associated with TIF to ensure
events were scheduled at the proper time and on the proper resource. Early manual
tests focused on ensuring what was graphically depicted matched the contents of the
database. Once a degree of confidence was reached that the timeline object could
accurately portray events from its databases, a greater reliance was placed on the
automatic verification of the timeline using database queries.
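The verification idea above reduces to a query against the scheduling database rather than an inspection of the drawn timeline. The sketch below illustrates that shape of check in Python; it is not the actual mechanism (the real checks were existing C-shell scripts invoked through TSL system calls), and the export file name, field names, and expected events shown here are hypothetical.

    import csv

    # Hypothetical export of the TIF scheduling database: one row per scheduled event,
    # with the fields the verification cared about (event, start time, resource).
    SCHEDULE_EXPORT = "tif_schedule_export.csv"

    # Hypothetical expected results for one test thread.
    EXPECTED = [
        {"event": "PAYLOAD_CAL", "start": "0430", "resource": "SENSOR_A"},
        {"event": "DOWNLINK_1",  "start": "0500", "resource": "XMTR_1"},
    ]

    def load_schedule(path):
        with open(path, newline="") as fh:
            return list(csv.DictReader(fh))

    def verify(expected, scheduled):
        """Check each expected event is present at the proper time, on the proper resource."""
        failures = []
        for exp in expected:
            hit = any(row["event"] == exp["event"]
                      and row["start"] == exp["start"]
                      and row["resource"] == exp["resource"]
                      for row in scheduled)
            if not hit:
                failures.append(exp)
        return failures

    if __name__ == "__main__":
        missing = verify(EXPECTED, load_schedule(SCHEDULE_EXPORT))
        print("PASS" if not missing else f"FAIL: {missing}")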
8.2.2.2 As the timeline object was contained within one requirement, TIF0002, only
one test procedure was developed for this spiral. The test was, however, quite
extensive, consisting of over two hundred discrete procedural steps and twelve
separate verification functions. The test was based on the manual approach to
evaluating timeline objects. Table 8.2 depicts the performance characteristics for each
test approach evaluating the timeline object.


Test Approach   Time Required (in Hrs.)   Degree of Compliance   Number of Tests Executed (in test window)
Manual                    4:30                  100.00%                            4
X-Runner                   :42                   99.42%                           27
Table 8.2: Timeline Test Results (Spiral II)
8.2.2.3 Miscellaneous Table Notes. The time difference between the manual and
automated approach was even more dramatic in this test effort. The "Time Required" column depicts
the amount of time required to perform a single test procedure. The timeline has the
ability to dynamically adjust time thresholds by grabbing a scroll line with the mouse
pointer and dragging. The problem is this scroll line is only one pixel wide. Precise
manipulation takes a tremendous amount of practice and patience. Using the analog
testing mode, X-Runner is interacting with TIF at the pixel level and the
playback/record features of X-Runner become very useful.
8.2.2.4 The "Degree of Compliance" column denotes X-Runner's ability to generate
test results similar to those produced by test engineers. The small difference in the
findings generated by X-Runner and human testers was largely caused by a test
engineer's ability to adjust the lengthy test procedure, in small amounts, as the test
executes. The automated test procedure is highly objective, and designed to perform
in the same manner each time. The overall test findings were close enough to allow
X-Runner to perform requirement testing for TIF0002.
8.2.2.5 The throughput of this test effort represents ~20 hours of actual wall time.
In this same period, automated testing was able to accomplish almost seven times as
many evaluations with a high degree of compliance. During the course of the test,
problems were reported to systems engineers, who made corrections and rebuilt the
code. This action caused the need to retest the software. A test engineer would be
looking at one-half day's work to accomplish this, whereas X-Runner could perform
it in less than one hour. This is particularly important when numerous requirements
need retesting, which would require more time than is humanly possible.
8.2.2.6 Although many involved with the test design were not comfortable with
allowing X-Runner to evaluate a highly-complex, non-standard object in the analog
mode, it was able to perform well within expectations.
8.2.3 Spiral III. Requirements Development Tools. The textual nature of these
tools tended to limit the ability of X-Runner to easily adapt to evaluating them. The
lessons learned from examining the timeline databases through system calls were
applicable. However, the majority of the test effort during this spiral focused on
intensive mathematical analysis to ensure the scheduling function within TIF was
generating candidate missions which did not violate any constraints, either from
schedule conflicts or orbital constraints.
8.2.4 The Complexity of this Analysis During Spiral III. The complexity of this
analysis prevented most test engineers from conducting the evaluation. Most of this
work was performed by a technical support committee comprised of specialists
throughout the company. Once the scheduling algorithms were validated, some
minor test work was conducted to ensure the data was developed, disseminated, and
stored in a manner compliant with existing requirements. This work was largely done
by test engineers, as it only had to be examined once, and those routing functions
were not rebuilt after the first baseline.
8.2.5 Spiral IV. Requirements Translation/Schedule Scoring. This spiral
presented similar problems to the test effort as did Spiral III, but to a lesser degree.
Once the scoring algorithms were validated (by the same technical support
committee), a more meaningful test program was developed for X-Runner. The basic
method of evaluation centered around three different asset utilization plans whose
optimal schedule was known. TIF was to generate a prototype schedule, based on
operator input. The new schedule was compared with the optimal schedule, and the
deltas analyzed. TIF had a sensitivity function which determined whether the
differences were significant and adjusted its processing accordingly.


8.2.6 Testing the Context-Sensitive Widget Set. The design approach used to
accomplish these requirements was almost entirely within the context-sensitive
widget set. This allowed for rapid development of automated test procedures and
acceptance criteria. The amount of time required for these stages was nearly identical
to the amount of time needed for test engineers to design a manual method. The
actual execution characteristics of these tests are contained in Table 8.3.
Test Approach   Time Required (in Min.)   Degree of Compliance   Number of Tests Executed (in test window)
Manual                   12:00                  100.00%                           25
X-Runner                  1:59                  100.00%                          200
Table 8.3: Requirements Translation/Schedule Scoring Test Results (Spiral IV)
8.2.7 Miscellaneous Table Notes. The "Time Required" column depicts the amount
of wall time needed to perform the entire test battery for this spiral. The times are not
large, as the evaluation process is not deeply involved. The "Degree of Compliance"
was identical due to this same fact. The large difference in the number of tests
executed was due to scheduling the automated test once per hour every day for the
entire test window (four weeks). Although this amount of time might seem excessive,
allowances were made in crafting the schedule to accommodate any problems that
might need a mini-spiral to fix, as well as the preparation of the final version of the
test report once all the findings were analyzed by test engineers.
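Scheduling the automated battery once per hour over a multi-week window requires nothing more elaborate than a timed loop (or an equivalent cron entry) around the test command. The following sketch is illustrative only; the wrapper script name and log file are hypothetical and are not part of the case study.

    import subprocess
    import time
    from datetime import datetime

    TEST_COMMAND = ["./run_spiral4_battery.sh"]   # hypothetical wrapper around the automated battery
    LOG_FILE = "spiral4_results.log"              # hypothetical results log
    INTERVAL_SECONDS = 60 * 60                    # once per hour
    TEST_WINDOW_HOURS = 4 * 7 * 24                # a four-week test window

    for _ in range(TEST_WINDOW_HOURS):
        started = datetime.now().isoformat(timespec="seconds")
        result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
        with open(LOG_FILE, "a") as log:
            log.write(f"{started} exit={result.returncode}\n")
        time.sleep(INTERVAL_SECONDS)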


8.2.8 The Effectiveness of X-Runner for Green Requirements. This entire spiral
demonstrated the effectiveness of using X-Runner for primarily Green requirements
(defined in Section 8.3.1). When test development can be completed rapidly and a fast,
efficient test program can begin executing at once, it simply must be done: the amount of
time this frees up for test engineers is too valuable and is needed in many other areas.
8.3 The Improved Test Design Process. As a result of the test effort for TIF using
X-Runner, a change in the traditional approach to testing seemed necessary. The
lessons learned from these results are contained in appendix E. They have been used
to develop a new process of test development, which seeks to employ ATTs for the
tasks they are most suited. At the same time, this new process attempts to restrict the
use of ATTs in areas where they are not suited or their cost benefit is not clearly
known. To assist test engineers in designing future test strategies, an Adaptive Test
Architecture (ATA) was developed, which focused on the following steps:
Rating individual requirements to determine their suitability for automated testing
and to what degree this testing should be applied.
Determining the impact on the system development schedule, in terms of both cost
and quality.
Publishing a working test strategy to guide the efforts of test engineers in implementing
the test program.
8.3.1 The Requirements Rating Procedure. This first step in the process provided
a systematic method of evaluating individual requirements to determine how suited to
automated testing they were. At the end of the process, each requirement was scored
at one of three levels: Red, Amber, or Green. The following table explains these
levels in more detail:
Level | Meaning | Explanation
Red | Performance Impact | The complexity of this requirement makes designing a cost-effective automated test a high-risk proposition.
Amber | Potential Impact | The requirement is complex or the potential benefits of automation are not sufficiently known to determine if the requirement should be tested using X-Runner. These requirements are generally automated after all other test requirements are completed with time remaining on the test schedule.
Green | Must Do | The benefits of automation are straightforward. Automating the test effort can be accomplished with little or no risk to test effectiveness.
Table 8.4: Requirements Rating Procedure Scores
8.3.1.1 The evaluations made for a given requirement are made directly for the spiral
under consideration at that time. It is possible for a requirement to be rated at a
different level during a subsequent spiral, although this is rare. Most requirements
generate their rating based on their inherent complexity, a factor which does not
usually vary between spirals. It is possible, however, for a window-level object to be
seriously modified due to the decisions made at the beginning of a spiral. These
changes may have a profound impact on the complexity of the window object, which
will in turn have a noticeable impact on the rating that particular requirement may
receive from the rating procedure.


8.3.1.2 Additionally, while Red requirements make poor automation candidates,
many Amber requirements can and should be automated. However, this conversion
should only occur after all other required test activities have been performed. It is not
acceptable to jeopardize the test schedule by automating requirements with higher
degrees of risk, unless the schedule has sufficient slack.
8.3.1.3 A number of factors were considered to reach the final score for a given
requirement. Appendix F gives a more detailed explanation of how this process is
implemented. The following table gives a brief explanation of the criteria used to
make a final determination for each requirement:


Factor | Evaluation Criteria
Average window-level widget density | The more widgets, both standard and non-standard, that appear on each window generated by the application, the more difficult the application is to test. This is partially due to the increased workload, but is also influenced by the combinatorics involved with testing every widget.
Average widget complexity | Certain widget classes, such as push_buttons and check_buttons, are straightforward to test. Other widget classes, such as non-standard objects or bitmaps, require much more effort. Appendix A contains a table of X-Runner objects and their complexity, as determined by the amount of test time required to evaluate those objects.
Depth of Window-Level Objects | The amount of system resources, particularly memory (stack size, etc.), required to track the interaction between various levels of nesting impacts the complexity of the test. This is particularly acute on a workstation that contains several real-time applications running simultaneously. Testing a new application in a stand-alone manner may be successful, while testing the application in operational conditions may not.
Standard, Reused Object? | Certain objects from previously-developed products are reused as often as possible. They are completed, tested, understood by operators, and easier to maintain. If a new application contains reused objects, its testing complexity drops accordingly.
Function Maturity/Spiral | Objects mature with each iteration around the spiral. While an object may be modified on every spiral, the likelihood of this drops with each iteration.
Test Engineer's Rating | This rating is the expert opinion of the difficulty of the test effort, based on prior experience of the test organization. This score is established through established metrics and committee evaluations.
Systems Engineer's Rating | This rating is the expert opinion of the difficulty of the test effort, based on prior experience of the systems engineer responsible for the project. The emphasis of this evaluation contains an end-to-end perspective, in addition to the impact of the test effort on the entire project. In many cases, the process of this evaluation will point to automating a test to a level that will have a negative impact on the fielding of the product, which is generally not acceptable.
Table 8.5: Requirements Rating Procedure Evaluation Criteria
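Appendix F documents the actual rating procedure. Purely to make its shape concrete, the sketch below scores one requirement against the factors of Table 8.5; the weights, the 1-to-5 scales, and the Red/Amber/Green thresholds are invented placeholders, not the values used in the case study.

    # Illustrative-only rating of one requirement against the Table 8.5 factors.
    # Each factor is scored 1 (easy to automate) through 5 (hard to automate);
    # the weights and thresholds below are invented for the example.
    WEIGHTS = {
        "widget_density": 1.0,
        "widget_complexity": 1.5,
        "nesting_depth": 1.0,
        "reused_object": 1.0,        # low score if the window reuses proven objects
        "function_maturity": 0.5,
        "test_engineer_rating": 2.0,
        "systems_engineer_rating": 2.0,
    }

    def rate_requirement(scores: dict) -> str:
        """Return Red, Amber, or Green for one requirement (hypothetical thresholds)."""
        weighted = sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)
        max_possible = 5 * sum(WEIGHTS.values())
        risk = weighted / max_possible      # 0.0 = trivially automatable, 1.0 = very risky
        if risk >= 0.7:
            return "Red"      # high-risk: do not automate
        if risk >= 0.4:
            return "Amber"    # automate only with schedule slack
        return "Green"        # must do

    example = {               # hypothetical scores for a simple, mature requirement
        "widget_density": 2, "widget_complexity": 1, "nesting_depth": 2,
        "reused_object": 1, "function_maturity": 2,
        "test_engineer_rating": 2, "systems_engineer_rating": 1,
    }
    print(rate_requirement(example))   # -> Green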
8.3.2 The Test Methodology Impact on Cost/Schedule (T-MICS) Model. This
tool, which allows systems and test engineers to interactively plan the actual design of
the test, is executed as an Excel spreadsheet. It requires background information on
the particular spiral, such as the number of requirements, schedule time, staffing
levels, labor rates, etc. Using the manual test method as the benchmark, the model
asks which approach, manual or automated, will be used for each of the three
products of the test: the test procedure, the verifying process, and test documentation.
These decisions are determined for each requirement tested. Once these decisions are
made, T-MICS will determine the impact the test approach will have on the test
schedule, the quality of the test effort, the overall effectiveness of the test effort, and
the impact of the test approach on labor costs. Appendix G contains more
information on the construction of the model, as well as sample output.
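T-MICS itself is an Excel spreadsheet, and Appendix G documents its construction and sample output. The fragment below only mirrors its general bookkeeping in Python so the idea is concrete: for each requirement the planner chooses manual or automated for each of the three test products, and the model rolls those choices up into hours, cost, and a schedule delta against the all-manual baseline. The hour figures and labor rate are invented placeholders, not values from the model.

    # Illustrative cost/schedule roll-up in the spirit of T-MICS.
    # Hour figures and labor rate below are hypothetical.
    HOURS = {
        "procedure":     {"manual": 8.0, "automated": 10.0},   # automation costs more up front
        "verification":  {"manual": 6.0, "automated": 0.5},    # but executes far faster
        "documentation": {"manual": 4.0, "automated": 1.0},
    }
    LABOR_RATE = 60.0   # dollars per hour, placeholder

    def roll_up(decisions):
        """decisions: list of dicts mapping each test product to 'manual' or 'automated'."""
        total_hours = sum(HOURS[product][choice]
                          for req in decisions
                          for product, choice in req.items())
        manual_baseline = sum(HOURS[product]["manual"]
                              for req in decisions for product in req)
        return {
            "hours": total_hours,
            "cost": total_hours * LABOR_RATE,
            "schedule_delta_vs_manual": total_hours - manual_baseline,
        }

    # Example: ten requirements fully automated, five left manual.
    plan = ([{"procedure": "automated", "verification": "automated", "documentation": "automated"}] * 10
            + [{"procedure": "manual", "verification": "manual", "documentation": "manual"}] * 5)
    print(roll_up(plan))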
8.4 The New Architecture. Incorporating these changes into existing methods has
produced a new approach to test design. These new steps are designed to accomplish
a broad range of test objectives. Some of these steps were not possible in the
traditional approach, primarily due to the intensity required. Note these steps will
occur for each spiral. However, the earlier two steps will decrease in intensity as the
application matures, with focus shifting to the latter two steps.
8.4.1 An Improved Test Design. Because of the findings of the TIF case study, a
new test design process has been developed. Figure 8.1 portrays this new approach to
test design.


Figure 8.1: The Adaptive Test Architecture (ATA)
8.4.2 New Possibilities of Test Design Using X-Runner. This new architecture
also introduces a new series of test efforts which are to be developed concurrently.
Under the previous test development process, Stage 2, Test Preparation, consisted of
designing requirements test procedures and limited regression test design. Table 8.6
gives a list of all test efforts that occur during test preparation under ATA.
These test steps are further explained below.
Test Effort | Purpose | Test Engineer Level of Effort
GUI Checkout | A thorough evaluation of the GUI | Automatically generated
Requirements Test Design | Requirement-driven development test | Same as before
Regression Test Design | Operationally-based examination of the application | Reuse of requirements test procedures minimizes development impact
Stand-Alone Stress Testing | Establishes stability of the application under test | Automatically generated
Multiple Application Stress Testing | Confirms resources can support the AUT design | Automatically generated; hardware must be scheduled in advance
Table 8.6: Impact of ATA on Test Engineer Level of Effort


8.4.3 Step 1: GUI Checkout. As far as the operator is concerned, the GUI is the
application. All their interaction with the software occurs here. For this reason, the
GUI must be consistently designed, and extremely fault tolerant. X-Runner maintains
an ASCII file which describes all objects contained in the GUI, including name,
location, and actions taken upon activation.
8.4.3.1 A TSL module, called GUI_CHECKOUT, has been designed to exercise the
entire GUI in a rigorous manner. This module examines the GUI data file and builds
a task list of objects. Based on the class of the object, GUI_CHECKOUT will
activate every object on the GUI, including pop-up and sub-windows, in varying
combinations to ensure the stability of the application.
8.4.3.2 GUI_CHECKOUT performs other operations as well. Certain GUI objects
perform in a similar manner across applications according to shop standards, e.g.,
"quit," "print," and "minimize." Routines have been included to ensure these
standard objects perform the function they represent. For example, the "quit" object
should shut down the application with no residual processes active. This routine will
examine the process list, as well as the window table, to ensure no traces of the
application under test remain. An expansible switch statement within the function
allows this list to grow as GUI_CHECKOUT evolves.
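GUI_CHECKOUT itself is a TSL module driven by X-Runner's GUI data file. The Python sketch below only illustrates the control flow described above: read the object descriptions, build a task list, dispatch a class-appropriate exercise routine for each object, and apply extra checks to shop-standard objects such as "quit." The file format, class names, and handler routines are hypothetical stand-ins, not X-Runner's actual interfaces.

    # Conceptual mirror of GUI_CHECKOUT: build a task list from a GUI description file
    # and dispatch a class-specific exercise routine for each object.

    def exercise_pushbutton(name): print(f"press {name}")
    def exercise_list(name):       print(f"select each item in {name}")
    def exercise_edit(name):       print(f"type into and clear {name}")
    def exercise_generic(name):    print(f"activate {name}")

    HANDLERS = {
        "pushbutton": exercise_pushbutton,
        "list": exercise_list,
        "edit": exercise_edit,
    }

    def check_standard_object(name):
        """Extra checks for shop-standard objects; grows like GUI_CHECKOUT's switch statement."""
        if name == "quit":
            print("press quit, then verify no residual processes or windows remain")
        elif name == "print":
            print("press print, then verify a print job was produced")

    def gui_checkout(gui_file):
        # Each line of the (hypothetical) description file: "<class> <object name>".
        with open(gui_file) as fh:
            task_list = [line.split(maxsplit=1) for line in fh if line.strip()]
        for obj_class, name in task_list:
            HANDLERS.get(obj_class, exercise_generic)(name.strip())
            check_standard_object(name.strip())

    if __name__ == "__main__":
        gui_checkout("tif_gui_objects.txt")   # hypothetical export of the GUI data file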


8.4.4 Step 2: Requirement Testing. This is the heart of the development test
effort. The primary mission of the test program is to ensure the application under test
can perform its required functionality. In the development testing arena, this is the
primary focus. Many of the other test activities contained in ATA help establish how
well above the minimum standard the application performs. ATA is also interested in
establishing the stability of the application to a much higher degree than is currently
possible.
8.4.5 Step 3: Regression Testing. This level of testing is designed to operate the
application in the same manner as it will be used in actual conditions. Ideally, the
regression test can be built from existing test cases, used in modified ways. Under the
manual approach, the self-contained nature of the requirements tests makes simply
recycling them less effective. However, it is much easier to reuse test scripts in TSL.
In some cases a full regression test can be written by developing a script whose only
purpose is to call several requirements test scripts, in a logically-designed manner. If
the called scripts perform as designed, a consistent performance score will emerge
over time. This score can be used in the same manner as with requirements testing,
alerting test engineers of changes in application performance.
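The composition idea can be sketched in C as follows; in practice the driver is a TSL script that calls other TSL scripts, and the test names, step counts, and benchmark value shown are invented for illustration.

    #include <stdio.h>

    /* Each requirements test is represented here by a function that
       returns the number of steps it completed successfully.  In the
       real architecture these would be calls to existing TSL scripts. */
    static int test_create_activity(void) { return 12; }
    static int test_move_activity(void)   { return  9; }
    static int test_delete_activity(void) { return  7; }

    int main(void)
    {
        /* The regression driver calls the requirements tests in a
           logically ordered sequence and accumulates a performance score. */
        int score = 0;

        score += test_create_activity();
        score += test_move_activity();
        score += test_delete_activity();

        /* 28 is an assumed benchmark established by earlier runs;
           a different score signals a change in application behavior. */
        printf("performance score: %d (benchmark 28)\n", score);
        return score == 28 ? 0 : 1;
    }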
8.4.6 Step 4: Stress Testing. This area of testing is crucial to delivering quality
software, but is often too resource-intensive to conduct thoroughly. Resource
allocation problems, like memory leaks and stack violations, become evident during
this type of testing, but finding them using manual methods is more a case of luck
than good test design. Currently, stress testing is viewed in two distinct manners:
stand alone, and multiple operators. This type of testing occurs later in the test effort,
depending on the success of the first three steps. Additionally, the intensity of this
testing increases as the number of spirals does. The philosophy is that, as the application
approaches delivery, it needs to be put through its paces more thoroughly.
8.4.6.1 Stress testing has only one basic goal: Can the application survive in an
intense environment? Since this goal is straightforward, it requires little human
intervention. For this reason, stress testing occurs overnight, on weekends, and at other
times when significant machine resources are available. For example, a test engineer
will design a simple stress test for an application and start the job at close of business.
When the test engineer returns the following morning, they can determine the state of
the application by examining the screen and the test log. The constant activity
provides a degree of confidence in the application's ability to perform as designed.
8.4.6.2 Stand-alone Stress Testing. In this area, the entire emphasis of the test is to
determine if the application can be broken given enough interaction and time.
Stress testing at the application level occurs in three phases. Phase 1 will perform
many of the activities of the GUI_CHECKOUT function. The primary difference is
that the module will randomly perform thousands of object manipulations per hour.
The non-directional nature of these interactions will test the application's ability to
handle quick, sometimes illogical input. Phase 2 involves the random execution of
requirements test procedures. Test reports are generated and performance scores are
tracked for consistency. Phase 3 involves simulating operator errors. This is
accomplished by executing random requirements tests, as in phase 2, with one major
exception. Throughout most test procedures, list, scroll, and other multiple option
widgets must be manipulated to perform a desired function. In this phase, one or
more of these choices are selected incorrectly. In most cases, this should cause the test
case to fail. The application must be able to accommodate such incorrect input.
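A simple C sketch of the random selection and error-injection idea used in phases 2 and 3 is shown below; the test names and the one-in-five injection rate are assumptions, not values taken from the actual stress scripts.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NUM_TESTS 4

    /* Names stand in for requirements test procedures; in the actual
       architecture each entry would be an executable TSL script. */
    static const char *tests[NUM_TESTS] = {
        "TIF0001_search", "TIF0002_add", "TIF0003_move", "TIF0004_delete"
    };

    int main(void)
    {
        int i;
        srand((unsigned)time(NULL));

        /* Pick tests at random; roughly one run in five deliberately
           selects an incorrect widget choice, which the application is
           expected to reject gracefully. */
        for (i = 0; i < 20; i++) {
            const char *t = tests[rand() % NUM_TESTS];
            int inject_error = (rand() % 5 == 0);

            printf("run %-16s %s\n", t,
                   inject_error ? "(with invalid selection injected)" : "");
        }
        return 0;
    }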
8.4.6.3 Multiple Application Stress Testing. In this area, the emphasis shifts from
testing the application to testing the ability of shared resources to support multiple
applications simultaneously. Many applications, like TIF, may be operated by several
operators at the same time. The shared object, the timeline in our case, is maintained
in a number of data tables on a common file server. As the number of requirements
tests which manipulate data in these tables grows, test engineers gain insight into the
server's ability to cope with these requests. Note: GUI checkout is not the emphasis
of this type of testing, although a limited amount does occur. Also, the population of
requirements tests which can be randomly executed is smaller than in stand-alone stress
testing. The focus shifts to the entire hardware architecture, the network, file servers,
etc.
9. Summary
9.1 The Fundamental Question Concerning Using ATT. The fundamental
question concerning the use of ATTs is not whether test organizations will use them,
but rather how. Improperly used, they can be a significant drain on resources and
may cause stress, both within the organization and with customers. However, when
properly used, they offer the potential of significantly increasing test engineer
effectiveness. In several instances throughout this case study, the improvements were
over 600%. Even static organizations, like the DOD, are interested in improvements
of this magnitude.
9.2 Observations. While examining the mountains of data generated during this
case study, a few observations echo throughout:
9.2.1 The Need for Customer Support. The customer must support changes in
established procedures, when they amount to a paradigm shift. Making such
profound changes to routine practices can quickly affect system efficiency and
desirability.
9.2.2 The Need for Organization Support. The organization, both software
development and test engineers, must also support these changes. In many cases, the
skill mix will not be appropriate to effectively employ ATTs. This is a problem that
can only improve over time.
9.2.3 The Need to Customize Tools for Problem Domain. No commercially-
available ATT will have sufficient functionality to contribute improvements early in
its integration. In X-Runner's case, numerous utility and support programs were
needed to enjoy much of its potential.
APPENDIX A: Facts About X-Runner
A. 1. General X-Runner Capabilities. X-Runner provides several features, including automatic
recording and playback of operator interaction with the application; rudimentary report generation;
and the capability to execute test scripts in batch fashion. Its versatility and power offer a tantalizing
array of possibilities to professional test engineers. The potential cost savings, quality improvements,
and logic-based test designs make further exploration of X-Runner's capabilities essential.
A. 1.1. Modes of Testing. X-Runner provides two basic modes to accomplish testing. Analog mode
forces X-Runner to deal with the view port, keyboard, and mouse using absolute (physical) addresses.
Context sensitive is a high-level testing mode that allows X-Runner to learn objects on the screen,
based on varying criteria. Once these objects are learned, X-Runner may interact with them regardless
of their location on the screen. For example, an operator would like to use the mouse to click on an
object named QUIP. In the analog mode, X-Runner would move the mouse pointer to a physical
location on the screen, then execute a button press. If, for some reason the object was moved to a
different location, the mouse click would not work. In the context sensitive mode, X-Runner would
interpret the mouse click command, look up the current location of the object, move the mouse pointer
and execute the click. The context sensitive mode provides a higher degree of security, but the
additional overhead required slows its performance noticeably.
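The difference between the two modes can be sketched in C as follows; the object names, coordinates, and lookup table are illustrative assumptions rather than X-Runner internals.

    #include <stdio.h>
    #include <string.h>

    /* Simplified GUI map: logical object names with their current
       physical coordinates. */
    struct gui_entry { const char *name; int x, y; };

    static struct gui_entry gui_map[] = {
        { "QUIT",  400, 20 },
        { "PRINT", 440, 20 }
    };

    /* Context-sensitive click: look the object up by name at run time. */
    static void click_object(const char *name)
    {
        size_t i;
        for (i = 0; i < sizeof gui_map / sizeof gui_map[0]; i++) {
            if (strcmp(gui_map[i].name, name) == 0) {
                printf("click at (%d,%d) on %s\n",
                       gui_map[i].x, gui_map[i].y, name);
                return;
            }
        }
        printf("object %s not found\n", name);
    }

    int main(void)
    {
        /* Analog mode: the coordinates are hard-coded, so the click
           silently misses if the object is ever moved. */
        printf("click at (400,20)\n");

        /* Context-sensitive mode: only the logical name is recorded,
           so the script survives layout changes. */
        click_object("QUIT");
        return 0;
    }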
A.2. Test Script Language (TSL). X-Runner provides a C-like scripting language called TSL.
Although many of the syntax rules and functional descriptions are similar between C and TSL, it is an
interpreted language and experiences a predictable performance lag. TSL has the capability to call
other modules, including compiled TSL functions, other TSL scripts, and other compiled C modules
contained in Dynamic Linked Libraries (DLL). TSL follows C syntax concerning rules of
associativity and precedence, and contains a rudimentary mathematical function library. More
extensive computational requirements would require linking into DLLs, using C's math library. In
most testing applications, this has not proved necessary.
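The following C sketch shows the kind of routine that might be compiled into a DLL and called from TSL when heavier computation is needed; the function name and the statistic chosen are assumptions for illustration, and, as noted above, such routines have rarely been necessary in practice.

    #include <math.h>
    #include <stdio.h>

    /* Candidate DLL routine: standard deviation of an array of test
       timings, a calculation beyond TSL's rudimentary math library. */
    double timing_stddev(const double *samples, int count)
    {
        double mean = 0.0, var = 0.0;
        int i;

        if (count < 2)
            return 0.0;
        for (i = 0; i < count; i++)
            mean += samples[i];
        mean /= count;
        for (i = 0; i < count; i++)
            var += (samples[i] - mean) * (samples[i] - mean);
        return sqrt(var / (count - 1));
    }

    int main(void)
    {
        double t[] = { 1.8, 2.1, 1.9, 2.4, 2.0 };   /* seconds per step */
        printf("std. dev. = %.3f s\n", timing_stddev(t, 5));
        return 0;
    }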
A.3. Strengths. X-Runner is an excellent, versatile ATT that offers features found on no other
commercially-available product. It offers a variety of approaches to testing that are costly and often
impractical in manual testing modes. It can eliminate many repetitive tasks faced by test engineers,
performing its work in a consistent and accurate manner. X-Runner can accomplish testing in either
foreground or background mode. It generates a rudimentary test report and has expansible capability
through both C and UNIX.
A.4. Weaknesses. Fully realizing X-Runner's potential is not automatic. Significant overhead is
required before any testing can begin (e.g., setting up the test environment). Additionally, there are
some instabilities within the X-Runner application and within TSL that require workarounds on
occasion. The rudimentary reporting feature is too immature. It needs augmentation, through TSL, to
report data necessary to accurately characterize the results of the test. To overcome these weaknesses,
an in-house developed suite of TSL functions, called the X-Runner Backbone (XRBB) was written.
This application isolated many of the problem areas within TSL to minimize their impact on routine
test development. The XRBB also provided reporting and results recording features which greatly
simplified using X-Runner during this test.
A.5. X-Runner Objects. X-Runner has the ability to directly interact with every object in the Motif
toolkit (in the context sensitive mode). It has established protocols for communicating with these
objects, and provides the TSL programmer the ability to determine and modify the attributes of these
objects. Table A-1 details the actual classes of widgets with which X-Runner may communicate.
Class | Description | Complexity
window | Any application window, dialog box, or form | N/A
push_button | A push (command) button | 1
check_button | A check button | 1
radio_button | A radio (option) button | 1
list | A list box; this can be a regular list or an option menu | 5
edit | An edit field | 6
scroll | A scroll bar, scale, or scroll box | 7
menu_item | A menu item | 3
static_text | Display-only text that is not part of any GUI object | 2
object | All objects not included in one of the classes described above | 9
Table A-1: Recognized X-Runner GUI Object Classes
A.6. The interface to X-Runner is a GUI-based text editor used to develop TSL scripts. It has most
of the standard features associated with text editors (file, print, edit, etc.), and many tools specific to
testing. Figure A-1 displays an actual screen capture of the X-Runner test environment.



[Screen capture of the X-Runner test development environment, showing a sample TSL function, start_application, loaded in the script editor.]
Figure A-1: X-Runner Test Development Environment
APPENDIX B: The Timeline Application
B. 1. The timeline application used during this case study is the primary visualization tool used by
mission planners. It provides bar graph, or Gantt-style, views of the requested activities over time.
The timeline allows the user to view in an instant the breadth of system resource utilization during a
specific time interval. The timeline has the capability to use a drag-and-drop mechanism to select and
move scheduled activities, or to select additional requests from a list and add them to the schedule.
B.2. During the period of this evaluation, a notional satellite was designed to fully examine the
timeline function. This geosynchronous satellite had the operating characteristics listed in Table B-1.
Note: All times are given in milliseconds.
Satellite Payload | Payload Type | Maximum Scan | Minimum Scan | Slew Time | Load Shed | Recovery Time
S-Band | Control Link | N/A | N/A | N/A | 5% | N/A
X-Band | Data Downlink | N/A | N/A | N/A | 10% | N/A
Im_1 | Imaging Sensor | 2,000 | 200 | 250 | 25% | 200
Im_2 | Imaging Sensor | 2,000 | 200 | 250 | 25% | 200
HRMI | High Resolution Multi-Spectral Imaging | 1,000 | 200 | 300 | 35% | 500
WX | Weather | N/A | N/A | N/A | 10% | N/A
Table B-1: The Notional Satellite (NS) Operational Parameters
B.2.1. The notional satellite was configured along these lines to simulate the performance
characteristics of other programs, both governmental and commercial. Its performance characteristics
are compliant with the industry standard Hughes Electronics HS-601s satellite with a modified payload
to simulate both current and notional capabilities. The columns in the table are further explained
below:
B.2.2. Satellite Payload. The Notional Satellite (NS) contains six elements within its payload. In
keeping with trends in recent programs, the payload is heterogeneous, which imposes new demands on
the timeline function. The payloads are named based on the primary function they provide.
B.2.3. Payload Type. NS contains five different types of payloads, each with unique scheduling
properties. They range from continuous transponder functionality in the X-band, a continuous uplink
in the S-band, two imaging sensors, one HRMI sensor, and one very low resolution, fixed weather
sensor.
B.2.4. Maximum Scan. This time represents the maximum amount of time a given sensor may
perform a particular task. An entry of N/A means that particular payload does not move. If the sensor
remains pointed at the same location, the sensitivity and calibration of the sensor diminish.
B.2.5. Minimum Scan. Some sensors require a minimum amount of time to provide reliable data.
This time limit is usually provided by the sensor vendor as one of the design constraints of the sensor.
For example, if the imaging sensor is supposed to generate a finding within +/- one meter, the sensor
may require a constant exposure to the phenomenon for n milliseconds. If the exposure time is shorter,
the quality of the event generated by the sensor will suffer, which may cause it to generate false
results.
B.2.6. Slew Time. Some of the sensors aboard NS require an electromechanical movement of
hardware on the spacecraft. These components are not omnidirectional and require specific ground
locations to complete their tasks. As these components must move, it takes a finite amount of time to
move them to the desired location. Slew time considers the amount of time required to calculate the
amount of movement and the physical deployment of those components.
B.2.7. Load Shed. On NS there is a limited amount of power available at any one time. There is not
sufficient power to operate all sensors on the spacecraft at the same time. Hardware on existing
satellites prevents the overburdening of the power grid. However, mission planning software should also
prevent satellite over-tasking.
B.2.8. Recovery Time. After some sensors have performed a task for a given amount of time, they
may require some recovery time. For example, if the imaging sensor has been tracking a bright spot, it
will need to look at cold space to allow residual heat to dissipate from the sensor. Remember, the
operational temperature of the sensor hovers around -175 degrees Celsius, so a few degrees are
significant. Once the recovery time has occurred, the sensor is available for subsequent tasking.
B.3. The TIF software was designed to generate a prototype timeline, based on the existing
requirements for a given period of time. The requirements were derived from two sources: the
recurring task list, and a requirements list for a specific time window. An example of the recurring list
appears below:
Requirements List for NS (Recurring)
Task Number | Payload | Start Time | Stop Time | Scan Type | Target Area | Target Size | Priority | Requesting Agency
R-0001 | WX | 00:00 | 23:59 | Continuous | In View | N/A | 100 | NASA
R-0002 | HRMI | 00:00 | 23:59 | Every 5 min. | 18E:41N | 15km | 75 | DOD-Special
R-0003 | IM-2 | 10:00 | 12:00 | Every 5 min. | 05W:35N | 150km | 25 | NASA
R-0004 | IM-1 | 12:00 | 13:00 | Best Chance | 20E:40N | 75km | 15 | DOD-USAF
R-0005 | IM-2 | 05:00 | 06:00 | Best Chance | 26E:38N | 50km | 25 | DOD-USA
R-0006 | IM-2 | 17:30 | 22:00 | Best Chance | 22E:42N | 50km | 15 | DOD-USAF
The following list demonstrates what a specific day's requirements would look like:
Requirements List for NS (Daily) for 23 January 1995
Task Number | Payload | Start Time | Stop Time | Scan Type | Target Area | Target Size | Priority | Requesting Agency
D-0001 | HRMI | 00:00 | 01:00 | Best Chance | 28E:35N | 50km | 100 | NASA
D-0002 | IM-1 | 00:00 | 02:00 | Best Chance | 28E:35N | 50km | 100 | NASA
D-0003 | HRMI | 09:00 | 10:59 | Every 10 min. | 18E:41N | 15km | 75 | DOD-Special
D-0004 | IM-1 | 10:00 | 12:00 | Every 5 min. | 05W:35N | 150km | 25 | NASA
D-0005 | IM-1 | 12:00 | 13:00 | Best Chance | 20E:40N | 75km | 15 | DOD-USAF
D-0006 | IM-2 | 05:00 | 06:00 | Best Chance | 26E:38N | 50km | 25 | DOD-USA
D-0007 | IM-2 | 17:30 | 22:00 | Best Chance | 22E:42N | 50km | 15 | DOD-USAF
APPENDIX C: The Formal DOD Development Test Process
C.1. The formal test approach used during each spiral of this case study complied with existing
governmental policies, specifically MIL-STD-2167A. From a test perspective, this approach can be
summarized by the following paragraphs.
C. 1.1. The emphasis of testing was on functionality of the system as it will be used operationally.
Most of the preparation for the test occurred well before the test phase, and was conducted by a
separate test organization, not the development team. However, several programmers assisted in the
test design process, as most requirements are subject to some degree of interpretation at lower levels.
C. 1.2. A review of the system test plan and procedures occurred before actual testing began to ensure
test coverage was complete.
C.1.3. A baseline configuration of TIF was established and frozen. Baseline source code was used to
compile, link, and build a new executable version of the application. The current hardware
configuration was documented.
C.l.4. Once the new application was judged stable, it became the product baseline. Final versions of
the current spiral's test plans and procedures were staffed, revised, and published.
C.l.5. Any problems discovered during the first test were documented and resolved in a controlled
configuration management process. No ad-hoc changes were allowed.
C. 1.6. The formal test was conducted by the test organization with quality assurance monitoring the
process to certify correct procedures were followed.
C. 1.7. At the conclusion of the test, numerous briefings were conducted between the test organization,
systems engineers, project management, and customer representatives. The test results were reviewed,
with problems and potential solutions identified. Every instance of software performance not meeting
requirements or expectations was examined to determine what actions needed to occur. In some
instances, the anomalies could be traced to operator error or to minor deviations from the
test procedure. In other instances, hardware bottlenecks and failures caused some test anomalies.
C.1.8. Problems which could not be resolved were documented in the final test report, after
consultation with the software development team. These issues became a central portion of the
quadrant A activities of the next spiral. The effort needed to correct these problems was compared
with the benefits gained by the additional capability planned for the new spiral, and a prioritized task
list was derived. This new task list defined the objectives of the development organization for the next
spiral.
APPENDIX D: The Timeline Initializer Function (TIF) Test Development Model
D. 1. The software testing model used for this case study consisted of four basic steps: Requirements
Information Sheet (RIS) development, test preparation, test execution, and test results. These stages
contain all actions required by DOD regulations, standards, and style guides. Table D-1 outlines these
steps in more detail.


Stage | Activities
RIS Development | Starting point in the test development process; contains all necessary data to fully understand a given requirement, particularly why it is included in the development; includes the acceptance criteria for its listed requirement; allows test engineers to design a thorough, meaningful evaluation
Test Preparation | Translate the data contained in the RIS forms into working procedures; validate the acceptance criteria through actual operation of the software
Test Execution | Actually perform test procedures; generate results data and formal test reports for each procedure; begin work on the final test report
Test Results | Reduce data generated through previous stages; summarize findings; make final determination about the readiness of the software tested; complete the final test report, including recommendations
Table D-1: Test Development Steps Followed at Hughes
D.2. This process is not a simple linear one. At any time during the test development, should
circumstances require, work may regress to an earlier stage to reevaluate and reconfigure existing test
strategies. Figure D-1 illustrates the process as it is used for both manual and automated testing.
[Figure shows the four stages, RIS Development, Test Preparation, Test Execution, and Test Results, arranged as a repeating cycle.]
Figure D-1: The Cyclical Nature of Test Development
D.3. Generally speaking, when using manual test methods the workload is balanced between each of
these stages. For automated testing, however, the amount of labor-intensive work peaks during test
preparation and falls dramatically for the final two stages. Note: The time savings in those stages
allow test engineers to focus on more than one test objective at the same time, increasing their
productivity.
D.4. This method of test design fits the spiral development model well, in that it is adaptive to both
large and small changes in system requirements, constraints, and performance targets. Because the
system's development is more dynamic in this model of software development, the test program must
also be.
D.5. Manual and X-Runner Test Development (Stages 1 & 2). These two steps of the development
process are iterative and consist of all work that needs to occur before actual testing can begin. These
steps occur early in the test process and sometimes continue into actual test execution. Figure D-2
depicts what actions test engineers perform during these first two steps by each test approach.
Figure D-2: Early Stages of Test Development
D.6. RIS development is the same process and takes the same amount of time for either approach.
However, during the TIF case study, it was observed that performing the X-Runner test preparation
took approximately 150% of the time required by the manual test approach (six hours per requirement vice
four for manual testing). While this seems excessive, at the conclusion of step two, using X-Runner,
test engineers have developed a fully validated, automated test procedure and acceptance criteria.
These test procedures are now ready for background execution with little or no human intervention.
D.7. When an application is tested only one time, the additional time cost is not justified. However,
in this early stage of software development, applications change often. It is usually necessary to retest
functions on several occasions. If the automated test procedure is properly designed, no new work is
required by the test engineer to prepare for the retest. A rerun of the existing test procedure will be
sufficient. Using this approach, the retest can be scheduled, executed, and evaluated with minimal
involvement by the test engineer. The time saving allows test personnel to perform numerous test
procedures simultaneously, leveraging their capability and improving throughput.
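A short C sketch of this break-even arithmetic, using the preparation and execution figures quoted in this appendix, appears below; the loop bound is arbitrary.

    #include <stdio.h>

    int main(void)
    {
        /* Figures taken from this appendix: preparation effort per
           requirement and execution time per run for each approach. */
        const double manual_prep_hr = 4.0, auto_prep_hr = 6.0;
        const double manual_run_min = 120.0, auto_run_min = 20.0;

        double extra_prep_min = (auto_prep_hr - manual_prep_hr) * 60.0;
        double saved_per_run  = manual_run_min - auto_run_min;
        int runs;

        for (runs = 1; runs <= 5; runs++) {
            double net = runs * saved_per_run - extra_prep_min;
            printf("after %d execution(s): net saving %+.0f minutes\n",
                   runs, net);
        }
        /* The net saving turns positive on the second execution, which
           matches the recovery point described in paragraph D.9. */
        return 0;
    }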
D.8. Manual and X-Runner Test Execution (Stage 3). Performing a test using either approach
involves several steps. Many of these steps are routine and repetitive, which X-Runner can perform in
an efficient manner, without mistakes caused by boredom or lack of attention to detail.
Stage 3: Generic Test Execution
Determine which tests to perform
Retrieve test procedure & acceptance criteria
Set up hardware environment
Execute test procedure
Evaluate acceptance criteria
Generate test report
Figure D-3: Test Execution for Both Manual and Automated Approaches
D.9. Using the ratio of 1:1.5 of manual to automated testing, the additional costs of automated testing
are recovered just as the manual test procedure for that requirement is executed the second time. Each
subsequent execution will generate time savings consistent with the percentages listed in Table D-2.
Test Approach | Time Allowed for Testing (hours) | Number of Tests Executed (total) | Cost per Hour | Time per Test (minutes) | Percent Improvement (vice Manual)
Manual | 1 | 0.5 | - | 120 | (baseline)
X-Runner | 1 | 3.0 | - | 20 | 600%
Table D-2: Relative Test Execution Performance per Hour
D.10. The result from testing TIF in both manual and automated approaches has led to the
development of the Adaptive Test Architecture, with its ancillary tools, mentioned in the body of the
thesis. Provided an application's requirements are correctly analyzed, time savings and quality
improvements consistent with those demonstrated here should be achieved.
APPENDIX E: Lessons Learned from the TIF Evaluation using X-Runner
E.l. The following is a list of lessons learned from using X-Runner during the development of the
TIF. These findings were the basis of the Requirements Rating Procedure, the Test Methodology
Impact on Cost/Schedule (T-MICS) Model, and the Adaptive Test Architecture (ATA).
E. 1.1. Lesson 1: Using the spiral development model allows significant reuse of not only source
code, but also test procedures, as the application expands its functionality with each iteration. A well-
designed test suite will save dramatic amounts of time, while a poorly designed one will be largely
discarded on subsequent spirals. Due to the additional labor cost associated with automated testing,
efficient re-use of test scripts is essential. While it is uncommon for scripts to be reused with no
modification, the less they need to be modified, the better.
E.1.2. Lesson 2: Using X-Runner to mimic the actions of a human operator. X-Runner's
Record/Playback feature was well suited to this GUI-intensive application. The ability to refer to these
objects using a logical name greatly simplified test script development and maintenance. This freed
test engineers to focus on more difficult problems. Additionally, X-Runner can play recorded scripts
much faster than humans, does not skip steps or transpose digits, and does not become complacent or
bored. Experience from TIF testing showed that over 95% of every test procedure can be covered
using the Record/Playback feature. Because of this, the ability to test a requirement using this feature greatly
improves its score.
E.1.3. Lesson 3: Using X-Runner to automatically evaluate the success of a given test case. TSL
programs can be written that evaluate the state of the application before and after a certain event
occurs. If TSL can accurately evaluate program results, the burden on the test engineer shifts greatly
from manually verifying requirements to other functions. While mimicking the application operator is
a straightforward concept, replicating the results analyzing function of a human test engineer is a much
more complex problem. In this case, an extreme version of the Pareto principle applies. Almost all
acceptance criteria can be verified programmatically. However, some require an inordinate amount of
time. Deciding whether a task should or should not be encoded is not always easy. A heuristic
has been developed to score a particular acceptance criterion to make this decision more deterministic.
Projections from TIF indicate acceptance criteria scripting takes ~15 times longer than test procedure
encoding. While this number seems prohibitively large at first, it is primarily due to the fact the
record/playback feature of test procedure generation is so efficient. However, the difference in time is
a significant factor that must be evaluated.
E.1.4. Lesson 4: Using X-Runner to generate formatted test reports. X-Runner's report-generating
procedure is altogether unsatisfactory for presenting test findings to customers. However, TSL provides
excellent output capabilities. This requires adding more printf-like statements within test scripts, directed both to
the console and to the secondary storage device recording the test findings. Additionally, the
reporting feature should be selective to allow gradually less-and-less information as the spiral
develops. This is essential as in the early steps of testing a wide variety of information is needed to
help correct errors. As the product develops, this information is not as important as before. We
overcame this problem by routing all test report output through a central procedure with a selectivity
tag that allowed the message to be omitted if its content was not appropriate. Some work was also
devoted to a function that would translate TSL statements into a more natural language format, which
would make direct acceptance by customers an easier proposition.
E.1.5. Lesson 5: Using X-Runner to analyze a given test based on its past performance. The X-
Runner test suite used was capable of generating several megabytes of test report text data per test
engineer per day. A usable benchmark of a given test begins to emerge only after several iterative
tests have been executed. Once these benchmarks are firm, subsequent executions of the test can be
compared with it to determine if the software performs as it did in the past. Initially, we have a model
that summarizes several performance characteristics and error counts. A more comprehensive data
reduction method is needed in the future.
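The benchmark comparison can be sketched in C as follows; the benchmark value, current score, and 5% tolerance are illustrative assumptions rather than figures from the TIF data reduction.

    #include <stdio.h>
    #include <math.h>

    /* Flags a test run whose performance score deviates from the
       established benchmark by more than a chosen tolerance. */
    int main(void)
    {
        const double benchmark = 28.0;
        const double tolerance = 0.05;            /* 5% of benchmark */
        double current_score   = 25.0;            /* latest test run  */

        double deviation = fabs(current_score - benchmark) / benchmark;

        if (deviation > tolerance)
            printf("FLAG: score %.1f deviates %.1f%% from benchmark\n",
                   current_score, deviation * 100.0);
        else
            printf("OK: score %.1f within tolerance\n", current_score);
        return 0;
    }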
E.1.6. Lesson 6: Using X-Runner to suggest which test to execute in the future, based on dynamic
conditions that exist after every test. Many assumptions are made when designing a test schedule.
Often data supporting these assumptions change, which makes the original scheduling approach less
than optimal. To know how to adjust these assumptions, different data must be consulted at different
times in the test development cycle. This need has led to the development of a prototype Test
Scheduler and Reviewer (TSAR) application that can automatically review test results and schedule
new tests accordingly. For example, a given requirement has two test cases associated with one test
procedure. If the test procedure using the positive test case fails, the negative test case should also be
executed to determine the impact of the application's changes. This requires a mapping mechanism to
establish the relationships of the various procedures in the test battery.
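A minimal C sketch of such a mapping mechanism appears below; the requirement numbers, test case names, and scheduling action are assumptions used only to illustrate the structure TSAR requires.

    #include <stdio.h>
    #include <string.h>

    /* One row of the mapping TSAR needs: if the positive test case of a
       procedure fails, the related negative case is scheduled as well. */
    struct test_pair {
        const char *requirement;
        const char *positive_case;
        const char *negative_case;
    };

    static struct test_pair pairs[] = {
        { "TIF0003", "TIF0003_pos", "TIF0003_neg" },
        { "TIF0005", "TIF0005_pos", "TIF0005_neg" }
    };

    /* Called with the name of a failed test; schedules any mapped
       follow-up cases (here, scheduling is just a printout). */
    static void schedule_follow_up(const char *failed_test)
    {
        size_t i;
        for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
            if (strcmp(pairs[i].positive_case, failed_test) == 0)
                printf("schedule %s (negative case for %s)\n",
                       pairs[i].negative_case, pairs[i].requirement);
        }
    }

    int main(void)
    {
        schedule_follow_up("TIF0003_pos");
        return 0;
    }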
E.1.7. Lesson 7: General Observations.
E. 1.7.1. The learning curve required to exploit X-Runner and overcome TSL instabilities is steep.
E.1.7.2. Configuration management for test scripts is just as important as it is for source code,
particularly in team test efforts.
E. 1.7.3. The improved and early interaction between the test organization and systems engineers is
beneficial, even if automating a particular test effort is deemed impractical.
E.1.7.4. Some customers have expressed concern about approving significant departures from traditional
methods of software testing. However, in this case, the customer accepted the actual TSL scripts in
lieu of more traditional in-progress test reporting, which yielded significant savings in labor costs.
E. 1.7.5. Maintaining a consistent rule set for the TSAR is challenging. We have written many expert
and decision support systems, but these were quite small in comparison to TSAR. Tree diagrams and
dependency graphs provide an accurate tracing capability, but they are difficult to maintain in a system
with hundreds of rules in a complex, interdependent problem domain.
E. 1.7.6. Introducing new technology into an existing operation invites resistance. The importance of
marketing cannot be overstated. Not only do customers have to acquire a measure of comfort, test
engineers themselves must want to learn the complexities of ATTs to fully reap their benefits.
Appendix F: The Requirements Rating Procedure
F.1. The criteria that are used to score a particular requirement as Red, Amber, or Green are discussed
in the body of the thesis in detail. This appendix shows how the criteria are applied to a specific
requirement for final adjudication. The evaluation for requirement TIF0003, a general activity search,
is listed in Figure F-1.
Requirement Number: TIF0003
Requirement Narrative: TIF shall be able to search the timeline for an entity based on activity, description, or location.
[Spreadsheet screen capture. For each window-level object the worksheet records the widget count, average widget complexity, nesting depth, whether the object is a standard object, the spiral, and the expert test ratings; the row scores are totaled and averaged. The legend notes that widget count is scored 3 = High (>50), 2 = Medium (20..49), 1 = Low (1..19), that average complexity is taken from Table A-1, and that the final thresholds are 0..5 Green, 5.1..250 Amber, and >250 Red. Based on the data provided, requirement TIF0003 is evaluated as Amber.]
Figure F-1: The Requirements Rating Procedure for TIF0003
F. 1.1. To use this procedure, the test engineer will input the requirement name and narrative at the top
of the spreadsheet. Next, they will determine the widget density by physically counting the number of
widgets each window uses to accomplish the requirement. The average complexity comes from Table
A-1, Recognized X-Runner GUI Object Classes, located in Appendix A. These two values have the
largest impact on the ultimate outcome of the evaluation.
F.1.2. The nesting depth is a minor scoring criterion that determines the level of nesting each window-
level object contributes to the complexity of the code. More deeply nested objects can lead to pointer,
chaining, and memory leak problems, so this area must be evaluated at some level.
F.1.3. Standard objects are those window-level objects which have been incorporated into the
operational baseline.
F. 1.4. The Spiral column allows for the influence of less mature code on the difficulty of the test
effort. The value is derived from how complete the code is (n / total spirals). This number is then divided
into one (inverted) to ensure it complies with the goal state of lower numbers meaning easier automation.
F.1.5. The final two columns contain the expert opinions of those engineers involved with designing
and conducting the actual test effort. The resolution is coarse, but provides significant influence on the
final outcome. These decisions are derived in a number of ways, with each person using a distinctive
style. The outcome is that for many evaluative tools, a certain amount of experience and intuition is
used to generate expected results. These columns introduce an amount of expertise into the outcome
that accounts for many of the factors that have not or could not be adequately represented.
F.2. The process by which an evaluation is derived is known as a decision matrix. This technique is a
commonly used one in statistically based decision making. The values in each cell are multiplied
across the row to derive a score. This number may or may not have meaning in an absolute sense,
but in the case of the Requirements Rating Procedure, this number is relative. The total numbers for
each window-level object are averaged, and the mean is applied to a threshold that determines what the
scoring recommendation will be.
F.2.1. The total range of potential values is from ~.20 to over 1,000 (depending on the nesting level;
a practical observation is that this number rarely exceeds 1.5). The thresholds were set by the following
function: t = 2(score^3). The start point for this function was set at score = 1.5, which was the
limit of the comfort zone of the test engineer designing this model. This set the next threshold (for
Amber) at ~5. Processing this new threshold, the formula generated the threshold for Red at ~250.
After processing numerous requirements through this model, from both TIF and other applications
with known risks, some minor adjustments were made, and the model has been generally accepted by
the test organization.
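A C sketch of the decision matrix calculation and thresholding appears below; the row values are invented, and the column set is a simplification of the actual worksheet.

    #include <stdio.h>

    /* One row of the decision matrix: the factors recorded for a single
       window-level object (values here are invented for illustration). */
    struct rating_row {
        double widget_count;     /* 1 = low, 2 = medium, 3 = high      */
        double avg_complexity;   /* from Table A-1                     */
        double nesting;          /* 1 + 0.1 per nesting level          */
        double standard_object;  /* 2 = no, 1 = yes                    */
        double spiral;           /* inverse of code completeness       */
        double expert_rating;    /* combined expert columns            */
    };

    static const char *classify(double mean)
    {
        if (mean <= 5.0)   return "Green";
        if (mean <= 250.0) return "Amber";   /* thresholds from t = 2(score^3) */
        return "Red";
    }

    int main(void)
    {
        struct rating_row rows[] = {
            { 2, 5, 1.1, 1, 2.0, 1 },
            { 3, 6, 1.2, 2, 2.0, 1 }
        };
        size_t i, n = sizeof rows / sizeof rows[0];
        double total = 0.0;

        /* Multiply each row across, then average the row scores and
           apply the Green/Amber/Red thresholds to the mean. */
        for (i = 0; i < n; i++) {
            double score = rows[i].widget_count * rows[i].avg_complexity *
                           rows[i].nesting * rows[i].standard_object *
                           rows[i].spiral * rows[i].expert_rating;
            total += score;
        }
        printf("mean score %.1f -> %s\n", total / n, classify(total / n));
        return 0;
    }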
F.2.2. This function set the thresholds very low for Green. This is desirable, particularly for
developmental software using the spiral development model. It was also desirable for TIF testing as
X-Runner was a new tool to the organization that required time for training and experimenting. This
approach erred on the side of caution. While TIF was used as an experiment to evaluate ATTs, it was
also a training program used for other purposes.
F.2.3. In many ways the model is a summarization of a multitude of complex, and sometimes
competing, criteria that are needed to make a determination on the difficulty of testing a given
requirement using ATTs. Some of these criteria have been directly incorporated, while others have
been amalgamated into summary columns. The net effect, however, is to provide a direct, meaningful,
and defensible rating for each requirement in a relatively short amount of time. These ratings have a
direct impact on how each requirement is viewed when determining the risk involved in automating it
and realizing the increased effectiveness which ATTs can provide to the test program.
Appendix G: The Test Methodology Impact on Cost/Schedule (T-MICS) Model.
G. 1. The T-MICS model is an important tool that was developed to help systems engineers and test
management understand the implications of a particular test strategy on the overall effectiveness of
the test program. In the T-MICS area, effectiveness is measured in anticipated impacts to labor costs,
as well as qualitative improvements of the test. Additionally, these two criteria may be weighted so
the effect of cost or quality improvement can be considered at a level the customer desires.
G. 1.1. For development testing, the model had to be requirements-driven. This means the overall
design of the development test would center on the application's ability to perform according to the
specifications contained in its requirements. To properly integrate automated testing in a measured,
efficient manner, test designers needed the ability to selectively implement test procedure, verification,
and documentation efforts. The intent of T-MICS was to give these designers an interactive tool that
would provide immediate feedback to show what the cost and quality impacts for a particular design
would be.
G. 1.2. After requirements were scored using the Requirements Rating Procedure (red, amber, or
green), the test designer could give the model various data about the application being tested. When
T-MICS had processed this data, it provided enough information to evaluate the effectiveness of a
particular methodology. If the costs were too high, adjustments could be made to the approach to
individual requirements, which would adjust the results, which could then be evaluated again. This
process would continue until acceptable figures for cost and quality were achieved.
G.1.3. It became obvious from the first few experiences with T-MICS that it needed the ability to
adjust its findings based on the maturity level of the software. When the TIF application was on its
earlier spirals, the projected effectiveness was not as close to the actual effectiveness as expected.
However, later spirals performed much better. To make the model more effective during early spirals,
a sensitivity feature was added to the spreadsheet. It basically allows for larger margins of error
during times when the code is less stable. For example, for a development expected to have n spirals,
the total impact on the model results, T, is calculated using the following formula:
T(n) = T(n - 1) / n,  for n > 1
T(n) = 0.25,          for n = 1
G.1.4. The amount generated by this equation would be used to add time to the labor amount, and
subtract improvement from the test quality metric. As n increased, the amount of the adjustments
would diminish, which is expected as code matures through several spirals of adjustments and testing.
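Assuming the piecewise definition shown above, the diminishing adjustment can be sketched in C as follows; the four-spiral loop bound is arbitrary.

    #include <stdio.h>

    /* Sensitivity adjustment applied by T-MICS: large for the first
       spiral, diminishing as the code matures through later spirals.
       Implements the recursion T(1) = 0.25, T(n) = T(n-1)/n. */
    static double sensitivity(int n)
    {
        if (n <= 1)
            return 0.25;
        return sensitivity(n - 1) / n;
    }

    int main(void)
    {
        int spiral;
        for (spiral = 1; spiral <= 4; spiral++)
            printf("spiral %d: adjustment factor %.4f\n",
                   spiral, sensitivity(spiral));
        return 0;
    }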
G.1.5. The composition of the test team also has a bearing on the effectiveness of the test
methodology. Generally speaking, the higher the demands of the test methodology on the automated
side, the greater the need for X-Runner specialists. If this demand is not met, the effectiveness of the
test will decline. The reverse is also true: a highly manual test with a larger contingent of X-Runner
specialists will decrease the test effectiveness. T-MICS evaluates this by examining the ratio of
manual testers to X-Runner specialists in conjunction with the ratio of manually verified to
automatically verified requirements. If the ratios differ by more than one standard deviation, the effectiveness
rating is reduced accordingly. By using this technique, managers may also examine test team
composition and its impact on the test schedule. T-MICS will point out the proper balance of the
team, based on the existing design of the methodology.
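A C sketch of this ratio comparison appears below; the staffing figures, requirement counts, and the use of a fixed allowance in place of a computed standard deviation are all illustrative assumptions.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Illustrative staffing and methodology figures (assumed values). */
        double manual_testers = 2.0, xrunner_specialists = 1.0;
        double manual_reqs    = 4.0, automated_reqs      = 6.0;
        double allowed_spread = 1.0;  /* stands in for one std. deviation */

        double staff_ratio  = manual_testers / xrunner_specialists;
        double method_ratio = manual_reqs / automated_reqs;
        double spread       = fabs(staff_ratio - method_ratio);

        if (spread > allowed_spread)
            printf("effectiveness reduced: staffing ratio %.2f vs. "
                   "methodology ratio %.2f\n", staff_ratio, method_ratio);
        else
            printf("team composition matches the methodology\n");
        return 0;
    }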
G.1.6. To implement T-MICS, an Excel 5.0 spreadsheet was used. This approach meant the model
could provide instant results, based on an interactive session with the test designer. Figure G-1 shows
the initial spiral evaluation of TIF, as judged by T-MICS.
[Spreadsheet screen capture of the T-MICS model. A CPCI box records the test and X-Runner staffing levels, total labor-days, requirements per day, labor rate, and the spiral being evaluated. A Test Methodology box lists each requirement (TIF0001 through TIF0010) with its Red/Amber/Green rating and a manual or automated selection for test procedure development, verification, and documentation. The results area shows the additional schedule and labor cost introduced by automation, the revised total schedule and testing cost, a quality improvement figure, and a net cost impact of approximately 8.4% improvement for the first spiral. A Manual vs. Automated box holds the relative performance factors used as model inputs.]
Figure G-1: T-MICS Results Evaluating TIF (First Spiral)
G.2. After the initial data is entered in the CPCI box, the test designer will insert the requirements
along with their ratings from the Requirements Rating Procedure, by number, in the appropriate
column in the Test Methodology box. T-MICS will place a 1 in the manual column for each
phase of the test: test procedure development (T.P.), verification (Ver.), and documentation
(Doc.). Next, the test designer will place a 1 in the appropriate column under the Automated
area for each phase that should be tested using X-Runner. T-MICS will adjust the findings in the Test
Methodology Results box accordingly.
G.3. The Manual vs. Automated box contains the performance impacts of test approaches, using
manual testing as the benchmark. If the automated test approach has a number less than one, it
outperforms manual testing in that area. These numbers may be adjusted to suit a particular test
designer's needs.
G.4. Once the test designer is satisfied the new methodology achieves its budget and quality targets,
a copy of the spreadsheet is printed and incorporated into the test plan and schedule. From this point,
test engineers know exactly what areas will be automated for each requirement tested during this
spiral.
Appendix H: Glossary
Acceptance Criteria. In software requirements analysis, these are the criteria used by the customer to
determine whether a system under development will meet its software requirements.
Application Under Test (AUT). A term used to describe the executable which will be evaluated by X-
Runner.
Development Testing. The test activities, both formal and informal, which occur while the application
is being developed, before it has been released to the customer (compare O&M testing).
O&M Testing. The test activities, both formal and informal, which occur after the application has
been released to the customer. This testing includes integrating upgrades in hardware and software, as
well as routine health checks of the application (compare development testing).
Performance Score. Every automated test procedure produced a score which gave an indication of
what occurred during the test. If a particular step of a test procedure performed satisfactorily, the
performance score was adjusted. These scores are relative to the test procedure, and are only
meaningful when a specific test run is compared to an established benchmark. When the new test run
performance score deviates from the benchmark, a new problem has been discovered.
Requirements. Requirements are the sine qua non for any system under development. As the name
implies, there is little flexibility in implementing these features. There are three tiers of requirements:
A specifications (or A-specs), B specifications (B-specs), and C specifications (C-specs). A-specs are
the system level requirements generally used to describe very large systems and/or very large sub-
elements. They are synonymous with system requirements specifications. B-specs are subsystem-
level requirements specifications which describe sub-elements of systems or system segments. These
sub-elements are called configuration items (CIs). A sub-set of B-specs is known as the B5-level
specification, or B5s. This sub-set is used to describe software configuration items (SCI). C-specs
are a subsystem-level design specification that are generally used by developers for internal design of
the configuration item they are producing. For the purposes of this case study, most of the TIF
evaluation focused on the B5-level specification.
Regression Test. An operationally-focused evaluation of an application. This test involves the
rerunning of test procedures that an application has previously executed correctly in order to detect
errors created during software correction or modification activities.
Requirements Information Sheet (RIS). A narrative description of a specific system requirement.
From a test perspective, the most important portion of the RIS is the section which captures the intent
of a given requirement. By possessing this information, test engineers are able to develop a more
meaningful test program in the gray areas of requirements.
Requirements Test. A test which seeks to evaluate a specific requirement and how it is implemented
in the application.
Test Approach. In the context of this case study, the test approach could either be a manual,
traditional one, or an automated, X-Runner-based test.
Test Case. The specific data set which will be instantiated into the generic test procedure (q.v.) to
create a unique test procedure. These cases contain both properly designed and error-producing data
sets, both of which the application should process properly.
Test Methodology. The specific steps which the test organization will follow to conduct the
evaluation.
Test Procedures. The steps performed by the test engineer to evaluate a particular aspect of an
application. At HITC, there are two basic types of test procedures: requirements test procedures and
regression test procedures. Requirements test procedures map to a specific system requirement (B5)
and detail how a test engineer will perform and evaluate that requirement. Regression test procedures
are loosely designed to verify A-spec requirements, and are ideally comprised of components from
numerous B5-level test procedures. Both of these test procedures are generic steps to perform. When
unique test case data (q.v.) is supplied, a new test procedure is created which will examine the same
function using the same steps, but with different results.
Test Scheduler and Reviewer (TSAR). A prototype system, written in C and TSL, which assists test
engineers by determining which automated tests need to be rerun, scheduling those tests overnight or
during other times of off-peak demand, and performing initial analysis on the results of those tests.
This tool allows test engineers to apply their time in a management-by-exception manner, focusing on
performance deviations instead of constantly reconfirming that working software still performs correctly.
Test Strategy. Similar to the Test Methodology. The primary difference is in three distinct areas: the
test procedures, verification, and documentation. The test strategy is the method used to evaluate these
three areas, either using X-Runner or traditional methods.
Validation. The process used to ensure an application adequately performs functionality called for in
its requirements.
X-Runner Backbone (XRBB). A library of TSL and UNIX scripts which provide additional reporting
and results tracking information to the test engineer while the test is executing.
Bibliography
Aho, A.V., Hopcroft, J.E., and Ullman, J.D., Data Structures and Algorithms, New York, New York:
Wiley & Sons, Inc., 1983.
Arthur, L.J., Rapid Evolutionary Development: Requirements, Prototyping, and Software Creation,
New York, New York: Wiley & Sons, Inc., 1992.
Automated Test Tools: The Next Step in Assurance, Client Server Today, February, 1995.
Bate, R.R., Mueller, D.D., and White, J.E., Fundamentals of Astrodynamics, New York, New York:
Dover Publications, Inc., 1971.
Boehm, B.W., A Spiral Model of Software Development and Enhancement, in R.H. Thayer (ed.),
Tutorial: Software Engineering Project Management, IEEE Computer Society Press, Washington,
D.C., 1988.
Boehm, B.W., Seven Basic Principles of Software Engineering, The Journal of Systems and
Software, Vol. 3, No. 1, 1983.
Davis, A., Software Requirements: Analysis and Specification, Englewood Cliffs, New Jersey:
Prentice-Hall, 1990.
Howden, W.E., Life-Cycle Software Validation, IEEE Computer, Vol. 15, No. 2, February 1982.
Orr, K.T., Structured Requirements Definition, Ken Orr and Associates, Topeka, Kansas, 1981.
Pradip, S. (ed.), The Hughes STX Software Engineering Guidebook, Hughes STX, 1994.
Schultz, H.P., Software Management Metrics, ESD TR-88-001, prepared by the MITRE Corporation
for Electronic Systems Division, Hanscom AFB, Massachusetts, 1988.
Schefstroem, D. and van den Broek, G., Tool Integration: Environments and Frameworks, Chichester,
United Kingdom: Wiley & Sons, Inc., 1993.
Sodhi, J., Software Engineering: Methods, Management, and CASE Tools, Blue Ridge Summit,
Pennsylvania: TAB Professional and Reference Books, 1991.
Software Engineering Handbook, Build 3, Division 48, Information Systems Division, Hughes Aircraft
Company, March 1992.
Whitehead, S., Testing DOD Software, Defense Software Engineering Report, Hill AFB, Utah:
Software Technology Support Center, October, 1995.
........ ... ...... .... . .......................... 12 4 1.4 Systems Acceptance Testing .... ................ . ..... . . ..... . ....... ... ............ . ......... . ........ ..... .... .... . 12 iv

4.2 The Shortfalls of the DOD Approach .......... 12
5. The Case for Automated Testing .......... 13
5.1 New Demands on Traditional Organizations .......... 14
5.2 The Need for New Methods of Problem Solving Using Technology .......... 15
5.3 Demands Caused by the Spiral Development Model .......... 15
6. Automated Test Tools: X-Runner .......... 16
6.1 Description .......... 17
6.2 Initial X-Runner Employment within Hughes .......... 17
6.2.1 Impact of X-Runner on Existing Hughes Organizations .......... 17
6.2.2 The Need for Selectivity to Implement X-Runner .......... 18
7. The Case Study Environment .......... 18
7.1 The Timeline Initializer Function (TIF) .......... 19
7.2 The Application .......... 19
7.3 The Hardware .......... 19
7.3.1 The Test Hardware Environment .......... 22
7.4 The Development Approach .......... 22
7.4.1 The Initial Software Development Plan .......... 22
8. Integrating X-Runner During System Development .......... 24
8.1 Test Strategy .......... 25
8.2 Test Execution .......... 26
8.2.1 Spiral I. GUI .......... 26
8.2.2 Spiral II. The Timeline Object .......... 27
8.2.3 Spiral III. Requirements Development Tools .......... 30
8.2.4 The Complexity of This Analysis During Spiral III .......... 31
8.2.5 Spiral IV. Requirements Translation/Schedule Scoring .......... 31
8.2.6 Testing the Context-Sensitive Widget Set .......... 32
8.2.7 Miscellaneous Table Notes .......... 32
8.2.8 The Effectiveness of X-Runner for Green Requirements .......... 33
8.3 The Improved Test Design Process .......... 33
8.3.1 The Requirements Rating Procedure .......... 33
8.3.2 The Test Methodology Impact on Cost/Schedule (T-MICS) Model .......... 36
8.4 The New Architecture .......... 37

8.4.1 An Improved Test Design .......... 37
8.4.2 New Possibilities of Test Design Using X-Runner .......... 38
8.4.3 Step 1: GUI Checkout .......... 39
8.4.4 Step 2: Requirement Testing .......... 40
8.4.5 Step 3: Regression Testing .......... 40
8.4.6 Step 4: Stress Testing .......... 40
8.4.7 The Goals of Stress Testing .......... 41
8.4.8 Stand-Alone Stress Testing .......... 41
8.4.9 Multiple Application Stress Testing .......... 42
9. Summary .......... 43
9.1 The Fundamental Question Concerning Using ATT .......... 44
9.2 Observations .......... 44
9.2.1 The Need for Customer Support .......... 44
9.2.2 The Need for Organization Support .......... 44
9.2.3 The Need to Customize Tools for Problem Domain .......... 45

Appendix
A. Facts About X-Runner .......... 46
B. The Timeline Application .......... 49
C. The Formal DOD Development Test Process .......... 52
D. The TIF Test Development Model .......... 54
E. Lessons Learned from the TIF Evaluation Using X-Runner .......... 58
F. The Requirements Rating Procedure .......... 62
G. The Test Methodology Impact on Cost/Schedule (T-MICS) Model .......... 65
H. Glossary .......... 69
References .......... 72

FIGURES

Figure
3.1: The Spiral Development Model (SDM) .......... 8
7.1: Sample Timeline Output Built by TIF .......... 21
7.2: Test TIF Hardware Configuration .......... 22
8.1: The Adaptive Test Architecture (ATA) .......... 38
A-1: X-Runner Test Development Environment .......... 48
D-1: The Cyclical Nature of Test Development .......... 54
D-2: Early Stages of Test Development .......... 55
D-3: Test Execution for Both Manual and Automated Approaches .......... 56
F-1: The Requirements Rating Procedure for TIF0003 .......... 62
G-1: T-MICS Results Evaluating TIF (First Spiral) .......... 67

TABLES

Table
5.1: Hours Spent for Manual vs. Automated Testing on a Sample Application .......... 15
8.1: GUI Test Results (Spiral I) .......... 26
8.2: Timeline Test Results (Spiral II) .......... 29
8.3: Requirements Translation/Schedule Scoring Test Results (Spiral IV) .......... 32
8.4: Requirements Rating Procedure Scores .......... 34
8.5: Requirements Rating Procedure Evaluation Criteria .......... 36
8.6: Impact of ATA on Test Engineer Level of Effort .......... 38
A-1: Recognized X-Runner GUI Object Classes .......... 48
B-1: The Notional Satellite (NS) Operational Parameters .......... 49
D-2: Relative Test Execution Performance per Hour .......... 51

1. Introduction

1.1 Opportunities and Challenges in the Software Industry. The explosive growth in the software industry has created many opportunities as well as challenges. It is now possible to develop significantly large applications with substantially less effort. While software and systems engineers have been able to leverage new technologies and design methods to increase their productivity, the approaches widely used in industry today to verify program correctness have not kept pace.

1.2 Costs of Software Development. In commercial software development, up to 25% of the entire project cost is in the test area.[1] Additionally, industry averages of on-time delivery of products are down to a low of 16.2%.[2] When the complexities of object-oriented development and diverse client-server architectures are considered, the job of testing software to reduce performance risk has become more complicated.

1.3 Changes to DOD Software Procurement. In the arena of government applications, particularly the Department of Defense (DOD), this is a growing concern.

[1] Client Server Today, February 1995.
[2] Ibid.

DOD expects to spend $40 billion this fiscal year for software development and maintenance.[3] Unlike previous years, today's DOD software is produced using a variety of methods, including economical commercial-off-the-shelf (COTS) software. This allows the development life cycle to be shortened significantly. Obtaining waivers to use the more current and inexpensive product is becoming more common. Additionally, the easing of regulations to allow quicker updates to existing software has created opportunities for system enhancement that would have been unheard of a few years ago. Because of these changes, new development models are being used at an unprecedented rate.

1.4 Rapid Application Development and the Air Force. One approach favored by some U.S. Air Force programs has been to use a Rapid Application Development (RAD) method known as the Spiral Development Model (SDM). This technique expands functionality in a series of releases of increasingly more sophisticated applications. One major design concern is the amount of effort required by testing organizations to keep pace. A programmer may add only a single module to an existing application. However, the test engineer must examine both the new module and the rest of the existing application to ensure side effects have not propagated through previously functional code.

[3] "Testing DOD Software", Defense Software Engineering Report, Software Technology Support Center, October 1995.

1.5 Using Automated Test Tools. To cope with this increase in responsibility, several vendors have developed automated test tools (ATTs), which perform many duties previously performed by test engineers. While ATTs perform many of the bookkeeping tasks of testing adequately, they are not robust enough to provide adequate relief in a meaningful way without significant tailoring to the problem domain.

2. Purpose

2.1 Goal of This Thesis. The purpose of this thesis is to use a specific ATT, X-Runner from Mercury Interactive Software, on a demanding graphical user interface (GUI)-based application to develop scoring criteria which will help planners decide which applications should be tested using ATTs, to what extent, and what the expected impacts to cost and schedule will be.

2.2 Why the Spiral Development Model Was Chosen. This case study employs a non-traditional software development approach known as the spiral development model. This model is specifically designed to develop a product in a series of releases, each with increasing capabilities. The iterative nature of this approach examines the ability of test designs to cope with the side effect propagation problem that is the largest obstacle to fully automating the test process.

2.3 The Choice of the Case Study Application. The specific customer for this application is the U.S. Air Force (USAF), which needs the ability to schedule a variety of conflicting missions on a tightly constrained resource, in this case a common geosynchronous satellite. A military application was chosen as it demands the highest standards of reliability compared with most other potential applications.

The software must be efficient, but it must also display a degree of fault tolerance not required by commercial customers.

3. The Spiral Development Model

3.1 What Are Process Models? Software process models are tools which aid systems engineers and management in the proper planning and development of software applications. They provide a framework that assists planning by defining the expected sequences of events, development and management activities, reviews, products, and milestones for a project. The phases in a software process model increase the visibility of individual activities within the complex, intertwined network of events during the development of a software product. Each cycle is completed with a review involving vested interested parties to make the determination of whether to proceed, and how.[4]

3.2 What Is the Spiral Model? The spiral model represents activities related to software development as a spiraling progression of events that moves outward from the center of the spiral.

[4] Sodhi, Jag, "Software Engineering: Methods, Management, and CASE Tools", TAB Books, 1991.

For each development phase, from project conception through preliminary design, this model places great emphasis on defining the objectives and evaluating alternatives and constraints; evaluating the alternatives and their potential risks; developing and verifying the compliance of an interim product (e.g., prototypes); and planning for the next phase using knowledge from the previous phases. As B. Boehm, the inventor of the spiral development model, stated: "The model reflects the underlying concept that each cycle involves a progression that addresses the same sequence of steps for each portion of the product and for each of its levels of elaboration, from an overall concept-of-operation document down to the coding of each individual program."[5]

3.3 Risk Analysis in Software Development. In the overall field of software development, where up to 50% of software projects lead to no useable products, the spiral model is useful in promoting reasoned analysis during the life of the project.[6] For example, if the risk analysis conducted after the definition of requirements showed that the system was not feasible, the requirements can be scaled back, or the entire project modified, before large amounts of resources are wasted.

3.4 Development Phases in the Spiral Model. The spiral model proceeds through four distinct quadrants (steps) during each cycle. Figure 3.1 describes the spiral development model in graphical detail.

[5] Boehm, Barry, "A Spiral Model of Software Development and Enhancement", IEEE Computer, Vol. 21, No. 5, May 1988.
[6] Software Engineering Guidebook, p. 3-13, 1994, Hughes STX Corporation.

[Figure 3.1: The Spiral Development Model (SDM). The four quadrants are: determine objectives, alternatives, and constraints; evaluate alternatives and identify and resolve risks; develop and verify the next-level product; and plan the next phases. Radial distance from the origin represents cumulative cost; angular movement represents progress through the steps.]

3.4.1 Quadrant A: Determine objectives, alternatives, and constraints. Each cycle of the spiral begins with this step to identify:
- The actual system objectives, e.g., performance, functionality, changeability, etc.
- Alternative approaches, such as design A, reuse, or purchase
- Limitations that affect these alternatives, such as cost and time available

3.4.2 Quadrant B: Evaluation of alternatives; identify and resolve risks. This process frequently identifies areas of uncertainty that are often a significant source of risk. Prototypes, simulations, questionnaires, and analytical models may be required to ensure the cost-effectiveness of the design approach or method of risk mitigation. At the end of this process, the next step of system development is decided and could contain one of the following alternatives:
- Proceed with the next phase
- Develop a model
- Change the objectives
- Revise the constraints
- Adopt a more traditional, evolutionary development model
- Stop the development

3.4.3 Quadrant C: Develop and verify next-level product. The products vary and can be a plan, software requirements, software design, code, a simulation, or a prototype to address a specific problem. The product is then verified to ensure it meets the objectives set in Quadrant A.

3.4.4 Quadrant D: The next iteration of the spiral is outlined. These plans are based on information and lessons learned from the last completed step.

3.5 Advantages of the Spiral Model. The spiral model encourages analysis of objectives, alternatives, and risks at each stage of development, providing an alternative to one large commitment/decision point at the start of the project.

In the figure of the spiral model above, the farther one moves away from the intersection of the axes, the greater the cost commitment. Additionally, the spiral model allows for objectives to be re-evaluated and refined based on the latest perception of needs, resources, and capabilities.[7]

[7] Software Engineering Handbook, Build 3, p. 2-5, March 1992, Information Systems Division, Hughes Aircraft Corporation.

4. Traditional Verification/Certification Approaches

4.1 The DOD Approach. Testing DOD software has traditionally been a resource-intensive proposition. It takes time, people, and hardware resources. It also generates a great deal of plans, reports, and other paperwork that becomes part of the historical database associated with the system. Generally speaking, tests are designed based on the requirements for the system in an imperative manner. In most DOD models, testing consists of four distinct layers of verification: unit testing, integration & testing, systems testing, and systems acceptance testing.

4.1.1 Unit Testing. This phase of testing is informal and usually conducted by the software engineering function instead of full-time test engineers. The focus is on ensuring all coding and design errors are caught and fixed before the customer sees the product. Examples of unit testing are ensuring code compiles and links properly, subsystem errors are trapped, shop coding standards are followed, and established configuration management procedures are being enforced.

4.1.2 Integration & Testing. There is no formal dividing line between unit testing and this phase. The focus has changed, however, to the proper integration of the code "units" produced earlier into a cohesive program. Finding and correcting problems in this phase is usually more expensive and time-consuming.

Generally, testing at this level is performed by more experienced software and systems engineers who have a deeper understanding of overall system objectives.

4.1.3 Systems Testing. In this phase, formal evaluation by a full-time testing organization begins. The intent of this testing is to operate the software under conditions as close to operational as possible, exercising the fullest set of functionality possible. This requires a comprehensive understanding of not only the capabilities of the software, but also the desires of the end user, which are sometimes at odds.

4.1.4 Systems Acceptance Testing. This phase of testing is further divided into two activities: acceptance testing and system installation. Acceptance testing is a formal, comprehensive effort that identifies what problems, if any, exist with the system and how they will be corrected. This is the final time a system is evaluated before becoming part of the customer's baseline. Installation is also a formal activity, concerned with building and integrating the new system with existing software on hardware at the customer's location. At the conclusion of this step the system is expected to perform to specifications and be fully useable by end users.

4.2 The Shortfalls of the DOD Approach. As we have seen, software verification in a DOD environment is a continuous, time-consuming effort performed by both programmers and dedicated test personnel.

It is an attempt by the development organization to demonstrate to the customer that all reasonable effort has been made to ensure the products will meet expectations. Since companies have gone out of business by neglecting these efforts, most surviving ones do not take this step lightly.

5. The Case for Automated Testing

5.1 New Demands on Traditional Organizations. The growing complexities of software have placed demands on the verification process that cannot be accomplished using traditional test methods. The growing emphasis on function/module reuse, with tools that facilitate the integration of these modules in increasingly diverse ways, has made it possible to develop significantly large applications in a small amount of time. However, the technology to validate these new applications has not kept pace. In many commercial markets, this would not pose a major problem. In domains where there is little or no fault tolerance, e.g., satellite health and welfare, status, and commanding, the need for complete verification has not subsided. What is called for are tools that leverage the test organization's capabilities in the same way CASE tools and object-oriented development design approaches have done for software engineers. Table 5.1 demonstrates the dramatic time-saving potential of ATTs using SQA's TeamTest test suite on a generic Windows application.[8] Note: all figures are given in hours.

[8] Client Server Today, February 1995.

Table 5.1: Hours Spent for Manual vs. Automated Testing on a Sample Application

5.2 The Need for New Methods of Problem Solving Using Technology. This paradigm shift in the test domain has far-reaching implications. Previously there was a distinct phase of development between the activities of software development and software test. The old adage of "throwing the code over the wall to the testers" was only slightly metaphorical. New methods of software development mean the test has to be much more carefully designed than before. It is no longer cost-effective to test each requirement in an imperative manner.

5.3 Demands Caused by the Spiral Development Model. The iterative nature of software development using the SDM forces test designers to reexamine their test plans at each spiral. This means that a given section of code may have to be retested numerous times, which can have a noticeable impact on labor costs.

Additionally, when a certain module is changed, a large amount of test effort must be expended to test nearby but unchanged software to ensure the application still performs previously-tested functionality.

6. Automated Test Tools: X-Runner

6.1 Description. X-Runner is a commercial software package specifically designed to provide automated testing functionality for X-windows applications. It offers the ability to approach testing problems in manners which are impractical to accomplish in a manual testing mode, i.e., repetitive tasks, consistently accurate test execution, foreground and background testing, and expansible capability through both C and UNIX. A further description of X-Runner capabilities and how they were used during this case study is contained in Appendix A.

6.2 Initial X-Runner Employment within Hughes. There were two basic reactions encountered upon integrating X-Runner into the test organization at HITC: use the new tool for every single requirement, or resist using it for any requirements. Since the tool was purchased by a government customer for a specific program, management provided emphasis consistent with the former view.

6.2.1 Impact of X-Runner on Existing Hughes Organizations. Because of this, a major training and experimentation program was instituted to prepare the existing test organization for X-Runner integration. More people with programming experience were assigned to test.

Traditionally, the programmer-to-test-engineer ratio at Hughes for normal governmental development programs was 6:1. Using ATTs would lower that ratio, making the overhead margins larger. Tighter coordination between test and software development organizations occurred. Test schedules and budgets were adjusted to cope with the changes. While many of these changes have been adjusted or abandoned over time, the overall test process has improved, primarily due to improved integration between software development and test organizations.

6.2.2 The Need for Selectivity to Implement X-Runner. Early in this process it became apparent that some selectivity was needed in implementing X-Runner. Generating automated test cases and verification processes for every requirement was too time-consuming to justify the additional expense. The question remained: when should X-Runner be used, for what level of verification, and what skill mix is required to accomplish these goals? The experience gained during the actual software tests provides insight into answering these questions.

7. The Case Study Environment

7.1 The Timeline Initializer Function (TIF). To further examine the effect of ATTs on the software development process, a sample application under development that used the spiral development model was selected. For further information regarding generic timeline issues and the packed-window scheduling problem, refer to Appendix B. The selected system was evaluated in three distinct areas: the application (problem domain), the hardware, and the development cycle.

7.2 The Application. The Timeline Initializer Function (TIF) is a software application that will build a prototype schedule of tasks on a satellite resource management tool known as the timeline. The timeline is a graphical representation of jobs the mission planners want the satellite to perform during a given interval. Additionally, specific resources aboard the spacecraft offer unique opportunities and challenges to developing an optimized schedule of work.

7.3 The Hardware. Currently the TIF is designed to operate on a Sun-based workstation, preferably a Sparc-10 or greater. The necessity of the timeline workstation to monitor real-time telemetry, ranging, and other spacecraft health data requires significant multi-tasking abilities.

Additionally, the timeline operator must monitor message traffic from a variety of sources to see if adjustments to the existing timeline are required by dynamic events, such as a change in mission priorities. For the purposes of this case study, a two-monitor display was used, with one monitor containing the application under test while the other contained all associated X-Runner windows. This approach allowed operators and software engineers to interact with the software while a test was being conducted in a passive background mode. For some of the later test efforts, where throughput was a concern, X-Runner was executed on a different CPU and remotely connected to the machine which contained TIF. This approach removes processor contention issues from the test results, but was used sparingly to limit the impact of testing on other development activities.
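A simple way to confirm this split-display arrangement before starting a test run is to verify that both X displays are reachable. The short C program below is a minimal sketch of such a pre-flight check using Xlib; the display names ":0.0" and ":0.1" mirror the configuration described above and would be replaced by the remote display name when X-Runner runs on a separate CPU. The check is illustrative only and is not part of X-Runner itself.

/* display_check.c - verify that both test displays are reachable.
 * Build (typical): cc display_check.c -o display_check -lX11
 * The display names below are illustrative placeholders.
 */
#include <stdio.h>
#include <X11/Xlib.h>

static int check_display(const char *name)
{
    Display *dpy = XOpenDisplay(name);   /* returns NULL on failure */
    if (dpy == NULL) {
        printf("FAIL: cannot open display %s\n", name);
        return 1;
    }
    printf("OK:   %s (%d screen(s))\n", name, ScreenCount(dpy));
    XCloseDisplay(dpy);
    return 0;
}

int main(void)
{
    int errors = 0;
    errors += check_display(":0.0");   /* application under test (TIF)  */
    errors += check_display(":0.1");   /* X-Runner and Backbone windows */
    return errors;                     /* non-zero exit if anything failed */
}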

[Figure 7.1: Sample Timeline Output Built by TIF. The sample shows an activity named "Texas" ("Lone Star Scan", lock status unlocked) scheduled on resource GEO_4_Sensor_1 in plan "plan1", with start and end times on 06/16/1999.]

7.3.1 The Test Hardware Environment. The testing of TIF was executed on a two-headed Sparc-20, connected to a simulated ground station through a generic Ethernet interface. Figure 7.2 displays the screen configuration of TIF in the actual test environment. TIF was located on the primary screen (display :0.0), while the X-Runner and X-Runner Backbone support utilities were echoed on the secondary screen (display :0.1). Both displays shared a keyboard and mouse. The X-Runner Backbone utilities provided messages generated during test execution. The messages were routed to one of three windows: general messages, error messages, and debug information.

By examining the output in these windows, as well as the X-Runner/TSL window, the test engineer could examine the state of the application before a particular TIF operation occurred and see the results afterwards.

[Figure 7.2: Test TIF Hardware Configuration, showing the TIF display and the X-Runner display sharing one workstation.]

7.4 The Development Approach. TIF was a portion of an application being developed as a training vehicle for newly-hired software engineers on both X-windows and mission planning applications. Over the course of the case study, a number of people worked on both the code and the test program. Significant recurring management participation in the case study helped ensure the integrity and consistency of findings over the course of the entire project.

7.4.1 The Initial Software Development Plan. The initial software development plan (SDP) called for four distinct spirals for the initial operational capability (IOC) for TIF.

At the end of each spiral, the customer would conduct a design review to determine whether the software met its original goals, whether any of the adjustments made in this spiral created a need to review the original goals/capabilities, and whether development should continue into the next spiral. The following list expands the goals of each spiral.

7.4.1.1 Spiral I: GUI. At this level the primary focus was generating all displays, pop-ups, and other human-computer interaction (HCI) concerns. Very little of the underlying functionality would be implemented, but the logical flow of the design would be established. Where needed, program stubs would provide a simulation function. The expected coding time was estimated at eight labor-weeks. Testing was allocated two labor-weeks.

7.4.1.2 Spiral II: The Timeline Object. This phase brought more emphasis on the basic timeline object. This object is more than the simple placement of tasks in a packed window. Jobs on the timeline behave like separate objects with unique constraints and opportunities. Proper reflection of timeline dynamics based on operator interaction and other dynamic events was not a straightforward effort, yet was fundamental to the value of this product. The budgeted coding time was estimated at 20 labor-weeks, with three labor-weeks allocated for testing.

7.4.1.3 Spiral III: Requirements Development Tools. After the GUI was functional and the timeline object performed reliably, numerous support tools were required to actually operate the application in any real sense. These tools were primarily text-related database functions which helped develop scheduling requirements from multiple sources. The output of these tools served as the basis for the prototype schedule generated and displayed in the timeline object. The budgeted coding time was estimated at ten labor-weeks. Testing was allocated two labor-weeks.

7.4.1.4 Spiral IV: Requirements Translation/Schedule Scoring. The final spiral used for TIF focused on ensuring the validity of the generated prototype schedule. The results of the scheduling algorithm were compared with schedule parameters with known optimal values, and the deltas tracked. The total "worth" of the schedule was calculated in a variety of ways and cross-checked with a human-developed one. The expected coding time was estimated at four labor-weeks, with four labor-weeks allocated for testing.

8. Integrating X-Runner During System Development

8.1 Test Strategy. An umbrella test plan was developed to cover the entire scope of the project. The initial development efforts, or Spiral I, were completely developed. Subsequent spirals were completed in increasingly fuzzier detail due to the dynamic nature the product could take using the SDM. For this reason, the Spiral II portion of the test plan was released for external staffing but not included in the umbrella plan. Spiral III was released for internal Hughes staffing. Spiral IV was still being developed by key personnel on the development team. As the project proceeded to the next spiral, the test plan for each phase would move to the next step in its completion cycle. This assisted in keeping the project on track while allowing the flexibility to adjust to changes in system requirements.

8.2 Test Execution. Since the SDM provides multiple releases of software, there is a need for constant involvement by the supporting test team. Officially, test involvement began in an advisory and information-gathering role during unit and integration testing. Much of the test case development occurred during these phases, with particular emphasis on re-using as much of the software engineers' test efforts as possible. This approach also allowed systems engineers to steer test case development in a more prominent way than is traditionally done.

This improved the quality of the total test program and helped ensure better coverage throughout the code.

8.2.1 Spiral I. GUI. Most of the testing of this version of the software involved performance of the GUI and the Motif widget set. Each object in the entire GUI was manipulated, with results noted, a total of ten times. Since the TIF contains 129 separate objects, over 1,200 separate tests were performed. The following table lists the results of those tests:

Table 8.1: GUI Test Results (Spiral I)

8.2.1.1 Miscellaneous Table Notes: The window objects were tested through their component widgets. The numbers in the "Occurrences of Each Object" column denote the total number of those objects within the entire TIF application.

The "Time Required" column lists the actual time, in minutes, each test approach needed to examine all the objects ten times. These times do not include test preparation, test case development, etc., but do include test report generation. The "Time Savings" column represents the amount of time saved using automated test tools for this type of test.

8.2.1.2 The primary difference between automated and manual testing was the amount of time required to perform each test on multiple occasions. The total test execution time difference was over 30 labor-hours. Dramatic differences in time allow test engineers to quickly examine GUI test data and move on to other test objectives. In many cases the automated testing was more effective, as the input data was more consistent and quickly processed. The object referenced in the table is the timeline, a non-standard object that was unstable during this early portion of the test.

8.2.2 Spiral II. The Timeline Object. Although some of the earlier versions of the timeline object suffered from a lack of stability, later versions improved significantly. The timeline object was based loosely on several other similar developments, so the software matured rapidly. From the testing perspective, the most challenging aspect of this spiral was the almost complete reliance on non-Motif-standard widgets to implement the timeline. Because of this approach, all TSL scripts which served as test procedures were written using the analog testing mode.

This created the necessity of including numerous safety checks, such as ensuring the window was in exactly the proper physical location before a given test thread executed.

8.2.2.1 Acceptance Criteria. Due to the complexity of analog testing, actual screen representation of scheduled events was not used for verification purposes with X-Runner. Instead, system calls were made within TSL to existing C shell scripts which examined the scheduling database associated with TIF, to ensure events were scheduled at the proper time and on the proper resource. Early manual tests focused on ensuring that what was graphically depicted matched the contents of the database. Once a degree of confidence was reached that the timeline object could accurately portray events from its databases, a greater reliance was placed on the automatic verification of the timeline using database queries.
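The database-side verification can be pictured as a small check that queries the scheduling database for an activity and compares the result with the expected start time and resource. The C sketch below illustrates the idea only: the script name query_timeline.csh and its one-line output format are assumptions standing in for the project's actual C shell scripts, and the sample activity and resource names are borrowed from the Figure 7.1 example.

/* verify_event.c - sketch of the database-side verification used in Spiral II.
 * Assumes a hypothetical query script, query_timeline.csh, that prints
 * "<start_time> <resource>" for a named activity; the real scripts and
 * their output format were project-specific.
 */
#include <stdio.h>
#include <string.h>

/* Returns 0 if the activity is scheduled at the expected time on the
 * expected resource, non-zero otherwise. */
static int verify_event(const char *activity,
                        const char *expected_start,
                        const char *expected_resource)
{
    char command[256], start[64], resource[64];
    FILE *pipe;

    sprintf(command, "query_timeline.csh %s", activity);
    pipe = popen(command, "r");           /* run the query script */
    if (pipe == NULL)
        return 1;                         /* could not run the query */
    if (fscanf(pipe, "%63s %63s", start, resource) != 2) {
        pclose(pipe);
        return 2;                         /* unexpected output */
    }
    pclose(pipe);

    if (strcmp(start, expected_start) != 0)
        return 3;                         /* wrong start time */
    if (strcmp(resource, expected_resource) != 0)
        return 4;                         /* wrong resource */
    return 0;
}

int main(void)
{
    int status = verify_event("Texas", "06/16/1999-05:00:00", "GEO_4_Sensor_1");
    printf("verification %s (code %d)\n", status == 0 ? "passed" : "failed", status);
    return status;
}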

8.2.2.2 As the timeline object was contained within one requirement, TIF0002, only one test procedure was developed for this spiral. The test was, however, quite extensive, consisting of over two hundred discrete procedural steps and twelve separate verification functions. The test was based on the manual approach to evaluating timeline objects. Table 8.2 depicts the performance characteristics for each test approach evaluating the timeline object.

Table 8.2: Timeline Test Results (Spiral II)

8.2.2.3 Miscellaneous Table Notes. The time difference between the manual and automated approach was even more dramatic in this test effort; this column depicts the amount of time required to perform a single test procedure. The timeline has the ability to dynamically adjust time thresholds by grabbing a scroll line with the mouse pointer and dragging. The problem is that this scroll line is only one pixel wide. Precise manipulation takes a tremendous amount of practice and patience. In the analog testing mode, X-Runner interacts with TIF at the pixel level, and the playback/record features of X-Runner become very useful.

8.2.2.4 The "Degree of Compliance" column denotes X-Runner's ability to generate test results similar to those produced by test engineers. The small difference in the findings generated by X-Runner and human testers was largely caused by a test engineer's ability to adjust the lengthy test procedure in small amounts as the test executes.

The automated test procedure is highly objective and designed to perform in the same manner each time. The overall test findings were close enough to allow X-Runner to perform requirement testing for TIF0002.

8.2.2.5 The throughput of this test effort represents roughly 20 hours of actual wall time. In this same period, automated testing was able to accomplish almost seven times as many evaluations with a high degree of compliance. During the course of the test, problems were reported to systems engineers, who made corrections and rebuilt the code. This action caused the need to retest the software. A test engineer would be looking at one-half day's work to accomplish this, whereas X-Runner could perform it in less than one hour. This is particularly important when numerous requirements need retesting, which would require more time than is humanly possible.

8.2.2.6 Although many involved with the test design were not comfortable with allowing X-Runner to evaluate a highly-complex, non-standard object in the analog mode, it was able to perform well within expectations.

8.2.3 Spiral III. Requirements Development Tools. The textual nature of these tools tended to limit the ability of X-Runner to easily adapt to evaluating them. The lessons learned from examining the timeline databases through system calls were applicable.

However, the majority of the test effort during this spiral focused on intensive mathematical analysis to ensure the scheduling function within TIF was generating candidate missions which did not violate any constraints, either from schedule conflicts or orbital constraints.

8.2.4 The Complexity of This Analysis During Spiral III. The complexity of this analysis prevented most test engineers from conducting the evaluation. Most of this work was performed by a technical support committee comprised of specialists throughout the company. Once the scheduling algorithms were validated, some minor test work was conducted to ensure the data was developed, disseminated, and stored in a manner compliant with existing requirements. This work was largely done by test engineers, as it only had to be examined once, and those routing functions were not rebuilt after the first baseline.

8.2.5 Spiral IV. Requirements Translation/Schedule Scoring. This spiral presented similar problems to the test effort as did Spiral III, but to a lesser degree. Once the scoring algorithms were validated (by the same technical support committee), a more meaningful test program was developed for X-Runner. The basic method of evaluation centered around three different asset utilization plans whose optimal schedule was known. TIF was to generate a prototype schedule, based on operator input. The new schedule was compared with the optimal schedule, and the deltas compared. TIF had a sensitivity function which was to make a determination whether the differences were significant, and adjust its processing accordingly.
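The comparison against a known-optimal plan reduces to a simple delta check against a sensitivity threshold. The C sketch below shows the shape of that check; the plan names, worth values, and the 5% threshold are invented for illustration and do not reproduce TIF's actual scoring algorithms.

/* score_check.c - sketch of the Spiral IV scoring comparison.
 * Worth values, weights, and the sensitivity threshold are illustrative.
 */
#include <stdio.h>
#include <math.h>

#define SENSITIVITY 0.05   /* assumed: deltas above 5% are "significant" */

struct plan_score {
    const char *plan;
    double optimal_worth;    /* known optimal score for this asset plan        */
    double generated_worth;  /* score of the TIF-generated prototype schedule  */
};

int main(void)
{
    struct plan_score plans[] = {
        { "plan1", 100.0, 97.5 },
        { "plan2",  82.0, 74.0 },
        { "plan3", 150.0, 149.1 },
    };
    int i, n = sizeof(plans) / sizeof(plans[0]);

    for (i = 0; i < n; i++) {
        double delta = fabs(plans[i].optimal_worth - plans[i].generated_worth)
                       / plans[i].optimal_worth;
        printf("%s: delta = %.1f%% -> %s\n",
               plans[i].plan, 100.0 * delta,
               delta > SENSITIVITY ? "significant, adjust processing"
                                   : "within tolerance");
    }
    return 0;
}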

8.2.6 Testing the Context-Sensitive Widget Set. The design approach used to accomplish these requirements was almost entirely within the context-sensitive widget set. This allowed for rapid development of automated test procedures and acceptance criteria. The amount of time required for these stages was nearly identical to the amount of time needed for test engineers to design a manual method. The actual execution characteristics of these tests are contained in Table 8.3.

Table 8.3: Requirements Translation/Schedule Scoring Test Results (Spiral IV)

8.2.7 Miscellaneous Table Notes. The "Time Required" column depicts the amount of wall time needed to perform the entire test battery for this spiral. The times are not large, as the evaluation process is not deeply involved. The Degree of Compliance was identical due to this same fact. The large difference in the number of tests executed was due to scheduling the automated test once per hour, every day, for the entire test window (four weeks). Although this amount of time might seem excessive, allowances were made in crafting the schedule to accommodate any problems which might need a "mini" spiral to fix, and the final version of the test report once all the findings were analyzed by test engineers.

8.2.8 The Effectiveness of X-Runner for Green Requirements. This entire spiral demonstrated the effectiveness of using X-Runner for primarily Green requirements. The ability to rapidly complete test development and begin executing a fast and efficient test program should be exploited wherever it exists. The time this frees up for test engineers is too valuable, and is needed in many other areas.

8.3 The Improved Test Design Process. As a result of the test effort for TIF using X-Runner, a change in the traditional approach to testing seemed necessary. The lessons learned from these results are contained in Appendix E. They have been used to develop a new process of test development which seeks to employ ATTs for the tasks to which they are most suited. At the same time, this new process attempts to restrict the use of ATTs in areas where they are not suited or their cost benefit is not clearly known. To assist test engineers in designing future test strategies, an Adaptive Test Architecture (ATA) was developed which focused on the following steps:
- Rating individual requirements to determine their suitability for automated testing, and to what degree this testing should be applied.
- Determining the impact on the system development schedule, in terms of both increased cost and quality.
- Publishing a working test strategy to guide the efforts of test engineers implementing the test program.

8.3.1 The Requirements Rating Procedure. This first step in the process provided a systematic method of evaluating individual requirements to determine how suited to automated testing they were. At the end of the process, each requirement was scored at one of three levels: Red, Amber, or Green. The following table explains these levels in more detail:

Red | The complexity of this requirement makes designing a cost-effective automated test a high-risk proposition.
Amber (Potential Impact) | The requirement is complex, or the potential of automation is not sufficiently known to determine if the requirement should be tested using X-Runner. These requirements are generally automated after all other test requirements are completed, with time remaining on the test schedule.
Green (Must Do) | The benefits of automation are straightforward. Automating the test effort can be done with little or no risk to test effectiveness.

Table 8.4: Requirements Rating Procedure Scores

8.3.1.1 The evaluations made for a given requirement are made directly for the spiral under consideration at that time. It is possible for a requirement to be rated at a different level during a subsequent spiral, although this is rare. Most requirements generate their rating based on their inherent complexity, a factor which does not usually vary between spirals. It is possible, however, for a window-level object to be seriously modified due to the decisions made at the beginning of a spiral. These changes may have a profound impact on the complexity of the window object, which will in turn have a noticeable impact on the rating that particular requirement may receive from the rating procedure.

8.3.1.2 Additionally, while Red requirements make poor automation candidates, many Amber requirements can and should be automated. However, this conversion should only occur after all other required test activities have been performed. It is not acceptable to jeopardize the test schedule by automating requirements with higher degrees of risk, unless the schedule has sufficient slack.

8.3.1.3 A number of factors were considered to reach the final score for a given requirement. Appendix F gives a more detailed explanation of how this process is implemented. The following table gives a brief explanation of the criteria used to make a final determination for each requirement:

Average window-level widget density | The more widgets, particularly non-standard widgets, which appear on each window generated by the application, the more difficult it is to test. This is partially due to the increased workload, but is also influenced by the combinatorics involved with the possible widget interactions.
Average widget complexity | Certain widget classes, such as push_buttons and check_buttons, are straightforward to test. Other widget classes, such as non-standard objects or bitmaps, require much more effort. Appendix A contains a table of X-Runner objects and their complexity, as determined by the amount of test time required to evaluate those objects.
Depth of window-level objects | The amount of system resources, particularly memory (stack size, etc.), required to track the interaction between various levels of nesting impacts the complexity of the test. This is particularly acute on a workstation that contains several real-time applications running simultaneously. Testing a new application in a stand-alone manner may be successful, while testing it under operational conditions may not.
Standard reused object? | Certain objects from previously-developed products are reused as often as possible. They are completely tested, understood by operators, and easier to test. If a new application contains reused objects, its testing complexity rating is reduced, based on prior experience of the test organization. This score is established through established metrics and committee evaluations.
Systems engineer's rating | This rating is the expert opinion of the difficulty of the test effort, based on prior experience of the systems engineer responsible for the project. The emphasis of this evaluation includes an end-to-end perspective, in addition to the impact of the test effort on the entire project. In many cases the process of this evaluation will point to automating a test to a level that will have a negative impact on the fielding of the product, which is generally not acceptable.

Table 8.5: Requirements Rating Procedure Evaluation Criteria
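Conceptually, the rating procedure combines the criteria above into a single Red/Amber/Green outcome. The C sketch below is an invented illustration of that idea only: the 1-to-5 criterion scores, the equal weighting, and the thresholds are assumptions, and the two sample requirements are generic stand-ins. The procedure actually used for TIF is defined in Appendix F.

/* rate_requirement.c - illustrative sketch of a Red/Amber/Green rating.
 * Criterion scores run 1 (easy) to 5 (hard); weights and thresholds are
 * assumptions, not the values used on the TIF program.
 */
#include <stdio.h>

struct criteria {
    int widget_density;     /* average window-level widget density       */
    int widget_complexity;  /* average widget complexity                 */
    int nesting_depth;      /* depth of window-level objects             */
    int reuse_penalty;      /* 1 if mostly reused objects, 5 if all new  */
    int se_rating;          /* systems engineer's expert rating          */
};

static const char *rate(const struct criteria *c)
{
    int total = c->widget_density + c->widget_complexity +
                c->nesting_depth + c->reuse_penalty + c->se_rating;
    if (total >= 20) return "Red";     /* poor automation candidate      */
    if (total >= 13) return "Amber";   /* automate only with slack time  */
    return "Green";                    /* automate with little risk      */
}

int main(void)
{
    struct criteria complex_req = { 4, 5, 4, 3, 4 };  /* e.g., a non-standard object */
    struct criteria simple_req  = { 2, 1, 2, 1, 2 };  /* e.g., a standard Motif form */
    printf("complex requirement rated %s\n", rate(&complex_req));
    printf("simple requirement rated %s\n", rate(&simple_req));
    return 0;
}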

8.3.2 The Test Methodology Impact on Cost/Schedule (T-MICS) Model. This tool, which allows systems and test engineers to interactively plan the actual design of the test, is executed as an Excel spreadsheet. It requires background information on the particular spiral, such as the number of requirements, schedule time, staffing levels, labor rates, etc. Using the manual test method as the benchmark, the model asks which approach, manual or automated, will be used for each of the three products of the test: the test procedure, the verifying process, and the test documentation. These decisions are determined for each requirement tested. Once these decisions are made, T-MICS will determine the impact the test approach will have on the test schedule, the quality of the test effort, the overall effectiveness of the test effort, and the impact of the test approach on labor costs. Appendix G contains more information on the construction of the model as well as sample output.
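The arithmetic behind such a model is essentially a per-requirement sum of hours for each of the three test products under the chosen approach, compared against an all-manual benchmark. The C sketch below shows that calculation in miniature; the hour figures, labor rate, and requirement names are invented for illustration and are not taken from the actual T-MICS spreadsheet described in Appendix G.

/* tmics_sketch.c - simplified sketch of the kind of arithmetic T-MICS performs.
 * All numeric inputs are illustrative assumptions.
 */
#include <stdio.h>

#define LABOR_RATE 60.0          /* assumed cost in dollars per labor-hour */

enum approach { MANUAL, AUTOMATED };

struct requirement {
    const char *id;
    enum approach procedure;     /* test procedure development/execution */
    enum approach verification;  /* verifying process                    */
    enum approach documentation; /* test documentation                   */
};

/* Assumed per-requirement hours, indexed as [product][approach]. */
static const double hours[3][2] = {
    /* procedure     */ { 8.0, 3.0 },
    /* verification  */ { 4.0, 1.0 },
    /* documentation */ { 2.0, 0.5 },
};

int main(void)
{
    struct requirement reqs[] = {
        { "REQ-01", AUTOMATED, AUTOMATED, AUTOMATED },
        { "REQ-02", MANUAL,    AUTOMATED, AUTOMATED },
        { "REQ-03", MANUAL,    MANUAL,    MANUAL    },
    };
    int i, n = sizeof(reqs) / sizeof(reqs[0]);
    double planned = 0.0, benchmark = 0.0;

    for (i = 0; i < n; i++) {
        planned   += hours[0][reqs[i].procedure] +
                     hours[1][reqs[i].verification] +
                     hours[2][reqs[i].documentation];
        benchmark += hours[0][MANUAL] + hours[1][MANUAL] + hours[2][MANUAL];
    }
    printf("planned:   %.1f hours ($%.0f)\n", planned, planned * LABOR_RATE);
    printf("benchmark: %.1f hours ($%.0f) all-manual\n",
           benchmark, benchmark * LABOR_RATE);
    printf("schedule impact: %.1f hours saved\n", benchmark - planned);
    return 0;
}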

8.4 The New Architecture. Incorporating these changes into existing methods has produced a new approach to test design. These new steps are designed to accomplish a broad range of test objectives. Some of these steps were not possible in the traditional approach, primarily due to the intensity required. Note that these steps will occur for each spiral. However, the earlier two steps will decrease in intensity as the application matures, with focus shifting to the latter two steps.

8.4.1 An Improved Test Design. Because of the findings of the TIF case study, a new test design process has been developed. Figure 8.1 portrays this new approach to test design.

[Figure 8.1: The Adaptive Test Architecture (ATA), showing parallel manual and X-Runner test preparation and test execution paths that share common scoring criteria and feed a common set of test results.]

8.4.2 New Possibilities of Test Design Using X-Runner. This new architecture also introduces a new series of test efforts which are to be developed concurrently. Under the previous test development process, Stage 2, Test Preparation, consisted of designing requirements test procedures and limited regression test design. Table 8.6 lists all test efforts which occur during test preparation under ATA. These test steps are further explained below.

Test Effort | Purpose | Impact on Test Engineer Level of Effort
GUI Checkout | A thorough evaluation of the GUI | Automatically generated
Requirements Test Design | Requirement-driven development test | Same as before
Regression Test Design | Operationally-based examination of the application | Reuse of requirements test procedures minimizes development impact
Stand-Alone Stress Testing | Establishes stability of the application under test | Automatically generated
Multiple Application Stress Testing | Confirms resources can support the AUT design; hardware must be scheduled in advance | Automatically generated

Table 8.6: Impact of ATA on Test Engineer Level of Effort

8.4.3 Step 1: GUI Checkout. As far as the operator is concerned, the GUI is the application. All of their interaction with the software occurs here. For this reason, the GUI must be consistently designed and extremely fault tolerant. X-Runner maintains an ASCII file which describes all objects contained in the GUI, including name, location, and actions taken upon activation.

8.4.3.1 A TSL module, called GUI_CHECKOUT, has been designed to exercise the entire GUI in a rigorous manner. This module examines the GUI data file and builds a task list of objects. Based on the class of the object, GUI_CHECKOUT will activate every object on the GUI, including pop-up and sub-windows, in varying combinations to ensure the stability of the application.

8.4.3.2 GUI_CHECKOUT performs other operations as well. Certain GUI objects perform in a similar manner across applications, according to shop standards, e.g., "quit", "print", and "minimize". Routines have been included to ensure these standard objects perform the function they represent. For example, the "quit" object should shut down the application with no residual processes active. This routine will examine the process list, as well as the window table, to ensure no traces of the application under test remain. An expansible switch statement within the function allows this list to grow as GUI_CHECKOUT evolves.
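The dispatch logic of GUI_CHECKOUT can be pictured as a loop over the objects read from the GUI data file, with a class-specific exercise step and an extra check for shop-standard objects. The C sketch below re-expresses that logic for illustration only: the real module is a TSL script driving X-Runner, the exercise and standard-check routines here are stubs, and an if/else chain stands in for the TSL switch statement because C cannot switch on strings.

/* gui_checkout_sketch.c - the dispatch logic of GUI_CHECKOUT, re-expressed
 * in C. Object names and classes follow the shop conventions described in
 * the text; the activation routines are stubs.
 */
#include <stdio.h>
#include <string.h>

struct gui_object {
    const char *name;    /* logical name from the X-Runner GUI data file   */
    const char *class_;  /* e.g., push_button, check_button, list, window  */
};

static void exercise(const struct gui_object *obj)
{
    /* class-specific activation would go here (press, toggle, select, ...) */
    printf("exercising %s (%s)\n", obj->name, obj->class_);
}

static void check_standard(const struct gui_object *obj)
{
    /* Expansible dispatch for shop-standard objects. */
    if (strcmp(obj->name, "quit") == 0) {
        printf("  standard check: quit leaves no residual processes or windows\n");
    } else if (strcmp(obj->name, "print") == 0) {
        printf("  standard check: print produces output\n");
    } else if (strcmp(obj->name, "minimize") == 0) {
        printf("  standard check: window iconifies and restores\n");
    }
}

int main(void)
{
    /* In practice this list is built by parsing the GUI data file. */
    struct gui_object objects[] = {
        { "quit",     "push_button" },
        { "print",    "push_button" },
        { "timeline", "user_object" },
    };
    int i, n = sizeof(objects) / sizeof(objects[0]);
    for (i = 0; i < n; i++) {
        exercise(&objects[i]);
        check_standard(&objects[i]);
    }
    return 0;
}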

8.4.4 Step 2: Requirement Testing. This is the heart of the development test effort. The primary mission of the test program is to ensure the application under test can perform its required functionality. In the development testing arena this is the primary focus. Many of the other test activities contained in ATA help establish how far above the minimum standard the application performs. ATA is also interested in establishing the stability of the application to a much higher degree than is currently possible.

8.4.5 Step 3: Regression Testing. This level of testing is designed to operate the application in the same manner as it will be used in actual conditions. Ideally, the regression test can be built from existing test cases, used in modified ways. Under the manual approach, the self-contained nature of the requirements tests makes simply recycling them less effective. However, it is much easier to reuse test scripts in TSL. In some cases a full regression test can be written by developing a script whose only purpose is to call several requirements test scripts in a logically-designed manner, as sketched below. If the called scripts perform as designed, a consistent performance score will emerge over time. This score can be used in the same manner as with requirements testing, alerting test engineers of changes in application performance.
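The following C program is a minimal sketch of such a driver. It assumes a hypothetical run_req_test command that executes one requirement test and returns zero on success; in the ATA itself the driver would be a TSL script calling the requirement test scripts directly, and the requirement identifiers shown are placeholders.

/* regression_driver.c - sketch of a regression run built from existing
 * requirement tests. Command and requirement names are placeholders.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* An ordered, operationally meaningful sequence of requirement tests. */
    const char *tests[] = {
        "./run_req_test TIF0001",
        "./run_req_test TIF0002",
        "./run_req_test TIF0003",
    };
    int i, n = sizeof(tests) / sizeof(tests[0]);
    int passed = 0;

    for (i = 0; i < n; i++) {
        int status = system(tests[i]);      /* exit status 0 taken as "pass" */
        printf("%-25s %s\n", tests[i], status == 0 ? "PASS" : "FAIL");
        if (status == 0)
            passed++;
    }
    /* A consistent score over successive builds flags performance drift. */
    printf("regression score: %d/%d\n", passed, n);
    return (passed == n) ? 0 : 1;
}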

8.4.6 Step 4: Stress Testing. This area of testing is crucial to delivering quality software, but is often too resource-intensive to conduct thoroughly. Resource allocation problems, like memory leaks and stack violations, become evident during this type of testing, but finding them using manual methods is more a case of luck than good test design. Currently, stress testing is viewed in two distinct manners: stand-alone and multiple operators. This type of testing occurs later in the test effort, depending on the success of the first three steps. Additionally, the intensity of this testing increases as the number of spirals does. The philosophy is that as the application approaches delivery, it needs to be "put through its paces" more.

8.4.6.1 Stress testing has only one basic goal: can the application survive in an intense environment? Since this goal is straightforward, it requires little human intervention. For this reason, stress testing occurs overnight, on weekends, and at other times when significant machine resources are available. For example, a test engineer will design a simple stress test for an application and start the job at close of business. When the test engineer returns the following morning, they can determine the state of the application by examining the screen and the test log. The constant activity provides a degree of confidence in the application's ability to perform as designed.

8.4.6.2 Stand-Alone Stress Testing. In this area, the entire emphasis of the test is to determine if the application can be "broken" given enough interaction and time. Stress testing at the application level occurs in three phases. Phase 1 will perform many of the activities of the GUI_CHECKOUT function.


Phase 1 performs many of the activities of the GUI_CHECKOUT function; the primary difference is that the module will randomly perform thousands of object manipulations per hour. The non-directional nature of these interactions tests the application's ability to handle quick, sometimes illogical input. Phase 2 involves the random execution of requirements test procedures. Test reports are generated and performance scores are tracked for consistency. Phase 3 involves simulating operator errors. This is accomplished by executing random requirements tests as in Phase 2, with one major exception. Throughout most test procedures, list, scroll, and other multiple-option widgets must be manipulated to perform a desired function. In this phase one or more of these choices is selected incorrectly. In most cases this should cause the test case to fail. The application must be able to accommodate such incorrect input.
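The following C sketch illustrates the shape of the Phase 2/Phase 3 stress loop described above: a requirements test is chosen at random, and a fraction of the runs deliberately corrupt one widget selection to simulate an operator error. The test count, the 10% error-injection rate, and run_requirements_test() are hypothetical stand-ins, not part of the TIF suite or TSL.

/* Illustrative stress loop (8.4.6.2): pick a requirements test at random and,
 * with some probability, ask it to corrupt one widget selection.  The error
 * rate and run_requirements_test() are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_TESTS 3

/* Placeholder for executing a recorded requirements test; a nonzero
 * inject_error asks the procedure to pick one list/scroll choice incorrectly. */
static int run_requirements_test(int test_id, int inject_error)
{
    return inject_error ? 1 : 0;        /* 0 = pass, nonzero = expected failure */
}

int main(void)
{
    srand((unsigned)time(NULL));
    for (int i = 0; i < 1000; i++) {            /* one overnight batch */
        int test_id = rand() % N_TESTS;
        int inject  = (rand() % 10 == 0);       /* ~10% operator-error runs */
        int result  = run_requirements_test(test_id, inject);

        /* An injected error should fail; a clean run should pass.  Anything
         * else is logged for the test engineer to review in the morning. */
        if ((inject && result == 0) || (!inject && result != 0))
            printf("iteration %d: test %d unexpected result %d (inject=%d)\n",
                   i, test_id, result, inject);
    }
    return 0;
}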


8.4.6.3 Multiple Application Stress Testing. In this area the emphasis shifts from testing the application to testing the shared resource's ability to support multiple applications simultaneously. Many applications, like TIF, may be operated by several operators at the same time. The shared object, the timeline in our case, is maintained in a number of data tables on a common file server. As the number of requirements tests which manipulate data in these tables grows, test engineers gain insight into the server's ability to cope with these requests. Note: GUI checkout is not the emphasis of this type of testing, although a limited amount does occur. Also, the population of requirements tests which can be randomly executed is smaller than in stand-alone stress testing. The focus shifts to the entire hardware architecture: the network, file servers, and so on.


9. Summary

9.1 The Fundamental Question Concerning Using ATTs. The fundamental question concerning the use of ATTs is not whether test organizations will use them, but rather how. Improperly used, they can be a significant drain on resources and may cause stress both within the organization and with customers. However, when properly used they offer the potential of significantly increasing test engineer effectiveness; in several instances throughout this case study the improvements were over 600%. Even static organizations like the DOD are interested in improvements of this magnitude.

9.2 Observations. While examining the mountains of data generated during this case study, a few observations echo throughout:

9.2.1 The Need for Customer Support. The customer must support changes in established procedures when they amount to a paradigm shift. Making such profound changes to routine practices can quickly affect system efficiency and desirability.

9.2.2 The Need for Organization Support. The organization, both software developers and test engineers, must also support these changes.


In many cases the skill mix will not be appropriate to effectively employ ATTs. This is a problem that can only improve over time.

9.2.3 The Need to Customize Tools for the Problem Domain. No commercially available ATT will have sufficient functionality to contribute improvements early in its integration. In X-Runner's case, numerous utility and support programs were needed to realize much of its potential.


APPENDIX A: Facts About X-Runner

A.1. General X-Runner Capabilities. X-Runner provides several features, including automatic recording and playback of operator interaction with the application; rudimentary report generation; and the capability to execute test scripts in batch fashion. Its versatility and power offer a tantalizing array of possibilities to professional test engineers. The potential cost savings, quality improvements, and logic-based test designs make further exploration of X-Runner's capabilities essential.

A.1.1 Modes of Testing. X-Runner provides two basic modes to accomplish testing. Analog mode forces X-Runner to deal with the viewport, keyboard, and mouse using absolute (physical) addresses. Context sensitive is a high-level testing mode that allows X-Runner to "learn" objects on the screen based on varying criteria. Once these objects are learned, X-Runner may interact with them regardless of their location on the screen. For example, suppose an operator would like to use the mouse to click on an object named "QUIT". In analog mode, X-Runner would move the mouse pointer to a physical location on the screen, then execute a button press. If, for some reason, the object were moved to a different location, the mouse click would not work. In context sensitive mode, X-Runner would interpret the mouse click command, look up the current location of the object, move the mouse pointer, and execute the click. The context sensitive mode provides a higher degree of security, but the additional overhead required slows its performance noticeably.
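The difference between the two modes in A.1.1 can be sketched as follows. The move_pointer(), press_button(), and lookup_object() helpers are hypothetical placeholders used for illustration; they are not the actual X-Runner or TSL calls.

/* Illustrative contrast of analog and context sensitive clicks (A.1.1).
 * All helper routines below are hypothetical stand-ins. */
#include <stdio.h>

struct point { int x, y; };

static void move_pointer(struct point p) { printf("pointer -> (%d,%d)\n", p.x, p.y); }
static void press_button(void)           { printf("button press\n"); }

/* Context-sensitive mode: resolve the object's current location by name. */
static struct point lookup_object(const char *name)
{
    (void)name;                         /* would consult the learned GUI map */
    struct point p = { 120, 45 };       /* wherever "QUIT" happens to be now */
    return p;
}

/* Analog mode: the click is tied to fixed screen coordinates. */
static void click_analog(struct point recorded)
{
    move_pointer(recorded);             /* breaks if the widget has moved */
    press_button();
}

/* Context-sensitive mode: the click follows the named object. */
static void click_context_sensitive(const char *name)
{
    move_pointer(lookup_object(name));  /* extra lookup costs time but is robust */
    press_button();
}

int main(void)
{
    struct point recorded = { 120, 45 };
    click_analog(recorded);
    click_context_sensitive("QUIT");
    return 0;
}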


A.2 Test Script Language (TSL). X-Runner provides a C-like scripting language called TSL. Although many of the syntax rules and functional descriptions are similar between C and TSL, it is an interpreted language and experiences a predictable performance lag. TSL has the capability to call other modules, including compiled TSL functions, other TSL scripts, and compiled C modules contained in Dynamic Linked Libraries (DLLs). TSL follows C syntax concerning rules of associativity and precedence and contains a rudimentary mathematical function library. More extensive computational requirements would require linking into DLLs that use C's math library (a representative example of such a compiled routine is sketched at the end of this appendix); in most testing applications this has not proved necessary.

A.3. Strengths. X-Runner is an excellent, versatile ATT that offers features found on no other commercially available product. It offers a variety of approaches to testing that are costly and often impractical in manual testing modes. It can eliminate many repetitive tasks faced by test engineers, performing its work in a consistent and accurate manner. X-Runner can accomplish testing in either foreground or background mode. It generates a rudimentary test report and has expansible capability through both C and UNIX.

A.4. Weaknesses. Fully realizing X-Runner's potential is not automatic. Significant overhead is required before any testing can begin (e.g., setting up the test environment). Additionally, there are some instabilities within the X-Runner application and within TSL that require workarounds on occasion. The rudimentary reporting feature is too immature; it needs augmentation through TSL to report the data necessary to accurately characterize the results of the test. To overcome these weaknesses, an in-house developed suite of TSL functions called the X-Runner Backbone (XRBB) was written. This application isolated many of the problem areas within TSL to minimize their impact on routine test development. The XRBB also provided reporting and results-recording features which greatly simplified using X-Runner during this test.

A.5 X-Runner Objects. X-Runner has the ability to directly interact with every object in the Motif toolkit (in the context sensitive mode). It has established protocols for communicating with these objects and provides the TSL programmer the ability to determine and modify the attributes of these objects. Table A-1 details the actual classes of widgets with which X-Runner may communicate.


Table A-1: Recognized X-Runner GUI Object Classes

A.6 The interface with X-Runner is a GUI-based text editor used to develop TSL scripts. It has most of the standard features associated with text editors (file, print, edit, etc.) and many tools specific to testing. Figure A-1 displays an actual screen capture of the X-Runner test environment, showing a TSL script open in the editor.

Figure A-1: X-Runner Test Development Environment
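As an example of the kind of compiled routine referred to in A.2, the function below performs a calculation that would strain TSL's rudimentary math library. The slew-time use case, the trapezoidal rate profile, and the function name are assumptions made for illustration; the TSL-to-DLL binding steps are not shown.

/* Illustrative compiled-C helper of the kind A.2 describes placing in a DLL.
 * Estimates a sensor slew time (seconds) from an angular offset (degrees),
 * assuming a hypothetical trapezoidal rate profile. */
#include <math.h>

double estimate_slew_seconds(double offset_deg, double max_rate_deg_s,
                             double accel_deg_s2)
{
    /* Angle consumed by the acceleration and deceleration ramps combined. */
    double ramp_angle = (max_rate_deg_s * max_rate_deg_s) / accel_deg_s2;

    if (fabs(offset_deg) < ramp_angle)                 /* never reaches max rate */
        return 2.0 * sqrt(fabs(offset_deg) / accel_deg_s2);

    return 2.0 * (max_rate_deg_s / accel_deg_s2)       /* both ramps */
           + (fabs(offset_deg) - ramp_angle) / max_rate_deg_s;
}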


APPENDIX B: The Timeline Application

B.1. The timeline application used during this case study is the primary visualization tool used by mission planners. It provides bar-graph or Gantt-style views of the requested activities over time. The timeline allows the user to view at a glance the breadth of system resource utilization during a specific time interval. The timeline has the capability to use a drag-and-drop mechanism to select and move scheduled activities, or to select additional requests from a list and add them to the schedule.

B.2 During the period of this evaluation, a notional satellite was designed to fully examine the timeline function. This geosynchronous satellite had the operating characteristics listed in Table B-1. Note: all times are given in milliseconds.

Table B-1: The Notional Satellite (NS) Operational Parameters

B.2.1 The notional satellite was configured along these lines to simulate the performance characteristics of other programs, both governmental and commercial. Its performance characteristics are consistent with the industry-standard Hughes Electronics HS-601 satellite, with a modified payload to simulate both current and notional capabilities. The columns in the table are further explained below:


B.2.2. Satellite Payload. The Notional Satellite (NS) contains ten elements within its payload. In keeping with trends in recent programs, the payload is heterogeneous, which imposes new demands on the timeline function. The payloads are named based on the primary function they provide.

B.2.3 Payload Type. NS contains five different types of payloads, each with unique scheduling properties. They range from continuous transponder functionality in the X-band, a continuous uplink in the S-band, and two imaging sensors, to one HRMI sensor and one very-low-resolution fixed weather sensor.

B.2.4. Maximum Scan. This time represents the maximum amount of time a given sensor may perform a particular task. An entry of N/A means that particular payload does not move. If the sensor remains pointed at the same location, its sensitivity and calibration diminish.

B.2.5. Minimum Scan. Some sensors require a minimum amount of time to provide reliable data. This time limit is usually provided by the sensor vendor as one of the design constraints of the sensor. For example, if the imaging sensor is supposed to generate a finding within +/- one meter, the sensor may require a constant exposure to the phenomenon for n milliseconds. If the exposure time is smaller, the quality of the event generated by the sensor will suffer, which may cause it to generate false results.

B.2.6 Slew Time. Some of the sensors aboard NS require an electromechanical movement of hardware on the spacecraft. These components are not omnidirectional and require specific ground locations to complete their tasks. As these components must move, it takes a finite amount of time to move them to the desired location. Slew time covers the time required to calculate the amount of movement and the physical deployment of those components.

B.2.7. Load Shed. On NS there is a limited amount of power available at any one time; there is not sufficient power to operate all sensors on the spacecraft simultaneously.


Hardware on existing satellites prevents overburdening of the power grid; however, mission planning software should also prevent satellite over-tasking.

B.2.8. Recovery Time. After some sensors have performed a task for a given amount of time, they may require some recovery time. For example, if the imaging sensor has been tracking a bright spot, it will need to look at cold space to allow residual heat to dissipate from the sensor. Remember, the operational temperature of the sensor hovers around -175 degrees Celsius, so a few degrees are significant. Once the recovery time has elapsed, the sensor is available for subsequent tasking.

B.3. The TIF software was designed to generate a prototype timeline based on the existing requirements for a given period of time. The requirements were derived from two sources: the recurring task list, and a requirements list for a specific time window. An example of the recurring list appears below:

Requirements List for NS (Recurring)

Task     Payload  Start  Stop   Scan Type      Target    Target  Priority  Requesting
Number            Time   Time                  Area      Size              Agency
R-0001   WX       00:00  23:59  Continuous     In View   N/A     100       NASA
R-0002   HRMI     00:00  23:59  Every 5 min.   18E:41N   15km    75        DOD-Special
R-0003   IM-2     10:00  12:00  Every 5 min.   05W:35N   150km   25        NASA
R-0004   IM-1     12:00  13:00  Best Chance    20E:40N   75km    15        DOD-USAF
R-0005   IM-2     05:00  06:00  Best Chance    26E:38N   50km    25        DOD-USA
R-0006   IM-2     17:30  22:00  Best Chance    22E:42N   50km    15        DOD-USAF

The following list demonstrates what a specific day's requirements would look like:

Requirements List for NS (Daily) for 23 January 1995

Task     Payload  Start  Stop   Scan Type      Target    Target  Priority  Requesting
Number            Time   Time                  Area      Size              Agency
D-0001   HRMI     00:00  01:00  Best Chance    28E:35N   50km    100       NASA
D-0002   IM-1     00:00  02:00  Best Chance    28E:35N   50km    100       NASA
D-0003   HRMI     09:00  10:59  Every 10 min.  18E:41N   15km    75        DOD-Special
D-0004   IM-1     10:00  12:00  Every 5 min.   05W:35N   150km   25        NASA
D-0005   IM-1     12:00  13:00  Best Chance    20E:40N   75km    15        DOD-USAF
D-0006   IM-2     05:00  06:00  Best Chance    26E:38N   50km    25        DOD-USA
D-0007   IM-2     17:30  22:00  Best Chance    22E:42N   50km    15        DOD-USAF
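For illustration, the records implied by Table B-1 and the requirements lists above can be represented with structures such as the following. The field names and types are assumptions drawn from the column headings; they are not taken from the TIF source.

/* Illustrative data-structure sketch for Appendix B.  Field names and types
 * are assumed from the column headings of Table B-1 and the requirements lists. */

/* One payload element of the Notional Satellite (times in milliseconds,
 * per the note accompanying Table B-1). */
struct ns_payload {
    char payload_name[16];   /* e.g., "HRMI", "IM-1", "WX"              */
    char payload_type[24];   /* transponder, uplink, imaging, weather   */
    long max_scan_ms;        /* -1 for N/A (payload does not move)      */
    long min_scan_ms;        /* minimum dwell for reliable data         */
    long slew_time_ms;       /* compute plus physically move hardware   */
    int  load_shed_watts;    /* power drawn; used to avoid over-tasking */
    long recovery_time_ms;   /* cool-down before the next tasking       */
};

/* One row of the recurring or daily requirements list. */
struct task_request {
    char task_number[8];     /* "R-0001", "D-0003", ...                 */
    char payload_name[16];
    char start_time[6];      /* "HH:MM"                                 */
    char stop_time[6];
    char scan_type[16];      /* "Continuous", "Best Chance", ...        */
    char target_area[12];    /* "18E:41N" or "In View"                  */
    char target_size[8];     /* "15km", "N/A", ...                      */
    int  priority;           /* relative priority as listed (1..100)    */
    char requesting_agency[16];
};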


APPENDIX C: The Formal DOD Development Test Process

C.1. The formal test approach used during each spiral of this case study complied with existing governmental policies, specifically MIL-STD-2167A. From a test perspective this approach can be summarized by the following paragraphs.

C.1.1 The emphasis of testing was on the functionality of the system as it will be used operationally. Most of the preparation for the test occurred well before the test phase and was conducted by a separate test organization, not the development team. However, several programmers assisted in the test design process, as most requirements are subject to some degree of interpretation at lower levels.

C.1.2. A review of the system test plan and procedures occurred before actual testing began, to ensure test coverage was complete.

C.1.3. A baseline configuration of TIF was established and frozen. Baseline source code was used to compile, link, and build a new executable version of the application. The current hardware configuration was documented.

C.1.4. Once the new application was judged stable, it became the product baseline. Final versions of the current spiral's test plans and procedures were staffed, revised, and published.

C.1.5 Any problems discovered during the first test were documented and resolved in a controlled configuration management process. No "ad hoc" changes were allowed.

C.1.6. The formal test was conducted by the test organization, with quality assurance monitoring the process to certify correct procedures were followed.

C.1.7 At the conclusion of the test, numerous briefings were conducted between the test organization, systems engineers, project management, and customer representatives.


The test results were reviewed, with problems and potential solutions identified. Every instance of software performance not meeting requirements or expectations was examined to determine what actions needed to occur. In some instances the anomalies could be traced to operator error, or to minor deviations from the test procedure; in other instances hardware bottlenecks and failures caused the anomalies.

C.1.8. Problems which could not be resolved were documented in the final test report after consultation with the software development team. These issues became a central portion of the quadrant A activities of the next spiral. The effort needed to correct these problems was compared with the benefits gained by the additional capability planned for the new spiral, and a prioritized task list was derived. This new task list defined the objectives of the development organization for the next spiral.


APPENDIX D: The Timeline Initializer Function (TIF) Test Development Model

D.1. The software testing model used for this case study consisted of four basic steps: Requirements Information Sheet (RIS) development, test preparation, test execution, and test results. These stages contain all actions required by DOD regulations, standards, and style guides. Table D-1 outlines these steps in more detail.

RIS Development: Starting point in the test development process. Contains all necessary data to fully understand a given requirement, particularly why it is included in the development. Includes the acceptance criteria for its listed requirement. Allows test engineers to design a thorough, meaningful evaluation.

Test Preparation: Translate the data contained in the RIS forms into working procedures. Validate the acceptance criteria through actual operation of the software.

Test Execution: Actually perform test procedures. Generate results data and formal test reports for each procedure. Begin work on the final test report.

Test Results: Reduce data generated through previous stages. Summarize findings. Make a final determination about the readiness of the software tested. Complete the final test report, including recommendations.

Table D-1: Test Development Steps Followed at Hughes

D.2. This process is not a simple linear one. At any time during test development, should circumstances require, work may regress to an earlier stage to reevaluate and reconfigure existing test strategies. Figure D-1 illustrates the process as it is used for both manual and automated testing.

Figure D-1: The Cyclical Nature of Test Development


D.3. Generally speaking, when using manual test methods the workload is balanced between each of these stages. For automated testing, however, the amount of labor-intensive work peaks during test preparation and falls dramatically for the final two stages. Note: the time savings in those stages allows test engineers to focus on more than one test objective at the same time, increasing their productivity.

D.4 This method of test design fits the spiral development model well in that it is adaptive to both large and small changes in system requirements, constraints, and performance targets. Because the system's development is more dynamic in this model of software development, the test program must also be.

D.5. Manual and X-Runner Test Development (Stages 1 & 2). These two steps of the development process are iterative and consist of all work that needs to occur before actual testing can begin. These steps occur early in the test process and sometimes continue into actual test execution. Figure D-2 depicts the actions test engineers perform during these first two steps under each test approach: studying the problem domain, evaluating requirements and determining the intent of each, writing or recording the test procedures, encoding the acceptance criteria, and validating the test procedures.

Figure D-2: Early Stages of Test Development


D.6. RIS development is the same process and takes the same amount of time under either approach. However, during the TIF case study it was observed that X-Runner test preparation took approximately 150% of the time required by the manual test approach (six hours per requirement versus four for manual testing). While this seems excessive, at the conclusion of step two the X-Runner test engineers have developed a fully validated, automated test procedure and acceptance criteria. These test procedures are then ready for background execution with little or no human intervention.

D.7 When an application is tested only one time, the additional time cost is not justified. However, in this early stage of software development, applications change often, and it is usually necessary to retest functions on several occasions. If the automated test procedure is properly designed, no new work is required by the test engineer to prepare for the retest; a rerun of the existing test procedure will be sufficient. Using this approach, the retest can be scheduled, executed, and evaluated with minimal involvement by the test engineer. The time saving allows test personnel to perform numerous test procedures simultaneously, leveraging their capability and improving throughput.

D.8 Manual and X-Runner Test Execution (Stage 3). Performing a test using either approach involves several steps: determine which tests to perform, retrieve the test procedure and acceptance criteria, set up the hardware environment, execute the test procedure, evaluate the acceptance criteria, and generate the test report. Many of these steps are routine and repetitive, and X-Runner can perform them in an efficient manner without mistakes caused by boredom or lack of attention to detail.

Figure D-3: Test Execution for Both Manual and Automated Approaches


D.9 Using the 1:1.5 ratio of manual to automated test preparation, the additional cost of automated testing is recovered by the time the manual test procedure for that requirement would be executed a second time (a simple cost comparison is sketched at the end of this appendix). Each subsequent execution will generate time savings consistent with the percentages listed in Table D-2.

Table D-2: Relative Test Execution Performance per Hour

D.10. The results from testing TIF with both manual and automated approaches have led to the development of the Adaptive Test Architecture, with its ancillary tools mentioned in the body of the thesis. Provided an application's requirements are correctly analyzed, time savings and quality improvements consistent with those demonstrated here should be achieved.
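The break-even behavior described in D.6 through D.9 can be illustrated with a small cumulative-cost comparison. The preparation times (four hours manual, six hours automated) come from D.6; the per-execution hours below are hypothetical placeholders, since Table D-2's figures are not reproduced here.

/* Illustrative break-even sketch for D.6-D.9: cumulative labor per requirement
 * under each approach.  Preparation hours are from D.6; execution hours are
 * assumed placeholders. */
#include <stdio.h>

int main(void)
{
    const double manual_prep = 4.0, auto_prep = 6.0;   /* hours, from D.6 */
    const double manual_exec = 1.0, auto_exec = 0.1;   /* hours per run, assumed */

    for (int runs = 0; runs <= 4; runs++) {
        double manual_total = manual_prep + runs * manual_exec;
        double auto_total   = auto_prep   + runs * auto_exec;
        printf("runs=%d  manual=%.1f h  automated=%.1f h%s\n",
               runs, manual_total, auto_total,
               auto_total <= manual_total ? "  <- automated ahead" : "");
    }
    return 0;
}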


APPENDIX E: Lessons Learned from the TIF Evaluation Using X-Runner

E.1. The following is a list of lessons learned from using X-Runner during the development of the TIF. These findings were the basis of the Requirements Rating Procedure, the Test Methodology Impact on Cost/Schedule (T-MICS) Model, and the Adaptive Test Architecture (ATA).

E.1.1. Lesson 1: Using the spiral development model allows significant reuse of not only source code but also test procedures, as the application expands its functionality with each iteration. A well-designed test suite will save dramatic amounts of time, while a poorly designed one will be largely discarded on subsequent spirals. Due to the additional labor cost associated with automated testing, efficient reuse of test scripts is essential. While it is uncommon for scripts to be reused with no modification, the less they need to be modified, the better.

E.1.2. Lesson 2: Using X-Runner to mimic the actions of a human operator. X-Runner's Record/Playback feature was well suited to this GUI-intensive application. The ability to refer to GUI objects using a logical name greatly simplified test script development and maintenance, freeing test engineers to focus on more difficult problems. Additionally, X-Runner can play recorded scripts much faster than humans, does not skip steps or transpose digits, and does not become complacent or bored. Experience from TIF testing showed that over 95% of every test procedure can be covered using the Record/Playback feature. Because of this, the ability to test a requirement using these features greatly improves its score.

E.1.3 Lesson 3: Using X-Runner to automatically evaluate the success of a given test case. TSL programs can be written that evaluate the state of the application before and after a certain event occurs.


If TSL can accurately evaluate program results, the burden on the test engineer shifts greatly from manually verifying requirements to other functions. While mimicking the application operator is a straightforward concept, replicating the results-analyzing function of a human test engineer is a much more complex problem. In this case an extreme version of the Pareto principle applies: almost all acceptance criteria can be verified programmatically, but some require an inordinate amount of time. Deciding whether a task should or should not be encoded is not always easy. A heuristic has been developed to score a particular acceptance criterion to make this decision more deterministic. Projections from TIF indicate acceptance criteria scripting takes roughly 15 times longer than test procedure encoding. While this number seems prohibitively large at first, it is primarily due to the fact that the record/playback feature of test procedure generation is so efficient. However, the difference in time is a significant factor that must be evaluated.

E.1.4. Lesson 4: Using X-Runner to generate formatted test reports. X-Runner's report-generating procedure is altogether unsatisfactory for presenting test findings to customers. However, TSL provides excellent output capabilities. This requires adding more printf-like statements within test scripts, writing both to the console and to the secondary storage device recording the test findings. Additionally, the reporting feature should be selective, allowing gradually less and less information as the spiral develops. This is essential: in the early steps of testing a wide variety of information is needed to help correct errors, but as the product develops this information is not as important as before. We overcame this problem by routing all test report output through a central procedure with a selectivity tag that allowed a message to be omitted if its content was not appropriate. Some work was also devoted to a function that would translate TSL statements into a more natural-language format, which would make direct acceptance by customers an easier proposition.

E.1.5. Lesson 5: Using X-Runner to analyze a given test based on its past performance. The X-Runner test suite used was capable of generating several megabytes of test report text data per test engineer per day.


A useable benchmark for a given test begins to emerge only after several interactive tests have been executed. Once these benchmarks are firm, subsequent executions of the test can be compared with them to determine whether the software performs as it did in the past. Initially we have a model that summarizes several performance characteristics and error counts; a more comprehensive data reduction method is needed in the future.

E.1.6 Lesson 6: Using X-Runner to suggest which test to execute in the future based on dynamic conditions that exist after every test. Many assumptions are made when designing a test schedule. Often the data supporting these assumptions change, which makes the original scheduling approach less than optimal. To know how to adjust these assumptions, different data must be consulted at different times in the test development cycle. This need has led to the development of a prototype Test Scheduler and Reviewer (TSAR) application that can automatically review test results and schedule new tests accordingly. For example, a given requirement has two test cases associated with one test procedure. If the test procedure using the positive test case fails, the negative test case should also be executed to determine the impact of the application's changes. This requires a mapping mechanism to establish the relationships of the various procedures in the test battery (a minimal form of this mapping is sketched at the end of this appendix).

E.1.7 Lesson 7: General Observations.

E.1.7.1 The learning curve required to exploit X-Runner and overcome TSL instabilities is steep.

E.1.7.2. Configuration management for test scripts is just as important as it is for source code, particularly in team test efforts.

E.1.7.3 The improved and early interaction between the test organization and systems engineers is beneficial even if automating a particular test effort is deemed impractical.

E.1.7.4 Some customers have expressed concern about approving significant departures from traditional methods of software testing. However, in this case the customer accepted the actual TSL scripts in lieu of more traditional in-progress test reporting, which yielded significant savings in labor costs.


E.1.7.5 Maintaining a consistent rule set for TSAR is challenging. We have written many expert and decision support systems, but these were quite small in comparison to TSAR. Tree diagrams and dependency graphs provide an accurate tracing capability, but they are difficult to maintain in a system with hundreds of rules in a complex, interdependent problem domain.

E.1.7.6 Introducing new technology into an existing operation invites resistance. The importance of marketing cannot be overstated. Not only do customers have to acquire a measure of comfort; test engineers themselves must want to learn the complexities of ATTs to fully reap their benefits.
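A minimal form of the test-to-test mapping described in Lesson 6 is sketched below. The rule table, the test names (built on requirement TIF0003 from Appendix F), and the scheduling stub are illustrative assumptions, not the actual TSAR implementation.

/* Illustrative sketch of the Lesson 6 mapping: when a procedure run with its
 * positive test case fails, queue the related negative case for the next
 * off-peak run.  All names and the rule table are placeholders. */
#include <stdio.h>
#include <string.h>

struct followup_rule {
    const char *failed_test;     /* test whose failure triggers the rule    */
    const char *schedule_next;   /* related test to queue for the next run  */
};

static const struct followup_rule rules[] = {
    { "TIF0003-positive", "TIF0003-negative" },
    { "TIF0004-positive", "TIF0004-negative" },
};

static void schedule(const char *test_name)
{
    printf("queued for overnight run: %s\n", test_name);
}

/* Called once per completed test with its pass/fail outcome. */
void review_result(const char *test_name, int passed)
{
    if (passed)
        return;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(test_name, rules[i].failed_test) == 0)
            schedule(rules[i].schedule_next);
}

int main(void)
{
    review_result("TIF0003-positive", 0);   /* failure triggers the negative case */
    review_result("TIF0004-positive", 1);   /* pass: nothing extra scheduled */
    return 0;
}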


Appendix F: The Requirements Rating Procedure

F.1. The criteria used to score a particular requirement as Red, Amber, or Green are discussed in detail in the body of the thesis. This appendix shows how the criteria are applied to a specific requirement for final adjudication. The evaluation for requirement TIF0003, a general activity search ("TIF shall be able to search the timeline for an entity based on activity, description, or location"), is listed in Figure F-1. The spreadsheet scores each window-level object on widget count, complexity, nesting depth, standard object, spiral, and two expert ratings; lower scores indicate a requirement that is easier to automate.

Figure F-1: The Requirements Rating Procedure for TIF0003

F.1.1. To use this procedure, the test engineer inputs the requirement name and narrative at the top of the spreadsheet. Next, they determine the widget density by physically counting the number of widgets each window uses to accomplish the requirement.


The average complexity comes from Table A-1, Recognized X-Runner GUI Object Classes, located in Appendix A. These two values have the largest impact on the ultimate outcome of the evaluation.

F.1.2. The nesting depth is a minor scoring criterion that captures the level of nesting each window-level object contributes to the complexity of the code. More deeply nested objects can lead to pointer chaining and memory leak problems, so this area must be evaluated at some level.

F.1.3. Standard objects are those window-level objects which have been incorporated into the operational baseline.

F.1.4. The Spiral column allows for the influence of less mature code on the difficulty of the test effort. The value is derived from how complete the code is (n / total spirals). This number is then divided into one so that it complies with the goal of lower numbers meaning easier automation.

F.1.5. The final two columns contain the expert opinions of those engineers involved with designing and conducting the actual test effort. The resolution is coarse but provides significant influence on the final outcome. These decisions are derived in a number of ways, with each person using a distinctive style. The outcome is that, as with many evaluative tools, a certain amount of experience and intuition is used to generate expected results. These columns introduce an amount of expertise into the outcome that accounts for many of the factors that have not, or could not, be adequately represented.

F.2. The process by which an evaluation is derived is known as a decision matrix. This technique is commonly used in statistically based decision making. The values in each cell are multiplied across the row to derive a "score". This number may or may not have meaning in an absolute sense, but in the case of the Requirements Rating Procedure the number is relative. The totals for each window-level object are averaged, and the mean is compared against thresholds that determine what the scoring recommendation will be.

F.2.1 The total range of potential values is from -.20 to over 1,000 (depending on the nesting level; a practical observation is that this number rarely exceeds 15).


The thresholds were set by the following function: t = 2(score)^3. The starting point for this function was set at score = 1.5, which was the limit of the comfort zone of the test engineer designing this model. This set the next threshold (for Amber) at approximately 5. Processing this new threshold, the formula generated the threshold for Red at 250. After processing numerous requirements through this model, both TIF as well as other applications with known risks, some minor adjustments were made, and the model has been generally accepted by the test organization.

F.2.2 This function set the thresholds very low for Green. This is desirable, particularly for developmental software using the spiral development model. It was also desirable for TIF testing, as X-Runner was a new tool to the organization that required time for training and experimenting. This approach erred on the side of caution. While TIF was used as an experiment to evaluate ATTs, it was also a training program used for other purposes.

F.2.3 In many ways the model is a summarization of a multitude of complex and sometimes competing criteria that are needed to make a determination on the difficulty of testing a given requirement using ATTs. Some of these criteria have been directly incorporated, while others have been amalgamated into summary columns. The net effect, however, is to provide a direct, meaningful, and defensible rating for each requirement in a relatively short amount of time. These ratings have a direct impact on how each requirement is "viewed" to determine the risk necessary to automate it and enjoy the increased effectiveness which ATTs' potential can provide to the test program.
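The decision-matrix arithmetic in F.2 can be sketched as follows: multiply each window-level object's factor values across the row, average the row totals, and compare the mean against the Amber (5) and Red (250) thresholds. The factor values in the example are placeholders rather than the actual Figure F-1 entries.

/* Illustrative decision-matrix scoring (F.2): row products, mean, and the
 * Green/Amber/Red thresholds derived from t = 2(score)^3.  Factor values
 * are placeholder data. */
#include <stdio.h>

#define N_FACTORS 6   /* widget density, complexity, nesting, std object, spiral, expert */
#define N_WINDOWS 3

static const char *rate(double mean)
{
    if (mean < 5.0)    return "Green";
    if (mean <= 250.0) return "Amber";
    return "Red";
}

int main(void)
{
    /* One row of factor values per window-level object (placeholder data). */
    const double rows[N_WINDOWS][N_FACTORS] = {
        { 1.40, 1.00, 1.0, 0.80, 1.5, 1.0 },
        { 2.00, 1.10, 2.0, 0.80, 1.5, 1.0 },
        { 2.00, 1.20, 2.0, 0.80, 1.5, 1.0 },
    };

    double sum = 0.0;
    for (int w = 0; w < N_WINDOWS; w++) {
        double product = 1.0;
        for (int f = 0; f < N_FACTORS; f++)
            product *= rows[w][f];      /* multiply across the row */
        sum += product;
    }

    double mean = sum / N_WINDOWS;      /* average of the window-level totals */
    printf("average score %.2f -> %s\n", mean, rate(mean));
    return 0;
}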


Appendix G: The Test Methodology Impact on Cost/Schedule (T-MICS) Model

G.1. The T-MICS model is an important tool that was developed to help systems engineers and test management understand the implications of a particular test strategy on the overall effectiveness of the test program. In T-MICS, effectiveness is measured in anticipated impacts to labor costs as well as qualitative improvements of the test. Additionally, these two criteria may be weighted so the effect of cost or quality improvement can be considered at the level the customer desires.

G.1.1. For development testing, the model had to be requirements-driven. This means the overall design of the development test would center on the application's ability to perform according to the specifications contained in its requirements. To properly integrate automated testing in a measured, efficient manner, test designers needed the ability to selectively implement test procedure, verification, and documentation efforts. The intent of T-MICS was to give these designers an interactive tool that would provide immediate feedback showing what the cost and quality impacts of a particular design would be.

G.1.2. After requirements were scored using the Requirements Rating Procedure (red, amber, or green), the test designer could give the model various data about the application being tested. When T-MICS had processed this data, it provided enough information to evaluate the effectiveness of a particular methodology. If the costs were too high, adjustments could be made to the approach for individual requirements, which would adjust the results, which could then be evaluated again. This process would continue until acceptable figures for cost and quality were achieved.

G.1.3 It became obvious from the first few experiences with T-MICS that it needed the ability to adjust its findings based on the maturity level of the software.


When the TIF application was on its earlier spirals, the projected effectiveness was not as close to the actual effectiveness as expected; later spirals performed much better. To make the model more effective during early spirals, a sensitivity feature was added to the spreadsheet. It basically allows for larger margins of error during times when the code is less stable. For example, for a development expected to have n spirals, the total impact on the model results, T, is calculated using the following formula:

T(n) = .25 for n = 1; T(n) = T(n-1) * .5 for n > 1

G.1.4. The amount generated by this equation would be used to add time to the labor amount and subtract improvement from the test quality metric (a small numerical illustration appears at the end of this appendix). As n increased, the amount of the adjustment would diminish, which is expected as code matures through several spirals of adjustments and testing.

G.1.5. The composition of the test team also has a bearing on the effectiveness of the test methodology. Generally speaking, the higher the demands of the test methodology on the automated side, the greater the need for X-Runner specialists. If this demand is not met, the effectiveness of the test will decline. The reverse is also true: a highly manual test with a larger contingent of X-Runner specialists will decrease the test effectiveness. T-MICS evaluates this by examining the ratio of manual testers to X-Runner specialists in conjunction with the ratio of manually verified to automatically verified requirements. If the ratios differ by more than one standard deviation, the effectiveness rating is reduced accordingly. By using this technique, managers may also examine test team composition and its impact on the test schedule; T-MICS will point out the proper balance of the team, based on the existing design of the methodology.

G.1.6 To implement T-MICS, an Excel 5.0 spreadsheet was used. This approach meant the model could provide instant results, based on an interactive session with the test designer. Figure G-1 shows the initial spiral evaluation of TIF, as judged by T-MICS.


Figure G-1: T-MICS Results Evaluating TIF (First Spiral)

G.2. After the initial data is entered in the "CPCI" box, the test designer inserts the requirements, along with their ratings from the Requirements Rating Procedure, by number in the appropriate column of the "Test Methodology" box. T-MICS will place a "1" in the manual column for each phase of the test: test procedure development ("T.P."), verification ("Ver."), and documentation ("Doc."). Next, the test designer places a "1" in the appropriate column under the Automated area for each phase that should be tested using X-Runner. T-MICS will adjust the findings in the "Test Methodology Results" box accordingly.

G.3. The "Manual vs. Automated" box contains the performance impacts of the test approaches, using manual testing as the benchmark.


If the automated test approach has a number less than one, it outperforms manual testing in that area. These numbers may be adjusted to suit a particular test designer's needs.

G.4. Once the test designer is satisfied the new methodology achieves its budget and quality targets, a copy of the spreadsheet is printed and incorporated into the test plan and schedule. From this point, test engineers know exactly what areas will be automated for each requirement tested during this spiral.
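The spiral-sensitivity adjustment of G.1.3 and G.1.4 can be illustrated numerically. The recurrence below follows the formula as reconstructed in G.1.3, and the base labor and quality figures are placeholders rather than T-MICS outputs.

/* Illustrative sketch of the G.1.3-G.1.4 sensitivity adjustment: compute the
 * spiral impact factor T(n) and apply it by inflating projected labor and
 * deflating the projected quality improvement.  Base figures are placeholders. */
#include <stdio.h>

static double impact(int n)             /* T(1) = .25; T(n) = T(n-1) * .5 */
{
    double t = 0.25;
    for (int i = 2; i <= n; i++)
        t *= 0.5;
    return t;
}

int main(void)
{
    const double base_labor_hours = 400.0;   /* placeholder labor projection */
    const double base_quality_gain = 0.30;   /* placeholder (30% improvement) */

    for (int spiral = 1; spiral <= 4; spiral++) {
        double t = impact(spiral);
        printf("spiral %d: T=%.4f  labor=%.1f h  quality gain=%.3f\n",
               spiral, t,
               base_labor_hours * (1.0 + t),    /* add time for immature code */
               base_quality_gain * (1.0 - t));  /* subtract expected improvement */
    }
    return 0;
}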


Appendix H: Glossary

Acceptance Criteria. In software requirements analysis, these are the criteria used by the customer to determine whether a system under development meets its software requirements.

Application Under Test (AUT). A term used to describe the executable which will be evaluated by X-Runner.

Development Testing. The test activities, both formal and informal, which occur while the application is being developed, before it has been released to the customer (compare O&M testing).

O&M Testing. The test activities, both formal and informal, which occur after the application has been released to the customer. This testing includes integrating upgrades in hardware and software as well as routine health checks of the application (compare development testing).

Performance Score. Every automated test procedure produced a score which gave an indication of what occurred during the test. If a particular step of a test procedure performed satisfactorily, the performance score was adjusted. These scores are relative to the test procedure and are only meaningful when a specific test run is compared to an established benchmark. When a new test run's performance score deviates from the benchmark, a new problem has been discovered.

Requirements. Requirements are the sine qua non for any system under development. As the name implies, there is little flexibility in implementing these features. There are three tiers of requirements: A specifications (A-specs), B specifications (B-specs), and C specifications (C-specs). A-specs are the system-level requirements generally used to describe very large systems and/or very large sub-elements; they are synonymous with system requirements specifications. B-specs are subsystem-level requirements specifications which describe sub-elements of systems or system segments.


These sub-elements are called "configuration items" (CIs). A subset of B-specs is known as the B5-level specification, or B5s; this subset is used to describe software configuration items (SCIs). C-specs are subsystem-level design specifications that are generally used by developers for the internal design of the configuration item they are producing. For the purposes of this case study, most of the TIF evaluation focused on the B5-level specification.

Regression Test. An operationally focused evaluation of an application. This test involves the rerunning of test procedures that an application has previously executed correctly in order to detect errors created during software correction or modification activities.

Requirements Information Sheet (RIS). A narrative description of a specific system requirement. From a test perspective, the most important portion of the RIS is the section which captures the intent of a given requirement. By possessing this information, test engineers are able to develop a more meaningful test program in the gray areas of requirements.

Requirements Test. A test which seeks to evaluate a specific requirement and how it is implemented in the application.

Test Approach. In the context of this case study, the test approach could be either a manual, traditional one or an automated, X-Runner-based test.

Test Case. The specific data set which will be instantiated into the generic test procedure (q.v.) to create a unique test procedure. These cases contain both properly designed and error-producing data sets, both of which the application should process properly.

Test Methodology. The specific steps which the test organization will follow to conduct the evaluation.

Test Procedures. The steps performed by the test engineer to evaluate a particular aspect of an application. At HITC, there are two basic types of test procedures: requirements test procedures and regression test procedures.


Requirements test procedures map to a specific system requirement (B5) and detail how a test engineer will perform and evaluate that requirement. Regression test procedures are loosely designed to verify A-spec requirements and are ideally comprised of components from numerous B5-level test procedures. Both of these test procedures are generic steps to perform; when unique test case data (q.v.) is supplied, a new test procedure is created which will examine the same function using the same steps but with different results.

Test Scheduler and Reviewer (TSAR). A prototype system, written in C and TSL, which assists test engineers by determining which automated tests need to be rerun, scheduling those tests overnight or during other times of off-peak demand, and performing initial analysis on the results of those tests. This tool allows test engineers to work in a management-by-exception manner and focus on performance deviations instead of constantly reconfirming that working software still performs correctly.

Test Strategy. Similar to the Test Methodology. The primary difference is in three distinct areas: test procedures, verification, and documentation. The test strategy is the method used to evaluate these three areas, either using X-Runner or traditional methods.

Validation. The process used to ensure an application adequately performs the functionality called for in its requirements.

X-Runner Backbone (XRBB). A library of TSL and UNIX scripts which provides additional reporting and results-tracking information to the test engineer while the test is executing.


Bibliography

Aho, A.V., Hopcroft, J.E., and Ullman, J.D., Data Structures and Algorithms, New York, New York: Wiley & Sons, Inc., 1983.

Arthur, L.J., Rapid Evolutionary Development: Requirements, Prototyping and Software Creation, New York, New York: Wiley & Sons, Inc., 1992.

"Automated Test Tools: The Next Step in Assurance," Client Server Today, February 1995.

Bate, R.R., Mueller, D.D., and White, J.E., Fundamentals of Astrodynamics, New York, New York: Dover Publications, Inc., 1971.

Boehm, B.W., "A Spiral Model of Software Development and Enhancement," in R.H. Thayer (ed.), Tutorial: Software Engineering Project Management, IEEE Computer Society Press, Washington, D.C., 1988.

Boehm, B.W., "Seven Basic Principles of Software Engineering," The Journal of Systems and Software, Vol. 3, No. 1, 1983.

Davis, A., Software Requirements: Analysis and Specification, Englewood Cliffs, New Jersey: Prentice-Hall, 1990.

Howden, W.E., "Life-Cycle Software Validation," IEEE Computer, Vol. 15, No. 2, February 1982.

Orr, K.T., Structured Requirements Definition, Ken Orr and Associates, Topeka, Kansas, 1981.

Pradip, S. (ed.), The Hughes STX Software Engineering Guidebook, Hughes STX, 1994.

Schultz, H.P., Software Management Metrics, ESD-TR-88-001, prepared by the MITRE Corporation for Electronic Systems Division, Hanscom AFB, Massachusetts, 1988.

Schefstroem, D. and van den Broek, G., Tool Integration: Environments and Frameworks, Chichester, United Kingdom: Wiley & Sons, Inc., 1993.

Sodhi, J., Software Engineering: Methods, Management and CASE Tools, Blue Ridge Summit, Pennsylvania: TAB Professional and Reference Books, 1991.

Software Engineering Handbook, Build 3, Division 48, Information Systems Division, Hughes Aircraft Company, March 1992.

Whitehead, S., "Testing DOD Software," Defense Software Engineering Report, Hill AFB, Utah: Software Technology Support Center, October 1995.