Evaluation of staff development in Colorado school districts

Material Information

Matzen, Sandra Husk
Publication Date:
Physical Description:
xv, 231 leaves : illustrations, forms ; 29 cm


Subjects / Keywords:
School employees -- Training of -- Evaluation -- Colorado ( lcsh )
School personnel management -- Evaluation -- Colorado ( lcsh )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )


Includes bibliographical references.
General Note:
Submitted in partial fulfillment of the requirements for the degree, Doctor of Philosophy, School of Education and Human Development.
Statement of Responsibility:
by Sandra Husk Matzen.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
24672935 ( OCLC )
LD1190.E3 1989d .M37 ( lcc )

Full Text
Sandra Husk Matzen
B.S., University of Georgia, 1977
M.A., University of Colorado, 1979
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
School of Education

© 1989 by Sandy Husk Matzen
All rights reserved.

This Thesis for the Doctor of Philosophy Degree by
Sandra Husk Matzen
has been approved for the
School of Education
Brent Wilson
Lance Wright
Date

Matzen, Sandra Husk (Ph.D., Education)
Evaluation of Staff Development in Colorado School Districts
Thesis directed by Professor Bob L. Taylor.
The purpose of this study was to examine the evaluation
systems in three districts that assess their staff development
programs for teachers.
The questions addressed were: 1. What are the existing
models of evaluation for staff development as derived from the
literature? 2. How are Colorado school districts evaluating their
staff development training? 3. To what extent are staff
development activities and the evaluation of staff development
used in the school improvement process and/or program review
cycle? 4. How are the results of staff development efforts being
reported to the public, the Board of Education, and employees?
Do the reports corroborate the data collected by the researcher?
5. In districts that have evaluation systems for measuring the
impact of staff development, what factors are identified as
facilitating or hindering the evaluation process?
Colorado school districts with more than 10,000 students
were included in the screening interviews. Three districts that
were determined to be evaluating the process and impact of staff
development were chosen to participate in the case studies.
Results indicated that school districts were evaluating
the process of staff development more often than the impact of
staff development. However, in the three exemplary districts
many methods were used to measure the impact of staff

development. The emphasis was on measuring staff development
by examining student outcomes. The school improvement
process and/or curriculum review process were evaluation cycles
that included the evaluation of staff development. The results of
the evaluation were rarely reported to employees, Boards of
Education, or parents.
Respondents indicated that time, lack of priority, fear,
and lack of knowledge hindered evaluation. Resources, an
expectation for evaluation, positive people, a safe environment,
and receiving feedback were identified as factors that facilitate
evaluation.
The following major conclusions were offered: There is
great diversity in the evaluation of staff development and staff
development offerings in the state of Colorado. Three districts
were evaluating staff development effectively through their
school improvement process or curriculum review cycle.
Evaluation models in school districts do not approach the rigor of
research. Evaluation systems within the school district are not
recognized as a way to evaluate staff development. When the
method for gathering data in the evaluation does not make sense
to the audience, there will be less buy-in from the teachers.
When there is an expectation that evaluation will occur, it is more
likely that people will evaluate their efforts. School districts are
not reporting evaluation results. There is an emphasis on student
achievement scores.

The form and content of this abstract are approved. I
recommend its publication. Signed
Bob L. Taylor

This dissertation is dedicated to my husband, Charlie.
His everlasting love and patience make everything worthwhile.

I would like to thank the members of my committee:
Mike Martin, Bob Taylor, Lance Wright, and Brent Wilson.
I would also like to thank the numerous teachers,
principals, and other administrators who participated in the
interviews and shared their ideas and encouragement. In
particular the Colorado Staff Development Council supported my
efforts and has always provided invaluable expertise.
A special thanks to my colleagues and friends in the
St. Vrain Valley School District who believed in me, especially
Tonda Potts, Bob Roggow, Jack Hay, and Mary Leiker.

I. INTRODUCTION
    Statement of the Problem
    Assumptions
    Significance of the Study
    Delimitations and Scope
    Definition of Terms
II. REVIEW OF THE LITERATURE
    Effective Staff Development
    Models of Evaluation
    Business Models
    Methods of Research
III. METHODS
    Introduction
    Qualitative
    Triangulation and Data Collection
    Analyzing Data
    Timeline
IV. FINDINGS
    Research Question 1: What Are the Existing Models of Evaluation for Staff Development as Derived from the Literature?
        Difference between Evaluation and Research
        Criticisms
        Process and Impact
        Formative and Summative
        Goal Oriented
        Reporting
        Benefits
        Summary
    Research Question 2: How Are Colorado School Districts Evaluating Their Staff Development Training?
        Description of Screening Interviews
        Criteria for Case Studies
        Screening Interview Findings
            Program Variety
            Process Evaluation
            Methodology Varied
            Communication Problems
            Level of Awareness
        Description of the Three Case Studies
        General Findings
        Case Study Overviews
            Case Study District A
            Case Study District B
            Case Study District C
        Case Study Findings
            Teachers' Acquisition of Knowledge and Skills
            The Transfer of Skills to the Classroom
            Student Outcomes
            School Level Impact
            Linkage of Staff Development with Curriculum and Instruction
            Evaluation as an Integral Part of a Planned Cycle
            University Role in Staff Development Evaluation
            Limitation of Evaluation Forms
            Perspectives Limited Regarding Staff Development Evaluation
            Emphasis on Student Outcomes
            Guilt
            Confusion Over Research vs. Evaluation
            Teachers Lack Awareness of Staff Development Evaluation
            Summary
    Research Question 3: To What Extent Are Staff Development Activities and Their Evaluation Used in the School Improvement Process and/or Program Review Cycle?
        Findings
            Staff Development Evaluated Through Curriculum Review and School Improvement
            Reports to State Differ from Reality
            Teachers Confused When Plans Are Lacking
        Summary
    Research Question 4: How Are the Results of Staff Development Efforts Being Reported to the Public, the Board of Education, and Employees? Do the Reports Corroborate the Data Collected by the Researcher?
        Findings
            Formal and Informal Reporting
            Emphasis on Student Outcomes
            Evaluation Reports Corroborate Interview Findings
        Summary
    Research Question 5: In Districts That Have Evaluation Systems for Measuring the Impact of Staff Development, What Factors Are Identified as Facilitating or Hindering the Evaluation Process?
        Facilitative Factors
            Expectation for Evaluation
            Receiving Feedback
            Safe Environment
            Positive People
        Hindrances
            Time
            Lack of Priority
            Fear
            Lack of Knowledge
        Continuum
V. RECOMMENDATIONS
    Summary of the Study
    Summary of Major Findings
        What Are the Existing Models of Evaluation for Staff Development as Derived from the Literature?
        How Are Colorado School Districts Evaluating Their Staff Development Training?
            Summary of Screening Interviews
            Summary of Case Study Findings and Discussion
        To What Extent Are Staff Development Activities and Their Evaluation Used in the School Improvement Process and/or Program Review Cycle?
            Findings and Discussion
        How Are the Results of Staff Development Efforts Being Reported to the Public, the Board of Education, and Employees? Do the Reports Corroborate the Data Collected by the Researcher?
            Reports Corroborate Interview Findings
        In Districts That Have Evaluation Systems for Measuring the Impact of Staff Development, What Factors Are Identified as Facilitating or Hindering the Evaluation Process?
            Facilitative Factors
            Hindrances
    Conclusions
        Evaluation Models That Work
        Confusion
        Reporting Results
        Implications for School Districts
    Recommendations for Future Study

LIST OF REFERENCES

A. Status of Program Evaluation
B. Hamblin's Cycle of Evaluation
C. Bakken and Bernstein's SET Approach
D. Contingency Framework for Management Training Evaluation
E. Outcome Measures
F. Instruments for Measuring Staff Development Variables
G. Screening Interview
H. Case Study Interview
I. An Illustration for Evaluation Possibilities for Selected Staff Development Outcomes
J. Case Study District A
K. Case Study District B
L. Case Study District C

Training efforts in American schools began with the
initiation of teacher institutes in the early 19th century (Richey,
1957). The purpose of these training institutes was to instruct
individuals on how to teach. This could be viewed as an origin of
what is known today as staff development. Staff development
continued as teacher certification programs and minimum
educational requirements were established in an effort to assure
that teachers were competent.
More recently staff development was used when the
nation experienced teacher shortages (McLaughlin & Marsh,
1978). When there was a shortage of certified teachers,
individuals were allowed to teach without a teaching certificate
and staff development was used as a training program. Staff
development further expanded in scope as changes were needed
to meet the needs of special education and bilingual students
(Lieberman & Miller, 1979). These training programs tried to
help teachers learn to address the needs of all students.
Current literature indicates increased interest in staff
development by school systems. Dillon (1976) summarized

three reasons for this. First, teacher shortages created a need for
teacher training. Second, public dissatisfaction with student
achievement brought attention to education and a cry of reform.
Finally, societal concerns about drugs, values, health, and other
issues helped education set new goals and plan for change to
reach these goals. Staff development programs have become a
way to implement change systematically (Guskey, 1986). In the
Rand study McLaughlin and Marsh (1978) found that,
"Successful change and staff development were essentially
synonymous" (p. 4). The changes that are sought through staff
development efforts are intended to increase student learning
and aptitude to learn (Guskey, 1986; Joyce, Showers, &
Rolheiser-Bennett, 1988).
Today, the literature on staff development reflects a wide
definition, ranging from individual teacher skill training to overall
school improvement or leadership training. Along with a greater
scope of definition, the educational literature has shown an
increasing interest in evaluating the effectiveness of staff
development programs.
Joyce and Showers (1980) have analyzed over 200
research studies and expressed a concern that very few of these
measured the change in teacher behavior as a result of the
staff development training. The studies that were analyzed
emphasized research over evaluation and were looking for the
general effectiveness of a particular training. In industry there
are studies that describe the evaluation systems that are in place

and the best of these evaluation systems (Kirkpatrick, 1976),
whereas in education there is an absence of studies that describe
evaluation systems for staff development. This leads to the
purpose of this study. Before practical staff development
evaluation models can be tested in school systems we need to
understand how districts are presently evaluating their staff
development programs.
Statement of the Problem
The purpose of this study is to examine the evaluation
systems in three districts that assess their staff development
programs for teachers. By examining current practices in these
districts and those from research, a framework may be
developed that will describe staff development evaluation in
general. For these three Colorado districts that appraise staff
development, an effort will be made to identify key factors
which might hinder or facilitate evaluation of staff development
in those districts. If we know what these factors are, they might
be useful to other districts as they try to improve their own
evaluation efforts.
The following questions will be addressed in this study:
1. What are the existing models of evaluation for
staff development as derived from the literature?
2. How are Colorado school districts evaluating their
staff development training based on the following

levels of impact?
a. Teachers' acquisition of knowledge and skills
b. The transfer of skills to the classroom
c. Student outcomes (Including standardized
tests but also measuring change in behavior,
attitude, or academic success)
d. School level impact (For example, a change
in teacher morale, attitude or values).
3. To what extent are staff development activities
and the evaluation of staff development used in the
school improvement process and/or program review
cycle?
4. How are the results of staff development efforts
being reported to the public, the Board
of Education, and employees? Do the reports
corroborate the data collected by the researcher?
5. In districts that have evaluation systems for
measuring the impact of staff development, what
factors are identified as facilitating or
hindering the evaluation process?

There are several ways to evaluate staff development.
One way is goal-oriented, where the evaluator sets the goal
to be measured and gathers data to determine whether or not
the goal was met. Another way to evaluate staff development is
not goal-oriented. In this type of evaluation the evaluator does
not have a goal in mind when gathering data but is observing for
trends and changes (Scriven, 1973). This researcher was aware
that these differences exist in evaluation and decided to focus on
goal-oriented evaluation.
The reason for focusing on goal-oriented evaluation and
a more behavioral approach is the influence of
Guskey's work. Guskey (1984) found that when teachers change
their behavior first and see positive results in their students,
they then have a change in attitude. Therefore, the assumption
by this researcher is that goal-oriented evaluation allows
teachers to see the results of their efforts. When the results of
their efforts are improved student performance, teachers are
more likely to continue using the new behavior.
Significance of the Study
In education there is an abundance of criticism of
evaluation systems for staff development. The analysis by Joyce
and Showers (1980) of evaluations of training methods found

that a very small percentage of studies even attempted to
examine the changes in student performance or the transfer of
the new skill to the work setting. Howey and Vaughn (1983)
argue that staff development practices are fragmented and
rarely assess the impact in terms of teacher behavior or student
learning outcomes. According to Mann (1975), the information
available from staff development evaluations varies greatly in
quality. Furthermore, there is practically no documentation of
school site or delivery site effects. Fullan (1982) reviewed
studies which showed that few districts have systems for
relating evaluation data to program improvement. Lorenz
(1982) believed that staff development programs are not self-
renewing and blamed the lack of comprehensive evaluation
designs. Smylie (1988) reported that the best research reports
have not addressed all of the issues. The experimental designs
have looked at whether teachers could be taught to use a
particular instructional technique, whether one method of
training was more effective than another, and how personality
or attitude affected the outcomes of training. These studies were
usually research efforts designed to prove the effectiveness of a
particular type of training or instructional program.
Whereas business and industry have described systems
they use to evaluate staff training (Kirkpatrick, 1979), no studies
were found by this writer that described evaluation of districts'
staff development efforts. Smylie (1988) agreed that more

research needs to be conducted on staff development efforts.
Criticism of staff development evaluation continues
even as school districts commit funds to staff training. In
addition, districts are relying on the implementation of staff
development as a strategy for change (Fenstermacher & Berliner,
1983). Guskey (1986) reported that nearly all of the calls for
improvement in education describe staff development as critical.
Studies which measured the effectiveness of training
components are lacking. On the other hand, the literature does
provide information on the components of training. Joyce and
Showers (1980) have reviewed many studies which link
individual training components to desired outcomes in teacher
behavior and student outcomes. For example, they have
described what type of staff development instruction will
increase the knowledge of teachers, help teachers demonstrate
new skills, and help teachers transfer those skills to the work
setting. Joyce and Showers reported that coaching increases the
chance that teachers will use new skills in their classrooms.
The evaluation of the process and impact of staff
development provides two different types of data. The first,
evaluation of the process of staff development, provides
feedback as to how well the training is being delivered. If
participants are unhappy with staff development activities it is
not probable that the change will be successfully implemented.
Process evaluation allows adjustments to be made in the staff

development activities. The second, evaluation of the impact or
the outcomes of staff development, indicates whether or not the
goals were met. If the goal is to increase the knowledge of
participants then the evaluation should measure increases in
knowledge, but often there are many goals. The goals of staff
development often include, but are not limited to, acquisition of
knowledge, acquisition of skills, transferring skills to the
classroom, changes in student outcomes, or changes in teachers'
attitudes or beliefs. Measuring the outcomes of staff
development efforts demonstrates whether the training goal was
met. Measuring outcomes or the impact of staff development
provides feedback as to what the next goal of training might be.
In order to raise the awareness level of staff developers
regarding evaluation systems and the need for them, a
description of what currently exists should be written. This
study will begin that description by studying school districts in
Colorado. The description of what exists, compared with
effective models from the literature, will enable educators to
improve accountability and enhance evaluation of staff
development in their district.
Delimitations and Scope
This study will consider those districts with staff
development programs operating out of their own district
administration, as compared to smaller districts that might

receive staff development services from their Board of
Cooperative Educational Services (BOCES). Only school districts of
10,000 students or larger in Colorado will be considered.
Only teacher skill training will be examined. Staff
development programs often have responsibility for
management training, school climate and culture training, action
planning, and more. Teacher skill training is intended to have
direct impact on teacher behavior and thus student outcomes,
therefore it is in this area that this study will focus. Teacher
skill training that is identified through the school improvement
process or curriculum review cycle will be the heart of the case
studies.
Writers discuss the evaluation of staff development in
terms of process and impact. How districts assess the adequacy
of their staff development efforts will be discussed in the
literature review. Much information has been collected on how
districts implement change using staff development. The
evaluation forms that teachers complete usually measure how
they feel about workshops and provide staff developers
feedback on the process of staff development. What we do not
know is how districts measure the impact of staff development.
Therefore, how districts measure the impact of staff
development will be the primary focus for this study.
This study included three districts in Colorado that were
using evaluation strategies identified by a screening interview.
In addition, the three districts were chosen for their
uniqueness, so that the findings would offer information to readers
from many different school districts. Several people were
interviewed in each case study: for example, the person with
responsibility for staff development, principals, school
improvement team members, the person with responsibility for
evaluation, and other staff in central administration such as
curriculum staff and the person with responsibility for school
improvement.
A limiting factor in this study is that the interviews
were based on self-report rather than direct observation. Self-
reporting could reflect the respondent's bias and not accurately
reflect district practices. This factor was somewhat balanced by
interviewing several different people within each district and by
corroborating the interviews with the examination of evaluation
reports. A triangulation approach was used by combining an
interview that narrows the pool of districts, case studies of a few
districts, and a document analysis of evaluation reports.
Another limiting factor may be the differences in focus
and delivery of staff development programs and school
improvement processes across districts. For example, not all
districts implement the same content in their skill training for
teachers, and not all districts use the same methods of training
delivery. By using a case study approach and describing each
district, the training, and the evaluation of their efforts, the
differences between districts were illuminated.

The third limiting factor is that one of the districts used
as a case study is the district where this researcher is employed.
To balance any possible bias, additional interviews were
conducted by including accountability members and the director
of evaluation in the interviews. In addition, a third listener and
reader was used to validate the data collection and the case
study reports.
Definition of Terms
For the purposes of this study and report the following
definitions will be used:
1. Staff development: Any activity that is intended
partly or primarily to prepare paid staff members for
improved performance (Little, Gerritz, Stern,
Guthrie, Kirst, & Marsh, 1987). The activities are
formally planned and are at the district level or
building level. In this study staff development and
training will be used synonymously.
2. Evaluation: Any attempt to obtain feedback on the
effects of training programs and to assess the value of
the training in light of that information (Friedman &
Yarbrough, 1985; Hamblin, 1974).
3. Formative Evaluation: Feedback that is gathered
during the process of staff development so that
adjustments can be made during the training if
needed.
4. Summative Evaluation: Feedback that is gathered at
the conclusion of training to determine the effect of
staff development efforts.
5. Skill Training: Any activity which deliberately
attempts to improve a person's skill, such as the use of
an instructional strategy or behavior management
technique (Hamblin, 1974).
6. Acquisition of knowledge: The ability of individuals
to remember and restate knowledge delivered in the
training.
7. Acquisition of skill: The ability of individuals to
demonstrate the new teaching strategies (Joyce &
Showers, 1988).
8. Transfer to work setting: Appropriate and
consistent use of new strategies to meet educational
objectives (Joyce & Showers, 1988).
9. Student Outcomes: Any changes in student
behavior such as academic achievement, social
behavior, or attitude.
10. Organizational Impact: A change in teachers that
affects the whole staff, such as a change in teacher
morale, attitude, beliefs, or values.

In reviewing effective models of evaluation for staff
development, characteristics of effective staff development
practices will be examined first. The characteristics for effective
staff development practices are drawn from the literature on
adult learning and change. Second, the literature on evaluation
processes and models will be examined. Finally, the literature
describing the methodology proposed for this study will be
reviewed: case study and triangulation.
Effective Staff Development
Staff development has been defined by many authors.
Fenstermacher and Berliner (1983) defined it as, "The provision
of activities designed to advance the knowledge, skills, and
understanding of teachers in ways that lead to changes in their
thinking and classroom behavior" (p. 4). Smylie (1988) wrote
that staff development is the systematic attempt to change
teacher behavior. Schlechty and Whitford (1983) described
three functions of staff development. It functions as a method of

implementing new programs, as a way to ensure compliance
with routines that are preferred by administration, or it
functions as a way to improve teacher performance in the
classroom. The common theme in these definitions of staff
development was that change is the goal.
If change of teacher behavior was the goal of staff
development, then it would be useful to define the training
activities used to achieve this goal. Joyce and Showers (1980)
and Stallings (1982) have studied the effectiveness of staff
development activities. Sparks (1983) summarized their work
and suggested the following steps: diagnosing and prescribing,
giving information and demonstrating skills, discussing
applications of skills, practicing and giving feedback, and
coaching. Coaching was recently identified by Joyce and Showers
as a critical component of staff development if change in teacher
behavior is the goal.
According to Helms (1980), staff development programs
are presented so that teachers can use classroom research findings
to improve their instruction. Staff development is most effective
when it is viewed as a part of an ongoing program and skill
building approach to learning (Lieberman & Miller, 1979). It is
not as effective when it is approached as a deficit model where
the training is intended to "fix" poor teaching. Lieberman and
Miller proposed that the following assumptions build the base
for effective staff development processes:
- Teachers have important clinical expertise

- Learning is adaptive and heuristic
- Learning is long-term and non-linear
- Learning should be connected to school-site program
building efforts
- Learning is influenced by school and district climate
and support.
One characteristic of a profession is the ability to implement
research findings into practice. Fullan (1983) suggested that
staff development is the vehicle for implementing educational
research into practice.
Staff development planners must use knowledge about
the characteristics of adult learners and learning and personality
styles to design effective programs (Lorenz, 1982). Daresh
(1987) and Guskey (1986) both emphasized the need for teacher
training to be practical. Teachers must be able to use the
learning immediately in their classrooms and see the benefits in
student learning. In addition, Daresh and Guskey stated that
change is viewed more positively if the participants' views are
incorporated into the planning, if the training is on-going rather
than isolated, and if peers are planning and delivering staff
development programs.
The findings from a study of teachers' perceptions of
staff development and their preferences for training approaches
agreed with what we know about adult learning. Zigarmi, Betz,
and Jensen (1977) concluded from their study that teachers
should be provided opportunities to observe other classrooms

and discuss topics of concern. The inservices should focus on
teacher strengths and interests and allow choice. More time
should be allowed for planning and implementing the follow-up
activities to training sessions, such as preparing materials, trying
out new practices, receiving feedback from peers and students,
and discussing problems and successes. Bennett (1987)
completed a meta-analysis of teacher training and found that
staff development could be effective with mandatory and
voluntary participation but that voluntary participation was
slightly more effective. He also found that training that was
distributed over time was more effective and more popular
among participants. A variety of training strategies were
effective for teaching teachers new skills, but coaching
in the work setting provided the greatest effects for changing
teachers' attitudes, increasing teachers' knowledge, increasing
teachers' ability to demonstrate new skills, and increasing the
transfer of skills to the work setting.
Teacher commitment to innovations does not automatically
follow from the completion of training. Emrick
(1977) reported that teachers must understand and accept the
degree of change required of them. There are many criticisms in
the literature about staff development evaluations focusing
solely on how teachers feel about the training. The criticisms are
valid whenever no attempt is made to measure the impact of the
training. However, teachers' feelings about training are
important to the process, or formative, evaluation. McLaughlin

and Marsh (1978) reported that the Rand study found that the
amount of assistance provided by resource personnel was not as
important as how useful the teachers perceived that assistance to be.
For example, classroom visits are usually considered to be an
effective way to provide follow-up to training. The Rand study
showed that the visits were actually counterproductive if the
teachers did not feel as if they were being helped during the
visits. According to the Rand study, for staff development to be
effective, participants need to be open to learning; to create
this open attitude, the feelings of participants should be
measured and used as formative evaluation.
Effective staff development practices draw on theories
developed by studying change. Hord, Rutherford, and Huling-
Austin (1987) studied change using the Concerns Based Adoption
Model from the University of Texas in Austin, Texas. They
reported that one of the most common mistakes in the change
process is to presume that once the inservice is completed the
innovation will be put into practice by teachers. The conclusions
from the research team at the University of Texas were that
change is a process and is developmental; therefore, it takes time.
The writers stated that change is accomplished by individuals
and is highly personal; therefore, the individual's needs must
be addressed. Change should be considered in terms of what it
will mean for the individuals involved.
Lieberman and Miller (1979) provided a review of the
Rand study and discussed the importance of teacher

commitment to program change. The Rand study found that
teacher commitment had the most consistently positive
relationship to project goals such as percentage of goals
achieved, change in teachers, change in student performance,
and continuation of program methods and materials.
Furthermore, the study showed that teacher commitment is
influenced by three factors. The first factor, identified as
motivation of district managers, was the commitment and
attitude that the district administrators displayed to teachers.
The teachers did not want to commit themselves to the new
program if the support was not there. They felt their efforts
alone would not make a difference over an extended amount of
time. Collaborative planning between administration and
teachers was the second factor that facilitated teacher
commitment to the project. The final factor, scope of change,
was the degree of effort required by the teachers. The
researchers found that the greater the amount of effort, the
higher the proportion of committed teachers. These findings support
the view that intrinsic rewards are important factors in change.
The feedback gathered from participants during the
staff development activity provides staff developers with
information that is formative. Formative evaluation data can be
gathered formally and informally. Friedman and Yarbrough
(1985) described common methods of evaluation. The most
common method of data collection is a paper and pencil feedback

form that is completed by participants at the end of the
inservice. Feedback forms may also be completed during the
inservice so that activities and goals can be adapted. A steering
committee or planning group, including participants involved in
the staff development program, is another method of gathering
formative data. Less formal is a spontaneous process where
feedback is gathered through discussions between the staff
developer and participants during breaks or during training.
Teacher attitudes and opinions are most often used for formative
evaluation.
Several researchers have found that teachers' attitudes
regarding innovations change after they have seen results with
their students (Bolster, 1983; Crandall, 1982; Crandall, 1983;
Guskey, 1984). Guskey has designed a model for the process of
teacher change. The first step is staff development followed by
changes in teachers' classroom practices. When teachers observe
changes in their students' learning outcomes, they have a change
in their own beliefs and attitudes. Based on this pattern in
Guskey's research, attitude surveys regarding the new
teaching practice, administered after teachers have had
time to practice new skills and observe student outcomes, would
provide additional information about the impact of staff
development on teacher behavior.
The review of the literature and research showed that
staff development can be the vehicle used to implement change
in teacher behavior and thus to improve instructional practices.

Studies of the characteristics of adult learners and of change in
education provided a basis for the principles of staff
development practice. Because participants' attitudes, feelings,
commitment, and other affective measures influence the
implementation of staff development, formative evaluation
provides important information to the planner of staff
development activities.
Models of Evaluation
Stufflebeam (1971) wrote that evaluators should
delineate, obtain, and provide information. Patton's (1982)
definition of evaluation emphasized that evaluation is a
systematic way to collect information about a wide range of
topics for a specific group of people. Evaluation information
might be used in a variety of ways. Evaluations of staff
development in education typically attempt to measure the
attitudes of teachers, quality of the presentation, and other
process oriented measurements that provide information as to
how well the training is being received. The criticisms in the
literature usually focus on the lack of impact evaluation. Impact
evaluation provides data on whether the training achieved the
intended goal, such as teacher behavior change, student
outcomes, or other outcome measurements. Holdzkom and
Kuligowski (1988) defined formative and summative evaluation.
They wrote that formative evaluation looks at staff development

in progress and can provide an opportunity for correction.
Summative evaluation occurs when the staff development is
completed and can provide an assessment of success.
Lieberman and Miller (1979) and Robbins (1986) reported that
we should have both process and impact evaluations. In an
American Association of School Administrators' Report,
Brodinsky (1986) also identified the need for both process and
product evaluations but emphasized that administrators and
boards of education usually want proof that the information in
workshops is actually translated into practice by teachers and
that subsequently students benefit. As Guba (1975) described
evaluation, there may be a reason to use evaluation to improve
the process and there may be a reason to use evaluation to make
a judgment about the value of a program. Guba also believed
that even though we can differentiate between formative and
summative evaluation using Scriven's (1973) definitions, there is
no clear distinction because formative evaluation eventually
provides data for the summative evaluation.
Joyce and Showers (1988) identified reasons that make
the design of staff development evaluations difficult. First, the
system we want to evaluate is large and complicated. The
implementation of change is heavily influenced by the context of
the individual schools and teachers. The goal of change in
teacher behavior and then student outcomes is influenced by a
long chain of events that begins with staff development. There
are other influencing variables along the way. The measurement

of teacher behavior change and student outcomes is technically
difficult. In addition to measuring teacher behavior change or
student outcomes, other levels of impact can be measured such
as the school level impact.
When studying the possibilities of improving evaluation
in staff development, it is important to distinguish between
evaluation and research. The distinction between evaluation and
research was the reason that a second edition of Research
and Evaluation by Isaac and Michael (1981) was printed. The
authors felt that since 1971, when the first edition was
published, evaluation had emerged as a separate entity and that
before that time educators were focusing on research. Even
though evaluation and research are described as overlapping
disciplines, their purposes and methods differ. Where research
has been described as having an orientation towards the
development of theories, evaluation measures the
accomplishment of goals. Stufflebeam (1971) said that
evaluation was used to improve an operation, not to prove it.
Guba (1975) described internal validity as the concern
that the data accurately describes what the researcher intended.
Researchers are usually concerned with internal validity and try
to conduct studies that attempt to accurately measure an
individual factor that correlates with an outcome. Evaluators are
usually more concerned with external validity and whether or
not the results generalize to other populations. When
studying the perceptions of those involved with staff

development evaluation, the differences between evaluation and
research need to be clarified. There might be some confusion
about the definition of evaluation or the definition of research.
For example, is it clear that evaluation is a measurement of
goal attainment? Or is evaluation being confused with research?
Baden (1979) wrote that evaluation of inservice could
address five areas. 1) Was the content of the inservice
informative and useful? 2) Was the presenter effective?
3) Was there any immediate change in participants' behavior?
4) Were there long-term changes in behavior? 5) Did the
participants and students change as a result? Regardless of the
focus for the evaluation, Baden emphasized that the evaluation
design must be practical and not so complex that it is impossible
for school staff to implement it. He felt that evaluation designs
should meet several criteria. The form of the evaluation should
follow the function. In other words, the way data is gathered
and analyzed should be determined by the goals that evaluators
are measuring. The evaluation should be timely so that results
can be used in decision making. It should be cost effective and
within budget constraints for that district. The results of the
evaluation should be usable, meaning that the data should be
reported in ways that the intended audiences can understand.
The need for a practical approach to evaluation is identified by
Baden (1979) and Patton (1982). Baden also felt that research is
attempting to prove a specific theory while evaluation is
measuring the results of goals.

Studies by Brophy and Evertson (1974), Rim and Coller
(1978), and Soar and Soar (1973) agreed that the relationship
between classroom practice and student achievement is not
always linear. While research findings relating classroom
variables and student achievement are now more available
(Helms, 1980), the relationship is difficult to study and
absolute, conclusive evidence is hard to provide. Helms discussed
school districts' accountability systems and the fact that they
do not attempt to identify the classroom processes that increase
student achievement. Helms was critical of districts for not conducting
research. Fullan (1982) wrote that most school districts do not
have evaluation systems that are linked to instructional
improvement practices.
Educators are faced with many decisions when
developing evaluations for their programs. Guba and Lincoln
(1986) presented the complexities of evaluation from a historical
perspective. They identified different generations of evaluation
and described the need for flexible and responsive designs. They
said that we have used evaluations that provide descriptions and
judgments concerning the objectives in question. The next step
in evaluation models is to focus on the concerns and issues that
are identified by a variety of audiences who are involved in the
program and/or evaluation of it. The major theme in this type
of evaluation is viewing the outcomes from as many different
perspectives as there are audiences. The values and beliefs of
each audience are a crucial component of this type of
evaluation, while negotiation and change are skills needed by the
evaluator. Who is to serve as the evaluator is one of the first
questions in planning.
The variety of audiences in evaluation design was
discussed by Joyce (1988). He stated that even though their
purposes might overlap, each orientation should be judged using
different criteria. This is because each orientation (or audience)
interfaces differently with the students, offers different services,
and has different constraints. The ability to identify the
perspectives of all audiences and design an evaluation system
that provides useful information is complex. Helms (1980) said
that the evaluation design is intended to assist leaders from each
audience to secure the answers to their evaluation questions.
Christine Johnson's review of the literature (1986) found that
experts noted many appropriate criteria for the evaluation of
training and development programs:
- the effects of training on individual job performance
- work output (productivity)
- work quality
- the effects of training on organizational results
- cost savings
- turnover rates
- absenteeism
- appropriateness of training technique to meet training
objectives
- participant satisfaction

- determination of trainee skills or knowledge
which reflect the accomplishment of
instructional objectives
- measurement of the influence of individual differences
- measurement of the influence of the organizational
environment
To make the decision about which criteria should be included in
an evaluation design, the questions that the evaluator intends to
answer need to be defined.
The evaluation system can measure the content (what is
to be learned) or process (how it is to be conveyed). The
evaluation can be formative (conducted during implementation)
or summative (assessing final results) (Friedman & Yarbrough,
1985). The data collection can be informal or highly elaborate
(Fullan, 1982). When making decisions in evaluation design and
implementation, there is a distinction between being as thorough
as possible and being as practical as possible. It would not be wise to
conduct an evaluation that was very thorough but ignored the
attitudes of those involved. Friedman and Yarbrough (1985)
believed the people being studied must always be considered.
Duke and Corno (1981) summarized by stating that staff
development evaluation is a decision making process. Decisions
must be made concerning technical aspects which are the design,
data collection, methods of analysis, and presentation of results.
In addition there are political decisions to be made which are

what is the purpose, what are the specific outcomes to be
evaluated, who implements the evaluation, who will have access
to the results, and what are available resources.
Many sources provide suggested guidelines for planning
evaluations. The guidelines should be considered when making
decisions about evaluation design and implementation. In
general each evaluation has to be designed to understand the
facts in each particular case (Tracey, 1968). For example, if the
question is whether management training was worthwhile,
the following questions should be addressed:
- Has learning occurred?
- Has behavior changed?
- Has productivity improved? (Peterson, 1979).
In order to measure long term success, the evaluation should be
conducted at least three months after the training was
completed (Meigniez, 1962).
The final essay in Lieberman and Miller (1979) by Gary
Griffin proposed that evaluations should use the following
guidelines. They should be on-going, informed by multiple data
sources, dependent on qualitative and quantitative data, explicit
and public, considerate of participants' time and energy, focused
on all levels of the organization, and presented in
understandable forms. Bruello, Orbaugh, Kladder, and Benneth
(1981) reinforced Griffin's points and added that the concerns
and issues of the stakeholding audiences should be the basis for
the evaluation and that these concerns and issues must be

prioritized. The evaluator should report back to the relevant
audiences on a continuous basis.
In addition to research studies looking at the
effectiveness of particular training programs, there are studies
that describe how training is evaluated in business and industry.
A 1970 review of evaluation studies of training in industry
(Clement, 1981) showed that industry has experienced problems
similar to education's. Less than a third of the studies reviewed
had measured the effects of training on job performance or on
results for the organization such as profit. Most studies
measured participant reaction to training or acquisition of
knowledge. Very few studies measured the effectiveness of
various training techniques and no studies measured the
influence of individual differences of participants on the
outcomes of training. Few studies looked at the effects that the
organizational environment has on transfer of skills to the work
setting. Clement felt in 1981 that the evaluation practices were
unlikely to change until top management demanded a change.
In the late 1960s, Catalanello and Kirkpatrick (1968)
surveyed companies to find out at what level training was being
evaluated. The researchers had a low response rate so the data
was incomplete. However, indications were that very few
companies measured behavior changes in the work setting after
training. Even fewer companies attempted to measure job
performance before and after the training. The majority of
companies were evaluating training efforts with reaction to

training data. While a relatively high number of companies
attempted to measure the results of their human relations
training programs, the follow-up interviews revealed that very
few of these companies used systematic objective
measurements. A more recent study (Digman, 1980) showed
that little has changed in evaluation of training in industry. The
response rate to Digman's survey was extremely low but the
responses indicated that most of the evaluations were done by
higher level management who evaluated on the job behavior.
The companies used questionnaires, interviews with
participants, and interviews with supervisors to gather
information about the effects of training. A major Xerox
Learning Systems study (Phillips, 1983) reinforced the lack of
data on job performance as a measure of training success (see
Appendix A).
Business and education face similar questions about
evaluation. Tarnapol (1957) discussed the use of random
sampling for a representative view when training groups are
large and suggested that five employees are a good minimum for
measuring the behavior of their supervisor. Ethical issues were
raised when a study by Blocker (1955) measured the effects of a
democratic leadership training course. After the course the
participants were interviewed and the interviews were classified
as authoritarian or democratic. The supervisors were unaware
that they were participating in the study. The caution raised
for studies like this was that the data should be confidential,
reported in aggregate form, and that those involved should be
informed after the evaluation as to how and why the data were
collected and how the results would be used.
Industry is also influenced by the fact that training does
not always change a work setting in a direct, linear way.
For example, General Electric conducted an evaluation of their
safety training program. The purpose was to decrease the
number of accidents. They found that the safety training did not
adequately decrease the number of accidents but that a new
approach to training which focused on the job relationship
between the foreman and worker did (Kirkpatrick, 1976).
Joyce and Showers (1980) found very few studies that
measured the impact of staff development on
teacher behavior or student outcomes. Staff development efforts
that are attempting to measure the level of impact beyond the
training setting are often projects connected with university
studies or federal programs. It is not clear whether this is due
to the resources available or the interests of these two groups.
For example, Helms' (1980) study on the effectiveness of a basic
skills instructional program was sponsored by the National
Institute of Education and incorporated a design for determining
the effectiveness of the approach. A study entitled A
Multifaceted Study of Change in Seven Inner-City Schools was an
attempt at a multilevel analysis of large scale school
improvement. The study extended over three years, was
conducted by researchers from the University of Oregon and

Stanford University, and put an extensive amount of time and
effort into the evaluation (Gersten, Carnine, Zoref, & Cronin,
1986). Another example comes from a document entitled,
Impact Studies 1980 (Morris, Pine, d'Ambrosia, Spaulding, Brent,
Zimmerman, Price, Ho, Richey, Britton, Arends, Lovell, Childress,
& Acheson, 1980). It is a collection of articles describing
research on the impact of staff development. All of the studies
measured student outcomes or teacher behavior in the work
setting. The studies were disseminated by the National Institute
for Education and each study was conducted by a university in
conjunction with a school district.
One way to measure the impact of staff development is
to examine teachers' behavior in the work setting after attending
an inservice. In addition to this level of impact measure, some
studies have measured other changes in the teachers. Guskey
(1984) found that teachers who became more effective in their
teaching tended to accept increased responsibility for their
students' learning outcomes. They also tended to be more
positive in their attitudes toward teaching, but expressed less
confidence in their teaching abilities. Another study showed
that teachers often report that the content presented in staff
development activities was already familiar to them (Gail, 1982).
Even though the teachers indicated that the content was already
familiar to them, they rated highly their satisfaction with staff
development and the effects on students.

In California a descriptive study was conducted of staff
development efforts (Little, Gerritz, Stern, Guthrie, Kirst, &
Marsh, 1987). It was not intended to study the impact of staff
development initiatives in the classroom. It was initiated by
the legislature and governor in response to the increase in
staff development efforts, and the study provided a
description of costs. The results of the study were to be used to
determine the possibilities of staff development as a vehicle for
improving the quality of classroom instruction and student
learning. The California study compared local efforts in staff
development to available research and recognized the limitations
of the research. For example, the authors wrote that in areas
such as skill training, they were able to use reasonably
informative research but there was no body of research that
directly connects staff development with student outcomes. The
findings were:
- Approximately 2% of California's education funding
goes to staff development,
- The largest investment in staff development is the
salary advancement made by teachers as a result of
accruing university or salary credits,
- California educators are committed to improving their
knowledge and practice,
- Local school districts' ability to offer staff
development is growing,

- While staff development activities have sound
reasoning for positively influencing student
outcomes, the way staff development is implemented is
unlikely to change the performance of California's
students,
- California's staff development reinforces existing
patterns of teaching and traditional concepts of
schooling,
- The state provides funding for staff development
annually but lacks a comprehensive policy toward staff
development,
- California's staff development activities are usually
evaluated at the process level.
The final statement was clarified. California districts were
usually evaluating the process of their staff development, or
formative evaluations. Staff development was rarely evaluated
in its relationship to school improvement goals or program goals,
although there were some exceptions cited. Participation rates
and evaluations of the process of staff development tended to
dominate the information available. The authors concluded that
the impact of the most innovative, costly, and
promising programs was unknown. Another conclusion of this
study was that the resources for evaluation were insufficient
and that staff development evaluation was not tied to personnel
evaluation, although it should be.

There is no way of knowing yet if California's study is
representative of the schools across the nation. This study does
not describe evaluation systems that are assessing the impact of
staff development but does describe how the majority of
districts in California function. The fact that a major study
concerning the present situation for staff development exists in
California is indicative of the current interest in this topic.
Joyce and Showers (1980) have written that we should
be measuring the level of impact that staff development has on
education beyond the training session. They suggested we
measure the level of teacher acquisition of knowledge, teacher
modeling of new skills, teacher transfer of skills to the work
setting, and student outcomes. The literature from training in
industry and in education presents many models for measuring
the impact of staff development, but the models are similar.
When discussing change efforts, Fullan (1982) identified
five key areas for measuring short-term and long-term outcomes:
- Degree of implementation
- Attitude toward innovation
- Impact
students' benefits
teachers' benefits
organizational benefits
- Continuation or institutionalization
- Attitude toward school improvement.

The different kinds of outcomes that can be measured are
identified in order from intermediate to more long-term effects.
Friedman and Yarbrough (1985) stated that the purpose
of the evaluation determines the type of data gathered. Reaction
data, learning data, and job behavior data apply to the
participants in the training. If impact on the organization is of
interest, organizational data such as cost effectiveness would be
gathered. The authors also indicated that there might be other
subtle ways that the training affects participants, and in this
case additional outcomes could be measured. For example, in
addition to knowledge and skill outcomes from participants,
concept attainment was described in this model. It was
suggested that concept attainment can be measured using short
essay items.
Business Models
In 1979, Kirkpatrick plainly outlined four evaluation
steps. Step one measured the participants' reactions; in other
words, how well did they like the training? Recognizing that a
favorable reaction to training does not assure learning, step
two was to measure the amount of learning that took place. Step
three was more difficult than the first two to measure. It was
the measurement of change in job behavior as a result of
training. Kirkpatrick identified nine studies as some of the best
research available that measures job behavior as an outcome.

He stated that the fourth step was the most difficult level to
prove. The fourth step was to determine the results, for
example to determine if there was a cost savings to the
organization.
Very similar to the above models, Watson (1979)
identified five areas of measurement for training programs.
Participants' reactions, learning, job behavior, organizational
impact, and additional outcomes are the five areas. Additional
outcomes were described as results or by-products of the
training that were not assessed by the other four areas, for
example social value. Likewise, Hamblin's (1974) model of
evaluation included effects in the areas of reaction, learning, job
behavior, organization, and ultimate value (see Appendix B).
Bakken and Bernstein (1982) modified Hamblin's five effects to
four general goals of training: Personal growth, knowledge
acquisition, skill acquisition or performance improvement, and
organizational development (see Appendix C). According to
Bakken and Bernstein, the importance in the distinction of the
goals or categories is in determining which types of evidence will
show that the objectives have been met.
An added dimension was brought into the models by
Clement and Aranda (1982). They cited the dimensions of
training results, relative effectiveness of technique, impact of
individual differences, and impact of environment. They added
the variables of manager, subordinate, and organization to each
dimension (see Appendix D). A multi-dimensional model was

presented by Glaser (1979) as well. In this model the variable is
not manager, subordinate, and organization, but is the type of
organization. A number of criteria were suggested as outcome
measures depending on the purposes for that particular
organization (see Appendix E).
The evaluation design, including the objectives and
goals, will determine the level of impact that is being measured.
Evaluations will provide more impressive results when the
design attempts to measure the furthest possible point where
change can occur as a result of staff development efforts.
Several outcomes could indicate success for business:
Direct cost reductions, work quality, quantitative results, profits,
sales volumes, reduced customer complaints, worker efficiency,
new product development, and new customers. The following
formula has been proposed for business: The organization's
earnings before taxes, less the actual cost of the training
program, divided by the compensation and benefit costs for the
portion of the work force that receives training
(Friedman & Yarbrough, 1985).
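The return formula above can be sketched numerically. The dollar figures and the function name below are hypothetical, invented purely to illustrate the arithmetic described by Friedman and Yarbrough:

```python
# Hypothetical sketch of the training-return formula described by
# Friedman and Yarbrough (1985): earnings before taxes, less the cost
# of the training program, divided by the compensation and benefit
# costs of the trained portion of the work force.
# All dollar figures are invented for illustration only.

def training_return(earnings_before_taxes, training_cost,
                    trained_compensation_and_benefits):
    """Ratio described in the text; a higher value suggests a better return."""
    return (earnings_before_taxes - training_cost) / trained_compensation_and_benefits

ratio = training_return(
    earnings_before_taxes=2_000_000,          # hypothetical annual earnings
    training_cost=50_000,                     # hypothetical program cost
    trained_compensation_and_benefits=600_000,  # pay and benefits of trained staff
)
print(ratio)  # 3.25
```

As the surrounding discussion notes, such a quantitative indicator would still need to be weighed against the qualitative indicators of success.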
Friedman and Yarbrough (1985) also suggested there
are many qualitative indicators of success. Learners can gain in
self-esteem, pride, and enjoyment of work. Trainers benefit
from instructing adults through their own growth in
knowledge and experience. Organizations can benefit in ways
such as recruitment of personnel, career counseling that trainers
provide, and an improvement in loyalty from personnel. The

qualitative rewards from staff development might be
worthwhile, but may not justify the expenses if changes are not
seen in employee performance.
Research provides data on what changes should be
implemented. Staff development evaluation provides data on
whether or not the changes were made. One of the ways to
measure whether teachers implemented new skills is to examine
the changes in job behavior. These changes might be assessed
by gathering input from supervisors or peers as coaches, through
self-reports, or student reports (Friedman & Yarbrough, 1985).
The best method might be to use a triangulation of data that
combines interviews, observations, questionnaires/inventories,
document analysis, as well as evidences of student learning.
Some evaluation instruments exist that could be of use to school
districts (Joyce & Showers, 1988) (see Appendix F).
The greatest concern in the literature has been how
to measure the effectiveness of staff development and whether
anyone was doing it. A growing concern is how evaluation results
are being reported. Friedman and Yarbrough (1985) suggested that
evaluation report writers ask themselves, "What do I want this
report to accomplish?" Besides including basic information
necessary to understand the report and staff development
project, the evaluator should include information that will help
the readers make decisions (Bakken & Berstein, 1982). Phillips
(1983) wrote that the skills necessary for communicating the
results of training are almost as sophisticated as the skills

necessary for getting the results. He suggested that, to be
effective, communications must be:
- timely
- targeted to a specific audience
- presented with appropriate media
- unbiased and modest
- consistent.
Phillips suggested that testimonials are more effective if they
are from individuals that the audience respects. In addition, he
wrote that the audience's attitude toward the human resource
department (or staff development department) will affect the
communication strategy.
Methods of Research
Isaac and Michael (1981) recommended that the design
of the study reflect the purpose of the investigation and the
nature of the problem. They also wrote that a case study
approach is useful when background information is needed in a
social science area. The background information usually leads to
further research or testing of hypotheses. The case study
approach examines a small number of units across a large
number of variables and conditions compared to a survey
approach which examines a large number of units across a small
number of variables. The primary objective of qualitative field

research, according to Corwin (1983), is to understand the social
unit. Its success is determined by how accurate the description is.
Goetz and LeCompte (1984) discussed the benefits of
case study but emphasized choosing the data collection technique
based on the theory or orientation of the researcher. According
to them, the primary criteria for selecting a case study, or any
other method of data collection, should be based on the match
between the research questions and design and the practicality
of the study for the researcher.
There are reasons why a case study might be the best
method of research for certain research questions. McCall
(1969) described a surprise observation. The surprise
observation is when the researcher is able to identify a unique
finding. This finding might only be identified through a case
study or other qualitative approach, whereas in a quantitative
study the researcher might not have been measuring for this
particular finding (Corwin, 1983). Bogdan and Biklen (1982) also
discussed the advantages of the case study or qualitative
approach. They cited the degree of detail that is gathered that
might allow the researcher to find a clue that will help us
understand a larger issue. By identifying the existence of a new
factor, the next researcher can study its explanation or cause.
Lofland (1971) felt that in qualitative research the unit
of study should be described "in their own terms." The results of

the study are presented from the perspective of the participants
and setting (Patton, 1969). The researcher must be close to the
setting and participants to gather this perspective. Case studies
can be a practical method of investigation for studies of social
settings (Webb et al., 1966). Each case study presents an
individualized description.
In 1970, Wolcott complained about the body of research
for school administrators because he felt it was too prescriptive
rather than descriptive. He wrote that the description of school
administration as it occurs in the field provides much
information about how school administrators really operate and
felt that it was less useful to discuss how school administrators
ought to operate ideally. Wolcott chose a case study approach to
describe reality. Staff development evaluation is similar because
there are models of evaluation prescribed for staff developers,
but not enough description of what actually exists.
Because case studies examine a few units, some
limitations exist. According to Isaac and Michael (1981) case
studies were limited because the results do not necessarily
represent a larger population. They believed you should not
make broad generalizations to the larger population until further
research has been completed. Case studies also were vulnerable
to subjective biases. In order to complete a case study the
researcher will need to become closer to the unit of study than
in other methods. The increased amount of contact can bias the
perception of the researcher. In addition, cases may be selected

because they are unique in some manner, which again limits the
possibility of generalizing findings.
When conducting a case study, Isaac and Michael (1981)
recommended the following steps. First, the researcher should
state the objectives and then design the approach for the study.
The unit of study, sources of data, and methods will be
determined by the objective. After the data is collected, it
should be organized. The final step is reporting the results.
Isaac and Michael (1981) defined triangulation as a
"multiple measure of a given concept or attribute, each sharing a
portion of the theoretically relevant components but each having
different loadings of irrelevant factors or noise" (p. 92). They
described it as superior to a single-method approach. Miles and
Huberman (1984) stated that triangulation supports a finding
through independent measures. Triangulation can be a
combination of any three methods for studying a situation.
Webb et al. (1966) concurred that triangulation is the strongest
way to confirm measures. Increased confidence can be placed in
an observation that has used triangulation because you have
minimized the chance for error. Patton (1980) cautioned the
researcher that triangulation is not magic and that the use of
multiple methods does not guarantee that the results of the
study will automatically come together to form one clear picture.
If the social scientist was able to prove the validity of
research methods, triangulation would not be as valuable (Vidich
& Shapiro, 1969). For example, Corwin (1983) described several

types of qualitative field techniques that could be used in
triangulation. Documentation which is written evidence of an
occurrence, like a calendar or roster, and reports that describe an
event can both be used in triangulation. The documents can also
be analyzed for content where specific events are counted within
the text, like the number of references to evaluation in memos.
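A minimal sketch of this kind of content count (using invented memo excerpts, not data from this study) might look like the following:

```python
import re

def count_references(documents, term):
    """Count occurrences of a term across a set of documents,
    e.g. the number of references to evaluation in memos."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sum(len(pattern.findall(doc)) for doc in documents)

# Invented memo excerpts for illustration only
memos = [
    "The evaluation of the spring workshop is attached.",
    "Please schedule the program review; no evaluation data yet.",
    "Reminder: turn in workshop rosters by Friday.",
]
print(count_references(memos, "evaluation"))  # 2
```

In practice, a content analysis would also define counting rules in advance (what counts as a reference, how synonyms are treated) so the tally is replicable across coders.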
Goetz and LeCompte (1984) listed three other ways to
triangulate. They mentioned collecting data over time,
interviewing a variety of people, and observing. In a study,
Smith and Kleine (1986) used triangulation and combined
ethnography, biography, and history in their data collection. In
the Napa-Vacaville study, Robbins (1986) wrote about the use of
field notes, informal observations, accounts from teachers,
students and principals, and writing samples from participants
as methods for triangulation. Zelditch (1982) listed three types
of data collection: incidents and histories, distributions and
frequencies, and rules and statuses. When collecting data about
rules and statuses, the researcher is focusing on the events as
they are seen through the eyes of the interviewee. This type of
research reports on the perceptions of participants in the study
and is able to provide meaning in this way.
One of the benefits to using triangulation is that it
improves the generalizability of the study (Miles & Huberman,
1984). It is a way to support the findings when the findings of
several methods do not contradict one another. Taylor and
Bogdan (1984) also felt that the combination of methods helps

guard against researcher bias. It provides a way to cross check
accounts from different participants. Miles and Huberman
(1984) discussed the fact that some data are better evidence and
might be used to sway a conclusion. The use of triangulation can
clear up any misconceptions. Webb et al. (1966) encouraged
researchers to consider using triangulation to help eliminate
rival explanations for findings.
Mathison (1988) recommended triangulation because she
believed it is a strategy that can help social science
researchers even when the findings do not produce the same
results. Mathison described three possible outcomes: the
findings might converge on a single solution, they might be
inconsistent, or they might be contradictory. All three
outcomes provide explanations about social phenomena.

Corwin (1983) wrote about an argument that exists
among researchers regarding the use of quantitative or
qualitative methods of study. Presently there are many
arguments for and against using qualitative measures and strong
recommendations to choose the method that best meets the
needs identified in each study. Corwin described qualitative
studies as an inductive approach to understanding a situation.
Stake (1978) stated that the purpose of qualitative studies is to
add to understanding and experience. Corwin reached the
conclusion that while each research strategy has merit,
qualitative methods are best used in open-ended situations or
where there is a need for adaptability.
The purpose of this study was to examine the
evaluation systems in those Colorado school districts that are
assessing the impact of their staff development programs for
teachers. It is a qualitative study of what currently exists. The
questions that were answered in this study are as follows:

1. What are the existing models of evaluation for
staff development as derived from the literature?
The review of the literature in chapter two describes
models of evaluation in both education and industry.
2. How are Colorado school districts evaluating their
staff development training based on the following
levels of impact?
a. Teachers' acquisition of knowledge and skill
b. The transfer of skill to the classroom
c. Student outcomes (Including standardized
tests but also measuring change in behavior,
attitude or academic success)
d. School level impact (For example, a change
in teacher morale, attitude, or value).
The screening interview (see Appendix G) provided the answers
to this question. The answers were the interviewees'
perceptions of their school districts' evaluation process. Districts
were screened to a smaller sample of three. The case studies
addressed these questions in more depth. In the screening
interview the person with responsibility for staff development
in the central administration was interviewed. In the case
studies other people within the school system were asked to
respond to the same questions that were used in the screening
interview. The people interviewed were the person with

responsibility for staff development, principals, school
improvement team members including parents, other staff in
central administration such as curriculum staff and the person
with responsibility for school improvement.
3. To what extent are staff development activities and
their evaluation used in the school improvement
process and/or program review cycle?
In response to open-ended questions (see Appendix H)
participants discussed their perceptions of the school
improvement process or program review cycle and staff
development evaluation.
4. How are the results of staff development efforts
being reported to the public, the Board of Education, and
employees? Do the reports corroborate the data
collected by the researcher?
Where possible, school district documents connected to staff
development evaluation were analyzed to see how the results of
the interviews might or might not be corroborated in written
form. These documents also provided information about the
reporting of staff development evaluation.
5. In districts that have evaluation systems for
measuring the impact of staff development, what

factors are identified as facilitating or hindering the
evaluation process?
An open-ended question in the case study allowed participants
to share their perceptions regarding the factors that contribute
or inhibit the staff development evaluation process.
Dean, Eichhorn, & Dean (1969) referred to qualitative
research as naturalistic because there is no attempt to
manipulate the setting. Qualitative researchers examine what
currently exists in order to better understand the factors
involved. Some characteristics of qualitative research are
described by Bogdan and Biklen (1982): the data are soft and
not usually handled by statistical procedures, there are no
specific hypotheses to test, and the study focuses on
understanding the problem from the subject's own frame of
reference. Strengths of qualitative research include a strong
internal validity, a structure that allows for flexibility, and an
inductive approach that allows for patterns to emerge (Goetz &
LeCompte, 1984; Miles & Huberman, 1984; Patton, 1980). Patton
claimed that the qualitative methods provide a holistic approach
that is necessary to understand the perceptions of participants.
Because the qualitative methods have strengths and
weaknesses, it is necessary to discuss limitations. Miles and
Huberman (1984) mentioned that researchers may put too much

emphasis on particular facts. Likewise, participants in this study
bring their own biases to interviews. Homans (1967) listed ways
researchers can limit subjectivity in a qualitative study. The
more time that researchers spend with a group, the more
accurate the picture. As both researcher and staff developer,
this researcher currently interacts with several of the people
who are likely to participate in the
interviews. Homans stated that the closer the researcher is
geographically to the subjects and the more social circumstances
the researcher observes, the greater chance of accuracy. By
limiting the pool of districts to those with student populations of
more than 10,000, the districts along the front range of Colorado
are a large portion of that pool. Staff developers who attend
meetings and conferences together typically are from districts in
this area. In addition, conducting interviews in person rather
than on the telephone narrowed the geographic distance. It is
also useful if the researcher and subjects share a common
language. Having worked in staff development improved this
researcher's understanding of the language and descriptions
shared in interviews. Finally, Homans believed the researcher
should try to confirm the interpretations that he/she makes
from data. Therefore, the final question of the interview allowed
the subject to react to the researcher's summary of the interview
and findings up to that point.

One of the strongest criticisms of qualitative research is
that the methodology is too subjective (Patton, 1980). This is a
difficult limitation to overcome but a suggestion was made by
Weick (1985) that distance should be maintained from the
subjects. This limitation will be addressed by acknowledging its
existence and providing substantiation of any conclusions that
are drawn. In addition, the use of triangulation should help
minimize this limitation.
There is also a limitation in the ability to generalize.
Goetz and LeCompte (1984) acknowledged that case studies do
not generalize well to all other populations. A clear description
of the districts and staff development departments that were
used in the case studies should help readers decide whether the
findings generalize to their situation.
Triangulation and Data Collection
Denzin (1978) wrote that the researcher can use data
triangulation (combining time, space, or people), investigator
triangulation, theory triangulation, or methodological
triangulation. In this study, data triangulation was used by
comparing responses to screening interviews, the interviews of
other staff, and reports or other documents that describe the
evaluation process or results.
When discussing data collection, Lofland (1981) stated
that the face to face interview has several advantages when

conducting research. The first part of the study was to interview
the person responsible for staff development in school districts
larger than 10,000 students. The first interview was conducted
in person as often as possible. The Colorado Staff Development
Council (CSDC) provided a vehicle for meeting with staff
developers. The first interview was used to narrow the districts
down to three.
After the districts were identified and agreed to
participate in the case study, the second portion of the research
began. In the second portion, another interview was conducted
with the staff development administrator gathering more details
as to who might have the most knowledge about their evaluation
systems. Several authors (Bogdan, 1984; Bogdan & Biklen, 1982;
Goetz & LeCompte, 1984; Patton, 1980) described the purpose of
the interview as accessing the perspective of the person being
interviewed. According to Bogdan and Biklen (1982) the
interview also will provide general information about the
setting. Because there is much advice available regarding
interviewing, Goetz and LeCompte (1984) recommended that the
interview be designed to meet the needs of each researcher.
McCall (1969) also advocated the use of an unstructured
interview allowing issues to arise during the course of the
interaction that the researcher might not have thought of.
Other guidelines for interviewing are discussed by Goetz
and LeCompte (1984). They encouraged the researcher to

consider duration, number and setting of the interview since all
have an influence on the findings. They also encouraged the
researcher to consider who key informants might be in the case
study. Key informants are people who are willing to be
interviewed and who possibly have access to information that
can provide insight into the study. Even though those close to
the evaluation of staff development programs may have some
bias, they are also likely to have valuable insights and
perspectives. Goetz and LeCompte (1984) believed that the key
informant might alert the researcher to value dilemmas that
might otherwise be missed.
In this study as more details were gathered regarding
the staff development evaluation process, other individuals who
should be included in the interviews for the case study were
identified. For example, the staff development administrator
identified schools that should be considered for the study and
then the principal suggested school improvement team
members, including classified staff and parents that could be
interviewed. The central level administrators with responsibility
for school improvement, staff developers, and curriculum people
were interviewed. The organizational structure of the districts
in the case study helped to determine who else was included.
Bogdan (1972) and Taylor and Bogdan (1984) wrote about
comparing interviews as an appropriate way to add to
triangulation. Miles and Huberman (1984) recommended that

the researcher consider settings, actors, events, and processes
when choosing subjects.
Denzin (1978) described types of interviews that can be
used. Interviews might follow a consistent pattern of
questioning where even the probes are thought through in
advance and are asked consistently. The nonscheduled
standardized interview is similar to the above but allows the
researcher to alter the order of the questions depending on the
responses given. The nonscheduled standardized interview was
used in this study. The questions and probes were designed
prior to interviewing, but flexibility was allowed. This style of
interviewing allowed the researcher to be more natural yet still
provided for an efficient system of coding responses.
When inviting districts to participate in this study, the
openness of the district was considered. McCall and Simmons
(1969) thought that there are some interviewees that are more
helpful than others. They wrote that participants are usually
more helpful when they are sensitive to the area of concern or
when they are more willing to reveal information. In this
researcher's experience, staff developers in the Colorado Staff
Development Council have developed a philosophy of sharing
resources and support across districts. There have been many
statewide efforts in training that are evidence of this. Therefore,
willingness to participate in this study was not anticipated to be
a major problem and it was not.

However, the need to interview other participants in the
chosen school districts needed to be considered. Because the
other participants were involved in school improvement and
curriculum review, an interest was created: in the
summary information provided by the researcher, they learned
about evaluation of staff development, school improvement, and
curriculum review in other districts. When the districts were
invited to participate in the case study, clearance was first
received from the district committees that review dissertation
requests. The staff developer then contacted the principals by
phone or note encouraging their school to participate in the
study. Obtaining permission to interview building staff was not
a problem in most schools. In one building the principal
requested that even more staff and parents be interviewed so
that he could gain feedback about the perceptions of school
improvement and evaluation of staff development. Separate
reports were prepared for all schools that requested them, and
summaries of the case studies were given to the staff developer.
By providing summaries and reports to the participants, a check
was provided for the researcher regarding accuracy of data and
reactions were recorded.
Dean et al. (1969) and Bogdan (1972) discussed using
interview questions that had a structure but allowed for
open-ended responses. Later in the case study, the researcher might
ask more specific questions to compare data from different
sources (Bogdan & Biklen, 1982). For example, if a source of

evaluation data for staff development was found in the school
improvement process in one interview and not mentioned in
another one, specific questions about the school improvement
evaluation process were asked by the researcher. Patton (1980)
also had a categorization for interview questions. He suggested
researchers ask interviewees what they have done in the past,
how they feel about their experiences, how they react
emotionally to these experiences, what they know about the
topic, how they describe the environment, and how they
describe themselves. In addition, Patton discussed how the
researcher can cover the questions in the interview from the
perspective of the present, past or future. For the purposes in
this study the interview questions focused on the present for the
description of the staff development evaluation models that
currently exist for their respective district, except when dealing
with respondents in two newly-opened schools where the focus
was on the future. Past and future were considered when
addressing the opinion of the interviewee as to what facilitates
or hinders staff development evaluation.
Mintzberg (1973) advocated allowing categories or key
factors to emerge from the data. Gallaher (1972) also supported
using the inductive technique of interviewing to produce a
pattern. Patton (1980) summarized by stating that the theory
should arise from the data. In the case studies key factors were
identified which facilitate the evaluation of staff development.
For instance, the data might reveal a pattern that strong
support from the board of education or superintendent is what
causes staff development evaluation. Or, the administrators
might know a lot about evaluation. This researcher was attuned
to these types of factors. While no absolute proof is provided
that any of the factors identified are what cause evaluation to
occur, further studies might investigate the cause or correlation.
In the Journal of Staff Development, Little (1982)
provided a chart that illustrates the possibilities for evaluation
of staff development outcomes (see Appendix I). The chart
breaks staff development activities into categories based on
their purpose. The categories of building knowledge and skill,
transferring use of skill to the work setting, and altering student
performance, can be associated with teacher skill training. Little
also identified ways to measure each type of activity. For
example, if a district's purpose is to improve teacher knowledge
or skill, direct observation, self-report, and records provide
evaluation information. In the interview, open-ended questions
prompted participants to describe their evaluation system.
To encourage further description, the researcher drew on
examples from Little's chart to see if the interviewees knew of
other sources of evaluation data.
The third portion of this study was a document analysis.
Patton (1980) recommended that the approval for document
analysis be obtained when participants agree to the study.
School improvement reports, curriculum review reports, report
card forms, and parent and teacher newsletters were provided, and

they provided information as to how the results of staff
development evaluation were being shared.
The unit of analysis for this study was the school and
central administration area such as staff development,
curriculum, and school improvement. The data were analyzed
across all three districts for two questions. These questions
were: How do you evaluate staff development activities in terms
of teacher or student change and what factors do you believe
hinder or facilitate the evaluation of staff development?
Yin (1984) wrote about construct validity, external
validity, and reliability in qualitative research. He
recommended that multiple sources of evidence, key
informants, and replication logic be used to improve the validity
and reliability of case studies.
In this study, several people were interviewed in each
building and within central administration units, when possible
(meaning when it was not a one person unit). The screening
interviews provided another source of evidence and so did
collecting documents, when available, that reported on the
evaluation of staff development. Using multiple sources of
evidence was one method used to improve the construct validity
of this study. In addition, key informants reviewed the data
that was being gathered. A clinical professor who has been
assisting with research projects at a nearby university and who
worked in staff development reviewed tapes and the notes
taken during interviews. She also read the three case studies

and provided feedback regarding her perceptions of the data
analysis. As another check, the case studies were sent to the
person in charge of staff development who provided reactions to
the descriptions of programs and projects. Finally, in the district
where the researcher is the staff developer for that district, the
assistant superintendent of instruction served as a second
reader. Using key informants, like using multiple sources of
evidence, was a method to improve the construct validity.
For external validity considerations, replication logic
was considered as described by Yin (1984). The caution in using
replication logic is that the case studies are not intended to
describe all other districts but are written to provide
generalizations to theory. The cases were chosen because they
were screened to be the "best" examples of staff development
evaluation. Therefore, the descriptions do not transfer to all
districts but to the theory of how staff development evaluation
might be implemented. Using three cases, rather than one, also
strengthened the external validity of the study.
To improve reliability, Yin (1984) recommended that
researchers take careful notes, check them many times, and
build a base of documentation to substantiate findings. This
researcher used a scripting method during interviews to record
responses verbatim, and tape recorded interviews to double
check note taking. The notes from the tapes and interviews
were then summarized prior to the case study development. By
using this second step of summarizing the data, the data were

sifted through several times and carefully checked for accuracy.
A data collection trail was left that allowed the researcher to go
back to the raw data at any time that questions of accuracy or
memory needed to be checked.
Analyzing Data
In qualitative research statistical measures are not
often used to analyze the data; therefore, the researcher does not
have as many formal rules to follow in analyzing data
(Stevenson, 1986). Bogdan and Biklen (1982) defined data
analysis as the process of systematic sorting of information. It
involves working with the data, organizing it, breaking it into
parts, and synthesizing it while searching for patterns. They
stated that the purpose for analyzing data is to make sense of
the information for yourself and others and so that you are able
to tell what is important.
Bogdan and Biklen (1982) recommended that the
researcher begin making sense of the data as they are collected.
This way notes can be written in the outline during the
interview. The notes can help the researcher stay on track with
the research questions. Goetz and LeCompte (1984) stated that
the researcher can collect the data, then review the original
proposal, or they can analyze the data as it is gathered and
redraft the plan for analyzing the data. In addition to checking

to see if the data are consistent with the research questions, the
researcher should check to see if any information is missing.
Regardless of whether the information is sorted during
the interviews or after they are completed, data reduction is
needed. Miles and Huberman (1984) stated that data reduction
is simplifying and transforming the raw data. Coding is one way
to interpret data. Codes are categories or key concepts that the
raw data can be sifted into. The codes help organize and cluster.
Miles and Huberman suggested that developing the codes prior
to collecting the data might be helpful, but that the researcher
needs to stay flexible and open to necessary adaptations.
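The coding step that Miles and Huberman describe can be sketched as follows (the code categories, keywords, and interview excerpts here are hypothetical illustrations, not the study's actual codes or data):

```python
from collections import defaultdict

# Hypothetical codes with keywords that signal each category
codes = {
    "time": ["time", "schedule", "calendar"],
    "support": ["board", "superintendent", "principal"],
}

def code_excerpts(excerpts):
    """Sift raw interview excerpts into each code (category)
    whose keywords they mention, clustering related responses."""
    coded = defaultdict(list)
    for excerpt in excerpts:
        lowered = excerpt.lower()
        for code, keywords in codes.items():
            if any(k in lowered for k in keywords):
                coded[code].append(excerpt)
    return dict(coded)

# Invented excerpts for illustration only
excerpts = [
    "We never have enough time to follow up after workshops.",
    "The superintendent asks for evaluation results each fall.",
]
print(sorted(code_excerpts(excerpts)))  # ['support', 'time']
```

A keyword match is only a starting point; in qualitative analysis the researcher would still read each excerpt in context and reassign or add codes as patterns emerge, which is the flexibility Miles and Huberman urge.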
For this study the data were analyzed in the following
way. Over half of the interview sessions were audiotaped and
notes were recorded during the interviews. The researcher
listened to each audiotape and checked notes to confirm that
data had been captured accurately. In addition, a staff developer
who has been assisting with research at a local university
listened to portions of the audiotapes and provided reactions for
the researcher. The notes then were summarized by building.
For example, each school improvement team that was
interviewed and each central administration that was
interviewed had a separate summary written regarding their
responses. These summaries were then used to write the case studies.
In each case study there is a description of the school
improvement process and curriculum review cycle as described

by the participants. In addition, the researcher analyzed the
responses to how evaluation data were collected using the levels
of impact as a framework. The interviewees also identified
factors that they believe facilitate or hinder a school district's
staff development evaluation. Some of these factors are
identified in the change literature. For example, sufficient time
is necessary to implement change successfully; time was also a
common response and thus provided a category for sorting the
responses about what facilitates or hinders the evaluation
process for staff development. The categories that
emerged from the case studies might need to be studied further
to test whether they actually do facilitate or hinder the
evaluation process or whether they are only perceptions.
Another avenue for data analysis was allowing the
subjects to react to the findings of the researcher. Because case
study is such a personal form of qualitative research, reactions
from the participants were valuable. Patton (1980) cautioned
that if the participants are unable to relate to the study, then the
conclusions are questionable. A case study examines the real
world from the perspective of the subject. The subjects are
likely to know more about the evaluation of staff development in
their districts than an outside observer (Miles & Huberman, 1984).
The success of qualitative studies depends on the ability of the
researcher to analyze relationships and provide a systematic
summary (McCall, 1969). Patton (1980) also believed that
qualitative research relies on the observer's ability to deal with

intangibles and have insight into the relationships. Interviewees
reacted to a summary provided by the researcher at the end of
the interview and the staff developer read the finished case
study and provided feedback.
In review, the sequence of events for this study was as
follows. The districts with staff development programs and
student populations of over 10,000 were contacted by phone or
in person to conduct a screening interview. These districts were
selected based on their responses in the screening interview and
on their uniqueness: the researcher chose three districts with
different descriptions and characteristics. The
commonality between the districts was their use of effective
strategies for evaluation of staff development. After the
permission was obtained for the case study, the second round of
data collection began. The staff development administrator
identified which schools should be included in the case study,
and the schools were contacted and interviewing began. No
school that was contacted denied the researcher an interview
although teachers in one building were unavailable. Specific
questions were also asked about curriculum review cycles and
school improvement processes, since these are areas in school
districts where there was a possibility of finding evidence of
evaluation of staff development. A question was asked

regarding the perception of the interviewee about why
evaluation of staff development is or is not occurring in a
particular program or area of the district.
Permission for reviewing documents was obtained from
individuals who were included in the case study. Documents
were read and analyzed for evidence of corroboration or
disagreement with facts found during the first two rounds of
interviews. The document analysis and analysis of interview
findings happened concurrently.

The purpose of this study was to examine how school
districts in Colorado are evaluating their staff development
programs. To achieve this the following questions were asked:
1. What are the existing models of evaluation for
staff development as derived from the literature?
2. How are Colorado school districts evaluating their
staff development training based on the following levels
of impact?
a. Teachers' acquisition of knowledge and skills
b. The transfer of skills to the classroom
c. Student outcomes (including standardized
tests but also measuring change in behavior,
attitude, or academic success)
d. School level impact (for example, a change
in teacher morale, attitude, or values).
3. To what extent are staff development activities and
their evaluation used in the school improvement
process and/or program review cycle?

4. How are the results of staff development efforts
being reported to the public, the Board of Education, and
employees? Do the reports corroborate the data
collected by the researcher?
5. In districts that have evaluation systems for
measuring the impact of staff development, what
factors are identified as facilitating or hindering the
evaluation process?
Research Question 1
What are the Existing Models of Evaluation for Staff
Development as Derived from the Literature?
A review of the literature was conducted regarding
evaluation models and staff development. The findings are
listed below.
Difference between Evaluation and Research
. There is a difference between evaluation and
research practices (Clement, 1981; Isaac &
Michael, 1981; McCall, 1969; Stufflebeam, 1971).
Researchers distinguish between research and
evaluation. Evaluation is the degree to which goals are attained,
while research measures and attempts to prove or disprove that
a relationship exists between two factors. Highly elaborate
models that are designed to compare and contrast findings are
better defined as research, and models that are less formal and

are attempting to measure what exists currently are better
described as evaluation.
Many criticisms exist about the lack of staff
development evaluation occurring in business and
industry and in education (Catalanello & Kirkpatrick,
1968; Clement, 1981; Digman, 1980; Fullan, 1982; Helms,
1980; Johnson, 1986).
When it comes to staff development evaluation, a
review of the literature revealed that most of the criticism
regarding the absence of staff development evaluation pertains
to specific projects, such as Chapter One reading, rather than to
broad district-wide evaluation systems. In addition,
school evaluation systems vary greatly from district to district.
In all districts, though, state mandates, the skills of
educators, requests from boards of education and the public, and
other factors influence the schools' evaluation systems. Systems
that might be sources for the evaluation of staff development
are the school improvement process, a curriculum review cycle,
program review of staff development as a separate entity,
performance evaluation for employees, and state reviews of
student performance. In addition, course or inservice
participants are usually required to complete an evaluation form
that provides feedback about the students' perceptions of the
instructor to the sponsoring university or state department. In

addition, instructors often measure the attainment of course
goals by assigning homework, testing, and allowing teachers to
report about their use of techniques. All of these systems can
provide staff development evaluation data and were rarely
discussed in the literature.
Process and Impact
Evaluation models should measure the process
and the impact of staff development and may use
formal or informal measures (Brodinsky, 1986; Clement
& Aranda, 1982; Friedman & Yarbrough, 1985; Fullan,
1982; Glaser, 1979; Guba, 1975; Guskey, 1984;
Kirkpatrick, 1979; Watson, 1979).
The process of staff development is often measured by
asking participants to rate the course or other activity based on
their perceptions of the content, instruction, and instructor.
These are referred to as "happy face" ratings because
participants are usually reflecting what they liked about the
activity. Other assessment tools at the evaluator's disposal
include opinion surveys, participant attendance, and feedback
from informal conversations with participants.
The impact of a staff development activity might be
measured by having evaluators focus on changes in teachers'
skills, knowledge, or attitudes; changes in students' achievement,
attitudes, or behavior; or changes in the climate. While the ultimate
goal of staff development is to improve the students'

performance, evaluation models also must assess many levels of
change and provide planners with information that helps them
know where the school system is in the change process.
Many state legislative acts mandate that schools provide
the public with information about the ultimate success of
students. Consequently, districts frequently turn to achievement
indicators such as graduation rate or test scores to evaluate the
impact of staff development efforts on student performance.
The link between staff development and student outcomes is not
always linear and direct, so the evaluation model must measure
both student outcomes and teacher outcomes.
Formative and Summative
. Staff development evaluation should include both
formative and summative evaluation (Friedman &
Yarbrough, 1985; Fullan, 1982; Holdzkom &
Kuligowski, 1988).
Experts suggest that staff development evaluation
should include measures that provide feedback during the staff
development activity and at the conclusion or even after the
activity. Formative evaluation allows the staff developer to
make on-going changes in the activities and summative
evaluation provides an assessment of accomplishments. The two
different types of feedback allow decisions to be made quickly
as a result of the staff development activity or allow conclusions
to be drawn later. Because data can be used for both purposes,

there does not exist a clear distinction between formative and
summative evaluation.
Goal Oriented
. An effective model of evaluation will focus on
measuring the goals for that particular program while
considering the audiences involved and remaining
practical (Baden, 1979; Baken & Bernstein, 1982;
Bruello, Orbaugh, Kladder, & Benneth, 1981; Duke &
Corno, 1981; Friedman & Yarbrough, 1985; Helms,
1980; Patton, 1982; Tracey, 1968).
An effective model for staff development evaluation
measures goal attainment. In an evaluation model that is goal
oriented, the evaluation design will be determined by the
objectives of the organization or program. For instance, it is
important that goals be comprehensive so that the evaluation
can present the truest picture of what is occurring. The
evaluation should also be timely, cost effective, and results
should be useable. The audiences concerned about the results of
the evaluation should also be considered in planning the
evaluation and how it is to be reported.
Results of evaluation should be reported
(Bruello, Orbaugh, Kladder, & Benneth, 1981;
Friedman & Yarbrough, 1985; Phillips, 1983).

Results should be reported to those stakeholders
affected by the results or who care about the results. Therefore,
the results need to be communicated in a form that the audience
understands and that presents the whole picture including the
goals, staff development plans, evaluation findings, and design.
Reporting should be continuous and should help parents,
teachers, and students understand the emphasis and need for
staff development and school improvement. Understanding the
goal and what they accomplished helps people see the value in
the staff development and its evaluation.
. Evaluation results should be analyzed to assess
the benefits and costs (Friedman & Yarbrough, 1985;
Kirkpatrick, 1979).
The benefits and costs of the innovation or program
need to be considered when placing value on the criteria used to
measure success in any evaluation. The organization must
develop criteria to determine if benefits are worth the cost of
the staff development program. The criteria may vary in
different organizations, but it is important that they are
developed and used in the analysis of evaluation outcomes.
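As a hedged illustration (the criterion, figures, and function below are hypothetical and not drawn from the study or its sources), an organization's benefit-cost criterion might be expressed as a simple ratio check:

```python
# Hypothetical benefit-cost check for a staff development program.
# An organization defines its own criterion, e.g., benefit points
# (derived from evaluation outcomes) gained per dollar spent must
# meet a locally chosen threshold. All figures here are invented.

def worth_the_cost(benefit_points, cost_dollars, points_per_dollar):
    """Return True if benefits meet the organization's cost criterion."""
    return benefit_points / cost_dollars >= points_per_dollar

# A district valuing at least 1 benefit point per $500 spent:
threshold = 1 / 500
worth_the_cost(40, 15000, threshold)   # 40 points for $15,000
worth_the_cost(10, 15000, threshold)   # too few points for the cost
```

The point of the sketch is that the criterion itself (here, the threshold) is what each organization must develop; the arithmetic is trivial once the criterion exists.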
In summary, the review of the literature demonstrated
that staff development evaluations should measure both the

process and the impact. The evaluation should be both formative
and summative and goal-oriented. The results of evaluation
should be reported to the stakeholding audiences and the
benefits should be weighed against the costs. The review of the
literature also showed that there is a difference between
research and evaluation, and that many criticisms exist about
the lack of staff development evaluation occurring in education
and in business and industry.
Research Question 2
How Are Colorado School Districts Evaluating Their Staff
Development Training?
This question was studied by interviewing staff
developers in the districts that have more than 10,000 students.
This first set of interviews was called screening
interviews because they allowed the researcher to select three
districts for the case studies. The second set of
interviews involved many people in each of the districts and
their responses formed the case studies. Following is a
description of the screening interviews, the criteria for choosing
the case studies, the findings from the screening interviews, and
the findings from the case studies.

Description of Screening Interviews
Of the 16 districts contacted for screening interviews, all
but one agreed to participate and were interviewed. The 16
districts were chosen because they have student populations of
10,000 or more and are representative of three of the financial
categories recently created by the Colorado state legislature:
city, urban/suburban, and Denver metropolitan. Thirteen of the
interviews were conducted in person, while two of the
interviews were conducted over the phone (for the instrument
see Appendix G). Screening interviews lasted from 45 to 60
minutes. The person interviewed had primary responsibility for
staff development for that district, and in all but three cases the
term "staff development" was part of their administrative title.
Criteria for Case Studies
To choose the three exemplary districts for the case
studies, the following criteria were developed: How well did the
staff development administrator understand the evaluation
systems for that district? Was the district different from the
other two being chosen? Was the staff developer willing to
participate? The researcher determined that the staff
development administrators understood the evaluation systems
for their district when they were able to describe their
evaluation systems and their relationship to staff development.
Although all three districts were in the metro area and had

established staff development programs, they represented a
range of student populations and budgets.
Screening Interview Findings
Program variety. Excluding the three districts in the
case studies, the screening interviews revealed that while the
districts of over 10,000 students in Colorado all had some type of
staff development, the programs and evaluation varied. It
should be kept in mind that the findings from the screening
interviews reflected only the staff developer's perspective.
Process evaluation. All but one of the staff developers
identified ways that staff development was evaluated, but most
evaluations were measuring the process of staff development,
not the impact. The process evaluations usually occurred at the
end of the course or inservice.
Methodology varied. The staff developers reported
using a variety of methods to evaluate staff development. One
district was involved in a university partnership with an
evaluation component, and two others used their personnel
evaluation system to measure whether changes were occurring
in teachers' behavior as a result of staff development. Four staff
developers had designed questions that asked participants to
report on any changes during or after the course. Two districts

had staff developers who conducted observations after courses
were completed to coach the participants, and they were able to
assess changes in teaching behaviors while coaching.
Communication problems. Staff developers in three of
the larger districts discussed being frustrated that people did not
communicate across departments. When there was no
communication between staff development and curriculum
specialists, staff developers believed that training efforts were
being duplicated and they were not aware of an evaluation
model for curriculum.
Level of awareness. The awareness level of the staff
developer seemed to be related to the sophistication in school
improvement and curriculum improvement processes for their
district. Unfortunately, in some districts the staff developer was
not involved in the school improvement process or curriculum
development. In these cases, the staff developer was often
unaware of the evaluation systems.
In general, the awareness level of staff developers
regarding evaluation systems was not high. One staff developer
said, "How do we evaluate the impact of staff development? I
can't see how we do this. Give me a clue!" Most staff developers
indicated that they wanted to improve staff development
evaluation, but seemed to think that they must design the

evaluation themselves, separate from the current evaluation
systems in place.
Description of the Three Case Studies
The three case studies were developed by interviewing
between ten and twenty individuals in each district. The staff
development administrator recommended two or three school
improvement teams that should be interviewed and two or three
curriculum specialists. Each case study begins with a description
of that district, the job titles of individuals interviewed, and the
responses to the interview guide (see Appendices J, K, L).
General Findings
The three districts that were chosen to participate in
the case studies all had evaluation plans for student and teacher
outcomes. These evaluation plans were found in the following
areas: school improvement process, curriculum review process,
course evaluations, staff development program evaluation,
grants, university partnerships, and personnel systems.
. Only one of the three districts had not used an
outside consultant to evaluate staff development impact, and
this was the district with the youngest staff development
program operating on the smallest budget.

. All three districts have had university support with
evaluation designs in programs operated in conjunction with the
university.
. All three districts had designed impact measures that
were used at the end of courses and inservices and evaluation
measures that were implemented several months following the
training.
. All three staff developers identified links between
staff development evaluation and school improvement and
curriculum review.
Case Study Overviews
Case study district A. This is a district with
approximately 20,000 students and is located in the suburbs of
Denver (see Appendix J). They have a comprehensive staff
development department with several staff members and a
training of trainers program that encourages teachers to teach
courses and inservices. The program offers support for building
based staff development, administrative training, district level
instruction, on the job coaching, and a variety of other on-site
assistance. They were asked to participate in the case study
because they have been evaluating their building based staff
development program and trying to follow up all courses with
coaching observations and sometimes surveys. This district is
also very open to sharing staff development ideas with other

districts. They were very interested in the results of the study
for their own use as well.
Of the three case studies, district A was offering the
most training in how to evaluate. A training program had been
established to help school improvement teams improve their
evaluation methods, and it included ideas on how to evaluate staff
development as well. However, the particular school
improvement process that the teams had been taught to use was
very global, and the teams identified many areas of improvement.
The teachers reported feeling scattered because of the large
number of goals that were identified for each building. The large
number of goals was making it difficult for the school
improvement team leaders to develop a realistic evaluation plan.
Case study district B. District B is one of the largest
school districts in Colorado and is located in the Denver metro
area (see Appendix K). The staff development program has been
changing as the district changes, and this was one of the first
districts in Colorado to have a staff development program. The vision for
the program is that staff development serves the employees
because they are a community of learners. The program is
divided into instructional support, curriculum support, and
organizational development. Staff development activities are
planned at the district and building level. Teachers may
participate in a training of trainers program and outside

consultants are brought in as presenters or facilitators. The
strength of the program and personnel involved in it has meant
that this district is a leader in staff development. Many
partnerships have been developed with universities and
government agencies and these partnerships usually include an
evaluation component. The staff development department was
very cooperative and supportive of a study that would add
knowledge in the area of staff development.
This was the largest district included in the case studies
and the interviews in the buildings reflected the greatest
diversity in school improvement. One building developed their
staff development plans as a result of a North Central
Accreditation process. The two elementary schools involved many
teachers in the school improvement process and subsequent
staff development planning, but parents were not as involved.
While the people in the district office reported that a substantial
amount of effort had gone into changing the school improvement
process to meet the requirements of the new House Bill 1341,
the principals in the buildings felt that the state mandates
hindered their ability to implement staff development and
evaluation. They said that they saw the state reports as an
exercise in paperwork.

Case study district C. This district of about 15,000
students is near the Denver metro area and includes several
small communities (see Appendix L). The staff development
program is relatively new along with the school improvement
process and has been in operation for about three and a half
years. Staff development is operated with one administrator,
and almost all of the training offered is a result of the
training of trainers program. There is easy access to three
universities and consultants outside of the district are rarely
used. The staff development program offers district level
opportunities through curriculum training, general instruction,
and wellness courses. Courses are offered during the weekends
and evenings with very little training being offered during
release time.
Staff development is instrumental to curriculum
implementation and many changes have been initiated in the
last three years. The curriculum changes are taught to teachers
in courses after work hours, and university credit is offered. Unlike
the other two districts, very few inservices are conducted during
release time. The staff development evaluation is required as a
part of the accountability process used for curriculum review.
The staff development program is also closely linked
with the school improvement process and the building staff
development plans are driven by the school improvement
process. Buildings are encouraged to choose just one goal, while
many instructional practices might be considered for

improvement. Evaluation of staff development is written into
the school improvement cycle. In addition, the leadership
objectives for principals and other administrators require an
evaluation plan for every objective. The evaluation must include
impact measures as well as process measures.
Case Study Findings
The answers to the interview questions are summarized
by district in the case studies (see Appendices J, K, L) and the
researcher analyzed the answers across all three case studies.
Following are the findings.
The districts used a variety of methods to evaluate
changes in teacher and student learning as a result of staff
development. These methods were used formally and
informally. When respondents stated that they gathered
evaluation data informally, they meant that the data were
collected through discussions and during normal daily activities
rather than specifically stating, "Today, I am working on
evaluation." As one curriculum specialist said, "I do that
(evaluation) automatically, I am not even conscious of gathering
data." In addition, evaluation was described as being informal
when it was not mandated and the results often were not
reported. Therefore, the data collection methods listed were
being used to assess progress, but often the results were not
reported.

The following methods were identified in all three
districts as methods of data collection for staff development
evaluation.
Teachers' acquisition of knowledge and skills.
. Pre- and post-tests are given to teachers in
courses and inservices
. Teachers perform mini-teaching episodes
during inservicing
. Homework or activities during training provide
evidence of knowledge or skill acquisition, such as
journal writing, lesson planning, or other simulations
. Teachers complete self-report assessments such as
surveys about the changes they are making
The transfer of skills to the classroom.
. Administrators observe teachers as a part of the
evaluation process
. Peer coaching provides an opportunity for
observing changes in teaching behaviors
. Teachers complete self-report assessments
. Purchase orders reflect a change in the type of
materials being used
Student outcomes.
. Standardized tests

. Writing samples
. Attitude surveys
. Teachers' observations
. Teachers' grading systems
. Discipline referral logs or suspension/expulsion records
School level impact.
. Needs assessments
. Climate surveys with parents and/or teachers
. Informal observations (for example, changes in
topics discussed in the teachers' lounge)
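The methods reported across the three districts can be summarized in a simple structure keyed to the levels of impact (an illustrative sketch only; the key names are shorthand labels introduced here, not instruments from the study):

```python
# The data collection methods identified in all three districts,
# organized by level of impact. Dictionary keys are shorthand labels
# for the four levels discussed in the text.
EVALUATION_METHODS = {
    "knowledge_and_skills": [
        "pre- and post-tests", "mini-teaching episodes",
        "homework and simulations", "self-report surveys",
    ],
    "transfer_to_classroom": [
        "administrator observations", "peer coaching",
        "self-report assessments", "purchase orders",
    ],
    "student_outcomes": [
        "standardized tests", "writing samples", "attitude surveys",
        "teacher observations", "grading systems",
        "discipline referral logs",
    ],
    "school_level_impact": [
        "needs assessments", "climate surveys",
        "informal observations",
    ],
}

def methods_for(level):
    """Return the data collection methods reported for one level."""
    return EVALUATION_METHODS[level]
```

Organizing the methods this way makes explicit that each level of impact had multiple data sources in every district.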
Other findings surfaced as a result of the question, "How
are you evaluating your staff development efforts?"
Linkage of staff development with curriculum and
instruction. Staff development programs were located in the
department of curriculum and instruction or were closely
aligned with that department's goals. The purpose of the staff
development program was to implement changes that were
outlined in instructional goals. Instructional changes, the
implementation of new curriculum materials, lifestyle or wellness
changes, and organizational development were the most common
areas for staff development activities. Included in these plans
for change was an evaluation component. The evaluation
measured the process and impact of staff development.

Evaluation as an integral part of a planned cycle. Staff
development evaluation was linked to school improvement,
curriculum review cycles, and leadership objectives. As a part of
the school improvement and curriculum review cycles, goals for
improvement were set, action plans implemented, and data
gathered to assess the level of goal attainment. The staff
development activities were usually a part of the action planning
in these cycles, and thus evaluation activities were conducted to
measure whether the staff development was successful in terms
of process and impact.
Staff development administrators conducted evaluations
of specific programs as a part of their own leadership objectives.
Examples of specific programs evaluated included a building-
based staff development program, a general instruction training
focusing on higher level thinking strategies, and university
partnership teacher training programs. The evaluation systems
being used were designed by the university, by outside
consultants, or by the district office, often several months after training.
University role in staff development evaluation.
Evaluation was more common where there was a relationship
with a university or when evaluation was required as a
condition of receiving grant funding. For example, all three
districts were involved in partnership efforts with local
universities. Where university partnerships exist, evaluation

tended to be a component of the project. Also, as part of the
partnership agreement, these districts were able to request
assistance with evaluation designs and plans from the university
staff. The evaluations conducted with the university also
appeared to be more comprehensive, meaning they used a
variety of methods to gather data and assess impact.
Limitation of evaluation forms. Evaluation forms from
the state department of education and university continuing
education measure process only. The most common method for
evaluation was the evaluation form provided by the Colorado
Department of Education as a follow-up to receiving
recertification credit or the form provided by a local agency of
higher education for the same purpose. In addition, districts
often had designed their own evaluation form. Sometimes the
staff development department designed an evaluation form to
be used in all courses, and sometimes the instructor developed
an evaluation for their specific course.
The forms being provided by universities and the state
department measure process. Instructors were designing their
own questions to measure impact. Because the form from the
university or state department is usually required, the
participants were completing two sets of questions, one
measuring impact and one measuring process. Completing two
evaluation forms is cumbersome for the participants.

Perspectives limited regarding staff development
evaluation. Respondents lacked information about staff
development activities and evaluation designs outside of their
own districts, schools, or departments. The first responses from
participants indicated that when they initially considered
evaluation of staff development, they only considered those
evaluation activities implemented through their own program
review. During the interview, however, they would list
examples of evaluation from the school improvement process
and curriculum review cycles, and other evaluation projects.
Emphasis on student outcomes. There is an increased
emphasis on measuring success of staff development based on
student outcomes as measured by standardized test scores.
Joyce and Showers (1980) have identified several levels of
impact that can be used when evaluating the impact of staff
development. The levels are knowledge or skills of participants,
transfer of skills to the classroom, and student outcomes. When
respondents were asked to describe how they knew whether or
not the teachers had gained knowledge or skills as a result of
participating in staff development, they often gave examples of
how they knew that the behaviors had changed in the classroom
and that students had benefited. The methods for gathering
evaluation data tended to fall in the categories of teaching
behavior in the classroom and student outcomes.