System for persona ensemble clustering

Material Information

System for persona ensemble clustering: a cluster ensemble approach to persona development
Brickey, Jonalan
Publication Date:
Physical Description:
xvi, 153 leaves : ill. ; 28 cm.


Subjects / Keywords:
Cluster analysis ( lcsh )
Persona (Psychoanalysis) ( lcsh )
Computer interfaces -- Design and construction ( lcsh )
Human-computer interaction ( lcsh )
Cluster analysis ( fast )
Computer interfaces -- Design and construction ( fast )
Human-computer interaction ( fast )
Persona (Psychoanalysis) ( fast )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )


Thesis (Ph. D.)--University of Colorado Denver, 2010.
Includes bibliographical references (leaves 135-153).
Statement of Responsibility:
by Jonalan Brickey.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
672293668 ( OCLC )




SYSTEM FOR PERSONA ENSEMBLE CLUSTERING: A CLUSTER ENSEMBLE APPROACH TO PERSONA DEVELOPMENT
by
Jonalan Brickey
B.S., United States Military Academy, 1991
M.S., Naval Postgraduate School, 2001
A thesis submitted to the University of Colorado Denver in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Computer Science and Information Systems
2010


© 2010 by Jonalan Brickey. All rights reserved.


This thesis for the Doctor of Philosophy degree by Jonalan Brickey has been approved by
Dawn G. Gregg
Min-Hyung Choi


Brickey, Jonalan (Ph.D., Computer Science and Information Systems)
System for Persona Ensemble Clustering: A Cluster Ensemble Approach to Persona Development
Thesis directed by Associate Professor Steven Walczak

ABSTRACT

The personas approach to user modeling aims to improve system interface design and increase the chances of information system success. Whereas there have been recent attempts to semi-automate the persona clustering process, the current methods fail to conduct simultaneous data analysis utilizing both quantitative and qualitative data; therefore, the current methods do not approximate human clustering judgment. Additionally, traditional manual persona clustering methods are resource intensive. This study views solutions to this problem from a design science lens in order to find utility in artifacts that meet the needs of an organization. Two information technology artifacts are built and evaluated: a new ensemble clustering method for persona development and an instantiation of the method in a prototype known as the System for Persona Ensemble Clustering.

In order to validate the new method and prototype system, data are collected on system users in the context of a military Knowledge Management System. All data were simultaneously analyzed and combined using an ensemble cluster method implemented in the prototype system. The clustering effectiveness of three existing persona clustering methods was compared with the new method by using an expert panel clustering as the baseline. Final agreement measures for the three existing methods and the new method indicate the new semi-automated ensemble cluster method creates persona clusters more effectively. Additionally, the new method is more than three times faster than the traditional manual clustering method.

These results suggest that ensemble clustering methods are effective at triangulating the plethora of user data commonly available for persona development projects. As the amount of user data available to design teams grows, it is imperative to use qualitative and quantitative data simultaneously to understand user goals, needs and behaviors. The results of an effective and efficient persona clustering method are realistic personas that portray typical system users and system interfaces that match those personas. Finally, the research is communicated to management-oriented and technically-oriented audiences using a design science research perspective.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.
Signed: Steven Walczak


DEDICATION

I wish to dedicate this dissertation and all of my accomplishments to my parents: Carol W. Brickey and Homer Brickey Jr. This is dedicated to my father for always being a good role model but never preaching to me about how to live my life. He inspired me to serve my country just as he served in the Army during the Vietnam War. Through his actions, and sometimes through his words, he inspired me to always do the right thing and to work hard at everything I do. This is dedicated to my mother, who passed away in the final months of my dissertation, for encouraging me to apply to West Point and continuing to lift my spirits when I doubted myself. Finally, this is dedicated to soldiers everywhere for doing what others cannot or will not do.


ACKNOWLEDGMENT

Any significant accomplishment is a reflection of more than one person. There are many people I would like to acknowledge for their support through guidance, assistance, friendship and, most importantly, love.

First, I wish to thank previous mentors in the United States Army for guiding me through a rewarding career of service to my country. I am thankful for the incredible advanced education opportunities afforded by the Army, especially while soldiers are fighting in wars overseas.

Throughout my studies at the University of Colorado Denver, I received tremendous support from faculty and fellow students. I wish to thank Dr. Steven Walczak for being an excellent mentor throughout this process. He guided me from the early moments of my doctoral work through the end; I appreciated his patience as I meandered from one topic to another in the initial stages. My committee members provided encouragement and guidance throughout the entire arduous process; for that I thank them all. Two professors went above and beyond normal classroom duties and provided advice long after the classes ended, so I want to express my appreciation to them as well: Drs. Peter Bryant and Nancy Leech. My fellow students provided friendship and loads of humor when I needed it most; thank you! Jeff Erickson took me under his wing as a first-year student and guided me through my first conference paper. Andrea Hester, Bobby Olsen and Gary Borken often invited me out for a drink and food after classes. Finally, Chris Sibona was an excellent sounding board and friend, not just in the real world but also in the world of Facebook! I thoroughly enjoyed my time in Denver with all of these individuals.

Next, I want to extend my gratitude to specific people and organizations associated with the military who were instrumental in this accomplishment. First, I want to thank Lieutenant Colonels Tony Burgess and Pete Kilner for their encouragement, guidance and especially their cooperation. This would not have been possible without Tony, the Company Command forum members and the team at the Center for the Advancement of Leader Development and Organizational Learning (CALDOL) at the United States Military Academy. Moving from one academy to another, I would also like to thank Colonel Thomas Drohan, Head of the Department of Military & Strategic Studies, United States Air Force Academy, for giving me office space and access to many resources. Within the department, I would be remiss if I did not thank Dr. Dorri Karolick for mentoring and friendship, as well as Lieutenant Colonels Dave Christensen and Jerry Boone for support. Also, I want to thank Dr. Michele Costanza for suggesting research in personas and offering tips for a successful dissertation. Finally, I want to thank Dr. Pam Savage-Knepshield and the expert panel members from the Army Research Laboratory: Drs. Alan Davison, Linda Elliott, Susan Hill, Bruce Sterling and Bill Evans.

I am grateful for the many friends and family who have supported me throughout my life in many endeavors, and especially for their support during the last few years. First, I want to thank Rick and Diane Jones for their love and support. I also want to thank Beth Young for being my companion for the past two years, providing more love and encouragement than I could have ever expected. She never complained when I dragged her to the library or to a conference (who could resist Hawaii?), and she listened to me talk about my research night after night when she really wanted to watch a movie with me. I want to thank my father, Homer Brickey Jr., for proofreading my dissertation; it helps to have a writer in the family! Finally, I want to thank my boys, Geoffrey and Geordan, for cooking their own meals while I was studying. I am sure they wanted to do more with me the past few years, but I was thankful for the time we had to go bowling or to attend concerts and sporting events. If they were not such good kids, I would not have had the time to focus on my research.


TABLE OF CONTENTS

Figures
Tables

Chapter
1. Introduction
1.1 Statement of the Problem
1.2 Research Questions
1.3 Overview of the Methodology
1.4 Anticipated Impact
1.5 Overview of the Dissertation
2. Literature Review
2.1 Information System Development and Design
2.2 Design Science Research Guidelines for the IS Domain
2.3 Personas
2.4 Importance of IS Design Research in Military Domains
2.5 Persona Development
2.5.1 Step 1: Identify Target Users
2.5.2 Step 2: Collect User Data
2.5.3 Step 3: Group Participants into Personas
2.5.4 Step 4: Create and Present Persona Details
2.6 Comparison of Qualitative and Quantitative Persona Clustering Methods
2.7 Cluster Analysis and Ensemble Clusters
2.7.1 Cluster Analysis as a Pattern Recognition Technique
2.7.2 Cluster Ensembles
3. Research Hypotheses and System Attributes
3.1 Hypotheses
3.1.1 Effective
3.1.2 Efficient
4. Research Methodology
4.1 Research Approach
4.2 Research Context
4.3 Research Participants
4.4 Instruments Used in Data Collection
4.5 Data Analysis: Clustering Members and Evaluating System Performance
4.6 Pilot Study
4.6.1 Pilot Study Results
4.6.2 Moving Beyond the Pilot Study
5. The System for Persona Ensemble Clustering (SPEC)
5.1 Data Collection and Number of Clusters
5.2 Base Clusterers
5.2.1 Latent Semantic Analysis
5.2.2 Factor Analysis and Principal Components Analysis
5.2.3 Cluster Analysis
5.3 Combining Base Clusterers into a Cluster Ensemble
5.3.1 Majority Vote
5.3.2 Hyper-graph Cluster Ensemble
5.4 Revisiting the Pilot Study
6. Results: SPEC Validation
6.1 Factor/Principal Components Analysis Results
6.2 Latent Semantic Analysis Results
6.3 Cluster Analysis Results
6.4 SPEC Results
6.5 Expert Panel Results
6.6 Hypothesis Testing
6.6.1 Effective Results
6.6.2 Efficient Results
6.7 Using the Results to Create Personas
7. Discussion and Conclusion
7.1 Contributions
7.2 Implications for Research
7.3 Implications for Practice
7.4 Limitations
7.5 Recommendations for Future Research

Appendix
A. Instruments
B. Participant Demographics
C. Human Subjects Certificate of Exemption
D. Approved Informed Consent Form
E. Final Clustering Results for Each Member
F. Cross Tabulation Tables for All Clustering Methods

Bibliography


LIST OF FIGURES

Figure
2.1 "Jeff," a Sample Persona for a Leadership Community of Practice
2.2 Qualitative Versus Quantitative Data for Persona Development
2.3 Persona Grouping Techniques Found in 27 Studies
3.1 SPEC System Design Attributes and Goals
3.2 Key Studies on Existing Persona Clustering Techniques Mapped According to Research Approach and Automation
3.3 Aggregating Complementary Information to Discover Patterns Similar to Humans
4.1 Comparing the Performance of Persona Clustering Methods
5.1 Combining Clustering Output into a Cluster Decision Center in SPEC
5.2 The SPEC Method
6.1 Scatter Plot of Forum Members Based on Component Scores
6.2 Scatter Plot of Members Using Component Scores from Usage Data
6.3 Final Persona Representation for Persona 1: Knowledge Consumer
6.4 Final Persona Representation for Persona 2: Social Networker
6.5 Final Persona Representation for Persona 3: Knowledge Provider
A.1 Online Survey Questions Related to Goals and Needs


LIST OF TABLES

Table
2.1 Reported benefits of personas
2.2 Number of personas created in persona studies
2.3 Comparison of existing persona clustering techniques
4.1 Most common categories for participants by demographic
4.2 Interpreting Cohen's kappa
4.3 Pilot study clustering results
4.4 Cohen's kappa and agreement for each method
6.1 Component loadings for the rotated components
6.2 Interpretation of the components
6.3 Expert panel results for persona group identification and description
6.4 Agreement (kappa) between individual expert and consensus groupings
6.5 Expert panel individual and consensus clustering decisions
6.6 Expert panel and semi-automated methods clustering results
6.7 Cohen's kappa and degree of agreement for all four methods
6.8 Times to complete clustering tasks for experts and SPEC
6.9 Persona qualities
A.1 Online interview questions
A.2 Description of Observation Data (Trace Data from Transaction Logs)
B.1 Participant age
B.2 Participant gender
B.3 Participant marital status
B.4 Participant education
B.5 Participant rank
B.6 Participant point in career
E.1 Clustering results by member for each method
F.1 FA/PCA cross tabulation
F.2 LSA cross tabulation
F.3 CA cross tabulation
F.4 SPEC cross tabulation


1. Introduction

The lack of utility, or even failure, of many information systems (IS) is well documented and spans the business, psychology, computer science and management information systems literatures (Cook, 1996; Davis, 1989; Zhang, Carey, Te'eni & Tremaine, 2005). There are as many dimensions to the problem as there are IS development and design approaches. Since the beginning of modern computing, organizations have continued to struggle with challenges encountered at the nexus of humans and computers as they strive to reap the promised rewards of IS productivity (Gerlach & Kuo, 1991). Given the importance of information systems in our globally networked society, it is no wonder researchers and practitioners pay so much attention to system development and design, as well as their impact on system acceptance and use. As organizations rely more on digital capital to drive innovation and achieve competitive advantages in worldwide markets (Sambamurthy, Bharadwaj & Grover, 2003), it is more important than ever to understand the antecedents of system success and avoid the causes of failure.

Increasingly, system designers are turning to the development and use of personas, a fairly recent design method, as an effective approach to designing better system interfaces in hopes of improving user satisfaction and system success (Mulder & Yaar, 2007). Personas are design tools based on fictitious characters that represent real system users by capturing their goals, needs, frustrations and behaviors. [1] Reported benefits of using personas include greater focus on users and better communication among developers (Cooper, 1999; Dharwada, Greenstein, Gramopadhye & Davis, 2007; Miaskiewicz, Sumner & Kozar, 2008).

[1] Personas in systems development are different from personas, avatars or profiles used to represent a specific individual in a system or application such as Second Life or Sim City. Personas in this study are IS artifacts that model typical system users and assist system developers and stakeholders in the design of user interfaces.

The personas approach to interface design presents two challenges that may significantly hinder its adoption for IS design. First, design teams are faced with different approaches to persona development and left with many questions about the methodologies. For example, qualitative and quantitative persona clustering methods for persona identification and creation exist, but there is no research comparing their clustering effectiveness or efficiency. Second, most of the studies in the literature implement manual qualitative persona development methods that require substantial resources for most organizations, especially when one considers the number of systems in typical organizations and the number of interfaces within those systems.

A small number of researchers have recently suggested more automated persona development methods that attempt to reduce the resources required to develop personas (McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008; Sinha, 2003). However, the semi-automated techniques suggested in the literature are problematic because, even though the researchers or practitioners creating personas often have both qualitative and quantitative data available to identify user groups (that eventually become personas), they only consider portions of the data. In the cases where they do consider both types of data, they typically only use the additional data in a segmented and piecemeal fashion to validate existing personas instead of simultaneously analyzing the data (Mulder & Yaar, 2007). Additionally, current semi-automated techniques fail to take advantage of opportunities to increase automation in order to save resources in the persona development process.

1.1 Statement of the Problem

Although there have been recent attempts to semi-automate the persona clustering process, the current methods fail to conduct simultaneous data analysis utilizing both quantitative and qualitative data; therefore, the current methods do not approximate human clustering judgment.

1.2 Research Questions

This study views solutions to this problem from a design science lens, which addresses research through the building and evaluation of artifacts in order to find utility in the artifacts that meet the needs of an organization (Hevner, March, Park & Ram, 2004, p. 79). According to March and Smith (1995), an information technology (IT) artifact is a construct, model, method or instantiation. The following questions guide the search for useful artifacts in this research:

1. What are the existing qualitative and quantitative methods used for identifying personas?
2. How do the various quantitative and qualitative methods used in identifying personas differ?
3. How do existing semi-automated qualitative and quantitative techniques perform when compared to manual clustering?
4. Can a method based on ensemble clusters utilizing both quantitative and qualitative data improve clustering performance?

1.3 Overview of the Methodology

Although the methodology will be discussed in detail later in Chapter 4, a brief preview is provided here. The research approach is applied (the focus is on solving a real problem in persona clustering) and follows the design science paradigm by proposing a method, creating a prototype to implement the method and conducting an empirical study to validate the new method and prototype. In order to validate the new method and prototype system, data are collected on system users in the context of a military Knowledge Management System (KMS). Existing semi-automated persona clustering methods are applied to the data and compared with expert clustering decisions; concurrently, a new semi-automated ensemble cluster method is presented and compared to the existing manual and semi-automated methods. Finally, using the new system for persona clustering and the data collected, three personas are created to support interface design for a military KMS.

1.4 Anticipated Impact

Upon completion of this research there are three expected contributions: a comparison of existing persona clustering methods, the design and development of two design artifacts and a validation of the artifacts. This study:

1. Provides the first known empirical comparison of existing persona clustering methods. Researchers and practitioners will be able to learn from the comparison and apply knowledge to other systems.
2. Presents two novel IT artifacts: a method (algorithms and process) and an instantiation (prototype system). The study presents a novel semi-automated, mixed analyses persona clustering method that combines the results of several base clustering methods into an ensemble method. The System for Persona Ensemble Clustering (SPEC) described in this study is an instantiation of the new method. SPEC can identify the number of personas for a system, describe the underlying persona qualities and assign users to personas to support the persona development process. Although ensemble clusters have been successfully applied in other domains, they have not been used for persona development.
3. Conducts an empirical study to determine and compare the performance of three existing persona clustering techniques and a new ensemble technique presented in SPEC.

1.5 Overview of the Dissertation

The dissertation is organized as follows. Chapter 2 is a discussion of related work on IS design and design science research, personas and the persona development process, cluster analysis and cluster ensembles. Chapter 3 describes system design attributes and goals for a semi-automated, mixed analyses persona clustering method and presents the research hypotheses. Chapter 4 explains the research methodology and discusses a pilot study. The details of the two new artifacts are provided in Chapter 5. The results of persona clustering using existing methods and the proposed artifacts are reported in Chapter 6. Additionally, three personas are created to support interface design for a military KMS using the new system for persona clustering and the data collected. The final chapter discusses the results and concludes with implications of this work and future research.
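As a reading aid for the chapters that follow, the sketch below illustrates two ideas named in this dissertation: combining the output of several base clusterers by majority vote, and scoring a method against an expert baseline with Cohen's kappa. It is a minimal illustration in Python and not the SPEC implementation. The function names, the six-member data set and the assumption that every clusterer already uses the same label for the same persona group are all hypothetical; aligning labels across clusterers is a separate step that is not shown.

```python
from collections import Counter


def majority_vote(base_labelings):
    """Combine label lists (one per base clusterer) by per-member plurality vote.

    Assumes the label lists are equally long and already aligned, i.e. label 1
    means the same group for every clusterer (an assumption of this sketch).
    """
    consensus = []
    for votes in zip(*base_labelings):
        # most_common(1) yields the most frequent label; ties go to the first seen.
        label, _count = Counter(votes).most_common(1)[0]
        consensus.append(label)
    return consensus


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(rater_a)
    # Proportion of members on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: six members, three persona groups (not from the study).
experts = [1, 1, 2, 2, 3, 3]   # expert panel baseline
base_a = [1, 1, 2, 2, 3, 1]    # base clusterer A
base_b = [1, 2, 2, 2, 3, 3]    # base clusterer B
base_c = [1, 1, 2, 3, 3, 3]    # base clusterer C

ensemble = majority_vote([base_a, base_b, base_c])
print("ensemble labels:", ensemble)
print("kappa, base A vs experts:", cohens_kappa(experts, base_a))
print("kappa, ensemble vs experts:", cohens_kappa(experts, ensemble))
```

In this made-up example each base clusterer misplaces a different member, so the vote cancels the individual errors. Real data would not be expected to behave so neatly.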


2. Literature Review

2.1 Information System Development and Design

The implementation, acceptance and success of organizational information systems are mature topics in multiple research disciplines. Military organizations funded many of the earliest computer-based information systems development projects in the 1950s (Cook, 1996). These large, complex projects required a formal development methodology; thus, information system developers adopted the systems development life cycle (SDLC) methodology (Turban & Aronson, 1998). The next two decades were marked by technological advances in computers and a greater variety of development methodologies. By the 1970s there were continual problems with software development, and failures of major enterprise information systems were being widely publicized (Karimi & Konsynski, 1988; Zhang et al., 2005).

Over the years, system developers have used several different development approaches: waterfall, prototyping, iterative, spiral, object-oriented and others (Booch, Rumbaugh & Jacobson, 2005; Cook, 1996; Turban & Aronson, 1998; Zhang et al., 2005). Each approach has its own set of tools, models and methods that are used extensively throughout the development of information systems (Ma & LeRouge, 2007). For example, in the early days of IS development, the waterfall approach reinforced a linear development process that promoted structured development methodologies and programming techniques (Laudon & Laudon, 1999). The object-oriented design and analysis methodology is a hybrid incremental-iterative approach that became more popular during the 1990s and provided the foundation for the unified modeling language (Booch et al., 2005). More recently, the iterative approach gained in prominence with the rise of the dot-com boom and the demand for shorter system lifecycles, spawning the need for more agile software development techniques such as extreme programming (Beck, 2000).

Regardless of the development approach, information system failures continue to plague academicians and practitioners alike (Goldfinch, 2007). There is a plethora of research positing several causes of IS failures: technical problems, lack of skills and cultural differences (Heeks, 1999); system complexity (Mukherjee, 2008); lack of user acceptance (Davis, 1989; Lyytinen & Hirschheim, 1987); and failure to understand user goals and interaction needs (Cooper, 1999), among others. Dalcher and Genus (2003) estimate the cost of information system failures in the U.S. alone is nearly $150 billion annually; this figure includes public and private organizations, large and small projects. In the 1990s, just two companies, Worlds, Inc. (online conversation application) and Nomadic Computing (remote office application), accounted for tens of millions of dollars wasted on cancelled high-profile IS projects (Cooper, 1999).


Certain types of IS are more problematic than others due to the nature of the system tasks and processes. Organizations struggle with the development of KMS that support the creation transfer and application of knowledge in organizations (Alavi & Leidner 2001). KMS are difficult to develop and implement because the content or service provided is difficult to measure especially since most KMS implement a process view of knowledge management as opposed to economic or managerial views (Bell Whitwell & Lukas 2002) Complicating the task of measuring KMS value is the fact that knowledge is viewed and defined differently from one organization to another as well as within organizations (Nicolini Gherardi Y anow & Gomez, 2003) Information systems research cites poor system design as a major contributing factor to system failure (Bostrom & Heinen 1977 ; Cooper 1999 ; Joshua 2008 ; Zhang et al. 2005 ) Poor design can be the result of a number of shortcomings including miscommunication between designers and stakeholders untrained programmers and developers unsophisticated development tools and processes and a lack of understanding of user goals experiences and behavior ( Cooper 1999 ; Grudin 1994; Javaher y, Deicbman Seffah & Radhakrishnan 2007 ; Laudon & Laudon 1999). Wilson and Howcroft (2002) point to three types of information system failures: project failure system failure and user failure U sers are frequently the focus ofiS 9


research because they are the link between the knowledge created and stored in information systems and the application of an organization's knowledge. User-focused research in the fields of IS, business, computer science, and psychology considers user constructs such as perceptions (Davis, 1989), emotions (Edell & Burke, 1987), and abilities (Compeau & Higgins, 1995) as significant predictors of system success. More importantly, however, researchers often want to understand how design characteristics affect system users: whether they will accept and use an information system (Davis, 2006). These concerns have been the focus of behavioral science research in the IS domain since its inception, as researchers have sought to develop and justify theories (justified truths) related to the design, implementation, and use of IS (March & Storey, 2008). More recently, research based in the design science paradigm has approached IS problems by focusing less on theory and more on the artifacts.

2.2 Design Science Research Guidelines for the IS Domain

Design science is fundamentally a problem-solving paradigm that seeks to create and evaluate artifacts. As a research paradigm, the focus is less on theory than on relevance in practice and designed artifacts (Kuechler & Vaishnavi, 2007). Though the notion of design science is rooted in Simon's (1969) work on sciences of the artificial, the significant transition of design science research in the IS domain only goes back 20 years.


Although IS development had been recognized in practice for several decades, there was a paucity of published IS research in the United States into artifacts and their design until 1990, when Nunamaker and Chen (1990) introduced a systems development research methodology. Momentum in DSR grew in the 1990s as researchers began to question the relevance of IS research and discuss the merits of design science in the IS domain (Kuechler & Vaishnavi, 2007). Gregg, Kulkarni, and Vinze (2001) introduced the Socio-technologist/Developmentalist paradigm, which, like the DSR paradigm, offers an alternative lens to the traditional positivist and interpretive paradigms to attend to the creation of unique knowledge associated with the development of information systems from their conception to inception (p. 172). However, even as DSR became more prevalent at the turn of the millennium, there were still questions as to what constituted valid design research in the IS domain. In order to clarify the boundaries of design science research (DSR) in the IS domain and inform researchers how to conduct and evaluate such research, Hevner et al. (2004) established the following seven DSR guidelines (G1-G7):

[G1]: Design as an artifact. Design-science research must produce a viable artifact in the form of a construct, a model, a method, or an instantiation.
[G2]: Problem relevance. The objective of design-science research is to develop technology-based solutions to important and relevant business problems.


[G3]: Design evaluation. The utility, quality, and efficacy of a design artifact must be rigorously demonstrated via well-executed evaluation methods.
[G4]: Research contributions. Effective design-science research must provide clear and verifiable contributions in the areas of the design artifact, design foundations, and/or design methodologies.
[G5]: Research rigor. Design-science research relies upon the application of rigorous methods in both the construction and evaluation of the design artifact.
[G6]: Design as a search process. The search for an effective artifact requires utilizing available means to reach desired ends while satisfying laws in the problem environment.
[G7]: Communication of research. Design-science research must be presented effectively both to technology-oriented as well as management-oriented audiences.

These guidelines propelled design science out of its niche into the headlights of the IS community and have served as the de facto standard for conducting DSR in the IS domain (Indulska & Recker, 2008, p. 2). Hevner et al. (2004) assert that DSR must address all seven of their guidelines to be considered good research. Since the approach of this study is through the DSR lens, each of the guidelines is discussed throughout this dissertation.

The majority of this chapter focuses on the domain of personas: what they do for interface design, how they are created, and where existing clustering methods fail. As knowledge that results from DSR becomes part of the research and practice knowledge base, it becomes best practice (Hevner et al., 2004). After introducing


the personas approach to user modeling and interface design, best practices and emerging artifacts for persona clustering are examined.

2.3 Personas

In the mid-1980s, a user-based research approach emerged as a way to improve system success through better design. User-centered design (UCD) is a process within the human factors or human-computer interaction (HCI) discipline that seeks usable designs through an iterative framework that includes methods, tools, and models (Zhou, Heesom, & Georgakis, 2007). There are three key principles of UCD (Gould & Lewis, 1985):

1. An early focus on users and tasks;
2. Empirical measurement of product usage;
3. Iterative design.

System developers espousing the UCD approach incorporate these principles into a system's life cycle to drive design activities (Courage & Baxter, 2005). Early UCD approaches focused on the users by incorporating user profiles (lists of user characteristics such as age, gender, ethnicity, and other demographics) early in the design process. Although user profiles describe system users to inform design, they lack realistic and memorable user details and fail to create a strong connection between designers and users (Kozar & Miaskiewicz, 2009).


Alan Cooper (1999) leveraged the theoretical underpinnings of UCD to create a new design methodology known as goal-directed design. As part of his design methodology, he developed a technique to keep product developers focused on a small group of users known as personas. Personas in the IS domain are fictitious characters that represent an aggregation of real system users; they serve as design tools by keeping system interface designers focused on aspects of users' mental models: needs, goals, and frustrations (Cooper, 1999). The sample persona in Figure 2.1 illustrates the composition of a typical persona: photo, name, skills, motivations, and goals. According to Cooper (1999), a persona like "Jeff" in Figure 2.1 is not made up; rather, he is discovered as a result of collecting information from real users of a system. In this case, "Jeff" is a hypothetical archetype of actual users in a community of practice (CoP) on leadership. System interface designers would not re-use "Jeff" for another system because he was discovered in this setting; instead, they would create a new cast of characters, usually three to seven personas, for each system.

The use of personas as design tools is becoming more widespread as academicians and practitioners discover its benefits. Since personas are primarily found in practitioner literature, there is a need for research to evaluate application methodologies and their effectiveness for persona development (Long, 2009; Miaskiewicz & Kozar, 2006; Pruitt & Adlin, 2006). Several studies cite a focus on


the user as the primary benefit of personas, as seen in Table 2.1, but they also mention other significant advantages of personas, such as better communication among developers (Cooper, 1999; Dharwada et al., 2007; Miaskiewicz et al., 2008) and improved user task performance (Dharwada et al., 2007). Task performance has long been a focus of HCI research; more recently, however, there is greater interest in user perceptions and behaviors in the context of IS (Davis, 2006).

Even though the majority of the persona literature advocates the use of personas in the design process, there are skeptics. Most of the criticism focuses on the process used to create personas (described later in this study), though some critics attack the scientific underpinnings of personas. Chapman and Milham (2006) call for more research and rigorous evaluation of the personas methodology. They cite the impossibility of verifying the accuracy of personas as the most serious limitation of the process (Chapman & Milham, 2006). Although researchers supporting the use of personas have not specifically addressed this issue, they have provided anecdotal support for personas in practice as "suggestive and provocative for design instead of definitive and valid for a general average user that is actually inexistent" (Schmidt, Terrenghi, & Holleis, 2007, p. 727).


Divorced, 37
BS, Finance
Asst. VP of Finance (7 yrs.)
Jeff's Mantra: "Talk to me! I go for the discussions."
Golfs, lifts, rides Harleys

Skills
Excellent Word & Excel user (a whiz!)
Above average general computer skills
Uses Facebook and Twitter
Completed Lean Six Sigma

What's Important to Jeff
He doesn't like to re-enter information the system should already have.
He likes help features and error messages with instructions.
He sees no need for anonymity.

How Jeff Uses the CoP
Jeff logs into the CoP 3-4 times per week, usually from work, and spends 10-20 minutes looking for the latest information. He starts his search by checking the "what's new" section. He posts questions but rarely answers other users' questions. Jeff checks the profiles of users who answer his questions to evaluate their expertise. He is not interested in seeing examples of management forms used by other members.

Figure 2.1 "Jeff," a Sample Persona for a Leadership Community of Practice

Cooper (1999), the founding father of personas, insists that precision is more important than accuracy because precision is more usable for designers. For example, if the sample "Jeff" persona in Figure 2.1 were based on accuracy, he might have 2.4 children (if that were the average for his segment of the population). The fact that the statistically average user has 2.4 children may be accurate, but it is not useful for a designer. Kujala and Kauppinen (2004) also support the use of personas as realistic descriptions of typical users, not as statistical representatives of users.


Another study critical of personas addresses the efficiency and influence of personas on design decisions. Ronkko (2005) found that although designers realized some of the benefits of personas, they did not feel they were worth the expenditure of resources compared to other design approaches. Additionally, he criticizes the use of personas as a scapegoat for designers: his study found that designers used personas as a political instrument to justify design decisions (Ronkko, 2005). More recent research, however, credits personas with assisting designers in understanding user objectives, keeping designers grounded, facilitating communication within the development team, and making better design decisions (Dharwada et al., 2007; Drego, Temkin, & McInnes, 2008; Long, 2009; Ma & LeRouge, 2007).

2.4 Importance of IS Design Research in Military Domains

The military domain has often been a catalyst for the advancement of technology. In the Information Age, military analysts believe the superior application of IS leads to dominant awareness of the battlefield and a competitive military advantage over adversaries (Keohane & Nye Jr., 1998). Although military organizations funded most of the earliest IS research in the 1950s (Cook, 1996), today civilian industry plays a larger role; nevertheless, IS-related spending in the U.S. Department of Defense is more than $30 billion (Office of Management and Budget, 2008). As such, military systems are excellent candidate systems for studying the development


of personas because there are hundreds of military systems and the failure of any mission-critical system may have severe adverse impacts.

Table 2.1 Reported benefits of personas

Author(s)                   Benefits
Cooper, 1999                Allows designers to envision the design problem; promotes effective communication among design team and a shared understanding of users; decreases programming code, number of prototypes, and design cycles.
Courage & Baxter, 2005      Gives users life and keeps development team connected to users; provides common targets for requirements and design.
Dharwada et al., 2007       Improves user task performance; increases knowledge transfer among developers; clarifies user needs; helps to validate designs.
Long, 2009                  Produces designs with superior usability characteristics; strengthens the focus on the end user; directs decision making within design teams.
Ma & LeRouge, 2007          Helps the development team identify with users, communicate effectively with users, and keep focused on user needs.
Miaskiewicz et al., 2008    Facilitates understanding of users among development team; builds empathy for users; develops greater understanding of and identification with target users.
Pruitt & Adlin, 2006        Keeps the design team engaged; evokes empathy and inspires the imagination; brings fictitious users to life.


On July 3, 1988, the USS Vincennes was conducting military monitoring operations in the Strait of Hormuz when its Command and Decision System, a type of decision support system (DSS), alerted the crew to a potentially hostile, unidentified aircraft that appeared to be approaching the ship. What happened in the next few minutes would shock the world and affect naval operations for the next three decades. Seven minutes after detecting and attempting to warn the approaching aircraft, the USS Vincennes fired two missiles and shot down an Iranian civilian aircraft over the Persian Gulf, killing 290 civilians. The ensuing investigation suggested to analysts two main causal factors for the accident: poor decision making and a flawed interface between the ship's crew and the onboard DSS (Bower, 1988; Fisher & Kingma, 2001).

One outcome of the Iranian aircraft incident was a complete re-design of the Navy ship's IS, including the DSS in question. Some of the problems with the original system included a lack of live video feeds, poor display of critical information on the screens, poor use of lighting around the display, and, most importantly, a lack of threat aircraft altitude information on the screen (United States Senate, 1988). System developers designed a new system by completely overhauling the previous system's human-computer interface. The accident also provided a lesson for all current and future military information systems, including KMS and other non-combat systems: interface design is a critical success factor that requires a focus on the system users to understand the interaction between users and systems.


Enterprise resource planning (ERP) systems have been the focus of considerable research in the last decade due to high costs and level of difficulty for development and implementation (Haines & Goodhue, 2003). One of the largest ERP system projects in the world appears to be failing after costing American taxpayers nearly $2 billion in the past decade. The Defense Integrated Military Human Resources System (DIMHRS) is a congressionally mandated ERP program being developed to support over 2.5 million service men and women in the U.S. military. Originally conceived in 1995 as a replacement for 80 separate manpower, personnel, and pay processing systems, DIMHRS development began in 2003. After five years of development, the U.S. Government Accountability Office (2008) reported serious concerns regarding the design of the system. In May of 2009, the Department of Defense announced that it will require only two of the four services (Army and Air Force) to implement the system, while the other half of the military will use other systems.

The USS Vincennes incident and the DIMHRS program are examples of the failure of information systems in the military domain. Although they are extreme cases, they highlight how interface design failures may negatively impact the development and implementation of information systems in the military. KMS are ideal settings to examine personas, especially in the military, because there is an increasing reliance on KMS to assist military decision makers in dealing with complex


challenges in operational and tactical environments (Cianciolo, Hei Prevou & Psotka, 2005).

2.5 Persona Development

This research is focused on a method and system of clustering users into a few manageable groups; however, before exploring the various quantitative and qualitative clustering techniques used in developing personas, it is important to understand the process for developing personas and where these clustering methods are utilized. Persona development teams (PDTs) pursue different approaches to developing personas, each with its own advantages and disadvantages. The persona development approaches generally share four steps:

1. Identify target users;
2. Collect user data;
3. Group users into personas;
4. Create and present persona details.

2.5.1 Step 1: Identify Target Users

Since personas are fictitious users based on real system users, developers first identify the population of real users, then select target users for data collection. PDTs usually conduct some version of market segmentation to identify target users (Clancy & Krieg, 2000; Nieters, Ivaturi, & Ahmed, 2007; Pruitt & Grudin, 2003).


For example, the target users for an airline entertainment system include three user segments with different needs: flight staff, technical staff, and passengers (Cooper, 1999). Market segmentation is a common practice in design for classifying users according to demographics (Proctor, 2000), but Sinha (2003) advocates the use of quantitative techniques (simplified through the use of statistical software packages) to segment user groups based on their information needs. PDTs in a number of persona development studies use existing marketing data (such as sales, customer satisfaction surveys, and advertising) or human resources data to determine appropriate target users (Grudin & Pruitt, 2002; Lindgren, Fang, Amdahl, & Chaikiat, 2007; Mulder & Yaar, 2007).

After PDTs determine user segments, they sample individuals in the segments for data collection. Most of the persona literature relies on qualitative data collection methods and nonrandom sampling of target users to obtain persona details; however, Kujala and Kauppinen (2004) recommend employing more rigorous methods of selecting representative users, such as random or stratified sampling. Other design researchers believe that PDTs can create representative personas with small sample sizes and less rigorous (non-statistical) sampling methods (Hackos & Redish, 1998; Schmidt et al., 2007). Beyer and Holtzblatt (1998) suggest a sample size of 6-20 users is sufficient for creating representative personas. Kujala and Mantyla (2000) find that as few as six users can provide useful details for persona development.
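To make the stratified sampling option concrete, the following is a minimal sketch in Python rather than a procedure taken from any of the cited studies. It assumes proportional allocation with at least one user per segment. The function name `stratified_sample`, the segment sizes, and the user identifiers are hypothetical and chosen only for illustration, reusing the three airline segments mentioned above.

```python
import random


def stratified_sample(users_by_segment, total, seed=0):
    """Draw roughly `total` users, allocating draws to each segment
    in proportion to its size (hypothetical helper, not from the source)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    population = sum(len(users) for users in users_by_segment.values())
    sample = {}
    for segment, users in users_by_segment.items():
        # Proportional allocation, but never fewer than one user per
        # segment and never more than the segment contains.
        k = max(1, round(total * len(users) / population))
        k = min(k, len(users))
        sample[segment] = rng.sample(users, k)
    return sample


# Illustrative segment sizes (assumed, not reported in the literature).
segments = {
    "flight staff": [f"flight-{i}" for i in range(40)],
    "technical staff": [f"tech-{i}" for i in range(10)],
    "passenger": [f"passenger-{i}" for i in range(150)],
}

picked = stratified_sample(segments, total=20)
for segment, users in picked.items():
    print(segment, len(users))
```

Because each segment's share is rounded separately, the combined sample can differ slightly from the requested total when segment sizes do not divide evenly.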


2.5.2 Step 2: Collect User Data

PDTs often collect both qualitative and quantitative data to use throughout the persona development steps, even though clustering is currently done using only a single data type; data not used in clustering is normally used later in step four to add persona details (Drego et al., 2008; Javahery et al., 2007; Mulder & Yaar, 2007; Sinha, 2003). As depicted in Figure 2.2, qualitative and quantitative research methods are used to capture similar data measuring different aspects of users' mental models for persona creation. Cooper (1999) assumes designers will use qualitative data to create personas, and Goodwin (2002) strongly recommends it. Pruitt and Grudin (2003) support iterative data collection and persona development using qualitative and quantitative methods to improve the selection, enrichment, and evolution of personas (p. 8).

Quantitative data can be collected through direct means such as surveys, usability testing, and system usage, or through data mining organizational records. Interviews, field studies, focus groups, and ethnographic studies provide valuable means of collecting qualitative data. The interview method of data collection was the most commonly used method in 27 persona studies that disclosed at least some of the data collection methods employed.


Figure 2.2 Qualitative Versus Quantitative Data for Persona Development (figure labels: Qualitative Data; Interviews; Usability Testing; Goals & Needs; Behaviors; Surveys; Online Activity; Quantitative Data)

2.5.3 Step 3: Group Participants into Personas

The goal of this step is to combine users into a few manageable groups, which ultimately become personas, based on similar user characteristics such as goals, needs, frustrations, and behavior. This step is critical to the effectiveness of the eventual personas because it should capture the needs of all users interviewed, identify key differences between users, and result in clusters that are easy to describe to system interface designers (Mulder & Yaar, 2007). PDTs combine participants (system users) into groups using either qualitative or quantitative clustering techniques. The persona grouping technique employed depends on the type of data, qualitative or quantitative, that PDTs gather on users in step 2.


Qualitative clustering is accomplished through manual or semi-automated techniques, whereas quantitative clustering is only done through semi-automated techniques using statistical software. These techniques are discussed in greater detail later in this study.

Determining the right number of personas is challenging for PDTs because they have to consider the multidimensionality of users as well as the ability of designers to satisfy the different groups' interface needs. Kujala and Kauppinen (2004) conducted seven case studies on user needs and found that in most cases user needs were largely common. Most persona literature, as summarized in Table 2.2, results in the creation of three or four personas for a project (Hoekman Jr., 2006). Chapman (2005) claims to have observed designers use up to 50 personas at once. The difference between design teams using a few personas and those using nearly 50 is experience: design teams with experience in the process recommend creating between three and six personas (Pruitt & Grudin, 2003).


Table 2.2 Number of personas created in persona studies

Author(s)                  Organization                            Product Type                                    Number of Personas
Almirall et al., 2010      Open University of Catalonia            Multi-format books                              4
Cooper, 1999               Remedy                                  Help desk management application                3
Cooper, 1999               Sony                                    In-flight entertainment system                  4
Dantin, 2005               University of Auckland, NZ              Educational software                            2
Drego et al., 2008         Thornburg Mortgage                      Client web site                                 3
Drego et al., 2008         St. Jude Children's Research Hospital   Donor web site                                  4
Javahery et al., 2007      Biomedical Research Community           Web-based application for biomedical research   3
LeRouge & Ma, 2010         Medical research consortium             Diabetic assistance device                      2
Lindgren et al., 2007      Safety institute                        Driving assistance system                       4
Ljungblad et al., 2006     Robotic development                     Personal embodied agents                        4
McGinn & Kotamraju, 2008   Sun                                     Training programs                               11
Miaskiewicz et al., 2008   Undisclosed university                  Information repository                          4
Naghshin et al., 2003      Concordia University                    Software development training                   2
Nieters et al., 2007       Cisco                                   Hardware for office productivity                3
Panke et al., 2007                                                 Educational portal                              4
Powell et al., 2007        Software company                        Web-based spreadsheet                           4
Pruitt & Grudin, 2003      Microsoft                               Web browser                                     4
                                                                   Operating system                                6
Randolph, 2004             Anderson Community Hospital             Training management system                      3
Sinha, 2003                Restaurants                             Web-based restaurant search                     4


2.5.4 Step 4: Create and Present Persona Details

In the last step of persona creation, PDTs give the groups names and photos and add personal details (age, gender, education, hobbies, skills, motivations, goals, frustrations, and other characteristics) to bring the personas to life. User profiles are another UCD technique that designers have used to conceptually model system users (Hackos & Redish, 1998); however, profiles lack meaningful, realistic summaries of users, and designers have a difficult time remembering user details (Pruitt & Adlin, 2006). Personas are more memorable and evoke empathy in designers because they are vivid and lifelike representations of real people, even though they are fictitious (Norman, 2004).

Although persona creation up to this step may rely on scientific approaches such as quantitative market segmentation or cluster analysis, this last step requires interpretation of the results. This step in the persona creation process has the potential to be subjective, especially if the data is qualitative, because in many cases there is no way to determine an average value for a characteristic to populate a persona's list of attributes. Chapman and Milham (2006) refer to this as the curse of dimensionality for personas. For example, several users within a persona may express similar frustrations with a system, but there is no way to quantify an average qualitative value for that persona's frustration with the system. One solution for many researchers to express a persona's characteristics is to use real information


from actual users to populate the fictional persona's description (Pruitt & Grudin, 2003). They accomplish this by selecting one or several similar users within a persona and using either verbatim responses or paraphrased themes for frustration as the input value for the persona. Still, some researchers feel they have an artistic license to create a persona as they envision him or her in order to be provocative, without directly attributing the persona's characteristics to real user data (Kantola, Sauli, Katri, & Tomi, 2007; Schmidt et al., 2007). Regardless of the technique used to represent and present personas, the goal is to create realistic, memorable personas to help the designers connect with users in order to create more effective and efficient system interfaces (Cooper & Reimann, 2003; Grudin & Pruitt, 2002).

2.6 Comparison of Qualitative and Quantitative Persona Clustering Methods

The use of personas is on the rise for the design of information systems, but although Cooper introduced the concept a decade ago, there is little research into the specific development techniques and procedures (Long, 2009). As previously discussed, step 3 of the persona development process is critical to the success of resulting personas because it identifies groups of users sharing similar goals, needs, frustrations, and behavior: the aspects of users' mental models upon which personas are based. Therefore, it is imperative to compare existing qualitative and quantitative persona clustering methods to understand how PDTs accomplish this important task and evaluate the effectiveness and efficiency of these methods.


Persona identification and clustering currently rely on only one type of data, and qualitative techniques are used most frequently. Qualitative clustering techniques include manual and semi-automated methods. Manual methods require human judgment to identify users with similar characteristics; semi-automated methods rely on the use of statistical software for analysis. Using qualitative data, researchers usually cluster participants using manual techniques such as affinity diagrams, card sorting exercises, and expert panels (Broschinsky & Baker, 2008; Lindgren et al., 2007).

Researchers advocating the manual approach to clustering prefer the use of rich qualitative data from interviews or observations (Cooper & Reimann, 2003; Goodwin, 2002). Pruitt and Adlin (2006) recommend grouping participants by themes and relationships, then determining which ones are most important to the project objectives. Mulder and Yaar (2007) suggest a collaborative qualitative process using various stakeholders gathered in a room with plenty of white boards to develop clusters based on goals, usage lifecycle, or a combination of behaviors and attitudes (p. 123). Goodwin (2009) also recommends using as much work space as possible to lay out all of the user data and sort them across a continuum by roles, behaviors, and demographics.

Although the manual qualitative method of clustering users into personas is popular among PDTs, it has several drawbacks. First, as the number of participants and textual data grows, it becomes difficult for human experts to make objective judgments and trace their findings back to user data (Pruitt & Adlin, 2006).


Although humans may have qualitative and quantitative data available for making clustering decisions, they tend to focus on qualitative data for insight on a few users at the expense of a broader understanding of users through the analysis of quantitative data (Sinha, 2003). A second and related drawback is that manual methods rely on the ability of humans to understand complex, multi-dimensional relationships between variables under study. Several studies indicate that humans can only process a handful of variables or factors when discerning relationships between those factors (Halford, Baker, McCredden, & Bain, 2005; Miller, 1956). Strehl (2002) also notes that although clustering objects is a routine human function, we are quickly overloaded and unable to discover buried knowledge.

A third drawback of the manual qualitative method of persona clustering is that, despite recent attempts to provide guides to persona development, several of the most authoritative qualitative persona development methods remain proprietary (Ndiwalana et al., 2005). Cooper introduced the IS design world to personas, but he saved the details for use in his interaction design consulting firm. The final drawback is that the manual clustering methods may require extensive resource commitments from organizations. Development teams require specialized skills in qualitative research methods, and it may take considerable time and money to obtain results (Javahery et al., 2007; Miaskiewicz et al., 2008; Sinha, 2003).


Some critics of the manual qualitative persona clustering techniques advocate more automated methods that they believe address the rigor and resource concerns of the manual qualitative method. Although the manual qualitative method of clustering dominates the persona literature, recent efforts to create semi-automated clustering techniques using qualitative or quantitative data have captured the attention of researchers and practitioners (McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008).

Miaskiewicz et al. (2008) recently applied latent semantic analysis (LSA) to demonstrate the effectiveness of a semi-automated qualitative persona clustering technique. LSA is considered a qualitative method because it requires the collection of textual data such as interviews or observations, even though the analysis is performed by software and the results are displayed as quantitative data. Derived from theory similar to factor analysis, LSA is defined by Landauer, Foltz, and Laham (1998) as a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (p. 2). LSA determines similarity of word passages by only using the contexts in which words appear and do not appear; it has been shown to make judgments similar to those of humans in standardized tests and essay grading (Landauer et al., 1998).
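As a rough illustration of how passages can be compared numerically, the sketch below computes pairwise cosine similarity between short interview excerpts in Python. It is deliberately simplified and is not the LSA procedure itself: it works on raw term counts and omits the dimensionality reduction that LSA applies against a large background corpus, so it captures only the final comparison step. The function names and the three interview excerpts are invented for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count word occurrences (bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_matrix(texts):
    """Pairwise cosine similarity between every pair of texts."""
    vectors = [term_counts(t) for t in texts]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Invented interview excerpts, one per participant.
interviews = [
    "I want quick answers and I hate re-entering information",
    "I hate entering the same information twice",
    "I read the latest posts every morning",
]

matrix = similarity_matrix(interviews)
for row in matrix:
    print(["%.2f" % value for value in row])
```

In this toy data the first two participants share more vocabulary with each other than either shares with the third, so their similarity is higher. A full LSA pipeline would first project the term vectors into a reduced semantic space, which lets passages with related but non-identical wording score as similar.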


Miaskiewicz et al. (2008) introduce a five-step LSA methodology for creating personas that the authors claim is less subjective, more efficient, and less reliant on specialized skills than other methods of persona development (p. 1501). Researchers collect user goals, needs, and frustrations through interviews that are transcribed into text. LSA software (found at http://lsa.colorado.edu/) compares text from user interviews and displays a cosine matrix that represents the similarity among the interview participants (system users). Miaskiewicz et al. (2008) implement an LSA methodology for identifying personas that results in clusters that substantially agree with those created by experts. The limitations of the LSA methodology are that it requires knowledge of specialized research tools and processes, and its ability to compare participants depends on the corpus of text employed in the analysis.

Quantitative clustering techniques are capable of determining the relationships between multiple variables to find patterns in user data that may be latent or unobservable to the human eye. The quantitative clustering techniques found in the persona literature are all semi-automated methods because they rely on statistical software to identify clusters of users; the techniques used in the literature are factor analysis (McGinn & Kotamraju, 2008), principal components analysis (Sinha, 2003), and multivariate cluster analysis (Almirall, Rivera, & Valverde, 2010; Javahery et al., 2007). Proponents of quantitative techniques assert they overcome some of the drawbacks of qualitative clustering: subjective assignment decisions and a lack of


rigor, the need for experience in qualitative research training, cognitive limitations of humans, and considerable resource commitments (Javahery et al., 2007; McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008; Mulder & Yaar, 2007; Sinha, 2003). Although quantitative clustering techniques rely on statistical software for analysis, PDTs must still collect data, prepare it for analysis, and interpret the results. The quantitative data required for these techniques must be collected as numbers or converted to numbers in order to use statistical software to produce clusters. PDTs gather quantitative data through surveys (measures of goals, needs, and frustrations), system transaction logs (usage), or organizational records (demographics, education levels, skills, and other descriptive data).

As shown in Figure 2.3, the quantitative clustering techniques can be found in only a few cases in the literature, as indicated by the numbers representing the number of times each method is found in the literature; therefore, each technique is explained below in general mathematical terms and as implemented by the authors. Table 2.3 summarizes advantages and disadvantages of all the qualitative and quantitative persona clustering techniques discussed in this study.

Sinha (2003) was the first to propose a semi-automated quantitative technique for persona clustering. His approach implements Principal Components Analysis (PCA), a data reduction technique, to identify 3 components (groups of users) for an online restaurant finder based on 32 dimensions of the restaurant experience


(Sinha, 2003). PCA reduces the original variables into new components that convey as much of the original data as possible (Morgan, Leech, Gloeckner, & Barrett, 2007). The author claims his PCA approach provides a direct link from user surveys to the identification of persona clusters while nearly automating the task (Sinha, 2003).

[Figure 2.3 Persona Grouping Techniques Found in 27 Studies. Labels recoverable from the figure: Semi-automated Techniques, Principal Components, Factor Analysis, Latent Semantic Analysis, Cluster Analysis.]

Factor Analysis (FA) is a data reduction technique concerned with identifying the latent structure among multivariate data (Hair, Black, Babin, Anderson, & Tatham, 2006). McGinn and Kotamraju (2008) apply FA to identify latent groupings of


system users based on work tasks and demographics. They claim their FA clustering method is "fast and cheap, easy to scope," and results in data-driven clusters derived from statistically significant sample sizes (McGinn & Kotamraju, 2008, p. 1521). The main limitation of FA as applied to persona clustering is that, after identifying clusters, PDTs still need to conduct interviews to validate the clusters and collect qualitative data to explain why users behave in ways identified by FA.

Depending on the number of variables in the analysis and the communalities among them, FA and PCA may arrive at similar results (Tabachnick & Fidell, 2007). Additionally, FA and PCA rely on similar algorithms and the interpretation is similar, except that FA assumes there are unobservable underlying factors shaping the data, whereas PCA makes no assumption about the meaning of components. Therefore, in the rest of this paper they are treated as a single method for persona clustering.

Like FA/PCA, cluster analysis (CA) is capable of quickly analyzing multivariate data and determining relationships between participants that would be difficult, if not impossible, for humans to identify through visual inspection. (Cluster analysis as a generalized pattern recognition technique, as opposed to the specific statistical CA method, is discussed in more detail in the next section.) Mulder and Yaar (2007) describe how to use pivot tables in spreadsheets to sort users based on selected characteristics and perform a CA-like clustering effect, but their technique does not


group users using multivariate data; it merely provides a different way to visualize the data.

Table 2.3 Comparison of existing persona clustering techniques

| Technique | Approach | Data Type | Advantages | Disadvantages |
|---|---|---|---|---|
| Manual | Qualitative | Textual | Rich qualitative data; human judgment | Humans can be overloaded by quantity of data; considered subjective; time consuming; requires expertise |
| LSA | Qualitative | Textual | Human-like judgment; semi-automated; objective; data-driven | Analysis depends on corpus of text used to compare interviews; requires knowledge of LSA tools |
| FA/PCA | Quantitative | Numeric, interval | Semi-automated; objective; data-driven; identifies latent factors | Quality of the analysis depends on the ability to capture desired goals, needs, frustrations, and behavior |
| CA | Quantitative | Numeric, ratio | Semi-automated; objective; data-driven; multiple algorithm options | Clusters any input data, even if there is no real underlying structure |
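The FA/PCA row of Table 2.3 rests on a data reduction step: many rating-scale variables are replaced by a few components that retain most of the variance. A minimal sketch with NumPy follows. The synthetic ratings, the number of items, and the helper name pca are illustrative assumptions, and the sketch is not the procedure of Sinha (2003) or McGinn and Kotamraju (2008).

```python
import numpy as np

def pca(data, n_components):
    """Return component scores and explained-variance ratios."""
    centered = data - data.mean(axis=0)          # center each variable
    # SVD of the centered data; rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2                            # proportional to eigenvalues
    ratios = variance / variance.sum()
    scores = centered @ vt[:n_components].T      # project users onto components
    return scores, ratios[:n_components]

# Synthetic survey: 6 users x 4 rating-scale items (hypothetical data).
# Items 0-1 tend to move together, as do items 2-3.
ratings = np.array([
    [5, 5, 1, 2],
    [4, 5, 2, 1],
    [5, 4, 1, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [1, 1, 5, 5],
], dtype=float)

scores, ratios = pca(ratings, n_components=2)
```

With data like these, the first component carries most of the variance, and the sign of each user's score on it separates the first three users from the last three. The sign of a component is arbitrary, so only the split matters, not which group is positive.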


There are, however, two studies in the literature that analyze multivariate data using clustering methods automated in statistical software. In the most recent study, Almirall et al. (2010) use an online survey to capture student attitudes regarding academic life and technology; they use CA to identify four user groups that they draw on as the basis for personas to design online books. In the second study, Javahery et al. (2007) sample 22 users of a biomedical system and identify three clusters based on iterative cluster analysis: the first iteration of CA clusters users based on domain experience and system use; the second iteration clusters using age as a factor. Their technique demonstrates the use of readily available user data, such as skills, experience, system usage, and demographics, to quickly collect data and identify clusters.

2.7 Cluster Analysis and Ensemble Clusters

2.7.1 Cluster Analysis as a Pattern Recognition Technique

Pattern recognition is a challenging problem in everyday situations. One mode of human intelligence is the ability to form meaningful groups of cases or objects given a variety of data (Strehl, 2002). However, as data complexity and volume increase, humans are quickly overloaded and unable to discern patterns. Psychologist George Miller (1956) ignited conversation and research about the limits of human cognition when he stated that humans were limited to a short-term memory consisting of seven elements. A more recent study by Halford, Baker,


McCredden, and Bain (2005) illustrates the inability of humans to discern the relationship between more than three variables. For these reasons and more, humans often turn to computer-based tools to mimic the human ability to recognize patterns through a technique known as cluster analysis.

Cluster analysis is a multivariate analysis technique that uses a number of different algorithms and methods to arrange objects into similar groups according to the characteristics they possess. The different types of clustering algorithms include hierarchical, partitional, and density-based. Cluster analysis is a common tool often used in exploratory experimental science that seeks to recognize patterns in data (Witten & Frank, 2005). The goal of cluster analysis is to group objects so that each object is similar to the other objects in the cluster and dissimilar from objects in the other clusters. Cluster analysis groups objects into clusters according to a distance measure, which measures either similarity or dissimilarity.

Within the field of machine learning, cluster analysis is a form of unsupervised learning where the class labels are unknown and only raw inputs are given. By contrast, supervised learning uses labeled data to train a classifier to classify or predict future objects according to observed patterns. The unsupervised learning equivalent of a classifier is known as a clusterer. Cluster analysis is appropriate when class labels are unavailable or when labels are subject to human error or prejudice.
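As one concrete instance of a partitional clusterer driven by a distance measure, the following is a minimal k-means sketch in plain Python. It is illustrative only: Euclidean distance, the use of the first k points as starting centroids, and the toy two-variable user data are assumptions, and nothing here indicates which clustering algorithm any cited study used.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]    # naive deterministic start
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:                          # keep old centroid if empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:                 # converged: no reassignment
            break
        labels = new_labels
    return labels, centroids

# Hypothetical users described by two scaled usage variables.
users = [(1.0, 1.2), (8.0, 8.5), (1.3, 0.9), (7.6, 9.1), (0.8, 1.1), (8.4, 7.9)]
labels, centroids = kmeans(users, k=2)
```

No class labels are supplied at any point: the groups emerge only from the distances between users, which is what makes the procedure unsupervised. The same property means the algorithm returns k groups even when the data have no real structure.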


Cluster analysis is sometimes criticized because it will identify clusters based on any input data, even if there is no underlying structure in the data. Additionally, no single algorithm is universally accepted for discovering structure in multidimensional data; different approaches may reveal different clustering solutions (Jain, Murty, & Flynn, 1999). Despite these limitations, cluster analysis has a rich history in many disciplines and has been used in every research setting imaginable to group objects (Hair et al., 2006, p. 561). Electronic commerce is just one example of a data-rich domain where cluster analysis can be used for data mining in order to discover consumer behavior and cluster consumers into meaningful groups for understanding effective web design and other ways to maximize profit. Another domain of interest is online communities of practice (CoPs), because there is a plethora of trace data indicating user actions and behavior in knowledge sharing domains.

2.7.2 Cluster Ensembles

Cluster ensembles are combinations of base clustering algorithms or methods used to improve clustering performance. Several researchers have experimentally demonstrated that some cluster ensembles routinely outperform single-method clustering (Boulis & Ostendorf, 2004; Kittler, Hatef, Duin, & Matas, 1998; Turner & Agogino, 2008). In order to achieve better performance results, ensemble clusters combine diverse methods; diversity can be measured by the extent that the


clusters use different sources of data or implement different inference engines in the algorithms (Strehl & Ghosh, 2002). The goal of a cluster ensemble is to combine clusterers in a way that leverages the advantages of each base method or algorithm without inheriting their weaknesses. IS researchers advocate combining analytic methods to provide different views of reality and richer understandings of IS phenomena (Scott & Walczak, 2009). Cluster ensembles are common in several disciplines, including economics, statistics, oncology, pathology, machine learning, data mining, computer vision, bioinformatics, biometrics, and many others (Fred & Jain, 2005; Granger & Lee, 1989).

Diversity in ensemble clusters can be explained using an example from sensor fusion. Combining data from multiple sensors provides significant advantages over single-source data and a more complete view of the object in question (Hall & Llinas, 1997). For example, in the military domain, several independent threat aircraft sensors detect different pieces of information that, when combined, provide a more complete view of the threat. One sensor may detect airspeed, another may detect altitude, and yet another may detect the temperature of the engine. Separately, information from the sensors may indicate several types of aircraft, but when combined, the sensors provide a more complete signature of a specific threat aircraft. Diversity in this case comes from sensing different attributes or features of


the aircraft. Diversity could also be achieved by using different clusterers, each with different biases in the algorithms, on the same features (Strehl & Ghosh, 2002).

Ensemble methods in supervised learning are learning algorithms that construct a set of classifiers and then classify new data points by combining the predictions of base classifiers (Dietterich, 2000). Cluster ensembles in unsupervised learning environments are not as common as classifier ensembles in supervised learning domains, but the concepts are similar (Strehl & Ghosh, 2002; Turner & Agogino, 2008). The idea behind combining multiple base classifiers or clusterers into ensembles is that the base methods may offer complementary information that, when combined with other methods, may improve on the performance of any single method (Kittler et al., 1998). By taking different approaches to the computational space, the combination of base classifiers may provide more accurate results. Representational issues can be corrected by ensemble methods because different methods allow the ensemble to have an expanded hypothesis space. By expanding the space, the test observation may be more likely to be classified correctly. Nguyen, Mannino, Gardiner, and Cios (2008) demonstrate this concept in the domain of bioinformatics by developing a new hybrid algorithm that combines clustering and fuzzy cognitive maps for predicting protein functions.
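The combination step for classifier ensembles can be illustrated with plurality voting, one of the simplest combination rules. The sketch below is an assumption made for illustration: the three threshold "classifiers", their names, their cutoffs, and the class names are hypothetical and do not come from any study cited above.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Classify x by the most common prediction among base classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three hypothetical base classifiers, each looking at a different feature
# of a forum member given as (page_views, downloads, contributions).
by_views = lambda m: "active" if m[0] >= 100 else "casual"
by_downloads = lambda m: "active" if m[1] >= 50 else "casual"
by_contributions = lambda m: "active" if m[2] >= 5 else "casual"

ensemble = [by_views, by_downloads, by_contributions]
prediction = majority_vote(ensemble, (120, 10, 7))   # two of three say "active"
```

Each base classifier sees only one feature, so the ensemble's diversity comes from the data source, much like the sensor fusion example. Voting works directly here because all classifiers share the same class names; cluster labels carry no such shared meaning, which is what makes combining clusterers harder.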


Combining base clusterers into cluster ensembles can take place at either the data or the decision level. Data level integration allows the cluster ensemble access to the original features of the data. In this case there is a plethora of options, some of which include averaging proximity measures (Agrawal, 2008; Kustra & Zagdanski), weighting features (Hall & Llinas, 1997), fuzzy ensembles (Avogadri & Valentini, 2009), and neural network ensembles (Hashem & Schmeiser, 1995; Kittler et al., 1998). Decision level fusion is a case of knowledge reuse where the only input data for the cluster ensemble are the labels from the base clustering results; the original features of the data are not used in the cluster ensemble. Since only the labels are available for analysis, there are few options for combining the data into an ensemble cluster method. A potential problem with combining clusterers at the decision level is that cluster labels are symbolic; therefore, cluster ensemble designers must solve label correspondence issues (Fred & Jain, 2005).

In the unsupervised learning domain there are two broad categories of combination mechanisms at the decision level: greedy optimization methods and graph-based methods. Examples of greedy optimization methods include voting-based consensus clustering (Ayad & Kamel, 2009), voting active clusters (Turner & Agogino, 2008), and evidence accumulation clustering (Fred & Jain, 2005). Because greedy optimization methods seek to maximize a consensus function, such as the average normalized mutual information (ANMI) among clusterings, computation is typically slow. Graph-based methods often rely on


algorithms based on intuitive heuristics; as such, they tend to be more efficient than greedy optimization methods. Strehl and Ghosh (2002) developed three such hypergraph algorithms and a supra-consensus function that, when combined, serve as an effective and efficient cluster ensemble for unsupervised learning environments. Whether the problem domain is supervised or unsupervised learning, it is not known a priori which approaches to combining algorithms will be superior (Franke & Mandler, 1992). This is what makes ensemble clustering a challenging topic for research and, in particular, for this study.
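A minimal sketch of decision-level combination follows, in the spirit of evidence accumulation clustering (Fred & Jain, 2005): the only inputs are the label vectors of the base clusterers. Counting how often two objects share a cluster sidesteps the label correspondence problem, because the counts do not depend on what each cluster is called. The 0.5 threshold, the connected-components grouping (a simplification of a hierarchical step), the toy label vectors, and all function names are assumptions for illustration. This is not the hypergraph approach of Strehl and Ghosh (2002), and it is not the method developed in this study. The anmi helper scores a candidate labeling by its average normalized mutual information with the base clusterings, with mutual information normalized by the geometric mean of the two entropies.

```python
import math
from collections import Counter

def co_association(labelings):
    """Fraction of base clusterings that put objects i and j together."""
    n = len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
             for j in range(n)] for i in range(n)]

def consensus(labelings, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the base runs."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))

    def find(i):                       # union-find root lookup
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(j)] = find(i)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Mutual information normalized by the geometric mean of entropies."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    denom = math.sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0

def anmi(candidate, labelings):
    return sum(nmi(candidate, lab) for lab in labelings) / len(labelings)

# Three hypothetical base clusterings of six users; cluster names differ.
base = [
    [0, 0, 0, 1, 1, 1],
    ["b", "b", "b", "a", "a", "a"],
    [0, 0, 1, 1, 2, 2],
]
combined = consensus(base)
```

In this toy case the first two base clusterings describe the same partition under different names, and the consensus recovers it. A greedy method would instead search over candidate labelings for one that maximizes a score such as anmi, which helps explain why such methods tend to be slower.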


3. Research Hypotheses and System Attributes

The main focus of this research is to develop two IT artifacts: a semi-automated mixed analyses persona clustering method and a system that implements the method. Hevner et al.'s (2004) DSR Guideline 3 requires that the efficacy of a design artifact be demonstrated by gathering and analyzing data and evaluating the artifact in terms of performance or other defined metrics. In this chapter, four design attributes of the System for Persona Ensemble Clustering (SPEC) are explained and linked to the two primary measured goals of this research, effectiveness and efficiency, as depicted in the system model shown in Figure 3.1. These two goals are in turn developed into hypotheses. This chapter begins with a brief explanation of the four design attributes, followed by a more elaborate discussion of how they should assist in achieving the system goals. The design attributes and system goals are also central to the justification of SPEC and are therefore discussed in more detail in a subsequent chapter.


[Figure 3.1 SPEC System Design Attributes and Goals. Panels: Design Attributes; System Goals.]

In designing this new method of persona clustering, certain design attributes should be incorporated into the system to assist in accomplishing its goals. The following attributes are designed into SPEC:

1. Aggregative: it is capable of analyzing multiple sources of qualitative and quantitative information.
2. Flexible: it is capable of reusing knowledge from various clustering methods.
3. Scalable: it can handle a wide range of system users and clustering algorithms.


4. Unsupervised: it does not rely on supervised learning algorithms or significant human judgment.

It is anticipated that a system that possesses the above four attributes will achieve the primary goals of this research:

1. It must be effective, so that clustering decisions approximate human expert decisions and outperform other semi-automated methods.
2. It must be efficient, so that fewer resources are required to cluster users into personas and support other phases of persona development.

A system designed with the four attributes described above is needed to address the problems of current systems. Hevner et al. (2004) define a problem as the difference between a goal state and the current state of a system. In the domain of persona clustering, the problem is the gap depicted in Figure 3.2 that exists between the goal state (a more automated mixed analyses system that is effective and efficient) and the current state of existing methods and systems. None of the five semi-automated studies in the persona literature attempt a mixed analyses approach with concurrent analysis of different data types, though they do implement automation to different degrees; thus, the markers for these studies are mapped accordingly in Figure 3.2. Cooper and Reimann (2003) and Goodwin (2009) represent the traditional studies using manual qualitative clustering methods, and they are also


mapped accordingly. Mulder and Yaar (2007) propose two types of persona clustering: manual qualitative and a semi-automated qualitative technique that subsequently utilizes quantitative data to validate the clusters. This study therefore develops a more automated persona clustering method that aggregates and concurrently analyzes qualitative and quantitative data and should be more effective and efficient than existing methods.

[Figure 3.2 Key Studies on Existing Persona Clustering Techniques Mapped According to Research Approach and Automation. Axes: Approach (Qualitative to Quantitative); Automation (Low, manual, to High, automated). Studies plotted: McGinn & Kotamraju (2008), Almirall et al. (2010), Sinha (2003), Javahery et al. (2007), Mulder & Yaar (2007), Goodwin (2009), Miaskiewicz et al. (2008), Cooper (2003).]

3.1 Hypotheses

The first two research questions relate to understanding the different methods of persona clustering; thus, no empirical study is needed to address these questions.


The third and fourth questions consider the performance of new and existing clustering methods; answering these questions does require an empirical study. Since performance is the key objective here, it is defined in terms of being effective and efficient.

3.1.1 Effective

In relation to the research questions, the first component of performance is that a system of persona clustering must be effective. The purpose of automating a persona clustering technique is to save resources and reduce subjectivity (Miaskiewicz et al., 2008), but there would be no point in designing such an automated system unless it was effective. Although there is no gold standard for comparing persona clustering methods, the proposed and existing methods may be compared using a baseline clustering obtained from expert judgment. A new clustering technique needs not only to approximate the clustering judgment of experts but also to outperform other semi-automated methods.

In order to help achieve the first system goal of effectiveness, SPEC is designed to be aggregative, or capable of combining and analyzing different types of data from disparate sources. The term mixed analyses also applies to a system designed to be aggregative because it concurrently analyzes qualitative and quantitative data in order to make clustering decisions. It is common for system developers to have large amounts of qualitative and quantitative data on users during the interface


design process (Mulder & Yaar, 2007); however, there are no known cases in the literature of simultaneously combining these data to identify groups of users. Current semi-automated methods use only limited segments of the data features, whether qualitative or quantitative, and manual qualitative methods are limited by human cognition (Halford et al., 2005). Persona clusters should be the result of aggregation and triangulation, as different sources of data provide complementary information. The aggregation of data and research approaches shown in Figure 3.3 serves as a metaphor for achieving effective clustering. The SPEC method of persona clustering aggregates the clustering decisions of multiple clustering methods to triangulate the true groupings of IS users; as such, it is able to reap the benefits of the individual methods without inheriting their weaknesses. It is anticipated that a system designed to be aggregative will help SPEC be more effective than existing persona clustering methods. Therefore, this first goal is stated as the first hypothesis (H1).

H1: The SPEC system will be more effective than the existing semi-automated persona clustering methods (LSA, FA/PCA, and CA).
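One way to compare a method's clusters against an expert baseline is a pair-counting agreement index that ignores what the clusters are named. The sketch below uses the adjusted Rand index as an assumed stand-in; this section does not state which effectiveness measure the study actually uses, and the label vectors and method names are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """Chance-corrected pair agreement between two labelings (1.0 = identical).

    Assumes both labelings cover the same objects and at least two objects.
    """
    n = len(truth)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    maximum = (pairs_truth + pairs_pred) / 2
    if maximum == expected:            # degenerate case, e.g. one cluster each
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)

expert = [0, 0, 0, 1, 1, 1, 2, 2]      # hypothetical expert baseline
method_a = [2, 2, 2, 0, 0, 0, 1, 1]    # same partition, different names
method_b = [0, 0, 1, 1, 1, 2, 2, 2]    # partly different partition

ari_a = adjusted_rand_index(expert, method_a)
ari_b = adjusted_rand_index(expert, method_b)
```

Under such an index, a hypothesis like H1 would be supported for a given data set if the new method's agreement with the expert baseline exceeded that of each competing method. Any other label-invariant agreement measure could be substituted without changing the logic of the comparison.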


[Figure 3.3 Aggregating Complementary Information to Discover Patterns Similar to Humans. Labels: Goals & Needs (Quantitative); Goals & Needs (Qualitative); System Use (Quantitative).]

3.1.2 Efficient

The second component of performance relating to the research questions is that a persona clustering method must be efficient. The term "semi-automated" is used for a persona clustering technique designed with the system attributes described here because it relies on computer-based tools to automate (as much as possible) the process of identifying personas and grouping system users into those personas. Automation includes not only the tools used to collect data but also statistical software used in the analysis of data and other software used to store and present data. In order to help achieve the second system goal of efficiency, SPEC is designed to be flexible, scalable, and unsupervised.


Flexibility is a key design attribute of SPEC made possible by employing an ensemble of cluster methods. A system designed to be flexible should be efficient because it allows the reuse of knowledge derived from base clustering methods. As new data become available for a particular persona development project, it would be ideal to reuse the existing knowledge from clustering methods without having to start from the beginning. With new data, the PDT could simply perform additional analysis and combine that information with existing knowledge from previous clustering efforts. This would be particularly useful in cases where the base clustering methods are conducted in parallel distributed systems. SPEC does not require access to the original data features stored in the distributed systems; instead, SPEC requires only the cluster labels from the ensemble of clustering methods.

A system designed to be scalable should be efficient because it can handle increased system demands, such as the addition of users or clustering algorithms, without requiring significant additional resources. For PDTs using manual clustering methods, even simple changes in users or data may require reanalyzing all of the data to ensure the clusters are still valid. On the other hand, PDTs using semi-automated methods may be less affected by such changes because they are more capable of dynamically changing cluster factors using statistical software. Proponents of more automated persona development methods cite speed and cost savings as motivating factors for adopting semi-automated techniques (McGinn &


Kotamraju, 2008; Sinha, 2003); however, some suggest increasing automation in all steps of the process, including data collection (Miaskiewicz et al., 2008). Manning, Root, and Bacher (2005) estimate the cost of a typical manual qualitative persona development project at nearly $50,000, and they report finding a more complex project that cost $500,000. Miaskiewicz et al. (2008) asked experts to focus solely on the task of persona clustering; they asked two experts to cluster their users into four personas, and this relatively small project required over 20 hours of labor per expert.

A system designed to be unsupervised should be efficient because it can perform semi-automated clustering without the need for extensive resources. Manual persona clustering relies on human judgment, which for some development projects may mean experts in human factors and/or interface design. Semi-automated persona clustering methods aim to decrease the reliance on experts by using analytical software tools to group users. Although there are a variety of effective classification tools that learn from human judgment, they do not solve the resource problem: human judgment is still needed to train them. Clustering algorithms do not require prior knowledge about the clusters or existing labels to make grouping decisions; therefore, fewer resources are needed to perform persona clustering. Although pattern recognition is an innate human function, it becomes increasingly difficult as the amount of data grows. Manual qualitative clustering relies on human


identification of themes and patterns as well as organization of the data, whereas automation tools use algorithms and visualization tools to quickly organize and present data. It is anticipated that a system designed to be flexible, scalable, and unsupervised will help SPEC be more efficient than the manual qualitative persona clustering method. Given the time advantages of computer processing over human processing, the second goal is also the second hypothesis (H2).

H2: The SPEC system will be more efficient than a manual clustering method.


4. Research Methodology

4.1 Research Approach

This study used the DSR approach to IS as prescribed in Hevner et al.'s (2004) seven guidelines. Guideline 3 (G3) calls for rigorous evaluation of the designed artifacts using methodologies commonly found in the IS knowledge base, including the following design evaluation methods: 1. Observational; 2. Analytical; 3. Experimental; 4. Testing; 5. Descriptive. The case study observational evaluation method in the DSR approach was chosen in order to study the artifacts in depth in an organizational setting. Zelkowitz (1998) describes the case study method of research validation for new computer technologies as implementing a new technology, collecting data on a certain attribute (e.g., effectiveness and efficiency), and measuring its performance. For this study, data are collected in the context of a military KMS, and the performance measures described in Chapter 3 are calculated for the existing persona clustering


methods and the new artifacts presented in this research. Three online data collection methods were used: interview, survey, and observation. This mixed method approach simultaneously conducted qualitative and quantitative data collection and triangulated the data using a new system of persona clustering. This chapter also discusses a pilot study that was used to evaluate the quality of the design process and the artifacts under construction. This was necessary, according to Hevner et al. (2004), to ensure the artifacts satisfied the requirements and constraints of the problem within the context of the organizational setting.

4.2 Research Context

The research context in this study was clustering users of organizational information systems in order to facilitate persona development. Professional forums in the U.S. Army are a type of KMS similar to distributed CoPs found in many civilian organizations (Brickey & Walczak, 2010; Dixon, Allen, Burgess, Kilner, & Schweitzer, 2005). The U.S. Army's Center for the Advancement of Leader Development and Organizational Learning (CALDOL) develops and facilitates the Army's Company Command (CC) professional forum. The Army's CC forum was selected as the research setting for this study for three reasons: it had a large base of users and data (there are more than 10,000 members); it was the most technologically advanced and mature forum in the Army; and, most importantly, there was a need for research in understanding members and


providing effective interfaces to meet their goals and needs. Facilitators of the CC forum wished to improve its interface in order to increase system use and success as well as to support the knowledge needs of the members. Wenger (1998) believes it is important to understand how forum members, both at the core and at the boundaries, interact in order to develop interfaces and processes that support the creation and retention of knowledge. Using personas in the forum will address the different interface needs of its members and motivate members to participate; this is a non-trivial problem for CoP facilitators (Gouvea, Motta, & Santoro, 2006). Additionally, personas address an important IS design concern expressed by Kuo and Karimi (1988): different interface styles should still allow users to invoke every desirable system function (p. 1456). Therefore, a goal of the CC forum was to create separate interfaces (based on identified personas) that support the same functionality yet are different in style.

4.3 Research Participants

The population of concern is organizational IS users. More specifically, the target population is members of the U.S. Army's CC forum. The sampling frame came from a list of the 500 most active members in the CC forum. Active forum members were defined as those members who had at least 100 combined page views or a minimum of 50 downloads of forum content during the past 12 months. The sampling strategy was purposive sampling to ensure the participants were familiar


with the forum's features and had enough exposure to the forum to provide quality data. Miles and Huberman (1994) further classify this type of purposive sampling as criterion sampling because the selected participants meet a minimum threshold of forum use. According to Cooper and Schindler (2008), purposive sampling is appropriate for exploratory studies, especially when the researcher believes participants should meet some criterion of importance to the study, such as experience with a phenomenon.

Several factors were considered in determining an appropriate sample size for this study. This study aimed to reach data saturation and information redundancy to ensure it captured a robust collection of goals and needs for persona development. During the literature review for this study, it was found that 22 of the 27 persona development studies used the manual qualitative method of persona clustering and development. Not all studies provided a detailed discussion of the participants; however, in those studies that revealed such information, the authors typically sampled between 6 and 20 participants (Brickey, Walczak, & Burgess, in press). In the five studies that implemented a semi-automated clustering method, the number of participants ranged from 20 to 63. McGinn and Kotamraju (2008) initially surveyed 1,300 participants to identify the number of persona groups; however, they interviewed only 26 participants to collect the necessary data on goals and needs in order to construct personas. Additionally, FA/PCA results are more reliable when there are at least 50 samples (Tabachnick & Fidell, 2007).


When all of these factors were taken into consideration, it was determined that the desired sample size for this study was at least 50 members.

4.4 Instruments Used in Data Collection

Within the domain of IS, personas are artifacts for design that model information system users. In order to create personas, some experts believe it is important to reconstruct the users' mental models to understand their goals (Goodwin, 2009; Mulder & Yaar, 2007). Norman (1988) acknowledged that user goals are difficult to capture, identify, and represent in user models. Therefore, this study attempted to triangulate user data to expose the truth for each participant's goals and, ultimately, mental model by employing several instruments. Additionally, in order to reduce bias, the instruments relied on self-reporting and observation.

Data for this study came from three online instruments: an interview, a survey, and observations (in the form of user transaction logs stored in the forum's server). The online interview consisted of six semi-structured interview questions focused on user goals, needs, and frustrations for the forum. The use of an online interview was appropriate for the participants due to their geographic dispersion (which included some participants in war zones), the ability to quickly obtain transcribed interview data, and the ability to build rapport by including the forum administrators and communicating within the Army's trusted email system (O'Connor, Madge, Shaw, & Wellens, 2008). The interview questions (to be used in the LSA method)


were developed by adapting questions found in Goodwin (2002), Miaskiewicz et al. (2008), Mulder and Yaar (2007), and Pruitt and Adlin (2006) to meet the needs of the forum. The online survey asked 21 rating scale questions focused on user goals and needs, and several questions varying in structure to capture user demographics, experiences, and interests (to provide detailed persona characteristics). The rating scale questions (to be used in FA/PCA) were adapted from Sinha (2003) and Mulder and Yaar (2007). Appendix A lists the interview and survey questions used to collect user goals and needs.

Forum members were "observed" by recording each participant's system usage behavior in the form of counts (the number of times members performed certain online activities in the forum) in the forum's server transaction logs. These logs are electronic records of user behavior captured by system software; the actual data recorded as a result of online user activity is known as trace data (Jansen, Taksa, & Spink, 2008). Depending on the level of sophistication in the server's software, there could be a plethora of user intentions and activity stored in the transaction logs. This database of user activity varies in explicitness and is an indication of users' goals (Stone & Stone, 2005). The server logs observed and recorded four variables for user activity: contributions, page views, profile views, and document downloads. Table A.2 in Appendix A contains a more detailed description of the usage variables found in the forum's


server logs. The validity of using such online transaction log data has been questioned, but when done with care it has been found to be an effective, unobtrusive technique for observing online users and understanding their behavior (Jansen et al., 2008).

The CC forum's facilitators sent email invitations to the 500 most active members; reminders were sent to members who did not respond to the initial invitation within two weeks. The facilitators also queried the forum server to obtain the members' actual forum usage. A practical consideration for this study was getting forum members to participate in data collection because junior Army officers (lieutenants and captains) were in high demand for training and tactical assignments in Iraq and Afghanistan. As a result, data collection took place over five weeks in order to allow members sufficient time to respond. Although 63 members completed the survey and 70 completed the interview, only 53 members completed both; therefore, the response rate was 11% (53 of the 500 invited members). The sample size was above the minimum goal of 50 participants; it was deemed sufficient considering the time demands placed on the target population and the redundancy of information observed in the responses.

The average length of military service for the 53 participants in this study was 9 years and 9 months. Table 4.1 contains the mode for each demographic; Tables B.1 through B.6 in Appendix B show the full demographic statistics. Based on the demographics in Table 4.1 and the average length of service, the typical participant


was a male captain between the ages of 29 and 31, with 9 years and 9 months of service, currently commanding a unit of soldiers, holding a Bachelor's degree, and married with children. (Appendix C contains the Colorado Multiple Institutional Review Board certificate indicating the research was "not human subject research," and the approved informed consent form is in Appendix D.)

Table 4.1 Most common categories for participants by demographic

Demographic                   Category                 Frequency (53 max)   Percent of Participants
Age (Years Old)               29-31                    20                   38%
Gender                        Male                     48                   91%
Marital Status                Married with Children    30                   57%
Education (Highest Earned)    Bachelor's Degree        41                   77%
Rank                          Captain                  35                   66%
Command Status                Currently in Command     20                   38%

4.5 Data Analysis: Clustering Members and Evaluating System Performance

According to the DSR guidelines proposed by Hevner et al. (2004), rigor (G5) is judged as the researcher's ability to select appropriate techniques to develop an artifact and select appropriate means to evaluate it (G3). Although there is no agreed upon "gold standard" for persona clustering, the manual qualitative (expert) method served as the de facto baseline for comparing the existing semi-automated and the proposed ensemble cluster techniques. Miaskiewicz et al. (2008) compared


the performance of a semi-automated method with expert persona clustering, though the other semi-automated studies offered no empirical evidence; they claimed, however, that their methods were valid (for replacing human judgment) based on qualitative evaluations.

A panel of six experts reviewed all member data, first to identify and describe user groups and second to assign members to those groups using manual qualitative methods. Experts were selected based on significant experience in the fields of user interface design, human factors, and Army professional forums. Not all of the experts were available for both tasks; however, at least four experts were available for each task to achieve consensus decisions. The tasks were:

1. Identify the number of user groups and describe each group;
2. Assign each member to one and only one group.

Figure 4.1 depicts the conceptual model for comparing semi-automated clusterings with expert clusterings using manual qualitative methods. While the experts were completing their clustering tasks, three semi-automated persona clustering methods were applied to the data.

1. Latent Semantic Analysis was conducted using an online tool provided by the University of Colorado (http: // lsa ...). The resulting cosine matrix representing the similarity of text responses among the interview


participants was analyzed in the Statistical Package for the Social Sciences (SPSS) version 16 using hierarchical agglomerative clustering.

2. Factor (Principal Components) Analysis was conducted to reduce the survey data into components that would serve as personas. The number of components identified in this step was used as the number of clusters for the LSA and CA methods. Using SPSS, component scores were computed for each of the participants and used to conduct hierarchical agglomerative clustering.

3. Cluster Analysis was conducted in SPSS to assign participants to clusters according to observed forum activity.
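The clustering stage shared by these methods can be illustrated with a small sketch. It is not the SPSS procedure used in the study: the similarity matrix is hypothetical, the function name is illustrative, and average linkage is an assumption, since the linkage setting is not stated here.

```python
from itertools import combinations


def agglomerative_clusters(similarity, k):
    """Merge the most similar pair of clusters (average linkage) until k remain.

    similarity: square, symmetric list of lists (higher means more alike).
    Returns one cluster label (0..k-1) per participant.
    """
    n = len(similarity)
    clusters = [[i] for i in range(n)]  # start with one cluster per participant

    def avg_sim(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical cosine similarities for five participants (not study data).
sim = [
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.3, 0.2, 0.2],
    [0.2, 0.3, 1.0, 0.8, 0.1],
    [0.1, 0.2, 0.8, 1.0, 0.2],
    [0.3, 0.2, 0.1, 0.2, 1.0],
]
labels = agglomerative_clusters(sim, 3)
print(labels)
```

With these made-up values, participants 0 and 1 merge first, then 2 and 3, leaving participant 4 alone. The same routine would apply to component scores or usage counts once they are converted to a pairwise similarity.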


[Figure 4.1 Comparing the Performance of Persona Clustering Methods. The diagram shows numeric survey data and textual interview data feeding two paths: manual clustering (qualitative; human judgment, affinity diagrams, coding) and semi-automated clustering (qualitative and quantitative; LSA, FA/PCA, CA, and ensemble). The clusters from the two paths are compared for agreement.]

Efficiency was determined by measuring and comparing the time required to perform persona clustering for the SPEC and expert methods. Performance of the existing semi-automated qualitative and quantitative clustering techniques was measured by comparing the assignment agreement for each method with the manual expert clustering. A simple way to express agreement between the semi-automated and manual clustering techniques is to calculate percentage agreement by counting


the number of matching user assignments into clusters and dividing by the total number of users (participants in the study). Although percentage agreement is widely used, it is misleading because it does not account for the fact that some percentage of the agreement is expected by chance (Barrett et al., 1990). Cohen (1960) introduced a measure of agreement that adjusts the percentage agreement for chance. Cohen's kappa, a statistical measure of inter-rater agreement for categorical items, was used in the persona literature to compare LSA and expert clusters (Miaskiewicz et al., 2008). Although interpretation of Cohen's kappa is a matter of debate, Landis and Koch (1977) offered the guideline in Table 4.2 for interpreting degree of agreement.

Table 4.2 Interpreting Cohen's kappa

Cohen's Kappa Value    Degree of Agreement
< 0.21                 Poor
0.21-0.40              Fair
0.41-0.60              Moderate
0.61-0.80              Substantial
> 0.80                 Almost Perfect
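For reference, Cohen's kappa can be written as follows. The notation is introduced here for illustration and is not the thesis's own: N is the number of participants, n_{kk} is the number of participants placed in cluster k by both methods, and n_{k+} and n_{+k} are the totals placed in cluster k by the semi-automated method and by the experts, respectively.

```latex
p_o = \frac{1}{N}\sum_{k} n_{kk}, \qquad
p_e = \sum_{k} \frac{n_{k+}}{N}\cdot\frac{n_{+k}}{N}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}
```

Here p_o is the percentage agreement and p_e is the agreement expected by chance, so kappa equals 1 for perfect agreement and 0 when the observed agreement is no better than chance.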


4.6 Pilot Study

Before the final survey was given, a pilot study was developed and administered to ensure a prototype could be built as an instantiation of the proposed method. As prescribed in the DSR guidelines (G6), there was an extensive search to find and test design ideas and to find appropriate tools to implement the method. The pilot study involved an iterative process so that any problems in design and analysis could be identified and corrected prior to the main study. Other important activities performed during the pilot study were acquiring software tools needed in the analysis and learning how to use those tools.

As described by Brickey, Walczak, and Burgess (in press), the pilot study followed the research methodology previously described in this section, with some modifications. The sample consisted of 18 members of the Army's CC forum assigned to a single representative U.S. Army installation: Fort Carson, Colorado. Additionally, the expert panel consisted of just two experts in the fields of user interface design and facilitation of Army forums.

4.6.1 Pilot Study Results

All three of the persona clustering methods previously discussed were used to group the participants into clusters for persona development: LSA, FA/PCA, and CA. FA/PCA revealed three groups of users in the pilot data. Table 4.3 shows the cluster assignments for each of the three clustering methods and the expert panel


clustering. (Participant identification numbers range from 1-25; however, the numbers are not contiguous because several participants failed to complete the study.) Performance of the existing semi-automated qualitative and quantitative clustering techniques was measured by comparing the assignment agreement for each method with the manual expert clustering and computing Cohen's kappa for each method. The performance of each method, in terms of the raw number of matching cluster assignments and the adjusted kappa measure, is listed in Table 4.4. The LSA kappa of 0.308 indicates a fair amount of agreement with experts; the FA/PCA kappa of 0.481 indicates moderate agreement; and the CA kappa of 0.321 indicates fair agreement.

These results indicate that the quantitative semi-automated clustering methods proposed in the literature outperform LSA, the only qualitative semi-automated clustering method. This finding appears to contradict the suggestions in the literature that qualitative data is better for persona clustering, since both FA/PCA and CA had higher agreement measures in this study. This unusual result may be influenced by the domain of application; nevertheless, it indicates the need for utilizing ensemble cluster methods to better triangulate the data regardless of unusual domain influences.
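The FA/PCA agreement figures can be checked with a short script. This is a minimal sketch rather than the study's own procedure: the cluster memberships are transcribed from Table 4.3, the function and variable names are illustrative, and it assumes cluster numbers correspond across methods (cluster 1 from FA/PCA is compared with cluster 1 from the experts).

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: share of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal shares, per category.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


def to_labels(clusters):
    """Turn {cluster: [participant ids]} into {participant id: cluster}."""
    return {pid: c for c, members in clusters.items() for pid in members}


# Memberships transcribed from Table 4.3 (pilot study, 18 participants).
fa_pca = to_labels({
    1: [1, 9, 16, 22],
    2: [5, 7, 8, 13, 14, 15, 20, 24, 25],
    3: [3, 10, 11, 17, 18],
})
expert = to_labels({
    1: [1, 8, 9, 13, 16, 18],
    2: [3, 5, 7, 14, 15, 20, 22, 24],
    3: [10, 11, 17, 25],
})

ids = sorted(expert)
fa_labels = [fa_pca[i] for i in ids]
expert_labels = [expert[i] for i in ids]

matches = sum(a == b for a, b in zip(fa_labels, expert_labels))
kappa = cohen_kappa(fa_labels, expert_labels)
print(matches, round(kappa, 3))  # 12 0.481
```

The result reproduces the 12/18 matching assignments and the kappa of 0.481 reported for FA/PCA.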


Table 4.3 Pilot study clustering results

Users Assigned to Persona Clusters
Method    Cluster 1                          Cluster 2                              Cluster 3
LSA       1, 8, 9, 10, 11, 13, 15, 24, 25    3, 5, 7, 14, 17, 18, 20, 22            16
FA/PCA    1, 9, 16, 22                       5, 7, 8, 13, 14, 15, 20, 24, 25        3, 10, 11, 17, 18
CA        1, 5, 11, 16, 24, 25               3, 7, 9, 13, 14, 15, 17, 18, 20, 22    8, 10
Expert    1, 8, 9, 13, 16, 18                3, 5, 7, 14, 15, 20, 22, 24            10, 11, 17, 25

Table 4.4 Cohen's kappa and agreement for each method

Persona Method    Matching Assignments    Kappa    Degree of Agreement
LSA               10/18                   0.308    Fair
FA/PCA            12/18                   0.481    Moderate
CA                10/18                   0.321    Fair

4.6.2 Moving Beyond the Pilot Study

The most important outcomes of the initial phase of the pilot study were testing the analytic tools and refining the research design. Since the persona literature lacked


detailed steps to replicate the clustering methods, a considerable amount of effort was required to analyze the data. For example, Miaskiewicz et al. (2008) used spreadsheets to manually perform the LSA clustering; this process took more than 10 hours to replicate according to the authors' procedures. This pilot study resulted in automating the clustering using matrix commands in SPSS and reducing the processing time to a few seconds.

The pilot study also identified opportunities to improve the research methodology. The original sampling strategy used a convenience sample because it was practical for the forum facilitators to invite members in the same geographic location. It was found that several of the members who responded to the pilot study invitation lacked significant exposure to the forum and had difficulty recalling features of the forum, as well as their recent behaviors while in the forum. Therefore, it was decided that future studies required purposive sampling that would solicit members who were active in the forum and more likely to be familiar with its features and able to recall their recent activity. The final sample frame consisted of the 500 most active members in the forum (as previously defined) over the past 12 months.

Another lesson learned from the pilot study was that the expert panel needed to consist of at least three members in order to reach consensus clustering decisions. Since the experts were volunteers and had little time to participate, it would save them time if they did not have to coordinate with each other to resolve differences in


clustering judgment. Therefore, additional experts with expertise in human factors research were solicited from the Army Research Laboratory. The final pool of experts for the volunteer panel consisted of five experts in human factors and one expert in Army professional forums, each with significant experience in research and practice.

Finally, results of the pilot study indicate agreement among the base clustering methods. This provides support for the aggregation of clustering methods and triangulation toward a clustering decision that approximates human judgment by finding true clusters. Therefore, the pilot study results support the combination of clustering methods into a cluster ensemble.


5. The System for Persona Ensemble Clustering (SPEC)

The clustering of IS users into personas is critical to the success of resulting personas because it identifies groups of users sharing similar goals, needs, and behaviors, the characteristics upon which personas are based. Existing semi-automated clustering methods use either qualitative or quantitative approaches to mimic human judgment while at the same time reducing the resources required to identify personas: it takes time, money, and training to manually identify clusters (McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008; Sinha, 2003). SPEC is a semi-automated prototype that implements a persona cluster ensemble method of simultaneously analyzing qualitative and quantitative data in an efficient manner to effectively approximate expert judgment. Additionally, SPEC provides the foundation for a persona development methodology that results in realistic, data-driven personas.

The method for SPEC, shown in Figure 5.1, is a cluster ensemble method that combines the base cluster labels (u1, u2, u3) through a consensus function within the cluster decision center. Numerous studies using cluster or classifier ensembles have reported performance improvements over individual or base algorithms (Boulis & Ostendorf, 2004; Fred & Jain, 2005; Gionis, Mannila, & Tsaparas, 2007; Kittler et al., 1998; Strehl & Ghosh, 2002; Turner & Agogino, 2008). Nevertheless, merely


combining base clusterers is not enough to guarantee improvement; the challenge is determining an effective way to combine the base clusterers to achieve improved performance in an ensemble system (Franke & Mandler, 1992; Hong, Sam, Yuchou, & Qingsheng, 2008).

There are three basic concepts behind this persona clustering method. The first two concepts are also system attributes: SPEC should be aggregative and flexible. The aggregation of data allows triangulation and consensus across the base clusterers. Since data sources and clustering methods may change over time based on the structure or nature of the data, SPEC is designed to be flexible. The third concept behind SPEC is simplicity: simple methods are easier to understand and often work well (Franke & Mandler, 1992; Witten & Frank, 2005). This cluster ensemble method is summarized as follows:

1. User data are collected using web instruments and system queries.
2. The number of groups (clusters) is identified using the FA/PCA method.
3. Members are assigned to the designated groups by the base clusterers (individual clustering methods: LSA, FA/PCA, and CA).
4. The clustering results from the base clusterers are combined in a cluster decision center to arrive at a final cluster decision.


[Figure 5.1 Combining Clustering Output into a Cluster Decision Center in SPEC. The diagram shows three data inputs (textual goals and needs, numeric goals and needs, and numeric system use), a step that identifies the number of groups, and the cluster decision center, whose output is the label u0 = 1, 2, 3, ... n.]

5.1 Data Collection and Number of Clusters

The SPEC system relies on human intervention at several points of the process and the use of several computer-based tools. Data collection is automated to make the process more time efficient. Several tools exist to facilitate data importation from online surveys and interviews into data files, and system administrators can easily export system trace data into data files for analysis in SPEC. Data collected in the second step of persona development may be used not only for identifying user


groups and membership but also for filling in the persona details and bringing them to life in the last step of persona development.

The base clusterers have access to overlapping and complementary data primarily concerned with the goals, needs, and behaviors of IS users. SPEC combines and triangulates knowledge from the base clusterers to make a consensus label decision. Text from interviews consists of user-reported explanations of system use; quantitative data from survey items are also user-reported responses of system use; and quantitative data from server transaction logs capture actual user behavior. SPEC uses three methods (LSA, FA/PCA, and CA) as input to the cluster decision center.

Each of the base clustering methods selected for SPEC ultimately needs the number of clusters as input for analysis. One of the challenges of cluster analysis is determining the number of clusters to represent the underlying structure of the data, as the "right" number often depends on the desired granularity in the analysis (Ghosh & Strehl, 2005; Gionis et al., 2007). Visually plotting data may reveal obvious groupings in low-dimensional data sets; however, as the number of dimensions increases (for example, above three), visual tools are limited and other methods are necessary.

FA/PCA is a technique capable of reducing data dimensionality so that many variables can be reduced to just a few for further analysis (Tabachnick & Fidell,


2007). Within the domain of persona development, two studies use this general approach to identify the number of personas and to describe what qualities users share within the clusters: Sinha (2003) proposes a PCA-based persona clustering method, and McGinn and Kotamraju (2008) apply FA.

The number of clusters is determined through the application of FA/PCA by evaluating the number of components that capture most of the variance in the data, inspecting eigenvalues and scree plots, and assessing how well the individual variables load with the components (Hair et al., 2006). Analysis of the variables that load with the components reveals the qualities that explain the differences between the persona groups (Sinha, 2003). For example, if three variables that load on a component represent a user's goal to participate in discussion threads, then one quality of the persona could be "prefers discussion threads."

Based on the extensive use and theoretical soundness of FA/PCA over the last century and its recent applications in the domain of persona development, SPEC adopts FA/PCA for identification of components or factors that represent persona groups, as well as for explaining what qualities the groups represent. The FA/PCA input data comes from survey data and represents scaled responses of user goals and needs. The number of clusters obtained from FA/PCA is used as input by all of the base clusterers in SPEC.
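One of the criteria above, inspecting eigenvalues and the variance they capture, can be sketched as follows. This is an illustration under stated assumptions, not the study's SPSS analysis: the eigenvalues are hypothetical, the eigenvalue-greater-than-one rule is a common heuristic that the text does not say was applied, and the 75% variance target is an arbitrary value chosen for the example.

```python
def components_to_retain(eigenvalues, variance_target=0.75):
    """Suggest a number of components from correlation-matrix eigenvalues.

    Returns (count with eigenvalue > 1, count needed to reach variance_target).
    Both rules are heuristics; scree plots and loadings still need inspection.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    # Rule 1 (assumed heuristic): keep components with eigenvalue above 1.
    above_one = sum(1 for value in ordered if value > 1.0)

    # Rule 2: keep the fewest components whose cumulative share of variance
    # reaches the target.
    cumulative = 0.0
    by_variance = len(ordered)
    for count, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= variance_target:
            by_variance = count
            break

    return above_one, by_variance


# Hypothetical eigenvalues for eight survey items (not study data).
eigenvalues = [3.2, 1.9, 1.3, 0.6, 0.4, 0.3, 0.2, 0.1]
print(components_to_retain(eigenvalues))
```

In this made-up case both rules point to three components, which would then serve as the number of clusters passed to every base clusterer.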


5.2 Base Clusterers

5.2.1 Latent Semantic Analysis

As previously discussed, the LSA persona clustering method developed by Miaskiewicz et al. (2008) analyzes the similarity of text to make cluster decisions. Text from user interviews is analyzed in an LSA tool that extracts the contextual usage meaning of words using a form of singular value decomposition and represents user proximity in a similarity matrix. The output matrix is then used as input in hierarchical agglomerative cluster analysis.

The inclusion of LSA in SPEC can be justified for several reasons. First, LSA provides a method for aggregating data because it is capable of analyzing text for the clustering of users. Without a text analysis tool, there would be no way to take advantage of rich qualitative information relating to the goals and needs of system users. Second, LSA meets the system goal of effectiveness: LSA has been shown to make judgments of textual data in a manner similar to humans (Landauer et al., 1998). Third, LSA meets the system goal of efficiency. The LSA tool quickly analyzes text and turns the similarity into a proximity matrix that can be analyzed in seconds using analytical software. Finally, LSA provides a scalable method because it can analyze an increasing number of cases without burdening the system. It only takes a few more seconds to analyze the text of 100 cases than it does 20 cases. Although not relevant for the persona clustering process, textual data can


also be used in step 4 of persona development to bring personas to life with realistic descriptions of typical users. The numerical data obtained by surveys and system usage cannot replace the richness of textual data. For these reasons, LSA is a necessary component of SPEC.

5.2.2 Factor Analysis and Principal Components Analysis

The FA/PCA persona clustering method is a key component of SPEC for several reasons. First, as previously mentioned, FA/PCA provides theoretical evidence for the identification of the number of personas. This meets the first goal of effectiveness because it identifies factors in an automated fashion without the need for expert judgment. Sinha (2003) uses PCA for data reduction and interpretation of components; McGinn and Kotamraju (2008) similarly use FA to identify the latent factors to represent personas. Second, FA/PCA serves as an efficient method for reducing data to simplify the identification of groups of cases. Fern and Brodley (2004) also use PCA as a data reduction method and a preprocessing step in a base clusterer, which is ultimately combined into a cluster ensemble. FA/PCA reduces the variables in a data set to component scores for each of the users. If, for example, FA/PCA identifies 3 components in a data set of 40 variables, it can reduce the dimensionality of the data and represent each user's data in just 3 component scores. Similar to Sinha's (2003) approach, FA/PCA is used in SPEC to identify user component scores and determine which users belong to each


cluster. The actual clustering is achieved through cluster analysis using the calculated component scores. The final reason to include the FA/PCA clustering method in SPEC is that it can process the quantitative data measuring user attitudes and preferences, thus designing in an aggregation capability. Sinha (2003) and McGinn and Kotamraju (2008) introduce similar FA/PCA persona clustering methods as a response to calls for more quantitative approaches to persona development. The survey data is conceptually equivalent to the interview data used in LSA in that it captures users' mental models to help understand their goals for a system. Therefore, a quantitative research method such as FA/PCA should serve as another view of the same underlying goals and assist in data triangulation.

5.2.3 Cluster Analysis

The CA persona clustering method is a key component of SPEC for several reasons. First, CA is a sound theoretical approach for identifying the underlying structure of data sets. It has been used successfully in multiple research contexts (Hair et al., 2006). Second, CA relies on unsupervised learning to identify clusters. As stated in the previous discussion of the system attributes, a persona clustering method needs to implement unsupervised learning because semi-automated methods aim to reduce dependencies on human judgment. The final reason SPEC implements CA is that it supports the system attributes of flexibility and aggregation because it provides


numerous clustering algorithms to analyze data from multiple, varying sources. SPEC implements CA to analyze user interactions with a system to understand user goals and needs from another perspective (i.e., actual use versus self-reported system use). The other base clusterers analyze data representing self-reported behavior from interviews and surveys, whereas cluster analysis observes actual behavior in an IS. Javahery et al. (2007) also use cluster analysis to identify groups of IS users according to actual behavior. Users' interactions with an application provide a rich set of data from which cluster analysis can infer knowledge about user online activity patterns to identify user groups (Srivastava, Robert, Mukund, & Pang-Ning, 2000).

Once the base clusterer decisions are known, the labels become the input to the SPEC cluster decision center, as shown in Figure 5.1. This decision center is the heart of the cluster ensemble, where SPEC combines the base cluster decisions into a final cluster assignment.

5.3 Combining Base Clusterers into a Cluster Ensemble

The motivation for combining base cluster decisions is conceptually based on the Dempster-Shafer theory (Dempster, 1968; Shafer, 1976), which is a generalization of the Bayesian theory of subjective probability. The Dempster-Shafer theory is a mathematical theory of evidence that relies on rules for combining evidence from different sources to arrive at a final solution. It is commonly used to aggregate data


from multiple sources to simplify information and make decisions in sensor fusion problems (Sentz & Ferson, 2002). Although the Dempster-Shafer theory is based on several mathematical rules for combining evidence, it is only used in this study as motivation for combining base clustering decisions as evidence for a final cluster label. SPEC assumes equal probabilities to determine labels, so none of the specific combination rules based on the Dempster-Shafer theory are applied here.

The SPEC cluster decision center combines evidence from the base clusterers to make incremental decisions. The first decision is simply a majority vote of the base clusterers. If there is no majority, a more advanced hyper-graph cluster ensemble (HGCE) method developed by Strehl and Ghosh (2002) makes the final decision. The cluster decision center shown in Figure 5.2 follows one to three steps, depending on the results of the first step:

1. Members are assigned to clusters by the base clusterers. If there is a majority vote, the user is assigned to the corresponding cluster.
2. If there is no majority, the base clusterer votes (labels) are used as input to three additional hyper-graph clustering algorithms: Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), and Meta-Clustering Algorithm (MCLA).


3. The results of CSPA, HGPA, and MCLA are evaluated, and the supra-consensus function selects the algorithm with the greatest average normalized mutual information (ANMI) for the final cluster decision.

[Figure 5.2 SPEC Data Flow and Decisions. The flowchart shows the cluster decision center with a "Majority Cluster?" decision (Y and N branches) and an "Assign to Cluster" outcome.]

5.3.1 Majority Vote

SPEC implements a majority vote rule, also known as consensus clustering, for a number of reasons. First, a majority vote rule is flexible because it only needs the output labels from the base clustering methods; therefore, this form of knowledge reuse does not require the original data features. This creates flexibility because the number and types of clustering methods can change and a majority may still be


reached. This leads to the second reason for using a majority vote rule: it facilitates the aggregation of disparate data sources. Qualitative and quantitative data can be aggregated because each base clusterer used to analyze a data type produces its own cluster label that is reused in the majority vote. Third, as recognized by Hong et al. (2008), consensus clustering is an effective way to leverage the decisions of multiple unsupervised base clusterers to improve cluster robustness and stability. A majority vote combines the decisions of multiple clusterers in a way that leverages the advantages of each without inheriting their weaknesses. Finally, implementing a majority vote rule satisfies the system goal of effectiveness by triangulating clustering decisions. Consensus clustering works well in cluster combinations with complementary data sources, such as imagery data taken from different sensors with different views (Jimenez, Morales-Morell, & Creus, 1999). Although SPEC is not designed for detecting imagery patterns, it does rely on complementary data sources and clusterers to triangulate the data and reveal groups of users.

It is appropriate to implement a majority vote scheme in SPEC because there is a relationship between the clusters obtained from the base clusterers: they all attempt to separate users into clusters based on similar goals and needs. If the clusters did not capture a similar underlying structure, a majority vote rule would not be appropriate due to correspondence issues among clusters (Ayad & Kamel, 2009).
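The majority vote rule is simple enough to state in a few lines. The sketch below is illustrative only: the function name and the example votes are hypothetical, and it assumes the cluster labels from the base clusterers already refer to the same underlying groups, which is the correspondence condition noted above.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by more than half of the base clusterers.

    votes: one cluster label per base clusterer for a single user.
    Returns None when no label has a strict majority, in which case the
    decision would be deferred to a later step.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical votes from three base clusterers (LSA, FA/PCA, CA).
print(majority_vote([2, 2, 1]))  # 2: two of the three clusterers agree
print(majority_vote([1, 2, 3]))  # None: all three disagree
```

Because the function only looks at labels, base clusterers can be added or swapped without changing it, which is the flexibility argument made for this rule.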


For systems with only two to four personas, it is likely that a majority will be reached; however, for systems with more than four personas, it is less likely. Therefore, in order to make a final cluster determination, additional steps may be necessary, such as the graph-based cluster ensemble in steps 2 and 3.

5.3.2 Hyper-graph Cluster Ensemble

The last component of SPEC, an ensemble of hyper-graph cluster algorithms, is necessary for several reasons. First, employing the HGCE supports the system attribute of flexibility. If there is no majority in step 1 of the cluster ensemble, the cluster results are used as input to three graph-based clustering algorithms and a supra-consensus function developed by Strehl and Ghosh (2002). Their HGCE method is flexible because they designed it to work with legacy or proprietary systems in cases where they would not have access to the original data features; the only input needed is the cluster labels (He, Xu, & Deng, 2005). Second, the HGCE method supports the system attribute of being unsupervised. The three unsupervised learning cluster methods developed by Strehl and Ghosh (2002) are graph methods based on consensus between data partitions. All three algorithms begin with a transformation of the clusterings into a hyper-graph representation. The first algorithm, Cluster-based Similarity Partitioning Algorithm (CSPA), determines an induced pairwise similarity from the original clustering and re-clusters the objects to obtain a new clustering. The second algorithm, HyperGraph Partitioning Algorithm


(HGPA), defines a cut objective that partitions the hypergraph into clusters. The last algorithm, Meta-Clustering Algorithm (MCLA), identifies meta-clusters by grouping related hyperedges and consolidates them to reveal new clusters (Fred & Jain, 2005).

Another reason SPEC uses this specific ensemble method is that the HGCE methods have been found to be more effective than single cluster methods and even other ensemble methods. Cluster ensemble research is gaining in popularity in several disciplines (Gionis et al., 2007), especially in the machine learning community (N. Nguyen, 2007). Ensemble classifiers and clusters have been used extensively in other domains, and studies have consistently reported improved performance of ensembles over single classifier or cluster models (Boulis & Ostendorf, 2004; Fred & Jain, 2005; Kittler et al., 1998; Strehl & Ghosh, 2002; Turner & Agogino, 2008). More specifically, graph-partitioning cluster ensemble methods like CSPA, HGPA, and MCLA have received considerable attention in the clustering literature (Fern & Brodley, 2004). Turner and Agogino (2008) experimentally demonstrate that the hyper-graph methods provide better results than their greedy optimization methods based on voting active clusters.

Finally, SPEC uses the HGCE method because it is theoretically based in information theory. Since the performance of each algorithm varies depending on noise and diversity in the data, they are compared according to the average

PAGE 101

normalized mutual information (ANMI) metric, which is a measure of similarity between the original clusterings and the new clusterings determined by CSPA, HGPA, and MCLA (He et al., 2005). The final clustering is determined by a supra-consensus function that chooses the algorithm with the greatest ANMI (Strehl & Ghosh, 2002). The HGCE method will theoretically result in a final cluster decision that best triangulates the base clustering decisions.

5.4 Revisiting the Pilot Study

The first phase of the pilot study tested the analytic tools for the base clustering methods and refined the research design. After searching the design space for the new SPEC method and prototype, data from the initial pilot study were used to test the feasibility of the new artifacts and demonstrate their utility. First, the majority vote rule was applied to the base cluster labels. In 11 of the 18 pilot study cases, at least two of the base clustering methods reached consensus; in those cases the consensus label became the final cluster decision. The majority vote label matched the expert panel in 10 out of 11 cases, including 3 out of the 4 unanimous cases. Next, Matlab version R2009b and the HGCE package (found at http://) were installed on a computer and tested. Since the HGCE package was written for older operating systems, some of the code had to be updated to function properly. After verifying the configuration using test data found in the software package, the HGCE analysis was conducted using the base cluster labels from the pilot study.
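The ANMI comparison and supra-consensus selection described in Section 5.3.2 can be sketched as follows. This is a minimal illustration, not the HGCE package code: the function names, the toy base clusterings, and the two candidate labelings are hypothetical, and the normalization (mutual information divided by the geometric mean of the two entropies) follows the definition commonly attributed to Strehl and Ghosh (2002).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a labeling (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())

def nmi(a, b):
    """Normalized mutual information: I(a; b) / sqrt(H(a) * H(b))."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)

def anmi(candidate, base_clusterings):
    """Average NMI between one candidate labeling and all base clusterings."""
    return sum(nmi(candidate, base) for base in base_clusterings) / len(base_clusterings)

def supra_consensus(candidates, base_clusterings):
    """Pick the candidate labeling with the greatest ANMI."""
    scores = {name: anmi(labels, base_clusterings) for name, labels in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: three base clusterings of six objects.
base = [
    [1, 1, 2, 2, 3, 3],
    [2, 2, 1, 1, 3, 3],  # same partition as the first, with labels permuted
    [1, 1, 1, 2, 3, 3],
]
# Hypothetical outputs of two consensus algorithms.
candidates = {
    "candidate_a": [1, 1, 2, 2, 3, 3],
    "candidate_b": [1, 2, 1, 2, 1, 2],
}
best, scores = supra_consensus(candidates, base)
print(best, scores)
```

Because NMI depends only on the partition and not on the label names, the first two base clusterings count as identical, which is why the method needs nothing but cluster labels as input.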

PAGE 102

For the remaining 7 cases (in which there was no majority), the HGCE algorithms were applied, and the output labels matched the expert labels in 6 cases. In summary, SPEC was tested on the pilot study data using the majority vote rule and the HGCE algorithms. In 16 out of 18 cases the SPEC and expert labels matched, resulting in an "almost perfect" agreement kappa of 0.830. These results provided confidence in the ability to implement SPEC on the main study data and continue with a full system validation.

Although this study does not explicitly measure and evaluate the quality of the resulting personas (that is left for future research), it was important to use the artifacts to create personas and demonstrate their utility in the context of the CC forum. Three personas were developed by using SPEC's clustering results and sorting summarized persona data with visualization tools found in common office automation software. Members of the CC forum's design and facilitation team evaluated the personas and found one of them to be insightful for design; the other two provided useful data but did not identify previously undiscovered user mental models. As a result of this analysis, the CC forum designers added a new feature to the main interface that allowed members to quickly find knowledge in the forum.

PAGE 103

6. Results: SPEC Validation

As stated in the introduction, this study builds and evaluates two IT artifacts for persona clustering. This chapter is organized in terms of applying the new SPEC persona clustering method and its instantiation, answering the research questions, and validating SPEC according to the performance measures (G3). Using the SPEC method described in the last chapter, the base clustering methods were applied to the data and their clustering outputs were sent to the cluster decision center. At that point the majority vote rule was applied; if there was no majority clustering decision, the final cluster decision for SPEC was based on the graph-based ensemble cluster method. Concurrently with SPEC's analysis, an expert panel made clustering decisions that became the baseline for comparing the existing semi-automated methods with SPEC. Finally, personas were created from the data collected and analyzed with statistical software to demonstrate the efficacy of SPEC in supporting the creation of personas for IS interface design.

6.1 Factor/Principal Components Analysis Results

Factor (Principal Components) Analysis with varimax rotation was conducted for two purposes: first, to identify the number of user groups and describe their qualities; and second, to compute component scores and assign members to personas. The Statistical Package for Social Sciences (SPSS) version 16 was used

PAGE 104

to conduct FA/PCA on the 21 variables from the online survey for member goals and needs. Analysis of the eigenvalues indicated that five components could have been chosen for extraction to represent the number of user groups, though visual analysis of the scree plot suggested three to five components. Gorsuch (1974) recommends using factors or components with at least three salient variables, and Comrey and Lee (1992) suggest that items with loadings greater than 0.71 are considered excellent and those in excess of 0.63 are very good. Table 6.1 displays the variables and component loadings for the rotated components, with loadings less than 0.40 omitted to improve clarity. As indicated in Table 6.1, components one and two had at least three variables with rotated loadings greater than 0.71; component three had two rotated variables above 0.71 and two above 0.63. Components four and five had only two salient variables with loadings above 0.63. Therefore, only three components (one through three) were extracted for interpretation as personas. The three extracted components accounted for 50% of the variance. The number of components (3) identified in this step was used as the number of clusters for the LSA, FA/PCA, and CA persona clustering methods.
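The retention rule applied above can be sketched in a few lines. The loadings are those listed in Table 6.1; treating a loading above 0.63 as "salient" and requiring at least three such loadings per component is an assumption made here to mirror the Gorsuch (1974) and Comrey and Lee (1992) guidelines, and the names `summarize` and `retained` are hypothetical.

```python
# Rotated loadings per component as listed in Table 6.1
# (loadings below 0.40 were omitted from that table).
loadings = {
    1: [0.917, 0.872, 0.807, 0.787, 0.663, 0.534],
    2: [0.797, 0.770, 0.726, 0.708, 0.668, 0.628],
    3: [0.753, 0.744, 0.692, 0.677],
    4: [0.869, 0.809],
    5: [0.786, 0.704, 0.590],
}

EXCELLENT = 0.71    # Comrey and Lee (1992): "excellent"
VERY_GOOD = 0.63    # Comrey and Lee (1992): "very good"
MIN_SALIENT = 3     # Gorsuch (1974): at least three salient variables

def summarize(values):
    """Count loadings in the 'excellent' band and in the 'very good' band."""
    excellent = sum(v > EXCELLENT for v in values)
    very_good = sum(VERY_GOOD < v <= EXCELLENT for v in values)
    return excellent, very_good

# Assumption: a loading counts as salient when it exceeds 0.63.
retained = [c for c, v in loadings.items() if sum(summarize(v)) >= MIN_SALIENT]

for component, values in loadings.items():
    print(component, summarize(values))
print("retained components:", retained)
```

Under this reading, components four and five each have only two salient loadings and are dropped, which agrees with the three components extracted in the text.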

PAGE 105

Table 6.1 Component loadings for the rotated components

            Component
Variable    1       2       3       4       5
Goal3       .917
Use2        .872
Goal5       .807
Behave2     .787
Behave1     .663
Goal6       .534
Behave4             .797
Behave3             .770
Feature6            .726
Feature5            .708
Behave5             .668
Goal7               .628
Feature3                    .753
Feature2                    .744
Use1                        .692
Goal2                       .677
Goal1                               .869
Goal4                               .809
Feature1                                    .786
Feature4                                    .704
Behave6                                     .590

Components one through three in Table 6.1 were extracted and interpreted as personas for the conduct of this study. Table 6.2 provides an interpretation of the components using variable definitions. Component 1 was interpreted to consist of elements of a knowledge provider in the forum. This group of members is likely to contribute, answer questions, and share expertise. For subsequent analysis this was referred to as persona 3. Component 2 was interpreted as a social networker based on behaviors indicating a proclivity for social aspects of the forum and no specific

PAGE 106

goals related to contributing or consuming knowledge. This component served as the foundation for persona 2. Finally, component 3 was interpreted to represent persona 1: a knowledge consumer, or a member specifically focused on finding answers to issues or concerns using search tools and other features.

One could argue that such persona labels could have been determined by inspection of the survey questions in Appendix A; however, such a subjective method could just as easily have resulted in additional labels. Mulder and Yaar (2007) suggest that PDTs may have a sense for the important variables that they feel will determine personas, but that statistical analysis would provide a richer perspective of which variables were most important and likely to describe the persona labels (p. 158). The PCA method used here provided statistical evidence for the identification of the number of personas and the variables that define them.

Using SPSS, component scores were computed for each of the participants and used as input variables for clustering. Because it cannot be known a priori which clustering method will perform best for all data, decision guidelines were used to determine appropriate clustering methods. One of the heuristics applied was based on the presence of outliers. Component scores from the first two components were used to plot members according to survey responses, resulting in the plot in Figure 6.1. Visual observation of the data revealed several outliers, for example members 5, 6, and 42. Hair et al. (2006) recommend using the average linkage or centroid hierarchical agglomerative clustering methods if there are outliers; Everitt, Landau,

PAGE 107

and Leese (2001) recommend the average linkage or Ward's method for continuous data types (such as the data set in this study). Since both the centroid and Ward's methods resulted in highly uneven clusters, they were not considered for final clustering. The average linkage within groups method of clustering based on Euclidean distance resulted in more balanced cluster sizes of 33, 3, and 17 members for personas 1-3 (respectively); therefore, this method was used to conduct cluster analysis and assign members to the three persona groups. The FA/PCA clustering results for each member are presented in Appendix E.

Table 6.2 Interpretation of the components

Component 1 / (Persona 3)
Interpretation: Knowledge Provider: navigates forum to find opportunities to contribute
  Goal3     Contributing/posting to discussions
  Goal5     Uploading/posting documents
  Goal6     Casually browsing the forum
  Behave1   Ask a question in the forum
  Behave2   Answer other questions in the forum
  Use2      I use the forum to share my experiences or give advice to others

Component 2 / (Persona 2)
Interpretation: Social Networker: uses forum to connect with others and enjoys social networking features
  Behave3   Look for peers or friends in the forum to see what they are doing
  Behave4   Post details to your profile (dog tag)
  Behave5   View a member's profile after reading his/her posting (document or discussion)
  Feature5  Leader Challenges (interactive scenarios)
  Feature6  Social networking tools to connect and communicate with other
  Goal7     Seeing who is logged in to the forum

Component 3 / (Persona 1)
Interpretation: Knowledge Consumer: looks for specific information to answer questions
  Feature2  Ability to search using a search tool (by typing in key words)
  Feature3  Ability to view and/or participate in discussion threads on different topics
  Goal2     Reading discussions
  Use1      I use the forum to find answers to my questions/concerns or to seek advice

PAGE 108

Figure 6.1 Scatter Plot of Forum Members Based on Component Scores (horizontal axis: REGR factor score 2 for analysis 1)

6.2 Latent Semantic Analysis Results

Latent Semantic Analysis was conducted on the members' text responses to the six interview questions using an online tool provided by the University of Colorado (http://lsa.colorado.edu/). The text of each member was compared to all other members using the general reading up to first year of college topic space with 300 factors. The resulting proximity matrix (a 53-by-53 matrix of numbers between 0 and 1 representing the similarity of text responses among the interview participants) was analyzed in SPSS using hierarchical agglomerative clustering as suggested by Miaskiewicz et al. (2008). For proximity matrix data types, Everitt et

PAGE 109

al. (2001) recommend using the single, average, or complete linkage (furthest neighbor) clustering methods. All three methods were applied, but the resulting clusters were mostly uneven. The single and average linkage clusters consisted of highly dominant groups where one cluster had 50 members assigned out of 53. Even though the complete linkage clustering was somewhat uneven, it provided a better representation of the three persona groups, with cluster sizes of 10, 41, and 2 members for personas 1-3 (respectively). Therefore, the complete linkage method based on Euclidean distance was chosen for the final LSA clustering results listed in Appendix E.

6.3 Cluster Analysis Results

Cluster Analysis was conducted in SPSS using the four observation variables extracted from the forum's transaction server log. For plotting purposes, the four observation variables were reduced to two using PCA and used to plot members according to usage, resulting in the plot in Figure 6.2. Visual observation of the data revealed several outliers, for example members 11, 12, and 48. Hair et al. (2006) recommend using the average linkage or centroid hierarchical agglomerative clustering methods if there are outliers; however, both methods resulted in highly uneven clusters. Everitt et al. (2001) recommend using Ward's or average linkage for continuous data types. Average linkage was used in clustering, but once again it resulted in highly uneven clusters. Ward's method with cosine distance was used

PAGE 110

and resulted in reasonably sized clusters that captured the patterns indicated in Figure 6.2. The cluster sizes were 23, 18, and 12 for personas 1-3 (respectively). The CA clustering results for each member are presented in Appendix E.

6.4 SPEC Results

As discussed in the last chapter, SPEC relies on the results of the base clustering methods reported in the three previous sections in order to make combined, incremental clustering decisions. The first step in SPEC's cluster decision center was conducted by applying the majority vote rule. In 43 instances, at least two of the base clustering methods agreed on the member assignment; in those cases, that assignment became the final cluster decision. The "Majority" column in Table E.1 of Appendix E indicates the cluster labels for the cases that resulted in a majority. Out of those 43 cases in which there was a majority decision, the base methods unanimously agreed on 6 cases.
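The majority vote step can be sketched as follows, using the base assignments reported for LSA, FA/PCA, and CA in Table 6.6. This is an illustrative reconstruction rather than the prototype's code; the helper names `labels_from` and `majority_vote` are hypothetical, and members not listed in a method's explicit groups are assigned to that method's remaining persona.

```python
from collections import Counter

MEMBERS = range(1, 54)  # forum members 1..53

def labels_from(groups, default):
    """Map member ID -> persona; members not listed receive `default`."""
    labels = {m: default for m in MEMBERS}
    for persona, members in groups.items():
        for m in members:
            labels[m] = persona
    return labels

# Base clusterings transcribed from Table 6.6.
lsa = labels_from({1: [13, 16, 18, 21, 29, 30, 34, 38, 40, 50],
                   3: [2, 8]}, default=2)
fa_pca = labels_from({2: [25, 26, 51],
                      3: [3, 4, 7, 11, 12, 14, 16, 21, 27, 29, 33,
                          34, 35, 36, 38, 42, 49]}, default=1)
ca = labels_from({2: [5, 6, 8, 16, 25, 26, 27, 28, 29, 30, 31, 33,
                      34, 35, 36, 49, 50, 51],
                  3: [1, 3, 4, 7, 9, 11, 12, 14, 22, 32, 38, 48]}, default=1)

def majority_vote(votes):
    """Return (label, count); label is None when no strict majority exists."""
    label, count = Counter(votes).most_common(1)[0]
    return (label if count > len(votes) / 2 else None), count

decided, undecided, unanimous = {}, [], []
for m in MEMBERS:
    label, count = majority_vote([lsa[m], fa_pca[m], ca[m]])
    if label is None:
        undecided.append(m)      # passed on to the HGCE step
    else:
        decided[m] = label
        if count == 3:
            unanimous.append(m)

print(len(decided), "majority cases;", len(unanimous), "unanimous")
print("no majority:", undecided)
```

With three base methods and three personas, a member fails to reach a majority only when all three methods disagree, which is why the undecided cases are handed to the label-only HGCE algorithms.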

PAGE 111

Figure 6.2 Scatter Plot of Members Using Component Scores from Usage Data (plot title: Component Score 1 vs Score 2; horizontal axis: REGR factor score 1 for analysis 1)

For the remaining 10 cases (in which there was no majority), the HGCE algorithms were applied to make the final cluster decisions. Cluster analysis was conducted using the base clusterer labels as input to the HGCE package (found at ) in Matlab version R2009b. The labels for the cases that required the HGCE analysis are displayed in the "HGCE" column in Table E.1 in Appendix E. Since the ANMI for CSPA (0.384) was higher than for MCLA (0.355) and HGPA (0.325), the HGCE supra-consensus function used the CSPA output to determine the final cluster labels. The application of SPEC resulted in cluster sizes of 25, 18, and 10 members for personas 1-3 (respectively). SPEC clustering results for all members are presented in Appendix E.
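The induced pairwise similarity that CSPA starts from can be illustrated with a small co-association sketch. It shows only the similarity step, not the graph partitioning that follows; the function name `co_association` is hypothetical, and the five members and their (LSA, FA/PCA, CA) labels are read from Table 6.6 purely as an example.

```python
def co_association(label_vectors):
    """Fraction of base clusterings in which each pair of objects shares a cluster.

    label_vectors: dict mapping object ID -> tuple of labels, one per base clustering.
    """
    objects = sorted(label_vectors)
    n_clusterings = len(next(iter(label_vectors.values())))
    similarity = {}
    for a in objects:
        for b in objects:
            shared = sum(x == y for x, y in zip(label_vectors[a], label_vectors[b]))
            similarity[(a, b)] = shared / n_clusterings
    return similarity

# (LSA, FA/PCA, CA) labels for five members, read from Table 6.6.
labels = {
    1: (2, 1, 3),
    8: (3, 1, 2),
    13: (1, 1, 1),
    18: (1, 1, 1),
    25: (2, 2, 2),
}
sim = co_association(labels)
for pair in [(13, 18), (1, 8), (13, 25)]:
    print(pair, round(sim[pair], 3))
```

Members 13 and 18 are grouped together by every base method and so receive a similarity of 1, while members 1 and 8 agree only under FA/PCA; a similarity-based re-clustering then works from this matrix alone.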

PAGE 112

6.5 Expert Panel Results

Separate from SPEC's analysis, the expert panel concurrently analyzed the data and made cluster decisions. Four experts independently analyzed the data and submitted their clustering results. The first expert panel task required the experts to identify the number of groups and describe their qualities. Two experts identified four groups and one expert found three groups. A fourth expert, one with considerable expertise in the CC forum, reviewed the input from the first three experts and combined it into the final expert panel decision shown in Table 6.3. The expert panel identified three groups (personas) from the data:

1. Specific information seeker
2. General browser
3. Contributor-helper

The expert panel persona labels (1-3) in Table 6.3 were ordered to correspond with the groups identified by SPEC so that the personas for each method had similar meaning, to facilitate agreement calculations. For example, the expert panel persona 1 member ("Specific Information Seeker") is similar to the SPEC persona 1 member ("Knowledge Consumer") in that both personas look for specific information to answer questions or concerns. The other two personas also correspond with each other. Expert panel persona 2 ("General Browser") closely resembles SPEC persona 2 ("Social Networker") because they both contain

PAGE 113

elements of social goals and they both lack specific contribution or consumption goals. Expert panel persona 3 ("Contributor-helper") matches SPEC persona 3 ("Knowledge Provider"), as they both strive to contribute knowledge to the forum.

Table 6.3 Expert panel results for persona group identification and description

Persona  Group Title / Description
1        Specific Information Seeker
         Logs into CC to find something specific. Has a question he needs to answer or is looking for a specific tool or document he needs right now. Visits to the forum are typically low but spike when preparing to become a commander, preparing for some key upcoming event, or dealing with a specific issue.
2        General Browser
         Logs into CC to gain general knowledge about the profession. Typically is not yet an experienced company commander and rarely shares their knowledge in CC. Visits to the forum could range from moderate to high. Member is more likely to be interested in discussions and social aspects of the forum.
3        Contributor-helper
         Logs into CC to read and to contribute/help other members. Most of the actual contributions in CC come from this type of member. Typically, he is an experienced company commander with a desire to give back and to contribute. Visits to the forum could range from low to high, but if their visits are "low" they have a high ratio of contributions per visit.

The second expert panel task required the experts to assign forum members to the persona groups the panel identified in task 1. Three experts classified all 53 users

PAGE 114

into one of the three personas. In two cases (members 27 and 45) the panel could not reach a majority decision; therefore, a fourth expert classified just those two cases to reach a consensus clustering decision. The experts reached a unanimous decision in 26 cases. Expert 1 had the highest level of agreement with the consensus expert clustering, with a nearly perfect kappa of 0.828; expert 2 had substantial agreement and expert 3 had moderate agreement. The agreements between individual experts and the expert consensus groupings are listed in Table 6.4. The expert panel's final clustering decision for all 53 members is shown in Table 6.5.

Table 6.4 Agreement (kappa) between individual expert and consensus groupings

            Expert 1   Expert 2   Expert 3   Consensus
Expert 1    1
Expert 2    0.606      1
Expert 3    0.391      0.402      1
Consensus   0.828      0.773      0.554      1

PAGE 115

Table 6.5 Expert panel individual and consensus clustering decisions

Member  Expert  Expert  Expert  Expert  Consensus  |  Member  Expert  Expert  Expert  Expert  Consensus
ID      1       2       3       4                  |  ID      1       2       3       4
1       3       2       2       --      2          |  28      2       2       1       --      2
2       1       1       1       --      1          |  29      2       1       2       --      2
3       3       3       2       --      3          |  30      1       1       1       --      1
4       2       3       2       --      2          |  31      2       2       1       --      2
5       2       1       1       --      1          |  32      2       2       2       --      2
6       3       3       3       --      3          |  33      2       2       3       --      2
7       3       3       3       --      3          |  34      1       2       1       --      1
8       3       3       3       --      3          |  35      3       3       2       --      3
9       2       1       1       --      1          |  36      2       2       2       --      2
10      3       3       2       --      3          |  37      1       1       1       --      1
11      3       3       3       --      3          |  38      2       3       3       --      3
12      3       3       3       --      3          |  39      1       1       1       --      1
13      1       1       1       --      1          |  40      1       1       1       --      1
14      3       3       3       --      3          |  41      1       1       1       --      1
15      2       2       2       --      2          |  42      2       2       1       --      2
16      1       3       1       --      1          |  43      1       1       1       --      1
17      2       1       1       --      1          |  44      1       1       1       --      1
18      1       1       1       --      1          |  45      2       3       1       2       2
19      3       3       2       --      3          |  46      1       3       1       --      1
20      1       1       1       --      1          |  47      1       1       1       --      1
21      2       1       1       --      1          |  48      2       2       1       --      2
22      2       2       2       --      2          |  49      3       3       2       --      3
23      1       2       1       --      1          |  50      1       1       1       --      1
24      2       2       1       --      2          |  51      2       2       2       --      2
25      2       2       2       --      2          |  52      3       3       2       --      3
26      2       2       2       --      2          |  53      1       1       2       --      1
27      2       3       1       2       2          |

PAGE 116

6.6 Hypothesis Testing

6.6.1 Effective Results

Effectiveness of the existing semi-automated qualitative and quantitative clustering techniques was measured by comparing the assignment agreement for each method with the manual expert clustering and computing Cohen's kappa for each method. The final persona labels for all methods are listed in Table 6.6. The LSA and FA/PCA methods resulted in unevenly sized clusters, though the CA, SPEC, and expert clusters were fairly even, as indicated by the numbers in parentheses in Table 6.6; all three clustered 18 members into persona 2. Appendix F contains the cross tabulation tables; the numbers along the diagonals represent the number of matching cases, and the numbers off the diagonals represent the unmatched cases.

The performance of each method in terms of the raw number of matching cluster assignments and the adjusted kappa measure is listed in Table 6.7, and Table E.1 shows a full comparison for each member. As indicated in Table E.1 of Appendix E, SPEC's majority vote rule decided the persona label for 43 cases; in 32 of those cases SPEC and the expert panel agreed. For the remaining 10 cases (in which there was no majority), SPEC used the HGCE algorithms to determine persona labels: 4 out of 10 of these cases matched the expert persona labels. (It should be noted that the agreement between the HGCE method alone and the expert clustering resulted in a kappa of 0.289.) SPEC had the highest raw number of matching cluster

PAGE 117

assignments with the expert panel (36), followed by CA (31), FA/PCA (30), and LSA (26).

Hypothesis H1 requires a comparison of the agreement between the existing semi-automated methods and SPEC. As indicated in Table 6.7, the LSA kappa of 0.216 indicates fair agreement with experts; the FA/PCA kappa of 0.326 indicates fair agreement; the CA kappa of 0.360 indicates fair agreement; and the SPEC kappa of 0.503 indicates moderate agreement. The combined kappa for the pilot and main studies was 0.587, indicating an overall moderate agreement. A one-sample t test was conducted to determine if SPEC's kappa was significantly higher than the kappa for the other methods. Results indicate that SPEC was significantly more effective (t = 4.656, p = .021). Hypothesis H1 is therefore supported by the data.
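One way to read the one-sample t test above is as a test of the three existing methods' kappas against SPEC's kappa as the reference value. The sketch below follows that reading, which is an assumption: it reproduces the reported t statistic from the values in Table 6.7, and the one-tailed p-value uses the closed-form Student t distribution for two degrees of freedom. Variable names are hypothetical.

```python
import math
import statistics

other_kappas = [0.216, 0.326, 0.360]  # LSA, FA/PCA, CA (Table 6.7)
spec_kappa = 0.503                    # SPEC (Table 6.7), used as the test value

n = len(other_kappas)
mean = statistics.mean(other_kappas)
std_err = statistics.stdev(other_kappas) / math.sqrt(n)  # sample SD / sqrt(n)

# Distance of SPEC's kappa above the sample mean, in standard errors.
t_stat = (spec_kappa - mean) / std_err

# For n - 1 = 2 degrees of freedom the Student t CDF has the closed form
# F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)); the one-tailed p-value is 1 - F(t).
p_one_tailed = 0.5 - t_stat / (2 * math.sqrt(t_stat ** 2 + 2))

print(f"t = {t_stat:.3f}, one-tailed p = {p_one_tailed:.4f}")
```

With only three comparison methods the test has two degrees of freedom, so the result is best taken as supporting evidence alongside the raw agreement counts rather than as a strong inferential claim.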

PAGE 118

Table 6.6 Expert panel and semi-automated methods clustering results
(members assigned to each persona cluster; cluster sizes in parentheses)

LSA
  Persona 1 (10): 13, 16, 18, 21, 29, 30, 34, 38, 40, 50
  Persona 2 (41): 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33, 35, 36, 37, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53
  Persona 3 (2): 2, 8

FA/PCA
  Persona 1 (33): 1, 2, 5, 6, 8, 9, 10, 13, 15, 17, 18, 19, 20, 22, 23, 24, 28, 30, 31, 32, 37, 39, 40, 41, 43, 44, 45, 46, 47, 48, 50, 52, 53
  Persona 2 (3): 25, 26, 51
  Persona 3 (17): 3, 4, 7, 11, 12, 14, 16, 21, 27, 29, 33, 34, 35, 36, 38, 42, 49

CA
  Persona 1 (23): 2, 10, 13, 15, 17, 18, 19, 20, 21, 23, 24, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 52, 53
  Persona 2 (18): 5, 6, 8, 16, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 49, 50, 51
  Persona 3 (12): 1, 3, 4, 7, 9, 11, 12, 14, 22, 32, 38, 48

SPEC
  Persona 1 (25): 2, 10, 13, 15, 17, 18, 19, 20, 21, 23, 24, 30, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 50, 52, 53
  Persona 2 (18): 1, 5, 6, 8, 9, 22, 25, 26, 27, 28, 31, 32, 33, 35, 36, 48, 49, 51
  Persona 3 (10): 3, 4, 7, 11, 12, 14, 16, 29, 34, 38

Expert
  Persona 1 (22): 2, 5, 9, 13, 16, 17, 18, 20, 21, 23, 30, 34, 37, 39, 40, 41, 43, 44, 46, 47, 50, 53
  Persona 2 (18): 1, 4, 15, 22, 24, 25, 26, 27, 28, 29, 31, 32, 33, 36, 42, 45, 48, 51
  Persona 3 (13): 3, 6, 7, 8, 10, 11, 12, 14, 19, 35, 38, 49, 52
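The agreement figures can be recomputed directly from the member lists in Table 6.6. The sketch below is a plain implementation of Cohen's kappa (observed agreement corrected for the agreement expected from the marginal cluster sizes); the helper names are hypothetical, and members not listed explicitly are assigned to the method's remaining persona.

```python
MEMBERS = range(1, 54)  # forum members 1..53

def labels_from(groups, default):
    """Map member ID -> persona; members not listed receive `default`."""
    labels = {m: default for m in MEMBERS}
    for persona, members in groups.items():
        for m in members:
            labels[m] = persona
    return labels

expert = labels_from({1: [2, 5, 9, 13, 16, 17, 18, 20, 21, 23, 30, 34, 37,
                          39, 40, 41, 43, 44, 46, 47, 50, 53],
                      3: [3, 6, 7, 8, 10, 11, 12, 14, 19, 35, 38, 49, 52]}, default=2)
methods = {
    "LSA": labels_from({1: [13, 16, 18, 21, 29, 30, 34, 38, 40, 50],
                        3: [2, 8]}, default=2),
    "FA/PCA": labels_from({2: [25, 26, 51],
                           3: [3, 4, 7, 11, 12, 14, 16, 21, 27, 29, 33,
                               34, 35, 36, 38, 42, 49]}, default=1),
    "CA": labels_from({2: [5, 6, 8, 16, 25, 26, 27, 28, 29, 30, 31, 33,
                           34, 35, 36, 49, 50, 51],
                       3: [1, 3, 4, 7, 9, 11, 12, 14, 22, 32, 38, 48]}, default=1),
    "SPEC": labels_from({2: [1, 5, 6, 8, 9, 22, 25, 26, 27, 28, 31, 32,
                             33, 35, 36, 48, 49, 51],
                         3: [3, 4, 7, 11, 12, 14, 16, 29, 34, 38]}, default=1),
}

def matches(a, b):
    """Number of members given the same persona by both labelings."""
    return sum(a[m] == b[m] for m in a)

def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = matches(a, b) / n
    categories = set(a.values()) | set(b.values())
    p_e = sum((sum(v == c for v in a.values()) / n) *
              (sum(v == c for v in b.values()) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

for name, labels in methods.items():
    print(f"{name}: {matches(labels, expert)}/53, kappa = {cohen_kappa(labels, expert):.3f}")
```

Run against these lists, the calculation gives the same matching counts as Table 6.7 and the same kappas for LSA, FA/PCA, and CA; for SPEC it gives roughly 0.50, close to the reported 0.503.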

PAGE 119

Table 6.7 Cohen's kappa and degree of agreement for all four methods

Method    Matching Persona Assignments   Kappa   Degree of Agreement
LSA       26/53                          0.216   Fair
FA/PCA    30/53                          0.326   Fair
CA        31/53                          0.360   Fair
SPEC      36/53                          0.503   Moderate

6.6.2 Efficient Results

Efficiency was measured in terms of the time required to identify the persona groups and assign all 53 members to those groups. In addition to performing the clustering tasks, the expert panel reported individual times for each task. Table 6.8 shows the range of times for the experts, the average times for each task, and the total time required. A one-sample t test was conducted to determine if the expert times were significantly greater than SPEC's times for persona clustering. Results indicate that experts were significantly different from SPEC on task 1 (t = 5.568, p = .015), task 2 (t = 3.539, p = .019), and the total clustering time (t = 4.571, p = .022). Inspection of the expert times in Table 6.8 indicates that their average times for task 1 (180 minutes), task 2 (120 minutes), and the total time (300 minutes) are significantly higher than the times required by SPEC (30, 60, and 90 minutes). Hypothesis H2 is therefore supported by the data.

PAGE 120

Table 6.8 Times to complete clustering tasks for experts and SPEC

Task Times (minutes)
                        Identify/Describe   Assign Members
                        Groups              to Groups        Total
Expert Panel, Range     150-240             80-150           230-390
Expert Panel, Average   180                 120              300
SPEC                    30                  60               90

6.7 Using the Results to Create Personas

Based on the final clustering using the SPEC method, cluster data were computed to serve as the basis for step 4 of the persona development process: persona creation and presentation. This last step highlights the importance of the clustering process because the resulting personas were based on actual data from members comprising the different personas. The more accurate the clustering process, the more representative the final clusters.

Using several automated tools in conjunction with SPEC, only minimal effort is required to analyze and organize the data for persona creation. As displayed in

PAGE 121

Table 6.9, summary data for the three personas were computed and organized in order to differentiate the personas using qualities that define the users' mental models (forum activity, goals, needs, and frustrations) and capture key demographics. Some of the entries in the table for qualities such as "typical session" or "finds expertise by" were identified by using the text summarization feature in word processing software to quickly summarize key words and phrases in the interview text. Other table entries were identified by analyzing the survey and usage data in SPSS.

Three personas were created using the data in Table 6.9 and summary text responses. Actual text was inserted into the persona charts in order to add rich and realistic data. For example, each persona has a mantra that summarizes the persona's general attitude about the forum. As shown in Figure 6.3, the mantra for persona 1 is: "I'm not looking to re-create the wheel. In the Army we have plagiarism; it is the sincerest form of flattery." The text for this mantra, as well as the mantras for the other two personas shown in Figures 6.4 and 6.5, was copied from an actual member interview and used to add realistic details to the personas.

PAGE 122

Table 6.9

Quality                   Persona 1                      Persona 2                   Persona 3
Demographics
  Marital                 Married with children          Single/Married              Married with children
  Years of Svc (Age)      9 (30)                         8.5 (29)                    13 (32)
  Career                  Pre or In Command              In or Post Command          In Command
  Education               Bachelor's                     Bachelor's                  Bachelor's/some Graduate
Average Forum Activity
  Use minutes/week        105                            75                          60
  Contributions           0                              1                           8
  Page views              166                            157                         746
  Profile views           2                              6                           55
  Downloads               65                             20                          60
Computing skills          Very Proficient                Proficient                  Somewhat Proficient
Typical session           Browse Topic Menus             Front Porch                 Specialized
Finds expertise by        Key Word Search                Browse Topic Menus          Through People
Sources of expertise      Documents                      Discussions                 Discussions & Documents
Shares expertise by       Post Documents                 Post/Reply to Discussions   Answer Questions
Social networking         Medium Participation           High Participation          Low Participation
Frustrations              Search Tool; Finding Info      Navigation & Layout         None to Few

PAGE 123

Figure 6.3 (persona 1)

Persona 1: CPT Alley

Mantra: "I'm not looking to re-create the wheel. In the Army we have plagiarism; it is the sincerest form of flattery."

Demographics: Married, 29 years old. Bachelor's Degree in Business. Preparing for Company Command. Enjoys hiking, travel, and reading.

Typical Forum Activity: Use/week (min.) 105; Contributions 0; Page views 166; Profile views 2; Downloads 65.

How he uses the forum: Knowledge Consumer

Typical Session: CPT Alley's first inclination is to click on the topics in the browse section of the forum and drill down to a related category. If he can't find what he's looking for, he uses the search tool. His goal is to find answers to questions or concerns.

Finding Expertise: Alley sees expertise as good documents: tactics, techniques, procedures, policies, checklists, and other useful files. He may look at the contributor's profile to evaluate his background or expertise, but in the end he's looking for a starting point or several examples to combine for his needs. Alley uses the topic hierarchy to find documents; the search tool is his backup.

Sharing Expertise: Alley has not shared his expertise in the past, but if he did he would share a document once he used it and achieved good results. He is likely to share documents such as standard operating procedures, counseling statements, or tracking tools.

Computing Skills: Alley is very proficient at using the computer and can diagnose problems and change settings if needed. His Web skills are also proficient.

Frustration with the Forum: He gets frustrated with the forum's navigation and layout. Alley wants to find a related topic in 2 clicks, so he wants the topic hierarchy to include advanced options and more detailed topics.

Use of Social Networking Tools: Alley is not very active in Web 2.0 sites, but he has used Facebook and MySpace. He may get on Facebook once a [illegible] to see what friends are doing, but he doesn't update his status often.

PAGE 124

Figure 6.4 (persona 2)

Persona 2: CPT Dugan

Mantra: "I like to see what my friends and fellow officers are doing. The forum serves as a great means of staying in touch..."

Demographics: Married, 1 child, 30 years old. Bachelor's Degree in Political Science. Currently in Command. Enjoys golfing, biking, and photography.

Typical Forum Activity: Use/week (min.) 75; Contributions 1; Page views 157; Profile views 6; Downloads 20.

How he uses the forum: Social Networker

Typical Session: The first thing he does is look at the topics on the front porch to see if anything looks interesting; then he looks to see who is logged into the forum. Dugan likes to read discussion threads to see what others think and feel about situations he is facing, or may be facing soon.

Finding Expertise: CPT Dugan values the expertise found in discussions: reading about the experiences of others, taking in their ideas, and getting different perspectives on the issues. He often looks at other members' profiles and tracks members he feels are consistently making valuable contributions. Dugan would not hesitate to email another member if he has trouble finding the right information.

Sharing Expertise: Although he doesn't normally feel experienced enough to contribute to discussions, Dugan is most comfortable sharing his experience in the form of discussions, either posting new discussion threads or replying to existing threads. He is likely to share lessons learned, stories of his experiences, and general tips.

Computing Skills: Dugan is proficient at using the computer, especially MS Office products. He uses the Web daily, so he is comfortable using the forum.

Frustration with the Forum: His main complaint is that information is hard to find; to him, the search tool needs to do better at narrowing the results so he doesn't need to scroll through the discussions and dig for information.

Use of Social Networking Tools: Dugan participates in a number of Web 2.0 sites: Facebook, MySpace, Twitter, and LinkedIn. He actively uses these sites to keep in touch with [illegible] and friends (Army, high school & college).

PAGE 125

Figure 6.5 (persona 3)

Persona 3: CPT Woods

Mantra: "If there's a topic that I think I [illegible] to, I will go ahead and offer my experience."

Demographics: Married, 2 children, 32 years old. Master's Degree in Management with some graduate school. Completed Company Command. Enjoys exercise, cooking, and reading.

Typical Forum Activity: Use/week (min.) 60; Contributions 8; Page views 746; Profile views 55; Downloads 60.

How he uses the forum: Knowledge Provider

Typical Session: CPT Woods enters the forum and quickly scans the topics to see which areas and members are active. He often weaves between forums to see if he can find answers in other Army communities to share in the CC forum.

Finding Expertise: Woods doesn't bother with looking through documents or discussions to find expertise. If he's not already the expert, he knows who to contact; he's not afraid to ask forum leads because he is one and he [illegible] the other forum experts. Woods looks outside the CC forum for expertise if necessary, crossing into other forums for intel, maneuver, and personnel.

Sharing Expertise: Woods looks for opportunities to share his expertise, or to answer questions with answers from another area or another forum altogether. Although he will offer his unsolicited opinions, he is more likely to contribute when asked by a topic lead.

Computing Skills: Woods is somewhat proficient at using the computer, but he is also a multitasker, moving from one program or forum to another and back.

Frustration with the Forum: Woods has no problems with the forum; he's just glad there's a way to share knowledge with his peers.

Use of Social Networking Tools: Woods believes in networking with peers in the Army and in the forum, but he rarely uses tools in the forum for this purpose; he would rather email other members. He is not a fan of Facebook and other commercial social networking tools.


7. Discussion and Conclusion

As an aid to the reader, this final chapter begins by restating the research questions and reviewing the research methods used in the study. This chapter further serves to effectively communicate (G7) the research, as stipulated in the DSR guidelines, by summarizing the results, discussing their implications, highlighting the contributions, and recommending future research.

The use of personas as a design tool for understanding IS users and improving system interfaces is relatively new (Mulder & Yaar, 2007). As such, there are many questions about the use and development of personas in research and practice. This study examined four research questions:

1. What are the existing qualitative and quantitative methods used for identifying personas?
2. How do the various quantitative and qualitative methods used in identifying personas differ?
3. How do existing semi-automated qualitative and quantitative techniques perform when compared to manual clustering?
4. Can a method based on ensemble clusters improve clustering performance?


To answer these questions, the SPEC method and prototype were developed to solve the identified problems with persona clustering. Effectiveness and efficiency were defined as performance measures to evaluate the hypotheses for the new artifacts. Three instruments were then used to collect qualitative and quantitative data from members of a military KMS. The clustering effectiveness of three existing persona clustering methods was compared with the new SPEC method by using an expert panel clustering as the baseline.

The first two research questions were answered by reviewing the persona literature and identifying four persona clustering methods: a traditional manual qualitative method and three semi-automated methods (LSA, FA/PCA, and CA). Based on the performance measure used, SPEC outperformed LSA, FA/PCA, and CA (Cohen's kappa of 0.503 versus 0.216, 0.326, and 0.36, respectively; see Table E.1), supporting the first hypothesis (H1) and answering the last two research questions. This finding is consistent with those reported in recent cluster ensemble studies in other research domains (Gionis et al., 2007; Hong et al., 2008; Turner & Agogino, 2008).

Although SPEC provided evidence that a cluster ensemble can improve clustering performance, the increase was not as substantial as anticipated. While the pilot study showed remarkable results for the SPEC method, the full study showed somewhat weaker results. This could be attributed to differences in the samples and experts between the pilot and main studies. The members sampled in the pilot study were from one geographic location and had a wide range of usage behavior.


The two experts in the pilot study had forum and interface design expertise, whereas the four experts in the main study were all human factors experts. Nevertheless, the fact that the research methodology and application of SPEC were replicated with two different sets of data indicates reliability in the procedures and confidence in the results.

The fourth research question also prompted a second hypothesis that measured performance in terms of efficiency, or the amount of time needed to perform persona clustering. The times for SPEC and the manual qualitative method were computed and compared; SPEC times were significantly lower, with SPEC performing the task more than three times faster than the manual method. This result provided evidence to support the second hypothesis (H2). This finding is similar to those reported in other studies that implemented semi-automated methods (McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008; Sinha, 2003); however, this study provided empirical, as opposed to anecdotal, evidence. Therefore, based on the evidence supporting H1 and H2, the answer to the fourth research question is clear: in terms of effectiveness and efficiency, a method based on ensemble clusters can improve clustering performance.

7.1 Contributions

Within the realm of IS, this study provides an apt illustration of the seven DSR guidelines. Hevner et al. (2004) assert that there are three types of contributions for


DSR in the IS domain (the design artifact, foundations, and methodologies) and that one or more of these must be included in the research (G4). This research makes three contributions.

First, it provides the first known empirical comparison of existing persona clustering methods. The persona literature is dominated by manual clustering; however, recent studies have implemented semi-automated methods. The literature review revealed that although three semi-automated methods have been studied, there is a dearth of empirical research comparing persona clustering methods. The comparisons from this study may influence researchers and practitioners as they make future decisions on which persona clustering methods to apply for research or practice.

The second contribution directly addresses the DSR guidelines by presenting two novel IT artifacts, a method and an instantiation, that improve persona clustering. The first artifact is a semi-automated, mixed-analyses persona clustering method that combines the results of several base clustering methods into an ensemble method. The second artifact is an instantiation in the form of a prototype system that automates the clustering method. Although ensemble clusters have been successfully applied in other domains, they have not been used for persona development. SPEC implements clustering algorithms commonly found in most statistical software packages or, in the case of the HGCE algorithm and the LSA tool, available online for free. Using concepts from the Dempster-Shafer and


information theories, evidence from clustering is combined in SPEC to cluster IS users into personas in a manner similar to human experts. The prototype system described in this study helps automate the process of identifying the number of personas for a system, describing the underlying persona qualities, and assigning users to personas to support the persona development process. This new persona clustering method can also be used to help explain the persona qualities and support the persona development process. A pilot study was used to demonstrate the feasibility of the artifacts and the utility of the resulting personas. Use of one of the personas provided new insights for interface designers in the case study environment and served as the impetus for a new design feature to improve usability.

Third, this empirical study validates SPEC as a persona clustering method and compares its performance with the three existing persona clustering methods. Data are collected on users of a military KMS using three instruments and analyzed using automated tools and an expert panel. Unlike previous research in semi-automated persona clustering, this research automates all of the data collection and most of the analysis. A robust methodology is prescribed and followed, providing a foundation for knowledge and opportunities for others to assess its applicability in other settings and replicate it in future studies.


7.2 Implications for Research

Although this research did not apply or test theory, researchers should consider theory development as a potential outcome of using the IT artifacts developed in this study. Hevner et al. (2004) believe that we discover knowledge through the development and use of DSR artifacts that may lead to theories regarding their application and impact (p. 76). Within the persona literature, practice issues are more prevalent than theory; however, this is likely because personas are relatively new tools and it takes time to evaluate design artifacts and develop theory.

The findings suggest that semi-automated persona clustering methods should employ an ensemble approach to triangulate the plethora of user data. The three base clustering approaches in this study used only portions of the available data for persona clustering, even though the researchers indicated more data were available (Almirall et al., 2010; Javahery et al., 2007; McGinn & Kotamraju, 2008; Miaskiewicz et al., 2008; Sinha, 2003). They used the additional data to sequentially verify clusters when they could have concurrently analyzed all of the data to determine clusters. Following the DSR perspective, this study builds and evaluates a method and its instantiation based on ensemble clusters. Ensemble clusters have been studied in other domains and found to improve measures of performance, but there is no known use of ensemble clusters in the persona literature. Persona clustering should take all sources of data into consideration as


well as all types of available data. Therefore, researchers need to employ mixed-method research approaches using mixed analyses to effectively combine disparate sources and types of data for persona clustering.

Another implication for research is that the automation of persona clustering methods requires more reliance on online research methods. As the amount of web usage data grows, it will be critical to harness increasing volumes of data for persona clustering research. Although it is common today for researchers to use online questionnaires, it is not as common for them to adopt an online approach to interviewing (Ayling & Mewse, 2009). As researchers find new ways to mine data on the web and better ways to model user behavior, persona clustering should improve. Online research methods are particularly useful for persona development projects in the military, where system users are often deployed around the globe. It is difficult, if not impractical, to conduct face-to-face interviews and physically observe users to collect the necessary data. Therefore, it is imperative for researchers to acquire the requisite skills to conduct online research.

7.3 Implications for Practice

An important milestone in DSR is to demonstrate that a designed artifact actually works; once that is achieved, it is important to understand why it works and under what conditions (Hevner et al., 2004). Practitioners can use these functional artifacts to solve similar problems with personas and improve practice. Although


users' mental models vary from one system to another (e.g., transaction processing systems are different from DSS and KMS), the instruments used to collect data can be modified to suit the needs of different systems and organizations. Although one would expect the tasks, interfaces, and goals to be different across a range of systems, the method developed here is robust and should be tested by practitioners in a variety of environments. One cannot be sure, however, that the method and SPEC will work for other types of systems.

7.4 Limitations

Since this study collected a non-random sample of users within the context of a single system, there is limited generalizability of these findings to other systems. The sampling strategy was to invite active members of the forum, based on the belief that they were more familiar with the forum's features and would be more likely to describe forum interactions. Although it may be more difficult to elicit the goals and needs of novice, infrequent forum users, it may be worth the effort if the goal is to design an interface specifically for those users. However, since these lurkers remain on the periphery, it may prove too difficult to convince them to participate in these types of studies and become active members of the forum.

From both the positivist and interpretive research perspectives, the findings of this research cannot be generalized to other environments because of the non-random sampling strategy (Lee & Baskerville, 2003). Nevertheless, from a DSR


perspective, practitioners and researchers should take advantage of this research and test these artifacts in other settings. SPEC prescribes a robust method of persona clustering that identifies user groups based on the data collected for a particular system. There is nothing special about the military system or users studied here that would preclude the use of SPEC in other environments.

The instruments used in this study also present some limitations. The online interview consisted of semi-structured questions so that member responses to the same questions could be compared. The participants in this study had to type their responses themselves, which may have been a burden. They may have explained their responses in more detail in a face-to-face recorded interview; in that case, the participants may have provided richer data that could have improved clustering performance. The observation data consisted of user behavior in the forum; however, the observations were limited to the activities captured by the server and the configuration of that server. The observations did not include the length of time for each activity, so an accidental page view was recorded the same as an intentional view that lasted much longer. The collection of the observation data highlights the problem of using automatically collected data and interpreting its meaning.

Another limitation commonly found in cluster analysis studies is that there may be correspondence and association issues with the resulting cluster labels. When using


cluster analysis, the labels for the clustering methods must be comparable; researchers need to associate the description of clusters from one method to another. In this study, the focus of the instruments was on capturing user goals and needs; therefore, clusters from the different clustering methods corresponded with each other. In other words, cluster 1 from the FA/PCA method had the same general qualities as cluster 1 from the CA method. Researchers attempting a cluster ensemble method such as SPEC must use caution to ensure clusters correspond across the different methods. When the number of clusters is the same and there is relative agreement, the correspondence of cluster labels is often obvious upon inspection, allowing the relabeling of clusters if necessary (Everitt et al., 2001).

7.5 Recommendations for Future Research

Since personas are relatively new in the domain of IS design research, there are many opportunities for future research. Staying within the domain of DSR (where the artifact is the focus of research), the artifacts themselves are fruitful objects for future research. The cluster ensemble implemented in SPEC was the result of diligent research; however, there may be other equally effective and efficient ways to combine data for persona clustering. Likewise, additional base clustering methods should be explored, as different sets of data require different inference engines. Fuzzy clustering methods may be appropriate in contexts that require flexible degree-of-membership labels, and at least two unsupervised neural network


models also warrant examination: self-organizing maps (SOM) and adaptive resonance theory (ART) may be applied to recognize patterns in user goals, needs, and behaviors. Also, the current prototype is more a collection of tools than a stand-alone system. Future research on the system artifact should build and evaluate a single interface for PDTs to view data throughout the clustering process and into the persona creation phase.

Another worthwhile study would investigate whether certain combinations of variables can be used with the persona labels to predict the personas of the remaining 9,500 members of the forum. This is an important consideration for practitioners because, once the decision is made to present different views of a system to users, there has to be a way to assign all of the users of a system to personas. One way may be to provide a pop-up page that notifies users of an upcoming change in the system interface and asks them to specify their preference; however, it would be ideal to suggest an interface to users based on their predicted persona. In the case of the Army CC forum, the administrators already have access to usage activity for all members, so research could be done to determine whether the usage variables predict persona group.

Future research should extend the SPEC method and system into a more encompassing persona development methodology. This study focused on creating IT artifacts to deal with the problems of identifying personas and clustering users


into those personas in order to gain a better understanding of typical system users to inform interface design. Although this required specification of the first two steps of a persona methodology, the last step needs more clarification to understand what qualities of personas provide new insights for PDTs to improve persona outcomes.

Finally, researchers should extend this exploratory study into a confirmatory study that seeks to explain or predict the quality of resulting personas as a function of the persona development methodology. This could be measured from two different views, depending on whether the focus is on the users or the designers. In either case, persona outcomes should reflect the expected benefits of personas listed in Table 2.1 (e.g., increased empathy and memory in designers, improved project communication, and improved usability). Long (2009) recently studied the effectiveness of personas on usability (for system users) and memory (for designers); however, his research used pre-prepared personas given to interface design students. Instead of using pre-prepared personas, a researcher could use the persona clustering methods described in this dissertation to create different personas and combine measures of usability and memory using Long's (2009) methodology. Along the same lines, one could also develop a framework for measuring persona effectiveness in terms of the qualities or characteristics that represent users' mental models. In any case, future research should use a rigorous methodology to support generalizability of empirical findings to theory or to other settings.
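The label-correspondence issue raised in Section 7.4 can be made concrete with a small sketch. The Python below is an illustration only and is not part of SPEC: the function name `best_relabeling` and the sample label vectors are hypothetical. It assumes both clusterings use the same set of cluster numbers, tries every one-to-one relabeling of one method's clusters, and keeps the relabeling that agrees most often with a reference method.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def best_relabeling(
    reference: Sequence[int], candidate: Sequence[int]
) -> Tuple[Dict[int, int], int]:
    """Return the one-to-one mapping of candidate labels that best matches the reference.

    Exhaustive search over label permutations; only sensible for a handful of clusters.
    """
    labels = sorted(set(reference) | set(candidate))
    best_mapping: Dict[int, int] = {}
    best_matches = -1
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        matches = sum(mapping[c] == r for r, c in zip(reference, candidate))
        if matches > best_matches:
            best_mapping, best_matches = mapping, matches
    return best_mapping, best_matches


# Hypothetical example: the candidate method found similar groups but numbered them differently.
reference = [1, 1, 2, 2, 3, 3, 3]
candidate = [2, 2, 3, 3, 1, 1, 3]
mapping, matches = best_relabeling(reference, candidate)
relabeled = [mapping[c] for c in candidate]
print(mapping, matches, relabeled)
```

With three personas there are only six permutations to check, so exhaustive search is adequate; with many clusters an assignment algorithm would be a more practical choice.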


APPENDIX A. INSTRUMENTS

Table A.1 Online interview questions

Number  Question
1       Describe your use and involvement in the CC forum.
2       Where do you go after login? What are you doing? Describe what a typical session might look like.
3       What do you find is valuable about the forum and why?
4       What things frustrate you the most with the CC forum?
5       Describe how you use the forum to find the knowledge or information you need.
6       Describe how you use (or would use) the forum to share your knowledge and experiences with other members.


Table A.2 Description of observation data (trace data from transaction logs)

Variable Name                   Description
Contributions                   The number of times a member contributed information to the forum. Contributions may be documents uploaded, discussions started, or replies to other members' discussions. (Not all members contribute to the forum; in fact, few members contribute more than a few times.)
Page views                      The number of forum pages visited by a member.
Profile views (Metacard views)  The number of metacards or quick profile boxes of other members viewed by a member.
Document downloads              The number of documents downloaded by a member. Documents include sample counseling forms, checklists, preparation for deployment guides, policy letters, or other knowledge artifacts.
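The four variables in Table A.2 are simple per-member counts, so they could be derived from a transaction log roughly as follows. This Python sketch is hypothetical: the thesis does not describe the log format, so the `(member_id, event_type)` tuples, the event names, and the function `summarize_log` are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event names mapped onto the four variables named in Table A.2.
EVENT_TO_VARIABLE = {
    "post": "contributions",
    "page_view": "page_views",
    "profile_view": "profile_views",
    "download": "document_downloads",
}


def summarize_log(events: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count logged events per member, grouped into the Table A.2 variables."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for member_id, event_type in events:
        variable = EVENT_TO_VARIABLE.get(event_type)
        if variable is not None:  # skip event types that Table A.2 does not cover
            summary[member_id][variable] += 1
    return dict(summary)


# Made-up log entries for two members.
log = [
    ("m01", "page_view"),
    ("m01", "page_view"),
    ("m01", "download"),
    ("m02", "post"),
    ("m02", "profile_view"),
    ("m02", "login"),
]
summary = summarize_log(log)
print(summary["m01"]["page_views"], summary["m02"]["contributions"])
```

Because each event adds one to a counter regardless of how long the activity lasted, a brief accidental page view counts the same as a long intentional one, which is the limitation noted in Section 7.4.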


During a typical visit to the forum, the things I wish to accomplish when I login include:
Scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Slightly Disagree, 4 = Neither Disagree nor Agree, 5 = Slightly Agree, 6 = Agree, 7 = Strongly Agree
Goal 1     Finding information or answers to specific issues
Goal 2     Reading discussions
Goal 3     Contributing/posting to discussions
Goal 4     Downloading documents (checklists, TTPs, SOPs, etc.)
Goal 5     Uploading/posting documents
Goal 6     Casually browsing the forum
Goal 7     Seeing who is logged in to the forum

How likely would you be to:
Scale: 1 = Very Unlikely, 2 = Unlikely, 3 = Somewhat Unlikely, 4 = Neither Unlikely nor Likely, 5 = Somewhat Likely, 6 = Likely, 7 = Very Likely
Behave 1   Ask a question in the forum
Behave 2   Answer other questions in the forum
Behave 3   Look for peers or friends in the forum to see what they are doing
Behave 4   Post details to your profile (dog tag)
Behave 5   View a member's profile after reading his/her posting (document or discussion)
Behave 6   Share/post a document (checklist, TTP, counseling statement template, SOP, etc.)

How important are the following capabilities/features in the forum?
Scale: 1 to 5; legible anchors are Of Little Importance, Moderately Important, Important, and Very Important
Feature 1  Ability to browse by forum topics (using the Browse feature, left side of the main page)
Feature 2  Ability to search using a search tool (by typing in key words)
Feature 3  Ability to view and/or participate in discussion threads on different topics
Feature 4  Ability to download sample documents posted by other members
Feature 5  Leader Challenges (interactive scenarios)
Feature 6  Social networking tools to connect and communicate with other members

How much do you disagree/agree with the following statements?
Scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Slightly Disagree, 4 = Neither Disagree nor Agree, 5 = Slightly Agree, 6 = Agree, 7 = Strongly Agree
Use 1      I use the forum to find answers to my questions/concerns, or to seek advice
Use 2      I use the forum to share my experiences or give advice to others

Figure A.1 Online Survey Questions Related to Goals and Needs


APPENDIX B. PARTICIPANT DEMOGRAPHICS

Table B.1 Participant age

Age      Frequency  Percent  Valid Percent  Cumulative Percent
< 26     3          5.7      5.7            5.7
26-28    10         18.9     18.9           24.5
29-31    20         37.7     37.7           62.3
32-34    9          17.0     17.0           79.2
> 34     11         20.8     20.8           100.0
Total    53         100.0    100.0

Table B.2 Participant gender

Gender   Frequency  Percent  Valid Percent  Cumulative Percent
Male     48         90.6     90.6           90.6
Female   5          9.4      9.4            100.0
Total    53         100.0    100.0


Table B.3 Participant marital status

Marital                       Frequency  Percent  Valid Percent  Cumulative Percent
Single                        12         22.6     22.6           22.6
Married                       11         20.8     20.8           43.4
Married with Children         30         56.6     56.6           100.0
Total                         53         100.0    100.0

Table B.4 Participant education

Education                     Frequency  Percent  Valid Percent  Cumulative Percent
Less than Bachelor's Degree   1          1.9      1.9            1.9
Bachelor's Degree             41         77.4     77.4           79.2
Master's Degree               11         20.8     20.8           100.0
Total                         53         100.0    100.0


Table B.5 Participant rank

Rank                 Frequency  Percent  Valid Percent  Cumulative Percent
First Lieutenant     8          15.1     15.1           15.1
Captain              35         66.0     66.0           81.1
Major                9          17.0     17.0           98.1
Lieutenant Colonel   1          1.9      1.9            100.0
Total                53         100.0    100.0

Table B.6 Participant point in career

Career               Frequency  Percent  Valid Percent  Cumulative Percent
Pre-command          14         26.4     26.4           26.4
Command              20         37.7     37.7           64.2
Post-command         19         35.8     35.8           100.0
Total                53         100.0    100.0


APPENDIX C. HUMAN SUBJECTS CERTIFICATE OF EXEMPTION

[Scanned certificate; only the following text is recoverable.]
Investigator: Jonalan Brickey
Determination: Not Human Research
Review Comments:
1. After further [...], this project is Quality Improvement and not research under the regulations. Therefore, it is being approved as Not Human Subject [...].
2. As discussed by phone, you are advised to obtain permission for this study from the Department of Defense IRB or research review board. You do not need to [...] this to HSRC.
Approval includes: Not Human Subject or Request for Exemption [...]


APPENDIX D. APPROVED INFORMED CONSENT FORM

Date: November 23, 2009
Valid for Use Through: September 1, 2010
Study Title: Developing Personas for Information Systems Design
Principal Investigator: Jonalan Brickey
HSRC No: 09-1048
Version Date: November 23, 2009
Version No: 1

You are being asked to be in a research study. This form provides you with information about the study. A member of the research team will describe this study to you and answer all of your questions. Please read the information below and ask questions about anything you don't understand before deciding whether or not to take part.

Why is this study being done?
This study plans to learn more about the goals, needs, and frustrations of users of the Company Command Professional Forum. You are being asked to be in this research study because you were randomly selected as a member of the forum.

What happens if I join this study?
If you join the study, you will be asked to complete an interview either in person, over the phone, or online. This interview should last no longer than 45 minutes.

What are the possible discomforts or risks?
Since the study is about your use of the CC forum, no risks are expected.

What are the possible benefits of the study?
This study is designed for the researcher to learn more about CC forum member goals, needs, and frustrations. It is possible that forum features and capabilities will change as a result of this research to reflect the needs of the members.

Will I be paid for being in the study? Will I have to pay for anything?
You will not be paid to be in the study. It will not cost you anything to be in the study.

Is my participation voluntary?


Taking part in this study is voluntary. You have the right to choose not to take part in this study. If you choose to take part, you have the right to stop at any time. If you refuse or decide to withdraw later, you will not lose any benefits or rights to which you are entitled.

Who do I call if I have questions?
The researcher carrying out this study is Jonalan Brickey. You may ask any questions you have now. If you have questions later, you may call Jonalan Brickey at 719-648-8010. You may have questions about your rights as someone in this study. You can call Jonalan Brickey with questions. You can also call the Human Subject Research Committee (HSRC). You can call them at 303-556-4060.

Who will see my research information?
We will do everything we can to keep your records a secret. It cannot be guaranteed. Both the records that identify you and the consent form signed by you may be looked at by others. They are:
  Federal agencies that monitor human subject research
  Human Subject Research Committee
  The Regulatory officials from the institution where the research is being conducted who want to make sure the research is safe
The results from the research may be shared at a meeting. The results from the research may be in published articles. Your name will be kept private when information is presented.

Agreement to be in this study
I have read this paper about the study or it was read to me. I understand the possible risks and benefits of this study. I know that being in this study is voluntary. I choose to be in this study. I will get a copy of this consent form.

Signature: __________________ Date: _____
Print Name: __________________
Consent form explained by: __________________ Date: _____
Print Name: __________________
Investigator: __________________ Date: _____


APPENDIX E. FINAL CLUSTERING RESULTS FOR EACH MEMBER

Table E.1 Clustering results by member for each method

Member ID; base clusterers LSA, FA/PCA, and CA; then Majority, HGCE, SPEC, and Expert.

ID   LSA  FA/PCA  CA  Majority  HGCE  SPEC  Expert
1    2    1       3   --        2     2     2
2    3    1       1   1         --    1     1
3    2    3       3   3         --    3     3
4    2    3       3   3         --    3     2
5    2    1       2   2         --    2     1
6    2    1       2   2         --    2     3
7    2    3       3   3         --    3     3
8    3    1       2   --        2     2     3
9    2    1       3   --        2     2     1
10   2    1       1   1         --    1     3
11   2    3       3   3         --    3     3
12   2    3       3   3         --    3     3
13   1    1       1   1         --    1     1
14   2    3       3   3         --    3     3
15   2    1       1   1         --    1     2
16   1    3       2   --        3     3     1
17   2    1       1   1         --    1     1
18   1    1       1   1         --    1     1
19   2    1       1   1         --    1     3
20   2    1       1   1         --    1     1
21   1    3       1   1         --    1     1
22   2    1       3   --        2     2     2
23   2    1       1   1         --    1     1
24   2    1       1   1         --    1     2
25   2    2       2   2         --    2     2
26   2    2       2   2         --    2     2
27   2    3       2   2         --    2     2
28   2    1       2   2         --    2     2


Table E.1 (Cont.)

ID   LSA  FA/PCA  CA  Majority  HGCE  SPEC  Expert
29   1    3       2   --        3     3     2
30   1    1       2   1         --    1     1
31   2    1       2   2         --    2     2
32   2    1       3   --        3     3     2
33   2    3       2   2         --    2     2
34   1    3       2   --        3     3     1
35   2    3       2   2         --    2     3
36   2    3       2   2         --    2     2
37   2    1       1   1         --    1     1
38   1    3       3   3         --    3     3
39   2    1       1   1         --    1     1
40   1    1       1   1         --    1     1
41   2    1       1   1         --    1     1
42   2    3       1   --        1     1     2
43   2    1       1   1         --    1     1
44   2    1       1   1         --    1     1
45   2    1       1   1         --    1     2
46   2    1       1   1         --    1     1
47   2    1       1   1         --    1     1
48   2    1       3   --        2     2     2
49   2    3       2   2         --    2     3
50   1    1       2   1         --    1     1
51   2    2       2   2         --    2     2
52   2    1       1   1         --    1     3
53   2    1       1   1         --    1     1

Total Matched:  LSA 26, FA/PCA 30, CA 31, SPEC 36
Cohen's kappa:  LSA 0.216, FA/PCA 0.326, CA 0.36, SPEC 0.503
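The pattern in Table E.1 suggests how the Majority, HGCE, and SPEC columns relate: where at least two of the three base clusterers agree, the SPEC label equals the majority label, and where all three disagree, the SPEC label equals the HGCE label. The Python sketch below illustrates only that selection rule as read from the table. It is not the SPEC implementation, it does not implement HGCE or the Dempster-Shafer evidence combination, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(base_labels: Sequence[int]) -> Optional[int]:
    """Return the label chosen by more than half of the base clusterers, else None."""
    label, count = Counter(base_labels).most_common(1)[0]
    return label if count > len(base_labels) / 2 else None


def ensemble_label(base_labels: Sequence[int], fallback_label: int) -> int:
    """Use the majority label when one exists; otherwise use the supplied fallback label."""
    majority = majority_label(base_labels)
    return majority if majority is not None else fallback_label


# Base labels (LSA, FA/PCA, CA) copied from two rows of Table E.1.
member_2 = (3, 1, 1)  # two of three agree on cluster 1
member_1 = (2, 1, 3)  # no majority; Table E.1 lists HGCE = 2 for this member

print(majority_label(member_2))
print(majority_label(member_1))
print(ensemble_label(member_1, fallback_label=2))
```

For member 2 the rule returns cluster 1, and for member 1 it falls back to the HGCE label of 2; both agree with the SPEC column of the table.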


APPENDIX F. CROSS TABULATION TABLES FOR ALL CLUSTERING METHODS

Table F.1 FA/PCA cross tabulation (Expert * PCA Crosstabulation)

Expert (rows) by PCA (columns)   Info Seeker (1)  Social Browser (2)  Contributor (3)  Total
Info Seeker (1)     Count        19               0                   3                22
                    % of Total   35.8%            0.0%                5.7%             41.5%
Social Browser (2)  Count        9                3                   6                18
                    % of Total   17.0%            5.7%                11.3%            34.0%
Contributor (3)     Count        5                0                   8                13
                    % of Total   9.4%             0.0%                15.1%            24.5%
Total               Count        33               3                   17               53
                    % of Total   62.3%            5.7%                32.1%            100.0%

Table F.2 LSA cross tabulation (Expert * LSA Crosstabulation)

Expert (rows) by LSA (columns)   Info Seeker (1)  Social Browser (2)  Contributor (3)  Total
Info Seeker (1)     Count        8                13                  1                22
                    % of Total   15.1%            24.5%               1.9%             41.5%
Social Browser (2)  Count        1                17                  0                18
                    % of Total   1.9%             32.1%               0.0%             34.0%
Contributor (3)     Count        1                11                  1                13
                    % of Total   1.9%             20.8%               1.9%             24.5%
Total               Count        10               41                  2                53
                    % of Total   18.9%            77.4%               3.8%             100.0%


Table F.3 CA cross tabulation (Expert * CA Crosstabulation)

Expert (rows) by CA (columns)    Info Seeker (1)  Social Browser (2)  Contributor (3)  Total
Info Seeker (1)     Count        16               5                   1                22
                    % of Total   30.2%            9.4%                1.9%             41.5%
Social Browser (2)  Count        4                9                   5                18
                    % of Total   7.5%             17.0%               9.4%             34.0%
Contributor (3)     Count        3                4                   6                13
                    % of Total   5.7%             7.5%                11.3%            24.5%
Total               Count        23               18                  12               53
                    % of Total   43.4%            34.0%               22.6%            100.0%

Table F.4 SPEC cross tabulation (Expert * SPEC Crosstabulation)

Expert (rows) by SPEC (columns)  Info Seeker (1)  Social Browser (2)  Contributor (3)  Total
Info Seeker (1)     Count        20               1                   1                22
                    % of Total   37.7%            1.9%                1.9%             41.5%
Social Browser (2)  Count        4                9                   5                18
                    % of Total   7.5%             17.0%               9.4%             34.0%
Contributor (3)     Count        3                3                   7                13
                    % of Total   5.7%             5.7%                13.2%            24.5%
Total               Count        27               13                  13               53
                    % of Total   50.9%            24.5%               24.5%            100.0%
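The match counts and Cohen's kappa values reported in Table E.1 can be recomputed from the four cross tabulations in this appendix. The Python sketch below is an illustrative check rather than the tooling used in the study; the function name `cohens_kappa` and the dictionary layout are assumptions. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement (the diagonal of the table) and p_e is the agreement expected by chance from the row and column totals (Cohen, 1960).

```python
from typing import Dict, List, Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square contingency table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts copied from Tables F.1 to F.4 (rows: expert panel; columns: clustering method).
CROSSTABS: Dict[str, List[List[int]]] = {
    "FA/PCA": [[19, 0, 3], [9, 3, 6], [5, 0, 8]],
    "LSA": [[8, 13, 1], [1, 17, 0], [1, 11, 1]],
    "CA": [[16, 5, 1], [4, 9, 5], [3, 4, 6]],
    "SPEC": [[20, 1, 1], [4, 9, 5], [3, 3, 7]],
}

for method, table in CROSSTABS.items():
    matched = sum(table[i][i] for i in range(len(table)))
    print(f"{method}: matched={matched}, kappa={cohens_kappa(table):.3f}")
```

With these counts the sketch gives 30, 26, 31, and 36 matches and kappa values of 0.326, 0.216, 0.360, and 0.503 for FA/PCA, LSA, CA, and SPEC, which agree with the totals in Table E.1.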


BIBLIOGRAPHY

Agrawal, D. (2008). Web data clustering using FCM and proximity hints from text as well as hyperlink-structure. Proceedings of the 1st International Conference on Emerging Trends in Engineering and Technology (pp. 1104-1108). Nagpur, Maharashtra, India: IEEE Computer Society.

Alavi, M., & Leidner, D. E. (2001). Knowledge management and knowledge management systems: Conceptual foundations and research issues. MIS Quarterly, 25(1), 107-136.

Almirall, M., Rivera, J., & Valverde, L. (2010). Learning contents in mobile scenarios. Proceedings of the 2nd International Conference on Mobile, Hybrid, and On-Line Learning 2010. St. Maarten, Netherlands Antilles: IEEE Computer Society.

Avogadri, R., & Valentini, G. (2009). Fuzzy ensemble clustering based on random projections for DNA microarray data analysis. Artificial Intelligence in Medicine, 45(2-3), 173-183.

Ayad, H. G., & Kamel, M. S. (2009). On voting-based consensus of cluster ensembles. Pattern Recognition, 43(5), 1943-1953.

Ayling, R., & Mewse, A. J. (2009). Evaluating internet interviews with gay men. Qualitative Health Research, 19(4), 566-576.


Barrett, J. F., Jarvis, G. J., Macdonald, H. N., Buchan, P. C., Tyrrell, S. N., & Lilford, R. J. (1990). Inconsistencies in clinical decision in obstetrics. Lancet, 336(8714), 549-551.

Beck, K. (2000). Extreme programming explained: Embrace change. Reading, MA: Addison-Wesley.

Bell, S. J., Whitwell, G. J., & Lukas, B. A. (2002). Schools of thought in organizational learning. Journal of the Academy of Marketing Science, 30(1), 70-86.

Beyer, H., & Holtzblatt, K. (1998). Contextual design: Defining customer-centered systems. San Francisco, CA: Morgan Kaufmann.

Booch, G., Rumbaugh, J., & Jacobson, I. (2005). The unified modeling language user guide (2nd ed.). Reading, MA: Addison-Wesley.

Bostrom, R. P., & Heinen, J. S. (1977). MIS problems and failures: A socio-technical perspective: Part 1: The causes. MIS Quarterly, 1(3), 17-32.

Boulis, C., & Ostendorf, M. (2004). Combining multiple clustering systems. Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (pp. 63-74). Pisa, Italy: Springer.

Bower, B. (1988). 'Human Factors' and military decisions. Science News, 134(16), 245.

Brickey, J., & Walczak, S. (2010). A comparative analysis of professional forums in the United States Army and hybrid communities of practice in the civilian

PAGE 153

sector. Proceedings of the 43rd Hawaii International Conference on System Sciences (pp. 1 10). Koloa HI: IEEE Computer Society. Brickey, J., Walczak, S., & Burgess T. (in press) A comparative analysis of persona clustering methods Proceedings of the 16th Americas Conference on Information Systems. Lima Peru. Broschinsky, D. & Baker L. (2008). Using persona with XP at LANDesk software, an Avocent company Proceedings of the Agile 2008 Conference (pp. 543-548) Piscataway NJ: IEEE Computer Society. Chapman, C N. (2005). Personal observations at user experience day 2005. Paper presented at the Microsoft User Experience Day, Redmond, W A. Chapman C N. & Milham, R. P (2006). The persona's new clothes: Methodological and practical arguments against a popular method. Proceedings of the Human Factors and Ergonomics Society 50th Annual Conference (pp 634 636) San Francisco, CA: Human Factors and Ergonomics Society. Cianciolo, A. T., Hei C. G., Prevou, M I., & Psotka, J (2005) Evaluating army structured professional forums: Innovations in understanding and assessing effectiveness. Paper presented at the Interservice/Industry Training, Simulation & Education Conference, Orlando, FL. Clancy, K. J & Krieg P. C. (2000) Counterintuitive marketing : Achieve great results using uncommon sense. New York: Free Press. 137

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

Compeau, D. R., & Higgins, C. A. (1995). Computer self-efficacy: Development of a measure and initial test. MIS Quarterly, 19(2), 189-211.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Hillsdale, NJ: Lawrence Erlbaum.

Cook, M. A. (1996). Building enterprise information architectures: Reengineering information systems. Upper Saddle River, NJ: Prentice Hall.

Cooper, A. (1999). The inmates are running the asylum. Indianapolis, IN: SAMS.

Cooper, A., & Reimann, R. (2003). About face 2.0: The essentials of interaction design. Indianapolis, IN: Wiley.

Cooper, D. R., & Schindler, P. S. (2008). Business research methods. Irwin, CA: McGraw-Hill.

Courage, C., & Baxter, K. (2005). Understanding your users: A practical guide to user requirements. San Francisco, CA: Morgan Kaufmann.

Dalcher, D., & Genus, A. (2003). Introduction: Avoiding IS/IT implementation failure. Technology Analysis and Strategic Management, 15(4), 403-407.

Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319-340.

Davis, F. D. (2006). On the relationship between HCI and technology acceptance research. In P. Z. Dennis Galletta (Ed.), Human-computer interaction and management information systems: Applications (pp. 395-401). New York: M.E. Sharpe.

Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30 42.

Dharwada, P., Greenstein, J., Gramopadhye, A., & Davis, S. (2007). A case study on use of personas in design and development of an audit management system. Human Factors and Ergonomics Society Annual Meeting Proceedings, 469-473.

Dietterich, T. G. (2000). Ensemble methods in machine learning. In J. Kittler & F. Roli (Eds.), 1st International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science (pp. 1-15). New York: Springer.

Dixon, N. M., Allen, N., Burgess, T., Kilner, P., & Schweitzer, S. (2005). CompanyCommand: Unleashing the power of the army profession. West Point, NY: The Center for the Advancement of Leader Development and Organizational Learning.

Drego, V. L., Temkin, B. D., & McInnes, A. (2008). Use personas to design for engagement. Retrieved August 5, 2009, from the Forrester Research website: http : / /frstwebOO 1. forrester. com/Research/Document/Excerpt/ 0 7211 45717,00.html

Edell, J. A., & Burke, M. C. (1987). The power of feelings in understanding advertising effects. Journal of Consumer Research, 14(3), 421-433.

Everitt, B., Landau, S., & Leese, M. (2001). Cluster analysis (4th ed.). London: Oxford University Press.

FengYang, K., & Jahangir, K. (1988). User interface design from a real time perspective. Communications of the ACM, 31(12), 1456-1466.

Fern, X. Z., & Brodley, C. E. (2004). Solving cluster ensemble problems by bipartite graph partitioning. Proceedings of the 21st International Conference on Machine Learning (pp. 36-43). Banff, Alberta, Canada: ACM.

Fisher, C. W., & Kingma, B. R. (2001). Criticality of data quality as exemplified in two disasters. Information & Management, 39(2), 109-116.

Franke, J., & Mandler, E. (1992). A comparison of two approaches for combining the votes of cooperating classifiers. Proceedings of the 11th International Conference on Pattern Recognition Methodology and Systems, Conference B (pp. 611-614). The Hague, Netherlands: IEEE Computer Society.

Fred, A. L. N., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 835-850.

Gerlach, J. H., & Kuo, F. Y. (1991). Understanding human-computer interaction for information systems design. MIS Quarterly, 15(4), 527-549.

Ghosh, J., & Strehl, A. (2005). Clustering and visualization of retail market baskets. In N. R. Pal and L. Jain (Eds.), Advanced Techniques in Knowledge Discovery and Data Mining (pp. 75-102). London: Springer.

Gionis, A., Mannila, H., & Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), 4.

Goldfinch, S. (2007). Pessimism, computer failure, and information systems development in the public sector. Public Administration Review, 67(5), 917-929.

Goodwin, K. (2002). Getting from research to personas: Harnessing the power of data. Cooper Newsletter. Retrieved August 6, 2009, from http: //www / newsletters / 2002 _11 I getting_ from _research_ to _personas.asp

Goodwin, K. (2009). Designing for the digital age: How to create human-centered products and services. Indianapolis, IN: Wiley.

Gorsuch, R. L. (1974). Factor analysis. Philadelphia: Saunders.

Gould, J. D., & Lewis, C. (1985). Designing for usability: Key principles and what designers think. Communications of the ACM, 28(3), 300-311.

Gouvea, M. T. A., Motta, C. L. R., & Santoro, F. M. (2006). Scoring mechanisms to encourage participation in communities of practice. Proceedings of the 10th International Conference on CSCW in Design (pp. 516-521). Nanjing, China: IEEE Computer Society.

Granger, C. W. J., & Lee, T. H. (1989). Investigation of production, sales and inventory relationships using multicointegration and non-symmetric error correction models. Journal of Applied Econometrics, 4, 145-159.

Gregg, D., Kulkarni, U., & Vinze (2001). Understanding the philosophical underpinnings of software engineering research in information systems. Information Systems Frontiers, 3(2), 169-183.

Grudin, J. (1994). Groupware and social dynamics: Eight challenges for developers. Communications of the ACM, 37(1), 92-105.

Grudin, J., & Pruitt, J. (2002). Personas, participatory design and product development: An infrastructure for engagement. Paper presented at the 2002 Participatory Design Conference, Malmo, Sweden.

Hackos, J. T., & Redish, J. C. (1998). User and Task Analysis for Interface Design. New York: Wiley.

Haines, M. N., & Goodhue, D. L. (2003). Implementation partner involvement and knowledge transfer in the context of ERP implementations. International Journal of Human-Computer Interaction, 16(1), 23-38.

Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). Upper Saddle River, NJ: Pearson Education.

Halford, G. S., Baker, R., McCredden, J. E., & Bain, J. D. (2005). How many variables can humans process? Psychological Science, 16(1), 70.

Hall, D. L., & Llinas, J. (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1), 6-23.

Hashem, S., & Schmeiser, B. (1995). Improving model accuracy using optimal linear combinations of trained neural networks. Neural Networks, IEEE Transactions on, 6(3), 792-794.

He, Z., Xu, X., & Deng, S. (2005). A cluster ensemble method for clustering categorical data. Information Fusion, 6(2), 143-151.

Heeks, R. (Ed.). (1999). Reinventing Government in the Information Age: International Practice in IT-Enabled Public Sector Reform. London: Routledge.

Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75-106.

Hoekman Jr., R. (2006). Designing the obvious: A common sense approach to web application design. Thousand Oaks, CA: New Riders Publishing.

Hong, Y., Sam, K., Yuchou, C., & Qingsheng, R. (2008). Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters, 29(5), 595-602.

Indulska, M., & Recker, J. C. (2008). Design science in IS research: A literature analysis. In S. Gregor and S. Ho (Eds.), Expanding Knowledge in the Information and Computing Sciences. Retrieved November 5, 2009, from the ANU E-press website: http: // !U Q : 167503

Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering. ACM Computing Surveys, 31(3), 264-323.

Jansen, B. J., Taksa, I., & Spink, A. (2008). Research and methodological foundations of transaction log analysis. In B. J. Jansen, I. Taksa, & A. Spink (Eds.), Handbook of Research on Web Log Analysis (pp. 1-16). Hershey, PA: Information Science Reference.

Javahery, H., Deichman, A., Seffah, A., & Radhakrishnan, T. (2007). Incorporating human experiences into the design process of a visualization tool: A case study from bioinformatics. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (pp. 1517-1523). Montreal: IEEE Computer Society.

Jimenez, L. O., Morales-Morell, A., & Creus, A. (1999). Classification of hyperdimensional data based on feature and decision fusion approaches using projection pursuit, majority voting, and neural networks. IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1360-1366.

Joshua, P. (2008). Designing for the social web. Berkeley, CA: New Riders Publishing.

Kantola, V., Sauli, T., Katri, M., & Tomi, K. (2007). Using dramaturgical methods to gain more dynamic user understanding in user-centered design. Proceedings of the 6th ACM SIGCHI Conference on Creativity & Cognition (pp. 173-182). Washington, DC: ACM.

Karimi, J., & Konsynski, B. R. (1988). An automated software design assistant. IEEE Transactions on Software Engineering, 14(2), 194-210.

Keohane, R. O., & Nye Jr., J. S. (1998). Power and interdependence in the information age. Foreign Affairs, 77(5), 81-94.

Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226-239.

Kozar, K., & Miaskiewicz, T. (2009). Designing the introductory IS course using student personas: Lessons learned from product design. Proceedings of the 15th Americas Conference on Information Systems 2009 (paper 454). Retrieved July 20, 2009, from the AIS Electronic Library website: http: / / amcis2009 I 4 54

Kuechler, W., & Vaishnavi, V. (2007). Design [science] research in IS: A work in progress. Proceedings of 2nd International Conference on Design Science Research in Information Systems and Technology. Pasadena, CA: ACM.

Kujala, S., & Kauppinen, M. (2004). Identifying and selecting users for user-centered design. Proceedings of the 3rd Nordic Conference on Human-computer Interaction (pp. 297-303). Tampere, Finland: ACM.

Kujala, S., & Mantyla, M. (2000). How Effective Are User Studies? Proceedings of the Human-Computer Interaction 2000 Conference (pp. 61-71). Sunderland, UK: Springer.

Kustra, R., & Zagdanski, A. Data-fusion in clustering microarray data: Balancing discovery and interpretability. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(1), 50-63.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Laudon, K. C., & Laudon, J. P. (1999). Essentials of Management Information Systems (4th ed.). Englewood Cliffs, NJ: Prentice Hall.

Lee, A. S., & Baskerville, R. L. (2003). Generalizability in information systems research. Information Systems Research, 14(3), 221-243.

Lindgren, A., Fang, C., Amdahl, P., & Chaikiat, P. (2007). Using personas and scenarios as an interface design tool for advanced driver assistance systems. Proceedings of the 4th International Conference on Universal Access in Human-Computer Interaction: Ambient Interaction (pp. 460-469). Beijing, China: Springer.

Long, F. (2009). Real or imaginary: The effectiveness of using personas in product design. Proceedings of the Irish Ergonomics Society Annual Conference (pp. 1-10). Dublin, Ireland: Irish Ergonomics Society.

Lyytinen, K., & Hirschheim, R. (1987). Information systems failures: A survey and classification of the empirical literature. Oxford Surveys in Information Technology, 4, 257-309.

Ma, J., & LeRouge, C. (2007). Introducing user profiles and personas into information systems development. Proceedings of the 13th Americas Conference on Information Systems 2009 (paper 237). Retrieved July 22, 2009, from the AIS Electronic Library website: http :// /23 7

Manning, H., Root, N. L., & Backer, E. (2005). Site design personas: How many, how much. Retrieved August 7, 2009, from the Forrester Research website: http: //www .forrester corn/ Research/Document/Excerpt / 0 7211 37110 00.html

March, S. T., & Smith, G. (1995). Design and natural science research on information technology. Decision Support Systems, 15(4), 251-266.

March, S. T., & Storey, V. C. (2008). Design science in the information systems discipline: An introduction to the special issue on design science research. MIS Quarterly, 32(4), 725-730.

McGinn, J., & Kotamraju, N. (2008). Data-driven persona development. Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems (pp. 1521-1524). Florence, Italy: ACM.

Miaskiewicz, T., & Kozar, K. (2006). The use of the Delphi method to determine the benefits of the personas method: An approach to systems design. Proceedings of the 5th Annual Workshop on HCI Research in MIS (pp. 50-56). Milwaukee, WI: Association for Information Systems.

Miaskiewicz, T., Sumner, T., & Kozar, K. A. (2008). A latent semantic analysis methodology for the identification and creation of personas. Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems (pp. 1501-1510). Florence, Italy: ACM.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Newbury Park, CA: SAGE Publications, Inc.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Morgan, G. A., Leech, N. L., Gloeckner, G. W., & Barrett, K. C. (2007). SPSS for introductory statistics: Use and interpretation. Mahwah, New Jersey: Lawrence Erlbaum Associates Inc.

Mukherjee, I. (2008). Understanding information system failures from the complexity perspective. Journal of Social Sciences, 4(4), 308-319.

Mulder, S., & Yaar, Z. (2007). The user is always right: A practical guide to creating and using personas for the web. Thousand Oaks, CA: New Riders Publishing.

Ndiwalana, A., Lee, J., Smith, J. L., Wahid, S., Hobby, L., Chewar, C. M., et al. (2005). From personas to design: Creating a collaborative multidisciplinary design environment. Proceedings of the 11th International Conference on Human-Computer Interaction (pp. 1-10). Las Vegas, NV: Lawrence Erlbaum Associates Inc.

Nguyen, C., Mannino, M., Gardiner, K., & Cios, K. J. (2008). ClusFCM: An algorithm for predicting protein functions using homologies and protein interactions. Journal of Bioinformatics and Computational Biology, 6(1), 203-222.

Nguyen, N. (2007). Consensus clusterings. Proceedings of the 7th International Conference on Data Mining (pp. 607-612). Omaha, NE: IEEE Computer Society.

Nicolini, D., Gherardi, S., Yanow, D., & Gomez, M. L. (2003). Knowing in organizations: A practice-based approach. Armonk, New York: ME Sharpe.

Nieters, J. E., Ivaturi, S., & Ahmed, I. (2007). Making personas memorable. Proceedings of the ACM HCI 2007 Conference on Human Factors in Computing Systems (pp. 1817-1823). San Jose, CA: ACM.

Norman, D. A. (1988). The psychology of everyday things. New York: Basic Books.

Norman, D. A. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Nunnamaker, J. F., & Chen, M. (1990). Proceedings of the 23rd Annual Hawaii International Conference on System Sciences (pp. 631-640). Kailua-Kona, HI: IEEE Computer Society.

O'Connor, H., Madge, C., Shaw, R., & Wellens, J. (2008). Internet-based interviewing. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE handbook of online research methods (pp. 271-289). London: Routledge.

Office of Management and Budget (2008). Budget of the United States Government, Fiscal Year 2009. Washington, DC: U.S. Government Printing Office.

Proctor, T. (2000). Strategic marketing: An introduction. London: Routledge.

Pruitt, J., & Adlin, T. (2006). The persona lifecycle: Keeping people in mind throughout product design. San Francisco: Morgan Kaufmann.

Pruitt, J., & Grudin, J. (2003). Personas: Practice and theory. Proceedings of the 2003 conference on Designing for user experiences (pp. 144-161). San Francisco: ACM.

Ronkko, K. (2005). An empirical study demonstrating how different design constraints, project organization and contexts limited the utility of personas. Proceedings of the 38th Hawaii International Conference on System Sciences. Waikoloa, HI: IEEE Computer Society.

Sambamurthy, V., Bharadwaj, A., & Grover, V. (2003). Shaping agility through digital options: Reconceptualizing the role of information technology in contemporary firms. MIS Quarterly, 27(2), 237-263.

Schmidt, A., Terrenghi, L., & Holleis, P. (2007). Methods and guidelines for the design and development of domestic ubiquitous computing applications. Pervasive and Mobile Computing, 3(6), 721-738.

Scott, J. E., & Walczak, S. (2009). Cognitive engagement with a multimedia ERP training tool: Assessing computer self-efficacy and technology acceptance. Information & Management, 46(4), 221-232.

Sentz, K., & Ferson, S. (2002). Combination of evidence in Dempster-Shafer theory (Report No. SAND2002 835). Albuquerque, NM: Sandia National Laboratories.

Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.

Simon, H. A. (1969). The sciences of the artificial. Cambridge, MA: MIT Press.

Sinha, R. (2003). Persona development for information-rich domains. Proceedings of the CHI '03, Extended Abstracts on Human Factors in Computing Systems (pp. 830-831). Ft. Lauderdale, FL: ACM.

Srivastava, J., Robert, C., Mukund, D., & Pang-Ning, T. (2000). Web usage mining: Discovery and applications of usage patterns from Web data. SIGKDD Explorations Newsletter, 1(2), 12-23.

Stone, D. L., & Stone, D. (2005). User interface design and evaluation. Boston, MA: Morgan Kaufmann.

Strehl, A. (2002). Cluster ensembles for high-dimensional data mining. Unpublished doctoral thesis, University of Texas at Austin.

Strehl, A., & Ghosh, J. (2002). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. The Journal of Machine Learning Research, 3, 583-617.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.

Tumer, K., & Agogino, A. K. (2008). Ensemble clustering with voting active clusters. Pattern Recognition Letters, 29(14), 1947-1953.

Turban, E., & Aronson, J. (1998). Decision support systems and intelligent systems. Upper Saddle River, NJ: Prentice Hall.

United States Government Accountability Office (2008). DOD systems modernization (GAO Report 08-927R). Washington, DC: U.S. Government Printing Office.

United States Senate (1988). Investigation into the downing of an Iranian airliner by the U.S.S. Vincennes (Senate Hearings, September 8, 1988, 100-1035). Washington, DC: U.S. Government Printing Office.

Wenger, E. (1998). Communities of practice: Learning as a social system. Systems Thinker, 9(5), 1-5.

Wilson, M., & Howcroft, D. (2002). Re-conceptualising failure: Social shaping meets IS research. European Journal of Information Systems, 11(4), 236-250.

Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.

Zelkowitz, M. V. (1998). An update to experimental models for validating computer technology. Journal of Systems and Software, 82(3), 373-376.

Zhang, P., Carrey, J., Te'eni, D., & Tremaine, M. (2005). Incorporating HCI development into SDLC: A methodology. Communications of the AIS (CAIS), 15(Article 29), 512-543.

Zhou, W., Heesom, D., & Georgakis, P. (2007). Enhancing user-centered design by adopting the Taguchi philosophy. Proceedings of the 12th International Conference on Human-Computer Interaction: Interaction Design and Usability (pp. 460-469). Beijing, China: Springer.