Citation
Similarity measures for medical event sequences : prospects for clinical decision-making

Material Information

Title:
Similarity measures for medical event sequences : prospects for clinical decision-making
Creator:
Fredrickson, Joel Scott
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
2019
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate (Doctor of Philosophy)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Computer Science and Engineering, CU Denver
Degree Disciplines:
Computer Science and Information Systems
Committee Chair:
Gregg, Dawn
Committee Members:
Mannino, Michael
Banaei-Kashani, Farnoush
Ramirez, Ronald

Notes

Abstract:
This dissertation advances the use of a scientific artifact to assist with clinical decision-making. Specifically, we develop and evaluate a similarity measure (OTCS-MES) adapted to medical event sequences (MESs). We further evaluate the decision-making performance of OTCS-MES as an extended clinical decision support tool for new health care domains. To expand the application of OTCS-MES, we generalize it to improve efficacy for medical event sequences recorded in domains other than the inpatient setting for which it was first developed. Assessment uses industry-recognized "gold standards" already in place to benchmark health care facility performance. Assessing the generalized measure's performance requires experimentation on the newly extended OTCS-MES, along with more classical inferential methods integrating data elements inherent to MESs. This dissertation begins with a literature review that broadly describes the many inhibitors to physician adoption of technology and explains how many of these inhibitors are unique to health care. The literature review concludes with a narrower focus on the lack of effective clinical decision support tools available to providers and on the provision of a similarity measure using medical event sequences. After the literature review, the empirical studies develop and evaluate our MES similarity measure, a decision support tool unique and beneficial to health care. These studies address the proposition that an MES-adapted similarity measure performs differently, and ultimately better, within the health care context.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
Copyright Joel Fredrickson. Permission granted to University of Colorado Denver to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.

Full Text
SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PROSPECTS FOR CLINICAL DECISION-MAKING
by
JOEL SCOTT FREDRICKSON
B.S., University of Colorado Colorado Springs, 1980
M.S., University of Colorado Denver, 2013
A dissertation submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Computer Science and Information Systems Program
2019


This thesis for the Doctor of Philosophy degree by
Joel Scott Fredrickson has been approved for the Computer Science and Information Systems Program
By
Dawn Gregg, Chair
Michael Mannino, Advisor
Farnoush Banaei-Kashani
Ronald Ramirez


Fredrickson, Joel (Ph.D. Computer Science and Information Systems)
Similarity Measures for Medical Event Sequences: Prospects for Clinical Decision-Making
Thesis directed by Associate Professor Michael Mannino
ABSTRACT
This dissertation advances the use of a scientific artifact to assist with clinical decision-making. Specifically, we develop and evaluate a similarity measure (OTCS-MES) adapted to medical event sequences (MESs). We further evaluate the decision-making performance of OTCS-MES as an extended clinical decision support tool for new health care domains. To expand the application of OTCS-MES, we generalize it to improve efficacy for medical event sequences recorded in domains other than the inpatient setting for which it was first developed. Assessment uses industry-recognized "gold standards" already in place to benchmark health care facility performance. Assessing the generalized measure's performance requires experimentation on the newly extended OTCS-MES, along with more classical inferential methods integrating data elements inherent to MESs. This dissertation begins with a literature review that broadly describes the many inhibitors to physician adoption of technology and explains how many of these inhibitors are unique to health care. The literature review concludes with a narrower focus on the lack of effective clinical decision support tools available to providers and on the provision of a similarity measure using medical event sequences. After the literature review, the empirical studies develop and evaluate our MES similarity measure, a decision support tool unique and beneficial to health care. These studies address the proposition that an MES-adapted similarity measure performs differently, and ultimately better, within the health care context.
The form and content of this abstract are approved. I recommend its publication.
Approved: Michael Mannino


TABLE OF CONTENTS
CHAPTER
I. OVERVIEW.................................................................................1
Problem Statement.....................................................................2
Motivation............................................................................3
Literature Review.....................................................................5
II. DEVELOPMENT AND EVALUATION OF A SIMILARITY MEASURE FOR MEDICAL EVENT SEQUENCES... 27
Abstract......................................................................................27
Introduction..................................................................................27
Related Work..................................................................................31
OTCS-MES Similarity Measure...................................................................33
Empirical Evaluation of the OTCS-MES..........................................................40
Discussion....................................................................................59
Conclusion....................................................................................61
III. SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PREDICTING MORTALITY IN TRAUMA PATIENTS..............63
Abstract..............................................................................63
Introduction..........................................................................64
Related Work..........................................................................68
Research Methodology..................................................................71
Results of Empirical Evaluation...................................................84
Conclusion........................................................................96
IV. EXTENDED SIMILARITY MEASURES TO PREDICT TRAUMA PATIENT MORTALITY.....................98
Abstract..........................................................................98
Introduction......................................................................98
Related Work.....................................................................100
Research Methodology.............................................................102
Results of Empirical Evaluation..................................................112
Discussion.......................................................................116
Conclusion.......................................................................117
Summary..........................................................................118
REFERENCES..............................................................................121
APPENDIX
A. Taxonomy of Barriers.........................................................130
B. Research Model of EHR Adoption Determinants..................................131
C. OMOPCDM Tables...............................................................132
D. The CDSS Acceptance Model - Facilitators.....................................133
E. OTCS-MESC# Algorithm.........................................................134
F. OTCS-MESSAS Algorithm........................................................142


LIST OF TABLES
TABLE
1. Sample Inpatient MES......................................................................28
2. Sample IP MES with Event Duration and Gap.................................................34
3. Example of Similar Patients Based on Event Similarity.....................................38
4. Distribution of Outpatient Incidents by Duration.........................................43
5. Simple Overlap Measure Example............................................................44
6. Rank Weighted Overlap Measure Example....................................................44
7. Summary of t Test Results for OTCS-MES/OTCS with Balanced Weights........................50
8. Summary of t Test Results for OTCS-MES/OTCS with Unbalanced Weights......................54
9. Confidence Intervals (95%) for Non-Significant t Test Results in Table 8.................54
10. Total Event Matches for OTCS-MES and OTCS...............................................56
11. Summary of t Test Results for OTCS-MES/Artemis with Balanced Weights....................59
12. Summary of Hypothesis Evaluation........................................................60
13. Summary of Variables for Secondary Research Questions....................................80
14. Summary of Filtered Trauma Data..........................................................82
15. Summary of Diagnosis Detail in Trauma and Inpatient Data.................................83
16. Comparison of Over-Sampled and Imbalanced Training Data (k = 15).........................85
17. Comparison of Certainty-Factor and Majority Voting (k = 15)..............................85
18. Comparison of Weighted and Non-Weighted Voting...........................................86
19. Statistical Testing Results for Hypotheses.................................................89
20. Confusion Matrix Summary for Equal Weights.................................................91
21. Summary of Findings on Hypotheses..........................................................94
22. Match Level in Trauma Data.................................................................95
23. Sample Inpatient MES.......................................................................100
24. VDM Computations for Gender and Injury Mechanism...........................................104
25. Summary of Filtered Trauma Data............................................................110
26. Comparison Statistics for Logistic Regression Models (Method 1)............................112
27. AUC Hypothesis Test for Extended TMPM and OTCS-MES Ensemble (Method 1).....................113
28. Similarity Component Weighting Alternatives for Method 2...................................114
29. AUC Hypothesis Tests (Method 2)............................................................114


LIST OF FIGURES
FIGURE
1. Frequency Distribution of Patient Pairs by Number of Matched Inpatient Events.................39
2. Frequency Distribution of Patient Pairs by Number of Matched Outpatient Events...............39
3. Prevalence Scales by Number of Matched Events for Inpatient MESs............................40
4. Frequency Distribution of Patient Pairs by OTCS-MES (Inpatient Data).........................45
5. Frequency Distribution of Patient Pairs by OTCS-MES (Outpatient Data)........................46
6. Simple Overlap using Unbalanced Weighting (Inpatient Data)..................................47
7. Rank Weighted Overlap using Unbalanced Weighting (Inpatient Data)...........................47
8. Simple and Rank Weighted Overlap of OTCS-MES Similarity Measure vs Original OTCS............50
9. Simple Overlap Results with Extreme Component Weighting.....................................52
10. Rank Weighted Overlap Results with Extreme Component Weighting..............................52
11. Simple Overlap Results with Mixed Component Weighting.......................................53
12. Rank Weighted Overlap Results with Mixed Component Weighting................................53
13. Nearest Neighbor Overlap by Sequence Length (Inpatient Data)................................55
14. Nearest Neighbor Overlap by Sequence Length (Outpatient Data)...............................55
15. Simple Overlap of OTCS-MES versus Artemis (Inpatient Data)..................................56
16. Rank Weighted Overlap of OTCS-MES versus Artemis (Inpatient Data)...........................57
17. Simple Overlap of OTCS-MES versus Artemis (Outpatient Data).................................57
18. Rank Weighted Overlap of OTCS-MES versus Artemis (Outpatient Data)..........................58


19. OTCS Event Matching Procedure (Zheng et al. 2010)............................................73
20. Comparison of Similarity Measures by Neighborhood Size (k)...................................87
21. Comparison of Methods for Imbalanced Data Sets on AUC........................................87
22. Impact of Case Base Size on AUC..............................................................88
23. ROC Curves for Mortality Prediction..........................................................90
24. Confusion Matrix Performance of Classification Methods.......................................91
25. Weighted Youden's Index by Method and Cost Ratio.............................................92
26. Maximum Sensitivity for False Positive Constraints (Neyman-Pearson criteria).................93
27. Patient Record Variable Significance........................................................104
28. ROC Curves for Extended TMPM and OTCS-MES Ensemble (Method 1)...............................113
29. Weighted Youden's Index by Method and Sensitivity Cost Ratio (Method 1).....................115
30. Sensitivity by FPF Constraint Level (Method 1)..............................................115


CHAPTER I
OVERVIEW
The literature review and research presented in this dissertation support movement from the "practice of medicine" to the "science of medicine". Khosla (2014) argues that the practice of medicine "is driven by conclusions derived from partial information of a patient's history and current symptoms interacting subjectively with various known and unknown biases of the physician, hospital, and healthcare system". Health care must evolve toward a more scientific method, with complete and accurate data collection, sophisticated analysis, and scientific experimentation aimed at delivery efficiency and improved patient outcomes. Khosla (2014) summarizes by saying:
“Healthcare must move away from the system of small trials and experiential evolution of best practices (which has done us well) to a state unencumbered with the conflicts of interest, personal biases, and incomplete knowledge that currently lead to suboptimal results.”
Essentially, we must overcome an approach to health care based on practice and tradition and move to one able to effectively leverage the vast amounts of accessible and digitized patient-centric data. Khosla (2014) predicts that:
“Technology will reinvent healthcare as we know it” and “in the future, the majority of physicians' diagnostic, prescription and monitoring ... will be replaced by smart hardware, software, and testing.”
Accordingly, this study advances the use of a scientifically designed tool augmenting clinical decision-making systems. Specifically, we develop and evaluate a similarity measure (OTCS-MES) adapted to medical event sequences (MESs). We further evaluate the decision-making performance of OTCS-MES as a flexible analytic method accommodating different health care domains. That is, we generalize our OTCS-MES similarity measure to better include various events found in diverse health care service categories, such as inpatient admissions and outpatient procedures. Assessing our generalized measure's performance then requires empirical evaluation of extended OTCS-MES versions,
along with more classical inferential methods integrating data elements inherent to MESs. Accordingly,
the remaining sections of this dissertation are as follows:
1. Problem Statement and Motivation
2. Literature Review: Health Care Technology Adoption - Enablers and Inhibitors
3. Literature Review: Clinical Decision Support Systems - Overview and Adoption Factors
4. Literature Review: Similarity Measures and Medical Event Sequences
5. Study 1: Development and Evaluation of a Similarity Measure for Medical Event Sequences
6. Study 2: Similarity Measures for Medical Event Sequences: Predicting Mortality in Trauma Patients
7. Study 3: Similarity Measures for Medical Event Sequences: Performance with Patient Record Data
Problem Statement
The pervasive retention of electronic health care data enables clinical decision support systems (CDSSs) and associated tools. The HITECH Act not only incentivized the expanded adoption of electronic health records (EHRs) and their "meaningful use" but aimed to leverage this new data source to improve the quality of health care. Essentially, the originators of HITECH envisioned electronic health care data routinely captured in a standardized and secure manner to satisfy five goals for the US healthcare system: "improve quality, safety and efficiency; engage patients in their care; increase coordination of care; improve the health status of the population; and ensure privacy and security" (Sox 2008). A problem with this vision is the evident lack of clinical decision-making tools and of EHR use by health care providers to improve patient outcomes. That is, EHRs are normally used in transaction-based systems for billing, scheduling, and workflow, but their use to improve patient diagnosis and treatment is remarkably neglected. The literature review below addresses many of the general inhibitors to EHR use for clinical decision-making. Importantly, this dissertation proceeds to focus on one of the more significant inhibitors: the lack of CDSS tools perceived as useful in improving quality of care. Accordingly, this research is design science in nature through its provision of a CDSS tool specific
to medical event sequences. We strive to create an IT artifact (OTCS-MES Similarity Measure) that is carefully designed to help health care providers during the performance of their responsibilities. Such CDSS tools are not transaction-oriented but learn "from the growing volume of captured data what does and does not work in healthcare" (Sox 2008). The development and testing of such tools, leveraging expanding data warehouses of EHRs, is critical to the movement from the "practice of medicine" to the "science of medicine" discussed previously.
Motivation
As the following literature review outlines, health care lags in the adoption of clinical decision support systems and supporting information technology. Many factors unique to health care inhibit IT adoption, and these factors are thoroughly explained in the literature review. This dissertation focuses on one of the more important inhibitors: the lack of technology-driven tools perceived as beneficial by health care providers during the diagnosis and treatment of patients. As the "meaningful" use and coordinated data stores of EHRs evolve, tools leveraging health care data for clinical decision support will increase in number and popularity among caregivers. As such, we introduce a CDSS tool taking advantage of referential health care data and advanced analytical methods to enhance clinical decision-making. Our CDSS tool uses patient- or incident-specific medical event sequences (MESs) captured and retained in ever-growing data stores of EHRs.
A sequence of medical events, incurred by a patient and characterized in electronic data, lends itself to CDSS similarity measure analysis for several reasons. First, standardized coding systems for diagnoses, procedures, and pharmaceuticals are in place to represent medical events. For example, an inpatient admission, as a medical event, is characterized by the ICD (International Classification of Diseases) codes for admission and discharge diagnoses. This allows the attachment of "meaning" to a particular medical event. Consequently, similarity measures for MESs can go beyond simply matching states as previous measures do, and quantify likelihood, risk and severity of matched states. Second,
similarity measures adapted to MESs allow consideration of the important temporal or structural component of event sequences. That is, the data captured during multiple medical events includes temporal fields helpful in defining sequence structure. As an example, an inpatient admission claim includes the beginning and end date of the admission. Thus, the temporal structure and duration of medical events provide important information useful to similarity measures. For example, longer durations for medical events are indicative of greater severity, as are more frequent and consistent event occurrences.
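To make this temporal component concrete, the sketch below is a hypothetical illustration (not the OTCS-MES measure itself) of how an MES can be represented as a list of ICD-coded events with start and end dates, from which event durations and inter-event gaps are derived; the codes and dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MedicalEvent:
    """One event in a medical event sequence (hypothetical structure)."""
    icd_code: str   # abstracted diagnosis code, e.g., an ICD code
    start: date     # beginning of the event (e.g., admission date)
    end: date       # end of the event (e.g., discharge date)

    @property
    def duration_days(self) -> int:
        # Longer durations can signal greater severity.
        return (self.end - self.start).days

def gaps_days(sequence):
    """Gaps (in days) between consecutive events of one patient's MES."""
    ordered = sorted(sequence, key=lambda e: e.start)
    return [(b.start - a.end).days for a, b in zip(ordered, ordered[1:])]

mes = [
    MedicalEvent("I21.9", date(2018, 1, 3), date(2018, 1, 9)),  # admission
    MedicalEvent("I50.9", date(2018, 2, 1), date(2018, 2, 4)),  # readmission
]
print([e.duration_days for e in mes])  # [6, 3]
print(gaps_days(mes))                  # [23]
```

A duration- and gap-aware similarity measure can then weight matched events by how closely these derived temporal features agree.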
There are several practical applications for similarity measures functioning within clinical decision support systems. These include (1) improved patient classification and clustering, (2) increased patient adherence to clinical pathways (care management plans), and (3) augmented discovery of similar patients for medical social networking. First, MES similarity measures have potential for grouping "like" patients. Such classification and clustering research generates patient or entity cohorts having high similarity estimated by MES based similarity measures. The goal is patient cohorts having greater attribute similarity. For example, MES similarity measures could help build homogeneous patient groups having similar disease states or prospective risk. Second, a MES similarity measure could help evaluate patient adherence to an accepted clinical pathway specific to a diagnosed condition. A clinical pathway is a temporally spaced sequence of procedures, medications, tests or other medical events related to a disease or physical condition. MES similarity measures seem especially suitable to comparing a patient's actual sequence of medical events to their prescribed clinical pathway. Finally, medical social networks allow patients having similar medical histories to discuss treatment successes and failures, exchange experiences and receive emotional support. MES similarity measures could prove useful as another method incorporated by medical social networks when retrieving similar patients for their online communities.
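Each of these applications reduces to retrieving the k most similar patients under some pairwise similarity function over MESs, the role OTCS-MES is developed to fill. The sketch below illustrates that retrieval pattern with an assumed toy similarity (Jaccard overlap of event codes) and invented patient data; it is not the dissertation's measure.

```python
def top_k_similar(query_mes, case_base, similarity, k=3):
    """Return IDs of the k patients whose MESs score highest against the query."""
    scored = sorted(case_base.items(),
                    key=lambda item: similarity(query_mes, item[1]),
                    reverse=True)
    return [patient_id for patient_id, _ in scored[:k]]

def jaccard(a, b):
    """Toy similarity for illustration: fraction of shared event codes."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

case_base = {
    "p1": ["I21.9", "I50.9"],
    "p2": ["J18.9"],
    "p3": ["I21.9", "J18.9"],
}
print(top_k_similar(["I21.9", "I50.9"], case_base, jaccard, k=2))  # ['p1', 'p3']
```

Swapping in an MES-adapted measure such as OTCS-MES, which also accounts for event meaning and temporal structure, changes only the `similarity` argument.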


Literature Review
Introduction
Physician intent to adopt our MES similarity measure, and clinical decision support systems in general, is of primary concern. A dichotomy exists between the apparent and significant benefits of CDSS adoption and the reluctance of physicians to accept this technology. Potentially, CDSSs can be integral in (1) improving quality and continuity of care, (2) increasing productivity for doctors and nurses, (3) providing better information for decision making, (4) enabling better product/service customization, (5) achieving higher quality patient outcomes, and (6) improving service efficiency. However, health care providers, and most especially physicians in smaller practices, have been notoriously slow in adopting EHR use for purposes other than transaction-based systems. Accordingly, prior to development and evaluation of our MES similarity measure for clinical decision support, we must understand the factors contributing to its eventual adoption and possible barriers to its acceptance. This literature review begins with a broad analysis of IT adoption by physicians in the United States. The focus of the review then becomes narrower through a review of research about clinical decision support systems. Finally, the review concentrates on referential literature about similarity measures and medical event sequences important to the development of our new IT artifact.
Enablers of Health Care IT Adoption
Use of EHRs
Significant developments are evident within the health care industry in terms of pervasive capture, retention and use of medical event information. The positive news for health care researchers is the focus on widespread use of EHRs by providers, and standardized coding algorithms for information abstraction. Encouragingly, IT trends in health care are enabling the application of similarity measures and related data mining techniques to ever-increasing amounts of clinical data.


The health care industry formally directed its attention to increased application of IT to capture
and leverage patient information with the mandates of the Health Insurance Portability and Accountability Act of 1996. Since that time, the government has placed increasing emphasis on the need to adopt technology within health care, culminating with the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009. A central focus of this legislation is the "meaningful use" of EHRs by health care providers. Specific to electronic health or medical records, the HITECH Act establishes a system of incentive payments to healthcare providers for the "meaningful use" of EHRs. The act states "that, as of 2011 [and until 2015], healthcare providers will be offered financial incentives for demonstrating meaningful use of electronic health records" (healthcareitnews.com 2014).
The requirements for "meaningful use" of IT by healthcare providers are administered by the Centers for Medicare and Medicaid Services (cms.gov). To qualify for incentive payments, and to avoid future penalties, healthcare providers in the form of eligible professionals, eligible hospitals, and critical access hospitals (CAHs) must meet "measurement thresholds [for EHR use] that range from recording patient information as structured data to exchanging summary care records" (healthit.gov 2014). The CMS incentive programs have three stages, and each stage requires additional evidence of IT usage.
Observational Medical Outcomes Partnership (OMOP) - Common Data Model
In the presence of ever-expanding stores of EHRs, their collaborative use for research requires an enabling common data model (CDM) and associated standards. The federal government first realized the enabling power of a CDM having a common format for disparate data sources with the FDA Amendments Act of 2007. Specifically, this act mandated that "the FDA collaborate with public, academic, and private entities to access disparate data sources and to validate ways to link and analyze safety data from multiple sources for medical product safety surveillance" (Stang et al. 2010). As a result, the Observational Medical Outcomes Partnership (OMOP) (http://omop.fnih.org), a partnership
among the FDA, academia, data owners, and the pharmaceutical industry, was initially created to identify needs and test scientific methods and data infrastructure of an "active drug safety surveillance system". While this first effort at building a facilitating health care data source for collaborative research was restricted to "maximizing the benefit and minimizing the risk of pharmaceuticals", its potential for other categories of health care service quickly became apparent (Stang et al. 2010). Given the obvious value of a standardized data resource for collaborative health care research, the OMOP stakeholders, upon the conclusion of their project, initiated the Observational Health Data Sciences and Informatics (OHDSI) initiative (Hripcsak 2015). OHDSI strives "to bring out the value of observational health data through large-scale analytics" (https://ohdsi.org/who-we-are/). Integral to the OHDSI initiative is the establishment of a CDM for abstracted health care data and the development of associated ETL methods and informatic applications.
A CDM for health care data responds to the fundamental premise that health care data varies greatly among organizations. This is due to (1) the capture of information for different purposes and through different means, and (2) the use of "abstracted" and varied code sets to represent medical events. First, health care data may be retained for a variety of purposes including reimbursement, research and analytics, or patient care. Each of these purposes involves different code sets, forms, and retention methods. Second, numerous code sets exist to capture medical events, including code sets for diagnoses, procedures, clinical tests, and drugs. Furthermore, a code must be "abstracted" to represent a medical event, and this process involves a great deal of subjectivity and human error. Obviously, given these factors, the variance among "observational" data stores can be significant. The OMOP CDM is a concerted effort to convert disparate data into a common format enabling systematic analysis. Accordingly, Overhage et al. (2011) argue that "translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large-scale systematic analysis". The OMOP CDM provisions standardized health care data
from diverse sources as furnished by an open network of observational data holders. Each element in a participant's database must be mapped to the approved CDM vocabulary and subsequently placed in the data schema. Once the OMOP CDM data store exists, it affords researchers the opportunity to utilize numerous data exploration and analytics tools. The schematic for the tables of OMOP CDM is provided as Appendix C.
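The mapping step described above can be sketched as a lookup from a source vocabulary code to a standard concept before a row is loaded into the CDM. Everything in this sketch is a placeholder: the concept IDs, field names, and mapping table are invented for illustration and are not actual OMOP vocabulary values.

```python
# Hypothetical ETL mapping step: each source code must resolve to a standard
# concept in the approved CDM vocabulary before loading into the data schema.
# The concept IDs below are placeholders, not real OMOP vocabulary entries.
SOURCE_TO_CONCEPT = {
    ("ICD10CM", "I21.9"): 4329847,
    ("ICD10CM", "I50.9"): 316139,
}

def map_to_cdm(vocabulary: str, source_code: str):
    """Return the standard concept ID for a source code, or None if unmapped."""
    return SOURCE_TO_CONCEPT.get((vocabulary, source_code))

# A condition row keeps the original code alongside the mapped concept,
# so unmapped codes can be audited rather than silently dropped.
condition_row = {
    "condition_source_value": "I21.9",
    "condition_concept_id": map_to_cdm("ICD10CM", "I21.9"),
}
print(condition_row["condition_concept_id"])  # 4329847
```

In a real OMOP ETL this lookup is driven by the distributed standardized vocabulary tables rather than an in-memory dictionary, but the contract is the same: every loaded element carries an approved concept ID.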
OHDSI asserts that OMOP CDM enables improved and accessible analytics. As such, it allows participants to "perform systematic analyses using a library of standard analytic routines that have been written based on the common format" (https://ohdsi.org/data-standardization/). A couple of examples of such enhanced analytics are the open-source package ACHILLES and a study for temporal pattern discovery in longitudinal electronic patient records. First, ACHILLES (Automated Characterization of Health Information at Large-scale Longitudinal Exploration System) provides a graphical representation of health care information found in the OMOP CDM. Basically, this application is a visualization tool for reviewing clinical content based on summary statistics derived from OMOP CDM content. The second example of an application leveraging OMOP CDM is more relevant to our OTCS-MES similarity measure and its attention to the temporal pattern of medical events. In this example, Noren et al. (2010) demonstrate the use of enhanced graphics to portray drug prescription and medical event sequences and evaluate expected and observed medical event frequencies for drug combinations. Interestingly, the graphical portrayal of MESs by Noren et al. is very similar to that used in our study. Overall, this study shows the value of a summarized data store of MESs (such as OMOP CDM) when used in conjunction with sophisticated analytical tools.
Barriers to Health Care IT Adoption
The potential benefits of EHR adoption by the health care industry are significant. EHRs are believed to be integral in "improving quality, continuity, safety and efficiency in healthcare" (Boonstra
and Broekhuis 2010). While expectations for EHR outcomes related to "time, quality improvements, cost, efficiency, paper reduction, ease of sharing" (Randeree 2007) are high, the adoption rate has been notoriously slow. Because of the dichotomy between the apparent and significant benefits of EHR adoption and the reluctance of physicians to accept this technology, a substantial amount of research has been undertaken to understand the determinants of EHR adoption and its meaningful use. In fact, researchers Boonstra and Broekhuis (2010) have proposed a structure for organizing EHR barrier research (Appendix A), where "from the physician perspective, barriers linked to similar problems were grouped into a single category". Expanding upon these previously identified adoption barriers, and leveraging more current referential studies, we propose the health care IT adoption model shown in Appendix B. The determinants of IT adoption comprising this model are listed below and explained in subsequent sections.
1. Financial
2. Technical
3. Time
4. Psychological
5. Social
6. Legal
7. Organizational
8. Change Process
9. Environmental
10. Third Party Intervention
Financial
Financial considerations are a strong determinant of the meaningful use of EHRs by physicians. Physicians face varied and significant costs during EHR implementation and upkeep, and they also encounter challenges unique to the healthcare industry when funding technology. Earlier research gave primary consideration to the "initial costs" of acquiring and installing an EHR system. Miller and Sim (2004) reflect this focus when speaking of "high initial financial costs, slow and uncertain financial payoffs, and high initial physician time costs". Similar treatment of the
initial costs for EHR systems is found in research by Ford et al. (2006), who note that one cost "physicians in small practices have had to internalize is the system's initial purchase", and by Boonstra and Broekhuis (2010), who define initial EHR costs to "include all the expenditure needed to get an EHR system working in the physician's practice, such as the purchase of hardware and software, selecting and contracting costs and installation expenses". Another initial cost identified by Vishwanath and Scamurra (2007) is the direct "cost of the current legacy system in place". The initial cost barrier concerns both the source of funds and the eventual financial benefit (ROI) of EHR adoption. Accordingly, Gans et al. (2005) highlight the "lack of capital resources to invest in an EHR" as a top-five barrier to EHR adoption, and Hennington and Janz (2007) address this issue in the form of two questions that each physician must ask: "1) What is the likelihood of seeing a positive return on the investment? [and] 2) Are there financial means available to purchase and maintain the system?". Hennington and Janz (2007) map these two questions onto the Unified Theory of Acceptance and Use of Technology (UTAUT) model, where EHR ROI relates to "physicians' performance expectancy" and the availability of funds "addresses whether or not facilitating conditions exist".
As physicians have become more experienced in the use of EHR systems, the financial determinant of "meaningful use" has shifted somewhat in focus to include worries about the ongoing costs of EHR maintenance and support. Ford et al. (2006) describe this financial component as "ongoing operational costs" and limit it to a discussion of EHR vendor transience and risk. Other researchers, including Randeree (2007), expand this discussion and find that physicians face other, often unexpected, ongoing EHR costs, including those "associated with the customizability of their system", expensive "maintenance agreements" and "technology obsolescence". The general impression from the research on financial determinants of EHR adoption is that (a) initial and ongoing costs remain underestimated and (b) there is a great deal of uncertainty concerning the availability of incentive payments and effective reimbursement mechanisms from payers to fund EHR meaningful use. Miller
and Sim (2004) feel that these barriers are even more significant in the context of smaller physician practices, saying "these barriers were most acute for physicians in solo/small-group practice, a mode in which a substantial majority of U.S. physicians practice".
Technical
The technical determinant of EHR meaningful use can be considered using the two components of the classic Technology Acceptance Model (TAM): perceived "ease of use" and perceived "usefulness". First, in terms of EHR "ease of use", a significant amount of research points to the distinctive resistance to technology within the healthcare industry and the lack of technical expertise found in physicians' practices, both among the physicians themselves and their support staff. Smaller practices do not have the necessary background, training or time to learn and deploy often complex EHR systems. Randeree (2007) describes this phenomenon by saying that "smaller practices do not have the capacity to implement the software while dealing with issues of care and reimbursement" and "medical schools and residency programs do not currently employ or train future physicians in the use of EHR", leaving them unequipped to properly select and deploy EHRs. Terry et al. (2009) expand upon this by stating that EHR "barriers included level of computer literacy, training, and time", with the initial transition period being especially challenging in its time burden upon poorly trained staff and doctors. Boonstra and Broekhuis (2010) also note that "physicians struggle to get appropriate technical training and support for the systems from the vendor". Terry et al. (2009) suggest the need for an "in-house problem-solver" to "serve a key role in helping novice users move forward to achieve ... EHR adoption". An additional factor in EHR "ease of use" is system complexity, with Miller and Sim (2004) finding "even highly regarded, industry-leading EHRs to be challenging to use because of the multiplicity of screens, options, and navigational aids". Randeree (2007) elaborates, saying a "recent study showed that at least 264 EHR software programs are in use; on average, the percentage of respondents with the same EHR software was 0.4%".
Obviously, this issue is aggravated by the lack of technical skills possessed by most physicians
and staff. The need for EHR customizability is another factor in EHR perceived "ease of use". Nambisan (2014) describes how "adopters of new innovations often learn by using the innovation or reinvent the technology to adapt it to their own context" and how this is "invaluable during the adoption of new technologies such as EHR", but may not be possible given the knowledge and time limitations of physicians. One final consideration is the interoperability of the EHR system with other internal and external IT systems and with existing physician office processes. This is described as an inability of the EHR system to "interconnect with other devices that "complement" the EHR system" and as EHRs that "are not compatible with the existing practice systems" (Boonstra and Broekhuis 2010). Establishment of data standards is key to this interoperability and repeatedly surfaces as a concern physicians have with EHRs.
The second component of TAM deals with the perceived usefulness of a prospective technology. Physicians are told of the great potential utility of EHRs, including enhanced quality of patient outcomes, better compliance with various reporting and healthcare mandates, and improved process efficiencies. However, most healthcare providers are skeptical of the capability of EHRs to deliver these benefits, and they are also not given compelling incentives to improve clinical performance. Hennington and Janz (2007) describe an environment where the "system does not compensate physicians based upon the quality of the care they provide, and thus does not reward them for investing in systems designed to improve quality of care". Finally, effective knowledge management helps mitigate technical obstacles to EHR meaningful use, but "barriers to establishing a learning culture in health care organizations" are more problematic than those found in other industries (Nambisan 2014).
Time
The time determinant for EHR meaningful use is noteworthy where a "key obstacle in this path to quality [of care] is the extra time it takes physicians to learn to use the EHR effectively for their daily
tasks" (Miller and Sim 2004). Time spent installing and maintaining an EHR system comes at the detriment of patient-doctor relationships. This is an extremely important issue given that most physicians are principally driven to provide the best patient care possible. Along these lines, Miller and Sim (2004) find "that most physicians using EHRs spent more time per patient for a period of months or even years after EHR implementation. The increased time costs resulted in longer workdays or fewer patients seen, or both, during that initial period". Research points to especially frustrating and unexpected time investments in process redundancy caused by duplicating medical records and functionality during EHR transition, software maintenance and upgrades, vendor management and dedicated EHR training. Unexpected amounts of time are predominantly incurred during the initial transition period between a paper-based and an electronic medical records system. Gans et al. (2005) find that "concern about loss of productivity during transition to an EHR system" is among the top five barriers to EHR adoption for practices. Other research also points to this surprising investment of time during EHR implementation. For example, Randeree (2007) advises during EHR transition: "Expect staffing changes and lots of training. Slow your implementation expectations. Create redundant systems until errors are eliminated. Whatever your timetable and budget (double it)." Relating to this time and effort outlay, Hennington and Janz (2007) believe "the time required to enter information into EHR systems is one of the greatest challenges to EHR adoption".
Psychological
The psychological determinant of EHR use is especially significant within the unique environment of healthcare and among its behaviorally unique physicians. Nambisan (2014) feels that the psychological determinant does not receive appropriate attention, arguing that "innovation adoption is situated in a social (cultural) context and implies that the norms and values of the individual, the larger community of the individual, and the organization that the individual belongs to, all can influence adoption". To best understand the psychological barriers to EHR adoption, an understanding of the
socio-cultural and psychological context of health care is important. Significantly, physicians in smaller
practices have control and authority over both clinical and business processes. As such, without their full support, EHR adoption is extremely difficult to realize. Miller and Sim (2004) support this view, saying "nonchampion [unsupportive] physicians tended to be less positive toward EHRs and more easily discouraged by usability problems ... without exhortation and support from physician champions, these physicians tended to remain as lower-level EHR users." The issue seems primarily to be one of process control, particularly evident with older physicians stubbornly retaining old paper-based systems. Randeree (2007) explains that "older physicians were reluctant to transition to the EHR while the younger ones were driving the adoption". This issue of control is better explained by Vishwanath and Scamurra (2007) as "the loss of control of patient information, the loss of control over business processes, systems tend not to be very easy to use, negative perceptions among administrative staff, and problems in understanding the vernacular". Control is related to "professional autonomy", defined as "professionals having control over the conditions, processes, procedures, or content of their work" (Walter 2008). Accordingly, "physicians' perceptions of the threat to their professional autonomy are very important in their reaction to EHR adoption" (Boonstra and Broekhuis 2010). In addition to a perceived lack of control, many physicians are skeptical about the capability of EHRs to improve patient outcomes or quality of care. In fact, "more than half (58.1%) of the physicians without an EHR doubt that EHRs can improve patient care or clinical outcomes" (Boonstra and Broekhuis 2010). Finally, an important facet of the psychological determinant is the need for support and empathy from fellow adopters of EHR systems. Nambisan (2014) applies Social Cohesion Theory to highlight the importance of "empathetic communication between the adopter and the laggard".
In the context of EHR adoption, the "higher chance of adoption of the innovation by the laggard" is realized through an appropriate level of social cohesion (Nambisan 2014).
Social
Because physicians cannot function in isolation, their network of relationships is a key determinant of EHR use. Boonstra and Broekhuis (2010) highlight some of the relationships influencing physician behavior, saying "physicians in medical practices work together and cooperate with other parties in the healthcare industry, such as vendors, subsidizers, insurance companies, patients, administrative staff, and managers". The influence of these relationships has been shown to be of greater importance than was originally thought in EHR adoption research. Vishwanath and Scamurra (2007) point to the unexpected breadth of social determinants found in more recent research, which holds social determinants to be "more expansive and complex and include the lack of community level participation, the lack of involvement of major players such as hospitals, the lack of involvement of major players such as insurance companies, the lack of organizational support, the lack of knowledge/awareness of current or local success stories, and the fact that others do not use or recommend EHRs". Along these lines, Hennington and Janz (2007) draw from more current technology adoption models, including the Model of PC Utilization, to include social factors for "the individual's internalization of the reference group's subjective culture, and specific interpersonal agreements that the individual has made with others, in specific social situations". More recent research into the social determinant seeks to understand the underlying causes of the unique health care culture and the resulting social interactions. Accordingly, Nambisan (2014) believes much of the social determinant is explained by the communication style prevalent in health care settings, finding that an "important factor that affects learning is the mode of communication in the health care organization".
Specifically, Nambisan (2014) argues that "face-to-face or phone [communication prevalent in health care facilities] has been found to be highly interruptive and is a leading cause of errors". The specific reason for the ineffectiveness of this communication style is that "communication among employees in a hospital environment often leads to interruption-driven work contexts, where miscommunication or ineffective
communication is the norm" (Coiera and Tombs 1998). Obviously, the knowledge sharing that facilitates EHR meaningful use is a challenge to achieve when constrained by this type of communication environment.
Legal
The legal determinant of EHR use relates to (a) the electronic storage and transfer of confidential health care information and (b) the various reporting requirements for compliance with health care agency mandates. Interestingly, in earlier research, concerns over the privacy of patient information were rated lowest by those practices with EHRs (Gans et al. 2005). Initial concern over the security of EHR information may have been lessened by the lack of enforcement measures accompanying the Health Insurance Portability and Accountability Act of 1996, which was intended to help protect confidential health care information. In fact, "HIPAA enforcement and compliance changed in a dramatic way after 2009. As part of the American Recovery and Reinvestment Act (ARRA), Congress passed in 2009 the Health Information Technology for Economic and Clinical Health Act (HITECH). The HITECH Act greatly strengthened HIPAA by dramatically increasing the penalties for HIPAA violations—up to $1.5 million for a violation in certain circumstances" (library.ahima.org 2014). Research describes two concerns physicians have over the privacy of patient information in the context of EHR usage. First is providing adequate protection to safeguard a patient's confidential medical information in compliance with HIPAA regulations. Second is the need to maintain access to and usability of this information for proper patient care. Often these two concerns conflict, as heightened information security may reduce information access. Randeree (2007) explains this by saying "Medical data availability has to be balanced between access and privacy...Maximum controls may inhibit physicians from performing their job; minimal controls could leave patient information vulnerable to theft and misuse". The health care industry's increased focus on protecting patient privacy, and the more stringent regulations along those lines, have heightened the significance of this determinant for physicians.
Again, Randeree (2007) argues that although "patient record management is not new to the medical industry, information technology
(IT) solutions using EHRs have special considerations relating to the security of patient information (HIPAA, privacy, firewalls, virus protection, transmission) and performance issues (reliability of services, service levels, customization) as well as long term impacts (storage, computer upgrades, data efficacy)". Ironically, Boonstra and Broekhuis (2010) find that physicians are more concerned than patients with the security of patient information, saying "physicians are more concerned about this issue than the patients themselves...among the physicians who do use EHRs, most believe that there are more security and confidentiality risks involved with EHRs than with paper records". The second component of the legal determinant of EHR use is the anticipated benefit of EHR systems for compliance with federal and health care agency programs. Ford et al. (2006) state that "The policy mechanism most commonly discussed for increasing EHR's external influence coefficient in the United States is the introduction of clinical reporting mandates.... As reporting requirements increase, the only feasible mechanism for gathering such data will be the EHR". While this may initially be viewed as having a positive impact on EHR meaningful use, Ford et al. (2006) caution that "while such programs may be of some use, they may not advance the goal of full EHR adoption significantly, because U.S. providers tend to respond negatively to such mandated-use policies". An example of a health care program requiring enhanced reporting capabilities is National Committee for Quality Assurance (NCQA) accreditation. This program is claimed to be "the most comprehensive evaluation in the industry, and the only assessment that bases results of clinical performance (i.e., HEDIS measures) and consumer experience (i.e., CAHPS measures)" (ncqa.org 2014). NCQA accreditation is becoming a necessity for competitive health care facilities, and EHRs are believed to ease the reporting burdens of this program.
Obviously, the "meaningful use" requirements under the HITECH Act necessitate similarly enhanced reporting capabilities available through EHR systems. Care should be taken when evaluating the reporting component of this determinant, as many of the physicians experienced with EHR usage have not realized the anticipated reporting benefits. Randeree (2007) indicates that customizable and enhanced reporting was not delivered
for many practices, requiring physicians to incorporate "new [and] redundant work flows" for reporting
purposes.
Organizational
The facets of the organizational determinant of EHR use considered in the referential literature include (a) organizational characteristics such as size, clinic type or scope of services (inpatient, outpatient, ambulatory, etc.), and specialty (psychiatry, pediatrics, orthopedics, etc.), (b) the extent of separation, both social and professional, between physicians and staff, and (c) the organizational structure in terms of horizontal or vertical orientation. First, the most consistent organizational determinant is practice size, measured either by the number of physicians or by the number of patients. Gans et al. (2005) corroborate this, saying "the percentage of practices with EHRs differs greatly by size of practice". Specifically, practice size impacts EHR adoption because "larger practices could spread the sizeable fixed cost of purchase and implementation over more physicians" (Burt and Sisk 2005). Burt and Sisk (2005) contend that ownership type (physician or physician group; health maintenance organization (HMO); and all other health care organizations) is what really drives EHR adoption, arguing "Physician owned practices have low probabilities of using EHRs no matter what the size." Therefore, Burt and Sisk (2005) maintain that smaller practices lag larger practices in EHR adoption due to their inability to fund EHRs, rather than some other intrinsic property of smaller practices. Boonstra and Broekhuis (2010) confirm this, saying "physicians who are employed by or contracted to a medical practice are more likely to use EHRs than those who own their own practices". Burt and Sisk (2005) also propose that the unique scope of services provided by practices impacts EHR adoption, where "scope of services may also influence adoption, to the extent that EHRs offer the potential for practices with a wider range of services to achieve greater efficiencies".
However, this hypothesis was not borne out by their data: "neither the scope of services, as measured by single versus multispecialty practice, nor broadly defined categories of specialty, as measured by primary care, medical, or surgical, were significantly associated with use" (Burt and Sisk 2005).
They did find "use varied by specific physician specialty, however, with psychiatrists and dermatologists
least likely and orthopedic surgeons and cardiovascular disease specialists most likely to use EHRs" (Burt and Sisk 2005). The second component of the organizational determinant deals with the debilitating effect, in many practices, of the wide social and professional gulf between physicians and support staff. Research indicates that support staff often is more receptive to EHR adoption to improve process efficiency and reduce paperwork. In fact, adoption interventions often have a greater effect when geared toward support staff instead of physicians. For example, Vishwanath and Scamurra (2007) explain that "intervention aimed at alleviating integration issues is better off training support staff". In the same vein, a hierarchical organization structure reinforcing the separation of physicians and support staff is especially detrimental to EHR adoption, and necessitates effective networking and "a more open and less hierarchical environment for peer to peer interactions and knowledge transfer" (Nambisan 2014). The role of an effective organizational structure is further highlighted by Hennington and Janz (2007) when claiming EHR adoption requires "existence of both "an organizational and technical infrastructure" that would support actual usage". A more collaborative relationship between physicians and support staff, unencumbered by a formal hierarchical structure, is necessary because the "ability to share such user innovations and experiences are invaluable during the adoption of new technologies such as EHR" (Nambisan 2014).
Change Process
Reluctance to change involves not only the physician "lack of control" discussed earlier, but also physicians' inadequacies in managing the change process required for EHR adoption. Boonstra and Broekhuis (2010) frame the argument: adopting "EHRs in medical practices amounts to a major change for physicians who tend to have their own unique working styles that they have developed over years". As such, physicians anticipate difficulties managing the EHR change process without a facilitating organizational culture, effective incentives, individual and local support,
community level participation, and leadership. The magnitude of the change process is determined by the extent of alignment between existing business processes and the functionality of the chosen (and possibly customized) EHR system. Given the mindset of most physicians described previously, "physician concerns surrounding EHR adoption center more on integrating EHRs into existing clinical processes than on the need to fundamentally alter those processes" (Hennington and Janz 2007). This approach of making EHRs fit existing processes results in significant frustration for physicians, patients and support staff, and is compounded by EHR system designers unfamiliar with procedures unique to the health care industry. An example of an EHR system poorly equipped to match existing health care processes is described by Hennington and Janz (2007), who "cited EHR designers' poor understanding of clinical workflows as one reason for low EHR adoption rates among chest physicians, noting system designers' failure to recognize the role of group interaction in the clinical process". The consequence of poorly designed or incompatible EHR systems is costly and time-consuming customization. Miller and Sim (2004) maintain that often "EHR hardware and software cannot simply be used "out of the box"." Instead, physician practices must carry out many complex, costly, and time-consuming activities to "complement" an EHR product; hence the need for customization. The need for EHR customization is compounded by the lack of physician and support staff expertise and time, as described earlier. Most physicians in smaller practices find they have insufficient resources to "customize the system and make it do what I want it to do, and potential inability to customize the software, reports, and outputs to my satisfaction" and are thus reluctant to proceed with EHR adoption (Vishwanath and Scamurra 2007).
Environmental
The environmental determinant of EHR adoption mainly concerns the unique payment models under which physicians function. Basically, physician reimbursement follows one of two approaches. The first is the more traditional "Fee for Service" (FFS) model, in which healthcare services are unbundled and "doctors and other health care providers receive a fee for each service such as an office
visit, test, procedure, or other health care service" (opm.gov 2014). The second payment model is "outcomes based" and accounts for the quality of care afforded the patient. Essentially, this is a new payment model in which "Medicare and Medicaid, as well as private insurance plans, are shifting payment to what is referred to as "accountable care" where physicians are paid for quality rather than the quantity of care delivered" (creators.com 2012). Clearly, reimbursement under the first approach, which rewards service volume, disincentivizes the efficiencies promised by EHRs. The environmental determinant further includes technical factors external to practices that impact EHR adoption. These technical factors include interoperability standards, open-source EHR systems and accessibility of a single patient EHR to multiple providers.
Third Party Intervention
Another usage determinant concerns the impact of intervening third parties on EHR adoption. The impacts of several "third parties" have been touched upon previously under other determinants (e.g., government intervention under the legal determinant and third-party vendor relationships under the change process determinant). The overall conclusion from these analyses is that a physician's anticipated effort in dealing with all the third-party interventions associated with EHR adoption is consequential. It is important to note that IT efforts especially, including complex system installations, are onerous to most physicians and often necessitate extensive third-party involvement for training, technological support, or other reasons. Managing these and other third-party relationships is obviously an area of concern for physicians without the time or expertise to effectively perform vendor management tasks.
Summary
Upon reviewing the reference literature, several advances in our understanding of the determinants of EHR use within the health care industry are apparent. Because of the unique nature of the health care
industry, determinants of health care IT adoption differ from those experienced in more traditional industries. Research has shown that the reasons originally suggested for physician reluctance to embrace EHRs may not have explained the issue properly or completely. The most significant adoption determinants unique to health care are (1) "physician autonomy" and its significant impact on adoption, (2) the psychological and social context of the adopting organization, and (3) the cost dynamics of EHR adoption. We must shift our focus from the more common technical and financial IT adoption determinants to those determinants of significance to health care. Accordingly, chiefly financial or technical incentives for adoption have proven to be less effective within health care. Of greater importance to health care adoption is the role of "physician autonomy" and the need to properly understand and mitigate physicians' desire for process control. Along those lines, the traditional hierarchical structure of small physician practices, the negative impact on EHR adoption of physicians' peer-to-peer interaction, and the obvious difference in EHR receptivity between physicians and support staff clearly demonstrate that "physician autonomy" is of more significance than originally thought. Indeed, Venkatesh and Sykes (2011) argue that "doctors are likely to develop negative views towards the new system because it could pose a threat to their autonomy and power". Additionally, the connectivity and communication style of physicians, and their power to change peer behavior, make their resistance to EHR adoption even more problematic. As such, "central doctors . . . those who interact a great deal with doctors for advice, information, and knowledge related to performing their work" have been shown to have a negative EHR adoption impact on their immediate peers (Venkatesh and Sykes 2011).
Finally, based on case studies, physicians encounter unexpected patterns and magnitudes of costs related to EHR adoption. For example, unforeseen costs realized by adoption "leaders" after the initial EHR transition phase, such as costs for maintenance and customization, have a significant impact on EHR receptiveness among adoption "laggards." Furthermore, these ongoing costs are not limited to vendor management but are incurred to keep the EHR system functional in terms of
"performing updates, monitoring usage, implementing data security, performing data storage, and maintenance of hardware and software" (Randeree 2007).
Clinical Decision Support Systems
Overview
The ultimate objective of the research described in this dissertation is a MES-driven similarity measure functioning as an integral component of clinical decision support systems (CDSS). As such, this section of the literature review explains what a CDSS is and explores its benefits for health care practitioners. Formally, a CDSS "is an application that analyzes data to help healthcare providers make clinical decisions" (http://searchhealthit.techtarget.com/definition/clinical-decision-support-system-CDSS).
More specifically:
"A CDSS is an adaptation of the decision support system commonly used to support business management. Physicians, nurses and other healthcare professionals use a CDSS to prepare a diagnosis and to review the diagnosis as a means of improving the final result. Data mining may be conducted to examine the patient's medical history in conjunction with relevant clinical research. Such analysis can help predict potential events, which can range from drug interactions to disease symptoms. Some physicians prefer to avoid over-consulting their CDSS, instead relying on their professional experience to determine the best course of care."
(http://searchhealthit.techtarget.com/definition/clinical-decision-support-system-CDSS)
Essentially, a CDSS provides an environment under which health care providers can incorporate data mining and analytics tools to assist with clinical decision-making. The institutionalized use of a CDSS requires:
1. Development and implementation of CDSS tools with clinical decision-making value.
2. An environment under which CDSS tools can be effectively used by health care providers.
3. Adoption intent by health care providers perceiving CDSS tools to be useful and easily used.
We present a MES similarity measure as a CDSS tool providing significant value for clinical decision-making to improve patient outcomes.
CDSS Adoption Factors
Specific to this research is an understanding of the forces behind the adoption of clinical decision support systems. The factors influencing CDSS adoption are somewhat different than those for general IT and more transaction-level system adoption within health care. However, the paradox is similar for CDSS adoption in that a technology with obvious benefits is being reluctantly used. Liberati (2017) observes that "Although this technology [CDSS] has the potential to improve the quality of patient care, its mere provision does not guarantee uptake: even where CDSSs are available, clinicians often fail to adopt their recommendations." Appendix D outlines the more important factors for CDSS-specific IT adoption. Interestingly, physician autonomy, as discussed previously, is an important consideration in CDSS adoption. To confirm, Liberati (2017) states "the most severe barriers (prevalent in the first positions) include clinicians' perception that the CDSSs may reduce their professional autonomy or may be used against them in the event of medical-legal controversies." What is needed, therefore, are CDSS tools that convince physicians that their use can improve patient outcomes while preserving professional autonomy. Accordingly, this literature review continues with an overview of similarity measures as a potential CDSS tool to further mitigate physician IT hesitancy by proving their value in diagnosis and treatment.
Similarity Measures and Medical Event Sequences
We are motivated to develop similarity measures appropriate for MESs to support and broaden health care analytics. In reviewing related research, we find a general lack of understanding and reference studies concerning the application of similarity measures to MESs. Furthermore, similarity measures established for state sequences in other domains fail to adequately address the unique characteristics of MESs. For example, MES characteristics of importance to similarity measures and ignored by referential similarity measures include (1) event likelihood and replication, (2) the
distribution of temporal gaps between medical events, and (3) hierarchically structured event codes characterizing events. Furthermore, other similarity measures include event sequence features, such as alignment or sentinel events, that may not be of relevance for MES analysis. During our survey of referential similarity measures, we focused on their inclusion and application of event matching and temporal structure; features unique and important to MESs. We find that some similarity measures do, in fact, address unique MES features for event alignment, temporal event and gap duration, and the significance of edit distance. First, event alignment is used in some similarity algorithms to construct a relative time scale for comparing two event sequences. Usually event alignment similarity involves a sentinel event to serve as a reference point, with this sentinel event being chosen by the user or based on some other criteria. For example, the Match/Mismatch measure requires the entry of a sentinel event, with the timing of other events being relative in terms of their duration from the sentinel event (Wongsuphasawat 2009). The ARTEMIS similarity measure also incorporates event alignment. This temporal alignment of event sequences seems especially appropriate to MES analysis, given that a triggering medical event can serve as a meaningful reference point for subsequent medical events comprising a patient's MES. Second, temporal structure, in terms of event durations and gaps between health care incidents, is integral to MESs, not only for event matching, but also for ascertaining the acuity of a patient's condition. For example, a longer inpatient event duration (length of stay) may indicate greater severity, and a longer period between inpatient admissions (duration gap) may indicate less severity. 
Along those lines, an exploration of referential literature reveals a prototypical similarity measure (Optimal Temporal Common Subsequence) that incorporates the key MES features of temporal structure (Zheng 2010). Third, the use of edit distance for measuring the cost (or distance) required to transform one event sequence into another may not have the same significance for MESs as other event sequences. Specifically, edit distance quantifies dissimilarity between two strings by counting the minimum number of operations required to transform one string into the other. Edit distance applies
when there is no intrinsic meaning for the characters or codes representing the events within a sequence. This is not the case for MESs, where a diagnosis, procedure, or drug code holds intrinsic meaning about the patient's health care event. As such, one cannot change a diagnosis code appearing in one patient's MES to match a code appearing in another patient's MES simply to compute an edit distance. This would lose information about the event itself, and this medical event information is crucial to MES-based similarity.
In summary, there is limited application of similarity measures to MESs. Much of the previous research concerning similarity measures focuses on media retrieval through information recognition measures. This type of work considers event sequences as ordered lists without considering the unique temporal and coding facets of MESs. However, some components of referential similarity measures (OTCS for example) do seem applicable to MES analysis, and their methods should be incorporated as appropriate.
CHAPTER II
DEVELOPMENT AND EVALUATION OF A SIMILARITY MEASURE FOR MEDICAL EVENT SEQUENCES
Abstract
In this study, we develop a similarity measure for medical event sequences (MESs) and empirically assess it using U.S. Medicare claims data. Existing similarity measures do not use unique characteristics of MESs and have never been evaluated on real MESs. Our similarity measure, the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES), provides a matching component that integrates event prevalence, event duplication, and hierarchical coding, important elements of MESs. The OTCS-MES also uses normalization to mitigate the impact of heavy positive skew of matching events and compact distribution of event prevalence. We empirically evaluate the OTCS-MES measure against two other measures compatible with MES analysis, the original OTCS and Artemis, a measure incorporating event alignment. Our evaluation uses two substantial data sets of Medicare claims data containing inpatient and outpatient sequences with different medical event coding. We find a small overlap in nearest neighbors among the three similarity measures demonstrating the importance of each unique aspect of a MES. The evaluation also provides evidence about the impact of component weights, neighborhood size, and sequence length. This study includes an expanded literature review about the enabling attributes of electronic health records (EHRs) for data mining, application of similarity measures to clinical decision-making, and general IT adoption barriers within health care. Overall, this exploration of MES similarity measure reasoning is undertaken with the understanding that EHR/CDSS pervasive use is fundamental to effective and transformational health care IS research.
Introduction
State sequences occur naturally in many circumstances and lend themselves to advanced data mining techniques. A state sequence is "a sequence of data, measured and/or spaced typically at
successive times, which can be either points or intervals" (Zheng et al. 2010). Examples of common state
sequences analyzed through data mining techniques include temporally spaced account payment statuses, audio pattern matching, and video information retrieval (Zheng et al. 2010). State sequence research applies to several fields including "geo-informatics, cognitive science, linguistic analysis, music and medicine" (Kostakis 2011).
This dissertation emphasizes the analysis of state sequences containing patient incidents or medical event occurrences within the health care industry. In this research, medical event sequences (MESs) are defined as state sequences relevant to health care. The advent of electronic health records (EHRs) has enabled the abstraction and retention of standard diagnosis, procedure, and pharmaceutical codes characterizing MESs. Table 1 provides an example of an inpatient (IP) MES where a sequence of hospital admissions is characterized by the International Classification of Diseases Version 9 (ICD-9) primary diagnosis code associated with each event.
Table 1. Sample Inpatient MES
Member ID Primary ICD-9 Diagnosis Code Description Event Start Event End
00824B6D595BAFB8 49122 OBSTRUCTIVE CHRONIC BRONCHITIS WITH ACUTE BRONCHITIS 2/7/2008 2/10/2008
00824B6D595BAFB8 49121 OBSTRUCTIVE CHRONIC BRONCHITIS WITH (ACUTE) EXACERBATION 2/28/2008 3/4/2008
00824B6D595BAFB8 78906 ABDOMINAL PAIN EPIGASTRIC 3/25/2008 3/29/2008
00824B6D595BAFB8 4280 CONGESTIVE HEART FAILURE UNSPECIFIED 3/29/2008 3/30/2008
00824B6D595BAFB8 7802 SYNCOPE AND COLLAPSE 5/14/2008 5/16/2008
00824B6D595BAFB8 27651 DEHYDRATION 6/11/2008 6/13/2008
00824B6D595BAFB8 5990 URINARY TRACT INFECTION SITE NOT SPECIFIED 7/13/2008 7/18/2008
00824B6D595BAFB8 5070 PNEUMONITIS DUE TO INHALATION OF FOOD OR VOMITUS 8/14/2008 8/21/2008
Motivation
As EHR adoption expands, the health care industry can realize increased "productivity for doctors and nurses, better information for decision-making, better product/service customization, higher quality patient outcomes, and better service" (Skinner 2003). Data warehouses of clinical information provide a "foundation for a learning healthcare system that facilitates clinical research, quality improvement, and other data-driven efforts to improve health" (Hersh 2013). Data warehouses
can provide economies of scale to support comparative effectiveness research (CER), which aims to study populations and clinical outcomes of maximal pertinence to real-world clinical practice (Hersh 2013). While adoption of EHR data warehousing and analytics is progressing slowly, the health care industry understands that "the investment in EHRs and supporting data warehouses is fundamentally required to achieve the value that is accessible in analytics" (Sanders 2013).
A similarity measure for MESs is important for a variety of reasoning tasks used by health care professionals and data mining algorithms. Informally, a similarity measure can be used by health care professionals to develop treatment plans based on similar patients. A similarity measure can be used in data mining algorithms for risk assessment, co-morbidity determination, and conformance to clinical pathways. Existing similarity measures fail to adequately address the unique characteristics of MESs including event likelihood and replication, distribution of temporal gaps between medical events, and hierarchically structured event codes. Furthermore, many similarity measures include other components such as alignment or sentinel events that do not have comparable significance for MESs.
Summary of Work
To explore the potential benefit of MESs to health care professionals and data mining algorithms, this research develops an improved similarity measure applicable to MESs and evaluates the measure's effectiveness through formal experimentation. To provide a context for the OTCS-MES measure, we explain components of MESs requiring special consideration when comparing health care incident sequences. We then present the OTCS-MES measure, an enhanced version of the Optimal Temporal Common Subsequence (OTCS) measure developed by Zheng et al. (2010). The OTCS-MES substantially extends the original OTCS with components for matching individual events and temporal structure.
We evaluate the OTCS-MES measure in an application and algorithm independent manner using
substantial samples of MESs. The evaluation uses two measures of nearest neighbor overlap, both independent of application and algorithm. Assessment of overlap among nearest neighbors provides insights for follow-on analysis of performance by application and algorithm. We evaluate the OTCS-MES measure, the original OTCS measure, and Artemis, a similarity measure based on event alignment for MESs. Our empirical evaluation utilizes substantial samples of inpatient and outpatient claims data from the Data Entrepreneurs' Synthetic Public Use Files (DE-SynPUF) available through the Centers for Medicare and Medicaid Services (CMS). Since its introduction, CMS SynPUF data has been widely used in health care research and analytical studies. Archimedes Inc., a prominent healthcare modeling and analytics company, collaborated with the Centers for Medicare & Medicaid Services to simplify and expand access to the same synthetic CMS claims data used in this study. Using two measures of nearest neighbor overlap, our empirical evaluation provides evidence of substantial differences among the three similarity measures. We also demonstrate internal consistency of OTCS-MES components and the impact of weights, neighborhood size, and sequence length.
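The nearest-neighbor overlap measures are application and algorithm independent because they compare only the neighbor sets each similarity measure produces. The following is an illustrative Python sketch of the idea, not the dissertation's actual evaluation code; it assumes each measure is supplied as a precomputed pairwise similarity matrix.

```python
def k_nearest(sim_row, self_idx, k):
    """Indices of the k most similar cases under one measure, excluding self."""
    ranked = sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)
    return [j for j in ranked if j != self_idx][:k]

def neighbor_overlap(sim_a, sim_b, k):
    """Average fraction of shared k-nearest neighbors between two measures,
    each given as a full pairwise similarity matrix."""
    n = len(sim_a)
    shared = 0.0
    for i in range(n):
        neighbors_a = set(k_nearest(sim_a[i], i, k))
        neighbors_b = set(k_nearest(sim_b[i], i, k))
        shared += len(neighbors_a & neighbors_b) / k
    return shared / n
```

An overlap near 1 means the two measures retrieve essentially the same neighbors; the small overlaps reported in this study indicate that each measure captures different aspects of a MES.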
Contributions
Our research provides innovation in the design of an adapted similarity measure (OTCS-MES) specific to medical events, characterization of MESs to improve the design of similarity measures, and empirical comparison of similarity measures in an application and algorithm independent manner using substantial samples of actual claims data. For the first contribution, the OTCS-MES, a new design science artifact, combines two independent components for matching events and temporal structures using simple weights. The event matching component integrates event prevalence, event duplication, and hierarchical coding, important elements in MESs. For the second contribution, the empirical characterization of MESs uses a substantial sample of medical claims data to indicate the impact of heavy positive skew on the number of matching events and the compact distribution of event
prevalence. Normalization used in the event matching component of the OTCS-MES mitigates these impacts. The insights of this empirical characterization can also improve the design of other similarity measures for event sequences. For the third contribution, the empirical comparison of OTCS-MES to other similarity measures uses two overlap measures that are application and algorithm independent. The overlap measures allow comparison of nearest neighbors generated by the proposed OTCS-MES, the original OTCS, and Artemis. The empirical comparison involves a substantial sample of real medical claims data, both inpatient and outpatient with different medical coding. The analysis provides strong evidence for substantial differences in overlap between OTCS-MES and the other similarity measures and the impact of weights, neighborhood, and sequence size on overlap. The insights from this empirical comparison provide guidance for evaluating performance of similarity measures for MESs in important applications and algorithms.
Related Work
Similarity measures quantify our "naive judgments of likeness" between entities (France 1994). In the health care industry, similarity measures quantify patient likeness to help with patient classification for improved outcomes and risk (cost) prediction. Research concerning similarity measures for MESs leverages ontologies. An ontology is "a vocabulary of terms and some specification of their meaning"; it compares to an ICD code set describing the events within an inpatient MES. Essentially, our objective is "a function that, given two ontology terms or two sets of terms [MESs] annotating two entities [patients], returns a numerical value reflecting the closeness in meaning between them" (Couto 2013).
Existing similarity measures incorporate several components relevant to MES evaluation: (1) event alignment, (2) temporal event and gap duration, and (3) symbol substitution in edit distance.
Event alignment constructs a relative time scale for comparing two event sequences based on a sentinel matching event. For example, the Match/Mismatch measure requires a sentinel event category for
aligning other events based on their time from the sentinel event (Wongsuphasawat 2009). This type of
timeline alignment seems appropriate to MESs as a triggering event can serve as a reference point for establishing a relative MES time scale. The Artemis measure in our empirical evaluation incorporates event alignment.
Temporal duration and gap between health care incidents help with ascertaining the acuity of a patient's condition. For example, longer inpatient admissions may indicate a more severe condition, and longer periods of time (gaps) between inpatient admissions may indicate a less acute condition. Consequently, we select the OTCS similarity measure because it utilizes both temporal event duration and temporal event gap (Zheng 2010).
Symbol substitution, used in edit distance to transform event sequences, may not have the same utility for MESs as for other types of event sequences. Edit distance applies when the symbols representing events within a sequence have no intrinsic meaning. In an MES, however, a diagnosis code provides intrinsic meaning about a patient. One cannot arbitrarily change a diagnosis code appearing in one patient's MES to match a code appearing in another patient's MES and count that as a single step in an edit distance.
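To make this argument concrete, the standard Levenshtein edit distance is sketched below. It treats codes as opaque symbols, so substituting one diagnosis code for a clinically unrelated one costs exactly the same as any other substitution, which is precisely the information loss described above.

```python
def levenshtein(seq_a, seq_b):
    """Minimum number of insertions, deletions, and substitutions needed to
    transform seq_a into seq_b. Symbols are opaque: a substitution costs 1
    regardless of what the two codes mean clinically."""
    m, n = len(seq_a), len(seq_b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# Swapping a bronchitis code for a heart-failure code is "one edit",
# even though the clinical meaning changes completely:
distance = levenshtein(["49122", "49121", "78906"], ["49122", "4280", "78906"])  # 1
```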
Although some aspects of MESs are addressed by other similarity measures, these measures largely ignore important MES aspects including (1) the intrinsic meaning associated with abstracted codes characterizing events in an MES, (2) the temporal gap and duration features of medical incidents, and (3) partial matching from hierarchical event coding. An MES is characterized by abstracted medical event codes each with an intrinsic meaning. Therefore, codes comprising an MES cannot be randomly exchanged to measure event sequence similarity. For the other two aspects, most similarity measures neglect to capture the temporal components of event sequences. In general, "time-series and state-sequences have been simply expressed as ordered lists . . . leaving some critical issues unaddressed"
(Zheng 2010). None of the proposed similarity measures for MESs support hierarchical coding schemes.
Jordan (2004) emphasizes the significance of hierarchical coding levels, writing "recording diagnoses only at diagnostic ... heading [level] may improve the accuracy of recorded data but is too general for most research or clinical purposes". Accordingly, each hierarchical level of the ICD diagnosis code is important to similarity matching as it contains additional information about each event and, ultimately, the patient.
In summary, there is limited prior application of similarity measures to MESs. Much of the previous research concerning similarity measures focuses on media retrieval through information recognition algorithms. This type of work clearly considers event sequences as ordered lists without considering the unique attributes of MESs. In contrast to this approach, we propose a measure that incorporates the unique attributes of MESs. As a starting point, OTCS considers the temporal components of event sequences and the importance of matched events. As such, it serves as the starting point for our OTCS-MES measure specific to medical event sequences. For our selected comparative measure, Artemis incorporates temporal matching through defining and quantifying common temporal relationships between event-intervals (meet, match, overlap, contain and follow) (Kostakis 2011).
OTCS-MES Similarity Measure
Representation of MESs
We follow Zheng's (2010) representation of MESs in which a temporal event t_i begins at time point p_i and ends at time point q_i. The duration of event t_i is then simply q_i minus p_i, and the preceding temporal gap is p_i minus q_{i-1} (the end point of the prior event). We extend this representation to MESs with ICD-9 codes as shown in Table 2, an extension of Table 1. For the second inpatient event (with Primary ICD Diagnosis Code 49121), the starting point p_2 is 2/28/2008, the ending point q_2 is
3/4/2008, and the prior event ending point q_1 is 2/10/2008. Given these values, the event duration is 5 days (q_2 minus p_2) and the event gap is 18 days (p_2 minus q_1). ICD-9 codes contain three levels, from three digits to five digits. The primary ICD-9 code contains the maximum digits specified, as providers sometimes only report 3- or 4-digit codes. For example, the second event has all 5 digits (49121), while the fourth event has only 4 digits (4280).
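The duration and gap arithmetic above can be computed directly from event start and end dates. The following is a small illustrative Python fragment (the field layout is an assumption for illustration, not the dissertation's code):

```python
from datetime import date

def add_duration_and_gap(events):
    """Annotate each (start, end) event with its duration in days and the gap
    in days since the previous event's end (0 for the first event)."""
    annotated = []
    prev_end = None
    for start, end in events:
        duration = (end - start).days
        gap = (start - prev_end).days if prev_end else 0
        annotated.append((start, end, duration, gap))
        prev_end = end
    return annotated

# First two events of the Table 2 MES:
mes = [(date(2008, 2, 7), date(2008, 2, 10)),
       (date(2008, 2, 28), date(2008, 3, 4))]
# Second event: duration 5 days (q2 - p2), gap 18 days (p2 - q1)
```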
Table 2. Sample IP MES with Event Duration and Gap
Member ID Primary ICD-9 Diagnosis Code Description Event Start Event End ICD-9 3 Digit ICD-9 4 Digit ICD-9 5 Digit Event Duration Event Gap
00824B6D595BAFB8 49122 OBSTRUCTIVE CHRONIC BRONCHITIS WITH ACUTE BRONCHITIS 2/7/2008 2/10/2008 491 4912 49122 3 0
00824B6D595BAFB8 49121 OBSTRUCTIVE CHRONIC BRONCHITIS WITH (ACUTE) EXACERBATION 2/28/2008 3/4/2008 491 4912 49121 5 18
00824B6D595BAFB8 78906 ABDOMINAL PAIN EPIGASTRIC 3/25/2008 3/29/2008 789 7890 78906 4 21
00824B6D595BAFB8 4280 CONGESTIVE HEART FAILURE UNSPECIFIED 3/29/2008 3/30/2008 428 4280 4280 1 0
00824B6D595BAFB8 7802 SYNCOPE AND COLLAPSE 5/14/2008 5/16/2008 780 7802 7802 2 45
00824B6D595BAFB8 27651 DEHYDRATION 6/11/2008 6/13/2008 276 2765 27651 2 26
00824B6D595BAFB8 5990 URINARY TRACT INFECTION SITE NOT SPECIFIED 7/13/2008 7/18/2008 599 5990 5990 5 30
00824B6D595BAFB8 5070 PNEUMONITIS DUE TO INHALATION OF FOOD OR VOMITUS 8/14/2008 8/21/2008 507 5070 5070 7 27
OTCS-MES Components
We developed the OTCS-MES based on characteristics of MESs. The original OTCS measure, although motivated by MESs, does not account for prevalence of events, hierarchical coding, and duplication of events, important aspects of MESs. The OTCS-MES component for event matching integrates event prevalence, hierarchical coding, and replicated events. The OTCS-MES, using event prevalence, provides higher similarity for rarer events. Prevalence supports hierarchical matching in
medical coding schemes. The OTCS-MES also revises the temporal structure matching component. The original OTCS considered only common events for temporal structure matching. The OTCS-MES matches on the temporal structure of all events, maintaining independence between the event matching and temporal structure matching. Independence is consistent with MES characteristics including many event codes, no intrinsic meaning to the order of event occurrences, and typically a relatively small number of matching events. The OTCS-MES also uses both mean and variation to measure temporal structure similarity whereas the original OTCS only uses sum of gap differences. In contrast to the original OTCS, the OTCS-MES normalizes each component to simplify weight assignment.
Using important characteristics of MESs, the OTCS-MES consists of two major components for event matching and temporal structure matching. The event matching component sums prevalence weights in the numerator and the number of matched events in the denominator. We use prevalence as a score or weight among events matching at possibly different hierarchical levels of a medical coding representation. The temporal structure component contains four elements for differences of mean and coefficient of variation (CV) for duration and gap between two event sequences. We normalize each element to achieve a value between zero and one. The five normalized elements are then weighted for an overall similarity measure ranging from zero to one with larger values indicating greater similarity. The computation of each element comprising the OTCS-MES is provided in Equations 1 and 1.1-1.5.
\[
\mathrm{OTCS\text{-}MES} = (W_1 \times \mathrm{OTCSMES}_1) + (W_2 \times \mathrm{OTCSMES}_2) + (W_3 \times \mathrm{OTCSMES}_3) + (W_4 \times \mathrm{OTCSMES}_4) + (W_5 \times \mathrm{OTCSMES}_5) \tag{1}
\]
where \(\sum_{i=1}^{5} W_i = 1\).
In Equations 1.1 to 1.5,
• ME is the set of all events in the pair of cases,
• C is set of all cases, and
• MSSize is the cardinality of the associated set.
Event Similarity (OTCSMES1)
\[
\mathrm{OTCSMES}_1 = \left(\sum_{e \in ME}^{MSSize} NPW_e\right) \Big/ \left(PDM + 1 - MSSizeLimit\right) \tag{1.1}
\]
• NPWe is the normalized prevalence weight of event e,
• PDM is the maximum matched event limit, and
• MSSizeLimit is the number of event matches in the associated set constrained by the matched event limit.
Temporal Similarity (OTCSMES2-OTCSMES5)
\[
\mathrm{OTCSMES}_2 = 1 - \frac{\sum_{e \in ME}^{MSSize} \mathrm{AVGDUR}_e}{\operatorname{mean}_{c \in C}\left(\sum_{e \in c} \mathrm{AVGDUR}_e\right) + \operatorname{stddev}_{c \in C}\left(\sum_{e \in c} \mathrm{AVGDUR}_e\right)} \tag{1.2}
\]
AVGDURe is the average temporal duration difference of all events in the associated set e.
\[
\mathrm{OTCSMES}_3 = 1 - \frac{\sum_{e \in ME}^{MSSize} \mathrm{AVGGAP}_e}{\operatorname{mean}_{c \in C}\left(\sum_{e \in c} \mathrm{AVGGAP}_e\right) + \operatorname{stddev}_{c \in C}\left(\sum_{e \in c} \mathrm{AVGGAP}_e\right)} \tag{1.3}
\]
AVGGAPe is the average temporal gap difference of all events in the associated set e.
\[
\mathrm{OTCSMES}_4 = 1 - \frac{\sum_{e \in ME}^{MSSize} \mathrm{CVDUR}_e}{\operatorname{mean}_{c \in C}\left(\sum_{e \in c} \mathrm{CVDUR}_e\right) + \operatorname{stddev}_{c \in C}\left(\sum_{e \in c} \mathrm{CVDUR}_e\right)} \tag{1.4}
\]
CVDURe is the coefficient of variation difference for duration of all events in the associated set e.
\[
\mathrm{OTCSMES}_5 = 1 - \frac{\sum_{e \in ME}^{MSSize} \mathrm{CVGAP}_e}{\operatorname{mean}_{c \in C}\left(\sum_{e \in c} \mathrm{CVGAP}_e\right) + \operatorname{stddev}_{c \in C}\left(\sum_{e \in c} \mathrm{CVGAP}_e\right)} \tag{1.5}
\]
CVGAPe is the coefficient of variation difference for gap of all events in the associated set e.
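Per Equation 1, the overall measure is a weighted combination of the five normalized components. The sketch below is an illustrative Python fragment, not the dissertation's C# or SAS implementation; it assumes the five component values have already been computed and normalized to [0, 1].

```python
def otcs_mes(components, weights):
    """Combine the five normalized OTCS-MES components (event similarity plus
    four temporal-structure elements) into one similarity score in [0, 1].
    Per Equation 1, the weights must sum to 1."""
    assert len(components) == 5 and len(weights) == 5
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    score = sum(w * c for w, c in zip(weights, components))
    return min(1.0, max(0.0, score))

# Equal weighting of all five components:
similarity = otcs_mes([0.8, 0.6, 0.7, 0.5, 0.9], [0.2] * 5)  # 0.70
```

Because each component is normalized, the weights directly express the relative importance of event matching versus temporal structure.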
Coding of OTCS-MES Similarity Measure
Development of efficient software to operationalize the OTCS-MES similarity measure is imperative given the large amounts of data envisioned for processing. The number of MES comparisons between entity pairs grows quickly as the number of entities grows: for a set of N entities (MESs), the number of comparisons required is N × (N − 1) / 2. As such, two different algorithms were developed to generate the results for each component of the OTCS-MES similarity measure described above. The first algorithm is a C# module best applied to smaller subsets of MES data; this module is included as Appendix E. The second algorithm is intended for a larger number of MES comparisons; it was developed in the SAS programming language and incorporates features of that language to improve processing efficiency. The SAS program computing OTCS-MES comparison results is included as Appendix F.
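The quadratic growth noted above is easy to see: for N entities there are N × (N − 1) / 2 unordered pairs to compare. A minimal Python sketch follows (the production implementations are the C# and SAS programs in Appendices E and F):

```python
from itertools import combinations

def pairwise_comparisons(entity_ids):
    """Yield each unordered pair of entities exactly once,
    i.e., N * (N - 1) / 2 comparisons for N entities."""
    yield from combinations(entity_ids, 2)

# Even a modest cohort produces a large number of comparisons:
n = 1000
num_pairs = n * (n - 1) // 2  # 499,500 pairwise MES comparisons
```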
Example
To clarify the practical application of OTCS-MES, we use a simple example to depict each component of the measure. Table 3 provides an example of a high-similarity patient pair. In Table 3, the MES pair has five matching events (<295 in rows 1 and 5>, <295 in rows 2 and 7>, <2953 in rows 3 and 9>, <2953 in rows 4 and 10>, <296 in rows 5 and 11>). Note that two matches occur at the four-digit level, with higher prevalence than 3-digit matches. OTCS-MES uses the prevalence of these matching events rather than just their count. The table also provides temporal component metrics indicating the similarity of these patients based on their MESs.
Table 3. Example of Similar Patients Based on Event Similarity
DE-SynPUF Patient ID ICD-9 3-Digit ICD-9 4-Digit ICD-9 5-Digit Begin Admit End Admit Admit Gap Admit Duration
295 2953 29530 8/11/2008 9/1/2008 0 21
295 2953 29530 9/2/2008 10/6/2008 1 34
0278EC3A3183E5A4 295 2953 29530 12/16/2008 12/18/2008 71 2
295 2953 29530 1/12/2009 1/16/2009 25 4
296 2969 29690 6/26/2010 7/6/2010 526 10
Average 124.60 14.20
Coefficient of Variation 1.82 0.94
304 3048 30480 2/12/2008 2/17/2008 0 5
820 8208 5/16/2008 5/21/2008 89 5
298 2989 6/3/2008 6/19/2008 13 16
730 7302 73028 7/8/2008 7/12/2008 19 4
295 2957 29570 7/26/2008 8/4/2008 14 9
A7616FF2567C9EA8 296 2962 29620 10/18/2008 11/22/2008 75 35
295 2957 29570 12/2/2008 1/6/2009 10 35
283 2839 2/24/2009 2/25/2009 49 1
295 2953 29534 10/15/2009 11/9/2009 232 25
295 2953 29532 10/17/2009 10/29/2009 0 12
296 2962 29624 12/16/2010 12/27/2010 413 11
Average 83.09 14.36
Coefficient of Variation 1.55 0.85
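The Average and Coefficient of Variation rows in Table 3 are the standard sample statistics. As a check, a short Python fragment reproduces the first patient's values (CV = sample standard deviation divided by mean):

```python
from statistics import mean, stdev

def coeff_of_variation(values):
    """Sample coefficient of variation: stdev / mean."""
    return stdev(values) / mean(values)

gaps = [0, 1, 71, 25, 526]       # first patient's admit gaps from Table 3
durations = [21, 34, 2, 4, 10]   # first patient's admit durations

# mean(gaps) = 124.6, CV of gaps ~ 1.82
# mean(durations) = 14.2, CV of durations ~ 0.94
```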
Normalization in the Event Similarity Component
Two issues surfaced when devising the event similarity component of the OTCS-MES measure. The first issue involves the impact of a heavy, positively skewed distribution when normalizing the number of event matches based on the maximum number of event matches across all patient pairs. A heavily skewed distribution can make most MESs look dissimilar even when they share several common events. For the inpatient MESs used in our analysis, 99% of MES pairs with at least one common event have five or fewer matched events, as indicated in Figure 1. However, some patient pairs have twenty or more matched events. An analysis of matched events for patient pairs based on outpatient procedures yields a similar distribution (Figure 2).
To mitigate the impact of heavy positive skew, the matched event limit should be set based on
the cumulative frequency of patient pairs by matched events, perhaps at a level above 95%. We set the normalizing denominator to 5 (99%) for inpatient data and 40 (97%) for outpatient data.
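Choosing the matched-event limit from a cumulative-frequency target can be sketched as follows; `pair_match_counts` is a hypothetical list holding the number of matched events for each patient pair.

```python
def matched_event_limit(pair_match_counts, coverage=0.95):
    """Smallest matched-event cap that covers at least the requested fraction
    of patient pairs, mitigating the heavy positive skew of match counts."""
    counts = sorted(pair_match_counts)
    cutoff_index = int(coverage * (len(counts) - 1))
    return counts[cutoff_index]
```

With this rule, the long right tail of pairs with many matches no longer dominates the normalizing denominator.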
Frequency Distribution of Entity Pairs by Number of Matched Inpatient Procedures (CMS Linkable Medicare Data - DE-SynPUF - Inpatient Samples 1-20)
Figure 1. Frequency Distribution of Patient Pairs by Number of Matched Inpatient Events
Frequency Distribution of Entity Pairs by Number of Matched Outpatient Procedures (CMS Linkable Medicare Data - DE-SynPUF - Outpatient Sample 1 - 2000 Randomly Selected Patients)
Figure 2. Frequency Distribution of Patient Pairs by Number of Matched Outpatient Events
The second issue involves summation of prevalence weights for matching events. For ICD-9 codes in the Medicare data, the distribution is a tight, relatively uniform distribution. For the large number of ICD-9 codes, the comparative frequency of a single code is rather small and does not appropriately capture the relative rarity of certain incident codes. For example, the total number of
unique 3-digit ICD-9 codes in our claim sample is 528, with the most prevalent code (428) only accounting for 6.87% of the claims. Likewise, there are 1,493 unique 4-digit codes and 1,153 unique 5-digit codes, with the most prevalent codes only accounting for 4.26% (code 4912) and 5.37% (code 49121) of all claims, respectively.
To lessen the impact of this tight, relatively uniform distribution, we normalize prevalence values using the difference between the minimum and maximum sum of prevalence weights across all patient pairs with a corresponding number of matching events. Figure 3 shows the range (minimum and maximum) of summed prevalence weights by the number of matching events.
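The min-max normalization just described can be sketched in Python; `prev_sum` is a pair's summed prevalence weight, and the minimum and maximum are taken across all pairs with the same number of matching events (the ranges tabulated in Figure 3).

```python
def normalize_prevalence_sum(prev_sum, min_sum, max_sum):
    """Rescale a pair's summed prevalence weight to [0, 1] using the minimum
    and maximum observed across all pairs with the same match count.
    This spreads out the otherwise tight, near-uniform distribution."""
    if max_sum == min_sum:
        return 1.0  # only one observed value for this match count
    return (prev_sum - min_sum) / (max_sum - min_sum)
```

For a single matching event (row 1 of Figure 3: minimum 0.9388, maximum 0.9999), a summed weight of 0.97 normalizes to roughly 0.51.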
Matching Event Prevalence Rate Computation - Prevalence Sum Detail by Number of Matched Events

Matching Events  Entity Pairs  Prev. Sum Minimum  Prev. Sum Maximum
1   11974844  0.9388   0.9999
2   4409220   1.6775   1.9996
3   1104488   2.8163   2.9961
4   420330    3.7551   3.9949
5   86550     4.6938   4.9860
6   57920     5.6326   5.9880
7   11636     6.5637   6.9499
8   8460      7.5101   7.9579
9   3636      8.4489   8.9454
10  1674      9.3876   9.9025
11  552       10.3622  10.8667
12  816       11.2652  11.8423
13  162       12.2634  12.8518
14  66        13.3241  13.8025
15  116       14.1252  14.8251
16  84        15.1017  15.8011
17  30        16.1612  16.7056
18  12        17.1639  17.7457
19  6         18.6180  18.6439
20  22        18.9540  19.7065
21  4         20.3771  20.6200
22  6         21.5717  21.5815
25  2         24.5563  24.5563

Figure 3. Prevalence Scales by Number of Matched Events for Inpatient MESs
Empirical Evaluation of the OTCS-MES
The empirical evaluation compares the OTCS-MES to two other proposed similarity measures for MESs, the original OTCS and Artemis. The comparison explores differences in nearest neighbors returned by each measure. Nearest neighbors are used in data mining algorithms for classification and clustering as well as clinical decision-making by health care providers. Before comparing the measures,
we present characteristics of the data used in the comparison and analyze characteristics of the OTCS-
MES.
Data Characteristics
Centers for Medicare and Medicaid Services (CMS) provides real claims data intended "for data entrepreneurs, for software and application development, and for research training purposes" (cms.gov 2015). Importantly, the use of real CMS claims data provides researchers the capability to extract meaning from the wealth of information contained in abstracted health care event data. Researchers, life-science organizations, government agencies, payers, and providers use this data to make more informed decisions based on the actual health care experiences of their constituents. For example, Demand-Driven Open Data (DDOD), a framework of "tools and methodologies to provide a systematic, ongoing and transparent mechanism" for the use of publicly available "open" data, leverages CMS SynPUF data. A sample DDOD application identifies "high-risk, high-cost individuals with the aim of providing them appropriate social services". DDOD recognizes that the CMS SynPUF data "is a realistic and professionally deidentified sample data set to give would-be data entrepreneurs something realistic to try out new applications".
CMS provides inpatient, outpatient, carrier, and prescription drug claims for randomly selected Medicare beneficiaries. Specifically, the files contain synthesized data taken "from a 5% random sample of Medicare beneficiaries in 2008 and their claims from 2008 to 2010" (cms.gov 2015). The files are synthesized in the sense that "a unique unidentifiable ID, DESYNPUF_ID...is provided on each file to link synthetic claims to a synthetic beneficiary" (cms.gov 2015).
Our empirical evaluation uses inpatient and outpatient claims from the CMS publicly available data sets. For the inpatient evaluation, we randomly extracted patients and their associated claims across all 20 claims samples prepared by CMS. The threshold for event length was set to five to help
with evaluation robustness. That is, patients are required to have at least five events to be eligible for
random selection. The resulting experimental data set contains 37,448 claims (inpatient admissions) for 7,000 unique beneficiaries or patients.
Clinical outpatient procedures use standardized codes to describe patient services such as tests, surgeries, evaluations, and other medical procedures. The outpatient data set contains Current Procedural Terminology (CPT[1]) codes, a standard code set maintained by the American Medical Association (AMA). Like ICD codes, CPT codes have a hierarchical organization: codes are organized into sections (6), then subsections (115), and then individual five-digit codes. The outpatient data set used in this study contains 2,600 unique, five-digit CPT codes.
Similar to ICD codes, CPT codes in our sample have a tight, relatively uniform distribution with the most frequent procedure code contained in 4.92% of the sample. Thus, we normalize prevalence values for CPT codes in a manner similar to ICD codes.
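Hierarchical partial matching on codes such as ICD-9 or CPT can be sketched as below. The function name and the partial-credit values (0.75 and 0.5) are illustrative assumptions of ours, not the weights actually used by OTCS-MES:

```python
def hierarchical_match(code_a: str, code_b: str) -> float:
    """Partial matching score for two hierarchical codes (e.g., 5-digit
    ICD-9 or CPT codes). Full credit for an exact match, partial credit
    for agreement at coarser levels of the code hierarchy.
    The 0.75/0.5 partial-credit values are illustrative assumptions."""
    if code_a == code_b:
        return 1.0          # exact match at the finest level
    if len(code_a) >= 4 and len(code_b) >= 4 and code_a[:4] == code_b[:4]:
        return 0.75         # agreement at the 4-digit level
    if code_a[:3] == code_b[:3]:
        return 0.5          # agreement at the 3-digit level
    return 0.0              # no hierarchical relationship

print(hierarchical_match("49121", "49121"))  # 1.0
print(hierarchical_match("49121", "49122"))  # 0.75
print(hierarchical_match("49121", "4919"))   # 0.5
```

Partial matching of this kind is what lets OTCS-MES credit related codes that an exact-match measure such as the original OTCS would count as complete mismatches.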
In the CMS data, outpatient MESs have important differences from inpatient MESs. First, outpatient MESs have longer sequence lengths due to the more routine nature of outpatient events. However, the extreme positive skew of the distribution of patient pairs by number of matched events (Figure 2) is similar to that of inpatient MESs. In our sample, 97% of patient pairs matched on 40 events or fewer, but the maximum number of matched events between patient pairs is 699. Second, Table 4 shows that the duration of outpatient events is much shorter than that of inpatient events
[1] The Centers for Medicare and Medicaid Services uses CPT codes as Level 1 of the Health Care Common Procedure Coding System.
with 90% of outpatient services occurring on the same day or within two days based on the begin and
end service dates appearing on the outpatient claim.
Table 4. Distribution of Outpatient Incidents by Duration
(CMS Linkable Medicare Data - DE-SynPUF - Outpatient Sample 1 - 2000 Randomly Selected Patients)

Outpatient Incident Duration (# of Days)   # of Outpatient Incidents   % of Total
0                                          685,135                      87.89%
1                                           17,525                       2.25%
2                                            6,300                       0.81%
3                                            3,831                       0.49%
4                                            3,239                       0.42%
5 or more                                   63,507                       8.15%
Total                                      779,537                     100.00%
For the outpatient evaluation, a smaller sample of randomly selected patients was used due to the much longer length of MESs. The resulting experimental data set contained 6,968 claims (OP admissions) for 966 patients.
Nearest Neighbor Overlap Measures
Overlap measures are widely used to evaluate the relative impact of different similarity measures on the resulting sets of nearest neighbors. The intent of an overlap measure is to determine whether the set of entities (nearest neighbors) chosen as most like a reference entity by one similarity measure differs from the set chosen by another similarity measure. Along those lines, we calculate nearest neighbor overlap to determine the equivalence of the most similar (nearest neighbor) patients resulting from each measure or measure adaptation. The computation of nearest neighbor overlap may use simple or rank weighted measures. The simple overlap measure is the ratio of the number of common neighbors (the cardinality of the set intersection) to the total number of possible matches (k). In Table 5, the simple overlap is 3 matching nearest neighbors out of a maximum of 5 possible matches, for an overlap value of 0.6 (3/5).
Table 5. Simple Overlap Measure Example

Nearest Neighbors for Patient X
Using Original OTCS    Using OTCS-MES
Patient L              Patient K
Patient K              Patient F
Patient C              Patient G
Patient F              Patient L
Patient Z              Patient M

Total common nearest neighbors is 3 out of a possible 5 (Patients K, F, and L). Therefore, the simple overlap measure value is 3 divided by 5, or 0.60.
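The simple overlap computation for the Table 5 example can be expressed directly; the function name is ours:

```python
def simple_overlap(neighbors_a, neighbors_b):
    """Ratio of common nearest neighbors to neighborhood size k."""
    assert len(neighbors_a) == len(neighbors_b)
    common = set(neighbors_a) & set(neighbors_b)
    return len(common) / len(neighbors_a)

# Nearest neighbors of Patient X from Table 5
otcs     = ["L", "K", "C", "F", "Z"]   # original OTCS
otcs_mes = ["K", "F", "G", "L", "M"]   # OTCS-MES

print(simple_overlap(otcs, otcs_mes))  # 0.6
```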
The weighted overlap measure incorporates the relative rank of each nearest neighbor in addition to the actual number of matching neighbors. Building on the computation provided in Table 5, we now factor in the rank of each nearest neighbor in Table 6. The weighted overlap measure compares the sum of the rank differences for each matching nearest neighbor to the maximum sum of rank differences for the relevant combination of k value and number of matches. Expanding upon the simple overlap example shown in Table 5, we provide an example of the weighted overlap measure computation in Table 6.
Table 6. Rank Weighted Overlap Measure Example

Rank   Using Original OTCS   Rank   Using OTCS-MES   Difference in Rank
2      Patient K             1      Patient K        1
4      Patient F             2      Patient F        2
3      Patient C             3      Patient G        -
1      Patient L             4      Patient L        3
5      Patient Z             5      Patient M        -

a. Sum of Rank Differences: 6
b. Maximum Sum of Rank Differences for k=5 Nearest Neighbors and 3 Matches: 10
c. Rank Difference Adjustment Factor (1 - a/b): 0.4

Total common nearest neighbors is 3 out of a possible 5, so the simple overlap measure value is 0.60. The weighted overlap measure value is the simple overlap value multiplied by the rank difference adjustment factor: 0.60 × 0.4 = 0.24.
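The rank weighted computation from Table 6 can be sketched as follows. The function name is ours, and the maximum sum of rank differences is supplied as a parameter (10 for k=5 with 3 matches, per Table 6) because the general formula is not stated in the text:

```python
def rank_weighted_overlap(neighbors_a, neighbors_b, max_rank_diff_sum):
    """Simple overlap scaled by a rank-difference adjustment factor.
    max_rank_diff_sum is the maximum possible sum of rank differences
    for the given neighborhood size and number of matches."""
    k = len(neighbors_a)
    rank_a = {p: r for r, p in enumerate(neighbors_a, start=1)}
    rank_b = {p: r for r, p in enumerate(neighbors_b, start=1)}
    common = set(neighbors_a) & set(neighbors_b)
    diff_sum = sum(abs(rank_a[p] - rank_b[p]) for p in common)
    simple = len(common) / k
    adjustment = 1 - diff_sum / max_rank_diff_sum
    return simple * adjustment

# Table 6 example: k=5, 3 matching neighbors, rank differences 1 + 2 + 3 = 6
otcs     = ["L", "K", "C", "F", "Z"]
otcs_mes = ["K", "F", "G", "L", "M"]
print(rank_weighted_overlap(otcs, otcs_mes, max_rank_diff_sum=10))  # ≈ 0.24
```

Larger rank disagreement among the common neighbors shrinks the adjustment factor, so two measures that agree on membership but not on ordering score lower than two measures that agree on both.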
Empirical Characteristics of the OTCS-MES
Before comparing the OTCS-MES to other similarity measures, we empirically study its characteristics on the inpatient and outpatient data sets. To understand the impact of OTCS-MES components, we consider both equal and unequal component weighting.
Figure 4 shows the frequency distribution of patient pairs based on the OTCS-MES with equal weighting for inpatient data. The distribution shows symmetry with a long, thin tail on the right. Nearest neighbors should be concentrated in the long, thin right tail. Figure 5 shows the frequency distribution of OP patient pairs based on the OTCS-MES measure with equal weighting. The distribution in Figure 5 shows less symmetry, but still a long thin tail at the right end. The long thin tails in frequency distributions for both data sets demonstrate that the OTCS-MES can help differentiate MESs with high similarity.
[Figure: frequency distribution of entity pairs by the OTCS-MES inpatient similarity measure; largest bin labeled 71,593. Event similarity component weight = 0.5; structural similarity component weight = 0.5. (CMS Linkable Medicare Data - DE-SynPUF - Inpatient Samples 1-20)]
Figure 4. Frequency Distribution of Patient Pairs by OTCS-MES (Inpatient Data)
[Figure: frequency distribution of entity pairs by the OTCS-MES outpatient similarity measure; event similarity component weight = 0.5, structural similarity component weight = 0.5. (CMS Linkable Medicare Data - DE-SynPUF - Outpatient Sample 1)]
Figure 5. Frequency Distribution of Patient Pairs by OTCS-MES (Outpatient Data)
For unequal component weighting, the analysis helps determine whether small weight changes have a disproportionate impact on nearest neighbors. Analysis of component weighting for the inpatient OTCS-MES measure indicates a reasonably consistent set of nearest neighbors. The graphs in Figures 6 and 7 are nearly linear and flat, indicating that small changes in weights have comparatively small impacts on overlap between the nearest neighbor populations generated by OTCS-MES. Figure 6 indicates that simple nearest neighbor overlap exceeds 65% when adjusting the weighting up or down by 0.2 or less. Additionally, Figure 7 indicates that rank weighted nearest neighbor overlap exceeds 45% when adjusting the weighting up or down by 0.2 or less. Furthermore, extreme weighting of either the event similarity or structural similarity component still shows at least a 25% nearest neighbor overlap. This overlap indicates that the OTCS-MES remains relatively insensitive to component weighting levels.
[Figure: simple percentage of overlapping entity pairs (0-100%) versus neighborhood size (5-50), unbalanced versus balanced weighting, for component weight ratios 0.1/0.9, 0.3/0.7, 0.5/0.5, 0.7/0.3, and 0.9/0.1.]
Figure 6. Simple Overlap using Unbalanced Weighting (Inpatient Data)
[Figure: rank weighted percentage of overlapping entity pairs (0-100%) versus neighborhood size (5-50), unbalanced versus balanced weighting, for component weight ratios 0.1/0.9, 0.3/0.7, 0.5/0.5, 0.7/0.3, and 0.9/0.1.]
Figure 7. Rank Weighted Overlap using Unbalanced Weighting (Inpatient Data)
Comparison to OTCS and Artemis
We evaluate nearest neighbor overlap between the OTCS-MES measure and two comparative similarity measures, the original OTCS and Artemis. The overlap analysis is a prerequisite to comparing performance in decision-making tasks such as classification, clustering, and search mechanisms used by
health care providers. If the nearest neighbors generated by OTCS-MES, OTCS, and Artemis do not differ,
the performance of these measures when applied to MESs cannot be different. To demonstrate the impact of component weights and neighborhood size, the comparison incorporates multiple component weights and neighborhood sizes. The overriding question is whether the OTCS-MES generates a different set of nearest neighbors for substantial samples of MESs than other proposed similarity measures (OTCS and Artemis).
We hypothesize that comparison of the measures will indicate less than 0.50 overlap for both the simple and rank weighted overlap measures. For Hypothesis 1, we consider balanced component weights (equal weights for the event matching and temporal matching components) and unbalanced component weights, both extreme weighting (all weight on one component) and mixed weighting (a 2:1 ratio of component weights).
Hypothesis 1. The overlap of nearest neighbors will be less than half between the OTCS-MES and original OTCS measures because OTCS-MES refines both measure components (event and structural).
Hypothesis 2. The overlap of nearest neighbors will be less than half between the OTCS-MES and Artemis measures and the overlap will be much smaller for Artemis compared to the OTCS measures as Artemis emphasizes alignment, not matching event counts.
Beyond demonstrating overlap differences among the three measures, the analysis isolates the impact of similarity measure components and MES length. For the similarity components, we hypothesize that extensions to the temporal component will have more impact than extensions to the event matching component. The OTCS-MES has more extensions in the temporal component (temporal structure of all events (OTCS-MES) versus matching events only (OTCS), and mean and variation of gaps and duration (OTCS-MES) versus sum of gap differences (OTCS)). For MES length, we hypothesize that the impact of partial matching and prevalence will increase as MES length increases. A larger number of
events in an MES will provide more opportunity for partial matching using event type prevalence, leading to decreased overlap between OTCS-MES and OTCS.
Hypothesis 3. Overlap on the temporal components will be smaller than overlap on the event matching components for OTCS-MES and OTCS.
Hypothesis 4. As MES length increases, overlap between OTCS-MES and OTCS decreases on the event matching component.
Comparison to OTCS using Balanced Weights
The comparison between OTCS-MES and OTCS begins with balanced weights. Without a strong preference for either event matching or temporal matching, equal weights for each component are the natural default choice. Figure 8 depicts results of balanced weighting for the simple and weighted overlap measures. For the IP data set, the simple overlap rate is between 10% and 20%, while the weighted overlap rate is between 8% and 14%. The graph shows a slight increase of overlap rate with the size of the neighborhood. For the OP data set, OTCS-MES and OTCS have a larger overlap, although still below 50% for all neighborhood sizes. Figure 8 shows that the simple overlap remains below 30% at neighborhood sizes below 25. However, the weighted overlap does not cross 30% even for the largest neighborhood size (50) (Figure 8).
[Figure: simple and rank weighted overlap (0-50%) versus neighborhood size (5-50) for balanced component weighting, OTCS-MES vs OTCS; series: IP simple overlap, OP simple overlap, IP rank weighted overlap, OP rank weighted overlap.]
Figure 8. Simple and Rank Weighted Overlap of OTCS-MES Similarity Measure vs Original OTCS
For more rigorous evaluations, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 7. In each test, we set the neighborhood size to 25 and used equal component weights. All overlap test results are consistent with the performance graph (Figure 8). For both inpatient and outpatient data sets, the hypothesis test results show strong evidence to reject the null hypothesis and confirm Hypothesis 1 that the overlap is less than 0.5.
Table 7. Summary of t Test Results for OTCS-MES/OTCS with Balanced Weights
Data Set Overlap Measure N Mean P value
Inpatient Simple 6997 0.155 0.0001
Inpatient Weighted 6997 0.101 0.0001
Outpatient Simple 948 0.415 0.0001
Outpatient Weighted 948 0.266 0.0001
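The one-tailed test against the 0.5 threshold can be sketched without a statistics package. The synthetic overlap values below are stand-ins for the per-patient overlaps actually tested, and the normal-approximation critical value is our simplification of the exact t distribution for large samples:

```python
import math
import random
import statistics

random.seed(1)
# Synthetic per-patient overlap values standing in for the observed data;
# a beta(2, 10) draw has mean ~0.17, similar in spirit to the inpatient means.
overlaps = [random.betavariate(2, 10) for _ in range(500)]

n = len(overlaps)
mean = statistics.fmean(overlaps)
sd = statistics.stdev(overlaps)

# One-sample t statistic for H0: mean overlap >= 0.5 vs H1: mean overlap < 0.5
t = (mean - 0.5) / (sd / math.sqrt(n))

# For large n, reject H0 at alpha = 0.05 when t < -1.645 (normal approximation)
reject_h0 = t < -1.645
print(f"mean overlap = {mean:.3f}, t = {t:.1f}, reject H0: {reject_h0}")
```

With means far below 0.5 and large samples, the t statistic is strongly negative, which is the pattern the p-values in Table 7 reflect.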
Comparison to OTCS using Unbalanced Weights
For unbalanced weighting of the event and temporal (structural) components of OTCS-MES, the impact on nearest neighbor overlap may be determined through the results of various component weighting scenarios. To isolate component differences, we use extreme weighting with all weights on
just one component. We relax extreme weights to show overlap for weights favoring one component
(0.75) over the other component (0.25), a 2:1 weighting ratio.
Figures 9 and 10 show results of extreme weighting for both inpatient (IP) and outpatient (OP) data sets. For both data sets, the results show smaller overlap for extreme weighting on the temporal component. In Figure 9, the simple overlap in the inpatient graph remains under 10% except for large neighborhood sizes greater than 40. The simple overlap in the outpatient graph is below 25% for all neighborhood sizes. In Figure 10, the weighted overlap for the temporal component graphs is smaller than the corresponding simple overlap for the temporal component graphs in Figure 9.
The graphs for event matching display different overlap (simple and weighted) than the temporal component graphs. For the inpatient graphs, the simple overlap increases from just under 30% for small neighborhoods to 40% for large neighborhoods. Weighted overlap increases from just under 20% to just under 30% in Figure 10. For outpatient graphs, simple overlap increases from about 25% for a very small neighborhood size (5) to 60% for a very large neighborhood size (50). Simple overlap is a little below 50% for a moderate neighborhood size of 25. Weighted overlap increases from less than 20% for a very small neighborhood size (5) to 40% for a very large neighborhood size (50). Weighted overlap is a little below 40% for a moderate neighborhood size of 25.
Compared to extreme weighting, the mixed weighting graphs in Figures 11 and 12 display similar overlap for inpatient data but smaller overlap for outpatient data. For a moderate neighborhood size of 25, the event matching emphasis graph shows simple overlap about 40% (Figure 11) while the extreme weighting graph shows simple overlap about 50% (Figure 9). For a moderate neighborhood size of 25, the event matching emphasis graph shows weighted overlap under 30% (Figure 12) while the extreme weighting graph shows weighted overlap above 30% (Figure 10). For the outpatient data set,
the event matching emphasis graph is under the temporal matching emphasis graph, a switch from extreme weighting graphs.
[Figure: simple overlap (0-100%) versus neighborhood size (5-50) for extreme component weighting, OTCS-MES vs OTCS; series: IP event matching only, OP event matching only, IP temporal matching only, OP temporal matching only.]
Figure 9. Simple Overlap Results with Extreme Component Weighting
[Figure: rank weighted overlap (0-100%) versus neighborhood size (5-50) for extreme component weighting, OTCS-MES vs OTCS; series: IP event matching only, OP event matching only, IP temporal matching only, OP temporal matching only.]
Figure 10. Rank Weighted Overlap Results with Extreme Component Weighting
[Figure: simple overlap (0-100%) versus neighborhood size (5-50) for mixed component weighting, OTCS-MES vs OTCS; series: IP event matching emphasis, OP event matching emphasis, IP temporal matching emphasis, OP temporal matching emphasis.]
Figure 11. Simple Overlap Results with Mixed Component Weighting
[Figure: rank weighted overlap (0-100%) versus neighborhood size (5-50) for mixed component weighting, OTCS-MES vs OTCS; series: IP event matching emphasis, OP event matching emphasis, IP temporal matching emphasis, OP temporal matching emphasis.]
Figure 12. Rank Weighted Overlap Results with Mixed Component Weighting
For more rigorous evaluations, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 8. In each test, we set the neighborhood size to 25 and used unbalanced component weights, either extreme or mixed. Overlap test results are consistent with the performance graphs (Figures 9 to 12). All test results with rank weighted overlap are significant, demonstrating strong evidence to reject the null hypothesis and confirm Hypothesis 1. Three test results with simple overlap for outpatient data were not significant, but the confidence intervals in Table 9 are close to the 0.5 overlap threshold.
Table 8. Summary of t Test Results for OTCS-MES/OTCS with Unbalanced Weights
Data Set Overlap Measure Weighting Component Emphasis Mean P value
Inpatient Simple Extreme Event 35.24% 0.0016
Temporal 7.46% 0.0000
Mixed Event 28.14% 0.0000
Temporal 9.79% 0.0000
Rank Weighted Extreme Event 22.84% 0.0000
Temporal 4.73% 0.0000
Mixed Event 18.41% 0.0000
Temporal 6.26% 0.0000
Outpatient Simple Extreme Event 46.33% 0.3753
Temporal 14.42% 0.0003
Mixed Event 41.03% 0.2100
Temporal 43.68% 0.3194
Rank Weighted Extreme Event 30.82% 0.0192
Temporal 9.02% 0.0000
Mixed Event 26.61% 0.0046
Temporal 27.97% 0.0173
Table 9. Confidence Intervals (95%) for Non-Significant t Test Results in Table 8
Data Set Overlap Measure Weighting Component Emphasis Mean P value Lower Cl Upper Cl
Outpatient Simple Extreme Event 46.33% 0.3753 37.20% 55.46%
Temporal 14.42% 0.0003
Mixed Event 41.03% 0.2100 30.69% 51.36%
Temporal 43.68% 0.3194 33.09% 54.28%
Results for Hypothesis 4 are consistent with the graphs comparing overlap to event sequence length (Figures 13 and 14). Inpatient and outpatient data differ sharply on sequence length. In the MES samples used in this study, an outpatient MES had an average of 11.04 events, and over half of the outpatient event sequences have more than 10 events. In contrast, an inpatient MES had an average of 5.35 events, with 64% of inpatient event sequences having five or fewer events. For inpatient data, Figure 13 shows overlap decreases over most increases in sequence length, confirming Hypothesis 4. Inpatient data has a high level of missing data (40%) for 5-digit ICD-9 codes, so the hierarchical matching used by OTCS-MES loses some advantage. For outpatient data, Figure 14 shows that overlap decreases as sequence length increases for most of the sequence length range, also confirming Hypothesis 4.
[Figure: nearest neighbor overlap (0-100%) by inpatient MES length (5, 6, 7, 8, 9+) with extreme event weighting and k=10; series: inpatient simple overlap, inpatient rank weighted overlap.]
Figure 13. Nearest Neighbor Overlap by Sequence Length (Inpatient Data)
[Figure: nearest neighbor overlap (0-100%) by outpatient MES length (01-05, 06-10, 11-15, 16-20, 21+) with extreme event weighting and k=10; series: outpatient simple overlap, outpatient rank weighted overlap.]
Figure 14. Nearest Neighbor Overlap by Sequence Length (Outpatient Data)
For another perspective on the impact of partial matching of hierarchical codes, Table 10 shows total event matching for OTCS-MES (with partial matching) and OTCS (without partial matching). The impact for outpatient data is striking, with OTCS-MES finding more than 250% more event matches than OTCS.
Table 10. Total Event Matches for OTCS-MES and OTCS
Total Event Matches
Inpatient Outpatient
OTCS-MES 13,455,390 6,504,003
OTCS 11,560,821 1,821,349
Difference 1,894,569 4,682,654
% Difference 16.39% 257.10%
Comparison to Artemis
Since Artemis only has a temporal component, the analysis in this section shows graphs with varied weights for OTCS-MES but no weights for Artemis. Figures 15 and 16 show very small overlap (both simple and weighted) of nearest neighbors between OTCS-MES and Artemis. The overlap between OTCS-MES and Artemis is much smaller than the overlap between OTCS-MES and OTCS for inpatient data.
[Figure: simple percentage of overlapping entity pairs versus Artemis by neighborhood size, for OTCS-MES component weightings 0.00/1.00, 0.25/0.75, 0.50/0.50, 0.75/0.25, and 1.00/0.00.]
Figure 15. Simple Overlap of OTCS-MES versus Artemis (Inpatient Data)
[Figure: rank weighted percentage of overlapping entity pairs versus Artemis by neighborhood size, for OTCS-MES component weightings 0.00/1.00, 0.25/0.75, 0.50/0.50, 0.75/0.25, and 1.00/0.00.]
Figure 16. Rank Weighted Overlap of OTCS-MES versus Artemis (Inpatient Data)
We also compare the OTCS-MES to Artemis using the outpatient data. Figures 17 and 18 show small overlap (both simple and weighted) of nearest neighbors between OTCS-MES and Artemis. Again, we observe that the overlap between OTCS-MES and Artemis is substantially smaller than the overlap between OTCS-MES and OTCS. Event alignment, the focus of the Artemis measure, yields substantially different nearest neighbors than event matching through the OTCS-MES measure for outpatient data.
[Figure: simple percentage of overlapping entity pairs versus Artemis for outpatient MESs by neighborhood size, for OTCS-MES component weightings 0.00/1.00, 0.25/0.75, 0.50/0.50, 0.75/0.25, and 1.00/0.00.]
Figure 17. Simple Overlap of OTCS-MES versus Artemis (Outpatient Data)
[Figure: rank weighted percentage of overlapping entity pairs versus Artemis for outpatient MESs by neighborhood size, for OTCS-MES component weightings 0.00/1.00, 0.25/0.75, 0.50/0.50, 0.75/0.25, and 1.00/0.00.]
Figure 18. Rank Weighted Overlap of OTCS-MES versus Artemis (Outpatient Data)
The overlap graphs for both data sets demonstrate little impact of component weights for OTCS-MES. The graphs for each combination of component weights are tight with little space between them. Thus, OTCS-MES and Artemis have little overlap regardless of weights used for OTCS components.
For more rigorous evaluations, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 11. In each test, we set the neighborhood size to 25 and used equal component weights for OTCS-MES. The overlap test results are consistent with the performance graphs (Figures 15 to 18) for both data sets and overlap measures. All hypothesis test results show strong evidence to reject the null hypothesis and confirm Hypothesis 1 that the overlap is less than 50%.
Table 11. Summary of t Test Results for OTCS-MES/Artemis with Balanced Weights
Data Set Overlap Measure N Mean P value
Inpatient Simple 6999 0.00804 0.0001
Inpatient Weighted 6999 0.00513 0.0001
Outpatient Simple 965 0.12220 0.0001
Outpatient Weighted 965 0.07420 0.0001
Discussion
Our results demonstrate that a similarity measure adapted specifically to unique properties of medical events behaves differently than measures ignoring such features. The OTCS-MES uses hierarchical event matching with prevalence scores, mean and variation in measuring temporal structure, normalization of event similarity, and standard [0,1] weights to combine similarity components. Empirical analysis provided evidence about the consistency of weights and normalization for both inpatient and outpatient MES data.
A detailed empirical comparison demonstrated substantial differences in nearest neighbor overlap between OTCS-MES, OTCS, and Artemis. The comparison involved two data sets (inpatient with typically short sequence length and outpatient with longer sequence length) with different event coding (ICD-9 for inpatient and CPT for outpatient), two overlap measures (simple and weighted accounting for relative positions of neighbors), and impacts of three important factors (neighborhood size, matching component weights, and sequence lengths).
Table 12 summarizes the empirical testing results, confirming most hypotheses about small overlap among the three similarity measures. For Hypothesis 1, results demonstrate substantially small overlap between OTCS-MES and OTCS except for a few cases of unbalanced weights where overlap approached the 0.5 threshold. For Hypothesis 2, results demonstrate that event alignment in Artemis produces very different nearest neighbors than any combination of event and temporal matching as used in the OTCS-MES measure. For Hypothesis 3, results indicate less overlap on the temporal similarity components of OTCS-MES and OTCS than on the event matching components, except for mixed weighting. For Hypothesis
4, results show overlap decreasing as sequence length increases for both inpatient and outpatient data.
The high level of missing data at the most detailed coding level appears to dampen the impact of sequence length in the inpatient data set.
Table 12. Summary of Hypothesis Evaluation

H1a: OTCS-MES vs. OTCS (simple overlap)
  Inpatient: evidence of substantial differences using balanced and unbalanced component weights
  Outpatient: evidence of substantial differences for balanced weights and near-threshold differences for unbalanced weights
H1b: OTCS-MES vs. OTCS (weighted overlap)
  Inpatient: evidence of substantial differences using balanced and unbalanced component weights
  Outpatient: evidence of substantial differences using balanced and unbalanced component weights
H2a: OTCS-MES vs. Artemis (simple overlap)
  Inpatient: evidence of substantial difference
  Outpatient: evidence of substantial difference
H2b: OTCS-MES vs. Artemis (weighted overlap)
  Inpatient: evidence of substantial difference
  Outpatient: evidence of substantial difference
H3a: OTCS-MES vs. OTCS overlap for event matching greater than temporal matching (extreme weighting)
  Inpatient: evidence of substantial difference
  Outpatient: evidence of substantial difference
H3b: OTCS-MES vs. OTCS overlap for event matching greater than temporal matching (mixed weighting)
  Inpatient: evidence of substantial difference
  Outpatient: no evidence of substantial difference
H4: OTCS-MES vs. OTCS overlap decreases as MES length increases
  Inpatient: evidence of decrease
  Outpatient: evidence of decrease
The empirical results in this study provide a foundation to study performance differences in
reasoning tasks using a similarity measure. The empirical results in this study focus on overlap of nearest neighbors, a measure independent of application and algorithm. The results depict substantial overlap differences to help explain performance differences in subsequent studies. We intend additional research to study performance differences among similarity measures in classification, clustering, and search tasks for various medical application areas. Because the OTCS-MES captures important features of MESs, we feel confident that OTCS-MES will have substantially better performance for a variety of tasks and applications than other proposed similarity measures. The evidence in this study (small overlap between OTCS-MES and OTCS on the event-matching component and the large amount of missed event matches by OTCS) indicates that OTCS-MES has a substantially better design than OTCS.
Exploring the potential applications for the OTCS-MES similarity measure yields several important research opportunities. Among these are (1) patient classification perhaps using disease or
risk groups, (2) evaluation of patient adherence to a clinical pathway or care management plan, and (3)
discovery of similar patients for medical social networking. First, determination of OTCS-MES precision in classification requires verification of the homogeneity of the risk level and condition profile for patients deemed similar based on OTCS-MES. Such precision evaluation necessitates a test group of ground truth patients and their respective risk and condition labels. Appropriate algorithms for patient risk and condition labeling that may be implemented for this application include CDPS risk scoring and the numerous co-morbidity indices (Charlson, Elixhauser, etc.) for disease determination. A second OTCS-MES application involves determination of patient adherence to an accepted clinical pathway specific to their diagnosed condition. A clinical pathway is a sequence of prescribed procedures, medications, tests, or other medical events related to a disease or physical condition. As such, OTCS-MES is especially suitable to evaluate a patient's adherence to their prescribed medical event sequence or clinical pathway. Finally, medical social networks allow patients having similar medical histories to discuss treatment successes and failures, exchange experiences, and receive emotional support. OTCS-MES could prove valuable in augmenting the methodologies used by medical social networks when retrieving similar patients for homogeneous online patient communities.
Conclusion
We developed a similarity measure for medical event sequences (MESs) and empirically compared it to two other similarity measures designed for MESs using publicly available U.S. Medicare claims data. We designed the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES) based on unique aspects of MESs including dense, hierarchical coding schemes, duplication of events, and event prevalence. The OTCS-MES contains components for event similarity using hierarchical event matching with prevalence scores, mean and variation of event duration and gaps, standard [0,1] weights, and normalization based on empirical characteristics of MESs. We empirically evaluated the OTCS-MES measure against two other measures specifically designed for MESs, the
original OTCS and Artemis, a measure incorporating event alignment. Our evaluation used two substantial data sets of Medicare claims data containing inpatient and outpatient sequences. Using two overlap measures, we found a small overlap in nearest neighbors among the three similarity measures demonstrating the importance of unique aspects of MESs. The evaluation also provided evidence about internal consistency of weight choices for the OTCS-MES.
We plan additional research to assess the performance of similarity measures for MESs to augment the focus in this research on overlap differences. The analysis here should be extended to clustering performance using independent cluster quality measures, robustness to poor data quality, and classification performance. Further analysis should also incorporate additional types of medical events.
Besides addressing these areas of additional analysis, we plan future research about medical reasoning using the OTCS-MES. To support reasoning by health care professionals using MESs for risk analysis, clinical pathways, and co-morbidity, we propose to develop a matching operator and visualization tools for MESs to augment the OTCS-MES. The matching operator will support temporal constraints about changes in medical events in MESs such as increased severity and prolonged symptoms. Visualization tools for MESs will help health care professionals see important patterns in MESs. Human factors studies will be necessary to evaluate the utility of a query architecture combining a similarity measure, matching operator, and visualization tools.
CHAPTER III
SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PREDICTING MORTALITY IN TRAUMA
PATIENTS
Abstract
In this study, we extend a similarity measure for medical event sequences (MESs) and evaluate its classification performance for retrospective mortality prediction of trauma patient outcomes. Retrospective mortality prediction is a benchmarking task used by trauma care governance bodies to assist with policy decisions. We extend a similarity measure, the Optimal Temporal Common Subsequence for MESs (OTCS-MES), by generalizing the event-matching component with a plug-in weighting element. The extended OTCS-MES uses an event prevalence weight developed in our previous study and an event severity weight developed for this study. Importantly, our method requires no exogenous data as all predictive information is contained in the trauma incident registry. In the empirical evaluation of classification performance, we provide a more complete evaluation than previous studies. We compare the predictive performance of the Trauma Mortality Prediction Model (TMPM), an accepted regression approach for mortality prediction in trauma data, to nearest neighbor algorithms using similarity measures for MESs. Using a data set from the National Trauma Data Bank, our results indicate improved predictive performance for an ensemble of nearest neighbor classifiers over TMPM. Our analysis reveals a superior Receiver Operating Characteristic (ROC) curve, larger AUC, and improved operating points on a ROC curve. We also study methods to adjust for uncommon class prediction, weighted voting, neighborhood size, and case base size. Results provide strong evidence that similarity measures for medical event sequences are a powerful and easily adapted method for assisting with health care policy advances.
Introduction
Trauma Care Evaluation
This study involves mortality prediction for trauma centers, an important classification task having established methods. There is an essential focus on trauma care because trauma injuries are the leading cause of death in people younger than 44 and the fifth leading cause of death for all age groups (Glance et al. 2009). Additionally, treatment methods for trauma related injuries are extremely costly, often leading to more expensive forms of care. The importance and cost of trauma care mandate benchmarks for trauma center performance relative to the injury severity of incoming patients. Cassidy et al. (2014) maintain that "accurate injury severity scoring systems are essential for benchmarking outcomes and objectively evaluating and improving trauma care."
With injury being the leading cause of lost years of life and escalating trauma care costs, improved trauma patient outcomes and streamlined care delivery are important objectives of researchers and policy makers. As such, the Trauma Care Systems Planning and Development Act was passed to improve trauma care and establish a Division of Trauma in the Department of Health and Human Services. Resulting regional trauma systems are designed to reduce mortality from injury. Furthermore, governance bodies, including the World Health Organization and the American College of Surgeons, provide consensus-based policy recommendations on the structure of trauma systems. This attention to trauma care has shown positive results with an estimated 15% reduction in the odds of mortality and decreases in both disability outcomes and costs (Moore et al. 2018). Informed policy decisions by trauma governance bodies remain of utmost importance.
The method advanced in this research improves trauma policy decisions to help mitigate the "major knowledge gap on which components of a trauma system contribute to their effectiveness" (Moore et al. 2018). We propose our method to more accurately evaluate trauma center performance
to facilitate better policy decisions. In confirmation, trauma care administrators state that "the next logical step in the process of trauma system evaluation is to establish measures that consistently capture true outcome performance" and "evaluation of trauma system effectiveness will require ongoing outcome analysis in what must remain an uncompromising commitment to optimal outcome for the injured patient" (Celso et al. 2006).
Retrospective Mortality Prediction
Evaluating trauma care based on benchmarks for mortality rates, dependent upon injury severity, involves retrospective mortality prediction. Retrospective mortality prediction methods for trauma care have "important clinical and economic implications because these tools are used to evaluate patient outcomes and quality of care" (Weeks et al. 2016). Our research is not advancing an "in facility" clinical decision tool, per se, but suggests a method to improve strategic decision making through more accurate assessments of trauma care. Essentially, retrospective mortality prediction enables governance bodies to measure trauma care delivery based on "benchmark" mortality rates for comparable patients or injury mix. Trauma centers demonstrating superior patient outcomes inform policy and resource allocation decisions concerning the various components of trauma care systems (transportation, triage, facility design, benchmarking, etc.). Retrospective evaluation of trauma care is a commonly researched area. In fact, a recent Google Scholar search using the terms trauma + retrospective + "mortality prediction", limited to publications since 2017, returned 674 articles.
Typically for trauma care, mortality prediction models use historical incidents to correlate patient attributes and injury severity to known trauma discharge dispositions (deceased or nondeceased). Retrospective mortality prediction provides injury outcome "benchmarks" to help improve the level of trauma care through more informed policy decisions. As explained previously, governance bodies retrospectively assess trauma center performance to determine a facility's comparative level of
care, and the components of trauma care systems most impacting improved patient outcomes. Trauma
centers having superior performance, evidenced by comparatively low mortality rates, are surveyed to determine which components of trauma care systems are predominant. For example, studies (Celso et al. 2006) using retrospective mortality prediction have compared trauma care between (a) in-hospital facilities and external trauma centers, (b) level I and level II trauma centers, and (c) trauma centers in high and low-middle income countries. An example of a resulting policy change, from such studies, is continuation of a 2-tiered designation system for trauma care (Glance et al. 2012). In addition to evaluating trauma center performance, retrospective mortality prediction enables study of mortality rates across patient cohorts. For example, Hashmi et al. (2014) projected mortality rates from reference studies for an age group comparison of outcomes for trauma patients. In summary, retrospective mortality prediction establishes guidelines for appropriate outcomes from trauma care to help inform trauma care system policies. Accordingly, the research advanced in this study intends to improve established methods for trauma mortality prediction.
Mortality Prediction Using Similarity Measures
Because of the importance of mortality prediction for trauma centers, researchers have developed several prominent prediction methods. The most widely accepted method, the Trauma Mortality Prediction Model (TMPM), involves detailed regression modeling of individual injury codes using a large training sample. TMPM (Glance et al. 2009) uses derived coefficients for more than one thousand injury codes to make mortality predictions.
In this portion of our research, we study an alternative approach to mortality prediction based on similarity of medical events in a patient's trauma incident. Our approach requires no exogenous data, such as data contained in linked EHRs, but relies solely on data elements endogenous to the trauma incident registry. Furthermore, predicting mortality based on incident similarity provides better explanation than
regression prediction, because the retrieved similar cases themselves explain a prediction. Prediction based on similarity using nearest neighbor classification does not require training, although it does require indexing of trauma incidents for efficient computation of nearest neighbors. Training is needed only when reducing the number of cases in a reference set.
Research Methodology
We compare predictive performance of nearest neighbor classification using a similarity measure to TMPM, a prominent approach for mortality prediction in trauma data. In nearest neighbor classification, we use OTCS-MES with two weighting approaches (event prevalence and event severity), the original OTCS with only exact matching of event codes, and an ensemble using these three classifiers. We compare performance with three important measures: receiver operating characteristic (ROC) curves, area under a ROC curve (AUC), and operating points derived from a ROC curve. Results are based on a substantial data set from the National Trauma Data Bank. Our results indicate superior performance for an ensemble of nearest neighbor classifiers over TMPM on ROC curve analysis and AUC. For optimal operating points, the ensemble provides better performance than TMPM especially as the importance of sensitivity increases. We also study the impact of oversampled training data versus lifted voting for the uncommon mortality class, weighted voting, neighborhood size, and case base size.
Contributions
This study makes three important contributions. Most importantly, this study developed a new classification method with better performance than the accepted standard, TMPM. The ensemble of nearest neighbor classifiers obtained better performance than TMPM on ROC curves, AUC, and optimal operating points on a ROC curve. No studies have used nearest neighbor classification for mortality prediction with an uncommon mortality class. As an important secondary contribution, generalization of the "event matching" component of OTCS-MES makes event matching applicable to a wider variety of
medical domains. As another secondary contribution, the detailed performance comparison provides a
more complete analysis than previous studies. For example, prior studies neglected to compare performance on operating points on a ROC curve.
The description of this study continues as follows. The second section reviews prior work on mortality prediction for trauma centers and similarity measures for MESs. The third section presents the design of the experiment comparing nearest neighbor prediction with OTCS-MES to TMPM. The fourth section presents results of the experiment and discusses implications. The fifth section summarizes the study and identifies future extensions.
Related Work
To provide a context for the experiment design and results in the next sections, we review previous work on mortality prediction for trauma centers and similarity measures for MESs. We review early methods for mortality prediction (Injury Severity Score and Abbreviated Injury Scale) as well as two contemporary methods (the Bayesian Trauma Prediction Model and the Trauma Mortality Prediction Model). Our research uses the Trauma Mortality Prediction Model and classifiers based on OTCS-MES measures.
Injury Severity Score (ISS) and Abbreviated Injury Scale (AIS)
Because of the demand for accurate mortality prediction for trauma centers, several severity scoring systems have been developed. The Injury Severity Score (ISS), an early method for severity scoring, uses the Abbreviated Injury Scale (AIS) to score injuries and predict trauma outcomes (Cassidy et al. 2014). AIS is an anatomical-based coding system created by the Association for the Advancement of Automotive Medicine to quantify injury severity. The International Classification of Diseases, Clinical Modification, version 9 (ICD-9-CM) is a more recent coding system with injury classifications. Because
the National Trauma Data Standard now mandates ICD-9-CM, ISS provides an option to use ICD-9-CM or
AIS codes (Glance 2009).
Alternative severity scoring models to the ISS have been proposed. The International Classification of Diseases Injury Severity Score (ICISS) uses empirically derived survival risk ratios (SRR) for ICD-9-CM codes. ICISS calculates the proportion of survivors among patients having an ICD-9-CM injury code (Glance 2009). Another alternative approach, the Single-Worst Injury (SWI) Model, focuses on injury assessment using a patient's single most severe (worst) injury. The single worst injury is commonly determined by the AIS score (Tohira 2012). In preliminary injury scoring, the single worst injury was often used to predict outcomes. However, subsequent scoring systems leverage multiple injuries to contribute to outcome prediction (Kilgo et al. 2003).
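The survival-risk-ratio idea lends itself to a short sketch. The following minimal Python example uses hypothetical incident data and function names; ICISS is conventionally computed as the product of the SRRs of a patient's injuries:

```python
from collections import defaultdict

def survival_risk_ratios(incidents):
    """SRR for each ICD-9-CM code: the proportion of patients
    carrying that code who survived."""
    total = defaultdict(int)
    survived = defaultdict(int)
    for codes, alive in incidents:
        for code in set(codes):        # count each patient once per code
            total[code] += 1
            survived[code] += alive
    return {c: survived[c] / total[c] for c in total}

def iciss(codes, srr):
    """ICISS: product of the SRRs of the patient's injuries."""
    score = 1.0
    for code in codes:
        score *= srr[code]
    return score

# Hypothetical incidents: (list of injury codes, survived flag).
incidents = [
    (["805.2", "861.21"], 0),
    (["805.2"], 1),
    (["861.21", "807.03"], 1),
    (["807.03"], 1),
]
srr = survival_risk_ratios(incidents)
print(round(iciss(["805.2", "861.21"], srr), 3))
```

With these toy incidents, each of the two codes has an SRR of 0.5, so their product drives the ICISS score down, reflecting a low predicted survival.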
Regression Models for Mortality Prediction
Burd et al. (2008) developed the Bayesian Logistic Injury Severity Score (BLISS) to leverage ICD-9-CM trauma coding with 2,210 possible injury codes and 243,037 two-way interactions among injury codes. Like ICISS, BLISS relies solely on ICD-9-CM codes without the need for physiological or supplementary data often input to other methods. In contrast to ICISS, BLISS uses injury interactions, not just individual injury codes. Burd et al. (2008) found slight improvements in prediction performance with BLISS compared to ICISS but much better model calibration with the Hosmer-Lemeshow statistic. The prediction performance advantage of BLISS was most apparent among patients at lower risk for mortality.
The more recent Trauma Mortality Prediction Model (TMPM), a probit regression model, supports alternative injury codes (AIS, ICD-9-CM, or ICD-10). TMPM uses approximately 1,000 different types of injuries characterized by these coding sets. TMPM comprises two separate probit models.
Model 1 uses all possible injuries as binary predictors with death as the binary outcome. Model 2 uses
indicators of body region severity. A weighted average of the coefficients of the two regression models
provides the empirical severity for each injury.
Empirical analysis showed that TMPM ICD-9 provided superior performance to other ICD-9-CM based models. However, previous studies omitted analysis of operating points. The superior predictive performance of TMPM-ICD9 was most noted as the number of injuries increased (Cassidy et al. 2014).
In this study, we use TMPM ICD-9 because of its performance and availability. TMPM has been compared to other mortality prediction models, with the exception of BLISS. The R implementation of TMPM-ICD9 facilitated usage in our experiments (https://cran.r-project.org/web/packages/tmpm/tmpm.pdf).
Similarity Measures for MESs
In previous research, we developed a similarity measure for MESs known as the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES). Its development was motivated by limitations in other proposed methods, particularly the original OTCS (Zheng et al. 2010). In a detailed empirical evaluation (XXXX), we compared the OTCS-MES to the original OTCS and Artemis, a measure incorporating event alignment. This comparison used inpatient MESs with ICD-9-CM codes and outpatient MESs with CPT procedure codes. Overall, we found a small overlap in nearest neighbors among the three similarity measures, demonstrating the superior design of the OTCS-MES with its emphasis on unique aspects of MESs. With extreme weighting on just the event matching components of OTCS and OTCS-MES, simple overlap rates for shared nearest neighbors ranged from 25% for small neighborhood sizes (5) to 40% for large neighborhood sizes (50) in inpatient data and 60% for large neighborhood sizes in outpatient data.
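The simple overlap rates cited above can be computed as the fraction of shared nearest neighbors between two measures, as in this minimal sketch (the neighbor lists are hypothetical):

```python
def overlap_rate(nn_a, nn_b):
    """Simple overlap: shared nearest neighbors / neighborhood size.

    nn_a, nn_b: lists of case identifiers for the same target case's
    nearest neighbors under two different similarity measures.
    """
    assert len(nn_a) == len(nn_b)
    return len(set(nn_a) & set(nn_b)) / len(nn_a)

# Hypothetical 5-nearest-neighbor lists for one target sequence
# under two different similarity measures: 3 of 5 neighbors shared.
print(overlap_rate([7, 3, 9, 12, 4], [3, 9, 21, 4, 18]))  # 0.6
```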
The evaluation in our previous study (XXXX) did not investigate the classification performance of
OTCS-MES. Although the OTCS-MES contains components for event similarity and temporal structure similarity, this study only uses the event matching component because trauma incidents are reported without timing of ICD codes. To use the OTCS-MES for classification, we generalize the OTCS-MES to provide a level of domain customization.
Research Methodology
Initially, our research methodology concerns the use of MES similarity measures to find like trauma incidents based on injury (event) sequence matching. We first expound on similarity measures and their application within nearest neighbor classification to predict trauma outcomes based on known outcomes for nearest neighbor trauma incidents. We then turn to our secondary research, aimed at designing the most effective approach to nearest neighbor classification for prediction of uncommon mortality in trauma incidents. Specifically, we evaluate the following constraints on our classification method leveraging similarity measures specific to trauma incidents:
• Voting method for nearest neighbor trauma incidents - traditional majority voting or soft voting with proportional weights.
• Size for nearest neighbor cohort - 1 through 49 (odd only).
• Adjustment method for imbalanced data - majority voting, oversampling, or certainty-factor voting.
• Case base size - 5,000, 10,000 or 50,000 training incidents and 2,000 test incidents.
Given the best kNN classification method from our secondary research, we move on to our primary research evaluating trauma mortality prediction using our alternative method. We start by presenting our primary research hypotheses for similarity measure performance. Performance is evaluated against both the industry gold standard, TMPM, and amongst the various
similarity measure approaches (original OTCS exact matching, and OTCS-MES prevalence and severity
weighted partial matching). The research methodology section continues by detailing the trauma data used for our empirical evaluation and associated data filters required for an equitable empirical evaluation. Finally, we describe the performance measures chosen to evaluate our alternative predictive method.
Similarity Measures for Medical Event Sequences
This study uses three similarity measures, the original OTCS, the OTCS-MES with event prevalence weights (OTCS-MES EP), and the OTCS-MES with event severity weights (OTCS-MES ES). Although all three measures contain components for event matching and temporal structure of events, this study only uses the event matching component because trauma records do not have a temporal structure.
Original OTCS
The original Optimal Temporal Common Subsequence (OTCS), developed by Zheng et al. (2010), uses exact matching for events. Given a state sequence defined as S_n = [s_1, ..., s_n], the OTCS compares two state sequences S_m and S'_n based upon exact matching of the states (events) within S and S' (Zheng et al. 2010). Figure 19 shows the OTCS matching algorithm, reprinted from Zheng et al. (2010). This figure illustrates exact state matching by OTCS, with state s_i compared in totality to state s'_j. Therefore, when applied to MESs, the original OTCS would require that an ICD-9-CM code match exactly in both length and content.
[Figure body not reproduced: a dynamic-programming recurrence over state indices i = 1..m and j = 1..n that increments the common-subsequence count only when states s_i and s'_j match exactly.]

Figure 19: OTCS Event Matching Procedure (Zheng et al. 2010)
Although the OTCS was motivated by temporally spaced state sequences, the hierarchical nature of MESs was not utilized in the measure. Thus, the OTCS does not incorporate partial event matching, only counting the number of exact (non-partial) matches between event sequences. For example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, the original OTCS would not find a match although they represent highly related medical events. In addition, the OTCS does not allow weighting of matched events in its event matching component. For example, if two MESs share medical events 250.01 and 279.00, these matched events are given equal weight by the original OTCS. The original OTCS simply counts matched events, regardless of event likelihood, risk, or severity.
OTCS-MES with Prevalence Weights (OTCS-MES EP)
In contrast to the original OTCS, OTCS-MES integrates unique features of MESs. The OTCS-MES provides a matching component that integrates event prevalence, event duplication, and hierarchical coding, important elements of MESs. Event prevalence, normalized to mitigate heavy positive skew and compact distribution, provides weights for matched events. Partial matching captures similarity based on the hierarchical organization of event codes, increasing similarity beyond exact matching. For example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, the OTCS-MES considers these events matching at the 4-digit level but not at the 5-digit level (most specific ICD-9-CM codes).
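Hierarchical partial matching of ICD-9-CM codes can be sketched as a prefix comparison at the 3-, 4-, and 5-digit levels. The function below is an illustrative approximation of the idea, not the dissertation's implementation:

```python
def match_level(code_a, code_b):
    """Deepest hierarchical level (3, 4, or 5 significant digits) at
    which two ICD-9-CM codes agree; 0 if even the 3-digit category
    differs. Codes are compared after dropping the decimal point."""
    a, b = code_a.replace(".", ""), code_b.replace(".", "")
    level = 0
    for depth in (3, 4, 5):
        if a[:depth] == b[:depth] and len(a) >= depth and len(b) >= depth:
            level = depth
    return level

print(match_level("250.00", "250.01"))  # 4: agree through the 4-digit level
print(match_level("250.00", "279.00"))  # 0: different 3-digit categories
```

Exact matching, as in the original OTCS, corresponds to requiring a match at the deepest level a code carries; partial matching credits agreement at shallower levels of the hierarchy.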
The event matching component in XXXX uses normalized event prevalence weighting. Since this research just uses event matching, the other components of the OTCS-MES are ignored. Equation 2 shows the definition of the event matching component using event prevalence (OTCS-MES EP). In the numerator, the event matching component sums prevalence weights and accounts for the number of matched events. The denominator value is the maximum matched events across all MES pairs plus one, less the number of matched events between the two MESs under consideration.
OTCS-MES EP = ( Σ_{e=1}^{MSSize} NPW_e / max_{c∈C} Σ_{e=1}^{MSSize_c} NPW_e ) / (PDM + 1 − MSSizeLimit)    (2)

where

• ME is the set of all matching events in the pair of cases (medical event sequences),
• C is the set of all cases,
• MSSize is the cardinality of the associated set,
• NPW_e is the normalized prevalence weight of event e,
• PDM is the maximum matched event limit, and
• MSSizeLimit is the number of event matches in the pair of cases constrained by the matched event limit.
Event prevalence weighting presumes that rarer events matched between two MESs indicate greater similarity than more common matched events. The OTCS-MES EP calculates individual event likelihood or prevalence using the complete set of trauma incident events and associated diagnosis codes. An event's prevalence weight is one minus the event's frequency rate, so larger values (weights) indicate rarer events. OTCS-MES EP normalizes the summation of prevalence weights of matched events by the maximum prevalence weight summation across all MES pairs. Additionally, OTCS-MES EP retains replicated matched events versus the original OTCS measure that removes replicated event matches.
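A minimal sketch of prevalence weighting (weight = 1 − frequency rate, with duplicate matches retained) might look as follows; the corpus and helper names are hypothetical:

```python
from collections import Counter

def prevalence_weights(all_sequences):
    """Weight = 1 - frequency rate, so rarer events get larger weights."""
    counts = Counter(e for seq in all_sequences for e in seq)
    n = sum(counts.values())
    return {e: 1.0 - c / n for e, c in counts.items()}

def matched_weight_sum(seq_a, seq_b, w):
    """Sum prevalence weights over matched events, retaining duplicate
    matches up to the number of occurrences in both sequences."""
    ca, cb = Counter(seq_a), Counter(seq_b)
    return sum(min(ca[e], cb[e]) * w[e] for e in ca.keys() & cb.keys())

# Hypothetical corpus of medical event sequences (ICD-9-CM codes).
seqs = [["250.00", "401.9"], ["250.00", "807.03"], ["401.9", "401.9"]]
w = prevalence_weights(seqs)
print(round(matched_weight_sum(seqs[0], seqs[2], w), 3))
```

The rarest shared events contribute the largest weights, so two sequences sharing an uncommon diagnosis score as more similar than two sequences sharing only common diagnoses.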
OTCS-MES with Severity Weighting (OTCS-MES ES)
For trauma data, event severity provides intuitive appeal to weight matching events for mortality prediction. As previously presented, early methods for mortality prediction incorporated injury scoring systems with event severity. Reference literature identified two important factors for injury scoring, injury type and anatomical body region. Injury type describes the nature of the injury and includes values such as contusion, sprain, open wound, and dislocation. Body region involves the anatomical area of the body injured, such as head and neck, spine and back, torso, and extremities. Based on these two variables, Barell et al. (2002) developed a matrix having nature of injury columns, body region rows, and ICD-9-CM injury codes in each cell. As an extension to this work, Clark and Ahmad (2006) assigned a survivor proportion to each cell of the Barell matrix.
Our study uses the Clark/Ahmad extension with survivor proportions assigned to each ICD-9-CM injury code. A severity weight equals one minus the survivor proportion, with larger values indicating more severe events. Equation 3 defines the OTCS-MES ES, a revision of the OTCS-MES EP, for severity weights. Essentially, we are replacing the prevalence weight for a matched event, as shown in Equation 2, with the severity weight for that same matched event. We then sum the severity weights for all matched events between our MESs under consideration, and normalize this value according to the maximum severity weight summation value across all MES pairs.
OTCS-MES ES = ( Σ_{e=1}^{MSSize} SW_e / max_{c∈C} Σ_{e=1}^{MSSize_c} SW_e ) / (PDM + 1 − MSSizeLimit)    (3)

where

• ME is the set of all matching events in the pair of cases (medical event sequences),
• C is the set of all cases,
• MSSize is the cardinality of the associated set,
• SWe is the normalized severity weight of event e,
• PDM is the maximum matched event limit, and
• MSSizeLimit is the number of event matches in the pair of cases constrained by the matched event limit.
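Because OTCS-MES EP and OTCS-MES ES differ only in the weight assigned to a matched event, the event-matching component can be written with a plug-in weight table. The sketch below mirrors the structure of Equations 2 and 3; the weight values and normalization constants are assumed for illustration:

```python
def otcs_mes_match(matched_events, weights, max_weight_sum, pdm, ms_size_limit):
    """Generalized event-matching component: plug in prevalence weights
    (OTCS-MES EP) or severity weights (OTCS-MES ES).

    Hypothetical arguments mirror Equations 2 and 3:
      max_weight_sum - maximum summed weight across all MES pairs
      pdm            - maximum matched event limit
      ms_size_limit  - event matches in this pair, capped by the limit
    """
    weight_sum = sum(weights[e] for e in matched_events)
    return (weight_sum / max_weight_sum) / (pdm + 1 - ms_size_limit)

# Assumed weight tables for two matched ICD-9-CM injury codes.
severity = {"805.2": 0.30, "861.21": 0.45}      # 1 - survivor proportion
prevalence = {"805.2": 0.98, "861.21": 0.995}   # 1 - frequency rate
matched = ["805.2", "861.21"]
es = otcs_mes_match(matched, severity, max_weight_sum=2.0, pdm=5, ms_size_limit=2)
ep = otcs_mes_match(matched, prevalence, max_weight_sum=2.0, pdm=5, ms_size_limit=2)
print(es, ep)
```

Swapping the weight table is the only change between the two measures, which is the sense in which the event-matching component is generalized for new domains.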
Nearest Neighbor Classification for Mortality Prediction
The similarity measures described above can be used in nearest neighbor classification algorithms. After a brief presentation of the nearest neighbor classification algorithm used in our experiment, this section describes adjustments to our classification approach accounting for imbalanced trauma data.
Nearest Neighbor Classification Algorithms
The kNN classification algorithm (Bhatia and Ashev, 2009) provides a simple but computationally intense approach for classification using a distance function. To make classification decisions, the kNN classification algorithm uses a neighborhood of k nearest neighbors with majority voting among the k neighbors. In this study, inverted similarity measures (1 - similarity) were used as distance measures.
The kNN classification algorithm requires no training, so it is known as a lazy learning algorithm. However, it uses all cases to classify new cases, so it requires large storage space and high search cost to retrieve nearest neighbors. Search cost can be substantially reduced by indexing, so that indexing can be considered a training cost, qualifying the designation as a lazy learning algorithm. To reduce computational requirements for large case bases, we created indexes for creation and search of dissimilarity matrices.
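A precomputed dissimilarity row (1 − similarity) feeds directly into majority voting, as in this minimal sketch with hypothetical distances and dispositions:

```python
from collections import Counter

def knn_predict(dist_row, labels, k):
    """Classify a target case by majority vote among its k nearest
    neighbors, given one row of a dissimilarity matrix (1 - similarity)."""
    order = sorted(range(len(dist_row)), key=dist_row.__getitem__)
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical dissimilarities from one test incident to five cases,
# with known dispositions (1 = deceased, 0 = survived).
dists = [0.10, 0.15, 0.80, 0.20, 0.90]
labels = [0, 1, 1, 0, 1]
print(knn_predict(dists, labels, k=3))  # neighbors 0, 1, 3 -> majority 0
```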
kNN classification has low bias but high variance (Manning, Raghavan, and Schutze, 2008). The decision boundaries in kNN vary in a nonlinear manner, providing flexibility for classification decisions. Each case has a positive probability of correct classification for some training sets. In contrast, kNN classification has high variance with sensitivity to noise in relevant attributes. Classification algorithms
with high variance tend to overfit. For kNN, distance function, neighborhood size, and case base size
influence bias and variance, emphasizing the importance of these choices.
To improve prediction performance, we use two variations of nearest neighbor classification. Weighted voting allows more impact for neighbors close to a target case and less impact for far neighbors. The main benefit of weighted voting is less sensitivity to neighborhood size. We use proportional weights defined by Dudani (1976) as an alternative to traditional equally weighted voting.
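Dudani's proportional weights can be sketched directly from their definition: the nearest neighbor receives weight 1, the k-th neighbor weight 0, and intermediate neighbors scale linearly. The distances below are hypothetical:

```python
def dudani_weights(dists):
    """Dudani (1976) proportional weights for one neighborhood.

    Weights are computed from the nearest (d_1) and farthest (d_k)
    neighbor distances; ties at a single distance all get weight 1."""
    d1, dk = min(dists), max(dists)
    if dk == d1:
        return [1.0] * len(dists)
    return [(dk - d) / (dk - d1) for d in dists]

# Hypothetical distances to the 3 nearest neighbors.
print(dudani_weights([0.10, 0.15, 0.20]))
```

For these distances the weights are approximately [1.0, 0.5, 0.0], so a far neighbor contributes little even when the neighborhood size is chosen too large.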
Ensembles combine predictions of individual classifiers, typically using weighted voting among classifiers on each case. Ensembles improve classification results for diverse classifiers with different biases. Many ensemble methods have been proposed for nearest neighbor classification, using both training and voting to combine individual classifiers (Garcia-Pedrajas and Ortiz-Boyer, 2009). We use a soft voting ensemble (scikit-learn.org/stable/modules/ensemble.html) with cases labeled according to the sum of predicted scores. This ensemble involves additional classification resources, as it requires determination of nearest neighbors for each component classifier.
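A soft-voting ensemble of this kind can be sketched as summing per-class scores across component classifiers; the scores below are hypothetical:

```python
def soft_vote(score_lists):
    """Soft-voting ensemble: sum each classifier's predicted class scores
    and label the case with the class whose summed score is largest."""
    totals = {}
    for scores in score_lists:          # one dict per component classifier
        for cls, s in scores.items():
            totals[cls] = totals.get(cls, 0.0) + s
    return max(totals, key=totals.get)

# Hypothetical per-class scores from three nearest neighbor classifiers
# (e.g., proportion of weighted votes for deceased vs. survived).
scores = [
    {"deceased": 0.40, "survived": 0.60},
    {"deceased": 0.55, "survived": 0.45},
    {"deceased": 0.70, "survived": 0.30},
]
print(soft_vote(scores))  # summed: deceased 1.65 vs survived 1.35
```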
Secondary Research - Accommodating the Uncommon Mortality Class
Despite the serious nature of patients admitted to trauma centers, mortality is uncommon. Treatment at a trauma center is short-term, so only death between admittance and discharge counts as mortality. Patients dead on arrival and patients discharged to another facility do not count in the mortality disposition recorded in trauma data. After adjusting for death outside of the trauma center window, missing data, small trauma centers, and few diagnosis codes, the deceased prevalence in our sample data was 6.28 percent.
Over sampling of the uncommon class provides a typical strategy for dealing with imbalanced data. Although Maloof (2003) reports conflicting results between over and under sampling, the availability of cases for the uncommon class drives usage of over sampling. Since ample data was
available, we used over sampling to deal with the uncommon mortality class. Specifically, we used the
Over-Sampling Optimum Fraction (Kalton 1993) to increase the proportion of mortality events in training data. Equation 4 defines the Kalton Over-Sampling Optimum Fraction fh for class h
per case of the uncommon to the common class. Using Equation 4 with the trauma mortality prevalence of 6.28 percent and equal data collection costs yields an optimum sampling fraction of 25.07%.
A second strategy for dealing with an uncommon class modifies the majority voting rule. When the prevalence of the uncommon class cannot be adjusted, a modified voting rule can compensate for the scarcity of cases of the uncommon class. Zhang (2010) proposed kNN-CF, a certainty factor (CF) measure for kNN classification for imbalanced data. kNN-CF classification accounts for the lift in proportion of extreme outcomes among the k nearest-neighbors over the proportion of extreme outcomes in the population as a whole. In our experiment, we also used certainty factor voting as an alternative to adjust for the uncommon mortality class.
Primary Research -Trauma Mortality Prediction Using Similarity Measures
This study asserts that a similarity measure adapted to medical event histories can be a valuable clinical decision-making tool. Within this broad assertion, this experiment addresses the predictive capability of MES-adapted similarity measures for the classification of trauma incident outcomes based on the incident's set of events. We are interested in comparisons involving the predictive performance of classifiers using individual similarity measures (OTCS, OTCS-MES EP, and OTCS-MES ES), the existing standard for trauma morbidity prediction (TMPM), and an ensemble of nearest neighbor classifiers using individual similarity measures. We aim to observe improved prediction performance for OTCS-MES over
TMPM and improved prediction ability of OTCS-MES relative to the original OTCS. The following list presents hypotheses concerning predictive performance.
1. TMPM, as the recognized best method, performs better than the MES similarity measures (OTCS, OTCS-MES EP, OTCS-MES ES, and OTCS Ensemble).
As explained previously, TMPM was designed specifically to predict mortality for trauma center incidents. Its derivation, based on a large amount of trauma data, contains two probit regression models accounting for the type of injury and body region for the injury. According to Glance et al. (2009), since TMPM-ICD9 performs better than ICISS and the SWI model, it should be preferred for risk-adjusting trauma outcomes when injuries are recorded using ICD-9-CM codes. Furthermore, Cassidy et al. (2014) confirms the superiority of TMPM for injury scoring of pediatric patients especially as the number of injuries increases. Because NTDB mandates ICD coding for trauma incidents, TMPM should continue as the preferred method for trauma incident prediction.
2. OTCS-MES, adapted to medical event sequences, performs better than the original OTCS similarity measure on morbidity prediction.
Unlike the original OTCS similarity measure, OTCS-MES allows generalized weighting of matched events and partial matching. These two capabilities should result in improved performance on trauma morbidity prediction. Despite its shortcomings, the OTCS may still identify the most important matching events to predict mortality in trauma patients. A lack of coding detail may negate the advantage of weighted, partial matching. Coding detail depends on data collection practices at trauma centers, and perhaps beyond trauma centers, with some ICD codes reported in a patient's medical record before a trauma incident occurs. As such, the original OTCS may match the predictive performance of OTCS-MES with large numbers of cases and large neighborhood sizes.
3. OTCS-MES using event severity weighting (OTCS-MES ES) performs better than the OTCS-MES using prevalence weighting (OTCS-MES EP).
Injury severity is an appropriate weighting method for scoring trauma incidents based on the reference literature. Also, injury severity has already been quantified by several scoring systems. Most recently, scoring systems based on ICD-9-CM codes and incorporating injury type and anatomical region have been found effective in classification experiments (Hedegaard et al., 2016). OTCS-MES ES, using an event severity score (the Barell matrix survivor proportion), should demonstrate improved performance for trauma incident classification.
4. The best ensemble combining individual similarity classifiers should perform better than individual similarity-based classifiers.
Ensembles improve performance of diverse classifiers. We expect enough diversity between event matching based on exact matching, normalized event prevalence with partial matching, and event severity with partial matching to achieve improved prediction results.
The primary research questions listed above are evaluated using method choices made during our secondary investigation. Table 13 summarizes these method choices for algorithm voting, neighborhood size, adjustment method for imbalanced data, and case base size. Accordingly, the secondary investigation is performed first to determine the best alternatives for these methodology parameters. The secondary analysis of neighborhood size also indicates whether the trauma data contain noise, since sensitivity to small neighborhood sizes (1 to 5) reflects noisy data.
Table 13: Summary of Variables for Secondary Research Questions
Variable Choices
Nearest neighbor algorithm voting Traditional majority voting and soft voting with proportional weights
Neighborhood size (k) 1 to 49 (odd only)
Adjustment method for imbalanced data Majority voting, Oversampling, CF voting
Case base size 5,000, 10,000, 50,000
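Taken together, the choices in Table 13 define a small parameter grid for the secondary investigation. A minimal enumeration sketch follows; the label strings are ours and purely illustrative.

```python
from itertools import product

# Parameter grid mirroring Table 13 (labels are illustrative, not study code)
voting = ["majority", "soft_proportional"]
k_values = list(range(1, 50, 2))            # odd neighborhood sizes 1..49
imbalance = ["majority_voting", "oversampling", "cf_voting"]
case_base = [5_000, 10_000, 50_000]

grid = list(product(voting, k_values, imbalance, case_base))
print(len(grid))  # 2 * 25 * 3 * 3 = 450 candidate configurations
```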
Trauma Data Set
Hospital-based trauma registries provide a foundation for much research about improving care of injured patients. Research has been limited by the lack of consistent, quality data received from disparate hospitals, regions, and states. To address this limitation, the American College of Surgeons developed the National Trauma Data Standard (www.facs.org/quality-programs/trauma/ntdb/ntds) to standardize core variables across hospitals. A wide variety of trauma centers contribute to the National Trauma Data Bank (www.facs.org/quality-programs/trauma/ntdb), a large aggregation of trauma data conforming to the National Trauma Data Standard.
Data Filters
We used the National Trauma Data Bank for our morbidity prediction experiment. Specifically, we randomly selected test and training data from the complete set of trauma incidents in the 2015 trauma registries. Table 14 summarizes the filters applied to the trauma data. We apply these filters to ensure that all methods, including those developed by other researchers, use equivalent evaluation data. Specifically, Glance et al. (2009) provide the following description of the data filters:
"Patients with burns or nontrauma diagnoses (eg, poisoning, drowning, suffocation) (60,353), missing or invalid data (data missing on age, gender, or outcome [HOSPDISP]) (42,025), or age younger than 1 year (7923) were excluded. Patients who were dead on arrival (2338) or transferred to another facility (52,169) were also excluded. We limited the data set to hospitals admitting at least 500 patients during at least 1 year of the study because we believed that coding would be more accurate in centers with substantial trauma experience (48,095 patients were excluded)."
As further corroboration, Burd et al. (2008) applied these same data filters during development of the Bayesian Logistic Injury Severity Score. The final filter, excluding trauma incidents having fewer than five diagnosis codes (events), is applied in accord with TMPM, which uses the five most severe injury codes in its regression.
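The exclusion rules above can be sketched as a single predicate applied per incident. The record field names here are hypothetical (not the NTDB schema), and the burn/non-trauma diagnosis screen is omitted because it requires the MARC code table.

```python
def passes_filters(rec) -> bool:
    """Apply the study's exclusion filters to one trauma record.
    Field names (age, gender, hospdisp, diagnoses, facility_volume,
    dead_on_arrival, transferred) are illustrative, not NTDB columns."""
    if rec["age"] is None or rec["gender"] is None or rec["hospdisp"] is None:
        return False                      # missing/invalid core data
    if rec["age"] < 1:
        return False                      # younger than 1 year
    if rec["dead_on_arrival"] or rec["transferred"]:
        return False                      # DOA or transferred to another facility
    if rec["facility_volume"] < 500:
        return False                      # low-volume trauma centers
    return len(rec["diagnoses"]) >= 5     # TMPM uses five most severe injuries

rec = {"age": 34, "gender": "F", "hospdisp": "home", "dead_on_arrival": False,
       "transferred": False, "facility_volume": 1200,
       "diagnoses": ["800.21", "801.35", "807.03", "823.82", "860.0"]}
print(passes_filters(rec))  # True
```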
Table 14: Summary of Filtered Trauma Data
2015 NTDB Trauma Incidents Remaining
Original data set 917,865
(1) Excluded incidents with all diagnoses being non-trauma (based on MARC table) 728,309
(2) Excluded incidents for patients w/age LT 1 year, or missing age or gender 685,587
(3) Excluded incidents with missing discharge disposition (HOSPDISP n/a) 590,288
(4) Excluded incidents w/patient DOA or w/transfer to another facility 427,545
(5) Excluded incidents for facilities handling LT 500 incidents during the year 403,534
(6) Excluded incidents having fewer than 5 diagnosis (event) codes 175,319
(6a) Deceased Disposition (6.28%) 11,010
(6b) Non-Deceased Disposition (93.72%) 164,309
From the 175,319 incidents having at least five events, we randomly selected 50,000 trauma incidents for a case base and 2,000 cases for testing. The training data set contains 465,325 total diagnosis codes (4,053 unique ICD-9-CM codes). We used the same test set to evaluate all hypotheses.
Due to a shortage of deceased cases, the over-sampled case base has a mortality prevalence of 22% (10,900 deceased cases), short of the 25.07% optimum fraction.
ICD-9-CM Code Granularity
OTCS-MES uses hierarchical matching that leverages the more detailed diagnoses provided by 4- and 5-digit ICD-9-CM codes. The level of coding detail in a data set therefore affects the predictive performance of OTCS-MES compared to OTCS. As shown in Table 15, inpatient data² contains more detailed codes (59% are 5 digits in length) than the trauma data (45% are 5 digits in length). However, trauma data contains a lower percentage of 3-digit codes (3.5%) than inpatient data (5.3%). Based on these results, OTCS-MES (EP and ES) may lose some advantage over OTCS due to the reduced level of diagnosis code detail for trauma incident data.
2 Inpatient data in Synthetic Public Use Files from the Center for Medicare and Medicaid Services (cms.gov 2018).
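Hierarchical (partial) matching of ICD-9-CM codes can be sketched as prefix matching at the 3-, 4-, and 5-digit levels. The credit values below are illustrative placeholders, not the OTCS-MES weights.

```python
def icd9_match_level(code_a: str, code_b: str) -> float:
    """Return partial-match credit based on the shared ICD-9-CM prefix.
    Levels: 3-digit category, 4-digit subcategory, full 5-digit code.
    Credit values (0.5 / 0.75 / 1.0) are illustrative only."""
    a, b = code_a.replace(".", ""), code_b.replace(".", "")
    if a == b:
        return 1.0                                  # exact code match
    if a[:4] == b[:4] and len(a) >= 4 and len(b) >= 4:
        return 0.75                                 # same 4-digit subcategory
    if a[:3] == b[:3]:
        return 0.5                                  # same 3-digit category
    return 0.0

print(icd9_match_level("807.03", "807.04"))  # 0.75: same 4-digit subcategory
print(icd9_match_level("807.03", "807.4"))   # 0.5: same 3-digit category only
print(icd9_match_level("807.03", "823.82"))  # 0.0: different categories
```

Sequences coded mostly at 3 digits can never earn the higher partial-match credits, which is one way coarser coding can erode the OTCS-MES advantage noted above.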
Table 15: Summary of Diagnosis Detail in Trauma and Inpatient Data
Diagnosis Code Length Inpatient Events (# of Codes) Inpatient Events (% of Total) Trauma Events (# of Codes) Trauma Events (% of Total)
3 Digits 1,988 5.31% 1,588 3.52%
4 Digits 13,337 35.61% 23,026 51.06%
5 Digits 22,123 59.08% 20,478 45.41%
Total 37,448 100.00% 45,092 100.00%
Performance Measures
For statistical evaluations, we use the Area under the Receiver Operating Characteristic Curve (AUROC or AUC) as the primary performance measure. AUC provides a prevalence-independent measure of discrimination ability in risk prediction models. AUC has several equivalent interpretations, including the probability that a uniformly drawn random positive example ranks higher than a uniformly drawn negative example. Calculation of AUC requires a ROC curve of classification scores. For the nearest neighbor algorithms, we used voting proportions among nearest neighbors as classification scores.
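The rank interpretation gives a direct way to compute AUC from classification scores such as kNN voting proportions. A minimal Mann-Whitney-style sketch with illustrative scores:

```python
def auc_mann_whitney(pos_scores, neg_scores) -> float:
    """AUC as P(random positive ranks above random negative), counting
    ties as one half; equals Mann-Whitney U / (n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Voting proportions for deceased (positive) vs non-deceased (negative) cases
print(auc_mann_whitney([0.8, 0.6, 0.55], [0.4, 0.55, 0.2]))  # ~0.944
```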
We performed two-tailed tests of AUC using Mann-Whitney confidence intervals augmented with the logit transformation (Qin and Hotilovac 2008). In a detailed simulation study (Kottas et al. 2014), the augmented Mann-Whitney intervals provided good AUC coverage, robustness to unbalanced sample sizes and normality departures, and reasonable power.
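The idea of a logit-scale interval for AUC can be sketched with the delta method. This is not the exact Qin-Hotilovac/Kottas construction; it substitutes the Hanley-McNeil variance approximation, and the case counts are illustrative (roughly a 2,000-case test set at 6.28% prevalence).

```python
import math

def logit_auc_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for AUC built on the logit scale (delta method).
    Uses the Hanley-McNeil variance approximation as a stand-in; the
    study's exact Mann-Whitney interval differs in detail."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    logit = math.log(auc / (1 - auc))
    half = z * se / (auc * (1 - auc))       # delta-method scaling of the SE
    inv = lambda x: 1 / (1 + math.exp(-x))  # back-transform to (0, 1)
    return inv(logit - half), inv(logit + half)

lo, hi = logit_auc_ci(0.8194, 126, 1874)    # illustrative counts
print(round(lo, 4), round(hi, 4))
```

Working on the logit scale keeps both interval endpoints inside (0, 1), one reason such intervals behave well near AUC values close to 1.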
Although AUC is widely recognized as a measure of discrimination ability, it does not provide an operating point for a classifier. To deploy a classifier, one must select an operating point corresponding to a score threshold. Each ROC point has an associated confusion matrix characterizing positive and negative predictions using the scoring threshold.
In our experiment, we evaluated operating points using three measures, Youden's J statistic (also known as Youden's Index), the weighted Youden's Index, and the Neyman-Pearson criterion. Youden's index (Youden 1950), computed as sensitivity + specificity - 1, ranges from -1 to 1. A value of 1
indicates a perfect test with no false positives or false negatives. Li et al. (2013) introduced the weighted Youden's Index for cases where sensitivity and specificity are not equally important. In this study, correctly predicting a trauma incident outcome of deceased (sensitivity) is more important than correctly predicting a non-deceased outcome (specificity). Although difficult to quantify, trauma center mortality is a costlier outcome in both treatment options and risk (Newgard and Lowe 2016). In contrast to the equal tradeoff in Youden's Index, the Neyman-Pearson criterion maximizes sensitivity subject to a constraint on the false positive rate.
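The two index forms can be made concrete as follows. The weighted form is our reading of the Li et al. (2013) definition, and the w = 0.75 emphasis is an illustrative choice.

```python
def youden(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def weighted_youden(sensitivity, specificity, w=0.5):
    """Weighted Youden's Index, J_w = 2*(w*Se + (1-w)*Sp) - 1.
    w > 0.5 favors sensitivity; w = 0.5 recovers the standard J.
    (Our reading of the Li et al. 2013 form.)"""
    return 2.0 * (w * sensitivity + (1.0 - w) * specificity) - 1.0

# Operating point from Table 20 for OTCS-MES ES: Se = 0.7182, Sp = 0.8249
print(round(youden(0.7182, 0.8249), 4))                  # 0.5431
# Emphasizing sensitivity (w = 0.75) rewards high-sensitivity classifiers
print(round(weighted_youden(0.7182, 0.8249, 0.75), 4))
```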
Results of Empirical Evaluation
This section presents results of the empirical evaluation addressing the primary and secondary research questions. Results of the secondary research questions are presented first as the primary research questions use results from the investigation of the secondary questions. That is, the secondary research determines the parameters for the classification method evaluating our primary research hypotheses.
Results for Secondary Research Questions
The results begin with an analysis of imbalanced versus over-sampled trauma data for training. We compare average performance for the methods dealing with imbalanced data (over-sampled data and certainty-factor (CF) voting). Results in Tables 16 and 17 use kNN with a neighborhood size (k) of 15. In Table 16, using average results across voting methods, over-sampled training demonstrates improved performance on Youden's Index (0.30 versus 0.13) and AUC (0.7652 versus 0.7031). Furthermore, over-sampled data improves the key metric in trauma incident prediction, sensitivity (0.7788 versus 0.4356). In accordance with Kalton (1993), these results suggest that over-sampled training delivers improved predictive performance on trauma morbidity across the three similarity measures.
Table 16: Comparison of Over-Sampled and Imbalanced Training Data (k = 15)
Similarity Measure Sensitivity Specificity Accuracy Youden AUC
Over-Sampled Trauma Data
OTCS 0.8341 0.4106 0.4339 0.2447 0.7300
OTCS-MES EP 0.7795 0.5566 0.5689 0.3362 0.7719
OTCS-MES ES 0.7227 0.5960 0.6030 0.3188 0.7939
Average 0.7788 0.5211 0.5353 0.2999 0.7652
Imbalanced (Normal) Trauma Data
OTCS 0.4464 0.6515 0.6383 0.0979 0.6755
OTCS-MES EP 0.3960 0.7215 0.7005 0.1174 0.7049
OTCS-MES ES 0.4643 0.6976 0.6826 0.1619 0.7288
Average 0.4356 0.6902 0.6738 0.1257 0.7031
Table 17 shows that changing the voting method between majority and certainty-factor has
minimal impact on predictive performance. CF and majority voting have similar average values across
sampling methods, for both Youden's Index and AUC. Interestingly, CF voting provides substantial
improvements in sensitivity for mortality prediction, but much lower results for specificity and accuracy.
Table 17: Comparison of Certainty-Factor and Majority Voting (k = 15)
Similarity Measure Sensitivity Specificity Accuracy Youden AUC
Majority Voting
OTCS 0.3656 0.8169 0.7876 0.1825 0.7049
OTCS-MES EP 0.3283 0.8762 0.8415 0.2045 0.7429
OTCS-MES ES 0.3149 0.9192 0.8818 0.2341 0.7615
Average 0.3362 0.8708 0.8369 0.2070 0.7364
Certainty-Factor Voting
OTCS 0.9149 0.2452 0.2846 0.1601 0.7007
OTCS-MES EP 0.8472 0.4019 0.4279 0.2491 0.7339
OTCS-MES ES 0.8721 0.3744 0.4038 0.2466 0.7612
Average 0.8781 0.3405 0.3721 0.2186 0.7319
Weighted voting improves predictive performance as shown in Table 18. For the moderate neighborhood size (k=15), weighted voting provides better performance for AUC, Youden's Index, and sensitivity. For the large neighborhood size (k=49), weighted voting improves AUC performance. Weighted voting provides more improvement for smaller neighborhoods, demonstrating some sensitivity of nearest neighbor classification to the neighborhood size. Since AUC is the key performance measure, the primary investigation uses weighted voting for each similarity measure.
Table 18: Comparison of Weighted and Non-Weighted Voting
Similarity Measure Sensitivity Specificity Accuracy Youden AUC
Weighted Voting Based on Similarity Measure Value (k=15)
OTCS 0.8182 0.5545 0.5690 0.3727 0.7573
OTCS-MES EP 0.7455 0.6360 0.6420 0.3814 0.7658
OTCS-MES ES 0.7545 0.7238 0.7255 0.4784 0.8066
Average 0.7727 0.6381 0.6455 0.4108 0.7766
Non-Weighted Voting (k=15)
OTCS 0.3656 0.8169 0.7876 0.1825 0.7049
OTCS-MES EP 0.3283 0.8762 0.8415 0.2045 0.7429
OTCS-MES ES 0.3149 0.9192 0.8818 0.2341 0.7615
Average 0.3362 0.8708 0.8369 0.2070 0.7364
Weighted Voting Based on Similarity Measure Value (k=49)
OTCS 0.7455 0.6783 0.6820 0.4238 0.7722
OTCS-MES EP 0.7000 0.6995 0.6995 0.3995 0.7921
OTCS-MES ES 0.7091 0.7228 0.7220 0.4318 0.8255
Average 0.7182 0.7002 0.7012 0.4184 0.7966
Non-Weighted Voting (k=49)
OTCS 0.7545 0.6302 0.6370 0.3847 0.7545
OTCS-MES EP 0.7455 0.6873 0.6905 0.4328 0.7455
OTCS-MES ES 0.7364 0.7947 0.7915 0.5311 0.8191
Average 0.7455 0.7041 0.7063 0.4495 0.7730
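Similarity-weighted (soft) voting replaces equal neighbor votes with votes proportional to each neighbor's similarity value. A minimal sketch with an illustrative neighbor list:

```python
def weighted_vote_score(neighbors):
    """Classification score for the positive (deceased) class under
    similarity-weighted voting: each neighbor votes with weight equal
    to its similarity value. `neighbors` is a list of (similarity,
    label) pairs with label 1 = deceased, 0 = non-deceased."""
    total = sum(sim for sim, _ in neighbors)
    positive = sum(sim for sim, label in neighbors if label == 1)
    return positive / total if total > 0 else 0.0

# Three close deceased neighbors outweigh five distant non-deceased ones
neighbors = [(0.9, 1), (0.8, 1), (0.7, 1),
             (0.2, 0), (0.2, 0), (0.1, 0), (0.1, 0), (0.1, 0)]
print(round(weighted_vote_score(neighbors), 3))  # 0.774, vs 0.375 by count
```

The resulting proportion doubles as the classification score needed for the ROC analysis, which is one reason soft voting pairs naturally with AUC evaluation.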
To gain insight into sensitivity to neighborhood size (k), we increased neighborhood sizes (odd values only) from 1 to 49 and then measured AUC for each k value. Figure 20 illustrates the impact
of increased neighborhood size on AUC for each classification algorithm. All three similarity measures
demonstrate improved performance with larger neighborhood sizes. Performance gains remain level for
OTCS-MES EP and OTCS-MES ES after k = 29. For OTCS, performance increases slightly until leveling off
at k = 45. Figure 21 provides additional insight into the impact of increased neighborhood size on the
two techniques for dealing with imbalanced data, over-sampling and C-F voting. Again, increased
neighborhood size improves AUC performance for both techniques dealing with imbalanced data. The
variability of predictive performance indicates some instability of nearest-neighbor classification in
trauma data.
[Figure: AUC by nearest-neighbor method and k value (10K over-sampled training); series: OTCS, OTCS-MES EP, OTCS-MES ES]
Figure 20: Comparison of Similarity Measures by Neighborhood Size (k)
Figure 21: Comparison of Methods for Imbalanced Data Sets on AUC
In the last part of the secondary analysis, we compared several case base sizes to understand performance improvements with a larger number of cases. Figure 22 indicates mixed results, with additional cases yielding small improvements for some classifiers but slightly worse performance for others. Since 50,000 cases provide a noticeable improvement for OTCS, we use the largest case base in the analysis of the primary research questions.
[Figure: AUC by OTCS measure and training case size (5K, 10K, 50K), weighted voting; series: Original OTCS, OTCS-MES EP, OTCS-MES ES]
Figure 22: Impact of Case Base Size on AUC
The primary research results, presented in the next subsection, use the sampling, voting and classification methods recommended by our secondary research. Specifically, our secondary analysis suggests a classification methodology that involves (1) over-sampled training based on the Kalton Optimum Sampling Fraction, (2) weighted nearest-neighbor voting, (3) large neighborhood size (k>41), and (4) large case base (50,000).
Results for Primary Research Questions
Analysis of AUC
Table 19 presents confidence intervals³ and related p-values addressing the primary hypotheses. For Hypothesis 1, the results show effects for TMPM compared to each individual OTCS measure and the OTCS ensemble. However, the effect for TMPM versus the ensemble classifier is reversed, indicating that the ensemble classifier outperforms TMPM on trauma mortality prediction. This is a significant finding in support of similarity measure classification to improve clinical decision making. For Hypothesis 2, the results show effects for both OTCS-MES EP and OTCS-MES ES versus OTCS. This again backs the
3 Computed using the Mann-Whitney measure with Logit transformation (Kottas et al. 2014).
argument for improved performance from an MES-adapted similarity measure. For Hypothesis 3, test results show effects between OTCS-MES EP and OTCS-MES ES at an alpha of 0.10. For Hypothesis 4, test results show effects for all three individual classifiers versus the ensemble classifier, demonstrating sufficient diversity among the individual classifiers.
Table 19: Statistical Testing Results for Hypotheses⁴
Test Classification Method 1 Classification Method 2 p-value
1a TMPM (AUC 0.8392, CI: 0.8326-0.8458) OTCS (AUC 0.7894, CI: 0.7840-0.7948) < 0.0001 *
1b OTCS-MES EP (AUC 0.8065, CI: 0.8008-0.8122) < 0.0001 *
1c OTCS-MES ES (AUC 0.8194, CI: 0.8120-0.8268) 0.0056 *
1d OTCS-MES Ensemble (AUC 0.8589, CI: 0.8521-0.8657) 0.0037 *
2a OTCS-MES EP (AUC 0.8065, CI: 0.8008-0.8122) OTCS (AUC 0.7894, CI: 0.7840-0.7948) 0.0024 *
2b OTCS-MES ES (AUC 0.8194, CI: 0.8120-0.8268) OTCS (AUC 0.7894, CI: 0.7840-0.7948) < 0.0001 *
3 OTCS-MES ES (AUC 0.8194, CI: 0.8120-0.8268) OTCS-MES EP (AUC 0.8065, CI: 0.8008-0.8122) 0.0524 **
4a OTCS-MES Ensemble (AUC 0.8589, CI: 0.8521-0.8657) OTCS (AUC 0.7894, CI: 0.7840-0.7948) < 0.0001 *
4b OTCS-MES EP (AUC 0.8065, CI: 0.8008-0.8122) < 0.0001 *
4c OTCS-MES ES (AUC 0.8194, CI: 0.8120-0.8268) < 0.0001 *
ROC curves provide a visual representation of the performance differences. In Figure 23, the ROC curve for the ensemble dominates all other ROC curves except for two small intervals. The ROC curve for TMPM dominates the ROC curves for OTCS and OTCS-MES EP at false positive values below 0.40. For false positive values above 0.5, the ROC curves switch, with the two similarity measures dominating TMPM. The ROC curves for TMPM and OTCS-MES ES cross in several areas, with OTCS-MES ES showing a small advantage at low false positive values but TMPM showing a small advantage at larger false positive values.

⁴ *: significant at the traditional alpha of 0.05; **: significant at alpha of 0.10. The family-wise error rate (probability of making at least one Type I error) for simultaneous testing of 10 comparisons is 0.22.
[Figure: ROC curves (true positive fraction vs FPF) for the OTCS-adapted similarity measures, TMPM, the ensemble, and a random classifier (50,000 training cases)]
Figure 23: ROC Curves for Mortality Prediction
Analysis of Operating Points
To provide insight about choosing an operating point on a ROC curve, we examine results across score thresholds. Figure 24 shows Youden's Index values for the classification methods over score thresholds. For TMPM, the threshold represents the probability of death, while the threshold for the kNN methods represents the voting proportion. Figure 24 shows a roughly linear increase in Youden's Index for TMPM until peaking at a low threshold (0.20) for probability of death. For the kNN methods, the graphs appear symmetric, with peaks at 0.55 for the ensemble, 0.50 for OTCS-MES ES, 0.60 for OTCS-MES EP, and 0.50 for OTCS.
[Figure: Youden's Index by score threshold; series: OTCS-MES ES, OTCS-MES EP, OTCS, TMPM, Ensemble]
Figure 24: Confusion Matrix Performance of Classification Methods
At the optimal operating points with equal weights for sensitivity and specificity, OTCS-MES ES and the ensemble outperform TMPM. As shown in Table 20, the Youden values for OTCS-MES ES (0.5430) and the ensemble (0.5541) are slightly better than for TMPM (0.5296). The optimal threshold for TMPM (0.20) is much lower than the thresholds for the kNN methods (0.45 to 0.55). Since the sensitivity values for TMPM (0.7000), OTCS-MES ES (0.7182), and the ensemble (0.7091) are rather low, more emphasis should be given to sensitivity when choosing an operating point.
Table 20: Confusion Matrix Summary for Equal Weights
Method Sensitivity Specificity Accuracy Youden Threshold
TMPM 0.7000 0.8296 0.8225 0.5296 0.20
OTCS 0.6818 0.7164 0.7145 0.3982 0.55
OTCS-MES ES 0.7182 0.8249 0.8190 0.5430 0.50
OTCS-MES EP 0.8182 0.6238 0.6345 0.4420 0.45
Ensemble 0.7091 0.8450 0.8375 0.5541 0.55
The OTCS methods show an increasing advantage over TMPM as the weight on sensitivity increases. Figure 25 shows weighted Youden Index values as the sensitivity weight increases from equal sensitivity/specificity (1/1) to high preference for sensitivity (10/1). The ensemble dominates TMPM at all weight levels. The individual OTCS approaches (OTCS, OTCS-MES EP, and OTCS-MES ES) dominate TMPM at sensitivity weights above 3/1. The performance of TMPM remains relatively flat over the range
SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PROSPECTS FOR CLINICAL DECISION MAKING
by
JOEL SCOTT FREDRICKSON
B.S., University of Colorado Colorado Springs, 1980
M.S., University of Colorado Denver, 2013
A dissertation submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Computer Science and Information Systems Program, 2019

This thesis for the Doctor of Philosophy degree by Joel Scott Fredrickson has been approved for the Computer Science and Information Systems Program by Dawn Gregg, Chair; Michael Mannino, Advisor; Farnoush Barnaei-Kashani; and Ronald Ramirez. Date: May 18, 2019

Fredrickson, Joel (Ph.D., Computer Science and Information Systems)
Similarity Measures for Medical Event Sequences: Prospects for Clinical Decision Making
Thesis directed by Associate Professor Michael Mannino
The form and content of this abstract are approved. I recommend its publication.
Approved: Michael Mannino

TABLE OF CONTENTS

CHAPTER
I. OVERVIEW
Problem Statement
Motivation
Literature Review
II. DEVELOPMENT AND EVALUATION OF A SIMILARITY MEASURE FOR MEDICAL EVENT SEQUENCES
Abstract
Introduction
Related Work
OTCS-MES Similarity Measure
Empirical Evaluation of the OTCS-MES
Discussion
Conclusion
III. SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PREDICTING MORTALITY IN TRAUMA PATIENTS
Abstract
Introduction
Related Work
Research Methodology
Results of Empirical Evaluation
Conclusion
IV. EXTENDED SIMILARITY MEASURES TO PREDICT TRAUMA PATIENT MORTALITY
Abstract
Introduction
Related Work
Research Methodology
Results of Empirical Evaluation
Discussion
Conclusion
Summary
REFERENCES
APPENDIX
A. Taxonomy of Barriers
B. Research Model of EHR Adoption Determinants
C. OMOP CDM Tables
D. The CDSS Acceptance Model Facilitators
E. OTCS-MES C# Algorithm
F. OTCS-MES SAS Algorithm

vi LIS T OF TABLES TABLE 1. Sample Inpatient MES ............................................................................................................................. 28 2. Sample IP MES with Event Duration and Gap ......................................................................................... 34 3. Example of Similar Patients Based on Event Similarity ........................................................................... 38 4. Distribution of Outpatient Incidents by Duration ................................................................................... 43 5. Simple Overlap Measure Example .......................................................................................................... 44 6. Rank Weighted Overlap Measure Example ............................................................................................ 44 7. Summary of t Test Results for OTCS MES/OTCS with Balanced Weights ............................................... 50 8. Summary of t Test Results for OTCS MES/OTCS with Unbalanced Weights .......................................... 54 9. Confidence Intervals (95%) for Non Significant t Test Results in Table 8 ............................................... 54 10. Total Event Matches for OTCS MES and OTCS ..................................................................................... 56 11. Summary of t Test Results for OTCS MES/Artemis with Balanced Weights ......................................... 59 12. Summary of Hypothesis Evaluation ...................................................................................................... 60 13: Summary of Variables for Secondary Research Questions ................................................................... 80 14: Summary of Filtered Trauma Data ........................................................................................................ 
82 15: Summary of Diagnosis Detail in Trauma and Inpatient Data ................................................................ 83 16: Comparison of OverSampled and Imbalanced Training Data (k = 15) ................................................. 85 17: Comparison of Certainty Factor and Majority Voting (k = 15) ............................................................. 85 18: Comparison of Weighted and Non Wei ghted Voting ........................................................................... 86


19. Statistical Testing Results for Hypotheses .......... 89
20. Confusion Matrix Summary for Equal Weights .......... 91
21. Summary of Findings on Hypotheses .......... 94
22. Match Level in Trauma Data .......... 95
23. Sample Inpatient MES .......... 100
24. VDM Computations for Gender and Injury Mechanism .......... 104
25. Summary of Filtered Trauma Data .......... 110
26. Comparison Statistics for Logistic Regression Models (Method 1) .......... 112
27. AUC Hypothesis Test for Extended TMPM and OTCS-MES Ensemble (Method 1) .......... 113
28. Similarity Component Weighting Alternatives for Method 2 .......... 114
29. AUC Hypothesis Tests (Method 2) .......... 114


LIST OF FIGURES

FIGURE
1. Frequency Distribution of Patient Pairs by Number of Matched Inpatient Events .......... 39
2. Frequency Distribution of Patient Pairs by Number of Matched Outpatient Events .......... 39
3. Prevalence Scales by Number of Matched Events for Inpatient MESs .......... 40
4. Frequency Distribution of Patient Pairs by OTCS-MES (Inpatient Data) .......... 45
5. Frequency Distribution of Patient Pairs by OTCS-MES (Outpatient Data) .......... 46
6. Simple Overlap using Unbalanced Weighting (Inpatient Data) .......... 47
7. Rank-Weighted Overlap using Unbalanced Weighting (Inpatient Data) .......... 47
8. Simple and Rank-Weighted Overlap of OTCS-MES Similarity Measure vs Original OTCS .......... 50
9. Simple Overlap Results with Extreme Component Weighting .......... 52
10. Rank-Weighted Overlap Results with Extreme Component Weighting .......... 52
11. Simple Overlap Results with Mixed Component Weighting .......... 53
12. Rank-Weighted Overlap Results with Mixed Component Weighting .......... 53
13. Nearest Neighbor Overlap by Sequence Length (Inpatient Data) .......... 55
14. Nearest Neighbor Overlap by Sequence Length (Outpatient Data) .......... 55
15. Simple Overlap of OTCS-MES versus Artemis (Inpatient Data) .......... 56
16. Rank-Weighted Overlap of OTCS-MES Measure versus Artemis (Inpatient Data) .......... 57
17. Simple Overlap of OTCS-MES versus Artemis (Outpatient Data) .......... 57
18. Rank-Weighted Overlap of OTCS-MES versus Artemis (Outpatient Data) .......... 58


19. OTCS Event Matching Procedure (Zheng et al. 2010) .......... 73
20. Comparison of Similarity Measures by Neighborhood Size (k) .......... 87
21. Comparison of Methods for Imbalanced Data Sets on AUC .......... 87
22. Impact of Case Base Size on AUC .......... 88
23. ROC Curves for Mortality Prediction .......... 90
24. Confusion Matrix Performance of Classification Methods .......... 91
25. Weighted Youden's Index by Method and Cost Ratio .......... 92
26. Maximum Sensitivity for False Positive Constraints (Neyman-Pearson Criteria) .......... 93
27. Patient Record Variable Significance .......... 104
28. ROC Curves for Extended TMPM and OTCS-MES Ensemble (Method 1) .......... 113
29. Weighted Youden's Index by Method and Sensitivity Cost Ratio (Method 1) .......... 115
30. Sensitivity by FPF Constraint Level (Method 1) .......... 115


CHAPTER I

OVERVIEW

The literature review and research presented in this dissertation support movement from the "practice of medicine" to the "science of medicine". Khosla (2014) argues that the practice of medicine "is driven by conclusions derived from partial information of a patient's history and current symptoms interacting subjectively with various known and unknown biases of the physician, hospital, and healthcare system". Health care must evolve toward a more scientific method, with complete and accurate data collection, sophisticated analysis, and scientific experimentation aimed at delivery efficiency and improved patient outcomes. Khosla (2014) summarizes: "Healthcare must move away from the system of small trials and experiential evolution of best practices (which has done us well) to a state unencumbered with the conflicts of interest, personal biases, and incomplete knowledge that currently lead to suboptimal results". Essentially, we must overcome an approach to health care based on practice and tradition and move to one able to effectively leverage the vast amounts of accessible and digitized patient-centric data. Khosla (2014) predicts that "technology will reinvent healthcare as we know it" and that "in the future, the majority of physicians' diagnostic, prescription and monitoring . . . will be replaced by smart hardware, software, and testing."

Accordingly, this study advances the use of a scientifically designed tool augmenting clinical decision-making systems. Specifically, we develop and evaluate a similarity measure (OTCS-MES) adapted to medical event sequences (MESs). We further evaluate the decision-making performance of OTCS-MES as a flexible analytic method accommodating different health care domains. That is, we generalize our OTCS-MES similarity measure to better include various events found in diverse health care service categories, such as inpatient admissions and outpatient procedures. Assessing our generalized measure's performance then requires empirical evaluation of extended OTCS-MES versions,


along with more classical inferential methods integrating data elements inherent to MESs. Accordingly, the remaining sections of this dissertation are as follows:

1. Problem Statement and Motivation
2. Literature Review: Health Care Technology Adoption – Enablers and Inhibitors
3. Literature Review: Clinical Decision Support Systems – Overview and Adoption Factors
4. Literature Review: Similarity Measures and Medical Event Sequences
5. Study 1: Development and Evaluation of a Similarity Measure for Medical Event Sequences
6. Study 2: Similarity Measures for Medical Event Sequences: Predicting Mortality in Trauma Patients
7. Study 3: Similarity Measures for Medical Event Sequences: Performance with Patient Record Data

Problem Statement

The pervasive retention of electronic health care data enables clinical decision support systems (CDSS) and associated tools. The HITECH Act not only incentivized the expanded adoption of electronic health records (EHRs) and their "meaningful use" but aimed to leverage this new data source to improve the quality of health care. Essentially, the originators of HITECH envisioned electronic health care data routinely captured in a standardized and secure manner to satisfy five goals for the US healthcare system: "improve quality, safety and efficiency; engage patients in their care; increase coordination of care; improve the health status of the population; and ensure privacy and security" (Sox 2008). A problem with this vision has become the obvious lack of clinical decision-making tools and use of EHRs by health care providers to improve patient outcomes. That is, EHRs are normally used in transaction-based systems for billing, scheduling, and workflow, but their use to improve patient diagnosis and treatment is remarkably neglected. The literature review below addresses many of the general inhibitors to EHR use for clinical decision-making.

Importantly, this dissertation proceeds to focus on one of the more significant inhibitors: the lack of CDSS tools perceived as useful in improving quality of care. Accordingly, this research becomes design science in nature through the provision of a CDSS tool specific


to medical event sequences. We strive to create an IT artifact (the OTCS-MES similarity measure) that is carefully designed to help health care providers during the performance of their responsibilities. Such CDSS tools are not transaction oriented but learn "from the growing volume of captured data what does and does not work in healthcare" (Sox 2008). The development and testing of such tools, leveraging expanding data warehouses of EHRs, is critical to the movement from the "practice of medicine" to the "science of medicine" discussed previously.

Motivation

As the following literature review outlines, health care lags in the adoption of clinical decision support systems and supporting information technology. There are many unique factors within health care inhibiting IT adoption, and these factors are thoroughly explained in the literature review. This dissertation focuses on one of the more important inhibitors: the lack of technology-driven tools perceived as beneficial by health care providers during the diagnosis and treatment of patients. As the "meaningful" use and coordinated data stores of EHRs evolve, tools leveraging health care data for clinical decision support will increase in number and popularity among caregivers. As such, we introduce a CDSS tool taking advantage of referential health care data and advanced analytical methods to enhance clinical decision-making.

Our CDSS tool uses patient- or incident-specific medical event sequences (MESs) captured and retained in ever-growing data stores of EHRs. A sequence of medical events, incurred by a patient and characterized in electronic data, lends itself to CDSS similarity measure analysis for several reasons. First, standardized coding systems for diagnoses, procedures, and pharmaceuticals are in place to represent medical events. For example, an inpatient admission, as a medical event, is characterized by the ICD (International Classification of Diseases) codes for admission and discharge diagnoses. This allows the attachment of "meaning" to a particular medical event. Consequently, similarity measures for MESs can go beyond simply matching states, as previous measures do, and quantify the likelihood, risk, and severity of matched states. Second,


similarity measures adapted to MESs allow consideration of the important temporal or structural component of event sequences. That is, the data captured during multiple medical events includes temporal fields helpful in defining sequence structure. As an example, an inpatient admission claim includes the beginning and end date of the admission. Thus, the temporal structure and duration of medical events provide important information useful to similarity measures. For example, longer durations for medical events are indicative of greater severity, as are more frequent and consistent event occurrences.

There are several practical applications for similarity measures functioning within clinical decision support systems. These include (1) improved patient classification and clustering, (2) increased patient adherence to clinical pathways (care management plans), and (3) augmented discovery of similar patients for medical social networking. First, MES similarity measures have potential for grouping "like" patients. Such classification and clustering research generates patient or entity cohorts having high similarity estimated by MES-based similarity measures. The goal is patient cohorts having greater attribute similarity. For example, MES similarity measures could help build homogeneous patient groups having similar disease states or prospective risk. Second, a MES similarity measure could help evaluate patient adherence to an accepted clinical pathway specific to a diagnosed condition. A clinical pathway is a temporally spaced sequence of procedures, medications, tests, or other medical events related to a disease or physical condition. MES similarity measures seem especially suitable to comparing a patient's actual sequence of medical events to their prescribed clinical pathway. Finally, medical social networks allow patients having similar medical histories to discuss treatment successes and failures, exchange experiences, and receive emotional support. MES similarity measures could prove useful as another method incorporated by medical social networks when retrieving similar patients for their online communities.
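To make the data elements just described concrete, the sketch below represents a toy MES and derives an event's duration, the gaps between events, and a naive code-overlap similarity between two sequences. Everything here is a simplified assumption for exposition: the event class, field names, and ICD-10-style codes are hypothetical, and the Jaccard-style overlap ignores order, duration, and gap, unlike the OTCS-MES measure developed in this dissertation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MedicalEvent:
    """One event in a medical event sequence (MES).
    `code` is a standardized identifier (e.g., an ICD diagnosis code);
    `start`/`end` are the admission and discharge dates of the event.
    Field names and codes below are hypothetical."""
    code: str
    start: date
    end: date

def duration_days(event: MedicalEvent) -> int:
    """Event duration in days; longer stays may indicate greater severity."""
    return (event.end - event.start).days

def gaps_days(sequence: list) -> list:
    """Gaps in days between consecutive events of a time-ordered MES."""
    return [(nxt.start - prev.end).days
            for prev, nxt in zip(sequence, sequence[1:])]

def simple_overlap(seq_a: list, seq_b: list) -> float:
    """Illustrative set-overlap (Jaccard) similarity on event codes only."""
    a = {e.code for e in seq_a}
    b = {e.code for e in seq_b}
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical patients with two inpatient admissions each.
patient_1 = [MedicalEvent("I21.4", date(2020, 1, 3), date(2020, 1, 8)),
             MedicalEvent("I50.9", date(2020, 3, 1), date(2020, 3, 6))]
patient_2 = [MedicalEvent("I21.4", date(2020, 2, 10), date(2020, 2, 12)),
             MedicalEvent("E11.9", date(2020, 4, 2), date(2020, 4, 3))]

print(duration_days(patient_1[0]))                     # 5
print(gaps_days(patient_1))                            # [53]
print(round(simple_overlap(patient_1, patient_2), 2))  # 0.33
```

A measure such as OTCS-MES extends this sketch by weighting matched events and incorporating the temporal components (duration and gap) rather than discarding them, as the next chapters develop.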


Literature Review Introduction

Physician intent to adopt our MES similarity measure, and clinical decision support systems in general, is of primary concern. A dichotomy exists between the apparent and significant benefits of CDSS adoption and the reluctance of physicians to accept this technology. Potentially, CDSSs can be integral in (1) improving quality and continuity of care, (2) increasing productivity for doctors and nurses, (3) providing better information for decision-making, (4) enabling better product/service customization, (5) achieving higher quality patient outcomes, and (6) improving service efficiency. However, health care providers, and most especially physicians in smaller practices, have been notoriously slow in adopting EHR use for purposes other than transaction-based systems. Accordingly, prior to development and evaluation of our MES similarity measure for clinical decision support, we must understand the factors contributing to its eventual adoption and possible barriers to its acceptance.

This literature review begins with a broad analysis of IT adoption by physicians in the United States. The focus of the review then becomes narrower through a review of research about clinical decision support systems. Finally, the review concentrates on referential literature about similarity measures and medical event sequences important to the development of our new IT artifact.

Enablers of Health Care IT Adoption

Use of EHRs

Significant developments are evident within the health care industry in terms of pervasive capture, retention, and use of medical event information. The positive news for health care researchers is the focus on widespread use of EHRs by providers and standardized coding algorithms for information abstraction. Encouragingly, IT trends in health care are enabling the application of similarity measures and related data mining techniques to ever-increasing amounts of clinical data.


The health care industry formally directed its attention to increased application of IT to capture and leverage patient information with the mandates of the Health Insurance Portability and Accountability Act of 1996. Since that time, the government has placed increasing emphasis on the need to adopt technology within health care, culminating with the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009. A central focus of this legislation is the "meaningful use" of EHRs by health care providers. Specific to electronic health or medical records, the HITECH Act establishes a system of incentive payments to healthcare providers for the "meaningful use" of EHRs. The act states "that, as of 2011 [and until 2015], healthcare providers will be offered financial incentives for demonstrating meaningful use of electronic health records" (healthcareitnews.com 2014). The requirements for "meaningful use" of IT by healthcare providers are administered by the Centers for Medicare and Medicaid Services (cms.gov). To qualify for incentive payments, and to avoid future penalties, healthcare providers in the form of eligible professionals, eligible hospitals, and critical access hospitals (CAHs) must meet "measurement thresholds [for EHR use] that range from recording patient information as structured data to exchanging summary care records" (healthit.gov 2014). The CMS incentive programs have three stages, and each stage requires additional evidence of IT usage.

Observational Medical Outcomes Partnership (OMOP) – Common Data Model

In the presence of ever-expanding stores of EHRs, their collaborative use for research requires an enabling common data model (CDM) and associated standards. The federal government first realized the enabling power of a CDM having a common format for disparate data sources with the FDA Amendments Act of 2007. Specifically, this act mandated that "the FDA collaborate with public, academic, and private entities to access disparate data sources and to validate ways to link and analyze safety data from multiple sources for medical product safety surveillance" (Stang et al. 2010). As a result, the Observational Medical Outcomes Partnership (OMOP) (http://omop.fnih.org), a partnership


among the FDA, academia, data owners, and the pharmaceutical industry, was initially created to identify needs and test scientific methods and data infrastructure of an "active drug safety surveillance system". While this first effort at building a facilitating health care data source for collaborative research was restricted to "maximizing the benefit and minimizing the risk of pharmaceuticals", its potential for other categories of health care service quickly became apparent (Stang et al. 2010). Given the obvious value of a standardized data resource for collaborative health care research, the OMOP stakeholders, upon the conclusion of their project, initiated the Observational Health Data Sciences and Informatics (OHDSI) initiative (Hripcsak 2015). OHDSI strives "to bring out the value of observational health data through large-scale analytics" (https://ohdsi.org/who-we-are/). Integral to the OHDSI initiative is the establishment of a CDM for abstracted health care data and the development of associated ETL methods and informatic applications.

A CDM for health care data responds to the fundamental premise that health care data varies greatly among organizations. This is due to (1) the capture of information for different purposes and through different means, and (2) the use of "abstracted" and varied code sets to represent medical events. First, health care data may be retained for a variety of purposes including reimbursement, research and analytics, or patient care. Each of these purposes involves different code sets, forms, and retention methods. Second, numerous code sets exist to capture medical events, including code sets for diagnoses, procedures, clinical tests, and drugs. Furthermore, a code must be "abstracted" to represent a medical event, and this process involves a great deal of subjectivity and human error. Obviously, given these factors, the variance among "observational" data stores can be significant.

The OMOP CDM is a concerted effort to convert disparate data into a common format enabling systematic analysis. Accordingly, Overhage et al. (2011) argue that "translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large scale systematic analysis". The OMOP CDM provisions standardized health care data


from diverse sources as furnished by an open network of observational data holders. Each element in a participant's database must be mapped to the approved CDM vocabulary and subsequently placed in the data schema. Once the OMOP CDM data store exists, it affords researchers the opportunity to utilize numerous data exploration and analytics tools. The schematic for the tables of the OMOP CDM is provided as Appendix C.

OHDSI asserts that the OMOP CDM enables improved and accessible analytics. As such, it allows participants to "perform systematic analyses using a library of standard analytic routines that have been written based on the common format" (https://ohdsi.org/data-standardization/). A couple of examples of such enhanced analytics are the open source package ACHILLES and a study of temporal pattern discovery in longitudinal electronic patient records. First, ACHILLES (Automated Characterization of Health Information at Large-scale Longitudinal Exploration System) provides a graphical representation of health care information found in the OMOP CDM. Basically, this application is a visualization tool for reviewing clinical content based on summary statistics derived from OMOP CDM content. The second example of an application leveraging the OMOP CDM is more relevant to our OTCS-MES similarity measure and its attention to the temporal pattern of medical events. In this example, Noren et al. (2010) demonstrate the use of enhanced graphics to portray drug prescription and medical event sequences and to evaluate expected and observed medical event frequencies for drug combinations. Interestingly, the graphical portrayal of MESs by Noren et al. is very similar to that used in our study. Overall, this study shows the value of a summarized data store of MESs (such as the OMOP CDM) when used in conjunction with sophisticated analytical tools.

Barriers to Health Care IT Adoption

The potential benefits of EHR adoption by the health care industry are significant. EHRs are believed to be integral in "improving quality, continuity, safety and efficiency in healthcare" (Boonstra


and Broekhuis 2010). While expectations for outcomes related to "time, quality improvements, cost, efficiency, paper reduction, ease of sharing" (Randeree 2007) from EHR use are high, the adoption rate has been notoriously slow. Because of the dichotomy between the apparent and significant benefits of EHR adoption and the reluctance of physicians to accept this technology, a substantial amount of research has been undertaken to understand the determinants of EHR adoption and its meaningful use. In fact, researchers Boonstra and Broekhuis (2010) have proposed a structure for organizing EHR barrier research (Appendix A), where "from the physician perspective, barriers linked to similar problems were grouped into a single category". Expanding upon these previously identified adoption barriers, and leveraging more current referential studies, we propose the health care IT adoption model shown in Appendix B. The determinants of IT adoption comprising this model are listed below and explained in subsequent sections.

1. Financial
2. Technical
3. Time
4. Psychological
5. Social
6. Legal
7. Organizational
8. Change Process
9. Environmental
10. Third Party Intervention

Financial

Financial considerations are a strong determinant of the meaningful use of EHRs by physicians. Physicians face varied and significant costs during EHR implementation and upkeep, and they also encounter challenges unique to the healthcare industry when funding technology. Primary consideration was given in earlier research to the "initial costs" of acquiring and installing an EHR system. Miller and Sim (2004) have this initial cost focus when speaking of "high initial financial costs, slow and uncertain financial payoffs, and high initial physician time costs". Similar treatment of the


initial costs for EHR systems is found in research by Ford et al. (2006), saying that a cost "physicians in small practices have had to internalize is the system's initial purchase", and by Boonstra and Broekhuis (2010), defining initial EHR costs to "include all the expenditure needed to get an EHR system working in the physician's practice, such as the purchase of hardware and software, selecting and contracting costs and installation expenses". Another initial cost identified by Vishwanath and Scamurra (2007) is the direct "cost of the current legacy system in place". The initial cost barrier concerns both the source of funds and the eventual financial benefit (ROI) of EHR adoption. Accordingly, Gans et al. (2005) highlight the "lack of capital resources to invest in an EHR" as a top-five barrier to EHR adoption, and Hennington and Janz (2007) address this issue in the form of two questions that each physician must ask: "1) What is the likelihood of seeing a positive return on the investment? [and] 2) Are there financial means available to purchase and maintain the system?". Hennington and Janz (2007) apply these two questions to the Unified Theory of Acceptance and Use of Technology (UTAUT) model in terms of UTAUT constructs, where EHR ROI is related to "physicians' performance expectancy" and the availability of funds "addresses whether or not facilitating conditions exist".

As physicians have become more experienced in the use of EHR systems, the financial determinant of "meaningful use" has shifted somewhat in its focus to include worries about the ongoing costs of EHR maintenance and support. Ford et al. (2006) describe this financial component as "ongoing operational costs" and limit it to a discussion of EHR vendor transience and risk. Other researchers, including Randeree (2007), expand this discussion and find that physicians face other, often unexpected, ongoing EHR costs, including those "associated with the customizability of their system", expensive "maintenance agreements", and "technology obsolescence". The general impression from the research about financial determinants of EHR adoption is that (a) initial and ongoing costs remain underrepresented and (b) there is a great deal of uncertainty concerning the availability of incentive payments and effective reimbursement mechanisms from payers to fund EHR meaningful use. Miller


and Sim (2004) feel that these barriers are even more significant in the context of smaller physician practices, saying "these barriers were most acute for physicians in solo/small-group practice, a mode in which a substantial majority of U.S. physicians practice".

Technical

The technical determinant of EHR meaningful use can be considered using the two components of the classic Technology Acceptance Model (TAM): perceived "ease of use" and perceived "usefulness". First, in terms of EHR "ease of use", a significant amount of research points to the distinctive resistance to technology within the healthcare industry and the lack of technical expertise found in physicians' practices, both among the physicians themselves and their support staff. Smaller practices do not have the necessary background, training, or time to learn and deploy often complex EHR systems. Randeree (2007) describes this phenomenon by saying that "smaller practices do not have the capacity to implement the software while dealing with issues of care and reimbursement" and "medical schools and residency programs do not currently employ or train future physicians in the use of EHR", leaving them unequipped to properly select and deploy EHRs. Terry et al. (2009) expand upon this by stating that EHR "barriers included level of computer literacy, training, and time", with the initial transition period being especially challenging in its time burden upon poorly trained staff and doctors. Boonstra and Broekhuis (2010) also note that "physicians struggle to get appropriate technical training and support for the systems from the vendor". Terry et al. (2009) suggest the need for an "in-house problem solver" to "serve a key role in helping novice users move forward to achieve EHR adoption".

An additional factor of EHR "ease of use" is system complexity, with Miller and Sim (2004) finding "even highly regarded, industry-leading EHRs to be challenging to use because of the multiplicity of screens, options, and navigational aids". Randeree (2007) elaborates, saying a "recent study showed that at least 264 EHR software programs are in use; on average, the percentage of respondents with the same EHR software was 0.4%". Obviously, this issue is aggravated by the lack of technical skills possessed by most physicians


and staff. The need for EHR customizability is another factor of EHR perceived "ease of use". Nambisan (2014) describes how "adopters of new innovations often learn by using the innovation or reinvent the technology to adapt it to their own context" and how this is "invaluable during the adoption of new technologies such as EHR", but may not be possible given the knowledge and time limitations of physicians. One final consideration has to do with the interoperability of the EHR system with other internal and external IT systems and existing physician office processes. This is described as an inability of the EHR system to "interconnect with other devices that 'complement' the EHR system" and EHRs that "are not compatible with the existing practice systems" (Boonstra and Broekhuis 2010). Establishment of data standards is key to this interoperability and repeatedly surfaces as a concern physicians have with EHRs.

The second component of TAM deals with the perceived usefulness of a prospective technology. Physicians are told of great possible utility for EHRs, including enhanced quality of patient outcomes, better compliance with various reporting and healthcare mandates, and improved process efficiencies. However, most healthcare providers are skeptical of EHR capability to deliver these benefits, and they are also not provided compelling incentives to improve clinical performance. Hennington and Janz (2007) describe an environment where the "system does not compensate physicians based upon the quality of the care they provide, and thus does not reward them for investing in systems designed to improve quality of care". Finally, effective knowledge management helps mitigate technical obstacles to EHR meaningful use, but "barriers to establishing a learning culture in health care organizations" are more problematic than those found in other industries (Nambisan 2014).

Time

The time determinant for EHR meaningful use is noteworthy, where a "key obstacle in this path to quality [of care] is the extra time it takes physicians to learn to use the EHR effectively for their daily


13 tasks” (Miller and Sim 2004) . Time spent to install and maintain an EHR system comes at the detriment of patient doctor relationships . This is an extremely important issue given most physicians are principally driven to provide the best patient care possible. Along these lines, Miller and Sim (2004) find “that most physicians using EHR s spent more time per patient for a period of months or even years after EHR implementation. The increased time costs resulted in longer workdays or fewer patients seen, or both, during that initial period” . Research points to especially frustrating and unexpected time investments for process redundancy caused by duplicating medic al records and functionality during EHR transition, software maintenance and upgrades, vendor management and dedicated EHR training . Unexpected amounts of time are predominantly incurred during the initial transition period between a paper based and electr onic medical records system . Gans et al. (2005) discover that “concern about loss of productivity during transition to an EHR system” is among the top five barriers to EHR adoption for practices . Other research also points to this surprising investment of time during EHR implementation . For example, Randaree (2007) argues that during EHR transition ; “Expect staffing changes and lots of training. Slow your implementation expectations. Create redundant systems until errors are eliminated. Whatever your timeta ble and budget (double it).” Relating to this time and effort outlay , Hennington and Janz (2007) believe “the time required to enter information into EHR systems is one of the greatest challenges to EHR adoption”. Psychological The psychological determin ant of EHR use is especially significant within the unique environment of healthcare and to its behaviorally unique physicians . 
Nambisan (2014) feels that the psychological determinant does not receive appropriate attention, arguing that “innovation adoption is situated in a social (cultural) context and implies that the norms and values of the individual, the larger community of the individual, and the organization that the individual belongs to, all can influence adoption”. To best understand the psychological barriers to EHR adoption, an understanding of the


sociocultural and psychological context of health care is important. Significantly, physicians in smaller practices have control and authority over both clinical and business processes. As such, without their full support, EHR adoption is extremely difficult to realize. Miller and Sim (2004) support this view, saying “nonchampion [unsupportive] physicians tended to be less positive toward EHRs and more easily discouraged by usability problems. Without exhortation and support from physician champions, these physicians tended to remain as lower-level EHR users.” The issue primarily seems to be one of process control, particularly evident among older physicians stubbornly retaining old paper-based systems. Randaree (2007) explains that “older physicians were reluctant to transition to the EHR while the younger ones were driving the adoption”. This issue of control is better explained by Vishwanath and Scamurra (2007) as “the loss of control of patient information, the loss of control over business processes, systems tend not to be very easy to use, negative perceptions among administrative staff, and problems in understanding the vernacular”. Control is related to “professional autonomy”, defined as “professionals having control over the conditions, processes, procedures, or content of their work” (Walter 2008). Accordingly, “physicians’ perceptions of the threat to their professional autonomy are very important in their reaction to EHR adoption” (Boonstra and Broekhuis 2010). In addition to a perceived lack of control, many physicians are skeptical about the capability of EHRs to improve patient outcomes or quality of care. In fact, “more than half (58.1%) of the physicians without an EHR doubt that EHRs can improve patient care or clinical outcomes” (Boonstra and Broekhuis 2010). Finally, an important facet of the psychological determinant is the need for support and empathy from fellow adopters of EHR systems.
Nambisan (2014) applies Social Cohesion Theory to highlight the importance of “empathetic communication between the adopter and the laggard”. In the context of EHR adoption, the “higher chance of adoption of the innovation by the laggard” is realized through an appropriate level of social cohesion (Nambisan 2014).


Social

Because physicians cannot function in isolation, their network of relationships is a key determinant of EHR use. Boonstra and Broekhuis (2010) highlight some of the relationships impacting physician behavior, saying “physicians in medical practices work together and cooperate with other parties in the healthcare industry, such as vendors, subsidizers, insurance companies, patients, administrative staff, and managers”. The influence of these relationships has been shown to be of greater importance than was originally thought in EHR adoption research. Vishwanath and Scamurra (2007) point to the unexpected breadth of social determinants found in more recent research, which professes them to be “more expansive and complex and include the lack of community level participation, the lack of involvement of major players such as hospitals, the lack of involvement of major players such as insurance companies, the lack of organizational support, the lack of knowledge/awareness of current or local success stories, and the fact that others do not use or recommend EHRs”. Along these lines, Hennington and Janz (2007) draw from more current technology adoption models, including the Model of PC Utilization, to include social factors for “the individual’s internalization of the reference group’s subjective culture, and specific interpersonal agreements that the individual has made with others, in specific social situations”. More recent research into the social determinant looks to understand the underlying causes of the unique health care culture and the resulting social interactions. Accordingly, Nambisan (2014) believes much of the social determinant is explained by the communication style prevalent in health care settings, finding that an “important factor that affects learning is the mode of communication in the health care organization”.
Specifically, Nambisan (2014) argues that “face to face or phone [communication prevalent in health care facilities] has been found to be highly interruptive and is a leading cause of errors”. The specific reason for the ineffectiveness of this communication style is that “communication among employees in a hospital environment often leads to interruption driven work contexts, where miscommunication or ineffective


communication is the norm” (Coiera and Tombs 1998). Obviously, the knowledge sharing that facilitates EHR meaningful use is a challenge to achieve when constrained by this type of communication environment.

Legal

The legal determinant of EHR use relates to (a) the electronic storage and transfer of confidential health care information and (b) the various reporting requirements for compliance with health care agency mandates. Interestingly, in earlier research, concerns over the privacy of patient information were rated lowest for those practices with EHRs (Gans et al. 2005). Initial concern over the security of EHR information may have been lessened by the lack of enforcement measures provided with the Health Insurance Portability and Accountability Act of 1996, which was intended to help protect confidential health care information. In fact, “HIPAA enforcement and compliance changed in a dramatic way after 2009. As part of the American Recovery and Reinvestment Act (ARRA), Congress passed in 2009 the Health Information Technology for Economic and Clinical Health Act (HITECH). The HITECH Act greatly strengthened HIPAA by dramatically increasing the penalties for HIPAA violations — up to $1.5 million for a violation in certain circumstances” (library.ahima.org 2014). Research describes two concerns for physicians over the privacy of patient information in the context of EHR usage. First is providing adequate protection to safeguard a patient’s confidential medical information in compliance with HIPAA regulations. Second is the need to maintain access to and usability of this information for proper patient care. Often these two concerns conflict, as heightened information security may result in reduced information access.
Randaree (2007) explains this by saying “Medical data availability has to be balanced between access and privacy . . . Maximum controls may inhibit physicians from performing their job; minimal controls could leave patient information vulnerable to theft and misuse”. The increased focus in the health care industry on protecting patient privacy, and the more stringent regulations along those lines, have heightened the significance of this determinant for physicians. Again, Randaree (2007) argues that while “patient record management is not new to the medical industry, information technology


(IT) solutions using EHRs have special considerations relating to the security of patient information (HIPAA, privacy, firewalls, virus protection, transmission) and performance issues (reliability of services, service levels, customization) as well as long-term impacts (storage, computer upgrades, data efficacy)”. Ironically, Boonstra and Broekhuis (2010) find that physicians are more concerned than patients with the security of patient information, saying “physicians are more concerned about this issue than the patients themselves . . . among the physicians who do use EHRs, most believe that there are more security and confidentiality risks involved with EHRs than with paper records”. The second component of the legal determinant for EHR use is the anticipated benefit from EHR systems for compliance with federal and health care agency programs. Ford et al. (2006) state that “The policy mechanism most commonly discussed for increasing EHR’s external influence coefficient in the United States is the introduction of clinical reporting mandates. As reporting requirements increase, the only feasible mechanism for gathering such data will be the EHR”. While this may initially be viewed as having a positive impact on EHR meaningful use, Ford et al. (2006) caution that “while such programs may be of some use, they may not advance the goal of full EHR adoption significantly, because U.S. providers tend to respond negatively to such mandated use policies”. An example of a health care program requiring enhanced reporting capabilities is National Committee for Quality Assurance (NCQA) accreditation. This program is claimed to be “the most comprehensive evaluation in the industry, and the only assessment that bases results of clinical performance (i.e., HEDIS measures) and consumer experience (i.e., CAHPS measures)” (ncqa.org 2014).
NCQA accreditation is becoming a necessity for competitive health care facilities, and EHRs are believed to ease the reporting burdens of this program. Obviously, the “meaningful use” requirements under the HITECH Act necessitate similar enhanced reporting capabilities available through EHR systems. Care should be taken when evaluating the reporting component of this determinant, as many of the physicians experienced with EHR usage have not realized the anticipated reporting benefits. Randaree (2007) indicates that customizable and enhanced reporting was not there


for many practices, requiring physicians to incorporate “new [and] redundant work flows” for reporting purposes.

Organizational

The facets of the organizational determinant of EHR use considered by the referential literature include (a) organization characteristics such as size, clinic type or scope of services (inpatient, outpatient, ambulatory, etc.), and specialty (psychiatry, pediatrics, orthopedics, etc.); (b) the extent of separation, both social and professional, between physicians and staff; and (c) the organizational structure in terms of horizontal or vertical orientation. First, the most consistent organizational determinant is practice size, measured either by the number of physicians or patients. Gans et al. (2005) corroborate this, saying “the percentage of practices with EHRs differs greatly by size of practice”. Specifically, practice size impacts EHR adoption because “larger practices could spread the sizeable fixed cost of purchase and implementation over more physicians” (Burt and Sisk 2005). Burt and Sisk (2005) contend that ownership type (physician or physician group; health maintenance organization (HMO); and all other health care organizations) is what really drives EHR adoption, arguing “Physician-owned practices have low probabilities of using EHRs no matter what the size.” Therefore, Burt and Sisk (2005) maintain that smaller practices lag larger practices in EHR adoption due to their inability to fund EHRs, rather than some other intrinsic property of smaller practices. Boonstra and Broekhuis (2010) confirm this, saying “physicians who are employed by or contracted to a medical practice are more likely to use EHRs than those who own their own practices”.
Burt and Sisk (2005) also propose that the unique scope of services provided by practices impacts EHR adoption, where “scope of services may also influence adoption, to the extent that EHRs offer the potential for practices with a wider range of services to achieve greater efficiencies”. However, this hypothesis was later disproven by Burt and Sisk (2005) because “neither the scope of services, as measured by single versus multispecialty practice, nor broadly defined categories of specialty, as measured by primary care, medical, or surgical, were significantly associated with use”.


They did find “use varied by specific physician specialty, however, with psychiatrists and dermatologists least likely and orthopedic surgeons and cardiovascular disease specialists most likely to use EHRs” (Burt and Sisk 2005). The second component of the organizational determinant deals with the debilitating effect, in many practices, of the wide social and professional gulf between physicians and support staff. Research indicates that support staff are often more receptive to EHR adoption as a means to improve process efficiency and reduce paperwork. In fact, adoption interventions often have a greater effect when geared toward support staff instead of physicians. For example, Vishwanath and Scamurra (2007) explain that an “intervention aimed at alleviating integration issues is better off training support staff”. In the same vein, a hierarchical organization structure reinforcing the separation of physicians and support staff is especially detrimental to EHR adoption, and necessitates effective networking and “a more open and less hierarchical environment for peer to peer interactions and knowledge transfer” (Nambisan 2014). The role of an effective organizational structure is further highlighted by Hennington and Janz (2007) when claiming EHR adoption requires the existence of both “an organizational and technical infrastructure” that would support actual usage. A more collaborative relationship between physicians and support staff, unencumbered by a formal hierarchical structure, is necessary because the “ability to share such user innovations and experiences are invaluable during the adoption of new technologies such as EHR” (Nambisan 2014).

Change Process

Reluctance to change involves not only the social determinant of physician “lack of control” discussed earlier, but also physician inadequacies in managing the change process required for EHR adoption.
Boonstra and Broekhuis (2010) frame the argument as “EHRs in medical practices amounts to a major change for physicians who tend to have their own unique working styles that they have developed over years”. As such, physicians anticipate difficulties managing the EHR change process without a facilitating organizational culture, effective incentives, individual and local support,


community level participation, and leadership. The magnitude of the change process is determined by the extent of alignment between existing business processes and the functionality of the chosen (and possibly customized) EHR system. Given the mindset of most physicians described previously, “physician concerns surrounding EHR adoption center more on integrating EHRs into existing clinical processes than on the need to fundamentally alter those processes” (Hennington and Janz 2007). This approach of making EHRs fit existing processes results in significant frustration for physicians, patients, and support staff, and is compounded by EHR system designers unfamiliar with procedures unique to the health care industry. An example of an EHR system poorly equipped to match existing health care processes is described by Hennington and Janz (2007), who “cited EHR designers’ poor understanding of clinical workflows as one reason for low EHR adoption rates among chest physicians, noting system designers’ failure to recognize the role of group interaction in the clinical process”. The consequence of poorly designed or incompatible EHR systems is costly and time-consuming customization. Miller and Sim (2004) maintain that many times “EHR hardware and software cannot simply be used ‘out of the box.’ Instead, physician practices must carry out many complex, costly, and time-consuming activities to ‘complement’ an EHR product”; hence, customization. The need for EHR customization is compounded by the lack of physician and support staff expertise and time, as described earlier. Most physicians in smaller practices find they have insufficient resources to “customize the system and make it do what I want it to do, and potential inability to customize the software, reports, and outputs to my satisfaction” and are thus reluctant to proceed with EHR adoption (Vishwanath and Scamurra 2007).
Environmental

The environmental determinant of EHR adoption mainly concerns the unique payment models under which physicians function. Basically, physician reimbursement follows one of two approaches. First is the more traditional “Fee for Service” (FFS) model, where healthcare services are unbundled and “doctors and other health care providers receive a fee for each service such as an office


visit, test, procedure, or other health care service” (opm.gov 2014). The second payment model is “outcomes based” and accounts for the quality of care afforded the patient. Essentially, this is a new payment model where “Medicare and Medicaid, as well as private insurance plans, are shifting payment to what is referred to as ‘accountable care’ where physicians are paid for quality rather than the quantity of care delivered” (creators.com 2012). Clearly, reimbursement under the first approach, which tracks service volume, removes the incentive to realize the efficiencies promised by EHRs. The environmental determinant further includes the technical factors external to practices that impact EHR adoption. These technical factors include interoperability standards, open source EHR systems, and single-patient EHR accessibility for multiple providers.

Third Party Intervention

Another usage determinant concerns the impact of intervening third parties on EHR adoption. Previously, the impacts of several “third parties” were touched upon by other determinants (e.g., government intervention with the legal determinant and third-party vendor relationships with the change process determinant). The overall conclusion from these analyses is that a physician’s anticipated effort in dealing with all the third-party interventions associated with EHR adoption is consequential. It is important to note that IT efforts especially, including complex system installations, are onerous to most physicians, and often necessitate extensive third-party involvement for training, technological support, or additional reasons. Managing these and other third-party relationships is obviously an area of concern to physicians without the time or expertise to effectively perform vendor management tasks.

Summary

Upon reviewing the reference literature, several advances in our understanding of the determinants of EHR use within the health care industry are apparent.
Because of the unique nature of the health care


industry, determinants of health care IT adoption are different from those experienced by more traditional industries. Research has shown that the reasons originally suggested for physician reluctance to embrace EHRs may not have explained the issue properly or completely. Most significant of the adoption determinants unique to health care are (1) “physician autonomy” and its significant adoption impact, (2) the psychological and social context of the adopting organization, and (3) the cost dynamics for EHR adoption. We must shift our focus from the more common technical and financial IT adoption determinants to those determinants of significance to health care. Accordingly, chiefly financial or technical incentives for adoption have proven to be less effective within health care. What seems of greater importance to health care adoption is the role of “physician autonomy” and the need to properly understand and mitigate physician desire for process control. Along those lines, the traditional hierarchical structure of small physician practices, the negative impact of physician peer-to-peer interaction on EHR adoption, and the obvious difference in the level of EHR receptivity between physicians and support staff clearly demonstrate that “physician autonomy” is of more significance than originally thought. Hence, Venkatesh and Sykes (2011) argue that “doctors are likely to develop negative views towards the new system because it could pose a threat to their autonomy and power”. Additionally, the connectivity and communication style of physicians, and their power to change peer behavior, make their resistance to EHR adoption even more problematic. As such, “central doctors . . . those who interact a great deal with doctors for advice, information, and knowledge related to performing their work” have been shown to have a negative EHR adoption impact on their immediate peers (Venkatesh and Sykes 2011).
Finally, based on case studies, physicians find unexpected patterns and magnitudes of costs related to EHR adoption. For example, unforeseen costs realized by adoption “leaders” after the initial EHR transition phase, such as costs for maintenance and customization, have a significant impact on EHR receptiveness among adoption “laggards.” Furthermore, these ongoing costs are not just limited to vendor management but are incurred to keep the EHR system functional in terms of


“performing updates, monitoring usage, implementing data security, performing data storage, and maintenance of hardware and software” (Randaree 2007).

Clinical Decision Support Systems

Overview

The ultimate objective of the research described in this dissertation is a MES-driven similarity measure functioning as an integral component of clinical decision support systems (CDSSs). As such, this section of the literature review explains what a CDSS is and explores its benefits for health care practitioners. Formally, a CDSS “is an application that analyzes data to help healthcare providers make clinical decisions” (http://searchhealthit.techtarget.com/definition/clinical-decision-support-system-CDSS). More specifically:

“A CDSS is an adaptation of the decision support system commonly used to support business management. Physicians, nurses and other healthcare professionals use a CDSS to prepare a diagnosis and to review the diagnosis as a means of improving the final result. Data mining may be conducted to examine the patient's medical history in conjunction with relevant clinical research. Such analysis can help predict potential events, which can range from drug interactions to disease symptoms. Some physicians prefer to avoid over-consulting their CDSS, instead relying on their professional experience to determine the best course of care.” (http://searchhealthit.techtarget.com/definition/clinical-decision-support-system-CDSS)

Essentially, a CDSS provides an environment under which health care providers can incorporate data mining and analytics tools to assist with clinical decision-making. The institutionalized use of a CDSS requires:

1. Development and implementation of CDSS tools with clinical decision-making value.
2. An environment under which CDSS tools can be effectively used by health care providers.
3. Adoption intent by health care providers who perceive CDSS tools to be useful and easily used.
We present a MES similarity measure as a CDSS tool providing significant value for clinical decision-making to improve patient outcomes.


CDSS Adoption Factors

Specific to this research is an understanding of the forces behind the adoption of clinical decision support systems. The factors influencing CDSS adoption are somewhat different from those for general IT and more transaction-level system adoption within health care. However, the paradox is similar for CDSS adoption in that a technology with obvious benefits is being reluctantly used. Liberati (2017) observes that “Although this technology [CDSS] has the potential to improve the quality of patient care, its mere provision does not guarantee uptake: even where CDSSs are available, clinicians often fail to adopt their recommendations.” Appendix D outlines the more important factors for CDSS-specific IT adoption. Interestingly, physician autonomy, as discussed previously, is an important consideration in CDSS adoption. To confirm, Liberati (2017) states “the most severe barriers (prevalent in the first positions) include clinicians’ perception that the CDSSs may reduce their professional autonomy or may be used against them in the event of medical-legal controversies.” What is therefore needed are CDSS tools capable of convincing physicians that patient outcomes could improve, and professional autonomy be maintained, with their use. Accordingly, this literature review continues with an overview of similarity measures as a potential CDSS tool to further mitigate physician IT hesitancy by proving its value in diagnosis and treatment.

Similarity Measures and Medical Event Sequences

We are motivated to develop similarity measures appropriate for MESs to support and broaden health care analytics. In reviewing related research, we find a general lack of understanding and reference studies concerning the application of similarity measures to MESs. Furthermore, similarity measures established for state sequences in other domains fail to adequately address the unique characteristics of MESs.
For example, MES characteristics of importance to similarity measures, and ignored by referential similarity measures, include (1) event likelihood and replication, (2) the


distribution of temporal gaps between medical events, and (3) hierarchically structured event codes characterizing events. Furthermore, other similarity measures include event sequence features, such as alignment or sentinel events, that may not be of relevance for MES analysis. During our survey of referential similarity measures, we focused on their inclusion and application of event matching and temporal structure, features unique and important to MESs. We find that some similarity measures do, in fact, address unique MES features for event alignment, temporal event and gap duration, and the significance of edit distance. First, event alignment is used in some similarity algorithms to construct a relative time scale for comparing two event sequences. Usually, event alignment similarity involves a sentinel event that serves as a reference point, with this sentinel event being chosen by the user or based on some other criteria. For example, the Match/Mismatch measure requires the entry of a sentinel event, with the timing of other events expressed relative to their duration from the sentinel event (Wongsuphasawat 2009). The ARTEMIS similarity measure also incorporates event alignment. This temporal alignment of event sequences seems especially appropriate to MES analysis, given that a triggering medical event can serve as a meaningful reference point for the subsequent medical events comprising a patient’s MES. Second, temporal structure, in terms of event durations and gaps between health care incidents, is integral to MESs, not only for event matching, but also for ascertaining the acuity of a patient’s condition. For example, a longer inpatient event duration (length of stay) may indicate greater severity, and a longer period between inpatient admissions (duration gap) may indicate less severity.
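To ground these two temporal features, the short sketch below derives event durations, inter-event gaps, and sentinel-aligned offsets from a small illustrative inpatient sequence. The tuple layout and variable names are assumptions made for this sketch only; the computation is not part of any cited measure.

```python
from datetime import date

# Illustrative MES: (event code, admission date, discharge date).
# This (code, start, end) layout is an assumption for the sketch.
mes = [
    ("49122", date(2008, 2, 7),  date(2008, 2, 10)),
    ("49121", date(2008, 2, 28), date(2008, 3, 4)),
    ("78906", date(2008, 3, 25), date(2008, 3, 29)),
]

# Event durations (length of stay): longer stays may indicate greater severity.
durations = [(end - start).days for _, start, end in mes]

# Gaps between discharge and the next admission: longer gaps may indicate less severity.
gaps = [(nxt[1] - cur[2]).days for cur, nxt in zip(mes, mes[1:])]

# Sentinel alignment: re-express each start as days since the first (triggering) event.
sentinel = mes[0][1]
aligned = [(code, (start - sentinel).days) for code, start, _ in mes]

print(durations)  # [3, 5, 4]
print(gaps)       # [18, 21]
print(aligned)    # [('49122', 0), ('49121', 21), ('78906', 47)]
```

Either feature set can then feed a similarity comparison: durations and gaps capture the acuity signal described above, while the aligned offsets place two sequences on a common relative time scale.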
Along those lines, an exploration of referential literature reveals a prototypical similarity measure, the Optimal Temporal Common Subsequence, that incorporates the key MES features of temporal structure (Zheng 2010). Third, the use of edit distance for measuring the cost (or distance) required to transform one event sequence into another may not have the same significance for MESs as for other event sequences. Specifically, edit distance quantifies dissimilarity between two strings by counting the minimum number of operations required to transform one string into the other. Edit distance applies


when there is no intrinsic meaning for the characters or codes representing the events within a sequence. This is not the case for MESs, where a diagnosis, procedure, or drug code holds intrinsic meaning about the patient’s health care event. As such, one cannot simply change the value of a diagnosis code appearing in one patient’s MES to match a diagnosis code appearing in another patient’s MES in order to compute an edit distance. Doing so would lose information about the event itself, and this medical event information is crucial to MES-based similarity. In summary, there is limited application of similarity measures to MESs. Much of the previous research concerning similarity measures focuses on media retrieval through information recognition measures. This type of work treats event sequences as ordered lists without considering the unique temporal and coding facets of MESs. However, some components of referential similarity measures (OTCS, for example) do seem applicable to MES analysis, and their methods should be incorporated as appropriate.
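To make the contrast concrete, the sketch below implements the classical Levenshtein edit distance over sequences of codes, alongside a hypothetical prefix-based score that respects the hierarchical structure of ICD-9 codes. Both functions are illustrative assumptions for exposition and do not reproduce the OTCS-MES matching component.

```python
def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions turning sequence a into b.
    Each code is treated as an opaque symbol, so its clinical meaning is discarded."""
    dp = list(range(len(b) + 1))  # row of costs for the empty prefix of a
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            # deletion, insertion, or substitution (free when codes match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def code_similarity(c1, c2):
    """Hypothetical hierarchy-aware score: fraction of leading characters shared.
    ICD-9 codes are hierarchical, so a longer shared prefix means closer diagnoses."""
    shared = 0
    for a, b in zip(c1, c2):
        if a != b:
            break
        shared += 1
    return shared / max(len(c1), len(c2))

# Edit distance charges both substitutions the same unit cost ...
print(edit_distance(["49121"], ["49122"]))  # 1 (sibling bronchitis diagnoses)
print(edit_distance(["49121"], ["5990"]))   # 1 (unrelated diagnosis)
# ... while the prefix score distinguishes sibling diagnoses from unrelated ones.
print(code_similarity("49121", "49122"))    # 0.8
print(code_similarity("49121", "5990"))     # 0.0
```

The equal unit costs in the first two calls are exactly the information loss described above: the clinical closeness of sibling codes is invisible to an edit-distance view of the sequence.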


CHAPTER II

DEVELOPMENT AND EVALUATION OF A SIMILARITY MEASURE FOR MEDICAL EVENT SEQUENCES

Abstract

In this study, we develop a similarity measure for medical event sequences (MESs) and empirically assess it using U.S. Medicare claims data. Existing similarity measures do not use unique characteristics of MESs and have never been evaluated on real MESs. Our similarity measure, the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES), provides a matching component that integrates event prevalence, event duplication, and hierarchical coding, important elements of MESs. The OTCS-MES also uses normalization to mitigate the impact of the heavy positive skew of matching events and the compact distribution of event prevalence. We empirically evaluate the OTCS-MES measure against two other measures compatible with MES analysis: the original OTCS and Artemis, a measure incorporating event alignment. Our evaluation uses two substantial data sets of Medicare claims data containing inpatient and outpatient sequences with different medical event coding. We find a small overlap in nearest neighbors among the three similarity measures, demonstrating the importance of each unique aspect of a MES. The evaluation also provides evidence about the impact of component weights, neighborhood size, and sequence length. This study includes an expanded literature review about the enabling attributes of electronic health records (EHRs) for data mining, the application of similarity measures to clinical decision-making, and general IT adoption barriers within health care. Overall, this exploration of MES similarity measure reasoning is undertaken with the understanding that pervasive EHR/CDSS use is fundamental to effective and transformational health care IS research.

Introduction

State sequences occur naturally in many circumstances and lend themselves to advanced data mining techniques. A state sequence is “a sequence of data, measured and/or spaced typically at


successive times, which can be either points or intervals” (Zheng et al. 2010). Examples of common state sequences analyzed through data mining techniques include temporally spaced account payment statuses, audio pattern matching, and video information retrieval (Zheng et al. 2010). State sequence research applies to several fields including “geo-informatics, cognitive science, linguistic analysis, music and medicine” (Kostakis 2011). This dissertation emphasizes the analysis of state sequences containing patient incidents or medical event occurrences within the health care industry. In this research, medical event sequences (MESs) are defined as state sequences relevant to health care. The advent of electronic health records (EHRs) has enabled the abstraction and retention of the standard diagnosis, procedure, and pharmaceutical codes characterizing MESs. Table 1 provides an example of an inpatient (IP) MES, where a sequence of hospital admissions is characterized by the International Classification of Diseases Version 9 (ICD-9) primary diagnosis code associated with each event. Table 1.
Sample Inpatient MES (Member ID 00824B6D595BAFB8)

  Primary ICD-9 Code  Description                                               Event Start  Event End
  49122               OBSTRUCTIVE CHRONIC BRONCHITIS WITH ACUTE BRONCHITIS      2/7/2008     2/10/2008
  49121               OBSTRUCTIVE CHRONIC BRONCHITIS WITH (ACUTE) EXACERBATION  2/28/2008    3/4/2008
  78906               ABDOMINAL PAIN EPIGASTRIC                                 3/25/2008    3/29/2008
  4280                CONGESTIVE HEART FAILURE UNSPECIFIED                      3/29/2008    3/30/2008
  7802                SYNCOPE AND COLLAPSE                                      5/14/2008    5/16/2008
  27651               DEHYDRATION                                               6/11/2008    6/13/2008
  5990                URINARY TRACT INFECTION SITE NOT SPECIFIED                7/13/2008    7/18/2008
  5070                PNEUMONITIS DUE TO INHALATION OF FOOD OR VOMITUS          8/14/2008    8/21/2008

Motivation

As EHR adoption expands, the health care industry can realize increased "productivity for doctors and nurses, better information for decision making, better product/service customization, higher quality patient outcomes, and better service" (Skinner 2003). Data warehouses of clinical information provide a "foundation for a learning healthcare system that facilitates clinical research, quality improvement, and other data driven efforts to improve health" (Hersh 2013). Data warehouses


can provide economies of scale to support comparative effectiveness research (CER), which aims to study populations and clinical outcomes of maximal pertinence to real-world clinical practice (Hersh 2013). While adoption of EHR data warehousing and analytics is progressing slowly, the health care industry understands that "the investment in EHRs and supporting data warehouses is fundamentally required to achieve the value that is accessible in analytics" (Sanders 2013).

A similarity measure for MESs is important for a variety of reasoning tasks used by health care professionals and data mining algorithms. Informally, a similarity measure can be used by health care professionals to develop treatment plans based on similar patients. A similarity measure can also be used in data mining algorithms for risk assessment, co-morbidity determination, and conformance to clinical pathways. Existing similarity measures fail to adequately address the unique characteristics of MESs, including event likelihood and replication, the distribution of temporal gaps between medical events, and hierarchically structured event codes. Furthermore, many similarity measures include other components, such as alignment or sentinel events, that do not have comparable significance for MESs.

Summary of Work

To explore the potential benefit of MESs to health care professionals and data mining algorithms, this research develops an improved similarity measure applicable to MESs and evaluates the measure's effectiveness through formal experimentation. To provide context for the OTCS-MES measure, we explain the components of MESs requiring special consideration when comparing health care incident sequences. We then present the OTCS-MES measure, an enhanced version of the Optimal Temporal Common Subsequence (OTCS) measure developed by Zheng et al. (2010). The OTCS-MES substantially extends the original OTCS with components for matching individual events and temporal structure.


We evaluate the OTCS-MES measure in an application- and algorithm-independent manner using substantial samples of MESs. The evaluation uses two measures of nearest neighbor overlap, both independent of application and algorithm. Assessment of overlap among nearest neighbors provides insights for follow-on analysis of performance by application and algorithm. We evaluate the OTCS-MES measure, the original OTCS measure, and Artemis, a similarity measure based on event alignment for MESs. Our empirical evaluation utilizes substantial samples of inpatient and outpatient claims data from the Data Entrepreneurs' Synthetic Public Use Files (DE-SynPUF) available through the Centers for Medicare and Medicaid Services (CMS). Since its introduction, CMS SynPUF data has been widely used in health care research and analytical studies. Archimedes Inc., a prominent health care modeling and analytics company, collaborated with the Centers for Medicare & Medicaid Services to simplify and expand access to the same synthetic CMS claims data used in this study. Using two measures of nearest neighbor overlap, our empirical evaluation provides evidence of substantial differences among the three similarity measures. We also demonstrate the internal consistency of OTCS-MES components and the impact of weights, neighborhood size, and sequence length.

Contributions

Our research provides innovation in the design of an adapted similarity measure (OTCS-MES) specific to medical events, characterization of MESs to improve the design of similarity measures, and empirical comparison of similarity measures in an application- and algorithm-independent manner using substantial samples of actual claims data. For the first contribution, the OTCS-MES, a new design science artifact, combines two independent components for matching events and temporal structures using simple weights.
The event matching component integrates event prevalence, event duplication, and hierarchical coding, important elements in MESs. For the second contribution, the empirical characterization of MESs uses a substantial sample of medical claims data to indicate the impact of heavy positive skew in the number of matching events and the compact distribution of event


prevalence. Normalization used in the event matching component of the OTCS-MES mitigates these impacts. The insights of this empirical characterization can also improve the design of other similarity measures for event sequences. For the third contribution, the empirical comparison of the OTCS-MES to other similarity measures uses two overlap measures that are application- and algorithm-independent. The overlap measures allow comparison of nearest neighbors generated by the proposed OTCS-MES, the original OTCS, and Artemis. The empirical comparison involves a substantial sample of real medical claims data, both inpatient and outpatient, with different medical coding. The analysis provides strong evidence of substantial differences in overlap between the OTCS-MES and the other similarity measures and of the impact of weights, neighborhood, and sequence size on overlap. The insights from this empirical comparison provide guidance for evaluating the performance of similarity measures for MESs in important applications and algorithms.

Related Work

Similarity measures quantify our "naive judgments of likeness" between entities (France 1994). In the health care industry, similarity measures quantify patient likeness to help with patient classification for improved outcomes and risk (cost) prediction. Research concerning similarity measures for MESs leverages ontologies. An ontology is "a vocabulary of terms and some specification of their meaning". An ICD code set plays a comparable role when describing the events within an inpatient MES. Essentially, our objective is "a function that, given two ontology terms or two sets of terms [MESs] annotating two entities [patients], returns a numerical value reflecting the closeness in meaning between them" (Couto 2013). Existing similarity measures incorporate several components relevant to MES evaluation: (1) event alignment, (2) temporal event and gap duration, and (3) symbol substitution in edit distance.
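Such a function can be illustrated with a deliberately naive baseline, Jaccard similarity over two patients' code sets. This sketch is ours, not one of the measures evaluated in this study, and it ignores prevalence, hierarchy, and time, exactly the gaps the OTCS-MES is designed to address:

```python
def term_set_similarity(codes_a, codes_b):
    """Naive baseline: Jaccard similarity between the sets of event codes
    annotating two patients. Ignores prevalence, hierarchy, and time."""
    a, b = set(codes_a), set(codes_b)
    if not (a | b):
        return 1.0  # two empty sequences are trivially identical
    return len(a & b) / len(a | b)
```

Two patients sharing two of four distinct codes score 0.5 under this baseline, regardless of how rare the shared codes are or when the events occurred.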
Event alignment constructs a relative time scale for comparing two event sequences based on a sentinel matching event. For example, the Match/Mismatch measure requires a sentinel event category for


aligning other events based on their time from the sentinel event (Wongsuphasawat 2009). This type of timeline alignment seems appropriate for MESs, as a triggering event can serve as a reference point for establishing a relative MES time scale. The Artemis measure in our empirical evaluation incorporates event alignment.

Temporal duration and the gap between health care incidents help with ascertaining the acuity of a patient's condition. For example, longer inpatient admissions may indicate a more severe condition, and longer periods of time (gaps) between inpatient admissions may indicate a less acute condition. Consequently, we select the OTCS similarity measure because it utilizes both temporal event duration and temporal event gap (Zheng 2010).

Symbol substitution, used in edit distance to transform event sequences, may not have the same utility for MESs as for other types of event sequences. Edit distance applies when the symbols representing events within a sequence have no intrinsic meaning. In an MES, however, a diagnosis code provides intrinsic meaning about a patient. One cannot arbitrarily change the value of a diagnosis code appearing in one patient's MES to match a diagnosis code appearing in another patient's MES and count that as a single step in an edit distance.

Although some aspects of MESs are addressed by other similarity measures, these measures largely ignore important MES aspects including (1) the intrinsic meaning associated with the abstracted codes characterizing events in an MES, (2) the temporal gap and duration features of medical incidents, and (3) partial matching from hierarchical event coding. An MES is characterized by abstracted medical event codes, each with an intrinsic meaning; therefore, the codes comprising an MES cannot be randomly exchanged to measure event sequence similarity. For the other two aspects, most similarity measures neglect to capture the temporal components of event sequences.
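Partial matching on hierarchical event codes, the third aspect listed above, can be sketched as prefix comparison on ICD-9 digit strings. This is an illustrative simplification of ours (the function name and level scheme are not from the literature), treating codes as strings with 3-, 4-, and 5-digit levels:

```python
def hierarchical_match_level(code_a, code_b, levels=(5, 4, 3)):
    """Deepest ICD-9 digit level (5, 4, or 3) at which two codes share a
    prefix, or 0 when they do not match even at the 3-digit heading level.
    Codes reported with fewer digits can still match at shallower levels."""
    for n in levels:
        if len(code_a) >= n and len(code_b) >= n and code_a[:n] == code_b[:n]:
            return n
    return 0
```

For example, codes 49122 and 49121 match at level 4, crediting the shared subcategory 4912 even though the full five-digit codes differ.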
In general, "time series and state sequences have been simply expressed as ordered lists . . . leaving some critical issues unaddressed"


(Zheng 2010). None of the proposed similarity measures for MESs support hierarchical coding schemes. Jordan (2004) emphasizes the significance of hierarchical coding levels, writing that "recording diagnoses only at diagnostic heading [level] may improve the accuracy of recorded data but is too general for most research or clinical purposes". Accordingly, each hierarchical level of the ICD diagnosis code is important to similarity matching, as it contains additional information about each event and, ultimately, the patient.

In summary, there is limited prior application of similarity measures to MESs. Much of the previous research concerning similarity measures focuses on media retrieval through information recognition algorithms. This type of work treats event sequences as ordered lists without considering the unique attributes of MESs. In contrast, we propose a measure that incorporates those unique attributes. The OTCS considers the temporal components of event sequences and the importance of matched events, and thus serves as the starting point for our OTCS-MES measure specific to medical event sequences. Our selected comparative measure, Artemis, incorporates temporal matching by defining and quantifying common temporal relationships between event intervals (meet, match, overlap, contain, and follow) (Kostakis 2011).

OTCS-MES Similarity Measure

Representation of MESs

We follow Zheng's (2010) representation of MESs, in which a temporal event t1 begins at time point p1 and ends at time point q1. The duration of event t1 is then simply q1 minus p1, and the preceding temporal gap is p1 minus q0 (the end point of the prior event). We extend this representation to MESs with ICD-9 codes as shown in Table 2, an extension of Table 1. For the second inpatient event (with primary ICD diagnosis code 49121), the starting point p2 is 2/28/2008, the ending point q2 is


3/4/2008, and the prior event ending point q1 is 2/10/2008. Given these values, the event duration is 5 days (q2 minus p2) and the event gap is 18 days (p2 minus q1). ICD-9 codes contain three levels, from three digits to five digits. The primary ICD-9 code contains the maximum digits specified, as providers sometimes report only 3- or 4-digit codes. For example, the second event has all 5 digits (49121), while the fourth event has only 4 digits (4280).

Table 2. Sample IP MES with Event Duration and Gap (all events belong to Member ID 00824B6D595BAFB8)

  Primary ICD-9 Code  Description                                               Event Start  Event End   3-Digit  4-Digit  5-Digit  Event Duration  Event Gap
  49122               OBSTRUCTIVE CHRONIC BRONCHITIS WITH ACUTE BRONCHITIS      2/7/2008     2/10/2008   491      4912     49122    3               0
  49121               OBSTRUCTIVE CHRONIC BRONCHITIS WITH (ACUTE) EXACERBATION  2/28/2008    3/4/2008    491      4912     49121    5               18
  78906               ABDOMINAL PAIN EPIGASTRIC                                 3/25/2008    3/29/2008   789      7890     78906    4               21
  4280                CONGESTIVE HEART FAILURE UNSPECIFIED                      3/29/2008    3/30/2008   428      4280     4280     1               0
  7802                SYNCOPE AND COLLAPSE                                      5/14/2008    5/16/2008   780      7802     7802     2               45
  27651               DEHYDRATION                                               6/11/2008    6/13/2008   276      2765     27651    2               26
  5990                URINARY TRACT INFECTION SITE NOT SPECIFIED                7/13/2008    7/18/2008   599      5990     5990     5               30
  5070                PNEUMONITIS DUE TO INHALATION OF FOOD OR VOMITUS          8/14/2008    8/21/2008   507      5070     5070     7               27

OTCS-MES Components

We developed the OTCS-MES based on characteristics of MESs. The original OTCS measure, although motivated by MESs, does not account for prevalence of events, hierarchical coding, or duplication of events, all important aspects of MESs. The OTCS-MES component for event matching integrates event prevalence, hierarchical coding, and replicated events. Using event prevalence, the OTCS-MES assigns higher similarity to rarer events. Prevalence supports hierarchical matching in


medical coding schemes. The OTCS-MES also revises the temporal structure matching component. The original OTCS considered only common events for temporal structure matching. The OTCS-MES matches on the temporal structure of all events, maintaining independence between the event matching and temporal structure matching components. Independence is consistent with MES characteristics including many event codes, no intrinsic meaning to the order of event occurrences, and a typically small number of matching events. The OTCS-MES also uses both mean and variation to measure temporal structure similarity, whereas the original OTCS uses only the sum of gap differences. In contrast to the original OTCS, the OTCS-MES normalizes each component to simplify weight assignment.

Using important characteristics of MESs, the OTCS-MES consists of two major components for event matching and temporal structure matching. The event matching component sums prevalence weights in the numerator and uses a normalizing count of matched events in the denominator. We use prevalence as a score or weight among events matching at possibly different hierarchical levels of a medical coding representation. The temporal structure component contains four elements for differences of mean and coefficient of variation (CV) for duration and gap between two event sequences. We normalize each element to achieve a value between zero and one. The five normalized elements are then weighted to form an overall similarity measure ranging from zero to one, with larger values indicating greater similarity. The computation of each element comprising the OTCS-MES is provided in Equations 1 and 1.1-1.5.

\text{OTCS-MES} = w_1(\text{OTCS-MES}_1) + w_2(\text{OTCS-MES}_2) + w_3(\text{OTCS-MES}_3) + w_4(\text{OTCS-MES}_4) + w_5(\text{OTCS-MES}_5), \qquad \sum_{i=1}^{5} w_i = 1 \qquad (1)

In Equations 1.1 to 1.5,


ME is the set of all events in the pair of cases, C is the set of all cases, and MSSize is the cardinality of the associated set.

Event Similarity (OTCS-MES 1)

\text{OTCS-MES}_1 = \sum_{e=1}^{\mathit{MSSizeLimit}} \mathit{NPW}_e \,\Big/\, (\mathit{PDM} + 1) \qquad (1.1)

where NPW_e is the normalized prevalence weight of event e, PDM is the maximum matched event limit, and MSSizeLimit is the number of event matches in the associated set constrained by the matched event limit.

Temporal Similarity (OTCS-MES 2 through OTCS-MES 5)

\text{OTCS-MES}_2 = 1 - \mathit{AVGDUR}_e \,\Big/\, \max_{c \in C}(\mathit{AVGDUR}_c) \qquad (1.2)

where AVGDUR_e is the average temporal duration difference of all events in the associated set e.

\text{OTCS-MES}_3 = 1 - \mathit{AVGGAP}_e \,\Big/\, \max_{c \in C}(\mathit{AVGGAP}_c) \qquad (1.3)

where AVGGAP_e is the average temporal gap difference of all events in the associated set e.

\text{OTCS-MES}_4 = 1 - \mathit{CVDUR}_e \,\Big/\, \max_{c \in C}(\mathit{CVDUR}_c) \qquad (1.4)

where CVDUR_e is the coefficient of variation difference for duration of all events in the associated set e.

\text{OTCS-MES}_5 = 1 - \mathit{CVGAP}_e \,\Big/\, \max_{c \in C}(\mathit{CVGAP}_c) \qquad (1.5)


where CVGAP_e is the coefficient of variation difference for gap of all events in the associated set e.

Coding of the OTCS-MES Similarity Measure

Development of efficient software to operationalize the OTCS-MES similarity measure is imperative given the large amounts of data envisioned for processing; the number of MES comparisons between entity pairs grows quickly as the number of entities grows. Specifically, the number of comparisons required for a set of N entities (MESs) is N(N - 1)/2. Accordingly, two different algorithms were developed to generate the results for each component of the OTCS-MES similarity measure described above. The first algorithm is a C# module best applied to smaller subsets of MES data; this module is included as Appendix E. The second algorithm is intended for a larger number of MES comparisons. It was developed in the SAS programming language and incorporates features of that language to improve processing efficiency. The SAS program computing OTCS-MES comparison results is included as Appendix F.

Example

To clarify the practical application of the OTCS-MES, we use a simple example to depict each component of the measure. Table 3 provides an example of a high-similarity patient pair. In Table 3, the MES pair has five matching events (<295 in rows 1 and 5>, <295 in rows 2 and 7>, <2953 in rows 3 and 9>, <2953 in rows 4 and 10>, <296 in rows 5 and 11>). Note that two matches occur at the four-digit level, with higher prevalence than three-digit matches. The OTCS-MES uses the prevalence of these matching events rather than just the count. The table also provides temporal component metrics indicating the similarity of these patients based on their MESs.
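The pairwise-comparison count and the weighted combination of normalized components (Equation 1) can be sketched as follows. The example weights are purely illustrative (the study does not prescribe defaults), and the component ordering follows Equations 1.1-1.5:

```python
def pair_count(n):
    """Number of pairwise MES comparisons for n entities: n(n - 1)/2."""
    return n * (n - 1) // 2

def otcs_mes(components, weights):
    """Weighted sum of the five normalized OTCS-MES elements (event
    similarity plus four temporal elements). Each component lies in
    [0, 1] and the weights sum to 1, so the result also lies in [0, 1]."""
    assert len(components) == len(weights) == 5
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * c for w, c in zip(weights, components))
```

For the 7,000-patient inpatient sample used later in this chapter, pair_count gives 24,496,500 comparisons, which is why the SAS implementation emphasizes processing efficiency.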


Table 3. Example of Similar Patients Based on Event Similarity

DE-SynPUF Patient ID: 0278EC3A3183E5A4

  3-Digit  4-Digit  5-Digit  Begin Admit  End Admit   Admit Gap  Admit Duration
  295      2953     29530    8/11/2008    9/1/2008    0          21
  295      2953     29530    9/2/2008     10/6/2008   1          34
  295      2953     29530    12/16/2008   12/18/2008  71         2
  295      2953     29530    1/12/2009    1/16/2009   25         4
  296      2969     29690    6/26/2010    7/6/2010    526        10
  Average                                             124.60     14.20
  Coefficient of Variation                            1.82       0.94

DE-SynPUF Patient ID: A7616FF2567C9EA8

  3-Digit  4-Digit  5-Digit  Begin Admit  End Admit   Admit Gap  Admit Duration
  304      3048     30480    2/12/2008    2/17/2008   0          5
  820      8208              5/16/2008    5/21/2008   89         5
  298      2989              6/3/2008     6/19/2008   13         16
  730      7302     73028    7/8/2008     7/12/2008   19         4
  295      2957     29570    7/26/2008    8/4/2008    14         9
  296      2962     29620    10/18/2008   11/22/2008  75         35
  295      2957     29570    12/2/2008    1/6/2009    10         35
  283      2839              2/24/2009    2/25/2009   49         1
  295      2953     29534    10/15/2009   11/9/2009   232        25
  295      2953     29532    10/17/2009   10/29/2009  0          12
  296      2962     29624    12/16/2010   12/27/2010  413        11
  Average                                             83.09      14.36
  Coefficient of Variation                            1.55       0.85

Normalization in the Event Similarity Component

Two issues surfaced when devising the event similarity component of the OTCS-MES measure. The first issue involves the impact of a heavily, positively skewed distribution when normalizing the number of event matches based on the maximum number of event matches across all patient pairs. A heavily skewed distribution can make most MESs look dissimilar even when they share several common events. For the inpatient MESs used in our analysis, 99% of MES pairs with at least one common event have five or fewer matched events, as indicated in Figure 1. However, some patient pairs have twenty or more matched events. An analysis of matched events for patient pairs based on outpatient procedures yields a similar distribution (Figure 2).
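The gap, duration, average, and coefficient-of-variation columns of Table 3 can be reproduced from admission dates with a short sketch. The date handling is our assumption, as is treating the first event's gap as zero and flooring the gap at zero for overlapping admissions (which the second patient's tenth row suggests); the coefficient of variation uses the sample standard deviation:

```python
from datetime import date
from statistics import mean, stdev

def gap_duration_stats(admissions):
    """admissions: (begin, end) date pairs ordered by begin date.
    Returns per-event gaps and durations in days, plus (mean, CV) of
    each series, where CV = sample stdev / mean."""
    gaps, durations = [], []
    prev_end = None
    for begin, end in admissions:
        # Gap since the prior event's end; first event has no gap.
        gaps.append(0 if prev_end is None else max((begin - prev_end).days, 0))
        durations.append((end - begin).days)
        prev_end = end
    def cv(xs):
        return stdev(xs) / mean(xs)
    return gaps, durations, (mean(gaps), cv(gaps)), (mean(durations), cv(durations))
```

For the first patient in Table 3, this yields an average gap of 124.60 days with a coefficient of variation of 1.82, matching the table.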


To mitigate the impact of heavy positive skew, the matched event limit should be set based on the cumulative frequency of patient pairs by matched events, perhaps at a level above 95%. We set the normalizing denominator to 5 (99%) for inpatient data and 40 (97%) for outpatient data.

Figure 1. Frequency Distribution of Patient Pairs by Number of Matched Inpatient Events

Figure 2. Frequency Distribution of Patient Pairs by Number of Matched Outpatient Events

The second issue involves summation of prevalence weights for matching events. For ICD-9 codes in the Medicare data, the distribution is tight and relatively uniform. Given the large number of ICD-9 codes, the comparative frequency of a single code is rather small and does not appropriately capture the relative rarity of certain incident codes. For example, the total number of


unique 3-digit ICD-9 codes in our claim sample is 528, with the most prevalent code (428) accounting for only 6.87% of the claims. Likewise, there are 1,493 unique 4-digit codes and 1,153 unique 5-digit codes, with the most prevalent codes accounting for only 4.26% (code 4912) and 5.37% (code 49121) of all claims, respectively. To lessen the impact of this tight, relatively uniform distribution, we normalize prevalence values using the difference between the minimum and maximum sum of prevalence weights across all patient pairs with a corresponding number of matching events. Figure 3 shows the range (minimum and maximum) of summed prevalence weights by the number of matching events.

Figure 3. Prevalence Scales by Number of Matched Events for Inpatient MESs

Empirical Evaluation of the OTCS-MES

The empirical evaluation compares the OTCS-MES to two other proposed similarity measures for MESs, the original OTCS and Artemis. The comparison explores differences in the nearest neighbors returned by each measure. Nearest neighbors are used in data mining algorithms for classification and clustering, as well as in clinical decision making by health care providers. Before comparing the measures,


we present characteristics of the data used in the comparison and analyze characteristics of the OTCS-MES.

Data Characteristics

The Centers for Medicare and Medicaid Services (CMS) provides real claims data intended "for data entrepreneurs, for software and application development, and for research training purposes" (cms.gov 2015). Importantly, the use of real CMS claims data provides researchers the capability to extract meaning from the wealth of information contained in abstracted health care event data. Researchers, life science organizations, government agencies, payers, and providers use this data to make more informed decisions based on the actual health care experiences of their constituents. For example, Demand Driven Open Data (DDOD), a framework of "tools and methodologies to provide a systematic, ongoing and transparent mechanism" for the use of publicly available "open" data, leverages CMS SynPUF data. A sample DDOD application identifies "high risk, high cost individuals with the aim of providing them appropriate social services". DDOD recognizes that the CMS SynPUF data "is a realistic and professionally deidentified sample data set to give would-be data entrepreneurs something realistic to try out new applications".

CMS provides inpatient, outpatient, carrier, and prescription drug claims for randomly selected Medicare beneficiaries. Specifically, the files contain synthesized data taken "from a 5% random sample of Medicare beneficiaries in 2008 and their claims from 2008 to 2010" (cms.gov 2015). The files are synthesized in the sense that "a unique unidentifiable ID, DESYNPUF_ID, is provided on each file to link synthetic claims to a synthetic beneficiary" (cms.gov 2015). Our empirical evaluation uses inpatient and outpatient claims from the CMS publicly available data sets. For the inpatient evaluation, we randomly extracted patients and their associated claims across all 20 claims samples prepared by CMS.
The threshold for event sequence length was set to five to help


with evaluation robustness. That is, patients are required to have at least five events to be eligible for random selection. The resulting experimental data set contains 37,448 claims (inpatient admissions) for 7,000 unique beneficiaries or patients.

Clinical outpatient procedures use standardized codes to describe patient services such as tests, surgeries, evaluations, and other medical procedures. The outpatient data set contains Current Procedural Terminology (CPT1) codes, a standard code set maintained by the American Medical Association (AMA). Like ICD codes, CPT codes have a hierarchical organization: CPT codes are organized into sections (6), then subsections (115), and then individual five-digit codes. The outpatient data set used in this study contains 2,600 unique five-digit CPT codes. Similar to ICD codes, CPT codes in our sample have a tight, relatively uniform distribution, with the most frequent procedure code contained in 4.92% of the sample. Thus, we normalize prevalence values for CPT codes in a manner similar to ICD codes.

1 The Centers for Medicare and Medicaid Services uses CPT codes as Level 1 of the Health Care Common Procedure Coding System.

In the CMS data, outpatient MESs have important differences from inpatient MESs. First, outpatient MESs have longer sequence lengths due to the more routine nature of outpatient events. However, the extreme positive skew of the distribution of patient pairs by number of matched events for outpatient event matching (Figure 2) is like that of inpatient MESs. In our sample, 97% of patient pairs matched on 40 events or fewer, but the maximum number of matched events between patient pairs is 699. Second, Table 4 shows that the duration of outpatient events is much less than that of inpatient events,


with 90% of outpatient services occurring on the same day or within two days based on the begin and end service dates appearing on the outpatient claim.

Table 4. Distribution of Outpatient Incidents by Duration (CMS Linkable Medicare Data, DE-SynPUF Outpatient Sample 1, 2,000 Randomly Selected Patients)

  Outpatient Incident Duration (# of Days)   # of Outpatient Incidents   % of Total
  0                                          685,135                     87.89%
  1                                          17,525                      2.25%
  2                                          6,300                       0.81%
  3                                          3,831                       0.49%
  4                                          3,239                       0.42%
  5 or more                                  63,507                      8.15%
  Total                                      779,537                     100.00%

For the outpatient evaluation, a smaller sample of randomly selected patients was used due to the much longer length of the MESs. The resulting experimental data set contained 6,968 claims (outpatient admissions) for 966 patients.

Nearest Neighbor Overlap Measures

The use of overlap measures to evaluate the relative impact on the set of nearest neighbors derived through different similarity measures is widely accepted. The intent of overlap measures is to determine whether the set of entities (nearest neighbors) chosen as most like a reference entity using one similarity measure is in fact different from the set chosen by another similarity measure. Along those lines, we calculate nearest neighbor overlap to determine the equivalence of the most similar (nearest neighbor) patients resulting from each measure or measure adaptation. The computation of nearest neighbor overlap may incorporate simple or rank-weighted measures. The simple overlap measure is the ratio of the number of common neighbors (the cardinality of the set intersection) divided by the total number of possible matches (k). In Table 5, the simple overlap is 3 matching nearest neighbors out of a maximum of 5 possible matches, for an overlap value of 0.6 (3/5).


Table 5. Simple Overlap Measure Example

  Nearest Neighbors for Patient X    Nearest Neighbors for Patient X
  Using Original OTCS                Using OTCS-MES
  Patient L                          Patient K
  Patient K                          Patient F
  Patient C                          Patient G
  Patient F                          Patient L
  Patient Z                          Patient M

  The total number of common nearest neighbors is 3 out of a possible 5. Therefore, the simple overlap measure value is 3 divided by 5, or 0.60.

The weighted overlap measure incorporates the relative rank of each nearest neighbor in addition to the actual number of matching neighbors. Building on the computation provided in Table 5, we now factor in the rank of each nearest neighbor in Table 6. The weighted overlap measure compares the sum of the rank differences for each matching nearest neighbor to the maximum sum of rank differences for the relevant combination of k value and number of matches. Expanding upon the simple overlap example shown in Table 5, we provide an example of the weighted overlap measure computation in Table 6.

Table 6. Rank-Weighted Overlap Measure Example

  OTCS Rank  OTCS Neighbor  OTCS-MES Rank  OTCS-MES Neighbor  Difference in Rank
  2          Patient K      1              Patient K          1
  4          Patient F      2              Patient F          2
  3          Patient C      3              Patient G          -
  1          Patient L      4              Patient L          3
  5          Patient Z      5              Patient M          -

  a. Sum of Rank Differences: 6
  b. Maximum Sum of Rank Differences for k = 5 Nearest Neighbors and 3 Matches: 10
  c. Rank Difference Adjustment Factor (1 - a/b): 0.4

  The total number of common nearest neighbors is 3 out of a possible 5, so the simple overlap measure value is 0.60, and the weighted overlap measure value is the simple overlap multiplied by the rank difference adjustment factor: 0.60 x 0.4 = 0.24.
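The two overlap computations can be sketched as follows. The brute-force search for the maximum rank-difference sum is our own device (practical only for small k; the study does not specify how that maximum is obtained):

```python
from itertools import combinations, permutations

def simple_overlap(a, b):
    """|A intersect B| / k for two equal-length nearest-neighbor lists."""
    return len(set(a) & set(b)) / len(a)

def max_rank_diff_sum(k, m):
    """Largest possible sum of rank differences when m of k neighbors match."""
    best = 0
    for ranks_a in combinations(range(k), m):
        for ranks_b in permutations(range(k), m):
            best = max(best, sum(abs(x - y) for x, y in zip(ranks_a, ranks_b)))
    return best

def rank_weighted_overlap(a, b):
    """Simple overlap scaled by 1 - (sum of rank differences / maximum sum)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    diff = sum(abs(a.index(p) - b.index(p)) for p in common)
    return simple_overlap(a, b) * (1 - diff / max_rank_diff_sum(len(a), len(common)))
```

For the neighbor lists in Tables 5 and 6, simple_overlap returns 0.60 and rank_weighted_overlap returns 0.24, reproducing the worked example.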


Empirical Characteristics of the OTCS-MES

Before comparing the OTCS-MES to other similarity measures, we empirically study its characteristics on the inpatient and outpatient data sets. To understand the impact of OTCS-MES components, we consider both equal and unequal component weighting. Figure 4 shows the frequency distribution of patient pairs based on the OTCS-MES with equal weighting for inpatient data. The distribution shows symmetry with a long, thin tail on the right. Nearest neighbors should be concentrated in this long, thin right tail. Figure 5 shows the frequency distribution of outpatient patient pairs based on the OTCS-MES measure with equal weighting. The distribution in Figure 5 shows less symmetry, but still a long, thin tail at the right end. The long, thin tails in the frequency distributions for both data sets demonstrate that the OTCS-MES can help differentiate MESs with high similarity.

Figure 4. Frequency Distribution of Patient Pairs by OTCS-MES (Inpatient Data)


Figure 5. Frequency Distribution of Patient Pairs by OTCS-MES (Outpatient Data)

For unequal component weighting, the analysis helps determine whether small weight changes have a disproportionate impact on nearest neighbors. Analysis of component weighting for the inpatient-based OTCS-MES measure indicates a reasonably consistent set of nearest neighbors. The graphs in Figures 6 and 7 show relatively flat, linear slopes, indicating that small changes in weights have comparatively small impacts on overlap between the nearest neighbor populations generated by the OTCS-MES. Figure 6 indicates that simple nearest neighbor overlap exceeds 65% when adjusting the weighting up or down by 0.2 or less. Additionally, Figure 7 indicates that rank-weighted nearest neighbor overlap exceeds 45% when adjusting the weighting up or down by 0.2 or less. Furthermore, extreme weighting of either the event similarity or structural similarity component still shows at least a 25% nearest neighbor overlap. This overlap indicates that the OTCS-MES remains relatively insensitive to component weighting levels.


Figure 6. Simple Overlap Using Unbalanced Weighting (Inpatient Data)

Figure 7. Rank-Weighted Overlap Using Unbalanced Weighting (Inpatient Data)

Comparison to OTCS and Artemis

We evaluate nearest neighbor overlap between the OTCS-MES measure and two comparative similarity measures, the original OTCS and Artemis. The overlap analysis is a prerequisite to comparing performance in decision-making tasks such as classification, clustering, and search mechanisms used by


health care providers. If the nearest neighbors generated by the OTCS-MES, OTCS, and Artemis do not differ, the performance of these measures when applied to MESs cannot be different. To demonstrate the impact of component weights and neighborhood size, the comparison incorporates multiple component weights and neighborhood sizes. The overriding question is whether the OTCS-MES generates a different set of nearest neighbors for substantial samples of MESs than the other proposed similarity measures (OTCS and Artemis). We hypothesize that comparison of the measures will indicate less than 0.50 overlap for both the simple and rank-weighted overlap measures. For Hypothesis 1, we consider both balanced component weights (equal weights for the event matching and temporal matching components) and unbalanced component weights, with extreme weighting placing all weight on one component and mixed weighting using a 2:1 ratio of component weights.

Hypothesis 1. The overlap of nearest neighbors will be less than half for the OTCS measures because the OTCS-MES has refinements (event and structural) in measure components.

Hypothesis 2. The overlap of nearest neighbors will be less than half between the OTCS-MES and Artemis measures, and the overlap will be much smaller for Artemis compared to the OTCS measures because Artemis emphasizes alignment, not matching event counts.

Beyond demonstrating overlap differences among the three measures, the analysis isolates the impact of similarity measure components and MES length. For similarity components, we hypothesize that extensions to the temporal component will have more impact than extensions to the event matching component. The OTCS-MES has more extensions in the temporal component (temporal structure of all events (OTCS-MES) versus matching events only (OTCS), and mean and variation of gaps and duration (OTCS-MES) versus sum of gap differences (OTCS)).
For MES length, we hypothesize that the impact of partial matching and prevalence will increase as MES length increases. A larger number of


events in an MES will provide more opportunity for partial matching using event type prevalence, leading to decreased overlap between OTCS-MES and OTCS.

Hypothesis 3. Overlap on the temporal components will be smaller than overlap on the event matching components for OTCS-MES and OTCS.

Hypothesis 4. As MES length increases, overlap between OTCS-MES and OTCS decreases on the event matching component.

Comparison to OTCS using Balanced Weights

The comparison between OTCS-MES and OTCS begins with balanced weights: without a strong preference for either event matching or temporal matching, equal weights for each component are the likely default choice. Figure 8 depicts results of balanced weighting for the simple and weighted overlap measures. For the inpatient (IP) data set, the simple overlap rate is between 10% and 20%, while the weighted overlap rate is between 8% and 14%. The graph shows a slight increase of overlap rate with the size of the neighborhood. For the outpatient (OP) data set, OTCS-MES and OTCS have a larger overlap, although still below 50% for all neighborhood sizes. Figure 8 shows that the simple overlap remains below 30% at neighborhood sizes below 25. However, the weighted overlap does not cross 30% even for the largest neighborhood size (50).
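The balanced, extreme, and mixed weighting schemes amount to different convex combinations of the two similarity components. The sketch below illustrates the combination; the component scores are hypothetical placeholders, not values produced by the actual OTCS-MES implementation.

```python
# Sketch of combining the two OTCS-MES similarity components with
# standard [0,1] weights. The component scores below are hypothetical;
# balanced weighting sets both weights to 0.5, extreme weighting puts
# all weight on one component, and mixed weighting favors one component
# (e.g., 0.75 vs. 0.25).

def combined_similarity(event_sim, temporal_sim, w_event=0.5):
    """Weighted combination; the temporal weight is 1 - w_event."""
    assert 0.0 <= w_event <= 1.0
    return w_event * event_sim + (1.0 - w_event) * temporal_sim

event_sim, temporal_sim = 0.8, 0.4      # hypothetical component scores
print(combined_similarity(event_sim, temporal_sim, w_event=0.5))   # balanced
print(combined_similarity(event_sim, temporal_sim, w_event=1.0))   # extreme (event only)
print(combined_similarity(event_sim, temporal_sim, w_event=0.75))  # mixed emphasis
```

Because the weights sum to one and each component is normalized, the combined score stays in [0,1], which keeps neighborhoods comparable across weighting scenarios.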


Figure 8. Simple and Rank Weighted Overlap of OTCS-MES Similarity Measure vs. Original OTCS

For more rigorous evaluation, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 7. In each test, we set the neighborhood size to 25 and used equal component weights. All overlap test results are consistent with the performance graph (Figure 8). For both the inpatient and outpatient data sets, the hypothesis test results show strong evidence to reject the null hypothesis and confirm Hypothesis 1 that the overlap is less than 0.5.

Table 7. Summary of t-Test Results for OTCS-MES/OTCS with Balanced Weights

Data Set     Overlap Measure   N      Mean    P-value
Inpatient    Simple            6997   0.155   0.0001
Inpatient    Weighted          6997   0.101   0.0001
Outpatient   Simple            948    0.415   0.0001
Outpatient   Weighted          948    0.266   0.0001

Comparison to OTCS using Unbalanced Weights

For unbalanced weighting of the event and temporal (structural) components of OTCS-MES, the impact on nearest neighbor overlap may be determined through the results of various component weighting scenarios. To isolate component differences, we use extreme weighting with all weights on


just one component. We relax extreme weights to show overlap for weights favoring one component (0.75) over the other component (0.25), a mixed weighting ratio. Figures 9 and 10 show results of extreme weighting for both inpatient (IP) and outpatient (OP) data sets. For both data sets, the results show smaller overlap for extreme weighting on the temporal component. In Figure 9, the simple overlap in the inpatient graph remains under 10% except for large neighborhood sizes greater than 40. The simple overlap in the outpatient graph is below 25% for all neighborhood sizes. In Figure 10, the weighted overlap for the temporal component graphs is smaller than the corresponding simple overlap for the temporal component in Figure 9. The graphs for event matching display different overlap (simple and weighted) than the temporal component graphs. For the inpatient graphs, the simple overlap increases from just under 30% for small neighborhoods to 40% for large neighborhoods. Weighted overlap increases from just under 20% to just under 30% in Figure 10. For the outpatient graphs, simple overlap increases from about 25% for a very small neighborhood size (5) to 60% for a very large neighborhood size (50). Simple overlap is a little below 50% for a moderate neighborhood size of 25. Weighted overlap increases from less than 20% for a very small neighborhood size (5) to 40% for a very large neighborhood size (50). Weighted overlap is a little below 40% for a moderate neighborhood size of 25. Compared to extreme weighting, the mixed weighting graphs in Figures 11 and 12 display similar overlap for inpatient data but smaller overlap for outpatient data. For a moderate neighborhood size of 25, the event matching emphasis graph shows simple overlap of about 40% (Figure 11), while the extreme weighting graph shows simple overlap of about 50% (Figure 9).
For a moderate neighborhood size of 25, the event matching emphasis graph shows weighted overlap under 30% (Figure 12), while the extreme weighting graph shows weighted overlap above 30% (Figure 10). For the outpatient data set,


the event matching emphasis graph is under the temporal matching emphasis graph, a switch from the extreme weighting graphs.

Figure 9. Simple Overlap Results with Extreme Component Weighting

Figure 10. Rank Weighted Overlap Results with Extreme Component Weighting


Figure 11. Simple Overlap Results with Mixed Component Weighting

Figure 12. Rank Weighted Overlap Results with Mixed Component Weighting

For more rigorous evaluation, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 8. In each test, we set the neighborhood size to 25 and used unbalanced component weights, either extreme or mixed. Overlap test results are consistent with the performance graphs (Figures 9 to 12). All test results with rank weighted overlap are significant, demonstrating strong evidence to reject the null hypothesis and confirm Hypothesis 1. Three test results with simple overlap for outpatient data were not significant, but the confidence intervals in Table 9 are close to the 0.5 overlap threshold.


Table 8. Summary of t-Test Results for OTCS-MES/OTCS with Unbalanced Weights

Data Set     Overlap Measure   Weighting   Component Emphasis   Mean     P-value
Inpatient    Simple            Extreme     Event                35.24%   0.0016
Inpatient    Simple            Extreme     Temporal              7.46%   0.0000
Inpatient    Simple            Mixed       Event                28.14%   0.0000
Inpatient    Simple            Mixed       Temporal              9.79%   0.0000
Inpatient    Rank Weighted     Extreme     Event                22.84%   0.0000
Inpatient    Rank Weighted     Extreme     Temporal              4.73%   0.0000
Inpatient    Rank Weighted     Mixed       Event                18.41%   0.0000
Inpatient    Rank Weighted     Mixed       Temporal              6.26%   0.0000
Outpatient   Simple            Extreme     Event                46.33%   0.3753
Outpatient   Simple            Extreme     Temporal             14.42%   0.0003
Outpatient   Simple            Mixed       Event                41.03%   0.2100
Outpatient   Simple            Mixed       Temporal             43.68%   0.3194
Outpatient   Rank Weighted     Extreme     Event                30.82%   0.0192
Outpatient   Rank Weighted     Extreme     Temporal              9.02%   0.0000
Outpatient   Rank Weighted     Mixed       Event                26.61%   0.0046
Outpatient   Rank Weighted     Mixed       Temporal             27.97%   0.0173

Table 9. Confidence Intervals (95%) for Non-Significant t-Test Results in Table 8

Data Set     Overlap Measure   Weighting   Component Emphasis   Mean     P-value   Lower CI   Upper CI
Outpatient   Simple            Extreme     Event                46.33%   0.3753    37.20%     55.46%
Outpatient   Simple            Mixed       Event                41.03%   0.2100    30.69%     51.36%
Outpatient   Simple            Mixed       Temporal             43.68%   0.3194    33.09%     54.28%

Results for Hypothesis 4 are consistent with the graphs comparing overlap to event sequence length (Figures 13 and 14). Inpatient and outpatient data differ sharply on sequence length. In the MES samples used in this study, an outpatient MES had an average of 11.04 events, and over half of the outpatient event sequences have more than 10 events. In contrast, an inpatient MES had an average of 5.35 events, with 64% of inpatient event sequences having five or fewer events. For inpatient data, Figure 13 shows overlap decreasing over most increases in sequence length, confirming Hypothesis 4. Inpatient data has a high level of missing data (40%) for 5-digit ICD-9 codes, so the hierarchical matching used by OTCS-MES loses some advantage. For outpatient data, Figure 14 shows that overlap decreases as sequence length increases for most of the sequence length range, also confirming Hypothesis 4.


Figure 13. Nearest Neighbor Overlap by Sequence Length (Inpatient Data)

Figure 14. Nearest Neighbor Overlap by Sequence Length (Outpatient Data)

For another perspective on the impact of partial matching of hierarchical codes, Table 10 shows total event matching for OTCS-MES (with partial matching) and OTCS (without partial matching). The impact for outpatient data is striking, with over 250% more event matches.
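The partial-matching advantage behind these counts comes from scoring agreement at higher levels of a hierarchical code when the most detailed digits differ. A minimal sketch of prefix-based partial matching for ICD-9-style codes follows; the partial-credit value and example codes are illustrative assumptions, not the actual OTCS-MES weights.

```python
# Illustrative sketch of hierarchical (prefix-based) partial matching for
# ICD-9-style codes: exact codes match fully, while codes agreeing only
# on the 3-digit category receive partial credit. The 0.5 partial-credit
# value is an illustrative assumption, not the OTCS-MES weight.

def event_match_score(code_a, code_b):
    if code_a == code_b:
        return 1.0                       # exact match at the detailed level
    if code_a[:3] == code_b[:3]:
        return 0.5                       # same 3-digit category only
    return 0.0                           # no match in the hierarchy

def total_matches(seq_a, seq_b, partial=True):
    """Count (possibly partial) matches between two event sequences."""
    total = 0.0
    for a in seq_a:
        for b in seq_b:
            score = event_match_score(a, b)
            if not partial and score < 1.0:
                score = 0.0              # OTCS-style: exact matches only
            total += score
    return total

# Hypothetical sequences: category 250 codes differing in the 5th digit
mes1 = ["25000", "4019", "4280"]
mes2 = ["25002", "4019", "5859"]
print(total_matches(mes1, mes2, partial=True))   # exact plus partial credit
print(total_matches(mes1, mes2, partial=False))  # exact matches only
```

When detailed digits are missing or noisy, as in the inpatient data above, the exact-only count drops while prefix matching still credits the category-level agreement, which is why partial matching yields far more total matches.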


Table 10. Total Event Matches for OTCS-MES and OTCS

Total Event Matches   Inpatient    Outpatient
OTCS-MES              13,455,390   6,504,003
OTCS                  11,560,821   1,821,349
Difference             1,894,569   4,682,654
% Difference              16.39%      257.10%

Comparison to Artemis

Since Artemis has only a temporal component, the analysis in this section shows graphs with varied weights for OTCS-MES but no weights for Artemis. Figures 15 and 16 show very small overlap (both simple and weighted) of nearest neighbors between OTCS-MES and Artemis. The overlap between OTCS-MES and Artemis is much smaller than the overlap between OTCS-MES and OTCS for inpatient data.

Figure 15. Simple Overlap of OTCS-MES versus Artemis (Inpatient Data)


Figure 16. Rank Weighted Overlap of OTCS-MES versus Artemis (Inpatient Data)

We also compare the OTCS-MES to Artemis using the outpatient data. Figures 17 and 18 show small overlap (both simple and weighted) of nearest neighbors between OTCS-MES and Artemis. Again, we observe that the overlap between OTCS-MES and Artemis is substantially smaller than the overlap between OTCS-MES and OTCS. Event alignment, the focus of the Artemis measure, yields substantially different nearest neighbors than event matching through the OTCS-MES measure for outpatient data.

Figure 17. Simple Overlap of OTCS-MES versus Artemis (Outpatient Data)


Figure 18. Rank Weighted Overlap of OTCS-MES versus Artemis (Outpatient Data)

The overlap graphs for both data sets demonstrate little impact of component weights for OTCS-MES. The graphs for each combination of component weights are tight, with little space between them. Thus, OTCS-MES and Artemis have little overlap regardless of the weights used for the OTCS components. For more rigorous evaluation, we used one-tailed hypothesis tests with the null hypothesis of overlap greater than 0.5, as summarized in Table 11. In each test, we set the neighborhood size to 25 and used equal component weights for OTCS-MES. The overlap test results are consistent with the performance graphs (Figures 15 to 18) for both data sets and overlap measures. All hypothesis test results show strong evidence to reject the null hypothesis and confirm Hypothesis 2 that the overlap is less than 50%.
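The one-tailed tests reported in these sections (null hypothesis: mean overlap at least 0.5) follow the standard one-sample t-test pattern. The sketch below reproduces the mechanics on synthetic overlap values, since the per-sequence overlap data are not included in this document; the sample parameters are illustrative assumptions.

```python
# Sketch of the one-tailed, one-sample t-test used for the overlap
# hypotheses: H0: mean overlap >= 0.5 vs. H1: mean overlap < 0.5.
# The overlap values are synthetic stand-ins for the study's
# per-sequence overlap data.
import math
import random

random.seed(42)
# Synthetic per-sequence overlaps centered well below 0.5, mimicking the
# small overlaps observed between OTCS-MES and Artemis.
overlaps = [max(0.0, min(1.0, random.gauss(0.10, 0.08))) for _ in range(1000)]

n = len(overlaps)
mean = sum(overlaps) / n
var = sum((x - mean) ** 2 for x in overlaps) / (n - 1)
t_stat = (mean - 0.5) / math.sqrt(var / n)

# Under H1 (mean < 0.5), reject H0 when t falls far below zero. With
# n near 1000, the one-tailed 0.0001-level critical value is roughly
# -3.72 (normal approximation).
print(f"mean overlap = {mean:.3f}, t = {t_stat:.1f}")
print("reject H0 (overlap < 0.5)" if t_stat < -3.72 else "fail to reject H0")
```

With sample means as far below 0.5 as those in Tables 7 and 11 and thousands of sequences, the t statistic is strongly negative, which is why the reported p-values are uniformly small.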


Table 11. Summary of t-Test Results for OTCS-MES/Artemis with Balanced Weights

Data Set     Overlap Measure   N      Mean      P-value
Inpatient    Simple            6999   0.00804   0.0001
Inpatient    Weighted          6999   0.00513   0.0001
Outpatient   Simple            965    0.12220   0.0001
Outpatient   Weighted          965    0.07420   0.0001

Discussion

Our results demonstrate that a similarity measure adapted specifically to the unique properties of medical events behaves differently than measures ignoring such features. The OTCS-MES uses hierarchical event matching with prevalence scores, mean and variation in measuring temporal structure, normalization of event similarity, and standard [0,1] weights to combine similarity components. Empirical analysis provided evidence about the consistency of weights and normalization for both inpatient and outpatient MES data. A detailed empirical comparison demonstrated substantial differences in nearest neighbor overlap between OTCS-MES, OTCS, and Artemis. The comparison involved two data sets (inpatient with typically short sequence lengths and outpatient with longer sequence lengths) with different event coding (ICD-9 for inpatient and CPT for outpatient), two overlap measures (simple and weighted, accounting for relative positions of neighbors), and the impacts of three important factors (neighborhood size, matching component weights, and sequence length). Table 12 summarizes the empirical testing results, confirming most hypotheses about small overlap among the three similarity measures. For Hypothesis 1, results demonstrate substantially small overlap between OTCS-MES and OTCS except for cases of unbalanced weights with overlap near the 0.5 threshold. For Hypothesis 2, results demonstrate that event alignment in Artemis produces very different nearest neighbors than any combination of event and temporal matching as used in the OTCS-MES measure.
For Hypothesis 3, results indicate less overlap on the temporal similarity components of OTCS-MES and OTCS than on the event matching components, except for mixed weighting. For Hypothesis


4, results show overlap decreasing as sequence length increases for both inpatient and outpatient data. The high level of missing data at the most detailed level seems to influence the impact of sequence length on the inpatient data set.

Table 12. Summary of Hypothesis Evaluation

H1a: OTCS-MES vs. OTCS (simple overlap). Inpatient: evidence of substantial differences using balanced and unbalanced component weights. Outpatient: evidence of substantial differences for balanced weights and evidence of near-threshold differences for unbalanced weights.

H1b: OTCS-MES vs. OTCS (weighted overlap). Inpatient: evidence of substantial differences using balanced and unbalanced component weights. Outpatient: evidence of substantial differences using balanced and unbalanced component weights.

H2a: OTCS-MES vs. Artemis (simple overlap). Inpatient: evidence of substantial difference. Outpatient: evidence of substantial difference.

H2b: OTCS-MES vs. Artemis (weighted overlap). Inpatient: evidence of substantial difference. Outpatient: evidence of substantial difference.

H3a: OTCS-MES vs. OTCS overlap for event matching greater than temporal matching (extreme weighting). Inpatient: evidence of substantial difference. Outpatient: evidence of substantial difference.

H3b: OTCS-MES vs. OTCS overlap for event matching greater than temporal matching (mixed weighting). Inpatient: evidence of substantial difference. Outpatient: no evidence of substantial difference.

H4: OTCS-MES vs. OTCS overlap decreases as MES length increases. Inpatient: evidence of decrease. Outpatient: evidence of decrease.

The empirical results in this study focus on overlap of nearest neighbors, a measure independent of application and algorithm, and provide a foundation for studying performance differences in reasoning tasks using a similarity measure. The results depict substantial overlap differences to help explain performance differences in subsequent studies.
We intend additional research to study performance differences among similarity measures in classification, clustering, and search tasks for various medical application areas. Because the OTCS-MES captures important features of MESs, we are confident that OTCS-MES will have substantially better performance for a variety of tasks and applications than other proposed similarity measures. The evidence in this study (small overlap between OTCS-MES and OTCS on the event matching component and the large number of event matches missed by OTCS) indicates that OTCS-MES has a substantially better design than OTCS. Exploring the potential applications for the OTCS-MES similarity measure yields several important research opportunities. Among these are (1) patient classification, perhaps using disease or


risk groups, (2) evaluation of patient adherence to a clinical pathway or care management plan, and (3) discovery of similar patients for medical social networking. First, determination of OTCS-MES precision in classification requires verification of the homogeneity of the risk level and condition profile for patients deemed similar based on OTCS-MES. Such precision evaluation necessitates a test group of ground truth patients and their respective risk and condition labels. Appropriate algorithms for patient risk and condition labeling that may be implemented for this application include CDPS risk scoring and the numerous co-morbidity indices (Charlson, Elixhauser, etc.) for disease determination. A second OTCS-MES application involves determination of patient adherence to an accepted clinical pathway specific to their diagnosed condition. A clinical pathway is a sequence of prescribed procedures, medications, tests, or other medical events related to a disease or physical condition. As such, OTCS-MES is especially suitable to evaluate a patient's adherence to their prescribed medical event sequence or clinical pathway. Finally, medical social networks allow patients having similar medical histories to discuss treatment successes and failures, exchange experiences, and receive emotional support. OTCS-MES could prove valuable in augmenting methodologies used by medical social networks when retrieving similar patients for homogeneous online patient communities.

Conclusion

We developed a similarity measure for medical event sequences (MESs) and empirically compared it to two other similarity measures designed for MESs using publicly available U.S. Medicare claims data. We designed the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES) based on unique aspects of MESs, including dense, hierarchical coding schemes, duplication of events, and event prevalence.
The OTCS-MES contains components for event similarity using hierarchical event matching with prevalence scores, mean and variation of event duration and gaps, standard [0,1] weights, and normalization based on empirical characteristics of MESs. We empirically evaluated the OTCS-MES measure against two other measures specifically designed for MESs, the


original OTCS and Artemis, a measure incorporating event alignment. Our evaluation used two substantial data sets of Medicare claims data containing inpatient and outpatient sequences. Using two overlap measures, we found a small overlap in nearest neighbors among the three similarity measures, demonstrating the importance of unique aspects of MESs. The evaluation also provided evidence about internal consistency of weight choices for the OTCS-MES. We plan additional research to assess the performance of similarity measures for MESs to augment the focus in this research on overlap differences. The analysis here should be extended to clustering performance using independent cluster quality measures, robustness to poor data quality, and classification performance. Additional analysis should utilize additional types of medical events. Besides addressing these areas of additional analysis, we plan future research about medical reasoning using the OTCS-MES. To support reasoning by health care professionals using MESs for risk analysis, clinical pathways, and co-morbidity, we propose to develop a matching operator and visualization tools for MESs to augment the OTCS-MES. The matching operator will support temporal constraints about changes in medical events in MESs, such as increased severity and prolonged symptoms. Visualization tools for MESs will help health care professionals see important patterns in MESs. Human factors studies will be necessary to evaluate the utility of a query architecture combining a similarity measure, matching operator, and visualization tools.


CHAPTER III

SIMILARITY MEASURES FOR MEDICAL EVENT SEQUENCES: PREDICTING MORTALITY IN TRAUMA PATIENTS

Abstract

In this study, we extend a similarity measure for medical event sequences (MESs) and evaluate its classification performance for retrospective mortality prediction of trauma patient outcomes. Retrospective mortality prediction is a benchmarking task used by trauma care governance bodies to assist with policy decisions. We extend a similarity measure, the Optimal Temporal Common Subsequence for MESs (OTCS-MES), by generalizing the event matching component with a plug-in weighting element. The extended OTCS-MES uses an event prevalence weight developed in our previous study and an event severity weight developed for this study. Importantly, our method requires no exogenous data, as all predictive information is contained in the trauma incident registry. In the empirical evaluation of classification performance, we provide a more complete evaluation than previous studies. We compare the predictive performance of the Trauma Mortality Prediction Model (TMPM), an accepted regression approach for mortality prediction in trauma data, to nearest neighbor algorithms using similarity measures for MESs. Using a data set from the National Trauma Data Bank, our results indicate improved predictive performance for an ensemble of nearest neighbor classifiers over TMPM. Our analysis reveals a superior Receiver Operating Characteristic (ROC) curve, larger AUC, and improved operating points on a ROC curve. We also study methods to adjust for uncommon class prediction, weighted voting, neighborhood size, and case base size. Results provide strong evidence that similarity measures for medical event sequences are a powerful and easily adapted method for assisting with health care policy advances.


Introduction

Trauma Care Evaluation

This study involves mortality prediction for trauma centers, an important classification task having established methods. There is an essential focus on trauma care because trauma injuries are the leading cause of death in people younger than 44 and the fifth leading cause of death for all age groups (Glance et al. 2009). Additionally, treatment methods for trauma-related injuries are extremely costly, often leading to more expensive forms of care. The importance and cost of trauma care mandate benchmarks for trauma center performance relative to the injury severity of incoming patients. Cassidy et al. (2014) maintain that "accurate injury severity scoring systems are essential for benchmarking outcomes and objectively evaluating and improving trauma care." With injury being the leading cause of lost years of life and escalating trauma care costs, improved trauma patient outcomes and streamlined care delivery are important objectives of researchers and policy makers. As such, the Trauma Care Systems Planning and Development Act was passed to improve trauma care and establish a Division of Trauma in the Department of Health and Human Services. The resulting regional trauma systems are designed to reduce mortality from injury. Furthermore, governance bodies, including the World Health Organization and the American College of Surgeons, provide consensus-based policy recommendations on the structure of trauma systems. This attention to trauma care has shown positive results, with an estimated 15% reduction in the odds of mortality and decreases in both disability outcomes and costs (Moore et al. 2018). Informed policy decisions by trauma governance bodies remain of utmost importance. The method advanced in this research improves trauma policy decisions to help mitigate the "major knowledge gap on which components of a trauma system contribute to their effectiveness" (Moore et al. 2018).
We propose our method to more accurately evaluate trauma center performance


to facilitate better policy decisions. In confirmation, trauma care administrators state that "the next logical step in the process of trauma system evaluation is to establish measures that consistently capture true outcome performance" and that "evaluation of trauma system effectiveness will require ongoing outcome analysis in what must remain an uncompromising commitment to optimal outcome for the injured patient" (Celso et al. 2006).

Retrospective Mortality Prediction

Evaluating trauma care based on benchmarks for mortality rates, dependent upon injury severity, involves retrospective mortality prediction. Retrospective mortality prediction methods for trauma care have "important clinical and economic implications because these tools are used to evaluate patient outcomes and quality of care" (Weeks et al. 2016). Our research is not advancing an "in-facility" clinical decision tool, per se, but suggests a method to improve strategic decision making through more accurate assessments of trauma care. Essentially, retrospective mortality prediction enables governance bodies to measure trauma care delivery based on "benchmark" mortality rates for comparable patients or injury mix. Trauma centers demonstrating superior patient outcomes inform policy and resource allocation decisions concerning the various components of trauma care systems (transportation, triage, facility design, benchmarking, etc.). Retrospective evaluation of trauma care is a commonly researched area. In fact, a recent Google Scholar search using trauma+retrospective+"mortality prediction" since 2017 generated 674 articles in the result. Typically for trauma care, mortality prediction models use historical incidents to correlate patient attributes and injury severity to known trauma discharge dispositions (deceased or non-deceased). Retrospective mortality prediction provides injury outcome "benchmarks" to help improve the level of trauma care through more informed policy decisions.
As explained previously, governance bodies retrospectively assess trauma center performance to determine a facility's comparative level of


care and the components of trauma care systems most impacting improved patient outcomes. Trauma centers having superior performance, evidenced by comparatively low mortality rates, are surveyed to determine which components of trauma care systems are predominant. For example, studies (Celso et al. 2006) using retrospective mortality prediction have compared trauma care between (a) in-hospital facilities and external trauma centers, (b) level I and level II trauma centers, and (c) trauma centers in high- and low-to-middle-income countries. An example of a resulting policy change from such studies is continuation of a 2-tiered designation system for trauma care (Glance et al. 2012). In addition to evaluating trauma center performance, retrospective mortality prediction enables study of mortality rates across patient cohorts. For example, Hashmi et al. (2014) projected mortality rates from reference studies for an age group comparison of outcomes for trauma patients. In summary, retrospective mortality prediction establishes guidelines for appropriate outcomes from trauma care to help inform trauma care system policies. Accordingly, the research advanced in this study intends to improve established methods for trauma mortality prediction.

Mortality Prediction Using Similarity Measures

Because of the importance of mortality prediction for trauma centers, researchers have developed several prominent prediction methods. The most widely accepted method, the Trauma Mortality Prediction Model (TMPM), involves detailed regression modeling of individual injury codes using a large training sample. TMPM (Glance et al. 2009) uses derived coefficients for more than one thousand injury codes to make mortality predictions. In this portion of our research, we study an alternative approach to mortality prediction based on similarity of medical events in a patient's trauma incident.
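The idea can be sketched as retrieving the k most similar historical trauma incidents under a similarity measure and voting on their known outcomes. In this illustration, Jaccard similarity over injury-code sets is a simplified placeholder for the actual MES similarity measure, and the similarity-weighted (soft) vote is one of several voting schemes; the case base and codes are hypothetical.

```python
# Simplified sketch of similarity-based mortality prediction: retrieve
# the k most similar historical trauma incidents and vote on their known
# outcomes. Jaccard similarity over injury-code sets is a placeholder
# for a full MES similarity measure; soft voting weights each neighbor
# by its similarity to the query incident.

def jaccard(codes_a, codes_b):
    a, b = set(codes_a), set(codes_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_mortality(query_codes, case_base, k=5):
    """case_base: list of (injury_codes, died) pairs with known outcomes.
    Returns a mortality score in [0, 1] via similarity-weighted voting."""
    nearest = sorted(case_base,
                     key=lambda case: jaccard(query_codes, case[0]),
                     reverse=True)[:k]
    total = sum(jaccard(query_codes, codes) for codes, _ in nearest)
    if total == 0:
        return 0.0
    return sum(jaccard(query_codes, codes)
               for codes, died in nearest if died) / total

# Hypothetical case base of (ICD-9-style injury codes, deceased) incidents
case_base = [
    (["80501", "86121"], True),
    (["80501", "8730"], False),
    (["8730", "81342"], False),
    (["86121", "80502"], True),
    (["81342"], False),
]
score = predict_mortality(["80501", "86121", "8730"], case_base, k=3)
print(f"predicted mortality score: {score:.2f}")
```

Unlike a regression model, the retrieved neighbors themselves explain the prediction: a reviewer can inspect which similar incidents, with which outcomes, drove the score.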
Our approach requires no exogenous data from linked EHRs but relies solely on data elements endogenous to the trauma incident registry. Furthermore, predicting mortality based on incident similarity provides better explanation than


regression prediction, as similar cases provide an explanation of a prediction. Prediction based on similarity using nearest neighbor classification does not require training, although it requires indexing of trauma incidents for efficient computation of nearest neighbors. Training is required only to reduce the number of cases in a reference set.

Research Methodology

We compare the predictive performance of nearest neighbor classification using a similarity measure to TMPM, a prominent approach for mortality prediction in trauma data. In nearest neighbor classification, we use OTCS-MES with two weighting approaches (event prevalence and event severity), the original OTCS with only exact matching of event codes, and an ensemble using these three classifiers. We compare performance with three important measures: receiver operating characteristic (ROC) curves, area under a ROC curve (AUC), and operating points derived from a ROC curve. Results are based on a substantial data set from the National Trauma Data Bank. Our results indicate superior performance for an ensemble of nearest neighbor classifiers over TMPM on ROC curve analysis and AUC. For optimal operating points, the ensemble provides better performance than TMPM, especially as the importance of sensitivity increases. We also study the impact of oversampled training data versus lifted voting for the uncommon mortality class, weighted voting, neighborhood size, and case base size.

Contributions

This study makes three important contributions. Most importantly, this study developed a new classification method with better performance than the accepted standard, TMPM. The ensemble of nearest neighbor classifiers obtained better performance than TMPM on ROC curves, AUC, and optimal operating points on a ROC curve. No prior studies have used nearest neighbor classification for mortality prediction with an uncommon mortality class.
As an important secondary contribution, generalization of the "event matching" component of OTCS-MES makes event matching applicable to a wider variety of


medical domains. As another secondary contribution, the detailed performance comparison provides a more complete analysis than previous studies. For example, prior studies neglected to compare performance on operating points on a ROC curve. The description of this study continues as follows. The next section reviews prior work on mortality prediction for trauma centers and similarity measures for MESs. The third section presents the design of the experiment comparing nearest neighbor prediction with OTCS-MES to TMPM. The fourth section presents results of the experiment and discusses implications. The fifth section summarizes the study and identifies future extensions.

Related Work

To provide a context for the experiment design and results in the next sections, we review previous work on mortality prediction for trauma centers and similarity measures for MESs. We review early methods for mortality prediction (Injury Severity Score and Abbreviated Injury Scale) as well as two contemporary methods (the Bayesian Trauma Prediction Model and the Trauma Mortality Prediction Model). Our research uses the Trauma Mortality Prediction Model and classifiers based on OTCS-MES measures.

Injury Severity Score (ISS) and Abbreviated Injury Scale (AIS)

Because of the demand for accurate mortality prediction for trauma centers, several severity scoring systems have been developed. The Injury Severity Score (ISS), an early method for severity scoring, uses the Abbreviated Injury Scale (AIS) to score injuries and predict trauma outcomes (Cassidy 2014). AIS is an anatomically based coding system created by the Association for the Advancement of Automotive Medicine to quantify injury severity. The International Classification of Diseases, Clinical Modification, version 9 (ICD-9-CM) is a more recent coding system with injury classifications. Because


the National Trauma Data Standard now mandates ICD-9-CM, ISS provides an option to use ICD-9-CM or AIS codes (Glance 2009). Alternative severity scoring models to the ISS have been proposed. The International Classification of Diseases Injury Severity Score (ICISS) uses empirically derived survival risk ratios (SRRs) for ICD-9-CM codes. ICISS calculates the proportion of survivors among patients having an ICD-9-CM injury code (Glance 2009). Another alternative approach, the Single Worst Injury (SWI) model, focuses on injury assessment using a patient's single most severe (worst) injury. The single worst injury is commonly determined by the AIS score (Tohira 2012). In preliminary injury scoring, the single worst injury was often used to predict outcomes. However, subsequent scoring systems leverage multiple injuries to contribute to outcome prediction (Kilgo et al. 2003).

Regression Models for Mortality Prediction

Burd et al. (2008) developed the Bayesian Logistic Injury Severity Score (BLISS) to leverage ICD-9-CM trauma coding with 2,210 possible injury codes and 243,037 two-way interactions among injury codes. Like ICISS, BLISS relies solely on ICD-9-CM codes without the need for physiological or supplementary data often input to other methods. In contrast to ICISS, BLISS uses injury interactions, not just individual injury codes. Burd et al. (2008) found slight improvements in prediction performance with BLISS compared to ICISS but much better model calibration with the Hosmer-Lemeshow statistic. The prediction performance advantage of BLISS was most apparent among patients at lower risk for mortality. The more recent Trauma Mortality Prediction Model (TMPM), a probit regression model, supports alternative injury codes (AIS, ICD-9-CM, or ICD-10). TMPM uses approximately 1,000 different types of injuries characterized by these coding sets. TMPM comprises two separate probit models.
Model 1 uses all possible injuries as binary predictors with death as the binary outcome. Model 2 uses


indicators of body region severity. A weighted average of the coefficients of the two regression models provides the empirical severity for each injury. Empirical analysis showed that TMPM-ICD9 provided superior performance to other ICD-9-CM-based models. However, analysis in previous studies omitted operating points. The superior predictive performance of TMPM-ICD9 was most evident as the number of injuries increased (Cassidy et al. 2014). In this study, we use TMPM-ICD9 because of its performance and availability. TMPM has been compared to other mortality prediction models, except for BLISS. The R implementation of TMPM-ICD9 facilitated its use in our experiments (https://cran.r-project.org/web/packages/tmpm/tmpm.pdf).

Similarity Measures for MESs

In previous research, we developed a similarity measure for MESs known as the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES). Its development was motivated by limitations in other proposed methods, particularly the original OTCS (Zheng et al. 2010). In a detailed empirical evaluation (XXXX), we compared the OTCS-MES to the original OTCS and to Artemis, a measure incorporating event alignment. This comparison used inpatient MESs with ICD-9-CM codes and outpatient MESs with CPT procedure codes. Overall, we found a small overlap in nearest neighbors among the three similarity measures, demonstrating the distinctive design of the OTCS-MES with its emphasis on unique aspects of MESs. With extreme weighting on just the event matching components of OTCS and OTCS-MES, simple overlap rates for shared nearest neighbors ranged from 25% for small neighborhood sizes (5) to 40% for large neighborhood sizes (50) in inpatient data and 60% for large neighborhood sizes in outpatient data.


The evaluation in our previous study (XXXX) did not investigate the classification performance of OTCS-MES. Although the OTCS-MES contains components for event similarity and temporal structure similarity, this study uses only the event matching component because trauma incidents are reported without timing of ICD codes. To use the OTCS-MES for classification, we generalize it to provide a level of domain customization.

Research Methodology

Initially, our research methodology concerns the use of MES similarity measures to find similar trauma incidents based on injury (event) sequence matching. We first expound on similarity measures and their application within nearest neighbor classification to predict trauma outcomes based on known outcomes for nearest neighbor trauma incidents. We then turn to our secondary research, aimed at designing the most effective approach to nearest neighbor classification for prediction of uncommon mortality in trauma incidents. Specifically, we evaluate the following choices for our classification method leveraging similarity measures specific to trauma incidents:

- Voting method for nearest neighbor trauma incidents – traditional majority voting or soft voting with proportional weights.
- Size of the nearest neighbor cohort – 1 through 49 (odd only).
- Adjustment method for imbalanced data – majority voting, oversampling, or certainty factor voting.
- Case base size – 5,000, 10,000, or 50,000 training incidents and 2,000 test incidents.

Given the best kNN classification method from our secondary research, we move on to our primary research evaluating trauma mortality prediction using our alternative method. We start by presenting our primary research hypotheses for similarity measure performance. Performance is evaluated against both the industry gold standard, TMPM, and amongst the various


similarity measure approaches (original OTCS exact matching, and OTCS-MES prevalence- and severity-weighted partial matching). The research methodology section continues by detailing the trauma data used for our empirical evaluation and the associated data filters required for an equitable comparison. Finally, we describe the performance measures chosen to evaluate our alternative predictive method.

Similarity Measures for Medical Event Sequences

This study uses three similarity measures: the original OTCS, the OTCS-MES with event prevalence weights (OTCS-MES-EP), and the OTCS-MES with event severity weights (OTCS-MES-ES). Although all three measures contain components for event matching and temporal structure of events, this study uses only the event matching component because trauma records do not have a temporal structure.

Original OTCS

The original Optimal Temporal Common Subsequence (OTCS), developed by Zheng et al. (2010), uses exact matching for events. Given a state sequence defined as S_n = [s_1, ..., s_n], the OTCS compares two state sequences S_m and S'_n based upon exact matching of the states (events) within S and S' (Zheng et al. 2010). Figure 19 shows the OTCS matching algorithm, reprinted from Zheng et al. (2010). This figure illustrates exact state matching by OTCS, with state s_i compared in totality to state s'_j. Therefore, when applied to MESs, the original OTCS would require that an ICD-9-CM code match exactly in both length and content.


Figure 19: OTCS Event Matching Procedure (Zheng et al. 2010)

Although the OTCS was motivated by temporally spaced state sequences, the hierarchical nature of MESs is not utilized in the measure. Thus, the OTCS does not incorporate partial event matching, only counting the number of exact (non-partial) matches between event sequences. For example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, the original OTCS would not find a match even though they represent highly related medical events. In addition, the OTCS does not allow weighting of matched events in its event matching component. For example, if two MESs share medical events 250.01 and 279.00, these matched events are given equal weight by the original OTCS. The original OTCS simply counts matched events, regardless of event likelihood, risk, or severity.

OTCS-MES with Prevalence Weights (OTCS-MES-EP)

In contrast to the original OTCS, the OTCS-MES integrates unique features of MESs. The OTCS-MES provides a matching component that integrates event prevalence, event duplication, and hierarchical coding, important elements of MESs. Event prevalence, normalized to mitigate heavy positive skew and a compact distribution, provides weights for matched events. Partial matching captures similarity based on the hierarchical organization of event codes, increasing similarity beyond exact matching. For example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, the OTCS-MES considers these events matching at the 4-digit level but not at the 5-digit level (the most specific ICD-9-CM codes).
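To make the hierarchical comparison concrete, the 3-, 4-, and 5-digit matching just described can be sketched as follows. This is a hypothetical helper for illustration, not the dissertation's implementation:

```python
def match_level(code_a: str, code_b: str) -> int:
    """Return the deepest ICD-9-CM hierarchy level (3, 4, or 5 digits)
    at which two codes agree, or 0 if they differ at the 3-digit level.
    Illustrative sketch of hierarchical partial matching."""
    a, b = code_a.replace(".", ""), code_b.replace(".", "")
    level = 0
    for digits in (3, 4, 5):
        if len(a) >= digits and len(b) >= digits and a[:digits] == b[:digits]:
            level = digits
    return level

print(match_level("250.00", "250.01"))  # partial match at the 4-digit level -> 4
print(match_level("250.00", "250.00"))  # exact 5-digit match -> 5
print(match_level("250.00", "279.00"))  # no match at any level -> 0
```

Exact matching, as in the original OTCS, corresponds to accepting only level 5 here; partial matching credits the 250.00/250.01 pair at level 4.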


The event matching component in XXXX uses normalized event prevalence weighting. Since this research uses only event matching, the other components of the OTCS-MES are ignored. Equation 2 shows the definition of the event matching component using event prevalence (OTCS-MES-EP). In the numerator, the event matching component sums prevalence weights and accounts for the number of matched events. The denominator is the maximum number of matched events across all MES pairs plus one, less the number of matched events between the two MESs under consideration:

OTCS-MES-EP = ( Σ_{e ∈ ME} NPW_e ) / ( PDM + 1 − MSSizeLimit )    (2)

where ME is the set of all matching events in the pair of cases (medical event sequences), C is the set of all cases, MSSize is the cardinality of the associated set, NPW_e is the normalized prevalence weight of event e, PDM is the maximum matched event limit (the maximum MSSize over all pairs of cases in C), and MSSizeLimit is the number of event matches in the pair of cases constrained by the matched event limit.

Event prevalence weighting presumes that rarer events matched between two MESs indicate greater similarity than more common matched events. The OTCS-MES-EP calculates individual event likelihood, or prevalence, using the complete set of trauma incident events and associated diagnosis codes. An event's prevalence weight is one minus the event's frequency rate, so larger values (weights) indicate rarer events. OTCS-MES-EP normalizes the summation of prevalence weights of matched events by the maximum prevalence weight summation across all MES pairs. Additionally, OTCS-MES-EP retains replicated matched events, whereas the original OTCS measure removes replicated event matches.
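The prevalence weighting and matching structure described for Equation 2 can be sketched as follows. The normalization of weights is simplified here, and `ep_match_score`, `pdm`, and the toy case base are illustrative assumptions rather than the measure's actual implementation:

```python
from collections import Counter

def prevalence_weights(all_sequences):
    """One minus each code's frequency rate across all events, so rarer
    codes receive larger weights. Simplified sketch: the dissertation's
    measure additionally normalizes these weights."""
    counts = Counter(code for seq in all_sequences for code in seq)
    total = sum(counts.values())
    return {code: 1.0 - n / total for code, n in counts.items()}

def ep_match_score(seq_a, seq_b, weights, pdm):
    """Sketch of the Equation 2 structure: sum prevalence weights over
    matched events (duplicates retained via multiset intersection),
    divided by the matched-event limit plus one less the match count."""
    matched = list((Counter(seq_a) & Counter(seq_b)).elements())
    return sum(weights[c] for c in matched) / (pdm + 1 - len(matched))

# Toy case base of three tiny event sequences (hypothetical codes)
cases = [["25000", "27900", "8080"], ["25000", "27900"], ["8080", "9584"]]
w = prevalence_weights(cases)
score = ep_match_score(cases[0], cases[1], w, pdm=3)
```

The multiset intersection mirrors the text's point that OTCS-MES-EP retains replicated matched events rather than removing them.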


OTCS-MES with Severity Weighting (OTCS-MES-ES)

For trauma data, event severity provides intuitive appeal for weighting matched events in mortality prediction. As previously presented, early methods for mortality prediction incorporated injury scoring systems with event severity. The reference literature identified two important factors for injury scoring: injury type and anatomical body region. Injury type describes the nature of the injury and includes values such as contusion, sprain, open wound, and dislocation. Body region identifies the anatomical area of the body injured, such as head and neck, spine and back, torso, and extremities. Based on these two variables, Barell et al. (2002) developed a matrix having nature-of-injury columns, body region rows, and ICD-9-CM injury codes in each cell. As an extension to this work, Clark and Ahmad (2006) assigned a survivor proportion to each cell of the Barell matrix. Our study uses the Clark/Ahmad extension with survivor proportions assigned to each ICD-9-CM injury code. A severity weight equals one minus the survivor proportion, with larger values indicating more severe events.

Equation 3 defines the OTCS-MES-ES, a revision of the OTCS-MES-EP, for severity weights. Essentially, we replace the prevalence weight for a matched event, as shown in Equation 2, with the severity weight for that same matched event. We then sum the severity weights for all matched events between the MESs under consideration and normalize this value by the maximum severity weight summation across all MES pairs:

OTCS-MES-ES = ( Σ_{e ∈ ME} SW_e ) / ( PDM + 1 − MSSizeLimit )    (3)

where ME is the set of all matching events in the pair of cases (medical event sequences), C is the set of all cases, MSSize is the cardinality of the associated set,


SW_e is the normalized severity weight of event e, PDM is the maximum matched event limit, and MSSizeLimit is the number of event matches in the pair of cases constrained by the matched event limit.

Nearest Neighbor Classification for Mortality Prediction

The similarity measures described above can be used in nearest neighbor classification algorithms. After a brief presentation of the nearest neighbor classification algorithm used in our experiment, this section describes adjustments to our classification approach that account for imbalanced trauma data.

Nearest Neighbor Classification Algorithms

The kNN classification algorithm (Bhatia and Ashev, 2009) provides a simple but computationally intense approach for classification using a distance function. To make classification decisions, the kNN algorithm uses a neighborhood of k nearest neighbors with majority voting among the k neighbors. In this study, inverted similarity measures (1 − similarity) were used as distance measures. The kNN classification algorithm requires no training, so it is a lazy learner. However, it uses all cases to classify new cases, so it requires large storage space and incurs high search cost to retrieve nearest neighbors. Search cost can be substantially reduced by indexing, so that indexing can be considered a training cost, qualifying the designation as a lazy learning algorithm. To reduce computational requirements for large case bases, we created indexes for the creation and search of dissimilarity matrices.

kNN classification has low bias but high variance (Manning, Raghavan, and Schütze, 2008). The decision boundaries in kNN vary in a nonlinear manner, providing flexibility for classification decisions. Each case has a positive probability of correct classification for some training sets. In contrast, kNN classification has high variance, with sensitivity to noise in relevant attributes. Classification algorithms


with high variance tend to overfit. For kNN, the distance function, neighborhood size, and case base size influence bias and variance, emphasizing the importance of these choices.

To improve prediction performance, we use two variations of nearest neighbor classification. Weighted voting allows more impact for neighbors close to a target case and less impact for far neighbors. The main benefit of weighted voting is less sensitivity to neighborhood size. We use proportional weights defined by Dudani (1976) as an alternative to traditional equal-weight voting. Ensembles combine the predictions of individual classifiers, typically using weighted voting among classifiers on each case. Ensembles improve classification results for diverse classifiers with different biases. Many ensemble methods have been proposed for nearest neighbor classification, using both training and voting to combine individual classifiers (García-Pedrajas and Ortiz-Boyer, 2009). We use a soft voting ensemble (scikit-learn.org/stable/modules/ensemble.html) with cases labeled according to the sum of predicted scores. This ensemble requires additional classification resources, as it requires determination of nearest neighbors for each component classifier.

Secondary Research – Accommodating the Uncommon Mortality Class

Despite the serious condition of patients admitted to trauma centers, mortality is uncommon. Treatment at a trauma center is short term, so only death between admittance and discharge counts as mortality. Patients dead on arrival or discharged to another facility do not count in the mortality disposition recorded in trauma data. After adjusting for death outside of the trauma center window, missing data, small trauma centers, and few diagnosis codes, the deceased prevalence in our sample data was 6.28 percent. Oversampling of the uncommon class provides a typical strategy for dealing with imbalanced data.
Although Maloof (2003) reports some conflicting results for over- versus under-sampling, the availability of cases for the uncommon class drives the use of oversampling. Since ample data was


available, we used oversampling to deal with the uncommon mortality class. Specifically, we used the oversampling optimum fraction (Kalton 1993) to increase the proportion of mortality events in the training data. Equation 4 defines the Kalton oversampling optimum fraction f_h for class h:

f_h = sqrt( P_h / c )    (4)

where P_h is the prevalence of the uncommon class and c is the ratio of the data collection costs per case of the uncommon class to the common class. Using Equation 4 with the trauma mortality prevalence of 6.28 percent and equal data collection costs yields an optimum sampling fraction of 25.07%.

A second strategy for dealing with an uncommon class modifies the majority voting rule. When the prevalence of the uncommon class cannot be adjusted, a modified voting rule can compensate for the scarcity of cases of the uncommon class. Zhang (2010) proposed kNN-CF, a certainty factor (CF) measure for kNN classification on imbalanced data. kNN-CF classification accounts for the lift in the proportion of extreme outcomes among the k nearest neighbors over the proportion of extreme outcomes in the population as a whole. In our experiment, we also used certainty factor voting as an alternative adjustment for the uncommon mortality class.

Primary Research – Trauma Mortality Prediction Using Similarity Measures

This study asserts that a similarity measure adapted to medical event histories can be a valuable clinical decision-making tool. Within this broad assertion, this experiment addresses the predictive capability of MES-adapted similarity measures for the classification of trauma incident outcomes based on the incident's set of events. We are interested in comparisons involving the predictive performance of classifiers using individual similarity measures (OTCS, OTCS-MES-EP, and OTCS-MES-ES), the existing standard for trauma mortality prediction (TMPM), and an ensemble of nearest neighbor classifiers using the individual similarity measures.
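Assuming the square-root form given for Equation 4 (the exact functional form is a reconstruction from the surrounding description), the reported sampling fraction can be checked numerically:

```python
import math

def kalton_fraction(prevalence: float, cost_ratio: float = 1.0) -> float:
    """Optimum oversampling fraction for a rare class (after Kalton 1993):
    the square root of prevalence divided by the relative per-case
    collection cost. Sketch only; the exact form is an assumption."""
    return math.sqrt(prevalence / cost_ratio)

fraction = kalton_fraction(0.0628)  # trauma mortality prevalence, equal costs
print(round(fraction, 4))  # -> 0.2506, matching the reported ~25.07% fraction
```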
We aim to observe improved prediction performance for OTCS-MES over


TMPM and improved prediction ability of OTCS-MES relative to the original OTCS. The following list presents hypotheses concerning predictive performance.

1. TMPM, as the recognized best method, performs better than the MES similarity measures (OTCS, OTCS-MES-EP, OTCS-MES-ES, and the OTCS ensemble). As explained previously, TMPM was designed specifically to predict mortality for trauma center incidents. Its derivation, based on a large amount of trauma data, contains two probit regression models accounting for the type of injury and the body region of the injury. According to Glance et al. (2009), since TMPM-ICD9 performs better than ICISS and the SWI model, it should be preferred for risk-adjusting trauma outcomes when injuries are recorded using ICD-9-CM codes. Furthermore, Cassidy et al. (2014) confirms the superiority of TMPM for injury scoring of pediatric patients, especially as the number of injuries increases. Because the NTDB mandates ICD coding for trauma incidents, TMPM should continue as the preferred method for trauma incident prediction.

2. OTCS-MES, adapted to medical event sequences, performs better than the original OTCS similarity measure on mortality prediction. Unlike the original OTCS similarity measure, OTCS-MES allows generalized weighting of matched events and partial matching. These two capabilities should result in improved performance on trauma mortality prediction. Despite its shortcomings, the OTCS may still identify the most important matching events to predict mortality in trauma patients. Lack of coding detail may negate the advantage of weighted, partial matching. Coding detail depends on data collection practices at trauma centers, and perhaps beyond trauma centers, with some ICD codes reported in a patient's medical record before a trauma incident occurs. As such, the original OTCS may match the predictive performance of OTCS-MES with large numbers of cases and large neighborhood sizes.


3. OTCS-MES using event severity weighting (OTCS-MES-ES) performs better than OTCS-MES using prevalence weighting (OTCS-MES-EP). Injury severity is an appropriate weighting method for scoring trauma incidents based on the reference literature. Also, injury severity has already been quantified by several scoring systems. Most recently, scoring systems based on ICD-9-CM codes and incorporating injury type and anatomical region have been found effective in classification experiments (Hedegaard et al. 2016). OTCS-MES-ES, using an event severity score (the Barell matrix survivor proportion), should demonstrate improved performance for trauma incident classification.

4. The best ensemble combining individual similarity classifiers should perform better than individual similarity-based classifiers. Ensembles improve the performance of diverse classifiers. We expect enough diversity between event matching based on exact matching, normalized event prevalence with partial matching, and event severity with partial matching to achieve improved prediction results.

The primary research questions listed above are evaluated by incorporating method choices made during our secondary investigation. Table 13 summarizes these method choices for algorithm voting, neighborhood size, adjustment method for imbalanced data, and case base size. Accordingly, the secondary investigation is performed first to determine our best alternatives for these methodology parameters. The secondary analysis on neighborhood size also indicates whether trauma data contain noise, where sensitivity to small neighborhood sizes (1 to 5) reflects noisy data.

Table 13: Summary of Variables for Secondary Research Questions

Variable | Choices
Nearest neighbor algorithm voting | Traditional majority voting and soft voting with proportional weights
Neighborhood size (k) | 1 to 49 (odd only)
Adjustment method for imbalanced data | Majority voting, oversampling, CF voting
Case base size | 5,000, 10,000, 50,000
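As a sketch of the proportional-weight voting option in Table 13, Dudani (1976) weights assign 1 to the nearest neighbor, 0 to the farthest, and interpolate linearly in between. The function names below are illustrative, not the study's implementation:

```python
def dudani_weights(distances):
    """Dudani (1976) proportional weights for kNN voting: nearest neighbor
    gets weight 1, farthest gets 0, others scale linearly.
    `distances` must be sorted in ascending order."""
    d1, dk = distances[0], distances[-1]
    if dk == d1:
        return [1.0] * len(distances)
    return [(dk - d) / (dk - d1) for d in distances]

def weighted_vote(neighbors):
    """Soft vote over (distance, label) pairs using Dudani weights.
    A sketch of weighted nearest neighbor voting."""
    neighbors = sorted(neighbors)  # ascending distance
    weights = dudani_weights([d for d, _ in neighbors])
    totals = {}
    for (_, label), w in zip(neighbors, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)

# Distance = 1 - similarity, as in the study; one very close neighbor
# outweighs two distant ones.
print(weighted_vote([(0.1, "deceased"), (0.4, "survived"), (0.5, "survived")]))
# -> deceased
```

Under equal-weight majority voting the same neighborhood would predict "survived", which illustrates how proportional weighting reduces sensitivity to neighborhood size.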


Trauma Data Set

Hospital-based trauma registries provide a foundation for much research on improving care of injured patients. Research has been limited by the lack of consistent, quality data received from disparate hospitals, regions, and states. To address this limitation, the American College of Surgeons developed the National Trauma Data Standard (www.facs.org/quality-programs/trauma/ntdb/ntds) to standardize core variables across hospitals. A wide variety of trauma centers contribute to the National Trauma Data Bank (www.facs.org/quality-programs/trauma/ntdb), a large aggregation of trauma data conforming to the National Trauma Data Standard.

Data Filters

We used the National Trauma Data Bank for our mortality prediction experiment. Specifically, we randomly selected test and training data from the complete set of 2015 trauma incidents in the trauma registries. Table 14 summarizes the filters applied to the trauma data. We apply these filters to ensure that all methods, including those developed by other researchers, use equivalent evaluation data. Specifically, Glance et al. (2009) provides the following description of the data filters: "Patients with burns or nontrauma diagnoses (eg, poisoning, drowning, suffocation) (60,753), missing or invalid data (data missing on age, gender, or outcome [HOSPDISP]) (42,025), or age younger than 1 year (7925) were excluded. Patients who were dead on arrival (2378) or transferred to another facility (52,169) were also excluded. We limited the data set to hospitals admitting at least 500 patients during at least 1 year of the study because we believed that coding would be more accurate in centers with substantial trauma experience (48,095 patients were excluded)." As further corroboration, Burd et al. (2008) applied these same data filters during development of the Bayesian Logistic Injury Severity Score.
The final filter, excluding trauma incidents having fewer than five diagnosis codes (events), is applied in accord with TMPM, which uses the five most severe injury codes in its regression.
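The filter sequence can be sketched as a simple pipeline. The field names below (other than HOSPDISP) are hypothetical stand-ins for the NTDB schema, and the thresholds are parameters so the sketch can run on toy data:

```python
from collections import Counter

def apply_filters(incidents, min_facility=500, min_codes=5):
    """Sketch of the Table 14 filter sequence. Each incident is a dict;
    field names other than HOSPDISP are illustrative assumptions."""
    kept = [i for i in incidents
            if i.get("age") is not None and i.get("gender") is not None
            and i["age"] >= 1                      # exclude age < 1 year
            and i.get("HOSPDISP") is not None      # discharge disposition present
            and not i.get("dead_on_arrival")
            and not i.get("transferred")]
    # keep only facilities handling at least `min_facility` incidents
    counts = Counter(i["facility"] for i in kept)
    kept = [i for i in kept if counts[i["facility"]] >= min_facility]
    # require at least `min_codes` diagnosis (event) codes
    return [i for i in kept if len(i["codes"]) >= min_codes]
```

Applying the same pipeline before every method's evaluation is what keeps the comparison across TMPM and the similarity classifiers equitable.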


Table 14: Summary of Filtered Trauma Data

2015 NTDB Trauma Incidents | Remaining
Original data set | 917,865
(1) Excluded incidents with all diagnoses being non-trauma (based on MARC table) | 728,309
(2) Excluded incidents for patients with age < 1 year, or missing age or gender | 685,587
(3) Excluded incidents with missing discharge disposition (HOSPDISP n/a) | 590,288
(4) Excluded incidents with patient DOA or with transfer to another facility | 427,545
(5) Excluded incidents for facilities handling < 500 incidents during the year | 403,534
(6) Excluded incidents having fewer than 5 diagnosis (event) codes | 175,319
(6a) Deceased disposition (6.28%) | 11,010
(6b) Non-deceased disposition (93.72%) | 164,309

From the 175,319 incidents having at least five events, we randomly selected 50,000 trauma incidents for a case base and 2,000 cases for testing. The training data set contains 465,325 total diagnosis codes (4,053 unique ICD-9-CM codes). We used the same test set to evaluate all hypotheses. Due to a shortage of deceased cases, the oversampled case base has a mortality prevalence of 22%, yielding 10,900 deceased cases.

ICD-9-CM Code Granularity

OTCS-MES uses hierarchical matching that leverages the more detailed diagnoses provided by 4- and 5-digit ICD-9-CM codes. The level of coding detail in a data set affects the predictive performance of OTCS-MES compared to OTCS. As shown in Table 15, inpatient data (from the Synthetic Public Use Files of the Centers for Medicare and Medicaid Services; cms.gov 2018) contains more detailed codes (59% are 5 digits in length) than the trauma data (45% are 5 digits in length). However, trauma data contains a lower percentage of 3-digit codes (3.5%) than inpatient data (5.3%). Based on these results, the OTCS-MES (EP and ES) may lose some advantage over OTCS due to the reduced level of diagnosis code detail for trauma incident data.


Table 15: Summary of Diagnosis Detail in Trauma and Inpatient Data

Code Length | Inpatient Events (#) | Inpatient Events (%) | Trauma Events (#) | Trauma Events (%)
3 digits | 1,988 | 5.31% | 1,588 | 3.52%
4 digits | 13,337 | 35.61% | 23,026 | 51.06%
5 digits | 22,123 | 59.08% | 20,478 | 45.41%
Total | 37,448 | 100.00% | 45,092 | 100.00%

Performance Measures

For statistical evaluations, we use the area under the receiver operating characteristic curve (AUROC or AUC) as the primary performance measure. AUC provides a prevalence-independent measure of discrimination ability in risk prediction models. AUC has several equivalent interpretations, including the probability that a uniformly drawn random positive example ranks higher than a uniformly drawn negative example. Calculation of AUC requires a ROC curve of classification scores. For nearest neighbor algorithms, we used voting proportions among nearest neighbors as classification scores. We performed two-tailed tests of AUC using Mann-Whitney confidence intervals augmented with the logit transformation (Qin and Hotilovac 2008). In a detailed simulation study (Kottas et al. 2014), the augmented Mann-Whitney intervals provided good AUC coverage, robustness to unbalanced sample sizes and departures from normality, and reasonable power.

Although AUC is widely recognized as a measure of discrimination ability, it does not provide an operating point for a classifier. To deploy a classifier, one must select an operating point corresponding to a score threshold. Each ROC point has an associated confusion matrix characterizing positive and negative predictions using the scoring threshold. In our experiment, we evaluated operating points using three measures: Youden's J statistic (also known as Youden's Index), the weighted Youden's Index, and the Neyman-Pearson criterion. Youden's index (Youden 1950), computed as sensitivity + specificity − 1, ranges from −1 to 1. A value of 1


indicates a perfect test with no false positives or false negatives. Li et al. (2013) introduced the weighted Youden's Index for cases where sensitivity and specificity are not equally important. In this study, predicting a deceased trauma incident outcome correctly (sensitivity) is more important than predicting a non-deceased outcome. Although difficult to quantify, trauma center mortality is a costlier outcome in both treatment options and risk (Newgard and Lowe 2016). In contrast to the tradeoff in Youden's Index, the Neyman-Pearson criterion maximizes sensitivity subject to a constraint on the false positive rate.

Results of Empirical Evaluation

This section presents results of the empirical evaluation addressing the primary and secondary research questions. Results for the secondary research questions are presented first because the primary research questions use results from the investigation of the secondary questions. That is, the secondary research determines the parameters of the classification method used to evaluate our primary research hypotheses.

Results for Secondary Research Questions

The results begin with an analysis of imbalanced versus oversampled trauma data for training. We compare average performance for methods dealing with imbalanced data (oversampled data and certainty factor (CF) voting). Results in Tables 16 and 17 use kNN with a neighborhood size (k) of 15. In Table 16, using average results across voting methods, oversampled training demonstrates improved performance based on Youden's Index (0.30 versus 0.13), along with AUC (0.7652 versus 0.7031). Furthermore, oversampled data improves the key metric in trauma incident prediction, sensitivity (0.7788 versus 0.4356). In accordance with Kalton (1993), these results suggest that oversampled training delivers improved predictive performance on trauma mortality across the three similarity measures.
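The two performance measures described above, AUC under its rank interpretation and Youden's J, can be computed directly from classification scores and a confusion matrix. This is an illustrative sketch, not the study's evaluation code:

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-interpretation AUC: the fraction of (positive, negative) score
    pairs where the positive case ranks higher, counting ties as half.
    O(n*m) brute force; fine for small illustrative examples."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, ranging over [-1, 1]."""
    return tp / (tp + fn) + tn / (tn + fp) - 1

print(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9 of pairs ordered correctly
print(youden_j(tp=80, fn=20, tn=70, fp=30))  # 0.8 + 0.7 - 1 = 0.5
```

In the study, the scores fed to such an AUC calculation are the voting proportions among nearest neighbors.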


Table 16: Comparison of Oversampled and Imbalanced Training Data (k = 15)

Oversampled trauma data:
Similarity Measure | Sensitivity | Specificity | Accuracy | Youden | AUC
OTCS | 0.8341 | 0.4106 | 0.4339 | 0.2447 | 0.7300
OTCS-MES-EP | 0.7795 | 0.5566 | 0.5689 | 0.3362 | 0.7719
OTCS-MES-ES | 0.7227 | 0.5960 | 0.6030 | 0.3188 | 0.7939
Average | 0.7788 | 0.5211 | 0.5353 | 0.2999 | 0.7652

Imbalanced (normal) trauma data:
Similarity Measure | Sensitivity | Specificity | Accuracy | Youden | AUC
OTCS | 0.4464 | 0.6515 | 0.6383 | 0.0979 | 0.6755
OTCS-MES-EP | 0.3960 | 0.7215 | 0.7005 | 0.1174 | 0.7049
OTCS-MES-ES | 0.4643 | 0.6976 | 0.6826 | 0.1619 | 0.7288
Average | 0.4356 | 0.6902 | 0.6738 | 0.1257 | 0.7031

Table 17 shows that changing the voting method between majority and certainty factor voting has minimal impact on predictive performance. CF and majority voting have similar average values across sampling methods for both Youden's Index and AUC. Interestingly, CF voting provides substantial improvements in sensitivity for mortality prediction, but much lower results for specificity and accuracy.

Table 17: Comparison of Certainty Factor and Majority Voting (k = 15)

Majority voting:
Similarity Measure | Sensitivity | Specificity | Accuracy | Youden | AUC
OTCS | 0.3656 | 0.8169 | 0.7876 | 0.1825 | 0.7049
OTCS-MES-EP | 0.3283 | 0.8762 | 0.8415 | 0.2045 | 0.7429
OTCS-MES-ES | 0.3149 | 0.9192 | 0.8818 | 0.2341 | 0.7615
Average | 0.3362 | 0.8708 | 0.8369 | 0.2070 | 0.7364

Certainty factor voting:
Similarity Measure | Sensitivity | Specificity | Accuracy | Youden | AUC
OTCS | 0.9149 | 0.2452 | 0.2846 | 0.1601 | 0.7007
OTCS-MES-EP | 0.8472 | 0.4019 | 0.4279 | 0.2491 | 0.7339
OTCS-MES-ES | 0.8721 | 0.3744 | 0.4038 | 0.2466 | 0.7612
Average | 0.8781 | 0.3405 | 0.3721 | 0.2186 | 0.7319

Weighted voting improves predictive performance, as shown in Table 18. For the moderate neighborhood size (k = 15), weighted voting provides better performance for AUC, Youden's Index, and sensitivity. For the large neighborhood size (k = 49), weighted voting improves AUC performance. Weighted voting provides more improvement for smaller neighborhoods, demonstrating some sensitivity of nearest neighbor classification to neighborhood size.
Since AUC is the key performance measure, the primary investigation uses weighted voting for each similarity measure.


Table 18: Comparison of Weighted and Non-Weighted Voting

Weighted voting based on similarity measure value (k = 15):
Similarity Measure | Sensitivity | Specificity | Accuracy | Youden | AUC
OTCS | 0.8182 | 0.5545 | 0.5690 | 0.3727 | 0.7573
OTCS-MES-EP | 0.7455 | 0.6360 | 0.6420 | 0.3814 | 0.7658
OTCS-MES-ES | 0.7545 | 0.7238 | 0.7255 | 0.4784 | 0.8066
Average | 0.7727 | 0.6381 | 0.6455 | 0.4108 | 0.7766

Non-weighted voting (k = 15):
OTCS | 0.3656 | 0.8169 | 0.7876 | 0.1825 | 0.7049
OTCS-MES-EP | 0.3283 | 0.8762 | 0.8415 | 0.2045 | 0.7429
OTCS-MES-ES | 0.3149 | 0.9192 | 0.8818 | 0.2341 | 0.7615
Average | 0.3362 | 0.8708 | 0.8369 | 0.2070 | 0.7364

Weighted voting based on similarity measure value (k = 49):
OTCS | 0.7455 | 0.6783 | 0.6820 | 0.4238 | 0.7722
OTCS-MES-EP | 0.7000 | 0.6995 | 0.6995 | 0.3995 | 0.7921
OTCS-MES-ES | 0.7091 | 0.7228 | 0.7220 | 0.4318 | 0.8255
Average | 0.7182 | 0.7002 | 0.7012 | 0.4184 | 0.7966

Non-weighted voting (k = 49):
OTCS | 0.7545 | 0.6302 | 0.6370 | 0.3847 | 0.7545
OTCS-MES-EP | 0.7455 | 0.6873 | 0.6905 | 0.4328 | 0.7455
OTCS-MES-ES | 0.7364 | 0.7947 | 0.7915 | 0.5311 | 0.8191
Average | 0.7455 | 0.7041 | 0.7063 | 0.4495 | 0.7730

To gain insight about sensitivity to neighborhood size (k), we increased neighborhood sizes (odd values only) from 1 to 49 and measured AUC for each k value. Figure 20 illustrates the impact of increased neighborhood size on AUC for each classification algorithm. All three similarity measures demonstrate improved performance with larger neighborhood sizes. Performance gains level off for OTCS-MES-EP and OTCS-MES-ES after k = 29. For OTCS, performance increases slightly until leveling off at k = 45. Figure 21 provides additional insight into the impact of increased neighborhood size on the two techniques for dealing with imbalanced data, oversampling and CF voting. Again, increased neighborhood size improves AUC performance for both techniques. The variability of predictive performance indicates some instability of nearest neighbor classification on trauma data.
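The neighborhood-size analysis above amounts to a simple parameter sweep. In the sketch below, `evaluate_auc` is a caller-supplied stand-in (an assumption, not the study's code) for fitting and scoring a kNN classifier at a given k:

```python
def best_k(evaluate_auc, k_values=range(1, 50, 2)):
    """Sweep odd neighborhood sizes and return the (k, AUC) pair with the
    best AUC. `evaluate_auc` maps a neighborhood size to a held-out AUC."""
    results = {k: evaluate_auc(k) for k in k_values}
    k_star = max(results, key=results.get)
    return k_star, results[k_star]

# Toy AUC curve that rises and levels off, mimicking the shape in Figure 20
k, auc = best_k(lambda k: 0.83 - 0.1 / k)
print(k)  # -> 49: the largest odd k wins on this monotone toy curve
```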


Figure 20: Comparison of Similarity Measures by Neighborhood Size (k)

Figure 21: Comparison of Methods for Imbalanced Data Sets on AUC

In the last part of the secondary analysis, we compared several case base sizes to understand performance improvements with a larger number of cases. Figure 22 indicates mixed results, with additional cases yielding small improvements for some classifiers but slightly worse performance for other classifiers. Since 50,000 cases provide a noticeable improvement for OTCS, we use the largest case base in the analysis of the primary research questions.


Figure 22: Impact of Case Base Size on AUC

The primary research results, presented in the next subsection, use the sampling, voting, and classification methods recommended by our secondary research. Specifically, our secondary analysis suggests a classification methodology that involves (1) oversampled training based on the Kalton Optimum Sampling Fraction, (2) weighted nearest neighbor voting, (3) a large neighborhood size (k > 41), and (4) a large case base (50,000 cases).

Results for Primary Research Questions

Analysis of AUC

Table 19 presents confidence intervals³ and related p-values addressing the primary hypotheses. For Hypothesis 1, the results show effects for TMPM compared to each individual OTCS measure and the OTCS ensemble. However, the effect for TMPM versus the ensemble classifier is reversed, indicating that the ensemble classifier outperforms TMPM on trauma mortality prediction. This is a significant finding in support of similarity measure classification to improve clinical decision-making. For Hypothesis 2, the results show effects for both OTCS-MES-EP and OTCS-MES-ES versus OTCS. This again supports the argument for improved performance from an MES-adapted similarity measure. For Hypothesis 3, test results show effects between OTCS-MES-EP and OTCS-MES-ES at an alpha of 0.10. For Hypothesis 4, test results show effects for all three classifiers versus the ensemble classifier, demonstrating sufficient diversity among the individual classifiers.

³ Computed using the Mann-Whitney measure with logit transformation (Kottas et al. 2014).

Table 19: Statistical Testing Results for Hypotheses⁴

Test  Classification Method 1                             Classification Method 2                              p-value
1a    TMPM (AUC 0.8392, CI: 0.8326-0.8458)                OTCS (AUC 0.7894, CI: 0.7840-0.7948)                 < 0.0001 *
1b                                                        OTCS-MES-EP (AUC 0.8065, CI: 0.8008-0.8122)          < 0.0001 *
1c                                                        OTCS-MES-ES (AUC 0.8194, CI: 0.8120-0.8268)          0.0056 *
1d                                                        OTCS-MES Ensemble (AUC 0.8589, CI: 0.8521-0.8657)    0.0037 *
2a    OTCS-MES-EP (AUC 0.8065, CI: 0.8008-0.8122)         OTCS (AUC 0.7894, CI: 0.7840-0.7948)                 0.0024 *
2b    OTCS-MES-ES (AUC 0.8194, CI: 0.8120-0.8268)         OTCS (AUC 0.7894, CI: 0.7840-0.7948)                 < 0.0001 *
3     OTCS-MES-ES (AUC 0.8194, CI: 0.8120-0.8268)         OTCS-MES-EP (AUC 0.8065, CI: 0.8008-0.8122)          0.0524 **
4a    OTCS-MES Ensemble (AUC 0.8589, CI: 0.8521-0.8657)   OTCS (AUC 0.7894, CI: 0.7840-0.7948)                 < 0.0001 *
4b                                                        OTCS-MES-EP (AUC 0.8065, CI: 0.8008-0.8122)          < 0.0001 *
4c                                                        OTCS-MES-ES (AUC 0.8194, CI: 0.8120-0.8268)          < 0.0001 *

⁴ *: significant at the traditional alpha of 0.05; **: significant at alpha of 0.10. The family-wise error rate (probability of making at least one Type I error) for simultaneous testing of 10 comparisons is 0.22.

ROC curves provide a visual representation of performance differences. In Figure 23, the ROC curve for the ensemble dominates all other ROC curves except for two small intervals. The ROC curve for TMPM dominates the ROC curves for OTCS and OTCS-MES-EP at false positive values below 0.40. For false positive values above 0.5, the ROC curves switch, with the two similarity measures dominating TMPM. The ROC curves for TMPM and OTCS-MES-ES cross in several areas, with OTCS-MES-ES showing a small advantage at low false positive values but TMPM showing a small advantage at larger false positive values.

Figure 23: ROC Curves for Mortality Prediction

Analysis of Operating Points

To provide insight about choosing an operating point on a ROC curve, we examine results across score thresholds. Figure 24 shows Youden's Index values for the classification methods over score thresholds. For TMPM, the threshold represents the probability of death, while the threshold for the kNN methods represents the voting proportion. Figure 24 shows a roughly linear increase in Youden's Index for TMPM until peaking at a low threshold (0.20) for probability of death. For the kNN methods, the graphs appear symmetric, with peaks at 0.55 for the ensemble, 0.50 for OTCS-MES-ES, 0.60 for OTCS-MES-EP, and 0.50 for OTCS.
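Selecting an operating point by Youden's Index, as in Figure 24, amounts to sweeping score thresholds and keeping the one that maximizes sensitivity + specificity − 1. A minimal sketch with invented scores and labels (not the study's data):

```python
# Sweep thresholds over classifier scores (probability of death or kNN voting
# proportion); labels are 1 = deceased, 0 = non-deceased.

def confusion_rates(scores, labels, threshold):
    """Return (sensitivity, specificity) for a score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

def best_youden_threshold(scores, labels, thresholds):
    """Return (threshold, Youden's Index) maximizing sensitivity + specificity - 1."""
    best = max(thresholds, key=lambda t: sum(confusion_rates(scores, labels, t)) - 1)
    sens, spec = confusion_rates(scores, labels, best)
    return best, sens + spec - 1

scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]   # invented
labels = [1,   1,   1,   0,    0,   1,   0,   0]
threshold, youden = best_youden_threshold(scores, labels, [i / 20 for i in range(1, 20)])
```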


Figure 24: Confusion Matrix Performance of Classification Methods

For optimal results on operating points, OTCS-MES-ES and the ensemble outperform TMPM when using equal weights for sensitivity and specificity. As shown in Table 20, the Youden values for OTCS-MES-ES (0.5430) and the ensemble (0.5541) are slightly better than TMPM (0.5296). The optimal threshold for TMPM (0.20) is much lower than the thresholds for the kNN methods (0.50 and 0.55). Since the sensitivity values for TMPM (0.7000), OTCS-MES-ES (0.7182), and the ensemble (0.7091) are rather low, more emphasis on sensitivity should be given when choosing an operating point.

Table 20: Confusion Matrix Summary for Equal Weights

Method        Sensitivity  Specificity  Accuracy  Youden   Threshold
TMPM          0.7000       0.8296       0.8225    0.5296   0.20
OTCS          0.6818       0.7164       0.7145    0.3982   0.55
OTCS-MES-ES   0.7182       0.8249       0.8190    0.5430   0.50
OTCS-MES-EP   0.8182       0.6238       0.6345    0.4420   0.45
Ensemble      0.7091       0.8450       0.8375    0.5541   0.55

The OTCS methods show an increasing advantage over TMPM as the weight on sensitivity increases. Figure 25 shows weighted Youden Index values as the sensitivity weight increases from equal sensitivity/specificity (1/1) to a high preference for sensitivity (10/1). The ensemble dominates TMPM at all weight levels. The individual OTCS approaches (OTCS, OTCS-MES-EP, and OTCS-MES-ES) dominate TMPM at sensitivity weights above 3/1. The performance of TMPM remains relatively flat over the range of sensitivity weights, while the four OTCS approaches increase in a linear manner with an increasing performance improvement over TMPM.

Figure 25: Weighted Youden's Index by Method and Cost Ratio

The ensemble approach also shows advantages over TMPM using the Neyman-Pearson criteria. As shown in Figure 26, the ensemble provides higher sensitivity at false positive constraints below 0.6. At false positive constraints above 0.6, TMPM and the ensemble show similar sensitivity values, with some crossing between the TMPM and ensemble graphs. For the individual OTCS methods, TMPM shows an advantage for false positive constraints above 0.5. For small false positive constraint levels, the individual OTCS methods show higher sensitivity values.
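The Neyman-Pearson selection just described (maximize sensitivity subject to a cap on the false positive rate) can be sketched as follows; scores and labels are invented for illustration, not the study's data:

```python
# Neyman-Pearson operating point: among thresholds whose false positive rate
# (1 - specificity) stays within max_fpr, keep the one with highest sensitivity.

def neyman_pearson_point(scores, labels, thresholds, max_fpr):
    """Return (threshold, sensitivity) with the best sensitivity where FPR <= max_fpr."""
    best = None
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if fpr <= max_fpr and (best is None or sens > best[1]):
            best = (t, sens)
    return best

scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]   # invented
labels = [1,   1,   1,   0,    0,   1,   0,   0]
point = neyman_pearson_point(scores, labels, [i / 10 for i in range(1, 10)], max_fpr=0.25)
```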


Figure 26: Maximum Sensitivity for False Positive Constraints (Neyman-Pearson Criteria)

Discussion

Overall, the analysis provides evidence (Table 21) to support all hypotheses except 1d, involving TMPM and the OTCS ensemble. Results demonstrate strong evidence to reject the null hypotheses (equal AUC performance), with less evidence supporting Hypothesis 3 involving OTCS-MES-ES and OTCS-MES-EP. Similarity measures using partial matching and matching weights provide better performance than simple matching. Event severity, based on domain knowledge of injuries, provides better predictive performance than prevalence weighting without domain knowledge of injuries. The OTCS ensemble provides improved predictive performance, demonstrating sufficient diversity in the base classifiers (OTCS, OTCS-MES-EP, and OTCS-MES-ES). TMPM, developed with a large data set and specialized domain knowledge, provides better predictive performance than individual nearest neighbor classifiers using similarity measures. The AUC values for Hypothesis 1d (TMPM versus OTCS ensemble) conflict with the expected results. The performance difference is large enough to give confidence that the ensemble provides improved performance over the established TMPM.


Table 21: Summary of Findings on Hypotheses

Hypothesis  Result                         Comments
1a          TMPM > OTCS                    Strong evidence to confirm
1b          TMPM > OTCS-MES-EP             Strong evidence to confirm
1c          TMPM > OTCS-MES-ES             Strong evidence to confirm
1d          OTCS Ensemble > TMPM           Strong evidence of opposite effect
2a          OTCS-MES-EP > OTCS             Strong evidence to confirm
2b          OTCS-MES-ES > OTCS             Strong evidence to confirm
3           OTCS-MES-ES > OTCS-MES-EP      Some evidence to confirm
4a          OTCS Ensemble > OTCS           Strong evidence to confirm
4b          OTCS Ensemble > OTCS-MES-EP    Strong evidence to confirm
4c          OTCS Ensemble > OTCS-MES-ES    Strong evidence to confirm

The ensemble of nearest neighbor classifiers provides advantages in simplicity and explanation capability over TMPM. The ensemble needs no training, as compared to extensive training with a large data set for TMPM. TMPM may also require periodic retraining to deal with concept drift. However, the ensemble requires more classification effort, combining nearest neighbor searches of three component classifiers. Indexing may be necessary to mitigate the additional resource usage of three nearest neighbor searches. Nearest neighbor classifiers provide superior explanations, with details of prominent cases rather than the regression coefficients used in the more complex TMPM model. However, the ensemble complicates explanations with a need to combine prominent cases from component classifiers.

Simplification of the ensemble has little negative impact. An ensemble using OTCS-MES-ES and OTCS-MES-EP generates an AUC of 0.8546, just slightly less than the AUC using three component classifiers (0.8568). OTCS with exact matching adds little diversity to the partial, weighted event matching in OTCS-MES-EP and OTCS-MES-ES. An ensemble with four component classifiers (OTCS, OTCS-MES-EP, OTCS-MES-ES, and TMPM) provides a slight performance improvement. The four-component ensemble generates an AUC of 0.8649 compared to 0.8568 with three components.
However, explanation of classification decisions would be difficult, combining important cases and regression coefficients for prominent medical events.
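The ensemble combination discussed above can be sketched as a simple average of component classifier scores; this is an illustrative sketch, not the dissertation's implementation, and the component scores below are invented:

```python
# Hypothetical ensemble: average the mortality scores (kNN voting proportions)
# of the component classifiers into one ensemble score, then threshold it.

def ensemble_score(component_scores):
    """Average the component classifiers' scores for one trauma incident."""
    return sum(component_scores) / len(component_scores)

# e.g., invented scores from OTCS, OTCS-MES-EP, and OTCS-MES-ES
score = ensemble_score([0.60, 0.70, 0.80])
predicted_deceased = score >= 0.55   # operating point from Table 20
```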


The AUC values for TMPM (0.839) are below the reported value (0.88) in Glance et al. (2009). A possible explanation for TMPM's smaller AUC value is concept drift in the more recent trauma data (2015) used in this study compared to the data used in the original study (2002 to 2006).

The ensemble method also provides better performance than TMPM using an operating point on a ROC curve. With equal weight on sensitivity and specificity, the ensemble provides a more credible score threshold (0.55) than TMPM (0.20) as well as a slightly larger optimal Youden value (0.5541 versus 0.5296). However, sensitivity values at the optimal Youden value seem too low, so higher weighting for sensitivity seems likely in practice. The nearest neighbor methods (individual and ensemble) provide linear improvements in the weighted Youden value as the cost ratio increases. The ensemble approach also shows advantages over TMPM using the Neyman-Pearson criteria at false positive constraints below 0.5. If a false positive constraint of 0.4 is feasible in practice, the ensemble should provide sufficient sensitivity (0.9545).

An important assertion in this study is the significance of partial, weighted matching using OTCS-MES versus the original OTCS. Concerning the OTCS-MES benefits for partial matching, Table 22 indicates that over 70% of the matched events between MESs are partial. Essentially, the original OTCS misses all of these partial matches, with 3- and 4-digit partial matches containing valuable similarity information.

Table 22: Match Level in Trauma Data (Test Incidents, 2K)

Target ICD-9-CM Length  Match Level  # of Matches  % of Total Matches
3                       3            264,774       3.70%
4                       3            2,236,076     31.25%
4                       4            1,347,003     18.82%
5                       3            1,750,320     24.46%
5                       4            1,086,807     15.19%
5                       5            470,492       6.58%
Exact Matches                        2,082,269     29.10%
Partial Matches                      5,073,203     70.90%
Total Matches                        7,155,472     100.00%
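The match levels in Table 22 come from comparing ICD-9-CM codes at their 3-, 4-, and 5-digit prefixes. A minimal sketch of that comparison (an illustration, not the dissertation's code; codes are treated as digit strings with the decimal point removed):

```python
def match_level(code_a, code_b):
    """Return the deepest ICD-9-CM hierarchy level (3, 4, or 5 digits) at which
    two codes agree, or 0 if they differ within the first three digits."""
    a = code_a.replace(".", "")
    b = code_b.replace(".", "")
    level = 0
    for depth in (3, 4, 5):
        if len(a) >= depth and len(b) >= depth and a[:depth] == b[:depth]:
            level = depth
    return level

# 250.00 vs. 250.01 match at the 4-digit level but not the 5-digit level,
# so exact-match-only OTCS would miss this pair entirely.
level = match_level("250.00", "250.01")
```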


Conclusion

We extended a similarity measure for medical event sequences (MESs) and evaluated its classification performance for mortality prediction using a data set of trauma incidents. To generalize the Optimal Temporal Common Subsequence for MESs (OTCS-MES) developed in a previous study (XXXX), we separated event matching into degree of matching (exact or partial, using the hierarchical structure of event codes) and weights to indicate the importance or relevance of events. The OTCS-MES-EP uses partial matching and event prevalence (EP) weights, while the OTCS-MES-ES uses partial matching and event severity (ES) weights. We compared these methods to the original OTCS measure (Zheng et al. 2010), which uses only exact event code matching and no weights. We compared the performance of nearest neighbor classification using MES similarity measures to the Trauma Mortality Prediction Model (TMPM), an accepted regression model for mortality prediction for trauma patients. The comparisons used a substantial data set from the National Trauma Data Bank. The secondary part of the comparison determined reasonable values for neighborhood size (k), weighted voting, the method to handle the imbalanced mortality class, and case base size, with a preference for large neighborhood sizes, weighted voting, oversampling, and a large case base. The primary part of the experiment compared nearest neighbor classification to TMPM using Receiver Operating Characteristic (ROC) curves, area under a ROC curve (AUC), and confusion matrices derived from operating points on the ROC curves. The results demonstrated an advantage on ROC curves, AUC, and operating points for the ensemble of nearest neighbor classifiers. Importantly, this research provides a better method to retrospectively evaluate trauma care for specific trauma centers or care facilities.
The method in this study supports improved strategic decisions concerning trauma care processes based on a better measure of care performance. Governance bodies for trauma care face multiple procedural and resource allocation decisions. The decisions involving trauma system components are many and varied, including staffing, EMS transport systems, triage and


transport protocols, EMS/trauma center communication systems, facility design, and inclusive coverage. We feel our OTCS-MES-based method can improve strategic decisions in these areas based on superior care facility templates derived from high-performing trauma centers. We plan future work on classification performance of linked patient records and a query architecture for medical event sequences. Linked patient records combine patient characteristics and medical event sequences. We plan to extend mortality prediction by combining medical events with key characteristics of trauma patients. We also plan to predict high-risk patients using patient characteristics and MESs containing both medical events and temporal structure. To support clinical decision-making using MESs, we will develop a query architecture supporting both similarity measures for linked MESs and regular expression matching to capture important patterns in MESs. To evaluate the query architecture, we will cooperate with medical professionals and analysts to determine use cases and evaluate the utility of query results. We will also develop storage and optimization techniques for large databases of linked MESs.


CHAPTER IV

EXTENDED SIMILARITY MEASURES TO PREDICT TRAUMA PATIENT MORTALITY

Abstract

In this final empirical evaluation, we extend a similarity measure for medical event sequences (MESs) with linked patient records and evaluate its classification performance for retrospective mortality prediction among trauma patients, a benchmark prediction task in medical decision-making. We extend the Optimal Temporal Common Subsequence for MESs (OTCS-MES) measure with distance functions and weights to combine key variables of patient records with MES similarity. Our empirical evaluation compares the predictive performance of the Trauma Mortality Prediction Model (TMPM), an accepted regression approach for mortality prediction in trauma data, to nearest neighbor algorithms using similarity measures based on medical event history and linked patient records. Using a large data set of trauma incidents from the National Trauma Data Bank, our results indicate improved predictive performance for an ensemble of nearest neighbor classifiers over TMPM augmented with a second-stage regression using patient variables. Furthermore, when supplementing our similarity measure with patient attributes, we see improved predictive performance over measures based solely on medical event sequences. Results provide additional evidence that similarity measures for medical event sequences are a powerful and easily adapted method for medical decision-making.

Introduction

This study further explores mortality prediction for trauma centers, argued previously as being an important classification task with established methods. To restate, trauma injuries account for a substantial number of deaths, being the fifth leading cause of death for all age groups (Glance et al. 2009), and treatment methods for trauma-related injuries are extremely costly, often leading to more expensive forms of care.
The importance and cost of trauma care mandate accurate benchmarks for trauma patient outcomes and trauma center effectiveness based on injury severity scores. Cassidy et al.


(2014) maintain that “accurate injury severity scoring systems are essential for benchmarking outcomes and objectively evaluating and improving trauma care.” Trauma care evaluation, based on benchmarks for mortality rates dependent upon injury severity, involves retrospective mortality prediction. Retrospective mortality prediction methods have “important clinical and economic implications because these tools are used to evaluate patient outcomes and quality of care” (Weeks et al. 2016). Prior studies (Celso et al. 2006) using retrospective mortality prediction have compared trauma care between (a) in-hospital facilities and independent trauma centers, (b) level I and level II trauma centers, and (c) trauma centers in high- and low-middle-income countries.

Because of the importance of mortality prediction for trauma centers, researchers have developed several prediction methods. The most widely accepted method, the Trauma Mortality Prediction Model (TMPM), involves detailed regression modeling of injury codes using a large training sample. Specifically, TMPM (Glance et al. 2009) uses derived coefficients for more than one thousand injury codes to make mortality predictions. In this portion of our research, we extend a similarity approach using medical event sequences and compare its predictive performance to TMPM for mortality prediction of trauma patients. We extend the Optimal Temporal Common Subsequence for MESs similarity measure (OTCS-MES) with distance functions and weights to combine key variables from patient records with medical event history. The empirical evaluation compares the predictive performance of nearest neighbor classification using the extended OTCS-MES to TMPM augmented with a second-stage regression using patient variables. Based on a large data set of trauma incidents from the National Trauma Data Bank, our results indicate improved predictive performance for an ensemble of nearest neighbor classifiers over the extended TMPM.


Related Work

Similarity Measures for Medical Event Sequences

Decades of development and usage of medical event sequences (MESs) have made them pervasive in health care. Electronic health records contain abstractions of standard diagnosis, procedure, and drug codes to characterize MESs. For example, Table 23 shows a sample MES of hospital admissions characterized by the International Classification of Diseases V9 (ICD-9-CM) primary diagnosis code for each event.

Table 23. Sample Inpatient MES

Member ID         Primary ICD-9-CM Diagnosis Code  Description                                     Event Start  Event End
00824B6D595BAFB8  491.22                           OBSTR. CHRONIC BRONCHITIS - ACUTE BRONCHITIS    2/7/2008     2/10/2008
00824B6D595BAFB8  491.21                           OBSTR. CHRONIC BRONCHITIS - ACUTE EXACERBATION  2/28/2008    3/4/2008
00824B6D595BAFB8  789.06                           ABDOMINAL PAIN - EPIGASTRIC                     3/25/2008    3/29/2008
00824B6D595BAFB8  428.0                            CONGESTIVE HEART FAILURE - UNSPECIFIED          3/29/2008    3/30/2008
00824B6D595BAFB8  780.2                            SYNCOPE AND COLLAPSE                            5/14/2008    5/16/2008

This study uses multiple similarity measures to classify similar trauma incidents: (1) the original Optimal Temporal Common Subsequence (OTCS), (2) OTCS-MES with event prevalence weights (OTCS-MES-EP), and (3) OTCS-MES with event severity weights (OTCS-MES-ES). The original OTCS, developed by Zheng et al. (2010), uses exact matching for events in a sequence. However, the unique hierarchical nature of MESs was not utilized in the measure. For example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, the original OTCS would not find a match, although these events are highly related. In addition, the original OTCS does not weight matched events in its event matching component. We adapted the original OTCS to medical event sequences (OTCS-MES) and compared the new measure to the original OTCS (Mannino et al. 2017). OTCS-MES incorporates hierarchical coding in partial matches, duplicate event matching, and prevalence weights in the event matching component.
As a partial matching example, if one MES contains ICD-9-CM code 250.00 and a second MES contains ICD-9-CM code 250.01, OTCS-MES considers these events matching at the 4-digit level, but not at the 5-digit level. In a comparison of OTCS-MES with OTCS and Artemis, a measure incorporating event alignment, on two substantial data sets of Medicare claims data, we found a small overlap in nearest neighbors, demonstrating the superior design of OTCS-MES with its emphasis on unique aspects of MESs.

In Fredrickson et al. (2018), we generalized the weighting component of the OTCS-MES to generate two variations. For trauma incident classification, our first adaptation uses event prevalence weighting of matched events (OTCS-MES-EP). Event prevalence weighting presumes that rarer events matched between two MESs indicate greater similarity than more common matched events. The OTCS-MES calculates individual event likelihood or prevalence using the complete set of trauma incident events and associated diagnosis codes. An event's prevalence weight is one minus the event's frequency rate, so larger values (weights) indicate rarer events. Our second adaptation uses severity weights for each injury code matched between two trauma incidents. For trauma data, event severity provides intuitive appeal for weighting matched events in mortality prediction. In support, early methods for mortality prediction incorporated injury scoring systems based on event severity. Our approach assumes matched events with greater severity are indicative of more similar incidents. Accordingly, we use the Clark/Ahmad extension (Clark and Ahmad 2006) to quantify event severity, with survivor proportions assigned to each ICD-9-CM injury code. A severity weight equals one minus the survivor proportion, with larger values indicating more severe events.

Classification Methods for Retrospective Mortality Prediction

Because of the importance of mortality prediction for trauma centers, several classification (scoring) methods for retrospective mortality prediction have been developed.
The Injury Severity Score (ISS), an early method for severity scoring, uses the Abbreviated Injury Scales to score injuries and predict trauma outcomes (Cassidy 2014). The International Classification of Diseases Injury Severity Score uses empirically derived survival risk ratios for ICD-9-CM codes (Glance 2009). Burd et al. (2008)


developed the Bayesian Logistic Injury Severity Score to leverage ICD-9-CM trauma coding with 2,210 possible injury codes and 243,037 two-way interactions among injury codes. In contrast to the International Classification of Diseases Injury Severity Score, the Bayesian Logistic Injury Severity Score uses injury interactions, not just individual injury codes. Burd et al. (2008) found slight improvements in prediction performance with the Bayesian Logistic Injury Severity Score compared to the International Classification of Diseases Injury Severity Score, but much better model calibration with the Hosmer-Lemeshow h statistic. Finally, the more recent Trauma Mortality Prediction Model (TMPM), a probit regression model, supports alternative injury codes (Abbreviated Injury Scales, ICD-9-CM, or ICD-10), using approximately 1,000 different types of injuries. TMPM comprises two separate probit models. Model 1 uses all possible injuries as binary predictors with death as the binary outcome. Model 2 uses indicators of body region severity. A weighted average of the coefficients of the two regression models provides the empirical severity for each injury. Comparative research demonstrated that TMPM-ICD9 provides superior performance to other ICD-9-CM-based models predicting trauma outcomes. The superior predictive performance of TMPM-ICD9 was most noted as the number of injuries increased (Cassidy et al. 2014), compatible with multiple injury bundles and well matched to our adapted similarity measures. Furthermore, Haider et al. (2012) found that TMPM-ICD9 is a superior predictor of mortality when compared with earlier methods and that “TMPM-ICD9 may be an even better measure of human injury, and its use in administrative or nonregistry data is suggested”. Accordingly, in this study, we use TMPM-ICD9 because of its industry-recognized performance.
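The coefficient blending step described for TMPM can be sketched abstractly; this is an illustration of the idea only, with invented coefficients and weight, not the actual TMPM values (which come from Glance et al. 2009):

```python
# Hypothetical sketch: TMPM-style empirical severity for one injury as a
# weighted average of the coefficients from two probit models (injury-level
# predictors and body-region severity indicators). Values are invented.

def empirical_severity(coef_model1, coef_model2, weight=0.5):
    """Blend two model coefficients for the same injury into one severity value."""
    return weight * coef_model1 + (1.0 - weight) * coef_model2

severity = empirical_severity(coef_model1=1.2, coef_model2=0.8, weight=0.5)
```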
Research Methodology

We evaluate the predictive performance of a similarity measure combining medical event sequences and variables from linked patient records. In support of this approach, research about TMPM (Glance et al. 2009) provided some evidence about improvement in predictive performance from the


inclusion of data elements from linked patient records. As patient attributes are often accessible for linkage with medical event histories, it is important to understand the consequences of combining medical event sequences with patient attributes in terms of improved clinical decision-making. The empirical evaluation of these extended predictive techniques involves retrospective mortality prediction. The experiment uses a substantial data set from the National Trauma Data Bank (NTDB) and extended similarity measures combining medical event sequences and linked patient records. This section describes the research methodology for the design and evaluation of extended similarity measures for MESs, research goals and hypotheses, data, and performance measures.

Patient Attributes and Prediction Methods

Distance between entities typically combines individual distances between the multiple attributes of each entity. We extend our previously developed similarity measure, based only on medical event histories, with distance measures for attributes extracted from typical patient records. Using these extended similarity measures, we predict trauma incident outcome (deceased or non-deceased) with nearest neighbor classification.

Selection of Patient Attributes

Selection of the patient attributes for extending our OTCS-MES measure is guided by reference research. An extended version of TMPM (Glance et al. 2009) identifies age, gender, and the nature of the injury as key predictive variables for trauma patient mortality. We evaluated the importance of these variables through logistic regression analysis of effects and principal component analysis. Figure 27 provides the results of these analyses, identifying age, gender, and injury mechanism or cause as variables important to trauma outcome prediction.


Figure 27: Patient Record Variable Significance

Distance Functions for Patient Attributes

After selecting patient variables (age, gender, and injury mechanism), we chose a distance measure for each variable. For categorical variables such as gender and injury mechanism, the Value Difference Metric (VDM) provides an appropriate distance function based on conditional class probabilities (Wilson et al. 1997). Essentially, the Value Difference Metric considers entities similar if they have similar correlations with the output classes. For example, Table 24 shows that the distance between male and female is 0.0273 (similarity 0.9727) based on correlation with trauma outcome. For the final patient attribute of age, we used normalized Euclidean distance, appropriate for numeric variables.

Table 24: VDM Computations for Gender and Injury Mechanism

Distance(a, b) = |P(Dcsd|a) - P(Dcsd|b)| + |P(NonDcsd|a) - P(NonDcsd|b)|

Attribute         a             b              P(Dcsd|a)  P(Dcsd|b)  P(NonDcsd|a)  P(NonDcsd|b)  Distance(a, b)
Gender            Male          Female         0.0670     0.0533     0.9330        0.9467        0.0273
Injury Mechanism  0 Other       1 MVT-Ped.     0.0539     0.0997     0.9461        0.9003        0.0916
                  0 Other       2 Firearm      0.0539     0.1737     0.9461        0.8263        0.2396
                  0 Other       3 Drwn/Sbmrs   0.0539     0.1333     0.9461        0.8667        0.1588
                  0 Other       4 Suffocation  0.0539     0.2727     0.9461        0.7273        0.4376
                  1 MVT-Ped.    2 Firearm      0.0997     0.1737     0.9003        0.8263        0.1480
                  1 MVT-Ped.    3 Drwn/Sbmrs   0.0997     0.1333     0.9003        0.8667        0.0672
                  1 MVT-Ped.    4 Suffocation  0.0997     0.2727     0.9003        0.7273        0.3460
                  2 Firearm     3 Drwn/Sbmrs   0.1737     0.1333     0.8263        0.8667        0.0808
                  2 Firearm     4 Suffocation  0.1737     0.2727     0.8263        0.7273        0.1980
                  3 Drwn/Sbmrs  4 Suffocation  0.1333     0.2727     0.8667        0.7273        0.2788
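The VDM distance in Table 24 follows directly from the conditional class probabilities. A minimal sketch (an illustration, not the dissertation's code), using the gender probabilities from the table:

```python
# Value Difference Metric for a categorical attribute: sum, over the output
# classes, of the absolute differences in conditional class probabilities.

def vdm_distance(p_class_given_a, p_class_given_b):
    """p_class_given_a/b map each outcome class to P(class | attribute value)."""
    return sum(abs(p_class_given_a[c] - p_class_given_b[c]) for c in p_class_given_a)

male = {"Deceased": 0.0670, "NonDeceased": 0.9330}
female = {"Deceased": 0.0533, "NonDeceased": 0.9467}
d = vdm_distance(male, female)  # matches the Gender entry in Table 24 up to rounding
```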


Predictive Method 1 – Multi-Stage Logistic Regression

Two separate classification methodologies were used to evaluate the predictive impact of linked patient attributes. First, a multi-stage regression, combining either the TMPM-computed probability of death or OTCS-MES event similarity with patient attributes, was used to predict trauma mortality. Basically, we added the patient record attributes to a previously modeled morbidity score (either TMPM_pDeath or the OTCS-MES similarity score) in the second-stage logistic regressions. Accordingly, our logistic regression predicting a trauma incident outcome of deceased or non-deceased uses the following independent or predictor variables:

- First-stage predictive score based on either (a) injury code regression (TMPM) or (b) kNN classification probability from OTCS-MES similarity measures
- Binary gender flag (1 = Male, 0 = Female)
- Ordinal injury mechanism value (0 = Other, 1 = Motor Vehicle Traffic/Pedestrian, 2 = Firearm, 3 = Drown/Submersion, 4 = Suffocation)
- Scalar value for age (1-89)

As stated, we predict trauma incident mortality using a logistic regression model incorporating either the probability of death from TMPM (TMPM_pDeath) or the OTCS-MES mortality score based on kNN nearest neighbor classification. Equation 5 below shows the first logistic regression model using the TMPM probability of death, and Equation 6 is the second model using the OTCS-MES-based mortality score.

logit(pDeath) = -5.468 + 5.1644 × TMPMpMortality + 0.0303 × Age + 0.6758 × Gender + 0.5527 × InjMech   (5)

logit(pDeath) = -8.9338 + 11.0058 × OTCSMESpMortality + 0.0271 × Age + 0.7018 × Gender + 0.4214 × InjMech   (6)


106 where TMPMpMortality is the probability of mortality for the trauma incident using TMPM pDeath , OTCSMESpMortality is the probability of mortality for the trauma incident using the OTCS MES score from nearest neighbor classification Predictive Method 2 – kNN Classification The second classification methodology uses kNN classification leveraging a similarity measure extended with distance functions for age, gender and injury mechanism. The distance measu res (Euclidean or VDM) for our patient attributes were previously explained. The addition of these distance measures results in four measures for similarity between trauma incidents. The resulting equation for our similarity measure extended with patient record attributes is shown as Equation 7 . The weighting of these similarity measures to arrive at a single composite measure, is described in the results section for Method 2. = ( ) + ( ) + ( ) + ( ) = 1 ( 7 ) The first component of the Equation 7 quantifies similarity based on matched trauma incident events. We include t he “event matching” component of OTCS MES only because the “temporal structure” component of OTCS MES is not applicable to the “bundled” event codes found in trauma registries. That is, there is no temporal duration or spacing information provided for trau ma diagnoses (events) . As explained previously, we compute MES similarity using two methods . The first method sums the prevalence weights of matched events. Event prevalence weighting presumes that rarer events matched between two MESs indicate greater sim ilarity than more common matched events. Equation 8


below shows the event prevalence-based MES similarity computation (OTCS-MES-EP). For the second measure, our study computes the severity weight for a matched injury code as one minus the survivor proportion for that injury code from the Clark/Ahmad extension to the Barell injury code matrix. Larger severity weights indicate more severe events. Equation 9 defines the severity-based MES similarity component (OTCS-MES-ES), a revision of the prevalence-based MES similarity component incorporating a summation of severity weights for matched events. Essentially, we are replacing the prevalence weight for a matched event, as shown in Equation 8, with the severity weight for that same matched event.

OTCS-MES-EP(t1, t2) = ( Σ e∈ME NPWe ) / (PDM + 1), with |ME| capped at MSSizeLimit   (8)

OTCS-MES-ES(t1, t2) = ( Σ e∈ME SWe ) / (PDM + 1), with |ME| capped at MSSizeLimit   (9)

where ME is the set of all matching events in the pair of cases (medical event sequences), C is the set of all cases, MSSize is the cardinality of the associated set, NPWe is the normalized prevalence weight of event e, SWe is the normalized severity weight of event e, PDM is the maximum matched event limit, and MSSizeLimit is the number of event matches in the pair of cases constrained by the matched event limit.

The kNN classification algorithm (Bhatia and Ashev, 2009) provides a simple but computationally intense approach for classification using a distance function. To make classification decisions, the kNN algorithm uses a neighborhood of the k nearest neighbors, with voting among the k neighbors.
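The matched-event computation of Equations 8 and 9 can be sketched as follows. This is a simplified illustration under stated assumptions: the event codes, weight values, and the cap value `PDM` are hypothetical, and the choice to keep the highest-weight matches when the cap binds is an assumption rather than the authors' documented rule.

```python
# Hypothetical maximum matched-event limit (PDM in the text).
PDM = 5

def mes_similarity(mes_a, mes_b, weights, pdm=PDM):
    """Sum normalized weights of matched events, capped at pdm matches,
    scaled by 1/(pdm + 1) -- the shape of Equations 8 and 9."""
    matched = sorted(set(mes_a) & set(mes_b),        # ME: matching events
                     key=lambda e: weights[e], reverse=True)
    capped = matched[:pdm]                           # MSSizeLimit-bounded matches
    return sum(weights[e] for e in capped) / (pdm + 1)

# Hypothetical normalized prevalence and severity weights per event code.
prevalence_w = {"fracture": 0.2, "contusion": 0.1, "laceration": 0.3}
severity_w   = {"fracture": 0.6, "contusion": 0.2, "laceration": 0.4}

a = ["fracture", "contusion", "burn"]
b = ["fracture", "laceration", "contusion"]
sim_ep = mes_similarity(a, b, prevalence_w)  # prevalence-weighted (Eq. 8)
sim_es = mes_similarity(a, b, severity_w)    # severity-weighted (Eq. 9)
```

Only the weight table changes between the two measures, mirroring the text's point that Equation 9 simply substitutes severity weights for prevalence weights.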


In this study, we used inverted similarity measures (1 – similarity) as distance measures. To reduce computational requirements for large case bases, we created indexes for searching the similarity matrices. We also altered the nearest neighbor voting and classification methods, as recommended by Fredrickson et al. (2018), to accommodate the rare outcome of trauma incident mortality. Our final classification methodology uses (1) oversampled training based on the Kalton Optimum Sampling Fraction, (2) majority nearest neighbor voting, (3) large neighborhood sizes (k > 41), and (4) an expanded case base of trauma incidents (50,000).

Research Goals and Hypotheses

Our research goals address the value added by patient attributes when predicting trauma mortality. Our first objective is to evaluate the performance improvement from extending the industry standard (TMPM) for trauma mortality prediction. Second, we extend our alternative predictive method, OTCS-MES-based kNN classification, with similarity components or distance functions for patient record variables. Our final goal is to assess trauma mortality prediction using kNN classification from similarity measures as a viable alternative to TMPM. Our evaluation uses Area under the Receiver Operating Characteristic Curve and threshold-specific operating points.

The research goals described above lead to the following hypotheses concerning extended methods for trauma outcome prediction.

1. The extended TMPM performs better than extended OTCS-MES classification methods, but performs worse than an ensemble of extended OTCS-MES methods, on trauma mortality prediction. Prior research (Fredrickson et al. 2018) demonstrated that an ensemble of MES-adapted similarity measures outperforms TMPM, which serves as the referential method for trauma mortality prediction. However, this same research indicated that TMPM performs better than
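The kNN step described above (1 − similarity as distance, majority voting among the k most similar cases) can be sketched compactly. The case base, labels, and k value below are hypothetical; real neighborhoods in the study are much larger (k > 41).

```python
from collections import Counter

def knn_classify(similarities, labels, k):
    """Majority vote among the k most similar stored cases.
    Distance is 1 - similarity, so nearest = highest similarity."""
    ranked = sorted(zip(similarities, labels),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical similarity of each stored case to one query incident.
sims   = [0.9, 0.85, 0.8, 0.2, 0.1]
labels = ["deceased", "deceased", "survived", "survived", "survived"]
prediction = knn_classify(sims, labels, k=3)
```

The voting proportion among the k neighbors (e.g., 2/3 "deceased" here) is what the study uses as the classification score when building ROC curves.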


classification methods based on individual (rather than ensemble) similarity measures, such as the event severity- or event prevalence-based OTCS-MES. TMPM, extended with patient attributes for age, gender and injury mechanism, demonstrated a small improvement in predictive performance (Glance et al. 2009). We anticipate a similar improvement in predictive performance for our classification method based on an ensemble of MES similarity measures extended with patient attributes.

2. A severity-based OTCS-MES similarity measure classification method performs better than prevalence-based OTCS-MES on trauma mortality prediction. As explained previously, injury severity is an appropriate weighting method for scoring trauma incidents. In fact, injury severity has been quantified by several diverse scoring methods because of its effectiveness in characterizing trauma incidents. Furthermore, like our similarity measure based on injury severity, injury-specific scoring systems leveraging diagnosis codes, injury type, and anatomical region have been found appropriate for trauma outcome prediction. Therefore, it is reasonable to propose that OTCS-MES, weighted by an event severity score (e.g., the Clark/Ahmad Barell matrix survivor proportion), will demonstrate improved performance on trauma mortality prediction over similarity measures weighting matched events in other ways.

3. The ensemble OTCS-MES similarity measure for classification performs better than the individual severity- or prevalence-based OTCS-MES similarity measures on trauma mortality prediction. Ensembles improve the performance of diverse classifiers. We expect to observe diversity in predictive performance between event matching based on exact normalized event prevalence with partial matching and event severity with partial matching. Essentially, the extent of the


diversity between these two measures should result in improved prediction results when incorporating both into an ensemble measure.

4. All methods extended with linked patient record attributes perform better than methods without linked patient attributes. In accordance with Hypothesis 1, the extension of predictive methods with additional data elements from linked patient records should provide improved performance. As such, we anticipate that extended classification methods, kNN nearest neighbor or logistic regression, will demonstrate improved performance on trauma mortality prediction.

Data

Trauma center registries provide a foundation for research improving trauma patient care. We use the publicly available National Trauma Data Bank (NTDB) of registries for our mortality prediction experiment. Specifically, we randomly selected test and training data from the complete set of trauma incidents in the collected NTDB trauma registries for 2015. Table 25 summarizes the filters applied to the trauma data. For an equitable comparison to TMPM, these filters are the same ones used during development of TMPM (Glance et al. 2009). The final data filter, excluding trauma incidents having fewer than five event or diagnosis codes, is applied in accordance with the TMPM criterion of considering the five worst injuries based on severity score.
Table 25: Summary of Filtered Trauma Data

2015 NTDB Trauma Incidents                                                              Remaining
Original data set                                                                         917,865
(1) Excluded incidents with all diagnoses being non-trauma (based on MARC table)          728,309
(2) Excluded incidents for patients w/age less than 1 year, or missing age or gender      685,587
(3) Excluded incidents with missing discharge disposition (HOSPDISP n/a)                  590,288
(4) Excluded incidents w/patient DOA or w/transfer to another facility                    427,545
(5) Excluded incidents for facilities handling less than 500 incidents during the year    403,534
(6) Excluded incidents having fewer than 5 diagnosis (event) codes                        175,319
    (6a) Deceased Disposition (6.28%)                                                      11,010
    (6b) Non-Deceased Disposition (93.72%)                                                164,309


From the 175,319 incidents having at least five events, we randomly selected 50,000 trauma incidents for a case base and 2,000 cases for testing. The training data set contains 465,325 total diagnosis codes (4,053 unique ICD-9-CM codes).

Predictive Performance Evaluation Metrics – AUC and Operating Points

For statistical evaluations, we use Area under the Receiver Operating Characteristic Curve (AUROC or AUC) as the primary performance measure. AUC provides a prevalence-independent measure of discrimination ability in risk prediction models. Calculation of AUC requires a ROC curve of classification scores. We used voting proportions among nearest neighbors as classification scores. Two-tailed tests of AUC are based on Mann-Whitney confidence intervals augmented with the Logit transformation (Qin and Hotilovac 2008). In a detailed simulation study (Kottas et al. 2014), the augmented Mann-Whitney intervals provided good AUC coverage, robustness to unbalanced sample sizes and normality departures, and reasonable power.

We evaluated operating points using three measures: Youden's J statistic (Youden's Index), the weighted Youden's Index, and the Neyman-Pearson criterion. Youden's Index (Youden 1950), computed as sensitivity + specificity – 1, ranges from –1 to 1. A value of 1 indicates a perfect test with no false positives or false negatives. Li et al. (2013) introduced the weighted Youden's Index for cases where sensitivity and specificity are not equally important. Given the cost in treatment and risk for deceased outcomes (Newgard and Lowe 2016), predicting a trauma incident outcome of deceased correctly (sensitivity) is more important than predicting a non-deceased outcome. Accordingly, we also use the Neyman-Pearson criterion, maximizing sensitivity subject to false positive constraints.
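Two of the operating-point criteria above are easy to make concrete. The sketch below shows Youden's J and a simple threshold search under a Neyman-Pearson false-positive-fraction constraint; the toy scores, labels, and constraint value are hypothetical, and the exhaustive threshold scan is an illustrative implementation, not the study's code.

```python
def youden(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1, ranging over [-1, 1]."""
    return sensitivity + specificity - 1.0

def neyman_pearson_threshold(scores, labels, max_fpf):
    """Scan candidate thresholds; keep the one maximizing sensitivity
    subject to false positive fraction <= max_fpf (labels: 1 = deceased)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for t in sorted(set(scores)):
        fpf = sum(s >= t for s in negatives) / len(negatives)
        sens = sum(s >= t for s in positives) / len(positives)
        if fpf <= max_fpf and (best is None or sens > best[1]):
            best = (t, sens)
    return best  # (threshold, sensitivity) or None if infeasible

# Hypothetical classification scores and outcome labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
threshold, sensitivity = neyman_pearson_threshold(scores, labels, max_fpf=0.34)
```

This mirrors Figure 30's framing: for each FPF constraint level, report the best achievable sensitivity.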


Results of Empirical Evaluation

Predictive Method 1 – Multi-Stage Logistic Regression

The coefficients for our predictor variables are based on a regression model of the 50,000 training incidents described in the Data section. The coefficients computed by this model are then applied to the 2,000 test incidents, arriving at a dependent value for probability of death for each test incident. After executing our two predictive models, we then compute comparative statistics evaluating their performance.

Comparison statistics for the two logistic regression models (TMPM and OTCS-MES) are shown in Table 26. The Akaike Information Criterion (AIC) and Schwarz Criterion (SC) provide estimators of the relative quality of statistical models for a given data set. Given a set of candidate models, the preferred model is the one with the minimum AIC value. Similarly, the model with the lowest SC is preferred. SC is based, in part, on the likelihood function, and it is closely related to AIC. Given these guidelines, our comparative statistics indicate that the extended OTCS-MES ensemble provides a preferable predictive model over TMPM. This result holds under both the AIC and SC comparative model statistics.

Table 26: Comparison Statistics for Logistic Regression Models (Method 1)

Measure   Intercept Only   Extended TMPM:              Extended OTCS-MES Ensemble:
                           Intercept and Covariates    Intercept and Covariates
AIC       853.93           642.50                      565.03
SC        853.53           670.51                      593.04

The results of our multistage logistic regression extending the referential TMPM and the OTCS-MES similarity measure ensemble are shown below (Figure 28 and Table 27). Patient record extension results in an improvement in AUC from 0.8392 to 0.8673 for TMPM and from 0.8589 to 0.8805 for the OTCS-MES ensemble. The OTCS-MES ensemble shows significant improvement (alpha of 0.05) in AUC over the referential TMPM (Table 27).
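The AIC and SC comparison is mechanical once model log-likelihoods are known. The sketch below uses the standard definitions; the log-likelihood values, parameter count, and sample size are hypothetical numbers chosen only to be roughly consistent with the magnitudes in Table 26, not recovered from the study.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_likelihood

def sc(log_likelihood, k, n):
    """Schwarz Criterion (BIC): k*ln(n) - 2*logL (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fitted log-likelihoods for the two candidate models,
# with k = 5 parameters (intercept + 4 covariates) and n = 2000 cases.
ll_tmpm, ll_otcs, k, n = -316.25, -277.515, 5, 2000

prefer_otcs = (aic(ll_otcs, k) < aic(ll_tmpm, k)
               and sc(ll_otcs, k, n) < sc(ll_tmpm, k, n))
```

Since both models use the same k and n here, the model with the larger log-likelihood wins under both criteria; the criteria can disagree only when the candidate models differ in parameter count.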


Figure 28: ROC Curves for Extended TMPM and OTCS-MES Ensemble (Method 1)

Table 27: AUC Hypothesis Test for Extended TMPM and OTCS-MES Ensemble (Method 1)

Method 1         AUC      Method 2                      AUC      z      p
TMPM             0.8392   OTCS-MES Ensemble             0.8589   2.90   0.0037
Extended TMPM    0.8673   Extended OTCS-MES Ensemble    0.8805   2.03   0.0424

Predictive Method 2 – kNN Classification with OTCS-MES Similarity Measure

Predictive Method 2 involves the inclusion of additional measures for patient record attributes to generate a composite similarity measure between trauma incidents. As such, a weighting technique is necessary to appropriately integrate our four similarity measures for event history, age, gender and injury mechanism. We evaluated two weighting approaches, based on principal component analysis (PCA) eigenvalues and decision tree importance factors (Table 28). For all measures, the order of similarity component importance is OTCS-MES, age, gender and injury mechanism.


Table 28: Similarity Component Weighting Alternatives for Method 2

                    OTCS-MES Event Severity       OTCS-MES Event Prevalence     OTCS-MES Ensemble
                    PCA            HPSPLIT        PCA            HPSPLIT        PCA            HPSPLIT
Variable            Eig.    Wt.    Imp.    Wt.    Eig.    Wt.    Imp.    Wt.    Eig.    Wt.    Imp.    Wt.
OTCS-MES            1.281   0.320  6.607   0.465  1.271   0.318  5.743   0.440  1.279   0.320  6.535   0.413
Age                 1.031   0.258  4.563   0.321  1.052   0.263  4.627   0.354  1.054   0.263  5.217   0.329
Gender              0.901   0.225  1.541   0.109  0.902   0.225  1.643   0.126  0.899   0.225  2.071   0.131
Injury Mechanism    0.787   0.197  1.486   0.105  0.776   0.194  1.051   0.080  0.769   0.192  2.016   0.127

Performance of composite similarity measures based on eigenvalue component weighting was consistently better. Therefore, all performance tests are based on eigenvalue-weighted similarity measures. Results of these tests are shown in Table 29. Importantly, the extended OTCS-MES ensemble provides better performance than the extended TMPM (alpha of 0.05).

Table 29: AUC Hypothesis Tests (Method 2)

     Method 1             AUC      Method 2             AUC      z      p
1a.  TMPM EXT             0.8673   OTCS-MES EP EXT      0.8598   1.22   0.222
1b.  TMPM EXT             0.8673   OTCS-MES ES EXT      0.8685   0.16   0.873
1c.  TMPM EXT             0.8673   OTCS-MES ENS EXT     0.8841   2.57   0.010
2.   OTCS-MES ES EXT      0.8685   OTCS-MES EP EXT      0.8598   1.32   0.187
3a.  OTCS-MES ENS EXT     0.8841   OTCS-MES EP EXT      0.8598   3.96   0.000
3b.  OTCS-MES ENS EXT     0.8841   OTCS-MES ES EXT      0.8685   2.26   0.024
4a.  TMPM EXT             0.8673   TMPM                 0.8392   4.23   0.000
4b.  OTCS-MES EP EXT      0.8598   OTCS-MES EP          0.8065   9.21   0.000
4c.  OTCS-MES ES EXT      0.8685   OTCS-MES ES          0.8194   6.57   0.000
4d.  OTCS-MES ENS EXT     0.8841   OTCS-MES ENS         0.8589   3.78   0.000

To provide insight about choosing an operating point on a ROC curve, we examine results across score thresholds for the extended methods. Figure 29 shows weighted Youden Index values as sensitivity weights increase from equal sensitivity/specificity (1/1) to a high preference for sensitivity (10/1). Of significance, the OTCS-MES ensemble and OTCS-MES ES dominate TMPM at sensitivity weights above 2/1.
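The eigenvalue-based weights in Table 28 are simply each component's PCA eigenvalue normalized by the eigenvalue total. The sketch below reproduces the weight column for the event severity measure from the table's eigenvalues; the function name is an illustrative assumption.

```python
def eigenvalue_weights(eigenvalues):
    """Normalize PCA eigenvalues so the weights sum to 1."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]

# Eigenvalues for OTCS-MES, age, gender, injury mechanism
# (Table 28, event severity measure, PCA columns).
eigs = [1.281, 1.031, 0.901, 0.787]
w = eigenvalue_weights(eigs)  # approx. [0.320, 0.258, 0.225, 0.197]
```

With four standardized components the eigenvalues sum to about 4, so each weight is roughly eigenvalue/4, matching the Wt. column in Table 28.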


Figure 29: Weighted Youden's Index by Method and Sensitivity Cost Ratio (Method 1)

Extended TMPM, OTCS-MES ES and the OTCS-MES ensemble show comparable results using the Neyman-Pearson criterion. As shown in Figure 30, all extended measures show similar sensitivity across FPF constraint levels, except for OTCS-MES EP, which lags for FPF constraints above 0.5.

Figure 30: Sensitivity by FPF Constraint Level (Method 1)


Discussion

The results in the previous section indicate that (1) the addition of variables from linked patient records improves the performance of all methods and (2) the extended OTCS-MES ensemble continues to perform better than extended TMPM. Linked patient records provide context beyond prediction based on medical event sequences alone, supporting our hypotheses. The addition of linked patient records continues the dominance of a general purpose prediction method (extended OTCS-MES) over a method with computation-intensive training (TMPM).

Prior research used design science to develop a clinical decision support tool (OTCS-MES) leveraging medical event sequences. Importantly, our research demonstrated both its flexibility in other medical domains and its competitive performance against an industry standard method for classification. The original OTCS-MES similarity measure was developed using inpatient and outpatient event sequences. Our research applies this same measure to trauma incidents and demonstrates comparable performance to an accepted method already functioning within this medical domain. The original OTCS-MES is adaptable to other medical domains because it allows for similarity component weighting and the inclusion/exclusion of various similarity components based on their domain applicability. For example, specific to trauma incidents, we do not require the temporal structure component of the measure, and we choose to weight matched events by severity. OTCS-MES allows both modifications with minimal effort. Furthermore, OTCS-MES extended with linked patient record attributes provided a performance improvement similar to that of TMPM, the referential method. In addition, kNN classification using similarity measures requires no training, as opposed to TMPM, which requires computationally intensive training.
Given these advantages for similarity measure-based classification, the application of this tool to support clinical decision-making in other medical domains is encouraging. We plan future research using extended OTCS-MES to predict patient risk (cost) based on inpatient admission MESs linked with patient record attributes.


As a limitation to our research, the AUC values for TMPM (0.839) are below the value (0.88) reported in Glance et al. (2009). A possible explanation for TMPM's smaller AUC value is concept drift: the trauma data used in this study (2015) is more recent than that used in the original study (2002 to 2006). Accordingly, TMPM may require periodic retraining to deal with concept drift. This includes the R implementation of TMPM ICD-9 applied in our experiments (https://cran.r-project.org/web/packages/tmpm/tmpm.pdf).

A further limitation may be a product of the filters used to generate our experimental data. The data filters are compatible with those used in earlier research to evaluate the referential method, TMPM. However, use of these filters may limit the generalization of our conclusions concerning predictive performance. For example, since TMPM is based upon the five worst injuries, we restricted our trauma incidents to those with five or more injury codes. As such, our conclusions concerning predictive performance are limited to trauma incidents having five or more injury codes, along with the other incident characteristics resulting from our data filters. Further research is recommended to determine the validity of our findings for trauma incidents having other characteristics.

Conclusion

Further research about similarity measures for medical event sequences is justified by this study and related previous studies. Prior studies demonstrate that similarity measures adapted to MESs perform differently than those used for other purposes. This study provides evidence that MES similarity measures have the potential to outperform industry standard methods for outcome prediction within specific medical domains. This key finding compels further exploration into the clinical decision support capabilities of MES similarity measures leveraging large and accessible stores of health care data.
There are several practical applications and resulting benefits from similarity measures functioning within clinical decision support systems. These include (1) improved patient classification and clustering, (2) increased patient adherence to clinical pathways (care management plans), and (3)


augmented discovery of similar patients for medical social networking. In a broader application, a query architecture can support similarity measures for medical event sequences and regular expression matching to capture important patterns in medical event sequences. To evaluate the query architecture, we will cooperate with health care professionals and analysts to determine use cases and evaluate the utility of query results in decision-making. We will also develop storage and optimization techniques for large databases of linked medical event sequences.

Besides addressing these areas of additional analysis, we plan future research on medical reasoning using OTCS-MES. To support reasoning by health care professionals using MESs for risk analysis, clinical pathways, and co-morbidity, we propose a matching operator and visualization tools for MESs. The matching operator will support temporal constraints about changes in medical events in MESs, such as increased severity and prolonged symptoms. Visualization tools for MESs will help health care professionals see important patterns in MESs. Human factors studies will be necessary to evaluate the utility of a query architecture combining a similarity measure, matching operator, and visualization tools.

Summary

The research presented in this dissertation responds to the reluctance of health care providers to use clinical decision support tools leveraging ever-growing volumes of digitized patient information. It is part of a movement toward the "science of medicine" from the more subjective and often less effective "practice of medicine". Specifically, we intend to advance the use of scientific methods leveraging pervasive health care data to improve clinical decision-making, and ultimately patient outcomes.

The introductory literature review broadly describes many inhibitors to physician adoption of technology. It explains how many of these inhibitors are unique to health care. The literature review


concludes with a narrower focus addressing the lack of effective CDSS tools available to providers and the introduction of a similarity measure using medical event sequences. The subsequent studies of this dissertation develop and evaluate a MES similarity measure as a CDSS tool, unique and beneficial to health care.

The first study introduces a similarity measure based on medical event histories (OTCS-MES) designed to perform differently than previously introduced state sequence similarity measures in identifying "like" patients. OTCS-MES specifically leverages the unique components of medical event sequences. Accordingly, the research proposition is that OTCS-MES performs differently, and ultimately better, within the health care context. The first study is design science research, and it supports the first part of this proposition by demonstrating the difference, or lack of overlap, in "nearest neighbor" patients generated using OTCS-MES versus other applicable similarity measures.

The second study addresses the other part of our research proposition by evaluating the performance of OTCS-MES versus an established or referential method. Specifically, this study utilizes OTCS-MES for trauma center mortality prediction and compares its predictive or classification performance against TMPM, a recognized and accepted method for trauma mortality prediction. Results from this empirical study demonstrate an advantage for kNN classification based on OTCS-MES similarity over the "industry standard" TMPM method on trauma outcome prediction.

The third study extends the OTCS-MES similarity measure with key patient-specific variables generally provided with EHRs, or more precisely in this case, trauma center registries. The augmentation of predictive variables with patient record data is intended to improve classification performance. This study uses demographic and injury mechanism data purchased in conjunction with trauma incident registries.
Specifically, we add age, gender and injury mechanism as similarity components to OTCS-MES and compare predictive performance to TMPM extended with the identical


variables. The extended OTCS-MES method continues to demonstrate improved predictive performance over the correspondingly extended TMPM method on trauma outcome prediction.

In summary, this dissertation presents research that is design science in nature. It introduces an original IT artifact, OTCS-MES, intended as a CDSS tool that health care providers can access to improve patient outcomes and clinical procedural efficiency. Further research is planned to assert the benefits of our OTCS-MES tool. Additionally, we strive to make its practical application possible through IT advancements in query architectures for retrieving and comparing MESs.


REFERENCES

Akhtar, Nikhat, and Devendera Agarwal. "A Literature Review of Empirical Studies of Recommendation Systems."

Almeida, H., Neto, D. G., Meira Jr, W., & Zaki, M. J. (2012). Towards a better quality metric for graph cluster evaluation. Journal of Information and Data Management, 3(3), 378.

Angst, C. M., & Agarwal, R. (2009). Adoption of electronic health records in the presence of privacy concerns: The elaboration likelihood model and individual persuasion. MIS Quarterly, 33(2), 339-370.

Barell, V., Aharonson-Daniel, L., Fingerhut, L. A., Mackenzie, E. J., Ziv, A., Boyko, V., ... & Heruti, R. (2002). An introduction to the Barell body region by nature of injury diagnosis matrix. Injury Prevention, 8(2), 91-96.

Baras, J. S., & James, M. R. (1994). Robust and risk sensitive output feedback control for finite state machines and hidden Markov models.

Bayyapu, K. R., & Dolog, P. (2010). Tag and neighbour based recommender system for medical events. Proceedings of MEDEX.

Beecks, C., Kirchhoff, S., & Seidl, T. (2014). On stability of signature based similarity measures for content-based image retrieval. Multimedia Tools and Applications, 71(1), 349-362.

Bhatia, N., & Ashev, V. Survey of nearest neighbor techniques. International Journal of Computer Science and Information Security, 8(2), 14-22.

Birman-Deych, E., Waterman, A. D., Yan, Y., Nilasena, D. S., Radford, M. J., & Gage, B. F. (2005). Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Medical Care, 43(5), 480-485.

Blumenthal, D. (2010). Launching HITECH. New England Journal of Medicine, 362(5), 382-385.

Boonstra, A., & Broekhuis, M. (2010). Barriers to the acceptance of electronic medical records by physicians from systematic review to taxonomy and interventions. BMC Health Services Research, 10(1), 231.

Botsis, T., Hartvigsen, G., Chen, F., & Weng, C. (2010). Secondary use of EHR: data quality issues and informatics opportunities.
Summit on Translational Bioinformatics, 2010, 1.

Brown, J., Kahn, M., & Toh, S. (2013). Data quality assessment for comparative effectiveness research in distributed data networks. Medical Care, 51(8 Suppl 3), S22.

Brownlee, J. (2015). 8 tactics to combat imbalanced classes in your machine learning dataset. Machine Learning Mastery.

Burd, R. S., Ouyang, M., & Madigan, D. (2008). Bayesian logistic injury severity score: A method for predicting mortality using International Classification of Disease-9 codes. Academic Emergency Medicine, 15(5), 466-475.

Burt, C. W., & Sisk, J. E. (2005). Which physicians and practices are using electronic medical records? Health Affairs, 24(5), 1334-1343.

Campbell, H., Hotchkiss, R., Bradshaw, N., & Porteous, M. (1998). Integrated care pathways. BMJ: British Medical Journal, 316(7125), 133.


Carvalho, Deborah R., Alex A. Freitas, and Nelson Ebecken. "Evaluating the correlation between objective rule interestingness measures and real human interest." European Conference on Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 2005.

Cassidy, Laura D., et al. "Is the Trauma Mortality Prediction Model (TMPM-ICD-9) a valid predictor of mortality in pediatric trauma patients?" Journal of Pediatric Surgery 49.1 (2014): 189-192.

cdc.gov 2014. Meaningful Use. http://www.cdc.gov/ehrmeaningfuluse/. Accessed November 26, 2014.

Celso, B., Tepas, J., Langland-Orban, B., Pracht, E., Papa, L., Lottenberg, L., & Flint, L. (2006). A systematic review and meta-analysis comparing outcome of severely injured patients treated in trauma centers following the establishment of trauma systems. Journal of Trauma and Acute Care Surgery, 60(2), 371-378.

Chawla, N. V. (2009). Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook (pp. 875-886). Springer, Boston, MA.

Chechulin, Y., Nazerian, A., Rais, S., & Malikov, K. (2014). Predicting patients with high risk of becoming high-cost healthcare users in Ontario (Canada). Healthcare Policy, 9(3), 68.

Chismar, W. G., & Wiley-Patton, S. (2003, January). Does the extended technology acceptance model apply to physicians. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences. IEEE.

Cimino, James J. "Coding systems in health care." Methods of Information in Medicine 35 (1996): 273-284.

Clark, D. E., & Ahmad, S. (2006). Estimating injury severity using the Barell matrix. Injury Prevention, 12(2), 111-116.

cms.gov 2014. EHR Incentive Programs. http://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/index.html?redirect=/EHRIncentivePrograms/.
Accessed November 3, 2014.

Coiera, E., & Tombs, V. (1998). Communication behaviours in a hospital setting: an observational study. BMJ, 316(7132), 673-676.

Couto, F. M., & Pinto, H. S. (2013). The next generation of similarity measures that fully explore the semantics in biomedical ontologies. Journal of Bioinformatics and Computational Biology, 11(05), 1371001.

Coyte, P. C., & Holmes, D. (2007). Health care technology adoption and diffusion in a social context. Policy, Politics, & Nursing Practice, 8(1), 47-54.

Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2), 224-227.

Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 319-340.

De Bleser, L., Depreitere, R., De Waele, K., Vanhaecht, K., Vlayen, J., & Sermeus, W. (2006). Defining pathways. Journal of Nursing Management, 14(7), 553-563.


De Coster, C., Quan, H., Finlayson, A., Gao, M., Halfon, P., Humphries, K. H., ... & Romano, P. S. (2006). Identifying priorities in methodological research using ICD-9-CM and ICD-10 administrative data: report from an international consortium. BMC Health Services Research, 6(1), 77.

Deborah, L. J., Baskaran, R., & Kannan, A. (2010). A survey on internal validity measure for cluster validation. International Journal of Computer Science & Engineering Survey, 1(2), 85-102.

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 837-845.

Devaraj, S., Sharma, S. K., Fausto, D. J., Viernes, S., & Kharrazi, H. (2014). Barriers and facilitators to clinical decision support systems adoption: A systematic review. Journal of Business Administration Research, 3(2), 36.

Di Bartolomeo, S., Tillati, S., Valent, F., Zanier, L., & Barbone, F. (2010). ISS mapped from ICD-9-CM by a novel freeware versus traditional coding: a comparative study. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 18(1), 17.

Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, (4), 325-327.

Farmer, K. C. (1999). Methods for measuring and monitoring medication regimen adherence in clinical trials and clinical practice. Clinical Therapeutics, 21(6), 1074-1090.

Ferreira, J. D., Hastings, J., & Couto, F. M. (2013). Exploiting disjointness axioms to improve semantic similarity measures. Bioinformatics, 29(21), 2781-2787.

Flexer, A. (2014, October). On inter-rater agreement in audio music similarity. In ISMIR (pp. 245-250).

Ford, E. W., Menachemi, N., & Phillips, M. T. (2006). Predicting the adoption of electronic health records by physicians: when will health care be paperless? Journal of the American Medical Informatics Association, 13(1), 106-112.

France, R. K. (1994).
Weights and measures: An axiomatic model for similarity computations.

Fredrickson, J., Mannino, M., Alqahtani, O., & Banaei-Kashani, F. (2018, August). Mortality prediction performance using similarity measures for medical event sequences. In Proceedings of the AMCIS Conference, New Orleans, LA.

Ganova-Iolovska, M., & Geraedts, M. (2009). Clinical pathways – the Bulgarian approach. Journal of Public Health, 17(3), 225-230.

Gans, D., Kralewski, J., Hammons, T., & Dowd, B. (2005). Medical groups' adoption of electronic health records and information systems. Health Affairs, 24(5), 1323-1333.

García-Pedrajas, N., & Ortiz-Boyer, D. (2009). Boosting k-nearest neighbor classifier by means of input space projection. Expert Systems with Applications, 36(7), 10570-10582.

Garfinkel, R., Gopal, R. D., Pathak, B. K., Venkatesan, R., & Yin, F. (2006). Empirical analysis of the business value of recommender systems.

Garg, A. X., Adhikari, N. K., McDonald, H., Rosas-Arellano, M. P., Devereaux, P. J., Beyene, J., ... & Haynes, R. B. (2005). Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA, 293(10), 1223-1238.


Glance, L. G., Osler, T. M., Mukamel, D. B., Meredith, W., Wagner, J., & Dick, A. W. (2009). TMPM-ICD9: A trauma mortality prediction model based on ICD-9-CM codes. Annals of Surgery, 249(6), 1032-1039.
Glance, L. G., Osler, T. M., Mukamel, D. B., & Dick, A. W. (2012). Impact of trauma center designation on outcomes: is there a difference between Level I and Level II trauma centers? Journal of the American College of Surgeons, 215(3), 372-378.
Goldzweig, C. L., Towfigh, A., Maglione, M., & Shekelle, P. G. (2009). Costs and benefits of health information technology: new trends from the literature. Health Affairs, 28(2), w282-w293.
Haider, A. H., Villegas, C. V., Saleem, T., Efron, D. T., Stevens, K. A., Oyetunji, T. A., ... & Schneider, E. B. (2012). Should the ICD-9 Trauma Mortality Prediction Model become the new paradigm for benchmarking trauma outcomes? Journal of Trauma and Acute Care Surgery, 72(6), 1695-1701.
Hashmi, A., Ibrahim-Zada, I., Rhee, P., Aziz, H., Fain, M. J., Friese, R. S., & Joseph, B. (2014). Predictors of mortality in geriatric trauma patients: a systematic review and meta-analysis. Journal of Trauma and Acute Care Surgery, 76(3), 894-901.
Hedegaard, H. B., Johnson, R. L., & Ballesteros, M. F. (2017). Proposed ICD-10-CM surveillance case definitions for injury hospitalizations and emergency department visits. National Health Statistics Reports, (100), 1-8.
Henderson, R., & Divett, M. J. (2003). Perceived usefulness, ease of use and electronic supermarket use. International Journal of Human-Computer Studies, 59(3), 383-395.
Henderson, T., Shepheard, J., & Sundararajan, V. (2006). Quality of diagnosis and procedure coding in ICD-10 administrative data. Medical Care, 1011-1019.
Hennington, A., & Janz, B. D. (2007). Information systems and healthcare XVI: physician adoption of electronic medical records: applying the UTAUT model in a healthcare context. Communications of the Association for Information Systems, 19(1), 5.
Hersh, W. R., Weiner, M. G., Embi, P. J., Logan, J. R., Payne, P. R., Bernstam, E. V., ... & Saltz, J. H. (2013). Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 51(8 Suppl 3), S30.
Hripcsak, G., Duke, J. D., Shah, N. H., Reich, C. G., Huser, V., Schuemie, M. J., ... & Van Der Lei, J. (2015). Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Studies in Health Technology and Informatics, 216, 574.
CMS. Linkable 2008-2010 Medicare Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF). https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF.html
Huebner, R. A. (2009). Diversity-based interestingness measures for association rule mining. Proceedings of ASBBS, 16(1).
Jordan, K., Porcheret, M., & Croft, P. (2004). Quality of morbidity coding in general practice computerized medical records: a systematic review. Family Practice, 21(4), 396-412.
Kalton, G. (2009). Methods for oversampling rare subpopulations in social surveys. Survey Methodology, 35(2), 125-141.


Khosla, V. (2014). 20 percent doctor included: Speculations & musings of a technology optimist.
Kilgo, P. D., Osler, T. M., & Meredith, W. (2003). The worst injury predicts mortality outcome the best: rethinking the role of multiple injuries in trauma outcome scoring. Journal of Trauma and Acute Care Surgery, 55(4), 599-607.
Kinsman, L., Rotter, T., James, E., Snow, P., & Willis, J. (2010). What is a clinical pathway? Development of a definition to inform the debate. BMC Medicine, 8(1), 31.
Klenk, S., Dippon, J., Fritz, P., & Heidemann, G. (2010). Determining patient similarity in medical social networks. In Proceedings of the First International Workshop on Web Science and Information Exchange in the Medical Web (pp. 6-14).
Kostakis, O., Papapetrou, P., & Hollmén, J. (2011, September). ARTEMIS: Assessing the similarity of event-interval sequences. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 229-244). Springer, Berlin, Heidelberg.
Kottas, M., Kuss, O., & Zapf, A. (2014). A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies. BMC Medical Research Methodology, 14(1), 26.
Kovács, F., Legány, C., & Babos, A. (2005, November). Cluster validity measurement techniques. In 6th International Symposium of Hungarian Researchers on Computational Intelligence.
Lee, C. P., & Shim, J. P. (2007). An exploratory study of radio frequency identification (RFID) adoption in the healthcare industry. European Journal of Information Systems, 16(6), 712-724.
Lee, D., & Hosanagar, K. (2016, April). When do recommender systems work the best? The moderating effects of product attributes and consumer reviews on recommender performance. In Proceedings of the 25th International Conference on World Wide Web (pp. 85-97). International World Wide Web Conferences Steering Committee.
Lee, M. D., Pincombe, B., & Welsh, M. (2005, January). An empirical evaluation of models of text document similarity. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 27, No. 27).
Lefebure, L. ICD-9 Coding of Discharge Summaries.
Li, D. L., Shen, F., Yin, Y., Peng, J. X., & Chen, P. Y. (2013). Weighted Youden index and its two-independent-sample comparison based on weighted sensitivity and specificity. Chin Med J (Engl), 126(6), 1150-1154.
Liberati, E. G., Ruggiero, F., Galuppo, L., Gorli, M., González-Lorenzo, M., Maraldi, M., ... & Vespignani, R. (2017). What hinders the uptake of computerized decision support systems in hospitals? A qualitative study and framework for implementation. Implementation Science, 12(1), 113.
Löhr, H., Sadeghi, A. R., & Winandy, M. (2010, November). Securing the e-health cloud. In Proceedings of the 1st ACM International Health Informatics Symposium (pp. 220-229). ACM.
Lupiani, E., Sauer, C., Roth-Berghofer, T., Juarez, J. M., & Palma, J. (2013, December). Implementation of similarity measures for event sequences in myCBR. In Proceedings of the 18th UKCBR Workshop.
Maloof, M. A. (2003, August). Learning when data sets are imbalanced and when costs are unequal and unknown. In ICML 2003 Workshop on Learning from Imbalanced Data Sets II (Vol. 2, pp. 2-1).
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1, No. 1, p. 496). Cambridge: Cambridge University Press.


Mannino, M., Fredrickson, J., Banaei-Kashani, F., Linck, I., & Raghda, R. A. (2017). Development and evaluation of a similarity measure for medical event sequences. ACM Transactions on Management Information Systems (TMIS), 8(2-3), 8.
McDonald, R. P., & Ho, M. H. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7(1), 64.
McGarry, K. (2005). A survey of interestingness measures for knowledge discovery. The Knowledge Engineering Review, 20(1), 39-61.
Miller, R. H., & Sim, I. (2004). Physicians' use of electronic medical records: barriers and solutions. Health Affairs, 23(2), 116-126.
Mistry, P. K., Lee, M. D., & Newell, B. R. (2016). An empirical evaluation of models for how people learn cue search orders. In Proceedings of the 38th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Montalvo, S., Martínez, R., & Fresno, V. (2015). Quality prediction of multilingual news clustering: An experimental study. Journal of Information Science, 41(4), 518-530.
Mookerjee, V. S., & Mannino, M. V. (1997). Redesigning case retrieval to reduce information acquisition costs. Information Systems Research, 8(1), 51-68.
Mortensen, J. M., Szabo, L., & Yancy Jr, L. Prediction of high-cost hospital patients.
Morton, M. E., & Wiedenbeck, S. (2009). A framework for predicting EHR adoption attitudes: a physician survey. Perspectives in Health Information Management/AHIMA, American Health Information Management Association, 6(Fall).
Müllensiefen, D., & Frieler, K. (2006). Evaluating different approaches to measuring the similarity of melodies. In Data Science and Classification (pp. 299-306). Springer, Berlin, Heidelberg.
Nambisan, P. (2014, January). EMR adoption among office-based physicians and practices: Impact of peer-to-peer interactions, peer support and online forums. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 2733-2740). IEEE.
Neves, G. A. O. (2015). Empirical study of the behavior of several recommender system methods on SAPO Videos.
Newgard, C. D., Holmes, J. F., Haukoos, J. S., Bulger, E. M., Staudenmayer, K., Wittwer, L., ... & Hsia, R. Y. (2016). Improving early identification of the high-risk elderly trauma patient by emergency medical services. Injury, 47(1), 19-25.
Neyman, J., & Pearson, E. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, 231(694-706), 289-337.
Norén, G. N., Hopstadius, J., Bate, A., Star, K., & Edwards, I. R. (2010). Temporal pattern discovery in longitudinal electronic patient records. Data Mining and Knowledge Discovery, 20(3), 361-387.
Osler, T., Glance, L., Buzas, J. S., Mukamel, D., Wagner, J., & Dick, A. (2008). A trauma mortality prediction model based on the anatomic injury scale. Annals of Surgery, 247(6), 1041-1048.
Overhage, J. M., Ryan, P. B., Reich, C. G., Hartzema, A. G., & Stang, P. E. (2011). Validation of a common data model for active safety surveillance research. Journal of the American Medical Informatics Association, 19(1), 54-60.


Painter, M. W., & Jha, A. K. (2015). Health information technology in the United States, 2015: Transition to a post-HITECH world (Executive Summary) (No. 03f5b8d79339485f90149d875ac0b40b). Mathematica Policy Research.
Perotte, A., Pivovarov, R., Natarajan, K., Weiskopf, N., Wood, F., & Elhadad, N. (2013). Diagnosis code assignment: models and evaluation metrics. Journal of the American Medical Informatics Association, 21(2), 231-237.
Petrevska, B., & Koceski, S. (2012). Tourism recommendation system: empirical investigation. Revista de turism - studii si cercetari in turism, (14), 11-18.
Plaisant, C., Mushlin, R., Snyder, A., Li, J., Heller, D., & Shneiderman, B. (2003). LifeLines: using visualization to enhance navigation and analysis of patient records. In The Craft of Information Visualization (pp. 308-312).
Plaisant, C., Shneiderman, B., & Mushlin, R. (1998). An information architecture to support the visualization of personal histories. Information Processing & Management, 34(5), 581-597.
Puzicha, J., Buhmann, J. M., Rubner, Y., & Tomasi, C. (1999). Empirical evaluation of dissimilarity measures for color and texture. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on (Vol. 2, pp. 1165-1172). IEEE.
Qin, G., & Hotilovac, L. (2008). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17, 207-221.
Quan, H., Li, B., Duncan Saunders, L., Parsons, G. A., Nilsson, C. I., Alibhai, A., & Ghali, W. A. (2008). Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Services Research, 43(4), 1424-1441.
Quan, H., Parsons, G. A., & Ghali, W. A. (2002). Validity of information on comorbidity derived from ICD-9-CM administrative data. Medical Care, 40(8), 675-685.
Quan, H., Sundararajan, V., Halfon, P., Fong, A., Burnand, B., Luthi, J. C., ... & Ghali, W. A. (2005). Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Medical Care, 1130-1139.
Randeree, E. (2007). Exploring physician adoption of EMRs: a multi-case analysis. Journal of Medical Systems, 31(6), 489-496.
Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen's d, and r. Law and Human Behavior, 29(5), 615.
Ruddle, R. A., Bernard, J., May, T., Lücke-Tieke, H., & Kohlhammer, J. (2016). Methods and a research agenda for the evaluation of event sequence visualization techniques. In Proceedings of the IEEE VIS 2016 Workshop on Temporal & Sequential Event Analysis. Leeds.
Sanders, D., Burton, D. A., & Protti, D. (2013). The healthcare analytics adoption model: A framework and roadmap.
Serrà, J., & Arcos, J. L. (2014). An empirical evaluation of similarity measures for time series classification. Knowledge-Based Systems, 67, 305-314.
Skinner, R. I. (2003). The value of information technology in healthcare/Reply. Frontiers of Health Services Management, 19(3), 3.
Southern, D. A., Quan, H., & Ghali, W. A. (2004). Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Medical Care, 42(4), 355-360.


Sox, H., McNeil, B., Wheatley, B., & Eden, J. (Eds.). (2008). Knowing what works in health care: a roadmap for the nation. National Academies Press.
Stang, P. E., Ryan, P. B., Racoosin, J. A., Overhage, J. M., Hartzema, A. G., Reich, C., ... & Woodcock, J. (2010). Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Annals of Internal Medicine, 153(9), 600-606.
Stone, T., Zhang, W., & Zhao, X. (2013, October). An empirical study of top-N recommendation for venture finance. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (pp. 1865-1868). ACM.
Sundararajan, V., Quan, H., Halfon, P., Fushimi, K., Luthi, J. C., Burnand, B., ... & International Methodology Consortium for Coded Health Information (IMECCHI). (2007). Cross-national comparative performance of three versions of the ICD-10 Charlson index. Medical Care, 45(12), 1210-1215.
Tang, L., Long, B., Chen, B. C., & Agarwal, D. (2016, August). An empirical study on recommendation with multiple types of feedback. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 283-292). ACM.
Terry, A. L., Giles, G., Brown, J. B., Thind, A., & Stewart, M. (2009). Adoption of electronic medical records in family practice: the providers' perspective. Family Medicine, 41(7), 508.
Tohira, H., Jacobs, I., Mountain, D., Gibson, N., & Yeo, A. (2012). Systematic review of predictive performance of injury severity scoring tools. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 20(1), 63.
Valderas, J. M., Starfield, B., Sibbald, B., Salisbury, C., & Roland, M. (2009). Defining comorbidity: implications for understanding health and health services. The Annals of Family Medicine, 7(4), 357-363.
Van Craenendonck, T., & Blockeel, H. (2015, June). Using internal validity measures to compare clustering algorithms. In Benelearn 2015 Poster Presentations (online) (pp. 1-8).
van de Klundert, J., Gorissen, P., & Zeemering, S. (2010). Measuring clinical pathway adherence. Journal of Biomedical Informatics, 43(6), 861-872.
Vanhaecht, K., De Witte, K., Depreitere, R., & Sermeus, W. (2006). Clinical pathway audit tools: a systematic review. Journal of Nursing Management, 14(7), 529-537.
Varshney, U. (2007). Pervasive healthcare and wireless health monitoring. Mobile Networks and Applications, 12(2-3), 113-127.
Venkatesh, V., Zhang, X., & Sykes, T. A. (2011). "Doctors do too little technology": A longitudinal field study of an electronic healthcare system implementation. Information Systems Research, 22(3), 523-546.
Vishwanath, A., & Scamurra, S. D. (2007). Barriers to the adoption of electronic health records: using concept mapping to develop a comprehensive empirical model. Health Informatics Journal, 13(2), 119-134.
Vrotsou, K., & Forsell, C. (2011, July). A qualitative study of similarity measures in event-based data. In Symposium on Human Interface (pp. 170-179). Springer, Berlin, Heidelberg.


Weeks, S. R., Stevens, K. A., Haider, A. H., Efron, D. T., Haut, E. R., MacKenzie, E. J., & Schneider, E. B. (2016). A modified Kampala trauma score (KTS) effectively predicts mortality in trauma patients. Injury, 47(1), 125-129.
Walter, Z., & Lopez, M. S. (2008). Physician acceptance of information technologies: Role of perceived threat to professional autonomy. Decision Support Systems, 46(1), 206-215.
Wilson, D. R., & Martinez, T. R. (1997). Improved heterogeneous distance functions. Journal of Artificial Intelligence Research.
Wongsuphasawat, K., & Shneiderman, B. (2009, October). Finding comparable temporal categorical records: A similarity measure with an interactive visualization. In Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on (pp. 27-34). IEEE.
Wongsuphasawat, K., Plaisant, C., Taieb-Maimon, M., & Shneiderman, B. (2012). Querying event sequences by exact match or similarity search: Design and empirical evaluation. Interacting with Computers, 24(2), 55-68.
Wu, J. H., Wang, S. C., & Lin, L. M. (2007). Mobile computing acceptance factors in the healthcare industry: A structural equation model. International Journal of Medical Informatics, 76(1), 66-77.
Wu, L., Li, J. Y., & Fu, C. Y. (2011). The adoption of mobile healthcare by hospital's professionals: An integrative perspective. Decision Support Systems, 51(3), 587-596.
Yan, Y., Fung, G., Dy, J. G., & Rosales, R. (2010, July). Medical coding classification by leveraging inter-code relationships. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 193-202). ACM.
Yang, S., Zhou, Y., & Baras, J. (2013). Compositional analysis of dynamic Bayesian networks and applications to complex dynamic system decomposition. Procedia Computer Science, 16, 167-176.
Yao, W., & Kumar, A. (2013). CONFlexFlow: Integrating flexible clinical pathways into clinical decision support systems using context and rules. Decision Support Systems, 55(2), 499-515.
Yoo, S., Cho, M., Kim, S., Kim, E., Park, S. M., Kim, K., ... & Song, M. (2015). Conformance analysis of clinical pathway using electronic health record data. Healthcare Informatics Research, 21(3), 161-166.
Youden, W. (1950). Index for rating diagnostic tests. Cancer, 3, 32-35.
Zhang, S. (2010). KNN-CF approach: Incorporating certainty factor to kNN classification. IEEE Intelligent Informatics Bulletin, 11(1), 24-33.
Zhang, X., Li, Y., Kotagiri, R., Wu, L., Tari, Z., & Cheriet, M. (2017). KRNN: k Rare-class Nearest Neighbour classification. Pattern Recognition, 62, 33-44.
Zhang, Y., Padman, R., & Wasserman, L. (2014). On learning and visualizing practice-based clinical pathways for chronic kidney disease. In AMIA Annual Symposium Proceedings (Vol. 2014, p. 1980). American Medical Informatics Association.
Zheng, A., Zhou, X., Ma, J., & Petridis, M. (2010, June). The optimal temporal common subsequence. In Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on (pp. 316-321). IEEE.


APPENDIX A TAXONOMY OF BARRIERS (Boonstra and Broekhuis 2010)


APPENDIX B RESEARCH MODEL OF EHR ADOPTION DETERMINANTS


APPENDIX C OMOP CDM TABLES


APPENDIX D THE CDSS ACCEPTANCE MODEL FACILITATORS (Devaraj et al. 2014)


APPENDIX E OTCS-MES C# ALGORITHM

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Data.OleDb;

namespace OTCS_Algorithm
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            int MaxEntity;
            int MaxClaims;
            int TotalEvents;
            int TotalMatchEvents;
            int i, j, k;
            int state_count;
            int dur_diff;
            int gap_diff;
            int match_level;
            string sqlA;
            string sqlB;
            String connectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=CMS_Claims_Db_OTCS_MES.accdb";
            String tableName = "Use_Cases_CMS_Claims_Minimal";
            String tableName1 = "Entities";
            String tableName2 = "Claims";
            String CaptionEnt = "Number of Unique Entities: ";
            String CaptionClm = "Number of Unique Claims: ";
            String CaptionEventA = "Number of Events for This Entity: ";
            String query1 = String.Format("select Distinct ENTITY_ID from Use_Cases_CMS_Claims_Minimal", tableName);
            String query2 = String.Format("select * from Use_Cases_CMS_Claims_Minimal", tableName);
            DataTable de = new DataTable();
            DataTable dc = new DataTable();
            DataTable DataTableA = new DataTable();
            DataTable DataTableB = new DataTable();
            DataRow DataRowViewMstEnt;
            DataRow DataRowViewMstEntEvent;
            DataRow DataRowViewMatchEntEvent;
            DataSet ds = new DataSet();
            DataSet DataSetA;
            DataSet DataSetB;
            OleDbCommand command;
            OleDbDataAdapter adapter;
            OleDbConnection conn = new OleDbConnection(connectionString);
            try
            {
                // Open database connection
                conn.Open();
                MessageBox.Show("Dataset Filled and Connection Open");
                // Fill the DataSet: build the Entities table
                OleDbDataAdapter da = new OleDbDataAdapter(query1, conn);
                da.Fill(ds, tableName1);
                // Build the Claims table
                OleDbDataAdapter db = new OleDbDataAdapter(query2, conn);
                db.Fill(ds, tableName2);
                // Display the record counts in the Entities and Claims tables
                MaxEntity = ds.Tables[0].Rows.Count;
                MessageBox.Show(CaptionEnt + MaxEntity.ToString());
                MaxClaims = ds.Tables[1].Rows.Count;
                MessageBox.Show(CaptionClm + MaxClaims.ToString());
                de = ds.Tables[0]; // Data table de containing distinct entities
                dc = ds.Tables[1]; // Data table dc containing all claims (event states)
                // Loop through ENTITIES comparing equivalent state duration and gap to other ENTITIES
                for (i = 0; i < de.Rows.Count; i++)
                {
                    DataRowViewMstEnt = de.Rows[i];
                    // Select all EVENT STATES for the current ENTITY in loop
                    sqlA = "select ENTITY_ID, ICD9_3DIGIT, ICD9_4DIGIT, ICD9_5DIGIT, VISIT_DUR, VISIT_GAP from Use_Cases_CMS_Claims_Minimal where ENTITY_ID = " + "'" + (string)DataRowViewMstEnt["ENTITY_ID"] + "'";
                    command = new OleDbCommand(sqlA, conn);
                    adapter = new OleDbDataAdapter(command);
                    DataSetA = new DataSet();
                    adapter.Fill(DataSetA);
                    DataTableA = DataSetA.Tables[0];
                    TotalEvents = DataTableA.Rows.Count;
                    // Loop through the EVENT STATES of the current ENTITY iteration and find matching EVENT STATES for other ENTITIES
                    for (j = 0; j < DataTableA.Rows.Count; j++)
                    {
                        DataRowViewMstEntEvent = DataTableA.Rows[j];
                        // Select all matching EVENT STATES for other ENTITIES
                        sqlB = "select ENTITY_ID AS M_ENTITY_ID, ICD9_3DIGIT as M_ICD9_3DIGIT, ICD9_4DIGIT as M_ICD9_4DIGIT, ICD9_5DIGIT as M_ICD9_5DIGIT, VISIT_DUR AS M_VISIT_DUR, VISIT_GAP AS M_VISIT_GAP " +
                               "from Use_Cases_CMS_Claims_Minimal where ENTITY_ID <> " + "'" + (string)DataRowViewMstEntEvent["ENTITY_ID"] + "' and ICD9_3DIGIT = '" + (string)DataRowViewMstEntEvent["ICD9_3DIGIT"] + "'";
                        command = new OleDbCommand(sqlB, conn);
                        adapter = new OleDbDataAdapter(command);
                        DataSetB = new DataSet();
                        adapter.Fill(DataSetB);
                        DataTableB = DataSetB.Tables[0];
                        TotalMatchEvents = DataTableB.Rows.Count;
                        // Loop through all other different ENTITIES with an EVENT STATE matching the current EVENT STATE,
                        // compute dur_diff and gap_diff, and output a record to the MS Access results table
                        for (k = 0; k < DataTableB.Rows.Count; k++)
                        {
                            DataRowViewMatchEntEvent = DataTableB.Rows[k];
                            state_count = 1;
                            dur_diff = (int)DataRowViewMstEntEvent["VISIT_DUR"] - (int)DataRowViewMatchEntEvent["M_VISIT_DUR"];
                            gap_diff = (int)DataRowViewMstEntEvent["VISIT_GAP"] - (int)DataRowViewMatchEntEvent["M_VISIT_GAP"];
                            match_level = 3;
                            if (DataRowViewMstEntEvent["ICD9_4DIGIT"].Equals(DataRowViewMatchEntEvent["M_ICD9_4DIGIT"]))
                            {
                                match_level = 4;
                            }
                            if (DataRowViewMstEntEvent["ICD9_5DIGIT"].Equals(DataRowViewMatchEntEvent["M_ICD9_5DIGIT"]))
                            {
                                match_level = 5;
                            }
                            if (dur_diff < 0)
                            {
                                dur_diff = 0 - dur_diff;
                            }
                            if (gap_diff < 0)
                            {
                                gap_diff = 0 - gap_diff;
                            }
                            OleDbCommand inscommand = new OleDbCommand();
                            inscommand.CommandType = CommandType.Text;
                            inscommand.CommandText = "insert into OTCS_Results_Test (ENTITY_ID, ENTITY_ID_M, EVENT, STATE_COUNT_RESULT, DUR_DIFF_RESULT, GAP_T, GAP_M, GAP_DIFF_RESULT, MATCH_LEVEL) values ('" + DataRowViewMstEnt["ENTITY_ID"] + "','" + DataRowViewMatchEntEvent["M_ENTITY_ID"] + "','" + DataRowViewMstEntEvent["ICD9_3DIGIT"] + "'," + state_count + "," + dur_diff + "," + DataRowViewMstEntEvent["VISIT_GAP"] + "," + DataRowViewMatchEntEvent["M_VISIT_GAP"] + "," + gap_diff + "," + match_level + ")";
                            inscommand.Connection = conn;
                            inscommand.ExecuteNonQuery();
                        }
                    }
                }
                conn.Close();
            }
            catch (OleDbException exp)
            {
                MessageBox.Show("Database Error: " + exp.Message.ToString());
            }
            finally
            {
                if (conn.State == ConnectionState.Open)
                {
                    conn.Close();
                }
            }
        }
    }
}


APPENDIX F OTCS-MES SAS ALGORITHM

Input

Medical event sequence information consisting of a record for each event within the medical event sequence, comprising the following fields:
1. Entity ID number
2. 3-, 4-, and 5-digit event code
3. Temporal duration of event
4. Temporal gap from preceding event

Output

Results of OTCS-MES similarity measure computations consisting of a record for each event partially or fully matched between the "match" and "target" sequences, comprising the following fields:
1. Entity ID of "match" MES
2. Entity ID of "target" MES
3. 3-, 4-, and 5-digit event code for each matched event
4. Level of event match (3, 4, or 5)
5. Difference in durations of the matched event between the "match" and "target" MESs
6. Difference in preceding-event gap of the matched event between the "match" and "target" MESs

SAS Code

* Compute OTCS-MES;
data otcs_mes1;
  set targ_sample;
  state_count_result=0;
  match_level=0;
  do i=1 to xnobs;
    set mtch_sample nobs=xnobs point=i;
    if (icd9_3digit eq icd9_3digit_mtch) and (entity_id ne entity_id_mtch) then do;
      state_count_result=1;
      match_level=3;
      if (icd9_4digit eq icd9_4digit_mtch) and (icd9_4digit ne ' ') then match_level=4;
      if (icd9_5digit eq icd9_5digit_mtch) and (icd9_5digit ne ' ') then match_level=5;
      dur_diff_result=abs(visit_dur-visit_dur_mtch);
      gap_diff_result=abs(visit_gap-visit_gap_mtch);
      output;
    end;
  end;
run;
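For readers working in neither SAS nor C#, the pairwise event-matching step implemented in Appendices E and F can be sketched in Python. This is a minimal illustrative sketch, not the dissertation's implementation: the function name otcs_mes_match and the dictionary-based records are assumptions introduced here, while the field names and matching rules (3-digit ICD-9 agreement across different entities, escalation to match level 4 or 5 when the 4- or 5-digit codes also agree, and absolute duration and gap differences) mirror the SAS code above.

```python
# Sketch of the OTCS-MES event-matching step: for each target event, every
# match-set event from a different entity whose 3-digit ICD-9 code agrees
# produces one result record with a match level (3, 4, or 5) and absolute
# duration/gap differences. Records are plain dicts; field names follow the
# SAS variables (icd9_3digit, visit_dur, visit_gap, ...).
def otcs_mes_match(targ_sample, mtch_sample):
    results = []
    for t in targ_sample:
        for m in mtch_sample:
            # Events match only at the 3-digit code level, across different entities.
            if t["icd9_3digit"] != m["icd9_3digit"] or t["entity_id"] == m["entity_id"]:
                continue
            match_level = 3
            # Escalate the match level when non-blank 4- and 5-digit codes also agree.
            if t["icd9_4digit"] and t["icd9_4digit"] == m["icd9_4digit"]:
                match_level = 4
            if t["icd9_5digit"] and t["icd9_5digit"] == m["icd9_5digit"]:
                match_level = 5
            results.append({
                "entity_id": t["entity_id"],
                "entity_id_m": m["entity_id"],
                "event": t["icd9_3digit"],
                "match_level": match_level,
                "dur_diff": abs(t["visit_dur"] - m["visit_dur"]),
                "gap_diff": abs(t["visit_gap"] - m["visit_gap"]),
            })
    return results
```

As in the SAS data step, one result record is emitted per matched event pair; downstream aggregation of these records into the OTCS-MES similarity score is not shown here.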