## Citation

- Permanent Link:
- http://digital.auraria.edu/AA00004055/00001
## Material Information
- Title:
- Identification of ictal and pre-ictal states using neural networks with wavelet decomposed data
- Creator:
- Schuyler, Ronald Paul
- Publication Date:
- 2004
- Language:
- English
- Physical Description:
- vii, 54 leaves : illustrations ; 28 cm
## Subjects
- Subjects / Keywords:
- Spasms -- Detection ( lcsh )
- Radial basis functions ( lcsh )
- Neural networks (Neurobiology) ( lcsh )
- Electroencephalography ( lcsh )
- Wavelets (Mathematics) ( lcsh )
- Genre:
- bibliography ( marcgt )
- theses ( marcgt )
- non-fiction ( marcgt )
## Notes
- Bibliography:
- Includes bibliographical references (leaves -).
- General Note:
- Department of Computer Science and Engineering
- Statement of Responsibility:
- by Ronald Paul Schuyler.
## Record Information
- Source Institution:
- University of Colorado Denver
- Holding Location:
- Auraria Library
- Rights Management:
- All applicable rights reserved by the source institution and holding location.
- Resource Identifier:
- 60403341 ( OCLC )
- ocm60403341
- Classification:
- LD1190.E52 2004m S33 ( lcc )
## Auraria Membership

## Full Text

IDENTIFICATION OF ICTAL AND PRE-ICTAL STATES USING NEURAL NETWORKS WITH WAVELET DECOMPOSED EEG DATA

by Ronald Paul Schuyler
B.S., University of Colorado, Boulder, 1998

A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science, Computer Science, 2004

This thesis for the Master of Science degree by Ronald Paul Schuyler has been approved.

Schuyler, Ronald Paul (M.S., Computer Science)
Identification of Ictal and Pre-Ictal States Using Neural Networks with Wavelet Decomposed EEG Data
Thesis directed by Professor Krzysztof Cios

This work presents a reliable seizure detection method based on radial basis function (RBF) neural networks, and extends that method to confirm the existence of an identifiable pre-ictal state. The efficacies of several preprocessing methods are evaluated for their abilities to extract relevant information from the electroencephalographic (EEG) data. RBF network topology is investigated, and a heuristic is proposed for narrowing the search for optimal values of neuron radius.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed

ACKNOWLEDGMENT

I would like to thank my advisor, Krys Cios, for his direction and for his confidence in my abilities. Thanks also to Andrew White at the Children's Hospital in Denver for supplying the data used here and for answering many of my questions about neurology.

CONTENTS

Figures
Tables

Chapter
1. Introduction
2. Literature Review
3. Data
3.1 Raw Data
3.2 Meta Data
4. Methods
4.1 Artificial Neural Networks
4.1.1 Radial Basis Function Neural Networks
4.2 Preprocessing
4.2.1 Windowing
4.2.2 Fourier Transform
4.2.3 Wavelet Transform
4.2.4 Input Vector Construction
5. Results
5.1 Neuron Locations
5.2 Seizure Identification
5.2.1 Seizure-At-Once Method
5.2.2 Short Slices
5.3 Seizure Prediction
6. Discussion
7. Conclusions and Future Work
References

FIGURES

4.1 RBF Network Architecture
4.2 Radial basis transfer function
4.3 Accuracies for mushroom data at a range of spread factors
4.4 Accuracies for FFT feature set over a range of spread values
4.5 Transformation examples
5.1 Comparison of neuron location methods
5.2 Per-slice seizure identification accuracies using short slices
5.3 Seizure identification for rat 4
5.4 Three heuristics for seizure identification over 24 hours
5.5 Rat 4 seizure identification for two days
5.6 Seizure prediction on different channels
5.7 Seizure prediction for rat 6
5.8 Seizure prediction for rat 4
5.9 Seizure prediction for rat 5
5.10 Prediction of one seizure
5.11 Prediction refinement with heuristic
5.12 Seizure prediction for 24 hours

TABLES

4.1 Indexed seizures per rat
4.2 Seizure counts per rat
4.3 Normal segment counts per rat
5.1 Classification of seizures using seizure-at-once method
5.2 Results for seizure identification using seizure-at-once method
1. Introduction

In order to facilitate the development of drugs to control epileptic seizures, animal models are needed. A necessary component in the development of these animal models is the ability to keep track of when seizures happen. Currently, this is done by a human expert trained to identify seizures within electroencephalographic (EEG) data recorded from intracranial electrodes, a process that requires a researcher to review thousands of hours of EEG data plots. A system capable of automating this task would relieve the researcher of this tedious, time-consuming and error-prone work.

Artificial neural networks are known to be useful in pattern recognition applications, and have been applied to EEG analysis in areas such as disease diagnosis [12,36], sleep stage classification [26], mental state classification [2], artifact recognition [3] and the detection of epileptiform discharges [13,38]. The use of radial basis function neural networks in this study demonstrates that with the proper data preprocessing, seizure identification can be very accurate. The Fourier transform or wavelet decomposition is used to preprocess the data before using it to train the neural network, and the results are compared to feeding the untransformed data directly to the network. The research of [12] suggests that training a neural network on raw EEG data is unlikely to be successful. This study also shows that preprocessing outperforms the use of raw data; however, a properly configured neural network trained only on raw data shows better than expected results.

In addition to demonstrating a reliable seizure identification system, the possibility of predicting an impending seizure before clinical onset is investigated. The period during a seizure is known as the ictal state, while the periods of normal brain activity between seizures are called interictal.
A third state, referred to as pre-ictal, has been proposed [19,20,30] as the period just before seizure onset. If this state can be identified in the EEG [16,19,20,21,23,29,30,33], seizures can effectively be predicted. Implantable devices for humans already exist that can abort a seizure using electrical stimulation or localized on-demand drug delivery [8,19,20,30,33,37]. Combined with reliable pre-ictal state identification techniques, these devices could eliminate the need for constant drug treatment of a condition with intermittent symptoms [20]. At the very least, seizure prediction could give an early warning to the 25% to 30% of epileptics who do not respond to drug therapy [16,19,33].

As with seizure identification, appropriate data preprocessing methods improve the accuracy of seizure prediction. Wavelet decomposition provides an effective means to transform a window of raw data long enough to contain relevant information for seizure prediction into a vector short enough to be generalized by a neural network. Windows of different lengths are used in combination with different levels of wavelet decomposition. Although seizures could not be reliably predicted in all cases, the limited success in identifying a pre-ictal period demonstrates that more accurate seizure prediction may be possible.

2. Literature Review

Neural networks have been used for different EEG classification tasks with varying degrees of success. Ultimately, the potential success of a particular classification task depends on whether the appropriate information for that task exists within the raw EEG data. Assuming the necessary information is present, the success of a neural network classification method is largely determined by preprocessing: the raw data must be converted into a vector of manageable size while retaining as much of the relevant information as possible.
Transformations, windowing, sampling or some combination of these are typically used with time-series EEG data. This section provides an overview of some methods that have been investigated.

The non-stationary characteristic of EEG data is an issue that must be addressed. Typically this is done by limiting the data to a small window within which the signal can be assumed to be stationary. Anderson et al. [2] found that a window of one quarter second was as good as a two second window for distinguishing between a subject performing mental arithmetic and a baseline mental state. Using a fully connected feed-forward neural network with autoregressive parameters for spectral density resulted in an accuracy of 74% for mental state classification, and outperformed the use of the raw EEG data directly. They suggest averaging results over several successive windows to improve accuracy.

In 1997, Hazarika et al. applied a Lemarie wavelet transform to one second segments of EEG data as a preprocessing step to train a neural network to classify patients as normal, having schizophrenia or having obsessive compulsive disorder (OCD) [12]. Only the two largest coefficients of the wavelet transform of each segment were used from each level of decomposition, resulting in a substantial loss of information. Their network correctly classified only 66% of normal cases and 71% of schizophrenia cases; classification results for OCD were described as poor. Still, these results were better than those obtained for the same task using an autoregressive transformation of the raw EEG data. This may indicate that classification of these conditions cannot be performed effectively from EEG data alone (the necessary information is not present in the EEG signal), or that other factors, such as not controlling for different levels of drug treatment, had a more substantial impact on effectiveness than the authors believed.
Visual inspection of wavelet transformed EEG from an epileptic patient is used in [1]. The authors deem the Daubechies wavelet decomposition superior to the short time Fourier transform for its ability to localize and identify the transient signals associated with epileptic discharges. Daubechies wavelets are also used in [10], along with several other raw data transformations, including fractal dimension estimation. Three algorithms are compared for the estimation of fractal dimension; one is found to be more reproducible than the others, but no quantitative results are provided. It is pointed out in [11] that estimates of the fractal dimension of EEG data are almost certain to be incorrect; however, relative differences between estimates made with the same method may be useful in distinguishing between states. In [10], EEG segments of tens of seconds are considered to be stationary, and windows between one quarter second and 45 seconds are used. The rate of epileptic discharges in the form of spikes is investigated using the wavelet transform and found to be uncorrelated with seizure onset. Another important observation given here is that a seizure detector based on any of these methods will likely require patient-specific tuning, as with speech- and handwriting-recognition systems. This observation is echoed in [20].

Seizure prediction is closely related to seizure identification, and many of the same techniques have been applied. Estimates of the fractal dimension of the EEG have been investigated as a possible identifier of the pre-ictal state [6,16,19,21,23], based on the observation that brain activity involved in epileptogenesis near the seizure focus becomes more correlated, while electrical activity not associated with the building seizure decreases [11]. The method of [20] detected a decrease in dimensionality in human EEG data up to four hours before seizure onset.
The nonlinear pattern recognition capabilities of neural networks [28] make them a natural choice for investigating the possible existence of a pre-ictal state without requiring an explicit estimation of fractal dimension.

A review of seizure prediction research published in 2002 qualitatively compared many methods [19]. Research in the areas of time-domain analysis, frequency-domain analysis, nonlinear dynamics and intelligent systems was described, but no quantitative results were given. The ability of methods such as neural networks to distinguish between pre-ictal and normal states without articulation of specific rules was acknowledged, but these methods were dismissed for their inability to provide insight into the nature of the pre-ictal state. If the goal is investigation of the causes of seizures, then the black-box nature of neural network methods is a disadvantage. If the goal is to know when a seizure is about to happen in order to prepare for it or stop it altogether, then it is necessary only to know that it is coming, and this disadvantage becomes a non-issue. However, because neural networks are characteristically non-linear, the output is not bounded in some cases. If an input state is encountered that is outside the states represented in training, the output is not predictable, and care must be taken in dealing with these results.

The first study of pre-ictal EEG to use wavelet decomposed data with a neural network was published in [30]. Their method used recurrent neural networks with one or two inputs, ten or 15 recurrent hidden neurons and one output neuron. Daubechies wavelets were used to decompose the raw data, and only data from the most relevant intracranial probe was used. Separate networks were trained with raw data, wavelet approximation coefficients and detail coefficients.
Four seizures from one patient were analyzed, with the 95 seconds of data immediately preceding seizure onset used as pre-ictal data, and the 95 seconds immediately preceding that used as normal baseline data. Segments of ten to 20 consecutive training pairs were chosen randomly from the pre-ictal or normal periods and used to train the network. The best prediction results were obtained using wavelet detail coefficients. As there were only four seizures used in the study, the criterion used to evaluate accuracy was visual inspection of a plot of network output when presented with 170 seconds of data immediately preceding a previously unseen seizure. The same method was used with extra-cranial EEG data, with less accurate results; this difference is attributed to attenuation of the high frequency components of the signal by the skull and scalp.

The windowing and wavelet method used in this study is similar to that described in [30]. Several additional wavelet bases and levels of decomposition are used here. Those results are then extended from pre-ictal state identification for a few isolated seizures to seizure prediction for several full days of data with frequent seizure activity.

3. Data

Data used here were furnished by Dr. Andrew White of the University of Colorado Health Sciences Center as part of an unpublished study.

3.1 Raw Data

The data analyzed consist of approximately 50 billion data points taken from raw EEG readings acquired from three channel radiotelemetry units on nine rats. Five rats (rats 4-8) were treated with kainate to induce seizures; rats 1-3 and 9 are controls. No data were processed for rats 2 and 3. More than 100 days of data were recorded at a sampling rate of 250 Hz. Separate files were used to store the data for each day and each rat.

3.2 Meta Data

In addition to the raw EEG data, an Excel spreadsheet containing the time of day and duration of 2462 seizures was provided.
Seizure times are recorded as the nearest minute before the onset of seizure activity, so the actual start of a seizure could be up to 59 seconds, or 14750 data points, away from the nominal start time.

4. Methods

Two methods are used for distinguishing between segments of EEG recordings containing seizures and those containing only normal data. The first method attempts to identify the entire seizure at once using 230 second segments of data. The second method examines several consecutive data slices, where each slice is a few seconds in duration. The second method is then modified to identify pre-ictal data.

4.1 Artificial Neural Networks

An artificial neural network is a mathematical model inspired by the biological neural networks of a living brain, capable of learning from examples and of generalizing beyond the examples used in training. The network is composed of individual interconnected units known as neurons. The neurons are usually arranged in layers, with an input layer, one or more hidden layers which are not directly connected to the outside world, and an output layer. Each neuron takes input from other neurons in the network and calculates its output based on a transfer function. The outputs from the individual neurons are then combined at the output layer to produce the total network response to a given input vector.

Supervised learning is achieved by presenting training patterns to the input layer and adjusting the connection weights between neurons to minimize the difference between the total network response at the output layer and the target response for each training pattern.

4.1.1 Radial Basis Function Neural Networks

The radial basis function (RBF) neural networks used in this study were chosen for their pattern recognition capabilities and training speed. An RBF network consists of an input layer, a single hidden layer of radial basis neurons, and a linear output layer. The RBF network architecture is shown in Figure 4.1.
Figure 4.1. RBF Network Architecture. The input vector x produces the hidden layer output vector a and the network response t.

Weight vectors of the neurons in the hidden layer are set to a representative subset of the input vectors used for training. The processing done by a single RBF neuron consists of calculating the Euclidean distance from its weight vector to the input vector and passing the bias adjusted result through the transfer function. The radial basis transfer function used here is a Gaussian, given by equation 4.1.

f(x) = e^(-x^2)    (4.1)

A plot of this function is given in Figure 4.2.

Figure 4.2. Radial basis transfer function.

The weight vector of an RBF neuron is also referred to as its location or center point. Input vectors close to the center point of an RBF neuron will cause that neuron to generate an output value near one. The neuron's output decreases toward zero as the input vector gets further from the neuron center. The output a_n of hidden layer neuron n is calculated in terms of the P-dimensional input vector x in equation 4.2, where w_n is the weight vector of neuron n, b is the bias and tf_n is the Gaussian transfer function.

a_n = tf_n( b * sqrt( Σ_{p=1..P} (x_p - w_{n,p})^2 ) )    (4.2)

The bias of the hidden layer is calculated as:

b = sqrt(-log(0.5)) / radius    (4.3)

Supervised learning is achieved by presenting the network with a set of training vectors and adjusting the weights and bias of the linear output layer to minimize the mean absolute error between target and actual network output. This is done by solving equation 4.4 in terms of the output from the hidden layer, where k indexes the training vector/target pairs.

t_k = b + a_{1,k} w_1 + a_{2,k} w_2 + ... + a_{n,k} w_n    (4.4)

The RBF networks used in this study were built using functions from the Matlab Neural Network toolbox.

4.1.1.1 RBF Network Parameters

Several parameters must be specified when building an RBF neural network.
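The hidden layer computation of equations 4.1 through 4.3 can be sketched in a few lines. The networks in this study were built with the Matlab Neural Network toolbox; the pure-Python function below is an illustrative sketch only, and its names are not from the thesis:

```python
import math

def rbf_hidden_output(x, centers, radius):
    """Forward pass of an RBF hidden layer (equations 4.1-4.3).

    x       -- input vector (list of P floats)
    centers -- neuron weight vectors (each a list of P floats)
    radius  -- neuron radius; a neuron outputs 0.5 when the input is
               exactly `radius` away from its center
    """
    b = math.sqrt(-math.log(0.5)) / radius          # equation 4.3
    outputs = []
    for w in centers:
        # Euclidean distance from the neuron's center to the input
        dist = math.sqrt(sum((xp - wp) ** 2 for xp, wp in zip(x, w)))
        # Gaussian transfer function applied to the bias-adjusted distance
        outputs.append(math.exp(-(b * dist) ** 2))  # equations 4.1, 4.2
    return outputs

# A neuron responds with 1.0 at its center and 0.5 at distance == radius.
a = rbf_hidden_output([0.0, 0.0], [[0.0, 0.0], [3.0, 4.0]], radius=5.0)
```

The linear output layer of equation 4.4 would then be fit by solving a linear least squares problem on these hidden layer outputs.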
Useful heuristics exist for choosing reasonable values for some parameters, but much of the design process involves trial and error. One new heuristic is proposed here for choosing a good starting value for the neuron radius, greatly speeding up network design.

4.1.1.1.1 Neuron Radius

One heuristic for selecting a value for the neurons' radii, also referred to as spread, is given in the Matlab Neural Network Toolbox documentation [9]: "...choose a spread constant larger than the distance between adjacent input vectors, so as to get good generalization, but smaller than the distance across the whole input space." Taking the minimum and maximum values of the distance matrix of input vectors for the FFT feature set described in section 4.2.4.1 gives a range of [851, 1211900], narrowing the search down to a still huge space of 1.2 x 10^6.

A better heuristic is based on the amount of variation in the set of input training vectors. The base spread for a given set of training vectors is calculated as the mean of the distance matrix of those vectors, given by equation 4.5, where x_i and x_j are training vectors, k is the number of training vectors and P is the dimension of the data.

Base Spread = 2 * ( Σ_{i=1..k} Σ_{j=i+1..k} sqrt( Σ_{p=1..P} (x_{i,p} - x_{j,p})^2 ) ) / (k^2 - k)    (4.5)

The base spread is multiplied by a spread factor to determine the neuron radius. The best values for neuron radius were determined experimentally to be between one third and two thirds of the base spread, an observation that held across multiple diverse data sets. Figure 4.3 shows a plot of the accuracies of a neural network trained to distinguish between edible and inedible mushrooms from 8416 samples of 22 dimensional data [18]. Figure 4.4 shows a similar plot of accuracies for the FFT feature set.

Figure 4.3. Accuracies for mushroom data at a range of spread factors. Plots of accuracies using 4, 8, 14, 20 and 30 neurons are shown.

Figure 4.4.
Accuracies for FFT feature set over a range of spread values.

The best accuracies in Figures 4.3 and 4.4 occur in the relatively narrow range of one third to two thirds of the base spread for the respective data sets. Using this heuristic with the same data set as above results in a search space of 3.9 x 10^4, a reduction of two orders of magnitude; the space that must be searched under the Matlab heuristic is approximately as wide as the entire plot of Figure 4.4. This is more useful than any neuron radius heuristic that we have found.

4.1.1.1.2 Neuron Location

Each neuron in an RBF neural network has an associated location in the n-dimensional input space, where n is the dimension of the input vectors. In this study, these location vectors are determined using an unsupervised learning technique: all training vectors are clustered using the k-means clustering algorithm implemented in the Matlab Statistics toolbox. The number of neurons used in the network is the same as the number of clusters. Cluster centers are determined as the means of the vectors in each cluster, and these values are used as RBF neuron centers.

Another method commonly used to choose neuron centers is the greedy strategy implemented in the Matlab RBF design function newrb. Using this function, the network is built incrementally, adding neurons until some error bound is reached. In each iteration the algorithm chooses as the location of the new neuron the input vector which, when used as a neuron center, minimizes the total error of the network. The effectiveness of each of these two methods is compared in the results section.

4.1.1.1.3 Number of Hidden Layer Neurons

Ideally, we would like to cover the entire input space with overlapping neurons, so that any input vector would generate a response from several neurons, but this strategy is rarely feasible.
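Before discussing neuron counts, note that the base spread of equation 4.5 is simply the mean of the pairwise distance matrix of the training set. A pure-Python sketch (the thesis used Matlab; the function name here is illustrative only):

```python
import math

def base_spread(vectors):
    """Mean pairwise Euclidean distance of a training set (equation 4.5).

    The experimentally observed search range for the neuron radius is then
    roughly [base_spread / 3, 2 * base_spread / 3].
    """
    k = len(vectors)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total += math.sqrt(sum((a - b) ** 2
                                   for a, b in zip(vectors[i], vectors[j])))
    # 2 * (sum over unordered pairs) / (k^2 - k) == mean over ordered pairs
    return 2.0 * total / (k * k - k)

# Three corners of the unit square: pairwise distances 1, 1 and sqrt(2).
spread = base_spread([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```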
With more neurons, the radius of each one could be decreased, and the network would be more specific. Taken too far, this could result in the network memorizing each input vector and not generalizing well. Trials using neuron counts from three to 200 were conducted. When using a value for the neuron radius obtained with the heuristic outlined above, 13 hidden layer neurons were found to be sufficient for good network performance on this data set.

4.2 Preprocessing

The first step in preparing the raw EEG data for presentation to the neural network was to locate the relevant segments. A C++ program was developed to extract relevant segments from the raw data files based on the seizure times given in the Excel file. Extracted segments are 230 seconds long, slightly longer than the longest seizure duration in the study. The total number of data points per extracted segment is 230 seconds x 250 points per second x 3 channels = 172500 points.

Seizure times were checked against the recording ranges of the raw data files, and extracted segments were checked for inconsistencies and for seizures truncated by the extraction process, leading to the rejection of 101 segments. This left 2361 seizure segments for training and testing.

In this study, there are two normal (non-epileptic) rats and five abnormal (epileptic) rats. All seizure samples necessarily come from the epileptic rats. The easiest way to generate samples of normal EEG signals would be to take segments at random times from only the normal rats. However, with this method it is possible that the neural network could learn to identify features specific to each rat, distinguishing between individuals rather than between normal and abnormal EEG segments. This could produce highly accurate but completely meaningless results. It was therefore necessary to include segments of normal EEG from epileptic rats as well as from normal rats.
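The extraction arithmetic above can be sketched as follows. The original extractor was a C++ program; this Python sketch is illustrative only, and it assumes (hypothetically) that each day's file begins at minute zero of the recording:

```python
SAMPLE_RATE = 250      # Hz, as used in this study
CHANNELS = 3
SEGMENT_SECONDS = 230  # slightly longer than the longest seizure

def segment_bounds(seizure_minute):
    """Per-channel start and end sample indices for the 230 s segment
    beginning at a seizure's nominal start minute."""
    start = seizure_minute * 60 * SAMPLE_RATE
    return start, start + SEGMENT_SECONDS * SAMPLE_RATE

points_per_segment = SEGMENT_SECONDS * SAMPLE_RATE * CHANNELS
# 230 s x 250 samples/s x 3 channels = 172500 points per segment
```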
Normal segments were chosen randomly from the raw data with the restriction that each segment's start and end times must not be within five minutes of the start or end of any previously extracted segment. Due to the frequency of seizure occurrences on some days, five minutes was the maximum separation possible between extracted segments; longer intervals were used when possible. One normal segment was extracted from the same day for each abnormal segment, so the total number of normal segments was equal to the number of seizure segments for each day and for each rat. An additional 336 normal segments were extracted from each of the non-epileptic rats.

4.2.1 Windowing

In order for a neural network to be able to generalize from a set of training vectors, the number of training samples available must be much greater than the length of each sample. If the length of the training vectors is greater than the number of samples available for training, the network will not generalize well. Given the extracted segment length of 172500 points, 2361 seizure samples and 3106 normal samples, the length of the vector presented to the network must be significantly reduced. The most straightforward technique is to chop the vector into smaller segments, taking only a few seconds, or fractions of a second, of raw data rather than the whole segment. This technique is called windowing.

4.2.1.1 Indexing

Given the fact that the known seizure times are rounded to the nearest minute before seizure onset, combined with the variable duration across seizures, determining where to take a windowed slice from within the 230 second extracted segment is somewhat problematic in practice. In order to determine seizure start and end times, and where it is appropriate to take data slices from, a Matlab based segment viewer was
In order to determine seizure start and end times, and where it is appropriate to take data slices from, a Matlab based segment viewer was 20 developed to display all three channels of a given seizure segment using variable scales and allowing navigation through the full 230 seconds of extracted data. In this way, 555 seizures were manually annotated with start and end times at a resolution of one second. The number of seizures indexed for each rat is given in Table 4.1. Rat Indexed Seizures 4 57 5 38 6 107 7 204 8 149 Total 555 Table 4.1. Indexed seizures per rat. Indexing was not necessary for normal segments, as any slice should be as good as any other within the same segment. Accordingly, normal slices were chosen randomly from within each normal segment. 4.2.2 Fourier Transform The Fourier transform is used to analyze the frequency spectrum of a signal by decomposing the signal into different sinusoids. This yields a view of the frequency components of the signal, but results in a loss of information in the time domain. In order to preserve some time information, the transform is often applied to a moving window of the data. This is known as the short-time Fourier transform. In this study, 21 the short-time fast Fourier transform was used, as implemented by the Matlab function fit. For a vector x of length P, the transformed vector X is given by equation 4.6, where j is the square root of-1. xk = Xp=i ;P(Xp*exp(-j *2 *pi*(k-1 )*(n-1 )/N)), 1 <= k <= P (4.6) 4.23 Wavelet Transform The wavelet transform provides another view of a signals frequency content. Rather than using the fixed width window of the short-time Fourier transform, wavelet decomposition uses a scaled window. A narrow window is used to capture high frequency data, while wider windows are used for the lower frequencies. Instead of the sinusoidal bases of the Fourier transform, a wide range of basis functions are available for use with wavelet decomposition. 
This study concentrated on the Daubechies wavelets developed by I. Daubechies [7], used in [30] for seizure prediction with recurrent neural networks. Matlab implements 43 Daubechies wavelet bases, referred to as db1 through db45. Decomposition is achieved by comparing the original signal to scaled and shifted versions of the base wavelet and generating coefficients indicating how well each version of the base wavelet represents the original signal.

Wavelet decomposition can be used to separate a signal into a low frequency approximation of the original signal and high frequency details. Applying a second decomposition to the approximation coefficients obtained from the first decomposition results in a level two decomposition of the original signal. This may be carried out multiple times, depending on the length of the original signal and the wavelet base used. The discrete wavelet transform of the original signal f(t) is given by equation 4.7.

f(t) = Σ_{k} c_{J,k} φ_{J,k}(t) + Σ_{j=J..∞} Σ_{k} d_{j,k} ψ_{j,k}(t)    (4.7)

where φ is the scaling function and ψ is the basis function, in this case the Daubechies base. In the right
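The split into a low frequency approximation and high frequency details described above can be illustrated with the simplest Daubechies base, db1 (the Haar wavelet), for which one level of decomposition reduces to pairwise sums and differences. This pure-Python sketch is illustrative only; the thesis used the Matlab wavelet toolbox and higher-order db bases:

```python
import math

def haar_dwt_level(signal):
    """One level of db1 (Haar) wavelet decomposition.

    Returns (approximation, details), each half the input length.
    Re-applying this to the approximation gives level 2, 3, ...
    decompositions of the original signal.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# A locally flat signal produces zero detail coefficients; transients
# (such as epileptic discharges) show up in the details.
approx, detail = haar_dwt_level([4.0, 4.0, 2.0, 2.0])
```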