Citation
High-dimensional data assimilation and morphing ensemble Kalman filters with applications in wildfire modeling

Material Information

Title:
High-dimensional data assimilation and morphing ensemble Kalman filters with applications in wildfire modeling
Creator:
Beezley, Jonathan D
Publication Date:
2009
Language:
English
Physical Description:
xxv, 136 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Wildfires -- Mathematical models ( lcsh )
Kalman filtering ( lcsh )
Kalman filtering ( fast )
Wildfires -- Mathematical models ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Colorado Denver, 2009.
Bibliography:
Includes bibliographical references (leaves 124-136).
General Note:
Department of Mathematical and Statistical Sciences
Statement of Responsibility:
by Jonathan D. Beezley.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
436221876 ( OCLC )
ocn436221876



Full Text

PAGE 1

HIGH-DIMENSIONAL DATA ASSIMILATION AND MORPHING ENSEMBLE KALMAN FILTERS WITH APPLICATIONS IN WILDFIRE MODELING by Jonathan D. Beezley Bachelor of Science, University of Nebraska Lincoln, 2001 Master of Science, University of Colorado Denver, 2006 A thesis submitted to the University of Colorado Denver in partial fulfillment of the requirements for the degree of Doctor of Philosophy Applied Mathematics 2009

PAGE 2

This thesis for the Doctor of Philosophy degree by Jonathan D. Beezley has been approved by Jan Mandel.

PAGE 3

Beezley, Jonathan D. (Ph.D., Applied Mathematics)
High-Dimensional Data Assimilation and Morphing Ensemble Kalman Filters with Applications in Wildfire Modeling
Thesis directed by Professor Jan Mandel

ABSTRACT

In this dissertation, we examine practical aspects of incorporating data into high-dimensional and nonlinear models. The Kalman filter is discussed from a theoretical perspective and is shown to solve the Bayesian data assimilation problem under certain linearity and normality conditions. The ensemble Kalman filter (EnKF) and its variants are presented as statistical-sampling approximations to the optimal filtering distributions that are used even when the assumptions of the Kalman filter are violated. We present a series of numerical experiments that explore the behavior of the EnKF as the model state size becomes large. Finally, we present an application, wildfire modeling, in which the standard EnKF techniques fail. The behavior of the EnKF applied to this highly nonlinear problem inspires the development of two new types of ensemble filters: predictor-corrector and morphing. Predictor-corrector filters use a proposal ensemble (obtained by some method, called the predictor) with assignment of importance weights to recover the correct statistics of the posterior (the corrector). The proposal ensemble comes from an arbitrary unknown distribution and only needs concentrated coverage of the support of the posterior. The ratio of the prior and the proposal densities for calculating the

PAGE 4

importance weights is obtained by density estimation. The prediction created by the ensemble Kalman formulas and the correction formed by nonparametric density estimation based on the distance in Sobolev spaces are considered. Numerical experiments show that the new predictor-corrector filters combine the advantages of ensemble Kalman filters and particle filters for highly nonlinear systems and that they are suitable for high-dimensional states that are discretizations of solutions of partial differential equations.

The morphing filter combines the EnKF with the ideas of morphing and registration from image processing. This results in filters suitable for nonlinear problems in which the solutions exhibit moving coherent features, such as thin interfaces in wildfire modeling. The ensemble members are represented as the composition of one common state with a spatial transformation, called the registration mapping, plus a residual. A fully automatic registration method is used that requires only gridded data, so the features in the model state do not need to be identified by the user. The morphing EnKF operates on a transformed state consisting of the mapping obtained through registration and the residual image. The EnKF is shown to perform far better when applied to this transformed state than to the original model state for a wildfire model containing spatial errors.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed

PAGE 5

ACKNOWLEDGMENT

This thesis would not have been possible without the support of the many educators who have gone out of their way to help me. In particular, I wish to thank my high school teacher, Mike Smith, and undergraduate advisors, Gordon Woodward and Edward Jones. Mostly, I thank my graduate advisor, Jan Mandel, for his endless patience and encouragement. I would like to thank my sister, Jennifer, for supporting me through the most difficult times in my life and pushing me to apply for graduate school; my parents for their encouragement and financial support; and my sister, Angela, for her companionship. Finally, I thank the staff of the Mathematics department for guiding me through this process and my fellow students for their friendship over the last several years.

PAGE 6

CONTENTS

Figures . . . ix
Tables . . . xxv

Chapter

1. Introduction . . . 1
2. The Ensemble Kalman Filter . . . 7
2.1 Preliminaries . . . 7
2.2 The Kalman Filter . . . 8
2.3 The Ensemble Kalman Filter . . . 14
2.4 Variants of the EnKF . . . 17
2.4.1 Square Root Filters . . . 17
2.4.2 Localized Ensemble Filters . . . 18
3. Predictor-Corrector Filter . . . 23
3.1 Preliminaries . . . 23
3.1.1 Weighted ensembles . . . 24
3.1.2 Particle filters . . . 26
3.1.3 Density estimation . . . 29
3.2 Derivation of predictor-corrector filters . . . 30
3.2.1 Motivation . . . 30
3.2.2 High-dimensional systems . . . 32
3.2.3 Formulation of the method . . . 34

PAGE 7

3.3 Numerical Results . . . 35
3.3.1 Bimodal Prior . . . 37
3.3.2 Filtering for a stochastic ODE . . . 39
3.3.3 Filtering in high dimension . . . 44
4. The EnKF in Practice . . . 48
4.1 The curse of dimensionality . . . 48
4.2 Generating initial ensembles . . . 57
4.3 Implementation . . . 60
4.4 Wildfire data assimilation . . . 61
4.4.1 A PDE Wildfire Model . . . 62
4.4.2 A Semi-empirical Wildfire Model . . . 63
4.4.3 Coupling fire and weather models . . . 65
4.4.4 Wildfire Data . . . 66
4.4.5 Spatially Perturbed Ensembles . . . 68
5. Image Morphing and Registration . . . 75
5.1 Preliminaries . . . 75
5.1.1 Image Registration . . . 75
5.1.2 Morphing . . . 78
5.2 An Automatic Registration Procedure . . . 80
5.2.1 Grids and Sub-domains . . . 84
5.2.2 Smoothing . . . 88
5.2.3 Local minimization . . . 89
5.2.4 The Levenberg-Marquardt Algorithm . . . 92
5.2.5 Optimization . . . 96

PAGE 8

5.2.6 Computational Complexity . . . 102
6. The Morphing Ensemble Kalman Filter . . . 104
6.1 The Morphing Transformation . . . 105
6.2 Numerical Results . . . 107
6.3 Conclusion . . . 120

PAGE 9

FIGURES

Figure

2.1 The function described in (2.25) is displayed as a blue line, and a Gaussian function with bandwidth $\sqrt{3/10}$, scaled to a maximum of 1, is displayed as a dotted black line. . . . 22

3.1 A simple example showing degeneracy of the particle filter. The SIS is applied to a prior ensemble of size 10 sampled out of N(0, 0.16) with equal weights. The data likelihood is centered away from the prior mean, with distribution N(2.5, 0.0625). In (a), the exact prior, likelihood, and posterior densities are given by the blue, red, and black lines, respectively. The prior and posterior ensemble members and weights are given by blue *'s and black x's. The values of the ensemble members are given by their positions on the x axis, and the weights are the vertical positions. In addition, the scaled likelihood is given by red o's, but because the prior ensemble had equal weights, the likelihood is proportional to the posterior weight. In (b), the posterior ensemble is shown on a semi-log plot. The weight of the largest ensemble member is almost 1, while the weights of the rest are essentially 0. In (c), the exact mean and covariance are given along with the computed ensemble mean and covariance from (3.6) and (3.8) for both the prior and posterior. In this case, the particle filter fails to accurately represent these statistics of the posterior. . . . 31

PAGE 10

3.2 The problem shown in Figure 3.1 is repeated with the same prior ensemble but using the predictor-corrector filter. With the new method, the proposal ensemble is much closer to the data, the posterior ensemble exhibits far less degeneracy, and its sample mean and covariance are far closer to the exact mean and covariance compared with the SIS method. . . . 36

3.3 In (a), the prior and data distributions are given by blue and red lines, respectively. The prior ensemble, with size 10, is displayed with blue *'s, where the position of each ensemble member on the x axis is its value and the position on the y axis is its weight. The data likelihood given each ensemble member is displayed as red o's, scaled so the likelihoods sum to 1. The prior distribution has two modes at ±0.5 and is constructed by sampling each ensemble member, $x_k \sim N(0, 2.5)$, and assigning weights as the pdf of the sum of two Gaussian distributions, $N(\pm 0.5, 0.1)$, evaluated at $x_k$. The data distribution is N(0, 0.1). In (b)-(d), the posterior ensemble is shown with black x's for EnKF, SIS, and EnKF-SIS, respectively. The exact posterior density is shown as a dotted line, while the estimated posterior density from the ensemble, calculated from a Gaussian kernel with bandwidth 0.5, is shown as a solid black line. The EnKF alone cannot account for the bimodal nature of the posterior, while the EnKF-SIS method corrects this by applying weights which increase the accuracy of the posterior estimation. In this scalar problem, the SIS method is able to perform reasonably well. . . . 38

3.4 (a) The deterministic potential curve, and (b) a solution of (3.22) switching between stable points. . . . 40

PAGE 11

3.5 In (a), an example of applying the EnKF, SIS, and EnKF-SIS to (3.22), with the data given by circles. Displayed are the ensemble mean of a single realization and the mean of the optimal solution from (3.23), using an ensemble of size 100. In (b) and (c), the estimated mean square errors in the ensemble mean and covariance are given for each method by averaging over 100 simulations. . . . 42

3.6 In (a), a non-Gaussian prior distribution is applied to a data distribution with small likelihood. The exact posterior distribution is given in (b). The EnKF, SIS, and EnKF-SIS were applied to an ensemble of size 100. In (c), the EnKF does not approximate the exact posterior distribution well because it ignores the bimodal prior. The SIS, given in (d), does approximate the posterior reasonably well because the data likelihood is large. In (e), the EnKF-SIS resembles the features of (d) by reducing the weights of the proposal ensemble members that pass through the center of the figure. . . . 46

3.7 In (a), a Gaussian prior distribution is applied to a data distribution with small likelihood. The exact posterior distribution is given in (b). The EnKF, SIS, and EnKF-SIS were applied to an ensemble of size 100. In (c), the EnKF approximates the exact solution well because the prior was Gaussian; however, the SIS, given in (d), was fairly degenerate and was not able to correct for the large shift in the distribution. In (e), the EnKF-SIS approximates the shape of the exact posterior well, while exhibiting far less degeneracy than (d). . . . 47

PAGE 12

4.1 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 10 ensemble members and 25 data points with a random observation matrix. . . . 51

4.2 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 10 ensemble members and 250 data points with a random observation matrix. . . . 51

4.3 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 25 data points with a random observation matrix. . . . 52

4.4 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 250 data points with a random observation matrix. . . . 52

4.5 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 10 ensemble members and 25 data points with observation matrix $H_{ij} = \delta_{ij}$. . . . 53

4.6 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 10 ensemble members and 250 data points with observation matrix $H_{ij} = \delta_{ij}$. . . . 53

PAGE 13

4.7 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 25 data points with observation matrix $H_{ij} = \delta_{ij}$. . . . 54

4.8 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 250 data points with observation matrix $H_{ij} = \delta_{ij}$. . . . 54

4.9 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 10 ensemble members and an identity observation matrix. . . . 55

4.10 For constant, inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and an identity observation matrix. . . . 55

4.11 The effect of creating a smooth random field using the procedure outlined above. An i.i.d. complex vector of length 100 is sampled out of N(0, 1). The components of this vector are scaled according to (4.3), with α = 0, 1, 2. A larger α reduces the influence of higher frequency components, and the resulting function is smoother. . . . 59

PAGE 14

4.12 In (a), the fireline and level set function for a fire initialized by two thin lines and a small circle propagating in constant wind. The fire is in the area where the level set function is negative. The black contour is the fireline. In (b), the heat flux generated by the fire in (a). . . . 65

4.13 From Mandel et al. [2008], the temperature profile of a combustion wave traveling over a fixed point. The modeled temperature profile, given by the black line, was optimized to approximate the dotted line representing real data obtained from Kremens et al. [2003]. . . . 67

4.14 The estimated point-wise probability density of the temperature near the fireline for an ensemble generated by perturbing a known temperature profile in space by a Gaussian random variable. The density has peaks centered around burning and non-burning regions and is highly non-Gaussian. . . . 68

4.15 In (a), a forecast ensemble of size 10 containing temperature profiles shifted in space with a variance of 200 m. The data profile, given as a dotted red line, is located in the center of the ensemble members. In (b), the point-wise ensemble variance calculated from this ensemble. . . . 70

4.16 The analysis ensemble resulting from applying the EnKF to the forecast ensemble and data in Figure 4.15. The data variance used was 100 K². The EnKF was unable to track the data accurately, and the analysis ensemble has essentially degenerated to a single solution with variance reduced by several orders of magnitude. . . . 70

PAGE 15

4.17 The analysis ensemble resulting from applying the B-localized EnKF to the forecast ensemble and data in Figure 4.15. The data variance used was 100 K², and the localization radius was 20 m. The localized EnKF was able to create a posterior ensemble which tracks the data accurately on the reaction front; however, there are many spurious features outside of this region. In addition, the localization has reduced the degeneracy of the analysis ensemble compared to the EnKF without localization. . . . 71

4.18 In (a), a forecast ensemble of size 10 containing temperature profiles shifted in space with a variance of 200 m. The data profile, given as a dotted red line, is located 400 m to the left of the center of the ensemble members. In (b), the point-wise ensemble variance calculated from this ensemble. . . . 71

4.19 The analysis ensemble resulting from applying the EnKF to the forecast ensemble and data in Figure 4.18. The data variance used was 100 K². In this case, the analysis ensemble does not resemble the data at all because the forecast ensemble contained no members nearby. . . . 72

4.20 The analysis ensemble resulting from applying the B-localized EnKF to the forecast ensemble and data in Figure 4.18. The data variance used was 100 K², and the localization radius was 20 m. The results here are similar to those of the non-localized EnKF; the analysis ensemble was unable to track the data because no forecast ensemble members were nearby. . . . 72

5.1 Shown in (a) and (b) are contour plots of the x and y components of an example morphing function, T. Figure (c) shows a quiver plot of T indicating how an image would be warped when T is applied to it. . . . 77

PAGE 16

5.2 Shown in (a) is an image u made up of concentric circles. The morphing function given in Fig. 5.1 is applied to this image, $u \circ (I + T)$. The resulting warped image is given in (b). . . . 78

5.3 The example given in Remark 5.1 is illustrated for $\varphi(x) = u(x)$, a hat function centered at 0, displayed in blue, and $v(x) = 2\varphi(x+1)$, displayed in red. In (a) and (b), intermediate images formed following (5.10), which uses the inverse morphing function, and (5.11), which is commonly used in applications, respectively, are displayed in black. The common method produces two features: $\varphi(x+\lambda)$, which is a translation of u centered at $-\lambda$, and $\lambda$
PAGE 17

5.4 An example of the intermediate states of the morphing transformation applied to the heat flux emitted from two different fires. The morphing transformation is taken for both of the images, and linear combinations of the two are formed, representing intermediate states in the morphing representation. The resulting states are transformed back into the standard representation. The two alternative formulations for the residual image yield different intermediate states. In the given figures, the top and bottom images are the two original firelines. The intermediate states for the standard form of the residual (5.8) are given on the right, and those of our representation (5.5) are given on the left. The standard method produces spurious features that remain centered on the location of the bottom fire. In our method, the large components of the residual move with the morphed image, so no spurious features are present. . . . 82

5.5 The physical domain (a) with grid resolutions $d_x$ and $d_y$ is scaled to the normalized domain (b) on a square grid $D = [0,1] \times [0,1]$ with grid resolutions $\tilde{d}_x$ and $\tilde{d}_y$. The change in aspect ratio from $d_x{:}d_y$ to $\tilde{d}_x{:}\tilde{d}_y$ must be accounted for in the algorithm using the parameter $\rho$, which represents the anisotropic scaling of each axis. . . . 84

5.6 The original domain D is split into 9 different sub-domains at refinement level $\ell = 1$. The sub-domains at this level are indexed by $D^1_{ij}$ with $i, j \in \{1, \tfrac{3}{2}, 2\}$. Fractional indices indicate that the sub-domain is offset from the global domain's edge by a distance of half of the sub-domain's width in its respective axis. Further refinement levels are decomposed similarly. . . . 86

PAGE 18

5.7 Displayed in (a) is a function defined over [0, 1] containing a thin feature similar to the reaction front of a wildfire. This function is smoothed using equation (5.18) with various smoothing bandwidths which would be used at increasing refinement levels, where the bandwidth decreases as $2^{-\ell}$. The larger bandwidth at lower levels increases the ability of the registration algorithm to find the global minimum of the objective function. Decreasing the bandwidth on the same scale as the refinement enables the algorithm to retain accuracy as it attempts to align increasingly fine scale detail. . . . 90

5.8 (a) The morphing shape function S(x) used in (5.19), defined as a cubic spline with S(x) = 0 and S′(x) = 0 on the boundary and S(x) = 1 and S′(x) = 0 at the center, x = 0. (b) The tensor product of the shape functions that make up the components of $B_1$ and $B_2$. The gradient of this tensor product is zero along the boundary, and it attains a maximal value of 1 at the center, (0, 0). (c) and (d) Quiver plots showing the deformation of the domain performed by each shape function scaled by 0.2. . . . 93

PAGE 19

5.9 These figures illustrate the effect of various methods of interpolating a morphing function onto the pixel grid. (a) shows the original image with four arrows indicating the specified perturbations of the image. The remaining images show the morphed image with the specified perturbations interpolated to the pixel grid with (b) nearest neighbor, (c) bilinear, and (d) tensor product B-spline interpolants. Figure (b) exhibits clear discontinuities, while Figure (c) creates kinks in what were originally smooth curves. . . . 94

5.10 (a) The derivative of S(x) from Fig. 5.8a has maximal and minimal values at −0.5 and 0.5. The partial derivatives of Fig. 5.8b in the x and y directions are given in (b) and (c), respectively. These partial derivatives have extreme values at (±0.5, 0) and (0, ±0.5), where they attain slopes of ±1.5. Equation (5.16) is used to provide approximate conditions for the invertibility of $I + T + c_1 P_{ij} B_1$ by finding bounds for $c_1$ such that $\frac{\partial}{\partial x}(I + T + c_1 P_{ij} B_1) > -1$ at (−0.5, 0). The resulting bounds give rise to the conditions appearing in (5.25). . . . 99

PAGE 20

5.11 The black dots represent the sampling grid defined in (5.27) as a uniform grid with resolution $K_x \times K_y$. The region outlined in red, C, is defined as the feasibility region of the local minimization problem (5.24). The points marked with a black square are determined by solving (5.26) and represent the intersection of C with the x and y axes. The distances of these points from the origin are multiplied by the contraction parameters, $\alpha_x$ and $\alpha_y$, to obtain the blue dots located on the axes at $(c_x^+, 0)$, $(0, c_y^+)$, $(c_x^-, 0)$, and $(0, c_y^-)$. The coordinates of these dots are explicitly given in (5.25) and define the vertices of the quadrilateral $\tilde{C}$, outlined in blue. The sampling points, circled in blue, are determined as the intersection of the sampling grid with the region $\tilde{C}$. The dashed blue line represents what the sampling region would be if we did not include the contraction parameters. This region is not a subset of C and, in particular, contains the sampling point circled in red, which is not in the feasible region of the problem. . . . 100

6.1 Data assimilation by the morphing ensemble filter. The forecast ensemble (b) was created by smooth random morphing of the initial temperature profile located near the center of the domain. The analysis ensemble (c) was obtained by the EnKF applied to the transformed state. The data for the EnKF was the morphing transformation of the simulated data (a), and the observation function was the identity mapping. Contours are at 800 K, indicating the location of the fireline. The reaction zone is approximately between the two curves. . . . 109

PAGE 21

6.2 After five analysis cycles, the ensemble shows less spread and follows the data reliably. Ensemble members were registered using the initial state, advanced in time without data assimilation. The forecast ensemble (b) is closer to the simulated data (a) because of preceding analysis steps that have attracted the ensemble to the truth. The analysis ensemble (c) has a little less spread than the forecast, and the change between the forecast and the analysis is well within the ensemble spread. Contours are at 800 K, indicating the location of the fireline. The reaction zone is approximately between the two curves. . . . 110

6.3 Probability densities estimated by a Gaussian kernel with bandwidths 37 K, 19 K, and 30 m. Data was collected from the ensemble shown in Fig. 6.1b. Displayed are typical marginal probability densities at a point near the reaction area of the original temperature (a), the registration residual of the temperature (b), and the registration mapping component in the x direction (c). The transformation has vastly improved the Gaussian nature of the densities. . . . 110

6.4 The p-value from the Anderson-Darling test of the data from the ensemble after five morphing EnKF analysis cycles shows that the ensemble transformed into its registration representation, the registration residual of the temperature (b) and the registration mapping (c), has a distribution much closer to Gaussian than the original ensemble (a). The shading at a point indicates whether the marginal distribution of the ensemble at that point is highly Gaussian (white) or highly non-Gaussian (black). . . . 111

PAGE 22

6.5 The morphing EnKF applied to the reaction-diffusion model. The false color is generated from the temperature, with shading for depth perception. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF panel (c) is the result of data assimilation of the temperature field after running the model for 500 seconds. The morphing EnKF panel (d) is the ensemble with the image registration against the temperature field at the time of ignition, and the morphing applied to both the temperature and the fuel supply. The ensembles have 25 members each and are visualized as the superposition of transparent images of their members. The observation function is the temperature field on the whole grid. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data. . . . 113

PAGE 23

6.6 The morphing EnKF applied to the fireline propagation model. The false color is the output heat flux, with shading for depth perception. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF panel (c) is the result of data assimilation of the time from ignition after running the model for 1000 seconds. The morphing EnKF panel (d) is the result with the image registration determined from the time from ignition and the morphing applied to all of the model variables. The ensembles have 25 members each and are visualized as the superposition of transparent images of the heat fluxes of their members. The registration is done on the atmospheric grid with the fire heat flux as the observation function, but the atmospheric model is not used. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data. . . . 114

PAGE 24

6.7 The morphing EnKF applied to the fireline propagation model coupled with WRF. The false color of the contour on the horizontal plane is the fire output heat flux. The superposition of transparent ensemble members is shown. The volume shading is the vorticity of the atmosphere in the ensemble mean, where red and blue shades represent positive and negative vorticity, respectively. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF (c) and the morphing EnKF (d) are applied after 15 minutes. The ensembles have 25 members each. The registration is done on the atmospheric grid with the fire heat flux as the observation function. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data. . . . 115

PAGE 25

TABLES

Table

3.1 Data used in assimilation into (3.22). . . . 41

6.1 Analysis ensemble statistics for the EnKF with and without morphing. An ensemble of size 25 is generated of a 2-D fire shifted in space with mean 0 and standard deviation σ. Four data points are tested with standard deviation 75 m. The exact analysis mean is given next to the sample mean resulting from the EnKF and the morphing EnKF, as well as the relative analysis standard deviation (6.4). The ensemble statistics were averaged over 100 simulations. While both methods exhibit large errors, the results indicate that the EnKF is unable to move the location of the fire, while the morphing EnKF can. Also, the EnKF alone produces far more degeneracy in the analysis ensemble compared to the morphing EnKF. . . . 119

PAGE 26

1. Introduction

Modelling physical systems with computational simulations has become a ubiquitous part of modern science. The purpose of the simulation can be to predict the outcome of a future event or to gain insight into the details of a prior event. The model itself can be described abstractly as a function that maps a vector of physical quantities (the model state) from some initial time into a forecasted state at some future time. The model is often given as the solution of a partial differential equation (PDE), but it can also be an empirical algorithm or a combination of the two. Usually, the model will depend on a set of parameters, such as the coefficients of the PDE. A simple model could describe the position of a ball rolling down an incline, with parameters containing the mass of the ball, gravitational acceleration, and the coefficient of friction.

Clearly, we wish to create numerical models which forecast real life phenomena exactly; however, this is generally impossible. Often either the precise mechanics of the problem are unknown or we lack the computational resources to model them. Neither the parameters of the model nor the initial conditions are known exactly. In addition, floating point error prevents any sort of numerical scheme from exactly solving the model equation. All of these factors combined create intrinsic errors in the forecasted solution to the system. Many nonlinear systems, such as weather models, exhibit chaotic effects where these errors grow exponentially over time. It is, therefore, essential to estimate these errors and their behavior when making numerical

PAGE 27

forecasts of physical systems. These errors will be modelled in a statistical framework with the mathematical notation introduced in Section 2.1.

The model errors present in computational simulations can often be reduced by the inclusion of data retrieved from the physical system. The instruments that collect this data have their own associated error. We generally assume the data error from these instruments is either known or can be estimated somehow. There should also be a known relationship between the data and the model state vector. This relationship is given in terms of a function known as the observation function, which acts on the model state to produce synthetic data. For each realization of the model state, the synthetic data represents what the data would be if the model state represented the "truth." While methods of including continuous time data sets into a numerical model exist (such as the Kalman-Bucy filter [Kalman and Bucy, 1961]), we will limit our discussion to problems involving discrete data sets.

The process of including real data in numerical models is known as sequential statistical estimation or, in meteorology and oceanography, as data assimilation [Bennett, 1992, p. 67]. Data assimilation aims to find the best estimate from the model state and the data by balancing the uncertainties. This problem is often posed in one of two ways. Variational methods, such as 3DVAR and 4DVAR, construct least-squares estimates using two norms weighted by the inverse covariances. The square error produced by the deviation from the original model state and data is minimized using an iterative quasi-Newton method. Other data assimilation methods, such as the ensemble Kalman and particle filters, are posed as statistical estimation of the filtering distribution defined by the model state conditioned on the data. We show in Section 2.2 that the Kalman filter results in the exact filtering distribution when the

PAGE 28

model is linear and the error distributions are Gaussian. In this case, the mean of the filtering distribution can be shown to be the least-squares solution obtained from the variational methods [Kalnay, 2003].

The Kalman filter works by explicitly updating the covariance of the model state, which is unsuitable for large problems. In Section 2.3, we present the ensemble Kalman filter (EnKF) and several variants. These filters are Monte Carlo methods that evolve in time a collection of simulations, called an ensemble, and replace the exact covariance by the sample covariance computed from the ensemble. In Section 4.3, we present a formulation of this method that can be computed efficiently on large problems. While the methods based on the Kalman filter are, theoretically, only justified for linear models and Gaussian distributions, they are often used when these assumptions are violated. However, when the model is nonlinear, the distributions can become highly non-Gaussian. Kalman filter based methods represent the errors exclusively by their covariances. While Gaussian distributions are given uniquely by their mean and covariance, this is not true for general distributions. The higher order characteristics are implicitly ignored by the Kalman filter, which can lead to significant biases in the filtered distribution. In Section 3, we introduce particle filters, which implement the data assimilation without relying on the Gaussian assumption, but the required ensemble size grows quickly with the dimension of the state space for many problems [Bengtsson et al., 2008, Snyder et al., 2008].

Existing data assimilation techniques are incapable of producing acceptable estimations of filtering distributions for high-dimensional and nonlinear models. Thus, the design of more efficient filters for these problems has been the subject of significant recent interest. We will discuss two extensions to standard techniques in data

PAGE 29

assimilation, which attempt to improve prediction capability when applied to such models. The development of these extensions will be motivated by numerical experiments intended to explore the problems encountered. Finally, we provide numerical evidence that the new methods improve the estimation of the filtering distributions.

This thesis is organized as follows. In chapter 2, we will derive the Kalman filter directly from Bayes' rule under a set of assumptions based on a linear model. Next, the ensemble Kalman filter (EnKF) will be introduced as an approximation to the Kalman filter by replacing the distribution's covariance with a sample covariance. Finally, several variants of the EnKF commonly used in practice will be discussed. In chapter 3, a hybrid filter will be presented that uses density estimation to combine the EnKF and the particle filter to retain the advantages of both. This method is based on the technical report [Mandel and Beezley, 2006] and the extended abstract [Mandel and Beezley, January 2007]. In chapter 4, we will discuss practical issues of the EnKF, including generating initial ensembles and efficient implementation. A series of numerical experiments will be performed which are designed to explore the behavior of the filtering error for both the EnKF and the particle filter as the size of the model increases. Finally, a model for prediction of wildfire growth, based on the paper Mandel et al. [2009a], will be discussed along with the complications that arise when applying the EnKF to it. The EnKF alone will be shown to be insufficient when applied to this highly nonlinear model due to the natural evolution of errors in the location of sharp reaction fronts. This will motivate the development of a nonlinear transformation of the model state called the morphing transformation, based on the paper Beezley and Mandel [2008]. This transformation relies on a nonlinear optimization problem called image registration, which attempts to align coherent features

PAGE 30

of images by composition with spatial deformations known as morphing functions. In chapter 5, we will present an efficient, automated procedure capable of registering images originating from fire modeling. In chapter 6, we will define the morphing transformation, which will be applied to model states prior to their application to the EnKF. This morphing EnKF is an extension to the traditional algorithm that transforms the original state variables to a form intended to separate spatial errors from intensity errors. The transformed model states are shown to be much closer to Gaussian and, when applied to the EnKF, produce far superior results in comparison to the standard approach.

My personal contribution to the work described in this thesis is as follows. In chapter 2: the proofs of Lemma 2.1 and Theorem 2.4, along with the descriptions of the square root and localized filters. In chapter 3: the Matlab implementation and numerical results dealing with bimodal prior densities in Figures 3.1-3.5. In chapter 4: a Matlab implementation of the standard and localized EnKF, which were used to produce Figures 4.1-4.10. In addition, I was a contributor to a joint project creating a parallel Fortran implementation of the EnKF, which has been used to collect numerical results for large problems on parallel supercomputers. My contributions to this project include: parallelizing the implementation, maintaining the main driver as model and data interfaces were revised, and programming subroutines capable of importing model states from binary NetCDF files. I also assisted in the development of the integrated weather/wildfire model used for the numerical results in this thesis. My contributions included: integrating the fire code into NCAR's weather model (WRF), maintaining the software infrastructure, porting the changes to the development version of WRF, and modifying the WRF data preprocessor (WPS) to support refined

PAGE 31

grids necessary for the inclusion of high resolution fuel data. These modifications to the standard WRF code have been merged with the development repository and are scheduled to be included in a future release cycle. In chapter 5: the design of the automated image registration algorithm and an efficient Fortran implementation, which was used in the relevant numerical results. In chapter 6: the design of the morphing transformation and its Fortran implementation, with a contribution from Jan Mandel providing for non-gridded interpolation used for the inversion of morphing functions.

PAGE 32

2. The Ensemble Kalman Filter

2.1 Preliminaries

We will deal with model states which at time $t$ are represented as real vectors of size $n$, $x_t \in \mathbb{R}^n$. These vectors are evolved in time using a model equation, which will be represented as the function $M : (x_t, t, \delta t) \mapsto x_{t+\delta t}$. We will assume that data are available at discrete time steps $t_1, \dots, t_K$, which we will denote by $d_{t_i}$, a real vector of size $m_{t_i}$. Similarly, we assume the observation function for each data vector is known and denote it by $h_{t_i}$. When we are discussing only a single assimilation cycle, we will often drop the subscripts $t_i$, and the data will be given as $d \in \mathbb{R}^m$ with observation function $h$.

The model and data vectors will be assumed to be random vectors whose distributions have a probability density and bounded second moment. In this context, the data assimilation is posed in terms of Bayes' rule,

\[ p(x^a) = p(x^f \mid d) \propto p(d \mid x^f)\, p(x^f), \tag{2.1} \]

where $\propto$ means proportionality. The probability distribution of the model state at a fixed time, called the prior or forecast distribution, $p(x^f)$, is combined with the probability distribution of the data error as a function of the model state, called the data likelihood, $p(d \mid x^f)$, to produce the analysis or posterior distribution, $p(x^a)$. Eq. (2.1) determines the posterior density completely because $\int p(x^a)\, d\lambda = 1$. The model is then advanced in time starting from the analysis solution to the next available data set, and the so-called analysis cycle repeats.
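As a minimal numerical illustration of the update (2.1) for a scalar state, the prior density can simply be multiplied by the likelihood on a grid and normalized; all distributions and numbers in the following Python sketch are arbitrary choices for the example, not from the text.

```python
import numpy as np

# Grid-based Bayesian update (2.1) for a scalar state.
x = np.linspace(-5.0, 5.0, 1001)
prior = np.exp(-0.5 * x**2)                       # p(x^f): N(0, 1), unnormalized
likelihood = np.exp(-0.5 * (2.0 - x)**2 / 0.25)   # p(d | x^f): d = 2, variance 0.25
posterior = prior * likelihood                    # Bayes' rule, up to a constant
posterior /= np.trapz(posterior, x)               # normalize: density integrates to 1
print(np.trapz(x * posterior, x))                 # posterior mean; analytically 1.6
```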

PAGE 33

The data assimilation problem is then to estimate the model state at some time in the future, $t_K$, using all data available in the time horizon of the simulation, $d_{t_1}, \dots, d_{t_{K-1}}$, with $t_0 \le t_1 < \dots < t_{K-1} \le t_K$. The optimal solution to this problem is the random variable $x_{t_K}$ given $x_{t_0}, \dots, x_{t_{K-1}}$ and $d_{t_1}, \dots, d_{t_{K-1}}$. If the model satisfies the Markov equality,

\[ p(x_{t_i} \mid x_{t_0}, \dots, x_{t_{i-1}}) = p(x_{t_i} \mid x_{t_{i-1}}), \tag{2.2} \]

then, using Bayes' rule and the independence of model and data errors, we obtain the standard sequential estimation formulas,

\[ p(x_{t_i} \mid d_{t_1}, \dots, d_{t_{i-1}}) \propto \int p(x_{t_i} \mid x_{t_{i-1}})\, p(x_{t_{i-1}} \mid d_{t_1}, \dots, d_{t_{i-1}})\, dx_{t_{i-1}}, \]
\[ p(x_{t_i} \mid d_{t_1}, \dots, d_{t_i}) \propto p(d_{t_i} \mid x_{t_i})\, p(x_{t_i} \mid d_{t_1}, \dots, d_{t_{i-1}}). \tag{2.3} \]

The first part of this formula is the model update or forecasting phase; the second is the data assimilation or analysis phase. In the case of a deterministic and continuous model, the Markov equality is trivial, and the forecasting phase reduces to advancing each state by the model, $x_{t_i} = M(x_{t_{i-1}}, t_{i-1}, t_i - t_{i-1})$.

2.2 The Kalman Filter

First described by Kalman [1960] and Kalman and Bucy [1961], the Kalman filter is a simple recursive formula that implements the sequential estimation formula (2.3) when the initial model state and data distributions are independent and Gaussian and the model is linear. When these conditions are satisfied, all forecast and analysis distributions remain Gaussian over a finite number of assimilation cycles, and the distributions can be represented uniquely by their means and covariances. The Kalman filter formula operates directly on the mean and covariance of the model state to produce the exact filtering distribution. For completeness, major points in the development of the Kalman filter are derived here.

PAGE 34

We will assume that the model operator is linear and deterministic within each analysis cycle but may change from one cycle to the next. Under this assumption, we will represent the model propagation operator for each assimilation cycle in the following form,

\[ M(x, t_i, t_{i+1} - t_i) = M_{t_i} x + b_{t_i}, \]

where $M_{t_i}$ is an $n \times n$ matrix and $b_{t_i}$ is an $n$-dimensional vector. We will also restrict the observation function to be a linear operator, $h_{t_i}(x) = H_{t_i} x$, for some matrix $H_{t_i}$, where the observation being assimilated has Gaussian error density,

\[ d_{t_i} - H_{t_i} x_{t_i} \sim N(0, R_{t_i}). \tag{2.4} \]

In this case, the data likelihood is

\[ p(d_{t_i} \mid x_{t_i}) \propto \exp\left(-\tfrac{1}{2}\,(H_{t_i} x_{t_i} - d_{t_i})^T R_{t_i}^{-1} (H_{t_i} x_{t_i} - d_{t_i})\right). \]

When we are dealing only with a single assimilation cycle, the subscript $t_i$ will be dropped for brevity; however, it is to be understood that the model and observation function can change from one cycle to the next. This is desirable because in most applications these operators will not be linear but can be approximated as such. In this case, the matrix $M_{t_i}$ is the Jacobian of the model evaluated at time $t_i$, and the vector $b_{t_i}$ is the model evaluated at zero, $M(0, t_i, t_{i+1} - t_i)$, as in a first order Taylor approximation. The observation function is approximated in a similar fashion. The matrix $M_{t_i}$ is called the tangent linear model, and the Kalman filter applied to the approximated model is called the extended Kalman filter.
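When an analytic Jacobian is unavailable, a tangent linear model can be approximated by finite differences. The following sketch is my own illustration of the idea, and it linearizes about a reference state `x0` rather than about zero as in the text's first order Taylor approximation.

```python
import numpy as np

def tangent_linear(model, x0, eps=1e-6):
    """Finite-difference approximation of the tangent linear model at x0.

    Returns M (the Jacobian of `model` at x0) and an offset b such that
    model(x) is approximately M @ x + b near x0.
    """
    f0 = model(x0)
    M = np.empty((len(f0), len(x0)))
    for k in range(len(x0)):
        dx = np.zeros(len(x0))
        dx[k] = eps
        M[:, k] = (model(x0 + dx) - f0) / eps  # k-th column of the Jacobian
    return M, f0 - M @ x0
```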

PAGE 35

Assume that at a given analysis cycle the forecast has normal distribution with mean $E(x^f_{t_i}) = \mu_{x^f_{t_i}}$ and non-singular covariance $Q^f_{t_i}$, and the data likelihood is normal with mean $H_{t_i} x$ and non-singular covariance $R_{t_i}$:

\[ p(x) \propto \exp\left(-\tfrac{1}{2}\,(x - \mu_{x^f_{t_i}})^T (Q^f_{t_i})^{-1} (x - \mu_{x^f_{t_i}})\right), \tag{2.5} \]

\[ p(d_{t_i} \mid x) \propto \exp\left(-\tfrac{1}{2}\,(d_{t_i} - H_{t_i} x)^T R_{t_i}^{-1} (d_{t_i} - H_{t_i} x)\right). \tag{2.6} \]

Lemma 2.1. Given the normal forecast (2.5) and likelihood (2.6) distributions, the analysis distribution given by Bayes' theorem (2.1) is also normal. Further, the mean and covariance of the analysis distribution are

\[ \mu_{x^a} = \mu_{x^f} + K\left(d - H\mu_{x^f}\right), \qquad Q^a = (I - KH)\,Q^f, \tag{2.7} \]

where

\[ K = Q^f H^T \left(H Q^f H^T + R\right)^{-1} \tag{2.8} \]

is called the Kalman gain matrix.

Proof. Taking the logarithm of Bayes' rule (2.1) gives us the following expression for the posterior density:

\[ -2\log\left(p(x^f \mid d)\right) = (x^f - \mu_{x^f})^T (Q^f)^{-1} (x^f - \mu_{x^f}) + (d - H x^f)^T R^{-1} (d - H x^f) + \mathrm{const} = x^{fT} \tilde{Q} x^f - 2\, x^{fT} \tilde{q} + \mathrm{const}, \tag{2.9} \]

where $\tilde{Q} = (Q^f)^{-1} + H^T R^{-1} H$, $\tilde{q} = (Q^f)^{-1}\mu_{x^f} + H^T R^{-1} d$, and $\mathrm{const}$ is some constant which does not depend on $x^f$.

PAGE 36

Because $\tilde{Q}$ is symmetric and positive definite, we can complete the square in (2.9) and conclude that the posterior distribution is Gaussian with some covariance $Q^a$ and mean $\mu_{x^a}$. The analysis density then has the following form:

\[ -2\log\left(p(x^f \mid d)\right) = (x^f - \mu_{x^a})^T (Q^a)^{-1} (x^f - \mu_{x^a}) + \mathrm{const} = x^{fT} (Q^a)^{-1} x^f - 2\, x^{fT} (Q^a)^{-1} \mu_{x^a} + \mathrm{const}. \tag{2.10} \]

Comparing the quadratic term in (2.10) with that in (2.9) yields

\[ (Q^a)^{-1} = (Q^f)^{-1} + H^T R^{-1} H. \]

By the Sherman-Morrison-Woodbury formula [Hager, 1989], $Q^a$ can be written as

\[ Q^a = Q^f - Q^f H^T (H Q^f H^T + R)^{-1} H Q^f = \left(I - Q^f H^T (H Q^f H^T + R)^{-1} H\right) Q^f = (I - KH)\, Q^f. \]

Finally, comparing the linear terms, we have

\[ \begin{aligned} \mu_{x^a} &= Q^a \left((Q^f)^{-1} \mu_{x^f} + H^T R^{-1} d\right) \\ &= (I - KH)\mu_{x^f} + (I - KH) Q^f H^T R^{-1} d \\ &= (I - KH)\mu_{x^f} + \left(I - Q^f H^T (H Q^f H^T + R)^{-1} H\right) Q^f H^T R^{-1} d \\ &= (I - KH)\mu_{x^f} + Q^f H^T (H Q^f H^T + R)^{-1} \left((H Q^f H^T + R) - H Q^f H^T\right) R^{-1} d \\ &= (I - KH)\mu_{x^f} + Q^f H^T (H Q^f H^T + R)^{-1} d \\ &= (I - KH)\mu_{x^f} + K d = \mu_{x^f} + K\left(d - H\mu_{x^f}\right). \qquad \square \end{aligned} \]

PAGE 37

The following lemma interprets the Kalman filter as a least-squares method, as in 3DVAR, and shows that, under these assumptions, the mean of the analysis distribution produced by the Kalman filter is the same as the least-squares estimate produced by 3DVAR.

Lemma 2.2. If $\mu_{x^a}$ is defined by (2.7) and (2.8), then $\mu_{x^a}$ is the solution of the least-squares problem

\[ \min_x \; (x - \mu_{x^f})^T (Q^f)^{-1} (x - \mu_{x^f}) + (d - Hx)^T R^{-1} (d - Hx). \tag{2.11} \]

Proof. At the minimum, the gradient with respect to $x$ vanishes, which gives $\left((Q^f)^{-1} + H^T R^{-1} H\right) x = (Q^f)^{-1}\mu_{x^f} + H^T R^{-1} d$. The matrix on the left satisfies

\[ \left((Q^f)^{-1} + H^T R^{-1} H\right)^{-1} = Q^f - Q^f H^T (H Q^f H^T + R)^{-1} H Q^f = (I - KH)\, Q^f = Q^a, \]

so $x = Q^a\left((Q^f)^{-1}\mu_{x^f} + H^T R^{-1} d\right) = \mu_{x^a}$, as in the proof of Lemma 2.1. $\square$

Because the posterior density is proportional to the exponential of $-\tfrac{1}{2}$ times the objective in (2.11), the solution $x = \mu_{x^a}$, which minimizes the least-squares objective, also maximizes the analysis density and is called the Bayesian estimate of the filtering problem.
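Lemma 2.2 is easy to check numerically. The following sketch, with arbitrary random inputs of my choosing, compares the gain form (2.7)-(2.8) of the analysis mean with the solution of the normal equations of the least-squares problem (2.11).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
Qf = np.eye(n)                                   # forecast covariance Q^f
R = 0.5 * np.eye(m)                              # data error covariance R
H = rng.standard_normal((m, n))                  # observation matrix
mu = rng.standard_normal(n)                      # forecast mean
d = rng.standard_normal(m)                       # data

# Gain form (2.7)-(2.8).
K = Qf @ H.T @ np.linalg.inv(H @ Qf @ H.T + R)
mu_gain = mu + K @ (d - H @ mu)

# Normal equations of the least-squares problem (2.11).
A = np.linalg.inv(Qf) + H.T @ np.linalg.inv(R) @ H
b = np.linalg.inv(Qf) @ mu + H.T @ np.linalg.inv(R) @ d
print(np.allclose(mu_gain, np.linalg.solve(A, b)))  # True: the two forms agree
```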

PAGE 38

The Kalman filter proceeds by computing the forecast distribution for the next Bayesian update at time $t_{i+1}$ using the following lemma, which is an elementary consequence of the linearity of $E(\cdot)$.

Lemma 2.3. Assume $x$ is a normal random variable with mean $\mu_x$ and covariance $Q_x$. If $M$ and $b$ are deterministic and of the appropriate size, then the random variable $y = Mx + b$ is also normal with mean $\mu_y = M\mu_x + b$ and covariance $Q_y = M Q_x M^T$.

The Kalman filter can then be expressed as the following two-step recursion:

\[ K_{t_i} = Q^f_{t_i} H_{t_i}^T \left(H_{t_i} Q^f_{t_i} H_{t_i}^T + R_{t_i}\right)^{-1}, \quad \mu_{x^a_{t_i}} = \mu_{x^f_{t_i}} + K_{t_i}\left(d_{t_i} - H_{t_i}\mu_{x^f_{t_i}}\right), \quad Q^a_{t_i} = (I - K_{t_i} H_{t_i})\, Q^f_{t_i} \quad \text{(analysis step)}; \tag{2.12} \]

\[ \mu_{x^f_{t_{i+1}}} = M_{t_i}\mu_{x^a_{t_i}} + b_{t_i}, \quad Q^f_{t_{i+1}} = M_{t_i} Q^a_{t_i} M_{t_i}^T \quad \text{(forecast step)}. \tag{2.13} \]

Equations (2.12) and (2.13) can be seen as solving the sequential data assimilation problem from (2.3) for the special case involving step-wise linear model and observation functions and Gaussian distributions. This is a consequence of the following theorem, which gives the optimal Bayesian solution to this problem.

Theorem 2.4. At time $t_i$, $\mu_{x^a_{t_i}}$, as defined in (2.12), maximizes the conditional probability density $p(x^f_{t_i} \mid d_{t_1}, \dots, d_{t_i})$.

Proof. By induction, we have that, at time $t_{i-1}$, $\mu_{x^a_{t_{i-1}}}$ maximizes the density $p(x^a_{t_{i-1}} \mid d_{t_1}, \dots, d_{t_{i-1}})$. By Lemma 2.3, the density $p(x^f_{t_i} \mid d_{t_1}, \dots, d_{t_{i-1}})$ is Gaussian with mean $\mu_{x^f_{t_i}} = M_{t_{i-1}}\mu_{x^a_{t_{i-1}}} + b_{t_{i-1}}$ and covariance $Q^f_{t_i} = M_{t_{i-1}} Q^a_{t_{i-1}} M_{t_{i-1}}^T$. By Lemma 2.2, $\mu_{x^a_{t_i}}$ maximizes the density $p(x^f_{t_i} \mid d_{t_i})$. But $x^f_{t_i}$ is a continuous function of $x^a_{t_{i-1}}$ given $d_{t_1}, \dots, d_{t_{i-1}}$; therefore it is measurable with respect to $d_{t_1}, \dots, d_{t_{i-1}}$, and the claim follows. $\square$
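In code, the recursion (2.12)-(2.13) is a few lines of linear algebra. This NumPy sketch uses illustrative names of my choosing and an explicit matrix inverse for readability; a linear solver would be preferable numerically.

```python
import numpy as np

def kf_analysis(mu_f, Q_f, d, H, R):
    """Analysis step (2.12): condition the forecast N(mu_f, Q_f) on data d."""
    K = Q_f @ H.T @ np.linalg.inv(H @ Q_f @ H.T + R)   # Kalman gain (2.8)
    mu_a = mu_f + K @ (d - H @ mu_f)
    Q_a = (np.eye(len(mu_f)) - K @ H) @ Q_f
    return mu_a, Q_a

def kf_forecast(mu_a, Q_a, M, b):
    """Forecast step (2.13): push the analysis through the affine model x -> Mx + b."""
    return M @ mu_a + b, M @ Q_a @ M.T
```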

PAGE 39

2.3 The Ensemble Kalman Filter

While the Kalman filter provides optimal algebraic formulas for the change of the mean and covariance by the Bayesian update and a formula for advancing the covariance matrix in time provided the system is linear, it is not computationally feasible for high-dimensional systems. For this reason the EnKF was developed in Evensen [1994], Houtekamer and Mitchell [1998]. The EnKF represents the distribution of the system state using an ensemble of simulations and replaces the covariance matrix by the sample covariance matrix of the ensemble. One advantage of the EnKF is that advancing the probability distribution in time is achieved by simply advancing each member of the ensemble. The EnKF, however, still relies on the Gaussian assumption, though it can still be used in practice for nonlinear problems, where the Gaussian assumption is not satisfied. Related filters attempting to relax the Gaussian assumption in the EnKF include Anderson and Anderson [1999], Beezley and Mandel [2008], Bengtsson et al. [2003], Mandel and Beezley [2006], van Leeuwen [2003].

We use the EnKF following Burgers et al. [1998] and Evensen [2003], with only some minor differences. This filter involves randomization of data. For filters without randomization of data, see Anderson [1999], Evensen [2004], and Tippett et al. [2003].

The data assimilation uses a collection of independent simulations, called an ensemble. The ensemble filter consists of

1. generating an initial ensemble by random perturbations, by taking some forecasted solution and adding to it, for each ensemble member, independent Gaussian random vectors,

PAGE 40

2. advancing each ensemble member in time until the time of the data, which gives the so-called forecast ensemble,

3. modifying the ensemble members by injecting the data (the analysis step), which results in the so-called analysis ensemble,

4. continuing with step 2 to advance the ensemble in time again.

We now consider the analysis step in more detail. We have the forecast ensemble

\[ X^f = \left[x^f_1, \dots, x^f_N\right] = \left[x^f_i\right], \]

where each $x^f_i$ is a random column vector of dimension $n$, which contains the whole model state. Thus, $X^f$ is a matrix of dimension $n$ by $N$. As in the Kalman filter, the data is given as a measurement vector $d$ of dimension $m$ and data error covariance matrix $R$ of dimension $m$ by $m$, with a corresponding $m \times n$ observation matrix $H$. The forecast ensemble $X^f$ is considered a sample from the prior distribution $p(x)$, and the EnKF strives to create an analysis ensemble that is a sample from the posterior distribution $p(x \mid d)$.

The EnKF is based on applying a version of the Kalman update (2.12) to each forecast ensemble member $x^f_i$ to yield the analysis ensemble member $x^a_i$. For this update, the data vector $d$ in (2.12) is replaced by a randomly perturbed vector with distribution equal to that of the data error,

\[ d_i = d + v_i, \qquad v_i \sim N(0, R), \tag{2.14} \]

where each $v_i$ is independent of both each other and of the forecast ensemble. These data vectors are combined into a perturbed data matrix,

\[ D = \left[d + v_1, \dots, d + v_N\right]. \]

PAGE 41

This is done to ensure proper covariance statistics of the analysis ensemble. Let $\bar{X}^f$ be the sample, or ensemble, mean of the forecast ensemble,

\[ \bar{X}^f = \frac{1}{N}\sum_{i=1}^{N} x^f_i. \tag{2.15} \]

The unknown covariance matrix $Q^f$ in (2.8) is replaced by the sample covariance matrix $C(X^f)$ of the forecast ensemble $X^f$, which we will define as

\[ C(X^f) = \frac{1}{N-1}\sum_{i=1}^{N} \left(x^f_i - \bar{X}^f\right)\left(x^f_i - \bar{X}^f\right)^T. \tag{2.16} \]

Note that $\bar{X}^f$ and $C(X^f)$ are random quantities, as opposed to the deterministic mean and covariance used in the Kalman filter. Define $X^a = [x^a_1, \dots, x^a_N] = [x^a_i]$ as the analysis ensemble. This gives the EnKF formula,

\[ X^a = X^f + C(X^f) H^T \left(H C(X^f) H^T + R\right)^{-1} \left(D - H X^f\right). \tag{2.17} \]

Since $R$ is in practice positive definite, there is no difficulty with the inverse in (2.17). The matrix $R$ is known from the data, cf. (2.4). The EnKF can be written analogously to the Kalman filter's recursion formulas (2.12) and (2.13) by defining a sample Kalman gain matrix,

\[ \hat{K} = C(X^f) H^T \left(H C(X^f) H^T + R\right)^{-1}. \tag{2.18} \]

Then the EnKF can be expressed as

\[ X^a = X^f + \hat{K}\left(D - H X^f\right) \quad \text{(analysis step)}; \tag{2.19} \]

\[ X^f_{t_{i+1}} = M_{t_i} X^a_{t_i} + b_{t_i} e_{1 \times N} \quad \text{(forecast step)}, \tag{2.20} \]

where $e_{1 \times N}$ is a $1 \times N$ vector of 1's.
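A direct NumPy transcription of the analysis step (2.14)-(2.19) might look as follows. Forming $C(X^f)$ explicitly requires $O(n^2)$ memory, so this sketch is for illustration only; an efficient formulation for large problems is the subject of Section 4.3.

```python
import numpy as np

def enkf_analysis(Xf, H, d, R, rng=np.random.default_rng(0)):
    """EnKF analysis step: Xf is the n-by-N forecast ensemble."""
    n, N = Xf.shape
    # Perturbed data matrix D (2.14): one independent N(0, R) draw per member.
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    A = Xf - Xf.mean(axis=1, keepdims=True)       # deviations from the mean (2.15)
    C = A @ A.T / (N - 1)                         # sample covariance C(X^f) (2.16)
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)  # sample Kalman gain (2.18)
    return Xf + K @ (D - H @ Xf)                  # analysis ensemble (2.19)
```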

PAGE 42

2.4 Variants of the EnKF

2.4.1 Square Root Filters

The form of the EnKF described here (2.19) recreates the proper posterior covariance by making use of randomly perturbed data. It has been shown by Whitaker and Hamill [2002] that the use of perturbed data in certain non-Gaussian problems can cause systematic errors in the posterior covariance. As a result, a variant of the EnKF, called the Ensemble Square Root Filter (EnSRF), has been developed that eliminates the necessity of data perturbation. The EnSRF is given as a two-step method in which the ensemble mean is updated as

\[ \bar{X}^a = \bar{X}^f + \hat{K}\left(d - H\bar{X}^f\right), \]

with $\hat{K}$ the sample gain (2.18). The analysis ensemble is then created out of deviations from this mean. Define the matrix of scaled forecast deviations $A = \left(X^f - \bar{X}^f e_{1\times N}\right)/\sqrt{N-1}$, so that $C(X^f) = AA^T$, and take the analysis deviations as $A\Lambda$, where the transform $\Lambda$ is determined by solving

\[ C(X^a) = (I - \hat{K}H)\, C(X^f), \tag{2.21} \]

ensuring the analysis ensemble has the correct covariance. Equation (2.21) can be rewritten as

\[ A \Lambda \Lambda^T A^T = (I - \hat{K}H) A A^T = \left(I - A A^T H^T (H A A^T H^T + R)^{-1} H\right) A A^T = A \left(I - A^T H^T (H A A^T H^T + R)^{-1} H A\right) A^T = A \left(I - Z^T Z\right) A^T, \tag{2.22} \]

PAGE 43

where $Z$ is a matrix square root,

\[ Z^T Z = A^T H^T \left(H A A^T H^T + R\right)^{-1} H A. \]

Following Andrews [1968], if we define the singular value decomposition (SVD) $Z = U \Sigma V^T$, then the solutions of this equation are of the form

\[ \Lambda = V \left(I - \Sigma^T \Sigma\right)^{1/2} \Theta, \]

where $\Theta$ is an arbitrary unitary matrix. Evensen [2004] shows that the analysis ensemble can be expressed as $X^a = X^f T$ for some $N \times N$ matrix $T$. Therefore, as with the EnKF, the analysis ensemble is made up of linear combinations of the forecast ensemble.

The Ensemble Transform Kalman Filter [Bishop et al., 2001] and the Ensemble Adjustment Kalman Filter [Anderson, 1999] are both variations of the EnKF based on the EnSRF, which have been shown to provide more accurate representations of the posterior covariance for many problems. However, both methods require computing the eigenvalue decomposition of dense $m \times m$ matrices at a cost of $O(m^3)$. When there are a small number of data points, or the data errors are independent and the assimilation can be done sequentially, this cost is negligible; however, when there is a large amount of correlated data (such as a high resolution image), computing the decomposition can become prohibitive. Anderson [2003] describes a method in which the ensemble is expressed in terms of a rotated basis, where the data covariance becomes diagonal.
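One concrete realization of this update, making the symmetric choice $\Theta = V^T$ so that the deviation transform becomes the symmetric square root of $I - Z^T Z$, might be sketched as follows; the function and variable names are illustrative, not from the text.

```python
import numpy as np

def ensrf_analysis(Xf, H, d, R):
    """Square root analysis: mean update plus deviation transform solving (2.21)."""
    n, N = Xf.shape
    xbar = Xf.mean(axis=1, keepdims=True)
    A = (Xf - xbar) / np.sqrt(N - 1)              # scaled deviations, C(X^f) = A A^T
    HA = H @ A
    S = HA @ HA.T + R                             # H C(X^f) H^T + R
    xbar_a = xbar + A @ HA.T @ np.linalg.solve(S, d[:, None] - H @ xbar)
    G = HA.T @ np.linalg.solve(S, HA)             # G = Z^T Z from (2.22)
    lam, V = np.linalg.eigh(G)                    # eigenvalues of G lie in [0, 1)
    T = V @ np.diag(np.sqrt(np.clip(1.0 - lam, 0.0, None))) @ V.T
    return xbar_a + np.sqrt(N - 1) * (A @ T)      # ensemble with C(X^a) = (I - KH) C(X^f)
```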

PAGE 44

2.4.2 Localized Ensemble Filters

Because the EnKF and the EnSRF arise by replacing the covariance in the optimal Kalman filter with an ensemble covariance, the error in the ensemble methods is necessarily dependent on the sampling error, $\|C(X^f) - Q^f\|$. In practice we have $N \ll n$. In this case the sample covariance consists of the sum of $N$ rank-one matrices, $\frac{1}{N-1}\sum_{k=1}^{N} (x^f_k - \bar{X}^f)(x^f_k - \bar{X}^f)^T$. Because $\sum_{k=1}^{N} (x^f_k - \bar{X}^f) = 0$, the terms are linearly dependent, and the rank of $C(X^f)$ is at most $N - 1$. Further, the perturbation of each forecast ensemble member, $x^a_k - x^f_k$, is always in the span of the columns of $C(X^f)$. Any uncertainty outside of this subspace will simply be ignored by the classical Kalman update, causing the filter to diverge from the optimal solution over several analysis cycles. In addition, spurious correlations between distant model nodes tend to appear [Houtekamer and Mitchell, 2001, Anderson, 2007]. Furrer and Bengtsson [2007] show that the mean trace of the sampled covariance is equal to the trace of the true covariance; the fact that the sampled covariance has only $N - 1$ positive eigenvalues implies that these eigenvalues are heavily biased.
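The rank deficiency is easy to observe numerically. In the following sketch (a synthetic Gaussian ensemble with sizes chosen arbitrarily for the example), the sample covariance of $N = 10$ members in $n = 500$ dimensions has rank at most $N - 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 500, 10
X = rng.standard_normal((n, N))        # synthetic ensemble, N << n
A = X - X.mean(axis=1, keepdims=True)  # deviations sum to zero across members
C = A @ A.T / (N - 1)                  # sample covariance
print(np.linalg.matrix_rank(C))        # 9, i.e. N - 1
```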


The forecast ensemble covariance (often called the background error covariance $B$ in the literature) is directly modified by tapering off correlations between distant grid points. Specifically, the localized covariance is created by taking the Schur product with a spatial correlation matrix, $\rho$,

$$C_\rho(X^f) = \rho \circ C(X^f), \qquad (2.23)$$

where the Schur product, $D = A \circ B$, is an element-wise operation on two matrices of the same size, $D_{ij} = A_{ij}B_{ij}$. The correlation matrix represents the assumed correlation between grid points, which requires some amount of heuristic analysis. A detailed discussion of modeling correlation over spatial fields can be found in Gaspari and Cohn [1999]. Typically, the correlation matrix is generated by a radial shape function, $S(r)$, which is monotonically decreasing, compactly supported, and satisfies $S(0) = 1$. We will denote the "distance" between the $i$th and $j$th components of the state vector, located at $r_i$ and $r_j$, as $d(r_i, r_j)$. For example, if $x_i$ and $x_j$ are the values of variables $v_{k_1}$ and $v_{k_2}$ at the locations $r_{\ell_1}$ and $r_{\ell_2}$ on the physical domain, then the distance between these components might be the Euclidean distance between their locations, $d(r_i, r_j) = |r_{\ell_1} - r_{\ell_2}|$. The correlation matrix is then given by

$$\rho_{ij} = S\left(d(r_i, r_j)\right). \qquad (2.24)$$

If $d(\cdot,\cdot)$ is a metric over $\mathbb{R}^n$, then $\rho$ is symmetric and $C_\rho(X^f)$ is a covariance [Houtekamer and Mitchell, 2001]. A simple example of this sort of tapering, when $X^f$ contains only a single variable defined on a 1-dimensional domain, is

$$\rho_{ij} = \begin{cases} 1, & |i - j| \le 1; \\ 0, & |i - j| > 1. \end{cases}$$


In this case, $C_\rho(X^f)$ is sparse with a bandwidth of 3. In general, these correlation matrices will be sparse with bands of non-zeros near the diagonal. Under traditional uniform discretizations, multi-dimensional domains will have additional bands of non-zeros centered some distance from the main diagonal, depending on the grid sizes.

The generator function for correlation matrices typically used in practice is described in Gaspari and Cohn [1999] as a piecewise fifth order rational function, similar in shape to a Gaussian function with bandwidth $\sqrt{3/10}$ scaled to a maximum value of 1, but compactly supported. This function can be written as

$$S(r) = \begin{cases} -\frac{1}{4}r^5 + \frac{1}{2}r^4 + \frac{5}{8}r^3 - \frac{5}{3}r^2 + 1, & 0 \le r \le 1; \\[2pt] \frac{1}{12}r^5 - \frac{1}{2}r^4 + \frac{5}{8}r^3 + \frac{5}{3}r^2 - 5r + 4 - \frac{2}{3}r^{-1}, & 1 < r \le 2; \\[2pt] 0, & r > 2. \end{cases} \qquad (2.25)$$

If we define a length scale, $l$, as the minimum distance over which the correlation reduces to zero, then the correlation matrix is defined by

$$\rho_{ij} = S\left(\frac{2\,d(r_i, r_j)}{l}\right). \qquad (2.26)$$

The tapered forecast covariance is then used in place of the ensemble covariance in the ensemble update formula; a sketch of the construction is given below. Hamill et al. [2001] describe an algorithm based on this approach; however, instead of assimilating the ensemble globally, each data component is assimilated sequentially on a subset of the state vector. In this sequential approach, a local ensemble is defined for each data component $d_i$, located on the domain at $r_{d_i}$. A local state vector, $X^f_i$, contains only the components of the global state vector, $x$, that are correlated with the data component, $\{j : d(r_{d_i}, r_j) < l\}$.
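As an illustration of (2.23)-(2.26), the following Python/NumPy sketch builds the taper (2.25) and applies it to an ensemble covariance on a 1-dimensional grid. The coefficients are the standard ones from Gaspari and Cohn [1999]; the variable names are our own.

    import numpy as np

    def gaspari_cohn(r):
        # compactly supported fifth order rational function S(r) of (2.25)
        r = np.abs(np.asarray(r, dtype=float))
        out = np.zeros_like(r)
        a = r <= 1.0
        b = (r > 1.0) & (r <= 2.0)
        out[a] = (-0.25*r[a]**5 + 0.5*r[a]**4 + 0.625*r[a]**3
                  - (5.0/3.0)*r[a]**2 + 1.0)
        out[b] = (r[b]**5/12.0 - 0.5*r[b]**4 + 0.625*r[b]**3
                  + (5.0/3.0)*r[b]**2 - 5.0*r[b] + 4.0 - 2.0/(3.0*r[b]))
        return out

    # tapered covariance (2.23) on a 1-D grid with length scale l = 0.1
    x = np.linspace(0.0, 1.0, 200)              # node locations
    C = np.cov(np.random.randn(200, 10))        # stand-in ensemble covariance
    dist = np.abs(x[:, None] - x[None, :])      # d(r_i, r_j)
    rho = gaspari_cohn(2.0 * dist / 0.1)        # (2.26); zero beyond distance l
    C_loc = rho * C                             # Schur (element-wise) product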


Figure 2.1: The function described in (2.25) is displayed as a blue line, and a Gaussian function with bandwidth $\sqrt{3/10}$, scaled to a maximum of 1, is displayed as a dotted black line.

The EnSRF is then applied to the local ensemble, $X^f_i$, with the tapered ensemble covariance. This gives a local perturbation to the forecast ensemble for each data component. This procedure is repeated for every data component, and the sum of all the local perturbations is applied to the global forecast ensemble.

For the sequential algorithm, the tapered covariance is a much smaller matrix in which the uncorrelated components of the global state vector are filtered out. This can be seen as splitting a large sparse problem into many smaller dense problems; however, it is only rigorously justifiable when the data errors are independent. Anderson [2003] describes a method that allows the use of correlated data by changing the basis of the ensemble states so that the data covariance becomes diagonal. Hunt et al. [2007] note that performing data assimilation sequentially or in small batches has computational advantages over the standard EnSRF, because the update can be split into many independent problems computed in parallel.


3. Predictor-Corrector Filter

In this section, we focus on an ensemble data assimilation technique known as the particle filter. Unlike the EnKF, the particle filter is capable of representing non-Gaussian distributions. This allows it to perform far more accurately in situations involving highly non-Gaussian distributions, such as those with multiple modes. In practice, however, the particle filter requires many more ensemble members than the dimension of the model space, which limits its usefulness in modern applications. In this section, we introduce a modification to the standard particle filter, called the predictor-corrector filter, which seeks to combine the usability of the EnKF in high dimension with the non-Gaussian capabilities of the particle filter. The method discussed here is based on Mandel and Beezley [2006] and Mandel and Beezley [2009].

3.1 Preliminaries

Because in this section we are concerned with the behavior of filters in high dimension, we will consider model states that are elements of some Hilbert space, $V$, rather than restricting ourselves to $\mathbb{R}^n$. We will denote the inner product of $V$ as $\langle\cdot,\cdot\rangle$. Ensemble members will be random elements, $X_i$, that are $(\Omega, \mathcal{S}, P) \to (V, \mathcal{B}(V))$ measurable. The forecast distribution of the ensemble is written as $\mu^f(B) = P(X^f_i \in B)$, defined over all $B \in \mathcal{B}(V)$. The analysis distribution, $\mu^a$, is defined similarly. These distributions are allowed to be non-Gaussian, but are assumed to have bounded second moments. Also, the observation function $h$ is now allowed to be nonlinear, but is assumed to be continuous on its domain, and it is assumed that the data likelihood $p(d|X^f)$ is a continuous function of $X^f$.


In this more general setting, we will use the following form of Bayes' rule [Kleijn and van der Vaart, 2006]:

$$\mu^a(B) = \frac{\int_B p(d|u)\,d\mu^f(u)}{\int_V p(d|u)\,d\mu^f(u)} \qquad (3.1)$$

for any set $B \in \mathcal{B}$. Clearly, $\mu^a$ is again a probability measure defined on $\mathcal{B}$. The mean of a distribution $\mu$ is defined as the element $\bar{\mu} \in V$ such that

$$\langle x, \bar{\mu}\rangle = \int_V \langle x, u\rangle\, d\mu(u) \quad \forall x \in V. \qquad (3.2)$$

The covariance of $\mu$ is the operator $Q_\mu : V \to V$ given by

$$\langle x, Q_\mu y\rangle = \int_V \langle x, u - \bar{\mu}\rangle\langle y, u - \bar{\mu}\rangle\, d\mu(u), \quad \forall x, y \in V. \qquad (3.3)$$

Notice that when $V = \mathbb{R}^n$, we can take $x$ and $y$ in these definitions to be coordinate unit vectors, and we recover the usual definitions of the mean and covariance.

3.1.1 Weighted ensembles

A weighted ensemble $(X_{Nk}, w_{Nk})_{k=1}^N$ is a collection of members $X_{Nk} \in V$ and weights $w_{Nk} \in \mathbb{R}$, $k = 1, \dots, N$, such that

$$w_{Nk} \ge 0, \qquad \sum_{k=1}^N w_{Nk} = 1. \qquad (3.4)$$

We will work with weighted ensembles where the ensemble members $X_{Nk}$, $k = 1, \dots, N$, are identically distributed random variables with some marginal distribution $\pi$, and the weights are given as proportional to some $f_{Nk}$. A weighted ensemble defines a measure $\mu_N$ as the linear combination of Dirac deltas,

$$\mu_N = \sum_{k=1}^N w_{Nk}\,\delta\left(X_{Nk}\right), \qquad w_{Nk} \propto f_{Nk}.$$
PAGE 50

The proportionality con s tant i s easily determined from (3.4). Note that for all s et s B E B (V) and all real B-measurable function s f, N /-LN ( B ) = L WNk. k=l XNkEB From (3. 2), the mean of the weighted ensemble is the mean of the measure /-LN, given by N (x, ll w ) = j (x, u ) dJ-LN (u) = L WNk (x, XNk) V x E V, k=l giving th e usual formula for the weighted sample mean, N liN = L wNkxNk k=l (3. 6) Likewi s e from (3.3), the covariance of the weighted en s emble is the covariance oper ator Q IJ.N : V ---+ V of the measure /-LN, given by (x, QIJ.Ny) = J (x, U liN) (y, U liN ) dJ-LN (u) N = L WNk ( x, XNkliN ) (y, XNkli N ) Vx, y E V (3.7) k=l In the ca s e when V = JRn, with x = e i y = e i the unit vector s and X Nk,i the entries of the vector XNk. (3.7) yields the weighted sample covariance matrix N QIJ.N = [Q!J.N,ij], Q!J.N,ij = L WNk (XNk,iliN,i ) (XNk,j liN,J (3. 8) k=l Using the s e definitions for the mean and covariance of weighted ensembles, we can apply the EnKF formula (2.17) to weighted ensemble s In this case the EnKF update formula only updates the ensemble members, while the weight s remain the same. 25

PAGE 51

We say that a weighted ensemble represents a probability distribution /.L when the weights are chosen as the importance weights, k = 1 ... ,N. (3.9) where di.L/ dn is the Radon-Nikodym derivative, i.e., the density of /.L with respect to 1r. (If both /.L and 1r have densities with respect to some reference measure, the Radon-Nikodym derivative is the ratio of their densities, which is how the importance weights are often written.) A standard Monte-Carlo quadrature argument, e.g., [Doucet et al., 2001], shows that if X Nk. k = 1 ... N, are in addition independent, i.e., X Nk. k = 1 ... N, is a sample from 1r, then the random measures /.LN converge to /.Lin the sense that E (If fd!LN J fd!LI') = 0 G) (3.10) for every continuous function f on V such that J If dnl2 dn < oo The subscript N was needed only to formulate properly the assumptions of the stochastic convergence (3.1 0). Since the theory of convergence as N oo is outside of the scope of this section, the subscript N in the ensemble members and in the weights is not needed any more and it will be dropped from now on. 3.1.2 Particle filters We first derive the particle filter [Doucet et al., 2001, Ch. 1] by considering the measure defined by the forecast weighted ensemble (X{, w) N = L w ( x) x1 f'V v k k = 1 ... N (3.11) k=l 26

PAGE 52

as the forecast distribution itself. Application of the Bayes' rule (3.1) then gives the analysis distribution r Jl.N (B) = .:..c) B::--------1 p (diu) (u) where the analysis weights are given by N L P (diX) w k = l X{EB N LP(diXf)w k = l (3.12) and the analysis ensemble members are unchanged from the forecast, Xk = x. This is the Sequential Importance Sampling (SIS) method. Alternatively, one arrives at the same SIS update formula (3.12) by considering weighted ensembles as Monte-Carlo representations of probability distributions. Suppose that p,f is the forecast distribution and, following (3.9), the forecast ensemble (xf, w) : = l represents p,f, xi rv 1/ k ) k= l, ... N. Consider an analysis ensemble (Xk, such that the measure approximates the analysis distribution p,a, given by (3.1 ), and suppose that the marginal distribution of the analysis ensemble is some probability distribution 1r, called a proposal distribution. Then Xf "'1r, ex: :a (Xf) = p (diXk) (Xf) k = 1 ... N. (3.13) 27

PAGE 53

While the data likelihood p (diXk') is assumed to be known and can be readily evaluated, the density of the forecast distribution with respect to the proposal distribution is in general not available, except in the particular case when the members of the analysis ensemble is taken the same as in the forecast ensemble. In that case, 1r = v, which gives dJlt (Xa) = dJlf (x') f d1f k dv k ex wk, k = l, ... N and, comparing with (3.11), we recover the SIS update (3.12). Since the regions where the forecast and the analysis distributions are concentrated in general differ, many of the analysis weights tend to be very small or even degenerate to numerical zeros. Since only members with large weights contribute to the accuracy of the representation of the probability distribution, the effective size of the ensemble shrinks. Over several analysis cycle, all the weight tends to become concentrated on a single member. This effect is known as .filter degeneracy. To avoid filter degeneracy, the SIS update is followed by a resampling. In the resulting method, called SIR or the bootstrap filter [Gordon and Smith, 1993], a new ensemble member Xf is obtained by selecting X{ with probability w ( This results in an ensemble with repeated members and all weights equal. Advancing in time by a stochastic model is then relied upon to spread the ensemble again. In [Kim et al., 2003, Xiong et al., 2006], the resampling is done by first estimating the density P a and then generating the new ensemble by random sampling from the estimated density. In [van Leeuwen, 2003], the use of Cauchy distribution, which has thicker tails that the normal distribution, is suggested to alleviate the degeneracy problem. 28

PAGE 54

3.1.3 Density estimation If (Xk):=l is a sample from a probability distribution f.-, which has density f with respect to the measure 1r and f is continuous at x, then the density can be approximated by the nonparametric kernel estimate f (x) _!_ l{k: Xk E B u (x, r)}l N 1r (Bu (x, r)) (3.14 ) where 1 is the counting measure and B u ( x, r) is the closed ball B u ( x, r) = { u E V : u x E U, II u x II u r} (3.15 ) with respect to a subspace U C V. The radius r is called the bandwidth. In finite dimension, Loftsgaarden and Quesenberry [ 1965] have shown that the density esti mate (3.14) converges in probability when 1r i s the Lebesgue measure, 11 u is the Euclidean norm, U = V, r is chosen as the distance to the k(N)-th nearest sample point to x and lim k (N) = +oo, lim k (N) = 0 N (3.16) For probability distributions on a Banach space V, the estimate (3.14) converges in mean square when lim r = 0 lim N1r (Bu (x, r)) = +oo, (3.17) U = V, and some other technical conditions are satisfied [Dabo-Niang, 2004]. 29

PAGE 55

3.2 Derivation of predictor-corrector filters 3.2.1 Motivation The Kalman filter is capable of making an arbitrarily large change in the state (called innovation in geosciences) in response to the data, because the KF computes the exact analysis probability distribution from Bayes' rule for the Gaussian case. The EnKF shares this advantage, because it is based on the same algebraic update formu las. In addition, the EnKF formulas can be used for arbitrary probability distributions. However, if the distributions are no longer Gaussian, the underlying assumptions of EnKF are no longer valid and there is no reason to expect that the EnKF should de liver ensembles that would describe the correct analysis distribution. Instead, the EnKF tends to replace the analysis distribution by a distribution closer to Gaussian and concentrated approximately in the same region of the state space. The SIS is able to approximate arbitrary distributions and converges to the correct distribution in the limit for large ensembles [Crisan, 2001], but it cannot relocate the ensemble members and thus make larger innovation than the spread of the ensemble. When a large change of state is required in the analysis step, SIS with a modestly sized ensemble fails, because it has few or no ensemble members where the posterior is concentrated. This fact is illustrated in Figure 3.1. The idea of the predictor-corrector filters is that it is possible to get proposal ensembles that can be expected to cover the posterior better by other means, such as the EnKF. The predictor phase is intended to take the forecast ensemble (X w) and produce the proposal ensemble, (Xk', w). In the corrector phase, the weights of the proposal must then must then be corrected by (3.12) to obtain the analysis ensemble 30

PAGE 56

2 100 X 1.5 10-10 0 XX 1 10-20 X X 0.5 10 -3o X 0_2 4 10-4o 1.5 1 0.5 f) 0 .5 1 (a) (b) exact prior ensemble prior exact posterior ensemble posterior mean 0 -0.13228 1.7978 0.75439 varian
PAGE 57

(Xk wk). However, the Radon-Nikodym density dj.t/ /dn is not directly available, so we replace it by density estimation. Recall that from section 3.1.2 that the proposal distribution 1r is given as the marginal distribution of the members of the analysis ensemble Xk. Motivated by the kernel estimate (3.14), we replace the ratio of the densities in (3.13) by the nonparametric estimate 1 l::e:llxf w{ dp/ (Xk) rv "N 1r (Bu (Xk, rk)) dn (Xk) rv 1 w{ (3.18) N 1r (Bu (Xk, rk)) This estimate uses only the concept of distance and it is inherently dimension independent. Motivated by (3.16), we choose the bandwith rk as the distance to the l N1 / 2 J -th nearest member Xff from Xk, measured in the I H u norm. 3.2.2 High-dimensional systems It remains to choose the norm llllu In the scalar case, every oorm a multiple of the absolute value. We are particularly interested in the case when the ensemble consists of discrete representations of random smooth functions, such as solutions of partial differential equations. Since the dimension of the state space can be very large when the mesh is fine, looking at the infinitely dimensional case provides a guidance to the choice of the norm 11 u in the density estimate kernel and brings an insight into the performance of EnKF and SIS as well. So, consider filters operating on an infinitely dimensional, separable Hibert space V The ensemble is initialized as a sample from an appropriate distribution of random functions. A well-known way [Evensen, 1994 Ruan and McLaughlin, 1998] to construct smooth random functions for the initial ensemble is from a complete 32

PAGE 58

orthonormal basis { IPn} in V and coefficients An > 0, < +oo, by 00 u = L dnAniPn, dn rv N (0, 1)' { dn} independent. (3.19) n=l The sum (3.19) defines a Gaussian random variable with values in V. In computations, of course, the sum is only finite and the space V is finite dimensional. Possible choices of { IPn} include a Fourier basis, such as the sine or cosine functions, or bred vectors [Kalnay, 2003]. In any case, with increasing n, the basis functions IPn are more oscillatory, and they contribute less to the sum since A n --+ 0. If An --+ 0 sufficiently fast.. the sum (3 .19) defines a random smooth function u. For example, for IPn (x) = sin nx, it is easy to see that u E Ck a .s if A n = 0 (n-l), > k + 1. The distribution of u is a Gaussian measure on V and are the eigenvalues of its covariance. We choose as the norm in the density estimate (3.18) 00 = L n=l "'n U = { u E V : < +oo} (3.20) where 0 < "'n :::; 1 and limn-+oo An/ "'n = 0. The reason for this choice is that for "'n = An, U becomes the Cameron-Martin space of the Gaussian measure, {), the initial ensemble was sampled from, and{) (U) = 0 [Da Prato, 2006, Prop. f.27]. A ball of measure zero does not bode well for the density estimate (3.14) and thus for (3.18); however, a ball in the V norm (all "'n = 1) is guaranteed to have a positive measure [Da Prato, 2006, Prop 1.25]. Choosing intermediate norms may allow norms more adapted to the probability distribution (and thus to the smoothness of the state), with a smaller ba11 possibly yielding a more accurate estimate. An important consequence of the expansion (3.19) is that the Gaussian initial state can be approximated by only a finite sum of the basis functions with arbitrary 33

PAGE 59

accuracy. In fact the sum of eigenvalues of the covariance of an y probability measure on a separable Hilbert converges as long as the covariance exists. For non-Gaussian distributions, the Karhunen-Loeve expansion [Loeve, 1977] provides a more general approximation So one can expect that for a state space of large finite dimension that approximates an infinitely dimen s ional separable space (such as the spaces where the solutions of partial differential equations are), the behavior of stochastic algorithms such the filters considered here will depend only on the nature of the infinitely dimensional probability distribution and it will not det e riorate in the limit for a large dimension of the state space. This is also the reason why we do not consider distributions where the eigenvalues of the covariance are bounded away from zero. The convergence of particle filters is known to deteriorate [Bengtsson et al., 2008, Snyder et al., 2008] for such state distributions. 3.2.3 Formulation of the method Combining EnKF, SIS and density estimation yields the following predictor corrector algorithm. EnKF is used to make a potentionally large change in state followed by SIS, which changes the weights to account for non-Gaussian distributions. Algorithm 3.1. Given forecast weighted ens e mbl e (x, w) N and observation k = 1 H X -d rv {), with f the density of{), R = Cov {), and E ( fJ) = 0 : 1. (Predictor) Compute the m e mbers X k of th e anal y sis ensembl e b y the EnKF, X k = X{+ K ( dkHX{), d k d"' {), (3.21) K = C(Xf,wf)HT (HC(Xf,wf)HT +R)-1 where C(X f wf) is the covariance of (X{, w ) following (3.8). 34

PAGE 60

2. (Corrector) Compute the anal y sis weights wk from the SIS update with density estimation L t llx' -xall < r w ex f ( H d) t k u k I wk optionally with resampling: Choose Y k as Xf with probability wf., k 1 ... N then set all X Z = Y k and all wk = 1 / N. The bandwidth r k is the distance to the l N112 J -th nearest member X f from x z measured in the llllu norm, which is chosen following (3.20) Note that if resampling is used, then the ensemble weights are all same, so standard EnKF applies and the extension to weighted EnKF is not needed. The resampiing generally duplicates some ensemble members However, even for a deterministic model, the random s election of the data vectors d k in the next analysis step will spread the ensemble again Finally the predictor corrector filter is not restricted to the present version of EnKF; other version of EnKF can be used just as well. 3 3 Numerica l Results To get an idea of why are predictor-corrector filters beneficial we will consider some situations where standard filtering techniques are unsuitable. Such conditions are frequently encountered when considering nonlinear problems or when it is technically unfeasible to use a sufficiently large ensemble to approximate the distribu tions. We will use predictor by weighted EnKF, and the resulting predictor-corrector filter will be called EnKF-SIS 35

PAGE 61

2 100 1.5 w 1 X x
PAGE 62

3.3.1 Bimodal Prior The first such situation we will consider is that of a bimodal prior. We construct a bimodal prior by first sampling from N (0, 2 .5). These samples are then weighted by the function representing the sum of two Gaussian distributions with mean .5 and variance 0.1. The resulting weighted ensemble can then be considered a weighted sample from the prior probability distribution function shown in Fig. 3.3 (a). The data likelihood is Gaussian, with the mean shifted slightly to the right. Each filter (EnKF, SIS, and EnKF-SIS) was applied to this problem with an en semble size of 100. The density ratio estimate (3.18) was then applied to the resulting posterior ensemble to obtain an approximate posterior probability distribution. The results for each method are given in Fig. 3.3. Because the EnKF technique assumes that all distributions are Gaussian, it is no surprise that it would fail to capture the non-Gaussian features of the posterior. Both SIS and EnKF-SIS were able to represent the nature of the posterior reasonably well. 37

PAGE 63

1 1 0.5 0.5 (a) Prior and data distributions 1 0.5 (c) Posterior from SI S I I I I I I I I 1 0 1 (b) Posterior from EnKF 1 I I I I I I I I I I 1 (d) Posterior from EnKF-SIS 2 2 Figure 3.3: In (a), the prior and data distributions are given by blue and red lines, respectively. The prior ensemble, with size 10, is displayed with blue *'s, where the position of each ensemble member on the x axis is its value and the position on the y axis is its weight. The data likelihood given each ensemble member is displayed as red o's, scaled so the likelihoods sum to 1. The prior distribution has two modes at .5 and is constructed by sampling each ensemble member, xk rv N(O 2.5) and assigning weights as the pdf of the sum of two Gaussian distributions, N(l.5, 0.1) evaluated at Xk The data distribution is N(O, 0 1). In (b)-(d), the posterior ensemble is shown with black x 's for EnKF, SIS, and EnKF-SIS, respectively. The exact pos terior density is shown as a dotted line, while the estimated posterior density from the ensemble calculated from a gaussian kernel with bandwidth 0.5 is shown as a solid black line. The EnKF alone cannot account for the bimodal nature of the posterior, while the EnKF-SIS method corrects this by applying weights which increase the ac curacy of the posterior estimation. In this scalar problem, the SIS method is able to perform reasonably well. 38

PAGE 64

3.3.2 Filtering for a stochastic ODE The results given above describe how each filtering technique applies to certain carefully designed synthetic distributions. It remains to be seen how these filters work when applied to an actual model. We have implemented a stochastic differential equation as used by Kim et al. [2003], du 3 -= 4u 4u + r;,TJ dt (3.22) where TJ(t) is white noise, namely an element with Gaussian distribution of zero mean and covariance E[TJ(t)TJ(t')] = b(tt'). The parameter r;, controls the magnitude of the stochastic term. The deterministic part of this differential equation is of the form du/ dt = -f'(u), where the potential f(u) = -2u2 + u4 cf., Fig 3.4a. The equilibria are given by f' ( u) = 0; there are two stable equilibria at u = and an unstable equi librium at u = 0. The stochastic term of the differential equation makes it possible for the state to flip from one stable equilibrium point to another; however, a sufficiently small r;, insures that such an event is rare (Fig 3.4b). The equation (3.22) is a simple model problem related to fire simulation that includes spatial diffusion; the two equilibria in (3.22) are related to the burning and not burning states. A suitable test for an ensemble filter will be determining if it can properly track the model as it transitions from one stable fixed point to the other. From the previous results, it is clear that EnKF will not be capable of describing the bimodal nature of 39

PAGE 65

1 0 1 0 1 u (a) 1 0 -1 0 0.5 (b) 1 t 1.5 2 Figure 3.4: (a) The deterministic potential curve, and (b) A solution of (3.22) switch ing between stable points. the distributions involved so it will not perform well when tracking the transition. When the initial ensemble is centered around one stable point, it is unlikely that some ensemble members would be close to the other stable point, so SIS will be even slower in tracking the transition [Kim et al., 2003]. The solution u of (3.22) is a random variable dependent on time, with probability density p(t, u). The evolution of the density in time is given by the Fokker-Planck equation [Miller et al., 1999], ap a 2 "'2 a2p = -[4u(u -l)p] + ---. at au 2 au2 (3.23) To obtain the exact (also called optimal) solution to the Bayesian filtering problem, up to a numerical truncation error, we have advanced the probability density of u between the Bayesian updates by solving the Fokker-Pianck equation (3.23) numerically on a uniform mesh from u = -3 to u = 3 with the step !:lu = 0.01, using the 40

PAGE 66

t 1 2 3 4 5 6 u 1.2 1.3 -0.1 -0.6 -1.4 -1.2 Table 3.1: Data used in assimilation into (3.22) MATLAB function pdepe. At the analysis time, we have multiplied the probability density of u by the data likelihood as in (2.1) and then scaled so that again J pdu = 1, using numerical quadrature by the trapezoidal rule. The data points (Table 3.1) were taken from one solution of this model, called a reference solution, which exhibits a switch at time t 1.3. The data error distri bution was normal with the variance taken to be 0.1 at each point. To advance the ensemble members and the reference solution, we have solved (3.22) by the explicit Euler method with a random perturbation from N(O, (b.t)1 / 2 ) added to the right hand side in every step [Higham, 2001]. The simulation was run for each method with ensemble size 100, and assimilation performed for each data point. There was no resampling in EnKF-SIS. Examining the results in Fig. 3.5, EnKF-SIS was able to approximate the mean of the optimal solution better than either SIS or EnKF alone in the three cycles following the data switching to the lower mode; however, the new method was unable to track the variance as accurately os the EnKF alone during these cycles. The EnKF-SIS appears to under-estimate the true covariance; this behavior could possibly be ame liorated this problem. EnKF provided the smoothest approximation; however, it is unable to track the data quickly as it switches. SIS suffers from a lot of noise as only a small number of ensemble members contribute to the posterior. In addition, SIS 41

PAGE 67

1.5 0 1 0.5 ;:! 0 0 -0.5 EnKF SIS -1 EnKF+SIS exact 0 data 1.5 0 0 1 2 3 4 5 6 7 (a) 100 w-1 0 c:; w 1 u w-2 "' a 0 E "[;j .!: > .... !: 0 .... t: w-2 0 w-3 0 t: 0 w-3 0 2 4 6 w-4 0 2 4 6 t (b) (c) Figure 3.5: In (a), an example of the applying the EnKF, SIS, and EnKF+SIS on (3.22) with the data given by circles. Displayed is the ensemble mean of a single realization and the mean of the optimal solution from (3.23) using an ensemble of size 100. In (b) and (c), the estimated mean square error in the ensemble mean and covariance are given for each method by averaging over 100 simulations. 42

PAGE 68

provides even worse tracking of the data than EnKF. The hybrid EnKF-SIS method provides the best tracking in this case, but it exhibits some noise in the solution, be cause the proposal and forecast distributions are far apart. This noise is caused by the ensemble degenerating to only a few solutions and is similar to that seen with SIS, but less severe. 43

PAGE 69

3.3.3 Filtering in high dimension Typical results for filtering in the space of functions on [0, 1r] of the form d u = L Cn sin ( nx) (3.24) n=l are in Figs. 3.6 and 3.7. The horizontal axis is the spatial coordinate x E [0, 1r]. The vertical axis is the value of u. The level of shading on each vertical line is the marginal density of u at a fixed x, computed from a histogram with 50 bins. The ensemble size was N = 100 and the dimension of the state space was d = 500 The Fourier coefficients were chosen An = n -3 to generate the initial ensemble from (3.19) (i.e., eigenvalues of the covariance are n -6 and ""n = n-2 for the norm in the density estimation. Fig. 3.6 again shows that EnKF cannot handle bimodal distribution. The prior was constructed by assimilating the data likelihood 1/2 if u( 1r / 4) and p(diu) = u(37rj4) E (-2,-1)U(1,2) 0 otherwise into a large initial ensemble (size 50000) and resampling to the obtain the forecast ensemble size 100 with a non-Gaussian density. Then the data likelihood u(1rj2)-0 .1,...., N(O, 1) was assimilated to obtain the analysis ensemble. 44

PAGE 70

Fig. 3.7 shows a failure of SIS. The prior ensemble sampled from Gaussian dis tribution with coefficients A n = n -3 using (3.19) and (3.24), and the data likelihood was u (rr / 2 ) 7""' N(O 1). We observe that while EnKF and EnKF-SIS create ensembles that are attracted to the point (rr / 2 7 ) SIS cannot reach so far because there are no such members in this relatively small ensemble of size 100. We have demonstrated the potential of predictor-corrector filters to perform a successful Bayesian update in the presence of non-Gaussian distributions large num ber of degrees of freedom, and large change of the state distribution. The new filter performs as well as EnKF on Gaussian distributions. We have also observed that on spaces of smooth functions, the particle filter itself often works quite well already, as long as the updates are not too large, even in very high dimension. This appears due to the fact that the smooth functions can be approximated well by a low-dimensional space regardless of the number of the state dimension. We have noted that due to general properties of probability distributions, this also holds for a large class of probability distribution on infinite dimensional vector spaces, in particular for models by partial differential equations. Open questions include convergence of the filters when applied to multiple up dates over time, mathematical convergence proofs for the density estimation and for the Bayesian update, and performance of the filters when applied to systems with a large number of different physical variables and modes, as is common in atmospheric models. 45

PAGE 71

4 -2 I. (a) Prior 5 (c) EnKF (d) SIS 1 L (b) Exact (e) EnKF-SIS L j Figure 3.6: In (a), a non-Gaussian prior distribution is applied to a data distribution with small likelihood. The exact posterior distribution is given in (b). The EnKF, SIS, and EnKF-SIS was applied to an ensemble of size 100. In (c), the EnKF does not approximate the exact posterior distribution well because it ignores the bimodal prior. The SIS given in (d) does approximate the posterior reasonably well because the data likelihood is large. In (e), the EnKF-SIS resembles the features of (d) by reducing the weights of the proposal ensemble members that pass though the center of the figure. 46

PAGE 72

I (a) Prior c (c) EnKF .. < .. 2 (d) SIS 1 (b) Exact (e) EnKF-SIS Figure 3.7: In (a), a Gaussian prior distribution is applied to a data distribution with small likelihood. The exact posterior distribution is given in (b). The EnKF, SIS, and EnKF-SIS was applied to an ensemble of size 100. In (c), the EnKF approximates the exact solution well because the prior was Gaussian; however, the SIS given in (d) was fairly degenerate and was not able to correct for the large shift in the distribu tion. In (e), the EnKF-SIS approximates the shape of the exact posterior well, while exhibiting far less degeneracy than (d). 47

PAGE 73

4. The EnKF in Practice 4.1 The curse of dimensionality The "curse of dimensionality" is the well known issue involving Monte Carlo sampling, where the number of samples necessary to estimate a high dimensional probability density must grow exponentially with the dimension of the system. The problems with the particle filter in high dimension discussed in Section 3 are directly related to this issue. In Bengtsson et al. [2008], the authors formalize this concept in a Gaussian setting by showing that, when the forecast covariance has a spectrum bounded away from zero, filter collapse will occur unless the size of the ensemble grows exponentially with the size of the model. The assumptions of this statement, however, are likely too strong to be of practical relevance. For instance, it is known that the covariance of a Gaussian measure on a Hilbert space has a bounded trace [Kuo, 1975]. Therefore, a Gaussian measure on an infinitely dimensional space can not have a covariance satisfying these conditions. In its limit, this problem can no longer be considered as Monte Carlo sampling of a Gaussian distribution. We would like to examine the behavior filtering error as the size of the model, n, increases; however, this requires some care because the distributions necessarily vary with n. For this, we will consider an infinitely dimensional, separable, Hilbert space, V, with a complete, orthonormal sequence, { vn} Denote V n C Vas thendimensional subspace spanned by v1 Vn. If we assume that J.L is a measure on (V, B(V) ), such that J.L(span{ vn}) > 0 for each n. We will consider "n-dimensional projections" of this measure, defined on (Vn, B(Vn)), as J.Ln(A) := J.L(A)/ J.L(Vn) for 48

PAGE 74

every A E B(V n) The question the dependence of filtering error on model dimension will be posed as follows: We are given 1-lf, the true forecast distribution on (V B (V)) and a given data set with known error density. For each n is the true analysis distribution resulting from the n -dimensional projection, of the forecast applied to the same data likelihood. We can apply a data assimilation algorithm to each which results in an approximation, of the true analysi s distribution, Following (3.7) we will denote QJ.L, the covariance of the measure, 1-l The error of the data assimilation algorithm for each n will be = where I I is the matrix 2-norm. In the numerical experiments that follow we will study the behavior of this error as a function of n Consider a Gaussian measure J-l with mean 0 and positive definite covariance Q w If we define {vi} an orthonormal basis of eigenvectors of QJ.L with A i their corresponding eigenvalues then a random element, X ,......, J-l, can be expressed in terms of this basis as a random sequence, { xi}, with diagonal covariance: Because a covariance is Gaussian if and only if it has a bounded trace, we can con struct a measure with Gaussian covariance over V out of an i.i d. sequence defined by (4.1) The following figures explore the error of the EnKF and SIR applied to forecast ensembles of n -dimensional random vectors sampled out of a Gaussian distribution with zero mean and diagonal covariance. We will explore this error for various eigen value structures : constant (-\ i = 1) inverse (-\ i = t ), and inverse square (,\ i = b). 49

PAGE 75

Notice that only in the last eigenvalue structure can the limiting case possibly be a Gaussian distribution, and only the first conforms to the assumptions discussed in Bengtsson et al. [2008]. The data used will be sampled out of an m -dimensional Gaussian distribution with zero mean and covariance 0.1!. We will consider a lin ear observation operator made up of independent random components sampled out of N(O 1). The true analysis covariance will be calculated from (2.12). In they-axis is the error in the analysis covariance, E (JQi'a QJJ.a J2 ) is approximated by averaging over 100 simulations. While the use of a random observation function may seem counter-intuitive, the intention is to describe the average filtering error for an arbitrary data assimilation satisfying the assumptions of the EnKF. Naturally, as the size of the system state changes, the observation operator must change as well. The use of a random H with independent components sampled out of N(O 1 ) is an ad-hoc method for accom plishing the desired result. Therefore, there is little theoretical justification for this choice. For completeness, Figures 4.5-4.8 repeat these experiments using a determin istic observation function defined as an m x n matrix H ii = b ii This form of H produces observations in the largest m directions of uncertainty of the forecast dis tribution. Figures 4.9 and 4.10 show the filtering error where the data size changes with the model size m = n and the observation operator is the identity, representing a direct observation of the system state. The behavior of the error as a function of model size is fairly consistent in all of the figures for both filtering methods. In particular, the error for constant >.i tend to rapidly increase with the model size This is consistent with the familiar curse of dimensionality. However, the error for >.i = f.x remain remarkably flat in all of the 50

PAGE 76

SIR, N = 10, m = 25 EnKF, N = 10, m = 25 102 102 101 .... 101 .... 0 0 t: t: 0 0 100 .... .... B <2 100
PAGE 77

S[R, N = 100, m = 25 EnKF, N = 100, m = 25 10 2 102 101 10 1 ... ... 0 0 t:: t:: 0 100 0 100 ... ... <2 <2 10 1 10-1 10-2 101 10 2 103 10-2 10 1 102 10 3 model s ize model size (a) (b) Figure 4.3: For constant inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 25 data points with a random observation matrix. SIR, N = 100, m = 250 EnKF, N = 100, m = 250 102 10 1 101 .... .... 100 g 0 t: 0 100 0 ... ... <2 <2 10-1 10 1 10-2 10 1 10 2 10 3 10-2 10 1 10 2 103 model s i ze model size (a) (b) Figure 4.4: For constant inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and EnKF (b) using 100 ensemble members and 250 data points with a random observation matrix. 52

PAGE 78

SIR, N = 10, m = 25 EnKF, N = 10, m = 25 103 100 102 ..... ..... 0 101 0 t: t: 0 0 10-1 ..... ..... B 100 ll
PAGE 79

SIR N = 100, m = 25 EnKF, N = 100, m = 25 10 2 102 10 1 101 ... ... 0 g 100 t: 0 100 0 ... ... 10-1 tC tC 10-1 10-2 10-2 10-3 ,,, 101 102 10 3 101 102 10 3 model size model s ize (a) (b) Figure 4.7: For constant inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and En.KF (b) using 100 ensemble members and 25 data points with ob servation matrix, H ii = 8ii. SIR N = 100, m = 250 EnKF, N = 100 m = 250 102 10 1 10-1 ... ... 0 100 -0 t: t: 0 0 ... ... 10-1 tC tC 10-2 10-3 10-3 101 10 2 103 10-4 101 102 103 model size mod e l size (a) (b) Figure 4.8: For constant inver e, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and En.KF (b) using 100 ensemble members and 250 data points with observation matrix, H ii = 8ii. 54

PAGE 80

SIR N = 10 EnKF, N = 10 .. 101 .. g g Q) Q) 10-1 .. .. M M tC 100 tC 101 102 103 101 102 103 model size model s ize (a) (b) Figure 4.9: For constant inver e and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and En.KF (b) using 10 ensemble members and an identity observation matrix. 102 101 .. 0 t:: Q) 10 0 ... M tC 10-1 10 2 101 SIR, N = 100 ' I 102 model size (a) EnKF, N = 100 101 102 103 model size (b) Figure 4.10: For constant inverse, and inverse square forecast covariance eigenvalue structures, the mean square covariance filtering error versus model size is displayed for SIR (a) and En.KF (b) using 100 ensemble members and an identity observation matrix. 55

PAGE 81

figures, suggesting a certain amount of independence between filter error and model size. The behavior for A i = t exhibits mixed results between that of the red and blue lines. As expected, increasing the number of samples tends to decrease the error. Also, the experiments present in all of the figures was repeated using the predictor corrector filter, EnKF-SIR, but the results were nearly identical to that of the EnKF alone and have been omitted. The analysis distributions are Gaussian by construction, so the corrections produced by SIR are of little consequence. We observe that the results for A i = fr are the only problems where the distribu tions can remain Gaussian in the limit. The results for A i = t lie on the boundary of such distributions in that covariances with eigenvalues decaying any faster are Gaus sian covariances in their limit. The results of these experiments provide evidence that the curse of dimensionality in the analysis covariance error simply does not oc cur when the limiting covariances satisfy the conditions of the EnKF (i.e. they are Gaussian) The evidence presented here present opportunities for further theoretical re se arch. 56

PAGE 82

4.2 Generating initial ensembles So far, we have assumed that an initial ensemble is given which is sampled out of a known forecast error distribution without mentioning how this is actually done. In atmospheric sciences, both the creation of such ensembles and producing initial distributions has been the focus of much research. Common techniques involve constructing covariances out of known spatial correlations and determining components with the largest variance directly from the computational model known as bred veetors. For our purposes, we will limit the construction of initial ensembles to Gaussian perturbations of a central model state, xi = Xo + i, Ci rv N(O, Q). Because the model states in our application represent smooth functions obtained from solutions of PDE's, we must take care to produce ensemble states that retain the properties of the model solutions. The most straightforward method of creating Gaussian perturbations would be to add a Gaussian random variable with independent components or equivalently a diagonal covariance. However, in the limit, such perturbations do not produce smooth functions. For simplicity, assume the model states are comprised of a single variable, which belongs to a space of smooth functions over the bounded domain, D = [-1r, 1r]. For this, we will consider Sobolev spaces, Ha:(D) for a 2: 0, which contain functions whose a-order derivative is in the space L2( D). If { is Fourier series of the element v E Ha:(D), then the norm of this space can be expressed as 00 llviiH" = L(l + n2)a:lvnl2 (4.2) n=O 57

PAGE 83

This tells us that the higher frequency components of a smooth function must de cay sufficiently rapidly for this sum to remain bounded In particular if the Fourier coefficient s decay as then the function is an element of Her( D ). We will u s e this as motivation for the following con s truction similar to that presented in Evensen [1994], of what we will call "random field s of smoothness a". If the domain D is discretized over a uniform mesh of s ize n with resolution h then we define a vector z k = ck + i dk, with ck, dk "' N ( O 1 ) The random field will be given by v(x ) = R e { -1z ei(k-l)x } L....J k l+cr k k = l which can be computed efficiently by taking the real part of the inverse fast Fourier In general, we can follow this procedure to produce perturbation s to each discrete spatial variables making up the state vector. For a 3-dimensional variable discretized over a uniform grid of dimen s ions n x x ny x n z with resolution h x x hy x h z the proces s i s s imilar. We create a random element in cnx Xny X n z with independent components s ampled out of N ( O 1 ) Each component, (ix, iy, i z ) of this random element is scaled by s omething proportional to (4.3) The proportionality constant s erves to make the s caling independent of the spatial unit s of the domain and will be considered to be a problem dependent parameter. The real part of the 3-dimensional inverse Fa s t Fourier Tran s form (FFT) of the scaled array is multiplied by another proportionality con s tant to obtain the random perturba58

PAGE 84

a=O a=1 a=2 3 1.4 1.4 2 1.3 1.2 1.2 1 1 1.1 0 0 20 4 0 60 8 0 100 0.80 20 4 0 6 0 8 0 100 20 4 0 6 0 8 0 100 ( a) ( b) ( c ) Figure 4.11: The effect of creating a smooth random field using the procedure out lined above. An i.i.d. complex vector of length 100 is sampled out of N(O, 1). The components of this vector are scaled according to 4.3, with a = 0 1 2. A larger a reduces the influence of higher frequency components, and the resulting function is smoother. tion to the variable. The proportionality constant again serves to make the perturbations independent of the variable s units and are treated as problem dependent. The effect of this procedure is illustrated in Figure 4.11 for various levels of smoothne s s, a = 0 1 2. The inverse FFT constructs functions that have periodic boundary conditions. If this behavior is not desired, one can use this procedure on a domain twice as large in each dimension and restrict the output to the desired size. In some applications, it is desirable to create random functions with 0 Direchlet boundary conditions. This is possible by modifying the array just before applying the inverse FFT in the following manner. Suppose zix,iy,iz is an array in the frequency domain. If we modify the fir s t component in each dimension according to then the inverse FFT of the re s ulting array will be 0 on the boundary. 59

PAGE 85

The preceding analysis provides a means to construct random perturbations to a state vector using smooth functions In addition forecast ensembles produced in thi s way satisfy the conditions presented in 4.1. We can expect that the filtering error produced from such ensembles to behave well as the state size grows. 4 3 Implementation The formulation of the EnKF as presented in (2.17) is written in form that can be easily understood; however, it is not efficient to compute it in this form. We have chosen to compute the ensemble update with the inverse formed by an application of the Sherman-Morri s onWoodbury formula [Hager 1989] (HC ( Xi)]{'+ Rr1 = ( R + N l ( H A) ( H A ) T r R -1[!-1 ( H A)(I+(H A)T R -1 1 ( HA)) -1( H A)T R -1 ] (4.4 ) N-1 N-1 This formula is advantageous when the data error covariance matrix R is of a special form such that left and right multiplications by R -1 can be computed inexpensively. In particular, when the data error s are uncorrelated, which is often the cas e in practice, the matrix R is diagonal. In this form, the N x N matrix W = I + ( H Af R -1 N 1 ( H A) 1 is inverted using the LAPACK subroutine dpos v which takes it s Cholesky de composition W = LLT and solves V +-w -1 V in the factored form using tri angular backs ubstitution. In this case the EnkF formula (2.17 ) with (4.4) costs 0 (N3 + mN2 + nN2 ) operation s which is suitable both for a large number n of the degrees of freedom and a large number m of data points. Als o (2.17) can be 60

PAGE 86

implemented without forming the linearized observation matrix H explicitly by only evaluating the observation function h using N N [HA]i = Hx(-H "Lxf = h (xf)'Lh (xf), (4.5) j = l j = l See Mandel et al. [2009a] for further details. The ensemble filter formulas are operations on full matrices, and they were implemented in a distributed parallel environment using MPI and ScaLAPACK. EnKF is naturally parallel: each ensemble member can be advanced in time independently. The linear algebra in the Bayesian update step links the ensemble members together. 4.4 Wildfire data assimilation A wildfire is a large and complex physical process driven by fine scale chemical reactions. The length scales involved range over several orders of magnitude, from millimeters for the smallest pieces of vegetation up to the entire burning region spanning many kilometers. From the heat generated, these fires are capable of producing large gusts of wind that can even affect the local weather patterns. In tum, the weather, from winds, moisture, and atmospheric temperature, are major components driving behavior of the fire. It is essential to take into account these effects when modeling the wildfire phenomena in the form of a two way coupled fire-atmospheric model. Many wildfire models have been developed over time that range from complex geophysical models, which attempt to replicate small scale features, to simple, ide alized models, designed only to approximate the large scale behavior. While the complex models tend produce more accurate simulations in reanalysis studies, they 61

PAGE 87

are generally far too computationally expensive for use in real time forecasting. Be cause we are interested in real time forecasting with ensemble data assimilation, it is essential that the model is capable of running faster than real time on a workstation class computer. In the research reported here, we will consider two idealized wildfire models de signed to mimic the large scale behavior of real wildfires. The first model [Mandel et at., 2005] is given as a convection-reation-diffusion PDE that describes the state in terms of the temperature of the burning and the fraction of unburned fuel. These vari ables, obtained from the solution of two coupled PDE's, exhibit a nonlinear, traveling wave, the profile and speed of which are determined to match experimental data. The second model [Mandel et al., 2009a] is semi-empirical, based on the assumption that fire propagation speed is normal to the fireline as a function of wind and terrain slope and assumes an exponential decay of fuel from the time of ignition. The fire propa gation speed and heat generated is given by experimental data generated for various fuels. It is distinguished by the lack of temperature or fire intensity model variables. Instead, the model state contains only information on the time of ignition, as well as the unburned fuel fraction. This model is designed to run on a relatively coarse grid in comparison to the PDE model, so it is more suitable for problem domains exceeding 1 km. 4.4.1 A PDE Wildfire Model The PDE wildfire model expresses the model state in terms of the temperature of the fire, U(x, y), and the fuel fraction, F( x, y) defined on a surface. The evolu tion of the model is derived from a heat balance equation with terms describing the transporting of heat through diffusion and convection. The heat balance also contains 62

PAGE 88

forcing terms representing cooling into the atmosphere and reactive heat input. The reaction rate, r(T), is a function of the temperature and is governed by a modified Arrhenius equation from chemical kinetics, where constants r0 and c,.. depend on the fuel properties and U a is the atmospheric temperature. The model is governed by the following coupled equations, { Cu = V (kVU) -Cv( V + ')'V z ) VU + C JF r(U)-Cc( U -U a ) (4.6) dF = Fr(U) dt with constants eu, k, Cv, c1 ')' and Cc and terrain height, z The constants have been determined in Mandel et al. [2005] so that the temperature profile (Figure 4.13) approximates the shape of a real fire. The wind velocity, v is taken as a known quantity in the standalone model, or it is determined from the output of the atmospheric model in the coupled model. Although this model is simple, its solutions exhibit the qualitative behaviors of combustion; however, stability of a numerical solution requires mesh resolutions on the order of 1m and a time step of about 0.01 s 4 4.2 A Semi-empirical Wildfire Model The semi-empirical fire propagation model discussed here imposes the fire spread rate directly from laboratory data, without consideration of the reaction intensity or shape of the fire profile. Instead, it assumes that the propagation direction is normal to the fireline and the speed is known. This model predicts the heat flux, (x, y) produced by the reaction, which is determined exclusively by the time passed since ignition, ti(x, y) and various fuel properties The equations governing this behavior 63

PAGE 89

are given in the following formula: a (x,y,t) = -A(x,y) atF(x,y,t), (4.7) where A(x, y) is a parameter determined from the fuel type and F(x, y, t) is the fuel remaining, which is assumed to decrease exponentially as it burns: { Fo (x, y) e-(t-ti(x,y))/W(x,y), if (x, y) E 0 (t), F(x, y, t)F0 ( x, y) otherwise, (4.8) where F0 ( x, y) is the initial fuel supply, W ( x, y) is the time constant of the fuel, and n defines the burning region. The propagation of the burning region, n, is modeled as a level set function, '1/;(x, y), which has the property that it is continuous and non-negative outside the burning region, n = {(x, y) : '1/;(x, y) :::; 0}. The fireline is defined as the boundary of the burning region which coincides with the zero contour of '1/J. The evolution of the level set function is governed by the following differential equation: 8'1/; at+ S (x, y) IIV'I/JII = 0, (4 .9 ) where the spread rate, S(x, y) of the fireline is modeled from the modified Rother-mel's formula [Clark et al., 2004, Rothermel, 1972], This empirical formula takes into account the wind, terrain slope, and fuel properties to replicate the observed behavior of real wildfires. The level set equation 4.9 is initialized so that its magnitude increases with the distance from the fireline. For a point ignition centered at (x, y) with radius r, the level function is initialized as '1/;(x, y) = l(x, y)(x, fi)lr. 64

PAGE 90

(a) (b) Figure 4.12: In (a), the fireline and level set function for a fire initialized by two thin lines and a small circle propagating in constant wind. The fire is in the area where the level set function is negative. The black contour is the fireline. In (b), the heat flux generated by the fire in (a). Figure 4.12 illustrates such a level set function for a fire propagating across the domain, and the corresponding heat flux. Finally, while the level set function does specify the burning region uniquely, the converse is not true; there is only a one to one relationship between nand the zero contour of '1/J. 4.4.3 Coupling fire and weather models Generally, a fire model is coupled with a computational fluid dynamics solver in order to approximate the complex relationship between the fire and the atmosphere. We have chosen to couple the level set propagation model with the Weather Research and Forecasting (WRF) code [WRF Working Group, 2005]. WRF, which is a standard for weather forecasting, supports distributed memory execution, and provides import and export of the state through binary files for data assimilation. 65

PAGE 91

The fire model takes as input the horizontal wind velocity 7!, and it outputs the heat flux into the atmosphere, 1>(x, y), given by (4.7). The heat flux is split into sensible heat flux (a transfer of heat between the surface and air due to the difference in temperature between them) and latent heat flux (the transfer of heat due to the phase change of water between liquid and gas) in the proportion given by the fuel type and its moisture. Since the fire mesh is generally finer than the atmospheric mesh, the wind is interpolated to the nodes of the fire mesh, and the heat flux is aggregated over the cells of the fire mesh that make up one cell of the atmospheric mesh. The ratio of the fire mesh resolution to the atmospheric mesh resolution is called mesh refinement ratio. In the numerical results in Section 6.2, the atmospheric mesh has dimensions 42 x 42 x 41 over a 2.5 x 2.5 x 1.5 km3 domain and the fire mesh as dimensions 420 x 420, for a refinement ratio of 10 : 1. The time step used for these simulations was 250 ms. The dimension of the combined state vector used for data assimila tion was about 1.4 x 108 For a problem of this scale, the model runs a 1 minute simulation in approximately 4 minutes on a 1.4 GHz Intel Xeon processor. Using the parallel processing capabilities, the model spread over 4 processes can be run in approximately real time. For the most part, the processing time is dominated by the atmospheric solver rather than the wildfire model. Increasing the refinement ratio will reduce the computational burden significantly at the cost of increased error in interpolating the wind onto the fire mesh. 4.4.4 Wildfire Data We want to model the location of the burning region as it moves through the do main. The data considered here will be assumed to be an observation of the heat flux 66


Figure 4.13: From Mandel et al. [2008], the temperature profile of a combustion wave traveling over a fixed point. The modeled temperature profile, given by the black line, was optimized to approximate the dotted line, representing real data obtained from Kremens et al. [2003].

4.4.4 Wildfire Data

We want to model the location of the burning region as it moves through the domain. The data considered here will be assumed to be an observation of the heat flux throughout the domain, as would be obtained by an aerial infrared image. Such images can be processed to determine the approximate location of the fire front [Ononye et al., 2007]. For simplicity, we will assume that the data is a direct observation of one of the model state variables, such as the heat flux. While this is not a realistic assumption, the data can be preprocessed to interpolate the image data to a subset of the model grid. Extensions that allow the use of more complicated observation functions will be the focus of future research. Wildfire models tend to develop into traveling waves with a sharp leading edge followed by an exponentially decaying cool-down [Mandel et al., 2008]. Under uniform fuel and wind conditions, the shape of the combustion wave (Figure 4.13) remains relatively constant; however, small errors in the speed of propagation will eventually result in large errors in its location over time.


Figure 4.14: The estimated point-wise probability density of the temperature near the fireline for an ensemble generated by perturbing a known temperature profile in space by a Gaussian random variable. The density has peaks centered around burning and non-burning regions and is highly non-Gaussian.

If we create an ensemble by moving a known temperature profile in space, where the magnitude of the perturbation is a Gaussian random variable, then the point-wise error distributions of the model variables tend to develop highly non-Gaussian features. Figure 4.14 shows the estimated density of the temperature at a single point in the domain for an ensemble constructed in this manner. Applying additive corrections to the state variables as discussed in Section 4.2 simply cannot replicate this highly non-Gaussian behavior.

4.4.5 Spatially Perturbed Ensembles

For the remainder of this section, we will limit our attention to a highly simplified problem, in which the fuel supply and wind are uniform over the problem domain. The ensemble members will be initialized as a point ignition originating at a location determined from a Gaussian random variable. The ensemble members will then be advanced in time using the fire model described in Mandel et al. [2009a]. The resulting ensemble members will all have approximately the same shape, shifted in space according to the perturbation of the ignition.


The data will be a direct observation of one of the state variables taken from an independently initialized model state representing the "truth". In this case, we want the analysis ensemble to consist of the same temperature profiles shifted toward the data with a slightly reduced spatial spread amongst them. In Figures 4.15 and 4.18, forecast ensembles of size 10 are shown as black lines superimposed on each other. Each ensemble member is shifted in the domain with a variance of 200 m. The data is given as a dotted red line. In Figure 4.15a, the data is located in the mean position of the ensemble members, and in Figure 4.18a the data is shifted to the left by 400 m. In Figures 4.15b and 4.18b, the point-wise ensemble variance is given as a black line. In each case, the EnKF as given in (4.5) is applied with a diagonal data covariance containing 100 K^2 components. The resulting analysis ensembles and variances are given in Figures 4.16 and 4.19. In addition, the same ensemble is applied to a B-localized EnKF, where the forecast covariance is tapered by (2.25), with a localization radius of 20 m, chosen to be approximately half the width of the temperature profile. The resulting analysis ensembles are given in Figures 4.17 and 4.20.

The results of these simple tests show that the EnKF, with and without localization, does not in general produce analysis ensembles exhibiting the behavior that we want. The analysis ensemble which most closely resembles the data is obtained with the localized EnKF in Figure 4.17. However, the ensemble members contain spurious features outside of the main ignition region. In terms of the location of the reaction front, even this ensemble has essentially degenerated to a single location. When the data does not happen to be near an ensemble member, both methods fail to produce anything resembling a valid temperature profile.
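The degeneracy described above follows directly from the EnKF forming linear combinations of forecast members. A few lines suffice to demonstrate it on idealized profiles; all profile parameters below are made up for illustration.

```python
import numpy as np

x = np.linspace(0.0, 2000.0, 1001)           # 1-D domain in meters

def profile(center, width=50.0, peak=900.0, ambient=300.0):
    """Idealized bell-shaped temperature profile (illustrative only)."""
    return ambient + peak * np.exp(-((x - center) / width) ** 2)

u1 = profile(900.0)                           # member shifted left
u2 = profile(1100.0)                          # member shifted right
blend = 0.5 * (u1 + u2)                       # EnKF-style linear combination

# The blend has two local maxima of reduced amplitude instead of one
# profile centered at 1000 m, which is what we would want.
interior = (blend[1:-1] > blend[:-2]) & (blend[1:-1] > blend[2:])
print(x[1:-1][interior])                      # approximately [900., 1100.]
```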


Figure 4.15: In (a), a forecast ensemble of size 10 containing temperature profiles shifted in space with a variance of 200 m. The data profile, given as a dotted red line, is located in the center of the ensemble members. In (b), the point-wise ensemble variance calculated from this ensemble.

Figure 4.16: The analysis ensemble resulting from applying the EnKF to the forecast ensemble and data in Figure 4.15. The data variance used was 100 K^2. The EnKF was unable to track the data accurately, and the analysis ensemble has essentially degenerated to a single solution, with variance reduced by several orders of magnitude.


Figure 4.17: The analysis ensemble resulting from applying the B-localized EnKF to the forecast ensemble and data in Figure 4.15. The data variance used was 100 K^2 and the localization radius was 20 m. The localized EnKF was able to create a posterior ensemble which tracks the data accurately on the reaction front; however, there are many spurious features outside of this region. In addition, the localization has reduced the degeneracy of the analysis ensemble compared to the EnKF without localization.

Figure 4.18: In (a), a forecast ensemble of size 10 containing temperature profiles shifted in space with a variance of 200 m. The data profile, given as a dotted red line, is located 400 m to the left of the center of the ensemble members. In (b), the point-wise ensemble variance calculated from this ensemble.


Figure 4.19: The analysis ensemble resulting from applying the EnKF to the forecast ensemble and data in Figure 4.18. The data variance used was 100 K^2. In this case, the analysis ensemble does not resemble the data at all because the forecast ensemble contained no members nearby.

Figure 4.20: The analysis ensemble resulting from applying the B-localized EnKF to the forecast ensemble and data in Figure 4.18. The data variance used was 100 K^2 and the localization radius was 20 m. The results here are similar to those of the non-localized EnKF; the analysis ensemble was unable to track the data because no forecast ensemble members were nearby.


These experiments were specifically designed to eliminate many of the complexities that would be involved in applying the EnKF to a real-world problem. In a real 2-dimensional environment, the location of the fire cannot be described by a scalar position, and the intensity and shape of the front can vary as well. Because the standard filters failed to produce the kind of results that we want, even with these simple tests, we need to examine the inherent difficulties involved. We can offer several plausible explanations for why the results were unsuccessful. First, the nonlinearity of the model itself produces highly non-Gaussian forecast distributions. The EnKF represents the forecast distribution only in terms of its mean and covariance, so it cannot "see" the higher order features of the true distribution. Second, the EnKF forms analysis ensembles out of linear combinations of the forecast ensemble. Taking a linear combination of two spatially disjoint reaction fronts results in two small reaction fronts rather than a single one between the original two, as we might want. While localization eases this restriction, the results still exhibit the same behavior.

The issues presented here underscore a fundamental problem with employing a standard approach to this data assimilation problem. As it has been posed in this example, this is essentially a scalar problem. The only true unknown is the position of the fire. The representation of the problem in terms of a function (the temperature) casts it in an unnecessarily complicated high-dimensional fashion. A far more reasonable representation of the ensemble would contain only a single, scalar state variable: the position of the fire. If we also represented the data in this way, the standard assumptions of the EnKF are met: the model is linear because the position of the fire moves at a constant speed, the observation function is the identity, and the ensemble was initialized from i.i.d. Gaussian random variables.


Clearly, real-world applications of wildfire data assimilation cannot be cast in such simplistic terms, but we will use the concept of transforming the ensemble into some alternate form which describes its position in space as a motivation for the morphing EnKF described in Section 6.


5. Image Morphing and Registration

In this section we build the tools from image processing that we will use for the morphing EnKF later in Section 6.1. The registration problem in image processing is to find a spatial mapping that turns two given images into each other [Brown, 1992]. Classical registration requires the user to manually pick points in the two images that should be mapped onto each other, and the registration mapping is interpolated from that. Here we are interested in a fully automatic registration procedure that does not require any user input. The specific features or objects, such as fire fronts or hurricane vortices, do not need to be specified. The method takes as input only the pixel values in an image (i.e., gridded arrays). Of course, for the method to work well, the images being registered should be sufficiently similar.

5.1 Preliminaries

5.1.1 Image Registration

Consider, for example, two grayscale images with intensities $u$ and $v$ given as functions on some domain $D$ (such as a rectangle in the plane, $\mathbb{R}^2$). For simplicity, assume that both $u$ and $v$ are equal to some constant at and near the boundary of the domain $D$, or at least can be extended beyond $D$ in a physically meaningful way. In our application, $u$ and $v$ will be heat flux fields from two states of our wildfire model, the fire will be inside the domain, and the heat flux near the boundary of the domain $D$ will be zero. In image processing, $u$ and $v$ can be the darkness levels of two photographs of objects with the same solid background. The functions $u$ and $v$ will also be referred to as images in the following description.


We seek a mapping $T$ from $D \subset \mathbb{R}^2$ to $\mathbb{R}^2$,
\[ T : (x,y) \mapsto (T_x(x,y),\, T_y(x,y)), \qquad (5.1) \]
such that $v \approx u \circ (I+T)$ on $D$ or, equivalently,
\[ v - u \circ (I+T) \approx 0 \text{ on } D. \qquad (5.2) \]
The mapping $I+T$ will be called the registration mapping, and the mapping $T$ will be called the warping. The reason for writing the registration mapping as $I+T$ is that the zero warping $T = 0$ is the neutral element of the operation of addition, and so linear combinations of warpings have a meaningful interpretation as blends of the warpings. This will be important in the development of the morphing EnKF. To avoid unnecessarily large and complicated warpings, the warping $T$ should also be as close to zero and as smooth as possible,
\[ T \approx 0, \qquad \nabla T \approx 0, \qquad (5.3) \]
where $\nabla T$ denotes the matrix of the first derivatives (the Jacobian matrix) of $T$,
\[ \nabla T = \begin{pmatrix} \dfrac{\partial T_x}{\partial x} & \dfrac{\partial T_x}{\partial y} \\[1ex] \dfrac{\partial T_y}{\partial x} & \dfrac{\partial T_y}{\partial y} \end{pmatrix}. \]
In addition, we require that the registration mapping $I+T$ is one-to-one, so the inverse $(I+T)^{-1}$ exists. However, we do not require that the values of $I+T$ are always in $D$ or that the inverse $(I+T)^{-1}$ is defined on all of $D$, so $u \circ (I+T)$ and $u \circ (I+T)^{-1}$ may not be defined on all of $D$. Therefore, we consider all functions $u$, $v$, $u \circ (I+T)$, $u \circ (I+T)^{-1}$, etc., extended to the whole of $\mathbb{R}^2$ by the constant value of $u$ and $v$ on the boundary of $D$.


Figure 5.1: Shown in (a) and (b) are contour plots of the x and y components of an example morphing function, T. Figure (c) shows a quiver plot of T indicating how an image would be warped when T is applied to it.


Figure 5.2: Shown in (a) is an image u made up of concentric circles. The morphing function given in Fig. 5.1 is applied to this image, giving u o (I + T). The resulting warped image is shown in (b).

5.1.2 Morphing

Once the registration mapping $I+T$ is found, one can construct intermediate functions $u_\lambda$ between $u_0$ and $u_1$ by morphing (Fig. 5.4),
\[ u_\lambda = (u + \lambda r) \circ (I + \lambda T), \qquad 0 \le \lambda \le 1, \qquad (5.4) \]
where
\[ r = v \circ (I+T)^{-1} - u. \qquad (5.5) \]
The vector $r$ will be called the registration residual; it is easy to see that $r$ is linked to the approximation in (5.2) by
\[ r = \left( v - u \circ (I+T) \right) \circ (I+T)^{-1}, \]


thus (5.2) also implies that $r \approx 0$. The original functions $u$ and $v$ are recovered by choosing $\lambda = 0$ and $\lambda = 1$ in (5.4), respectively,
\[ u_0 = u \circ I = u, \qquad (5.6) \]
\[ u_1 = (u + r) \circ (I+T) = \left( u + v \circ (I+T)^{-1} - u \right) \circ (I+T) = v \circ (I+T)^{-1} \circ (I+T) = v. \qquad (5.7) \]

Remark 5.1. In the registration and morphing literature, the residual is often neglected. Then the morphed function is given simply by the transformation of the argument, $u \circ (I + \lambda T)$; however, this formula does not recover the image $v$ when $\lambda = 1$. The simplest way to account for the residual might be to add a correction term to $u_\lambda$. This gives the morphing formula
\[ \tilde{u}_\lambda = u \circ (I + \lambda T) + \lambda \left( v - u \circ (I+T) \right), \qquad 0 \le \lambda \le 1, \qquad (5.8) \]
which is much easier to use because it does not require the inverse $(I+T)^{-1}$ like (5.4). The formula (5.8) also recovers $u = \tilde{u}_0$ and $v = \tilde{u}_1$, but, in our computations, we have found it unsatisfactory for tracking features and therefore we do not use it. The reason is that when the residual is not negligibly small, the intermediate functions $\tilde{u}_\lambda$ will have a spurious feature in the fixed location where the residual is large, cf. Figures 5.3 and 5.4. On the other hand, the more expensive morphing formula (5.4) moves the contribution to the change in amplitude along with the change of the position.



To understand why this happens, consider a simple example in one dimension, where we have some function $\varphi(x)$ and set $u(x) = \varphi(x)$, $v(x) = 2\varphi(x+1)$, and $T(x) = 1$. In this scenario, we have
\[ r(x) = v\left( (I+T)^{-1}(x) \right) - u(x) = v(x-1) - u(x) = 2\varphi(x) - \varphi(x) = \varphi(x) \qquad (5.9) \]
from (5.5), so that
\[ u_\lambda(x) = (u + \lambda r)(x + \lambda) = (1 + \lambda)\,\varphi(x + \lambda); \qquad (5.10) \]
however, if instead we used the form of (5.8), then
\[ \tilde{u}_\lambda(x) = u(x + \lambda) + \lambda\left( v(x) - u(x+1) \right) = \varphi(x + \lambda) + \lambda\left( 2\varphi(x+1) - \varphi(x+1) \right) = \varphi(x + \lambda) + \lambda\,\varphi(x+1). \qquad (5.11) \]
Suppose that $\varphi(x)$ is equal to 1 at 0 and 0 everywhere else (as in the Kronecker $\delta_0(x)$); then for $\lambda \in (0,1)$, $u_\lambda$ has one non-zero value, at $x = -\lambda$, but $\tilde{u}_\lambda$ has two non-zero values, at $x = -\lambda$ and $x = -1$. Similarly, if $\varphi$ contains a single peak, then $\tilde{u}_\lambda$ will contain two peaks, one moving with $\lambda$ and the other stationary. This example is illustrated in Figure 5.3. Figure 5.4 shows the same spurious feature appearing in our application, a 2-D fire simulation.
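The example of Remark 5.1 can be verified numerically. The sketch below evaluates both (5.4) and (5.8) on a grid for a hat function $\varphi$; the grid resolution and the use of linear interpolation are incidental assumptions of ours.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)                  # 1-D grid, spacing 0.01
phi = np.maximum(0.0, 1.0 - np.abs(x) / 0.2)     # hat function centered at 0
u = phi
v = 2.0 * np.maximum(0.0, 1.0 - np.abs(x + 1.0) / 0.2)  # v(x) = 2*phi(x+1)
T = 1.0                                          # constant warp, (I+T)x = x+1
r = np.interp(x - T, x, v) - u                   # r = v o (I+T)^{-1} - u, (5.5)

lam = 0.5
u_lam = np.interp(x + lam * T, x, u + lam * r)   # (5.4): one moving peak
u_tilde = (np.interp(x + lam * T, x, u)          # (5.8): moving peak ...
           + lam * (v - np.interp(x + T, x, u))) # ... plus a stationary one

print(x[u_lam.argmax()])      # single peak near x = -0.5
print((u_tilde > 0.2).sum())  # nonzero in two disjoint regions (-0.5 and -1)
```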


Figure 5.3: The example given in Remark 5.1 is illustrated for $\varphi(x) = u(x)$, a hat function centered at 0, displayed in blue, and $v(x) = 2\varphi(x+1)$, displayed in red. In (a) and (b), intermediate images formed following (5.10), which uses the inverse morphing function, and (5.11), which is commonly used in applications, respectively, are displayed in black. The common method produces two features: $\varphi(x+\lambda)$, which is a translation of $u$ centered at $-\lambda$, and $\lambda\varphi(x+1)$, which grows with $\lambda$ but remains centered at $-1$. The method requiring the inverse morphing function produces a single peak that translates and grows simultaneously. This is a result of the form of the residual in (5.5), which moves in conjunction with $u$.


Figure 5.4: An example of the intermediate states of the morphing transformation applied to the heat flux emitted from two different fires. The morphing transformation is taken for both of the images, and linear combinations of the two are formed, representing intermediate states in the morphing representation. The resulting states are transformed back into the standard representation. The two alternative formulations for the residual image yield different intermediate states. In the given figures, the top and bottom images are the two original firelines. The intermediate states for the standard form of the residual (5.8) are given on the right and those of our representation (5.5) on the left. The standard method produces spurious features that remain centered on the location of the bottom fire. In our method, the large components of the residual move with the morphed image, so no spurious features are present.


5.2 An Automatic Registration Procedure

The formulation of registration as (5.2)-(5.3) naturally leads to a construction of the mapping $T$ by optimization. So, suppose we are given $u$ and $v$ and wish to find a warping $T$ that is an approximate solution of
\[ J(T, u, v) = \|v - u \circ (I+T)\| + C_1 \|T\| + C_2 \|\nabla T\| \to \min_T, \qquad (5.12) \]
where the norms are chosen as
\[ \|v - u \circ (I+T)\|^2 = \int_D |v - u \circ (I+T)|^2 \, dx\, dy, \qquad (5.13) \]
\[ \|T\|^2 = \int_D |T_x|^2 + |T_y|^2 \, dx\, dy, \qquad (5.14) \]
\[ \|\nabla T\|^2 = \int_D \left| \frac{\partial T_x}{\partial x} \right|^2 + \left| \frac{\partial T_x}{\partial y} \right|^2 + \left| \frac{\partial T_y}{\partial x} \right|^2 + \left| \frac{\partial T_y}{\partial y} \right|^2 dx\, dy. \qquad (5.15) \]

The optimization formulation tries to balance the conflicting objectives of good approximation by the registered image and as small and smooth a warping as possible. The objective function $J(T,u,v)$ is in general not a convex function of $T$, and so there are many local minima. For example, a local minimum may occur when some small part of $u$ and $v$ matches while the overall match is still not very good. To solve this minimization problem, we have used an algorithm based on that described by Gao and Sederberg [1998], with several modifications. This algorithm differs from other automated image registration techniques in that $I+T$ is guaranteed to be invertible by construction. We use a similar construction here; however, because the invertibility explicitly relies on their use of bilinear interpolation, we can no longer make that guarantee. Instead, we attempt to force invertibility through a weak constraint in the objective function (5.12) by way of the norm on the gradient of $T$. We will also impose an explicit constraint based on the following condition on the partial derivatives of the morphing function $T$:
\[ \frac{\partial T_x}{\partial x}(x,y) > -1 \quad \text{and} \quad \frac{\partial T_y}{\partial y}(x,y) > -1 \qquad (5.16) \]
for each $(x,y) \in D$. When this condition is satisfied, $I+T$ is strictly monotonically increasing in each component and, thus, invertible.
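For concreteness, a discrete version of the objective (5.12)-(5.15) might be evaluated as follows, using piecewise constant quadrature and finite differences as described later in this section. The interpolation routine and the constant extension beyond the grid are our choices for the sketch, not a prescription.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def objective(T, u, v, C1, C2, h):
    """Discrete J(T,u,v) from (5.12) on a uniform grid with spacing h.
    T has shape (2, ny, nx): components (Tx, Ty)."""
    ny, nx = u.shape
    ys, xs = np.arange(ny) * h, np.arange(nx) * h
    X, Y = np.meshgrid(xs, ys)
    # residual term (5.13): v - u o (I + T), u evaluated by interpolation,
    # extended by a constant value outside the grid (an assumption here)
    interp_u = RegularGridInterpolator((ys, xs), u, bounds_error=False,
                                       fill_value=u[0, 0])
    warped = interp_u(np.stack([Y + T[1], X + T[0]], axis=-1))
    res2 = np.sum((v - warped) ** 2) * h * h
    # size term (5.14) and smoothness term (5.15) by finite differences
    size2 = np.sum(T ** 2) * h * h
    gy_x, gx_x = np.gradient(T[0], h)       # partials of Tx
    gy_y, gx_y = np.gradient(T[1], h)       # partials of Ty
    grad2 = np.sum(gx_x**2 + gy_x**2 + gx_y**2 + gy_y**2) * h * h
    return np.sqrt(res2) + C1 * np.sqrt(size2) + C2 * np.sqrt(grad2)
```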


Figure 5.5: The physical domain (a) with grid resolutions dx and dy is scaled to the normalized domain (b) on a square grid, D = [0,1] x [0,1]. The change in aspect ratio must be accounted for in the algorithm using a parameter representing the anisotropic scaling of the axes.

5.2.1 Grids and Sub-domains

In our application, the domain $D$ is a rectangle, discretized by a uniform $n_x \times n_y$ pixel grid with $n = n_x n_y$ nodes. For simplicity, the domain will be represented in a normalized coordinate system, where $D$ is mapped by an affine transformation onto $[0,1] \times [0,1]$. In this coordinate system, the $(i,j)$th node has coordinates $\left( \frac{i-1}{n_x-1}, \frac{j-1}{n_y-1} \right)$ (Figure 5.5b). Unless otherwise noted, the domain and coordinate system used will be assumed to be in normalized form. We will also split the normalized domain into increasingly refined sub-domains $D^\ell_{i,j}$, called the $(i,j)$th sub-domain at level $\ell$.


The indices $i$ and $j$ take on the fractional values $\{1, 3/2, 2, \ldots, 2^\ell - 1/2, 2^\ell\}$. The refinement level $\ell$ ranges from 0 to some predetermined maximum $L$. The 0th level contains only one sub-domain, namely $D^0_{1,1} = D$. At the first level, $D$ is split into 4 standard sub-domains with integer indices, $D^1_{1,1} = [0,\tfrac12] \times [0,\tfrac12]$, $D^1_{1,2} = [0,\tfrac12] \times [\tfrac12,1]$, $D^1_{2,1} = [\tfrac12,1] \times [0,\tfrac12]$, and $D^1_{2,2} = [\tfrac12,1] \times [\tfrac12,1]$, and 5 offset sub-domains with fractional indices, $D^1_{3/2,1} = [\tfrac14,\tfrac34] \times [0,\tfrac12]$, $D^1_{1,3/2} = [0,\tfrac12] \times [\tfrac14,\tfrac34]$, $D^1_{3/2,3/2} = [\tfrac14,\tfrac34] \times [\tfrac14,\tfrac34]$, $D^1_{2,3/2} = [\tfrac12,1] \times [\tfrac14,\tfrac34]$, and $D^1_{3/2,2} = [\tfrac14,\tfrac34] \times [\tfrac12,1]$. These are illustrated in Figure 5.6. The definition of each sub-domain can be expressed in the following explicit formula:
\[ D^\ell_{i,j} = \left[ \frac{i-1}{2^\ell}, \frac{i}{2^\ell} \right] \times \left[ \frac{j-1}{2^\ell}, \frac{j}{2^\ell} \right] \quad \text{for each } i, j = 1, \tfrac32, 2, \ldots, 2^\ell, \]
so that each level contains $(2^{\ell+1} - 1) \times (2^{\ell+1} - 1)$ sub-domains in total. The sub-domains themselves do not have their own grid. Instead, they share the pixel grid. When we refer to an image or morphing function restricted to a sub-domain, it is defined on the subset of pixel nodes located inside the sub-domain. In general, the pixel grid will not be aligned with the sub-domains, nor will each sub-domain at a given level contain the same number of pixel nodes.
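The hierarchy is straightforward to enumerate in code; the helper below is a direct transcription of the explicit formula above, with names of our choosing.

```python
from fractions import Fraction

def subdomain(level, i, j):
    """Bounds of D^level_{i,j} in [0,1]^2; i and j may be half-integers,
    e.g. Fraction(3, 2), giving the staggered (offset) sub-domains."""
    w = Fraction(1, 2**level)                 # sub-domain width 2^{-level}
    return ((i - 1) * w, i * w), ((j - 1) * w, j * w)

def all_indices(level):
    """All (2^{level+1} - 1)^2 index pairs from {1, 3/2, ..., 2^level}."""
    vals = [1 + Fraction(k, 2) for k in range(2**(level + 1) - 1)]
    return [(i, j) for i in vals for j in vals]

print(subdomain(1, Fraction(3, 2), 1))    # ((1/4, 3/4), (0, 1/2))
assert len(all_indices(2)) == (2**3 - 1) ** 2
```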


Figure 5.6: The original domain D is split into 9 different sub-domains of size 1/2 x 1/2 at refinement level 1. The sub-domains at this level are indexed by $D^1_{i,j}$ with $i, j \in \{1, 3/2, 2\}$. Fractional indices indicate that the sub-domain is offset from the global domain's edge by a distance of half of the sub-domain's width in its respective axis. Further refinement levels are decomposed similarly.

Also, while the number of sub-domains grows exponentially with each level as $4^\ell$, the number of pixel nodes in each sub-domain decays exponentially as $4^{-\ell}$. Denote by $T|_{D^\ell_{ij}}$ the gridded array $T$ restricted to the domain $D^\ell_{ij}$. We will also use the notation $(x^\ell_{ij}, y^\ell_{ij})$ for the center point of the sub-domain $D^\ell_{ij}$. When we evaluate a morphed image $u' = u \circ (I+T)$, we take the value of the morphing function at each pixel grid point and evaluate the original image by interpolation at the morphed location, $u'(x_i, y_j) = u \circ (I+T)(x_i, y_j)$. In this way, the original image and the morphed image are defined on the same grid. Similarly, $u \circ (I+T)|_{D^\ell_{ij}}$ will be the morphed image restricted to the sub-domain $D^\ell_{ij}$. When evaluating a morphed image, it is necessary to obtain its values away from the pixel grid by interpolation.


Two specific methods of interpolating the images will be considered here: bicubic spline and bilinear. While bilinear interpolation is significantly faster, morphed images created this way can exhibit noticeable kinks due to the general loss of differentiability of the interpolant. Thus, we will evaluate the morphed images using bicubic spline interpolation.

In order to reduce the chance that the minimization gets stuck in a local minimum, the method proceeds by building up $T$ on the nested hierarchy of sub-domains $D^\ell_{ij}$. It begins with some initial guess $T = \bar{T}$ of the optimal solution, which could be $\bar{T} = 0$ if none is known. This initial solution is then repeatedly refined on each sub-domain at each refinement level by correcting the global solution inside each sub-domain. The corrections to the global solution come in the form of shape functions with support inside their respective sub-domains. The role of this refinement procedure is to align increasingly smaller features of the images, while limiting the dimension of the minimization problem on each sub-domain to a small subspace. This allows the registration procedure to align fine details in the images, while avoiding the issues inherent in large nonlinear minimization problems. The refinement continues until either the maximum number of refinement levels has been reached, $\ell = L$, or a sufficiently accurate solution has been found. In the computation, (5.13), (5.14), and (5.15) are integrated numerically on each of the sub-domains $D^\ell_{ij}$, with the derivatives approximated by finite differences and the integrals approximated by piecewise constant quadrature. Denote the objective function restricted to the grid $D^\ell_{ij}$ by $J^\ell_{ij}(T, u, v)$. On each sub-domain $D^\ell_{ij}$, the method proceeds as follows. (In order not to overload the notation with many iteration indices, the values of $T$, and thus also $T|_{D^\ell_{ij}}$, change during the computation just like a variable in a computer program.)

1. Initialize $T = \bar{T}$.


2. Smooth $u$ and $v$ by convolution to get $u^\ell$ and $v^\ell$.

3. Minimize the objective function $J^\ell_{ij}(T, u^\ell, v^\ell)$ in the span of some shape functions, with the constraint that $I+T$ is invertible.

5.2.2 Smoothing

The purpose of doing the registration on the large sub-domains first is to capture coarse similarities between the images $u$ and $v$. In order to force this on each grid, we first smooth the images by convolution with a Gaussian kernel. The bandwidth of the kernel, $h^\ell$, is reduced at each refinement level, allowing the registration to track large-scale perturbations on coarse levels even for a thin feature such as a fireline, while maintaining small-scale accuracy at fine levels. The precise value one should use for $h^\ell$ is in general a parameter that depends on the problem at hand. However, a good rule of thumb is to have $h^0$ scale with the ratio of the feature size over the global domain size. For all other refinement levels, we set $h^\ell = 2^{-\ell} h^0$, so that the fine-scale detail present in each sub-domain scales with its size. If the original image domain is not square, it is helpful to use different bandwidths in the $x$ and $y$ axes, so the smoothing is consistent with the true spatial scaling despite the numerical use of a normalized, square grid. In this case, we use bandwidths in the $x$ and $y$ axes denoted $h^\ell_x$ and $h^\ell_y$, respectively, scaled according to the physical sizes $D_x$ and $D_y$ of the domain in each axis. The convolution is defined by the discrete Fourier transform:
\[ \tilde{u}^\ell(x_i, y_j) = \sum_{i'=1}^{n_x} \sum_{j'=1}^{n_y} u(x_{i'}, y_{j'})\, \phi_i(x_{i'})\, \psi_j(y_{j'}), \qquad (5.17) \]


where
\[ \phi_i(x) = \frac{1}{\sqrt{2\pi h^\ell_x}} \exp\left( -\frac{(x_i - x)^2}{2 h^\ell_x} \right), \qquad \psi_j(y) = \frac{1}{\sqrt{2\pi h^\ell_y}} \exp\left( -\frac{(y_j - y)^2}{2 h^\ell_y} \right). \]
A direct implementation of (5.17) is, however, very inefficient, because it requires $O\left( (n_x n_y)^2 \right)$ floating point operations to compute the convolution $u * (\phi \otimes \psi)$. The same operation can be performed much faster in the frequency domain using the relation
\[ u * (\phi \otimes \psi) = \mathcal{F}^{-1}\left( \mathcal{F}(u)\, \mathcal{F}(\phi \otimes \psi) \right), \qquad (5.18) \]
where $\mathcal{F}(u)$ denotes the Fourier transform of $u$. Using the FFT to compute the forward and inverse Fourier transforms, the convolution can be computed in only $O(n \lg n)$ operations.
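A minimal realization of the smoothing (5.17)-(5.18) with NumPy's FFT is sketched below. For brevity it uses a periodic (wrap-around) convolution, whereas the images in our application are extended by a constant value, so the treatment near the boundary is an assumption of this sketch.

```python
import numpy as np

def smooth_fft(u, hx, hy):
    """Gaussian smoothing of a 2-D image by the convolution theorem (5.18).
    hx, hy are bandwidths in normalized coordinates; the convolution
    wraps around periodically, a simplification made here."""
    ny, nx = u.shape
    # circular distances from the kernel center at index (0, 0)
    y = np.minimum(np.arange(ny), ny - np.arange(ny)) / (ny - 1.0)
    x = np.minimum(np.arange(nx), nx - np.arange(nx)) / (nx - 1.0)
    ker = np.exp(-(y[:, None] ** 2) / (2 * hy**2)
                 - (x[None, :] ** 2) / (2 * hx**2))
    ker /= ker.sum()                     # normalize to preserve the mean
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(ker)))

# bandwidth halves at each refinement level: h_l = 2**(-l) * h_0
h0, level = 0.02, 1
h = 2.0**(-level) * h0
u_smooth = smooth_fft(np.random.rand(128, 128), h, h)
```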


" ii.O u' 1,200 1,200 1,200 ,------,--------;-------, 1,000 1,000 1 ,000 800 800 800 600 600 600 400 400 400 2000 0.2 0.4 0 6 0 8 2000 0 2 0.4 0.6 0 8 200 0 0.2 0.4 0 6 0.8 (a) u, h = 0 (b) f.= 0 h0 = 0 .02 1,200 .-----r---,-----.----.-------, 1 200 .-----r--,-----.----,-------, 1,000 1 000 800 600 400 2000 0.2 0.4 0 6 0 8 Figure 5 7 : Displayed in (a) is a function defined over [0, 1] containing a thin feature similar to the reaction front of a wildfire This function is smoothed using equation 5.18 with various smoothing bandwidths which would be used at increasing refine ment levels, 1!, where the bandwidth decreases as 2 -e The larger bandwidth at lower levels increases the ability of the registration algorithm to find the global minimum of the objective function. Decreasing the bandwidth on the same scale of the re finement enables the algorithm to retain accuracy as the algorithm attempts to align increasingly fine scale detail. 90


5.2.3 Local minimization

Finding a global minimum of $J^\ell_{ij}(T, u^\ell, v^\ell)$ is generally too complicated to achieve using traditional minimization techniques directly. The problem is highly nonlinear, has many unknowns (i.e., twice the number of grid nodes inside $D^\ell_{ij}$), and has a complicated feasibility region (in which $I+T$ is invertible). Because we are working with a multi-level algorithm, the solution will be refined at higher levels, and we can seek to solve the problem in a subset of all feasible $T|_{D^\ell_{ij}}$ which is of low dimension and has vastly simpler constraints. The subset in which we seek to minimize the local objective function will be of the form
\[ T + \operatorname{span}\left\{ B_k \circ P^\ell_{ij} : k = 1, \ldots, K \right\}, \]
where each $B_k$ is a shape function defined over $[-1,1] \times [-1,1]$ and $P^\ell_{ij}$ is the affine transformation that maps $D^\ell_{ij}$ onto $[-1,1] \times [-1,1]$. A simple example of this would be constant shape functions, $B_1 : (x,y) \mapsto (1,0)$ and $B_2 : (x,y) \mapsto (0,1)$. Because these shape functions are non-zero on the boundary of the sub-domain, they will not preserve the global continuity of $T$. An important example of shape functions that do preserve continuity are cones formed by bilinear interpolation on a 3 x 3 uniform grid, with values 0 on the boundary and $(1,0)$ or $(0,1)$ at the center. If the pixel grid has dimensions $n_x = n_y = 2^{L+1} + 1$, then each sub-domain will contain a pixel node at the sides, corners, and center. In this case, the algorithm presented here, without image smoothing, is equivalent to that presented in Gao and Sederberg [1998]. This method of interpolation has two major benefits. Bilinear interpolation is very fast. There is also a simple $O(1)$ method of maintaining the invertibility of $T$, by assuring that the image under $I+T$ of the center of the sub-domain is strictly inside the quadrilateral formed by the images of the four edge midpoints, corresponding to $(-1,0)$, $(1,0)$, $(0,-1)$, and $(0,1)$. When this condition is satisfied, the morphing function maps each rectangular grid element into a convex quadrilateral, which is a necessary and sufficient condition for its invertibility [Frey et al., 1978]. However, because these shape functions have non-zero normal derivatives on the boundary, global differentiability will not be preserved.


We have chosen to use the interpretation of the corrections on each sub-domain as a deformation of the center point, which has two degrees of freedom, represented by $B_1$ and $B_2$. Because we want to maintain both the continuity and the differentiability of $T$, we will fix the value and the gradient of the correction on the boundary to 0. A simple pair of functions that meets these restrictions is created by a tensor product cubic spline,
\[ S(x) = 2|x|^3 - 3x^2 + 1, \qquad B_1(x,y) = (S(x)S(y),\, 0), \qquad B_2(x,y) = (0,\, S(x)S(y)). \qquad (5.19) \]
This form of shape function allows us to create a continuously differentiable morphing function that moves the center point of an arbitrary sub-domain while remaining zero outside of that sub-domain. For example, $T = c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij}$ defines such a morphing function that moves the center of the sub-domain $D^\ell_{ij}$ by $(c_1, c_2)$. Figure 5.9 illustrates the application of various methods of interpolation to a standard test image.
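The spline (5.19) and the resulting center-point correction can be written compactly; the sketch below evaluates the warping $T = c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij}$ on a pixel grid. The function names are ours.

```python
import numpy as np

def S(x):
    """Cubic spline shape function (5.19): S(0)=1, S(+-1)=0, S'(0)=S'(+-1)=0."""
    x = np.clip(np.abs(x), 0.0, 1.0)     # S vanishes outside [-1, 1]
    return 2 * x**3 - 3 * x**2 + 1

def center_warp(c1, c2, xlo, xhi, ylo, yhi, X, Y):
    """Warping that displaces the center of the sub-domain
    [xlo,xhi] x [ylo,yhi] by (c1, c2) and vanishes smoothly at its edges."""
    # affine map P onto [-1,1]^2
    px = 2 * (X - xlo) / (xhi - xlo) - 1
    py = 2 * (Y - ylo) / (yhi - ylo) - 1
    inside = (np.abs(px) <= 1) & (np.abs(py) <= 1)
    bump = np.where(inside, S(px) * S(py), 0.0)
    return c1 * bump, c2 * bump          # (Tx, Ty)

X, Y = np.meshgrid(np.linspace(0, 1, 65), np.linspace(0, 1, 65))
Tx, Ty = center_warp(0.05, -0.02, 0.25, 0.75, 0.25, 0.75, X, Y)
```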


Figure 5.8: (a) The morphing shape function S(x) used in (5.19), defined as a cubic spline with S(x) = 0 and S'(x) = 0 on the boundary and S(x) = 1 and S'(x) = 0 at the center, x = 0. (b) The tensor product of the shape functions that makes up the components of B1 and B2. The gradient of this tensor product is zero along the boundary, and it attains a maximal value of 1 at the center, (0,0). (c) and (d) Quiver plots showing the deformation of the domain performed by each shape function scaled by 0.2.


Figure 5.9: These figures illustrate the effect of various methods of interpolating a morphing function onto the pixel grid. (a) shows the original image with four arrows indicating the specified perturbations of the image. The remaining images show the morphed image with the specified perturbations interpolated to the pixel grid with (b) nearest neighbor, (c) bilinear, and (d) tensor product B-spline interpolants. Figure (b) exhibits clear discontinuities, while figure (c) creates kinks in what were originally smooth curves.


5.2.4 The Levenberg-Marquardt Algorithm

In this section, we introduce an algorithm used for minimizing more general classes of nonlinear functions, known as the Levenberg-Marquardt algorithm (LMA), originally described by Levenberg [1944] and Marquardt [1963]. Given a function $f : \mathbb{R}^n \to \mathbb{R}^p$ with $p \ge n$, which has continuous second partial derivatives, we seek to find an $x^*$ which is a local minimum of
\[ F(x) = \tfrac12 \|f(x)\|_2^2 = \tfrac12 f(x)^T f(x). \qquad (5.20) \]
We will denote the gradient of $F$ at $x$ as $F'(x)$, a row vector of size $n$ with components
\[ (F'(x))_j = \frac{\partial F}{\partial x_j}(x), \]
and the Hessian of $F$ at $x$ as $F''(x)$, a matrix of size $n \times n$ with components
\[ (F''(x))_{jk} = \frac{\partial^2 F}{\partial x_j \partial x_k}(x). \]
Sufficient conditions that $x^*$ is a local minimizer are $F'(x^*) = 0$ and $F''(x^*)$ positive definite [Frandsen et al., 1999].

The Gauss-Newton method is an iterative procedure, in which the current guess of the minimum, $x$, is improved by an increment $\delta x$. This increment is determined by approximating the solution to
\[ F'(x + \delta x) = 0. \qquad (5.21) \]
The gradient in this expression is approximated using the Taylor expansion of $f$,
\[ f(x + \delta x) = f(x) + f'(x)\,\delta x + O(\|\delta x\|^2). \]
Omitting the second order term, we have
\[ F'(x + \delta x)^T = f'(x + \delta x)^T f(x + \delta x) \approx f'(x)^T f(x) + f'(x)^T f'(x)\,\delta x. \]
Substituting this approximation into (5.21) gives the equation for the increment of the Gauss-Newton method,
\[ f'(x)^T f'(x)\,\delta x = -f'(x)^T f(x). \qquad (5.22) \]
When the current iterate is sufficiently close to the minimum, one can expect at least linear convergence; however, in general, this method may fail to converge at all [Bjorck, 1996]. In addition, when $f'(x)^T f'(x)$ is singular, a solution to this equation may not exist. Damped Gauss-Newton methods attempt to solve these problems by modifying the increment equation (5.22) so that
\[ \left( f'(x)^T f'(x) + \alpha I \right) \delta x = -f'(x)^T f(x), \qquad (5.23) \]
for $\alpha > 0$. In this case, $f'(x)^T f'(x) + \alpha I$ is symmetric positive definite, and a unique solution exists.


In addition, when $\alpha$ is small, the increment becomes approximately equal to that of the undamped Gauss-Newton method. When $\alpha$ is large,
\[ \delta x \approx -\frac{1}{\alpha}\, f'(x)^T f(x) = -\frac{1}{\alpha}\, F'(x)^T, \]
and the increment is a small step in the direction of steepest descent. The LMA uses the damped increment equation (5.23), where $\alpha$ is determined adaptively so that it is large when the iterate is far away from the minimum and small when it is close. For this reason, the LMA is often considered a combination of the Gauss-Newton method, with fast local convergence, and the steepest descent method, with more reliable global convergence [Frandsen et al., 1999]. This property makes the LMA popular in many nonlinear optimization applications, including image registration [Zitova and Flusser, 2003]. While the details vary among implementations, generally a single iteration consists of computing the increment $\delta x$ and the gain ratio
\[ \rho = \frac{F(x) - F(x + \delta x)}{\tfrac12\, \delta x^T \left( \alpha\, \delta x - F'(x)^T \right)}. \]
The numerator of this expression is the improvement in the value of $F$ from the increment; when it is negative, it indicates that the increment has degraded the solution. The denominator is the estimated improvement using the linearized version of $f$ and is always positive. When $\rho$ is negative, the iteration is rejected and $\alpha$ is increased for the next iteration. When $\rho$ is positive, the iteration is accepted and the damping parameter is reduced.
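A compact sketch of the damped iteration (5.23) with the adaptive control of $\alpha$ just described is given below. The acceptance test and the update factors for $\alpha$ are conventional textbook choices, not those of the MINPACK implementation used later.

```python
import numpy as np

def levenberg_marquardt(f, jac, x, alpha=1e-3, tol=1e-8, max_iter=100):
    """Minimize F(x) = 0.5*||f(x)||^2 by damped Gauss-Newton steps (5.23)."""
    for _ in range(max_iter):
        r, J = f(x), jac(x)
        g = J.T @ r                            # gradient F'(x)^T
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(J.T @ J + alpha * np.eye(x.size), -g)
        r_new = f(x + dx)
        actual = 0.5 * (r @ r - r_new @ r_new)  # actual reduction of F
        if actual > 0:
            x, alpha = x + dx, alpha / 3.0      # accept, reduce damping
        else:
            alpha *= 2.0                        # reject, increase damping
    return x

# fit y = exp(a*t) to data, a hypothetical toy problem
t = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * t)
f = lambda a: np.exp(a[0] * t) - y
jac = lambda a: (t * np.exp(a[0] * t))[:, None]
print(levenberg_marquardt(f, jac, np.array([0.0])))   # approx [0.7]
```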


5.2.5 Optimization

Because the shape functions were designed to be zero outside of their respective sub-domains, we have the relation
\[ J\left( T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij},\, u^\ell, v^\ell \right) - J\left( T, u^\ell, v^\ell \right) = J^\ell_{ij}\left( T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij},\, u^\ell, v^\ell \right) - J^\ell_{ij}\left( T, u^\ell, v^\ell \right). \]
Therefore, on each sub-domain, minimizing the global objective function in the span of the shape functions is equivalent to minimizing the local objective function over the same span. Within each sub-domain, the registration problem is reduced to minimizing
\[ g(c_1, c_2) = J^\ell_{ij}\left( T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij},\, u^\ell, v^\ell \right), \qquad (5.24) \]
constrained to a region $(c_1, c_2) \in C \subset \mathbb{R}^2$ such that $I + T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij}$ is invertible. We will seek to perform this minimization using two complementary methods. The first involves sampling in the region $C$, similar to the method described in Gao and Sederberg [1998]. The second method uses an implementation of the Levenberg-Marquardt algorithm (LMA) (described in Section 5.2.4) released as part of MINPACK [More et al., 1980]. The sampling method described here is intended to prevent the LMA minimization procedure from converging to a local minimum. This is particularly relevant at coarse refinement levels. Rather than initializing the LMA at just one value, we run it multiple times, starting from a uniform grid of values inside $C$. However, the region $C$ can be quite complex in general, and determining it would require looping over all nodes inside the local region. To avoid the additional complexity in determining this region explicitly, we approximate it by a quadrilateral $\tilde{C} \subset C$ defined by vertices on the principal axes, $(\bar{c}^x, 0)$, $(\underline{c}^x, 0)$, $(0, \bar{c}^y)$, and $(0, \underline{c}^y)$, with
\[ \bar{c}^y = \alpha\, \frac{2}{3}\, 2^{-\ell-1} \left( 1 + \frac{\partial T_y}{\partial y}\left( x^\ell_{ij},\, y^\ell_{ij} + 2^{-\ell-2} \right) \right), \qquad (5.25) \]
where $\alpha < 1$ is a contraction parameter. The definitions of the other vertices follow in a similar manner. These ad hoc definitions come about from the constraint given in (5.16) and from the fact that $\left| \frac{\partial}{\partial y} (B_2 \circ P^\ell_{ij})_y \right|$ is maximal on $x = x^\ell_{ij}$ at $y = y^\ell_{ij} \pm 2^{-\ell-2}$.


Then, solving a ( x e ) ( e 1 e ) ax I +T+c B1Pij xij + 2H1, yij = -1 (5.26) lead to the given definition. However depending on T, the quadralateral 6 may fail to be a subset of C. For this reason the contraction parameter a is included Experimentally, we have determined that a = is small enough to ensure that 6 c C, but there is no reason that this will be true in general (cf. Figure 5.11 ). Once the region C has been determined, a set of sampling points is defined as (5.27) Since we assume that the initial morphing function T is invertible, then (0, 0) E S. The sampling grid resolutions Kx and Ky are global optimization parameters that will generally be problem dependent. As a general rule, these parameters should be set to approximately the ratio of the feature width over the global domain size in each axis. For instance, in our application, the global domain will be about 1 km and the width of an ignition region is about 10 m, so we would use a sampling grid resolution of 0.01. (In practice, we have found that a resolution of 0 1 is suitable because the image smoothing reduces the chance of converging to a local minimum.) In this case the global domain may have several thousand sample points; however, the number of sample points in S will decay on the order of 4 -e as a function of the refinement level f.. This is due to the fact that the area of C must be strictly smaller than the area of the local sub-domain, whose area decays as 4 e This procedure is displayed graphically in Figure 5.11. 98


Figure 5.10: (a) The derivative of S(x) from Fig. 5.8a has its maximal and minimal values at -0.5 and 0.5. The partial derivatives of the tensor product in Fig. 5.8b in the x and y directions, $\frac{\partial}{\partial x} B_x(x,y) = S'(x)S(y)$ and $\frac{\partial}{\partial y} B_y(x,y) = S(x)S'(y)$, are given in (b) and (c), respectively. These partial derivatives have extreme values at $(\pm 0.5, 0)$ and $(0, \pm 0.5)$, where they attain slopes of $\mp 3/2$. Equation (5.16) is used to provide approximate conditions for the invertibility of $I + T + c_1 B_1 \circ P^\ell_{ij}$ by finding bounds for $c_1$ such that $\frac{\partial}{\partial x}\left( T + c_1 B_1 \circ P^\ell_{ij} \right)_x > -1$ at the point corresponding to $(0.5, 0)$. The resulting bounds give rise to the conditions appearing in (5.25).


Figure 5.11: The black dots represent the sampling grid defined in (5.27) as a uniform grid with resolution $K_x \times K_y$. The region outlined in red, $C$, is defined as the feasibility region of the local minimization problem (5.24). The points marked with a black square are determined by solving (5.26) and represent the intersection of $C$ with the x and y axes. The distances of these points from the origin are multiplied by the contraction parameters to obtain the blue dots located at $(\bar{c}^x, 0)$, $(0, \bar{c}^y)$, $(\underline{c}^x, 0)$, and $(0, \underline{c}^y)$. The coordinates of these dots are given explicitly in (5.25), and they define the vertices of the quadrilateral $\tilde{C}$, outlined in blue. The sampling points, circled in blue, are determined as the intersection of the sampling grid with the region $\tilde{C}$. The dashed blue line represents what the sampling region would be if we did not include the contraction parameters. This region is not a subset of $C$ and, in particular, contains the sampling point circled in red, which is not in the feasible region of the problem.


Finally, the LMA is performed starting at each point in $S$. The function $f$ in (5.20) that is minimized in the least-squares sense is defined, with $\tilde{T} = T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij}$, as
\[ f(c_1, c_2) = \begin{bmatrix} v^\ell - u^\ell \circ (I + \tilde{T}) \\ \sqrt{C_1}\, \tilde{T} \\ \sqrt{C_2}\, \nabla \tilde{T} \end{bmatrix}, \]
where the range is a single, long vector made up of all components inside the brackets. This function is chosen so that the LMA minimizes $F = \tfrac12 f^T f$, which is similar to the square of the local registration objective function (5.24). In fact, one can view these two norms as approximately equivalent, because we have assumed a uniform mesh, and the vector two-norm can be viewed as proportional to piecewise constant numerical integration, with error on the order of the square of the mesh resolution. We therefore expect the minima of both (5.20) and (5.12) to be approximately aligned. This procedure results in a set of local minima for the local optimization problem. The solution $(c_1, c_2)$ that results in the minimum value of $g$ is then used to update the global morphing function,
\[ T \leftarrow T + c_1 B_1 \circ P^\ell_{ij} + c_2 B_2 \circ P^\ell_{ij}. \]
The LMA itself takes several parameters, including a tolerance $\tau$. This tolerance is treated as another global parameter for the registration algorithm. As the refinement level increases, the size of the sub-domain decreases, and thus the number of iterations of the LMA performed also decreases, since the tolerance is kept constant. The algorithm then moves on to the next sub-domain.


In practice, the order in which the sub-domains are optimized tends to affect the registered morphing function. The local optimization is limited to correcting the morphing function in the span of the shape functions, which are large in the center and small near the boundary. This structure tends to force the local optimization to overestimate the perturbation of $T$ in the center of the sub-domain while underestimating the perturbation near the edges. The staggering of the sub-domains at each refinement level means that a corner of one sub-domain (in the interior of $D$) will actually be the center of another. So the staggering tends to stabilize the registration somewhat, but the overestimation will still occur in the center of the sub-domains that were optimized first. This effect can be significantly reduced by looping over all sub-domains twice before moving on to the next refinement level.

5.2.6 Computational Complexity

The registration algorithm is summarized here for a complexity analysis.

1. Loop over all refinement levels $\ell = 1, \ldots, L$.

(a) Smooth the images by convolution using the FFT.

(b) Loop over all sub-domains $D^\ell_{ij}$ (standard and offset) at the refinement level $\ell$.

i. Perform the LMA at each sample point in $S$ for the local sub-domain.

(c) Update $T$ using the correction found in the local optimization procedure.

The algorithm then requires $L$ FFTs, each of $O(n \lg n)$ operations, where $n = n_x n_y$. In addition, there are $O(4^\ell)$ sub-domains at each refinement level $\ell$. The local optimization procedure requires several evaluations of the local objective function, each of which requires $O(n 4^{-\ell})$ operations. The number of evaluations of the local objective function is some multiple of the number of sample points, depending on the number of iterations of the LMA performed.


Therefore, each refinement level requires $O(n)$ operations for the local optimization procedure over all sub-domains. While the total number of refinement levels necessary for an accurate registration will depend on the resolution of the images, we require that each sub-domain contain at least one image pixel, which necessarily bounds the maximum number of refinement levels by $O(\lg n)$. Therefore, the algorithm as described requires $O(n \lg^2 n)$ operations in total.

In the production of the numerical results in Chapter 6, we have found that an implementation of this algorithm in Fortran is capable of producing acceptable registrations for our wildfire application, even when the fires in the images are separated by as much as 1 km. The residual images produced in these experiments have large components only near the fireline of the moving image $u$. The morphed images lack any noticeable spurious features. In addition, the computational time for producing these registrations is acceptable for use in a real-time application. The images used in these examples have a resolution of 411 x 411, defined on an approximately 2.5 km square physical domain, with 5 sub-domain refinement levels and a 250 m sampling grid resolution. The registrations using these parameters were produced in about 2 minutes when starting from $T = 0$. When a good initial guess of the morphing function is provided, the registration is performed in about 30 s.


6. The Morphing Ensemble Kalman Filter

In Section 4.4.5, we observed that the standard EnKF approach fails for our wildfire model when the forecast ensemble involves spatial errors. This behavior is consistent with other applications involving sharp features which evolve in time, such as rain fronts or vortices [Chen and Snyder, 2007]. This can be ameliorated to some degree by penalization of nonphysical states [Johns and Mandel, 2008], localization of the EnKF [Anderson, 2003, Ott et al., 2004], and employing the location of the feature as an observation function [Chen and Snyder, 2007], but the basic problem remains: the EnKF works only when the increment in the location of the feature is small [Chen and Snyder, 2007]. While the location and the size of the feature may have an error distribution that is approximately Gaussian, this is not necessarily the case for the value of the state at a given point. There is clearly a need to adjust the simulation state by distorting it in space rather than employing an additive correction to the state. Therefore, alternative error models that include the position of features were considered in the literature [Hoffman et al., 1995, Davis et al., 2006], and a number of works emerged that achieve more efficient movement of features by using a spatial transformation as the field to which additive corrections are made, such as a transformation of the space by a global low-order polynomial mapping to achieve alignment [Alexander et al., 1998], and two-step models that use alignment as preprocessing to an additive correction [Lawson and Hansen, 2005, Ravela et al., 2007].


The essence of the new method described here is to replace the linear combinations of states in an ensemble filter by intermediate states obtained by morphing. This method provides additive and position corrections in a single step. For the analysis step (the Bayesian update), the state is transformed into an extended state consisting of additive and position components. After the analysis step, the state is converted back and advanced in time.

6.1 The Morphing Transformation

For simplicity, we will assume that the state of the model consists of several gridded variables $X = (w, z, \ldots)$, all arrays are defined over the same grid, and the registration is applied only to the first variable, $w$; this will be the case in the model application in Section 6.2. The general case will be discussed in Section 6.3. Let $[X_k] = \{X_1, \ldots, X_N\}$ be an ensemble of states, with the ensemble member $X_k$ consisting of the gridded arrays $(w_k, z_k, \ldots)$. Given one fixed state $X_0 = (w_0, z_0, \ldots)$, the automatic registration (5.12) of the first array $w$ defines the registration representations $[R_k, T_k]$ of the ensemble members as morphs of $X_0$, with the registration residuals
\[ r^w_k = w_k \circ (I + T_k)^{-1} - w_0, \qquad r^z_k = z_k \circ (I + T_k)^{-1} - z_0, \quad \ldots \]


and warpings $T_k$ determined as approximate solutions of independent optimization problems based on the state array $w$,
\[ J(T_k, w_0, w_k) = \|w_k - w_0 \circ (I + T_k)\| + C_1 \|T_k\| + C_2 \|\nabla T_k\| \to \min_{T_k}. \]
The mapping $T_k$ from the previous analysis cycle is used as the initial $T_k$ in the automatic registration. In our tests, this all but guarantees good registration for the ensemble members and a significant speedup compared to starting from zero. Instead of the EnKF operating on the ensemble $\{X_k\}$ and making linear combinations of its members, the morphing EnKF applies the EnKF algorithm to the ensemble of registration representations $\{[R_k, T_k]\}$, resulting in the analysis ensemble in registration representation $\{[R^a_k, T^a_k]\}$, with $R^a_k = (r^{w,a}_k, r^{z,a}_k, \ldots)$ containing all model fields, not just the registration variable. The analysis ensemble is then transformed back by (5.7), which here becomes
\[ w^a_k = \left( w_0 + r^{w,a}_k \right) \circ (I + T^a_k), \qquad (6.1) \]
\[ z^a_k = \left( z_0 + r^{z,a}_k \right) \circ (I + T^a_k). \qquad (6.2) \]
Given an observation function $h$, the transformed observation function for the EnKF on the registration representations can be obtained directly by substituting from (6.1) into the observation function,
\[ h([R, T]) = h\left( (w_0 + r^w) \circ (I + T),\, (z_0 + r^z) \circ (I + T),\, \ldots \right). \]
However, constructing the observation function this way may not be the best choice.
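In code, the transformations between the model state and its registration representation are a pair of warps around the reference state $w_0$. The sketch below uses order-1 interpolation and approximates the inverse registration mapping by negating $T$, which is only a first-order approximation; the function names are ours.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(field, Tx, Ty):
    """Evaluate field o (I + T) on the pixel grid (order-1 interpolation,
    nearest-value extension beyond the grid)."""
    ny, nx = field.shape
    Y, X = np.mgrid[0:ny, 0:nx].astype(float)
    return map_coordinates(field, [Y + Ty, X + Tx], order=1, mode='nearest')

def to_registration(w_k, w0, Tx, Ty):
    """Residual r = w_k o (I+T)^{-1} - w0, here approximated by warping
    with the negated mapping instead of the exact inverse."""
    return warp(w_k, -Tx, -Ty) - w0

def from_registration(r, w0, Tx, Ty):
    """Invert the transformation: w = (w0 + r) o (I + T), as in (6.1)."""
    return warp(w0 + r, Tx, Ty)
```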


Consider the case of one point observation, such as the temperature at some location. Then the difference between the observed temperature and the value of the observation function gives little indication of which way the transformed state should be adjusted. Suppose the temperature reading is high and the ensemble members have high temperature only in some small location (the fireline). Then it is quite possible that the observation function (the temperature at the given location) evaluated on the ensemble members will miss the fire in the ensemble members completely. This is, however, a reflection of the inherent difficulty of localizing small features from point observations.

For data that is given as gridded arrays (e.g., images, or a dense array of measurements), there is a better way. Suppose the data $d$ is a measurement of the first array in the state, $w$. Then, transforming the data $d$ into its registration representation $[r^d, T^d]$, just like the registration of the state array $w$, the observation equation becomes the comparison between the registration representations of the data $d$ and the state array $w$,
\[ [r^w, T] \approx [r^d, T^d]. \qquad (6.3) \]
Data given on a part of the domain can be registered and used in the same way. Note that no manual identification of the location of the feature, either in the data or in the ensemble members, is needed.


6.2 Numerical Results

We have applied the morphing EnKF to an ensemble from our PDE wildfire model [Mandel et al., 2005]. The simulation state consists of the temperature and the fuel fraction remaining on a 500 m x 500 m domain with a 2 m uniform grid. The model has two state arrays, the temperature $w$ and the fuel supply $z$. An initial fire solution $X_0$ was created by raising a small square in the center of the domain above the ignition temperature and applying a small amount of ambient wind until a fire front developed. The simulated data consisted of the whole temperature field of another solution, started from $U_0$ and morphed so that the fire was moved to a different location of the domain compared with the average location of the ensemble members. The observation equation (6.3) was used, with Gaussian error in the registration residual of the temperature and in the registration mapping. The standard deviations of the data error were 50 K and 5 m, respectively. This large discrepancy is used to show that the morphing EnKF can be effective even when the data is very different from the ensemble mean. The image registration algorithm was applied with $L = 4$ refinement levels. We have performed up to 5 optimization sweeps, stopping if the relative improvement of the objective function for the last sweep was less than 0.001 or if the infinity norm of the residual $r^w$ fell below 1 K. The optimization parameters used for scaling the norms in the objective function (5.12) were $C_1 = 10000\ \mathrm{m\,K^{-1}}$ and $C_2 = 1000\ \mathrm{m^2\,K^{-1}}$. For simplicity in the computation, the fuel supply variables were not included in the data assimilation. Although the fuel supply was warped spatially as in (6.2), the registration residual of the fuel supply, $r^z_k$, was taken to be zero.

The 50-member ensemble shown in Fig. 6.1 was generated by morphing the initial state $X_0$ using smooth random fields $r^w_k$ and $T_k$ with smoothness parameter $\alpha = 1$, restricted to 0 on the boundary. Since it is not guaranteed that $(I+T)^{-1}$ exists for a smooth random $T$, we have tested if $I+T$ is one-to-one and generated another random $T$ if not. The resulting two 250 x 250 arrays are appended to form $2 \cdot 250^2$-element vectors representing an ensemble state $[R_k, T_k]$ for the EnKF. The same state $U_0$ was advanced in time along with the ensemble. (Of course, other choices of $U_0$ are possible.)
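One plausible way to generate such smooth random fields, vanishing on the boundary, is by drawing Fourier sine coefficients with a power-law decay controlled by the smoothness parameter. The construction below is an assumption made for illustration, not necessarily the one used to produce Fig. 6.1.

```python
import numpy as np

def smooth_random_field(n, alpha=1.0, scale=1.0, rng=np.random):
    """Random field on an n x n grid whose sine coefficients decay as
    (kx^2 + ky^2)^{-(alpha+1)/2}; larger alpha gives smoother fields.
    A sine basis keeps the field zero on the boundary."""
    k = np.arange(1, n + 1)
    decay = (k[:, None] ** 2 + k[None, :] ** 2) ** (-(alpha + 1) / 2.0)
    coef = rng.standard_normal((n, n)) * decay
    # synthesize the sum of sin(pi*kx*x) * sin(pi*ky*y) modes
    x = np.linspace(0.0, 1.0, n)
    sines = np.sin(np.pi * np.outer(k, x))      # shape (n_modes, n_points)
    field = sines.T @ coef @ sines
    return scale * field / np.abs(field).max()

Tx = smooth_random_field(250, alpha=1.0, scale=5.0)  # x-displacement, meters
```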


Figure 6.1: Data assimilation by the morphing ensemble filter. The forecast ensemble (b) was created by smooth random morphing of the initial temperature profile located near the center of the domain. The analysis ensemble (c) was obtained by the EnKF applied to the transformed state. The data for the EnKF was the morphing transformation of the simulated data (a), and the observation function was the identity mapping. Contours are at 800 K, indicating the location of the fireline. The reaction zone is approximately between the two curves.

The ensemble was advanced in 3 minute analysis cycles. The new registration representation $[R_k, T_k]$ was then calculated using the previous analysis values as an initial guess and incremented by the EnKF. The ensemble after the first analysis cycle is shown in Fig. 6.1. The results after five analysis cycles were similar, indicating no filter divergence (Fig. 6.2). Numerical results indicate that the error distribution of the registration representation is much closer to Gaussian than the error distribution of the temperature itself. This is demonstrated in Fig. 6.3, where the estimated probability density functions for the temperature, the registration residual of the temperature, and the registration mapping for Fig. 6.2 are computed at a single point in the domain using a Gaussian kernel with bandwidth 0.3 times the sample standard deviation. The point was chosen to be on the boundary of the fire shape, where the non-Gaussianity may be expected to be the strongest. In Fig. 6.4, the Anderson-Darling test for normality was applied to each point of the domain for the analysis step from Fig. 6.2c.


Figure 6.2: After five analysis cycles, the ensemble shows less spread and follows the data reliably. Ensemble members were registered using the initial state, advanced in time without data assimilation. The forecast ensemble (b) is closer to the simulated data (a) because of preceding analysis steps that have attracted the ensemble to the truth. The analysis ensemble (c) has a little less spread than the forecast, and the change between the forecast and the analysis is well within the ensemble spread. Contours are at 800 K, indicating the location of the fireline. The reaction zone is approximately between the two curves.

Figure 6.3: Probability densities estimated by a Gaussian kernel with bandwidths 37 K, 19 K, and 30 m. Data was collected from the ensemble shown in Fig. 6.1b. Displayed are typical marginal probability densities at a point near the reaction area of the original temperature (a), the registration residual of the temperature (b), and the registration mapping component in the x direction (c). The transformation has vastly improved the Gaussian nature of the densities.


Figure 6.4: The p-value from the Anderson-Darling test applied to the ensemble after five morphing EnKF analysis cycles shows that the ensemble transformed into its registration representation, the registration residual of the temperature (b) and the registration mapping (c), has a distribution much closer to Gaussian than the original ensemble (a). The shading at a point indicates whether the marginal distribution of the ensemble at that point is highly Gaussian (white) or highly non-Gaussian (black).

The resulting p-values were plotted at their corresponding locations in the domain, with darkness determined on a log scale, with black as $10^{-5}$ (highly non-Gaussian) and white as 1 (highly Gaussian). While the Anderson-Darling test is intended to be a hypothesis test for normality, it is used here to visualize, on a continuous scale, the closeness to normality of the marginal probability distribution at any point in the domain. Again, the strongest departure from normality of the distribution is seen around the fire.

Figure 6.5 shows a 3-D representation of a similar experiment with the PDE model. The temperature profiles are displayed as a superposition of transparent images to give an idea of both the shape of the analysis ensemble members and the variance of their position. The same ensembles and data were used for the EnKF both with and without morphing.


Figure 6.5 shows a 3-D representation of a similar experiment with the PDE model. The temperature profiles are displayed as a superposition of transparent images to give an idea of both the shape of the analysis ensemble members and the variance of their position. The same ensembles and data were used for the EnKF both with and without morphing. The standard EnKF produces a degenerate ensemble made up of highly non-physical states, while the morphing EnKF ensemble members retain the approximate shape of a fire profile.

In Figure 6.6, this experiment was also applied to the standalone level set fire model, where the sensible heat flux from the ground was used for registration and data. The results are similar to the PDE model: the EnKF produces a degenerate, non-physical result, while the morphing EnKF creates something far more reasonable. Finally, the experiment was repeated using the coupled fire model in Figure 6.7. Again, the sensible heat flux is used as the registration variable. The atmospheric winds were included in the ensemble states. The figures show contours of the heat fluxes for all ensemble members superimposed on the x-y plane. In addition, the mean of the vertical vorticity, defined as the curl of the wind velocity in the atmosphere, is displayed in red (positive vorticity) and blue (negative vorticity). The heat flux contours in the EnKF analysis indicate similar non-physical results, while the morphing EnKF is able to move the burning areas near the data. In addition, the wind turbulence was also translated to better match the data.
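For reference, the vertical vorticity displayed in Fig. 6.7 is the vertical component of the curl of the wind field, zeta = dv/dx - du/dy for horizontal wind components u and v. A minimal finite-difference sketch on a regular grid follows; the wind arrays and grid spacings are hypothetical stand-ins for the model output.

```python
import numpy as np

def vertical_vorticity(u, v, dx, dy):
    """Vertical vorticity zeta = dv/dx - du/dy on a regular grid.

    u, v:   horizontal wind components, arrays of shape (ny, nx).
    dx, dy: grid spacings in the x and y directions (m).
    """
    dv_dx = np.gradient(v, dx, axis=1)   # derivative along x (columns)
    du_dy = np.gradient(u, dy, axis=0)   # derivative along y (rows)
    return dv_dx - du_dy
```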


Figure 6.5: The morphing EnKF applied to the reaction-diffusion model. The false color is generated from the temperature, with shading for depth perception. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF panel (c) is the result of data assimilation of the temperature field after running the model for 500 seconds. The morphing EnKF panel (d) is the ensemble with the image registration against the temperature field at the time of ignition, and the morphing applied to both the temperature and the fuel supply. The ensembles have 25 members each and are visualized as the superposition of transparent images of their members. The observation function is the temperature field on the whole grid. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data.


Figure 6.6: The morphing EnKF applied to the fireline propagation model. The false color is the output heat flux, with shading for depth perception. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF panel (c) is the result of data assimilation of the time from ignition after running the model for 1000 seconds. The morphing EnKF panel (d) is the result with the image registration determined from the time from ignition and the morphing applied to all of the model variables. The ensembles have 25 members each and are visualized as the superposition of transparent images of the heat fluxes of their members. The registration is done on the atmospheric grid with the fire heat flux as the observation function, but the atmospheric model is not used. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data.


Figure 6.7: The morphing EnKF applied to the fireline propagation model coupled with WRF. The false color of the contour on the horizontal plane is the fire output heat flux. The superposition of transparent ensemble members is shown. The volume shading is the vorticity of the atmosphere in the ensemble mean, where red and blue shades represent positive and negative vorticity, respectively. The reference solution (a) is the simulated data. The initial ensemble is created by a random perturbation of the comparison solution (b), where the fire is ignited at an intentionally incorrect location. The standard EnKF (c) and the morphing EnKF (d) are applied after 15 minutes. The ensembles have 25 members each. The registration is done on the atmospheric grid with the fire heat flux as the observation function. The standard EnKF ensemble diverges from the data, while the morphing EnKF ensemble remains closer to the data.


While the experiments described so far show that the morphing EnKF yields a significant improvement in the creation of ensembles that retain spatial error and physical features, this conclusion is based on a subjective analysis of a single realization of the filter. It would be desirable to obtain some metric that describes these qualities and can be averaged over multiple realizations. Unfortunately, there is no easy method for obtaining such statistics. As we have seen, describing the error using the pointwise mean and covariance of the model variables is insufficient for describing the error distributions involved. Even using the model variables in the morphing representation to collect these statistics is unlikely to be effective, because the analysis ensemble members for the EnKF do not resemble true model states and the registration would most likely be meaningless. In addition, characterizing error statistics this way would be unfairly advantageous to the morphing EnKF, which uses this representation inside the filter.

To get some idea of the true error statistics for this problem, we propose to generate a forecast ensemble made up of the heat flux from a single solution of the level set fire model, translated in space by a Gaussian random variable with some given variance. The data will be the same model state shifted by a known quantity. In this case, we can characterize the spatial data assimilation problem in terms of a 2-D random variable: the spatial shift from the original model state. The problem lies in how to determine this shift from the analysis ensembles generated by the filter methods on the model states. We will solve this problem by approximating the center of the burning region. As we know, the burning region \Omega is defined as the subset of the domain where the level set function is non-positive. We define the center of this region, (x_0, y_0), as the location of its centroid:

    x_0 = \frac{\iint_\Omega x \, dx \, dy}{\iint_\Omega dx \, dy}, \qquad
    y_0 = \frac{\iint_\Omega y \, dx \, dy}{\iint_\Omega dx \, dy}.

With this definition, we can calculate the true analysis distribution exactly and compare it to the averaged statistics of the centroids to see how well the filters reproduce the expected results. This method does not inherently favor the morphing EnKF, because the registration cannot be done perfectly: the morphing transformation does not allow for pure translations. The centroid is also defined even when the model state is non-physical (assuming \Omega is non-empty).
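On a regular grid, the centroid integrals reduce to averages of cell coordinates over the burning region, since the uniform cell area cancels in the ratio. A minimal sketch follows, assuming the level set function is stored as a 2-D array; all names are illustrative.

```python
import numpy as np

def fire_centroid(level_set, x, y):
    """Centroid (x0, y0) of the burning region {level_set <= 0}.

    level_set: 2-D array of the level set function, shape (ny, nx).
    x, y:      1-D arrays of grid coordinates along each axis.
    """
    X, Y = np.meshgrid(x, y)      # coordinates of every grid cell
    burning = level_set <= 0      # the region Omega
    if not burning.any():
        raise ValueError("burning region is empty")
    # Mean of cell coordinates = ratio of the two area integrals.
    return X[burning].mean(), Y[burning].mean()
```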


The process we will employ is as follows:

- Generate an ensemble by translating a model solution in space by a Gaussian random variable with mean (0, 0) and covariance \sigma_f^2 I.
- Pick a translation (x_d, y_d) to use for the data and create a model state from it. Also pick a data variance \sigma_d^2 for the spatial shift and \sigma_m^2 for the model state.
- Create the morphing transformation of the ensemble and the data solution.
- Run the EnKF from both the original model state and the transformed model state, using the data state with variance \sigma_m^2 for the EnKF and the transformed data state with variance \sigma_d^2 for the morphing EnKF.
- Perform the inverse morphing transformation for the morphing EnKF analysis ensemble.
- Determine the center of the burning region for each ensemble member of both analysis ensembles, and calculate the ensemble mean and covariance of these quantities.

We will repeat this process 100 times for each chosen value of \sigma_f^2, \sigma_d^2, and (x_d, y_d). The ensemble mean from all repetitions will then be averaged and compared to the true analysis mean of the location of the center to detect any bias in the results. In addition, the degeneracy of the analysis ensembles will be estimated by calculating the relative analysis standard deviation, defined by

    \text{Rel. Analysis } \sigma =
    \sqrt{\frac{\left|\widehat{\operatorname{Cov}}([x^a, y^a])\right|}
               {\left|\operatorname{Cov}([x^a, y^a])\right|}}, \qquad (6.4)

where \widehat{\operatorname{Cov}}([x^a, y^a]) is the sample covariance of the centers of the analysis ensemble and \operatorname{Cov}([x^a, y^a]) is the exact covariance of the analysis distribution. Because ensemble data assimilation techniques tend to deflate the covariance artificially (filter degeneracy), we expect this number to be less than one. In the absence of any filter degeneracy, the relative analysis standard deviation will be one.
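A condensed sketch of this experiment loop and of the relative analysis standard deviation (6.4) follows. The run_filter callable is a placeholder standing in for either the EnKF or the morphing EnKF applied to the translated model states; it is assumed to return the burning-region centroids of the analysis ensemble.

```python
import numpy as np

def relative_analysis_sigma(centers, exact_cov):
    """Relative analysis standard deviation, Eq. (6.4).

    centers:   (n_members, 2) array of burning-region centroids
               of the analysis ensemble.
    exact_cov: 2x2 exact covariance of the analysis distribution.
    """
    sample_cov = np.cov(centers, rowvar=False)
    # Ratio of generalized standard deviations (square roots of the
    # determinants of the sample and exact covariance matrices).
    return np.sqrt(np.linalg.det(sample_cov) / np.linalg.det(exact_cov))

def run_trials(run_filter, sigma_f, data_shift, exact_mean, exact_cov,
               n_trials=100, n_members=25, seed=0):
    """Average analysis statistics over repeated realizations."""
    rng = np.random.default_rng(seed)
    means, rel_sigmas = [], []
    for _ in range(n_trials):
        # Forecast ensemble: random spatial shifts of one model solution.
        shifts = sigma_f * rng.standard_normal((n_members, 2))
        centers = run_filter(shifts, data_shift)
        means.append(centers.mean(axis=0))
        rel_sigmas.append(relative_analysis_sigma(centers, exact_cov))
    # Bias of the averaged analysis mean relative to the exact mean.
    bias = np.mean(means, axis=0) - np.asarray(exact_mean)
    return bias, float(np.mean(rel_sigmas))
```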


The results of these experiments are given in Table 6.1. The EnKF essentially does not respond to the data: the center of the burning region remains close to the forecast mean, and the analysis standard deviation is deflated by more than three orders of magnitude. While the morphing EnKF does not approximate the exact analysis distribution well, it does tend to move the center of the fire much closer to the expected location while exhibiting much less degeneracy than the EnKF alone. These results indicate that some modification, such as a larger ensemble or covariance localization, is necessary to improve the accuracy.


Fore. sigma | Data Mean       | Exact Analysis Mean | EnKF Analysis Mean | Morphing Analysis Mean | Rel. Analysis sigma, EnKF | Rel. Analysis sigma, Morphing
1           | (0.5, 1.1)x10^1 | (0.6, 1.1)x10^-3    | (0.5, 1.0)x10^-1   | (0.9, -1.2)x10^1       | 3.6x10^-3                 | 3.8x10^-1
10          | (0.5, 1.1)x10^1 | (0.5, 1.1)x10^-1    | (8.3, -0.0)x10^-1  | (1.1, -0.1)x10^1       | 5.7x10^-4                 | 9.6x10^-2
100         | (0.5, 1.1)x10^1 | (2.7, 5.4)          | (5.6, 1.6)         | (1.3, -0.2)x10^1       | 4.9x10^-4                 | 1.6x10^-2
1           | (2.1, 4.3)x10^1 | (2.2, 4.4)x10^-3    | (0.3, -1.5)x10^-1  | (2.1, -0.1)x10^1       | 8.4x10^-3                 | 4.7x10^-1
10          | (2.1, 4.3)x10^1 | (2.2, 4.4)x10^-1    | (0.1, -1.0)x10^-1  | (4.6, 4.7)x10^1        | 1.2x10^-3                 | 1.0x10^-1
100         | (2.1, 4.3)x10^1 | (1.1, 2.2)x10^1     | (1.3, 4.8)x10^1    | (3.5, 4.9)x10^1        | 7.2x10^-4                 | 1.7x10^-2
1           | (0.9, 1.7)x10^2 | (0.9, 1.8)x10^-2    | (0.3, -1.7)x10^-1  | (5.0, 7.3)x10^1        | 5.8x10^-3                 | 1.2
10          | (0.9, 1.7)x10^2 | (0.9, 1.7)          | (0.2, -2.0)x10^-1  | (0.9, 2.2)x10^2        | 1.1x10^-3                 | 1.3x10^-1
100         | (0.9, 1.7)x10^2 | (4.3, 8.7)x10^1     | (2.4, 6.6)x10^1    | (1.2, 2.3)x10^2        | 3.7x10^-3                 | 1.8x10^-2
1           | (3.4, 6.8)x10^2 | (3.5, 7.0)x10^-2    | (0.3, -1.7)x10^-1  | (0.7, 2.0)x10^2        | 8.6x10^-3                 | 1.1
10          | (3.4, 6.8)x10^2 | (3.5, 7.0)          | (0.2, -2.0)x10^-1  | (1.9, 4.6)x10^2        | 2.3x10^-3                 | 1.7x10^-1
100         | (3.4, 6.8)x10^2 | (1.7, 3.5)x10^2     | (-2.1, -0.4)x10^1  | (2.3, 5.6)x10^2        | 8.7x10^-5                 | 1.1x10^-2

Table 6.1: Analysis ensemble statistics for the EnKF with and without morphing. An ensemble of size 25 is generated from a 2-D fire shifted in space with mean 0 and standard deviation sigma. Four data points are tested, with data standard deviation 75 m. The exact analysis mean is given next to the sample means resulting from the EnKF and the morphing EnKF, as well as the relative analysis standard deviation (6.4). The ensemble statistics were averaged over 100 simulations. While both methods exhibit large errors, the results indicate that the EnKF is unable to move the location of the fire, while the morphing EnKF can. Also, the EnKF alone produces far more degeneracy in the analysis ensemble compared to the morphing EnKF.


6.3 Conclusion

The numerical results show that the morphing EnKF is useful for a highly nonlinear problem (a model problem for wildfire simulation) with a coherent spatial feature of the solution (a propagating fireline). In previous work [Mandel et al., 2005], we used penalization of nonphysical solutions, but the location of the fire in the data could not be too far from the location in the ensemble, artificial perturbation had to be added to retain the spread of the ensemble, and the penalty constant, the amount of additional spread, and the data variance had to be finely tuned for acceptable results. The new method does not appear to have the same limitations. The registration works automatically from gridded data, and no objects need to be specified. The difference between the feature location in the data and in the ensemble can be large, and the data variance can be as small as necessary without causing filter divergence. One essential limitation is that the registered images need to be sufficiently similar, and the registration mapping should be sufficiently similar to the initial guess. This will eventually impose a limitation on how long the ensemble can go without an analysis step. However, compared to previous results for the same problem [Johns and Mandel, 2008, Mandel et al., 2005], the convergence of the present filter is much better.

Several enhancements to the morphing EnKF are in development. These additional features are designed to extend the usability of the technique to more general models and data types. They include the following.

Support for registration of an image consisting of only a subset of the model domain. This is important because a large wildfire may stretch across many kilometers, but an aerial image may only capture a small fragment of the fireline.


Preliminary support for this feature is included in the current implementation by the addition of a masking array. The image is projected onto the model grid, and pixels which do not contain valid data are recorded in the masking array. The registration procedure continues as normal except in the evaluation of the residual norm in the objective function (5.13): residual pixels located where the masking array indicates that no data is present are set to zero (a short sketch of this masked evaluation appears below). This method allows the subimage to be registered while maintaining a smooth morphing function over the whole domain; however, it requires computations over the whole domain even when the image is much smaller. Further research is required to determine whether this method is feasible in general.

We have assumed in the registration that the data is a direct observation of one of the state variables. Even if subimages are allowed, we still cannot operate on data with more complicated observation functions, which frequently occur in real applications. For image data, where each pixel depends only locally on the model variables, it may be possible to substitute u and v in the residual component of the objective function (5.13) with morphed synthetic data, such as u o (I + T) = h([R, T]). This approach requires that the observation function not be computationally expensive, because it will need to be evaluated with each evaluation of the objective function.

There is also a need for similar filtering methods for point data, which might be obtained from weather stations. Applications such as these are more difficult because they do not directly provide information about the location of the feature of interest, and the presence of such information is integral to the concept of the morphing transformation. However, assuming that the observations are made continuously in time, it could be possible to obtain information about when (or if) the feature, such as the fire, passed over the location of the instrument. Future research will be conducted regarding the inclusion of this information into the morphing ensemble filter.
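Returning to the masking approach above, the sketch below illustrates the masked evaluation of the residual norm term in the objective function (5.13): pixels flagged invalid contribute zero, while the morphing function stays defined over the whole domain. The array names and the choice of norm are illustrative, not the actual implementation.

```python
import numpy as np

def masked_residual_norm(u_warped, v_data, mask):
    """Residual norm term of the registration objective with masking.

    u_warped: model image composed with the current morphing function.
    v_data:   observed image projected onto the model grid.
    mask:     boolean array, True where the data pixel is valid.
    """
    # Invalid pixels contribute nothing to the residual norm.
    residual = np.where(mask, u_warped - v_data, 0.0)
    return np.linalg.norm(residual)
```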


Other simple modifications that might improve the success of the algorithm described here are possible. For instance, one could register the ensemble states more often, whether there are new data or not. This would ensure that the initial guess for the registration algorithm is always sufficiently close, improving both the quality of the registration and the time required to produce it. Also, the common state X_0, which the ensemble members are compared against, has been evolved from one initial condition regardless of the data. Over time, such an X_0 could diverge significantly from the ensemble (which tracks the data), resulting in more strenuous registration. If this becomes an issue, a better X_0 might be constructed from the analysis directly (perhaps as the mean of the registration representations of the analysis ensemble members) and then evolved in time until the next analysis step.

While the numerical results indicate that the morphing EnKF provides a vast improvement when using synthetic data generated by the numerical model, no experiments have been attempted using real data. In addition to the complications presented by more realistic observation functions, real data will appear noisier than synthetic data. The real data may not even resemble the synthetic data at all. Realistic fires can exhibit much more complicated behavior, such as splitting into multiple separate fires. It is not clear how the registration and morphing algorithms will react to these situations. Future research will be needed to address these issues as they arise. The fact that existing methods behave so poorly even with the simple experiments contained here provides some optimism for the success of future incarnations of the morphing EnKF.




Bibliography

G. David Alexander, James A. Weinman, and J. L. Schols. The use of digital warping of microwave integrated water vapor imagery to improve forecasts of marine extratropical cyclones. Monthly Weather Review, 126(6):1469-1496, 1998.

Brian D. O. Anderson and John B. Moore. Optimal filtering. Prentice-Hall, Englewood Cliffs, N.J., 1979.

Ed Anderson, Zhaojun Bai, Christian H. Bischof, L. Susan Blackford, James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, A. McKenney, and Danny Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

Jeffrey L. Anderson. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review, 129(12):2884-2903, 2001.

Jeffrey L. Anderson. A local least squares framework for ensemble filtering. Monthly Weather Review, 131(4):634-642, 2003.

Jeffrey L. Anderson. Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D: Nonlinear Phenomena, 230(1-2):99-111, 2007.

Jeffrey L. Anderson and Stephen L. Anderson. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Monthly Weather Review, 127(12):2741-2758, 1999.

Angus Andrews. A square root formulation of the Kalman covariance equations. American Institute of Aeronautics and Astronautics Journal, 6(6):1165-1166, 1968.


Ignacio Arganda-Carreras, Carlos O. Sanchez Sorzano, Roberto Marabini, Jose M. Carazo, Carlos Ortiz-de Solorzano, and Jan Kybic. Consistent and elastic registration of histological sections using vector-spline regularization. In Computer Vision Approaches to Medical Image Analysis, volume 4241 of Lecture Notes in Computer Science, pages 85-95. Springer Berlin / Heidelberg, May 2006.

Jonathan D. Beezley and Jan Mandel. Morphing ensemble Kalman filters. Tellus A, 60(1):131-140, 2008.

Thomas Bengtsson, Chris Snyder, and Doug Nychka. Toward a nonlinear ensemble filter for high dimensional systems. Journal of Geophysical Research: Atmospheres, 108(D24):STS 2-1-10, 2003.

Thomas Bengtsson, Peter Bickel, and Bo Li. Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. Probability and Statistics: Essays in Honor of David A. Freedman, 2:316-334, 2008.

Andrew F. Bennett. Inverse Methods in Physical Oceanography. Cambridge University Press, 1992.

Patrick Billingsley. Probability and measure. John Wiley & Sons Inc., New York, third edition, 1995.

Craig H. Bishop, Brian J. Etherton, and Sharanya J. Majumdar. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129(3):420-436, 2001.

Ake Bjorck. Numerical methods for least squares problems. Society for Industrial and Applied Mathematics, 1996.


L. Susan Blackford, Jaeyoung Choi, Andrew Cleary, Eduardo D'Azevedo, James Demmel, Inderjit Dhillon, Jack Dongarra, Sven Hammarling, Greg Henry, Antoine Petitet, Ken Stanley, David Walker, and R. Clinton Whaley. ScaLAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997.

Fred L. Bookstein. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567-585, 1989.

Lisa Gottesfeld Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325-376, 1992.

Gerrit Burgers, Peter Jan van Leeuwen, and Geir Evensen. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126(6):1719-1724, 1998.

Bradley P. Carlin, Nicholas G. Polson, and David S. Stoffer. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association, 87(418):493-500, 1992.

Rong Chen and Jun S. Liu. Mixture Kalman filters. Journal of the Royal Statistical Society: Series B, 62(3):493-508, 2000.

Rong Chen, J. S. Liu, and Xiaodong Wang. Convergence analyses and comparisons of Markov chain Monte Carlo algorithms in digital communications. IEEE Transactions on Signal Processing, 50(2):255-270, 2002.

Yongsheng Chen and Chris Snyder. Assimilating vortex position with an ensemble Kalman filter. Monthly Weather Review, 135(5):1828-1845, 2007.


Yuan Shih Chow and Henry Teicher. Probability theory. Independence, interchangeability, martingales. Springer-Verlag, New York, third edition, 1997.

Terry L. Clark and William D. Hall. On the design of smooth, conservative vertical grids for interactive grid nesting with stretching. Journal of Applied Meteorology, 35(6):1040-1046, 1996.

Terry L. Clark, Janice Coen, and Don Latham. Description of a coupled atmosphere-fire model. International Journal of Wildland Fire, 13(1):49-64, 2004.

Janice L. Coen, Shankar Mahalingam, and John W. Daily. Infrared imagery of crown fire dynamics during FROSTFIRE. Journal of Applied Meteorology, 43(9):1241-1259, 2004.

Dan Crisan. Particle filters: A theoretical perspective. In Arnaud Doucet, Nando de Freitas, and Neil Gordon, editors, Sequential Monte Carlo Methods in Practice, pages 17-41. Springer, 2001.

Giuseppe Da Prato. An introduction to infinite-dimensional analysis. Springer-Verlag, Berlin, 2006.

Sophie Dabo-Niang. Kernel density estimator in an infinite-dimensional space with a rate of convergence in the case of diffusion process. Applied Mathematics Letters, 17(4):381-386, 2004.

Christopher Davis, Barbara Brown, and Randy Bullock. Object-based verification of precipitation forecasts. Part I: Methodology and application to mesoscale rain areas. Monthly Weather Review, 134(7):1772-1784, 2006.


Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, 1990.

Arnaud Doucet, Nando de Freitas, and Neil Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer, 2001.

Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5):10143-10162, 1994.

Geir Evensen. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics, 53(4):343-367, 2003.

Geir Evensen. Sampling strategies and square root analysis schemes for the EnKF. Ocean Dynamics, 54:539-560, 2004.

Geir Evensen. Data assimilation: The ensemble Kalman filter. Springer, Berlin, 2007.

Poul E. Frandsen, Kristian Jonassen, Hans B. Nielsen, and Ole Tingleff. Unconstrained optimization, 1999. URL http://www.imm.dtu.dk/documents/ftp/publlec/lec2_99.pdf.

A. E. Frey, C. A. Hall, and T. A. Porsching. Some results on the global inversion of bilinear and quadratic isoparametric finite element transformations. Mathematics of Computation, 32(143):725-749, 1978.


Ichiro Fukumori. A partitioned Kalman filter and smoother. Monthly Weather Review, 130(5):1370-1383, 2002.

Reinhard Furrer and Thomas Bengtsson. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. Journal of Multivariate Analysis, 98(2):227-255, 2007.

Lev S. Gandin. Objective Analysis of Meteorological Fields. Gidrometizdat, Leningrad, 1963. In Russian. English translation No. 1373 by Israel Program for Scientific Translations (1965), Jerusalem, 242 pp.

Peisheng Gao and Thomas W. Sederberg. A work minimization approach to image morphing. The Visual Computer, 14(8-9):390-400, 1998.

Gregory Gaspari and Stephen E. Cohn. Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society, 125(554):723-757, 1999.

Neil J. Gordon and Adrian F. M. Smith. Approximate non-Gaussian Bayesian estimation and modal consistency. Journal of the Royal Statistical Society: Series B, 55(4):913-918, 1993.

William W. Hager. Updating the inverse of a matrix. SIAM Review, 31(2):221-239, 1989.

Thomas M. Hamill, Jeffrey S. Whitaker, and Chris Snyder. Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Monthly Weather Review, 129(11):2776-2790, 2001.


Per Christian Hansen. Rank-deficient and discrete ill-posed problems. SIAM Monographs on Mathematical Modeling and Computation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998.

Desmond J. Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43(3):525-546, 2001.

Ross N. Hoffman, Zheng Liu, Jean-Francois Louis, and Christopher Grassotti. Distortion representation of forecast errors. Monthly Weather Review, 123(9):2758-2770, 1995.

Peter L. Houtekamer and Herschel L. Mitchell. Data assimilation using an ensemble Kalman filter technique. Monthly Weather Review, 126(3):796-811, 1998.

Peter L. Houtekamer and Herschel L. Mitchell. A sequential ensemble Kalman filter for atmospheric data assimilation. Monthly Weather Review, 129(1):123-137, 2001.

Brian R. Hunt, Eric J. Kostelich, and Istvan Szunyogh. Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230(1-2):112-126, 2007.

Luis Ibanez, Will Schroeder, Lydia Ng, and Josh Cates. The ITK Software Guide. Kitware, Inc., second edition, 2005. URL http://www.itk.org/ItkSoftwareGuide.pdf.

Andrew H. Jazwinski. Stochastic processes and filtering theory. Academic Press, New York, 1970.


Lijian Jiang and Craig C. Douglas. A unified analysis of 4D variational data assimilation and its application. Computing, submitted, 2008.

Tor A. Johansen. On Tikhonov regularization, bias and variance in nonlinear system identification. Automatica, 33(3):441-446, 1997.

Craig J. Johns and Jan Mandel. A two-stage ensemble Kalman filter for smooth data assimilation. Environmental and Ecological Statistics, 15(1):101-110, 2008.

Rudolf E. Kalman and Richard S. Bucy. New results in linear filtering and prediction theory. Transactions of the ASME-Journal of Basic Engineering, 83:95-108, 1961.

Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering, Series D, 82(1):35-45, 1960.

Eugenia Kalnay. Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 2003.

Sangil Kim, Gregory L. Eyink, Juan M. Restrepo, Francis J. Alexander, and Gregory Johnson. Ensemble filtering for nonlinear dynamics. Monthly Weather Review, 131(11):2586-2594, 2003.

Bas J. K. Kleijn and Aad W. van der Vaart. Misspecification in infinite-dimensional Bayesian statistics. Annals of Statistics, 34(2):837-877, 2006.

Robert Kremens, Jason Faulring, and Colin C. Hardy. Measurement of the time-temperature and emissivity history of the burn scar for remote sensing applications.


Paper J1G.5, Proceedings of the 2nd Fire Ecology Congress, Orlando, FL, American Meteorological Society, 2003. URL http://ams.confex.com/ams/FIRE2003/techprogram/paper_66847.htm.

Hui Hsiung Kuo. Gaussian measures in Banach spaces. Lecture Notes in Mathematics, Vol. 463. Springer-Verlag, Berlin, 1975.

W. Gregory Lawson and James A. Hansen. Alignment error models and ensemble-based data assimilation. Monthly Weather Review, 133(6):1687-1709, 2005.

Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164-168, 1944.

Rodman Linn, Jon Reisner, Jonah J. Colman, and Judith Winterkamp. Studying wildfire behavior using FIRETEC. International Journal of Wildland Fire, 11:233-246, 2002.

Michel Loeve. Probability theory. I. Springer-Verlag, New York, fourth edition, 1977. Graduate Texts in Mathematics, Vol. 45.

Don O. Loftsgaarden and Charles P. Quesenberry. A nonparametric estimate of a multivariate density function. Annals of Mathematical Statistics, 36:1049-1051, 1965.

Jan Mandel. Efficient implementation of the ensemble Kalman filter. CCM Report 231, University of Colorado Denver, 2006. URL http://www.math.cudenver.edu/ccm/reports/rep231.pdf.


Jan Mandel and Jonathan D. Beezley. Predictor-corrector ensemble filters for the assimilation of sparse data into high dimensional nonlinear systems. CCM Report 232, University of Colorado Denver, 2006. URL http://www.math.cudenver.edu/ccm/reports/rep232.pdf.

Jan Mandel and Jonathan D. Beezley. An ensemble Kalman-particle predictor-corrector filter for non-Gaussian data assimilation. ICCS 2009, Lecture Notes in Computer Science, Springer, to appear, 2009. arXiv:0812.2290, 2008.

Jan Mandel and Jonathan D. Beezley. Predictor-corrector and morphing ensemble filters for the assimilation of sparse data into high dimensional nonlinear systems. 11th Symposium on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface (IOAS-AOLS), CD-ROM, Paper 4.12, 87th American Meteorological Society Annual Meeting, San Antonio, TX, January 2007. URL http://ams.confex.com/ams/87ANNUAL/techprogram/paper_119633.htm.

Jan Mandel, Mingshi Chen, Leopoldo P. Franca, Craig Johns, Anatolii Puhalskii, Janice L. Coen, Craig C. Douglas, Robert Kremens, Anthony Vodacek, and Wei Zhao. A note on dynamic data driven wildfire modeling. In Computational Science - ICCS 2004, volume 3038 of Lecture Notes in Computer Science, pages 725-731. Springer, 2004.

Jan Mandel, Lynn S. Bennethum, Mingshi Chen, Janice L. Coen, Craig C. Douglas, Leopoldo P. Franca, Craig J. Johns, Minjeong Kim, Andrew V. Knyazev, Robert Kremens, Vaibhav Kulkarni, Guan Qin, Anthony Vodacek, Jianjia Wu, Wei Zhao, and Adam Zornes. Towards a dynamic data driven application system for wildfire


simulation. In Vaidy S. Sunderam, Geert Dick van Albada, Peter M. A. Sloot, and Jack J. Dongarra, editors, Computational Science - ICCS 2005, volume 3515 of Lecture Notes in Computer Science, pages 632-639. Springer, 2005.

Jan Mandel, Lynn S. Bennethum, Jonathan D. Beezley, Janice L. Coen, Craig C. Douglas, Minjeong Kim, and Anthony Vodacek. A wildland fire model with data assimilation. Mathematics and Computers in Simulation, 79(3):584-606, 2008.

Jan Mandel, Jonathan D. Beezley, Janice L. Coen, and Minjeong Kim. Data assimilation for wildland fires: Ensemble Kalman filters in coupled atmosphere-surface models. IEEE Control Systems Magazine, to appear, 2009a. URL http://arxiv.org/abs/0712.3965.

Jan Mandel, Loren Cobb, and Jonathan D. Beezley. On the convergence of the ensemble Kalman filter. arXiv:0901.2951, 2009b.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.

Robert N. Miller, Everett F. Carter, and Sally T. Blue. Data assimilation into nonlinear stochastic models. Tellus A, 51(2):167-194, 1999.

Jorge J. More, Burton S. Garbow, and Kenneth E. Hillstrom. User Guide for MINPACK-1. ANL-80-74, Argonne National Laboratory, 1980. URL http://www.mcs.anl.gov/~more/ANL8074a.pdf.


Ambrose E. Ononye, Anthony Vodacek, and Eli Saber. Automated extraction of fire line parameters from multispectral infrared images. Remote Sensing of Environment, 108(2):179-188, 2007.

Edward Ott, Brian R. Hunt, Istvan Szunyogh, Aleksey V. Zimin, Eric J. Kostelich, Matteo Corazza, Eugenia Kalnay, D. J. Patil, and James A. Yorke. A local ensemble Kalman filter for atmospheric data assimilation. Tellus A, 56(5):415-428, 2004.

D. J. Patil, Brian R. Hunt, Eugenia Kalnay, James A. Yorke, and Edward Ott. Local low dimensionality of atmospheric dynamics. Physical Review Letters, 86(26):5878-5881, 2001.

Sai Ravela, Kerry Emanuel, and Dennis McLaughlin. Data assimilation by field alignment. Physica D: Nonlinear Phenomena, 230(1-2):127-145, 2007.

Richard C. Rothermel. A mathematical model for predicting fire spread in wildland fires. USDA Forest Service Research Paper INT-115, 1972.

Feng Ruan and Dennis McLaughlin. An efficient multivariate random field generator using the fast Fourier transform. Advances in Water Resources, 21(5):385-399, 1998.

Chris Snyder, Thomas Bengtsson, Peter Bickel, and Jeffrey L. Anderson. Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136(12):4629-4640, 2008.

Istvan Szunyogh, Eric J. Kostelich, Gyorgyi Gyarmati, Eugenia Kalnay, Brian R. Hunt, Edward Ott, Elizabeth Satterfield, and James A. Yorke. A local ensemble


transform Kalman filter data assimilation system for the NCEP global model. Tellus A, 60(1):113-130, 2008.

Michael K. Tippett, Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, and Jeffrey S. Whitaker. Ensemble square root filters. Monthly Weather Review, 131(7):1485-1490, 2003.

Peter Jan van Leeuwen. A variance-minimizing filter for large-scale applications. Monthly Weather Review, 131(9):2071-2084, 2003.

Jeffrey S. Whitaker and Thomas M. Hamill. Ensemble data assimilation without perturbed observations. Monthly Weather Review, 130(7):1913-1924, 2002.

WRF Working Group. Weather Research and Forecasting (WRF) Model. http://www.wrf-model.org, 2005.

Xiaozhen Xiong, Ionel Michael Navon, and Bahri Uzunoglu. A note on the particle filter with posterior Gaussian resampling. Tellus A, 58(4):456-460, 2006.

Barbara Zitova and Jan Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977-1000, 2003.