## Citation

- Permanent Link:
- http://digital.auraria.edu/AA00002989/00001
## Material Information
- Title:
- Principal angles between subspaces as related to Rayleigh quotient and Rayleigh-Ritz inequalities with applications to eigenvalue accuracy and an eigenvalue solver
- Creator:
- Argentati, Merico Edward
- Publication Date:
- 2003
- Language:
- English
- Physical Description:
- 165 leaves ; 28 cm
## Subjects
- Subjects / Keywords:
- Rayleigh quotient ( lcsh )
- Angles (Geometry) ( lcsh )
- Canonical correlation (Statistics) ( lcsh )
- Eigenvalues ( lcsh )
- Angles (Geometry) ( fast )
- Canonical correlation (Statistics) ( fast )
- Eigenvalues ( fast )
- Rayleigh quotient ( fast )
- Genre:
- bibliography ( marcgt )
- theses ( marcgt )
- non-fiction ( marcgt )
## Notes
- Bibliography:
- Includes bibliographical references (leaves 161-165).
- General Note:
- Department of Mathematical and Statistical Sciences
- Statement of Responsibility:
- by Merico Edward Argentati.
## Record Information
- Source Institution:
- University of Colorado Denver
- Holding Location:
- Auraria Library
- Rights Management:
- All applicable rights reserved by the source institution and holding location.
- Resource Identifier:
- 54522667 ( OCLC )
- ocm54522667
- Classification:
- LD1190.L622 2003d A73 ( lcc )

## Full Text

PRINCIPAL ANGLES BETWEEN SUBSPACES AS RELATED TO
RAYLEIGH QUOTIENT AND RAYLEIGH-RITZ INEQUALITIES WITH APPLICATIONS TO EIGENVALUE ACCURACY AND AN EIGENVALUE SOLVER

by Merico Edward Argentati
B.S., Worcester Polytechnic Institute, 1970
M.S., University of California at San Diego, 1972
M.S., University of Colorado at Boulder, 1977

A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Applied Mathematics, 2003.

This thesis for the Doctor of Philosophy degree by Merico Edward Argentati has been approved by Andrew Knyazev and Jan Mandel.

Argentati, Merico Edward (Ph.D., Applied Mathematics)
Principal Angles Between Subspaces as Related to Rayleigh Quotient and Rayleigh-Ritz Inequalities with Applications to Eigenvalue Accuracy and an Eigenvalue Solver
Thesis directed by Professor Andrew Knyazev

ABSTRACT

In this dissertation we focus on three related areas of research: 1) principal angles (sometimes denoted as canonical angles) between subspaces, including theory and numerical results; 2) Rayleigh quotient and Rayleigh-Ritz perturbations and eigenvalue accuracy; and 3) parallel software implementation and numerical results concerning the eigenvalue solver LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient Method) [35], using parallel software libraries and interface specifications based on the Lawrence Livermore National Laboratory, Center for Applied Scientific Computing (LLNL-CASC) High Performance Preconditioners (Hypre) project.

Concerning principal angles or canonical angles between subspaces, we provide some theoretical results and discuss how current codes compute the cosine of principal angles, thus making it impossible, because of round-off errors, to find small angles accurately. We propose several MATLAB based algorithms that compute both small and large angles accurately, and illustrate their practical robustness with numerical examples. We prove basic perturbation theorems for the absolute errors of the sine and cosine of principal angles, with improved constants. MATLAB release 13 has implemented our SVD-based algorithm for the sine.

Secondly, we discuss Rayleigh quotient and Rayleigh-Ritz perturbations and eigenvalue accuracy. Several completely new results are presented. One of the interesting findings characterizes the perturbation of Ritz values for a symmetric positive definite matrix and two different subspaces, in terms of the spectral condition number and the largest principal angle between the subspaces.

The final area of research involves the parallel implementation of the LOBPCG algorithm using Hypre, in which eigenvectors are computed by optimizing Rayleigh quotients with the conjugate gradient method. We discuss a flexible matrix-free parallel algorithm and performance on several test problems. This LOBPCG Hypre software has been integrated into LLNL Hypre Beta Release Hypre-1.8.0b.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed, Andrew Knyazev

DEDICATION

I dedicate this thesis to my dear wife Shannon and my shining, creative daughter Angela, who have supported my research and long hours of study over the years, in spite of my strange obsession with mathematics. I also dedicate this thesis to my mother and father for their love and support over the years, and especially for their encouragement that one can do anything that one wants to if one approaches each task with enough confidence and optimism.
I also dedicate this thesis to my brothers Jim and John, who have always shown support and interest in my work, even though they were far removed from the details. Finally, I would also like to dedicate this thesis to my uncle Jack Morrissey, who inspired me toward science and mathematics, when I was ten years old, by showing me the moon through his 8 inch Newtonian reflecting telescope.

ACKNOWLEDGMENT

I would like to thank my advisor, Andrew Knyazev, for his support and guidance during all stages of my research and dissertation preparation. This work was in part supported by NSF Award DMS 0208773, Preconditioned Algorithms for Large Eigenvalue Problems, PI Andrew Knyazev. A portion of this work, involving the scalable eigensolver, was supported by Lawrence Livermore National Laboratory, Center for Applied Scientific Computing (LLNL-CASC). I would like to thank Rob Falgout, Charles Tong, Panayot Vassilevski and other members of the LLNL-CASC team for their help.

I would also like to thank several CU-Denver graduate students, including Sean Jenson, who helped to test the SUBSPACEA.m code, and Chan-Chai Aniwathananon and Saulo Oliveira, who participated in testing the code ORTHA.m. Rapheal Bar-Or has been very helpful concerning the development of enhancements to algorithms for computing angles between subspaces. Tessa Weinstein has also been a great source of ideas and encouragement which have contributed to progress in my general research.

CONTENTS

Figures
Tables
Algorithms

Chapter
1. Introduction
1.1 Overview, Motivation and Organization
1.2 Notation
2. Principal Angles Between Subspaces
2.1 Introduction to Principal Angles
2.2 Inaccuracy in the Cosine-Based Algorithm
2.3 Properties of Principal Angles and a Sine-Based Algorithm
2.4 Generalization to an A-Based Scalar Product
2.5 Perturbation of Principal Angles in the A-Based Scalar Product
2.6 Algorithm Implementation
2.7 Numerical Examples
2.8 Availability of the Software
2.9 Conclusions Concerning Angles Between Subspaces
3. Rayleigh Quotient Perturbations and Eigenvalue Accuracy
3.1 Introduction to Rayleigh Quotient Perturbations
3.2 Subspaces and the Rayleigh-Ritz Procedure
3.3 Analysis in 2-Dimensions
3.3.1 Eigenvalue Accuracy in 2-Dimensions
3.3.2 Rayleigh Quotient Perturbations in 2-Dimensions
3.4 Analysis in the General Case
3.4.1 Eigenvalue Accuracy
3.4.2 Rayleigh Quotient Perturbations
3.5 Bounds for Rayleigh-Ritz Estimates
3.5.1 Eigenvalue Approximation by Ritz Values
3.5.2 Perturbation of Ritz Values
3.6 Rayleigh Quotient Perturbations using a Matrix Inner Product
3.7 Conclusions Concerning Rayleigh Quotient Perturbations
4. Implementation of a Preconditioned Eigensolver Using Hypre
4.1 Introduction to the Preconditioned Eigenvalue Solver
4.2 Background Concerning the LOBPCG Algorithm
4.3 The Block LOBPCG Algorithm
4.4 LOBPCG Hypre Implementation Strategy
4.5 Software Implementation Details
4.6 Applications Program Interface (API)
4.7 Compiling and Testing
4.8 Numerical Results
4.9 Conclusions Concerning the Hypre LOBPCG Implementation
5. Overall Conclusions and Impact
5.1 Overall Conclusions
5.2 Impact
6. Future Work Directions
Appendices
A. MATLAB Code for SUBSPACEA.M and ORTHA.M
B. LOBPCG Hypre API Functions
C. Primary LOBPCG Functions (lobpcg.c)
References

FIGURES

2.1 Errors in principal angles as functions of n/2
2.2 Errors in individual angles in Algorithms 2.2 and 2.4 with A = I
2.3 Errors in individual angles between F2 and G2 and between F3 and G3
2.4 μk as functions of l and errors as functions of condition number
4.1 LOBPCG Hypre software modules
4.2 LOBPCG scalability as problem size increases
4.3 Execution time to converge as problem size increases
4.4 Performance versus quality of preconditioner
4.5 Convergence rates for different preconditioners

TABLES

2.1 Accuracy of computed angles
2.2 Computed sine and cosine of principal angles of the example of [5]
4.1 LOBPCG test script
4.2 Timing results, scalability
4.3 Scalability data for 3-D Laplacian

ALGORITHMS

2.1 SUBSPACE.m
2.2 Modified and Improved SUBSPACE.m
2.3 SUBSPACEA.m
2.4 Modified and Improved SUBSPACEA.m
4.1 Block LOBPCG Algorithm
4.2 Block LOBPCG Algorithm in Hypre

1. Introduction
1.1 Overview, Motivation and Organization

In this dissertation we focus on three related areas of research: 1) principal angles (sometimes denoted as canonical angles) between subspaces, including theory and numerical results; 2) Rayleigh quotient and Rayleigh-Ritz perturbations and eigenvalue accuracy; and 3) parallel software implementation and numerical results concerning the eigenvalue solver LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient Method) [35], using parallel software libraries and interface specifications based on the Lawrence Livermore National Laboratory, Center for Applied Scientific Computing (LLNL-CASC) High Performance Preconditioners (Hypre) project.

The framework for this research embodies finite dimensional real vector spaces, vectors, subspaces, angles between vectors/subspaces, orthogonal projectors, symmetric and/or positive definite operators, eigenvalue problems, Rayleigh quotients and Rayleigh-Ritz approximations.

In Chapter 2 we discuss principal angles between subspaces. Computation of principal angles or canonical angles between subspaces is important in many applications including statistics [8], information retrieval [43], and analysis of algorithms [52]. This concept allows us to characterize or measure, in a natural way, how two subspaces differ, which is the main connection with perturbation theory. When applied to column-spaces of matrices, the principal angles describe canonical correlations of a matrix pair. We provide some theoretical results and discuss how current codes compute the cosine of principal angles, thus making it impossible, because of round-off errors, to find small angles accurately. We review a combination of sine- and cosine-based algorithms that provide accurate results for all angles, and include a generalization to an A-based scalar product for a symmetric and positive definite matrix A. We propose several MATLAB based algorithms that compute both small and large angles accurately, and illustrate their practical robustness with numerical examples. We prove basic perturbation theorems for the absolute errors of the sine and cosine of principal angles, with improved constants. Numerical examples and a description of our code are given. MATLAB release 13 has implemented our SVD-based algorithm for the sine.

A portion of this research involving angles was presented in several talks, including a talk at the Seventh Copper Mountain Conference on Iterative Methods, March 24-29, 2002, Copper Mountain: Principal Angles Between Subspaces in an A-Based Scalar Product: Dense and Sparse Matrix Algorithms, joint with Andrew Knyazev, and another talk at the Sixth IMACS International Symposium on Iterative Methods in Scientific Computing, March 27-30, 2003, University of Colorado at Denver: Principal Angles Between Subspaces in an A-Based Scalar Product: Interesting Properties, Algorithms and Numerical Examples, joint with Andrew Knyazev.

In Chapter 3 we discuss Rayleigh quotient and Rayleigh-Ritz perturbations and eigenvalue accuracy. There are two reasons for studying this problem [52]: first, the results can be used in the design and analysis of algorithms, and second, a knowledge of the sensitivity of an eigenpair is of both theoretical and practical interest. Rayleigh quotient and Rayleigh-Ritz equalities and inequalities are central to determining eigenvalue accuracy and to analyzing many situations when individual vectors and/or subspaces are perturbed.
We address the case of a perturbation of a general vector, as well as perturbations involving a general vector and an eigenvector, and perturbations involving subspaces. The absolute magnitude of these perturbations is a function of 1) the characteristics of the matrix $A$ (e.g., condition number), 2) the angles between vectors/subspaces, and 3) orthogonal projectors onto the vectors/subspaces that are involved. So the framework for angles, which is discussed in Chapter 2, is also important as a background for Chapter 3. Several completely new results are presented. One of the interesting findings characterizes the perturbation of Ritz values for a symmetric positive definite matrix and two different subspaces, in terms of the spectral condition number and the largest principal angle between the subspaces. Thus, this is an area where angles between subspaces can be used to obtain some elegant results.

The Rayleigh quotient is defined by $\rho(x) = \rho(x; A) = (Ax, x)/(x, x)$, and we are often interested in bounding the absolute difference $|\rho(x; A) - \rho(y; A)|$ when $A \in \mathbb{R}^{n \times n}$ is a symmetric and/or positive definite matrix and $x$ and $y$ are general vectors, or when one of them is an eigenvector. It is natural to characterize these perturbations as some function of the angles between individual vectors and/or of the largest principal angle between the subspaces. In this investigation some relevant and interesting perturbations are discussed and proofs are included. For example, we prove that

$$\frac{|\rho(x; A) - \rho(y; A)|}{\rho(x; A)} \le (\kappa(A) - 1)\,\sin(\angle\{x, y\}),$$

where $\kappa(A)$ is the spectral condition number of $A$. Considering two subspaces $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}^n$, we obtain the new result

$$\frac{|\alpha_j - \beta_j|}{\alpha_j} \le (\kappa(A) - 1)\,\sin(\angle\{\mathcal{X}, \mathcal{Y}\}), \quad j = 1, \dots, m,$$

where $\alpha_j, \beta_j$ for $j = 1, \dots, m$ are the Ritz values of a positive definite matrix $A$ w.r.t. the subspaces $\mathcal{X}$ and $\mathcal{Y}$, and $\angle\{\mathcal{X}, \mathcal{Y}\}$ is the largest principal angle between the subspaces $\mathcal{X}$ and $\mathcal{Y}$.

In our analysis we first restrict the domain to a 2-D subspace. These 2-D results are then extended to larger dimensional spaces. In the literature, this is referred to as a mini-dimensional analysis [39]. This extension requires a careful analysis of the properties of subspaces and Rayleigh-Ritz approximations, which is provided. We also provide several alternative proofs, one of which uses a somewhat unique approach of expressing the Rayleigh quotient as a Frobenius inner product: $\rho(x; A) = (A, P_x) = \mathrm{trace}(A P_x)$, where $P_x$ is the orthogonal projector onto $x$.

In Chapter 4 we discuss the LOBPCG eigensolver algorithm, in detail, and the parallel software implementation of this algorithm. This final area of research involves a project to develop a Scalable Preconditioned Eigenvalue Solver for the solution of eigenvalue problems for large sparse symmetric matrices on massively parallel computers. This work involves the implementation of the LOBPCG algorithm, where eigenvectors are computed by optimizing the Rayleigh quotient with the Conjugate Gradient Method. This Rayleigh quotient optimization uses the Rayleigh-Ritz method; hence this is the connection with Chapter 3 of this work.

The code implementation takes advantage of advances in the Scalable Linear Solvers project, in particular in multigrid technology and in incomplete factorizations (ILU) developed under the Hypre project at LLNL-CASC. The solver allows for the utilization of Hypre preconditioners for symmetric eigenvalue problems. We discuss a flexible matrix-free parallel algorithm, and the capabilities of the developed software.
We gain a large amount of leverage through use of the Hypre parallel library functions that handle the construction of a flexible set of preconditioners, the application of the preconditioned solve during execution of the algorithm, and other linear algebraic operations such as matrix-vector multiplies, scalar products, etc. In the LOBPCG algorithm we implement soft locking, whereby certain vectors may be eliminated from the Rayleigh-Ritz procedure at each iteration. This is done somewhat differently as compared to locking as it is discussed in [2]. This entire development amounts to approximately 4,000 lines of non-commentary source code. We also discuss performance on several test problems. The LOBPCG Hypre software has been integrated into the Hypre software at LLNL and is part of the Hypre Beta Release Hypre-1.8.0b.

A portion of the work, concerning the implementation of a Hypre version of LOBPCG, was presented at the Eleventh Copper Mountain Conference on Multigrid Methods, March 30 - April 4, 2003: Implementation of a Scalable Preconditioned Eigenvalue Solver Using Hypre, joint with Andrew Knyazev.

Many of the insights in this investigation stem from concrete examples and problems, and special cases. For example, concerning Rayleigh quotient inequalities, the special case of 2-D analysis yields major results for the general case. In the case of angles between subspaces, the simple example of the inaccuracy in the current algorithm provided insights to remedy the problem. Finally, the LOBPCG parallel implementation has benefited from code development of MATLAB and C-language programs, and from a number of numerical experiments using both of these earlier implementations.

1.2 Notation

- $\mathbb{R}^n$: vector space of real $n$-tuples.
- $\mathbb{R}^{n \times m}$: vector space of real $n \times m$ matrices.
- $x, y, \dots$: lower case Roman letters denote vectors.
- $A, B, \dots$: upper case Roman letters denote matrices.
- $\mathcal{F}, \mathcal{G}, \dots$: calligraphic letters denote subspaces of $\mathbb{R}^n$.
- $(x, y)$: Euclidean inner product, $(x, y) = x^T y$, where $x, y \in \mathbb{R}^n$.
- $\|x\|$: Euclidean norm, $\|x\|^2 = (x, x)$, where $x \in \mathbb{R}^n$.
- $\|A\|$: for a matrix, the induced operator norm derived from the Euclidean norm, also called the spectral norm.
- $\|x\|_B$: $B$-norm, $\|x\|_B^2 = (Bx, x)$, where $x \in \mathbb{R}^n$ and $B \in \mathbb{R}^{n \times n}$ is a positive definite matrix.
- $\|A\|_B$: for a matrix, the induced operator norm derived from the $B$-norm.
- $(A, B)$: for $A, B \in \mathbb{R}^{n \times n}$, the Frobenius inner product $(A, B) = \mathrm{trace}(A^T B)$.
- $\kappa(A)$: spectral condition number, $\kappa(A) = \|A\| \|A^{-1}\| = \lambda_{\max}/\lambda_{\min}$ for $A$ positive definite.
- $\mathcal{F}^\perp$: the orthogonal complement of the subspace $\mathcal{F}$.
- $\mathcal{F} \ominus \mathcal{G}$: the subspace $\mathcal{F}$ intersected with the orthogonal complement of $\mathcal{G}$, that is, $\mathcal{F} \ominus \mathcal{G} = \mathcal{F} \cap \mathcal{G}^\perp$.
- $\rho(x; A)$: the Rayleigh quotient, defined for $x \in \mathbb{R}^n$ with $x \neq 0$ by $\rho(x; A) = (Ax, x)/(x, x)$. If it is obvious which matrix $A$ is involved, the form $\rho(x)$ may be used.
- $r(x; A)$: the residual, defined for $x \in \mathbb{R}^n$ with $x \neq 0$ by $r(x; A) = (A - \rho(x; A) I) x$. If it is obvious which matrix $A$ is involved, the form $r(x)$ may be used.
- $\angle\{x, y\}$: the acute angle between the vectors $x, y \in \mathbb{R}^n$ with $x, y \neq 0$, given by $\angle\{x, y\} = \arccos \frac{|(x, y)|}{\|x\| \|y\|}$.
- $\angle\{\mathcal{F}, \mathcal{G}\}$: the largest principal angle between the subspaces $\mathcal{F}, \mathcal{G} \subset \mathbb{R}^n$.

2. Principal Angles Between Subspaces

2.1 Introduction to Principal Angles

In this chapter we present concepts and results concerning angles between subspaces. Given two non-zero vectors $x, y \in \mathbb{R}^n$, the acute angle between them is denoted by $\angle\{x, y\} = \arccos \frac{|(x, y)|}{\|x\| \|y\|}$. It is important to emphasize that $0 \le \angle\{x, y\} \le \pi/2$, by definition.
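To make the notation above concrete, the scalar quantities translate directly into a few lines of MATLAB. This is an illustrative sketch only (the helper names are ours, not from the thesis code); it also checks the Frobenius inner product identity $\rho(x; A) = \mathrm{trace}(A P_x)$ mentioned in Chapter 1.

```matlab
% Hypothetical helpers illustrating the notation of section 1.2.
n = 50;
B = randn(n); A = B'*B + n*eye(n);      % a symmetric positive definite test matrix
x = randn(n,1); y = randn(n,1);

rho = @(x,A) (x'*A*x)/(x'*x);           % Rayleigh quotient rho(x;A)
res = @(x,A) (A - rho(x,A)*eye(n))*x;   % residual r(x;A) = (A - rho(x;A) I) x
ang = @(x,y) acos(min(1, abs(x'*y)/(norm(x)*norm(y))));  % acute angle in [0,pi/2]

% Rayleigh quotient as a Frobenius inner product: rho(x;A) = trace(A*Px),
% where Px is the orthogonal projector onto span{x}.
Px = x*x'/(x'*x);
assert(abs(rho(x,A) - trace(A*Px)) < 1e-10*norm(A))
```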
Considering two subspaces, we can recursively define a set of angles between them, which are denoted as principal or canonical angles. Let us consider two real-valued matrices $F$ and $G$, each with $n$ rows, and their corresponding column-spaces $\mathcal{F}$ and $\mathcal{G}$, which are subspaces in $\mathbb{R}^n$, assuming that $p = \dim \mathcal{F} \ge \dim \mathcal{G} = q \ge 1$. Then the principal angles $\theta_1, \dots, \theta_q \in [0, \pi/2]$ between $\mathcal{F}$ and $\mathcal{G}$ may be defined, e.g., [29, 21], recursively for $k = 1, \dots, q$ by

$$\cos(\theta_k) = \max_{u \in \mathcal{F}} \max_{v \in \mathcal{G}} u^T v = u_k^T v_k$$

subject to

$$\|u\| = \|v\| = 1, \quad u^T u_i = 0, \quad v^T v_i = 0, \quad i = 1, \dots, k-1.$$

The vectors $u_1, \dots, u_q$ and $v_1, \dots, v_q$ are called principal vectors. We first find $\theta_1$ and the corresponding principal vectors $u_1$ and $v_1$. To obtain the second angle, we search in subspaces that are orthogonal, respectively, to $u_1$ and $v_1$. Continuing in this manner, always searching in subspaces orthogonal to principal vectors that have already been found, we obtain the complete set of principal angles and principal vectors. Here and below $\|\cdot\|$ denotes the standard Euclidean norm of a vector or, when applied to a matrix, the corresponding induced operator norm, also called the spectral norm, of the matrix.

We are often interested in the largest principal angle $\theta_q$. In this investigation we will denote the largest principal angle by $\theta_q = \angle\{\mathcal{F}, \mathcal{G}\}$. Definition 4.2.1 of [52] provides an equivalent, slightly more intuitive characterization of the largest principal angle when $p = q$, which is given by $\theta_q = \max_{u \in \mathcal{F}} \min_{v \in \mathcal{G}} \angle\{u, v\}$.

According to [51, 47], the notion of canonical angles between subspaces goes back to Jordan (1875). Principal angles between subspaces, and particularly the smallest and the largest angles, serve as important tools in functional analysis (see books [1, 20, 30] and a survey [14]) and in perturbation theory of invariant subspaces, e.g., [9, 51, 48, 33, 42].

Computation of principal angles between subspaces is needed in many applications. For example, in statistics, the angles are closely related to measures of dependency and covariance of random variables; see a canonical analysis of [8]. When applied to column-spaces $\mathcal{F}$ and $\mathcal{G}$ of matrices $F$ and $G$, the principal angles describe canonical correlations of the matrix pair. Principal angles between subspaces also appear naturally in computations of eigenspaces, e.g., [34, 35], where angles provide information about solution quality and need to be computed with high accuracy. In such large-scale applications, it is typical that $n \gg p$; in other words, informally speaking, we are dealing with a small number of vectors having a large number of components. Because of this, we are interested in matrix-free methods; i.e., no $n$-by-$n$ matrices need to be stored in memory in our algorithms.

A singular value decomposition (SVD)-based algorithm [18, 5, 7, 21, 23] for computing cosines of principal angles can be formulated as follows. Let columns of matrices $Q_F \in \mathbb{R}^{n \times p}$ and $Q_G \in \mathbb{R}^{n \times q}$ form orthonormal bases for the subspaces $\mathcal{F}$ and $\mathcal{G}$, respectively. The reduced SVD of $Q_F^T Q_G$ is

$$Y^T Q_F^T Q_G Z = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_q), \quad (2.1)$$

where $Y \in \mathbb{R}^{p \times q}$ and $Z \in \mathbb{R}^{q \times q}$ both have orthonormal columns. Then the principal angles can be computed as

$$\theta_k = \arccos(\sigma_k), \quad k = 1, \dots, q, \quad (2.2)$$

where $0 \le \theta_1 \le \dots \le \theta_q \le \pi/2$, while principal vectors are given by $u_k = Q_F y_k$, $v_k = Q_G z_k$, $k = 1, \dots, q$.
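In MATLAB, this cosine-based computation is a few lines; the sketch below (our illustration, not the thesis code) mirrors (2.1)-(2.2):

```matlab
% Cosine-based principal angles via the SVD of Qf'*Qg, following (2.1)-(2.2).
F = randn(100, 6); G = randn(100, 4);   % test matrices, n >> p >= q
Qf = orth(F); Qg = orth(G);             % orthonormal bases of the column-spaces
[Y, Sigma, Z] = svd(Qf'*Qg, 'econ');    % reduced SVD; singular values are cosines
sigma = min(1, diag(Sigma));            % clip round-off values slightly above 1
theta = acos(sigma);                    % principal angles, ascending in theta
U = Qf*Y; V = Qg*Z;                     % corresponding principal vectors
```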
The equivalence [5, 21] of the original geometric definition of principal angles and the SVD-based approach follows from the next general theorem on an equivalent representation of singular values.

Theorem 2.1 If $M \in \mathbb{R}^{m \times n}$, then the singular values of $M$ are defined recursively by

$$\sigma_k := \max_{y \in \mathbb{R}^m} \max_{z \in \mathbb{R}^n} y^T M z = y_k^T M z_k, \quad k = 1, \dots, \min\{m, n\}, \quad (2.3)$$

subject to

$$\|y\| = \|z\| = 1, \quad y^T y_i = 0, \quad z^T z_i = 0, \quad i = 1, \dots, k-1. \quad (2.4)$$

The vectors $y_i$ and $z_i$ are, respectively, left and right singular vectors.

Proof: The proof of the theorem is straightforward if based on Allakhverdiev's representation (see [20]) of singular numbers, and on using the well-known characterization of the induced Euclidean norm of a matrix as the norm of the corresponding bilinear form.

To apply the theorem to principal angles, one takes $M = Q_F^T Q_G$.

In the most recent publication on the subject, [16], the SVD-based algorithm for cosine is proved to be stable, but QR factorizations with complete pivoting are recommended for computing $Q_F$ and $Q_G$ to improve stability. The SVD-based algorithm for cosine was considered as standard and is implemented in software packages, e.g., in MATLAB, version 5.3, 2000, code SUBSPACE.m, revision 5.5,¹ where $Q_F \in \mathbb{R}^{n \times p}$ and $Q_G \in \mathbb{R}^{n \times q}$ are computed using the QR factorization. However, this algorithm cannot provide accurate results for small angles because of the presence of round-off errors. Namely, when using the standard double-precision arithmetic with $EPS \approx 10^{-16}$, the algorithm fails to accurately compute angles smaller than $10^{-8}$ (see section 2.2). The problem has been highlighted in the classical paper [5], where a cure has also been suggested (see also publications on cosine-sine (CS) decomposition methods [50, 54, 51, 47]), but apparently it did not attract enough attention.

In statistics, most software packages include a code for computing $\sigma_k = \cos(\theta_k)$, which are called canonical correlations; see, e.g., the CANCOR Fortran code in the FIRST MDS Package of AT&T, the CANCR (DCANCR) Fortran subroutine in the IMSL STAT/LIBRARY, the G03ADF Fortran code in the NAG package, the CANCOR subroutine in Splus, and the CANCORR procedure in SAS/STAT Software. While accurately computing the cosine of principal angles in corresponding precision, these codes do not compute the sine. However, the cosine simply equals one in double precision for all angles smaller than $10^{-8}$ (see next section). Therefore, it is impossible in principle to observe an improvement in canonical correlations for angles smaller than $10^{-8}$ in double precision. It might not be typically important when processing experimental statistical data, because the expected measurement error may be so great that a statistician would deem the highly correlated variable essentially redundant and therefore not useful as a further explanatory variable in their model. Statistical computer experiments are different, however, as there is no measurement error, so accurate computation of very high correlations may be important in such applications.

¹Revision 5.8 of SUBSPACE.m in MATLAB release 12.1, version 6.1.0.450, May 18, 2001, is still identical to revision 5.5, which we have used for numerical tests in this investigation. MATLAB release 13 has implemented the SVD-based algorithm for sine.

The largest principal angle is related to the notion of distance, or a gap, between equidimensional subspaces. If $p = q$, the distance is defined [1, 20, 21, 30] as

$$\mathrm{gap}(\mathcal{F}, \mathcal{G}) = \|P_\mathcal{F} - P_\mathcal{G}\| = \sin(\theta_q) = \sqrt{1 - (\cos(\theta_q))^2}, \quad (2.5)$$

where $P_\mathcal{F}$ and $P_\mathcal{G}$ are orthogonal projectors onto $\mathcal{F}$ and $\mathcal{G}$, respectively. This formulation provides insight into a possible alternative algorithm for computing the sine of principal angles.
The corresponding algorithm, described in [5], while being mathematically equivalent to the previous one in exact arithmetic, is accurate for small angles in computer arithmetic, as it computes the sine of principal angles directly, without using the SVD (2.1) leading to the cosine. We review the algorithm of [5], based on a general form of (2.5), in section 2.3 and suggest an improved version, similar to the CS decomposition algorithm of [54], with the second SVD of the reduced size.

The CS decomposition methods, e.g., [50, 54, 51, 47], one of which we just mentioned, provide a well-known and popular alternative approach for computing principal angles between subspaces given by selected $p$ (or $q$) columns of orthogonal matrices of the size $n$. For example, if the matrix $Q_{F^\perp}$, with orthonormal columns that span the subspace $\mathcal{F}^\perp$, the orthogonal complement of $\mathcal{F}$, is available to us, the CS decomposition methods compute the SVD of $Q_F^T Q_G$ together with the SVD of $Q_{F^\perp}^T Q_G$, thus providing cosine and sine of principal angles. When $p$ is of the same order as $n/2$, the matrix $Q_{F^\perp}$ is about of the same size as the matrix $Q_F$, and the CS decomposition methods are effective and are recommended for practical computations. However, when $n \gg p$, the CS decomposition methods, explicitly using the matrix $Q_{F^\perp}$ of the size $n$-by-$(n-p)$, will be less efficient compared to the matrix-free methods we consider in this investigation. Let us highlight that the cosine- and sine-based methods of [5] that we investigate here in section 2.3, while different algorithmically from the CS decomposition methods, are very close mathematically to them. A different sine-based approach, using eigenvalues of $P_\mathcal{F} - P_\mathcal{G}$, is described in [7, 47]; see a similar statement in Theorem 2.5. It is also not attractive numerically, when $n \gg p$, as it requires computing an $n$-by-$n$ matrix and finding all its nonzero eigenvalues.

In some applications, e.g., when solving symmetric generalized eigenvalue problems [34], the default scalar product $u^T v$ cannot be used and needs to be replaced with an $A$-based scalar product $(u, v)_A = u^T A v$, where $A$ is a symmetric positive definite matrix. In statistics, a general scalar product for computing canonical correlations gives a user an opportunity, for example, to take into account a priori information that some vector components are more meaningful than others. In a purely mathematical setting, generalization to $A$-based scalar products brings nothing really new. In practical computations, however, it carries numerous algorithmic and numerical problems, especially for ill-conditioned cases, which are important in applications. In section 2.4, we propose an extension of the algorithms to an $A$-based scalar product and provide the corresponding theoretical justification.

In section 2.5, we turn our attention to perturbation estimates, which generalize the following trigonometric inequalities: if an angle $\theta \in [0, \pi/2]$ is perturbed by $\epsilon \in [0, \pi/2]$ such that $\theta + \epsilon \in [0, \pi/2]$, then

$$0 \le \cos(\theta) - \cos(\theta + \epsilon) \le \sin(\theta + \epsilon) \sin(\epsilon) \le \sin(\epsilon),$$
$$0 \le \sin(\theta + \epsilon) - \sin(\theta) \le \cos(\theta) \sin(\epsilon) \le \sin(\epsilon).$$

We prove new absolute perturbation estimates for the sine and cosine of principal angles computed in the $A$-based scalar product. When $A = I$, our estimates are similar to those of [5, 55, 53, 23, 22], but the technique we use is different. More importantly, our constants are somewhat better; in fact, they are better in the same way that the constants in the middle terms of the trigonometric inequalities above are less than one.
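These two inequalities are elementary but easy to get wrong by a sign; a quick numerical sanity check (our sketch, not part of the thesis) confirms them on a grid of admissible angles:

```matlab
% Numerically verify the two trigonometric perturbation inequalities
% on a grid of (theta, eps) pairs with theta + eps <= pi/2.
[t, e] = meshgrid(linspace(0, pi/2, 200));
ok = t + e <= pi/2;                 % admissible pairs only
lhs1 = cos(t) - cos(t + e);         % cosine perturbation
lhs2 = sin(t + e) - sin(t);         % sine perturbation
assert(all(lhs1(ok) >= -1e-15 & lhs1(ok) <= sin(t(ok)+e(ok)).*sin(e(ok)) + 1e-15))
assert(all(lhs2(ok) >= -1e-15 & lhs2(ok) <= cos(t(ok)).*sin(e(ok)) + 1e-15))
```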
We consider a particular implementation of the algorithms, used in our MATLAB code SUBSPACEA.m, in section 2.6, with emphasis on the large-scale case $n \gg p$ and a sparse ill-conditioned matrix $A$, which may be specified only as a function that multiplies $A$ by a given vector. When matrices $F$ and $G$ are sparse, our code can still be used, even though it performs orthogonalization of the columns of $F$ and $G$, which increases the fill-in; cf. [23]. Also, we do not even touch here upon a practically important issue of the possibility of recomputing the correlations with an increase in the data; see again [23]. Finally, numerical results, presented in section 2.7, demonstrate the practical robustness of our code.

For simplicity, we discuss only real spaces and real scalar products; however, all results can be trivially generalized to cover complex spaces as well. In fact, our code SUBSPACEA.m is written for the general complex case.

As was pointed out in [40], there are several natural questions that are of interest, but are not considered here:

- Our algorithms are based on the SVD. How does SVD accuracy (cf. [4, 12, 11, 16]), especially for small singular values, or in ill-conditioned cases, affect the results? In [16], a formal stability analysis is done for the SVD-based algorithm for cosine, which is proved to be mixed stable. In our numerical tests, the practical robustness of our algorithms is encouraging.
- Are our methods accurate and stable theoretically, e.g., see [26]?
- For $A$-based scalar products, how does the increase of the condition number of $A$ influence the accuracy? Which parts of our algorithm are responsible for the main error growth?

We feel, however, that investigating these matters is not within the scope of this work. They may rather serve as interesting directions for future research.

2.2 Inaccuracy in the Cosine-Based Algorithm

Let $d$ be a constant and

$$\mathcal{F} = \mathrm{span}\left\{(1 \;\; 0)^T\right\}, \quad \mathcal{G} = \mathrm{span}\left\{(1 \;\; d)^T\right\}.$$

Then the angle between the one-dimensional subspaces $\mathcal{F}$ and $\mathcal{G}$ can be computed as

$$\theta = \arcsin\left(\frac{d}{\sqrt{1 + d^2}}\right). \quad (2.6)$$

In the table below, $d$ varies from one to small values. Formula (2.6) is accurate for small angles, so we use it as an exact answer in the second column of the table. We use the MATLAB built-in function SUBSPACE.m (revision 5.5), which implements (2.1), to compute values for the third column of the table.

| $d$ | Formula (2.6) | SUBSPACE.m |
|---|---|---|
| 1.0 | 7.853981633974483e-001 | 7.853981633974483e-001 |
| 1.0e-04 | 9.999999966666666e-005 | 9.999999986273192e-005 |
| 1.0e-06 | 9.999999999996666e-007 | 1.000044449242271e-006 |
| 1.0e-08 | 1.000000000000000e-008 | -6.125742274543099e-017 |
| 1.0e-10 | 1.000000000000000e-010 | -6.125742274543099e-017 |
| 1.0e-16 | 9.999999999999998e-017 | -6.125742274543099e-017 |
| 1.0e-20 | 9.999999999999998e-021 | -6.125742274543099e-017 |
| 1.0e-30 | 1.000000000000000e-030 | -6.125742274543099e-017 |

Table 2.1: Accuracy of computed angles

It is apparent that SUBSPACE.m returns inaccurate results for $d \le 10^{-8}$, which is approximately $\sqrt{EPS}$ for double precision. In this simple one-dimensional example the algorithm of SUBSPACE.m is reduced to computing

$$\theta = \arccos\left(\frac{1}{\sqrt{1 + d^2}}\right).$$

This formula clearly shows that the inability to compute small angles accurately is integrated in the standard algorithm and cannot be fixed without changing the algorithm itself. The cosine, that is, a canonical correlation, is computed accurately and simply equals one for all positive $d \le 10^{-8}$. However, one cannot determine small angles from a cosine accurately in the presence of round-off errors. In statistical terms, this illustrates the problem, already mentioned above, that the canonical correlation itself does not show any improvement in correlation when $d$ is smaller than $10^{-8}$ in double precision.
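The failure is easy to reproduce; the following sketch (ours, not the thesis code) contrasts the two one-dimensional formulas without calling SUBSPACE.m itself:

```matlab
% Reproduce Table 2.1 in spirit: the cosine-based formula loses all accuracy
% once d drops below sqrt(eps), while the sine-based formula (2.6) stays exact.
for d = [1 1e-4 1e-6 1e-8 1e-10 1e-16 1e-20 1e-30]
    theta_sin = asin(d/sqrt(1 + d^2));   % formula (2.6), accurate for small angles
    theta_cos = acos(1/sqrt(1 + d^2));   % what the cosine-based algorithm computes
    fprintf('%8.1e  %22.15e  %22.15e\n', d, theta_sin, theta_cos);
end
```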
In statistical terms, it illustrates the problem we already mentioned 17 d Formula (2.6) SUBSPACE.m 1.0 1.0e-04 1.0e-06 1.0e-08 1.0e-10 1.0e-16 1.0e-20 1.0e-30 7.853981633974483e-001 9.999999966666666e-005 9,999999999996666e-007 1.000000000000000e-008 1.000000000000000e-010 9.999999999999998e-017 9.999999999999998e-021 1.000000000000000e-030 7.853981633974483e-001 9.999999986273192e-005 1.000044449242271e-006 -6.125742274543099e-017 -6.125742274543099e-017 -6.125742274543099e-017 -6.125742274543099e-017 -6.125742274543099e-017 Table 2.1: Accuracy of computed angles above that the canonical correlation itself does not show any improvement in correlation when d is smaller than 10~8 in double precision. In the next section, we consider a formula [5] that directly computes the sine of principal angles as in (2.6). 2.3 Properties of Principal Angles and a Sine-Based Algorithm We first review known sine-based formulas for the largest principal angle. Results of [1, 30] concerning the aperture of two linear manifolds give ll-Pp PaII = max {maxx5iW=i ||(7 PF)x\\, max^|W,i ]|(/ Pa)s||} . (2.7) Let columns of matrices Qp RnXp and Qcj Â£ Rnxq form orthonormal bases for the subspaces T and Q, respectively. Then orthogonal projectors on T and Q are Pp = QpQ'p and Pc = QgQgi respectively, and ||PF Poll = max{||(J QfQtf)Qc\\, |l(7 QoQDQfII}. (2.8) 18 If p ^ g, then expression of (2.8) is always equal to one; e.g., if p > q, then the second term under the maximum is one. If p = q, then both terms are the same and yield sin(0g) by (2.5). Therefore, under our assumption p > g, only the first term is interesting to analyze. We note that the first term is the largest singular value of (7 QfQf)Qg What is the meaning of other singular values of the matrix? This provides an insight into how to find a sine-based formulation to obtain the principal angles, which is embodied in the following theorem [5]. Theorem 2.2 Singular values Ah < /h < < pLq of the matrix (7 QfQ1f)Qg are given by fik = \/l of, k = 1,... ,g, where cr*, are defined in (2.1). More- over., the principal angles satisfy the equalities Qk = arcsin(/h). The right principal vectors can be computed as Vk = QcZk, k = l,...,q, where Zk are corresponding orthonormal right singular vectors of matrix (7 QfQf)Qg- The left principal vectors are then computed by uk = QFQpVk/cTk ifak^0, k = 1,..., g. Proof: Our proof is essentially the same as that of [5]. We reproduce it here for completeness as we use a similar proof later for a general scalar product. Let B = (7 Pf)Qg = (7 QfQf)Qg Using the fact that 7 Pp is a projector and that QqQg = 7, we have BtB = Qtg{I-Pf){I-Pf)Qg = QTg{I-Pf)Qg = 7 QgQfQfQg- 19 Utilizing the SVD (2.1), we obtain Q^Qg = YT,ZT, where S = diag (01, cr2,..., aq); then ZtBtBZ = / E2 = diag (1 of, 1 of,..., 1 of). Thus, the singular values of B are given by fik = x/iTof, A: = 1,..., q, and the formula for the principal angles 9k = arcsin(^) follows directly from (2.2). We can now use the theorem to formulate an algorithm for computing all the principal angles. This approach meets our goal of a sine-based formulation, which should provide accurate computation of small angles. However, for large angles we keep the cosine-based algorithm. 20 Algorithm 2.1: SUBSPACE.m Input: real matrices F and G with the same number of rows. 1. Compute orthonormal bases QF = orth(F), Qg = orth(G) of column-spaces of F and G. 2. Compute SVD for cosine: [Y, S, Z] = svc1(Q^(5g)j S = diag (cri,..., crq). 3. 
Algorithm 2.1: SUBSPACE.m

Input: real matrices $F$ and $G$ with the same number of rows.

1. Compute orthonormal bases $Q_F = \mathrm{orth}(F)$, $Q_G = \mathrm{orth}(G)$ of the column-spaces of $F$ and $G$.
2. Compute the SVD for cosine: $[Y, \Sigma, Z] = \mathrm{svd}(Q_F^T Q_G)$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_q)$.
3. Compute matrices of left $U_{\cos} = Q_F Y$ and right $V_{\cos} = Q_G Z$ principal vectors.
4. Compute the matrix $B = Q_G - Q_F(Q_F^T Q_G)$ if $\mathrm{rank}(Q_F) \ge \mathrm{rank}(Q_G)$; $B = Q_F - Q_G(Q_G^T Q_F)$ otherwise.
5. Compute the SVD for sine: $[Y, \mathrm{diag}(\mu_1, \dots, \mu_q), Z] = \mathrm{svd}(B)$.
6. Compute matrices $U_{\sin}$ and $V_{\sin}$ of left and right principal vectors: $V_{\sin} = Q_G Z$, $U_{\sin} = Q_F (Q_F^T V_{\sin}) \Sigma^{-1}$ if $\mathrm{rank}(Q_F) \ge \mathrm{rank}(Q_G)$; $U_{\sin} = Q_F Z$, $V_{\sin} = Q_G (Q_G^T U_{\sin}) \Sigma^{-1}$ otherwise.
7. Compute the principal angles, for $k = 1, \dots, q$: $\theta_k = \arccos(\sigma_k)$ if $\sigma_k^2 < 1/2$; $\theta_k = \arcsin(\mu_k)$ if $\mu_k^2 \le 1/2$.
8. Form matrices $U$ and $V$ by picking up corresponding columns of $U_{\sin}$, $V_{\sin}$ and $U_{\cos}$, $V_{\cos}$, according to the choice for $\theta_k$ above.

Output: principal angles $\theta_1, \dots, \theta_q$ between the column-spaces of matrices $F$ and $G$, and corresponding matrices $U$ and $V$ of left and right principal vectors, respectively.

Remark 2.1 In step 1 of the algorithm, the orthogonalization can be performed using the QR method or the SVD. In our actual code, the SVD-based built-in MATLAB function ORTH.m is used for the orthogonalization. It is pointed out in [16] that errors in computing $Q_F$ and $Q_G$, especially expected for ill-conditioned $F$ and $G$, may lead to irreparable damage in the final answers. A proper column scaling of $F$ and $G$ could in some cases significantly reduce the condition numbers of $F$ and $G$. We highlight that an explicit columnwise normalization of the matrices $F$ and $G$ is not required prior to orthogonalization if the particular orthogonalization algorithm used here is invariant under column scaling in finite precision arithmetic. Our numerical tests show that the explicit column scaling is not needed if we utilize the built-in MATLAB function QR.m for orthonormalization. However, the explicit column scaling apparently helps to improve the accuracy when the SVD-based built-in MATLAB function ORTH.m is used for orthonormalization. In [16], QR factorizations with complete pivoting are recommended for computing $Q_F$ and $Q_G$.

Remark 2.2 A check of $\mathrm{rank}(Q_F) \ge \mathrm{rank}(Q_G)$ in steps 4 and 6 of the algorithm removes the need for our assumption $p = \dim \mathcal{F} \ge \dim \mathcal{G} = q$.

Remark 2.3 We use in step 4

$$(I - Q_F Q_F^T) Q_G = Q_G - Q_F(Q_F^T Q_G), \quad (I - Q_G Q_G^T) Q_F = Q_F - Q_G(Q_G^T Q_F)$$

to avoid storing in memory any $n$-by-$n$ matrices in the algorithm, which allows us to compute principal angles efficiently for large $n \gg p$ as well. If the matrix $Q_{F^\perp}$, with orthonormal columns that span the subspace $\mathcal{F}^\perp$, the orthogonal complement of $\mathcal{F}$, were available to us when $p \ge q$, we could take here $B = Q_{F^\perp}^T Q_G$, as in the CS decomposition methods; see [50, 54, 51, 47]. Under our assumption $n \gg p$, however, the matrix $Q_{F^\perp}$ is essentially of the size $n$ and thus shall be avoided.

The 1/2 threshold used in Algorithm 2.1 in steps 7 and 8 to separate small and large principal angles and corresponding vectors seems to be a natural choice. However, such an artificial fixed threshold may cause troubles with orthogonality in the resulting choice of vectors in step 8 if there are several angles close to each other but on different sides of the threshold. The problem is that the corresponding principal vectors, picked up from two orthogonal sets computed by different algorithms, may not be orthogonal. A more accurate approach would be to identify such a possible cluster of principal angles around the original threshold and to make sure that all principal vectors corresponding to the cluster are chosen according to either step 3 or step 6, but not both.
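A compact, self-contained MATLAB sketch of Algorithm 2.1 follows. This is our condensed illustration, assuming $p \ge q$, no rank deficiency, and all $\sigma_k > 0$ (as noted below, $\Sigma$ must be invertible); the production code SUBSPACE.m handles the general case:

```matlab
function [theta, U, V] = subspace_angles(F, G)
% Sketch of Algorithm 2.1: cosine-based SVD for large angles,
% sine-based SVD for small angles; assumes size(F,2) >= size(G,2).
Qf = orth(F); Qg = orth(G);
[Yc, Sc, Zc] = svd(Qf'*Qg, 'econ');         % step 2: cosines
sigma = min(1, diag(Sc));
Ucos = Qf*Yc; Vcos = Qg*Zc;                 % step 3
B = Qg - Qf*(Qf'*Qg);                       % step 4, no n-by-n matrix formed
[~, Ss, Zs] = svd(B, 'econ');               % step 5: sines
mu = flipud(diag(Ss)); Zs = fliplr(Zs);     % ascending, aligned with sigma order
Vsin = Qg*Zs;
Usin = Qf*((Qf'*Vsin)/diag(sigma));         % step 6; requires all sigma_k > 0
small = mu.^2 <= 1/2;                       % steps 7-8: choose per angle
theta = zeros(size(sigma)); U = Ucos; V = Vcos;
theta(~small) = acos(sigma(~small));
theta(small)  = asin(mu(small));
U(:, small) = Usin(:, small); V(:, small) = Vsin(:, small);
end
```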
Algorithm 2.2: Modified and Improved SUBSPACE.m

Input: real matrices $F$ and $G$ with the same number of rows.

1. Compute orthonormal bases $Q_F = \mathrm{orth}(F)$, $Q_G = \mathrm{orth}(G)$ of the column-spaces of $F$ and $G$.
2. Compute the SVD for cosine: $[Y, \Sigma, Z] = \mathrm{svd}(Q_F^T Q_G)$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_q)$.
3. Compute matrices of left $U_{\cos} = Q_F Y$ and right $V_{\cos} = Q_G Z$ principal vectors.
4. Compute the large principal angles, for $k = 1, \dots, q$: $\theta_k = \arccos(\sigma_k)$ if $\sigma_k^2 < 1/2$.
5. Form parts of matrices $U$ and $V$ by picking up corresponding columns of $U_{\cos}$, $V_{\cos}$, according to the choice for $\theta_k$ above. Put the columns of $U_{\cos}$, $V_{\cos}$ which are left in matrices $R_F$ and $R_G$. Collect the corresponding $\sigma_k$'s in a diagonal matrix $\Sigma_R$.
6. Compute the matrix $B = R_G - Q_F(Q_F^T R_G)$.
7. Compute the SVD for sine: $[Y, \mathrm{diag}(\mu_1, \dots, \mu_q), Z] = \mathrm{svd}(B)$.
8. Compute matrices $U_{\sin}$ and $V_{\sin}$ of left and right principal vectors: $V_{\sin} = R_G Z$, $U_{\sin} = R_F (R_F^T V_{\sin}) \Sigma_R^{-1}$.
9. Recompute the small principal angles, for $k = 1, \dots, q$: $\theta_k = \arcsin(\mu_k)$ if $\mu_k^2 \le 1/2$.
10. Complete matrices $U$ and $V$ by adding the columns of $U_{\sin}$, $V_{\sin}$.

Output: principal angles $\theta_1, \dots, \theta_q$ between the column-spaces of matrices $F$ and $G$, and corresponding matrices $U$ and $V$ of left and right principal vectors, respectively.

Let us repeat that, in exact arithmetic, the sine- and cosine-based approaches give the same results; e.g., columns of $U_{\sin}$ and $V_{\sin}$ must be the same as those of $U_{\cos}$ and $V_{\cos}$. Why do we need to recompute essentially the same vectors a second time?
However, it may find accurately the corresponding invariant subspaces spanned by all these vectors. Thus, the idea is that, usingT7C0S and Vcos, we can identify invariant subspaces in T and Q, which correspond to all small principal angles. Then, we perform the second SVD only for these subspaces, computing only columns of Usin and that we actually need, which may significantly reduce the size of the matrix in the second SVD. This idea is used in the CS decomposition algorithm of [54], We keep steps 1-3 of Algorithm 2.1 unchanged but modify accordingly later steps to obtain Algorithm 2.2. Such changes may significantly reduce the size of matrix B and, thus, the costs, if large principal angles outnumber small ones; e.g., if there are no small principal angles at all, the algorithm simply stops at step 3. We note that matrix Sr is always invertible, unlike matrix S. Remark 2.4 By construction, matrices Rp and Rg have the same number of already orthogonal columns, which removes the need for orthogonalization and for comparing, in step 6, their ranks. Remark 2.5 We have three, equivalent in exact arithmetic, possibilities to com- 26 pute matrix B : B (I RfR^)Rg = Rg Rf{RfRg) 1 Rg Qf{Q^Rg)- The first formula is ruled out to avoid storing in memory any n-by-n matrices in the algorithm. Our numerical tests show that the third expression, though somewhat more expensive than the second one, often provides more accurate results in the presence of round-off errors. To summarize, Algorithms 2.1 and 2.2 use the cosine-based formulation (2.1), (2.2) for large angles and the sine-based formulation of Theorem 2.2 for small angles, which allows accurate computation of all angles. The algorithms are reasonably efficient for large-scale applications with n > p and are more robust than the original cosine-based only version. In the rest of the section, we describe some useful properties of principal angles not yet mentioned. In this investigation, we follow [33] and make use of an orthogonal projectors technique [25]. For an alternative approach, popular in matrix theory, which is based on representation of subspaces in a canonical CS form, we refer to [51]. Theorem 2.2 characterizes singular values of the product (/ Pf)Qgi which are the sine of the principal angles. What axe singular values of the matrix PfQg? A trivial modification of the previous proof leads to the following not really surprising result that these are the cosine of the principal angles. Theorem 2.3 Singular values of the matrix QfQ^fQg a,re exactly the same as Ok, defined in (2.1). 27 We conclude this section with other simple and known (e.g., [55, 51, 56]) sine and cosine representations of principal angles and principal vectors, this time using orthogonal projectors Pp and Pq on subspaces P and Q, respectively. Theorem 2.4 Let assumptions of Theorem 2.2 be satisfied. Then oy > > aq are the q largest singular values of the matrix PfPg; in particular, a1 = \\PFPG\\. Other n q singular values are all equal to zero. Remark 2.6 As singular values of PpPa are the same as those of PgPf, sub- spaces P and Q play symmetric roles in Theorem 2.4; thus, our assumption that p = dimiF > dimt? = q is irrelevant here. Theorem 2.5 Let assumptions of Theorem 2.2 be satisfied. Then pi < P2 < < pq are the q largest singular values of the matrix (I Pf)Pq; in particular, P., = \\(I-Pf)Pg\\. Other n q singular values are all equal to zero. 
Remark 2.7 Comparing Theorems 2.4 and 2.5 shows trivially that the sines of principal angles between $\mathcal{F}$ and $\mathcal{G}$ are the same as the cosines of principal angles between $\mathcal{F}^\perp$ and $\mathcal{G}$, because $I - P_\mathcal{F}$ is an orthogonal projector onto $\mathcal{F}^\perp$. If $p > n/2 > q$, it may be cheaper to compute principal angles between $\mathcal{F}^\perp$ and $\mathcal{G}$ instead of principal angles between $\mathcal{F}$ and $\mathcal{G}$.

What can we say about the singular values of the matrix $(I - P_\mathcal{G}) P_\mathcal{F}$? In other words, how do the cosines of principal angles between the subspaces $\mathcal{F}^\perp$ and $\mathcal{G}$ compare to the cosines of principal angles between their orthogonal complements $\mathcal{F}$ and $\mathcal{G}^\perp$? If $p = q$, they are absolutely the same; in particular, the minimal angle between the subspaces $\mathcal{F}^\perp$ and $\mathcal{G}$ is in this case the same as the minimal angle between their orthogonal complements $\mathcal{F}$ and $\mathcal{G}^\perp$, e.g., in [14], and, in fact, is equal to $\mathrm{gap}(\mathcal{F}, \mathcal{G}) = \|P_\mathcal{F} - P_\mathcal{G}\|$, as we already discussed. When $p > q$, the subspaces $\mathcal{F}$ and $\mathcal{G}^\perp$ must have a nontrivial intersection, because the sum of their dimensions is too big; thus, the minimal angle between the subspaces $\mathcal{F}$ and $\mathcal{G}^\perp$ must be zero in this case, which corresponds to $\|(I - P_\mathcal{G}) P_\mathcal{F}\| = 1$, while $\|(I - P_\mathcal{F}) P_\mathcal{G}\|$ may be less than one. To be more specific, $\dim(\mathcal{F} \cap \mathcal{G}^\perp) \ge p - q$; thus, at least $p - q$ singular values of the matrix $(I - P_\mathcal{G}) P_\mathcal{F}$ are equal to one. Then, we have the following statement, which completely clarifies the issue of principal angles between orthogonal complements; cf. Ex. 1.2.6 of [7].

Theorem 2.6 The set of singular values of $(I - P_\mathcal{G}) P_\mathcal{F}$, when $p \ge q$, consists of $p - q$ ones, the $q$ singular values of $(I - P_\mathcal{F}) P_\mathcal{G}$, and $n - p$ zeros.

In particular, this shows that the smallest positive sine of principal angles between $\mathcal{F}$ and $\mathcal{G}$, called the minimum gap, is the same as that between $\mathcal{F}^\perp$ and $\mathcal{G}^\perp$; see [30]. This theorem can also be used to reduce the costs of computing the principal angles between subspaces $\mathcal{F}$ and $\mathcal{G}$, when their dimensions $p$ and $q$ are greater than $n/2$, by replacing $\mathcal{F}$ and $\mathcal{G}$ with their orthogonal complements.

Let us finally mention a simple property of principal vectors, emphasized in [55], which helps us to understand the geometric meaning of pairs of corresponding principal vectors from different subspaces.

Theorem 2.7 We have

$$P_\mathcal{F} v_k = \sigma_k u_k, \quad P_\mathcal{G} u_k = \sigma_k v_k, \quad k = 1, \dots, q,$$

and

$$u_i^T v_j = (P_\mathcal{F} u_i)^T v_j = u_i^T P_\mathcal{F} v_j = \sigma_j u_i^T u_j = \sigma_j \delta_{ij}, \quad i, j = 1, \dots, q.$$

In other words, a chosen pair $u_k, v_k$ spans a subspace that is invariant with respect to the orthoprojectors $P_\mathcal{F}$ and $P_\mathcal{G}$ and orthogonal to all other such subspaces. The $k$th principal angle $\theta_k$ is simply the angle between $u_k$ and $v_k$; see (2.9). Moreover, the subspace $\mathrm{span}\{u_k, v_k\}$ is also invariant with respect to the orthoprojectors $I - P_\mathcal{F}$ and $I - P_\mathcal{G}$. Let us define two other unit vectors in this subspace:

$$u_k^\perp = (v_k - \sigma_k u_k)/\mu_k \in \mathcal{F}^\perp, \quad v_k^\perp = (u_k - \sigma_k v_k)/\mu_k \in \mathcal{G}^\perp,$$

such that $u_k^T u_k^\perp = v_k^T v_k^\perp = 0$. Then $u_k, v_k$ are principal vectors for the subspaces $\mathcal{F}$ and $\mathcal{G}$; $u_k^\perp, v_k$ are principal vectors for the subspaces $\mathcal{F}^\perp$ and $\mathcal{G}$; $u_k, v_k^\perp$ are principal vectors for the subspaces $\mathcal{F}$ and $\mathcal{G}^\perp$; and $u_k^\perp, v_k^\perp$ are principal vectors for the subspaces $\mathcal{F}^\perp$ and $\mathcal{G}^\perp$, which concludes the description of all cases.

In the next section, we deal with an arbitrary scalar product.

2.4 Generalization to an A-Based Scalar Product

Let $A \in \mathbb{R}^{n \times n}$ be a fixed symmetric positive definite matrix. Let $(x, y)_A = (x, Ay) = y^T A x$ be an $A$-based scalar product, $x, y \in \mathbb{R}^n$. Let $\|x\|_A = \sqrt{(x, x)_A}$ be the corresponding vector norm, and let $\|B\|_A$ be the corresponding induced matrix norm of a matrix $B \in \mathbb{R}^{n \times n}$. We note that $\|x\|_A = \|A^{1/2} x\|$ and $\|B\|_A = \|A^{1/2} B A^{-1/2}\|$.
In order to define principal angles based on this scalar product, we will follow the arguments of [5, 21], but in an $A$-based scalar product instead of the standard Euclidean scalar product. Again, we will assume for simplicity of notation that $p \ge q$. Principal angles $\theta_1, \dots, \theta_q \in [0, \pi/2]$ between the subspaces $\mathcal{F}$ and $\mathcal{G}$ in the $A$-based scalar product $(\cdot, \cdot)_A$ are defined recursively for $k = 1, \dots, q$, by analogy with the previous definition for $A = I$, as

$$\cos(\theta_k) = \max_{u \in \mathcal{F}} \max_{v \in \mathcal{G}} (u, v)_A = (u_k, v_k)_A \quad (2.10)$$

subject to

$$\|u\|_A = \|v\|_A = 1, \quad (u, u_i)_A = 0, \quad (v, v_i)_A = 0, \quad i = 1, \dots, k-1. \quad (2.11)$$

The vectors $u_1, \dots, u_q$ and $v_1, \dots, v_q$ are called principal vectors relative to the $A$-based scalar product.

The following theorem justifies the consistency of the definition above and provides a cosine-based algorithm for computing the principal angles in the $A$-based scalar product. It is a direct generalization of the cosine-based approach of [5, 21].

Theorem 2.8 Let the columns of $Q_F \in \mathbb{R}^{n \times p}$ and $Q_G \in \mathbb{R}^{n \times q}$ now be $A$-orthonormal bases for the subspaces $\mathcal{F}$ and $\mathcal{G}$, respectively. Let $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_q$ be the singular values of $Q_F^T A Q_G$, with corresponding left and right singular vectors $y_k$ and $z_k$, $k = 1, \dots, q$. Then the principal angles relative to the scalar product $(\cdot, \cdot)_A$, as defined in (2.10) and (2.11), are computed as

$$\theta_k = \arccos(\sigma_k), \quad k = 1, \dots, q, \quad (2.12)$$

where $0 \le \theta_1 \le \dots \le \theta_q \le \pi/2$, while the principal vectors are given by $u_k = Q_F y_k$, $v_k = Q_G z_k$, $k = 1, \dots, q$.

Proof: We first rewrite the definition (2.10), (2.11) of principal angles in the following equivalent form. For $k = 1, \dots, q$,

$$\cos(\theta_k) = \max_{y \in \mathbb{R}^p} \max_{z \in \mathbb{R}^q} y^T Q_F^T A Q_G z = y_k^T Q_F^T A Q_G z_k$$

subject to

$$\|y\| = \|z\| = 1, \quad y^T y_i = 0, \quad z^T z_i = 0, \quad i = 1, \dots, k-1,$$

where $u = Q_F y \in \mathcal{F}$, $v = Q_G z \in \mathcal{G}$ and $u_k = Q_F y_k \in \mathcal{F}$, $v_k = Q_G z_k \in \mathcal{G}$. Since $Q_F$ and $Q_G$ have $A$-orthonormal columns, $Q_F^T A Q_F = I$ and $Q_G^T A Q_G = I$. This implies

$$\|u\|_A^2 = y^T Q_F^T A Q_F y = y^T y = \|y\|^2 = 1$$

and

$$\|v\|_A^2 = z^T Q_G^T A Q_G z = z^T z = \|z\|^2 = 1.$$

For $i \neq j$, we derive

$$(u_i, u_j)_A = y_i^T Q_F^T A Q_F y_j = y_i^T y_j = 0 \quad \text{and} \quad (v_i, v_j)_A = z_i^T Q_G^T A Q_G z_j = z_i^T z_j = 0.$$

Now, let the reduced SVD of $Q_F^T A Q_G$ be

$$Y^T Q_F^T A Q_G Z = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_q), \quad (2.13)$$

where $Y \in \mathbb{R}^{p \times q}$ and $Z \in \mathbb{R}^{q \times q}$ both have orthonormal columns. Then, by Theorem 2.1 with $M = Q_F^T A Q_G$, the equality $\cos(\theta_k) = \sigma_k$, $k = 1, \dots, q$, just provides two equivalent representations of the singular values of $Q_F^T A Q_G$, and $y_k$ and $z_k$ can be chosen as columns of the matrices $Y$ and $Z$, respectively. The statement of the theorem follows.

Let us now make a trivial but important observation that links principal angles in the $A$-based scalar product with principal angles in the original standard scalar product, when a factorization $A = K^T K$, e.g., $K = A^{1/2}$, is available. We formulate it as the following theorem.

Theorem 2.9 Let $A = K^T K$. Under the assumptions of Theorem 2.8, the principal angles between the subspaces $\mathcal{F}$ and $\mathcal{G}$ relative to the scalar product $(\cdot, \cdot)_A$ coincide with the principal angles between the subspaces $K\mathcal{F}$ and $K\mathcal{G}$ relative to the original scalar product $(\cdot, \cdot)$.
Now Theorem 2.8 is equivalent to the traditional SVD theorem on cosine of principal angles between subspaces KT and KQ relative to the original scalar product (, ), formulated in the introduction. The A-orthogonal projectors on subspaces T and Q are now defined by formulas Pf ; QfQ*f = QfQ^A and Pq QgQ*q 1 QgQgA, where *a denotes the A-adjoint. To obtain a sine-based formulation in the A-based scalar product that is accurate for small angles, we first adjust (2.5) and (2.7) to the new A-based scalar product: ga,pA(T,Q) = \\PF Pg\\a = max {U*|U=i IK7 pf)x\\a, IK7 Pg)v\\a} (2-14) 34 If p = q, this equation will yield sin(^), consistent with Theorem 2.8. Similar to the previous case A = I, only the first term under the maximum is of interest under our assumption that p> q. Using the fact that IMU = \\Qgz\\a = INI Vz G g, X- Qgz, z e Rq, the term of interest can be rewritten as maxze^u^i ||(/ PF)x\\A = \\A1/2(I QfQfa)Qg\\- (2-15) Here we use the standard induced Euclidean norm || || for computational pur- poses. Similar to our arguments, in the previous section, we obtain a more general formula for all principal angles in the following. Theorem 2.10 Let S = (/ QfQ:fA)Qg Singular values pi < H2 < < pq of matrix Al!2S are given by //& = y/l a|, k = 1,... ,q, where ak are defined in (2.13). Moreover, the principal angles satisfy the equalities Ok = arcsin(^). The right principal vectors can be computed as Vk = QcZk, k = 1,..., q, where Zk are corresponding orthonormal right singular vectors of matrix A1/2S. The left principal vectors are then computed by Uk = QFQTFAvkjcFk ifak^ 0, k = 1,..., q. Proof: We first notice that squares of the singular values Uk of the matrix Al/2S, which appear in (2.15), coincide with eigenvalues i/& = p2 of the product 35 STAS. Using the fact that Q'pAQp = I and QqAQg = I, we have StAS = QTG{I-AQFQl)A{I-QFQTFA)QG = I Q'qAQpQ^AQg- Utilizing the SVD (2.13), we obtain QfpAQc = YYZT, where S = diag ( Thus, the eigenvalues of STAS are given by ^ = 1 k = 1,..., q, and the formula for the principal angles follows directly from (2.12). For computational reasons, when n is large, we need to avoid dealing with the square root A1/2 explicitly. Also, A may not be available as a matrix but only as a function performing the multiplication of A by a given vector. Fortunately, the previous theorem can be trivially reformulated as follows to resolve this issue. Theorem 2.11 Eigenvalues Vi < < < uq of matrix STAS, where S = (I QFQ'fA)Qg, are equal to Uk = 1 crl, k = 1,..., g, where ak are defined in (2.13). Moreover, the principal angles satisfy the equalities Ok = arcsin (ydT) k = 1,..., q. The right principal vectors can be computed as Vk = QoZk, k = l,...,g, where Zk are corresponding orthonormal right eigenvectors of matrix STAS. The left principal vectors are then computed by Uk = QFQFAvk/ak if We can easily modify the previous proof to obtain the following analogue of Theorem 2.3. Theorem 2.12 Singular values of the matrix A1/1QpQrpAQo = JP^PpQg co- incide with Ok, defined in (2.13). It is also useful to represent principal angles using exclusively A-orthogonal projectors Pp and Pq on subspaces T and Q, respectively, similarly to Theorems 2.4 and 2.5. Theorem 2.13 Under assumptions of Theorem 2.8, cri > cr2 > > \\PfPg\\a- Other n q singular values are all equal to zero. Proof: First, we rewrite A^PpPaA-1'2 = A'pQpQlAQaQlAA-'P = A1/2QF (A'PQpf A^Qc (A'KQb? . 
It is also useful to represent principal angles using exclusively the A-orthogonal projectors P_F and P_G on the subspaces ℱ and 𝒢, respectively, similarly to Theorems 2.4 and 2.5.

Theorem 2.13 Under the assumptions of Theorem 2.8, σ_1 ≥ σ_2 ≥ ... ≥ σ_q are the q largest singular values of the matrix A^{1/2} P_F P_G A^{−1/2}; in particular, σ_1 = ||P_F P_G||_A. The other n − q singular values are all equal to zero.

Proof: First, we rewrite

A^{1/2} P_F P_G A^{−1/2} = A^{1/2} Q_F Q_F^T A Q_G Q_G^T A A^{−1/2} = A^{1/2} Q_F (A^{1/2} Q_F)^T A^{1/2} Q_G (A^{1/2} Q_G)^T.

As the columns of the matrices A^{1/2} Q_F and A^{1/2} Q_G are orthonormal, with respect to the original scalar product (·,·), bases of the subspaces A^{1/2}ℱ and A^{1/2}𝒢, respectively, the last product is equal to the product of the orthogonal (not A-orthogonal!) projectors P_{A^{1/2}ℱ} and P_{A^{1/2}𝒢} on the subspaces A^{1/2}ℱ and A^{1/2}𝒢. Second, we can now use Theorem 2.4 to state that the cosines of the principal angles between the subspaces A^{1/2}ℱ and A^{1/2}𝒢 with respect to the original scalar product (·,·) are given by the q largest singular values of the product P_{A^{1/2}ℱ} P_{A^{1/2}𝒢} = A^{1/2} P_F P_G A^{−1/2}. Finally, we use Theorem 2.9 to conclude that these singular values are, in fact, σ_k, k = 1, ..., q, i.e., the cosines of the principal angles between the subspaces ℱ and 𝒢 with respect to the A-based scalar product. ∎

Theorem 2.14 Let the assumptions of Theorem 2.11 be satisfied. Then μ_1 ≤ μ_2 ≤ ... ≤ μ_q are the q largest singular values of the matrix A^{1/2}(I − P_F) P_G A^{−1/2}; in particular, μ_q = ||(I − P_F) P_G||_A. The other n − q singular values are all equal to zero.

Proof: We rewrite

A^{1/2}(I − P_F) P_G A^{−1/2} = (I − A^{1/2} Q_F (A^{1/2} Q_F)^T) A^{1/2} Q_G (A^{1/2} Q_G)^T = (I − P_{A^{1/2}ℱ}) P_{A^{1/2}𝒢}

and then follow arguments similar to those of the previous proof, but now using Theorem 2.5 instead of Theorem 2.4. ∎

Remarks 2.6-2.7 and Theorems 2.6-2.7 for the case A = I hold in the general case, too, with obvious modifications. Our final theoretical results are the perturbation theorems in the next section.

2.5 Perturbation of Principal Angles in the A-Based Scalar Product

In the present section, for simplicity, we always assume that the matrices F, G and their perturbations F̃, G̃ have the same rank; thus, in particular, p = q. We notice that F and G appear symmetrically in the definition of the principal angles, under our assumption that they and their perturbations have the same rank. This means that we do not have to analyze the perturbations of F and G together at the same time. Instead, we first study only a perturbation in G.

Before we start with an estimate for the cosine, let us introduce a new notation ⊖ using an example: (𝒢 + 𝒢̃) ⊖ 𝒢 = (𝒢 + 𝒢̃) ∩ 𝒢^⊥, where ⊥ and the orthogonal complement to 𝒢 are understood in the A-based scalar product.

Lemma 2.1 Let σ_1 ≥ σ_2 ≥ ... ≥ σ_q and σ̃_1 ≥ σ̃_2 ≥ ... ≥ σ̃_q be the cosines of the principal angles between the subspaces ℱ, 𝒢 and ℱ, 𝒢̃, respectively, computed in the A-based scalar product. Then, for k = 1, ..., q,

|σ_k − σ̃_k| ≤ max[ cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) ] gap_A(𝒢, 𝒢̃),   (2.16)

where θ_min is the smallest angle between the corresponding subspaces, measured in the A-based scalar product.

Proof: The proof is based on the following identity:

A^{1/2} Q_F Q_F^T A Q_G̃ = A^{1/2} Q_F Q_F^T A Q_G Q_G^T A Q_G̃ + A^{1/2} Q_F Q_F^T A (I − Q_G Q_G^T A) Q_G̃,   (2.17)

which is a multidimensional analogue of the trigonometric formula for the cosine of the sum of two angles. Now we use two classical theorems on the perturbation of singular values with respect to addition and multiplication:

s_k(T + S) ≤ s_k(T) + ||S||,   (2.18)

s_k(T S) ≤ s_k(T) ||S||.   (2.19)

We apply (2.18) with T = A^{1/2} Q_F Q_F^T A Q_G Q_G^T A Q_G̃ and S = A^{1/2} Q_F Q_F^T A (I − Q_G Q_G^T A) Q_G̃ in (2.17) to get

σ̃_k = s_k(A^{1/2} Q_F Q_F^T A Q_G̃) ≤ s_k(A^{1/2} Q_F Q_F^T A Q_G Q_G^T A Q_G̃) + ||A^{1/2} Q_F Q_F^T A (I − Q_G Q_G^T A) Q_G̃||,

where the first equality follows from Theorem 2.12. In the second term in the sum on the right, we need to estimate a product similar to a product of three orthoprojectors. We notice that the column vectors of (I − Q_G Q_G^T A) Q_G̃ belong to the subspace (𝒢 + 𝒢̃) ⊖ 𝒢. Let P_{(𝒢+𝒢̃)⊖𝒢} be the A-orthogonal projector on this subspace. Then the second term can be rewritten, also using the projector Q_F Q_F^T A = P_F, as

A^{1/2} Q_F Q_F^T A (I − Q_G Q_G^T A) Q_G̃ = A^{1/2} P_F P_{(𝒢+𝒢̃)⊖𝒢} (I − Q_G Q_G^T A) Q_G̃ = (A^{1/2} P_F P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}) A^{1/2}(I − Q_G Q_G^T A) Q_G̃;

therefore, it can be estimated as

||A^{1/2} Q_F Q_F^T A (I − Q_G Q_G^T A) Q_G̃|| ≤ ||A^{1/2} P_F P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}|| ||A^{1/2}(I − Q_G Q_G^T A) Q_G̃||.
The first multiplier in the last product equals

||A^{1/2} P_F P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}|| = ||P_F P_{(𝒢+𝒢̃)⊖𝒢}||_A = cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}),

similar to (2.15) and using Theorem 2.13 for the subspaces (𝒢 + 𝒢̃) ⊖ 𝒢 and ℱ, while the second multiplier is gap_A(𝒢, 𝒢̃) because of our assumption dim 𝒢̃ = dim 𝒢. To estimate the first term in the sum, we apply (2.19) with T = A^{1/2} Q_F Q_F^T A Q_G and S = Q_G^T A Q_G̃:

s_k(A^{1/2} Q_F Q_F^T A Q_G Q_G^T A Q_G̃) ≤ s_k(A^{1/2} Q_F Q_F^T A Q_G) ||Q_G^T A Q_G̃|| ≤ s_k(A^{1/2} Q_F Q_F^T A Q_G) = σ_k,

simply because the second multiplier here is the cosine of an angle between 𝒢 and 𝒢̃ in the A-based scalar product, which is, of course, bounded by one from above. Thus, we have just proved

σ̃_k ≤ σ_k + cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}) gap_A(𝒢, 𝒢̃).

Changing the places of Q_G and Q_G̃, we obtain

σ_k ≤ σ̃_k + cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) gap_A(𝒢, 𝒢̃)

and come to the statement of the lemma. ∎

Remark 2.8 Let us try to clarify the meaning of the constants appearing in the statement of Lemma 2.1. Let us consider, e.g., cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}). The cosine takes its maximal value, one, when at least one direction of the perturbation of 𝒢 is A-orthogonal to 𝒢 and parallel to ℱ at the same time. It is small, on the contrary, when the part of the perturbation that is A-orthogonal to 𝒢 is also A-orthogonal to ℱ. As (𝒢 + 𝒢̃) ⊖ 𝒢 ⊆ 𝒢^⊥, we have

cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}) ≤ cos(θ_min{𝒢^⊥, ℱ}) = sin(θ_max{𝒢, ℱ}) = gap_A(𝒢, ℱ),   (2.20)

which is the constant of the asymptotic perturbation estimate of [5] (where A = I). The latter constant is small if the subspaces 𝒢 and ℱ are close to each other, which can be considered more as a fortunate cancellation, as in this case the cosines of all principal angles are almost one, and a perturbation estimate for the cosine does not help much because of the cancellation effect.

Remark 2.9 A natural approach, similar to that of [23] with A = I, involves a simpler identity:

Q_F^T A Q_G̃ = Q_F^T A Q_G + Q_F^T A (Q_G̃ − Q_G),

where a norm of the second term is then estimated. Then (2.18) gives an estimate of the singular values using ||A^{1/2}(Q_G̃ − Q_G)||. As singular values are invariant with respect to the particular choices of the matrices Q_G and Q_G̃ with A-orthonormal columns, as far as they provide the ranges 𝒢 and 𝒢̃, respectively, we can choose them to minimize the norm of the difference, which gives

inf_Q ||A^{1/2}(Q_G − Q_G̃ Q)||,   (2.21)

where Q is an arbitrary q-by-q orthogonal matrix. This quantity appears in [23] with A = I as a special type of the Procrustes problem. In [23], it is estimated in terms of the gap between the subspaces 𝒢 and 𝒢̃ (using an extra assumption that 2q < n). Repeating similar arguments, we derive

|σ_k − σ̃_k| ≤ inf_Q ||A^{1/2}(Q_G − Q_G̃ Q)|| ≤ √2 gap_A(𝒢, 𝒢̃),  k = 1, ..., q.   (2.22)

Lemma 2.1 furnishes estimates of the perturbation of singular values in terms of the gap directly, which gives a much better constant, consistent with that of the asymptotic estimate of [5] for A = I; see the previous remark.

Now we prove a separate estimate for the sine.

Lemma 2.2 Let μ_1 ≤ μ_2 ≤ ... ≤ μ_q and μ̃_1 ≤ μ̃_2 ≤ ... ≤ μ̃_q be the sines of the principal angles between the subspaces ℱ, 𝒢 and ℱ, 𝒢̃, respectively, computed in the A-based scalar product. Then, for k = 1, ..., q,

|μ_k − μ̃_k| ≤ max[ sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) ] gap_A(𝒢, 𝒢̃),   (2.23)

where θ_max is the largest angle between the corresponding subspaces, measured in the A-based scalar product.

Proof: The proof is based on the following identity:

A^{1/2}(I − Q_F Q_F^T A) Q_G̃ = A^{1/2}(I − Q_F Q_F^T A) Q_G Q_G^T A Q_G̃ + A^{1/2}(I − Q_F Q_F^T A)(I − Q_G Q_G^T A) Q_G̃,

which is a multidimensional analogue of the trigonometric formula for the sine of the sum of two angles. The rest of the proof is similar to that of Lemma 2.1.
We first need to take T = A^{1/2}(I − Q_F Q_F^T A) Q_G Q_G^T A Q_G̃ and S = A^{1/2}(I − Q_F Q_F^T A)(I − Q_G Q_G^T A) Q_G̃ and use (2.18) to get

s_k(A^{1/2}(I − Q_F Q_F^T A) Q_G̃) ≤ s_k(A^{1/2}(I − Q_F Q_F^T A) Q_G Q_G^T A Q_G̃) + ||A^{1/2}(I − Q_F Q_F^T A)(I − Q_G Q_G^T A) Q_G̃||.

In the second term in the sum on the right, Q_F Q_F^T A = P_F and we deduce

A^{1/2}(I − Q_F Q_F^T A)(I − Q_G Q_G^T A) Q_G̃ = A^{1/2}(I − P_F) P_{(𝒢+𝒢̃)⊖𝒢} (I − Q_G Q_G^T A) Q_G̃ = (A^{1/2}(I − P_F) P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}) (A^{1/2}(I − Q_G Q_G^T A) Q_G̃),

using the notation P_{(𝒢+𝒢̃)⊖𝒢} for the A-orthogonal projector on the subspace (𝒢 + 𝒢̃) ⊖ 𝒢, introduced in the proof of Lemma 2.1. Therefore, the second term can be estimated as

||A^{1/2}(I − Q_F Q_F^T A)(I − Q_G Q_G^T A) Q_G̃|| ≤ ||A^{1/2}(I − P_F) P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}|| ||A^{1/2}(I − Q_G Q_G^T A) Q_G̃||.

The first multiplier is

||A^{1/2}(I − P_F) P_{(𝒢+𝒢̃)⊖𝒢} A^{−1/2}|| = ||(I − P_F) P_{(𝒢+𝒢̃)⊖𝒢}||_A = sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ})

by Theorem 2.14, as dim ℱ ≥ dim((𝒢 + 𝒢̃) ⊖ 𝒢), while the second multiplier is simply gap_A(𝒢, 𝒢̃) because of our assumption dim 𝒢̃ = dim 𝒢. To estimate the first term in the sum, we take T = A^{1/2}(I − Q_F Q_F^T A) Q_G and S = Q_G^T A Q_G̃ and apply (2.19):

s_k(A^{1/2}(I − Q_F Q_F^T A) Q_G Q_G^T A Q_G̃) ≤ s_k(A^{1/2}(I − Q_F Q_F^T A) Q_G) ||Q_G^T A Q_G̃|| ≤ s_k(A^{1/2}(I − Q_F Q_F^T A) Q_G),

using exactly the same arguments as in the proof of Lemma 2.1. Thus, we have proved

μ̃_k ≤ μ_k + sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}) gap_A(𝒢, 𝒢̃).

Changing the places of Q_G̃ and Q_G, we get

μ_k ≤ μ̃_k + sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) gap_A(𝒢, 𝒢̃).

The statement of the lemma follows. ∎

Remark 2.10 Let us also highlight that simpler estimates,

|σ_k − σ̃_k| ≤ gap_A(𝒢, 𝒢̃) and |μ_k − μ̃_k| ≤ gap_A(𝒢, 𝒢̃),  k = 1, ..., q,

can be obtained almost trivially using orthoprojectors (see [55, 53, 22], where this approach is used for the case A = I). Indeed, we start with the identities

A^{1/2} P_F P_G̃ A^{−1/2} = A^{1/2} P_F P_G A^{−1/2} + (A^{1/2} P_F A^{−1/2}) (A^{1/2}(P_G̃ − P_G) A^{−1/2})

for the cosine and

A^{1/2}(I − P_F) P_G̃ A^{−1/2} = A^{1/2}(I − P_F) P_G A^{−1/2} + (A^{1/2}(I − P_F) A^{−1/2}) (A^{1/2}(P_G̃ − P_G) A^{−1/2})

for the sine, and use (2.18) and Theorems 2.13 and 2.14. A norm of the second term is then estimated from above by gap_A(𝒢, 𝒢̃), using the fact that for an A-orthoprojector P_F we have ||P_F||_A = ||I − P_F||_A = 1. Instead of the latter, we can use a bit more sophisticated approach, as in [22], if we introduce the A-orthogonal projector P_{𝒢+𝒢̃} on the subspace 𝒢 + 𝒢̃. Then the norm of the second term is bounded by gap_A(𝒢, 𝒢̃) times ||P_F P_{𝒢+𝒢̃}||_A for the cosine and times ||(I − P_F) P_{𝒢+𝒢̃}||_A for the sine, where we can now use Theorem 2.13 to provide a geometric interpretation of these two constants. This leads to estimates similar to those of [22] for A = I:

|σ_k − σ̃_k| ≤ cos(θ_min{ℱ, 𝒢 + 𝒢̃}) gap_A(𝒢, 𝒢̃),  k = 1, ..., q,

and

|μ_k − μ̃_k| ≤ cos(θ_min{ℱ^⊥, 𝒢 + 𝒢̃}) gap_A(𝒢, 𝒢̃),  k = 1, ..., q.

However, the apparent constant improvement in the second estimate, for the sine, is truly misleading, as

cos(θ_min{ℱ^⊥, 𝒢 + 𝒢̃}) = 1

simply because dim ℱ < dim(𝒢 + 𝒢̃) in all cases except for the trivial possibility 𝒢 = 𝒢̃, so the subspaces ℱ^⊥ and 𝒢 + 𝒢̃ must have a nontrivial intersection. The first estimate, for the cosine, does give a better constant (compared to one), but our constant is sharper; e.g.,

cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}) ≤ cos(θ_min{ℱ, 𝒢 + 𝒢̃}).

Our more complex identities used to derive the perturbation bounds provide an extra projector in the error term, which allows us to obtain more refined constants.
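Before moving on, the flavor of these bounds is easy to confirm numerically. The following small experiment is ours and covers the Euclidean case A = I only: it perturbs one subspace and compares the change of the sines of the principal angles with the gap, i.e., with the crude constant-one bound of Remark 2.10.

```matlab
% Perturb G and compare the change of the sines with gap(G, Gt); A = I.
n = 100; q = 5;
QF  = orth(randn(n, q));               % orthonormal basis of F
QG  = orth(randn(n, q));               % orthonormal basis of G
QGt = orth(QG + 1e-6*randn(n, q));     % basis of a nearby subspace Gt
mu  = sort(svd(QG  - QF*(QF'*QG)));    % sines of angles between F and G
mut = sort(svd(QGt - QF*(QF'*QGt)));   % sines of angles between F and Gt
gap = norm(QGt - QG*(QG'*QGt));        % gap(G, Gt) = ||(I - PG)*QGt||
disp([max(abs(mu - mut)), gap])        % first value never exceeds second
```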
We can now establish an estimate of the absolute sensitivity of the cosine and sine of principal angles with respect to absolute perturbations of the subspaces.

Theorem 2.15 Let σ_1 ≥ σ_2 ≥ ... ≥ σ_q and σ̃_1 ≥ σ̃_2 ≥ ... ≥ σ̃_q be the cosines of the principal angles between the subspaces ℱ, 𝒢 and ℱ̃, 𝒢̃, respectively, computed in the A-based scalar product. Then

|σ_k − σ̃_k| ≤ c_1 gap_A(ℱ, ℱ̃) + c_2 gap_A(𝒢, 𝒢̃),  k = 1, ..., q,   (2.24)

where

c_1 = max{ cos(θ_min{(ℱ + ℱ̃) ⊖ ℱ, 𝒢̃}); cos(θ_min{(ℱ + ℱ̃) ⊖ ℱ̃, 𝒢̃}) },
c_2 = max{ cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) },

and θ_min is the smallest angle between the corresponding subspaces in the A-based scalar product.

Proof: First, by Lemma 2.1, for k = 1, ..., q,

|σ_k − σ̂_k| ≤ max{ cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); cos(θ_min{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) } gap_A(𝒢, 𝒢̃),

where σ̂_k are the cosines of the principal angles between ℱ and 𝒢̃. Second, we apply a similar statement to the cosines of the principal angles between the subspaces ℱ, 𝒢̃ and ℱ̃, 𝒢̃, respectively, computed in the A-based scalar product:

|σ̂_k − σ̃_k| ≤ max{ cos(θ_min{(ℱ + ℱ̃) ⊖ ℱ, 𝒢̃}); cos(θ_min{(ℱ + ℱ̃) ⊖ ℱ̃, 𝒢̃}) } gap_A(ℱ, ℱ̃).

The statement of the theorem now follows from the triangle inequality. ∎

Theorem 2.16 Let μ_1 ≤ μ_2 ≤ ... ≤ μ_q and μ̃_1 ≤ μ̃_2 ≤ ... ≤ μ̃_q be the sines of the principal angles between the subspaces ℱ, 𝒢 and ℱ̃, 𝒢̃, respectively, computed in the A-based scalar product. Then

|μ_k − μ̃_k| ≤ c_3 gap_A(ℱ, ℱ̃) + c_4 gap_A(𝒢, 𝒢̃),  k = 1, ..., q,   (2.25)

where

c_3 = max{ sin(θ_max{(ℱ + ℱ̃) ⊖ ℱ, 𝒢̃}); sin(θ_max{(ℱ + ℱ̃) ⊖ ℱ̃, 𝒢̃}) },
c_4 = max{ sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) },

and θ_max is the largest angle between the corresponding subspaces in the A-based scalar product.

Proof: First, by Lemma 2.2, for k = 1, ..., q,

|μ_k − μ̂_k| ≤ max{ sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢, ℱ}); sin(θ_max{(𝒢 + 𝒢̃) ⊖ 𝒢̃, ℱ}) } gap_A(𝒢, 𝒢̃),

where μ̂_k are the sines of the principal angles between ℱ and 𝒢̃. Second, we apply a similar statement to the sines of the principal angles between the subspaces ℱ, 𝒢̃ and ℱ̃, 𝒢̃, respectively, computed in the A-based scalar product:

|μ̂_k − μ̃_k| ≤ max{ sin(θ_max{(ℱ + ℱ̃) ⊖ ℱ, 𝒢̃}); sin(θ_max{(ℱ + ℱ̃) ⊖ ℱ̃, 𝒢̃}) } gap_A(ℱ, ℱ̃).

The statement of the theorem now follows from the triangle inequality. ∎

Finally, we want a perturbation analysis in terms of the matrices F and G that generate the subspaces ℱ and 𝒢. For that, we have to estimate the sensitivity of the column-space of a matrix, for example, the matrix G.

Lemma 2.3 Let κ_A(G) = s_max(A^{1/2}G) / s_min(A^{1/2}G) denote the corresponding A-based condition number of G, where s_max and s_min are, respectively, the largest and smallest singular values of the matrix A^{1/2}G. Let 𝒢 and 𝒢̃ be the column-spaces of the matrices G and G̃, respectively. Then

gap_A(𝒢, 𝒢̃) ≤ κ_A(G) ||A^{1/2}(G − G̃)|| / ||A^{1/2}G||.   (2.26)

Proof: Here, we essentially just adopt the corresponding proof of [55] for the A-based scalar product, using the same approach as in Theorem 2.9. Let us consider the polar decompositions A^{1/2}G = A^{1/2}Q_G T_G and A^{1/2}G̃ = A^{1/2}Q_G̃ T_G̃, where the matrices A^{1/2}Q_G and A^{1/2}Q_G̃ have orthonormal columns and the matrices T_G and T_G̃ are q-by-q symmetric positive definite; e.g., T_G = (G^T A G)^{1/2}. The singular values of T_G and T_G̃ are, therefore, the same as the singular values of A^{1/2}G and A^{1/2}G̃, respectively. Then,

(I − P_G̃)(G − G̃) = (I − P_G̃)G = (I − P_G̃) Q_G T_G.

Therefore,

A^{1/2}(I − P_G̃) Q_G = (A^{1/2}(I − P_G̃) A^{−1/2}) A^{1/2}(G − G̃) T_G^{−1},

and

gap_A(𝒢, 𝒢̃) = ||A^{1/2}(I − P_G̃) Q_G|| ≤ ||A^{1/2}(G − G̃)|| ||T_G^{−1}|| = κ_A(G) ||A^{1/2}(G − G̃)|| / ||A^{1/2}G||,

as ||A^{1/2}(I − P_G̃) A^{−1/2}|| = ||I − P_G̃||_A ≤ 1 and ||T_G^{−1}|| = 1 / s_min(A^{1/2}G) = κ_A(G) / ||A^{1/2}G||. The statement of the lemma follows. ∎

Remark 2.11 Some matrices allow improvement of their condition numbers by column scaling, which trivially does not change the column range. Our simple Lemma 2.3 does not capture this property. A more sophisticated variant can easily be obtained using a technique developed in [23, 22].

Our cosine theorem follows next. It generalizes the results of [53, 22, 23] to A-based scalar products and somewhat improves the constant.

Theorem 2.17 Under the assumptions of Theorem 2.15,

|σ_k − σ̃_k| ≤ c_1 κ_A(F) ||A^{1/2}(F − F̃)|| / ||A^{1/2}F|| + c_2 κ_A(G) ||A^{1/2}(G − G̃)|| / ||A^{1/2}G||.   (2.27)

The theorem above does not provide an accurate estimate for small angles. To fill the gap, we suggest the following perturbation theorem in terms of the sine of principal angles; cf. [53, 22] for A = I.
Theorem 2.18 Under the assumptions of Theorem 2.16,

|μ_k − μ̃_k| ≤ c_3 κ_A(F) ||A^{1/2}(F − F̃)|| / ||A^{1/2}F|| + c_4 κ_A(G) ||A^{1/2}(G − G̃)|| / ||A^{1/2}G||.   (2.28)

Finally, let us underscore that all sensitivity results in this chapter are for absolute errors. Golub and Zha in [22] observe that relative errors of the sine and cosine of principal angles are not, in general, bounded by the perturbation.

We consider algorithms for computing principal angles with respect to an A-based scalar product in the next section.

2.6 Algorithm Implementation

In this section, we provide a detailed description of our code SUBSPACEA.m and discuss the algorithm implementation.

Theorem 2.11 is our main theoretical foundation for a sine-based algorithm for computing principal angles with respect to an A-based scalar product. However, a naive implementation, using the SVD of the matrix S^T A S, may not produce small angles accurately in computer arithmetic. We now try to explain informally this fact, which is actually observed in numerical tests.

Let, for simplicity of notation, all principal angles be small. Let us consider a particular case, where the columns of the matrices Q_F and Q_G are already principal vectors in exact arithmetic. In reality, this is the situation we will face in Algorithm 2.4. Then, in exact arithmetic, the columns of S are A-orthogonal and their A-norms are exactly the sines of the principal angles. Thus, if there are several small angles different in orders of magnitude, the columns of S are badly scaled. When we take the norms squared, by explicitly computing the product S^T A S, we make the scaling even worse, as the diagonal entries of this diagonal matrix are now the sines of the principal angles squared, in exact arithmetic. In the presence of round-off errors, the matrix S^T A S is usually not diagonal; thus, principal angles smaller than 10^{−8} will lead to an underflow effect in double precision, which cannot be cured by taking square roots of its singular values later in the algorithm.

To resolve this, we want to be able to compute the SVD of the matrix A^{1/2}S without using either S^T A S itself or A^{1/2}S. One possibility is suggested in the following lemma.

Lemma 2.4 The SVD of the matrix A^{1/2}S coincides with the SVD of the matrix Q_S^T A S, where Q_S is a matrix with A-orthonormal columns which span the same column-space as the columns of the matrix S.

Proof: We have

(Q_S^T A S)^T Q_S^T A S = S^T A Q_S Q_S^T A S = S^T A P_S S = S^T A S,

where P_S = Q_S Q_S^T A is the A-orthogonal projector on the column-space of Q_S, which is the same, by definition, as the column-space of S, so that P_S S = S. ∎

This allows our next Algorithm 2.3, based on Lemma 2.4, to be more reliable in the presence of round-off errors when several principal angles are small. By analogy with Algorithms 2.1 and 2.2, we can remove the restriction that the matrix Σ is invertible and somewhat improve the costs in Algorithm 2.3 by reducing the size of the matrix S in step 4, which leads to Algorithm 2.4.
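In code, Lemma 2.4 amounts to one extra A-orthonormalization: the badly scaled product S'*A*S is replaced by Qs'*A*S, whose singular values are those of A^{1/2}S at full precision. A hedged sketch of ours follows; ortha_sk is a stand-in name for the A-orthonormalization described in Remark 2.12 below.

```matlab
% Sine computation via Lemma 2.4, avoiding both A^(1/2)*S and S'*A*S.
% ortha_sk stands in for an A-orthonormalization such as ORTHA.m.
S  = QG - QF * (QF' * (A * QG));   % S = (I - PF)*QG; column A-norms ~ sines
Qs = ortha_sk(S, A);               % A-orthonormal basis of range(S)
mu = sort(svd(Qs' * (A * S)));     % = singular values of A^(1/2)*S
theta_small = asin(min(mu, 1));    % accurate even for very small angles
```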
Algorithm 2.3: SUBSPACEA.m

Input: real matrices F and G with the same number of rows, and a symmetric positive definite matrix A for the scalar product, or a device to compute Ax for a given vector x.

1. Compute A-orthonormal bases Q_F = ortha(F), Q_G = ortha(G) of the column-spaces of F and G.
2. Compute the SVD for the cosine: [Y, Σ, Z] = svd(Q_F^T A Q_G), Σ = diag(σ_1, ..., σ_q).
3. Compute the matrices of left, U_cos = Q_F Y, and right, V_cos = Q_G Z, principal vectors.
4. Compute the matrix S = Q_G − Q_F(Q_F^T A Q_G) if p ≥ q; S = Q_F − Q_G(Q_G^T A Q_F) otherwise.
5. Compute an A-orthonormal basis Q_S = ortha(S) of the column-space of S.
6. Compute the SVD for the sine: [Y, diag(μ_1, ..., μ_q), Z] = svd(Q_S^T A S).
7. Compute the matrices U_sin and V_sin of left and right principal vectors: V_sin = Q_G Z, U_sin = Q_F(Q_F^T A V_sin)Σ^{−1} if p ≥ q; U_sin = Q_F Z, V_sin = Q_G(Q_G^T A U_sin)Σ^{−1} otherwise.
8. Compute the principal angles, for k = 1, ..., q: θ_k = arccos(σ_k) if σ_k² < 1/2; θ_k = arcsin(μ_k) if μ_k² ≤ 1/2.
9. Form the matrices U and V by picking up the corresponding columns of U_sin, V_sin and U_cos, V_cos, according to the choice for θ_k above.

Output: principal angles θ_1, ..., θ_q between the column-spaces of the matrices F and G in the A-based scalar product, and the corresponding matrices of left, U, and right, V, principal vectors.

Previous remarks of section 2.3 for the algorithms with A = I are applicable to the present algorithms with self-evident changes. A few additional A-specific remarks follow.

Remark 2.12 In step 1 of Algorithms 2.3 and 2.4, we use our SVD-based function ORTHA.m for the A-orthogonalization. Specifically, computing an A-orthogonal basis Q of the column-space of a matrix X is done in three steps in ORTHA.m. First, we orthonormalize X, using the SVD-based built-in MATLAB code ORTH.m with a preceding explicit column scaling; see Remark 2.1 on whether the scaling is actually needed. Second, we compute [U, S, V] = svd(X^T A X), using MATLAB's built-in SVD code. Finally, we take Q = X U S^{−1/2}. If A is ill-conditioned, an extra cycle may be performed to improve the accuracy. While a formal stability and accuracy analysis of this method has not been done, ORTHA.m demonstrates practical robustness in our numerical tests.

Algorithm 2.4: Modified and Improved SUBSPACEA.m

Input: real matrices F and G with the same number of rows, and a symmetric positive definite matrix A for the scalar product, or a device to compute Ax for a given vector x.

1. Compute A-orthonormal bases Q_F = ortha(F), Q_G = ortha(G) of the column-spaces of F and G.
2. Compute the SVD for the cosine: [Y, Σ, Z] = svd(Q_F^T A Q_G), Σ = diag(σ_1, ..., σ_q).
3. Compute the matrices of left, U_cos = Q_F Y, and right, V_cos = Q_G Z, principal vectors.
4. Compute the large principal angles, for k = 1, ..., q: θ_k = arccos(σ_k) if σ_k² < 1/2.
5. Form parts of the matrices U and V by picking up the corresponding columns according to the choice for θ_k above. Put the columns of U_cos, V_cos that are left in the matrices R_F and R_G. Collect the corresponding σ's in a diagonal matrix Σ_R.
6. Compute the matrix S = R_G − Q_F(Q_F^T A R_G).
7. Compute an A-orthonormal basis Q_S = ortha(S) of the column-space of S.
8. Compute the SVD for the sine: [Y, diag(μ_1, ..., μ_q), Z] = svd(Q_S^T A S).
9. Compute the matrices U_sin and V_sin of left and right principal vectors: V_sin = R_G Z, U_sin = R_F(R_F^T A V_sin)Σ_R^{−1}.
10. Compute the missing principal angles, for k = 1, ..., q: θ_k = arcsin(μ_k) if μ_k² ≤ 1/2.
11. Complete the matrices U and V by adding the columns of U_sin, V_sin.

Output: principal angles θ_1, ..., θ_q between the column-spaces of the matrices F and G, and the corresponding matrices U and V of left and right principal vectors, respectively.

Remark 2.13 We note that, when n ≫ p, the computational costs of the SVDs of p-by-p matrices are negligible; it is the multiplication by A which may be very computationally expensive. Therefore, we want to minimize the number of multiplications by A. In the present version 4.0 of our code SUBSPACEA.m, based on Algorithm 2.4, we multiply the matrix A by a vector 2p + q times in the worst-case scenario of all angles being small, in steps 1 and 7. We can avoid multiplying by A in steps 2, 8, and 9 by using appropriate linear combinations of earlier computed vectors instead.
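The three-step construction of Remark 2.12 is compact enough to sketch in full. This is a simplified model of ORTHA.m, not the shipped listing of Appendix A: it omits the preceding column scaling and the extra refinement cycle for ill-conditioned A, and the name ortha_sk is ours.

```matlab
function Q = ortha_sk(X, A)
% Simplified model of the ORTHA.m construction (Remark 2.12): returns Q
% with A-orthonormal columns, Q'*A*Q = I, and range(Q) = range(X).
Q0 = orth(X);                           % step 1: Euclidean orthonormalization
B  = Q0' * (A * Q0);                    % small Gram matrix, positive definite
B  = (B + B') / 2;                      % symmetrize against round-off
[U, S, V] = svd(B);                     % step 2: SVD of the Gram matrix
Q  = Q0 * U * diag(1 ./ sqrt(diag(S))); % step 3: Q = Q0*U*S^(-1/2)
end
```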
Remark 2.14 Our actual code is written for the more general complex case, where we require the matrix A to be Hermitian.

Let us finally underline that in a situation when a matrix K from the factorization A = K^T K is given, rather than the matrix A itself, we do not advise using Algorithms 2.3 and 2.4. Instead, we recommend multiplying the matrices F and G by K on the left and using the simpler Algorithms 2.1 and 2.2, according to Theorem 2.9.

2.7 Numerical Examples

Numerical tests in the present section were performed using version 4.0 of our SUBSPACEA.m code, based on Algorithms 2.2 and 2.4. The numerical results presented were obtained, unless indicated otherwise, on a Red Hat 6.1 LINUX dual Pentium-III 500 system, running MATLAB release 12, version 6.0.0.18. Tests were also made on a Compaq dual Alpha server DS 20, running MATLAB release 12, version 6.0.0.18, and on several Microsoft Windows Pentium-III systems, running MATLAB versions 5.1-5.3. In our experience, Intel PIII-based LINUX systems typically provided more accurate results, apparently utilizing the extended 80-bit precision of the FPU registers of the PIII; see the discussion at the end of the section.

The main goal of our numerical tests is to check the practical robustness of our code, using the following argument. According to our analysis of section 2.5 and similar results of [5, 53, 22, 23], an absolute change in the cosine and sine of principal angles is bounded by the perturbations in the matrices F and G, with the constant proportional to their condition numbers taken after a proper column scaling; see Theorems 2.17 and 2.18 and Remark 2.11. Assuming a perturbation of the entries of the matrices F and G at the level of double precision, EPS ≈ 10^{−16}, we expect a similar perturbation in the cosine and sine of principal angles when the matrices F and G are well-conditioned after column scaling. We want to check if our code achieves this accuracy in practice. We concentrate here on testing our sine-based algorithms, i.e., on principal angles smaller than π/4. The cosine-based algorithm with A = I was recently studied in [16].

Our first example is taken from [5] with p = 13 and m = 26. The matrices F and G were called A and B in [5]. F was orthogonal, while G was an m-by-p Vandermonde matrix with cond(G) ≈ 10^4. The matrix G was generated in double precision and then rounded to single precision. According to our theory and the perturbation analysis of [5, 53, 22, 23], in this example an absolute change in the principal angles is bounded by the perturbation in the matrix G times its condition number. Thus, we should expect the sine and cosine of principal angles computed in [5] to be accurate to approximately four decimal digits. In our code, all computations are performed in double precision; therefore, the answers in Table 2.2 should be accurate up to twelve decimal digits. We observe, as expected, that our results are consistent with those of [5] within four digits.

k    sin(θ_k)         cos(θ_k)
1    0.00000000000    1.00000000000
2    0.05942261363    0.99823291519
3    0.06089682091    0.99814406635
4    0.13875176720    0.99032719194
5    0.14184708183    0.98988858230
6    0.21569434797    0.97646093022
7    0.27005046021    0.96284617096
8    0.33704307148    0.94148922881
9    0.39753678833    0.91758623677
10   0.49280942462    0.87013727135
11   0.64562133627    0.76365770483
12   0.99815068733    0.06078820101
13   0.99987854229    0.01558527040

Table 2.2: Computed sine and cosine of principal angles of the example of [5]
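For orientation, a call of the code on data of this kind is a one-liner. We assume here, for illustration only, the calling signature theta = subspacea(F, G, A), with the third argument omitted in the Euclidean case:

```matlab
% Sketch of a typical call; the signature is assumed as described above.
Gs = double(single(G));         % round G to single precision, as in [5]
theta = subspacea(F, Gs);       % principal angles, Euclidean case A = I
disp([sin(theta) cos(theta)])   % compare with Table 2.2
```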
In our next series of tests, we assume n to be even and p = q ≤ n/2. Let D be a diagonal matrix of size p:

D = diag(d_1, ..., d_p),  d_k > 0,  k = 1, ..., p.

We first define the n-by-p matrices

F_1 = [I 0]^T,  G_1 = [I D 0]^T,   (2.29)

where I is the identity matrix of size p and the 0's are zero matrices of appropriate sizes. We notice that the condition numbers of F_1 and G_1 are, respectively, one and

cond(G_1) = √( (1 + (max{diag(D)})²) / (1 + (min{diag(D)})²) ).

Thus, the condition number may be large only when large diagonal entries of D are present. Yet in this case the condition number of G_1 can be significantly reduced by column scaling; see Remark 2.11. The exact values of the sine and cosine of the principal angles between the column-spaces of the matrices F_1 and G_1 are obviously given by

μ_k = d_k / √(1 + d_k²),  σ_k = 1 / √(1 + d_k²),  k = 1, ..., p,   (2.30)

respectively, assuming that the d_k's are sorted in increasing order. The collective error in the principal angles is measured as the following sum:

√((μ_1 − μ̃_1)² + ... + (μ_p − μ̃_p)²) + √((σ_1 − σ̃_1)² + ... + (σ_p − σ̃_p)²),   (2.31)

where the μ's are the sines and the σ's are the cosines of the principal angles, and the tilde sign ~ is used for the actual computed values. We multiply the matrices F_1 and G_1 by a random orthogonal matrix U of size n on the left to get

F_2 = U F_1,  G_2 = U G_1.   (2.32)

This transformation does not change the angles or the condition numbers. It still allows for improving the condition number of G_2 by column scaling. Finally, we multiply the matrices by random orthogonal p-by-p matrices T_F and T_G, respectively, on the right:

F_3 = F_2 T_F,  G_3 = G_2 T_G.   (2.33)

This transformation does not change the angles or the condition numbers. It is likely, however, to remove the possibility of improving the condition number of G_3 by column scaling. Thus, if G_3 is ill-conditioned, we could expect a loss of accuracy; see Theorems 2.17 and 2.18 and Remark 2.11.

We start by checking the scalability of the code for well-conditioned cases. We increase the size of the problem n and plot the collective error, given by (2.31), for the principal angles between F_3 and G_3, against the value n/2. We solve the same problem two times, using Algorithm 2.2 and Algorithm 2.4 with A = I. In the first two tests, the diagonal entries of D are chosen as uniformly distributed random numbers rand on the interval (0,1). On Figure 2.1 (top) we fix p = q = 20. We observe that the average error grows approximately two times with a ten times increase in the problem size. On Figure 2.1 (bottom), we also raise p = q = n/2. This time, the error grows at the same pace as the problem size. Please note the different scales used for the errors on Figure 2.1.

[Figure 2.1: Errors in principal angles as functions of n/2. Two panels, "Error Growth with Problem Size," plot the collective error (2.31) against the problem size n/2 for Algorithm 2.2 and Algorithm 2.4 with A = I.]

To test our methods for very small angles with p = q = 20, we first choose

d_k = 10^{−k},  k = 1, ..., p.

We observe a similar pattern of the absolute error as that of Figure 2.1 (top), but do not reproduce this figure here. In our second test for small angles, we set p = q = n/2, as on Figure 2.1 (bottom), and select every diagonal element of D in the form 10^{−17·rand}, where rand is again a uniformly distributed random number on the interval (0,1). The corresponding figure, not shown here, looks the same as Figure 2.1 (bottom), except that the error grows a bit faster and reaches the level ≈ 4·10^{−14} (compare to the 3·10^{−14} value on Figure 2.1 (bottom)). In all these and our other analogous tests, Algorithm 2.2 and Algorithm 2.4 with A = I behave very similarly, so that Figure 2.1 provides a typical example.
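The construction (2.29)-(2.33) and the error measure (2.31) are straightforward to script. The sketch below is ours (subspacea is called with the signature assumed earlier, and its output angles are assumed sorted increasingly) and generates one well-conditioned instance:

```matlab
% One instance of the test family (2.29)-(2.33) with A = I; sketch only.
n = 200; p = 20;
d  = sort(rand(p, 1));                  % diagonal of D, increasing
F1 = [eye(p); zeros(n-p, p)];           % (2.29)
G1 = [eye(p); diag(d); zeros(n-2*p, p)];
[U1, R1] = qr(randn(n));                % random orthogonal factor
F2 = U1*F1;  G2 = U1*G1;                % (2.32): angles unchanged
[TF, R2] = qr(randn(p));  [TG, R3] = qr(randn(p));
F3 = F2*TF;  G3 = G2*TG;                % (2.33): defeats column scaling
mu = d ./ sqrt(1 + d.^2);               % exact sines, (2.30)
sg = 1 ./ sqrt(1 + d.^2);               % exact cosines, (2.30)
theta = subspacea(F3, G3);              % computed angles, increasing
err = norm(sin(theta) - mu) + norm(cos(theta) - sg)   % error (2.31)
```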
In our next series of experiments, we fix a small p = q and n = 100, and compute the angles between F_2 and G_2 and between F_3 and G_3 500 times, changing only the random matrices used in the construction of our F_i and G_i. Instead of the collective error, given by (2.31), we now compute the errors for the individual principal angles as

|μ_k − μ̃_k| + |σ_k − σ̃_k|,  k = 1, ..., p.

We tested several different combinations of angles less than π/4. In most cases, the error was only insignificantly different from EPS ≈ 10^{−16}. The worst-case scenario found numerically corresponds to

D = diag{1, 0.5, 10^{−11}, 10^{−12}, 10^{−13}, 5·10^{−15}, 2·10^{−15}, 10^{−15}, 10^{−16}, 0}

and is presented on Figure 2.2. Figure 2.2 (top) shows the distribution of the error for the individual angles between F_3 and G_3 in Algorithm 2.2 in 500 runs. Figure 2.2 (bottom) demonstrates the performance of Algorithm 2.4 with A = I for the same problem. The numeration of the angles is reversed for technical reasons; i.e., smaller angles are further away from the viewer. The computed distributions of the error for the individual angles between F_2 and G_2 are very similar and, because of that, are not shown here.

We detect that for such small values of p and n most of the angles are computed essentially within the double precision accuracy. Only multiple small angles present a slight challenge, more noticeable on Figure 2.2 (bottom), which uses a somewhat different scale for the error to accommodate a larger error. Nevertheless, all observed errors in all 500 runs are bounded from above by 6·10^{−15} on Figure 2.2, which seems to be a reasonable level of accuracy for the accumulation of round-off errors appearing in computations with 200-by-10 matrices in double precision.

[Figure 2.2: Errors in individual angles in Algorithms 2.2 and 2.4 with A = I. Two panels, "Error Distribution for Different Angles," show the number of hits versus the error and the (reversed) angle number.]

To test our code for an ill-conditioned case, we add two large values to the previous choice of D to obtain

D = diag{10^{10}, 10^{8}, 1, 0.5, 10^{−11}, 10^{−12}, 10^{−13}, 5·10^{−15}, 2·10^{−15}, 10^{−15}, 10^{−16}, 0},

which leads to cond(G_1) ≈ 10^{10}. Figure 2.3 (top) shows the distribution of the error for the individual angles between F_2 and G_2 in Algorithm 2.4 with A = I after 500 runs. The numeration of the angles is again reversed; i.e., smaller angles are further away from the viewer. There is no visible difference between the new Figure 2.3 (top) and the previous Figure 2.2 for the well-conditioned case, which confirms the results of [23, 22] on column scaling of ill-conditioned matrices; see Remark 2.11. Namely, our ill-conditioned matrix G_2 can be made well-conditioned by column scaling. Thus, the perturbations in the angles should be small. Figure 2.3 (top) therefore demonstrates that our code SUBSPACEA.m is able to take advantage of this.

[Figure 2.3: Errors in individual angles between F_2 and G_2 and between F_3 and G_3. Two panels, "Error Distribution for Different Angles," show the number of hits versus the error and the (reversed) angle number.]

As we move to computing the angles between F_3 and G_3 in this example (see Figure 2.3 (bottom)), where the distribution of the error for the individual angles in Algorithm 2.4 with A = I after 500 runs is shown, the situation changes dramatically. In general, it is not possible to improve the condition number cond(G_3) ≈ 10^{10} by column scaling.
Thus, according to [5, 23] and our perturbation analysis of Theorems 2.17 and 2.18 and Remark 2.11, we should expect the absolute errors in the angles to grow ≈ 10^{10} times (compared to the errors on Figure 2.3 (top)), i.e., up to the level 10^{−5}. On Figure 2.3 (bottom), which shows the actual errors obtained during 500 runs of the code, we indeed observe absolute errors in the angles at the level up to 10^{−5}, as just predicted. Surprisingly, the absolute error for the angles with small cosines is much better. We do not have an explanation of such good behavior, as the present theory does not provide individual perturbation bounds for different angles.

Our concluding numerical results illustrate the performance of our code for ill-conditioned scalar products. We take G to be the first ten columns of the identity matrix of size twenty, and F to be the last ten columns of the Vandermonde matrix of size twenty with elements v_{ij} = i^{20−j}, i, j = 1, ..., 20. The matrix F is ill-conditioned, cond(F) ≈ 10^{13}. We compute the principal angles and vectors between F and G in an A-based scalar product for the following family of matrices:

A = A_ℓ = 10^{−ℓ} I + H,  ℓ = 1, ..., 16,

where I is the identity and H is the Hilbert matrix of order twenty, whose elements are given by h_{ij} = 1/(i + j − 1), i, j = 1, ..., 20. Our subspaces ℱ and 𝒢 do not change with ℓ; only the scalar product that describes the geometry of the space changes.

When ℓ = 1, we observe three angles with cosine less than 10^{−3} and three angles with sine less than 10^{−3}. When ℓ increases, we get closer to the Hilbert matrix, which emphasizes the first rows of the matrices F and G, effectively ignoring the last rows. By construction of F, its large elements, which make the subspace ℱ be further away from the subspace 𝒢, are all in the last rows. Thus, we should expect the large principal angles between ℱ and 𝒢 to decrease monotonically as ℓ grows. We observe this in our numerical tests (see Figure 2.4 (top)), which plot in logarithmic scale the sines of all ten principal angles as functions of ℓ.

[Figure 2.4: μ_k as functions of ℓ, and errors as functions of the condition number. The bottom panel, "Error vs. Condition Number," compares results on MS Windows and LINUX.]
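This final experiment is also easy to reproduce in outline (a sketch of ours, with the same assumed subspacea signature; hilb and vander are standard MATLAB functions):

```matlab
% Ill-conditioned scalar products: the Hilbert-matrix family A_l.
n = 20; H = hilb(n);                  % h_ij = 1/(i+j-1)
V = vander(1:n);                      % v_ij = i^(n-j)
F = V(:, n-9:n);                      % last ten columns, cond ~ 1e13
G = eye(n); G = G(:, 1:10);           % first ten columns of the identity
for l = 1:16
    A = 10^(-l)*eye(n) + H;          % the family A_l
    theta = subspacea(F, G, A);      % angles in the A_l-based product
    semilogy(l*ones(10,1), sin(theta), 'o'); hold on
end
xlabel('l'); ylabel('sin(\theta_k)') % cf. Figure 2.4 (top)
```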
Of course, such a change in geometry, which makes the sine of an angle decrease 10^4 times, means that the matrix A_ℓ, describing the scalar product, gets more and more ill-conditioned as it gets closer to the Hilbert matrix H; namely, cond(A_ℓ) ≈ 10^ℓ in our case. It is known that ill-conditioned problems usually lead to a significant increase of the resulting error, as ill-conditioning amplifies round-off errors. To investigate this effect for our code, we introduce the error as the following sum:

error = ||V^T A V − I|| + ||U^T A U − I|| + ||Σ − U^T A V||,

where the first two terms control the orthogonality of the principal vectors and the last term measures the accuracy of the cosines of the principal angles. We observe in our experiments that the different terms in the sum are close to each other, and none dominates. The accuracy of the sines of the principal angles is not crucial in this example, as the angles are not small enough to cause concerns. As U and V are constructed directly from the columns of F and G, they span the same subspaces with high accuracy independently of the condition number of A, as we observe in the tests.

We plot the error on the y-axis of Figure 2.4 (bottom) for a Pentium III 500 PC running two different operating systems: Microsoft Windows NT 4.0 SP6 (red stars) and RedHat LINUX 6.1 (blue diamonds), where the x-axis presents the condition number of A; both axes are in logarithmic scale. The MATLAB version 5.3.1.29215a (R11.1) is the same on both operating systems. We see, as expected, that the error grows, apparently linearly, with the condition number. We also observe, now with quite a surprise, that the error on LINUX is much smaller than the error on MS Windows!

As the same MATLAB version and the same code SUBSPACEA.m are run on the same hardware, this fact deserves an explanation. As a result of a discussion with Nabeel and Lawrence Kirby at the news group sci.math.num-analysis, it has been found that MATLAB was apparently compiled on LINUX to take advantage of the extended 80-bit precision of the FPU registers of the PIII, while Microsoft compilers apparently set the FPU to 64-bit operations. To demonstrate this, Nabeel suggested the following elegant example: compute the scalar product of the vectors (1, 10^{−19}, −1)^T and (1, 1, 1)^T. On MS Windows, the result is zero, as it should be in double precision, while on LINUX the result is 1.0842·10^{−19}. Figure 2.4 (bottom) shows that our algorithm turns this difference into a significant error improvement for an ill-conditioned problem.

Finally, our code SUBSPACEA.m has been used in the code LOBPCG.m (see [34, 35]) to control the accuracy of invariant subspaces of large symmetric generalized eigenvalue problems, and thus has been tested on a variety of large-scale practical problems.

2.8 Availability of the Software

A listing for SUBSPACEA.m and the function ORTHA.m is provided in Appendix A. Also see http://www.mathworks.com/support/ftp/linalgv5.shtml for the code SUBSPACEA.m and the function ORTHA.m it uses, as well as the fix for the MATLAB code SUBSPACE.m, as submitted to MathWorks. They are also publicly available at the Web page: http://www-math.cudenver.edu/~aknyazev/software.

2.9 Conclusions Concerning Angles Between Subspaces

Let us formulate here the main points of this section:

- A bug in the cosine-based algorithm for computing principal angles between subspaces, which prevents one from computing small angles accurately in computer arithmetic, is illustrated.
- An algorithm is presented that computes all principal angles accurately in computer arithmetic and is proved to be equivalent to the standard algorithm in exact arithmetic.
- A generalization of the algorithm to an arbitrary scalar product given by a symmetric positive definite matrix is suggested and justified theoretically.
- Perturbation estimates for the absolute errors of the cosine and sine of principal angles, with improved constants and for an arbitrary scalar product, are derived.
- A description of the code is given, as well as the results of numerical tests. The code is robust in practice and provides accurate angles for the large-scale and ill-conditioned cases we tested numerically. It is also reasonably efficient for large-scale applications with n ≫ p.
- Our algorithms are matrix-free; i.e., they do not require storing in memory any matrices of size n, and they are capable of dealing with A specified only as a function that multiplies A by a given vector.
- MATLAB release 13 has implemented the SVD-based sine version of our algorithm.

3. Rayleigh Quotient Perturbations and Eigenvalue Accuracy

In this chapter we discuss Rayleigh quotient and Rayleigh-Ritz perturbations and eigenvalue accuracy.
There are two reasons for studying this problem [52]: first, the results can be used in the design and analysis of algorithms, and, second, a knowledge of the sensitivity of an eigenpair is of both theoretical and practical interest. Rayleigh quotient and Rayleigh-Ritz equalities and inequalities are central to determining eigenvalue accuracy and to analyzing many situations when individual vectors and/or subspaces are perturbed.

3.1 Introduction to Rayleigh Quotient Perturbations

Let A ∈ R^{n×n} be a matrix and x be a vector in R^n. In this chapter, for the majority of the results, we will assume that the matrix A is symmetric and that we are dealing with finite-dimensional real vector spaces. Let λ_1 ≤ ... ≤ λ_n be the eigenvalues of A with corresponding eigenvectors, respectively, u_1, ..., u_n. These eigenvectors may be chosen to be orthonormal. In some cases, we will assume that A is positive definite, with spectral condition number κ(A) = ||A|| ||A^{−1}|| = λ_n / λ_1.

In this chapter we will avoid introducing a basis to the extent possible. This makes these concepts and results more general, and possibly applicable to infinite-dimensional vector spaces (e.g., Hilbert spaces). Investigation into this area is beyond the scope of this work, but promises interesting results for future work.

The Rayleigh quotient ρ(x; A) is defined for x ∈ R^n with x ≠ 0 as

ρ(x) = ρ(x; A) = (Ax, x) / (x, x),   (3.1)

where (x, x) = x^T x = ||x||², as in [49]. If it is obvious which matrix A is involved, the form ρ(x) will be used.

A significant application of the Rayleigh quotient involves the Courant-Fischer theorem. We restate here the Courant-Fischer theorem (Theorem 10.2.1 of [49]), which is known as the minimax characterization of eigenvalues. It is used in this investigation to obtain several important results.

Theorem 3.1 Let A ∈ R^{n×n} be a symmetric matrix with eigenvalues λ_1 ≤ ... ≤ λ_n. Then

λ_j = min_{𝒬^j ⊆ R^n, dim 𝒬^j = j} max_{0 ≠ g ∈ 𝒬^j} ρ(g; A),  j = 1, ..., n.   (3.2)

Given a vector x ∈ R^n with x ≠ 0, the residual is defined as

r(x) = r(x; A) = (A − ρ(x; A) I) x,   (3.3)

where I is the identity matrix. If it is obvious which matrix A is involved, the form r(x) will be used. It is worth noting that the gradient of ρ(x; A) is given by

∇ρ(x; A) = (2 / (x, x)) r(x; A).   (3.4)

We need to define some concepts concerning angles between vectors and angles between subspaces which are used in this chapter. The acute angle between two non-zero vectors x, y ∈ R^n is denoted by

∠{x, y} = arccos( |(x, y)| / (||x|| ||y||) ).

It is important to emphasize that 0 ≤ ∠{x, y} ≤ π/2, by definition. The largest principal angle between the subspaces ℱ, 𝒢 ⊆ R^n is denoted by ∠{ℱ, 𝒢}. The definition of and concepts involving principal angles between subspaces are discussed in detail in Chapter 2.

The following result is used in [33], although this reference does not include a proof.

Lemma 3.1 Let A ∈ R^{n×n} be a symmetric matrix and let x ∈ R^n with x ≠ 0. Let P_x be the orthogonal projector onto the subspace spanned by the vector x. Then

||r(x; A)|| / ||x|| = ||(I − P_x) A P_x||.

Proof: We have

||(I − P_x) A P_x|| = max_{y ≠ 0} ||(I − P_x) A P_x y|| / ||y||.

First we prove that

ρ(x; A) P_x = P_x A P_x.   (3.5)

Let y_1, y_2 ∈ R^n with y_1 = α_1 x + w_1 and y_2 = α_2 x + w_2, where w_1, w_2 are orthogonal to x. Then

((ρ(x; A) P_x − P_x A P_x) y_1, y_2) = ((ρ(x; A) − A) P_x y_1, P_x y_2) = α_1 α_2 ((ρ(x; A) − A) x, x) = 0.

Since y_1 and y_2 are arbitrary, ρ(x; A) P_x − P_x A P_x = 0. For any vector y, we have y = αx + w for some real α and a vector w orthogonal to x. Then

||(I − P_x) A P_x y|| / ||y|| = |α| ||(I − P_x) A P_x x|| / ||y|| = |α| ||Ax − P_x A P_x x|| / ||y|| = |α| ||Ax − ρ(x; A) x|| / ||y|| = |α| ||r(x; A)|| / √(|α|² ||x||² + ||w||²) ≤ ||r(x; A)|| / ||x||,

and for y = x this bound is achieved, which proves the result. ∎

There is another interesting result which relates ρ(x; A) and r(x; A) to the angle between x and Ax.

Lemma 3.2 Let A ∈ R^{n×n} be a symmetric matrix and let x ∈ R^n with x ≠ 0, and assume that Ax ≠ 0. Then

||r(x; A)|| / ||Ax|| = sin(∠{x, Ax}),   (3.6)

and, assuming ρ(x; A) ≠ 0, we have

||r(x; A)|| / ||x|| = |ρ(x; A)| tan(∠{x, Ax}).   (3.7)

Proof: We have

cos(∠{x, Ax}) = |(x, Ax)| / (||x|| ||Ax||) = |ρ(x; A)| ||x|| / ||Ax||

and

sin²(∠{x, Ax}) = 1 − cos²(∠{x, Ax}) = (||Ax||² − ρ(x; A)² ||x||²) / ||Ax||² = ||r(x; A)||² / ||Ax||².

This proves (3.6). Combining the results for the cosine and sine, we obtain (3.7). ∎
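The identities of Lemmas 3.1 and 3.2 are easy to sanity-check numerically; a small sketch of ours:

```matlab
% Numerical check of (3.6) and (3.7) for a random symmetric matrix.
n = 8; B = randn(n); A = (B + B')/2;   % symmetric test matrix
x = randn(n, 1);
rho = (x'*A*x) / (x'*x);               % Rayleigh quotient (3.1)
r   = A*x - rho*x;                     % residual (3.3)
ang = acos(abs(x'*(A*x)) / (norm(x)*norm(A*x)));  % angle between x and Ax
disp([norm(r)/norm(A*x), sin(ang)])          % equal, by (3.6)
disp([norm(r)/norm(x), abs(rho)*tan(ang)])   % equal, by (3.7)
```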
Another important concept involves what is regarded as an invariant subspace. A subspace S ⊆ R^n is invariant with respect to the matrix A if it satisfies the condition that if x ∈ S, then Ax ∈ S.

Let A ∈ R^{n×n} be a symmetric matrix and S ⊆ R^n be an m-dimensional subspace, 0 < m < n. We can define an operator Ã = P_S A|_S on S, where P_S is the orthogonal projection onto S and P_S A|_S denotes the restriction of P_S A to S. The eigenvalues of Ã are called Ritz values, λ̃_1 ≤ ... ≤ λ̃_m, and the corresponding eigenvectors, ũ_1, ..., ũ_m, are called Ritz vectors. The operator Ã is self-adjoint, since if x, y ∈ S, then (Ãx, y) = (P_S A x, y) = (Ax, y) = (x, Ay) = (x, P_S A y) = (x, Ãy).

Utilizing a setup similar to the Courant-Fischer theorem (Theorem 3.1), Ritz values are characterized, as in equation (11.5) of [49], by

λ̃_j = min_{S^j ⊆ S, dim S^j = j} max_{0 ≠ g ∈ S^j} ρ(g; A),  j = 1, ..., m.   (3.8)

Here, S^j ⊆ S instead of 𝒬^j ⊆ R^n as in (3.2). Equivalently, let the columns of Q ∈ R^{n×m} form an orthonormal basis for S (Theorem 11.4.1 of [49]); then the matrix Ã = Q^T A Q is a matrix representation of the operator P_S A|_S in this basis, and λ̃_j, j = 1, ..., m, are the eigenvalues of Ã. Assuming w ∈ R^m, a linear map φ : R^m → S can be defined by φ(w) = Q w.
The inner product (and hence the norm) is preserved under this map, since if w, v ∈ R^m, then (φ(w), φ(v)) = (Qw, Qv) = (w, Q^T Q v) = (w, v). Thus, orthonormality of vectors is preserved under φ.
1. Considering the Rayleigh quotient, we have ρ(w; Ã) = (Q^T A Q w, w)/(w, w) = (A Q w, Q w)/(Q w, Q w) = ρ(φ(w); A).
3. Let w, v ∈ R^m. Since A is self-adjoint, we have (Ãw, v) = (A Q w, Q v) = (Q w, A Q v) = (w, Ãv); i.e., the matrix Ã = Q^T A Q is symmetric.
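In coordinates, the Rayleigh-Ritz construction above reduces to forming the small matrix Q^T A Q. A closing sketch of ours computes Ritz values and checks the preservation of the Rayleigh quotient under φ:

```matlab
% Ritz values of A on S = range(Q), and preservation of the Rayleigh
% quotient under phi(w) = Q*w; illustrative sketch.
n = 50; m = 5;
B = randn(n); A = (B + B')/2;          % symmetric test matrix
Q = orth(randn(n, m));                 % orthonormal basis of S
At = Q' * A * Q;                       % matrix of P_S A|_S in this basis
ritz = sort(eig((At + At')/2))         % Ritz values, as in (3.8)
w = randn(m, 1);  x = Q*w;             % x = phi(w)
disp([(w'*At*w)/(w'*w), (x'*A*x)/(x'*x)])  % rho(w; A~) = rho(phi(w); A)
```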